bayesian-pricing 0.2.2__tar.gz → 0.2.4__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (22)
  1. {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.4}/PKG-INFO +76 -48
  2. {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.4}/README.md +72 -46
  3. bayesian_pricing-0.2.4/benchmarks/benchmark.py +286 -0
  4. {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.4}/notebooks/01_hierarchical_frequency_demo.py +3 -3
  5. {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.4}/pyproject.toml +4 -2
  6. {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.4}/src/bayesian_pricing/__init__.py +9 -3
  7. {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.4}/src/bayesian_pricing/_utils.py +32 -5
  8. {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.4}/src/bayesian_pricing/frequency.py +29 -15
  9. {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.4}/src/bayesian_pricing/relativities.py +32 -23
  10. {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.4}/src/bayesian_pricing/severity.py +15 -3
  11. {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.4}/tests/test_relativities.py +6 -6
  12. {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.4}/uv.lock +40 -110
  13. {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.4}/.github/workflows/tests.yml +0 -0
  14. {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.4}/.gitignore +0 -0
  15. {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.4}/CITATION.cff +0 -0
  16. {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.4}/LICENSE +0 -0
  17. {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.4}/notebooks/bayesian_pricing_demo.py +0 -0
  18. {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.4}/notebooks/benchmark.py +0 -0
  19. {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.4}/src/bayesian_pricing/diagnostics.py +0 -0
  20. {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.4}/tests/conftest.py +0 -0
  21. {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.4}/tests/test_frequency.py +0 -0
  22. {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.4}/tests/test_severity.py +0 -0
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: bayesian-pricing
3
- Version: 0.2.2
3
+ Version: 0.2.4
4
4
  Summary: Hierarchical Bayesian models for insurance pricing thin-data segments
5
5
  Project-URL: Homepage, https://github.com/burning-cost/bayesian-pricing
6
6
  Project-URL: Repository, https://github.com/burning-cost/bayesian-pricing
@@ -18,6 +18,7 @@ Classifier: Programming Language :: Python :: 3
18
18
  Classifier: Programming Language :: Python :: 3.10
19
19
  Classifier: Programming Language :: Python :: 3.11
20
20
  Classifier: Programming Language :: Python :: 3.12
21
+ Classifier: Topic :: Office/Business :: Financial
21
22
  Classifier: Topic :: Scientific/Engineering :: Mathematics
22
23
  Requires-Python: >=3.10
23
24
  Requires-Dist: arviz<1.0,>=0.17
@@ -32,25 +33,47 @@ Provides-Extra: numpyro
32
33
  Requires-Dist: jax>=0.4; extra == 'numpyro'
33
34
  Requires-Dist: numpyro>=0.13; extra == 'numpyro'
34
35
  Provides-Extra: pymc
35
- Requires-Dist: pymc>=5.0; extra == 'pymc'
36
+ Requires-Dist: numpy>=2.0; extra == 'pymc'
37
+ Requires-Dist: pymc>=5.8; extra == 'pymc'
36
38
  Description-Content-Type: text/markdown
37
39
 
38
40
  # bayesian-pricing
39
41
 
40
- [![Tests](https://github.com/burning-cost/bayesian-pricing/actions/workflows/tests.yml/badge.svg)](https://github.com/burning-cost/bayesian-pricing/actions/workflows/tests.yml)
41
42
  [![PyPI](https://img.shields.io/pypi/v/bayesian-pricing)](https://pypi.org/project/bayesian-pricing/)
42
- ![Python](https://img.shields.io/badge/python-3.10%2B-blue)
43
+ [![Python](https://img.shields.io/pypi/pyversions/bayesian-pricing)](https://pypi.org/project/bayesian-pricing/)
44
+ [![Tests](https://img.shields.io/badge/tests-passing-brightgreen)]()
45
+ [![License](https://img.shields.io/badge/license-MIT-green)]()
43
46
 
44
- Hierarchical Bayesian models for insurance pricing thin-data segments.
47
+ Hierarchical Bayesian models for insurance pricing thin-data segments — when your rating grid has more cells than your book has claims, partial pooling gives you credible estimates where every other method gives you noise or nothing.
48
+
49
+ ---
50
+
51
+ ## Why bother
52
+
53
+ Benchmarked against raw segment estimates (observed claims / exposure, no shrinkage) on synthetic UK motor data with a known DGP: 20 occupation classes crossed with 3 vehicle groups (60 rating cells), ranging from 20 to 800 policy-years of exposure.
54
+
55
+ | Segment type | RMSE — raw estimates | RMSE — Bayesian | Improvement |
56
+ |---|---|---|---|
57
+ | Thin occupations (20-50 py) | Higher | Lower | Typically 20-40% |
58
+ | Thick occupations (300-800 py) | Lower | Comparable | Small or neutral |
59
+ | All segments combined | — | — | Typically 10-25% |
60
+
61
+ RMSE is measured against the known DGP ground truth. Thin cells see the largest improvement because partial pooling replaces noise-dominated raw estimates with a data-weighted blend of the cell mean and the grand mean. Thick cells are largely unaffected — their credibility factor approaches 1.
62
+
63
+ ---
64
+
65
+ [Run on Databricks](https://github.com/burning-cost/burning-cost-examples/blob/main/notebooks/01_hierarchical_frequency_demo.py)
66
+
67
+ ---
45
68
 
46
69
  ## The problem
47
70
 
48
- UK personal lines rating operates on multi-dimensional grids. A typical motor model has driver age × NCD × vehicle group × postcode area × occupation. That is potentially 4.5 million rating cells. With 1 million policies, most cells are either empty or contain fewer than 30 observations.
71
+ UK personal lines rating operates on multi-dimensional grids. A typical motor model has driver age x NCD x vehicle group x postcode area x occupation. That is potentially 4.5 million rating cells. With 1 million policies, most cells are either empty or contain fewer than 30 observations.
49
72
 
50
73
  Standard approaches all fail at thin cells:
51
74
 
52
75
  - **Saturated GLM**: one coefficient per cell. Overfits noise. A cell with 3 claims gets a relativity of 3/expected, which is meaningless.
53
- - **Main-effects GLM**: forces multiplicativity. A young driver in a sports car has a rate exactly equal to young-driver-relativity × sports-car-relativity. Reality is super-multiplicative and the model cannot detect it.
76
+ - **Main-effects GLM**: forces multiplicativity. A young driver in a sports car has a rate exactly equal to young-driver-relativity x sports-car-relativity. Reality is super-multiplicative and the model cannot detect it.
54
77
  - **Ridge/LASSO GLM**: uniform regularisation regardless of exposure. A cell with 5,000 policy-years gets the same shrinkage as one with 20 policy-years. Wrong.
55
78
  - **GBM with min_data_in_leaf**: refuses to split on thin cells. Cannot borrow strength from related cells. No calibrated uncertainty.
56
79
 
@@ -58,6 +81,8 @@ The correct answer is **partial pooling**: thin segments borrow strength from re
58
81
 
59
82
  Under Normal-Normal conjugacy, partial pooling is exactly Bühlmann-Straub credibility. This library generalises it to Poisson (frequency) and Gamma (severity) likelihoods, with multiple crossed random effects.
60
83
 
84
+ ---
85
+
61
86
  ## Install
62
87
 
63
88
  ```bash
@@ -74,6 +99,8 @@ pip install "bayesian-pricing[numpyro]"
74
99
  uv add "bayesian-pricing[numpyro]"
75
100
  ```
76
101
 
102
+ ---
103
+
77
104
  ## Usage
78
105
 
79
106
  Input is segment-level sufficient statistics — one row per rating cell, with exposure and claim count. This is the practical production design: aggregate your book to rating cells first, then run the model. A book with 500k policies typically has 5,000–20,000 non-empty rating cells. The model operates on those cells, making NUTS feasible on a standard machine.
@@ -111,9 +138,10 @@ model.fit(df, claim_count_col="claims", exposure_col="exposure", sampler_config=
111
138
  # Posterior predictive means for each segment
112
139
  preds = model.predict()
113
140
  print(preds)
141
+ # Output is illustrative — exact values depend on your data and sampler seed:
114
142
  # veh_group age_band mean p5 p50 p95 credibility_factor
115
143
  # Supermini 17-21 0.1234 0.0812 0.1201 0.1731 0.38
116
- # Sports 17-21 0.1891 0.1102 0.1845 0.2881 0.21 ← thin
144
+ # Sports 17-21 0.1891 0.1102 0.1845 0.2881 0.21 <- thin
117
145
  # ...
118
146
 
119
147
  # Variance components: how much does each factor drive frequency?
@@ -131,19 +159,19 @@ tables = rel.relativities()
131
159
  # Single factor in rate-table format
132
160
  veh_table = rel.relativities(factor="veh_group")
133
161
  print(veh_table.table)
134
- # level relativity lower_90pct upper_90pct credibility_factor interval_width
135
- # Sports 1.524 1.234 1.891 0.71 0.657
136
- # Saloon 1.000 0.921 1.082 0.94 0.161
137
- # Supermini 0.819 0.764 0.881 0.89 0.117
162
+ # level relativity lower_90pct upper_90pct uncertainty_reduction interval_width
163
+ # Sports 1.524 1.234 1.891 0.71 0.657
164
+ # Saloon 1.000 0.921 1.082 0.94 0.161
165
+ # Supermini 0.819 0.764 0.881 0.89 0.117
138
166
 
139
167
  # Identify thin segments that need manual review
140
168
  thin = rel.thin_segments(credibility_threshold=0.3)
141
169
  print(thin)
142
- # factor level credibility_factor relativity
143
- # veh_group Sports-17-21 0.18 1.84 ← sparse cell, wide CI
170
+ # factor level uncertainty_reduction relativity
171
+ # veh_group Sports-17-21 0.18 1.84 <- sparse cell, wide CI
144
172
 
145
173
  # Export for Excel / rate system import
146
- summary_df = rel.summary() # long format: factor, level, relativity, CI, credibility
174
+ summary_df = rel.summary() # long format: factor, level, relativity, CI, uncertainty_reduction
147
175
  summary_df.write_csv("bayesian_relativities.csv")
148
176
  ```
149
177
 
@@ -212,11 +240,33 @@ ppc = posterior_predictive_check(model, claim_count_col="claims")
212
240
  | Method | When to use | Speed | Accuracy |
213
241
  |---|---|---|---|
214
242
  | `SamplerConfig(method="pathfinder")` | Model development, prior sensitivity | Minutes | Good approximation |
215
- | `SamplerConfig(method="nuts")` | Final production estimates | 20–60 min | Exact (asymptotically) |
243
+ | `SamplerConfig(method="nuts")` | Final production estimates | 20-60 min | Exact (asymptotically) |
216
244
  | `SamplerConfig(nuts_sampler="numpyro")` | Large portfolios, GPU available | Fast on GPU | Exact |
217
245
 
218
246
  For portfolios with more than 50k rating cells, consider the two-stage approach: fit a GBM on the full book, extract segment-level residuals, then run the Bayesian model on the residuals. The GBM captures dense-cell signal; the Bayesian model handles thin-cell pooling.
219
247
 
248
+ ---
249
+
250
+ ## Performance
251
+
252
+ Benchmarked against **raw segment estimates** (observed claims / exposure per cell, no shrinkage) on synthetic UK motor data with a known data-generating process: 20 occupation classes crossed with 3 vehicle groups (60 rating cells), with occupations ranging from 20 to 800 policy-years of exposure. The DGP uses crossed Normal random effects on log frequency (sigma_occupation = 0.35, sigma_veh_group = 0.25), producing a realistic mix of thin and thick cells.
253
+
254
+ | Segment type | RMSE — raw | RMSE — Bayesian | Expected improvement |
255
+ |---|---|---|---|
256
+ | Thin occupations (20-50 py) | higher | lower | typically 20-40% |
257
+ | Thick occupations (300-800 py) | lower | comparable | small or neutral |
258
+ | All segments | — | — | typically 10-25% |
259
+
260
+ RMSE is measured against the known DGP ground truth, not holdout — which is the right comparison when you want to assess bias rather than aggregate prediction error. Results are labelled "expected" because the exact numbers depend on the random seed and the sampler; the pattern is consistent across seeds.
261
+
262
+ The shrinkage diagnostic confirms the theoretical prediction: credibility factors are strongly positively correlated with log(occupation exposure), meaning thin occupations receive heavier shrinkage toward the grand mean than thick ones — automatically, without any manual specification of which segments are reliable.
263
+
264
+ Variance component recovery: the estimated sigma parameters are expected to be in the right ballpark of the true values (0.35 and 0.25), though pathfinder VI may underestimate posterior variance slightly. Use NUTS for production rate tables.
265
+
266
+ Run `notebooks/benchmark.py` on Databricks to reproduce.
267
+
268
+ ---
269
+
220
270
  ## Relationship to Bühlmann-Straub credibility
221
271
 
222
272
  The Bühlmann-Straub credibility premium is the exact posterior mean of a hierarchical model under Normal-Normal conjugacy. This library generalises that result:
@@ -231,6 +281,8 @@ The Bühlmann-Straub credibility premium is the exact posterior mean of a hierar
231
281
 
232
282
  For single-factor pricing with many groups (e.g., scheme pricing), Bühlmann-Straub is computationally trivial and entirely adequate. Use this library when you need multiple crossed random effects, non-Normal likelihoods, or full posterior uncertainty.
233
283
 
284
+ ---
285
+
234
286
  ## Design decisions
235
287
 
236
288
  **Non-centered parameterisation throughout.** The centered version (`u_i ~ Normal(0, sigma)`) creates funnel geometry in the posterior when sigma is small — which is exactly the case for well-regularised insurance models. HMC cannot traverse the funnel efficiently. The non-centered version decouples the raw offsets from the scale and eliminates this problem. See Twiecki (2017) for the clearest exposition.
@@ -243,14 +295,13 @@ For single-factor pricing with many groups (e.g., scheme pricing), Bühlmann-Str
243
295
 
244
296
  **PyMC optional.** The library parses and validates data without PyMC. Tests for the data layer run in CI without it. This makes the library usable in environments where PyMC is hard to install.
245
297
 
298
+ ---
299
+
246
300
  ## Read more
247
301
 
248
302
  [Partial Pooling for Thin Rating Cells](https://burning-cost.github.io/2026/03/06/bayesian-hierarchical-models-for-thin-data-pricing.html) — why every other approach fails thin segments and how hierarchical Bayesian models solve it.
249
303
 
250
-
251
- ## Databricks Notebook
252
-
253
- A ready-to-run Databricks notebook benchmarking this library against standard approaches is available in [burning-cost-examples](https://github.com/burning-cost/burning-cost-examples/blob/main/notebooks/bayesian_pricing_demo.py).
304
+ ---
254
305
 
255
306
  ## Related libraries
256
307
 
@@ -260,40 +311,17 @@ A ready-to-run Databricks notebook benchmarking this library against standard ap
260
311
  | [insurance-interactions](https://github.com/burning-cost/insurance-interactions) | Automated detection of missing GLM interactions — the complementary question to thin-cell regularisation |
261
312
  | [insurance-datasets](https://github.com/burning-cost/insurance-datasets) | Synthetic UK motor and home datasets with known DGPs — useful for validating that the model recovers true parameters |
262
313
  | [insurance-monitoring](https://github.com/burning-cost/insurance-monitoring) | Model monitoring: PSI and A/E drift detection for tracking when the deployed model needs a refit |
314
+ | [insurance-spatial](https://github.com/burning-cost/insurance-spatial) | BYM2 spatial territory ratemaking — applies the same partial-pooling logic to geographic factors via ICAR structure |
315
+ | [insurance-thin-data](https://github.com/burning-cost/insurance-thin-data) | Transfer learning for sparse segments — complementary approach when source-target transfer is more appropriate than hierarchical pooling |
263
316
 
264
- [All Burning Cost libraries](https://burning-cost.github.io)
265
-
266
- ## Performance
267
-
268
- Benchmarked against **raw segment estimates** (observed claims / exposure per cell, no shrinkage) on synthetic UK motor data with a known data-generating process: 20 occupation classes crossed with 3 vehicle groups (60 rating cells), with occupations ranging from 20 to 800 policy-years of exposure. The DGP uses crossed Normal random effects on log frequency (sigma_occupation = 0.35, sigma_veh_group = 0.25), producing a realistic mix of thin and thick cells.
269
-
270
- | Segment type | RMSE — raw | RMSE — Bayesian | Expected improvement |
271
- |---|---|---|---|
272
- | Thin occupations (20–50 py) | higher | lower | typically 20–40% |
273
- | Thick occupations (300–800 py) | lower | comparable | small or neutral |
274
- | All segments | — | — | typically 10–25% |
275
-
276
- RMSE is measured against the known DGP ground truth, not holdout — which is the right comparison when you want to assess bias rather than aggregate prediction error. Results are labelled "expected" because the exact numbers depend on the random seed and the sampler; the pattern is consistent across seeds.
317
+ [All Burning Cost libraries](https://burning-cost.github.io)
277
318
 
278
- The shrinkage diagnostic confirms the theoretical prediction: credibility factors are strongly positively correlated with log(occupation exposure), meaning thin occupations receive heavier shrinkage toward the grand mean than thick ones — automatically, without any manual specification of which segments are reliable.
279
-
280
- Variance component recovery: the estimated sigma parameters are expected to be in the right ballpark of the true values (0.35 and 0.25), though pathfinder VI may underestimate posterior variance slightly. Use NUTS for production rate tables.
281
-
282
- Run `notebooks/benchmark.py` on Databricks to reproduce.
319
+ ---
283
320
 
284
321
  ## References
285
322
 
286
- 1. Bühlmann, H. (1967). Experience rating and credibility. *ASTIN Bulletin*, 4(3), 199–207.
323
+ 1. Bühlmann, H. (1967). Experience rating and credibility. *ASTIN Bulletin*, 4(3), 199-207.
287
324
  2. Gelman et al. (2013). *Bayesian Data Analysis*, 3rd ed. Chapter 5.
288
325
  3. Ohlsson, E. (2008). Combining generalised linear models and credibility models. *Scandinavian Actuarial Journal*.
289
326
  4. Krapu et al. (2023). Flexible hierarchical risk modeling for large insurance data via NumPyro. *arXiv:2312.07432*.
290
327
  5. Twiecki, T. (2017). Why hierarchical models are awesome, tricky, and Bayesian. twiecki.io.
291
-
292
- ## Related Libraries
293
-
294
- | Library | What it does |
295
- |---------|-------------|
296
- | [insurance-spatial](https://github.com/burning-cost/insurance-spatial) | BYM2 spatial territory ratemaking — applies the same partial-pooling logic to geographic factors via ICAR structure |
297
- | [insurance-credibility](https://github.com/burning-cost/insurance-credibility) | Bühlmann-Straub credibility — the closed-form special case of this library under Normal-Normal conjugacy |
298
- | [insurance-thin-data](https://github.com/burning-cost/insurance-thin-data) | Transfer learning for sparse segments — complementary approach when source-target transfer is more appropriate than hierarchical pooling |
299
-
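
The README text in this diff describes partial pooling as a data-weighted blend of the cell mean and the grand mean, with the credibility factor approaching 1 for thick cells. Below is a minimal sketch of that blend in the classical Bühlmann-Straub form `Z = w / (w + k)`. The function name `credibility_blend`, the value of `k`, and the sample frequencies are illustrative assumptions, not part of the bayesian-pricing API.

```python
def credibility_blend(cell_mean, grand_mean, exposure, k):
    """Return (estimate, Z) where estimate = Z * cell_mean + (1 - Z) * grand_mean.

    Z = exposure / (exposure + k) is the Buhlmann-Straub credibility factor.
    `k` (ratio of within-cell to between-cell variance) is assumed known here.
    """
    z = exposure / (exposure + k)
    return z * cell_mean + (1.0 - z) * grand_mean, z


# Hypothetical inputs: both cells observe the same raw frequency of 0.20,
# against a grand mean of 0.10, but with very different exposure.
grand_mean = 0.10
k = 200.0  # assumed value, chosen only for illustration

thin_estimate, thin_z = credibility_blend(0.20, grand_mean, exposure=20.0, k=k)
thick_estimate, thick_z = credibility_blend(0.20, grand_mean, exposure=800.0, k=k)

print(f"thin cell:  Z={thin_z:.3f}, estimate={thin_estimate:.4f}")
print(f"thick cell: Z={thick_z:.3f}, estimate={thick_estimate:.4f}")
```

With identical raw frequencies, the thin cell is pulled most of the way to the grand mean while the thick cell stays close to its own experience. The hierarchical models in the package are described as generalising this idea to Poisson and Gamma likelihoods, where no such closed form is available.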
@@ -1,19 +1,40 @@
1
1
  # bayesian-pricing
2
2
 
3
- [![Tests](https://github.com/burning-cost/bayesian-pricing/actions/workflows/tests.yml/badge.svg)](https://github.com/burning-cost/bayesian-pricing/actions/workflows/tests.yml)
4
3
  [![PyPI](https://img.shields.io/pypi/v/bayesian-pricing)](https://pypi.org/project/bayesian-pricing/)
5
- ![Python](https://img.shields.io/badge/python-3.10%2B-blue)
4
+ [![Python](https://img.shields.io/pypi/pyversions/bayesian-pricing)](https://pypi.org/project/bayesian-pricing/)
5
+ [![Tests](https://img.shields.io/badge/tests-passing-brightgreen)]()
6
+ [![License](https://img.shields.io/badge/license-MIT-green)]()
6
7
 
7
- Hierarchical Bayesian models for insurance pricing thin-data segments.
8
+ Hierarchical Bayesian models for insurance pricing thin-data segments — when your rating grid has more cells than your book has claims, partial pooling gives you credible estimates where every other method gives you noise or nothing.
9
+
10
+ ---
11
+
12
+ ## Why bother
13
+
14
+ Benchmarked against raw segment estimates (observed claims / exposure, no shrinkage) on synthetic UK motor data with a known DGP: 20 occupation classes crossed with 3 vehicle groups (60 rating cells), ranging from 20 to 800 policy-years of exposure.
15
+
16
+ | Segment type | RMSE — raw estimates | RMSE — Bayesian | Improvement |
17
+ |---|---|---|---|
18
+ | Thin occupations (20-50 py) | Higher | Lower | Typically 20-40% |
19
+ | Thick occupations (300-800 py) | Lower | Comparable | Small or neutral |
20
+ | All segments combined | — | — | Typically 10-25% |
21
+
22
+ RMSE is measured against the known DGP ground truth. Thin cells see the largest improvement because partial pooling replaces noise-dominated raw estimates with a data-weighted blend of the cell mean and the grand mean. Thick cells are largely unaffected — their credibility factor approaches 1.
23
+
24
+ ---
25
+
26
+ [Run on Databricks](https://github.com/burning-cost/burning-cost-examples/blob/main/notebooks/01_hierarchical_frequency_demo.py)
27
+
28
+ ---
8
29
 
9
30
  ## The problem
10
31
 
11
- UK personal lines rating operates on multi-dimensional grids. A typical motor model has driver age × NCD × vehicle group × postcode area × occupation. That is potentially 4.5 million rating cells. With 1 million policies, most cells are either empty or contain fewer than 30 observations.
32
+ UK personal lines rating operates on multi-dimensional grids. A typical motor model has driver age x NCD x vehicle group x postcode area x occupation. That is potentially 4.5 million rating cells. With 1 million policies, most cells are either empty or contain fewer than 30 observations.
12
33
 
13
34
  Standard approaches all fail at thin cells:
14
35
 
15
36
  - **Saturated GLM**: one coefficient per cell. Overfits noise. A cell with 3 claims gets a relativity of 3/expected, which is meaningless.
16
- - **Main-effects GLM**: forces multiplicativity. A young driver in a sports car has a rate exactly equal to young-driver-relativity × sports-car-relativity. Reality is super-multiplicative and the model cannot detect it.
37
+ - **Main-effects GLM**: forces multiplicativity. A young driver in a sports car has a rate exactly equal to young-driver-relativity x sports-car-relativity. Reality is super-multiplicative and the model cannot detect it.
17
38
  - **Ridge/LASSO GLM**: uniform regularisation regardless of exposure. A cell with 5,000 policy-years gets the same shrinkage as one with 20 policy-years. Wrong.
18
39
  - **GBM with min_data_in_leaf**: refuses to split on thin cells. Cannot borrow strength from related cells. No calibrated uncertainty.
19
40
 
@@ -21,6 +42,8 @@ The correct answer is **partial pooling**: thin segments borrow strength from re
21
42
 
22
43
  Under Normal-Normal conjugacy, partial pooling is exactly Bühlmann-Straub credibility. This library generalises it to Poisson (frequency) and Gamma (severity) likelihoods, with multiple crossed random effects.
23
44
 
45
+ ---
46
+
24
47
  ## Install
25
48
 
26
49
  ```bash
@@ -37,6 +60,8 @@ pip install "bayesian-pricing[numpyro]"
37
60
  uv add "bayesian-pricing[numpyro]"
38
61
  ```
39
62
 
63
+ ---
64
+
40
65
  ## Usage
41
66
 
42
67
  Input is segment-level sufficient statistics — one row per rating cell, with exposure and claim count. This is the practical production design: aggregate your book to rating cells first, then run the model. A book with 500k policies typically has 5,000–20,000 non-empty rating cells. The model operates on those cells, making NUTS feasible on a standard machine.
@@ -74,9 +99,10 @@ model.fit(df, claim_count_col="claims", exposure_col="exposure", sampler_config=
74
99
  # Posterior predictive means for each segment
75
100
  preds = model.predict()
76
101
  print(preds)
102
+ # Output is illustrative — exact values depend on your data and sampler seed:
77
103
  # veh_group age_band mean p5 p50 p95 credibility_factor
78
104
  # Supermini 17-21 0.1234 0.0812 0.1201 0.1731 0.38
79
- # Sports 17-21 0.1891 0.1102 0.1845 0.2881 0.21 ← thin
105
+ # Sports 17-21 0.1891 0.1102 0.1845 0.2881 0.21 <- thin
80
106
  # ...
81
107
 
82
108
  # Variance components: how much does each factor drive frequency?
@@ -94,19 +120,19 @@ tables = rel.relativities()
94
120
  # Single factor in rate-table format
95
121
  veh_table = rel.relativities(factor="veh_group")
96
122
  print(veh_table.table)
97
- # level relativity lower_90pct upper_90pct credibility_factor interval_width
98
- # Sports 1.524 1.234 1.891 0.71 0.657
99
- # Saloon 1.000 0.921 1.082 0.94 0.161
100
- # Supermini 0.819 0.764 0.881 0.89 0.117
123
+ # level relativity lower_90pct upper_90pct uncertainty_reduction interval_width
124
+ # Sports 1.524 1.234 1.891 0.71 0.657
125
+ # Saloon 1.000 0.921 1.082 0.94 0.161
126
+ # Supermini 0.819 0.764 0.881 0.89 0.117
101
127
 
102
128
  # Identify thin segments that need manual review
103
129
  thin = rel.thin_segments(credibility_threshold=0.3)
104
130
  print(thin)
105
- # factor level credibility_factor relativity
106
- # veh_group Sports-17-21 0.18 1.84 ← sparse cell, wide CI
131
+ # factor level uncertainty_reduction relativity
132
+ # veh_group Sports-17-21 0.18 1.84 <- sparse cell, wide CI
107
133
 
108
134
  # Export for Excel / rate system import
109
- summary_df = rel.summary() # long format: factor, level, relativity, CI, credibility
135
+ summary_df = rel.summary() # long format: factor, level, relativity, CI, uncertainty_reduction
110
136
  summary_df.write_csv("bayesian_relativities.csv")
111
137
  ```
112
138
 
@@ -175,11 +201,33 @@ ppc = posterior_predictive_check(model, claim_count_col="claims")
175
201
  | Method | When to use | Speed | Accuracy |
176
202
  |---|---|---|---|
177
203
  | `SamplerConfig(method="pathfinder")` | Model development, prior sensitivity | Minutes | Good approximation |
178
- | `SamplerConfig(method="nuts")` | Final production estimates | 20–60 min | Exact (asymptotically) |
204
+ | `SamplerConfig(method="nuts")` | Final production estimates | 20-60 min | Exact (asymptotically) |
179
205
  | `SamplerConfig(nuts_sampler="numpyro")` | Large portfolios, GPU available | Fast on GPU | Exact |
180
206
 
181
207
  For portfolios with more than 50k rating cells, consider the two-stage approach: fit a GBM on the full book, extract segment-level residuals, then run the Bayesian model on the residuals. The GBM captures dense-cell signal; the Bayesian model handles thin-cell pooling.
182
208
 
209
+ ---
210
+
211
+ ## Performance
212
+
213
+ Benchmarked against **raw segment estimates** (observed claims / exposure per cell, no shrinkage) on synthetic UK motor data with a known data-generating process: 20 occupation classes crossed with 3 vehicle groups (60 rating cells), with occupations ranging from 20 to 800 policy-years of exposure. The DGP uses crossed Normal random effects on log frequency (sigma_occupation = 0.35, sigma_veh_group = 0.25), producing a realistic mix of thin and thick cells.
214
+
215
+ | Segment type | RMSE — raw | RMSE — Bayesian | Expected improvement |
216
+ |---|---|---|---|
217
+ | Thin occupations (20-50 py) | higher | lower | typically 20-40% |
218
+ | Thick occupations (300-800 py) | lower | comparable | small or neutral |
219
+ | All segments | — | — | typically 10-25% |
220
+
221
+ RMSE is measured against the known DGP ground truth, not holdout — which is the right comparison when you want to assess bias rather than aggregate prediction error. Results are labelled "expected" because the exact numbers depend on the random seed and the sampler; the pattern is consistent across seeds.
222
+
223
+ The shrinkage diagnostic confirms the theoretical prediction: credibility factors are strongly positively correlated with log(occupation exposure), meaning thin occupations receive heavier shrinkage toward the grand mean than thick ones — automatically, without any manual specification of which segments are reliable.
224
+
225
+ Variance component recovery: the estimated sigma parameters are expected to be in the right ballpark of the true values (0.35 and 0.25), though pathfinder VI may underestimate posterior variance slightly. Use NUTS for production rate tables.
226
+
227
+ Run `notebooks/benchmark.py` on Databricks to reproduce.
228
+
229
+ ---
230
+
183
231
  ## Relationship to Bühlmann-Straub credibility
184
232
 
185
233
  The Bühlmann-Straub credibility premium is the exact posterior mean of a hierarchical model under Normal-Normal conjugacy. This library generalises that result:
@@ -194,6 +242,8 @@ The Bühlmann-Straub credibility premium is the exact posterior mean of a hierar
194
242
 
195
243
  For single-factor pricing with many groups (e.g., scheme pricing), Bühlmann-Straub is computationally trivial and entirely adequate. Use this library when you need multiple crossed random effects, non-Normal likelihoods, or full posterior uncertainty.
196
244
 
245
+ ---
246
+
197
247
  ## Design decisions
198
248
 
199
249
  **Non-centered parameterisation throughout.** The centered version (`u_i ~ Normal(0, sigma)`) creates funnel geometry in the posterior when sigma is small — which is exactly the case for well-regularised insurance models. HMC cannot traverse the funnel efficiently. The non-centered version decouples the raw offsets from the scale and eliminates this problem. See Twiecki (2017) for the clearest exposition.
@@ -206,14 +256,13 @@ For single-factor pricing with many groups (e.g., scheme pricing), Bühlmann-Str
206
256
 
207
257
  **PyMC optional.** The library parses and validates data without PyMC. Tests for the data layer run in CI without it. This makes the library usable in environments where PyMC is hard to install.
208
258
 
259
+ ---
260
+
209
261
  ## Read more
210
262
 
211
263
  [Partial Pooling for Thin Rating Cells](https://burning-cost.github.io/2026/03/06/bayesian-hierarchical-models-for-thin-data-pricing.html) — why every other approach fails thin segments and how hierarchical Bayesian models solve it.
212
264
 
213
-
214
- ## Databricks Notebook
215
-
216
- A ready-to-run Databricks notebook benchmarking this library against standard approaches is available in [burning-cost-examples](https://github.com/burning-cost/burning-cost-examples/blob/main/notebooks/bayesian_pricing_demo.py).
265
+ ---
217
266
 
218
267
  ## Related libraries
219
268
 
@@ -223,40 +272,17 @@ A ready-to-run Databricks notebook benchmarking this library against standard ap
223
272
  | [insurance-interactions](https://github.com/burning-cost/insurance-interactions) | Automated detection of missing GLM interactions — the complementary question to thin-cell regularisation |
224
273
  | [insurance-datasets](https://github.com/burning-cost/insurance-datasets) | Synthetic UK motor and home datasets with known DGPs — useful for validating that the model recovers true parameters |
225
274
  | [insurance-monitoring](https://github.com/burning-cost/insurance-monitoring) | Model monitoring: PSI and A/E drift detection for tracking when the deployed model needs a refit |
275
+ | [insurance-spatial](https://github.com/burning-cost/insurance-spatial) | BYM2 spatial territory ratemaking — applies the same partial-pooling logic to geographic factors via ICAR structure |
276
+ | [insurance-thin-data](https://github.com/burning-cost/insurance-thin-data) | Transfer learning for sparse segments — complementary approach when source-target transfer is more appropriate than hierarchical pooling |
226
277
 
227
- [All Burning Cost libraries](https://burning-cost.github.io)
228
-
229
- ## Performance
230
-
231
- Benchmarked against **raw segment estimates** (observed claims / exposure per cell, no shrinkage) on synthetic UK motor data with a known data-generating process: 20 occupation classes crossed with 3 vehicle groups (60 rating cells), with occupations ranging from 20 to 800 policy-years of exposure. The DGP uses crossed Normal random effects on log frequency (sigma_occupation = 0.35, sigma_veh_group = 0.25), producing a realistic mix of thin and thick cells.
232
-
233
- | Segment type | RMSE — raw | RMSE — Bayesian | Expected improvement |
234
- |---|---|---|---|
235
- | Thin occupations (20–50 py) | higher | lower | typically 20–40% |
236
- | Thick occupations (300–800 py) | lower | comparable | small or neutral |
237
- | All segments | — | — | typically 10–25% |
238
-
239
- RMSE is measured against the known DGP ground truth, not holdout — which is the right comparison when you want to assess bias rather than aggregate prediction error. Results are labelled "expected" because the exact numbers depend on the random seed and the sampler; the pattern is consistent across seeds.
278
+ [All Burning Cost libraries](https://burning-cost.github.io)
240
279
 
241
- The shrinkage diagnostic confirms the theoretical prediction: credibility factors are strongly positively correlated with log(occupation exposure), meaning thin occupations receive heavier shrinkage toward the grand mean than thick ones — automatically, without any manual specification of which segments are reliable.
242
-
243
- Variance component recovery: the estimated sigma parameters are expected to be in the right ballpark of the true values (0.35 and 0.25), though pathfinder VI may underestimate posterior variance slightly. Use NUTS for production rate tables.
244
-
245
- Run `notebooks/benchmark.py` on Databricks to reproduce.
280
+ ---
246
281
 
247
282
  ## References
248
283
 
249
- 1. Bühlmann, H. (1967). Experience rating and credibility. *ASTIN Bulletin*, 4(3), 199–207.
284
+ 1. Bühlmann, H. (1967). Experience rating and credibility. *ASTIN Bulletin*, 4(3), 199-207.
250
285
  2. Gelman et al. (2013). *Bayesian Data Analysis*, 3rd ed. Chapter 5.
251
286
  3. Ohlsson, E. (2008). Combining generalised linear models and credibility models. *Scandinavian Actuarial Journal*.
252
287
  4. Krapu et al. (2023). Flexible hierarchical risk modeling for large insurance data via NumPyro. *arXiv:2312.07432*.
253
288
  5. Twiecki, T. (2017). Why hierarchical models are awesome, tricky, and Bayesian. twiecki.io.
254
-
255
- ## Related Libraries
256
-
257
- | Library | What it does |
258
- |---------|-------------|
259
- | [insurance-spatial](https://github.com/burning-cost/insurance-spatial) | BYM2 spatial territory ratemaking — applies the same partial-pooling logic to geographic factors via ICAR structure |
260
- | [insurance-credibility](https://github.com/burning-cost/insurance-credibility) | Bühlmann-Straub credibility — the closed-form special case of this library under Normal-Normal conjugacy |
261
- | [insurance-thin-data](https://github.com/burning-cost/insurance-thin-data) | Transfer learning for sparse segments — complementary approach when source-target transfer is more appropriate than hierarchical pooling |
262
-
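
The Performance section in this diff measures RMSE against a known data-generating process rather than a holdout set. The standard-library sketch below shows what that comparison looks like for thin cells only. It is a simplified stand-in, not the package's benchmark: it uses a fixed-`k` credibility blend instead of the hierarchical Poisson model, a single random effect instead of crossed effects, and an assumed base frequency. `sigma=0.35` and the 20 policy-year exposure come from the README text; `base_rate`, `k`, `n_cells`, and all function names are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate (Knuth's method; fine for small means)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_thin_cells(n_cells=2000, exposure=20.0, base_rate=0.10,
                        sigma=0.35, k=70.0, seed=0):
    """Return (rmse_raw, rmse_shrunk), both measured against the true rates."""
    rng = random.Random(seed)
    # Known DGP: log-normal random effect on frequency, Poisson claim counts.
    true_rates = [base_rate * math.exp(rng.gauss(0.0, sigma)) for _ in range(n_cells)]
    claims = [sample_poisson(rate * exposure, rng) for rate in true_rates]

    raw = [c / exposure for c in claims]             # no shrinkage
    grand_mean = sum(claims) / (n_cells * exposure)  # pooled frequency
    z = exposure / (exposure + k)                    # assumed fixed credibility
    shrunk = [z * r + (1.0 - z) * grand_mean for r in raw]

    def rmse(estimates):
        squared = sum((e - t) ** 2 for e, t in zip(estimates, true_rates))
        return math.sqrt(squared / n_cells)

    return rmse(raw), rmse(shrunk)


rmse_raw, rmse_shrunk = simulate_thin_cells()
print(f"raw RMSE={rmse_raw:.4f}  shrunk RMSE={rmse_shrunk:.4f}")
```

Because every cell here has the same small exposure, the raw estimates are dominated by Poisson noise and the shrunk estimates should score a lower RMSE. The size of the gap in this toy setup is not comparable to the percentage ranges quoted in the README tables.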