insurance-deploy 0.1.0__tar.gz

@@ -0,0 +1,363 @@
+ Metadata-Version: 2.4
+ Name: insurance-deploy
+ Version: 0.1.0
+ Summary: Champion/challenger pricing framework for UK insurance: model registry, quote routing, ENBP audit logging, and statistical promotion tests
+ Project-URL: Homepage, https://github.com/burning-cost/insurance-deploy
+ Project-URL: Repository, https://github.com/burning-cost/insurance-deploy
+ License: MIT
+ Keywords: ENBP,FCA,UK-insurance,actuarial,champion-challenger,insurance,loss-ratio,model-registry,pricing
+ Classifier: Development Status :: 3 - Alpha
+ Classifier: Intended Audience :: Financial and Insurance Industry
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Topic :: Scientific/Engineering :: Mathematics
+ Requires-Python: >=3.10
+ Requires-Dist: joblib>=1.2
+ Requires-Dist: numpy>=1.24
+ Requires-Dist: pandas>=2.0
+ Requires-Dist: scipy>=1.10
+ Provides-Extra: dev
+ Requires-Dist: pytest-cov; extra == 'dev'
+ Requires-Dist: pytest>=7.0; extra == 'dev'
+ Description-Content-Type: text/markdown
+
+ # insurance-deploy
+
+ Champion/challenger pricing framework for UK insurance: model registry, quote routing, ENBP audit logging, and statistical promotion tests.
+
+ ---
+
+ ## The problem
+
+ You've built a better pricing model: CatBoost instead of a GLM, or an updated GLM with two more years of data and a rebuilt rating factor for NCB. The model validates well on holdout data. Your actuarial team wants to deploy it.
+
+ The problem is everything after model training.
+
+ How do you run the challenger alongside the champion without disrupting live pricing? How do you log which model priced each quote (per quote, per policy, permanently) so you can run the FCA-required ENBP audit? How do you know when you have enough data to make a statistically credible promotion decision, rather than guessing after three months on a sample too small to tell you anything?
+
+ Every UK pricing team faces this. Most solve it with ad-hoc scripts, spreadsheet logs, and informal sign-off. This library provides the infrastructure.
+
+ ---
+
+ ## Regulatory context
+
+ **ICOBS 6B.2.51R** (the ENBP rules, effective January 2022) requires firms to maintain written records demonstrating that renewal prices do not exceed the Equivalent New Business Price (ENBP) for identical risk profiles.
+
+ The FCA's 2023 multi-firm review found 83% of firms non-compliant with the record-keeping requirements. Most lacked records granular enough for the SMF holder to sign the annual attestation.
+
+ When a pricing model changes mid-year, you must be able to demonstrate which model priced each renewal and that the model change did not introduce tenure discrimination. That requires a per-quote model version log. This library is that log.
+
+ **FCA Consumer Duty (PRIN 2A)** creates a risk for live A/B pricing. Charging two customers with identical profiles different prices at the same time could be challenged as inconsistent with fair value obligations. Shadow mode (the default) eliminates this risk: the challenger scores in parallel, but the customer always sees the champion price.
+
+ ---
+
+ ## What this library does
+
+ Six modules:
+
+ | Module | Contents |
+ |--------|----------|
+ | `insurance_deploy.registry` | `ModelRegistry`: append-only, version-tagged model store with hash verification |
+ | `insurance_deploy.experiment` | `Experiment`: deterministic hash-based routing, shadow and live modes |
+ | `insurance_deploy.logger` | `QuoteLogger`: append-only SQLite audit log with ENBP compliance flagging |
+ | `insurance_deploy.kpi` | `KPITracker`: hit rate, GWP, loss ratio, frequency, power analysis |
+ | `insurance_deploy.comparison` | `ModelComparison`: bootstrap LR test, z-test on hit rate, Poisson frequency test |
+ | `insurance_deploy.audit` | `ENBPAuditReport`: ICOBS 6B.2.51R compliance report in Markdown |
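
To make "append-only store with hash verification" concrete, here is a minimal sketch of the idea using only the standard library. It is not `ModelRegistry`'s implementation: the function names, the file layout, and the use of `pickle` (the package itself depends on joblib) are all assumptions made for illustration.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def store_model(model, directory: Path, name: str, version: str) -> str:
    """Write a new version to disk and return its content hash (hypothetical helper)."""
    path = directory / f"{name}-{version}.pkl"
    if path.exists():
        # Append-only: an existing version is never overwritten.
        raise FileExistsError(f"{name} {version} is already registered")
    path.write_bytes(pickle.dumps(model))
    return file_sha256(path)


def load_model(directory: Path, name: str, version: str, expected_hash: str):
    """Load a version only if the file still matches the hash recorded at registration."""
    path = directory / f"{name}-{version}.pkl"
    if file_sha256(path) != expected_hash:
        raise ValueError(f"hash mismatch for {name} {version}")
    return pickle.loads(path.read_bytes())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # A dict stands in for a fitted model object.
        digest = store_model({"intercept": 5.9}, root, "motor", "2.0")
        print(load_model(root, "motor", "2.0", digest))
```

Recording the hash at registration and checking it at load time is what lets an auditor confirm that the artefact priced against today is byte-for-byte the one that was registered.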
+
+ ---
+
+ ## Install
+
+ ```bash
+ pip install insurance-deploy
+ ```
+
+ Dependencies: NumPy, SciPy, Pandas, joblib.
+
+ ---
+
+ ## Quick start
+
+ ### Register models
+
+ ```python
+ from insurance_deploy import ModelRegistry
+
+ registry = ModelRegistry("./registry")
+
+ # Register current champion
+ champion_mv = registry.register(
+     champion_model,  # any sklearn-compatible object with .predict()
+     name="motor",
+     version="2.0",
+     metadata={
+         "training_date": "2024-01-01",
+         "features": ["age", "ncd", "postcode_band"],
+         "holdout_gini": 0.42,
+     }
+ )
+
+ # Register challenger
+ challenger_mv = registry.register(
+     challenger_model,
+     name="motor",
+     version="3.0",
+     metadata={
+         "training_date": "2024-07-01",
+         "features": ["age", "ncd", "postcode_band", "vehicle_value"],
+         "holdout_gini": 0.45,
+     }
+ )
+ ```
+
+ ### Set up the experiment
+
+ ```python
+ from insurance_deploy import Experiment
+
+ exp = Experiment(
+     name="motor_v3_vs_v2",
+     champion=champion_mv,
+     challenger=challenger_mv,
+     challenger_pct=0.10,  # 10% of policies to challenger
+     mode="shadow",  # Default. Challenger scores but does not price.
+ )
+ ```
+
+ ### Route quotes and log
+
+ ```python
+ from datetime import date
+
+ from insurance_deploy import QuoteLogger
+
+ logger = QuoteLogger("./quotes.db")
+
+ # In your quote handler:
+ def handle_quote(policy_id, inputs, renewal_flag=False, enbp=None):
+     arm = exp.route(policy_id)
+
+     # Champion always prices in shadow mode
+     champion_price = champion_mv.predict([inputs])[0]
+
+     # Challenger scores in shadow; output logged, not shown to customer
+     challenger_price = challenger_mv.predict([inputs])[0]
+
+     # Log champion quote (the one the customer sees)
+     logger.log_quote(
+         policy_id=policy_id,
+         experiment_name=exp.name,
+         arm=arm,
+         model_version=champion_mv.version_id,  # champion always prices
+         quoted_price=champion_price,
+         enbp=enbp,  # Provide for renewals; you calculate this, not us
+         renewal_flag=renewal_flag,
+     )
+
+     return champion_price
+
+ # When policy binds:
+ logger.log_bind("POL-12345", bound_price=425.0)
+
+ # When claim is reported (log at each development stage):
+ logger.log_claim("POL-12345", claim_date=date(2024, 8, 1),
+                  claim_amount=1200.0, development_month=3)
+ # Update at 12 months:
+ logger.log_claim("POL-12345", claim_date=date(2024, 8, 1),
+                  claim_amount=1450.0, development_month=12)
+ ```
+
+ ### Track KPIs
+
+ ```python
+ from insurance_deploy import KPITracker
+
+ tracker = KPITracker(logger)
+
+ # Immediately available
+ print(tracker.hit_rate("motor_v3_vs_v2"))
+ # {'champion': {'quoted': 900, 'bound': 270, 'hit_rate': 0.30},
+ #  'challenger': {'quoted': 100, 'bound': 28, 'hit_rate': 0.28}}
+
+ print(tracker.gwp("motor_v3_vs_v2"))
+ # {'champion': {'bound_policies': 270, 'total_gwp': 108000.0, 'mean_gwp': 400.0},
+ #  'challenger': {'bound_policies': 28, 'total_gwp': 11480.0, 'mean_gwp': 410.0}}
+
+ # After 12 months development
+ lr = tracker.loss_ratio("motor_v3_vs_v2", development_months=12)
+ print(lr)
+ # {'champion': {'loss_ratio': 0.64, 'policy_count': 270, ...},
+ #  'challenger': {'loss_ratio': 0.61, 'policy_count': 28, ...}}
+
+ # Power analysis: how long until we can decide?
+ pa = tracker.power_analysis("motor_v3_vs_v2", target_delta_lr=0.03)
+ print(f"Months to LR significance (incl. 12m development): "
+       f"{pa['lr_total_months_with_development']:.0f}")
+ # Months to LR significance (incl. 12m development): 28
+ ```
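
The hit-rate counts in the example output above (270 of 900 versus 28 of 100) can be checked by hand with a pooled two-proportion z-test. This is a textbook sketch with a hypothetical function name, assuming a two-sided test; it is not necessarily what the library computes internally.

```python
import math


def two_proportion_z_test(bound_a, quoted_a, bound_b, quoted_b):
    """Pooled two-sided z-test for a difference in hit rate between arm A and arm B."""
    rate_a = bound_a / quoted_a
    rate_b = bound_b / quoted_b
    pooled = (bound_a + bound_b) / (quoted_a + quoted_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / quoted_a + 1 / quoted_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Champion: 270 bound of 900 quoted. Challenger: 28 bound of 100 quoted.
z, p_value = two_proportion_z_test(270, 900, 28, 100)
print(f"z = {z:.3f}, p = {p_value:.3f}")
```

With only 100 challenger quotes, a two-point gap in hit rate is well within noise, which is why the sample-size question matters before anyone reads meaning into early KPIs.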
+
+ ### Statistical comparison
+
+ ```python
+ from insurance_deploy import ModelComparison
+
+ comp = ModelComparison(tracker)
+
+ # Bootstrap loss ratio test (requires 12m+ development)
+ result = comp.bootstrap_lr_test("motor_v3_vs_v2", n_bootstrap=10_000, seed=42)
+ print(result.summary())
+ # Test: bootstrap_lr_test | Experiment: motor_v3_vs_v2
+ # Champion estimate: 0.6402 (n=270)
+ # Challenger estimate: 0.6118 (n=28)
+ # Difference (challenger - champion): -0.0284
+ # 95% CI: [-0.0751, 0.0183]
+ # p-value: 0.2341
+ #
+ # Conclusion: INSUFFICIENT_EVIDENCE
+ # Recommendation: No statistically significant difference detected in loss_ratio
+ # (p=0.234 >= 0.05). Continue experiment. Consider running power_analysis()
+ # to estimate time to significance.
+
+ # Hit rate test (available earlier)
+ hr_result = comp.hit_rate_test("motor_v3_vs_v2")
+ ```
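
For readers unfamiliar with the technique, the sketch below shows one common way to bootstrap a loss ratio difference: resample policies with replacement within each arm, recompute each arm's loss ratio, and read a percentile interval off the resampled differences. The function names and the synthetic portfolio are invented for illustration, and `bootstrap_lr_test` may differ in its details.

```python
import random


def loss_ratio(policies):
    """Total claims divided by total premium for a list of (premium, claims) pairs."""
    total_premium = sum(premium for premium, _ in policies)
    total_claims = sum(claims for _, claims in policies)
    return total_claims / total_premium


def bootstrap_lr_difference(champion, challenger, n_bootstrap=2000, seed=42):
    """Point estimate and 95% percentile interval for LR(challenger) - LR(champion)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_bootstrap):
        # Resample whole policies so premium and claims stay paired.
        champ_sample = rng.choices(champion, k=len(champion))
        chall_sample = rng.choices(challenger, k=len(challenger))
        diffs.append(loss_ratio(chall_sample) - loss_ratio(champ_sample))
    diffs.sort()
    lower = diffs[int(0.025 * n_bootstrap)]
    upper = diffs[int(0.975 * n_bootstrap) - 1]
    return loss_ratio(challenger) - loss_ratio(champion), lower, upper


def synthetic_policies(n, claim_probability, rng):
    """Made-up portfolio: flat 400 premium, occasional claim between 1,000 and 4,000."""
    return [
        (400.0, rng.uniform(1000.0, 4000.0) if rng.random() < claim_probability else 0.0)
        for _ in range(n)
    ]


data_rng = random.Random(0)
champion = synthetic_policies(270, 0.10, data_rng)
challenger = synthetic_policies(28, 0.10, data_rng)
point, lower, upper = bootstrap_lr_difference(champion, challenger)
print(f"difference = {point:+.3f}, 95% CI = [{lower:+.3f}, {upper:+.3f}]")
```

Resampling at policy level keeps each policy's premium and claims together, so the heavy-tailed claim distribution drives the width of the interval rather than a normality assumption.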
+
+ ### ENBP audit report
+
+ ```python
+ from insurance_deploy import ENBPAuditReport
+
+ reporter = ENBPAuditReport(logger)
+ md = reporter.generate(
+     "motor_v3_vs_v2",
+     period_start="2024-01-01",
+     period_end="2024-12-31",
+     firm_name="Acme Insurance Ltd",
+     smf_holder="Jane Smith",
+ )
+ print(md)  # Markdown: paste into attestation pack or Databricks notebook
+ ```
+
+ ---
+
+ ## Shadow mode vs live mode
+
+ Shadow mode is the default and is right for most use cases.
+
+ **Shadow mode:** the champion prices every quote. The challenger runs on identical inputs; its output is logged but not shown to the customer. This avoids the fair value risk of charging identical customers different prices, and the model quality comparison is clean, with no adverse selection confound.
+
+ **Live mode:** the challenger prices its routed fraction of policies (10% by default). This enables market response testing (does the challenger's different pricing affect conversion?) but raises FCA Consumer Duty fair value questions. It also introduces adverse selection bias: if the challenger prices differently, the bound cohorts will have different risk profiles, making the loss ratio comparison harder to interpret.
+
+ ```python
+ # Live mode: get legal sign-off first
+ import warnings
+ with warnings.catch_warnings():
+     warnings.simplefilter("ignore")  # suppress the FCA warning if you've taken legal advice
+     exp = Experiment(
+         name="live_test",
+         champion=champion_mv,
+         challenger=challenger_mv,
+         mode="live",
+     )
+ ```
+
+ The library is opinionated here. The warning is intentional. Suppress it when you've done the legal groundwork, not before.
+
+ ---
+
+ ## Routing determinism
+
+ Routing computes SHA-256 over `policy_id + experiment_name`, interprets the last 8 hex characters of the digest as an integer, and takes it modulo 100. If the result is less than `challenger_pct * 100`, the policy is routed to the challenger; otherwise it goes to the champion.
+
+ This is deterministic and stateless. Given a policy_id and experiment name, the routing decision is always the same and can be recomputed independently at any point. No database of assignments is required. Any assignment can be verified from first principles.
+
+ Assignment is by policy, not by quote. A policy routed to challenger on first quote will always be routed to challenger within this experiment. This is required for ENBP audit integrity.
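
The routing rule described above can be re-implemented in a few lines, which is what makes an assignment independently verifiable. The sketch below assumes plain string concatenation with no separator and UTF-8 encoding, neither of which the text specifies, so treat it as an illustration rather than a drop-in replacement for `Experiment.route`.

```python
import hashlib


def route(policy_id: str, experiment_name: str, challenger_pct: float = 0.10) -> str:
    """Deterministically assign a policy to 'champion' or 'challenger'."""
    # Assumed: the two strings are concatenated directly and encoded as UTF-8.
    key = (policy_id + experiment_name).encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Last 8 hex characters as an integer, reduced to a bucket in 0..99.
    bucket = int(digest[-8:], 16) % 100
    # Round to a whole number of buckets to avoid float artefacts (0.07 * 100 is not exactly 7).
    threshold = round(challenger_pct * 100)
    return "challenger" if bucket < threshold else "champion"


# The same policy always lands in the same arm, and roughly 10% go to the challenger.
arms = [route(f"POL-{i}", "motor_v3_vs_v2") for i in range(10_000)]
print(route("POL-12345", "motor_v3_vs_v2") == route("POL-12345", "motor_v3_vs_v2"))
print(arms.count("challenger") / len(arms))
```

Because the experiment name is part of the hash input, the same policy can fall into different arms in different experiments, while staying fixed within any one of them.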
+
+ ---
+
+ ## Why loss ratio significance takes 18 months
+
+ At 10% challenger split with 3,000 bound policies/month total:
+ - Challenger receives ~300 policies/month
+ - Hit rate significance (2pp delta, 80% power): ~5 months
+ - Claim frequency significance (0.5pp delta): ~10 months
+ - **Developed loss ratio significance (3pp delta, 12-month development): ~17 months of quoting to reach the required sample, plus 12 months of claims development, so ~29 months in total from experiment start**
+
+ This is not a limitation of the library. It is physics. LR has a 12-36 month reward tail. Any framework claiming to optimise on LR signal faster than this using bandits or similar methods is either using a proxy metric (hit rate) or lying.
+
+ The `power_analysis()` method makes this timeline explicit. Run it before starting an experiment so your stakeholders have realistic expectations.
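
As a rough cross-check of the hit-rate figure, the standard two-proportion sample size formula can be applied with unequal allocation. Every input below is an assumption layered on top of the text: a 30% baseline hit rate (borrowed from the quick-start output), nine champion quotes per challenger quote, a two-sided 5% significance level, and about 1,000 challenger quotes per month (300 bound policies at a 30% hit rate). The library's own `power_analysis()` may use a different formula.

```python
import math
from statistics import NormalDist


def challenger_quotes_needed(p_champion, delta, ratio, alpha=0.05, power=0.80):
    """Challenger-arm quotes needed to detect a hit-rate drop of `delta`.

    `ratio` is the number of champion quotes per challenger quote (9 for a 90/10 split).
    """
    p_challenger = p_champion - delta
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    # Variance of the difference in proportions, per challenger observation.
    variance = p_champion * (1 - p_champion) / ratio + p_challenger * (1 - p_challenger)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)


quotes = challenger_quotes_needed(p_champion=0.30, delta=0.02, ratio=9)
months = math.ceil(quotes / 1000)  # assumed: ~1,000 challenger quotes per month
print(quotes, months)
```

Under these assumptions the requirement comes out at roughly 4,400 challenger quotes, or about 5 months, which is in line with the hit-rate bullet above. The loss ratio case needs a similar calculation on a much noisier metric, plus the development period.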
+
+ ---
+
+ ## Radar wrapper pattern
+
+ Most UK personal lines insurers deploy rates via WTW Radar Live. The library integrates as a governance layer around Radar:
+
+ ```python
+ import requests
+ from insurance_deploy import Experiment, QuoteLogger
+
+ # `exp`, `logger`, `champion_mv` and `challenger_mv` are set up as in the quick start.
+ # RADAR_LIVE_URL is a placeholder for your own Radar Live endpoint.
+
+ def get_quote(policy_id, risk_dict, renewal_flag=False, enbp=None):
+     # Champion = Radar Live (existing production system)
+     radar_response = requests.post(RADAR_LIVE_URL, json=risk_dict, timeout=10)
+     radar_response.raise_for_status()  # fail loudly rather than log a bad price
+     champion_price = radar_response.json()["premium"]
+
+     # Challenger = Python model (your new model)
+     arm = exp.route(policy_id)
+     challenger_price = challenger_mv.predict([risk_dict])[0]
+
+     # Log the quote (champion prices; challenger is shadow)
+     logger.log_quote(
+         policy_id=policy_id,
+         experiment_name=exp.name,
+         arm=arm,
+         model_version=champion_mv.version_id,
+         quoted_price=champion_price,
+         enbp=enbp,
+         renewal_flag=renewal_flag,
+     )
+
+     return champion_price  # customer always sees champion price
+ ```
+
+ No Radar infrastructure changes required. The library handles the governance layer.
+
+ ---
+
+ ## ENBP: what the library does and doesn't do
+
+ The library records ENBP. It does not calculate it.
+
+ ENBP calculation per ICOBS 6B is your pricing team's responsibility. You pass the ENBP value to `log_quote(enbp=...)`. The library records the value, flags whether `quoted_price <= enbp`, and includes this in the audit report.
+
+ If your ENBP calculation is wrong, the log is wrong, but that is upstream of this library's scope. The separation is intentional.
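
The flagging rule itself is simple enough to state in code. The helper below is a hypothetical illustration of the `quoted_price <= enbp` check, not the library's function; in particular, returning `None` for new business and for renewals logged without an ENBP is an assumption about how "not applicable" might be represented.

```python
from typing import Optional


def enbp_compliant(quoted_price: float, enbp: Optional[float], renewal_flag: bool) -> Optional[bool]:
    """True if a renewal is priced at or below its ENBP, False if above, None if not assessable."""
    if not renewal_flag or enbp is None:
        # Assumed convention: the check only applies to renewals with a supplied ENBP.
        return None
    return quoted_price <= enbp


print(enbp_compliant(410.0, 425.0, renewal_flag=True))   # renewal priced below ENBP
print(enbp_compliant(440.0, 425.0, renewal_flag=True))   # renewal priced above ENBP
print(enbp_compliant(440.0, None, renewal_flag=False))   # new business
```

Keeping the check this thin means the audit trail reflects exactly the ENBP figure your pricing team supplied, with no second calculation inside the logger to reconcile.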
+
+ ---
+
+ ## Databricks companion notebook
+
+ `notebooks/insurance_deploy_demo.py` demonstrates the full workflow on synthetic data:
+ - Model registry setup
+ - Experiment configuration and routing verification
+ - Quote/bind/claim data generation and logging
+ - KPI dashboard
+ - Bootstrap LR test
+ - ENBP audit report generation
+
+ Run as a Databricks Python notebook. Requires `pip install insurance-deploy`.
+
+ ---
+
+ ## Scope
+
+ This library handles: model version registry, champion/challenger routing, audit logging, KPI computation, statistical tests, ENBP compliance reports.
+
+ It does not handle: model training, rate optimisation (see `insurance-optimise`), model drift monitoring (see `insurance-monitor`), causal effect estimation (see `insurance-causal-policy`), or real-time API infrastructure.
+
+ ---
+
+ ## License
+
+ MIT