drn 0.0.1__tar.gz

drn-0.0.1/.gitignore ADDED
@@ -0,0 +1,4 @@
1
+ __pycache__
2
+ .DS_Store
3
+ build
4
+ *.egg-info
@@ -0,0 +1,7 @@
1
+ repos:
2
+   - repo: https://github.com/psf/black
3
+     rev: 8fe6270
4
+     hooks:
5
+       - id: black
6
+         language_version: python3.10
7
+
drn-0.0.1/LICENSE.md ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2024 Tian (Eric) Dong
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
drn-0.0.1/PKG-INFO ADDED
@@ -0,0 +1,516 @@
1
+ Metadata-Version: 2.4
2
+ Name: drn
3
+ Version: 0.0.1
4
+ Summary: Distributional regression modelling using PyTorch
5
+ Author-email: Eric Dong <tiandong1999@gmail.com>, Patrick Laub <patrick.laub@gmail.com>
6
+ Maintainer-email: Eric Dong <tiandong1999@gmail.com>
7
+ License-File: LICENSE.md
8
+ Classifier: License :: OSI Approved :: MIT License
9
+ Classifier: Programming Language :: Python :: 3
10
+ Requires-Python: >=3.8
11
+ Requires-Dist: matplotlib
12
+ Requires-Dist: numpy
13
+ Requires-Dist: pandas
14
+ Requires-Dist: scikit-learn
15
+ Requires-Dist: seaborn
16
+ Requires-Dist: shap
17
+ Requires-Dist: statsmodels
18
+ Requires-Dist: torch
19
+ Requires-Dist: tqdm
20
+ Provides-Extra: dev
21
+ Requires-Dist: black; extra == 'dev'
22
+ Requires-Dist: ipykernel; extra == 'dev'
23
+ Requires-Dist: isort; extra == 'dev'
24
+ Requires-Dist: jupytext; extra == 'dev'
25
+ Requires-Dist: mypy; extra == 'dev'
26
+ Requires-Dist: pre-commit; extra == 'dev'
27
+ Requires-Dist: pytest; extra == 'dev'
28
+ Description-Content-Type: text/markdown
29
+
30
+ # drn - A Python Package for Distributional Refinement Network (DRN)
31
+
32
+ ## Table of Contents
33
+ - [Overview](#overview)
34
+ - [Key Features](#key-features)
35
+ - [Installation](#installation)
36
+ - [Example: Train a DRN](#example-train-a-drn)
37
+ - [DRN Baseline Component](#drn-baseline-component)
38
+ - [DRN Deep Learning Component](#drn-deep-learning-component)
39
+ - [Example: Distributional Forecasts and Interpretability](#example-distributional-forecasts-and-interpretability)
40
+ - [Distributional Properties: Mean and Quantiles](#distributional-properties-mean-and-quantiles)
41
+ - [Forecasting Performance: Evaluation Metrics](#forecasting-performance-evaluation-metrics)
42
+ - [Interpretability: Kernel SHAP-Embedded PDF and CDF](#interpretability-kernel-shap-embedded-pdf-and-cdf)
43
+ - [Related Repository](#related-repository)
44
+ - [License](#license)
45
+ - [Authors](#authors)
46
+ - [Citation](#citation)
47
+ - [Contact](#contact)
48
+
49
+ ## Overview
50
+
51
+ A key task in actuarial modelling involves modelling the distributional properties of losses. Classic (distributional) regression approaches like Generalized Linear Models (GLMs; Nelder and Wedderburn, 1972) are commonly used, but challenges remain in developing models that can:
52
+ 1. Allow covariates to flexibly impact different aspects of the conditional distribution,
53
+ 2. Integrate developments in machine learning and AI to maximise the predictive power while considering (1), and,
54
+ 3. Maintain a level of interpretability in the model to enhance trust in the model and its outputs, which is often compromised in efforts pursuing (1) and (2).
55
+
56
+ We tackle this problem by proposing a Distributional Refinement Network (DRN), which combines an inherently interpretable baseline model (such as GLMs) with a flexible neural network--a modified Deep Distribution Regression (DDR; Li et al., 2021) method.
57
+ Inspired by the Combined Actuarial Neural Network (CANN; Schelldorfer and Wüthrich, 2019), our approach flexibly refines the entire baseline distribution.
58
+ As a result, the DRN captures varying effects of features across all quantiles, improving predictive performance while maintaining adequate interpretability.
59
+
60
+ This package, `drn`, addresses the challenges listed above and yields the results demonstrated in our [DRN paper](https://arxiv.org/abs/2406.00998) (Avanzi et al., 2024).
61
+ The full range of key features, installation procedure, examples, and related repositories are listed in the following sections.
62
+
63
+ ## Key Features
64
+
65
+ - **Comprehensive Distributional Regression Models**:
66
+ The `drn` package includes advanced distributional regression models such as the Distributional Refinement Network (DRN), Combined Actuarial Neural Network, Mixture Density Network (MDN; Bishop, 1994), and Deep Distribution Regression (DDR).
67
+ Built on PyTorch, it offers a user-friendly neural network training framework with features like early stopping, dropout, and other essential functionalities.
68
+
69
+ - **Exceptional Distributional Flexibility with Tailored Regularisation**:
70
+ The DRN can accurately model the entire distribution for forecasting purposes.
71
+ Users can control the range and extent of the baseline refinement across all quantiles.
72
+ The recommended baseline model for DRN is a GLM.
73
+ However, in theory, the baseline can be any form of distributional regression method, accommodating bounded, unbounded, discrete, continuous, or mixed response variables.
74
+ The balance between the baseline and the deep learning component is regulated by the KL divergence between them, with user-defined directionality.
75
+ The smoothness of the final forecast density can be adjusted using a roughness penalty, both of which can be tuned for more precise and reliable distributional flexibility.
76
+
77
+ - **Full Distributional Forecasting and Various Evaluation Metrics**:
78
+ The regression models provide full distributional forecasting information, including the density, cumulative distribution function, mean, and quantiles.
79
+ The package includes a range of metrics for evaluating forecasting performance, such as Root Mean Squared Error (RMSE), Quantile Loss (QL), Continuous Ranked Probability Score (CRPS), and Negative Log-Likelihood (NLL).
80
+ These metrics enable a comprehensive assessment of the model's performance across different aspects of distributional forecasting.
81
+
82
+ - **Reasonable Distributional Interpretability with Integrated Kernel SHAP Analysis**:
83
+ The recommended baseline model for DRN is a GLM due to its inherent interpretability, as discussed in the [DRN paper](https://arxiv.org/abs/2406.00998).
84
+ Additionally, DRN integrates interpretability techniques like SHAP, allowing users to see a detailed decomposition of contributions from both the baseline model and the DRN across various distributional properties beyond the mean.
85
+ Users can generate plots for both density and CDF for the baseline and refined models.
86
+ Kernel SHAP analysis is embedded within these plots, providing customised post-hoc interpretability and aiding in understanding the model's adjustments of key distributional properties beyond the mean.
87
+
88
+
89
+ ## Installation
90
+
91
+ To install the DRN package, simply run:
92
+
93
+ ```sh
94
+ pip install git+https://github.com/EricTianDong/drn.git
95
+ ```
96
+
97
+ If you wish to use the same environment as in the [DRN paper](https://arxiv.org/abs/2406.00998), follow these steps before installation:
98
+
99
+ 1. **Clone the repository:**
100
+ ```sh
101
+ git clone https://github.com/EricTianDong/drn.git
102
+ cd drn
103
+ ```
104
+ 2. **Create the Conda environment:**
105
+ ```sh
106
+ conda env create -f environment.yml
107
+ ```
108
+
109
+ 3. **Activate the Conda environment:**
110
+ ```sh
111
+ conda activate ai
112
+ ```
113
+
114
+ ## Example: Train a DRN
115
+
116
+ This section demonstrates how to construct a DRN from scratch using our `drn` package.
117
+ After loading all relevant packages, we generate a synthetic Gaussian dataset.
118
+
119
+ ``` python
120
+ from drn import train, split_and_preprocess
121
+ from drn import GLM, DRN
122
+ from drn import models
123
+ import numpy as np
124
+ import pandas as pd
125
+ import torch
126
+ ```
127
+
128
+ ``` python
129
+ def generate_synthetic_gaussian_lognormal(n=1000, seed=1, specific_instance=None):
130
+     rng = np.random.default_rng(seed)
131
+
132
+     # Parameters
133
+     mu = [0, 0] # Means of the Gaussian
134
+     sigma = [0.5, 0.5] # Standard deviations
135
+     rho = 0.0 # Correlation coefficient
136
+
137
+     # Covariance matrix
138
+     covariance = [
139
+         [sigma[0] ** 2, rho * sigma[0] * sigma[1]],
140
+         [rho * sigma[0] * sigma[1], sigma[1] ** 2],
141
+     ]
142
+
143
+     # Generate bivariate normal distribution
144
+     x = rng.multivariate_normal(mu, covariance, n)
145
+
146
+     # Create a non-linear relationship between X1 & X2 and means & dispersion.
147
+     means = -x[:, 0] + x[:, 1]
148
+     dispersion = 0.5 * (x[:, 0] ** 2 + x[:, 1] ** 2)
149
+
150
+     # Use specific instance if provided
151
+     if specific_instance is not None:
152
+         x_1, x_2 = specific_instance
153
+         means = (-x_1 + x_2).repeat(n)
154
+         dispersion = (0.5 * (x_1 ** 2 + x_2 ** 2)).repeat(n)
155
+
156
+     # Generate response variable Y, which consists of both normal and lognormal components
157
+     y_normal = rng.normal(means, dispersion)
158
+     y_lognormal = np.exp(rng.normal(np.log(means**2), scale = dispersion))
159
+     y = y_normal - y_lognormal
160
+
161
+     return pd.DataFrame(x, columns=["X_1", "X_2"]), pd.Series(y, name="Y")
162
+
163
+ # Generate synthetic data
164
+ features, target = generate_synthetic_gaussian_lognormal(12000)
165
+ ```
166
+
167
+ You can choose to split and preprocess the dataset as you wish.
168
+ The following is just one example that produces training and validation datasets compatible with PyTorch training.
169
+
170
+ ``` python
171
+ # Preprocess and split the data
172
+ x_train, x_val, x_test, y_train, y_val, y_test, \
173
+ x_train_raw, x_val_raw, x_test_raw, \
174
+ num_features, cat_features, all_categories, ct = split_and_preprocess(
175
+ features,
176
+ target,
177
+ ['X_1', 'X_2'], # Numerical features
178
+ [], # Categorical features
179
+ seed=0,
180
+ num_standard=True # Whether to standardize or not
181
+ )
182
+
183
+ # Convert pandas dataframes to PyTorch tensors
184
+ X_train = torch.Tensor(x_train.values)
185
+ Y_train = torch.Tensor(y_train.values)
186
+ X_val = torch.Tensor(x_val.values)
187
+ Y_val = torch.Tensor(y_val.values)
188
+ X_test = torch.Tensor(x_test.values)
189
+ Y_test = torch.Tensor(y_test.values)
190
+
191
+ # Create PyTorch datasets for training and validation
192
+ train_dataset = torch.utils.data.TensorDataset(X_train, Y_train)
193
+ val_dataset = torch.utils.data.TensorDataset(X_val, Y_val)
194
+ ```
195
+
196
+ ### DRN Baseline Component
197
+
198
+ The first stage of constructing a DRN involves training a baseline distributional regression model, such as a GLM.
199
+ Below, we use the GLM from statsmodels.
200
+ You don't need to add the intercept term; just pass in `X_train` and `Y_train` as torch tensors.
201
+
202
+ ``` python
203
+ baseline = GLM.from_statsmodels(X_train, Y_train, distribution='gaussian')
204
+ ```
205
+
206
+ Alternatively, you can train a GLM using the SGD method.
207
+ We currently support the 'gaussian' and 'gamma' distributions for the neural network version of GLM.
208
+
209
+ ``` python
210
+ # Initialise and train the baseline GLM model
211
+ torch.manual_seed(23)
212
+ baseline = GLM(X_train.shape[1], distribution='gaussian')
213
+
214
+ train(
215
+ baseline,
216
+ models.gaussian_deviance_loss,
217
+ train_dataset,
218
+ val_dataset,
219
+ log_interval=10,
220
+ epochs=5000,
221
+ lr=0.001,
222
+ patience=100,
223
+ batch_size=100
224
+ )
225
+
226
+ # Update dispersion parameters for the baseline model
227
+ baseline.update_dispersion(X_train, Y_train)
228
+ baseline.eval()
229
+ ```
230
+
231
+ ### DRN Deep Learning Component
232
+
233
+ You need to first specify a region for distributional refinement of the baseline, defined by `cutpoints_DRN`.
234
+ Select a lower bound `c_0`, an upper bound `c_K`, a proportion `p` (cutpoints-to-observation ratio) and the minimum number of training observations `min_obs` needed for each partitioned interval.
235
+ In practice:
236
+ - Try `p` around 0.05-0.1 for small datasets (less than 10000 observations) and decrease `p` as the number of observations increases.
237
+ - Try `min_obs` = 0 for small datasets and increase `min_obs` as the number of training observations increases, if desirable.
238
+
239
+ ``` python
240
+ # Define the refinement region for DRN
241
+ cutpoints_DRN = models.drn_cutpoints(
242
+ c_0 = np.min(y_train) * 1.1 if np.min(y_train) < 0 else 0.0,
243
+ c_K = np.max(y_train) * 1.1,
244
+ p = 0.1,
245
+ y = y_train,
246
+ min_obs = 1
247
+ )
248
+ ```
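To build intuition for how `p` and `min_obs` interact, here is a small standalone sketch. It is not the `models.drn_cutpoints` implementation. It assumes, purely for illustration, that about `p * n` candidate cutpoints are spaced evenly between `c_0` and `c_K`, and that a candidate is dropped whenever the interval it would close contains fewer than `min_obs` training observations. The helper name `sketch_cutpoints` is hypothetical.

```python
import math


def sketch_cutpoints(c_0, c_K, p, y, min_obs):
    """Hypothetical illustration only; not drn's actual algorithm."""
    # Number of candidate cutpoints is proportional to the sample size
    n_candidates = max(2, math.ceil(p * len(y)))
    step = (c_K - c_0) / (n_candidates - 1)
    candidates = [c_0 + i * step for i in range(n_candidates)]
    candidates[-1] = c_K  # guard against floating-point drift

    cutpoints = [c_0]
    for c in candidates[1:-1]:
        # Count observations in the interval this candidate would close
        count = sum(1 for v in y if cutpoints[-1] <= v < c)
        if count >= min_obs:
            cutpoints.append(c)

    # Merge a sparse final interval into its neighbour, then close at c_K
    count_last = sum(1 for v in y if cutpoints[-1] <= v <= c_K)
    if count_last < min_obs and len(cutpoints) > 1:
        cutpoints.pop()
    cutpoints.append(c_K)
    return cutpoints


y = [0.1, 0.2, 0.25, 0.3, 0.9, 1.5, 1.6, 1.7, 2.8, 2.9]
cuts = sketch_cutpoints(c_0=0.0, c_K=3.0, p=0.5, y=y, min_obs=2)
cuts_no_minimum = sketch_cutpoints(c_0=0.0, c_K=3.0, p=0.5, y=y, min_obs=0)
print(cuts)
print(cuts_no_minimum)
```

In this toy example, raising `min_obs` from 0 to 2 removes the candidate at 1.5, because only one observation falls between 0.75 and 1.5. A larger `p` yields a finer partition, while a larger `min_obs` coarsens it wherever the data are sparse.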
249
+
250
+ Finally, specify the hyperparameters for the DRN and pass in the GLM and refinement region defined earlier.
251
+ The regularisation coefficients `kl_alpha`, `dv_alpha`, and `mean_alpha` control the deviation from the baseline's distribution, the roughness of the estimated density, and the deviation from the baseline's mean, respectively.
252
+ All of these coefficients can be treated as hyperparameters.
253
+ Nevertheless:
254
+ - Try a small `kl_alpha`, e.g., 1e-5 to 1e-4, depending on the performance of the baseline (generally, the better the baseline, the larger the `kl_alpha`).
255
+ - Try a reasonably large `dv_alpha` for a small number of cutpoints, e.g., around 1e-3. Decrease `dv_alpha` as the number of cutpoints increases.
256
+ - Try to start with a small `mean_alpha`, e.g., 1e-5 to 1e-4. Alternatively, set it to zero if the refined mean is allowed to deviate freely from the baseline's mean.
257
+
258
+ ``` python
259
+ # Initialise and train the DRN model
260
+ torch.manual_seed(23)
261
+ drn_model = DRN(
262
+ num_features=x_train.shape[1],
263
+ cutpoints=cutpoints_DRN,
264
+ glm=baseline,
265
+ hidden_size=128,
266
+ num_hidden_layers=2,
267
+ baseline_start=False,
268
+ dropout_rate=0.2
269
+ )
270
+
271
+ train(
272
+ drn_model,
273
+ lambda pred, y: models.drn_loss(
274
+ pred,
275
+ y,
276
+ kl_alpha=1e-4, # KL divergence penalty
277
+ dv_alpha=1e-3, # Roughness penalty
278
+ mean_alpha=1e-5, # Mean penalty
279
+ kl_direction = 'forwards'
280
+ ),
281
+ train_dataset,
282
+ val_dataset,
283
+ lr=0.0005,
284
+ batch_size=256,
285
+ log_interval=1,
286
+ patience=30,
287
+ epochs=1000
288
+ )
289
+
290
+ drn_model.eval()
291
+ ```
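The two main regularisers can be pictured with a toy calculation on discretised distributions. The sketch below is an assumption-laden illustration, not the code behind `models.drn_loss`: it treats the baseline and refined distributions as probability vectors over equal-width bins, uses the standard discrete KL divergence, and measures roughness as the sum of squared second differences. Which ordering the package calls `'forwards'` is not stated here, so both orderings are computed and named explicitly.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def roughness(density):
    """Sum of squared second differences; larger means a less smooth density."""
    return sum(
        (density[i - 1] - 2 * density[i] + density[i + 1]) ** 2
        for i in range(1, len(density) - 1)
    )


# Toy bin probabilities (illustrative values only)
baseline_probs = [0.10, 0.20, 0.40, 0.20, 0.10]
refined_probs = [0.05, 0.30, 0.30, 0.30, 0.05]

# KL divergence is asymmetric, which is why a direction has to be chosen
kl_baseline_to_refined = kl_divergence(baseline_probs, refined_probs)
kl_refined_to_baseline = kl_divergence(refined_probs, baseline_probs)

# Hypothetical combined penalty using the coefficients from the example above
kl_alpha, dv_alpha = 1e-4, 1e-3
penalty = kl_alpha * kl_baseline_to_refined + dv_alpha * roughness(refined_probs)
print(kl_baseline_to_refined, kl_refined_to_baseline, penalty)
```

A larger `kl_alpha` pulls the refined distribution back towards the baseline, and a larger `dv_alpha` discourages jagged densities across neighbouring bins.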
292
+
293
+
294
+ ## Example: Distributional Forecasts and Interpretability
295
+
296
+ This section demonstrates how to use a DRN to forecast probability density functions (PDFs), cumulative distribution functions (CDFs), and key distributional properties, and how to evaluate distributional forecasting performance.
297
+
298
+ ``` python
299
+ from drn import metrics
300
+ from drn import interpretability
301
+ ```
302
+
303
+ ### Distributional Properties: Mean and Quantiles
304
+
305
+ Currently, we support mean and quantile forecasts.
306
+ Variance, skewness, and kurtosis can be derived using the density function.
307
+
308
+ ``` python
309
+ test_instance = X_test[:1]
310
+ mean_pred = drn_model.distributions(test_instance).mean
311
+ _10_quantile = drn_model.distributions(test_instance).quantiles(
312
+ [10],
313
+ l = torch.min(Y_train) * 3 if torch.min(Y_train) < 0 else 0.0,
314
+ u = torch.max(Y_train) * 3)
315
+ _90_quantile = drn_model.distributions(test_instance).quantiles(
316
+ [90],
317
+ l = torch.min(Y_train) * 3 if torch.min(Y_train) < 0 else 0.0,
318
+ u = torch.max(Y_train) * 3)
319
+ _99_quantile = drn_model.distributions(test_instance).quantiles(
320
+ [99],
321
+ l = torch.min(Y_train) * 3 if torch.min(Y_train) < 0 else 0.0,
322
+ u = torch.max(Y_train) * 3)
323
+
324
+ for metric_name, metric in zip(['Mean', '10% Quantile', '90% Quantile', '99% Quantile'],
325
+                                [mean_pred, _10_quantile, _90_quantile, _99_quantile]):
326
+     print(f'{metric_name}: {metric.item()}')
327
+ ```
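Higher moments can be obtained by numerically integrating a density evaluated on a grid. The following is a minimal sketch using plain Python and trapezoidal integration. A normal density stands in for the forecast density, and the helper name `moments_from_density` is hypothetical. In practice the density values might come from exponentiating `log_prob` on a grid, but that wiring is an assumption and is not shown.

```python
import math


def moments_from_density(grid, pdf):
    """Mean, variance, skewness and excess kurtosis via trapezoidal integration."""

    def integrate(values):
        return sum(
            0.5 * (values[i] + values[i + 1]) * (grid[i + 1] - grid[i])
            for i in range(len(grid) - 1)
        )

    # Total mass should be close to 1; dividing by it renormalises the density
    mass = integrate(pdf)
    mean = integrate([x * f for x, f in zip(grid, pdf)]) / mass

    def central(k):
        return integrate([(x - mean) ** k * f for x, f in zip(grid, pdf)]) / mass

    variance = central(2)
    skewness = central(3) / variance ** 1.5
    excess_kurtosis = central(4) / variance ** 2 - 3.0
    return mean, variance, skewness, excess_kurtosis


# Sanity check against a known case: normal with mean 1 and standard deviation 2
mu, sigma = 1.0, 2.0
grid = [mu - 10 * sigma + i * 0.01 for i in range(4001)]
pdf = [
    math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    for x in grid
]
mean, variance, skewness, excess_kurtosis = moments_from_density(grid, pdf)
print(mean, variance, skewness, excess_kurtosis)
```

The grid needs to cover essentially all of the probability mass; otherwise the tails are truncated and the higher moments are biased towards smaller magnitudes.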
328
+
329
+ ### Forecasting Performance: Evaluation Metrics
330
+
331
+ Generate both the `distributions` and the `cdf` objects.
332
+
333
+ ``` python
334
+ names = ["GLM", "DRN"]
335
+ dr_models = [baseline, drn_model]
336
+
337
+ print("Generating distributional forecasts")
338
+ dists_train, dists_val, dists_test = {}, {}, {}
339
+ for name, model in zip(names, dr_models):
340
+     print(f"- {name}")
341
+     dists_train[name] = model.distributions(X_train)
342
+     dists_val[name] = model.distributions(X_val)
343
+     dists_test[name] = model.distributions(X_test)
344
+
345
+ print("Calculating CDF over a grid")
346
+ GRID_SIZE = 3000
347
+ grid = torch.linspace(0, np.max(y_train) * 1.1, GRID_SIZE).unsqueeze(-1)
348
+
349
+ cdfs_train, cdfs_val, cdfs_test = {}, {}, {}
350
+ for name, model in zip(names, dr_models):
351
+     print(f"- {name}")
352
+     cdfs_train[name] = dists_train[name].cdf(grid)
353
+     cdfs_val[name] = dists_val[name].cdf(grid)
354
+     cdfs_test[name] = dists_test[name].cdf(grid)
355
+ ```
356
+
357
+ Then, generate the evaluation metrics: NLL, CRPS, RMSE, and QLs.
358
+
359
+ ``` python
360
+ print("Calculating negative log likelihoods")
361
+ nlls_train, nlls_val, nlls_test = {}, {}, {}
362
+ for name, model in zip(names, dr_models):
363
+     nlls_train[name] = -dists_train[name].log_prob(Y_train).mean()
364
+     nlls_val[name] = -dists_val[name].log_prob(Y_val).mean()
365
+     nlls_test[name] = -dists_test[name].log_prob(Y_test).mean()
366
+
367
+ for nll_dict, df_name in zip([nlls_train, nlls_val, nlls_test], ['training', 'val', 'test']):
368
+     print(f'NLL on {df_name} set')
369
+     for name in names:
370
+         print(f"{name}: {nll_dict[name]:.4f}")
371
+     print('-------------------------------')
372
+
373
+ print("Calculating CRPS")
374
+ grid = grid.squeeze()
375
+ crps_train, crps_val, crps_test = {}, {}, {}
376
+ for name, model in zip(names, dr_models):
377
+     crps_train[name] = metrics.crps(Y_train, grid, cdfs_train[name])
378
+     crps_val[name] = metrics.crps(Y_val, grid, cdfs_val[name])
379
+     crps_test[name] = metrics.crps(Y_test, grid, cdfs_test[name])
380
+
381
+ for crps_dict, df_name in zip([crps_train, crps_val, crps_test], ['training', 'val', 'test']):
382
+     print(f'CRPS on {df_name} set')
383
+     for name in names:
384
+         print(f"{name}: {crps_dict[name].mean():.4f}")
385
+     print('------------------------------')
386
+
387
+ print("Calculating RMSE")
388
+ rmse_train, rmse_val, rmse_test = {}, {}, {}
389
+ for name, model in zip(names, dr_models):
390
+     means_train = dists_train[name].mean
391
+     means_val = dists_val[name].mean
392
+     means_test = dists_test[name].mean
393
+     rmse_train[name] = metrics.rmse(y_train, means_train)
394
+     rmse_val[name] = metrics.rmse(y_val, means_val)
395
+     rmse_test[name] = metrics.rmse(y_test, means_test)
396
+
397
+ for rmse_dict, df_name in zip([rmse_train, rmse_val, rmse_test], ['training', 'validation', 'test']):
398
+     print(f'RMSE on {df_name} set')
399
+     for name in names:
400
+         print(f"{name}: {rmse_dict[name].mean():.4f}")
401
+     print('-------------------------------')
402
+
403
+ print("Calculating Quantile Loss")
404
+ ql_90_train, ql_90_val, ql_90_test = {}, {}, {}
405
+ for features, response, dataset_name, ql_dict in zip(
406
+     [X_train, X_val, X_test], [y_train, y_val, y_test], ['Training', 'Validation', 'Test'], [ql_90_train, ql_90_val, ql_90_test]
407
+ ):
408
+     print(f'{dataset_name} Dataset Quantile Loss(es)')
409
+     for model, model_name in zip(dr_models, names):
410
+         ql_dict[model_name] = metrics.quantile_losses(
411
+             0.9, model, model_name, features, response,
412
+             max_iter=1000, tolerance=1e-4,
413
+             l=torch.Tensor([np.min(y_train) - 3 * (np.max(y_train) - np.min(y_train))]),
414
+             u=torch.Tensor([np.max(y_train) + 3 * (np.max(y_train) - np.min(y_train))])
415
+         )
416
+     print('----------------------')
417
+
418
+ ```
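For readers unfamiliar with the scores above, the sketch below shows the textbook definitions of the quantile (pinball) loss and of the CRPS computed from a CDF on a grid. It is a plain-Python illustration under stated assumptions and does not reproduce the internals of `metrics.quantile_losses` or `metrics.crps`. The helper names and the standard normal forecast are made up for the example.

```python
import math


def pinball_loss(y, q, tau):
    """Quantile loss of predicting quantile q at level tau when y is observed."""
    diff = y - q
    return max(tau * diff, (tau - 1) * diff)


def crps_from_cdf(y, grid, cdf):
    """CRPS as the integral of (F(z) - 1{z >= y})^2 over the grid (trapezoid rule)."""
    integrand = [(F - (1.0 if z >= y else 0.0)) ** 2 for z, F in zip(grid, cdf)]
    return sum(
        0.5 * (integrand[i] + integrand[i + 1]) * (grid[i + 1] - grid[i])
        for i in range(len(grid) - 1)
    )


def normal_cdf(z, mu=0.0, sigma=1.0):
    return 0.5 * (1 + math.erf((z - mu) / (sigma * math.sqrt(2))))


# Under-predicting a 90% quantile costs more than over-predicting it
loss_under = pinball_loss(y=10.0, q=8.0, tau=0.9)
loss_over = pinball_loss(y=8.0, q=10.0, tau=0.9)

# CRPS of a standard normal forecast when the observation is 0
grid = [-8 + i * 0.001 for i in range(16001)]
cdf = [normal_cdf(z) for z in grid]
crps_at_zero = crps_from_cdf(0.0, grid, cdf)
print(loss_under, loss_over, crps_at_zero)
```

For a standard normal forecast and an observation of 0, the closed-form CRPS is 2φ(0) − 1/√π ≈ 0.2337, which the grid approximation should match closely. As with the CDF grid in the example above, the grid has to span the region where the forecast and the observations carry mass.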
419
+
420
+ ### Interpretability: Kernel SHAP-Embedded PDF and CDF
421
+
422
+ To plot the PDF and CDF, you need to first initialise an `explainer`.
423
+ The instance to be examined should be a DataFrame, with feature values that are neither standardised nor encoded.
424
+ We currently support the Kernel SHAP method.
425
+
426
+ ``` python
427
+ test_instance_df = x_test_raw.iloc[:1]
428
+ Y_instance = Y_test[:1]
429
+
430
+ # Initialise the Explainer
431
+ drn_explainer = interpretability.DRNExplainer(
432
+ drn_model,
433
+ baseline,
434
+ cutpoints_DRN,
435
+ x_train_raw,
436
+ cat_features,
437
+ all_categories,
438
+ ct
439
+ )
440
+
441
+ # Plot the PDF before and after refinement
442
+ drn_explainer.plot_adjustment_factors(
443
+ instance=test_instance_df,
444
+ num_interpolations=1_000,
445
+ plot_adjustments_labels=False,
446
+ x_range=(-2, 2),
447
+ )
448
+
449
+ # Use Kernel SHAP to explain the mean adjustment
450
+ drn_explainer.plot_dp_adjustment_shap(
451
+ instance_raw=test_instance_df,
452
+ method='Kernel',
453
+ nsamples_background_fraction=0.5,
454
+ top_K_features=2,
455
+ labelling_gap=0.1,
456
+ dist_property='Mean',
457
+ x_range=(-1, 1),
458
+ y_range=(0.0, 2.0),
459
+ observation=Y_instance,
460
+ adjustment=True,
461
+ shap_fontsize=15,
462
+ figsize=(7, 7),
463
+ plot_title='Explaining the Mean Adjustment',
464
+ )
465
+
466
+ # Explain DRN's 90% quantile prediction from the ground up
467
+ drn_explainer.cdf_plot(
468
+ instance=test_instance_df,
469
+ method='Kernel',
470
+ nsamples_background_fraction=0.5,
471
+ top_K_features=2,
472
+ labelling_gap=0.1,
473
+ dist_property='90% Quantile',
474
+ x_range=(-0.5, 1.0),
475
+ y_range=(0.8, 1.0),
476
+ adjustment=False,
477
+ plot_baseline=False,
478
+ shap_fontsize=15,
479
+ figsize=(7, 7),
480
+ plot_title='90% Quantile Explanation',
481
+ )
482
+ ```
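Kernel SHAP approximates Shapley values, which split the difference between a prediction and a reference prediction across the features. With only two features, as in the synthetic example, the exact values can be computed by averaging marginal contributions over both feature orderings. The sketch below illustrates that idea only. It uses a made-up function `toy_mean` and a single reference point instead of a background dataset, and it is unrelated to how `DRNExplainer` is implemented.

```python
from itertools import permutations


def shapley_values(value_fn, instance, background):
    """Exact Shapley values relative to a single background (reference) point."""
    n = len(instance)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(background)
        previous_value = value_fn(current)
        for j in order:
            # Switch feature j from its background value to the instance value
            current[j] = instance[j]
            new_value = value_fn(current)
            phi[j] += (new_value - previous_value) / len(orderings)
            previous_value = new_value
    return phi


def toy_mean(x):
    """Hypothetical stand-in for a model's predicted mean, with an interaction term."""
    return -x[0] + x[1] + 0.5 * x[0] * x[1]


instance = [1.0, 2.0]
background = [0.0, 0.0]
phi = shapley_values(toy_mean, instance, background)
print(phi)
```

The contributions sum to `toy_mean(instance) - toy_mean(background)`, which is the efficiency property that makes such decompositions easy to read on a plot. Exact enumeration scales factorially with the number of features, which is why sampling-based approximations such as Kernel SHAP are used in practice.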
483
+
484
+ ## Related Repository
485
+
486
+ This package accompanies the [DRN paper](https://arxiv.org/abs/2406.00998) on the Distributional Refinement Network (DRN).
487
+ The related repository, available at [https://github.com/agi-lab/DRN](https://github.com/agi-lab/DRN), contains the Python notebooks and additional resources needed to reproduce the results presented in the [DRN paper](https://arxiv.org/abs/2406.00998).
488
+
489
+ ## License
490
+
491
+ See [LICENSE.md](https://github.com/EricTianDong/drn/tree/main?tab=MIT-1-ov-file).
492
+
493
+ ## Authors
494
+
495
+ - Eric Dong (author, maintainer),
496
+ - Patrick Laub (author).
497
+
498
+
499
+ ## Citation
500
+
501
+ ``` bibtex
502
+ @misc{avanzi2024distributional,
503
+ title={Distributional Refinement Network: Distributional Forecasting via Deep Learning},
504
+ author={Benjamin Avanzi and Eric Dong and Patrick J. Laub and Bernard Wong},
505
+ year={2024},
506
+ eprint={2406.00998},
507
+ archivePrefix={arXiv},
508
+ primaryClass={stat.ML}
509
+ }
510
+ ```
511
+
512
+ ## Contact
513
+
514
+ For any questions or further information, please contact tiandong1999@gmail.com.
515
+
516
+