drn 0.0.1__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- drn-0.0.1/.gitignore +4 -0
- drn-0.0.1/.pre-commit-config.yaml +7 -0
- drn-0.0.1/LICENSE.md +21 -0
- drn-0.0.1/PKG-INFO +516 -0
- drn-0.0.1/README.md +487 -0
- drn-0.0.1/pyproject.toml +35 -0
- drn-0.0.1/src/drn/__init__.py +6 -0
- drn-0.0.1/src/drn/distributions/__init__.py +4 -0
- drn-0.0.1/src/drn/distributions/extended_histogram.py +219 -0
- drn-0.0.1/src/drn/distributions/histogram.py +406 -0
- drn-0.0.1/src/drn/interpretability.py +1717 -0
- drn-0.0.1/src/drn/metrics.py +114 -0
- drn-0.0.1/src/drn/models/__init__.py +39 -0
- drn-0.0.1/src/drn/models/cann.py +195 -0
- drn-0.0.1/src/drn/models/ddr.py +111 -0
- drn-0.0.1/src/drn/models/drn.py +231 -0
- drn-0.0.1/src/drn/models/glm.py +324 -0
- drn-0.0.1/src/drn/models/mdn.py +236 -0
- drn-0.0.1/src/drn/py.typed +0 -0
- drn-0.0.1/src/drn/train.py +244 -0
- drn-0.0.1/tests/synthetic_dataset.py +51 -0
- drn-0.0.1/tests/test_fit_models_synthetic.py +304 -0
- drn-0.0.1/tests/test_glm_distributions.py +41 -0
- drn-0.0.1/tests/test_interpretabilty.py +77 -0
- drn-0.0.1/tests/test_utility_functions.py +60 -0
drn-0.0.1/.gitignore
ADDED
drn-0.0.1/LICENSE.md
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2024 Tian (Eric) Dong
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
drn-0.0.1/PKG-INFO
ADDED
|
@@ -0,0 +1,516 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: drn
|
|
3
|
+
Version: 0.0.1
|
|
4
|
+
Summary: Distributional regression modelling using PyTorch
|
|
5
|
+
Author-email: Eric Dong <tiandong1999@gmail.com>, Patrick Laub <patrick.laub@gmail.com>
|
|
6
|
+
Maintainer-email: Eric Dong <tiandong1999@gmail.com>
|
|
7
|
+
License-File: LICENSE.md
|
|
8
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
9
|
+
Classifier: Programming Language :: Python :: 3
|
|
10
|
+
Requires-Python: >=3.8
|
|
11
|
+
Requires-Dist: matplotlib
|
|
12
|
+
Requires-Dist: numpy
|
|
13
|
+
Requires-Dist: pandas
|
|
14
|
+
Requires-Dist: scikit-learn
|
|
15
|
+
Requires-Dist: seaborn
|
|
16
|
+
Requires-Dist: shap
|
|
17
|
+
Requires-Dist: statsmodels
|
|
18
|
+
Requires-Dist: torch
|
|
19
|
+
Requires-Dist: tqdm
|
|
20
|
+
Provides-Extra: dev
|
|
21
|
+
Requires-Dist: black; extra == 'dev'
|
|
22
|
+
Requires-Dist: ipykernel; extra == 'dev'
|
|
23
|
+
Requires-Dist: isort; extra == 'dev'
|
|
24
|
+
Requires-Dist: jupytext; extra == 'dev'
|
|
25
|
+
Requires-Dist: mypy; extra == 'dev'
|
|
26
|
+
Requires-Dist: pre-commit; extra == 'dev'
|
|
27
|
+
Requires-Dist: pytest; extra == 'dev'
|
|
28
|
+
Description-Content-Type: text/markdown
|
|
29
|
+
|
|
30
|
+
# drn - A Python Package for Distributional Refinement Network (DRN)
|
|
31
|
+
|
|
32
|
+
## Table of Contents
|
|
33
|
+
- [Overview](#overview)
|
|
34
|
+
- [Key Features](#key-features)
|
|
35
|
+
- [Installation](#installation)
|
|
36
|
+
- [Example: Train a DRN](#example-train-a-drn)
|
|
37
|
+
- [DRN Baseline Component](#drn-baseline-component)
|
|
38
|
+
- [DRN Deep Learning Component](#drn-deep-learning-component)
|
|
39
|
+
- [Example: Distributional Forecasts and Interpretability](#example-distributional-forecasts-and-interpretability)
|
|
40
|
+
- [Distributional Properties: Mean and Quantiles](#distributional-properties-mean-and-quantiles)
|
|
41
|
+
- [Forecasting Performance: Evaluation Metrics](#forecasting-performance-evaluation-metrics)
|
|
42
|
+
- [Interpretability: Kernel SHAP-Embedded PDF and CDF](#interpretability-kernel-shap-embedded-pdf-and-cdf)
|
|
43
|
+
- [Related Repository](#related-repository)
|
|
44
|
+
- [License](#license)
|
|
45
|
+
- [Authors](#authors)
|
|
46
|
+
- [Citation](#citation)
|
|
47
|
+
- [Contact](#contact)
|
|
48
|
+
|
|
49
|
+
## Overview
|
|
50
|
+
|
|
51
|
+
A key task in actuarial modelling involves modelling the distributional properties of losses. Classic (distributional) regression approaches like Generalized Linear Models (GLMs; Nelder and Wedderburn, 1972) are commonly used, but challenges remain in developing models that can:
|
|
52
|
+
1. Allow covariates to flexibly impact different aspects of the conditional distribution,
|
|
53
|
+
2. Integrate developments in machine learning and AI to maximise the predictive power while considering (1), and,
|
|
54
|
+
3. Maintain a level of interpretability in the model to enhance trust in the model and its outputs, which is often compromised in efforts pursuing (1) and (2).
|
|
55
|
+
|
|
56
|
+
We tackle this problem by proposing a Distributional Refinement Network (DRN), which combines an inherently interpretable baseline model (such as GLMs) with a flexible neural network--a modified Deep Distribution Regression (DDR; Li et al., 2021) method.
|
|
57
|
+
Inspired by the Combined Actuarial Neural Network (CANN; Schelldorfer and Wüthrich, 2019), our approach flexibly refines the entire baseline distribution.
|
|
58
|
+
As a result, the DRN captures varying effects of features across all quantiles, improving predictive performance while maintaining adequate interpretability.
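
As a rough illustration of what refining a baseline distribution can mean, the sketch below reweights a baseline's interval probabilities with per-interval adjustments and renormalises. It assumes a multiplicative, softmax-style refinement; `refine_baseline` and its arguments are hypothetical names rather than part of the `drn` API, and the package's actual parameterisation may differ.

``` python
import numpy as np


def refine_baseline(baseline_probs, logits):
    """Reweight baseline interval probabilities by exp(logits) and renormalise.

    Hypothetical helper for illustration only; not the drn implementation.
    """
    log_weights = np.log(np.asarray(baseline_probs, dtype=float)) + np.asarray(logits, dtype=float)
    log_weights -= log_weights.max()  # subtract the max for numerical stability
    weights = np.exp(log_weights)
    return weights / weights.sum()


baseline_probs = np.array([0.1, 0.2, 0.4, 0.2, 0.1])

# Zero adjustments leave the baseline unchanged.
unchanged = refine_baseline(baseline_probs, np.zeros(5))

# A positive adjustment on the last interval shifts mass into the right tail.
heavier_tail = refine_baseline(baseline_probs, np.array([0.0, 0.0, 0.0, 0.0, 1.0]))
```

With all adjustments at zero the refined distribution equals the baseline, which is one way to see why a penalty on the distance from the baseline acts as a regulariser.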
|
|
59
|
+
|
|
60
|
+
This package, `drn`, addresses the challenges listed above and yields the results demonstrated in our [DRN paper](https://arxiv.org/abs/2406.00998) (Avanzi et al. 2024).
|
|
61
|
+
The full range of key features, installation procedure, examples, and related repositories are listed in the following sections.
|
|
62
|
+
|
|
63
|
+
## Key Features
|
|
64
|
+
|
|
65
|
+
- **Comprehensive Distributional Regression Models**:
|
|
66
|
+
The `drn` package includes advanced distributional regression models such as the Distributional Refinement Network (DRN), Combined Actuarial Neural Network (CANN), Mixture Density Network (MDN; Bishop, 1994), and Deep Distribution Regression (DDR).
|
|
67
|
+
Built on PyTorch, it offers a user-friendly neural network training framework with features like early stopping, dropout, and other essential functionalities.
|
|
68
|
+
|
|
69
|
+
- **Exceptional Distributional Flexibility with Tailored Regularisation**:
|
|
70
|
+
The DRN can accurately model the entire distribution for forecasting purposes.
|
|
71
|
+
Users can control the range and extent of the baseline refinement across all quantiles.
|
|
72
|
+
The recommended baseline model for DRN is a GLM.
|
|
73
|
+
However, in theory, the baseline can be any form of distributional regression method, accommodating bounded, unbounded, discrete, continuous, or mixed response variables.
|
|
74
|
+
The balance between the baseline and the deep learning component is regulated by the KL divergence between them, with user-defined directionality.
|
|
75
|
+
The smoothness of the final forecast density can be adjusted using a roughness penalty. Both penalties can be tuned for more precise and reliable distributional flexibility.
|
|
76
|
+
|
|
77
|
+
- **Full Distributional Forecasting and Various Evaluation Metrics**:
|
|
78
|
+
The regression models provide full distributional forecasting information, including the density, cumulative distribution function (CDF), mean, and quantiles.
|
|
79
|
+
The package includes a range of metrics for evaluating forecasting performance, such as Root Mean Squared Error (RMSE), Quantile Loss (QL), Continuous Ranked Probability Score (CRPS), and Negative Log-Likelihood (NLL).
|
|
80
|
+
These metrics enable a comprehensive assessment of the model's performance across different aspects of distributional forecasting.
|
|
81
|
+
|
|
82
|
+
- **Reasonable Distributional Interpretability with Integrated Kernel SHAP Analysis**:
|
|
83
|
+
The recommended baseline model for DRN is a GLM due to its inherent interpretability, as discussed in the [DRN paper](https://arxiv.org/abs/2406.00998).
|
|
84
|
+
Additionally, the DRN integrates interpretability techniques like SHAP, allowing users to see a detailed decomposition of contributions from both the baseline model and the DRN across various distributional properties beyond the mean.
|
|
85
|
+
Users can generate plots for both density and CDF for the baseline and refined models.
|
|
86
|
+
Kernel SHAP analysis is embedded within these plots, providing customised post-hoc interpretability and aiding in understanding the model's adjustments of key distributional properties beyond the mean.
|
|
87
|
+
|
|
88
|
+
|
|
89
|
+
## Installation
|
|
90
|
+
|
|
91
|
+
To install the DRN package, simply run:
|
|
92
|
+
|
|
93
|
+
```sh
|
|
94
|
+
pip install git+https://github.com/EricTianDong/drn.git
|
|
95
|
+
```
|
|
96
|
+
|
|
97
|
+
If you wish to use the same environment as in the [DRN paper](https://arxiv.org/abs/2406.00998), follow these steps before installation:
|
|
98
|
+
|
|
99
|
+
1. **Clone the repository:**
|
|
100
|
+
```sh
|
|
101
|
+
git clone https://github.com/EricTianDong/drn.git
|
|
102
|
+
cd drn
|
|
103
|
+
```
|
|
104
|
+
2. **Create the Conda environment:**
|
|
105
|
+
```sh
|
|
106
|
+
conda env create -f environment.yml
|
|
107
|
+
```
|
|
108
|
+
|
|
109
|
+
3. **Activate the Conda environment:**
|
|
110
|
+
```sh
|
|
111
|
+
conda activate ai
|
|
112
|
+
```
|
|
113
|
+
|
|
114
|
+
## Example: Train a DRN
|
|
115
|
+
|
|
116
|
+
This section demonstrates how to construct a DRN from scratch using our `drn` package.
|
|
117
|
+
After loading all relevant packages, we generate a synthetic dataset whose response combines Gaussian and lognormal components.
|
|
118
|
+
|
|
119
|
+
``` python
|
|
120
|
+
from drn import train, split_and_preprocess
|
|
121
|
+
from drn import GLM, DRN
|
|
122
|
+
from drn import models
|
|
123
|
+
import numpy as np
|
|
124
|
+
import pandas as pd
|
|
125
|
+
import torch
|
|
126
|
+
```
|
|
127
|
+
|
|
128
|
+
``` python
|
|
129
|
+
def generate_synthetic_gaussian_lognormal(n=1000, seed=1, specific_instance=None):
|
|
130
|
+
rng = np.random.default_rng(seed)
|
|
131
|
+
|
|
132
|
+
# Parameters
|
|
133
|
+
mu = [0, 0] # Means of the Gaussian
|
|
134
|
+
sigma = [0.5, 0.5] # Standard deviations
|
|
135
|
+
rho = 0.0 # Correlation coefficient
|
|
136
|
+
|
|
137
|
+
# Covariance matrix
|
|
138
|
+
covariance = [
|
|
139
|
+
[sigma[0] ** 2, rho * sigma[0] * sigma[1]],
|
|
140
|
+
[rho * sigma[0] * sigma[1], sigma[1] ** 2],
|
|
141
|
+
]
|
|
142
|
+
|
|
143
|
+
# Generate bivariate normal distribution
|
|
144
|
+
x = rng.multivariate_normal(mu, covariance, n)
|
|
145
|
+
|
|
146
|
+
# Create a non-linear relationship between X1 & X2 and means & dispersion.
|
|
147
|
+
means = -x[:, 0] + x[:, 1]
|
|
148
|
+
dispersion = 0.5 * (x[:, 0] ** 2 + x[:, 1] ** 2)
|
|
149
|
+
|
|
150
|
+
# Use specific instance if provided
|
|
151
|
+
if specific_instance is not None:
|
|
152
|
+
x_1, x_2 = specific_instance
|
|
153
|
+
means = (-x_1 + x_2).repeat(n)
|
|
154
|
+
dispersion = (0.5 * (x_1 ** 2 + x_2 ** 2)).repeat(n)
|
|
155
|
+
|
|
156
|
+
# Generate response variable Y, which consists of both normal and lognormal components
|
|
157
|
+
y_normal = rng.normal(means, dispersion)
|
|
158
|
+
y_lognormal = np.exp(rng.normal(np.log(means**2), scale = dispersion))
|
|
159
|
+
y = y_normal - y_lognormal
|
|
160
|
+
|
|
161
|
+
return pd.DataFrame(x, columns=["X_1", "X_2"]), pd.Series(y, name="Y")
|
|
162
|
+
|
|
163
|
+
# Generate synthetic data
|
|
164
|
+
features, target = generate_synthetic_gaussian_lognormal(12000)
|
|
165
|
+
```
|
|
166
|
+
|
|
167
|
+
You can choose to split and preprocess the dataset as you wish.
|
|
168
|
+
The following is just one example of generating training and validation datasets that are compatible with PyTorch training.
|
|
169
|
+
|
|
170
|
+
``` python
|
|
171
|
+
# Preprocess and split the data
|
|
172
|
+
x_train, x_val, x_test, y_train, y_val, y_test, \
|
|
173
|
+
x_train_raw, x_val_raw, x_test_raw, \
|
|
174
|
+
num_features, cat_features, all_categories, ct = split_and_preprocess(
|
|
175
|
+
features,
|
|
176
|
+
target,
|
|
177
|
+
['X_1', 'X_2'], # Numerical features
|
|
178
|
+
[], # Categorical features
|
|
179
|
+
seed=0,
|
|
180
|
+
num_standard=True # Whether to standardize or not
|
|
181
|
+
)
|
|
182
|
+
|
|
183
|
+
# Convert pandas dataframes to PyTorch tensors
|
|
184
|
+
X_train = torch.Tensor(x_train.values)
|
|
185
|
+
Y_train = torch.Tensor(y_train.values)
|
|
186
|
+
X_val = torch.Tensor(x_val.values)
|
|
187
|
+
Y_val = torch.Tensor(y_val.values)
|
|
188
|
+
X_test = torch.Tensor(x_test.values)
|
|
189
|
+
Y_test = torch.Tensor(y_test.values)
|
|
190
|
+
|
|
191
|
+
# Create PyTorch datasets for training and validation
|
|
192
|
+
train_dataset = torch.utils.data.TensorDataset(X_train, Y_train)
|
|
193
|
+
val_dataset = torch.utils.data.TensorDataset(X_val, Y_val)
|
|
194
|
+
```
|
|
195
|
+
|
|
196
|
+
### DRN Baseline Component
|
|
197
|
+
|
|
198
|
+
The first stage of constructing a DRN involves training a baseline distributional regression model, such as a GLM.
|
|
199
|
+
Below, we use the GLM from statsmodels.
|
|
200
|
+
You don't need to add the intercept term; just pass in `X_train` and `Y_train` as torch tensors.
|
|
201
|
+
|
|
202
|
+
``` python
|
|
203
|
+
baseline = GLM.from_statsmodels(X_train, Y_train, distribution='gaussian')
|
|
204
|
+
```
|
|
205
|
+
|
|
206
|
+
Alternatively, you can train a GLM using the SGD method.
|
|
207
|
+
We currently support the 'gaussian' and 'gamma' distributions for the neural network version of GLM.
|
|
208
|
+
|
|
209
|
+
``` python
|
|
210
|
+
# Initialise and train the baseline GLM model
|
|
211
|
+
torch.manual_seed(23)
|
|
212
|
+
baseline = GLM(X_train.shape[1], distribution='gaussian')
|
|
213
|
+
|
|
214
|
+
train(
|
|
215
|
+
baseline,
|
|
216
|
+
models.gaussian_deviance_loss,
|
|
217
|
+
train_dataset,
|
|
218
|
+
val_dataset,
|
|
219
|
+
log_interval=10,
|
|
220
|
+
epochs=5000,
|
|
221
|
+
lr=0.001,
|
|
222
|
+
patience=100,
|
|
223
|
+
batch_size=100
|
|
224
|
+
)
|
|
225
|
+
|
|
226
|
+
# Update dispersion parameters for the baseline model
|
|
227
|
+
baseline.update_dispersion(X_train, Y_train)
|
|
228
|
+
baseline.eval()
|
|
229
|
+
```
|
|
230
|
+
|
|
231
|
+
### DRN Deep Learning Component
|
|
232
|
+
|
|
233
|
+
You need to first specify a region for distributional refinement of the baseline, defined by `cutpoints_DRN`.
|
|
234
|
+
Select a lower bound `c_0`, an upper bound `c_K`, a proportion `p` (cutpoints-to-observation ratio) and the minimum number of training observations `min_obs` needed for each partitioned interval.
|
|
235
|
+
In practice:
|
|
236
|
+
- Try `p` around 0.05-0.1 for small datasets (fewer than 10,000 observations) and decrease `p` as the number of observations increases.
|
|
237
|
+
- Try `min_obs` = 0 for small datasets and increase `min_obs` as the number of training observations increases, if desirable.
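
To make the roles of `p` and `min_obs` concrete, here is a minimal sketch of one plausible way to build such a partition: place roughly `p * n` candidate cutpoints at empirical quantiles, then drop candidates whose interval would hold fewer than `min_obs` observations. The quantile placement and the merging rule are assumptions made for illustration, and `sketch_cutpoints` is a hypothetical helper, not the package's `drn_cutpoints`.

``` python
import numpy as np


def sketch_cutpoints(y, c_0, c_K, p, min_obs):
    """Quantile-based cutpoints with sparse intervals merged (illustrative only)."""
    y = np.asarray(y, dtype=float)
    num_candidates = max(int(np.ceil(p * len(y))), 2)

    # Candidate cutpoints at evenly spaced empirical quantiles, bracketed by the bounds.
    inner = np.quantile(y, np.linspace(0.0, 1.0, num_candidates))
    candidates = np.unique(np.concatenate(([c_0], inner, [c_K])))
    candidates = candidates[(candidates >= c_0) & (candidates <= c_K)]

    cutpoints = [candidates[0]]
    for c in candidates[1:-1]:
        # Keep a candidate only if the interval it closes holds enough observations.
        if np.sum((y >= cutpoints[-1]) & (y < c)) >= min_obs:
            cutpoints.append(c)

    # Merge the final interval into its neighbour while it is too sparse.
    while len(cutpoints) > 1 and np.sum((y >= cutpoints[-1]) & (y <= c_K)) < min_obs:
        cutpoints.pop()

    cutpoints.append(candidates[-1])
    return np.array(cutpoints)


y_example = np.arange(100.0)
cuts = sketch_cutpoints(y_example, c_0=-1.0, c_K=110.0, p=0.1, min_obs=5)
```

A larger `p` gives a finer partition, while a larger `min_obs` merges sparse intervals, typically in the tails.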
|
|
238
|
+
|
|
239
|
+
``` python
|
|
240
|
+
# Define the refinement region for DRN
|
|
241
|
+
cutpoints_DRN = models.drn_cutpoints(
|
|
242
|
+
c_0 = np.min(y_train) * 1.1 if np.min(y_train) < 0 else 0.0,
|
|
243
|
+
c_K = np.max(y_train) * 1.1,
|
|
244
|
+
p = 0.1,
|
|
245
|
+
y = y_train,
|
|
246
|
+
min_obs = 1
|
|
247
|
+
)
|
|
248
|
+
```
|
|
249
|
+
|
|
250
|
+
Finally, specify the hyperparameters for the DRN and pass in the GLM and refinement region defined earlier.
|
|
251
|
+
The regularisation coefficients `kl_alpha`, `dv_alpha`, and `mean_alpha` control the deviation from the baseline's distribution, the roughness of the estimated density, and the deviation from the baseline's mean, respectively.
|
|
252
|
+
All of these coefficients can be treated as hyperparameters.
|
|
253
|
+
Nevertheless:
|
|
254
|
+
- Try a small `kl_alpha`, e.g., between 1e-5 and 1e-4, depending on the performance of the baseline (generally, the better the baseline, the larger the `kl_alpha`).
|
|
255
|
+
- Try a reasonably large `dv_alpha` for a small number of cutpoints, e.g., around 1e-3. Decrease `dv_alpha` as the number of cutpoints increases.
|
|
256
|
+
- Try to start with a small `mean_alpha`, e.g., between 1e-5 and 1e-4. Alternatively, set it to zero if unrestricted deviations from the baseline's mean are acceptable.
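
For intuition about the two distribution-level penalties, the sketch below computes a discrete KL divergence between a baseline and a refined histogram in both directions, plus a simple roughness measure based on squared second differences. These are generic textbook forms chosen for illustration; the exact expressions inside `drn_loss`, and which direction `'forwards'` denotes, are not assumed here, and `kl_divergence` and `roughness` are hypothetical helpers.

``` python
import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """Discrete KL(p || q) for two probability vectors over the same intervals."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))


def roughness(density):
    """Sum of squared second differences; zero for a locally linear density."""
    second_diff = np.diff(np.asarray(density, dtype=float), n=2)
    return float(np.sum(second_diff ** 2))


baseline_probs = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
refined_probs = np.array([0.05, 0.25, 0.4, 0.25, 0.05])

# KL divergence is asymmetric, so the direction is a modelling choice.
kl_baseline_to_refined = kl_divergence(baseline_probs, refined_probs)
kl_refined_to_baseline = kl_divergence(refined_probs, baseline_probs)

# A jagged density is penalised more heavily than a smooth one.
smooth_penalty = roughness([0.1, 0.2, 0.3, 0.4])
jagged_penalty = roughness([0.1, 0.4, 0.1, 0.4])
```

Scaling such terms by small coefficients, as with `kl_alpha` and `dv_alpha`, lets the likelihood dominate while still discouraging large or erratic departures from the baseline.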
|
|
257
|
+
|
|
258
|
+
``` python
|
|
259
|
+
# Initialise and train the DRN model
|
|
260
|
+
torch.manual_seed(23)
|
|
261
|
+
drn_model = DRN(
|
|
262
|
+
num_features=x_train.shape[1],
|
|
263
|
+
cutpoints=cutpoints_DRN,
|
|
264
|
+
glm=baseline,
|
|
265
|
+
hidden_size=128,
|
|
266
|
+
num_hidden_layers=2,
|
|
267
|
+
baseline_start=False,
|
|
268
|
+
dropout_rate=0.2
|
|
269
|
+
)
|
|
270
|
+
|
|
271
|
+
train(
|
|
272
|
+
drn_model,
|
|
273
|
+
lambda pred, y: models.drn_loss(
|
|
274
|
+
pred,
|
|
275
|
+
y,
|
|
276
|
+
kl_alpha=1e-4, # KL divergence penalty
|
|
277
|
+
dv_alpha=1e-3, # Roughness penalty
|
|
278
|
+
mean_alpha=1e-5, # Mean penalty
|
|
279
|
+
kl_direction = 'forwards'
|
|
280
|
+
),
|
|
281
|
+
train_dataset,
|
|
282
|
+
val_dataset,
|
|
283
|
+
lr=0.0005,
|
|
284
|
+
batch_size=256,
|
|
285
|
+
log_interval=1,
|
|
286
|
+
patience=30,
|
|
287
|
+
epochs=1000
|
|
288
|
+
)
|
|
289
|
+
|
|
290
|
+
drn_model.eval()
|
|
291
|
+
```
|
|
292
|
+
|
|
293
|
+
|
|
294
|
+
## Example: Distributional Forecasts and Interpretability
|
|
295
|
+
|
|
296
|
+
This section demonstrates how to use a DRN to forecast probability density functions (PDFs), cumulative distribution functions (CDFs), and key distributional properties, and how to evaluate distributional forecasting performance.
|
|
297
|
+
|
|
298
|
+
``` python
|
|
299
|
+
from drn import metrics
|
|
300
|
+
from drn import interpretability
|
|
301
|
+
```
|
|
302
|
+
|
|
303
|
+
### Distributional Properties: Mean and Quantiles
|
|
304
|
+
|
|
305
|
+
Currently, we support mean and quantile forecasts.
|
|
306
|
+
Variance, skewness, and kurtosis can be derived using the density function.
|
|
307
|
+
|
|
308
|
+
``` python
|
|
309
|
+
test_instance = X_test[:1]
|
|
310
|
+
mean_pred = drn_model.distributions(test_instance).mean
|
|
311
|
+
_10_quantile = drn_model.distributions(test_instance).quantiles(
|
|
312
|
+
[10],
|
|
313
|
+
l = torch.min(Y_train) * 3 if torch.min(Y_train) < 0 else 0.0,
|
|
314
|
+
u = torch.max(Y_train) * 3)
|
|
315
|
+
_90_quantile = drn_model.distributions(test_instance).quantiles(
|
|
316
|
+
[90],
|
|
317
|
+
l = torch.min(Y_train) * 3 if torch.min(Y_train) < 0 else 0.0,
|
|
318
|
+
u = torch.max(Y_train) * 3)
|
|
319
|
+
_99_quantile = drn_model.distributions(test_instance).quantiles(
|
|
320
|
+
[99],
|
|
321
|
+
l = torch.min(Y_train) * 3 if torch.min(Y_train) < 0 else 0.0,
|
|
322
|
+
u = torch.max(Y_train) * 3)
|
|
323
|
+
|
|
324
|
+
for metric_name, metric in zip(['Mean', '10% Quantile', '90% Quantile', '99% Quantile'],
|
|
325
|
+
[mean_pred, _10_quantile, _90_quantile, _99_quantile]):
|
|
326
|
+
print(f'{metric_name}: {metric.item()}')
|
|
327
|
+
```
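
Variance, skewness, and kurtosis are not returned directly, but they can be approximated numerically from a density evaluated on a grid. The sketch below does this for a standard normal density so the results are easy to check; `moments_from_density` is a hypothetical helper, and it assumes you have already evaluated the forecast density on an evenly spaced grid that covers essentially all of the probability mass.

``` python
import numpy as np


def moments_from_density(grid, pdf):
    """Approximate mean, variance, skewness, and kurtosis from a gridded density."""
    grid = np.asarray(grid, dtype=float)
    pdf = np.asarray(pdf, dtype=float)
    dx = grid[1] - grid[0]  # assumes an evenly spaced grid

    # Turn density values into probability weights that sum to one.
    weights = pdf * dx
    weights = weights / weights.sum()

    mean = np.sum(weights * grid)
    centred = grid - mean
    variance = np.sum(weights * centred ** 2)
    skewness = np.sum(weights * centred ** 3) / variance ** 1.5
    kurtosis = np.sum(weights * centred ** 4) / variance ** 2
    return mean, variance, skewness, kurtosis


grid = np.linspace(-8.0, 8.0, 4001)
pdf = np.exp(-0.5 * grid ** 2) / np.sqrt(2.0 * np.pi)  # standard normal density
mean, variance, skewness, kurtosis = moments_from_density(grid, pdf)
```

The kurtosis here is the raw fourth standardised moment (about 3 for a normal distribution), not the excess kurtosis.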
|
|
328
|
+
|
|
329
|
+
### Forecasting Performance: Evaluation Metrics
|
|
330
|
+
|
|
331
|
+
Generate both the `distributions` and the `cdf` objects.
|
|
332
|
+
|
|
333
|
+
``` python
|
|
334
|
+
names = ["GLM", "DRN"]
|
|
335
|
+
dr_models = [baseline, drn_model]
|
|
336
|
+
|
|
337
|
+
print("Generating distributional forecasts")
|
|
338
|
+
dists_train, dists_val, dists_test = {}, {}, {}
|
|
339
|
+
for name, model in zip(names, dr_models):
|
|
340
|
+
print(f"- {name}")
|
|
341
|
+
dists_train[name] = model.distributions(X_train)
|
|
342
|
+
dists_val[name] = model.distributions(X_val)
|
|
343
|
+
dists_test[name] = model.distributions(X_test)
|
|
344
|
+
|
|
345
|
+
print("Calculating CDF over a grid")
|
|
346
|
+
GRID_SIZE = 3000
|
|
347
|
+
grid = torch.linspace(0, np.max(y_train) * 1.1, GRID_SIZE).unsqueeze(-1)
|
|
348
|
+
|
|
349
|
+
cdfs_train, cdfs_val, cdfs_test = {}, {}, {}
|
|
350
|
+
for name, model in zip(names, dr_models):
|
|
351
|
+
print(f"- {name}")
|
|
352
|
+
cdfs_train[name] = dists_train[name].cdf(grid)
|
|
353
|
+
cdfs_val[name] = dists_val[name].cdf(grid)
|
|
354
|
+
cdfs_test[name] = dists_test[name].cdf(grid)
|
|
355
|
+
```
|
|
356
|
+
|
|
357
|
+
Then, generate the evaluation metrics: NLL, CRPS, RMSE, and QLs.
|
|
358
|
+
|
|
359
|
+
``` python
|
|
360
|
+
print("Calculating negative log likelihoods")
|
|
361
|
+
nlls_train, nlls_val, nlls_test = {}, {}, {}
|
|
362
|
+
for name, model in zip(names, dr_models):
|
|
363
|
+
nlls_train[name] = -dists_train[name].log_prob(Y_train).mean()
|
|
364
|
+
nlls_val[name] = -dists_val[name].log_prob(Y_val).mean()
|
|
365
|
+
nlls_test[name] = -dists_test[name].log_prob(Y_test).mean()
|
|
366
|
+
|
|
367
|
+
for nll_dict, df_name in zip([nlls_train, nlls_val, nlls_test], ['training', 'val', 'test']):
|
|
368
|
+
print(f'NLL on {df_name} set')
|
|
369
|
+
for name in names:
|
|
370
|
+
print(f"{name}: {nll_dict[name]:.4f}")
|
|
371
|
+
print('-------------------------------')
|
|
372
|
+
|
|
373
|
+
print("Calculating CRPS")
|
|
374
|
+
grid = grid.squeeze()
|
|
375
|
+
crps_train, crps_val, crps_test = {}, {}, {}
|
|
376
|
+
for name, model in zip(names, dr_models):
|
|
377
|
+
crps_train[name] = metrics.crps(Y_train, grid, cdfs_train[name])
|
|
378
|
+
crps_val[name] = metrics.crps(Y_val, grid, cdfs_val[name])
|
|
379
|
+
crps_test[name] = metrics.crps(Y_test, grid, cdfs_test[name])
|
|
380
|
+
|
|
381
|
+
for crps_dict, df_name in zip([crps_train, crps_val, crps_test], ['training', 'val', 'test']):
|
|
382
|
+
print(f'CRPS on {df_name} set')
|
|
383
|
+
for name in names:
|
|
384
|
+
print(f"{name}: {crps_dict[name].mean():.4f}")
|
|
385
|
+
print('------------------------------')
|
|
386
|
+
|
|
387
|
+
print("Calculating RMSE")
|
|
388
|
+
rmse_train, rmse_val, rmse_test = {}, {}, {}
|
|
389
|
+
for name, model in zip(names, dr_models):
|
|
390
|
+
means_train = dists_train[name].mean
|
|
391
|
+
means_val = dists_val[name].mean
|
|
392
|
+
means_test = dists_test[name].mean
|
|
393
|
+
rmse_train[name] = metrics.rmse(y_train, means_train)
|
|
394
|
+
rmse_val[name] = metrics.rmse(y_val, means_val)
|
|
395
|
+
rmse_test[name] = metrics.rmse(y_test, means_test)
|
|
396
|
+
|
|
397
|
+
for rmse_dict, df_name in zip([rmse_train, rmse_val, rmse_test], ['training', 'validation', 'test']):
|
|
398
|
+
print(f'RMSE on {df_name} set')
|
|
399
|
+
for name in names:
|
|
400
|
+
print(f"{name}: {rmse_dict[name].mean():.4f}")
|
|
401
|
+
print('-------------------------------')
|
|
402
|
+
|
|
403
|
+
print("Calculating Quantile Loss")
|
|
404
|
+
ql_90_train, ql_90_val, ql_90_test = {}, {}, {}
|
|
405
|
+
for features, response, dataset_name, ql_dict in zip(
|
|
406
|
+
[X_train, X_val, X_test], [y_train, y_val, y_test], ['Training', 'Validation', 'Test'], [ql_90_train, ql_90_val, ql_90_test]
|
|
407
|
+
):
|
|
408
|
+
print(f'{dataset_name} Dataset Quantile Loss(es)')
|
|
409
|
+
for model, model_name in zip(dr_models, names):
|
|
410
|
+
ql_dict[model_name] = metrics.quantile_losses(
|
|
411
|
+
0.9, model, model_name, features, response,
|
|
412
|
+
max_iter=1000, tolerance=1e-4,
|
|
413
|
+
l=torch.Tensor([np.min(y_train) - 3 * (np.max(y_train) - np.min(y_train))]),
|
|
414
|
+
u=torch.Tensor([np.max(y_train) + 3 * (np.max(y_train) - np.min(y_train))])
|
|
415
|
+
)
|
|
416
|
+
print('----------------------')
|
|
417
|
+
|
|
418
|
+
```
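
For readers unfamiliar with these scores, the following sketch shows the standard definitions of the quantile (pinball) loss and of the CRPS computed from a CDF on a grid. It is a self-contained illustration under the usual textbook definitions; `pinball_loss` and `crps_from_cdf` are hypothetical helpers whose signatures do not mirror `metrics.quantile_losses` or `metrics.crps`.

``` python
from math import erf, sqrt

import numpy as np


def pinball_loss(y, q, tau):
    """Quantile loss of predicting quantile q at level tau when y is observed."""
    diff = np.asarray(y, dtype=float) - np.asarray(q, dtype=float)
    return np.maximum(tau * diff, (tau - 1.0) * diff)


def crps_from_cdf(y, grid, cdf_values):
    """CRPS as the integral of (F(z) - 1{z >= y})^2 over an evenly spaced grid."""
    dx = grid[1] - grid[0]
    indicator = (grid >= y).astype(float)
    return float(np.sum((cdf_values - indicator) ** 2) * dx)


# Standard normal forecast CDF evaluated on a grid.
grid = np.linspace(-5.0, 5.0, 2001)
normal_cdf = np.array([0.5 * (1.0 + erf(z / sqrt(2.0))) for z in grid])

crps_at_zero = crps_from_cdf(0.0, grid, normal_cdf)

# At the 90% level, under-prediction costs nine times more than over-prediction.
loss_under = float(pinball_loss(1.0, 0.0, 0.9))
loss_over = float(pinball_loss(0.0, 1.0, 0.9))
```

The grid must extend far enough into both tails for the CRPS integral to be accurate, which is why the choice of grid bounds matters in practice.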
|
|
419
|
+
|
|
420
|
+
### Interpretability: Kernel SHAP-Embedded PDF and CDF
|
|
421
|
+
|
|
422
|
+
To plot the PDF and CDF, you need to first initialise an `explainer`.
|
|
423
|
+
The instance to be examined should be a DataFrame, with feature values that are neither standardised nor encoded.
|
|
424
|
+
We currently support the Kernel SHAP method.
|
|
425
|
+
|
|
426
|
+
``` python
|
|
427
|
+
test_instance_df = x_test_raw.iloc[:1]
|
|
428
|
+
Y_instance = Y_test[:1]
|
|
429
|
+
|
|
430
|
+
# Initialise the Explainer
|
|
431
|
+
drn_explainer = interpretability.DRNExplainer(
|
|
432
|
+
drn_model,
|
|
433
|
+
baseline,
|
|
434
|
+
cutpoints_DRN,
|
|
435
|
+
x_train_raw,
|
|
436
|
+
cat_features,
|
|
437
|
+
all_categories,
|
|
438
|
+
ct
|
|
439
|
+
)
|
|
440
|
+
|
|
441
|
+
# Plot the PDF before and after refinement
|
|
442
|
+
drn_explainer.plot_adjustment_factors(
|
|
443
|
+
instance=test_instance_df,
|
|
444
|
+
num_interpolations=1_000,
|
|
445
|
+
plot_adjustments_labels=False,
|
|
446
|
+
x_range=(-2, 2),
|
|
447
|
+
)
|
|
448
|
+
|
|
449
|
+
# Use Kernel SHAP to explain the mean adjustment
|
|
450
|
+
drn_explainer.plot_dp_adjustment_shap(
|
|
451
|
+
instance_raw=test_instance_df,
|
|
452
|
+
method='Kernel',
|
|
453
|
+
nsamples_background_fraction=0.5,
|
|
454
|
+
top_K_features=2,
|
|
455
|
+
labelling_gap=0.1,
|
|
456
|
+
dist_property='Mean',
|
|
457
|
+
x_range=(-1, 1),
|
|
458
|
+
y_range=(0.0, 2.0),
|
|
459
|
+
observation=Y_instance,
|
|
460
|
+
adjustment=True,
|
|
461
|
+
shap_fontsize=15,
|
|
462
|
+
figsize=(7, 7),
|
|
463
|
+
plot_title='Explaining the Mean Adjustment',
|
|
464
|
+
)
|
|
465
|
+
|
|
466
|
+
# Explain the DRN's 90% quantile prediction from the ground up
|
|
467
|
+
drn_explainer.cdf_plot(
|
|
468
|
+
instance=test_instance_df,
|
|
469
|
+
method='Kernel',
|
|
470
|
+
nsamples_background_fraction=0.5,
|
|
471
|
+
top_K_features=2,
|
|
472
|
+
labelling_gap=0.1,
|
|
473
|
+
dist_property='90% Quantile',
|
|
474
|
+
x_range=(-0.5, 1.0),
|
|
475
|
+
y_range=(0.8, 1.0),
|
|
476
|
+
adjustment=False,
|
|
477
|
+
plot_baseline=False,
|
|
478
|
+
shap_fontsize=15,
|
|
479
|
+
figsize=(7, 7),
|
|
480
|
+
plot_title='90% Quantile Explanation',
|
|
481
|
+
)
|
|
482
|
+
```
|
|
483
|
+
|
|
484
|
+
## Related Repository
|
|
485
|
+
|
|
486
|
+
This package accompanies the [DRN paper](https://arxiv.org/abs/2406.00998) on the Distributional Refinement Network (DRN).
|
|
487
|
+
The related repository, available at [https://github.com/agi-lab/DRN](https://github.com/agi-lab/DRN), contains the Python notebooks and additional resources needed to reproduce the results presented in the [DRN paper](https://arxiv.org/abs/2406.00998).
|
|
488
|
+
|
|
489
|
+
## License
|
|
490
|
+
|
|
491
|
+
See [LICENSE.md](https://github.com/EricTianDong/drn/tree/main?tab=MIT-1-ov-file).
|
|
492
|
+
|
|
493
|
+
## Authors
|
|
494
|
+
|
|
495
|
+
- Eric Dong (author, maintainer),
|
|
496
|
+
- Patrick Laub (author).
|
|
497
|
+
|
|
498
|
+
|
|
499
|
+
## Citation
|
|
500
|
+
|
|
501
|
+
``` bibtex
|
|
502
|
+
@misc{avanzi2024distributional,
|
|
503
|
+
title={Distributional Refinement Network: Distributional Forecasting via Deep Learning},
|
|
504
|
+
author={Benjamin Avanzi and Eric Dong and Patrick J. Laub and Bernard Wong},
|
|
505
|
+
year={2024},
|
|
506
|
+
eprint={2406.00998},
|
|
507
|
+
archivePrefix={arXiv},
|
|
508
|
+
primaryClass={stat.ML}
|
|
509
|
+
}
|
|
510
|
+
```
|
|
511
|
+
|
|
512
|
+
## Contact
|
|
513
|
+
|
|
514
|
+
For any questions or further information, please contact tiandong1999@gmail.com.
|
|
515
|
+
|
|
516
|
+
|