puncc 0.7__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- puncc-0.7/PKG-INFO +208 -0
- puncc-0.7/README.md +190 -0
- puncc-0.7/deel/puncc/__init__.py +33 -0
- puncc-0.7/deel/puncc/anomaly_detection.py +231 -0
- puncc-0.7/deel/puncc/api/__init__.py +22 -0
- puncc-0.7/deel/puncc/api/calibration.py +542 -0
- puncc-0.7/deel/puncc/api/conformalization.py +411 -0
- puncc-0.7/deel/puncc/api/experimental.py +128 -0
- puncc-0.7/deel/puncc/api/nonconformity_scores.py +289 -0
- puncc-0.7/deel/puncc/api/prediction.py +465 -0
- puncc-0.7/deel/puncc/api/prediction_sets.py +301 -0
- puncc-0.7/deel/puncc/api/splitting.py +195 -0
- puncc-0.7/deel/puncc/api/utils.py +351 -0
- puncc-0.7/deel/puncc/classification.py +349 -0
- puncc-0.7/deel/puncc/metrics.py +128 -0
- puncc-0.7/deel/puncc/plotting.py +258 -0
- puncc-0.7/deel/puncc/regression.py +1020 -0
- puncc-0.7/puncc.egg-info/PKG-INFO +208 -0
- puncc-0.7/puncc.egg-info/SOURCES.txt +23 -0
- puncc-0.7/puncc.egg-info/dependency_links.txt +1 -0
- puncc-0.7/puncc.egg-info/requires.txt +23 -0
- puncc-0.7/puncc.egg-info/top_level.txt +1 -0
- puncc-0.7/pyproject.toml +2 -0
- puncc-0.7/setup.cfg +4 -0
- puncc-0.7/setup.py +90 -0
puncc-0.7/PKG-INFO
ADDED
|
@@ -0,0 +1,208 @@
|
|
|
1
|
+
Metadata-Version: 2.1
|
|
2
|
+
Name: puncc
|
|
3
|
+
Version: 0.7
|
|
4
|
+
Summary: Predictive UNcertainty Calibration and Conformalization Library
|
|
5
|
+
Home-page: https://github.com/deel-ai/puncc
|
|
6
|
+
Author: Mouhcine Mendil, Luca Mossina, Joseba Dalmau
|
|
7
|
+
Author-email: mouhcine.mendil@irt-saintexupery.com, luca.mossina@irt-saintexupery.com, joseba.dalmau@irt-saintexupery.com
|
|
8
|
+
License: UNKNOWN
|
|
9
|
+
Description: <!-- Banner -->
|
|
10
|
+
<div align="center">
|
|
11
|
+
<picture>
|
|
12
|
+
<source media="(prefers-color-scheme: dark)" srcset="docs/assets/banner_dark.png">
|
|
13
|
+
<source media="(prefers-color-scheme: light)" srcset="docs/assets/banner_light.png">
|
|
14
|
+
<img src="docs/assets/banner_light.png" alt="Puncc" width="90%" align="right">
|
|
15
|
+
</picture>
|
|
16
|
+
</div>
|
|
17
|
+
<br>
|
|
18
|
+
|
|
19
|
+
<!-- Badges -->
|
|
20
|
+
<div align="center">
|
|
21
|
+
<a href="#">
|
|
22
|
+
<img src="https://img.shields.io/badge/Python-3.8 +-efefef">
|
|
23
|
+
</a>
|
|
24
|
+
<a href="#">
|
|
25
|
+
<img src="https://img.shields.io/badge/License-MIT-efefef">
|
|
26
|
+
</a>
|
|
27
|
+
<a href="https://github.com/deel-ai/puncc/actions/workflows/linter.yml">
|
|
28
|
+
<img alt="PyLint" src="https://github.com/deel-ai/puncc/actions/workflows/linter.yml/badge.svg">
|
|
29
|
+
</a>
|
|
30
|
+
<a href="https://github.com/deel-ai/puncc/actions/workflows/tests.yml">
|
|
31
|
+
<img alt="Tox" src="https://github.com/deel-ai/puncc/actions/workflows/tests.yml/badge.svg">
|
|
32
|
+
</a>
|
|
33
|
+
</div>
|
|
34
|
+
<br>
|
|
35
|
+
|
|
36
|
+
***Puncc*** (short for **P**redictive **un**certainty **c**alibration and **c**onformalization) is an open-source Python library. It seamlessly integrates a collection of state-of-the-art conformal prediction algorithms and associated techniques for diverse machine learning tasks, including regression, classification and anomaly detection.
|
|
37
|
+
***Puncc*** can be used with any predictive model to provide rigorous uncertainty estimations.
|
|
38
|
+
Under data exchangeability (e.g., *i.i.d.* data), the generated prediction sets are guaranteed to cover the true outputs with probability at least $1 - \alpha$, where $\alpha$ is a user-defined error rate.
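The guarantee above is usually written as a marginal coverage inequality. The notation used here ($\hat{C}_\alpha$ for the prediction set, $(X_{n+1}, Y_{n+1})$ for a new test point) is an assumption introduced for illustration and does not come from the package:

$$
\mathbb{P}\left(Y_{n+1} \in \hat{C}_\alpha(X_{n+1})\right) \geq 1 - \alpha
$$

The probability is taken jointly over the calibration data and the test point, so the guarantee holds on average rather than conditionally on a particular input.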
|
|
39
|
+
|
|
40
|
+
Documentation is available [**online**](https://deel-ai.github.io/puncc/index.html).
|
|
41
|
+
|
|
42
|
+
## Table of contents
|
|
43
|
+
|
|
44
|
+
- [Installation](#-installation)
|
|
45
|
+
- [Documentation](#-documentation)
|
|
46
|
+
- [Tutorials](#-tutorials)
|
|
47
|
+
- [QuickStart](#-quickstart)
|
|
48
|
+
- [Citation](#-citation)
|
|
49
|
+
- [Contributing](#-contributing)
|
|
50
|
+
- [Acknowledgments](#-acknowledgments)
|
|
51
|
+
- [Creators](#-creators)
|
|
52
|
+
- [License](#-license)
|
|
53
|
+
|
|
54
|
+
## Installation
|
|
55
|
+
|
|
56
|
+
*puncc* requires Python 3.8 or higher and several libraries, including scikit-learn and NumPy. We recommend installing *puncc* in a virtual environment to avoid interfering with your system's dependencies.
|
|
57
|
+
|
|
58
|
+
You can directly install the library using pip:
|
|
59
|
+
|
|
60
|
+
```bash
|
|
61
|
+
pip install git+https://github.com/deel-ai/puncc
|
|
62
|
+
```
|
|
63
|
+
|
|
64
|
+
<!--
|
|
65
|
+
You can alternatively clone the repo and use the makefile to automatically create a virtual environment
|
|
66
|
+
and install the requirements:
|
|
67
|
+
|
|
68
|
+
* For users:
|
|
69
|
+
|
|
70
|
+
```bash
|
|
71
|
+
make install-user
|
|
72
|
+
```
|
|
73
|
+
|
|
74
|
+
* For developers:
|
|
75
|
+
|
|
76
|
+
```bash
|
|
77
|
+
make prepare-dev
|
|
78
|
+
```
|
|
79
|
+
-->
|
|
80
|
+
|
|
81
|
+
## Documentation
|
|
82
|
+
|
|
83
|
+
For comprehensive documentation, we encourage you to visit the [**official documentation page**](https://deel-ai.github.io/puncc/index.html).
|
|
84
|
+
|
|
85
|
+
## Tutorials
|
|
86
|
+
|
|
87
|
+
We highly recommend following the introductory tutorials to get familiar with the library and its API:
|
|
88
|
+
|
|
89
|
+
* [**Introduction tutorial**](docs/puncc_intro.ipynb) <sub> [Open in Colab](https://colab.research.google.com/drive/1TC_BM7JaEYtBIq6yuYB5U4cJjeg71Tch) </sub>
|
|
90
|
+
|
|
91
|
+
* [**API tutorial**](docs/api_intro.ipynb) <sub> [Open in Colab](https://colab.research.google.com/drive/1d06qQweM1X1eSrCnixA_MLEZil1vXewj) </sub>
|
|
92
|
+
|
|
93
|
+
You can also familiarize yourself with the architecture of *puncc* to build your own conformal prediction methods more efficiently:
|
|
94
|
+
|
|
95
|
+
* [**Architecture overview**](docs/puncc_architecture.ipynb)
|
|
96
|
+
|
|
97
|
+
## Quickstart
|
|
98
|
+
|
|
99
|
+
Conformal prediction transforms point predictions into prediction intervals that cover the true value with high probability. The figure below shows the result of applying the split conformal algorithm to a linear regressor.
|
|
100
|
+
|
|
101
|
+
<figure style="text-align:center">
|
|
102
|
+
<img src="docs/assets/cp_process.png"/>
|
|
103
|
+
</figure>
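As a rough illustration of what the split conformal algorithm does under the hood, here is a minimal NumPy sketch. It is not puncc's implementation: the function name `split_conformal_interval`, the least-squares model and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def split_conformal_interval(X_fit, y_fit, X_calib, y_calib, X_test, alpha):
    """Return (y_pred, lower, upper) for X_test using split conformal prediction."""

    def design(X):
        # Append a column of ones so the linear model has an intercept.
        return np.column_stack([X, np.ones(len(X))])

    # Fit a least-squares linear model on the fit subset only.
    coef, *_ = np.linalg.lstsq(design(X_fit), y_fit, rcond=None)

    # Nonconformity scores: absolute residuals on the held-out calibration subset.
    scores = np.abs(y_calib - design(X_calib) @ coef)

    # Finite-sample corrected quantile: the ceil((n + 1)(1 - alpha))-th smallest score.
    n = len(scores)
    rank = int(np.ceil((n + 1) * (1 - alpha)))
    if rank > n:
        raise ValueError("alpha is too small for this calibration set size")
    q = np.sort(scores)[rank - 1]

    y_pred = design(X_test) @ coef
    return y_pred, y_pred - q, y_pred + q


# Synthetic data (an assumption for the demo): y = 2x + 1 + Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(1000, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(scale=0.5, size=1000)

# 600 points to fit, 200 to calibrate, 200 to test.
y_pred, y_lo, y_hi = split_conformal_interval(
    X[:600], y[:600], X[600:800], y[600:800], X[800:], alpha=0.1
)
coverage = np.mean((y_lo <= y[800:]) & (y[800:] <= y_hi))
print(f"empirical coverage: {coverage:.2f}")
```

The interval half-width `q` is the same for every test point here because the score is a plain absolute residual; other nonconformity scores yield input-dependent widths.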
|
|
104
|
+
|
|
105
|
+
Many conformal prediction algorithms can easily be applied using *puncc*. The code snippet below shows split conformal prediction wrapping a linear model, in a few lines of code:
|
|
106
|
+
|
|
107
|
+
```python
|
|
108
|
+
from sklearn import linear_model
|
|
109
|
+
from deel.puncc.api.prediction import BasePredictor
from deel.puncc.regression import SplitCP
|
|
110
|
+
|
|
111
|
+
# Load training data and test data
|
|
112
|
+
# ...
|
|
113
|
+
|
|
114
|
+
# Instantiate a linear regression model
|
|
115
|
+
# linear_model = ...
|
|
116
|
+
|
|
117
|
+
|
|
118
|
+
# Create a predictor to wrap the linear regression model defined earlier.
|
|
119
|
+
# This enables interoperability with different ML libraries.
|
|
120
|
+
# The argument `is_trained` is set to False to indicate that the linear model
|
|
121
|
+
# needs to be trained before the calibration.
|
|
122
|
+
lin_reg_predictor = BasePredictor(linear_model, is_trained=False)
|
|
123
|
+
|
|
124
|
+
# Instantiate the split CP wrapper around the linear predictor.
|
|
125
|
+
split_cp = SplitCP(lin_reg_predictor)
|
|
126
|
+
|
|
127
|
+
# Fit model (as `is_trained` is False) on the fit dataset and
|
|
128
|
+
# compute the residuals on the calibration dataset.
|
|
129
|
+
# The fit (resp. calibration) subset is randomly sampled from the training
|
|
130
|
+
# data and constitutes 80% (resp. 20%) of it (fit_ratio = 80%).
|
|
131
|
+
split_cp.fit(X_train, y_train, fit_ratio=.8)
|
|
132
|
+
|
|
133
|
+
# The predict returns the output of the linear model y_pred and
|
|
134
|
+
# the calibrated interval [y_pred_lower, y_pred_upper].
|
|
135
|
+
y_pred, y_pred_lower, y_pred_upper = split_cp.predict(X_test, alpha=alpha)
|
|
136
|
+
```
|
|
137
|
+
|
|
138
|
+
The library provides several metrics (`deel.puncc.metrics`) and plotting capabilities (`deel.puncc.plotting`) to evaluate and visualize the results of a conformal procedure. For a target error rate of $\alpha = 0.1$, the marginal coverage reached in this example on the test set is higher than $90$% (see [Introduction tutorial](docs/puncc_intro.ipynb)):
|
|
139
|
+
|
|
140
|
+
<figure style="text-align:center">
|
|
141
|
+
<img src="docs/assets/results_quickstart_split_cp_pi.png" alt="90% Prediction Interval with the Split Conformal Prediction Method"/>
|
|
142
|
+
<div align=center>90% Prediction Interval with Split Conformal Prediction.</div>
|
|
143
|
+
</figure>
|
|
144
|
+
<br>
|
|
145
|
+
|
|
146
|
+
### More flexibility with the API
|
|
147
|
+
|
|
148
|
+
*Puncc* provides two ways of defining and using conformal prediction wrappers:
|
|
149
|
+
- A direct approach to run state-of-the-art conformal prediction procedures. This is what we used in the previous conformal regression example.
|
|
150
|
+
- **Low-level API**: a more flexible approach based on full customization of the prediction model, the choice of nonconformity scores and the split between fit and calibration datasets.
|
|
151
|
+
|
|
152
|
+
A quick comparison of both approaches is provided in the [API tutorial](docs/api_intro.ipynb) for a regression problem.
|
|
153
|
+
|
|
154
|
+
## Citation
|
|
155
|
+
|
|
156
|
+
If you use our library for your work, please cite our paper:
|
|
157
|
+
|
|
158
|
+
```
|
|
159
|
+
@inproceedings{mendil2023puncc,
|
|
160
|
+
title={PUNCC: a Python Library for Predictive Uncertainty Calibration and Conformalization},
|
|
161
|
+
author={Mendil, Mouhcine and Mossina, Luca and Vigouroux, David},
|
|
162
|
+
booktitle={Conformal and Probabilistic Prediction with Applications},
|
|
163
|
+
pages={582--601},
|
|
164
|
+
year={2023},
|
|
165
|
+
organization={PMLR}
|
|
166
|
+
}
|
|
167
|
+
```
|
|
168
|
+
|
|
169
|
+
*Puncc* has been used to support the work presented in our COPA 2022 paper on conformal prediction for time series.
|
|
170
|
+
|
|
171
|
+
```
|
|
172
|
+
@inproceedings{mendil2022robust,
|
|
173
|
+
title={Robust Gas Demand Forecasting With Conformal Prediction},
|
|
174
|
+
author={Mendil, Mouhcine and Mossina, Luca and Nabhan, Marc and Pasini, Kevin},
|
|
175
|
+
booktitle={Conformal and Probabilistic Prediction with Applications},
|
|
176
|
+
pages={169--187},
|
|
177
|
+
year={2022},
|
|
178
|
+
organization={PMLR}
|
|
179
|
+
}
|
|
180
|
+
```
|
|
181
|
+
|
|
182
|
+
## Contributing
|
|
183
|
+
|
|
184
|
+
Contributions are welcome! Feel free to report an issue or open a pull
|
|
185
|
+
request. Take a look at our guidelines [here](CONTRIBUTING.md).
|
|
186
|
+
|
|
187
|
+
## Acknowledgments
|
|
188
|
+
|
|
189
|
+
<img align="right" src="https://www.deel.ai/wp-content/uploads/2021/05/logo-DEEL.png" width="25%">
|
|
190
|
+
This project received funding from the French “Investing for the Future – PIA3” program within the Artificial and Natural Intelligence Toulouse Institute (ANITI). The authors gratefully acknowledge the support of the <a href="https://www.deel.ai/"> DEEL </a> project.
|
|
191
|
+
|
|
192
|
+
## Creators
|
|
193
|
+
|
|
194
|
+
[Mouhcine MENDIL](https://github.com/M-Mouhcine) initially developed this library as a research tool, with assistance from [Luca MOSSINA](https://github.com/lmossina). We have recently welcomed [Joseba DALMAU](https://github.com/jdalch) to the team to help enhance **puncc** and develop new features.
|
|
195
|
+
|
|
196
|
+
## License
|
|
197
|
+
|
|
198
|
+
The package is released under the [MIT](LICENSES/headers/MIT-Clause.txt) license.
|
|
199
|
+
|
|
200
|
+
Platform: UNKNOWN
|
|
201
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
202
|
+
Classifier: Programming Language :: Python
|
|
203
|
+
Classifier: Programming Language :: Python :: 3
|
|
204
|
+
Classifier: Operating System :: OS Independent
|
|
205
|
+
Requires-Python: >=3.8
|
|
206
|
+
Description-Content-Type: text/markdown
|
|
207
|
+
Provides-Extra: interactive
|
|
208
|
+
Provides-Extra: dev
|
puncc-0.7/README.md
ADDED
|
@@ -0,0 +1,190 @@
|
|
|
1
|
+
<!-- Banner -->
|
|
2
|
+
<div align="center">
|
|
3
|
+
<picture>
|
|
4
|
+
<source media="(prefers-color-scheme: dark)" srcset="docs/assets/banner_dark.png">
|
|
5
|
+
<source media="(prefers-color-scheme: light)" srcset="docs/assets/banner_light.png">
|
|
6
|
+
<img src="docs/assets/banner_light.png" alt="Puncc" width="90%" align="right">
|
|
7
|
+
</picture>
|
|
8
|
+
</div>
|
|
9
|
+
<br>
|
|
10
|
+
|
|
11
|
+
<!-- Badges -->
|
|
12
|
+
<div align="center">
|
|
13
|
+
<a href="#">
|
|
14
|
+
<img src="https://img.shields.io/badge/Python-3.8 +-efefef">
|
|
15
|
+
</a>
|
|
16
|
+
<a href="#">
|
|
17
|
+
<img src="https://img.shields.io/badge/License-MIT-efefef">
|
|
18
|
+
</a>
|
|
19
|
+
<a href="https://github.com/deel-ai/puncc/actions/workflows/linter.yml">
|
|
20
|
+
<img alt="PyLint" src="https://github.com/deel-ai/puncc/actions/workflows/linter.yml/badge.svg">
|
|
21
|
+
</a>
|
|
22
|
+
<a href="https://github.com/deel-ai/puncc/actions/workflows/tests.yml">
|
|
23
|
+
<img alt="Tox" src="https://github.com/deel-ai/puncc/actions/workflows/tests.yml/badge.svg">
|
|
24
|
+
</a>
|
|
25
|
+
</div>
|
|
26
|
+
<br>
|
|
27
|
+
|
|
28
|
+
***Puncc*** (short for **P**redictive **un**certainty **c**alibration and **c**onformalization) is an open-source Python library. It seamlessly integrates a collection of state-of-the-art conformal prediction algorithms and associated techniques for diverse machine learning tasks, including regression, classification and anomaly detection.
|
|
29
|
+
***Puncc*** can be used with any predictive model to provide rigorous uncertainty estimations.
|
|
30
|
+
Under data exchangeability (e.g., *i.i.d.* data), the generated prediction sets are guaranteed to cover the true outputs with probability at least $1 - \alpha$, where $\alpha$ is a user-defined error rate.
|
|
31
|
+
|
|
32
|
+
Documentation is available [**online**](https://deel-ai.github.io/puncc/index.html).
|
|
33
|
+
|
|
34
|
+
## Table of contents
|
|
35
|
+
|
|
36
|
+
- [Installation](#-installation)
|
|
37
|
+
- [Documentation](#-documentation)
|
|
38
|
+
- [Tutorials](#-tutorials)
|
|
39
|
+
- [QuickStart](#-quickstart)
|
|
40
|
+
- [Citation](#-citation)
|
|
41
|
+
- [Contributing](#-contributing)
|
|
42
|
+
- [Acknowledgments](#-acknowledgments)
|
|
43
|
+
- [Creators](#-creators)
|
|
44
|
+
- [License](#-license)
|
|
45
|
+
|
|
46
|
+
## Installation
|
|
47
|
+
|
|
48
|
+
*puncc* requires Python 3.8 or higher and several libraries, including scikit-learn and NumPy. We recommend installing *puncc* in a virtual environment to avoid interfering with your system's dependencies.
|
|
49
|
+
|
|
50
|
+
You can directly install the library using pip:
|
|
51
|
+
|
|
52
|
+
```bash
|
|
53
|
+
pip install git+https://github.com/deel-ai/puncc
|
|
54
|
+
```
|
|
55
|
+
|
|
56
|
+
<!--
|
|
57
|
+
You can alternatively clone the repo and use the makefile to automatically create a virtual environment
|
|
58
|
+
and install the requirements:
|
|
59
|
+
|
|
60
|
+
* For users:
|
|
61
|
+
|
|
62
|
+
```bash
|
|
63
|
+
make install-user
|
|
64
|
+
```
|
|
65
|
+
|
|
66
|
+
* For developers:
|
|
67
|
+
|
|
68
|
+
```bash
|
|
69
|
+
make prepare-dev
|
|
70
|
+
```
|
|
71
|
+
-->
|
|
72
|
+
|
|
73
|
+
## Documentation
|
|
74
|
+
|
|
75
|
+
For comprehensive documentation, we encourage you to visit the [**official documentation page**](https://deel-ai.github.io/puncc/index.html).
|
|
76
|
+
|
|
77
|
+
## Tutorials
|
|
78
|
+
|
|
79
|
+
We highly recommend following the introductory tutorials to get familiar with the library and its API:
|
|
80
|
+
|
|
81
|
+
* [**Introduction tutorial**](docs/puncc_intro.ipynb) <sub> [Open in Colab](https://colab.research.google.com/drive/1TC_BM7JaEYtBIq6yuYB5U4cJjeg71Tch) </sub>
|
|
82
|
+
|
|
83
|
+
* [**API tutorial**](docs/api_intro.ipynb) <sub> [Open in Colab](https://colab.research.google.com/drive/1d06qQweM1X1eSrCnixA_MLEZil1vXewj) </sub>
|
|
84
|
+
|
|
85
|
+
You can also familiarize yourself with the architecture of *puncc* to build your own conformal prediction methods more efficiently:
|
|
86
|
+
|
|
87
|
+
* [**Architecture overview**](docs/puncc_architecture.ipynb)
|
|
88
|
+
|
|
89
|
+
## Quickstart
|
|
90
|
+
|
|
91
|
+
Conformal prediction transforms point predictions into prediction intervals that cover the true value with high probability. The figure below shows the result of applying the split conformal algorithm to a linear regressor.
|
|
92
|
+
|
|
93
|
+
<figure style="text-align:center">
|
|
94
|
+
<img src="docs/assets/cp_process.png"/>
|
|
95
|
+
</figure>
|
|
96
|
+
|
|
97
|
+
Many conformal prediction algorithms can easily be applied using *puncc*. The code snippet below shows split conformal prediction wrapping a linear model, in a few lines of code:
|
|
98
|
+
|
|
99
|
+
```python
|
|
100
|
+
from sklearn import linear_model
|
|
101
|
+
from deel.puncc.api.prediction import BasePredictor
from deel.puncc.regression import SplitCP
|
|
102
|
+
|
|
103
|
+
# Load training data and test data
|
|
104
|
+
# ...
|
|
105
|
+
|
|
106
|
+
# Instantiate a linear regression model
|
|
107
|
+
# linear_model = ...
|
|
108
|
+
|
|
109
|
+
|
|
110
|
+
# Create a predictor to wrap the linear regression model defined earlier.
|
|
111
|
+
# This enables interoperability with different ML libraries.
|
|
112
|
+
# The argument `is_trained` is set to False to indicate that the linear model
|
|
113
|
+
# needs to be trained before the calibration.
|
|
114
|
+
lin_reg_predictor = BasePredictor(linear_model, is_trained=False)
|
|
115
|
+
|
|
116
|
+
# Instantiate the split CP wrapper around the linear predictor.
|
|
117
|
+
split_cp = SplitCP(lin_reg_predictor)
|
|
118
|
+
|
|
119
|
+
# Fit model (as `is_trained` is False) on the fit dataset and
|
|
120
|
+
# compute the residuals on the calibration dataset.
|
|
121
|
+
# The fit (resp. calibration) subset is randomly sampled from the training
|
|
122
|
+
# data and constitutes 80% (resp. 20%) of it (fit_ratio = 80%).
|
|
123
|
+
split_cp.fit(X_train, y_train, fit_ratio=.8)
|
|
124
|
+
|
|
125
|
+
# The predict returns the output of the linear model y_pred and
|
|
126
|
+
# the calibrated interval [y_pred_lower, y_pred_upper].
|
|
127
|
+
y_pred, y_pred_lower, y_pred_upper = split_cp.predict(X_test, alpha=alpha)
|
|
128
|
+
```
|
|
129
|
+
|
|
130
|
+
The library provides several metrics (`deel.puncc.metrics`) and plotting capabilities (`deel.puncc.plotting`) to evaluate and visualize the results of a conformal procedure. For a target error rate of $\alpha = 0.1$, the marginal coverage reached in this example on the test set is higher than $90$% (see [Introduction tutorial](docs/puncc_intro.ipynb)):
|
|
131
|
+
|
|
132
|
+
<figure style="text-align:center">
|
|
133
|
+
<img src="docs/assets/results_quickstart_split_cp_pi.png" alt="90% Prediction Interval with the Split Conformal Prediction Method"/>
|
|
134
|
+
<div align=center>90% Prediction Interval with Split Conformal Prediction.</div>
|
|
135
|
+
</figure>
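The coverage figure quoted above, together with the average interval width, can be checked by hand. The sketch below uses plain NumPy; the function names `mean_coverage` and `mean_width` are hypothetical and are not the `deel.puncc.metrics` API.

```python
import numpy as np


def mean_coverage(y_true, y_lower, y_upper):
    """Fraction of true values that fall inside their prediction interval."""
    y_true, y_lower, y_upper = map(np.asarray, (y_true, y_lower, y_upper))
    return float(np.mean((y_lower <= y_true) & (y_true <= y_upper)))


def mean_width(y_lower, y_upper):
    """Average interval width; smaller is sharper at equal coverage."""
    return float(np.mean(np.asarray(y_upper) - np.asarray(y_lower)))


# Toy values (assumed for the demo): the last point lies outside its interval.
cov = mean_coverage([1, 2, 3, 4], [0, 0, 0, 5], [2, 3, 4, 6])
width = mean_width([0, 0], [1, 3])
```

Coverage alone is not enough to judge a conformal procedure, since arbitrarily wide intervals always cover; reporting the width alongside it makes the trade-off visible.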
|
|
136
|
+
<br>
|
|
137
|
+
|
|
138
|
+
### More flexibility with the API
|
|
139
|
+
|
|
140
|
+
*Puncc* provides two ways of defining and using conformal prediction wrappers:
|
|
141
|
+
- A direct approach to run state-of-the-art conformal prediction procedures. This is what we used in the previous conformal regression example.
|
|
142
|
+
- **Low-level API**: a more flexible approach based on full customization of the prediction model, the choice of nonconformity scores and the split between fit and calibration datasets.
|
|
143
|
+
|
|
144
|
+
A quick comparison of both approaches is provided in the [API tutorial](docs/api_intro.ipynb) for a regression problem.
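To give a feel for what customizing the model, the nonconformity score and the data split means, here is a sketch in which each piece is a plain function that can be swapped independently. Every name below (`random_split`, `absolute_error`, `conformal_quantile`, `conformalize`) is hypothetical and chosen for this example; puncc's actual low-level classes are described in the API tutorial.

```python
import numpy as np


def random_split(X, y, fit_ratio, seed=0):
    """Hypothetical splitter: shuffle indices, then cut at fit_ratio."""
    idx = np.random.default_rng(seed).permutation(len(X))
    cut = int(fit_ratio * len(X))
    return X[idx[:cut]], y[idx[:cut]], X[idx[cut:]], y[idx[cut:]]


def absolute_error(y_pred, y_true):
    """Hypothetical nonconformity score: absolute residual."""
    return np.abs(y_true - y_pred)


def conformal_quantile(scores, alpha):
    """ceil((n + 1)(1 - alpha))-th smallest score, or +inf if n is too small."""
    n = len(scores)
    rank = int(np.ceil((n + 1) * (1 - alpha)))
    return np.inf if rank > n else np.sort(scores)[rank - 1]


def conformalize(fit, predict, score, split, X, y, alpha):
    """Wire the four components together and return an interval predictor."""
    X_fit, y_fit, X_cal, y_cal = split(X, y)
    model = fit(X_fit, y_fit)
    q = conformal_quantile(score(predict(model, X_cal), y_cal), alpha)

    def predict_interval(X_new):
        y_pred = predict(model, X_new)
        return y_pred, y_pred - q, y_pred + q

    return predict_interval


# Demo with a deliberately simple "model" that always predicts the mean.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = X[:, 0] + rng.normal(scale=0.1, size=500)
predict_interval = conformalize(
    fit=lambda X_fit, y_fit: y_fit.mean(),
    predict=lambda model, X_new: np.full(len(X_new), model),
    score=absolute_error,
    split=lambda X, y: random_split(X, y, fit_ratio=0.8),
    X=X, y=y, alpha=0.1,
)
y_pred, y_lo, y_hi = predict_interval(X[:5])
```

Replacing `absolute_error` with a scaled residual, or `random_split` with a user-provided split, changes the procedure without touching the other components, which is the kind of flexibility the low-level approach is meant to offer.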
|
|
145
|
+
|
|
146
|
+
## Citation
|
|
147
|
+
|
|
148
|
+
If you use our library for your work, please cite our paper:
|
|
149
|
+
|
|
150
|
+
```
|
|
151
|
+
@inproceedings{mendil2023puncc,
|
|
152
|
+
title={PUNCC: a Python Library for Predictive Uncertainty Calibration and Conformalization},
|
|
153
|
+
author={Mendil, Mouhcine and Mossina, Luca and Vigouroux, David},
|
|
154
|
+
booktitle={Conformal and Probabilistic Prediction with Applications},
|
|
155
|
+
pages={582--601},
|
|
156
|
+
year={2023},
|
|
157
|
+
organization={PMLR}
|
|
158
|
+
}
|
|
159
|
+
```
|
|
160
|
+
|
|
161
|
+
*Puncc* has been used to support the work presented in our COPA 2022 paper on conformal prediction for time series.
|
|
162
|
+
|
|
163
|
+
```
|
|
164
|
+
@inproceedings{mendil2022robust,
|
|
165
|
+
title={Robust Gas Demand Forecasting With Conformal Prediction},
|
|
166
|
+
author={Mendil, Mouhcine and Mossina, Luca and Nabhan, Marc and Pasini, Kevin},
|
|
167
|
+
booktitle={Conformal and Probabilistic Prediction with Applications},
|
|
168
|
+
pages={169--187},
|
|
169
|
+
year={2022},
|
|
170
|
+
organization={PMLR}
|
|
171
|
+
}
|
|
172
|
+
```
|
|
173
|
+
|
|
174
|
+
## Contributing
|
|
175
|
+
|
|
176
|
+
Contributions are welcome! Feel free to report an issue or open a pull
|
|
177
|
+
request. Take a look at our guidelines [here](CONTRIBUTING.md).
|
|
178
|
+
|
|
179
|
+
## Acknowledgments
|
|
180
|
+
|
|
181
|
+
<img align="right" src="https://www.deel.ai/wp-content/uploads/2021/05/logo-DEEL.png" width="25%">
|
|
182
|
+
This project received funding from the French “Investing for the Future – PIA3” program within the Artificial and Natural Intelligence Toulouse Institute (ANITI). The authors gratefully acknowledge the support of the <a href="https://www.deel.ai/"> DEEL </a> project.
|
|
183
|
+
|
|
184
|
+
## Creators
|
|
185
|
+
|
|
186
|
+
[Mouhcine MENDIL](https://github.com/M-Mouhcine) initially developed this library as a research tool, with assistance from [Luca MOSSINA](https://github.com/lmossina). We have recently welcomed [Joseba DALMAU](https://github.com/jdalch) to the team to help enhance **puncc** and develop new features.
|
|
187
|
+
|
|
188
|
+
## License
|
|
189
|
+
|
|
190
|
+
The package is released under the [MIT](LICENSES/headers/MIT-Clause.txt) license.
|
puncc-0.7/deel/puncc/__init__.py
ADDED
|
@@ -0,0 +1,33 @@
|
|
|
1
|
+
# -*- coding: utf-8 -*-
|
|
2
|
+
# Copyright IRT Antoine de Saint Exupéry et Université Paul Sabatier Toulouse III - All
|
|
3
|
+
# rights reserved. DEEL is a research program operated by IVADO, IRT Saint Exupéry,
|
|
4
|
+
# CRIAQ and ANITI - https://www.deel.ai/
|
|
5
|
+
#
|
|
6
|
+
# Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
7
|
+
# of this software and associated documentation files (the "Software"), to deal
|
|
8
|
+
# in the Software without restriction, including without limitation the rights
|
|
9
|
+
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
10
|
+
# copies of the Software, and to permit persons to whom the Software is
|
|
11
|
+
# furnished to do so, subject to the following conditions:
|
|
12
|
+
#
|
|
13
|
+
# The above copyright notice and this permission notice shall be included in all
|
|
14
|
+
# copies or substantial portions of the Software.
|
|
15
|
+
#
|
|
16
|
+
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
17
|
+
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
18
|
+
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
19
|
+
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
20
|
+
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
21
|
+
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
22
|
+
# SOFTWARE.
|
|
23
|
+
"""Initialization of puncc."""
|
|
24
|
+
import logging.config
|
|
25
|
+
|
|
26
|
+
|
|
27
|
+
# Create the Logger
|
|
28
|
+
logging.basicConfig(
|
|
29
|
+
format="%(asctime)s === %(name)s [%(funcName)s()] | %(levelname)s | - %(message)s",
|
|
30
|
+
datefmt="%d-%b-%y %H:%M:%S",
|
|
31
|
+
level=logging.ERROR,
|
|
32
|
+
)
|
|
33
|
+
loggers = logging.getLogger(__name__)
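Since the module above sets the log level to `ERROR` through `logging.basicConfig`, the `logger.info` messages emitted elsewhere in the package are hidden by default. A minimal sketch of how a user might surface them with the standard `logging` module, assuming the package loggers live under the `deel.puncc` namespace (this follows from `logging.getLogger(__name__)`, but it is not documented behaviour):

```python
import logging

# Assumption: every puncc module creates its logger with
# logging.getLogger(__name__), so all of them are children of "deel.puncc".
puncc_logger = logging.getLogger("deel.puncc")

# Lower the threshold for the whole package without touching the root logger.
puncc_logger.setLevel(logging.INFO)

# Child loggers without an explicit level inherit the new threshold.
child_level = logging.getLogger("deel.puncc.anomaly_detection").getEffectiveLevel()
```

Setting the level on the package logger rather than on the root logger keeps other libraries' log output unchanged.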
|
puncc-0.7/deel/puncc/anomaly_detection.py
ADDED
|
@@ -0,0 +1,231 @@
|
|
|
1
|
+
# -*- coding: utf-8 -*-
|
|
2
|
+
# Copyright IRT Antoine de Saint Exupéry et Université Paul Sabatier Toulouse III - All
|
|
3
|
+
# rights reserved. DEEL is a research program operated by IVADO, IRT Saint Exupéry,
|
|
4
|
+
# CRIAQ and ANITI - https://www.deel.ai/
|
|
5
|
+
#
|
|
6
|
+
# Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
7
|
+
# of this software and associated documentation files (the "Software"), to deal
|
|
8
|
+
# in the Software without restriction, including without limitation the rights
|
|
9
|
+
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
10
|
+
# copies of the Software, and to permit persons to whom the Software is
|
|
11
|
+
# furnished to do so, subject to the following conditions:
|
|
12
|
+
#
|
|
13
|
+
# The above copyright notice and this permission notice shall be included in all
|
|
14
|
+
# copies or substantial portions of the Software.
|
|
15
|
+
#
|
|
16
|
+
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
17
|
+
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
18
|
+
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
19
|
+
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
20
|
+
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
21
|
+
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
22
|
+
# SOFTWARE.
|
|
23
|
+
"""
|
|
24
|
+
This module implements usual anomaly detection wrappers.
|
|
25
|
+
"""
|
|
26
|
+
import logging
|
|
27
|
+
from typing import Iterable
|
|
28
|
+
from typing import Optional
|
|
29
|
+
from typing import Tuple
|
|
30
|
+
|
|
31
|
+
import numpy as np
|
|
32
|
+
|
|
33
|
+
from deel.puncc.api.calibration import ScoreCalibrator
|
|
34
|
+
from deel.puncc.api.splitting import IdSplitter
|
|
35
|
+
from deel.puncc.api.splitting import RandomSplitter
|
|
36
|
+
|
|
37
|
+
logger = logging.getLogger(__name__)
|
|
38
|
+
|
|
39
|
+
|
|
40
|
+
class SplitCAD:
|
|
41
|
+
"""Split conformal anomaly detection method based on Laxhammar's algorithm.
|
|
42
|
+
The anomaly detection is based on the calibrated threshold (through
|
|
43
|
+
conformal prediction) of underlying anomaly detection (model's) scores.
|
|
44
|
+
For more details, we refer the user to the :ref:`theory overview
|
|
45
|
+
page <theory_overview>`.
|
|
46
|
+
|
|
47
|
+
:param BasePredictor predictor: a predictor implementing fit and predict.
|
|
48
|
+
:param bool train: if False, prediction model(s) will not be (re)trained.
|
|
49
|
+
Defaults to True.
|
|
50
|
+
:param float random_state: random seed used when the user does not
|
|
51
|
+
provide a custom fit/calibration split in `fit` method.
|
|
52
|
+
|
|
53
|
+
Example::
|
|
54
|
+
|
|
55
|
+
import numpy as np
|
|
56
|
+
from sklearn.ensemble import IsolationForest
|
|
57
|
+
from sklearn.datasets import make_moons
|
|
58
|
+
import matplotlib.pyplot as plt
|
|
59
|
+
|
|
60
|
+
from deel.puncc.anomaly_detection import SplitCAD
|
|
61
|
+
from deel.puncc.api.prediction import BasePredictor
|
|
62
|
+
|
|
63
|
+
# We generate the two moons dataset
|
|
64
|
+
dataset = 4 * make_moons(n_samples=1000, noise=0.05, random_state=0)[
|
|
65
|
+
0
|
|
66
|
+
] - np.array([0.5, 0.25])
|
|
67
|
+
|
|
68
|
+
# We generate uniformly new (test) data points
|
|
69
|
+
rng = np.random.RandomState(42)
|
|
70
|
+
z_test = rng.uniform(low=-6, high=6, size=(150, 2))
|
|
71
|
+
|
|
72
|
+
|
|
73
|
+
# The nonconformity scores are defined as the IF scores (anomaly score).
|
|
74
|
+
# By default, score_samples returns the opposite of IF scores.
|
|
75
|
+
# We need to redefine the predict to output the nonconformity scores.
|
|
76
|
+
class ADPredictor(BasePredictor):
|
|
77
|
+
def predict(self, X):
|
|
78
|
+
return -self.model.score_samples(X)
|
|
79
|
+
|
|
80
|
+
# Instantiate the Isolation Forest (IF) anomaly detection model
|
|
81
|
+
# and wrap it in a predictor
|
|
82
|
+
if_predictor = ADPredictor(IsolationForest(random_state=42))
|
|
83
|
+
|
|
84
|
+
# Instantiate CAD on top of IF predictor
|
|
85
|
+
if_cad = SplitCAD(if_predictor, train=True, random_state=0)
|
|
86
|
+
|
|
87
|
+
# Fit the IF on the proper fitting dataset and
|
|
88
|
+
# calibrate it using calibration dataset.
|
|
89
|
+
# The two datasets are sampled randomly with a ratio of 7:3,
|
|
90
|
+
# respectively.
|
|
91
|
+
if_cad.fit(z=dataset, fit_ratio=0.7)
|
|
92
|
+
|
|
93
|
+
# We set the maximum false detection rate to 1%
|
|
94
|
+
alpha = 0.01
|
|
95
|
+
|
|
96
|
+
# The method `predict` is called on the new data points
|
|
97
|
+
# to test which are anomalous and which are not
|
|
98
|
+
results = if_cad.predict(z_test, alpha=alpha)
|
|
99
|
+
|
|
100
|
+
anomalies = z_test[results]
|
|
101
|
+
not_anomalies = z_test[np.invert(results)]
|
|
102
|
+
|
|
103
|
+
# Plot results
|
|
104
|
+
plt.scatter(dataset[:, 0], dataset[:, 1], s=10, label="Inliers")
|
|
105
|
+
plt.scatter(
|
|
106
|
+
anomalies[:, 0],
|
|
107
|
+
anomalies[:, 1],
|
|
108
|
+
marker="x",
|
|
109
|
+
color="red",
|
|
110
|
+
s=40,
|
|
111
|
+
label="Anomalies",
|
|
112
|
+
)
|
|
113
|
+
plt.scatter(
|
|
114
|
+
not_anomalies[:, 0],
|
|
115
|
+
not_anomalies[:, 1],
|
|
116
|
+
marker="x",
|
|
117
|
+
color="blue",
|
|
118
|
+
s=40,
|
|
119
|
+
label="Normal",
|
|
120
|
+
)
|
|
121
|
+
plt.xticks(())
|
|
122
|
+
plt.yticks(())
|
|
123
|
+
plt.legend()
|
|
124
|
+
"""
|
|
125
|
+
|
|
126
|
+
def __init__(self, predictor, *, train=True, random_state: float = None):
|
|
127
|
+
self.predictor = predictor
|
|
128
|
+
self.calibrator = ScoreCalibrator(nonconf_score_func=predictor.predict)
|
|
129
|
+
|
|
130
|
+
self.train = train
|
|
131
|
+
|
|
132
|
+
self.random_state = random_state
|
|
133
|
+
|
|
134
|
+
self.__is_fit = False
|
|
135
|
+
|
|
136
|
+
def fit(
|
|
137
|
+
self,
|
|
138
|
+
*,
|
|
139
|
+
z: Optional[Iterable] = None,
|
|
140
|
+
fit_ratio: float = 0.8,
|
|
141
|
+
z_fit: Optional[Iterable] = None,
|
|
142
|
+
z_calib: Optional[Iterable] = None,
|
|
143
|
+
**kwargs: Optional[dict],
|
|
144
|
+
):
|
|
145
|
+
"""This method fits the models on the fit data
|
|
146
|
+
and computes nonconformity scores on calibration data.
|
|
147
|
+
If z are provided, randomly split data into
|
|
148
|
+
fit and calib subsets according to `fit_ratio`.
|
|
149
|
+
In case z_fit and z_calib are provided,
|
|
150
|
+
the conformalization is performed on the given user defined
|
|
151
|
+
fit and calibration sets.
|
|
152
|
+
|
|
153
|
+
.. NOTE::
|
|
154
|
+
|
|
155
|
+
If z is provided, `fit` ignores
|
|
156
|
+
any user-defined fit/calib split.
|
|
157
|
+
|
|
158
|
+
|
|
159
|
+
:param Iterable z: data points from the training dataset.
|
|
160
|
+
:param float fit_ratio: the proportion of samples assigned to the
|
|
161
|
+
fit subset.
|
|
162
|
+
:param Iterable z_fit: data points from the fit dataset.
|
|
163
|
+
:param Iterable z_calib: data points from the calibration dataset.
|
|
164
|
+
:param dict kwargs: fit configuration to be passed to the model's
|
|
165
|
+
fit method.
|
|
166
|
+
|
|
167
|
+
:raises RuntimeError: no dataset provided.
|
|
168
|
+
|
|
169
|
+
"""
|
|
170
|
+
|
|
171
|
+
if z is not None:
|
|
172
|
+
splitter = RandomSplitter(
|
|
173
|
+
ratio=fit_ratio, random_state=self.random_state
|
|
174
|
+
)
|
|
175
|
+
|
|
176
|
+
elif z_fit is not None and z_calib is not None:
|
|
177
|
+
splitter = IdSplitter(z_fit, z_fit, z_calib, z_calib)
|
|
178
|
+
|
|
179
|
+
elif (
|
|
180
|
+
self.predictor.is_trained and z_fit is None and z_calib is not None
|
|
181
|
+
):
|
|
182
|
+
splitter = IdSplitter(
|
|
183
|
+
np.empty_like(z_calib), np.empty_like(z_calib), z_calib, z_calib
|
|
184
|
+
)
|
|
185
|
+
|
|
186
|
+
else:
|
|
187
|
+
raise RuntimeError("No dataset provided.")
|
|
188
|
+
|
|
189
|
+
# Apply splitter
|
|
190
|
+
z_fit, _, z_calib, _ = splitter(z, z)[0]
|
|
191
|
+
|
|
192
|
+
# Fit underlying model and calibrator
|
|
193
|
+
if self.train:
|
|
194
|
+
logger.info("Fitting model")
|
|
195
|
+
self.predictor.fit(z_fit, **kwargs)
|
|
196
|
+
|
|
197
|
+
# Make sure that predictor is already trained if train arg is False
|
|
198
|
+
elif self.train is False and self.predictor.is_trained is False:
|
|
199
|
+
raise RuntimeError(
|
|
200
|
+
"'train' argument is set to 'False' but model is not pre-trained"
|
|
201
|
+
)
|
|
202
|
+
|
|
203
|
+
else: # Skipping training
|
|
204
|
+
logger.info("Skipping training.")
|
|
205
|
+
|
|
206
|
+
# Fitting calibrator
|
|
207
|
+
self.calibrator.fit(z_calib)
|
|
208
|
+
|
|
209
|
+
self.__is_fit = True
|
|
210
|
+
|
|
211
|
+
def predict(self, z_test: Iterable, alpha) -> Tuple[np.ndarray]:
|
|
212
|
+
"""Predict whether each example is an anomaly or not. The decision is
|
|
213
|
+
taken based on the calibrated threshold (through conformal prediction)
|
|
214
|
+
of underlying anomaly detection scores.
|
|
215
|
+
|
|
216
|
+
:param Iterable z_test: new data points.
|
|
217
|
+
:param float alpha: target maximum FDR.
|
|
218
|
+
|
|
219
|
+
:returns: outlier tag. True if outlier, False otherwise.
|
|
220
|
+
:rtype: Iterable[bool]
|
|
221
|
+
|
|
222
|
+
"""
|
|
223
|
+
|
|
224
|
+
# __is_fit is initialised to False, so test truthiness rather than `is None`.
if not self.__is_fit:
|
|
225
|
+
raise RuntimeError("Fit method should be called before predict.")
|
|
226
|
+
|
|
227
|
+
anomaly_pred = np.invert(
|
|
228
|
+
self.calibrator.is_conformal(z_test, alpha=alpha)
|
|
229
|
+
)
|
|
230
|
+
|
|
231
|
+
return anomaly_pred
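The calibrated-threshold idea behind `SplitCAD` can be sketched without the library. The code below is an illustration under stated assumptions, not the package's implementation: `calibrate_threshold` and `is_anomaly` are hypothetical names, the anomaly score is simply the distance to the origin, and the exact quantile rule used by `ScoreCalibrator` may differ.

```python
import numpy as np


def calibrate_threshold(calib_scores, alpha):
    """Return the ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = len(calib_scores)
    rank = int(np.ceil((n + 1) * (1 - alpha)))
    # With too few calibration points no finite threshold gives the guarantee.
    return np.inf if rank > n else np.sort(calib_scores)[rank - 1]


def is_anomaly(test_scores, threshold):
    """Flag points whose anomaly score exceeds the calibrated threshold."""
    return np.asarray(test_scores) > threshold


def score(z):
    # Assumed anomaly score for the demo: Euclidean distance to the origin.
    return np.linalg.norm(z, axis=1)


# Calibration data drawn from the "normal" regime (standard Gaussian).
rng = np.random.default_rng(0)
z_calib = rng.normal(size=(500, 2))
threshold = calibrate_threshold(score(z_calib), alpha=0.05)

# One typical point and one obvious outlier.
z_test = np.array([[0.0, 0.0], [10.0, 10.0]])
flags = is_anomaly(score(z_test), threshold)
```

Under exchangeability, a new point from the normal regime exceeds this threshold with probability at most `alpha`, which is what bounds the false alarm rate.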
|
puncc-0.7/deel/puncc/api/__init__.py
ADDED
|
@@ -0,0 +1,22 @@
|
|
|
1
|
+
# -*- coding: utf-8 -*-
|
|
2
|
+
# Copyright IRT Antoine de Saint Exupéry et Université Paul Sabatier Toulouse III - All
|
|
3
|
+
# rights reserved. DEEL is a research program operated by IVADO, IRT Saint Exupéry,
|
|
4
|
+
# CRIAQ and ANITI - https://www.deel.ai/
|
|
5
|
+
#
|
|
6
|
+
# Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
7
|
+
# of this software and associated documentation files (the "Software"), to deal
|
|
8
|
+
# in the Software without restriction, including without limitation the rights
|
|
9
|
+
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
10
|
+
# copies of the Software, and to permit persons to whom the Software is
|
|
11
|
+
# furnished to do so, subject to the following conditions:
|
|
12
|
+
#
|
|
13
|
+
# The above copyright notice and this permission notice shall be included in all
|
|
14
|
+
# copies or substantial portions of the Software.
|
|
15
|
+
#
|
|
16
|
+
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
17
|
+
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
18
|
+
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
19
|
+
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
20
|
+
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
21
|
+
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
22
|
+
# SOFTWARE.
|