puncc 0.7__tar.gz

puncc-0.7/PKG-INFO ADDED
@@ -0,0 +1,208 @@
1
+ Metadata-Version: 2.1
2
+ Name: puncc
3
+ Version: 0.7
4
+ Summary: Predictive UNcertainty Calibration and Conformalization Library
5
+ Home-page: https://github.com/deel-ai/puncc
6
+ Author: Mouhcine Mendil, Luca Mossina, Joseba Dalmau
7
+ Author-email: mouhcine.mendil@irt-saintexupery.com, luca.mossina@irt-saintexupery.com, joseba.dalmau@irt-saintexupery.com
8
+ License: UNKNOWN
9
+ Description: <!-- Banner -->
10
+ <div align="center">
11
+ <picture>
12
+ <source media="(prefers-color-scheme: dark)" srcset="docs/assets/banner_dark.png">
13
+ <source media="(prefers-color-scheme: light)" srcset="docs/assets/banner_light.png">
14
+ <img src="docs/assets/banner_light.png" alt="Puncc" width="90%" align="right">
15
+ </picture>
16
+ </div>
17
+ <br>
18
+
19
+ <!-- Badges -->
20
+ <div align="center">
21
+ <a href="#">
22
+ <img src="https://img.shields.io/badge/Python-3.8 +-efefef">
23
+ </a>
24
+ <a href="#">
25
+ <img src="https://img.shields.io/badge/License-MIT-efefef">
26
+ </a>
27
+ <a href="https://github.com/deel-ai/puncc/actions/workflows/linter.yml">
28
+ <img alt="PyLint" src="https://github.com/deel-ai/puncc/actions/workflows/linter.yml/badge.svg">
29
+ </a>
30
+ <a href="https://github.com/deel-ai/puncc/actions/workflows/tests.yml">
31
+ <img alt="Tox" src="https://github.com/deel-ai/puncc/actions/workflows/tests.yml/badge.svg">
32
+ </a>
33
+ </div>
34
+ <br>
35
+
36
+ ***Puncc*** (short for **P**redictive **un**certainty **c**alibration and **c**onformalization) is an open-source Python library. It seamlessly integrates a collection of state-of-the-art conformal prediction algorithms and associated techniques for diverse machine learning tasks, including regression, classification and anomaly detection.
37
+ ***Puncc*** can be used with any predictive model to provide rigorous uncertainty estimations.
38
+ Under data exchangeability (or the stronger *i.i.d.* assumption), the generated prediction sets are guaranteed to cover the true outputs with probability at least $1 - \alpha$, where $\alpha$ is a user-defined error rate.
39
+
40
+ Documentation is available [**online**](https://deel-ai.github.io/puncc/index.html).
41
+
42
+ ## 📚 Table of contents
43
+
44
+ - [🐾 Installation](#-installation)
45
+ - [📖 Documentation](#-documentation)
46
+ - [👨‍🎓 Tutorials](#-tutorials)
47
+ - [🚀 Quickstart](#-quickstart)
48
+ - [📚 Citation](#-citation)
49
+ - [💻 Contributing](#-contributing)
50
+ - [🙏 Acknowledgments](#-acknowledgments)
51
+ - [👨‍💻 Creators](#-creators)
52
+ - [📝 License](#-license)
53
+
54
+ ## 🐾 Installation
55
+
56
+ *puncc* requires Python 3.8 or higher and several libraries, including Scikit-learn and NumPy. It is recommended to install *puncc* in a virtual environment to avoid interfering with your system's dependencies.
57
+
58
+ You can directly install the library using pip:
59
+
60
+ ```bash
61
+ pip install git+https://github.com/deel-ai/puncc
62
+ ```
63
+
64
+ <!--
65
+ You can alternatively clone the repo and use the makefile to automatically create a virtual environment
66
+ and install the requirements:
67
+
68
+ * For users:
69
+
70
+ ```bash
71
+ make install-user
72
+ ```
73
+
74
+ * For developers:
75
+
76
+ ```bash
77
+ make prepare-dev
78
+ ```
79
+ -->
80
+
81
+ ## 📖 Documentation
82
+
83
+ For comprehensive documentation, we encourage you to visit the [**official documentation page**](https://deel-ai.github.io/puncc/index.html).
84
+
85
+ ## 👨‍🎓 Tutorials
86
+
87
+ We highly recommend following the introductory tutorials to get familiar with the library and its API:
88
+
89
+ * [**Introduction tutorial**](docs/puncc_intro.ipynb) <sub> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1TC_BM7JaEYtBIq6yuYB5U4cJjeg71Tch) </sub>
90
+
91
+ * [**API tutorial**](docs/api_intro.ipynb) <sub> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1d06qQweM1X1eSrCnixA_MLEZil1vXewj) </sub>
92
+
93
+ You can also familiarize yourself with the architecture of *puncc* to build your own conformal prediction methods more efficiently:
94
+
95
+ * [**Architecture overview**](docs/puncc_architecture.ipynb)
96
+
97
+ ## 🚀 Quickstart
98
+
99
+ Conformal prediction transforms point predictions into prediction intervals with a high probability of coverage. The figure below shows the result of applying the split conformal algorithm to a linear regressor.
100
+
101
+ <figure style="text-align:center">
102
+ <img src="docs/assets/cp_process.png"/>
103
+ </figure>
104
+
105
+ Many conformal prediction algorithms can easily be applied using *puncc*. The code snippet below shows split conformal prediction wrapping a linear model in a few lines of code:
106
+
107
+ ```python
108
+ from sklearn import linear_model
109
+ from deel.puncc.api.prediction import BasePredictor
110
+ from deel.puncc.regression import SplitCP
111
+ # Load training data and test data
112
+ # ...
113
+
114
+ # Instantiate a linear regression model
115
+ # linear_model = ...
116
+
117
+
118
+ # Create a predictor to wrap the linear regression model defined earlier.
119
+ # This enables interoperability with different ML libraries.
120
+ # The argument `is_trained` is set to False to indicate that the linear model
121
+ # needs to be trained before the calibration.
122
+ lin_reg_predictor = BasePredictor(linear_model, is_trained=False)
123
+
124
+ # Instantiate the split CP wrapper around the linear predictor.
125
+ split_cp = SplitCP(lin_reg_predictor)
126
+
127
+ # Fit the model (as `is_trained` is False) on the fit dataset and
128
+ # compute the residuals on the calibration dataset.
129
+ # The fit (resp. calibration) subset is randomly sampled from the training
130
+ # data and constitutes 80% (resp. 20%) of it (fit_ratio = 80%).
131
+ split_cp.fit(X_train, y_train, fit_ratio=.8)
132
+
133
+ # The predict returns the output of the linear model y_pred and
134
+ # the calibrated interval [y_pred_lower, y_pred_upper].
135
+ y_pred, y_pred_lower, y_pred_upper = split_cp.predict(X_test, alpha=0.1)
136
+ ```
137
+
138
+ The library provides several metrics (`deel.puncc.metrics`) and plotting capabilities (`deel.puncc.plotting`) to evaluate and visualize the results of a conformal procedure. For a target error rate of $\alpha = 0.1$, the marginal coverage reached in this example on the test set is higher than $90$% (see [Introduction tutorial](docs/puncc_intro.ipynb)):
139
+
140
+ <figure style="text-align:center">
141
+ <img src="docs/assets/results_quickstart_split_cp_pi.png" alt="90% Prediction Interval with the Split Conformal Prediction Method"/>
142
+ <div align=center>90% Prediction Interval with Split Conformal Prediction.</div>
143
+ </figure>
144
+ <br>
145
+
146
+ ### More flexibility with the API
147
+
148
+ *Puncc* provides two ways of defining and using conformal prediction wrappers:
149
+ - A direct approach to run state-of-the-art conformal prediction procedures. This is what we used in the previous conformal regression example.
150
+ - **Low-level API**: a more flexible approach based on full customization of the prediction model, the choice of nonconformity scores and the split between fit and calibration datasets.
151
+
152
+ A quick comparison of both approaches is provided in the [API tutorial](docs/api_intro.ipynb) for a regression problem.
153
+
154
+ ## 📚 Citation
155
+
156
+ If you use our library for your work, please cite our paper:
157
+
158
+ ```
159
+ @inproceedings{mendil2023puncc,
160
+ title={PUNCC: a Python Library for Predictive Uncertainty Calibration and Conformalization},
161
+ author={Mendil, Mouhcine and Mossina, Luca and Vigouroux, David},
162
+ booktitle={Conformal and Probabilistic Prediction with Applications},
163
+ pages={582--601},
164
+ year={2023},
165
+ organization={PMLR}
166
+ }
167
+ ```
168
+
169
+ *Puncc* has been used to support the work presented in our COPA 2022 paper on conformal prediction for time series.
170
+
171
+ ```
172
+ @inproceedings{mendil2022robust,
173
+ title={Robust Gas Demand Forecasting With Conformal Prediction},
174
+ author={Mendil, Mouhcine and Mossina, Luca and Nabhan, Marc and Pasini, Kevin},
175
+ booktitle={Conformal and Probabilistic Prediction with Applications},
176
+ pages={169--187},
177
+ year={2022},
178
+ organization={PMLR}
179
+ }
180
+ ```
181
+
182
+ ## 💻 Contributing
183
+
184
+ Contributions are welcome! Feel free to report an issue or open a pull
185
+ request. Take a look at our guidelines [here](CONTRIBUTING.md).
186
+
187
+ ## 🙏 Acknowledgments
188
+
189
+ <img align="right" src="https://www.deel.ai/wp-content/uploads/2021/05/logo-DEEL.png" width="25%">
190
+ This project received funding from the French ”Investing for the Future – PIA3” program within the Artificial and Natural Intelligence Toulouse Institute (ANITI). The authors gratefully acknowledge the support of the <a href="https://www.deel.ai/"> DEEL </a> project.
191
+
192
+ ## 👨‍💻 Creators
193
+
194
+ [Mouhcine MENDIL](https://github.com/M-Mouhcine) initially developed this library as a research tool, with assistance from [Luca MOSSINA](https://github.com/lmossina). We have recently welcomed [Joseba DALMAU](https://github.com/jdalch) to the team to help enhance **puncc** and work on the development of new features.
195
+
196
+ ## 🔑 License
197
+
198
+ The package is released under the [MIT](LICENSES/headers/MIT-Clause.txt) license.
199
+
200
+ Platform: UNKNOWN
201
+ Classifier: License :: OSI Approved :: MIT License
202
+ Classifier: Programming Language :: Python
203
+ Classifier: Programming Language :: Python :: 3
204
+ Classifier: Operating System :: OS Independent
205
+ Requires-Python: >=3.8
206
+ Description-Content-Type: text/markdown
207
+ Provides-Extra: interactive
208
+ Provides-Extra: dev
puncc-0.7/README.md ADDED
@@ -0,0 +1,190 @@
1
+ <!-- Banner -->
2
+ <div align="center">
3
+ <picture>
4
+ <source media="(prefers-color-scheme: dark)" srcset="docs/assets/banner_dark.png">
5
+ <source media="(prefers-color-scheme: light)" srcset="docs/assets/banner_light.png">
6
+ <img src="docs/assets/banner_light.png" alt="Puncc" width="90%" align="right">
7
+ </picture>
8
+ </div>
9
+ <br>
10
+
11
+ <!-- Badges -->
12
+ <div align="center">
13
+ <a href="#">
14
+ <img src="https://img.shields.io/badge/Python-3.8 +-efefef">
15
+ </a>
16
+ <a href="#">
17
+ <img src="https://img.shields.io/badge/License-MIT-efefef">
18
+ </a>
19
+ <a href="https://github.com/deel-ai/puncc/actions/workflows/linter.yml">
20
+ <img alt="PyLint" src="https://github.com/deel-ai/puncc/actions/workflows/linter.yml/badge.svg">
21
+ </a>
22
+ <a href="https://github.com/deel-ai/puncc/actions/workflows/tests.yml">
23
+ <img alt="Tox" src="https://github.com/deel-ai/puncc/actions/workflows/tests.yml/badge.svg">
24
+ </a>
25
+ </div>
26
+ <br>
27
+
28
+ ***Puncc*** (short for **P**redictive **un**certainty **c**alibration and **c**onformalization) is an open-source Python library. It seamlessly integrates a collection of state-of-the-art conformal prediction algorithms and associated techniques for diverse machine learning tasks, including regression, classification and anomaly detection.
29
+ ***Puncc*** can be used with any predictive model to provide rigorous uncertainty estimations.
30
+ Under data exchangeability (or the stronger *i.i.d.* assumption), the generated prediction sets are guaranteed to cover the true outputs with probability at least $1 - \alpha$, where $\alpha$ is a user-defined error rate.
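Stated a little more formally, the guarantee is a marginal coverage bound. The notation below is an assumption made for illustration and is not taken from the library: $(X_{n+1}, Y_{n+1})$ denotes a new test point, $n$ the number of calibration points, and $\hat{C}_{\alpha}$ the prediction set built at error level $\alpha$.

$$
\mathbb{P}\big(Y_{n+1} \in \hat{C}_{\alpha}(X_{n+1})\big) \geq 1 - \alpha
$$

The probability is taken jointly over the calibration data and the test point, so the bound holds on average rather than conditionally on a particular input.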
31
+
32
+ Documentation is available [**online**](https://deel-ai.github.io/puncc/index.html).
33
+
34
+ ## 📚 Table of contents
35
+
36
+ - [🐾 Installation](#-installation)
37
+ - [📖 Documentation](#-documentation)
38
+ - [👨‍🎓 Tutorials](#-tutorials)
39
+ - [🚀 Quickstart](#-quickstart)
40
+ - [📚 Citation](#-citation)
41
+ - [💻 Contributing](#-contributing)
42
+ - [🙏 Acknowledgments](#-acknowledgments)
43
+ - [👨‍💻 Creators](#-creators)
44
+ - [📝 License](#-license)
45
+
46
+ ## 🐾 Installation
47
+
48
+ *puncc* requires Python 3.8 or higher and several libraries, including Scikit-learn and NumPy. It is recommended to install *puncc* in a virtual environment to avoid interfering with your system's dependencies.
49
+
50
+ You can directly install the library using pip:
51
+
52
+ ```bash
53
+ pip install git+https://github.com/deel-ai/puncc
54
+ ```
55
+
56
+ <!--
57
+ You can alternatively clone the repo and use the makefile to automatically create a virtual environment
58
+ and install the requirements:
59
+
60
+ * For users:
61
+
62
+ ```bash
63
+ make install-user
64
+ ```
65
+
66
+ * For developers:
67
+
68
+ ```bash
69
+ make prepare-dev
70
+ ```
71
+ -->
72
+
73
+ ## 📖 Documentation
74
+
75
+ For comprehensive documentation, we encourage you to visit the [**official documentation page**](https://deel-ai.github.io/puncc/index.html).
76
+
77
+ ## 👨‍🎓 Tutorials
78
+
79
+ We highly recommend following the introductory tutorials to get familiar with the library and its API:
80
+
81
+ * [**Introduction tutorial**](docs/puncc_intro.ipynb) <sub> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1TC_BM7JaEYtBIq6yuYB5U4cJjeg71Tch) </sub>
82
+
83
+ * [**API tutorial**](docs/api_intro.ipynb) <sub> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1d06qQweM1X1eSrCnixA_MLEZil1vXewj) </sub>
84
+
85
+ You can also familiarize yourself with the architecture of *puncc* to build your own conformal prediction methods more efficiently:
86
+
87
+ * [**Architecture overview**](docs/puncc_architecture.ipynb)
88
+
89
+ ## 🚀 Quickstart
90
+
91
+ Conformal prediction transforms point predictions into prediction intervals with a high probability of coverage. The figure below shows the result of applying the split conformal algorithm to a linear regressor.
92
+
93
+ <figure style="text-align:center">
94
+ <img src="docs/assets/cp_process.png"/>
95
+ </figure>
96
+
97
+ Many conformal prediction algorithms can easily be applied using *puncc*. The code snippet below shows split conformal prediction wrapping a linear model in a few lines of code:
98
+
99
+ ```python
100
+ from sklearn import linear_model
101
+ from deel.puncc.api.prediction import BasePredictor
102
+ from deel.puncc.regression import SplitCP
103
+ # Load training data and test data
104
+ # ...
105
+
106
+ # Instantiate a linear regression model
107
+ # linear_model = ...
108
+
109
+
110
+ # Create a predictor to wrap the linear regression model defined earlier.
111
+ # This enables interoperability with different ML libraries.
112
+ # The argument `is_trained` is set to False to indicate that the linear model
113
+ # needs to be trained before the calibration.
114
+ lin_reg_predictor = BasePredictor(linear_model, is_trained=False)
115
+
116
+ # Instantiate the split CP wrapper around the linear predictor.
117
+ split_cp = SplitCP(lin_reg_predictor)
118
+
119
+ # Fit the model (as `is_trained` is False) on the fit dataset and
120
+ # compute the residuals on the calibration dataset.
121
+ # The fit (resp. calibration) subset is randomly sampled from the training
122
+ # data and constitutes 80% (resp. 20%) of it (fit_ratio = 80%).
123
+ split_cp.fit(X_train, y_train, fit_ratio=.8)
124
+
125
+ # The predict returns the output of the linear model y_pred and
126
+ # the calibrated interval [y_pred_lower, y_pred_upper].
127
+ y_pred, y_pred_lower, y_pred_upper = split_cp.predict(X_test, alpha=0.1)
128
+ ```
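To make the mechanics behind this snippet more concrete, here is a minimal pure-Python sketch of split conformal regression using absolute residuals as nonconformity scores. It does not use the *puncc* API and is not the library's implementation: the helper names (`fit_line`, `conformal_quantile`, `split_conformal_interval`) and the toy data are assumptions made for illustration only.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]


def split_conformal_interval(x_train, y_train, x_new, alpha, fit_ratio=0.8, seed=0):
    """Fit on a random fit subset, calibrate on the rest, return an interval."""
    idx = list(range(len(x_train)))
    random.Random(seed).shuffle(idx)
    n_fit = int(fit_ratio * len(idx))
    fit_idx, calib_idx = idx[:n_fit], idx[n_fit:]
    a, b = fit_line([x_train[i] for i in fit_idx], [y_train[i] for i in fit_idx])
    # Nonconformity scores: absolute residuals on the calibration subset.
    scores = [abs(y_train[i] - (a * x_train[i] + b)) for i in calib_idx]
    q = conformal_quantile(scores, alpha)
    y_pred = a * x_new + b
    return y_pred, y_pred - q, y_pred + q


# Toy data (hypothetical): y = 2x + 1 plus Gaussian noise.
rng = random.Random(42)
x_train = [rng.uniform(0, 10) for _ in range(500)]
y_train = [2 * x + 1 + rng.gauss(0, 1) for x in x_train]
y_pred, y_lower, y_upper = split_conformal_interval(x_train, y_train, 5.0, alpha=0.1)
```

The interval half-width is a single quantile of the calibration residuals, which is why the fit and calibration subsets must be disjoint: reusing the fit data for calibration would make the residuals optimistically small.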
129
+
130
+ The library provides several metrics (`deel.puncc.metrics`) and plotting capabilities (`deel.puncc.plotting`) to evaluate and visualize the results of a conformal procedure. For a target error rate of $\alpha = 0.1$, the marginal coverage reached in this example on the test set is higher than $90$% (see [Introduction tutorial](docs/puncc_intro.ipynb)):
131
+
132
+ <figure style="text-align:center">
133
+ <img src="docs/assets/results_quickstart_split_cp_pi.png" alt="90% Prediction Interval with the Split Conformal Prediction Method"/>
134
+ <div align=center>90% Prediction Interval with Split Conformal Prediction.</div>
135
+ </figure>
136
+ <br>
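As a rough illustration of what such an evaluation measures, the sketch below computes the empirical marginal coverage and the mean interval width in plain Python. It is not the `deel.puncc.metrics` implementation; the function names and the numbers are hypothetical.

```python
def empirical_coverage(y_true, y_lower, y_upper):
    """Fraction of true values that fall inside their prediction interval."""
    hits = sum(lo <= y <= up for y, lo, up in zip(y_true, y_lower, y_upper))
    return hits / len(y_true)


def mean_interval_width(y_lower, y_upper):
    """Average width of the prediction intervals (smaller means sharper)."""
    return sum(up - lo for lo, up in zip(y_lower, y_upper)) / len(y_lower)


# Hypothetical test-set targets and prediction intervals.
y_true = [1.0, 2.5, 3.0, 4.2, 5.1]
y_lower = [0.5, 2.0, 3.2, 3.9, 4.0]
y_upper = [1.5, 3.0, 4.0, 4.5, 6.0]

coverage = empirical_coverage(y_true, y_lower, y_upper)  # 4 of 5 intervals cover
width = mean_interval_width(y_lower, y_upper)
```

Coverage and width are best read together: very wide intervals trivially reach the target coverage, so a useful conformal procedure meets the coverage level while keeping the width small.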
137
+
138
+ ### More flexibility with the API
139
+
140
+ *Puncc* provides two ways of defining and using conformal prediction wrappers:
141
+ - A direct approach to run state-of-the-art conformal prediction procedures. This is what we used in the previous conformal regression example.
142
+ - **Low-level API**: a more flexible approach based on full customization of the prediction model, the choice of nonconformity scores and the split between fit and calibration datasets.
143
+
144
+ A quick comparison of both approaches is provided in the [API tutorial](docs/api_intro.ipynb) for a regression problem.
145
+
146
+ ## 📚 Citation
147
+
148
+ If you use our library for your work, please cite our paper:
149
+
150
+ ```
151
+ @inproceedings{mendil2023puncc,
152
+ title={PUNCC: a Python Library for Predictive Uncertainty Calibration and Conformalization},
153
+ author={Mendil, Mouhcine and Mossina, Luca and Vigouroux, David},
154
+ booktitle={Conformal and Probabilistic Prediction with Applications},
155
+ pages={582--601},
156
+ year={2023},
157
+ organization={PMLR}
158
+ }
159
+ ```
160
+
161
+ *Puncc* has been used to support the work presented in our COPA 2022 paper on conformal prediction for time series.
162
+
163
+ ```
164
+ @inproceedings{mendil2022robust,
165
+ title={Robust Gas Demand Forecasting With Conformal Prediction},
166
+ author={Mendil, Mouhcine and Mossina, Luca and Nabhan, Marc and Pasini, Kevin},
167
+ booktitle={Conformal and Probabilistic Prediction with Applications},
168
+ pages={169--187},
169
+ year={2022},
170
+ organization={PMLR}
171
+ }
172
+ ```
173
+
174
+ ## 💻 Contributing
175
+
176
+ Contributions are welcome! Feel free to report an issue or open a pull
177
+ request. Take a look at our guidelines [here](CONTRIBUTING.md).
178
+
179
+ ## 🙏 Acknowledgments
180
+
181
+ <img align="right" src="https://www.deel.ai/wp-content/uploads/2021/05/logo-DEEL.png" width="25%">
182
+ This project received funding from the French ”Investing for the Future – PIA3” program within the Artificial and Natural Intelligence Toulouse Institute (ANITI). The authors gratefully acknowledge the support of the <a href="https://www.deel.ai/"> DEEL </a> project.
183
+
184
+ ## 👨‍💻 Creators
185
+
186
+ [Mouhcine MENDIL](https://github.com/M-Mouhcine) initially developed this library as a research tool, with assistance from [Luca MOSSINA](https://github.com/lmossina). We have recently welcomed [Joseba DALMAU](https://github.com/jdalch) to the team to help enhance **puncc** and work on the development of new features.
187
+
188
+ ## 🔑 License
189
+
190
+ The package is released under the [MIT](LICENSES/headers/MIT-Clause.txt) license.
@@ -0,0 +1,33 @@
1
+ # -*- coding: utf-8 -*-
2
+ # Copyright IRT Antoine de Saint Exupéry et Université Paul Sabatier Toulouse III - All
3
+ # rights reserved. DEEL is a research program operated by IVADO, IRT Saint Exupéry,
4
+ # CRIAQ and ANITI - https://www.deel.ai/
5
+ #
6
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
7
+ # of this software and associated documentation files (the "Software"), to deal
8
+ # in the Software without restriction, including without limitation the rights
9
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
10
+ # copies of the Software, and to permit persons to whom the Software is
11
+ # furnished to do so, subject to the following conditions:
12
+ #
13
+ # The above copyright notice and this permission notice shall be included in all
14
+ # copies or substantial portions of the Software.
15
+ #
16
+ # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
17
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
18
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
19
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
20
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
21
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
22
+ # SOFTWARE.
23
+ """Initialization of puncc."""
24
+ import logging.config
25
+
26
+
27
+ # Create the Logger
28
+ logging.basicConfig(
29
+ format="%(asctime)s === %(name)s [%(funcName)s()] | %(levelname)s | - %(message)s",
30
+ datefmt="%d-%b-%y %H:%M:%S",
31
+ level=logging.ERROR,
32
+ )
33
+ loggers = logging.getLogger(__name__)
@@ -0,0 +1,231 @@
1
+ # -*- coding: utf-8 -*-
2
+ # Copyright IRT Antoine de Saint Exupéry et Université Paul Sabatier Toulouse III - All
3
+ # rights reserved. DEEL is a research program operated by IVADO, IRT Saint Exupéry,
4
+ # CRIAQ and ANITI - https://www.deel.ai/
5
+ #
6
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
7
+ # of this software and associated documentation files (the "Software"), to deal
8
+ # in the Software without restriction, including without limitation the rights
9
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
10
+ # copies of the Software, and to permit persons to whom the Software is
11
+ # furnished to do so, subject to the following conditions:
12
+ #
13
+ # The above copyright notice and this permission notice shall be included in all
14
+ # copies or substantial portions of the Software.
15
+ #
16
+ # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
17
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
18
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
19
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
20
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
21
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
22
+ # SOFTWARE.
23
+ """
24
+ This module implements usual anomaly detection wrappers.
25
+ """
26
+ import logging
27
+ from typing import Iterable
28
+ from typing import Optional
29
+ from typing import Tuple
30
+
31
+ import numpy as np
32
+
33
+ from deel.puncc.api.calibration import ScoreCalibrator
34
+ from deel.puncc.api.splitting import IdSplitter
35
+ from deel.puncc.api.splitting import RandomSplitter
36
+
37
+ logger = logging.getLogger(__name__)
38
+
39
+
40
+ class SplitCAD:
41
+ """Split conformal anomaly detection method based on Laxhammar's algorithm.
42
+ The anomaly detection is based on the calibrated threshold (through
43
+ conformal prediction) of underlying anomaly detection (model's) scores.
44
+ For more details, we refer the user to the :ref:`theory overview
45
+ page <theory_overview>`.
46
+
47
+ :param BasePredictor predictor: a predictor implementing fit and predict.
48
+ :param bool train: if False, prediction model(s) will not be (re)trained.
49
+ Defaults to True.
50
+ :param float random_state: random seed used when the user does not
51
+ provide a custom fit/calibration split in `fit` method.
52
+
53
+ Example::
54
+
55
+ import numpy as np
56
+ from sklearn.ensemble import IsolationForest
57
+ from sklearn.datasets import make_moons
58
+ import matplotlib.pyplot as plt
59
+
60
+ from deel.puncc.anomaly_detection import SplitCAD
61
+ from deel.puncc.api.prediction import BasePredictor
62
+
63
+ # We generate the two moons dataset
64
+ dataset = 4 * make_moons(n_samples=1000, noise=0.05, random_state=0)[
65
+ 0
66
+ ] - np.array([0.5, 0.25])
67
+
68
+ # We generate uniformly new (test) data points
69
+ rng = np.random.RandomState(42)
70
+ z_test = rng.uniform(low=-6, high=6, size=(150, 2))
71
+
72
+
73
+ # The nonconformity scores are defined as the IF scores (anomaly score).
74
+ # By default, score_samples returns the opposite of IF scores.
75
+ # We need to redefine the predict to output the nonconformity scores.
76
+ class ADPredictor(BasePredictor):
77
+ def predict(self, X):
78
+ return -self.model.score_samples(X)
79
+
80
+ # Instantiate the Isolation Forest (IF) anomaly detection model
81
+ # and wrap it in a predictor
82
+ if_predictor = ADPredictor(IsolationForest(random_state=42))
83
+
84
+ # Instantiate CAD on top of IF predictor
85
+ if_cad = SplitCAD(if_predictor, train=True, random_state=0)
86
+
87
+ # Fit the IF on the proper fitting dataset and
88
+ # calibrate it using calibration dataset.
89
+ # The two datasets are sampled randomly with a ratio of 7:3,
90
+ # respectively.
91
+ if_cad.fit(z=dataset, fit_ratio=0.7)
92
+
93
+ # We set the maximum false detection rate to 1%
94
+ alpha = 0.01
95
+
96
+ # The method `predict` is called on the new data points
97
+ # to test which are anomalous and which are not
98
+ results = if_cad.predict(z_test, alpha=alpha)
99
+
100
+ anomalies = z_test[results]
101
+ not_anomalies = z_test[np.invert(results)]
102
+
103
+ # Plot results
104
+ plt.scatter(dataset[:, 0], dataset[:, 1], s=10, label="Inliers")
105
+ plt.scatter(
106
+ anomalies[:, 0],
107
+ anomalies[:, 1],
108
+ marker="x",
109
+ color="red",
110
+ s=40,
111
+ label="Anomalies",
112
+ )
113
+ plt.scatter(
114
+ not_anomalies[:, 0],
115
+ not_anomalies[:, 1],
116
+ marker="x",
117
+ color="blue",
118
+ s=40,
119
+ label="Normal",
120
+ )
121
+ plt.xticks(())
122
+ plt.yticks(())
123
+ plt.legend()
124
+ """
125
+
126
+ def __init__(self, predictor, *, train=True, random_state: float = None):
127
+ self.predictor = predictor
128
+ self.calibrator = ScoreCalibrator(nonconf_score_func=predictor.predict)
129
+
130
+ self.train = train
131
+
132
+ self.random_state = random_state
133
+
134
+ self.__is_fit = False
135
+
136
+ def fit(
137
+ self,
138
+ *,
139
+ z: Optional[Iterable] = None,
140
+ fit_ratio: float = 0.8,
141
+ z_fit: Optional[Iterable] = None,
142
+ z_calib: Optional[Iterable] = None,
143
+ **kwargs: Optional[dict],
144
+ ):
145
+ """This method fits the models on the fit data
146
+ and computes nonconformity scores on calibration data.
147
+ If z is provided, the data is randomly split into
148
+ fit and calibration subsets according to fit_ratio.
149
+ In case z_fit and z_calib are provided,
150
+ the conformalization is performed on the given user defined
151
+ fit and calibration sets.
152
+
153
+ .. NOTE::
154
+
155
+ If z is provided, `fit` ignores
156
+ any user-defined fit/calib split.
157
+
158
+
159
+ :param Iterable z: data points from the training dataset.
160
+ :param float fit_ratio: the proportion of samples assigned to the
161
+ fit subset.
162
+ :param Iterable z_fit: data points from the fit dataset.
163
+ :param Iterable z_calib: data points from the calibration dataset.
164
+ :param dict kwargs: fit configuration to be passed to the model's
165
+ fit method.
166
+
167
+ :raises RuntimeError: no dataset provided.
168
+
169
+ """
170
+
171
+ if z is not None:
172
+ splitter = RandomSplitter(
173
+ ratio=fit_ratio, random_state=self.random_state
174
+ )
175
+
176
+ elif z_fit is not None and z_calib is not None:
177
+ splitter = IdSplitter(z_fit, z_fit, z_calib, z_calib)
178
+
179
+ elif (
180
+ self.predictor.is_trained and z_fit is None and z_calib is not None
181
+ ):
182
+ splitter = IdSplitter(
183
+ np.empty_like(z_calib), np.empty_like(z_calib), z_calib, z_calib
184
+ )
185
+
186
+ else:
187
+ raise RuntimeError("No dataset provided.")
188
+
189
+ # Apply splitter
190
+ z_fit, _, z_calib, _ = splitter(z, z)[0]
191
+
192
+ # Fit underlying model and calibrator
193
+ if self.train:
194
+ logger.info("Fitting model")
195
+ self.predictor.fit(z_fit, **kwargs)
196
+
197
+ # Make sure that predictor is already trained if train arg is False
198
+ elif self.train is False and self.predictor.is_trained is False:
199
+ raise RuntimeError(
200
+ "'train' argument is set to 'False' but model is not pre-trained"
201
+ )
202
+
203
+ else: # Skipping training
204
+ logger.info("Skipping training.")
205
+
206
+ # Fitting calibrator
207
+ self.calibrator.fit(z_calib)
208
+
209
+ self.__is_fit = True
210
+
211
+ def predict(self, z_test: Iterable, alpha: float) -> np.ndarray:
212
+ """Predict whether each example is an anomaly or not. The decision is
213
+ taken based on the calibrated threshold (through conformal prediction)
214
+ of underlying anomaly detection scores.
215
+
216
+ :param Iterable z_test: new data points.
217
+ :param float alpha: target maximum FDR.
218
+
219
+ :returns: outlier tag. True if outlier, False otherwise.
220
+ :rtype: Iterable[bool]
221
+
222
+ """
223
+
224
+ if not self.__is_fit:
225
+ raise RuntimeError("Fit method should be called before predict.")
226
+
227
+ anomaly_pred = np.invert(
228
+ self.calibrator.is_conformal(z_test, alpha=alpha)
229
+ )
230
+
231
+ return anomaly_pred
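The thresholding idea behind `predict` can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the `ScoreCalibrator` implementation: the function names `calibrate_threshold` and `flag_anomalies` and the toy scores are hypothetical, and the threshold is taken as the ceil((n + 1)(1 - alpha))-th smallest calibration score.

```python
import math


def calibrate_threshold(calib_scores, alpha):
    """Pick the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(calib_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points: nothing is flagged
    return sorted(calib_scores)[rank - 1]


def flag_anomalies(test_scores, threshold):
    """True where the nonconformity score exceeds the calibrated threshold."""
    return [score > threshold for score in test_scores]


# Hypothetical nonconformity scores of 19 calibration points (all inliers).
calib_scores = list(range(1, 20))
threshold = calibrate_threshold(calib_scores, alpha=0.1)
flags = flag_anomalies([5, 18, 19.5], threshold)
```

With exchangeable inliers, a new inlier exceeds this threshold with probability at most `alpha`, which is what bounds the false alarm rate of the detector.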
@@ -0,0 +1,22 @@
1
+ # -*- coding: utf-8 -*-
2
+ # Copyright IRT Antoine de Saint Exupéry et Université Paul Sabatier Toulouse III - All
3
+ # rights reserved. DEEL is a research program operated by IVADO, IRT Saint Exupéry,
4
+ # CRIAQ and ANITI - https://www.deel.ai/
5
+ #
6
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
7
+ # of this software and associated documentation files (the "Software"), to deal
8
+ # in the Software without restriction, including without limitation the rights
9
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
10
+ # copies of the Software, and to permit persons to whom the Software is
11
+ # furnished to do so, subject to the following conditions:
12
+ #
13
+ # The above copyright notice and this permission notice shall be included in all
14
+ # copies or substantial portions of the Software.
15
+ #
16
+ # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
17
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
18
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
19
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
20
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
21
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
22
+ # SOFTWARE.