missoutlier-0.1.0.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- missoutlier-0.1.0/LICENSE +21 -0
- missoutlier-0.1.0/PKG-INFO +159 -0
- missoutlier-0.1.0/README.md +139 -0
- missoutlier-0.1.0/pyproject.toml +29 -0
- missoutlier-0.1.0/setup.cfg +4 -0
- missoutlier-0.1.0/src/missoutlier/__init__.py +6 -0
- missoutlier-0.1.0/src/missoutlier/detect.py +88 -0
- missoutlier-0.1.0/src/missoutlier.egg-info/PKG-INFO +159 -0
- missoutlier-0.1.0/src/missoutlier.egg-info/SOURCES.txt +10 -0
- missoutlier-0.1.0/src/missoutlier.egg-info/dependency_links.txt +1 -0
- missoutlier-0.1.0/src/missoutlier.egg-info/requires.txt +2 -0
- missoutlier-0.1.0/src/missoutlier.egg-info/top_level.txt +1 -0
--- /dev/null
+++ missoutlier-0.1.0/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Guillaume Pech
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
--- /dev/null
+++ missoutlier-0.1.0/PKG-INFO
@@ -0,0 +1,159 @@
+Metadata-Version: 2.4
+Name: missoutlier
+Version: 0.1.0
+Summary: Outlier detection using the MISS (MAD-IQR-SD Simultaneous) method
+Author-email: Guillaume Pech <guillaumepech.cog@gmail.com>
+License-Expression: MIT
+Project-URL: Homepage, https://github.com/GuillaumePech/missOutlierPy
+Project-URL: Paper, https://osf.io/preprints/psyarxiv/2r9yw_v2
+Keywords: outlier,detection,statistics,MAD,IQR,MISS
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Science/Research
+Classifier: Topic :: Scientific/Engineering :: Mathematics
+Classifier: Programming Language :: Python :: 3
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: numpy>=1.20
+Requires-Dist: scipy>=1.7
+Dynamic: license-file
+
+<div align="center">
+
+# 🎯 missoutlier
+
+### **Outlier Detection Using the MISS Method**
+
+*A weighted composite of MAD, IQR, and SD for robust univariate outlier detection*
+
+[](https://www.python.org/)
+[](https://opensource.org/licenses/MIT)
+[](https://osf.io/preprints/psyarxiv/2r9yw_v2)
+
+---
+
+</div>
+
+## Overview
+
+**missoutlier** implements the **MISS** (MAD–IQR–SD Simultaneous) method, a new approach for univariate outlier detection that combines three classical techniques into a single robust threshold:
+
+| Method | Bounds | Weight |
+|--------|--------|--------|
+| **MAD** (Median Absolute Deviation) | `median ± 1.5 × MAD` | 87.8% |
+| **IQR** (Interquartile Range) | `Q25/Q75 ± 2 × IQR` | 1.2% |
+| **SD** (Standard Deviation) | `mean ± 5 × SD` | 11.0% |
+
+The composite threshold is computed as:
+
+$$\text{MISS} = 0.878 \times \text{MAD} + 0.012 \times \text{IQR} + 0.11 \times \text{SD}$$
+
+By heavily weighting the robust MAD while retaining sensitivity from IQR and SD, MISS offers a balanced approach that handles skewed and heavy-tailed distributions better than any single method alone.
+
+---
+
+## Installation
+
+```bash
+# Install from GitHub
+pip install git+https://github.com/GuillaumePech/missOutlierPy.git
+```
+
+**Dependencies:** `numpy >= 1.20`, `scipy >= 1.7`
+
+---
+
+## Quick Start
+
+```python
+import numpy as np
+from missoutlier import detect_outliers_miss
+
+# Generate data with outliers
+x = np.concatenate([np.random.randn(100), [50, -40]])
+
+# Default: replace outliers with NaN
+x_clean = detect_outliers_miss(x)
+# Detected 2 outliers (1.96% of data) using MISS method.
+
+# Drop outliers entirely
+x_dropped = detect_outliers_miss(x, drop=True)
+# Detected 2 outliers (1.96% of data) using MISS method.
+
+# Handle existing NaNs
+x_na = np.concatenate([np.random.randn(100), [np.nan, 50]])
+x_clean = detect_outliers_miss(x_na, na_rm=True)
+
+# Silent mode (no messages)
+x_clean = detect_outliers_miss(x, silent=True)
+```
+
+---
+
+## Parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `data` | array-like | — | Input data (must be one-dimensional) |
+| `drop` | bool | `False` | If `True`, removes outliers. If `False`, replaces them with `NaN` |
+| `na_rm` | bool | `False` | If `True`, ignores `NaN` values when computing thresholds |
+| `silent` | bool | `False` | If `True`, suppresses the detection message |
+
+---
+
+## How It Works
+
+```
+           ┌──────────────┐
+           │  Input Data  │
+           └──────┬───────┘
+                  │
+     ┌────────────┼────────────┐
+     ▼            ▼            ▼
+┌─────────┐  ┌─────────┐  ┌─────────┐
+│   MAD   │  │   IQR   │  │   SD    │
+│ ×0.878  │  │ ×0.012  │  │  ×0.11  │
+└────┬────┘  └────┬────┘  └────┬────┘
+     │            │            │
+     └────────────┼────────────┘
+                  ▼
+         ┌────────────────┐
+         │ MISS Threshold │
+         └────────┬───────┘
+                  ▼
+         ┌────────────────┐
+         │ Flag Outliers  │
+         └────────────────┘
+```
+
+---
+
+## Also available in R
+
+```r
+devtools::install_github("GuillaumePech/missOutlierR")
+```
+
+---
+
+## Citation
+
+If you use this package in your research, please cite:
+
+> Pech, G., Vaccaro, N., Caspar, E. A., Amerio, P., Cleeremans, A., Leys, C., & Ley, C. (2026). How not to MISS an outlier: comparing three classic univariate methods and introducing a new one, the MAD–IQR–SD Simultaneous (MISS). *PsyArXiv*. https://doi.org/10.31234/osf.io/2r9yw_v2
+
+```bibtex
+@article{pech2026miss,
+  title={How not to {MISS} an outlier: comparing three classic univariate methods and introducing a new one, the {MAD--IQR--SD} Simultaneous ({MISS})},
+  author={Pech, Guillaume and Vaccaro, Niccol{\`o} and Caspar, Emilie A. and Amerio, Pietro and Cleeremans, Axel and Leys, Christophe and Ley, Christophe},
+  year={2026},
+  journal={PsyArXiv},
+  doi={10.31234/osf.io/2r9yw_v2}
+}
+```
+
+---
+
+## License
+
+MIT © [Guillaume Pech](https://github.com/GuillaumePech)
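The README's composite formula is written in terms of the method names, while `detect.py` in this diff combines the three upper bounds and the three lower bounds. Spelled out for the upper cutoff, and then evaluated under an idealized assumption, it can be sketched as follows. The symbols $U_{\text{MISS}}$, $\tilde{x}$ (median), $\bar{x}$ (mean) and $s$ (sample SD) are notation introduced here, not taken from the package. The numeric step assumes normally distributed data at the population level and an unscaled MAD, which is what `scipy.stats.median_abs_deviation` returns with its default `scale=1.0`.

$$
\begin{aligned}
U_{\text{MISS}} &= 0.878\,(\tilde{x} + 1.5\,\text{MAD}) + 0.012\,(Q_{75} + 2\,\text{IQR}) + 0.11\,(\bar{x} + 5\,s) \\
\text{with}\quad & \tilde{x} = \bar{x} = \mu,\quad \text{MAD} \approx 0.6745\,\sigma,\quad Q_{75} \approx \mu + 0.6745\,\sigma,\quad \text{IQR} \approx 1.349\,\sigma,\quad s = \sigma: \\
U_{\text{MISS}} &\approx \mu + \bigl(0.878 \cdot 1.0118 + 0.012 \cdot 3.3725 + 0.11 \cdot 5\bigr)\,\sigma \\
&\approx \mu + (0.888 + 0.040 + 0.550)\,\sigma \approx \mu + 1.48\,\sigma
\end{aligned}
$$

Because the weights sum to 1 (0.878 + 0.012 + 0.11), the center term stays at $\mu$, and by symmetry the lower cutoff in this idealized case is about $\mu - 1.48\,\sigma$. On real samples the SD term is inflated by extreme values, so the cutoffs widen accordingly.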
--- /dev/null
+++ missoutlier-0.1.0/README.md
@@ -0,0 +1,139 @@
+<div align="center">
+
+# 🎯 missoutlier
+
+### **Outlier Detection Using the MISS Method**
+
+*A weighted composite of MAD, IQR, and SD for robust univariate outlier detection*
+
+[](https://www.python.org/)
+[](https://opensource.org/licenses/MIT)
+[](https://osf.io/preprints/psyarxiv/2r9yw_v2)
+
+---
+
+</div>
+
+## Overview
+
+**missoutlier** implements the **MISS** (MAD–IQR–SD Simultaneous) method, a new approach for univariate outlier detection that combines three classical techniques into a single robust threshold:
+
+| Method | Bounds | Weight |
+|--------|--------|--------|
+| **MAD** (Median Absolute Deviation) | `median ± 1.5 × MAD` | 87.8% |
+| **IQR** (Interquartile Range) | `Q25/Q75 ± 2 × IQR` | 1.2% |
+| **SD** (Standard Deviation) | `mean ± 5 × SD` | 11.0% |
+
+The composite threshold is computed as:
+
+$$\text{MISS} = 0.878 \times \text{MAD} + 0.012 \times \text{IQR} + 0.11 \times \text{SD}$$
+
+By heavily weighting the robust MAD while retaining sensitivity from IQR and SD, MISS offers a balanced approach that handles skewed and heavy-tailed distributions better than any single method alone.
+
+---
+
+## Installation
+
+```bash
+# Install from GitHub
+pip install git+https://github.com/GuillaumePech/missOutlierPy.git
+```
+
+**Dependencies:** `numpy >= 1.20`, `scipy >= 1.7`
+
+---
+
+## Quick Start
+
+```python
+import numpy as np
+from missoutlier import detect_outliers_miss
+
+# Generate data with outliers
+x = np.concatenate([np.random.randn(100), [50, -40]])
+
+# Default: replace outliers with NaN
+x_clean = detect_outliers_miss(x)
+# Detected 2 outliers (1.96% of data) using MISS method.
+
+# Drop outliers entirely
+x_dropped = detect_outliers_miss(x, drop=True)
+# Detected 2 outliers (1.96% of data) using MISS method.
+
+# Handle existing NaNs
+x_na = np.concatenate([np.random.randn(100), [np.nan, 50]])
+x_clean = detect_outliers_miss(x_na, na_rm=True)
+
+# Silent mode (no messages)
+x_clean = detect_outliers_miss(x, silent=True)
+```
+
+---
+
+## Parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `data` | array-like | — | Input data (must be one-dimensional) |
+| `drop` | bool | `False` | If `True`, removes outliers. If `False`, replaces them with `NaN` |
+| `na_rm` | bool | `False` | If `True`, ignores `NaN` values when computing thresholds |
+| `silent` | bool | `False` | If `True`, suppresses the detection message |
+
+---
+
+## How It Works
+
+```
+           ┌──────────────┐
+           │  Input Data  │
+           └──────┬───────┘
+                  │
+     ┌────────────┼────────────┐
+     ▼            ▼            ▼
+┌─────────┐  ┌─────────┐  ┌─────────┐
+│   MAD   │  │   IQR   │  │   SD    │
+│ ×0.878  │  │ ×0.012  │  │  ×0.11  │
+└────┬────┘  └────┬────┘  └────┬────┘
+     │            │            │
+     └────────────┼────────────┘
+                  ▼
+         ┌────────────────┐
+         │ MISS Threshold │
+         └────────┬───────┘
+                  ▼
+         ┌────────────────┐
+         │ Flag Outliers  │
+         └────────────────┘
+```
+
+---
+
+## Also available in R
+
+```r
+devtools::install_github("GuillaumePech/missOutlierR")
+```
+
+---
+
+## Citation
+
+If you use this package in your research, please cite:
+
+> Pech, G., Vaccaro, N., Caspar, E. A., Amerio, P., Cleeremans, A., Leys, C., & Ley, C. (2026). How not to MISS an outlier: comparing three classic univariate methods and introducing a new one, the MAD–IQR–SD Simultaneous (MISS). *PsyArXiv*. https://doi.org/10.31234/osf.io/2r9yw_v2
+
+```bibtex
+@article{pech2026miss,
+  title={How not to {MISS} an outlier: comparing three classic univariate methods and introducing a new one, the {MAD--IQR--SD} Simultaneous ({MISS})},
+  author={Pech, Guillaume and Vaccaro, Niccol{\`o} and Caspar, Emilie A. and Amerio, Pietro and Cleeremans, Axel and Leys, Christophe and Ley, Christophe},
+  year={2026},
+  journal={PsyArXiv},
+  doi={10.31234/osf.io/2r9yw_v2}
+}
+```
+
+---
+
+## License
+
+MIT © [Guillaume Pech](https://github.com/GuillaumePech)
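To make the README's bounds table and weights concrete, here is a minimal sketch that recomputes the two MISS cutoffs with only the Python standard library. It is not the package's code: `miss_bounds` is a hypothetical helper name, the MAD is left unscaled, and `statistics.quantiles(..., method="inclusive")` is assumed to match the linear-interpolation percentiles that NumPy uses by default. Inputs are assumed to be finite numbers with no `NaN`.

```python
import statistics


def miss_bounds(values):
    """Return (lower, upper) MISS cutoffs for a sequence of finite numbers.

    Hypothetical stdlib-only illustration of the README's weighting scheme.
    """
    values = list(values)
    med = statistics.median(values)
    # Unscaled MAD: median of the absolute deviations from the median.
    mad = statistics.median(abs(v - med) for v in values)
    # "inclusive" quartiles interpolate linearly between order statistics.
    q25, _, q75 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q75 - q25
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)

    # Weighted combination of the three upper bounds and three lower bounds.
    upper = 0.878 * (med + 1.5 * mad) + 0.012 * (q75 + 2 * iqr) + 0.11 * (mean + 5 * sd)
    lower = 0.878 * (med - 1.5 * mad) + 0.012 * (q25 - 2 * iqr) + 0.11 * (mean - 5 * sd)
    return lower, upper


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
lower, upper = miss_bounds(sample)
# Inclusive comparison: values at or beyond a cutoff are flagged.
flagged = [v for v in sample if v <= lower or v >= upper]
print(f"bounds: ({lower:.2f}, {upper:.2f}); flagged: {flagged}")
```

For this sample only the value 100 falls outside the cutoffs. The single extreme value widens the SD-based bound considerably, which is why the upper cutoff lands well above 9 despite the heavy MAD weight.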
--- /dev/null
+++ missoutlier-0.1.0/pyproject.toml
@@ -0,0 +1,29 @@
+[build-system]
+requires = ["setuptools>=61.0"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "missoutlier"
+version = "0.1.0"
+description = "Outlier detection using the MISS (MAD-IQR-SD Simultaneous) method"
+readme = "README.md"
+license = "MIT"
+requires-python = ">=3.8"
+authors = [
+    { name = "Guillaume Pech", email = "guillaumepech.cog@gmail.com" },
+]
+keywords = ["outlier", "detection", "statistics", "MAD", "IQR", "MISS"]
+classifiers = [
+    "Development Status :: 3 - Alpha",
+    "Intended Audience :: Science/Research",
+    "Topic :: Scientific/Engineering :: Mathematics",
+    "Programming Language :: Python :: 3",
+]
+dependencies = [
+    "numpy>=1.20",
+    "scipy>=1.7",
+]
+
+[project.urls]
+Homepage = "https://github.com/GuillaumePech/missOutlierPy"
+Paper = "https://osf.io/preprints/psyarxiv/2r9yw_v2"
--- /dev/null
+++ missoutlier-0.1.0/src/missoutlier/detect.py
@@ -0,0 +1,88 @@
+"""Outlier detection using the MISS (MAD-IQR-SD Simultaneous) method."""
+
+import numpy as np
+from scipy.stats import median_abs_deviation
+
+
+def detect_outliers_miss(data, drop=False, na_rm=False, silent=False):
+    """
+    Detect outliers in data using the MISS method.
+
+    Parameters
+    ----------
+    data : array-like
+        Input data (will be converted to numeric array)
+    drop : bool, optional
+        If True, removes outliers; if False, replaces with NaN (default False)
+    na_rm : bool, optional
+        If True, remove NaN values before computation (default False)
+    silent : bool, optional
+        If True, suppress messages (default False)
+
+    Returns
+    -------
+    numpy.ndarray
+        Array with outliers removed or replaced with NaN
+
+    References
+    ----------
+    Pech, G., Vaccaro, N., Caspar, E. A., Amerio, P., Cleeremans, A.,
+    Leys, C., & Ley, C. (2026). How not to MISS an outlier: comparing
+    three classic univariate methods and introducing a new one, the
+    MAD-IQR-SD Simultaneous (MISS). *PsyArXiv*.
+    https://doi.org/10.31234/osf.io/2r9yw_v2
+    """
+    # Convert to numpy array and ensure numeric type
+    data = np.asarray(data, dtype=float)
+
+    # Check if data is one-dimensional
+    if data.ndim > 1:
+        raise ValueError("Data must be one-dimensional")
+
+    # Create a copy to avoid modifying original
+    data = data.copy()
+
+    # Helper function for handling NaN
+    def _get_func(func_name):
+        if na_rm:
+            return getattr(np, f'nan{func_name}')
+        return getattr(np, func_name)
+
+    # Outlier detection using Median Absolute Deviation (MAD)
+    med = _get_func('median')(data)
+    mad = median_abs_deviation(data, nan_policy='omit' if na_rm else 'propagate')
+    outlier_upper_mad = med + (1.5 * mad)
+    outlier_lower_mad = med - (1.5 * mad)
+
+    # Outlier detection using Interquartile Range (IQR)
+    q75 = np.nanpercentile(data, 75) if na_rm else np.percentile(data, 75)
+    q25 = np.nanpercentile(data, 25) if na_rm else np.percentile(data, 25)
+    iqr = q75 - q25
+    outlier_upper_iqr = q75 + (2 * iqr)
+    outlier_lower_iqr = q25 - (2 * iqr)
+
+    # Outlier detection using Standard Deviation (SD)
+    mean_val = _get_func('mean')(data)
+    std_val = _get_func('std')(data, ddof=1)  # ddof=1 for sample std like R
+    outlier_upper_sd = mean_val + (5 * std_val)
+    outlier_lower_sd = mean_val - (5 * std_val)
+
+    # Combine all outlier thresholds into the MISS
+    outlier_upper_MISS = 0.878 * outlier_upper_mad + 0.012 * outlier_upper_iqr + 0.11 * outlier_upper_sd
+    outlier_lower_MISS = 0.878 * outlier_lower_mad + 0.012 * outlier_lower_iqr + 0.11 * outlier_lower_sd
+
+    # Identify outliers
+    outlier_idx = np.where((data <= outlier_lower_MISS) | (data >= outlier_upper_MISS))[0]
+
+    # Calculate percentage of outliers
+    pct_outliers = len(outlier_idx) / len(data) * 100
+
+    if not silent:
+        print(f"Detected {len(outlier_idx)} outliers ({pct_outliers:.2f}% of data) using MISS method.")
+
+    if drop:
+        data = np.delete(data, outlier_idx)
+    else:
+        data[outlier_idx] = np.nan
+
+    return data
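The last steps of `detect_outliers_miss` (flagging, the percentage message, and the `drop` switch) can be illustrated in isolation. The sketch below is a plain-Python approximation under stated assumptions, not the package's implementation: `apply_bounds` is a hypothetical helper, it works on lists rather than NumPy arrays, and it takes the two cutoffs as arguments instead of computing them. It mirrors two details visible in the hunk above: the comparisons are inclusive (`<=` and `>=`), and the percentage is computed over the full input length, so existing `NaN` entries still count in the denominator.

```python
import math


def apply_bounds(values, lower, upper, drop=False):
    """Flag values at or beyond the cutoffs, then drop them or replace them with NaN.

    Hypothetical illustration; returns (cleaned_values, percent_flagged).
    """
    # NaN compares False against any cutoff, so existing NaNs are never flagged.
    is_outlier = [(v <= lower) or (v >= upper) for v in values]
    # Percentage over the full input length, NaN entries included.
    pct = 100 * sum(is_outlier) / len(values)
    if drop:
        cleaned = [v for v, flag in zip(values, is_outlier) if not flag]
    else:
        cleaned = [math.nan if flag else v for v, flag in zip(values, is_outlier)]
    return cleaned, pct


vals = [1.0, 2.0, math.nan, 50.0]
kept, pct = apply_bounds(vals, lower=0.0, upper=10.0)
dropped, _ = apply_bounds(vals, lower=0.0, upper=10.0, drop=True)
print(kept, dropped, f"{pct:.2f}%")
```

With `drop=False` the output keeps its original length, which preserves alignment with other columns of a dataset. With `drop=True` the result is shorter, and any `NaN` that was already present remains in it, since only flagged positions are removed.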
--- /dev/null
+++ missoutlier-0.1.0/src/missoutlier.egg-info/SOURCES.txt
@@ -0,0 +1,10 @@
+LICENSE
+README.md
+pyproject.toml
+src/missoutlier/__init__.py
+src/missoutlier/detect.py
+src/missoutlier.egg-info/PKG-INFO
+src/missoutlier.egg-info/SOURCES.txt
+src/missoutlier.egg-info/dependency_links.txt
+src/missoutlier.egg-info/requires.txt
+src/missoutlier.egg-info/top_level.txt
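--- /dev/null
+++ missoutlier-0.1.0/src/missoutlier.egg-info/dependency_links.txt
@@ -0,0 +1 @@
+
--- /dev/null
+++ missoutlier-0.1.0/src/missoutlier.egg-info/top_level.txt
@@ -0,0 +1 @@
+missoutlier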