deskit 0.2.0__tar.gz → 0.3.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {deskit-0.2.0/src/deskit.egg-info → deskit-0.3.0}/PKG-INFO +18 -11
- {deskit-0.2.0 → deskit-0.3.0}/README.md +17 -10
- {deskit-0.2.0 → deskit-0.3.0}/pyproject.toml +1 -1
- {deskit-0.2.0 → deskit-0.3.0}/src/deskit/__init__.py +4 -4
- {deskit-0.2.0 → deskit-0.3.0}/src/deskit/des/__init__.py +2 -2
- deskit-0.2.0/src/deskit/des/knndwsi.py → deskit-0.3.0/src/deskit/des/dewsi.py +4 -4
- deskit-0.2.0/src/deskit/des/knndws.py → deskit-0.3.0/src/deskit/des/dewsu.py +3 -3
- {deskit-0.2.0 → deskit-0.3.0}/src/deskit/router.py +9 -9
- {deskit-0.2.0 → deskit-0.3.0/src/deskit.egg-info}/PKG-INFO +18 -11
- {deskit-0.2.0 → deskit-0.3.0}/src/deskit.egg-info/SOURCES.txt +2 -2
- {deskit-0.2.0 → deskit-0.3.0}/LICENSE +0 -0
- {deskit-0.2.0 → deskit-0.3.0}/setup.cfg +0 -0
- {deskit-0.2.0 → deskit-0.3.0}/src/deskit/_config.py +0 -0
- {deskit-0.2.0 → deskit-0.3.0}/src/deskit/analysis.py +0 -0
- {deskit-0.2.0 → deskit-0.3.0}/src/deskit/base/__init__.py +0 -0
- {deskit-0.2.0 → deskit-0.3.0}/src/deskit/base/base.py +0 -0
- {deskit-0.2.0 → deskit-0.3.0}/src/deskit/base/knnbase.py +0 -0
- {deskit-0.2.0 → deskit-0.3.0}/src/deskit/des/knorae.py +0 -0
- {deskit-0.2.0 → deskit-0.3.0}/src/deskit/des/knoraiu.py +0 -0
- {deskit-0.2.0 → deskit-0.3.0}/src/deskit/des/knorau.py +0 -0
- {deskit-0.2.0 → deskit-0.3.0}/src/deskit/des/ola.py +0 -0
- {deskit-0.2.0 → deskit-0.3.0}/src/deskit/metrics.py +0 -0
- {deskit-0.2.0 → deskit-0.3.0}/src/deskit/neighbors.py +0 -0
- {deskit-0.2.0 → deskit-0.3.0}/src/deskit/utils.py +0 -0
- {deskit-0.2.0 → deskit-0.3.0}/src/deskit.egg-info/dependency_links.txt +0 -0
- {deskit-0.2.0 → deskit-0.3.0}/src/deskit.egg-info/requires.txt +0 -0
- {deskit-0.2.0 → deskit-0.3.0}/src/deskit.egg-info/top_level.txt +0 -0

--- deskit-0.2.0/src/deskit.egg-info/PKG-INFO
+++ deskit-0.3.0/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: deskit
-Version: 0.
+Version: 0.3.0
 Summary: A Python library for Dynamic Ensemble Selection
 Author: Tikhon Vodyanov
 License-Expression: MIT
@@ -31,7 +31,7 @@ Dynamic: license-file
 
 # deskit
 
-
+deskit is a flexible, lightweight, and easy-to-use ensembling library that implements
 Dynamic Ensemble Selection (DES) algorithms for ensembling multiple ML models
 on a given dataset.
 
@@ -43,6 +43,8 @@ requiring any wrappers, including custom models, popular ML libraries, and APIs.
 deskit includes several DES algorithms, and it works with both classification
 and regression.
 
+See the full documentation [here](https://TikaaVo.github.io/deskit/).
+
 # Dynamic Ensemble Selection
 
 Ensemble learning in machine learning refers to when multiple models trained on a
@@ -150,8 +152,8 @@ weights = router.predict(X_test[i])
 
 | Method | Best for | Notes |
 |-----------|---|----------------------------------------------------------------------------------------------------------|
-| `
-| `
+| `DEWSU` | Regression | Softmax over neighbourhood-averaged scores. Temperature controls sharpness. |
+| `DEWSI` | Regression | Like DEWS-U but scores are inverse-distance weighted. |
 | `KNORAU` | Classification | Vote-count weighting. Each model earns one vote per neighbour it correctly classifies. |
 | `KNORAE` | Classification | Intersection-based. Only models correct on all neighbours survive; falls back to smaller neighbourhoods. |
 | `KNORAIU` | Classification | Like KNORA-U but votes are inverse-distance weighted. |
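
The two vote-based rows in the table (`KNORAU` and `KNORAIU`) can be illustrated with a few lines of NumPy. This is a minimal sketch under assumptions, not deskit's implementation: it assumes you already have a boolean matrix `correct` recording which model classified each of the k nearest validation neighbours correctly (plus the neighbours' distances), and it normalises votes into weights that sum to 1. The function names and the uniform fallback when no model earns a vote are hypothetical.

```python
import numpy as np


def _normalise(votes):
    """Turn raw vote totals into weights that sum to 1."""
    total = votes.sum()
    if total == 0:
        # Hypothetical fallback: no model earned a vote, so weight all models equally.
        return np.full(votes.shape, 1.0 / votes.size)
    return votes / total


def knora_u_weights(correct):
    """KNORA-U style weights: one vote per neighbour a model classifies correctly.

    correct: array-like of shape (k, n_models), truthy where model m got neighbour j right.
    """
    votes = np.asarray(correct, dtype=float).sum(axis=0)
    return _normalise(votes)


def knora_iu_weights(correct, distances, eps=1e-12):
    """KNORA-IU style weights: each correct neighbour votes 1 / distance instead of 1."""
    inv = 1.0 / (np.asarray(distances, dtype=float) + eps)  # shape (k,)
    votes = inv @ np.asarray(correct, dtype=float)           # shape (n_models,)
    return _normalise(votes)


correct = np.array([
    [True, False],   # nearest neighbour: only model 0 is correct
    [False, True],   # second neighbour: only model 1 is correct
    [False, True],   # third neighbour: only model 1 is correct
])
distances = np.array([0.1, 1.0, 1.0])

print(knora_u_weights(correct))              # model 1 has more votes
print(knora_iu_weights(correct, distances))  # model 0 wins: its one vote comes from a very close neighbour
```

Swapping uniform votes for 1/d votes lets one very close neighbour outweigh two distant ones, which is the behaviour the table describes for KNORA-IU.
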
@@ -202,13 +204,18 @@ def pinball(y_true, y_pred, alpha=0.9):
     e = y_true - y_pred
     return alpha * e if e >= 0 else (alpha - 1) * e
 
-router =
+router = DEWSU(task="regression", metric=pinball, mode="min", k=20)
 ```
 
 Built-in metric strings: `accuracy`, `mae`, `mse`, `rmse`, `log_loss`, `prob_correct`.
 
 ---
 
+## Data types
+
+deskit can be used with non-tabular data types like images, time series, and more. However, when used, the
+passed features either need to be run through a feature extractor beforehand, such as a CNN backbone for images.
+
 ## Benchmark results
 
 100-seed benchmark (seeds 0–99) on standard sklearn and OpenML datasets. "Best Single" is the best
@@ -224,7 +231,7 @@ Pool: KNN, Decision Tree, SVR, Ridge, Bayesian Ridge.
 
 This pool was selected for having variability in architectures while avoiding a single dominant model.
 
-deskit algorithms tested: OLA,
+deskit algorithms tested: OLA, DEWS-U, DEWS-I, KNORA-U, KNORA-E, KNORA-IU.
 
 ### Regression (MAE, lower is better)
 
@@ -232,10 +239,10 @@ deskit algorithms tested: OLA, KNN-DWS, KNN-DWS-I, KNORA-U, KNORA-E, KNORA-IU.
 
 | Dataset | Best Single | Simple Avg | deskit best |
 |------------------------------|-------------|------------|-------------------------|
-| California Housing (sklearn) | 0.3955 | +7.93% | **−2.68%** (
-| Bike Sharing (OpenML) | 51.604 | +48.39% | **−6.25%** (
+| California Housing (sklearn) | 0.3955 | +7.93% | **−2.68%** (DEWS-I) |
+| Bike Sharing (OpenML) | 51.604 | +48.39% | **−6.25%** (DEWS-I) |
 | Abalone (OpenML) | **1.4923** | +1.29% | +1.61% (KNORA-IU) |
-| Diabetes (sklearn) | **44.986** | +2.98% | +0.88% (
+| Diabetes (sklearn) | **44.986** | +2.98% | +0.88% (DEWS-I) |
 | Concrete Strength (OpenML) | 5.3934 | +21.30% | **−2.85%** (KNORA-IU) |
 
 deskit beats best single and simple averaging on 3/5 regression datasets. This shows how DES can provide a
@@ -252,10 +259,10 @@ and classification-like (like in Abalone).
 
 | Dataset | Best Single | Simple Avg | deskit best |
 |------------------------|-------------|------------|-------------------------|
-| HAR (OpenML) | 98.24% | −0.32% | **+0.14%** (
+| HAR (OpenML) | 98.24% | −0.32% | **+0.14%** (DEWS-I) |
 | Yeast (OpenML) | 59.19% | +0.46% | **+1.48%** (KNORA-IU) |
 | Image Segment (OpenML) | 93.65% | +1.70% | **+2.33%** (KNORA-IU) |
-| Waveform (OpenML) | **86.28%** | −1.04% | −0.55% (
+| Waveform (OpenML) | **86.28%** | −1.04% | −0.55% (DEWS-I) |
 | Vowel (OpenML) | 90.54% | −1.81% | **+0.93%** (KNORA-IU) |
 
 deskit beats or matches best single and simple averaging on 4/5 classification datasets. As seen on regression, DES

--- deskit-0.2.0/README.md
+++ deskit-0.3.0/README.md
@@ -1,6 +1,6 @@
 # deskit
 
-
+deskit is a flexible, lightweight, and easy-to-use ensembling library that implements
 Dynamic Ensemble Selection (DES) algorithms for ensembling multiple ML models
 on a given dataset.
 
@@ -12,6 +12,8 @@ requiring any wrappers, including custom models, popular ML libraries, and APIs.
 deskit includes several DES algorithms, and it works with both classification
 and regression.
 
+See the full documentation [here](https://TikaaVo.github.io/deskit/).
+
 # Dynamic Ensemble Selection
 
 Ensemble learning in machine learning refers to when multiple models trained on a
@@ -119,8 +121,8 @@ weights = router.predict(X_test[i])
 
 | Method | Best for | Notes |
 |-----------|---|----------------------------------------------------------------------------------------------------------|
-| `
-| `
+| `DEWSU` | Regression | Softmax over neighbourhood-averaged scores. Temperature controls sharpness. |
+| `DEWSI` | Regression | Like DEWS-U but scores are inverse-distance weighted. |
 | `KNORAU` | Classification | Vote-count weighting. Each model earns one vote per neighbour it correctly classifies. |
 | `KNORAE` | Classification | Intersection-based. Only models correct on all neighbours survive; falls back to smaller neighbourhoods. |
 | `KNORAIU` | Classification | Like KNORA-U but votes are inverse-distance weighted. |
@@ -171,13 +173,18 @@ def pinball(y_true, y_pred, alpha=0.9):
     e = y_true - y_pred
     return alpha * e if e >= 0 else (alpha - 1) * e
 
-router =
+router = DEWSU(task="regression", metric=pinball, mode="min", k=20)
 ```
 
 Built-in metric strings: `accuracy`, `mae`, `mse`, `rmse`, `log_loss`, `prob_correct`.
 
 ---
 
+## Data types
+
+deskit can be used with non-tabular data types like images, time series, and more. However, when used, the
+passed features either need to be run through a feature extractor beforehand, such as a CNN backbone for images.
+
 ## Benchmark results
 
 100-seed benchmark (seeds 0–99) on standard sklearn and OpenML datasets. "Best Single" is the best
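
To see why the README pairs `pinball` with `mode="min"`, it helps to evaluate the function on two errors of equal size. The snippet below copies the README's per-sample definition unchanged; the sample values are arbitrary. With `alpha=0.9`, under-predicting costs nine times as much as over-predicting by the same amount, so (assuming `mode="min"` means lower scores count as more competent) the router should favour models that rarely under-predict near the query.

```python
def pinball(y_true, y_pred, alpha=0.9):
    """Per-sample pinball (quantile) loss, as defined in the README example."""
    e = y_true - y_pred
    return alpha * e if e >= 0 else (alpha - 1) * e


under = pinball(10.0, 8.0)  # prediction 2 below the target: 0.9 * 2
over = pinball(8.0, 10.0)   # prediction 2 above the target: (0.9 - 1) * -2
print(round(under, 6), round(over, 6))  # 1.8 0.2
print(round(under / over, 6))           # 9.0
```
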
@@ -193,7 +200,7 @@ Pool: KNN, Decision Tree, SVR, Ridge, Bayesian Ridge.
 
 This pool was selected for having variability in architectures while avoiding a single dominant model.
 
-deskit algorithms tested: OLA,
+deskit algorithms tested: OLA, DEWS-U, DEWS-I, KNORA-U, KNORA-E, KNORA-IU.
 
 ### Regression (MAE, lower is better)
 
@@ -201,10 +208,10 @@ deskit algorithms tested: OLA, KNN-DWS, KNN-DWS-I, KNORA-U, KNORA-E, KNORA-IU.
 
 | Dataset | Best Single | Simple Avg | deskit best |
 |------------------------------|-------------|------------|-------------------------|
-| California Housing (sklearn) | 0.3955 | +7.93% | **−2.68%** (
-| Bike Sharing (OpenML) | 51.604 | +48.39% | **−6.25%** (
+| California Housing (sklearn) | 0.3955 | +7.93% | **−2.68%** (DEWS-I) |
+| Bike Sharing (OpenML) | 51.604 | +48.39% | **−6.25%** (DEWS-I) |
 | Abalone (OpenML) | **1.4923** | +1.29% | +1.61% (KNORA-IU) |
-| Diabetes (sklearn) | **44.986** | +2.98% | +0.88% (
+| Diabetes (sklearn) | **44.986** | +2.98% | +0.88% (DEWS-I) |
 | Concrete Strength (OpenML) | 5.3934 | +21.30% | **−2.85%** (KNORA-IU) |
 
 deskit beats best single and simple averaging on 3/5 regression datasets. This shows how DES can provide a
@@ -221,10 +228,10 @@ and classification-like (like in Abalone).
 
 | Dataset | Best Single | Simple Avg | deskit best |
 |------------------------|-------------|------------|-------------------------|
-| HAR (OpenML) | 98.24% | −0.32% | **+0.14%** (
+| HAR (OpenML) | 98.24% | −0.32% | **+0.14%** (DEWS-I) |
 | Yeast (OpenML) | 59.19% | +0.46% | **+1.48%** (KNORA-IU) |
 | Image Segment (OpenML) | 93.65% | +1.70% | **+2.33%** (KNORA-IU) |
-| Waveform (OpenML) | **86.28%** | −1.04% | −0.55% (
+| Waveform (OpenML) | **86.28%** | −1.04% | −0.55% (DEWS-I) |
 | Vowel (OpenML) | 90.54% | −1.81% | **+0.93%** (KNORA-IU) |
 
 deskit beats or matches best single and simple averaging on 4/5 classification datasets. As seen on regression, DES

--- deskit-0.2.0/src/deskit/__init__.py
+++ deskit-0.3.0/src/deskit/__init__.py
@@ -5,13 +5,13 @@ Metrics
 -------
 Pass a metric name string:
 
-
+DEWSU(task='classification', metric='log_loss', mode='min')
 
 Or import a metric function directly:
 
 from deskit.metrics import log_loss, mae
 
-
+DEWSU(task='classification', metric=log_loss, mode='min')
 
 Available built-in metrics:
 Scalar predictions (pass predict() output):
@@ -21,7 +21,7 @@ Available built-in metrics:
 'log_loss', 'prob_correct'
 """
 
-from deskit.des.
+from deskit.des.dewsu import DEWSU
 from deskit.des.ola import OLA
 from deskit.des.knorau import KNORAU
 from deskit.des.knorae import KNORAE
@@ -31,7 +31,7 @@ from deskit._config import SPEED_PRESETS, list_presets
 from deskit.analysis import analyze
 
 __all__ = [
-    '
+    'DEWSU',
     'OLA',
     'KNORAU',
     'KNORAE',
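
The docstring above says `metric` can be either a built-in name string or an imported function. A minimal sketch of that string-or-callable pattern follows. `resolve_metric_sketch` and `_BUILTINS` are hypothetical (deskit's own `resolve_metric` lives in `deskit._config`, but its signature is not shown in this diff), and the per-sample formulas are the textbook ones, which may differ in detail from deskit's. For `log_loss` and `prob_correct` the sketch assumes `y_pred` is a probability vector indexed by an integer class label.

```python
import math
from typing import Callable, Union

Metric = Callable[..., float]

# Textbook per-sample definitions (assumed; not copied from deskit).
_BUILTINS = {
    "accuracy": lambda y_true, y_pred: float(y_true == y_pred),
    "mae": lambda y_true, y_pred: abs(y_true - y_pred),
    "mse": lambda y_true, y_pred: (y_true - y_pred) ** 2,
    "rmse": lambda y_true, y_pred: math.sqrt((y_true - y_pred) ** 2),  # equals |error| for one sample
    "log_loss": lambda y_true, proba: -math.log(max(proba[y_true], 1e-15)),
    "prob_correct": lambda y_true, proba: float(proba[y_true]),
}


def resolve_metric_sketch(metric: Union[str, Metric]) -> Metric:
    """Return a per-sample scorer: pass callables through, look names up in _BUILTINS."""
    if callable(metric):
        return metric
    if metric not in _BUILTINS:
        raise ValueError(f"unknown metric {metric!r}; expected one of {sorted(_BUILTINS)}")
    return _BUILTINS[metric]


print(resolve_metric_sketch("mae")(3.0, 5.0))               # 2.0
print(resolve_metric_sketch("prob_correct")(1, [0.2, 0.8]))  # 0.8
```

Passing callables through untouched is what lets a custom scorer such as the README's `pinball` sit alongside the named built-ins.
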

--- deskit-0.2.0/src/deskit/des/__init__.py
+++ deskit-0.3.0/src/deskit/des/__init__.py
@@ -1,7 +1,7 @@
-from deskit.des.
+from deskit.des.dewsu import DEWSU
 from deskit.des.ola import OLA
 from deskit.des.knorau import KNORAU
 from deskit.des.knorae import KNORAE
 from deskit.des.knoraiu import KNORAIU
 
-__all__ = ['
+__all__ = ['DEWSU', 'OLA', 'KNORAU', 'KNORAE', 'KNORAIU']

--- deskit-0.2.0/src/deskit/des/knndwsi.py
+++ deskit-0.3.0/src/deskit/des/dewsi.py
@@ -1,5 +1,5 @@
 """
-
+DEWS-IU: K-Nearest Neighbors with Distance-Weighted Softmax — Inverse-weighted Union.
 """
 from deskit.base.knnbase import KNNBase
 from deskit._config import make_finder, resolve_metric, prep_fit_inputs
@@ -7,11 +7,11 @@ from deskit.utils import to_numpy
 import numpy as np
 
 
-class
+class DEWSI(KNNBase):
     """
-
+    DEWS-IU: K-Nearest Neighbors with Distance-Weighted Softmax — Inverse-weighted Union.
 
-    Extends
+    Extends DEWS-U by replacing the simple average of neighbor scores with an
     inverse-distance-weighted average, so closer neighbors have a stronger
     influence on the softmax routing — analogous to how KNORA-IU extends KNORA-U.
 
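
The docstring describes DEWS-I as DEWS-U with an inverse-distance-weighted average of neighbour scores in place of a plain mean. The sketch below shows only that averaging step feeding a temperature-scaled softmax; it is an illustration under assumptions, not the code in `dewsi.py`. `dews_i_weights` is a hypothetical name, `neighbour_scores[j, m]` is assumed to hold model m's per-sample metric on the j-th nearest validation neighbour, and the `threshold` competence gate and per-neighbourhood normalisation mentioned in the router docstring are omitted.

```python
import numpy as np


def dews_i_weights(neighbour_scores, distances, temperature=1.0, mode="max", eps=1e-12):
    """Softmax over inverse-distance-weighted neighbour scores (illustrative sketch)."""
    scores = np.asarray(neighbour_scores, dtype=float)      # (k, n_models)
    inv = 1.0 / (np.asarray(distances, dtype=float) + eps)  # (k,), closer neighbours weigh more
    avg = inv @ scores / inv.sum()                          # weighted mean per model
    if mode == "min":
        avg = -avg                                          # for losses, lower is better
    z = avg / temperature
    z = z - z.max()                                         # stabilise exp(); softmax is unchanged
    w = np.exp(z)
    return w / w.sum()


# Absolute errors of two models on two neighbours; the first neighbour is much closer.
errors = np.array([[0.0, 2.0],
                   [2.0, 0.0]])
distances = np.array([0.1, 1.0])

print(dews_i_weights(errors, distances, mode="min"))   # model 0 is favoured
print(dews_i_weights(errors, [1.0, 1.0], mode="min"))  # equal distances reduce to a plain mean: [0.5 0.5]
```

With a uniform mean the two models tie; weighting by 1/d lets the close neighbour decide, which is the effect the docstring describes.
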

--- deskit-0.2.0/src/deskit/des/knndws.py
+++ deskit-0.3.0/src/deskit/des/dewsu.py
@@ -1,5 +1,5 @@
 """
-
+DEWS-U: K-Nearest Neighbors with Distance-Weighted Softmax.
 """
 from deskit.base.knnbase import KNNBase
 from deskit._config import make_finder, resolve_metric, prep_fit_inputs
@@ -7,9 +7,9 @@ from deskit.utils import to_numpy
 import numpy as np
 
 
-class
+class DEWSU(KNNBase):
     """
-
+    DEWS-U: K-Nearest Neighbors with Distance-Weighted Softmax.
 
     Parameters
     ----------
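
For comparison, the plain DEWS-U rule from the README table (softmax over neighbourhood-averaged scores, with temperature controlling sharpness) can be sketched as below. Again this is a hedged illustration, not `dewsu.py`: `dews_u_weights` is hypothetical, it treats `temperature` as a plain divisor defaulting to 1.0 (what deskit does with `temperature=None` is not shown in this diff), and it skips the threshold gate.

```python
import numpy as np


def dews_u_weights(neighbour_scores, temperature=1.0, mode="max"):
    """Softmax over the uniform mean of each model's scores on the k neighbours (sketch)."""
    avg = np.asarray(neighbour_scores, dtype=float).mean(axis=0)  # (n_models,)
    if mode == "min":
        avg = -avg          # losses: a lower mean error should earn a higher weight
    z = avg / temperature
    z = z - z.max()         # subtract the max before exp() for numerical stability
    w = np.exp(z)
    return w / w.sum()


# Absolute errors on three neighbours: model 0 averages 1.0, model 1 averages 3.0.
errors = np.array([[1.0, 3.0],
                   [0.5, 2.5],
                   [1.5, 3.5]])

print(np.round(dews_u_weights(errors, temperature=1.0, mode="min"), 3))  # [0.881 0.119]
print(np.round(dews_u_weights(errors, temperature=0.5, mode="min"), 3))  # [0.982 0.018]
```

Halving the temperature moves almost all of the weight onto the better model; raising it pushes the weights toward a simple average.
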

--- deskit-0.2.0/src/deskit/router.py
+++ deskit-0.3.0/src/deskit/router.py
@@ -3,7 +3,7 @@ DynamicRouter — string-based factory for programmatic algorithm selection.
 
 Use DynamicRouter when you need to choose an algorithm via a string at runtime.
 """
-from deskit.des.
+from deskit.des.dewsu import DEWSU
 from deskit.des.ola import OLA
 from deskit.des.knorau import KNORAU
 from deskit.des.knorae import KNORAE
@@ -12,7 +12,7 @@ from deskit._config import SPEED_PRESETS, list_presets
 from deskit.utils import to_numpy, add_batch_dim
 
 _METHOD_CLASSES = {
-    '
+    'DEWS-U': DEWSU,
     'ola': OLA,
     'knora-u': KNORAU,
     'knora-e': KNORAE,
@@ -29,7 +29,7 @@ class DynamicRouter:
     task : str
         'classification' or 'regression'.
     method : str
-        '
+        'DEWS-U', 'ola', 'knora-u', or 'knora-e'.
     metric : str or callable
         Per-sample scoring function. Built-in names: 'accuracy', 'mae', 'mse',
         'rmse', 'log_loss', 'prob_correct'. Or any callable (y_true, y_pred) -> float.
@@ -40,7 +40,7 @@ class DynamicRouter:
     threshold : float
         Competence gate applied after per-neighborhood normalization.
     temperature : float, optional
-        Softmax sharpness for
+        Softmax sharpness for DEWS-U. Ignored by other algorithms.
     preset : str
         Speed/accuracy preset. Call list_presets() for options.
     feature_extractor : callable, optional
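
The README's "Data types" note and the router's `feature_extractor` parameter both assume that raw inputs are turned into fixed-length numeric vectors before neighbours are searched. The sketch below does that for variable-length time series using hand-picked summary statistics, a deliberately simple stand-in for the CNN backbone the README mentions for images. `summarize_series` is a hypothetical helper; the diff does not show how `DynamicRouter` calls `feature_extractor` (per sample or per batch), so the sketch simply precomputes a 2-D feature matrix.

```python
import numpy as np


def summarize_series(series_list):
    """Map variable-length 1-D series to a fixed-length feature matrix (hypothetical helper).

    Returns an array of shape (n_series, 5): mean, std, min, max, last value.
    """
    rows = []
    for s in series_list:
        s = np.asarray(s, dtype=float)
        rows.append([s.mean(), s.std(), s.min(), s.max(), s[-1]])
    return np.array(rows)


raw = [np.array([1.0, 2.0, 3.0]), np.array([5.0, 5.0, 5.0, 5.0])]
X = summarize_series(raw)
print(X.shape)  # (2, 5)
```

Whatever extractor is used, distances between its output vectors are what define "nearest neighbours", so features should be scaled so that no single column dominates.
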
@@ -51,7 +51,7 @@ class DynamicRouter:
         Forwarded to the neighbor finder constructor.
     """
 
-    def __init__(self, task, method='
+    def __init__(self, task, method='DEWS-U', metric='accuracy', mode='max',
                  k=10, threshold=0.5, temperature=None, preset='balanced',
                  feature_extractor=None, finder=None, **kwargs):
 
@@ -71,8 +71,8 @@ class DynamicRouter:
         # Pass finder through as a kwarg when using preset='custom'.
         extra = {'finder': finder} if finder is not None else {}
 
-        #
-        if method == '
+        # DEWSU accepts temperature; the others don't.
+        if method == 'DEWS-U':
             self._des = cls(
                 task=task, metric=metric, mode=mode, k=k,
                 threshold=threshold, temperature=temperature,
@@ -108,7 +108,7 @@ class DynamicRouter:
         ----------
         x : array-like, shape (n_features,) or (n_samples, n_features)
         temperature : float, optional
-
+            DEWS-U only. Overrides the instance temperature for this call.
         threshold : float, optional
             Overrides the instance threshold for this call.
 
@@ -125,7 +125,7 @@ class DynamicRouter:
     # Class methods
 
     @classmethod
-    def from_data_size(cls, n_samples, n_features, task, method='
+    def from_data_size(cls, n_samples, n_features, task, method='DEWS-U',
                        metric='accuracy', mode='max', k=10, threshold=0.5,
                        n_queries=None, **extra_kwargs):
         """
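
`DynamicRouter` maps a method string to a class through `_METHOD_CLASSES` and forwards `temperature` only when the softmax-based method is selected. The same string-keyed factory pattern is sketched below with two stand-in dataclasses. `SoftmaxSelector`, `VoteSelector`, and `make_selector` are hypothetical; they exist only to show the dispatch and are not deskit's classes or its full argument list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SoftmaxSelector:   # hypothetical stand-in for a temperature-aware method such as DEWS-U
    task: str
    k: int = 10
    temperature: Optional[float] = None


@dataclass
class VoteSelector:      # hypothetical stand-in for a method without a temperature parameter
    task: str
    k: int = 10


_METHODS = {"softmax": SoftmaxSelector, "vote": VoteSelector}


def make_selector(task, method="softmax", k=10, temperature=None):
    """Build a selector from a method string, passing temperature only where it is accepted."""
    try:
        cls = _METHODS[method]
    except KeyError:
        raise ValueError(f"unknown method {method!r}; choose from {sorted(_METHODS)}") from None
    if method == "softmax":
        return cls(task=task, k=k, temperature=temperature)
    return cls(task=task, k=k)


print(make_selector("regression", "vote", k=5))  # VoteSelector(task='regression', k=5)
```

In this sketch the lookup is an exact, case-sensitive dictionary match. The diff does not show whether `DynamicRouter` normalises the method string, so spelling it exactly as in the docstring (`'DEWS-U'`, `'ola'`, `'knora-u'`, `'knora-e'`) is the safest assumption.
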

--- deskit-0.2.0/PKG-INFO
+++ deskit-0.3.0/src/deskit.egg-info/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: deskit
-Version: 0.
+Version: 0.3.0
 Summary: A Python library for Dynamic Ensemble Selection
 Author: Tikhon Vodyanov
 License-Expression: MIT
@@ -31,7 +31,7 @@ Dynamic: license-file
 
 # deskit
 
-
+deskit is a flexible, lightweight, and easy-to-use ensembling library that implements
 Dynamic Ensemble Selection (DES) algorithms for ensembling multiple ML models
 on a given dataset.
 
@@ -43,6 +43,8 @@ requiring any wrappers, including custom models, popular ML libraries, and APIs.
 deskit includes several DES algorithms, and it works with both classification
 and regression.
 
+See the full documentation [here](https://TikaaVo.github.io/deskit/).
+
 # Dynamic Ensemble Selection
 
 Ensemble learning in machine learning refers to when multiple models trained on a
@@ -150,8 +152,8 @@ weights = router.predict(X_test[i])
 
 | Method | Best for | Notes |
 |-----------|---|----------------------------------------------------------------------------------------------------------|
-| `
-| `
+| `DEWSU` | Regression | Softmax over neighbourhood-averaged scores. Temperature controls sharpness. |
+| `DEWSI` | Regression | Like DEWS-U but scores are inverse-distance weighted. |
 | `KNORAU` | Classification | Vote-count weighting. Each model earns one vote per neighbour it correctly classifies. |
 | `KNORAE` | Classification | Intersection-based. Only models correct on all neighbours survive; falls back to smaller neighbourhoods. |
 | `KNORAIU` | Classification | Like KNORA-U but votes are inverse-distance weighted. |
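
The `KNORAE` row describes an intersection rule with a shrinking-neighbourhood fallback. A minimal NumPy sketch of that selection step follows. `knora_e_select` is a hypothetical name, rows of `correct` are assumed to be ordered from nearest to farthest neighbour, and keeping every model when none is correct even on the nearest neighbour is an assumed fallback that deskit may handle differently.

```python
import numpy as np


def knora_e_select(correct):
    """Boolean mask of models kept by a KNORA-E style intersection rule (sketch).

    correct: array-like of shape (k, n_models); row 0 is the nearest neighbour.
    """
    correct = np.asarray(correct, dtype=bool)
    k, n_models = correct.shape
    # Try the full neighbourhood first, then drop the farthest neighbour one at a time.
    for size in range(k, 0, -1):
        survivors = correct[:size].all(axis=0)  # correct on every one of the `size` nearest
        if survivors.any():
            return survivors
    # Hypothetical fallback: no model is correct even on the nearest neighbour.
    return np.ones(n_models, dtype=bool)


correct = np.array([
    [True, True, False],
    [True, False, True],
    [False, True, True],
])
mask = knora_e_select(correct)
weights = mask / mask.sum()  # equal weight for every surviving model
print(mask, weights)         # only model 0 survives the 2-neighbour intersection
```

No model is correct on all three neighbours here, so the rule shrinks to the two nearest, where only model 0 survives.
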
@@ -202,13 +204,18 @@ def pinball(y_true, y_pred, alpha=0.9):
     e = y_true - y_pred
     return alpha * e if e >= 0 else (alpha - 1) * e
 
-router =
+router = DEWSU(task="regression", metric=pinball, mode="min", k=20)
 ```
 
 Built-in metric strings: `accuracy`, `mae`, `mse`, `rmse`, `log_loss`, `prob_correct`.
 
 ---
 
+## Data types
+
+deskit can be used with non-tabular data types like images, time series, and more. However, when used, the
+passed features either need to be run through a feature extractor beforehand, such as a CNN backbone for images.
+
 ## Benchmark results
 
 100-seed benchmark (seeds 0–99) on standard sklearn and OpenML datasets. "Best Single" is the best
@@ -224,7 +231,7 @@ Pool: KNN, Decision Tree, SVR, Ridge, Bayesian Ridge.
 
 This pool was selected for having variability in architectures while avoiding a single dominant model.
 
-deskit algorithms tested: OLA,
+deskit algorithms tested: OLA, DEWS-U, DEWS-I, KNORA-U, KNORA-E, KNORA-IU.
 
 ### Regression (MAE, lower is better)
 
@@ -232,10 +239,10 @@ deskit algorithms tested: OLA, KNN-DWS, KNN-DWS-I, KNORA-U, KNORA-E, KNORA-IU.
 
 | Dataset | Best Single | Simple Avg | deskit best |
 |------------------------------|-------------|------------|-------------------------|
-| California Housing (sklearn) | 0.3955 | +7.93% | **−2.68%** (
-| Bike Sharing (OpenML) | 51.604 | +48.39% | **−6.25%** (
+| California Housing (sklearn) | 0.3955 | +7.93% | **−2.68%** (DEWS-I) |
+| Bike Sharing (OpenML) | 51.604 | +48.39% | **−6.25%** (DEWS-I) |
 | Abalone (OpenML) | **1.4923** | +1.29% | +1.61% (KNORA-IU) |
-| Diabetes (sklearn) | **44.986** | +2.98% | +0.88% (
+| Diabetes (sklearn) | **44.986** | +2.98% | +0.88% (DEWS-I) |
 | Concrete Strength (OpenML) | 5.3934 | +21.30% | **−2.85%** (KNORA-IU) |
 
 deskit beats best single and simple averaging on 3/5 regression datasets. This shows how DES can provide a
@@ -252,10 +259,10 @@ and classification-like (like in Abalone).
 
 | Dataset | Best Single | Simple Avg | deskit best |
 |------------------------|-------------|------------|-------------------------|
-| HAR (OpenML) | 98.24% | −0.32% | **+0.14%** (
+| HAR (OpenML) | 98.24% | −0.32% | **+0.14%** (DEWS-I) |
 | Yeast (OpenML) | 59.19% | +0.46% | **+1.48%** (KNORA-IU) |
 | Image Segment (OpenML) | 93.65% | +1.70% | **+2.33%** (KNORA-IU) |
-| Waveform (OpenML) | **86.28%** | −1.04% | −0.55% (
+| Waveform (OpenML) | **86.28%** | −1.04% | −0.55% (DEWS-I) |
 | Vowel (OpenML) | 90.54% | −1.81% | **+0.93%** (KNORA-IU) |
 
 deskit beats or matches best single and simple averaging on 4/5 classification datasets. As seen on regression, DES

--- deskit-0.2.0/src/deskit.egg-info/SOURCES.txt
+++ deskit-0.3.0/src/deskit.egg-info/SOURCES.txt
@@ -17,8 +17,8 @@ src/deskit/base/__init__.py
 src/deskit/base/base.py
 src/deskit/base/knnbase.py
 src/deskit/des/__init__.py
-src/deskit/des/
-src/deskit/des/
+src/deskit/des/dewsi.py
+src/deskit/des/dewsu.py
 src/deskit/des/knorae.py
 src/deskit/des/knoraiu.py
 src/deskit/des/knorau.py