kernelboost-0.1.0.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- kernelboost-0.1.0/CHANGELOG.md +15 -0
- kernelboost-0.1.0/LICENSE +21 -0
- kernelboost-0.1.0/MANIFEST.in +11 -0
- kernelboost-0.1.0/PKG-INFO +279 -0
- kernelboost-0.1.0/README.md +257 -0
- kernelboost-0.1.0/kernelboost/__init__.py +11 -0
- kernelboost-0.1.0/kernelboost/backend.py +202 -0
- kernelboost-0.1.0/kernelboost/booster.py +798 -0
- kernelboost-0.1.0/kernelboost/cpu_functions.py +259 -0
- kernelboost-0.1.0/kernelboost/estimator.py +258 -0
- kernelboost-0.1.0/kernelboost/feature_selection.py +305 -0
- kernelboost-0.1.0/kernelboost/gpu_functions.py +164 -0
- kernelboost-0.1.0/kernelboost/kernels.c +251 -0
- kernelboost-0.1.0/kernelboost/kernels.cu +84 -0
- kernelboost-0.1.0/kernelboost/libkernels.dll +0 -0
- kernelboost-0.1.0/kernelboost/libkernels.so +0 -0
- kernelboost-0.1.0/kernelboost/multiclassbooster.py +516 -0
- kernelboost-0.1.0/kernelboost/objectives.py +336 -0
- kernelboost-0.1.0/kernelboost/optimizer.py +161 -0
- kernelboost-0.1.0/kernelboost/rho_optimizer.py +530 -0
- kernelboost-0.1.0/kernelboost/tree.py +485 -0
- kernelboost-0.1.0/kernelboost/utilities.py +459 -0
- kernelboost-0.1.0/kernelboost.egg-info/PKG-INFO +279 -0
- kernelboost-0.1.0/kernelboost.egg-info/SOURCES.txt +27 -0
- kernelboost-0.1.0/kernelboost.egg-info/dependency_links.txt +1 -0
- kernelboost-0.1.0/kernelboost.egg-info/requires.txt +7 -0
- kernelboost-0.1.0/kernelboost.egg-info/top_level.txt +1 -0
- kernelboost-0.1.0/pyproject.toml +42 -0
- kernelboost-0.1.0/setup.cfg +4 -0
@@ -0,0 +1,15 @@
# Changelog

All notable changes to this project will be documented in this file.

## [0.1.0] - 2026-02-10

### Added
- Initial public release
- KernelBooster for 1d targets with support for MSE, entropy, and quantile objectives
- MulticlassBooster for multiclass classification
- GPU acceleration via CuPy
- Nadaraya-Watson kernel regression with LOO-CV bandwidth optimization
- Feature selection (random and smart selectors)
- Early stopping with validation loss monitoring
- Uncertainty quantification via prediction intervals and conditional variance prediction
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 tlaiho

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,11 @@
include LICENSE
include README.md
include CHANGELOG.md
recursive-include kernelboost *.c *.cu *.so *.dll
prune tests
prune benchmarks
prune docs
prune private
prune private_tests
global-exclude __pycache__ *.pyc *.pyo
prune kernelboost/_extras
@@ -0,0 +1,279 @@
Metadata-Version: 2.1
Name: kernelboost
Version: 0.1.0
Summary: Gradient boosting with kernel regression base learners
Author-email: tlaiho <tslaiho@gmail.com>
License: MIT
Project-URL: Repository, https://github.com/tlaiho/kernelboost
Keywords: gradient-boosting,kernel-regression,machine-learning
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.26.4
Provides-Extra: gpu
Requires-Dist: cupy>=11.0.0; extra == "gpu"
Provides-Extra: all
Requires-Dist: cupy>=11.0.0; extra == "all"

# KernelBoost

**Gradient boosting with kernel-based local constant estimators**

KernelBoost is a gradient boosting algorithm that uses Nadaraya-Watson (local constant) kernel estimators as base learners instead of decision trees. It offers:

- Support for regression, classification, and quantile regression tasks.
- An sklearn-style API (`fit`, `predict`).
- CPU (via C) and GPU (via CuPy/CUDA) backends.

## Installation

```bash
# Basic installation
pip install kernelboost

# With GPU support (requires CUDA)
pip install cupy-cuda12x  # for CUDA 12
```

> **Dependencies**: NumPy only. CuPy is optional for GPU acceleration.

## Quick Start

```python
from kernelboost import KernelBooster, MulticlassBooster
from kernelboost.objectives import MSEObjective, EntropyObjective

# Regression
booster = KernelBooster(objective=MSEObjective()).fit(X_train, y_train)
predictions = booster.predict(X_test)

# Binary classification
booster = KernelBooster(objective=EntropyObjective()).fit(X_train, y_train)
logits = booster.predict(X_test)
probabilities = booster.predict_proba(X_test)

# Multiclass classification (fits one booster per class)
booster = MulticlassBooster().fit(X_train, y_train)
class_labels = booster.predict(X_test)
```

### How it works

KernelBooster uses gradient boosting with kernel-based local constant estimators instead of decision trees. Each boosting round fits a KernelTree that partitions the data into regions, then applies Nadaraya-Watson kernel regression at each leaf to predict pseudo-residuals. Unlike tree-based boosters, where splits implicitly select features, KernelBooster selects features explicitly at the boosting stage, before tree construction.

### What it delivers

With [suitable preprocessing](#data-preprocessing), KernelBooster can match popular gradient boosters such as XGBoost and LightGBM on prediction accuracy while outperforming traditional kernel methods (KernelRidge, SVR, Gaussian Processes). Training time is comparable to other kernel methods. See [Benchmarks](#benchmarks) for detailed comparisons.

### Architecture

KernelBooster has three main components: the KernelBooster class, which does the boosting; the KernelTree class, which does the splitting; and the KernelEstimator class, which implements the local constant estimation. Because kernel methods are computationally expensive, the guiding design principle has been computational efficiency.

After calling fit, KernelBooster starts a training loop that is mostly identical to the algorithm described in Friedman (2001). The main difference is that KernelTree does not choose features through its splits; instead, it is given them by the booster class. The default feature selection is random, with increasing kernel sizes in terms of the number of features. Random feature selection naturally introduces randomness into the training results, which can be mitigated with a lower learning rate and more rounds. As in Friedman (2001), KernelBooster can fit several different objective functions, which are passed in as an Objective class.

KernelTree splits numerical data by density and categorical data by MSE. The idea is that the kernel bandwidth should largely depend on how dense the data is. For numerical data, KernelTree keeps splitting until the number of observations in a node falls below the `max_sample` parameter. Besides finding regions that are well served by the same bandwidth, this significantly speeds up the computation of the kernel matrices for the kernel estimator. For example, with ten splits we go from computing one (n, n) matrix to computing ten (n/10, n/10) matrices, with n²/10 operations instead of n² (assuming equal splits). This saves a whopping 90% of the compute.

The actual estimation is handled by KernelEstimator. It optimizes a scalar precision (inverse bandwidth) for the local constant estimator using leave-one-out cross-validation and random search between given bounds. It offers both Gaussian and (isotropic) Laplace kernels, with the Laplace kernel as the default. KernelEstimator also provides uncertainty quantification methods for quantile and conditional variance prediction, but these are still experimental at the moment, as they use a "naive" single-kernel method whose precision is optimized for mean prediction.

### Notable features

Beyond the core boosting algorithm, KernelBooster includes a few features worth highlighting:

#### Smart Feature Selection

While the default feature selection is random (RandomSelector), the package also includes an mRMR-style probabilistic algorithm (SmartSelector) based on correlations between features and pseudo-residuals, and on performance in previous boosting rounds.

```python
from kernelboost.feature_selection import SmartSelector

selector = SmartSelector(
    redundancy_penalty=0.4,
    relevance_alpha=0.7,
    recency_penalty=0.3,
)

booster = KernelBooster(
    objective=MSEObjective(),
    feature_selector=selector,
)
```

#### Early Stopping

Training stops automatically if the evaluation loss doesn't improve for a set number of consecutive rounds (controlled by the `early_stopping_rounds` parameter).

```python
booster.fit(X_train, y_train, eval_set=(X_val, y_val), early_stopping_rounds=20)
```

#### RhoOptimizer

RhoOptimizer performs post-hoc optimization of step sizes, often improving predictions at minimal additional cost. It can also back out optimal regularization parameters (L1 penalty and learning rate), which is useful when you are unsure what level of regularization to use.

```python
from kernelboost.rho_optimizer import RhoOptimizer

opt = RhoOptimizer(booster, lambda_reg=1.0)
opt.fit(X_val, y_val)
opt.update_booster()

# Back out optimal hyperparameters
lambda1, learning_rate = opt.find_hyperparameters()
```

#### Uncertainty Quantification (Experimental)

KernelBooster provides both prediction intervals and conditional variance prediction based on kernel estimation. These come for "free" on top of training and require no extra data. They are still a work in progress.

```python
# Prediction intervals (90% by default)
lower, upper = booster.predict_intervals(X, alpha=0.1)

# Conditional variance estimates
variance = booster.predict_variance(X)
```

Both interval coverage and conditional variance tend to be underestimated, but this depends on the data and on how well boosting has converged. No special tuning is required: settings that optimize MSE also give reasonable uncertainty estimates. See [benchmarks](#uncertainty-quantification-california-housing) for a comparison with Gaussian Processes.

#### Data Preprocessing

Scaling the data is a good idea for kernel estimation methods. The package includes a simple RankTransformer that often works well.

```python
from kernelboost.utilities import RankTransformer

scaler = RankTransformer(pct=True)
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```

Like other kernel methods, KernelBooster works best with continuous, smooth features. For datasets with many categorical features, tree-based methods are often better suited, since they handle splits on categories naturally.

## API Reference

| Class | Purpose |
|-------|---------|
| `KernelBooster` | Main booster for regression/binary classification |
| `MulticlassBooster` | One-vs-rest multiclass wrapper |
| `MSEObjective` | Mean squared error (regression) |
| `EntropyObjective` | Cross-entropy (binary classification) |
| `QuantileObjective` | Pinball loss (quantile regression) |
| `SmartSelector` | mRMR-style feature selection |
| `RandomSelector` | Random feature selection |
| `RhoOptimizer` | Post-hoc step size optimization |
| `RankTransformer` | Percentile normalization |

## KernelBooster Main Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `objective` | Required | Loss function: `MSEObjective()`, `EntropyObjective()`, `QuantileObjective()` |
| `rounds` | auto | Boosting iterations (auto = n_features * 10) |
| `max_features` | auto | Max features per estimator (auto = min(10, n_features)) |
| `min_features` | 1 | Min features per estimator |
| `kernel_type` | 'laplace' | Kernel function: 'laplace' or 'gaussian' |
| `learning_rate` | 0.5 | Step size shrinkage factor |
| `lambda1` | 0.0 | L1 regularization |
| `use_gpu` | False | Enable GPU acceleration |

## Benchmarks

Results have inherent randomness due to feature selection and subsampling. Scripts are available in `benchmarks/`.

### Regression (California Housing)
```text
=================================================================
Model             MSE      MAE      R²       Time
-----------------------------------------------------------------
KernelBooster     0.2053   0.2985   0.8452   11.0s
sklearn HGBR      0.2247   0.3146   0.8306    0.1s
XGBoost           0.2155   0.3050   0.8376    0.1s
LightGBM          0.2097   0.3047   0.8419    0.1s
=================================================================
```

### Binary Classification (Breast Cancer)
```text
=================================================================
Model             Accuracy   AUC-ROC   F1       Time
-----------------------------------------------------------------
KernelBooster     0.9825     0.9984    0.9861   1.6s
sklearn HGBC      0.9649     0.9944    0.9722   0.1s
XGBoost           0.9561     0.9938    0.9650   0.0s
LightGBM          0.9649     0.9925    0.9722   0.0s
=================================================================
```

### Comparison with Kernel Methods (California Housing)
```text
=================================================================
Kernel Methods Benchmark (n_train=10000)
=================================================================
Model             MSE      MAE      R²       Time
-----------------------------------------------------------------
KernelBooster     0.2091   0.3054   0.8430    6.5s
KernelRidge       0.4233   0.4835   0.6822    1.7s
SVR               0.3136   0.3780   0.7646    3.5s
GP (n=5000)       0.3297   0.4061   0.7524   67.7s
=================================================================
```

### Uncertainty Quantification (California Housing)

Prediction intervals and conditional variance estimates compared to Gaussian Process (sklearn) regression:
```text
=================================================================
Uncertainty Quantification (90% intervals, alpha=0.1)
=================================================================
Model             Coverage   Width   Var Corr   Var Ratio
-----------------------------------------------------------------
KernelBooster     88.1%      1.235   0.206      1.621
GP (n=5000)       90.9%      1.863   0.157      1.026
=================================================================
```

Var Corr is the correlation between the predicted variance and the squared errors.
Var Ratio is the ratio between the mean of the squared errors and the mean of the predicted variances.

### CPU/GPU Training Time Comparison (California Housing)

```text
=================================================================
GPU vs CPU Training Time (California Housing, n=10000)
=================================================================
Backend              Time
-----------------------------------------------------------------
CPU (C/OpenMP)       38.6s
GPU (CuPy/CUDA)       4.6s
=================================================================
GPU speedup: 8.3x
```

All benchmarks were run on Ubuntu 22.04 with a Ryzen 7700 and an RTX 3090.

## References

- Fan, J., & Gijbels, I. (1996). *Local Polynomial Modelling and Its Applications*. Chapman & Hall.
- Fan, J., & Yao, Q. (1998). Efficient estimation of conditional variance functions in stochastic regression. *Biometrika*, 85(3), 645–660.
- Friedman, J. H. (2001). Greedy Function Approximation: A Gradient Boosting Machine. *Annals of Statistics*, 29(5), 1189–1232.
- Hansen, B. E. (2004). Nonparametric Conditional Density Estimation. Working paper, University of Wisconsin.

## About

KernelBoost is a hobby project exploring alternatives to tree-based gradient boosting. It is currently at v0.1.0. Pre-compiled binaries are included for Linux and Windows. Contributions and feedback are welcome.

## License

MIT License
@@ -0,0 +1,257 @@
# KernelBoost

**Gradient boosting with kernel-based local constant estimators**

KernelBoost is a gradient boosting algorithm that uses Nadaraya-Watson (local constant) kernel estimators as base learners instead of decision trees. It offers:

- Support for regression, classification, and quantile regression tasks.
- An sklearn-style API (`fit`, `predict`).
- CPU (via C) and GPU (via CuPy/CUDA) backends.

## Installation

```bash
# Basic installation
pip install kernelboost

# With GPU support (requires CUDA)
pip install cupy-cuda12x  # for CUDA 12
```

> **Dependencies**: NumPy only. CuPy is optional for GPU acceleration.

## Quick Start

```python
from kernelboost import KernelBooster, MulticlassBooster
from kernelboost.objectives import MSEObjective, EntropyObjective

# Regression
booster = KernelBooster(objective=MSEObjective()).fit(X_train, y_train)
predictions = booster.predict(X_test)

# Binary classification
booster = KernelBooster(objective=EntropyObjective()).fit(X_train, y_train)
logits = booster.predict(X_test)
probabilities = booster.predict_proba(X_test)

# Multiclass classification (fits one booster per class)
booster = MulticlassBooster().fit(X_train, y_train)
class_labels = booster.predict(X_test)
```

### How it works

KernelBooster uses gradient boosting with kernel-based local constant estimators instead of decision trees. Each boosting round fits a KernelTree that partitions the data into regions, then applies Nadaraya-Watson kernel regression at each leaf to predict pseudo-residuals. Unlike tree-based boosters, where splits implicitly select features, KernelBooster selects features explicitly at the boosting stage, before tree construction.
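
To make the mechanics concrete, here is a minimal, self-contained sketch of the idea (an illustration, not the package's actual implementation): a Nadaraya-Watson local constant estimator with a Laplace kernel, used to fit pseudo-residuals over a few squared-error boosting rounds. The bandwidth, feature subsets, and learning rate are arbitrary illustrative choices.

```python
import numpy as np

def nadaraya_watson(X_train, r_train, X_query, precision=2.0):
    """Local constant (Nadaraya-Watson) estimate of pseudo-residuals r at X_query.

    Laplace kernel: K(x, x') = exp(-precision * ||x - x'||_1).
    """
    # Pairwise L1 distances between query and training points
    d = np.abs(X_query[:, None, :] - X_train[None, :, :]).sum(axis=2)
    w = np.exp(-precision * d)                              # kernel weights
    return (w @ r_train) / np.clip(w.sum(axis=1), 1e-12, None)

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(300, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=300)

pred = np.full_like(y, y.mean())                            # F_0: constant model
learning_rate = 0.5
for _ in range(20):
    residuals = y - pred                                    # negative gradient of MSE
    features = rng.choice(3, size=2, replace=False)         # feature subset chosen by the "booster"
    step = nadaraya_watson(X[:, features], residuals, X[:, features])
    pred += learning_rate * step                            # shrunken update, as in Friedman (2001)

print(f"in-sample MSE after boosting: {np.mean((y - pred) ** 2):.4f}")
```

The real booster additionally partitions the data with a KernelTree and optimizes the precision per leaf, as described under [Architecture](#architecture).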

### What it delivers

With [suitable preprocessing](#data-preprocessing), KernelBooster can match popular gradient boosters such as XGBoost and LightGBM on prediction accuracy while outperforming traditional kernel methods (KernelRidge, SVR, Gaussian Processes). Training time is comparable to other kernel methods. See [Benchmarks](#benchmarks) for detailed comparisons.

### Architecture

KernelBooster has three main components: the KernelBooster class, which does the boosting; the KernelTree class, which does the splitting; and the KernelEstimator class, which implements the local constant estimation. Because kernel methods are computationally expensive, the guiding design principle has been computational efficiency.

After calling fit, KernelBooster starts a training loop that is mostly identical to the algorithm described in Friedman (2001). The main difference is that KernelTree does not choose features through its splits; instead, it is given them by the booster class. The default feature selection is random, with increasing kernel sizes in terms of the number of features. Random feature selection naturally introduces randomness into the training results, which can be mitigated with a lower learning rate and more rounds. As in Friedman (2001), KernelBooster can fit several different objective functions, which are passed in as an Objective class.

KernelTree splits numerical data by density and categorical data by MSE. The idea is that the kernel bandwidth should largely depend on how dense the data is. For numerical data, KernelTree keeps splitting until the number of observations in a node falls below the `max_sample` parameter. Besides finding regions that are well served by the same bandwidth, this significantly speeds up the computation of the kernel matrices for the kernel estimator. For example, with ten splits we go from computing one (n, n) matrix to computing ten (n/10, n/10) matrices, with n²/10 operations instead of n² (assuming equal splits). This saves a whopping 90% of the compute.
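
As a quick sanity check of that arithmetic (an editor's sketch, not package code), the number of pairwise kernel evaluations drops roughly linearly in the number of equal-sized leaves:

```python
# Kernel evaluations for one full (n, n) matrix vs. ten per-leaf matrices.
# Purely illustrative; real splits are data-dependent and rarely perfectly equal.
n, leaves = 10_000, 10
full_cost = n ** 2                           # 100,000,000
leaf_cost = leaves * (n // leaves) ** 2      # 10,000,000
print(f"saved: {1 - leaf_cost / full_cost:.0%}")  # saved: 90%
```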

The actual estimation is handled by KernelEstimator. It optimizes a scalar precision (inverse bandwidth) for the local constant estimator using leave-one-out cross-validation and random search between given bounds. It offers both Gaussian and (isotropic) Laplace kernels, with the Laplace kernel as the default. KernelEstimator also provides uncertainty quantification methods for quantile and conditional variance prediction, but these are still experimental at the moment, as they use a "naive" single-kernel method whose precision is optimized for mean prediction.
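
The bandwidth selection step can be sketched as follows; this is a simplified stand-in for what KernelEstimator does, with made-up data and search bounds: the leave-one-out CV error of a Laplace-kernel local constant fit, minimized by random search over the scalar precision.

```python
import numpy as np

def loo_mse(X, y, precision):
    """Leave-one-out MSE of a Laplace-kernel local constant estimator."""
    d = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)   # pairwise L1 distances
    w = np.exp(-precision * d)
    np.fill_diagonal(w, 0.0)                                 # exclude each point's own observation
    fitted = (w @ y) / np.clip(w.sum(axis=1), 1e-12, None)
    return np.mean((y - fitted) ** 2)

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = X[:, 0] ** 2 + 0.2 * rng.normal(size=500)

# Random search between (assumed) bounds; keep the precision with the lowest LOO error
candidates = rng.uniform(0.1, 20.0, size=30)
best = min(candidates, key=lambda p: loo_mse(X, y, p))
print(f"selected precision: {best:.2f}")
```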

### Notable features

Beyond the core boosting algorithm, KernelBooster includes a few features worth highlighting:

#### Smart Feature Selection

While the default feature selection is random (RandomSelector), the package also includes an mRMR-style probabilistic algorithm (SmartSelector) based on correlations between features and pseudo-residuals, and on performance in previous boosting rounds.

```python
from kernelboost.feature_selection import SmartSelector

selector = SmartSelector(
    redundancy_penalty=0.4,
    relevance_alpha=0.7,
    recency_penalty=0.3,
)

booster = KernelBooster(
    objective=MSEObjective(),
    feature_selector=selector,
)
```

#### Early Stopping

Training stops automatically if the evaluation loss doesn't improve for a set number of consecutive rounds (controlled by the `early_stopping_rounds` parameter).

```python
booster.fit(X_train, y_train, eval_set=(X_val, y_val), early_stopping_rounds=20)
```

#### RhoOptimizer

RhoOptimizer performs post-hoc optimization of step sizes, often improving predictions at minimal additional cost. It can also back out optimal regularization parameters (L1 penalty and learning rate), which is useful when you are unsure what level of regularization to use.

```python
from kernelboost.rho_optimizer import RhoOptimizer

opt = RhoOptimizer(booster, lambda_reg=1.0)
opt.fit(X_val, y_val)
opt.update_booster()

# Back out optimal hyperparameters
lambda1, learning_rate = opt.find_hyperparameters()
```

#### Uncertainty Quantification (Experimental)

KernelBooster provides both prediction intervals and conditional variance prediction based on kernel estimation. These come for "free" on top of training and require no extra data. They are still a work in progress.

```python
# Prediction intervals (90% by default)
lower, upper = booster.predict_intervals(X, alpha=0.1)

# Conditional variance estimates
variance = booster.predict_variance(X)
```

Both interval coverage and conditional variance tend to be underestimated, but this depends on the data and on how well boosting has converged. No special tuning is required: settings that optimize MSE also give reasonable uncertainty estimates. See [benchmarks](#uncertainty-quantification-california-housing) for a comparison with Gaussian Processes.

#### Data Preprocessing

Scaling the data is a good idea for kernel estimation methods. The package includes a simple RankTransformer that often works well.

```python
from kernelboost.utilities import RankTransformer

scaler = RankTransformer(pct=True)
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```

Like other kernel methods, KernelBooster works best with continuous, smooth features. For datasets with many categorical features, tree-based methods are often better suited, since they handle splits on categories naturally.

## API Reference

| Class | Purpose |
|-------|---------|
| `KernelBooster` | Main booster for regression/binary classification |
| `MulticlassBooster` | One-vs-rest multiclass wrapper |
| `MSEObjective` | Mean squared error (regression) |
| `EntropyObjective` | Cross-entropy (binary classification) |
| `QuantileObjective` | Pinball loss (quantile regression) |
| `SmartSelector` | mRMR-style feature selection |
| `RandomSelector` | Random feature selection |
| `RhoOptimizer` | Post-hoc step size optimization |
| `RankTransformer` | Percentile normalization |

## KernelBooster Main Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `objective` | Required | Loss function: `MSEObjective()`, `EntropyObjective()`, `QuantileObjective()` |
| `rounds` | auto | Boosting iterations (auto = n_features * 10) |
| `max_features` | auto | Max features per estimator (auto = min(10, n_features)) |
| `min_features` | 1 | Min features per estimator |
| `kernel_type` | 'laplace' | Kernel function: 'laplace' or 'gaussian' |
| `learning_rate` | 0.5 | Step size shrinkage factor |
| `lambda1` | 0.0 | L1 regularization |
| `use_gpu` | False | Enable GPU acceleration |
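
For reference, a configuration that touches most of these knobs might look like the sketch below. The specific values are arbitrary illustrative choices, not recommended settings.

```python
from kernelboost import KernelBooster
from kernelboost.objectives import MSEObjective

booster = KernelBooster(
    objective=MSEObjective(),   # required
    rounds=200,                 # instead of the auto default (n_features * 10)
    max_features=8,
    min_features=2,
    kernel_type="gaussian",     # default is 'laplace'
    learning_rate=0.1,          # lower than the 0.5 default, so use more rounds
    lambda1=0.01,               # small L1 penalty
    use_gpu=False,              # set to True if CuPy/CUDA is available
)
booster.fit(X_train, y_train)   # X_train, y_train as in Quick Start
```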

## Benchmarks

Results have inherent randomness due to feature selection and subsampling. Scripts are available in `benchmarks/`.

### Regression (California Housing)
```text
=================================================================
Model             MSE      MAE      R²       Time
-----------------------------------------------------------------
KernelBooster     0.2053   0.2985   0.8452   11.0s
sklearn HGBR      0.2247   0.3146   0.8306    0.1s
XGBoost           0.2155   0.3050   0.8376    0.1s
LightGBM          0.2097   0.3047   0.8419    0.1s
=================================================================
```

### Binary Classification (Breast Cancer)
```text
=================================================================
Model             Accuracy   AUC-ROC   F1       Time
-----------------------------------------------------------------
KernelBooster     0.9825     0.9984    0.9861   1.6s
sklearn HGBC      0.9649     0.9944    0.9722   0.1s
XGBoost           0.9561     0.9938    0.9650   0.0s
LightGBM          0.9649     0.9925    0.9722   0.0s
=================================================================
```

### Comparison with Kernel Methods (California Housing)
```text
=================================================================
Kernel Methods Benchmark (n_train=10000)
=================================================================
Model             MSE      MAE      R²       Time
-----------------------------------------------------------------
KernelBooster     0.2091   0.3054   0.8430    6.5s
KernelRidge       0.4233   0.4835   0.6822    1.7s
SVR               0.3136   0.3780   0.7646    3.5s
GP (n=5000)       0.3297   0.4061   0.7524   67.7s
=================================================================
```

### Uncertainty Quantification (California Housing)

Prediction intervals and conditional variance estimates compared to Gaussian Process (sklearn) regression:
```text
=================================================================
Uncertainty Quantification (90% intervals, alpha=0.1)
=================================================================
Model             Coverage   Width   Var Corr   Var Ratio
-----------------------------------------------------------------
KernelBooster     88.1%      1.235   0.206      1.621
GP (n=5000)       90.9%      1.863   0.157      1.026
=================================================================
```

Var Corr is the correlation between the predicted variance and the squared errors.
Var Ratio is the ratio between the mean of the squared errors and the mean of the predicted variances.
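
Both diagnostics can be computed directly from a fitted booster's outputs on a held-out set, for example along these lines (a sketch using the documented `predict`, `predict_variance`, and `predict_intervals` methods):

```python
import numpy as np

# booster: a fitted KernelBooster; X_test, y_test: held-out data
squared_errors = (y_test - booster.predict(X_test)) ** 2
variance = booster.predict_variance(X_test)
lower, upper = booster.predict_intervals(X_test, alpha=0.1)

coverage = np.mean((y_test >= lower) & (y_test <= upper))    # target is ~90% for alpha=0.1
width = np.mean(upper - lower)
var_corr = np.corrcoef(variance, squared_errors)[0, 1]
var_ratio = squared_errors.mean() / variance.mean()          # > 1 means variance is underestimated
print(coverage, width, var_corr, var_ratio)
```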

### CPU/GPU Training Time Comparison (California Housing)

```text
=================================================================
GPU vs CPU Training Time (California Housing, n=10000)
=================================================================
Backend              Time
-----------------------------------------------------------------
CPU (C/OpenMP)       38.6s
GPU (CuPy/CUDA)       4.6s
=================================================================
GPU speedup: 8.3x
```

All benchmarks were run on Ubuntu 22.04 with a Ryzen 7700 and an RTX 3090.

## References

- Fan, J., & Gijbels, I. (1996). *Local Polynomial Modelling and Its Applications*. Chapman & Hall.
- Fan, J., & Yao, Q. (1998). Efficient estimation of conditional variance functions in stochastic regression. *Biometrika*, 85(3), 645–660.
- Friedman, J. H. (2001). Greedy Function Approximation: A Gradient Boosting Machine. *Annals of Statistics*, 29(5), 1189–1232.
- Hansen, B. E. (2004). Nonparametric Conditional Density Estimation. Working paper, University of Wisconsin.

## About

KernelBoost is a hobby project exploring alternatives to tree-based gradient boosting. It is currently at v0.1.0. Pre-compiled binaries are included for Linux and Windows. Contributions and feedback are welcome.

## License

MIT License
@@ -0,0 +1,11 @@
"""KernelBooster: Gradient boosting with Nadaraya-Watson (local constant) estimators as base learners."""

__version__ = "0.1.0"

from .booster import KernelBooster
from .multiclassbooster import MulticlassBooster

__all__ = [
    "KernelBooster",
    "MulticlassBooster",
]