kernelboost 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,15 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ ## [0.1.0] - 2026-02-10
6
+
7
+ ### Added
8
+ - Initial public release
9
+ - KernelBooster for 1d targets with support for MSE, entropy, and quantile objectives
10
+ - MulticlassBooster for multiclass classification
11
+ - GPU acceleration via CuPy
12
+ - Nadaraya-Watson kernel regression with LOO-CV bandwidth optimization
13
+ - Feature selection (random and smart selectors)
14
+ - Early stopping with validation loss monitoring
15
+ - Uncertainty quantification via prediction intervals and conditional variance prediction
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 tlaiho
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,11 @@
1
+ include LICENSE
2
+ include README.md
3
+ include CHANGELOG.md
4
+ recursive-include kernelboost *.c *.cu *.so *.dll
5
+ prune tests
6
+ prune benchmarks
7
+ prune docs
8
+ prune private
9
+ prune private_tests
10
+ global-exclude __pycache__ *.pyc *.pyo
11
+ prune kernelboost/_extras
@@ -0,0 +1,279 @@
1
+ Metadata-Version: 2.1
2
+ Name: kernelboost
3
+ Version: 0.1.0
4
+ Summary: Gradient boosting with kernel regression base learners
5
+ Author-email: tlaiho <tslaiho@gmail.com>
6
+ License: MIT
7
+ Project-URL: Repository, https://github.com/tlaiho/kernelboost
8
+ Keywords: gradient-boosting,kernel-regression,machine-learning
9
+ Classifier: Development Status :: 3 - Alpha
10
+ Classifier: Intended Audience :: Science/Research
11
+ Classifier: License :: OSI Approved :: MIT License
12
+ Classifier: Programming Language :: Python :: 3
13
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
14
+ Requires-Python: >=3.9
15
+ Description-Content-Type: text/markdown
16
+ License-File: LICENSE
17
+ Requires-Dist: numpy>=1.26.4
18
+ Provides-Extra: gpu
19
+ Requires-Dist: cupy>=11.0.0; extra == "gpu"
20
+ Provides-Extra: all
21
+ Requires-Dist: cupy>=11.0.0; extra == "all"
22
+
23
+ # KernelBoost
24
+
25
+ **Gradient boosting with kernel-based local constant estimators**
26
+
27
+ ![Python](https://img.shields.io/badge/python-%3E%3D3.9-blue)
28
+ ![NumPy](https://img.shields.io/badge/NumPy-array%20backend-blue)
29
+ ![C](https://img.shields.io/badge/C-language-blue)
30
+ ![GPU](https://img.shields.io/badge/GPU-CUDA%20C%2FCuPy-orange)
31
+ ![License](https://img.shields.io/badge/license-MIT-green)
32
+ ![Version](https://img.shields.io/badge/version-0.1.0-blue)
33
+
34
+ KernelBoost is a gradient boosting algorithm that uses Nadaraya-Watson (local constant) kernel estimators as base learners instead of decision trees. It provides:
35
+
36
+ - Support for regression, classification and quantile regression tasks.
37
+ - sklearn-style API (`fit`, `predict`).
38
+ - CPU (via C) and GPU (via CuPy/CUDA) backends.
39
+
40
+ ## Installation
41
+
42
+ ```bash
43
+ # Basic installation
44
+ pip install kernelboost
45
+
46
+ # With GPU support (requires CUDA)
47
+ pip install cupy-cuda12x # for CUDA 12
48
+ ```
49
+
50
+ > **Dependencies**: NumPy only. CuPy optional for GPU acceleration.
51
+
52
+ ## Quick Start
53
+
54
+ ```python
55
+ from kernelboost import KernelBooster, MulticlassBooster
56
+ from kernelboost.objectives import MSEObjective, EntropyObjective
57
+
58
+ # Regression
59
+ booster = KernelBooster(objective=MSEObjective()).fit(X_train, y_train)
60
+ predictions = booster.predict(X_test)
61
+
62
+ # Binary classification
63
+ booster = KernelBooster(objective=EntropyObjective()).fit(X_train, y_train)
64
+ logits = booster.predict(X_test)
65
+ probabilities = booster.predict_proba(X_test)
66
+
67
+ # Multiclass classification (fits one booster per class)
68
+ booster = MulticlassBooster().fit(X_train, y_train)
69
+ class_labels = booster.predict(X_test)
70
+ ```
71
+
72
+ ### How it works
73
+
74
+ KernelBooster uses gradient boosting with kernel-based local constant estimators instead of decision trees. Each boosting round fits a KernelTree that partitions the data into regions, then applies Nadaraya-Watson kernel regression at each leaf to predict pseudo-residuals. Unlike tree-based boosters where splits implicitly select features, KernelBooster selects features explicitly at the boosting stage before tree construction.
75
+
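+ For intuition, the sketch below shows the core idea on a toy 1D regression problem: each round fits a Nadaraya-Watson estimator to the current pseudo-residuals (the negative gradient of the MSE loss) and adds a shrunken version of its prediction. This is an illustration only, not the package's implementation; it skips the KernelTree partitioning, feature selection, and bandwidth optimization, and `nw_fit_predict` is a hypothetical helper rather than part of the API.
+
+ ```python
+ import numpy as np
+
+ def nw_fit_predict(x_train, r_train, x_query, precision=20.0):
+     # Gaussian-kernel local constant (Nadaraya-Watson) estimate of the residuals.
+     d2 = (x_query[:, None] - x_train[None, :]) ** 2
+     w = np.exp(-precision * d2)
+     return w @ r_train / np.clip(w.sum(axis=1), 1e-12, None)
+
+ rng = np.random.default_rng(0)
+ x = rng.uniform(0, 1, 500)
+ y = np.sin(6 * x) + 0.1 * rng.standard_normal(500)
+
+ learning_rate, rounds = 0.5, 30
+ F = np.full_like(y, y.mean())  # initial constant prediction
+ for _ in range(rounds):
+     residuals = y - F          # pseudo-residuals = negative MSE gradient
+     F += learning_rate * nw_fit_predict(x, residuals, x)
+
+ print("train MSE:", np.mean((y - F) ** 2))
+ ```
+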
76
+ ### What it delivers
77
+
78
+ With [suitable preprocessing](#data-preprocessing), KernelBooster can match popular gradient boosters like XGBoost and LightGBM on prediction accuracy while outperforming traditional kernel methods (KernelRidge, SVR, Gaussian Processes). Training time is comparable to other kernel methods. See [Benchmarks](#benchmarks) for detailed comparisons.
79
+
80
+ ### Architecture
81
+
82
+ KernelBooster has three main components: the KernelBooster class, which runs the boosting; the KernelTree class, which handles the splitting; and the KernelEstimator class, which implements the local constant estimation. Because kernel methods are computationally expensive, the guiding design principle has been computational efficiency.
83
+
84
+ Calling fit starts a training loop that closely follows the algorithm described in Friedman (2001). The main difference is that KernelTree does not choose features through its splits; instead, they are supplied by the booster class. The default feature selection is random, with the number of features per kernel increasing over the boosting rounds. Random feature selection naturally introduces randomness into training results, which can be mitigated with a lower learning rate and more rounds. As in Friedman (2001), KernelBooster can fit several different objective functions, which are passed in as an Objective class.
85
+
86
+ KernelTree splits numerical data by density and categorical data by MSE. The idea is that the kernel bandwidth should largely depend on how dense the data is. For numerical data, KernelTree keeps splitting until the number of observations in a node falls below the `max_sample` parameter. Besides finding regions that are well served by a single bandwidth, this significantly speeds up computation of the kernel matrices for the kernel estimator. For example, with ten equal splits we go from computing one (n, n) matrix to computing ten (n/10, n/10) matrices, i.e. n²/10 operations instead of n², a 90% reduction in compute.
87
+
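+ A quick sanity check of that arithmetic (illustrative numbers only):
+
+ ```python
+ n, k = 10_000, 10                 # sample size and number of equal splits
+ full = n * n                      # entries in one (n, n) kernel matrix
+ split = k * (n // k) ** 2         # entries in k (n/k, n/k) kernel matrices
+ print(split / full)               # 0.1 -> roughly 90% fewer kernel evaluations
+ ```
+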
88
+ The actual estimation is handled by KernelEstimator. It optimizes a scalar precision (inverse bandwidth) for the local constant estimator using leave-one-out cross-validation and random search between given bounds. It offers both Gaussian and (isotropic) Laplace kernels, with the Laplace kernel as the default. KernelEstimator also has uncertainty quantification methods for quantile and conditional variance prediction, but these are still experimental, as they use a "naive" single-kernel approach whose precision is optimized for mean prediction.
89
+
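+ The sketch below shows the general idea of leave-one-out bandwidth selection for a Nadaraya-Watson estimator with a Laplace kernel, in plain NumPy. It is a simplified stand-in, not the package's implementation (which runs in C/CUDA); the helper names `loo_cv_precision` and `nw_predict` are hypothetical.
+
+ ```python
+ import numpy as np
+
+ def laplace_kernel(dists, precision):
+     # Isotropic Laplace kernel on precomputed Euclidean distances.
+     return np.exp(-precision * dists)
+
+ def loo_cv_precision(X, y, bounds=(0.1, 50.0), n_trials=30, seed=0):
+     # Pick a scalar precision (inverse bandwidth) by leave-one-out CV
+     # with random search between the given bounds.
+     rng = np.random.default_rng(seed)
+     dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
+     best_prec, best_mse = bounds[0], np.inf
+     for prec in rng.uniform(bounds[0], bounds[1], size=n_trials):
+         W = laplace_kernel(dists, prec)
+         np.fill_diagonal(W, 0.0)   # leave each point out of its own fit
+         y_loo = W @ y / np.clip(W.sum(axis=1), 1e-12, None)
+         mse = np.mean((y - y_loo) ** 2)
+         if mse < best_mse:
+             best_prec, best_mse = prec, mse
+     return best_prec
+
+ def nw_predict(X_train, y_train, X_query, precision):
+     dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=-1)
+     W = laplace_kernel(dists, precision)
+     return W @ y_train / np.clip(W.sum(axis=1), 1e-12, None)
+
+ # Tiny usage example on synthetic data.
+ rng = np.random.default_rng(1)
+ X = rng.uniform(0, 1, size=(200, 2))
+ y = np.sin(4 * X[:, 0]) + X[:, 1] + 0.1 * rng.standard_normal(200)
+ prec = loo_cv_precision(X, y)
+ print(prec, np.mean((nw_predict(X, y, X, prec) - y) ** 2))
+ ```
+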
90
+ ### Notable features
91
+
92
+ Beyond the core boosting algorithm, KernelBooster includes a few features worth highlighting:
93
+
94
+ #### Smart Feature Selection
95
+
96
+ While the default feature selection is random (RandomSelector), the package also includes an mRMR-style probabilistic algorithm (SmartSelector) that weighs correlations between features and pseudo-residuals as well as each feature's performance in previous boosting rounds.
97
+
98
+ ```python
99
+ from kernelboost.feature_selection import SmartSelector
100
+
101
+ selector = SmartSelector(
102
+     redundancy_penalty=0.4,
103
+     relevance_alpha=0.7,
104
+     recency_penalty=0.3,
105
+ )
106
+
107
+ booster = KernelBooster(
108
+     objective=MSEObjective(),
109
+     feature_selector=selector,
110
+ )
111
+ ```
112
+
113
+ #### Early Stopping
114
+
115
+ Training stops automatically if the evaluation loss does not improve for a given number of consecutive rounds (controlled by the `early_stopping_rounds` parameter).
116
+
117
+ ```python
118
+ booster.fit(X_train, y_train, eval_set=(X_val, y_val), early_stopping_rounds=20)
119
+ ```
120
+
121
+ #### RhoOptimizer
122
+
123
+ RhoOptimizer performs post-hoc optimization of step sizes, often improving predictions at minimal additional cost. It can also back out optimal regularization parameters (L1 penalty and learning rate), which is useful when you are unsure what level of regularization to use.
124
+
125
+ ```python
126
+ from kernelboost.rho_optimizer import RhoOptimizer
127
+
128
+ opt = RhoOptimizer(booster, lambda_reg=1.0)
129
+ opt.fit(X_val, y_val)
130
+ opt.update_booster()
131
+
132
+ # Back out optimal hyperparameters
133
+ lambda1, learning_rate = opt.find_hyperparameters()
134
+ ```
135
+
136
+ #### Uncertainty Quantification (Experimental)
137
+
138
+ KernelBooster offers both prediction intervals and conditional variance prediction based on kernel estimation. These come for "free" on top of training and require no extra data. This functionality is still a work in progress.
139
+
140
+ ```python
141
+ # Prediction intervals (90% by default)
142
+ lower, upper = booster.predict_intervals(X, alpha=0.1)
143
+
144
+ # Conditional variance estimates
145
+ variance = booster.predict_variance(X)
146
+ ```
147
+
148
+ Both interval coverage and conditional variance tend to be underestimated, though this depends on the data and on how well the boosting has converged. No special tuning is required: settings that optimize MSE also give reasonable uncertainty estimates. See [benchmarks](#uncertainty-quantification-california-housing) for a comparison with Gaussian Processes.
149
+
150
+ #### Data Preprocessing
151
+
152
+ Scaling data is a good idea for kernel estimation methods. The package includes a simple RankTransformer that often works well.
153
+
154
+ ```python
155
+ from kernelboost.utilities import RankTransformer
156
+
157
+ scaler = RankTransformer(pct=True)
158
+ X_train = scaler.fit_transform(X_train)
159
+ X_test = scaler.transform(X_test)
160
+ ```
161
+
162
+ Like other kernel methods, KernelBooster works best with continuous, smooth features. For datasets with many categorical features, tree-based methods are often better suited, since they handle splits on categories naturally.
163
+
164
+ ## API Reference
165
+
166
+ | Class | Purpose |
167
+ |-------|---------|
168
+ | `KernelBooster` | Main booster for regression/binary classification |
169
+ | `MulticlassBooster` | One-vs-rest multiclass wrapper |
170
+ | `MSEObjective` | Mean squared error (regression) |
171
+ | `EntropyObjective` | Cross-entropy (binary classification) |
172
+ | `QuantileObjective` | Pinball loss (quantile regression) |
173
+ | `SmartSelector` | mRMR-style feature selection |
174
+ | `RandomSelector` | Random feature selection |
175
+ | `RhoOptimizer` | Post-hoc step size optimization |
176
+ | `RankTransformer` | Percentile normalization |
177
+
178
+ ## KernelBooster Main Parameters
179
+
180
+ | Parameter | Default | Description |
181
+ |-----------|---------|-------------|
182
+ | `objective` | Required | Loss function: `MSEObjective()`, `EntropyObjective()`, `QuantileObjective()` |
183
+ | `rounds` | auto | Boosting iterations (auto = n_features * 10) |
184
+ | `max_features` | auto | Max features per estimator (auto = min(10, n_features)) |
185
+ | `min_features` | 1 | Min features per estimator |
186
+ | `kernel_type` | 'laplace' | Kernel function: 'laplace' or 'gaussian' |
187
+ | `learning_rate` | 0.5 | Step size shrinkage factor |
188
+ | `lambda1` | 0.0 | L1 regularization |
189
+ | `use_gpu` | False | Enable GPU acceleration |
190
+
191
+ ## Benchmarks
192
+
193
+ Results have inherent randomness due to feature selection and subsampling. Scripts available in `benchmarks/`.
194
+
195
+ ### Regression (California Housing)
196
+ ```text
197
+ =================================================================
198
+ Model MSE MAE R² Time
199
+ -----------------------------------------------------------------
200
+ KernelBooster 0.2053 0.2985 0.8452 11.0s
201
+ sklearn HGBR 0.2247 0.3146 0.8306 0.1s
202
+ XGBoost 0.2155 0.3050 0.8376 0.1s
203
+ LightGBM 0.2097 0.3047 0.8419 0.1s
204
+ =================================================================
205
+ ```
206
+
207
+ ### Binary Classification (Breast Cancer)
208
+ ```text
209
+ =================================================================
210
+ Model Accuracy AUC-ROC F1 Time
211
+ -----------------------------------------------------------------
212
+ KernelBooster 0.9825 0.9984 0.9861 1.6s
213
+ sklearn HGBC 0.9649 0.9944 0.9722 0.1s
214
+ XGBoost 0.9561 0.9938 0.9650 0.0s
215
+ LightGBM 0.9649 0.9925 0.9722 0.0s
216
+ =================================================================
217
+ ```
218
+
219
+ ### Comparison with Kernel Methods (California Housing)
220
+ ```text
221
+ =================================================================
222
+ Kernel Methods Benchmark (n_train=10000)
223
+ =================================================================
224
+ Model MSE MAE R² Time
225
+ -----------------------------------------------------------------
226
+ KernelBooster 0.2091 0.3054 0.8430 6.5s
227
+ KernelRidge 0.4233 0.4835 0.6822 1.7s
228
+ SVR 0.3136 0.3780 0.7646 3.5s
229
+ GP (n=5000) 0.3297 0.4061 0.7524 67.7s
230
+ =================================================================
231
+ ```
232
+
233
+ ### Uncertainty Quantification (California Housing)
234
+
235
+ Prediction intervals and conditional variance estimates compared to Gaussian Process (sklearn) regression:
236
+ ```text
237
+ =================================================================
238
+ Uncertainty Quantification (90% intervals, alpha=0.1)
239
+ =================================================================
240
+ Model Coverage Width Var Corr Var Ratio
241
+ -----------------------------------------------------------------
242
+ KernelBooster 88.1% 1.235 0.206 1.621
243
+ GP (n=5000) 90.9% 1.863 0.157 1.026
244
+ =================================================================
245
+ ```
246
+
247
+ Var Corr is the correlation between predicted variance and squared errors.
248
+ Var Ratio is the ratio between the mean of the squared errors and the predicted variance.
249
+
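+ For reference, one plausible way to compute these two diagnostics from held-out predictions (the array names are illustrative, not package API, and the exact aggregation used in the benchmark script may differ):
+
+ ```python
+ import numpy as np
+
+ # y_true, y_pred and var_pred would come from a held-out test set; dummy data here.
+ rng = np.random.default_rng(0)
+ var_pred = 0.5 + rng.uniform(0, 1, 1000)
+ y_true = rng.standard_normal(1000)
+ y_pred = y_true + np.sqrt(var_pred) * rng.standard_normal(1000)
+
+ sq_err = (y_true - y_pred) ** 2
+ var_corr = np.corrcoef(var_pred, sq_err)[0, 1]  # Var Corr
+ var_ratio = sq_err.mean() / var_pred.mean()     # Var Ratio
+ print(var_corr, var_ratio)
+ ```
+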
250
+ ### CPU/GPU training time comparison (California Housing)
251
+
252
+ ```text
253
+ =================================================================
254
+ GPU vs CPU Training Time (California Housing, n=10000)
255
+ =================================================================
256
+ Backend Time
257
+ -----------------------------------------------------------------
258
+ CPU (C/OpenMP) 38.6s
259
+ GPU (CuPy/CUDA) 4.6s
260
+ =================================================================
261
+ GPU speedup: 8.3x
262
+ ```
263
+
264
+ All benchmarks were run on Ubuntu 22.04 with a Ryzen 7700 CPU and an RTX 3090 GPU.
265
+
266
+ ## References
267
+
268
+ - Fan, J., & Gijbels, I. (1996). *Local Polynomial Modelling and Its Applications*. Chapman & Hall.
269
+ - Fan, J., & Yao, Q. (1998). Efficient estimation of conditional variance functions in stochastic regression. *Biometrika*, 85(3), 645–660.
270
+ - Friedman, J. H. (2001). Greedy Function Approximation: A Gradient Boosting Machine. *Annals of Statistics*, 29(5), 1189–1232.
271
+ - Hansen, B. E. (2004). Nonparametric Conditional Density Estimation. Working paper, University of Wisconsin.
272
+
273
+ ## About
274
+
275
+ KernelBoost is a hobby project exploring alternatives to tree-based gradient boosting. It is currently at v0.1.0, with pre-compiled binaries included for Linux and Windows. Contributions and feedback are welcome.
276
+
277
+ ## License
278
+
279
+ MIT License
@@ -0,0 +1,257 @@
1
+ # KernelBoost
2
+
3
+ **Gradient boosting with kernel-based local constant estimators**
4
+
5
+ ![Python](https://img.shields.io/badge/python-%3E%3D3.9-blue)
6
+ ![NumPy](https://img.shields.io/badge/NumPy-array%20backend-blue)
7
+ ![C](https://img.shields.io/badge/C-language-blue)
8
+ ![GPU](https://img.shields.io/badge/GPU-CUDA%20C%2FCuPy-orange)
9
+ ![License](https://img.shields.io/badge/license-MIT-green)
10
+ ![Version](https://img.shields.io/badge/version-0.1.0-blue)
11
+
12
+ KernelBoost is a gradient boosting algorithm that uses Nadaraya-Watson (local constant) kernel estimators as base learners instead of decision trees. It provides:
13
+
14
+ - Support for regression, classification and quantile regression tasks.
15
+ - sklearn-style API (`fit`, `predict`).
16
+ - CPU (via C) and GPU (via CuPy/CUDA) backends.
17
+
18
+ ## Installation
19
+
20
+ ```bash
21
+ # Basic installation
22
+ pip install kernelboost
23
+
24
+ # With GPU support (requires CUDA)
25
+ pip install cupy-cuda12x # for CUDA 12
26
+ ```
27
+
28
+ > **Dependencies**: NumPy only. CuPy optional for GPU acceleration.
29
+
30
+ ## Quick Start
31
+
32
+ ```python
33
+ from kernelboost import KernelBooster, MulticlassBooster
34
+ from kernelboost.objectives import MSEObjective, EntropyObjective
35
+
36
+ # Regression
37
+ booster = KernelBooster(objective=MSEObjective()).fit(X_train, y_train)
38
+ predictions = booster.predict(X_test)
39
+
40
+ # Binary classification
41
+ booster = KernelBooster(objective=EntropyObjective()).fit(X_train, y_train)
42
+ logits = booster.predict(X_test)
43
+ probabilities = booster.predict_proba(X_test)
44
+
45
+ # Multiclass classification (fits one booster per class)
46
+ booster = MulticlassBooster().fit(X_train, y_train)
47
+ class_labels = booster.predict(X_test)
48
+ ```
49
+
50
+ ### How it works
51
+
52
+ KernelBooster uses gradient boosting with kernel-based local constant estimators instead of decision trees. Each boosting round fits a KernelTree that partitions the data into regions, then applies Nadaraya-Watson kernel regression at each leaf to predict pseudo-residuals. Unlike tree-based boosters where splits implicitly select features, KernelBooster selects features explicitly at the boosting stage before tree construction.
53
+
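+ For intuition, the sketch below shows the core idea on a toy 1D regression problem: each round fits a Nadaraya-Watson estimator to the current pseudo-residuals (the negative gradient of the MSE loss) and adds a shrunken version of its prediction. This is an illustration only, not the package's implementation; it skips the KernelTree partitioning, feature selection, and bandwidth optimization, and `nw_fit_predict` is a hypothetical helper rather than part of the API.
+
+ ```python
+ import numpy as np
+
+ def nw_fit_predict(x_train, r_train, x_query, precision=20.0):
+     # Gaussian-kernel local constant (Nadaraya-Watson) estimate of the residuals.
+     d2 = (x_query[:, None] - x_train[None, :]) ** 2
+     w = np.exp(-precision * d2)
+     return w @ r_train / np.clip(w.sum(axis=1), 1e-12, None)
+
+ rng = np.random.default_rng(0)
+ x = rng.uniform(0, 1, 500)
+ y = np.sin(6 * x) + 0.1 * rng.standard_normal(500)
+
+ learning_rate, rounds = 0.5, 30
+ F = np.full_like(y, y.mean())  # initial constant prediction
+ for _ in range(rounds):
+     residuals = y - F          # pseudo-residuals = negative MSE gradient
+     F += learning_rate * nw_fit_predict(x, residuals, x)
+
+ print("train MSE:", np.mean((y - F) ** 2))
+ ```
+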
54
+ ### What it delivers
55
+
56
+ With [suitable preprocessing](#data-preprocessing), KernelBooster can match popular gradient boosters like XGBoost and LightGBM on prediction accuracy while outperforming traditional kernel methods (KernelRidge, SVR, Gaussian Processes). Training time is comparable to other kernel methods. See [Benchmarks](#benchmarks) for detailed comparisons.
57
+
58
+ ### Architecture
59
+
60
+ KernelBooster has three main components: the KernelBooster class, which runs the boosting; the KernelTree class, which handles the splitting; and the KernelEstimator class, which implements the local constant estimation. Because kernel methods are computationally expensive, the guiding design principle has been computational efficiency.
61
+
62
+ Calling fit starts a training loop that closely follows the algorithm described in Friedman (2001). The main difference is that KernelTree does not choose features through its splits; instead, they are supplied by the booster class. The default feature selection is random, with the number of features per kernel increasing over the boosting rounds. Random feature selection naturally introduces randomness into training results, which can be mitigated with a lower learning rate and more rounds. As in Friedman (2001), KernelBooster can fit several different objective functions, which are passed in as an Objective class.
63
+
64
+ KernelTree splits numerical data by density and categorical data by MSE. The idea is that the kernel bandwidth should largely depend on how dense the data is. For numerical data, KernelTree keeps splitting until the number of observations in a node falls below the `max_sample` parameter. Besides finding regions that are well served by a single bandwidth, this significantly speeds up computation of the kernel matrices for the kernel estimator. For example, with ten equal splits we go from computing one (n, n) matrix to computing ten (n/10, n/10) matrices, i.e. n²/10 operations instead of n², a 90% reduction in compute.
65
+
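+ A quick sanity check of that arithmetic (illustrative numbers only):
+
+ ```python
+ n, k = 10_000, 10                 # sample size and number of equal splits
+ full = n * n                      # entries in one (n, n) kernel matrix
+ split = k * (n // k) ** 2         # entries in k (n/k, n/k) kernel matrices
+ print(split / full)               # 0.1 -> roughly 90% fewer kernel evaluations
+ ```
+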
66
+ The actual estimation is handled by KernelEstimator. It optimizes a scalar precision (inverse bandwidth) for the local constant estimator using leave-one-out cross-validation and random search between given bounds. It offers both Gaussian and (isotropic) Laplace kernels, with the Laplace kernel as the default. KernelEstimator also has uncertainty quantification methods for quantile and conditional variance prediction, but these are still experimental, as they use a "naive" single-kernel approach whose precision is optimized for mean prediction.
67
+
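+ The sketch below shows the general idea of leave-one-out bandwidth selection for a Nadaraya-Watson estimator with a Laplace kernel, in plain NumPy. It is a simplified stand-in, not the package's implementation (which runs in C/CUDA); the helper names `loo_cv_precision` and `nw_predict` are hypothetical.
+
+ ```python
+ import numpy as np
+
+ def laplace_kernel(dists, precision):
+     # Isotropic Laplace kernel on precomputed Euclidean distances.
+     return np.exp(-precision * dists)
+
+ def loo_cv_precision(X, y, bounds=(0.1, 50.0), n_trials=30, seed=0):
+     # Pick a scalar precision (inverse bandwidth) by leave-one-out CV
+     # with random search between the given bounds.
+     rng = np.random.default_rng(seed)
+     dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
+     best_prec, best_mse = bounds[0], np.inf
+     for prec in rng.uniform(bounds[0], bounds[1], size=n_trials):
+         W = laplace_kernel(dists, prec)
+         np.fill_diagonal(W, 0.0)   # leave each point out of its own fit
+         y_loo = W @ y / np.clip(W.sum(axis=1), 1e-12, None)
+         mse = np.mean((y - y_loo) ** 2)
+         if mse < best_mse:
+             best_prec, best_mse = prec, mse
+     return best_prec
+
+ def nw_predict(X_train, y_train, X_query, precision):
+     dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=-1)
+     W = laplace_kernel(dists, precision)
+     return W @ y_train / np.clip(W.sum(axis=1), 1e-12, None)
+
+ # Tiny usage example on synthetic data.
+ rng = np.random.default_rng(1)
+ X = rng.uniform(0, 1, size=(200, 2))
+ y = np.sin(4 * X[:, 0]) + X[:, 1] + 0.1 * rng.standard_normal(200)
+ prec = loo_cv_precision(X, y)
+ print(prec, np.mean((nw_predict(X, y, X, prec) - y) ** 2))
+ ```
+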
68
+ ### Notable features
69
+
70
+ Beyond the core boosting algorithm, KernelBooster includes a few features worth highlighting:
71
+
72
+ #### Smart Feature Selection
73
+
74
+ While the default feature selection is random (RandomSelector), the package also includes an mRMR-style probabilistic algorithm (SmartSelector) that weighs correlations between features and pseudo-residuals as well as each feature's performance in previous boosting rounds.
75
+
76
+ ```python
77
+ from kernelboost.feature_selection import SmartSelector
78
+
79
+ selector = SmartSelector(
80
+     redundancy_penalty=0.4,
81
+     relevance_alpha=0.7,
82
+     recency_penalty=0.3,
83
+ )
84
+
85
+ booster = KernelBooster(
86
+     objective=MSEObjective(),
87
+     feature_selector=selector,
88
+ )
89
+ ```
90
+
91
+ #### Early Stopping
92
+
93
+ Training stops automatically if the evaluation loss does not improve for a given number of consecutive rounds (controlled by the `early_stopping_rounds` parameter).
94
+
95
+ ```python
96
+ booster.fit(X_train, y_train, eval_set=(X_val, y_val), early_stopping_rounds=20)
97
+ ```
98
+
99
+ #### RhoOptimizer
100
+
101
+ RhoOptimizer performs post-hoc optimization of step sizes, often improving predictions at minimal additional cost. It can also back out optimal regularization parameters (L1 penalty and learning rate), which is useful when you are unsure what level of regularization to use.
102
+
103
+ ```python
104
+ from kernelboost.rho_optimizer import RhoOptimizer
105
+
106
+ opt = RhoOptimizer(booster, lambda_reg=1.0)
107
+ opt.fit(X_val, y_val)
108
+ opt.update_booster()
109
+
110
+ # Back out optimal hyperparameters
111
+ lambda1, learning_rate = opt.find_hyperparameters()
112
+ ```
113
+
114
+ #### Uncertainty Quantification (Experimental)
115
+
116
+ KernelBooster offers both prediction intervals and conditional variance prediction based on kernel estimation. These come for "free" on top of training and require no extra data. This functionality is still a work in progress.
117
+
118
+ ```python
119
+ # Prediction intervals (90% by default)
120
+ lower, upper = booster.predict_intervals(X, alpha=0.1)
121
+
122
+ # Conditional variance estimates
123
+ variance = booster.predict_variance(X)
124
+ ```
125
+
126
+ Both interval coverage and conditional variance tend to be underestimated, though this depends on the data and on how well the boosting has converged. No special tuning is required: settings that optimize MSE also give reasonable uncertainty estimates. See [benchmarks](#uncertainty-quantification-california-housing) for a comparison with Gaussian Processes.
127
+
128
+ #### Data Preprocessing
129
+
130
+ Scaling data is a good idea for kernel estimation methods. The package includes a simple RankTransformer that often works well.
131
+
132
+ ```python
133
+ from kernelboost.utilities import RankTransformer
134
+
135
+ scaler = RankTransformer(pct=True)
136
+ X_train = scaler.fit_transform(X_train)
137
+ X_test = scaler.transform(X_test)
138
+ ```
139
+
140
+ Like other kernel methods, KernelBooster works best with continuous, smooth features. For datasets with many categorical features, tree-based methods are often better suited, since they handle splits on categories naturally.
141
+
142
+ ## API Reference
143
+
144
+ | Class | Purpose |
145
+ |-------|---------|
146
+ | `KernelBooster` | Main booster for regression/binary classification |
147
+ | `MulticlassBooster` | One-vs-rest multiclass wrapper |
148
+ | `MSEObjective` | Mean squared error (regression) |
149
+ | `EntropyObjective` | Cross-entropy (binary classification) |
150
+ | `QuantileObjective` | Pinball loss (quantile regression) |
151
+ | `SmartSelector` | mRMR-style feature selection |
152
+ | `RandomSelector` | Random feature selection |
153
+ | `RhoOptimizer` | Post-hoc step size optimization |
154
+ | `RankTransformer` | Percentile normalization |
155
+
156
+ ## KernelBooster Main Parameters
157
+
158
+ | Parameter | Default | Description |
159
+ |-----------|---------|-------------|
160
+ | `objective` | Required | Loss function: `MSEObjective()`, `EntropyObjective()`, `QuantileObjective()` |
161
+ | `rounds` | auto | Boosting iterations (auto = n_features * 10) |
162
+ | `max_features` | auto | Max features per estimator (auto = min(10, n_features)) |
163
+ | `min_features` | 1 | Min features per estimator |
164
+ | `kernel_type` | 'laplace' | Kernel function: 'laplace' or 'gaussian' |
165
+ | `learning_rate` | 0.5 | Step size shrinkage factor |
166
+ | `lambda1` | 0.0 | L1 regularization |
167
+ | `use_gpu` | False | Enable GPU acceleration |
168
+
169
+ ## Benchmarks
170
+
171
+ Results have inherent randomness due to feature selection and subsampling. Scripts available in `benchmarks/`.
172
+
173
+ ### Regression (California Housing)
174
+ ```text
175
+ =================================================================
176
+ Model MSE MAE R² Time
177
+ -----------------------------------------------------------------
178
+ KernelBooster 0.2053 0.2985 0.8452 11.0s
179
+ sklearn HGBR 0.2247 0.3146 0.8306 0.1s
180
+ XGBoost 0.2155 0.3050 0.8376 0.1s
181
+ LightGBM 0.2097 0.3047 0.8419 0.1s
182
+ =================================================================
183
+ ```
184
+
185
+ ### Binary Classification (Breast Cancer)
186
+ ```text
187
+ =================================================================
188
+ Model Accuracy AUC-ROC F1 Time
189
+ -----------------------------------------------------------------
190
+ KernelBooster 0.9825 0.9984 0.9861 1.6s
191
+ sklearn HGBC 0.9649 0.9944 0.9722 0.1s
192
+ XGBoost 0.9561 0.9938 0.9650 0.0s
193
+ LightGBM 0.9649 0.9925 0.9722 0.0s
194
+ =================================================================
195
+ ```
196
+
197
+ ### Comparison with Kernel Methods (California Housing)
198
+ ```text
199
+ =================================================================
200
+ Kernel Methods Benchmark (n_train=10000)
201
+ =================================================================
202
+ Model MSE MAE R² Time
203
+ -----------------------------------------------------------------
204
+ KernelBooster 0.2091 0.3054 0.8430 6.5s
205
+ KernelRidge 0.4233 0.4835 0.6822 1.7s
206
+ SVR 0.3136 0.3780 0.7646 3.5s
207
+ GP (n=5000) 0.3297 0.4061 0.7524 67.7s
208
+ =================================================================
209
+ ```
210
+
211
+ ### Uncertainty Quantification (California Housing)
212
+
213
+ Prediction intervals and conditional variance estimates compared to Gaussian Process (sklearn) regression:
214
+ ```text
215
+ =================================================================
216
+ Uncertainty Quantification (90% intervals, alpha=0.1)
217
+ =================================================================
218
+ Model Coverage Width Var Corr Var Ratio
219
+ -----------------------------------------------------------------
220
+ KernelBooster 88.1% 1.235 0.206 1.621
221
+ GP (n=5000) 90.9% 1.863 0.157 1.026
222
+ =================================================================
223
+ ```
224
+
225
+ Var Corr is the correlation between predicted variance and squared errors.
226
+ Var Ratio is the ratio between the mean of the squared errors and the predicted variance.
227
+
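+ For reference, one plausible way to compute these two diagnostics from held-out predictions (the array names are illustrative, not package API, and the exact aggregation used in the benchmark script may differ):
+
+ ```python
+ import numpy as np
+
+ # y_true, y_pred and var_pred would come from a held-out test set; dummy data here.
+ rng = np.random.default_rng(0)
+ var_pred = 0.5 + rng.uniform(0, 1, 1000)
+ y_true = rng.standard_normal(1000)
+ y_pred = y_true + np.sqrt(var_pred) * rng.standard_normal(1000)
+
+ sq_err = (y_true - y_pred) ** 2
+ var_corr = np.corrcoef(var_pred, sq_err)[0, 1]  # Var Corr
+ var_ratio = sq_err.mean() / var_pred.mean()     # Var Ratio
+ print(var_corr, var_ratio)
+ ```
+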
228
+ ### CPU/GPU training time comparison (California Housing)
229
+
230
+ ```text
231
+ =================================================================
232
+ GPU vs CPU Training Time (California Housing, n=10000)
233
+ =================================================================
234
+ Backend Time
235
+ -----------------------------------------------------------------
236
+ CPU (C/OpenMP) 38.6s
237
+ GPU (CuPy/CUDA) 4.6s
238
+ =================================================================
239
+ GPU speedup: 8.3x
240
+ ```
241
+
242
+ All benchmarks were run on Ubuntu 22.04 with a Ryzen 7700 CPU and an RTX 3090 GPU.
243
+
244
+ ## References
245
+
246
+ - Fan, J., & Gijbels, I. (1996). *Local Polynomial Modelling and Its Applications*. Chapman & Hall.
247
+ - Fan, J., & Yao, Q. (1998). Efficient estimation of conditional variance functions in stochastic regression. *Biometrika*, 85(3), 645–660.
248
+ - Friedman, J. H. (2001). Greedy Function Approximation: A Gradient Boosting Machine. *Annals of Statistics*, 29(5), 1189–1232.
249
+ - Hansen, B. E. (2004). Nonparametric Conditional Density Estimation. Working paper, University of Wisconsin.
250
+
251
+ ## About
252
+
253
+ KernelBoost is a hobby project exploring alternatives to tree-based gradient boosting. It is currently at v0.1.0, with pre-compiled binaries included for Linux and Windows. Contributions and feedback are welcome.
254
+
255
+ ## License
256
+
257
+ MIT License
@@ -0,0 +1,11 @@
1
+ """KernelBooster: Gradient boosting with Nadaraya-Watson (local constant) estimator as base learners."""
2
+
3
+ __version__ = "0.1.0"
4
+
5
+ from .booster import KernelBooster
6
+ from .multiclassbooster import MulticlassBooster
7
+
8
+ __all__ = [
9
+ "KernelBooster",
10
+ "MulticlassBooster",
11
+ ]