kernelboost 0.2.1__tar.gz → 0.3.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {kernelboost-0.2.1 → kernelboost-0.3.0}/CHANGELOG.md +14 -0
- {kernelboost-0.2.1/kernelboost.egg-info → kernelboost-0.3.0}/PKG-INFO +2 -2
- {kernelboost-0.2.1 → kernelboost-0.3.0}/PYPI_README.md +1 -1
- {kernelboost-0.2.1 → kernelboost-0.3.0}/README.md +27 -28
- {kernelboost-0.2.1 → kernelboost-0.3.0}/kernelboost/__init__.py +1 -1
- {kernelboost-0.2.1 → kernelboost-0.3.0}/kernelboost/booster.py +28 -25
- {kernelboost-0.2.1 → kernelboost-0.3.0}/kernelboost/feature_selection.py +118 -45
- {kernelboost-0.2.1 → kernelboost-0.3.0}/kernelboost/kernels.c +2 -2
- kernelboost-0.3.0/kernelboost/libmi.dll +0 -0
- kernelboost-0.3.0/kernelboost/libmi.so +0 -0
- kernelboost-0.3.0/kernelboost/mi_bins.c +72 -0
- {kernelboost-0.2.1 → kernelboost-0.3.0}/kernelboost/rho_optimizer.py +0 -2
- {kernelboost-0.2.1 → kernelboost-0.3.0}/kernelboost/tree.py +157 -22
- {kernelboost-0.2.1 → kernelboost-0.3.0/kernelboost.egg-info}/PKG-INFO +2 -2
- {kernelboost-0.2.1 → kernelboost-0.3.0}/kernelboost.egg-info/SOURCES.txt +3 -0
- {kernelboost-0.2.1 → kernelboost-0.3.0}/pyproject.toml +1 -1
- {kernelboost-0.2.1 → kernelboost-0.3.0}/LICENSE +0 -0
- {kernelboost-0.2.1 → kernelboost-0.3.0}/MANIFEST.in +0 -0
- {kernelboost-0.2.1 → kernelboost-0.3.0}/kernelboost/backend.py +0 -0
- {kernelboost-0.2.1 → kernelboost-0.3.0}/kernelboost/cpu_functions.py +0 -0
- {kernelboost-0.2.1 → kernelboost-0.3.0}/kernelboost/estimator.py +0 -0
- {kernelboost-0.2.1 → kernelboost-0.3.0}/kernelboost/gpu_functions.py +0 -0
- {kernelboost-0.2.1 → kernelboost-0.3.0}/kernelboost/kernels.cu +0 -0
- {kernelboost-0.2.1 → kernelboost-0.3.0}/kernelboost/libkernels.dll +0 -0
- {kernelboost-0.2.1 → kernelboost-0.3.0}/kernelboost/libkernels.so +0 -0
- {kernelboost-0.2.1 → kernelboost-0.3.0}/kernelboost/multiclassbooster.py +0 -0
- {kernelboost-0.2.1 → kernelboost-0.3.0}/kernelboost/objectives.py +0 -0
- {kernelboost-0.2.1 → kernelboost-0.3.0}/kernelboost/optimizer.py +0 -0
- {kernelboost-0.2.1 → kernelboost-0.3.0}/kernelboost/utilities.py +0 -0
- {kernelboost-0.2.1 → kernelboost-0.3.0}/kernelboost.egg-info/dependency_links.txt +0 -0
- {kernelboost-0.2.1 → kernelboost-0.3.0}/kernelboost.egg-info/requires.txt +0 -0
- {kernelboost-0.2.1 → kernelboost-0.3.0}/kernelboost.egg-info/top_level.txt +0 -0
- {kernelboost-0.2.1 → kernelboost-0.3.0}/setup.cfg +0 -0

{kernelboost-0.2.1 → kernelboost-0.3.0}/CHANGELOG.md

@@ -2,6 +2,20 @@
 
 All notable changes to this project will be documented in this file.
 
+## [0.3.0] - 2026-03-07
+
+### Added
+- Constant-leaf tree mode (`tree_type='constant'`) with vectorized MSE reduction splitting.
+- MI-based relevance scoring in SmartSelector using histogram mutual information.
+- C implementation of MI computation (`mi_bins.c`) with OpenMP parallelization.
+- Temperature scheduling in SmartSelector via `temperature_max` parameter.
+- `constant_tree_frequency` parameter in SmartSelector to control constant-leaf round frequency.
+
+### Changed
+- **Breaking**: `feature_list` renamed to `feature_tree_tuple`. Now accepts `(feature_indices, tree_type)` tuples.
+- Feature selectors now return `(features, tree_type)` tuples from `get_features()`.
+- `feature_importances_` now correctly accumulates all rounds instead of losing duplicates.
+
 ## [0.2.1] - 2026-03-01
 
 ### Changed
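For orientation, the breaking `feature_list` → `feature_tree_tuple` rename and the new SmartSelector knobs listed above look roughly like this in use. This is a minimal sketch written against the API as described in this changelog and in the docstrings further down in the diff; the import paths, data, and parameter values are illustrative assumptions, not code taken from the package:

```python
import numpy as np
from kernelboost.booster import KernelBooster            # module path assumed
from kernelboost.objectives import MSEObjective           # objective named in the booster docstring below
from kernelboost.feature_selection import SmartSelector

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = (X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=1000)).reshape(-1, 1)

# 0.3.0: explicit per-round feature subsets now carry a leaf type as well.
booster = KernelBooster(
    objective=MSEObjective(),
    feature_tree_tuple=(
        ([0, 1], "kernel"),               # round 1: kernel leaves on features 0 and 1
        ([2, 3, 4], "kernel"),            # round 2
        (list(range(8)), "constant"),     # round 3: plain constant-leaf tree on all features
    ),
)

# Or let SmartSelector choose, using the parameters added in 0.3.0.
selector = SmartSelector(
    relevance_alpha=0.7,          # new default: weight on histogram-MI relevance
    temperature_max=1.0,          # start exploratory, decay linearly down to `temperature`
    constant_tree_frequency=25,   # insert a constant-leaf tree every 25 rounds
)
booster = KernelBooster(objective=MSEObjective(), feature_selector=selector)
booster.fit(X, y)                 # fit signature assumed; 2-D y per the tree code below
```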

{kernelboost-0.2.1/kernelboost.egg-info → kernelboost-0.3.0}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: kernelboost
-Version: 0.2.1
+Version: 0.3.0
 Summary: Gradient boosting with kernel regression base learners
 Author-email: tlaiho <tslaiho@gmail.com>
 License: MIT
@@ -29,7 +29,7 @@ Requires-Dist: cupy>=11.0.0; extra == "all"
 [image]
 [image]
 [image]
-[image]
 
 KernelBoost is a gradient boosting algorithm that uses Nadaraya-Watson (local constant) kernel estimators as base learners instead of decision trees. It has:
 
{kernelboost-0.2.1 → kernelboost-0.3.0}/PYPI_README.md

@@ -7,7 +7,7 @@
 [image]
 [image]
 [image]
-[image]
 
 KernelBoost is a gradient boosting algorithm that uses Nadaraya-Watson (local constant) kernel estimators as base learners instead of decision trees. It has:
 
{kernelboost-0.2.1 → kernelboost-0.3.0}/README.md

@@ -7,7 +7,7 @@
 [image]
 [image]
 [image]
-[image]
 
 kernelboost is a gradient boosting algorithm that uses Nadaraya-Watson (local constant) kernel estimators as base learners instead of decision trees. It has:
 
@@ -25,7 +25,7 @@ pip install kernelboost
 pip install cupy-cuda12x  # for CUDA 12
 ```
 
-> **Dependencies**: NumPy
+> **Dependencies**: NumPy. CuPy optional for GPU acceleration.
 
 ## Quick Start
 
@@ -61,17 +61,17 @@ There are three main components to kernelboost: KernelBooster class that does th
 
 After calling fit, KernelBooster starts a training loop which is mostly identical to the algorithm described in Friedman (2001). The main difference is that KernelTree does not choose features through its splits but is instead given them by the booster class. Default feature selection is random with increasing kernel sizes in terms of number of features. Random feature selection naturally creates randomness to training results, which can be mitigated with a lower learning rate and more boosting iterations. Similarly to Friedman (2001), KernelBooster can fit several different objective functions, which are passed in as an Objective class.
 
-KernelTree splits numerical data by density and categorical data by MSE. The idea here is that the kernel bandwidth should largely depend on how dense the data is. For numerical data, KernelTree splits until number of observations is below the 'max_sample' parameter. Besides finding regions which would be well served by the same bandwidth, this has the benefit of speeding up computation significantly in calculating the kernel matrices for the kernel estimator. For example, with ten splits we go from computing a (n, n) matrix to computing ten (n/10, n/10) matrices with n²/10 operations instead of n² (assuming equal splits). This saves a
+KernelTree splits numerical data by density and categorical data by MSE. It can also fit pure decision trees with mean values at leaves. The idea here is that the kernel bandwidth should largely depend on how dense the data is. For numerical data, KernelTree splits until number of observations is below the 'max_sample' parameter. Besides finding regions which would be well served by the same bandwidth, this has the benefit of speeding up computation significantly in calculating the kernel matrices for the kernel estimator. For example, with ten splits we go from computing a (n, n) matrix to computing ten (n/10, n/10) matrices with n²/10 operations instead of n² (assuming equal splits). This saves a nice 90% of compute.
 
-The actual estimation is handled by KernelEstimator. It optimizes a scalar precision (inverse bandwidth) for the local constant estimator using leave-one-out cross validation and random search between given bounds. It has both Gaussian and (isotropic) Laplace kernels with default being the Laplace kernel. KernelEstimator also has uncertainty quantification methods for quantile and conditional variance prediction
+The actual estimation is handled by KernelEstimator. It optimizes a scalar precision (inverse bandwidth) for the local constant estimator using leave-one-out cross validation and random search between given bounds. It has both Gaussian and (isotropic) Laplace kernels with default being the Laplace kernel. KernelEstimator also has uncertainty quantification methods for quantile and conditional variance prediction (Fan & Yao 1998).
 
 ### Notable features
 
-Beyond the core boosting algorithm,
+Beyond the core boosting algorithm, a few features worth highlighting:
 
 #### Smart Feature Selection
 
-While the default feature selection is random (RandomSelector), the package includes an mRMR style probabilistic algorithm (SmartSelector) based on
+While the default feature selection is random (RandomSelector), the package includes an mRMR style probabilistic algorithm (SmartSelector) based on mutual information between features and pseudo-residuals and loss gain in previous boosting rounds.
 
 ```python
 from kernelboost.feature_selection import SmartSelector
@@ -79,7 +79,6 @@ from kernelboost.feature_selection import SmartSelector
 selector = SmartSelector(
     redundancy_penalty=0.4,
     relevance_alpha=0.7,
-    recency_penalty=0.3,
 )
 
 booster = KernelBooster(
@@ -113,7 +112,7 @@ lambda1, learning_rate = opt.find_hyperparameters()
 
 #### Uncertainty Quantification (Experimental)
 
-KernelBooster has both prediction intervals and conditional variance prediction based on kernel estimation. These come for "free" on top of training
+KernelBooster has both prediction intervals and conditional variance prediction (Fan & Yao 1998) based on kernel estimation. These require no extra data and in that sense come for "free" on top of training. Still work in progress.
 
 ```python
 # Prediction intervals (90% by default)
@@ -123,7 +122,7 @@ lower, upper = booster.predict_intervals(X, alpha=0.1)
 variance = booster.predict_variance(X)
 ```
 
-Both interval coverage and conditional variance have a tendency to be underestimated
+Both interval coverage and conditional variance have a tendency to be underestimated. See [benchmarks](#uncertainty-quantification-california-housing) for a comparison with Gaussian Processes.
 
 #### Data Preprocessing
 
@@ -197,10 +196,10 @@ Results have inherent randomness due to feature selection and subsampling. Scrip
 =================================================================
 Model              MSE      MAE      R²       Time
 -----------------------------------------------------------------
-kernelboost        0.
-sklearn HGBR       0.
-XGBoost            0.
-LightGBM           0.
+kernelboost        0.1790   0.2781   0.8651   12.9s
+sklearn HGBR       0.2103   0.3018   0.8415   0.2s
+XGBoost            0.2080   0.2962   0.8432   0.1s
+LightGBM           0.1972   0.2894   0.8513   0.1s
 =================================================================
 ```
 
@@ -209,10 +208,10 @@ LightGBM 0.2097 0.3047 0.8419 0.1s
 =================================================================
 Model              Accuracy  AUC-ROC  F1       Time
 -----------------------------------------------------------------
-kernelboost        0.9825    0.9984   0.9861
-sklearn HGBC       0.9649    0.
-XGBoost            0.9561    0.
-LightGBM           0.
+kernelboost        0.9825    0.9984   0.9861   2.1s
+sklearn HGBC       0.9649    0.9948   0.9722   0.1s
+XGBoost            0.9561    0.9941   0.9650   0.0s
+LightGBM           0.9737    0.9921   0.9790   0.0s
 =================================================================
 ```
 
@@ -223,10 +222,10 @@ Kernel Methods Benchmark (n_train=10000)
 =================================================================
 Model              MSE      MAE      R²       Time
 -----------------------------------------------------------------
-kernelboost        0.
-KernelRidge        0.
-SVR                0.
-GP (n=5000)        0.
+kernelboost        0.2027   0.2936   0.8456   4.7s
+KernelRidge        0.4258   0.4828   0.6756   1.5s
+SVR                0.3133   0.3766   0.7613   3.3s
+GP (n=5000)        0.3300   0.4038   0.7485   29.8s
 =================================================================
 ```
 
@@ -237,10 +236,10 @@ Prediction intervals and conditional variance estimates compared to Gaussian Pro
 =================================================================
 Uncertainty Quantification (90% intervals, alpha=0.1)
 =================================================================
-Model              Coverage  Width
+Model              Coverage  Width   Var Corr  Var Ratio
 -----------------------------------------------------------------
-kernelboost
-GP (n=5000)
+kernelboost        91.4%     1.379   0.228     1.166
+GP (n=5000)        91.0%     1.832   0.156     1.062
 =================================================================
 ```
 
@@ -255,10 +254,10 @@ GPU vs CPU Training Time (California Housing, n=10000)
 =================================================================
 Backend              Time
 -----------------------------------------------------------------
-CPU (C/OpenMP)
-GPU (CuPy/CUDA)
+CPU (C/OpenMP)       48.2s
+GPU (CuPy/CUDA)      7.3s
 =================================================================
-GPU speedup:
+GPU speedup: 6.7x
 ```
 
 All benchmarks run on Ubuntu 22.04 with Ryzen 7700 and RTX 3090.
@@ -274,7 +273,7 @@ All benchmarks run on Ubuntu 22.04 with Ryzen 7700 and RTX 3090.
 
 ## About
 
-kernelboost is a hobby project exploring alternatives to tree-based gradient boosting.
+kernelboost is a hobby project exploring alternatives to tree-based gradient boosting. Pre-compiled binaries included for Linux and Windows. Contributions and feedback welcome.
 
 ## License
 
{kernelboost-0.2.1 → kernelboost-0.3.0}/kernelboost/booster.py

@@ -10,18 +10,19 @@ class KernelBooster:
     objective : Objective
         Loss function (e.g., MSEObjective(), EntropyObjective()).
     feature_selector : FeatureSelector, default=None
-        Feature selection strategy. If None and
+        Feature selection strategy. If None and feature_tree_tuple not provided,
         defaults to RandomSelector.
     feature_names : list, default=None
         Names for features. Uses indices if None.
-    feature_list : list, default=None
-        Explicit feature subsets per round. Takes priority over feature_selector.
     min_features : int, default=1
         Minimum features per round.
     max_features : int, default=None
         Maximum features per round. If None, uses min(10, n_features).
     n_estimators : int, default=None
         Number of boosting rounds. Auto-calculated from n_features if None.
+    feature_tree_tuple : tuple, default=None
+        Explicit feature subsets and tree type ('kernel' or 'constant') per round.
+        Takes priority over feature_selector.
     subsample_share : float, default=0.5
         Training sample share per round.
     lambda1 : float, default=0.0
@@ -66,10 +67,10 @@ class KernelBooster:
         objective,
         feature_selector: FeatureSelector = None,
         feature_names: list = None,
-        feature_list: list = None,
         min_features: int = 1,
         max_features: int = None,
         n_estimators: int = None,
+        feature_tree_tuple: tuple = None,
         subsample_share: float = 0.5,
         lambda1: float = 0.0,
         learning_rate: float = 0.5,
@@ -94,7 +95,7 @@ class KernelBooster:
         self.feature_selector = feature_selector
 
         self.feature_names = feature_names
-        self.feature_list = feature_list
+        self.feature_tree_tuple = feature_tree_tuple
         self.lambda1 = lambda1
         self.learning_rate = learning_rate
         self.n_estimators = n_estimators
@@ -163,6 +164,8 @@ class KernelBooster:
             raise ValueError(f"n_iter_no_change must be a positive integer or None, got {self.n_iter_no_change}")
         if not (0.0 <= self.overlap_epsilon < 0.5):
             raise ValueError(f"overlap_epsilon must be in [0.0, 0.5), got {self.overlap_epsilon}")
+        if self.feature_tree_tuple is not None and not isinstance(self.feature_tree_tuple, tuple):
+            raise ValueError(f"feature_tree_tuple: must be a tuple of (indices, tree_type) tuples")
 
     def _validate_data(self, X: np.ndarray, y: np.ndarray) -> None:
         """Validate training data."""
@@ -269,9 +272,6 @@ class KernelBooster:
 
         self._training_loop()
 
-        feature_tuples = (tuple(sublist) for sublist in self.fitted_features_)
-        self.rho_dict_ = dict(zip(feature_tuples, self.rho_))
-
         indices = self._last_n_active_tree_indices(1)
         self.last_active_tree_idx_ = indices[0] if indices else None
 
@@ -304,10 +304,10 @@ class KernelBooster:
         """Initialize training state."""
         self._sample_size = int(self.subsample_share * self.n_samples_)
 
-        # priority: explicit
-        if self.
-            self.n_estimators_ = len(self.
-            self.
+        # priority: explicit feature_tree_tuple > feature_selector > default random
+        if self.feature_tree_tuple is not None:
+            self.n_estimators_ = len(self.feature_tree_tuple)
+            self.feature_tree_tuple_ = self.feature_tree_tuple
             self._use_selector = False
         else:
             # default to RandomSelector if no Selector given
@@ -362,11 +362,11 @@ class KernelBooster:
         """Execute one boosting iteration."""
         pseudoresiduals = self.objective.gradient(self.y_, self.predictions_)
 
-        # get features for this round
+        # get features and leaf type for this round
         if self._use_selector:
-            feature_indices = self.feature_selector.get_features(round_idx, pseudoresiduals)
+            feature_indices, tree_type = self.feature_selector.get_features(round_idx, pseudoresiduals)
         else:
-            feature_indices = self.
+            feature_indices, tree_type = self.feature_tree_tuple_[round_idx]
 
         training_features = self.X_[:, feature_indices]
         all_data = np.concatenate((pseudoresiduals, training_features), axis=1)
@@ -384,6 +384,7 @@ class KernelBooster:
                 **self.tree_optimization,
                 use_gpu=self.use_gpu,
                 **self.kernel_optimization,
+                tree_type=tree_type,
             )
         )
         self.trees_[-1].fit(training_data[:, 1:], training_data[:, 0].reshape(-1, 1))
@@ -391,13 +392,14 @@ class KernelBooster:
         # store tree predictions for hyperparameter optimization
         self.tree_predictions_.append(self.trees_[-1].predict(training_features))
 
-
-
-
-
-
-
-
+        if tree_type == 'kernel':
+            precisions = [
+                est.precision_ for est, is_kern in zip(
+                    self.trees_[-1].compiled_.estimators, self.trees_[-1].compiled_.is_kernel
+                ) if is_kern
+            ]
+            if precisions:
+                self.last_precision_ = np.mean(precisions)
 
         self.rho_.append(
             self.objective.line_search(
@@ -742,7 +744,7 @@ class KernelBooster:
         return {
             'objective': self.objective,
             'feature_names': self.feature_names,
-            '
+            'feature_tree_tuple': self.feature_tree_tuple,
             'feature_selector': self.feature_selector,
             'max_depth': self.max_depth,
             'max_sample': self.max_sample,
@@ -782,6 +784,7 @@ class KernelBooster:
             'initial_precision': self.initial_precision,
             'sample_share': self.sample_share,
             'precision_method': self.precision_method,
+            'pilot_factor': self.pilot_factor,
         }
 
         self.tree_optimization = {
@@ -814,10 +817,10 @@ class KernelBooster:
     @property
     def feature_importances_(self) -> np.ndarray:
         """Feature importance based on aggregated |rho| values."""
-        if not hasattr(self, '
+        if not hasattr(self, 'trees_'):
             raise RuntimeError("Booster not fitted. Call fit() first.")
         importances = np.zeros(self.n_features_in_)
-        for feature_indices, rho in self.
+        for feature_indices, rho in zip(self.fitted_features_, self.rho_):
             for idx in feature_indices:
                 importances[idx] += abs(rho)
         total = importances.sum()
{kernelboost-0.2.1 → kernelboost-0.3.0}/kernelboost/feature_selection.py

@@ -1,6 +1,45 @@
 from abc import ABC, abstractmethod
 from collections.abc import Generator
+import ctypes
+import os
+import platform
+
 import numpy as np
+from numpy.ctypeslib import ndpointer
+
+# Load C library for fast MI computation
+_dir_path = os.path.dirname(os.path.realpath(__file__))
+
+_system = platform.system()
+if _system == "Linux":
+    _mi_libname = f"{_dir_path}/libmi.so"
+elif _system == "Windows":
+    _mi_libname = f"{_dir_path}/libmi.dll"
+elif _system == "Darwin":
+    _mi_libname = f"{_dir_path}/libmi.dylib"
+else:
+    raise Exception(f"Platform '{_system}' not supported.")
+
+try:
+    _mi_lib = ctypes.CDLL(_mi_libname)
+except OSError:
+    raise OSError(
+        f"Could not load C library at {_mi_libname}. "
+        f"Compile it with: gcc -shared -o {_mi_libname} -fPIC kernelboost/mi_bins.c "
+        f"-lm -fopenmp -O3 -march=native -ffast-math -funroll-loops -flto"
+    )
+
+_mi_lib.histogram_mi_batch.restype = None
+_mi_lib.histogram_mi_batch.argtypes = (
+    ndpointer(ctypes.c_float, flags="C_CONTIGUOUS"),  # X
+    ndpointer(ctypes.c_float, flags="C_CONTIGUOUS"),  # residuals
+    ctypes.c_int,  # n
+    ctypes.c_int,  # n_features
+    ndpointer(ctypes.c_float, flags="C_CONTIGUOUS"),  # x_thresholds
+    ndpointer(ctypes.c_float, flags="C_CONTIGUOUS"),  # y_thresholds
+    ctypes.c_int,  # n_thresh
+    ndpointer(ctypes.c_float, flags="C_CONTIGUOUS"),  # out_mi
+)
 
 
 class FeatureSelector(ABC):
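As an aside, the binding declared above can be exercised directly, which makes the expected shapes and dtypes concrete. This is an illustrative snippet only: it pokes at the module-private `_mi_lib` handle and assumes the compiled libmi library ships alongside the module, as the loader above requires; in normal use SmartSelector drives this call internally.

```python
import numpy as np
from kernelboost import feature_selection as fs   # importing triggers the library load above

rng = np.random.default_rng(0)
n, n_features, n_bins = 2000, 5, 20
n_thresh = n_bins + 1                              # thresholds are bin edges

X = rng.normal(size=(n, n_features)).astype(np.float32)
residuals = (X[:, 0] + 0.1 * rng.normal(size=n)).astype(np.float32)

# quantile bin edges, mirroring what SmartSelector.initialize() precomputes
quantiles = np.linspace(0, 1, n_thresh)
x_thresholds = np.ascontiguousarray(np.quantile(X, quantiles, axis=0).T, dtype=np.float32)
y_thresholds = np.quantile(residuals, quantiles).astype(np.float32)

out_mi = np.zeros(n_features, dtype=np.float32)
fs._mi_lib.histogram_mi_batch(
    np.ascontiguousarray(X), residuals, n, n_features,
    x_thresholds, y_thresholds, n_thresh, out_mi,
)
print(out_mi)   # feature 0, which drives the residuals, should get the largest MI
```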
@@ -45,9 +84,9 @@ class FeatureSelector(ABC):
         pass
 
     @abstractmethod
-    def get_features(self, round_idx: int, residuals: np.ndarray) -> list[int]:
+    def get_features(self, round_idx: int, residuals: np.ndarray) -> tuple[list[int], str]:
         """
-        Get feature indices for the next boosting round.
+        Get feature indices and tree type for the next boosting round.
 
         Args:
             round_idx : int
@@ -56,8 +95,8 @@ class FeatureSelector:
                 Current pseudo-residuals (n_samples,)
 
         Returns:
-            list[int]
-                Feature indices
+            tuple[list[int], str]
+                Feature indices and leaf type ('kernel' or 'constant')
         """
         pass
 
@@ -137,32 +176,37 @@ class RandomSelector(FeatureSelector):
         self._rng.shuffle(features)
         yield features[:max_size].tolist()
 
-    def get_features(self, round_idx: int, residuals: np.ndarray) -> list[int]:
+    def get_features(self, round_idx: int, residuals: np.ndarray) -> tuple[list[int], str]:
         selected = next(self._gen)
-        return self._complete_groups(selected)
+        return self._complete_groups(selected), "kernel"
 
 
 class SmartSelector(FeatureSelector):
     """
-    Feature selection using mRMR-style approach
-
-    Selects features probabilistically based on relevance, redundancy and recency.
+    Feature selection using mRMR-style approach with mutual information relevance.
+    Selects features probabilistically based on relevance, redundancy and recency.
     Kernel sizes progress from small to large.
 
     Args:
        redundancy_penalty : float, default=0.4
            Weight for redundancy penalty (0 = ignore, 1 = strong penalty)
-       relevance_alpha : float, default=0.
-           Balance between
-       recency_penalty : float, default=0.
+       relevance_alpha : float, default=0.7
+           Balance between MI relevance (1.0) and historical weight (0.0)
+       recency_penalty : float, default=0.35
            Penalty applied to recently-used features (0 = no penalty, 1 = strong)
        recency_decay : float, default=0.7
            Decay factor for recency penalty each round (0 = instant decay, 1 = no decay)
       temperature : float, default=0.3
-           Softmax temperature
-
+           Softmax temperature (minimum when using schedule). Higher means more exploration.
+       temperature_max : float | None, default=None
+           Starting temperature for schedule. None means no schedule (fixed temperature).
+           Decays linearly from temperature_max to temperature over all rounds.
+       weight_decay : float, default=0.95
            Decay factor for feature weights each round.
       feature_groups : list[list[int]] | None, default=None
           Groups of features that should be selected together.
+       constant_tree_frequency : int, default=25
+           Insert a constant-leaf tree every N rounds.
       seed : int, optional
          Random seed for reproducibility.
    """
@@ -170,12 +214,14 @@ class SmartSelector(FeatureSelector):
     def __init__(
         self,
         redundancy_penalty: float = 0.4,
-        relevance_alpha: float = 0.
-        recency_penalty: float = 0.
+        relevance_alpha: float = 0.7,
+        recency_penalty: float = 0.35,
         recency_decay: float = 0.7,
         temperature: float = 0.3,
-
+        temperature_max: float | None = None,
+        weight_decay: float = 0.95,
         feature_groups: list[list[int]] | None = None,
+        constant_tree_frequency: int = 25,
         seed: int | None = None,
     ):
         super().__init__()
@@ -184,10 +230,13 @@ class SmartSelector(FeatureSelector):
         self.recency_penalty = recency_penalty
         self.recency_decay = recency_decay
         self.temperature = temperature
+        self.temperature_max = temperature_max
         self.weight_decay = weight_decay
         self.feature_groups = feature_groups
         self.seed = seed if seed is not None else np.random.randint(0, 2**31)
 
+        self.constant_frequency = constant_tree_frequency
+
     def initialize(
         self,
         X: np.ndarray,
@@ -197,12 +246,22 @@ class SmartSelector(FeatureSelector):
         rounds: int,
     ) -> int:
         self.n_features = n_features
-
-        self.
-        self.
+        self.X_ = X.view()
+        self.n_bins_ = max(10, int(np.sqrt(X.shape[0] / 5)))
+        self.rounds_ = rounds
+        self.schedule_rounds_ = max(1, rounds) if self.temperature_max is not None else None
 
         self.corr_matrix_ = np.corrcoef(X, rowvar=False)
         self.corr_matrix_ = np.nan_to_num(self.corr_matrix_, nan=0.0)
+
+        # precompute quantile thresholds for each feature (fixed count for C compatibility)
+        self.quantiles = np.linspace(0, 1, self.n_bins_ + 1)
+        self.n_thresh_ = self.n_bins_ + 1
+        self.x_thresholds_ = np.array([
+            np.quantile(X[:, f], self.quantiles)
+            for f in range(n_features)
+        ], dtype=np.float32)
+
         self.feature_weights_ = np.zeros(n_features)
         self.recency_scores_ = np.zeros(n_features)
         self._rng = np.random.default_rng(self.seed)
@@ -229,10 +288,16 @@ class SmartSelector(FeatureSelector):
             yield max_size
 
     def get_features(self, round_idx: int, residuals: np.ndarray) -> list[int]:
-
-
-
-
+        if round_idx > 0 and round_idx % self.constant_frequency == 0:
+            tree_type = "constant"
+            selected = list(range(self.n_features))
+        else:
+            tree_type = "kernel"
+            n_features = next(self._size_gen)
+            relevance = self._compute_relevance(residuals)
+            selected = self._select_features(n_features, relevance, round_idx)
+
+        return self._complete_groups(selected), tree_type
 
     def update(self, feature_indices: list[int], gain: float) -> None:
         self.recency_scores_ *= self.recency_decay
@@ -246,32 +311,43 @@ class SmartSelector(FeatureSelector):
             self.feature_weights_[idx] += weight_increment
 
     def _compute_relevance(self, pseudoresiduals: np.ndarray) -> np.ndarray:
-        """Compute relevance scores from
-
-
-
-
-
-
-
-
-
+        """Compute relevance scores from MI with residuals, history, and recency."""
+        residuals = np.ascontiguousarray(pseudoresiduals.ravel(), dtype=np.float32)
+        y_thresholds = np.quantile(residuals, self.quantiles).astype(np.float32)
+
+        raw_scores = np.zeros(self.n_features, dtype=np.float32)
+        _mi_lib.histogram_mi_batch(
+            np.ascontiguousarray(self.X_, dtype=np.float32),
+            residuals,
+            self.X_.shape[0],
+            self.n_features,
+            np.ascontiguousarray(self.x_thresholds_),
+            np.ascontiguousarray(y_thresholds),
+            self.n_thresh_,
+            raw_scores,
+        )
 
-        # normalize to
-
+        # normalize to [0, 1]
+        scores_norm = raw_scores / (raw_scores.max() + 1e-10)
         weights_norm = self.feature_weights_ / (self.feature_weights_.max() + 1e-10)
 
         relevance = (
-            self.relevance_alpha *
-            (1 - self.relevance_alpha) * weights_norm -
+            self.relevance_alpha * scores_norm +
+            (1 - self.relevance_alpha) * weights_norm -
            self.recency_penalty * self.recency_scores_
        )
-
+        return np.maximum(relevance, 0.0)
 
-
+    def _get_temperature(self, round_idx: int) -> float:
+        """Compute temperature for the current round."""
+        if self.schedule_rounds_ is None:
+            return self.temperature
+        progress = min(round_idx / self.schedule_rounds_, 1.0)
+        return self.temperature_max + (self.temperature - self.temperature_max) * progress
 
-    def _select_features(self, k: int, relevance: np.ndarray) -> list[int]:
+    def _select_features(self, k: int, relevance: np.ndarray, round_idx: int) -> list[int]:
         """Select k features probabilistically using relevance and redundancy."""
+        temp = self._get_temperature(round_idx)
         selected = []
         available = list(range(self.n_features))
 
@@ -280,7 +356,6 @@ class SmartSelector(FeatureSelector):
                 break
             scores = np.zeros(len(available))
             for i, j in enumerate(available):
-                # redundancy with already selected
                 if selected:
                     redundancy = np.mean([
                         abs(self.corr_matrix_[j, s]) for s in selected
@@ -290,12 +365,10 @@ class SmartSelector(FeatureSelector):
 
                scores[i] = relevance[j] - self.redundancy_penalty * redundancy
 
-
-
-            exp_scores = np.exp(scaled - scaled.max())  # subtract max for numerical stability
+            scaled = scores / temp
+            exp_scores = np.exp(scaled - scaled.max())
             probs = exp_scores / exp_scores.sum()
 
-            # select based on probabilities
            idx = self._rng.choice(len(available), p=probs)
            feat = available[idx]
 
{kernelboost-0.2.1 → kernelboost-0.3.0}/kernelboost/kernels.c

@@ -71,8 +71,8 @@ float loo_mse(
     int n = training_obs;
     size_t tri_size = (size_t)n * (n + 1) / 2;
 
-    // allocate upper triangle storage
-    float *upper =
+    // allocate upper triangle storage, check null
+    float *upper = malloc(tri_size * sizeof(float));
     if (!upper) return -1.0f;
 
     // first pass: compute upper triangle kernel values
kernelboost-0.3.0/kernelboost/libmi.dll: Binary file (new)
kernelboost-0.3.0/kernelboost/libmi.so: Binary file (new)
kernelboost-0.3.0/kernelboost/mi_bins.c (new file)

@@ -0,0 +1,72 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <math.h>
+#include <omp.h>
+
+// C code for fast MI estimation over the whole feature set
+
+static int find_bin(float *thresholds, int n_thresh, float val) {
+    int lo = 0, hi = n_thresh - 2;
+    while (lo <= hi) {
+        int mid = (lo + hi) / 2;
+        if (val < thresholds[mid])
+            hi = mid - 1;
+        else
+            lo = mid + 1;
+    }
+    int bin = lo - 1;
+    if (bin < 0) bin = 0;
+    if (bin > n_thresh - 2) bin = n_thresh - 2;
+    return bin;
+}
+
+void histogram_mi_batch(
+    float *X,            /* (n, n_features), row-major */
+    float *residuals,    /* (n,) */
+    int n,
+    int n_features,
+    float *x_thresholds, /* (n_features, n_thresh) */
+    float *y_thresholds, /* (n_thresh,) */
+    int n_thresh,        /* n_bins + 1 */
+    float *out_mi        /* (n_features,) output */ ) {
+
+    int n_bins = n_thresh - 1;
+    size_t binsize = (size_t) n_bins * n_bins;
+
+    #pragma omp parallel for schedule(dynamic)
+    for (int f=0; f < n_features; f++) {
+        float *hist = calloc(binsize, sizeof(float));
+        for (int i=0; i < n; i++) {
+            int xi = find_bin(x_thresholds + f * n_thresh, n_thresh, X[i * n_features + f]);
+            int yi = find_bin(y_thresholds, n_thresh, residuals[i]);
+            hist[xi * n_bins + yi] += 1;
+        }
+
+        // convert to probabilities, compute marginals
+        double inv_n = 1.0 / n;
+        float *pxy = calloc(binsize, sizeof(float));
+        float *px = calloc(n_bins, sizeof(float));
+        float *py = calloc(n_bins, sizeof(float));
+
+        for (int x_index=0; x_index < n_bins; x_index++){
+            for (int y_index=0; y_index < n_bins; y_index++){
+                float probability = hist[x_index * n_bins + y_index] * inv_n;
+                pxy[x_index * n_bins + y_index] = probability;
+                px[x_index] += probability;
+                py[y_index] += probability;
+            }
+        }
+
+        // MI
+        double mi = 0.0;
+        for (int x_index=0; x_index < n_bins; x_index++){
+            for (int y_index=0; y_index < n_bins; y_index++){
+                if (pxy[x_index * n_bins + y_index] > 0 && px[x_index] * py[y_index] > 0)
+                    mi += (pxy[x_index * n_bins + y_index] *
+                           log(pxy[x_index * n_bins + y_index] / (px[x_index] * py[y_index])));
+            }
+        }
+        out_mi[f] = fmax(0, mi);
+        free(hist); free(pxy); free(px); free(py);
+    }
+}
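The file above is the whole of the new MI backend. As a cross-check on what it computes, a single per-feature pass is equivalent to the plain-NumPy histogram mutual information below. This is a rough, unoptimized rendering written for readability; `histogram_mi` is a name made up here and is not part of the package.

```python
import numpy as np

def histogram_mi(x, residuals, x_thresholds, y_thresholds):
    """Plain-NumPy sketch of one iteration of the C loop above (illustrative only)."""
    n_bins = len(y_thresholds) - 1
    # np.digitize with the interior edges plays the role of find_bin()
    xi = np.clip(np.digitize(x, x_thresholds[1:-1]), 0, n_bins - 1)
    yi = np.clip(np.digitize(residuals, y_thresholds[1:-1]), 0, n_bins - 1)

    # joint histogram -> joint probabilities and marginals
    pxy = np.zeros((n_bins, n_bins))
    np.add.at(pxy, (xi, yi), 1.0)
    pxy /= len(x)
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)

    # I(X;Y) = sum pxy * log(pxy / (px * py)) over non-empty cells
    outer = px * py                   # broadcasts to (n_bins, n_bins)
    mask = pxy > 0                    # cells with pxy > 0 also have positive marginals
    mi = np.sum(pxy[mask] * np.log(pxy[mask] / outer[mask]))
    return max(mi, 0.0)
```

The OpenMP pragma in the C version simply runs this per-feature computation for all features in parallel, which is what makes scoring every feature on every boosting round affordable.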
{kernelboost-0.2.1 → kernelboost-0.3.0}/kernelboost/rho_optimizer.py

@@ -371,8 +371,6 @@ class RhoOptimizer:
 
         self.booster_.rho_ = list(self.rho_)
 
-        feature_tuples = (tuple(sublist) for sublist in self.booster_.fitted_features_)
-        self.booster_.rho_dict_ = dict(zip(feature_tuples, self.booster_.rho_))
 
         if self.lambda1_ is not None:
             self.booster_.lambda1 = self.lambda1_
{kernelboost-0.2.1 → kernelboost-0.3.0}/kernelboost/tree.py

@@ -87,9 +87,8 @@ class CompiledTree:
 
 class KernelTree:
     """
-    Decision tree that splits
-
-    whether numerical features present.
+    Decision tree that splits either by density (for kernel leaves) or by MSE gain
+    (categorical features or constant leaves).
 
     Args:
         min_sample : int, default=500
@@ -120,6 +119,13 @@ class KernelTree:
            Precision selection method: 'search' (LOO-CV) or 'silverman'.
        pilot_factor : float, default=3.0
            Multiplier for pilot precision bounds: search range is [p/factor, p*factor].
+        tree_type : str, default='kernel'
+            Leaf node type: 'kernel' for kernel estimation or 'constant' for constant leaves.
+        gain_threshold : float, default=1e-3
+            Minimum MSE gain required for a split in constant leaf mode.
+        quantiles : list, optional
+            Split candidate quantiles for constant leaf mode.
+            If None, defaults to linspace(0.01, 0.99, 99).
    """
 
    def __init__(
@@ -130,13 +136,16 @@ class KernelTree:
         feature_types: dict = None,
         overlap_epsilon: float = 0.05,
         use_gpu: bool = False,
-        kernel_type: str = '
+        kernel_type: str = 'laplace',
         search_rounds: int = 20,
         bounds: tuple = (0.10, 35.0),
         initial_precision: float = 0.0,
         sample_share: float = 1.0,
         precision_method: str = 'pilot-cv',
         pilot_factor: float = 3.0,
+        tree_type: str = 'kernel',
+        gain_threshold: float = 1e-3,
+        quantiles: list = None,
     ):
 
         self.min_sample = min_sample
@@ -152,6 +161,13 @@ class KernelTree:
         self.sample_share = sample_share
         self.precision_method = precision_method
         self.pilot_factor = pilot_factor
+        self.tree_type = tree_type
+        self.gain_threshold = gain_threshold
+
+        if quantiles is None:
+            self.quantiles = np.linspace(0.01, 0.99, 99)
+        else:
+            self.quantiles = quantiles
 
         self.kernel_optimization = {
             'kernel_type': kernel_type,
@@ -163,6 +179,10 @@ class KernelTree:
             'pilot_factor': pilot_factor,
         }
 
+        # min sample decreased, depth increase for non-kernel trees
+        self._const_min_sample = max(50, self.min_sample // 5)
+        self._const_max_depth = self.max_depth + 3
+
         self._validate_params()
 
     def _validate_params(self):
@@ -180,31 +200,38 @@ class KernelTree:
             raise ValueError(f"feature_types values must be 'C' or 'N', got invalid keys: {invalid}")
         if not (0.0 <= self.overlap_epsilon < 0.5):
             raise ValueError("overlap_epsilon must be in [0.0, 0.5)")
+        if self.tree_type not in {"kernel", "constant"}:
+            raise ValueError(f"tree_type must be 'kernel' or 'constant', got '{self.tree_type}'")
+        if self.gain_threshold < 0:
+            raise ValueError(f"gain_threshold must be >= 0, got {self.gain_threshold}")
 
     def fit(self, X: np.ndarray, y: np.ndarray) -> "KernelTree":
         """Fit the tree to training data."""
         self.X_ = X.astype(np.float32)
         self.y_ = y.astype(np.float32).ravel()
         self.n_samples_, self.n_features_ = X.shape
-
+
         self._detect_types()
+        if not self.numerical_:
+            self.tree_type = "constant"
+
+        if self.tree_type == "constant":
+            self.categorical_ = []
+            self.numerical_ = list(range(self.n_features_))
+            # uses indices rather than values:
+            sorted_by_feat = [np.argsort(self.X_[:, f]) for f in range(self.n_features_)]
+            self.root_ = self._grow_constant(sorted_by_feat)
+        else:
+            self.feature_ranges_ = self.X_.max(axis=0) - self.X_.min(axis=0)
+            self.root_ = self._grow_numerical(self.X_, self.y_)
+            if self.categorical_:
+                self.root_ = self._expand_categorical(self.root_, self.X_, self.y_)
 
-        self.root_ = self._grow_numerical(self.X_, self.y_)
-        if self.categorical_:
-            self.root_ = self._expand_categorical(self.root_, self.X_, self.y_)
         self.compiled_ = self._compile(self.root_)
         self.depth_ = self._compute_depth(self.root_)
+        del self.X_, self.y_
         return self
 
-    def _compute_depth(self, node: Node, current: int = 0) -> int:
-        """Compute the maximum depth of the tree."""
-        if isinstance(node, Leaf):
-            return current
-        return max(
-            self._compute_depth(node.left, current + 1),
-            self._compute_depth(node.right, current + 1)
-        )
-
     def _detect_types(self) -> None:
         """Classify features as categorical or numerical."""
         self.categorical_, self.numerical_ = [], []
@@ -225,7 +252,6 @@ class KernelTree:
                 self.categorical_.append(i)
             else:
                 self.numerical_.append(i)
-        self._use_kernel = len(self.numerical_) > 0
 
     def _grow_numerical(self, X: np.ndarray, y: np.ndarray) -> Node:
         """Grow a tree on numerical features."""
@@ -297,7 +323,7 @@ class KernelTree:
             return self._make_leaf(X, y)
 
         feat, thresh, gain = split
-        if gain <
+        if gain < self.gain_threshold:
             return self._make_leaf(X, y)
 
         left_mask = X[:, feat] <= thresh
@@ -312,7 +338,7 @@ class KernelTree:
         best_mse, best_feat, best_thresh = base_mse, None, None
 
         for f in self.categorical_:
-            values = np.unique(X[:, f])
+            values = np.unique(X[:, f])
             for thresh in values[:-1]:
                 left_mask = X[:, f] <= thresh
                 n_left, n_right = left_mask.sum(), (~left_mask).sum()
@@ -329,17 +355,115 @@ class KernelTree:
             return None
 
         return best_feat, best_thresh, base_mse - best_mse
-
+
     def _make_leaf(self, X: np.ndarray, y: np.ndarray) -> Leaf:
         """Create leaf with kernel estimator or mean constant."""
         n = len(y)
-        if
+        if n <= self.max_sample:
             X_num = np.delete(X, self.categorical_, axis=1) if self.categorical_ else X
             k = KernelEstimator(use_gpu=self.use_gpu, **self.kernel_optimization)
             k.fit(X_num, y)
             return Leaf(k)
         return Leaf(float(np.mean(y)))
 
+    def _grow_constant(self, sorted_by_feat: list[np.ndarray], depth: int = 0) -> Node:
+        """Grow a tree with constant leaves."""
+        n = len(sorted_by_feat[0])
+        samples = sorted_by_feat[0]
+
+        if depth >= self._const_max_depth or n < 1.5 * self._const_min_sample:
+            return Leaf(float(np.mean(self.y_[samples])))
+
+        split = self._find_constant_split(sorted_by_feat, n)
+        if split is None:
+            return Leaf(float(np.mean(self.y_[samples])))
+
+        feat, thresh, gain = split
+        if gain < self.gain_threshold:
+            return Leaf(float(np.mean(self.y_[samples])))
+
+        left_sorted, right_sorted = [], []
+        for f in range(self.n_features_):
+            left_mask = self.X_[sorted_by_feat[f], feat] <= thresh
+            left_sorted.append(sorted_by_feat[f][left_mask])
+            right_sorted.append(sorted_by_feat[f][~left_mask])
+
+        return Branch(feat, thresh,
+                      self._grow_constant(left_sorted, depth + 1),
+                      self._grow_constant(right_sorted, depth + 1))
+
+    def _find_constant_split(
+        self,
+        sorted_by_feat: list[np.ndarray],
+        n: int) -> tuple[int, float, float] | None:
+        """Find best split by MSE reduction across all features."""
+        base_mse = np.var(self.y_[sorted_by_feat[0]])
+        best_mse, best_feat, best_thresh = base_mse, None, None
+        min_s = self._const_min_sample
+
+        for f in range(self.n_features_):
+            idx = sorted_by_feat[f]
+            col_sorted = self.X_[idx, f]
+            y_sorted = self.y_[idx].astype(np.float64)
+
+            cum_sum = np.cumsum(y_sorted)
+            cum_sq = np.cumsum(y_sorted ** 2)
+            total_sum = cum_sum[-1]
+            total_sq = cum_sq[-1]
+
+            values = np.quantile(col_sorted, self.quantiles)
+            values = values[:-1]  # skip last candidate
+            positions = np.searchsorted(col_sorted, values, side='right')
+
+            # filter valid splits
+            valid = (positions >= min_s) & (positions <= n - min_s)
+            if not np.any(valid):
+                continue
+
+            pos = positions[valid]
+            thresholds = values[valid]
+
+            # skip duplicates — same partition, same MSE
+            unique_mask = np.empty(len(pos), dtype=bool)
+            unique_mask[0] = True
+            unique_mask[1:] = np.diff(pos) != 0
+            pos = pos[unique_mask]
+            thresholds = thresholds[unique_mask]
+
+            # vectorized MSE via cumulative sums
+            left_n = pos.astype(np.float64)
+            left_sum = cum_sum[pos - 1]
+            left_sq = cum_sq[pos - 1]
+            right_n = n - left_n
+            right_sum = total_sum - left_sum
+            right_sq = total_sq - left_sq
+
+            # guards against negatives
+            left_var = np.maximum(0.0, left_sq / left_n - (left_sum / left_n) ** 2)
+            right_var = np.maximum(0.0, right_sq / right_n - (right_sum / right_n) ** 2)
+
+            mse = (left_n * left_var + right_n * right_var) / n
+
+            idx_best = np.argmin(mse)
+            if mse[idx_best] < best_mse:
+                best_mse = mse[idx_best]
+                best_feat = f
+                best_thresh = float(thresholds[idx_best])
+
+        if best_feat is None:
+            return None
+
+        return best_feat, best_thresh, base_mse - best_mse
+
+    def _compute_depth(self, node: Node, current: int = 0) -> int:
+        """Compute the maximum depth of the tree."""
+        if isinstance(node, Leaf):
+            return current
+        return max(
+            self._compute_depth(node.left, current + 1),
+            self._compute_depth(node.right, current + 1)
+        )
+
     def _compile(self, root: Node) -> CompiledTree:
         """Convert nested tree structure to flat thresholds for prediction
         and create overlap for training data over the thresholds."""
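The core of `_find_constant_split` above is the classic prefix-sum trick: once `y` is ordered along a feature, the left/right variances of every candidate split can be read off cumulative sums, so all thresholds are scored in one vectorized pass instead of a Python loop. A stripped-down, hypothetical one-feature version (no min-sample filtering or duplicate pruning, and not code from the package) looks like this:

```python
import numpy as np

def best_split_1d(y_sorted, candidate_positions):
    """Toy illustration of the cumulative-sum split scoring used above (not package code)."""
    n = len(y_sorted)
    cum_sum = np.cumsum(y_sorted)
    cum_sq = np.cumsum(y_sorted ** 2)

    pos = np.asarray(candidate_positions)        # number of samples falling to the left
    left_n = pos.astype(np.float64)
    right_n = n - left_n
    left_sum, left_sq = cum_sum[pos - 1], cum_sq[pos - 1]
    right_sum, right_sq = cum_sum[-1] - left_sum, cum_sq[-1] - left_sq

    # var = E[y^2] - E[y]^2 on each side, then sample-weighted average
    left_var = left_sq / left_n - (left_sum / left_n) ** 2
    right_var = right_sq / right_n - (right_sum / right_n) ** 2
    weighted_mse = (left_n * left_var + right_n * right_var) / n

    best = int(np.argmin(weighted_mse))
    return pos[best], float(np.var(y_sorted) - weighted_mse[best])   # split position, MSE gain

# example: a mean shift halfway through should be found near position 500
y = np.concatenate([np.zeros(500), np.ones(500)]) + 0.01 * np.random.default_rng(0).normal(size=1000)
print(best_split_1d(y, np.arange(50, 950)))
```

In the package the same computation runs per feature over quantile-based candidate thresholds, with `_const_min_sample` filtering and duplicate-position pruning layered on top.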
@@ -428,6 +552,10 @@ class KernelTree:
             'initial_precision': self.initial_precision,
             'sample_share': self.sample_share,
             'precision_method': self.precision_method,
+            'pilot_factor': self.pilot_factor,
+            'tree_type': self.tree_type,
+            'gain_threshold': self.gain_threshold,
+            'quantiles': self.quantiles,
         }
 
     def set_params(self, **params) -> "KernelTree":
@@ -445,8 +573,15 @@ class KernelTree:
             'initial_precision': self.initial_precision,
             'sample_share': self.sample_share,
             'precision_method': self.precision_method,
+            'pilot_factor': self.pilot_factor,
         }
 
+        self._const_min_sample = max(50, self.min_sample // 5)
+        self._const_max_depth = self.max_depth + 3
+
         self._validate_params()
 
         return self
+
+
+
{kernelboost-0.2.1 → kernelboost-0.3.0/kernelboost.egg-info}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: kernelboost
-Version: 0.2.1
+Version: 0.3.0
 Summary: Gradient boosting with kernel regression base learners
 Author-email: tlaiho <tslaiho@gmail.com>
 License: MIT
@@ -29,7 +29,7 @@ Requires-Dist: cupy>=11.0.0; extra == "all"
 [image]
 [image]
 [image]
-[image]
 
 KernelBoost is a gradient boosting algorithm that uses Nadaraya-Watson (local constant) kernel estimators as base learners instead of decision trees. It has:
 
{kernelboost-0.2.1 → kernelboost-0.3.0}/kernelboost.egg-info/SOURCES.txt

@@ -15,6 +15,9 @@ kernelboost/kernels.c
 kernelboost/kernels.cu
 kernelboost/libkernels.dll
 kernelboost/libkernels.so
+kernelboost/libmi.dll
+kernelboost/libmi.so
+kernelboost/mi_bins.c
 kernelboost/multiclassbooster.py
 kernelboost/objectives.py
 kernelboost/optimizer.py