warpgbm 0.1.22__tar.gz → 0.1.24__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (25)
  1. {warpgbm-0.1.22/warpgbm.egg-info → warpgbm-0.1.24}/PKG-INFO +25 -3
  2. {warpgbm-0.1.22 → warpgbm-0.1.24}/README.md +24 -2
  3. {warpgbm-0.1.22 → warpgbm-0.1.24}/pyproject.toml +1 -1
  4. warpgbm-0.1.24/tests/numerai_test.py +62 -0
  5. {warpgbm-0.1.22 → warpgbm-0.1.24}/tests/test_fit_predict_corr.py +9 -1
  6. warpgbm-0.1.24/version.txt +1 -0
  7. {warpgbm-0.1.22 → warpgbm-0.1.24}/warpgbm/core.py +207 -35
  8. {warpgbm-0.1.22 → warpgbm-0.1.24}/warpgbm/cuda/best_split_kernel.cu +1 -1
  9. {warpgbm-0.1.22 → warpgbm-0.1.24/warpgbm.egg-info}/PKG-INFO +25 -3
  10. {warpgbm-0.1.22 → warpgbm-0.1.24}/warpgbm.egg-info/SOURCES.txt +1 -0
  11. warpgbm-0.1.22/version.txt +0 -1
  12. {warpgbm-0.1.22 → warpgbm-0.1.24}/LICENSE +0 -0
  13. {warpgbm-0.1.22 → warpgbm-0.1.24}/MANIFEST.in +0 -0
  14. {warpgbm-0.1.22 → warpgbm-0.1.24}/setup.cfg +0 -0
  15. {warpgbm-0.1.22 → warpgbm-0.1.24}/setup.py +0 -0
  16. {warpgbm-0.1.22 → warpgbm-0.1.24}/tests/__init__.py +0 -0
  17. {warpgbm-0.1.22 → warpgbm-0.1.24}/warpgbm/__init__.py +0 -0
  18. {warpgbm-0.1.22 → warpgbm-0.1.24}/warpgbm/cuda/__init__.py +0 -0
  19. {warpgbm-0.1.22 → warpgbm-0.1.24}/warpgbm/cuda/binner.cu +0 -0
  20. {warpgbm-0.1.22 → warpgbm-0.1.24}/warpgbm/cuda/histogram_kernel.cu +0 -0
  21. {warpgbm-0.1.22 → warpgbm-0.1.24}/warpgbm/cuda/node_kernel.cpp +0 -0
  22. {warpgbm-0.1.22 → warpgbm-0.1.24}/warpgbm/cuda/predict.cu +0 -0
  23. {warpgbm-0.1.22 → warpgbm-0.1.24}/warpgbm.egg-info/dependency_links.txt +0 -0
  24. {warpgbm-0.1.22 → warpgbm-0.1.24}/warpgbm.egg-info/requires.txt +0 -0
  25. {warpgbm-0.1.22 → warpgbm-0.1.24}/warpgbm.egg-info/top_level.txt +0 -0

{warpgbm-0.1.22/warpgbm.egg-info → warpgbm-0.1.24}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: warpgbm
-Version: 0.1.22
+Version: 0.1.24
 Summary: A fast GPU-accelerated Gradient Boosted Decision Tree library with PyTorch + CUDA
 License: GNU GENERAL PUBLIC LICENSE
                        Version 3, 29 June 2007
@@ -879,8 +879,26 @@ No installation required — just press **"Open in Playground"**, then **Run All
 - `L2_reg`: L2 regularizer (default: 1e-6)
 
 ### Methods:
-- `.fit(X, y, era_id=None)`: Train the model. `X` can be raw floats or pre-binned `int8` data. `era_id` is optional and used internally.
-- `.predict(X)`: Predict on new data, using parallelized CUDA kernel.
+```
+.fit(
+    X,                           # numpy array (float or int) 2 dimensions (num_samples, num_features)
+    y,                           # numpy array (float or int) 1 dimension (num_samples)
+    era_id=None,                 # numpy array (int) 1 dimension (num_samples)
+    X_eval=None,                 # numpy array (float or int) 2 dimensions (eval_num_samples, num_features)
+    y_eval=None,                 # numpy array (float or int) 1 dimension (eval_num_samples)
+    eval_every_n_trees=None,     # const (int) >= 1
+    early_stopping_rounds=None,  # const (int) >= 1
+)
+```
+Train with optional validation set and early stopping.
+
+
+```
+.predict(
+    X  # numpy array (float or int) 2 dimensions (predict_num_samples, num_features)
+)
+```
+Predict on new data, using parallelized CUDA kernel.
 
 ---
 
@@ -896,3 +914,7 @@ WarpGBM builds on the shoulders of PyTorch, scikit-learn, LightGBM, and the CUDA
 
 - Vectorized predict function replaced with CUDA kernel (`warpgbm/cuda/predict.cu`), parallelizing per sample, per tree.
 
+### v0.1.23
+
+- Adjust gain in split kernel and added support for an eval set with early stopping based on MSE.
+

{warpgbm-0.1.22 → warpgbm-0.1.24}/README.md

@@ -191,8 +191,26 @@ No installation required — just press **"Open in Playground"**, then **Run All
 - `L2_reg`: L2 regularizer (default: 1e-6)
 
 ### Methods:
-- `.fit(X, y, era_id=None)`: Train the model. `X` can be raw floats or pre-binned `int8` data. `era_id` is optional and used internally.
-- `.predict(X)`: Predict on new data, using parallelized CUDA kernel.
+```
+.fit(
+    X,                           # numpy array (float or int) 2 dimensions (num_samples, num_features)
+    y,                           # numpy array (float or int) 1 dimension (num_samples)
+    era_id=None,                 # numpy array (int) 1 dimension (num_samples)
+    X_eval=None,                 # numpy array (float or int) 2 dimensions (eval_num_samples, num_features)
+    y_eval=None,                 # numpy array (float or int) 1 dimension (eval_num_samples)
+    eval_every_n_trees=None,     # const (int) >= 1
+    early_stopping_rounds=None,  # const (int) >= 1
+)
+```
+Train with optional validation set and early stopping.
+
+
+```
+.predict(
+    X  # numpy array (float or int) 2 dimensions (predict_num_samples, num_features)
+)
+```
+Predict on new data, using parallelized CUDA kernel.
 
 ---
 
@@ -208,3 +226,7 @@ WarpGBM builds on the shoulders of PyTorch, scikit-learn, LightGBM, and the CUDA
 
 - Vectorized predict function replaced with CUDA kernel (`warpgbm/cuda/predict.cu`), parallelizing per sample, per tree.
 
+### v0.1.23
+
+- Adjust gain in split kernel and added support for an eval set with early stopping based on MSE.
+
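
For orientation, here is a minimal usage sketch of the new `fit`/`predict` signature documented in the README hunk above. The synthetic data, hyperparameter values, and stopping settings are illustrative assumptions, not taken from the package:

```python
# Hypothetical example of the eval-set / early-stopping API added in v0.1.23.
# Data and hyperparameter values are assumptions for illustration only.
import numpy as np
from warpgbm import WarpGBM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(10_000, 50)).astype(np.float32)
y_train = rng.normal(size=10_000).astype(np.float32)
X_valid = rng.normal(size=(2_000, 50)).astype(np.float32)
y_valid = rng.normal(size=2_000).astype(np.float32)

model = WarpGBM(
    max_depth=6,
    num_bins=32,
    n_estimators=200,
    learning_rate=0.1,
    threads_per_block=64,
    rows_per_thread=4,
)

model.fit(
    X_train,
    y_train,
    X_eval=X_valid,              # held-out rows, binned with the training bin edges
    y_eval=y_valid,
    eval_every_n_trees=10,       # report train/eval MSE every 10 trees
    early_stopping_rounds=3,     # stop once eval MSE is worse than 3 evaluations ago
)

preds = model.predict(X_valid)   # numpy array of predictions
```

With these settings, `compute_eval` prints train and eval MSE every tenth tree and sets the stop flag once the eval MSE stops improving, so training can end before all `n_estimators` trees are built.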

{warpgbm-0.1.22 → warpgbm-0.1.24}/pyproject.toml

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "warpgbm"
-version = "0.1.22"
+version = "0.1.24"
 description = "A fast GPU-accelerated Gradient Boosted Decision Tree library with PyTorch + CUDA"
 readme = "README.md"
 requires-python = ">=3.8"

warpgbm-0.1.24/tests/numerai_test.py

@@ -0,0 +1,62 @@
+from numerapi import NumerAPI
+import pandas as pd
+import numpy as np
+from warpgbm import WarpGBM
+import time
+from sklearn.metrics import mean_squared_error
+
+
+def predict_in_chunks(model, X, chunk_size=100_000):
+    preds = []
+    for i in range(0, X.shape[0], chunk_size):
+        X_chunk = X[i : i + chunk_size]
+        preds.append(model.predict(X_chunk))
+    return np.concatenate(preds)
+
+
+def test_numerai_data():
+    napi = NumerAPI()
+    napi.download_dataset("v5.0/train.parquet", "numerai_train.parquet")
+
+    data = pd.read_parquet("numerai_train.parquet")
+    features = [f for f in list(data) if "feature" in f][:1000]
+    target = "target"
+
+    X = data[features].astype("int8").values[:]
+    y = data[target].values
+
+    model = WarpGBM(
+        max_depth=10,
+        num_bins=5,
+        n_estimators=100,
+        learning_rate=1,
+        threads_per_block=64,
+        rows_per_thread=4,
+        colsample_bytree=0.8,
+    )
+
+    start_fit = time.time()
+    model.fit(
+        X,
+        y,
+        # era_id=era,
+        # X_eval=X,
+        # y_eval=y,
+        # eval_every_n_trees=10,
+        # early_stopping_rounds=1,
+    )
+    fit_time = time.time() - start_fit
+    print(f" Fit time: {fit_time:.3f} seconds")
+
+    start_pred = time.time()
+    preds = predict_in_chunks(model, X, chunk_size=500_000)
+    pred_time = time.time() - start_pred
+    print(f" Predict time: {pred_time:.3f} seconds")
+
+    corr = np.corrcoef(preds, y)[0, 1]
+    mse = mean_squared_error(preds, y)
+    print(f" Correlation: {corr:.4f}")
+    print(f" MSE: {mse:.4f}")
+
+    assert corr > 0.68, f"In-sample correlation too low: {corr}"
+    assert mse < 0.03, f"In-sample mse too high: {mse}"

{warpgbm-0.1.22 → warpgbm-0.1.24}/tests/test_fit_predict_corr.py

@@ -29,7 +29,15 @@ def test_fit_predictpytee_correlation():
     )
 
     start_fit = time.time()
-    model.fit(X, y, era_id=era)
+    model.fit(
+        X,
+        y,
+        era_id=era,
+        X_eval=X,
+        y_eval=y,
+        eval_every_n_trees=10,
+        early_stopping_rounds=1,
+    )
     fit_time = time.time() - start_fit
     print(f" Fit time: {fit_time:.3f} seconds")
 

warpgbm-0.1.24/version.txt

@@ -0,0 +1 @@
+0.1.24

{warpgbm-0.1.22 → warpgbm-0.1.24}/warpgbm/core.py

@@ -5,6 +5,7 @@ from warpgbm.cuda import node_kernel
 from tqdm import tqdm
 from typing import Tuple
 from torch import Tensor
+import gc
 
 histogram_kernels = {
     "hist1": node_kernel.compute_histogram,
@@ -29,6 +30,7 @@ class WarpGBM(BaseEstimator, RegressorMixin):
         L2_reg=1e-6,
         L1_reg=0.0,
         device="cuda",
+        colsample_bytree=1.0,
     ):
         # Validate arguments
         self._validate_hyperparams(
@@ -43,6 +45,7 @@ class WarpGBM(BaseEstimator, RegressorMixin):
            rows_per_thread=rows_per_thread,
            L2_reg=L2_reg,
            L1_reg=L1_reg,
+           colsample_bytree=colsample_bytree,
        )
 
        self.num_bins = num_bins
@@ -70,6 +73,8 @@ class WarpGBM(BaseEstimator, RegressorMixin):
        self.rows_per_thread = rows_per_thread
        self.L2_reg = L2_reg
        self.L1_reg = L1_reg
+       self.forest = [{} for _ in range(self.n_estimators)]
+       self.colsample_bytree = colsample_bytree
 
    def _validate_hyperparams(self, **kwargs):
        # Type checks
@@ -81,7 +86,13 @@ class WarpGBM(BaseEstimator, RegressorMixin):
            "threads_per_block",
            "rows_per_thread",
        ]
-       float_params = ["learning_rate", "min_split_gain", "L2_reg", "L1_reg"]
+       float_params = [
+           "learning_rate",
+           "min_split_gain",
+           "L2_reg",
+           "L1_reg",
+           "colsample_bytree",
+       ]
 
        for param in int_params:
            if not isinstance(kwargs[param], int):
@@ -121,10 +132,100 @@ class WarpGBM(BaseEstimator, RegressorMixin):
            raise ValueError(
                f"Invalid histogram_computer: {kwargs['histogram_computer']}. Choose from {list(histogram_kernels.keys())}."
            )
+       if kwargs["colsample_bytree"] <= 0 or kwargs["colsample_bytree"] > 1:
+           raise ValueError(
+               f"Invalid colsample_bytree: {kwargs['colsample_bytree']}. Must be a float value > 0 and <= 1."
+           )
+
+   def validate_fit_params(
+       self, X, y, era_id, X_eval, y_eval, eval_every_n_trees, early_stopping_rounds
+   ):
+       # ─── Required: X and y ───
+       if not isinstance(X, np.ndarray) or not isinstance(y, np.ndarray):
+           raise TypeError("X and y must be numpy arrays.")
+       if X.ndim != 2:
+           raise ValueError(f"X must be 2-dimensional, got shape {X.shape}")
+       if y.ndim != 1:
+           raise ValueError(f"y must be 1-dimensional, got shape {y.shape}")
+       if X.shape[0] != y.shape[0]:
+           raise ValueError(
+               f"X and y must have the same number of rows. Got {X.shape[0]} and {y.shape[0]}."
+           )
+
+       # ─── Optional: era_id ───
+       if era_id is not None:
+           if not isinstance(era_id, np.ndarray):
+               raise TypeError("era_id must be a numpy array.")
+           if era_id.ndim != 1:
+               raise ValueError(
+                   f"era_id must be 1-dimensional, got shape {era_id.shape}"
+               )
+           if len(era_id) != len(y):
+               raise ValueError(
+                   f"era_id must have same length as y. Got {len(era_id)} and {len(y)}."
+               )
+
+       # ─── Optional: Eval Set ───
+       eval_args = [X_eval, y_eval, eval_every_n_trees]
+       if any(arg is not None for arg in eval_args):
+           # Require all of them
+           if X_eval is None or y_eval is None or eval_every_n_trees is None:
+               raise ValueError(
+                   "If using eval set, X_eval, y_eval, and eval_every_n_trees must all be defined."
+               )
+
+           if not isinstance(X_eval, np.ndarray) or not isinstance(y_eval, np.ndarray):
+               raise TypeError("X_eval and y_eval must be numpy arrays.")
+           if X_eval.ndim != 2:
+               raise ValueError(
+                   f"X_eval must be 2-dimensional, got shape {X_eval.shape}"
+               )
+           if y_eval.ndim != 1:
+               raise ValueError(
+                   f"y_eval must be 1-dimensional, got shape {y_eval.shape}"
+               )
+           if X_eval.shape[0] != y_eval.shape[0]:
+               raise ValueError(
+                   f"X_eval and y_eval must have same number of rows. Got {X_eval.shape[0]} and {y_eval.shape[0]}."
+               )
+
+           if not isinstance(eval_every_n_trees, int) or eval_every_n_trees <= 0:
+               raise ValueError(
+                   f"eval_every_n_trees must be a positive integer, got {eval_every_n_trees}."
+               )
+
+           if early_stopping_rounds is not None:
+               if (
+                   not isinstance(early_stopping_rounds, int)
+                   or early_stopping_rounds <= 0
+               ):
+                   raise ValueError(
+                       f"early_stopping_rounds must be a positive integer, got {early_stopping_rounds}."
+                   )
+           else:
+               # No early stopping = set to "never trigger"
+               early_stopping_rounds = self.n_estimators + 1
+
+       return early_stopping_rounds  # May have been defaulted here
+
+   def fit(
+       self,
+       X,
+       y,
+       era_id=None,
+       X_eval=None,
+       y_eval=None,
+       eval_every_n_trees=None,
+       early_stopping_rounds=None,
+   ):
+       early_stopping_rounds = self.validate_fit_params(
+           X, y, era_id, X_eval, y_eval, eval_every_n_trees, early_stopping_rounds
+       )
 
-   def fit(self, X, y, era_id=None):
        if era_id is None:
            era_id = np.ones(X.shape[0], dtype="int32")
+
+       # Train data preprocessing
        self.bin_indices, era_indices, self.bin_edges, self.unique_eras, self.Y_gpu = (
            self.preprocess_gpu_data(X, y, era_id)
        )
@@ -137,8 +238,29 @@ class WarpGBM(BaseEstimator, RegressorMixin):
        self.best_bins = torch.zeros(
            self.num_features, device=self.device, dtype=torch.int32
        )
+       self.feature_indices = torch.arange(self.num_features, device=self.device)
+
+       # ─── Optional Eval Set ───
+       if X_eval is not None and y_eval is not None:
+           self.bin_indices_eval = self.bin_data_with_existing_edges(X_eval)
+           self.Y_gpu_eval = torch.from_numpy(y_eval).to(torch.float32).to(self.device)
+           self.eval_every_n_trees = eval_every_n_trees
+           self.early_stopping_rounds = early_stopping_rounds
+       else:
+           self.bin_indices_eval = None
+           self.Y_gpu_eval = None
+           self.eval_every_n_trees = None
+           self.early_stopping_rounds = None
+
+       # ─── Grow the forest ───
        with torch.no_grad():
-           self.forest = self.grow_forest()
+           self.grow_forest()
+
+       del self.bin_indices
+       del self.Y_gpu
+
+       gc.collect()
+
        return self
 
    def preprocess_gpu_data(self, X_np, Y_np, era_id_np):
@@ -248,16 +370,16 @@ class WarpGBM(BaseEstimator, RegressorMixin):
            return {"leaf_value": leaf_value.item(), "samples": node_indices.numel()}
 
        parent_size = node_indices.numel()
-       best_feature, best_bin = self.find_best_split(
+       local_feature, best_bin = self.find_best_split(
            gradient_histogram, hessian_histogram
        )
 
-       if best_feature == -1:
+       if local_feature == -1:
            leaf_value = self.residual[node_indices].mean()
            self.gradients[node_indices] += self.learning_rate * leaf_value
            return {"leaf_value": leaf_value.item(), "samples": parent_size}
 
-       split_mask = self.bin_indices[node_indices, best_feature] <= best_bin
+       split_mask = self.bin_indices_tree[node_indices, local_feature] <= best_bin
        left_indices = node_indices[split_mask]
        right_indices = node_indices[~split_mask]
 
@@ -266,13 +388,13 @@ class WarpGBM(BaseEstimator, RegressorMixin):
 
        if left_size <= right_size:
            grad_hist_left, hess_hist_left = self.compute_histograms(
-               self.bin_indices[left_indices], self.residual[left_indices]
+               self.bin_indices_tree[left_indices], self.residual[left_indices]
            )
            grad_hist_right = gradient_histogram - grad_hist_left
            hess_hist_right = hessian_histogram - hess_hist_left
        else:
            grad_hist_right, hess_hist_right = self.compute_histograms(
-               self.bin_indices[right_indices], self.residual[right_indices]
+               self.bin_indices_tree[right_indices], self.residual[right_indices]
            )
            grad_hist_left = gradient_histogram - grad_hist_right
            hess_hist_left = hessian_histogram - hess_hist_right
@@ -286,44 +408,79 @@ class WarpGBM(BaseEstimator, RegressorMixin):
            )
 
        return {
-           "feature": best_feature,
+           "feature": self.feat_indices_tree[local_feature],
            "bin": best_bin,
            "left": left_child,
            "right": right_child,
        }
 
+   def compute_eval(self, i):
+       if self.eval_every_n_trees == None:
+           return
+
+       if i % self.eval_every_n_trees == 0:
+           eval_preds = self.predict_binned(self.bin_indices_eval)
+           eval_loss = ((self.Y_gpu_eval - eval_preds) ** 2).mean().item()
+           self.eval_loss.append(eval_loss)
+
+           train_loss = ((self.Y_gpu - self.gradients) ** 2).mean().item()
+           self.training_loss.append(train_loss)
+
+           if len(self.eval_loss) > self.early_stopping_rounds:
+               if self.eval_loss[-self.early_stopping_rounds] < self.eval_loss[-1]:
+                   self.stop = True
+
+           print(
+               f"🌲 Tree {i+1}/{self.n_estimators} | Train MSE: {train_loss:.6f} | Eval MSE: {eval_loss:.6f}"
+           )
+
+           del eval_preds, eval_loss, train_loss
+
    def grow_forest(self):
-       forest = [{} for _ in range(self.n_estimators)]
        self.training_loss = []
+       self.eval_loss = []  # if eval set is given
+       self.stop = False
 
-       for i in tqdm(range(self.n_estimators)):
+       if self.colsample_bytree < 1.0:
+           k = max(1, int(self.colsample_bytree * self.num_features))
+       else:
+           self.feat_indices_tree = self.feature_indices
+           self.bin_indices_tree = self.bin_indices
+
+       for i in range(self.n_estimators):
            self.residual = self.Y_gpu - self.gradients
 
+           if self.colsample_bytree < 1.0:
+               self.feat_indices_tree = torch.randperm(
+                   self.num_features, device=self.device
+               )[:k]
+               self.bin_indices_tree = self.bin_indices[:, self.feat_indices_tree]
+
            self.root_gradient_histogram, self.root_hessian_histogram = (
-               self.compute_histograms(self.bin_indices, self.residual)
+               self.compute_histograms(self.bin_indices_tree, self.residual)
            )
 
            tree = self.grow_tree(
                self.root_gradient_histogram,
                self.root_hessian_histogram,
                self.root_node_indices,
-               depth=0,
+               0,
            )
-           forest[i] = tree
-           # loss = ((self.Y_gpu - self.gradients) ** 2).mean().item()
-           # self.training_loss.append(loss)
-           # print(f"🌲 Tree {i+1}/{self.n_estimators} - MSE: {loss:.6f}")
+           self.forest[i] = tree
+
+           self.compute_eval(i)
+
+           if self.stop:
+               break
 
        print("Finished training forest.")
-       return forest
 
-   def predict(self, X_np):
-       X_tensor = torch.from_numpy(X_np).to(torch.float32).pin_memory()
+   def bin_data_with_existing_edges(self, X_np):
+       X_tensor = torch.from_numpy(X_np).type(torch.float32).pin_memory()
        num_samples = X_tensor.size(0)
        bin_indices = torch.zeros(
            (num_samples, self.num_features), dtype=torch.int8, device=self.device
        )
-
        with torch.no_grad():
            for f in range(self.num_features):
                X_f = X_tensor[:, f].to(self.device, non_blocking=True)
@@ -332,10 +489,16 @@ class WarpGBM(BaseEstimator, RegressorMixin):
                node_kernel.custom_cuda_binner(X_f, bin_edges_f, bin_indices_f)
                bin_indices[:, f] = bin_indices_f
 
+       return bin_indices
+
+   def predict_binned(self, bin_indices):
+       num_samples = bin_indices.size(0)
+
        tree_tensor = torch.stack(
            [
                self.flatten_tree(tree, max_nodes=2 ** (self.max_depth + 1))
                for tree in self.forest
+               if tree
            ]
        ).to(self.device)
 
@@ -344,24 +507,33 @@ class WarpGBM(BaseEstimator, RegressorMixin):
            bin_indices.contiguous(), tree_tensor.contiguous(), self.learning_rate, out
        )
 
-       return out.cpu().numpy()
+       return out
 
-   def flatten_tree(self, tree, max_nodes):
-       """
-       Convert a recursive tree structure into a flat matrix format.
+   def predict(self, X_np):
+       is_integer_type = np.issubdtype(X_np.dtype, np.integer)
 
-       Each row in the output represents a node:
-       - Columns: [feature, bin, left_id, right_id, is_leaf, value]
-       - Internal nodes fill columns 0–3 and set is_leaf = 0
-       - Leaf nodes fill only value and set is_leaf = 1
+       if is_integer_type and X_np.shape[1] == self.num_features:
+           max_vals = X_np.max(axis=0)
+           if np.all(max_vals < self.num_bins):
+               print("Detected pre-binned input at predict-time skipping binning.")
+               is_prebinned = True
+           else:
+               is_prebinned = False
+       else:
+           is_prebinned = False
 
-       Args:
-           tree (list): A list containing a single root node (recursive dict form).
-           max_nodes (int): Max number of nodes to allocate in the flat matrix.
+       if is_prebinned:
+           bin_indices = (
+               torch.from_numpy(X_np).to(self.device).contiguous().to(torch.int8)
+           )
+       else:
+           bin_indices = self.bin_data_with_existing_edges(X_np)
+
+       preds = self.predict_binned(bin_indices).cpu().numpy()
+       del bin_indices
+       return preds
 
-       Returns:
-           torch.Tensor: [max_nodes x 6] matrix representing the flattened tree.
-       """
+   def flatten_tree(self, tree, max_nodes):
        flat = torch.full((max_nodes, 6), float("nan"), dtype=torch.float32)
        node_counter = [0]
        node_list = []
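
The `colsample_bytree` changes above thread a per-tree column subsample through training: `find_best_split` works in the subsampled (local) column space, so the chosen index is mapped back through `feat_indices_tree` before being stored in the tree node. The following is a standalone sketch of that bookkeeping with illustrative values, not code from the package:

```python
# Illustrative sketch of per-tree column subsampling and the local-to-global
# feature-index mapping used by the new colsample_bytree code path.
import torch

num_features = 20
colsample_bytree = 0.8
bin_indices = torch.randint(0, 10, (1_000, num_features), dtype=torch.int8)  # stand-in for binned X

k = max(1, int(colsample_bytree * num_features))        # columns each tree sees
feat_indices_tree = torch.randperm(num_features)[:k]    # global column ids drawn for this tree
bin_indices_tree = bin_indices[:, feat_indices_tree]    # data the tree actually trains on

local_feature = 3                                        # column index returned by the split search
global_feature = int(feat_indices_tree[local_feature])   # index recorded in the tree node
split_mask = bin_indices_tree[:, local_feature] <= 2     # rows partitioned using the local view
```

Storing the global index keeps prediction unchanged, since the flattened trees always refer to columns of the full feature matrix.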

{warpgbm-0.1.22 → warpgbm-0.1.24}/warpgbm/cuda/best_split_kernel.cu

@@ -38,7 +38,7 @@ __global__ void best_split_kernel_global_only(
 
    if (H_L >= min_child_samples && H_R >= min_child_samples)
    {
-       float gain = (G_L * G_L) / (H_L + eps) + (G_R * G_R) / (H_R + eps);
+       float gain = (G_L * G_L) / (H_L + eps) + (G_R * G_R) / (H_R + eps) - (G_total * G_total) / (H_total + eps);
        if (gain > best_gain)
        {
            best_gain = gain;
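
The kernel change above rescores a split as an improvement over the parent node rather than the raw sum of child scores, which matches the usual GBDT gain criterion up to a constant factor and regularization details. A restatement in Python, assuming `G_total`/`H_total` are the parent's gradient and hessian sums (i.e. `G_L + G_R` and `H_L + H_R`):

```python
# Not the CUDA kernel itself, just the gain formula it now computes,
# assuming G_total = G_L + G_R and H_total = H_L + H_R for the parent node.
def split_gain(G_L, H_L, G_R, H_R, eps=1e-6):
    G_total = G_L + G_R
    H_total = H_L + H_R
    return (
        (G_L * G_L) / (H_L + eps)
        + (G_R * G_R) / (H_R + eps)
        - (G_total * G_total) / (H_total + eps)
    )
```

With the parent term subtracted, a split that does not improve on leaving the node as a leaf scores near zero instead of a large positive value.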

{warpgbm-0.1.22 → warpgbm-0.1.24/warpgbm.egg-info}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: warpgbm
-Version: 0.1.22
+Version: 0.1.24
 Summary: A fast GPU-accelerated Gradient Boosted Decision Tree library with PyTorch + CUDA
 License: GNU GENERAL PUBLIC LICENSE
                        Version 3, 29 June 2007
@@ -879,8 +879,26 @@ No installation required — just press **"Open in Playground"**, then **Run All
 - `L2_reg`: L2 regularizer (default: 1e-6)
 
 ### Methods:
-- `.fit(X, y, era_id=None)`: Train the model. `X` can be raw floats or pre-binned `int8` data. `era_id` is optional and used internally.
-- `.predict(X)`: Predict on new data, using parallelized CUDA kernel.
+```
+.fit(
+    X,                           # numpy array (float or int) 2 dimensions (num_samples, num_features)
+    y,                           # numpy array (float or int) 1 dimension (num_samples)
+    era_id=None,                 # numpy array (int) 1 dimension (num_samples)
+    X_eval=None,                 # numpy array (float or int) 2 dimensions (eval_num_samples, num_features)
+    y_eval=None,                 # numpy array (float or int) 1 dimension (eval_num_samples)
+    eval_every_n_trees=None,     # const (int) >= 1
+    early_stopping_rounds=None,  # const (int) >= 1
+)
+```
+Train with optional validation set and early stopping.
+
+
+```
+.predict(
+    X  # numpy array (float or int) 2 dimensions (predict_num_samples, num_features)
+)
+```
+Predict on new data, using parallelized CUDA kernel.
 
 ---
 
@@ -896,3 +914,7 @@ WarpGBM builds on the shoulders of PyTorch, scikit-learn, LightGBM, and the CUDA
 
 - Vectorized predict function replaced with CUDA kernel (`warpgbm/cuda/predict.cu`), parallelizing per sample, per tree.
 
+### v0.1.23
+
+- Adjust gain in split kernel and added support for an eval set with early stopping based on MSE.
+

{warpgbm-0.1.22 → warpgbm-0.1.24}/warpgbm.egg-info/SOURCES.txt

@@ -5,6 +5,7 @@ pyproject.toml
 setup.py
 version.txt
 tests/__init__.py
+tests/numerai_test.py
 tests/test_fit_predict_corr.py
 warpgbm/__init__.py
 warpgbm/core.py

warpgbm-0.1.22/version.txt

@@ -1 +0,0 @@
-0.1.22

The remaining files listed above with +0 -0 are unchanged between 0.1.22 and 0.1.24.