warpgbm 0.1.21__tar.gz → 0.1.23__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (25)
  1. {warpgbm-0.1.21/warpgbm.egg-info → warpgbm-0.1.23}/PKG-INFO +9 -16
  2. {warpgbm-0.1.21 → warpgbm-0.1.23}/README.md +8 -15
  3. {warpgbm-0.1.21 → warpgbm-0.1.23}/pyproject.toml +1 -1
  4. {warpgbm-0.1.21 → warpgbm-0.1.23}/tests/test_fit_predict_corr.py +20 -9
  5. warpgbm-0.1.23/version.txt +1 -0
  6. warpgbm-0.1.23/warpgbm/core.py +537 -0
  7. {warpgbm-0.1.21 → warpgbm-0.1.23}/warpgbm/cuda/best_split_kernel.cu +1 -1
  8. {warpgbm-0.1.21 → warpgbm-0.1.23}/warpgbm/cuda/histogram_kernel.cu +0 -14
  9. {warpgbm-0.1.21 → warpgbm-0.1.23/warpgbm.egg-info}/PKG-INFO +9 -16
  10. warpgbm-0.1.21/version.txt +0 -1
  11. warpgbm-0.1.21/warpgbm/core.py +0 -341
  12. {warpgbm-0.1.21 → warpgbm-0.1.23}/LICENSE +0 -0
  13. {warpgbm-0.1.21 → warpgbm-0.1.23}/MANIFEST.in +0 -0
  14. {warpgbm-0.1.21 → warpgbm-0.1.23}/setup.cfg +0 -0
  15. {warpgbm-0.1.21 → warpgbm-0.1.23}/setup.py +0 -0
  16. {warpgbm-0.1.21 → warpgbm-0.1.23}/tests/__init__.py +0 -0
  17. {warpgbm-0.1.21 → warpgbm-0.1.23}/warpgbm/__init__.py +0 -0
  18. {warpgbm-0.1.21 → warpgbm-0.1.23}/warpgbm/cuda/__init__.py +0 -0
  19. {warpgbm-0.1.21 → warpgbm-0.1.23}/warpgbm/cuda/binner.cu +0 -0
  20. {warpgbm-0.1.21 → warpgbm-0.1.23}/warpgbm/cuda/node_kernel.cpp +0 -0
  21. {warpgbm-0.1.21 → warpgbm-0.1.23}/warpgbm/cuda/predict.cu +0 -0
  22. {warpgbm-0.1.21 → warpgbm-0.1.23}/warpgbm.egg-info/SOURCES.txt +0 -0
  23. {warpgbm-0.1.21 → warpgbm-0.1.23}/warpgbm.egg-info/dependency_links.txt +0 -0
  24. {warpgbm-0.1.21 → warpgbm-0.1.23}/warpgbm.egg-info/requires.txt +0 -0
  25. {warpgbm-0.1.21 → warpgbm-0.1.23}/warpgbm.egg-info/top_level.txt +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: warpgbm
- Version: 0.1.21
+ Version: 0.1.23
  Summary: A fast GPU-accelerated Gradient Boosted Decision Tree library with PyTorch + CUDA
  License: GNU GENERAL PUBLIC LICENSE
  Version 3, 29 June 2007
@@ -704,26 +704,20 @@ WarpGBM is a high-performance, GPU-accelerated Gradient Boosted Decision Tree (G

  ---

- ## Performance Note
-
- In our initial tests on an NVIDIA 3090 (local) and A100 (Google Colab Pro), WarpGBM achieves **14x to 20x faster training times** compared to LightGBM's CPU version and **2x faster** on the GPU version using default configurations. Speed also outperforms XGBoost and CatBoost on regression problems. It also consumes **significantly less RAM and CPU**. These early results hint at more thorough benchmarking to come.
-
- ---
-
  ## Benchmarks

  ### Scikit-Learn Synthetic Data: 1 Million Rows and 1,000 Features

- In this benchmark we compare the speed and in-sample correlation of **WarpGBM v0.1.19** against LightGBM, XGBoost and CatBoost, all with their GPU-enabled versions. This benchmark runs on Google Colab with the L4 GPU environment. The CPU versions don't even come close to the speed here so we didn't test them.
+ In this benchmark we compare the speed and in-sample correlation of **WarpGBM v0.1.21** against LightGBM, XGBoost and CatBoost, all with their GPU-enabled versions. This benchmark runs on Google Colab with the L4 GPU environment.

  ```
- WarpGBM: corr = 0.8882, train = 21.8s, infer = 11.6s
- XGBoost: corr = 0.8877, train = 33.4s, infer = 8.1s
- LightGBM: corr = 0.8604, train = 30.2s, infer = 1.4s
- CatBoost: corr = 0.8935, train = 377.9s, infer = 375.8s
+ WarpGBM: corr = 0.8882, train = 18.7s, infer = 4.9s
+ XGBoost: corr = 0.8877, train = 33.1s, infer = 8.1s
+ LightGBM: corr = 0.8604, train = 30.3s, infer = 1.4s
+ CatBoost: corr = 0.8935, train = 400.0s, infer = 382.6s
  ```

- Colab Notebook: https://colab.research.google.com/drive/16U1kbYlD5HibGbnF5NGsjChZ1p1IA2pK
+ Colab Notebook: https://colab.research.google.com/drive/16U1kbYlD5HibGbnF5NGsjChZ1p1IA2pK?usp=sharing

  ---

@@ -746,7 +740,7 @@ pip install warpgbm
  This installs from PyPI and also compiles CUDA code locally during installation. This method works well **if your environment already has PyTorch with GPU support** installed and configured.

  > **Tip:**\
- > If you encounter an error related to mismatched or missing CUDA versions, try installing with the following flag:
+ > If you encounter an error related to mismatched or missing CUDA versions, try installing with the following flag. This is currently required in the Colab environments.
  >
  > ```bash
  > pip install warpgbm --no-build-isolation
@@ -886,8 +880,7 @@ No installation required — just press **"Open in Playground"**, then **Run All

  ### Methods:
  - `.fit(X, y, era_id=None)`: Train the model. `X` can be raw floats or pre-binned `int8` data. `era_id` is optional and used internally.
- - `.predict(X, chunksize=50_000)`: Predict on new raw float or pre-binned data.
- - `.predict_numpy(X, chunksize=50_000)`: Same as `.predict(X)` but without using the GPU.
+ - `.predict(X)`: Predict on new data, using parallelized CUDA kernel.

  ---

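The Methods list above drops the `chunksize` argument and the CPU-only `predict_numpy` in favor of a single CUDA-backed `.predict(X)`. A minimal usage sketch of the 0.1.23 API, assuming a CUDA-capable environment; the synthetic-regression setup mirrors the package's own test rather than anything prescribed by the README:

```python
# Minimal sketch of the fit/predict API described above (assumes a CUDA GPU
# and warpgbm 0.1.23; dataset sizes here are illustrative only).
import numpy as np
from sklearn.datasets import make_regression
from warpgbm import WarpGBM

X, y = make_regression(n_samples=100_000, n_features=100, noise=0.1, random_state=0)

model = WarpGBM(max_depth=6, num_bins=10, n_estimators=50, learning_rate=0.1)
model.fit(X, y)              # era_id is optional, as the Methods list notes
preds = model.predict(X)     # no chunksize argument in 0.1.23

print(np.corrcoef(preds, y)[0, 1])
```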
@@ -16,26 +16,20 @@ WarpGBM is a high-performance, GPU-accelerated Gradient Boosted Decision Tree (G

  ---

- ## Performance Note
-
- In our initial tests on an NVIDIA 3090 (local) and A100 (Google Colab Pro), WarpGBM achieves **14x to 20x faster training times** compared to LightGBM's CPU version and **2x faster** on the GPU version using default configurations. Speed also outperforms XGBoost and CatBoost on regression problems. It also consumes **significantly less RAM and CPU**. These early results hint at more thorough benchmarking to come.
-
- ---
-
  ## Benchmarks

  ### Scikit-Learn Synthetic Data: 1 Million Rows and 1,000 Features

- In this benchmark we compare the speed and in-sample correlation of **WarpGBM v0.1.19** against LightGBM, XGBoost and CatBoost, all with their GPU-enabled versions. This benchmark runs on Google Colab with the L4 GPU environment. The CPU versions don't even come close to the speed here so we didn't test them.
+ In this benchmark we compare the speed and in-sample correlation of **WarpGBM v0.1.21** against LightGBM, XGBoost and CatBoost, all with their GPU-enabled versions. This benchmark runs on Google Colab with the L4 GPU environment.

  ```
- WarpGBM: corr = 0.8882, train = 21.8s, infer = 11.6s
- XGBoost: corr = 0.8877, train = 33.4s, infer = 8.1s
- LightGBM: corr = 0.8604, train = 30.2s, infer = 1.4s
- CatBoost: corr = 0.8935, train = 377.9s, infer = 375.8s
+ WarpGBM: corr = 0.8882, train = 18.7s, infer = 4.9s
+ XGBoost: corr = 0.8877, train = 33.1s, infer = 8.1s
+ LightGBM: corr = 0.8604, train = 30.3s, infer = 1.4s
+ CatBoost: corr = 0.8935, train = 400.0s, infer = 382.6s
  ```

- Colab Notebook: https://colab.research.google.com/drive/16U1kbYlD5HibGbnF5NGsjChZ1p1IA2pK
+ Colab Notebook: https://colab.research.google.com/drive/16U1kbYlD5HibGbnF5NGsjChZ1p1IA2pK?usp=sharing

  ---

@@ -58,7 +52,7 @@ pip install warpgbm
  This installs from PyPI and also compiles CUDA code locally during installation. This method works well **if your environment already has PyTorch with GPU support** installed and configured.

  > **Tip:**\
- > If you encounter an error related to mismatched or missing CUDA versions, try installing with the following flag:
+ > If you encounter an error related to mismatched or missing CUDA versions, try installing with the following flag. This is currently required in the Colab environments.
  >
  > ```bash
  > pip install warpgbm --no-build-isolation
@@ -198,8 +192,7 @@ No installation required — just press **"Open in Playground"**, then **Run All

  ### Methods:
  - `.fit(X, y, era_id=None)`: Train the model. `X` can be raw floats or pre-binned `int8` data. `era_id` is optional and used internally.
- - `.predict(X, chunksize=50_000)`: Predict on new raw float or pre-binned data.
- - `.predict_numpy(X, chunksize=50_000)`: Same as `.predict(X)` but without using the GPU.
+ - `.predict(X)`: Predict on new data, using parallelized CUDA kernel.

  ---

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

  [project]
  name = "warpgbm"
- version = "0.1.21"
+ version = "0.1.23"
  description = "A fast GPU-accelerated Gradient Boosted Decision Tree library with PyTorch + CUDA"
  readme = "README.md"
  requires-python = ">=3.8"
@@ -1,11 +1,9 @@
  import numpy as np
  from warpgbm import WarpGBM
  from sklearn.datasets import make_regression
-
- import numpy as np
  import time
- from warpgbm import WarpGBM
- from sklearn.datasets import make_regression
+ from sklearn.metrics import mean_squared_error
+

  def test_fit_predictpytee_correlation():
  np.random.seed(42)
@@ -14,23 +12,32 @@ def test_fit_predictpytee_correlation():
  X, y = make_regression(n_samples=N, n_features=F, noise=0.1, random_state=42)
  era = np.zeros(N, dtype=np.int32)
  corrs = []
+ mses = []

- for hist_type in ['hist1', 'hist2', 'hist3']:
+ for hist_type in ["hist1", "hist2", "hist3"]:
  print(f"\nTesting histogram method: {hist_type}")

  model = WarpGBM(
  max_depth=10,
  num_bins=10,
- n_estimators=10,
+ n_estimators=100,
  learning_rate=1,
  verbosity=False,
  histogram_computer=hist_type,
  threads_per_block=64,
- rows_per_thread=4
+ rows_per_thread=4,
  )

  start_fit = time.time()
- model.fit(X, y, era_id=era)
+ model.fit(
+ X,
+ y,
+ era_id=era,
+ X_eval=X,
+ y_eval=y,
+ eval_every_n_trees=10,
+ early_stopping_rounds=1,
+ )
  fit_time = time.time() - start_fit
  print(f" Fit time: {fit_time:.3f} seconds")

@@ -40,7 +47,11 @@ def test_fit_predictpytee_correlation():
  print(f" Predict time: {pred_time:.3f} seconds")

  corr = np.corrcoef(preds, y)[0, 1]
+ mse = mean_squared_error(preds, y)
  print(f" Correlation: {corr:.4f}")
+ print(f" MSE: {mse:.4f}")
  corrs.append(corr)
+ mses.append(mse)

- assert (np.array(corrs) > 0.95).all(), f"In-sample correlation too low: {corrs}"
+ assert (np.array(corrs) > 0.9).all(), f"In-sample correlation too low: {corrs}"
+ assert (np.array(mses) < 2).all(), f"In-sample mse too high: {mses}"
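The rewritten test exercises the new optional eval-set arguments (`X_eval`, `y_eval`, `eval_every_n_trees`, `early_stopping_rounds`) added to `.fit` in this release. The stopping rule itself lives in `compute_eval` in the new `core.py` further down this diff; a standalone sketch of that rule, with an illustrative loss history:

```python
# Sketch of the early-stopping rule from compute_eval in the new core.py:
# once more than `early_stopping_rounds` evaluations have been recorded,
# training stops when eval_loss[-early_stopping_rounds] is lower than the
# most recent eval loss (i.e. the eval MSE has stopped improving).
def should_stop(eval_loss_history, early_stopping_rounds):
    if len(eval_loss_history) > early_stopping_rounds:
        return eval_loss_history[-early_stopping_rounds] < eval_loss_history[-1]
    return False

print(should_stop([5.0, 4.0, 4.2], early_stopping_rounds=2))  # True: 4.0 < 4.2
print(should_stop([5.0, 4.0, 3.5], early_stopping_rounds=2))  # False: still improving
```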
@@ -0,0 +1 @@
+ 0.1.23
@@ -0,0 +1,537 @@
1
+ import torch
2
+ import numpy as np
3
+ from sklearn.base import BaseEstimator, RegressorMixin
4
+ from warpgbm.cuda import node_kernel
5
+ from tqdm import tqdm
6
+ from typing import Tuple
7
+ from torch import Tensor
8
+
9
+ histogram_kernels = {
10
+ "hist1": node_kernel.compute_histogram,
11
+ "hist2": node_kernel.compute_histogram2,
12
+ "hist3": node_kernel.compute_histogram3,
13
+ }
14
+
15
+
16
+ class WarpGBM(BaseEstimator, RegressorMixin):
17
+ def __init__(
18
+ self,
19
+ num_bins=10,
20
+ max_depth=3,
21
+ learning_rate=0.1,
22
+ n_estimators=100,
23
+ min_child_weight=20,
24
+ min_split_gain=0.0,
25
+ verbosity=True,
26
+ histogram_computer="hist3",
27
+ threads_per_block=64,
28
+ rows_per_thread=4,
29
+ L2_reg=1e-6,
30
+ L1_reg=0.0,
31
+ device="cuda",
32
+ ):
33
+ # Validate arguments
34
+ self._validate_hyperparams(
35
+ num_bins=num_bins,
36
+ max_depth=max_depth,
37
+ learning_rate=learning_rate,
38
+ n_estimators=n_estimators,
39
+ min_child_weight=min_child_weight,
40
+ min_split_gain=min_split_gain,
41
+ histogram_computer=histogram_computer,
42
+ threads_per_block=threads_per_block,
43
+ rows_per_thread=rows_per_thread,
44
+ L2_reg=L2_reg,
45
+ L1_reg=L1_reg,
46
+ )
47
+
48
+ self.num_bins = num_bins
49
+ self.max_depth = max_depth
50
+ self.learning_rate = learning_rate
51
+ self.n_estimators = n_estimators
52
+ self.forest = None
53
+ self.bin_edges = None
54
+ self.base_prediction = None
55
+ self.unique_eras = None
56
+ self.device = device
57
+ self.root_gradient_histogram = None
58
+ self.root_hessian_histogram = None
59
+ self.gradients = None
60
+ self.root_node_indices = None
61
+ self.bin_indices = None
62
+ self.Y_gpu = None
63
+ self.num_features = None
64
+ self.num_samples = None
65
+ self.min_child_weight = min_child_weight
66
+ self.min_split_gain = min_split_gain
67
+ self.best_bin = torch.tensor([-1], dtype=torch.int32, device=self.device)
68
+ self.compute_histogram = histogram_kernels[histogram_computer]
69
+ self.threads_per_block = threads_per_block
70
+ self.rows_per_thread = rows_per_thread
71
+ self.L2_reg = L2_reg
72
+ self.L1_reg = L1_reg
73
+ self.forest = [{} for _ in range(self.n_estimators)]
74
+
75
+ def _validate_hyperparams(self, **kwargs):
76
+ # Type checks
77
+ int_params = [
78
+ "num_bins",
79
+ "max_depth",
80
+ "n_estimators",
81
+ "min_child_weight",
82
+ "threads_per_block",
83
+ "rows_per_thread",
84
+ ]
85
+ float_params = ["learning_rate", "min_split_gain", "L2_reg", "L1_reg"]
86
+
87
+ for param in int_params:
88
+ if not isinstance(kwargs[param], int):
89
+ raise TypeError(
90
+ f"{param} must be an integer, got {type(kwargs[param])}."
91
+ )
92
+
93
+ for param in float_params:
94
+ if not isinstance(
95
+ kwargs[param], (float, int)
96
+ ): # Accept ints as valid floats
97
+ raise TypeError(f"{param} must be a float, got {type(kwargs[param])}.")
98
+
99
+ if not (2 <= kwargs["num_bins"] <= 127):
100
+ raise ValueError("num_bins must be between 2 and 127 inclusive.")
101
+ if kwargs["max_depth"] < 1:
102
+ raise ValueError("max_depth must be at least 1.")
103
+ if not (0.0 < kwargs["learning_rate"] <= 1.0):
104
+ raise ValueError("learning_rate must be in (0.0, 1.0].")
105
+ if kwargs["n_estimators"] <= 0:
106
+ raise ValueError("n_estimators must be positive.")
107
+ if kwargs["min_child_weight"] < 1:
108
+ raise ValueError("min_child_weight must be a positive integer.")
109
+ if kwargs["min_split_gain"] < 0:
110
+ raise ValueError("min_split_gain must be non-negative.")
111
+ if kwargs["threads_per_block"] <= 0 or kwargs["threads_per_block"] % 32 != 0:
112
+ raise ValueError(
113
+ "threads_per_block should be a positive multiple of 32 (warp size)."
114
+ )
115
+ if not (1 <= kwargs["rows_per_thread"] <= 16):
116
+ raise ValueError(
117
+ "rows_per_thread must be positive between 1 and 16 inclusive."
118
+ )
119
+ if kwargs["L2_reg"] < 0 or kwargs["L1_reg"] < 0:
120
+ raise ValueError("L2_reg and L1_reg must be non-negative.")
121
+ if kwargs["histogram_computer"] not in histogram_kernels:
122
+ raise ValueError(
123
+ f"Invalid histogram_computer: {kwargs['histogram_computer']}. Choose from {list(histogram_kernels.keys())}."
124
+ )
125
+
126
+ def validate_fit_params(
127
+ self, X, y, era_id, X_eval, y_eval, eval_every_n_trees, early_stopping_rounds
128
+ ):
129
+ # ─── Required: X and y ───
130
+ if not isinstance(X, np.ndarray) or not isinstance(y, np.ndarray):
131
+ raise TypeError("X and y must be numpy arrays.")
132
+ if X.ndim != 2:
133
+ raise ValueError(f"X must be 2-dimensional, got shape {X.shape}")
134
+ if y.ndim != 1:
135
+ raise ValueError(f"y must be 1-dimensional, got shape {y.shape}")
136
+ if X.shape[0] != y.shape[0]:
137
+ raise ValueError(
138
+ f"X and y must have the same number of rows. Got {X.shape[0]} and {y.shape[0]}."
139
+ )
140
+
141
+ # ─── Optional: era_id ───
142
+ if era_id is not None:
143
+ if not isinstance(era_id, np.ndarray):
144
+ raise TypeError("era_id must be a numpy array.")
145
+ if era_id.ndim != 1:
146
+ raise ValueError(
147
+ f"era_id must be 1-dimensional, got shape {era_id.shape}"
148
+ )
149
+ if len(era_id) != len(y):
150
+ raise ValueError(
151
+ f"era_id must have same length as y. Got {len(era_id)} and {len(y)}."
152
+ )
153
+
154
+ # ─── Optional: Eval Set ───
155
+ eval_args = [X_eval, y_eval, eval_every_n_trees]
156
+ if any(arg is not None for arg in eval_args):
157
+ # Require all of them
158
+ if X_eval is None or y_eval is None or eval_every_n_trees is None:
159
+ raise ValueError(
160
+ "If using eval set, X_eval, y_eval, and eval_every_n_trees must all be defined."
161
+ )
162
+
163
+ if not isinstance(X_eval, np.ndarray) or not isinstance(y_eval, np.ndarray):
164
+ raise TypeError("X_eval and y_eval must be numpy arrays.")
165
+ if X_eval.ndim != 2:
166
+ raise ValueError(
167
+ f"X_eval must be 2-dimensional, got shape {X_eval.shape}"
168
+ )
169
+ if y_eval.ndim != 1:
170
+ raise ValueError(
171
+ f"y_eval must be 1-dimensional, got shape {y_eval.shape}"
172
+ )
173
+ if X_eval.shape[0] != y_eval.shape[0]:
174
+ raise ValueError(
175
+ f"X_eval and y_eval must have same number of rows. Got {X_eval.shape[0]} and {y_eval.shape[0]}."
176
+ )
177
+
178
+ if not isinstance(eval_every_n_trees, int) or eval_every_n_trees <= 0:
179
+ raise ValueError(
180
+ f"eval_every_n_trees must be a positive integer, got {eval_every_n_trees}."
181
+ )
182
+
183
+ if early_stopping_rounds is not None:
184
+ if (
185
+ not isinstance(early_stopping_rounds, int)
186
+ or early_stopping_rounds <= 0
187
+ ):
188
+ raise ValueError(
189
+ f"early_stopping_rounds must be a positive integer, got {early_stopping_rounds}."
190
+ )
191
+ else:
192
+ # No early stopping = set to "never trigger"
193
+ early_stopping_rounds = self.n_estimators + 1
194
+
195
+ return early_stopping_rounds # May have been defaulted here
196
+
197
+ def fit(
198
+ self,
199
+ X,
200
+ y,
201
+ era_id=None,
202
+ X_eval=None,
203
+ y_eval=None,
204
+ eval_every_n_trees=None,
205
+ early_stopping_rounds=None,
206
+ ):
207
+ early_stopping_rounds = self.validate_fit_params(
208
+ X, y, era_id, X_eval, y_eval, eval_every_n_trees, early_stopping_rounds
209
+ )
210
+
211
+ if era_id is None:
212
+ era_id = np.ones(X.shape[0], dtype="int32")
213
+
214
+ # Train data preprocessing
215
+ self.bin_indices, era_indices, self.bin_edges, self.unique_eras, self.Y_gpu = (
216
+ self.preprocess_gpu_data(X, y, era_id)
217
+ )
218
+ self.num_samples, self.num_features = X.shape
219
+ self.gradients = torch.zeros_like(self.Y_gpu)
220
+ self.root_node_indices = torch.arange(self.num_samples, device=self.device)
221
+ self.base_prediction = self.Y_gpu.mean().item()
222
+ self.gradients += self.base_prediction
223
+ self.best_gains = torch.zeros(self.num_features, device=self.device)
224
+ self.best_bins = torch.zeros(
225
+ self.num_features, device=self.device, dtype=torch.int32
226
+ )
227
+
228
+ # ─── Optional Eval Set ───
229
+ if X_eval is not None and y_eval is not None:
230
+ self.bin_indices_eval = self.bin_data_with_existing_edges(X_eval)
231
+ self.Y_gpu_eval = torch.from_numpy(y_eval).to(torch.float32).to(self.device)
232
+ self.eval_every_n_trees = eval_every_n_trees
233
+ self.early_stopping_rounds = early_stopping_rounds
234
+ else:
235
+ self.bin_indices_eval = None
236
+ self.Y_gpu_eval = None
237
+ self.eval_every_n_trees = None
238
+ self.early_stopping_rounds = None
239
+
240
+ # ─── Grow the forest ───
241
+ with torch.no_grad():
242
+ self.grow_forest()
243
+
244
+ return self
245
+
246
+ def preprocess_gpu_data(self, X_np, Y_np, era_id_np):
247
+ with torch.no_grad():
248
+ self.num_samples, self.num_features = X_np.shape
249
+ Y_gpu = torch.from_numpy(Y_np).type(torch.float32).to(self.device)
250
+ era_id_gpu = torch.from_numpy(era_id_np).type(torch.int32).to(self.device)
251
+ is_integer_type = np.issubdtype(X_np.dtype, np.integer)
252
+ if is_integer_type:
253
+ max_vals = X_np.max(axis=0)
254
+ if np.all(max_vals < self.num_bins):
255
+ print(
256
+ "Detected pre-binned integer input — skipping quantile binning."
257
+ )
258
+ bin_indices = (
259
+ torch.from_numpy(X_np)
260
+ .to(self.device)
261
+ .contiguous()
262
+ .to(torch.int8)
263
+ )
264
+
265
+ # We'll store None or an empty tensor in self.bin_edges
266
+ # to indicate that we skip binning at predict-time
267
+ bin_edges = torch.arange(
268
+ 1, self.num_bins, dtype=torch.float32
269
+ ).repeat(self.num_features, 1)
270
+ bin_edges = bin_edges.to(self.device)
271
+ unique_eras, era_indices = torch.unique(
272
+ era_id_gpu, return_inverse=True
273
+ )
274
+ return bin_indices, era_indices, bin_edges, unique_eras, Y_gpu
275
+ else:
276
+ print(
277
+ "Integer input detected, but values exceed num_bins — falling back to quantile binning."
278
+ )
279
+
280
+ bin_indices = torch.empty(
281
+ (self.num_samples, self.num_features), dtype=torch.int8, device="cuda"
282
+ )
283
+ bin_edges = torch.empty(
284
+ (self.num_features, self.num_bins - 1),
285
+ dtype=torch.float32,
286
+ device="cuda",
287
+ )
288
+
289
+ X_np = torch.from_numpy(X_np).to(torch.float32).pin_memory()
290
+
291
+ for f in range(self.num_features):
292
+ X_f = X_np[:, f].to("cuda", non_blocking=True)
293
+ quantiles = torch.linspace(
294
+ 0, 1, self.num_bins + 1, device="cuda", dtype=X_f.dtype
295
+ )[1:-1]
296
+ bin_edges_f = torch.quantile(
297
+ X_f, quantiles, dim=0
298
+ ).contiguous() # shape: [B-1] for 1D input
299
+ bin_indices_f = bin_indices[:, f].contiguous() # view into output
300
+ node_kernel.custom_cuda_binner(X_f, bin_edges_f, bin_indices_f)
301
+ bin_indices[:, f] = bin_indices_f
302
+ bin_edges[f, :] = bin_edges_f
303
+
304
+ unique_eras, era_indices = torch.unique(era_id_gpu, return_inverse=True)
305
+ return bin_indices, era_indices, bin_edges, unique_eras, Y_gpu
306
+
307
+ def compute_histograms(self, bin_indices_sub, gradients):
308
+ grad_hist = torch.zeros(
309
+ (self.num_features, self.num_bins), device=self.device, dtype=torch.float32
310
+ )
311
+ hess_hist = torch.zeros(
312
+ (self.num_features, self.num_bins), device=self.device, dtype=torch.float32
313
+ )
314
+
315
+ self.compute_histogram(
316
+ bin_indices_sub,
317
+ gradients,
318
+ grad_hist,
319
+ hess_hist,
320
+ self.num_bins,
321
+ self.threads_per_block,
322
+ self.rows_per_thread,
323
+ )
324
+ return grad_hist, hess_hist
325
+
326
+ def find_best_split(self, gradient_histogram, hessian_histogram):
327
+ node_kernel.compute_split(
328
+ gradient_histogram,
329
+ hessian_histogram,
330
+ self.min_split_gain,
331
+ self.min_child_weight,
332
+ self.L2_reg,
333
+ self.best_gains,
334
+ self.best_bins,
335
+ self.threads_per_block,
336
+ )
337
+
338
+ if torch.all(self.best_bins == -1):
339
+ return -1, -1 # No valid split found
340
+
341
+ f = torch.argmax(self.best_gains).item()
342
+ b = self.best_bins[f].item()
343
+
344
+ return f, b
345
+
346
+ def grow_tree(self, gradient_histogram, hessian_histogram, node_indices, depth):
347
+ if depth == self.max_depth:
348
+ leaf_value = self.residual[node_indices].mean()
349
+ self.gradients[node_indices] += self.learning_rate * leaf_value
350
+ return {"leaf_value": leaf_value.item(), "samples": node_indices.numel()}
351
+
352
+ parent_size = node_indices.numel()
353
+ best_feature, best_bin = self.find_best_split(
354
+ gradient_histogram, hessian_histogram
355
+ )
356
+
357
+ if best_feature == -1:
358
+ leaf_value = self.residual[node_indices].mean()
359
+ self.gradients[node_indices] += self.learning_rate * leaf_value
360
+ return {"leaf_value": leaf_value.item(), "samples": parent_size}
361
+
362
+ split_mask = self.bin_indices[node_indices, best_feature] <= best_bin
363
+ left_indices = node_indices[split_mask]
364
+ right_indices = node_indices[~split_mask]
365
+
366
+ left_size = left_indices.numel()
367
+ right_size = right_indices.numel()
368
+
369
+ if left_size <= right_size:
370
+ grad_hist_left, hess_hist_left = self.compute_histograms(
371
+ self.bin_indices[left_indices], self.residual[left_indices]
372
+ )
373
+ grad_hist_right = gradient_histogram - grad_hist_left
374
+ hess_hist_right = hessian_histogram - hess_hist_left
375
+ else:
376
+ grad_hist_right, hess_hist_right = self.compute_histograms(
377
+ self.bin_indices[right_indices], self.residual[right_indices]
378
+ )
379
+ grad_hist_left = gradient_histogram - grad_hist_right
380
+ hess_hist_left = hessian_histogram - hess_hist_right
381
+
382
+ new_depth = depth + 1
383
+ left_child = self.grow_tree(
384
+ grad_hist_left, hess_hist_left, left_indices, new_depth
385
+ )
386
+ right_child = self.grow_tree(
387
+ grad_hist_right, hess_hist_right, right_indices, new_depth
388
+ )
389
+
390
+ return {
391
+ "feature": best_feature,
392
+ "bin": best_bin,
393
+ "left": left_child,
394
+ "right": right_child,
395
+ }
396
+
397
+ def compute_eval(self, i):
398
+ if self.eval_every_n_trees == None:
399
+ return
400
+
401
+ if i % self.eval_every_n_trees == 0:
402
+ eval_preds = self.predict_binned(self.bin_indices_eval)
403
+ eval_loss = ((self.Y_gpu_eval - eval_preds) ** 2).mean().item()
404
+ self.eval_loss.append(eval_loss)
405
+
406
+ train_loss = ((self.Y_gpu - self.gradients) ** 2).mean().item()
407
+ self.training_loss.append(train_loss)
408
+
409
+ if len(self.eval_loss) > self.early_stopping_rounds:
410
+ if self.eval_loss[-self.early_stopping_rounds] < self.eval_loss[-1]:
411
+ self.stop = True
412
+
413
+ print(
414
+ f"🌲 Tree {i+1}/{self.n_estimators} | Train MSE: {train_loss:.6f} | Eval MSE: {eval_loss:.6f}"
415
+ )
416
+
417
+ del eval_preds, eval_loss, train_loss
418
+
419
+ def grow_forest(self):
420
+ self.training_loss = []
421
+ self.eval_loss = [] # <-- if eval set is given
422
+ self.stop = False
423
+
424
+ for i in range(self.n_estimators):
425
+ self.residual = self.Y_gpu - self.gradients
426
+
427
+ self.root_gradient_histogram, self.root_hessian_histogram = (
428
+ self.compute_histograms(self.bin_indices, self.residual)
429
+ )
430
+
431
+ tree = self.grow_tree(
432
+ self.root_gradient_histogram,
433
+ self.root_hessian_histogram,
434
+ self.root_node_indices,
435
+ depth=0,
436
+ )
437
+ self.forest[i] = tree
438
+
439
+ self.compute_eval(i)
440
+
441
+ if self.stop:
442
+ break
443
+
444
+ print("Finished training forest.")
445
+
446
+ def bin_data_with_existing_edges(self, X_np):
447
+ X_tensor = torch.from_numpy(X_np).to(torch.float32).pin_memory()
448
+ num_samples = X_tensor.size(0)
449
+ bin_indices = torch.zeros(
450
+ (num_samples, self.num_features), dtype=torch.int8, device=self.device
451
+ )
452
+ with torch.no_grad():
453
+ for f in range(self.num_features):
454
+ X_f = X_tensor[:, f].to(self.device, non_blocking=True)
455
+ bin_edges_f = self.bin_edges[f]
456
+ bin_indices_f = bin_indices[:, f].contiguous()
457
+ node_kernel.custom_cuda_binner(X_f, bin_edges_f, bin_indices_f)
458
+ bin_indices[:, f] = bin_indices_f
459
+
460
+ return bin_indices
461
+
462
+ def predict_binned(self, bin_indices):
463
+ num_samples = bin_indices.size(0)
464
+
465
+ tree_tensor = torch.stack(
466
+ [
467
+ self.flatten_tree(tree, max_nodes=2 ** (self.max_depth + 1))
468
+ for tree in self.forest
469
+ if tree
470
+ ]
471
+ ).to(self.device)
472
+
473
+ out = torch.zeros(num_samples, device=self.device) + self.base_prediction
474
+ node_kernel.predict_forest(
475
+ bin_indices.contiguous(), tree_tensor.contiguous(), self.learning_rate, out
476
+ )
477
+
478
+ return out
479
+
480
+ def predict(self, X_np):
481
+ bin_indices = self.bin_data_with_existing_edges(X_np)
482
+ out = self.predict_binned(bin_indices)
483
+ return out.cpu().numpy()
484
+
485
+ def flatten_tree(self, tree, max_nodes):
486
+ """
487
+ Convert a recursive tree structure into a flat matrix format.
488
+
489
+ Each row in the output represents a node:
490
+ - Columns: [feature, bin, left_id, right_id, is_leaf, value]
491
+ - Internal nodes fill columns 0–3 and set is_leaf = 0
492
+ - Leaf nodes fill only value and set is_leaf = 1
493
+
494
+ Args:
495
+ tree (list): A list containing a single root node (recursive dict form).
496
+ max_nodes (int): Max number of nodes to allocate in the flat matrix.
497
+
498
+ Returns:
499
+ torch.Tensor: [max_nodes x 6] matrix representing the flattened tree.
500
+ """
501
+ flat = torch.full((max_nodes, 6), float("nan"), dtype=torch.float32)
502
+ node_counter = [0]
503
+ node_list = []
504
+
505
+ def walk(node):
506
+ curr_id = node_counter[0]
507
+ node_counter[0] += 1
508
+
509
+ new_node = {"node_id": curr_id}
510
+ if "leaf_value" in node:
511
+ new_node["leaf_value"] = float(node["leaf_value"])
512
+ else:
513
+ new_node["best_feature"] = float(node["feature"])
514
+ new_node["split_bin"] = float(node["bin"])
515
+ new_node["left_id"] = node_counter[0]
516
+ walk(node["left"])
517
+ new_node["right_id"] = node_counter[0]
518
+ walk(node["right"])
519
+
520
+ node_list.append(new_node)
521
+ return new_node
522
+
523
+ walk(tree)
524
+
525
+ for node in node_list:
526
+ i = node["node_id"]
527
+ if "leaf_value" in node:
528
+ flat[i, 4] = 1.0
529
+ flat[i, 5] = node["leaf_value"]
530
+ else:
531
+ flat[i, 0] = node["best_feature"]
532
+ flat[i, 1] = node["split_bin"]
533
+ flat[i, 2] = node["left_id"]
534
+ flat[i, 3] = node["right_id"]
535
+ flat[i, 4] = 0.0
536
+
537
+ return flat
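The `flatten_tree` docstring above defines the flat node layout consumed by the `predict_forest` kernel: one row per node with columns `[feature, bin, left_id, right_id, is_leaf, value]`. A small CPU-side walk of that layout, for readability only; it follows the same split rule as `grow_tree` (`bin <= split_bin` goes left), while the real prediction path is the CUDA kernel, which also applies the learning rate and base prediction as in `predict_binned`:

```python
# Illustrative CPU traversal of the flat tree matrix built by flatten_tree.
# Columns: [feature, bin, left_id, right_id, is_leaf, value].
import numpy as np

def predict_one_tree(flat, binned_row):
    node = 0
    while flat[node, 4] == 0.0:              # internal node
        feature = int(flat[node, 0])
        split_bin = flat[node, 1]
        node = int(flat[node, 2]) if binned_row[feature] <= split_bin else int(flat[node, 3])
    return flat[node, 5]                     # leaf value (unscaled residual mean)

# A depth-1 stump: root splits feature 0 at bin 3; node ids follow flatten_tree's
# pre-order numbering (root=0, left leaf=1, right leaf=2).
flat = np.full((4, 6), np.nan, dtype=np.float32)
flat[0] = [0.0, 3.0, 1.0, 2.0, 0.0, np.nan]
flat[1, 4], flat[1, 5] = 1.0, -0.5
flat[2, 4], flat[2, 5] = 1.0, 0.5
print(predict_one_tree(flat, np.array([2], dtype=np.int8)))  # -0.5
print(predict_one_tree(flat, np.array([7], dtype=np.int8)))  # 0.5
```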
@@ -38,7 +38,7 @@ __global__ void best_split_kernel_global_only(

  if (H_L >= min_child_samples && H_R >= min_child_samples)
  {
- float gain = (G_L * G_L) / (H_L + eps) + (G_R * G_R) / (H_R + eps);
+ float gain = (G_L * G_L) / (H_L + eps) + (G_R * G_R) / (H_R + eps) - (G_total * G_total) / (H_total + eps);
  if (gain > best_gain)
  {
  best_gain = gain;
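The one-line change above subtracts the parent term `(G_total * G_total) / (H_total + eps)`, so the gain measures the improvement of splitting over not splitting, which is the standard GBDT split score and the quantity a `min_split_gain` threshold should be compared against. A NumPy reference of the same scan for a single feature's histograms, offered as an illustration rather than the kernel itself; `eps` here stands in for the L2 term, and `H` behaves as a sample count under squared loss:

```python
# Reference scan mirroring best_split_kernel_global_only's corrected gain:
# gain = G_L^2/(H_L+eps) + G_R^2/(H_R+eps) - G_total^2/(H_total+eps)
import numpy as np

def best_split_for_feature(G, H, min_child_samples=20, eps=1e-6):
    G_total, H_total = G.sum(), H.sum()
    parent = (G_total * G_total) / (H_total + eps)
    best_gain, best_bin = 0.0, -1             # keep only splits with positive gain
    G_L = H_L = 0.0
    for b in range(len(G) - 1):               # candidate split: "bin <= b" vs "bin > b"
        G_L += G[b]
        H_L += H[b]
        G_R, H_R = G_total - G_L, H_total - H_L
        if H_L >= min_child_samples and H_R >= min_child_samples:
            gain = G_L * G_L / (H_L + eps) + G_R * G_R / (H_R + eps) - parent
            if gain > best_gain:
                best_gain, best_bin = gain, b
    return best_bin, best_gain

G = np.array([-30.0, -10.0, 15.0, 25.0])
H = np.array([25.0, 25.0, 25.0, 25.0])
print(best_split_for_feature(G, H))  # -> (1, ~64.0): split where the gradients change sign
```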
@@ -107,12 +107,6 @@ void launch_histogram_kernel_cuda(
  N, F, B);
  }

- #define CHECK_CUDA(x) TORCH_CHECK(x.is_cuda(), #x " must be a CUDA tensor")
- #define CHECK_CONTIGUOUS(x) TORCH_CHECK(x.is_contiguous(), #x " must be contiguous")
- #define CHECK_INPUT(x) \
- CHECK_CUDA(x); \
- CHECK_CONTIGUOUS(x)
-
  // CUDA kernel: tiled, 64-bit safe
  __global__ void histogram_tiled_kernel(
  const int8_t *__restrict__ bin_indices, // [N, F]
@@ -148,10 +142,6 @@ void launch_histogram_kernel_cuda_2(
  int threads_per_block = 256,
  int rows_per_thread = 1)
  {
- CHECK_INPUT(bin_indices);
- CHECK_INPUT(gradients);
- CHECK_INPUT(grad_hist);
- CHECK_INPUT(hess_hist);

  int64_t N = bin_indices.size(0);
  int64_t F = bin_indices.size(1);
@@ -233,10 +223,6 @@ void launch_histogram_kernel_cuda_configurable(
  int threads_per_block = 256,
  int rows_per_thread = 1)
  {
- CHECK_INPUT(bin_indices);
- CHECK_INPUT(gradients);
- CHECK_INPUT(grad_hist);
- CHECK_INPUT(hess_hist);

  int64_t N = bin_indices.size(0);
  int64_t F = bin_indices.size(1);
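For orientation, the launchers above drive kernels that accumulate per-(feature, bin) gradient and hessian sums over pre-binned `int8` data; the removed `CHECK_INPUT` macros were CUDA/contiguity asserts on those tensors, now dropped from the launch path. A NumPy reference for what the histograms contain; since the Python wrapper passes only gradients, the hessian histogram is taken here to be a per-row count, which matches squared-error loss, and is an assumption rather than something shown in this hunk:

```python
# NumPy reference for the gradient/hessian histograms the CUDA kernels build.
import numpy as np

def histograms_reference(bin_indices, gradients, num_bins):
    N, F = bin_indices.shape                            # bin_indices: int8, shape [N, F]
    grad_hist = np.zeros((F, num_bins), dtype=np.float32)
    hess_hist = np.zeros((F, num_bins), dtype=np.float32)
    for f in range(F):
        np.add.at(grad_hist[f], bin_indices[:, f], gradients)  # sum of gradients per bin
        np.add.at(hess_hist[f], bin_indices[:, f], 1.0)        # row count per bin
    return grad_hist, hess_hist

rng = np.random.default_rng(0)
bins = rng.integers(0, 10, size=(1000, 3)).astype(np.int8)
grads = rng.normal(size=1000).astype(np.float32)
g, h = histograms_reference(bins, grads, num_bins=10)
print(g.shape, h.sum(axis=1))   # (3, 10), each feature's hessian histogram sums to 1000
```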
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: warpgbm
- Version: 0.1.21
+ Version: 0.1.23
  Summary: A fast GPU-accelerated Gradient Boosted Decision Tree library with PyTorch + CUDA
  License: GNU GENERAL PUBLIC LICENSE
  Version 3, 29 June 2007
@@ -704,26 +704,20 @@ WarpGBM is a high-performance, GPU-accelerated Gradient Boosted Decision Tree (G

  ---

- ## Performance Note
-
- In our initial tests on an NVIDIA 3090 (local) and A100 (Google Colab Pro), WarpGBM achieves **14x to 20x faster training times** compared to LightGBM's CPU version and **2x faster** on the GPU version using default configurations. Speed also outperforms XGBoost and CatBoost on regression problems. It also consumes **significantly less RAM and CPU**. These early results hint at more thorough benchmarking to come.
-
- ---
-
  ## Benchmarks

  ### Scikit-Learn Synthetic Data: 1 Million Rows and 1,000 Features

- In this benchmark we compare the speed and in-sample correlation of **WarpGBM v0.1.19** against LightGBM, XGBoost and CatBoost, all with their GPU-enabled versions. This benchmark runs on Google Colab with the L4 GPU environment. The CPU versions don't even come close to the speed here so we didn't test them.
+ In this benchmark we compare the speed and in-sample correlation of **WarpGBM v0.1.21** against LightGBM, XGBoost and CatBoost, all with their GPU-enabled versions. This benchmark runs on Google Colab with the L4 GPU environment.

  ```
- WarpGBM: corr = 0.8882, train = 21.8s, infer = 11.6s
- XGBoost: corr = 0.8877, train = 33.4s, infer = 8.1s
- LightGBM: corr = 0.8604, train = 30.2s, infer = 1.4s
- CatBoost: corr = 0.8935, train = 377.9s, infer = 375.8s
+ WarpGBM: corr = 0.8882, train = 18.7s, infer = 4.9s
+ XGBoost: corr = 0.8877, train = 33.1s, infer = 8.1s
+ LightGBM: corr = 0.8604, train = 30.3s, infer = 1.4s
+ CatBoost: corr = 0.8935, train = 400.0s, infer = 382.6s
  ```

- Colab Notebook: https://colab.research.google.com/drive/16U1kbYlD5HibGbnF5NGsjChZ1p1IA2pK
+ Colab Notebook: https://colab.research.google.com/drive/16U1kbYlD5HibGbnF5NGsjChZ1p1IA2pK?usp=sharing

  ---

@@ -746,7 +740,7 @@ pip install warpgbm
  This installs from PyPI and also compiles CUDA code locally during installation. This method works well **if your environment already has PyTorch with GPU support** installed and configured.

  > **Tip:**\
- > If you encounter an error related to mismatched or missing CUDA versions, try installing with the following flag:
+ > If you encounter an error related to mismatched or missing CUDA versions, try installing with the following flag. This is currently required in the Colab environments.
  >
  > ```bash
  > pip install warpgbm --no-build-isolation
@@ -886,8 +880,7 @@ No installation required — just press **"Open in Playground"**, then **Run All

  ### Methods:
  - `.fit(X, y, era_id=None)`: Train the model. `X` can be raw floats or pre-binned `int8` data. `era_id` is optional and used internally.
- - `.predict(X, chunksize=50_000)`: Predict on new raw float or pre-binned data.
- - `.predict_numpy(X, chunksize=50_000)`: Same as `.predict(X)` but without using the GPU.
+ - `.predict(X)`: Predict on new data, using parallelized CUDA kernel.

  ---

@@ -1 +0,0 @@
- 0.1.21
@@ -1,341 +0,0 @@
1
- import torch
2
- import numpy as np
3
- from sklearn.base import BaseEstimator, RegressorMixin
4
- from warpgbm.cuda import node_kernel
5
- from tqdm import tqdm
6
- from typing import Tuple
7
- from torch import Tensor
8
-
9
- histogram_kernels = {
10
- 'hist1': node_kernel.compute_histogram,
11
- 'hist2': node_kernel.compute_histogram2,
12
- 'hist3': node_kernel.compute_histogram3
13
- }
14
-
15
- class WarpGBM(BaseEstimator, RegressorMixin):
16
- def __init__(
17
- self,
18
- num_bins=10,
19
- max_depth=3,
20
- learning_rate=0.1,
21
- n_estimators=100,
22
- min_child_weight=20,
23
- min_split_gain=0.0,
24
- verbosity=True,
25
- histogram_computer='hist3',
26
- threads_per_block=64,
27
- rows_per_thread=4,
28
- L2_reg=1e-6,
29
- L1_reg=0.0,
30
- device='cuda'
31
- ):
32
- # Validate arguments
33
- self._validate_hyperparams(
34
- num_bins=num_bins,
35
- max_depth=max_depth,
36
- learning_rate=learning_rate,
37
- n_estimators=n_estimators,
38
- min_child_weight=min_child_weight,
39
- min_split_gain=min_split_gain,
40
- histogram_computer=histogram_computer,
41
- threads_per_block=threads_per_block,
42
- rows_per_thread=rows_per_thread,
43
- L2_reg=L2_reg,
44
- L1_reg=L1_reg
45
- )
46
-
47
- self.num_bins = num_bins
48
- self.max_depth = max_depth
49
- self.learning_rate = learning_rate
50
- self.n_estimators = n_estimators
51
- self.forest = None
52
- self.bin_edges = None
53
- self.base_prediction = None
54
- self.unique_eras = None
55
- self.device = device
56
- self.root_gradient_histogram = None
57
- self.root_hessian_histogram = None
58
- self.gradients = None
59
- self.root_node_indices = None
60
- self.bin_indices = None
61
- self.Y_gpu = None
62
- self.num_features = None
63
- self.num_samples = None
64
- self.min_child_weight = min_child_weight
65
- self.min_split_gain = min_split_gain
66
- self.best_bin = torch.tensor([-1], dtype=torch.int32, device=self.device)
67
- self.compute_histogram = histogram_kernels[histogram_computer]
68
- self.threads_per_block = threads_per_block
69
- self.rows_per_thread = rows_per_thread
70
- self.L2_reg = L2_reg
71
- self.L1_reg = L1_reg
72
-
73
- def _validate_hyperparams(self, **kwargs):
74
- # Type checks
75
- int_params = [
76
- "num_bins", "max_depth", "n_estimators", "min_child_weight",
77
- "threads_per_block", "rows_per_thread"
78
- ]
79
- float_params = [
80
- "learning_rate", "min_split_gain", "L2_reg", "L1_reg"
81
- ]
82
-
83
- for param in int_params:
84
- if not isinstance(kwargs[param], int):
85
- raise TypeError(f"{param} must be an integer, got {type(kwargs[param])}.")
86
-
87
- for param in float_params:
88
- if not isinstance(kwargs[param], (float, int)): # Accept ints as valid floats
89
- raise TypeError(f"{param} must be a float, got {type(kwargs[param])}.")
90
-
91
- if not ( 2 <= kwargs["num_bins"] <= 127 ):
92
- raise ValueError("num_bins must be between 2 and 127 inclusive.")
93
- if kwargs["max_depth"] < 1:
94
- raise ValueError("max_depth must be at least 1.")
95
- if not (0.0 < kwargs["learning_rate"] <= 1.0):
96
- raise ValueError("learning_rate must be in (0.0, 1.0].")
97
- if kwargs["n_estimators"] <= 0:
98
- raise ValueError("n_estimators must be positive.")
99
- if kwargs["min_child_weight"] < 1:
100
- raise ValueError("min_child_weight must be a positive integer.")
101
- if kwargs["min_split_gain"] < 0:
102
- raise ValueError("min_split_gain must be non-negative.")
103
- if kwargs["threads_per_block"] <= 0 or kwargs["threads_per_block"] % 32 != 0:
104
- raise ValueError("threads_per_block should be a positive multiple of 32 (warp size).")
105
- if not ( 1 <= kwargs["rows_per_thread"] <= 16 ):
106
- raise ValueError("rows_per_thread must be positive between 1 and 16 inclusive.")
107
- if kwargs["L2_reg"] < 0 or kwargs["L1_reg"] < 0:
108
- raise ValueError("L2_reg and L1_reg must be non-negative.")
109
- if kwargs["histogram_computer"] not in histogram_kernels:
110
- raise ValueError(f"Invalid histogram_computer: {kwargs['histogram_computer']}. Choose from {list(histogram_kernels.keys())}.")
111
-
112
- def fit(self, X, y, era_id=None):
113
- if era_id is None:
114
- era_id = np.ones(X.shape[0], dtype='int32')
115
- self.bin_indices, era_indices, self.bin_edges, self.unique_eras, self.Y_gpu = self.preprocess_gpu_data(X, y, era_id)
116
- self.num_samples, self.num_features = X.shape
117
- self.gradients = torch.zeros_like(self.Y_gpu)
118
- self.root_node_indices = torch.arange(self.num_samples, device=self.device)
119
- self.base_prediction = self.Y_gpu.mean().item()
120
- self.gradients += self.base_prediction
121
- self.best_gains = torch.zeros(self.num_features, device=self.device)
122
- self.best_bins = torch.zeros(self.num_features, device=self.device, dtype=torch.int32)
123
- with torch.no_grad():
124
- self.forest = self.grow_forest()
125
- return self
126
-
127
- def preprocess_gpu_data(self, X_np, Y_np, era_id_np):
128
- with torch.no_grad():
129
- self.num_samples, self.num_features = X_np.shape
130
- Y_gpu = torch.from_numpy(Y_np).type(torch.float32).to(self.device)
131
- era_id_gpu = torch.from_numpy(era_id_np).type(torch.int32).to(self.device)
132
- is_integer_type = np.issubdtype(X_np.dtype, np.integer)
133
- if is_integer_type:
134
- max_vals = X_np.max(axis=0)
135
- if np.all(max_vals < self.num_bins):
136
- print("Detected pre-binned integer input — skipping quantile binning.")
137
- bin_indices = torch.from_numpy(X_np).to(self.device).contiguous().to(torch.int8)
138
-
139
- # We'll store None or an empty tensor in self.bin_edges
140
- # to indicate that we skip binning at predict-time
141
- bin_edges = torch.arange(1, self.num_bins, dtype=torch.float32).repeat(self.num_features, 1)
142
- bin_edges = bin_edges.to(self.device)
143
- unique_eras, era_indices = torch.unique(era_id_gpu, return_inverse=True)
144
- return bin_indices, era_indices, bin_edges, unique_eras, Y_gpu
145
- else:
146
- print("Integer input detected, but values exceed num_bins — falling back to quantile binning.")
147
-
148
- bin_indices = torch.empty((self.num_samples, self.num_features), dtype=torch.int8, device='cuda')
149
- bin_edges = torch.empty((self.num_features, self.num_bins - 1), dtype=torch.float32, device='cuda')
150
-
151
- X_np = torch.from_numpy(X_np).to(torch.float32).pin_memory()
152
-
153
- for f in range(self.num_features):
154
- X_f = X_np[:, f].to('cuda', non_blocking=True)
155
- quantiles = torch.linspace(0, 1, self.num_bins + 1, device='cuda', dtype=X_f.dtype)[1:-1]
156
- bin_edges_f = torch.quantile(X_f, quantiles, dim=0).contiguous() # shape: [B-1] for 1D input
157
- bin_indices_f = bin_indices[:, f].contiguous() # view into output
158
- node_kernel.custom_cuda_binner(X_f, bin_edges_f, bin_indices_f)
159
- bin_indices[:,f] = bin_indices_f
160
- bin_edges[f,:] = bin_edges_f
161
-
162
- unique_eras, era_indices = torch.unique(era_id_gpu, return_inverse=True)
163
- return bin_indices, era_indices, bin_edges, unique_eras, Y_gpu
164
-
165
- def compute_histograms(self, bin_indices_sub, gradients):
166
- grad_hist = torch.zeros((self.num_features, self.num_bins), device=self.device, dtype=torch.float32)
167
- hess_hist = torch.zeros((self.num_features, self.num_bins), device=self.device, dtype=torch.float32)
168
-
169
- self.compute_histogram(
170
- bin_indices_sub,
171
- gradients,
172
- grad_hist,
173
- hess_hist,
174
- self.num_bins,
175
- self.threads_per_block,
176
- self.rows_per_thread
177
- )
178
- return grad_hist, hess_hist
179
-
180
- def find_best_split(self, gradient_histogram, hessian_histogram):
181
- node_kernel.compute_split(
182
- gradient_histogram,
183
- hessian_histogram,
184
- self.min_split_gain,
185
- self.min_child_weight,
186
- self.L2_reg,
187
- self.best_gains,
188
- self.best_bins,
189
- self.threads_per_block
190
- )
191
-
192
- if torch.all(self.best_bins == -1):
193
- return -1, -1 # No valid split found
194
-
195
- f = torch.argmax(self.best_gains).item()
196
- b = self.best_bins[f].item()
197
-
198
- return f, b
199
-
200
- def grow_tree(self, gradient_histogram, hessian_histogram, node_indices, depth):
201
- if depth == self.max_depth:
202
- leaf_value = self.residual[node_indices].mean()
203
- self.gradients[node_indices] += self.learning_rate * leaf_value
204
- return {"leaf_value": leaf_value.item(), "samples": node_indices.numel()}
205
-
206
- parent_size = node_indices.numel()
207
- best_feature, best_bin = self.find_best_split(gradient_histogram, hessian_histogram)
208
-
209
- if best_feature == -1:
210
- leaf_value = self.residual[node_indices].mean()
211
- self.gradients[node_indices] += self.learning_rate * leaf_value
212
- return {"leaf_value": leaf_value.item(), "samples": parent_size}
213
-
214
- split_mask = (self.bin_indices[node_indices, best_feature] <= best_bin)
215
- left_indices = node_indices[split_mask]
216
- right_indices = node_indices[~split_mask]
217
-
218
- left_size = left_indices.numel()
219
- right_size = right_indices.numel()
220
-
221
-
222
- if left_size <= right_size:
223
- grad_hist_left, hess_hist_left = self.compute_histograms( self.bin_indices[left_indices], self.residual[left_indices] )
224
- grad_hist_right = gradient_histogram - grad_hist_left
225
- hess_hist_right = hessian_histogram - hess_hist_left
226
- else:
227
- grad_hist_right, hess_hist_right = self.compute_histograms( self.bin_indices[right_indices], self.residual[right_indices] )
228
- grad_hist_left = gradient_histogram - grad_hist_right
229
- hess_hist_left = hessian_histogram - hess_hist_right
230
-
231
- new_depth = depth + 1
232
- left_child = self.grow_tree(grad_hist_left, hess_hist_left, left_indices, new_depth)
233
- right_child = self.grow_tree(grad_hist_right, hess_hist_right, right_indices, new_depth)
234
-
235
- return { "feature": best_feature, "bin": best_bin, "left": left_child, "right": right_child }
236
-
237
- def grow_forest(self):
238
- forest = [{} for _ in range(self.n_estimators)]
239
- self.training_loss = []
240
-
241
- for i in tqdm( range(self.n_estimators) ):
242
- self.residual = self.Y_gpu - self.gradients
243
-
244
- self.root_gradient_histogram, self.root_hessian_histogram = \
245
- self.compute_histograms(self.bin_indices, self.residual)
246
-
247
- tree = self.grow_tree(
248
- self.root_gradient_histogram,
249
- self.root_hessian_histogram,
250
- self.root_node_indices,
251
- depth=0
252
- )
253
- forest[i] = tree
254
- # loss = ((self.Y_gpu - self.gradients) ** 2).mean().item()
255
- # self.training_loss.append(loss)
256
- # print(f"🌲 Tree {i+1}/{self.n_estimators} - MSE: {loss:.6f}")
257
-
258
- print("Finished training forest.")
259
- return forest
260
-
261
- def predict(self, X_np):
262
- X_tensor = torch.from_numpy(X_np).to(torch.float32).pin_memory()
263
- num_samples = X_tensor.size(0)
264
- bin_indices = torch.zeros((num_samples, self.num_features), dtype=torch.int8, device=self.device)
265
-
266
- with torch.no_grad():
267
- for f in range(self.num_features):
268
- X_f = X_tensor[:, f].to(self.device, non_blocking=True)
269
- bin_edges_f = self.bin_edges[f]
270
- bin_indices_f = bin_indices[:, f].contiguous()
271
- node_kernel.custom_cuda_binner(X_f, bin_edges_f, bin_indices_f)
272
- bin_indices[:, f] = bin_indices_f
273
-
274
- tree_tensor = torch.stack([
275
- self.flatten_tree(tree, max_nodes=2**(self.max_depth + 1))
276
- for tree in self.forest
277
- ]).to(self.device)
278
-
279
- out = torch.zeros(num_samples, device=self.device)
280
- node_kernel.predict_forest(
281
- bin_indices.contiguous(),
282
- tree_tensor.contiguous(),
283
- self.learning_rate,
284
- out
285
- )
286
-
287
- return out.cpu().numpy()
288
-
289
- def flatten_tree(self, tree, max_nodes):
290
- """
291
- Convert a recursive tree structure into a flat matrix format.
292
-
293
- Each row in the output represents a node:
294
- - Columns: [feature, bin, left_id, right_id, is_leaf, value]
295
- - Internal nodes fill columns 0–3 and set is_leaf = 0
296
- - Leaf nodes fill only value and set is_leaf = 1
297
-
298
- Args:
299
- tree (list): A list containing a single root node (recursive dict form).
300
- max_nodes (int): Max number of nodes to allocate in the flat matrix.
301
-
302
- Returns:
303
- torch.Tensor: [max_nodes x 6] matrix representing the flattened tree.
304
- """
305
- flat = torch.full((max_nodes, 6), float('nan'), dtype=torch.float32)
306
- node_counter = [0]
307
- node_list = []
308
-
309
- def walk(node):
310
- curr_id = node_counter[0]
311
- node_counter[0] += 1
312
-
313
- new_node = {'node_id': curr_id}
314
- if 'leaf_value' in node:
315
- new_node['leaf_value'] = float(node['leaf_value'])
316
- else:
317
- new_node['best_feature'] = float(node['feature'])
318
- new_node['split_bin'] = float(node['bin'])
319
- new_node['left_id'] = node_counter[0]
320
- walk(node['left'])
321
- new_node['right_id'] = node_counter[0]
322
- walk(node['right'])
323
-
324
- node_list.append(new_node)
325
- return new_node
326
-
327
- walk(tree)
328
-
329
- for node in node_list:
330
- i = node['node_id']
331
- if 'leaf_value' in node:
332
- flat[i, 4] = 1.0
333
- flat[i, 5] = node['leaf_value']
334
- else:
335
- flat[i, 0] = node['best_feature']
336
- flat[i, 1] = node['split_bin']
337
- flat[i, 2] = node['left_id']
338
- flat[i, 3] = node['right_id']
339
- flat[i, 4] = 0.0
340
-
341
- return flat
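Both the removed `core.py` above and its replacement earlier in this diff bin each float feature by interior quantile edges (`torch.linspace(0, 1, num_bins + 1)[1:-1]` fed to `torch.quantile`) before handing `int8` bin indices to the CUDA kernels. A CPU-only sketch of that step; the actual binner is `node_kernel.custom_cuda_binner`, whose exact boundary convention is not shown in this diff, so `searchsorted(..., right=True)` is one plausible reading:

```python
# CPU sketch of per-feature quantile binning as in preprocess_gpu_data
# (illustrative only; the package performs this per feature on the GPU).
import torch

def quantile_bin_feature(x, num_bins):
    qs = torch.linspace(0, 1, num_bins + 1, dtype=x.dtype)[1:-1]   # interior quantiles
    edges = torch.quantile(x, qs)                                  # [num_bins - 1] edges
    bins = torch.searchsorted(edges, x, right=True)                # 0 .. num_bins - 1
    return bins.to(torch.int8), edges

x = torch.randn(1000)
bins, edges = quantile_bin_feature(x, num_bins=10)
print(int(bins.min()), int(bins.max()), tuple(edges.shape))        # 0 9 (9,)
```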