warpgbm 0.1.10__tar.gz → 0.1.12__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {warpgbm-0.1.10/warpgbm.egg-info → warpgbm-0.1.12}/PKG-INFO +1 -1
- warpgbm-0.1.12/README.md +167 -0
- warpgbm-0.1.12/tests/test_fit_predict_corr.py +68 -0
- warpgbm-0.1.12/version.txt +1 -0
- {warpgbm-0.1.10 → warpgbm-0.1.12}/warpgbm/core.py +8 -6
- {warpgbm-0.1.10 → warpgbm-0.1.12/warpgbm.egg-info}/PKG-INFO +1 -1
- warpgbm-0.1.10/README.md +0 -60
- warpgbm-0.1.10/tests/test_fit_predict_corr.py +0 -29
- warpgbm-0.1.10/version.txt +0 -1
- {warpgbm-0.1.10 → warpgbm-0.1.12}/LICENSE +0 -0
- {warpgbm-0.1.10 → warpgbm-0.1.12}/MANIFEST.in +0 -0
- {warpgbm-0.1.10 → warpgbm-0.1.12}/pyproject.toml +0 -0
- {warpgbm-0.1.10 → warpgbm-0.1.12}/setup.cfg +0 -0
- {warpgbm-0.1.10 → warpgbm-0.1.12}/setup.py +0 -0
- {warpgbm-0.1.10 → warpgbm-0.1.12}/tests/__init__.py +0 -0
- {warpgbm-0.1.10 → warpgbm-0.1.12}/warpgbm/__init__.py +0 -0
- {warpgbm-0.1.10 → warpgbm-0.1.12}/warpgbm/cuda/__init__.py +0 -0
- {warpgbm-0.1.10 → warpgbm-0.1.12}/warpgbm/cuda/best_split_kernel.cu +0 -0
- {warpgbm-0.1.10 → warpgbm-0.1.12}/warpgbm/cuda/histogram_kernel.cu +0 -0
- {warpgbm-0.1.10 → warpgbm-0.1.12}/warpgbm/cuda/node_kernel.cpp +0 -0
- {warpgbm-0.1.10 → warpgbm-0.1.12}/warpgbm.egg-info/SOURCES.txt +0 -0
- {warpgbm-0.1.10 → warpgbm-0.1.12}/warpgbm.egg-info/dependency_links.txt +0 -0
- {warpgbm-0.1.10 → warpgbm-0.1.12}/warpgbm.egg-info/requires.txt +0 -0
- {warpgbm-0.1.10 → warpgbm-0.1.12}/warpgbm.egg-info/top_level.txt +0 -0
warpgbm-0.1.12/README.md
ADDED
@@ -0,0 +1,167 @@
# WarpGBM

WarpGBM is a high-performance, GPU-accelerated Gradient Boosted Decision Tree (GBDT) library built with PyTorch and CUDA. It offers blazing-fast histogram-based training and efficient prediction, with compatibility for research and production workflows.

---

## Features

- GPU-accelerated training and histogram construction using custom CUDA kernels
- Drop-in scikit-learn style interface
- Supports pre-binned data or automatic quantile binning
- Fully differentiable prediction path
- Simple install with `pip`

---

## Performance Note

In our initial tests on an NVIDIA 3090 (local) and A100 (Google Colab Pro), WarpGBM achieves **14x to 20x faster training times** compared to LightGBM using default configurations. It also consumes **significantly less RAM and CPU**. These early results hint at more thorough benchmarking to come.

---

## Installation

### 🔧 Recommended (GitHub, always latest):

```bash
pip install git+https://github.com/jefferythewind/warpgbm.git
```

This installs the latest version directly from GitHub and compiles CUDA extensions on your machine using your **local PyTorch and CUDA setup**. It's the most reliable method for ensuring compatibility and staying up to date with the latest features.

### 📦 Alternatively (PyPI, stable releases):

```bash
pip install warpgbm
```

This installs from PyPI and also compiles CUDA code locally during installation. This method works well **if your environment already has PyTorch with GPU support** installed and configured.

> 💡 **Tip:**\
> If you encounter an error related to mismatched or missing CUDA versions, try installing with the following flag:
>
> ```bash
> pip install warpgbm --no-build-isolation
> ```

Before either method, make sure you’ve installed PyTorch with GPU support:\
👉 [https://pytorch.org/get-started/locally/](https://pytorch.org/get-started/locally/)

---

## Example

```python
import numpy as np
from sklearn.datasets import make_regression
from time import time
import lightgbm as lgb
from warpgbm import WarpGBM

# Create synthetic regression dataset
X, y = make_regression(n_samples=100_000, n_features=500, noise=0.1, random_state=42)
X = X.astype(np.float32)
y = y.astype(np.float32)

# Train LightGBM
start = time()
lgb_model = lgb.LGBMRegressor(max_depth=5, n_estimators=100, learning_rate=0.01, max_bin=7)
lgb_model.fit(X, y)
lgb_time = time() - start
lgb_preds = lgb_model.predict(X)

# Train WarpGBM
start = time()
wgbm_model = WarpGBM(max_depth=5, n_estimators=100, learning_rate=0.01, num_bins=7)
wgbm_model.fit(X, y)
wgbm_time = time() - start
wgbm_preds = wgbm_model.predict(X)

# Results
print(f"LightGBM: corr = {np.corrcoef(lgb_preds, y)[0,1]:.4f}, time = {lgb_time:.2f}s")
print(f"WarpGBM: corr = {np.corrcoef(wgbm_preds, y)[0,1]:.4f}, time = {wgbm_time:.2f}s")
```

**🧪 Results (Ryzen 9 CPU, NVIDIA 3090 GPU):**

```
LightGBM: corr = 0.8742, time = 37.33s
WarpGBM:  corr = 0.8621, time = 5.40s
```

---

## Pre-binned Data Example (Numerai)

WarpGBM can save additional training time if your dataset is already pre-binned. The Numerai tournament data is a great example:

```python
import pandas as pd
from numerapi import NumerAPI
from time import time
import lightgbm as lgb
from warpgbm import WarpGBM
import numpy as np

napi = NumerAPI()
napi.download_dataset('v5.0/train.parquet', 'train.parquet')
train = pd.read_parquet('train.parquet')

feature_set = [f for f in train.columns if 'feature' in f]
target = 'target_cyrus'

X_np = train[feature_set].astype('int8').values
Y_np = train[target].values

# LightGBM
start = time()
lgb_model = lgb.LGBMRegressor(max_depth=5, n_estimators=100, learning_rate=0.01, max_bin=7)
lgb_model.fit(X_np, Y_np)
lgb_time = time() - start
lgb_preds = lgb_model.predict(X_np)

# WarpGBM
start = time()
wgbm_model = WarpGBM(max_depth=5, n_estimators=100, learning_rate=0.01, num_bins=7)
wgbm_model.fit(X_np, Y_np)
wgbm_time = time() - start
wgbm_preds = wgbm_model.predict(X_np)

# Results
print(f"LightGBM: corr = {np.corrcoef(lgb_preds, Y_np)[0,1]:.4f}, time = {lgb_time:.2f}s")
print(f"WarpGBM: corr = {np.corrcoef(wgbm_preds, Y_np)[0,1]:.4f}, time = {wgbm_time:.2f}s")
```

---

## Documentation

### `WarpGBM` Parameters:
- `num_bins`: Number of histogram bins to use (default: 10)
- `max_depth`: Maximum depth of trees (default: 3)
- `learning_rate`: Shrinkage rate applied to leaf outputs (default: 0.1)
- `n_estimators`: Number of boosting iterations (default: 100)
- `min_child_weight`: Minimum sum of instance weight needed in a child (default: 20)
- `min_split_gain`: Minimum loss reduction required to make a further partition (default: 0.0)
- `verbosity`: Whether to print training logs (default: True)
- `histogram_computer`: Choice of histogram kernel (`'hist1'`, `'hist2'`, `'hist3'`) (default: `'hist3'`)
- `threads_per_block`: CUDA threads per block (default: 32)
- `rows_per_thread`: Number of training rows processed per thread (default: 4)
- `device`: Device to train on (`'cuda'` or `'cpu'`, default: `'cuda'`)
- `split_type`: Algorithm used to choose the best split (`'v1'` = CUDA kernel, `'v2'` = torch-based) (default: `'v2'`)

### Methods:
- `.fit(X, y, era_id=None)`: Train the model. `X` can be raw floats or pre-binned `int8` data. `era_id` is optional and used internally.
- `.predict(X)`: Predict on new raw float or pre-binned data.
- `.predict_data(bin_indices)`: Predict from binned data directly (NumPy `int8` matrix).
- `.grow_forest()`: Manually triggers the tree construction loop (usually not needed).
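
The examples above only exercise the raw-float path. As a minimal sketch (not taken from the package; it assumes pre-binned inputs are `int8` bin indices in `[0, num_bins)` and that `era_id` is an integer array aligned with the rows), the pre-binned and `era_id` paths described in the method list could be used like this:

```python
import numpy as np
from warpgbm import WarpGBM

# Hypothetical pre-binned data: int8 bin indices in [0, num_bins), plus per-row era labels.
X_binned = np.random.randint(0, 7, size=(1_000, 20)).astype(np.int8)
y = np.random.randn(1_000).astype(np.float32)
era = np.repeat(np.arange(10), 100).astype(np.int32)

model = WarpGBM(num_bins=7, max_depth=5, n_estimators=50, learning_rate=0.05)
model.fit(X_binned, y, era_id=era)            # X may be raw floats or pre-binned int8
preds = model.predict(X_binned)               # accepts raw or pre-binned input
preds_direct = model.predict_data(X_binned)   # predict straight from the binned matrix
```
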
---

## Acknowledgements

WarpGBM builds on the shoulders of PyTorch, scikit-learn, LightGBM, and the CUDA ecosystem. Thanks to all contributors in the GBDT research and engineering space.

---
warpgbm-0.1.12/tests/test_fit_predict_corr.py
ADDED
@@ -0,0 +1,68 @@
import numpy as np
from warpgbm import WarpGBM

def test_fit_predict_correlation():
    np.random.seed(42)
    N = 500
    F = 5
    X = np.random.randn(N, F).astype(np.float32)
    true_weights = np.array([0.5, -1.0, 2.0, 0.0, 1.0])
    noise = 0.1 * np.random.randn(N)
    y = (X @ true_weights + noise).astype(np.float32)
    era = np.zeros(N, dtype=np.int32)
    corrs = []

    model = WarpGBM(
        max_depth = 10,
        num_bins = 10,
        n_estimators = 10,
        learning_rate = 1,
        verbosity=False,
        histogram_computer='hist1',
        threads_per_block=32,
        rows_per_thread=4
    )

    model.fit(X, y, era_id=era)
    preds = model.predict(X)

    # Pearson correlation in-sample
    corr = np.corrcoef(preds, y)[0, 1]
    corrs.append(corr)

    model = WarpGBM(
        max_depth = 10,
        num_bins = 10,
        n_estimators = 10,
        learning_rate = 1,
        verbosity=False,
        histogram_computer='hist2',
        threads_per_block=32,
        rows_per_thread=4
    )

    model.fit(X, y, era_id=era)
    preds = model.predict(X)

    # Pearson correlation in-sample
    corr = np.corrcoef(preds, y)[0, 1]
    corrs.append(corr)

    model = WarpGBM(
        max_depth = 10,
        num_bins = 10,
        n_estimators = 10,
        learning_rate = 1,
        verbosity=False,
        histogram_computer='hist3',
        threads_per_block=32,
        rows_per_thread=4
    )

    model.fit(X, y, era_id=era)
    preds = model.predict(X)

    # Pearson correlation in-sample
    corr = np.corrcoef(preds, y)[0, 1]
    corrs.append(corr)
    assert ( np.array(corrs) > 0.95 ).all(), f"In-sample correlation too low: {corr:.4f}"
warpgbm-0.1.12/version.txt
ADDED
@@ -0,0 +1 @@
0.1.12
{warpgbm-0.1.10 → warpgbm-0.1.12}/warpgbm/core.py
CHANGED
@@ -20,9 +20,10 @@ class WarpGBM(BaseEstimator, RegressorMixin):
         min_child_weight=20,
         min_split_gain=0.0,
         verbosity=True,
-        histogram_computer='
-        threads_per_block=
-        rows_per_thread=
+        histogram_computer='hist3',
+        threads_per_block=64,
+        rows_per_thread=4,
+        L2_reg = 1e-6,
         device = 'cuda'
     ):
         self.num_bins = num_bins
@@ -52,6 +53,7 @@ class WarpGBM(BaseEstimator, RegressorMixin):
         self.compute_histogram = histogram_kernels[histogram_computer]
         self.threads_per_block = threads_per_block
         self.rows_per_thread = rows_per_thread
+        self.L2_reg = L2_reg


     def fit(self, X, y, era_id=None):
@@ -124,9 +126,9 @@ class WarpGBM(BaseEstimator, RegressorMixin):
             hessian_histogram.contiguous(),
             self.num_features,
             self.num_bins,
-
-
-
+            self.min_split_gain,
+            self.min_child_weight,
+            self.L2_reg,
             self.out_feature,
             self.out_bin
         )
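
The hunks above update the constructor defaults for `histogram_computer`, `threads_per_block`, and `rows_per_thread` (the removed old values are truncated in this view) and introduce a new `L2_reg` keyword that is stored on the estimator and forwarded into the split-finding call shown in the last hunk. A minimal sketch of setting these explicitly (values are illustrative only, not recommendations):

```python
from warpgbm import WarpGBM

# Illustrative sketch: exercises the new L2_reg keyword and the 0.1.12 defaults shown above.
model = WarpGBM(
    histogram_computer='hist3',  # default in 0.1.12
    threads_per_block=64,        # default in 0.1.12
    rows_per_thread=4,
    L2_reg=1e-6,                 # new in 0.1.12; passed to the split-gain computation
    device='cuda',
)
```
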
warpgbm-0.1.10/README.md
DELETED
@@ -1,60 +0,0 @@
# WarpGBM

WarpGBM is a high-performance, GPU-accelerated Gradient Boosted Decision Tree (GBDT) library built with PyTorch and CUDA. It offers blazing-fast histogram-based training and efficient prediction, with compatibility for research and production workflows.

---

## Features

- GPU-accelerated training and histogram construction using custom CUDA kernels
- Drop-in scikit-learn style interface
- Supports pre-binned data or automatic quantile binning
- Fully differentiable prediction path
- Simple install with `pip`

---

## Performance Note

In our initial tests on an NVIDIA 3090 (local) and A100 (Google Colab Pro), WarpGBM achieves **14x to 20x faster training times** compared to LightGBM using default configurations. It also consumes **significantly less RAM and CPU**. These early results hint at more thorough benchmarking to come.

---

## Installation

First, install PyTorch for your system with GPU support:
https://pytorch.org/get-started/locally/

Then:

```bash
pip install warpgbm
```

Note: WarpGBM will compile custom CUDA extensions at install time using your installed PyTorch.

---

## Example

```python
import numpy as np
from warpgbm import WarpGBM

# Generate a simple regression dataset
X = np.random.randn(100, 5).astype(np.float32)
w = np.array([0.5, -1.0, 2.0, 0.0, 1.0])
y = (X @ w + 0.1 * np.random.randn(100)).astype(np.float32)

model = WarpGBM(max_depth=3, n_estimators=10)
model.fit(X, y)  # era_id is optional
preds = model.predict(X)
```

---

## Acknowledgements

WarpGBM builds on the shoulders of PyTorch, scikit-learn, LightGBM, and the CUDA ecosystem. Thanks to all contributors in the GBDT research and engineering space.

---
warpgbm-0.1.10/tests/test_fit_predict_corr.py
DELETED
@@ -1,29 +0,0 @@
import numpy as np
from warpgbm import WarpGBM

def test_fit_predict_correlation():
    np.random.seed(42)

    N = 200
    F = 5
    X = np.random.randn(N, F).astype(np.float32)
    true_weights = np.array([0.5, -1.0, 2.0, 0.0, 1.0])
    noise = 0.1 * np.random.randn(N)
    y = (X @ true_weights + noise).astype(np.float32)
    era = np.zeros(N, dtype=np.int32)

    model = WarpGBM(
        num_bins=16,
        max_depth=3,
        n_estimators=10,
        learning_rate=0.2,
        verbosity=False,
        device='cuda'
    )

    model.fit(X, y, era_id=era)
    preds = model.predict(X)

    # Pearson correlation in-sample
    corr = np.corrcoef(preds, y)[0, 1]
    assert corr > 0.95, f"In-sample correlation too low: {corr:.4f}"
warpgbm-0.1.10/version.txt
DELETED
@@ -1 +0,0 @@
0.1.10