warpgbm 0.1.27__tar.gz → 2.0.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {warpgbm-0.1.27/warpgbm.egg-info → warpgbm-2.0.0}/PKG-INFO +333 -150
- warpgbm-2.0.0/README.md +424 -0
- {warpgbm-0.1.27 → warpgbm-2.0.0}/pyproject.toml +1 -1
- warpgbm-2.0.0/tests/test_invariant.py +100 -0
- warpgbm-2.0.0/tests/test_multiclass.py +332 -0
- warpgbm-2.0.0/version.txt +1 -0
- {warpgbm-0.1.27 → warpgbm-2.0.0}/warpgbm/core.py +386 -53
- warpgbm-2.0.0/warpgbm/cuda/best_split_kernel.cu +89 -0
- {warpgbm-0.1.27 → warpgbm-2.0.0}/warpgbm/cuda/histogram_kernel.cu +24 -15
- {warpgbm-0.1.27 → warpgbm-2.0.0}/warpgbm/cuda/node_kernel.cpp +9 -8
- warpgbm-2.0.0/warpgbm/metrics.py +37 -0
- {warpgbm-0.1.27 → warpgbm-2.0.0/warpgbm.egg-info}/PKG-INFO +333 -150
- {warpgbm-0.1.27 → warpgbm-2.0.0}/warpgbm.egg-info/SOURCES.txt +2 -0
- warpgbm-0.1.27/README.md +0 -241
- warpgbm-0.1.27/version.txt +0 -1
- warpgbm-0.1.27/warpgbm/cuda/best_split_kernel.cu +0 -79
- warpgbm-0.1.27/warpgbm/metrics.py +0 -10
- {warpgbm-0.1.27 → warpgbm-2.0.0}/LICENSE +0 -0
- {warpgbm-0.1.27 → warpgbm-2.0.0}/MANIFEST.in +0 -0
- {warpgbm-0.1.27 → warpgbm-2.0.0}/setup.cfg +0 -0
- {warpgbm-0.1.27 → warpgbm-2.0.0}/setup.py +0 -0
- {warpgbm-0.1.27 → warpgbm-2.0.0}/tests/__init__.py +0 -0
- {warpgbm-0.1.27 → warpgbm-2.0.0}/tests/full_numerai_test.py +0 -0
- {warpgbm-0.1.27 → warpgbm-2.0.0}/tests/numerai_test.py +0 -0
- {warpgbm-0.1.27 → warpgbm-2.0.0}/tests/test_fit_predict_corr.py +0 -0
- {warpgbm-0.1.27 → warpgbm-2.0.0}/warpgbm/__init__.py +0 -0
- {warpgbm-0.1.27 → warpgbm-2.0.0}/warpgbm/cuda/__init__.py +0 -0
- {warpgbm-0.1.27 → warpgbm-2.0.0}/warpgbm/cuda/binner.cu +0 -0
- {warpgbm-0.1.27 → warpgbm-2.0.0}/warpgbm/cuda/predict.cu +0 -0
- {warpgbm-0.1.27 → warpgbm-2.0.0}/warpgbm.egg-info/dependency_links.txt +0 -0
- {warpgbm-0.1.27 → warpgbm-2.0.0}/warpgbm.egg-info/requires.txt +0 -0
- {warpgbm-0.1.27 → warpgbm-2.0.0}/warpgbm.egg-info/top_level.txt +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: warpgbm
-Version: 0.1.27
+Version: 2.0.0
 Summary: A fast GPU-accelerated Gradient Boosted Decision Tree library with PyTorch + CUDA
 License: GNU GENERAL PUBLIC LICENSE
                         Version 3, 29 June 2007
@@ -688,242 +688,425 @@ Dynamic: license-file

+# WarpGBM ⚡

+> **Neural-speed gradient boosting. GPU-native. Distribution-aware. Production-ready.**

-WarpGBM is a high-performance, GPU-accelerated Gradient Boosted Decision Tree (GBDT) library
+WarpGBM is a high-performance, GPU-accelerated Gradient Boosted Decision Tree (GBDT) library engineered from silicon up with PyTorch and custom CUDA kernels. Built for speed demons and researchers who refuse to compromise.

+## 🎯 What Sets WarpGBM Apart
+
+**Regression + Classification Unified**
+Train on continuous targets or multiclass labels with the same blazing-fast infrastructure.
+
+**Invariant Learning (DES Algorithm)**
+The only open-source GBDT that natively learns signals stable across shifting distributions. Powered by **[Directional Era-Splitting](https://arxiv.org/abs/2309.14496)** — because your data doesn't live in a vacuum.
+
+**GPU-Accelerated Everything**
+Custom CUDA kernels for binning, histograms, splits, and inference. No compromises, no CPU bottlenecks.
+
+**Scikit-Learn Compatible**
+Drop-in replacement. Same API you know, 10x the speed you need.

---

-##
+## 🚀 Quick Start

-###
+### Installation

+```bash
+# Latest from GitHub (recommended)
+pip install git+https://github.com/jefferythewind/warpgbm.git

+# Stable from PyPI
+pip install warpgbm
```
+
+**Prerequisites:** PyTorch with CUDA support ([install guide](https://pytorch.org/get-started/locally/))
+
+### Regression in 5 Lines
+
+```python
+from warpgbm import WarpGBM
+import numpy as np
+
+model = WarpGBM(objective='regression', max_depth=5, n_estimators=100)
+model.fit(X_train, y_train)
+predictions = model.predict(X_test)
```

+### Classification in 5 Lines
+
+```python
+from warpgbm import WarpGBM
+
+model = WarpGBM(objective='multiclass', max_depth=5, n_estimators=50)
+model.fit(X_train, y_train)  # y can be integers, strings, whatever
+probabilities = model.predict_proba(X_test)
+labels = model.predict(X_test)
+```

---

-##
+## 🎮 Features

-###
+### Core Engine
+- ⚡ **GPU-native CUDA kernels** for histogram building, split finding, binning, and prediction
+- 🎯 **Multi-objective support**: regression, binary, multiclass classification
+- 📊 **Pre-binned data optimization** — skip binning if your data's already quantized
+- 🔥 **Mixed precision support** — `float32` or `int8` inputs
+- 🎲 **Stochastic features** — `colsample_bytree` for regularization

+### Intelligence
+- 🧠 **Invariant learning via DES** — identifies signals that generalize across time/regimes/environments
+- 📈 **Smart initialization** — class priors for classification, mean for regression (see the sketch after this list)
+- 🎯 **Automatic label encoding** — handles strings, integers, whatever you throw at it
900
|
+
y = train['target'].values
|
847
901
|
|
848
|
-
|
849
|
-
|
850
|
-
|
851
|
-
LightGBM: corr = 0.0703, time = 643.88s
|
852
|
-
WarpGBM: corr = 0.0660, time = 49.16s
|
902
|
+
# WarpGBM detects pre-binned data and skips binning
|
903
|
+
model = WarpGBM(max_depth=5, n_estimators=100, num_bins=20)
|
904
|
+
model.fit(X, y) # Blazing fast!
|
853
905
|
```
|
854
906
|
|
907
|
+
**Result: 13× faster than LightGBM on Numerai data (49s vs 643s)**
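For context on what "pre-binned" means here: histogram GBDTs quantize each feature into a small number of integer bins before training, and Numerai features already arrive as small integers, so that quantization pass can be skipped. A rough sketch of the kind of check involved (an assumption for illustration, not WarpGBM's actual detection logic; the helper `looks_prebinned` is hypothetical):

```python
import numpy as np

def looks_prebinned(X: np.ndarray, num_bins: int) -> bool:
    # Integer-typed features whose values already lie in [0, num_bins)
    # can be used directly as bin indices, so no quantization is needed.
    return (
        np.issubdtype(X.dtype, np.integer)
        and X.min() >= 0
        and X.max() < num_bins
    )

X = np.random.randint(0, 5, size=(1_000, 20)).astype('int8')  # Numerai-style features
print(looks_prebinned(X, num_bins=20))  # True -> binning pass can be skipped
```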

---

+## 🧠 Invariant Learning: Why It Matters
+
+Most ML models assume your training and test data come from the same distribution. **Reality check: they don't.**
+
+- Stock prices shift with market regimes
+- User behavior changes over time
+- Experimental data varies by batch/site/condition
+
+**Traditional GBDT:** Learns any signal that correlates with the target, including fragile patterns that break OOD.
+
+**WarpGBM with DES:** Explicitly tests if each split generalizes across ALL environments (eras). Only keeps robust signals.

+### The Algorithm

+For each potential split, compute gain separately in each era. Only accept splits where:
+1. Gain is positive in ALL eras
+2. Split direction is consistent across eras

+This prevents overfitting to spurious correlations that only work in some time periods or environments.
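A compact NumPy sketch of that per-era acceptance test (an illustration using the standard second-order gain formula; the real check runs inside WarpGBM's CUDA split kernel and may differ in detail):

```python
import numpy as np

def era_split_accepted(G_left, H_left, G_right, H_right, lam=1e-6):
    """G_*/H_* are per-era gradient/hessian sums for one candidate split,
    each an array of shape (n_eras,)."""
    # Gain of the split, computed independently in every era
    parent = (G_left + G_right) ** 2 / (H_left + H_right + lam)
    gain = G_left**2 / (H_left + lam) + G_right**2 / (H_right + lam) - parent

    # 1. The split must help in ALL eras, not just on average
    if not np.all(gain > 0):
        return False

    # 2. The split must point the same way in every era: compare the
    #    leaf values each child would receive (-G / (H + lam))
    direction = np.sign(G_right / (H_right + lam) - G_left / (H_left + lam))
    return bool(np.all(direction == direction[0]))
```

A split that looks great on average but reverses direction in one era is rejected, which is how regime-specific, spurious signals get filtered out.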

+### Visual Intuition
+
+<img src="https://github.com/user-attachments/assets/2be11ef3-6f2e-4636-ab91-307a73add247" alt="Era Splitting Visualization" width="400"/>
+
+**Left:** Standard training pools all data together — learns any signal that correlates.
+**Right:** Era-aware training demands signals work across all periods — learns robust features only.
+
+### Research Foundation
+
+- **Invariant Risk Minimization**: [Arjovsky et al., 2019](https://arxiv.org/abs/1907.02893)
+- **Hard-to-Vary Explanations**: [Parascandolo et al., 2020](https://arxiv.org/abs/2009.00329)
+- **Era Splitting for Trees**: [DeLise, 2023](https://arxiv.org/abs/2309.14496)

---

-##
-###
+## 📚 API Reference
+
+### Constructor Parameters
+
+```python
+WarpGBM(
+    objective='regression',      # 'regression', 'binary', or 'multiclass'
+    num_bins=10,                 # Histogram bins for feature quantization
+    max_depth=3,                 # Maximum tree depth
+    learning_rate=0.1,           # Shrinkage rate (aka eta)
+    n_estimators=100,            # Number of boosting rounds
+    min_child_weight=20,         # Min sum of instance weights in child node
+    min_split_gain=0.0,          # Min loss reduction to split
+    L2_reg=1e-6,                 # L2 leaf regularization
+    colsample_bytree=1.0,        # Feature subsample ratio per tree
+    threads_per_block=64,        # CUDA block size (tune for your GPU)
+    rows_per_thread=4,           # Rows processed per thread
+    device='cuda'                # 'cuda' or 'cpu' (GPU strongly recommended)
+)
```
+
+### Training Methods
+
+```python
+model.fit(
+    X,                           # Features: np.array shape (n_samples, n_features)
+    y,                           # Target: np.array shape (n_samples,)
+    era_id=None,                 # Optional: era labels for invariant learning
+    X_eval=None,                 # Optional: validation features
+    y_eval=None,                 # Optional: validation targets
+    eval_every_n_trees=None,     # Eval frequency (in rounds)
+    early_stopping_rounds=None,  # Stop if no improvement for N evals
+    eval_metric='mse'            # 'mse', 'rmsle', 'corr', 'logloss', 'accuracy'
)
```
-Train with optional validation set and early stopping.
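The eval_metric strings map to standard definitions. For reference, a plain-NumPy sketch of those metrics (illustrative formulas only; the package's own metrics.py may be organized differently):

```python
import numpy as np

def mse(y, pred):
    return np.mean((y - pred) ** 2)

def rmsle(y, pred):
    # assumes non-negative targets and predictions
    return np.sqrt(np.mean((np.log1p(pred) - np.log1p(y)) ** 2))

def corr(y, pred):
    return np.corrcoef(y, pred)[0, 1]

def logloss(y, proba, eps=1e-12):
    # y: integer class labels, proba: (n_samples, n_classes)
    proba = np.clip(proba, eps, 1.0)
    return -np.mean(np.log(proba[np.arange(len(y)), y]))

def accuracy(y, proba):
    return np.mean(proba.argmax(axis=1) == y)
```

Lower is better for mse, rmsle, and logloss; higher is better for corr and accuracy.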

+### Prediction Methods

+```python
+# Regression: returns predicted values
+predictions = model.predict(X)
+
+# Classification: returns class labels (decoded)
+labels = model.predict(X)
+
+# Classification: returns class probabilities
+probabilities = model.predict_proba(X)  # shape: (n_samples, n_classes)
```
+
+### Attributes
+
+```python
+model.classes_       # Unique class labels (classification only)
+model.num_classes    # Number of classes (classification only)
+model.forest         # Trained tree structures
+model.training_loss  # Training loss history
+model.eval_loss      # Validation loss history (if eval set provided)
+```
+
+---
+
+## 🔧 Installation Details
+
+### Linux / macOS (Recommended)
+
+```bash
+pip install git+https://github.com/jefferythewind/warpgbm.git
+```
+
+Compiles CUDA extensions using your local PyTorch + CUDA setup.
+
+### Colab / Mismatched CUDA Versions
+
+```bash
+pip install warpgbm --no-build-isolation
+```
+
+### Windows
+
+```bash
+git clone https://github.com/jefferythewind/warpgbm.git
+cd warpgbm
+python setup.py bdist_wheel
+pip install dist/warpgbm-*.whl
```
-Predict on new data, using parallelized CUDA kernel.

---

-##
+## 🎯 Use Cases

+**Financial ML:** Learn signals that work across market regimes
+**Time Series:** Robust forecasting across distribution shifts
+**Scientific Research:** Models that generalize across experimental batches
+**High-Speed Inference:** Production systems with millisecond SLAs
+**Kaggle/Competitions:** GPU-accelerated hyperparameter tuning
+**Multiclass Problems:** Image classification fallback, text categorization, fraud detection

---

-##
+## 🚧 Roadmap

+- [ ] Multi-GPU training support
+- [ ] SHAP value computation on GPU
+- [ ] Feature interaction constraints
+- [ ] Monotonic constraints
+- [ ] Custom loss functions
+- [ ] Distributed training
+- [ ] ONNX export for deployment

+---

+## 🙏 Acknowledgements

+Built on the shoulders of PyTorch, scikit-learn, LightGBM, XGBoost, and the CUDA ecosystem. Special thanks to the GBDT research community and all contributors.

+---

+## 📝 Version History
+
+### v2.0.0 (Current)
+- ✨ **Multiclass classification support** via softmax objective
+- 🎯 Binary classification mode
+- 📊 New metrics: log loss, accuracy
+- 🏷️ Automatic label encoding (supports strings)
+- 🔮 `predict_proba()` for probability outputs
+- ✅ Comprehensive test suite for classification
+- 🔒 Full backward compatibility with regression
+- 🐛 Fixed unused variable issue (#8)
+- 🧹 Removed unimplemented L1_reg parameter
+- 📚 Major documentation overhaul with AGENT_GUIDE.md
+
+### v1.0.0
+- 🧠 Invariant learning via Directional Era-Splitting (DES)
+- 🚀 VRAM optimizations
+- 📈 Era-aware histogram computation

### v0.1.26
+- 🐛 Memory bug fixes in prediction
+- 📊 Added correlation eval metric
+
+### v0.1.25
+- 🎲 Feature subsampling (`colsample_bytree`)
+
+### v0.1.23
+- ⏹️ Early stopping support
+- ✅ Validation set evaluation
+
+### v0.1.21
+- ⚡ CUDA prediction kernel (replaced vectorized Python)
+
+---
+
+## 📄 License
+
+MIT License - see [LICENSE](LICENSE) file
+
+---
+
+## 🤝 Contributing
+
+Pull requests welcome! See [AGENT_GUIDE.md](AGENT_GUIDE.md) for architecture details and development guidelines.
+
+---
+
+**Built with 🔥 by @jefferythewind**

+*"Train smarter. Predict faster. Generalize better."*