PyPI - torch-rechub - Versions diffs - 0.0.6__tar.gz → 0.1.0__tar.gz - Mend

torch-rechub 0.0.6__tar.gz → 0.1.0__tar.gz


Files changed (283)

{torch_rechub-0.0.6 → torch_rechub-0.1.0}/.github/workflows/ci.yml RENAMED

@@ -56,7 +56,7 @@ jobs:
           python-version: ${{ env.PYTHON_VERSION }}
       - name: Cache pip packages
-        uses: actions/cache@v4
+        uses: actions/cache@v5
         with:
           path: ~/.cache/pip
           key: ${{ runner.os }}-pip-lint-${{ hashFiles('pyproject.toml') }}
@@ -104,6 +104,9 @@ jobs:
       matrix:
         os: [ubuntu-latest, windows-latest, macos-latest]
+    env:
+      SKIP_MILVUS_TESTS: ${{ matrix.os != 'ubuntu-latest' && '1' || '0' }}
     steps:
       - name: Checkout code
         uses: actions/checkout@v6
@@ -114,7 +117,7 @@ jobs:
           python-version: '3.9'
       - name: Cache pip packages
-        uses: actions/cache@v4
+        uses: actions/cache@v5
         with:
           path: |
             ~/.cache/pip
@@ -136,12 +139,36 @@ jobs:
           # Install CPU-only PyTorch for faster CI
           pip install torch --index-url ${{ env.TORCH_INDEX_URL }}
           # Install the package with dev and onnx dependencies
-          pip install -e ".[dev,onnx]" || pip install -r requirements-dev.txt && pip install -e .
+          pip install -e ".[dev,annoy,faiss,milvus,onnx]" || pip install -r requirements-dev.txt && pip install -e .
+      - name: Start Milvus
+        if: matrix.os == 'ubuntu-latest'
+        run: |
+          # Download the installation script
+          curl -sfL https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh -o standalone_embed.sh
+          # Start the Docker container
+          bash standalone_embed.sh start
+      - name: Wait for Milvus
+        if: matrix.os == 'ubuntu-latest'
+        run: |
+          for i in {1..60}; do
+            if curl -fsS http://localhost:9091/healthz >/dev/null; then exit 0; fi
+            sleep 2
+          done
+          exit 1
       - name: Run tests
+        if: matrix.os != 'macos-latest'
         run: |
           pytest -c config/pytest.ini tests/ -v
+      - name: Run tests (skip indexing tests)
+        if: matrix.os == 'macos-latest'
+        run: |
+          pytest -c config/pytest.ini tests/ -v --ignore=tests/test_serving.py
       - name: Run tests with coverage (Ubuntu only)
         if: matrix.os == 'ubuntu-latest'
         run: |
@@ -155,6 +182,16 @@ jobs:
           flags: unittests
           name: codecov-umbrella
+      - name: Shutdown Milvus
+        if: always() && matrix.os == 'ubuntu-latest'
+        run: |
+          # Stop Milvus
+          bash standalone_embed.sh stop
+          # Delete Milvus data
+          bash standalone_embed.sh delete
   # ===================================================================
   # Dependency compatibility check (Python 3.10+) - only verifies that dependencies install successfully
   # (runs only on push/PR; skipped on release)
@@ -221,7 +258,7 @@ jobs:
           bandit -r torch_rechub/ -s B101,B311,B614 -x tests,docs,examples -f txt
       - name: Upload security scan results
-        uses: actions/upload-artifact@v5
+        uses: actions/upload-artifact@v6
         if: always()
         with:
           name: bandit-security-report
@@ -259,7 +296,7 @@ jobs:
           twine check dist/*
       - name: Upload build artifacts
-        uses: actions/upload-artifact@v5
+        uses: actions/upload-artifact@v6
         with:
           name: dist-packages
           path: dist/
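
The `Wait for Milvus` step added in this workflow polls `http://localhost:9091/healthz` up to 60 times with a 2-second pause, and fails the job if the endpoint never answers. As a rough illustration only, the same retry pattern could be sketched in Python as below. The names `http_probe` and `wait_until_healthy`, and the injectable `probe`/`sleep` parameters, are hypothetical; they are not part of the workflow or of torch-rechub.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_probe(url: str = "http://localhost:9091/healthz", timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with a 2xx status (roughly `curl -f`)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def wait_until_healthy(
    probe: Callable[[], bool] = http_probe,
    attempts: int = 60,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` up to `attempts` times, pausing `delay` seconds after each failure."""
    for _ in range(attempts):
        if probe():
            return True
        sleep(delay)
    return False  # the shell loop in the workflow exits with status 1 here
```

Passing `probe` and `sleep` as arguments keeps the loop testable without a running Milvus container.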

{torch_rechub-0.0.6 → torch_rechub-0.1.0}/CHANGELOG.md RENAMED

@@ -7,6 +7,22 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ---
+## [0.1.0] - 2025-12-17
+<!-- Release notes generated using configuration in .github/release.yml at main -->
+## What's Changed
+### ✨ Features
+* Update docs and tutorials && Add ONNX quantization utilities and enhance export  by @1985312383 in https://github.com/datawhalechina/torch-rechub/pull/150
+* REFACTOR+FEATURE: Standardize retrieval backends (ANNOY/FAISS/Milvus) by @ywuenthought in https://github.com/datawhalechina/torch-rechub/pull/151
+**Full Changelog**: https://github.com/datawhalechina/torch-rechub/compare/v0.0.6...v0.1.0
+---
 ## [0.0.6] - 2025-12-11
 <!-- Release notes generated using configuration in .github/release.yml at main -->

{torch_rechub-0.0.6 → torch_rechub-0.1.0}/CONTRIBUTING.md RENAMED

@@ -150,18 +150,25 @@ def test_deepfm_forward():
 ```python
 def train_model(model, data_loader, optimizer):
     """Train a recommendation model.
-    Args:
-        model (torch.nn.Module): The model to train
-        data_loader (DataLoader): Training data loader
-        optimizer (torch.optim.Optimizer): Optimizer for training
-    Returns:
-        float: Training loss
-    Example:
-        >>> model = DeepFM(features, mlp_params)
-        >>> loss = train_model(model, train_loader, optimizer)
+    Parameters
+    ----------
+    model : torch.nn.Module
+        Model to train.
+    data_loader : DataLoader
+        Training data loader.
+    optimizer : torch.optim.Optimizer
+        Optimizer for training.
+    Returns
+    -------
+    float
+        Training loss.
+    Examples
+    --------
+    >>> model = DeepFM(features, mlp_params)
+    >>> loss = train_model(model, train_loader, optimizer)
     """
     # Implementation here
 ```

{torch_rechub-0.0.6 → torch_rechub-0.1.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: torch-rechub
-Version: 0.0.6
+Version: 0.1.0
 Summary: A Pytorch Toolbox for Recommendation Models, Easy-to-use and Easy-to-extend.
 Project-URL: Homepage, https://github.com/datawhalechina/torch-rechub
 Project-URL: Documentation, https://www.torch-rechub.com
@@ -28,6 +28,8 @@ Requires-Dist: scikit-learn>=0.24.0
 Requires-Dist: torch>=1.10.0
 Requires-Dist: tqdm>=4.60.0
 Requires-Dist: transformers>=4.46.3
+Provides-Extra: annoy
+Requires-Dist: annoy>=1.17.2; extra == 'annoy'
 Provides-Extra: bigdata
 Requires-Dist: pyarrow~=21.0; extra == 'bigdata'
 Provides-Extra: dev
@@ -41,8 +43,13 @@ Requires-Dist: pytest-cov>=2.0; extra == 'dev'
 Requires-Dist: pytest>=6.0; extra == 'dev'
 Requires-Dist: toml>=0.10.2; extra == 'dev'
 Requires-Dist: yapf==0.43.0; extra == 'dev'
+Provides-Extra: faiss
+Requires-Dist: faiss-cpu==1.13.0; extra == 'faiss'
+Provides-Extra: milvus
+Requires-Dist: pymilvus>=2.6.5; extra == 'milvus'
 Provides-Extra: onnx
 Requires-Dist: onnx>=1.14.0; extra == 'onnx'
+Requires-Dist: onnxconverter-common>=1.14.0; extra == 'onnx'
 Requires-Dist: onnxruntime>=1.14.0; extra == 'onnx'
 Provides-Extra: tracking
 Requires-Dist: swanlab>=0.1.0; extra == 'tracking'
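
The metadata change above introduces three optional extras (`annoy`, `faiss`, `milvus`), so the retrieval backends are only importable when the matching extra has been installed. A minimal sketch of how a script might check which of them are present is shown below. `EXTRA_TO_MODULE` and `available_backends` are hypothetical helper names, not torch-rechub APIs; the mapping assumes the usual top-level module names of the `annoy`, `faiss-cpu` and `pymilvus` distributions.

```python
from importlib.util import find_spec
from typing import Dict

# Extra name (as listed in PKG-INFO) -> top-level module its dependency provides.
EXTRA_TO_MODULE = {
    "annoy": "annoy",
    "faiss": "faiss",      # provided by the faiss-cpu distribution
    "milvus": "pymilvus",
}


def available_backends() -> Dict[str, bool]:
    """Report, per extra, whether its module can be found without importing it."""
    return {extra: find_spec(module) is not None
            for extra, module in EXTRA_TO_MODULE.items()}


if __name__ == "__main__":
    for extra, present in available_backends().items():
        hint = "" if present else f'  (try: pip install "torch-rechub[{extra}]")'
        print(f"{extra:7s} {'installed' if present else 'missing'}{hint}")
```

`find_spec` only looks the module up, so the check stays cheap and does not trigger a heavy import such as FAISS.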

torch_rechub-0.1.0/docs/en/core/data.md ADDED

@@ -0,0 +1,86 @@
+---
+title: Data Pipeline
+description: Torch-RecHub data loading and preprocessing
+---
+# Data Pipeline
+Torch-RecHub offers datasets, generators, and utilities for recommendation data.
+## Datasets
+### TorchDataset
+Training/validation dataset with features and labels.
+```python
+from torch_rechub.utils.data import TorchDataset
+dataset = TorchDataset(x, y)
+```
+### PredictDataset
+Prediction-only dataset (features only).
+```python
+from torch_rechub.utils.data import PredictDataset
+dataset = PredictDataset(x)
+```
+## Data Generators
+### DataGenerator
+Build dataloaders for ranking / multi-task models.
+```python
+from torch_rechub.utils.data import DataGenerator
+dg = DataGenerator(x, y)
+train_dl, val_dl, test_dl = dg.generate_dataloader(
+    split_ratio=[0.7, 0.1],
+    batch_size=256,
+    num_workers=8,
+)
+```
+### MatchDataGenerator
+Build dataloaders for matching/retrieval models.
+```python
+from torch_rechub.utils.data import MatchDataGenerator
+dg = MatchDataGenerator(x, y)
+train_dl, test_dl, item_dl = dg.generate_dataloader(
+    x_test_user=x_test_user,
+    x_all_item=x_all_item,
+    batch_size=256,
+    num_workers=8,
+)
+```
+## Utilities
+### get_auto_embedding_dim
+Compute embedding dim from vocab size: ``int(floor(6 * num_classes**0.25))``.
+```python
+from torch_rechub.utils.data import get_auto_embedding_dim
+embed_dim = get_auto_embedding_dim(vocab_size=1000)
+```
+### get_loss_func
+Return default loss by task type: BCELoss for classification, MSELoss for regression.
+```python
+from torch_rechub.utils.data import get_loss_func
+loss_fn = get_loss_func(task_type="classification")
+```
+## Typical Flow
+1. Define features (Dense/Sparse/Sequence).
+2. Load raw data.
+3. Encode categorical features (e.g., LabelEncoder).
+4. Process sequences (pad/truncate).
+5. Construct samples (e.g., negative sampling).
+6. Use DataGenerator / MatchDataGenerator to build dataloaders.
+7. Train models with the trainers.
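
The added page quotes the embedding-size rule as `int(floor(6 * num_classes**0.25))`. The standalone sketch below reproduces only that arithmetic, to show how the dimension grows with vocabulary size. `auto_embedding_dim` is a hypothetical local function written from the quoted formula; it is not the library's `get_auto_embedding_dim`, and the sample vocabulary sizes are arbitrary.

```python
import math


def auto_embedding_dim(num_classes: int) -> int:
    """Embedding size from the quoted rule: floor(6 * num_classes ** 0.25)."""
    return int(math.floor(6 * num_classes ** 0.25))


if __name__ == "__main__":
    # Arbitrary vocabulary sizes, chosen only to show the growth rate.
    for vocab in (3, 100, 1000):
        print(vocab, auto_embedding_dim(vocab))
```

Because the rule uses a fourth root, the dimension grows slowly: moving from 100 to 1000 categories raises it from 18 to 33.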

torch_rechub-0.1.0/docs/en/core/evaluation.md ADDED

@@ -0,0 +1,207 @@
+---
+title: Training & Evaluation
+description: Torch-RecHub training and evaluation
+---
+# Training & Evaluation
+Torch-RecHub provides trainers for ranking, matching, multi-task, and generative models. All trainers expose a unified interface for training, evaluation, prediction, ONNX export, and optional experiment tracking/visualization.
+## Experiment Tracking & Visualization
+- Supports **WandB / SwanLab / TensorBoardX** as `model_logger`; you can pass a single instance or a list.
+- Auto-logs train/validation metrics and hyperparameters: `train/loss`, `learning_rate`, `val/auc` (CTR/Match), `val/task_i_score` (MTL), `val/accuracy` (Seq).
+- Set `model_logger=None` (default) for zero overhead when tracking is not needed.
+```python
+from torch_rechub.basic.tracking import WandbLogger, TensorBoardXLogger
+from torch_rechub.trainers import CTRTrainer
+wb = WandbLogger(project="rechub-demo", name="deepfm")
+tb = TensorBoardXLogger(log_dir="./runs/deepfm")
+trainer = CTRTrainer(model, model_logger=[wb, tb])
+trainer.fit(train_dataloader, val_dataloader)
+```
+## Trainers
+### CTRTrainer
+Used for ranking (CTR prediction) models such as DeepFM, Wide&Deep, DCN.
+```python
+from torch_rechub.trainers import CTRTrainer
+from torch_rechub.models.ranking import DeepFM
+model = DeepFM(deep_features=deep_features, fm_features=fm_features, mlp_params={"dims": [256, 128], "dropout": 0.2})
+trainer = CTRTrainer(
+    model=model,
+    optimizer_params={"lr": 0.001, "weight_decay": 0.0001},
+    n_epoch=50,
+    earlystop_patience=10,
+    device="cuda:0",
+    model_path="saved/deepfm"
+)
+trainer.fit(train_dataloader, val_dataloader)
+auc = trainer.evaluate(trainer.model, test_dataloader)
+trainer.export_onnx("deepfm.onnx")
+trainer.visualization(save_path="deepfm_architecture.pdf")
+```
+**Parameters**
+- `model`: Ranking model instance.
+- `optimizer_fn`: Optimizer function, default `torch.optim.Adam`.
+- `optimizer_params`: Optimizer parameters.
+- `regularization_params`: Regularization parameters.
+- `scheduler_fn`: Learning rate scheduler.
+- `scheduler_params`: Scheduler parameters.
+- `n_epoch`: Number of training epochs.
+- `earlystop_patience`: Patience for early stopping.
+- `device`: Training device.
+- `gpus`: List of GPU ids.
+- `loss_mode`: Boolean. `True` when the model returns only predictions; `False` when the model returns predictions plus auxiliary loss.
+- `model_path`: Path to save the model.
+### MatchTrainer
+Used for matching/retrieval models such as DSSM, YoutubeDNN, MIND.
+```python
+from torch_rechub.trainers import MatchTrainer
+from torch_rechub.models.matching import DSSM
+model = DSSM(
+    user_features=user_features,
+    item_features=item_features,
+    temperature=0.02,
+    user_params={"dims": [256, 128, 64]},
+    item_params={"dims": [256, 128, 64]}
+)
+trainer = MatchTrainer(
+    model=model,
+    mode=0,  # 0: point-wise, 1: pair-wise, 2: list-wise
+    optimizer_params={"lr": 0.001},
+    n_epoch=50,
+    device="cuda:0",
+    model_path="saved/dssm"
+)
+trainer.fit(train_dataloader)
+trainer.export_onnx("user_tower.onnx", mode="user")
+trainer.export_onnx("item_tower.onnx", mode="item")
+```
+**Parameters**
+- `model`: Matching model instance.
+- `mode`: Training mode, one of 0 (point-wise), 1 (pair-wise), 2 (list-wise).
+- `optimizer_fn`: Optimizer function, default `torch.optim.Adam`.
+- `optimizer_params`: Optimizer parameters.
+- `regularization_params`: Regularization parameters.
+- `scheduler_fn`: Learning rate scheduler.
+- `scheduler_params`: Scheduler parameters.
+- `n_epoch`: Number of training epochs.
+- `earlystop_patience`: Patience for early stopping.
+- `device`: Training device.
+- `gpus`: List of GPU ids.
+- `model_path`: Path to save the model.
+### MTLTrainer
+Used for multi-task models such as MMoE, PLE, ESMM, SharedBottom.
+```python
+from torch_rechub.trainers import MTLTrainer
+from torch_rechub.models.multi_task import MMOE
+model = MMOE(
+    features=features,
+    task_types=["classification", "classification"],
+    n_expert=8,
+    expert_params={"dims": [32,16]},
+    tower_params_list=[{"dims": [32, 16]}, {"dims": [32, 16]}]
+)
+trainer = MTLTrainer(
+    model=model,
+    task_types=["classification", "classification"],
+    optimizer_params={"lr": 0.001},
+    adaptive_params={"method": "uwl"},
+    n_epoch=50,
+    earlystop_taskid=0,
+    device="cuda:0",
+    model_path="saved/mmoe"
+)
+trainer.fit(train_dataloader, val_dataloader)
+trainer.export_onnx("mmoe.onnx")
+```
+**Parameters**
+- `model`: Multi-task model instance.
+- `task_types`: List of task types (`classification`, `regression`).
+- `optimizer_fn`: Optimizer function, default `torch.optim.Adam`.
+- `optimizer_params`: Optimizer parameters.
+- `regularization_params`: Regularization parameters.
+- `scheduler_fn`: Learning rate scheduler.
+- `scheduler_params`: Scheduler parameters.
+- `adaptive_params`: Adaptive loss weighting parameters.
+- `n_epoch`: Number of training epochs.
+- `earlystop_taskid`: Task id used for early stopping.
+- `earlystop_patience`: Patience for early stopping.
+- `device`: Training device.
+- `gpus`: List of GPU ids.
+- `model_path`: Path to save the model.
+## Callbacks
+### EarlyStopper
+Used for early stopping when validation performance no longer improves.
+```python
+from torch_rechub.basic.callback import EarlyStopper
+early_stopper = EarlyStopper(patience=10)
+if early_stopper.stop_training(auc, model.state_dict()):
+    print(f'validation: best auc: {early_stopper.best_auc}')
+    model.load_state_dict(early_stopper.best_weights)
+    break
+```
+**Parameters**
+- `patience`: Number of consecutive epochs without improvement before stopping.
+- `delta`: Minimum improvement threshold to be considered progress.
+## Loss Functions
+### RegularizationLoss
+Supports L1 and L2 regularization.
+```python
+from torch_rechub.basic.loss_func import RegularizationLoss
+reg_loss_fn = RegularizationLoss(
+    embedding_l1=0.0,
+    embedding_l2=0.0001,
+    dense_l1=0.0,
+    dense_l2=0.0001
+)
+```
+### BPRLoss
+Pairwise loss for matching models.
+```python
+from torch_rechub.basic.loss_func import BPRLoss
+bpr_loss = BPRLoss()
+loss = bpr_loss(pos_score, neg_score)
+```
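
The `EarlyStopper` section above describes `patience` as the number of consecutive epochs without improvement and `delta` as the minimum gain that counts as progress. The sketch below is a plain-Python stand-in that follows that description, placed inside a loop so the `break` has a context. `SimpleEarlyStopper` is hypothetical and is not the class from `torch_rechub.basic.callback`; the validation scores are made up for illustration.

```python
import copy


class SimpleEarlyStopper:
    """Stop after `patience` consecutive epochs without a gain larger than `delta`."""

    def __init__(self, patience: int, delta: float = 0.0):
        self.patience = patience
        self.delta = delta
        self.trial_counter = 0
        self.best_auc = float("-inf")
        self.best_weights = None

    def stop_training(self, val_auc: float, weights) -> bool:
        if val_auc > self.best_auc + self.delta:
            # Real improvement: remember it and reset the counter.
            self.best_auc = val_auc
            self.best_weights = copy.deepcopy(weights)
            self.trial_counter = 0
            return False
        self.trial_counter += 1
        return self.trial_counter >= self.patience


val_aucs = [0.70, 0.72, 0.71, 0.715, 0.73]  # made-up validation scores
stopper = SimpleEarlyStopper(patience=2)
stopped_at = None
for epoch, auc in enumerate(val_aucs):
    weights = {"epoch": epoch}  # stands in for model.state_dict()
    if stopper.stop_training(auc, weights):
        stopped_at = epoch
        break
```

With `patience=2` the loop stops before the final score of 0.73 is seen, which shows the trade-off: a small patience saves epochs but can miss a late recovery.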

torch_rechub-0.1.0/docs/en/core/features.md ADDED

@@ -0,0 +1,77 @@
+---
+title: Feature Definitions
+description: Torch-RecHub feature types
+---
+# Feature Definitions
+Torch-RecHub provides three core feature classes for different data types.
+## DenseFeature
+Numeric features (e.g., age, income).
+```python
+from torch_rechub.basic.features import DenseFeature
+dense_feature = DenseFeature(name="age", embed_dim=1)
+```
+Parameters: `name`, `embed_dim` (always 1).
+## SparseFeature
+Categorical features (e.g., city, gender).
+```python
+from torch_rechub.basic.features import SparseFeature
+sparse_feature = SparseFeature(
+    name="city",
+    vocab_size=100,
+    embed_dim=16,
+    shared_with=None,  # share embeddings with another feature if needed
+)
+```
+Parameters: `name`, `vocab_size`, `embed_dim` (auto if None), `shared_with`, `padding_idx`, `initializer`.
+## SequenceFeature
+Sequence or multi-hot features (e.g., behavior history, tags).
+```python
+from torch_rechub.basic.features import SequenceFeature
+sequence_feature = SequenceFeature(
+    name="user_history",
+    vocab_size=10000,
+    embed_dim=32,
+    pooling="mean",  # mean, sum, concat
+)
+```
+Parameters: `name`, `vocab_size`, `embed_dim` (auto if None), `pooling` (mean/sum/concat), `shared_with`, `padding_idx`, `initializer`.
+## Usage Example
+```python
+from torch_rechub.basic.features import DenseFeature, SparseFeature, SequenceFeature
+dense_features = [
+    DenseFeature(name="age", embed_dim=1),
+    DenseFeature(name="income", embed_dim=1),
+]
+sparse_features = [
+    SparseFeature(name="city", vocab_size=100, embed_dim=16),
+    SparseFeature(name="gender", vocab_size=3, embed_dim=8),
+]
+sequence_features = [
+    SequenceFeature(name="user_history", vocab_size=10000, embed_dim=32, pooling="mean"),
+]
+all_features = dense_features + sparse_features + sequence_features
+```
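
The `pooling` option of `SequenceFeature` is listed above as `mean`, `sum` or `concat`. The plain-Python sketch below illustrates what each choice plausibly computes for one sequence of embedding vectors. It rests on assumptions: pooling is applied along the sequence axis, `concat` keeps the per-position vectors without reducing them, and any padding mask the library may apply is ignored. `pool_sequence` is a hypothetical helper, not a torch-rechub function.

```python
from typing import List

Vector = List[float]


def pool_sequence(embeddings: List[Vector], mode: str = "mean"):
    """Reduce a sequence of equal-length vectors according to `mode`."""
    if not embeddings:
        raise ValueError("embeddings must contain at least one vector")
    if mode == "concat":
        # Assumption: no reduction, the sequence is kept position by position.
        return [list(vec) for vec in embeddings]
    # Column-wise sum over the sequence axis.
    summed = [sum(column) for column in zip(*embeddings)]
    if mode == "sum":
        return summed
    if mode == "mean":
        return [value / len(embeddings) for value in summed]
    raise ValueError(f"unknown pooling mode: {mode!r}")


history = [[1.0, 2.0], [3.0, 4.0]]  # two items, embedding size 2
mean_vec = pool_sequence(history, "mean")
sum_vec = pool_sequence(history, "sum")
kept_seq = pool_sequence(history, "concat")
```

`mean` and `sum` yield one fixed-size vector per sequence, whereas the unreduced form is what attention-style layers over the history would typically need.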

torch_rechub-0.1.0/docs/en/core/intro.md ADDED

@@ -0,0 +1,30 @@
+---
+title: Core Components Overview
+description: Torch-RecHub core components overview
+---
+# Core Components Overview
+Torch-RecHub is modular: features, data, models, training, and tools are separated for clarity and extensibility.
+## Architecture
+1) **Feature layer** – dense, sparse, and sequence feature definitions.
+2) **Data layer** – loading, preprocessing, and dataloader generation.
+3) **Model layer** – ranking, matching, multi-task, and generative models.
+4) **Training layer** – unified trainers for fit/eval/predict/ONNX export.
+5) **Tools layer** – ONNX export, visualization, callbacks, losses, etc.
+## Component Relations
+- Feature layer guides preprocessing in the data layer.
+- Data generators feed the training layer.
+- Models are consumed by trainers.
+- Trainers call tools for export/visualization/tracking.
+## Component Details
+- **Feature processing**: `DenseFeature`, `SparseFeature`, `SequenceFeature`. See [Features](/en/core/features).
+- **Data pipeline**: `TorchDataset`, `PredictDataset`, `DataGenerator`, `MatchDataGenerator`. See [Data](/en/core/data).
+- **Training & evaluation**: `CTRTrainer`, `MatchTrainer`, `MTLTrainer` (and generative trainer variants). See [Training & Evaluation](/en/core/evaluation).
