PyPI - torch-rechub - Versions diffs - 0.2.0__tar.gz → 0.3.0__tar.gz - Mend

torch-rechub 0.2.0tar.gz → 0.3.0tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (306)

{torch_rechub-0.2.0 → torch_rechub-0.3.0}/CHANGELOG.md RENAMED

@@ -7,6 +7,27 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ---
+## [0.3.0] - 2026-02-05
+<!-- Release notes generated using configuration in .github/release.yml at main -->
+## What's Changed
+### ✨ Features
+* Inbatchsample by @zerolovesea in https://github.com/datawhalechina/torch-rechub/pull/128
+### 📝 Documentation
+* Enhance docs: ONNX, serving, tools, tutorials by @1985312383 in https://github.com/datawhalechina/torch-rechub/pull/165
+* Expand docs: community, guides, models, data by @1985312383 in https://github.com/datawhalechina/torch-rechub/pull/166
+### 🔄 Other Changes
+* Update favicon and logo images by @1985312383 in https://github.com/datawhalechina/torch-rechub/pull/164
+* Update docs imports to torch_rechub by @1985312383 in https://github.com/datawhalechina/torch-rechub/pull/167
+**Full Changelog**: https://github.com/datawhalechina/torch-rechub/compare/v0.2.0...v0.3.0
+---
 ## [0.2.0] - 2026-01-11
 <!-- Release notes generated using configuration in .github/release.yml at main -->

{torch_rechub-0.2.0 → torch_rechub-0.3.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: torch-rechub
-Version: 0.2.0
+Version: 0.3.0
 Summary: A Pytorch Toolbox for Recommendation Models, Easy-to-use and Easy-to-extend.
 Project-URL: Homepage, https://github.com/datawhalechina/torch-rechub
 Project-URL: Documentation, https://www.torch-rechub.com
@@ -66,6 +66,8 @@ Description-Content-Type: text/markdown
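 # Torch-RecHub: A Lightweight, Efficient, and Easy-to-Use PyTorch Recommender System Framework
+[⚠️ Alpha internal-test version warning: this is an early internal build. It is incomplete and may contain errors. Please open an Issue to report problems or make suggestions.]
 [![License](https://img.shields.io/badge/license-MIT-blue?style=for-the-badge)](LICENSE)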
 ![GitHub Repo stars](https://img.shields.io/github/stars/datawhalechina/torch-rechub?style=for-the-badge)
 ![GitHub forks](https://img.shields.io/github/forks/datawhalechina/torch-rechub?style=for-the-badge)
@@ -77,6 +79,7 @@ Description-Content-Type: text/markdown
 [![numpy Version](https://img.shields.io/badge/numpy-1.19%2B-orange?style=for-the-badge)](https://numpy.org/)
 [![scikit-learn Version](https://img.shields.io/badge/scikit_learn-0.23%2B-orange?style=for-the-badge)](https://scikit-learn.org/)
 [![torch-rechub Version](https://img.shields.io/badge/torch_rechub-0.0.3%2B-orange?style=for-the-badge)](https://pypi.org/project/torch-rechub/)
+[![torchview](https://img.shields.io/badge/torchview-0.2%2B-green?style=for-the-badge)](https://github.com/mert-kurttutan/torchview)
 [English](README_en.md) | 简体中文
@@ -90,6 +93,7 @@ Description-Content-Type: text/markdown
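 ## ✨ Features
+* **Generative recommendation models:** In the LLM era, some generative recommendation models can be reproduced.
 * **Modular design:** Easy to add new models, datasets, and evaluation metrics.
 * **Built on PyTorch:** Uses PyTorch's dynamic graphs and GPU acceleration.
 * **Rich model library:** Covers **30+** classic and cutting-edge recommendation algorithms (matching, ranking, multi-task, generative recommendation, and more).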

{torch_rechub-0.2.0 → torch_rechub-0.3.0}/README.md RENAMED

@@ -4,6 +4,8 @@
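 # Torch-RecHub: A Lightweight, Efficient, and Easy-to-Use PyTorch Recommender System Framework
+[⚠️ Alpha internal-test version warning: this is an early internal build. It is incomplete and may contain errors. Please open an Issue to report problems or make suggestions.]
 [![License](https://img.shields.io/badge/license-MIT-blue?style=for-the-badge)](LICENSE)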
 ![GitHub Repo stars](https://img.shields.io/github/stars/datawhalechina/torch-rechub?style=for-the-badge)
 ![GitHub forks](https://img.shields.io/github/forks/datawhalechina/torch-rechub?style=for-the-badge)
@@ -15,6 +17,7 @@
 [![numpy Version](https://img.shields.io/badge/numpy-1.19%2B-orange?style=for-the-badge)](https://numpy.org/)
 [![scikit-learn Version](https://img.shields.io/badge/scikit_learn-0.23%2B-orange?style=for-the-badge)](https://scikit-learn.org/)
 [![torch-rechub Version](https://img.shields.io/badge/torch_rechub-0.0.3%2B-orange?style=for-the-badge)](https://pypi.org/project/torch-rechub/)
+[![torchview](https://img.shields.io/badge/torchview-0.2%2B-green?style=for-the-badge)](https://github.com/mert-kurttutan/torchview)
 [English](README_en.md) | 简体中文
@@ -28,6 +31,7 @@
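 ## ✨ Features
+* **Generative recommendation models:** In the LLM era, some generative recommendation models can be reproduced.
 * **Modular design:** Easy to add new models, datasets, and evaluation metrics.
 * **Built on PyTorch:** Uses PyTorch's dynamic graphs and GPU acceleration.
 * **Rich model library:** Covers **30+** classic and cutting-edge recommendation algorithms (matching, ranking, multi-task, generative recommendation, and more).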

{torch_rechub-0.2.0 → torch_rechub-0.3.0}/README_en.md RENAMED

@@ -15,6 +15,7 @@
 [![numpy Version](https://img.shields.io/badge/numpy-1.19%2B-orange?style=for-the-badge)](https://numpy.org/)
 [![scikit-learn Version](https://img.shields.io/badge/scikit_learn-0.23%2B-orange?style=for-the-badge)](https://scikit-learn.org/)
 [![torch-rechub Version](https://img.shields.io/badge/torch_rechub-0.0.3%2B-orange?style=for-the-badge)](https://pypi.org/project/torch-rechub/)
+[![torchview](https://img.shields.io/badge/torchview-0.2%2B-green?style=for-the-badge)](https://github.com/mert-kurttutan/torchview)
 English | [简体中文](README.md)

torch_rechub-0.3.0/docs/en/community/changelog.md ADDED

@@ -0,0 +1,9 @@
+---
+title: Changelog
+description: Torch-RecHub version update history
+---
+# Changelog
+This page is under construction.

torch_rechub-0.3.0/docs/en/community/contributing.md ADDED

@@ -0,0 +1,177 @@
+---
+title: Contributing Guide
+description: Torch-RecHub project contribution guide, including bug reports, feature requests, and pull request process
+---
+# Contributing Guide
+Thank you for your interest in Torch-RecHub! We welcome all forms of contributions, including but not limited to:
+- Bug reports
+- Feature suggestions
+- Documentation improvements
+- Code contributions
+- Test cases
+- Tutorials and examples
+## Quick Start
+### Development Environment Setup
+```bash
+# 1. Fork and clone the repository
+git clone https://github.com/YOUR_USERNAME/torch-rechub.git
+cd torch-rechub
+# 2. Install dependencies and set up environment
+uv sync
+# 3. Install package in development mode
+uv pip install -e .
+```
+### Development Workflow
+1. **Fork the repository**: Click the "Fork" button in the top right corner.
+2. **Make changes**: Implement new features or fix bugs.
+3. **Format code**: Run code formatting before committing to ensure consistent style:
+   ```bash
+   python config/format_code.py
+   ```
+4. **Commit changes**: `git commit -m "feat: add new feature"` or `git commit -m "fix: fix some issue"` (following [Conventional Commits](https://www.conventionalcommits.org/) is recommended).
+5. **Push to branch**: `git push origin your-branch-name`
+6. **Create Pull Request**: Return to the original repository page, click "New pull request", compare your branch with the main repository's `main` branch, then submit the PR.
+## Code Standards
+### Branch Naming
+- `feature/feature-name` - New features
+- `fix/bug-description` - Bug fixes
+- `docs/documentation-update` - Documentation updates
+- `test/test-description` - Test additions
+### Commit Messages
+We follow the [Conventional Commits](https://www.conventionalcommits.org/) specification:
+- `feat: add new recommendation model`
+- `fix: resolve memory leak in training loop`
+- `docs: update installation guide`
+- `test: add unit tests for DeepFM model`
+- `refactor: optimize data loading pipeline`
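The commit-message convention above can be checked mechanically before committing. The following is a minimal sketch, not part of the project's tooling: the helper name `is_conventional` and the allowed-type tuple are assumptions drawn only from the five example prefixes listed above, and the Conventional Commits specification permits further types.

```python
import re

# Types taken from the examples listed above (an assumption for this sketch);
# extend the tuple if the project accepts other types such as "chore".
ALLOWED_TYPES = ("feat", "fix", "docs", "test", "refactor")

# type, optional "(scope)", optional "!", then ": " and a non-empty description.
COMMIT_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^()]+)\))?(?P<breaking>!)?: (?P<description>\S.*)$"
)


def is_conventional(message: str) -> bool:
    """Return True if the first line of `message` follows the pattern."""
    lines = message.splitlines()
    if not lines:
        return False
    match = COMMIT_PATTERN.match(lines[0])
    return bool(match) and match.group("type") in ALLOWED_TYPES


print(is_conventional("feat: add new recommendation model"))  # True
print(is_conventional("Add new model"))                       # False
```

A check like this could run in a local `commit-msg` hook, though wiring it up is left to the contributor.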
+### Pull Request Process
+1. **Push your branch**
+   ```bash
+   git push origin your-branch-name
+   ```
+2. **Create Pull Request**
+   - Visit the GitHub repository page
+   - Click "New pull request"
+   - Select your branch
+   - Fill in the PR template
+3. **PR Requirements**
+   - Clear description of changes
+   - Explain why these changes are needed
+   - List related Issues (if any)
+   - Include test instructions
+   - Add screenshots (if applicable)
+## Testing Guide
+### Writing Tests
+- **Unit tests**: Test individual functions or classes
+- **Integration tests**: Test interactions between modules
+- **End-to-end tests**: Test complete workflows
+### Running Tests
+```bash
+# Run all tests
+uv run pytest
+# Run specific test file
+uv run pytest tests/test_models/test_ranking.py
+# Run with coverage report
+uv run pytest --cov=torch_rechub
+```
+## Documentation
+### Documentation Types
+- **API documentation**: Docstrings in code
+- **User guides**: Files in `docs/` directory
+- **Tutorials**: Jupyter notebooks in `tutorials/` directory
+- **README**: Project introduction and quick start
+### Documentation Standards
+- Use Markdown format
+- Include code examples
+- Provide clear step-by-step instructions
+- Keep English and Chinese versions in sync
+- Follow Google-style Python docstrings
+## Contribution Ideas
+### Beginner-Friendly Tasks
+- Improve documentation and comments
+- Add test cases
+- Fix simple bugs
+- Translate documentation
+- Add example code
+- Code formatting and style improvements
+### Advanced Contributions
+- Implement new recommendation algorithms
+- Performance optimization
+- Architecture improvements
+- Add new evaluation metrics
+- Develop tools and scripts
+- Research paper implementations
+### Model Implementation Guide
+When implementing new models:
+1. **Follow existing patterns**: Look at the structure of existing models
+2. **Add comprehensive tests**: Include unit and integration tests
+3. **Provide examples**: Add usage examples in `examples/` directory
+4. **Detailed documentation**: Include docstrings and README updates
+5. **Performance benchmarks**: Compare with existing implementations
+## Getting Help
+If you encounter issues during contribution:
+1. **Check existing Issues**: There may be related discussions
+2. **Create new Issue**: Clearly describe your problem
+3. **Join discussions**: Ask questions in related Issues or PRs
+4. **Contact maintainers**: Via GitHub or email
+## Recognition
+We value every contribution! All contributors will be recognized in:
+- Contributors list in README
+- Acknowledgments in release notes
+- Contributors page in project documentation
+- Special mentions for significant contributions
+## Code of Conduct
+Please follow our [Code of Conduct](https://github.com/datawhalechina/torch-rechub/blob/main/CODE_OF_CONDUCT.md) to ensure a friendly and inclusive community environment.
+---
+Thank you again for your contribution! Every contribution makes Torch-RecHub better.

torch_rechub-0.3.0/docs/en/community/faq.md ADDED

@@ -0,0 +1,82 @@
+---
+title: FAQ
+description: Torch-RecHub frequently asked questions and troubleshooting guide
+---
+# FAQ
+Torch-RecHub frequently asked questions and troubleshooting guide.
+* **Will there be a TensorFlow version?**
+    - Not currently planned
+    - The project's core goal is to give beginners easy-to-use implementations of models used in industry, and PyTorch reaches a wider audience.
+* **Why does running the example give AUC=0?**
+    - The examples ship with only 100 sample records. They illustrate the data format and feature types and let the code run end to end, but they are too few to produce meaningful metrics.
+    - If you need to test performance, download the data from the links described in the README, then refer to the parameter configuration files in the examples for model training and evaluation.
+* **Annoy Installation**
+    - Windows installation
+        - Online installation
+        ```bash
+        pip install annoy
+        ```
+        If Windows doesn't have a C++ compilation environment, you may see this error:
+        ```bash
+        error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
+        ```
+        In this case, use offline installation:
+        - Offline installation
+        Annoy library download: [https://www.lfd.uci.edu/~gohlke/pythonlibs/#_annoy](https://www.lfd.uci.edu/~gohlke/pythonlibs/#_annoy)
+        ```bash
+        pip install annoy-1.17.0-cp39-cp39-win_amd64.whl
+        ```
+    - Linux/macOS installation
+        - Online installation
+        ```bash
+        pip install annoy
+        ```
+        Normally macOS can install online successfully. If online installation fails with a nose-related error, use offline compilation:
+        - Offline installation
+          - Download nose
+            Download: [https://www.lfd.uci.edu/~gohlke/pythonlibs/#_annoy](https://www.lfd.uci.edu/~gohlke/pythonlibs/#_annoy)
+            ```bash
+            pip install nose-1.3.7-py3-none-any.whl
+            ```
+          - Download annoy
+            Download: [https://files.pythonhosted.org/packages/a1/5b/1c22129f608b3f438713b91cd880dc681d747a860afe3e8e0af86e921942/annoy-1.17.0.tar.gz](https://files.pythonhosted.org/packages/a1/5b/1c22129f608b3f438713b91cd880dc681d747a860afe3e8e0af86e921942/annoy-1.17.0.tar.gz)
+            ```bash
+            tar -zxvf annoy-1.17.0.tar.gz
+            cd annoy-1.17.0
+            python setup.py install
+            ```
+      After installing annoy, you can install torch-rechub:
+      ```bash
+      pip install --upgrade torch-rechub
+      ```

torch_rechub-0.3.0/docs/en/core/data.md ADDED

@@ -0,0 +1,159 @@
+---
+title: Data Pipeline
+description: Torch-RecHub data loading and preprocessing
+---
+# Data Pipeline
+Torch-RecHub offers datasets, generators, and utilities for recommendation data.
+## Datasets
+### TorchDataset
+Training/validation dataset with features and labels.
+```python
+from torch_rechub.utils.data import TorchDataset
+dataset = TorchDataset(x, y)
+```
+### PredictDataset
+Prediction-only dataset (features only).
+```python
+from torch_rechub.utils.data import PredictDataset
+dataset = PredictDataset(x)
+```
+## Data Generators
+### DataGenerator
+Build dataloaders for ranking / multi-task models.
+```python
+from torch_rechub.utils.data import DataGenerator
+dg = DataGenerator(x, y)
+train_dl, val_dl, test_dl = dg.generate_dataloader(
+    split_ratio=[0.7, 0.1],
+    batch_size=256,
+    num_workers=8,
+)
+```
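To make `split_ratio=[0.7, 0.1]` concrete, here is a minimal sketch of how such a ratio pair could translate into split sizes. It assumes the two values are the train and validation fractions and that the remainder becomes the test set; the helper `split_lengths` is hypothetical and the library's exact rounding is not shown in this document.

```python
def split_lengths(n_samples, split_ratio):
    """Turn [train_ratio, val_ratio] into (train, val, test) sample counts.

    Assumption for this sketch: whatever is left after the train and
    validation shares goes to the test set.
    """
    train_ratio, val_ratio = split_ratio
    n_train = int(n_samples * train_ratio)
    n_val = int(n_samples * val_ratio)
    n_test = n_samples - n_train - n_val
    return n_train, n_val, n_test


print(split_lengths(1000, [0.7, 0.1]))  # (700, 100, 200)
```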
+### MatchDataGenerator
+Build dataloaders for matching/retrieval models.
+```python
+from torch_rechub.utils.data import MatchDataGenerator
+dg = MatchDataGenerator(x, y)
+train_dl, test_dl, item_dl = dg.generate_dataloader(
+    x_test_user=x_test_user,
+    x_all_item=x_all_item,
+    batch_size=256,
+    num_workers=8,
+)
+```
+## Utilities
+### get_auto_embedding_dim
+Compute embedding dim from vocab size: ``int(floor(6 * num_classes**0.25))``.
+```python
+from torch_rechub.utils.data import get_auto_embedding_dim
+embed_dim = get_auto_embedding_dim(1000)  # vocabulary size passed positionally
+```
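The formula quoted above can be evaluated by hand to see how slowly the dimension grows with vocabulary size. This is a standalone sketch of the arithmetic only; `auto_embedding_dim` is a hypothetical stand-in and does not call the library.

```python
import math


def auto_embedding_dim(num_classes: int) -> int:
    """Evaluate int(floor(6 * num_classes ** 0.25)) as stated in the docs."""
    return int(math.floor(6 * num_classes ** 0.25))


for n in (100, 1000, 1_000_000):
    print(n, auto_embedding_dim(n))
# 100 18
# 1000 33
# 1000000 189
```

A vocabulary ten thousand times larger yields an embedding only about ten times wider, which is the point of the fourth-root rule.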
+### get_loss_func
+Return default loss by task type: BCELoss for classification, MSELoss for regression.
+```python
+from torch_rechub.utils.data import get_loss_func
+loss_fn = get_loss_func(task_type="classification")
+```
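For readers unfamiliar with the two losses, the sketch below shows the selection logic and the underlying formulas in plain Python. It is an illustration only: `pick_loss`, `bce_loss`, and `mse_loss` are hypothetical names, the clipping constant is an assumption, and the library itself returns PyTorch loss modules rather than these functions.

```python
import math


def bce_loss(probs, labels, eps=1e-12):
    """Mean binary cross-entropy over predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # keep log() away from 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


def mse_loss(preds, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def pick_loss(task_type="classification"):
    """Map a task type to a loss function, mirroring the rule described above."""
    if task_type == "classification":
        return bce_loss
    if task_type == "regression":
        return mse_loss
    raise ValueError("task_type must be 'classification' or 'regression'")


print(pick_loss("regression")([1.0, 3.0], [2.0, 5.0]))  # 2.5
```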
+## Parquet Streaming Dataset
+In industrial scenarios, feature engineering is typically done by **PySpark** on big data clusters, with data volumes reaching GB to TB scale. Using `spark_df.toPandas()` directly causes Driver OOM.
+Torch-RecHub provides `ParquetIterableDataset` for streaming Parquet files generated by Spark without loading all data into memory.
+### Installation
+Parquet data loading requires `pyarrow`:
+```bash
+pip install pyarrow
+```
+### ParquetIterableDataset
+Inherits from `torch.utils.data.IterableDataset` with multi-worker support.
+```python
+from torch.utils.data import DataLoader
+from torch_rechub.data import ParquetIterableDataset
+dataset = ParquetIterableDataset(
+    ["/data/train1.parquet", "/data/train2.parquet"],
+    columns=["user_id", "item_id", "label"],  # Optional
+    batch_size=1024,
+)
+loader = DataLoader(dataset, batch_size=None, num_workers=4)
+for batch in loader:
+    user_id = batch["user_id"]  # torch.Tensor
+    item_id = batch["item_id"]  # torch.Tensor
+    label = batch["label"]      # torch.Tensor
+```
+**Parameters:**
+- `file_paths`: List of Parquet file paths
+- `columns`: Column names to read; `None` reads all columns
+- `batch_size`: Rows per batch (default: 1024)
+**Features:**
+- **Streaming**: Uses PyArrow Scanner for constant memory usage
+- **Multi-worker**: Automatically partitions files across workers
+- **Type conversion**: Converts PyArrow arrays to PyTorch Tensors
+- **Nested arrays**: Supports Spark `Array` columns as 2D Tensors
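One common way to partition files across workers is round-robin sharding, sketched below. This is an assumed strategy for illustration; the document does not state which scheme `ParquetIterableDataset` uses, and `files_for_worker` is a hypothetical helper.

```python
def files_for_worker(file_paths, worker_id, num_workers):
    """Give worker `worker_id` every `num_workers`-th file, starting at its own index."""
    return file_paths[worker_id::num_workers]


paths = ["part-0.parquet", "part-1.parquet", "part-2.parquet",
         "part-3.parquet", "part-4.parquet"]
for worker_id in range(2):
    print(worker_id, files_for_worker(paths, worker_id, num_workers=2))
# 0 ['part-0.parquet', 'part-2.parquet', 'part-4.parquet']
# 1 ['part-1.parquet', 'part-3.parquet']
```

Each file is read by exactly one worker, so no row is duplicated. If there are fewer files than workers, some workers receive nothing, which is worth keeping in mind when choosing `num_workers`.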
+### Working with Spark
+```python
+# ========== Spark Side ==========
+# df.write.parquet("/data/train.parquet")
+# ========== PyTorch Side ==========
+import glob
+from torch.utils.data import DataLoader
+from torch_rechub.data import ParquetIterableDataset
+# Spark writes a directory of part files; collect them all
+file_paths = glob.glob("/data/train.parquet/*.parquet")
+dataset = ParquetIterableDataset(file_paths, batch_size=2048)
+# batch_size=None because the dataset already yields full batches
+loader = DataLoader(dataset, batch_size=None, num_workers=8)
+```
+### Supported Types
+| Parquet/Arrow Type | Result |
+|-------------------|--------|
+| int8/16/32/64 | torch.float32 |
+| float32/64 | torch.float32 |
+| boolean | torch.float32 |
+| list/array | torch.Tensor (2D) |
+> **Note**: Nested arrays require equal row lengths; otherwise raises `ValueError`.
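The equal-length requirement can be met by padding or truncating sequences before the data is written. The sketch below shows the check and one possible fix in plain Python; both helpers are hypothetical and are not part of the library.

```python
def check_rectangular(rows):
    """Raise ValueError if the rows do not all have the same length."""
    lengths = {len(row) for row in rows}
    if len(lengths) > 1:
        raise ValueError(f"ragged rows, lengths found: {sorted(lengths)}")
    return rows


def pad_or_truncate(rows, max_len, pad_value=0):
    """Cut each row to max_len and right-pad shorter rows with pad_value."""
    fixed = []
    for row in rows:
        clipped = list(row[:max_len])
        fixed.append(clipped + [pad_value] * (max_len - len(clipped)))
    return fixed


ragged = [[1, 2, 3], [4]]
print(pad_or_truncate(ragged, max_len=2))  # [[1, 2], [4, 0]]
```

In a Spark pipeline the same padding would normally be applied on the Spark side, so that every row of the array column already has the same length when the Parquet files are written.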
+## Typical Flow
+1. Define features (Dense/Sparse/Sequence).
+2. Load raw data.
+3. Encode categorical features (e.g., LabelEncoder).
+4. Process sequences (pad/truncate).
+5. Construct samples (e.g., negative sampling).
+6. Use DataGenerator / MatchDataGenerator to build dataloaders.
+7. Train models with the trainers.

torch_rechub-0.3.0/docs/en/guide/install.md ADDED

@@ -0,0 +1,94 @@
+---
+title: Installation Guide
+description: Torch-RecHub installation instructions for stable and development versions
+---
+# Installation Guide
+This document provides detailed installation instructions for Torch-RecHub, including stable and development versions.
+## System Requirements
+Before installing Torch-RecHub, ensure your system meets the following requirements:
+- **Python 3.9+**
+- **PyTorch 1.7+** (CUDA version recommended for GPU acceleration)
+- **NumPy**
+- **Pandas**
+- **SciPy**
+- **Scikit-learn**
+## Installation Methods
+### Stable Version (Recommended)
+The simplest way to install is via pip:
+```bash
+pip install torch-rechub
+```
+### Latest Development Version
+To install the development version with the latest features:
+```bash
+# Install uv first (if not already installed)
+pip install uv
+# Clone and install
+git clone https://github.com/datawhalechina/torch-rechub.git
+cd torch-rechub
+uv sync
+```
+## Development Environment Setup
+If you want to contribute to Torch-RecHub or work with the source code:
+```bash
+# 1. Fork and clone the repository
+git clone https://github.com/YOUR_USERNAME/torch-rechub.git
+cd torch-rechub
+# 2. Install dependencies and set up environment
+uv sync
+# 3. Install package in development mode
+uv pip install -e .
+```
+## Verify Installation
+To verify that Torch-RecHub is correctly installed:
+```python
+import torch_rechub
+print(torch_rechub.__version__)
+```
+Or run a simple example:
+```bash
+python examples/matching/run_ml_dssm.py
+```
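The installed version can also be read from package metadata without importing the package, which helps when the import itself fails. This is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical, and the distribution name `torch-rechub` is the one used in the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version("torch-rechub"))
```

A result of `None` means pip has no record of the distribution in the active environment, which usually points to a wrong interpreter or virtual environment.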
+## Troubleshooting
+### PyTorch Installation
+If you need to install PyTorch with a specific CUDA version, visit the [PyTorch official website](https://pytorch.org/get-started/locally/) for installation instructions tailored to your system.
+### GPU Support
+For GPU acceleration, ensure you have:
+- NVIDIA GPU with compute capability 3.5 or higher
+- CUDA Toolkit installed
+- cuDNN library installed
+### Common Issues
+If you encounter any installation issues:
+1. Check [GitHub Issues](https://github.com/datawhalechina/torch-rechub/issues)
+2. Create a new Issue with detailed error messages and system information
+3. Refer to the FAQ section
