PyPI - torch-rechub - Versions diffs - 0.0.5__tar.gz → 0.1.0__tar.gz - Mend

torch-rechub 0.0.5__tar.gz → 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (278)

{torch_rechub-0.0.5 → torch_rechub-0.1.0}/.github/workflows/ci.yml RENAMED

@@ -56,7 +56,7 @@ jobs:
           python-version: ${{ env.PYTHON_VERSION }}
       - name: Cache pip packages
-        uses: actions/cache@v4
+        uses: actions/cache@v5
         with:
           path: ~/.cache/pip
           key: ${{ runner.os }}-pip-lint-${{ hashFiles('pyproject.toml') }}
@@ -104,6 +104,9 @@ jobs:
       matrix:
         os: [ubuntu-latest, windows-latest, macos-latest]
+    env:
+      SKIP_MILVUS_TESTS: ${{ matrix.os != 'ubuntu-latest' && '1' || '0' }}
     steps:
       - name: Checkout code
         uses: actions/checkout@v6
@@ -114,7 +117,7 @@ jobs:
           python-version: '3.9'
       - name: Cache pip packages
-        uses: actions/cache@v4
+        uses: actions/cache@v5
         with:
           path: |
             ~/.cache/pip
@@ -136,12 +139,36 @@ jobs:
           # Install CPU-only PyTorch for faster CI
           pip install torch --index-url ${{ env.TORCH_INDEX_URL }}
           # Install the package with dev and onnx dependencies
-          pip install -e ".[dev,onnx]" || pip install -r requirements-dev.txt && pip install -e .
+          pip install -e ".[dev,annoy,faiss,milvus,onnx]" || pip install -r requirements-dev.txt && pip install -e .
+      - name: Start Milvus
+        if: matrix.os == 'ubuntu-latest'
+        run: |
+          # Download the installation script
+          curl -sfL https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh -o standalone_embed.sh
+          # Start the Docker container
+          bash standalone_embed.sh start
+      - name: Wait for Milvus
+        if: matrix.os == 'ubuntu-latest'
+        run: |
+          for i in {1..60}; do
+            if curl -fsS http://localhost:9091/healthz >/dev/null; then exit 0; fi
+            sleep 2
+          done
+          exit 1
       - name: Run tests
+        if: matrix.os != 'macos-latest'
         run: |
           pytest -c config/pytest.ini tests/ -v
+      - name: Run tests (skip indexing tests)
+        if: matrix.os == 'macos-latest'
+        run: |
+          pytest -c config/pytest.ini tests/ -v --ignore=tests/test_serving.py
       - name: Run tests with coverage (Ubuntu only)
         if: matrix.os == 'ubuntu-latest'
         run: |
@@ -155,6 +182,16 @@ jobs:
           flags: unittests
           name: codecov-umbrella
+      - name: Shutdown Milvus
+        if: always() && matrix.os == 'ubuntu-latest'
+        run: |
+          # Stop Milvus
+          bash standalone_embed.sh stop
+          # Delete Milvus data
+          bash standalone_embed.sh delete
   # ===================================================================
   # Dependency compatibility check (Python 3.10+) - only verifies that dependencies install successfully
   # (runs only on push/PR; skipped on release)
@@ -221,7 +258,7 @@ jobs:
           bandit -r torch_rechub/ -s B101,B311,B614 -x tests,docs,examples -f txt
       - name: Upload security scan results
-        uses: actions/upload-artifact@v5
+        uses: actions/upload-artifact@v6
         if: always()
         with:
           name: bandit-security-report
@@ -259,7 +296,7 @@ jobs:
           twine check dist/*
       - name: Upload build artifacts
-        uses: actions/upload-artifact@v5
+        uses: actions/upload-artifact@v6
         with:
           name: dist-packages
           path: dist/
@@ -348,7 +385,7 @@ jobs:
           fi
       - name: Install uv
-        uses: astral-sh/setup-uv@v4
+        uses: astral-sh/setup-uv@v7
         with:
           version: "latest"
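
The new CI steps start Milvus only on `ubuntu-latest`, then poll `http://localhost:9091/healthz` up to 60 times with a 2-second pause (roughly two minutes in total) before the tests run; the other runners get `SKIP_MILVUS_TESTS=1`. A rough Python equivalent for running the same readiness check locally is sketched below. It is hypothetical: `probe_url`, `wait_for_healthz` and `milvus_tests_skipped` are illustrative names, and treating `SKIP_MILVUS_TESTS == "1"` as "skip" is inferred from the variable name, not read from the test suite.

```python
import os
import time
import urllib.request
from typing import Callable, Mapping, Optional

HEALTHZ_URL = "http://localhost:9091/healthz"  # endpoint polled by the workflow


def probe_url(url: str, timeout: float = 2.0) -> bool:
    """Return True on a 2xx response, like `curl -fsS` succeeding."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError and HTTPError (4xx/5xx) subclass OSError, as do refused
        # connections and socket timeouts: all mean "not ready yet".
        return False


def wait_for_healthz(
    url: str = HEALTHZ_URL,
    attempts: int = 60,
    interval: float = 2.0,
    probe: Callable[[str], bool] = probe_url,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Mirror `for i in {1..60}`: probe, and sleep `interval` seconds after each failure."""
    for _ in range(attempts):
        if probe(url):
            return True
        sleep(interval)
    return False


def milvus_tests_skipped(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical check: treat SKIP_MILVUS_TESTS == "1" as "skip Milvus tests"."""
    source = os.environ if env is None else env
    return source.get("SKIP_MILVUS_TESTS", "0") == "1"


if __name__ == "__main__":
    if milvus_tests_skipped():
        print("Milvus tests skipped on this runner")
    else:
        # Exit code mirrors the workflow step: 0 when healthy, 1 after timing out.
        raise SystemExit(0 if wait_for_healthz() else 1)
```

The matrix expression `${{ matrix.os != 'ubuntu-latest' && '1' || '0' }}` is the usual GitHub Actions stand-in for a ternary. It works only because `'1'` is a truthy string, so the `|| '0'` branch is reached only when the condition is false.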

{torch_rechub-0.0.5 → torch_rechub-0.1.0}/.github/workflows/deploy.yml RENAMED

@@ -7,6 +7,7 @@ on:
     paths:
       - 'docs/**'
       - 'package.json'
+      - 'CHANGELOG.md'
       - '.github/workflows/deploy.yml'
 jobs:
@@ -27,6 +28,13 @@ jobs:
       - name: Install dependencies
         run: npm ci
+      - name: Sync CHANGELOG to docs
+        run: |
+          # Copy CHANGELOG.md into the Chinese and English docs directories
+          cp CHANGELOG.md docs/zh/community/changelog.md
+          cp CHANGELOG.md docs/en/community/changelog.md
+          echo "✅ CHANGELOG.md synced to docs directories"
       - name: Build VitePress site
         run: npm run docs:build
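
The `Sync CHANGELOG to docs` step copies the root `CHANGELOG.md` into both locale trees just before the VitePress build, so the Changelog pages are never edited by hand. To reproduce that before a local preview, a minimal Python sketch could look like the following. `sync_changelog` and `demo` are hypothetical helpers, and the sketch assumes it is pointed at the repository root.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List

# Locale directories targeted by the `Sync CHANGELOG to docs` step.
LOCALES = ("zh", "en")


def sync_changelog(repo_root: Path) -> List[Path]:
    """Copy <repo_root>/CHANGELOG.md to docs/<locale>/community/changelog.md."""
    source = repo_root / "CHANGELOG.md"
    copied: List[Path] = []
    for locale in LOCALES:
        target = repo_root / "docs" / locale / "community" / "changelog.md"
        # Unlike plain `cp`, create the destination directory if it is missing.
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, target)
        copied.append(target)
    return copied


def demo() -> List[str]:
    """Run the sync in a throwaway directory; return 'relative path: first line' per copy."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "CHANGELOG.md").write_text("## [0.1.0] - 2025-12-17\n", encoding="utf-8")
        return [
            p.relative_to(root).as_posix() + ": " + p.read_text(encoding="utf-8").strip()
            for p in sync_changelog(root)
        ]


if __name__ == "__main__":
    print("\n".join(demo()))
```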

{torch_rechub-0.0.5 → torch_rechub-0.1.0}/CHANGELOG.md RENAMED

@@ -7,6 +7,42 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ---
+## [0.1.0] - 2025-12-17
+<!-- Release notes generated using configuration in .github/release.yml at main -->
+## What's Changed
+### ✨ 新特性 / Features
+* Update docs and tutorials && Add ONNX quantization utilities and enhance export  by @1985312383 in https://github.com/datawhalechina/torch-rechub/pull/150
+* REFACTOR+FEATURE: Standardize retrieval backends (ANNOY/FAISS/Milvus) by @ywuenthought in https://github.com/datawhalechina/torch-rechub/pull/151
+**Full Changelog**: https://github.com/datawhalechina/torch-rechub/compare/v0.0.6...v0.1.0
+---
+## [0.0.6] - 2025-12-11
+<!-- Release notes generated using configuration in .github/release.yml at main -->
+## What's Changed
+### ✨ 新特性 / Features
+* FEATURE: Support Streaming Parquet Dataset by @ywuenthought in https://github.com/datawhalechina/torch-rechub/pull/143
+* Docs & tracking polish: logger docstrings, README refresh, dependency tweak by @1985312383 in https://github.com/datawhalechina/torch-rechub/pull/146
+### 📝 文档更新 / Documentation
+* Refator Chinese documentation structure by @1985312383 in https://github.com/datawhalechina/torch-rechub/pull/145
+## New Contributors
+* @ywuenthought made their first contribution in https://github.com/datawhalechina/torch-rechub/pull/143
+**Full Changelog**: https://github.com/datawhalechina/torch-rechub/compare/v0.0.5...v0.0.6
+---
 ## [0.0.5] - 2025-12-05
 <!-- Release notes generated using configuration in .github/release.yml at main -->

{torch_rechub-0.0.5 → torch_rechub-0.1.0}/CONTRIBUTING.md RENAMED

@@ -143,25 +143,32 @@ def test_deepfm_forward():
 - Include code examples
 - Provide clear step-by-step instructions
 - Keep both English and Chinese versions synchronized
-- Follow Google-style docstrings for Python code
+- Follow scikit-learn style docstrings (NumPy/SciPy convention) for Python code
 ### Docstring Example
 ```python
 def train_model(model, data_loader, optimizer):
     """Train a recommendation model.
-    Args:
-        model (torch.nn.Module): The model to train
-        data_loader (DataLoader): Training data loader
-        optimizer (torch.optim.Optimizer): Optimizer for training
-    Returns:
-        float: Training loss
-    Example:
-        >>> model = DeepFM(features, mlp_params)
-        >>> loss = train_model(model, train_loader, optimizer)
+    Parameters
+    ----------
+    model : torch.nn.Module
+        Model to train.
+    data_loader : DataLoader
+        Training data loader.
+    optimizer : torch.optim.Optimizer
+        Optimizer for training.
+    Returns
+    -------
+    float
+        Training loss.
+    Examples
+    --------
+    >>> model = DeepFM(features, mlp_params)
+    >>> loss = train_model(model, train_loader, optimizer)
     """
     # Implementation here
 ```
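
Under the NumPy/SciPy convention, each section is a title underlined with dashes, and `Examples` is normally written as doctest input so it can actually be executed. The sketch below uses a hypothetical `mean_loss` helper (not a torch-rechub function) to show a complete docstring, with sections separated by blank lines as in the numpydoc style guide, whose example `doctest` can run.

```python
import doctest


def mean_loss(losses):
    """Average a list of per-batch losses.

    Parameters
    ----------
    losses : list of float
        Loss values, one per batch.

    Returns
    -------
    float
        Arithmetic mean of ``losses``.

    Raises
    ------
    ValueError
        If ``losses`` is empty.

    Examples
    --------
    >>> mean_loss([0.5, 0.25, 0.75])
    0.5
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Executes every ``>>>`` example found in this module's docstrings.
    doctest.testmod(verbose=False)
```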

{torch_rechub-0.0.5 → torch_rechub-0.1.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: torch-rechub
-Version: 0.0.5
+Version: 0.1.0
 Summary: A Pytorch Toolbox for Recommendation Models, Easy-to-use and Easy-to-extend.
 Project-URL: Homepage, https://github.com/datawhalechina/torch-rechub
 Project-URL: Documentation, https://www.torch-rechub.com
@@ -28,19 +28,33 @@ Requires-Dist: scikit-learn>=0.24.0
 Requires-Dist: torch>=1.10.0
 Requires-Dist: tqdm>=4.60.0
 Requires-Dist: transformers>=4.46.3
+Provides-Extra: annoy
+Requires-Dist: annoy>=1.17.2; extra == 'annoy'
+Provides-Extra: bigdata
+Requires-Dist: pyarrow~=21.0; extra == 'bigdata'
 Provides-Extra: dev
 Requires-Dist: bandit>=1.7.0; extra == 'dev'
 Requires-Dist: flake8>=3.8.0; extra == 'dev'
 Requires-Dist: isort==5.13.2; extra == 'dev'
 Requires-Dist: mypy>=0.800; extra == 'dev'
 Requires-Dist: pre-commit>=2.20.0; extra == 'dev'
+Requires-Dist: pyarrow-stubs>=20.0; extra == 'dev'
 Requires-Dist: pytest-cov>=2.0; extra == 'dev'
 Requires-Dist: pytest>=6.0; extra == 'dev'
 Requires-Dist: toml>=0.10.2; extra == 'dev'
 Requires-Dist: yapf==0.43.0; extra == 'dev'
+Provides-Extra: faiss
+Requires-Dist: faiss-cpu==1.13.0; extra == 'faiss'
+Provides-Extra: milvus
+Requires-Dist: pymilvus>=2.6.5; extra == 'milvus'
 Provides-Extra: onnx
-Requires-Dist: onnx>=1.12.0; extra == 'onnx'
-Requires-Dist: onnxruntime>=1.12.0; extra == 'onnx'
+Requires-Dist: onnx>=1.14.0; extra == 'onnx'
+Requires-Dist: onnxconverter-common>=1.14.0; extra == 'onnx'
+Requires-Dist: onnxruntime>=1.14.0; extra == 'onnx'
+Provides-Extra: tracking
+Requires-Dist: swanlab>=0.1.0; extra == 'tracking'
+Requires-Dist: tensorboardx>=2.5; extra == 'tracking'
+Requires-Dist: wandb>=0.13.0; extra == 'tracking'
 Provides-Extra: visualization
 Requires-Dist: graphviz>=0.20; extra == 'visualization'
 Requires-Dist: torchview>=0.2.6; extra == 'visualization'
@@ -89,7 +103,8 @@ Description-Content-Type: text/markdown
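 * **Easy Configuration:** Adjust experiment settings via config files or command-line arguments.
 * **Reproducibility:** Designed to ensure reproducible experimental results.
 * **ONNX Export:** Export trained models to ONNX format for easy deployment to production.
-* **Additional Features:** e.g., support for negative sampling, multi-task learning, etc.
+* **Cross-engine data processing:** PySpark-based data processing and conversion is now supported, making it easy to adopt in big-data pipelines.
+* **Experiment visualization & tracking:** Built-in unified integration of three visualization/tracking tools: WandB, SwanLab, and TensorBoardX.
 ## 📖 Table of Contents
@@ -399,4 +414,4 @@ ctr_trainer.visualization(save_path="model.pdf", dpi=300)  # Save as high-resolution PDF
 ---
-*Last updated: [2025-12-04]*
+*Last updated: [2025-12-11]*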
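
Version 0.1.0 declares five new extras (`annoy`, `bigdata`, `faiss`, `milvus`, `tracking`), raises the `onnx` extra's floor to 1.14.0 and adds `onnxconverter-common` to it. They install with the usual syntax, e.g. `pip install "torch-rechub[faiss,milvus]"`. Because each backend is optional, it can help to check which ones are importable before choosing one. The standard-library sketch below does that; `available_extras` is a hypothetical helper, and the extra-to-module table is an assumption based on each distribution's import name (for example `faiss-cpu` imports as `faiss` and `tensorboardx` as `tensorboardX`).

```python
import importlib.util
from typing import Dict, Iterable, Mapping, Tuple

# Assumed import names for the optional extras declared in PKG-INFO.
EXTRA_MODULES: Dict[str, Tuple[str, ...]] = {
    "annoy": ("annoy",),
    "bigdata": ("pyarrow",),
    "faiss": ("faiss",),
    "milvus": ("pymilvus",),
    "onnx": ("onnx", "onnxconverter_common", "onnxruntime"),
    "tracking": ("swanlab", "tensorboardX", "wandb"),
    "visualization": ("graphviz", "torchview"),
}


def module_available(name: str) -> bool:
    """True if top-level module `name` can be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


def available_extras(
    extras: Mapping[str, Iterable[str]] = EXTRA_MODULES,
) -> Dict[str, bool]:
    """Map each extra to True only if every module it needs is importable."""
    return {
        extra: all(module_available(m) for m in modules)
        for extra, modules in extras.items()
    }


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        hint = "installed" if ok else f'missing: pip install "torch-rechub[{extra}]"'
        print(f"{extra:14s} {hint}")
```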

{torch_rechub-0.0.5 → torch_rechub-0.1.0}/README.md RENAMED

@@ -41,7 +41,8 @@
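 * **Easy Configuration:** Adjust experiment settings via config files or command-line arguments.
 * **Reproducibility:** Designed to ensure reproducible experimental results.
 * **ONNX Export:** Export trained models to ONNX format for easy deployment to production.
-* **Additional Features:** e.g., support for negative sampling, multi-task learning, etc.
+* **Cross-engine data processing:** PySpark-based data processing and conversion is now supported, making it easy to adopt in big-data pipelines.
+* **Experiment visualization & tracking:** Built-in unified integration of three visualization/tracking tools: WandB, SwanLab, and TensorBoardX.
 ## 📖 Table of Contents
@@ -351,4 +352,4 @@ ctr_trainer.visualization(save_path="model.pdf", dpi=300)  # Save as high-resolution PDF
 ---
-*Last updated: [2025-12-04]*
+*Last updated: [2025-12-11]*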

{torch_rechub-0.0.5 → torch_rechub-0.1.0}/README_en.md RENAMED

@@ -41,6 +41,8 @@ English | [简体中文](README.md)
 * **Easy Configuration:** Adjust experiment settings via config files or command-line arguments.
 * **Reproducibility:** Designed to ensure reproducible experimental results.
 * **ONNX Export:** Export trained models to ONNX format for production deployment.
+* **Cross-engine data processing:** PySpark-based data processing and conversion supported for large-scale pipelines.
+* **Experiment visualization & tracking:** Unified integration of WandB, SwanLab, and TensorBoardX.
 * **Additional Features:** Negative sampling, multi-task learning, etc.
 ## 📖 Table of Contents
@@ -342,4 +344,4 @@ If you use this framework in your research or work, please consider citing:
 ---
-*Last updated: [2025-12-04]*
+*Last updated: [2025-12-11]*

torch_rechub-0.1.0/docs/.vitepress/config.mts ADDED

@@ -0,0 +1,214 @@
+import { defineConfig } from 'vitepress'
+export default defineConfig({
+  title: "torch-rechub",
+  description: "A Lighting Pytorch Framework for Recommendation Models, Easy-to-use and Easy-to-extend.",
+  head: [
+    ['link', { rel: 'icon', href: '/torch-rechub/favicon.ico' }]
+  ],
+  base: '/torch-rechub/',
+  // Path rewrite: assumes the source files live under docs/en/, but the served URL drops the en/ prefix
+  rewrites: {
+    'en/:rest*': ':rest*'
+  },
+  themeConfig: {
+    logo: '/img/logo.png',
+    search: { provider: 'local' },
+    socialLinks: [
+      { icon: 'github', link: 'https://github.com/datawhalechina/torch-rechub' }
+    ]
+  },
+  locales: {
+    // ====================================================
+    // 🇬🇧 English (Root)
+    // ====================================================
+    root: {
+      label: 'English',
+      lang: 'en',
+      themeConfig: {
+        nav: [
+          { text: '🏠 Home', link: '/' },
+          { text: '🚀 Getting Started', link: '/guide/intro' },
+          { text: '⚙️ Core', link: '/core/intro' },
+          { text: '🏰 Models', link: '/models/intro' },
+          { text: '🛠️ Tools', link: '/tools/intro' },
+          { text: '🚀 Serving', link: '/serving/intro' },
+          { text: '📖 Tutorials', link: '/tutorials/intro' },
+          { text: 'ℹ️ API', link: '/api/api' },
+          { text: '👥 Community', link: '/community/faq' }
+        ],
+        sidebar: {
+          '/guide/': [
+            {
+              text: '🚀 Getting Started',
+              items: [
+                { text: 'Overview', link: '/guide/intro' },
+                { text: 'Installation', link: '/guide/install' },
+                { text: 'Quick Start', link: '/guide/quick_start' }
+              ]
+            }
+          ],
+          '/core/': [{
+            text: '⚙️ Core Components', items: [
+              { text: 'Overview', link: '/core/intro' },
+              { text: 'Feature Columns', link: '/core/features' },
+              { text: 'Data Pipeline', link: '/core/data' },
+              { text: 'Training & Eval', link: '/core/evaluation' }
+            ]
+          }],
+          '/models/': [{
+            text: '🏰 Model Zoo', items: [
+              { text: 'Overview', link: '/models/intro' },
+              { text: 'Ranking Models', link: '/models/ranking' },
+              { text: 'Matching Models', link: '/models/matching' },
+              { text: 'Multi-Task Models', link: '/models/mtl' },
+              { text: 'Generative Models', link: '/models/generative' }
+            ]
+          }],
+          '/tools/': [{
+            text: '🛠️ Dev Tools', items: [
+              { text: 'Overview', link: '/tools/intro' },
+              { text: 'Visualization', link: '/tools/visualization' },
+              { text: 'Experiment Tracking', link: '/tools/tracking' },
+              { text: 'Callbacks', link: '/tools/callbacks' }
+            ]
+          }],
+          '/serving/': [{
+            text: '🚀 Serving', items: [
+              { text: 'Overview', link: '/serving/intro' },
+              { text: 'ONNX & Quantization', link: '/serving/onnx' },
+              { text: 'Vector Indexing', link: '/serving/vector_index' },
+              { text: 'Serving Demo', link: '/serving/demo' }
+            ]
+          }],
+          '/tutorials/': [{
+            text: '📖 Tutorials', items: [
+              { text: 'Overview', link: '/tutorials/intro' },
+              { text: 'CTR Pipeline', link: '/tutorials/ctr' },
+              { text: 'Retrieval System', link: '/tutorials/retrieval' },
+              { text: 'Big Data Pipeline', link: '/tutorials/pipeline' }
+            ]
+          }],
+          '/api/': [
+            {
+              text: 'ℹ️ API Reference',
+              items: [
+                { text: 'Main API', link: '/api/api' },
+              ]
+            }
+          ],
+          '/community/': [
+            {
+              text: '📘 Community',
+              items: [
+                { text: 'FAQ', link: '/community/faq' },
+                { text: 'Contributing', link: '/community/contributing' },
+                { text: 'Changelog', link: '/community/changelog' }
+              ]
+            }
+          ]
+        }
+      }
+    },
+    // ====================================================
+    // 🇨🇳 中文 (Zh)
+    // ====================================================
+    zh: {
+      label: '中文',
+      lang: 'zh-CN',
+      link: '/zh/',
+      themeConfig: {
+        nav: [
+          { text: '🏠 首页', link: '/zh/' },
+          { text: '🚀 快速入门', link: '/zh/guide/intro' },
+          { text: '⚙️ 核心组件', link: '/zh/core/intro' },
+          { text: '🏰 模型库', link: '/zh/models/intro' },
+          { text: '🛠️ 研发工具', link: '/zh/tools/intro' },
+          { text: '🚀 生产部署', link: '/zh/serving/intro' },
+          { text: '📖 场景教程', link: '/zh/tutorials/intro' },
+          { text: 'ℹ️ API', link: '/zh/api/api' },
+          { text: '👥 社区', link: '/zh/community/faq' }
+        ],
+        sidebar: {
+          '/zh/guide/': [
+            {
+              text: '🚀 快速入门',
+              items: [
+                { text: '导览 (Overview)', link: '/zh/guide/intro' },
+                { text: '安装指南', link: '/zh/guide/install' },
+                { text: '3分钟上手', link: '/zh/guide/quick_start' }
+              ]
+            }
+          ],
+          '/zh/core/': [{
+            text: '⚙️ 核心组件', items: [
+              { text: '导览 (Overview)', link: '/zh/core/intro' },
+              { text: '特征定义 (Features)', link: '/zh/core/features' },
+              { text: '数据流水线 (Data)', link: '/zh/core/data' },
+              { text: '训练与评估 (Eval)', link: '/zh/core/evaluation' }
+            ]
+          }],
+          '/zh/models/': [{
+            text: '🏰 模型库', items: [
+              { text: '导览 (Overview)', link: '/zh/models/intro' },
+              { text: '排序模型 (Ranking)', link: '/zh/models/ranking' },
+              { text: '召回模型 (Matching)', link: '/zh/models/matching' },
+              { text: '多任务模型 (MTL)', link: '/zh/models/mtl' },
+              { text: '生成式模型 (Generative)', link: '/zh/models/generative' }
+            ]
+          }],
+          '/zh/tools/': [{
+            text: '🛠️ 研发工具', items: [
+              { text: '导览 (Overview)', link: '/zh/tools/intro' },
+              { text: '可视化监控', link: '/zh/tools/visualization' },
+              { text: '实验追踪', link: '/zh/tools/tracking' },
+              { text: '回调函数', link: '/zh/tools/callbacks' }
+            ]
+          }],
+          '/zh/serving/': [{
+            text: '🚀 生产部署', items: [
+              { text: '导览 (Overview)', link: '/zh/serving/intro' },
+              { text: 'ONNX 导出与量化', link: '/zh/serving/onnx' },
+              { text: '向量检索封装', link: '/zh/serving/vector_index' },
+              { text: '在线服务示例', link: '/zh/serving/demo' }
+            ]
+          }],
+          '/zh/tutorials/': [{
+            text: '📖 场景教程', items: [
+              { text: '导览 (Overview)', link: '/zh/tutorials/intro' },
+              { text: 'CTR 预估流程', link: '/zh/tutorials/ctr' },
+              { text: '召回系统搭建', link: '/zh/tutorials/retrieval' },
+              { text: '全链路流水线', link: '/zh/tutorials/pipeline' }
+            ]
+          }],
+          '/zh/api/': [
+            {
+              text: 'ℹ️ API Reference',
+              items: [
+                { text: 'API 参考', link: '/zh/api/api' },
+              ]
+            }
+          ],
+          '/zh/community/': [
+            {
+              text: '📘 社区信息',
+              items: [
+                { text: '常见问题 (FAQ)', link: '/zh/community/faq' },
+                { text: '贡献指南 (Contributing)', link: '/zh/community/contributing' },
+                { text: '版本日志 (Changelog)', link: '/zh/community/changelog' }
+              ]
+            }
+          ]
+        }
+      }
+    }
+  }
+})
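
The `rewrites` rule `'en/:rest*': ':rest*'` serves English pages from the site root, Chinese pages keep their `zh/` prefix, and everything sits under `base: '/torch-rechub/'`. VitePress prepends `base` to root-relative links, which is why the `nav` and `sidebar` entries omit `/torch-rechub/`. The resulting mapping from source file to public URL path can be approximated as below. This is a simplified, hypothetical `route_for` helper, not VitePress's router; it ignores the `.html` suffix VitePress emits when `cleanUrls` is off.

```python
from pathlib import PurePosixPath

BASE = "/torch-rechub/"  # `base` from config.mts


def route_for(source: str, base: str = BASE) -> str:
    """Map a docs-relative source file (e.g. 'en/guide/intro.md') to its URL path."""
    path = source
    # Apply the rewrite 'en/:rest*' -> ':rest*' by stripping the English prefix.
    if path.startswith("en/"):
        path = path[len("en/"):]
    stem = str(PurePosixPath(path).with_suffix(""))
    # index.md pages are served at their directory's URL.
    if stem == "index":
        stem = ""
    elif stem.endswith("/index"):
        stem = stem[: -len("index")]
    return base + stem


if __name__ == "__main__":
    for src in ("en/index.md", "en/guide/intro.md", "zh/index.md", "zh/guide/intro.md"):
        print(f"{src:22s} -> {route_for(src)}")
```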

torch_rechub-0.1.0/docs/en/community/faq.md ADDED

File without changes

torch_rechub-0.1.0/docs/en/core/data.md ADDED

@@ -0,0 +1,86 @@
+---
+title: Data Pipeline
+description: Torch-RecHub data loading and preprocessing
+---
+# Data Pipeline
+Torch-RecHub offers datasets, generators, and utilities for recommendation data.
+## Datasets
+### TorchDataset
+Training/validation dataset with features and labels.
+```python
+from torch_rechub.utils.data import TorchDataset
+dataset = TorchDataset(x, y)
+```
+### PredictDataset
+Prediction-only dataset (features only).
+```python
+from torch_rechub.utils.data import PredictDataset
+dataset = PredictDataset(x)
+```
+## Data Generators
+### DataGenerator
+Build dataloaders for ranking / multi-task models.
+```python
+from torch_rechub.utils.data import DataGenerator
+dg = DataGenerator(x, y)
+train_dl, val_dl, test_dl = dg.generate_dataloader(
+    split_ratio=[0.7, 0.1],
+    batch_size=256,
+    num_workers=8,
+)
+```
+### MatchDataGenerator
+Build dataloaders for matching/retrieval models.
+```python
+from torch_rechub.utils.data import MatchDataGenerator
+dg = MatchDataGenerator(x, y)
+train_dl, test_dl, item_dl = dg.generate_dataloader(
+    x_test_user=x_test_user,
+    x_all_item=x_all_item,
+    batch_size=256,
+    num_workers=8,
+)
+```
+## Utilities
+### get_auto_embedding_dim
+Compute embedding dim from vocab size: ``int(floor(6 * num_classes**0.25))``.
+```python
+from torch_rechub.utils.data import get_auto_embedding_dim
+embed_dim = get_auto_embedding_dim(1000)  # 1000 = vocabulary size (num_classes)
+```
+### get_loss_func
+Return default loss by task type: BCELoss for classification, MSELoss for regression.
+```python
+from torch_rechub.utils.data import get_loss_func
+loss_fn = get_loss_func(task_type="classification")
+```
+## Typical Flow
+1. Define features (Dense/Sparse/Sequence).
+2. Load raw data.
+3. Encode categorical features (e.g., LabelEncoder).
+4. Process sequences (pad/truncate).
+5. Construct samples (e.g., negative sampling).
+6. Use DataGenerator / MatchDataGenerator to build dataloaders.
+7. Train models with the trainers.
