PyPI - torch-rechub - Versions diffs - 0.0.5__tar.gz → 0.0.6__tar.gz - Mend

torch-rechub 0.0.5tar.gz → 0.0.6tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (263)

{torch_rechub-0.0.5 → torch_rechub-0.0.6}/.github/workflows/ci.yml RENAMED Viewed

@@ -348,7 +348,7 @@ jobs:
           fi
       - name: Install uv
-        uses: astral-sh/setup-uv@v4
+        uses: astral-sh/setup-uv@v7
         with:
           version: "latest"

{torch_rechub-0.0.5 → torch_rechub-0.0.6}/.github/workflows/deploy.yml RENAMED Viewed

@@ -7,6 +7,7 @@ on:
     paths:
       - 'docs/**'
       - 'package.json'
+      - 'CHANGELOG.md'
       - '.github/workflows/deploy.yml'
 jobs:
@@ -27,6 +28,13 @@ jobs:
       - name: Install dependencies
         run: npm ci
+      - name: Sync CHANGELOG to docs
+        run: |
+          # Copy CHANGELOG.md into the Chinese and English docs directories
+          cp CHANGELOG.md docs/zh/community/changelog.md
+          cp CHANGELOG.md docs/en/community/changelog.md
+          echo "✅ CHANGELOG.md synced to docs directories"
       - name: Build VitePress site
         run: npm run docs:build

{torch_rechub-0.0.5 → torch_rechub-0.0.6}/CHANGELOG.md RENAMED Viewed

@@ -7,6 +7,26 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ---
+## [0.0.6] - 2025-12-11
+<!-- Release notes generated using configuration in .github/release.yml at main -->
+## What's Changed
+### ✨ 新特性 / Features
+* FEATURE: Support Streaming Parquet Dataset by @ywuenthought in https://github.com/datawhalechina/torch-rechub/pull/143
+* Docs & tracking polish: logger docstrings, README refresh, dependency tweak by @1985312383 in https://github.com/datawhalechina/torch-rechub/pull/146
+### 📝 文档更新 / Documentation
+* Refator Chinese documentation structure by @1985312383 in https://github.com/datawhalechina/torch-rechub/pull/145
+## New Contributors
+* @ywuenthought made their first contribution in https://github.com/datawhalechina/torch-rechub/pull/143
+**Full Changelog**: https://github.com/datawhalechina/torch-rechub/compare/v0.0.5...v0.0.6
+---
 ## [0.0.5] - 2025-12-05
 <!-- Release notes generated using configuration in .github/release.yml at main -->

{torch_rechub-0.0.5 → torch_rechub-0.0.6}/CONTRIBUTING.md RENAMED Viewed

@@ -143,7 +143,7 @@ def test_deepfm_forward():
 - Include code examples
 - Provide clear step-by-step instructions
 - Keep both English and Chinese versions synchronized
-- Follow Google-style docstrings for Python code
+- Follow scikit-learn style docstrings (NumPy/SciPy convention) for Python code
 ### Docstring Example

{torch_rechub-0.0.5 → torch_rechub-0.0.6}/PKG-INFO RENAMED Viewed

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: torch-rechub
-Version: 0.0.5
+Version: 0.0.6
 Summary: A Pytorch Toolbox for Recommendation Models, Easy-to-use and Easy-to-extend.
 Project-URL: Homepage, https://github.com/datawhalechina/torch-rechub
 Project-URL: Documentation, https://www.torch-rechub.com
@@ -28,19 +28,26 @@ Requires-Dist: scikit-learn>=0.24.0
 Requires-Dist: torch>=1.10.0
 Requires-Dist: tqdm>=4.60.0
 Requires-Dist: transformers>=4.46.3
+Provides-Extra: bigdata
+Requires-Dist: pyarrow~=21.0; extra == 'bigdata'
 Provides-Extra: dev
 Requires-Dist: bandit>=1.7.0; extra == 'dev'
 Requires-Dist: flake8>=3.8.0; extra == 'dev'
 Requires-Dist: isort==5.13.2; extra == 'dev'
 Requires-Dist: mypy>=0.800; extra == 'dev'
 Requires-Dist: pre-commit>=2.20.0; extra == 'dev'
+Requires-Dist: pyarrow-stubs>=20.0; extra == 'dev'
 Requires-Dist: pytest-cov>=2.0; extra == 'dev'
 Requires-Dist: pytest>=6.0; extra == 'dev'
 Requires-Dist: toml>=0.10.2; extra == 'dev'
 Requires-Dist: yapf==0.43.0; extra == 'dev'
 Provides-Extra: onnx
-Requires-Dist: onnx>=1.12.0; extra == 'onnx'
-Requires-Dist: onnxruntime>=1.12.0; extra == 'onnx'
+Requires-Dist: onnx>=1.14.0; extra == 'onnx'
+Requires-Dist: onnxruntime>=1.14.0; extra == 'onnx'
+Provides-Extra: tracking
+Requires-Dist: swanlab>=0.1.0; extra == 'tracking'
+Requires-Dist: tensorboardx>=2.5; extra == 'tracking'
+Requires-Dist: wandb>=0.13.0; extra == 'tracking'
 Provides-Extra: visualization
 Requires-Dist: graphviz>=0.20; extra == 'visualization'
 Requires-Dist: torchview>=0.2.6; extra == 'visualization'
@@ -89,7 +96,8 @@ Description-Content-Type: text/markdown
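 * **Easy Configuration:** Adjust experiment settings via config files or command-line arguments.
 * **Reproducibility:** Designed to ensure reproducible experimental results.
 * **ONNX Export:** Export trained models to ONNX format for production deployment.
-* **Additional Features:** Negative sampling, multi-task learning, etc.
+* **Cross-engine data processing:** PySpark-based data processing and conversion is now supported, making it easier to adopt in big-data pipelines.
+* **Experiment visualization & tracking:** Built-in, unified integration of three visualization/tracking tools: WandB, SwanLab, and TensorBoardX.
 ## 📖 Table of Contents
@@ -399,4 +407,4 @@ ctr_trainer.visualization(save_path="model.pdf", dpi=300)  # Save as a high-resolution PDF
 ---
-*Last updated: [2025-12-04]*
+*Last updated: [2025-12-11]*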

{torch_rechub-0.0.5 → torch_rechub-0.0.6}/README.md RENAMED Viewed

@@ -41,7 +41,8 @@
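 * **Easy Configuration:** Adjust experiment settings via config files or command-line arguments.
 * **Reproducibility:** Designed to ensure reproducible experimental results.
 * **ONNX Export:** Export trained models to ONNX format for production deployment.
-* **Additional Features:** Negative sampling, multi-task learning, etc.
+* **Cross-engine data processing:** PySpark-based data processing and conversion is now supported, making it easier to adopt in big-data pipelines.
+* **Experiment visualization & tracking:** Built-in, unified integration of three visualization/tracking tools: WandB, SwanLab, and TensorBoardX.
 ## 📖 Table of Contents
@@ -351,4 +352,4 @@ ctr_trainer.visualization(save_path="model.pdf", dpi=300)  # Save as a high-resolution PDF
 ---
-*Last updated: [2025-12-04]*
+*Last updated: [2025-12-11]*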

{torch_rechub-0.0.5 → torch_rechub-0.0.6}/README_en.md RENAMED Viewed

@@ -41,6 +41,8 @@ English | [简体中文](README.md)
 * **Easy Configuration:** Adjust experiment settings via config files or command-line arguments.
 * **Reproducibility:** Designed to ensure reproducible experimental results.
 * **ONNX Export:** Export trained models to ONNX format for production deployment.
+* **Cross-engine data processing:** PySpark-based data processing and conversion supported for large-scale pipelines.
+* **Experiment visualization & tracking:** Unified integration of WandB, SwanLab, and TensorBoardX.
 * **Additional Features:** Negative sampling, multi-task learning, etc.
 ## 📖 Table of Contents
@@ -342,4 +344,4 @@ If you use this framework in your research or work, please consider citing:
 ---
-*Last updated: [2025-12-04]*
+*Last updated: [2025-12-11]*

torch_rechub-0.0.6/docs/.vitepress/config.mts ADDED Viewed

@@ -0,0 +1,214 @@
+import { defineConfig } from 'vitepress'
+export default defineConfig({
+  title: "torch-rechub",
+  description: "A Lighting Pytorch Framework for Recommendation Models, Easy-to-use and Easy-to-extend.",
+  head: [
+    ['link', { rel: 'icon', href: '/torch-rechub/favicon.ico' }]
+  ],
+  base: '/torch-rechub/',
+  // Path rewrite: source files are assumed to live under docs/en/, but the served paths drop the en prefix
+  rewrites: {
+    'en/:rest*': ':rest*'
+  },
+  themeConfig: {
+    logo: '/img/logo.png',
+    search: { provider: 'local' },
+    socialLinks: [
+      { icon: 'github', link: 'https://github.com/datawhalechina/torch-rechub' }
+    ]
+  },
+  locales: {
+    // ====================================================
+    // 🇬🇧 English (Root)
+    // ====================================================
+    root: {
+      label: 'English',
+      lang: 'en',
+      themeConfig: {
+        nav: [
+          { text: '🏠 Home', link: '/' },
+          { text: '🚀 Getting Started', link: '/guide/intro' },
+          { text: '⚙️ Core', link: '/core/intro' },
+          { text: '🏰 Models', link: '/models/intro' },
+          { text: '🛠️ Tools', link: '/tools/intro' },
+          { text: '🚀 Serving', link: '/serving/intro' },
+          { text: '📖 Tutorials', link: '/tutorials/intro' },
+          { text: 'ℹ️ API', link: '/api/api' },
+          { text: '👥 Community', link: '/community/faq' }
+        ],
+        sidebar: {
+          '/guide/': [
+            {
+              text: '🚀 Getting Started',
+              items: [
+                { text: 'Overview', link: '/guide/intro' },
+                { text: 'Installation', link: '/guide/install' },
+                { text: 'Quick Start', link: '/guide/quick_start' }
+              ]
+            }
+          ],
+          '/core/': [{
+            text: '⚙️ Core Components', items: [
+              { text: 'Overview', link: '/core/intro' },
+              { text: 'Feature Columns', link: '/core/features' },
+              { text: 'Data Pipeline', link: '/core/data' },
+              { text: 'Training & Eval', link: '/core/evaluation' }
+            ]
+          }],
+          '/models/': [{
+            text: '🏰 Model Zoo', items: [
+              { text: 'Overview', link: '/models/intro' },
+              { text: 'Ranking Models', link: '/models/ranking' },
+              { text: 'Matching Models', link: '/models/matching' },
+              { text: 'Multi-Task Models', link: '/models/mtl' },
+              { text: 'Generative Models', link: '/models/generative' }
+            ]
+          }],
+          '/tools/': [{
+            text: '🛠️ Dev Tools', items: [
+              { text: 'Overview', link: '/tools/intro' },
+              { text: 'Visualization', link: '/tools/visualization' },
+              { text: 'Experiment Tracking', link: '/tools/tracking' },
+              { text: 'Callbacks', link: '/tools/callbacks' }
+            ]
+          }],
+          '/serving/': [{
+            text: '🚀 Serving', items: [
+              { text: 'Overview', link: '/serving/intro' },
+              { text: 'ONNX & Quantization', link: '/serving/onnx' },
+              { text: 'Vector Indexing', link: '/serving/vector_index' },
+              { text: 'Serving Demo', link: '/serving/demo' }
+            ]
+          }],
+          '/tutorials/': [{
+            text: '📖 Tutorials', items: [
+              { text: 'Overview', link: '/tutorials/intro' },
+              { text: 'CTR Pipeline', link: '/tutorials/ctr' },
+              { text: 'Retrieval System', link: '/tutorials/retrieval' },
+              { text: 'Big Data Pipeline', link: '/tutorials/pipeline' }
+            ]
+          }],
+          '/api/': [
+            {
+              text: 'ℹ️ API Reference',
+              items: [
+                { text: 'Main API', link: '/api/api' },
+              ]
+            }
+          ],
+          '/community/': [
+            {
+              text: '📘 Community',
+              items: [
+                { text: 'FAQ', link: '/community/faq' },
+                { text: 'Contributing', link: '/community/contributing' },
+                { text: 'Changelog', link: '/community/changelog' }
+              ]
+            }
+          ]
+        }
+      }
+    },
+    // ====================================================
+    // 🇨🇳 中文 (Zh)
+    // ====================================================
+    zh: {
+      label: '中文',
+      lang: 'zh-CN',
+      link: '/zh/',
+      themeConfig: {
+        nav: [
+          { text: '🏠 首页', link: '/zh/' },
+          { text: '🚀 快速入门', link: '/zh/guide/intro' },
+          { text: '⚙️ 核心组件', link: '/zh/core/intro' },
+          { text: '🏰 模型库', link: '/zh/models/intro' },
+          { text: '🛠️ 研发工具', link: '/zh/tools/intro' },
+          { text: '🚀 生产部署', link: '/zh/serving/intro' },
+          { text: '📖 场景教程', link: '/zh/tutorials/intro' },
+          { text: 'ℹ️ API', link: '/zh/api/api' },
+          { text: '👥 社区', link: '/zh/community/faq' }
+        ],
+        sidebar: {
+          '/zh/guide/': [
+            {
+              text: '🚀 快速入门',
+              items: [
+                { text: '导览 (Overview)', link: '/zh/guide/intro' },
+                { text: '安装指南', link: '/zh/guide/install' },
+                { text: '3分钟上手', link: '/zh/guide/quick_start' }
+              ]
+            }
+          ],
+          '/zh/core/': [{
+            text: '⚙️ 核心组件', items: [
+              { text: '导览 (Overview)', link: '/zh/core/intro' },
+              { text: '特征定义 (Features)', link: '/zh/core/features' },
+              { text: '数据流水线 (Data)', link: '/zh/core/data' },
+              { text: '训练与评估 (Eval)', link: '/zh/core/evaluation' }
+            ]
+          }],
+          '/zh/models/': [{
+            text: '🏰 模型库', items: [
+              { text: '导览 (Overview)', link: '/zh/models/intro' },
+              { text: '排序模型 (Ranking)', link: '/zh/models/ranking' },
+              { text: '召回模型 (Matching)', link: '/zh/models/matching' },
+              { text: '多任务模型 (MTL)', link: '/zh/models/mtl' },
+              { text: '生成式模型 (Generative)', link: '/zh/models/generative' }
+            ]
+          }],
+          '/zh/tools/': [{
+            text: '🛠️ 研发工具', items: [
+              { text: '导览 (Overview)', link: '/zh/tools/intro' },
+              { text: '可视化监控', link: '/zh/tools/visualization' },
+              { text: '实验追踪', link: '/zh/tools/tracking' },
+              { text: '回调函数', link: '/zh/tools/callbacks' }
+            ]
+          }],
+          '/zh/serving/': [{
+            text: '🚀 生产部署', items: [
+              { text: '导览 (Overview)', link: '/zh/serving/intro' },
+              { text: 'ONNX 导出与量化', link: '/zh/serving/onnx' },
+              { text: '向量检索封装', link: '/zh/serving/vector_index' },
+              { text: '在线服务示例', link: '/zh/serving/demo' }
+            ]
+          }],
+          '/zh/tutorials/': [{
+            text: '📖 场景教程', items: [
+              { text: '导览 (Overview)', link: '/zh/tutorials/intro' },
+              { text: 'CTR 预估流程', link: '/zh/tutorials/ctr' },
+              { text: '召回系统搭建', link: '/zh/tutorials/retrieval' },
+              { text: '全链路流水线', link: '/zh/tutorials/pipeline' }
+            ]
+          }],
+          '/zh/api/': [
+            {
+              text: 'ℹ️ API Reference',
+              items: [
+                { text: 'API 参考', link: '/zh/api/api' },
+              ]
+            }
+          ],
+          '/zh/community/': [
+            {
+              text: '📘 社区信息',
+              items: [
+                { text: '常见问题 (FAQ)', link: '/zh/community/faq' },
+                { text: '贡献指南 (Contributing)', link: '/zh/community/contributing' },
+                { text: '版本日志 (Changelog)', link: '/zh/community/changelog' }
+              ]
+            }
+          ]
+        }
+      }
+    }
+  }
+})

torch_rechub-0.0.6/docs/en/community/faq.md ADDED Viewed

File without changes

torch_rechub-0.0.6/docs/en/core/data.md ADDED Viewed

File without changes

torch_rechub-0.0.6/docs/en/core/evaluation.md ADDED Viewed

File without changes

torch_rechub-0.0.6/docs/en/core/features.md ADDED Viewed

File without changes

torch_rechub-0.0.6/docs/en/core/intro.md ADDED Viewed

File without changes

torch_rechub-0.0.6/docs/en/guide/install.md ADDED Viewed

File without changes

torch_rechub-0.0.6/docs/en/guide/intro.md ADDED Viewed

File without changes

torch_rechub-0.0.6/docs/en/guide/quick_start.md ADDED Viewed

File without changes

torch_rechub-0.0.6/docs/en/models/intro.md ADDED Viewed

File without changes

torch_rechub-0.0.6/docs/en/models/matching.md ADDED Viewed

File without changes

torch_rechub-0.0.6/docs/en/models/mtl.md ADDED Viewed

File without changes

torch_rechub-0.0.6/docs/en/models/ranking.md ADDED Viewed

File without changes

torch_rechub-0.0.6/docs/en/serving/demo.md ADDED Viewed

File without changes

torch_rechub-0.0.6/docs/en/serving/intro.md ADDED Viewed

File without changes

torch_rechub-0.0.6/docs/en/serving/onnx.md ADDED Viewed

File without changes

torch_rechub-0.0.6/docs/en/serving/vector_index.md ADDED Viewed

File without changes

torch_rechub-0.0.6/docs/en/tools/callbacks.md ADDED Viewed

File without changes

torch_rechub-0.0.6/docs/en/tools/intro.md ADDED Viewed

File without changes

torch_rechub-0.0.6/docs/en/tools/tracking.md ADDED Viewed

File without changes

torch_rechub-0.0.6/docs/en/tools/visualization.md ADDED Viewed

File without changes

torch_rechub-0.0.6/docs/en/tutorials/ctr.md ADDED Viewed

File without changes

torch_rechub-0.0.6/docs/en/tutorials/intro.md ADDED Viewed

File without changes

torch_rechub-0.0.6/docs/en/tutorials/pipeline.md ADDED Viewed

File without changes

torch_rechub-0.0.6/docs/en/tutorials/retrieval.md ADDED Viewed

File without changes

torch_rechub-0.0.6/docs/public/favicon.ico ADDED Viewed

Binary file

torch_rechub-0.0.6/docs/public/img/logo.png ADDED Viewed

Binary file

torch_rechub-0.0.6/docs/public/img/logo_with_name.png ADDED Viewed

Binary file

torch_rechub-0.0.6/docs/zh/api/api.md ADDED Viewed

File without changes

torch_rechub-0.0.6/docs/zh/community/changelog.md ADDED Viewed

@@ -0,0 +1,9 @@
+---
+title: Changelog
+description: Torch-RecHub release history
+---
+# Changelog
+This page is under construction.

torch_rechub-0.0.6/docs/zh/core/data.md ADDED Viewed

@@ -0,0 +1,143 @@
+---
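+title: Data Pipeline
+description: Data loading and preprocessing in Torch-RecHub
+---
+# Data Pipeline
+Torch-RecHub provides a complete data processing pipeline, including dataset classes, data generators, and utility functions, to cover the data needs of recommender systems.
+## Dataset Classes
+### TorchDataset
+A dataset for training and validation that holds both features and labels.
+```python
+from torch_rechub.utils.data import TorchDataset
+# Create a dataset
+dataset = TorchDataset(x, y)
+```
+**Parameters:**
+- `x`: feature dictionary mapping each feature name to its data
+- `y`: label data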
+### PredictDataset
+A dataset for prediction that holds features only.
+```python
+from torch_rechub.utils.data import PredictDataset
+# Create a prediction dataset
+dataset = PredictDataset(x)
+```
+**Parameters:**
+- `x`: feature dictionary mapping each feature name to its data
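+## Data Generators
+### DataGenerator
+Builds data loaders for ranking and multi-task models.
+```python
+from torch_rechub.utils.data import DataGenerator
+# Create the data generator
+dg = DataGenerator(x, y)
+# Build the data loaders
+train_dl, val_dl, test_dl = dg.generate_dataloader(
+    split_ratio=[0.7, 0.1],  # train and validation fractions; the rest goes to the test set
+    batch_size=256,           # batch size
+    num_workers=8             # number of parallel workers
+)
+```
+**Parameters:**
+- `x`: feature data
+- `y`: label data
+**`generate_dataloader` parameters:**
+- `split_ratio`: split fractions, a list of length 2
+- `batch_size`: batch size
+- `num_workers`: number of parallel workers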
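Reviewer note: the `split_ratio` convention above takes two fractions but yields three loaders. The sketch below shows one way the three split sizes could be derived, assuming the test split receives whatever remains after the train and validation splits. The helper name `split_lengths` is hypothetical and is not part of torch-rechub.

```python
def split_lengths(n_samples, split_ratio):
    """Return (train, val, test) sizes for a two-element split_ratio.

    Assumption: the test split receives whatever is left after train and val.
    """
    if len(split_ratio) != 2:
        raise ValueError("split_ratio must have exactly two entries")
    train = int(n_samples * split_ratio[0])
    val = int(n_samples * split_ratio[1])
    test = n_samples - train - val
    return train, val, test


# Example: 1000 samples with the ratios used in the documentation snippet.
print(split_lengths(1000, [0.7, 0.1]))
```

Under this assumption the two fractions should sum to less than 1, otherwise the test split would be empty or negative.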
+### MatchDataGenerator
+Builds data loaders for matching (retrieval) models.
+```python
+from torch_rechub.utils.data import MatchDataGenerator
+# Create the matching data generator
+dg = MatchDataGenerator(x, y)
+# Build the data loaders
+train_dl, test_dl, item_dl = dg.generate_dataloader(
+    x_test_user=x_test_user,  # test user data
+    x_all_item=x_all_item,    # data for all items
+    batch_size=256,           # batch size
+    num_workers=8             # number of parallel workers
+)
+```
+**Parameters:**
+- `x`: feature data
+- `y`: label data (optional)
+**`generate_dataloader` parameters:**
+- `x_test_user`: test user data
+- `x_all_item`: data for all items
+- `batch_size`: batch size
+- `num_workers`: number of parallel workers
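+## Utility Functions
+### get_auto_embedding_dim
+Computes an embedding dimension automatically from the number of categories.
+```python
+from torch_rechub.utils.data import get_auto_embedding_dim
+# Compute the embedding dimension automatically
+embed_dim = get_auto_embedding_dim(num_classes=1000)
+```
+**Parameters:**
+- `num_classes`: number of categories
+**Returns:**
+- The embedding dimension, computed as `int(np.floor(6 * np.pow(num_classes, 0.25)))`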
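Reviewer note: to make the documented rule concrete, the sketch below evaluates `floor(6 * num_classes ** 0.25)` for a few category counts using only the standard library. The function name `auto_embedding_dim` is a hypothetical stand-in used for illustration; it mirrors the formula quoted above rather than calling torch-rechub itself.

```python
import math


def auto_embedding_dim(num_classes):
    """Mirror the documented rule floor(6 * num_classes ** 0.25) without NumPy."""
    return int(math.floor(6 * num_classes ** 0.25))


# The dimension grows with the fourth root of the category count,
# so large vocabularies still get fairly compact embeddings.
for n in (10, 1000, 100000):
    print(n, auto_embedding_dim(n))
```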
+### get_loss_func
+Returns the loss function that matches the task type.
+```python
+from torch_rechub.utils.data import get_loss_func
+# Get the loss function for a classification task
+loss_func = get_loss_func(task_type="classification")
+# Get the loss function for a regression task
+loss_func = get_loss_func(task_type="regression")
+```
+**Parameters:**
+- `task_type`: task type, either `classification` or `regression`
+**Returns:**
+- An instance of the corresponding loss function
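+## Data Processing Workflow
+1. **Feature definition**: define features with DenseFeature, SparseFeature, and SequenceFeature
+2. **Data loading**: load the raw data
+3. **Feature encoding**: encode categorical features with LabelEncoder
+4. **Sequence processing**: pad and truncate sequence features as needed
+5. **Sample construction**: build training samples, including negative sampling
+6. **Data generation**: build data loaders with DataGenerator or MatchDataGenerator
+7. **Model training**: pass the data loaders to the model for training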
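Reviewer note: steps 3 and 4 of the workflow above (feature encoding and sequence processing) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the library's implementation: `label_encode` and `pad_or_truncate` are hypothetical helpers, IDs are assigned in first-seen order starting at 1 (scikit-learn's `LabelEncoder` instead sorts classes and starts at 0), and 0 is reserved as the padding value.

```python
def label_encode(values):
    """Map each distinct value to an integer ID, starting at 1 in first-seen order.

    ID 0 is left unused so it can serve as the padding value below.
    """
    mapping = {}
    encoded = []
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping) + 1
        encoded.append(mapping[v])
    return encoded, mapping


def pad_or_truncate(seq, max_len, pad_value=0):
    """Keep the most recent max_len items and left-pad shorter sequences.

    Assumes max_len >= 1.
    """
    seq = list(seq)[-max_len:]
    return [pad_value] * (max_len - len(seq)) + seq


# Encode a toy click history, then bring it to a fixed length of 6.
item_ids, vocab = label_encode(["shoe", "hat", "shoe", "bag"])
history = pad_or_truncate(item_ids, max_len=6)
print(item_ids, history)
```

Keeping the most recent items when truncating is a design choice made for this sketch; a real pipeline may pad or truncate on the other side.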
