PyPI - nextrec - Versions diffs - 0.4.9__tar.gz → 0.4.10__tar.gz - Mend

nextrec 0.4.9.tar.gz → 0.4.10.tar.gz


Files changed (167)

{nextrec-0.4.9 → nextrec-0.4.10}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: nextrec
-Version: 0.4.9
+Version: 0.4.10
 Summary: A comprehensive recommendation library with match, ranking, and multi-task learning models
 Project-URL: Homepage, https://github.com/zerolovesea/NextRec
 Project-URL: Repository, https://github.com/zerolovesea/NextRec
@@ -66,7 +66,7 @@ Description-Content-Type: text/markdown
 ![Python](https://img.shields.io/badge/Python-3.10+-blue.svg)
 ![PyTorch](https://img.shields.io/badge/PyTorch-1.10+-ee4c2c.svg)
 ![License](https://img.shields.io/badge/License-Apache%202.0-green.svg)
-![Version](https://img.shields.io/badge/Version-0.4.9-orange.svg)
+![Version](https://img.shields.io/badge/Version-0.4.10-orange.svg)
 中文文档 | [English Version](README_en.md)
@@ -99,11 +99,10 @@ NextRec是一个基于PyTorch的现代推荐系统框架，旨在为研究工程
 ## NextRec近期进展
-- **12/12/2025** 在v0.4.9中加入了[RQ-VAE](/nextrec/models/representation/rqvae.py)模块。配套的[数据集](/dataset/ecommerce_task.csv)和[代码](tutorials/notebooks/zh/使用RQ-VAE构建语义ID.ipynb)已经同步在仓库中
+- **12/12/2025** 在v0.4.10中加入了[RQ-VAE](/nextrec/models/representation/rqvae.py)模块。配套的[数据集](/dataset/ecommerce_task.csv)和[代码](tutorials/notebooks/zh/使用RQ-VAE构建语义ID.ipynb)已经同步在仓库中
 - **07/12/2025** 发布了NextRec CLI命令行工具，它允许用户根据配置文件进行一键训练和推理，我们提供了相关的[教程](/nextrec_cli_preset/NextRec-CLI_zh.md)和[教学代码](/nextrec_cli_preset)
 - **03/12/2025** NextRec获得了100颗🌟！感谢大家的支持
 - **06/12/2025** 在v0.4.1中支持了单机多卡的分布式DDP训练，并且提供了配套的[代码](tutorials/distributed)
-- **23/11/2025** 在v0.2.2中对basemodel进行了逻辑上的大幅重构和流程统一，并且对listwise/pairwise/pointwise损失进行了统一
 - **11/11/2025** NextRec v0.1.0发布，我们提供了10余种Ranking模型，4种多任务模型和4种召回模型，以及统一的训练/日志/指标管理系统
 ## 架构
@@ -241,11 +240,11 @@ nextrec --mode=train --train_config=path/to/train_config.yaml
 nextrec --mode=predict --predict_config=path/to/predict_config.yaml
 ```
-> 截止当前版本0.4.9，NextRec CLI支持单机训练，分布式训练相关功能尚在开发中。
+> 截止当前版本0.4.10，NextRec CLI支持单机训练，分布式训练相关功能尚在开发中。
 ## 兼容平台
-当前最新版本为0.4.9，所有模型和测试代码均已在以下平台通过验证，如果开发者在使用中遇到兼容问题，请在issue区提出错误报告及系统版本：
+当前最新版本为0.4.10，所有模型和测试代码均已在以下平台通过验证，如果开发者在使用中遇到兼容问题，请在issue区提出错误报告及系统版本：
 | 平台 | 配置 |
 |------|------|

{nextrec-0.4.9 → nextrec-0.4.10}/README.md RENAMED

@@ -7,7 +7,7 @@
 ![Python](https://img.shields.io/badge/Python-3.10+-blue.svg)
 ![PyTorch](https://img.shields.io/badge/PyTorch-1.10+-ee4c2c.svg)
 ![License](https://img.shields.io/badge/License-Apache%202.0-green.svg)
-![Version](https://img.shields.io/badge/Version-0.4.9-orange.svg)
+![Version](https://img.shields.io/badge/Version-0.4.10-orange.svg)
 中文文档 | [English Version](README_en.md)
@@ -40,11 +40,10 @@ NextRec是一个基于PyTorch的现代推荐系统框架，旨在为研究工程
 ## NextRec近期进展
-- **12/12/2025** 在v0.4.9中加入了[RQ-VAE](/nextrec/models/representation/rqvae.py)模块。配套的[数据集](/dataset/ecommerce_task.csv)和[代码](tutorials/notebooks/zh/使用RQ-VAE构建语义ID.ipynb)已经同步在仓库中
+- **12/12/2025** 在v0.4.10中加入了[RQ-VAE](/nextrec/models/representation/rqvae.py)模块。配套的[数据集](/dataset/ecommerce_task.csv)和[代码](tutorials/notebooks/zh/使用RQ-VAE构建语义ID.ipynb)已经同步在仓库中
 - **07/12/2025** 发布了NextRec CLI命令行工具，它允许用户根据配置文件进行一键训练和推理，我们提供了相关的[教程](/nextrec_cli_preset/NextRec-CLI_zh.md)和[教学代码](/nextrec_cli_preset)
 - **03/12/2025** NextRec获得了100颗🌟！感谢大家的支持
 - **06/12/2025** 在v0.4.1中支持了单机多卡的分布式DDP训练，并且提供了配套的[代码](tutorials/distributed)
-- **23/11/2025** 在v0.2.2中对basemodel进行了逻辑上的大幅重构和流程统一，并且对listwise/pairwise/pointwise损失进行了统一
 - **11/11/2025** NextRec v0.1.0发布，我们提供了10余种Ranking模型，4种多任务模型和4种召回模型，以及统一的训练/日志/指标管理系统
 ## 架构
@@ -182,11 +181,11 @@ nextrec --mode=train --train_config=path/to/train_config.yaml
 nextrec --mode=predict --predict_config=path/to/predict_config.yaml
 ```
-> 截止当前版本0.4.9，NextRec CLI支持单机训练，分布式训练相关功能尚在开发中。
+> 截止当前版本0.4.10，NextRec CLI支持单机训练，分布式训练相关功能尚在开发中。
 ## 兼容平台
-当前最新版本为0.4.9，所有模型和测试代码均已在以下平台通过验证，如果开发者在使用中遇到兼容问题，请在issue区提出错误报告及系统版本：
+当前最新版本为0.4.10，所有模型和测试代码均已在以下平台通过验证，如果开发者在使用中遇到兼容问题，请在issue区提出错误报告及系统版本：
 | 平台 | 配置 |
 |------|------|

{nextrec-0.4.9 → nextrec-0.4.10}/README_en.md RENAMED

@@ -7,7 +7,7 @@
 ![Python](https://img.shields.io/badge/Python-3.10+-blue.svg)
 ![PyTorch](https://img.shields.io/badge/PyTorch-1.10+-ee4c2c.svg)
 ![License](https://img.shields.io/badge/License-Apache%202.0-green.svg)
-![Version](https://img.shields.io/badge/Version-0.4.9-orange.svg)
+![Version](https://img.shields.io/badge/Version-0.4.10-orange.svg)
 English | [中文文档](README.md)
@@ -46,7 +46,6 @@ NextRec is a modern recommendation framework built on PyTorch, delivering a unif
 - **07/12/2025** Released the NextRec CLI tool to run training/inference from configs. See the [guide](/nextrec_cli_preset/NextRec-CLI.md) and [reference code](/nextrec_cli_preset).
 - **03/12/2025** NextRec reached 100 ⭐—thanks for the support!
 - **06/12/2025** Added single-machine multi-GPU DDP training in v0.4.1 with supporting [code](tutorials/distributed).
-- **23/11/2025** Major logical refactor of basemodel and unification of listwise/pairwise/pointwise losses in v0.2.2.
 - **11/11/2025** NextRec v0.1.0 released with 10+ ranking models, 4 multi-task models, 4 retrieval models, and a unified training/logging/metrics system.
 ## Architecture
@@ -186,11 +185,11 @@ nextrec --mode=train --train_config=path/to/train_config.yaml
 nextrec --mode=predict --predict_config=path/to/predict_config.yaml
 ```
-> As of version 0.4.9, NextRec CLI supports single-machine training; distributed training features are currently under development.
+> As of version 0.4.10, NextRec CLI supports single-machine training; distributed training features are currently under development.
 ## Platform Compatibility
-The current version is 0.4.9. All models and test code have been validated on the following platforms. If you encounter compatibility issues, please report them in the issue tracker with your system version:
+The current version is 0.4.10. All models and test code have been validated on the following platforms. If you encounter compatibility issues, please report them in the issue tracker with your system version:
 | Platform | Configuration |
 |----------|---------------|

{nextrec-0.4.9 → nextrec-0.4.10}/docs/en/Getting started guide.md RENAMED

@@ -102,4 +102,4 @@ metrics = model.evaluate(
 - Multi-task: `tutorials/example_multitask.py`
 - Notebooks: `tutorials/notebooks/zh/Hands on nextrec.ipynb`, `tutorials/notebooks/zh/Hands on dataprocessor.ipynb`
-For large offline features or streaming loads, use `DataProcessor` and `RecDataLoader` to configure CSV/Parquet paths and streaming (`load_full=False`) without changing model code.
+For large offline features or streaming loads, use `DataProcessor` and `RecDataLoader` to configure CSV/Parquet paths and streaming (`streaming=True`) without changing model code.

{nextrec-0.4.9 → nextrec-0.4.10}/docs/rtd/conf.py RENAMED

@@ -11,7 +11,7 @@ sys.path.insert(0, str(PROJECT_ROOT / "nextrec"))
 project = "NextRec"
 copyright = "2025, Yang Zhou"
 author = "Yang Zhou"
-release = "0.4.9"
+release = "0.4.10"
 extensions = [
     "myst_parser",

{nextrec-0.4.9 → nextrec-0.4.10}/docs/zh/快速上手.md RENAMED

@@ -102,4 +102,4 @@ metrics = model.evaluate(
 - 多任务：`tutorials/example_multitask.py`
 - Notebook：`tutorials/notebooks/zh/Hands on nextrec.ipynb`、`tutorials/notebooks/zh/Hands on dataprocessor.ipynb`
-如果需要大规模离线特征或流式加载，可结合 `DataProcessor`、`RecDataLoader` 配置 CSV/Parquet 路径与流式参数（`load_full=False`），在不修改模型代码的情况下完成训练与推理。
+如果需要大规模离线特征或流式加载，可结合 `DataProcessor`、`RecDataLoader` 配置 CSV/Parquet 路径与流式参数（`streaming=True`），在不修改模型代码的情况下完成训练与推理。

nextrec-0.4.10/nextrec/__version__.py ADDED

@@ -0,0 +1 @@
+__version__ = "0.4.10"

{nextrec-0.4.9 → nextrec-0.4.10}/nextrec/basic/model.py RENAMED

@@ -1376,7 +1376,7 @@ class BaseModel(FeatureSet, nn.Module):
                 data=data,
                 batch_size=batch_size,
                 shuffle=False,
-                load_full=False,
+                streaming=True,
                 chunk_size=streaming_chunk_size,
             )
         else:
@@ -1510,7 +1510,7 @@ class BaseModel(FeatureSet, nn.Module):
                 data=data,
                 batch_size=batch_size,
                 shuffle=False,
-                load_full=False,
+                streaming=True,
                 chunk_size=streaming_chunk_size,
             )
         elif not isinstance(data, DataLoader):
@@ -1605,7 +1605,8 @@ class BaseModel(FeatureSet, nn.Module):
                 if collected_frames
                 else pd.DataFrame(columns=pred_columns or [])
             )
-        return pd.DataFrame(columns=pred_columns or [])
+        # Return the actual save path when not returning dataframe
+        return target_path
     def save_model(
         self,
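
The last hunk above changes what `predict` returns when `return_dataframe=False`: the save path instead of an empty DataFrame. A caller that has to cope with both behaviours could normalise the result roughly as follows. This is a minimal sketch and not NextRec code: `resolve_output_path` and its parameters are hypothetical names, the fallback location only mirrors the `predictions/` convention the CLI uses in this release, and accepting `str` as well as `Path` is an added assumption.

```python
from pathlib import Path
from typing import Any


def resolve_output_path(result: Any, checkpoint_base: Path, save_name: str) -> Path:
    """Return the file that should hold the predictions.

    Hypothetical helper: `result` is whatever predict() returned.
    """
    # Newer behaviour: predict() hands back the path it actually wrote to.
    if isinstance(result, (str, Path)):
        return Path(result)
    # Older behaviour (e.g. an empty DataFrame): fall back to an assumed
    # default location under the checkpoint directory.
    return checkpoint_base / "predictions" / save_name
```

The `isinstance` check keeps the caller independent of which release produced the result, at the cost of guessing the location in the fallback branch.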

{nextrec-0.4.9 → nextrec-0.4.10}/nextrec/cli.py RENAMED

@@ -29,7 +29,7 @@ from typing import Any, Dict, List
 import pandas as pd
 from nextrec.basic.features import DenseFeature, SequenceFeature, SparseFeature
-from nextrec.basic.loggers import setup_logger
+from nextrec.basic.loggers import colorize, format_kv, setup_logger
 from nextrec.data.data_utils import split_dict_random
 from nextrec.data.dataloader import RecDataLoader
 from nextrec.data.preprocessor import DataProcessor
@@ -52,6 +52,17 @@ from nextrec.utils.feature import normalize_to_list
 logger = logging.getLogger(__name__)
+def log_cli_section(title: str) -> None:
+    logger.info("")
+    logger.info(colorize(f"[{title}]", color="bright_blue", bold=True))
+    logger.info(colorize("-" * 80, color="bright_blue"))
+def log_kv_lines(items: list[tuple[str, Any]]) -> None:
+    for label, value in items:
+        logger.info(format_kv(label, value))
 def train_model(train_config_path: str) -> None:
     """
     Train a NextRec model using the provided configuration file.
@@ -74,8 +85,17 @@ def train_model(train_config_path: str) -> None:
     artifact_root = Path(session_cfg.get("artifact_root", "nextrec_logs"))
     session_dir = artifact_root / session_id
     setup_logger(session_id=session_id)
-    logger.info(
-        f"[NextRec CLI] Training start | version={get_nextrec_version()} | session_id={session_id} | artifacts={session_dir.resolve()}"
+    log_cli_section("CLI")
+    log_kv_lines(
+        [
+            ("Mode", "train"),
+            ("Version", get_nextrec_version()),
+            ("Session ID", session_id),
+            ("Artifacts", session_dir.resolve()),
+            ("Config", config_file.resolve()),
+            ("Command", " ".join(sys.argv)),
+        ]
     )
     processor_path = session_dir / "processor.pkl"
@@ -102,11 +122,53 @@ def train_model(train_config_path: str) -> None:
         cfg.get("model_config", "model_config.yaml"), config_dir
     )
+    log_cli_section("Config")
+    log_kv_lines(
+        [
+            ("Train config", config_file.resolve()),
+            ("Feature config", feature_cfg_path),
+            ("Model config", model_cfg_path),
+        ]
+    )
     feature_cfg = read_yaml(feature_cfg_path)
     model_cfg = read_yaml(model_cfg_path)
+    # Extract id_column from data config for GAUC metrics
+    id_column = data_cfg.get("id_column") or data_cfg.get("user_id_column")
+    id_columns = [id_column] if id_column else []
+    log_cli_section("Data")
+    log_kv_lines(
+        [
+            ("Data path", data_path),
+            ("Format", data_cfg.get("format", "auto")),
+            ("Streaming", streaming),
+            ("Target", target),
+            ("ID column", id_column or "(not set)"),
+        ]
+    )
+    if data_cfg.get("valid_ratio") is not None:
+        logger.info(format_kv("Valid ratio", data_cfg.get("valid_ratio")))
+    if data_cfg.get("val_path") or data_cfg.get("valid_path"):
+        logger.info(
+            format_kv(
+                "Validation path",
+                resolve_path(
+                    data_cfg.get("val_path") or data_cfg.get("valid_path"), config_dir
+                ),
+            )
+        )
     if streaming:
         file_paths, file_type = resolve_file_paths(str(data_path))
+        log_kv_lines(
+            [
+                ("File type", file_type),
+                ("Files", len(file_paths)),
+                ("Chunk size", dataloader_chunk_size),
+            ]
+        )
         first_file = file_paths[0]
         first_chunk_size = max(1, min(dataloader_chunk_size, 1000))
         chunk_iter = iter_file_chunks(first_file, file_type, first_chunk_size)
@@ -118,14 +180,12 @@ def train_model(train_config_path: str) -> None:
     else:
         df = read_table(data_path, data_cfg.get("format"))
+        logger.info(format_kv("Rows", len(df)))
+        logger.info(format_kv("Columns", len(df.columns)))
         df_columns = list(df.columns)
     dense_names, sparse_names, sequence_names = select_features(feature_cfg, df_columns)
-    # Extract id_column from data config for GAUC metrics
-    id_column = data_cfg.get("id_column") or data_cfg.get("user_id_column")
-    id_columns = [id_column] if id_column else []
     used_columns = dense_names + sparse_names + sequence_names + target + id_columns
     # keep order but drop duplicates
@@ -141,6 +201,17 @@ def train_model(train_config_path: str) -> None:
         processor, feature_cfg, dense_names, sparse_names, sequence_names
     )
+    log_cli_section("Features")
+    log_kv_lines(
+        [
+            ("Dense features", len(dense_names)),
+            ("Sparse features", len(sparse_names)),
+            ("Sequence features", len(sequence_names)),
+            ("Targets", len(target)),
+            ("Used columns", len(unique_used_columns)),
+        ]
+    )
     if streaming:
         processor.fit(str(data_path), chunk_size=dataloader_chunk_size)
         processed = None
@@ -244,7 +315,7 @@ def train_model(train_config_path: str) -> None:
             data=train_stream_source,
             batch_size=dataloader_cfg.get("train_batch_size", 512),
             shuffle=dataloader_cfg.get("train_shuffle", True),
-            load_full=False,
+            streaming=True,
             chunk_size=dataloader_chunk_size,
             num_workers=dataloader_cfg.get("num_workers", 0),
         )
@@ -255,7 +326,7 @@ def train_model(train_config_path: str) -> None:
                 data=str(val_data_resolved),
                 batch_size=dataloader_cfg.get("valid_batch_size", 512),
                 shuffle=dataloader_cfg.get("valid_shuffle", False),
-                load_full=False,
+                streaming=True,
                 chunk_size=dataloader_chunk_size,
                 num_workers=dataloader_cfg.get("num_workers", 0),
             )
@@ -264,7 +335,7 @@ def train_model(train_config_path: str) -> None:
                 data=streaming_valid_files,
                 batch_size=dataloader_cfg.get("valid_batch_size", 512),
                 shuffle=dataloader_cfg.get("valid_shuffle", False),
-                load_full=False,
+                streaming=True,
                 chunk_size=dataloader_chunk_size,
                 num_workers=dataloader_cfg.get("num_workers", 0),
             )
@@ -295,6 +366,15 @@ def train_model(train_config_path: str) -> None:
         device,
     )
+    log_cli_section("Model")
+    log_kv_lines(
+        [
+            ("Model", model.__class__.__name__),
+            ("Device", device),
+            ("Session ID", session_id),
+        ]
+    )
     model.compile(
         optimizer=train_cfg.get("optimizer", "adam"),
         optimizer_params=train_cfg.get("optimizer_params", {}),
@@ -325,13 +405,30 @@ def predict_model(predict_config_path: str) -> None:
     config_dir = config_file.resolve().parent
     cfg = read_yaml(config_file)
-    session_cfg = cfg.get("session", {}) or {}
-    session_id = session_cfg.get("id", "masknet_tutorial")
-    artifact_root = Path(session_cfg.get("artifact_root", "nextrec_logs"))
-    session_dir = Path(cfg.get("checkpoint_path") or (artifact_root / session_id))
+    # Checkpoint path is the primary configuration
+    if "checkpoint_path" not in cfg:
+        session_cfg = cfg.get("session", {}) or {}
+        session_id = session_cfg.get("id", "nextrec_session")
+        artifact_root = Path(session_cfg.get("artifact_root", "nextrec_logs"))
+        session_dir = artifact_root / session_id
+    else:
+        session_dir = Path(cfg["checkpoint_path"])
+        # Auto-infer session_id from checkpoint directory name
+        session_cfg = cfg.get("session", {}) or {}
+        session_id = session_cfg.get("id") or session_dir.name
     setup_logger(session_id=session_id)
-    logger.info(
-        f"[NextRec CLI] Predict start | version={get_nextrec_version()} | session_id={session_id} | checkpoint={session_dir.resolve()}"
+    log_cli_section("CLI")
+    log_kv_lines(
+        [
+            ("Mode", "predict"),
+            ("Version", get_nextrec_version()),
+            ("Session ID", session_id),
+            ("Checkpoint", session_dir.resolve()),
+            ("Config", config_file.resolve()),
+            ("Command", " ".join(sys.argv)),
+        ]
     )
     processor_path = Path(session_dir / "processor.pkl")
@@ -339,24 +436,38 @@ def predict_model(predict_config_path: str) -> None:
         processor_path = session_dir / "processor" / "processor.pkl"
     predict_cfg = cfg.get("predict", {}) or {}
-    model_cfg_path = resolve_path(
-        cfg.get("model_config", "model_config.yaml"), config_dir
-    )
-    # feature_cfg_path = resolve_path(
-    #     cfg.get("feature_config", "feature_config.yaml"), config_dir
-    # )
+    # Auto-find model_config in checkpoint directory if not specified
+    if "model_config" in cfg:
+        model_cfg_path = resolve_path(cfg["model_config"], config_dir)
+    else:
+        # Try to find model_config.yaml in checkpoint directory
+        auto_model_cfg = session_dir / "model_config.yaml"
+        if auto_model_cfg.exists():
+            model_cfg_path = auto_model_cfg
+        else:
+            # Fallback to config directory
+            model_cfg_path = resolve_path("model_config.yaml", config_dir)
     model_cfg = read_yaml(model_cfg_path)
-    # feature_cfg = read_yaml(feature_cfg_path)
     model_cfg.setdefault("session_id", session_id)
     model_cfg.setdefault("params", {})
+    log_cli_section("Config")
+    log_kv_lines(
+        [
+            ("Predict config", config_file.resolve()),
+            ("Model config", model_cfg_path),
+            ("Processor", processor_path),
+        ]
+    )
     processor = DataProcessor.load(processor_path)
     # Load checkpoint and ensure required parameters are passed
     checkpoint_base = Path(session_dir)
     if checkpoint_base.is_dir():
-        candidates = sorted(checkpoint_base.glob("*.model"))
+        candidates = sorted(checkpoint_base.glob("*.pt"))
         if not candidates:
             raise FileNotFoundError(
                 f"[NextRec CLI Error]: Unable to find model checkpoint: {checkpoint_base}"
@@ -365,7 +476,7 @@ def predict_model(predict_config_path: str) -> None:
         config_dir_for_features = checkpoint_base
     else:
         model_file = (
-            checkpoint_base.with_suffix(".model")
+            checkpoint_base.with_suffix(".pt")
             if checkpoint_base.suffix == ""
             else checkpoint_base
         )
@@ -415,40 +526,78 @@ def predict_model(predict_config_path: str) -> None:
         id_columns = [predict_cfg["id_column"]]
         model.id_columns = id_columns
+    effective_id_columns = id_columns or model.id_columns
+    log_cli_section("Features")
+    log_kv_lines(
+        [
+            ("Dense features", len(dense_features)),
+            ("Sparse features", len(sparse_features)),
+            ("Sequence features", len(sequence_features)),
+            ("Targets", len(target_cols)),
+            ("ID columns", len(effective_id_columns)),
+        ]
+    )
+    log_cli_section("Model")
+    log_kv_lines(
+        [
+            ("Model", model.__class__.__name__),
+            ("Checkpoint", model_file),
+            ("Device", predict_cfg.get("device", "cpu")),
+        ]
+    )
     rec_dataloader = RecDataLoader(
         dense_features=model.dense_features,
         sparse_features=model.sparse_features,
         sequence_features=model.sequence_features,
         target=None,
-        id_columns=id_columns or model.id_columns,
+        id_columns=effective_id_columns,
         processor=processor,
     )
     data_path = resolve_path(predict_cfg["data_path"], config_dir)
     batch_size = predict_cfg.get("batch_size", 512)
+    log_cli_section("Data")
+    log_kv_lines(
+        [
+            ("Data path", data_path),
+            ("Format", predict_cfg.get("source_data_format", predict_cfg.get("data_format", "auto"))),
+            ("Batch size", batch_size),
+            ("Chunk size", predict_cfg.get("chunk_size", 20000)),
+            ("Streaming", predict_cfg.get("streaming", True)),
+        ]
+    )
+    logger.info("")
     pred_loader = rec_dataloader.create_dataloader(
         data=str(data_path),
         batch_size=batch_size,
         shuffle=False,
-        load_full=predict_cfg.get("load_full", False),
+        streaming=predict_cfg.get("streaming", True),
         chunk_size=predict_cfg.get("chunk_size", 20000),
     )
-    output_path = resolve_path(predict_cfg["output_path"], config_dir)
-    output_path.parent.mkdir(parents=True, exist_ok=True)
+    # Build output path: {checkpoint_path}/predictions/{name}.{save_data_format}
+    save_format = predict_cfg.get("save_data_format", predict_cfg.get("save_format", "csv"))
+    pred_name = predict_cfg.get("name", "pred")
+    # Pass filename with extension to let model.predict handle path resolution
+    save_path = f"{pred_name}.{save_format}"
     start = time.time()
-    model.predict(
+    logger.info("")
+    result = model.predict(
         data=pred_loader,
         batch_size=batch_size,
         include_ids=bool(id_columns),
         return_dataframe=False,
-        save_path=output_path,
-        save_format=predict_cfg.get("save_format", "csv"),
+        save_path=save_path,
+        save_format=save_format,
         num_workers=predict_cfg.get("num_workers", 0),
     )
     duration = time.time() - start
+    # When return_dataframe=False, result is the actual file path
+    output_path = result if isinstance(result, Path) else checkpoint_base / "predictions" / save_path
     logger.info(f"Prediction completed, results saved to: {output_path}")
     logger.info(f"Total time: {duration:.2f} seconds")
@@ -492,8 +641,6 @@ Examples:
     parser.add_argument("--predict_config", help="Prediction configuration file path")
     args = parser.parse_args()
-    logger.info(get_nextrec_version())
     if not args.mode:
         parser.error("[NextRec CLI Error] --mode is required (train|predict)")
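
The `predict_model` hunks above make `checkpoint_path` the primary setting: the session ID is inferred from the checkpoint directory name, and `model_config.yaml` is looked up in the checkpoint directory before falling back to the config directory. The lookup order can be sketched in isolation as below. The function names `resolve_session` and `resolve_model_config` are hypothetical, and NextRec's own `resolve_path` helper is approximated here by a plain path join, which is an assumption about its behaviour.

```python
from pathlib import Path
from typing import Any


def resolve_session(cfg: dict[str, Any]) -> tuple[Path, str]:
    """Return (session_dir, session_id) following the order shown in the diff."""
    session_cfg = cfg.get("session", {}) or {}
    if "checkpoint_path" not in cfg:
        # No checkpoint given: build the directory from the session settings.
        session_id = session_cfg.get("id", "nextrec_session")
        artifact_root = Path(session_cfg.get("artifact_root", "nextrec_logs"))
        return artifact_root / session_id, session_id
    # Checkpoint given: an explicit session id wins, else use the directory name.
    session_dir = Path(cfg["checkpoint_path"])
    return session_dir, session_cfg.get("id") or session_dir.name


def resolve_model_config(cfg: dict[str, Any], session_dir: Path, config_dir: Path) -> Path:
    """Pick the model config: explicit entry, then checkpoint dir, then config dir."""
    if "model_config" in cfg:
        # Assumption: a simple join stands in for NextRec's resolve_path.
        return config_dir / cfg["model_config"]
    candidate = session_dir / "model_config.yaml"
    return candidate if candidate.exists() else config_dir / "model_config.yaml"
```

With this order, a predict config that names only `checkpoint_path` is enough whenever the training run left its `model_config.yaml` next to the checkpoint.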

{nextrec-0.4.9 → nextrec-0.4.10}/nextrec/data/dataloader.py RENAMED

@@ -102,9 +102,8 @@ class FileDataset(FeatureSet, IterableDataset):
         self.current_file_index = 0
         for file_path in self.file_paths:
             self.current_file_index += 1
-            if self.total_files == 1:
-                file_name = os.path.basename(file_path)
-                logging.info(f"Processing file: {file_name}")
+            # Don't log file processing here to avoid interrupting progress bars
+            # File information is already displayed in the CLI data section
             if self.file_type == "csv":
                 yield from self.read_csv_chunks(file_path)
             elif self.file_type == "parquet":
@@ -190,7 +189,7 @@ class RecDataLoader(FeatureSet):
         ),
         batch_size: int = 32,
         shuffle: bool = True,
-        load_full: bool = True,
+        streaming: bool = False,
         chunk_size: int = 10000,
         num_workers: int = 0,
         sampler=None,
@@ -202,7 +201,7 @@ class RecDataLoader(FeatureSet):
             data: Data source, can be a dict, pd.DataFrame, file path (str), or existing DataLoader.
             batch_size: Batch size for DataLoader.
             shuffle: Whether to shuffle the data (ignored in streaming mode).
-            load_full: If True, load full data into memory; if False, use streaming mode for large files.
+            streaming: If True, use streaming mode for large files; if False, load full data into memory.
             chunk_size: Chunk size for streaming mode (number of rows per chunk).
             num_workers: Number of worker processes for data loading.
             sampler: Optional sampler for DataLoader, only used for distributed training.
@@ -217,7 +216,7 @@ class RecDataLoader(FeatureSet):
                 path=data,
                 batch_size=batch_size,
                 shuffle=shuffle,
-                load_full=load_full,
+                streaming=streaming,
                 chunk_size=chunk_size,
                 num_workers=num_workers,
             )
@@ -230,7 +229,7 @@ class RecDataLoader(FeatureSet):
                 path=data,
                 batch_size=batch_size,
                 shuffle=shuffle,
-                load_full=load_full,
+                streaming=streaming,
                 chunk_size=chunk_size,
                 num_workers=num_workers,
             )
@@ -290,7 +289,7 @@ class RecDataLoader(FeatureSet):
         path: str | os.PathLike | list[str] | list[os.PathLike],
         batch_size: int,
         shuffle: bool,
-        load_full: bool,
+        streaming: bool,
         chunk_size: int = 10000,
         num_workers: int = 0,
     ) -> DataLoader:
@@ -311,8 +310,17 @@ class RecDataLoader(FeatureSet):
                     f"[RecDataLoader Error] Unsupported file extension in list: {suffix}"
                 )
             file_type = "csv" if suffix == ".csv" else "parquet"
+        if streaming:
+            return self.load_files_streaming(
+                file_paths,
+                file_type,
+                batch_size,
+                chunk_size,
+                shuffle,
+                num_workers=num_workers,
+            )
         # Load full data into memory
-        if load_full:
+        else:
             dfs = []
             total_bytes = 0
             for file_path in file_paths:
@@ -325,26 +333,17 @@ class RecDataLoader(FeatureSet):
                     dfs.append(df)
                 except MemoryError as exc:
                     raise MemoryError(
-                        f"[RecDataLoader Error] Out of memory while reading {file_path}. Consider using load_full=False with streaming."
+                        f"[RecDataLoader Error] Out of memory while reading {file_path}. Consider using streaming=True."
                     ) from exc
             try:
                 combined_df = pd.concat(dfs, ignore_index=True)
             except MemoryError as exc:
                 raise MemoryError(
-                    f"[RecDataLoader Error] Out of memory while concatenating loaded data (approx {total_bytes / (1024**3):.2f} GB). Use load_full=False to stream or reduce chunk_size."
+                    f"[RecDataLoader Error] Out of memory while concatenating loaded data (approx {total_bytes / (1024**3):.2f} GB). Use streaming=True or reduce chunk_size."
                 ) from exc
             return self.create_from_memory(
                 combined_df, batch_size, shuffle, num_workers=num_workers
             )
-        else:
-            return self.load_files_streaming(
-                file_paths,
-                file_type,
-                batch_size,
-                chunk_size,
-                shuffle,
-                num_workers=num_workers,
-            )
     def load_files_streaming(
         self,
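
The hunks above rename `load_full` to `streaming` and invert its meaning: the old default `load_full=True` corresponds to the new default `streaming=False`. Code that still passes the old keyword could be adapted with a small translation step such as the one below. This is a sketch under stated assumptions: `migrate_loader_kwargs` is a hypothetical helper that does not exist in NextRec, and the diff does not show any built-in compatibility layer for the old keyword.

```python
import warnings
from typing import Any


def migrate_loader_kwargs(kwargs: dict[str, Any]) -> dict[str, Any]:
    """Translate the old `load_full` keyword into the new `streaming` keyword."""
    out = dict(kwargs)  # never mutate the caller's dict
    if "load_full" in out:
        load_full = bool(out.pop("load_full"))
        # streaming is the logical negation of load_full, so equal values conflict.
        if "streaming" in out and bool(out["streaming"]) == load_full:
            raise ValueError("load_full and streaming contradict each other")
        out.setdefault("streaming", not load_full)
        warnings.warn(
            "load_full was replaced by streaming (inverted meaning)",
            DeprecationWarning,
            stacklevel=2,
        )
    return out
```

A call site would then pass `**migrate_loader_kwargs(old_kwargs)` to `create_dataloader`, keeping the remaining keywords such as `chunk_size` untouched.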

{nextrec-0.4.9 → nextrec-0.4.10}/nextrec/models/ranking/deepfm.py RENAMED

@@ -1,12 +1,11 @@
 """
 Date: create on 27/10/2025
 Checkpoint: edit on 24/11/2025
-Author:
-    Yang Zhou,zyaztec@gmail.com
+Author: Yang Zhou,zyaztec@gmail.com
 Reference:
-    [1] Guo H, Tang R, Ye Y, et al. DeepFM: A factorization-machine based neural network
-        for CTR prediction[J]. arXiv preprint arXiv:1703.04247, 2017.
-        (https://arxiv.org/abs/1703.04247)
+[1] Guo H, Tang R, Ye Y, et al. DeepFM: A factorization-machine based neural network
+for CTR prediction[J]. arXiv preprint arXiv:1703.04247, 2017.
+(https://arxiv.org/abs/1703.04247)
 DeepFM combines a Factorization Machine (FM) for explicit second-order feature
 interactions with a deep MLP for high-order nonlinear patterns. Both parts share
