replay-rec 0.18.0__tar.gz → 0.18.1__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {replay_rec-0.18.0 → replay_rec-0.18.1}/PKG-INFO +73 -60
- {replay_rec-0.18.0 → replay_rec-0.18.1}/README.md +66 -56
- {replay_rec-0.18.0 → replay_rec-0.18.1}/pyproject.toml +8 -4
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/__init__.py +1 -1
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/data/dataset.py +27 -1
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/data/dataset_utils/dataset_label_encoder.py +6 -3
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/data/nn/schema.py +37 -16
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/data/nn/sequence_tokenizer.py +313 -165
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/data/nn/torch_sequential_dataset.py +17 -8
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/data/nn/utils.py +14 -7
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/data/schema.py +10 -6
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/metrics/offline_metrics.py +2 -2
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/__init__.py +1 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/base_rec.py +18 -21
- replay_rec-0.18.1/replay/models/lin_ucb.py +407 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/nn/sequential/bert4rec/dataset.py +17 -4
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/nn/sequential/bert4rec/lightning.py +121 -54
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/nn/sequential/bert4rec/model.py +21 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/nn/sequential/callbacks/prediction_callbacks.py +5 -1
- replay_rec-0.18.1/replay/models/nn/sequential/compiled/__init__.py +5 -0
- replay_rec-0.18.1/replay/models/nn/sequential/compiled/base_compiled_model.py +261 -0
- replay_rec-0.18.1/replay/models/nn/sequential/compiled/bert4rec_compiled.py +152 -0
- replay_rec-0.18.1/replay/models/nn/sequential/compiled/sasrec_compiled.py +145 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/nn/sequential/postprocessors/postprocessors.py +27 -1
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/nn/sequential/sasrec/dataset.py +17 -1
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/nn/sequential/sasrec/lightning.py +126 -50
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/nn/sequential/sasrec/model.py +3 -4
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/preprocessing/__init__.py +7 -1
- replay_rec-0.18.1/replay/preprocessing/discretizer.py +719 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/preprocessing/label_encoder.py +384 -52
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/splitters/cold_user_random_splitter.py +1 -1
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/utils/__init__.py +1 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/utils/common.py +7 -8
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/utils/session_handler.py +3 -4
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/utils/spark_utils.py +15 -1
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/utils/types.py +8 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/LICENSE +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/data/__init__.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/data/dataset_utils/__init__.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/data/nn/__init__.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/data/nn/sequential_dataset.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/data/spark_schema.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/metrics/__init__.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/metrics/base_metric.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/metrics/categorical_diversity.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/metrics/coverage.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/metrics/descriptors.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/metrics/experiment.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/metrics/hitrate.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/metrics/map.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/metrics/mrr.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/metrics/ndcg.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/metrics/novelty.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/metrics/precision.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/metrics/recall.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/metrics/rocauc.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/metrics/surprisal.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/metrics/torch_metrics_builder.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/metrics/unexpectedness.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/als.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/association_rules.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/base_neighbour_rec.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/cat_pop_rec.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/cluster.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/extensions/__init__.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/extensions/ann/__init__.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/extensions/ann/ann_mixin.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/extensions/ann/entities/__init__.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/extensions/ann/entities/base_hnsw_param.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/extensions/ann/entities/hnswlib_param.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/extensions/ann/entities/nmslib_hnsw_param.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/extensions/ann/index_builders/__init__.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/extensions/ann/index_builders/base_index_builder.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/extensions/ann/index_builders/driver_hnswlib_index_builder.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/extensions/ann/index_builders/driver_nmslib_index_builder.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/extensions/ann/index_builders/executor_hnswlib_index_builder.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/extensions/ann/index_builders/executor_nmslib_index_builder.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/extensions/ann/index_builders/nmslib_index_builder_mixin.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/extensions/ann/index_inferers/__init__.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/extensions/ann/index_inferers/base_inferer.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/extensions/ann/index_inferers/hnswlib_filter_index_inferer.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/extensions/ann/index_inferers/hnswlib_index_inferer.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/extensions/ann/index_inferers/nmslib_filter_index_inferer.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/extensions/ann/index_inferers/nmslib_index_inferer.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/extensions/ann/index_inferers/utils.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/extensions/ann/index_stores/__init__.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/extensions/ann/index_stores/base_index_store.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/extensions/ann/index_stores/hdfs_index_store.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/extensions/ann/index_stores/shared_disk_index_store.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/extensions/ann/index_stores/spark_files_index_store.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/extensions/ann/index_stores/utils.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/extensions/ann/utils.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/kl_ucb.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/knn.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/nn/__init__.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/nn/optimizer_utils/__init__.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/nn/optimizer_utils/optimizer_factory.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/nn/sequential/__init__.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/nn/sequential/bert4rec/__init__.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/nn/sequential/callbacks/__init__.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/nn/sequential/callbacks/validation_callback.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/nn/sequential/postprocessors/__init__.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/nn/sequential/postprocessors/_base.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/nn/sequential/sasrec/__init__.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/pop_rec.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/query_pop_rec.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/random_rec.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/slim.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/thompson_sampling.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/ucb.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/wilson.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/models/word2vec.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/optimization/__init__.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/optimization/optuna_objective.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/preprocessing/converter.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/preprocessing/filters.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/preprocessing/history_based_fp.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/preprocessing/sessionizer.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/scenarios/__init__.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/scenarios/fallback.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/splitters/__init__.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/splitters/base_splitter.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/splitters/k_folds.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/splitters/last_n_splitter.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/splitters/new_users_splitter.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/splitters/random_splitter.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/splitters/ratio_splitter.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/splitters/time_splitter.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/splitters/two_stage_splitter.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/utils/dataframe_bucketizer.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/utils/distributions.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/utils/model_handler.py +0 -0
- {replay_rec-0.18.0 → replay_rec-0.18.1}/replay/utils/time.py +0 -0
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.1
|
|
2
2
|
Name: replay-rec
|
|
3
|
-
Version: 0.18.
|
|
3
|
+
Version: 0.18.1
|
|
4
4
|
Summary: RecSys Library
|
|
5
5
|
Home-page: https://sb-ai-lab.github.io/RePlay/
|
|
6
6
|
License: Apache-2.0
|
|
@@ -21,10 +21,13 @@ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
|
|
|
21
21
|
Provides-Extra: all
|
|
22
22
|
Provides-Extra: spark
|
|
23
23
|
Provides-Extra: torch
|
|
24
|
+
Provides-Extra: torch-openvino
|
|
24
25
|
Requires-Dist: fixed-install-nmslib (==2.1.2)
|
|
25
26
|
Requires-Dist: hnswlib (>=0.7.0,<0.8.0)
|
|
26
|
-
Requires-Dist: lightning (>=2.0.2,<=2.4.0) ; extra == "torch" or extra == "all"
|
|
27
|
+
Requires-Dist: lightning (>=2.0.2,<=2.4.0) ; extra == "torch" or extra == "torch-openvino" or extra == "all"
|
|
27
28
|
Requires-Dist: numpy (>=1.20.0)
|
|
29
|
+
Requires-Dist: onnx (>=1.16.2,<1.17.0) ; extra == "torch-openvino" or extra == "all"
|
|
30
|
+
Requires-Dist: openvino (>=2024.3.0,<2024.4.0) ; extra == "torch-openvino" or extra == "all"
|
|
28
31
|
Requires-Dist: optuna (>=3.2.0,<3.3.0)
|
|
29
32
|
Requires-Dist: pandas (>=1.3.5,<=2.2.2)
|
|
30
33
|
Requires-Dist: polars (>=1.0.0,<1.1.0)
|
|
@@ -32,10 +35,10 @@ Requires-Dist: psutil (>=6.0.0,<6.1.0)
|
|
|
32
35
|
Requires-Dist: pyarrow (>=12.0.1)
|
|
33
36
|
Requires-Dist: pyspark (>=3.0,<3.6) ; (python_full_version >= "3.8.1" and python_version < "3.11") and (extra == "spark" or extra == "all")
|
|
34
37
|
Requires-Dist: pyspark (>=3.4,<3.6) ; (python_version >= "3.11" and python_version < "3.12") and (extra == "spark" or extra == "all")
|
|
35
|
-
Requires-Dist: pytorch-ranger (>=0.1.1,<0.2.0) ; extra == "torch" or extra == "all"
|
|
38
|
+
Requires-Dist: pytorch-ranger (>=0.1.1,<0.2.0) ; extra == "torch" or extra == "torch-openvino" or extra == "all"
|
|
36
39
|
Requires-Dist: scikit-learn (>=1.0.2,<2.0.0)
|
|
37
40
|
Requires-Dist: scipy (>=1.8.1,<2.0.0)
|
|
38
|
-
Requires-Dist: torch (>=1.8,<=2.
|
|
41
|
+
Requires-Dist: torch (>=1.8,<=2.5.0) ; extra == "torch" or extra == "torch-openvino" or extra == "all"
|
|
39
42
|
Project-URL: Repository, https://github.com/sb-ai-lab/RePlay
|
|
40
43
|
Description-Content-Type: text/markdown
|
|
41
44
|
|
|
@@ -44,11 +47,15 @@ Description-Content-Type: text/markdown
|
|
|
44
47
|
|
|
45
48
|
[](https://github.com/sb-ai-lab/RePlay/blob/main/LICENSE)
|
|
46
49
|
[](https://pypi.org/project/replay-rec)
|
|
50
|
+
[](https://sb-ai-lab.github.io/RePlay/)
|
|
47
51
|
[](https://pypistats.org/packages/replay-rec)
|
|
48
52
|
<br>
|
|
49
53
|
[](https://github.com/sb-ai-lab/RePlay/actions/workflows/main.yml?query=branch%3Amain)
|
|
54
|
+
[](https://github.com/astral-sh/ruff)
|
|
55
|
+
[](https://pypi.org/project/replay-rec)
|
|
50
56
|
[](https://github.com/sb-ai-lab/RePlay/discussions)
|
|
51
57
|
|
|
58
|
+
|
|
52
59
|
RePlay is an advanced framework designed to facilitate the development and evaluation of recommendation systems. It provides a robust set of tools covering the entire lifecycle of a recommendation system pipeline:
|
|
53
60
|
|
|
54
61
|
## 🚀 Features:
|
|
@@ -63,61 +70,25 @@ RePlay is an advanced framework designed to facilitate the development and evalu
|
|
|
63
70
|
1. **Diverse Hardware Support:** Compatible with various hardware configurations including CPU, GPU, Multi-GPU.
|
|
64
71
|
2. **Cluster Computing Integration:** Integrating with PySpark for distributed computing, enabling scalability for large-scale recommendation systems.
|
|
65
72
|
|
|
66
|
-
## 📖 Documentation is available [here](https://sb-ai-lab.github.io/RePlay/).
|
|
67
|
-
|
|
68
73
|
<a name="toc"></a>
|
|
69
74
|
# Table of Contents
|
|
70
75
|
|
|
71
|
-
* [Installation](#installation)
|
|
72
76
|
* [Quickstart](#quickstart)
|
|
77
|
+
* [Installation](#installation)
|
|
73
78
|
* [Resources](#examples)
|
|
74
79
|
* [Contributing to RePlay](#contributing)
|
|
75
80
|
|
|
76
81
|
|
|
77
|
-
<a name="
|
|
78
|
-
##
|
|
79
|
-
|
|
80
|
-
Installation via `pip` package manager is recommended by default:
|
|
81
|
-
|
|
82
|
-
```bash
|
|
83
|
-
pip install replay-rec
|
|
84
|
-
```
|
|
85
|
-
|
|
86
|
-
In this case it will be installed the `core` package without `PySpark` and `PyTorch` dependencies.
|
|
87
|
-
Also `experimental` submodule will not be installed.
|
|
88
|
-
|
|
89
|
-
To install `experimental` submodule please specify the version with `rc0` suffix.
|
|
90
|
-
For example:
|
|
91
|
-
|
|
92
|
-
```bash
|
|
93
|
-
pip install replay-rec==XX.YY.ZZrc0
|
|
94
|
-
```
|
|
95
|
-
|
|
96
|
-
### Extras
|
|
97
|
-
|
|
98
|
-
In addition to the core package, several extras are also provided, including:
|
|
99
|
-
- `[spark]`: Install PySpark functionality
|
|
100
|
-
- `[torch]`: Install PyTorch and Lightning functionality
|
|
101
|
-
- `[all]`: `[spark]` `[torch]`
|
|
82
|
+
<a name="quickstart"></a>
|
|
83
|
+
## 📈 Quickstart
|
|
102
84
|
|
|
103
|
-
Example:
|
|
104
85
|
```bash
|
|
105
|
-
|
|
106
|
-
pip install replay-rec[spark]
|
|
107
|
-
|
|
108
|
-
# Install package with experimental submodule and PySpark dependency
|
|
109
|
-
pip install replay-rec[spark]==XX.YY.ZZrc0
|
|
86
|
+
pip install replay-rec[all]
|
|
110
87
|
```
|
|
111
88
|
|
|
112
|
-
|
|
113
|
-
|
|
114
|
-
If you encounter an error during RePlay installation, check the [troubleshooting](https://sb-ai-lab.github.io/RePlay/pages/installation.html#troubleshooting) guide.
|
|
115
|
-
|
|
116
|
-
|
|
117
|
-
<a name="quickstart"></a>
|
|
118
|
-
## 📈 Quickstart (PySpark-based)
|
|
119
|
-
|
|
89
|
+
PySpark-based model and [fast](https://github.com/sb-ai-lab/RePlay/blob/main/examples/11_sasrec_dataframes_comparison.ipynb) polars-based data preprocessing:
|
|
120
90
|
```python
|
|
91
|
+
from polars import from_pandas
|
|
121
92
|
from rs_datasets import MovieLens
|
|
122
93
|
|
|
123
94
|
from replay.data import Dataset, FeatureHint, FeatureInfo, FeatureSchema, FeatureType
|
|
@@ -131,10 +102,10 @@ from replay.splitters import RatioSplitter
|
|
|
131
102
|
spark = State().session
|
|
132
103
|
|
|
133
104
|
ml_1m = MovieLens("1m")
|
|
134
|
-
K=10
|
|
105
|
+
K = 10
|
|
135
106
|
|
|
136
|
-
# data
|
|
137
|
-
interactions =
|
|
107
|
+
# convert data to polars
|
|
108
|
+
interactions = from_pandas(ml_1m.ratings)
|
|
138
109
|
|
|
139
110
|
# data splitting
|
|
140
111
|
splitter = RatioSplitter(
|
|
@@ -148,7 +119,7 @@ splitter = RatioSplitter(
|
|
|
148
119
|
)
|
|
149
120
|
train, test = splitter.split(interactions)
|
|
150
121
|
|
|
151
|
-
#
|
|
122
|
+
# datasets creation
|
|
152
123
|
feature_schema = FeatureSchema(
|
|
153
124
|
[
|
|
154
125
|
FeatureInfo(
|
|
@@ -174,20 +145,18 @@ feature_schema = FeatureSchema(
|
|
|
174
145
|
]
|
|
175
146
|
)
|
|
176
147
|
|
|
177
|
-
train_dataset = Dataset(
|
|
178
|
-
|
|
179
|
-
interactions=train,
|
|
180
|
-
)
|
|
181
|
-
test_dataset = Dataset(
|
|
182
|
-
feature_schema=feature_schema,
|
|
183
|
-
interactions=test,
|
|
184
|
-
)
|
|
148
|
+
train_dataset = Dataset(feature_schema=feature_schema, interactions=train)
|
|
149
|
+
test_dataset = Dataset(feature_schema=feature_schema, interactions=test)
|
|
185
150
|
|
|
186
151
|
# data encoding
|
|
187
152
|
encoder = DatasetLabelEncoder()
|
|
188
153
|
train_dataset = encoder.fit_transform(train_dataset)
|
|
189
154
|
test_dataset = encoder.transform(test_dataset)
|
|
190
155
|
|
|
156
|
+
# convert datasets to spark
|
|
157
|
+
train_dataset.to_spark()
|
|
158
|
+
test_dataset.to_spark()
|
|
159
|
+
|
|
191
160
|
# model training
|
|
192
161
|
model = ItemKNN()
|
|
193
162
|
model.fit(train_dataset)
|
|
@@ -214,6 +183,44 @@ metrics.add_result("ItemKNN", recs)
|
|
|
214
183
|
print(metrics.results)
|
|
215
184
|
```
|
|
216
185
|
|
|
186
|
+
<a name="installation"></a>
|
|
187
|
+
## 🔧 Installation
|
|
188
|
+
|
|
189
|
+
Installation via `pip` package manager is recommended by default:
|
|
190
|
+
|
|
191
|
+
```bash
|
|
192
|
+
pip install replay-rec
|
|
193
|
+
```
|
|
194
|
+
|
|
195
|
+
In this case the `core` package will be installed without `PySpark` and `PyTorch` dependencies.
|
|
196
|
+
Also `experimental` submodule will not be installed.
|
|
197
|
+
|
|
198
|
+
To install `experimental` submodule please specify the version with `rc0` suffix.
|
|
199
|
+
For example:
|
|
200
|
+
|
|
201
|
+
```bash
|
|
202
|
+
pip install replay-rec==XX.YY.ZZrc0
|
|
203
|
+
```
|
|
204
|
+
|
|
205
|
+
### Extras
|
|
206
|
+
|
|
207
|
+
In addition to the core package, several extras are also provided, including:
|
|
208
|
+
- `[spark]`: Install PySpark functionality
|
|
209
|
+
- `[torch]`: Install PyTorch and Lightning functionality
|
|
210
|
+
- `[all]`: `[spark]` `[torch]`
|
|
211
|
+
|
|
212
|
+
Example:
|
|
213
|
+
```bash
|
|
214
|
+
# Install core package with PySpark dependency
|
|
215
|
+
pip install replay-rec[spark]
|
|
216
|
+
|
|
217
|
+
# Install package with experimental submodule and PySpark dependency
|
|
218
|
+
pip install replay-rec[spark]==XX.YY.ZZrc0
|
|
219
|
+
```
|
|
220
|
+
|
|
221
|
+
To build RePlay from sources please use the [instruction](CONTRIBUTING.md#installing-from-the-source).
|
|
222
|
+
|
|
223
|
+
|
|
217
224
|
<a name="examples"></a>
|
|
218
225
|
## 📑 Resources
|
|
219
226
|
|
|
@@ -226,14 +233,19 @@ print(metrics.results)
|
|
|
226
233
|
6. [06_item2item_recommendations.ipynb](https://github.com/sb-ai-lab/RePlay/blob/main/examples/06_item2item_recommendations.ipynb) - Item to Item recommendations example.
|
|
227
234
|
7. [07_filters.ipynb](https://github.com/sb-ai-lab/RePlay/blob/main/examples/07_filters.ipynb) - An example of using filters.
|
|
228
235
|
8. [08_recommending_for_categories.ipynb](https://github.com/sb-ai-lab/RePlay/blob/main/examples/08_recommending_for_categories.ipynb) - An example of recommendation for product categories.
|
|
229
|
-
9. [09_sasrec_example.ipynb](https://github.com/sb-ai-lab/RePlay/blob/main/examples/09_sasrec_example.ipynb) - An example of using
|
|
230
|
-
|
|
236
|
+
9. [09_sasrec_example.ipynb](https://github.com/sb-ai-lab/RePlay/blob/main/examples/09_sasrec_example.ipynb) - An example of using transformer-based SASRec model to generate recommendations.
|
|
237
|
+
10. [10_bert4rec_example.ipynb](https://github.com/sb-ai-lab/RePlay/blob/main/examples/10_bert4rec_example.ipynb) - An example of using transformer-based BERT4Rec model to generate recommendations.
|
|
238
|
+
11. [11_sasrec_dataframes_comparison.ipynb](https://github.com/sb-ai-lab/RePlay/blob/main/examples/11_sasrec_dataframes_comparison.ipynb) - A speed comparison of using different frameworks (pandas, polars, pyspark) for data processing during SASRec training.
|
|
239
|
+
12. [12_neural_ts_exp.ipynb](https://github.com/sb-ai-lab/RePlay/blob/main/examples/12_neural_ts_exp.ipynb) - An example of using Neural Thompson Sampling bandit model (based on Wide&Deep architecture).
|
|
240
|
+
13. [13_personalized_bandit_comparison.ipynb](https://github.com/sb-ai-lab/RePlay/blob/main/examples/13_personalized_bandit_comparison.ipynb) - A comparison of context-free and contextual bandit models.
|
|
241
|
+
14. [14_hierarchical_recommender.ipynb](https://github.com/sb-ai-lab/RePlay/blob/main/examples/14_hierarchical_recommender.ipynb) - An example of using HierarchicalRecommender with user-disjoint LinUCB.
|
|
231
242
|
|
|
232
243
|
### Videos and papers
|
|
233
244
|
* **Video guides**:
|
|
234
245
|
- [Replay for offline recommendations, AI Journey 2021](https://www.youtube.com/watch?v=ejQZKGAG0xs)
|
|
235
246
|
|
|
236
247
|
* **Research papers**:
|
|
248
|
+
- [RePlay: a Recommendation Framework for Experimentation and Production Use](https://arxiv.org/abs/2409.07272) Alexey Vasilev, Anna Volodkevich, Denis Kulandin, Tatiana Bysheva, Anton Klenitskiy. In The 18th ACM Conference on Recommender Systems (RecSys '24)
|
|
237
249
|
- [Turning Dross Into Gold Loss: is BERT4Rec really better than SASRec?](https://doi.org/10.1145/3604915.3610644) Anton Klenitskiy, Alexey Vasilev. In The 17th ACM Conference on Recommender Systems (RecSys '23)
|
|
238
250
|
- [The Long Tail of Context: Does it Exist and Matter?](https://arxiv.org/abs/2210.01023). Konstantin Bauman, Alexey Vasilev, Alexander Tuzhilin. In Workshop on Context-Aware Recommender Systems (CARS) (RecSys '22)
|
|
239
251
|
- [Multiobjective Evaluation of Reinforcement Learning Based Recommender Systems](https://doi.org/10.1145/3523227.3551485). Alexey Grishanov, Anastasia Ianina, Konstantin Vorontsov. In The 16th ACM Conference on Recommender Systems (RecSys '22)
|
|
@@ -244,3 +256,4 @@ print(metrics.results)
|
|
|
244
256
|
|
|
245
257
|
We welcome community contributions. For details please check our [contributing guidelines](CONTRIBUTING.md).
|
|
246
258
|
|
|
259
|
+
|
|
@@ -3,11 +3,15 @@
|
|
|
3
3
|
|
|
4
4
|
[](https://github.com/sb-ai-lab/RePlay/blob/main/LICENSE)
|
|
5
5
|
[](https://pypi.org/project/replay-rec)
|
|
6
|
+
[](https://sb-ai-lab.github.io/RePlay/)
|
|
6
7
|
[](https://pypistats.org/packages/replay-rec)
|
|
7
8
|
<br>
|
|
8
9
|
[](https://github.com/sb-ai-lab/RePlay/actions/workflows/main.yml?query=branch%3Amain)
|
|
10
|
+
[](https://github.com/astral-sh/ruff)
|
|
11
|
+
[](https://pypi.org/project/replay-rec)
|
|
9
12
|
[](https://github.com/sb-ai-lab/RePlay/discussions)
|
|
10
13
|
|
|
14
|
+
|
|
11
15
|
RePlay is an advanced framework designed to facilitate the development and evaluation of recommendation systems. It provides a robust set of tools covering the entire lifecycle of a recommendation system pipeline:
|
|
12
16
|
|
|
13
17
|
## 🚀 Features:
|
|
@@ -22,61 +26,25 @@ RePlay is an advanced framework designed to facilitate the development and evalu
|
|
|
22
26
|
1. **Diverse Hardware Support:** Compatible with various hardware configurations including CPU, GPU, Multi-GPU.
|
|
23
27
|
2. **Cluster Computing Integration:** Integrating with PySpark for distributed computing, enabling scalability for large-scale recommendation systems.
|
|
24
28
|
|
|
25
|
-
## 📖 Documentation is available [here](https://sb-ai-lab.github.io/RePlay/).
|
|
26
|
-
|
|
27
29
|
<a name="toc"></a>
|
|
28
30
|
# Table of Contents
|
|
29
31
|
|
|
30
|
-
* [Installation](#installation)
|
|
31
32
|
* [Quickstart](#quickstart)
|
|
33
|
+
* [Installation](#installation)
|
|
32
34
|
* [Resources](#examples)
|
|
33
35
|
* [Contributing to RePlay](#contributing)
|
|
34
36
|
|
|
35
37
|
|
|
36
|
-
<a name="
|
|
37
|
-
##
|
|
38
|
-
|
|
39
|
-
Installation via `pip` package manager is recommended by default:
|
|
40
|
-
|
|
41
|
-
```bash
|
|
42
|
-
pip install replay-rec
|
|
43
|
-
```
|
|
44
|
-
|
|
45
|
-
In this case it will be installed the `core` package without `PySpark` and `PyTorch` dependencies.
|
|
46
|
-
Also `experimental` submodule will not be installed.
|
|
47
|
-
|
|
48
|
-
To install `experimental` submodule please specify the version with `rc0` suffix.
|
|
49
|
-
For example:
|
|
50
|
-
|
|
51
|
-
```bash
|
|
52
|
-
pip install replay-rec==XX.YY.ZZrc0
|
|
53
|
-
```
|
|
54
|
-
|
|
55
|
-
### Extras
|
|
56
|
-
|
|
57
|
-
In addition to the core package, several extras are also provided, including:
|
|
58
|
-
- `[spark]`: Install PySpark functionality
|
|
59
|
-
- `[torch]`: Install PyTorch and Lightning functionality
|
|
60
|
-
- `[all]`: `[spark]` `[torch]`
|
|
38
|
+
<a name="quickstart"></a>
|
|
39
|
+
## 📈 Quickstart
|
|
61
40
|
|
|
62
|
-
Example:
|
|
63
41
|
```bash
|
|
64
|
-
|
|
65
|
-
pip install replay-rec[spark]
|
|
66
|
-
|
|
67
|
-
# Install package with experimental submodule and PySpark dependency
|
|
68
|
-
pip install replay-rec[spark]==XX.YY.ZZrc0
|
|
42
|
+
pip install replay-rec[all]
|
|
69
43
|
```
|
|
70
44
|
|
|
71
|
-
|
|
72
|
-
|
|
73
|
-
If you encounter an error during RePlay installation, check the [troubleshooting](https://sb-ai-lab.github.io/RePlay/pages/installation.html#troubleshooting) guide.
|
|
74
|
-
|
|
75
|
-
|
|
76
|
-
<a name="quickstart"></a>
|
|
77
|
-
## 📈 Quickstart (PySpark-based)
|
|
78
|
-
|
|
45
|
+
PySpark-based model and [fast](https://github.com/sb-ai-lab/RePlay/blob/main/examples/11_sasrec_dataframes_comparison.ipynb) polars-based data preprocessing:
|
|
79
46
|
```python
|
|
47
|
+
from polars import from_pandas
|
|
80
48
|
from rs_datasets import MovieLens
|
|
81
49
|
|
|
82
50
|
from replay.data import Dataset, FeatureHint, FeatureInfo, FeatureSchema, FeatureType
|
|
@@ -90,10 +58,10 @@ from replay.splitters import RatioSplitter
|
|
|
90
58
|
spark = State().session
|
|
91
59
|
|
|
92
60
|
ml_1m = MovieLens("1m")
|
|
93
|
-
K=10
|
|
61
|
+
K = 10
|
|
94
62
|
|
|
95
|
-
# data
|
|
96
|
-
interactions =
|
|
63
|
+
# convert data to polars
|
|
64
|
+
interactions = from_pandas(ml_1m.ratings)
|
|
97
65
|
|
|
98
66
|
# data splitting
|
|
99
67
|
splitter = RatioSplitter(
|
|
@@ -107,7 +75,7 @@ splitter = RatioSplitter(
|
|
|
107
75
|
)
|
|
108
76
|
train, test = splitter.split(interactions)
|
|
109
77
|
|
|
110
|
-
#
|
|
78
|
+
# datasets creation
|
|
111
79
|
feature_schema = FeatureSchema(
|
|
112
80
|
[
|
|
113
81
|
FeatureInfo(
|
|
@@ -133,20 +101,18 @@ feature_schema = FeatureSchema(
|
|
|
133
101
|
]
|
|
134
102
|
)
|
|
135
103
|
|
|
136
|
-
train_dataset = Dataset(
|
|
137
|
-
|
|
138
|
-
interactions=train,
|
|
139
|
-
)
|
|
140
|
-
test_dataset = Dataset(
|
|
141
|
-
feature_schema=feature_schema,
|
|
142
|
-
interactions=test,
|
|
143
|
-
)
|
|
104
|
+
train_dataset = Dataset(feature_schema=feature_schema, interactions=train)
|
|
105
|
+
test_dataset = Dataset(feature_schema=feature_schema, interactions=test)
|
|
144
106
|
|
|
145
107
|
# data encoding
|
|
146
108
|
encoder = DatasetLabelEncoder()
|
|
147
109
|
train_dataset = encoder.fit_transform(train_dataset)
|
|
148
110
|
test_dataset = encoder.transform(test_dataset)
|
|
149
111
|
|
|
112
|
+
# convert datasets to spark
|
|
113
|
+
train_dataset.to_spark()
|
|
114
|
+
test_dataset.to_spark()
|
|
115
|
+
|
|
150
116
|
# model training
|
|
151
117
|
model = ItemKNN()
|
|
152
118
|
model.fit(train_dataset)
|
|
@@ -173,6 +139,44 @@ metrics.add_result("ItemKNN", recs)
|
|
|
173
139
|
print(metrics.results)
|
|
174
140
|
```
|
|
175
141
|
|
|
142
|
+
<a name="installation"></a>
|
|
143
|
+
## 🔧 Installation
|
|
144
|
+
|
|
145
|
+
Installation via `pip` package manager is recommended by default:
|
|
146
|
+
|
|
147
|
+
```bash
|
|
148
|
+
pip install replay-rec
|
|
149
|
+
```
|
|
150
|
+
|
|
151
|
+
In this case the `core` package will be installed without `PySpark` and `PyTorch` dependencies.
|
|
152
|
+
Also `experimental` submodule will not be installed.
|
|
153
|
+
|
|
154
|
+
To install `experimental` submodule please specify the version with `rc0` suffix.
|
|
155
|
+
For example:
|
|
156
|
+
|
|
157
|
+
```bash
|
|
158
|
+
pip install replay-rec==XX.YY.ZZrc0
|
|
159
|
+
```
|
|
160
|
+
|
|
161
|
+
### Extras
|
|
162
|
+
|
|
163
|
+
In addition to the core package, several extras are also provided, including:
|
|
164
|
+
- `[spark]`: Install PySpark functionality
|
|
165
|
+
- `[torch]`: Install PyTorch and Lightning functionality
|
|
166
|
+
- `[all]`: `[spark]` `[torch]`
|
|
167
|
+
|
|
168
|
+
Example:
|
|
169
|
+
```bash
|
|
170
|
+
# Install core package with PySpark dependency
|
|
171
|
+
pip install replay-rec[spark]
|
|
172
|
+
|
|
173
|
+
# Install package with experimental submodule and PySpark dependency
|
|
174
|
+
pip install replay-rec[spark]==XX.YY.ZZrc0
|
|
175
|
+
```
|
|
176
|
+
|
|
177
|
+
To build RePlay from sources please use the [instruction](CONTRIBUTING.md#installing-from-the-source).
|
|
178
|
+
|
|
179
|
+
|
|
176
180
|
<a name="examples"></a>
|
|
177
181
|
## 📑 Resources
|
|
178
182
|
|
|
@@ -185,14 +189,19 @@ print(metrics.results)
|
|
|
185
189
|
6. [06_item2item_recommendations.ipynb](https://github.com/sb-ai-lab/RePlay/blob/main/examples/06_item2item_recommendations.ipynb) - Item to Item recommendations example.
|
|
186
190
|
7. [07_filters.ipynb](https://github.com/sb-ai-lab/RePlay/blob/main/examples/07_filters.ipynb) - An example of using filters.
|
|
187
191
|
8. [08_recommending_for_categories.ipynb](https://github.com/sb-ai-lab/RePlay/blob/main/examples/08_recommending_for_categories.ipynb) - An example of recommendation for product categories.
|
|
188
|
-
9. [09_sasrec_example.ipynb](https://github.com/sb-ai-lab/RePlay/blob/main/examples/09_sasrec_example.ipynb) - An example of using
|
|
189
|
-
|
|
192
|
+
9. [09_sasrec_example.ipynb](https://github.com/sb-ai-lab/RePlay/blob/main/examples/09_sasrec_example.ipynb) - An example of using transformer-based SASRec model to generate recommendations.
|
|
193
|
+
10. [10_bert4rec_example.ipynb](https://github.com/sb-ai-lab/RePlay/blob/main/examples/10_bert4rec_example.ipynb) - An example of using transformer-based BERT4Rec model to generate recommendations.
|
|
194
|
+
11. [11_sasrec_dataframes_comparison.ipynb](https://github.com/sb-ai-lab/RePlay/blob/main/examples/11_sasrec_dataframes_comparison.ipynb) - speed comparison of using different frameworks (pandas, polars, pyspark) for data processing during SASRec training.
|
|
195
|
+
12. [12_neural_ts_exp.ipynb](https://github.com/sb-ai-lab/RePlay/blob/main/examples/12_neural_ts_exp.ipynb) - An example of using Neural Thompson Sampling bandit model (based on Wide&Deep architecture).
|
|
196
|
+
13. [13_personalized_bandit_comparison.ipynb](https://github.com/sb-ai-lab/RePlay/blob/main/examples/13_personalized_bandit_comparison.ipynb) - A comparison of context-free and contextual bandit models.
|
|
197
|
+
14. [14_hierarchical_recommender.ipynb](https://github.com/sb-ai-lab/RePlay/blob/main/examples/14_hierarchical_recommender.ipynb) - An example of using HierarchicalRecommender with user-disjoint LinUCB.
|
|
190
198
|
|
|
191
199
|
### Videos and papers
|
|
192
200
|
* **Video guides**:
|
|
193
201
|
- [Replay for offline recommendations, AI Journey 2021](https://www.youtube.com/watch?v=ejQZKGAG0xs)
|
|
194
202
|
|
|
195
203
|
* **Research papers**:
|
|
204
|
+
- [RePlay: a Recommendation Framework for Experimentation and Production Use](https://arxiv.org/abs/2409.07272) Alexey Vasilev, Anna Volodkevich, Denis Kulandin, Tatiana Bysheva, Anton Klenitskiy. In The 18th ACM Conference on Recommender Systems (RecSys '24)
|
|
196
205
|
- [Turning Dross Into Gold Loss: is BERT4Rec really better than SASRec?](https://doi.org/10.1145/3604915.3610644) Anton Klenitskiy, Alexey Vasilev. In The 17th ACM Conference on Recommender Systems (RecSys '23)
|
|
197
206
|
- [The Long Tail of Context: Does it Exist and Matter?](https://arxiv.org/abs/2210.01023). Konstantin Bauman, Alexey Vasilev, Alexander Tuzhilin. In Workshop on Context-Aware Recommender Systems (CARS) (RecSys '22)
|
|
198
207
|
- [Multiobjective Evaluation of Reinforcement Learning Based Recommender Systems](https://doi.org/10.1145/3523227.3551485). Alexey Grishanov, Anastasia Ianina, Konstantin Vorontsov. In The 16th ACM Conference on Recommender Systems (RecSys '22)
|
|
@@ -202,3 +211,4 @@ print(metrics.results)
|
|
|
202
211
|
## 💡 Contributing to RePlay
|
|
203
212
|
|
|
204
213
|
We welcome community contributions. For details please check our [contributing guidelines](CONTRIBUTING.md).
|
|
214
|
+
|
|
@@ -41,7 +41,7 @@ exclude = [
|
|
|
41
41
|
"replay/conftest.py",
|
|
42
42
|
"replay/experimental",
|
|
43
43
|
]
|
|
44
|
-
version = "0.18.
|
|
44
|
+
version = "0.18.1"
|
|
45
45
|
|
|
46
46
|
[tool.poetry.dependencies]
|
|
47
47
|
python = ">=3.8.1, <3.12"
|
|
@@ -53,11 +53,13 @@ scipy = "^1.8.1"
|
|
|
53
53
|
psutil = "~6.0.0"
|
|
54
54
|
scikit-learn = "^1.0.2"
|
|
55
55
|
pyarrow = ">=12.0.1"
|
|
56
|
+
openvino = {version = "~2024.3.0", optional = true}
|
|
57
|
+
onnx = {version = "~1.16.2", optional = true}
|
|
56
58
|
pyspark = [
|
|
57
59
|
{version = ">=3.4,<3.6", python = ">=3.11,<3.12", optional = true},
|
|
58
60
|
{version = ">=3.0,<3.6", python = ">=3.8.1,<3.11", optional = true},
|
|
59
61
|
]
|
|
60
|
-
torch = {version = ">=1.8, <=2.
|
|
62
|
+
torch = {version = ">=1.8, <=2.5.0", optional = true}
|
|
61
63
|
lightning = {version = ">=2.0.2, <=2.4.0", optional = true}
|
|
62
64
|
pytorch-ranger = {version = "^0.1.1", optional = true}
|
|
63
65
|
fixed-install-nmslib = "2.1.2"
|
|
@@ -66,7 +68,8 @@ hnswlib = "^0.7.0"
|
|
|
66
68
|
[tool.poetry.extras]
|
|
67
69
|
spark = ["pyspark"]
|
|
68
70
|
torch = ["torch", "pytorch-ranger", "lightning"]
|
|
69
|
-
|
|
71
|
+
torch-openvino = ["torch", "pytorch-ranger", "lightning", "openvino", "onnx"]
|
|
72
|
+
all = ["pyspark", "torch", "pytorch-ranger", "lightning", "openvino", "onnx"]
|
|
70
73
|
|
|
71
74
|
[tool.poetry.group.dev.dependencies]
|
|
72
75
|
jupyter = "~1.0.0"
|
|
@@ -85,10 +88,11 @@ myst-parser = "1.0.0"
|
|
|
85
88
|
ghp-import = "2.1.0"
|
|
86
89
|
docutils = "0.16"
|
|
87
90
|
data-science-types = "0.2.23"
|
|
91
|
+
filelock = "~3.14.0"
|
|
88
92
|
|
|
89
93
|
[tool.poetry-dynamic-versioning]
|
|
90
94
|
enable = false
|
|
91
|
-
format-jinja = """0.18.
|
|
95
|
+
format-jinja = """0.18.1{{ env['PACKAGE_SUFFIX'] }}"""
|
|
92
96
|
vcs = "git"
|
|
93
97
|
|
|
94
98
|
[tool.ruff]
|
|
@@ -458,13 +458,23 @@ class Dataset:
|
|
|
458
458
|
if feature.feature_hint in [FeatureHint.ITEM_ID, FeatureHint.QUERY_ID]:
|
|
459
459
|
return nunique(self._ids_feature_map[feature.feature_hint], column)
|
|
460
460
|
assert feature.feature_source
|
|
461
|
+
if feature.feature_type == FeatureType.CATEGORICAL_LIST:
|
|
462
|
+
if self.is_spark:
|
|
463
|
+
data = (
|
|
464
|
+
self._feature_source_map[feature.feature_source]
|
|
465
|
+
.select(column)
|
|
466
|
+
.withColumn(column, sf.explode(column))
|
|
467
|
+
)
|
|
468
|
+
else:
|
|
469
|
+
data = self._feature_source_map[feature.feature_source][[column]].explode(column)
|
|
470
|
+
return nunique(data, column)
|
|
461
471
|
return nunique(self._feature_source_map[feature.feature_source], column)
|
|
462
472
|
|
|
463
473
|
return callback
|
|
464
474
|
|
|
465
475
|
def _set_cardinality(self, features_list: Sequence[FeatureInfo]) -> None:
|
|
466
476
|
for feature in features_list:
|
|
467
|
-
if feature.feature_type
|
|
477
|
+
if feature.feature_type in [FeatureType.CATEGORICAL, FeatureType.CATEGORICAL_LIST]:
|
|
468
478
|
feature._set_cardinality_callback(self._get_cardinality(feature))
|
|
469
479
|
|
|
470
480
|
def _fill_feature_schema(self, feature_schema: FeatureSchema) -> FeatureSchema:
|
|
@@ -581,6 +591,7 @@ class Dataset:
|
|
|
581
591
|
data: DataFrameLike,
|
|
582
592
|
column: str,
|
|
583
593
|
source: FeatureSource,
|
|
594
|
+
feature_type: FeatureType,
|
|
584
595
|
cardinality: Optional[int],
|
|
585
596
|
) -> None:
|
|
586
597
|
"""
|
|
@@ -593,6 +604,16 @@ class Dataset:
|
|
|
593
604
|
Option: Keep this criterion, but suggest the user to disable the check if he understands
|
|
594
605
|
that the criterion will not pass.
|
|
595
606
|
"""
|
|
607
|
+
if feature_type == FeatureType.CATEGORICAL_LIST: # explode column if list
|
|
608
|
+
data = data.withColumn(column, sf.explode(column)) if self.is_spark else data[[column]].explode(column)
|
|
609
|
+
|
|
610
|
+
if self.is_pandas:
|
|
611
|
+
try:
|
|
612
|
+
data[column] = data[column].astype(int)
|
|
613
|
+
except Exception:
|
|
614
|
+
msg = f"IDs in {source.name}.{column} are not encoded. They are not int."
|
|
615
|
+
raise ValueError(msg)
|
|
616
|
+
|
|
596
617
|
if self.is_pandas:
|
|
597
618
|
is_int = np.issubdtype(dict(data.dtypes)[column], int)
|
|
598
619
|
elif self.is_spark:
|
|
@@ -632,6 +653,7 @@ class Dataset:
|
|
|
632
653
|
self.interactions,
|
|
633
654
|
feature.column,
|
|
634
655
|
FeatureSource.INTERACTIONS,
|
|
656
|
+
feature.feature_type,
|
|
635
657
|
feature.cardinality,
|
|
636
658
|
)
|
|
637
659
|
if self.item_features is not None:
|
|
@@ -639,6 +661,7 @@ class Dataset:
|
|
|
639
661
|
self.item_features,
|
|
640
662
|
feature.column,
|
|
641
663
|
FeatureSource.ITEM_FEATURES,
|
|
664
|
+
feature.feature_type,
|
|
642
665
|
feature.cardinality,
|
|
643
666
|
)
|
|
644
667
|
elif feature.feature_hint == FeatureHint.QUERY_ID:
|
|
@@ -646,6 +669,7 @@ class Dataset:
|
|
|
646
669
|
self.interactions,
|
|
647
670
|
feature.column,
|
|
648
671
|
FeatureSource.INTERACTIONS,
|
|
672
|
+
feature.feature_type,
|
|
649
673
|
feature.cardinality,
|
|
650
674
|
)
|
|
651
675
|
if self.query_features is not None:
|
|
@@ -653,6 +677,7 @@ class Dataset:
|
|
|
653
677
|
self.query_features,
|
|
654
678
|
feature.column,
|
|
655
679
|
FeatureSource.QUERY_FEATURES,
|
|
680
|
+
feature.feature_type,
|
|
656
681
|
feature.cardinality,
|
|
657
682
|
)
|
|
658
683
|
else:
|
|
@@ -661,6 +686,7 @@ class Dataset:
|
|
|
661
686
|
data,
|
|
662
687
|
feature.column,
|
|
663
688
|
feature.feature_source,
|
|
689
|
+
feature.feature_type,
|
|
664
690
|
feature.cardinality,
|
|
665
691
|
)
|
|
666
692
|
|
|
@@ -8,8 +8,8 @@ Contains classes for encoding categorical data
|
|
|
8
8
|
import warnings
|
|
9
9
|
from typing import Dict, Iterable, Iterator, Optional, Sequence, Set, Union
|
|
10
10
|
|
|
11
|
-
from replay.data import Dataset, FeatureHint, FeatureSchema, FeatureSource
|
|
12
|
-
from replay.preprocessing import LabelEncoder, LabelEncodingRule
|
|
11
|
+
from replay.data import Dataset, FeatureHint, FeatureSchema, FeatureSource, FeatureType
|
|
12
|
+
from replay.preprocessing import LabelEncoder, LabelEncodingRule, SequenceEncodingRule
|
|
13
13
|
from replay.preprocessing.label_encoder import HandleUnknownStrategies
|
|
14
14
|
|
|
15
15
|
|
|
@@ -62,7 +62,10 @@ class DatasetLabelEncoder:
|
|
|
62
62
|
|
|
63
63
|
self._fill_features_columns(dataset.feature_schema)
|
|
64
64
|
for column, feature_info in dataset.feature_schema.categorical_features.items():
|
|
65
|
-
|
|
65
|
+
encoding_rule_class = (
|
|
66
|
+
SequenceEncodingRule if feature_info.feature_type == FeatureType.CATEGORICAL_LIST else LabelEncodingRule
|
|
67
|
+
)
|
|
68
|
+
encoding_rule = encoding_rule_class(
|
|
66
69
|
column, handle_unknown=self._handle_unknown_rule, default_value=self._default_value_rule
|
|
67
70
|
)
|
|
68
71
|
if feature_info.feature_hint == FeatureHint.QUERY_ID:
|