PyPI - gsppy - Versions diffs - 3.6.0__tar.gz → 4.1.0__tar.gz - Mend

gsppy 3.6.0__tar.gz → 4.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (35)

{gsppy-3.6.0 → gsppy-4.1.0}/CHANGELOG.md +186 -0
{gsppy-3.6.0 → gsppy-4.1.0}/PKG-INFO +405 -9
{gsppy-3.6.0 → gsppy-4.1.0}/README.md +395 -4
gsppy-4.1.0/gsppy/__init__.py +88 -0
{gsppy-3.6.0 → gsppy-4.1.0}/gsppy/cli.py +316 -13
gsppy-4.1.0/gsppy/dataframe_adapters.py +458 -0
gsppy-4.1.0/gsppy/enums.py +49 -0
{gsppy-3.6.0 → gsppy-4.1.0}/gsppy/gsp.py +220 -15
gsppy-4.1.0/gsppy/sequence.py +371 -0
gsppy-4.1.0/gsppy/token_mapper.py +99 -0
{gsppy-3.6.0 → gsppy-4.1.0}/gsppy/utils.py +120 -0
{gsppy-3.6.0 → gsppy-4.1.0}/pyproject.toml +18 -7
{gsppy-3.6.0 → gsppy-4.1.0}/tests/test_cli.py +70 -3
gsppy-4.1.0/tests/test_dataframe.py +341 -0
gsppy-4.1.0/tests/test_gsp_sequence_integration.py +345 -0
gsppy-4.1.0/tests/test_sequence.py +466 -0
gsppy-4.1.0/tests/test_spm_format.py +303 -0
{gsppy-3.6.0 → gsppy-4.1.0}/tox.ini +1 -1
gsppy-3.6.0/gsppy/__init__.py +0 -43
{gsppy-3.6.0 → gsppy-4.1.0}/.gitignore +0 -0
{gsppy-3.6.0 → gsppy-4.1.0}/CONTRIBUTING.md +0 -0
{gsppy-3.6.0 → gsppy-4.1.0}/LICENSE +0 -0
{gsppy-3.6.0 → gsppy-4.1.0}/SECURITY.md +0 -0
{gsppy-3.6.0 → gsppy-4.1.0}/gsppy/accelerate.py +0 -0
{gsppy-3.6.0 → gsppy-4.1.0}/gsppy/pruning.py +0 -0
{gsppy-3.6.0 → gsppy-4.1.0}/gsppy/py.typed +0 -0
{gsppy-3.6.0 → gsppy-4.1.0}/rust/Cargo.lock +0 -0
{gsppy-3.6.0 → gsppy-4.1.0}/rust/Cargo.toml +0 -0
{gsppy-3.6.0 → gsppy-4.1.0}/rust/src/lib.rs +0 -0
{gsppy-3.6.0 → gsppy-4.1.0}/tests/__init__.py +0 -0
{gsppy-3.6.0 → gsppy-4.1.0}/tests/test_gsp.py +0 -0
{gsppy-3.6.0 → gsppy-4.1.0}/tests/test_gsp_fuzzing.py +0 -0
{gsppy-3.6.0 → gsppy-4.1.0}/tests/test_pruning.py +0 -0
{gsppy-3.6.0 → gsppy-4.1.0}/tests/test_temporal_constraints.py +0 -0
{gsppy-3.6.0 → gsppy-4.1.0}/tests/test_utils.py +0 -0

{gsppy-3.6.0 → gsppy-4.1.0}/CHANGELOG.md RENAMED

@@ -1,6 +1,192 @@
 # CHANGELOG
+## v4.1.0 (2026-02-01)
+### Bug Fixes
+- Address code review feedback - add type annotations and remove unused variables
+  ([`bf62d14`](https://github.com/jacksonpradolima/gsp-py/commit/bf62d144d8f1be1e7716291d41af955450612c81))
+Co-authored-by: jacksonpradolima <7774063+jacksonpradolima@users.noreply.github.com>
+### Chores
+- Update uv.lock for version 4.0.0
+  ([`f1ae2af`](https://github.com/jacksonpradolima/gsp-py/commit/f1ae2af2aa71ea44b9d8625ed647da79259ec096))
+### Documentation
+- Add Sequence documentation and examples to README
+  ([`62d0d02`](https://github.com/jacksonpradolima/gsp-py/commit/62d0d02c19c5751331df53e680cc0b9aee19677b))
+Co-authored-by: jacksonpradolima <7774063+jacksonpradolima@users.noreply.github.com>
+- Update docs/ with Sequence abstraction documentation
+  ([`2368cf3`](https://github.com/jacksonpradolima/gsp-py/commit/2368cf30239139e8e2af5457ee6acf14db30ef06))
+Co-authored-by: jacksonpradolima <7774063+jacksonpradolima@users.noreply.github.com>
+### Features
+- Add Sequence abstraction class with comprehensive tests
+  ([`6011bdb`](https://github.com/jacksonpradolima/gsp-py/commit/6011bdb7104755d109b58261b36e1dd1c36b2d61))
+Co-authored-by: jacksonpradolima <7774063+jacksonpradolima@users.noreply.github.com>
+- Integrate Sequence objects with GSP.search() via return_sequences parameter
+  ([`7476588`](https://github.com/jacksonpradolima/gsp-py/commit/7476588f2b277276748e0550366014f2a93d8ef5))
+Co-authored-by: jacksonpradolima <7774063+jacksonpradolima@users.noreply.github.com>
+- Introduce Sequence abstraction for typed pattern representation
+  ([`01ca37b`](https://github.com/jacksonpradolima/gsp-py/commit/01ca37b9bc4572eb7b1c1eaf6fdf26ca2324a3c5))
+### Refactoring
+- Address code review feedback - remove redundant checks
+  ([`621e940`](https://github.com/jacksonpradolima/gsp-py/commit/621e9403379ae0fd07bf45b97616b9979f2d4aa6))
+Co-authored-by: jacksonpradolima <7774063+jacksonpradolima@users.noreply.github.com>
+- Reduce cognitive complexity in sequence_example.py and fix f-string
+  ([`63ac4f9`](https://github.com/jacksonpradolima/gsp-py/commit/63ac4f9ceb869a5228cdccdcf6a9d0b9f46f0350))
+Co-authored-by: jacksonpradolima <7774063+jacksonpradolima@users.noreply.github.com>
+- Update type annotations and improve search method in GSP class
+  ([`e2e9a3f`](https://github.com/jacksonpradolima/gsp-py/commit/e2e9a3f473d1e0c5d6990c8b7c5837a251761032))
+## v4.0.0 (2026-02-01)
+### Chores
+- Add additional VSCode extensions for improved development experience
+  ([`107dfa4`](https://github.com/jacksonpradolima/gsp-py/commit/107dfa422005f4cdec4655a9751fd0d6e597773f))
+- Update uv.lock for version 3.6.1
+  ([`d8d7394`](https://github.com/jacksonpradolima/gsp-py/commit/d8d73947d570844c02e9d974b626da26f07cf1e6))
+### Features
+- Add SPM/GSP delimiter format loader and token mapping utilities
+  ([`4ac1d34`](https://github.com/jacksonpradolima/gsp-py/commit/4ac1d34d166f21d30968872cf16c1bde3ff1f2aa))
+### Refactoring
+- Add type casting for return values in read_transactions_from_spm
+  ([`2099bfd`](https://github.com/jacksonpradolima/gsp-py/commit/2099bfd5253a1dc058dd46bd0da077810958fa76))
+- Update read_transactions_from_spm to return mappings and adjust tests
+  ([`373b8ff`](https://github.com/jacksonpradolima/gsp-py/commit/373b8ff0d7f131140bcdbd039fae0d02572e86b7))
+## v3.6.1 (2026-01-31)
+### Bug Fixes
+- Typing for polars and pandas
+  ([`0773992`](https://github.com/jacksonpradolima/gsp-py/commit/07739921d074e55c8436a88a73e510b1d8761510))
+### Build System
+- **deps**: Bump actions/checkout in /.github/workflows
+  ([`7af193d`](https://github.com/jacksonpradolima/gsp-py/commit/7af193d515972eeca5d8e354e91a60e488357cfb))
+Bumps [actions/checkout](https://github.com/actions/checkout) from 4.3.1 to 6.0.2. - [Release
+  notes](https://github.com/actions/checkout/releases) -
+  [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) -
+  [Commits](https://github.com/actions/checkout/compare/v4.3.1...de0fac2e4500dabe0009e67214ff5f5447ce83dd)
+--- updated-dependencies: - dependency-name: actions/checkout dependency-version: 6.0.2
+dependency-type: direct:production
+update-type: version-update:semver-major
+...
+Signed-off-by: dependabot[bot] <support@github.com>
+- **deps**: Bump actions/github-script in /.github/workflows
+  ([`03a7588`](https://github.com/jacksonpradolima/gsp-py/commit/03a7588301421369731d3d543f81b93c25c292ef))
+Bumps [actions/github-script](https://github.com/actions/github-script) from 7.0.1 to 8.0.0. -
+  [Release notes](https://github.com/actions/github-script/releases) -
+  [Commits](https://github.com/actions/github-script/compare/60a0d83039c74a4aee543508d2ffcb1c3799cdea...ed597411d8f924073f98dfc5c65a23a2325f34cd)
+--- updated-dependencies: - dependency-name: actions/github-script dependency-version: 8.0.0
+dependency-type: direct:production
+update-type: version-update:semver-major
+...
+Signed-off-by: dependabot[bot] <support@github.com>
+- **deps**: Bump actions/setup-python in /.github/workflows
+  ([`75771bf`](https://github.com/jacksonpradolima/gsp-py/commit/75771bff660b3842f2c8d84bdaeb013941e5abe0))
+Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5.6.0 to 6.2.0. -
+  [Release notes](https://github.com/actions/setup-python/releases) -
+  [Commits](https://github.com/actions/setup-python/compare/v5.6.0...a309ff8b426b58ec0e2a45f0f869d46889d02405)
+--- updated-dependencies: - dependency-name: actions/setup-python dependency-version: 6.2.0
+dependency-type: direct:production
+update-type: version-update:semver-major
+...
+Signed-off-by: dependabot[bot] <support@github.com>
+- **deps**: Bump actions/stale in /.github/workflows
+  ([`e699ccd`](https://github.com/jacksonpradolima/gsp-py/commit/e699ccdac689734b4694665d924ace8bba479253))
+Bumps [actions/stale](https://github.com/actions/stale) from 9.0.0 to 10.1.1. - [Release
+  notes](https://github.com/actions/stale/releases) -
+  [Changelog](https://github.com/actions/stale/blob/main/CHANGELOG.md) -
+  [Commits](https://github.com/actions/stale/compare/28ca1036281a5e5922ead5184a1bbf96e5fc984e...997185467fa4f803885201cee163a9f38240193d)
+--- updated-dependencies: - dependency-name: actions/stale dependency-version: 10.1.1
+dependency-type: direct:production
+update-type: version-update:semver-major
+...
+Signed-off-by: dependabot[bot] <support@github.com>
+- **deps**: Bump actions/upload-artifact in /.github/workflows
+  ([`17efaff`](https://github.com/jacksonpradolima/gsp-py/commit/17efaffc755c017e066c0286464899ead6e2cae4))
+Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.6.2 to 6.0.0. -
+  [Release notes](https://github.com/actions/upload-artifact/releases) -
+  [Commits](https://github.com/actions/upload-artifact/compare/v4.6.2...b7c566a772e6b6bfb58ed0dc250532a479d7789f)
+--- updated-dependencies: - dependency-name: actions/upload-artifact dependency-version: 6.0.0
+dependency-type: direct:production
+update-type: version-update:semver-major
+...
+Signed-off-by: dependabot[bot] <support@github.com>
+### Chores
+- Update uv.lock for version 3.6.0
+  ([`4c2a5e5`](https://github.com/jacksonpradolima/gsp-py/commit/4c2a5e5967482443c2db645c9ba4744bd2110dd1))
+- **deps**: Bump ty and ruff
+  ([`07a20df`](https://github.com/jacksonpradolima/gsp-py/commit/07a20df9fb4ff3a3b022d28d152b586ca45383c8))
 ## v3.6.0 (2026-01-26)
 ### Chores

{gsppy-3.6.0 → gsppy-4.1.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: gsppy
-Version: 3.6.0
+Version: 4.1.0
 Summary: GSP (Generalized Sequence Pattern) algorithm in Python
 Project-URL: Homepage, https://github.com/jacksonpradolima/gsp-py
 Author-email: Jackson Antonio do Prado Lima <jacksonpradolima@gmail.com>
@@ -32,15 +32,20 @@ Classifier: Intended Audience :: Science/Research
 Classifier: License :: OSI Approved :: MIT License
 Classifier: Natural Language :: English
 Classifier: Operating System :: OS Independent
-Classifier: Programming Language :: Python :: 3.10
 Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
 Classifier: Programming Language :: Python :: 3.13
+Classifier: Programming Language :: Python :: 3.14
 Classifier: Topic :: Scientific/Engineering :: Information Analysis
 Classifier: Topic :: Software Development :: Libraries :: Python Modules
-Requires-Python: >=3.10
+Requires-Python: >=3.11
 Requires-Dist: click>=8.0.0
 Requires-Dist: typing-extensions>=4.0.0
+Provides-Extra: dataframe
+Requires-Dist: pandas-stubs>=2.3.3.260113; extra == 'dataframe'
+Requires-Dist: pandas>=3.0.0; extra == 'dataframe'
+Requires-Dist: polars>=1.37.1; extra == 'dataframe'
+Requires-Dist: pyarrow>=10.0.0; extra == 'dataframe'
 Provides-Extra: dev
 Requires-Dist: cython==3.2.4; extra == 'dev'
 Requires-Dist: hatch==1.16.3; extra == 'dev'
@@ -51,9 +56,9 @@ Requires-Dist: pyright==1.1.408; extra == 'dev'
 Requires-Dist: pytest-benchmark==5.2.3; extra == 'dev'
 Requires-Dist: pytest-cov==7.0.0; extra == 'dev'
 Requires-Dist: pytest==9.0.2; extra == 'dev'
-Requires-Dist: ruff==0.14.13; extra == 'dev'
+Requires-Dist: ruff==0.14.14; extra == 'dev'
 Requires-Dist: tox==4.34.1; extra == 'dev'
-Requires-Dist: ty==0.0.12; extra == 'dev'
+Requires-Dist: ty==0.0.14; extra == 'dev'
 Provides-Extra: docs
 Requires-Dist: mkdocs-gen-files<1,>=0.5; extra == 'docs'
 Requires-Dist: mkdocs-literate-nav<1,>=0.6; extra == 'docs'
@@ -72,7 +77,7 @@ Description-Content-Type: text/markdown
 [![PyPI Downloads](https://img.shields.io/pypi/dm/gsppy.svg?style=flat-square)](https://pypi.org/project/gsppy/)
 [![PyPI version](https://badge.fury.io/py/gsppy.svg)](https://pypi.org/project/gsppy)
-![](https://img.shields.io/badge/python-3.10+-blue.svg)
+![](https://img.shields.io/badge/python-3.11+-blue.svg)
 [![OpenSSF Scorecard](https://api.securityscorecards.dev/projects/github.com/jacksonpradolima/gsp-py/badge)](https://securityscorecards.dev/viewer/?uri=github.com/jacksonpradolima/gsp-py)
 [![SLSA provenance](https://github.com/jacksonpradolima/gsp-py/actions/workflows/slsa-provenance.yml/badge.svg)](https://github.com/jacksonpradolima/gsp-py/actions/workflows/slsa-provenance.yml)
@@ -90,7 +95,7 @@ Description-Content-Type: text/markdown
 Sequence Pattern (GSP)** algorithm. Ideal for market basket analysis, temporal mining, and user journey discovery.
 > [!IMPORTANT]
-> GSP-Py is compatible with Python 3.10 and later versions!
+> GSP-Py is compatible with Python 3.11 and later versions!
 ---
@@ -106,6 +111,7 @@ Sequence Pattern (GSP)** algorithm. Ideal for market basket analysis, temporal m
 6. [💡 Usage](#usage)
     - [✅ Example: Analyzing Sales Data](#example-analyzing-sales-data)
     - [📊 Explanation: Support and Results](#explanation-support-and-results)
+    - [📊 DataFrame Input Support](#dataframe-input-support)
     - [⏱️ Temporal Constraints](#temporal-constraints)
 7. [⌨️ Typing](#typing)
 8. [🌟 Planned Features](#planned-features)
@@ -357,6 +363,34 @@ Your input file should be either:
   Bread,Milk,Diaper,Coke
   ```
+- **SPM/GSP Format**: Uses delimiters to separate elements and sequences. This format is commonly used in sequential pattern mining datasets.
+  - `-1`: Marks the end of an element (itemset)
+  - `-2`: Marks the end of a sequence (transaction)
+  Example:
+  ```text
+  1 2 -1 3 -1 -2
+  4 -1 5 6 -1 -2
+  1 -1 2 3 -1 -2
+  ```
+  The above represents:
+  - Transaction 1: `[[1, 2], [3]]` → flattened to `[1, 2, 3]`
+  - Transaction 2: `[[4], [5, 6]]` → flattened to `[4, 5, 6]`
+  - Transaction 3: `[[1], [2, 3]]` → flattened to `[1, 2, 3]`
+  String tokens are also supported:
+  ```text
+  A B -1 C -1 -2
+  D -1 E F -1 -2
+  ```
+- **Parquet/Arrow Files**: Modern columnar data formats (requires 'gsppy[dataframe]')
+  ```bash
+  pip install 'gsppy[dataframe]'
+  ```
+  This installs optional dependencies: `polars`, `pandas`, and `pyarrow` for DataFrame support.
 ### Running the CLI
 Use the following command to run GSPPy on your data:
@@ -371,9 +405,16 @@ Or for CSV files:
 gsppy --file path/to/transactions.csv --min_support 0.3 --backend rust
 ```
+For SPM/GSP format files, use the `--format spm` option:
+```bash
+gsppy --file path/to/data.txt --format spm --min_support 0.3
+```
 #### CLI Options
-- `--file`: Path to your input file (JSON or CSV). **Required**.
+- `--file`: Path to your input file (JSON, CSV, or SPM format). **Required**.
+- `--format`: File format to use. Options: `auto` (default, auto-detect from extension), `json`, `csv`, `spm`, `parquet`, `arrow`.
 - `--min_support`: Minimum support threshold as a fraction (e.g., `0.3` for 30%). Default is `0.2`.
 - `--backend`: Backend to use for support counting. One of `auto` (default), `python`, `rust`, or `gpu`.
 - `--verbose`: Enable detailed logging with timestamps, log levels, and process IDs for debugging and traceability.
@@ -518,6 +559,159 @@ Verbose mode provides:
 For complete documentation on logging, see [docs/logging.md](docs/logging.md).
+### Using Sequence Objects for Rich Pattern Representation
+GSP-Py 4.0+ introduces a **Sequence abstraction class** that provides a richer, more maintainable way to work with sequential patterns. The Sequence class encapsulates pattern items, support counts, and optional metadata in an immutable, hashable object.
+#### Traditional Dict-based Output (Default)
+```python
+from gsppy import GSP
+transactions = [
+    ['Bread', 'Milk'],
+    ['Bread', 'Diaper', 'Beer', 'Eggs'],
+    ['Milk', 'Diaper', 'Beer', 'Coke']
+]
+gsp = GSP(transactions)
+result = gsp.search(min_support=0.3)
+# Returns: [{('Bread',): 4, ('Milk',): 4, ...}, {('Bread', 'Milk'): 3, ...}, ...]
+for level_patterns in result:
+    for pattern, support in level_patterns.items():
+        print(f"Pattern: {pattern}, Support: {support}")
+```
+#### Sequence Objects (New Feature)
+```python
+from gsppy import GSP
+transactions = [
+    ['Bread', 'Milk'],
+    ['Bread', 'Diaper', 'Beer', 'Eggs'],
+    ['Milk', 'Diaper', 'Beer', 'Coke']
+]
+gsp = GSP(transactions)
+result = gsp.search(min_support=0.3, return_sequences=True)
+# Returns: [[Sequence(('Bread',), support=4), ...], [Sequence(('Bread', 'Milk'), support=3), ...], ...]
+for level_patterns in result:
+    for seq in level_patterns:
+        print(f"Pattern: {seq.items}, Support: {seq.support}, Length: {seq.length}")
+        # Access sequence properties
+        print(f"  First item: {seq.first_item}, Last item: {seq.last_item}")
+        # Check if item is in sequence
+        if "Milk" in seq:
+            print(f"  Contains Milk!")
+```
+#### Key Benefits of Sequence Objects
+1. **Rich API**: Access pattern properties like `length`, `first_item`, `last_item`
+2. **Type Safety**: IDE autocomplete and better type hints
+3. **Immutable & Hashable**: Can be used as dictionary keys
+4. **Extensible**: Add metadata for confidence, lift, or custom properties
+5. **Backward Compatible**: Convert to/from dict format as needed
+```python
+from gsppy import Sequence, sequences_to_dict, dict_to_sequences
+# Create custom sequences
+seq = Sequence.from_tuple(("A", "B", "C"), support=5)
+# Extend sequences
+extended = seq.extend("D")  # Creates Sequence(("A", "B", "C", "D"))
+# Add metadata
+seq_with_meta = seq.with_metadata(confidence=0.85, lift=1.5)
+# Convert between formats for compatibility
+seq_result = gsp.search(min_support=0.3, return_sequences=True)
+dict_format = sequences_to_dict(seq_result[0])  # Convert to dict
+```
+For a complete example, see [examples/sequence_example.py](examples/sequence_example.py).
+### Loading SPM/GSP Format Files
+GSP-Py supports loading datasets in the classical SPM/GSP delimiter format, which is widely used in sequential pattern mining research. This format uses:
+- `-1` to mark the end of an element (itemset)
+- `-2` to mark the end of a sequence (transaction)
+#### Using the SPM Loader
+```python
+from gsppy.utils import read_transactions_from_spm
+from gsppy import GSP
+# Load SPM format file
+transactions = read_transactions_from_spm('data.txt')
+# Run GSP algorithm
+gsp = GSP(transactions)
+result = gsp.search(min_support=0.3)
+```
+#### SPM Format Examples
+**Simple sequence file (`data.txt`):**
+```text
+1 2 -1 3 -1 -2
+4 -1 5 6 -1 -2
+1 -1 2 3 -1 -2
+```
+This represents:
+- Transaction 1: Items [1, 2] followed by item [3] → flattened to [1, 2, 3]
+- Transaction 2: Item [4] followed by items [5, 6] → flattened to [4, 5, 6]
+- Transaction 3: Item [1] followed by items [2, 3] → flattened to [1, 2, 3]
+**String tokens are also supported:**
+```text
+A B -1 C -1 -2
+D -1 E F -1 -2
+```
+#### Token Mapping
+For workflows requiring conversion between string tokens and integer IDs, use the `TokenMapper`:
+```python
+from gsppy.utils import read_transactions_from_spm
+from gsppy import TokenMapper
+# Load with mappings
+transactions, str_to_int, int_to_str = read_transactions_from_spm(
+    'data.txt',
+    return_mappings=True
+)
+print("String to Int:", str_to_int)
+# Output: {'1': 0, '2': 1, '3': 2, '4': 3, '5': 4, '6': 5}
+print("Int to String:", int_to_str)
+# Output: {0: '1', 1: '2', 2: '3', 3: '4', 4: '5', 5: '6'}
+# Use the TokenMapper class directly
+mapper = TokenMapper()
+id_a = mapper.add_token("A")
+id_b = mapper.add_token("B")
+print(f"A -> {id_a}, B -> {id_b}")
+# Output: A -> 0, B -> 1
+```
+#### Edge Cases Handled
+The SPM loader gracefully handles:
+- Empty lines (skipped)
+- Missing `-2` delimiter at end of line
+- Extra or consecutive delimiters
+- Mixed-length elements in sequences
+- Both integer and string tokens
 ### Output
 The algorithm will return a list of patterns with their corresponding support.
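
The hunk above describes the SPM/GSP delimiter format (`-1` closes an element, `-2` closes a sequence) and lists the edge cases the loader tolerates. As a rough illustration of that format only, here is a minimal standalone parser sketch. It is not the implementation of `read_transactions_from_spm`; the names `parse_spm_line`, `flatten`, and `parse_spm_text` are hypothetical, and keeping tokens as strings is an assumption made for simplicity.

```python
from typing import List


def parse_spm_line(line: str) -> List[List[str]]:
    """Split one SPM-format line into elements (itemsets) of string tokens."""
    elements: List[List[str]] = []
    current: List[str] = []
    for token in line.split():
        if token == "-2":
            # End of sequence: stop reading this line.
            break
        if token == "-1":
            # End of element: close the current itemset, skipping empty
            # ones produced by extra or consecutive delimiters.
            if current:
                elements.append(current)
                current = []
        else:
            current.append(token)
    if current:
        # Tolerate a missing trailing -1 / -2.
        elements.append(current)
    return elements


def flatten(elements: List[List[str]]) -> List[str]:
    """Flatten the elements of one sequence into a single item list."""
    return [item for element in elements for item in element]


def parse_spm_text(text: str) -> List[List[str]]:
    """Parse multi-line SPM text into flattened sequences, skipping empty lines."""
    return [
        flatten(parse_spm_line(line))
        for line in text.splitlines()
        if line.strip()
    ]


sample = """1 2 -1 3 -1 -2
4 -1 5 6 -1 -2

1 -1 2 3 -1 -2"""

parsed = parse_spm_text(sample)
print(parsed)
# [['1', '2', '3'], ['4', '5', '6'], ['1', '2', '3']]
```

The sample reuses the three-line example from the README diff, with one empty line added to show that it is skipped.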
@@ -584,6 +778,208 @@ result = gsp.search(min_support=0.5)  # Need at least 2/4 sequences
 ---
+## 📊 DataFrame Input Support
+GSP-Py supports **Polars and Pandas DataFrames** as input, enabling high-performance workflows with modern data formats like Arrow and Parquet. This feature is particularly useful for large-scale data engineering pipelines and integration with existing data processing workflows.
+### Installation
+Install GSP-Py with DataFrame support:
+```bash
+pip install 'gsppy[dataframe]'
+```
+This installs the optional dependencies: `polars`, `pandas`, and `pyarrow`.
+### DataFrame Input Formats
+GSP-Py supports two DataFrame formats:
+#### 1. Grouped Format (Transaction ID + Item Columns)
+Use when your data has separate rows for each item in a transaction:
+```python
+import polars as pl
+from gsppy import GSP
+# Polars DataFrame with transaction_id and item columns
+df = pl.DataFrame({
+    "transaction_id": [1, 1, 2, 2, 2, 3, 3],
+    "item": ["Bread", "Milk", "Bread", "Diaper", "Beer", "Milk", "Coke"],
+})
+# Run GSP directly on the DataFrame
+gsp = GSP(df, transaction_col="transaction_id", item_col="item")
+patterns = gsp.search(min_support=0.3)
+for level, freq_patterns in enumerate(patterns, start=1):
+    print(f"\n{level}-Sequence Patterns:")
+    for pattern, support in freq_patterns.items():
+        print(f"  {pattern}: {support}")
+```
+#### 2. Sequence Format (List Column)
+Use when each row contains a complete transaction as a list:
+```python
+import pandas as pd
+from gsppy import GSP
+# Pandas DataFrame with sequences as lists
+df = pd.DataFrame({
+    "transaction": [
+        ["Bread", "Milk"],
+        ["Bread", "Diaper", "Beer"],
+        ["Milk", "Coke"],
+    ]
+})
+gsp = GSP(df, sequence_col="transaction")
+patterns = gsp.search(min_support=0.3)
+```
+### DataFrame with Timestamps
+DataFrames support temporal constraints for time-aware pattern mining:
+```python
+import polars as pl
+from gsppy import GSP
+# Grouped format with timestamps
+df = pl.DataFrame({
+    "transaction_id": [1, 1, 1, 2, 2, 2],
+    "item": ["Login", "Browse", "Purchase", "Login", "Browse", "Purchase"],
+    "timestamp": [0, 2, 5, 0, 1, 15],  # Time in seconds
+})
+# Find patterns where consecutive events occur within 10 seconds
+gsp = GSP(
+    df,
+    transaction_col="transaction_id",
+    item_col="item",
+    timestamp_col="timestamp",
+    maxgap=10
+)
+patterns = gsp.search(min_support=0.5)
+```
+For sequence format with timestamps:
+```python
+import pandas as pd
+from gsppy import GSP
+df = pd.DataFrame({
+    "sequence": [["A", "B", "C"], ["A", "D"]],
+    "timestamps": [[1, 2, 3], [1, 5]],  # Timestamps per item
+})
+gsp = GSP(df, sequence_col="sequence", timestamp_col="timestamps", maxgap=3)
+patterns = gsp.search(min_support=0.5)
+```
+### Working with Parquet and Arrow Files
+DataFrames enable seamless integration with columnar storage formats:
+```python
+import polars as pl
+from gsppy import GSP
+# Read directly from Parquet
+df = pl.read_parquet("transactions.parquet")
+# Run GSP with automatic schema detection
+gsp = GSP(df, transaction_col="txn_id", item_col="product")
+patterns = gsp.search(min_support=0.2)
+# Or use Pandas with Arrow backend
+import pandas as pd
+df_pandas = pd.read_parquet("transactions.parquet", engine="pyarrow")
+gsp = GSP(df_pandas, transaction_col="txn_id", item_col="product")
+patterns = gsp.search(min_support=0.2)
+```
+### Performance Considerations
+DataFrames offer performance benefits for large datasets:
+- **Polars**: Leverages Arrow for zero-copy operations and parallel processing
+- **Pandas**: Compatible with Arrow backend for efficient memory usage
+- **Parquet/Arrow**: Columnar storage enables efficient filtering and reading
+- **Schema validation**: Errors are caught early with clear messages
+### DataFrame Schema Requirements
+**Grouped Format:**
+- `transaction_col`: Column containing transaction/sequence IDs (any type)
+- `item_col`: Column containing items (any type, converted to strings)
+- `timestamp_col` (optional): Column containing timestamps (numeric)
+**Sequence Format:**
+- `sequence_col`: Column containing lists of items
+- `timestamp_col` (optional): Column containing lists of timestamps (must match sequence lengths)
+### Error Handling
+GSP-Py provides clear error messages for schema issues:
+```python
+import polars as pl
+from gsppy import GSP
+df = pl.DataFrame({
+    "txn_id": [1, 2],
+    "product": ["A", "B"],
+})
+# ❌ Missing required column
+try:
+    gsp = GSP(df, transaction_col="txn_id", item_col="item")  # 'item' doesn't exist
+except ValueError as e:
+    print(f"Error: {e}")  # "Column 'item' not found in DataFrame"
+# ❌ Invalid format specification
+try:
+    gsp = GSP(df)  # Must specify either sequence_col or both transaction_col and item_col
+except ValueError as e:
+    print(f"Error: {e}")  # "Must specify either 'sequence_col' or both 'transaction_col' and 'item_col'"
+```
+### Backward Compatibility
+Traditional list-based input continues to work:
+```python
+from gsppy import GSP
+# Lists still work as before
+transactions = [["A", "B"], ["A", "C"], ["B", "C"]]
+gsp = GSP(transactions)
+patterns = gsp.search(min_support=0.5)
+```
+DataFrame parameters cannot be mixed with list input:
+```python
+transactions = [["A", "B"], ["C", "D"]]
+# ❌ This raises an error
+gsp = GSP(transactions, transaction_col="txn")  # ValueError: DataFrame parameters cannot be used with list input
+```
+### Examples and Tests
+For complete examples and edge cases, see:
+- [`tests/test_dataframe.py`](tests/test_dataframe.py) - Comprehensive test suite
+- DataFrame adapter documentation in [`gsppy/dataframe_adapters.py`](gsppy/dataframe_adapters.py)
+---
 ## ⏱️ Temporal Constraints
 GSP-Py supports **time-constrained sequential pattern mining** with three powerful temporal constraints: `mingap`, `maxgap`, and `maxspan`. These constraints enable domain-specific applications such as medical event mining, retail analytics, and temporal user journey discovery.
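
The DataFrame hunk above distinguishes a grouped format (one row per item, keyed by a transaction ID) from a sequence format (one list per row). The relationship between the two can be sketched in plain Python without any DataFrame library. This is only an illustration of the described schema, not the code in `gsppy/dataframe_adapters.py`; the function name `group_rows` is hypothetical, and ordering rows by timestamp within a transaction is an assumption.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List, Optional


def group_rows(
    rows: List[Dict[str, Any]],
    transaction_key: str,
    item_key: str,
    timestamp_key: Optional[str] = None,
) -> List[List[str]]:
    """Turn grouped rows (one item per row) into one item list per transaction."""
    grouped: Dict[Hashable, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        grouped[row[transaction_key]].append(row)

    sequences: List[List[str]] = []
    # Dicts preserve insertion order, so transactions keep first-seen order.
    for txn_rows in grouped.values():
        if timestamp_key is not None:
            # Assumption: items are ordered by timestamp within a transaction.
            txn_rows = sorted(txn_rows, key=lambda r: r[timestamp_key])
        # The README states that items are converted to strings.
        sequences.append([str(r[item_key]) for r in txn_rows])
    return sequences


# Same data as the grouped-format Polars example in the diff.
transaction_ids = [1, 1, 2, 2, 2, 3, 3]
items = ["Bread", "Milk", "Bread", "Diaper", "Beer", "Milk", "Coke"]
rows = [
    {"transaction_id": t, "item": i} for t, i in zip(transaction_ids, items)
]

sequences = group_rows(rows, "transaction_id", "item")
print(sequences)
# [['Bread', 'Milk'], ['Bread', 'Diaper', 'Beer'], ['Milk', 'Coke']]
```

The result matches the list column used in the sequence-format Pandas example, which is why both input shapes can describe the same transactions.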
@@ -591,7 +987,7 @@ GSP-Py supports **time-constrained sequential pattern mining** with three powerf
 ### Temporal Constraint Parameters
 - **`mingap`**: Minimum time gap required between consecutive items in a pattern
-- **`maxgap`**: Maximum time gap allowed between consecutive items in a pattern
+- **`maxgap`**: Maximum time gap allowed between consecutive items in a pattern
 - **`maxspan`**: Maximum time span from the first to the last item in a pattern
 ### Using Temporal Constraints
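
The three parameter definitions in this hunk (`mingap`, `maxgap`, `maxspan`) can be read as simple checks over the timestamps of the items matched by a pattern. The sketch below spells those checks out. It is an illustration of the definitions, not GSP-Py's internal matching logic; the function name `satisfies_constraints` is hypothetical, and treating the bounds as inclusive is an assumption.

```python
from typing import Optional, Sequence


def satisfies_constraints(
    timestamps: Sequence[float],
    mingap: Optional[float] = None,
    maxgap: Optional[float] = None,
    maxspan: Optional[float] = None,
) -> bool:
    """Check the timestamps of matched items against the temporal constraints."""
    # Gaps between consecutive matched items.
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]

    if mingap is not None and any(gap < mingap for gap in gaps):
        return False  # some consecutive items are too close together
    if maxgap is not None and any(gap > maxgap for gap in gaps):
        return False  # some consecutive items are too far apart
    if maxspan is not None and timestamps:
        if timestamps[-1] - timestamps[0] > maxspan:
            return False  # first-to-last span is too long
    return True


# Timestamps from the "DataFrame with Timestamps" example in the diff.
print(satisfies_constraints([0, 2, 5], maxgap=10))   # True
print(satisfies_constraints([0, 1, 15], maxgap=10))  # False: gap of 14 > 10
print(satisfies_constraints([0, 2, 5], maxspan=4))   # False: span of 5 > 4
```

With `maxgap=10`, only the first transaction's Login, Browse, Purchase timestamps pass, because the second transaction has a gap of 14 between Browse and Purchase.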
