PyPI - gsppy - Versions diffs - 4.0.0__tar.gz → 4.1.0__tar.gz - Mend

gsppy 4.0.0__tar.gz → 4.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (34)

{gsppy-4.0.0 → gsppy-4.1.0}/CHANGELOG.md +57 -0
{gsppy-4.0.0 → gsppy-4.1.0}/PKG-INFO +77 -1
{gsppy-4.0.0 → gsppy-4.1.0}/README.md +76 -0
{gsppy-4.0.0 → gsppy-4.1.0}/gsppy/__init__.py +10 -0
{gsppy-4.0.0 → gsppy-4.1.0}/gsppy/cli.py +2 -2
{gsppy-4.0.0 → gsppy-4.1.0}/gsppy/gsp.py +71 -7
gsppy-4.1.0/gsppy/sequence.py +371 -0
{gsppy-4.0.0 → gsppy-4.1.0}/pyproject.toml +1 -1
gsppy-4.1.0/tests/test_gsp_sequence_integration.py +345 -0
gsppy-4.1.0/tests/test_sequence.py +466 -0
{gsppy-4.0.0 → gsppy-4.1.0}/.gitignore +0 -0
{gsppy-4.0.0 → gsppy-4.1.0}/CONTRIBUTING.md +0 -0
{gsppy-4.0.0 → gsppy-4.1.0}/LICENSE +0 -0
{gsppy-4.0.0 → gsppy-4.1.0}/SECURITY.md +0 -0
{gsppy-4.0.0 → gsppy-4.1.0}/gsppy/accelerate.py +0 -0
{gsppy-4.0.0 → gsppy-4.1.0}/gsppy/dataframe_adapters.py +0 -0
{gsppy-4.0.0 → gsppy-4.1.0}/gsppy/enums.py +0 -0
{gsppy-4.0.0 → gsppy-4.1.0}/gsppy/pruning.py +0 -0
{gsppy-4.0.0 → gsppy-4.1.0}/gsppy/py.typed +0 -0
{gsppy-4.0.0 → gsppy-4.1.0}/gsppy/token_mapper.py +0 -0
{gsppy-4.0.0 → gsppy-4.1.0}/gsppy/utils.py +0 -0
{gsppy-4.0.0 → gsppy-4.1.0}/rust/Cargo.lock +0 -0
{gsppy-4.0.0 → gsppy-4.1.0}/rust/Cargo.toml +0 -0
{gsppy-4.0.0 → gsppy-4.1.0}/rust/src/lib.rs +0 -0
{gsppy-4.0.0 → gsppy-4.1.0}/tests/__init__.py +0 -0
{gsppy-4.0.0 → gsppy-4.1.0}/tests/test_cli.py +0 -0
{gsppy-4.0.0 → gsppy-4.1.0}/tests/test_dataframe.py +0 -0
{gsppy-4.0.0 → gsppy-4.1.0}/tests/test_gsp.py +0 -0
{gsppy-4.0.0 → gsppy-4.1.0}/tests/test_gsp_fuzzing.py +0 -0
{gsppy-4.0.0 → gsppy-4.1.0}/tests/test_pruning.py +0 -0
{gsppy-4.0.0 → gsppy-4.1.0}/tests/test_spm_format.py +0 -0
{gsppy-4.0.0 → gsppy-4.1.0}/tests/test_temporal_constraints.py +0 -0
{gsppy-4.0.0 → gsppy-4.1.0}/tests/test_utils.py +0 -0
{gsppy-4.0.0 → gsppy-4.1.0}/tox.ini +0 -0

{gsppy-4.0.0 → gsppy-4.1.0}/CHANGELOG.md RENAMED

@@ -1,6 +1,63 @@
 # CHANGELOG
+## v4.1.0 (2026-02-01)
+### Bug Fixes
+- Address code review feedback - add type annotations and remove unused variables
+  ([`bf62d14`](https://github.com/jacksonpradolima/gsp-py/commit/bf62d144d8f1be1e7716291d41af955450612c81))
+Co-authored-by: jacksonpradolima <7774063+jacksonpradolima@users.noreply.github.com>
+### Chores
+- Update uv.lock for version 4.0.0
+  ([`f1ae2af`](https://github.com/jacksonpradolima/gsp-py/commit/f1ae2af2aa71ea44b9d8625ed647da79259ec096))
+### Documentation
+- Add Sequence documentation and examples to README
+  ([`62d0d02`](https://github.com/jacksonpradolima/gsp-py/commit/62d0d02c19c5751331df53e680cc0b9aee19677b))
+Co-authored-by: jacksonpradolima <7774063+jacksonpradolima@users.noreply.github.com>
+- Update docs/ with Sequence abstraction documentation
+  ([`2368cf3`](https://github.com/jacksonpradolima/gsp-py/commit/2368cf30239139e8e2af5457ee6acf14db30ef06))
+Co-authored-by: jacksonpradolima <7774063+jacksonpradolima@users.noreply.github.com>
+### Features
+- Add Sequence abstraction class with comprehensive tests
+  ([`6011bdb`](https://github.com/jacksonpradolima/gsp-py/commit/6011bdb7104755d109b58261b36e1dd1c36b2d61))
+Co-authored-by: jacksonpradolima <7774063+jacksonpradolima@users.noreply.github.com>
+- Integrate Sequence objects with GSP.search() via return_sequences parameter
+  ([`7476588`](https://github.com/jacksonpradolima/gsp-py/commit/7476588f2b277276748e0550366014f2a93d8ef5))
+Co-authored-by: jacksonpradolima <7774063+jacksonpradolima@users.noreply.github.com>
+- Introduce Sequence abstraction for typed pattern representation
+  ([`01ca37b`](https://github.com/jacksonpradolima/gsp-py/commit/01ca37b9bc4572eb7b1c1eaf6fdf26ca2324a3c5))
+### Refactoring
+- Address code review feedback - remove redundant checks
+  ([`621e940`](https://github.com/jacksonpradolima/gsp-py/commit/621e9403379ae0fd07bf45b97616b9979f2d4aa6))
+Co-authored-by: jacksonpradolima <7774063+jacksonpradolima@users.noreply.github.com>
+- Reduce cognitive complexity in sequence_example.py and fix f-string
+  ([`63ac4f9`](https://github.com/jacksonpradolima/gsp-py/commit/63ac4f9ceb869a5228cdccdcf6a9d0b9f46f0350))
+Co-authored-by: jacksonpradolima <7774063+jacksonpradolima@users.noreply.github.com>
+- Update type annotations and improve search method in GSP class
+  ([`e2e9a3f`](https://github.com/jacksonpradolima/gsp-py/commit/e2e9a3f473d1e0c5d6990c8b7c5837a251761032))
 ## v4.0.0 (2026-02-01)
 ### Chores

{gsppy-4.0.0 → gsppy-4.1.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: gsppy
-Version: 4.0.0
+Version: 4.1.0
 Summary: GSP (Generalized Sequence Pattern) algorithm in Python
 Project-URL: Homepage, https://github.com/jacksonpradolima/gsp-py
 Author-email: Jackson Antonio do Prado Lima <jacksonpradolima@gmail.com>
@@ -559,6 +559,82 @@ Verbose mode provides:
 For complete documentation on logging, see [docs/logging.md](docs/logging.md).
+### Using Sequence Objects for Rich Pattern Representation
+GSP-Py 4.0+ introduces a **Sequence abstraction class** that provides a richer, more maintainable way to work with sequential patterns. The Sequence class encapsulates pattern items, support counts, and optional metadata in an immutable, hashable object.
+#### Traditional Dict-based Output (Default)
+```python
+from gsppy import GSP
+transactions = [
+    ['Bread', 'Milk'],
+    ['Bread', 'Diaper', 'Beer', 'Eggs'],
+    ['Milk', 'Diaper', 'Beer', 'Coke']
+]
+gsp = GSP(transactions)
+result = gsp.search(min_support=0.3)
+# Returns: [{('Bread',): 4, ('Milk',): 4, ...}, {('Bread', 'Milk'): 3, ...}, ...]
+for level_patterns in result:
+    for pattern, support in level_patterns.items():
+        print(f"Pattern: {pattern}, Support: {support}")
+```
+#### Sequence Objects (New Feature)
+```python
+from gsppy import GSP
+transactions = [
+    ['Bread', 'Milk'],
+    ['Bread', 'Diaper', 'Beer', 'Eggs'],
+    ['Milk', 'Diaper', 'Beer', 'Coke']
+]
+gsp = GSP(transactions)
+result = gsp.search(min_support=0.3, return_sequences=True)
+# Returns: [[Sequence(('Bread',), support=4), ...], [Sequence(('Bread', 'Milk'), support=3), ...], ...]
+for level_patterns in result:
+    for seq in level_patterns:
+        print(f"Pattern: {seq.items}, Support: {seq.support}, Length: {seq.length}")
+        # Access sequence properties
+        print(f"  First item: {seq.first_item}, Last item: {seq.last_item}")
+        # Check if item is in sequence
+        if "Milk" in seq:
+            print(f"  Contains Milk!")
+```
+#### Key Benefits of Sequence Objects
+1. **Rich API**: Access pattern properties like `length`, `first_item`, `last_item`
+2. **Type Safety**: IDE autocomplete and better type hints
+3. **Immutable & Hashable**: Can be used as dictionary keys
+4. **Extensible**: Add metadata for confidence, lift, or custom properties
+5. **Backward Compatible**: Convert to/from dict format as needed
+```python
+from gsppy import Sequence, sequences_to_dict, dict_to_sequences
+# Create custom sequences
+seq = Sequence.from_tuple(("A", "B", "C"), support=5)
+# Extend sequences
+extended = seq.extend("D")  # Creates Sequence(("A", "B", "C", "D"))
+# Add metadata
+seq_with_meta = seq.with_metadata(confidence=0.85, lift=1.5)
+# Convert between formats for compatibility
+seq_result = gsp.search(min_support=0.3, return_sequences=True)
+dict_format = sequences_to_dict(seq_result[0])  # Convert to dict
+```
+For a complete example, see [examples/sequence_example.py](examples/sequence_example.py).
 ### Loading SPM/GSP Format Files
 GSP-Py supports loading datasets in the classical SPM/GSP delimiter format, which is widely used in sequential pattern mining research. This format uses:

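The README text in this hunk describes the `Sequence` class added in v4.1.0 as an immutable, hashable object that bundles pattern items, a support count and optional metadata. Because `gsppy/sequence.py` (+371 lines) is listed above but its contents are not shown in this diff, the following is only a minimal sketch of how such a class could be modelled with a frozen dataclass. The class name `SequenceSketch`, the `_meta` field, the `metadata` property and the choice to reset support in `extend()` are assumptions for illustration; only the public names `items`, `support`, `length`, `first_item`, `last_item`, `from_tuple`, `extend`, `with_metadata` and the `in` check are taken from the README examples.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class SequenceSketch:
    """Hypothetical stand-in for gsppy.Sequence; not the library's implementation."""

    items: Tuple[str, ...]
    support: int = 0
    # Metadata kept as sorted (key, value) pairs so the frozen object stays hashable.
    _meta: Tuple[Tuple[str, Any], ...] = field(default=(), repr=False)

    @classmethod
    def from_tuple(cls, items: Tuple[str, ...], support: int = 0) -> "SequenceSketch":
        return cls(tuple(items), support)

    @property
    def length(self) -> int:
        return len(self.items)

    @property
    def first_item(self) -> str:
        return self.items[0]

    @property
    def last_item(self) -> str:
        return self.items[-1]

    @property
    def metadata(self) -> Dict[str, Any]:
        # Assumed accessor: returns a fresh dict, so callers cannot mutate the object.
        return dict(self._meta)

    def __contains__(self, item: object) -> bool:
        return item in self.items

    def extend(self, item: str) -> "SequenceSketch":
        # New object; support is reset (assumption) because it must be recounted.
        return SequenceSketch(self.items + (item,))

    def with_metadata(self, **kwargs: Any) -> "SequenceSketch":
        # New object with merged metadata; the original is left unchanged.
        merged = {**self.metadata, **kwargs}
        return replace(self, _meta=tuple(sorted(merged.items())))


seq = SequenceSketch.from_tuple(("A", "B", "C"), support=5)
print(seq.extend("D").items)                                  # ('A', 'B', 'C', 'D')
print(seq.with_metadata(confidence=0.85, lift=1.5).metadata)  # {'confidence': 0.85, 'lift': 1.5}
print("B" in seq, seq.length, seq.first_item, seq.last_item)  # True 3 A C
```

Storing metadata as sorted key/value pairs rather than a `dict` keeps the generated `__hash__` working, at the cost of requiring hashable metadata values.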
{gsppy-4.0.0 → gsppy-4.1.0}/README.md RENAMED

@@ -486,6 +486,82 @@ Verbose mode provides:
 For complete documentation on logging, see [docs/logging.md](docs/logging.md).
+### Using Sequence Objects for Rich Pattern Representation
+GSP-Py 4.0+ introduces a **Sequence abstraction class** that provides a richer, more maintainable way to work with sequential patterns. The Sequence class encapsulates pattern items, support counts, and optional metadata in an immutable, hashable object.
+#### Traditional Dict-based Output (Default)
+```python
+from gsppy import GSP
+transactions = [
+    ['Bread', 'Milk'],
+    ['Bread', 'Diaper', 'Beer', 'Eggs'],
+    ['Milk', 'Diaper', 'Beer', 'Coke']
+]
+gsp = GSP(transactions)
+result = gsp.search(min_support=0.3)
+# Returns: [{('Bread',): 4, ('Milk',): 4, ...}, {('Bread', 'Milk'): 3, ...}, ...]
+for level_patterns in result:
+    for pattern, support in level_patterns.items():
+        print(f"Pattern: {pattern}, Support: {support}")
+```
+#### Sequence Objects (New Feature)
+```python
+from gsppy import GSP
+transactions = [
+    ['Bread', 'Milk'],
+    ['Bread', 'Diaper', 'Beer', 'Eggs'],
+    ['Milk', 'Diaper', 'Beer', 'Coke']
+]
+gsp = GSP(transactions)
+result = gsp.search(min_support=0.3, return_sequences=True)
+# Returns: [[Sequence(('Bread',), support=4), ...], [Sequence(('Bread', 'Milk'), support=3), ...], ...]
+for level_patterns in result:
+    for seq in level_patterns:
+        print(f"Pattern: {seq.items}, Support: {seq.support}, Length: {seq.length}")
+        # Access sequence properties
+        print(f"  First item: {seq.first_item}, Last item: {seq.last_item}")
+        # Check if item is in sequence
+        if "Milk" in seq:
+            print(f"  Contains Milk!")
+```
+#### Key Benefits of Sequence Objects
+1. **Rich API**: Access pattern properties like `length`, `first_item`, `last_item`
+2. **Type Safety**: IDE autocomplete and better type hints
+3. **Immutable & Hashable**: Can be used as dictionary keys
+4. **Extensible**: Add metadata for confidence, lift, or custom properties
+5. **Backward Compatible**: Convert to/from dict format as needed
+```python
+from gsppy import Sequence, sequences_to_dict, dict_to_sequences
+# Create custom sequences
+seq = Sequence.from_tuple(("A", "B", "C"), support=5)
+# Extend sequences
+extended = seq.extend("D")  # Creates Sequence(("A", "B", "C", "D"))
+# Add metadata
+seq_with_meta = seq.with_metadata(confidence=0.85, lift=1.5)
+# Convert between formats for compatibility
+seq_result = gsp.search(min_support=0.3, return_sequences=True)
+dict_format = sequences_to_dict(seq_result[0])  # Convert to dict
+```
+For a complete example, see [examples/sequence_example.py](examples/sequence_example.py).
 ### Loading SPM/GSP Format Files
 GSP-Py supports loading datasets in the classical SPM/GSP delimiter format, which is widely used in sequential pattern mining research. This format uses:

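A note on the numbers in the README example: its `# Returns:` comments quote `('Bread',): 4` and `('Bread', 'Milk'): 3`, but in the three transactions listed no item occurs in more than two of them. The sketch below recomputes supports for exactly those transactions in plain Python, using the same fraction-to-count conversion that appears in the `gsp.py` hunk (`int(math.ceil(len(self.transactions) * float(min_support)))`). The helper names `contains_in_order` and `support` are hypothetical, and treating containment as "same order, gaps allowed" is an assumption about the matching rule; for the patterns checked here the counts would be the same under contiguous matching.

```python
import math
from typing import Iterable, List, Tuple

transactions: List[List[str]] = [
    ["Bread", "Milk"],
    ["Bread", "Diaper", "Beer", "Eggs"],
    ["Milk", "Diaper", "Beer", "Coke"],
]


def contains_in_order(transaction: Iterable[str], pattern: Tuple[str, ...]) -> bool:
    """True if the pattern's items appear in the transaction in order (gaps allowed)."""
    remaining = iter(transaction)
    # `item in remaining` consumes the iterator up to the match, which enforces order.
    return all(item in remaining for item in pattern)


def support(pattern: Tuple[str, ...]) -> int:
    """Number of transactions containing the pattern (each transaction counts once)."""
    return sum(contains_in_order(t, pattern) for t in transactions)


min_support = 0.3
# Same conversion as GSP.search(): fraction of transactions -> absolute count, rounded up.
abs_min_support = int(math.ceil(len(transactions) * float(min_support)))

print(abs_min_support)             # 1
print(support(("Bread",)))         # 2
print(support(("Milk",)))          # 2
print(support(("Bread", "Milk")))  # 1
```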
{gsppy-4.0.0 → gsppy-4.1.0}/gsppy/__init__.py RENAMED

@@ -24,6 +24,12 @@ from gsppy.pruning import (
     FrequencyBasedPruning,
     create_default_pruning_strategy,
 )
+from gsppy.sequence import (
+    Sequence,
+    sequences_to_dict,
+    dict_to_sequences,
+    to_sequence,
+)
 from gsppy.token_mapper import TokenMapper
 # DataFrame adapters are optional - import only if dependencies are available
@@ -63,6 +69,10 @@ __all__ = [
     "TemporalAwarePruning",
     "CombinedPruning",
     "create_default_pruning_strategy",
+    "Sequence",
+    "sequences_to_dict",
+    "dict_to_sequences",
+    "to_sequence",
     "TokenMapper",
 ]

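The new exports pair `Sequence` with conversion helpers. Judging from their use in this diff (`dict_to_sequences(level_patterns)` in `gsp.py`, `sequences_to_dict(seq_result[0])` in the README), they translate one level of results between the `Dict[Tuple[str, ...], int]` form and a list of `Sequence` objects. The sketch below shows that round trip with a hypothetical `NamedTuple` stand-in; the `Pattern` type and the `_sketch` function names are illustrative only, not gsppy's signatures, and `to_sequence` is not modelled.

```python
from typing import Dict, List, NamedTuple, Tuple


class Pattern(NamedTuple):
    """Hypothetical minimal stand-in for gsppy.Sequence (items + support only)."""

    items: Tuple[str, ...]
    support: int


def dict_to_sequences_sketch(level: Dict[Tuple[str, ...], int]) -> List[Pattern]:
    # One object per (pattern, support) entry, keeping the dict's insertion order.
    return [Pattern(items, count) for items, count in level.items()]


def sequences_to_dict_sketch(patterns: List[Pattern]) -> Dict[Tuple[str, ...], int]:
    # Inverse mapping: pattern tuple -> support count.
    return {p.items: p.support for p in patterns}


level_1 = {("Bread",): 2, ("Milk",): 2}
objects = dict_to_sequences_sketch(level_1)
print(objects[0])                                    # Pattern(items=('Bread',), support=2)
print(sequences_to_dict_sketch(objects) == level_1)  # True
```

A dict keyed by pattern tuples has no slot for metadata, so any metadata attached to a `Sequence` would presumably be lost when converting to the dict form.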
{gsppy-4.0.0 → gsppy-4.1.0}/gsppy/cli.py RENAMED

@@ -35,7 +35,7 @@ import csv
 import sys
 import json
 import logging
-from typing import Any, Dict, List, Tuple, Union, Optional, cast
+from typing import Any, List, Tuple, Union, Optional, cast
 import click
@@ -608,7 +608,7 @@ def main(
     # Initialize and run GSP algorithm
     try:
         gsp = GSP(transactions, mingap=mingap, maxgap=maxgap, maxspan=maxspan, verbose=verbose)
-        patterns: List[Dict[Tuple[str, ...], int]] = gsp.search(min_support=min_support)
+        patterns = gsp.search(min_support=min_support, return_sequences=False)
         logger.info("Frequent Patterns Found:")
         for i, level in enumerate(patterns, start=1):
             logger.info(f"\n{i}-Sequence Patterns:")

{gsppy-4.0.0 → gsppy-4.1.0}/gsppy/gsp.py RENAMED

@@ -90,7 +90,7 @@ from __future__ import annotations
 import math
 import logging
 import multiprocessing as mp
-from typing import TYPE_CHECKING, Dict, List, Tuple, Union, Optional, cast
+from typing import TYPE_CHECKING, Dict, List, Tuple, Union, Literal, Optional, cast, overload
 from itertools import chain
 from collections import Counter
@@ -102,6 +102,7 @@ from gsppy.utils import (
     is_subsequence_in_list_with_time_constraints,
 )
 from gsppy.pruning import PruningStrategy, create_default_pruning_strategy
+from gsppy.sequence import Sequence, dict_to_sequences
 from gsppy.accelerate import support_counts as support_counts_accel
 if TYPE_CHECKING:
@@ -590,13 +591,37 @@ class GSP:
         """
         logger.info("Run %d: %d candidates filtered to %d.", run, len(candidates), len(self.freq_patterns[run - 1]))
+    @overload
     def search(
         self,
         min_support: float = 0.2,
         max_k: Optional[int] = None,
         backend: Optional[str] = None,
         verbose: Optional[bool] = None,
-    ) -> List[Dict[Tuple[str, ...], int]]:
+        *,
+        return_sequences: Literal[False] = False,
+    ) -> List[Dict[Tuple[str, ...], int]]: ...
+    @overload
+    def search(
+        self,
+        min_support: float = 0.2,
+        max_k: Optional[int] = None,
+        backend: Optional[str] = None,
+        verbose: Optional[bool] = None,
+        *,
+        return_sequences: Literal[True],
+    ) -> List[List[Sequence]]: ...
+    def search(
+        self,
+        min_support: float = 0.2,
+        max_k: Optional[int] = None,
+        backend: Optional[str] = None,
+        verbose: Optional[bool] = None,
+        *,
+        return_sequences: bool = False,
+    ) -> Union[List[Dict[Tuple[str, ...], int]], List[List[Sequence]]]:
         """
         Execute the Generalized Sequential Pattern (GSP) mining algorithm.
@@ -617,11 +642,20 @@ class GSP:
                                     Note: temporal constraints always use Python backend.
             verbose (Optional[bool]): Override instance verbosity setting for this search.
                                      If None, uses the instance's verbose setting.
+            return_sequences (bool): If True, returns patterns as Sequence objects instead of
+                                    Dict[Tuple[str, ...], int]. Defaults to False for backward
+                                    compatibility. When True, returns List[List[Sequence]] where
+                                    each Sequence contains items, support count, and can be extended
+                                    with additional metadata.
         Returns:
-            List[Dict[Tuple[str, ...], int]]: A list of dictionaries containing frequent patterns
-                                              at each k-sequence level, with patterns as keys
-                                              and their support counts as values.
+            Union[List[Dict[Tuple[str, ...], int]], List[List[Sequence]]]:
+                If return_sequences is False (default):
+                    A list of dictionaries containing frequent patterns at each k-sequence level,
+                    with patterns as keys and their support counts as values.
+                If return_sequences is True:
+                    A list of lists containing Sequence objects at each k-sequence level,
+                    where each Sequence encapsulates the pattern items and support count.
         Raises:
             ValueError: If the minimum support threshold is not in the range `(0.0, 1.0]`.
@@ -632,7 +666,7 @@ class GSP:
             - Status updates for each iteration until the algorithm terminates.
         Examples:
-            Basic usage without temporal constraints:
+            Basic usage without temporal constraints (default tuple-based):
             ```python
             from gsppy.gsp import GSP
@@ -645,6 +679,28 @@ class GSP:
             gsp = GSP(transactions)
             patterns = gsp.search(min_support=0.3)
+            # Returns: [{('Bread',): 4, ('Milk',): 4, ...}, {('Bread', 'Milk'): 3, ...}, ...]
+            ```
+            Using Sequence objects for richer pattern representation:
+            ```python
+            from gsppy.gsp import GSP
+            transactions = [
+                ["Bread", "Milk"],
+                ["Bread", "Diaper", "Beer", "Eggs"],
+                ["Milk", "Diaper", "Beer", "Coke"],
+            ]
+            gsp = GSP(transactions)
+            patterns = gsp.search(min_support=0.3, return_sequences=True)
+            # Returns: [[Sequence(('Bread',), support=4), Sequence(('Milk',), support=4), ...], ...]
+            # Access pattern details
+            for level_patterns in patterns:
+                for seq in level_patterns:
+                    print(f"Pattern: {seq.items}, Support: {seq.support}")
             ```
             Usage with temporal constraints (requires timestamped transactions):
@@ -682,6 +738,9 @@ class GSP:
                 f"Using temporal constraints: mingap={self.mingap}, maxgap={self.maxgap}, maxspan={self.maxspan}"
             )
+        # Clear freq_patterns for this search (allow reusing the GSP instance)
+        self.freq_patterns = []
         # Convert fractional support to absolute count (ceil to preserve threshold semantics)
         abs_min_support = int(math.ceil(len(self.transactions) * float(min_support)))
@@ -729,4 +788,9 @@ class GSP:
             self.verbose = original_verbose
             self._configure_logging()
-        return self.freq_patterns[:-1]
+        # Return results in the requested format
+        result = self.freq_patterns[:-1]
+        if return_sequences:
+            # Convert Dict[Tuple[str, ...], int] to List[Sequence] for each level
+            return [dict_to_sequences(level_patterns) for level_patterns in result]
+        return result

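The `search()` changes combine three pieces: `typing.overload` signatures keyed on a keyword-only `Literal[False]`/`Literal[True]` flag, a reset of `self.freq_patterns` at the start of each call, and a conversion of the finished levels only when the flag is set. The class below is a hypothetical, heavily reduced miner (it only counts 1-item patterns) that reproduces that shape so it can be run and type-checked in isolation; `MinerSketch`, `min_count` and `return_pairs` (standing in for `return_sequences`, with `(pattern, support)` tuples standing in for `Sequence` objects) are assumptions, not gsppy API.

```python
from collections import Counter
from typing import Dict, List, Literal, Tuple, Union, overload

Pattern = Tuple[str, ...]
Level = Dict[Pattern, int]
PairLevel = List[Tuple[Pattern, int]]


class MinerSketch:
    """Hypothetical, reduced miner (1-item patterns only) mirroring GSP.search()'s shape."""

    def __init__(self, transactions: List[List[str]]) -> None:
        self.transactions = transactions
        self.freq_patterns: List[Level] = []

    @overload
    def search(self, min_count: int = 1, *, return_pairs: Literal[False] = False) -> List[Level]: ...

    @overload
    def search(self, min_count: int = 1, *, return_pairs: Literal[True]) -> List[PairLevel]: ...

    def search(self, min_count: int = 1, *, return_pairs: bool = False) -> Union[List[Level], List[PairLevel]]:
        # Reset per call so reusing one instance never accumulates stale levels.
        self.freq_patterns = []
        # dict.fromkeys de-duplicates items per transaction while keeping their order.
        counts = Counter(item for t in self.transactions for item in dict.fromkeys(t))
        self.freq_patterns.append({(i,): c for i, c in counts.items() if c >= min_count})
        if return_pairs:
            # Convert finished levels only on request, like dict_to_sequences in the diff.
            return [list(level.items()) for level in self.freq_patterns]
        return list(self.freq_patterns)


miner = MinerSketch([["A", "B"], ["A"], ["B", "C"]])
print(miner.search(min_count=2))                     # [{('A',): 2, ('B',): 2}]
print(miner.search(min_count=2, return_pairs=True))  # [[(('A',), 2), (('B',), 2)]]
```

Because the overloads are distinguished by a literal keyword argument, a type checker can infer the return type from the call site; this is presumably why the `cli.py` hunk drops its explicit `List[Dict[...]]` annotation and passes `return_sequences=False` instead.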