PyPI - gsppy - Versions diffs - 3.0.0__tar.gz → 3.1.1__tar.gz - Mend

gsppy 3.0.0__tar.gz → 3.1.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (22)

{gsppy-3.0.0 → gsppy-3.1.1}/CHANGELOG.md +117 -0
{gsppy-3.0.0 → gsppy-3.1.1}/PKG-INFO +52 -18
{gsppy-3.0.0 → gsppy-3.1.1}/README.md +41 -7
{gsppy-3.0.0 → gsppy-3.1.1}/gsppy/utils.py +23 -6
{gsppy-3.0.0 → gsppy-3.1.1}/pyproject.toml +11 -11
{gsppy-3.0.0 → gsppy-3.1.1}/tests/test_gsp.py +142 -2
{gsppy-3.0.0 → gsppy-3.1.1}/tests/test_utils.py +68 -4
{gsppy-3.0.0 → gsppy-3.1.1}/.gitignore +0 -0
{gsppy-3.0.0 → gsppy-3.1.1}/CONTRIBUTING.md +0 -0
{gsppy-3.0.0 → gsppy-3.1.1}/LICENSE +0 -0
{gsppy-3.0.0 → gsppy-3.1.1}/SECURITY.md +0 -0
{gsppy-3.0.0 → gsppy-3.1.1}/gsppy/__init__.py +0 -0
{gsppy-3.0.0 → gsppy-3.1.1}/gsppy/accelerate.py +0 -0
{gsppy-3.0.0 → gsppy-3.1.1}/gsppy/cli.py +0 -0
{gsppy-3.0.0 → gsppy-3.1.1}/gsppy/gsp.py +0 -0
{gsppy-3.0.0 → gsppy-3.1.1}/mypy.ini +0 -0
{gsppy-3.0.0 → gsppy-3.1.1}/rust/Cargo.lock +0 -0
{gsppy-3.0.0 → gsppy-3.1.1}/rust/Cargo.toml +0 -0
{gsppy-3.0.0 → gsppy-3.1.1}/rust/src/lib.rs +0 -0
{gsppy-3.0.0 → gsppy-3.1.1}/tests/__init__.py +0 -0
{gsppy-3.0.0 → gsppy-3.1.1}/tests/test_cli.py +0 -0
{gsppy-3.0.0 → gsppy-3.1.1}/tox.ini +0 -0

{gsppy-3.0.0 → gsppy-3.1.1}/CHANGELOG.md

@@ -1,5 +1,122 @@
 # Changelog
+## [v3.0.0] - 2025-09-14
+### **New Features**
+* **Acceleration Backends Introduced**:
+  * **Rust Backend**: Fast, parallelized subsequence matching using PyO3 + Rayon.
+  * **GPU Backend (Experimental)**: Singleton (`k=1`) support counting accelerated via CuPy.
+* **Unified Backend Selection**:
+  * Backend can be selected via:
+    * Python API (`backend="auto" | "rust" | "gpu" | "python"`)
+    * CLI (`--backend`)
+    * Environment variable (`GSPPY_BACKEND`)
+  * Default: `auto` (tries Rust, falls back to Python).
+* **Optional GPU Install**:
+  * Install with `pip install "gsppy[gpu]"` for CuPy support.
+### **CLI and API Enhancements**
+* Updated CLI to support `--backend` option.
+* Python API `GSP.search()` accepts `backend=` parameter for runtime selection.
+* Environment variable fallback documented and supported.
+### **Documentation Updates**
+* **README.md** updated:
+  * New “GPU Acceleration” section with installation/usage guidance.
+  * Backend selector documentation for CLI, API, and env var.
+  * Python version compatibility badge updated to 3.10+.
+* **CONTRIBUTING.md**:
+  * Rewritten with new setup instructions using `uv` and `Makefile`.
+  * Added pre-commit and `tox-uv` instructions.
+### **Tooling and Developer Experience**
+* **Migrated Tooling**:
+  * Moved from Rye to `uv` for dependency and environment management.
+  * Introduced `uv.lock` for reproducible, cross-version installations.
+* **Makefile**:
+  * Common targets added (`make setup`, `make install`, `make lint`, etc.).
+  * One-liner for full dev bootstrap: `make setup && make install && make pre-commit-install`
+* **Pre-commit hooks**:
+  * Added `.pre-commit-config.yaml` with `ruff`, `pyright`, and `pytest`.
+* **DevContainer Support**:
+  * Added `devcontainer.json` for VS Code one-click environment provisioning.
+### **Testing and Quality**
+* Test suite fully adapted to new tooling (`uv`, `tox-uv`).
+* Static typing validated with both `mypy` and `pyright`.
+* All 38 tests pass locally and in CI (Python 3.10–3.13).
+### **CI/CD and Infrastructure**
+* GitHub Actions updated:
+  * All jobs use `uv` for setup and execution.
+  * `tox` now powered by `tox-uv`, auto-provisions interpreters.
+* Updated testing matrix in `codecov.yml` to Python 3.13.
+* Simplified CI jobs by consolidating environment setup steps.
+### **Packaging & Compatibility**
+* Dropped Python 3.8 and 3.9 support:
+  * Project now requires Python **3.10+**.
+* `pyproject.toml` updated:
+  * `requires-python = ">=3.10"`
+  * Optional `[gpu]` extra defined for CuPy support.
+  * Development dependencies and classifiers modernized.
+### **Code Improvements**
+* Improved type hints and formatting in:
+  * `gsp.py`, `cli.py`, and `utils.py`
+* Lint fixes and logging improvements.
+* Code follows unified `ruff` formatting.
+### **Breaking Changes**
+* 🔥 Dropped support for Python 3.8 and 3.9.
+  * Users must upgrade to Python 3.10 or newer.
+### **Migration Guide**
+For contributors or CI environments:
+```bash
+# Install uv (first time only)
+curl -Ls https://astral.sh/uv/install.sh | bash
+# Setup new local dev environment
+make setup
+make install
+make pre-commit-install
+```
+### **Performance Notes**
+* **GPU acceleration** boosts singleton (k=1) counting using CuPy’s `bincount`.
+* **Rust backend** delivers robust multithreaded subsequence matching.
+* Backend auto-dispatch adapts based on availability and selection.
+---
 ## [v2.3.0] - 2025-01-05
 ### **New Features**

{gsppy-3.0.0 → gsppy-3.1.1}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: gsppy
-Version: 3.0.0
+Version: 3.1.1
 Summary: GSP (Generalized Sequence Pattern) algorithm in Python
 Project-URL: Homepage, https://github.com/jacksonpradolima/gsp-py
 Author-email: Jackson Antonio do Prado Lima <jacksonpradolima@gmail.com>
@@ -41,21 +41,21 @@ Classifier: Topic :: Software Development :: Libraries :: Python Modules
 Requires-Python: >=3.10
 Requires-Dist: click>=8.0.0
 Provides-Extra: dev
-Requires-Dist: cython==3.1.3; extra == 'dev'
-Requires-Dist: hatch==1.14.0; extra == 'dev'
+Requires-Dist: cython==3.1.4; extra == 'dev'
+Requires-Dist: hatch==1.15.1; extra == 'dev'
 Requires-Dist: hatchling==1.27.0; extra == 'dev'
-Requires-Dist: mypy==1.18.1; extra == 'dev'
-Requires-Dist: pylint==3.2.7; extra == 'dev'
-Requires-Dist: pyright==1.1.405; extra == 'dev'
+Requires-Dist: mypy==1.18.2; extra == 'dev'
+Requires-Dist: pylint==4.0.2; extra == 'dev'
+Requires-Dist: pyright==1.1.406; extra == 'dev'
 Requires-Dist: pytest-benchmark==5.1.0; extra == 'dev'
-Requires-Dist: pytest-cov==5.0.0; extra == 'dev'
-Requires-Dist: pytest==8.3.4; extra == 'dev'
-Requires-Dist: ruff==0.13.0; extra == 'dev'
-Requires-Dist: tox==4.30.2; extra == 'dev'
+Requires-Dist: pytest-cov==7.0.0; extra == 'dev'
+Requires-Dist: pytest==8.4.2; extra == 'dev'
+Requires-Dist: ruff==0.13.3; extra == 'dev'
+Requires-Dist: tox==4.32.0; extra == 'dev'
 Provides-Extra: gpu
 Requires-Dist: cupy<14,>=11; extra == 'gpu'
 Provides-Extra: rust
-Requires-Dist: maturin==1.6.0; extra == 'rust'
+Requires-Dist: maturin==1.9.6; extra == 'rust'
 Description-Content-Type: text/markdown
 [![PyPI License](https://img.shields.io/pypi/l/gsppy.svg?style=flat-square)]()
@@ -104,14 +104,15 @@ principles**. Using support thresholds, GSP identifies frequent sequences of ite
 ### Key Features:
+- **Ordered (non-contiguous) matching**: Detects patterns where items appear in order but not necessarily adjacent, following standard GSP semantics. For example, the pattern `('A', 'C')` is found in the sequence `['A', 'B', 'C']`.
 - **Support-based pruning**: Only retains sequences that meet the minimum support threshold.
 - **Candidate generation**: Iteratively generates candidate sequences of increasing length.
 - **General-purpose**: Useful in retail, web analytics, social networks, temporal sequence mining, and more.
 For example:
-- In a shopping dataset, GSP can identify patterns like "Customers who buy bread and milk often purchase diapers next."
-- In a website clickstream, GSP might find patterns like "Users visit A, then go to B, and later proceed to C."
+- In a shopping dataset, GSP can identify patterns like "Customers who buy bread and milk often purchase diapers next" - even if other items appear between bread and milk.
+- In a website clickstream, GSP might find patterns like "Users visit A, then eventually go to C" - capturing user journeys with intermediate steps.
 ---
@@ -427,24 +428,57 @@ Sample Output:
 ```python
 [
     {('Bread',): 4, ('Milk',): 4, ('Diaper',): 4, ('Beer',): 3, ('Coke',): 2},
-    {('Bread', 'Milk'): 3, ('Milk', 'Diaper'): 3, ('Diaper', 'Beer'): 3},
-    {('Bread', 'Milk', 'Diaper'): 2, ('Milk', 'Diaper', 'Beer'): 2}
+    {('Bread', 'Milk'): 3, ('Bread', 'Diaper'): 3, ('Bread', 'Beer'): 2, ('Milk', 'Diaper'): 3, ('Milk', 'Beer'): 2, ('Milk', 'Coke'): 2, ('Diaper', 'Beer'): 3, ('Diaper', 'Coke'): 2},
+    {('Bread', 'Milk', 'Diaper'): 2, ('Bread', 'Diaper', 'Beer'): 2, ('Milk', 'Diaper', 'Beer'): 2, ('Milk', 'Diaper', 'Coke'): 2}
 ]
 ```
 - The **first dictionary** contains single-item sequences with their frequencies (e.g., `('Bread',): 4` means "Bread"
   appears in 4 transactions).
 - The **second dictionary** contains 2-item sequential patterns (e.g., `('Bread', 'Milk'): 3` means the sequence "
-  Bread → Milk" appears in 3 transactions).
+  Bread → Milk" appears in 3 transactions). Note that patterns like `('Bread', 'Beer')` are detected even when they don't appear adjacent in transactions - they just need to appear in order.
 - The **third dictionary** contains 3-item sequential patterns (e.g., `('Bread', 'Milk', 'Diaper'): 2` means the
   sequence "Bread → Milk → Diaper" appears in 2 transactions).
 > [!NOTE]
-> The **support** of a sequence is calculated as the fraction of transactions containing the sequence, e.g.,
-`[Bread, Milk]` appears in 3 out of 5 transactions → Support = `3 / 5 = 0.6` (60%).
+> The **support** of a sequence is calculated as the fraction of transactions containing the sequence **in order** (not necessarily contiguously), e.g.,
+`('Bread', 'Milk')` appears in 3 out of 5 transactions → Support = `3 / 5 = 0.6` (60%).
 > This insight helps identify frequently occurring sequential patterns in datasets, such as shopping trends or user
 > behavior.
+> [!IMPORTANT]
+> **Non-contiguous (ordered) matching**: GSP-Py detects patterns where items appear in the specified order but not necessarily adjacent. For example, the pattern `('Bread', 'Beer')` matches the transaction `['Bread', 'Milk', 'Diaper', 'Beer']` because Bread appears before Beer, even though they are not adjacent. This follows the standard GSP algorithm semantics for sequential pattern mining.
+### Understanding Non-Contiguous Pattern Matching
+GSP-Py follows the standard GSP algorithm semantics by detecting **ordered (non-contiguous)** subsequences. This means:
+- ✅ **Order matters**: Items must appear in the specified sequence order
+- ✅ **Gaps allowed**: Items don't need to be adjacent
+- ❌ **Wrong order rejected**: Items appearing in different order won't match
+**Example:**
+```python
+from gsppy.gsp import GSP
+sequences = [
+    ['a', 'b', 'c'],  # Contains: (a,b), (a,c), (b,c), (a,b,c)
+    ['a', 'c'],       # Contains: (a,c)
+    ['b', 'c', 'a'],  # Contains: (b,c), (b,a), (c,a)
+    ['a', 'b', 'c', 'd'],  # Contains: (a,b), (a,c), (a,d), (b,c), (b,d), (c,d), etc.
+]
+gsp = GSP(sequences)
+result = gsp.search(min_support=0.5)  # Need at least 2/4 sequences
+# Pattern ('a', 'c') is found with support=3 because:
+# - It appears in ['a', 'b', 'c'] (with 'b' in between)
+# - It appears in ['a', 'c'] (adjacent)
+# - It appears in ['a', 'b', 'c', 'd'] (with 'b' in between)
+# Total: 3 out of 4 sequences = 75% support ✅
+```
 > [!TIP]
 > For more complex examples, find example scripts in the [`gsppy/tests`](gsppy/tests) folder.

{gsppy-3.0.0 → gsppy-3.1.1}/README.md

@@ -44,14 +44,15 @@ principles**. Using support thresholds, GSP identifies frequent sequences of ite
 ### Key Features:
+- **Ordered (non-contiguous) matching**: Detects patterns where items appear in order but not necessarily adjacent, following standard GSP semantics. For example, the pattern `('A', 'C')` is found in the sequence `['A', 'B', 'C']`.
 - **Support-based pruning**: Only retains sequences that meet the minimum support threshold.
 - **Candidate generation**: Iteratively generates candidate sequences of increasing length.
 - **General-purpose**: Useful in retail, web analytics, social networks, temporal sequence mining, and more.
 For example:
-- In a shopping dataset, GSP can identify patterns like "Customers who buy bread and milk often purchase diapers next."
-- In a website clickstream, GSP might find patterns like "Users visit A, then go to B, and later proceed to C."
+- In a shopping dataset, GSP can identify patterns like "Customers who buy bread and milk often purchase diapers next" - even if other items appear between bread and milk.
+- In a website clickstream, GSP might find patterns like "Users visit A, then eventually go to C" - capturing user journeys with intermediate steps.
 ---
@@ -367,24 +368,57 @@ Sample Output:
 ```python
 [
     {('Bread',): 4, ('Milk',): 4, ('Diaper',): 4, ('Beer',): 3, ('Coke',): 2},
-    {('Bread', 'Milk'): 3, ('Milk', 'Diaper'): 3, ('Diaper', 'Beer'): 3},
-    {('Bread', 'Milk', 'Diaper'): 2, ('Milk', 'Diaper', 'Beer'): 2}
+    {('Bread', 'Milk'): 3, ('Bread', 'Diaper'): 3, ('Bread', 'Beer'): 2, ('Milk', 'Diaper'): 3, ('Milk', 'Beer'): 2, ('Milk', 'Coke'): 2, ('Diaper', 'Beer'): 3, ('Diaper', 'Coke'): 2},
+    {('Bread', 'Milk', 'Diaper'): 2, ('Bread', 'Diaper', 'Beer'): 2, ('Milk', 'Diaper', 'Beer'): 2, ('Milk', 'Diaper', 'Coke'): 2}
 ]
 ```
 - The **first dictionary** contains single-item sequences with their frequencies (e.g., `('Bread',): 4` means "Bread"
   appears in 4 transactions).
 - The **second dictionary** contains 2-item sequential patterns (e.g., `('Bread', 'Milk'): 3` means the sequence "
-  Bread → Milk" appears in 3 transactions).
+  Bread → Milk" appears in 3 transactions). Note that patterns like `('Bread', 'Beer')` are detected even when they don't appear adjacent in transactions - they just need to appear in order.
 - The **third dictionary** contains 3-item sequential patterns (e.g., `('Bread', 'Milk', 'Diaper'): 2` means the
   sequence "Bread → Milk → Diaper" appears in 2 transactions).
 > [!NOTE]
-> The **support** of a sequence is calculated as the fraction of transactions containing the sequence, e.g.,
-`[Bread, Milk]` appears in 3 out of 5 transactions → Support = `3 / 5 = 0.6` (60%).
+> The **support** of a sequence is calculated as the fraction of transactions containing the sequence **in order** (not necessarily contiguously), e.g.,
+`('Bread', 'Milk')` appears in 3 out of 5 transactions → Support = `3 / 5 = 0.6` (60%).
 > This insight helps identify frequently occurring sequential patterns in datasets, such as shopping trends or user
 > behavior.
+> [!IMPORTANT]
+> **Non-contiguous (ordered) matching**: GSP-Py detects patterns where items appear in the specified order but not necessarily adjacent. For example, the pattern `('Bread', 'Beer')` matches the transaction `['Bread', 'Milk', 'Diaper', 'Beer']` because Bread appears before Beer, even though they are not adjacent. This follows the standard GSP algorithm semantics for sequential pattern mining.
+### Understanding Non-Contiguous Pattern Matching
+GSP-Py follows the standard GSP algorithm semantics by detecting **ordered (non-contiguous)** subsequences. This means:
+- ✅ **Order matters**: Items must appear in the specified sequence order
+- ✅ **Gaps allowed**: Items don't need to be adjacent
+- ❌ **Wrong order rejected**: Items appearing in different order won't match
+**Example:**
+```python
+from gsppy.gsp import GSP
+sequences = [
+    ['a', 'b', 'c'],  # Contains: (a,b), (a,c), (b,c), (a,b,c)
+    ['a', 'c'],       # Contains: (a,c)
+    ['b', 'c', 'a'],  # Contains: (b,c), (b,a), (c,a)
+    ['a', 'b', 'c', 'd'],  # Contains: (a,b), (a,c), (a,d), (b,c), (b,d), (c,d), etc.
+]
+gsp = GSP(sequences)
+result = gsp.search(min_support=0.5)  # Need at least 2/4 sequences
+# Pattern ('a', 'c') is found with support=3 because:
+# - It appears in ['a', 'b', 'c'] (with 'b' in between)
+# - It appears in ['a', 'c'] (adjacent)
+# - It appears in ['a', 'b', 'c', 'd'] (with 'b' in between)
+# Total: 3 out of 4 sequences = 75% support ✅
+```
 > [!TIP]
 > For more complex examples, find example scripts in the [`gsppy/tests`](gsppy/tests) folder.
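The support figure quoted in the README example can be checked by hand with a few lines of plain Python. The sketch below does not call gsppy; `contains_in_order` and `support` are hypothetical helper names introduced only for this illustration, and the empty-pattern guard mirrors the rule in `gsppy/utils.py` that an empty subsequence never matches.

```python
from typing import List, Sequence, Tuple


def contains_in_order(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """Return True if `pattern` occurs in `sequence` in order, gaps allowed."""
    it = iter(sequence)
    # `item in it` advances the iterator, so each match must come after the previous one.
    return bool(pattern) and all(item in it for item in pattern)


def support(pattern: Tuple[str, ...], sequences: List[List[str]]) -> int:
    """Count the sequences that contain `pattern` as an ordered subsequence."""
    return sum(contains_in_order(pattern, seq) for seq in sequences)


sequences = [
    ["a", "b", "c"],
    ["a", "c"],
    ["b", "c", "a"],
    ["a", "b", "c", "d"],
]

count = support(("a", "c"), sequences)
print(count, count / len(sequences))  # 3 0.75
print(support(("c", "a"), sequences))  # 1 (only ['b', 'c', 'a'] has c before a)
```

With `min_support=0.5` and four sequences, a pattern needs a count of at least 2, so `('a', 'c')` qualifies and `('c', 'a')` does not.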

{gsppy-3.0.0 → gsppy-3.1.1}/gsppy/utils.py

@@ -5,14 +5,14 @@ and generating candidate patterns from previously frequent patterns.
 The key functionalities include:
 1. Splitting a list of items into smaller batches for easier processing.
-2. Checking for the existence of a contiguous subsequence within a sequence,
+2. Checking for the existence of an ordered (non-contiguous) subsequence within a sequence,
    with caching to optimize repeated comparisons.
 3. Generating candidate patterns from a dictionary of frequent patterns
    to support pattern generation tasks in algorithms like sequence mining.
 Main functionalities:
 - `split_into_batches`: Splits a list of items into smaller batches based on a specified batch size.
-- `is_subsequence_in_list`: Determines if a subsequence exists within another sequence,
+- `is_subsequence_in_list`: Determines if a subsequence exists within another sequence in order,
   using caching to improve performance.
 - `generate_candidates_from_previous`: Generates candidate patterns by joining previously
   identified frequent patterns.
@@ -46,7 +46,10 @@ def split_into_batches(
 @lru_cache(maxsize=None)
 def is_subsequence_in_list(subsequence: Tuple[str, ...], sequence: Tuple[str, ...]) -> bool:
     """
-    Check if a subsequence exists within a sequence as a contiguous subsequence.
+    Check if a subsequence exists within a sequence as an ordered (non-contiguous) subsequence.
+    This function implements the standard GSP semantics where items in the subsequence
+    must appear in the same order in the sequence, but not necessarily contiguously.
     Parameters:
         subsequence: (tuple): The sequence to search for.
@@ -54,6 +57,14 @@ def is_subsequence_in_list(subsequence: Tuple[str, ...], sequence: Tuple[str, ..
     Returns:
         bool: True if the subsequence is found, False otherwise.
+    Examples:
+        >>> is_subsequence_in_list(('a', 'c'), ('a', 'b', 'c'))
+        True
+        >>> is_subsequence_in_list(('a', 'c'), ('c', 'a'))
+        False
+        >>> is_subsequence_in_list(('a', 'b'), ('a', 'b', 'c'))
+        True
     """
     # Handle the case where the subsequence is empty - it should not exist in any sequence
     if not subsequence:
@@ -61,12 +72,18 @@ def is_subsequence_in_list(subsequence: Tuple[str, ...], sequence: Tuple[str, ..
     len_sub, len_seq = len(subsequence), len(sequence)
-    # Return False if the sequence is longer than the list
+    # Return False if the subsequence is longer than the sequence
     if len_sub > len_seq:
         return False
-    # Use any to check if any slice matches the sequence
-    return any(sequence[i : i + len_sub] == subsequence for i in range(len_seq - len_sub + 1))
+    # Use two-pointer approach to check if subsequence exists in order
+    sub_idx = 0
+    for seq_idx in range(len_seq):
+        if sequence[seq_idx] == subsequence[sub_idx]:
+            sub_idx += 1
+            if sub_idx == len_sub:
+                return True
+    return False
 def generate_candidates_from_previous(prev_patterns: Dict[Tuple[str, ...], int]) -> List[Tuple[str, ...]]:
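To make the behavioral change in this hunk concrete, the sketch below places the removed slice-based check next to the added two-pointer check as standalone functions. The names `contiguous_match` and `ordered_match` are hypothetical and exist only for this comparison; the package exposes a single `is_subsequence_in_list`, and its `lru_cache` decorator is omitted here.

```python
from typing import Tuple


def contiguous_match(subsequence: Tuple[str, ...], sequence: Tuple[str, ...]) -> bool:
    """3.0.0 logic (removed lines): some slice of `sequence` must equal `subsequence`."""
    if not subsequence or len(subsequence) > len(sequence):
        return False
    n = len(subsequence)
    return any(sequence[i : i + n] == subsequence for i in range(len(sequence) - n + 1))


def ordered_match(subsequence: Tuple[str, ...], sequence: Tuple[str, ...]) -> bool:
    """3.1.1 logic (added lines): items must appear in order, gaps allowed."""
    if not subsequence or len(subsequence) > len(sequence):
        return False
    sub_idx = 0
    for item in sequence:
        if item == subsequence[sub_idx]:
            sub_idx += 1
            if sub_idx == len(subsequence):
                return True
    return False


cases = [
    (("a", "c"), ("a", "b", "c")),  # gap between a and c
    (("a", "b"), ("a", "b", "c")),  # adjacent items
    (("c", "a"), ("a", "b", "c")),  # wrong order
]
for sub, seq in cases:
    print(sub, seq, contiguous_match(sub, seq), ordered_match(sub, seq))
```

Only the first case differs between the two functions: the contiguous check rejects `('a', 'c')` in `('a', 'b', 'c')`, while the ordered check accepts it. Any caller that relied on the 3.0.0 behavior will therefore see more patterns, and higher support counts, after upgrading.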

{gsppy-3.0.0 → gsppy-3.1.1}/pyproject.toml

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "gsppy"
-version = "3.0.0"
+version = "3.1.1"
 description = "GSP (Generalized Sequence Pattern) algorithm in Python"
 keywords = ["GSP", "sequential patterns", "data analysis", "sequence mining"]
 license = { file = "LICENSE" }
@@ -39,20 +39,20 @@ gsppy = "gsppy.cli:main"
 [project.optional-dependencies]
 dev = [
-    "cython==3.1.3",
-    "hatch==1.14.0",
+    "cython==3.1.4",
+    "hatch==1.15.1",
     "hatchling==1.27.0",
-    "mypy==1.18.1",
-    "pylint==3.2.7",
-    "pyright==1.1.405",
-    "pytest==8.3.4",
+    "mypy==1.18.2",
+    "pylint==4.0.2",
+    "pyright==1.1.406",
+    "pytest==8.4.2",
     "pytest-benchmark==5.1.0",
-    "pytest-cov==5.0.0",
-    "ruff==0.13.0",
-    "tox==4.30.2",
+    "pytest-cov==7.0.0",
+    "ruff==0.13.3",
+    "tox==4.32.0",
 ]
 rust = [
-    "maturin==1.6.0"
+    "maturin==1.9.6"
 ]
 gpu = [
     "cupy>=11,<14"
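When reviewing a batch of pin updates like the one above, it can help to sort the bumps by size. The sketch below copies the old and new pins from this hunk into a dictionary and classifies each change. The `PINS` table and `bump_kind` helper are hypothetical names for this illustration, and the parsing assumes plain numeric `X.Y.Z` versions, which holds for every pin in this diff.

```python
from typing import Dict, Tuple

# (old, new) pins copied from the pyproject.toml hunk above.
PINS: Dict[str, Tuple[str, str]] = {
    "cython": ("3.1.3", "3.1.4"),
    "hatch": ("1.14.0", "1.15.1"),
    "mypy": ("1.18.1", "1.18.2"),
    "pylint": ("3.2.7", "4.0.2"),
    "pyright": ("1.1.405", "1.1.406"),
    "pytest": ("8.3.4", "8.4.2"),
    "pytest-cov": ("5.0.0", "7.0.0"),
    "ruff": ("0.13.0", "0.13.3"),
    "tox": ("4.30.2", "4.32.0"),
    "maturin": ("1.6.0", "1.9.6"),
}


def bump_kind(old: str, new: str) -> str:
    """Classify a version change by the first component that differs."""
    o = tuple(int(part) for part in old.split("."))
    n = tuple(int(part) for part in new.split("."))
    if n[0] != o[0]:
        return "major"
    if n[1] != o[1]:
        return "minor"
    return "patch"


majors = sorted(name for name, (old, new) in PINS.items() if bump_kind(old, new) == "major")
print(majors)  # ['pylint', 'pytest-cov']
```

The two major bumps affect development tooling only; the runtime requirement `click>=8.0.0` and the `gpu` extra are unchanged in this release.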

{gsppy-3.0.0 → gsppy-3.1.1}/tests/test_gsp.py

@@ -168,13 +168,28 @@ def test_frequent_patterns(supermarket_transactions: List[List[str]]) -> None:
     Asserts:
         - The frequent patterns should match the expected result.
+        - Non-contiguous patterns are correctly detected.
     """
     gsp = GSP(supermarket_transactions)
     result = gsp.search(min_support=0.3)
     expected = [
         {("Bread",): 4, ("Milk",): 4, ("Diaper",): 4, ("Beer",): 3, ("Coke",): 2},
-        {("Bread", "Milk"): 3, ("Milk", "Diaper"): 3, ("Diaper", "Beer"): 3},
-        {("Bread", "Milk", "Diaper"): 2, ("Milk", "Diaper", "Beer"): 2},
+        {
+            ("Bread", "Milk"): 3,
+            ("Bread", "Diaper"): 3,
+            ("Bread", "Beer"): 2,
+            ("Milk", "Diaper"): 3,
+            ("Milk", "Beer"): 2,
+            ("Milk", "Coke"): 2,
+            ("Diaper", "Beer"): 3,
+            ("Diaper", "Coke"): 2,
+        },
+        {
+            ("Bread", "Milk", "Diaper"): 2,
+            ("Bread", "Diaper", "Beer"): 2,
+            ("Milk", "Diaper", "Beer"): 2,
+            ("Milk", "Diaper", "Coke"): 2,
+        },
     ]
     assert result == expected, "Frequent patterns do not match expected results."
@@ -231,6 +246,131 @@ def test_partial_match(supermarket_transactions: List[List[str]]) -> None:
         assert result_level_2 >= expected_patterns_level_2, f"Level 2 patterns mismatch. Got {result_level_2}"
+def test_non_contiguous_subsequences() -> None:
+    """
+    Test the GSP algorithm correctly detects non-contiguous subsequences (Issue #115).
+    This test validates that patterns like ('a', 'c') are detected even when
+    they appear with gaps in sequences like ['a', 'b', 'c'].
+    Asserts:
+        - Non-contiguous patterns are correctly identified with proper support.
+    """
+    sequences = [
+        ["a", "b", "c"],
+        ["a", "c"],
+        ["b", "c", "a"],
+        ["a", "b", "c", "d"],
+    ]
+    gsp = GSP(sequences)
+    result = gsp.search(min_support=0.5)
+    # Expected: ('a', 'c') should be found with support = 3
+    # It appears in: ['a', 'b', 'c'], ['a', 'c'], ['a', 'b', 'c', 'd']
+    assert len(result) >= 2, "Expected at least 2 levels of patterns"
+    level_2_patterns = result[1]
+    assert ("a", "c") in level_2_patterns, f"Pattern ('a', 'c') not found in level 2. Got {level_2_patterns}"
+    assert level_2_patterns[("a", "c")] == 3, f"Expected support 3 for ('a', 'c'), got {level_2_patterns[('a', 'c')]}"
+def test_contiguous_vs_non_contiguous_patterns() -> None:
+    """
+    Comprehensive test demonstrating the difference between contiguous and non-contiguous patterns.
+    This test shows patterns that would ONLY be found in non-contiguous matching (current implementation)
+    vs patterns that would be found in BOTH contiguous and non-contiguous matching.
+    The current implementation uses non-contiguous (ordered) matching, which is the standard GSP behavior.
+    """
+    sequences = [
+        ["X", "Y", "Z"],  # Contains X->Y, Y->Z, X->Z (contiguous: X->Y, Y->Z only)
+        ["X", "Z"],  # Contains X->Z (contiguous: X->Z)
+        ["Y", "Z", "X"],  # Contains Y->Z, Y->X, Z->X (contiguous: Y->Z, Z->X only)
+        ["X", "Y", "Z", "W"],  # Contains many patterns
+    ]
+    gsp = GSP(sequences)
+    result = gsp.search(min_support=0.5)  # Need at least 2/4 sequences
+    # Level 2 patterns
+    level_2_patterns = result[1] if len(result) >= 2 else {}
+    # Patterns that would be found in BOTH contiguous and non-contiguous:
+    # ('X', 'Y') appears contiguously in: ['X', 'Y', 'Z'], ['X', 'Y', 'Z', 'W']
+    # ('Y', 'Z') appears contiguously in: ['X', 'Y', 'Z'], ['Y', 'Z', 'X'], ['X', 'Y', 'Z', 'W']
+    assert ("X", "Y") in level_2_patterns, "('X', 'Y') should be found (contiguous in 2 sequences)"
+    assert ("Y", "Z") in level_2_patterns, "('Y', 'Z') should be found (contiguous in 3 sequences)"
+    # Pattern that would ONLY be found in non-contiguous matching:
+    # ('X', 'Z') appears with gap in: ['X', 'Y', 'Z'], ['X', 'Y', 'Z', 'W']
+    # and contiguously in: ['X', 'Z']
+    # Total support = 3 (>= 2 threshold)
+    assert ("X", "Z") in level_2_patterns, (
+        "('X', 'Z') should be found with non-contiguous matching. "
+        "This pattern has gaps in some sequences but is still ordered."
+    )
+    assert level_2_patterns[("X", "Z")] == 3, f"Expected support 3 for ('X', 'Z'), got {level_2_patterns[('X', 'Z')]}"
+def test_non_contiguous_with_longer_gaps() -> None:
+    """
+    Test non-contiguous matching with longer gaps between elements.
+    This demonstrates that the algorithm correctly finds patterns even when
+    there are multiple elements between the pattern elements.
+    """
+    sequences = [
+        ["A", "B", "C", "D", "E"],  # Contains A->E with 3 elements in between
+        ["A", "X", "Y", "Z", "E"],  # Contains A->E with 3 different elements in between
+        ["A", "E"],  # Contains A->E with no gap
+        ["E", "A"],  # Does NOT contain A->E (wrong order)
+    ]
+    gsp = GSP(sequences)
+    result = gsp.search(min_support=0.5)  # Need at least 2/4 sequences
+    # ('A', 'E') should be found with support = 3
+    level_2_patterns = result[1] if len(result) >= 2 else {}
+    assert ("A", "E") in level_2_patterns, "('A', 'E') should be found despite large gaps"
+    assert level_2_patterns[("A", "E")] == 3, f"Expected support 3 for ('A', 'E'), got {level_2_patterns[('A', 'E')]}"
+    # ('E', 'A') should NOT be found (wrong order)
+    assert ("E", "A") not in level_2_patterns, "('E', 'A') should not be found (wrong order)"
+def test_order_sensitivity() -> None:
+    """
+    Test that the algorithm is sensitive to order - patterns must appear in sequence order.
+    This verifies that even with non-contiguous matching, the order of elements matters.
+    """
+    sequences = [
+        ["P", "Q", "R"],  # Contains P->Q, P->R, Q->R
+        ["P", "R", "Q"],  # Contains P->R, P->Q, R->Q
+        ["Q", "P", "R"],  # Contains Q->P, Q->R, P->R
+        ["R", "Q", "P"],  # Contains R->Q, R->P, Q->P
+    ]
+    gsp = GSP(sequences)
+    result = gsp.search(min_support=0.5)  # Need at least 2/4 sequences
+    level_2_patterns = result[1] if len(result) >= 2 else {}
+    # ('P', 'R') appears in correct order in: ['P', 'Q', 'R'], ['P', 'R', 'Q'], ['Q', 'P', 'R']
+    assert ("P", "R") in level_2_patterns, "('P', 'R') should be found (support = 3)"
+    assert level_2_patterns[("P", "R")] == 3
+    # ('Q', 'P') appears in correct order in: ['Q', 'P', 'R'], ['R', 'Q', 'P']
+    assert ("Q", "P") in level_2_patterns, "('Q', 'P') should be found (support = 2)"
+    assert level_2_patterns[("Q", "P")] == 2
+    # ('R', 'P') appears in correct order in: ['R', 'Q', 'P']
+    # Support = 1, below threshold of 2
+    assert ("R", "P") not in level_2_patterns, "('R', 'P') should not be found (support = 1, below threshold)"
 @pytest.mark.parametrize("min_support", [0.1, 0.2, 0.3, 0.4, 0.5])
 def test_benchmark(benchmark: BenchmarkFixture, supermarket_transactions: List[List[str]], min_support: float) -> None:
     """
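The expected supports asserted in `test_order_sensitivity` can be reproduced without gsppy by enumerating every ordered pair in each sequence. The sketch below is an independent brute-force check, not the library's algorithm; `ordered_pair_counts` is a hypothetical helper name, and the threshold of 2 follows the "at least 2/4 sequences" comment in the test.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple


def ordered_pair_counts(sequences: List[List[str]]) -> Counter:
    """Count, for each ordered pair, the sequences in which it occurs in order."""
    counts: Counter = Counter()
    for seq in sequences:
        # combinations() preserves positional order; set() counts each sequence once.
        counts.update(set(combinations(seq, 2)))
    return counts


sequences = [
    ["P", "Q", "R"],
    ["P", "R", "Q"],
    ["Q", "P", "R"],
    ["R", "Q", "P"],
]

counts = ordered_pair_counts(sequences)
frequent = {pair: n for pair, n in counts.items() if n >= 2}

print(counts[("P", "R")])  # 3
print(counts[("Q", "P")])  # 2
print(("R", "P") in frequent)  # False (support is 1)
```

This enumeration grows quadratically with sequence length, so it is useful only as a cross-check on small fixtures like the ones in these tests.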

{gsppy-3.0.0 → gsppy-3.1.1}/tests/test_utils.py

@@ -45,13 +45,19 @@ def test_is_subsequence_in_list():
     """
     Test the `is_subsequence_in_list` utility function.
     """
-    # Test when the subsequence is present
-    assert is_subsequence_in_list((1, 2), (0, 1, 2, 3)), "Failed to find subsequence"
+    # Test when the subsequence is present (contiguous)
+    assert is_subsequence_in_list((1, 2), (0, 1, 2, 3)), "Failed to find contiguous subsequence"
     assert is_subsequence_in_list((3,), (0, 1, 2, 3)), "Failed single-element subsequence"
-    # Test when the subsequence is not present
-    assert not is_subsequence_in_list((1, 3), (0, 1, 2, 3)), "Incorrectly found non-contiguous subsequence"
+    # Test when the subsequence is present (non-contiguous)
+    assert is_subsequence_in_list((1, 3), (0, 1, 2, 3)), "Failed to find non-contiguous subsequence"
+    assert is_subsequence_in_list((0, 2), (0, 1, 2, 3)), "Failed to find non-contiguous subsequence"
+    assert is_subsequence_in_list((0, 3), (0, 1, 2, 3)), "Failed to find non-contiguous subsequence"
+    # Test when the subsequence is not present (wrong order or missing elements)
+    assert not is_subsequence_in_list((3, 1), (0, 1, 2, 3)), "Incorrectly found reversed subsequence"
     assert not is_subsequence_in_list((4,), (0, 1, 2, 3)), "Incorrectly found non-existent subsequence"
+    assert not is_subsequence_in_list((2, 1), (0, 1, 2, 3)), "Incorrectly found out-of-order subsequence"
     # Test when input sequence or subsequence is empty
     assert not is_subsequence_in_list((), (0, 1, 2, 3)), "Incorrect positive result for empty subsequence"
@@ -61,6 +67,64 @@ def test_is_subsequence_in_list():
     assert not is_subsequence_in_list((1, 2, 3, 4), (1, 2, 3)), "Failed to reject long subsequence"
+def test_is_subsequence_contiguous_vs_non_contiguous():
+    """
+    Test cases that demonstrate the difference between contiguous and non-contiguous matching.
+    The current implementation uses non-contiguous (ordered) matching.
+    This test documents patterns that would differ between the two approaches.
+    """
+    # Pattern that appears with gaps (non-contiguous)
+    # In contiguous mode: would NOT match
+    # In non-contiguous mode: DOES match
+    assert is_subsequence_in_list(("a", "c"), ("a", "b", "c")), (
+        "Non-contiguous: ('a', 'c') should match in ('a', 'b', 'c')"
+    )
+    assert is_subsequence_in_list(("a", "d"), ("a", "b", "c", "d")), (
+        "Non-contiguous: ('a', 'd') should match in ('a', 'b', 'c', 'd')"
+    )
+    assert is_subsequence_in_list((1, 4), (1, 2, 3, 4, 5)), (
+        "Non-contiguous: (1, 4) should match in (1, 2, 3, 4, 5)"
+    )
+    # Pattern that appears contiguously (would match in both modes)
+    assert is_subsequence_in_list(("a", "b"), ("a", "b", "c")), (
+        "Contiguous: ('a', 'b') should match in ('a', 'b', 'c')"
+    )
+    assert is_subsequence_in_list((2, 3), (1, 2, 3, 4)), (
+        "Contiguous: (2, 3) should match in (1, 2, 3, 4)"
+    )
+    # Pattern with wrong order (would NOT match in either mode)
+    assert not is_subsequence_in_list(("c", "a"), ("a", "b", "c")), (
+        "Wrong order: ('c', 'a') should NOT match in ('a', 'b', 'c')"
+    )
+    assert not is_subsequence_in_list((3, 1), (1, 2, 3, 4)), (
+        "Wrong order: (3, 1) should NOT match in (1, 2, 3, 4)"
+    )
+def test_is_subsequence_with_gaps():
+    """
+    Test non-contiguous matching with various gap sizes.
+    """
+    # Small gap
+    assert is_subsequence_in_list(("x", "z"), ("x", "y", "z")), "Failed with 1 element gap"
+    # Medium gap
+    assert is_subsequence_in_list(("a", "e"), ("a", "b", "c", "d", "e")), "Failed with 3 element gap"
+    # Large gap
+    assert is_subsequence_in_list((1, 10), (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)), "Failed with 8 element gap"
+    # Multiple gaps in longer pattern
+    assert is_subsequence_in_list((1, 3, 5), (1, 2, 3, 4, 5)), "Failed with multiple gaps"
+    assert is_subsequence_in_list(("a", "c", "e"), ("a", "b", "c", "d", "e")), "Failed with multiple gaps"
+    # No gap (adjacent elements still work)
+    assert is_subsequence_in_list((1, 2), (1, 2, 3)), "Failed with no gap (contiguous)"
 def test_generate_candidates_from_previous():
     """
     Test the `generate_candidates_from_previous` utility function.
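The hand-picked cases above could be complemented by an exhaustive cross-check on small inputs. The sketch below reimplements the two-pointer loop from `gsppy/utils.py` as a standalone function and compares it against a direct definition of "ordered subsequence" over every short sequence on a three-letter alphabet. Both `two_pointer` and `by_definition` are hypothetical names for this illustration and are not part of the package's test suite.

```python
from itertools import combinations, product
from typing import Tuple


def two_pointer(sub: Tuple[str, ...], seq: Tuple[str, ...]) -> bool:
    """Standalone copy of the greedy scan added in 3.1.1."""
    if not sub or len(sub) > len(seq):
        return False
    sub_idx = 0
    for item in seq:
        if item == sub[sub_idx]:
            sub_idx += 1
            if sub_idx == len(sub):
                return True
    return False


def by_definition(sub: Tuple[str, ...], seq: Tuple[str, ...]) -> bool:
    """`sub` matches if it equals the items at some increasing tuple of indices."""
    return bool(sub) and any(
        tuple(seq[i] for i in idx) == sub
        for idx in combinations(range(len(seq)), len(sub))
    )


# Every sequence of length 0-4 against every pattern of length 0-3 over "abc".
mismatches = [
    (sub, seq)
    for n in range(5)
    for seq in product("abc", repeat=n)
    for m in range(4)
    for sub in product("abc", repeat=m)
    if two_pointer(sub, seq) != by_definition(sub, seq)
]
print(len(mismatches))  # 0
```

Agreement on all of these inputs supports the claim that the greedy scan never misses a valid match, including for repeated items such as `('a', 'a')`.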

{gsppy-3.0.0 → gsppy-3.1.1}/.gitignore

File without changes

{gsppy-3.0.0 → gsppy-3.1.1}/CONTRIBUTING.md

File without changes

{gsppy-3.0.0 → gsppy-3.1.1}/LICENSE

File without changes

{gsppy-3.0.0 → gsppy-3.1.1}/SECURITY.md

File without changes

{gsppy-3.0.0 → gsppy-3.1.1}/gsppy/__init__.py

File without changes

{gsppy-3.0.0 → gsppy-3.1.1}/gsppy/accelerate.py

File without changes

{gsppy-3.0.0 → gsppy-3.1.1}/gsppy/cli.py

File without changes

{gsppy-3.0.0 → gsppy-3.1.1}/gsppy/gsp.py

File without changes

{gsppy-3.0.0 → gsppy-3.1.1}/mypy.ini

File without changes

{gsppy-3.0.0 → gsppy-3.1.1}/rust/Cargo.lock

File without changes

{gsppy-3.0.0 → gsppy-3.1.1}/rust/Cargo.toml

File without changes

{gsppy-3.0.0 → gsppy-3.1.1}/rust/src/lib.rs

File without changes

{gsppy-3.0.0 → gsppy-3.1.1}/tests/__init__.py

File without changes

{gsppy-3.0.0 → gsppy-3.1.1}/tests/test_cli.py

File without changes

{gsppy-3.0.0 → gsppy-3.1.1}/tox.ini

File without changes
