gsppy 3.4.3__tar.gz → 3.6.0__tar.gz

@@ -1,6 +1,86 @@
1
1
  # CHANGELOG
2
2
 
3
3
 
4
+ ## v3.6.0 (2026-01-26)
5
+
6
+ ### Chores
7
+
8
+ - Update uv.lock for version 3.5.0
9
+ ([`e2c1be0`](https://github.com/jacksonpradolima/gsp-py/commit/e2c1be0945b0b124d8afa8981877513449b29ff0))
10
+
11
+ ### Features
12
+
13
+ - Add flexible pruning strategy system to GSP algorithm
14
+ ([`94089cc`](https://github.com/jacksonpradolima/gsp-py/commit/94089cc5716ec6d7c7a6e0720843162db116fca2))
15
+
16
+ feat: add flexible pruning strategy system to GSP algorithm
17
+
18
+ - Add typing-extensions as a dependency
19
+ ([`6222945`](https://github.com/jacksonpradolima/gsp-py/commit/62229455ef3976c405d96e5ea9d5cafaf5eee6e3))
20
+
21
+ ### Refactoring
22
+
23
+ - Pruning strategy initialization and enhance type hints; add typing_extensions dependency
24
+ ([`ddc0abd`](https://github.com/jacksonpradolima/gsp-py/commit/ddc0abd9352797dd19988f60d6287da421ef60cf))
25
+
26
+
27
+ ## v3.5.0 (2026-01-26)
28
+
29
+ ### Bug Fixes
30
+
31
+ - Address code review feedback
32
+ ([`1e7cf86`](https://github.com/jacksonpradolima/gsp-py/commit/1e7cf8681b3cd0432e6d1608187b7d518c27fcc0))
33
+
34
+ - Remove root logger modifications to prevent global side effects - Fix redundant logger
35
+ configuration in CLI - Remove redundant subprocess imports in tests - Revert unrelated formatting
36
+ changes in temporal constraints tests - Replace future dates with YYYY-MM-DD placeholders in
37
+ documentation - Add explanation for not using Loguru in logging documentation
38
+
39
+ All changes address feedback from code review while maintaining backward compatibility and test
40
+ coverage.
41
+
42
+ Co-authored-by: jacksonpradolima <7774063+jacksonpradolima@users.noreply.github.com>
43
+
44
+ - Specify logger name in caplog for verbose tests
45
+ ([`cb477b0`](https://github.com/jacksonpradolima/gsp-py/commit/cb477b0f040ce38b60b6e3d485536e79d6d3ea19))
46
+
47
+ Update test_verbose_initialization, test_non_verbose_initialization, and
48
+ test_verbose_override_in_search to use caplog.at_level(logging.DEBUG, logger='gsppy.gsp') instead
49
+ of just caplog.at_level(logging.DEBUG). This ensures tests only capture logs from the gsppy.gsp
50
+ logger, preventing interference from other loggers and making tests more reliable.
51
+
52
+ Co-authored-by: jacksonpradolima <7774063+jacksonpradolima@users.noreply.github.com>
53
+
54
+ - Update test_setup_logging_verbose to match refactored logging
55
+ ([`ab78c33`](https://github.com/jacksonpradolima/gsp-py/commit/ab78c33ee1c09964773b1af835c9bb133a778824))
56
+
57
+ Update test to verify logging.basicConfig is called with DEBUG level instead of checking the removed
58
+ explicit logger.setLevel call. This aligns with the refactored logging configuration that removed
59
+ redundant logger level setting.
60
+
61
+ Co-authored-by: jacksonpradolima <7774063+jacksonpradolima@users.noreply.github.com>
62
+
63
+ ### Chores
64
+
65
+ - Update uv.lock for version 3.4.3
66
+ ([`6a78997`](https://github.com/jacksonpradolima/gsp-py/commit/6a789979fd6a7422c063dbe5b2ff46cd0d2141c6))
67
+
68
+ ### Features
69
+
70
+ - Add explicit verbosity control and structured logging
71
+ ([`44f56d9`](https://github.com/jacksonpradolima/gsp-py/commit/44f56d947978ddad1b7f2a2cca00f59def0ce4e4))
72
+
73
+ feat: add explicit verbosity control and structured logging
74
+
75
+ ### Refactoring
76
+
77
+ - Gsp initialization in tests to handle constraints explicitly and improve verbosity handling
78
+ ([`ced0243`](https://github.com/jacksonpradolima/gsp-py/commit/ced0243e58ff444988e37f5ae472f58d4478498e))
79
+
80
+ - Gsp initialization in tests to handle constraints explicitly and improve verbosity handling
81
+ ([`479f305`](https://github.com/jacksonpradolima/gsp-py/commit/479f305aae02217ce7b75fede5e0fb249fd1b477))
82
+
83
+
4
84
  ## v3.4.3 (2026-01-25)
5
85
 
6
86
  ### Bug Fixes
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: gsppy
3
- Version: 3.4.3
3
+ Version: 3.6.0
4
4
  Summary: GSP (Generalized Sequence Pattern) algorithm in Python
5
5
  Project-URL: Homepage, https://github.com/jacksonpradolima/gsp-py
6
6
  Author-email: Jackson Antonio do Prado Lima <jacksonpradolima@gmail.com>
@@ -40,6 +40,7 @@ Classifier: Topic :: Scientific/Engineering :: Information Analysis
40
40
  Classifier: Topic :: Software Development :: Libraries :: Python Modules
41
41
  Requires-Python: >=3.10
42
42
  Requires-Dist: click>=8.0.0
43
+ Requires-Dist: typing-extensions>=4.0.0
43
44
  Provides-Extra: dev
44
45
  Requires-Dist: cython==3.2.4; extra == 'dev'
45
46
  Requires-Dist: hatch==1.16.3; extra == 'dev'
@@ -105,6 +106,7 @@ Sequence Pattern (GSP)** algorithm. Ideal for market basket analysis, temporal m
105
106
  6. [💡 Usage](#usage)
106
107
  - [✅ Example: Analyzing Sales Data](#example-analyzing-sales-data)
107
108
  - [📊 Explanation: Support and Results](#explanation-support-and-results)
109
+ - [⏱️ Temporal Constraints](#temporal-constraints)
108
110
  7. [⌨️ Typing](#typing)
109
111
  8. [🌟 Planned Features](#planned-features)
110
112
  9. [🤝 Contributing](#contributing)
@@ -123,6 +125,7 @@ principles**. Using support thresholds, GSP identifies frequent sequences of ite
123
125
  - **Ordered (non-contiguous) matching**: Detects patterns where items appear in order but not necessarily adjacent, following standard GSP semantics. For example, the pattern `('A', 'C')` is found in the sequence `['A', 'B', 'C']`.
124
126
  - **Support-based pruning**: Only retains sequences that meet the minimum support threshold.
125
127
  - **Candidate generation**: Iteratively generates candidate sequences of increasing length.
128
+ - **Temporal constraints**: Support for time-constrained pattern mining with `mingap`, `maxgap`, and `maxspan` parameters to find patterns within specific time windows.
126
129
  - **General-purpose**: Useful in retail, web analytics, social networks, temporal sequence mining, and more.
127
130
 
128
131
  For example:
@@ -373,7 +376,28 @@ gsppy --file path/to/transactions.csv --min_support 0.3 --backend rust
373
376
  - `--file`: Path to your input file (JSON or CSV). **Required**.
374
377
  - `--min_support`: Minimum support threshold as a fraction (e.g., `0.3` for 30%). Default is `0.2`.
375
378
  - `--backend`: Backend to use for support counting. One of `auto` (default), `python`, `rust`, or `gpu`.
376
- - `--verbose`: (Optional) Enable detailed output for debugging.
379
+ - `--verbose`: Enable detailed logging with timestamps, log levels, and process IDs for debugging and traceability.
380
+ - `--mingap`, `--maxgap`, `--maxspan`: Temporal constraints for time-aware pattern mining (requires timestamped transactions).
381
+
382
+ #### Verbose Mode
383
+
384
+ For debugging or to track execution in CI/CD pipelines, use the `--verbose` flag:
385
+
386
+ ```bash
387
+ gsppy --file transactions.json --min_support 0.3 --verbose
388
+ ```
389
+
390
+ This produces structured logging output with timestamps, log levels, and process information:
391
+
392
+ ```
393
+ YYYY-MM-DDTHH:MM:SS | INFO | PID:4179 | gsppy.gsp | Pre-processing transactions...
394
+ YYYY-MM-DDTHH:MM:SS | DEBUG | PID:4179 | gsppy.gsp | Unique candidates: [('Bread',), ('Milk',), ...]
395
+ YYYY-MM-DDTHH:MM:SS | INFO | PID:4179 | gsppy.gsp | Starting GSP algorithm with min_support=0.3...
396
+ YYYY-MM-DDTHH:MM:SS | INFO | PID:4179 | gsppy.gsp | Run 1: 6 candidates filtered to 5.
397
+ ...
398
+ ```
399
+
400
+ For complete logging documentation, see [docs/logging.md](docs/logging.md).
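A line shape like the one shown above can be produced with the standard library alone. The following is a minimal sketch, not gsppy's actual configuration: the format string, the `build_formatter` and `configure_named_logger` helpers, and the hand-built record are assumptions made for illustration.

```python
import logging

# Assumed format; gsppy's real format string may differ (see docs/logging.md).
LOG_FORMAT = "%(asctime)s | %(levelname)s | PID:%(process)d | %(name)s | %(message)s"
DATE_FORMAT = "%Y-%m-%dT%H:%M:%S"


def build_formatter() -> logging.Formatter:
    """Formatter yielding 'timestamp | LEVEL | PID:n | logger | message'."""
    return logging.Formatter(LOG_FORMAT, datefmt=DATE_FORMAT)


def configure_named_logger(name: str, verbose: bool) -> logging.Logger:
    """Attach a stream handler to one named logger, leaving the root logger alone."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(build_formatter())
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    return logger


# Format a single record by hand to show the resulting line shape.
record = logging.LogRecord(
    name="gsppy.gsp", level=logging.INFO, pathname="example.py", lineno=0,
    msg="Pre-processing transactions...", args=(), exc_info=None,
)
line = build_formatter().format(record)
print(line)
```

Configuring a named logger instead of the root logger keeps the handler and level from leaking into other libraries in the same process, which matches the intent of the v3.5.0 changelog entry about avoiding global side effects.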
377
401
 
378
402
  #### Example
379
403
 
@@ -470,6 +494,30 @@ result = GSP(transactions).search(min_support)
470
494
  print(result)
471
495
  ```
472
496
 
497
+ ### Verbose Mode for Debugging
498
+
499
+ Enable detailed logging to track algorithm progress and debug issues:
500
+
501
+ ```python
502
+ from gsppy.gsp import GSP
503
+
504
+ # Enable verbose logging for the entire instance
505
+ gsp = GSP(transactions, verbose=True)
506
+ result = gsp.search(min_support=0.3)
507
+
508
+ # Or enable verbose for a specific search only
509
+ gsp = GSP(transactions)
510
+ result = gsp.search(min_support=0.3, verbose=True)
511
+ ```
512
+
513
+ Verbose mode provides:
514
+ - Detailed progress information during execution
515
+ - Candidate generation and filtering statistics
516
+ - Preprocessing and validation details
517
+ - Output suited to debugging, research, and CI/CD integration
518
+
519
+ For complete documentation on logging, see [docs/logging.md](docs/logging.md).
520
+
473
521
  ### Output
474
522
 
475
523
  The algorithm will return a list of patterns with their corresponding support.
@@ -536,6 +584,262 @@ result = gsp.search(min_support=0.5) # Need at least 2/4 sequences
536
584
 
537
585
  ---
538
586
 
587
+ ## ⏱️ Temporal Constraints
588
+
589
+ GSP-Py supports **time-constrained sequential pattern mining** with three temporal constraints: `mingap`, `maxgap`, and `maxspan`. These constraints support domain-specific applications such as medical event mining, retail analytics, and temporal user journey discovery.
590
+
591
+ ### Temporal Constraint Parameters
592
+
593
+ - **`mingap`**: Minimum time gap required between consecutive items in a pattern
594
+ - **`maxgap`**: Maximum time gap allowed between consecutive items in a pattern
595
+ - **`maxspan`**: Maximum time span from the first to the last item in a pattern
596
+
597
+ ### Using Temporal Constraints
598
+
599
+ To use temporal constraints, your transactions must include timestamps as (item, timestamp) tuples:
600
+
601
+ ```python
602
+ from gsppy.gsp import GSP
603
+
604
+ # Transactions with timestamps (e.g., in seconds, hours, days, etc.)
605
+ timestamped_transactions = [
606
+     [("Login", 0), ("Browse", 2), ("AddToCart", 5), ("Purchase", 7)],
607
+     [("Login", 0), ("Browse", 1), ("AddToCart", 15), ("Purchase", 20)],
608
+     [("Login", 0), ("Browse", 3), ("AddToCart", 6), ("Purchase", 8)],
609
+ ]
610
+
611
+ # Find patterns where consecutive events occur within 10 time units
612
+ gsp = GSP(timestamped_transactions, maxgap=10)
613
+ patterns = gsp.search(min_support=0.6)
614
+
615
+ # The pattern ("Browse", "AddToCart", "Purchase") will:
616
+ # - Be found in transaction 1: gaps are 3 and 2 (both ≤ 10) ✅
617
+ # - NOT be found in transaction 2: gap between Browse→AddToCart is 14 (exceeds maxgap) ❌
618
+ # - Be found in transaction 3: gaps are 3 and 2 (both ≤ 10) ✅
619
+ # Result: Support = 2/3 ≈ 67% (above threshold of 60%)
620
+ ```
621
+
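The support figure in the example above can be checked by hand with a small matcher. This is a sketch under stated assumptions, not gsppy's implementation: the `occurs` helper is hypothetical, it assumes each sequence is sorted by timestamp, and it treats `mingap`, `maxgap`, and `maxspan` as inclusive bounds.

```python
from typing import Optional, Sequence, Tuple

Event = Tuple[str, float]  # (item, timestamp)


def occurs(
    pattern: Sequence[str],
    sequence: Sequence[Event],
    mingap: Optional[float] = None,
    maxgap: Optional[float] = None,
    maxspan: Optional[float] = None,
) -> bool:
    """Return True if `pattern` appears in order in `sequence` within the constraints."""

    def extend(p_idx: int, start: int, first_t: float, prev_t: float) -> bool:
        if p_idx == len(pattern):
            return True
        for i in range(start, len(sequence)):
            item, t = sequence[i]
            if item != pattern[p_idx]:
                continue
            gap = t - prev_t
            if mingap is not None and gap < mingap:
                continue
            if maxgap is not None and gap > maxgap:
                continue
            if maxspan is not None and t - first_t > maxspan:
                continue
            # Backtrack: a later occurrence may succeed where this one fails.
            if extend(p_idx + 1, i + 1, first_t, t):
                return True
        return False

    if not pattern:
        return True
    # Try every occurrence of the first pattern item as a starting point.
    return any(
        item == pattern[0] and extend(1, i + 1, t, t)
        for i, (item, t) in enumerate(sequence)
    )


transactions = [
    [("Login", 0), ("Browse", 2), ("AddToCart", 5), ("Purchase", 7)],
    [("Login", 0), ("Browse", 1), ("AddToCart", 15), ("Purchase", 20)],
    [("Login", 0), ("Browse", 3), ("AddToCart", 6), ("Purchase", 8)],
]
pattern = ("Browse", "AddToCart", "Purchase")
matches = [occurs(pattern, tx, maxgap=10) for tx in transactions]
support = sum(matches) / len(transactions)
print(matches, round(support, 2))  # [True, False, True] 0.67
```

Backtracking matters once gap constraints are involved: taking the first occurrence of an item greedily can violate `maxgap` even when a later occurrence of the same item would satisfy it.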
622
+ ### CLI Usage with Temporal Constraints
623
+
624
+ ```bash
625
+ # Find patterns with maximum gap of 5 time units
626
+ gsppy --file temporal_data.json --min_support 0.3 --maxgap 5
627
+
628
+ # Find patterns with minimum gap of 2 time units
629
+ gsppy --file temporal_data.json --min_support 0.3 --mingap 2
630
+
631
+ # Find patterns that complete within 10 time units
632
+ gsppy --file temporal_data.json --min_support 0.3 --maxspan 10
633
+
634
+ # Combine multiple constraints
635
+ gsppy --file temporal_data.json --min_support 0.3 --mingap 1 --maxgap 5 --maxspan 10
636
+ ```
637
+
638
+ ### Real-World Examples
639
+
640
+ #### Medical Event Mining
641
+
642
+ ```python
643
+ from gsppy.gsp import GSP
644
+
645
+ # Medical events with timestamps in days
646
+ medical_sequences = [
647
+     [("Symptom", 0), ("Diagnosis", 2), ("Treatment", 5), ("Recovery", 15)],
648
+     [("Symptom", 0), ("Diagnosis", 1), ("Treatment", 20), ("Recovery", 30)],
649
+     [("Symptom", 0), ("Diagnosis", 3), ("Treatment", 6), ("Recovery", 18)],
650
+ ]
651
+
652
+ # Require every consecutive pair of events in a pattern to be at most 10 days apart
653
+ gsp = GSP(medical_sequences, maxgap=10)
654
+ result = gsp.search(min_support=0.5)
655
+
656
+ # Pattern ("Diagnosis", "Treatment") found in sequences 1 & 3 only
657
+ # (sequence 2 has gap of 19 days, exceeding maxgap)
658
+ ```
659
+
660
+ #### Retail Analytics
661
+
662
+ ```python
663
+ from gsppy.gsp import GSP
664
+
665
+ # Customer purchases with timestamps in hours
666
+ purchase_sequences = [
667
+     [("Browse", 0), ("AddToCart", 0.5), ("Purchase", 1)],
668
+     [("Browse", 0), ("AddToCart", 1), ("Purchase", 25)], # Long delay
669
+     [("Browse", 0), ("AddToCart", 0.3), ("Purchase", 0.8)],
670
+ ]
671
+
672
+ # Find purchase journeys that complete within 2 hours
673
+ gsp = GSP(purchase_sequences, maxspan=2)
674
+ result = gsp.search(min_support=0.5)
675
+
676
+ # Full sequence found in 2 out of 3 transactions
677
+ # (sequence 2 has span of 25 hours, exceeding maxspan)
678
+ ```
679
+
680
+ #### User Journey Discovery
681
+
682
+ ```python
683
+ from gsppy.gsp import GSP
684
+
685
+ # Website navigation with timestamps in seconds
686
+ navigation_sequences = [
687
+     [("Home", 0), ("Search", 5), ("Product", 10), ("Checkout", 15)],
688
+     [("Home", 0), ("Search", 3), ("Product", 8), ("Checkout", 180)],
689
+     [("Home", 0), ("Search", 4), ("Product", 9), ("Checkout", 14)],
690
+ ]
691
+
692
+ # Find navigation patterns with:
693
+ # - Minimum 2 seconds between steps (mingap)
694
+ # - Maximum 20 seconds between steps (maxgap)
695
+ # - Complete within 30 seconds total (maxspan)
696
+ gsp = GSP(navigation_sequences, mingap=2, maxgap=20, maxspan=30)
697
+ result = gsp.search(min_support=0.5)
698
+ ```
699
+
700
+ ### Important Notes
701
+
702
+ - Temporal constraints require timestamped transactions (item-timestamp tuples)
703
+ - If temporal constraints are specified but transactions don't have timestamps, a warning is logged and constraints are ignored
704
+ - When using temporal constraints, the Python backend is automatically used (accelerated backends don't yet support temporal constraints)
705
+ - Timestamps can be in any unit (seconds, minutes, hours, days) as long as they're consistent within your dataset
706
+
707
+ ---
708
+
709
+ ## 🔧 Flexible Candidate Pruning
710
+
711
+ GSP-Py supports **flexible candidate pruning strategies** that allow you to customize how candidate sequences are filtered during pattern mining. This enables optimization for different dataset characteristics and mining requirements.
712
+
713
+ ### Built-in Pruning Strategies
714
+
715
+ #### 1. Support-Based Pruning (Default)
716
+
717
+ The standard GSP pruning based on minimum support threshold:
718
+
719
+ ```python
720
+ from gsppy.gsp import GSP
721
+ from gsppy.pruning import SupportBasedPruning
722
+
723
+ # Explicit support-based pruning
724
+ pruner = SupportBasedPruning(min_support_fraction=0.3)
725
+ gsp = GSP(transactions, pruning_strategy=pruner)
726
+ result = gsp.search(min_support=0.3)
727
+ ```
728
+
729
+ #### 2. Frequency-Based Pruning
730
+
731
+ Prunes candidates based on absolute frequency (minimum number of occurrences):
732
+
733
+ ```python
734
+ from gsppy.pruning import FrequencyBasedPruning
735
+
736
+ # Require patterns to appear at least 5 times
737
+ pruner = FrequencyBasedPruning(min_frequency=5)
738
+ gsp = GSP(transactions, pruning_strategy=pruner)
739
+ result = gsp.search(min_support=0.2)
740
+ ```
741
+
742
+ **Use case**: When you need patterns to occur a minimum absolute number of times, regardless of dataset size.
743
+
744
+ #### 3. Temporal-Aware Pruning
745
+
746
+ Optimizes pruning for time-constrained pattern mining by pre-filtering infeasible patterns:
747
+
748
+ ```python
749
+ from gsppy.pruning import TemporalAwarePruning
750
+
751
+ # Prune patterns that cannot satisfy temporal constraints
752
+ pruner = TemporalAwarePruning(
753
+     mingap=1,
754
+     maxgap=5,
755
+     maxspan=10,
756
+     min_support_fraction=0.3
757
+ )
758
+ gsp = GSP(timestamped_transactions, mingap=1, maxgap=5, maxspan=10, pruning_strategy=pruner)
759
+ result = gsp.search(min_support=0.3)
760
+ ```
761
+
762
+ **Use case**: Improves performance for temporal pattern mining by eliminating patterns that cannot satisfy temporal constraints.
763
+
764
+ #### 4. Combined Pruning
765
+
766
+ Combines multiple pruning strategies for aggressive filtering:
767
+
768
+ ```python
769
+ from gsppy.pruning import CombinedPruning, SupportBasedPruning, FrequencyBasedPruning
770
+
771
+ # Apply both support and frequency constraints
772
+ strategies = [
773
+     SupportBasedPruning(min_support_fraction=0.3),
774
+     FrequencyBasedPruning(min_frequency=5)
775
+ ]
776
+ pruner = CombinedPruning(strategies)
777
+ gsp = GSP(transactions, pruning_strategy=pruner)
778
+ result = gsp.search(min_support=0.3)
779
+ ```
780
+
781
+ **Use case**: When you want to combine multiple filtering criteria for more selective pattern discovery.
782
+
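To make the idea of combining criteria concrete, here is a self-contained sketch that does not import gsppy. The `MinFraction`, `MinCount`, and `AnyOf` classes are hypothetical stand-ins that only borrow the `should_prune` method name from the README; the rule that a candidate is pruned as soon as any strategy rejects it is an assumption, and the library's `CombinedPruning` may behave differently.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence, Tuple


class Pruner(Protocol):
    def should_prune(
        self, candidate: Tuple[str, ...], support_count: int, total_transactions: int
    ) -> bool: ...


@dataclass
class MinFraction:
    """Prune when the relative support falls below `fraction`."""
    fraction: float

    def should_prune(self, candidate, support_count, total_transactions) -> bool:
        # Assumes total_transactions > 0.
        return support_count / total_transactions < self.fraction


@dataclass
class MinCount:
    """Prune when the absolute number of supporting sequences is below `count`."""
    count: int

    def should_prune(self, candidate, support_count, total_transactions) -> bool:
        return support_count < self.count


@dataclass
class AnyOf:
    """Assumed combination rule: prune if any member strategy prunes."""
    pruners: Sequence[Pruner]

    def should_prune(self, candidate, support_count, total_transactions) -> bool:
        return any(
            p.should_prune(candidate, support_count, total_transactions)
            for p in self.pruners
        )


combined = AnyOf([MinFraction(0.3), MinCount(5)])
candidate = ("A", "B")

# 4 of 10 sequences: passes the 30% fraction but fails the absolute count of 5.
print(combined.should_prune(candidate, 4, 10))   # True
# 6 of 10 sequences: passes both criteria, so the candidate is kept.
print(combined.should_prune(candidate, 6, 10))   # False
# 6 of 100 sequences: passes the count but fails the 30% fraction.
print(combined.should_prune(candidate, 6, 100))  # True
```

The three calls show why the two thresholds are not interchangeable: a fractional threshold scales with dataset size, while an absolute count does not.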
783
+ ### Custom Pruning Strategies
784
+
785
+ You can create custom pruning strategies by implementing the `PruningStrategy` interface:
786
+
787
+ ```python
788
+ from gsppy.pruning import PruningStrategy
789
+ from typing import Dict, Optional, Tuple
790
+
791
+ class MyCustomPruner(PruningStrategy):
792
+     def should_prune(
793
+         self,
794
+         candidate: Tuple[str, ...],
795
+         support_count: int,
796
+         total_transactions: int,
797
+         context: Optional[Dict] = None
798
+     ) -> bool:
799
+         # Custom pruning logic
800
+         # Return True to prune (filter out), False to keep
801
+         pattern_length = len(candidate)
802
+         # Example: Prune very long patterns with low support
803
+         if pattern_length > 5 and support_count < 10:
804
+             return True
805
+         return False
806
+
807
+ # Use your custom pruner
808
+ custom_pruner = MyCustomPruner()
809
+ gsp = GSP(transactions, pruning_strategy=custom_pruner)
810
+ result = gsp.search(min_support=0.2)
811
+ ```
812
+
813
+ ### Performance Characteristics
814
+
815
+ Different pruning strategies have different performance tradeoffs:
816
+
817
+ | Strategy | Pruning Aggressiveness | Use Case | Performance Impact |
818
+ |----------|----------------------|----------|-------------------|
819
+ | **SupportBased** | Moderate | General-purpose mining | Baseline performance |
820
+ | **FrequencyBased** | High (for large datasets) | Require absolute frequency | Faster on large datasets |
821
+ | **TemporalAware** | High (for temporal data) | Time-constrained patterns | Significant speedup for temporal mining |
822
+ | **Combined** | Very High | Selective pattern discovery | Fastest, but may miss edge cases |
823
+
824
+ ### Benchmarking Pruning Strategies
825
+
826
+ To compare pruning strategies on your dataset:
827
+
828
+ ```bash
829
+ # Compare all strategies
830
+ python benchmarks/bench_pruning.py --n_tx 1000 --vocab 100 --min_support 0.2 --strategy all
831
+
832
+ # Benchmark a specific strategy
833
+ python benchmarks/bench_pruning.py --n_tx 1000 --vocab 100 --min_support 0.2 --strategy frequency
834
+
835
+ # Run multiple rounds for averaging
836
+ python benchmarks/bench_pruning.py --n_tx 1000 --vocab 100 --min_support 0.2 --strategy all --rounds 3
837
+ ```
838
+
839
+ See `benchmarks/bench_pruning.py` for the complete benchmarking script.
840
+
841
+ ---
842
+
539
843
  ## ⌨️ Typing
540
844
 
541
845
  `gsppy` ships inline type information (PEP 561) via a bundled `py.typed` marker. The public API is re-exported from
@@ -549,17 +853,9 @@ larger applications.
549
853
 
550
854
  We are actively working to improve GSP-Py. Here are some exciting features planned for future releases:
551
855
 
552
- 1. **Custom Filters for Candidate Pruning**:
553
- - Enable users to define their own pruning logic during the mining process.
554
-
555
- 2. **Support for Preprocessing and Postprocessing**:
856
+ 1. **Support for Preprocessing and Postprocessing**:
556
857
  - Add hooks to allow users to transform datasets before mining and customize the output results.
557
858
 
558
- 3. **Support for Time-Constrained Pattern Mining**:
559
- - Extend GSP-Py to handle temporal datasets by allowing users to define time constraints (e.g., maximum time gaps
560
- between events, time windows) during the sequence mining process.
561
- - Enable candidate pruning and support calculations based on these temporal constraints.
562
-
563
859
  Want to contribute or suggest an
564
860
  improvement? [Open a discussion or issue!](https://github.com/jacksonpradolima/gsp-py/issues)
565
861