
tnapy 0.1.0__tar.gz


Files changed (36)

tnapy-0.1.0/LICENSE +21 -0
tnapy-0.1.0/PKG-INFO +450 -0
tnapy-0.1.0/README.md +410 -0
tnapy-0.1.0/pyproject.toml +92 -0
tnapy-0.1.0/setup.cfg +4 -0
tnapy-0.1.0/tests/test_bootstrap.py +759 -0
tnapy-0.1.0/tests/test_cluster.py +428 -0
tnapy-0.1.0/tests/test_compare.py +186 -0
tnapy-0.1.0/tests/test_edge_cases.py +509 -0
tnapy-0.1.0/tests/test_numerical.py +457 -0
tnapy-0.1.0/tests/test_plot.py +625 -0
tnapy-0.1.0/tests/test_r_equivalence.py +568 -0
tnapy-0.1.0/tna/__init__.py +155 -0
tnapy-0.1.0/tna/bootstrap.py +1771 -0
tnapy-0.1.0/tna/centralities.py +453 -0
tnapy-0.1.0/tna/cliques.py +176 -0
tnapy-0.1.0/tna/cluster.py +473 -0
tnapy-0.1.0/tna/colors.py +298 -0
tnapy-0.1.0/tna/communities.py +251 -0
tnapy-0.1.0/tna/compare.py +444 -0
tnapy-0.1.0/tna/data/__init__.py +92 -0
tnapy-0.1.0/tna/data/group_regulation.csv +2001 -0
tnapy-0.1.0/tna/data/group_regulation_long.csv +27534 -0
tnapy-0.1.0/tna/group.py +333 -0
tnapy-0.1.0/tna/model.py +353 -0
tnapy-0.1.0/tna/plot.py +1977 -0
tnapy-0.1.0/tna/prepare.py +486 -0
tnapy-0.1.0/tna/prune.py +52 -0
tnapy-0.1.0/tna/py.typed +0 -0
tnapy-0.1.0/tna/transitions.py +527 -0
tnapy-0.1.0/tna/utils.py +154 -0
tnapy-0.1.0/tnapy.egg-info/PKG-INFO +450 -0
tnapy-0.1.0/tnapy.egg-info/SOURCES.txt +34 -0
tnapy-0.1.0/tnapy.egg-info/dependency_links.txt +1 -0
tnapy-0.1.0/tnapy.egg-info/requires.txt +16 -0
tnapy-0.1.0/tnapy.egg-info/top_level.txt +2 -0

tnapy-0.1.0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2025 TNA Contributors
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

tnapy-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,450 @@
+Metadata-Version: 2.4
+Name: tnapy
+Version: 0.1.0
+Summary: Transition Network Analysis for Python
+Author-email: Mohammed Saqr <mohammed.saqr@uef.fi>, Santtu Tikka <santtu.tikka@jyu.fi>, Sonsoles López-Pernas <sonsoles.lopez@uef.fi>
+License: MIT
+Project-URL: Homepage, https://github.com/mohsaqr/tnapy
+Project-URL: Documentation, https://github.com/mohsaqr/tnapy#readme
+Project-URL: Repository, https://github.com/mohsaqr/tnapy
+Project-URL: Issues, https://github.com/mohsaqr/tnapy/issues
+Keywords: network analysis,transition networks,sequence analysis,markov chains,centrality,learning analytics
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Science/Research
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Scientific/Engineering
+Classifier: Topic :: Scientific/Engineering :: Information Analysis
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: numpy>=1.21.0
+Requires-Dist: pandas>=1.3.0
+Requires-Dist: networkx>=2.6.0
+Requires-Dist: scipy>=1.7.0
+Requires-Dist: matplotlib>=3.5.0
+Requires-Dist: seaborn>=0.11.0
+Provides-Extra: dev
+Requires-Dist: pytest>=7.0.0; extra == "dev"
+Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
+Requires-Dist: ruff>=0.1.0; extra == "dev"
+Requires-Dist: mypy>=1.0.0; extra == "dev"
+Provides-Extra: docs
+Requires-Dist: sphinx>=5.0.0; extra == "docs"
+Requires-Dist: sphinx-rtd-theme>=1.0.0; extra == "docs"
+Dynamic: license-file
+# TNA - Transition Network Analysis for Python
+A Python package for analyzing sequential data as transition networks, designed for **numerical equivalence** (within floating-point tolerance) with the [R TNA package](https://cran.r-project.org/package=tna).
+## Features
+- **8 Model Types**: relative, frequency, co-occurrence, reverse, n-gram, gap, window, attention
+- **9 Centrality Measures**: OutStrength, InStrength, ClosenessIn, ClosenessOut, Closeness, Betweenness, BetweennessRSP, Diffusion, Clustering
+- **Statistical Inference**: Bootstrap resampling, permutation tests, confidence intervals
+- **10+ Visualization Functions**: Network plots, heatmaps, centrality charts, sequence plots
+- **R Package Equivalence**: Verified numerical equivalence with comprehensive test suite
+## Installation
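+```bash
+# Install the released package from PyPI
+pip install tnapy
+# Development installation (from a clone of the repository)
+pip install -e .
+# Or install only the runtime dependencies
+pip install numpy pandas networkx scipy matplotlib seaborn
+```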
+## Quick Start
+```python
+import tna
+import pandas as pd
+# Load example data (2000 learning sessions with 9 self-regulated learning behaviors)
+df = tna.load_group_regulation()
+# Build a TNA model (relative transition probabilities)
+model = tna.tna(df)
+print(model)
+# Compute centrality measures
+cent = tna.centralities(model)
+print(cent)
+# Visualize the network
+tna.plot_network(model, layout='circular', edge_threshold=0.05)
+# Visualize centralities
+tna.plot_centralities(cent, measures=['OutStrength', 'InStrength', 'Betweenness'])
+```
+## Model Building
+### Basic Models
+```python
+# Relative transition probabilities (default)
+model = tna.tna(df)
+# Frequency model (raw counts)
+fmodel = tna.ftna(df)
+# Co-occurrence model (bidirectional)
+cmodel = tna.ctna(df)
+# Attention model (exponential decay weighting)
+amodel = tna.atna(df, beta=0.1)
+```
+### Advanced Model Types
+```python
+# All model types via build_model()
+model = tna.build_model(df, type_='relative')      # Row-normalized probabilities
+model = tna.build_model(df, type_='frequency')     # Raw transition counts
+model = tna.build_model(df, type_='co-occurrence') # Bidirectional co-occurrence
+model = tna.build_model(df, type_='reverse')       # Reverse order transitions
+model = tna.build_model(df, type_='n-gram', params={'n': 2})  # Higher-order n-grams
+model = tna.build_model(df, type_='gap', params={'max_gap': 3, 'decay': 0.5})  # Gap-weighted
+model = tna.build_model(df, type_='window', params={'size': 3})  # Sliding window
+model = tna.build_model(df, type_='attention', params={'beta': 0.1})  # Attention-weighted
+```
+### Scaling Options
+```python
+# Apply scaling to weight matrix
+model = tna.tna(df, scaling='minmax')  # Min-max normalization [0, 1]
+model = tna.tna(df, scaling='max')     # Divide by maximum
+model = tna.tna(df, scaling='rank')    # Rank-based scaling
+model = tna.tna(df, scaling=['minmax', 'max'])  # Multiple scalings
+```
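To make the scaling options concrete, here is a minimal pure-Python sketch of min-max and max scaling on a small weight matrix. The helper names (`minmax_scale_sketch`, `max_scale_sketch`, `apply_scalings`) are hypothetical and are not part of tnapy; applying a list of scalings in order is an assumption about what `scaling=['minmax', 'max']` means. Rank scaling is left out because tie handling is implementation-specific.

```python
def minmax_scale_sketch(matrix):
    """Map all entries linearly so the smallest becomes 0 and the largest 1."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant matrix: avoid division by zero
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - lo) / (hi - lo) for v in row] for row in matrix]


def max_scale_sketch(matrix):
    """Divide every entry by the largest entry."""
    hi = max(v for row in matrix for v in row)
    if hi == 0:
        return [[0.0 for _ in row] for row in matrix]
    return [[v / hi for v in row] for row in matrix]


def apply_scalings(matrix, scalings):
    """Apply several scalings one after another (assumed order: left to right)."""
    funcs = {"minmax": minmax_scale_sketch, "max": max_scale_sketch}
    for name in scalings:
        matrix = funcs[name](matrix)
    return matrix


weights = [[1.0, 2.0], [3.0, 5.0]]  # made-up example weights
minmax = minmax_scale_sketch(weights)
maxed = max_scale_sketch(weights)
chained = apply_scalings(weights, ["minmax", "max"])
print(minmax)
print(maxed)
```

In this sketch, min-max scaling already puts the largest entry at 1, so chaining `'max'` after `'minmax'` leaves the matrix unchanged.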
+## Centrality Measures
+```python
+# Compute all centrality measures
+cent = tna.centralities(model)
+# Compute specific measures
+cent = tna.centralities(model, measures=['OutStrength', 'InStrength', 'Betweenness'])
+# With normalization
+cent = tna.centralities(model, normalize=True)
+# Include self-loops
+cent = tna.centralities(model, loops=True)
+```
+### Available Measures
+| Measure | Description |
+|---------|-------------|
+| `OutStrength` | Sum of outgoing edge weights |
+| `InStrength` | Sum of incoming edge weights |
+| `ClosenessIn` | Incoming closeness centrality |
+| `ClosenessOut` | Outgoing closeness centrality |
+| `Closeness` | Overall closeness (treats graph as undirected) |
+| `Betweenness` | Standard betweenness centrality |
+| `BetweennessRSP` | Randomized Shortest Path betweenness |
+| `Diffusion` | Diffusion centrality (Banerjee et al. 2014) |
+| `Clustering` | Weighted clustering coefficient (Zhang & Horvath 2005) |
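The two strength measures in the table are simple row and column sums of the weight matrix. The sketch below illustrates that idea in plain Python; the function `strengths` and the example matrix are hypothetical, and excluding self-loops by default is an assumption inferred from the `loops=True` option shown above.

```python
def strengths(weights, labels, loops=False):
    """Return (out_strength, in_strength) dicts keyed by state label.

    weights[i][j] is the weight of the edge from labels[i] to labels[j].
    With loops=False the diagonal (self-transitions) is ignored.
    """
    n = len(labels)
    out_s, in_s = {}, {}
    for i, label in enumerate(labels):
        out_s[label] = sum(weights[i][j] for j in range(n) if loops or i != j)
        in_s[label] = sum(weights[j][i] for j in range(n) if loops or i != j)
    return out_s, in_s


labels = ["A", "B", "C"]
weights = [  # made-up row-normalized transition matrix
    [0.5, 0.25, 0.25],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
]
out_s, in_s = strengths(weights, labels)
out_loops, in_loops = strengths(weights, labels, loops=True)
print(out_s)
print(in_s)
```

For a row-normalized matrix, every out-strength equals 1 once self-loops are included, so in-strength is usually the more informative of the two for relative models.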
+## Data Preparation
+### From Long Format Data
+```python
+# Prepare raw event data
+prepared = tna.prepare_data(
+    data=events_df,
+    actor='user_id',
+    time='timestamp',
+    action='event_type',
+    time_threshold=900  # 15-minute session timeout (in seconds)
+)
+# Build model from prepared data
+model = tna.tna(prepared)
+# Access statistics
+print(prepared.statistics)  # n_sessions, n_actors, etc.
+```
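The session timeout can be pictured as follows: events are grouped per actor, ordered by time, and a new session starts whenever the gap to the previous event exceeds the threshold. This is a hedged sketch of that general technique, not tnapy's `prepare_data` implementation; `split_sessions`, the tuple layout, and the strict `>` comparison are all assumptions made for illustration.

```python
from collections import defaultdict


def split_sessions(events, time_threshold=900):
    """Split (actor, timestamp_seconds, action) events into per-actor sessions."""
    by_actor = defaultdict(list)
    for actor, ts, action in events:
        by_actor[actor].append((ts, action))

    sessions = []
    for actor in sorted(by_actor):
        rows = sorted(by_actor[actor], key=lambda r: r[0])  # order by time
        current = [rows[0][1]]
        for (prev_ts, _), (ts, action) in zip(rows, rows[1:]):
            if ts - prev_ts > time_threshold:  # gap too long: close the session
                sessions.append((actor, current))
                current = []
            current.append(action)
        sessions.append((actor, current))
    return sessions


events = [  # made-up event log
    ("u1", 0, "plan"),
    ("u1", 60, "monitor"),
    ("u1", 2000, "plan"),  # 1940 s after the previous event: new session
    ("u2", 10, "discuss"),
    ("u2", 20, "plan"),
]
sessions = split_sessions(events)
for actor, actions in sessions:
    print(actor, actions)
```

Each resulting session is one sequence, which corresponds to one row of the wide format.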
+### From Wide Format Data
+```python
+# Direct from wide format (rows=sequences, cols=time steps)
+df = pd.DataFrame({
+    'step1': ['A', 'B', 'A'],
+    'step2': ['B', 'C', 'C'],
+    'step3': ['C', 'A', 'B']
+})
+model = tna.tna(df)
+```
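For intuition about what a relative model contains, the sketch below counts first-order transitions in the same three sequences and row-normalizes the counts. It illustrates the general technique in plain Python under simple assumptions (no missing values, first-order transitions only); `transition_counts` and `normalize_rows` are hypothetical helpers, not tnapy functions.

```python
from collections import defaultdict


def transition_counts(sequences):
    """Count how often each state is immediately followed by each other state."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for src, dst in zip(seq, seq[1:]):
            counts[src][dst] += 1
    return counts


def normalize_rows(counts):
    """Divide each row by its total so the outgoing weights of a state sum to 1."""
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: n / total for dst, n in row.items()}
    return probs


# One inner list per row of the wide-format example (step1, step2, step3).
sequences = [["A", "B", "C"], ["B", "C", "A"], ["A", "C", "B"]]
probs = normalize_rows(transition_counts(sequences))
for src in sorted(probs):
    print(src, dict(sorted(probs[src].items())))
```

Here `B` is always followed by `C`, while `A` and `C` each split their outgoing transitions evenly between the two other states.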
+## Statistical Inference
+### Bootstrap Analysis
+```python
+# Bootstrap confidence intervals for model parameters
+boot = tna.bootstrap_tna(df, n_boot=1000, ci=0.95, seed=42)
+# Get summary with CIs for all edges
+summary = boot.summary()
+# Find significant edges
+sig_edges = boot.significant_edges(threshold=0)
+# Bootstrap centrality measures
+cent_ci = tna.bootstrap_centralities(
+    df,
+    measures=['OutStrength', 'InStrength', 'Betweenness'],
+    n_boot=1000,
+    ci=0.95
+)
+```
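The idea behind a bootstrap confidence interval for an edge can be sketched without the library: resample whole sequences with replacement, recompute the edge weight each time, and read the interval off the sorted replicates. The code below is a minimal illustration using a nearest-rank percentile; `transition_prob` and `bootstrap_ci` are hypothetical names, and tnapy's own resampling scheme and interval method may differ.

```python
import random


def transition_prob(sequences, src, dst):
    """Share of transitions leaving `src` that go to `dst` (0.0 if none leave)."""
    out_total = hits = 0
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            if a == src:
                out_total += 1
                if b == dst:
                    hits += 1
    return hits / out_total if out_total else 0.0


def bootstrap_ci(sequences, src, dst, n_boot=1000, ci=0.95, seed=42):
    """Percentile CI from resampling sequences with replacement."""
    rng = random.Random(seed)
    n = len(sequences)
    stats = sorted(
        transition_prob([sequences[rng.randrange(n)] for _ in range(n)], src, dst)
        for _ in range(n_boot)
    )
    alpha = (1 - ci) / 2
    lo_idx = int(round(alpha * (n_boot - 1)))        # nearest-rank lower bound
    hi_idx = int(round((1 - alpha) * (n_boot - 1)))  # nearest-rank upper bound
    return stats[lo_idx], stats[hi_idx]


sequences = [  # made-up sequences
    ["A", "B", "C"], ["B", "C", "A"], ["A", "C", "B"],
    ["A", "B", "A"], ["C", "A", "B"],
]
observed = transition_prob(sequences, "A", "B")
low, high = bootstrap_ci(sequences, "A", "B", n_boot=500)
print(observed, (low, high))
```

Fixing the seed makes the interval reproducible, which is the role `seed=42` plays in the README example.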
+### Permutation Tests
+```python
+# Compare two groups
+result = tna.permutation_test(
+    group1_df, group2_df,
+    n_perm=1000,
+    statistic='weights',  # or 'density', 'centrality'
+    alternative='two-sided',
+    seed=42
+)
+print(f"P-value: {result.p_value}")
+print(f"Significant: {result.is_significant(0.05)}")
+# Edge-wise comparison with multiple testing correction
+edges = tna.permutation_test_edges(
+    group1_df, group2_df,
+    n_perm=1000,
+    correction='fdr'  # or 'bonferroni', 'none'
+)
+```
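A two-sided permutation test of this kind can be sketched in a few lines: pool the sequences from both groups, shuffle the group labels many times, and count how often the shuffled difference is at least as large as the observed one. The helper `permutation_p_value`, the toy statistic, and the add-one correction are illustrative assumptions, not a description of tnapy's implementation.

```python
import random


def permutation_p_value(group1, group2, statistic, n_perm=1000, seed=42):
    """Two-sided p-value for |statistic(group1) - statistic(group2)|."""
    rng = random.Random(seed)
    observed = abs(statistic(group1) - statistic(group2))
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of sequences to groups
        diff = abs(statistic(pooled[:n1]) - statistic(pooled[n1:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (extreme + 1) / (n_perm + 1)


def share_starting_with_a(sequences):
    """Toy statistic: fraction of sequences whose first state is 'A'."""
    return sum(seq[0] == "A" for seq in sequences) / len(sequences)


# Made-up groups with clearly different starting behavior.
group1 = [["A", "B"]] * 8 + [["B", "A"]] * 2
group2 = [["B", "A"]] * 8 + [["A", "B"]] * 2
p = permutation_p_value(group1, group2, share_starting_with_a, n_perm=500)
p_same = permutation_p_value(group1, group1, share_starting_with_a, n_perm=200)
print(p, p_same)
```

Comparing a group with itself gives an observed difference of zero, so every permutation counts as extreme and the p-value is 1.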
+### Confidence Intervals
+```python
+import numpy as np
+# `data` is the original sample; `boot_samples` holds the bootstrap replicates
+# Percentile method
+ci = tna.confidence_interval(boot_samples, ci=0.95, method='percentile')
+# BCa method (bias-corrected and accelerated)
+ci = tna.bca_ci(data, boot_samples, statistic_func=np.mean, ci=0.95)
+```
+## Visualization
+### Network Plots
+```python
+# Basic network plot
+tna.plot_network(model)
+# Customized network
+tna.plot_network(
+    model,
+    layout='circular',           # or 'spring', 'kamada_kawai'
+    node_size='OutStrength',     # Size by centrality
+    edge_threshold=0.05,         # Hide weak edges
+    node_color='steelblue',
+    edge_cmap='Blues'
+)
+# Network with bootstrap confidence intervals
+tna.plot_network_ci(boot, edge_alpha='significance')
+```
+### Centrality Plots
+```python
+# Bar charts for centralities
+tna.plot_centralities(
+    cent,
+    measures=['OutStrength', 'InStrength', 'Betweenness'],
+    ncol=3
+)
+```
+### Heatmap
+```python
+# Transition matrix heatmap
+tna.plot_heatmap(model, cmap='Blues', annotate=True)
+```
+### Model Comparison
+```python
+# Side-by-side comparison of two models
+tna.plot_comparison(
+    model1, model2,
+    plot_type='heatmap',
+    labels=('Group 1', 'Group 2')
+)
+```
+### Sequence Visualization
+```python
+# State distribution over time
+tna.plot_sequences(df, plot_type='distribution')
+# State frequencies
+tna.plot_frequencies(df)
+# Histogram of sequence lengths
+tna.plot_histogram(df)
+```
+### Statistical Plots
+```python
+# Bootstrap distribution
+tna.plot_bootstrap(boot, plot_type='weights')
+tna.plot_bootstrap(boot, plot_type='centrality', measure='OutStrength')
+# Permutation test null distribution
+tna.plot_permutation(result)
+```
+## Example Datasets
+```python
+# Wide format: 2000 sessions x 20 time steps
+df = tna.load_group_regulation()
+# Long format: Actor, Time, Action columns
+df_long = tna.load_group_regulation_long()
+```
+## API Reference
+### Model Building
+| Function | Description |
+|----------|-------------|
+| `tna(x)` | Build relative transition probability model |
+| `ftna(x)` | Build frequency (raw counts) model |
+| `ctna(x)` | Build co-occurrence model |
+| `atna(x, beta)` | Build attention-weighted model |
+| `build_model(x, type_)` | Build model with specified type |
+### Data Preparation
+| Function | Description |
+|----------|-------------|
+| `prepare_data(data, actor, time, action)` | Prepare long-format event data |
+| `create_seqdata(x)` | Create sequence data from various formats |
+### Centralities
+| Function | Description |
+|----------|-------------|
+| `centralities(model, measures)` | Compute centrality measures |
+### Statistical Inference
+| Function | Description |
+|----------|-------------|
+| `bootstrap_tna(x, n_boot)` | Bootstrap analysis of TNA model |
+| `bootstrap_centralities(x, measures, n_boot)` | Bootstrap centrality CIs |
+| `permutation_test(x1, x2, n_perm)` | Permutation test for group comparison |
+| `permutation_test_edges(x1, x2, n_perm)` | Edge-wise permutation tests |
+| `confidence_interval(samples, ci)` | Calculate confidence interval |
+| `bca_ci(data, samples, func, ci)` | BCa confidence interval |
+### Visualization
+| Function | Description |
+|----------|-------------|
+| `plot_network(model)` | Plot transition network |
+| `plot_centralities(cent)` | Plot centrality bar charts |
+| `plot_heatmap(model)` | Plot transition matrix heatmap |
+| `plot_comparison(m1, m2)` | Compare two models |
+| `plot_sequences(df)` | Plot sequence patterns |
+| `plot_frequencies(df)` | Plot state frequencies |
+| `plot_histogram(df)` | Plot sequence length histogram |
+| `plot_bootstrap(boot)` | Visualize bootstrap results |
+| `plot_permutation(result)` | Visualize permutation test |
+| `plot_network_ci(boot)` | Network with confidence intervals |
+### Utilities
+| Function | Description |
+|----------|-------------|
+| `row_normalize(matrix)` | Row-normalize a matrix |
+| `minmax_scale(matrix)` | Min-max scaling to [0, 1] |
+| `max_scale(matrix)` | Divide by maximum |
+| `rank_scale(matrix)` | Rank-based scaling |
+## R Package Equivalence
+This package is designed to produce numerically equivalent results to the R TNA package. Key equivalences:
+- **Transition matrices**: Identical computation of relative, frequency, and co-occurrence matrices
+- **Centrality measures**: Exact ports of R implementations including custom measures (diffusion, weighted clustering)
+- **Data format**: Compatible with R's wide-format sequence data
+### Verification
+```python
+# Python
+model_py = tna.tna(df)
+cent_py = tna.centralities(model_py)
+# Results match R within floating-point precision:
+# - Max absolute difference < 1e-10 for transition matrices
+# - Max absolute difference < 1e-6 for centrality measures
+```
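A tolerance check like the one described in the comments can be written as a maximum absolute difference between two matrices. The sketch below uses placeholder numbers standing in for a Python result and an exported R result; they are not actual outputs of either package, and `max_abs_diff` is a hypothetical helper.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-shaped matrices."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Placeholder values only: stand-ins for weights computed in Python and in R.
python_weights = [[0.0, 0.5, 0.5], [0.0, 0.0, 1.0], [0.5, 0.5, 0.0]]
r_weights = [[0.0, 0.5, 0.5 + 1e-12], [0.0, 0.0, 1.0], [0.5, 0.5, 0.0]]

diff = max_abs_diff(python_weights, r_weights)
matches = diff < 1e-10  # tolerance quoted above for transition matrices
print(diff, matches)
```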
+## Citation
+If you use this package in your research, please cite:
+```bibtex
+@software{tna_python,
+  title = {TNA: Transition Network Analysis for Python},
+  author = {Saqr, Mohammed and Tikka, Santtu and López-Pernas, Sonsoles},
+  year = {2026},
+  url = {https://github.com/mohsaqr/tnapy}
+}
+```
+Please also cite the Transition Network Analysis method itself:
+```bibtex
+@INPROCEEDINGS{Saqr2025-ku,
+  title     = "Transition Network Analysis: A Novel Framework for Modeling,
+               Visualizing, and Identifying the Temporal Patterns of Learners
+               and Learning Processes",
+  author    = "Saqr, Mohammed and López-Pernas, Sonsoles and Törmänen, Tiina and
+               Kaliisa, Rogers and Misiejuk, Kamila and Tikka, Santtu",
+  booktitle = "Proceedings of Learning Analytics \& Knowledge (LAK '25)",
+  publisher = "ACM",
+  address   = "New York, NY, USA",
+  doi       = "10.1145/3706468.3706513",
+  pages     = "351--361",
+  year      =  2025
+}
+```
+## License
+MIT License
+## Contributing
+Contributions are welcome! Please feel free to submit a Pull Request.