tnapy 0.1.0__tar.gz

tnapy-0.1.0/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 TNA Contributors
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
tnapy-0.1.0/PKG-INFO ADDED
@@ -0,0 +1,450 @@
+ Metadata-Version: 2.4
+ Name: tnapy
+ Version: 0.1.0
+ Summary: Transition Network Analysis for Python
+ Author-email: Mohammed Saqr <mohammed.saqr@uef.fi>, Santtu Tikka <santtu.tikka@jyu.fi>, Sonsoles López-Pernas <sonsoles.lopez@uef.fi>
+ License: MIT
+ Project-URL: Homepage, https://github.com/mohsaqr/tnapy
+ Project-URL: Documentation, https://github.com/mohsaqr/tnapy#readme
+ Project-URL: Repository, https://github.com/mohsaqr/tnapy
+ Project-URL: Issues, https://github.com/mohsaqr/tnapy/issues
+ Keywords: network analysis,transition networks,sequence analysis,markov chains,centrality,learning analytics
+ Classifier: Development Status :: 3 - Alpha
+ Classifier: Intended Audience :: Science/Research
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Topic :: Scientific/Engineering
+ Classifier: Topic :: Scientific/Engineering :: Information Analysis
+ Requires-Python: >=3.9
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: numpy>=1.21.0
+ Requires-Dist: pandas>=1.3.0
+ Requires-Dist: networkx>=2.6.0
+ Requires-Dist: scipy>=1.7.0
+ Requires-Dist: matplotlib>=3.5.0
+ Requires-Dist: seaborn>=0.11.0
+ Provides-Extra: dev
+ Requires-Dist: pytest>=7.0.0; extra == "dev"
+ Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
+ Requires-Dist: ruff>=0.1.0; extra == "dev"
+ Requires-Dist: mypy>=1.0.0; extra == "dev"
+ Provides-Extra: docs
+ Requires-Dist: sphinx>=5.0.0; extra == "docs"
+ Requires-Dist: sphinx-rtd-theme>=1.0.0; extra == "docs"
+ Dynamic: license-file
+
+ # TNA - Transition Network Analysis for Python
+
+ A Python package for analyzing sequential data as transition networks, designed to be **numerically equivalent** (to within floating-point precision) to the [R TNA package](https://cran.r-project.org/package=tna).
+
+ ## Features
+
+ - **8 Model Types**: relative, frequency, co-occurrence, reverse, n-gram, gap, window, attention
+ - **9 Centrality Measures**: OutStrength, InStrength, ClosenessIn, ClosenessOut, Closeness, Betweenness, BetweennessRSP, Diffusion, Clustering
+ - **Statistical Inference**: Bootstrap resampling, permutation tests, confidence intervals
+ - **10+ Visualization Functions**: Network plots, heatmaps, centrality charts, sequence plots
+ - **R Package Equivalence**: Numerical equivalence verified by a comprehensive test suite
+
+ ## Installation
+
+ ```bash
+ # Development installation (run from a local clone of the repository)
+ pip install -e .
+
+ # Or install dependencies directly
+ pip install numpy pandas networkx scipy matplotlib seaborn
+ ```
+
+ ## Quick Start
+
+ ```python
+ import tna
+ import pandas as pd
+
+ # Load example data (2000 learning sessions with 9 self-regulated learning behaviors)
+ df = tna.load_group_regulation()
+
+ # Build a TNA model (relative transition probabilities)
+ model = tna.tna(df)
+ print(model)
+
+ # Compute centrality measures
+ cent = tna.centralities(model)
+ print(cent)
+
+ # Visualize the network
+ tna.plot_network(model, layout='circular', edge_threshold=0.05)
+
+ # Visualize centralities
+ tna.plot_centralities(cent, measures=['OutStrength', 'InStrength', 'Betweenness'])
+ ```
+
+ ## Model Building
+
+ ### Basic Models
+
+ ```python
+ # Relative transition probabilities (default)
+ model = tna.tna(df)
+
+ # Frequency model (raw counts)
+ fmodel = tna.ftna(df)
+
+ # Co-occurrence model (bidirectional)
+ cmodel = tna.ctna(df)
+
+ # Attention model (exponential decay weighting)
+ amodel = tna.atna(df, beta=0.1)
+ ```
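To make the difference between the frequency and relative models concrete, here is a minimal pure-Python sketch of counting first-order transitions and row-normalizing them. The helper names `transition_counts` and `normalize_rows` are hypothetical and are not part of the package API; the package's own handling of missing values, self-loops, and state ordering may differ.

```python
from collections import Counter


def transition_counts(sequences):
    """Count first-order transitions (a -> b) across all sequences."""
    counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
    return counts


def normalize_rows(counts, states):
    """Convert raw counts into per-source transition probabilities."""
    probs = {}
    for a in states:
        row_total = sum(counts[(a, b)] for b in states)
        for b in states:
            # A state with no outgoing transitions gets an all-zero row.
            probs[(a, b)] = counts[(a, b)] / row_total if row_total else 0.0
    return probs


# Same toy data as a three-row wide table (rows = sequences).
sequences = [["A", "B", "C"], ["B", "C", "A"], ["A", "C", "B"]]
states = sorted({s for seq in sequences for s in seq})

counts = transition_counts(sequences)   # frequency-style weights
probs = normalize_rows(counts, states)  # relative-style weights

print(counts[("B", "C")])  # 2
print(probs[("A", "B")])   # 0.5
```

Each row of the relative matrix sums to 1 whenever the source state has at least one outgoing transition.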
+
+ ### Advanced Model Types
+
+ ```python
+ # All model types via build_model()
+ model = tna.build_model(df, type_='relative')  # Row-normalized probabilities
+ model = tna.build_model(df, type_='frequency')  # Raw transition counts
+ model = tna.build_model(df, type_='co-occurrence')  # Bidirectional co-occurrence
+ model = tna.build_model(df, type_='reverse')  # Reverse order transitions
+ model = tna.build_model(df, type_='n-gram', params={'n': 2})  # Higher-order n-grams
+ model = tna.build_model(df, type_='gap', params={'max_gap': 3, 'decay': 0.5})  # Gap-weighted
+ model = tna.build_model(df, type_='window', params={'size': 3})  # Sliding window
+ model = tna.build_model(df, type_='attention', params={'beta': 0.1})  # Attention-weighted
+ ```
+
+ ### Scaling Options
+
+ ```python
+ # Apply scaling to weight matrix
+ model = tna.tna(df, scaling='minmax')  # Min-max normalization [0, 1]
+ model = tna.tna(df, scaling='max')  # Divide by maximum
+ model = tna.tna(df, scaling='rank')  # Rank-based scaling
+ model = tna.tna(df, scaling=['minmax', 'max'])  # Multiple scalings
+ ```
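As a rough illustration of what the `minmax` and `max` options describe, the sketch below applies both to a small weight matrix in plain Python. The function names are hypothetical stand-ins, and it assumes scaling is computed over all entries of the matrix at once; the package may treat zeros, diagonals, or ties differently.

```python
def minmax_scale_sketch(matrix):
    """Map all entries to [0, 1] using the global minimum and maximum."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    # A constant matrix has no spread, so return zeros instead of dividing by 0.
    return [[(v - lo) / span if span else 0.0 for v in row] for row in matrix]


def max_scale_sketch(matrix):
    """Divide every entry by the global maximum."""
    hi = max(v for row in matrix for v in row)
    return [[v / hi if hi else 0.0 for v in row] for row in matrix]


weights = [[1.0, 3.0], [5.0, 2.0]]

print(minmax_scale_sketch(weights))  # [[0.0, 0.5], [1.0, 0.25]]
print(max_scale_sketch(weights))     # [[0.2, 0.6], [1.0, 0.4]]
```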
+
+ ## Centrality Measures
+
+ ```python
+ # Compute all centrality measures
+ cent = tna.centralities(model)
+
+ # Compute specific measures
+ cent = tna.centralities(model, measures=['OutStrength', 'InStrength', 'Betweenness'])
+
+ # With normalization
+ cent = tna.centralities(model, normalize=True)
+
+ # Include self-loops
+ cent = tna.centralities(model, loops=True)
+ ```
+
+ ### Available Measures
+
+ | Measure | Description |
+ |---------|-------------|
+ | `OutStrength` | Sum of outgoing edge weights |
+ | `InStrength` | Sum of incoming edge weights |
+ | `ClosenessIn` | Incoming closeness centrality |
+ | `ClosenessOut` | Outgoing closeness centrality |
+ | `Closeness` | Overall closeness (treats graph as undirected) |
+ | `Betweenness` | Standard betweenness centrality |
+ | `BetweennessRSP` | Randomized Shortest Path betweenness |
+ | `Diffusion` | Diffusion centrality (Banerjee et al. 2014) |
+ | `Clustering` | Weighted clustering coefficient (Zhang & Horvath 2005) |
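
The two strength measures in the table are simple row and column sums of the weight matrix. The sketch below shows this on a made-up frequency matrix. The function `strengths` is hypothetical, and it assumes self-loops are excluded unless `loops=True`, which is inferred from the `loops=True` example above rather than confirmed from the package source.

```python
def strengths(weights, labels, loops=False):
    """Return (out_strength, in_strength) dicts; weights[i][j] is the edge i -> j."""
    n = len(labels)
    out_s, in_s = {}, {}
    for i, label in enumerate(labels):
        # Row sum = outgoing weight, column sum = incoming weight.
        out_s[label] = sum(weights[i][j] for j in range(n) if loops or i != j)
        in_s[label] = sum(weights[j][i] for j in range(n) if loops or i != j)
    return out_s, in_s


labels = ["A", "B", "C"]
weights = [
    [2, 5, 3],   # A -> A, A -> B, A -> C
    [0, 0, 10],  # B -> ...
    [4, 6, 0],   # C -> ...
]

out_s, in_s = strengths(weights, labels)
print(out_s["A"], in_s["A"])  # 8 4

out_loops, in_loops = strengths(weights, labels, loops=True)
print(out_loops["A"], in_loops["A"])  # 10 6
```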
+
+ ## Data Preparation
+
+ ### From Long Format Data
+
+ ```python
+ # Prepare raw event data
+ prepared = tna.prepare_data(
+     data=events_df,
+     actor='user_id',
+     time='timestamp',
+     action='event_type',
+     time_threshold=900  # 15-minute session timeout (in seconds)
+ )
+
+ # Build model from prepared data
+ model = tna.tna(prepared)
+
+ # Access statistics
+ print(prepared.statistics)  # n_sessions, n_actors, etc.
+ ```
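The idea behind a session timeout can be sketched without the package: group events by actor, sort by time, and start a new session whenever the gap between consecutive events exceeds the threshold. The function `split_sessions` below is a hypothetical illustration. It assumes timestamps are plain seconds and that a gap strictly greater than the threshold opens a new session; `prepare_data` may use different rules.

```python
from collections import defaultdict


def split_sessions(events, time_threshold=900):
    """Split (actor, timestamp_seconds, action) events into per-actor sessions."""
    by_actor = defaultdict(list)
    for actor, ts, action in events:
        by_actor[actor].append((ts, action))

    sessions = []
    for actor in sorted(by_actor):
        current, last_ts = [], None
        for ts, action in sorted(by_actor[actor], key=lambda row: row[0]):
            # A long pause closes the current session and opens a new one.
            if last_ts is not None and ts - last_ts > time_threshold:
                sessions.append((actor, current))
                current = []
            current.append(action)
            last_ts = ts
        sessions.append((actor, current))
    return sessions


events = [
    ("u1", 0, "plan"), ("u1", 60, "monitor"), ("u1", 2000, "plan"),
    ("u2", 10, "discuss"), ("u2", 500, "plan"),
]
sessions = split_sessions(events)
for actor, actions in sessions:
    print(actor, actions)
```

Here `u1` ends up with two sessions because of the 1940-second pause, while `u2` keeps a single session.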
+
+ ### From Wide Format Data
+
+ ```python
+ # Direct from wide format (rows=sequences, cols=time steps)
+ df = pd.DataFrame({
+     'step1': ['A', 'B', 'A'],
+     'step2': ['B', 'C', 'C'],
+     'step3': ['C', 'A', 'B']
+ })
+ model = tna.tna(df)
+ ```
+
+ ## Statistical Inference
+
+ ### Bootstrap Analysis
+
+ ```python
+ # Bootstrap confidence intervals for model parameters
+ boot = tna.bootstrap_tna(df, n_boot=1000, ci=0.95, seed=42)
+
+ # Get summary with CIs for all edges
+ summary = boot.summary()
+
+ # Find significant edges
+ sig_edges = boot.significant_edges(threshold=0)
+
+ # Bootstrap centrality measures
+ cent_ci = tna.bootstrap_centralities(
+     df,
+     measures=['OutStrength', 'InStrength', 'Betweenness'],
+     n_boot=1000,
+     ci=0.95
+ )
+ ```
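For readers unfamiliar with bootstrapping, the following sketch shows the general technique for a single edge: resample whole sequences with replacement, recompute the edge weight each time, and read a percentile interval off the sorted replicates. The names `edge_weight` and `bootstrap_ci` are hypothetical, and the nearest-rank percentile rule used here is a simplification; it is not claimed to match what `bootstrap_tna` does internally.

```python
import random


def edge_weight(sequences, source, target):
    """Relative transition probability source -> target."""
    from_source = hits = 0
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            if a == source:
                from_source += 1
                hits += (b == target)
    return hits / from_source if from_source else 0.0


def bootstrap_ci(sequences, source, target, n_boot=1000, ci=0.95, seed=42):
    """Percentile bootstrap interval for one edge weight."""
    rng = random.Random(seed)
    n = len(sequences)
    replicates = []
    for _ in range(n_boot):
        # Resample whole sequences (rows) with replacement.
        sample = [sequences[rng.randrange(n)] for _ in range(n)]
        replicates.append(edge_weight(sample, source, target))
    replicates.sort()
    alpha = (1 - ci) / 2
    lo = replicates[int(alpha * (n_boot - 1))]
    hi = replicates[int((1 - alpha) * (n_boot - 1))]
    return lo, hi


sequences = [["A", "B", "C"], ["A", "C", "B"], ["A", "B", "A"], ["B", "A", "B"]]
print(edge_weight(sequences, "A", "B"))  # 0.75
lo, hi = bootstrap_ci(sequences, "A", "B")
print(lo, hi)
```

Fixing the seed makes the interval reproducible, which is the role `seed=42` appears to play in the package example.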
+
+ ### Permutation Tests
+
+ ```python
+ # Compare two groups
+ result = tna.permutation_test(
+     group1_df, group2_df,
+     n_perm=1000,
+     statistic='weights',  # or 'density', 'centrality'
+     alternative='two-sided',
+     seed=42
+ )
+ print(f"P-value: {result.p_value}")
+ print(f"Significant: {result.is_significant(0.05)}")
+
+ # Edge-wise comparison with multiple testing correction
+ edges = tna.permutation_test_edges(
+     group1_df, group2_df,
+     n_perm=1000,
+     correction='fdr'  # or 'bonferroni', 'none'
+ )
+ ```
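The logic of a two-sided permutation test can be sketched in a few lines: pool the sequences from both groups, shuffle the group labels many times, and count how often the shuffled difference is at least as extreme as the observed one. Everything below (`a_to_b_rate`, `permutation_p_value`, the toy groups) is hypothetical illustration code. It uses a single A -> B transition rate as the statistic, which is simpler than the `'weights'` statistic named above.

```python
import random


def a_to_b_rate(sequences):
    """Share of transitions leaving 'A' that go to 'B'."""
    from_a = [b for seq in sequences for a, b in zip(seq, seq[1:]) if a == "A"]
    return sum(b == "B" for b in from_a) / len(from_a) if from_a else 0.0


def permutation_p_value(group1, group2, statistic, n_perm=1000, seed=42):
    """Two-sided permutation p-value for statistic(group1) - statistic(group2)."""
    rng = random.Random(seed)
    observed = statistic(group1) - statistic(group2)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of group labels
        diff = statistic(pooled[:n1]) - statistic(pooled[n1:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


group1 = [["A", "B"]] * 10  # always A -> B
group2 = [["A", "C"]] * 10  # never A -> B

p_diff = permutation_p_value(group1, group2, a_to_b_rate)
p_same = permutation_p_value(group1, group1, a_to_b_rate)
print(p_diff, p_same)
```

When both groups are identical, the observed difference is zero, every permutation counts as at least as extreme, and the p-value is 1.0.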
+
+ ### Confidence Intervals
+
+ ```python
+ import numpy as np
+
+ # boot_samples (bootstrap replicates) and data (original sample) are placeholders
+ # Percentile method
+ ci = tna.confidence_interval(boot_samples, ci=0.95, method='percentile')
+
+ # BCa method (bias-corrected and accelerated)
+ ci = tna.bca_ci(data, boot_samples, statistic_func=np.mean, ci=0.95)
+ ```
+
+ ## Visualization
+
+ ### Network Plots
+
+ ```python
+ # Basic network plot
+ tna.plot_network(model)
+
+ # Customized network
+ tna.plot_network(
+     model,
+     layout='circular',  # or 'spring', 'kamada_kawai'
+     node_size='OutStrength',  # Size by centrality
+     edge_threshold=0.05,  # Hide weak edges
+     node_color='steelblue',
+     edge_cmap='Blues'
+ )
+
+ # Network with bootstrap confidence intervals
+ tna.plot_network_ci(boot, edge_alpha='significance')
+ ```
+
+ ### Centrality Plots
+
+ ```python
+ # Bar charts for centralities
+ tna.plot_centralities(
+     cent,
+     measures=['OutStrength', 'InStrength', 'Betweenness'],
+     ncol=3
+ )
+ ```
+
+ ### Heatmap
+
+ ```python
+ # Transition matrix heatmap
+ tna.plot_heatmap(model, cmap='Blues', annotate=True)
+ ```
+
+ ### Model Comparison
+
+ ```python
+ # Side-by-side comparison of two models
+ tna.plot_comparison(
+     model1, model2,
+     plot_type='heatmap',
+     labels=('Group 1', 'Group 2')
+ )
+ ```
+
+ ### Sequence Visualization
+
+ ```python
+ # State distribution over time
+ tna.plot_sequences(df, plot_type='distribution')
+
+ # State frequencies
+ tna.plot_frequencies(df)
+
+ # Histogram of sequence lengths
+ tna.plot_histogram(df)
+ ```
+
+ ### Statistical Plots
+
+ ```python
+ # Bootstrap distribution
+ tna.plot_bootstrap(boot, plot_type='weights')
+ tna.plot_bootstrap(boot, plot_type='centrality', measure='OutStrength')
+
+ # Permutation test null distribution
+ tna.plot_permutation(result)
+ ```
+
+ ## Example Datasets
+
+ ```python
+ # Wide format: 2000 sessions x 20 time steps
+ df = tna.load_group_regulation()
+
+ # Long format: Actor, Time, Action columns
+ df_long = tna.load_group_regulation_long()
+ ```
+
+ ## API Reference
+
+ ### Model Building
+
+ | Function | Description |
+ |----------|-------------|
+ | `tna(x)` | Build relative transition probability model |
+ | `ftna(x)` | Build frequency (raw counts) model |
+ | `ctna(x)` | Build co-occurrence model |
+ | `atna(x, beta)` | Build attention-weighted model |
+ | `build_model(x, type_)` | Build model with specified type |
+
+ ### Data Preparation
+
+ | Function | Description |
+ |----------|-------------|
+ | `prepare_data(data, actor, time, action)` | Prepare long-format event data |
+ | `create_seqdata(x)` | Create sequence data from various formats |
+
+ ### Centralities
+
+ | Function | Description |
+ |----------|-------------|
+ | `centralities(model, measures)` | Compute centrality measures |
+
+ ### Statistical Inference
+
+ | Function | Description |
+ |----------|-------------|
+ | `bootstrap_tna(x, n_boot)` | Bootstrap analysis of TNA model |
+ | `bootstrap_centralities(x, measures, n_boot)` | Bootstrap centrality CIs |
+ | `permutation_test(x1, x2, n_perm)` | Permutation test for group comparison |
+ | `permutation_test_edges(x1, x2, n_perm)` | Edge-wise permutation tests |
+ | `confidence_interval(samples, ci)` | Calculate confidence interval |
+ | `bca_ci(data, samples, func, ci)` | BCa confidence interval |
+
+ ### Visualization
+
+ | Function | Description |
+ |----------|-------------|
+ | `plot_network(model)` | Plot transition network |
+ | `plot_centralities(cent)` | Plot centrality bar charts |
+ | `plot_heatmap(model)` | Plot transition matrix heatmap |
+ | `plot_comparison(m1, m2)` | Compare two models |
+ | `plot_sequences(df)` | Plot sequence patterns |
+ | `plot_frequencies(df)` | Plot state frequencies |
+ | `plot_histogram(df)` | Plot sequence length histogram |
+ | `plot_bootstrap(boot)` | Visualize bootstrap results |
+ | `plot_permutation(result)` | Visualize permutation test |
+ | `plot_network_ci(boot)` | Network with confidence intervals |
+
+ ### Utilities
+
+ | Function | Description |
+ |----------|-------------|
+ | `row_normalize(matrix)` | Row-normalize a matrix |
+ | `minmax_scale(matrix)` | Min-max scaling to [0, 1] |
+ | `max_scale(matrix)` | Divide by maximum |
+ | `rank_scale(matrix)` | Rank-based scaling |
+
+ ## R Package Equivalence
+
+ This package is designed to produce numerically equivalent results to the R TNA package. Key equivalences:
+
+ - **Transition matrices**: Identical computation of relative, frequency, and co-occurrence matrices
+ - **Centrality measures**: Exact ports of R implementations including custom measures (diffusion, weighted clustering)
+ - **Data format**: Compatible with R's wide-format sequence data
+
+ ### Verification
+
+ ```python
+ # Python
+ model_py = tna.tna(df)
+ cent_py = tna.centralities(model_py)
+
+ # Results match R within floating-point precision:
+ # - Max absolute difference < 1e-10 for transition matrices
+ # - Max absolute difference < 1e-6 for centrality measures
+ ```
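One way to check a tolerance like the ones quoted above is to compute the maximum absolute difference between two matrices. The sketch below does this for two small nested lists. The numbers are made up for illustration and are not real package or R output, and `max_abs_diff` is a hypothetical helper; in practice the R matrix would first need to be exported, for example to CSV.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-shaped matrices."""
    return max(
        abs(x - y)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


# Illustrative values only: the second matrix differs by 1e-12 in one cell.
py_weights = [[0.0, 0.5, 0.5], [0.25, 0.0, 0.75], [1.0, 0.0, 0.0]]
r_weights = [[0.0, 0.5, 0.5], [0.25, 0.0, 0.75 + 1e-12], [1.0, 0.0, 0.0]]

diff = max_abs_diff(py_weights, r_weights)
print(diff < 1e-10)  # True
```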
+
+ ## Citation
+
+ If you use this package in your research, please cite:
+
+ ```bibtex
+ @software{tna_python,
+   title = {TNA: Transition Network Analysis for Python},
+   author = "Saqr, Mohammed and Tikka, Santtu and López-Pernas, Sonsoles",
+   year = {2026},
+   url = {https://github.com/mohsaqr/tnapy}
+ }
+ ```
+
+ Please also cite the Transition Network Analysis method:
+
+ ```bibtex
+ @INPROCEEDINGS{Saqr2025-ku,
+   title = "Transition Network Analysis: A Novel Framework for Modeling,
+            Visualizing, and Identifying the Temporal Patterns of Learners
+            and Learning Processes",
+   author = "Saqr, Mohammed and López-Pernas, Sonsoles and Törmänen, Tiina and
+             Kaliisa, Rogers and Misiejuk, Kamila and Tikka, Santtu",
+   booktitle = "Proceedings of Learning Analytics \& Knowledge (LAK '25)",
+   publisher = "ACM",
+   address = "New York, NY, USA",
+   doi = "10.1145/3706468.3706513",
+   pages = "351--361",
+   year = 2025
+ }
+ ```
+
+ ## License
+
+ MIT License
+
+ ## Contributing
+
+ Contributions are welcome! Please feel free to submit a Pull Request.