tsgap-0.3.0.tar.gz
- tsgap-0.3.0/CHANGELOG.md +96 -0
- tsgap-0.3.0/CITATION.cff +21 -0
- tsgap-0.3.0/CONTRIBUTING.md +56 -0
- tsgap-0.3.0/LICENSE +21 -0
- tsgap-0.3.0/MANIFEST.in +6 -0
- tsgap-0.3.0/PKG-INFO +681 -0
- tsgap-0.3.0/README.md +652 -0
- tsgap-0.3.0/pyproject.toml +53 -0
- tsgap-0.3.0/setup.cfg +4 -0
- tsgap-0.3.0/tsgap/__init__.py +22 -0
- tsgap-0.3.0/tsgap/core.py +264 -0
- tsgap-0.3.0/tsgap/mechanisms.py +521 -0
- tsgap-0.3.0/tsgap/patterns.py +528 -0
- tsgap-0.3.0/tsgap/tests/__init__.py +0 -0
- tsgap-0.3.0/tsgap/tests/test_missingness.py +961 -0
- tsgap-0.3.0/tsgap.egg-info/PKG-INFO +681 -0
- tsgap-0.3.0/tsgap.egg-info/SOURCES.txt +18 -0
- tsgap-0.3.0/tsgap.egg-info/dependency_links.txt +1 -0
- tsgap-0.3.0/tsgap.egg-info/requires.txt +5 -0
- tsgap-0.3.0/tsgap.egg-info/top_level.txt +1 -0
tsgap-0.3.0/CHANGELOG.md
ADDED
@@ -0,0 +1,96 @@
# Changelog

## Version 0.1.1 - Critical Fixes

### Critical Bug Fixes

#### 1. Calibration Bracketing Logic (Fixed) ⚠️
- **Before**: Bound expansion logic was reversed, causing calibration to fail for extreme rates
- **After**: Correct bracketing - expands bounds in proper direction
- **Impact**: Now handles extreme missing rates (1%, 90%) correctly

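A minimal sketch of the bracketing idea described in fix 1, assuming the calibration searches for a scalar offset `b` so that the mean of `sigmoid(scores + b)` matches the target missing rate. The name `calibrate_offset_sketch`, the logistic link, and the default bounds are illustrative assumptions, not tsgap's actual `_calibrate_offset()` implementation.

```python
import numpy as np


def stable_sigmoid(x):
    # Numerically stable logistic function: 1 / (1 + exp(-x)).
    return np.exp(-np.logaddexp(0.0, -x))


def calibrate_offset_sketch(scores, target_rate, lo=-10.0, hi=10.0,
                            tol=1e-6, max_expand=60, max_iter=200):
    """Find b such that mean(sigmoid(scores + b)) is close to target_rate."""

    def mean_rate(b):
        return float(np.mean(stable_sigmoid(scores + b)))

    # mean_rate is increasing in b, so the lower bound must move DOWN while
    # it still overshoots the target, and the upper bound must move UP while
    # it still undershoots. Reversing these directions never brackets the root.
    for _ in range(max_expand):
        if mean_rate(lo) <= target_rate:
            break
        lo -= hi - lo
    for _ in range(max_expand):
        if mean_rate(hi) >= target_rate:
            break
        hi += hi - lo

    # Plain bisection inside the bracket.
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mean_rate(mid) < target_rate:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

With scores centred far from zero, the initial `[-10, 10]` interval does not contain the solution for a 1% target, so the expansion loops are what make the search succeed.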
#### 2. MAR Normalization for 3D Data (Improved)
- **Before**: Global normalization across all participants
- **After**: Per-participant normalization for 3D data
- **Impact**: More consistent MAR behavior across subjects with different scales

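The difference between global and per-participant normalization can be sketched as follows. This assumes a `(participants, timesteps, features)` array layout and z-score normalization; both are assumptions made for illustration, and `normalize_per_participant` is a hypothetical helper rather than a tsgap function.

```python
import numpy as np


def normalize_per_participant(X, eps=1e-8):
    """Z-score each participant's series separately.

    X is assumed to have shape (n_participants, n_timesteps, n_features).
    Statistics are taken over the time axis only, so a participant whose
    signal lives on a different scale does not dominate the others.
    """
    mean = np.nanmean(X, axis=1, keepdims=True)
    std = np.nanstd(X, axis=1, keepdims=True)
    # Broadcasting applies each participant's own mean and std.
    return (X - mean) / (std + eps)
```

A global normalization would instead take the mean and standard deviation over axes `(0, 1)`, which lets one large-scale participant shift everyone else's normalized values.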
#### 3. Base Rate Handling (Fixed)
- **Before**: `base_rate` could conflict with low `missing_rate`
- **After**: Automatically capped at `missing_rate * 0.5`
- **Impact**: Prevents calibration issues with very low missing rates

#### 4. Probability Zeroing (Improved)
- **Before**: Non-eligible positions handled after sampling
- **After**: Probabilities zeroed before sampling for cleaner semantics
- **Impact**: Slightly faster and more explicit logic

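Fixes 3 and 4 can be illustrated with a short sketch. The helper names `cap_base_rate` and `sample_missing` are hypothetical, and the Bernoulli sampling step is an assumption about how per-position probabilities are turned into missing entries.

```python
import numpy as np


def cap_base_rate(base_rate, missing_rate):
    # Keep the base rate at or below half of the requested missing rate.
    return min(base_rate, missing_rate * 0.5)


def sample_missing(probs, eligible, rng):
    """Draw missing positions; returns True where a value becomes missing."""
    # Zero the probabilities of non-eligible positions BEFORE sampling,
    # so they can never be selected and no post-hoc cleanup is needed.
    p = np.where(eligible, probs, 0.0)
    return rng.random(p.shape) < p
```

Because `rng.random` returns values in `[0, 1)`, a probability of exactly zero is never exceeded, so non-eligible positions stay untouched by construction.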
### Testing

Added extreme rate testing:
- Low rates: 1%, 2%, 5% - all within 0.4% of target
- High rates: 50%, 70%, 90% - all within 1.3% of target
- MCAR remains exact at all rates

---

## Version 0.1.0 - Initial Release

### Key Improvements

#### 1. Reproducibility (Fixed)
- **Before**: Used global `np.random.seed()` and direct calls to `np.random.choice()`, `np.random.rand()`
- **After**: Uses `np.random.Generator` with seed passed through all functions
- **Impact**: Fully reproducible results with same seed across all mechanisms

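The pattern of threading one `np.random.Generator` through every helper, instead of relying on global state, might look like the sketch below. The functions `bernoulli_mask` and `two_stage_mask` are hypothetical and only illustrate the seeding approach; they are not part of tsgap's API.

```python
import numpy as np


def bernoulli_mask(shape, missing_rate, rng=None):
    # default_rng accepts None, an int seed, or an existing Generator
    # (an existing Generator is returned unchanged).
    rng = np.random.default_rng(rng)
    # True = observed, False = missing.
    return rng.random(shape) >= missing_rate


def two_stage_mask(shape, rate_a, rate_b, seed=None):
    # Create the Generator once and pass it down, so both stages draw from
    # the same stream and the result depends only on the seed.
    rng = np.random.default_rng(seed)
    first = bernoulli_mask(shape, rate_a, rng)
    second = bernoulli_mask(shape, rate_b, rng)
    return first & second
```

Calling `two_stage_mask` twice with the same integer seed yields identical masks, and no other code that uses NumPy's global random state can disturb the result.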
#### 2. Consistent Mask Semantics (Fixed)
- **Before**: MCAR didn't mark existing NaNs as False in returned mask
- **After**: All mechanisms consistently set `mask[existing_nans] = False`
- **Impact**: Uniform mask interpretation across all mechanisms

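A small sketch of this mask convention, assuming `True` means observed and `False` means missing, and assuming the missing rate is applied only to entries that are not already NaN. The function `mcar_mask_sketch` is hypothetical and not tsgap's MCAR implementation.

```python
import numpy as np


def mcar_mask_sketch(X, missing_rate, rng=None):
    """Return a boolean mask with True = observed and False = missing."""
    rng = np.random.default_rng(rng)
    existing_nans = np.isnan(X)

    # Only positions that currently hold a value are eligible for removal.
    eligible_idx = np.flatnonzero(~existing_nans)
    n_missing = int(round(missing_rate * eligible_idx.size))

    flat = np.ones(X.size, dtype=bool)
    drop = rng.choice(eligible_idx, size=n_missing, replace=False)
    flat[drop] = False
    mask = flat.reshape(X.shape)

    # Existing NaNs are reported as missing too, so the mask has one meaning.
    mask[existing_nans] = False
    return mask
```

With this convention, `np.where(mask, X, np.nan)` gives the simulated dataset and `~mask` marks every position an imputation method would need to fill.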
#### 3. Target Dimension Support (Added)
- **Before**: Only MCAR supported `target` parameter
- **After**: MAR and MNAR now support `target` parameter
- **Impact**: Can selectively mask specific dimensions while others drive missingness

#### 4. Improved Calibration (Enhanced)
- **Before**: Fixed bounds [-10, 10] for binary search
- **After**: Automatic bound expansion with bracketing
- **Impact**: Handles extreme missing rates (e.g., 0.01, 0.99) correctly

#### 5. Eligible Position Handling (Fixed)
- **Before**: MAR/MNAR calibrated over all positions including existing NaNs
- **After**: Calibration only considers eligible (non-NaN) positions
- **Impact**: More accurate missing rate achievement on real datasets with existing NaNs

#### 6. Performance Optimization (Improved)
- **Before**: Nested loops for MNAR normalization (slow on large datasets)
- **After**: Vectorized normalization using numpy broadcasting
- **Impact**: ~10-100x faster on large 3D arrays

#### 7. Code Organization (Refactored)
- Added helper functions:
  - `_get_eligible_mask()`: Unified eligible position logic
  - `_calibrate_offset()`: Reusable calibration with auto-bracketing
- **Impact**: Cleaner code, easier to maintain and extend

### API Changes

All changes are backward compatible. New optional parameters:
- `rng`: `np.random.Generator` for explicit RNG control
- `target`: Now supported in MAR and MNAR (was MCAR-only)

### Testing

All 17 unit tests pass:
- MCAR exact rate control
- MAR/MNAR approximate rate control
- Reproducibility with seeds
- Edge cases (constant signals, existing NaNs)
- Block missingness patterns

### Documentation

- Updated docstrings with clearer parameter descriptions
- Added notes about missing rate being applied to eligible entries
- Clarified mask semantics (True=observed, False=missing)
- Added mathematical formulations for each mechanism in README
tsgap-0.3.0/CITATION.cff
ADDED
@@ -0,0 +1,21 @@
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Oripov"
    given-names: "Feruz"
    orcid: "https://orcid.org/0009-0001-4303-0512"
title: "TSGap: Composable Time-Series Missingness Simulation"
version: 0.3.0
date-released: 2026-03-16
url: "https://github.com/feruzoripov/tsgap"
license: MIT
repository-code: "https://github.com/feruzoripov/tsgap"
keywords:
  - time series
  - missing data
  - imputation
  - benchmarking
  - simulation
  - MCAR
  - MAR
  - MNAR
tsgap-0.3.0/CONTRIBUTING.md
ADDED
@@ -0,0 +1,56 @@
# Contributing to TSGap

Thank you for your interest in contributing to TSGap.

## Getting Started

1. Fork the repository
2. Clone your fork: `git clone https://github.com/YOUR_USERNAME/tsgap.git`
3. Create a virtual environment: `python -m venv .env && source .env/bin/activate`
4. Install in development mode: `pip install -e ".[dev]"`

## Running Tests

```bash
pytest tsgap/tests/ -v
```

All tests must pass before submitting a pull request.

## Adding a New Pattern

1. Implement your pattern function in `tsgap/patterns.py` following the existing signature:
   ```python
   def apply_my_pattern(mask, shape, rng=None, **kwargs):
       ...
       return modified_mask
   ```
2. Register it in the `PATTERNS` dictionary at the bottom of the file.
3. Add tests in `tsgap/tests/test_missingness.py`.
4. Update the docstring in `tsgap/core.py` and `README.md`.

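As an illustration of step 1, a pattern following the `(mask, shape, rng=None, **kwargs)` signature could look like the sketch below. The pattern itself (`apply_periodic_dropout`), its `period`, `width`, and `time_axis` parameters, and the choice of the second-to-last axis as time are all hypothetical; the sketch also assumes the `True` = observed convention and does not show registration in `PATTERNS`, whose exact structure is not described here.

```python
import numpy as np


def apply_periodic_dropout(mask, shape, rng=None, period=10, width=2,
                           time_axis=-2, **kwargs):
    """Mark `width` consecutive timesteps as missing in every `period`."""
    rng = np.random.default_rng(rng)
    # Work on a copy so the caller's mask is left unchanged.
    out = np.array(mask, dtype=bool, copy=True).reshape(shape)

    n_steps = shape[time_axis]
    phase = int(rng.integers(period))  # random start of the dropout window
    t = np.arange(n_steps)
    dropped = (t + phase) % period < width

    # Select the dropped timesteps along the time axis, all entries elsewhere.
    index = [slice(None)] * out.ndim
    index[time_axis] = dropped
    out[tuple(index)] = False
    return out
```

Returning a new array instead of modifying `mask` in place keeps the function composable with other patterns applied before or after it.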
## Adding a New Mechanism

1. Implement in `tsgap/mechanisms.py` following the existing signature.
2. Register it in the `MECHANISMS` dictionary.
3. Add tests and update documentation.

## Code Style

- Use type hints (`from __future__ import annotations`)
- Include docstrings with Parameters/Returns sections (NumPy style)
- Keep functions focused and composable

## Reporting Issues

Open an issue on GitHub with:
- A minimal reproducible example
- Expected vs. actual behavior
- Python and NumPy versions

## Pull Requests

- One feature per PR
- Include tests for new functionality
- Update documentation as needed
- Ensure all existing tests still pass
tsgap-0.3.0/LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Feruz Oripov

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.