PyStormTracker 0.3.3__tar.gz → 0.4.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (71)
  1. pystormtracker-0.4.0/.dockerignore +16 -0
  2. {pystormtracker-0.3.3 → pystormtracker-0.4.0}/.github/workflows/ci.yml +70 -10
  3. {pystormtracker-0.3.3 → pystormtracker-0.4.0}/.github/workflows/docker-publish.yml +32 -22
  4. {pystormtracker-0.3.3 → pystormtracker-0.4.0}/.github/workflows/python-publish.yml +13 -3
  5. pystormtracker-0.4.0/.gitignore +40 -0
  6. pystormtracker-0.4.0/.python-version +1 -0
  7. pystormtracker-0.4.0/ARCHITECTURE.md +97 -0
  8. {pystormtracker-0.3.3 → pystormtracker-0.4.0}/CITATION.cff +2 -2
  9. {pystormtracker-0.3.3 → pystormtracker-0.4.0}/PKG-INFO +76 -26
  10. {pystormtracker-0.3.3 → pystormtracker-0.4.0}/README.md +69 -21
  11. pystormtracker-0.4.0/ROADMAP.md +41 -0
  12. pystormtracker-0.4.0/data/test/tracks/era5_vo_2.5x2.5_1e-4_v0.0.2_imilast.txt +38347 -0
  13. pystormtracker-0.4.0/docs/architecture.md +2 -0
  14. pystormtracker-0.4.0/docs/benchmark.md +2 -0
  15. {pystormtracker-0.3.3 → pystormtracker-0.4.0}/docs/conf.py +3 -2
  16. pystormtracker-0.4.0/docs/index.md +27 -0
  17. pystormtracker-0.4.0/docs/readme.md +2 -0
  18. pystormtracker-0.4.0/docs/roadmap.md +2 -0
  19. {pystormtracker-0.3.3 → pystormtracker-0.4.0}/pyproject.toml +18 -11
  20. pystormtracker-0.4.0/src/pystormtracker/__init__.py +20 -0
  21. pystormtracker-0.4.0/src/pystormtracker/cli.py +148 -0
  22. pystormtracker-0.4.0/src/pystormtracker/hodges/__init__.py +0 -0
  23. pystormtracker-0.4.0/src/pystormtracker/hodges/tracker.py +33 -0
  24. pystormtracker-0.4.0/src/pystormtracker/io/__init__.py +0 -0
  25. pystormtracker-0.4.0/src/pystormtracker/io/imilast.py +103 -0
  26. pystormtracker-0.4.0/src/pystormtracker/io/loader.py +72 -0
  27. pystormtracker-0.4.0/src/pystormtracker/models/__init__.py +5 -0
  28. {pystormtracker-0.3.3 → pystormtracker-0.4.0}/src/pystormtracker/models/center.py +4 -8
  29. pystormtracker-0.4.0/src/pystormtracker/models/tracker.py +27 -0
  30. pystormtracker-0.4.0/src/pystormtracker/models/tracks.py +428 -0
  31. pystormtracker-0.4.0/src/pystormtracker/simple/__init__.py +5 -0
  32. pystormtracker-0.4.0/src/pystormtracker/simple/concurrent.py +131 -0
  33. pystormtracker-0.4.0/src/pystormtracker/simple/detector.py +266 -0
  34. pystormtracker-0.4.0/src/pystormtracker/simple/kernels.py +148 -0
  35. pystormtracker-0.4.0/src/pystormtracker/simple/linker.py +175 -0
  36. pystormtracker-0.4.0/src/pystormtracker/simple/tracker.py +130 -0
  37. pystormtracker-0.4.0/src/pystormtracker/utils/__init__.py +0 -0
  38. pystormtracker-0.4.0/src/pystormtracker/utils/benchmark.py +52 -0
  39. {pystormtracker-0.3.3/tests → pystormtracker-0.4.0/src/pystormtracker/utils}/data_utils.py +28 -8
  40. {pystormtracker-0.3.3 → pystormtracker-0.4.0}/tests/test_center.py +12 -12
  41. pystormtracker-0.3.3/tests/test_stormtracker.py → pystormtracker-0.4.0/tests/test_cli.py +2 -3
  42. pystormtracker-0.4.0/tests/test_integration.py +299 -0
  43. {pystormtracker-0.3.3 → pystormtracker-0.4.0}/tests/test_simple_detector.py +12 -9
  44. pystormtracker-0.4.0/tests/test_simple_linker.py +32 -0
  45. pystormtracker-0.4.0/tests/test_simple_tracker.py +40 -0
  46. pystormtracker-0.4.0/tests/test_tracks.py +247 -0
  47. {pystormtracker-0.3.3 → pystormtracker-0.4.0}/uv.lock +504 -206
  48. pystormtracker-0.3.3/.gitignore +0 -73
  49. pystormtracker-0.3.3/.python-version +0 -1
  50. pystormtracker-0.3.3/docs/index.md +0 -21
  51. pystormtracker-0.3.3/src/pystormtracker/__init__.py +0 -11
  52. pystormtracker-0.3.3/src/pystormtracker/models/__init__.py +0 -6
  53. pystormtracker-0.3.3/src/pystormtracker/models/grid.py +0 -36
  54. pystormtracker-0.3.3/src/pystormtracker/models/time.py +0 -14
  55. pystormtracker-0.3.3/src/pystormtracker/models/tracks.py +0 -202
  56. pystormtracker-0.3.3/src/pystormtracker/simple/__init__.py +0 -4
  57. pystormtracker-0.3.3/src/pystormtracker/simple/detector.py +0 -274
  58. pystormtracker-0.3.3/src/pystormtracker/simple/linker.py +0 -135
  59. pystormtracker-0.3.3/src/pystormtracker/stormtracker.py +0 -244
  60. pystormtracker-0.3.3/tests/test_integration.py +0 -199
  61. pystormtracker-0.3.3/tests/test_simple_linker.py +0 -54
  62. pystormtracker-0.3.3/tests/test_tracks.py +0 -124
  63. {pystormtracker-0.3.3 → pystormtracker-0.4.0}/.github/dependabot.yml +0 -0
  64. {pystormtracker-0.3.3 → pystormtracker-0.4.0}/.readthedocs.yaml +0 -0
  65. {pystormtracker-0.3.3 → pystormtracker-0.4.0}/Dockerfile +0 -0
  66. {pystormtracker-0.3.3 → pystormtracker-0.4.0}/LICENSE +0 -0
  67. {pystormtracker-0.3.3 → pystormtracker-0.4.0}/codecov.yml +0 -0
  68. {pystormtracker-0.3.3 → pystormtracker-0.4.0}/data/test/tracks/era5_msl_2.5x2.5_v0.0.2_imilast.txt +0 -0
  69. {pystormtracker-0.3.3 → pystormtracker-0.4.0}/docs/IntercomparisonProtocol.pdf +0 -0
  70. {pystormtracker-0.3.3 → pystormtracker-0.4.0}/tests/__init__.py +0 -0
  71. {pystormtracker-0.3.3 → pystormtracker-0.4.0}/tests/conftest.py +0 -0
@@ -0,0 +1,16 @@
1
+ .coverage
2
+ .coverage.*
3
+ .git
4
+ .github
5
+ .mypy_cache/
6
+ .pytest_cache/
7
+ .ruff_cache/
8
+ .venv/
9
+ .vscode/
10
+ __pycache__/
11
+ benchmark/
12
+ data/test/
13
+ docs/
14
+ htmlcov/
15
+ tests/
16
+ worktrees/
@@ -8,6 +8,7 @@ on:
8
8
  - 'pyproject.toml'
9
9
  - 'uv.lock'
10
10
  - 'Dockerfile'
11
+ - '.github/workflows/ci.yml'
11
12
  push:
12
13
  branches:
13
14
  - main
@@ -20,6 +21,9 @@ on:
20
21
  - 'pyproject.toml'
21
22
  - 'uv.lock'
22
23
  - 'Dockerfile'
24
+ - '.github/workflows/ci.yml'
25
+ release:
26
+ types: [published]
23
27
  workflow_dispatch:
24
28
 
25
29
  concurrency:
@@ -83,6 +87,7 @@ jobs:
83
87
  fail-fast: false
84
88
  matrix:
85
89
  python-version: ["3.11", "3.12", "3.13", "3.14"]
90
+ min-deps: [false]
86
91
  include:
87
92
  - python-version: "3.11"
88
93
  min-deps: true
@@ -105,8 +110,20 @@ jobs:
105
110
  if: ${{ matrix.min-deps }}
106
111
  run: uv sync --group dev --resolution lowest-direct
107
112
  - name: Run Unit Tests
108
- run: |
109
- uv run pytest -vv
113
+ if: matrix.python-version != '3.13' || matrix.min-deps
114
+ run: uv run pytest -vv
115
+
116
+ - name: Run Unit Tests with Coverage
117
+ if: matrix.python-version == '3.13' && !matrix.min-deps
118
+ run: uv run pytest -vv --cov=pystormtracker --cov-report=term-missing --cov-report=xml
119
+
120
+ - name: Upload coverage reports to Codecov
121
+ if: matrix.python-version == '3.13' && !matrix.min-deps
122
+ uses: codecov/codecov-action@v5
123
+ with:
124
+ files: ./coverage.xml
125
+ flags: unit
126
+ token: ${{ secrets.CODECOV_TOKEN }}
110
127
 
111
128
  integration-tests:
112
129
  name: integration-tests (Python ${{ matrix.python-version }}, ${{ matrix.arch }})
@@ -117,10 +134,10 @@ jobs:
117
134
  include:
118
135
  - arch: amd64
119
136
  os: ubuntu-24.04
120
- python-version: "3.14"
137
+ python-version: "3.13"
121
138
  - arch: arm64
122
139
  os: ubuntu-24.04-arm
123
- python-version: "3.14"
140
+ python-version: "3.13"
124
141
  steps:
125
142
  - uses: actions/checkout@v6
126
143
  with:
@@ -136,16 +153,32 @@ jobs:
136
153
  - name: Install dependencies
137
154
  run: uv sync --frozen --group dev
138
155
  - name: Run Integration Tests
139
- run: |
140
- uv run pytest -vv tests/test_integration.py --run-integration
156
+ if: matrix.arch != 'amd64'
157
+ run: uv run pytest -vv tests/test_integration.py --run-integration
158
+
159
+ - name: Run Integration Tests with Coverage
160
+ if: matrix.arch == 'amd64'
161
+ run: uv run pytest -vv --cov=pystormtracker --cov-report=term-missing --cov-report=xml tests/test_integration.py --run-integration
162
+
141
163
  - name: Upload coverage reports to Codecov
164
+ if: matrix.arch == 'amd64'
142
165
  uses: codecov/codecov-action@v5
143
166
  with:
167
+ files: ./coverage.xml
168
+ flags: integration
144
169
  token: ${{ secrets.CODECOV_TOKEN }}
145
170
 
146
171
  docker-build:
147
172
  name: docker-build
148
- needs: [integration-tests]
173
+ needs: [ruff-lint, ruff-format, mypy-typecheck, unit-tests, integration-tests]
174
+ # Only run on merges to main, release branches, tags, releases, or manual dispatch
175
+ if: |
176
+ github.event_name != 'pull_request' &&
177
+ (github.ref == 'refs/heads/main' ||
178
+ startsWith(github.ref, 'refs/heads/release/') ||
179
+ startsWith(github.ref, 'refs/tags/v') ||
180
+ github.event_name == 'release' ||
181
+ github.event_name == 'workflow_dispatch')
149
182
  runs-on: ubuntu-latest
150
183
  steps:
151
184
  - name: Checkout repository
@@ -164,20 +197,47 @@ jobs:
164
197
  push: false
165
198
  load: true
166
199
  platforms: linux/amd64
167
- tags: "${{ vars.DOCKER_IMAGE_NAME }}:${{ github.sha }}"
200
+ tags: "${{ github.repository_owner }}/${{ vars.DOCKER_IMAGE_NAME }}:${{ github.sha }}"
168
201
  cache-from: type=gha,scope=docker-build
169
202
  cache-to: type=gha,mode=max,scope=docker-build
170
203
 
171
204
  - name: Smoke test Docker image
172
205
  run: |
173
- docker run --rm ${{ vars.DOCKER_IMAGE_NAME }}:${{ github.sha }} --help
206
+ # Test CLI help
207
+ docker run --rm ${{ github.repository_owner }}/${{ vars.DOCKER_IMAGE_NAME }}:${{ github.sha }} --help
208
+ # Test library import
209
+ docker run --rm --entrypoint python ${{ github.repository_owner }}/${{ vars.DOCKER_IMAGE_NAME }}:${{ github.sha }} -c "import pystormtracker as pst; print('Import success')"
174
210
 
175
211
  - name: Run Trivy vulnerability scanner
176
212
  uses: aquasecurity/trivy-action@0.35.0
177
213
  with:
178
- image-ref: "${{ vars.DOCKER_IMAGE_NAME }}:${{ github.sha }}"
214
+ image-ref: "${{ github.repository_owner }}/${{ vars.DOCKER_IMAGE_NAME }}:${{ github.sha }}"
179
215
  format: "table"
180
216
  exit-code: "0"
181
217
  ignore-unfixed: true
182
218
  vuln-type: "os,library"
183
219
  severity: "CRITICAL,HIGH"
220
+
221
+ pypi-build:
222
+ name: pypi-build
223
+ needs: [ruff-lint, ruff-format, mypy-typecheck, unit-tests, integration-tests]
224
+ # Only run on merges to main, release branches, tags, releases, or manual dispatch
225
+ if: |
226
+ github.event_name != 'pull_request' &&
227
+ (github.ref == 'refs/heads/main' ||
228
+ startsWith(github.ref, 'refs/heads/release/') ||
229
+ startsWith(github.ref, 'refs/tags/v') ||
230
+ github.event_name == 'release' ||
231
+ github.event_name == 'workflow_dispatch')
232
+ runs-on: ubuntu-latest
233
+ steps:
234
+ - uses: actions/checkout@v6
235
+ with:
236
+ ref: ${{ github.ref }}
237
+ fetch-depth: 0
238
+ - name: Set up uv
239
+ uses: astral-sh/setup-uv@v7
240
+ with:
241
+ enable-cache: true
242
+ - name: Build release distributions
243
+ run: uv build --wheel --sdist
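The `docker-build` and `pypi-build` jobs added to `ci.yml` share the same multi-line `if:` gate. Reading it as a plain predicate makes it easier to see which event/ref combinations reach the build jobs. The sketch below is a Python transliteration for illustration only. The function name `should_build` is hypothetical. The lowercasing reflects GitHub's documented behaviour that expression string comparisons and `startsWith` ignore case.

```python
def should_build(event_name: str, ref: str) -> bool:
    """Illustrative Python reading of the `if:` gate on docker-build / pypi-build."""
    # GitHub expression comparisons (==, startsWith) are case-insensitive.
    event, ref = event_name.lower(), ref.lower()
    if event == "pull_request":
        return False  # pull requests never reach the build jobs
    return (
        ref == "refs/heads/main"
        or ref.startswith("refs/heads/release/")
        or ref.startswith("refs/tags/v")
        or event in ("release", "workflow_dispatch")
    )
```

Under this reading, a pull request is always excluded, even one targeting `main`. A manual `workflow_dispatch` is always allowed, whatever the ref.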
@@ -1,11 +1,9 @@
1
1
  name: Docker Publish
2
2
 
3
3
  on:
4
- push:
5
- branches:
6
- - main
7
- release:
8
- types: [published]
4
+ workflow_run:
5
+ workflows: ["CI"]
6
+ types: [completed]
9
7
  workflow_dispatch:
10
8
 
11
9
  concurrency:
@@ -13,23 +11,33 @@ concurrency:
13
11
  cancel-in-progress: false
14
12
 
15
13
  env:
16
- DOCKER_HUB_REPO: docker.io/${{ vars.DOCKER_IMAGE_NAME }}
17
- GHCR_REPO: ghcr.io/${{ vars.DOCKER_IMAGE_NAME }}
14
+ # Publish to ORG on release, else to OWNER (personal) for merge to main/manual
15
+ DOCKER_HUB_REPO: docker.io/${{ (github.event_name == 'release' || (github.event_name == 'workflow_run' && github.event.workflow_run.event == 'release')) && vars.DOCKER_ORG_NAME || github.repository_owner }}/${{ vars.DOCKER_IMAGE_NAME }}
16
+ GHCR_REPO: ghcr.io/${{ (github.event_name == 'release' || (github.event_name == 'workflow_run' && github.event.workflow_run.event == 'release')) && vars.DOCKER_ORG_NAME || github.repository_owner }}/${{ vars.DOCKER_IMAGE_NAME }}
18
17
 
19
18
  jobs:
20
19
  build-and-push:
21
20
  runs-on: ubuntu-latest
21
+ # Only run if CI succeeded (for workflow_run) or if it's a manual trigger.
22
+ # Added head_repository check for security in trusted context.
23
+ if: |
24
+ (github.event_name == 'workflow_run' &&
25
+ github.event.workflow_run.conclusion == 'success' &&
26
+ github.event.workflow_run.head_repository.full_name == github.repository &&
27
+ (github.event.workflow_run.head_branch == 'main' || github.event.workflow_run.event == 'release')) ||
28
+ github.event_name == 'workflow_dispatch'
22
29
  permissions:
23
30
  actions: read
24
- contents: read
31
+ contents: write
25
32
  packages: write
26
33
  id-token: write
27
34
  attestations: write
35
+ artifact-metadata: write
28
36
  steps:
29
37
  - name: Checkout repository
30
38
  uses: actions/checkout@v6
31
39
  with:
32
- ref: ${{ github.ref }}
40
+ ref: ${{ github.event.workflow_run.head_sha || github.ref }}
33
41
  fetch-depth: 0
34
42
 
35
43
  - name: Set up QEMU
@@ -58,18 +66,20 @@ jobs:
58
66
  images: |
59
67
  ${{ env.DOCKER_HUB_REPO }}
60
68
  ${{ env.GHCR_REPO }}
69
+ # Fix: Tell the metadata action the real ref, otherwise it defaults to 'main'
70
+ ref: ${{ github.event.workflow_run.head_branch || github.ref }}
61
71
  tags: |
62
- # Always tag with short SHA
63
- type=sha,format=short,prefix=
64
- # Tag with 'edge' only for main branch
65
- type=edge,branch=main
66
- # Branch tag for all branches except main
67
- type=ref,event=branch,enable=${{ github.ref_name != 'main' }}
72
+ # Tag with 'edge' only for main branch builds
73
+ type=edge,branch=main,priority=700
68
74
  # Semver tags for releases (includes 'latest')
69
- type=semver,pattern=latest
70
- type=semver,pattern={{version}}
71
- type=semver,pattern={{major}}.{{minor}}
72
- type=semver,pattern={{major}},enable=${{ !startsWith(github.ref_name, 'v0') }}
75
+ type=semver,pattern=latest,priority=1000
76
+ type=semver,pattern={{version}},priority=900
77
+ type=semver,pattern={{major}}.{{minor}},priority=900
78
+ type=semver,pattern={{major}},enable=${{ !startsWith(github.ref_name, 'v0') }},priority=900
79
+ # Branch tag for all branches except main
80
+ type=ref,event=branch,enable=${{ github.ref_name != 'main' }},priority=600
81
+ # Always tag with short SHA
82
+ type=sha,format=short,prefix=,priority=100
73
83
 
74
84
  - name: Build and push Docker image
75
85
  id: push
@@ -85,8 +95,8 @@ jobs:
85
95
  cache-from: type=gha,scope=docker-build
86
96
  cache-to: type=gha,mode=max,scope=docker-build
87
97
 
88
- - name: Generate artifact attestation (Docker Hub)
89
- uses: actions/attest-build-provenance@v4
98
+ - name: Attest Provenance (Docker Hub)
99
+ uses: actions/attest@v4
90
100
  with:
91
101
  subject-name: ${{ env.DOCKER_HUB_REPO }}
92
102
  subject-digest: ${{ steps.push.outputs.digest }}
@@ -101,7 +111,7 @@ jobs:
101
111
  format: cyclonedx-json
102
112
 
103
113
  - name: Attest SBOM (Docker Hub)
104
- uses: actions/attest-sbom@v4
114
+ uses: actions/attest@v4
105
115
  with:
106
116
  subject-name: ${{ env.DOCKER_HUB_REPO }}
107
117
  subject-digest: ${{ steps.push.outputs.digest }}
@@ -4,8 +4,9 @@
4
4
  name: Upload Python Package
5
5
 
6
6
  on:
7
- release:
8
- types: [published]
7
+ workflow_run:
8
+ workflows: ["CI"]
9
+ types: [completed]
9
10
 
10
11
  concurrency:
11
12
  group: ${{ github.workflow }}-${{ github.ref }}
@@ -17,6 +18,13 @@ permissions:
17
18
  jobs:
18
19
  release-build:
19
20
  runs-on: ubuntu-latest
21
+ # Only run if CI succeeded AND it was a release event.
22
+ # Added head_repository check for security.
23
+ if: |
24
+ github.event_name == 'workflow_run' &&
25
+ github.event.workflow_run.conclusion == 'success' &&
26
+ github.event.workflow_run.event == 'release' &&
27
+ github.event.workflow_run.head_repository.full_name == github.repository
20
28
  permissions:
21
29
  contents: read
22
30
  id-token: write
@@ -25,6 +33,7 @@ jobs:
25
33
  steps:
26
34
  - uses: actions/checkout@v6
27
35
  with:
36
+ ref: ${{ github.event.workflow_run.head_sha }}
28
37
  fetch-depth: 0
29
38
 
30
39
  - name: Set up uv
@@ -64,7 +73,8 @@ jobs:
64
73
  id: get_version
65
74
  run: |
66
75
  # Strips 'v' prefix from tag_name (e.g. v0.2.1 -> 0.2.1)
67
- VERSION=${{ github.event.release.tag_name }}
76
+ # In workflow_run for a release, head_branch contains the tag name.
77
+ VERSION=${{ github.event.workflow_run.head_branch }}
68
78
  echo "version=${VERSION#v}" >> $GITHUB_OUTPUT
69
79
 
70
80
  - name: Retrieve release distributions
@@ -0,0 +1,40 @@
1
+ # Byte-compiled / optimized / DLL files
2
+ __pycache__/
3
+ *.py[cod]
4
+ *.so
5
+
6
+ # Python Environments & Caches
7
+ .venv/
8
+ env/
9
+ .mypy_cache/
10
+ .pytest_cache/
11
+ .ruff_cache/
12
+ .cache/
13
+
14
+ # Distribution / packaging
15
+ build/
16
+ dist/
17
+ sdist/
18
+
19
+ # Unit test / coverage reports
20
+ htmlcov/
21
+ .tox/
22
+ .coverage
23
+ .coverage.*
24
+ coverage.xml
25
+
26
+ # Sphinx documentation
27
+ docs/_build/
28
+
29
+ # IPython intermediate checkpoints
30
+ .ipynb_checkpoints
31
+
32
+ # Data and Track files
33
+ *.nc
34
+ *.txt
35
+ !data/test/tracks/*.txt
36
+ *.pickle
37
+
38
+ # IDE and Project Tooling
39
+ .vscode/
40
+ worktrees/
@@ -0,0 +1 @@
1
+ 3.13
@@ -0,0 +1,97 @@
1
+ # PyStormTracker Architecture
2
+
3
+ This document describes the modern, high-performance architecture of PyStormTracker, detailing how it leverages vectorization and decoupled components to process massive climate datasets efficiently.
4
+
5
+ ## 1. High-Level Design Philosophy
6
+
7
+ PyStormTracker is built for scale and extensibility. The architecture is centered around four core principles:
8
+ 1. **Unified API (Tracker Protocol):** A structural interface that allows the CLI and Python API to support multiple tracking algorithms (e.g., `SimpleTracker`, `HodgesTracker`) interchangeably.
9
+ 2. **Centralized Threshold Management:** The `SimpleDetector` is responsible for managing variable-specific detection thresholds (e.g., `1e-4` for vorticity), ensuring consistent behavior across different parallel backends.
10
+ 3. **Vectorization & JIT:** Heavy mathematical operations are offloaded to **Numba** JIT-compiled kernels and **NumPy** broadcasting, bypassing Python's loop overhead and Global Interpreter Lock (GIL).
11
+ 4. **Hybrid Parallelism:** The architecture parallelizes the computationally intensive **Detection** phase while centralizing the **Linking** phase to ensure perfect serial-parallel consistency.
12
+
13
+ ---
14
+
15
+ ## 2. Modern Core Components
16
+
17
+ ### 2.1 Array-Backed Data Models (`Tracks`, `Track`, `Center`)
18
+ The data models utilize a contiguous memory paradigm:
19
+ * **`Tracks`**: The central container holding contiguous 1D NumPy arrays for `track_ids`, `times`, `lats`, `lons`, and a dictionary of scientific variables.
20
+ * **`Track`**: A lightweight "view" into the `Tracks` arrays for a specific ID.
21
+ * **`Center`**: A simple dataclass used strictly for iteration or final data export.
22
+
23
+ **Benefits:** By avoiding the creation of millions of Python objects, memory usage is minimized, and data serialization between parallel processes is nearly instantaneous. Raw NumPy arrays also enable extremely fast distance calculations via C-level broadcasting.
24
+
25
+ ### 2.2 Shared DataLoader
26
+ Data loading is encapsulated in a dedicated `DataLoader` class (`io/loader.py`). This component handles:
27
+ * **Format Abstraction**: Seamlessly detects and opens NetCDF (via `h5netcdf` or `netcdf4`) and GRIB (via `cfgrib`) files.
28
+ * **Variable Mapping**: Automatically maps common variable aliases (e.g., `msl`/`slp`, `vo`/`rv`) and coordinate names (`latitude`/`lat`), allowing the same tracking logic to work across different data providers.
29
+ * **Contiguous I/O**: Performs single-block contiguous reads from disk, bypassing HDF5 lock contention.
30
+
31
+ ### 2.3 Vectorized Linker (`SimpleLinker`)
32
+ Trajectory construction uses NumPy broadcasting to calculate Haversine distance matrices between existing track tails and new storm centers. By sorting points spatially before matching, the Linker ensures deterministic, greedy nearest-neighbor linking.
33
+
34
+ ### 2.4 Parallel Pipeline (Gather-then-Link)
35
+ To ensure that parallel results are bit-wise identical to serial runs, PyStormTracker uses a hybrid parallel strategy:
36
+ 1. **Parallel Detection**: Time chunks are distributed across Dask or MPI workers. Each worker runs Numba kernels to find centers and returns raw coordinate arrays.
37
+ 2. **Centralized Linking**: The main process gathers the raw detections from all workers and performs a single sequential link.
38
+
39
+ **Why this works:** In storm tracking, the **Detection** phase (finding local extrema in 3D grids) consumes >95% of the runtime. The **Linking** phase (connecting coordinate lists) is extremely fast once vectorized. Centralizing the link eliminates the complex "merging" bugs found in tree-reduction strategies while maintaining near-perfect parallel scaling.
40
+
41
+ ---
42
+
43
+ ## 3. The `Tracker` Protocol
44
+
45
+ The `Tracker` Protocol (defined in `src/pystormtracker/models/tracker.py`) provides a standardized interface for all tracking algorithms:
46
+
47
+ ```python
48
+ import pystormtracker as pst
49
+
50
+ # Instantiate any compliant tracker
51
+ tracker = pst.SimpleTracker()
52
+
53
+ # Standardized .track() method
54
+ tracks = tracker.track(
55
+ infile="era5_msl.nc",
56
+ varname="msl",
57
+ start_time="2025-01-01",
58
+ backend="dask"
59
+ )
60
+
61
+ # Standardized export
62
+ tracks.write("output.txt", format="imilast")
63
+ ```
64
+
65
+ ---
66
+
67
+ ## 4. Future Architectural Direction
68
+
69
+ To further optimize scalability and memory efficiency for native-resolution climate datasets (e.g., 0.25° ERA5), the architecture is evolving towards deeper integration with the scientific Python ecosystem:
70
+
71
+ * **Idiomatic Xarray (`apply_ufunc`):** Transitioning away from custom MPI/Dask chunking in favor of Xarray's native `apply_ufunc(..., dask="parallelized")`. This delegates chunk management and distributed execution entirely to Xarray/Dask, reducing custom orchestration code.
72
+ * **Lazy Evaluation & Thread Topology:** Shifting from eager chunk-loading to lazy, frame-by-frame memory access to eliminate out-of-memory risks on large domains, while strictly pinning Numba thread topologies to prevent CPU oversubscription in multi-process backends.
73
+ * **Tree-based Linking:** Upgrading the current NumPy-broadcasting linker to utilize C-level tree structures (e.g., `scipy.spatial.cKDTree`), breaking the $O(N^2)$ scaling barrier for extremely long or dense trajectory sequences.
74
+
75
+ For more details on specific planned implementations, see the [Roadmap](ROADMAP.md).
76
+
77
+ ---
78
+
79
+ ## 5. Performance Benchmarks
80
+
81
+ To quantify the efficiency gains of the modern array-backed JIT architecture, a comprehensive performance comparison was conducted between the legacy object-oriented system (`v0.3.3`) and the current implementation.
82
+
83
+ Detailed execution timings (breaking down Detection, Linking, Export, and I/O Overhead) across Serial, Dask, and MPI backends for both standard and high-resolution ERA5 datasets are available in the [Benchmark Report](benchmark/BENCHMARK.md).
84
+
85
+ ---
86
+
87
+ ## Appendix: Evolution from Legacy Architecture
88
+
89
+ The current architecture represents a fundamental shift from the legacy nested-object design used in earlier versions.
90
+
91
+ | Feature | Legacy Architecture (v0.3.x and earlier) | Modern Architecture (v0.4.0+) |
92
+ | :--- | :--- | :--- |
93
+ | **Data Storage** | Nested lists of `Center` and `Track` objects. | Flat, C-contiguous NumPy arrays. |
94
+ | **Parallelism** | Threads (bottlenecked by GIL). | Processes/MPI (true concurrent I/O). |
95
+ | **Linking Strategy** | Tree-reduction (prone to boundary splits). | Parallel Detect + Centralized Link (perfect matching). |
96
+ | **Linker** | $O(N^2)$ nested Python loops. | Vectorized NumPy matrix broadcasting. |
97
+ | **I/O** | Many small lazy-loaded chunks. | Contiguous shared `DataLoader`. |
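Section 2.1 of the new ARCHITECTURE.md describes a `Tracks` container built from flat 1D arrays, with `Track` acting as a lightweight per-ID view. The sketch below shows a minimal version of that pattern. It assumes rows are sorted by track ID, so each track occupies one contiguous slice and basic slicing returns NumPy views rather than copies. The class `FlatTracks` and its `track()` method are hypothetical and are not the package's `Tracks` API.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class FlatTracks:
    """Hypothetical array-backed container: one row per storm-center observation."""

    track_ids: np.ndarray  # integer IDs, sorted ascending
    times: np.ndarray      # datetime64 values
    lats: np.ndarray       # degrees north
    lons: np.ndarray       # degrees east

    def __post_init__(self) -> None:
        if np.any(np.diff(self.track_ids) < 0):
            raise ValueError("track_ids must be sorted so each track is one contiguous slice")
        # First row of every unique ID, plus an end sentinel.
        self._ids, starts = np.unique(self.track_ids, return_index=True)
        self._bounds = np.append(starts, self.track_ids.size)

    def __len__(self) -> int:
        return int(self._ids.size)

    def track(self, tid: int) -> dict[str, np.ndarray]:
        i = int(np.searchsorted(self._ids, tid))
        if i == self._ids.size or self._ids[i] != tid:
            raise KeyError(tid)
        rows = slice(int(self._bounds[i]), int(self._bounds[i + 1]))
        # Basic slicing returns views into the shared arrays, not copies.
        return {"time": self.times[rows], "lat": self.lats[rows], "lon": self.lons[rows]}


# Usage: two tracks stored in five flat rows.
ft = FlatTracks(
    track_ids=np.array([0, 0, 0, 1, 1]),
    times=np.array(["2025-01-01T00", "2025-01-01T06", "2025-01-01T12",
                    "2025-01-01T00", "2025-01-01T06"], dtype="datetime64[h]"),
    lats=np.array([50.0, 51.0, 52.5, 30.0, 31.0]),
    lons=np.array([340.0, 345.0, 351.0, 120.0, 124.0]),
)
t1 = ft.track(1)
```

Because `t1["lat"]` shares memory with `ft.lats`, no per-point Python objects are created. Only the four arrays need to cross a process boundary.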
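Section 2.3 of ARCHITECTURE.md describes linking with a Haversine distance matrix computed by NumPy broadcasting, followed by greedy nearest-neighbor matching. The following is a sketch of that idea, not the package's `SimpleLinker`. Several details are assumptions: the Earth radius constant, the `max_km` cutoff, and the way determinism is achieved. The document says points are sorted spatially. This sketch instead sorts candidate pairs by distance with a stable sort, which is one possible approach. The Haversine form handles the dateline naturally, because it depends only on the sine of half the longitude difference.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_matrix(lat1, lon1, lat2, lon2) -> np.ndarray:
    """Great-circle distances (km): rows index point set 1, columns index set 2."""
    phi1 = np.radians(np.asarray(lat1, dtype=float))[:, None]
    phi2 = np.radians(np.asarray(lat2, dtype=float))[None, :]
    dlon = (np.radians(np.asarray(lon2, dtype=float))[None, :]
            - np.radians(np.asarray(lon1, dtype=float))[:, None])
    a = np.sin((phi2 - phi1) / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlon / 2) ** 2
    # clip guards against tiny floating-point overshoot above 1
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def greedy_link(tail_lat, tail_lon, new_lat, new_lon, max_km: float) -> dict[int, int]:
    """Greedy nearest-neighbour matching of track tails (keys) to new centers (values).

    Pairs are visited in ascending distance (stable sort, so ties resolve by
    index), and each tail and each new center is used at most once.
    """
    dist = haversine_matrix(tail_lat, tail_lon, new_lat, new_lon)
    rows, cols = np.unravel_index(np.argsort(dist, axis=None, kind="stable"), dist.shape)
    links: dict[int, int] = {}
    used_new: set[int] = set()
    for r, c in zip(rows.tolist(), cols.tolist()):
        if dist[r, c] > max_km:
            break  # every remaining pair is farther away
        if r in links or c in used_new:
            continue
        links[r] = c
        used_new.add(c)
    return links


# Usage: two existing track tails and three new centers.
links = greedy_link([40.0, 10.0], [0.0, 100.0],
                    [10.5, 40.5, -30.0], [100.0, 1.0, 50.0], max_km=500.0)
```

The distance matrix is still O(N x M), which matches the document's note that a tree structure would be needed to go beyond O(N^2).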
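Section 2.4 of ARCHITECTURE.md describes the gather-then-link pipeline. Detection runs per time chunk on workers. The main process then gathers the detections in time order and performs a single sequential link. The sketch below shows that control flow.

Some parts of the sketch are stand-ins:
- `ThreadPoolExecutor` is used only so the example runs anywhere. In the package, Dask or MPI workers fill that role.
- The per-frame argmin "detector" is a toy, not the package's Numba kernels.
- `detect_chunk`, `gather_then_link`, and `link` are hypothetical names.

```python
from collections.abc import Callable
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import TypeVar

import numpy as np

T = TypeVar("T")


def detect_chunk(frames: np.ndarray) -> list[int]:
    """Toy detector: flat index of the minimum of each 2D frame in a time chunk."""
    return [int(np.argmin(frame)) for frame in frames]


def gather_then_link(
    field: np.ndarray, n_chunks: int, executor: Executor, link: Callable[[list[int]], T]
) -> T:
    # Parallel phase: one detection task per contiguous block of time steps.
    chunks = np.array_split(field, n_chunks, axis=0)
    # Executor.map yields results in submission order, regardless of which
    # worker finishes first, so the gathered list is in time order.
    per_chunk = executor.map(detect_chunk, chunks)
    detections = [d for chunk in per_chunk for d in chunk]
    # Serial phase: a single link over all detections on the main process.
    return link(detections)


def link(dets: list[int]) -> list[int]:
    return dets  # stand-in for a real linker


# Usage: the chunked result matches a purely serial run.
rng = np.random.default_rng(0)
field = rng.normal(size=(12, 4, 6))  # (time, lat, lon)
serial = link(detect_chunk(field))
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = gather_then_link(field, n_chunks=5, executor=pool, link=link)
```

Because the linker always sees the same ordered input, its output cannot depend on how the time axis was chunked. This is the property the document relies on for serial-parallel consistency.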
@@ -12,14 +12,14 @@ identifiers:
12
12
  value: 10.5281/zenodo.18764813
13
13
  repository-code: 'https://github.com/mwyau/PyStormTracker'
14
14
  url: 'https://pystormtracker.readthedocs.io/'
15
- abstract: A Parallel Object-Oriented Cyclone Tracker in Python
15
+ abstract: A High-Performance Cyclone Tracker in Python
16
16
  keywords:
17
17
  - cyclone tracking
18
18
  - climate variability
19
19
  - dask
20
20
  - mpi
21
21
  license: BSD-3-Clause
22
- version: 0.3.3
22
+ version: 0.4.0
23
23
  date-released: '2026-03-10'
24
24
  preferred-citation:
25
25
  type: article
@@ -1,7 +1,7 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: PyStormTracker
3
- Version: 0.3.3
4
- Summary: A Parallel Object-Oriented Cyclone Tracker in Python
3
+ Version: 0.4.0
4
+ Summary: A High-Performance Cyclone Tracker in Python
5
5
  Project-URL: Homepage, https://pypi.org/project/PyStormTracker/
6
6
  Project-URL: Repository, https://github.com/mwyau/PyStormTracker.git
7
7
  Project-URL: Issues, https://github.com/mwyau/PyStormTracker/issues
@@ -22,13 +22,15 @@ Requires-Dist: dask>=2024.1.0
22
22
  Requires-Dist: distributed>=2024.1.0
23
23
  Requires-Dist: h5netcdf>=1.0.0
24
24
  Requires-Dist: h5py>=3.8.0
25
+ Requires-Dist: matplotlib>=3.10.8
26
+ Requires-Dist: numba>=0.60.0
25
27
  Requires-Dist: numpy>=1.24.0
26
- Requires-Dist: scipy>=1.9.2
28
+ Requires-Dist: pandas>=3.0.1
27
29
  Requires-Dist: xarray>=2024.9.0
28
30
  Provides-Extra: all
29
31
  Requires-Dist: cfgrib>=0.9.15.1; extra == 'all'
30
32
  Requires-Dist: eccodes>=2.43.0; extra == 'all'
31
- Requires-Dist: eccodeslib>=2.43.0; extra == 'all'
33
+ Requires-Dist: eccodeslib>=2.43.0; (sys_platform != 'win32') and extra == 'all'
32
34
  Requires-Dist: mpi4py>=4.1.0; extra == 'all'
33
35
  Requires-Dist: netcdf4>=1.6.1; extra == 'all'
34
36
  Provides-Extra: docs
@@ -38,7 +40,7 @@ Requires-Dist: sphinx>=9.0.4; extra == 'docs'
38
40
  Provides-Extra: grib
39
41
  Requires-Dist: cfgrib>=0.9.15.1; extra == 'grib'
40
42
  Requires-Dist: eccodes>=2.43.0; extra == 'grib'
41
- Requires-Dist: eccodeslib>=2.43.0; extra == 'grib'
43
+ Requires-Dist: eccodeslib>=2.43.0; (sys_platform != 'win32') and extra == 'grib'
42
44
  Provides-Extra: mpi
43
45
  Requires-Dist: mpi4py>=4.1.0; extra == 'mpi'
44
46
  Provides-Extra: netcdf4
@@ -57,29 +59,37 @@ Description-Content-Type: text/markdown
57
59
  [![GHCR](https://img.shields.io/badge/ghcr.io-xddd%2Fpystormtracker-blue?logo=github)](https://github.com/orgs/xddd/packages/container/package/pystormtracker)
58
60
  [![DOI](https://zenodo.org/badge/36328800.svg)](https://doi.org/10.5281/zenodo.18764813)
59
61
 
60
- **PyStormTracker** is a Python package for cyclone trajectory analysis, implementing the "Simple Tracker" algorithm described in **Yau and Chang (2020)**. It is currently being expanded to include a Python port of the adaptive constraints tracking algorithm from **Hodges (1999)** (originally in C) and the Accumulated Track Activity metrics from **Yau and Chang (2020)** (originally in Matlab).
62
+ **PyStormTracker** is a high-performance Python package for cyclone trajectory analysis. It implements the "Simple Tracker" algorithm and provides a scalable framework for processing large-scale climate datasets like ERA5.
61
63
 
62
- Initially developed at the **National Center for Atmospheric Research (NCAR)** as part of the **2015 SIParCS** program, PyStormTracker leverages task-parallel strategies and tree reduction algorithms to efficiently and accurately process large-scale climate datasets.
64
+ The project is currently being expanded to include a Python port of the adaptive constraints tracking algorithm from **Hodges (1999)** and Accumulated Track Activity metrics.
65
+
66
+ Initially developed at the **National Center for Atmospheric Research (NCAR)** as part of the **2015 SIParCS** program, PyStormTracker originally used task-parallel strategies and tree-reduction algorithms to efficiently process large-scale climate datasets.
63
67
 
64
68
  ## Features
65
69
 
66
- - **Modern & Typed**: Strictly targets **Python 3.11+** with complete type hints and strict `mypy` compliance.
67
- - **Xarray Native**: Leverages `xarray` for robust, high-performance coordinate-aware processing, lazy data loading, and optimized I/O.
68
- - **Scalable Execution**: Supports multiple backends:
69
- - **Dask (Default)**: Automatically scales to utilize all available CPU cores.
70
- - **MPI**: Enables distributed execution across cluster nodes via `mpi4py`.
71
- - **Serial**: Standard sequential execution for debugging or small datasets.
72
- - **Robust Feature Detection**: Employs optimized $O(N)$ extrema filtering with robust handling of masked or missing data.
73
- - **Interoperable Output**: Exports tracking results to the standard IMILAST intercomparison format (`.txt`) with human-readable datetime strings.
70
+ - **High-Performance Architecture**: Uses an **Array-Backed** data model to eliminate Python object overhead and ensure zero-copy serialization during parallel execution. **Achieves up to 11.8x speedup in serial workloads.**
71
+ - **JIT-Optimized Kernels**: Core mathematical filters are implemented in **Numba**, compiled to native code that runs at C-like speeds and releases the GIL.
72
+ - **Xarray Native**: Seamlessly handles NetCDF and GRIB formats with coordinate-aware processing and robust variable alias handling (e.g., `msl`/`slp`, `lon`/`longitude`).
73
+ - **Scalable Backends**:
74
+ - **Serial (Default)**: Standard sequential execution.
75
+ - **Dask**: Multi-process parallel detection with centralized linking, for local or distributed scaling.
76
+ - **MPI**: High-performance distributed execution via `mpi4py`.
77
+ - **Typed & Modern**: Built for **Python 3.11+** with strict type safety and `mypy` compliance.
78
+ - **Interoperable**: Full support for the standard **IMILAST** intercomparison format (`.txt`) with human-readable datetime strings.
79
+
80
+ <p align="center">
81
+ <img src="benchmark/benchmark_0_25x0_25_breakdown.png" width="600" alt="v0.4.0 Performance Improvements">
82
+ <br>
83
+ <i>Significant performance gains in v0.4.0+ compared to the legacy v0.3.3 architecture on high-resolution ERA5 data.</i>
84
+ </p>
74
85
 
75
86
  ## Technical Methodology
76
87
 
77
- PyStormTracker treats meteorological fields as 2D images, utilizing `scipy.ndimage` for robust feature detection and tracking:
88
+ PyStormTracker treats meteorological fields as 2D images and leverages JIT-compiled Numba loops for high-performance feature detection:
78
89
 
79
90
  - **Local Extrema Detection**: Employs an optimized sliding window filter to efficiently identify local minima (e.g., cyclones) or maxima (e.g., anticyclones, vorticity).
80
91
  - **Intensity & Refinement**: Applies the discrete **Laplacian operator** to measure the "sharpness" of the field at each candidate center. This metric resolves duplicate detections, ensuring only the most physically intense point is retained when adjacent pixels are flagged.
81
- - **Spherical Continuity**: Uses `mode='wrap'` for all spatial filters to correctly handle periodic boundary conditions across the Prime Meridian, allowing for seamless global tracking.
82
- - **Trajectory Linking**: Connects detected centers across consecutive time steps into continuous trajectories using a nearest-neighbor heuristic linking strategy.
92
+ - **Trajectory Linking**: Connects detected centers across consecutive time steps into continuous trajectories using a vectorized nearest-neighbor heuristic linking strategy.
83
93
 
84
94
  ## Documentation
85
95
 
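The Technical Methodology steps in the README are local-extrema detection with a sliding window, followed by Laplacian "sharpness" to resolve duplicate detections. They can be illustrated with plain NumPy. This is a stand-in, not the package's Numba kernels in `simple/kernels.py`.

The following choices are assumptions made for the sketch:
- a 3x3 window;
- a 5-point Laplacian in grid units;
- periodic longitude via `np.roll`, with latitude edge rows excluded;
- the rule that a candidate survives only if no adjacent candidate has a larger Laplacian.

```python
import numpy as np

# The 8 neighbour offsets of a 3x3 window.
OFFSETS = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]


def neighbours(a: np.ndarray):
    """Yield the 8 shifted copies of a 2D (lat, lon) array; np.roll wraps both axes."""
    for di, dj in OFFSETS:
        yield np.roll(a, (di, dj), axis=(0, 1))


def detect_minima(field: np.ndarray) -> list[tuple[int, int]]:
    # 1. Local minima over a 3x3 window (longitude wraps via np.roll).
    is_min = np.ones(field.shape, dtype=bool)
    for nb in neighbours(field):
        is_min &= field <= nb
    # Rolling along latitude wraps pole to pole, so drop the first and last rows.
    is_min[0, :] = is_min[-1, :] = False
    # 2. Discrete 5-point Laplacian (grid units); larger means a sharper minimum.
    lap = (np.roll(field, 1, 0) + np.roll(field, -1, 0)
           + np.roll(field, 1, 1) + np.roll(field, -1, 1) - 4 * field)
    # 3. Among adjacent candidates keep the one with the largest Laplacian
    #    (exact ties, e.g. a perfectly flat plateau, are all kept here).
    score = np.where(is_min, lap, -np.inf)
    best_nb = np.max(np.stack(list(neighbours(score))), axis=0)
    keep = is_min & (score >= best_nb)
    return [(int(i), int(j)) for i, j in zip(*np.nonzero(keep))]


# Usage: a two-pixel low at columns 3-4 and a second low at column 0,
# whose western neighbour is column 7 across the periodic boundary.
slp = np.full((5, 8), 1010.0)
slp[2, [0, 3, 4, 5, 7]] = [995.0, 990.0, 990.0, 1000.0, 1005.0]
centers = detect_minima(slp)  # → [(2, 0), (2, 3)]
```

Both pixels at columns 3 and 4 pass the window test. Column 3 has the larger Laplacian (60 vs 50 in grid units), so the duplicate at column 4 is dropped.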
@@ -90,7 +100,7 @@ Full documentation, including API references and advanced usage examples, is ava
90
100
  ### Prerequisites
91
101
  - Python 3.11+
92
102
  - (Optional) OpenMPI for MPI support.
93
- - **Windows Users**: Note that the `grib` optional dependency (via `eccodeslib`) currently only supports Linux and macOS.
103
+ - **Windows Users**: The `eccodeslib` GRIB helper library is only required on Linux and macOS; GRIB/ecCodes support on Windows is currently experimental and untested.
94
104
 
95
105
  ### From PyPI (Recommended)
96
106
  You can install the latest stable version of PyStormTracker directly from PyPI:
@@ -119,23 +129,61 @@ uv sync
119
129
 
120
130
  ## Usage
121
131
 
132
+ ### Command Line Interface
133
+
122
134
  Once installed, you can use the `stormtracker` command directly:
123
135
 
124
136
  ```bash
125
- stormtracker -i era5_msl_2025-2026_djf_2.5x2.5.nc -v msl -o my_tracks
137
+ stormtracker -i data.nc -v msl -o my_tracks
126
138
  ```
127
139
 
128
- ### Command Line Arguments
140
+ #### Command Line Arguments
129
141
 
130
142
  | Argument | Short | Description |
131
143
  | :--- | :--- | :--- |
132
- | `--input` | `-i` | **Required.** Path to the input NetCDF file. |
144
+ | `--input` | `-i` | **Required.** Path to the input NetCDF/GRIB file. |
133
145
  | `--var` | `-v` | **Required.** Variable name to track (e.g., `msl`, `vo`). |
134
146
  | `--output` | `-o` | **Required.** Path to the output track file (appends `.txt` if missing). |
135
147
  | `--num` | `-n` | Number of time steps to process. |
148
+ | `--threshold` | `-t` | Detection threshold (default: `1e-4` for `vo`, `0.0` otherwise). |
136
149
  | `--mode` | `-m` | `min` (default) for low pressure, `max` for vorticity/high pressure. |
137
- | `--backend` | `-b` | `dask` (default), `serial`, or `mpi`. |
150
+ | `--backend` | `-b` | `serial` (default), `dask`, or `mpi`. |
138
151
  | `--workers` | `-w` | Number of Dask workers (defaults to CPU core count). |
152
+ | `--engine` | `-e` | xarray engine used to open the input file (e.g., `h5netcdf`, `netcdf4`, `cfgrib`). |
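The defaults in this table depend on other flags: the threshold depends on `--var`, the output suffix on `--output`, and the worker count on the machine. The hypothetical `argparse` parser below mirrors the documented flags to make those rules concrete. It is an illustrative sketch, not the package's actual `cli.py`; `build_parser` and `resolve_args` are invented names.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented `stormtracker` flags."""
    p = argparse.ArgumentParser(prog="stormtracker")
    p.add_argument("-i", "--input", required=True, help="input NetCDF/GRIB file")
    p.add_argument("-v", "--var", required=True, help="variable to track, e.g. msl or vo")
    p.add_argument("-o", "--output", required=True, help="output track file")
    p.add_argument("-n", "--num", type=int, default=None, help="number of time steps")
    p.add_argument("-t", "--threshold", type=float, default=None, help="detection threshold")
    p.add_argument("-m", "--mode", choices=["min", "max"], default="min")
    p.add_argument("-b", "--backend", choices=["serial", "dask", "mpi"], default="serial")
    p.add_argument("-w", "--workers", type=int, default=None, help="number of Dask workers")
    p.add_argument("-e", "--engine", default=None, help="xarray engine, e.g. h5netcdf")
    return p


def resolve_args(argv: list[str]) -> argparse.Namespace:
    """Parse argv and fill in the flag-dependent defaults listed in the table."""
    args = build_parser().parse_args(argv)
    if args.threshold is None:
        # Documented default: 1e-4 for vorticity (vo), 0.0 for everything else.
        args.threshold = 1e-4 if args.var == "vo" else 0.0
    if not args.output.endswith(".txt"):
        args.output += ".txt"  # .txt is appended when missing
    if args.workers is None:
        args.workers = os.cpu_count() or 1  # defaults to the CPU core count
    return args
```

Under these rules, `stormtracker -i data.nc -v vo -m max -o vo_tracks` would track vorticity maxima with a `1e-4` threshold on the serial backend and write `vo_tracks.txt`.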
153
+
154
+ ### Python API
155
+
156
+ You can also call PyStormTracker directly from your own scripts or Jupyter notebooks:
157
+
158
+ ```python
159
+ import pystormtracker as pst
160
+
161
+ # 1. Instantiate the tracker
162
+ tracker = pst.SimpleTracker()
163
+
164
+ # 2. Run the tracking algorithm. Returns an array-backed Tracks object.
165
+ tracks = tracker.track(
166
+     infile="data.nc",
167
+     varname="msl",
168
+     mode="min",
169
+     start_time="2025-01-01",  # Optional: limit by start date
170
+     end_time="2025-01-31",  # Optional: limit by end date
171
+     backend="dask",  # Optional: 'serial' (default), 'dask', or 'mpi'
172
+     n_workers=4,  # Optional: number of Dask workers
173
+ )
174
+
175
+ # 3. Analyze the results programmatically
176
+ for track in tracks:
177
+     if len(track) >= 8:
178
+         print(f"Track {track.track_id} lived for {len(track)} steps.")
179
+
180
+ # 4. Export results
181
+ tracks.write("output.txt", format="imilast")
182
+ ```
183
+
184
+ ## Sample Data
185
+
186
+ Sample datasets for testing and benchmarking are hosted in the [PyStormTracker-Data](https://github.com/mwyau/PyStormTracker-Data) repository.
139
187
 
140
188
  ## Development
141
189
 
@@ -163,7 +211,9 @@ uv run mypy src/
163
211
  ### Tiered Testing
164
212
  To keep development cycles fast, testing is tiered:
165
213
  - **Fast Tests**: Default local runs (skips integration tests).
166
- - **Integration Tests**: ONLY long-running integration/regression tests.
214
+ - **Integration Tests**: Integration and regression tests, which come in two variants:
215
+   - **Local**: Runs the "short" variants (60 time steps) to quickly check consistency across backends.
216
+   - **CI**: Runs the "full" variants (all time steps), including the legacy regression tests.
167
217
  - **Full Suite**: Everything.
168
218
 
169
219
  **Run fast unit tests only (Default):**
@@ -171,7 +221,7 @@ To keep development cycles fast, testing is tiered:
171
221
  uv run pytest
172
222
  ```
173
223
 
174
- **Run ONLY integration tests:**
224
+ **Run integration tests (short variants when run locally):**
175
225
  ```bash
176
226
  uv run pytest --run-integration
177
227
  ```
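One common way to wire up a `--run-integration` flag is the "skip tests based on a command-line option" pattern from the pytest documentation. The sketch below assumes a marker named `integration` and uses the `CI` environment variable (which GitHub Actions sets to `true`) to choose between the short and full variants. The helper `integration_steps` and the fixture `n_steps` are hypothetical, and the repository's real `conftest.py` may be organized differently.

```python
# conftest.py (hypothetical sketch)
import os

import pytest


def pytest_addoption(parser):
    # Registers the opt-in flag used by `uv run pytest --run-integration`.
    parser.addoption("--run-integration", action="store_true", default=False,
                     help="run integration/regression tests")


def pytest_configure(config):
    config.addinivalue_line("markers", "integration: long-running integration test")


def pytest_collection_modifyitems(config, items):
    # Without the flag, every test marked `integration` is skipped.
    if config.getoption("--run-integration"):
        return
    skip = pytest.mark.skip(reason="needs --run-integration")
    for item in items:
        if "integration" in item.keywords:
            item.add_marker(skip)


def integration_steps():
    """60 time steps locally ("short"); None, meaning all steps, on CI ("full")."""
    return None if os.environ.get("CI") else 60


@pytest.fixture
def n_steps():
    return integration_steps()
```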
@@ -187,7 +237,7 @@ If you use this software in your research, please cite the following:
187
237
 
188
238
  - **Yau, A. M. W.**, 2026: mwyau/PyStormTracker. *Zenodo*, [https://doi.org/10.5281/zenodo.18764813](https://doi.org/10.5281/zenodo.18764813).
189
239
 
190
- - **Yau, A. M. W., and E. K. M. Chang**, 2020: Finding Storm Track Activity Metrics That Are Highly Correlated with Weather Impacts. Part I: Frameworks for Evaluation and Accumulated Track Activity. *J. Climate*, **33**, 10169–10186, [https://doi.org/10.1175/JCLI-D-20-0393.1](https://doi.org/10.1175/JCLI-D-20-0393.1).
240
+ - **Yau, A. M. W. and Chang, E. K. M.**, 2020: Finding Storm Track Activity Metrics That Are Highly Correlated with Weather Impacts. Part I: Frameworks for Evaluation and Accumulated Track Activity. *J. Climate*, **33**, 10169–10186, [https://doi.org/10.1175/JCLI-D-20-0393.1](https://doi.org/10.1175/JCLI-D-20-0393.1).
191
241
 
192
242
  ## References
193
243