jp-idwr-db 0.2.2__tar.gz → 0.2.3__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (27)
  1. {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/CHANGELOG.md +6 -0
  2. jp_idwr_db-0.2.3/PKG-INFO +293 -0
  3. jp_idwr_db-0.2.3/README.md +256 -0
  4. {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/pyproject.toml +1 -1
  5. {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/__init__.py +1 -1
  6. {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/config.py +1 -1
  7. {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/data_manager.py +11 -0
  8. jp_idwr_db-0.2.2/PKG-INFO +0 -243
  9. jp_idwr_db-0.2.2/README.md +0 -206
  10. {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/.gitignore +0 -0
  11. {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/CITATION.cff +0 -0
  12. {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/LICENSE +0 -0
  13. {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/__main__.py +0 -0
  14. {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/_internal/__init__.py +0 -0
  15. {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/_internal/download.py +0 -0
  16. {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/_internal/read.py +0 -0
  17. {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/_internal/validation.py +0 -0
  18. {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/api.py +0 -0
  19. {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/cli.py +0 -0
  20. {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/datasets.py +0 -0
  21. {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/http.py +0 -0
  22. {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/io.py +0 -0
  23. {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/py.typed +0 -0
  24. {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/transform.py +0 -0
  25. {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/types.py +0 -0
  26. {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/urls.py +0 -0
  27. {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/utils.py +0 -0
@@ -1,5 +1,11 @@
  # Changelog
 
+ ## 0.2.3 - 2026-02-06
+
+ - Refreshed release data assets from sorted parquet datasets (date/prefecture/category ordering).
+ - Improved README motivation and practical example narrative.
+ - Expanded examples with research-oriented `get_data()` workflows.
+
  ## 0.2.2 - 2026-02-06
 
  - Fixed PyPI publish command in release workflow for current `uv` (`uv publish dist/*`).
@@ -0,0 +1,293 @@
1
+ Metadata-Version: 2.4
2
+ Name: jp-idwr-db
3
+ Version: 0.2.3
4
+ Summary: Japanese IDWR infectious disease database and analytics toolkit built on Polars.
5
+ Project-URL: Homepage, https://github.com/AlFontal/jp-idwr-db
6
+ Project-URL: Repository, https://github.com/AlFontal/jp-idwr-db
7
+ Project-URL: Bug Tracker, https://github.com/AlFontal/jp-idwr-db/issues
8
+ Author: jp-idwr-db contributors
9
+ License: GPL-3.0-or-later
10
+ License-File: LICENSE
11
+ Keywords: epidemiology,infectious-disease,japan,polars,surveillance
12
+ Classifier: Development Status :: 3 - Alpha
13
+ Classifier: Intended Audience :: Science/Research
14
+ Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
15
+ Classifier: Programming Language :: Python :: 3
16
+ Classifier: Programming Language :: Python :: 3.10
17
+ Classifier: Programming Language :: Python :: 3.11
18
+ Classifier: Programming Language :: Python :: 3.12
19
+ Classifier: Programming Language :: Python :: 3.13
20
+ Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
21
+ Requires-Python: >=3.10
22
+ Requires-Dist: fastexcel>=0.10
23
+ Requires-Dist: httpx>=0.27
24
+ Requires-Dist: openpyxl>=3.1
25
+ Requires-Dist: platformdirs>=4.2
26
+ Requires-Dist: polars>=0.20
27
+ Requires-Dist: pyarrow>=14.0
28
+ Provides-Extra: dev
29
+ Requires-Dist: mypy>=1.8; extra == 'dev'
30
+ Requires-Dist: pre-commit>=3.7; extra == 'dev'
31
+ Requires-Dist: pytest-cov>=5.0; extra == 'dev'
32
+ Requires-Dist: pytest>=8.0; extra == 'dev'
33
+ Requires-Dist: ruff>=0.4; extra == 'dev'
34
+ Provides-Extra: excel
35
+ Requires-Dist: fastexcel>=0.10; extra == 'excel'
36
+ Description-Content-Type: text/markdown
37
+
38
+ # jp-idwr-db
39
+
40
+ Python access to Japanese infectious disease surveillance data from NIID/JIHS.
41
+
42
+ `jp-idwr-db` provides a Polars-first API for filtering and analysis.
43
+ Parquet datasets are versioned as GitHub Release assets and downloaded to a local cache on first use.
44
+ It is inspired by the R package `jpinfect`, but it is not an API-parity port and includes independently curated ingestion and coverage.
45
+
46
+ NIID/JIHS surveillance data is public, but it is not exposed as a clean analytical API.
47
+ To reconstruct usable time series, you typically need to navigate multiple archive structures, yearly directories,
48
+ and week-level files with changing formats (Excel and CSV) across historical and modern reporting systems.
49
+
50
+ This package exists to remove that friction: it consolidates those heterogeneous sources into standardized, queryable
51
+ tables so you can move directly to epidemiological analysis instead of file discovery, parsing, and schema harmonization.
52
+
53
+ ## Install
54
+
55
+ ```bash
56
+ pip install jp-idwr-db
57
+ ```
58
+
59
+ ## Data Download Model
60
+
61
+ - Package wheels do not ship the large parquet tables.
62
+ - On first call to `jp.load(...)` (or `jp.get_data(...)`), the package downloads versioned data assets from GitHub Releases.
63
+ - Cache path defaults to:
64
+ - macOS: `~/Library/Caches/jp_idwr_db/data/<version>/`
65
+ - Linux: `~/.cache/jp_idwr_db/data/<version>/`
66
+ - Windows: `%LOCALAPPDATA%\\jp_idwr_db\\Cache\\data\\<version>\\`
67
+
68
+ Prefetch explicitly:
69
+
70
+ ```bash
71
+ python -m jp_idwr_db data download
72
+ python -m jp_idwr_db data download --version v0.2.2 --force
73
+ ```
74
+
75
+ Environment overrides:
76
+
77
+ - `JPINFECT_DATA_VERSION`: choose a specific release tag (example: `v0.2.2`)
78
+ - `JPINFECT_DATA_BASE_URL`: override asset host base URL
79
+ - `JPINFECT_CACHE_DIR`: override local cache root
80
+
81
+ ## Quick Start
82
+
83
+ To fetch the full unified dataset with a single call:
84
+
85
+ ```python
86
+ import jp_idwr_db as jp
87
+ import polars as pl
88
+
89
+ df = (
90
+ jp.load("unified")
91
+ .select(["date", "prefecture", "category", "disease", "count", "source"])
92
+ )
93
+ print(df)
94
+ ```
95
+
96
+ ```text
97
+ shape: (5_370_477, 6)
98
+ ┌────────────┬────────────┬──────────┬─────────────────────────────┬───────┬────────────────────┐
99
+ │ date ┆ prefecture ┆ category ┆ disease ┆ count ┆ source │
100
+ │ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
101
+ │ date ┆ str ┆ str ┆ str ┆ f64 ┆ str │
102
+ ╞════════════╪════════════╪══════════╪═════════════════════════════╪═══════╪════════════════════╡
103
+ │ 1999-04-11 ┆ Aichi ┆ total ┆ AIDS ┆ 0.0 ┆ Confirmed cases │
104
+ │ 1999-04-11 ┆ Aichi ┆ total ┆ Acute poliomyelitis ┆ 0.0 ┆ Confirmed cases │
105
+ │ 1999-04-11 ┆ Aichi ┆ total ┆ Acute viral hepatitis ┆ 4.0 ┆ Confirmed cases │
106
+ │ 1999-04-11 ┆ Aichi ┆ total ┆ Amebiasis ┆ 0.0 ┆ Confirmed cases │
107
+ │ 1999-04-11 ┆ Aichi ┆ total ┆ Anthrax ┆ 0.0 ┆ Confirmed cases │
108
+ │ … ┆ … ┆ … ┆ … ┆ … ┆ … │
109
+ │ 2026-02-09 ┆ Yamanashi ┆ total ┆ Viral hepatitis(excluding ┆ 0.0 ┆ All-case reporting │
110
+ │ ┆ ┆ ┆ hepa… ┆ ┆ │
111
+ │ 2026-02-09 ┆ Yamanashi ┆ total ┆ West Nile fever ┆ 0.0 ┆ All-case reporting │
112
+ │ 2026-02-09 ┆ Yamanashi ┆ total ┆ Western equine encephalitis ┆ 0.0 ┆ All-case reporting │
113
+ │ 2026-02-09 ┆ Yamanashi ┆ total ┆ Yellow fever ┆ 0.0 ┆ All-case reporting │
114
+ │ 2026-02-09 ┆ Yamanashi ┆ total ┆ Zika virus infection ┆ 0.0 ┆ All-case reporting │
115
+ └────────────┴────────────┴──────────┴─────────────────────────────┴───────┴────────────────────┘
116
+ ```
117
+
118
+ You can also filter at the source with `jp.get_data(...)`:
119
+
120
+ ```python
121
+
122
+ # Fetch only tuberculosis data for 2024 in Tokyo, Osaka, and Hokkaido
123
+ tb = (
124
+ jp.get_data(
125
+ disease="Tuberculosis",
126
+ year=2024,
127
+ prefecture=["Tokyo", "Osaka", "Hokkaido"])
128
+ .select(["date", "prefecture", "disease", "count", "source"])
129
+ )
130
+ print(tb)
131
+ ```
132
+
133
+ ```text
134
+ shape: (156, 5)
135
+ ┌────────────┬────────────┬──────────────┬───────┬────────────────────┐
136
+ │ date ┆ prefecture ┆ disease ┆ count ┆ source │
137
+ │ --- ┆ --- ┆ --- ┆ --- ┆ --- │
138
+ │ date ┆ str ┆ str ┆ f64 ┆ str │
139
+ ╞════════════╪════════════╪══════════════╪═══════╪════════════════════╡
140
+ │ 2024-01-01 ┆ Hokkaido ┆ Tuberculosis ┆ 2.0 ┆ All-case reporting │
141
+ │ 2024-01-01 ┆ Osaka ┆ Tuberculosis ┆ 3.0 ┆ All-case reporting │
142
+ │ 2024-01-01 ┆ Tokyo ┆ Tuberculosis ┆ 15.0 ┆ All-case reporting │
143
+ │ 2024-01-08 ┆ Hokkaido ┆ Tuberculosis ┆ 4.0 ┆ All-case reporting │
144
+ │ 2024-01-08 ┆ Osaka ┆ Tuberculosis ┆ 17.0 ┆ All-case reporting │
145
+ │ … ┆ … ┆ … ┆ … ┆ … │
146
+ │ 2024-12-16 ┆ Osaka ┆ Tuberculosis ┆ 17.0 ┆ All-case reporting │
147
+ │ 2024-12-16 ┆ Tokyo ┆ Tuberculosis ┆ 41.0 ┆ All-case reporting │
148
+ │ 2024-12-23 ┆ Hokkaido ┆ Tuberculosis ┆ 5.0 ┆ All-case reporting │
149
+ │ 2024-12-23 ┆ Osaka ┆ Tuberculosis ┆ 16.0 ┆ All-case reporting │
150
+ │ 2024-12-23 ┆ Tokyo ┆ Tuberculosis ┆ 53.0 ┆ All-case reporting │
151
+ └────────────┴────────────┴──────────────┴───────┴────────────────────┘
152
+ ```
153
+
154
+ ```python
155
+
156
+ # Sentinel-only diseases from recent years in Tokyo prefecture
157
+ sentinel_df = (
158
+ jp.get_data(
159
+ source="sentinel",
160
+ year=(2024, 2026), prefecture=["Tokyo"])
161
+ .select(["date", "prefecture", "disease", "count", "per_sentinel"])
162
+ )
163
+ print(sentinel_df)
164
+ ```
165
+
166
+ ```text
167
+ shape: (2_052, 5)
168
+ ┌────────────┬────────────┬─────────────────────────────────┬─────────┬──────────────┐
169
+ │ date ┆ prefecture ┆ disease ┆ count ┆ per_sentinel │
170
+ │ --- ┆ --- ┆ --- ┆ --- ┆ --- │
171
+ │ date ┆ str ┆ str ┆ f64 ┆ f64 │
172
+ ╞════════════╪════════════╪═════════════════════════════════╪═════════╪══════════════╡
173
+ │ 2024-01-07 ┆ Tokyo ┆ Acute hemorrhagic conjunctivit… ┆ null ┆ null │
174
+ │ 2024-01-07 ┆ Tokyo ┆ Aseptic meningitis ┆ null ┆ null │
175
+ │ 2024-01-07 ┆ Tokyo ┆ Bacterial meningitis ┆ null ┆ null │
176
+ │ 2024-01-07 ┆ Tokyo ┆ COVID-19 ┆ 1365.0 ┆ 3.38 │
177
+ │ 2024-01-07 ┆ Tokyo ┆ Chickenpox ┆ 31.0 ┆ 0.12 │
178
+ │ … ┆ … ┆ … ┆ … ┆ … │
179
+ │ 2026-01-25 ┆ Tokyo ┆ Influenza(excld. avian influen… ┆ 13082.0 ┆ 34.07 │
180
+ │ 2026-01-25 ┆ Tokyo ┆ Mumps ┆ 30.0 ┆ 0.12 │
181
+ │ 2026-01-25 ┆ Tokyo ┆ Mycoplasma pneumonia ┆ 32.0 ┆ 1.28 │
182
+ │ 2026-01-25 ┆ Tokyo ┆ Pharyngoconjunctival fever ┆ 115.0 ┆ 0.47 │
183
+ │ 2026-01-25 ┆ Tokyo ┆ Respiratory syncytial virus in… ┆ 242.0 ┆ 1.0 │
184
+ └────────────┴────────────┴─────────────────────────────────┴─────────┴──────────────┘
185
+ ```
186
+
187
+ ## Main API
188
+
189
+ Top-level API exported by `jp_idwr_db`:
190
+
191
+ - `load(name)`
192
+ - `get_data(...)`
193
+ - `list_diseases(source="all")`
194
+ - `list_prefectures()`
195
+ - `get_latest_week()`
196
+ - `prefecture_map()`
197
+ - `attach_prefecture_id(df, prefecture_col="prefecture", id_col="prefecture_id")`
198
+ - `merge(...)`, `pivot(...)`
199
+ - `configure(...)`, `get_config()`
200
+
201
+
202
+ ## Datasets
203
+
204
+ Use `jp.load(...)` with:
205
+
206
+ - `"sex"`: historical sex-disaggregated surveillance
207
+ - `"place"`: historical place-category surveillance
208
+ - `"bullet"`: modern all-case weekly reports (rapid zensu)
209
+ - `"sentinel"`: sentinel weekly reports (teitenrui; 2012+ in release data assets)
210
+ - `"unified"`: deduplicated combined dataset (sex-total + modern bullet/sentinel, recommended)
211
+
212
+ Detailed schema and coverage are documented in [DATASETS.md](./docs/DATASETS.md).
213
+
214
+ ## Optional Prefecture IDs
215
+
216
+ Attach ISO prefecture IDs (JP-01 ... JP-47) only when needed:
217
+
218
+ ```python
219
+ import jp_idwr_db as jp
220
+
221
+ df_with_ids = (
222
+ jp.get_data(disease="Measles", year=2024)
223
+ .select(["prefecture", "disease", "count"])
224
+ .sort(["prefecture", "count"])
225
+ .unique(subset=["prefecture"], keep="first")
226
+ .pipe(jp.attach_prefecture_id)
227
+ .sort("prefecture")
228
+ )
229
+ print(df_with_ids)
230
+ ```
231
+
232
+ ```text
233
+ shape: (48, 4)
234
+ ┌────────────┬─────────┬───────┬───────────────┐
235
+ │ prefecture ┆ disease ┆ count ┆ prefecture_id │
236
+ │ --- ┆ --- ┆ --- ┆ --- │
237
+ │ str ┆ str ┆ f64 ┆ str │
238
+ ╞════════════╪═════════╪═══════╪═══════════════╡
239
+ │ Aichi ┆ Measles ┆ 0.0 ┆ JP-23 │
240
+ │ Akita ┆ Measles ┆ 0.0 ┆ JP-05 │
241
+ │ Aomori ┆ Measles ┆ 0.0 ┆ JP-02 │
242
+ │ Chiba ┆ Measles ┆ 0.0 ┆ JP-12 │
243
+ │ Ehime ┆ Measles ┆ 0.0 ┆ JP-38 │
244
+ │ … ┆ … ┆ … ┆ … │
245
+ │ Toyama ┆ Measles ┆ 0.0 ┆ JP-16 │
246
+ │ Wakayama ┆ Measles ┆ 0.0 ┆ JP-30 │
247
+ │ Yamagata ┆ Measles ┆ 0.0 ┆ JP-06 │
248
+ │ Yamaguchi ┆ Measles ┆ 0.0 ┆ JP-35 │
249
+ │ Yamanashi ┆ Measles ┆ 0.0 ┆ JP-19 │
250
+ └────────────┴─────────┴───────┴───────────────┘
251
+ ```
252
+
253
+ ## Raw Download and Parsing
254
+
255
+ Raw file workflows are available in `jp_idwr_db.io`:
256
+
257
+ - `jp_idwr_db.io.download(...)`
258
+ - `jp_idwr_db.io.download_recent(...)`
259
+ - `jp_idwr_db.io.read(...)`
260
+
261
+ These are useful for refreshing local raw weekly files or debugging parser behavior.
262
+
263
+ ## Data Wrangling Examples
264
+
265
+ See [EXAMPLES.md](./docs/EXAMPLES.md) for Polars-first data wrangling recipes (grouping, trends, regional slices, source-aware filtering).
266
+
267
+ Disease-by-disease temporal coverage is documented in [DISEASES.md](./docs/DISEASES.md).
268
+
269
+ ## Data Source
270
+
271
+ NIID/JIHS infectious disease surveillance publications:
272
+
273
+ - Historical annual archive files (`Syu_01_1`, `Syu_02_1`)
274
+ - Rapid weekly CSV reports (`zensuXX.csv`, `teitenruiXX.csv`)
275
+
276
+ ## Development
277
+
278
+ ```bash
279
+ uv sync --all-extras --dev
280
+ uv run ruff check .
281
+ uv run mypy src
282
+ uv run pytest
283
+ ```
284
+
285
+ ## Security and Integrity
286
+
287
+ - Release assets include a `jp_idwr_db-manifest.json` with SHA256 checksums.
288
+ - `ensure_data()` verifies archive checksum and each extracted parquet checksum before marking cache complete.
289
+ - For PyPI publishing, prefer Trusted Publishing (OIDC) over long-lived API tokens.
290
+
291
+ ## License
292
+
293
+ GPL-3.0-or-later. See [LICENSE](./LICENSE).
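The "Data Download Model" section above describes a versioned cache directory and three environment overrides. The following is a minimal sketch of how such a resolution step could look, not the package's actual implementation. The function names `resolve_data_version` and `resolve_cache_dir` are hypothetical, and two behaviours are assumptions made for illustration: that the default release tag is the package version prefixed with `v`, and that `JPINFECT_CACHE_DIR` replaces only the cache root (with `data/<version>/` appended underneath).

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path


def resolve_data_version(package_version: str, env: Mapping[str, str] | None = None) -> str:
    """Return the release tag to download (hypothetical helper).

    JPINFECT_DATA_VERSION wins when set; otherwise fall back to
    "v" + package version (the "v" prefix is an assumption).
    """
    env = os.environ if env is None else env
    override = env.get("JPINFECT_DATA_VERSION", "").strip()
    return override if override else f"v{package_version}"


def resolve_cache_dir(default_root: Path, version: str, env: Mapping[str, str] | None = None) -> Path:
    """Return <cache root>/data/<version> (hypothetical helper).

    JPINFECT_CACHE_DIR replaces the cache root when set (assumed layout).
    """
    env = os.environ if env is None else env
    override = env.get("JPINFECT_CACHE_DIR", "").strip()
    root = Path(override) if override else default_root
    return root / "data" / version


if __name__ == "__main__":
    tag = resolve_data_version("0.2.3")
    print(resolve_cache_dir(Path.home() / ".cache" / "jp_idwr_db", tag))
```

Passing `env` explicitly keeps the helpers easy to test without mutating the real process environment.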
@@ -0,0 +1,256 @@
1
+ # jp-idwr-db
2
+
3
+ Python access to Japanese infectious disease surveillance data from NIID/JIHS.
4
+
5
+ `jp-idwr-db` provides a Polars-first API for filtering and analysis.
6
+ Parquet datasets are versioned as GitHub Release assets and downloaded to a local cache on first use.
7
+ It is inspired by the R package `jpinfect`, but it is not an API-parity port and includes independently curated ingestion and coverage.
8
+
9
+ NIID/JIHS surveillance data is public, but it is not exposed as a clean analytical API.
10
+ To reconstruct usable time series, you typically need to navigate multiple archive structures, yearly directories,
11
+ and week-level files with changing formats (Excel and CSV) across historical and modern reporting systems.
12
+
13
+ This package exists to remove that friction: it consolidates those heterogeneous sources into standardized, queryable
14
+ tables so you can move directly to epidemiological analysis instead of file discovery, parsing, and schema harmonization.
15
+
16
+ ## Install
17
+
18
+ ```bash
19
+ pip install jp-idwr-db
20
+ ```
21
+
22
+ ## Data Download Model
23
+
24
+ - Package wheels do not ship the large parquet tables.
25
+ - On first call to `jp.load(...)` (or `jp.get_data(...)`), the package downloads versioned data assets from GitHub Releases.
26
+ - Cache path defaults to:
27
+ - macOS: `~/Library/Caches/jp_idwr_db/data/<version>/`
28
+ - Linux: `~/.cache/jp_idwr_db/data/<version>/`
29
+ - Windows: `%LOCALAPPDATA%\\jp_idwr_db\\Cache\\data\\<version>\\`
30
+
31
+ Prefetch explicitly:
32
+
33
+ ```bash
34
+ python -m jp_idwr_db data download
35
+ python -m jp_idwr_db data download --version v0.2.2 --force
36
+ ```
37
+
38
+ Environment overrides:
39
+
40
+ - `JPINFECT_DATA_VERSION`: choose a specific release tag (example: `v0.2.2`)
41
+ - `JPINFECT_DATA_BASE_URL`: override asset host base URL
42
+ - `JPINFECT_CACHE_DIR`: override local cache root
43
+
44
+ ## Quick Start
45
+
46
+ To fetch the full unified dataset with a single call:
47
+
48
+ ```python
49
+ import jp_idwr_db as jp
50
+ import polars as pl
51
+
52
+ df = (
53
+ jp.load("unified")
54
+ .select(["date", "prefecture", "category", "disease", "count", "source"])
55
+ )
56
+ print(df)
57
+ ```
58
+
59
+ ```text
60
+ shape: (5_370_477, 6)
61
+ ┌────────────┬────────────┬──────────┬─────────────────────────────┬───────┬────────────────────┐
62
+ │ date ┆ prefecture ┆ category ┆ disease ┆ count ┆ source │
63
+ │ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
64
+ │ date ┆ str ┆ str ┆ str ┆ f64 ┆ str │
65
+ ╞════════════╪════════════╪══════════╪═════════════════════════════╪═══════╪════════════════════╡
66
+ │ 1999-04-11 ┆ Aichi ┆ total ┆ AIDS ┆ 0.0 ┆ Confirmed cases │
67
+ │ 1999-04-11 ┆ Aichi ┆ total ┆ Acute poliomyelitis ┆ 0.0 ┆ Confirmed cases │
68
+ │ 1999-04-11 ┆ Aichi ┆ total ┆ Acute viral hepatitis ┆ 4.0 ┆ Confirmed cases │
69
+ │ 1999-04-11 ┆ Aichi ┆ total ┆ Amebiasis ┆ 0.0 ┆ Confirmed cases │
70
+ │ 1999-04-11 ┆ Aichi ┆ total ┆ Anthrax ┆ 0.0 ┆ Confirmed cases │
71
+ │ … ┆ … ┆ … ┆ … ┆ … ┆ … │
72
+ │ 2026-02-09 ┆ Yamanashi ┆ total ┆ Viral hepatitis(excluding ┆ 0.0 ┆ All-case reporting │
73
+ │ ┆ ┆ ┆ hepa… ┆ ┆ │
74
+ │ 2026-02-09 ┆ Yamanashi ┆ total ┆ West Nile fever ┆ 0.0 ┆ All-case reporting │
75
+ │ 2026-02-09 ┆ Yamanashi ┆ total ┆ Western equine encephalitis ┆ 0.0 ┆ All-case reporting │
76
+ │ 2026-02-09 ┆ Yamanashi ┆ total ┆ Yellow fever ┆ 0.0 ┆ All-case reporting │
77
+ │ 2026-02-09 ┆ Yamanashi ┆ total ┆ Zika virus infection ┆ 0.0 ┆ All-case reporting │
78
+ └────────────┴────────────┴──────────┴─────────────────────────────┴───────┴────────────────────┘
79
+ ```
80
+
81
+ You can also filter at the source with `jp.get_data(...)`:
82
+
83
+ ```python
84
+
85
+ # Fetch only tuberculosis data for 2024 in Tokyo, Osaka, and Hokkaido
86
+ tb = (
87
+ jp.get_data(
88
+ disease="Tuberculosis",
89
+ year=2024,
90
+ prefecture=["Tokyo", "Osaka", "Hokkaido"])
91
+ .select(["date", "prefecture", "disease", "count", "source"])
92
+ )
93
+ print(tb)
94
+ ```
95
+
96
+ ```text
97
+ shape: (156, 5)
98
+ ┌────────────┬────────────┬──────────────┬───────┬────────────────────┐
99
+ │ date ┆ prefecture ┆ disease ┆ count ┆ source │
100
+ │ --- ┆ --- ┆ --- ┆ --- ┆ --- │
101
+ │ date ┆ str ┆ str ┆ f64 ┆ str │
102
+ ╞════════════╪════════════╪══════════════╪═══════╪════════════════════╡
103
+ │ 2024-01-01 ┆ Hokkaido ┆ Tuberculosis ┆ 2.0 ┆ All-case reporting │
104
+ │ 2024-01-01 ┆ Osaka ┆ Tuberculosis ┆ 3.0 ┆ All-case reporting │
105
+ │ 2024-01-01 ┆ Tokyo ┆ Tuberculosis ┆ 15.0 ┆ All-case reporting │
106
+ │ 2024-01-08 ┆ Hokkaido ┆ Tuberculosis ┆ 4.0 ┆ All-case reporting │
107
+ │ 2024-01-08 ┆ Osaka ┆ Tuberculosis ┆ 17.0 ┆ All-case reporting │
108
+ │ … ┆ … ┆ … ┆ … ┆ … │
109
+ │ 2024-12-16 ┆ Osaka ┆ Tuberculosis ┆ 17.0 ┆ All-case reporting │
110
+ │ 2024-12-16 ┆ Tokyo ┆ Tuberculosis ┆ 41.0 ┆ All-case reporting │
111
+ │ 2024-12-23 ┆ Hokkaido ┆ Tuberculosis ┆ 5.0 ┆ All-case reporting │
112
+ │ 2024-12-23 ┆ Osaka ┆ Tuberculosis ┆ 16.0 ┆ All-case reporting │
113
+ │ 2024-12-23 ┆ Tokyo ┆ Tuberculosis ┆ 53.0 ┆ All-case reporting │
114
+ └────────────┴────────────┴──────────────┴───────┴────────────────────┘
115
+ ```
116
+
117
+ ```python
118
+
119
+ # Sentinel-only diseases from recent years in Tokyo prefecture
120
+ sentinel_df = (
121
+ jp.get_data(
122
+ source="sentinel",
123
+ year=(2024, 2026), prefecture=["Tokyo"])
124
+ .select(["date", "prefecture", "disease", "count", "per_sentinel"])
125
+ )
126
+ print(sentinel_df)
127
+ ```
128
+
129
+ ```text
130
+ shape: (2_052, 5)
131
+ ┌────────────┬────────────┬─────────────────────────────────┬─────────┬──────────────┐
132
+ │ date ┆ prefecture ┆ disease ┆ count ┆ per_sentinel │
133
+ │ --- ┆ --- ┆ --- ┆ --- ┆ --- │
134
+ │ date ┆ str ┆ str ┆ f64 ┆ f64 │
135
+ ╞════════════╪════════════╪═════════════════════════════════╪═════════╪══════════════╡
136
+ │ 2024-01-07 ┆ Tokyo ┆ Acute hemorrhagic conjunctivit… ┆ null ┆ null │
137
+ │ 2024-01-07 ┆ Tokyo ┆ Aseptic meningitis ┆ null ┆ null │
138
+ │ 2024-01-07 ┆ Tokyo ┆ Bacterial meningitis ┆ null ┆ null │
139
+ │ 2024-01-07 ┆ Tokyo ┆ COVID-19 ┆ 1365.0 ┆ 3.38 │
140
+ │ 2024-01-07 ┆ Tokyo ┆ Chickenpox ┆ 31.0 ┆ 0.12 │
141
+ │ … ┆ … ┆ … ┆ … ┆ … │
142
+ │ 2026-01-25 ┆ Tokyo ┆ Influenza(excld. avian influen… ┆ 13082.0 ┆ 34.07 │
143
+ │ 2026-01-25 ┆ Tokyo ┆ Mumps ┆ 30.0 ┆ 0.12 │
144
+ │ 2026-01-25 ┆ Tokyo ┆ Mycoplasma pneumonia ┆ 32.0 ┆ 1.28 │
145
+ │ 2026-01-25 ┆ Tokyo ┆ Pharyngoconjunctival fever ┆ 115.0 ┆ 0.47 │
146
+ │ 2026-01-25 ┆ Tokyo ┆ Respiratory syncytial virus in… ┆ 242.0 ┆ 1.0 │
147
+ └────────────┴────────────┴─────────────────────────────────┴─────────┴──────────────┘
148
+ ```
149
+
150
+ ## Main API
151
+
152
+ Top-level API exported by `jp_idwr_db`:
153
+
154
+ - `load(name)`
155
+ - `get_data(...)`
156
+ - `list_diseases(source="all")`
157
+ - `list_prefectures()`
158
+ - `get_latest_week()`
159
+ - `prefecture_map()`
160
+ - `attach_prefecture_id(df, prefecture_col="prefecture", id_col="prefecture_id")`
161
+ - `merge(...)`, `pivot(...)`
162
+ - `configure(...)`, `get_config()`
163
+
164
+
165
+ ## Datasets
166
+
167
+ Use `jp.load(...)` with:
168
+
169
+ - `"sex"`: historical sex-disaggregated surveillance
170
+ - `"place"`: historical place-category surveillance
171
+ - `"bullet"`: modern all-case weekly reports (rapid zensu)
172
+ - `"sentinel"`: sentinel weekly reports (teitenrui; 2012+ in release data assets)
173
+ - `"unified"`: deduplicated combined dataset (sex-total + modern bullet/sentinel, recommended)
174
+
175
+ Detailed schema and coverage are documented in [DATASETS.md](./docs/DATASETS.md).
176
+
177
+ ## Optional Prefecture IDs
178
+
179
+ Attach ISO prefecture IDs (JP-01 ... JP-47) only when needed:
180
+
181
+ ```python
182
+ import jp_idwr_db as jp
183
+
184
+ df_with_ids = (
185
+ jp.get_data(disease="Measles", year=2024)
186
+ .select(["prefecture", "disease", "count"])
187
+ .sort(["prefecture", "count"])
188
+ .unique(subset=["prefecture"], keep="first")
189
+ .pipe(jp.attach_prefecture_id)
190
+ .sort("prefecture")
191
+ )
192
+ print(df_with_ids)
193
+ ```
194
+
195
+ ```text
196
+ shape: (48, 4)
197
+ ┌────────────┬─────────┬───────┬───────────────┐
198
+ │ prefecture ┆ disease ┆ count ┆ prefecture_id │
199
+ │ --- ┆ --- ┆ --- ┆ --- │
200
+ │ str ┆ str ┆ f64 ┆ str │
201
+ ╞════════════╪═════════╪═══════╪═══════════════╡
202
+ │ Aichi ┆ Measles ┆ 0.0 ┆ JP-23 │
203
+ │ Akita ┆ Measles ┆ 0.0 ┆ JP-05 │
204
+ │ Aomori ┆ Measles ┆ 0.0 ┆ JP-02 │
205
+ │ Chiba ┆ Measles ┆ 0.0 ┆ JP-12 │
206
+ │ Ehime ┆ Measles ┆ 0.0 ┆ JP-38 │
207
+ │ … ┆ … ┆ … ┆ … │
208
+ │ Toyama ┆ Measles ┆ 0.0 ┆ JP-16 │
209
+ │ Wakayama ┆ Measles ┆ 0.0 ┆ JP-30 │
210
+ │ Yamagata ┆ Measles ┆ 0.0 ┆ JP-06 │
211
+ │ Yamaguchi ┆ Measles ┆ 0.0 ┆ JP-35 │
212
+ │ Yamanashi ┆ Measles ┆ 0.0 ┆ JP-19 │
213
+ └────────────┴─────────┴───────┴───────────────┘
214
+ ```
215
+
216
+ ## Raw Download and Parsing
217
+
218
+ Raw file workflows are available in `jp_idwr_db.io`:
219
+
220
+ - `jp_idwr_db.io.download(...)`
221
+ - `jp_idwr_db.io.download_recent(...)`
222
+ - `jp_idwr_db.io.read(...)`
223
+
224
+ These are useful for refreshing local raw weekly files or debugging parser behavior.
225
+
226
+ ## Data Wrangling Examples
227
+
228
+ See [EXAMPLES.md](./docs/EXAMPLES.md) for Polars-first data wrangling recipes (grouping, trends, regional slices, source-aware filtering).
229
+
230
+ Disease-by-disease temporal coverage is documented in [DISEASES.md](./docs/DISEASES.md).
231
+
232
+ ## Data Source
233
+
234
+ NIID/JIHS infectious disease surveillance publications:
235
+
236
+ - Historical annual archive files (`Syu_01_1`, `Syu_02_1`)
237
+ - Rapid weekly CSV reports (`zensuXX.csv`, `teitenruiXX.csv`)
238
+
239
+ ## Development
240
+
241
+ ```bash
242
+ uv sync --all-extras --dev
243
+ uv run ruff check .
244
+ uv run mypy src
245
+ uv run pytest
246
+ ```
247
+
248
+ ## Security and Integrity
249
+
250
+ - Release assets include a `jp_idwr_db-manifest.json` with SHA256 checksums.
251
+ - `ensure_data()` verifies archive checksum and each extracted parquet checksum before marking cache complete.
252
+ - For PyPI publishing, prefer Trusted Publishing (OIDC) over long-lived API tokens.
253
+
254
+ ## License
255
+
256
+ GPL-3.0-or-later. See [LICENSE](./LICENSE).
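The "Security and Integrity" section above says that release assets ship with a manifest of SHA256 checksums and that each extracted parquet file is verified before the cache is marked complete. The following is a minimal sketch of that kind of check, using only the standard library. The manifest layout (`{"files": {"<name>": "<sha256 hex>"}}`) and the function names `sha256_file` and `verify_against_manifest` are assumptions for illustration; the real `jp_idwr_db-manifest.json` schema is not shown in this diff.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large parquet files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_manifest(data_dir: Path, manifest_path: Path) -> list[str]:
    """Return the sorted names of files that are missing or fail their checksum.

    Assumes a manifest of the form {"files": {"name.parquet": "<sha256 hex>"}}.
    """
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    failures = []
    for name, expected in manifest["files"].items():
        candidate = data_dir / name
        if not candidate.is_file() or sha256_file(candidate) != expected.lower():
            failures.append(name)
    return sorted(failures)
```

An empty return value means every listed file is present and matches; a caller would write a completion marker only in that case.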
@@ -1,6 +1,6 @@
  [project]
  name = "jp-idwr-db"
- version = "0.2.2"
+ version = "0.2.3"
  description = "Japanese IDWR infectious disease database and analytics toolkit built on Polars."
  readme = "README.md"
  license = { text = "GPL-3.0-or-later" }
@@ -25,5 +25,5 @@ __all__ = [
      "prefecture_map",
  ]
 
- __version__ = "0.2.2"
+ __version__ = "0.2.3"
  __data_version__ = __version__
@@ -25,7 +25,7 @@ class Config:
 
      cache_dir: Path = Path(user_cache_dir("jp_idwr_db"))
      rate_limit_per_minute: int = 20
-     user_agent: str = "jp_idwr_db/0.2.2 (+https://github.com/AlFontal/jp-idwr-db)"
+     user_agent: str = "jp_idwr_db/0.2.3 (+https://github.com/AlFontal/jp-idwr-db)"
      timeout_seconds: float = 30.0
      retries: int = 3
 
@@ -6,6 +6,7 @@ import hashlib
  import json
  import os
  import shutil
+ import sys
  import zipfile
  from importlib.metadata import PackageNotFoundError
  from importlib.metadata import version as package_version
@@ -128,6 +129,15 @@ def ensure_data(version: str | None = None, force: bool = False) -> Path:
      if force and data_dir.exists():
          shutil.rmtree(data_dir)
      data_dir.mkdir(parents=True, exist_ok=True)
+     action = "Refreshing" if force else "Building"
+     print(
+         f"[jp_idwr_db] {action} local data cache for {resolved} at {data_dir}.",
+         file=sys.stderr,
+     )
+     print(
+         "[jp_idwr_db] This happens on first use and may take a moment.",
+         file=sys.stderr,
+     )
 
      archive_path, manifest_path = download_release_assets(resolved, data_dir)
      manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
@@ -159,4 +169,5 @@ def ensure_data(version: str | None = None, force: bool = False) -> Path:
          raise ValueError(f"Missing required datasets in cache: {sorted(missing_expected)}")
 
      marker.write_text("ok\n", encoding="utf-8")
+     print("[jp_idwr_db] Data cache ready.", file=sys.stderr)
      return data_dir
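The hunk above adds progress messages to `ensure_data()` and sends them to `sys.stderr` rather than standard output. The sketch below illustrates why that choice matters, using a hypothetical stand-in function `build_cache` (not part of the package): status text goes to stderr, so anything a script prints or pipes on stdout stays free of progress noise.

```python
import sys
from pathlib import Path


def build_cache(data_dir: Path, force: bool = False) -> Path:
    """Hypothetical stand-in for a cache build that reports progress on stderr."""
    action = "Refreshing" if force else "Building"
    # Status lines go to stderr so stdout carries only real program output.
    print(f"[demo] {action} local data cache at {data_dir}.", file=sys.stderr)
    data_dir.mkdir(parents=True, exist_ok=True)
    print("[demo] Data cache ready.", file=sys.stderr)
    return data_dir


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        # Only this line reaches stdout; the status messages do not.
        print(build_cache(Path(tmp) / "cache").name)
```

Running such a script as `python script.py > result.txt` would leave the status lines on the terminal while `result.txt` receives only the intended output.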
jp_idwr_db-0.2.2/PKG-INFO DELETED
@@ -1,243 +0,0 @@
1
- Metadata-Version: 2.4
2
- Name: jp-idwr-db
3
- Version: 0.2.2
4
- Summary: Japanese IDWR infectious disease database and analytics toolkit built on Polars.
5
- Project-URL: Homepage, https://github.com/AlFontal/jp-idwr-db
6
- Project-URL: Repository, https://github.com/AlFontal/jp-idwr-db
7
- Project-URL: Bug Tracker, https://github.com/AlFontal/jp-idwr-db/issues
8
- Author: jp-idwr-db contributors
9
- License: GPL-3.0-or-later
10
- License-File: LICENSE
11
- Keywords: epidemiology,infectious-disease,japan,polars,surveillance
12
- Classifier: Development Status :: 3 - Alpha
13
- Classifier: Intended Audience :: Science/Research
14
- Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
15
- Classifier: Programming Language :: Python :: 3
16
- Classifier: Programming Language :: Python :: 3.10
17
- Classifier: Programming Language :: Python :: 3.11
18
- Classifier: Programming Language :: Python :: 3.12
19
- Classifier: Programming Language :: Python :: 3.13
20
- Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
21
- Requires-Python: >=3.10
22
- Requires-Dist: fastexcel>=0.10
23
- Requires-Dist: httpx>=0.27
24
- Requires-Dist: openpyxl>=3.1
25
- Requires-Dist: platformdirs>=4.2
26
- Requires-Dist: polars>=0.20
27
- Requires-Dist: pyarrow>=14.0
28
- Provides-Extra: dev
29
- Requires-Dist: mypy>=1.8; extra == 'dev'
30
- Requires-Dist: pre-commit>=3.7; extra == 'dev'
31
- Requires-Dist: pytest-cov>=5.0; extra == 'dev'
32
- Requires-Dist: pytest>=8.0; extra == 'dev'
33
- Requires-Dist: ruff>=0.4; extra == 'dev'
34
- Provides-Extra: excel
35
- Requires-Dist: fastexcel>=0.10; extra == 'excel'
36
- Description-Content-Type: text/markdown
37
-
38
- # jp-idwr-db
39
-
40
- Python access to Japanese infectious disease surveillance data from NIID/JIHS.
41
-
42
- `jp-idwr-db` provides a Polars-first API for filtering and analysis.
43
- Parquet datasets are versioned as GitHub Release assets and downloaded to a local cache on first use.
44
- It is inspired by the R package `jpinfect`, but it is not an API-parity port and includes independently curated ingestion and coverage.
45
-
46
- ## Install
47
-
48
- ```bash
49
- pip install jp-idwr-db
50
- ```
51
-
52
- ## Data Download Model
53
-
54
- - Package wheels do not ship the large parquet tables.
55
- - On first call to `jp.load(...)` (or `jp.get_data(...)`), the package downloads versioned data assets from GitHub Releases.
56
- - Cache path defaults to:
57
- - macOS: `~/Library/Caches/jp_idwr_db/data/<version>/`
58
- - Linux: `~/.cache/jp_idwr_db/data/<version>/`
59
- - Windows: `%LOCALAPPDATA%\\jp_idwr_db\\Cache\\data\\<version>\\`
60
-
61
- Prefetch explicitly:
62
-
63
- ```bash
64
- python -m jp_idwr_db data download
65
- python -m jp_idwr_db data download --version v0.1.0 --force
66
- ```
67
-
68
- Environment overrides:
69
-
70
- - `JPINFECT_DATA_VERSION`: choose a specific release tag (example: `v0.1.0`)
71
- - `JPINFECT_DATA_BASE_URL`: override asset host base URL
72
- - `JPINFECT_CACHE_DIR`: override local cache root
73
-
74
- ## Quick Start
75
-
76
- ```python
77
- import jp_idwr_db as jp
78
-
79
- # Full unified dataset (recommended)
80
- df = jp.load("unified")
81
- print(df.select(["prefecture", "disease", "year", "week", "count", "source"]).head(8))
82
- ```
83
-
84
- ```text
85
- shape: (8, 6)
86
- ┌────────────┬─────────────────────────────────┬──────┬──────┬───────┬───────────────────────┐
87
- │ prefecture ┆ disease ┆ year ┆ week ┆ count ┆ source │
88
- │ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
89
- │ str ┆ str ┆ i32 ┆ i32 ┆ f64 ┆ str │
90
- ╞════════════╪═════════════════════════════════╪══════╪══════╪═══════╪═══════════════════════╡
91
- │ Tochigi ┆ Lyme disease ┆ 2011 ┆ 24 ┆ 0.0 ┆ Confirmed cases │
92
- │ Kochi ┆ Avian influenza H5N1 ┆ 2008 ┆ 51 ┆ 0.0 ┆ Confirmed cases │
93
- │ Hokkaido ┆ Dengue fever ┆ 1999 ┆ 28 ┆ 0.0 ┆ Confirmed cases │
94
- │ Tokyo ┆ Congenital rubella syndrome ┆ 2014 ┆ 41 ┆ 0.0 ┆ Confirmed cases │
95
- │ Nagasaki ┆ Severe Acute Respiratory Syndr… ┆ 2018 ┆ 4 ┆ 0.0 ┆ Confirmed cases │
96
- │ Fukushima ┆ Infectious gastroenteritis (on… ┆ 2019 ┆ 25 ┆ 145.0 ┆ Sentinel surveillance │
97
- │ Nara ┆ Severe invasive streptococcal … ┆ 2003 ┆ 10 ┆ 0.0 ┆ Confirmed cases │
98
- │ Mie ┆ Plague ┆ 2006 ┆ 37 ┆ 0.0 ┆ Confirmed cases │
99
- └────────────┴─────────────────────────────────┴──────┴──────┴───────┴───────────────────────┘
100
- ```
101
-
102
- ```python
103
- import jp_idwr_db as jp
104
-
105
- # Optional: attach ISO prefecture IDs (JP-01 ... JP-47) only when needed
106
- df_with_ids = jp.attach_prefecture_id(df, prefecture_col="prefecture", id_col="prefecture_id")
107
- print(df_with_ids.select(["prefecture", "prefecture_id"]).head())
108
- ```
109
-
110
- ```text
111
- shape: (5, 2)
112
- ┌────────────┬───────────────┐
113
- │ prefecture ┆ prefecture_id │
114
- ╞════════════╪═══════════════╡
115
- │ Tochigi ┆ JP-09 │
116
- │ Kochi ┆ JP-39 │
117
- │ Hokkaido ┆ JP-01 │
118
- │ Tokyo ┆ JP-13 │
119
- │ Nagasaki ┆ JP-42 │
120
- └────────────┴───────────────┘
121
- ```
122
-
123
- ## Main API
124
-
125
- Top-level API exported by `jp_idwr_db`:
126
-
127
- - `load(name)`
128
- - `get_data(...)`
129
- - `list_diseases(source="all")`
130
- - `list_prefectures()`
131
- - `get_latest_week()`
132
- - `prefecture_map()`
133
- - `attach_prefecture_id(df, prefecture_col="prefecture", id_col="prefecture_id")`
134
- - `merge(...)`, `pivot(...)`
135
- - `configure(...)`, `get_config()`
136
-
137
- ### Filtered Access with `get_data`
138
-
139
- ```python
140
- import jp_idwr_db as jp
141
-
142
- # Tuberculosis rows for a year range
143
- tb = jp.get_data(disease="Tuberculosis", year=(2018, 2023))
144
- print(tb.select(["prefecture", "disease", "year", "week", "count", "source"]).head(8))
145
- ```
146
-
147
- ```text
148
- shape: (8, 6)
149
- ┌────────────┬──────────────┬──────┬──────┬───────┬─────────────────┐
150
- │ prefecture ┆ disease ┆ year ┆ week ┆ count ┆ source │
151
- │ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
152
- │ str ┆ str ┆ i32 ┆ i32 ┆ f64 ┆ str │
153
- ╞════════════╪══════════════╪══════╪══════╪═══════╪═════════════════╡
154
- │ Hokkaido ┆ Tuberculosis ┆ 2020 ┆ 12 ┆ 5.0 ┆ Confirmed cases │
155
- │ Oita ┆ Tuberculosis ┆ 2023 ┆ 38 ┆ 6.0 ┆ Confirmed cases │
156
- │ Fukuoka ┆ Tuberculosis ┆ 2021 ┆ 8 ┆ 12.0 ┆ Confirmed cases │
157
- │ Kagawa ┆ Tuberculosis ┆ 2020 ┆ 19 ┆ 2.0 ┆ Confirmed cases │
158
- │ Chiba ┆ Tuberculosis ┆ 2020 ┆ 19 ┆ 9.0 ┆ Confirmed cases │
159
- │ Kanagawa ┆ Tuberculosis ┆ 2022 ┆ 17 ┆ 25.0 ┆ Confirmed cases │
160
- │ Okinawa ┆ Tuberculosis ┆ 2021 ┆ 11 ┆ 4.0 ┆ Confirmed cases │
161
- │ Gifu ┆ Tuberculosis ┆ 2018 ┆ 23 ┆ 7.0 ┆ Confirmed cases │
162
- └────────────┴──────────────┴──────┴──────┴───────┴─────────────────┘
163
- ```
164
-
165
- ```python
166
- import jp_idwr_db as jp
167
-
168
- # Sentinel-only diseases from recent years
169
- sentinel = jp.get_data(source="sentinel", year=(2023, 2026))
170
- print(sentinel.select(["prefecture", "disease", "year", "week", "count", "source"]).head(8))
171
- ```
172
-
173
- ```text
174
- shape: (8, 6)
175
- ┌────────────┬─────────────────────────────────┬──────┬──────┬───────┬───────────────────────┐
176
- │ prefecture ┆ disease ┆ year ┆ week ┆ count ┆ source │
177
- │ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
178
- │ str ┆ str ┆ i32 ┆ i32 ┆ f64 ┆ str │
179
- ╞════════════╪═════════════════════════════════╪══════╪══════╪═══════╪═══════════════════════╡
180
- │ Ishikawa ┆ Respiratory syncytial virus in… ┆ 2024 ┆ 42 ┆ 813.0 ┆ Sentinel surveillance │
181
- │ Nara ┆ Erythema infection ┆ 2025 ┆ 31 ┆ 823.0 ┆ Sentinel surveillance │
182
- │ Saga ┆ Mumps ┆ 2024 ┆ 26 ┆ 14.0 ┆ Sentinel surveillance │
183
- │ Hyogo ┆ Pharyngoconjunctival fever ┆ 2023 ┆ 19 ┆ 468.0 ┆ Sentinel surveillance │
184
- │ Miyazaki ┆ Infectious gastroenteritis ┆ 2026 ┆ 3 ┆ 339.0 ┆ Sentinel surveillance │
185
- │ Kagoshima ┆ Infectious gastroenteritis (on… ┆ 2024 ┆ 9 ┆ null ┆ Sentinel surveillance │
186
- │ Osaka ┆ Mumps ┆ 2024 ┆ 49 ┆ 404.0 ┆ Sentinel surveillance │
187
- │ Aomori ┆ Erythema infection ┆ 2024 ┆ 10 ┆ 5.0 ┆ Sentinel surveillance │
188
- └────────────┴─────────────────────────────────┴──────┴──────┴───────┴───────────────────────┘
189
- ```
190
-
191
- ## Datasets
192
-
193
- Use `jp.load(...)` with:
194
-
195
- - `"sex"`: historical sex-disaggregated surveillance
196
- - `"place"`: historical place-category surveillance
197
- - `"bullet"`: modern all-case weekly reports (rapid zensu)
198
- - `"sentinel"`: sentinel weekly reports (teitenrui; 2012+ in release data assets)
199
- - `"unified"`: deduplicated combined dataset (sex-total + modern bullet/sentinel, recommended)
200
-
201
- Detailed schema and coverage are documented in [DATASETS.md](./docs/DATASETS.md).
202
-
203
- ## Raw Download and Parsing
204
-
205
- Raw file workflows are available in `jp_idwr_db.io`:
206
-
207
- - `jp_idwr_db.io.download(...)`
208
- - `jp_idwr_db.io.download_recent(...)`
209
- - `jp_idwr_db.io.read(...)`
210
-
211
- These are useful for refreshing local raw weekly files or debugging parser behavior.
212
-
213
- ## Data Wrangling Examples
214
-
215
- See [EXAMPLES.md](./docs/EXAMPLES.md) for Polars-first data wrangling recipes (grouping, trends, regional slices, source-aware filtering).
216
-
217
- Disease-by-disease temporal coverage is documented in [DISEASES.md](./docs/DISEASES.md).
218
-
219
- ## Data Source
220
-
221
- NIID/JIHS infectious disease surveillance publications:
222
-
223
- - Historical annual archive files (`Syu_01_1`, `Syu_02_1`)
224
- - Rapid weekly CSV reports (`zensuXX.csv`, `teitenruiXX.csv`)
225
-
226
- ## Development
227
-
228
- ```bash
229
- uv sync --all-extras --dev
230
- uv run ruff check .
231
- uv run mypy src
232
- uv run pytest
233
- ```
234
-
235
- ## Security and Integrity
236
-
237
- - Release assets include a `jp_idwr_db-manifest.json` with SHA256 checksums.
238
- - `ensure_data()` verifies archive checksum and each extracted parquet checksum before marking cache complete.
239
- - For PyPI publishing, prefer Trusted Publishing (OIDC) over long-lived API tokens.
240
-
241
- ## License
242
-
243
- GPL-3.0-or-later. See [LICENSE](./LICENSE).
@@ -1,206 +0,0 @@
1
- # jp-idwr-db
2
-
3
- Python access to Japanese infectious disease surveillance data from NIID/JIHS.
4
-
5
- `jp-idwr-db` provides a Polars-first API for filtering and analysis.
6
- Parquet datasets are versioned as GitHub Release assets and downloaded to a local cache on first use.
7
- It is inspired by the R package `jpinfect`, but it is not an API-parity port and includes independently curated ingestion and coverage.
8
-
9
- ## Install
10
-
11
- ```bash
12
- pip install jp-idwr-db
13
- ```
14
-
15
- ## Data Download Model
16
-
17
- - Package wheels do not ship the large parquet tables.
18
- - On first call to `jp.load(...)` (or `jp.get_data(...)`), the package downloads versioned data assets from GitHub Releases.
19
- - Cache path defaults to:
20
- - macOS: `~/Library/Caches/jp_idwr_db/data/<version>/`
21
- - Linux: `~/.cache/jp_idwr_db/data/<version>/`
22
- - Windows: `%LOCALAPPDATA%\jp_idwr_db\Cache\data\<version>\`
23
-
24
- Prefetch explicitly:
25
-
26
- ```bash
27
- python -m jp_idwr_db data download
28
- python -m jp_idwr_db data download --version v0.1.0 --force
29
- ```
30
-
31
- Environment overrides:
32
-
33
- - `JPINFECT_DATA_VERSION`: choose a specific release tag (example: `v0.1.0`)
34
- - `JPINFECT_DATA_BASE_URL`: override asset host base URL
35
- - `JPINFECT_CACHE_DIR`: override local cache root
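Reviewer note: the cache defaults and the `JPINFECT_CACHE_DIR` override described in the removed README lines above can be sketched roughly as follows. This is an illustration only, not the package's code: `default_cache_root` and `resolve_data_dir` are hypothetical names, and it assumes the override replaces the directory that sits above `data/<version>/`, which the README does not state.

```python
import os
import sys
from pathlib import Path


def default_cache_root(platform: str = sys.platform) -> Path:
    """Per-OS cache root, following the default paths listed in the README."""
    home = Path.home()
    if platform == "darwin":
        return home / "Library" / "Caches" / "jp_idwr_db"
    if platform.startswith("win"):
        # Fall back to the conventional location if LOCALAPPDATA is unset.
        local = os.environ.get("LOCALAPPDATA", str(home / "AppData" / "Local"))
        return Path(local) / "jp_idwr_db" / "Cache"
    return home / ".cache" / "jp_idwr_db"


def resolve_data_dir(version: str, env=None, platform: str = sys.platform) -> Path:
    """Hypothetical helper: pick the data directory for one release tag.

    Assumption: JPINFECT_CACHE_DIR, when set, replaces the cache root and
    the `data/<version>` suffix is still appended.
    """
    env = os.environ if env is None else env
    override = env.get("JPINFECT_CACHE_DIR")
    root = Path(override).expanduser() if override else default_cache_root(platform)
    return root / "data" / version
```

Passing `env` and `platform` explicitly keeps the sketch testable without touching the real environment, for example `resolve_data_dir("v0.1.0", env={"JPINFECT_CACHE_DIR": "/tmp/idwr"})`.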
36
-
37
- ## Quick Start
38
-
39
- ```python
40
- import jp_idwr_db as jp
41
-
42
- # Full unified dataset (recommended)
43
- df = jp.load("unified")
44
- print(df.select(["prefecture", "disease", "year", "week", "count", "source"]).head(8))
45
- ```
46
-
47
- ```text
48
- shape: (8, 6)
49
- ┌────────────┬─────────────────────────────────┬──────┬──────┬───────┬───────────────────────┐
50
- │ prefecture ┆ disease ┆ year ┆ week ┆ count ┆ source │
51
- │ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
52
- │ str ┆ str ┆ i32 ┆ i32 ┆ f64 ┆ str │
53
- ╞════════════╪═════════════════════════════════╪══════╪══════╪═══════╪═══════════════════════╡
54
- │ Tochigi ┆ Lyme disease ┆ 2011 ┆ 24 ┆ 0.0 ┆ Confirmed cases │
55
- │ Kochi ┆ Avian influenza H5N1 ┆ 2008 ┆ 51 ┆ 0.0 ┆ Confirmed cases │
56
- │ Hokkaido ┆ Dengue fever ┆ 1999 ┆ 28 ┆ 0.0 ┆ Confirmed cases │
57
- │ Tokyo ┆ Congenital rubella syndrome ┆ 2014 ┆ 41 ┆ 0.0 ┆ Confirmed cases │
58
- │ Nagasaki ┆ Severe Acute Respiratory Syndr… ┆ 2018 ┆ 4 ┆ 0.0 ┆ Confirmed cases │
59
- │ Fukushima ┆ Infectious gastroenteritis (on… ┆ 2019 ┆ 25 ┆ 145.0 ┆ Sentinel surveillance │
60
- │ Nara ┆ Severe invasive streptococcal … ┆ 2003 ┆ 10 ┆ 0.0 ┆ Confirmed cases │
61
- │ Mie ┆ Plague ┆ 2006 ┆ 37 ┆ 0.0 ┆ Confirmed cases │
62
- └────────────┴─────────────────────────────────┴──────┴──────┴───────┴───────────────────────┘
63
- ```
64
-
65
- ```python
66
- import jp_idwr_db as jp
67
-
68
- # Optional: attach ISO prefecture IDs (JP-01 ... JP-47) only when needed
69
- df_with_ids = jp.attach_prefecture_id(df, prefecture_col="prefecture", id_col="prefecture_id")
70
- print(df_with_ids.select(["prefecture", "prefecture_id"]).head())
71
- ```
72
-
73
- ```text
74
- shape: (5, 2)
75
- ┌────────────┬───────────────┐
76
- │ prefecture ┆ prefecture_id │
77
- ╞════════════╪═══════════════╡
78
- │ Tochigi ┆ JP-09 │
79
- │ Kochi ┆ JP-39 │
80
- │ Hokkaido ┆ JP-01 │
81
- │ Tokyo ┆ JP-13 │
82
- │ Nagasaki ┆ JP-42 │
83
- └────────────┴───────────────┘
84
- ```
85
-
86
- ## Main API
87
-
88
- Top-level API exported by `jp_idwr_db`:
89
-
90
- - `load(name)`
91
- - `get_data(...)`
92
- - `list_diseases(source="all")`
93
- - `list_prefectures()`
94
- - `get_latest_week()`
95
- - `prefecture_map()`
96
- - `attach_prefecture_id(df, prefecture_col="prefecture", id_col="prefecture_id")`
97
- - `merge(...)`, `pivot(...)`
98
- - `configure(...)`, `get_config()`
99
-
100
- ### Filtered Access with `get_data`
101
-
102
- ```python
103
- import jp_idwr_db as jp
104
-
105
- # Tuberculosis rows for a year range
106
- tb = jp.get_data(disease="Tuberculosis", year=(2018, 2023))
107
- print(tb.select(["prefecture", "disease", "year", "week", "count", "source"]).head(8))
108
- ```
109
-
110
- ```text
111
- shape: (8, 6)
112
- ┌────────────┬──────────────┬──────┬──────┬───────┬─────────────────┐
113
- │ prefecture ┆ disease ┆ year ┆ week ┆ count ┆ source │
114
- │ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
115
- │ str ┆ str ┆ i32 ┆ i32 ┆ f64 ┆ str │
116
- ╞════════════╪══════════════╪══════╪══════╪═══════╪═════════════════╡
117
- │ Hokkaido ┆ Tuberculosis ┆ 2020 ┆ 12 ┆ 5.0 ┆ Confirmed cases │
118
- │ Oita ┆ Tuberculosis ┆ 2023 ┆ 38 ┆ 6.0 ┆ Confirmed cases │
119
- │ Fukuoka ┆ Tuberculosis ┆ 2021 ┆ 8 ┆ 12.0 ┆ Confirmed cases │
120
- │ Kagawa ┆ Tuberculosis ┆ 2020 ┆ 19 ┆ 2.0 ┆ Confirmed cases │
121
- │ Chiba ┆ Tuberculosis ┆ 2020 ┆ 19 ┆ 9.0 ┆ Confirmed cases │
122
- │ Kanagawa ┆ Tuberculosis ┆ 2022 ┆ 17 ┆ 25.0 ┆ Confirmed cases │
123
- │ Okinawa ┆ Tuberculosis ┆ 2021 ┆ 11 ┆ 4.0 ┆ Confirmed cases │
124
- │ Gifu ┆ Tuberculosis ┆ 2018 ┆ 23 ┆ 7.0 ┆ Confirmed cases │
125
- └────────────┴──────────────┴──────┴──────┴───────┴─────────────────┘
126
- ```
127
-
128
- ```python
129
- import jp_idwr_db as jp
130
-
131
- # Sentinel-only diseases from recent years
132
- sentinel = jp.get_data(source="sentinel", year=(2023, 2026))
133
- print(sentinel.select(["prefecture", "disease", "year", "week", "count", "source"]).head(8))
134
- ```
135
-
136
- ```text
137
- shape: (8, 6)
138
- ┌────────────┬─────────────────────────────────┬──────┬──────┬───────┬───────────────────────┐
139
- │ prefecture ┆ disease ┆ year ┆ week ┆ count ┆ source │
140
- │ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
141
- │ str ┆ str ┆ i32 ┆ i32 ┆ f64 ┆ str │
142
- ╞════════════╪═════════════════════════════════╪══════╪══════╪═══════╪═══════════════════════╡
143
- │ Ishikawa ┆ Respiratory syncytial virus in… ┆ 2024 ┆ 42 ┆ 813.0 ┆ Sentinel surveillance │
144
- │ Nara ┆ Erythema infection ┆ 2025 ┆ 31 ┆ 823.0 ┆ Sentinel surveillance │
145
- │ Saga ┆ Mumps ┆ 2024 ┆ 26 ┆ 14.0 ┆ Sentinel surveillance │
146
- │ Hyogo ┆ Pharyngoconjunctival fever ┆ 2023 ┆ 19 ┆ 468.0 ┆ Sentinel surveillance │
147
- │ Miyazaki ┆ Infectious gastroenteritis ┆ 2026 ┆ 3 ┆ 339.0 ┆ Sentinel surveillance │
148
- │ Kagoshima ┆ Infectious gastroenteritis (on… ┆ 2024 ┆ 9 ┆ null ┆ Sentinel surveillance │
149
- │ Osaka ┆ Mumps ┆ 2024 ┆ 49 ┆ 404.0 ┆ Sentinel surveillance │
150
- │ Aomori ┆ Erythema infection ┆ 2024 ┆ 10 ┆ 5.0 ┆ Sentinel surveillance │
151
- └────────────┴─────────────────────────────────┴──────┴──────┴───────┴───────────────────────┘
152
- ```
153
-
154
- ## Datasets
155
-
156
- Use `jp.load(...)` with:
157
-
158
- - `"sex"`: historical sex-disaggregated surveillance
159
- - `"place"`: historical place-category surveillance
160
- - `"bullet"`: modern all-case weekly reports (rapid zensu)
161
- - `"sentinel"`: sentinel weekly reports (teitenrui; 2012+ in release data assets)
162
- - `"unified"`: deduplicated combined dataset (sex-total + modern bullet/sentinel, recommended)
163
-
164
- Detailed schema and coverage are documented in [DATASETS.md](./docs/DATASETS.md).
165
-
166
- ## Raw Download and Parsing
167
-
168
- Raw file workflows are available in `jp_idwr_db.io`:
169
-
170
- - `jp_idwr_db.io.download(...)`
171
- - `jp_idwr_db.io.download_recent(...)`
172
- - `jp_idwr_db.io.read(...)`
173
-
174
- These are useful for refreshing local raw weekly files or debugging parser behavior.
175
-
176
- ## Data Wrangling Examples
177
-
178
- See [EXAMPLES.md](./docs/EXAMPLES.md) for Polars-first data wrangling recipes (grouping, trends, regional slices, source-aware filtering).
179
-
180
- Disease-by-disease temporal coverage is documented in [DISEASES.md](./docs/DISEASES.md).
181
-
182
- ## Data Source
183
-
184
- NIID/JIHS infectious disease surveillance publications:
185
-
186
- - Historical annual archive files (`Syu_01_1`, `Syu_02_1`)
187
- - Rapid weekly CSV reports (`zensuXX.csv`, `teitenruiXX.csv`)
188
-
189
- ## Development
190
-
191
- ```bash
192
- uv sync --all-extras --dev
193
- uv run ruff check .
194
- uv run mypy src
195
- uv run pytest
196
- ```
197
-
198
- ## Security and Integrity
199
-
200
- - Release assets include a `jp_idwr_db-manifest.json` with SHA256 checksums.
201
- - `ensure_data()` verifies archive checksum and each extracted parquet checksum before marking cache complete.
202
- - For PyPI publishing, prefer Trusted Publishing (OIDC) over long-lived API tokens.
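Reviewer note: the checksum step described in the Security and Integrity bullets above can be illustrated with the standard library alone. This is a minimal sketch, not the body of `ensure_data()`: the helper names are hypothetical, and the layout of `jp_idwr_db-manifest.json` is not shown in this diff, so the sketch assumes the expected digests have already been read into a plain `{relative_name: sha256_hex}` mapping.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large parquet files are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_files(root: Path, expected: dict[str, str]) -> list[str]:
    """Return the names under `root` that are missing or whose digest differs.

    `expected` is an assumed mapping of relative file name to hex digest;
    the real manifest schema may differ.
    """
    failures = []
    for name, want in expected.items():
        target = root / name
        if not target.is_file() or sha256_of_file(target) != want.lower():
            failures.append(name)
    return failures
```

A caller would treat the cache as complete only when `verify_files(...)` returns an empty list.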
203
-
204
- ## License
205
-
206
- GPL-3.0-or-later. See [LICENSE](./LICENSE).
File without changes
File without changes
File without changes