jp-idwr-db 0.2.2 → 0.2.3 (py3-none-any.whl)

jp_idwr_db/__init__.py CHANGED
@@ -25,5 +25,5 @@ __all__ = [
25
25
  "prefecture_map",
26
26
  ]
27
27
 
28
- __version__ = "0.2.2"
28
+ __version__ = "0.2.3"
29
29
  __data_version__ = __version__
jp_idwr_db/config.py CHANGED
@@ -25,7 +25,7 @@ class Config:
25
25
 
26
26
  cache_dir: Path = Path(user_cache_dir("jp_idwr_db"))
27
27
  rate_limit_per_minute: int = 20
28
- user_agent: str = "jp_idwr_db/0.2.2 (+https://github.com/AlFontal/jp-idwr-db)"
28
+ user_agent: str = "jp_idwr_db/0.2.3 (+https://github.com/AlFontal/jp-idwr-db)"
29
29
  timeout_seconds: float = 30.0
30
30
  retries: int = 3
31
31
 
jp_idwr_db/data_manager.py CHANGED
@@ -6,6 +6,7 @@ import hashlib
6
6
  import json
7
7
  import os
8
8
  import shutil
9
+ import sys
9
10
  import zipfile
10
11
  from importlib.metadata import PackageNotFoundError
11
12
  from importlib.metadata import version as package_version
@@ -128,6 +129,15 @@ def ensure_data(version: str | None = None, force: bool = False) -> Path:
128
129
  if force and data_dir.exists():
129
130
  shutil.rmtree(data_dir)
130
131
  data_dir.mkdir(parents=True, exist_ok=True)
132
+ action = "Refreshing" if force else "Building"
133
+ print(
134
+ f"[jp_idwr_db] {action} local data cache for {resolved} at {data_dir}.",
135
+ file=sys.stderr,
136
+ )
137
+ print(
138
+ "[jp_idwr_db] This happens on first use and may take a moment.",
139
+ file=sys.stderr,
140
+ )
131
141
 
132
142
  archive_path, manifest_path = download_release_assets(resolved, data_dir)
133
143
  manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
@@ -159,4 +169,5 @@ def ensure_data(version: str | None = None, force: bool = False) -> Path:
159
169
  raise ValueError(f"Missing required datasets in cache: {sorted(missing_expected)}")
160
170
 
161
171
  marker.write_text("ok\n", encoding="utf-8")
172
+ print("[jp_idwr_db] Data cache ready.", file=sys.stderr)
162
173
  return data_dir
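
The progress messages added in this hunk are written to `sys.stderr`, so they do not mix with anything a script writes to stdout. Below is a minimal sketch of how a caller could capture or silence such messages. `build_cache` is a hypothetical stand-in for `ensure_data`: it only reproduces the style of the print calls, not the download, and the path is made up for illustration.

```python
import contextlib
import io
import sys


def build_cache(force: bool = False) -> str:
    """Hypothetical stand-in for ensure_data: emits the same style of stderr messages."""
    data_dir = "/tmp/jp_idwr_db/data/v0.2.3"  # made-up path, illustration only
    action = "Refreshing" if force else "Building"
    print(f"[jp_idwr_db] {action} local data cache at {data_dir}.", file=sys.stderr)
    print("[jp_idwr_db] Data cache ready.", file=sys.stderr)
    return data_dir


# Redirect stderr into a buffer so the progress lines do not reach the terminal.
captured = io.StringIO()
with contextlib.redirect_stderr(captured):
    result = build_cache(force=True)

# The messages remain available for logging or inspection.
messages = captured.getvalue().splitlines()
```

Because `print(..., file=sys.stderr)` looks up `sys.stderr` at call time, `contextlib.redirect_stderr` is enough here; no change to the called function is needed.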
jp_idwr_db-0.2.3.dist-info/METADATA ADDED
@@ -0,0 +1,293 @@
1
+ Metadata-Version: 2.4
2
+ Name: jp-idwr-db
3
+ Version: 0.2.3
4
+ Summary: Japanese IDWR infectious disease database and analytics toolkit built on Polars.
5
+ Project-URL: Homepage, https://github.com/AlFontal/jp-idwr-db
6
+ Project-URL: Repository, https://github.com/AlFontal/jp-idwr-db
7
+ Project-URL: Bug Tracker, https://github.com/AlFontal/jp-idwr-db/issues
8
+ Author: jp-idwr-db contributors
9
+ License: GPL-3.0-or-later
10
+ License-File: LICENSE
11
+ Keywords: epidemiology,infectious-disease,japan,polars,surveillance
12
+ Classifier: Development Status :: 3 - Alpha
13
+ Classifier: Intended Audience :: Science/Research
14
+ Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
15
+ Classifier: Programming Language :: Python :: 3
16
+ Classifier: Programming Language :: Python :: 3.10
17
+ Classifier: Programming Language :: Python :: 3.11
18
+ Classifier: Programming Language :: Python :: 3.12
19
+ Classifier: Programming Language :: Python :: 3.13
20
+ Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
21
+ Requires-Python: >=3.10
22
+ Requires-Dist: fastexcel>=0.10
23
+ Requires-Dist: httpx>=0.27
24
+ Requires-Dist: openpyxl>=3.1
25
+ Requires-Dist: platformdirs>=4.2
26
+ Requires-Dist: polars>=0.20
27
+ Requires-Dist: pyarrow>=14.0
28
+ Provides-Extra: dev
29
+ Requires-Dist: mypy>=1.8; extra == 'dev'
30
+ Requires-Dist: pre-commit>=3.7; extra == 'dev'
31
+ Requires-Dist: pytest-cov>=5.0; extra == 'dev'
32
+ Requires-Dist: pytest>=8.0; extra == 'dev'
33
+ Requires-Dist: ruff>=0.4; extra == 'dev'
34
+ Provides-Extra: excel
35
+ Requires-Dist: fastexcel>=0.10; extra == 'excel'
36
+ Description-Content-Type: text/markdown
37
+
38
+ # jp-idwr-db
39
+
40
+ Python access to Japanese infectious disease surveillance data from NIID/JIHS.
41
+
42
+ `jp-idwr-db` provides a Polars-first API for filtering and analysis.
43
+ Parquet datasets are versioned as GitHub Release assets and downloaded to a local cache on first use.
44
+ It is inspired by the R package `jpinfect`, but it is not an API-parity port and includes independently curated ingestion and coverage.
45
+
46
+ NIID/JIHS surveillance data is public, but it is not exposed as a clean analytical API.
47
+ To reconstruct usable time series, you typically need to navigate multiple archive structures, yearly directories,
48
+ and week-level files with changing formats (Excel and CSV) across historical and modern reporting systems.
49
+
50
+ This package exists to remove that friction: it consolidates those heterogeneous sources into standardized, queryable
51
+ tables so you can move directly to epidemiological analysis instead of file discovery, parsing, and schema harmonization.
52
+
53
+ ## Install
54
+
55
+ ```bash
56
+ pip install jp-idwr-db
57
+ ```
58
+
59
+ ## Data Download Model
60
+
61
+ - Package wheels do not ship the large parquet tables.
62
+ - On first call to `jp.load(...)` (or `jp.get_data(...)`), the package downloads versioned data assets from GitHub Releases.
63
+ - Cache path defaults to:
64
+ - macOS: `~/Library/Caches/jp_idwr_db/data/<version>/`
65
+ - Linux: `~/.cache/jp_idwr_db/data/<version>/`
66
+ - Windows: `%LOCALAPPDATA%\\jp_idwr_db\\Cache\\data\\<version>\\`
67
+
68
+ Prefetch explicitly:
69
+
70
+ ```bash
71
+ python -m jp_idwr_db data download
72
+ python -m jp_idwr_db data download --version v0.2.2 --force
73
+ ```
74
+
75
+ Environment overrides:
76
+
77
+ - `JPINFECT_DATA_VERSION`: choose a specific release tag (example: `v0.2.2`)
78
+ - `JPINFECT_DATA_BASE_URL`: override asset host base URL
79
+ - `JPINFECT_CACHE_DIR`: override local cache root
80
+
81
+ ## Quick Start
82
+
83
+ To fetch the full unified dataset with a single call:
84
+
85
+ ```python
86
+ import jp_idwr_db as jp
87
+ import polars as pl
88
+
89
+ df = (
90
+ jp.load("unified")
91
+ .select(["date", "prefecture", "category", "disease", "count", "source"])
92
+ )
93
+ print(df)
94
+ ```
95
+
96
+ ```text
97
+ shape: (5_370_477, 6)
98
+ ┌────────────┬────────────┬──────────┬─────────────────────────────┬───────┬────────────────────┐
99
+ │ date ┆ prefecture ┆ category ┆ disease ┆ count ┆ source │
100
+ │ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
101
+ │ date ┆ str ┆ str ┆ str ┆ f64 ┆ str │
102
+ ╞════════════╪════════════╪══════════╪═════════════════════════════╪═══════╪════════════════════╡
103
+ │ 1999-04-11 ┆ Aichi ┆ total ┆ AIDS ┆ 0.0 ┆ Confirmed cases │
104
+ │ 1999-04-11 ┆ Aichi ┆ total ┆ Acute poliomyelitis ┆ 0.0 ┆ Confirmed cases │
105
+ │ 1999-04-11 ┆ Aichi ┆ total ┆ Acute viral hepatitis ┆ 4.0 ┆ Confirmed cases │
106
+ │ 1999-04-11 ┆ Aichi ┆ total ┆ Amebiasis ┆ 0.0 ┆ Confirmed cases │
107
+ │ 1999-04-11 ┆ Aichi ┆ total ┆ Anthrax ┆ 0.0 ┆ Confirmed cases │
108
+ │ … ┆ … ┆ … ┆ … ┆ … ┆ … │
109
+ │ 2026-02-09 ┆ Yamanashi ┆ total ┆ Viral hepatitis(excluding ┆ 0.0 ┆ All-case reporting │
110
+ │ ┆ ┆ ┆ hepa… ┆ ┆ │
111
+ │ 2026-02-09 ┆ Yamanashi ┆ total ┆ West Nile fever ┆ 0.0 ┆ All-case reporting │
112
+ │ 2026-02-09 ┆ Yamanashi ┆ total ┆ Western equine encephalitis ┆ 0.0 ┆ All-case reporting │
113
+ │ 2026-02-09 ┆ Yamanashi ┆ total ┆ Yellow fever ┆ 0.0 ┆ All-case reporting │
114
+ │ 2026-02-09 ┆ Yamanashi ┆ total ┆ Zika virus infection ┆ 0.0 ┆ All-case reporting │
115
+ └────────────┴────────────┴──────────┴─────────────────────────────┴───────┴────────────────────┘
116
+ ```
117
+
118
+ You can also filter at the source with `jp.get_data(...)`:
119
+
120
+ ```python
121
+
122
+ # Fetch only tuberculosis data for 2024 in Tokyo, Osaka, and Hokkaido
123
+ tb = (
124
+ jp.get_data(
125
+ disease="Tuberculosis",
126
+ year=2024,
127
+ prefecture=["Tokyo", "Osaka", "Hokkaido"])
128
+ .select(["date", "prefecture", "disease", "count", "source"])
129
+ )
130
+ print(tb)
131
+ ```
132
+
133
+ ```text
134
+ shape: (156, 5)
135
+ ┌────────────┬────────────┬──────────────┬───────┬────────────────────┐
136
+ │ date ┆ prefecture ┆ disease ┆ count ┆ source │
137
+ │ --- ┆ --- ┆ --- ┆ --- ┆ --- │
138
+ │ date ┆ str ┆ str ┆ f64 ┆ str │
139
+ ╞════════════╪════════════╪══════════════╪═══════╪════════════════════╡
140
+ │ 2024-01-01 ┆ Hokkaido ┆ Tuberculosis ┆ 2.0 ┆ All-case reporting │
141
+ │ 2024-01-01 ┆ Osaka ┆ Tuberculosis ┆ 3.0 ┆ All-case reporting │
142
+ │ 2024-01-01 ┆ Tokyo ┆ Tuberculosis ┆ 15.0 ┆ All-case reporting │
143
+ │ 2024-01-08 ┆ Hokkaido ┆ Tuberculosis ┆ 4.0 ┆ All-case reporting │
144
+ │ 2024-01-08 ┆ Osaka ┆ Tuberculosis ┆ 17.0 ┆ All-case reporting │
145
+ │ … ┆ … ┆ … ┆ … ┆ … │
146
+ │ 2024-12-16 ┆ Osaka ┆ Tuberculosis ┆ 17.0 ┆ All-case reporting │
147
+ │ 2024-12-16 ┆ Tokyo ┆ Tuberculosis ┆ 41.0 ┆ All-case reporting │
148
+ │ 2024-12-23 ┆ Hokkaido ┆ Tuberculosis ┆ 5.0 ┆ All-case reporting │
149
+ │ 2024-12-23 ┆ Osaka ┆ Tuberculosis ┆ 16.0 ┆ All-case reporting │
150
+ │ 2024-12-23 ┆ Tokyo ┆ Tuberculosis ┆ 53.0 ┆ All-case reporting │
151
+ └────────────┴────────────┴──────────────┴───────┴────────────────────┘
152
+ ```
153
+
154
+ ```python
155
+
156
+ # Sentinel-only diseases from recent years in Tokyo prefecture
157
+ sentinel_df = (
158
+ jp.get_data(
159
+ source="sentinel",
160
+ year=(2024, 2026))
161
+ .select(["date", "prefecture", "disease", "count", "per_sentinel"])
162
+ )
163
+ print(sentinel_df)
164
+ ```
165
+
166
+ ```text
167
+ shape: (2_052, 5)
168
+ ┌────────────┬────────────┬─────────────────────────────────┬─────────┬──────────────┐
169
+ │ date ┆ prefecture ┆ disease ┆ count ┆ per_sentinel │
170
+ │ --- ┆ --- ┆ --- ┆ --- ┆ --- │
171
+ │ date ┆ str ┆ str ┆ f64 ┆ f64 │
172
+ ╞════════════╪════════════╪═════════════════════════════════╪═════════╪══════════════╡
173
+ │ 2024-01-07 ┆ Tokyo ┆ Acute hemorrhagic conjunctivit… ┆ null ┆ null │
174
+ │ 2024-01-07 ┆ Tokyo ┆ Aseptic meningitis ┆ null ┆ null │
175
+ │ 2024-01-07 ┆ Tokyo ┆ Bacterial meningitis ┆ null ┆ null │
176
+ │ 2024-01-07 ┆ Tokyo ┆ COVID-19 ┆ 1365.0 ┆ 3.38 │
177
+ │ 2024-01-07 ┆ Tokyo ┆ Chickenpox ┆ 31.0 ┆ 0.12 │
178
+ │ … ┆ … ┆ … ┆ … ┆ … │
179
+ │ 2026-01-25 ┆ Tokyo ┆ Influenza(excld. avian influen… ┆ 13082.0 ┆ 34.07 │
180
+ │ 2026-01-25 ┆ Tokyo ┆ Mumps ┆ 30.0 ┆ 0.12 │
181
+ │ 2026-01-25 ┆ Tokyo ┆ Mycoplasma pneumonia ┆ 32.0 ┆ 1.28 │
182
+ │ 2026-01-25 ┆ Tokyo ┆ Pharyngoconjunctival fever ┆ 115.0 ┆ 0.47 │
183
+ │ 2026-01-25 ┆ Tokyo ┆ Respiratory syncytial virus in… ┆ 242.0 ┆ 1.0 │
184
+ └────────────┴────────────┴─────────────────────────────────┴─────────┴──────────────┘
185
+ ```
186
+
187
+ ## Main API
188
+
189
+ Top-level API exported by `jp_idwr_db`:
190
+
191
+ - `load(name)`
192
+ - `get_data(...)`
193
+ - `list_diseases(source="all")`
194
+ - `list_prefectures()`
195
+ - `get_latest_week()`
196
+ - `prefecture_map()`
197
+ - `attach_prefecture_id(df, prefecture_col="prefecture", id_col="prefecture_id")`
198
+ - `merge(...)`, `pivot(...)`
199
+ - `configure(...)`, `get_config()`
200
+
201
+
202
+ ## Datasets
203
+
204
+ Use `jp.load(...)` with:
205
+
206
+ - `"sex"`: historical sex-disaggregated surveillance
207
+ - `"place"`: historical place-category surveillance
208
+ - `"bullet"`: modern all-case weekly reports (rapid zensu)
209
+ - `"sentinel"`: sentinel weekly reports (teitenrui; 2012+ in release data assets)
210
+ - `"unified"`: deduplicated combined dataset (sex-total + modern bullet/sentinel, recommended)
211
+
212
+ Detailed schema and coverage are documented in [DATASETS.md](./docs/DATASETS.md).
213
+
214
+ ## Optional Prefecture IDs
215
+
216
+ Attach ISO prefecture IDs (JP-01 ... JP-47) only when needed:
217
+
218
+ ```python
219
+ import jp_idwr_db as jp
220
+
221
+ df_with_ids = (
222
+ jp.get_data(disease="Measles", year=2024)
223
+ .select(["prefecture", "disease", "count"])
224
+ .sort(["prefecture", "count"])
225
+ .unique(subset=["prefecture"], keep="first")
226
+ .pipe(jp.attach_prefecture_id)
227
+ .sort("prefecture")
228
+ )
229
+ print(df_with_ids)
230
+ ```
231
+
232
+ ```text
233
+ shape: (48, 4)
234
+ ┌────────────┬─────────┬───────┬───────────────┐
235
+ │ prefecture ┆ disease ┆ count ┆ prefecture_id │
236
+ │ --- ┆ --- ┆ --- ┆ --- │
237
+ │ str ┆ str ┆ f64 ┆ str │
238
+ ╞════════════╪═════════╪═══════╪═══════════════╡
239
+ │ Aichi ┆ Measles ┆ 0.0 ┆ JP-23 │
240
+ │ Akita ┆ Measles ┆ 0.0 ┆ JP-05 │
241
+ │ Aomori ┆ Measles ┆ 0.0 ┆ JP-02 │
242
+ │ Chiba ┆ Measles ┆ 0.0 ┆ JP-12 │
243
+ │ Ehime ┆ Measles ┆ 0.0 ┆ JP-38 │
244
+ │ … ┆ … ┆ … ┆ … │
245
+ │ Toyama ┆ Measles ┆ 0.0 ┆ JP-16 │
246
+ │ Wakayama ┆ Measles ┆ 0.0 ┆ JP-30 │
247
+ │ Yamagata ┆ Measles ┆ 0.0 ┆ JP-06 │
248
+ │ Yamaguchi ┆ Measles ┆ 0.0 ┆ JP-35 │
249
+ │ Yamanashi ┆ Measles ┆ 0.0 ┆ JP-19 │
250
+ └────────────┴─────────┴───────┴───────────────┘
251
+ ```
252
+
253
+ ## Raw Download and Parsing
254
+
255
+ Raw file workflows are available in `jp_idwr_db.io`:
256
+
257
+ - `jp_idwr_db.io.download(...)`
258
+ - `jp_idwr_db.io.download_recent(...)`
259
+ - `jp_idwr_db.io.read(...)`
260
+
261
+ These are useful for refreshing local raw weekly files or debugging parser behavior.
262
+
263
+ ## Data Wrangling Examples
264
+
265
+ See [EXAMPLES.md](./docs/EXAMPLES.md) for Polars-first data wrangling recipes (grouping, trends, regional slices, source-aware filtering).
266
+
267
+ Disease-by-disease temporal coverage is documented in [DISEASES.md](./docs/DISEASES.md).
268
+
269
+ ## Data Source
270
+
271
+ NIID/JIHS infectious disease surveillance publications:
272
+
273
+ - Historical annual archive files (`Syu_01_1`, `Syu_02_1`)
274
+ - Rapid weekly CSV reports (`zensuXX.csv`, `teitenruiXX.csv`)
275
+
276
+ ## Development
277
+
278
+ ```bash
279
+ uv sync --all-extras --dev
280
+ uv run ruff check .
281
+ uv run mypy src
282
+ uv run pytest
283
+ ```
284
+
285
+ ## Security and Integrity
286
+
287
+ - Release assets include a `jp_idwr_db-manifest.json` with SHA256 checksums.
288
+ - `ensure_data()` verifies archive checksum and each extracted parquet checksum before marking cache complete.
289
+ - For PyPI publishing, prefer Trusted Publishing (OIDC) over long-lived API tokens.
290
+
291
+ ## License
292
+
293
+ GPL-3.0-or-later. See [LICENSE](./LICENSE).
jp_idwr_db-0.2.2.dist-info/RECORD → jp_idwr_db-0.2.3.dist-info/RECORD CHANGED
@@ -1,9 +1,9 @@
1
- jp_idwr_db/__init__.py,sha256=ZLacAkYDdSYOEV5Z_cH7ByzTg9MvY-LC5crVMg3pnok,694
1
+ jp_idwr_db/__init__.py,sha256=3Vpp0-ueVeXTtuzYjtUKCAzraljRGLMD2g4npas_liw,694
2
2
  jp_idwr_db/__main__.py,sha256=NczbhLtbnKQm-S4Tc-qARbWGm-s9DQqdx0utCN2e92o,139
3
3
  jp_idwr_db/api.py,sha256=Jx7DLmrM2whtbtnnmYlyX-NWuAtpUw4Jz53-3vSjIfs,6799
4
4
  jp_idwr_db/cli.py,sha256=taYLWVif3HFSfP9_P9E_nkIrc2L4hfR3-L4wcLY_D0E,1191
5
- jp_idwr_db/config.py,sha256=ipabkq7wPH8hE0uU9cgXmScL7UU5hU8_qACtgZT3IzA,1576
6
- jp_idwr_db/data_manager.py,sha256=OHZng1Z9AUHIhJwGRmvgXgy05iruHepQdNmBOkmEs3E,5542
5
+ jp_idwr_db/config.py,sha256=lSTD3rv6vOy799NPR3eAvBvY-KR1wpdKBf9TnVi1Sb0,1576
6
+ jp_idwr_db/data_manager.py,sha256=-Ryn8AaSxISmYVbBLjRr-pkQTJBqZzrgW0CB9DQMz3E,5903
7
7
  jp_idwr_db/datasets.py,sha256=Z9dUqqaNdBLUXJTeaGhI2qje2z9XxHZPEsx6_6yFjZs,3068
8
8
  jp_idwr_db/http.py,sha256=Y5IRu1Bi448_5kUGugbmryyCklmCc9xkQMTcJU8HX7I,7059
9
9
  jp_idwr_db/io.py,sha256=HUSJ0gW4py9uflCSTeGu5X27Lh1e3yITY2DlSARP0Vc,42362
@@ -16,8 +16,8 @@ jp_idwr_db/_internal/__init__.py,sha256=uIjX7aaqbJ2Acm2cEZ8QjgQqqFSmGFbCc-mmeCX_
16
16
  jp_idwr_db/_internal/download.py,sha256=APpS9MF2IuOxRqqp0BJbsSgHYkV2OauzH9BamZlqCOs,323
17
17
  jp_idwr_db/_internal/read.py,sha256=ZqsuzbyIS5LSFEXl3y9TiCoNxGV5IGueV4uQ6jZ5NnQ,274
18
18
  jp_idwr_db/_internal/validation.py,sha256=_Swb-kQgY8_u2aPvNxEh0NMV0a4PY_Moq3mnYxi27Jk,4646
19
- jp_idwr_db-0.2.2.dist-info/METADATA,sha256=EITWHxcCyNr7mDV_A4zNB_N0jsI6qdzWcWv1DHbxqcM,11951
20
- jp_idwr_db-0.2.2.dist-info/WHEEL,sha256=WLgqFyCfm_KASv4WHyYy0P3pM_m7J5L9k2skdKLirC8,87
21
- jp_idwr_db-0.2.2.dist-info/entry_points.txt,sha256=haoRgws2bVbaX-yFyGkPTMcadVnKvKVP6CLIIjuNHy8,51
22
- jp_idwr_db-0.2.2.dist-info/licenses/LICENSE,sha256=bHrJGTGJC-W4DvVUSZGWxzgNdgTIUBxjJv-Y4twk68M,33429
23
- jp_idwr_db-0.2.2.dist-info/RECORD,,
19
+ jp_idwr_db-0.2.3.dist-info/METADATA,sha256=RZmJu21QwFm-kIx91FM9cctxJsDL1EVpLXntaX4ZN2A,14508
20
+ jp_idwr_db-0.2.3.dist-info/WHEEL,sha256=WLgqFyCfm_KASv4WHyYy0P3pM_m7J5L9k2skdKLirC8,87
21
+ jp_idwr_db-0.2.3.dist-info/entry_points.txt,sha256=haoRgws2bVbaX-yFyGkPTMcadVnKvKVP6CLIIjuNHy8,51
22
+ jp_idwr_db-0.2.3.dist-info/licenses/LICENSE,sha256=bHrJGTGJC-W4DvVUSZGWxzgNdgTIUBxjJv-Y4twk68M,33429
23
+ jp_idwr_db-0.2.3.dist-info/RECORD,,
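
Each RECORD row above has the form `path,sha256=<digest>,size`. In the wheel format the digest is the file's SHA-256 hash, encoded as URL-safe base64 with the trailing `=` padding removed. The sketch below recomputes such a row for arbitrary bytes. `record_entry` is a hypothetical helper name, and the sample path and contents are made up rather than taken from this package.

```python
import base64
import hashlib


def record_entry(path: str, content: bytes) -> str:
    """Build a RECORD-style row for the given file path and contents (illustrative helper)."""
    digest = hashlib.sha256(content).digest()
    # URL-safe base64 with '=' padding stripped, as used in wheel RECORD files.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(content)}"


sample = b'__version__ = "0.0.0"\n'  # made-up file contents
entry = record_entry("pkg/__init__.py", sample)
empty_entry = record_entry("pkg/empty.py", b"")
```

Comparing a recomputed row with the one in RECORD is one way to confirm that a file such as `jp_idwr_db/config.py` changed in content even though its size (1576 bytes) stayed the same.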
jp_idwr_db-0.2.2.dist-info/METADATA DELETED
@@ -1,243 +0,0 @@
1
- Metadata-Version: 2.4
2
- Name: jp-idwr-db
3
- Version: 0.2.2
4
- Summary: Japanese IDWR infectious disease database and analytics toolkit built on Polars.
5
- Project-URL: Homepage, https://github.com/AlFontal/jp-idwr-db
6
- Project-URL: Repository, https://github.com/AlFontal/jp-idwr-db
7
- Project-URL: Bug Tracker, https://github.com/AlFontal/jp-idwr-db/issues
8
- Author: jp-idwr-db contributors
9
- License: GPL-3.0-or-later
10
- License-File: LICENSE
11
- Keywords: epidemiology,infectious-disease,japan,polars,surveillance
12
- Classifier: Development Status :: 3 - Alpha
13
- Classifier: Intended Audience :: Science/Research
14
- Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
15
- Classifier: Programming Language :: Python :: 3
16
- Classifier: Programming Language :: Python :: 3.10
17
- Classifier: Programming Language :: Python :: 3.11
18
- Classifier: Programming Language :: Python :: 3.12
19
- Classifier: Programming Language :: Python :: 3.13
20
- Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
21
- Requires-Python: >=3.10
22
- Requires-Dist: fastexcel>=0.10
23
- Requires-Dist: httpx>=0.27
24
- Requires-Dist: openpyxl>=3.1
25
- Requires-Dist: platformdirs>=4.2
26
- Requires-Dist: polars>=0.20
27
- Requires-Dist: pyarrow>=14.0
28
- Provides-Extra: dev
29
- Requires-Dist: mypy>=1.8; extra == 'dev'
30
- Requires-Dist: pre-commit>=3.7; extra == 'dev'
31
- Requires-Dist: pytest-cov>=5.0; extra == 'dev'
32
- Requires-Dist: pytest>=8.0; extra == 'dev'
33
- Requires-Dist: ruff>=0.4; extra == 'dev'
34
- Provides-Extra: excel
35
- Requires-Dist: fastexcel>=0.10; extra == 'excel'
36
- Description-Content-Type: text/markdown
37
-
38
- # jp-idwr-db
39
-
40
- Python access to Japanese infectious disease surveillance data from NIID/JIHS.
41
-
42
- `jp-idwr-db` provides a Polars-first API for filtering and analysis.
43
- Parquet datasets are versioned as GitHub Release assets and downloaded to a local cache on first use.
44
- It is inspired by the R package `jpinfect`, but it is not an API-parity port and includes independently curated ingestion and coverage.
45
-
46
- ## Install
47
-
48
- ```bash
49
- pip install jp-idwr-db
50
- ```
51
-
52
- ## Data Download Model
53
-
54
- - Package wheels do not ship the large parquet tables.
55
- - On first call to `jp.load(...)` (or `jp.get_data(...)`), the package downloads versioned data assets from GitHub Releases.
56
- - Cache path defaults to:
57
- - macOS: `~/Library/Caches/jp_idwr_db/data/<version>/`
58
- - Linux: `~/.cache/jp_idwr_db/data/<version>/`
59
- - Windows: `%LOCALAPPDATA%\\jp_idwr_db\\Cache\\data\\<version>\\`
60
-
61
- Prefetch explicitly:
62
-
63
- ```bash
64
- python -m jp_idwr_db data download
65
- python -m jp_idwr_db data download --version v0.1.0 --force
66
- ```
67
-
68
- Environment overrides:
69
-
70
- - `JPINFECT_DATA_VERSION`: choose a specific release tag (example: `v0.1.0`)
71
- - `JPINFECT_DATA_BASE_URL`: override asset host base URL
72
- - `JPINFECT_CACHE_DIR`: override local cache root
73
-
74
- ## Quick Start
75
-
76
- ```python
77
- import jp_idwr_db as jp
78
-
79
- # Full unified dataset (recommended)
80
- df = jp.load("unified")
81
- print(df.select(["prefecture", "disease", "year", "week", "count", "source"]).head(8))
82
- ```
83
-
84
- ```text
85
- shape: (8, 6)
86
- ┌────────────┬─────────────────────────────────┬──────┬──────┬───────┬───────────────────────┐
87
- │ prefecture ┆ disease ┆ year ┆ week ┆ count ┆ source │
88
- │ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
89
- │ str ┆ str ┆ i32 ┆ i32 ┆ f64 ┆ str │
90
- ╞════════════╪═════════════════════════════════╪══════╪══════╪═══════╪═══════════════════════╡
91
- │ Tochigi ┆ Lyme disease ┆ 2011 ┆ 24 ┆ 0.0 ┆ Confirmed cases │
92
- │ Kochi ┆ Avian influenza H5N1 ┆ 2008 ┆ 51 ┆ 0.0 ┆ Confirmed cases │
93
- │ Hokkaido ┆ Dengue fever ┆ 1999 ┆ 28 ┆ 0.0 ┆ Confirmed cases │
94
- │ Tokyo ┆ Congenital rubella syndrome ┆ 2014 ┆ 41 ┆ 0.0 ┆ Confirmed cases │
95
- │ Nagasaki ┆ Severe Acute Respiratory Syndr… ┆ 2018 ┆ 4 ┆ 0.0 ┆ Confirmed cases │
96
- │ Fukushima ┆ Infectious gastroenteritis (on… ┆ 2019 ┆ 25 ┆ 145.0 ┆ Sentinel surveillance │
97
- │ Nara ┆ Severe invasive streptococcal … ┆ 2003 ┆ 10 ┆ 0.0 ┆ Confirmed cases │
98
- │ Mie ┆ Plague ┆ 2006 ┆ 37 ┆ 0.0 ┆ Confirmed cases │
99
- └────────────┴─────────────────────────────────┴──────┴──────┴───────┴───────────────────────┘
100
- ```
101
-
102
- ```python
103
- import jp_idwr_db as jp
104
-
105
- # Optional: attach ISO prefecture IDs (JP-01 ... JP-47) only when needed
106
- df_with_ids = jp.attach_prefecture_id(df, prefecture_col="prefecture", id_col="prefecture_id")
107
- print(df_with_ids.select(["prefecture", "prefecture_id"]).head())
108
- ```
109
-
110
- ```text
111
- shape: (5, 2)
112
- ┌────────────┬───────────────┐
113
- │ prefecture ┆ prefecture_id │
114
- ╞════════════╪═══════════════╡
115
- │ Tochigi ┆ JP-09 │
116
- │ Kochi ┆ JP-39 │
117
- │ Hokkaido ┆ JP-01 │
118
- │ Tokyo ┆ JP-13 │
119
- │ Nagasaki ┆ JP-42 │
120
- └────────────┴───────────────┘
121
- ```
122
-
123
- ## Main API
124
-
125
- Top-level API exported by `jp_idwr_db`:
126
-
127
- - `load(name)`
128
- - `get_data(...)`
129
- - `list_diseases(source="all")`
130
- - `list_prefectures()`
131
- - `get_latest_week()`
132
- - `prefecture_map()`
133
- - `attach_prefecture_id(df, prefecture_col="prefecture", id_col="prefecture_id")`
134
- - `merge(...)`, `pivot(...)`
135
- - `configure(...)`, `get_config()`
136
-
137
- ### Filtered Access with `get_data`
138
-
139
- ```python
140
- import jp_idwr_db as jp
141
-
142
- # Tuberculosis rows for a year range
143
- tb = jp.get_data(disease="Tuberculosis", year=(2018, 2023))
144
- print(tb.select(["prefecture", "disease", "year", "week", "count", "source"]).head(8))
145
- ```
146
-
147
- ```text
148
- shape: (8, 6)
149
- ┌────────────┬──────────────┬──────┬──────┬───────┬─────────────────┐
150
- │ prefecture ┆ disease ┆ year ┆ week ┆ count ┆ source │
151
- │ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
152
- │ str ┆ str ┆ i32 ┆ i32 ┆ f64 ┆ str │
153
- ╞════════════╪══════════════╪══════╪══════╪═══════╪═════════════════╡
154
- │ Hokkaido ┆ Tuberculosis ┆ 2020 ┆ 12 ┆ 5.0 ┆ Confirmed cases │
155
- │ Oita ┆ Tuberculosis ┆ 2023 ┆ 38 ┆ 6.0 ┆ Confirmed cases │
156
- │ Fukuoka ┆ Tuberculosis ┆ 2021 ┆ 8 ┆ 12.0 ┆ Confirmed cases │
157
- │ Kagawa ┆ Tuberculosis ┆ 2020 ┆ 19 ┆ 2.0 ┆ Confirmed cases │
158
- │ Chiba ┆ Tuberculosis ┆ 2020 ┆ 19 ┆ 9.0 ┆ Confirmed cases │
159
- │ Kanagawa ┆ Tuberculosis ┆ 2022 ┆ 17 ┆ 25.0 ┆ Confirmed cases │
160
- │ Okinawa ┆ Tuberculosis ┆ 2021 ┆ 11 ┆ 4.0 ┆ Confirmed cases │
161
- │ Gifu ┆ Tuberculosis ┆ 2018 ┆ 23 ┆ 7.0 ┆ Confirmed cases │
162
- └────────────┴──────────────┴──────┴──────┴───────┴─────────────────┘
163
- ```
164
-
165
- ```python
166
- import jp_idwr_db as jp
167
-
168
- # Sentinel-only diseases from recent years
169
- sentinel = jp.get_data(source="sentinel", year=(2023, 2026))
170
- print(sentinel.select(["prefecture", "disease", "year", "week", "count", "source"]).head(8))
171
- ```
172
-
173
- ```text
174
- shape: (8, 6)
175
- ┌────────────┬─────────────────────────────────┬──────┬──────┬───────┬───────────────────────┐
176
- │ prefecture ┆ disease ┆ year ┆ week ┆ count ┆ source │
177
- │ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
178
- │ str ┆ str ┆ i32 ┆ i32 ┆ f64 ┆ str │
179
- ╞════════════╪═════════════════════════════════╪══════╪══════╪═══════╪═══════════════════════╡
180
- │ Ishikawa ┆ Respiratory syncytial virus in… ┆ 2024 ┆ 42 ┆ 813.0 ┆ Sentinel surveillance │
181
- │ Nara ┆ Erythema infection ┆ 2025 ┆ 31 ┆ 823.0 ┆ Sentinel surveillance │
182
- │ Saga ┆ Mumps ┆ 2024 ┆ 26 ┆ 14.0 ┆ Sentinel surveillance │
183
- │ Hyogo ┆ Pharyngoconjunctival fever ┆ 2023 ┆ 19 ┆ 468.0 ┆ Sentinel surveillance │
184
- │ Miyazaki ┆ Infectious gastroenteritis ┆ 2026 ┆ 3 ┆ 339.0 ┆ Sentinel surveillance │
185
- │ Kagoshima ┆ Infectious gastroenteritis (on… ┆ 2024 ┆ 9 ┆ null ┆ Sentinel surveillance │
186
- │ Osaka ┆ Mumps ┆ 2024 ┆ 49 ┆ 404.0 ┆ Sentinel surveillance │
187
- │ Aomori ┆ Erythema infection ┆ 2024 ┆ 10 ┆ 5.0 ┆ Sentinel surveillance │
188
- └────────────┴─────────────────────────────────┴──────┴──────┴───────┴───────────────────────┘
189
- ```
190
-
191
- ## Datasets
192
-
193
- Use `jp.load(...)` with:
194
-
195
- - `"sex"`: historical sex-disaggregated surveillance
196
- - `"place"`: historical place-category surveillance
197
- - `"bullet"`: modern all-case weekly reports (rapid zensu)
198
- - `"sentinel"`: sentinel weekly reports (teitenrui; 2012+ in release data assets)
199
- - `"unified"`: deduplicated combined dataset (sex-total + modern bullet/sentinel, recommended)
200
-
201
- Detailed schema and coverage are documented in [DATASETS.md](./docs/DATASETS.md).
202
-
203
- ## Raw Download and Parsing
204
-
205
- Raw file workflows are available in `jp_idwr_db.io`:
206
-
207
- - `jp_idwr_db.io.download(...)`
208
- - `jp_idwr_db.io.download_recent(...)`
209
- - `jp_idwr_db.io.read(...)`
210
-
211
- These are useful for refreshing local raw weekly files or debugging parser behavior.
212
-
213
- ## Data Wrangling Examples
214
-
215
- See [EXAMPLES.md](./docs/EXAMPLES.md) for Polars-first data wrangling recipes (grouping, trends, regional slices, source-aware filtering).
216
-
217
- Disease-by-disease temporal coverage is documented in [DISEASES.md](./docs/DISEASES.md).
218
-
219
- ## Data Source
220
-
221
- NIID/JIHS infectious disease surveillance publications:
222
-
223
- - Historical annual archive files (`Syu_01_1`, `Syu_02_1`)
224
- - Rapid weekly CSV reports (`zensuXX.csv`, `teitenruiXX.csv`)
225
-
226
- ## Development
227
-
228
- ```bash
229
- uv sync --all-extras --dev
230
- uv run ruff check .
231
- uv run mypy src
232
- uv run pytest
233
- ```
234
-
235
- ## Security and Integrity
236
-
237
- - Release assets include a `jp_idwr_db-manifest.json` with SHA256 checksums.
238
- - `ensure_data()` verifies archive checksum and each extracted parquet checksum before marking cache complete.
239
- - For PyPI publishing, prefer Trusted Publishing (OIDC) over long-lived API tokens.
240
-
241
- ## License
242
-
243
- GPL-3.0-or-later. See [LICENSE](./LICENSE).