jp-idwr-db 0.2.2__tar.gz → 0.2.3__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/CHANGELOG.md +6 -0
- jp_idwr_db-0.2.3/PKG-INFO +293 -0
- jp_idwr_db-0.2.3/README.md +256 -0
- {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/pyproject.toml +1 -1
- {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/__init__.py +1 -1
- {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/config.py +1 -1
- {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/data_manager.py +11 -0
- jp_idwr_db-0.2.2/PKG-INFO +0 -243
- jp_idwr_db-0.2.2/README.md +0 -206
- {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/.gitignore +0 -0
- {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/CITATION.cff +0 -0
- {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/LICENSE +0 -0
- {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/__main__.py +0 -0
- {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/_internal/__init__.py +0 -0
- {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/_internal/download.py +0 -0
- {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/_internal/read.py +0 -0
- {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/_internal/validation.py +0 -0
- {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/api.py +0 -0
- {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/cli.py +0 -0
- {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/datasets.py +0 -0
- {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/http.py +0 -0
- {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/io.py +0 -0
- {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/py.typed +0 -0
- {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/transform.py +0 -0
- {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/types.py +0 -0
- {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/urls.py +0 -0
- {jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/utils.py +0 -0
{jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/CHANGELOG.md
@@ -1,5 +1,11 @@
 # Changelog
 
+## 0.2.3 - 2026-02-06
+
+- Refreshed release data assets from sorted parquet datasets (date/prefecture/category ordering).
+- Improved README motivation and practical example narrative.
+- Expanded examples with research-oriented `get_data()` workflows.
+
 ## 0.2.2 - 2026-02-06
 
 - Fixed PyPI publish command in release workflow for current `uv` (`uv publish dist/*`).
@@ -0,0 +1,293 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: jp-idwr-db
|
|
3
|
+
Version: 0.2.3
|
|
4
|
+
Summary: Japanese IDWR infectious disease database and analytics toolkit built on Polars.
|
|
5
|
+
Project-URL: Homepage, https://github.com/AlFontal/jp-idwr-db
|
|
6
|
+
Project-URL: Repository, https://github.com/AlFontal/jp-idwr-db
|
|
7
|
+
Project-URL: Bug Tracker, https://github.com/AlFontal/jp-idwr-db/issues
|
|
8
|
+
Author: jp-idwr-db contributors
|
|
9
|
+
License: GPL-3.0-or-later
|
|
10
|
+
License-File: LICENSE
|
|
11
|
+
Keywords: epidemiology,infectious-disease,japan,polars,surveillance
|
|
12
|
+
Classifier: Development Status :: 3 - Alpha
|
|
13
|
+
Classifier: Intended Audience :: Science/Research
|
|
14
|
+
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
|
|
15
|
+
Classifier: Programming Language :: Python :: 3
|
|
16
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
17
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
18
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
19
|
+
Classifier: Programming Language :: Python :: 3.13
|
|
20
|
+
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
|
|
21
|
+
Requires-Python: >=3.10
|
|
22
|
+
Requires-Dist: fastexcel>=0.10
|
|
23
|
+
Requires-Dist: httpx>=0.27
|
|
24
|
+
Requires-Dist: openpyxl>=3.1
|
|
25
|
+
Requires-Dist: platformdirs>=4.2
|
|
26
|
+
Requires-Dist: polars>=0.20
|
|
27
|
+
Requires-Dist: pyarrow>=14.0
|
|
28
|
+
Provides-Extra: dev
|
|
29
|
+
Requires-Dist: mypy>=1.8; extra == 'dev'
|
|
30
|
+
Requires-Dist: pre-commit>=3.7; extra == 'dev'
|
|
31
|
+
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
|
|
32
|
+
Requires-Dist: pytest>=8.0; extra == 'dev'
|
|
33
|
+
Requires-Dist: ruff>=0.4; extra == 'dev'
|
|
34
|
+
Provides-Extra: excel
|
|
35
|
+
Requires-Dist: fastexcel>=0.10; extra == 'excel'
|
|
36
|
+
Description-Content-Type: text/markdown
|
|
37
|
+
|
|
38
|
+
# jp-idwr-db
|
|
39
|
+
|
|
40
|
+
Python access to Japanese infectious disease surveillance data from NIID/JIHS.
|
|
41
|
+
|
|
42
|
+
`jp-idwr-db` provides a Polars-first API for filtering and analysis.
|
|
43
|
+
Parquet datasets are versioned as GitHub Release assets and downloaded to a local cache on first use.
|
|
44
|
+
It is inspired by the R package `jpinfect`, but it is not an API-parity port and includes independently curated ingestion and coverage.
|
|
45
|
+
|
|
46
|
+
NIID/JIHS surveillance data is public, but it is not exposed as a clean analytical API.
|
|
47
|
+
To reconstruct usable time series, you typically need to navigate multiple archive structures, yearly directories,
|
|
48
|
+
and week-level files with changing formats (Excel and CSV) across historical and modern reporting systems.
|
|
49
|
+
|
|
50
|
+
This package exists to remove that friction: it consolidates those heterogeneous sources into standardized, queryable
|
|
51
|
+
tables so you can move directly to epidemiological analysis instead of file discovery, parsing, and schema harmonization.
|
|
52
|
+
|
|
53
|
+
## Install
|
|
54
|
+
|
|
55
|
+
```bash
|
|
56
|
+
pip install jp-idwr-db
|
|
57
|
+
```
|
|
58
|
+
|
|
59
|
+
## Data Download Model
|
|
60
|
+
|
|
61
|
+
- Package wheels do not ship the large parquet tables.
|
|
62
|
+
- On first call to `jp.load(...)` (or `jp.get_data(...)`), the package downloads versioned data assets from GitHub Releases.
|
|
63
|
+
- Cache path defaults to:
|
|
64
|
+
- macOS: `~/Library/Caches/jp_idwr_db/data/<version>/`
|
|
65
|
+
- Linux: `~/.cache/jp_idwr_db/data/<version>/`
|
|
66
|
+
- Windows: `%LOCALAPPDATA%\\jp_idwr_db\\Cache\\data\\<version>\\`
|
|
67
|
+
|
|
68
|
+
Prefetch explicitly:
|
|
69
|
+
|
|
70
|
+
```bash
|
|
71
|
+
python -m jp_idwr_db data download
|
|
72
|
+
python -m jp_idwr_db data download --version v0.2.2 --force
|
|
73
|
+
```
|
|
74
|
+
|
|
75
|
+
Environment overrides:
|
|
76
|
+
|
|
77
|
+
- `JPINFECT_DATA_VERSION`: choose a specific release tag (example: `v0.2.2`)
|
|
78
|
+
- `JPINFECT_DATA_BASE_URL`: override asset host base URL
|
|
79
|
+
- `JPINFECT_CACHE_DIR`: override local cache root
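The overrides listed above can be pictured as a small resolution step that runs before any download. The following is a minimal sketch, not the package's actual code: `resolve_data_dir` is a hypothetical helper, and appending `data/<version>` beneath an overridden cache root is an assumption carried over from the default paths in the README.

```python
import os
from pathlib import Path

# Variable names come from the README; how the package combines them
# internally is an assumption made for this sketch.
ENV_VERSION = "JPINFECT_DATA_VERSION"
ENV_CACHE_DIR = "JPINFECT_CACHE_DIR"


def resolve_data_dir(default_root: Path, default_version: str) -> Path:
    """Return <root>/data/<version>, honouring the two overrides when set."""
    # An unset or empty variable falls back to the supplied default.
    root = Path(os.environ.get(ENV_CACHE_DIR) or default_root)
    version = os.environ.get(ENV_VERSION) or default_version
    return root / "data" / version
```

With neither variable set, `resolve_data_dir(Path.home() / ".cache" / "jp_idwr_db", "v0.2.3")` mirrors the documented Linux default; exporting `JPINFECT_DATA_VERSION=v0.2.2` beforehand would switch only the final path component.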
|
|
80
|
+
|
|
81
|
+
## Quick Start
|
|
82
|
+
|
|
83
|
+
To fetch the full unified dataset with a single call:
|
|
84
|
+
|
|
85
|
+
```python
|
|
86
|
+
import jp_idwr_db as jp
|
|
87
|
+
import polars as pl
|
|
88
|
+
|
|
89
|
+
df = (
|
|
90
|
+
jp.load("unified")
|
|
91
|
+
.select(["date", "prefecture", "category", "disease", "count", "source"])
|
|
92
|
+
)
|
|
93
|
+
print(df)
|
|
94
|
+
```
|
|
95
|
+
|
|
96
|
+
```text
|
|
97
|
+
shape: (5_370_477, 6)
|
|
98
|
+
┌────────────┬────────────┬──────────┬─────────────────────────────┬───────┬────────────────────┐
|
|
99
|
+
│ date ┆ prefecture ┆ category ┆ disease ┆ count ┆ source │
|
|
100
|
+
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
|
|
101
|
+
│ date ┆ str ┆ str ┆ str ┆ f64 ┆ str │
|
|
102
|
+
╞════════════╪════════════╪══════════╪═════════════════════════════╪═══════╪════════════════════╡
|
|
103
|
+
│ 1999-04-11 ┆ Aichi ┆ total ┆ AIDS ┆ 0.0 ┆ Confirmed cases │
|
|
104
|
+
│ 1999-04-11 ┆ Aichi ┆ total ┆ Acute poliomyelitis ┆ 0.0 ┆ Confirmed cases │
|
|
105
|
+
│ 1999-04-11 ┆ Aichi ┆ total ┆ Acute viral hepatitis ┆ 4.0 ┆ Confirmed cases │
|
|
106
|
+
│ 1999-04-11 ┆ Aichi ┆ total ┆ Amebiasis ┆ 0.0 ┆ Confirmed cases │
|
|
107
|
+
│ 1999-04-11 ┆ Aichi ┆ total ┆ Anthrax ┆ 0.0 ┆ Confirmed cases │
|
|
108
|
+
│ … ┆ … ┆ … ┆ … ┆ … ┆ … │
|
|
109
|
+
│ 2026-02-09 ┆ Yamanashi ┆ total ┆ Viral hepatitis(excluding ┆ 0.0 ┆ All-case reporting │
|
|
110
|
+
│ ┆ ┆ ┆ hepa… ┆ ┆ │
|
|
111
|
+
│ 2026-02-09 ┆ Yamanashi ┆ total ┆ West Nile fever ┆ 0.0 ┆ All-case reporting │
|
|
112
|
+
│ 2026-02-09 ┆ Yamanashi ┆ total ┆ Western equine encephalitis ┆ 0.0 ┆ All-case reporting │
|
|
113
|
+
│ 2026-02-09 ┆ Yamanashi ┆ total ┆ Yellow fever ┆ 0.0 ┆ All-case reporting │
|
|
114
|
+
│ 2026-02-09 ┆ Yamanashi ┆ total ┆ Zika virus infection ┆ 0.0 ┆ All-case reporting │
|
|
115
|
+
└────────────┴────────────┴──────────┴─────────────────────────────┴───────┴────────────────────┘
|
|
116
|
+
```
|
|
117
|
+
|
|
118
|
+
You can also filter at the source with `jp.get_data(...)`:
|
|
119
|
+
|
|
120
|
+
```python
|
|
121
|
+
|
|
122
|
+
# Fetch only tuberculosis data for 2024 in Tokyo, Osaka, and Hokkaido
|
|
123
|
+
tb = (
|
|
124
|
+
jp.get_data(
|
|
125
|
+
disease="Tuberculosis",
|
|
126
|
+
year=2024,
|
|
127
|
+
prefecture=["Tokyo", "Osaka", "Hokkaido"])
|
|
128
|
+
.select(["date", "prefecture", "disease", "count", "source"])
|
|
129
|
+
)
|
|
130
|
+
print(tb)
|
|
131
|
+
```
|
|
132
|
+
|
|
133
|
+
```text
|
|
134
|
+
shape: (156, 5)
|
|
135
|
+
┌────────────┬────────────┬──────────────┬───────┬────────────────────┐
|
|
136
|
+
│ date ┆ prefecture ┆ disease ┆ count ┆ source │
|
|
137
|
+
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
|
|
138
|
+
│ date ┆ str ┆ str ┆ f64 ┆ str │
|
|
139
|
+
╞════════════╪════════════╪══════════════╪═══════╪════════════════════╡
|
|
140
|
+
│ 2024-01-01 ┆ Hokkaido ┆ Tuberculosis ┆ 2.0 ┆ All-case reporting │
|
|
141
|
+
│ 2024-01-01 ┆ Osaka ┆ Tuberculosis ┆ 3.0 ┆ All-case reporting │
|
|
142
|
+
│ 2024-01-01 ┆ Tokyo ┆ Tuberculosis ┆ 15.0 ┆ All-case reporting │
|
|
143
|
+
│ 2024-01-08 ┆ Hokkaido ┆ Tuberculosis ┆ 4.0 ┆ All-case reporting │
|
|
144
|
+
│ 2024-01-08 ┆ Osaka ┆ Tuberculosis ┆ 17.0 ┆ All-case reporting │
|
|
145
|
+
│ … ┆ … ┆ … ┆ … ┆ … │
|
|
146
|
+
│ 2024-12-16 ┆ Osaka ┆ Tuberculosis ┆ 17.0 ┆ All-case reporting │
|
|
147
|
+
│ 2024-12-16 ┆ Tokyo ┆ Tuberculosis ┆ 41.0 ┆ All-case reporting │
|
|
148
|
+
│ 2024-12-23 ┆ Hokkaido ┆ Tuberculosis ┆ 5.0 ┆ All-case reporting │
|
|
149
|
+
│ 2024-12-23 ┆ Osaka ┆ Tuberculosis ┆ 16.0 ┆ All-case reporting │
|
|
150
|
+
│ 2024-12-23 ┆ Tokyo ┆ Tuberculosis ┆ 53.0 ┆ All-case reporting │
|
|
151
|
+
└────────────┴────────────┴──────────────┴───────┴────────────────────┘
|
|
152
|
+
```
|
|
153
|
+
|
|
154
|
+
```python
|
|
155
|
+
|
|
156
|
+
# Sentinel-only diseases from recent years in Tokyo prefecture
|
|
157
|
+
sentinel_df = (
|
|
158
|
+
jp.get_data(
|
|
159
|
+
source="sentinel",
|
|
160
|
+
year=(2024, 2026), prefecture=["Tokyo"])
|
|
161
|
+
.select(["date", "prefecture", "disease", "count", "per_sentinel"])
|
|
162
|
+
)
|
|
163
|
+
print(sentinel_df)
|
|
164
|
+
```
|
|
165
|
+
|
|
166
|
+
```text
|
|
167
|
+
shape: (2_052, 5)
|
|
168
|
+
┌────────────┬────────────┬─────────────────────────────────┬─────────┬──────────────┐
|
|
169
|
+
│ date ┆ prefecture ┆ disease ┆ count ┆ per_sentinel │
|
|
170
|
+
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
|
|
171
|
+
│ date ┆ str ┆ str ┆ f64 ┆ f64 │
|
|
172
|
+
╞════════════╪════════════╪═════════════════════════════════╪═════════╪══════════════╡
|
|
173
|
+
│ 2024-01-07 ┆ Tokyo ┆ Acute hemorrhagic conjunctivit… ┆ null ┆ null │
|
|
174
|
+
│ 2024-01-07 ┆ Tokyo ┆ Aseptic meningitis ┆ null ┆ null │
|
|
175
|
+
│ 2024-01-07 ┆ Tokyo ┆ Bacterial meningitis ┆ null ┆ null │
|
|
176
|
+
│ 2024-01-07 ┆ Tokyo ┆ COVID-19 ┆ 1365.0 ┆ 3.38 │
|
|
177
|
+
│ 2024-01-07 ┆ Tokyo ┆ Chickenpox ┆ 31.0 ┆ 0.12 │
|
|
178
|
+
│ … ┆ … ┆ … ┆ … ┆ … │
|
|
179
|
+
│ 2026-01-25 ┆ Tokyo ┆ Influenza(excld. avian influen… ┆ 13082.0 ┆ 34.07 │
|
|
180
|
+
│ 2026-01-25 ┆ Tokyo ┆ Mumps ┆ 30.0 ┆ 0.12 │
|
|
181
|
+
│ 2026-01-25 ┆ Tokyo ┆ Mycoplasma pneumonia ┆ 32.0 ┆ 1.28 │
|
|
182
|
+
│ 2026-01-25 ┆ Tokyo ┆ Pharyngoconjunctival fever ┆ 115.0 ┆ 0.47 │
|
|
183
|
+
│ 2026-01-25 ┆ Tokyo ┆ Respiratory syncytial virus in… ┆ 242.0 ┆ 1.0 │
|
|
184
|
+
└────────────┴────────────┴─────────────────────────────────┴─────────┴──────────────┘
|
|
185
|
+
```
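The calls above pass `year` either as a single integer (`year=2024`) or as a pair (`year=(2024, 2026)`), and the sample output for the pair spans dates from 2024 through 2026, which suggests the pair is read as an inclusive range. A minimal sketch of that reading follows; `year_matches` and `YearFilter` are hypothetical names, not part of `jp_idwr_db`, and the library's real filtering logic may differ.

```python
from typing import Optional, Tuple, Union

YearFilter = Optional[Union[int, Tuple[int, int]]]


def year_matches(row_year: int, year: YearFilter) -> bool:
    """Hypothetical helper: None keeps everything, an int matches one year,
    and a (start, end) tuple matches the inclusive range."""
    if year is None:
        return True
    if isinstance(year, tuple):
        start, end = year
        return start <= row_year <= end
    return row_year == year


rows = [{"year": 2023}, {"year": 2024}, {"year": 2025}, {"year": 2026}]
kept = [r["year"] for r in rows if year_matches(r["year"], (2024, 2026))]
print(kept)  # → [2024, 2025, 2026]
```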
|
|
186
|
+
|
|
187
|
+
## Main API
|
|
188
|
+
|
|
189
|
+
Top-level API exported by `jp_idwr_db`:
|
|
190
|
+
|
|
191
|
+
- `load(name)`
|
|
192
|
+
- `get_data(...)`
|
|
193
|
+
- `list_diseases(source="all")`
|
|
194
|
+
- `list_prefectures()`
|
|
195
|
+
- `get_latest_week()`
|
|
196
|
+
- `prefecture_map()`
|
|
197
|
+
- `attach_prefecture_id(df, prefecture_col="prefecture", id_col="prefecture_id")`
|
|
198
|
+
- `merge(...)`, `pivot(...)`
|
|
199
|
+
- `configure(...)`, `get_config()`
|
|
200
|
+
|
|
201
|
+
|
|
202
|
+
## Datasets
|
|
203
|
+
|
|
204
|
+
Use `jp.load(...)` with:
|
|
205
|
+
|
|
206
|
+
- `"sex"`: historical sex-disaggregated surveillance
|
|
207
|
+
- `"place"`: historical place-category surveillance
|
|
208
|
+
- `"bullet"`: modern all-case weekly reports (rapid zensu)
|
|
209
|
+
- `"sentinel"`: sentinel weekly reports (teitenrui; 2012+ in release data assets)
|
|
210
|
+
- `"unified"`: deduplicated combined dataset (sex-total + modern bullet/sentinel, recommended)
|
|
211
|
+
|
|
212
|
+
Detailed schema and coverage are documented in [DATASETS.md](./docs/DATASETS.md).
|
|
213
|
+
|
|
214
|
+
## Optional Prefecture IDs
|
|
215
|
+
|
|
216
|
+
Attach ISO prefecture IDs (JP-01 ... JP-47) only when needed:
|
|
217
|
+
|
|
218
|
+
```python
|
|
219
|
+
import jp_idwr_db as jp
|
|
220
|
+
|
|
221
|
+
df_with_ids = (
|
|
222
|
+
jp.get_data(disease="Measles", year=2024)
|
|
223
|
+
.select(["prefecture", "disease", "count"])
|
|
224
|
+
.sort(["prefecture", "count"])
|
|
225
|
+
.unique(subset=["prefecture"], keep="first")
|
|
226
|
+
.pipe(jp.attach_prefecture_id)
|
|
227
|
+
.sort("prefecture")
|
|
228
|
+
)
|
|
229
|
+
print(df_with_ids)
|
|
230
|
+
```
|
|
231
|
+
|
|
232
|
+
```text
|
|
233
|
+
shape: (48, 4)
|
|
234
|
+
┌────────────┬─────────┬───────┬───────────────┐
|
|
235
|
+
│ prefecture ┆ disease ┆ count ┆ prefecture_id │
|
|
236
|
+
│ --- ┆ --- ┆ --- ┆ --- │
|
|
237
|
+
│ str ┆ str ┆ f64 ┆ str │
|
|
238
|
+
╞════════════╪═════════╪═══════╪═══════════════╡
|
|
239
|
+
│ Aichi ┆ Measles ┆ 0.0 ┆ JP-23 │
|
|
240
|
+
│ Akita ┆ Measles ┆ 0.0 ┆ JP-05 │
|
|
241
|
+
│ Aomori ┆ Measles ┆ 0.0 ┆ JP-02 │
|
|
242
|
+
│ Chiba ┆ Measles ┆ 0.0 ┆ JP-12 │
|
|
243
|
+
│ Ehime ┆ Measles ┆ 0.0 ┆ JP-38 │
|
|
244
|
+
│ … ┆ … ┆ … ┆ … │
|
|
245
|
+
│ Toyama ┆ Measles ┆ 0.0 ┆ JP-16 │
|
|
246
|
+
│ Wakayama ┆ Measles ┆ 0.0 ┆ JP-30 │
|
|
247
|
+
│ Yamagata ┆ Measles ┆ 0.0 ┆ JP-06 │
|
|
248
|
+
│ Yamaguchi ┆ Measles ┆ 0.0 ┆ JP-35 │
|
|
249
|
+
│ Yamanashi ┆ Measles ┆ 0.0 ┆ JP-19 │
|
|
250
|
+
└────────────┴─────────┴───────┴───────────────┘
|
|
251
|
+
```
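Conceptually, attaching prefecture IDs is a lookup from prefecture name to an ISO 3166-2 code. The sketch below illustrates the idea on plain dictionaries rather than a Polars frame. `attach_ids` is a hypothetical stand-in (not the library's `attach_prefecture_id`), the five codes are copied from the sample output above, and returning `None` for unknown names is an assumption.

```python
# Codes copied from the sample output above; the table is deliberately partial.
PREFECTURE_IDS = {
    "Aichi": "JP-23",
    "Akita": "JP-05",
    "Aomori": "JP-02",
    "Chiba": "JP-12",
    "Ehime": "JP-38",
}


def attach_ids(rows, prefecture_col="prefecture", id_col="prefecture_id"):
    """Return copies of the row dicts with an ID column (None when unknown)."""
    return [
        {**row, id_col: PREFECTURE_IDS.get(row[prefecture_col])}
        for row in rows
    ]


sample = [{"prefecture": "Aichi", "disease": "Measles", "count": 0.0}]
print(attach_ids(sample))
```

Keeping the ID as a separate, optional column means the base tables stay keyed on the human-readable prefecture name, and the code is only joined in when a downstream tool (for example a map layer) needs it.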
|
|
252
|
+
|
|
253
|
+
## Raw Download and Parsing
|
|
254
|
+
|
|
255
|
+
Raw file workflows are available in `jp_idwr_db.io`:
|
|
256
|
+
|
|
257
|
+
- `jp_idwr_db.io.download(...)`
|
|
258
|
+
- `jp_idwr_db.io.download_recent(...)`
|
|
259
|
+
- `jp_idwr_db.io.read(...)`
|
|
260
|
+
|
|
261
|
+
These are useful for refreshing local raw weekly files or debugging parser behavior.
|
|
262
|
+
|
|
263
|
+
## Data Wrangling Examples
|
|
264
|
+
|
|
265
|
+
See [EXAMPLES.md](./docs/EXAMPLES.md) for Polars-first data wrangling recipes (grouping, trends, regional slices, source-aware filtering).
|
|
266
|
+
|
|
267
|
+
Disease-by-disease temporal coverage is documented in [DISEASES.md](./docs/DISEASES.md).
|
|
268
|
+
|
|
269
|
+
## Data Source
|
|
270
|
+
|
|
271
|
+
NIID/JIHS infectious disease surveillance publications:
|
|
272
|
+
|
|
273
|
+
- Historical annual archive files (`Syu_01_1`, `Syu_02_1`)
|
|
274
|
+
- Rapid weekly CSV reports (`zensuXX.csv`, `teitenruiXX.csv`)
|
|
275
|
+
|
|
276
|
+
## Development
|
|
277
|
+
|
|
278
|
+
```bash
|
|
279
|
+
uv sync --all-extras --dev
|
|
280
|
+
uv run ruff check .
|
|
281
|
+
uv run mypy src
|
|
282
|
+
uv run pytest
|
|
283
|
+
```
|
|
284
|
+
|
|
285
|
+
## Security and Integrity
|
|
286
|
+
|
|
287
|
+
- Release assets include a `jp_idwr_db-manifest.json` with SHA256 checksums.
|
|
288
|
+
- `ensure_data()` verifies archive checksum and each extracted parquet checksum before marking cache complete.
|
|
289
|
+
- For PyPI publishing, prefer Trusted Publishing (OIDC) over long-lived API tokens.
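The checksum step described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the package's implementation: the manifest layout `{"files": {"<name>": "<sha256 hex>"}}` and the helper names `sha256_of` and `verify_against_manifest` are hypothetical, since the real schema of `jp_idwr_db-manifest.json` is not shown here.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large files are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_manifest(data_dir: Path, manifest_path: Path) -> list[str]:
    """Return the sorted names of files that are missing or fail their checksum."""
    # Hypothetical manifest layout: {"files": {"<name>": "<sha256 hex>"}}
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    failures = []
    for name, expected in manifest["files"].items():
        target = data_dir / name
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(name)
    return sorted(failures)
```

An empty result would be the signal to mark the cache complete; any returned name indicates a file that should be re-downloaded before use.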
|
|
290
|
+
|
|
291
|
+
## License
|
|
292
|
+
|
|
293
|
+
GPL-3.0-or-later. See [LICENSE](./LICENSE).
|
|
@@ -0,0 +1,256 @@
|
|
|
1
|
+
# jp-idwr-db
|
|
2
|
+
|
|
3
|
+
Python access to Japanese infectious disease surveillance data from NIID/JIHS.
|
|
4
|
+
|
|
5
|
+
`jp-idwr-db` provides a Polars-first API for filtering and analysis.
|
|
6
|
+
Parquet datasets are versioned as GitHub Release assets and downloaded to a local cache on first use.
|
|
7
|
+
It is inspired by the R package `jpinfect`, but it is not an API-parity port and includes independently curated ingestion and coverage.
|
|
8
|
+
|
|
9
|
+
NIID/JIHS surveillance data is public, but it is not exposed as a clean analytical API.
|
|
10
|
+
To reconstruct usable time series, you typically need to navigate multiple archive structures, yearly directories,
|
|
11
|
+
and week-level files with changing formats (Excel and CSV) across historical and modern reporting systems.
|
|
12
|
+
|
|
13
|
+
This package exists to remove that friction: it consolidates those heterogeneous sources into standardized, queryable
|
|
14
|
+
tables so you can move directly to epidemiological analysis instead of file discovery, parsing, and schema harmonization.
|
|
15
|
+
|
|
16
|
+
## Install
|
|
17
|
+
|
|
18
|
+
```bash
|
|
19
|
+
pip install jp-idwr-db
|
|
20
|
+
```
|
|
21
|
+
|
|
22
|
+
## Data Download Model
|
|
23
|
+
|
|
24
|
+
- Package wheels do not ship the large parquet tables.
|
|
25
|
+
- On first call to `jp.load(...)` (or `jp.get_data(...)`), the package downloads versioned data assets from GitHub Releases.
|
|
26
|
+
- Cache path defaults to:
|
|
27
|
+
- macOS: `~/Library/Caches/jp_idwr_db/data/<version>/`
|
|
28
|
+
- Linux: `~/.cache/jp_idwr_db/data/<version>/`
|
|
29
|
+
- Windows: `%LOCALAPPDATA%\\jp_idwr_db\\Cache\\data\\<version>\\`
|
|
30
|
+
|
|
31
|
+
Prefetch explicitly:
|
|
32
|
+
|
|
33
|
+
```bash
|
|
34
|
+
python -m jp_idwr_db data download
|
|
35
|
+
python -m jp_idwr_db data download --version v0.2.2 --force
|
|
36
|
+
```
|
|
37
|
+
|
|
38
|
+
Environment overrides:
|
|
39
|
+
|
|
40
|
+
- `JPINFECT_DATA_VERSION`: choose a specific release tag (example: `v0.2.2`)
|
|
41
|
+
- `JPINFECT_DATA_BASE_URL`: override asset host base URL
|
|
42
|
+
- `JPINFECT_CACHE_DIR`: override local cache root
|
|
43
|
+
|
|
44
|
+
## Quick Start
|
|
45
|
+
|
|
46
|
+
To fetch the full unified dataset with a single call:
|
|
47
|
+
|
|
48
|
+
```python
|
|
49
|
+
import jp_idwr_db as jp
|
|
50
|
+
import polars as pl
|
|
51
|
+
|
|
52
|
+
df = (
|
|
53
|
+
jp.load("unified")
|
|
54
|
+
.select(["date", "prefecture", "category", "disease", "count", "source"])
|
|
55
|
+
)
|
|
56
|
+
print(df)
|
|
57
|
+
```
|
|
58
|
+
|
|
59
|
+
```text
|
|
60
|
+
shape: (5_370_477, 6)
|
|
61
|
+
┌────────────┬────────────┬──────────┬─────────────────────────────┬───────┬────────────────────┐
|
|
62
|
+
│ date ┆ prefecture ┆ category ┆ disease ┆ count ┆ source │
|
|
63
|
+
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
|
|
64
|
+
│ date ┆ str ┆ str ┆ str ┆ f64 ┆ str │
|
|
65
|
+
╞════════════╪════════════╪══════════╪═════════════════════════════╪═══════╪════════════════════╡
|
|
66
|
+
│ 1999-04-11 ┆ Aichi ┆ total ┆ AIDS ┆ 0.0 ┆ Confirmed cases │
|
|
67
|
+
│ 1999-04-11 ┆ Aichi ┆ total ┆ Acute poliomyelitis ┆ 0.0 ┆ Confirmed cases │
|
|
68
|
+
│ 1999-04-11 ┆ Aichi ┆ total ┆ Acute viral hepatitis ┆ 4.0 ┆ Confirmed cases │
|
|
69
|
+
│ 1999-04-11 ┆ Aichi ┆ total ┆ Amebiasis ┆ 0.0 ┆ Confirmed cases │
|
|
70
|
+
│ 1999-04-11 ┆ Aichi ┆ total ┆ Anthrax ┆ 0.0 ┆ Confirmed cases │
|
|
71
|
+
│ … ┆ … ┆ … ┆ … ┆ … ┆ … │
|
|
72
|
+
│ 2026-02-09 ┆ Yamanashi ┆ total ┆ Viral hepatitis(excluding ┆ 0.0 ┆ All-case reporting │
|
|
73
|
+
│ ┆ ┆ ┆ hepa… ┆ ┆ │
|
|
74
|
+
│ 2026-02-09 ┆ Yamanashi ┆ total ┆ West Nile fever ┆ 0.0 ┆ All-case reporting │
|
|
75
|
+
│ 2026-02-09 ┆ Yamanashi ┆ total ┆ Western equine encephalitis ┆ 0.0 ┆ All-case reporting │
|
|
76
|
+
│ 2026-02-09 ┆ Yamanashi ┆ total ┆ Yellow fever ┆ 0.0 ┆ All-case reporting │
|
|
77
|
+
│ 2026-02-09 ┆ Yamanashi ┆ total ┆ Zika virus infection ┆ 0.0 ┆ All-case reporting │
|
|
78
|
+
└────────────┴────────────┴──────────┴─────────────────────────────┴───────┴────────────────────┘
|
|
79
|
+
```
|
|
80
|
+
|
|
81
|
+
You can also filter at the source with `jp.get_data(...)`:
|
|
82
|
+
|
|
83
|
+
```python
|
|
84
|
+
|
|
85
|
+
# Fetch only tuberculosis data for 2024 in Tokyo, Osaka, and Hokkaido
|
|
86
|
+
tb = (
|
|
87
|
+
jp.get_data(
|
|
88
|
+
disease="Tuberculosis",
|
|
89
|
+
year=2024,
|
|
90
|
+
prefecture=["Tokyo", "Osaka", "Hokkaido"])
|
|
91
|
+
.select(["date", "prefecture", "disease", "count", "source"])
|
|
92
|
+
)
|
|
93
|
+
print(tb)
|
|
94
|
+
```
|
|
95
|
+
|
|
96
|
+
```text
|
|
97
|
+
shape: (156, 5)
|
|
98
|
+
┌────────────┬────────────┬──────────────┬───────┬────────────────────┐
|
|
99
|
+
│ date ┆ prefecture ┆ disease ┆ count ┆ source │
|
|
100
|
+
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
|
|
101
|
+
│ date ┆ str ┆ str ┆ f64 ┆ str │
|
|
102
|
+
╞════════════╪════════════╪══════════════╪═══════╪════════════════════╡
|
|
103
|
+
│ 2024-01-01 ┆ Hokkaido ┆ Tuberculosis ┆ 2.0 ┆ All-case reporting │
|
|
104
|
+
│ 2024-01-01 ┆ Osaka ┆ Tuberculosis ┆ 3.0 ┆ All-case reporting │
|
|
105
|
+
│ 2024-01-01 ┆ Tokyo ┆ Tuberculosis ┆ 15.0 ┆ All-case reporting │
|
|
106
|
+
│ 2024-01-08 ┆ Hokkaido ┆ Tuberculosis ┆ 4.0 ┆ All-case reporting │
|
|
107
|
+
│ 2024-01-08 ┆ Osaka ┆ Tuberculosis ┆ 17.0 ┆ All-case reporting │
|
|
108
|
+
│ … ┆ … ┆ … ┆ … ┆ … │
|
|
109
|
+
│ 2024-12-16 ┆ Osaka ┆ Tuberculosis ┆ 17.0 ┆ All-case reporting │
|
|
110
|
+
│ 2024-12-16 ┆ Tokyo ┆ Tuberculosis ┆ 41.0 ┆ All-case reporting │
|
|
111
|
+
│ 2024-12-23 ┆ Hokkaido ┆ Tuberculosis ┆ 5.0 ┆ All-case reporting │
|
|
112
|
+
│ 2024-12-23 ┆ Osaka ┆ Tuberculosis ┆ 16.0 ┆ All-case reporting │
|
|
113
|
+
│ 2024-12-23 ┆ Tokyo ┆ Tuberculosis ┆ 53.0 ┆ All-case reporting │
|
|
114
|
+
└────────────┴────────────┴──────────────┴───────┴────────────────────┘
|
|
115
|
+
```
|
|
116
|
+
|
|
117
|
+
```python
|
|
118
|
+
|
|
119
|
+
# Sentinel-only diseases from recent years in Tokyo prefecture
|
|
120
|
+
sentinel_df = (
|
|
121
|
+
jp.get_data(
|
|
122
|
+
source="sentinel",
|
|
123
|
+
year=(2024, 2026), prefecture=["Tokyo"])
|
|
124
|
+
.select(["date", "prefecture", "disease", "count", "per_sentinel"])
|
|
125
|
+
)
|
|
126
|
+
print(sentinel_df)
|
|
127
|
+
```
|
|
128
|
+
|
|
129
|
+
```text
|
|
130
|
+
shape: (2_052, 5)
|
|
131
|
+
┌────────────┬────────────┬─────────────────────────────────┬─────────┬──────────────┐
|
|
132
|
+
│ date ┆ prefecture ┆ disease ┆ count ┆ per_sentinel │
|
|
133
|
+
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
|
|
134
|
+
│ date ┆ str ┆ str ┆ f64 ┆ f64 │
|
|
135
|
+
╞════════════╪════════════╪═════════════════════════════════╪═════════╪══════════════╡
|
|
136
|
+
│ 2024-01-07 ┆ Tokyo ┆ Acute hemorrhagic conjunctivit… ┆ null ┆ null │
|
|
137
|
+
│ 2024-01-07 ┆ Tokyo ┆ Aseptic meningitis ┆ null ┆ null │
|
|
138
|
+
│ 2024-01-07 ┆ Tokyo ┆ Bacterial meningitis ┆ null ┆ null │
|
|
139
|
+
│ 2024-01-07 ┆ Tokyo ┆ COVID-19 ┆ 1365.0 ┆ 3.38 │
|
|
140
|
+
│ 2024-01-07 ┆ Tokyo ┆ Chickenpox ┆ 31.0 ┆ 0.12 │
|
|
141
|
+
│ … ┆ … ┆ … ┆ … ┆ … │
|
|
142
|
+
│ 2026-01-25 ┆ Tokyo ┆ Influenza(excld. avian influen… ┆ 13082.0 ┆ 34.07 │
|
|
143
|
+
│ 2026-01-25 ┆ Tokyo ┆ Mumps ┆ 30.0 ┆ 0.12 │
|
|
144
|
+
│ 2026-01-25 ┆ Tokyo ┆ Mycoplasma pneumonia ┆ 32.0 ┆ 1.28 │
|
|
145
|
+
│ 2026-01-25 ┆ Tokyo ┆ Pharyngoconjunctival fever ┆ 115.0 ┆ 0.47 │
|
|
146
|
+
│ 2026-01-25 ┆ Tokyo ┆ Respiratory syncytial virus in… ┆ 242.0 ┆ 1.0 │
|
|
147
|
+
└────────────┴────────────┴─────────────────────────────────┴─────────┴──────────────┘
|
|
148
|
+
```
|
|
149
|
+
|
|
150
|
+
## Main API
|
|
151
|
+
|
|
152
|
+
Top-level API exported by `jp_idwr_db`:
|
|
153
|
+
|
|
154
|
+
- `load(name)`
|
|
155
|
+
- `get_data(...)`
|
|
156
|
+
- `list_diseases(source="all")`
|
|
157
|
+
- `list_prefectures()`
|
|
158
|
+
- `get_latest_week()`
|
|
159
|
+
- `prefecture_map()`
|
|
160
|
+
- `attach_prefecture_id(df, prefecture_col="prefecture", id_col="prefecture_id")`
|
|
161
|
+
- `merge(...)`, `pivot(...)`
|
|
162
|
+
- `configure(...)`, `get_config()`
|
|
163
|
+
|
|
164
|
+
|
|
165
|
+
## Datasets
|
|
166
|
+
|
|
167
|
+
Use `jp.load(...)` with:
|
|
168
|
+
|
|
169
|
+
- `"sex"`: historical sex-disaggregated surveillance
|
|
170
|
+
- `"place"`: historical place-category surveillance
|
|
171
|
+
- `"bullet"`: modern all-case weekly reports (rapid zensu)
|
|
172
|
+
- `"sentinel"`: sentinel weekly reports (teitenrui; 2012+ in release data assets)
|
|
173
|
+
- `"unified"`: deduplicated combined dataset (sex-total + modern bullet/sentinel, recommended)
|
|
174
|
+
|
|
175
|
+
Detailed schema and coverage are documented in [DATASETS.md](./docs/DATASETS.md).
|
|
176
|
+
|
|
177
|
+
## Optional Prefecture IDs
|
|
178
|
+
|
|
179
|
+
Attach ISO prefecture IDs (JP-01 ... JP-47) only when needed:
|
|
180
|
+
|
|
181
|
+
```python
|
|
182
|
+
import jp_idwr_db as jp
|
|
183
|
+
|
|
184
|
+
df_with_ids = (
|
|
185
|
+
jp.get_data(disease="Measles", year=2024)
|
|
186
|
+
.select(["prefecture", "disease", "count"])
|
|
187
|
+
.sort(["prefecture", "count"])
|
|
188
|
+
.unique(subset=["prefecture"], keep="first")
|
|
189
|
+
.pipe(jp.attach_prefecture_id)
|
|
190
|
+
.sort("prefecture")
|
|
191
|
+
)
|
|
192
|
+
print(df_with_ids)
|
|
193
|
+
```
|
|
194
|
+
|
|
195
|
+
```text
|
|
196
|
+
shape: (48, 4)
|
|
197
|
+
┌────────────┬─────────┬───────┬───────────────┐
|
|
198
|
+
│ prefecture ┆ disease ┆ count ┆ prefecture_id │
|
|
199
|
+
│ --- ┆ --- ┆ --- ┆ --- │
|
|
200
|
+
│ str ┆ str ┆ f64 ┆ str │
|
|
201
|
+
╞════════════╪═════════╪═══════╪═══════════════╡
|
|
202
|
+
│ Aichi ┆ Measles ┆ 0.0 ┆ JP-23 │
|
|
203
|
+
│ Akita ┆ Measles ┆ 0.0 ┆ JP-05 │
|
|
204
|
+
│ Aomori ┆ Measles ┆ 0.0 ┆ JP-02 │
|
|
205
|
+
│ Chiba ┆ Measles ┆ 0.0 ┆ JP-12 │
|
|
206
|
+
│ Ehime ┆ Measles ┆ 0.0 ┆ JP-38 │
|
|
207
|
+
│ … ┆ … ┆ … ┆ … │
|
|
208
|
+
│ Toyama ┆ Measles ┆ 0.0 ┆ JP-16 │
|
|
209
|
+
│ Wakayama ┆ Measles ┆ 0.0 ┆ JP-30 │
|
|
210
|
+
│ Yamagata ┆ Measles ┆ 0.0 ┆ JP-06 │
|
|
211
|
+
│ Yamaguchi ┆ Measles ┆ 0.0 ┆ JP-35 │
|
|
212
|
+
│ Yamanashi ┆ Measles ┆ 0.0 ┆ JP-19 │
|
|
213
|
+
└────────────┴─────────┴───────┴───────────────┘
|
|
214
|
+
```
|
|
215
|
+
|
|
216
|
+
## Raw Download and Parsing
|
|
217
|
+
|
|
218
|
+
Raw file workflows are available in `jp_idwr_db.io`:
|
|
219
|
+
|
|
220
|
+
- `jp_idwr_db.io.download(...)`
|
|
221
|
+
- `jp_idwr_db.io.download_recent(...)`
|
|
222
|
+
- `jp_idwr_db.io.read(...)`
|
|
223
|
+
|
|
224
|
+
These are useful for refreshing local raw weekly files or debugging parser behavior.
|
|
225
|
+
|
|
226
|
+
## Data Wrangling Examples
|
|
227
|
+
|
|
228
|
+
See [EXAMPLES.md](./docs/EXAMPLES.md) for Polars-first data wrangling recipes (grouping, trends, regional slices, source-aware filtering).
|
|
229
|
+
|
|
230
|
+
Disease-by-disease temporal coverage is documented in [DISEASES.md](./docs/DISEASES.md).
|
|
231
|
+
|
|
232
|
+
## Data Source
|
|
233
|
+
|
|
234
|
+
NIID/JIHS infectious disease surveillance publications:
|
|
235
|
+
|
|
236
|
+
- Historical annual archive files (`Syu_01_1`, `Syu_02_1`)
|
|
237
|
+
- Rapid weekly CSV reports (`zensuXX.csv`, `teitenruiXX.csv`)
|
|
238
|
+
|
|
239
|
+
## Development
|
|
240
|
+
|
|
241
|
+
```bash
|
|
242
|
+
uv sync --all-extras --dev
|
|
243
|
+
uv run ruff check .
|
|
244
|
+
uv run mypy src
|
|
245
|
+
uv run pytest
|
|
246
|
+
```
|
|
247
|
+
|
|
248
|
+
## Security and Integrity
|
|
249
|
+
|
|
250
|
+
- Release assets include a `jp_idwr_db-manifest.json` with SHA256 checksums.
|
|
251
|
+
- `ensure_data()` verifies archive checksum and each extracted parquet checksum before marking cache complete.
|
|
252
|
+
- For PyPI publishing, prefer Trusted Publishing (OIDC) over long-lived API tokens.
|
|
253
|
+
|
|
254
|
+
## License
|
|
255
|
+
|
|
256
|
+
GPL-3.0-or-later. See [LICENSE](./LICENSE).
|
|
{jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/config.py
@@ -25,7 +25,7 @@ class Config:
 
     cache_dir: Path = Path(user_cache_dir("jp_idwr_db"))
     rate_limit_per_minute: int = 20
-    user_agent: str = "jp_idwr_db/0.2.
+    user_agent: str = "jp_idwr_db/0.2.3 (+https://github.com/AlFontal/jp-idwr-db)"
     timeout_seconds: float = 30.0
     retries: int = 3
 
{jp_idwr_db-0.2.2 → jp_idwr_db-0.2.3}/src/jp_idwr_db/data_manager.py
@@ -6,6 +6,7 @@ import hashlib
 import json
 import os
 import shutil
+import sys
 import zipfile
 from importlib.metadata import PackageNotFoundError
 from importlib.metadata import version as package_version
@@ -128,6 +129,15 @@ def ensure_data(version: str | None = None, force: bool = False) -> Path:
     if force and data_dir.exists():
         shutil.rmtree(data_dir)
     data_dir.mkdir(parents=True, exist_ok=True)
+    action = "Refreshing" if force else "Building"
+    print(
+        f"[jp_idwr_db] {action} local data cache for {resolved} at {data_dir}.",
+        file=sys.stderr,
+    )
+    print(
+        "[jp_idwr_db] This happens on first use and may take a moment.",
+        file=sys.stderr,
+    )
 
     archive_path, manifest_path = download_release_assets(resolved, data_dir)
     manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
@@ -159,4 +169,5 @@ def ensure_data(version: str | None = None, force: bool = False) -> Path:
         raise ValueError(f"Missing required datasets in cache: {sorted(missing_expected)}")
 
     marker.write_text("ok\n", encoding="utf-8")
+    print("[jp_idwr_db] Data cache ready.", file=sys.stderr)
     return data_dir
|
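The `data_manager.py` change above writes its progress messages to `sys.stderr` rather than `sys.stdout`. One practical effect is that scripts piping a program's standard output (for example a printed table) are not polluted by status lines. The sketch below demonstrates that separation; `build_cache` and `capture` are hypothetical stand-ins with a `[demo]` prefix and do not reproduce `ensure_data()`.

```python
import contextlib
import io
import sys


def build_cache(force: bool = False) -> str:
    """Hypothetical stand-in for a cache builder; only the logging pattern matters."""
    action = "Refreshing" if force else "Building"
    print(f"[demo] {action} local data cache.", file=sys.stderr)
    # ... download and verification would happen here ...
    print("[demo] Data cache ready.", file=sys.stderr)
    return "cache-path"


def capture(force: bool) -> tuple[str, str]:
    """Run build_cache and return (stdout text, stderr text)."""
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        build_cache(force=force)
    return out.getvalue(), err.getvalue()


stdout_text, stderr_text = capture(force=True)
print(repr(stdout_text))  # stdout stays empty; the messages went to stderr
```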
jp_idwr_db-0.2.2/PKG-INFO
DELETED
|
@@ -1,243 +0,0 @@
|
|
|
1
|
-
Metadata-Version: 2.4
|
|
2
|
-
Name: jp-idwr-db
|
|
3
|
-
Version: 0.2.2
|
|
4
|
-
Summary: Japanese IDWR infectious disease database and analytics toolkit built on Polars.
|
|
5
|
-
Project-URL: Homepage, https://github.com/AlFontal/jp-idwr-db
|
|
6
|
-
Project-URL: Repository, https://github.com/AlFontal/jp-idwr-db
|
|
7
|
-
Project-URL: Bug Tracker, https://github.com/AlFontal/jp-idwr-db/issues
|
|
8
|
-
Author: jp-idwr-db contributors
|
|
9
|
-
License: GPL-3.0-or-later
|
|
10
|
-
License-File: LICENSE
|
|
11
|
-
Keywords: epidemiology,infectious-disease,japan,polars,surveillance
|
|
12
|
-
Classifier: Development Status :: 3 - Alpha
|
|
13
|
-
Classifier: Intended Audience :: Science/Research
|
|
14
|
-
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
|
|
15
|
-
Classifier: Programming Language :: Python :: 3
|
|
16
|
-
Classifier: Programming Language :: Python :: 3.10
|
|
17
|
-
Classifier: Programming Language :: Python :: 3.11
|
|
18
|
-
Classifier: Programming Language :: Python :: 3.12
|
|
19
|
-
Classifier: Programming Language :: Python :: 3.13
|
|
20
|
-
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
|
|
21
|
-
Requires-Python: >=3.10
|
|
22
|
-
Requires-Dist: fastexcel>=0.10
|
|
23
|
-
Requires-Dist: httpx>=0.27
|
|
24
|
-
Requires-Dist: openpyxl>=3.1
|
|
25
|
-
Requires-Dist: platformdirs>=4.2
|
|
26
|
-
Requires-Dist: polars>=0.20
|
|
27
|
-
Requires-Dist: pyarrow>=14.0
|
|
28
|
-
Provides-Extra: dev
|
|
29
|
-
Requires-Dist: mypy>=1.8; extra == 'dev'
|
|
30
|
-
Requires-Dist: pre-commit>=3.7; extra == 'dev'
|
|
31
|
-
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
|
|
32
|
-
Requires-Dist: pytest>=8.0; extra == 'dev'
|
|
33
|
-
Requires-Dist: ruff>=0.4; extra == 'dev'
|
|
34
|
-
Provides-Extra: excel
|
|
35
|
-
Requires-Dist: fastexcel>=0.10; extra == 'excel'
|
|
36
|
-
Description-Content-Type: text/markdown
|
|
37
|
-
|
|
38
|
-
# jp-idwr-db
|
|
39
|
-
|
|
40
|
-
Python access to Japanese infectious disease surveillance data from NIID/JIHS.
|
|
41
|
-
|
|
42
|
-
`jp-idwr-db` provides a Polars-first API for filtering and analysis.
|
|
43
|
-
Parquet datasets are versioned as GitHub Release assets and downloaded to a local cache on first use.
|
|
44
|
-
It is inspired by the R package `jpinfect`, but it is not an API-parity port and includes independently curated ingestion and coverage.
|
|
45
|
-
|
|
46
|
-
## Install
|
|
47
|
-
|
|
48
|
-
```bash
|
|
49
|
-
pip install jp-idwr-db
|
|
50
|
-
```
|
|
51
|
-
|
|
52
|
-
## Data Download Model
|
|
53
|
-
|
|
54
|
-
- Package wheels do not ship the large parquet tables.
|
|
55
|
-
- On first call to `jp.load(...)` (or `jp.get_data(...)`), the package downloads versioned data assets from GitHub Releases.
|
|
56
|
-
- Cache path defaults to:
|
|
57
|
-
- macOS: `~/Library/Caches/jp_idwr_db/data/<version>/`
|
|
58
|
-
- Linux: `~/.cache/jp_idwr_db/data/<version>/`
|
|
59
|
-
- Windows: `%LOCALAPPDATA%\\jp_idwr_db\\Cache\\data\\<version>\\`
|
|
60
|
-
|
|
61
|
-
Prefetch explicitly:
|
|
62
|
-
|
|
63
|
-
```bash
|
|
64
|
-
python -m jp_idwr_db data download
|
|
65
|
-
python -m jp_idwr_db data download --version v0.1.0 --force
|
|
66
|
-
```
|
|
67
|
-
|
|
68
|
-
Environment overrides:
|
|
69
|
-
|
|
70
|
-
- `JPINFECT_DATA_VERSION`: choose a specific release tag (example: `v0.1.0`)
|
|
71
|
-
- `JPINFECT_DATA_BASE_URL`: override asset host base URL
|
|
72
|
-
- `JPINFECT_CACHE_DIR`: override local cache root
|
|
73
|
-
|
|
74
|
-
## Quick Start
|
|
75
|
-
|
|
76
|
-
```python
|
|
77
|
-
import jp_idwr_db as jp
|
|
78
|
-
|
|
79
|
-
# Full unified dataset (recommended)
|
|
80
|
-
df = jp.load("unified")
|
|
81
|
-
print(df.select(["prefecture", "disease", "year", "week", "count", "source"]).head(8))
|
|
82
|
-
```
|
|
83
|
-
|
|
84
|
-
```text
|
|
85
|
-
shape: (8, 6)
|
|
86
|
-
┌────────────┬─────────────────────────────────┬──────┬──────┬───────┬───────────────────────┐
|
|
87
|
-
│ prefecture ┆ disease ┆ year ┆ week ┆ count ┆ source │
|
|
88
|
-
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
|
|
89
|
-
│ str ┆ str ┆ i32 ┆ i32 ┆ f64 ┆ str │
|
|
90
|
-
╞════════════╪═════════════════════════════════╪══════╪══════╪═══════╪═══════════════════════╡
|
|
91
|
-
│ Tochigi ┆ Lyme disease ┆ 2011 ┆ 24 ┆ 0.0 ┆ Confirmed cases │
|
|
92
|
-
│ Kochi ┆ Avian influenza H5N1 ┆ 2008 ┆ 51 ┆ 0.0 ┆ Confirmed cases │
|
|
93
|
-
│ Hokkaido ┆ Dengue fever ┆ 1999 ┆ 28 ┆ 0.0 ┆ Confirmed cases │
|
|
94
|
-
│ Tokyo ┆ Congenital rubella syndrome ┆ 2014 ┆ 41 ┆ 0.0 ┆ Confirmed cases │
|
|
95
|
-
│ Nagasaki ┆ Severe Acute Respiratory Syndr… ┆ 2018 ┆ 4 ┆ 0.0 ┆ Confirmed cases │
|
|
96
|
-
│ Fukushima ┆ Infectious gastroenteritis (on… ┆ 2019 ┆ 25 ┆ 145.0 ┆ Sentinel surveillance │
|
|
97
|
-
│ Nara ┆ Severe invasive streptococcal … ┆ 2003 ┆ 10 ┆ 0.0 ┆ Confirmed cases │
|
|
98
|
-
│ Mie ┆ Plague ┆ 2006 ┆ 37 ┆ 0.0 ┆ Confirmed cases │
|
|
99
|
-
└────────────┴─────────────────────────────────┴──────┴──────┴───────┴───────────────────────┘
|
|
100
|
-
```
|
|
101
|
-
|
|
102
|
-
```python
|
|
103
|
-
import jp_idwr_db as jp
|
|
104
|
-
|
|
105
|
-
# Optional: attach ISO prefecture IDs (JP-01 ... JP-47) only when needed
|
|
106
|
-
df_with_ids = jp.attach_prefecture_id(df, prefecture_col="prefecture", id_col="prefecture_id")
|
|
107
|
-
print(df_with_ids.select(["prefecture", "prefecture_id"]).head())
|
|
108
|
-
```
|
|
109
|
-
|
|
110
|
-
```text
|
|
111
|
-
shape: (5, 2)
|
|
112
|
-
┌────────────┬───────────────┐
|
|
113
|
-
│ prefecture ┆ prefecture_id │
|
|
114
|
-
╞════════════╪═══════════════╡
|
|
115
|
-
│ Tochigi ┆ JP-09 │
|
|
116
|
-
│ Kochi ┆ JP-39 │
|
|
117
|
-
│ Hokkaido ┆ JP-01 │
|
|
118
|
-
│ Tokyo ┆ JP-13 │
|
|
119
|
-
│ Nagasaki ┆ JP-42 │
|
|
120
|
-
└────────────┴───────────────┘
|
|
121
|
-
```
|
|
122
|
-
|
|
123
|
-
## Main API
|
|
124
|
-
|
|
125
|
-
Top-level API exported by `jp_idwr_db`:
|
|
126
|
-
|
|
127
|
-
- `load(name)`
|
|
128
|
-
- `get_data(...)`
|
|
129
|
-
- `list_diseases(source="all")`
|
|
130
|
-
- `list_prefectures()`
|
|
131
|
-
- `get_latest_week()`
|
|
132
|
-
- `prefecture_map()`
|
|
133
|
-
- `attach_prefecture_id(df, prefecture_col="prefecture", id_col="prefecture_id")`
|
|
134
|
-
- `merge(...)`, `pivot(...)`
|
|
135
|
-
- `configure(...)`, `get_config()`
|
|
136
|
-
|
|
137
|
-
### Filtered Access with `get_data`
|
|
138
|
-
|
|
139
|
-
```python
|
|
140
|
-
import jp_idwr_db as jp
|
|
141
|
-
|
|
142
|
-
# Tuberculosis rows for a year range
|
|
143
|
-
tb = jp.get_data(disease="Tuberculosis", year=(2018, 2023))
|
|
144
|
-
print(tb.select(["prefecture", "disease", "year", "week", "count", "source"]).head(8))
|
|
145
|
-
```
|
|
146
|
-
|
|
147
|
-
```text
|
|
148
|
-
shape: (8, 6)
|
|
149
|
-
┌────────────┬──────────────┬──────┬──────┬───────┬─────────────────┐
|
|
150
|
-
│ prefecture ┆ disease ┆ year ┆ week ┆ count ┆ source │
|
|
151
|
-
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
|
|
152
|
-
│ str ┆ str ┆ i32 ┆ i32 ┆ f64 ┆ str │
|
|
153
|
-
╞════════════╪══════════════╪══════╪══════╪═══════╪═════════════════╡
|
|
154
|
-
│ Hokkaido ┆ Tuberculosis ┆ 2020 ┆ 12 ┆ 5.0 ┆ Confirmed cases │
|
|
155
|
-
│ Oita ┆ Tuberculosis ┆ 2023 ┆ 38 ┆ 6.0 ┆ Confirmed cases │
|
|
156
|
-
│ Fukuoka ┆ Tuberculosis ┆ 2021 ┆ 8 ┆ 12.0 ┆ Confirmed cases │
|
|
157
|
-
│ Kagawa ┆ Tuberculosis ┆ 2020 ┆ 19 ┆ 2.0 ┆ Confirmed cases │
|
|
158
|
-
│ Chiba ┆ Tuberculosis ┆ 2020 ┆ 19 ┆ 9.0 ┆ Confirmed cases │
|
|
159
|
-
│ Kanagawa ┆ Tuberculosis ┆ 2022 ┆ 17 ┆ 25.0 ┆ Confirmed cases │
|
|
160
|
-
│ Okinawa ┆ Tuberculosis ┆ 2021 ┆ 11 ┆ 4.0 ┆ Confirmed cases │
|
|
161
|
-
│ Gifu ┆ Tuberculosis ┆ 2018 ┆ 23 ┆ 7.0 ┆ Confirmed cases │
|
|
162
|
-
└────────────┴──────────────┴──────┴──────┴───────┴─────────────────┘
|
|
163
|
-
```
|
|
164
|
-
|
|
165
|
-
```python
|
|
166
|
-
import jp_idwr_db as jp
|
|
167
|
-
|
|
168
|
-
# Sentinel-only diseases from recent years
|
|
169
|
-
sentinel = jp.get_data(source="sentinel", year=(2023, 2026))
|
|
170
|
-
print(sentinel.select(["prefecture", "disease", "year", "week", "count", "source"]).head(8))
|
|
171
|
-
```
|
|
172
|
-
|
|
173
|
-
```text
|
|
174
|
-
shape: (8, 6)
|
|
175
|
-
┌────────────┬─────────────────────────────────┬──────┬──────┬───────┬───────────────────────┐
|
|
176
|
-
│ prefecture ┆ disease ┆ year ┆ week ┆ count ┆ source │
|
|
177
|
-
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
|
|
178
|
-
│ str ┆ str ┆ i32 ┆ i32 ┆ f64 ┆ str │
|
|
179
|
-
╞════════════╪═════════════════════════════════╪══════╪══════╪═══════╪═══════════════════════╡
|
|
180
|
-
│ Ishikawa ┆ Respiratory syncytial virus in… ┆ 2024 ┆ 42 ┆ 813.0 ┆ Sentinel surveillance │
|
|
181
|
-
│ Nara ┆ Erythema infection ┆ 2025 ┆ 31 ┆ 823.0 ┆ Sentinel surveillance │
|
|
182
|
-
│ Saga ┆ Mumps ┆ 2024 ┆ 26 ┆ 14.0 ┆ Sentinel surveillance │
|
|
183
|
-
│ Hyogo ┆ Pharyngoconjunctival fever ┆ 2023 ┆ 19 ┆ 468.0 ┆ Sentinel surveillance │
|
|
184
|
-
│ Miyazaki ┆ Infectious gastroenteritis ┆ 2026 ┆ 3 ┆ 339.0 ┆ Sentinel surveillance │
|
|
185
|
-
│ Kagoshima ┆ Infectious gastroenteritis (on… ┆ 2024 ┆ 9 ┆ null ┆ Sentinel surveillance │
|
|
186
|
-
│ Osaka ┆ Mumps ┆ 2024 ┆ 49 ┆ 404.0 ┆ Sentinel surveillance │
|
|
187
|
-
│ Aomori ┆ Erythema infection ┆ 2024 ┆ 10 ┆ 5.0 ┆ Sentinel surveillance │
|
|
188
|
-
└────────────┴─────────────────────────────────┴──────┴──────┴───────┴───────────────────────┘
|
|
189
|
-
```
|
|
190
|
-
|
|
191
|
-
## Datasets
|
|
192
|
-
|
|
193
|
-
Use `jp.load(...)` with:
|
|
194
|
-
|
|
195
|
-
- `"sex"`: historical sex-disaggregated surveillance
|
|
196
|
-
- `"place"`: historical place-category surveillance
|
|
197
|
-
- `"bullet"`: modern all-case weekly reports (rapid zensu)
|
|
198
|
-
- `"sentinel"`: sentinel weekly reports (teitenrui; 2012+ in release data assets)
|
|
199
|
-
- `"unified"`: deduplicated combined dataset (sex-total + modern bullet/sentinel, recommended)
|
|
200
|
-
|
|
201
|
-
Detailed schema and coverage are documented in [DATASETS.md](./docs/DATASETS.md).
|
|
202
|
-
|
|
203
|
-
## Raw Download and Parsing
|
|
204
|
-
|
|
205
|
-
Raw file workflows are available in `jp_idwr_db.io`:
|
|
206
|
-
|
|
207
|
-
- `jp_idwr_db.io.download(...)`
|
|
208
|
-
- `jp_idwr_db.io.download_recent(...)`
|
|
209
|
-
- `jp_idwr_db.io.read(...)`
|
|
210
|
-
|
|
211
|
-
These are useful for refreshing local raw weekly files or debugging parser behavior.
|
|
212
|
-
|
|
213
|
-
## Data Wrangling Examples
|
|
214
|
-
|
|
215
|
-
See [EXAMPLES.md](./docs/EXAMPLES.md) for Polars-first data wrangling recipes (grouping, trends, regional slices, source-aware filtering).
|
|
216
|
-
|
|
217
|
-
Disease-by-disease temporal coverage is documented in [DISEASES.md](./docs/DISEASES.md).
|
|
218
|
-
|
|
219
|
-
## Data Source
|
|
220
|
-
|
|
221
|
-
NIID/JIHS infectious disease surveillance publications:
|
|
222
|
-
|
|
223
|
-
- Historical annual archive files (`Syu_01_1`, `Syu_02_1`)
|
|
224
|
-
- Rapid weekly CSV reports (`zensuXX.csv`, `teitenruiXX.csv`)
|
|
225
|
-
|
|
226
|
-
## Development
|
|
227
|
-
|
|
228
|
-
```bash
|
|
229
|
-
uv sync --all-extras --dev
|
|
230
|
-
uv run ruff check .
|
|
231
|
-
uv run mypy src
|
|
232
|
-
uv run pytest
|
|
233
|
-
```
|
|
234
|
-
|
|
235
|
-
## Security and Integrity
|
|
236
|
-
|
|
237
|
-
- Release assets include a `jp_idwr_db-manifest.json` with SHA256 checksums.
|
|
238
|
-
- `ensure_data()` verifies archive checksum and each extracted parquet checksum before marking cache complete.
|
|
239
|
-
- For PyPI publishing, prefer Trusted Publishing (OIDC) over long-lived API tokens.
|
|
240
|
-
|
|
241
|
-
## License
|
|
242
|
-
|
|
243
|
-
GPL-3.0-or-later. See [LICENSE](./LICENSE).
|
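The cache locations and environment overrides listed in the removed PKG-INFO description above can be illustrated with a minimal standard-library sketch. The function names `default_cache_root` and `resolve_data_dir` are hypothetical and are not taken from the package. The sketch also assumes that `JPINFECT_CACHE_DIR` replaces only the cache root (so `data/<version>` is still appended) and that `JPINFECT_DATA_VERSION` takes precedence over the default version; neither is confirmed by this diff. The package itself declares a dependency on `platformdirs`, which handles details this sketch ignores, such as `XDG_CACHE_HOME` on Linux.

```python
import os
import sys
from pathlib import Path


def default_cache_root(platform: str, env: dict, home: Path) -> Path:
    """Per-OS cache root, mirroring the three defaults listed in the README."""
    if platform == "darwin":
        return home / "Library" / "Caches" / "jp_idwr_db"
    if platform.startswith("win"):
        # Fall back to the conventional location if LOCALAPPDATA is unset.
        local = env.get("LOCALAPPDATA") or str(home / "AppData" / "Local")
        return Path(local) / "jp_idwr_db" / "Cache"
    # Simplified Linux default; XDG_CACHE_HOME is deliberately not handled.
    return home / ".cache" / "jp_idwr_db"


def resolve_data_dir(default_version, env=None, platform=None, home=None) -> Path:
    """Hypothetical helper: apply the documented environment overrides."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    home = Path.home() if home is None else Path(home)

    # Assumption: the environment variable wins over the default version.
    version = env.get("JPINFECT_DATA_VERSION") or default_version

    # Assumption: the override replaces the root, not the full data path.
    override = env.get("JPINFECT_CACHE_DIR")
    root = Path(override) if override else default_cache_root(platform, env, home)
    return root / "data" / version
```

Passing `env`, `platform`, and `home` explicitly keeps the helper easy to exercise on any machine, for example `resolve_data_dir("v0.1.0", env={}, platform="linux", home="/home/u")`.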
jp_idwr_db-0.2.2/README.md
DELETED
|
@@ -1,206 +0,0 @@
|
|
|
1
|
-
# jp-idwr-db
|
|
2
|
-
|
|
3
|
-
Python access to Japanese infectious disease surveillance data from NIID/JIHS.
|
|
4
|
-
|
|
5
|
-
`jp-idwr-db` provides a Polars-first API for filtering and analysis.
|
|
6
|
-
Parquet datasets are versioned as GitHub Release assets and downloaded to a local cache on first use.
|
|
7
|
-
It is inspired by the R package `jpinfect`, but it is not an API-parity port and includes independently curated ingestion and coverage.
|
|
8
|
-
|
|
9
|
-
## Install
|
|
10
|
-
|
|
11
|
-
```bash
|
|
12
|
-
pip install jp-idwr-db
|
|
13
|
-
```
|
|
14
|
-
|
|
15
|
-
## Data Download Model
|
|
16
|
-
|
|
17
|
-
- Package wheels do not ship the large parquet tables.
|
|
18
|
-
- On first call to `jp.load(...)` (or `jp.get_data(...)`), the package downloads versioned data assets from GitHub Releases.
|
|
19
|
-
- Cache path defaults to:
|
|
20
|
-
- macOS: `~/Library/Caches/jp_idwr_db/data/<version>/`
|
|
21
|
-
- Linux: `~/.cache/jp_idwr_db/data/<version>/`
|
|
22
|
-
- Windows: `%LOCALAPPDATA%\\jp_idwr_db\\Cache\\data\\<version>\\`
|
|
23
|
-
|
|
24
|
-
Prefetch explicitly:
|
|
25
|
-
|
|
26
|
-
```bash
|
|
27
|
-
python -m jp_idwr_db data download
|
|
28
|
-
python -m jp_idwr_db data download --version v0.1.0 --force
|
|
29
|
-
```
|
|
30
|
-
|
|
31
|
-
Environment overrides:
|
|
32
|
-
|
|
33
|
-
- `JPINFECT_DATA_VERSION`: choose a specific release tag (example: `v0.1.0`)
|
|
34
|
-
- `JPINFECT_DATA_BASE_URL`: override asset host base URL
|
|
35
|
-
- `JPINFECT_CACHE_DIR`: override local cache root
|
|
36
|
-
|
|
37
|
-
## Quick Start
|
|
38
|
-
|
|
39
|
-
```python
|
|
40
|
-
import jp_idwr_db as jp
|
|
41
|
-
|
|
42
|
-
# Full unified dataset (recommended)
|
|
43
|
-
df = jp.load("unified")
|
|
44
|
-
print(df.select(["prefecture", "disease", "year", "week", "count", "source"]).head(8))
|
|
45
|
-
```
|
|
46
|
-
|
|
47
|
-
```text
|
|
48
|
-
shape: (8, 6)
|
|
49
|
-
┌────────────┬─────────────────────────────────┬──────┬──────┬───────┬───────────────────────┐
|
|
50
|
-
│ prefecture ┆ disease ┆ year ┆ week ┆ count ┆ source │
|
|
51
|
-
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
|
|
52
|
-
│ str ┆ str ┆ i32 ┆ i32 ┆ f64 ┆ str │
|
|
53
|
-
╞════════════╪═════════════════════════════════╪══════╪══════╪═══════╪═══════════════════════╡
|
|
54
|
-
│ Tochigi ┆ Lyme disease ┆ 2011 ┆ 24 ┆ 0.0 ┆ Confirmed cases │
|
|
55
|
-
│ Kochi ┆ Avian influenza H5N1 ┆ 2008 ┆ 51 ┆ 0.0 ┆ Confirmed cases │
|
|
56
|
-
│ Hokkaido ┆ Dengue fever ┆ 1999 ┆ 28 ┆ 0.0 ┆ Confirmed cases │
|
|
57
|
-
│ Tokyo ┆ Congenital rubella syndrome ┆ 2014 ┆ 41 ┆ 0.0 ┆ Confirmed cases │
|
|
58
|
-
│ Nagasaki ┆ Severe Acute Respiratory Syndr… ┆ 2018 ┆ 4 ┆ 0.0 ┆ Confirmed cases │
|
|
59
|
-
│ Fukushima ┆ Infectious gastroenteritis (on… ┆ 2019 ┆ 25 ┆ 145.0 ┆ Sentinel surveillance │
|
|
60
|
-
│ Nara ┆ Severe invasive streptococcal … ┆ 2003 ┆ 10 ┆ 0.0 ┆ Confirmed cases │
|
|
61
|
-
│ Mie ┆ Plague ┆ 2006 ┆ 37 ┆ 0.0 ┆ Confirmed cases │
|
|
62
|
-
└────────────┴─────────────────────────────────┴──────┴──────┴───────┴───────────────────────┘
|
|
63
|
-
```
|
|
64
|
-
|
|
65
|
-
```python
|
|
66
|
-
import jp_idwr_db as jp
|
|
67
|
-
|
|
68
|
-
# Optional: attach ISO prefecture IDs (JP-01 ... JP-47) only when needed
|
|
69
|
-
df_with_ids = jp.attach_prefecture_id(df, prefecture_col="prefecture", id_col="prefecture_id")
|
|
70
|
-
print(df_with_ids.select(["prefecture", "prefecture_id"]).head())
|
|
71
|
-
```
|
|
72
|
-
|
|
73
|
-
```text
|
|
74
|
-
shape: (5, 2)
|
|
75
|
-
┌────────────┬───────────────┐
|
|
76
|
-
│ prefecture ┆ prefecture_id │
|
|
77
|
-
╞════════════╪═══════════════╡
|
|
78
|
-
│ Tochigi ┆ JP-09 │
|
|
79
|
-
│ Kochi ┆ JP-39 │
|
|
80
|
-
│ Hokkaido ┆ JP-01 │
|
|
81
|
-
│ Tokyo ┆ JP-13 │
|
|
82
|
-
│ Nagasaki ┆ JP-42 │
|
|
83
|
-
└────────────┴───────────────┘
|
|
84
|
-
```
|
|
85
|
-
|
|
86
|
-
## Main API
|
|
87
|
-
|
|
88
|
-
Top-level API exported by `jp_idwr_db`:
|
|
89
|
-
|
|
90
|
-
- `load(name)`
|
|
91
|
-
- `get_data(...)`
|
|
92
|
-
- `list_diseases(source="all")`
|
|
93
|
-
- `list_prefectures()`
|
|
94
|
-
- `get_latest_week()`
|
|
95
|
-
- `prefecture_map()`
|
|
96
|
-
- `attach_prefecture_id(df, prefecture_col="prefecture", id_col="prefecture_id")`
|
|
97
|
-
- `merge(...)`, `pivot(...)`
|
|
98
|
-
- `configure(...)`, `get_config()`
|
|
99
|
-
|
|
100
|
-
### Filtered Access with `get_data`
|
|
101
|
-
|
|
102
|
-
```python
|
|
103
|
-
import jp_idwr_db as jp
|
|
104
|
-
|
|
105
|
-
# Tuberculosis rows for a year range
|
|
106
|
-
tb = jp.get_data(disease="Tuberculosis", year=(2018, 2023))
|
|
107
|
-
print(tb.select(["prefecture", "disease", "year", "week", "count", "source"]).head(8))
|
|
108
|
-
```
|
|
109
|
-
|
|
110
|
-
```text
|
|
111
|
-
shape: (8, 6)
|
|
112
|
-
┌────────────┬──────────────┬──────┬──────┬───────┬─────────────────┐
|
|
113
|
-
│ prefecture ┆ disease ┆ year ┆ week ┆ count ┆ source │
|
|
114
|
-
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
|
|
115
|
-
│ str ┆ str ┆ i32 ┆ i32 ┆ f64 ┆ str │
|
|
116
|
-
╞════════════╪══════════════╪══════╪══════╪═══════╪═════════════════╡
|
|
117
|
-
│ Hokkaido ┆ Tuberculosis ┆ 2020 ┆ 12 ┆ 5.0 ┆ Confirmed cases │
|
|
118
|
-
│ Oita ┆ Tuberculosis ┆ 2023 ┆ 38 ┆ 6.0 ┆ Confirmed cases │
|
|
119
|
-
│ Fukuoka ┆ Tuberculosis ┆ 2021 ┆ 8 ┆ 12.0 ┆ Confirmed cases │
|
|
120
|
-
│ Kagawa ┆ Tuberculosis ┆ 2020 ┆ 19 ┆ 2.0 ┆ Confirmed cases │
|
|
121
|
-
│ Chiba ┆ Tuberculosis ┆ 2020 ┆ 19 ┆ 9.0 ┆ Confirmed cases │
|
|
122
|
-
│ Kanagawa ┆ Tuberculosis ┆ 2022 ┆ 17 ┆ 25.0 ┆ Confirmed cases │
|
|
123
|
-
│ Okinawa ┆ Tuberculosis ┆ 2021 ┆ 11 ┆ 4.0 ┆ Confirmed cases │
|
|
124
|
-
│ Gifu ┆ Tuberculosis ┆ 2018 ┆ 23 ┆ 7.0 ┆ Confirmed cases │
|
|
125
|
-
└────────────┴──────────────┴──────┴──────┴───────┴─────────────────┘
|
|
126
|
-
```
|
|
127
|
-
|
|
128
|
-
```python
|
|
129
|
-
import jp_idwr_db as jp
|
|
130
|
-
|
|
131
|
-
# Sentinel-only diseases from recent years
|
|
132
|
-
sentinel = jp.get_data(source="sentinel", year=(2023, 2026))
|
|
133
|
-
print(sentinel.select(["prefecture", "disease", "year", "week", "count", "source"]).head(8))
|
|
134
|
-
```
|
|
135
|
-
|
|
136
|
-
```text
|
|
137
|
-
shape: (8, 6)
|
|
138
|
-
┌────────────┬─────────────────────────────────┬──────┬──────┬───────┬───────────────────────┐
|
|
139
|
-
│ prefecture ┆ disease ┆ year ┆ week ┆ count ┆ source │
|
|
140
|
-
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
|
|
141
|
-
│ str ┆ str ┆ i32 ┆ i32 ┆ f64 ┆ str │
|
|
142
|
-
╞════════════╪═════════════════════════════════╪══════╪══════╪═══════╪═══════════════════════╡
|
|
143
|
-
│ Ishikawa ┆ Respiratory syncytial virus in… ┆ 2024 ┆ 42 ┆ 813.0 ┆ Sentinel surveillance │
|
|
144
|
-
│ Nara ┆ Erythema infection ┆ 2025 ┆ 31 ┆ 823.0 ┆ Sentinel surveillance │
|
|
145
|
-
│ Saga ┆ Mumps ┆ 2024 ┆ 26 ┆ 14.0 ┆ Sentinel surveillance │
|
|
146
|
-
│ Hyogo ┆ Pharyngoconjunctival fever ┆ 2023 ┆ 19 ┆ 468.0 ┆ Sentinel surveillance │
|
|
147
|
-
│ Miyazaki ┆ Infectious gastroenteritis ┆ 2026 ┆ 3 ┆ 339.0 ┆ Sentinel surveillance │
|
|
148
|
-
│ Kagoshima ┆ Infectious gastroenteritis (on… ┆ 2024 ┆ 9 ┆ null ┆ Sentinel surveillance │
|
|
149
|
-
│ Osaka ┆ Mumps ┆ 2024 ┆ 49 ┆ 404.0 ┆ Sentinel surveillance │
|
|
150
|
-
│ Aomori ┆ Erythema infection ┆ 2024 ┆ 10 ┆ 5.0 ┆ Sentinel surveillance │
|
|
151
|
-
└────────────┴─────────────────────────────────┴──────┴──────┴───────┴───────────────────────┘
|
|
152
|
-
```
|
|
153
|
-
|
|
154
|
-
## Datasets
|
|
155
|
-
|
|
156
|
-
Use `jp.load(...)` with:
|
|
157
|
-
|
|
158
|
-
- `"sex"`: historical sex-disaggregated surveillance
|
|
159
|
-
- `"place"`: historical place-category surveillance
|
|
160
|
-
- `"bullet"`: modern all-case weekly reports (rapid zensu)
|
|
161
|
-
- `"sentinel"`: sentinel weekly reports (teitenrui; 2012+ in release data assets)
|
|
162
|
-
- `"unified"`: deduplicated combined dataset (sex-total + modern bullet/sentinel, recommended)
|
|
163
|
-
|
|
164
|
-
Detailed schema and coverage are documented in [DATASETS.md](./docs/DATASETS.md).
|
|
165
|
-
|
|
166
|
-
## Raw Download and Parsing
|
|
167
|
-
|
|
168
|
-
Raw file workflows are available in `jp_idwr_db.io`:
|
|
169
|
-
|
|
170
|
-
- `jp_idwr_db.io.download(...)`
|
|
171
|
-
- `jp_idwr_db.io.download_recent(...)`
|
|
172
|
-
- `jp_idwr_db.io.read(...)`
|
|
173
|
-
|
|
174
|
-
These are useful for refreshing local raw weekly files or debugging parser behavior.
|
|
175
|
-
|
|
176
|
-
## Data Wrangling Examples
|
|
177
|
-
|
|
178
|
-
See [EXAMPLES.md](./docs/EXAMPLES.md) for Polars-first data wrangling recipes (grouping, trends, regional slices, source-aware filtering).
|
|
179
|
-
|
|
180
|
-
Disease-by-disease temporal coverage is documented in [DISEASES.md](./docs/DISEASES.md).
|
|
181
|
-
|
|
182
|
-
## Data Source
|
|
183
|
-
|
|
184
|
-
NIID/JIHS infectious disease surveillance publications:
|
|
185
|
-
|
|
186
|
-
- Historical annual archive files (`Syu_01_1`, `Syu_02_1`)
|
|
187
|
-
- Rapid weekly CSV reports (`zensuXX.csv`, `teitenruiXX.csv`)
|
|
188
|
-
|
|
189
|
-
## Development
|
|
190
|
-
|
|
191
|
-
```bash
|
|
192
|
-
uv sync --all-extras --dev
|
|
193
|
-
uv run ruff check .
|
|
194
|
-
uv run mypy src
|
|
195
|
-
uv run pytest
|
|
196
|
-
```
|
|
197
|
-
|
|
198
|
-
## Security and Integrity
|
|
199
|
-
|
|
200
|
-
- Release assets include a `jp_idwr_db-manifest.json` with SHA256 checksums.
|
|
201
|
-
- `ensure_data()` verifies archive checksum and each extracted parquet checksum before marking cache complete.
|
|
202
|
-
- For PyPI publishing, prefer Trusted Publishing (OIDC) over long-lived API tokens.
|
|
203
|
-
|
|
204
|
-
## License
|
|
205
|
-
|
|
206
|
-
GPL-3.0-or-later. See [LICENSE](./LICENSE).
|
|
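The "Security and Integrity" notes in the removed README above say that checksums are verified against a manifest before the cache is marked complete. A minimal sketch of that pattern follows. The manifest layout (`{"files": {name: sha256_hex}}`), the marker file name `.complete`, and the function names are assumptions made for illustration; the real `jp_idwr_db-manifest.json` schema and the internals of `ensure_data()` are not shown in this part of the diff.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA256 so large parquet files are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_mark(data_dir: Path, manifest_path: Path) -> Path:
    """Hypothetical helper: check every listed file, then write a marker.

    Assumes a manifest shaped like {"files": {"name.parquet": "<sha256 hex>"}}.
    """
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    for name, expected in manifest["files"].items():
        actual = sha256_of(data_dir / name)
        if actual != expected:
            raise ValueError(f"checksum mismatch for {name}")

    # The marker is written only after every checksum has matched, so an
    # interrupted or corrupted download is never treated as a ready cache.
    marker = data_dir / ".complete"
    marker.write_text("ok\n", encoding="utf-8")
    return marker
```

Writing the marker last is the key design choice: a later run can treat the presence of the marker as proof that verification finished, and re-download otherwise.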
File without changes