radreport 0.4.0__tar.gz

@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Mustafa Merchant
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,459 @@
1
+ Metadata-Version: 2.4
2
+ Name: radreport
3
+ Version: 0.4.0
4
+ Summary: Parse, de-identify, structure, and export radiology free-text reports to FHIR
5
+ Author-email: Mustafa Merchant <mustafamerchant072@gmail.com>
6
+ License-Expression: MIT
7
+ Project-URL: Homepage, https://github.com/mustafamm072/radreport
8
+ Project-URL: Documentation, https://github.com/mustafamm072/radreport#readme
9
+ Project-URL: Issues, https://github.com/mustafamm072/radreport/issues
10
+ Keywords: radiology,DICOM,FHIR,medical imaging,healthcare,NLP,de-identification,PHI,HIPAA
11
+ Classifier: Development Status :: 3 - Alpha
12
+ Classifier: Intended Audience :: Developers
13
+ Classifier: Intended Audience :: Healthcare Industry
14
+ Classifier: Programming Language :: Python :: 3
15
+ Classifier: Programming Language :: Python :: 3.9
16
+ Classifier: Programming Language :: Python :: 3.10
17
+ Classifier: Programming Language :: Python :: 3.11
18
+ Classifier: Programming Language :: Python :: 3.12
19
+ Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
20
+ Requires-Python: >=3.9
21
+ Description-Content-Type: text/markdown
22
+ License-File: LICENSE
23
+ Provides-Extra: dev
24
+ Requires-Dist: pytest>=7.0; extra == "dev"
25
+ Requires-Dist: pytest-cov; extra == "dev"
26
+ Dynamic: license-file
27
+
28
+ # radreport
29
+
30
+ **Parse radiology free-text reports into structured data. No ML. No GPU. No dependencies.**
31
+
32
+ [![PyPI version](https://badge.fury.io/py/radreport.svg)](https://badge.fury.io/py/radreport)
33
+ [![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/)
34
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
35
+
36
+ Radiology reports come out as free-text PDFs. Downstream systems — EMRs, telehealth portals, billing platforms, research pipelines — need structured data. This library bridges that gap.
37
+
38
+ Four things it does well:
39
+
40
+ 1. **Parse** — splits any free-text report into labeled sections, extracts measurements, links findings to anatomy
41
+ 2. **Detect** — flags critical/urgent findings with negation awareness (no false alerts for "no pneumothorax")
42
+ 3. **De-identify** — redacts PHI (dates, MRNs, names, contact info…) with a full audit trail, so reports can leave a controlled environment for research
43
+ 4. **Export** — outputs FHIR R4 DiagnosticReport resources ready for any EMR
44
+
45
+ ---
46
+
47
+ ## Install
48
+
49
+ ```bash
50
+ pip install radreport
51
+ ```
52
+
53
+ Zero required dependencies. Works on Python 3.9+.
54
+
55
+ ---
56
+
57
+ ## Quick Start
58
+
59
+ ```python
60
+ from radreport import ReportParser, CriticalFindingsDetector, FHIRExporter
61
+ import json
62
+
63
+ report_text = """
64
+ INDICATION: Chest pain, rule out PE.
65
+
66
+ FINDINGS:
67
+ Lungs: Filling defect in the right main pulmonary artery consistent with
68
+ pulmonary embolism. No pneumothorax.
69
+
70
+ IMPRESSION:
71
+ Pulmonary embolism, right main pulmonary artery. Urgent correlation recommended.
72
+ """
73
+
74
+ # 1. Parse
75
+ parser = ReportParser()
76
+ report = parser.parse(report_text, modality="CT")
77
+
78
+ print(report.impression)
79
+ # → "Pulmonary embolism, right main pulmonary artery. Urgent correlation recommended."
80
+
81
+ # 2. Detect critical findings
82
+ detector = CriticalFindingsDetector()
83
+ report = detector.detect(report)
84
+
85
+ for cf in report.critical_findings:
86
+     if not cf.negated:
87
+         print(f"[{cf.severity.upper()}] {cf.term} ({cf.category})")
88
+         print(f" Context: {cf.context}")
89
+ # → [CRITICAL] pulmonary embolism (pulmonary)
90
+ # Context: Filling defect in the right main pulmonary artery consistent with pulmonary embolism.
91
+
92
+ # 3. Export to FHIR
93
+ exporter = FHIRExporter()
94
+ fhir = exporter.export(report, patient_id="pt-001")
95
+ print(json.dumps(fhir, indent=2))
96
+ ```
97
+
98
+ ---
99
+
100
+ ## CLI
101
+
102
+ After installation, the `radreport` command is available for single-file and batch processing:
103
+
104
+ ```bash
105
+ # Parse a single report to JSON
106
+ radreport report.txt
107
+
108
+ # Parse with critical findings detection
109
+ radreport report.txt --critical
110
+
111
+ # Export as FHIR DiagnosticReport
112
+ radreport report.txt --fhir --patient-id pt-001 --modality CT
113
+
114
+ # Extract follow-up recommendations too
115
+ radreport report.txt --critical --recommend
116
+
117
+ # Redact PHI before parsing (safe to store/share the output)
118
+ radreport report.txt --deidentify --critical
119
+
120
+ # Batch process multiple files → JSON array
121
+ radreport reports/*.txt --critical -o batch.json
122
+
123
+ # Specify modality for all files
124
+ radreport *.txt --modality MRI --fhir -o fhir_batch.json
125
+
126
+ # Flat CSV for research/analytics (one row per report)
127
+ radreport reports/*.txt --critical --recommend --format csv -o cohort.csv
128
+ ```
129
+
130
+ **Flags:**
131
+
132
+ | Flag | Short | Description |
133
+ |------|-------|-------------|
134
+ | `--modality MOD` | `-m` | CT, MRI, XR, US, NM, PET … |
135
+ | `--critical` | `-c` | Run critical findings detection |
136
+ | `--recommend` | `-r` | Extract follow-up imaging recommendations |
137
+ | `--deidentify` | `-d` | Redact PHI (dates, MRN, names, phone…) before parsing |
138
+ | `--fhir` | `-f` | Export as FHIR R4 DiagnosticReport (implies --critical) |
139
+ | `--patient-id ID` | | FHIR Patient resource ID (used with `--fhir`) |
140
+ | `--format FMT` | `--fmt` | Output format: `json` (default) or `csv` (not compatible with `--fhir`) |
141
+ | `--output FILE` | `-o` | Write output to file instead of stdout |
142
+
143
+ ---
144
+
145
+ ## Parsing
146
+
147
+ ### Sections
148
+
149
+ The parser recognizes standard radiology report sections regardless of formatting style:
150
+
151
+ | Section key | Matched headers |
152
+ |------------------|-----------------|
153
+ | `indication` | Indication, Clinical Indication, History, Reason for Exam |
154
+ | `technique` | Technique, Procedure, Protocol |
155
+ | `comparison` | Comparison, Prior Study, Previous |
156
+ | `findings` | Findings, Observations |
157
+ | `impression` | Impression, Conclusion, Assessment, Diagnosis |
158
+ | `recommendation` | Recommendation, Follow-up, Advised |
159
+
160
+ ```python
161
+ report = parser.parse(text, modality="MRI")
162
+
163
+ findings = report.get_section("findings")
164
+ print(findings.raw_text)
165
+
166
+ impression = report.get_section("impression")
167
+ print(impression.raw_text)
168
+ ```
169
+
170
+ ### Measurements
171
+
172
+ All measurements are extracted and normalized to millimeters:
173
+
174
+ ```python
175
+ for m in report.all_measurements:
176
+     print(f" Raw: {m.raw}")
177
+     print(f" Normalized (mm): {m.dimensions_mm}")
178
+     print(f" Largest dimension: {m.largest_dimension_mm} mm")
179
+
180
+ # Raw: 2.3 x 1.8 cm
181
+ # Normalized (mm): [23.0, 18.0]
182
+ # Largest dimension: 23.0 mm
183
+ ```
184
+
185
+ Handles: `1.2 x 0.8 cm`, `12mm`, `1.2cm`, `12 x 8 x 5 mm`, `1.2 x 0.8 x 0.5 cm`
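
The normalization described above can be sketched with a single regular expression. This is a minimal illustration, not radreport's implementation: the helper name `extract_dimensions_mm` and the pattern are assumptions, and only `mm` and `cm` units with up to three dimensions are handled.

```python
import re

# Hypothetical helper for illustration; not part of the radreport API.
_UNIT_TO_MM = {"mm": 1.0, "cm": 10.0}
_NUM = r"\d+(?:\.\d+)?"
# One to three numbers separated by "x", followed by a single unit.
_MEASUREMENT = re.compile(
    rf"({_NUM}(?:\s*x\s*{_NUM}){{0,2}})\s*(mm|cm)\b",
    re.IGNORECASE,
)


def extract_dimensions_mm(text):
    """Return one list of millimeter values per measurement found in text."""
    results = []
    for match in _MEASUREMENT.finditer(text):
        factor = _UNIT_TO_MM[match.group(2).lower()]
        values = re.findall(_NUM, match.group(1))
        # Round to avoid floating-point noise after unit conversion.
        results.append([round(float(v) * factor, 3) for v in values])
    return results


print(extract_dimensions_mm("Nodule measuring 2.3 x 1.8 cm."))
```

The unit is applied to every dimension in the group, which matches how radiology reports usually write multi-axis sizes with one trailing unit.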
186
+
187
+ ### Findings by anatomy
188
+
189
+ ```python
190
+ findings_section = report.get_section("findings")
191
+ for finding in findings_section.findings:
192
+     print(f"Anatomy: {finding.anatomy or 'unspecified'}")
193
+     print(f"Text: {finding.text}")
194
+ ```
195
+
196
+ ### Batch processing
197
+
198
+ ```python
199
+ reports = parser.parse_batch(list_of_texts, modality="CT")
200
+ # Returns list[ParsedReport | None] — None for empty/unparseable inputs
201
+ active = [r for r in reports if r is not None]
202
+ ```
203
+
204
+ ### JSON serialization
205
+
206
+ ```python
207
+ report = parser.parse(text, modality="CT")
208
+
209
+ # As dict
210
+ d = report.to_dict()
211
+
212
+ # As JSON string (shorthand)
213
+ json_str = report.to_json()
214
+ json_str = report.to_json(indent=4)
215
+ ```
216
+
217
+ ---
218
+
219
+ ## Critical Findings Detection
220
+
221
+ Rule-based. Fully auditable. No black boxes.
222
+
223
+ Covers 45+ terms across 7 categories:
224
+
225
+ | Category | Examples |
226
+ |-------------|----------|
227
+ | `vascular` | aortic dissection, DVT, aortic aneurysm |
228
+ | `pulmonary` | pulmonary embolism, PE, pneumothorax, hemothorax |
229
+ | `neuro` | subdural hematoma, midline shift, intracranial hemorrhage |
230
+ | `abdominal` | free air, bowel perforation, appendicitis |
231
+ | `cardiac` | cardiac tamponade, pericardial effusion |
232
+ | `spinal` | cord compression, cervical fracture |
233
+ | `oncologic` | malignancy, metastasis, carcinoma |
234
+
235
+ ### Negation awareness
236
+
237
+ ```python
238
+ # "No pneumothorax identified" → negated=True, won't trigger alert
239
+ # "Pneumothorax present" → negated=False, triggers alert
240
+
241
+ active = [cf for cf in report.critical_findings if not cf.negated]
242
+ ```
243
+
244
+ Negation is **scoped to the sentence** and **fails safe**:
245
+
246
+ - A negation in one sentence never carries into the next — *"No acute hemorrhage.
247
+ Large subdural hematoma is present."* flags the hematoma as active.
248
+ - When a term appears more than once, an active (non-negated) mention always wins
249
+ over a negated one — *"No pneumothorax at the apex. Large pneumothorax at the
250
+ base."* flags pneumothorax as active. A term is reported as negated only when
251
+ **every** mention is negated. This prevents a real critical finding from being
252
+ silently suppressed by an earlier "no ..." phrase.
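
The two rules above (sentence scoping, and an active mention always winning) can be sketched as follows. This is an assumption-laden illustration, not the library's code: the cue list, `split_sentences`, and `is_negated` are hypothetical names, and the sentence splitter is naive enough that abbreviations such as "Dr." would split incorrectly.

```python
import re

# Illustrative cue list (an assumption); a real rule set would be larger.
NEGATION_CUES = ("no", "without", "negative for")


def split_sentences(text):
    """Split on sentence-ending punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def is_negated(term, text):
    """True only if every mention of term is negated; None if term is absent."""
    term_re = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    cue_re = re.compile(
        r"\b(?:" + "|".join(re.escape(c) for c in NEGATION_CUES) + r")\b",
        re.IGNORECASE,
    )
    verdicts = []
    for sentence in split_sentences(text):
        for mention in term_re.finditer(sentence):
            # A cue counts only if it precedes the mention in the same sentence.
            verdicts.append(bool(cue_re.search(sentence, 0, mention.start())))
    if not verdicts:
        return None
    # Fail safe: a single active mention makes the whole term active.
    return all(verdicts)


print(is_negated("pneumothorax", "No pneumothorax at the apex. Large pneumothorax at the base."))
```

Using `all()` over per-mention verdicts is what makes the check fail safe: suppression needs unanimous negation, while one active mention is enough to raise the finding.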
253
+
254
+ ### Severity levels
255
+
256
+ - `critical` — requires immediate action (PE, subdural hematoma, pneumothorax)
257
+ - `urgent` — requires same-day follow-up (DVT, bowel obstruction, appendicitis)
258
+ - `significant` — requires follow-up (malignancy, metastasis)
259
+
260
+ ### Extending the term list
261
+
262
+ ```python
263
+ from radreport.critical_findings import CRITICAL_TERMS
264
+
265
+ CRITICAL_TERMS["tension pneumothorax"] = ("pulmonary", "critical")
266
+ CRITICAL_TERMS["septic emboli"] = ("vascular", "urgent")
267
+ ```
268
+
269
+ ---
270
+
271
+ ## Follow-up Recommendations
272
+
273
+ Extract structured follow-up imaging recommendations from the recommendation and
274
+ impression sections — interval, modality, and urgency.
275
+
276
+ ```python
277
+ from radreport import ReportParser, RecommendationExtractor
278
+
279
+ report = ReportParser().parse(text, modality="CT")
280
+ report = RecommendationExtractor().extract(report)
281
+
282
+ for rec in report.recommendations:
283
+     print(rec.interval, rec.modality, rec.urgency)
284
+
285
+ # "Recommend follow-up CT in 6 months."
286
+ # → interval="6 months", modality="CT", urgency="routine"
287
+ ```
288
+
289
+ Negation-aware: *"No follow-up imaging indicated"* yields no recommendation.
290
+ Identical recommendations are deduplicated.
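
A rough sketch of how interval and modality extraction with negation and deduplication might work is shown below. Everything here is hypothetical (the patterns, the `extract_recommendations` name, and the dict output shape); urgency classification is left out because the rules behind it are not described in this README.

```python
import re

# Hypothetical patterns for illustration; not radreport's rule set.
_FOLLOW_UP = re.compile(r"\b(?:follow[- ]?up|recommend(?:ed)?|repeat)\b", re.IGNORECASE)
_NEGATED = re.compile(r"\bno\s+(?:further\s+)?(?:follow[- ]?up|imaging)\b", re.IGNORECASE)
_INTERVAL = re.compile(
    r"\bin\s+(\d+(?:\s*-\s*\d+)?\s+(?:day|week|month|year)s?)\b", re.IGNORECASE
)
_MODALITY = re.compile(r"\b(CT|MRI|PET|[Uu]ltrasound)\b")


def extract_recommendations(text):
    """Return deduplicated {'interval', 'modality'} dicts, one per recommendation."""
    seen, results = set(), []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        # Skip sentences with no follow-up cue, or with a negated one.
        if not _FOLLOW_UP.search(sentence) or _NEGATED.search(sentence):
            continue
        interval = _INTERVAL.search(sentence)
        modality = _MODALITY.search(sentence)
        key = (
            interval.group(1) if interval else None,
            modality.group(1) if modality else None,
        )
        if key not in seen:  # identical recommendations are reported once
            seen.add(key)
            results.append({"interval": key[0], "modality": key[1]})
    return results


print(extract_recommendations("Recommend follow-up CT in 6 months."))
```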
291
+
292
+ ---
293
+
294
+ ## De-identification
295
+
296
+ Strip Protected Health Information (PHI) from a report before it leaves a
297
+ controlled environment — for research collaboration, analytics warehouses, or
298
+ off-site processing. Like everything else in this library, it is **rule-based
299
+ and fully auditable**: every removal is a traceable regular-expression match,
300
+ recorded with its original offset. No ML, no cloud NER service — the kind of
301
+ thing hospital IT will actually approve.
302
+
303
+ ```python
304
+ from radreport import Deidentifier
305
+
306
+ deid = Deidentifier()
307
+ result = deid.deidentify(raw_report_text)
308
+
309
+ print(result.text) # scrubbed report, safe to share
310
+ print(result.redaction_count) # e.g. 11
311
+ print(result.category_counts()) # {"date": 3, "mrn": 1, "name": 2, ...}
312
+
313
+ # Audit trail — every removed span, keyed to the original text
314
+ for r in result.redactions:
315
+     print(r.category, r.original, "→", r.replacement, f"@{r.start}:{r.end}")
316
+ ```
317
+
318
+ ### What it detects
319
+
320
+ Categories map to the HIPAA Safe Harbor identifiers that are reliably matchable
321
+ from text alone:
322
+
323
+ | Category | Examples |
324
+ |----------|----------|
325
+ | `date` | `03/10/2024`, `2024-03-10`, `March 5, 2024` |
326
+ | `age` | ages **90+** (`94-year-old`) — HIPAA requires aggregating these |
327
+ | `ssn` | `123-45-6789` |
328
+ | `mrn` | `MRN: 12345678`, `Medical Record Number 12345678` |
329
+ | `accession` | `Accession: A98765432` |
330
+ | `phone` | `(555) 123-4567`, `555-123-4567` |
331
+ | `email` | `jdoe@example.com` |
332
+ | `url` | `http://pacs.hospital.org/...` |
333
+ | `ipv4` | `10.0.0.1` |
334
+ | `zip` | `DC 20500` (ZIP after a state code) |
335
+ | `name` | titled names (`Dr. Jane Smith`) and header fields (`Patient Name: …`) |
336
+
337
+ Clinical content is preserved: a `6 mm` nodule or a `90 mm` mass is never
338
+ mistaken for PHI, because age matching requires an explicit age cue.
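
To make the audit-trail idea concrete, here is a minimal sketch of regex redaction that records each removed span with its offset in the original text. It is not the `Deidentifier` implementation: the `Redaction` dataclass, the `redact` function, the three patterns, and the `[CATEGORY]` placeholder format are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Redaction:
    category: str
    original: str
    replacement: str
    start: int  # offset into the ORIGINAL text
    end: int


# Deliberately small, hypothetical pattern set; real coverage needs far more.
PATTERNS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b|\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def redact(text):
    """Return (scrubbed_text, redactions) with offsets keyed to the input."""
    found = []
    for category, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            placeholder = f"[{category.upper()}]"
            found.append(Redaction(category, m.group(), placeholder, m.start(), m.end()))
    found.sort(key=lambda r: r.start)

    kept, pieces, cursor = [], [], 0
    for r in found:
        if r.start < cursor:
            continue  # overlaps an earlier match; keep the earlier one
        pieces.append(text[cursor:r.start])
        pieces.append(r.replacement)
        kept.append(r)
        cursor = r.end
    pieces.append(text[cursor:])
    return "".join(pieces), kept


scrubbed, log = redact("Seen 2024-03-10, SSN 123-45-6789, contact jdoe@example.com.")
print(scrubbed)
```

Collecting all matches first and rebuilding the string in one pass keeps every recorded offset valid against the original text, which is what lets a reviewer check each redaction after the fact.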
339
+
340
+ ### Configuration
341
+
342
+ ```python
343
+ # Redact only dates and names, and use a custom placeholder for names
344
+ deid = Deidentifier(
345
+     categories=["date", "name"],
346
+     placeholders={"name": "XXXX"},
347
+ )
348
+ ```
349
+
350
+ > **Scope and limitations.** Rule-based de-identification is a strong first pass,
351
+ > not a compliance guarantee. Names that appear in free narrative **without** a
352
+ > title or header label are not caught. Always review the output before PHI
353
+ > leaves a controlled environment. This tool does not certify HIPAA Safe Harbor
354
+ > compliance.
355
+
356
+ ---
357
+
358
+ ## FHIR Export
359
+
360
+ Outputs a valid FHIR R4 `DiagnosticReport` resource.
361
+
362
+ ```python
363
+ from datetime import datetime, timezone
363
+
364
+ fhir = exporter.export(
365
+     report,
366
+     patient_id="pt-001",                    # Optional: links to FHIR Patient resource
367
+     report_id="rpt-20240315",               # Optional: custom resource ID
368
+     issued_dt=datetime.now(timezone.utc),   # Optional: defaults to UTC now
370
+ )
371
+ ```
372
+
373
+ ### What's included
374
+
375
+ - `resourceType`: `DiagnosticReport`
376
+ - `status`: `final`
377
+ - `code`: LOINC code matched to modality (CT, MRI, US, etc.)
378
+ - `conclusion`: impression text
379
+ - `presentedForm`: full report text as base64 attachment
380
+ - `contained`: FHIR Observations for each active (non-negated) critical finding
381
+ - `extension`: structured sections for downstream parsing
382
+ - `subject`: patient reference (when `patient_id` provided)
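
For orientation, a hand-built resource carrying a subset of the fields listed above might look like the sketch below. It is not the `FHIRExporter` output: `build_diagnostic_report` is a hypothetical helper, the `contained` Observations and `extension` sections are omitted, and the LOINC code is supplied by the caller rather than matched to a modality.

```python
import base64
import json
from datetime import datetime, timezone


def build_diagnostic_report(report_text, impression, code, display,
                            patient_id=None, issued=None):
    """Assemble a minimal FHIR R4 DiagnosticReport as a plain dict."""
    issued = issued or datetime.now(timezone.utc)
    resource = {
        "resourceType": "DiagnosticReport",
        "status": "final",
        "code": {
            "coding": [{"system": "http://loinc.org", "code": code, "display": display}]
        },
        "issued": issued.isoformat(),  # timezone-aware timestamp
        "conclusion": impression,
        "presentedForm": [{
            "contentType": "text/plain",
            # Attachment.data must be base64-encoded.
            "data": base64.b64encode(report_text.encode("utf-8")).decode("ascii"),
        }],
    }
    if patient_id is not None:
        resource["subject"] = {"reference": f"Patient/{patient_id}"}
    return resource


# 18748-4 is the generic LOINC code for "Diagnostic imaging study".
example = build_diagnostic_report(
    "IMPRESSION: No acute findings.", "No acute findings.",
    "18748-4", "Diagnostic imaging study", patient_id="pt-001",
)
print(json.dumps(example, indent=2))
```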
383
+
384
+ ---
385
+
386
+ ## Full Pipeline Example
387
+
388
+ ```python
389
+ import json
390
+ from radreport import ReportParser, CriticalFindingsDetector, FHIRExporter
391
+
392
+ parser = ReportParser()
393
+ detector = CriticalFindingsDetector()
394
+ exporter = FHIRExporter()
395
+
396
+ def process_report(text: str, modality: str, patient_id: str) -> dict:
397
+     report = parser.parse(text, modality=modality)
398
+     report = detector.detect(report)
399
+
400
+     active_criticals = [cf for cf in report.critical_findings if not cf.negated]
401
+     if active_criticals:
402
+         print(f"WARNING: {len(active_criticals)} critical finding(s) detected")
403
+
404
+     return exporter.export(report, patient_id=patient_id)
405
+
406
+ fhir_json = process_report(report_text, modality="CT", patient_id="pt-001")
407
+ print(json.dumps(fhir_json, indent=2))
408
+ ```
409
+
410
+ See [examples/full_pipeline.py](examples/full_pipeline.py) for a runnable end-to-end example.
411
+
412
+ ---
413
+
414
+ ## Design Principles
415
+
416
+ **No dependencies.** The library installs with no third-party packages. This matters in hospital environments where every dependency goes through security review.
417
+
418
+ **Rule-based, not ML-based.** Every decision the library makes is traceable to a specific rule. No model weights, no GPU, no probabilistic outputs. Clinical teams can audit exactly why a finding was flagged.
419
+
420
+ **Negation-aware.** A library that can't distinguish "no pneumothorax" from "pneumothorax" is dangerous in clinical contexts. Negation detection is built into the core.
421
+
422
+ **Auditable de-identification.** PHI redaction runs locally with no ML and no external calls, and every removed span is logged with its original offset — so a privacy officer can review exactly what left the building and why.
423
+
424
+ **FHIR-first output.** Most modern EMRs speak FHIR. The export format is designed to drop into existing integrations without transformation.
425
+
426
+ ---
427
+
428
+ ## Running Tests
429
+
430
+ ```bash
431
+ pip install "radreport[dev]"
432
+ pytest tests/ -v
433
+ ```
434
+
435
+ ---
436
+
437
+ ## Roadmap
438
+
439
+ - [x] CLI tool for single-file and batch processing (`radreport` command)
440
+ - [x] `parse_batch()` API for processing lists of reports
441
+ - [x] `to_json()` convenience method on `ParsedReport`
442
+ - [x] Structured output for follow-up recommendations (`RecommendationExtractor`)
443
+ - [x] CSV export mode for research/analytics workflows (`--format csv`)
444
+ - [x] Rule-based PHI de-identification with audit trail (`Deidentifier`)
445
+ - [ ] Template matching for common report types (Chest XR, CT Abdomen, MRI Brain)
446
+ - [ ] Structured comparison / prior-study extraction (new / increased / stable / resolved)
447
+ - [ ] Additional FHIR resource types (ImagingStudy, Condition)
448
+
449
+ ---
450
+
451
+ ## Disclaimer
452
+
453
+ This library is a developer tool for structuring report text. It is **not** a medical device and is **not** intended for direct clinical decision-making. Critical findings detection is designed to assist human review workflows, not replace radiologist judgment.
454
+
455
+ ---
456
+
457
+ ## License
458
+
459
+ MIT