fhir-pyrate 0.2.0b9__tar.gz → 0.2.2__tar.gz

@@ -1,20 +1,53 @@
+ Metadata-Version: 2.1
+ Name: fhir-pyrate
+ Version: 0.2.2
+ Summary: FHIR-PYrate is a package that provides a high-level API to query FHIR Servers for bundles of resources and return the structured information as pandas DataFrames. It can also be used to filter resources using RegEx and SpaCy and download DICOM studies and series.
+ Home-page: https://github.com/UMEssen/FHIR-PYrate
+ License: MIT
+ Keywords: python,fhir,data-science,fhirpath,healthcare
+ Author: Rene Hosch
+ Author-email: rene.hosch@uk-essen.de
+ Requires-Python: >=3.10,<4.0
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Programming Language :: Python :: 3.13
+ Provides-Extra: all
+ Provides-Extra: downloader
+ Provides-Extra: miner
+ Requires-Dist: PyJWT (>=2.4.0,<3.0.0)
+ Requires-Dist: SimpleITK (>=2.0.2,<3.0.0) ; extra == "downloader" or extra == "all"
+ Requires-Dist: dicomweb-client (>=0.52.0,<0.53.0) ; extra == "downloader" or extra == "all"
+ Requires-Dist: fhirpathpy (>=0.2.2,<0.3.0)
+ Requires-Dist: numpy (>=2.0.0,<3.0.0)
+ Requires-Dist: pandas (>=2.0.0,<3.0.0)
+ Requires-Dist: pydicom (>=2.1.2,<3.0.0) ; extra == "downloader" or extra == "all"
+ Requires-Dist: requests (>=2.28.0,<3.0.0)
+ Requires-Dist: requests-cache (>=0.9.7,<0.10.0)
+ Requires-Dist: spacy (>=3.0.6,<4.0.0) ; extra == "miner" or extra == "all"
+ Requires-Dist: tqdm (>=4.56.0,<5.0.0)
+ Project-URL: Repository, https://github.com/UMEssen/FHIR-PYrate
+ Description-Content-Type: text/markdown
+
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
- [![Supported Python version](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/release/python-380/)
+ [![Supported Python version](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/release/python-31011/)
  [![Stable Version](https://img.shields.io/pypi/v/fhir-pyrate?label=stable)](https://pypi.org/project/fhir-pyrate/)
  [![Pre-release Version](https://img.shields.io/github/v/release/UMEssen/fhir-pyrate?label=pre-release&include_prereleases&sort=semver)](https://pypi.org/project/fhir-pyrate/#history)
  [![DOI](https://zenodo.org/badge/456893108.svg)](https://zenodo.org/badge/latestdoi/456893108)
+ [![Affiliated with RTG WisPerMed](https://img.shields.io/badge/Affiliated-RTG%202535%20WisPerMed-blue)](https://wispermed.org/)
 
  <!-- PROJECT LOGO -->
- <br />
- <div align="center">
- <a href="https://github.com/UMEssen/FHIR-PYrate">
- <img src="https://raw.githubusercontent.com/UMEssen/FHIR-PYrate/main/images/logo.svg" alt="Logo" width="440" height="338">
- </a>
- </div>
+ ![Pyrate-Banner](images/pyrate-banner.png)
 
  This package is meant to provide a simple abstraction to query and structure FHIR resources as
  pandas DataFrames. Want to use R instead? Try out [fhircrackr](https://github.com/POLAR-fhiR/fhircrackr)!
 
+ **If you use this package, please cite:**
+
+ Hosch, R., Baldini, G., Parmar, V. et al. FHIR-PYrate: a data science friendly Python package to query FHIR servers. BMC Health Serv Res 23, 734 (2023). https://doi.org/10.1186/s12913-023-09498-1
+
  There are four main classes:
  * [Ahoy](https://github.com/UMEssen/FHIR-PYrate/blob/main/fhir_pyrate/ahoy.py): Authenticate on the FHIR API
  ([Example 1](https://github.com/UMEssen/FHIR-PYrate/blob/main/examples/1-simple-json-to-df.ipynb),
@@ -38,11 +71,6 @@ our institute. If there is anything in the code that only applies to our server,
  problems with the authentication (or anything else really), please just create an issue or
  [email us](mailto:giulia.baldini@uk-essen.de).
 
- <br />
- <div align="center">
- <img src="https://raw.githubusercontent.com/UMEssen/FHIR-PYrate/main/images/resources.svg" alt="Resources" width="630" height="385">
- </div>
-
  <!-- TABLE OF CONTENTS -->
  Table of Contents:
 
@@ -203,10 +231,12 @@ The Pirate functions do one of three things:
  | trade_rows_for_dataframe | 3 | Yes | Yes | DataFrame |
 
 
- **BETA FEATURE**: It is also possible to cache the bundles using the `bundle_caching` parameter,
- which specifies a caching folder. This has not yet been tested extensively and does not have any
- cache invalidation mechanism.
-
+ **CACHING**: It is also possible to cache the bundles using the `cache_folder` parameter.
+ This unfortunately does not currently work with multiprocessing, but saves a lot of time if you
+ need to download a lot of data and you are always doing the same requests.
+ You can also specify how long the cache should be valid with the `cache_expiry_time` parameter.
+ Additionally, you can also specify whether the requests should be retried using the `retry_requests`
+ parameter. There is an example of this in the docstrings of the Pirate class.
 
  A toy request for ImagingStudy:
 
@@ -396,7 +426,65 @@ parameters specified in `df_constraints` as columns of the final DataFrame.
  You can find an example in [Example 3](https://github.com/UMEssen/FHIR-PYrate/blob/main/examples/3-patients-for-condition.ipynb).
  Additionally, you can specify the `with_columns` parameter, which can add any columns from the original
  DataFrame. The columns can be either specified as a list of columns `[col1, col2, ...]` or as a
- list of tuples `[(new_name_for_col1, col1), (new_name_for_col2, col2), ...]`
+ list of tuples `[(new_name_for_col1, col1), (new_name_for_col2, col2), ...]`.
+
+ Currently, whenever a column is completely empty (i.e., no resources
+ have a corresponding value for that column), it is just removed from the DataFrame.
+ This is to ensure that we output clean DataFrames when we are handling multiple resources.
+ More on that in the following section.
+
+ #### Note on Querying Multiple Resources
+
+ Not all FHIR servers allow this (at least not the public ones that we have tried),
+ but it is also possible to obtain multiple resources with just one query:
+ ```python
+ search = ...
+ result_dfs = search.steal_bundles_to_dataframe(
+     resource_type="ImagingStudy",
+     request_params={
+         "_lastUpdated": "ge2022-12",
+         "_count": "3",
+         "_include": "ImagingStudy:subject",
+     },
+     fhir_paths=[
+         "id",
+         "started",
+         ("modality", "modality.code"),
+         ("procedureCode", "procedureCode.coding.code"),
+         (
+             "study_instance_uid",
+             "identifier.where(system = 'urn:dicom:uid').value.replace('urn:oid:', '')",
+         ),
+         ("series_instance_uid", "series.uid"),
+         ("series_code", "series.modality.code"),
+         ("numberOfInstances", "series.numberOfInstances"),
+         ("family_first", "name[0].family"),
+         ("given_first", "name[0].given"),
+     ],
+     num_pages=1,
+ )
+ ```
+ In this case, a dictionary of DataFrames is returned, where the keys are the resource types.
+ You can then select the single DataFrame by doing `result_dfs["ImagingStudy"]`
+ or `result_dfs["Patient"]`.
+ You can find an example of this in [Example 2](https://github.com/UMEssen/FHIR-PYrate/blob/main/examples/2-condition-to-imaging-study.ipynb)
+ where the `ImagingStudy` resource is queried.
+
+ In theory, it would be smarter to specify the resource name in front of the FHIRPaths,
+ e.g. `ImagingStudy.series.uid` instead of `series.uid`, and for each DataFrame only return the
+ corresponding attributes.
+ However, we do not want to force the user to always specify the resource type, and in the current
+ version the DataFrames
+ coming from multiple resources have the same columns, because
+ we cannot filter which resource was actually intended.
+ Currently, we solved this by just removing all columns that do not have any results.
+ This means, however, that if you are actually requesting an attribute for a specific resource and it
+ is not found, that column will not appear.
+ In the future, [we plan to do a smarter filtering of the FHIRPaths](https://github.com/UMEssen/FHIR-PYrate/issues/120),
+ such that only the ones containing
+ the actual resource name are kept if the resource name is specified in the path,
+ and that a column full of `None`s is obtained in case no resource type is specified.
+
 
  ### [Miner](https://github.com/UMEssen/FHIR-PYrate/blob/main/fhir_pyrate/miner.py)
 
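Reviewer note: the column-dropping behaviour described in this hunk can be illustrated with a minimal sketch in plain Python. This is not FHIR-PYrate's implementation: `drop_empty_columns` and the sample rows are hypothetical, and each table is modelled as a list of dicts rather than a pandas DataFrame. It only shows why, when the same FHIRPaths are applied to every resource type, each resulting table keeps just the columns that produced at least one value.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def drop_empty_columns(rows: List[Row]) -> List[Row]:
    """Remove every column whose value is None in all rows (hypothetical helper)."""
    columns = {key for row in rows for key in row}
    # A column is kept if at least one row has a non-None value for it.
    kept = {col for col in columns if any(row.get(col) is not None for row in rows)}
    return [{col: row[col] for col in row if col in kept} for row in rows]


# The same paths ("series_uid", "family") are evaluated against both resource
# types, so each table starts out with columns that only make sense for the other.
tables = {
    "ImagingStudy": [
        {"id": "study-1", "series_uid": "1.2.3", "family": None},
        {"id": "study-2", "series_uid": "4.5.6", "family": None},
    ],
    "Patient": [
        {"id": "patient-1", "series_uid": None, "family": "Doe"},
    ],
}

cleaned = {name: drop_empty_columns(rows) for name, rows in tables.items()}
print(sorted(cleaned["ImagingStudy"][0]))  # ['id', 'series_uid']
print(sorted(cleaned["Patient"][0]))  # ['family', 'id']
```

The flip side, as the README text notes, is that a column requested for a specific resource also disappears when no resource has a value for it.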
@@ -526,3 +614,4 @@ This project is licenced under the [MIT Licence](LICENSE).
 
  ## Project status
  The project is in active development.
+
@@ -1,53 +1,20 @@
- Metadata-Version: 2.1
- Name: fhir-pyrate
- Version: 0.2.0b9
- Summary: FHIR-PYrate is a package that provides a high-level API to query FHIR Servers for bundles of resources and return the structured information as pandas DataFrames. It can also be used to filter resources using RegEx and SpaCy and download DICOM studies and series.
- Home-page: https://github.com/UMEssen/FHIR-PYrate
- License: MIT
- Keywords: python,fhir,data-science,fhirpath,healthcare
- Author: Rene Hosch
- Author-email: rene.hosch@uk-essen.de
- Requires-Python: >=3.8,<4.0
- Classifier: License :: OSI Approved :: MIT License
- Classifier: Programming Language :: Python :: 3
- Classifier: Programming Language :: Python :: 3.8
- Classifier: Programming Language :: Python :: 3.9
- Classifier: Programming Language :: Python :: 3.10
- Classifier: Programming Language :: Python :: 3.11
- Provides-Extra: all
- Provides-Extra: downloader
- Provides-Extra: miner
- Requires-Dist: PyJWT (>=2.4.0,<3.0.0)
- Requires-Dist: SimpleITK (>=2.0.2,<3.0.0) ; extra == "downloader" or extra == "all"
- Requires-Dist: dicomweb-client (>=0.52.0,<0.53.0) ; extra == "downloader" or extra == "all"
- Requires-Dist: fhirpathpy (>=0.1.0,<0.2.0)
- Requires-Dist: numpy (>=1.22,<2.0)
- Requires-Dist: pandas (>=1.3.0,<2.0.0)
- Requires-Dist: pydicom (>=2.1.2,<3.0.0) ; extra == "downloader" or extra == "all"
- Requires-Dist: requests (>=2.28.0,<3.0.0)
- Requires-Dist: requests-cache (>=0.9.7,<0.10.0)
- Requires-Dist: spacy (>=3.0.6,<4.0.0) ; extra == "miner" or extra == "all"
- Requires-Dist: tqdm (>=4.56.0,<5.0.0)
- Project-URL: Repository, https://github.com/UMEssen/FHIR-PYrate
- Description-Content-Type: text/markdown
-
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
- [![Supported Python version](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/release/python-380/)
+ [![Supported Python version](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/release/python-31011/)
  [![Stable Version](https://img.shields.io/pypi/v/fhir-pyrate?label=stable)](https://pypi.org/project/fhir-pyrate/)
  [![Pre-release Version](https://img.shields.io/github/v/release/UMEssen/fhir-pyrate?label=pre-release&include_prereleases&sort=semver)](https://pypi.org/project/fhir-pyrate/#history)
  [![DOI](https://zenodo.org/badge/456893108.svg)](https://zenodo.org/badge/latestdoi/456893108)
+ [![Affiliated with RTG WisPerMed](https://img.shields.io/badge/Affiliated-RTG%202535%20WisPerMed-blue)](https://wispermed.org/)
 
  <!-- PROJECT LOGO -->
- <br />
- <div align="center">
- <a href="https://github.com/UMEssen/FHIR-PYrate">
- <img src="https://raw.githubusercontent.com/UMEssen/FHIR-PYrate/main/images/logo.svg" alt="Logo" width="440" height="338">
- </a>
- </div>
+ ![Pyrate-Banner](images/pyrate-banner.png)
 
  This package is meant to provide a simple abstraction to query and structure FHIR resources as
  pandas DataFrames. Want to use R instead? Try out [fhircrackr](https://github.com/POLAR-fhiR/fhircrackr)!
 
+ **If you use this package, please cite:**
+
+ Hosch, R., Baldini, G., Parmar, V. et al. FHIR-PYrate: a data science friendly Python package to query FHIR servers. BMC Health Serv Res 23, 734 (2023). https://doi.org/10.1186/s12913-023-09498-1
+
  There are four main classes:
  * [Ahoy](https://github.com/UMEssen/FHIR-PYrate/blob/main/fhir_pyrate/ahoy.py): Authenticate on the FHIR API
  ([Example 1](https://github.com/UMEssen/FHIR-PYrate/blob/main/examples/1-simple-json-to-df.ipynb),
@@ -71,11 +38,6 @@ our institute. If there is anything in the code that only applies to our server,
  problems with the authentication (or anything else really), please just create an issue or
  [email us](mailto:giulia.baldini@uk-essen.de).
 
- <br />
- <div align="center">
- <img src="https://raw.githubusercontent.com/UMEssen/FHIR-PYrate/main/images/resources.svg" alt="Resources" width="630" height="385">
- </div>
-
  <!-- TABLE OF CONTENTS -->
  Table of Contents:
 
@@ -236,10 +198,12 @@ The Pirate functions do one of three things:
  | trade_rows_for_dataframe | 3 | Yes | Yes | DataFrame |
 
 
- **BETA FEATURE**: It is also possible to cache the bundles using the `bundle_caching` parameter,
- which specifies a caching folder. This has not yet been tested extensively and does not have any
- cache invalidation mechanism.
-
+ **CACHING**: It is also possible to cache the bundles using the `cache_folder` parameter.
+ This unfortunately does not currently work with multiprocessing, but saves a lot of time if you
+ need to download a lot of data and you are always doing the same requests.
+ You can also specify how long the cache should be valid with the `cache_expiry_time` parameter.
+ Additionally, you can also specify whether the requests should be retried using the `retry_requests`
+ parameter. There is an example of this in the docstrings of the Pirate class.
 
  A toy request for ImagingStudy:
 
@@ -429,7 +393,65 @@ parameters specified in `df_constraints` as columns of the final DataFrame.
  You can find an example in [Example 3](https://github.com/UMEssen/FHIR-PYrate/blob/main/examples/3-patients-for-condition.ipynb).
  Additionally, you can specify the `with_columns` parameter, which can add any columns from the original
  DataFrame. The columns can be either specified as a list of columns `[col1, col2, ...]` or as a
- list of tuples `[(new_name_for_col1, col1), (new_name_for_col2, col2), ...]`
+ list of tuples `[(new_name_for_col1, col1), (new_name_for_col2, col2), ...]`.
+
+ Currently, whenever a column is completely empty (i.e., no resources
+ have a corresponding value for that column), it is just removed from the DataFrame.
+ This is to ensure that we output clean DataFrames when we are handling multiple resources.
+ More on that in the following section.
+
+ #### Note on Querying Multiple Resources
+
+ Not all FHIR servers allow this (at least not the public ones that we have tried),
+ but it is also possible to obtain multiple resources with just one query:
+ ```python
+ search = ...
+ result_dfs = search.steal_bundles_to_dataframe(
+     resource_type="ImagingStudy",
+     request_params={
+         "_lastUpdated": "ge2022-12",
+         "_count": "3",
+         "_include": "ImagingStudy:subject",
+     },
+     fhir_paths=[
+         "id",
+         "started",
+         ("modality", "modality.code"),
+         ("procedureCode", "procedureCode.coding.code"),
+         (
+             "study_instance_uid",
+             "identifier.where(system = 'urn:dicom:uid').value.replace('urn:oid:', '')",
+         ),
+         ("series_instance_uid", "series.uid"),
+         ("series_code", "series.modality.code"),
+         ("numberOfInstances", "series.numberOfInstances"),
+         ("family_first", "name[0].family"),
+         ("given_first", "name[0].given"),
+     ],
+     num_pages=1,
+ )
+ ```
+ In this case, a dictionary of DataFrames is returned, where the keys are the resource types.
+ You can then select the single DataFrame by doing `result_dfs["ImagingStudy"]`
+ or `result_dfs["Patient"]`.
+ You can find an example of this in [Example 2](https://github.com/UMEssen/FHIR-PYrate/blob/main/examples/2-condition-to-imaging-study.ipynb)
+ where the `ImagingStudy` resource is queried.
+
+ In theory, it would be smarter to specify the resource name in front of the FHIRPaths,
+ e.g. `ImagingStudy.series.uid` instead of `series.uid`, and for each DataFrame only return the
+ corresponding attributes.
+ However, we do not want to force the user to always specify the resource type, and in the current
+ version the DataFrames
+ coming from multiple resources have the same columns, because
+ we cannot filter which resource was actually intended.
+ Currently, we solved this by just removing all columns that do not have any results.
+ This means, however, that if you are actually requesting an attribute for a specific resource and it
+ is not found, that column will not appear.
+ In the future, [we plan to do a smarter filtering of the FHIRPaths](https://github.com/UMEssen/FHIR-PYrate/issues/120),
+ such that only the ones containing
+ the actual resource name are kept if the resource name is specified in the path,
+ and that a column full of `None`s is obtained in case no resource type is specified.
+
 
  ### [Miner](https://github.com/UMEssen/FHIR-PYrate/blob/main/fhir_pyrate/miner.py)
 
@@ -559,4 +581,3 @@ This project is licenced under the [MIT Licence](LICENSE).
 
  ## Project status
  The project is in active development.
-
@@ -33,18 +33,22 @@ class Ahoy:
      :param token_refresh_delta: Either a timedelta object that tells us how often the token
      should be refreshed, or a number of minutes; this does not need to be specified for JWT tokens
      that contain the expiry date
+     :param session: The session that can be used for the authentication. This is particularly
+     useful if you have some particular requirements for your authentication (e.g. you need to
+     support custom self-signed certificates).
      """
 
      def __init__(
          self,
-         auth_url: str = None,
+         auth_url: Optional[str] = None,
          auth_type: Optional[str] = "token",
-         refresh_url: str = None,
-         username: str = None,
+         refresh_url: Optional[str] = None,
+         username: Optional[str] = None,
          auth_method: Optional[str] = "password",
-         token: str = None,
+         token: Optional[str] = None,
          max_login_attempts: int = 5,
-         token_refresh_delta: Union[int, timedelta] = None,
+         token_refresh_delta: Optional[Union[int, timedelta]] = None,
+         session: Optional[requests.Session] = None,
      ) -> None:
          self.auth_type = auth_type
          self.auth_method = auth_method
@@ -54,7 +58,10 @@ class Ahoy:
          self._user_env_name = "FHIR_USER"
          self._pass_env_name = "FHIR_PASSWORD"
          self.token = token
-         self.session = requests.Session()
+         if session is None:
+             self.session = requests.Session()
+         else:
+             self.session = session
          self.max_login_attempts = max_login_attempts
          self.token_refresh_delta = token_refresh_delta
          if self.auth_type is not None and self.auth_method is not None:
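Reviewer note: the change to `self.session` in this hunk is a default-or-injected pattern, where the caller may pass a preconfigured session and a fresh one is created otherwise. A minimal sketch of that pattern follows. `StubSession` and `Client` are hypothetical stand-ins (for `requests.Session` and `Ahoy`) so that the snippet runs without third-party packages; `verify` mirrors the `requests.Session.verify` attribute, which in `requests` can be set to the path of a CA bundle, and the path used here is a placeholder.

```python
from typing import Optional, Union


class StubSession:
    """Hypothetical stand-in for requests.Session; only models `verify`."""

    def __init__(self) -> None:
        # In requests, `verify` may be a bool or a path to a CA bundle.
        self.verify: Union[bool, str] = True


class Client:
    """Hypothetical stand-in for Ahoy; only models the session handling."""

    def __init__(self, session: Optional[StubSession] = None) -> None:
        # Reuse the caller's session if one is given, otherwise create a default.
        self.session = session if session is not None else StubSession()


# Default behaviour: the client builds its own session.
default_client = Client()

# Custom behaviour: the caller configures the session first, e.g. to trust a
# self-signed certificate through a custom CA bundle (placeholder path).
custom_session = StubSession()
custom_session.verify = "/path/to/custom-ca.pem"
custom_client = Client(session=custom_session)

print(default_client.session.verify)  # True
print(custom_client.session is custom_session)  # True
```

With the real classes, the equivalent would presumably be to configure a `requests.Session` and pass it through the new `session` argument of `Ahoy.__init__`, as the signature in the previous hunk suggests.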
@@ -75,7 +82,7 @@ class Ahoy:
          self.close()
 
      def change_environment_variable_name(
-         self, user_env: str = None, pass_env: str = None
+         self, user_env: Optional[str] = None, pass_env: Optional[str] = None
      ) -> None:
          """
          Change the name of the variables used to retrieve username and password.