adtl-0.6.1.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- adtl-0.6.1/LICENSE +21 -0
- adtl-0.6.1/PKG-INFO +148 -0
- adtl-0.6.1/README.md +94 -0
- adtl-0.6.1/adtl/__init__.py +1085 -0
- adtl-0.6.1/adtl/__main__.py +3 -0
- adtl-0.6.1/adtl/transformations.py +491 -0
- adtl-0.6.1/adtl.egg-info/PKG-INFO +148 -0
- adtl-0.6.1/adtl.egg-info/SOURCES.txt +14 -0
- adtl-0.6.1/adtl.egg-info/dependency_links.txt +1 -0
- adtl-0.6.1/adtl.egg-info/entry_points.txt +2 -0
- adtl-0.6.1/adtl.egg-info/requires.txt +26 -0
- adtl-0.6.1/adtl.egg-info/top_level.txt +1 -0
- adtl-0.6.1/pyproject.toml +53 -0
- adtl-0.6.1/setup.cfg +4 -0
- adtl-0.6.1/tests/test_parser.py +1387 -0
- adtl-0.6.1/tests/test_transformations.py +316 -0
adtl-0.6.1/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2022 Global.health
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
adtl-0.6.1/PKG-INFO
ADDED
@@ -0,0 +1,148 @@
+Metadata-Version: 2.1
+Name: adtl
+Version: 0.6.1
+Summary: Another data transformation language
+Author-email: Abhishek Dasgupta <abhishek.dasgupta@dtc.ox.ac.uk>, Pip Liggins <philippa.liggins@dtc.ox.ac.uk>
+License: MIT License
+
+Copyright (c) 2022 Global.health
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+Project-URL: Home, https://github.com/globaldothealth/adtl
+Classifier: License :: OSI Approved :: MIT License
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: backports.zoneinfo; python_version < "3.9"
+Requires-Dist: tomli>=2.0.0
+Requires-Dist: pint>=0.24.4
+Requires-Dist: requests>=2.0.0
+Requires-Dist: fastjsonschema==2.16.*
+Requires-Dist: tqdm
+Requires-Dist: python-dateutil
+Requires-Dist: more_itertools
+Provides-Extra: test
+Requires-Dist: pytest; extra == "test"
+Requires-Dist: pytest-cov; extra == "test"
+Requires-Dist: syrupy==4.*; extra == "test"
+Requires-Dist: responses; extra == "test"
+Requires-Dist: pytest-unordered; extra == "test"
+Requires-Dist: adtl[parquet]; extra == "test"
+Provides-Extra: docs
+Requires-Dist: sphinx==8.*; extra == "docs"
+Requires-Dist: myst-parser==4.*; extra == "docs"
+Requires-Dist: sphinx-book-theme; extra == "docs"
+Provides-Extra: parquet
+Requires-Dist: polars; extra == "parquet"
+
+# adtl – another data transformation language
+
+[](https://www.python.org/downloads/)
+
+[](https://github.com/globaldothealth/adtl/actions/workflows/tests.yml)
+[](https://codecov.io/gh/globaldothealth/adtl)
+[](https://github.com/psf/black)
+
+
+adtl is a data transformation language (DTL) used by some applications in
+[Global.health](https://global.health), notably for the ISARIC clinical data pipeline at
+[globaldothealth/isaric](https://github.com/globaldothealth/isaric) and the InsightBoard
+project dashboard at [globaldothealth/InsightBoard](https://github.com/globaldothealth/InsightBoard).
+
+Documentation: [ReadTheDocs](https://adtl.readthedocs.io/en/latest/index.html)
+
+## Installation
+
+You can install this package using either [`pipx`](https://pypa.github.io/pipx/)
+or `pip`. Installing via `pipx` offers advantages if you want to just use the
+`adtl` tool standalone from the command line, as it isolates the Python
+package dependencies in a virtual environment. On the other hand, `pip` installs
+packages to the global environment which is generally not recommended as it
+can interfere with other packages on your system.
+
+* Installation via `pipx`:
+
+```shell
+pipx install adtl
+```
+
+* Installation via `pip`:
+
+```shell
+python3 -m pip install adtl
+```
+
+If you are writing code which depends on adtl (instead of using the
+command-line program), then it is best to add a dependency on `adtl` to your
+Python build tool of choice.
+
+To use the development version, replace `adtl` with the full GitHub URL:
+
+```shell
+pip install git+https://github.com/globaldothealth/adtl
+```
+
+## Rationale
+
+Most existing data transformation languages use an XML dialect, though
+there are recent variations in other file formats. In addition, many DTLs use a
+custom domain-specific language. The primary utility of this DTL is to provide an
+easy-to-use library in Python for basic data transformations, which are
+specified in a JSON or TOML file. It is not meant to be comprehensive, and adtl can
+be used as a step within a larger data processing pipeline.
+
+## Usage
+
+adtl can be used from the command line or as a Python library.
+
+As a CLI:
+```bash
+adtl specification-file input-file
+```
+
+Here *specification-file* is the parser specification (as TOML or JSON)
+and *input-file* is the data file (not the data dictionary) that adtl
+will transform using the instructions in the specification.
+
+Python library:
+```python
+import adtl
+
+parser = adtl.Parser(specification)
+print(parser.tables) # list of tables created
+
+for row in parser.parse().read_table(table):
+    print(row)
+```
+
+If adtl is not in your PATH, this may give an error. Either add the location
+where the adtl script is installed to your PATH, or try running adtl as a module:
+
+```shell
+python3 -m adtl specification-file input-file
+```
+
+Running adtl will create output files with the name of the parser, suffixed with
+table names in the current working directory.
+
+## Development
+
+Install [pre-commit](https://pre-commit.com) and set up pre-commit hooks
+(`pre-commit install`) which will do linting checks before commit.
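The `Requires-Dist` entries in the PKG-INFO hunk above mix core dependencies with ones gated behind an `extra == "..."` marker. Because core metadata uses an email-header layout, a rough way to separate the two groups is the standard library's `email.parser`. The sketch below is illustrative only: `split_requirements` and `PKG_INFO_EXCERPT` are hypothetical names, the excerpt is a hand-copied subset of the headers, and the marker handling is deliberately naive (it only recognises a marker that starts with `extra ==`). For real marker evaluation a dedicated parser such as `packaging.requirements.Requirement` is the safer choice.

```python
from email.parser import HeaderParser

# Hand-copied subset of the PKG-INFO headers (not the full file).
PKG_INFO_EXCERPT = """\
Metadata-Version: 2.1
Name: adtl
Version: 0.6.1
Requires-Python: >=3.9
Requires-Dist: tomli>=2.0.0
Requires-Dist: pint>=0.24.4
Requires-Dist: pytest; extra == "test"
Requires-Dist: polars; extra == "parquet"
Provides-Extra: test
Provides-Extra: parquet
"""


def split_requirements(metadata_text):
    """Return (core, extras); extras maps an extra name to its requirements.

    Hypothetical helper: only markers of the form `extra == "name"` are
    treated as extras; anything else is kept verbatim as a core entry.
    """
    headers = HeaderParser().parsestr(metadata_text)
    core, extras = [], {}
    for entry in headers.get_all("Requires-Dist", []):
        requirement, _, marker = entry.partition(";")
        marker = marker.strip()
        if marker.startswith("extra =="):
            extra = marker.split("==", 1)[1].strip().strip("\"'")
            extras.setdefault(extra, []).append(requirement.strip())
        else:
            core.append(entry.strip())
    return core, extras


core, extras = split_requirements(PKG_INFO_EXCERPT)
print("core:", core)
print("extras:", extras)
```

Grouping by extra makes it easier to see at a glance what `pip install adtl` pulls in compared with `pip install "adtl[test]"`.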
adtl-0.6.1/README.md
ADDED
@@ -0,0 +1,94 @@
+# adtl – another data transformation language
+
+[](https://www.python.org/downloads/)
+
+[](https://github.com/globaldothealth/adtl/actions/workflows/tests.yml)
+[](https://codecov.io/gh/globaldothealth/adtl)
+[](https://github.com/psf/black)
+
+
+adtl is a data transformation language (DTL) used by some applications in
+[Global.health](https://global.health), notably for the ISARIC clinical data pipeline at
+[globaldothealth/isaric](https://github.com/globaldothealth/isaric) and the InsightBoard
+project dashboard at [globaldothealth/InsightBoard](https://github.com/globaldothealth/InsightBoard).
+
+Documentation: [ReadTheDocs](https://adtl.readthedocs.io/en/latest/index.html)
+
+## Installation
+
+You can install this package using either [`pipx`](https://pypa.github.io/pipx/)
+or `pip`. Installing via `pipx` offers advantages if you want to just use the
+`adtl` tool standalone from the command line, as it isolates the Python
+package dependencies in a virtual environment. On the other hand, `pip` installs
+packages to the global environment which is generally not recommended as it
+can interfere with other packages on your system.
+
+* Installation via `pipx`:
+
+```shell
+pipx install adtl
+```
+
+* Installation via `pip`:
+
+```shell
+python3 -m pip install adtl
+```
+
+If you are writing code which depends on adtl (instead of using the
+command-line program), then it is best to add a dependency on `adtl` to your
+Python build tool of choice.
+
+To use the development version, replace `adtl` with the full GitHub URL:
+
+```shell
+pip install git+https://github.com/globaldothealth/adtl
+```
+
+## Rationale
+
+Most existing data transformation languages use an XML dialect, though
+there are recent variations in other file formats. In addition, many DTLs use a
+custom domain-specific language. The primary utility of this DTL is to provide an
+easy-to-use library in Python for basic data transformations, which are
+specified in a JSON or TOML file. It is not meant to be comprehensive, and adtl can
+be used as a step within a larger data processing pipeline.
+
+## Usage
+
+adtl can be used from the command line or as a Python library.
+
+As a CLI:
+```bash
+adtl specification-file input-file
+```
+
+Here *specification-file* is the parser specification (as TOML or JSON)
+and *input-file* is the data file (not the data dictionary) that adtl
+will transform using the instructions in the specification.
+
+Python library:
+```python
+import adtl
+
+parser = adtl.Parser(specification)
+print(parser.tables) # list of tables created
+
+for row in parser.parse().read_table(table):
+    print(row)
+```
+
+If adtl is not in your PATH, this may give an error. Either add the location
+where the adtl script is installed to your PATH, or try running adtl as a module:
+
+```shell
+python3 -m adtl specification-file input-file
+```
+
+Running adtl will create output files with the name of the parser, suffixed with
+table names in the current working directory.
+
+## Development
+
+Install [pre-commit](https://pre-commit.com) and set up pre-commit hooks
+(`pre-commit install`) which will do linting checks before commit.
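The README's Usage section says the specification file may be TOML or JSON. As a minimal sketch of what choosing between the two formats can look like, the helper below picks a parser from the file suffix. This is not adtl's own loading code: `load_specification` is a hypothetical name, suffix-based dispatch is an assumption, and the demo file content is a placeholder rather than a valid adtl specification.

```python
import json
import tempfile
from pathlib import Path


def load_specification(path):
    """Load a specification file into a dict, choosing the parser by suffix.

    Hypothetical helper for illustration; adtl may detect formats differently.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".json":
        with path.open(encoding="utf-8") as f:
            return json.load(f)
    if suffix == ".toml":
        try:
            import tomllib  # standard library from Python 3.11
        except ModuleNotFoundError:
            import tomli as tomllib  # backport exposing the same load() API
        with path.open("rb") as f:  # TOML parsers require a binary handle
            return tomllib.load(f)
    raise ValueError(f"unsupported specification format: {suffix!r}")


# Demo with a placeholder JSON document (not a real adtl specification).
with tempfile.TemporaryDirectory() as tmp:
    spec_path = Path(tmp) / "spec.json"
    spec_path.write_text('{"name": "example-parser"}', encoding="utf-8")
    print(load_specification(spec_path))
```

Importing the TOML parser lazily keeps the JSON path usable on interpreters that have neither `tomllib` nor `tomli` available.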