jsonunwrap 0.2.0__tar.gz → 0.2.2__tar.gz
- jsonunwrap-0.2.2/PKG-INFO +93 -0
- jsonunwrap-0.2.2/README.md +78 -0
- jsonunwrap-0.2.2/jsonunwrap/core.py +99 -0
- jsonunwrap-0.2.2/jsonunwrap.egg-info/PKG-INFO +93 -0
- {jsonunwrap-0.2.0 → jsonunwrap-0.2.2}/pyproject.toml +1 -1
- jsonunwrap-0.2.0/PKG-INFO +0 -28
- jsonunwrap-0.2.0/README.md +0 -13
- jsonunwrap-0.2.0/jsonunwrap/core.py +0 -79
- jsonunwrap-0.2.0/jsonunwrap.egg-info/PKG-INFO +0 -28
- {jsonunwrap-0.2.0 → jsonunwrap-0.2.2}/jsonunwrap/__init__.py +0 -0
- {jsonunwrap-0.2.0 → jsonunwrap-0.2.2}/jsonunwrap.egg-info/SOURCES.txt +0 -0
- {jsonunwrap-0.2.0 → jsonunwrap-0.2.2}/jsonunwrap.egg-info/dependency_links.txt +0 -0
- {jsonunwrap-0.2.0 → jsonunwrap-0.2.2}/jsonunwrap.egg-info/requires.txt +0 -0
- {jsonunwrap-0.2.0 → jsonunwrap-0.2.2}/jsonunwrap.egg-info/top_level.txt +0 -0
- {jsonunwrap-0.2.0 → jsonunwrap-0.2.2}/setup.cfg +0 -0
- {jsonunwrap-0.2.0 → jsonunwrap-0.2.2}/tests/test_core.py +0 -0
jsonunwrap-0.2.2/PKG-INFO
ADDED
|
@@ -0,0 +1,93 @@
+Metadata-Version: 2.4
+Name: jsonunwrap
+Version: 0.2.2
+Summary: A small python package that unpacks data from a JSON url and converts it into a csv file.
+Author-email: njuedominic <njuemugodominic@gmail.com>
+Project-URL: Homepage, https://github.com/njuedominic/json-unwrap
+Project-URL: Bug Tracker, https://github.com/njuedominic/json-unwrap/issues
+Classifier: Programming Language :: Python :: 3
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Requires-Python: >3.8
+Description-Content-Type: text/markdown
+Requires-Dist: requests>=2.28.0
+Requires-Dist: pandas>=2.0.0
+
+# jsonunwrap
+
+**jsonunwrap** is a simple JSON flattening and normalization library.
+
+```python
+>>> import jsonunwrap as ju
+>>> url = "https://dummyjson.com/carts"
+>>> fetchdata = ju.fetch_json(url)
+>>> df = ju.unwrap_data(fetchdata)
+>>> df.columns
+Index(['id', 'user.name', 'hobbies'], dtype='object')
+>>> len(df)
+2
+```
+
+jsonunwrap flattens nested, semi-structured JSON into a pandas DataFrame: nested dictionaries become dot-separated columns and embedded lists are expanded into rows.
+
+---
+
+## Installing jsonunwrap
+
+jsonunwrap is available on PyPI:
+
+```bash
+$ python -m pip install jsonunwrap
+```
+
+## Supported Features & Best Practices
+
+jsonunwrap is ready for the data parsing demands of modern web APIs, automation scripts, and data engineering pipelines.
+
+- Recursive deep flattening (nested dictionaries become dot-notation columns)
+- Automatic list explosion (each element of an embedded list becomes its own row)
+- One-call fetch and export (download a JSON endpoint and write it to CSV with a single function)
+- Small API surface (three functions, no configuration objects)
+
+---
+
+## Quick Usage Reference
+
+### Download, Normalize, and Export an API Stream to Disk
+
+```python
+import jsonunwrap as ju
+
+target_url = "https://dummyjson.com/carts"  # any endpoint that returns JSON
+output_file = "data/normalized_users.csv"
+
+# Fetch, unwrap nested schemas, and generate a local CSV payload
+df = ju.json_to_csv(target_url, output_file)
+```
+
+---
+
+## Core API Documentation
+
+The package exposes three functions:
+
+### `jsonunwrap.unwrap_data(data)`
+Recursively flattens an in-memory dictionary or list of records. Returns a `pandas.DataFrame`.
+
+### `jsonunwrap.fetch_json(url, **kwargs)`
+Fetches JSON from `url` with `requests.get` and returns the decoded payload. Extra `**kwargs` (for example `headers` or `auth`) are passed through to `requests.get`, and HTTP error responses raise an exception.
+
+### `jsonunwrap.json_to_csv(url, output_path)`
+Fetches JSON from `url`, flattens it with `unwrap_data`, writes the result to `output_path` as CSV (creating the parent directory if needed), and returns the DataFrame.
+
+---
+
+## Contributions & Testing
+
+To test changes locally, install the package in editable mode and run `pytest`:
+
+```bash
+$ git clone https://github.com/njuedominic/json-unwrap
+$ cd json-unwrap
+$ python -m pip install -e .[test] && pytest
+```
|
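The README above describes two transformations: dot-notation flattening of nested dictionaries and expansion of embedded lists into rows. The following is a minimal pure-Python sketch of those two ideas, not the package's implementation (which relies on pandas). The helper names `flatten` and `explode` and the sample record are assumptions made for illustration only.

```python
from typing import Any, Dict, List


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into one dict with dot-separated keys."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the path so far as the key prefix.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def explode(row: Dict[str, Any], column: str) -> List[Dict[str, Any]]:
    """Return one copy of the row per element of the list stored under `column`."""
    values = row.get(column)
    if not isinstance(values, list) or not values:
        # Nothing to expand; this sketch leaves the row unchanged.
        return [row]
    return [{**row, column: item} for item in values]


# Hypothetical record, chosen only for illustration.
record = {"id": 1, "user": {"name": "Ada"}, "hobbies": ["chess", "go"]}
rows = explode(flatten(record), "hobbies")
for row in rows:
    print(row)
```

One difference worth noting: pandas' `DataFrame.explode` turns an empty list into a missing value, whereas this sketch returns the row untouched.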
jsonunwrap-0.2.2/README.md
ADDED
|
@@ -0,0 +1,78 @@
+# jsonunwrap
+
+**jsonunwrap** is a simple JSON flattening and normalization library.
+
+```python
+>>> import jsonunwrap as ju
+>>> url = "https://dummyjson.com/carts"
+>>> fetchdata = ju.fetch_json(url)
+>>> df = ju.unwrap_data(fetchdata)
+>>> df.columns
+Index(['id', 'user.name', 'hobbies'], dtype='object')
+>>> len(df)
+2
+```
+
+jsonunwrap flattens nested, semi-structured JSON into a pandas DataFrame: nested dictionaries become dot-separated columns and embedded lists are expanded into rows.
+
+---
+
+## Installing jsonunwrap
+
+jsonunwrap is available on PyPI:
+
+```bash
+$ python -m pip install jsonunwrap
+```
+
+## Supported Features & Best Practices
+
+jsonunwrap is ready for the data parsing demands of modern web APIs, automation scripts, and data engineering pipelines.
+
+- Recursive deep flattening (nested dictionaries become dot-notation columns)
+- Automatic list explosion (each element of an embedded list becomes its own row)
+- One-call fetch and export (download a JSON endpoint and write it to CSV with a single function)
+- Small API surface (three functions, no configuration objects)
+
+---
+
+## Quick Usage Reference
+
+### Download, Normalize, and Export an API Stream to Disk
+
+```python
+import jsonunwrap as ju
+
+target_url = "https://dummyjson.com/carts"  # any endpoint that returns JSON
+output_file = "data/normalized_users.csv"
+
+# Fetch, unwrap nested schemas, and generate a local CSV payload
+df = ju.json_to_csv(target_url, output_file)
+```
+
+---
+
+## Core API Documentation
+
+The package exposes three functions:
+
+### `jsonunwrap.unwrap_data(data)`
+Recursively flattens an in-memory dictionary or list of records. Returns a `pandas.DataFrame`.
+
+### `jsonunwrap.fetch_json(url, **kwargs)`
+Fetches JSON from `url` with `requests.get` and returns the decoded payload. Extra `**kwargs` (for example `headers` or `auth`) are passed through to `requests.get`, and HTTP error responses raise an exception.
+
+### `jsonunwrap.json_to_csv(url, output_path)`
+Fetches JSON from `url`, flattens it with `unwrap_data`, writes the result to `output_path` as CSV (creating the parent directory if needed), and returns the DataFrame.
+
+---
+
+## Contributions & Testing
+
+To test changes locally, install the package in editable mode and run `pytest`:
+
+```bash
+$ git clone https://github.com/njuedominic/json-unwrap
+$ cd json-unwrap
+$ python -m pip install -e .[test] && pytest
+```
|
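The quick-usage example in the README writes to `data/normalized_users.csv`, a path whose parent directory may not exist yet. The sketch below shows, with the standard library only, the directory-handling pattern that `json_to_csv` in `core.py` uses before writing: take `os.path.dirname` of the output path and create it only when it is non-empty. The `write_rows` helper and the sample row are hypothetical; the package itself writes through `DataFrame.to_csv`.

```python
import csv
import os
import tempfile


def write_rows(rows, output_path):
    """Write a non-empty list of dict rows to CSV, creating the parent directory first."""
    directory = os.path.dirname(output_path)
    if directory:  # empty string when output_path has no directory component
        os.makedirs(directory, exist_ok=True)
    fieldnames = list(rows[0].keys())
    with open(output_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return output_path


# Demonstrate inside a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "data", "normalized_users.csv")
    write_rows([{"id": 1, "user.name": "Ada"}], target)
    created = os.path.exists(target)
    print(created)
```

The `if directory:` guard matters because `os.makedirs("")` raises an error, which is what would happen for a bare filename such as `out.csv`.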
jsonunwrap-0.2.2/jsonunwrap/core.py
ADDED
|
@@ -0,0 +1,99 @@
+"""
+This module contains the core functionality for the jsonunwrap package.
+The main function is `json_to_csv`, which fetches JSON from a URL and writes it to a CSV file.
+"""
+import os
+from typing import Any, Dict, List, Union
+import pandas as pd
+import requests
+
+def unwrap_data(data: Union[Dict[str, Any], List[Any]]) -> pd.DataFrame:
+    """
+    Normalizes and deeply flattens semi-structured JSON data into a pandas DataFrame.
+    """
+    # 1. Ensure we start with a clean record list
+    if isinstance(data, dict):
+        # Handle cases where the data is inside a wrapper key (like {"products": [...]})
+        list_keys = [k for k, v in data.items() if isinstance(v, list)]
+        if list_keys and len(data) <= 4:
+            main_data = data[list_keys[0]]
+        else:
+            main_data = [data]
+    else:
+        main_data = data
+
+    # 2. Base normalization
+    df = pd.json_normalize(main_data)
+
+    # 3. Clean Linear Pass: Avoid infinite loops by tracking column states directly
+    columns_to_process = list(df.columns)
+
+    while columns_to_process:
+        col = columns_to_process.pop(0)
+
+        # Guard check if the column was dropped in a previous iteration
+        if col not in df.columns:
+            continue
+
+        non_null_vals = df[col].dropna()
+        if non_null_vals.empty:
+            continue
+
+        # Check for nested dictionaries
+        if any(isinstance(val, dict) for val in non_null_vals):
+            nested_df = pd.json_normalize(non_null_vals).set_index(non_null_vals.index)
+            # Queue the new sub-columns under the names join() gives them (suffix only on collisions)
+            new_cols = [f"{c}_{col}" if c in df.columns and c != col else c for c in nested_df.columns]
+            df = df.drop(columns=[col]).join(nested_df, rsuffix=f"_{col}")
+            columns_to_process.extend(new_cols)
+
+        # Check for nested lists (But do not loop back if it's just raw strings/ints)
+        elif any(isinstance(val, list) for val in non_null_vals):
+            # Check if the list contains dictionaries before exploding heavily
+            first_list = next((v for v in non_null_vals if isinstance(v, list) and v), None)
+
+            df = df.explode(col, ignore_index=True)  # fresh index keeps the later join one-to-one
+
+            # If the inner elements were dictionaries, we need to flatten them on the next pass
+            if first_list and isinstance(first_list[0], dict):
+                columns_to_process.append(col)
+
+    return df
+
+
+def fetch_json(url: str, **kwargs: Any) -> Union[Dict[str, Any], List[Any]]:
+    """
+    Fetches raw JSON data from a URL.
+
+    Args:
+        url: The endpoint web target.
+        **kwargs: Additional arguments passed directly to requests.get (e.g., headers, auth).
+    """
+    response = requests.get(url, **kwargs)
+    response.raise_for_status()
+    return response.json()
+
+
+def json_to_csv(url: str, output_path: str) -> pd.DataFrame:
+    """
+    Fetches JSON from a URL, deeply flattens it, and saves it directly to a CSV file.
+
+    Args:
+        url: The endpoint URL containing the target JSON data.
+        output_path: Target filesystem path where the CSV will be written.
+
+    Returns:
+        The generated pandas DataFrame.
+    """
+    # Ensure the parent output directory exists safely
+    directory = os.path.dirname(output_path)
+    if directory:
+        os.makedirs(directory, exist_ok=True)
+
+    raw_data = fetch_json(url)
+    df = unwrap_data(raw_data)
+
+    df.to_csv(output_path, index=False)
+    return df
+
+
|
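The main behavioural change in `unwrap_data` for 0.2.2 is the wrapper-key step: a dictionary with at most four keys and at least one list value is treated as an envelope, and the first list found becomes the record set. The sketch below isolates that rule in a hypothetical helper, `pick_records`, so its effect can be checked without pandas. The sample payloads are invented for illustration and are not taken from any real endpoint.

```python
from typing import Any, Dict, List, Union


def pick_records(data: Union[Dict[str, Any], List[Any]]) -> List[Any]:
    """Mirror the record-selection step at the top of unwrap_data (hypothetical helper)."""
    if isinstance(data, dict):
        list_keys = [k for k, v in data.items() if isinstance(v, list)]
        if list_keys and len(data) <= 4:
            # Small dict with a list value: treat it as an envelope.
            return data[list_keys[0]]
        # Otherwise the dict itself is the single record.
        return [data]
    return data


# Invented payloads for illustration.
envelope = {"carts": [{"id": 1}, {"id": 2}], "total": 2, "skip": 0, "limit": 30}
wide_record = {"id": 1, "name": "Ada", "tags": ["a"], "city": "X", "zip": "0"}
small_record = {"id": 1, "tags": ["a", "b"]}

print(pick_records(envelope))      # the list under "carts"
print(pick_records(wide_record))   # the record wrapped in a list
print(pick_records(small_record))  # the "tags" list; "id" is dropped
```

The last case shows a limitation of the rule as written: a genuine single record with four or fewer keys and one list field is treated as an envelope, so its scalar fields do not reach the DataFrame.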
jsonunwrap-0.2.2/jsonunwrap.egg-info/PKG-INFO
ADDED
|
@@ -0,0 +1,93 @@
Content identical to jsonunwrap-0.2.2/PKG-INFO above (the same 93 added lines).
|
jsonunwrap-0.2.0/PKG-INFO
DELETED
|
@@ -1,28 +0,0 @@
-Metadata-Version: 2.4
-Name: jsonunwrap
-Version: 0.2.0
-Summary: A small python package that unpacks data from a JSON url and converts it into a csv file.
-Author-email: njuedominic <njuemugodominic@gmail.com>
-Project-URL: Homepage, https://github.com/njuedominic/json-unwrap
-Project-URL: Bug Tracker, https://github.com/njuedominic/json-unwrap/issues
-Classifier: Programming Language :: Python :: 3
-Classifier: License :: OSI Approved :: MIT License
-Classifier: Operating System :: OS Independent
-Requires-Python: >3.8
-Description-Content-Type: text/markdown
-Requires-Dist: requests>=2.28.0
-Requires-Dist: pandas>=2.0.0
-
-### JSON Unwrap
-A small python package that unpacks data from a JSON url and converts it into a csv file.
-
-### Current features
-* Convert one JSON url to a csv
-* Automatically creates a data folder if it does not exist
-
-### Roadmap
-* Custom file names for output csv
-* Convert into a dataframe ready for use in a notebook environment
-* Error handling for a failed url
-* Tests
-* Documentation examples
|
jsonunwrap-0.2.0/README.md
DELETED
|
@@ -1,13 +0,0 @@
-### JSON Unwrap
-A small python package that unpacks data from a JSON url and converts it into a csv file.
-
-### Current features
-* Convert one JSON url to a csv
-* Automatically creates a data folder if it does not exist
-
-### Roadmap
-* Custom file names for output csv
-* Convert into a dataframe ready for use in a notebook environment
-* Error handling for a failed url
-* Tests
-* Documentation examples
|
jsonunwrap-0.2.0/jsonunwrap/core.py
DELETED
|
@@ -1,79 +0,0 @@
-"""
-This module contains the core functionality for the json_unwrap package.
-The main function is `json_to_csv`, which converts a JSON file to a CSV file.
-"""
-import os
-from typing import Any, Dict, List, Union
-import pandas as pd
-import requests
-
-def unwrap_data(data: Union[Dict[str, Any], List[Any]]) -> pd.DataFrame:
-    """
-    Normalizes and deeply flattens semi-structured JSON data into a pandas DataFrame.
-    """
-    # Simply convert a single dictionary into a list containing that dictionary
-    if isinstance(data, dict):
-        main_data = [data]
-    else:
-        main_data = data
-
-    # Perform the initial normalization
-    df = pd.json_normalize(main_data)
-
-    # Automatically iterate through the columns and deeply flatten any nested structures
-    changed = True
-    while changed:
-        changed = False
-        for col in list(df.columns):
-            # Explode lists
-            if any(isinstance(val, list) for val in df[col].dropna()):
-                df = df.explode(col)
-                changed = True
-                break  # Refresh columns list after structural changes
-
-            # Normalize and merge nested dictionaries
-            if any(isinstance(val, dict) for val in df[col].dropna()):
-                nested_df = pd.json_normalize(df[col]).set_index(df.index)
-                df = df.drop(columns=[col]).join(nested_df, rsuffix=f"_{col}")
-                changed = True
-                break
-
-    return df
-
-
-def fetch_json(url: str, **kwargs: Any) -> Union[Dict[str, Any], List[Any]]:
-    """
-    Fetches raw JSON data from a URL.
-
-    Args:
-        url: The endpoint web target.
-        **kwargs: Additional arguments passed directly to requests.get (e.g., headers, auth).
-    """
-    response = requests.get(url, **kwargs)
-    response.raise_for_status()
-    return response.json()
-
-
-def json_to_csv(url: str, output_path: str) -> pd.DataFrame:
-    """
-    Fetches JSON from a URL, deeply flattens it, and saves it directly to a CSV file.
-
-    Args:
-        url: The endpoint URL containing the target JSON data.
-        output_path: Target filesystem path where the CSV will be written.
-
-    Returns:
-        The generated pandas DataFrame.
-    """
-    # Ensure the parent output directory exists safely
-    directory = os.path.dirname(output_path)
-    if directory:
-        os.makedirs(directory, exist_ok=True)
-
-    raw_data = fetch_json(url)
-    df = unwrap_data(raw_data)
-
-    df.to_csv(output_path, index=False)
-    return df
-
-
|
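Both versions of `core.py` merge flattened sub-columns back with `df.drop(columns=[col]).join(nested_df, rsuffix=f"_{col}")`. In pandas, the `rsuffix` is applied only to column names that collide with columns already on the left side, so most nested keys keep their plain names. The sketch below models that naming rule in pure Python. `joined_names` is a hypothetical helper and the column names are invented for illustration.

```python
from typing import Iterable, List


def joined_names(existing: Iterable[str], nested: Iterable[str], parent: str) -> List[str]:
    """Names the nested columns receive after drop(parent).join(nested, rsuffix=f"_{parent}")."""
    # The parent column is dropped before the join, so it cannot collide.
    remaining = {name for name in existing if name != parent}
    return [f"{name}_{parent}" if name in remaining else name for name in nested]


# Invented column names: a cart table whose "products" column held dictionaries.
names = joined_names(["id", "total", "products"], ["id", "title", "price"], "products")
print(names)
```

Under this rule, code that needs to revisit the newly joined columns has to look them up by these resulting names rather than assuming every one carries the suffix.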
jsonunwrap-0.2.0/jsonunwrap.egg-info/PKG-INFO
DELETED
|
@@ -1,28 +0,0 @@
Content identical to jsonunwrap-0.2.0/PKG-INFO above (the same 28 deleted lines).
|
File without changes (7 files): jsonunwrap/__init__.py, jsonunwrap.egg-info/SOURCES.txt, jsonunwrap.egg-info/dependency_links.txt, jsonunwrap.egg-info/requires.txt, jsonunwrap.egg-info/top_level.txt, setup.cfg, tests/test_core.py