Tabular-Enhancement-Tool 0.1.2__tar.gz
- tabular_enhancement_tool-0.1.2/LICENSE +21 -0
- tabular_enhancement_tool-0.1.2/PKG-INFO +279 -0
- tabular_enhancement_tool-0.1.2/README.md +251 -0
- tabular_enhancement_tool-0.1.2/Tabular_Enhancement_Tool.egg-info/PKG-INFO +279 -0
- tabular_enhancement_tool-0.1.2/Tabular_Enhancement_Tool.egg-info/SOURCES.txt +17 -0
- tabular_enhancement_tool-0.1.2/Tabular_Enhancement_Tool.egg-info/dependency_links.txt +1 -0
- tabular_enhancement_tool-0.1.2/Tabular_Enhancement_Tool.egg-info/entry_points.txt +2 -0
- tabular_enhancement_tool-0.1.2/Tabular_Enhancement_Tool.egg-info/requires.txt +10 -0
- tabular_enhancement_tool-0.1.2/Tabular_Enhancement_Tool.egg-info/top_level.txt +2 -0
- tabular_enhancement_tool-0.1.2/setup.cfg +4 -0
- tabular_enhancement_tool-0.1.2/setup.py +29 -0
- tabular_enhancement_tool-0.1.2/tabular_enhancement_tool/__init__.py +16 -0
- tabular_enhancement_tool-0.1.2/tabular_enhancement_tool/cli.py +163 -0
- tabular_enhancement_tool-0.1.2/tabular_enhancement_tool/core.py +309 -0
- tabular_enhancement_tool-0.1.2/tests/__init__.py +0 -0
- tabular_enhancement_tool-0.1.2/tests/test_cli.py +457 -0
- tabular_enhancement_tool-0.1.2/tests/test_complex_data.py +134 -0
- tabular_enhancement_tool-0.1.2/tests/test_core.py +316 -0
- tabular_enhancement_tool-0.1.2/tests/test_sqlalchemy.py +131 -0
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Christopher Boyd
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
@@ -0,0 +1,279 @@
+Metadata-Version: 2.4
+Name: Tabular-Enhancement-Tool
+Version: 0.1.2
+Summary: A Python package for asynchronously enhancing tabular files via APIs.
+Author: Christopher Boyd and co-authored/developed by Junie, an autonomous programmer developed by JetBrains.
+Maintainer: Christopher Boyd
+Requires-Python: >=3.6
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: pandas
+Requires-Dist: requests
+Requires-Dist: openpyxl
+Requires-Dist: sqlalchemy
+Requires-Dist: pyarrow
+Requires-Dist: fastparquet
+Provides-Extra: test
+Requires-Dist: pytest; extra == "test"
+Requires-Dist: coverage; extra == "test"
+Dynamic: author
+Dynamic: description
+Dynamic: description-content-type
+Dynamic: license-file
+Dynamic: maintainer
+Dynamic: provides-extra
+Dynamic: requires-dist
+Dynamic: requires-python
+Dynamic: summary
+
+# Tabular-Enhancement-Tool
+
+[](https://tabular-enhancement-tool.readthedocs.io/en/latest/?badge=latest)
+[](https://badge.fury.io/py/Tabular-Enhancement-Tool)
+[](https://github.com/Junie/Tabular-Enhancement-Tool/actions/workflows/python-tests.yml)
+[](https://codecov.io/gh/Mikuana/tabular-enhancement-tool)
+[](https://github.com/astral-sh/ruff)
+
+A Python package for asynchronously enhancing tabular files (CSV, Excel, TSV, TXT, Parquet) by calling external APIs for each row.
+
+## Why Tabular-Enhancement-Tool?
+
+In modern data lake architectures, raw tabular data (e.g., event logs, daily exports, customer records) often arrives in formats like CSV, Excel, or TSV. To make this data actionable, it frequently needs to be enriched with information residing in other systems (such as CRM details, geolocation data, or legacy internal services) that is accessible only via REST APIs.
+
+The **Tabular-Enhancement-Tool (tet)** is designed to streamline this enrichment process:
+
+- **Multi-source enhancement**: Fetches data from external JSON-based REST APIs or SQLAlchemy-compatible databases.
+- **High Performance via Multi-threading**: Instead of sequential processing, which can take hours for large files, this tool utilizes a thread pool to handle hundreds of rows concurrently.
+- **Data Integrity and Precision**: The tool instructs Pandas to treat all inputs as strings, ensuring that original data, like ZIP codes with leading zeros or numeric IDs, is **retained exactly** as it appeared in the source.
+- **Append-Only Enhancement**: Your original columns are never modified. The responses are appended as new columns, allowing you to preserve the lineage of the raw data while adding new value.
+- **Response Flattening**: By default, the tool expands API/Database response objects into individual columns, making the data immediately available for analysis. For REST APIs, the tool automatically extracts the `data` field from the JSON response if present, focusing on the core payload. This behavior can be disabled if a single nested object is preferred.
+- **Strict Order Preservation**: Even with parallel execution, the output rows are guaranteed to match the order of the input file, making it safe for downstream processes that rely on stable indexing.
+- **Flexible field mapping**: Map DataFrame columns to API payload fields or database query filters.
+- **HTTP GET and POST support**: Choose the appropriate method for your API, with support for URL templating and query parameters.
+- **REST API Authentication**: Supports Basic Auth, Bearer Token, and API Key authentication schemes.
+- **SQLAlchemy Integration**: Supports any database with a SQLAlchemy dialect (PostgreSQL, MySQL, SQLite, Oracle, SQL Server, etc.).
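
Editorial sketch (not part of the package): the thread-pool, order-preservation, append-only, and flattening behaviour described in the feature list can be illustrated with the standard library alone. This is a minimal sketch under stated assumptions, not the tool's implementation; `fake_lookup`, `flatten`, the `delay` field, and the dotted column names are hypothetical and exist only for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_lookup(row):
    """Hypothetical stand-in for one API call; earlier rows answer more slowly."""
    time.sleep(row["delay"])
    return {"data": {"zip_length": len(row["zip"]), "geo": {"prefix": row["zip"][:3]}}}


def flatten(obj, prefix=""):
    """Expand nested dicts into a single level with dotted keys."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# Values are kept as strings, so leading zeros survive.
rows = [
    {"zip": "00501", "delay": 0.03},
    {"zip": "02134", "delay": 0.02},
    {"zip": "10001", "delay": 0.01},
]

with ThreadPoolExecutor(max_workers=5) as pool:
    # Executor.map yields results in input order, whatever order the calls finish in.
    responses = list(pool.map(fake_lookup, rows))

# Unwrap the "data" field when present, then append the flattened keys to each row.
enhanced = [
    {**row, **flatten(resp.get("data", resp))}
    for row, resp in zip(rows, responses)
]
```

Because `Executor.map` returns results in submission order, no re-sorting step is needed even though the last row finishes first here.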
+
+## Installation
+
+You can install the package directly from the source directory:
+
+```bash
+pip install .
+```
+
+This will automatically install the required dependencies (`pandas`, `requests`, `openpyxl`, `sqlalchemy`, `pyarrow`, `fastparquet`) and provide the `tabular-enhancer` command.
+
+## Usage
+
+### Command Line Interface (CLI)
+
+After installation, you can run the tool using the `tabular-enhancer` command:
+
+```bash
+tabular-enhancer input_data.csv \
+  --api_url "https://api.example.com/process" \
+  --mapping '{"api_field_1": "csv_column_a", "api_field_2": "csv_column_b"}' \
+  --max_workers 10
+```
+
+**Arguments:**
+- `input_file`: Path to your CSV, Excel, TSV, TXT, or Parquet file.
+- `--max_workers`: (Optional) Number of concurrent threads (default: 5).
+- `--no_flatten`: (Optional) Do not expand response objects into individual columns.
+
+**API Options:**
+- `--api_url`: The endpoint to which each request is sent (`POST` by default; see `--method`).
+- `--mapping`: A JSON string mapping API payload keys to your file's column names, e.g. `'{"api_field": "csv_column"}'`.
+- `--method`: (Optional) HTTP method to use (`POST` or `GET`, default: `POST`).
+- `--auth_type`: (Optional) Authentication type (`basic`, `bearer`, or `apikey`).
+- `--auth_user`: (Optional) Username for `basic` auth.
+- `--auth_pass`: (Optional) Password for `basic` auth.
+- `--auth_token`: (Optional) Token for `bearer` or `apikey` auth.
+- `--auth_header`: (Optional) Custom header for `apikey` auth (default: `X-API-Key`).
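
Editorial sketch (not part of the package): one plausible way the three `--auth_type` values could translate into HTTP request headers. The function `build_auth_headers` and its parameter names are hypothetical, and the tool's internal handling may differ. The `Basic` and `Bearer` formats follow the standard `Authorization` header conventions, and the `X-API-Key` default is taken from the option list above.

```python
import base64


def build_auth_headers(auth_type, user=None, password=None, token=None,
                       header="X-API-Key"):
    """Return the headers implied by an auth scheme (hypothetical helper)."""
    if auth_type == "basic":
        # Basic auth: base64 of "user:password".
        raw = f"{user}:{password}".encode("utf-8")
        return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
    if auth_type == "bearer":
        return {"Authorization": f"Bearer {token}"}
    if auth_type == "apikey":
        # The key is sent in a custom header, X-API-Key unless overridden.
        return {header: token}
    return {}


basic = build_auth_headers("basic", user="admin", password="secret")
bearer = build_auth_headers("bearer", token="your_token")
apikey = build_auth_headers("apikey", token="your_api_key")
```

Passing secrets on the command line exposes them to shell history and process listings, so a real deployment would more likely read them from the environment.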
+
+**SQLAlchemy Options:**
+- `--db_url`: The SQLAlchemy connection URL to the target database.
+- `--table_name`: Name of the table to query for enhancement.
+- `--mapping`: A JSON list of column names in your file to be used as filters (WHERE clause) for the query. e.g. `'["email_address"]'`.
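
Editorial sketch (not part of the package): how a list mapping such as `["email_address"]` could become a WHERE clause for a per-row lookup. To stay dependency-free, this uses the standard-library `sqlite3` module rather than SQLAlchemy, so it illustrates the idea only. The `users` table, its contents, and the `lookup` helper are hypothetical.

```python
import sqlite3

# In-memory database with one sample record (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE users (email_address TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('ada@example.com', 'admin')")


def lookup(row, mapping, table="users"):
    """Return the first record matching the mapped columns, or None."""
    # Identifiers cannot be bound as parameters, so they are quoted here
    # (this assumes trusted column names); values are bound with '?'.
    where = " AND ".join(f'"{col}" = ?' for col in mapping)
    sql = f'SELECT * FROM "{table}" WHERE {where}'
    found = conn.execute(sql, [row[col] for col in mapping]).fetchone()
    return dict(found) if found is not None else None


match = lookup({"email_address": "ada@example.com"}, ["email_address"])
miss = lookup({"email_address": "nobody@example.com"}, ["email_address"])
```

Each column in the mapping adds one equality condition, so a longer list narrows the match further.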
+
+**CLI Usage Examples:**
+
+```bash
+# REST API Enhancement
+tabular-enhancer input.csv \
+  --api_url "https://api.example.com/process" \
+  --mapping '{"user_id": "id"}'
+
+# SQLAlchemy Database Enhancement
+tabular-enhancer data.xlsx \
+  --db_url "postgresql://user:pass@localhost/dbname" \
+  --table_name "users" \
+  --mapping '["email_address"]'
+```
+
+**CLI Authentication Examples:**
+
+```bash
+# Basic Auth
+tabular-enhancer data.csv --api_url "..." --mapping '...' --auth_type basic --auth_user "admin" --auth_pass "secret"
+
+# Bearer Token
+tabular-enhancer data.csv --api_url "..." --mapping '...' --auth_type bearer --auth_token "your_token"
+
+# API Key
+tabular-enhancer data.csv --api_url "..." --mapping '...' --auth_type apikey --auth_token "your_api_key"
+
+# GET request with URL templating
+tabular-enhancer data.csv --api_url "https://api.weather.gov/points/{lat},{lon}" --mapping '{"lat": "latitude", "lon": "longitude"}' --method GET
+```
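
Editorial sketch (not part of the package): how a URL template such as `https://api.weather.gov/points/{lat},{lon}` might be filled from one row using the mapping. The helper `build_url` is hypothetical. Sending mapped fields that do not appear in the template as query parameters is an assumption made for this illustration, not confirmed behaviour of the tool.

```python
from string import Formatter
from urllib.parse import quote, urlencode


def build_url(template, mapping, row):
    """Fill {placeholders} from the row; leftover fields become query params."""
    # mapping is {api_field: column_name}; resolve each field to the row's value.
    values = {field: row[column] for field, column in mapping.items()}
    # Collect the placeholder names that actually occur in the template.
    in_path = {name for _, name, _, _ in Formatter().parse(template) if name}
    path = template.format(**{k: quote(str(values[k]), safe="") for k in in_path})
    query = {k: v for k, v in values.items() if k not in in_path}
    return f"{path}?{urlencode(query)}" if query else path


row = {"latitude": "39.7456", "longitude": "-97.0892", "units": "si"}
plain = build_url(
    "https://api.weather.gov/points/{lat},{lon}",
    {"lat": "latitude", "lon": "longitude"},
    row,
)
with_query = build_url(
    "https://api.weather.gov/points/{lat},{lon}",
    {"lat": "latitude", "lon": "longitude", "units": "units"},
    row,
)
```

Percent-encoding each substituted value keeps a stray `/` or `?` in the data from altering the request path.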
+
+### REST API Enhancement
+
+```python
+import pandas as pd
+import tabular_enhancement_tool as tet
+
+# Load data
+df = tet.read_tabular_file("my_data.xlsx")
+
+# API Configuration
+api_url = "https://api.example.com/v1/enrich"
+mapping = {"user_id": "ID"}
+enhancer = tet.TabularEnhancer(api_url, mapping)
+
+# Process
+df_enhanced = enhancer.process_dataframe(df)
+
+# Save
+tet.save_tabular_file(df_enhanced, "my_data.xlsx")
+```
+
+### REST API Enhancement (POST Example)
+
+The following example demonstrates how to use the `httpbin.org` public API to simulate posting data from a CSV file.
+
+```python
+import tabular_enhancement_tool as tet
+
+# Load data
+df = tet.read_tabular_file("examples/posts_data.csv")
+
+# HTTPBin API configuration
+api_url = "https://httpbin.org/post"
+mapping = {
+    "title": "title",
+    "body": "body",
+    "userId": "userId"
+}
+
+enhancer = tet.TabularEnhancer(api_url, mapping, method="POST")
+
+# Process
+df_enhanced = enhancer.process_dataframe(df)
+
+# Save
+tet.save_tabular_file(df_enhanced, "examples/posts_data.csv", suffix="_enhanced")
+```
+
+### REST API Enhancement (GET with URL Templating)
+
+```python
+import tabular_enhancement_tool as tet
+
+# Load data with coordinates
+df = tet.read_tabular_file("cities.csv")
+
+# NWS API example
+api_url = "https://api.weather.gov/points/{lat},{lon}"
+mapping = {"lat": "lat", "lon": "lon"}
+headers = {"User-Agent": "(myweatherapp.com, contact@example.com)"}
+
+enhancer = tet.TabularEnhancer(api_url, mapping, method="GET", headers=headers)
+
+# Process
+df_enhanced = enhancer.process_dataframe(df)
+```
+
+### SQLAlchemy Database Enhancement (Core)
+
+```python
+import pandas as pd
+import tabular_enhancement_tool as tet
+
+# Load data
+df = tet.read_tabular_file("data.csv")
+
+# SQLAlchemy Configuration
+db_url = "postgresql://user:pass@localhost/dbname"
+mapping = ["ID"]
+enhancer = tet.ODBCEnhancer(db_url, mapping, table_name="orders")
+
+# Process
+df_enhanced = enhancer.process_dataframe(df)
+
+# Save
+tet.save_tabular_file(df_enhanced, "data.csv")
+```
+
+### SQLAlchemy Database Enhancement (ORM)
+
+```python
+from sqlalchemy.orm import DeclarativeBase
+from sqlalchemy import Column, Integer, String
+import tabular_enhancement_tool as tet
+
+class Base(DeclarativeBase):
+    pass
+
+class User(Base):
+    __tablename__ = "users"
+    id = Column(Integer, primary_key=True)
+    name = Column(String)
+    role = Column(String)
+
+# Load data
+df = tet.read_tabular_file("data.csv")
+
+# Process using ORM model
+enhancer = tet.ODBCEnhancer("sqlite:///mydb.db", mapping=["id"], model=User)
+df_enhanced = enhancer.process_dataframe(df)
+```
+
+### SQLAlchemy Database Enhancement (SQLite Example)
+
+This example shows how to use the `ODBCEnhancer` with a SQLite database to enrich a CSV file.
+
+```python
+import tabular_enhancement_tool as tet
+
+# Load data
+df = tet.read_tabular_file("users.csv")
+
+# SQLite connection URL
+db_url = "sqlite:///company_data.db"
+
+# Match by 'email' and fetch related columns from the 'employees' table
+enhancer = tet.ODBCEnhancer(
+    connection_url=db_url,
+    mapping=["email"],
+    table_name="employees"
+)
+
+# Process and save
+df_enhanced = enhancer.process_dataframe(df)
+tet.save_tabular_file(df_enhanced, "users.csv", suffix="_enriched")
+```
+
+## License
+
+Distributed under the MIT License. See `LICENSE` for more information.
+
+## Development and CI/CD
+
+- **Linting & Formatting**: Ruff is used to maintain high code quality and consistent style.
+- **Documentation**: Managed by Sphinx and hosted on [Read the Docs](https://tabular-enhancement-tool.readthedocs.io/en/latest/). For more detailed examples and API documentation, please visit the official documentation site.
+- **Publishing**: The `main` branch is automatically built and published to PyPI on every push. **Note**: Remember to bump the version in `setup.py` and `tabular_enhancement_tool/__init__.py` before pushing to `main`.
+
+## Credits
+
+This tool was authored by **Christopher Boyd** and co-authored/developed by **Junie**, an autonomous programmer developed by JetBrains.