apiphany 3.0.0__tar.gz

apiphany-3.0.0/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1,340 @@
+ Metadata-Version: 2.4
+ Name: apiphany
+ Version: 3.0.0
+ Summary: Apiphany is a high-performance API orchestration engine. It transforms the chaotic process of extracting data from complex REST and GraphQL APIs into a clean, strictly-typed, and fully declarative workflow. Powered by `httpx` and `asyncio`, Apiphany tears through deeply chained API requests concurrently without ever triggering rate limits, handles massive multi-page payloads automatically, and seamlessly unpacks deeply nested JSON blobs into pristine Pandas DataFrames or SQL tables.
+ Author-email: Vatsa <228140210+svr-s@users.noreply.github.com>
+ Project-URL: Repository, https://github.com/svr-s/apiphany
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Operating System :: OS Independent
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: httpx>=0.24.0
+ Requires-Dist: aiolimiter>=1.0.0
+ Requires-Dist: pydantic>=2.0.0
+ Requires-Dist: python-json-logger>=2.0.0
+ Provides-Extra: aws
+ Requires-Dist: boto3>=1.26.0; extra == "aws"
+ Provides-Extra: sql
+ Requires-Dist: pandas>=1.5.0; extra == "sql"
+ Requires-Dist: sqlalchemy>=2.0.0; extra == "sql"
+ Dynamic: license-file
+
+ # Apiphany
+
+ Apiphany is a high-performance API orchestration engine. It turns data extraction from complex REST and GraphQL APIs into a strictly typed, fully declarative workflow. Powered by `httpx` and `asyncio`, Apiphany runs deeply chained API requests concurrently while staying within a configured rate limit, pages through multi-page payloads automatically, and unpacks deeply nested JSON into Pandas DataFrames or SQL tables.
+
+ ## Features
+ - **Asynchronous Concurrency**: Built entirely on `httpx` and `asyncio`, the engine can keep hundreds of requests in flight at once with a small memory footprint.
+ - **Enterprise Rate Limiting**: Built-in `aiolimiter` support caps execution at `requests_per_second`, which helps you avoid IP bans.
+ - **Chained Requests**: Dynamically feed the extracted outputs of one API call into another as parameters (e.g., fetch Users -> Posts -> Comments).
+ - **Deep Extraction**: Native integration with `json_extract_pandas` to extract, unpack, and normalize nested JSON responses into clean DataFrames.
+ - **Incremental State Tracking**: Native `state.json` watermarking. Works locally or via S3 (`s3://bucket/state.json`) to persist the latest timestamps or IDs fetched.
+ - **Secrets Management**: Dynamically fetch client credentials or API keys directly from AWS Secrets Manager using ARNs.
+ - **Robust Failure Handling**: Automatic exponential backoff for server errors (`500`, `502`, `503`, `504`).
+ - **Cloud-Agnostic Logging**: Emits pure JSON logs using `python-json-logger`, natively compatible with AWS CloudWatch and Datadog.
+
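The retry behaviour named in the feature list can be sketched roughly as follows. This is a minimal illustration of exponential backoff with jitter, not Apiphany's actual implementation: `backoff_delays`, `call_with_backoff`, the `send` callable, and the `base` and `cap` values are all hypothetical names and numbers chosen for the example.

```python
import asyncio
import random

# Status codes the README lists as retryable server errors.
RETRYABLE_STATUSES = {500, 502, 503, 504}


def backoff_delays(total_retries, base=0.5, cap=30.0):
    """Return the un-jittered wait before each retry: base * 2**attempt, capped."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(total_retries)]


async def call_with_backoff(send, total_retries=3, base=0.5, sleep=asyncio.sleep):
    """Call `send()` until it returns a non-retryable status or retries run out.

    `send` is a hypothetical coroutine function returning (status_code, body).
    """
    status, body = await send()
    for delay in backoff_delays(total_retries, base):
        if status not in RETRYABLE_STATUSES:
            break
        # Jitter spreads retries out so that clients do not retry in lockstep.
        await sleep(random.uniform(0, delay))
        status, body = await send()
    return status, body
```

With `total_retries=3` and the assumed `base` of 0.5 seconds, the upper bounds on the three waits are 0.5, 1 and 2 seconds. The `sleep` parameter exists only so the waits can be replaced in tests.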
+ ## Requirements
+ - Python 3.8+
+ - `httpx`
+ - `aiolimiter`
+ - `pydantic`
+ - `python-json-logger`
+
+ *(Optional, depending on the features you use)*
+ - `boto3` (for S3 exports, state management, and AWS Secrets Manager)
+ - `pandas` & `sqlalchemy` (for SQL database exports)
+
+ ---
+
+ ## Installation & Usage
+
+ Install the core package:
+ ```bash
+ pip install apiphany
+ ```
+
+ To install with AWS capabilities (S3 exports and AWS Secrets Manager) or SQL export capabilities, request the matching extra. The quotes keep shells such as zsh from expanding the square brackets:
+ ```bash
+ pip install "apiphany[aws]"
+ pip install "apiphany[sql]"
+ ```
+
+ ### Python API
+ Import `APIOrchestrator` directly into your data pipeline, Lambda function, or test suite:
+
+ ```python
+ import asyncio
+ from apiphany import APIOrchestrator
+
+ # 1. Initialize the client
+ client = APIOrchestrator(
+     config_file="apiphany_config.json",
+     entity_name="jsonplaceholder"
+ )
+
+ # 2. Execute an API asynchronously
+ # output_format can be "raw" (list of JSON dicts) or "flattened" (Pandas DataFrame)
+ df = asyncio.run(client.execute(
+     api_identifier="get_users",
+     output_format="flattened"
+ ))
+
+ print(df.head())
+ ```
+
+ ---
+
+ ## Configuration Schema Details
+
+ Apiphany is driven entirely by a declarative JSON configuration file (`apiphany_config.json`). Every attribute is strictly validated at runtime by `pydantic`.
+
+ ### 1. Root Entity Structure
+
+ The top-level `api_config` array holds one object per logical group of APIs (an Entity), together with that entity's global credentials and its retry and rate limit rules.
+
+ - **`entity_name`** (`string`): The logical name of the API provider (e.g., `"stripe"`, `"salesforce"`).
+ - **`post_process`** (`string`): *(Optional)* Filepath to a custom Python script to mutate raw JSON before extraction.
+ - **`certificates`** (`dict`): *(Optional)* Must contain `"cert"` and `"key"` paths. Supports local or `s3://` URIs for mTLS.
+ - **`client_credentials`** (`dict`): Global dictionary for secrets. Supports templating (`{{variable}}`). If you provide `"secrets_manager_arn"`, Apiphany resolves it directly from AWS Secrets Manager.
+ - **`retry_config`** (`dict`): Contains `total_retries` (default: 3) and `requests_per_second` (default: 10).
+ - **`api_list`** (`list`): The core array of `APISchema` endpoint objects.
+
+ **Example:**
+ ```json
+ {
+   "api_config": [
+     {
+       "entity_name": "stripe",
+       "client_credentials": {
+         "secrets_manager_arn": "arn:aws:secretsmanager:us-east-1:123:secret:stripe-keys"
+       },
+       "retry_config": {
+         "total_retries": 5,
+         "requests_per_second": 20
+       },
+       "api_list": [ ... ]
+     }
+   ]
+ }
+ ```
+
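Several fields support `{{variable}}` templating. As a rough illustration of how such placeholders might be resolved, here is a minimal sketch using only the standard library. The `render` helper is hypothetical and not part of Apiphany, and leaving unknown placeholders untouched is an assumption made for the example rather than documented behaviour.

```python
import re

# Matches {{name}} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(value, variables):
    """Recursively replace {{name}} placeholders in strings, dicts, and lists."""
    if isinstance(value, str):
        def substitute(match):
            name = match.group(1)
            # Assumption: unknown placeholders are left as-is rather than raising.
            return str(variables[name]) if name in variables else match.group(0)
        return PLACEHOLDER.sub(substitute, value)
    if isinstance(value, dict):
        return {key: render(item, variables) for key, item in value.items()}
    if isinstance(value, list):
        return [render(item, variables) for item in value]
    return value  # numbers, booleans, and None pass through unchanged


print(render("https://api.com/users/{{id}}", {"id": 7}))
# prints https://api.com/users/7
```

Walking dicts and lists means the same helper could cover a URL string, a `query_params` dictionary, or a nested payload.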
+ ---
+
+ ### 2. API Object (`api_list`)
+
+ Each object inside `api_list` defines a single executable endpoint.
+
+ - **`api_identifier`** (`string`): Unique ID used to execute the endpoint via `.execute(api_identifier="my_api")`.
+ - **`api_name`** (`string`): Human-readable description.
+ - **`method`** (`string`): HTTP method (`"GET"`, `"POST"`, `"PUT"`, etc.).
+ - **`url`** (`string`): The URL. Supports templating from parent outputs (e.g., `https://api.com/users/{{id}}`).
+ - **`auth_type`** (`string`): Authentication type: `"None"`, `"Bearer"`, `"Basic"`, or `"APIKey"`.
+ - **`api_key_name`** (`string`): If `auth_type` is `"APIKey"`, the header key name (default: `"x-api-key"`).
+ - **`headers`** (`dict`): Static HTTP headers to inject.
+ - **`query_params`** (`dict`): Static URL parameters to inject. Supports templating.
+ - **`payload`** (`dict`): JSON body payload for POST/PUT requests.
+ - **`graphql_query`** (`string`): If supplied, forces a POST request and wraps the string as `{"query": "..."}`. Supports templating.
+
+ **Example:**
+ ```json
+ {
+   "api_identifier": "create_user",
+   "api_name": "Create a new User",
+   "method": "POST",
+   "url": "https://api.example.com/users",
+   "auth_type": "Bearer",
+   "headers": {
+     "Content-Type": "application/json"
+   },
+   "payload": {
+     "name": "Jane",
+     "role": "admin"
+   }
+ }
+ ```
+
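To make the interplay of `auth_type`, `api_key_name`, and `graphql_query` concrete, the sketch below turns one API object into the pieces of an HTTP request. It is an illustration under assumptions, not Apiphany's code: `build_request` is a hypothetical helper, and the credential key names (`token`, `username`, `password`, `api_key`) are invented for the example because the README does not specify them.

```python
import base64


def build_request(api, credentials):
    """Translate one `api_list` entry into (method, url, headers, json_body)."""
    headers = dict(api.get("headers", {}))
    auth_type = api.get("auth_type", "None")
    if auth_type == "Bearer":
        headers["Authorization"] = f"Bearer {credentials['token']}"
    elif auth_type == "Basic":
        pair = f"{credentials['username']}:{credentials['password']}".encode()
        headers["Authorization"] = "Basic " + base64.b64encode(pair).decode()
    elif auth_type == "APIKey":
        # The header name falls back to the documented default.
        headers[api.get("api_key_name", "x-api-key")] = credentials["api_key"]

    method, body = api.get("method", "GET"), api.get("payload")
    if api.get("graphql_query"):
        # A GraphQL query forces POST and is wrapped as {"query": "..."}.
        method, body = "POST", {"query": api["graphql_query"]}
    return method, api["url"], headers, body
```

The returned tuple could then be handed to an HTTP client; the sketch stops before that step to stay independent of any particular client library.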
+ ---
+
+ ### 3. Pagination Configuration
+
+ Apiphany automatically pages through results until a page comes back empty or, with the `"no_data"` stop condition, shorter than the requested limit.
+
+ - **`type`** (`string`): `"page_based"` or `"offset_based"`.
+ - **`page_key`** (`string`): Parameter name for page number (default: `"page"`).
+ - **`size_key`** (`string`): Parameter name for page size (default: `"limit"`).
+ - **`page_size`** (`int`): Number of records to request per page (default: 100).
+ - **`offset_key`** (`string`): Parameter name for offset (default: `"offset"`).
+ - **`limit_key`** (`string`): Parameter name for limit (default: `"limit"`).
+ - **`limit_value`** (`int`): Offset increment amount (default: 100).
+ - **`stop_condition`** (`string`): Defines when to stop. `"no_data"` stops when the returned array is shorter than the limit.
+
+ **Example (Offset Based):**
+ ```json
+ "pagination": {
+   "type": "offset_based",
+   "offset_key": "start",
+   "limit_key": "limit",
+   "limit_value": 500
+ }
+ ```
+
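The offset-based loop described above can be sketched in a few lines. This is a simplified, synchronous illustration rather than the library's implementation; `paginate_offset` and the `fetch_page` callable are hypothetical names.

```python
def paginate_offset(fetch_page, offset_key="offset", limit_key="limit", limit_value=100):
    """Collect records page by page until a page is shorter than the limit.

    `fetch_page` is a hypothetical callable that takes a dict of query
    parameters and returns a list of records.
    """
    records, offset = [], 0
    while True:
        page = fetch_page({offset_key: offset, limit_key: limit_value})
        records.extend(page)
        if len(page) < limit_value:  # the "no_data" stop condition
            return records
        offset += limit_value
```

With the configuration from the example (`offset_key` of `"start"`, `limit_value` of 500) and 1,200 records on the server, this loop would request `start=0`, `start=500`, and `start=1000`, then stop because the third page holds only 200 records. When the total is an exact multiple of the limit, one extra request returning an empty page is needed to detect the end.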
+ ---
+
+ ### 4. Chained Requests
+
+ Dynamically feeds the output of this API into a child API.
+
+ - **`child_api_identifier`** (`string`): The `api_identifier` of the next endpoint to trigger.
+ - **`key_mapping`** (`dict`): Maps parent JSON keys to child URL template parameters (e.g., `{"id": "userId"}`).
+ - **`max_concurrent_requests`** (`int`): Maximum number of concurrent requests hitting the child API (default: 5).
+ - **`batch_size`** (`int`): If > 1, combines parent values into comma-separated lists (e.g., `1,2,3,4,5`).
+
+ **Example (Batched Child Resolution):**
+ ```json
+ "chained_request": {
+   "child_api_identifier": "get_user_posts",
+   "key_mapping": {
+     "id": "userIds"
+   },
+   "max_concurrent_requests": 10,
+   "batch_size": 20
+ }
+ ```
+ *If 100 IDs are fetched, Apiphany groups them into 5 concurrent requests of 20 IDs each, injecting a list such as `?userIds=1,2,3...20` into the child URL.*
+
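The batching arithmetic and the concurrency cap can be illustrated with a short `asyncio` sketch. It is an assumption-laden illustration, not the library's internals: `make_batches`, `run_children`, and the `call_child` coroutine are hypothetical, and the sketch handles a single `key_mapping` entry only.

```python
import asyncio


def make_batches(values, batch_size):
    """Group values into comma-separated strings of at most `batch_size` items."""
    size = max(1, batch_size)
    return [
        ",".join(str(v) for v in values[i:i + size])
        for i in range(0, len(values), size)
    ]


async def run_children(parent_rows, key_mapping, call_child,
                       batch_size=1, max_concurrent_requests=5):
    """Fan parent values out to a child API with bounded concurrency.

    `call_child` is a hypothetical coroutine function that takes a dict of
    template parameters and performs one child request.
    """
    semaphore = asyncio.Semaphore(max_concurrent_requests)

    async def guarded(params):
        async with semaphore:  # at most N child requests in flight
            return await call_child(params)

    # Sketch limitation: only the first key_mapping entry is used.
    parent_key, child_param = next(iter(key_mapping.items()))
    batches = make_batches([row[parent_key] for row in parent_rows], batch_size)
    return await asyncio.gather(*(guarded({child_param: b}) for b in batches))
```

For 100 parent IDs and a `batch_size` of 20, `make_batches` yields 5 strings, which matches the 5 child requests described in the note above.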
+ ---
+
+ ### 5. Extractor Configuration
+
+ Defines how deeply nested API JSON is parsed and flattened into Pandas DataFrames. Powered by `json_extract_pandas`.
+
+ - **`response_data_extractor`** (`string`): Target JSON path to pluck the core array from the root response (e.g., `"data.results"`).
+ - **`json_extract_config.record_path`** (`list`): Path to the nested array within the row to explode into multiple rows (e.g., `["line_items"]`).
+ - **`json_extract_config.meta`** (`list`): Fields to duplicate across every exploded row (e.g., `["transaction_id", ["user", "name"]]`).
+ - **`json_extract_config.errors`** (`string`): How to handle missing keys (`"raise"` or `"ignore"`).
+
+ **Example:**
+ ```json
+ "extractor_config": {
+   "response_data_extractor": "data.results",
+   "json_extract_config": {
+     "record_path": [
+       "line_items"
+     ],
+     "meta": [
+       "transaction_id",
+       "timestamp",
+       ["user", "name"]
+     ]
+   }
+ }
+ ```
+ *The above will flatten an array of `line_items` while ensuring `transaction_id`, `timestamp`, and `user.name` are attached as columns to every item's row.*
+
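To show what `record_path` and `meta` do to a payload, here is a dependency-free sketch of the flattening step. It does not use `json_extract_pandas`; `get_path`, `extract`, and `flatten` are hypothetical helpers, and filling a missing meta field with `None` under `errors="ignore"` is an assumption. The parameter names mirror those of `pandas.json_normalize`, which accepts `record_path`, `meta`, and `errors` as well.

```python
def get_path(obj, path):
    """Follow a key or a list of keys into nested dicts."""
    for key in ([path] if isinstance(path, str) else path):
        obj = obj[key]
    return obj


def extract(response, dotted_path):
    """Pluck the core array, e.g. "data.results", from the root response."""
    return get_path(response, dotted_path.split("."))


def flatten(rows, record_path, meta=(), errors="raise"):
    """Explode the array at `record_path` into one output row per element,
    copying each `meta` field onto every exploded row."""
    out = []
    for row in rows:
        shared = {}
        for field in meta:
            name = field if isinstance(field, str) else ".".join(field)
            try:
                shared[name] = get_path(row, field)
            except (KeyError, TypeError):
                if errors == "raise":
                    raise
                shared[name] = None  # errors="ignore": fill the gap
        for item in get_path(row, record_path):
            out.append({**item, **shared})
    return out
```

Applied to a response whose `data.results` holds one transaction with two `line_items`, `flatten(extract(response, "data.results"), ["line_items"], ["transaction_id", "timestamp", ["user", "name"]])` returns two rows, each carrying the line item's own fields plus `transaction_id`, `timestamp`, and `user.name`.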
+ ---
+
+ ### 6. Export Configuration
+
+ Defines where Apiphany automatically saves the final dataset when execution finishes.
+
+ - **`save_output`** (`bool`): Master toggle to enable saving (default: `true`).
+ - **`output_type`** (`string`): `"flattened"` (Pandas CSV/JSON/SQL) or `"raw"` (Pure JSON dicts).
+ - **`target.type`** (`string`): Destination: `"file"` or `"sql"`.
+ - **`target.location`** (`string`): Local filepath (`"out.csv"`), S3 URI (`"s3://bucket/out.csv"`), or SQL URI (`"sqlite:///db.sqlite"`).
+ - **`target.table_name`** (`string`): (SQL Only) Name of the target database table.
+ - **`target.if_exists`** (`string`): (SQL Only) Behavior if table exists (`"append"`, `"replace"`, `"fail"`).
+
+ **Example (Export to SQL):**
+ ```json
+ "export_config": {
+   "save_output": true,
+   "output_type": "flattened",
+   "target": {
+     "type": "sql",
+     "location": "sqlite:///my_db.db",
+     "table_name": "transactions",
+     "if_exists": "append"
+   }
+ }
+ ```
+
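The three `if_exists` modes can be illustrated with the standard library's `sqlite3` module. This sketch only demonstrates the semantics of `"append"`, `"replace"`, and `"fail"`; it is not how Apiphany writes to SQL (the package metadata lists `pandas` and `sqlalchemy` for that), and `export_rows` is a hypothetical helper.

```python
import sqlite3


def export_rows(conn, table_name, rows, if_exists="fail"):
    """Write a list of flat dicts to a SQLite table, honoring `if_exists`."""
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone() is not None
    if exists and if_exists == "fail":
        raise ValueError(f"table {table_name!r} already exists")
    if exists and if_exists == "replace":
        conn.execute(f'DROP TABLE "{table_name}"')
        exists = False

    # Identifiers cannot be bound as parameters, so this sketch trusts the
    # table and column names that come from the configuration.
    columns = list(rows[0])
    quoted = ", ".join(f'"{c}"' for c in columns)
    if not exists:
        conn.execute(f'CREATE TABLE "{table_name}" ({quoted})')
    marks = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table_name}" ({quoted}) VALUES ({marks})',
        [tuple(row[c] for c in columns) for row in rows],
    )
    conn.commit()
```

In this sketch, `"append"` keeps the existing rows and adds new ones, `"replace"` drops and recreates the table, and `"fail"` raises if the table is already present.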
+ ---
+
+ ### 7. State Tracking (Incremental Syncs)
+
+ Saves the highest watermark detected in a payload to skip historical downloads on the next run.
+
+ - **`incremental_key`** (`string`): The column to evaluate for the highest watermark (e.g., `"updated_at"`, `"id"`).
+
+ **Example:**
+ ```json
+ "state_tracking": {
+   "incremental_key": "updated_at"
+ }
+ ```
+ *When you run Apiphany, it scans the DataFrame for the highest `updated_at` and saves it. On your next run, you can inject it directly into the URL by putting `{{state_updated_at}}` in your `query_params`.*
+
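A local-file version of this watermarking could look roughly like the sketch below. The layout of `state.json` is an assumption: the README only shows the `{{state_updated_at}}` placeholder, so storing the value under a `state_<incremental_key>` key is a guess made for illustration, and `load_state` and `update_state` are hypothetical helpers.

```python
import json
from pathlib import Path


def load_state(path):
    """Return saved watermarks, or an empty dict on the first run."""
    file = Path(path)
    return json.loads(file.read_text()) if file.exists() else {}


def update_state(path, rows, incremental_key):
    """Persist the highest `incremental_key` value seen so far."""
    state = load_state(path)
    values = [
        row[incremental_key] for row in rows
        if row.get(incremental_key) is not None
    ]
    if values:
        name = f"state_{incremental_key}"  # assumed key layout
        newest = max(values)
        # Never move the watermark backwards.
        if name not in state or newest > state[name]:
            state[name] = newest
        Path(path).write_text(json.dumps(state))
    return state
```

Comparing timestamps with `max` works on strings only when they share one sortable format, such as ISO 8601 in a single time zone. Numeric IDs compare directly.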
+ ---
+
+ ## Full End-to-End Example Config
+
+ Here is a complete example demonstrating how Apiphany can fetch a list of Users, automatically extract the `id` from each user, and concurrently fetch all the Posts authored by those users using batched Chained Requests.
+
+ ```json
+ {
+   "api_config": [
+     {
+       "entity_name": "jsonplaceholder",
+       "client_credentials": {},
+       "retry_config": {
+         "total_retries": 3,
+         "requests_per_second": 10
+       },
+       "api_list": [
+         {
+           "api_identifier": "get_users",
+           "api_name": "Fetch Users",
+           "method": "GET",
+           "url": "https://jsonplaceholder.typicode.com/users",
+           "auth_type": "None",
+           "chained_request": {
+             "child_api_identifier": "get_user_posts",
+             "key_mapping": {
+               "id": "userId"
+             },
+             "batch_size": 5,
+             "max_concurrent_requests": 10
+           },
+           "export_config": {
+             "save_output": true,
+             "output_type": "flattened",
+             "target": {
+               "type": "file",
+               "location": "users.csv"
+             }
+           }
+         },
+         {
+           "api_identifier": "get_user_posts",
+           "api_name": "Fetch Posts for User",
+           "method": "GET",
+           "url": "https://jsonplaceholder.typicode.com/posts",
+           "query_params": {
+             "userId": "{{userId}}"
+           },
+           "auth_type": "None",
+           "export_config": {
+             "save_output": true,
+             "output_type": "flattened",
+             "target": {
+               "type": "file",
+               "location": "posts.csv"
+             }
+           }
+         }
+       ]
+     }
+   ]
+ }
+ ```