@jackchuka/gql-ingest 1.1.0 → 1.3.0
- package/README.md +125 -20
- package/bin/cli.js +44 -41
- package/dist/config.d.ts +12 -2
- package/dist/config.d.ts.map +1 -1
- package/dist/graphql-client.d.ts +6 -2
- package/dist/graphql-client.d.ts.map +1 -1
- package/dist/mapper.d.ts +2 -2
- package/dist/mapper.d.ts.map +1 -1
- package/dist/metrics.d.ts +5 -0
- package/dist/metrics.d.ts.map +1 -1
- package/package.json +1 -1
- package/src/cli.ts +24 -22
- package/src/config.test.ts +74 -7
- package/src/config.ts +37 -4
- package/src/graphql-client.test.ts +127 -1
- package/src/graphql-client.ts +132 -32
- package/src/mapper.test.ts +4 -4
- package/src/mapper.ts +13 -8
- package/src/metrics.ts +22 -0
package/README.md
CHANGED
@@ -10,6 +10,10 @@ A TypeScript CLI tool that reads CSV files and ingests data into GraphQL APIs th
 - ✅ External GraphQL mutation definitions (separate .graphql files)
 - ✅ CSV-to-GraphQL variable mapping via JSON configuration
 - ✅ Configurable GraphQL endpoint and headers
+- ✅ **Parallel processing** with dependency management
+- ✅ Entity-level and row-level concurrency control
+- ✅ **Retry capabilities** with exponential backoff and configurable error handling
+- ✅ Comprehensive metrics and progress tracking
 
 ## Installation
 
@@ -68,6 +72,77 @@ npx @jackchuka/gql-ingest \
 --headers '{"X-API-Key": "your-api-key", "Content-Type": "application/json"}'
 ```
 
+## Parallel Processing 🚀
+
+GQL Ingest supports advanced parallel processing with dependency management for high-performance data ingestion:
+
+### Key Capabilities
+
+- **Entity-level parallelism**: Process multiple entities (users, products, orders) concurrently
+- **Row-level parallelism**: Process multiple CSV rows within an entity concurrently
+- **Dependency management**: Ensure entities process in the correct order (e.g., users before orders)
+- **Smart batching**: Control exactly how many entities/rows process simultaneously
+- **Real-time metrics**: Track progress, success rates, and performance
+
+### Quick Example
+
+```yaml
+# config.yaml - Add to your configuration directory
+parallelProcessing:
+  concurrency: 10 # Process up to 10 CSV rows per entity concurrently
+  entityConcurrency: 3 # Process up to 3 entities simultaneously
+  preserveRowOrder: false # Allow rows to complete out of order for speed
+
+# Define dependencies between entities
+entityDependencies:
+  products: ["users"] # Products must wait for users to complete
+  orders: ["products"] # Orders must wait for products to complete
+```
+
+**Performance Impact**: This configuration can process data **10-50x faster** than sequential processing, depending on your GraphQL API's capabilities.
+
+👉 **[Full Parallel Processing Guide](PARALLEL_PROCESSING.md)** - Detailed configuration options, performance tuning, and examples.
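As a rough illustration of what a row-level `concurrency` limit means in practice, the sketch below runs an async worker over a list of rows with at most `limit` calls in flight. `mapWithConcurrency` is a hypothetical helper written for this note; it is not part of the gql-ingest API, and the package's real scheduler is not visible in this README hunk.

```typescript
// Hypothetical helper (not gql-ingest code): run `worker` over `items`
// with at most `limit` invocations in flight at any time.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner repeatedly claims the next index until none remain.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  const runners: Promise<void>[] = [];
  for (let r = 0; r < runnerCount; r++) {
    runners.push(runner());
  }
  await Promise.all(runners);

  // Results are stored by input index, so they come back in input order
  // even when the workers finish out of order.
  return results;
}
```

With `limit` set to the `concurrency` value from the config, the worker would be whatever sends one row's mutation. Note that this sketch always returns results in input order; what `preserveRowOrder` controls inside gql-ingest is not specified in this hunk.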
+
+## Retry Capabilities 🔄
+
+GQL Ingest includes robust retry functionality to handle transient failures and improve reliability:
+
+### Key Features
+
+- **Automatic retries**: Failed GraphQL mutations are retried automatically
+- **Exponential backoff**: Intelligent delay increases between retry attempts
+- **Jitter**: Randomization prevents thundering herd problems
+- **Configurable error codes**: Control which HTTP status codes trigger retries
+- **Per-entity overrides**: Different retry settings for different entities
+- **Metrics tracking**: Monitor retry success rates and attempt counts
+
+### Quick Example
+
+```yaml
+# config.yaml - Add to your configuration directory
+retry:
+  maxAttempts: 5 # Retry up to 5 times (default: 3)
+  baseDelay: 2000 # Start with 2s delay (default: 1000ms)
+  maxDelay: 60000 # Cap delays at 60s (default: 30000ms)
+  exponentialBackoff: true # Double delay each retry (default: true)
+  retryableStatusCodes: # Which HTTP errors to retry (defaults shown)
+    - 408 # Request Timeout
+    - 429 # Too Many Requests
+    - 500 # Internal Server Error
+    - 502 # Bad Gateway
+    - 503 # Service Unavailable
+    - 504 # Gateway Timeout
+
+# Per-entity retry overrides
+entityConfig:
+  critical-orders:
+    retry:
+      maxAttempts: 10 # More retries for critical data
+      baseDelay: 500 # Faster initial retry
+```
+
+**Reliability Impact**: Retry capabilities can improve success rates from 95% to 99.9%+ for APIs with transient failures.
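To make the retry settings concrete, here is a minimal sketch of how a delay could be derived from `baseDelay`, `maxDelay`, and `exponentialBackoff`. The doubling, the cap, and the default values follow the comments in the README example; the jitter formula (up to 25% added at random) and the names `RetryConfig`, `backoffDelay`, and `isRetryable` are assumptions for illustration, not gql-ingest's actual implementation.

```typescript
// Hypothetical shape; field names mirror the README's `retry` block.
interface RetryConfig {
  maxAttempts: number;
  baseDelay: number; // milliseconds
  maxDelay: number; // milliseconds
  exponentialBackoff: boolean;
  retryableStatusCodes: number[];
}

// Defaults as stated in the README comments.
const DEFAULT_RETRY: RetryConfig = {
  maxAttempts: 3,
  baseDelay: 1000,
  maxDelay: 30000,
  exponentialBackoff: true,
  retryableStatusCodes: [408, 429, 500, 502, 503, 504],
};

// Delay before retry number `retryNumber` (1 = first retry).
// `random` is injectable so the function can be tested deterministically.
function backoffDelay(
  retryNumber: number,
  cfg: RetryConfig,
  random: () => number = Math.random,
): number {
  // Double the delay on each retry when exponential backoff is enabled.
  const factor = cfg.exponentialBackoff ? Math.pow(2, retryNumber - 1) : 1;
  const capped = Math.min(cfg.baseDelay * factor, cfg.maxDelay);
  // Assumed jitter: add up to 25% so parallel workers do not retry in lockstep.
  const jitter = capped * 0.25 * random();
  return Math.min(Math.round(capped + jitter), cfg.maxDelay);
}

// Only status codes listed in the config trigger a retry.
function isRetryable(status: number, cfg: RetryConfig): boolean {
  return cfg.retryableStatusCodes.indexOf(status) !== -1;
}
```

Under these assumptions, the example config (`baseDelay: 2000`, `maxDelay: 60000`) would wait roughly 2s, 4s, 8s, and so on, never exceeding 60s.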
+
 ## Configuration
 
 The `--config` flag points to a configuration directory containing three subdirectories:
@@ -75,6 +150,7 @@ The `--config` flag points to a configuration directory containing three subdire
 - `data/` - CSV files with actual data
 - `graphql/` - GraphQL mutation definitions
 - `mappings/` - JSON files that map CSV columns to GraphQL variables
+- `config.yaml` - _(Optional)_ Parallel processing and dependency configuration
 
 Each entity has three corresponding files across these directories with matching names.
 
@@ -113,37 +189,66 @@ mutation CreateItem($name: String!, $sku: String!) {
 }
 ```
 
-
-
-
-
-
-
-
-
-
-
-
-
+**examples/demo/config.yaml** _(Optional - for parallel processing and retry configuration)_:
+
+```yaml
+# Parallel processing configuration
+parallelProcessing:
+  concurrency: 5 # Process 5 rows per entity concurrently
+  entityConcurrency: 2 # Process 2 entities simultaneously
+  preserveRowOrder: false # Allow faster out-of-order completion
+
+# Global retry configuration
+retry:
+  maxAttempts: 3 # Retry failed requests up to 3 times
+  baseDelay: 1000 # Start with 1s delay between retries
+  exponentialBackoff: true # Double delay each retry
+
+# Entity dependencies
+entityDependencies:
+  items: ["users"] # Items depend on users being processed first
+
+# Per-entity overrides (optional)
+entityConfig:
+  users:
+    retry:
+      maxAttempts: 5 # More retries for user creation
+  items:
+    concurrency: 10 # Higher concurrency for items
 ```
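One way to read the `entityConfig` section of the example above is that entity-specific values replace the global ones field by field, and everything else is inherited. The sketch below encodes that reading. The type names, `demoConfig`, and `resolveEntitySettings` are hypothetical, and the shallow per-field merge is an assumption that this diff does not confirm.

```typescript
// Hypothetical types mirroring the keys used in examples/demo/config.yaml.
interface RetrySettings {
  maxAttempts?: number;
  baseDelay?: number;
  exponentialBackoff?: boolean;
}

interface EntityOverride {
  concurrency?: number;
  retry?: RetrySettings;
}

interface IngestConfig {
  parallelProcessing: {
    concurrency: number;
    entityConcurrency: number;
    preserveRowOrder: boolean;
  };
  retry: RetrySettings;
  entityConfig?: { [entity: string]: EntityOverride };
}

// The demo configuration from the README, written out as an object.
const demoConfig: IngestConfig = {
  parallelProcessing: { concurrency: 5, entityConcurrency: 2, preserveRowOrder: false },
  retry: { maxAttempts: 3, baseDelay: 1000, exponentialBackoff: true },
  entityConfig: {
    users: { retry: { maxAttempts: 5 } },
    items: { concurrency: 10 },
  },
};

// Assumed semantics: overrides win per field, globals fill in the rest.
function resolveEntitySettings(
  config: IngestConfig,
  entity: string,
): { concurrency: number; retry: RetrySettings } {
  const override: EntityOverride = config.entityConfig?.[entity] ?? {};
  return {
    concurrency: override.concurrency ?? config.parallelProcessing.concurrency,
    retry: { ...config.retry, ...override.retry },
  };
}
```

Under this reading, `users` keeps the global row concurrency of 5 but retries up to 5 times, while `items` runs 10 rows at a time with the global retry policy.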
 
-
+## Development
 
-
+### Scripts
 
 ```bash
-npm
+npm run build # Build CLI bundle with esbuild
+npm run build:types # Generate TypeScript declarations
+npm run build:all # Build bundle + types
+npm run dev # Run in development mode
+npm run test # Run test suite
 ```
 
 ## How It Works
 
 1. **Discovery**: The tool scans the `mappings/` directory for `.json` files
-2. **
-
+2. **Dependency Resolution**: Analyzes `entityDependencies` to create execution waves
+3. **Parallel Processing**: For each dependency wave:
+   - Processes up to `entityConcurrency` entities simultaneously
+   - Within each entity, processes up to `concurrency` CSV rows concurrently
+   - Waits for the entire wave to complete before starting the next wave
+4. **GraphQL Execution**: For each CSV row:
    - Loads the GraphQL mutation definition
-   - Maps CSV columns to GraphQL variables
-   - Executes the mutation
-
+   - Maps CSV columns to GraphQL variables using the mapping configuration
+   - Executes the mutation against the GraphQL endpoint
+5. **Error Handling & Retries**:
+   - Failed mutations are automatically retried with exponential backoff
+   - Non-retryable errors (e.g., validation failures) are logged and skipped
+   - Configurable retry policies per entity type
+6. **Metrics & Monitoring**:
+   - Real-time progress tracking and success/failure rates
+   - Retry attempt counts and success rates
+   - Detailed per-entity performance breakdown
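The "execution waves" in steps 2 and 3 can be pictured as layers of a dependency graph: a wave contains every entity whose dependencies all finished in earlier waves. The sketch below is one plausible way to compute them. `computeWaves` and the `Dependencies` type are hypothetical names, and how gql-ingest itself treats cycles or unknown entities is not shown in this README diff.

```typescript
// Hypothetical type: entity name -> names of entities it must wait for.
type Dependencies = { [entity: string]: string[] };

// Group entities into waves so that each entity appears only after
// all of its dependencies have appeared in an earlier wave.
function computeWaves(entities: string[], deps: Dependencies): string[][] {
  const waves: string[][] = [];
  const done: { [entity: string]: boolean } = {};
  let remaining = entities.slice();

  while (remaining.length > 0) {
    // An entity is ready once every dependency finished in an earlier wave.
    const ready = remaining.filter((entity) =>
      (deps[entity] || []).every((dep) => done[dep] === true),
    );
    if (ready.length === 0) {
      // Nothing can make progress: a cycle, or a dependency that is not an entity.
      throw new Error("Circular or unknown dependency among: " + remaining.join(", "));
    }
    waves.push(ready);
    for (const entity of ready) {
      done[entity] = true;
    }
    remaining = remaining.filter((entity) => !done[entity]);
  }
  return waves;
}
```

For the README's `products: ["users"]` and `orders: ["products"]` example this yields three single-entity waves, so `entityConcurrency` only helps when a wave holds several independent entities.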
 
 ## License
 