@async-fusion/data 1.0.1 → 1.0.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +621 -186
- package/package.json +1 -1
package/README.md
CHANGED
# async-fusion/data

### Unified Data Streaming Library for Kafka, Spark, and Modern Data Pipelines

Built with lots of bugs :P and love <3 by Udayan Sharma

[npm](https://www.npmjs.com/package/@async-fusion/data)
[TypeScript](https://www.typescriptlang.org/)
[License](LICENSE)
[Node.js](https://nodejs.org/)

[Documentation](https://github.com/UdayanSharma/async-fusion-data) •
[Report Bug](https://github.com/UdayanSharma/async-fusion-data/issues) •
[Request Feature](https://github.com/UdayanSharma/async-fusion-data/issues) •
[Examples](./examples)

---

## Table of Contents

- [Why This Library?](#why-this-library)
- [Features](#features)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Core Concepts](#core-concepts)
- [API Documentation](#api-documentation)
- [Real-World Examples](#real-world-examples)
- [Error Handling](#error-handling-deep-dive)
- [Performance](#performance-characteristics)
- [Configuration Reference](#configuration-reference)
- [Contributing](#contributing)
- [License](#license)

---

## Why This Library?

### The Problem

Building real-time data pipelines today means juggling multiple technologies:

- Kafka for message streaming
- Spark for big data processing
- Custom code for error handling
- Manual monitoring of pipeline health
- A different API for each technology

### The Solution

@async-fusion/data provides a unified API that brings Kafka streaming, Spark processing, and production-grade error handling into a single, easy-to-use library.

```javascript
// One library to rule them all
const { PipelineBuilder, KafkaStream, SparkClient } = require('@async-fusion/data');

// Build complex pipelines with simple code
const pipeline = new PipelineBuilder({ name: 'analytics' })
  .source('kafka', { topic: 'clickstream' })
  .transform(data => enrichData(data))
  .sink('spark', { job: 'analytics-job' });
```
## Features

| Category | Feature | Description | Status |
|----------|---------|-------------|--------|
| Streaming | Kafka Producer/Consumer | Full Kafka support with backpressure | ✅ |
| Streaming | Stream Windowing | Time-based windows (tumbling, sliding) | ✅ |
| Streaming | Stream Joins | Join multiple streams in real time | ✅ |
| Streaming | Stateful Processing | Maintain state across stream events | ✅ |
| Processing | Spark Integration | Submit and monitor Spark jobs | ✅ |
| Processing | Spark SQL | Execute SQL queries on Spark | ✅ |
| Processing | Python Scripts | Run PySpark scripts from Node.js | ✅ |
| Pipeline | Fluent Builder | Chain operations naturally | ✅ |
| Pipeline | Multiple Sources/Sinks | Combine data from anywhere | ✅ |
| Pipeline | Transformation Pipeline | Apply transformations sequentially | ✅ |
| Reliability | Automatic Retries | Exponential backoff for failures | ✅ |
| Reliability | Circuit Breaker | Prevent cascading failures | ✅ |
| Reliability | Checkpointing | Resume from where you left off | ✅ |
| Reliability | Dead Letter Queue | Handle failed messages gracefully | ✅ |
| Monitoring | Built-in Metrics | Track pipeline performance | ✅ |
| Monitoring | Pipeline Lineage | Visualize data flow | ✅ |
| Monitoring | Health Checks | Monitor component status | ✅ |
| React | useKafkaTopic | Real-time Kafka data in React | 🚧 |
| React | useSparkQuery | Query Spark from React | 🚧 |
| React | useRealtimeData | Combined real-time data hook | 🚧 |
## Installation

### Prerequisites

- Node.js >= 16.0.0
- npm, yarn, or pnpm

### Install from npm

```bash
# Using npm
npm install @async-fusion/data

# Using yarn
yarn add @async-fusion/data

# Using pnpm
pnpm add @async-fusion/data
```

### Optional Dependencies (for specific features)

```bash
# For Kafka features
npm install kafkajs

# For Spark features (requires a Spark cluster)
# No additional Node packages needed

# For React hooks
npm install react react-dom
```
## Quick Start
### Example 1: Basic Pipeline

```javascript
const { PipelineBuilder } = require('@async-fusion/data');

// Create a pipeline that reads from Kafka, transforms data, and logs to the console
const pipeline = new PipelineBuilder({
  name: 'user-activity-pipeline',
  checkpointLocation: './checkpoints'
});

pipeline
  .source('kafka', {
    topic: 'user-activity',
    brokers: ['localhost:9092']
  })
  .transform(data => {
    // Enrich data with a processing timestamp
    return {
      ...data,
      processedAt: new Date().toISOString(),
      processedBy: 'async-fusion'
    };
  })
  .transform(data => {
    // Keep only high-value events
    return data.value > 100 ? data : null;
  })
  .sink('console', { format: 'pretty' });

// Run the pipeline
await pipeline.run();
```
### Example 2: Stream Processing with Windowing

```javascript
const { KafkaStream } = require('@async-fusion/data');

// Create a stream that calculates the average order value per minute
const orderStream = new KafkaStream('orders', {
  windowSize: 60000,    // 1-minute windows
  slideInterval: 30000, // Slide every 30 seconds
  watermarkDelay: 5000  // Allow 5 seconds for late data
});

const averageOrderValue = orderStream
  .filter(order => order.status === 'completed')
  .window(60000) // Group into 1-minute windows
  .groupBy(order => order.productCategory)
  .avg(order => order.amount);

// Process the stream
for await (const avg of averageOrderValue) {
  console.log(`Average order value: $${avg}`);
}
```
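To make the window/groupBy/avg semantics concrete, here is a self-contained plain-JavaScript sketch of the same aggregation over a bounded batch. It is illustrative only (`windowedAverages` is a hypothetical helper, not part of the library, and real streams are unbounded):

```javascript
// Bucket completed orders into tumbling windows of `windowMs`, group by
// category within each window, and average the amounts per group.
function windowedAverages(orders, windowMs) {
  const windows = new Map(); // windowStart -> (category -> { sum, count })
  for (const o of orders) {
    if (o.status !== 'completed') continue; // mirrors .filter(...)
    const start = Math.floor(o.timestamp / windowMs) * windowMs;
    if (!windows.has(start)) windows.set(start, new Map());
    const groups = windows.get(start);
    const agg = groups.get(o.productCategory) || { sum: 0, count: 0 };
    agg.sum += o.amount;
    agg.count += 1;
    groups.set(o.productCategory, agg);
  }
  const result = [];
  for (const [start, groups] of windows) {
    for (const [category, { sum, count }] of groups) {
      result.push({ windowStart: start, category, avg: sum / count });
    }
  }
  return result;
}

const sample = [
  { timestamp: 1000, status: 'completed', productCategory: 'books', amount: 10 },
  { timestamp: 2000, status: 'completed', productCategory: 'books', amount: 30 },
  { timestamp: 3000, status: 'pending',   productCategory: 'books', amount: 99 }
];
console.log(windowedAverages(sample, 60000));
// → [ { windowStart: 0, category: 'books', avg: 20 } ]
```

The pending order is dropped by the filter, so only the two completed orders contribute to the 1-minute window's average.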
### Example 3: Resilient Pipeline with Retries

```javascript
const { PipelineBuilder } = require('@async-fusion/data');

const robustPipeline = new PipelineBuilder(
  { name: 'robust-etl' },
  {
    retryConfig: {
      maxAttempts: 5,       // Try up to 5 times
      delayMs: 1000,        // Start with a 1-second delay
      backoffMultiplier: 2  // Double the delay on each retry (1s, 2s, 4s, 8s)
    },
    errorHandler: (error, context) => {
      // Log errors to your monitoring system
      console.error(`Pipeline error in ${context.pipelineName}:`, error);

      // Send an alert to Slack/PagerDuty
      sendAlert({
        severity: 'high',
        message: error.message,
        context
      });
    }
  }
);

robustPipeline
  .source('kafka', { topic: 'critical-data' })
  .transform(validateData)
  .transform(enrichWithDatabase)
  .sink('database', { table: 'processed_records' })
  .sink('kafka', { topic: 'enriched-data' });

await robustPipeline.run();
```
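The retry delays such a config produces can be computed directly. This is a self-contained sketch assuming the usual formula `delay = delayMs * backoffMultiplier ** (attempt - 1)`; the library may add details such as jitter or a delay cap:

```javascript
// Compute the wait before each retry (no delay precedes the first attempt).
function backoffSchedule({ maxAttempts, delayMs, backoffMultiplier }) {
  const delays = [];
  for (let attempt = 1; attempt < maxAttempts; attempt++) {
    delays.push(delayMs * backoffMultiplier ** (attempt - 1));
  }
  return delays;
}

console.log(backoffSchedule({ maxAttempts: 5, delayMs: 1000, backoffMultiplier: 2 }));
// → [ 1000, 2000, 4000, 8000 ]
```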
## Core Concepts

### 1. Pipeline Builder Pattern

The PipelineBuilder provides a fluent interface for constructing data pipelines:

```javascript
const pipeline = new PipelineBuilder({ name: 'my-pipeline' })
  .source('kafka', config)   // Add a source
  .transform(fn1)            // Add a transformation
  .transform(fn2)            // Chain transformations
  .sink('console', config)   // Add a sink
  .sink('file', config);     // Add multiple sinks
```
### 2. Stream Processing Model

Streams are processed in micro-batches with configurable windows:

```
Time → [Window 1] [Window 2] [Window 3] →
Data →   └─┬─┘      └─┬─┘      └─┬─┘
         Process    Process    Process
            ↓          ↓          ↓
         Output     Output     Output
```
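To see how a single event lands in these windows, here is a small self-contained sketch (not library code) of tumbling and sliding window assignment, using event timestamps in milliseconds:

```javascript
// A tumbling window assigns each timestamp to exactly one bucket.
function tumblingWindow(ts, size) {
  const start = Math.floor(ts / size) * size;
  return [start, start + size];
}

// A sliding window of `size` ms advancing every `slide` ms assigns the
// timestamp to every window whose span [start, start + size) contains it.
function slidingWindows(ts, size, slide) {
  const windows = [];
  const lastStart = Math.floor(ts / slide) * slide;
  for (let start = lastStart; start > ts - size && start >= 0; start -= slide) {
    windows.push([start, start + size]);
  }
  return windows.reverse();
}

console.log(tumblingWindow(65000, 60000));
// → [ 60000, 120000 ]
console.log(slidingWindows(65000, 60000, 30000));
// → [ [ 30000, 90000 ], [ 60000, 120000 ] ]
```

With a 1-minute window sliding every 30 seconds (as in Example 2), each event contributes to two overlapping windows, which is why sliding windows smooth aggregates at the cost of extra state.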
### 3. State Management

The library maintains state for:

- Windowing: aggregates within time windows
- GroupBy: tracks groups and their aggregates
- Checkpointing: saves progress for recovery
### 4. Error Recovery Hierarchy

```
Application Error
        ↓
Local Retry (3-5 attempts with backoff)
        ↓
Circuit Breaker (if failures continue)
        ↓
Dead Letter Queue (store failed messages)
        ↓
Alert & Manual Intervention
```
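The retry-then-dead-letter portion of this hierarchy can be sketched end to end in plain JavaScript. This is an illustration of the flow only, not the library's internals; `processWithRecovery` and the in-memory `deadLetterQueue` array are hypothetical:

```javascript
const deadLetterQueue = [];

// Retry a handler with exponential backoff; after the final failure, park
// the message in the DLQ instead of crashing the pipeline.
async function processWithRecovery(message, handler, { maxAttempts = 3, delayMs = 50 } = {}) {
  let delay = delayMs;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await handler(message);
    } catch (err) {
      if (attempt === maxAttempts) {
        deadLetterQueue.push({ message, error: err.message, failedAt: Date.now() });
        return null; // give up; operators inspect the DLQ later
      }
      await new Promise(resolve => setTimeout(resolve, delay));
      delay *= 2; // exponential backoff: 50ms, 100ms, 200ms, ...
    }
  }
}

// A handler that always fails ends up in the DLQ after maxAttempts tries.
(async () => {
  await processWithRecovery({ id: 1 }, async () => { throw new Error('boom'); });
  console.log(deadLetterQueue.length); // → 1
})();
```

A real dead letter queue would typically be another Kafka topic or a durable store rather than an in-memory array.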
## API Documentation

### PipelineBuilder

#### Constructor

```typescript
new PipelineBuilder(config: PipelineConfig, options?: PipelineOptions)
```

**PipelineConfig:**

| Property | Type | Description | Default |
|----------|------|-------------|---------|
| name | string | Pipeline identifier | Required |
| checkpointLocation | string | Directory for checkpoints | './checkpoints' |
| parallelism | number | Concurrent processing | 1 |

**PipelineOptions:**

| Property | Type | Description |
|----------|------|-------------|
| retryConfig | RetryConfig | Retry configuration |
| errorHandler | Function | Custom error handler |
| maxConcurrent | number | Max concurrent operations |

#### Methods

| Method | Description | Returns |
|--------|-------------|---------|
| source(type, config) | Add a data source | this |
| transform(fn) | Add a transformation function | this |
| sink(type, config) | Add a data sink | this |
| run() | Execute the pipeline | Promise<void> |
| lineage() | Get the pipeline execution graph | Lineage |
| getMetrics() | Get pipeline performance metrics | Metrics |
### KafkaStream

```typescript
new KafkaStream<T>(topic: string, options?: StreamOptions)
```

**StreamOptions:**

| Property | Type | Description |
|----------|------|-------------|
| windowSize | number | Window duration in ms |
| slideInterval | number | Slide interval for windows |
| watermarkDelay | number | Late-data tolerance in ms |

#### Methods

| Method | Description |
|--------|-------------|
| filter(predicate) | Keep only matching records |
| map(transform) | Transform each record |
| flatMap(transform) | One-to-many transformation |
| window(ms, slide?) | Add a time-based window |
| groupBy(keyExtractor) | Group records by key |
| count() | Count per group |
| sum(extractor) | Sum values per group |
| avg(extractor) | Average per group |
| reduce(reducer, initial) | Custom reduction |
| join(other, keyExtractor) | Join with another stream |
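To illustrate what a keyed join produces, here is a self-contained sketch over two bounded batches. The library's `join` operates on windowed streams; this plain-JS version (with a hypothetical `joinByKey` helper) only shows the pairing semantics:

```javascript
// Pair every left record with every right record that shares its key,
// using a hash index over the right side.
function joinByKey(left, right, keyOf) {
  const index = new Map();
  for (const r of right) {
    const k = keyOf(r);
    if (!index.has(k)) index.set(k, []);
    index.get(k).push(r);
  }
  const joined = [];
  for (const l of left) {
    for (const r of index.get(keyOf(l)) || []) {
      joined.push({ key: keyOf(l), left: l, right: r });
    }
  }
  return joined;
}

const orders = [{ userId: 'u1', total: 40 }];
const clicks = [{ userId: 'u1', page: '/checkout' }, { userId: 'u2', page: '/' }];
console.log(joinByKey(orders, clicks, e => e.userId).length); // → 1
```

In a streaming setting the same pairing happens per window, so both sides only need to buffer records until the window (plus any watermark delay) closes.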
### SparkClient

```typescript
new SparkClient(config: SparkConfig, retryConfig?: RetryConfig)
```

**SparkConfig:**

| Property | Type | Description |
|----------|------|-------------|
| master | string | Spark master URL |
| appName | string | Application name |
| sparkConf | object | Spark configuration |

#### Methods

| Method | Description |
|--------|-------------|
| submitJob(code, name, options) | Submit a Spark job |
| runPythonScript(path, args, options) | Run a Python script |
| submitSQLQuery(sql, options) | Execute a SQL query |
| monitorJob(id, timeout) | Monitor job progress |
| cancelJob(id) | Cancel a running job |
| healthCheck() | Check cluster health |
### Error Handling Utilities

```typescript
// Retry failed operations
function withRetry<T>(
  fn: () => Promise<T>,
  options?: {
    maxRetries?: number;
    delayMs?: number;
    backoffMultiplier?: number;
    shouldRetry?: (error: Error) => boolean;
  }
): Promise<T>;

// Circuit breaker pattern
class CircuitBreaker {
  constructor(failureThreshold: number, timeoutMs: number);
  call<T>(fn: () => Promise<T>): Promise<T>;
  getState(): 'CLOSED' | 'OPEN' | 'HALF_OPEN';
  reset(): void;
}

// Custom errors
class RetryableError extends Error {} // Will trigger a retry
class FatalError extends Error {}     // Will NOT retry
```
## Real-World Examples

### Example: Real-time E-commerce Analytics

```javascript
const { PipelineBuilder, KafkaStream } = require('@async-fusion/data');

// Stream 1: Calculate real-time revenue
const revenueStream = new KafkaStream('orders')
  .filter(order => order.status === 'completed')
  .window(60000) // 1-minute windows
  .groupBy(order => order.productId)
  .sum(order => order.amount)
  .map(result => ({
    productId: result.key,
    revenue: result.sum,
    timestamp: new Date()
  }));

// Stream 2: Detect fraudulent transactions
const fraudStream = new KafkaStream('payments')
  .filter(payment => payment.amount > 1000)
  .window(300000) // 5-minute windows
  .groupBy(payment => payment.userId)
  .count()
  .filter(result => result.count > 3) // >3 high-value payments in 5 minutes
  .map(result => ({
    userId: result.key,
    alert: 'POTENTIAL_FRAUD',
    timestamp: new Date()
  }));

// Pipeline to combine and output
const analyticsPipeline = new PipelineBuilder({ name: 'ecommerce-analytics' })
  .source('stream', { stream: revenueStream })
  .source('stream', { stream: fraudStream })
  .transform(data => enrichWithUserData(data))
  .sink('database', { table: 'realtime_metrics' })
  .sink('websocket', { port: 8080 }); // Push to a dashboard

await analyticsPipeline.run();
```
### Example: Data Lake Ingestion with Spark

```javascript
const { SparkClient } = require('@async-fusion/data');

const spark = new SparkClient({
  master: 'spark://prod-cluster:7077',
  appName: 'data-lake-ingestion',
  sparkConf: {
    'spark.sql.shuffle.partitions': '200',
    'spark.sql.adaptive.enabled': 'true'
  }
});

// Submit a data transformation job
const transformJob = await spark.runPythonScript('./transform.py', [
  '--input', 's3://raw-bucket/logs/',
  '--output', 's3://processed-bucket/'
], { timeout: 3600000 });

// Monitor progress
await spark.monitorJob(transformJob.id, 3600000);

// Run a SQL analysis
const results = await spark.submitSQLQuery(`
  SELECT
    DATE(timestamp) AS day,
    COUNT(*) AS total_events,
    COUNT(DISTINCT user_id) AS unique_users
  FROM processed_events
  WHERE timestamp >= CURRENT_DATE - INTERVAL 7 DAY
  GROUP BY DATE(timestamp)
`);

console.log('Weekly stats:', results);
```
## Error Handling Deep Dive

### The Retry Mechanism

```javascript
const { withRetry, RetryableError } = require('@async-fusion/data');

// Automatic retry with exponential backoff
const data = await withRetry(
  async () => {
    const response = await fetch('https://api.example.com/data');

    if (response.status === 429) {
      // Rate limited - retryable
      throw new RetryableError('Rate limited');
    }

    if (response.status === 500) {
      // Server error - retryable
      throw new RetryableError('Server error');
    }

    if (response.status === 404) {
      // Not found - NOT retryable
      throw new Error('Resource not found');
    }

    return response.json();
  },
  {
    maxRetries: 5,
    delayMs: 1000,
    backoffMultiplier: 2,
    shouldRetry: (error) => error instanceof RetryableError
  }
);
```
### Circuit Breaker in Action

```javascript
const axios = require('axios');
const { CircuitBreaker } = require('@async-fusion/data');

// Create a circuit breaker for an external API:
// open after 5 failures, allow a retry after 60 seconds
const apiBreaker = new CircuitBreaker(5, 60000);

async function callExternalAPI() {
  return apiBreaker.call(async () => {
    const response = await axios.get('https://unreliable-api.com/data');
    return response.data;
  });
}

// Circuit states:
// CLOSED    - Normal operation, requests pass through
// OPEN      - Too many failures, requests blocked
// HALF_OPEN - Testing whether the service has recovered

setInterval(async () => {
  try {
    await callExternalAPI();
    console.log('API call succeeded');
    console.log('Circuit state:', apiBreaker.getState());
  } catch (error) {
    console.error('API call failed');
    console.log('Circuit state:', apiBreaker.getState());
  }
}, 5000);
```
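For readers who want the state transitions spelled out, here is a minimal self-contained breaker sketch. It is illustrative only (the library ships its own `CircuitBreaker`; `TinyBreaker` is a hypothetical stand-in with the same constructor shape):

```javascript
class TinyBreaker {
  constructor(failureThreshold, timeoutMs) {
    this.failureThreshold = failureThreshold; // failures before opening
    this.timeoutMs = timeoutMs;               // how long to stay OPEN
    this.failures = 0;
    this.state = 'CLOSED';
    this.openedAt = 0;
  }

  async call(fn) {
    if (this.state === 'OPEN') {
      if (Date.now() - this.openedAt < this.timeoutMs) {
        throw new Error('Circuit is OPEN'); // fail fast, protect the backend
      }
      this.state = 'HALF_OPEN'; // timeout elapsed: allow a single probe
    }
    try {
      const result = await fn();
      this.failures = 0;
      this.state = 'CLOSED'; // success closes the circuit again
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
        this.state = 'OPEN'; // a failed probe, or too many failures, opens it
        this.openedAt = Date.now();
      }
      throw err;
    }
  }
}
```

The key design point is that OPEN turns failures into immediate rejections, so a struggling downstream service gets breathing room instead of a retry storm.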
## Performance Characteristics

### Benchmarks

| Operation | Latency (p99) | Throughput | Memory Usage |
|-----------|---------------|------------|--------------|
| Simple filter | 0.5ms | 200K ops/sec | ~50MB |
| Map transformation | 0.8ms | 180K ops/sec | ~50MB |
| Window (1 min) | 5ms | 100K ops/sec | ~200MB |
| Group by count | 10ms | 80K ops/sec | ~300MB |
| Join (2 streams) | 15ms | 50K ops/sec | ~500MB |
### Optimization Tips

- Increase the batch size for higher throughput:

```javascript
pipeline.options.batchSize = 1000;
```

- Use partitioning for parallel processing:

```javascript
pipeline.options.parallelism = 4;
```

- Enable compression for large payloads:

```javascript
kafkaConfig.compression = 'snappy';
```

- Tune the window size to your latency requirements:

```javascript
// Lower latency: smaller windows
stream.window(1000); // 1-second windows

// Higher throughput: larger windows
stream.window(60000); // 1-minute windows
```
## Configuration Reference

### Full Configuration Example

```javascript
const config = {
  // Pipeline configuration
  pipeline: {
    name: 'production-pipeline',
    checkpointLocation: '/data/checkpoints',
    parallelism: 4,
    batchSize: 1000
  },

  // Retry configuration
  retry: {
    maxAttempts: 5,
    delayMs: 1000,
    backoffMultiplier: 2,
    maxDelayMs: 30000
  },

  // Kafka configuration
  kafka: {
    brokers: ['kafka1:9092', 'kafka2:9092'],
    clientId: 'async-fusion-app',
    ssl: true,
    sasl: {
      mechanism: 'scram-sha-256',
      username: process.env.KAFKA_USERNAME,
      password: process.env.KAFKA_PASSWORD
    },
    compression: 'snappy',
    retry: {
      maxRetries: 3,
      initialRetryTime: 100
    }
  },

  // Spark configuration
  spark: {
    master: 'spark://cluster:7077',
    appName: 'async-fusion-job',
    sparkConf: {
      'spark.executor.memory': '4g',
      'spark.executor.cores': '2',
      'spark.sql.adaptive.enabled': 'true'
    }
  },

  // Monitoring
  monitoring: {
    metricsInterval: 10000, // 10 seconds
    exporters: ['console', 'prometheus']
  }
};
```
## Contributing

We welcome contributions! Please see our Contributing Guide.
### Development Setup

```bash
# Clone the repository
git clone https://github.com/UdayanSharma/async-fusion-data.git

# Install dependencies
npm install

# Build the project
npm run build

# Run tests
npm test

# Run in development mode
npm run dev
```
### Project Structure

```
async-fusion-data/
├── src/
│   ├── kafka/      # Kafka integration
│   ├── spark/      # Spark integration
│   ├── pipeline/   # Pipeline builder
│   ├── react/      # React hooks
│   ├── utils/      # Utilities
│   └── types/      # TypeScript types
├── dist/           # Built files
├── __tests__/      # Unit tests
├── examples/       # Example applications
├── docs/           # Documentation
└── package.json
```
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- Apache Kafka - distributed streaming platform
- Apache Spark - unified analytics engine
- Node.js community - JavaScript runtime
- TypeScript team - type safety
## Contact & Support

- GitHub Issues: report bugs
- Discussions: ask questions
- Email: udayan.sharma@example.com

Built with love by Udayan Sharma

"Making data streaming accessible to everyone"

[Back to Top](#async-fusiondata)