node-es-transformer 1.0.0-beta7 → 1.0.2
- package/LICENSE +1 -1
- package/README.md +294 -56
- package/dist/node-es-transformer.cjs.js +170 -68
- package/dist/node-es-transformer.cjs.js.map +1 -1
- package/dist/node-es-transformer.esm.js +171 -69
- package/dist/node-es-transformer.esm.js.map +1 -1
- package/index.d.ts +197 -0
- package/package.json +57 -21
package/LICENSE
CHANGED
package/README.md
CHANGED
|
@@ -1,32 +1,100 @@
|
|
|
1
|
-
[](https://www.npmjs.com/package/node-es-transformer)
|
|
2
|
+
[](https://www.npmjs.com/package/node-es-transformer)
|
|
3
|
+
[](https://github.com/walterra/node-es-transformer/blob/main/LICENSE)
|
|
4
|
+
[](https://nodejs.org/)
|
|
5
5
|
[](https://github.com/walterra/node-es-transformer/actions)
|
|
6
|
+
[](https://github.com/walterra/node-es-transformer/blob/main/index.d.ts)
|
|
7
|
+
[](https://www.elastic.co/elasticsearch/)
|
|
6
8
|
|
|
7
9
|
# node-es-transformer
|
|
8
10
|
|
|
9
|
-
|
|
11
|
+
Stream-based library for ingesting and transforming large data files (CSV/JSON) into Elasticsearch indices.
|
|
10
12
|
|
|
11
|
-
|
|
13
|
+
## Quick Start
|
|
12
14
|
|
|
13
|
-
|
|
15
|
+
```bash
|
|
16
|
+
npm install node-es-transformer
|
|
17
|
+
```
|
|
18
|
+
|
|
19
|
+
```javascript
|
|
20
|
+
const transformer = require('node-es-transformer');
|
|
21
|
+
|
|
22
|
+
// Ingest a large JSON file
|
|
23
|
+
await transformer({
|
|
24
|
+
fileName: 'data.json',
|
|
25
|
+
targetIndexName: 'my-index',
|
|
26
|
+
mappings: {
|
|
27
|
+
properties: {
|
|
28
|
+
'@timestamp': { type: 'date' },
|
|
29
|
+
'message': { type: 'text' }
|
|
30
|
+
}
|
|
31
|
+
}
|
|
32
|
+
});
|
|
33
|
+
```
|
|
14
34
|
|
|
15
|
-
|
|
35
|
+
See [Usage](#usage) for more examples.
|
|
36
|
+
|
|
37
|
+
## Why Use This?
|
|
38
|
+
|
|
39
|
+
If you need to ingest large CSV/JSON files (gigabytes in size) into Elasticsearch without running out of memory, this is the tool for you. Other solutions often run out of JS heap, hammer ES with too many requests, time out, or try to do everything in a single bulk request.
|
|
40
|
+
|
|
41
|
+
**When to use this:**
|
|
42
|
+
- Large file ingestion (20-30 GB tested)
|
|
43
|
+
- Custom JavaScript transformations
|
|
44
|
+
- Cross-version migration (ES 8.x → 9.x)
|
|
45
|
+
- Developer-friendly Node.js workflow
|
|
46
|
+
|
|
47
|
+
**When to use alternatives:**
|
|
48
|
+
- [Logstash](https://www.elastic.co/products/logstash) - Enterprise ingestion pipelines
|
|
49
|
+
- [Filebeat](https://www.elastic.co/products/beats/filebeat) - Log file shipping
|
|
50
|
+
- [Elastic Agent](https://www.elastic.co/guide/en/fleet/current/fleet-overview.html) - Modern unified agent
|
|
51
|
+
- [Elasticsearch Transforms](https://www.elastic.co/guide/en/elasticsearch/reference/current/transforms.html) - Built-in data transformation
|
|
52
|
+
|
|
53
|
+
## Table of Contents
|
|
54
|
+
|
|
55
|
+
- [Features](#features)
|
|
56
|
+
- [Quick Start](#quick-start)
|
|
57
|
+
- [Installation](#installation)
|
|
58
|
+
- [Version Compatibility](#version-compatibility)
|
|
59
|
+
- [Usage](#usage)
|
|
60
|
+
- [Read from a file](#read-from-a-file)
|
|
61
|
+
- [Read from another index](#read-from-another-index)
|
|
62
|
+
- [Reindex from ES 8.x to 9.x](#reindex-from-elasticsearch-8x-to-9x)
|
|
63
|
+
- [API Reference](#api-reference)
|
|
64
|
+
- [Documentation](#documentation)
|
|
65
|
+
- [Contributing](#contributing)
|
|
66
|
+
- [License](#license)
|
|
16
67
|
|
|
17
68
|
## Features
|
|
18
69
|
|
|
19
|
-
-
|
|
20
|
-
-
|
|
21
|
-
-
|
|
22
|
-
-
|
|
23
|
-
-
|
|
70
|
+
- **Streaming and buffering**: Files are read using streams and Elasticsearch ingestion uses buffered bulk indexing. Handles very large files (20-30 GB tested) without running out of memory.
|
|
71
|
+
- **High throughput**: Up to 20k documents/second on a single machine (2.9 GHz Intel Core i7, 16GB RAM, SSD), depending on document size. See [PERFORMANCE.md](PERFORMANCE.md) for benchmarks and tuning guidance.
|
|
72
|
+
- **Wildcard support**: Ingest multiple files matching a pattern (e.g., `logs/*.json`).
|
|
73
|
+
- **Flexible sources**: Read from files, Elasticsearch indices, or Node.js streams.
|
|
74
|
+
- **Reindexing with transforms**: Fetch documents from existing indices and transform them using JavaScript.
|
|
75
|
+
- **Document splitting**: Transform one source document into multiple target documents (e.g., tweets → hashtags).
|
|
76
|
+
- **Cross-version support**: Seamlessly reindex between Elasticsearch 8.x and 9.x.
|
|
77
|
+
|
|
78
|
+
## Version Compatibility
|
|
24
79
|
|
|
25
|
-
|
|
80
|
+
| node-es-transformer | Elasticsearch Client | Elasticsearch Server | Node.js |
|
|
81
|
+
| ----------------------- | -------------------- | -------------------- | ------- |
|
|
82
|
+
| 1.0.0+ | 8.x and 9.x | 8.x and 9.x | 22+ |
|
|
83
|
+
| 1.0.0-beta7 and earlier | 8.x | 8.x | 18-20 |
|
|
26
84
|
|
|
27
|
-
|
|
85
|
+
**Multi-Version Support**: Starting with v1.0.0, the library supports both Elasticsearch 8.x and 9.x through automatic version detection and client aliasing. This enables seamless reindexing between major versions (e.g., migrating from ES 8.x to 9.x). All functionality is tested in CI against multiple ES versions including cross-version reindexing scenarios.
|
|
28
86
|
|
|
29
|
-
|
|
87
|
+
**Upgrading?** See [MIGRATION.md](MIGRATION.md) for upgrade guidance from beta versions to v1.0.0.
|
|
88
|
+
|
|
89
|
+
## Installation
|
|
90
|
+
|
|
91
|
+
```bash
|
|
92
|
+
npm install node-es-transformer
|
|
93
|
+
# or
|
|
94
|
+
yarn add node-es-transformer
|
|
95
|
+
```
|
|
96
|
+
|
|
97
|
+
## Usage
|
|
30
98
|
|
|
31
99
|
### Read from a file
|
|
32
100
|
|
|
@@ -95,57 +163,227 @@ transformer({
|
|
|
95
163
|
});
|
|
96
164
|
```
|
|
97
165
|
|
|
98
|
-
###
|
|
99
|
-
|
|
100
|
-
|
|
101
|
-
|
|
102
|
-
|
|
103
|
-
|
|
104
|
-
|
|
105
|
-
-
|
|
106
|
-
|
|
107
|
-
|
|
108
|
-
-
|
|
109
|
-
|
|
110
|
-
|
|
111
|
-
|
|
112
|
-
-
|
|
113
|
-
|
|
114
|
-
|
|
115
|
-
-
|
|
116
|
-
-
|
|
117
|
-
|
|
118
|
-
|
|
119
|
-
|
|
120
|
-
|
|
166
|
+
### Reindex from Elasticsearch 8.x to 9.x
|
|
167
|
+
|
|
168
|
+
The library automatically detects the Elasticsearch version and uses the appropriate client. This enables seamless reindexing between major versions:
|
|
169
|
+
|
|
170
|
+
```javascript
|
|
171
|
+
const transformer = require('node-es-transformer');
|
|
172
|
+
|
|
173
|
+
// Auto-detection (recommended)
|
|
174
|
+
transformer({
|
|
175
|
+
sourceClientConfig: {
|
|
176
|
+
node: 'https://es8-cluster.example.com:9200',
|
|
177
|
+
auth: { apiKey: 'your-es8-api-key' },
|
|
178
|
+
},
|
|
179
|
+
targetClientConfig: {
|
|
180
|
+
node: 'https://es9-cluster.example.com:9200',
|
|
181
|
+
auth: { apiKey: 'your-es9-api-key' },
|
|
182
|
+
},
|
|
183
|
+
sourceIndexName: 'my-source-index',
|
|
184
|
+
targetIndexName: 'my-target-index',
|
|
185
|
+
transform(doc) {
|
|
186
|
+
// Optional transformation during reindexing
|
|
187
|
+
return doc;
|
|
188
|
+
},
|
|
189
|
+
});
|
|
190
|
+
|
|
191
|
+
// Explicit version specification (if auto-detection fails)
|
|
192
|
+
transformer({
|
|
193
|
+
sourceClientConfig: {
|
|
194
|
+
/* ... */
|
|
195
|
+
},
|
|
196
|
+
targetClientConfig: {
|
|
197
|
+
/* ... */
|
|
198
|
+
},
|
|
199
|
+
sourceClientVersion: 8, // Force ES 8.x client
|
|
200
|
+
targetClientVersion: 9, // Force ES 9.x client
|
|
201
|
+
sourceIndexName: 'my-source-index',
|
|
202
|
+
targetIndexName: 'my-target-index',
|
|
203
|
+
});
|
|
204
|
+
|
|
205
|
+
// Using pre-instantiated clients (advanced)
|
|
206
|
+
const { Client: Client8 } = require('es8');
|
|
207
|
+
const { Client: Client9 } = require('es9');
|
|
208
|
+
|
|
209
|
+
const sourceClient = new Client8({
|
|
210
|
+
node: 'https://es8-cluster.example.com:9200',
|
|
211
|
+
});
|
|
212
|
+
const targetClient = new Client9({
|
|
213
|
+
node: 'https://es9-cluster.example.com:9200',
|
|
214
|
+
});
|
|
215
|
+
|
|
216
|
+
transformer({
|
|
217
|
+
sourceClient,
|
|
218
|
+
targetClient,
|
|
219
|
+
sourceIndexName: 'my-source-index',
|
|
220
|
+
targetIndexName: 'my-target-index',
|
|
221
|
+
});
|
|
222
|
+
```
|
|
223
|
+
|
|
224
|
+
**Note**: To use pre-instantiated clients with different ES versions, install both client versions:
|
|
121
225
|
|
|
122
226
|
```bash
|
|
123
|
-
|
|
124
|
-
|
|
125
|
-
yarn
|
|
227
|
+
npm install es9@npm:@elastic/elasticsearch@^9.2.0
|
|
228
|
+
npm install es8@npm:@elastic/elasticsearch@^8.17.0
|
|
126
229
|
```
|
|
127
230
|
|
|
128
|
-
|
|
231
|
+
## API Reference
|
|
129
232
|
|
|
130
|
-
|
|
131
|
-
A CommonJS bundle, suitable for use in Node.js, that `require`s the external dependency. This corresponds to the `"main"` field in package.json
|
|
132
|
-
- `dist/node-es-transformer.esm.js`
|
|
133
|
-
an ES module bundle, suitable for use in other people's libraries and applications, that `import`s the external dependency. This corresponds to the `"module"` field in package.json
|
|
233
|
+
### Configuration Options
|
|
134
234
|
|
|
135
|
-
|
|
235
|
+
All options are passed to the main `transformer()` function.
|
|
136
236
|
|
|
137
|
-
|
|
237
|
+
#### Required Options
|
|
138
238
|
|
|
139
|
-
|
|
140
|
-
|
|
141
|
-
|
|
239
|
+
- **`targetIndexName`** (string): The target Elasticsearch index where documents will be indexed.
|
|
240
|
+
|
|
241
|
+
#### Source Options
|
|
242
|
+
|
|
243
|
+
Choose **one** of these sources:
|
|
244
|
+
|
|
245
|
+
- **`fileName`** (string): Source filename to ingest. Supports wildcards (e.g., `logs/*.json`).
|
|
246
|
+
- **`sourceIndexName`** (string): Source Elasticsearch index to reindex from.
|
|
247
|
+
- **`stream`** (Readable): Node.js readable stream to ingest from.
|
|
248
|
+
|
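The required and source options above can be checked before a run starts. The following is a minimal sketch, not part of the library: `checkOptions` and `SOURCE_KEYS` are hypothetical names, and the check only encodes what the README states (a required `targetIndexName` and exactly one source).

```javascript
// Hypothetical pre-flight check; not part of node-es-transformer.
const SOURCE_KEYS = ['fileName', 'sourceIndexName', 'stream'];

function checkOptions(options = {}) {
  const errors = [];

  // `targetIndexName` is the only option documented as required.
  if (typeof options.targetIndexName !== 'string' || options.targetIndexName === '') {
    errors.push('targetIndexName must be a non-empty string');
  }

  // The README asks for exactly one source.
  const given = SOURCE_KEYS.filter(key => options[key] != null);
  if (given.length !== 1) {
    errors.push(`expected exactly one of ${SOURCE_KEYS.join(', ')}; got ${given.length}`);
  }

  return errors;
}

console.log(checkOptions({ fileName: 'logs/*.json', targetIndexName: 'my-index' })); // []
```

An empty array means the options pass this check; how the library itself reacts to missing or conflicting sources is not described here.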
|
249
|
+
#### Client Configuration
|
|
250
|
+
|
|
251
|
+
- **`sourceClient`** (Client): Pre-instantiated Elasticsearch client for source operations. If provided, `sourceClientConfig` is ignored.
|
|
252
|
+
- **`targetClient`** (Client): Pre-instantiated Elasticsearch client for target operations. If not provided, uses `sourceClient` or creates from config.
|
|
253
|
+
- **`sourceClientConfig`** (object): Elasticsearch client configuration for source. Default: `{ node: 'http://localhost:9200' }`. Ignored if `sourceClient` is provided.
|
|
254
|
+
- **`targetClientConfig`** (object): Elasticsearch client configuration for target. If not provided, uses `sourceClientConfig`. Ignored if `targetClient` is provided.
|
|
255
|
+
- **`sourceClientVersion`** (8 | 9): Force specific ES client version for source. Auto-detected if not specified.
|
|
256
|
+
- **`targetClientVersion`** (8 | 9): Force specific ES client version for target. Auto-detected if not specified.
|
|
257
|
+
|
|
258
|
+
#### Index Configuration
|
|
259
|
+
|
|
260
|
+
- **`mappings`** (object): Elasticsearch document mappings for target index. If reindexing and not provided, mappings are copied from source index.
|
|
261
|
+
- **`mappingsOverride`** (boolean): When reindexing, apply `mappings` on top of source index mappings. Default: `false`.
|
|
262
|
+
- **`deleteIndex`** (boolean): Delete target index if it exists before starting. Default: `false`.
|
|
263
|
+
- **`indexMappingTotalFieldsLimit`** (number): Field limit for target index (`index.mapping.total_fields.limit` setting).
|
|
264
|
+
- **`pipeline`** (string): Elasticsearch ingest pipeline name to use during indexing.
|
|
265
|
+
|
|
266
|
+
#### Performance Options
|
|
267
|
+
|
|
268
|
+
- **`bufferSize`** (number): Buffer size threshold in KB for bulk indexing. Default: `5120` (5 MB).
|
|
269
|
+
- **`searchSize`** (number): Number of documents to fetch per search request when reindexing. Default: `100`.
|
|
270
|
+
- **`populatedFields`** (boolean): Detect which fields are actually populated in documents. Useful for optimizing indices with many mapped but unused fields. Default: `false`.
|
|
271
|
+
|
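Assuming 1 KB = 1024 bytes, the default `bufferSize` of `5120` corresponds to 5120 × 1024 = 5,242,880 bytes. As an illustration of size-based buffering in general (a sketch, not the library's internals), the hypothetical `createBuffer` helper below collects serialized documents and hands them to a `flush` callback once the threshold is reached.

```javascript
// Sketch of size-based buffering; an illustration, not the library's code.
// Assumes 1 KB = 1024 bytes.
function createBuffer(bufferSizeKb, flush) {
  const limitBytes = bufferSizeKb * 1024;
  let lines = [];
  let bytes = 0;

  function drain() {
    if (lines.length === 0) return;
    const batch = lines;
    lines = [];
    bytes = 0;
    flush(batch);
  }

  return {
    add(doc) {
      const line = JSON.stringify(doc);
      lines.push(line);
      bytes += Buffer.byteLength(line, 'utf8');
      // Flush once the accumulated payload reaches the threshold.
      if (bytes >= limitBytes) drain();
    },
    // Call at end of input to send whatever is left.
    end: drain,
  };
}

// Usage: a 1 KB threshold and documents of 398 bytes each.
const batches = [];
const buffer = createBuffer(1, batch => batches.push(batch.length));
for (let i = 0; i < 5; i += 1) buffer.add({ id: i, text: 'x'.repeat(380) });
buffer.end();
console.log(batches); // [ 3, 2 ]
```

A larger threshold means fewer, bigger bulk requests; a smaller one means more frequent, lighter requests.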
|
272
|
+
#### Processing Options
|
|
273
|
+
|
|
274
|
+
- **`transform`** (function): Callback to transform documents. Signature: `(doc, context?) => doc | doc[] | null | undefined`.
|
|
275
|
+
- Return transformed document
|
|
276
|
+
- Return array of documents to split one source into multiple targets
|
|
277
|
+
- Return `null`/`undefined` to skip document
|
|
278
|
+
- **`query`** (object): Elasticsearch [DSL query](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html) to filter source documents.
|
|
279
|
+
- **`splitRegex`** (RegExp): Line split regex for file/stream sources. Default: `/\n/`.
|
|
280
|
+
- **`skipHeader`** (boolean): Skip first line of source file (e.g., CSV header). Default: `false`.
|
|
281
|
+
- **`verbose`** (boolean): Enable logging and progress bars. Default: `true`.
|
|
282
|
+
|
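The three return shapes of `transform` can be combined in one callback. The sketch below assumes a hypothetical tweet-like document with `id`, `text`, and `hashtags` fields; these field names are illustrative and not something the library defines.

```javascript
// Example `transform` callback. The document shape (id, text, hashtags)
// is a made-up example, not something the library prescribes.
function transform(doc) {
  // Skip documents without text by returning null.
  if (!doc.text) return null;

  // No hashtags: return a single, lightly cleaned-up document.
  if (!Array.isArray(doc.hashtags) || doc.hashtags.length === 0) {
    return { id: doc.id, text: doc.text.trim() };
  }

  // Split: one target document per hashtag.
  return doc.hashtags.map(tag => ({ tweetId: doc.id, hashtag: tag.toLowerCase() }));
}

console.log(transform({ id: 1, text: '' })); // null
console.log(transform({ id: 2, text: ' hello ' }));
console.log(transform({ id: 3, text: 'hi', hashtags: ['ES', 'Node'] }));
```

A function like this would be passed as the `transform` option alongside a source and `targetIndexName`.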
|
283
|
+
### Return Value
|
|
284
|
+
|
|
285
|
+
The `transformer()` function returns a Promise that resolves to an object with:
|
|
286
|
+
|
|
287
|
+
- **`events`** (EventEmitter): Event emitter for monitoring progress.
|
|
288
|
+
- `'queued'`: Document added to queue
|
|
289
|
+
- `'indexed'`: Document successfully indexed
|
|
290
|
+
- `'complete'`: All documents processed
|
|
291
|
+
- `'error'`: Error occurred
|
|
292
|
+
|
|
293
|
+
```javascript
|
|
294
|
+
const result = await transformer({
|
|
295
|
+
/* options */
|
|
296
|
+
});
|
|
297
|
+
|
|
298
|
+
result.events.on('complete', () => {
|
|
299
|
+
console.log('Ingestion complete!');
|
|
300
|
+
});
|
|
142
301
|
|
|
143
|
-
|
|
144
|
-
|
|
302
|
+
result.events.on('error', err => {
|
|
303
|
+
console.error('Error:', err);
|
|
304
|
+
});
|
|
305
|
+
```
|
|
306
|
+
|
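To report progress, the documented events can be counted. The sketch below uses a hypothetical `trackProgress` helper and demonstrates it with a plain Node.js `EventEmitter` standing in for `result.events`, so it runs without Elasticsearch. It assumes `'queued'` and `'indexed'` fire once per document, which the event descriptions suggest but do not guarantee.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: counts the documented events on any emitter.
function trackProgress(events) {
  const stats = { queued: 0, indexed: 0, done: false, error: null };
  events.on('queued', () => { stats.queued += 1; });
  events.on('indexed', () => { stats.indexed += 1; });
  events.on('complete', () => { stats.done = true; });
  events.on('error', err => { stats.error = err; });
  return stats;
}

// Stand-in emitter for demonstration; in real use pass `result.events`.
const fakeEvents = new EventEmitter();
const stats = trackProgress(fakeEvents);
fakeEvents.emit('queued');
fakeEvents.emit('queued');
fakeEvents.emit('indexed');
fakeEvents.emit('complete');
console.log(stats);
```

Registering an `'error'` listener matters on any Node.js `EventEmitter`: an `'error'` event with no listener is thrown as an exception.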
|
307
|
+
### TypeScript Support
|
|
308
|
+
|
|
309
|
+
Full TypeScript definitions are included. Import types for type-safe configuration:
|
|
310
|
+
|
|
311
|
+
```typescript
|
|
312
|
+
import transformer, { TransformerOptions } from 'node-es-transformer';
|
|
313
|
+
|
|
314
|
+
const options: TransformerOptions = {
|
|
315
|
+
fileName: 'data.json',
|
|
316
|
+
targetIndexName: 'my-index',
|
|
317
|
+
};
|
|
145
318
|
```
|
|
146
319
|
|
|
147
|
-
|
|
320
|
+
See [examples/typescript-example.ts](examples/typescript-example.ts) for more examples.
|
|
321
|
+
|
|
322
|
+
## Documentation
|
|
323
|
+
|
|
324
|
+
- **[README.md](README.md)** - Getting started and API reference (you are here)
|
|
325
|
+
- **[examples/](examples/)** - Practical code samples for common use cases
|
|
326
|
+
- **[VERSIONING.md](VERSIONING.md)** - API stability guarantees and versioning policy
|
|
327
|
+
- **[PERFORMANCE.md](PERFORMANCE.md)** - Benchmarks, tuning, and optimization guide
|
|
328
|
+
- **[TESTING.md](TESTING.md)** - Test coverage, approach, and how to run tests
|
|
329
|
+
- **[DEPENDENCIES.md](DEPENDENCIES.md)** - Dependency audit and update tracking
|
|
330
|
+
- **[MIGRATION.md](MIGRATION.md)** - Upgrading from beta to v1.0.0
|
|
331
|
+
- **[CONTRIBUTING.md](CONTRIBUTING.md)** - How to contribute (open an issue first!)
|
|
332
|
+
- **[DEVELOPMENT.md](DEVELOPMENT.md)** - Development setup and testing
|
|
333
|
+
- **[RELEASE.md](RELEASE.md)** - Complete release process and troubleshooting
|
|
334
|
+
- **[SECURITY.md](SECURITY.md)** - Security policy and vulnerability reporting
|
|
335
|
+
|
|
336
|
+
### Error Handling
|
|
337
|
+
|
|
338
|
+
Always handle errors when using the library:
|
|
339
|
+
|
|
340
|
+
```javascript
|
|
341
|
+
transformer({
|
|
342
|
+
/* options */
|
|
343
|
+
})
|
|
344
|
+
.then(() => console.log('Success'))
|
|
345
|
+
.catch(err => console.error('Error:', err));
|
|
346
|
+
|
|
347
|
+
// Or with async/await
|
|
348
|
+
try {
|
|
349
|
+
await transformer({
|
|
350
|
+
/* options */
|
|
351
|
+
});
|
|
352
|
+
console.log('Success');
|
|
353
|
+
} catch (err) {
|
|
354
|
+
console.error('Error:', err);
|
|
355
|
+
}
|
|
356
|
+
```
|
|
357
|
+
|
|
358
|
+
### More Examples
|
|
359
|
+
|
|
360
|
+
See the [examples/](examples/) directory for practical code samples covering:
|
|
361
|
+
|
|
362
|
+
- Basic file ingestion
|
|
363
|
+
- Reindexing with transformations
|
|
364
|
+
- Cross-version migration (ES 8.x → 9.x)
|
|
365
|
+
- Document splitting
|
|
366
|
+
- Wildcard file processing
|
|
367
|
+
- Stream-based ingestion
|
|
368
|
+
|
|
369
|
+
## Contributing
|
|
370
|
+
|
|
371
|
+
Contributions are welcome! Before starting work on a PR, please open an issue to discuss your proposed changes.
|
|
372
|
+
|
|
373
|
+
- [CONTRIBUTING.md](CONTRIBUTING.md) - Contribution guidelines and PR process
|
|
374
|
+
- [DEVELOPMENT.md](DEVELOPMENT.md) - Development setup, testing, and release process
|
|
375
|
+
- [SECURITY.md](SECURITY.md) - Security policy and vulnerability reporting
|
|
376
|
+
|
|
377
|
+
## Support
|
|
378
|
+
|
|
379
|
+
This is a single-person, best-effort project. While I aim to address issues and maintain the library, response times may vary. See [VERSIONING.md](VERSIONING.md) for details on API stability and support expectations.
|
|
380
|
+
|
|
381
|
+
**Getting help:**
|
|
382
|
+
- Check the [documentation](#documentation) first
|
|
383
|
+
- Review [examples/](examples/) for practical code samples
|
|
384
|
+
- Search [existing issues](https://github.com/walterra/node-es-transformer/issues)
|
|
385
|
+
- Open a new issue with details (version, steps to reproduce, expected vs actual behavior)
|
|
148
386
|
|
|
149
387
|
## License
|
|
150
388
|
|
|
151
|
-
[Apache 2.0](LICENSE)
|
|
389
|
+
[Apache 2.0](LICENSE)
|