node-es-transformer 1.0.0-beta7 → 1.0.2

package/LICENSE CHANGED
@@ -1,4 +1,4 @@
- Copyright 2018-2025 Walter Rafelsberger
+ Copyright 2018-2026 Walter M. Rafelsberger
 
  Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
 
package/README.md CHANGED
@@ -1,32 +1,100 @@
- [![npm](https://img.shields.io/npm/v/node-es-transformer.svg?maxAge=2592000)](https://www.npmjs.com/package/node-es-transformer)
- [![npm](https://img.shields.io/npm/l/node-es-transformer.svg?maxAge=2592000)](https://www.npmjs.com/package/node-es-transformer)
- [![npm](https://img.shields.io/npm/dt/node-es-transformer.svg?maxAge=2592000)](https://www.npmjs.com/package/node-es-transformer)
- [![Commitizen friendly](https://img.shields.io/badge/commitizen-friendly-brightgreen.svg)](http://commitizen.github.io/cz-cli/)
+ [![npm version](https://img.shields.io/npm/v/node-es-transformer.svg)](https://www.npmjs.com/package/node-es-transformer)
+ [![npm downloads](https://img.shields.io/npm/dt/node-es-transformer.svg)](https://www.npmjs.com/package/node-es-transformer)
+ [![license](https://img.shields.io/npm/l/node-es-transformer.svg)](https://github.com/walterra/node-es-transformer/blob/main/LICENSE)
+ [![Node.js version](https://img.shields.io/badge/node-%3E%3D22-brightgreen.svg)](https://nodejs.org/)
  [![CI](https://github.com/walterra/node-es-transformer/actions/workflows/ci.yml/badge.svg)](https://github.com/walterra/node-es-transformer/actions)
+ [![TypeScript](https://img.shields.io/badge/TypeScript-definitions-blue.svg)](https://github.com/walterra/node-es-transformer/blob/main/index.d.ts)
+ [![Elasticsearch](https://img.shields.io/badge/Elasticsearch-8.x%20%7C%209.x-005571.svg)](https://www.elastic.co/elasticsearch/)
 
  # node-es-transformer
 
- A nodejs based library to (re)index and transform data from/to Elasticsearch.
+ Stream-based library for ingesting and transforming large data files (CSV/JSON) into Elasticsearch indices.
 
- ### Why another reindex/ingestion tool?
+ ## Quick Start
 
- If you're looking for a nodejs based tool which allows you to ingest large CSV/JSON files in the GigaBytes you've come to the right place. Everything else I've tried with larger files runs out of JS heap, hammers ES with too many single requests, times out or tries to do everything with a single bulk request.
+ ```bash
+ npm install node-es-transformer
+ ```
+
+ ```javascript
+ const transformer = require('node-es-transformer');
+
+ // Ingest a large JSON file
+ await transformer({
+   fileName: 'data.json',
+   targetIndexName: 'my-index',
+   mappings: {
+     properties: {
+       '@timestamp': { type: 'date' },
+       'message': { type: 'text' }
+     }
+   }
+ });
+ ```
 
- While I'd generally recommend using [Logstash](https://www.elastic.co/products/logstash), [filebeat](https://www.elastic.co/products/beats/filebeat), [Ingest Nodes](https://www.elastic.co/guide/en/elasticsearch/reference/master/ingest.html), [Elastic Agent](https://www.elastic.co/guide/en/fleet/current/fleet-overview.html) or [Elasticsearch Transforms](https://www.elastic.co/guide/en/elasticsearch/reference/current/transforms.html) for established use cases, this tool may be of help especially if you feel more at home in the JavaScript/nodejs universe and have use cases with customized ingestion and data transformation needs.
+ See [Usage](#usage) for more examples.
+
+ ## Why Use This?
+
+ If you need to ingest large CSV/JSON files (GigaBytes) into Elasticsearch without running out of memory, this is the tool for you. Other solutions often run out of JS heap, hammer ES with too many requests, time out, or try to do everything in a single bulk request.
+
+ **When to use this:**
+ - Large file ingestion (20-30 GB tested)
+ - Custom JavaScript transformations
+ - Cross-version migration (ES 8.x → 9.x)
+ - Developer-friendly Node.js workflow
+
+ **When to use alternatives:**
+ - [Logstash](https://www.elastic.co/products/logstash) - Enterprise ingestion pipelines
+ - [Filebeat](https://www.elastic.co/products/beats/filebeat) - Log file shipping
+ - [Elastic Agent](https://www.elastic.co/guide/en/fleet/current/fleet-overview.html) - Modern unified agent
+ - [Elasticsearch Transforms](https://www.elastic.co/guide/en/elasticsearch/reference/current/transforms.html) - Built-in data transformation
+
+ ## Table of Contents
+
+ - [Features](#features)
+ - [Quick Start](#quick-start)
+ - [Installation](#installation)
+ - [Version Compatibility](#version-compatibility)
+ - [Usage](#usage)
+   - [Read from a file](#read-from-a-file)
+   - [Read from another index](#read-from-another-index)
+   - [Reindex from ES 8.x to 9.x](#reindex-from-elasticsearch-8x-to-9x)
+ - [API Reference](#api-reference)
+ - [Documentation](#documentation)
+ - [Contributing](#contributing)
+ - [License](#license)
 
  ## Features
 
- - Buffering/Streaming for both reading and indexing. Files are read using streaming and Elasticsearch ingestion is done using buffered bulk indexing. This is tailored towards ingestion of large files. Successfully tested so far with JSON and CSV files in the range of 20-30 GBytes. On a single machine running both `node-es-transformer` and Elasticsearch ingestion rates up to 20k documents/second were achieved (2,9 GHz Intel Core i7, 16GByte RAM, SSD), depending on document size.
- - Supports wildcards to ingest/transform a range of files in one go.
- - Supports fetching documents from existing indices using search/scroll. This allows you to reindex with custom data transformations just using JavaScript in the `transform` callback.
- - Supports ingesting docs based on a nodejs stream.
- - The `transform` callback gives you each source document, but you can split it up in multiple ones and return an array of documents. An example use case for this: Each source document is a Tweet and you want to transform that into an entity centric index based on Hashtags.
+ - **Streaming and buffering**: Files are read using streams and Elasticsearch ingestion uses buffered bulk indexing. Handles very large files (20-30 GB tested) without running out of memory.
+ - **High throughput**: Up to 20k documents/second on a single machine (2.9 GHz Intel Core i7, 16GB RAM, SSD), depending on document size. See [PERFORMANCE.md](PERFORMANCE.md) for benchmarks and tuning guidance.
+ - **Wildcard support**: Ingest multiple files matching a pattern (e.g., `logs/*.json`).
+ - **Flexible sources**: Read from files, Elasticsearch indices, or Node.js streams.
+ - **Reindexing with transforms**: Fetch documents from existing indices and transform them using JavaScript.
+ - **Document splitting**: Transform one source document into multiple target documents (e.g., tweets → hashtags).
+ - **Cross-version support**: Seamlessly reindex between Elasticsearch 8.x and 9.x.
+
+ ## Version Compatibility
 
- ## Getting started
+ | node-es-transformer | Elasticsearch Client | Elasticsearch Server | Node.js |
+ | ----------------------- | -------------------- | -------------------- | ------- |
+ | 1.0.0+ | 8.x and 9.x | 8.x and 9.x | 22+ |
+ | 1.0.0-beta7 and earlier | 8.x | 8.x | 18-20 |
 
- In your node-js project, add `node-es-transformer` as a dependency (`yarn add node-es-transformer` or `npm install node-es-transformer`).
+ **Multi-Version Support**: Starting with v1.0.0, the library supports both Elasticsearch 8.x and 9.x through automatic version detection and client aliasing. This enables seamless reindexing between major versions (e.g., migrating from ES 8.x to 9.x). All functionality is tested in CI against multiple ES versions including cross-version reindexing scenarios.
 
- Use the library in your code like:
+ **Upgrading?** See [MIGRATION.md](MIGRATION.md) for upgrade guidance from beta versions to v1.0.0.
+
+ ## Installation
+
+ ```bash
+ npm install node-es-transformer
+ # or
+ yarn add node-es-transformer
+ ```
+
+ ## Usage
 
  ### Read from a file
 
@@ -95,57 +163,227 @@ transformer({
  });
  ```
 
- ### Options
-
- - `deleteIndex`: Setting to automatically delete an existing index, default is `false`.
- - `sourceClientConfig`/`targetClientConfig`: Optional Elasticsearch client options, defaults to `{ node: 'http://localhost:9200' }`.
- - `bufferSize`: The threshold to flush bulk index request in KBytes, defaults to `5120`.
- - `searchSize`: The amount of documents to be fetched with each search request when reindexing from another source index.
- - `fileName`: Source filename to ingest, supports wildcards. If this is set, `sourceIndexName` and `stream` are not allowed.
- - `stream`: Source nodejs stream to ingest. If this is set, `sourceIndexName` and `fileName` are not allowed.
- - `splitRegex`: Custom line split regex, defaults to `/\n/`.
- - `sourceIndexName`: The source Elasticsearch index to reindex from. If this is set, `fileName` and `stream` are not allowed.
- - `targetIndexName`: The target Elasticsearch index where documents will be indexed.
- - `mappings`: Optional Elasticsearch document mappings. If not set and you're reindexing from another index, the mappings from the existing index will be used.
- - `mappingsOverride`: If you're reindexing and this is set to `true`, `mappings` will be applied on top of the source index's mappings. Defaults to `false`.
- - `indexMappingTotalFieldsLimit`: Optional field limit for the target index to be created that will be passed on as the `index.mapping.total_fields.limit` setting.
- - `populatedFields`: If `true`, fetches a set of random documents to identify which fields are actually used by documents. Can be useful for indices with lots of field mappings to increase query/reindex performance. Defaults to `false`.
- - `query`: Optional Elasticsearch [DSL query](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html) to filter documents from the source index.
- - `skipHeader`: If true, skips the first line of the source file. Defaults to `false`.
- - `transform(line)`: A callback function which allows the transformation of a source line into one or several documents.
- - `verbose`: Logging verbosity, defaults to `true`
-
- ## Development
-
- Clone this repository and install its dependencies:
+ ### Reindex from Elasticsearch 8.x to 9.x
+
+ The library automatically detects the Elasticsearch version and uses the appropriate client. This enables seamless reindexing between major versions:
+
+ ```javascript
+ const transformer = require('node-es-transformer');
+
+ // Auto-detection (recommended)
+ transformer({
+   sourceClientConfig: {
+     node: 'https://es8-cluster.example.com:9200',
+     auth: { apiKey: 'your-es8-api-key' },
+   },
+   targetClientConfig: {
+     node: 'https://es9-cluster.example.com:9200',
+     auth: { apiKey: 'your-es9-api-key' },
+   },
+   sourceIndexName: 'my-source-index',
+   targetIndexName: 'my-target-index',
+   transform(doc) {
+     // Optional transformation during reindexing
+     return doc;
+   },
+ });
+
+ // Explicit version specification (if auto-detection fails)
+ transformer({
+   sourceClientConfig: {
+     /* ... */
+   },
+   targetClientConfig: {
+     /* ... */
+   },
+   sourceClientVersion: 8, // Force ES 8.x client
+   targetClientVersion: 9, // Force ES 9.x client
+   sourceIndexName: 'my-source-index',
+   targetIndexName: 'my-target-index',
+ });
+
+ // Using pre-instantiated clients (advanced)
+ const { Client: Client8 } = require('es8');
+ const { Client: Client9 } = require('es9');
+
+ const sourceClient = new Client8({
+   node: 'https://es8-cluster.example.com:9200',
+ });
+ const targetClient = new Client9({
+   node: 'https://es9-cluster.example.com:9200',
+ });
+
+ transformer({
+   sourceClient,
+   targetClient,
+   sourceIndexName: 'my-source-index',
+   targetIndexName: 'my-target-index',
+ });
+ ```
+
+ **Note**: To use pre-instantiated clients with different ES versions, install both client versions:
 
  ```bash
- git clone https://github.com/walterra/node-es-transformer
- cd node-es-transformer
- yarn
+ npm install es9@npm:@elastic/elasticsearch@^9.2.0
+ npm install es8@npm:@elastic/elasticsearch@^8.17.0
  ```
 
- `yarn build` builds the library to `dist`, generating two files:
+ ## API Reference
 
- - `dist/node-es-transformer.cjs.js`
-   A CommonJS bundle, suitable for use in Node.js, that `require`s the external dependency. This corresponds to the `"main"` field in package.json
- - `dist/node-es-transformer.esm.js`
-   an ES module bundle, suitable for use in other people's libraries and applications, that `import`s the external dependency. This corresponds to the `"module"` field in package.json
+ ### Configuration Options
 
- `yarn dev` builds the library, then keeps rebuilding it whenever the source files change using [rollup-watch](https://github.com/rollup/rollup-watch).
+ All options are passed to the main `transformer()` function.
 
- `yarn test` runs the tests. The tests expect that you have an Elasticsearch instance running without security at `http://localhost:9200`. Using docker, you can set this up with:
+ #### Required Options
 
- ```bash
- # Download the docker image
- docker pull docker.elastic.co/elasticsearch/elasticsearch:8.17.0
+ - **`targetIndexName`** (string): The target Elasticsearch index where documents will be indexed.
+
+ #### Source Options
+
+ Choose **one** of these sources:
+
+ - **`fileName`** (string): Source filename to ingest. Supports wildcards (e.g., `logs/*.json`).
+ - **`sourceIndexName`** (string): Source Elasticsearch index to reindex from.
+ - **`stream`** (Readable): Node.js readable stream to ingest from.
+
+ #### Client Configuration
+
+ - **`sourceClient`** (Client): Pre-instantiated Elasticsearch client for source operations. If provided, `sourceClientConfig` is ignored.
+ - **`targetClient`** (Client): Pre-instantiated Elasticsearch client for target operations. If not provided, uses `sourceClient` or creates from config.
+ - **`sourceClientConfig`** (object): Elasticsearch client configuration for source. Default: `{ node: 'http://localhost:9200' }`. Ignored if `sourceClient` is provided.
+ - **`targetClientConfig`** (object): Elasticsearch client configuration for target. If not provided, uses `sourceClientConfig`. Ignored if `targetClient` is provided.
+ - **`sourceClientVersion`** (8 | 9): Force specific ES client version for source. Auto-detected if not specified.
+ - **`targetClientVersion`** (8 | 9): Force specific ES client version for target. Auto-detected if not specified.
+
+ #### Index Configuration
+
+ - **`mappings`** (object): Elasticsearch document mappings for target index. If reindexing and not provided, mappings are copied from source index.
+ - **`mappingsOverride`** (boolean): When reindexing, apply `mappings` on top of source index mappings. Default: `false`.
+ - **`deleteIndex`** (boolean): Delete target index if it exists before starting. Default: `false`.
+ - **`indexMappingTotalFieldsLimit`** (number): Field limit for target index (`index.mapping.total_fields.limit` setting).
+ - **`pipeline`** (string): Elasticsearch ingest pipeline name to use during indexing.
+
+ #### Performance Options
+
+ - **`bufferSize`** (number): Buffer size threshold in KBytes for bulk indexing. Default: `5120` (5 MB).
+ - **`searchSize`** (number): Number of documents to fetch per search request when reindexing. Default: `100`.
+ - **`populatedFields`** (boolean): Detect which fields are actually populated in documents. Useful for optimizing indices with many mapped but unused fields. Default: `false`.
+
+ #### Processing Options
+
+ - **`transform`** (function): Callback to transform documents. Signature: `(doc, context?) => doc | doc[] | null | undefined`.
+   - Return transformed document
+   - Return array of documents to split one source into multiple targets
+   - Return `null`/`undefined` to skip document
+ - **`query`** (object): Elasticsearch [DSL query](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html) to filter source documents.
+ - **`splitRegex`** (RegExp): Line split regex for file/stream sources. Default: `/\n/`.
+ - **`skipHeader`** (boolean): Skip first line of source file (e.g., CSV header). Default: `false`.
+ - **`verbose`** (boolean): Enable logging and progress bars. Default: `true`.
+
+ ### Return Value
+
+ The `transformer()` function returns a Promise that resolves to an object with:
+
+ - **`events`** (EventEmitter): Event emitter for monitoring progress.
+   - `'queued'`: Document added to queue
+   - `'indexed'`: Document successfully indexed
+   - `'complete'`: All documents processed
+   - `'error'`: Error occurred
+
+ ```javascript
+ const result = await transformer({
+   /* options */
+ });
+
+ result.events.on('complete', () => {
+   console.log('Ingestion complete!');
+ });
 
- # Run the container
- docker run --name es01 --net elastic -p 9200:9200 -it -m 1GB -e "discovery.type=single-node" -e "xpack.security.enabled=false" docker.elastic.co/elasticsearch/elasticsearch:8.17.0
+ result.events.on('error', err => {
+   console.error('Error:', err);
+ });
+ ```
+
+ ### TypeScript Support
+
+ Full TypeScript definitions are included. Import types for type-safe configuration:
+
+ ```typescript
+ import transformer, { TransformerOptions } from 'node-es-transformer';
+
+ const options: TransformerOptions = {
+   fileName: 'data.json',
+   targetIndexName: 'my-index',
+ };
  ```
 
- To commit, use `cz`. To prepare a release, use e.g. `yarn release -- --release-as 1.0.0-beta2`.
+ See [examples/typescript-example.ts](examples/typescript-example.ts) for more examples.
+
+ ## Documentation
+
+ - **[README.md](README.md)** - Getting started and API reference (you are here)
+ - **[examples/](examples/)** - Practical code samples for common use cases
+ - **[VERSIONING.md](VERSIONING.md)** - API stability guarantees and versioning policy
+ - **[PERFORMANCE.md](PERFORMANCE.md)** - Benchmarks, tuning, and optimization guide
+ - **[TESTING.md](TESTING.md)** - Test coverage, approach, and how to run tests
+ - **[DEPENDENCIES.md](DEPENDENCIES.md)** - Dependency audit and update tracking
+ - **[MIGRATION.md](MIGRATION.md)** - Upgrading from beta to v1.0.0
+ - **[CONTRIBUTING.md](CONTRIBUTING.md)** - How to contribute (open an issue first!)
+ - **[DEVELOPMENT.md](DEVELOPMENT.md)** - Development setup and testing
+ - **[RELEASE.md](RELEASE.md)** - Complete release process and troubleshooting
+ - **[SECURITY.md](SECURITY.md)** - Security policy and vulnerability reporting
+
+ ### Error Handling
+
+ Always handle errors when using the library:
+
+ ```javascript
+ transformer({
+   /* options */
+ })
+   .then(() => console.log('Success'))
+   .catch(err => console.error('Error:', err));
+
+ // Or with async/await
+ try {
+   await transformer({
+     /* options */
+   });
+   console.log('Success');
+ } catch (err) {
+   console.error('Error:', err);
+ }
+ ```
+
+ ### More Examples
+
+ See the [examples/](examples/) directory for practical code samples covering:
+
+ - Basic file ingestion
+ - Reindexing with transformations
+ - Cross-version migration (ES 8.x → 9.x)
+ - Document splitting
+ - Wildcard file processing
+ - Stream-based ingestion
+
+ ## Contributing
+
+ Contributions are welcome! Before starting work on a PR, please open an issue to discuss your proposed changes.
+
+ - [CONTRIBUTING.md](CONTRIBUTING.md) - Contribution guidelines and PR process
+ - [DEVELOPMENT.md](DEVELOPMENT.md) - Development setup, testing, and release process
+ - [SECURITY.md](SECURITY.md) - Security policy and vulnerability reporting
+
+ ## Support
+
+ This is a single-person best-effort project. While I aim to address issues and maintain the library, response times may vary. See [VERSIONING.md](VERSIONING.md) for details on API stability and support expectations.
+
+ **Getting help:**
+ - Check the [documentation](#documentation) first
+ - Review [examples/](examples/) for practical code samples
+ - Search [existing issues](https://github.com/walterra/node-es-transformer/issues)
+ - Open a new issue with details (version, steps to reproduce, expected vs actual behavior)
 
  ## License
 
- [Apache 2.0](LICENSE).
+ [Apache 2.0](LICENSE)
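
The `transform` contract documented in the new README (return a document, an array of documents, or `null`/`undefined` to skip) can be tried out without a running cluster. The following is a minimal sketch, assuming hypothetical tweet documents with `id` and `text` fields. `splitByHashtag` and `applyTransform` are illustrative names and are not part of node-es-transformer; `applyTransform` only approximates how a caller might flatten the three documented return shapes.

```javascript
// Hypothetical transform: one tweet in, one document per hashtag out.
// Returning null skips the source document, as the README describes.
function splitByHashtag(doc) {
  const tags = (doc.text || '').match(/#\w+/g);
  if (!tags) return null; // no hashtags: skip this tweet
  return tags.map(tag => ({
    hashtag: tag.slice(1).toLowerCase(),
    tweetId: doc.id,
  }));
}

// Illustrative stand-in that normalizes the documented return shapes
// (doc, doc[], null/undefined) into a flat list of target documents.
// This is an assumption for demonstration, not the library's internals.
function applyTransform(docs, transform) {
  const out = [];
  for (const doc of docs) {
    const result = transform(doc);
    if (result === null || result === undefined) continue; // skipped
    if (Array.isArray(result)) out.push(...result); // split into many
    else out.push(result); // single transformed document
  }
  return out;
}

const tweets = [
  { id: 1, text: 'Indexing with #Elasticsearch and #nodejs' },
  { id: 2, text: 'No tags here' },
];
const hashtagDocs = applyTransform(tweets, splitByHashtag);
console.log(hashtagDocs);
```

In practice a function shaped like `splitByHashtag` would be passed as the `transform` option to `transformer()`, which then handles the indexing itself.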