node-es-transformer 1.0.0-beta3 → 1.0.0-beta5
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +95 -0
- package/README.md +6 -16
- package/changelog.config.js +3 -0
- package/dist/node-es-transformer.cjs.js +263 -339
- package/dist/node-es-transformer.cjs.js.map +1 -0
- package/dist/node-es-transformer.esm.js +254 -328
- package/dist/node-es-transformer.esm.js.map +1 -0
- package/package.json +7 -7
package/CHANGELOG.md
ADDED
@@ -0,0 +1,95 @@
+# Changelog
+
+All notable changes to this project will be documented in this file. See [commit-and-tag-version](https://github.com/absolute-version/commit-and-tag-version) for commit guidelines.
+
+## [1.0.0-beta5](https://github.com/walterra/node-es-transformer/compare/v1.0.0-beta4...v1.0.0-beta5) (2024-12-31)
+
+
+### ⚠ BREAKING CHANGES
+
+* Minimum required nodejs version: v20
+
+* update .nvmrc to v20 ([e77899a](https://github.com/walterra/node-es-transformer/commit/e77899a398703fd2d0ffaeb093a3b7a1d638cc6f))
+
+
+### Features
+
+* emits docsPerSecond event ([bc825fe](https://github.com/walterra/node-es-transformer/commit/bc825fee18cbe2eaafc2f7d67f34067e0b93db50))
+* support for pipeline ([c53ea2c](https://github.com/walterra/node-es-transformer/commit/c53ea2c5465b564ec141ab2abfe7e0db6cac1fb0))
+
+## [1.0.0-beta4](https://github.com/walterra/node-es-transformer/compare/v1.0.0-beta3...v1.0.0-beta4) (2024-12-30)
+
+### Docs
+
+- updated README.md to document `stream` option (bc664fe).
+
+### Chore
+
+- add tests for `stream` option (b5b644c).
+- update ES client to `8.17.0` (62edd5c).
+
+## [1.0.0-beta3](https://github.com/walterra/node-es-transformer/compare/v1.0.0-beta2...v1.0.0-beta3) (2024-12-23)
+
+### ⚠ BREAKING CHANGES
+
+- bufferSize is no longer number of docs but flush KBytes.
+
+### Features
+
+- delete index option ([b92bd21](https://github.com/walterra/node-es-transformer/commit/b92bd211ace2eb66aedb06f58ba64e8c23f94aaa))
+- adds support for node stream as source ([281950c](https://github.com/walterra/node-es-transformer/commit/281950c12f20a9526f3d1db75ed23cec5255cba4))
+- make use of ES client bulk index helper ([b0b39c8](https://github.com/walterra/node-es-transformer/commit/b0b39c8fe95758cc52f8c82caa7ac4bde2cd87a1))
+
+## [1.0.0-beta2](https://github.com/walterra/node-es-transformer/compare/v1.0.0-beta1...v1.0.0-beta2) (2023-11-08)
+
+### Features
+
+- new `populatedFields` option ([abc9a06](https://github.com/walterra/node-es-transformer/commit/abc9a06ee0aade79fd5e4acf93371e7213790cde))
+
+### Bug Fixes
+
+- fix line handling for transform callback for file reader ([9962382](https://github.com/walterra/node-es-transformer/commit/99623824ef80fff2956bf9b90164395f8854ebe3))
+
+## [1.0.0-beta1](https://github.com/walterra/node-es-transformer/compare/v1.0.0-alpha12...v1.0.0-beta1) (2023-10-30)
+
+### Bug Fixes
+
+- avoid passing on an empty buffer in finish callback ([e0fbe8e](https://github.com/walterra/node-es-transformer/commit/e0fbe8e47a876af2d601fbe74521e46dbc0dc750))
+- fix event handling for file-reader ([5f472b3](https://github.com/walterra/node-es-transformer/commit/5f472b37f647bb0320653f8333ccec984483c12f))
+- fixes parallel calls ([9c2785d](https://github.com/walterra/node-es-transformer/commit/9c2785d592ff5ee825799f4fe0a0dbaed54ddd15))
+- trigger end of progress bar only after finish event was triggered ([e57b9a0](https://github.com/walterra/node-es-transformer/commit/e57b9a0954c8e98cc8b724ba9dca546e1f443d59))
+
+## [1.0.0-alpha12](https://github.com/walterra/node-es-transformer/compare/v1.0.0-alpha11...v1.0.0-alpha12) (2023-10-12)
+
+### Features
+
+- bulk ingest with parallel calls and dynamic backoff ([0c7311d](https://github.com/walterra/node-es-transformer/commit/0c7311daf19b0da1a59a8698a9dd9b240ca20c21))
+
+## [1.0.0-alpha11](https://github.com/walterra/node-es-transformer/compare/v1.0.0-alpha10...v1.0.0-alpha11) (2023-10-12)
+
+### Features
+
+- new option 'indexMappingTotalFieldsLimit' ([92edad1](https://github.com/walterra/node-es-transformer/commit/92edad18da7186d3881fc181e6e88b7929bed2d4))
+
+### Bug Fixes
+
+- fixes bufferSize to be applied to index reader too ([ffc3749](https://github.com/walterra/node-es-transformer/commit/ffc3749e296cd39f39924571c197986addc756ff))
+
+## [`v1.0.0-alpha10`](https://github.com/walterra/node-es-transformer/releases/tag/v1.0.0-alpha10)
+
+- New option `mappingsOverride` (0b951e1).
+- New option `query` (45f91db).
+
+## [`v1.0.0-alpha9`](https://github.com/walterra/node-es-transformer/releases/tag/v1.0.0-alpha9)
+
+- Source and target configs are now expected to be passed in as complete client configs instead of individual parameters (5e6d0c7).
+
+## [`v1.0.0-alpha8`](https://github.com/walterra/node-es-transformer/releases/tag/v1.0.0-alpha8)
+
+- Exposes events and introduces `finish` event (a3e5810).
+- Drop support for `_type` from `6.x` indices (3a26a84).
+
+## [`v1.0.0-alpha7`](https://github.com/walterra/node-es-transformer/releases/tag/v1.0.0-alpha7)
+
+- This version locks down `event-stream` to version `3.3.4` because of the security issue described here: https://github.com/dominictarr/event-stream/issues/116
+- Last version to support `_type` from `6.x` indices.
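The beta3 breaking change turns `bufferSize` from a document count into a flush threshold in KBytes. As a rough illustration of what that means, here is a minimal sketch built around a hypothetical `ByteThresholdBuffer` class (not part of node-es-transformer). It measures each document as one NDJSON line in UTF-8 bytes, assumes 1 KByte = 1024 bytes, and hands back a batch once the running total reaches the threshold. Since beta3 the library delegates to the Elasticsearch client's bulk index helper, so its actual byte accounting may differ.

```javascript
// Hypothetical sketch: buffer documents until their serialized size
// reaches a KByte threshold (assumes 1 KByte = 1024 bytes).
class ByteThresholdBuffer {
  constructor(bufferSizeKBytes = 5120) {
    this.limitBytes = bufferSizeKBytes * 1024;
    this.docs = [];
    this.bytes = 0;
  }

  // Adds a document; returns the batch to flush once the limit is reached,
  // otherwise null.
  add(doc) {
    // Size of the document as one NDJSON line (JSON plus trailing newline).
    const size = Buffer.byteLength(JSON.stringify(doc) + "\n", "utf8");
    this.docs.push(doc);
    this.bytes += size;
    if (this.bytes >= this.limitBytes) {
      return this.drain();
    }
    return null;
  }

  // Returns everything buffered so far and resets the buffer.
  drain() {
    const batch = this.docs;
    this.docs = [];
    this.bytes = 0;
    return batch;
  }
}

// Usage: with a 1 KByte threshold, documents of about 300 bytes each
// are flushed in batches of four.
const buffer = new ByteThresholdBuffer(1);
const flushed = [];
for (let i = 0; i < 10; i++) {
  const batch = buffer.add({ id: i, payload: "x".repeat(280) });
  if (batch) flushed.push(batch.length);
}
const remainder = buffer.drain(); // final partial batch at end of input
console.log(flushed, remainder.length); // [ 4, 4 ] 2
```

This sketch flushes once the limit is crossed, so a batch can overshoot the threshold by at most one document.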
package/README.md
CHANGED
@@ -14,23 +14,12 @@ If you're looking for a nodejs based tool which allows you to ingest large CSV/J
 
 While I'd generally recommend using [Logstash](https://www.elastic.co/products/logstash), [filebeat](https://www.elastic.co/products/beats/filebeat), [Ingest Nodes](https://www.elastic.co/guide/en/elasticsearch/reference/master/ingest.html), [Elastic Agent](https://www.elastic.co/guide/en/fleet/current/fleet-overview.html) or [Elasticsearch Transforms](https://www.elastic.co/guide/en/elasticsearch/reference/current/transforms.html) for established use cases, this tool may be of help especially if you feel more at home in the JavaScript/nodejs universe and have use cases with customized ingestion and data transformation needs.
 
-**This is experimental code, use at your own risk. Nonetheless, I encourage you to give it a try so I can gather some feedback.**
-
-### So why is this still _alpha_?
-
-- The API is not quite final and might change from release to release.
-- The code needs some more safety measures to avoid some possible accidental data loss scenarios.
-- No test coverage yet.
-
----
-
-Now that we've talked about the caveats, let's have a look what you actually get with this tool:
-
 ## Features
 
 - Buffering/Streaming for both reading and indexing. Files are read using streaming and Elasticsearch ingestion is done using buffered bulk indexing. This is tailored towards ingestion of large files. Successfully tested so far with JSON and CSV files in the range of 20-30 GBytes. On a single machine running both `node-es-transformer` and Elasticsearch ingestion rates up to 20k documents/second were achieved (2,9 GHz Intel Core i7, 16GByte RAM, SSD), depending on document size.
 - Supports wildcards to ingest/transform a range of files in one go.
 - Supports fetching documents from existing indices using search/scroll. This allows you to reindex with custom data transformations just using JavaScript in the `transform` callback.
+- Supports ingesting docs based on a nodejs stream.
 - The `transform` callback gives you each source document, but you can split it up in multiple ones and return an array of documents. An example use case for this: Each source document is a Tweet and you want to transform that into an entity centric index based on Hashtags.
 
 ## Getting started
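The `transform` bullet above describes splitting one source document into several by returning an array. Here is a minimal sketch of the Tweet-to-hashtag case, written as a plain function so it runs without Elasticsearch. The input shape `{ id, text, hashtags }` and the output fields are assumptions for illustration, not a schema the library prescribes; whether the library treats an empty array as "skip this document" is not covered by the README, so that case is an assumption too.

```javascript
// Hypothetical transform callback: turns one tweet-like source document
// into one document per hashtag (an entity-centric view keyed by hashtag).
// The input shape { id, text, hashtags } is an assumption for illustration.
function transform(tweet) {
  const hashtags = Array.isArray(tweet.hashtags) ? tweet.hashtags : [];
  // Normalize and de-duplicate so "Elastic" and "elastic" map to one entity.
  const unique = [...new Set(hashtags.map((tag) => tag.toLowerCase()))];
  // Returning an array emits several target documents for one source document;
  // a tweet without hashtags yields an empty array.
  return unique.map((hashtag) => ({
    hashtag,
    tweet_id: tweet.id,
    text: tweet.text,
  }));
}

// Usage with a plain object; with the library, a function like this would be
// passed as the `transform` option.
const docs = transform({
  id: "1",
  text: "Ingesting with #nodejs into #Elastic #elastic",
  hashtags: ["nodejs", "Elastic", "elastic"],
});
console.log(docs.map((d) => d.hashtag)); // [ 'nodejs', 'elastic' ]
```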
@@ -112,9 +101,10 @@ transformer({
 - `sourceClientConfig`/`targetClientConfig`: Optional Elasticsearch client options, defaults to `{ node: 'http://localhost:9200' }`.
 - `bufferSize`: The threshold to flush bulk index request in KBytes, defaults to `5120`.
 - `searchSize`: The amount of documents to be fetched with each search request when reindexing from another source index.
-- `fileName`: Source filename to ingest, supports wildcards. If this is set, `sourceIndexName`
+- `fileName`: Source filename to ingest, supports wildcards. If this is set, `sourceIndexName` and `stream` are not allowed.
+- `stream`: Source nodejs stream to ingest. If this is set, `sourceIndexName` and `fileName` are not allowed.
 - `splitRegex`: Custom line split regex, defaults to `/\n/`.
-- `sourceIndexName`: The source Elasticsearch index to reindex from. If this is set, `fileName`
+- `sourceIndexName`: The source Elasticsearch index to reindex from. If this is set, `fileName` and `stream` are not allowed.
 - `targetIndexName`: The target Elasticsearch index where documents will be indexed.
 - `mappings`: Optional Elasticsearch document mappings. If not set and you're reindexing from another index, the mappings from the existing index will be used.
 - `mappingsOverride`: If you're reindexing and this is set to `true`, `mappings` will be applied on top of the source index's mappings. Defaults to `false`.
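The updated option list makes `fileName`, `stream` and `sourceIndexName` mutually exclusive and keeps `splitRegex` (default `/\n/`) for splitting input into lines. The sketch below illustrates both rules with two hypothetical helpers, `assertSingleSource` and `LineSplitter`; neither is exported by node-es-transformer, and the assumption that `splitRegex` applies to stream input the same way it does to files is not stated in the README. `LineSplitter` carries the trailing partial line over to the next chunk, because a stream chunk boundary can fall in the middle of a line.

```javascript
const { Readable } = require("node:stream");

// Hypothetical check mirroring the README rule: at most one of
// fileName, stream and sourceIndexName may be set (i.e. not undefined).
function assertSingleSource(options) {
  const set = ["fileName", "stream", "sourceIndexName"].filter(
    (key) => options[key] !== undefined
  );
  if (set.length > 1) {
    throw new Error(`Options ${set.join(", ")} are mutually exclusive.`);
  }
  return set[0];
}

// Hypothetical line splitter: buffers text across chunks and emits only
// complete, non-empty lines, splitting on splitRegex (default /\n/).
class LineSplitter {
  constructor(splitRegex = /\n/) {
    this.splitRegex = splitRegex;
    this.rest = "";
  }

  push(chunk) {
    const parts = (this.rest + chunk).split(this.splitRegex);
    // The last element may be an incomplete line; keep it for the next chunk.
    this.rest = parts.pop();
    return parts.filter((line) => line.length > 0);
  }

  // Emits whatever is left once the input has ended.
  flush() {
    const last = this.rest;
    this.rest = "";
    return last.length > 0 ? [last] : [];
  }
}

// Hypothetical consumer: parses every complete line of a Readable as JSON.
async function readDocs(readable, splitRegex) {
  const splitter = new LineSplitter(splitRegex);
  const docs = [];
  for await (const chunk of readable) {
    for (const line of splitter.push(chunk.toString())) docs.push(JSON.parse(line));
  }
  for (const line of splitter.flush()) docs.push(JSON.parse(line));
  return docs;
}

// Usage: NDJSON from a Readable whose chunk boundary falls inside the second line.
const stream = Readable.from(['{"a":1}\n{"a"', ':2}\n{"a":3}']);
assertSingleSource({ stream, targetIndexName: "target" });
readDocs(stream).then((docs) => console.log(docs)); // [ { a: 1 }, { a: 2 }, { a: 3 } ]
```

A custom `splitRegex` passed to this sketch should not contain capture groups, since `String.prototype.split` would include the captured text in the result.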
@@ -148,10 +138,10 @@ yarn
 
 ```bash
 # Download the docker image
-docker pull docker.elastic.co/elasticsearch/elasticsearch:8.
+docker pull docker.elastic.co/elasticsearch/elasticsearch:8.17.0
 
 # Run the container
-docker run --name es01 --net elastic -p 9200:9200 -it -m 1GB -e "discovery.type=single-node" -e "xpack.security.enabled=false" docker.elastic.co/elasticsearch/elasticsearch:8.
+docker run --name es01 --net elastic -p 9200:9200 -it -m 1GB -e "discovery.type=single-node" -e "xpack.security.enabled=false" docker.elastic.co/elasticsearch/elasticsearch:8.17.0
 ```
 
 To commit, use `cz`. To prepare a release, use e.g. `yarn release -- --release-as 1.0.0-beta2`.