node-es-transformer 1.0.0-beta3 → 1.0.0-beta5

package/CHANGELOG.md ADDED
@@ -0,0 +1,95 @@
+ # Changelog
+
+ All notable changes to this project will be documented in this file. See [commit-and-tag-version](https://github.com/absolute-version/commit-and-tag-version) for commit guidelines.
+
+ ## [1.0.0-beta5](https://github.com/walterra/node-es-transformer/compare/v1.0.0-beta4...v1.0.0-beta5) (2024-12-31)
+
+
+ ### ⚠ BREAKING CHANGES
+
+ * Minimum required nodejs version: v20
+
+ * update .nvmrc to v20 ([e77899a](https://github.com/walterra/node-es-transformer/commit/e77899a398703fd2d0ffaeb093a3b7a1d638cc6f))
+
+
+ ### Features
+
+ * emits docsPerSecond event ([bc825fe](https://github.com/walterra/node-es-transformer/commit/bc825fee18cbe2eaafc2f7d67f34067e0b93db50))
+ * support for pipeline ([c53ea2c](https://github.com/walterra/node-es-transformer/commit/c53ea2c5465b564ec141ab2abfe7e0db6cac1fb0))
+
+ ## [1.0.0-beta4](https://github.com/walterra/node-es-transformer/compare/v1.0.0-beta3...v1.0.0-beta4) (2024-12-30)
+
+ ### Docs
+
+ - updated README.md to document `stream` option (bc664fe).
+
+ ### Chore
+
+ - add tests for `stream` option (b5b644c).
+ - update ES client to `8.17.0` (62edd5c).
+
+ ## [1.0.0-beta3](https://github.com/walterra/node-es-transformer/compare/v1.0.0-beta2...v1.0.0-beta3) (2024-12-23)
+
+ ### ⚠ BREAKING CHANGES
+
+ - bufferSize is no longer number of docs but flush KBytes.
+
+ ### Features
+
+ - delete index option ([b92bd21](https://github.com/walterra/node-es-transformer/commit/b92bd211ace2eb66aedb06f58ba64e8c23f94aaa))
+ - adds support for node stream as source ([281950c](https://github.com/walterra/node-es-transformer/commit/281950c12f20a9526f3d1db75ed23cec5255cba4))
+ - make use of ES client bulk index helper ([b0b39c8](https://github.com/walterra/node-es-transformer/commit/b0b39c8fe95758cc52f8c82caa7ac4bde2cd87a1))
+
+ ## [1.0.0-beta2](https://github.com/walterra/node-es-transformer/compare/v1.0.0-beta1...v1.0.0-beta2) (2023-11-08)
+
+ ### Features
+
+ - new `populatedFields` option ([abc9a06](https://github.com/walterra/node-es-transformer/commit/abc9a06ee0aade79fd5e4acf93371e7213790cde))
+
+ ### Bug Fixes
+
+ - fix line handling for transform callback for file reader ([9962382](https://github.com/walterra/node-es-transformer/commit/99623824ef80fff2956bf9b90164395f8854ebe3))
+
+ ## [1.0.0-beta1](https://github.com/walterra/node-es-transformer/compare/v1.0.0-alpha12...v1.0.0-beta1) (2023-10-30)
+
+ ### Bug Fixes
+
+ - avoid passing on an empty buffer in finish callback ([e0fbe8e](https://github.com/walterra/node-es-transformer/commit/e0fbe8e47a876af2d601fbe74521e46dbc0dc750))
+ - fix event handling for file-reader ([5f472b3](https://github.com/walterra/node-es-transformer/commit/5f472b37f647bb0320653f8333ccec984483c12f))
+ - fixes parallel calls ([9c2785d](https://github.com/walterra/node-es-transformer/commit/9c2785d592ff5ee825799f4fe0a0dbaed54ddd15))
+ - trigger end of progress bar only after finish event was triggered ([e57b9a0](https://github.com/walterra/node-es-transformer/commit/e57b9a0954c8e98cc8b724ba9dca546e1f443d59))
+
+ ## [1.0.0-alpha12](https://github.com/walterra/node-es-transformer/compare/v1.0.0-alpha11...v1.0.0-alpha12) (2023-10-12)
+
+ ### Features
+
+ - bulk ingest with parallel calls and dynamic backoff ([0c7311d](https://github.com/walterra/node-es-transformer/commit/0c7311daf19b0da1a59a8698a9dd9b240ca20c21))
+
+ ## [1.0.0-alpha11](https://github.com/walterra/node-es-transformer/compare/v1.0.0-alpha10...v1.0.0-alpha11) (2023-10-12)
+
+ ### Features
+
+ - new option 'indexMappingTotalFieldsLimit' ([92edad1](https://github.com/walterra/node-es-transformer/commit/92edad18da7186d3881fc181e6e88b7929bed2d4))
+
+ ### Bug Fixes
+
+ - fixes bufferSize to be applied to index reader too ([ffc3749](https://github.com/walterra/node-es-transformer/commit/ffc3749e296cd39f39924571c197986addc756ff))
+
+ ## [`v1.0.0-alpha10`](https://github.com/walterra/node-es-transformer/releases/tag/v1.0.0-alpha10)
+
+ - New option `mappingsOverride` (0b951e1).
+ - New option `query` (45f91db).
+
+ ## [`v1.0.0-alpha9`](https://github.com/walterra/node-es-transformer/releases/tag/v1.0.0-alpha9)
+
+ - Source and target configs are now expected to be passed in as complete client configs instead of individual parameters (5e6d0c7).
+
+ ## [`v1.0.0-alpha8`](https://github.com/walterra/node-es-transformer/releases/tag/v1.0.0-alpha8)
+
+ - Exposes events and introduces `finish` event (a3e5810).
+ - Drop support for `_type` from `6.x` indices (3a26a84).
+
+ ## [`v1.0.0-alpha7`](https://github.com/walterra/node-es-transformer/releases/tag/v1.0.0-alpha7)
+
+ - This version locks down `event-stream` to version `3.3.4` because of the security issue described here: https://github.com/dominictarr/event-stream/issues/116
+ - Last version to support `_type` from `6.x` indices.
package/README.md CHANGED
@@ -14,23 +14,12 @@ If you're looking for a nodejs based tool which allows you to ingest large CSV/J
 
  While I'd generally recommend using [Logstash](https://www.elastic.co/products/logstash), [filebeat](https://www.elastic.co/products/beats/filebeat), [Ingest Nodes](https://www.elastic.co/guide/en/elasticsearch/reference/master/ingest.html), [Elastic Agent](https://www.elastic.co/guide/en/fleet/current/fleet-overview.html) or [Elasticsearch Transforms](https://www.elastic.co/guide/en/elasticsearch/reference/current/transforms.html) for established use cases, this tool may be of help especially if you feel more at home in the JavaScript/nodejs universe and have use cases with customized ingestion and data transformation needs.
 
- **This is experimental code, use at your own risk. Nonetheless, I encourage you to give it a try so I can gather some feedback.**
-
- ### So why is this still _alpha_?
-
- - The API is not quite final and might change from release to release.
- - The code needs some more safety measures to avoid some possible accidental data loss scenarios.
- - No test coverage yet.
-
- ---
-
- Now that we've talked about the caveats, let's have a look what you actually get with this tool:
-
  ## Features
 
  - Buffering/Streaming for both reading and indexing. Files are read using streaming and Elasticsearch ingestion is done using buffered bulk indexing. This is tailored towards ingestion of large files. Successfully tested so far with JSON and CSV files in the range of 20-30 GBytes. On a single machine running both `node-es-transformer` and Elasticsearch ingestion rates up to 20k documents/second were achieved (2,9 GHz Intel Core i7, 16GByte RAM, SSD), depending on document size.
  - Supports wildcards to ingest/transform a range of files in one go.
  - Supports fetching documents from existing indices using search/scroll. This allows you to reindex with custom data transformations just using JavaScript in the `transform` callback.
+ - Supports ingesting docs based on a nodejs stream.
  - The `transform` callback gives you each source document, but you can split it up in multiple ones and return an array of documents. An example use case for this: Each source document is a Tweet and you want to transform that into an entity centric index based on Hashtags.
 
  ## Getting started
@@ -112,9 +101,10 @@ transformer({
  - `sourceClientConfig`/`targetClientConfig`: Optional Elasticsearch client options, defaults to `{ node: 'http://localhost:9200' }`.
  - `bufferSize`: The threshold to flush bulk index request in KBytes, defaults to `5120`.
  - `searchSize`: The amount of documents to be fetched with each search request when reindexing from another source index.
- - `fileName`: Source filename to ingest, supports wildcards. If this is set, `sourceIndexName` is not allowed.
+ - `fileName`: Source filename to ingest, supports wildcards. If this is set, `sourceIndexName` and `stream` are not allowed.
+ - `stream`: Source nodejs stream to ingest. If this is set, `sourceIndexName` and `fileName` are not allowed.
  - `splitRegex`: Custom line split regex, defaults to `/\n/`.
- - `sourceIndexName`: The source Elasticsearch index to reindex from. If this is set, `fileName` is not allowed.
+ - `sourceIndexName`: The source Elasticsearch index to reindex from. If this is set, `fileName` and `stream` are not allowed.
  - `targetIndexName`: The target Elasticsearch index where documents will be indexed.
  - `mappings`: Optional Elasticsearch document mappings. If not set and you're reindexing from another index, the mappings from the existing index will be used.
  - `mappingsOverride`: If you're reindexing and this is set to `true`, `mappings` will be applied on top of the source index's mappings. Defaults to `false`.
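
Note on the hunk above: `fileName`, `stream` and `sourceIndexName` are now documented as mutually exclusive. The sketch below illustrates that rule only. `resolveSource` is a hypothetical helper written for this note, not a function exported by `node-es-transformer`, and returning `null` when no source option is set is an assumption, since the README text does not say what happens in that case.

```javascript
const { Readable } = require('node:stream');

// The three source options the README describes as mutually exclusive.
const SOURCE_OPTIONS = ['fileName', 'stream', 'sourceIndexName'];

// Hypothetical helper (not part of node-es-transformer): returns the name of
// the single source option that is set, null if none is set (assumption),
// and throws if more than one is set.
function resolveSource(options) {
  const present = SOURCE_OPTIONS.filter((name) => options[name] !== undefined);
  if (present.length > 1) {
    throw new Error(`Only one source option is allowed, got: ${present.join(', ')}`);
  }
  return present.length === 1 ? present[0] : null;
}

// A nodejs stream as the only source: accepted.
const ndjson = Readable.from(['{"user":"a"}\n', '{"user":"b"}\n']);
console.log(resolveSource({ stream: ndjson, targetIndexName: 'my-target-index' }));

// Combining `stream` with `fileName`: rejected.
try {
  resolveSource({ stream: ndjson, fileName: 'data-*.ndjson' });
} catch (err) {
  console.log(err.message);
}
```

The index and file names used here are placeholders. How the library itself reacts to conflicting options is not visible in this diff.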
@@ -148,10 +138,10 @@ yarn
 
  ```bash
  # Download the docker image
- docker pull docker.elastic.co/elasticsearch/elasticsearch:8.15.0
+ docker pull docker.elastic.co/elasticsearch/elasticsearch:8.17.0
 
  # Run the container
- docker run --name es01 --net elastic -p 9200:9200 -it -m 1GB -e "discovery.type=single-node" -e "xpack.security.enabled=false" docker.elastic.co/elasticsearch/elasticsearch:8.15.0
+ docker run --name es01 --net elastic -p 9200:9200 -it -m 1GB -e "discovery.type=single-node" -e "xpack.security.enabled=false" docker.elastic.co/elasticsearch/elasticsearch:8.17.0
  ```
 
  To commit, use `cz`. To prepare a release, use e.g. `yarn release -- --release-as 1.0.0-beta2`.
@@ -0,0 +1,3 @@
+ module.exports = {
+   disableEmoji: true,
+ };