node-es-transformer 1.0.0-alpha8 → 1.0.0-beta1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +43 -39
- package/dist/node-es-transformer.cjs.js +199 -117
- package/dist/node-es-transformer.esm.js +199 -117
- package/package.json +22 -6
package/README.md
CHANGED
@@ -1,6 +1,8 @@
 [](https://www.npmjs.com/package/node-es-transformer)
 [](https://www.npmjs.com/package/node-es-transformer)
 [](https://www.npmjs.com/package/node-es-transformer)
+[](http://commitizen.github.io/cz-cli/)
+[](https://github.com/walterra/node-es-transformer/actions)
 
 # node-es-transformer
 
@@ -10,7 +12,7 @@ A nodejs based library to (re)index and transform data from/to Elasticsearch.
 
 If you're looking for a nodejs based tool which allows you to ingest large CSV/JSON files in the GigaBytes you've come to the right place. Everything else I've tried with larger files runs out of JS heap, hammers ES with too many single requests, times out or tries to do everything with a single bulk request.
 
-While I'd generally recommend using [Logstash](https://www.elastic.co/products/logstash), [filebeat](https://www.elastic.co/products/beats/filebeat)
+While I'd generally recommend using [Logstash](https://www.elastic.co/products/logstash), [filebeat](https://www.elastic.co/products/beats/filebeat), [Ingest Nodes](https://www.elastic.co/guide/en/elasticsearch/reference/master/ingest.html), [Elastic Agent](https://www.elastic.co/guide/en/fleet/current/fleet-overview.html) or [Elasticsearch Transforms](https://www.elastic.co/guide/en/elasticsearch/reference/current/transforms.html) for established use cases, this tool may be of help especially if you feel more at home in the JavaScript/nodejs universe and have use cases with customized ingestion and data transformation needs.
 
 **This is experimental code, use at your own risk. Nonetheless, I encourage you to give it a try so I can gather some feedback.**
 
@@ -26,7 +28,7 @@ Now that we've talked about the caveats, let's have a look what you actually get
 
 ## Features
 
-- Buffering/Streaming for both reading and indexing. Files are read using streaming and Elasticsearch ingestion is done using buffered bulk indexing. This is tailored towards ingestion of large files. Successfully tested so far with JSON and CSV files in the range of 20-30 GBytes. On a single machine running both `node-es-transformer` and Elasticsearch ingestion rates up to 20k documents/second were achieved (2,9 GHz Intel Core i7, 16GByte RAM, SSD).
+- Buffering/Streaming for both reading and indexing. Files are read using streaming and Elasticsearch ingestion is done using buffered bulk indexing. This is tailored towards ingestion of large files. Successfully tested so far with JSON and CSV files in the range of 20-30 GBytes. On a single machine running both `node-es-transformer` and Elasticsearch ingestion rates up to 20k documents/second were achieved (2,9 GHz Intel Core i7, 16GByte RAM, SSD), depending on document size.
 - Supports wildcards to ingest/transform a range of files in one go.
 - Supports fetching documents from existing indices using search/scroll. This allows you to reindex with custom data transformations just using JavaScript in the `transform` callback.
 - The `transform` callback gives you each source document, but you can split it up in multiple ones and return an array of documents. An example use case for this: Each source document is a Tweet and you want to transform that into an entity centric index based on Hashtags.
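The Tweet-to-Hashtags use case from the last feature bullet can be sketched as a plain `transform` callback. This is a hypothetical illustration, not code from the package; the field names (`created_at`, `hashtags`, `user`) are made up:

```javascript
// Hypothetical `transform` callback: splits one tweet line into one document
// per hashtag. Returning an array emits multiple docs from a single line;
// returning undefined skips the line entirely.
function tweetToHashtagDocs(line) {
  var tweet = JSON.parse(line);
  if (!Array.isArray(tweet.hashtags) || tweet.hashtags.length === 0) {
    // no hashtags: skip indexing this line
    return undefined;
  }
  return tweet.hashtags.map(function (hashtag) {
    return {
      '@timestamp': tweet.created_at,
      hashtag: hashtag,
      user: tweet.user,
    };
  });
}
```

A callback like this would be passed as the `transform` option of `transformer()`.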
@@ -46,20 +48,18 @@ transformer({
   fileName: 'filename.json',
   targetIndexName: 'my-index',
   mappings: {
-      type: 'keyword'
-    }
+    properties: {
+      '@timestamp': {
+        type: 'date'
+      },
+      'first_name': {
+        type: 'keyword'
+      },
+      'last_name': {
+        type: 'keyword'
+      }
+      'full_name': {
+        type: 'keyword'
       }
     }
   },
@@ -82,20 +82,18 @@ transformer({
   targetIndexName: 'my-target-index',
   // optional, if you skip mappings, they will be fetched from the source index.
   mappings: {
-      type: 'keyword'
-    }
+    properties: {
+      '@timestamp': {
+        type: 'date'
+      },
+      'first_name': {
+        type: 'keyword'
+      },
+      'last_name': {
+        type: 'keyword'
+      }
+      'full_name': {
+        type: 'keyword'
       }
     }
   },
@@ -111,18 +109,16 @@ transformer({
 ### Options
 
 - `deleteIndex`: Setting to automatically delete an existing index, default is `false`.
-- `
-- `host`/`targetHost`: Elasticsearch host, defaults to `localhost`.
-- `port`/`targetPort`: Elasticsearch port, defaults to `9200`.
-- `auth`/`targetAuth`: Optional Elasticsearch authorization object, for example `{ username: 'elastic', password: 'changeme'}`.
-- `rejectUnauthorized`: Elasticsearch TLS option, defaults to `true`.
-- `ca`: Optional path to certificate used for TLS configuration.
+- `sourceClientConfig`/`targetClientConfig`: Optional Elasticsearch client options, defaults to `{ node: 'http://localhost:9200' }`.
 - `bufferSize`: The amount of documents inserted with each Elasticsearch bulk insert request, default is `1000`.
 - `fileName`: Source filename to ingest, supports wildcards. If this is set, `sourceIndexName` is not allowed.
 - `splitRegex`: Custom line split regex, defaults to `/\n/`.
-- `sourceIndexName`: The source Elasticsearch to reindex from. If this is set, `fileName` is not allowed.
+- `sourceIndexName`: The source Elasticsearch index to reindex from. If this is set, `fileName` is not allowed.
 - `targetIndexName`: The target Elasticsearch index where documents will be indexed.
-- `mappings`: Elasticsearch document
+- `mappings`: Optional Elasticsearch document mappings. If not set and you're reindexing from another index, the mappings from the existing index will be used.
+- `mappingsOverride`: If you're reindexing and this is set to `true`, `mappings` will be applied on top of the source index's mappings. Defaults to `false`.
+- `indexMappingTotalFieldsLimit`: Optional field limit for the target index to be created that will be passed on as the `index.mapping.total_fields.limit` setting.
+- `query`: Optional Elasticsearch [DSL query](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html) to filter documents from the source index.
 - `skipHeader`: If true, skips the first line of the source file. Defaults to `false`.
 - `transform(line)`: A callback function which allows the transformation of a source line into one or several documents.
 - `verbose`: Logging verbosity, defaults to `true`.
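Taken together, the new beta1 options can be combined in a reindex configuration like the following hypothetical sketch (index names, node URL, query and transform are made up for illustration, not taken from the package docs):

```javascript
// Hypothetical configuration sketch combining the beta1 options above.
var reindexConfig = {
  sourceIndexName: 'my-source-index',
  targetIndexName: 'my-target-index',
  sourceClientConfig: { node: 'http://localhost:9200' },
  // merge an extra field on top of the source index's mappings
  mappingsOverride: true,
  mappings: { full_name: { type: 'keyword' } },
  indexMappingTotalFieldsLimit: 2000,
  // only reindex documents matching this DSL query
  query: { match: { user: 'walterra' } },
  transform: function (doc) {
    return Object.assign({}, doc, {
      full_name: doc.first_name + ' ' + doc.last_name,
    });
  },
};
```

An object like this would be passed to `transformer()`; calling it requires a running Elasticsearch instance, so only the shape is shown here.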
@@ -146,7 +142,15 @@ yarn
 
 `yarn dev` builds the library, then keeps rebuilding it whenever the source files change using [rollup-watch](https://github.com/rollup/rollup-watch).
 
-`yarn test`
+`yarn test` runs the tests. The tests expect that you have an Elasticsearch instance running without security at `http://localhost:9200`. Using docker, you can set this up with:
+
+```bash
+# Download the docker image
+docker pull docker.elastic.co/elasticsearch/elasticsearch:8.10.4
+
+# Run the container
+docker run --name es01 --net elastic -p 9200:9200 -it -m 1GB -e "discovery.type=single-node" -e "xpack.security.enabled=false" docker.elastic.co/elasticsearch/elasticsearch:8.10.4
+```
 
 ## License
 
package/dist/node-es-transformer.cjs.js
CHANGED
@@ -8,20 +8,26 @@ var glob = _interopDefault(require('glob'));
 var cliProgress = _interopDefault(require('cli-progress'));
 var elasticsearch = _interopDefault(require('@elastic/elasticsearch'));
 
+var DEFAULT_BUFFER_SIZE = 1000;
+
 function createMappingFactory(ref) {
   var sourceClient = ref.sourceClient;
   var sourceIndexName = ref.sourceIndexName;
   var targetClient = ref.targetClient;
   var targetIndexName = ref.targetIndexName;
   var mappings = ref.mappings;
+  var mappingsOverride = ref.mappingsOverride;
+  var indexMappingTotalFieldsLimit = ref.indexMappingTotalFieldsLimit;
   var verbose = ref.verbose;
 
   return async function () {
-    var targetMappings = mappings;
+    var targetMappings = mappingsOverride ? undefined : mappings;
 
     if (sourceClient && sourceIndexName && typeof targetMappings === 'undefined') {
       try {
-        var mapping = await sourceClient.indices.getMapping({
+        var mapping = await sourceClient.indices.getMapping({
+          index: sourceIndexName,
+        });
         targetMappings = mapping[sourceIndexName].mappings;
       } catch (err) {
         console.log('Error reading source mapping', err);
@@ -30,13 +36,24 @@ function createMappingFactory(ref) {
     }
 
     if (typeof targetMappings === 'object' && targetMappings !== null) {
+      if (mappingsOverride) {
+        targetMappings = Object.assign({}, targetMappings,
+          {properties: Object.assign({}, targetMappings.properties,
+            mappings)});
+      }
+
       try {
-        var resp = await targetClient.indices.create(
+        var resp = await targetClient.indices.create({
+          index: targetIndexName,
+          body: Object.assign({}, {mappings: targetMappings},
+            (indexMappingTotalFieldsLimit !== undefined
+              ? {
+                  settings: {
+                    'index.mapping.total_fields.limit': indexMappingTotalFieldsLimit,
+                  },
+                }
+              : {})),
+        });
         if (verbose) { console.log('Created target mapping', resp); }
       } catch (err) {
         console.log('Error creating target mapping', err);
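The `mappingsOverride` branch above shallow-merges the user-supplied `mappings` into the `properties` of the mappings fetched from the source index. A minimal standalone sketch of that merge (the helper name and sample fields are made up for illustration):

```javascript
// Standalone sketch of the mappingsOverride merge used in the diff above:
// override properties are layered on top of the source index's `properties`,
// while the rest of the source mappings object is kept intact.
function mergeMappings(sourceMappings, overrideProperties) {
  return Object.assign({}, sourceMappings, {
    properties: Object.assign({}, sourceMappings.properties, overrideProperties),
  });
}

var merged = mergeMappings(
  { properties: { first_name: { type: 'keyword' } } },
  { full_name: { type: 'keyword' } }
);
```

Because both `Object.assign` calls are shallow, an override for an existing field replaces that field's mapping entirely rather than deep-merging it.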
@@ -45,45 +62,69 @@ function createMappingFactory(ref) {
   };
 }
 
+var MAX_QUEUE_SIZE = 15;
+
 function fileReaderFactory(indexer, fileName, transform, splitRegex, verbose) {
   function startIndex(files) {
+    var ingestQueueSize = 0;
+    var finished = false;
+
     var file = files.shift();
-    var s = fs
+    var s = fs
+      .createReadStream(file)
       .pipe(es.split(splitRegex))
-      .pipe(
+      .pipe(
+        es
+          .mapSync(function (line) {
+            try {
+              var doc = typeof transform === 'function' ? transform(line) : line;
+              // if doc is undefined we'll skip indexing it
+              if (typeof doc === 'undefined') {
+                s.resume();
+                return;
+              }
+
+              // the transform callback may return an array of docs so we can emit
+              // multiple docs from a single line
+              if (Array.isArray(doc)) {
+                doc.forEach(function (d) { return indexer.add(d); });
+                return;
+              }
+
+              indexer.add(doc);
+            } catch (e) {
+              console.log('error', e);
+            }
+          })
+          .on('error', function (err) {
+            console.log('Error while reading file.', err);
+          })
+          .on('end', function () {
+            if (verbose) { console.log('Read entire file: ', file); }
+            if (files.length > 0) {
+              startIndex(files);
+              return;
+            }
+
+            indexer.finish();
+            finished = true;
+          })
+      );
 
-      doc.forEach(function (d) { return indexer.add(d); });
-      return;
-      }
-      }
-      console.log('Error while reading file.', err);
-      })
-      .on('end', function () {
-        if (verbose) { console.log('Read entire file: ', file); }
-        indexer.finish();
-        if (files.length > 0) {
-          startIndex(files);
-        }
-      }));
+    indexer.queueEmitter.on('queue-size', async function (size) {
+      if (finished) { return; }
+      ingestQueueSize = size;
 
+      if (ingestQueueSize < MAX_QUEUE_SIZE) {
+        s.resume();
+      } else {
+        s.pause();
+      }
+    });
 
     indexer.queueEmitter.on('resume', function () {
+      if (finished) { return; }
+      ingestQueueSize = 0;
       s.resume();
     });
   }
@@ -99,81 +140,139 @@ var EventEmitter = require('events');
 
 var queueEmitter = new EventEmitter();
 
+var parallelCalls = 1;
+
 // a simple helper queue to bulk index documents
 function indexQueueFactory(ref) {
   var client = ref.targetClient;
   var targetIndexName = ref.targetIndexName;
-  var bufferSize = ref.bufferSize; if ( bufferSize === void 0 ) bufferSize =
+  var bufferSize = ref.bufferSize; if ( bufferSize === void 0 ) bufferSize = DEFAULT_BUFFER_SIZE;
   var skipHeader = ref.skipHeader; if ( skipHeader === void 0 ) skipHeader = false;
   var verbose = ref.verbose; if ( verbose === void 0 ) verbose = true;
 
   var buffer = [];
   var queue = [];
-  var ingesting =
+  var ingesting = 0;
+  var ingestTimes = [];
+  var finished = false;
 
-  var ingest =
+  var ingest = function (b) {
     if (typeof b !== 'undefined') {
       queue.push(b);
       queueEmitter.emit('queue-size', queue.length);
     }
 
-    if (
+    if (ingestTimes.length > 5) { ingestTimes = ingestTimes.slice(-5); }
+
+    if (ingesting < parallelCalls) {
       var docs = queue.shift();
-      queueEmitter.emit('queue-size', queue.length);
-      ingesting = true;
-      if (verbose) { console.log(("bulk ingest docs: " + (docs.length / 2) + ", queue length: " + (queue.length))); }
 
-      if (queue.length > 0) {
-        ingest();
-      }
-      } catch (err) {
-        console.log('bulk index error', err);
-      }
+      queueEmitter.emit('queue-size', queue.length);
+      if (queue.length <= 5) {
+        queueEmitter.emit('resume');
+      }
 
+      ingesting += 1;
+
+      if (verbose)
+        { console.log(("bulk ingest docs: " + (docs.length / 2) + ", queue length: " + (queue.length))); }
+
+      var start = Date.now();
+      client
+        .bulk({ body: docs })
+        .then(function () {
+          var end = Date.now();
+          var delta = end - start;
+          ingestTimes.push(delta);
+          ingesting -= 1;
+
+          var ingestTimesMovingAverage =
+            ingestTimes.length > 0
+              ? ingestTimes.reduce(function (p, c) { return p + c; }, 0) / ingestTimes.length
+              : 0;
+          var ingestTimesMovingAverageSeconds = Math.floor(ingestTimesMovingAverage / 1000);
+
+          if (
+            ingestTimes.length > 0 &&
+            ingestTimesMovingAverageSeconds < 30 &&
+            parallelCalls < 10
+          ) {
+            parallelCalls += 1;
+          } else if (
+            ingestTimes.length > 0 &&
+            ingestTimesMovingAverageSeconds >= 30 &&
+            parallelCalls > 1
+          ) {
+            parallelCalls -= 1;
+          }
+
+          if (queue.length > 0) {
+            ingest();
+          } else if (queue.length === 0 && finished) {
+            queueEmitter.emit('finish');
+          }
+        })
+        .catch(function (error) {
+          console.error(error);
+          ingesting -= 1;
+          parallelCalls = 1;
+          if (queue.length > 0) {
+            ingest();
+          }
+        });
     }
   };
 
   return {
     add: function (doc) {
+      if (finished) {
+        throw new Error('Unexpected doc added after indexer should finish.');
+      }
+
       if (!skipHeader) {
         var header = { index: { _index: targetIndexName } };
         buffer.push(header);
       }
       buffer.push(doc);
 
-      // console.log(`add: queue.length ${queue.length}`);
       if (queue.length === 0) {
         queueEmitter.emit('resume');
       }
 
-      if (buffer.length >=
+      if (buffer.length >= bufferSize * 2) {
         ingest(buffer);
         buffer = [];
       }
     },
-    finish:
+    finish: function () {
+      finished = true;
+
+      if (buffer.length > 0) {
+        ingest(buffer);
+        buffer = [];
+      } else if (queue.length === 0 && ingesting === 0) {
+        queueEmitter.emit('finish');
+      }
     },
     queueEmitter: queueEmitter,
   };
 }
 
-var MAX_QUEUE_SIZE =
+var MAX_QUEUE_SIZE$1 = 15;
 
 // create a new progress bar instance and use shades_classic theme
 var progressBar = new cliProgress.SingleBar({}, cliProgress.Presets.shades_classic);
 
-function indexReaderFactory(
+function indexReaderFactory(
+  indexer,
+  sourceIndexName,
+  transform,
+  client,
+  query,
+  bufferSize
+) {
+  if ( bufferSize === void 0 ) bufferSize = DEFAULT_BUFFER_SIZE;
+
   return async function indexReader() {
     var responseQueue = [];
     var docsNum = 0;
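The most notable new logic in this hunk is adaptive concurrency: a moving average over the last few bulk-request durations grows or shrinks the number of parallel bulk calls, bounded between 1 and 10, with a 30-second threshold. The decision rule can be extracted into a standalone sketch (the function name is made up; in the bundle the logic is inlined in the `.then()` handler):

```javascript
// Standalone sketch of the beta1 adaptive-concurrency rule: given the current
// parallelCalls and recent bulk durations in milliseconds, return the new
// parallelCalls value (bounds 1..10, moving-average threshold 30 seconds).
function adjustParallelCalls(parallelCalls, ingestTimes) {
  if (ingestTimes.length === 0) { return parallelCalls; }
  var avg = ingestTimes.reduce(function (p, c) { return p + c; }, 0) / ingestTimes.length;
  var avgSeconds = Math.floor(avg / 1000);
  if (avgSeconds < 30 && parallelCalls < 10) { return parallelCalls + 1; }
  if (avgSeconds >= 30 && parallelCalls > 1) { return parallelCalls - 1; }
  return parallelCalls;
}
```

Note that `docs.length / 2` in the verbose log above counts documents, not array entries: each document occupies two entries in the bulk body, an action header line and the document itself.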
@@ -182,7 +281,8 @@ function indexReaderFactory(indexer, sourceIndexName, transform, client) {
     return client.search({
       index: sourceIndexName,
       scroll: '30s',
-      size:
+      size: bufferSize,
+      query: query,
     });
   }
@@ -202,7 +302,7 @@ function indexReaderFactory(indexer, sourceIndexName, transform, client) {
   function processHit(hit) {
     docsNum += 1;
     try {
-      var doc =
+      var doc = typeof transform === 'function' ? transform(hit._source) : hit._source; // eslint-disable-line no-underscore-dangle
       // if doc is undefined we'll skip indexing it
       if (typeof doc === 'undefined') {
         return;
@@ -236,15 +336,13 @@ function indexReaderFactory(indexer, sourceIndexName, transform, client) {
       progressBar.update(docsNum);
 
       // check to see if we have collected all of the docs
-      // console.log('check count', response.hits.total.value, docsNum);
       if (response.hits.total.value === docsNum) {
         indexer.finish();
-        progressBar.stop();
         break;
       }
 
-      if (ingestQueueSize < MAX_QUEUE_SIZE) {
-
+      if (ingestQueueSize < MAX_QUEUE_SIZE$1) {
+        // get the next response if there are more docs to fetch
         var sc = await scroll(response._scroll_id); // eslint-disable-line no-await-in-loop,no-underscore-dangle,max-len
         scrollId = sc._scroll_id; // eslint-disable-line no-underscore-dangle
         responseQueue.push(sc);
@@ -257,8 +355,8 @@ function indexReaderFactory(indexer, sourceIndexName, transform, client) {
     indexer.queueEmitter.on('queue-size', async function (size) {
       ingestQueueSize = size;
 
-      if (!readActive && ingestQueueSize < MAX_QUEUE_SIZE) {
-
+      if (!readActive && ingestQueueSize < MAX_QUEUE_SIZE$1) {
+        // get the next response if there are more docs to fetch
         var sc = await scroll(scrollId); // eslint-disable-line no-await-in-loop,no-underscore-dangle,max-len
         scrollId = sc._scroll_id; // eslint-disable-line no-underscore-dangle
         responseQueue.push(sc);
@@ -280,28 +378,27 @@ function indexReaderFactory(indexer, sourceIndexName, transform, client) {
       processResponseQueue();
     });
 
+    indexer.queueEmitter.on('finish', function () {
+      progressBar.stop();
+    });
+
     processResponseQueue();
   };
 }
 
 async function transformer(ref) {
   var deleteIndex = ref.deleteIndex; if ( deleteIndex === void 0 ) deleteIndex = false;
-  var auth = ref.auth;
-  var rejectUnauthorized = ref.rejectUnauthorized; if ( rejectUnauthorized === void 0 ) rejectUnauthorized = true;
-  var ca = ref.ca;
-  var targetProtocol = ref.targetProtocol;
-  var targetHost = ref.targetHost;
-  var targetPort = ref.targetPort;
-  var targetAuth = ref.targetAuth;
-  var bufferSize = ref.bufferSize; if ( bufferSize === void 0 ) bufferSize = 1000;
+  var sourceClientConfig = ref.sourceClientConfig;
+  var targetClientConfig = ref.targetClientConfig;
+  var bufferSize = ref.bufferSize; if ( bufferSize === void 0 ) bufferSize = DEFAULT_BUFFER_SIZE;
   var fileName = ref.fileName;
   var splitRegex = ref.splitRegex; if ( splitRegex === void 0 ) splitRegex = /\n/;
   var sourceIndexName = ref.sourceIndexName;
   var targetIndexName = ref.targetIndexName;
   var mappings = ref.mappings;
+  var mappingsOverride = ref.mappingsOverride; if ( mappingsOverride === void 0 ) mappingsOverride = false;
+  var indexMappingTotalFieldsLimit = ref.indexMappingTotalFieldsLimit;
+  var query = ref.query;
   var skipHeader = ref.skipHeader; if ( skipHeader === void 0 ) skipHeader = false;
   var transform = ref.transform;
   var verbose = ref.verbose; if ( verbose === void 0 ) verbose = true;
@@ -310,19 +407,14 @@ async function transformer(ref) {
     throw Error('targetIndexName must be specified.');
   }
 
-    auth: auth,
-    tls: { ca: ca, rejectUnauthorized: rejectUnauthorized },
-  });
+  var defaultClientConfig = {
+    node: 'http://localhost:9200',
+  };
 
-  var targetClient = new elasticsearch.Client(
-    tls: { ca: ca, rejectUnauthorized: rejectUnauthorized },
-  });
+  var sourceClient = new elasticsearch.Client(sourceClientConfig || defaultClientConfig);
+  var targetClient = new elasticsearch.Client(
+    targetClientConfig || sourceClientConfig || defaultClientConfig
+  );
 
   var createMapping = createMappingFactory({
     sourceClient: sourceClient,
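The client setup above replaces the old host/port/auth/TLS options with plain client-config objects and a fallback chain: the target client reuses `sourceClientConfig` unless `targetClientConfig` is given, and both fall back to the localhost default. A standalone sketch of that chain (the helper name is made up for illustration):

```javascript
// Standalone sketch of the beta1 client-config fallback chain.
var defaultClientConfig = { node: 'http://localhost:9200' };

function resolveClientConfigs(sourceClientConfig, targetClientConfig) {
  return {
    source: sourceClientConfig || defaultClientConfig,
    // the target client reuses the source config unless its own is given
    target: targetClientConfig || sourceClientConfig || defaultClientConfig,
  };
}

var resolved = resolveClientConfigs({ node: 'http://source:9200' }, undefined);
```

This makes single-cluster reindexing the zero-config case while still allowing two entirely separate clusters.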
@@ -330,6 +422,8 @@ async function transformer(ref) {
     targetClient: targetClient,
     targetIndexName: targetIndexName,
     mappings: mappings,
+    mappingsOverride: mappingsOverride,
+    indexMappingTotalFieldsLimit: indexMappingTotalFieldsLimit,
     verbose: verbose,
   });
   var indexer = indexQueueFactory({
@@ -341,30 +435,16 @@ async function transformer(ref) {
   });
 
   function getReader() {
-    if (
-      && typeof sourceIndexName !== 'undefined'
-    ) {
-      throw Error(
-        'Only either one of fileName or sourceIndexName can be specified.'
-      );
+    if (typeof fileName !== 'undefined' && typeof sourceIndexName !== 'undefined') {
+      throw Error('Only either one of fileName or sourceIndexName can be specified.');
     }
 
-    if (
-      typeof fileName === 'undefined'
-      && typeof sourceIndexName === 'undefined'
-    ) {
+    if (typeof fileName === 'undefined' && typeof sourceIndexName === 'undefined') {
       throw Error('Either fileName or sourceIndexName must be specified.');
     }
 
     if (typeof fileName !== 'undefined') {
-      return fileReaderFactory(
-        indexer,
-        fileName,
-        transform,
-        splitRegex,
-        verbose
-      );
+      return fileReaderFactory(indexer, fileName, transform, splitRegex, verbose);
     }
 
     if (typeof sourceIndexName !== 'undefined') {
@@ -372,7 +452,9 @@ async function transformer(ref) {
       indexer,
       sourceIndexName,
       transform,
-      sourceClient
+      sourceClient,
+      query,
+      bufferSize
     );
   }
 
package/dist/node-es-transformer.esm.js
CHANGED
@@ -4,20 +4,26 @@ import glob from 'glob';
 import cliProgress from 'cli-progress';
 import elasticsearch from '@elastic/elasticsearch';
 
+var DEFAULT_BUFFER_SIZE = 1000;
+
 function createMappingFactory(ref) {
   var sourceClient = ref.sourceClient;
   var sourceIndexName = ref.sourceIndexName;
   var targetClient = ref.targetClient;
   var targetIndexName = ref.targetIndexName;
   var mappings = ref.mappings;
+  var mappingsOverride = ref.mappingsOverride;
+  var indexMappingTotalFieldsLimit = ref.indexMappingTotalFieldsLimit;
   var verbose = ref.verbose;
 
   return async function () {
-    var targetMappings = mappings;
+    var targetMappings = mappingsOverride ? undefined : mappings;
 
     if (sourceClient && sourceIndexName && typeof targetMappings === 'undefined') {
       try {
-        var mapping = await sourceClient.indices.getMapping({
+        var mapping = await sourceClient.indices.getMapping({
+          index: sourceIndexName,
+        });
         targetMappings = mapping[sourceIndexName].mappings;
       } catch (err) {
         console.log('Error reading source mapping', err);
@@ -26,13 +32,24 @@ function createMappingFactory(ref) {
     }
 
     if (typeof targetMappings === 'object' && targetMappings !== null) {
+      if (mappingsOverride) {
+        targetMappings = Object.assign({}, targetMappings,
+          {properties: Object.assign({}, targetMappings.properties,
+            mappings)});
+      }
+
       try {
-        var resp = await targetClient.indices.create(
+        var resp = await targetClient.indices.create({
+          index: targetIndexName,
+          body: Object.assign({}, {mappings: targetMappings},
+            (indexMappingTotalFieldsLimit !== undefined
+              ? {
+                  settings: {
+                    'index.mapping.total_fields.limit': indexMappingTotalFieldsLimit,
+                  },
+                }
+              : {})),
+        });
         if (verbose) { console.log('Created target mapping', resp); }
       } catch (err) {
         console.log('Error creating target mapping', err);
@@ -41,45 +58,69 @@ function createMappingFactory(ref) {
|
|
|
41
58
|
};
|
|
42
59
|
}
|
|
43
60
|
|
|
61
|
+
var MAX_QUEUE_SIZE = 15;
|
|
62
|
+
|
|
44
63
|
function fileReaderFactory(indexer, fileName, transform, splitRegex, verbose) {
|
|
45
64
|
function startIndex(files) {
|
|
65
|
+
var ingestQueueSize = 0;
|
|
66
|
+
var finished = false;
|
|
67
|
+
|
|
46
68
|
var file = files.shift();
|
|
47
|
-
var s = fs
|
|
69
|
+
var s = fs
|
|
70
|
+
.createReadStream(file)
|
|
48
71
|
.pipe(es.split(splitRegex))
|
|
49
|
-
.pipe(
|
|
50
|
-
|
|
51
|
-
|
|
52
|
-
|
|
53
|
-
|
|
54
|
-
|
|
55
|
-
|
|
56
|
-
|
|
57
|
-
|
|
72
|
+
.pipe(
|
|
73
|
+
es
|
|
74
+      .mapSync(function (line) {
+        try {
+          var doc = typeof transform === 'function' ? transform(line) : line;
+          // if doc is undefined we'll skip indexing it
+          if (typeof doc === 'undefined') {
+            s.resume();
+            return;
+          }
+
+          // the transform callback may return an array of docs so we can emit
+          // multiple docs from a single line
+          if (Array.isArray(doc)) {
+            doc.forEach(function (d) { return indexer.add(d); });
+            return;
+          }
+
+          indexer.add(doc);
+        } catch (e) {
+          console.log('error', e);
+        }
+      })
+      .on('error', function (err) {
+        console.log('Error while reading file.', err);
+      })
+      .on('end', function () {
+        if (verbose) { console.log('Read entire file: ', file); }
+        if (files.length > 0) {
+          startIndex(files);
+          return;
+        }
+
+        indexer.finish();
+        finished = true;
+      })
+  );
 
-
-
-
-          doc.forEach(function (d) { return indexer.add(d); });
-          return;
-        }
+  indexer.queueEmitter.on('queue-size', async function (size) {
+    if (finished) { return; }
+    ingestQueueSize = size;
 
-
-
-
-
-        }
-
-        console.log('Error while reading file.', err);
-      })
-      .on('end', function () {
-        if (verbose) { console.log('Read entire file: ', file); }
-        indexer.finish();
-        if (files.length > 0) {
-          startIndex(files);
-        }
-      }));
+    if (ingestQueueSize < MAX_QUEUE_SIZE) {
+      s.resume();
+    } else {
+      s.pause();
+    }
+  });
 
   indexer.queueEmitter.on('resume', function () {
+    if (finished) { return; }
+    ingestQueueSize = 0;
     s.resume();
   });
 }
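The new `queue-size` handler above throttles the file reader by pausing and resuming the stream while the bulk-ingest queue is full. A minimal standalone sketch of that backpressure rule, assuming the pause/resume contract shown in the diff (`FakeStream` is a hypothetical stand-in for the event-stream pipeline `s`):

```javascript
// Backpressure sketch: pause the read stream while the bulk-ingest queue
// is at or above MAX_QUEUE_SIZE, resume once it drains below it.
var MAX_QUEUE_SIZE = 15;

// Hypothetical stand-in for the event-stream pipeline `s`.
function FakeStream() {
  this.paused = false;
}
FakeStream.prototype.pause = function () { this.paused = true; };
FakeStream.prototype.resume = function () { this.paused = false; };

function onQueueSize(s, size) {
  if (size < MAX_QUEUE_SIZE) {
    s.resume();
  } else {
    s.pause();
  }
}

var s = new FakeStream();
onQueueSize(s, 20);
console.log(s.paused); // true: queue is full, reader pauses
onQueueSize(s, 3);
console.log(s.paused); // false: queue drained, reader resumes
```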
@@ -95,81 +136,139 @@ var EventEmitter = require('events');
 
 var queueEmitter = new EventEmitter();
 
+var parallelCalls = 1;
+
 // a simple helper queue to bulk index documents
 function indexQueueFactory(ref) {
   var client = ref.targetClient;
   var targetIndexName = ref.targetIndexName;
-  var bufferSize = ref.bufferSize; if ( bufferSize === void 0 ) bufferSize =
+  var bufferSize = ref.bufferSize; if ( bufferSize === void 0 ) bufferSize = DEFAULT_BUFFER_SIZE;
   var skipHeader = ref.skipHeader; if ( skipHeader === void 0 ) skipHeader = false;
   var verbose = ref.verbose; if ( verbose === void 0 ) verbose = true;
 
   var buffer = [];
   var queue = [];
-  var ingesting =
+  var ingesting = 0;
+  var ingestTimes = [];
+  var finished = false;
 
-  var ingest =
+  var ingest = function (b) {
     if (typeof b !== 'undefined') {
       queue.push(b);
       queueEmitter.emit('queue-size', queue.length);
     }
 
-    if (
+    if (ingestTimes.length > 5) { ingestTimes = ingestTimes.slice(-5); }
+
+    if (ingesting < parallelCalls) {
       var docs = queue.shift();
-      queueEmitter.emit('queue-size', queue.length);
-      ingesting = true;
-      if (verbose) { console.log(("bulk ingest docs: " + (docs.length / 2) + ", queue length: " + (queue.length))); }
 
-
-
-
-      if (queue.length > 0) {
-        ingest();
-      }
-    } catch (err) {
-      console.log('bulk index error', err);
+      queueEmitter.emit('queue-size', queue.length);
+      if (queue.length <= 5) {
+        queueEmitter.emit('resume');
       }
-      }
 
-
-
-
-
+      ingesting += 1;
+
+      if (verbose)
+        { console.log(("bulk ingest docs: " + (docs.length / 2) + ", queue length: " + (queue.length))); }
+
+      var start = Date.now();
+      client
+        .bulk({ body: docs })
+        .then(function () {
+          var end = Date.now();
+          var delta = end - start;
+          ingestTimes.push(delta);
+          ingesting -= 1;
+
+          var ingestTimesMovingAverage =
+            ingestTimes.length > 0
+              ? ingestTimes.reduce(function (p, c) { return p + c; }, 0) / ingestTimes.length
+              : 0;
+          var ingestTimesMovingAverageSeconds = Math.floor(ingestTimesMovingAverage / 1000);
+
+          if (
+            ingestTimes.length > 0 &&
+            ingestTimesMovingAverageSeconds < 30 &&
+            parallelCalls < 10
+          ) {
+            parallelCalls += 1;
+          } else if (
+            ingestTimes.length > 0 &&
+            ingestTimesMovingAverageSeconds >= 30 &&
+            parallelCalls > 1
+          ) {
+            parallelCalls -= 1;
+          }
+
+          if (queue.length > 0) {
+            ingest();
+          } else if (queue.length === 0 && finished) {
+            queueEmitter.emit('finish');
+          }
+        })
+        .catch(function (error) {
+          console.error(error);
+          ingesting -= 1;
+          parallelCalls = 1;
+          if (queue.length > 0) {
+            ingest();
+          }
+        });
     }
   };
 
   return {
     add: function (doc) {
+      if (finished) {
+        throw new Error('Unexpected doc added after indexer should finish.');
+      }
+
       if (!skipHeader) {
         var header = { index: { _index: targetIndexName } };
         buffer.push(header);
       }
       buffer.push(doc);
 
-      // console.log(`add: queue.length ${queue.length}`);
       if (queue.length === 0) {
         queueEmitter.emit('resume');
       }
 
-      if (buffer.length >=
+      if (buffer.length >= bufferSize * 2) {
         ingest(buffer);
         buffer = [];
       }
     },
-    finish:
-
-
-
+    finish: function () {
+      finished = true;
+
+      if (buffer.length > 0) {
+        ingest(buffer);
+        buffer = [];
+      } else if (queue.length === 0 && ingesting === 0) {
+        queueEmitter.emit('finish');
+      }
     },
     queueEmitter: queueEmitter,
   };
 }
 
-var MAX_QUEUE_SIZE =
+var MAX_QUEUE_SIZE$1 = 15;
 
 // create a new progress bar instance and use shades_classic theme
 var progressBar = new cliProgress.SingleBar({}, cliProgress.Presets.shades_classic);
 
-function indexReaderFactory(
+function indexReaderFactory(
+  indexer,
+  sourceIndexName,
+  transform,
+  client,
+  query,
+  bufferSize
+) {
+  if ( bufferSize === void 0 ) bufferSize = DEFAULT_BUFFER_SIZE;
+
   return async function indexReader() {
     var responseQueue = [];
     var docsNum = 0;
@@ -178,7 +277,8 @@ function indexReaderFactory(indexer, sourceIndexName, transform, client) {
     return client.search({
       index: sourceIndexName,
       scroll: '30s',
-      size:
+      size: bufferSize,
+      query: query,
     });
   }
 
@@ -198,7 +298,7 @@ function indexReaderFactory(indexer, sourceIndexName, transform, client) {
     function processHit(hit) {
       docsNum += 1;
       try {
-        var doc =
+        var doc = typeof transform === 'function' ? transform(hit._source) : hit._source; // eslint-disable-line no-underscore-dangle
         // if doc is undefined we'll skip indexing it
         if (typeof doc === 'undefined') {
           return;
@@ -232,15 +332,13 @@ function indexReaderFactory(indexer, sourceIndexName, transform, client) {
       progressBar.update(docsNum);
 
       // check to see if we have collected all of the docs
-      // console.log('check count', response.hits.total.value, docsNum);
       if (response.hits.total.value === docsNum) {
         indexer.finish();
-        progressBar.stop();
         break;
       }
 
-      if (ingestQueueSize < MAX_QUEUE_SIZE) {
-
+      if (ingestQueueSize < MAX_QUEUE_SIZE$1) {
+        // get the next response if there are more docs to fetch
         var sc = await scroll(response._scroll_id); // eslint-disable-line no-await-in-loop,no-underscore-dangle,max-len
         scrollId = sc._scroll_id; // eslint-disable-line no-underscore-dangle
         responseQueue.push(sc);
@@ -253,8 +351,8 @@ function indexReaderFactory(indexer, sourceIndexName, transform, client) {
     indexer.queueEmitter.on('queue-size', async function (size) {
       ingestQueueSize = size;
 
-      if (!readActive && ingestQueueSize < MAX_QUEUE_SIZE) {
-
+      if (!readActive && ingestQueueSize < MAX_QUEUE_SIZE$1) {
+        // get the next response if there are more docs to fetch
         var sc = await scroll(scrollId); // eslint-disable-line no-await-in-loop,no-underscore-dangle,max-len
         scrollId = sc._scroll_id; // eslint-disable-line no-underscore-dangle
         responseQueue.push(sc);
@@ -276,28 +374,27 @@ function indexReaderFactory(indexer, sourceIndexName, transform, client) {
       processResponseQueue();
     });
 
+    indexer.queueEmitter.on('finish', function () {
+      progressBar.stop();
+    });
+
     processResponseQueue();
   };
 }
 
 async function transformer(ref) {
   var deleteIndex = ref.deleteIndex; if ( deleteIndex === void 0 ) deleteIndex = false;
-  var
-  var
-  var
-  var auth = ref.auth;
-  var rejectUnauthorized = ref.rejectUnauthorized; if ( rejectUnauthorized === void 0 ) rejectUnauthorized = true;
-  var ca = ref.ca;
-  var targetProtocol = ref.targetProtocol;
-  var targetHost = ref.targetHost;
-  var targetPort = ref.targetPort;
-  var targetAuth = ref.targetAuth;
-  var bufferSize = ref.bufferSize; if ( bufferSize === void 0 ) bufferSize = 1000;
+  var sourceClientConfig = ref.sourceClientConfig;
+  var targetClientConfig = ref.targetClientConfig;
+  var bufferSize = ref.bufferSize; if ( bufferSize === void 0 ) bufferSize = DEFAULT_BUFFER_SIZE;
   var fileName = ref.fileName;
   var splitRegex = ref.splitRegex; if ( splitRegex === void 0 ) splitRegex = /\n/;
   var sourceIndexName = ref.sourceIndexName;
   var targetIndexName = ref.targetIndexName;
   var mappings = ref.mappings;
+  var mappingsOverride = ref.mappingsOverride; if ( mappingsOverride === void 0 ) mappingsOverride = false;
+  var indexMappingTotalFieldsLimit = ref.indexMappingTotalFieldsLimit;
+  var query = ref.query;
   var skipHeader = ref.skipHeader; if ( skipHeader === void 0 ) skipHeader = false;
   var transform = ref.transform;
   var verbose = ref.verbose; if ( verbose === void 0 ) verbose = true;
@@ -306,19 +403,14 @@ async function transformer(ref) {
     throw Error('targetIndexName must be specified.');
   }
 
-  var
-
-
-    auth: auth,
-    tls: { ca: ca, rejectUnauthorized: rejectUnauthorized },
-  });
+  var defaultClientConfig = {
+    node: 'http://localhost:9200',
+  };
 
-  var
-  var targetClient = new elasticsearch.Client(
-
-
-    tls: { ca: ca, rejectUnauthorized: rejectUnauthorized },
-  });
+  var sourceClient = new elasticsearch.Client(sourceClientConfig || defaultClientConfig);
+  var targetClient = new elasticsearch.Client(
+    targetClientConfig || sourceClientConfig || defaultClientConfig
+  );
 
   var createMapping = createMappingFactory({
     sourceClient: sourceClient,
@@ -326,6 +418,8 @@ async function transformer(ref) {
     targetClient: targetClient,
     targetIndexName: targetIndexName,
     mappings: mappings,
+    mappingsOverride: mappingsOverride,
+    indexMappingTotalFieldsLimit: indexMappingTotalFieldsLimit,
     verbose: verbose,
   });
   var indexer = indexQueueFactory({
@@ -337,30 +431,16 @@ async function transformer(ref) {
   });
 
   function getReader() {
-    if (
-
-      && typeof sourceIndexName !== 'undefined'
-    ) {
-      throw Error(
-        'Only either one of fileName or sourceIndexName can be specified.'
-      );
+    if (typeof fileName !== 'undefined' && typeof sourceIndexName !== 'undefined') {
+      throw Error('Only either one of fileName or sourceIndexName can be specified.');
     }
 
-    if (
-      typeof fileName === 'undefined'
-      && typeof sourceIndexName === 'undefined'
-    ) {
+    if (typeof fileName === 'undefined' && typeof sourceIndexName === 'undefined') {
       throw Error('Either fileName or sourceIndexName must be specified.');
     }
 
     if (typeof fileName !== 'undefined') {
-      return fileReaderFactory(
-        indexer,
-        fileName,
-        transform,
-        splitRegex,
-        verbose
-      );
+      return fileReaderFactory(indexer, fileName, transform, splitRegex, verbose);
     }
 
     if (typeof sourceIndexName !== 'undefined') {
@@ -368,7 +448,9 @@ async function transformer(ref) {
       indexer,
       sourceIndexName,
       transform,
-      sourceClient
+      sourceClient,
+      query,
+      bufferSize
     );
   }
 
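The rewritten `ingest` logic above adapts the number of parallel bulk calls to observed response times: it keeps a moving average over the last five bulk durations and scales `parallelCalls` between 1 and 10 around a 30-second threshold. A standalone sketch of just that adjustment rule, with thresholds and bounds taken from the diff (`adjustParallelCalls` is a hypothetical helper name, not part of the package):

```javascript
// Adaptive concurrency sketch: grow the number of parallel bulk calls while
// responses are fast (moving average < 30s), shrink while they are slow,
// staying within [1, 10].
function adjustParallelCalls(parallelCalls, ingestTimes) {
  // keep only the last 5 measured bulk durations (milliseconds)
  if (ingestTimes.length > 5) {
    ingestTimes = ingestTimes.slice(-5);
  }
  var avg =
    ingestTimes.length > 0
      ? ingestTimes.reduce(function (p, c) { return p + c; }, 0) / ingestTimes.length
      : 0;
  var avgSeconds = Math.floor(avg / 1000);

  if (ingestTimes.length > 0 && avgSeconds < 30 && parallelCalls < 10) {
    return parallelCalls + 1;
  }
  if (ingestTimes.length > 0 && avgSeconds >= 30 && parallelCalls > 1) {
    return parallelCalls - 1;
  }
  return parallelCalls;
}

console.log(adjustParallelCalls(1, [2000, 3000]));   // 2: fast responses, scale up
console.log(adjustParallelCalls(5, [45000, 60000])); // 4: slow responses, scale down
console.log(adjustParallelCalls(10, [1000]));        // 10: already at the upper bound
```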
package/package.json
CHANGED
@@ -14,22 +14,30 @@
   "license": "Apache-2.0",
   "author": "Walter Rafelsberger <walter@rafelsberger.at>",
   "contributors": [],
-  "version": "1.0.0-
+  "version": "1.0.0-beta1",
   "main": "dist/node-es-transformer.cjs.js",
   "module": "dist/node-es-transformer.esm.js",
   "dependencies": {
-    "@elastic/elasticsearch": "^8.
+    "@elastic/elasticsearch": "^8.10.0",
     "cli-progress": "^3.12.0",
     "event-stream": "3.3.4",
     "glob": "7.1.2"
   },
   "devDependencies": {
     "acorn": "^6.4.2",
-    "
+    "commit-and-tag-version": "^11.3.0",
+    "cz-conventional-changelog": "^3.3.0",
+    "eslint": "^8.51.0",
     "eslint-config-airbnb": "19.0.4",
+    "eslint-config-prettier": "^9.0.0",
     "eslint-plugin-import": "2.27.5",
+    "eslint-plugin-jest": "^27.4.2",
     "eslint-plugin-jsx-a11y": "6.7.1",
+    "eslint-plugin-prettier": "^3.3.1",
     "eslint-plugin-react": "7.32.2",
+    "frisby": "^2.1.3",
+    "jest": "^29.7.0",
+    "prettier": "^2.2.1",
     "rollup": "0.66.6",
     "rollup-plugin-buble": "0.19.6",
     "rollup-plugin-commonjs": "8.0.2",
@@ -38,10 +46,18 @@
   "scripts": {
     "build": "rollup -c",
     "dev": "rollup -c -w",
-    "test": "
-    "pretest": "npm run build"
+    "test": "jest",
+    "pretest": "npm run build",
+    "release": "commit-and-tag-version",
+    "create-sample-data-10000": "node scripts/create_sample_data_10000",
+    "create-sample-data-100": "node scripts/create_sample_data_100"
   },
   "files": [
     "dist"
-  ]
+  ],
+  "config": {
+    "commitizen": {
+      "path": "./node_modules/cz-conventional-changelog"
+    }
+  }
 }
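The alpha release configured clients from individual options such as `targetHost`, `targetPort` and `auth`; the beta instead passes whole client configs through to the Elasticsearch client. The fallback order visible in the diff (`targetClientConfig || sourceClientConfig || defaultClientConfig`) can be sketched as follows (the `resolveClientConfigs` helper name and the remote node URL are hypothetical):

```javascript
// Config resolution sketch: the source client falls back to a localhost
// default; the target client reuses the source config unless overridden.
var defaultClientConfig = {
  node: 'http://localhost:9200',
};

function resolveClientConfigs(sourceClientConfig, targetClientConfig) {
  return {
    source: sourceClientConfig || defaultClientConfig,
    target: targetClientConfig || sourceClientConfig || defaultClientConfig,
  };
}

var remote = { node: 'https://example.org:9200' }; // hypothetical remote cluster
console.log(resolveClientConfigs(undefined, undefined).target.node); // http://localhost:9200
console.log(resolveClientConfigs(remote, undefined).target.node);    // https://example.org:9200
```

This keeps reindexing within a single cluster as the zero-config default while allowing distinct source and target clusters when both configs are given.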