node-es-transformer 1.0.0-alpha9 → 1.0.0-beta1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +17 -4
- package/dist/node-es-transformer.cjs.js +190 -95
- package/dist/node-es-transformer.esm.js +190 -95
- package/package.json +22 -6
package/README.md
CHANGED
```diff
@@ -1,6 +1,8 @@
 [](https://www.npmjs.com/package/node-es-transformer)
 [](https://www.npmjs.com/package/node-es-transformer)
 [](https://www.npmjs.com/package/node-es-transformer)
+[](http://commitizen.github.io/cz-cli/)
+[](https://github.com/walterra/node-es-transformer/actions)
 
 # node-es-transformer
 
```
```diff
@@ -10,7 +12,7 @@ A nodejs based library to (re)index and transform data from/to Elasticsearch.
 
 If you're looking for a nodejs based tool which allows you to ingest large CSV/JSON files in the GigaBytes you've come to the right place. Everything else I've tried with larger files runs out of JS heap, hammers ES with too many single requests, times out or tries to do everything with a single bulk request.
 
-While I'd generally recommend using [Logstash](https://www.elastic.co/products/logstash), [filebeat](https://www.elastic.co/products/beats/filebeat)
+While I'd generally recommend using [Logstash](https://www.elastic.co/products/logstash), [filebeat](https://www.elastic.co/products/beats/filebeat), [Ingest Nodes](https://www.elastic.co/guide/en/elasticsearch/reference/master/ingest.html), [Elastic Agent](https://www.elastic.co/guide/en/fleet/current/fleet-overview.html) or [Elasticsearch Transforms](https://www.elastic.co/guide/en/elasticsearch/reference/current/transforms.html) for established use cases, this tool may be of help especially if you feel more at home in the JavaScript/nodejs universe and have use cases with customized ingestion and data transformation needs.
 
 **This is experimental code, use at your own risk. Nonetheless, I encourage you to give it a try so I can gather some feedback.**
 
```
```diff
@@ -26,7 +28,7 @@ Now that we've talked about the caveats, let's have a look what you actually get
 
 ## Features
 
-- Buffering/Streaming for both reading and indexing. Files are read using streaming and Elasticsearch ingestion is done using buffered bulk indexing. This is tailored towards ingestion of large files. Successfully tested so far with JSON and CSV files in the range of 20-30 GBytes. On a single machine running both `node-es-transformer` and Elasticsearch ingestion rates up to 20k documents/second were achieved (2,9 GHz Intel Core i7, 16GByte RAM, SSD).
+- Buffering/Streaming for both reading and indexing. Files are read using streaming and Elasticsearch ingestion is done using buffered bulk indexing. This is tailored towards ingestion of large files. Successfully tested so far with JSON and CSV files in the range of 20-30 GBytes. On a single machine running both `node-es-transformer` and Elasticsearch ingestion rates up to 20k documents/second were achieved (2,9 GHz Intel Core i7, 16GByte RAM, SSD), depending on document size.
 - Supports wildcards to ingest/transform a range of files in one go.
 - Supports fetching documents from existing indices using search/scroll. This allows you to reindex with custom data transformations just using JavaScript in the `transform` callback.
 - The `transform` callback gives you each source document, but you can split it up in multiple ones and return an array of documents. An example use case for this: Each source document is a Tweet and you want to transform that into an entity centric index based on Hashtags.
```
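The last feature above describes fanning one source document out into several. Below is a minimal sketch of a `transform` callback for the Tweet/hashtag case. It assumes, hypothetically, that the input file holds one JSON-encoded tweet per line with `id` and `text` fields. `tweetToHashtagDocs`, those field names and the hashtag regex are illustrative and are not part of node-es-transformer.

```javascript
// Hypothetical input: one JSON-encoded tweet per line, e.g.
// {"id":"1","text":"Loving #nodejs and #elasticsearch"}
function tweetToHashtagDocs(line) {
  // Returning undefined makes the reader skip the line (e.g. a trailing empty line).
  if (line.trim() === '') return undefined;

  const tweet = JSON.parse(line);
  const hashtags = (tweet.text.match(/#\w+/g) || []).map((tag) => tag.slice(1).toLowerCase());

  // One output document per distinct hashtag; an empty array indexes nothing.
  return [...new Set(hashtags)].map((hashtag) => ({
    hashtag,
    tweet_id: tweet.id,
  }));
}

console.log(tweetToHashtagDocs('{"id":"1","text":"Loving #nodejs and #Elasticsearch #nodejs"}'));
```

You would pass the function as `transform: tweetToHashtagDocs`. Each object in the returned array is then added to the indexer as its own document, which is how the beta1 file reader handles array results.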
```diff
@@ -113,7 +115,10 @@ transformer({
 - `splitRegex`: Custom line split regex, defaults to `/\n/`.
 - `sourceIndexName`: The source Elasticsearch index to reindex from. If this is set, `fileName` is not allowed.
 - `targetIndexName`: The target Elasticsearch index where documents will be indexed.
-- `mappings`: Elasticsearch document
+- `mappings`: Optional Elasticsearch document mappings. If not set and you're reindexing from another index, the mappings from the existing index will be used.
+- `mappingsOverride`: If you're reindexing and this is set to `true`, `mappings` will be applied on top of the source index's mappings. Defaults to `false`.
+- `indexMappingTotalFieldsLimit`: Optional field limit for the target index to be created that will be passed on as the `index.mapping.total_fields.limit` setting.
+- `query`: Optional Elasticsearch [DSL query](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html) to filter documents from the source index.
 - `skipHeader`: If true, skips the first line of the source file. Defaults to `false`.
 - `transform(line)`: A callback function which allows the transformation of a source line into one or several documents.
 - `verbose`: Logging verbosity, defaults to `true`
```
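The `createMappingFactory` changes in the dist build suggest that `mappingsOverride` does not replace the whole mapping. Instead, the source index's mapping is fetched and `mappings` is merged field by field into its `properties`. With the flag set, `mappings` therefore appears to be expected as a map of field names to field mappings. The sketch below shows that resolution logic. The helper `resolveTargetMappings` is hypothetical, is not exported by the library, and assumes the source mapping was fetched successfully.

```javascript
// Hypothetical helper mirroring how beta1 chooses the mappings for the target index.
// sourceMappings: the source index's `mappings` object as fetched via indices.getMapping.
function resolveTargetMappings(sourceMappings, mappings, mappingsOverride = false) {
  if (!mappingsOverride) {
    // Explicit mappings win; otherwise the source index's mappings are reused as-is.
    return mappings !== undefined ? mappings : sourceMappings;
  }
  // Shallow, per-field merge: a field in `mappings` replaces the same field from the source.
  return {
    ...sourceMappings,
    properties: { ...sourceMappings.properties, ...mappings },
  };
}

const sourceMappings = {
  dynamic: 'strict',
  properties: { title: { type: 'text' }, year: { type: 'integer' } },
};

console.log(resolveTargetMappings(sourceMappings, { year: { type: 'keyword' } }, true));
```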
```diff
@@ -137,7 +142,15 @@ yarn
 
 `yarn dev` builds the library, then keeps rebuilding it whenever the source files change using [rollup-watch](https://github.com/rollup/rollup-watch).
 
-`yarn test`
+`yarn test` runs the tests. The tests expect that you have an Elasticsearch instance running without security at `http://localhost:9200`. Using docker, you can set this up with:
+
+```bash
+# Download the docker image
+docker pull docker.elastic.co/elasticsearch/elasticsearch:8.10.4
+
+# Run the container
+docker run --name es01 --net elastic -p 9200:9200 -it -m 1GB -e "discovery.type=single-node" -e "xpack.security.enabled=false" docker.elastic.co/elasticsearch/elasticsearch:8.10.4
+```
 
 ## License
 
```
package/dist/node-es-transformer.cjs.js
CHANGED
```diff
@@ -8,20 +8,26 @@ var glob = _interopDefault(require('glob'));
 var cliProgress = _interopDefault(require('cli-progress'));
 var elasticsearch = _interopDefault(require('@elastic/elasticsearch'));
 
+var DEFAULT_BUFFER_SIZE = 1000;
+
 function createMappingFactory(ref) {
   var sourceClient = ref.sourceClient;
   var sourceIndexName = ref.sourceIndexName;
   var targetClient = ref.targetClient;
   var targetIndexName = ref.targetIndexName;
   var mappings = ref.mappings;
+  var mappingsOverride = ref.mappingsOverride;
+  var indexMappingTotalFieldsLimit = ref.indexMappingTotalFieldsLimit;
   var verbose = ref.verbose;
 
   return async function () {
-    var targetMappings = mappings;
+    var targetMappings = mappingsOverride ? undefined : mappings;
 
     if (sourceClient && sourceIndexName && typeof targetMappings === 'undefined') {
       try {
-        var mapping = await sourceClient.indices.getMapping({
+        var mapping = await sourceClient.indices.getMapping({
+          index: sourceIndexName,
+        });
         targetMappings = mapping[sourceIndexName].mappings;
       } catch (err) {
         console.log('Error reading source mapping', err);
```
```diff
@@ -30,13 +36,24 @@ function createMappingFactory(ref) {
     }
 
     if (typeof targetMappings === 'object' && targetMappings !== null) {
+      if (mappingsOverride) {
+        targetMappings = Object.assign({}, targetMappings,
+          {properties: Object.assign({}, targetMappings.properties,
+          mappings)});
+      }
+
       try {
-        var resp = await targetClient.indices.create(
-
-
-
-
-
+        var resp = await targetClient.indices.create({
+          index: targetIndexName,
+          body: Object.assign({}, {mappings: targetMappings},
+          (indexMappingTotalFieldsLimit !== undefined
+            ? {
+              settings: {
+                'index.mapping.total_fields.limit': indexMappingTotalFieldsLimit,
+              },
+            }
+            : {})),
+        });
         if (verbose) { console.log('Created target mapping', resp); }
       } catch (err) {
         console.log('Error creating target mapping', err);
```
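The new `indices.create` call always sends the mappings. It adds the `index.mapping.total_fields.limit` setting only when `indexMappingTotalFieldsLimit` is given. The same request body can be built with object spread instead of nested `Object.assign` calls. `buildCreateIndexBody` is a hypothetical name used only for illustration.

```javascript
// Hypothetical equivalent of the `body` passed to targetClient.indices.create() in beta1.
function buildCreateIndexBody(targetMappings, indexMappingTotalFieldsLimit) {
  return {
    mappings: targetMappings,
    // Only add settings when a limit was passed; otherwise Elasticsearch's default applies.
    ...(indexMappingTotalFieldsLimit !== undefined
      ? { settings: { 'index.mapping.total_fields.limit': indexMappingTotalFieldsLimit } }
      : {}),
  };
}

console.log(JSON.stringify(buildCreateIndexBody({ properties: {} }, 2000)));
```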
```diff
@@ -45,45 +62,69 @@ function createMappingFactory(ref) {
   };
 }
 
+var MAX_QUEUE_SIZE = 15;
+
 function fileReaderFactory(indexer, fileName, transform, splitRegex, verbose) {
   function startIndex(files) {
+    var ingestQueueSize = 0;
+    var finished = false;
+
     var file = files.shift();
-    var s = fs
+    var s = fs
+      .createReadStream(file)
       .pipe(es.split(splitRegex))
-      .pipe(
-
-
-
-
-
-
-
-
+      .pipe(
+        es
+          .mapSync(function (line) {
+            try {
+              var doc = typeof transform === 'function' ? transform(line) : line;
+              // if doc is undefined we'll skip indexing it
+              if (typeof doc === 'undefined') {
+                s.resume();
+                return;
+              }
+
+              // the transform callback may return an array of docs so we can emit
+              // multiple docs from a single line
+              if (Array.isArray(doc)) {
+                doc.forEach(function (d) { return indexer.add(d); });
+                return;
+              }
+
+              indexer.add(doc);
+            } catch (e) {
+              console.log('error', e);
+            }
+          })
+          .on('error', function (err) {
+            console.log('Error while reading file.', err);
+          })
+          .on('end', function () {
+            if (verbose) { console.log('Read entire file: ', file); }
+            if (files.length > 0) {
+              startIndex(files);
+              return;
+            }
+
+            indexer.finish();
+            finished = true;
+          })
+      );
 
-
-
-
-          doc.forEach(function (d) { return indexer.add(d); });
-          return;
-        }
+    indexer.queueEmitter.on('queue-size', async function (size) {
+      if (finished) { return; }
+      ingestQueueSize = size;
 
-
-
-
-
-        }
-
-        console.log('Error while reading file.', err);
-      })
-      .on('end', function () {
-        if (verbose) { console.log('Read entire file: ', file); }
-        indexer.finish();
-        if (files.length > 0) {
-          startIndex(files);
-        }
-      }));
+      if (ingestQueueSize < MAX_QUEUE_SIZE) {
+        s.resume();
+      } else {
+        s.pause();
+      }
+    });
 
     indexer.queueEmitter.on('resume', function () {
+      if (finished) { return; }
+      ingestQueueSize = 0;
       s.resume();
     });
   }
```
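The reworked file reader now throttles itself on the indexer's queue. Each `queue-size` event pauses the stream once 15 or more bulk batches are waiting and resumes it below that. A `resume` event, which the indexer emits once its queue drains to five or fewer batches, always restarts reading. The runnable sketch below shows that wiring in isolation. It uses a stand-in object in place of the piped file stream, and `attachBackpressure` and `fakeStream` are hypothetical names.

```javascript
const { EventEmitter } = require('events');

const MAX_QUEUE_SIZE = 15; // same threshold as the beta1 file reader

// Pause `stream` while the indexer reports a full queue, resume it otherwise.
// `stream` only needs pause()/resume(), like the piped stream `s` in the diff.
function attachBackpressure(stream, queueEmitter, maxQueueSize = MAX_QUEUE_SIZE) {
  queueEmitter.on('queue-size', (size) => {
    if (size < maxQueueSize) {
      stream.resume();
    } else {
      stream.pause();
    }
  });
  // The indexer emits 'resume' when its queue has (nearly) drained.
  queueEmitter.on('resume', () => stream.resume());
}

// Hypothetical stand-in for the file stream that just records its state.
const fakeStream = {
  paused: false,
  pause() { this.paused = true; },
  resume() { this.paused = false; },
};

const queueEmitter = new EventEmitter();
attachBackpressure(fakeStream, queueEmitter);

queueEmitter.emit('queue-size', 15); // 15 batches waiting: stop reading
console.log(fakeStream.paused); // true
queueEmitter.emit('queue-size', 4); // below the threshold again: keep reading
console.log(fakeStream.paused); // false
```

The `finished` guards added in the diff also make the reader ignore both events once the last file has ended. This sketch leaves that part out.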
```diff
@@ -99,81 +140,139 @@ var EventEmitter = require('events');
 
 var queueEmitter = new EventEmitter();
 
+var parallelCalls = 1;
+
 // a simple helper queue to bulk index documents
 function indexQueueFactory(ref) {
   var client = ref.targetClient;
   var targetIndexName = ref.targetIndexName;
-  var bufferSize = ref.bufferSize; if ( bufferSize === void 0 ) bufferSize =
+  var bufferSize = ref.bufferSize; if ( bufferSize === void 0 ) bufferSize = DEFAULT_BUFFER_SIZE;
   var skipHeader = ref.skipHeader; if ( skipHeader === void 0 ) skipHeader = false;
   var verbose = ref.verbose; if ( verbose === void 0 ) verbose = true;
 
   var buffer = [];
   var queue = [];
-  var ingesting =
+  var ingesting = 0;
+  var ingestTimes = [];
+  var finished = false;
 
-  var ingest =
+  var ingest = function (b) {
     if (typeof b !== 'undefined') {
       queue.push(b);
       queueEmitter.emit('queue-size', queue.length);
     }
 
-    if (
+    if (ingestTimes.length > 5) { ingestTimes = ingestTimes.slice(-5); }
+
+    if (ingesting < parallelCalls) {
       var docs = queue.shift();
-      queueEmitter.emit('queue-size', queue.length);
-      ingesting = true;
-      if (verbose) { console.log(("bulk ingest docs: " + (docs.length / 2) + ", queue length: " + (queue.length))); }
 
-
-
-
-        if (queue.length > 0) {
-          ingest();
-        }
-      } catch (err) {
-        console.log('bulk index error', err);
+      queueEmitter.emit('queue-size', queue.length);
+      if (queue.length <= 5) {
+        queueEmitter.emit('resume');
       }
-    }
 
-
-
-
-
+      ingesting += 1;
+
+      if (verbose)
+        { console.log(("bulk ingest docs: " + (docs.length / 2) + ", queue length: " + (queue.length))); }
+
+      var start = Date.now();
+      client
+        .bulk({ body: docs })
+        .then(function () {
+          var end = Date.now();
+          var delta = end - start;
+          ingestTimes.push(delta);
+          ingesting -= 1;
+
+          var ingestTimesMovingAverage =
+            ingestTimes.length > 0
+              ? ingestTimes.reduce(function (p, c) { return p + c; }, 0) / ingestTimes.length
+              : 0;
+          var ingestTimesMovingAverageSeconds = Math.floor(ingestTimesMovingAverage / 1000);
+
+          if (
+            ingestTimes.length > 0 &&
+            ingestTimesMovingAverageSeconds < 30 &&
+            parallelCalls < 10
+          ) {
+            parallelCalls += 1;
+          } else if (
+            ingestTimes.length > 0 &&
+            ingestTimesMovingAverageSeconds >= 30 &&
+            parallelCalls > 1
+          ) {
+            parallelCalls -= 1;
+          }
+
+          if (queue.length > 0) {
+            ingest();
+          } else if (queue.length === 0 && finished) {
+            queueEmitter.emit('finish');
+          }
+        })
+        .catch(function (error) {
+          console.error(error);
+          ingesting -= 1;
+          parallelCalls = 1;
+          if (queue.length > 0) {
+            ingest();
+          }
+        });
     }
   };
 
   return {
     add: function (doc) {
+      if (finished) {
+        throw new Error('Unexpected doc added after indexer should finish.');
+      }
+
       if (!skipHeader) {
         var header = { index: { _index: targetIndexName } };
         buffer.push(header);
       }
       buffer.push(doc);
 
-      // console.log(`add: queue.length ${queue.length}`);
       if (queue.length === 0) {
         queueEmitter.emit('resume');
       }
 
-      if (buffer.length >=
+      if (buffer.length >= bufferSize * 2) {
         ingest(buffer);
         buffer = [];
       }
     },
-    finish:
-
-
-
+    finish: function () {
+      finished = true;
+
+      if (buffer.length > 0) {
+        ingest(buffer);
+        buffer = [];
+      } else if (queue.length === 0 && ingesting === 0) {
+        queueEmitter.emit('finish');
+      }
     },
     queueEmitter: queueEmitter,
   };
 }
 
-var MAX_QUEUE_SIZE =
+var MAX_QUEUE_SIZE$1 = 15;
 
 // create a new progress bar instance and use shades_classic theme
 var progressBar = new cliProgress.SingleBar({}, cliProgress.Presets.shades_classic);
 
-function indexReaderFactory(
+function indexReaderFactory(
+  indexer,
+  sourceIndexName,
+  transform,
+  client,
+  query,
+  bufferSize
+) {
+  if ( bufferSize === void 0 ) bufferSize = DEFAULT_BUFFER_SIZE;
+
   return async function indexReader() {
     var responseQueue = [];
     var docsNum = 0;
```
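Bulk requests are no longer sent strictly one at a time. After each successful request, the indexer averages the recent request durations. While that average stays under 30 seconds, it raises the number of concurrent bulk calls by one, up to 10. Otherwise it lowers the number by one, down to 1. A failed request resets it to 1. The sketch below restates that rule as a pure function, `nextParallelCalls`, which is a hypothetical name. Unlike the diff, it always averages exactly the last five timings. The diff trims its list before dispatching, so its average may briefly include more than five.

```javascript
// Hypothetical sketch of beta1's concurrency tuning in indexQueueFactory.
const MAX_PARALLEL_CALLS = 10;
const SLOW_BULK_SECONDS = 30;
const WINDOW_SIZE = 5;

// recentTimesMs: durations of completed bulk requests in milliseconds, oldest first.
function nextParallelCalls(recentTimesMs, parallelCalls) {
  const recent = recentTimesMs.slice(-WINDOW_SIZE);
  if (recent.length === 0) return parallelCalls;

  const averageMs = recent.reduce((sum, t) => sum + t, 0) / recent.length;
  const averageSeconds = Math.floor(averageMs / 1000);

  if (averageSeconds < SLOW_BULK_SECONDS && parallelCalls < MAX_PARALLEL_CALLS) {
    return parallelCalls + 1; // fast enough: allow one more request in flight
  }
  if (averageSeconds >= SLOW_BULK_SECONDS && parallelCalls > 1) {
    return parallelCalls - 1; // too slow: back off by one
  }
  return parallelCalls;
}

console.log(nextParallelCalls([800, 1200, 950], 3)); // 4
console.log(nextParallelCalls([31000, 45000], 3)); // 2
```

In the diff, `parallelCalls` (like `queueEmitter`) is declared at module level rather than inside `indexQueueFactory`, so all indexers created in the same process share it.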
```diff
@@ -182,7 +281,8 @@ function indexReaderFactory(indexer, sourceIndexName, transform, client) {
       return client.search({
         index: sourceIndexName,
         scroll: '30s',
-        size:
+        size: bufferSize,
+        query: query,
       });
     }
 
```
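`bufferSize` (default `DEFAULT_BUFFER_SIZE = 1000`) now sets both the scroll page size when reindexing and the bulk batch size. With the default `skipHeader: false`, every document is preceded by its `index` action in the bulk body. The indexer therefore flushes its buffer at `bufferSize * 2` entries, which is every `bufferSize` documents. The sketch below shows that arithmetic with a hypothetical batching helper, `toBulkBodies`, which is illustrative and not a library function.

```javascript
const DEFAULT_BUFFER_SIZE = 1000; // same default as the beta1 build

// Hypothetical helper: split docs into bulk request bodies of `bufferSize` documents each.
// Each document is preceded by an `index` action, so a full body has bufferSize * 2 entries.
function toBulkBodies(docs, targetIndexName, bufferSize = DEFAULT_BUFFER_SIZE) {
  const bodies = [];
  let buffer = [];
  for (const doc of docs) {
    buffer.push({ index: { _index: targetIndexName } }, doc);
    if (buffer.length >= bufferSize * 2) {
      bodies.push(buffer);
      buffer = [];
    }
  }
  // Like finish(): flush whatever is left as a final, smaller request.
  if (buffer.length > 0) bodies.push(buffer);
  return bodies;
}

const bodies = toBulkBodies([{ n: 1 }, { n: 2 }, { n: 3 }], 'my-index', 2);
console.log(bodies.map((b) => b.length)); // [ 4, 2 ]
```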
```diff
@@ -202,7 +302,7 @@ function indexReaderFactory(indexer, sourceIndexName, transform, client) {
     function processHit(hit) {
       docsNum += 1;
       try {
-        var doc =
+        var doc = typeof transform === 'function' ? transform(hit._source) : hit._source; // eslint-disable-line no-underscore-dangle
         // if doc is undefined we'll skip indexing it
         if (typeof doc === 'undefined') {
           return;
```
```diff
@@ -236,15 +336,13 @@ function indexReaderFactory(indexer, sourceIndexName, transform, client) {
         progressBar.update(docsNum);
 
         // check to see if we have collected all of the docs
-        // console.log('check count', response.hits.total.value, docsNum);
         if (response.hits.total.value === docsNum) {
           indexer.finish();
-          progressBar.stop();
           break;
         }
 
-        if (ingestQueueSize < MAX_QUEUE_SIZE) {
-
+        if (ingestQueueSize < MAX_QUEUE_SIZE$1) {
+          // get the next response if there are more docs to fetch
           var sc = await scroll(response._scroll_id); // eslint-disable-line no-await-in-loop,no-underscore-dangle,max-len
           scrollId = sc._scroll_id; // eslint-disable-line no-underscore-dangle
           responseQueue.push(sc);
```
```diff
@@ -257,8 +355,8 @@ function indexReaderFactory(indexer, sourceIndexName, transform, client) {
     indexer.queueEmitter.on('queue-size', async function (size) {
       ingestQueueSize = size;
 
-      if (!readActive && ingestQueueSize < MAX_QUEUE_SIZE) {
-
+      if (!readActive && ingestQueueSize < MAX_QUEUE_SIZE$1) {
+        // get the next response if there are more docs to fetch
         var sc = await scroll(scrollId); // eslint-disable-line no-await-in-loop,no-underscore-dangle,max-len
         scrollId = sc._scroll_id; // eslint-disable-line no-underscore-dangle
         responseQueue.push(sc);
```
```diff
@@ -280,6 +378,10 @@ function indexReaderFactory(indexer, sourceIndexName, transform, client) {
       processResponseQueue();
     });
 
+    indexer.queueEmitter.on('finish', function () {
+      progressBar.stop();
+    });
+
     processResponseQueue();
   };
 }
```
```diff
@@ -288,12 +390,15 @@ async function transformer(ref) {
   var deleteIndex = ref.deleteIndex; if ( deleteIndex === void 0 ) deleteIndex = false;
   var sourceClientConfig = ref.sourceClientConfig;
   var targetClientConfig = ref.targetClientConfig;
-  var bufferSize = ref.bufferSize; if ( bufferSize === void 0 ) bufferSize =
+  var bufferSize = ref.bufferSize; if ( bufferSize === void 0 ) bufferSize = DEFAULT_BUFFER_SIZE;
   var fileName = ref.fileName;
   var splitRegex = ref.splitRegex; if ( splitRegex === void 0 ) splitRegex = /\n/;
   var sourceIndexName = ref.sourceIndexName;
   var targetIndexName = ref.targetIndexName;
   var mappings = ref.mappings;
+  var mappingsOverride = ref.mappingsOverride; if ( mappingsOverride === void 0 ) mappingsOverride = false;
+  var indexMappingTotalFieldsLimit = ref.indexMappingTotalFieldsLimit;
+  var query = ref.query;
   var skipHeader = ref.skipHeader; if ( skipHeader === void 0 ) skipHeader = false;
   var transform = ref.transform;
   var verbose = ref.verbose; if ( verbose === void 0 ) verbose = true;
```
```diff
@@ -317,6 +422,8 @@ async function transformer(ref) {
     targetClient: targetClient,
     targetIndexName: targetIndexName,
     mappings: mappings,
+    mappingsOverride: mappingsOverride,
+    indexMappingTotalFieldsLimit: indexMappingTotalFieldsLimit,
     verbose: verbose,
   });
   var indexer = indexQueueFactory({
```
```diff
@@ -328,30 +435,16 @@ async function transformer(ref) {
   });
 
   function getReader() {
-    if (
-
-      && typeof sourceIndexName !== 'undefined'
-    ) {
-      throw Error(
-        'Only either one of fileName or sourceIndexName can be specified.'
-      );
+    if (typeof fileName !== 'undefined' && typeof sourceIndexName !== 'undefined') {
+      throw Error('Only either one of fileName or sourceIndexName can be specified.');
     }
 
-    if (
-      typeof fileName === 'undefined'
-      && typeof sourceIndexName === 'undefined'
-    ) {
+    if (typeof fileName === 'undefined' && typeof sourceIndexName === 'undefined') {
       throw Error('Either fileName or sourceIndexName must be specified.');
     }
 
     if (typeof fileName !== 'undefined') {
-      return fileReaderFactory(
-        indexer,
-        fileName,
-        transform,
-        splitRegex,
-        verbose
-      );
+      return fileReaderFactory(indexer, fileName, transform, splitRegex, verbose);
     }
 
     if (typeof sourceIndexName !== 'undefined') {
```
```diff
@@ -359,7 +452,9 @@ async function transformer(ref) {
         indexer,
         sourceIndexName,
         transform,
-        sourceClient
+        sourceClient,
+        query,
+        bufferSize
       );
     }
 
```
package/dist/node-es-transformer.esm.js
CHANGED
```diff
@@ -4,20 +4,26 @@ import glob from 'glob';
 import cliProgress from 'cli-progress';
 import elasticsearch from '@elastic/elasticsearch';
 
+var DEFAULT_BUFFER_SIZE = 1000;
+
 function createMappingFactory(ref) {
   var sourceClient = ref.sourceClient;
   var sourceIndexName = ref.sourceIndexName;
   var targetClient = ref.targetClient;
   var targetIndexName = ref.targetIndexName;
   var mappings = ref.mappings;
+  var mappingsOverride = ref.mappingsOverride;
+  var indexMappingTotalFieldsLimit = ref.indexMappingTotalFieldsLimit;
   var verbose = ref.verbose;
 
   return async function () {
-    var targetMappings = mappings;
+    var targetMappings = mappingsOverride ? undefined : mappings;
 
     if (sourceClient && sourceIndexName && typeof targetMappings === 'undefined') {
       try {
-        var mapping = await sourceClient.indices.getMapping({
+        var mapping = await sourceClient.indices.getMapping({
+          index: sourceIndexName,
+        });
         targetMappings = mapping[sourceIndexName].mappings;
       } catch (err) {
         console.log('Error reading source mapping', err);
```
```diff
@@ -26,13 +32,24 @@ function createMappingFactory(ref) {
     }
 
     if (typeof targetMappings === 'object' && targetMappings !== null) {
+      if (mappingsOverride) {
+        targetMappings = Object.assign({}, targetMappings,
+          {properties: Object.assign({}, targetMappings.properties,
+          mappings)});
+      }
+
       try {
-        var resp = await targetClient.indices.create(
-
-
-
-
-
+        var resp = await targetClient.indices.create({
+          index: targetIndexName,
+          body: Object.assign({}, {mappings: targetMappings},
+          (indexMappingTotalFieldsLimit !== undefined
+            ? {
+              settings: {
+                'index.mapping.total_fields.limit': indexMappingTotalFieldsLimit,
+              },
+            }
+            : {})),
+        });
         if (verbose) { console.log('Created target mapping', resp); }
       } catch (err) {
         console.log('Error creating target mapping', err);
```
```diff
@@ -41,45 +58,69 @@ function createMappingFactory(ref) {
   };
 }
 
+var MAX_QUEUE_SIZE = 15;
+
 function fileReaderFactory(indexer, fileName, transform, splitRegex, verbose) {
   function startIndex(files) {
+    var ingestQueueSize = 0;
+    var finished = false;
+
     var file = files.shift();
-    var s = fs
+    var s = fs
+      .createReadStream(file)
       .pipe(es.split(splitRegex))
-      .pipe(
-
-
-
-
-
-
-
-
+      .pipe(
+        es
+          .mapSync(function (line) {
+            try {
+              var doc = typeof transform === 'function' ? transform(line) : line;
+              // if doc is undefined we'll skip indexing it
+              if (typeof doc === 'undefined') {
+                s.resume();
+                return;
+              }
+
+              // the transform callback may return an array of docs so we can emit
+              // multiple docs from a single line
+              if (Array.isArray(doc)) {
+                doc.forEach(function (d) { return indexer.add(d); });
+                return;
+              }
+
+              indexer.add(doc);
+            } catch (e) {
+              console.log('error', e);
+            }
+          })
+          .on('error', function (err) {
+            console.log('Error while reading file.', err);
+          })
+          .on('end', function () {
+            if (verbose) { console.log('Read entire file: ', file); }
+            if (files.length > 0) {
+              startIndex(files);
+              return;
+            }
+
+            indexer.finish();
+            finished = true;
+          })
+      );
 
-
-
-
-          doc.forEach(function (d) { return indexer.add(d); });
-          return;
-        }
+    indexer.queueEmitter.on('queue-size', async function (size) {
+      if (finished) { return; }
+      ingestQueueSize = size;
 
-
-
-
-
-        }
-
-        console.log('Error while reading file.', err);
-      })
-      .on('end', function () {
-        if (verbose) { console.log('Read entire file: ', file); }
-        indexer.finish();
-        if (files.length > 0) {
-          startIndex(files);
-        }
-      }));
+      if (ingestQueueSize < MAX_QUEUE_SIZE) {
+        s.resume();
+      } else {
+        s.pause();
+      }
+    });
 
     indexer.queueEmitter.on('resume', function () {
+      if (finished) { return; }
+      ingestQueueSize = 0;
       s.resume();
     });
   }
```
@@ -95,81 +136,139 @@ var EventEmitter = require('events');
|
|
|
95
136
|
|
|
96
137
|
var queueEmitter = new EventEmitter();
|
|
97
138
|
|
|
139
|
+
var parallelCalls = 1;
|
|
140
|
+
|
|
98
141
|
// a simple helper queue to bulk index documents
|
|
99
142
|
function indexQueueFactory(ref) {
|
|
100
143
|
var client = ref.targetClient;
|
|
101
144
|
var targetIndexName = ref.targetIndexName;
|
|
102
|
-
var bufferSize = ref.bufferSize; if ( bufferSize === void 0 ) bufferSize =
|
|
145
|
+
var bufferSize = ref.bufferSize; if ( bufferSize === void 0 ) bufferSize = DEFAULT_BUFFER_SIZE;
|
|
103
146
|
var skipHeader = ref.skipHeader; if ( skipHeader === void 0 ) skipHeader = false;
|
|
104
147
|
var verbose = ref.verbose; if ( verbose === void 0 ) verbose = true;
|
|
105
148
|
|
|
106
149
|
var buffer = [];
|
|
107
150
|
var queue = [];
|
|
108
|
-
var ingesting =
|
|
151
|
+
var ingesting = 0;
|
|
152
|
+
var ingestTimes = [];
|
|
153
|
+
var finished = false;
|
|
109
154
|
|
|
110
|
-
var ingest =
|
|
155
|
+
var ingest = function (b) {
|
|
111
156
|
if (typeof b !== 'undefined') {
|
|
112
157
|
queue.push(b);
|
|
113
158
|
queueEmitter.emit('queue-size', queue.length);
|
|
114
159
|
}
|
|
115
160
|
|
|
116
|
-
if (
|
|
161
|
+
if (ingestTimes.length > 5) { ingestTimes = ingestTimes.slice(-5); }
|
|
162
|
+
|
|
163
|
+
if (ingesting < parallelCalls) {
|
|
117
164
|
var docs = queue.shift();
|
|
118
|
-
queueEmitter.emit('queue-size', queue.length);
|
|
119
|
-
ingesting = true;
|
|
120
|
-
if (verbose) { console.log(("bulk ingest docs: " + (docs.length / 2) + ", queue length: " + (queue.length))); }
|
|
121
165
|
|
|
122
|
-
|
|
123
|
-
|
|
124
|
-
|
|
125
|
-
if (queue.length > 0) {
-          ingest();
-        }
-      } catch (err) {
-        console.log('bulk index error', err);
+      queueEmitter.emit('queue-size', queue.length);
+      if (queue.length <= 5) {
+        queueEmitter.emit('resume');
       }
-    }
 
-
-
-
-
+      ingesting += 1;
+
+      if (verbose)
+        { console.log(("bulk ingest docs: " + (docs.length / 2) + ", queue length: " + (queue.length))); }
+
+      var start = Date.now();
+      client
+        .bulk({ body: docs })
+        .then(function () {
+          var end = Date.now();
+          var delta = end - start;
+          ingestTimes.push(delta);
+          ingesting -= 1;
+
+          var ingestTimesMovingAverage =
+            ingestTimes.length > 0
+              ? ingestTimes.reduce(function (p, c) { return p + c; }, 0) / ingestTimes.length
+              : 0;
+          var ingestTimesMovingAverageSeconds = Math.floor(ingestTimesMovingAverage / 1000);
+
+          if (
+            ingestTimes.length > 0 &&
+            ingestTimesMovingAverageSeconds < 30 &&
+            parallelCalls < 10
+          ) {
+            parallelCalls += 1;
+          } else if (
+            ingestTimes.length > 0 &&
+            ingestTimesMovingAverageSeconds >= 30 &&
+            parallelCalls > 1
+          ) {
+            parallelCalls -= 1;
+          }
+
+          if (queue.length > 0) {
+            ingest();
+          } else if (queue.length === 0 && finished) {
+            queueEmitter.emit('finish');
+          }
+        })
+        .catch(function (error) {
+          console.error(error);
+          ingesting -= 1;
+          parallelCalls = 1;
+          if (queue.length > 0) {
+            ingest();
+          }
+        });
     }
   };
 
   return {
     add: function (doc) {
+      if (finished) {
+        throw new Error('Unexpected doc added after indexer should finish.');
+      }
+
       if (!skipHeader) {
         var header = { index: { _index: targetIndexName } };
         buffer.push(header);
       }
       buffer.push(doc);
 
-      // console.log(`add: queue.length ${queue.length}`);
       if (queue.length === 0) {
         queueEmitter.emit('resume');
       }
 
-      if (buffer.length >=
+      if (buffer.length >= bufferSize * 2) {
         ingest(buffer);
         buffer = [];
       }
     },
-    finish:
-
-
-
+    finish: function () {
+      finished = true;
+
+      if (buffer.length > 0) {
+        ingest(buffer);
+        buffer = [];
+      } else if (queue.length === 0 && ingesting === 0) {
+        queueEmitter.emit('finish');
+      }
     },
     queueEmitter: queueEmitter,
   };
 }
 
-var MAX_QUEUE_SIZE =
+var MAX_QUEUE_SIZE$1 = 15;
 
 // create a new progress bar instance and use shades_classic theme
 var progressBar = new cliProgress.SingleBar({}, cliProgress.Presets.shades_classic);
 
-function indexReaderFactory(
+function indexReaderFactory(
+  indexer,
+  sourceIndexName,
+  transform,
+  client,
+  query,
+  bufferSize
+) {
+  if ( bufferSize === void 0 ) bufferSize = DEFAULT_BUFFER_SIZE;
+
   return async function indexReader() {
     var responseQueue = [];
     var docsNum = 0;
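In the new `.then()` handler, each successful bulk request records its duration in `ingestTimes` and then adjusts `parallelCalls`: if the mean of the recorded durations, floored to whole seconds, is under 30, the limit grows by one (capped at 10); if it is 30 or more, the limit shrinks by one (never below 1). The `.catch()` handler resets it to 1. A minimal sketch of that rule as a pure function follows; `nextParallelCalls` is a hypothetical name used only for illustration and is not exported by node-es-transformer.

```javascript
// Hypothetical helper that mirrors the concurrency rule in the diff's .then() handler.
// ingestTimes: durations in milliseconds of completed bulk requests.
// parallelCalls: the current limit on concurrent bulk requests.
function nextParallelCalls(ingestTimes, parallelCalls) {
  if (ingestTimes.length === 0) {
    return parallelCalls; // no measurements yet: keep the current limit
  }

  const averageMs = ingestTimes.reduce((sum, ms) => sum + ms, 0) / ingestTimes.length;
  const averageSeconds = Math.floor(averageMs / 1000);

  if (averageSeconds < 30 && parallelCalls < 10) {
    return parallelCalls + 1; // requests are fast enough: allow one more in flight
  }
  if (averageSeconds >= 30 && parallelCalls > 1) {
    return parallelCalls - 1; // requests are slow: back off by one
  }
  return parallelCalls; // already at a bound
}

// Usage: feed a run of fast requests followed by slow ones.
let calls = 1;
const samples = [];
for (const ms of [800, 900, 1200, 45000, 60000, 70000, 90000]) {
  samples.push(ms);
  calls = nextParallelCalls(samples, calls);
  console.log(`after a ${ms} ms request: parallelCalls = ${calls}`);
}
```

Because the sketch averages every sample passed in, slow requests only lower the limit once the mean itself reaches 30 seconds; in the simulated run the limit keeps climbing through the first slow samples before dropping by one.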
@@ -178,7 +277,8 @@ function indexReaderFactory(indexer, sourceIndexName, transform, client) {
       return client.search({
         index: sourceIndexName,
         scroll: '30s',
-        size:
+        size: bufferSize,
+        query: query,
       });
     }
 
@@ -198,7 +298,7 @@ function indexReaderFactory(indexer, sourceIndexName, transform, client) {
     function processHit(hit) {
       docsNum += 1;
       try {
-        var doc =
+        var doc = typeof transform === 'function' ? transform(hit._source) : hit._source; // eslint-disable-line no-underscore-dangle
         // if doc is undefined we'll skip indexing it
         if (typeof doc === 'undefined') {
           return;
@@ -232,15 +332,13 @@ function indexReaderFactory(indexer, sourceIndexName, transform, client) {
         progressBar.update(docsNum);
 
         // check to see if we have collected all of the docs
-        // console.log('check count', response.hits.total.value, docsNum);
         if (response.hits.total.value === docsNum) {
           indexer.finish();
-          progressBar.stop();
           break;
         }
 
-        if (ingestQueueSize < MAX_QUEUE_SIZE) {
-
+        if (ingestQueueSize < MAX_QUEUE_SIZE$1) {
+          // get the next response if there are more docs to fetch
           var sc = await scroll(response._scroll_id); // eslint-disable-line no-await-in-loop,no-underscore-dangle,max-len
           scrollId = sc._scroll_id; // eslint-disable-line no-underscore-dangle
           responseQueue.push(sc);
@@ -253,8 +351,8 @@ function indexReaderFactory(indexer, sourceIndexName, transform, client) {
     indexer.queueEmitter.on('queue-size', async function (size) {
       ingestQueueSize = size;
 
-      if (!readActive && ingestQueueSize < MAX_QUEUE_SIZE) {
-
+      if (!readActive && ingestQueueSize < MAX_QUEUE_SIZE$1) {
+        // get the next response if there are more docs to fetch
         var sc = await scroll(scrollId); // eslint-disable-line no-await-in-loop,no-underscore-dangle,max-len
         scrollId = sc._scroll_id; // eslint-disable-line no-underscore-dangle
         responseQueue.push(sc);
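On the reader side, both the scroll loop and the `'queue-size'` listener now compare `ingestQueueSize` against `MAX_QUEUE_SIZE$1` (15) before requesting the next scroll page, and the listener also waits until no read is active (`readActive`). This is a simple form of backpressure: the reader stops pulling from the source index while the indexer has a backlog. The sketch below simulates that handshake synchronously with Node's `EventEmitter`; `enqueue`, `dequeue`, `readPages` and `TOTAL_PAGES` are hypothetical stand-ins for the indexer queue and the scroll requests, not node-es-transformer APIs.

```javascript
const { EventEmitter } = require('events');

// Value of MAX_QUEUE_SIZE$1 in the diff: the reader only fetches another
// scroll page while the indexer has fewer than this many batches queued.
const MAX_QUEUE_SIZE = 15;
const TOTAL_PAGES = 40; // hypothetical size of the source index, in scroll pages

// Hypothetical stand-in for the indexer queue. It only reports its length,
// like the 'queue-size' events emitted by ingest().
const queueEmitter = new EventEmitter();
const queue = [];

function enqueue(batch) {
  queue.push(batch);
  queueEmitter.emit('queue-size', queue.length);
}

function dequeue() {
  const batch = queue.shift(); // stands in for starting a bulk request
  queueEmitter.emit('queue-size', queue.length);
  return batch;
}

// Hypothetical stand-in for the reader: each scroll page becomes one batch.
let ingestQueueSize = 0;
let readActive = false;
let pagesRead = 0;

function readPages() {
  readActive = true;
  // Keep scrolling only while the ingest queue has room.
  while (pagesRead < TOTAL_PAGES && ingestQueueSize < MAX_QUEUE_SIZE) {
    pagesRead += 1;
    enqueue(`page-${pagesRead}`);
  }
  readActive = false;
}

// Mirrors the diff's 'queue-size' listener: record the size and, if no read is
// in progress and there is room again, fetch more.
queueEmitter.on('queue-size', (size) => {
  ingestQueueSize = size;
  if (!readActive && ingestQueueSize < MAX_QUEUE_SIZE) {
    readPages();
  }
});

// Usage: start reading, then let the "indexer" drain one batch at a time.
readPages();
let processed = 0;
let maxQueueLength = 0;
while (queue.length > 0) {
  maxQueueLength = Math.max(maxQueueLength, queue.length);
  dequeue();
  processed += 1;
}
console.log({ pagesRead, processed, maxQueueLength });
```

With one batch drained per step, the queue never holds more than 15 batches even though the simulated source has 40 pages.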
@@ -276,6 +374,10 @@ function indexReaderFactory(indexer, sourceIndexName, transform, client) {
       processResponseQueue();
     });
 
+    indexer.queueEmitter.on('finish', function () {
+      progressBar.stop();
+    });
+
     processResponseQueue();
   };
 }
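The indexer's completion handling also changes in this release. `add()` now throws after `finish()`, `finish()` flushes a partially filled buffer, and `'finish'` is emitted either from the bulk `.then()` handler once the queue is empty and `finish()` has been called, or directly from `finish()` when nothing is queued or in flight. The reader accordingly stops the progress bar in a `'finish'` listener rather than immediately after `indexer.finish()`. Below is a simplified sketch of that lifecycle, assuming a single request in flight and a fake Promise-returning bulk function; `createIndexer`, `fakeBulk` and the `'target-index'` name are hypothetical.

```javascript
const { EventEmitter } = require('events');

// Hypothetical, simplified indexer modeled on the add()/finish() lifecycle in the diff.
// `bulk` is any function returning a Promise (a stand-in for client.bulk), and only one
// request is kept in flight. Error handling (the diff's .catch branch) is omitted.
function createIndexer(bulk, bufferSize) {
  const queueEmitter = new EventEmitter();
  const queue = [];
  let buffer = [];
  let ingesting = 0;
  let finished = false;

  function ingest(batch) {
    if (batch !== undefined) queue.push(batch);
    if (ingesting > 0 || queue.length === 0) return; // wait for the in-flight request
    const docs = queue.shift();
    ingesting += 1;
    bulk(docs).then(() => {
      ingesting -= 1;
      if (queue.length > 0) {
        ingest();
      } else if (finished) {
        queueEmitter.emit('finish'); // queue drained after finish() was called
      }
    });
  }

  return {
    queueEmitter,
    add(doc) {
      if (finished) {
        throw new Error('Unexpected doc added after indexer should finish.');
      }
      // Bulk bodies alternate an action line and a document, hence bufferSize * 2.
      // 'target-index' is a placeholder index name.
      buffer.push({ index: { _index: 'target-index' } }, doc);
      if (buffer.length >= bufferSize * 2) {
        ingest(buffer);
        buffer = [];
      }
    },
    finish() {
      finished = true;
      if (buffer.length > 0) {
        ingest(buffer); // flush the partially filled last batch
        buffer = [];
      } else if (queue.length === 0 && ingesting === 0) {
        queueEmitter.emit('finish'); // nothing pending: finish right away
      }
    },
  };
}

// Usage: a fake bulk call that resolves immediately and records batch sizes.
const batchSizes = [];
const fakeBulk = (docs) => {
  batchSizes.push(docs.length / 2); // two body entries per document
  return Promise.resolve();
};

const indexer = createIndexer(fakeBulk, 2);
const finishedPromise = new Promise((resolve) => indexer.queueEmitter.once('finish', resolve));
['a', 'b', 'c', 'd', 'e'].forEach((id) => indexer.add({ id }));
indexer.finish();

let lateAddError = null;
try {
  indexer.add({ id: 'f' }); // rejected: finish() was already called
} catch (err) {
  lateAddError = err.message;
}

finishedPromise.then(() => console.log('finished; batch sizes:', batchSizes));
```

Five documents with a `bufferSize` of 2 produce batches of 2, 2 and 1 documents; the last batch is only sent because `finish()` flushes the buffer, and `'finish'` fires after that final request resolves.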
@@ -284,12 +386,15 @@ async function transformer(ref) {
   var deleteIndex = ref.deleteIndex; if ( deleteIndex === void 0 ) deleteIndex = false;
   var sourceClientConfig = ref.sourceClientConfig;
   var targetClientConfig = ref.targetClientConfig;
-  var bufferSize = ref.bufferSize; if ( bufferSize === void 0 ) bufferSize =
+  var bufferSize = ref.bufferSize; if ( bufferSize === void 0 ) bufferSize = DEFAULT_BUFFER_SIZE;
   var fileName = ref.fileName;
   var splitRegex = ref.splitRegex; if ( splitRegex === void 0 ) splitRegex = /\n/;
   var sourceIndexName = ref.sourceIndexName;
   var targetIndexName = ref.targetIndexName;
   var mappings = ref.mappings;
+  var mappingsOverride = ref.mappingsOverride; if ( mappingsOverride === void 0 ) mappingsOverride = false;
+  var indexMappingTotalFieldsLimit = ref.indexMappingTotalFieldsLimit;
+  var query = ref.query;
   var skipHeader = ref.skipHeader; if ( skipHeader === void 0 ) skipHeader = false;
   var transform = ref.transform;
   var verbose = ref.verbose; if ( verbose === void 0 ) verbose = true;
@@ -313,6 +418,8 @@ async function transformer(ref) {
     targetClient: targetClient,
     targetIndexName: targetIndexName,
     mappings: mappings,
+    mappingsOverride: mappingsOverride,
+    indexMappingTotalFieldsLimit: indexMappingTotalFieldsLimit,
     verbose: verbose,
   });
   var indexer = indexQueueFactory({
@@ -324,30 +431,16 @@ async function transformer(ref) {
   });
 
   function getReader() {
-    if (
-
-      && typeof sourceIndexName !== 'undefined'
-    ) {
-      throw Error(
-        'Only either one of fileName or sourceIndexName can be specified.'
-      );
+    if (typeof fileName !== 'undefined' && typeof sourceIndexName !== 'undefined') {
+      throw Error('Only either one of fileName or sourceIndexName can be specified.');
     }
 
-    if (
-      typeof fileName === 'undefined'
-      && typeof sourceIndexName === 'undefined'
-    ) {
+    if (typeof fileName === 'undefined' && typeof sourceIndexName === 'undefined') {
       throw Error('Either fileName or sourceIndexName must be specified.');
     }
 
     if (typeof fileName !== 'undefined') {
-      return fileReaderFactory(
-        indexer,
-        fileName,
-        transform,
-        splitRegex,
-        verbose
-      );
+      return fileReaderFactory(indexer, fileName, transform, splitRegex, verbose);
     }
 
     if (typeof sourceIndexName !== 'undefined') {
@@ -355,7 +448,9 @@ async function transformer(ref) {
         indexer,
         sourceIndexName,
         transform,
-        sourceClient
+        sourceClient,
+        query,
+        bufferSize
       );
     }
 
package/package.json
CHANGED
@@ -14,22 +14,30 @@
   "license": "Apache-2.0",
   "author": "Walter Rafelsberger <walter@rafelsberger.at>",
   "contributors": [],
-  "version": "1.0.0-
+  "version": "1.0.0-beta1",
   "main": "dist/node-es-transformer.cjs.js",
   "module": "dist/node-es-transformer.esm.js",
   "dependencies": {
-    "@elastic/elasticsearch": "^8.
+    "@elastic/elasticsearch": "^8.10.0",
     "cli-progress": "^3.12.0",
     "event-stream": "3.3.4",
     "glob": "7.1.2"
   },
   "devDependencies": {
     "acorn": "^6.4.2",
-    "
+    "commit-and-tag-version": "^11.3.0",
+    "cz-conventional-changelog": "^3.3.0",
+    "eslint": "^8.51.0",
     "eslint-config-airbnb": "19.0.4",
+    "eslint-config-prettier": "^9.0.0",
     "eslint-plugin-import": "2.27.5",
+    "eslint-plugin-jest": "^27.4.2",
     "eslint-plugin-jsx-a11y": "6.7.1",
+    "eslint-plugin-prettier": "^3.3.1",
     "eslint-plugin-react": "7.32.2",
+    "frisby": "^2.1.3",
+    "jest": "^29.7.0",
+    "prettier": "^2.2.1",
     "rollup": "0.66.6",
     "rollup-plugin-buble": "0.19.6",
     "rollup-plugin-commonjs": "8.0.2",
@@ -38,10 +46,18 @@
   "scripts": {
     "build": "rollup -c",
     "dev": "rollup -c -w",
-    "test": "
-    "pretest": "npm run build"
+    "test": "jest",
+    "pretest": "npm run build",
+    "release": "commit-and-tag-version",
+    "create-sample-data-10000": "node scripts/create_sample_data_10000",
+    "create-sample-data-100": "node scripts/create_sample_data_100"
   },
   "files": [
     "dist"
-  ]
+  ],
+  "config": {
+    "commitizen": {
+      "path": "./node_modules/cz-conventional-changelog"
+    }
+  }
 }