node-es-transformer 1.0.0 → 1.1.0

package/README.md CHANGED
@@ -79,11 +79,10 @@ If you need to ingest large CSV/JSON files (GigaBytes) into Elasticsearch withou
79
79
 
80
80
  | node-es-transformer | Elasticsearch Client | Elasticsearch Server | Node.js |
81
81
  | ----------------------- | -------------------- | -------------------- | ------- |
82
- | 1.0.0-beta8+ | 8.x and 9.x | 8.x and 9.x | 22+ |
83
- | 1.0.0-beta7 | 9.x only | 9.x only | 22+ |
84
- | 1.0.0-beta6 and earlier | 8.x | 8.x | 22+ |
82
+ | 1.0.0+ | 8.x and 9.x | 8.x and 9.x | 22+ |
83
+ | 1.0.0-beta7 and earlier | 8.x | 8.x | 18-20 |
85
84
 
86
- **Multi-Version Support**: Starting with v1.0.0-beta8, the library supports both Elasticsearch 8.x and 9.x through automatic version detection and client aliasing. This enables seamless reindexing between major versions (e.g., migrating from ES 8.x to 9.x). All functionality is tested in CI against multiple ES versions including cross-version reindexing scenarios.
85
+ **Multi-Version Support**: Starting with v1.0.0, the library supports both Elasticsearch 8.x and 9.x through automatic version detection and client aliasing. This enables seamless reindexing between major versions (e.g., migrating from ES 8.x to 9.x). All functionality is tested in CI against multiple ES versions including cross-version reindexing scenarios.
87
86
 
88
87
  **Upgrading?** See [MIGRATION.md](MIGRATION.md) for upgrade guidance from beta versions to v1.0.0.
89
88
 
@@ -97,7 +96,7 @@ yarn add node-es-transformer
97
96
 
98
97
  ## Usage
99
98
 
100
- ### Read from a file
99
+ ### Read NDJSON from a file
101
100
 
102
101
  ```javascript
103
102
  const transformer = require('node-es-transformer');
@@ -130,6 +129,50 @@ transformer({
130
129
  });
131
130
  ```
132
131
 
132
+ ### Read CSV from a file
133
+
 + ```javascript
 + const transformer = require('node-es-transformer');
 +
 + transformer({
 +   fileName: 'users.csv',
 +   sourceFormat: 'csv',
 +   targetIndexName: 'users-index',
 +   mappings: {
 +     properties: {
 +       id: { type: 'integer' },
 +       first_name: { type: 'keyword' },
 +       last_name: { type: 'keyword' },
 +       full_name: { type: 'keyword' },
 +     },
 +   },
 +   transform(row) {
 +     return {
 +       ...row,
 +       id: Number(row.id),
 +       full_name: `${row.first_name} ${row.last_name}`,
 +     };
 +   },
 + });
 + ```
158
+
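The `transform` callback in the CSV example is a plain function, so it can be exercised without Elasticsearch. The sketch below is a standalone variant, not code from the package: `transformUserRow` is a hypothetical name, and it assumes CSV cells arrive as strings (which is what the `Number(row.id)` coercion in the example suggests). It also uses the documented rule that returning `null` skips a document.

```javascript
// Hypothetical standalone version of the CSV example's transform.
// Assumption: every CSV cell is delivered as a string.
function transformUserRow(row) {
  const raw = String(row.id ?? '').trim();
  const id = raw === '' ? NaN : Number(raw);

  // Returning null skips the document (see the `transform` option).
  if (!Number.isInteger(id)) return null;

  return {
    ...row,
    id,
    full_name: `${row.first_name} ${row.last_name}`.trim(),
  };
}

const good = transformUserRow({ id: '42', first_name: 'Ada', last_name: 'Lovelace' });
const bad = transformUserRow({ id: 'n/a', first_name: 'X', last_name: 'Y' });
console.log(good, bad);
```

Rejecting non-integer ids up front is a design choice of this sketch; without it, a value such as `'n/a'` would become `NaN` and would likely be rejected by the `integer` mapping at index time.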
159
+ ### Infer mappings from CSV sample
160
+
 + ```javascript
 + const transformer = require('node-es-transformer');
 +
 + transformer({
 +   fileName: 'users.csv',
 +   sourceFormat: 'csv',
 +   targetIndexName: 'users-index',
 +   inferMappings: true,
 +   inferMappingsOptions: {
 +     sampleBytes: 200000,
 +     lines_to_sample: 2000,
 +   },
 + });
 + ```
175
+
133
176
  ### Read from another index
134
177
 
135
178
  ```javascript
@@ -243,9 +286,11 @@ All options are passed to the main `transformer()` function.
243
286
 
244
287
  Choose **one** of these sources:
245
288
 
246
- - **`fileName`** (string): Source filename to ingest. Supports wildcards (e.g., `logs/*.json`).
289
+ - **`fileName`** (string): Source filename to ingest. Supports wildcards (e.g., `logs/*.json` or `data/*.csv`).
247
290
  - **`sourceIndexName`** (string): Source Elasticsearch index to reindex from.
248
291
  - **`stream`** (Readable): Node.js readable stream to ingest from.
292
+ - **`sourceFormat`** (`'ndjson' | 'csv'`): Format for file/stream sources. Default: `'ndjson'`.
293
+ - **`csvOptions`** (object): CSV parser options (delimiter, quote, columns, etc.) used when `sourceFormat: 'csv'`.
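As an illustration of how `sourceFormat` and `csvOptions` might be combined, here is an options object for a semicolon-delimited export. Only the option names (`delimiter`, `quote`, `columns`) come from the description above; the values and the meaning of `columns: true` (header row used as field names, as in csv-parse-style parsers) are assumptions, not confirmed behaviour of this package.

```javascript
// Options sketch for a semicolon-delimited CSV export.
// Assumption: `csvOptions` is handed to a csv-parse-style parser, where
// `columns: true` means "use the header row as field names".
const options = {
  fileName: 'data/*.csv', // wildcard, as in the `fileName` description
  sourceFormat: 'csv',
  targetIndexName: 'users-index',
  csvOptions: {
    delimiter: ';',
    quote: '"',
    columns: true,
  },
};

// With node-es-transformer installed, this object would be passed as:
//   const transformer = require('node-es-transformer');
//   transformer(options);
console.log(JSON.stringify(options, null, 2));
```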
249
294
 
250
295
  #### Client Configuration
251
296
 
@@ -260,10 +305,14 @@ Choose **one** of these sources:
260
305
 
261
306
  - **`mappings`** (object): Elasticsearch document mappings for target index. If reindexing and not provided, mappings are copied from source index.
262
307
  - **`mappingsOverride`** (boolean): When reindexing, apply `mappings` on top of source index mappings. Default: `false`.
308
+ - **`inferMappings`** (boolean): Infer mappings for `fileName` sources via `/_text_structure/find_structure`. Ignored when `mappings` is provided. If inference returns `ingest_pipeline`, it is created as `<targetIndexName>-inferred-pipeline` and applied as the index default pipeline (unless `pipeline` is explicitly set). Default: `false`.
309
+ - **`inferMappingsOptions`** (object): Options for `/_text_structure/find_structure` (for example `sampleBytes`, `lines_to_sample`, `delimiter`, `quote`, `has_header_row`, `timeout`).
263
310
  - **`deleteIndex`** (boolean): Delete target index if it exists before starting. Default: `false`.
264
311
  - **`indexMappingTotalFieldsLimit`** (number): Field limit for target index (`index.mapping.total_fields.limit` setting).
265
312
  - **`pipeline`** (string): Elasticsearch ingest pipeline name to use during indexing.
266
313
 
314
+ When `inferMappings` is enabled, the target cluster must allow `/_text_structure/find_structure` (cluster privilege: `monitor_text_structure`). If inferred ingest pipelines are used, the target cluster must also allow creating ingest pipelines (`_ingest/pipeline`).
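The precedence rules for `mappings`, `inferMappings`, and `pipeline` can be restated as a small pure function. This is a sketch of one reading of the text above, not the library's implementation: `planInference` is a hypothetical helper, and it assumes that an explicit `pipeline` means the inferred ingest pipeline is not used at all. The `mappings` and `ingest_pipeline` fields are those returned by `/_text_structure/find_structure`.

```javascript
// Hypothetical helper, not part of node-es-transformer: it only restates the
// documented precedence rules as a pure function.
function planInference(options, inferred = {}) {
  const { fileName, mappings, inferMappings = false, pipeline, targetIndexName } = options;

  // Inference applies to `fileName` sources and is ignored when `mappings` is given.
  const runInference = Boolean(inferMappings && fileName && !mappings);
  if (!runInference) {
    return { runInference, mappings, pipeline };
  }

  // Assumption: an explicit `pipeline` wins over an inferred ingest pipeline.
  const useInferredPipeline = Boolean(inferred.ingest_pipeline) && !pipeline;
  return {
    runInference,
    mappings: inferred.mappings,
    pipeline: useInferredPipeline ? `${targetIndexName}-inferred-pipeline` : pipeline,
  };
}

const inferredSample = {
  mappings: { properties: { id: { type: 'long' } } },
  ingest_pipeline: { processors: [] },
};
const base = { fileName: 'users.csv', targetIndexName: 'users-index', inferMappings: true };

console.log(planInference(base, inferredSample).pipeline);
console.log(planInference({ ...base, pipeline: 'my-pipeline' }, inferredSample).pipeline);
```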
315
+
267
316
  #### Performance Options
268
317
 
269
318
  - **`bufferSize`** (number): Buffer size threshold in KBytes for bulk indexing. Default: `5120` (5 MB).
@@ -277,8 +326,12 @@ Choose **one** of these sources:
277
326
  - Return array of documents to split one source into multiple targets
278
327
  - Return `null`/`undefined` to skip document
279
328
  - **`query`** (object): Elasticsearch [DSL query](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html) to filter source documents.
280
- - **`splitRegex`** (RegExp): Line split regex for file/stream sources. Default: `/\n/`.
281
- - **`skipHeader`** (boolean): Skip first line of source file (e.g., CSV header). Default: `false`.
329
+ - **`splitRegex`** (RegExp): Line split regex for file/stream sources when `sourceFormat` is `'ndjson'`. Default: `/\n/`.
330
+ - **`skipHeader`** (boolean): Header skipping for file/stream sources.
331
+ - NDJSON: skips the first non-empty line
332
+ - CSV: skips the first data line only when `csvOptions.columns` does not consume headers
333
+ - Default: `false`
334
+ - Applies only to `fileName`/`stream` sources
282
335
  - **`verbose`** (boolean): Enable logging and progress bars. Default: `true`.
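To make the NDJSON side of `splitRegex` and `skipHeader` concrete, the sketch below mimics the documented behaviour on an in-memory string. It is illustrative only: `parseNdjson` is a hypothetical function, it does not reproduce the library's streaming implementation, and it assumes blank lines are simply dropped.

```javascript
// Illustrative only: mimics the documented NDJSON behaviour on a string.
// Assumption: blank lines are ignored.
function parseNdjson(text, { splitRegex = /\n/, skipHeader = false } = {}) {
  const lines = text.split(splitRegex).filter((line) => line.trim() !== '');
  // With skipHeader, the first non-empty line is discarded before parsing.
  const body = skipHeader ? lines.slice(1) : lines;
  return body.map((line) => JSON.parse(line));
}

const withHeader = '\n# exported 2024\n{"a":1}\n{"a":2}\n';
const docs = parseNdjson(withHeader, { skipHeader: true });

// A custom splitRegex handles Windows line endings.
const crlfDocs = parseNdjson('{"a":1}\r\n{"a":2}', { splitRegex: /\r?\n/ });
console.log(docs, crlfDocs);
```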
283
336
 
284
337
  ### Return Value