npm - node-es-transformer - Versions diffs - 1.0.0 → 1.1.0 - Mend


This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

package/README.md +61 -8
package/dist/node-es-transformer.cjs.js +817 -0
package/dist/node-es-transformer.cjs.js.map +1 -0
package/dist/node-es-transformer.esm.js +815 -0
package/dist/node-es-transformer.esm.js.map +1 -0
package/index.d.ts +58 -1
package/package.json +2 -1

package/README.md CHANGED

@@ -79,11 +79,10 @@ If you need to ingest large CSV/JSON files (GigaBytes) into Elasticsearch withou
 | node-es-transformer     | Elasticsearch Client | Elasticsearch Server | Node.js |
 | ----------------------- | -------------------- | -------------------- | ------- |
-| 1.0.0-beta8+            | 8.x and 9.x          | 8.x and 9.x          | 22+     |
-| 1.0.0-beta7             | 9.x only             | 9.x only             | 22+     |
-| 1.0.0-beta6 and earlier | 8.x                  | 8.x                  | 22+     |
+| 1.0.0+                  | 8.x and 9.x          | 8.x and 9.x          | 22+     |
+| 1.0.0-beta7 and earlier | 8.x                  | 8.x                  | 18-20   |
-**Multi-Version Support**: Starting with v1.0.0-beta8, the library supports both Elasticsearch 8.x and 9.x through automatic version detection and client aliasing. This enables seamless reindexing between major versions (e.g., migrating from ES 8.x to 9.x). All functionality is tested in CI against multiple ES versions including cross-version reindexing scenarios.
+**Multi-Version Support**: Starting with v1.0.0, the library supports both Elasticsearch 8.x and 9.x through automatic version detection and client aliasing. This enables seamless reindexing between major versions (e.g., migrating from ES 8.x to 9.x). All functionality is tested in CI against multiple ES versions including cross-version reindexing scenarios.
 **Upgrading?** See [MIGRATION.md](MIGRATION.md) for upgrade guidance from beta versions to v1.0.0.
@@ -97,7 +96,7 @@ yarn add node-es-transformer
 ## Usage
-### Read from a file
+### Read NDJSON from a file
 ```javascript
 const transformer = require('node-es-transformer');
@@ -130,6 +129,50 @@ transformer({
 });
 ```
+### Read CSV from a file
+```javascript
+const transformer = require('node-es-transformer');
+transformer({
+  fileName: 'users.csv',
+  sourceFormat: 'csv',
+  targetIndexName: 'users-index',
+  mappings: {
+    properties: {
+      id: { type: 'integer' },
+      first_name: { type: 'keyword' },
+      last_name: { type: 'keyword' },
+      full_name: { type: 'keyword' },
+    },
+  },
+  transform(row) {
+    return {
+      ...row,
+      id: Number(row.id),
+      full_name: `${row.first_name} ${row.last_name}`,
+    };
+  },
+});
+```
+### Infer mappings from CSV sample
+```javascript
+const transformer = require('node-es-transformer');
+transformer({
+  fileName: 'users.csv',
+  sourceFormat: 'csv',
+  targetIndexName: 'users-index',
+  inferMappings: true,
+  inferMappingsOptions: {
+    sampleBytes: 200000,
+    lines_to_sample: 2000,
+  },
+});
+```
 ### Read from another index
 ```javascript
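The CSV `transform` example in this diff converts `id` with `Number(row.id)`, which implies that CSV fields reach `transform` as strings. Below is a minimal sketch of a stricter variant. It assumes each row is a plain object of strings keyed by header name. The name `csvRowToDoc` and the rule of dropping rows without an integer `id` are illustrative choices and are not part of the library. The sketch relies on the documented convention that returning `null` from `transform` skips the document.

```javascript
// Hypothetical stand-alone variant of the README's CSV `transform`.
// Assumption: each CSV row arrives as an object of string values keyed by
// header name, e.g. { id: '7', first_name: 'Ada', last_name: 'Lovelace' }.
function csvRowToDoc(row) {
  const raw = String(row.id ?? '').trim();
  const id = Number(raw);
  // Number('') is 0, so empty ids are rejected explicitly. Returning null
  // follows the documented "return null/undefined to skip" convention.
  if (raw === '' || !Number.isInteger(id)) {
    return null;
  }
  return {
    ...row,
    id, // numeric id replaces the string value from the CSV
    full_name: `${row.first_name} ${row.last_name}`,
  };
}
```

Under those assumptions, this function could be passed as `transform: csvRowToDoc` alongside `sourceFormat: 'csv'`.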
@@ -243,9 +286,11 @@ All options are passed to the main `transformer()` function.
 Choose **one** of these sources:
-- **`fileName`** (string): Source filename to ingest. Supports wildcards (e.g., `logs/*.json`).
+- **`fileName`** (string): Source filename to ingest. Supports wildcards (e.g., `logs/*.json` or `data/*.csv`).
 - **`sourceIndexName`** (string): Source Elasticsearch index to reindex from.
 - **`stream`** (Readable): Node.js readable stream to ingest from.
+- **`sourceFormat`** (`'ndjson' | 'csv'`): Format for file/stream sources. Default: `'ndjson'`.
+- **`csvOptions`** (object): CSV parser options (delimiter, quote, columns, etc.) used when `sourceFormat: 'csv'`.
 #### Client Configuration
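The source options in this hunk follow two documented rules. Exactly one of `fileName`, `sourceIndexName`, or `stream` must be chosen. For file and stream sources, `sourceFormat` is `'ndjson'` (the default) or `'csv'`. The hypothetical helper below encodes those rules so they can be checked before calling `transformer()`. It is a sketch, not the library's own validation, and its error messages are invented.

```javascript
// Hypothetical helper (not part of node-es-transformer) encoding the
// README's source rules: exactly one source, and for file/stream sources
// a sourceFormat of 'ndjson' (default) or 'csv'.
const SOURCE_KEYS = ['fileName', 'sourceIndexName', 'stream'];
const FORMATS = ['ndjson', 'csv'];

function normalizeSourceOptions(options) {
  const given = SOURCE_KEYS.filter((key) => options[key] !== undefined);
  if (given.length !== 1) {
    throw new Error(
      `Expected exactly one of ${SOURCE_KEYS.join(', ')}, got ${given.length}`,
    );
  }
  const source = given[0];
  // sourceFormat is documented only for file and stream sources.
  if (source === 'sourceIndexName') {
    return { source };
  }
  const sourceFormat = options.sourceFormat ?? 'ndjson';
  if (!FORMATS.includes(sourceFormat)) {
    throw new Error(`Unsupported sourceFormat: ${sourceFormat}`);
  }
  return {
    source,
    sourceFormat,
    // csvOptions are only meaningful when parsing CSV.
    csvOptions: sourceFormat === 'csv' ? (options.csvOptions ?? {}) : undefined,
  };
}
```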
@@ -260,10 +305,14 @@ Choose **one** of these sources:
 - **`mappings`** (object): Elasticsearch document mappings for target index. If reindexing and not provided, mappings are copied from source index.
 - **`mappingsOverride`** (boolean): When reindexing, apply `mappings` on top of source index mappings. Default: `false`.
+- **`inferMappings`** (boolean): Infer mappings for `fileName` sources via `/_text_structure/find_structure`. Ignored when `mappings` is provided. If inference returns `ingest_pipeline`, it is created as `<targetIndexName>-inferred-pipeline` and applied as the index default pipeline (unless `pipeline` is explicitly set). Default: `false`.
+- **`inferMappingsOptions`** (object): Options for `/_text_structure/find_structure` (for example `sampleBytes`, `lines_to_sample`, `delimiter`, `quote`, `has_header_row`, `timeout`).
 - **`deleteIndex`** (boolean): Delete target index if it exists before starting. Default: `false`.
 - **`indexMappingTotalFieldsLimit`** (number): Field limit for target index (`index.mapping.total_fields.limit` setting).
 - **`pipeline`** (string): Elasticsearch ingest pipeline name to use during indexing.
+When `inferMappings` is enabled, the target cluster must allow `/_text_structure/find_structure` (cluster privilege: `monitor_text_structure`). If inferred ingest pipelines are used, the target cluster must also allow creating ingest pipelines (`_ingest/pipeline`).
 #### Performance Options
 - **`bufferSize`** (number): Buffer size threshold in KBytes for bulk indexing. Default: `5120` (5 MB).
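The `inferMappings` entry combines several precedence rules. Inference applies only to `fileName` sources and is skipped when `mappings` is given. An inferred `ingest_pipeline` becomes the index default pipeline, named `<targetIndexName>-inferred-pipeline`, unless `pipeline` is set. A minimal sketch of that decision logic follows. `planInferredSetup` and its return shape are hypothetical. `structure` stands in for a `find_structure` response, and the sketch reads only its `mappings` and `ingest_pipeline` fields.

```javascript
// Hypothetical sketch of the precedence rules described for inferMappings;
// not the library's implementation. `structure` stands in for a
// /_text_structure/find_structure response.
function planInferredSetup(options, structure) {
  // Inference only applies to fileName sources and is ignored when
  // explicit mappings are provided.
  if (!options.inferMappings || !options.fileName || options.mappings) {
    return { useInferredMappings: false };
  }
  const plan = { useInferredMappings: true, mappings: structure.mappings };
  // An inferred ingest pipeline becomes the index default pipeline only
  // when no explicit `pipeline` option was given.
  if (structure.ingest_pipeline && !options.pipeline) {
    plan.defaultPipeline = `${options.targetIndexName}-inferred-pipeline`;
  }
  return plan;
}
```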
@@ -277,8 +326,12 @@ Choose **one** of these sources:
   - Return array of documents to split one source into multiple targets
   - Return `null`/`undefined` to skip document
 - **`query`** (object): Elasticsearch [DSL query](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html) to filter source documents.
-- **`splitRegex`** (RegExp): Line split regex for file/stream sources. Default: `/\n/`.
-- **`skipHeader`** (boolean): Skip first line of source file (e.g., CSV header). Default: `false`.
+- **`splitRegex`** (RegExp): Line split regex for file/stream sources when `sourceFormat` is `'ndjson'`. Default: `/\n/`.
+- **`skipHeader`** (boolean): Header skipping for file/stream sources.
+  - NDJSON: skips the first non-empty line
+  - CSV: skips the first data line only when `csvOptions.columns` does not consume headers
+  - Default: `false`
+  - Applies only to `fileName`/`stream` sources
 - **`verbose`** (boolean): Enable logging and progress bars. Default: `true`.
 ### Return Value
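For NDJSON sources, `splitRegex` and `skipHeader` together determine which lines become documents. The sketch below shows that behavior on an in-memory string. `ndjsonDocs` is a hypothetical name. Ignoring blank lines and trimming surrounding whitespace (such as a trailing `\r`) are assumptions, not documented behavior. The sketch does not model the CSV rule, where header skipping depends on `csvOptions.columns`.

```javascript
// Hypothetical sketch (not the library's code) of NDJSON line handling:
// split on splitRegex (default /\n/), drop blank lines (assumption), and
// with skipHeader drop the first non-empty line, as the README describes.
function ndjsonDocs(text, { splitRegex = /\n/, skipHeader = false } = {}) {
  const lines = text
    .split(splitRegex)
    .map((line) => line.trim()) // also removes a trailing \r from CRLF input
    .filter((line) => line.length > 0);
  const dataLines = skipHeader ? lines.slice(1) : lines;
  return dataLines.map((line) => JSON.parse(line));
}
```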