npm - @forzalabs/remora - Versions diffs - 1.1.15 → 1.2.2 - Mend

@forzalabs/remora 1.1.15 → 1.2.2

Files changed (5)

package/CHANGELOG.md +113 -0
package/index.js +381 -124
package/json_schemas/consumer-schema.json +107 -21
package/package.json +1 -1
package/workers/ExecutorWorker.js +371 -114

package/CHANGELOG.md ADDED

@@ -0,0 +1,113 @@
+# Changelog
+All notable changes to this project will be documented in this file.
+The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
+## Unreleased
+## V 1.2.2 - 2026-04-10
+### Added
+- Added field-level consumer validations with support for multiple rules per field and per-rule failure actions: `fail`, `skip`, `warn`, and `set_default`
+- Added dataset-level consumer validations for `unique_fields`, `min_rows`, `max_rows`, `no_duplicates`, and `not_empty`
+- Added `DataValidationEngine` to centralize field and dataset validation logic
+- Added validation result type definitions to the definitions package for shared use across engines and executors
+- Added `warn()` logging support for non-fatal validation outcomes
+- Added canary consumer coverage for field-level and dataset-level validations with passing, warning, skipped, defaulted, and failing scenarios
+- Added `verify:local` to the canary package to build the local CLI and run the canary suite against it instead of the published package
+- Added canary coverage for `sample` and `discover` against gzipped local producer inputs
+- Added compressed CSV and JSONL canary fixtures plus matching producer definitions for gzip-based CLI verification
+### Changed
+- Updated consumer field validation configuration from a single flat validation object to an ordered array of validation rules with explicit `onFail` behavior
+### Fixed
+- Fixed the consumer JSON schema to support the new field-level and dataset-level validation configuration
+- Fixed AJV strict-mode compatibility for validation `in` and `not_in` rule arrays by replacing union `type` declarations with `oneOf`
+- Fixed CLI `sample` and `discover` commands so they can read producers configured with compressed local files by reusing the existing decompression logic during sampling
+- Fixed conflict in export with unique values and split by max size
+## V 1.1.15 - 2026-03-26
+### Added
+- Added validation that consumer field keys exist in the referenced producer's dimensions/measures
+- Added validation that every consumer field defines the required `key` property
+- Added validation that `copyFrom` references a field that appears earlier in the consumer's field list
+- Added validation that `distinctOn` keys and `orderBy` reference fields present in the consumer
+- Added validation that join SQL `${P.field}` and `${producer.field}` references point to valid fields
+### Fixed
+- Fixed environment variable not exposed to front-end
+- Fixed database endpoint selector
+- Fixed worker image volume usage in a cloud environment
+- Fixed worker-thread execution errors being logged only to the terminal by propagating them back to the orchestrator file logger
+- Fixed `ConsumerExecutor.processRecord` error reporting to log step-specific failures for field resolution, aliasing, transformations, and filter evaluation
+## V 1.1.11 - 2026-02-05
+### Added
+- Added `startRow` and `startColumn` settings for Excel producers (.xls/.xlsx), allowing users to specify the 1-indexed row and column from which to begin reading data
+### Fixed
+- Fixed `MaxListenersExceededWarning` on `WriteStream` during consumer execution by replacing shared stream merge with per-file append pipelines
+- Fixed CLI `run` command always exiting with code 1 even on successful runs
+- Fixed incomplete file logging caused by `process.exit()` terminating before winston could flush buffered writes; added `logger.flush()` to worker threads, orchestrator, and CLI exit paths
+- Added `logger.flush()` before `process.exit()` in all data-processing CLI actions (sample, mock, automap, discover, debug) and worker startup
+- Fixed CLI `discover` command exiting with code 1 on success instead of code 0
+- Fixed per-worker `WriteStream` in `Executor.ts` never being closed, risking data loss before distinct/distinctOn post-processing passes
+- Fixed `Dataset.ts` stream await pattern where `resolve` was never called in the `finish` handler (5 sites: transformStream, sort batches, k-way merge, append), causing promises to hang indefinitely
+- Fixed `ExecutorWriter.ts` not awaiting intermediate stream flush during file-size-based rotation
+- Fixed `DriverHelper.appendObjectsToUnifiedFile` and `LocalDestinationDriver.transformAndMove` not awaiting stream flush before returning
+## V 1.1.9 - 2026-02-04
+### Added
+- Added `switch_case` transformation for mapping specific values to other values (similar to a switch/case statement)
+- Added validation to detect multiple consumer fields reading from the same producer dimension (suggests using `copyFrom` instead)
+- Added detailed logging to the executor orchestrator with usage ID tracing throughout the execution lifecycle
+### Changed
+- Cleaned up CLI execution error output to show concise messages in console while preserving full stack traces in internal logs
+## V 1.1.8 - 2026-02-03
+### Added
+- Added `pivot` option to consumers, enabling row-to-column transformation with aggregation (sum, count, avg, min, max)
+- Added `copyFrom` property to consumer fields, allowing a field to be a value copy of another field in the dataset
+## V 1.1.7 - 2026-02-02
+### Changed
+- Improved the mock engine
+- Improved logging
+## V 1.1.6 - 2026-02-02
+### Added
+- Added `--limit` option to `remora run` command to process only the first N records
+- Added descriptive error messages for failed field transformations with full stack trace preservation
+- Added file logging with rotation (enabled via `REMORA_DEBUG_MODE=true` in production)
+- Added structured logging across key application areas
+### Changed
+- Moved `DEBUG_MODE` from project.json settings to `REMORA_DEBUG_MODE` environment variable
+## V 1.1.5 - 2026-02-01
+### Added
+- Refactored for monorepository
+- Added output maximum file size definable from consumer
+- Added support for nested subfolders inside remora configuration directories (sources, producers, consumers, schemas)
+### Fixed
+- Bug in parsing via GZ file
+- Issues with concurrent requests
+### Changed
+- Dockerfile for apps in a monorepo build
+- Package.json to workspaces compliance
+- Refactored internal module structure
+- Removed the _file annotations for environment variables
+## V 1.0.18