data_porter 0.9.0 → 1.0.2
- checksums.yaml +4 -4
- data/CHANGELOG.md +48 -0
- data/README.md +5 -1
- data/app/assets/javascripts/data_porter/import_form_controller.js +1 -0
- data/app/assets/javascripts/data_porter/template_form_controller.js +31 -8
- data/app/assets/stylesheets/data_porter/alerts.css +2 -1
- data/app/assets/stylesheets/data_porter/layout.css +2 -2
- data/app/controllers/data_porter/concerns/import_validation.rb +29 -0
- data/app/controllers/data_porter/concerns/mapping_management.rb +13 -4
- data/app/controllers/data_porter/imports_controller.rb +28 -4
- data/app/views/data_porter/imports/show.html.erb +4 -0
- data/config/routes.rb +1 -0
- data/lib/data_porter/components/preview/results_summary.rb +6 -1
- data/lib/data_porter/configuration.rb +7 -1
- data/lib/data_porter/orchestrator/importer.rb +27 -0
- data/lib/data_porter/orchestrator/record_builder.rb +9 -0
- data/lib/data_porter/registry.rb +7 -1
- data/lib/data_porter/rejects_csv_builder.rb +35 -0
- data/lib/data_porter/sources/base.rb +6 -0
- data/lib/data_porter/sources/csv.rb +32 -5
- data/lib/data_porter/sources/xlsx.rb +2 -1
- data/lib/data_porter/version.rb +1 -1
- data/lib/data_porter.rb +1 -0
- data/lib/generators/data_porter/install/templates/create_data_porter_imports.rb.erb +1 -1
- data/lib/generators/data_porter/install/templates/initializer.rb +3 -5
- metadata +5 -11
- data/docs/CONFIGURATION.md +0 -103
- data/docs/MAPPING.md +0 -44
- data/docs/ROADMAP.md +0 -28
- data/docs/SOURCES.md +0 -94
- data/docs/TARGETS.md +0 -227
- data/docs/screenshots/index-with-previewing.jpg +0 -0
- data/docs/screenshots/index.jpg +0 -0
- data/docs/screenshots/mapping.jpg +0 -0
- data/docs/screenshots/modal-new-import.jpg +0 -0
- data/docs/screenshots/preview.jpg +0 -0
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7ca6bfabfc9f831d71c60a1942516a5dccf95c85e3787f16a1217188c9feb3a0
+  data.tar.gz: f703da9261612953fcacad2674e38bef3037b804191e7fb3087577846b096461
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a7d8ad32cb5d80e027d9adfe2a089a84703c6e8e5b00901c0d057a4b2bb24cb2ffbe0f6edb53f61021219fd03384830517192657fbe4c693dd74b6977279b22a
+  data.tar.gz: 6d96ecefa39d191cea801ff8e4075f5c9bb4e13979f7754041db33a6685701d98f4521230500a1d0a47a417ce141f1b08973b5cbfd65868b0c3920c99afb61f6
data/CHANGELOG.md
CHANGED
@@ -5,6 +5,54 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [1.0.2] - 2026-02-07
+
+### Changed
+
+- Exclude `docs/` from gem package (194 KB → 80 KB)
+
+## [1.0.1] - 2026-02-07
+
+### Added
+
+- **CSV delimiter auto-detection** -- Automatically detect `,` `;` `\t` separators via frequency analysis on the first line; explicit `col_sep` config still takes precedence
+- **CSV encoding auto-detection** -- Detect and transcode Latin-1 / ISO-8859-1 content to UTF-8; strip UTF-8 BOM when present
+
+### Fixed
+
+- **`param.collection` accepts arrays** -- `Registry.serialize_param` now duck-types with `respond_to?(:call)` so both lambdas and plain arrays work
+- **`dp-input` styling** -- Text inputs now share the same CSS rules as `dp-select` and `dp-file-input`
+- **Migration template nullable user** -- Removed `null: false` from polymorphic `user` reference so the engine works without authentication
+- **Skipped records visible in results** -- Added "Skipped" stat card for missing + partial records; title reflects errors; export rejects button includes all rejected rows
+- **Hidden param label removed** -- `type: :hidden` params no longer render a label or wrapper div
+
+### Changed
+
+- 402 RSpec examples (up from 391), 0 failures
+
+## [1.0.0] - 2026-02-07
+
+### Added
+
+- **Max records guard** -- `config.max_records` (default: 10,000) rejects files exceeding the limit before parsing
+- **Transaction mode** -- `config.transaction_mode` (`:per_record` or `:all`); `:all` wraps entire import in a single transaction that rolls back on any failure
+- **Fallback headers** -- Auto-generate `col_1, col_2...` when CSV/XLSX header row is empty
+- **Reject rows CSV export** -- Download CSV of failed/errored records with original data + error messages after import; available when `errored_count > 0`
+- **E2E specs** -- 6 end-to-end integration tests covering all source types (CSV, XLSX, JSON, API), import params, and reject rows export
+
+### Fixed
+
+- **Import params whitelist** -- `merge_import_params` now permits only param names declared in the Target DSL instead of using `permit!`
+- **Column mapping whitelist** -- `permitted_column_mapping` filters mapping values to valid target column names; invalid values replaced with `""`
+- **File size validation** -- Uploads exceeding `config.max_file_size` (default: 10 MB) are rejected before save
+- **MIME type validation** -- Uploaded files must match allowed content types per source (CSV: `text/csv`, `text/plain`; JSON: `application/json`, `text/plain`; XLSX: OpenXML spreadsheet)
+- **XSS in template form** -- Replaced `innerHTML` with safe DOM methods in `template_form_controller.js`
+
+### Changed
+
+- Validation chain refactored to `all_validations_pass?` using `.all?` to collect all errors at once instead of short-circuiting
+- 391 RSpec examples (up from 354), 0 failures
+
 ## [0.9.0] - 2026-02-07
 
 ### Added
data/README.md
CHANGED
@@ -30,6 +30,9 @@ Supports CSV, JSON, XLSX, and API sources with a declarative DSL for defining im
 - **Import params** -- Declare extra form fields (select, text, number, hidden) per target for scoped imports ([docs](docs/TARGETS.md#params--))
 - **Per-target source filtering** -- Each target declares its allowed sources, the UI filters accordingly
 - **Import deletion & auto-purge** -- Delete imports from the UI, or schedule `rake data_porter:purge` for automatic cleanup
+- **Reject rows export** -- Download a CSV of failed/errored records with error messages after import
+- **Security validations** -- File size limit, MIME type check, strong parameter whitelisting
+- **Safety guards** -- Max records limit (`config.max_records`), configurable transaction mode (`:per_record` or `:all`)
 - **Declarative Target DSL** -- One class per import type, zero boilerplate ([docs](docs/TARGETS.md))
 
 ## Requirements
@@ -129,6 +132,7 @@ pending -> parsing -> previewing -> importing -> completed
 | POST | `/imports/:id/confirm` | Run import |
 | POST | `/imports/:id/cancel` | Cancel import |
 | POST | `/imports/:id/dry_run` | Dry run validation |
+| GET | `/imports/:id/export_rejects` | Download rejects CSV |
 | | `/mapping_templates` | Full CRUD for templates |
 
 ## Development
@@ -137,7 +141,7 @@ pending -> parsing -> previewing -> importing -> completed
 git clone https://github.com/SerylLns/data_porter.git
 cd data_porter
 bin/setup
-bundle exec rspec #
+bundle exec rspec # 391 specs
 bundle exec rubocop # 0 offenses
 ```
 
data/app/assets/javascripts/data_porter/template_form_controller.js
CHANGED
@@ -18,7 +18,8 @@ export default class extends Controller {
     const pair = document.createElement("div")
     pair.className = "dp-mapping-pair"
     pair.style.cssText = "display: flex; gap: 0.5rem; margin-bottom: 0.5rem;"
-    pair.
+    pair.appendChild(this.buildKeyInput())
+    pair.appendChild(this.buildValueSelect(columns))
     container.appendChild(pair)
   }
 
@@ -34,13 +35,35 @@ export default class extends Controller {
     })
   }
 
-
-  const
-
-
+  buildKeyInput() {
+    const input = document.createElement("input")
+    input.type = "text"
+    input.name = "mapping_template[mapping_keys][]"
+    input.placeholder = "File header"
+    input.className = "dp-select"
+    input.style.flex = "1"
+    return input
+  }
+
+  buildValueSelect(columns) {
+    const select = document.createElement("select")
+    select.name = "mapping_template[mapping_values][]"
+    select.className = "dp-select"
+    select.style.flex = "1"
+    select.dataset.dataPorterTemplateFormTarget = "fieldSelect"
+
+    const blank = document.createElement("option")
+    blank.value = ""
+    blank.textContent = "Select a field..."
+    select.appendChild(blank)
+
+    columns.forEach(([label, name]) => {
+      const opt = document.createElement("option")
+      opt.value = name
+      opt.textContent = label
+      select.appendChild(opt)
+    })
 
-    return
-      `<select name="mapping_template[mapping_values][]" class="dp-select" style="flex: 1;" data-data-porter--template-form-target="fieldSelect">` +
-      `<option value="">Select a field...</option>${options}</select>`
+    return select
   }
 }
data/app/assets/stylesheets/data_porter/alerts.css
CHANGED
@@ -32,7 +32,7 @@
   display: grid;
   grid-template-columns: repeat(auto-fit, minmax(120px, 1fr));
   gap: 1rem;
-  max-width:
+  max-width: 500px;
   margin: 0 auto;
 }
 
@@ -60,6 +60,7 @@
 
 .dp-results__stat--success strong { color: var(--dp-success); }
 .dp-results__stat--error strong { color: var(--dp-danger); }
+.dp-results__stat--warning strong { color: var(--dp-warning); }
 
 .dp-results__duration {
   margin-top: 1rem;
data/app/assets/stylesheets/data_porter/layout.css
CHANGED
@@ -29,7 +29,7 @@
   color: var(--dp-gray-700);
 }
 
-.dp-select, .dp-file-input {
+.dp-select, .dp-input, .dp-file-input {
   display: block;
   width: 100%;
   padding: 0.625rem 0.875rem;
@@ -50,7 +50,7 @@
   padding-right: 2.5rem;
 }
 
-.dp-select:focus, .dp-file-input:focus {
+.dp-select:focus, .dp-input:focus, .dp-file-input:focus {
   outline: none;
   border-color: var(--dp-primary);
   box-shadow: 0 0 0 3px rgba(79, 70, 229, 0.15);
data/app/controllers/data_porter/concerns/import_validation.rb
CHANGED
@@ -5,6 +5,12 @@ module DataPorter
   module ImportValidation
     extend ActiveSupport::Concern
 
+    ALLOWED_CONTENT_TYPES = {
+      "csv" => %w[text/csv text/plain],
+      "json" => %w[application/json text/plain],
+      "xlsx" => %w[application/vnd.openxmlformats-officedocument.spreadsheetml.sheet]
+    }.freeze
+
     private
 
     def valid_source_for_target?
@@ -42,6 +48,29 @@ module DataPorter
     def import_param_values
       (@import.config || {}).fetch("import_params", {})
     end
+
+    def valid_file_size?
+      return true unless @import.file.attached?
+
+      max = DataPorter.configuration.max_file_size
+      return true if @import.file.blob.byte_size <= max
+
+      @import.errors.add(:file, "is too large (max #{max / 1.megabyte} MB)")
+      false
+    end
+
+    def valid_file_content_type?
+      return true unless @import.file.attached?
+
+      allowed = ALLOWED_CONTENT_TYPES[@import.source_type]
+      return true unless allowed
+
+      content_type = @import.file.blob.content_type
+      return true if allowed.include?(content_type)
+
+      @import.errors.add(:file, "has an invalid content type (#{content_type})")
+      false
+    end
   end
 end
 end
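The MIME validation in the concern reduces to a whitelist lookup keyed by source type. A standalone sketch of that lookup (`content_type_error` is a name invented here; the real concern records errors on the import instead of returning strings):

```ruby
# Per-source whitelist of acceptable MIME types, mirroring the concern's
# ALLOWED_CONTENT_TYPES constant.
ALLOWED_CONTENT_TYPES = {
  "csv"  => %w[text/csv text/plain],
  "json" => %w[application/json text/plain],
  "xlsx" => %w[application/vnd.openxmlformats-officedocument.spreadsheetml.sheet]
}.freeze

# Returns nil when the type is acceptable, or an error message otherwise.
def content_type_error(source_type, content_type)
  allowed = ALLOWED_CONTENT_TYPES[source_type]
  return nil unless allowed                      # sources without a whitelist are not restricted
  return nil if allowed.include?(content_type)

  "has an invalid content type (#{content_type})"
end
```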
data/app/controllers/data_porter/concerns/mapping_management.rb
CHANGED
@@ -23,8 +23,7 @@ module DataPorter
     end
 
     def save_column_mapping
-
-      merged = (@import.config || {}).merge("column_mapping" => mapping)
+      merged = (@import.config || {}).merge("column_mapping" => permitted_column_mapping)
       @import.update!(config: merged, status: :pending)
     end
 
@@ -32,11 +31,21 @@ module DataPorter
       return unless params[:save_template] == "1"
       return unless defined?(DataPorter::MappingTemplate)
 
-      mapping = params.require(:column_mapping).permit!.to_h
       DataPorter::MappingTemplate.find_or_initialize_by(
         target_key: @import.target_key,
         name: params[:template_name].presence || "Default"
-      ).update!(mapping:
+      ).update!(mapping: permitted_column_mapping)
+    end
+
+    def permitted_column_mapping
+      raw = params.require(:column_mapping).permit!.to_h
+      valid_names = valid_column_names
+      raw.transform_values { |v| valid_names.include?(v) ? v : "" }
+    end
+
+    def valid_column_names
+      columns = @import.target_class._columns || []
+      columns.to_set { |c| c.name.to_s }
    end
  end
 end
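The column-mapping whitelist is a pure transform: any submitted value that is not a declared column name collapses to `""`. A standalone sketch of that idea (`permitted_mapping` is a name invented here):

```ruby
require "set"

# Replace any mapping value that is not a declared column name with "",
# so arbitrary user-submitted field names never reach the importer.
def permitted_mapping(raw, valid_names)
  names = valid_names.to_set(&:to_s)
  raw.transform_values { |v| names.include?(v) ? v : "" }
end
```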
data/app/controllers/data_porter/imports_controller.rb
CHANGED
@@ -8,7 +8,7 @@ module DataPorter
 
     layout "data_porter/application"
 
-    before_action :set_import, only: %i[show parse confirm cancel dry_run update_mapping status destroy]
+    before_action :set_import, only: %i[show parse confirm cancel dry_run update_mapping status export_rejects destroy]
     before_action :load_targets, only: %i[index new create]
 
     def index
@@ -22,7 +22,7 @@ module DataPorter
     def create
       build_import
 
-      if
+      if all_validations_pass? && @import.save
         enqueue_after_create
         redirect_to import_path(@import)
       else
@@ -73,6 +73,12 @@ module DataPorter
       render json: { status: @import.status, progress: progress }
     end
 
+    def export_rejects
+      columns = @import.target_class._columns || []
+      csv = RejectsCsvBuilder.new(columns, @import.records).generate
+      send_data csv, filename: "rejects_import_#{@import.id}.csv", type: "text/csv"
+    end
+
     def destroy
       @import.file.purge if @import.file.attached?
       @import.destroy!
@@ -95,6 +101,16 @@ module DataPorter
       @import.status = :pending
     end
 
+    def all_validations_pass?
+      [
+        valid_source_for_target?,
+        valid_file_presence?,
+        valid_file_size?,
+        valid_file_content_type?,
+        valid_import_params?
+      ].all?
+    end
+
     def import_params
       permitted = params.require(:data_import).permit(:target_key, :source_type, :file, config: {})
       merge_import_params(permitted)
@@ -104,11 +120,19 @@ module DataPorter
       nested = params.dig(:data_import, :config, :import_params)
       return permitted unless nested
 
-      config = permitted[:config] || {}
-      config["import_params"] = nested.permit
+      config = permitted[:config]&.to_unsafe_h || {}
+      config["import_params"] = nested.permit(*allowed_param_keys).to_h
       permitted.merge(config: config)
     end
 
+    def allowed_param_keys
+      target_key = params.dig(:data_import, :target_key)
+      return [] unless target_key
+
+      target = DataPorter::Registry.find(target_key)
+      (target._params || []).map { |p| p.name.to_s }
+    end
+
     def enqueue_after_create
       if @import.file_based?
         DataPorter::ExtractHeadersJob.perform_later(@import.id)
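The `all_validations_pass?` refactor relies on building the array first, which forces every check to run before `.all?` summarizes them, so every error is recorded rather than stopping at the first failure. A minimal illustration with stand-in checks:

```ruby
# Eager validation: map(&:call) runs every check (letting each record its
# own error) before .all? computes the verdict; a short-circuiting && chain
# would stop at the first failure and hide the later errors.
def all_validations_pass?(validations)
  validations.map(&:call).all?
end

errors = []
checks = [
  -> { errors << "file too large"; false },
  -> { errors << "bad content type"; false },
  -> { true }
]
result = all_validations_pass?(checks)
```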
data/app/views/data_porter/imports/show.html.erb
CHANGED
@@ -86,6 +86,10 @@
 <% end %>
 <div class="dp-actions">
   <%= link_to "Back to imports", imports_path, class: "dp-btn dp-btn--primary" %>
+  <% rejected = @import.report.errored_count.to_i + @import.report.missing_count.to_i + @import.report.partial_count.to_i %>
+  <% if rejected.positive? %>
+    <%= link_to "Download rejects CSV", export_rejects_import_path(@import), class: "dp-btn dp-btn--secondary" %>
+  <% end %>
   <%= button_to "Delete", import_path(@import),
       method: :delete, class: "dp-btn dp-btn--danger",
       data: { turbo_confirm: "Delete this import?" } %>
data/config/routes.rb
CHANGED
|
data/lib/data_porter/components/preview/results_summary.rb
CHANGED
@@ -33,6 +33,7 @@ module DataPorter
       div(class: "dp-results__cards") do
         stat("dp-results__stat--success", @report.imported_count, "Imported")
         stat("dp-results__stat--error", @report.errored_count, "Errors")
+        stat("dp-results__stat--warning", skipped_count, "Skipped") if skipped_count.positive?
       end
     end
 
@@ -52,7 +53,11 @@ module DataPorter
     end
 
     def success?
-      @report.errored_count.zero?
+      @report.errored_count.zero? && skipped_count.zero?
+    end
+
+    def skipped_count
+      @report.missing_count.to_i + @report.partial_count.to_i
     end
   end
 end
data/lib/data_porter/configuration.rb
CHANGED
@@ -10,7 +10,10 @@ module DataPorter
                   :preview_limit,
                   :enabled_sources,
                   :scope,
-                  :purge_after
+                  :purge_after,
+                  :max_file_size,
+                  :max_records,
+                  :transaction_mode
 
     def initialize
       @parent_controller = "ApplicationController"
@@ -22,6 +25,9 @@ module DataPorter
       @enabled_sources = %i[csv json api xlsx]
       @scope = nil
       @purge_after = 60.days
+      @max_file_size = 10.megabytes
+      @max_records = 10_000
+      @transaction_mode = :per_record
     end
   end
 end
data/lib/data_porter/orchestrator/importer.rb
CHANGED
@@ -6,6 +6,14 @@ module DataPorter
     private
 
     def import_records
+      if DataPorter.configuration.transaction_mode == :all
+        import_all_or_nothing
+      else
+        import_per_record
+      end
+    end
+
+    def import_per_record
       importable = @data_import.importable_records
       context = build_context
       results = { created: 0, errored: 0 }
@@ -16,6 +24,25 @@ module DataPorter
         broadcast_progress(index + 1, total)
       end
 
+      finalize_import(results)
+    end
+
+    def import_all_or_nothing
+      importable = @data_import.importable_records
+      context = build_context
+      total = importable.size
+
+      ActiveRecord::Base.transaction do
+        importable.each_with_index do |record, index|
+          @target.persist(record, context: context)
+          broadcast_progress(index + 1, total)
+        end
+      end
+
+      finalize_import(created: total, errored: 0)
+    end
+
+    def finalize_import(results)
       @data_import.update!(status: :completed)
       @broadcaster.success
       results
data/lib/data_porter/orchestrator/record_builder.rb
CHANGED
@@ -8,6 +8,7 @@ module DataPorter
     def build_records
       source = build_source
       raw_rows = source.fetch
+      enforce_max_records!(raw_rows.size)
       columns = @target.class._columns || []
       validator = RecordValidator.new(columns)
 
@@ -16,6 +17,14 @@ module DataPorter
       end
     end
 
+    def enforce_max_records!(count)
+      max = DataPorter.configuration.max_records
+      return unless max
+      return if count <= max
+
+      raise Error, "File contains #{count} records, exceeds maximum of #{max}"
+    end
+
     def build_record(row, index, columns, validator)
       record = StoreModels::ImportRecord.new(
         line_number: index + 1,
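The max-records guard fails fast on the row count before any records are built. A standalone sketch (using `ArgumentError` in place of the engine's own `Error` class):

```ruby
# Fail fast before building records; the 10_000 default comes from the
# changelog. A nil max disables the guard.
def enforce_max_records!(count, max: 10_000)
  return unless max
  return if count <= max

  raise ArgumentError, "File contains #{count} records, exceeds maximum of #{max}"
end
```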
data/lib/data_porter/registry.rb
CHANGED
@@ -37,6 +37,12 @@ module DataPorter
 
     private
 
+    def resolve_collection(collection)
+      return unless collection
+
+      collection.respond_to?(:call) ? collection.call : collection
+    end
+
     def serialize_params(params)
       return [] unless params
 
@@ -50,7 +56,7 @@ module DataPorter
         required: param.required,
         label: param.label,
         default: param.default,
-        collection: param.collection
+        collection: resolve_collection(param.collection)
       }.compact
     end
   end
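`resolve_collection` duck-types on `call`, which is what makes both lambdas and plain arrays valid values for `param.collection`. Extracted as a standalone function:

```ruby
# Duck-typed resolution: anything callable (lambda, proc, method object)
# is invoked at serialization time; plain arrays pass through unchanged.
def resolve_collection(collection)
  return unless collection

  collection.respond_to?(:call) ? collection.call : collection
end
```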
data/lib/data_porter/rejects_csv_builder.rb
ADDED
@@ -0,0 +1,35 @@
+# frozen_string_literal: true
+
+require "csv"
+
+module DataPorter
+  class RejectsCsvBuilder
+    def initialize(columns, records)
+      @columns = columns
+      @records = records
+    end
+
+    def generate
+      CSV.generate do |csv|
+        csv << header_row
+        rejected_records.each { |r| csv << record_row(r) }
+      end
+    end
+
+    private
+
+    def header_row
+      ["line"] + @columns.map { |c| c.name.to_s } + ["errors"]
+    end
+
+    def rejected_records
+      @records.reject(&:complete?)
+    end
+
+    def record_row(record)
+      values = @columns.map { |c| record.data[c.name.to_s] }
+      errors = record.errors_list.map(&:message).join("; ")
+      [record.line_number] + values + [errors]
+    end
+  end
+end
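The rejects CSV layout (line number, one column per target field, then joined error messages) can be exercised with toy records; `RejectedRow` and `rejects_csv` are stand-ins invented here for the engine's record objects and builder:

```ruby
require "csv"

# Toy stand-in for the engine's import records: complete? marks importable
# rows, everything else lands in the rejects CSV with its error messages.
RejectedRow = Struct.new(:line_number, :data, :errors, keyword_init: true) do
  def complete?
    errors.empty?
  end
end

def rejects_csv(column_names, records)
  CSV.generate do |csv|
    csv << ["line"] + column_names + ["errors"]
    records.reject(&:complete?).each do |r|
      csv << [r.line_number] + column_names.map { |c| r.data[c] } + [r.errors.join("; ")]
    end
  end
end
```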
data/lib/data_porter/sources/base.rb
CHANGED
@@ -45,6 +45,12 @@ module DataPorter
       def auto_map(row)
         row.to_h.transform_keys { |k| k.parameterize(separator: "_").to_sym }
       end
+
+      def fallback_headers(raw_headers)
+        return raw_headers if raw_headers.any?(&:present?)
+
+        raw_headers.each_with_index.map { |_, i| "col_#{i + 1}" }
+      end
     end
   end
 end
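`fallback_headers` only kicks in when every header cell is blank. A plain-Ruby sketch (`present?` is ActiveSupport, approximated here with a strip/empty? check):

```ruby
# If no header cell has content, generate positional col_1..col_n names;
# otherwise return the headers untouched.
def fallback_headers(raw_headers)
  return raw_headers if raw_headers.any? { |h| !h.to_s.strip.empty? }

  raw_headers.each_index.map { |i| "col_#{i + 1}" }
end
```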
data/lib/data_porter/sources/csv.rb
CHANGED
@@ -5,6 +5,8 @@ require "csv"
 module DataPorter
   module Sources
     class Csv < Base
+      SEPARATORS = [",", ";", "\t"].freeze
+
       def initialize(data_import, content: nil)
         super(data_import)
         @content = content
@@ -12,7 +14,8 @@ module DataPorter
 
       def headers
         first_line = csv_content.lines.first
-        ::CSV.parse_line(first_line, **extra_options).map(&:to_s)
+        raw = ::CSV.parse_line(first_line, **extra_options).map(&:to_s)
+        fallback_headers(raw)
       end
 
       def fetch
@@ -26,11 +29,28 @@ module DataPorter
       private
 
       def csv_content
-        @content || download_file
+        @csv_content ||= ensure_utf8(@content || download_file)
       end
 
       def download_file
-        @data_import.file.download
+        @data_import.file.download
+      end
+
+      def ensure_utf8(raw)
+        raw = strip_bom(raw)
+        return raw if raw.encoding == Encoding::UTF_8 && raw.valid_encoding?
+
+        raw.force_encoding("UTF-8")
+        return raw if raw.valid_encoding?
+
+        raw.encode("UTF-8", "ISO-8859-1")
+      end
+
+      def strip_bom(raw)
+        bytes = raw.b
+        return raw unless bytes.start_with?("\xEF\xBB\xBF".b)
+
+        bytes[3..].force_encoding("UTF-8")
       end
 
       def csv_options
@@ -39,9 +59,16 @@ module DataPorter
 
       def extra_options
         config = @data_import.config
-        return {} unless config.is_a?(Hash)
+        return { col_sep: detect_separator } unless config.is_a?(Hash)
+
+        opts = config.symbolize_keys.slice(:col_sep, :encoding)
+        opts[:col_sep] ||= detect_separator
+        opts
+      end
 
-
+      def detect_separator
+        first_line = csv_content.lines.first.to_s
+        SEPARATORS.max_by { |sep| first_line.count(sep) }
       end
     end
   end
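The delimiter auto-detection described in the changelog is a frequency count over the first line: the candidate separator that occurs most often wins, and an explicit `col_sep` config bypasses it entirely. A standalone sketch:

```ruby
# Frequency analysis on the first line of the file; String#count tallies
# occurrences of each candidate separator and max_by picks the winner.
SEPARATORS = [",", ";", "\t"].freeze

def detect_separator(content)
  first_line = content.lines.first.to_s
  SEPARATORS.max_by { |sep| first_line.count(sep) }
end
```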
data/lib/data_porter/version.rb
CHANGED
data/lib/data_porter.rb
CHANGED
@@ -18,6 +18,7 @@ require_relative "data_porter/sources"
 require_relative "data_porter/record_validator"
 require_relative "data_porter/broadcaster"
 require_relative "data_porter/orchestrator"
+require_relative "data_porter/rejects_csv_builder"
 require_relative "data_porter/components"
 require_relative "data_porter/engine"
 
data/lib/generators/data_porter/install/templates/create_data_porter_imports.rb.erb
CHANGED
@@ -10,7 +10,7 @@ class CreateDataPorterImports < ActiveRecord::Migration[<%= ActiveRecord::Migrat
       t.jsonb :report, null: false, default: {}
       t.jsonb :config, null: false, default: {}
 
-      t.references :user, polymorphic: true, null: false
+      t.references :user, polymorphic: true
 
       t.timestamps
     end
data/lib/generators/data_porter/install/templates/initializer.rb
CHANGED
@@ -15,11 +15,9 @@ DataPorter.configure do |config|
   # config.cable_channel_prefix = "data_porter"
 
   # Context builder: inject business data into targets.
-  # Receives the
-  # config.context_builder = ->(
-  #
-  # user: controller.current_user
-  # )
+  # Receives the DataImport record.
+  # config.context_builder = ->(data_import) {
+  #   { user: data_import.user }
   # }
 
   # Maximum number of records displayed in preview.
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: data_porter
 version: !ruby/object:Gem::Version
-  version: 0.9.0
+  version: 1.0.2
 platform: ruby
 authors:
 - Seryl Lounis
@@ -139,16 +139,6 @@ files:
 - app/views/data_porter/mapping_templates/new.html.erb
 - app/views/layouts/data_porter/application.html.erb
 - config/routes.rb
-- docs/CONFIGURATION.md
-- docs/MAPPING.md
-- docs/ROADMAP.md
-- docs/SOURCES.md
-- docs/TARGETS.md
-- docs/screenshots/index-with-previewing.jpg
-- docs/screenshots/index.jpg
-- docs/screenshots/mapping.jpg
-- docs/screenshots/modal-new-import.jpg
-- docs/screenshots/preview.jpg
 - lib/data_porter.rb
 - lib/data_porter/broadcaster.rb
 - lib/data_porter/components.rb
@@ -174,6 +164,7 @@ files:
 - lib/data_porter/orchestrator/record_builder.rb
 - lib/data_porter/record_validator.rb
 - lib/data_porter/registry.rb
+- lib/data_porter/rejects_csv_builder.rb
 - lib/data_porter/sources.rb
 - lib/data_porter/sources/api.rb
 - lib/data_porter/sources/base.rb
@@ -198,8 +189,11 @@ homepage: https://github.com/SerylLns/data_porter
 licenses:
 - MIT
 metadata:
+  homepage_uri: https://github.com/SerylLns/data_porter
   source_code_uri: https://github.com/SerylLns/data_porter
   changelog_uri: https://github.com/SerylLns/data_porter/blob/main/CHANGELOG.md
+  documentation_uri: https://github.com/SerylLns/data_porter#readme
+  bug_tracker_uri: https://github.com/SerylLns/data_porter/issues
   rubygems_mcp_server_uri: https://rubygems.org/gems/data_porter
   rubygems_mfa_required: 'true'
 rdoc_options: []
data/docs/CONFIGURATION.md
DELETED
|
@@ -1,103 +0,0 @@
|
|
|
1
|
-
# Configuration
|
|
2
|
-
|
|
3
|
-
All options are set in `config/initializers/data_porter.rb`:
|
|
4
|
-
|
|
5
|
-
```ruby
|
|
6
|
-
DataPorter.configure do |config|
|
|
7
|
-
# Parent controller for the engine's controllers to inherit from.
|
|
8
|
-
# Controls authentication, layouts, and helpers.
|
|
9
|
-
config.parent_controller = "ApplicationController"
|
|
10
|
-
|
|
11
|
-
# ActiveJob queue name for import jobs.
|
|
12
|
-
config.queue_name = :imports
|
|
13
|
-
|
|
14
|
-
# ActiveStorage service for uploaded files.
|
|
15
|
-
config.storage_service = :local
|
|
16
|
-
|
|
17
|
-
# ActionCable channel prefix.
|
|
18
|
-
config.cable_channel_prefix = "data_porter"
|
|
19
|
-
|
|
20
|
-
# Context builder: inject business data into targets.
|
|
21
|
-
# Receives the current controller instance.
|
|
22
|
-
config.context_builder = ->(controller) {
|
|
23
|
-
OpenStruct.new(user: controller.current_user)
|
|
24
|
-
}
|
|
25
|
-
|
|
26
|
-
# Maximum number of records displayed in preview.
|
|
27
|
-
config.preview_limit = 500
|
|
28
|
-
|
|
29
|
-
# Enabled source types.
|
|
30
|
-
config.enabled_sources = %i[csv json api xlsx]
|
|
31
|
-
|
|
32
|
-
# Auto-purge completed/failed imports older than this duration.
|
|
33
|
-
# Set to nil to disable. Run `rake data_porter:purge` manually or via cron.
|
|
34
|
-
config.purge_after = 60.days
|
|
35
|
-
end
|
|
36
|
-
```
|
|
37
|
-
|
|
38
|
-
## Options reference
|
|
39
|
-
|
|
40
|
-
| Option | Default | Description |
|
|
41
|
-
|---|---|---|
|
|
42
|
-
| `parent_controller` | `"ApplicationController"` | Controller class the engine inherits from |
|
|
43
|
-
| `queue_name` | `:imports` | ActiveJob queue for import jobs |
|
|
44
|
-
| `storage_service` | `:local` | ActiveStorage service name |
|
|
45
|
-
| `cable_channel_prefix` | `"data_porter"` | ActionCable stream prefix |
|
|
46
|
-
| `context_builder` | `nil` | Lambda receiving the controller, returns context passed to target methods |
|
|
47
|
-
| `preview_limit` | `500` | Max records shown in the preview step |
|
|
48
|
-
| `enabled_sources` | `%i[csv json api xlsx]` | Source types available in the UI |
|
|
49
|
-
| `purge_after` | `60.days` | Auto-purge completed/failed imports older than this duration |
|
|
50
|
-
|
|
51
|
-
## Authentication
|
|
52
|
-
|
|
53
|
-
The engine inherits authentication from `parent_controller`. Set it to your authenticated base controller:
|
|
54
|
-
|
|
55
|
-
```ruby
|
|
56
|
-
config.parent_controller = "Admin::BaseController"
|
|
57
|
-
```
|
|
58
|
-
|
|
59
|
-
All engine routes will require the same authentication as your base controller.
|
|
60
|
-
|
|
61
|
-
## Context builder

The `context_builder` lambda lets you inject business data (current user, tenant, permissions) into target methods (`persist`, `after_import`, `on_error`):

```ruby
config.context_builder = ->(controller) {
  OpenStruct.new(
    user: controller.current_user,
    organization: controller.current_organization
  )
}
```

The returned object is available as `context` in all target instance methods.
## Real-time progress

DataPorter tracks import progress via JSON polling. The Stimulus progress controller polls `GET /imports/:id/status` every second and updates an animated progress bar.

The status endpoint returns:

```json
{
  "status": "importing",
  "progress": { "current": 42, "total": 100, "percentage": 42 }
}
```

No ActionCable or WebSocket configuration is required -- polling works out of the box with any deployment.
## Auto-purge

Old completed/failed imports can be cleaned up automatically:

```bash
# Run manually
bin/rails data_porter:purge

# Or schedule via cron (e.g. with whenever or solid_queue)
# Removes imports older than purge_after (default: 60 days)
```

Attached files are purged from ActiveStorage along with the import record.
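The purge rule itself is simple; a minimal pure-Ruby sketch of the selection logic (the real rake task queries the imports table -- the `Import` struct and `purgeable` helper here are illustrative stand-ins):

```ruby
# Hypothetical in-memory stand-in for import records; the actual task
# operates on DataPorter's imports table via ActiveRecord.
Import = Struct.new(:status, :created_at)

# Select imports eligible for purging: completed or failed,
# and created before the purge_after cutoff.
def purgeable(imports, purge_after_seconds, now: Time.now)
  cutoff = now - purge_after_seconds
  imports.select do |imp|
    %w[completed failed].include?(imp.status) && imp.created_at < cutoff
  end
end
```

Pending or in-progress imports are never purged, regardless of age.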
data/docs/MAPPING.md
DELETED
@@ -1,44 +0,0 @@
# Column Mapping

For file-based sources (CSV/XLSX), DataPorter adds an interactive mapping step between upload and parsing. Users see their file's actual column headers and map each one to a target field via dropdowns.

```
File Header          Target Field
+-----------+        +---------------+
| Prenom    |   ->   | First Name  v |
+-----------+        +---------------+
+-----------+        +---------------+
| Nom       |   ->   | Last Name   v |
+-----------+        +---------------+
```

Dropdowns are pre-filled from the Target's `csv_mapping` when headers match. Users can adjust any mapping before continuing to the preview step.

## Required fields

Required target fields are marked with `*` in the dropdown labels. If any required field is left unmapped, a warning banner appears listing the missing fields. This validation is client-side only -- it warns but does not block submission.

## Duplicate detection

If two file headers are mapped to the same target field, the affected rows are highlighted with an orange border and a warning message appears. This helps catch accidental duplicate mappings before parsing.
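The duplicate check boils down to counting how many headers point at each target field; a minimal sketch in plain Ruby (the helper name is illustrative, not the engine's API):

```ruby
# Given a header => target-field mapping, return the target fields
# that more than one header is mapped to.
def duplicate_targets(column_mapping)
  column_mapping.values.compact.tally.select { |_field, count| count > 1 }.keys
end
```

Any field returned here would trigger the orange-border warning in the UI.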
## Mapping Templates

Mappings can be saved as reusable templates. When starting a new import, users select a saved template from a dropdown to auto-fill all column mappings at once. Templates are stored per target, so each import type has its own template library.

### Managing templates

- **Inline**: check "Save as template" in the mapping form and give it a name
- **CRUD**: use the "Mapping Templates" link on the imports index page to create, edit, and delete templates

When a template is loaded, the "Save as template" checkbox is hidden, since the user is already working from an existing template.
## Mapping Priority

When parsing, mappings are resolved in priority order:

1. **User mapping** -- from the mapping UI (`config["column_mapping"]`)
2. **Code mapping** -- from the Target DSL (`csv_mapping`)
3. **Auto-map** -- parameterize headers to match column names

Non-file sources (JSON, API) skip the mapping step entirely.
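The fall-through above can be sketched in plain Ruby (the `resolve_mapping` helper is illustrative, and the last step approximates Rails' `parameterize` with plain regexes):

```ruby
# Resolve the mapping for one file header, in priority order:
# 1. user mapping from the UI, 2. code mapping from the DSL,
# 3. auto-map by parameterizing the header into a column name.
def resolve_mapping(header, user_mapping, code_mapping)
  user_mapping[header] ||
    code_mapping[header] ||
    header.strip.downcase.gsub(/[^a-z0-9]+/, "_").gsub(/\A_+|_+\z/, "").to_sym
end
```

So a header like `"First Name"` auto-maps to `:first_name` when neither a user nor a code mapping claims it.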
data/docs/ROADMAP.md
DELETED
@@ -1,28 +0,0 @@
# Roadmap

## v1.0 — Production-ready

The goal is a gem that handles real-world imports reliably at scale.

### ~~1. Records pagination~~ DONE

Implemented in v0.6.0. Preview and completed pages are paginated (50 per page). The controller limits the records loaded via the `RecordPagination` concern.

### ~~2. Import params~~ DONE

Implemented in v0.9.0. Targets declare `params` with a DSL (`:select`, `:text`, `:number`, `:hidden`). Values are stored in `config["import_params"]`, accessible via `import_params` in all target instance methods. See [Targets docs](TARGETS.md#params--).

---

## v2+ (future)

- Scoped imports (filter index by user/tenant)
- Webhooks / callbacks on import completion
- Batch persist (`insert_all` support)
- Resume / partial retry
- Scheduled imports (recurring API source)
- i18n
- Dashboard stats
data/docs/SOURCES.md
DELETED
@@ -1,94 +0,0 @@
# Sources

DataPorter supports four source types. Each source reads data from a different format and feeds it through the same parsing pipeline.

## CSV

Upload a CSV file. Headers are extracted automatically and presented in the [column mapping](MAPPING.md) step. Configure header mappings with `csv_mapping` in your [Target](TARGETS.md) when file headers don't match your column names.

Custom separator:

```ruby
import.config = { "separator" => ";" }
```
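A custom separator behaves like the `col_sep` option of Ruby's stdlib CSV; a self-contained illustration of what a semicolon-separated file parses into:

```ruby
require "csv"

# A semicolon-separated file, as produced by many European spreadsheets.
data = "name;age\nAda;36\nGrace;45\n"

# Parse with headers and a custom column separator.
rows  = CSV.parse(data, headers: true, col_sep: ";")
names = rows.map { |row| row["name"] }  # => ["Ada", "Grace"]
```

Note that all parsed values come back as strings; type coercion happens later, per the column `type` declared in the Target.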
## XLSX

Upload an Excel `.xlsx` file. Uses the same `csv_mapping` for header-to-column mapping as CSV. By default the first sheet is parsed; select a different sheet via config:

```ruby
import.config = { "sheet_index" => 1 }
```

Powered by [creek](https://github.com/pythonicrubyist/creek) for streaming, memory-efficient parsing.
## JSON

Upload a JSON file. Use `json_root` in your Target to specify the path to the records array. Raw JSON arrays are supported without `json_root`.

```ruby
json_root "data.users"
```

Given `{ "data": { "users": [...] } }`, records are extracted from `data.users`.
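Resolving a dot-separated root path is a simple walk over the parsed document; a sketch in plain Ruby (the `extract_records` helper is illustrative, not the engine's internal API):

```ruby
require "json"

# Walk a dot-separated json_root path down a parsed JSON document.
# A nil root means the document itself is the records array.
def extract_records(document, json_root)
  return document unless json_root
  json_root.split(".").reduce(document) { |node, key| node[key] }
end

doc     = JSON.parse('{"data": {"users": [{"email": "a@example.com"}]}}')
records = extract_records(doc, "data.users")
```
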
## API

Fetch records from an external API endpoint. No file upload is needed -- the engine calls the API directly.

### Basic usage

```ruby
api_config do
  endpoint "https://api.example.com/data"
  headers({ "Authorization" => "Bearer token" })
  response_root "results"
end
```

| Option | Type | Description |
|---|---|---|
| `endpoint` | String or Proc | URL to fetch records from |
| `headers` | Hash or Proc | HTTP headers sent with the request |
| `response_root` | String | Key in the JSON response containing the records array (omit for top-level arrays) |

### Dynamic endpoints and headers

Both `endpoint` and `headers` accept lambdas for runtime values. The endpoint lambda receives the import's `config` hash:

```ruby
api_config do
  endpoint ->(params) { "https://api.example.com/events?page=#{params[:page]}" }
  headers -> { { "Authorization" => "Bearer #{ENV['API_TOKEN']}" } }
  response_root "data"
end
```

### Full example

```ruby
class EventTarget < DataPorter::Target
  label "Events"
  model_name "Event"
  sources :api

  api_config do
    endpoint "https://api.example.com/events"
    headers -> { { "Authorization" => "Bearer #{ENV['EVENTS_API_KEY']}" } }
    response_root "events"
  end

  columns do
    column :name, type: :string, required: true
    column :date, type: :date
    column :venue, type: :string
    column :capacity, type: :integer
  end

  def persist(record, context:)
    Event.create!(record.attributes)
  end
end
```

When a user creates an import with source type **API**, the engine skips file upload entirely, calls the configured endpoint, parses the JSON response, and feeds the records through the same preview/validate/import pipeline as file-based sources.
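How Proc-valued `endpoint` and `headers` resolve at fetch time can be shown in isolation (no HTTP is performed here; the `API_TOKEN` variable and fallback value are illustrative):

```ruby
# A Proc endpoint receives the import's config hash at fetch time;
# a Proc headers block is called with no arguments.
endpoint = ->(params) { "https://api.example.com/events?page=#{params[:page]}" }
headers  = -> { { "Authorization" => "Bearer #{ENV.fetch('API_TOKEN', 'test')}" } }

url  = endpoint.call(page: 2)
auth = headers.call
```

Because both are evaluated per fetch, rotated tokens or per-import query parameters are picked up without reloading the app.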
data/docs/TARGETS.md
DELETED
@@ -1,227 +0,0 @@
# Targets

Targets are plain Ruby classes in `app/importers/` that inherit from `DataPorter::Target`. Each target defines one import type: its columns, sources, mappings, and persistence logic.

## Generator

```bash
bin/rails generate data_porter:target ModelName column:type[:required] ... [--sources csv xlsx]
```

Examples:

```bash
bin/rails generate data_porter:target User email:string:required name:string age:integer --sources csv xlsx
bin/rails generate data_porter:target Product name:string price:decimal --sources csv
bin/rails generate data_porter:target Order order_number:string total:decimal
```

Column format: `name:type[:required]`

Supported types: `string`, `integer`, `decimal`, `boolean`, `date`.

The `--sources` option specifies which source types the target accepts (default: `csv`). The UI only shows these sources when the target is selected.
## Class-level DSL

```ruby
class OrderTarget < DataPorter::Target
  label "Orders"
  model_name "Order"
  icon "fas fa-shopping-cart"
  sources :csv, :json, :api, :xlsx

  columns do
    column :order_number, type: :string, required: true
    column :total, type: :decimal
    column :placed_at, type: :date
    column :active, type: :boolean
    column :quantity, type: :integer
  end

  csv_mapping do
    map "Order #" => :order_number
    map "Total ($)" => :total
  end

  json_root "data.orders"

  api_config do
    endpoint "https://api.example.com/orders"
    headers({ "Authorization" => "Bearer token" })
    response_root "data.orders"
  end

  deduplicate_by :order_number

  dry_run_enabled

  params do
    param :warehouse_id, type: :select, label: "Warehouse", required: true,
          collection: -> { Warehouse.pluck(:name, :id) }
    param :currency, type: :text, default: "USD"
  end
end
```

### `label(value)`

Human-readable name shown in the UI.

### `model_name(value)`

The ActiveRecord model name this target imports into (for display purposes).

### `icon(value)`

CSS icon class (e.g. FontAwesome) shown in the UI.

### `sources(*types)`

Accepted source types: `:csv`, `:json`, `:api`, `:xlsx`.

### `columns { ... }`

Defines the expected columns for this import. Each column accepts:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `name` | Symbol | (required) | Column identifier |
| `type` | Symbol | `:string` | One of `:string`, `:integer`, `:decimal`, `:boolean`, `:date` |
| `required` | Boolean | `false` | Whether the column must have a value |
| `label` | String | Humanized name | Display label in the preview |
### `csv_mapping { ... }`

Maps CSV/XLSX header names to column names when they don't match:

```ruby
csv_mapping do
  map "First Name" => :first_name
  map "E-mail" => :email
end
```
### `json_root(path)`

Dot-separated path to the array of records within a JSON document:

```ruby
json_root "data.users"
```

Given `{ "data": { "users": [...] } }`, records are extracted from `data.users`.
### `api_config { ... }`

See [Sources: API](SOURCES.md#api) for full documentation.

### `deduplicate_by(*keys)`

Skip records that share the same value(s) for the given column(s):

```ruby
deduplicate_by :email
deduplicate_by :first_name, :last_name
```
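The semantics -- keep the first record for each key combination, skip later duplicates -- can be sketched over plain hashes (the `deduplicate` helper is illustrative, not the engine's API):

```ruby
# Keep the first record seen for each composite deduplication key;
# later records with the same key values are skipped.
def deduplicate(records, keys)
  records.uniq { |record| keys.map { |k| record[k] } }
end
```

With `deduplicate_by :first_name, :last_name`, two records only collide when *both* values match.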
### `dry_run_enabled`

Enables dry-run mode for this target. A "Dry Run" button appears in the preview step. A dry run executes the full import pipeline (transform, validate, persist) inside a rolled-back transaction, producing a validation report without modifying the database.

### `params { ... }`

Declares extra form fields shown when this target is selected in the import form. Values are stored in `config["import_params"]` and accessible via `import_params` in all instance methods.

```ruby
params do
  param :hotel_id, type: :select, label: "Hotel", required: true,
        collection: -> { Hotel.pluck(:name, :id) }
  param :currency, type: :text, label: "Currency", default: "EUR"
  param :batch_size, type: :number, label: "Batch Size", default: "100"
  param :tenant_id, type: :hidden, default: "abc123"
end
```

Each param accepts:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `name` | Symbol | (required) | Param identifier |
| `type` | Symbol | `:text` | One of `:select`, `:text`, `:number`, `:hidden` |
| `required` | Boolean | `false` | Validated on import creation, shown with `*` in the form |
| `label` | String | Humanized name | Display label in the form |
| `default` | String | `nil` | Pre-filled value in the form |
| `collection` | Lambda | `nil` | For `:select` type -- returns `[[label, value], ...]` |

Collection lambdas are evaluated when the form loads, not at boot time. This ensures fresh data (e.g. newly created hotels appear immediately).
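The difference between boot-time and form-load evaluation is easy to demonstrate in isolation (the in-memory `hotels` array stands in for a database table):

```ruby
# In-memory stand-in for Hotel.pluck(:name, :id).
hotels = [["Palace", 1]]

# Because the collection is a lambda, it is re-evaluated on each call
# (i.e. each form render), not frozen at boot.
collection = -> { hotels.map(&:dup) }

before = collection.call
hotels << ["Grand", 2]       # a new hotel is created after "boot"
after  = collection.call     # the new option appears without a restart
```
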
## Instance Methods

### `import_params`

Returns a hash of the import param values set by the user in the form. Available in all instance methods (`persist`, `transform`, `validate`, `after_import`, `on_error`). Defaults to `{}` when no params are declared.

```ruby
def persist(record, context:)
  Guest.create!(
    record.attributes.merge(
      hotel_id: import_params["hotel_id"],
      currency: import_params["currency"]
    )
  )
end
```

Override the following methods in your target to customize behavior.
### `transform(record)`

Transforms a record before validation. Must return the (modified) record.

```ruby
def transform(record)
  record.attributes["email"] = record.attributes["email"]&.downcase
  record
end
```
### `validate(record)`

Adds custom validation errors to a record:

```ruby
def validate(record)
  record.add_error("Email is invalid") unless record.attributes["email"]&.include?("@")
end
```

### `persist(record, context:)`

**Required.** Saves the record to your database. Raises `NotImplementedError` if not overridden.

```ruby
def persist(record, context:)
  User.create!(record.attributes)
end
```

### `after_import(results, context:)`

Called once after all records have been processed:

```ruby
def after_import(results, context:)
  AdminMailer.import_complete(context.user, results).deliver_later
end
```

### `on_error(record, error, context:)`

Called when a record fails to import:

```ruby
def on_error(record, error, context:)
  Sentry.capture_exception(error, extra: { record: record.attributes })
end
```
data/docs/screenshots/index.jpg
DELETED
(binary screenshot files removed; no diff shown)