data_porter 2.4.0 → 2.5.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: b393c618ddb47334a61a62de03796211188b58367cad71bfc8574874208a34db
-   data.tar.gz: 9eaa6b01ea0127bd22fa8557cccc34f47070e58991512f66996df6f9b789e8e3
+   metadata.gz: 99b0792c8088d9d2e55826f898fcfa99d872bea179f41d352b1e4aaff9ab14d8
+   data.tar.gz: 77c539c666176f992747f2d326a9ce8284cb19426b5f1ae470a317f7dfd740f1
  SHA512:
-   metadata.gz: 9f5e73c6eb11dccd143496f4147c941d2729a17643b10eed46f092e4615a2a861d1359b6993dadddaa7ab256672646a25b93f73f77bf238b1b4543fb7c92eb26
-   data.tar.gz: 3f2c4be7e4dbe828a57bd81709c64b5c592c71aabaae0e604643216455c2c3bfc7ef74d14b78241658104c996125495587cfff73301c5a8f918abca64aca8e4f
+   metadata.gz: b863d64f885f55ba6f530ce99173fef55ca473f74c7b0fc6736890bd770583949cf2101389c0635707ea38ceede63aebc529fc1a1e5990c76bf5c0750de318d6
+   data.tar.gz: 392ebe242c8ae947b37961bc8d59373a3d9f6a84730f0c946cd35af0a621ecc1a1cfb8b09b361f3a74fcc826c098fad317a4350d9b213206197406d88f763a01
data/CHANGELOG.md CHANGED
@@ -5,6 +5,28 @@ All notable changes to this project will be documented in this file.
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
  
+ ## [Unreleased]
+
+ ### Added
+
+ - **Auto-map heuristics** -- Smart column suggestions that pre-fill mapping selects when CSV/XLSX headers match target fields by exact name or built-in synonym (e.g. "E-mail Address" → email, "fname" → first_name). Supports per-column custom synonyms via `synonyms:` keyword in column DSL. Fallback chain: saved mapping > code-defined > auto-map > empty
+
+ ## [2.5.1] - 2026-02-21
+
+ ### Fixed
+
+ - Display target icon (from `icon` DSL) in the imports index table and show page title/details. Previously the icon was stored in the Registry but never rendered in the UI
+
+ ## [2.5.0] - 2026-02-21
+
+ ### Added
+
+ - **Bulk import mode** -- Opt-in per-target batch persistence via `bulk_mode batch_size: 500, on_conflict: :retry_per_record`. Uses `insert_all` by default (with auto-injected timestamps) for 10-100x throughput on simple create scenarios. Custom batch logic via `persist_batch` override. Configurable conflict strategy: `:retry_per_record` (default) retries failed batches record-by-record, `:fail_batch` marks entire batch as errored. Progress broadcasts per batch instead of per record
+
+ ### Changed
+
+ - 551 RSpec examples (up from 540), 0 failures
+
  ## [2.4.0] - 2026-02-21
  
  ### Added
data/ROADMAP.md CHANGED
@@ -2,10 +2,6 @@
  
  ## Next
  
- ### Bulk import
-
- High-volume import support using `insert_all` / `upsert_all` for batch persistence. Opt-in per target to bypass per-record `persist` calls, enabling 10-100x throughput for simple create/upsert scenarios. Configurable batch size, with fallback to per-record mode on conflict.
-
  ### Update & diff mode
  
  Support update (upsert) imports alongside create-only. Given a `deduplicate_by` key, detect existing records and show a diff preview: new records, changed fields (highlighted), unchanged rows. User confirms which changes to apply. Enables recurring data sync workflows.
@@ -34,10 +30,6 @@ Headless REST API for programmatic imports:
  - Auth via `config.api_authenticate` lambda (API key or Bearer token)
  - Reuses existing job pipeline (parse, import, dry run)
  
- ### Auto-map heuristics
-
- Smart column mapping suggestions using tokenized header matching and synonym dictionaries. When a CSV has "E-mail Address", auto-suggest mapping to `:email`. Built-in synonyms for common patterns (phone → phone_number, first name → first_name). Configurable synonym lists per target.
-
  ---
  
  ## Ideas
@@ -27,7 +27,21 @@ module DataPorter
        saved = @import.config&.dig("column_mapping")
        return saved if saved.present?
  
-       (target._csv_mappings || {}).transform_values(&:to_s)
+       code_mapping = (target._csv_mappings || {}).transform_values(&:to_s)
+       return code_mapping if code_mapping.present?
+
+       auto_map_suggestions(target)
+     end
+
+     def auto_map_suggestions(target)
+       columns = target._columns || []
+       return {} if columns.empty? || @file_headers.empty?
+
+       custom = columns.each_with_object({}) do |col, hash|
+         hash[col.name] = col.synonyms if col.synonyms.any?
+       end
+
+       AutoMapper.new(@file_headers, columns.map(&:name), custom_synonyms: custom).call
      end
  
      def save_column_mapping
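
The hunk above turns the mapping lookup into a three-step fallback: saved mapping, then code-defined mapping, then auto-map suggestions. A minimal plain-Ruby sketch of that chain follows. The name `resolve_mapping` and the `auto_map` lambda (standing in for `AutoMapper#call`) are hypothetical, and ActiveSupport's `present?` is approximated with `nil?`/`empty?` checks so the snippet runs without Rails.

```ruby
# Hypothetical stand-in for the controller logic; not the gem's API.
def resolve_mapping(saved:, code_defined:, auto_map:)
  # 1. A mapping saved on the import always wins.
  return saved unless saved.nil? || saved.empty?

  # 2. Otherwise use the mapping declared in code, with values as strings.
  code_mapping = (code_defined || {}).transform_values(&:to_s)
  return code_mapping unless code_mapping.empty?

  # 3. Otherwise fall back to heuristic suggestions.
  auto_map.call
end

auto_map = -> { { "E-mail Address" => "email" } }

resolve_mapping(saved: nil, code_defined: { "Nom" => :last_name }, auto_map: auto_map)
# => { "Nom" => "last_name" }
resolve_mapping(saved: {}, code_defined: nil, auto_map: auto_map)
# => { "E-mail Address" => "email" }
```

Because each step returns early, a non-empty code-defined mapping suppresses auto-map entirely rather than being merged with it.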
@@ -28,7 +28,7 @@
  <% @imports.each do |import| %>
  <tr>
  <td><%= import.id %></td>
- <td><%= import.target_key %></td>
+ <td><% target_cls = import.target_class rescue nil %><% if target_cls&._icon.present? %><i class="<%= target_cls._icon %>"></i> <% end %><%= target_cls&._label || import.target_key %></td>
  <td><%= import.source_type %></td>
  <td><%= raw DataPorter::Components::Shared::StatusBadge.new(status: import.status).call %></td>
  <td><%= import.created_at&.strftime("%Y-%m-%d %H:%M") %></td>
@@ -4,7 +4,7 @@
  <%= link_to t("data_porter.imports.back_to_imports"), imports_path, class: "dp-btn dp-btn--secondary" %>
  </div>
  <h1 class="dp-title">
- <%= t("data_porter.imports.show_title", target: @target._label, id: @import.id) %>
+ <% if @target._icon.present? %><i class="<%= @target._icon %>"></i> <% end %><%= t("data_porter.imports.show_title", target: @target._label, id: @import.id) %>
  </h1>
  <%= raw DataPorter::Components::Shared::StatusBadge.new(status: @import.status).call %>
  </div>
@@ -12,7 +12,7 @@
  <div class="dp-import-details">
  <dl class="dp-details-grid">
  <dt><%= t("data_porter.imports.details.target") %></dt>
- <dd><%= @target._label %></dd>
+ <dd><% if @target._icon.present? %><i class="<%= @target._icon %>"></i> <% end %><%= @target._label %></dd>
  <dt><%= t("data_porter.imports.details.source") %></dt>
  <dd><%= @import.source_type.upcase %></dd>
  <% if @import.file.attached? %>
@@ -6,6 +6,7 @@
  <title>DataPorter</title>
  <%= csrf_meta_tags %>
  <%= stylesheet_link_tag "data_porter/application" %>
+ <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.5.1/css/all.min.css" integrity="sha512-DTOQO9RWCH3ppGqcWaEA1BIZOC6xxalwEsw9c2QQeAIftl+Vegovlnee1c9QX4TctnWMn13TZye+giMm8e2LwA==" crossorigin="anonymous" referrerpolicy="no-referrer" />
  <script type="importmap">
  {
  "imports": {
@@ -0,0 +1,87 @@
+ # frozen_string_literal: true
+
+ module DataPorter
+   class AutoMapper
+     SYNONYMS = {
+       email: %w[email e_mail email_address e_mail_address courriel mail],
+       first_name: %w[first_name firstname fname first prenom],
+       last_name: %w[last_name lastname lname last nom],
+       name: %w[name full_name fullname nom_complet],
+       phone_number: %w[phone_number phone tel telephone mobile cell],
+       address: %w[address addr street adresse],
+       city: %w[city ville town],
+       zip_code: %w[zip_code zip postal_code postcode code_postal],
+       country: %w[country pays nation],
+       company: %w[company company_name organization organisation entreprise societe],
+       title: %w[title job_title position titre poste],
+       description: %w[description desc notes],
+       quantity: %w[quantity qty amount],
+       price: %w[price unit_price prix montant],
+       date: %w[date created_at updated_at],
+       status: %w[status state statut etat],
+       id: %w[id identifier external_id ref reference]
+     }.freeze
+
+     def initialize(headers, target_columns, custom_synonyms: {})
+       @headers = headers
+       @target_columns = target_columns.map(&:to_s)
+       @custom_synonyms = custom_synonyms
+     end
+
+     def call
+       used = Set.new
+       @headers.each_with_object({}) do |header, mapping|
+         match = find_match(header, used)
+         used.add(match) if match
+         mapping[header] = match || ""
+       end
+     end
+
+     private
+
+     def find_match(header, used)
+       normalized = normalize(header)
+       return nil if normalized.empty?
+
+       exact_match(normalized, used) || synonym_match(normalized, used)
+     end
+
+     def exact_match(normalized, used)
+       @target_columns.find { |col| col == normalized && !used.include?(col) }
+     end
+
+     def synonym_match(normalized, used)
+       lookup_table[normalized]&.find { |col| !used.include?(col) }
+     end
+
+     def lookup_table
+       @lookup_table ||= build_lookup_table
+     end
+
+     def build_lookup_table
+       table = Hash.new { |h, k| h[k] = [] }
+       merged_synonyms.each do |column, synonyms|
+         col_name = column.to_s
+         next unless @target_columns.include?(col_name)
+
+         synonyms.each { |syn| table[syn] << col_name }
+       end
+       table
+     end
+
+     def merged_synonyms
+       result = SYNONYMS.transform_values(&:dup)
+       @custom_synonyms.each do |column, syns|
+         key = column.to_sym
+         result[key] = (result.fetch(key, []) + syns.map { |s| normalize(s) }).uniq
+       end
+       result
+     end
+
+     def normalize(header)
+       return "" if header.nil?
+
+       header.to_s.strip.downcase.gsub(/[\s-]+/, "_").gsub(/[^a-z0-9_]/, "")
+     end
+   end
+ end
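
To make the matching in the new file concrete, the sketch below reproduces the `normalize` steps and runs a lookup against two entries copied from `SYNONYMS`. It is a simplified illustration only: `SAMPLE_SYNONYMS` and `suggest` are hypothetical names, and the sketch leaves out the `used` set that `AutoMapper#call` keeps so that two headers cannot claim the same column.

```ruby
# Same normalization steps as AutoMapper#normalize in the hunk above:
# trim, lowercase, turn whitespace/hyphen runs into "_", drop other symbols.
def normalize(header)
  header.to_s.strip.downcase.gsub(/[\s-]+/, "_").gsub(/[^a-z0-9_]/, "")
end

# Trimmed-down synonym table (two entries copied from SYNONYMS).
SAMPLE_SYNONYMS = {
  "email" => %w[email e_mail email_address e_mail_address courriel mail],
  "first_name" => %w[first_name firstname fname first prenom]
}.freeze

# Hypothetical helper: the first column whose synonym list contains the header.
def suggest(header)
  key = normalize(header)
  SAMPLE_SYNONYMS.find { |_column, syns| syns.include?(key) }&.first
end

normalize("E-mail Address") # => "e_mail_address"
suggest("E-mail Address")   # => "email"
suggest("fname")            # => "first_name"
suggest("Shoe Size")        # => nil
```

Normalizing both the file header and any custom synonyms to the same snake_case form is what lets "E-mail Address" and `e_mail_address` compare equal.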
@@ -2,14 +2,15 @@
  
  module DataPorter
    module DSL
-     Column = Struct.new(:name, :type, :required, :label, :transform, :options, keyword_init: true) do
-       def initialize(name:, type: :string, required: false, label: nil, transform: [], **options)
+     Column = Struct.new(:name, :type, :required, :label, :transform, :synonyms, :options, keyword_init: true) do
+       def initialize(name:, type: :string, required: false, label: nil, transform: [], synonyms: [], **options)
          super(
            name: name.to_sym,
            type: type.to_sym,
            required: required,
            label: label || name.to_s.humanize,
            transform: Array(transform),
+           synonyms: Array(synonyms),
            options: options
          )
        end
@@ -0,0 +1,54 @@
+ # frozen_string_literal: true
+
+ module DataPorter
+   class Orchestrator
+     module BulkImporter
+       private
+
+       def import_bulk
+         importable = @data_import.importable_records
+         context = build_context
+         config = @target.class._bulk_config
+         results = { created: 0, errored: 0 }
+         total = importable.size
+         processed = 0
+
+         importable.each_slice(config[:batch_size]) do |batch|
+           persist_batch_with_fallback(batch, context, config, results)
+           processed += batch.size
+           broadcast_progress(processed, total)
+         end
+
+         finalize_import(results)
+       end
+
+       def persist_batch_with_fallback(batch, context, config, results)
+         @target.persist_batch(batch, context: context)
+         results[:created] += batch.size
+       rescue StandardError => e
+         handle_batch_failure(batch, context, config, results, e)
+       end
+
+       def handle_batch_failure(batch, context, config, results, error)
+         if config[:on_conflict] == :fail_batch
+           fail_batch(batch, results, error)
+         else
+           retry_per_record(batch, context, results)
+         end
+       end
+
+       def fail_batch(batch, results, error)
+         batch.each do |record|
+           record.add_error(error.message)
+         end
+         results[:errored] += batch.size
+       end
+
+       def retry_per_record(batch, context, results)
+         batch.each do |record|
+           persist_record(record, context, results)
+         end
+       end
+     end
+   end
+ end
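
The two conflict strategies in the new module differ only in what happens after a batch raises. The following plain-Ruby sketch shows that difference with counters alone. `import_in_batches` and both lambdas are hypothetical stand-ins for `persist_batch` and the per-record path, and the sketch assumes the per-record path counts its own failures, which the hunk above does not show.

```ruby
# Hypothetical sketch of batch-then-fallback; not the gem's implementation.
def import_in_batches(records, batch_size:, on_conflict:, persist_batch:, persist_one:)
  results = { created: 0, errored: 0 }

  records.each_slice(batch_size) do |batch|
    begin
      persist_batch.call(batch)
      results[:created] += batch.size
    rescue StandardError
      if on_conflict == :fail_batch
        # Whole batch is marked as errored, good rows included.
        results[:errored] += batch.size
      else
        # :retry_per_record isolates the bad rows inside the failed batch.
        batch.each do |record|
          begin
            persist_one.call(record)
            results[:created] += 1
          rescue StandardError
            results[:errored] += 1
          end
        end
      end
    end
  end

  results
end

reject_negative = ->(n) { raise ArgumentError, "negative" if n.negative? }
batch_insert = ->(batch) { batch.each(&reject_negative) }

import_in_batches([1, 2, -3, 4, 5], batch_size: 2, on_conflict: :retry_per_record,
                  persist_batch: batch_insert, persist_one: reject_negative)
# => { created: 4, errored: 1 }
import_in_batches([1, 2, -3, 4, 5], batch_size: 2, on_conflict: :fail_batch,
                  persist_batch: batch_insert, persist_one: reject_negative)
# => { created: 3, errored: 2 }
```

With `:fail_batch`, the valid record 4 is lost along with -3 because they share a batch, which is the trade-off for skipping the per-record retry.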
@@ -6,7 +6,9 @@ module DataPorter
        private
  
        def import_records
-         if DataPorter.configuration.transaction_mode == :all
+         if @target.class._bulk_config
+           import_bulk
+         elsif DataPorter.configuration.transaction_mode == :all
            import_all_or_nothing
          else
            import_per_record
@@ -2,12 +2,14 @@
  
  require_relative "orchestrator/record_builder"
  require_relative "orchestrator/importer"
+ require_relative "orchestrator/bulk_importer"
  require_relative "orchestrator/dry_runner"
  
  module DataPorter
    class Orchestrator
      include RecordBuilder
      include Importer
+     include BulkImporter
      include DryRunner
  
      def initialize(data_import, content: nil)
@@ -10,7 +10,8 @@ module DataPorter
      class << self
        attr_reader :_label, :_model_name, :_icon, :_sources,
                    :_columns, :_csv_mappings, :_dedup_keys, :_json_root,
-                   :_api_config, :_dry_run_enabled, :_params, :_webhooks
+                   :_api_config, :_dry_run_enabled, :_params, :_webhooks,
+                   :_bulk_config
  
        def label(value)
          @_label = value
@@ -82,6 +83,10 @@ module DataPorter
          @_webhooks << DSL::Webhook.new(url: url, **)
        end
  
+       def bulk_mode(batch_size: 500, on_conflict: :retry_per_record)
+         @_bulk_config = { batch_size: batch_size, on_conflict: on_conflict }
+       end
+
        private
  
        def auto_register
@@ -108,6 +113,16 @@ module DataPorter
        raise NotImplementedError
      end
  
+     def persist_batch(records, context: nil) # rubocop:disable Lint/UnusedMethodArgument
+       raise Error, "model_name is required for default persist_batch" unless self.class._model_name
+
+       now = Time.current
+       model_class = self.class._model_name.constantize
+       model_class.insert_all(
+         records.map { |r| r.data.merge("created_at" => now, "updated_at" => now) }
+       )
+     end
+
      def after_import(_results, context:); end
  
      def on_error(_record, _error, context:); end
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
  
  module DataPorter
-   VERSION = "2.4.0"
+   VERSION = "2.5.1"
  end
data/lib/data_porter.rb CHANGED
@@ -20,6 +20,7 @@ require_relative "data_porter/record_validator"
  require_relative "data_porter/broadcaster"
  require_relative "data_porter/webhook_notifier"
  require_relative "data_porter/orchestrator"
+ require_relative "data_porter/auto_mapper"
  require_relative "data_porter/rejects_csv_builder"
  require_relative "data_porter/components"
  require_relative "data_porter/engine"
@@ -37,6 +37,10 @@ DataPorter.configure do |config|
    # Set to nil to disable auto-purge. Run `rake data_porter:purge` manually or via cron.
    # config.purge_after = 60.days
  
+   # Bulk import: enable per-target via `bulk_mode` in your Target class.
+   # Uses insert_all for 10-100x throughput on large imports.
+   # See docs/ADVANCED.md for configuration options.
+
    # HMAC-SHA256 secret for signing webhook payloads.
    # When set, every webhook request includes an X-DataPorter-Signature header.
    # Set to nil to disable signing (default).
@@ -5,6 +5,7 @@ class <%= target_class_name %> < DataPorter::Target
    model_name "<%= model_name %>"
    icon "fas fa-file-import"
    sources <%= target_sources %>
+   # bulk_mode batch_size: 500, on_conflict: :retry_per_record
  <% if parsed_columns.any? %>
  
    columns do
data/mkdocs.yml CHANGED
@@ -92,7 +92,9 @@ nav:
  - Targets: TARGETS.md
  - Sources: SOURCES.md
  - Column Mapping: MAPPING.md
+ - Views & Theming: VIEWS.md
  - Routes: routes.md
+ - Advanced: ADVANCED.md
  - Roadmap: ROADMAP.md
  - Changelog: changelog.md
  - Contributing: contributing.md
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: data_porter
  version: !ruby/object:Gem::Version
-   version: 2.4.0
+   version: 2.5.1
  platform: ruby
  authors:
  - Seryl Lounis
@@ -146,6 +146,7 @@ files:
  - config/locales/fr.yml
  - config/routes.rb
  - lib/data_porter.rb
+ - lib/data_porter/auto_mapper.rb
  - lib/data_porter/broadcaster.rb
  - lib/data_porter/column_transformer.rb
  - lib/data_porter/components.rb
@@ -167,6 +168,7 @@ files:
  - lib/data_porter/dsl/webhook.rb
  - lib/data_porter/engine.rb
  - lib/data_porter/orchestrator.rb
+ - lib/data_porter/orchestrator/bulk_importer.rb
  - lib/data_porter/orchestrator/dry_runner.rb
  - lib/data_porter/orchestrator/importer.rb
  - lib/data_porter/orchestrator/record_builder.rb