data_porter 2.4.0 → 2.6.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +40 -0
- data/README.md +2 -0
- data/ROADMAP.md +0 -12
- data/app/controllers/data_porter/concerns/mapping_management.rb +15 -1
- data/app/controllers/data_porter/imports_controller.rb +7 -1
- data/app/models/data_porter/data_import.rb +5 -1
- data/app/views/data_porter/imports/index.html.erb +1 -1
- data/app/views/data_porter/imports/show.html.erb +7 -3
- data/app/views/layouts/data_porter/application.html.erb +1 -0
- data/config/locales/en.yml +1 -0
- data/config/locales/fr.yml +1 -0
- data/config/routes.rb +1 -0
- data/lib/data_porter/auto_mapper.rb +87 -0
- data/lib/data_porter/dsl/column.rb +3 -2
- data/lib/data_porter/orchestrator/bulk_importer.rb +62 -0
- data/lib/data_porter/orchestrator/importer.rb +28 -4
- data/lib/data_porter/orchestrator.rb +27 -6
- data/lib/data_porter/target.rb +16 -1
- data/lib/data_porter/version.rb +1 -1
- data/lib/data_porter.rb +1 -0
- data/lib/generators/data_porter/install/templates/initializer.rb +4 -0
- data/lib/generators/data_porter/target/templates/target.rb.tt +1 -0
- data/mkdocs.yml +2 -0
- metadata +3 -1
checksums.yaml
CHANGED

```diff
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 955a886124d8ff2f1da4f23e725a52caec3bbc635450f058e3bbce81f4b898f5
+  data.tar.gz: 6f1f0be41999d105c7558b7192f3f61d79f935431ce7b99bede5753d69be9ce3
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 399c87e6daa56196ae96525ca75b27a464a109463d3f4ebf5361738a93cc6385b30e37e588cab45c3a2218b99596dabcd9a48b0a9856eac17902907a9f4f9a34
+  data.tar.gz: f423073b27fc407cc94ecc8e11693d40da5210387129f428105ac4973d7b7be034a59b8161c2a4da3c99acc7e4ec1b13d4828c54336e12d441044549b6243151
```
data/CHANGELOG.md
CHANGED

```diff
@@ -5,6 +5,46 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [Unreleased]
+
+### Added
+
+- **Auto-map heuristics** -- Smart column suggestions that pre-fill mapping selects when CSV/XLSX headers match target fields by exact name or built-in synonym (e.g. "E-mail Address" → email, "fname" → first_name). Supports per-column custom synonyms via `synonyms:` keyword in column DSL. Fallback chain: saved mapping > code-defined > auto-map > empty
+
+## [2.6.0] - 2026-02-21
+
+### Added
+
+- **Resume on failure** -- When an import fails mid-way (crash, timeout, exception), resume from the last successful record instead of re-importing from scratch. Progress checkpoints stored in the existing `config` JSONB column alongside `broadcast_progress` — zero additional DB operations or migrations. Works with both per-record and bulk import modes
+- `resumable?` predicate on `DataImport` — returns `true` when a failed import has a checkpoint with processed records
+- Resume button in the failed import UI (primary action), with Retry demoted to secondary
+- `POST :resume` route on the imports controller
+
+### Fixed
+
+- `handle_failure` now preserves existing report data (parsed counts, partial results) instead of creating a new empty report
+- `parse!` now clears stale checkpoint and progress data from previous import attempts
+
+### Changed
+
+- 574 RSpec examples (up from 551), 0 failures
+
+## [2.5.1] - 2026-02-21
+
+### Fixed
+
+- Display target icon (from `icon` DSL) in the imports index table and show page title/details. Previously the icon was stored in the Registry but never rendered in the UI
+
+## [2.5.0] - 2026-02-21
+
+### Added
+
+- **Bulk import mode** -- Opt-in per-target batch persistence via `bulk_mode batch_size: 500, on_conflict: :retry_per_record`. Uses `insert_all` by default (with auto-injected timestamps) for 10-100x throughput on simple create scenarios. Custom batch logic via `persist_batch` override. Configurable conflict strategy: `:retry_per_record` (default) retries failed batches record-by-record, `:fail_batch` marks entire batch as errored. Progress broadcasts per batch instead of per record
+
+### Changed
+
+- 551 RSpec examples (up from 540), 0 failures
+
 ## [2.4.0] - 2026-02-21
 
 ### Added
```
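The mapping fallback chain described in the Unreleased entry (saved mapping > code-defined > auto-map > empty) can be sketched as a simple "first non-empty wins" lookup. This is a minimal illustration, not the gem's actual method; the function name `effective_mapping` is hypothetical.

```ruby
# Hypothetical sketch of the fallback chain:
# saved mapping > code-defined mapping > auto-map suggestions > empty hash.
def effective_mapping(saved, code_defined, auto)
  # Pick the first candidate that exists and is non-empty.
  [saved, code_defined, auto].find { |m| m && !m.empty? } || {}
end

effective_mapping(nil, {}, { "Email" => "email" })  # => { "Email" => "email" }
effective_mapping({ "A" => "a" }, nil, nil)         # => { "A" => "a" }
```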
data/README.md
CHANGED

```diff
@@ -103,6 +103,8 @@ pending -> parsing -> previewing -> importing -> completed
 
 **[Full documentation on GitHub Pages](https://seryllns.github.io/data_porter/)**
 
+> **Build series**: Want to see how DataPorter was built step by step? [Building DataPorter on dev.to](https://dev.to/seryllns_/series/35813) -- 30 parts covering architecture, TDD, and every feature from first commit to production.
+
 | Topic | Description |
 |---|---|
 | [Configuration](docs/CONFIGURATION.md) | All options, authentication, context builder, real-time updates |
```
data/ROADMAP.md
CHANGED

```diff
@@ -2,18 +2,10 @@
 
 ## Next
 
-### Bulk import
-
-High-volume import support using `insert_all` / `upsert_all` for batch persistence. Opt-in per target to bypass per-record `persist` calls, enabling 10-100x throughput for simple create/upsert scenarios. Configurable batch size, with fallback to per-record mode on conflict.
-
 ### Update & diff mode
 
 Support update (upsert) imports alongside create-only. Given a `deduplicate_by` key, detect existing records and show a diff preview: new records, changed fields (highlighted), unchanged rows. User confirms which changes to apply. Enables recurring data sync workflows.
 
-### Resume / retry on failure
-
-If an import fails mid-way (timeout, crash, transient error), resume from the last successful record instead of restarting from scratch. Track a checkpoint index in the report. Critical for large imports (5k+ records) where re-processing everything is not acceptable.
-
 ### API pagination
 
 Support paginated API sources. The current API source does a single GET, which works for small datasets but not for APIs returning thousands of records across multiple pages. Support offset, cursor, and link-header pagination strategies via `api_config`:
@@ -34,10 +26,6 @@ Headless REST API for programmatic imports:
 - Auth via `config.api_authenticate` lambda (API key or Bearer token)
 - Reuses existing job pipeline (parse, import, dry run)
 
-### Auto-map heuristics
-
-Smart column mapping suggestions using tokenized header matching and synonym dictionaries. When a CSV has "E-mail Address", auto-suggest mapping to `:email`. Built-in synonyms for common patterns (phone → phone_number, first name → first_name). Configurable synonym lists per target.
-
 ---
 
 ## Ideas
```
data/app/controllers/data_porter/concerns/mapping_management.rb
CHANGED

```diff
@@ -27,7 +27,21 @@ module DataPorter
       saved = @import.config&.dig("column_mapping")
       return saved if saved.present?
 
-      (target._csv_mappings || {}).transform_values(&:to_s)
+      code_mapping = (target._csv_mappings || {}).transform_values(&:to_s)
+      return code_mapping if code_mapping.present?
+
+      auto_map_suggestions(target)
+    end
+
+    def auto_map_suggestions(target)
+      columns = target._columns || []
+      return {} if columns.empty? || @file_headers.empty?
+
+      custom = columns.each_with_object({}) do |col, hash|
+        hash[col.name] = col.synonyms if col.synonyms.any?
+      end
+
+      AutoMapper.new(@file_headers, columns.map(&:name), custom_synonyms: custom).call
     end
 
     def save_column_mapping
```
data/app/controllers/data_porter/imports_controller.rb
CHANGED

```diff
@@ -10,7 +10,7 @@ module DataPorter
     layout "data_porter/application"
 
     before_action :set_import, only: %i[show parse confirm cancel dry_run update_mapping
-                                        status export_rejects destroy back_to_mapping]
+                                        status export_rejects destroy back_to_mapping resume]
     before_action :load_targets, only: %i[index new create]
 
     def index
@@ -69,6 +69,12 @@ module DataPorter
       redirect_to import_path(@import)
     end
 
+    def resume
+      @import.update!(status: :pending)
+      DataPorter::ImportJob.perform_later(@import.id)
+      redirect_to import_path(@import)
+    end
+
     def dry_run
       @import.update!(status: :pending)
       DataPorter::DryRunJob.perform_later(@import.id)
```
data/app/models/data_porter/data_import.rb
CHANGED

```diff
@@ -53,12 +53,16 @@ module DataPorter
       records.group_by(&:status).transform_values(&:count)
     end
 
+    def resumable?
+      failed? && config&.dig("checkpoint", "processed").to_i.positive?
+    end
+
     def reset_to_mapping!
       update!(
         status: :mapping,
         records: [],
         report: StoreModels::Report.new,
-        config: (config || {}).except("progress")
+        config: (config || {}).except("progress", "checkpoint")
       )
     end
```
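The `resumable?` predicate above combines two conditions: the import must be failed, and the stored checkpoint must record at least one processed record. A standalone sketch (plain method instead of the model, so it is runnable in isolation):

```ruby
# Stand-in for DataImport#resumable?: failed status AND a checkpoint
# with a positive "processed" count. nil.to_i => 0, so a missing
# checkpoint safely evaluates to false.
def resumable?(status, config)
  status == :failed && config.dig("checkpoint", "processed").to_i.positive?
end

resumable?(:failed, { "checkpoint" => { "processed" => 120 } })  # => true
resumable?(:failed, {})                                          # => false
resumable?(:completed, { "checkpoint" => { "processed" => 5 } }) # => false
```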
data/app/views/data_porter/imports/index.html.erb
CHANGED

```diff
@@ -28,7 +28,7 @@
     <% @imports.each do |import| %>
       <tr>
         <td><%= import.id %></td>
-        <td
+        <td><% target_cls = import.target_class rescue nil %><% if target_cls&._icon.present? %><i class="<%= target_cls._icon %>"></i> <% end %><%= target_cls&._label || import.target_key %></td>
         <td><%= import.source_type %></td>
         <td><%= raw DataPorter::Components::Shared::StatusBadge.new(status: import.status).call %></td>
         <td><%= import.created_at&.strftime("%Y-%m-%d %H:%M") %></td>
```
data/app/views/data_porter/imports/show.html.erb
CHANGED

```diff
@@ -4,7 +4,7 @@
     <%= link_to t("data_porter.imports.back_to_imports"), imports_path, class: "dp-btn dp-btn--secondary" %>
   </div>
   <h1 class="dp-title">
-    <%= t("data_porter.imports.show_title", target: @target._label, id: @import.id) %>
+    <% if @target._icon.present? %><i class="<%= @target._icon %>"></i> <% end %><%= t("data_porter.imports.show_title", target: @target._label, id: @import.id) %>
   </h1>
   <%= raw DataPorter::Components::Shared::StatusBadge.new(status: @import.status).call %>
 </div>
@@ -12,7 +12,7 @@
 <div class="dp-import-details">
   <dl class="dp-details-grid">
     <dt><%= t("data_porter.imports.details.target") %></dt>
-    <dd
+    <dd><% if @target._icon.present? %><i class="<%= @target._icon %>"></i> <% end %><%= @target._label %></dd>
     <dt><%= t("data_porter.imports.details.source") %></dt>
     <dd><%= @import.source_type.upcase %></dd>
     <% if @import.file.attached? %>
@@ -103,8 +103,12 @@
 <% if @import.failed? %>
   <%= raw DataPorter::Components::Shared::FailureAlert.new(report: @import.report).call %>
   <div class="dp-actions">
+    <% if @import.resumable? %>
+      <%= button_to t("data_porter.imports.resume"), resume_import_path(@import),
+          method: :post, class: "dp-btn dp-btn--primary" %>
+    <% end %>
     <%= button_to t("data_porter.imports.retry"), parse_import_path(@import),
-        method: :post, class: "dp-btn dp-btn--primary" %>
+        method: :post, class: "dp-btn dp-btn--secondary" %>
     <%= button_to t("data_porter.imports.delete"), import_path(@import),
         method: :delete, class: "dp-btn dp-btn--danger",
         data: { turbo_confirm: t("data_porter.imports.delete_confirm") } %>
```
data/app/views/layouts/data_porter/application.html.erb
CHANGED

```diff
@@ -6,6 +6,7 @@
     <title>DataPorter</title>
     <%= csrf_meta_tags %>
     <%= stylesheet_link_tag "data_porter/application" %>
+    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.5.1/css/all.min.css" integrity="sha512-DTOQO9RWCH3ppGqcWaEA1BIZOC6xxalwEsw9c2QQeAIftl+Vegovlnee1c9QX4TctnWMn13TZye+giMm8e2LwA==" crossorigin="anonymous" referrerpolicy="no-referrer" />
     <script type="importmap">
       {
         "imports": {
```
data/config/locales/en.yml
CHANGED
data/config/locales/fr.yml
CHANGED
data/config/routes.rb
CHANGED
data/lib/data_porter/auto_mapper.rb
ADDED

```ruby
# frozen_string_literal: true

module DataPorter
  class AutoMapper
    SYNONYMS = {
      email: %w[email e_mail email_address e_mail_address courriel mail],
      first_name: %w[first_name firstname fname first prenom],
      last_name: %w[last_name lastname lname last nom],
      name: %w[name full_name fullname nom_complet],
      phone_number: %w[phone_number phone tel telephone mobile cell],
      address: %w[address addr street adresse],
      city: %w[city ville town],
      zip_code: %w[zip_code zip postal_code postcode code_postal],
      country: %w[country pays nation],
      company: %w[company company_name organization organisation entreprise societe],
      title: %w[title job_title position titre poste],
      description: %w[description desc notes],
      quantity: %w[quantity qty amount],
      price: %w[price unit_price prix montant],
      date: %w[date created_at updated_at],
      status: %w[status state statut etat],
      id: %w[id identifier external_id ref reference]
    }.freeze

    def initialize(headers, target_columns, custom_synonyms: {})
      @headers = headers
      @target_columns = target_columns.map(&:to_s)
      @custom_synonyms = custom_synonyms
    end

    def call
      used = Set.new
      @headers.each_with_object({}) do |header, mapping|
        match = find_match(header, used)
        used.add(match) if match
        mapping[header] = match || ""
      end
    end

    private

    def find_match(header, used)
      normalized = normalize(header)
      return nil if normalized.empty?

      exact_match(normalized, used) || synonym_match(normalized, used)
    end

    def exact_match(normalized, used)
      @target_columns.find { |col| col == normalized && !used.include?(col) }
    end

    def synonym_match(normalized, used)
      lookup_table[normalized]&.find { |col| !used.include?(col) }
    end

    def lookup_table
      @lookup_table ||= build_lookup_table
    end

    def build_lookup_table
      table = Hash.new { |h, k| h[k] = [] }
      merged_synonyms.each do |column, synonyms|
        col_name = column.to_s
        next unless @target_columns.include?(col_name)

        synonyms.each { |syn| table[syn] << col_name }
      end
      table
    end

    def merged_synonyms
      result = SYNONYMS.transform_values(&:dup)
      @custom_synonyms.each do |column, syns|
        key = column.to_sym
        result[key] = (result.fetch(key, []) + syns.map { |s| normalize(s) }).uniq
      end
      result
    end

    def normalize(header)
      return "" if header.nil?

      header.to_s.strip.downcase.gsub(/[\s-]+/, "_").gsub(/[^a-z0-9_]/, "")
    end
  end
end
```
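The matching pipeline above — normalize the header, try an exact column match, then a synonym match, consuming each target column at most once — can be exercised with a trimmed standalone sketch (a reduced synonym table; `auto_map` is a hypothetical free-function version of `AutoMapper#call`):

```ruby
require "set"

# Reduced synonym table for illustration only.
SYNONYMS = {
  "email"      => %w[email e_mail email_address e_mail_address mail],
  "first_name" => %w[first_name firstname fname first]
}.freeze

# Same normalization as AutoMapper#normalize: lowercase, collapse
# whitespace/hyphens to underscores, strip everything else.
def normalize(header)
  header.to_s.strip.downcase.gsub(/[\s-]+/, "_").gsub(/[^a-z0-9_]/, "")
end

def auto_map(headers, target_columns)
  # Invert the synonym table into synonym => [columns].
  lookup = Hash.new { |h, k| h[k] = [] }
  SYNONYMS.each do |col, syns|
    next unless target_columns.include?(col)
    syns.each { |s| lookup[s] << col }
  end

  used = Set.new
  headers.each_with_object({}) do |header, mapping|
    n = normalize(header)
    match = target_columns.find { |c| c == n && !used.include?(c) } ||
            lookup[n].find { |c| !used.include?(c) }
    used.add(match) if match
    mapping[header] = match || ""
  end
end

auto_map(["E-mail Address", "fname", "Age"], %w[email first_name])
# => { "E-mail Address" => "email", "fname" => "first_name", "Age" => "" }
```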
data/lib/data_porter/dsl/column.rb
CHANGED

```diff
@@ -2,14 +2,15 @@
 
 module DataPorter
   module DSL
-    Column = Struct.new(:name, :type, :required, :label, :transform, :options, keyword_init: true) do
-      def initialize(name:, type: :string, required: false, label: nil, transform: [], **options)
+    Column = Struct.new(:name, :type, :required, :label, :transform, :synonyms, :options, keyword_init: true) do
+      def initialize(name:, type: :string, required: false, label: nil, transform: [], synonyms: [], **options)
         super(
           name: name.to_sym,
           type: type.to_sym,
           required: required,
           label: label || name.to_s.humanize,
           transform: Array(transform),
+          synonyms: Array(synonyms),
           options: options
         )
       end
```
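The `synonyms:` keyword is wrapped with `Array()`, so a single string, an array, or the default all end up as an array. A minimal stand-in for the struct change (only the two relevant members, so it runs without the gem):

```ruby
# Reduced model of the Column struct: Array() normalizes the
# synonyms keyword, mirroring how transform is already handled.
Column = Struct.new(:name, :synonyms, keyword_init: true) do
  def initialize(name:, synonyms: [])
    super(name: name.to_sym, synonyms: Array(synonyms))
  end
end

Column.new(name: "email", synonyms: "courriel").synonyms  # => ["courriel"]
Column.new(name: "phone").synonyms                        # => []
```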
data/lib/data_porter/orchestrator/bulk_importer.rb
ADDED

```ruby
# frozen_string_literal: true

module DataPorter
  class Orchestrator
    module BulkImporter
      private

      def import_bulk
        importable = @data_import.importable_records
        checkpoint = load_checkpoint
        @bulk_state = build_bulk_state(importable, checkpoint)

        process_batches(importable.drop(checkpoint[:processed]))
        finalize_import(@bulk_state[:results])
      end

      def build_bulk_state(importable, checkpoint)
        {
          context: build_context,
          bulk_config: @target.class._bulk_config,
          results: seed_results(checkpoint),
          total: importable.size,
          processed: checkpoint[:processed]
        }
      end

      def process_batches(records)
        records.each_slice(@bulk_state[:bulk_config][:batch_size]) do |batch|
          persist_batch_with_fallback(batch)
          @bulk_state[:processed] += batch.size
          broadcast_progress(@bulk_state[:processed], @bulk_state[:total], results: @bulk_state[:results])
        end
      end

      def persist_batch_with_fallback(batch)
        @target.persist_batch(batch, context: @bulk_state[:context])
        @bulk_state[:results][:created] += batch.size
      rescue StandardError => e
        handle_batch_failure(batch, e)
      end

      def handle_batch_failure(batch, error)
        if @bulk_state[:bulk_config][:on_conflict] == :fail_batch
          fail_batch(batch, error)
        else
          retry_per_record(batch)
        end
      end

      def fail_batch(batch, error)
        batch.each { |record| record.add_error(error.message) }
        @bulk_state[:results][:errored] += batch.size
      end

      def retry_per_record(batch)
        batch.each do |record|
          persist_record(record, @bulk_state[:context], @bulk_state[:results])
        end
      end
    end
  end
end
```
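The batch-with-fallback strategy above tries a whole slice at once and, on error, either fails the batch or retries record by record depending on `on_conflict`. A self-contained sketch of the control flow (hash records with a `:bad` flag stand in for real persistence; the function name is hypothetical):

```ruby
# Sketch of the bulk strategy: each_slice batches, whole-batch
# insert first, then the configured fallback on failure.
def import_in_batches(records, batch_size:, on_conflict: :retry_per_record)
  results = { created: 0, errored: 0 }
  records.each_slice(batch_size) do |batch|
    begin
      # Stand-in for insert_all: a batch fails if any record is bad.
      raise "duplicate key" if batch.any? { |r| r[:bad] }
      results[:created] += batch.size
    rescue StandardError
      if on_conflict == :fail_batch
        results[:errored] += batch.size          # whole batch errored
      else
        batch.each do |r|                        # retry one by one
          r[:bad] ? results[:errored] += 1 : results[:created] += 1
        end
      end
    end
  end
  results
end

records = [{}, {}, { bad: true }, {}, {}]
import_in_batches(records, batch_size: 2)
# => { created: 4, errored: 1 }
```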
data/lib/data_porter/orchestrator/importer.rb
CHANGED

```diff
@@ -6,7 +6,9 @@ module DataPorter
     private
 
     def import_records
-      if DataPorter.configuration.transaction_mode == :all
+      if @target.class._bulk_config
+        import_bulk
+      elsif DataPorter.configuration.transaction_mode == :all
         import_all_or_nothing
       else
         import_per_record
@@ -16,12 +18,14 @@ module DataPorter
     def import_per_record
       importable = @data_import.importable_records
       context = build_context
-      results = { created: 0, errored: 0 }
+      checkpoint = load_checkpoint
+      results = seed_results(checkpoint)
+      remaining = importable.drop(checkpoint[:processed])
       total = importable.size
 
-      importable.each_with_index do |record, index|
+      remaining.each_with_index do |record, index|
         persist_record(record, context, results)
-        broadcast_progress(index + 1, total)
+        broadcast_progress(checkpoint[:processed] + index + 1, total, results: results)
       end
 
       finalize_import(results)
@@ -43,6 +47,7 @@ module DataPorter
     end
 
     def finalize_import(results)
+      clear_checkpoint
       @data_import.update!(status: :completed)
       @broadcaster.success
       WebhookNotifier.notify(@data_import, "import.completed")
@@ -64,6 +69,25 @@ module DataPorter
       report.errored_count = results[:errored]
       @data_import.update!(report: report)
     end
+
+    def load_checkpoint
+      cp = @data_import.config&.dig("checkpoint") || {}
+      {
+        processed: cp["processed"].to_i,
+        created: cp["created"].to_i,
+        errored: cp["errored"].to_i
+      }
+    end
+
+    def seed_results(checkpoint)
+      { created: checkpoint[:created], errored: checkpoint[:errored] }
+    end
+
+    def clear_checkpoint
+      config = @data_import.config || {}
+      config.delete("checkpoint")
+      @data_import.update_column(:config, config)
+    end
   end
 end
```
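The resume logic in `import_per_record` boils down to: seed the results from the stored checkpoint, then `drop` the already-processed prefix so only the remainder is touched. A minimal runnable model of that flow (plain hashes instead of the model; `resume_import` and the counting stand-in are hypothetical):

```ruby
# Read the checkpoint hash, defaulting every counter to 0 when absent,
# matching load_checkpoint's shape.
def load_checkpoint(config)
  cp = config.fetch("checkpoint", {})
  { processed: cp["processed"].to_i, created: cp["created"].to_i, errored: cp["errored"].to_i }
end

def resume_import(records, config)
  checkpoint = load_checkpoint(config)
  results = { created: checkpoint[:created], errored: checkpoint[:errored] }  # seed_results
  records.drop(checkpoint[:processed]).each do |_record|
    results[:created] += 1  # stand-in for persist_record succeeding
  end
  results
end

config = { "checkpoint" => { "processed" => 3, "created" => 2, "errored" => 1 } }
resume_import((1..10).to_a, config)  # => { created: 9, errored: 1 }
```

A fresh run (no checkpoint) degenerates to processing everything from zero, which is why the same code path serves both first attempts and resumes.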
data/lib/data_porter/orchestrator.rb
CHANGED

```diff
@@ -2,12 +2,14 @@
 
 require_relative "orchestrator/record_builder"
 require_relative "orchestrator/importer"
+require_relative "orchestrator/bulk_importer"
 require_relative "orchestrator/dry_runner"
 
 module DataPorter
   class Orchestrator
     include RecordBuilder
     include Importer
+    include BulkImporter
     include DryRunner
 
     def initialize(data_import, content: nil)
@@ -30,6 +32,7 @@ module DataPorter
     def parse!
       @data_import.parsing!
       records = build_records
+      clear_stale_import_data
       @data_import.update!(records: records, status: :previewing)
       build_report
       WebhookNotifier.notify(@data_import, "import.parsed")
@@ -90,18 +93,36 @@ module DataPorter
       DataPorter.configuration.context_builder&.call(@data_import)
     end
 
-    def broadcast_progress(current, total)
-      percentage = ((current.to_f / total) * 100).round
+    def broadcast_progress(current, total, results: nil)
       config = @data_import.config || {}
-      config["progress"] = { "current" => current, "total" => total, "percentage" => percentage }
+      config["progress"] = { "current" => current, "total" => total, "percentage" => pct(current, total) }
+      save_checkpoint(config, current, results) if results
       @data_import.update_column(:config, config)
       @broadcaster.progress(current, total)
     end
 
+    def pct(current, total)
+      ((current.to_f / total) * 100).round
+    end
+
+    def save_checkpoint(config, processed, results)
+      config["checkpoint"] = {
+        "processed" => processed,
+        "created" => results[:created],
+        "errored" => results[:errored]
+      }
+    end
+
+    def clear_stale_import_data
+      config = @data_import.config || {}
+      config.delete("checkpoint")
+      config.delete("progress")
+      @data_import.config = config
+    end
+
     def handle_failure(error)
-      report = StoreModels::Report.new(
-        error_reports: [StoreModels::Error.new(message: error.message)]
-      )
+      report = @data_import.report || StoreModels::Report.new
+      report.error_reports = [StoreModels::Error.new(message: error.message)]
       @data_import.update!(status: :failed, report: report)
       @broadcaster.failure(error.message)
       WebhookNotifier.notify(@data_import, "import.failed")
```
data/lib/data_porter/target.rb
CHANGED

```diff
@@ -10,7 +10,8 @@ module DataPorter
     class << self
       attr_reader :_label, :_model_name, :_icon, :_sources,
                   :_columns, :_csv_mappings, :_dedup_keys, :_json_root,
-                  :_api_config, :_dry_run_enabled, :_params, :_webhooks
+                  :_api_config, :_dry_run_enabled, :_params, :_webhooks,
+                  :_bulk_config
 
       def label(value)
         @_label = value
@@ -82,6 +83,10 @@ module DataPorter
         @_webhooks << DSL::Webhook.new(url: url, **)
       end
 
+      def bulk_mode(batch_size: 500, on_conflict: :retry_per_record)
+        @_bulk_config = { batch_size: batch_size, on_conflict: on_conflict }
+      end
+
       private
 
       def auto_register
@@ -108,6 +113,16 @@ module DataPorter
       raise NotImplementedError
     end
 
+    def persist_batch(records, context: nil) # rubocop:disable Lint/UnusedMethodArgument
+      raise Error, "model_name is required for default persist_batch" unless self.class._model_name
+
+      now = Time.current
+      model_class = self.class._model_name.constantize
+      model_class.insert_all(
+        records.map { |r| r.data.merge("created_at" => now, "updated_at" => now) }
+      )
+    end
+
     def after_import(_results, context:); end
 
     def on_error(_record, _error, context:); end
```
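The `bulk_mode` class macro above is a plain DSL setter: it records a config hash in a class-level instance variable that the orchestrator later reads through `_bulk_config`. A stripped-down, runnable model of that pattern (the `Target`/`ProductTarget` classes here are stand-ins, not the gem's):

```ruby
# Minimal model of the class-macro pattern: bulk_mode stores its
# options on the calling class; the attr_reader exposes them.
class Target
  class << self
    attr_reader :_bulk_config

    def bulk_mode(batch_size: 500, on_conflict: :retry_per_record)
      @_bulk_config = { batch_size: batch_size, on_conflict: on_conflict }
    end
  end
end

class ProductTarget < Target
  bulk_mode batch_size: 1000
end

ProductTarget._bulk_config  # => { batch_size: 1000, on_conflict: :retry_per_record }
Target._bulk_config         # => nil (bulk mode stays opt-in per target)
```

Because the ivar lives on each subclass's singleton, a target that never calls `bulk_mode` returns `nil`, which is exactly the check `import_records` uses to pick bulk vs. per-record mode.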
data/lib/data_porter/version.rb
CHANGED
data/lib/data_porter.rb
CHANGED

```diff
@@ -20,6 +20,7 @@ require_relative "data_porter/record_validator"
 require_relative "data_porter/broadcaster"
 require_relative "data_porter/webhook_notifier"
 require_relative "data_porter/orchestrator"
+require_relative "data_porter/auto_mapper"
 require_relative "data_porter/rejects_csv_builder"
 require_relative "data_porter/components"
 require_relative "data_porter/engine"
```
data/lib/generators/data_porter/install/templates/initializer.rb
CHANGED

```diff
@@ -37,6 +37,10 @@ DataPorter.configure do |config|
   # Set to nil to disable auto-purge. Run `rake data_porter:purge` manually or via cron.
   # config.purge_after = 60.days
 
+  # Bulk import: enable per-target via `bulk_mode` in your Target class.
+  # Uses insert_all for 10-100x throughput on large imports.
+  # See docs/ADVANCED.md for configuration options.
+
   # HMAC-SHA256 secret for signing webhook payloads.
   # When set, every webhook request includes an X-DataPorter-Signature header.
   # Set to nil to disable signing (default).
```
data/mkdocs.yml
CHANGED

```diff
@@ -92,7 +92,9 @@ nav:
   - Targets: TARGETS.md
   - Sources: SOURCES.md
   - Column Mapping: MAPPING.md
+  - Views & Theming: VIEWS.md
   - Routes: routes.md
+  - Advanced: ADVANCED.md
   - Roadmap: ROADMAP.md
   - Changelog: changelog.md
   - Contributing: contributing.md
```
metadata
CHANGED

```diff
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: data_porter
 version: !ruby/object:Gem::Version
-  version: 2.4.0
+  version: 2.6.0
 platform: ruby
 authors:
 - Seryl Lounis
@@ -146,6 +146,7 @@ files:
 - config/locales/fr.yml
 - config/routes.rb
 - lib/data_porter.rb
+- lib/data_porter/auto_mapper.rb
 - lib/data_porter/broadcaster.rb
 - lib/data_porter/column_transformer.rb
 - lib/data_porter/components.rb
@@ -167,6 +168,7 @@ files:
 - lib/data_porter/dsl/webhook.rb
 - lib/data_porter/engine.rb
 - lib/data_porter/orchestrator.rb
+- lib/data_porter/orchestrator/bulk_importer.rb
 - lib/data_porter/orchestrator/dry_runner.rb
 - lib/data_porter/orchestrator/importer.rb
 - lib/data_porter/orchestrator/record_builder.rb
```