data_porter 2.5.1 → 2.6.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +18 -0
- data/README.md +2 -0
- data/ROADMAP.md +0 -4
- data/app/controllers/data_porter/imports_controller.rb +7 -1
- data/app/models/data_porter/data_import.rb +5 -1
- data/app/views/data_porter/imports/show.html.erb +5 -1
- data/config/locales/en.yml +1 -0
- data/config/locales/fr.yml +1 -0
- data/config/routes.rb +1 -0
- data/lib/data_porter/orchestrator/bulk_importer.rb +35 -27
- data/lib/data_porter/orchestrator/importer.rb +25 -3
- data/lib/data_porter/orchestrator.rb +25 -6
- data/lib/data_porter/version.rb +1 -1
- data/mkdocs.yml +1 -1
- metadata +1 -1
checksums.yaml
CHANGED

```diff
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 955a886124d8ff2f1da4f23e725a52caec3bbc635450f058e3bbce81f4b898f5
+  data.tar.gz: 6f1f0be41999d105c7558b7192f3f61d79f935431ce7b99bede5753d69be9ce3
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 399c87e6daa56196ae96525ca75b27a464a109463d3f4ebf5361738a93cc6385b30e37e588cab45c3a2218b99596dabcd9a48b0a9856eac17902907a9f4f9a34
+  data.tar.gz: f423073b27fc407cc94ecc8e11693d40da5210387129f428105ac4973d7b7be034a59b8161c2a4da3c99acc7e4ec1b13d4828c54336e12d441044549b6243151
```
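The published checksums can be verified against a downloaded artifact with Ruby's stdlib `Digest`; the helper name below is illustrative, not part of the gem:

```ruby
require "digest"

# Recompute the SHA256 of a local file and compare it to the published
# hex checksum from checksums.yaml.
def checksum_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex
end
```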
data/CHANGELOG.md
CHANGED

```diff
@@ -11,6 +11,24 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - **Auto-map heuristics** -- Smart column suggestions that pre-fill mapping selects when CSV/XLSX headers match target fields by exact name or built-in synonym (e.g. "E-mail Address" → email, "fname" → first_name). Supports per-column custom synonyms via `synonyms:` keyword in column DSL. Fallback chain: saved mapping > code-defined > auto-map > empty
 
+## [2.6.0] - 2026-02-21
+
+### Added
+
+- **Resume on failure** -- When an import fails mid-way (crash, timeout, exception), resume from the last successful record instead of re-importing from scratch. Progress checkpoints are stored in the existing `config` JSONB column alongside `broadcast_progress` -- zero additional DB operations or migrations. Works with both per-record and bulk import modes
+- `resumable?` predicate on `DataImport` -- returns `true` when a failed import has a checkpoint with processed records
+- Resume button in the failed import UI (primary action), with Retry demoted to secondary
+- `POST :resume` route on the imports controller
+
+### Fixed
+
+- `handle_failure` now preserves existing report data (parsed counts, partial results) instead of creating a new empty report
+- `parse!` now clears stale checkpoint and progress data from previous import attempts
+
+### Changed
+
+- 574 RSpec examples (up from 551), 0 failures
+
 ## [2.5.1] - 2026-02-21
 
 ### Fixed
```
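The resume decision described in the changelog can be restated as a pure function, assuming the checkpoint shape `{"checkpoint" => {"processed" => n, ...}}` in the `config` JSONB column; `status` here stands in for the model's `failed?` state:

```ruby
# A failed import is resumable only if a checkpoint recorded at least one
# processed record; anything else falls back to a full retry.
def resumable?(status, config)
  status == "failed" && (config || {}).dig("checkpoint", "processed").to_i.positive?
end
```

`to_i` on `nil` yields `0`, so a missing or empty checkpoint never reports resumable.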
data/README.md
CHANGED

```diff
@@ -103,6 +103,8 @@ pending -> parsing -> previewing -> importing -> completed
 
 **[Full documentation on GitHub Pages](https://seryllns.github.io/data_porter/)**
 
+> **Build series**: Want to see how DataPorter was built step by step? [Building DataPorter on dev.to](https://dev.to/seryllns_/series/35813) -- 30 parts covering architecture, TDD, and every feature from first commit to production.
+
 | Topic | Description |
 |---|---|
 | [Configuration](docs/CONFIGURATION.md) | All options, authentication, context builder, real-time updates |
```
data/ROADMAP.md
CHANGED

```diff
@@ -6,10 +6,6 @@
 Support update (upsert) imports alongside create-only. Given a `deduplicate_by` key, detect existing records and show a diff preview: new records, changed fields (highlighted), unchanged rows. User confirms which changes to apply. Enables recurring data sync workflows.
 
-### Resume / retry on failure
-
-If an import fails mid-way (timeout, crash, transient error), resume from the last successful record instead of restarting from scratch. Track a checkpoint index in the report. Critical for large imports (5k+ records) where re-processing everything is not acceptable.
-
 ### API pagination
 
 Support paginated API sources. The current API source does a single GET, which works for small datasets but not for APIs returning thousands of records across multiple pages. Support offset, cursor, and link-header pagination strategies via `api_config`:
```
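The offset strategy from that roadmap entry can be sketched as a plain loop, under the assumption that a short page signals the end of the dataset; `fetch_page` is a stand-in for the HTTP call and is not part of the gem's API:

```ruby
# Accumulate pages until one comes back shorter than per_page, advancing
# the offset by a full page each round.
def fetch_all(per_page:, &fetch_page)
  records = []
  offset = 0
  loop do
    page = fetch_page.call(offset, per_page)
    records.concat(page)
    break if page.size < per_page
    offset += per_page
  end
  records
end
```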
data/app/controllers/data_porter/imports_controller.rb
CHANGED

```diff
@@ -10,7 +10,7 @@ module DataPorter
     layout "data_porter/application"
 
     before_action :set_import, only: %i[show parse confirm cancel dry_run update_mapping
-                                        status export_rejects destroy back_to_mapping]
+                                        status export_rejects destroy back_to_mapping resume]
     before_action :load_targets, only: %i[index new create]
 
     def index
@@ -69,6 +69,12 @@ module DataPorter
       redirect_to import_path(@import)
     end
 
+    def resume
+      @import.update!(status: :pending)
+      DataPorter::ImportJob.perform_later(@import.id)
+      redirect_to import_path(@import)
+    end
+
     def dry_run
       @import.update!(status: :pending)
       DataPorter::DryRunJob.perform_later(@import.id)
```
data/app/models/data_porter/data_import.rb
CHANGED

```diff
@@ -53,12 +53,16 @@ module DataPorter
       records.group_by(&:status).transform_values(&:count)
     end
 
+    def resumable?
+      failed? && config&.dig("checkpoint", "processed").to_i.positive?
+    end
+
     def reset_to_mapping!
       update!(
         status: :mapping,
         records: [],
         report: StoreModels::Report.new,
-        config: (config || {}).except("progress")
+        config: (config || {}).except("progress", "checkpoint")
       )
     end
```
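The config pruning that `reset_to_mapping!` now performs can be isolated as a pure function (`Hash#except` is core Ruby 3.0+); the helper name is illustrative:

```ruby
# Returning to the mapping step drops both the broadcast progress and the
# resume checkpoint, so a remapped import starts clean.
def prune_reset_config(config)
  (config || {}).except("progress", "checkpoint")
end
```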
data/app/views/data_porter/imports/show.html.erb
CHANGED

```diff
@@ -103,8 +103,12 @@
   <% if @import.failed? %>
     <%= raw DataPorter::Components::Shared::FailureAlert.new(report: @import.report).call %>
     <div class="dp-actions">
+      <% if @import.resumable? %>
+        <%= button_to t("data_porter.imports.resume"), resume_import_path(@import),
+                      method: :post, class: "dp-btn dp-btn--primary" %>
+      <% end %>
       <%= button_to t("data_porter.imports.retry"), parse_import_path(@import),
-                    method: :post, class: "dp-btn dp-btn--
+                    method: :post, class: "dp-btn dp-btn--secondary" %>
       <%= button_to t("data_porter.imports.delete"), import_path(@import),
                     method: :delete, class: "dp-btn dp-btn--danger",
                     data: { turbo_confirm: t("data_porter.imports.delete_confirm") } %>
```
data/config/locales/en.yml
CHANGED
data/config/locales/fr.yml
CHANGED
data/config/routes.rb
CHANGED
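The diff body for `config/routes.rb` is not shown here. Given the changelog's "`POST :resume` route on the imports controller", the one-line addition is presumably a member route of this shape (a hypothetical sketch, not the shipped file):

```ruby
DataPorter::Engine.routes.draw do
  resources :imports do
    member { post :resume }
  end
end
```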
data/lib/data_porter/orchestrator/bulk_importer.rb
CHANGED

```diff
@@ -7,46 +7,54 @@ module DataPorter
 
     def import_bulk
       importable = @data_import.importable_records
-
-
-
-
-
-
-
-
-
-
-
+      checkpoint = load_checkpoint
+      @bulk_state = build_bulk_state(importable, checkpoint)
+
+      process_batches(importable.drop(checkpoint[:processed]))
+      finalize_import(@bulk_state[:results])
+    end
+
+    def build_bulk_state(importable, checkpoint)
+      {
+        context: build_context,
+        bulk_config: @target.class._bulk_config,
+        results: seed_results(checkpoint),
+        total: importable.size,
+        processed: checkpoint[:processed]
+      }
+    end
 
-
+    def process_batches(records)
+      records.each_slice(@bulk_state[:bulk_config][:batch_size]) do |batch|
+        persist_batch_with_fallback(batch)
+        @bulk_state[:processed] += batch.size
+        broadcast_progress(@bulk_state[:processed], @bulk_state[:total], results: @bulk_state[:results])
+      end
     end
 
-    def persist_batch_with_fallback(batch
-      @target.persist_batch(batch, context: context)
-      results[:created] += batch.size
+    def persist_batch_with_fallback(batch)
+      @target.persist_batch(batch, context: @bulk_state[:context])
+      @bulk_state[:results][:created] += batch.size
     rescue StandardError => e
-      handle_batch_failure(batch,
+      handle_batch_failure(batch, e)
     end
 
-    def handle_batch_failure(batch,
-      if
-        fail_batch(batch,
+    def handle_batch_failure(batch, error)
+      if @bulk_state[:bulk_config][:on_conflict] == :fail_batch
+        fail_batch(batch, error)
       else
-        retry_per_record(batch
+        retry_per_record(batch)
       end
     end
 
-    def fail_batch(batch,
-      batch.each
-
-      end
-      results[:errored] += batch.size
+    def fail_batch(batch, error)
+      batch.each { |record| record.add_error(error.message) }
+      @bulk_state[:results][:errored] += batch.size
    end
 
-    def retry_per_record(batch
+    def retry_per_record(batch)
       batch.each do |record|
-        persist_record(record, context, results)
+        persist_record(record, @bulk_state[:context], @bulk_state[:results])
       end
     end
   end
```
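The bulk resume arithmetic from `import_bulk`/`process_batches` reduces to a pure function: skip the already-processed prefix, then re-batch the remainder. No database or target object is involved in this sketch:

```ruby
# Records before the checkpoint are never revisited; the rest are grouped
# into batches of batch_size, mirroring each_slice in process_batches.
def batches_after_checkpoint(records, processed, batch_size)
  records.drop(processed).each_slice(batch_size).to_a
end
```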
data/lib/data_porter/orchestrator/importer.rb
CHANGED

```diff
@@ -18,12 +18,14 @@ module DataPorter
     def import_per_record
       importable = @data_import.importable_records
       context = build_context
-
+      checkpoint = load_checkpoint
+      results = seed_results(checkpoint)
+      remaining = importable.drop(checkpoint[:processed])
       total = importable.size
 
-
+      remaining.each_with_index do |record, index|
         persist_record(record, context, results)
-        broadcast_progress(index + 1, total)
+        broadcast_progress(checkpoint[:processed] + index + 1, total, results: results)
       end
 
       finalize_import(results)
@@ -45,6 +47,7 @@ module DataPorter
     end
 
     def finalize_import(results)
+      clear_checkpoint
       @data_import.update!(status: :completed)
       @broadcaster.success
       WebhookNotifier.notify(@data_import, "import.completed")
@@ -66,6 +69,25 @@ module DataPorter
       report.errored_count = results[:errored]
       @data_import.update!(report: report)
     end
+
+    def load_checkpoint
+      cp = @data_import.config&.dig("checkpoint") || {}
+      {
+        processed: cp["processed"].to_i,
+        created: cp["created"].to_i,
+        errored: cp["errored"].to_i
+      }
+    end
+
+    def seed_results(checkpoint)
+      { created: checkpoint[:created], errored: checkpoint[:errored] }
+    end
+
+    def clear_checkpoint
+      config = @data_import.config || {}
+      config.delete("checkpoint")
+      @data_import.update_column(:config, config)
+    end
     end
   end
 end
```
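`load_checkpoint`'s defaulting behaviour can be shown standalone by passing the config hash in as a parameter: a missing or partial checkpoint yields zeros, so a fresh import and a resume share one code path:

```ruby
# JSONB round-trips string keys; to_i coerces nil (absent key) to 0, so
# every field has a safe default without explicit nil checks.
def load_checkpoint(config)
  cp = (config || {}).dig("checkpoint") || {}
  {
    processed: cp["processed"].to_i,
    created: cp["created"].to_i,
    errored: cp["errored"].to_i
  }
end
```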
data/lib/data_porter/orchestrator.rb
CHANGED

```diff
@@ -32,6 +32,7 @@ module DataPorter
     def parse!
       @data_import.parsing!
       records = build_records
+      clear_stale_import_data
       @data_import.update!(records: records, status: :previewing)
       build_report
       WebhookNotifier.notify(@data_import, "import.parsed")
@@ -92,18 +93,36 @@ module DataPorter
       DataPorter.configuration.context_builder&.call(@data_import)
     end
 
-    def broadcast_progress(current, total)
-      percentage = ((current.to_f / total) * 100).round
+    def broadcast_progress(current, total, results: nil)
       config = @data_import.config || {}
-      config["progress"] = { "current" => current, "total" => total, "percentage" =>
+      config["progress"] = { "current" => current, "total" => total, "percentage" => pct(current, total) }
+      save_checkpoint(config, current, results) if results
       @data_import.update_column(:config, config)
       @broadcaster.progress(current, total)
     end
 
+    def pct(current, total)
+      ((current.to_f / total) * 100).round
+    end
+
+    def save_checkpoint(config, processed, results)
+      config["checkpoint"] = {
+        "processed" => processed,
+        "created" => results[:created],
+        "errored" => results[:errored]
+      }
+    end
+
+    def clear_stale_import_data
+      config = @data_import.config || {}
+      config.delete("checkpoint")
+      config.delete("progress")
+      @data_import.config = config
+    end
+
     def handle_failure(error)
-      report = StoreModels::Report.new
-
-      )
+      report = @data_import.report || StoreModels::Report.new
+      report.error_reports = [StoreModels::Error.new(message: error.message)]
       @data_import.update!(status: :failed, report: report)
       @broadcaster.failure(error.message)
       WebhookNotifier.notify(@data_import, "import.failed")
```
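The extracted `pct` helper from this diff, together with the progress payload `broadcast_progress` writes into `config` (`progress_payload` is an illustrative name for that hash, not a method in the gem):

```ruby
# Integer percentage, rounded to the nearest whole point.
def pct(current, total)
  ((current.to_f / total) * 100).round
end

# The "progress" entry persisted alongside the checkpoint in the config column.
def progress_payload(current, total)
  { "current" => current, "total" => total, "percentage" => pct(current, total) }
end
```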
data/lib/data_porter/version.rb
CHANGED
data/mkdocs.yml
CHANGED