data_porter 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (159)
  1. checksums.yaml +7 -0
  2. data/.claude/commands/blog-status.md +10 -0
  3. data/.claude/commands/blog.md +109 -0
  4. data/.claude/commands/task-done.md +27 -0
  5. data/.claude/commands/tm/add-dependency.md +58 -0
  6. data/.claude/commands/tm/add-subtask.md +79 -0
  7. data/.claude/commands/tm/add-task.md +81 -0
  8. data/.claude/commands/tm/analyze-complexity.md +124 -0
  9. data/.claude/commands/tm/analyze-project.md +100 -0
  10. data/.claude/commands/tm/auto-implement-tasks.md +100 -0
  11. data/.claude/commands/tm/command-pipeline.md +80 -0
  12. data/.claude/commands/tm/complexity-report.md +120 -0
  13. data/.claude/commands/tm/convert-task-to-subtask.md +74 -0
  14. data/.claude/commands/tm/expand-all-tasks.md +52 -0
  15. data/.claude/commands/tm/expand-task.md +52 -0
  16. data/.claude/commands/tm/fix-dependencies.md +82 -0
  17. data/.claude/commands/tm/help.md +101 -0
  18. data/.claude/commands/tm/init-project-quick.md +49 -0
  19. data/.claude/commands/tm/init-project.md +53 -0
  20. data/.claude/commands/tm/install-taskmaster.md +118 -0
  21. data/.claude/commands/tm/learn.md +106 -0
  22. data/.claude/commands/tm/list-tasks-by-status.md +42 -0
  23. data/.claude/commands/tm/list-tasks-with-subtasks.md +30 -0
  24. data/.claude/commands/tm/list-tasks.md +46 -0
  25. data/.claude/commands/tm/next-task.md +69 -0
  26. data/.claude/commands/tm/parse-prd-with-research.md +51 -0
  27. data/.claude/commands/tm/parse-prd.md +52 -0
  28. data/.claude/commands/tm/project-status.md +67 -0
  29. data/.claude/commands/tm/quick-install-taskmaster.md +23 -0
  30. data/.claude/commands/tm/remove-all-subtasks.md +94 -0
  31. data/.claude/commands/tm/remove-dependency.md +65 -0
  32. data/.claude/commands/tm/remove-subtask.md +87 -0
  33. data/.claude/commands/tm/remove-subtasks.md +89 -0
  34. data/.claude/commands/tm/remove-task.md +110 -0
  35. data/.claude/commands/tm/setup-models.md +52 -0
  36. data/.claude/commands/tm/show-task.md +85 -0
  37. data/.claude/commands/tm/smart-workflow.md +58 -0
  38. data/.claude/commands/tm/sync-readme.md +120 -0
  39. data/.claude/commands/tm/tm-main.md +147 -0
  40. data/.claude/commands/tm/to-cancelled.md +58 -0
  41. data/.claude/commands/tm/to-deferred.md +50 -0
  42. data/.claude/commands/tm/to-done.md +47 -0
  43. data/.claude/commands/tm/to-in-progress.md +39 -0
  44. data/.claude/commands/tm/to-pending.md +35 -0
  45. data/.claude/commands/tm/to-review.md +43 -0
  46. data/.claude/commands/tm/update-single-task.md +122 -0
  47. data/.claude/commands/tm/update-task.md +75 -0
  48. data/.claude/commands/tm/update-tasks-from-id.md +111 -0
  49. data/.claude/commands/tm/validate-dependencies.md +72 -0
  50. data/.claude/commands/tm/view-models.md +52 -0
  51. data/.env.example +12 -0
  52. data/.mcp.json +24 -0
  53. data/.taskmaster/CLAUDE.md +435 -0
  54. data/.taskmaster/config.json +44 -0
  55. data/.taskmaster/docs/prd.txt +2044 -0
  56. data/.taskmaster/state.json +6 -0
  57. data/.taskmaster/tasks/task_001.md +19 -0
  58. data/.taskmaster/tasks/task_002.md +19 -0
  59. data/.taskmaster/tasks/task_003.md +19 -0
  60. data/.taskmaster/tasks/task_004.md +19 -0
  61. data/.taskmaster/tasks/task_005.md +19 -0
  62. data/.taskmaster/tasks/task_006.md +19 -0
  63. data/.taskmaster/tasks/task_007.md +19 -0
  64. data/.taskmaster/tasks/task_008.md +19 -0
  65. data/.taskmaster/tasks/task_009.md +19 -0
  66. data/.taskmaster/tasks/task_010.md +19 -0
  67. data/.taskmaster/tasks/task_011.md +19 -0
  68. data/.taskmaster/tasks/task_012.md +19 -0
  69. data/.taskmaster/tasks/task_013.md +19 -0
  70. data/.taskmaster/tasks/task_014.md +19 -0
  71. data/.taskmaster/tasks/task_015.md +19 -0
  72. data/.taskmaster/tasks/task_016.md +19 -0
  73. data/.taskmaster/tasks/task_017.md +19 -0
  74. data/.taskmaster/tasks/task_018.md +19 -0
  75. data/.taskmaster/tasks/task_019.md +19 -0
  76. data/.taskmaster/tasks/task_020.md +19 -0
  77. data/.taskmaster/tasks/tasks.json +299 -0
  78. data/.taskmaster/templates/example_prd.txt +47 -0
  79. data/.taskmaster/templates/example_prd_rpg.txt +511 -0
  80. data/CHANGELOG.md +29 -0
  81. data/CLAUDE.md +65 -0
  82. data/CODE_OF_CONDUCT.md +10 -0
  83. data/CONTRIBUTING.md +49 -0
  84. data/LICENSE +21 -0
  85. data/README.md +463 -0
  86. data/Rakefile +12 -0
  87. data/app/assets/stylesheets/data_porter/application.css +646 -0
  88. data/app/channels/data_porter/import_channel.rb +10 -0
  89. data/app/controllers/data_porter/imports_controller.rb +68 -0
  90. data/app/javascript/data_porter/progress_controller.js +33 -0
  91. data/app/jobs/data_porter/dry_run_job.rb +12 -0
  92. data/app/jobs/data_porter/import_job.rb +12 -0
  93. data/app/jobs/data_porter/parse_job.rb +12 -0
  94. data/app/models/data_porter/data_import.rb +49 -0
  95. data/app/views/data_porter/imports/index.html.erb +142 -0
  96. data/app/views/data_porter/imports/new.html.erb +88 -0
  97. data/app/views/data_porter/imports/show.html.erb +49 -0
  98. data/config/database.yml +3 -0
  99. data/config/routes.rb +12 -0
  100. data/docs/SPEC.md +2012 -0
  101. data/docs/UI.md +32 -0
  102. data/docs/blog/001-why-build-a-data-import-engine.md +166 -0
  103. data/docs/blog/002-scaffolding-a-rails-engine.md +188 -0
  104. data/docs/blog/003-configuration-dsl.md +222 -0
  105. data/docs/blog/004-store-model-jsonb.md +237 -0
  106. data/docs/blog/005-target-dsl.md +284 -0
  107. data/docs/blog/006-parsing-csv-sources.md +300 -0
  108. data/docs/blog/007-orchestrator.md +247 -0
  109. data/docs/blog/008-actioncable-stimulus.md +376 -0
  110. data/docs/blog/009-phlex-ui-components.md +446 -0
  111. data/docs/blog/010-controllers-routing.md +374 -0
  112. data/docs/blog/011-generators.md +364 -0
  113. data/docs/blog/012-json-api-sources.md +323 -0
  114. data/docs/blog/013-testing-rails-engine.md +618 -0
  115. data/docs/blog/014-dry-run.md +307 -0
  116. data/docs/blog/015-publishing-retro.md +264 -0
  117. data/docs/blog/016-erb-view-templates.md +431 -0
  118. data/docs/blog/017-showcase-final-retro.md +220 -0
  119. data/docs/blog/BACKLOG.md +8 -0
  120. data/docs/blog/SERIES.md +154 -0
  121. data/docs/screenshots/index-with-previewing.jpg +0 -0
  122. data/docs/screenshots/index.jpg +0 -0
  123. data/docs/screenshots/modal-new-import.jpg +0 -0
  124. data/docs/screenshots/preview.jpg +0 -0
  125. data/lib/data_porter/broadcaster.rb +29 -0
  126. data/lib/data_porter/components/base.rb +10 -0
  127. data/lib/data_porter/components/failure_alert.rb +20 -0
  128. data/lib/data_porter/components/preview_table.rb +54 -0
  129. data/lib/data_porter/components/progress_bar.rb +33 -0
  130. data/lib/data_porter/components/results_summary.rb +19 -0
  131. data/lib/data_porter/components/status_badge.rb +16 -0
  132. data/lib/data_porter/components/summary_cards.rb +30 -0
  133. data/lib/data_porter/components.rb +14 -0
  134. data/lib/data_porter/configuration.rb +25 -0
  135. data/lib/data_porter/dsl/api_config.rb +25 -0
  136. data/lib/data_porter/dsl/column.rb +17 -0
  137. data/lib/data_porter/engine.rb +15 -0
  138. data/lib/data_porter/orchestrator.rb +141 -0
  139. data/lib/data_porter/record_validator.rb +32 -0
  140. data/lib/data_porter/registry.rb +33 -0
  141. data/lib/data_porter/sources/api.rb +49 -0
  142. data/lib/data_porter/sources/base.rb +35 -0
  143. data/lib/data_porter/sources/csv.rb +43 -0
  144. data/lib/data_porter/sources/json.rb +45 -0
  145. data/lib/data_porter/sources.rb +20 -0
  146. data/lib/data_porter/store_models/error.rb +13 -0
  147. data/lib/data_porter/store_models/import_record.rb +52 -0
  148. data/lib/data_porter/store_models/report.rb +21 -0
  149. data/lib/data_porter/target.rb +89 -0
  150. data/lib/data_porter/type_validator.rb +46 -0
  151. data/lib/data_porter/version.rb +5 -0
  152. data/lib/data_porter.rb +32 -0
  153. data/lib/generators/data_porter/install/install_generator.rb +33 -0
  154. data/lib/generators/data_porter/install/templates/create_data_porter_imports.rb.erb +21 -0
  155. data/lib/generators/data_porter/install/templates/initializer.rb +30 -0
  156. data/lib/generators/data_porter/target/target_generator.rb +44 -0
  157. data/lib/generators/data_porter/target/templates/target.rb.tt +20 -0
  158. data/sig/data_porter.rbs +4 -0
  159. metadata +274 -0
@@ -0,0 +1,307 @@
---
title: "Building DataPorter #14 -- Dry Run: Validate Before Importing"
series: "Building DataPorter - A Data Import Engine for Rails"
part: 14
tags: [ruby, rails, rails-engine, gem-development, dry-run, validation, store-model, activejob]
published: false
---

# Dry Run: Validate Before Importing

> The preview catches column errors. The dry run catches database errors. Two safety nets, two levels of confidence.

## Context

This is part 14 of the series where we build **DataPorter**, a mountable Rails engine for data import workflows. In [part 13](#), we detailed the testing strategy: in-memory SQLite, structural controller specs, anonymous target classes, and a spec_helper that bootstraps just enough Rails to cover every layer.

We now have a complete import pipeline: parse the file, preview the records, confirm, persist. But there is a gap between the preview and the real import. The preview validates the *data* -- required fields, types, formats. It does not validate what happens when that data hits the database. A uniqueness constraint, a foreign key violation, a custom model validation that queries other tables -- none of these surface until `persist` is actually called. By then, the import is running for real.

In this article, we build a **dry run** mode that bridges this gap. It runs the full persist logic inside a transaction, captures any database-level errors on each record, then rolls back. The user sees exactly which records would fail *before* committing to the import.

## Why two validation layers

The preview phase runs the RecordValidator against column definitions. It catches structural problems: "this field is required and it is empty", "this field should be an email but it is not". These are fast, stateless checks that do not touch the database.

But many real-world validations are stateful. A `validates_uniqueness_of :email` on the User model requires a database query. A `belongs_to :company` with a foreign key constraint requires the company to exist. A custom validation that checks `if: -> { some_scope.exists? }` requires the full ActiveRecord context. None of these can run during preview because there is no model instance, no transaction, no database connection in the validation path.

The dry run fills this gap. It calls the target's `persist` method -- the same method that the real import uses -- but captures exceptions instead of letting them propagate. Each record gets annotated with its result: passed or failed, with the error message attached.

## The `dry_run_enabled` DSL flag

Not every import needs a dry run. A simple CSV-to-table import with no uniqueness constraints might not benefit from the overhead. We make it opt-in at the target level:

```ruby
# app/importers/user_import.rb
class UserImport < DataPorter::Target
  label "Users"
  model_name "User"
  dry_run_enabled

  columns do
    column :email, type: :email, required: true
    column :name, type: :string, required: true
  end

  def persist(record, context:)
    User.create!(record.attributes)
  end
end
```

The `dry_run_enabled` class method is a simple flag on the Target DSL:

```ruby
# lib/data_porter/target.rb
class << self
  attr_reader :_dry_run_enabled

  def dry_run_enabled
    @_dry_run_enabled = true
  end
end
```

No arguments, no block, no configuration. Either the target supports dry run or it does not. The controller checks `target_class._dry_run_enabled` to decide whether to show the "Dry Run" button on the preview page.
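As a self-contained illustration of this pattern (plain Ruby, outside the gem), the flag is just an instance variable on the class object -- which also means each subclass carries its own flag and does not inherit it unless it calls the macro itself:

```ruby
# Standalone sketch of a class-level DSL flag (illustrative, not gem code).
class Target
  class << self
    attr_reader :_dry_run_enabled

    def dry_run_enabled
      @_dry_run_enabled = true
    end
  end
end

class UserImport < Target
  dry_run_enabled
end

class PlainImport < Target
end

# Instance variables live on each class object individually, so a subclass
# that never calls the macro reads nil (falsy) through the inherited reader.
puts UserImport._dry_run_enabled            # => true
puts PlainImport._dry_run_enabled.inspect   # => nil
```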

## The `dry_running` status

The DataImport enum needed a new state. The import transitions from `previewing` to `dry_running` while the dry run executes, then back to `previewing` when it completes:

```ruby
# app/models/data_porter/data_import.rb
enum :status, {
  pending: 0,
  parsing: 1,
  previewing: 2,
  importing: 3,
  completed: 4,
  failed: 5,
  dry_running: 6
}
```

The value `6` is appended at the end rather than inserted in logical order. This is intentional -- existing records in production have integer status values. Inserting `dry_running` at position 3 would shift `importing`, `completed`, and `failed`, corrupting every existing import. Enums with integer backing must be append-only.

The state flow is: `previewing -> dry_running -> previewing`. The dry run is not a terminal state. It enriches the records with database-level feedback and returns to the preview so the user can review the results and decide whether to proceed with the real import.
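A quick plain-Ruby illustration of why the append-only rule matters (illustrative sketch, not gem code): the integer is what gets stored, and it is positional, so inserting a new state "in logical order" silently re-labels existing rows:

```ruby
# Mapping as shipped: each state's integer is its position.
original = { pending: 0, parsing: 1, previewing: 2,
             importing: 3, completed: 4, failed: 5 }

# Inserting dry_running "in logical order" renumbers everything after it.
inserted = %i[pending parsing previewing dry_running
              importing completed failed].each_with_index.to_h

# A row persisted with status = 3 meant "importing"...
puts original.key(3)  # => importing
# ...but under the reshuffled mapping the same integer decodes differently.
puts inserted.key(3)  # => dry_running
```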

## The `dry_run!` flow

The Orchestrator gains a third public method alongside `parse!` and `import!`:

```ruby
# lib/data_porter/orchestrator.rb
def dry_run!
  @data_import.dry_running!
  run_dry_run_records
  @data_import.update!(status: :previewing)
  build_report
rescue StandardError => e
  handle_failure(e)
end
```

The structure mirrors `parse!` and `import!`: transition to the working status, do the work, transition to the result status, rebuild the report. The `rescue` catches catastrophic failures and transitions to `failed` with an error report.

The real work happens in `run_dry_run_records`:

```ruby
def run_dry_run_records
  records = @data_import.records
  importable = records.select(&:importable?)
  context = build_context

  importable.each do |record|
    dry_run_record(record, context)
  end

  @data_import.records_will_change!
  @data_import.update!(records: records)
end

def dry_run_record(record, context)
  @target.persist(record, context: context)
  record.dry_run_passed = true
rescue StandardError => e
  record.dry_run_passed = false
  record.add_error(e.message)
end
```

For each importable record, we call the target's actual `persist` method. If it succeeds, `dry_run_passed` is set to `true`. If it raises -- an `ActiveRecord::RecordInvalid`, a constraint violation, any exception -- we capture the message on the record and mark it as failed. Nothing is committed: the persist calls run inside a transaction that is rolled back at the end, and because errors are rescued per record, one failure does not prevent the remaining records from being checked.
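The error-boundary shape can be sketched standalone (illustrative names; `probe` stands in for `dry_run_record`, and the raise simulates a database-level failure):

```ruby
# Each record is probed independently, so one failure cannot mask the others.
Record = Struct.new(:attrs, :dry_run_passed, :error)

def probe(record)
  # Stand-in for a uniqueness violation raised by the real persist call.
  raise ArgumentError, "email taken" if record.attrs[:email] == "dup@example.com"
  record.dry_run_passed = true
rescue StandardError => e
  record.dry_run_passed = false
  record.error = e.message
end

records = [
  Record.new({ email: "ok@example.com" }),
  Record.new({ email: "dup@example.com" }),
  Record.new({ email: "also-ok@example.com" })
]
records.each { |r| probe(r) }

puts records.map(&:dry_run_passed).inspect  # => [true, false, true]
```

Note how the middle failure does not stop the third record from being checked -- the equivalent of the "per-record rescue" decision in the tradeoffs table below.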

The `dry_run_passed` attribute on ImportRecord is a simple boolean:

```ruby
# lib/data_porter/store_models/import_record.rb
attribute :dry_run_passed, :boolean, default: false
```

After the dry run, the preview table can show a green check or a red cross next to each record, along with the specific error message for failures. The user gets a precise map of what will work and what will not.

## The StoreModel dirty tracking gotcha

There is a subtle but critical line in `run_dry_run_records`:

```ruby
@data_import.records_will_change!
@data_import.update!(records: records)
```

Why `records_will_change!` before `update!`? The answer lies in how ActiveRecord tracks changes on complex attributes.

StoreModel attributes are serialized to JSON and stored in a text (or JSONB) column. When you modify an object *in place* -- setting `record.dry_run_passed = true` on a record that already exists in the `records` array -- ActiveRecord does not detect the change. From its perspective, the `records` attribute still points to the same Ruby array: the remembered "before" value and the current value are the same mutated object, so comparing them reveals no difference, even though the serialized content has changed.

Without `records_will_change!`, the `update!` call would see "records has not changed" and skip the column in the SQL UPDATE. The dry run results would be computed correctly in memory but never persisted to the database. The user would see no change on the preview page.

`records_will_change!` explicitly marks the attribute as dirty, forcing ActiveRecord to include it in the next save. This is a well-known pattern with serialized attributes, but it is easy to forget -- and the failure mode is silent. The data looks correct in the current process, the tests that do not reload from the database pass, and only the production user sees stale results.

This is one of those bugs that TDD catches early. The spec reloads the import from the database and checks `dry_run_passed` on the reloaded records:

```ruby
it "marks records as dry_run_passed on success" do
  DataPorter::Orchestrator.new(import.reload).dry_run!
  import.reload.records.each do |record|
    expect(record.dry_run_passed).to be true
  end
end
```

The `import.reload` forces a fresh read from SQLite. Without `records_will_change!`, this spec fails -- `dry_run_passed` is still `false` in the database even though it was set to `true` in memory.
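The trap can be demonstrated in plain Ruby (an illustrative sketch, not ActiveRecord itself): a tracker that keeps a reference to the "original" value cannot see in-place mutations, while a deep snapshot can:

```ruby
# Naive change tracker: remembers the original by reference, not by copy.
class Tracker
  def initialize(value)
    @original = value   # same object, not a deep copy
    @current  = value
  end

  def changed?
    @original != @current
  end
end

records  = [{ dry_run_passed: false }]
tracker  = Tracker.new(records)
snapshot = Marshal.load(Marshal.dump(records))  # deep copy taken up front

records.first[:dry_run_passed] = true           # in-place mutation

puts tracker.changed?       # => false -- before/after are the same object
puts(snapshot != records)   # => true  -- the content did change
```

Calling something like `records_will_change!` sidesteps the comparison entirely by declaring the attribute dirty up front.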

## DryRunJob

Like `parse!` and `import!`, the dry run executes asynchronously via a dedicated job:

```ruby
# app/jobs/data_porter/dry_run_job.rb
class DryRunJob < ActiveJob::Base
  queue_as { DataPorter.configuration.queue_name }

  def perform(import_id)
    data_import = DataImport.find(import_id)
    Orchestrator.new(data_import).dry_run!
  end
end
```

Same pattern as ParseJob and ImportJob: find the import, delegate to the Orchestrator. The job itself has no logic -- it is a one-liner that bridges the async boundary.

## Controller action and route

The controller gains a `dry_run` action:

```ruby
# app/controllers/data_porter/imports_controller.rb
before_action :set_import, only: %i[show parse confirm cancel dry_run]

def dry_run
  DataPorter::DryRunJob.perform_later(@import.id)
  redirect_to import_path(@import)
end
```

And the route:

```ruby
resources :imports, only: %i[index new create show] do
  member do
    post :parse
    post :confirm
    post :cancel
    post :dry_run
  end
end
```

The pattern is identical to the other member actions: POST triggers a side effect (enqueue a job), redirect back to the show page where ActionCable will push progress updates. The view conditionally shows the "Dry Run" button only when `target_class._dry_run_enabled` is true and the import is in `previewing` status.

## Testing

The dry run specs follow the series' established patterns -- anonymous target classes, registry cleanup, and database round-trip assertions:

```ruby
RSpec.describe "Dry Run" do
  let(:target_class) do
    klass = Class.new(DataPorter::Target) do
      label "Guests"
      model_name "Guest"
      dry_run_enabled

      columns do
        column :first_name, type: :string, required: true
        column :last_name, type: :string
      end
    end
    klass.define_method(:persist) do |record, context:|
      record
    end
    klass
  end

  describe "Orchestrator#dry_run!" do
    it "transitions to previewing after dry run" do
      DataPorter::Orchestrator.new(import.reload).dry_run!
      expect(import.reload.status).to eq("previewing")
    end

    it "marks records as dry_run_passed on success" do
      DataPorter::Orchestrator.new(import.reload).dry_run!
      import.reload.records.each do |record|
        expect(record.dry_run_passed).to be true
      end
    end

    it "captures errors from failing persist" do
      # Target that raises ActiveRecord::RecordInvalid
      DataPorter::Orchestrator.new(failing_import.reload).dry_run!

      record = failing_import.reload.records.first
      expect(record.dry_run_passed).to be false
      expect(record.errors_list.map(&:message)).to include(match(/Validation failed/))
    end
  end
end
```

The failing target class simulates a database-level error by raising `ActiveRecord::RecordInvalid` in `persist`. The spec verifies that the error is captured on the record, that `dry_run_passed` is false, and that the import still transitions to `previewing` -- not `failed`. A record-level error is expected operational feedback, not a catastrophic failure.

The DryRunJob spec verifies delegation:

```ruby
describe "DryRunJob" do
  it "calls Orchestrator#dry_run!" do
    orchestrator = instance_double(DataPorter::Orchestrator, dry_run!: nil)
    allow(DataPorter::Orchestrator).to receive(:new).and_return(orchestrator)

    DataPorter::DryRunJob.new.perform(import.id)

    expect(orchestrator).to have_received(:dry_run!)
  end
end
```

## Decisions & tradeoffs

| Decision | We chose | Over | Because |
|----------|----------|------|---------|
| Opt-in flag | `dry_run_enabled` on Target DSL | Always-on dry run | Not every import benefits from the overhead; simple imports can skip it |
| Status value | Append `dry_running: 6` at the end of the enum | Insert in logical order | Integer-backed enums must be append-only to avoid corrupting existing data |
| Dirty tracking | Explicit `records_will_change!` | Reassigning the array (`self.records = records.dup`) | More explicit about intent; avoids unnecessary array duplication; documents the StoreModel gotcha |
| Error boundary | Per-record rescue in `dry_run_record` | Wrapping all records in a single begin/rescue | One failing record should not prevent the others from being validated |

## Recap

- The **dry run** bridges the gap between preview (column-level validation) and real import (database-level validation), giving users a complete picture before any data is committed.
- The **`dry_run_enabled` DSL flag** makes it opt-in per target -- not every import needs the overhead.
- The **`dry_running` status** follows the append-only rule for integer-backed enums, preserving existing data.
- The **`records_will_change!` call** is the key to making StoreModel in-place mutations persist -- without it, ActiveRecord skips the attribute in the SQL UPDATE because its dirty tracking does not detect in-place changes on serialized objects.
- The **DryRunJob** follows the same thin-job pattern as ParseJob and ImportJob: find, delegate, done.
- The **controller action and route** mirror the existing member actions: POST triggers a job, redirect back to show.

## Next up

We now have a full-featured, tested, and safe import engine. In part 15, we wrap up the series: **publishing the gem to RubyGems**, writing a proper CHANGELOG, choosing a versioning strategy, and reflecting on what worked, what we would do differently, and what DataPorter looks like from the outside.

---

*This is part 14 of the series "Building DataPorter - A Data Import Engine for Rails". [Previous: Testing a Rails Engine with RSpec](#) | [Next: Publishing the Gem & Retrospective](#)*
@@ -0,0 +1,264 @@
---
title: "Building DataPorter #15 -- Publishing and Retrospective"
series: "Building DataPorter - A Data Import Engine for Rails"
part: 15
tags: [ruby, rails, rails-engine, gem-development, rubygems, retrospective, open-source]
published: false
---

# Publishing and Retrospective

> From `bundle gem` to `gem push`: looking back on 14 articles, 20 components, and the lessons learned building a Rails engine from scratch with TDD.

## Context

This is the final article in the series where we build **DataPorter**, a mountable Rails engine for data import workflows. In [part 14](#), we added Dry Run mode -- the last safety net before data touches the database.

We started this series with a question: why do we keep rebuilding the same import workflow in every Rails app? Fourteen articles later, we have a published gem that answers it. This article covers the last mile -- publishing to RubyGems -- then steps back to look at what we built, what we learned, and what we would do differently.

## Publishing the gem

### The final gemspec

The gemspec is the identity card of a Ruby gem. Everything RubyGems needs to index, display, and resolve dependencies lives here. Here is ours in its final form:

```ruby
# data_porter.gemspec
Gem::Specification.new do |spec|
  spec.name = "data_porter"
  spec.version = DataPorter::VERSION
  spec.authors = ["Seryl Lounis"]
  spec.email = ["seryllounis@outlook.fr"]

  spec.summary = "Rails engine for multi-step data imports with preview"
  spec.description = "A mountable Rails engine providing a complete data import workflow: " \
                     "upload/configure, preview with validation, and import. " \
                     "Supports CSV, JSON, and API sources with a simple DSL for defining import targets."
  spec.homepage = "https://github.com/SerylLns/data_porter"
  spec.license = "MIT"
  spec.required_ruby_version = ">= 3.2.0"

  spec.metadata["homepage_uri"] = spec.homepage
  spec.metadata["source_code_uri"] = "https://github.com/SerylLns/data_porter"
  spec.metadata["changelog_uri"] = "https://github.com/SerylLns/data_porter/blob/master/CHANGELOG.md"
  spec.metadata["rubygems_mfa_required"] = "true"

  # ...

  spec.add_dependency "csv"
  spec.add_dependency "phlex", ">= 1.0"
  spec.add_dependency "rails", ">= 7.0"
  spec.add_dependency "store_model", ">= 2.0"
  spec.add_dependency "turbo-rails", ">= 1.0"
end
```

A few points worth noting. `rubygems_mfa_required` forces multi-factor authentication to publish -- it has become the standard for any serious open source gem. The `required_ruby_version` of `>= 3.2.0` excludes Ruby versions that are no longer maintained. The runtime dependencies are deliberately loose (`>= 1.0`, `>= 7.0`) to avoid pinning host apps to specific versions.

The `spec.files` filter excludes development files (`spec/`, `bin/`, `.github/`) so that the published gem contains only production code. This matters -- nobody wants to download 2 MB of specs when installing a gem.

### Versioning

DataPorter follows semantic versioning:

- **0.1.0**: the first release. The `0.x` makes it clear that the API may still evolve.
- **0.x.y**: each new feature (a new source type, a new component) bumps the minor version. Each bugfix bumps the patch.
- **1.0.0**: will come once the API has stabilized and been tested in production across several apps.

The version number lives in a single file:

```ruby
# lib/data_porter/version.rb
module DataPorter
  VERSION = "0.1.0"
end
```

A single place to edit. The gemspec reads it with `require_relative`. The CHANGELOG references it. The Git tag matches it. No duplication.
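As a side note, RubyGems ships `Gem::Version`, which implements exactly this ordering -- handy for sanity-checking version comparisons:

```ruby
# Gem::Version comes with RubyGems, available in any modern Ruby.
require "rubygems"

# Semantic ordering: a minor bump outranks any patch of the previous minor.
v_current = Gem::Version.new("0.1.0")
v_patch   = Gem::Version.new("0.1.1")
v_minor   = Gem::Version.new("0.2.0")

puts v_patch > v_current  # => true
puts v_minor > v_patch    # => true

# Pre-release versions sort before their final release.
puts Gem::Version.new("1.0.0.pre") < Gem::Version.new("1.0.0")  # => true
```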

### The release workflow

```bash
# 1. Bump the version
# lib/data_porter/version.rb -> VERSION = "0.1.0"

# 2. Update the CHANGELOG
# CHANGELOG.md -> ## [0.1.0] - 2026-02-06

# 3. Commit, tag, push
git add -A && git commit -m "Release v0.1.0"
git tag v0.1.0
git push origin master --tags

# 4. Build and push
gem build data_porter.gemspec
gem push data_porter-0.1.0.gem
```

Or, if the Rakefile is set up with `bundler/gem_tasks`:

```bash
bundle exec rake release
```

This command does everything in one go: build, Git tag, Git push, RubyGems push. It is the recommended approach because it guarantees that the tag and the gem stay in sync.

## Documentation

A gem without documentation is a gem nobody will use. DataPorter relies on three levels of documentation:

**The README**: the entry point. One-command installation (`rails generate data_porter:install`), a 15-line Target example, the three-step workflow diagram. A developer should be able to understand what the gem does and install it in under 5 minutes.

**The CHANGELOG**: every release documented with what changed, what was added, what broke. [Keep a Changelog](https://keepachangelog.com/) format -- a standard the Ruby community knows.

**Inline comments**: every public method documented with YARD. The DSL is the most critical surface -- `column`, `sources`, `csv_mapping`, and `persist` must be documented with examples, because that is what users will read the most.

## What we built

Here is the complete list of components that make up DataPorter, in the order we built them:

| # | Component | Role |
|---|-----------|------|
| 1 | **Engine + isolate_namespace** | Gem structure, namespace isolation |
| 2 | **Configuration DSL** | `DataPorter.configure`, defaults, `context_builder` |
| 3 | **StoreModels (ImportRecord, Error, Report)** | Typed JSONB structures without extra tables |
| 4 | **TypeValidator** | Type validation (email, phone, url) on columns |
| 5 | **Target DSL** | `label`, `model`, `columns`, `sources`, `persist` |
| 6 | **Registry** | Target auto-discovery and resolution |
| 7 | **Source::Base + Source::CSV** | Source abstraction, CSV parsing with mapping |
| 8 | **DataImport model** | ActiveRecord, status enum, polymorphic user |
| 9 | **Orchestrator** | Parse/import coordination, per-record error handling |
| 10 | **RecordValidator** | Generic validations (required, type) |
| 11 | **ParseJob + ImportJob** | Background processing via ActiveJob |
| 12 | **Broadcaster + ImportChannel** | Real-time progress via ActionCable |
| 13 | **7 Phlex components** | StatusBadge, SummaryCards, PreviewTable, ProgressBar, ResultsSummary, FailureAlert |
| 14 | **Stimulus controller** | Client-side progress bar animation |
| 15 | **ImportsController** | Dynamic inheritance, 7 actions, Turbo integration |
| 16 | **Install generator** | Migration, initializer, routes, importers directory |
| 17 | **Target generator** | Target scaffold with column parsing |
| 18 | **Source::JSON** | Import from a JSON file or raw text |
| 19 | **Source::API** | Import from an HTTP endpoint with auth and params |
| 20 | **Dry Run** | Transaction + rollback, records enriched with DB errors |

Twenty components. Each with its specs. Each with an article explaining why it exists and how it works.

## The architecture: the complete flow

Here is what happens when a user imports a CSV file, from start to finish:

+ ```
150
+ Upload (Controller#create)
151
+ |
152
+ v
153
+ Parse (ParseJob -> Orchestrator#parse!)
154
+ |-- Source::CSV.fetch -> raw rows
155
+ |-- Target.transform(record) -> transformation
156
+ |-- RecordValidator.validate(record) -> required, types
157
+ |-- Target.validate(record) -> business rules
158
+ |-- record.determine_status! -> complete/partial/missing
159
+ |-- Broadcaster -> ActionCable -> Stimulus -> progress bar
160
+ |
161
+ v
162
+ Preview (Controller#show)
163
+ |-- PreviewTable(columns, records) -> tableau dynamique
164
+ |-- SummaryCards(report) -> compteurs par statut
165
+ |-- StatusBadge(status) -> badge "previewing"
166
+ |
167
+ v
168
+ Dry Run (DryRunJob -> Orchestrator dans transaction + rollback)
169
+ |-- Enrichit les records avec les erreurs DB
170
+ |-- Broadcaster -> progression
171
+ |
172
+ v
173
+ Import (ImportJob -> Orchestrator#import!)
174
+ |-- Target.persist(record, context:) -> par record
175
+ |-- rescue -> record.add_error, continue
176
+ |-- Target.after_import(results, context:)
177
+ |-- Broadcaster -> "completed"
178
+ |
179
+ v
180
+ Results (Controller#show)
181
+ |-- ResultsSummary(report) -> imported/errored counts
182
+ |-- PreviewTable avec erreurs inline
183
+ ```

The gem owns the infrastructure. The host app owns the business logic. The separation is clean: a single Target file and an initializer are all the host app has to provide.
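
As an illustration, here is roughly what such a host-app target file might look like, using the DSL methods named in the series (`label`, `model`, `columns`, `sources`, `persist`). The `ContactsTarget` class, its column names, and the exact option signatures are hypothetical; whether `persist` is class- or instance-level is glossed over here:

```ruby
# app/importers/contacts_target.rb -- hypothetical example, not from the gem docs.
class ContactsTarget < DataPorter::Target
  label "Contacts"
  model "Contact"                  # the host app's own ActiveRecord model

  columns do
    column :name,  required: true
    column :email, type: :email, required: true
    column :phone, type: :phone
  end

  sources :csv, :json

  # Called once per record during import; raising lets the Orchestrator
  # record the error and continue with the next record.
  def persist(record, context:)
    Contact.create!(record.attributes.slice(:name, :email, :phone))
  end
end
```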

## Lessons learned

### TDD without a dummy app

The most consequential decision of the series: testing the engine without creating a Rails application in `spec/dummy/`. A 60-line `spec_helper.rb` bootstraps in-memory SQLite, configures the load paths, and stubs `ApplicationController`. It works, and it works well -- the suite runs in under a second.

The unexpected benefit: the constraint forces every component to stay decoupled. If a component needs a router to be testable, that is a signal it is too coupled to the framework. The structural tests on controllers (checking inheritance, callbacks, methods) felt odd at first. In hindsight, they test exactly what the gem owns -- the wiring -- and leave integration testing to the host app.

The trap to avoid: duplication between the schema in `spec_helper.rb` and the migration template. If the two diverge, the tests pass but the generated migration no longer matches what was tested. An explicit comment in the spec_helper records this dependency.
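
A condensed sketch of that bootstrap, assuming the `activerecord` and `sqlite3` gems are available. The table and column names shown are illustrative, not the gem's actual schema:

```ruby
# spec_helper.rb (excerpt) -- boot ActiveRecord against in-memory SQLite,
# no dummy Rails app required.
require "active_record"

ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")

ActiveRecord::Schema.define do
  # NOTE: keep in sync with the install generator's migration template!
  create_table :data_porter_data_imports, force: true do |t|
    t.string :status, default: "pending"
    t.text   :records              # jsonb in production, text under SQLite
    t.text   :report
    t.references :user, polymorphic: true
    t.timestamps
  end
end

# Minimal stand-in for the host app's base controller, so the engine's
# controller classes can load in specs.
class ApplicationController; end unless defined?(ApplicationController)
```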

### StoreModel: the gotchas

StoreModel is powerful, but it has its subtleties:

**Dirty tracking**: when you mutate an object inside a `store_model` attribute, ActiveRecord does not detect the change. You can set `data_import.records.first.status = "complete"` and call `save` -- nothing gets persisted. The fix: call `records_will_change!` before mutating, or reassign the whole attribute with `data_import.records = modified_records`.
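
The failure mode can be reproduced without ActiveRecord at all. This toy model (a deliberate simplification, not StoreModel's actual mechanism) only flips its dirty flag on whole-attribute assignment, which is essentially why in-place mutation of a serialized column goes unseen:

```ruby
require "json"

# Toy model of a serialized column: changes are only noticed when the whole
# attribute is reassigned -- in-place mutation never sets the dirty flag.
class FakeModel
  attr_reader :saved_json

  def initialize
    @records = [{ "status" => "pending" }]
    @saved_json = JSON.generate(@records)
  end

  def records
    @records
  end

  def records=(value)
    @changed = true              # only a full assignment marks the model dirty
    @records = value
  end

  def save
    return false unless @changed # in-place edits never got here
    @saved_json = JSON.generate(@records)
    @changed = false
    true
  end
end

m = FakeModel.new
m.records.first["status"] = "complete"  # in-place mutation
m.save                                  # returns false: nothing persisted
m.records = m.records.dup               # reassign the whole attribute...
m.save                                  # ...and now the save goes through
```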

**Serialization round-trip**: symbol keys come back as string keys after a save/reload. `{ name: "Alice" }` returns as `{ "name" => "Alice" }`. You have to know this and code accordingly -- either always use string keys, or call `symbolize_keys` on the way out. DataPorter does the latter, in `ImportRecord#attributes`.
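
The round-trip is easy to demonstrate with nothing but the standard library, since a jsonb column is ultimately JSON text (`transform_keys(&:to_sym)` stands in for ActiveSupport's `symbolize_keys`):

```ruby
require "json"

record = { name: "Alice" }          # symbol keys going in
stored = JSON.generate(record)      # what lands in the jsonb column
reloaded = JSON.parse(stored)       # what comes back after save/reload

reloaded["name"]                    # "Alice" -- string key now
reloaded.key?(:name)                # false: the symbol key is gone

# The symbolize-keys step on the way out:
normalized = reloaded.transform_keys(&:to_sym)
normalized[:name]                   # "Alice" again
```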

**SQLite vs PostgreSQL**: in tests, StoreModel columns are `text`. In production, they are `jsonb`. StoreModel handles the difference transparently, but some JSONB queries (indexes, contains) cannot be tested against SQLite. That is an acceptable trade-off for the speed of the feedback loop.

### Phlex in an engine: `plain` vs `text`

A Phlex-specific trap: to emit raw text inside an element, you must use `plain` (not `text`). Early versions of Phlex had `text`, but it was renamed. Call `text` on a recent version and you get a cryptic `NoMethodError`. SummaryCards shows this well:

```ruby
def card(css_class, count, label)
  div(class: "dp-card #{css_class}") do
    strong { count.to_s }
    plain " #{label}" # not text, not p -- just raw text
  end
end
```

The other subtlety: calling `super()` in each component's `initialize`. Phlex requires it, and forgetting it produces silent failures or blank renders.

### Testing patterns: controllers, channels, JS

Testing JavaScript from Ruby by reading the file as text and asserting on strings -- it sounds hacky. In practice, it catches the most common category of bugs in an engine: drift between the Ruby code and the JS code. The channel is called `DataPorter::ImportChannel` in Ruby and `"DataPorter::ImportChannel"` in JS. If one changes and not the other, the test fails. For a single 30-line Stimulus file, that beats adding Jest and `node_modules` to the project.
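
A sketch of that string-assertion approach. The JS snippet is inlined here to keep the example self-contained; the real spec would `File.read` the Stimulus controller (the path and the subscription shape are assumptions, not the gem's actual file):

```ruby
# In the real spec this would be something like:
#   js_source = File.read("app/javascript/data_porter/import_controller.js")
js_source = <<~JS
  this.subscription = consumer.subscriptions.create(
    { channel: "DataPorter::ImportChannel", import_id: this.importIdValue },
    { received: (data) => this.update(data) }
  )
JS

# The Ruby-side channel class name the JS must agree with.
expected_channel = "DataPorter::ImportChannel"

# If someone renames the Ruby channel without touching the JS (or vice
# versa), this assertion catches the drift.
js_source.include?(%("#{expected_channel}"))  # true: the two sides line up
```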

The structural controller tests (`_process_action_callbacks`, `instance_method`, `superclass`) form a contract: the gem guarantees the controller has the right shape, and the host app guarantees it behaves correctly in its context. That is a clean separation of responsibilities.

## What's next?

DataPorter 0.1.0 covers the standard workflow. Here is what could come in later versions:

**Batch imports**: for files of 100k+ rows, import in batches of 1,000 with `insert_all` instead of a `create!` per record. That requires rethinking the `persist` contract -- instead of one record at a time, the target would receive a batch.
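
The batching itself is just `each_slice`; the `persist_batch` name sketched in the trailing comment is hypothetical, not an existing DataPorter API:

```ruby
# Slice 2,500 rows into batches of 1,000: each slice would become a single
# bulk write (insert_all) instead of 1,000 individual create! calls.
BATCH_SIZE = 1_000

rows = (1..2_500).map { |i| { email: "user#{i}@example.com" } }

batches = rows.each_slice(BATCH_SIZE).to_a
batches.size        # 3 slices: 1,000 + 1,000 + 500
batches.last.size   # 500

# Hypothetical reshaped contract:
# def persist_batch(batch, context:)
#   Contact.insert_all(batch)   # one INSERT per slice
# end
```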

**Progress streaming**: replace ActionCable with Server-Sent Events (SSE) for apps that do not need a bidirectional WebSocket. Lighter, and no Redis dependency.

**Custom validators**: let targets declare validators through the DSL:

```ruby
columns do
  column :email, type: :email, required: true, validate: ->(val) {
    "already exists" if User.exists?(email: val)
  }
end
```

**Export**: the reverse path. If we know how to parse and validate records, we also know how to serialize them to CSV/JSON. The Target already holds all the information needed (columns, types, labels).

**Excel support**: a `Source::Xlsx` built on `roo` or `creek` to parse `.xlsx` files. The Source pattern is in place; only `fetch` needs implementing.

## Final thoughts

Building DataPorter was an exercise in discipline as much as in code. The method -- Taskmaster to plan, TDD to implement, an article to document each step -- forces explicit decisions. No "we'll figure it out later". Every component exists because a test demands it, and every test exists because a behavior was specified.

Skipping the dummy app was a bet. It paid off: the tests are fast, the components are decoupled, and the gem is testable without any Rails infrastructure. But it has a cost -- some integration bugs will only surface in the host app. That is a deliberate trade-off: the gem tests its wiring, the host app tests its behavior.

StoreModel, Phlex, Stimulus -- each dependency brought its share of surprises. StoreModel's dirty tracking, Phlex's `plain` vs `text`, Stimulus's double-dash naming for engines. These gotchas appear in no documentation. They appear when a test fails at 11pm and you end up reading the gem's source code to understand why. That is the real advantage of TDD: you discover the problems in the terminal, not in production.

DataPorter is now a gem published on RubyGems. A `bundle add data_porter`, a `rails generate data_porter:install`, a 15-line Target, and any Rails app has a complete import system with preview, validation, real-time progress, and dry run.

That was the plan from the start. It took 15 articles to get there.

---

*This is part 15 of the series "Building DataPorter - A Data Import Engine for Rails". [Previous: Dry Run: Validate Before You Persist](#)*