data_porter 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (159)
  1. checksums.yaml +7 -0
  2. data/.claude/commands/blog-status.md +10 -0
  3. data/.claude/commands/blog.md +109 -0
  4. data/.claude/commands/task-done.md +27 -0
  5. data/.claude/commands/tm/add-dependency.md +58 -0
  6. data/.claude/commands/tm/add-subtask.md +79 -0
  7. data/.claude/commands/tm/add-task.md +81 -0
  8. data/.claude/commands/tm/analyze-complexity.md +124 -0
  9. data/.claude/commands/tm/analyze-project.md +100 -0
  10. data/.claude/commands/tm/auto-implement-tasks.md +100 -0
  11. data/.claude/commands/tm/command-pipeline.md +80 -0
  12. data/.claude/commands/tm/complexity-report.md +120 -0
  13. data/.claude/commands/tm/convert-task-to-subtask.md +74 -0
  14. data/.claude/commands/tm/expand-all-tasks.md +52 -0
  15. data/.claude/commands/tm/expand-task.md +52 -0
  16. data/.claude/commands/tm/fix-dependencies.md +82 -0
  17. data/.claude/commands/tm/help.md +101 -0
  18. data/.claude/commands/tm/init-project-quick.md +49 -0
  19. data/.claude/commands/tm/init-project.md +53 -0
  20. data/.claude/commands/tm/install-taskmaster.md +118 -0
  21. data/.claude/commands/tm/learn.md +106 -0
  22. data/.claude/commands/tm/list-tasks-by-status.md +42 -0
  23. data/.claude/commands/tm/list-tasks-with-subtasks.md +30 -0
  24. data/.claude/commands/tm/list-tasks.md +46 -0
  25. data/.claude/commands/tm/next-task.md +69 -0
  26. data/.claude/commands/tm/parse-prd-with-research.md +51 -0
  27. data/.claude/commands/tm/parse-prd.md +52 -0
  28. data/.claude/commands/tm/project-status.md +67 -0
  29. data/.claude/commands/tm/quick-install-taskmaster.md +23 -0
  30. data/.claude/commands/tm/remove-all-subtasks.md +94 -0
  31. data/.claude/commands/tm/remove-dependency.md +65 -0
  32. data/.claude/commands/tm/remove-subtask.md +87 -0
  33. data/.claude/commands/tm/remove-subtasks.md +89 -0
  34. data/.claude/commands/tm/remove-task.md +110 -0
  35. data/.claude/commands/tm/setup-models.md +52 -0
  36. data/.claude/commands/tm/show-task.md +85 -0
  37. data/.claude/commands/tm/smart-workflow.md +58 -0
  38. data/.claude/commands/tm/sync-readme.md +120 -0
  39. data/.claude/commands/tm/tm-main.md +147 -0
  40. data/.claude/commands/tm/to-cancelled.md +58 -0
  41. data/.claude/commands/tm/to-deferred.md +50 -0
  42. data/.claude/commands/tm/to-done.md +47 -0
  43. data/.claude/commands/tm/to-in-progress.md +39 -0
  44. data/.claude/commands/tm/to-pending.md +35 -0
  45. data/.claude/commands/tm/to-review.md +43 -0
  46. data/.claude/commands/tm/update-single-task.md +122 -0
  47. data/.claude/commands/tm/update-task.md +75 -0
  48. data/.claude/commands/tm/update-tasks-from-id.md +111 -0
  49. data/.claude/commands/tm/validate-dependencies.md +72 -0
  50. data/.claude/commands/tm/view-models.md +52 -0
  51. data/.env.example +12 -0
  52. data/.mcp.json +24 -0
  53. data/.taskmaster/CLAUDE.md +435 -0
  54. data/.taskmaster/config.json +44 -0
  55. data/.taskmaster/docs/prd.txt +2044 -0
  56. data/.taskmaster/state.json +6 -0
  57. data/.taskmaster/tasks/task_001.md +19 -0
  58. data/.taskmaster/tasks/task_002.md +19 -0
  59. data/.taskmaster/tasks/task_003.md +19 -0
  60. data/.taskmaster/tasks/task_004.md +19 -0
  61. data/.taskmaster/tasks/task_005.md +19 -0
  62. data/.taskmaster/tasks/task_006.md +19 -0
  63. data/.taskmaster/tasks/task_007.md +19 -0
  64. data/.taskmaster/tasks/task_008.md +19 -0
  65. data/.taskmaster/tasks/task_009.md +19 -0
  66. data/.taskmaster/tasks/task_010.md +19 -0
  67. data/.taskmaster/tasks/task_011.md +19 -0
  68. data/.taskmaster/tasks/task_012.md +19 -0
  69. data/.taskmaster/tasks/task_013.md +19 -0
  70. data/.taskmaster/tasks/task_014.md +19 -0
  71. data/.taskmaster/tasks/task_015.md +19 -0
  72. data/.taskmaster/tasks/task_016.md +19 -0
  73. data/.taskmaster/tasks/task_017.md +19 -0
  74. data/.taskmaster/tasks/task_018.md +19 -0
  75. data/.taskmaster/tasks/task_019.md +19 -0
  76. data/.taskmaster/tasks/task_020.md +19 -0
  77. data/.taskmaster/tasks/tasks.json +299 -0
  78. data/.taskmaster/templates/example_prd.txt +47 -0
  79. data/.taskmaster/templates/example_prd_rpg.txt +511 -0
  80. data/CHANGELOG.md +29 -0
  81. data/CLAUDE.md +65 -0
  82. data/CODE_OF_CONDUCT.md +10 -0
  83. data/CONTRIBUTING.md +49 -0
  84. data/LICENSE +21 -0
  85. data/README.md +463 -0
  86. data/Rakefile +12 -0
  87. data/app/assets/stylesheets/data_porter/application.css +646 -0
  88. data/app/channels/data_porter/import_channel.rb +10 -0
  89. data/app/controllers/data_porter/imports_controller.rb +68 -0
  90. data/app/javascript/data_porter/progress_controller.js +33 -0
  91. data/app/jobs/data_porter/dry_run_job.rb +12 -0
  92. data/app/jobs/data_porter/import_job.rb +12 -0
  93. data/app/jobs/data_porter/parse_job.rb +12 -0
  94. data/app/models/data_porter/data_import.rb +49 -0
  95. data/app/views/data_porter/imports/index.html.erb +142 -0
  96. data/app/views/data_porter/imports/new.html.erb +88 -0
  97. data/app/views/data_porter/imports/show.html.erb +49 -0
  98. data/config/database.yml +3 -0
  99. data/config/routes.rb +12 -0
  100. data/docs/SPEC.md +2012 -0
  101. data/docs/UI.md +32 -0
  102. data/docs/blog/001-why-build-a-data-import-engine.md +166 -0
  103. data/docs/blog/002-scaffolding-a-rails-engine.md +188 -0
  104. data/docs/blog/003-configuration-dsl.md +222 -0
  105. data/docs/blog/004-store-model-jsonb.md +237 -0
  106. data/docs/blog/005-target-dsl.md +284 -0
  107. data/docs/blog/006-parsing-csv-sources.md +300 -0
  108. data/docs/blog/007-orchestrator.md +247 -0
  109. data/docs/blog/008-actioncable-stimulus.md +376 -0
  110. data/docs/blog/009-phlex-ui-components.md +446 -0
  111. data/docs/blog/010-controllers-routing.md +374 -0
  112. data/docs/blog/011-generators.md +364 -0
  113. data/docs/blog/012-json-api-sources.md +323 -0
  114. data/docs/blog/013-testing-rails-engine.md +618 -0
  115. data/docs/blog/014-dry-run.md +307 -0
  116. data/docs/blog/015-publishing-retro.md +264 -0
  117. data/docs/blog/016-erb-view-templates.md +431 -0
  118. data/docs/blog/017-showcase-final-retro.md +220 -0
  119. data/docs/blog/BACKLOG.md +8 -0
  120. data/docs/blog/SERIES.md +154 -0
  121. data/docs/screenshots/index-with-previewing.jpg +0 -0
  122. data/docs/screenshots/index.jpg +0 -0
  123. data/docs/screenshots/modal-new-import.jpg +0 -0
  124. data/docs/screenshots/preview.jpg +0 -0
  125. data/lib/data_porter/broadcaster.rb +29 -0
  126. data/lib/data_porter/components/base.rb +10 -0
  127. data/lib/data_porter/components/failure_alert.rb +20 -0
  128. data/lib/data_porter/components/preview_table.rb +54 -0
  129. data/lib/data_porter/components/progress_bar.rb +33 -0
  130. data/lib/data_porter/components/results_summary.rb +19 -0
  131. data/lib/data_porter/components/status_badge.rb +16 -0
  132. data/lib/data_porter/components/summary_cards.rb +30 -0
  133. data/lib/data_porter/components.rb +14 -0
  134. data/lib/data_porter/configuration.rb +25 -0
  135. data/lib/data_porter/dsl/api_config.rb +25 -0
  136. data/lib/data_porter/dsl/column.rb +17 -0
  137. data/lib/data_porter/engine.rb +15 -0
  138. data/lib/data_porter/orchestrator.rb +141 -0
  139. data/lib/data_porter/record_validator.rb +32 -0
  140. data/lib/data_porter/registry.rb +33 -0
  141. data/lib/data_porter/sources/api.rb +49 -0
  142. data/lib/data_porter/sources/base.rb +35 -0
  143. data/lib/data_porter/sources/csv.rb +43 -0
  144. data/lib/data_porter/sources/json.rb +45 -0
  145. data/lib/data_porter/sources.rb +20 -0
  146. data/lib/data_porter/store_models/error.rb +13 -0
  147. data/lib/data_porter/store_models/import_record.rb +52 -0
  148. data/lib/data_porter/store_models/report.rb +21 -0
  149. data/lib/data_porter/target.rb +89 -0
  150. data/lib/data_porter/type_validator.rb +46 -0
  151. data/lib/data_porter/version.rb +5 -0
  152. data/lib/data_porter.rb +32 -0
  153. data/lib/generators/data_porter/install/install_generator.rb +33 -0
  154. data/lib/generators/data_porter/install/templates/create_data_porter_imports.rb.erb +21 -0
  155. data/lib/generators/data_porter/install/templates/initializer.rb +30 -0
  156. data/lib/generators/data_porter/target/target_generator.rb +44 -0
  157. data/lib/generators/data_porter/target/templates/target.rb.tt +20 -0
  158. data/sig/data_porter.rbs +4 -0
  159. metadata +274 -0
data/docs/blog/006-parsing-csv-sources.md
@@ -0,0 +1,300 @@
+ ---
+ title: "Building DataPorter #6 — Parsing CSV Data with Sources"
+ series: "Building DataPorter - A Data Import Engine for Rails"
+ part: 6
+ tags: [ruby, rails, rails-engine, gem-development, csv, active-storage, source-pattern]
+ published: false
+ ---
+
+ # Parsing CSV Data with Sources
+
+ > How to model an import record in the database, parse CSV files through a pluggable Source layer, and map headers to target columns -- the first end-to-end flow.
+
+ ## Context
+
+ This is part 6 of the series where we build **DataPorter**, a mountable Rails engine for data import workflows. In [part 5](#), we designed the Target DSL and Registry -- the layer that describes *what* an import looks like: its columns, mappings, and persistence logic.
+
+ Now we need the other half: the code that represents an import *in progress* and the code that reads raw data from a file. By the end of this article, we will have a `DataImport` ActiveRecord model to track state, a `Source` abstraction for parsing, and a concrete CSV source that maps headers to target columns. This is where data first flows through the engine.
+
+ ## The problem
+
+ A target declaration tells the engine what columns to expect, but it says nothing about where data comes from or how to track the import lifecycle. We need a database-backed model that records who started the import, what state it is in, what records were parsed, and what errors occurred. We also need a layer that can read a CSV file (today) and a JSON payload or API response (later) through the same interface. Without this separation, the parsing logic would live in the controller or the target itself, coupling the format to the business rules.
+
+ ## What we're building
+
+ Here is the end-to-end flow we are wiring together:
+
+ ```ruby
+ # 1. Create an import record
+ import = DataPorter::DataImport.create!(
+   target_key: "guests",
+   source_type: "csv",
+   user: current_user
+ )
+
+ # 2. Resolve the source and parse the file
+ source_class = DataPorter::Sources.resolve(import.source_type)
+ source = source_class.new(import, content: csv_string)
+ rows = source.fetch
+ # => [{ first_name: "Alice", last_name: "Smith", email: "alice@example.com" }, ...]
+
+ # 3. The import knows its target
+ import.target_class
+ # => GuestTarget
+ ```
+
+ Three objects, three concerns: `DataImport` tracks state, `Sources.resolve` picks the parser, and the source turns raw bytes into mapped hashes. The Orchestrator (part 7) will coordinate these pieces, but each works independently.
+
+ ## Implementation
+
+ ### Step 1 -- The DataImport model and migration
+
+ The `DataImport` model is the central record for every import. It needs to track which target is being imported, what source format the data arrives in, what state the import has reached, and who initiated it.
+
+ The migration creates a single table with JSONB columns for records and report data (the StoreModel types from part 4):
+
+ ```ruby
+ # lib/generators/data_porter/install/templates/create_data_porter_imports.rb.erb
+ create_table :data_porter_imports do |t|
+   t.string :target_key, null: false
+   t.string :source_type, null: false, default: "csv"
+   t.integer :status, null: false, default: 0
+   t.jsonb :records, null: false, default: []
+   t.jsonb :report, null: false, default: {}
+   t.jsonb :config, null: false, default: {}
+
+   t.references :user, polymorphic: true, null: false
+
+   t.timestamps
+ end
+ ```
+
+ The `user` reference is polymorphic (`user_type` + `user_id`), so the engine works regardless of whether the host app calls its user model `User`, `AdminUser`, or `Account`. The `config` JSONB column stores source-specific options like CSV delimiters or API authentication parameters -- things that vary per import, not per target.
+
+ The model itself is compact:
+
+ ```ruby
+ # app/models/data_porter/data_import.rb
+ class DataImport < ActiveRecord::Base
+   self.table_name = "data_porter_imports"
+
+   belongs_to :user, polymorphic: true
+
+   enum :status, {
+     pending: 0, parsing: 1, previewing: 2,
+     importing: 3, completed: 4, failed: 5
+   }
+
+   attribute :records, StoreModels::ImportRecord.to_array_type, default: -> { [] }
+   attribute :report, StoreModels::Report.to_type, default: -> { StoreModels::Report.new }
+   attribute :config, :json, default: -> { {} }
+
+   validates :target_key, presence: true
+   validates :source_type, presence: true, inclusion: { in: %w[csv json api] }
+ end
+ ```
+
+ The `status` enum defines the import lifecycle as a linear state machine: `pending -> parsing -> previewing -> importing -> completed` (or `failed` at any point). Integer-backed enums keep the database column small and indexable. The `records` and `report` attributes use StoreModel types that we built in part 4 -- they serialize structured data into the JSONB columns while providing typed Ruby objects in memory.
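+
+ As a quick aside, the enum also generates the predicate and bang helpers that the rest of the engine leans on (standard Rails enum behavior, shown here only for illustration):
+
+ ```ruby
+ import = DataPorter::DataImport.new(target_key: "guests", source_type: "csv")
+ import.status    # => "pending" (stored as integer 0)
+ import.pending?  # => true
+ import.parsing!  # saves the record with status :parsing (assuming it is otherwise valid)
+ ```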
+
+ Two convenience methods bridge the model to the rest of the engine:
+
+ ```ruby
+ # app/models/data_porter/data_import.rb
+ def target_class
+   Registry.find(target_key)
+ end
+
+ def importable_records
+   records.select(&:importable?)
+ end
+ ```
+
+ `target_class` delegates to the Registry so any code holding a `DataImport` can reach the target's column definitions, mappings, and hooks. `importable_records` filters parsed records down to those that passed validation -- the subset the Orchestrator will actually persist.
+
+ ### Step 2 -- The Source base class
+
+ Sources are responsible for one thing: turning raw input into an array of hashes where keys are target column names. The base class defines the interface and the shared mapping logic:
+
+ ```ruby
+ # lib/data_porter/sources/base.rb
+ module DataPorter
+   module Sources
+     class Base
+       def initialize(data_import, **)
+         @data_import = data_import
+         @target_class = data_import.target_class
+       end
+
+       def fetch
+         raise NotImplementedError
+       end
+     end
+   end
+ end
+ ```
+
+ Every source receives the `DataImport` record at construction, which gives it access to the target class (for column mappings) and the config hash (for source-specific options). The `**` double splat lets subclasses accept extra keyword arguments without the base class needing to know about them.
+
+ The mapping logic lives in the base class because it is shared across all sources that deal with key-value rows:
+
+ ```ruby
+ # lib/data_porter/sources/base.rb (private methods)
+ def apply_csv_mapping(row)
+   mappings = @target_class._csv_mappings
+   return auto_map(row) if mappings.nil? || mappings.empty?
+
+   explicit_map(row, mappings)
+ end
+
+ def auto_map(row)
+   row.to_h.transform_keys { |k| k.parameterize(separator: "_").to_sym }
+ end
+
+ def explicit_map(row, mappings)
+   mappings.each_with_object({}) do |(header, column), hash|
+     hash[column] = row[header]
+   end
+ end
+ ```
+
+ There are two mapping strategies. When a target defines `csv_mapping`, explicit mapping applies: only the declared header-to-column pairs are extracted, and anything else in the row is silently dropped. When no mapping is defined, auto-mapping kicks in: every header is parameterized into a snake_case symbol (`"First Name"` becomes `:first_name`). This lets simple imports work with zero configuration while giving complex imports full control over which columns matter.
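+
+ To make the difference concrete, here is a standalone sketch of what each strategy produces (illustrative only; `parameterize` comes from ActiveSupport):
+
+ ```ruby
+ require "active_support/core_ext/string/inflections"
+
+ row = { "First Name" => "Alice", "E-mail Address" => "alice@example.com" }
+
+ # Auto-mapping: every header becomes a snake_case symbol
+ row.transform_keys { |k| k.parameterize(separator: "_").to_sym }
+ # => { first_name: "Alice", e_mail_address: "alice@example.com" }
+
+ # Explicit mapping: only declared pairs are extracted; other headers are dropped
+ mappings = { "E-mail Address" => :email }
+ mappings.each_with_object({}) { |(header, column), hash| hash[column] = row[header] }
+ # => { email: "alice@example.com" }
+ ```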
+
+ ### Step 3 -- The CSV source and source resolution
+
+ The CSV source implements `fetch` by parsing content through Ruby's standard library `CSV` class:
+
+ ```ruby
+ # lib/data_porter/sources/csv.rb
+ class Csv < Base
+   def initialize(data_import, content: nil)
+     super(data_import)
+     @content = content
+   end
+
+   def fetch
+     rows = []
+     ::CSV.parse(csv_content, **csv_options) do |row|
+       rows << apply_csv_mapping(row)
+     end
+     rows
+   end
+
+   private
+
+   def csv_content
+     @content || download_file
+   end
+
+   def download_file
+     @data_import.file.download
+   end
+
+   def csv_options
+     { headers: true }.merge(extra_options)
+   end
+
+   def extra_options
+     config = @data_import.config
+     return {} unless config.is_a?(Hash)
+     config.symbolize_keys.slice(:col_sep, :encoding)
+   end
+ end
+ ```
+
+ The `content:` keyword argument enables two usage modes. In production, `content` is nil and the source downloads the file from ActiveStorage via `@data_import.file.download`. In tests, you pass a CSV string directly, avoiding the need for file attachments or storage mocks. The `csv_options` method merges `headers: true` (so `CSV.parse` yields `CSV::Row` objects with named access) with any per-import overrides from the `config` column -- currently `col_sep` for semicolon-delimited files and `encoding` for non-UTF-8 data.
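+
+ Putting it together, a semicolon-delimited import would look roughly like this (a hypothetical call, assuming a target registered under `"guests"` with a French-header `csv_mapping`):
+
+ ```ruby
+ import = DataPorter::DataImport.create!(
+   target_key: "guests",
+   source_type: "csv",
+   user: current_user,
+   config: { "col_sep" => ";" }  # picked up by extra_options and merged into CSV.parse
+ )
+
+ source = DataPorter::Sources::Csv.new(import, content: "Prenom;Nom\nAlice;Smith\n")
+ source.fetch
+ # => [{ first_name: "Alice", last_name: "Smith" }]
+ ```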
+
+ Source resolution ties it together:
+
+ ```ruby
+ # lib/data_porter/sources.rb
+ module DataPorter
+   module Sources
+     REGISTRY = {
+       csv: Csv
+     }.freeze
+
+     def self.resolve(type)
+       REGISTRY.fetch(type.to_sym) { raise Error, "Unknown source type: #{type}" }
+     end
+   end
+ end
+ ```
+
+ This is intentionally simpler than the Target Registry. Sources are engine-internal (the gem ships them), so a frozen hash with a `resolve` method is sufficient. When we add JSON and API sources in part 12, they get one line each in the registry.
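+
+ As a sketch of where that is heading (class names assumed, to be confirmed in part 12):
+
+ ```ruby
+ REGISTRY = {
+   csv: Csv,
+   json: Json,  # added in part 12
+   api: Api     # added in part 12
+ }.freeze
+ ```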
+
+ ## Decisions & tradeoffs
+
+ | Decision | We chose | Over | Because |
+ |----------|----------|------|---------|
+ | User association | Polymorphic `belongs_to :user` | A configurable foreign key or no association | Polymorphic works with any user model name without configuration; the engine does not need to know the host app's auth setup |
+ | State tracking | Integer-backed `enum` | A state machine gem (AASM, Statesman) | Six linear states do not need transition guards or history tracking yet; a gem would add a dependency for no immediate benefit |
+ | Auto-mapping fallback | `parameterize` + `to_sym` on headers | Requiring explicit mapping for all imports | Auto-mapping lets simple CSVs work with zero target configuration; explicit mapping is there when headers don't match column names |
+ | CSV content injection | `content:` keyword on initialize | Always reading from ActiveStorage | Injecting content makes tests fast and storage-independent; production code passes `nil` and falls through to `download_file` |
+ | Source-specific config | JSONB `config` column on DataImport | Separate columns for each option | A single JSONB column absorbs any source's options (col_sep, encoding, API headers) without schema changes |
+
+ ## Testing it
+
+ DataImport specs verify validations, the status enum, and StoreModel integration:
+
+ ```ruby
+ # spec/data_porter/data_import_spec.rb
+ it "validates source_type inclusion" do
+   import = described_class.new(target_key: "guests", source_type: "xml")
+
+   expect(import).not_to be_valid
+   expect(import.errors[:source_type]).to include("is not included in the list")
+ end
+
+ it "saves and reloads with records" do
+   import = described_class.create!(
+     target_key: "guests", source_type: "csv",
+     user_type: "User", user_id: 1
+   )
+   record = DataPorter::StoreModels::ImportRecord.new(line_number: 1, data: { name: "Alice" })
+   import.update!(records: [record])
+
+   reloaded = described_class.find(import.id)
+   expect(reloaded.records.first.data).to eq({ "name" => "Alice" })
+ end
+ ```
+
+ CSV source specs exercise both mapping modes and the config override:
+
+ ```ruby
+ # spec/data_porter/sources/csv_spec.rb
+ it "parses CSV content and applies mapping" do
+   csv_content = "Prenom,Nom,Email\nAlice,Smith,alice@example.com\n"
+   source = described_class.new(data_import, content: csv_content)
+
+   rows = source.fetch
+
+   expect(rows.size).to eq(1)
+   expect(rows.first).to eq(
+     first_name: "Alice", last_name: "Smith", email: "alice@example.com"
+   )
+ end
+
+ it "auto-maps when no csv_mapping defined" do
+   csv_content = "First Name,Last Name\nAlice,Smith\n"
+   source = described_class.new(import, content: csv_content)
+
+   expect(source.fetch.first).to eq(first_name: "Alice", last_name: "Smith")
+ end
+ ```
+
+ All source specs pass CSV strings directly through the `content:` parameter, so they run without ActiveStorage, without file fixtures, and without I/O.
+
+ ## Recap
+
+ - The **DataImport model** is the database-backed record for every import, tracking target key, source type, status, parsed records (via StoreModel), and the initiating user (via polymorphic association).
+ - The **migration** uses JSONB columns for records, report, and config, keeping the schema stable as features evolve.
+ - The **Source base class** defines the `fetch` interface and shared column-mapping logic with two strategies: explicit mapping from the target's `csv_mapping` block, or automatic parameterize-based mapping when none is defined.
+ - The **CSV source** parses content via Ruby's `CSV` library, supports ActiveStorage file download in production and string injection in tests, and respects per-import config options like custom delimiters.
+
+ ## Next up
+
+ We have targets that describe imports and sources that parse raw data into mapped hashes. But right now, nothing coordinates the flow: reading the file, building ImportRecord objects, running validations, transitioning the status, and persisting results. In part 7, we build the **Orchestrator** -- the class that ties `DataImport`, `Target`, and `Source` together into the complete parse-then-import workflow. That is where state transitions, per-record error handling, and ActiveJob integration come in.
+
+ ---
+
+ *This is part 6 of the series "Building DataPorter - A Data Import Engine for Rails". [Previous: Designing a Target DSL](#) | [Next: The Orchestrator](#)*
data/docs/blog/007-orchestrator.md
@@ -0,0 +1,247 @@
+ ---
+ title: "Building DataPorter #7 -- The Orchestrator: Coordinating the Import Workflow"
+ series: "Building DataPorter - A Data Import Engine for Rails"
+ part: 7
+ tags: [ruby, rails, rails-engine, gem-development, orchestrator, activejob, error-handling]
+ published: false
+ ---
+
+ # The Orchestrator: Coordinating the Import Workflow
+
+ > How to coordinate parsing, validation, and persistence through a single class that manages state transitions, handles errors per-record, and integrates with ActiveJob.
+
+ ## Context
+
+ This is part 7 of the series where we build **DataPorter**, a mountable Rails engine for data import workflows. In [part 6](#), we built the DataImport model to track import state and the Source layer to parse CSV files into mapped hashes.
+
+ We now have all the individual pieces: targets describe imports, sources parse files, the RecordValidator checks column constraints, and DataImport tracks state. But nothing ties them together. In this article, we build the Orchestrator -- the class that coordinates the full parse-then-import workflow.
+
+ ## The problem
+
+ Right now, if you wanted to run an import, you would need to manually resolve the source, call `fetch`, iterate over the rows, build ImportRecord objects, run validations, update the status, and handle failures. That is a lot of procedural logic. If it lived in a controller action, it would be untestable, unreusable, and impossible to run in the background. If it lived in the model, DataImport would become a god object. We need a dedicated coordination layer that knows the *order* of operations but delegates the *details* to the objects that own them.
+
+ ## What we're building
+
+ Here is the Orchestrator in action -- two method calls that drive the entire workflow:
+
+ ```ruby
+ # In a controller or job
+ orchestrator = DataPorter::Orchestrator.new(data_import, content: csv_string)
+
+ # Step 1: Parse the file, validate records, generate a preview
+ orchestrator.parse!
+ data_import.status # => "previewing"
+ data_import.records # => [ImportRecord, ImportRecord, ...]
+
+ # Step 2: After user reviews the preview, persist the records
+ orchestrator.import!
+ data_import.status # => "completed"
+ data_import.report.imported_count # => 42
+ ```
+
+ Two methods, two phases. `parse!` turns raw data into validated records and stops at `previewing` so the user can review. `import!` takes the importable records and persists them through the target's `persist` method. If anything goes catastrophically wrong, the import transitions to `failed` with an error report.
+
+ ## Implementation
+
+ ### Step 1 -- The Orchestrator skeleton and parse phase
+
+ The Orchestrator is a plain Ruby object. It receives a DataImport and optional content (for testing), then delegates to the pieces we already built:
+
+ ```ruby
+ # lib/data_porter/orchestrator.rb
+ class Orchestrator
+   def initialize(data_import, content: nil)
+     @data_import = data_import
+     @target = data_import.target_class.new
+     @source_options = { content: content }.compact
+   end
+
+   def parse!
+     @data_import.parsing!
+     records = build_records
+     @data_import.update!(records: records, status: :previewing)
+     build_report
+   rescue StandardError => e
+     handle_failure(e)
+   end
+ end
+ ```
+
+ The constructor instantiates the target (so we can call its hooks) and compacts the source options (so `content: nil` does not override ActiveStorage downloads). The `parse!` method follows a strict sequence: transition to `parsing`, build and validate records, save them with a `previewing` status, then generate a summary report. The `rescue` at the method level catches any failure -- a malformed CSV, a missing file, an unexpected source error -- and transitions the import to `failed` with the error message preserved in the report.
+
+ Notice that `parsing!` is called *before* the work starts. This is intentional: if the job crashes between the status transition and the `update!`, the import is left in `parsing` rather than `pending`, signaling to the user that something went wrong mid-process rather than silently sitting idle.
+
+ ### Step 2 -- Building and validating records
+
+ The `build_records` method is where the Source, Target, and RecordValidator converge:
+
+ ```ruby
+ # lib/data_porter/orchestrator.rb
+ def build_records
+   source = @data_import.source_class.new(@data_import, **@source_options)
+   raw_rows = source.fetch
+   columns = @target.class._columns || []
+   validator = RecordValidator.new(columns)
+
+   raw_rows.each_with_index.map do |row, index|
+     build_record(row, index, columns, validator)
+   end
+ end
+
+ def build_record(row, index, columns, validator)
+   record = StoreModels::ImportRecord.new(
+     line_number: index + 1,
+     data: extract_data(row, columns)
+   )
+   record = @target.transform(record)
+   @target.validate(record)
+   validator.validate(record)
+   record.determine_status!
+   record
+ end
+ ```
+
+ Each row goes through a four-step pipeline: extract the data for declared columns, let the target transform it (e.g., normalizing phone numbers), let the target run custom validations, then run the generic column-level validations (required fields, type checks). Finally, `determine_status!` sets each record to `complete`, `partial`, or `missing` based on whether errors were added.
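+
+ For a sense of what the target side of this pipeline looks like, here is a hypothetical target hook (a sketch following the `transform` signature above; the exact DSL comes from part 5):
+
+ ```ruby
+ class GuestTarget < DataPorter::Target
+   # transform receives each ImportRecord before validation and must return it
+   def transform(record)
+     phone = record.data[:phone]
+     record.data[:phone] = phone.to_s.gsub(/\D/, "") if phone  # strip formatting
+     record
+   end
+ end
+ ```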
+
+ The RecordValidator handles the generic constraints we defined in the column DSL:
+
+ ```ruby
+ # lib/data_porter/record_validator.rb
+ class RecordValidator
+   def initialize(columns)
+     @columns = columns
+   end
+
+   def validate(record)
+     @columns.each do |col|
+       value = record.data[col.name]
+       validate_required(record, col, value)
+       validate_type(record, col, value)
+     end
+   end
+ end
+ ```
+
+ This separation matters: the target owns business-rule validations ("email must be unique in the system"), while the RecordValidator owns structural validations ("email must look like an email"). Neither knows about the other.
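+
+ The two private checks are elided above; their shape is roughly this (a sketch -- `col.required`, `col.type`, and `TypeValidator.valid?` are assumed names, not verified against the gem):
+
+ ```ruby
+ # lib/data_porter/record_validator.rb (sketch of the private helpers)
+ def validate_required(record, col, value)
+   return unless col.required && value.to_s.strip.empty?
+
+   record.add_error("#{col.name} is required")
+ end
+
+ def validate_type(record, col, value)
+   return if value.nil? || TypeValidator.valid?(value, col.type)
+
+   record.add_error("#{col.name} must be a valid #{col.type}")
+ end
+ ```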
+
+ ### Step 3 -- The import phase and per-record error handling
+
+ Once the user reviews the preview and confirms, `import!` persists the records:
+
+ ```ruby
+ # lib/data_porter/orchestrator.rb
+ def import!
+   @data_import.importing!
+   results = import_records
+   update_import_report(results)
+   @target.after_import(results, context: build_context)
+ rescue StandardError => e
+   handle_failure(e)
+ end
+
+ def persist_record(record, context, results)
+   @target.persist(record, context: context)
+   results[:created] += 1
+ rescue StandardError => e
+   record.add_error(e.message)
+   @target.on_error(record, e, context: context)
+   results[:errored] += 1
+ end
+ ```
+
+ The critical design decision here is the error boundary. Each record is persisted individually, and if `persist` raises -- a uniqueness violation, a foreign key constraint, a custom validation from the host app -- the error is captured *on that record* and the import continues. The import does not wrap everything in a single transaction. This means a 10,000-row file with 3 bad records will successfully import 9,997 records rather than rolling back the entire batch.
+
+ The `on_error` hook lets the target react to failures (logging, notifying, skipping related records), while `after_import` runs once after all records are processed, receiving the results hash for summary work like sending a confirmation email.
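+
+ The `import_records` loop itself holds no surprises; inferred from the pieces above, it is roughly:
+
+ ```ruby
+ def import_records
+   results = { created: 0, errored: 0 }
+   context = build_context
+
+   # Only records that passed parse-phase validation are persisted
+   @data_import.importable_records.each do |record|
+     persist_record(record, context, results)
+   end
+
+   results
+ end
+ ```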
+
+ ### Step 4 -- ActiveJob integration
+
+ The Orchestrator is designed to be called from anywhere, but its primary consumer is a pair of ActiveJob classes:
+
+ ```ruby
+ # app/jobs/data_porter/parse_job.rb
+ class ParseJob < ActiveJob::Base
+   queue_as { DataPorter.configuration.queue_name }
+
+   def perform(import_id)
+     data_import = DataImport.find(import_id)
+     Orchestrator.new(data_import).parse!
+   end
+ end
+
+ # app/jobs/data_porter/import_job.rb
+ class ImportJob < ActiveJob::Base
+   queue_as { DataPorter.configuration.queue_name }
+
+   def perform(import_id)
+     data_import = DataImport.find(import_id)
+     Orchestrator.new(data_import).import!
+   end
+ end
+ ```
+
+ Each job is a one-liner: find the import, delegate to the Orchestrator. The queue name comes from the engine's configuration, so the host app controls which queue processes imports. Because the Orchestrator already handles failures internally (transitioning to `failed` and recording the error), the jobs do not need their own error handling -- a crash at the ActiveJob level means something truly unexpected happened, and the adapter's retry mechanism takes over.
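+
+ From the host app's side, triggering either phase is a standard ActiveJob enqueue:
+
+ ```ruby
+ DataPorter::ParseJob.perform_later(data_import.id)
+ # ...user reviews the preview in the UI, then:
+ DataPorter::ImportJob.perform_later(data_import.id)
+ ```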
+
+ ## Decisions & tradeoffs
+
+ | Decision | We chose | Over | Because |
+ |----------|----------|------|---------|
+ | Coordination layer | Dedicated Orchestrator class | Controller-level logic or model callbacks | Keeps controllers thin, models focused on data, and the workflow independently testable |
+ | Transaction boundaries | Per-record persist (no wrapping transaction) | Single transaction around all records | A failed record should not roll back thousands of successful ones; partial success is more useful than total failure |
+ | Error recovery | Capture error on the record, continue importing | Halt on first error | Users expect to see which rows failed and why, not just "import failed"; the report becomes actionable |
+ | Two-phase workflow | Separate `parse!` and `import!` methods | A single `run!` method | The preview step between parse and import lets users catch problems before data hits the database |
+ | Job design | Thin jobs delegating to Orchestrator | Logic inside the job classes | The Orchestrator is testable without ActiveJob; jobs are just the async trigger |
+
+ ## Testing it
+
+ The Orchestrator specs exercise both phases end-to-end using an anonymous target class and injected CSV content:
+
+ ```ruby
+ # spec/data_porter/orchestrator_spec.rb
+ describe "#parse!" do
+   it "transitions to previewing" do
+     orchestrator = described_class.new(data_import, content: csv_content)
+
+     orchestrator.parse!
+
+     expect(data_import.reload.status).to eq("previewing")
+   end
+
+   it "validates required fields" do
+     csv = "First Name,Last Name,Email\n,Smith,alice@example.com\n"
+     orchestrator = described_class.new(data_import, content: csv)
+
+     orchestrator.parse!
+
+     record = data_import.reload.records.first
+     expect(record.status).to eq("missing")
+   end
+ end
+
+ describe "#import!" do
+   it "handles per-record errors" do
+     # Target that always raises on persist
+     orchestrator = described_class.new(import)
+     orchestrator.import!
+
+     expect(import.reload.status).to eq("completed")
+     expect(import.report.errored_count).to eq(1)
+   end
+ end
+ ```
+
+ The key pattern: even when every `persist` call raises, the import still reaches `completed` -- not `failed`. The `failed` status is reserved for catastrophic errors (the source cannot be read, the target cannot be resolved). Per-record errors are expected operational noise, tracked in the report.
+
+ ## Recap
+
+ - The **Orchestrator** is a plain Ruby class that coordinates the parse-validate-persist workflow, keeping controllers thin and models focused.
+ - The **two-phase design** (`parse!` then `import!`) creates a natural preview checkpoint where users can review data before it touches the database.
+ - **Per-record error handling** means a single bad row never takes down the entire import; errors are captured on individual records and surfaced in the report.
+ - **ActiveJob integration** is a thin wrapper: two one-liner jobs that delegate to the Orchestrator, using the engine's configured queue name.
+
+ ## Next up
+
+ The import now runs in the background, but the user has no way to know what is happening. They click "Import" and stare at a static page. In part 8, we build a **real-time progress system** using ActionCable and Stimulus -- a Broadcaster service that pushes status updates and record counts to the browser as the Orchestrator processes each row. No more refreshing to check if it is done.
+
+ ---
+
+ *This is part 7 of the series "Building DataPorter - A Data Import Engine for Rails". [Previous: Parsing CSV Data with Sources](#) | [Next: Real-time Progress with ActionCable & Stimulus](#)*