data_porter 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (159)
  1. checksums.yaml +7 -0
  2. data/.claude/commands/blog-status.md +10 -0
  3. data/.claude/commands/blog.md +109 -0
  4. data/.claude/commands/task-done.md +27 -0
  5. data/.claude/commands/tm/add-dependency.md +58 -0
  6. data/.claude/commands/tm/add-subtask.md +79 -0
  7. data/.claude/commands/tm/add-task.md +81 -0
  8. data/.claude/commands/tm/analyze-complexity.md +124 -0
  9. data/.claude/commands/tm/analyze-project.md +100 -0
  10. data/.claude/commands/tm/auto-implement-tasks.md +100 -0
  11. data/.claude/commands/tm/command-pipeline.md +80 -0
  12. data/.claude/commands/tm/complexity-report.md +120 -0
  13. data/.claude/commands/tm/convert-task-to-subtask.md +74 -0
  14. data/.claude/commands/tm/expand-all-tasks.md +52 -0
  15. data/.claude/commands/tm/expand-task.md +52 -0
  16. data/.claude/commands/tm/fix-dependencies.md +82 -0
  17. data/.claude/commands/tm/help.md +101 -0
  18. data/.claude/commands/tm/init-project-quick.md +49 -0
  19. data/.claude/commands/tm/init-project.md +53 -0
  20. data/.claude/commands/tm/install-taskmaster.md +118 -0
  21. data/.claude/commands/tm/learn.md +106 -0
  22. data/.claude/commands/tm/list-tasks-by-status.md +42 -0
  23. data/.claude/commands/tm/list-tasks-with-subtasks.md +30 -0
  24. data/.claude/commands/tm/list-tasks.md +46 -0
  25. data/.claude/commands/tm/next-task.md +69 -0
  26. data/.claude/commands/tm/parse-prd-with-research.md +51 -0
  27. data/.claude/commands/tm/parse-prd.md +52 -0
  28. data/.claude/commands/tm/project-status.md +67 -0
  29. data/.claude/commands/tm/quick-install-taskmaster.md +23 -0
  30. data/.claude/commands/tm/remove-all-subtasks.md +94 -0
  31. data/.claude/commands/tm/remove-dependency.md +65 -0
  32. data/.claude/commands/tm/remove-subtask.md +87 -0
  33. data/.claude/commands/tm/remove-subtasks.md +89 -0
  34. data/.claude/commands/tm/remove-task.md +110 -0
  35. data/.claude/commands/tm/setup-models.md +52 -0
  36. data/.claude/commands/tm/show-task.md +85 -0
  37. data/.claude/commands/tm/smart-workflow.md +58 -0
  38. data/.claude/commands/tm/sync-readme.md +120 -0
  39. data/.claude/commands/tm/tm-main.md +147 -0
  40. data/.claude/commands/tm/to-cancelled.md +58 -0
  41. data/.claude/commands/tm/to-deferred.md +50 -0
  42. data/.claude/commands/tm/to-done.md +47 -0
  43. data/.claude/commands/tm/to-in-progress.md +39 -0
  44. data/.claude/commands/tm/to-pending.md +35 -0
  45. data/.claude/commands/tm/to-review.md +43 -0
  46. data/.claude/commands/tm/update-single-task.md +122 -0
  47. data/.claude/commands/tm/update-task.md +75 -0
  48. data/.claude/commands/tm/update-tasks-from-id.md +111 -0
  49. data/.claude/commands/tm/validate-dependencies.md +72 -0
  50. data/.claude/commands/tm/view-models.md +52 -0
  51. data/.env.example +12 -0
  52. data/.mcp.json +24 -0
  53. data/.taskmaster/CLAUDE.md +435 -0
  54. data/.taskmaster/config.json +44 -0
  55. data/.taskmaster/docs/prd.txt +2044 -0
  56. data/.taskmaster/state.json +6 -0
  57. data/.taskmaster/tasks/task_001.md +19 -0
  58. data/.taskmaster/tasks/task_002.md +19 -0
  59. data/.taskmaster/tasks/task_003.md +19 -0
  60. data/.taskmaster/tasks/task_004.md +19 -0
  61. data/.taskmaster/tasks/task_005.md +19 -0
  62. data/.taskmaster/tasks/task_006.md +19 -0
  63. data/.taskmaster/tasks/task_007.md +19 -0
  64. data/.taskmaster/tasks/task_008.md +19 -0
  65. data/.taskmaster/tasks/task_009.md +19 -0
  66. data/.taskmaster/tasks/task_010.md +19 -0
  67. data/.taskmaster/tasks/task_011.md +19 -0
  68. data/.taskmaster/tasks/task_012.md +19 -0
  69. data/.taskmaster/tasks/task_013.md +19 -0
  70. data/.taskmaster/tasks/task_014.md +19 -0
  71. data/.taskmaster/tasks/task_015.md +19 -0
  72. data/.taskmaster/tasks/task_016.md +19 -0
  73. data/.taskmaster/tasks/task_017.md +19 -0
  74. data/.taskmaster/tasks/task_018.md +19 -0
  75. data/.taskmaster/tasks/task_019.md +19 -0
  76. data/.taskmaster/tasks/task_020.md +19 -0
  77. data/.taskmaster/tasks/tasks.json +299 -0
  78. data/.taskmaster/templates/example_prd.txt +47 -0
  79. data/.taskmaster/templates/example_prd_rpg.txt +511 -0
  80. data/CHANGELOG.md +29 -0
  81. data/CLAUDE.md +65 -0
  82. data/CODE_OF_CONDUCT.md +10 -0
  83. data/CONTRIBUTING.md +49 -0
  84. data/LICENSE +21 -0
  85. data/README.md +463 -0
  86. data/Rakefile +12 -0
  87. data/app/assets/stylesheets/data_porter/application.css +646 -0
  88. data/app/channels/data_porter/import_channel.rb +10 -0
  89. data/app/controllers/data_porter/imports_controller.rb +68 -0
  90. data/app/javascript/data_porter/progress_controller.js +33 -0
  91. data/app/jobs/data_porter/dry_run_job.rb +12 -0
  92. data/app/jobs/data_porter/import_job.rb +12 -0
  93. data/app/jobs/data_porter/parse_job.rb +12 -0
  94. data/app/models/data_porter/data_import.rb +49 -0
  95. data/app/views/data_porter/imports/index.html.erb +142 -0
  96. data/app/views/data_porter/imports/new.html.erb +88 -0
  97. data/app/views/data_porter/imports/show.html.erb +49 -0
  98. data/config/database.yml +3 -0
  99. data/config/routes.rb +12 -0
  100. data/docs/SPEC.md +2012 -0
  101. data/docs/UI.md +32 -0
  102. data/docs/blog/001-why-build-a-data-import-engine.md +166 -0
  103. data/docs/blog/002-scaffolding-a-rails-engine.md +188 -0
  104. data/docs/blog/003-configuration-dsl.md +222 -0
  105. data/docs/blog/004-store-model-jsonb.md +237 -0
  106. data/docs/blog/005-target-dsl.md +284 -0
  107. data/docs/blog/006-parsing-csv-sources.md +300 -0
  108. data/docs/blog/007-orchestrator.md +247 -0
  109. data/docs/blog/008-actioncable-stimulus.md +376 -0
  110. data/docs/blog/009-phlex-ui-components.md +446 -0
  111. data/docs/blog/010-controllers-routing.md +374 -0
  112. data/docs/blog/011-generators.md +364 -0
  113. data/docs/blog/012-json-api-sources.md +323 -0
  114. data/docs/blog/013-testing-rails-engine.md +618 -0
  115. data/docs/blog/014-dry-run.md +307 -0
  116. data/docs/blog/015-publishing-retro.md +264 -0
  117. data/docs/blog/016-erb-view-templates.md +431 -0
  118. data/docs/blog/017-showcase-final-retro.md +220 -0
  119. data/docs/blog/BACKLOG.md +8 -0
  120. data/docs/blog/SERIES.md +154 -0
  121. data/docs/screenshots/index-with-previewing.jpg +0 -0
  122. data/docs/screenshots/index.jpg +0 -0
  123. data/docs/screenshots/modal-new-import.jpg +0 -0
  124. data/docs/screenshots/preview.jpg +0 -0
  125. data/lib/data_porter/broadcaster.rb +29 -0
  126. data/lib/data_porter/components/base.rb +10 -0
  127. data/lib/data_porter/components/failure_alert.rb +20 -0
  128. data/lib/data_porter/components/preview_table.rb +54 -0
  129. data/lib/data_porter/components/progress_bar.rb +33 -0
  130. data/lib/data_porter/components/results_summary.rb +19 -0
  131. data/lib/data_porter/components/status_badge.rb +16 -0
  132. data/lib/data_porter/components/summary_cards.rb +30 -0
  133. data/lib/data_porter/components.rb +14 -0
  134. data/lib/data_porter/configuration.rb +25 -0
  135. data/lib/data_porter/dsl/api_config.rb +25 -0
  136. data/lib/data_porter/dsl/column.rb +17 -0
  137. data/lib/data_porter/engine.rb +15 -0
  138. data/lib/data_porter/orchestrator.rb +141 -0
  139. data/lib/data_porter/record_validator.rb +32 -0
  140. data/lib/data_porter/registry.rb +33 -0
  141. data/lib/data_porter/sources/api.rb +49 -0
  142. data/lib/data_porter/sources/base.rb +35 -0
  143. data/lib/data_porter/sources/csv.rb +43 -0
  144. data/lib/data_porter/sources/json.rb +45 -0
  145. data/lib/data_porter/sources.rb +20 -0
  146. data/lib/data_porter/store_models/error.rb +13 -0
  147. data/lib/data_porter/store_models/import_record.rb +52 -0
  148. data/lib/data_porter/store_models/report.rb +21 -0
  149. data/lib/data_porter/target.rb +89 -0
  150. data/lib/data_porter/type_validator.rb +46 -0
  151. data/lib/data_porter/version.rb +5 -0
  152. data/lib/data_porter.rb +32 -0
  153. data/lib/generators/data_porter/install/install_generator.rb +33 -0
  154. data/lib/generators/data_porter/install/templates/create_data_porter_imports.rb.erb +21 -0
  155. data/lib/generators/data_porter/install/templates/initializer.rb +30 -0
  156. data/lib/generators/data_porter/target/target_generator.rb +44 -0
  157. data/lib/generators/data_porter/target/templates/target.rb.tt +20 -0
  158. data/sig/data_porter.rbs +4 -0
  159. metadata +274 -0
data/docs/blog/004-store-model-jsonb.md
@@ -0,0 +1,237 @@

---
title: "Building DataPorter #4 — Modeling import data with StoreModel & JSONB"
series: "Building DataPorter - A Data Import Engine for Rails"
part: 4
tags: [ruby, rails, rails-engine, gem-development, store-model, jsonb, data-modeling]
published: false
---

# Modeling import data with StoreModel & JSONB

> Storing structured import records, errors, and reports inside a single JSONB column -- no extra tables, no schema sprawl.

## Context

This is part 4 of the series where we build **DataPorter**, a mountable Rails engine for data import workflows. In [part 3](#), we built the configuration DSL that lets host apps customize the gem through a clean `configure` block.

Now we shift from *how the gem behaves* to *what it operates on*: the data models for parsed records, validation errors, and summary reports. We'll model all three using the StoreModel gem and PostgreSQL JSONB columns.

## The problem

A typical import engine ends up with a lot of tables: imports, import rows, import errors, reports. Each needs a migration, foreign keys, indexes, and cleanup logic. For a gem that drops into any Rails app, that's a heavy footprint.

But these records are ephemeral. They exist during the import workflow, get consulted in the results view, and nobody queries them independently. You never ask "give me all errors across all imports." They're always accessed through their parent.

If the data is always read and written as a group, it doesn't need its own table. It needs a structured column.

## What we're building

A single `DataImport` record will carry its entire import payload in JSONB columns:

```ruby
# Anywhere in the engine
import = DataPorter::DataImport.find(42)

import.report.records_count # => 150
import.report.errored_count # => 3
import.report.error_reports.each { |e| puts e.message }

import.records.first.line_number # => 1
import.records.first.status      # => "complete"
import.records.first.data        # => { "name" => "Alice", "email" => "alice@example.com" }
```

No joins, no N+1 queries. Records and reports come back as typed Ruby objects with real attributes and methods -- not raw hashes.

## Implementation

### Step 1 -- The Error model

Every import record can accumulate validation errors. We need a small object to represent each one. StoreModel lets us define it like an ActiveModel attribute model, but serialized into JSON.

```ruby
# lib/data_porter/store_models/error.rb
module DataPorter
  module StoreModels
    class Error
      include StoreModel::Model

      attribute :message, :string
    end
  end
end
```

`include StoreModel::Model` gives us ActiveModel-compatible attributes that serialize to and from JSON. Why not a plain hash? Because `error.message` is a method call with autocompletion, not `error["message"]` where you guess at indifferent access. If we need `:code` or `:severity` later, we add an attribute and existing data deserializes cleanly -- new fields default to nil.
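
That forward-compatibility point is easy to demonstrate with plain Ruby stand-ins -- the structs below are illustrative, not DataPorter code; they play the role of the serialized error class before and after new attributes are added:

```ruby
require "json"

# Schema v1: only :message exists when the error is serialized.
OldError = Struct.new(:message, keyword_init: true)
stored = JSON.generate(OldError.new(message: "Name is required").to_h)

# Schema v2 adds :code and :severity. The old JSON still loads;
# attributes that were never stored simply come back as nil.
NewError = Struct.new(:message, :code, :severity, keyword_init: true)
error = NewError.new(**JSON.parse(stored, symbolize_names: true))

error.message # => "Name is required"
error.code    # => nil
```

StoreModel behaves the same way: attributes absent from the stored JSON come back as their defaults.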

### Step 2 -- The ImportRecord model

Each row from the source file becomes an `ImportRecord`. This is the workhorse of the import: it holds the parsed data, tracks validation status, and collects errors and warnings.

```ruby
# lib/data_porter/store_models/import_record.rb
module DataPorter
  module StoreModels
    class ImportRecord
      include StoreModel::Model

      attribute :line_number, :integer
      attribute :status, :string, default: "pending"
      attribute :data, default: -> { {} }
      attribute :errors_list, Error.to_array_type, default: -> { [] }
      attribute :warnings, Error.to_array_type, default: -> { [] }
      attribute :target_id, :integer
      attribute :dry_run_passed, :boolean, default: false
    end
  end
end
```

The `data` attribute stores whatever hash the source parser produces -- no explicit type, because each import target defines different columns. The lambda defaults (`-> { {} }`) are critical; without them, every record shares the same mutable object. `Error.to_array_type` makes `errors_list` a typed array: each JSON entry deserializes into an `Error` instance, not a raw hash.
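
The shared-mutable-default pitfall the lambdas guard against can be reproduced in a few lines of plain Ruby -- no StoreModel involved, and the class names here are made up for the demo:

```ruby
# One hash object reused as the default for every instance.
SHARED = {}
class WithSharedDefault
  attr_reader :data
  def initialize(data = SHARED)
    @data = data
  end
end

# A lambda builds a fresh hash per instance instead.
class WithLambdaDefault
  DEFAULT = -> { {} }
  attr_reader :data
  def initialize(data = DEFAULT.call)
    @data = data
  end
end

a = WithSharedDefault.new
WithSharedDefault.new.data[:name] = "Alice"
a.data[:name] # => "Alice" -- the write leaked into every record

b = WithLambdaDefault.new
WithLambdaDefault.new.data[:name] = "Alice"
b.data[:name] # => nil -- each record owns its hash
```

The same rule applies to any mutable default in Ruby: pass a factory, not an instance.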

The model also carries behavior. Status determination runs after validation:

```ruby
# lib/data_porter/store_models/import_record.rb
def determine_status!
  self.status = if required_error?
                  "missing"
                elsif errors_list.any?
                  "partial"
                else
                  "complete"
                end
end
```

Three statuses: "missing" (a required field is absent -- the record cannot be imported), "partial" (optional field errors exist -- the record can be imported with warnings), and "complete" (clean row, ready to go). The Orchestrator will call `determine_status!` after validation and use `importable?` to decide which records to persist.
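
Those rules fit in a pure function. Here's a standalone sketch -- `importable?` is inferred from the prose (anything except "missing" can be written), not lifted from the gem:

```ruby
# Status rules from this part, reduced to a pure function.
def status_for(required_error:, any_errors:)
  return "missing" if required_error
  any_errors ? "partial" : "complete"
end

# Inferred helper: "missing" records are skipped at persist time.
def importable?(status)
  status != "missing"
end

status_for(required_error: true,  any_errors: true)  # => "missing"
status_for(required_error: false, any_errors: true)  # => "partial"
status_for(required_error: false, any_errors: false) # => "complete"
```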

### Step 3 -- The Report model

After parsing and validating, we need a summary. The Report model aggregates counts and collects top-level errors (like "file has no header row" or "unexpected encoding").

```ruby
# lib/data_porter/store_models/report.rb
module DataPorter
  module StoreModels
    class Report
      include StoreModel::Model

      attribute :records_count, :integer, default: 0
      attribute :complete_count, :integer, default: 0
      attribute :partial_count, :integer, default: 0
      attribute :missing_count, :integer, default: 0
      attribute :duplicate_count, :integer, default: 0
      attribute :imported_count, :integer, default: 0
      attribute :errored_count, :integer, default: 0
      attribute :error_reports, Error.to_array_type, default: -> { [] }
    end
  end
end
```

Every counter defaults to zero; the Orchestrator increments them during processing. `error_reports` reuses `Error.to_array_type` for import-level errors that don't belong to a specific row -- same typed-array pattern as `ImportRecord#errors_list` (note the lambda default again, for the same shared-object reason), so the UI can render both with the same component.
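
One plausible way those counters get filled is a status tally over the processed records -- a plain-Ruby sketch of the idea, not the actual Orchestrator code (that arrives in part 7):

```ruby
# Tally record statuses into the Report's counter fields.
statuses = %w[complete complete partial missing complete]
counts   = statuses.tally # e.g. {"complete"=>3, "partial"=>1, "missing"=>1}

report = {
  records_count:  statuses.size,
  complete_count: counts.fetch("complete", 0),
  partial_count:  counts.fetch("partial", 0),
  missing_count:  counts.fetch("missing", 0)
}
report[:records_count] # => 5
```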

### Step 4 -- TypeValidator: validating before the database

StoreModel handles serialization. But we also need to validate *values* before they reach the model -- does "abc" pass as an integer? Is "not-a-date" a valid date? The TypeValidator module handles this at the column level, before the data ever touches ActiveRecord.

```ruby
# lib/data_porter/type_validator.rb
module DataPorter
  module TypeValidator
    VALIDATORS = {
      string:  ->(_value, _opts) { true },
      integer: ->(value, _opts) { Integer(value, exception: false) },
      decimal: ->(value, _opts) { Float(value, exception: false) },
      date:    ->(value, opts) { parse_date(value, opts) },
      email:   ->(value, _opts) { value.match?(/\A[^@\s]+@[^@\s]+\z/) },
      phone:   ->(value, _opts) { value.match?(/\A[+\d][\d\s\-().]{6,}\z/) },
      url:     ->(value, _opts) { valid_url?(value) },
      boolean: ->(value, _opts) { %w[true false 1 0].include?(value.to_s.downcase) }
    }.freeze
    # (private helpers parse_date and valid_url? omitted from this excerpt)
  end
end
```

Each type maps to a lambda that returns truthy or falsy. The public API is one method: `TypeValidator.valid?("42", :integer)`. Integers use `Integer()` with `exception: false` to avoid rescue-driven control flow. Dates support an optional `:format` option for regional formats like `"%d/%m/%Y"`.
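
The `valid?` entry point itself isn't shown in the excerpt. A minimal standalone version of the dispatch might look like this -- an assumed shape, trimmed to the types that need no helper methods:

```ruby
module MiniTypeValidator
  VALIDATORS = {
    string:  ->(_value, _opts) { true },
    integer: ->(value, _opts) { Integer(value, exception: false) },
    decimal: ->(value, _opts) { Float(value, exception: false) },
    boolean: ->(value, _opts) { %w[true false 1 0].include?(value.to_s.downcase) }
  }.freeze

  def self.valid?(value, type, **opts)
    validator = VALIDATORS.fetch(type) { raise ArgumentError, "unknown type: #{type}" }
    !!validator.call(value, opts) # collapse truthy/falsy into true/false
  end
end

MiniTypeValidator.valid?("42", :integer)  # => true
MiniTypeValidator.valid?("abc", :integer) # => false
MiniTypeValidator.valid?("3.5", :decimal) # => true
```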

The key design choice: validation happens *before* data enters the StoreModel. During parsing, the source reads a row, column definitions declare expected types, and the validator checks each value. Errors get added to the ImportRecord via `add_error`. By the time `determine_status!` runs, all column-level issues are captured.

This is deliberately separate from database-level validation (uniqueness, foreign keys), which runs later during the actual import. Keeping the two layers apart lets us show users a preview with type errors highlighted before any write attempt -- the foundation for the dry-run feature in part 14.

## Decisions & tradeoffs

| Decision | We chose | Over | Because |
|----------|----------|------|---------|
| Row storage | JSONB column (array of StoreModel) | Separate `import_rows` table | Records are always accessed through their parent; no independent queries needed. One fewer migration for host apps to manage |
| Structured JSON | StoreModel gem | Hand-rolled `serialize` / raw hashes | StoreModel gives us ActiveModel attributes, typed arrays, defaults, and validations. Writing our own serializer would duplicate all of that |
| Validation layer | Column-level TypeValidator + later DB-level | Database-only validation | Enables preview and dry-run without touching the database. Users see type errors immediately, before any write attempt |
| Error representation | StoreModel class with `:message` | Plain strings in an array | Extensible -- we can add `:code`, `:severity`, `:column` later without changing the array structure or breaking existing serialized data |
| Status logic | Method on ImportRecord (`determine_status!`) | External service or state machine gem | Status depends only on the record's own errors. No transitions or events needed. A method is the simplest thing that works |

## Testing it

ImportRecord specs verify status determination:

```ruby
# spec/data_porter/store_models/import_record_spec.rb
RSpec.describe DataPorter::StoreModels::ImportRecord do
  subject(:record) { described_class.new(line_number: 1, data: { name: "Alice" }) }

  describe "#determine_status!" do
    it "sets missing when required field error exists" do
      record.add_error("Name is required")
      record.determine_status!
      expect(record.status).to eq("missing")
    end

    it "sets partial when non-required error exists" do
      record.add_error("Email: invalid email")
      record.determine_status!
      expect(record.status).to eq("partial")
    end

    it "sets complete when no errors" do
      record.determine_status!
      expect(record.status).to eq("complete")
    end
  end
end
```

TypeValidator specs cover each type, including edge cases like custom date formats:

```ruby
# spec/data_porter/type_validator_spec.rb
RSpec.describe DataPorter::TypeValidator do
  it "accepts valid integers" do
    expect(described_class.valid?("42", :integer)).to be true
  end

  it "rejects non-integers" do
    expect(described_class.valid?("abc", :integer)).to be false
  end

  it "accepts dates with custom format" do
    expect(described_class.valid?("15/01/2024", :date, format: "%d/%m/%Y")).to be true
  end
end
```

No database setup needed for any of these. StoreModel objects instantiate in memory like plain Ruby objects, which makes the specs fast and isolated.

## Recap

- **JSONB columns** let us store structured import data (records, errors, reports) without extra tables. The data is always accessed through the parent import, so a separate table would add complexity with no query benefit.
- **StoreModel** turns those JSONB columns into proper Ruby objects with typed attributes, defaults, and methods. `Error.to_array_type` gives us typed arrays that serialize and deserialize automatically.
- **ImportRecord** is the core unit of work: it holds a parsed row, collects errors and warnings, and determines its own status based on the errors it carries.
- **TypeValidator** handles column-level validation before the database, enabling the preview and dry-run features. Each type is a lambda in a hash -- easy to extend, easy to test.

## Next up

We have configuration (part 3) and data models (this part). In part 5, we'll bring them together by designing the **Target DSL** -- the class-level interface that lets each import type declare its label, model, columns, and CSV mapping in a single file. One file per import type, zero boilerplate. If you've ever wanted `class_attribute` to do more heavy lifting, that's the one.

---

*This is part 4 of the series "Building DataPorter - A Data Import Engine for Rails". [Previous: Configuration DSL](#) | [Next: Designing a Target DSL](#)*
data/docs/blog/005-target-dsl.md
@@ -0,0 +1,284 @@

---
title: "Building DataPorter #5 — Designing a Target DSL"
series: "Building DataPorter - A Data Import Engine for Rails"
part: 5
tags: [ruby, rails, rails-engine, gem-development, dsl, metaprogramming, registry-pattern]
published: false
---

# Designing a Target DSL

> How to make each import type a single, self-describing Ruby class -- one file, zero boilerplate.

## Context

This is part 5 of the series where we build **DataPorter**, a mountable Rails engine for data import workflows. In [part 4](#), we modeled import records, errors, and reports using StoreModel and JSONB columns -- the data structures the engine operates on.

Now we need the layer that *describes* an import: what model does it target, what columns does it expect, how do CSV headers map to those columns? This is the Target DSL and the Registry that makes targets discoverable.

## The problem

Every import type in a Rails app needs the same boilerplate: column definitions, header mapping, validation rules, persistence logic. Without a convention, each import ends up in a different controller action or service object, with slightly different patterns. Adding a new import type means copying an existing one and changing field names.

We want a developer to open a single file, declare what their import looks like, and have the engine handle everything else. No initializer wiring, no registration callbacks, no controller configuration.

## What we're building

Here is what a complete target definition looks like in the host app:

```ruby
# app/data_porter/targets/guest_target.rb
class GuestTarget < DataPorter::Target
  label "Guests"
  model_name "Guest"
  icon "fas fa-users"
  sources :csv, :json

  columns do
    column :first_name, type: :string, required: true
    column :last_name, type: :string, required: true
    column :email, type: :email
  end

  csv_mapping do
    map "Prenom" => :first_name
    map "Nom" => :last_name
  end

  deduplicate_by :email

  def persist(record, context:)
    Guest.create!(record.data)
  end
end
```

That is the entire file. The class-level DSL declares metadata and column schema. The instance method `persist` handles the actual write. The engine discovers this class through the Registry and wires it into the UI and orchestration layer automatically.

## Implementation

### Step 1 -- The Column struct

Before the Target itself, we need a value object for columns. Each column has a name, a type for validation, a required flag, a display label, and an open-ended options hash for type-specific settings like date formats.

```ruby
# lib/data_porter/dsl/column.rb
module DataPorter
  module DSL
    Column = Struct.new(:name, :type, :required, :label, :options, keyword_init: true) do
      def initialize(name:, type: :string, required: false, label: nil, **options)
        super(
          name: name.to_sym,
          type: type.to_sym,
          required: required,
          label: label || name.to_s.humanize,
          options: options
        )
      end
    end
  end
end
```

A `Struct` gives us equality, `to_h`, `members`, and value-object semantics for free. The constructor coerces `name` and `type` to symbols so callers can pass strings or symbols without worrying. The `label` falls back to `humanize` -- one less thing to type for the common case, but overridable when the generated label doesn't fit (`column :full_name, label: "Full Name"`). The `**options` splat captures anything else (like `format: "%d/%m/%Y"` for dates) and tucks it into the `options` hash, keeping the struct's interface stable as we add type-specific features.
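
To poke at the struct outside Rails, you only need a stand-in for `String#humanize` (ActiveSupport provides the real one, which does more -- acronyms, `_id` stripping); everything else is plain Ruby:

```ruby
# Minimal humanize stand-in for a Rails-free demo.
class String
  def humanize
    tr("_", " ").capitalize
  end
end

Column = Struct.new(:name, :type, :required, :label, :options, keyword_init: true) do
  def initialize(name:, type: :string, required: false, label: nil, **options)
    super(name: name.to_sym, type: type.to_sym, required: required,
          label: label || name.to_s.humanize, options: options)
  end
end

col = Column.new(name: "first_name", required: true)
col.name  # => :first_name (string coerced to symbol)
col.type  # => :string (the default)
col.label # => "First name" (derived via humanize)

dob = Column.new(name: :birth_date, type: :date, format: "%d/%m/%Y")
dob.options # => {:format=>"%d/%m/%Y"}
```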

### Step 2 -- The Target base class

The Target is where the DSL lives. All the declarative methods (`label`, `model_name`, `columns`, etc.) are class methods on a base class that host-app targets inherit from. Instance methods provide hook points for the import lifecycle.

```ruby
# lib/data_porter/target.rb
module DataPorter
  class Target
    class << self
      attr_reader :_label, :_model_name, :_icon, :_sources,
                  :_columns, :_csv_mappings, :_dedup_keys

      def label(value) = @_label = value
      def model_name(value) = @_model_name = value
      def icon(value) = @_icon = value

      def sources(*types)
        @_sources = types.map(&:to_sym)
      end

      def columns(&)
        @_columns = []
        instance_eval(&)
      end

      def column(name, **)
        @_columns << DSL::Column.new(name: name, **)
      end

      def csv_mapping(&)
        @_csv_mappings = {}
        instance_eval(&)
      end

      def map(hash)
        @_csv_mappings.merge!(hash)
      end

      def deduplicate_by(*keys)
        @_dedup_keys = keys.map(&:to_sym)
      end
    end
  end
end
```

Every DSL method is a class method that stores its value in a class instance variable (`@_label`, not `@@label`). The underscore prefix is a convention to signal "this is DSL storage, not your public API." The `columns` block uses `instance_eval` to execute `column` calls in the class context, which gives us the nested block syntax without requiring the caller to reference `self`.
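
The "no leaking between classes" property of class instance variables is worth seeing in isolation. A toy base class (not the full Target) is enough to show it:

```ruby
class ToyTarget
  class << self
    attr_reader :_label

    def label(value)
      @_label = value
    end
  end
end

class GuestsToy < ToyTarget
  label "Guests"
end

class ProductsToy < ToyTarget; end

GuestsToy._label   # => "Guests"
ProductsToy._label # => nil -- @_label lives on GuestsToy alone
ToyTarget._label   # => nil -- the parent class is untouched too
```

Contrast with `@@label`: a class variable would be shared across the whole hierarchy, so the last target loaded would win.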

The instance-level hooks provide the extensibility points for the import lifecycle:

```ruby
# lib/data_porter/target.rb (instance methods)
def transform(record) = record
def validate(record) = nil
def persist(_record, context:) = raise NotImplementedError
def after_import(_results, context:) = nil
def on_error(_record, _error, context:) = nil
```

`transform` and `validate` are no-ops by default -- override them if you need custom data munging or cross-field validation beyond type checking. `persist` raises `NotImplementedError` because every target must define how records get written. `after_import` and `on_error` are optional hooks for cleanup, notifications, or error recovery. The `context:` keyword argument carries the host app context (current user, tenant, etc.) that we set up in the configuration DSL back in part 3.
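
Here's how the override-and-call model behaves when a target overrides some hooks and not others -- a standalone sketch where the base class is reduced to three hooks and "persistence" is an in-memory array:

```ruby
class HookBase
  def transform(record)
    record # no-op by default
  end

  def validate(_record)
    nil # no-op by default
  end

  def persist(_record, context:)
    raise NotImplementedError # every concrete target must define this
  end
end

class UpcasingTarget < HookBase
  def transform(record)
    record.merge(name: record[:name].to_s.upcase)
  end

  def persist(record, context:)
    (context[:store] ||= []) << record
  end
end

ctx = {}
target = UpcasingTarget.new
target.persist(target.transform({ name: "alice" }), context: ctx)
ctx[:store] # => [{:name=>"ALICE"}]

begin
  HookBase.new.persist({}, context: {})
rescue NotImplementedError
  # the base class refuses to persist -- by design
end
```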

The split between class methods (declaration) and instance methods (execution) is deliberate. Class methods describe *what* the import is. Instance methods describe *what happens* during the import. The Orchestrator (part 7) will call `target_class._columns` to know the schema, then instantiate the target and call `target.persist(record, context: ctx)` for each row. Keeping these on different layers prevents the declaration phase from depending on runtime state.

### Step 3 -- The Registry

Targets need to be discoverable. The engine's UI shows a dropdown of available import types; the Orchestrator looks up a target by key to process an import. The Registry is the central index.

```ruby
# lib/data_porter/registry.rb
module DataPorter
  class TargetNotFound < Error; end

  module Registry
    @targets = {}

    class << self
      def register(key, klass)
        @targets[key.to_sym] = klass
      end

      def find(key)
        @targets.fetch(key.to_sym) do
          raise TargetNotFound, "Target '#{key}' not found"
        end
      end

      def available
        @targets.map do |key, klass|
          { key: key, label: klass._label, icon: klass._icon }
        end
      end

      def clear
        @targets = {}
      end
    end
  end
end
```

The Registry is a module with class-level state -- essentially a singleton hash. `register` adds a target class under a symbolic key. `find` retrieves it, raising a custom `TargetNotFound` error instead of a generic `KeyError` so the controller can rescue it with a proper 404. `available` returns lightweight summaries for the UI: just the key, label, and icon, without exposing the full class.
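
The fetch-with-block pattern is what turns a silent `KeyError` into a domain error. A stripped-down replay of just that mechanism (standalone names, not the engine's):

```ruby
TargetNotFound = Class.new(StandardError)

module MiniRegistry
  @targets = {}

  class << self
    def register(key, klass)
      @targets[key.to_sym] = klass
    end

    def find(key)
      # fetch's block only runs on a miss, so hits pay no penalty
      @targets.fetch(key.to_sym) { raise TargetNotFound, "Target '#{key}' not found" }
    end
  end
end

MiniRegistry.register("guests", Struct) # any class stands in for a target
MiniRegistry.find(:guests) # => Struct -- keys normalize to symbols both ways

begin
  MiniRegistry.find(:unknown)
rescue TargetNotFound => e
  e.message # => "Target 'unknown' not found"
end
```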

Registration happens in the host app's initializer:

```ruby
# config/initializers/data_porter.rb
DataPorter::Registry.register(:guests, GuestTarget)
DataPorter::Registry.register(:products, ProductTarget)
```

We considered auto-discovery (scanning a directory for Target subclasses), but explicit registration is simpler to reason about: you can see exactly which targets are active, control ordering, and conditionally register based on environment or feature flags. The `clear` method supports testing and development reloading: specs can reset the registry between examples, and the initializer rebuilds it after a code reload.

## Decisions & tradeoffs

| Decision | We chose | Over | Because |
|----------|----------|------|---------|
| DSL placement | Class methods on a base class | Instance methods or a configuration hash | Class methods read like declarations, not procedure calls. They execute once at load time, not per-import |
| State storage | Class instance variables (`@_label`) | `class_attribute` from ActiveSupport | Class instance variables don't leak to subclasses by default, avoiding surprising inheritance behavior. We don't need the per-instance override that `class_attribute` provides |
| Column definition | `Struct` with keyword init | Plain hash or full ActiveModel class | Struct gives us typed attributes, equality, and `to_h` with no dependencies. A hash would lose the interface; ActiveModel would be overkill for a value object |
| Hook pattern | Instance methods with default no-ops | Event system or callback chain | Override-and-call is the simplest extension model. No subscription management, no ordering concerns. If you need it, override it |
| Registry | Explicit `register` calls | Auto-discovery via `inherited` hook or directory scanning | Explicit registration is visible, testable, and doesn't depend on load order or file system conventions. Auto-discovery can be added later as sugar on top |

## Testing it

Target specs verify both the DSL declarations and the default hook behavior:

```ruby
# spec/data_porter/target_spec.rb
let(:target_class) do
  Class.new(DataPorter::Target) do
    label "Guests"
    model_name "Guest"
    sources :csv, :json

    columns do
      column :first_name, type: :string, required: true
      column :email, type: :email
    end

    csv_mapping do
      map "Prenom" => :first_name
    end
  end
end

it "sets the label" do
  expect(target_class._label).to eq("Guests")
end

it "defines columns" do
  expect(target_class._columns.size).to eq(2)
end

it "persist raises NotImplementedError" do
  expect { target_class.new.persist(nil, context: nil) }
    .to raise_error(NotImplementedError)
end
```

Registry specs confirm lookup, error handling, and the `available` summary:

```ruby
# spec/data_porter/registry_spec.rb
before { described_class.clear }

it "stores a target by key" do
  described_class.register(:guests, target_class)
  expect(described_class.find(:guests)).to eq(target_class)
end

it "raises TargetNotFound for unknown keys" do
  expect { described_class.find(:unknown) }
    .to raise_error(DataPorter::TargetNotFound)
end

it "returns target summaries" do
  described_class.register(:guests, target_class)
  result = described_class.available
  expect(result).to contain_exactly(
    { key: :guests, label: "Guests", icon: "fas fa-users" }
  )
end
```

Both suites run without a database. Anonymous classes (`Class.new(DataPorter::Target)`) let us define fresh targets per test without polluting the class hierarchy.

## Recap

- The **Target base class** uses class methods as a declarative DSL: `label`, `model_name`, `columns`, `csv_mapping`, and `deduplicate_by`. Each stores its value in a class instance variable, keeping subclasses isolated.
- The **Column struct** is a lightweight value object that captures name, type, required flag, display label, and open-ended options. Struct gives us equality and `to_h` for free.
- **Instance method hooks** (`transform`, `validate`, `persist`, `after_import`, `on_error`) separate runtime behavior from static declaration. Only `persist` is mandatory; the rest default to no-ops.
- The **Registry** is an explicit registration system that maps symbolic keys to target classes, providing lookup for the Orchestrator and summaries for the UI.

## Next up

We have data models (part 4) and a target DSL (this part) that describes what each import expects. In part 6, we will wire them together with the **Source layer** -- starting with CSV parsing, ActiveStorage file handling, and automatic column mapping from CSV headers to target columns. That is where the first end-to-end flow comes together: upload a file, parse it, and see structured records.

---

*This is part 5 of the series "Building DataPorter - A Data Import Engine for Rails". [Previous: Modeling import data with StoreModel & JSONB](#) | [Next: Parsing CSV data with Sources](#)*