data_porter 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/.claude/commands/blog-status.md +10 -0
- data/.claude/commands/blog.md +109 -0
- data/.claude/commands/task-done.md +27 -0
- data/.claude/commands/tm/add-dependency.md +58 -0
- data/.claude/commands/tm/add-subtask.md +79 -0
- data/.claude/commands/tm/add-task.md +81 -0
- data/.claude/commands/tm/analyze-complexity.md +124 -0
- data/.claude/commands/tm/analyze-project.md +100 -0
- data/.claude/commands/tm/auto-implement-tasks.md +100 -0
- data/.claude/commands/tm/command-pipeline.md +80 -0
- data/.claude/commands/tm/complexity-report.md +120 -0
- data/.claude/commands/tm/convert-task-to-subtask.md +74 -0
- data/.claude/commands/tm/expand-all-tasks.md +52 -0
- data/.claude/commands/tm/expand-task.md +52 -0
- data/.claude/commands/tm/fix-dependencies.md +82 -0
- data/.claude/commands/tm/help.md +101 -0
- data/.claude/commands/tm/init-project-quick.md +49 -0
- data/.claude/commands/tm/init-project.md +53 -0
- data/.claude/commands/tm/install-taskmaster.md +118 -0
- data/.claude/commands/tm/learn.md +106 -0
- data/.claude/commands/tm/list-tasks-by-status.md +42 -0
- data/.claude/commands/tm/list-tasks-with-subtasks.md +30 -0
- data/.claude/commands/tm/list-tasks.md +46 -0
- data/.claude/commands/tm/next-task.md +69 -0
- data/.claude/commands/tm/parse-prd-with-research.md +51 -0
- data/.claude/commands/tm/parse-prd.md +52 -0
- data/.claude/commands/tm/project-status.md +67 -0
- data/.claude/commands/tm/quick-install-taskmaster.md +23 -0
- data/.claude/commands/tm/remove-all-subtasks.md +94 -0
- data/.claude/commands/tm/remove-dependency.md +65 -0
- data/.claude/commands/tm/remove-subtask.md +87 -0
- data/.claude/commands/tm/remove-subtasks.md +89 -0
- data/.claude/commands/tm/remove-task.md +110 -0
- data/.claude/commands/tm/setup-models.md +52 -0
- data/.claude/commands/tm/show-task.md +85 -0
- data/.claude/commands/tm/smart-workflow.md +58 -0
- data/.claude/commands/tm/sync-readme.md +120 -0
- data/.claude/commands/tm/tm-main.md +147 -0
- data/.claude/commands/tm/to-cancelled.md +58 -0
- data/.claude/commands/tm/to-deferred.md +50 -0
- data/.claude/commands/tm/to-done.md +47 -0
- data/.claude/commands/tm/to-in-progress.md +39 -0
- data/.claude/commands/tm/to-pending.md +35 -0
- data/.claude/commands/tm/to-review.md +43 -0
- data/.claude/commands/tm/update-single-task.md +122 -0
- data/.claude/commands/tm/update-task.md +75 -0
- data/.claude/commands/tm/update-tasks-from-id.md +111 -0
- data/.claude/commands/tm/validate-dependencies.md +72 -0
- data/.claude/commands/tm/view-models.md +52 -0
- data/.env.example +12 -0
- data/.mcp.json +24 -0
- data/.taskmaster/CLAUDE.md +435 -0
- data/.taskmaster/config.json +44 -0
- data/.taskmaster/docs/prd.txt +2044 -0
- data/.taskmaster/state.json +6 -0
- data/.taskmaster/tasks/task_001.md +19 -0
- data/.taskmaster/tasks/task_002.md +19 -0
- data/.taskmaster/tasks/task_003.md +19 -0
- data/.taskmaster/tasks/task_004.md +19 -0
- data/.taskmaster/tasks/task_005.md +19 -0
- data/.taskmaster/tasks/task_006.md +19 -0
- data/.taskmaster/tasks/task_007.md +19 -0
- data/.taskmaster/tasks/task_008.md +19 -0
- data/.taskmaster/tasks/task_009.md +19 -0
- data/.taskmaster/tasks/task_010.md +19 -0
- data/.taskmaster/tasks/task_011.md +19 -0
- data/.taskmaster/tasks/task_012.md +19 -0
- data/.taskmaster/tasks/task_013.md +19 -0
- data/.taskmaster/tasks/task_014.md +19 -0
- data/.taskmaster/tasks/task_015.md +19 -0
- data/.taskmaster/tasks/task_016.md +19 -0
- data/.taskmaster/tasks/task_017.md +19 -0
- data/.taskmaster/tasks/task_018.md +19 -0
- data/.taskmaster/tasks/task_019.md +19 -0
- data/.taskmaster/tasks/task_020.md +19 -0
- data/.taskmaster/tasks/tasks.json +299 -0
- data/.taskmaster/templates/example_prd.txt +47 -0
- data/.taskmaster/templates/example_prd_rpg.txt +511 -0
- data/CHANGELOG.md +29 -0
- data/CLAUDE.md +65 -0
- data/CODE_OF_CONDUCT.md +10 -0
- data/CONTRIBUTING.md +49 -0
- data/LICENSE +21 -0
- data/README.md +463 -0
- data/Rakefile +12 -0
- data/app/assets/stylesheets/data_porter/application.css +646 -0
- data/app/channels/data_porter/import_channel.rb +10 -0
- data/app/controllers/data_porter/imports_controller.rb +68 -0
- data/app/javascript/data_porter/progress_controller.js +33 -0
- data/app/jobs/data_porter/dry_run_job.rb +12 -0
- data/app/jobs/data_porter/import_job.rb +12 -0
- data/app/jobs/data_porter/parse_job.rb +12 -0
- data/app/models/data_porter/data_import.rb +49 -0
- data/app/views/data_porter/imports/index.html.erb +142 -0
- data/app/views/data_porter/imports/new.html.erb +88 -0
- data/app/views/data_porter/imports/show.html.erb +49 -0
- data/config/database.yml +3 -0
- data/config/routes.rb +12 -0
- data/docs/SPEC.md +2012 -0
- data/docs/UI.md +32 -0
- data/docs/blog/001-why-build-a-data-import-engine.md +166 -0
- data/docs/blog/002-scaffolding-a-rails-engine.md +188 -0
- data/docs/blog/003-configuration-dsl.md +222 -0
- data/docs/blog/004-store-model-jsonb.md +237 -0
- data/docs/blog/005-target-dsl.md +284 -0
- data/docs/blog/006-parsing-csv-sources.md +300 -0
- data/docs/blog/007-orchestrator.md +247 -0
- data/docs/blog/008-actioncable-stimulus.md +376 -0
- data/docs/blog/009-phlex-ui-components.md +446 -0
- data/docs/blog/010-controllers-routing.md +374 -0
- data/docs/blog/011-generators.md +364 -0
- data/docs/blog/012-json-api-sources.md +323 -0
- data/docs/blog/013-testing-rails-engine.md +618 -0
- data/docs/blog/014-dry-run.md +307 -0
- data/docs/blog/015-publishing-retro.md +264 -0
- data/docs/blog/016-erb-view-templates.md +431 -0
- data/docs/blog/017-showcase-final-retro.md +220 -0
- data/docs/blog/BACKLOG.md +8 -0
- data/docs/blog/SERIES.md +154 -0
- data/docs/screenshots/index-with-previewing.jpg +0 -0
- data/docs/screenshots/index.jpg +0 -0
- data/docs/screenshots/modal-new-import.jpg +0 -0
- data/docs/screenshots/preview.jpg +0 -0
- data/lib/data_porter/broadcaster.rb +29 -0
- data/lib/data_porter/components/base.rb +10 -0
- data/lib/data_porter/components/failure_alert.rb +20 -0
- data/lib/data_porter/components/preview_table.rb +54 -0
- data/lib/data_porter/components/progress_bar.rb +33 -0
- data/lib/data_porter/components/results_summary.rb +19 -0
- data/lib/data_porter/components/status_badge.rb +16 -0
- data/lib/data_porter/components/summary_cards.rb +30 -0
- data/lib/data_porter/components.rb +14 -0
- data/lib/data_porter/configuration.rb +25 -0
- data/lib/data_porter/dsl/api_config.rb +25 -0
- data/lib/data_porter/dsl/column.rb +17 -0
- data/lib/data_porter/engine.rb +15 -0
- data/lib/data_porter/orchestrator.rb +141 -0
- data/lib/data_porter/record_validator.rb +32 -0
- data/lib/data_porter/registry.rb +33 -0
- data/lib/data_porter/sources/api.rb +49 -0
- data/lib/data_porter/sources/base.rb +35 -0
- data/lib/data_porter/sources/csv.rb +43 -0
- data/lib/data_porter/sources/json.rb +45 -0
- data/lib/data_porter/sources.rb +20 -0
- data/lib/data_porter/store_models/error.rb +13 -0
- data/lib/data_porter/store_models/import_record.rb +52 -0
- data/lib/data_porter/store_models/report.rb +21 -0
- data/lib/data_porter/target.rb +89 -0
- data/lib/data_porter/type_validator.rb +46 -0
- data/lib/data_porter/version.rb +5 -0
- data/lib/data_porter.rb +32 -0
- data/lib/generators/data_porter/install/install_generator.rb +33 -0
- data/lib/generators/data_porter/install/templates/create_data_porter_imports.rb.erb +21 -0
- data/lib/generators/data_porter/install/templates/initializer.rb +30 -0
- data/lib/generators/data_porter/target/target_generator.rb +44 -0
- data/lib/generators/data_porter/target/templates/target.rb.tt +20 -0
- data/sig/data_porter.rbs +4 -0
- metadata +274 -0
@@ -0,0 +1,237 @@

---
title: "Building DataPorter #4 — Modeling import data with StoreModel & JSONB"
series: "Building DataPorter - A Data Import Engine for Rails"
part: 4
tags: [ruby, rails, rails-engine, gem-development, store-model, jsonb, data-modeling]
published: false
---

# Modeling import data with StoreModel & JSONB

> Storing structured import records, errors, and reports inside a single JSONB column -- no extra tables, no schema sprawl.

## Context

This is part 4 of the series where we build **DataPorter**, a mountable Rails engine for data import workflows. In [part 3](#), we built the configuration DSL that lets host apps customize the gem through a clean `configure` block.

Now we shift from *how the gem behaves* to *what it operates on*: the data models for parsed records, validation errors, and summary reports. We'll model all three using the StoreModel gem and PostgreSQL JSONB columns.

## The problem

A typical import engine ends up with a lot of tables: imports, import rows, import errors, reports. Each needs a migration, foreign keys, indexes, and cleanup logic. For a gem that drops into any Rails app, that's a heavy footprint.

But these records are ephemeral. They exist during the import workflow, get consulted in the results view, and nobody queries them independently. You never ask "give me all errors across all imports." They're always accessed through their parent.

If the data is always read and written as a group, it doesn't need its own table. It needs a structured column.

## What we're building

A single `DataImport` record will carry its entire import payload in JSONB columns:

```ruby
# Anywhere in the engine
import = DataPorter::DataImport.find(42)

import.report.records_count # => 150
import.report.errored_count # => 3
import.report.error_reports.each { |e| puts e.message }

import.records.first.line_number # => 1
import.records.first.status      # => "complete"
import.records.first.data        # => { "name" => "Alice", "email" => "alice@example.com" }
```

No joins, no N+1 queries. Records and reports come back as typed Ruby objects with real attributes and methods -- not raw hashes.

## Implementation

### Step 1 -- The Error model

Every import record can accumulate validation errors. We need a small object to represent each one. StoreModel lets us define it like an ActiveModel attribute model, but serialized into JSON.

```ruby
# lib/data_porter/store_models/error.rb
module DataPorter
  module StoreModels
    class Error
      include StoreModel::Model

      attribute :message, :string
    end
  end
end
```

`include StoreModel::Model` gives us ActiveModel-compatible attributes that serialize to and from JSON. Why not a plain hash? Because `error.message` is a method call with autocompletion, not `error["message"]` where you guess at indifferent access. If we need `:code` or `:severity` later, we add an attribute and existing data deserializes cleanly -- new fields default to nil.
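That forward-compatibility claim is easy to check in plain Ruby. The sketch below simulates StoreModel's read path with a hand-rolled attribute reader (the `ErrorShim` class is hypothetical, not DataPorter or StoreModel code):

```ruby
require "json"

# Minimal stand-in for a StoreModel attribute model: reads known
# attributes from a JSON hash, returning nil for anything missing.
class ErrorShim
  ATTRIBUTES = %w[message code severity].freeze

  def initialize(json)
    @data = JSON.parse(json)
  end

  ATTRIBUTES.each do |attr|
    define_method(attr) { @data[attr] } # nil when the key is absent
  end
end

# JSON written before :code and :severity were added...
old_payload = '{"message":"Email is invalid"}'

error = ErrorShim.new(old_payload)
error.message # => "Email is invalid"
error.code    # => nil -- new attribute, old data, no crash
```

Old serialized rows keep deserializing after the schema grows, which is exactly why a structured class beats plain strings here.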

### Step 2 -- The ImportRecord model

Each row from the source file becomes an `ImportRecord`. This is the workhorse of the import: it holds the parsed data, tracks validation status, and collects errors and warnings.

```ruby
# lib/data_porter/store_models/import_record.rb
module DataPorter
  module StoreModels
    class ImportRecord
      include StoreModel::Model

      attribute :line_number, :integer
      attribute :status, :string, default: "pending"
      attribute :data, default: -> { {} }
      attribute :errors_list, Error.to_array_type, default: -> { [] }
      attribute :warnings, Error.to_array_type, default: -> { [] }
      attribute :target_id, :integer
      attribute :dry_run_passed, :boolean, default: false
    end
  end
end
```

The `data` attribute stores whatever hash the source parser produces -- no explicit type, because each import target defines different columns. The lambda defaults (`-> { {} }`) are critical; without them, every record shares the same mutable object. `Error.to_array_type` makes `errors_list` a typed array: each JSON entry deserializes into an `Error` instance, not a raw hash.
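The shared-mutable-default pitfall is worth seeing outside StoreModel. A plain-Ruby sketch (the `Record` and `SafeRecord` classes are illustrative, not DataPorter code):

```ruby
# A naive default: one hash object, created once, shared by every record.
SHARED_DEFAULT = {}

class Record
  attr_reader :data

  def initialize(data: SHARED_DEFAULT)
    @data = data
  end
end

a = Record.new
b = Record.new
a.data[:name] = "Alice"
b.data # => {:name=>"Alice"} -- b was silently polluted

# The fix mirrors StoreModel's lambda default: build a fresh object per record.
class SafeRecord
  attr_reader :data

  DEFAULT = -> { {} }

  def initialize(data: DEFAULT.call)
    @data = data
  end
end

c = SafeRecord.new
d = SafeRecord.new
c.data[:name] = "Alice"
d.data # => {} -- each record owns its own hash
```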

The model also carries behavior. Status determination runs after validation:

```ruby
# lib/data_porter/store_models/import_record.rb
def determine_status!
  self.status = if required_error?
    "missing"
  elsif errors_list.any?
    "partial"
  else
    "complete"
  end
end
```

Three statuses: "missing" (a required field is absent -- the record cannot be imported), "partial" (optional field errors exist -- the record can be imported with warnings), and "complete" (clean row, ready to go). The Orchestrator will call `determine_status!` after validation and use `importable?` to decide which records to persist.
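For a concrete picture, here is the kind of JSON a validated record would serialize to inside the parent's JSONB column. This payload is illustrative (hand-written, not captured from DataPorter), but it follows the attribute names declared above:

```ruby
require "json"

payload = <<~JSON
  {
    "line_number": 7,
    "status": "partial",
    "data": { "name": "Alice", "email": "alice@invalid@" },
    "errors_list": [{ "message": "Email: invalid email" }],
    "warnings": [],
    "target_id": null,
    "dry_run_passed": false
  }
JSON

record = JSON.parse(payload)
record["status"]                       # => "partial"
record["errors_list"].first["message"] # => "Email: invalid email"
```

Everything the results view needs rides along in one column, and StoreModel re-hydrates it into typed objects on read.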

### Step 3 -- The Report model

After parsing and validating, we need a summary. The Report model aggregates counts and collects top-level errors (like "file has no header row" or "unexpected encoding").

```ruby
# lib/data_porter/store_models/report.rb
module DataPorter
  module StoreModels
    class Report
      include StoreModel::Model

      attribute :records_count, :integer, default: 0
      attribute :complete_count, :integer, default: 0
      attribute :partial_count, :integer, default: 0
      attribute :missing_count, :integer, default: 0
      attribute :duplicate_count, :integer, default: 0
      attribute :imported_count, :integer, default: 0
      attribute :errored_count, :integer, default: 0
      attribute :error_reports, Error.to_array_type, default: -> { [] }
    end
  end
end
```

Every counter defaults to zero; the Orchestrator increments them during processing. `error_reports` reuses `Error.to_array_type` for import-level errors that don't belong to a specific row -- same typed-array pattern as `ImportRecord#errors_list`, so the UI can render both with the same component. Note the lambda default (`-> { [] }`) again, for the same shared-mutable-object reason as before; the integer counters can safely use `default: 0` because integers are immutable.
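How the Orchestrator might fill such a report from record statuses can be sketched in plain Ruby (a hypothetical shape -- the real Orchestrator arrives in part 7):

```ruby
# Statuses produced by determine_status! across a parsed file.
statuses = %w[complete complete partial missing complete partial]

counts = statuses.tally
# => {"complete"=>3, "partial"=>2, "missing"=>1}

report = {
  records_count:  statuses.size,
  complete_count: counts.fetch("complete", 0),
  partial_count:  counts.fetch("partial", 0),
  missing_count:  counts.fetch("missing", 0)
}

report[:records_count] # => 6
report[:partial_count] # => 2
```

One pass over the records, a handful of counter writes, and the whole summary serializes into the same JSONB column as everything else.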

### Step 4 -- TypeValidator: validating before the database

StoreModel handles serialization. But we also need to validate *values* before they reach the model -- does "abc" pass as an integer? Is "not-a-date" a valid date? The TypeValidator module handles this at the column level, before the data ever touches ActiveRecord.

```ruby
# lib/data_porter/type_validator.rb
module DataPorter
  module TypeValidator
    VALIDATORS = {
      string: ->(_value, _opts) { true },
      integer: ->(value, _opts) { Integer(value, exception: false) },
      decimal: ->(value, _opts) { Float(value, exception: false) },
      date: ->(value, opts) { parse_date(value, opts) },
      email: ->(value, _opts) { value.match?(/\A[^@\s]+@[^@\s]+\z/) },
      phone: ->(value, _opts) { value.match?(/\A[+\d][\d\s\-().]{6,}\z/) },
      url: ->(value, _opts) { valid_url?(value) },
      boolean: ->(value, _opts) { %w[true false 1 0].include?(value.to_s.downcase) }
    }.freeze
  end
end
```

Each type maps to a lambda that returns truthy or falsy. The public API is one method: `TypeValidator.valid?("42", :integer)`. Integers use `Integer()` with `exception: false` to avoid rescue-driven control flow. Dates support an optional `:format` option for regional formatting like `"%d/%m/%Y"`.
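The excerpt omits `valid?`, `parse_date`, and `valid_url?`. Plausible implementations could look like the following sketch (my assumptions, not DataPorter's actual code -- the module name is changed to make that explicit):

```ruby
require "date"
require "uri"

module TypeValidatorSketch
  module_function

  # Public entry point: look up the lambda for the type, coerce to boolean.
  def valid?(value, type, **opts)
    validator = VALIDATORS.fetch(type) { raise ArgumentError, "unknown type #{type}" }
    !!validator.call(value, opts)
  end

  # Date.strptime raises on bad input; rescue turns that into falsy.
  def parse_date(value, opts)
    Date.strptime(value.to_s, opts.fetch(:format, "%Y-%m-%d"))
  rescue ArgumentError
    false
  end

  def valid_url?(value)
    uri = URI.parse(value.to_s)
    uri.is_a?(URI::HTTP) && !uri.host.nil?
  rescue URI::InvalidURIError
    false
  end

  # A subset of the real VALIDATORS table, enough to exercise valid?.
  VALIDATORS = {
    integer: ->(value, _opts) { Integer(value, exception: false) },
    date:    ->(value, opts)  { parse_date(value, opts) },
    url:     ->(value, _opts) { valid_url?(value) }
  }.freeze
end

TypeValidatorSketch.valid?("42", :integer)                          # => true
TypeValidatorSketch.valid?("15/01/2024", :date, format: "%d/%m/%Y") # => true
TypeValidatorSketch.valid?("not a url", :url)                       # => false
```

The `!!` coercion matters: `Integer("42", exception: false)` returns `42`, not `true`, so the lambdas are truthy/falsy and the public method normalizes the answer.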

The key design choice: validation happens *before* data enters the StoreModel. During parsing, the source reads a row, column definitions declare expected types, and the validator checks each value. Errors get added to the ImportRecord via `add_error`. By the time `determine_status!` runs, all column-level issues are captured.

This is deliberately separate from database-level validation (uniqueness, foreign keys), which runs later during the actual import. Keeping the two layers apart lets us show users a preview with type errors highlighted before any write attempt -- the foundation for the dry-run feature in part 14.

## Decisions & tradeoffs

| Decision | We chose | Over | Because |
|----------|----------|------|---------|
| Row storage | JSONB column (array of StoreModel) | Separate `import_rows` table | Records are always accessed through their parent; no independent queries needed. One fewer migration for host apps to manage |
| Structured JSON | StoreModel gem | Hand-rolled `serialize` / raw hashes | StoreModel gives us ActiveModel attributes, typed arrays, defaults, and validations. Writing our own serializer would duplicate all of that |
| Validation layer | Column-level TypeValidator + later DB-level | Database-only validation | Enables preview and dry-run without touching the database. Users see type errors immediately, before any write attempt |
| Error representation | StoreModel class with `:message` | Plain strings in an array | Extensible -- we can add `:code`, `:severity`, `:column` later without changing the array structure or breaking existing serialized data |
| Status logic | Method on ImportRecord (`determine_status!`) | External service or state machine gem | Status depends only on the record's own errors. No transitions or events needed. A method is the simplest thing that works |

## Testing it

ImportRecord specs verify status determination:

```ruby
# spec/data_porter/store_models/import_record_spec.rb
RSpec.describe DataPorter::StoreModels::ImportRecord do
  subject(:record) { described_class.new(line_number: 1, data: { name: "Alice" }) }

  describe "#determine_status!" do
    it "sets missing when required field error exists" do
      record.add_error("Name is required")
      record.determine_status!
      expect(record.status).to eq("missing")
    end

    it "sets partial when non-required error exists" do
      record.add_error("Email: invalid email")
      record.determine_status!
      expect(record.status).to eq("partial")
    end

    it "sets complete when no errors" do
      record.determine_status!
      expect(record.status).to eq("complete")
    end
  end
end
```

TypeValidator specs cover each type, including edge cases like custom date formats:

```ruby
# spec/data_porter/type_validator_spec.rb
RSpec.describe DataPorter::TypeValidator do
  it "accepts valid integers" do
    expect(described_class.valid?("42", :integer)).to be true
  end

  it "rejects non-integers" do
    expect(described_class.valid?("abc", :integer)).to be false
  end

  it "accepts dates with custom format" do
    expect(described_class.valid?("15/01/2024", :date, format: "%d/%m/%Y")).to be true
  end
end
```

No database setup needed for any of these. StoreModel objects instantiate in memory like plain Ruby objects, which makes the specs fast and isolated.

## Recap

- **JSONB columns** let us store structured import data (records, errors, reports) without extra tables. The data is always accessed through the parent import, so a separate table would add complexity with no query benefit.
- **StoreModel** turns those JSONB columns into proper Ruby objects with typed attributes, defaults, and methods. `Error.to_array_type` gives us typed arrays that serialize and deserialize automatically.
- **ImportRecord** is the core unit of work: it holds a parsed row, collects errors and warnings, and determines its own status based on the errors it carries.
- **TypeValidator** handles column-level validation before the database, enabling the preview and dry-run features. Each type is a lambda in a hash -- easy to extend, easy to test.

## Next up

We have configuration (part 3) and data models (this part). In part 5, we'll bring them together by designing the **Target DSL** -- the class-level interface that lets each import type declare its label, model, columns, and CSV mapping in a single file. One file per import type, zero boilerplate. If you've ever wanted `class_attribute` to do more heavy lifting, that's the one.

---

*This is part 4 of the series "Building DataPorter - A Data Import Engine for Rails". [Previous: Configuration DSL](#) | [Next: Designing a Target DSL](#)*
@@ -0,0 +1,284 @@

---
title: "Building DataPorter #5 — Designing a Target DSL"
series: "Building DataPorter - A Data Import Engine for Rails"
part: 5
tags: [ruby, rails, rails-engine, gem-development, dsl, metaprogramming, registry-pattern]
published: false
---

# Designing a Target DSL

> How to make each import type a single, self-describing Ruby class -- one file, zero boilerplate.

## Context

This is part 5 of the series where we build **DataPorter**, a mountable Rails engine for data import workflows. In [part 4](#), we modeled import records, errors, and reports using StoreModel and JSONB columns -- the data structures the engine operates on.

Now we need the layer that *describes* an import: what model does it target, what columns does it expect, how do CSV headers map to those columns? This is the Target DSL and the Registry that makes targets discoverable.

## The problem

Every import type in a Rails app needs the same boilerplate: column definitions, header mapping, validation rules, persistence logic. Without a convention, each import ends up in a different controller action or service object, with slightly different patterns. Adding a new import type means copying an existing one and changing field names.

We want a developer to open a single file, declare what their import looks like, and have the engine handle everything else. No initializer wiring, no registration callbacks, no controller configuration.

## What we're building

Here is what a complete target definition looks like in the host app:

```ruby
# app/data_porter/targets/guest_target.rb
class GuestTarget < DataPorter::Target
  label "Guests"
  model_name "Guest"
  icon "fas fa-users"
  sources :csv, :json

  columns do
    column :first_name, type: :string, required: true
    column :last_name, type: :string, required: true
    column :email, type: :email
  end

  csv_mapping do
    map "Prenom" => :first_name
    map "Nom" => :last_name
  end

  deduplicate_by :email

  def persist(record, context:)
    Guest.create!(record.data)
  end
end
```

That is the entire file. The class-level DSL declares metadata and column schema. The instance method `persist` handles the actual write. The engine discovers this class through the Registry and wires it into the UI and orchestration layer automatically.

## Implementation

### Step 1 -- The Column struct

Before the Target itself, we need a value object for columns. Each column has a name, a type for validation, a required flag, a display label, and an open-ended options hash for type-specific settings like date formats.

```ruby
# lib/data_porter/dsl/column.rb
module DataPorter
  module DSL
    Column = Struct.new(:name, :type, :required, :label, :options, keyword_init: true) do
      def initialize(name:, type: :string, required: false, label: nil, **options)
        super(
          name: name.to_sym,
          type: type.to_sym,
          required: required,
          label: label || name.to_s.humanize,
          options: options
        )
      end
    end
  end
end
```

A `Struct` gives us equality, `to_h`, `members`, and frozen-by-value semantics for free. The constructor coerces `name` and `type` to symbols so callers can pass strings or symbols without worrying. The `label` falls back to `humanize` -- one less thing to type for the common case, but overridable when the generated label doesn't fit (`column :full_name, label: "Full Name"`). The `**options` splat captures anything else (like `format: "%d/%m/%Y"` for dates) and tucks it into the `options` hash, keeping the struct's interface stable as we add type-specific features.
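The keyword-init Struct plus `**options` splat pattern, distilled into a dependency-free sketch (using `capitalize` as a stand-in for ActiveSupport's `humanize`, which isn't available outside Rails):

```ruby
Column = Struct.new(:name, :type, :required, :label, :options, keyword_init: true) do
  def initialize(name:, type: :string, required: false, label: nil, **options)
    super(
      name: name.to_sym,
      type: type.to_sym,
      required: required,
      label: label || name.to_s.capitalize, # stand-in for String#humanize
      options: options
    )
  end
end

col = Column.new(name: "signup_date", type: "date", format: "%d/%m/%Y")

col.name     # => :signup_date -- string coerced to symbol
col.type     # => :date
col.required # => false
col.options  # => {:format=>"%d/%m/%Y"} -- unknown keywords land here

# Struct equality is by value, so symbol and string inputs compare equal:
col == Column.new(name: :signup_date, type: :date, format: "%d/%m/%Y") # => true
```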

### Step 2 -- The Target base class

The Target is where the DSL lives. All the declarative methods (`label`, `model_name`, `columns`, etc.) are class methods on a base class that host-app targets inherit from. Instance methods provide hook points for the import lifecycle.

```ruby
# lib/data_porter/target.rb
module DataPorter
  class Target
    class << self
      attr_reader :_label, :_model_name, :_icon, :_sources,
                  :_columns, :_csv_mappings, :_dedup_keys

      def label(value) = @_label = value
      def model_name(value) = @_model_name = value
      def icon(value) = @_icon = value

      def sources(*types)
        @_sources = types.map(&:to_sym)
      end

      def columns(&)
        @_columns = []
        instance_eval(&)
      end

      def column(name, **)
        @_columns << DSL::Column.new(name: name, **)
      end

      def csv_mapping(&)
        @_csv_mappings = {}
        instance_eval(&)
      end

      def map(hash)
        @_csv_mappings.merge!(hash)
      end

      def deduplicate_by(*keys)
        @_dedup_keys = keys.map(&:to_sym)
      end
    end
  end
end
```

Every DSL method is a class method that stores its value in a class instance variable (`@_label`, not `@@label`). The underscore prefix is a convention to signal "this is DSL storage, not your public API." The `columns` block uses `instance_eval` to execute `column` calls in the class context, which gives us the nested block syntax without requiring the caller to reference `self`.
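The `instance_eval` trick in isolation -- a minimal declarative base class (hypothetical names, stripped down from the full Target above):

```ruby
class MiniTarget
  class << self
    attr_reader :_columns

    # Evaluate the block with `self` set to the calling class, so bare
    # `column` calls inside it dispatch to the class method below.
    def columns(&block)
      @_columns = []
      instance_eval(&block)
    end

    def column(name, **opts)
      @_columns << { name: name.to_sym }.merge(opts)
    end
  end
end

class GuestMini < MiniTarget
  columns do
    column :first_name, required: true
    column :email, type: :email
  end
end

GuestMini._columns
# => [{:name=>:first_name, :required=>true}, {:name=>:email, :type=>:email}]
```

Note that `@_columns` lands on `GuestMini`, not `MiniTarget`: because `columns` runs with `self` as the subclass, each target gets its own storage for free.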

The instance-level hooks provide the extensibility points for the import lifecycle:

```ruby
# lib/data_porter/target.rb (instance methods)
def transform(record) = record
def validate(record) = nil
def persist(_record, context:) = raise NotImplementedError
def after_import(_results, context:) = nil
def on_error(_record, _error, context:) = nil
```

`transform` and `validate` are no-ops by default -- override them if you need custom data munging or cross-field validation beyond type checking. `persist` raises `NotImplementedError` because every target must define how records get written. `after_import` and `on_error` are optional hooks for cleanup, notifications, or error recovery. The `context:` keyword argument carries the host app context (current user, tenant, etc.) that we set up in the configuration DSL back in part 3.

The split between class methods (declaration) and instance methods (execution) is deliberate. Class methods describe *what* the import is. Instance methods describe *what happens* during the import. The Orchestrator (part 7) will call `target_class._columns` to know the schema, then instantiate the target and call `target.persist(record, context: ctx)` for each row. Keeping these on different layers prevents the declaration phase from depending on runtime state.
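The override-and-call hook pattern in miniature (hypothetical classes mirroring the defaults above, not gem code):

```ruby
class BaseTarget
  # Default no-op hooks; subclasses override only what they need.
  def transform(record) = record
  def persist(_record, context:) = raise NotImplementedError, "targets must define persist"
end

class UpcaseTarget < BaseTarget
  def transform(record)
    record.merge(name: record[:name].upcase)
  end

  def persist(record, context:)
    "#{context[:user]} saved #{record[:name]}"
  end
end

target = UpcaseTarget.new
row = target.transform({ name: "alice" })   # => {:name=>"ALICE"}
target.persist(row, context: { user: "admin" }) # => "admin saved ALICE"

# Forgetting persist fails loudly instead of silently dropping rows:
begin
  BaseTarget.new.persist({}, context: {})
rescue NotImplementedError => e
  e.message # => "targets must define persist"
end
```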

### Step 3 -- The Registry

Targets need to be discoverable. The engine's UI shows a dropdown of available import types; the Orchestrator looks up a target by key to process an import. The Registry is the central index.

```ruby
# lib/data_porter/registry.rb
module DataPorter
  class TargetNotFound < Error; end

  module Registry
    @targets = {}

    class << self
      def register(key, klass)
        @targets[key.to_sym] = klass
      end

      def find(key)
        @targets.fetch(key.to_sym) do
          raise TargetNotFound, "Target '#{key}' not found"
        end
      end

      def available
        @targets.map do |key, klass|
          { key: key, label: klass._label, icon: klass._icon }
        end
      end

      def clear
        @targets = {}
      end
    end
  end
end
```

The Registry is a module with class-level state -- essentially a singleton hash. `register` adds a target class under a symbolic key. `find` retrieves it, raising a custom `TargetNotFound` error instead of a generic `KeyError` so the controller can rescue it with a proper 404. `available` returns lightweight summaries for the UI: just the key, label, and icon, without exposing the full class.
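Usage of that registry shape, in a self-contained sketch (a standalone re-statement with string stand-ins for target classes, so it runs without the gem):

```ruby
class TargetNotFound < StandardError; end

module Registry
  @targets = {}

  class << self
    def register(key, klass)
      @targets[key.to_sym] = klass
    end

    # fetch-with-block turns the generic KeyError into a domain error.
    def find(key)
      @targets.fetch(key.to_sym) { raise TargetNotFound, "Target '#{key}' not found" }
    end
  end
end

Registry.register(:guests, "GuestTarget")

Registry.find(:guests)  # => "GuestTarget"
Registry.find("guests") # => "GuestTarget" -- keys are normalized to symbols

begin
  Registry.find(:products)
rescue TargetNotFound => e
  e.message # => "Target 'products' not found"
end
```

The `to_sym` normalization on both write and read means controllers can pass `params[:target]` (a string) straight through without a conversion dance.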
|
|
186
|
+
|
|
187
|
+
Registration happens in the host app's initializer:
|
|
188
|
+
|
|
189
|
+
```ruby
|
|
190
|
+
# config/initializers/data_porter.rb
|
|
191
|
+
DataPorter::Registry.register(:guests, GuestTarget)
|
|
192
|
+
DataPorter::Registry.register(:products, ProductTarget)
|
|
193
|
+
```
|
|
194
|
+
|
|
195
|
+
We considered auto-discovery (scanning a directory for Target subclasses), but explicit registration is simpler to reason about: you can see exactly which targets are active, control ordering, and conditionally register based on environment or feature flags. The `clear` and `refresh!` methods support testing and hot-reloading in development.

## Decisions & tradeoffs

| Decision | We chose | Over | Because |
|----------|----------|------|---------|
| DSL placement | Class methods on a base class | Instance methods or a configuration hash | Class methods read like declarations, not procedure calls. They execute once at load time, not per-import |
| State storage | Class instance variables (`@_label`) | `class_attribute` from ActiveSupport | Class instance variables don't leak to subclasses by default, avoiding surprising inheritance behavior. We don't need the per-instance override that `class_attribute` provides |
| Column definition | `Struct` with keyword init | Plain hash or full ActiveModel class | Struct gives us typed attributes, equality, and `to_h` with no dependencies. A hash would lose the interface; ActiveModel would be overkill for a value object |
| Hook pattern | Instance methods with default no-ops | Event system or callback chain | Override-and-call is the simplest extension model. No subscription management, no ordering concerns. If you need it, override it |
| Registry | Explicit `register` calls | Auto-discovery via `inherited` hook or directory scanning | Explicit registration is visible, testable, and doesn't depend on load order or file system conventions. Auto-discovery can be added later as sugar on top |
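
For contrast, the auto-discovery alternative from the last row can be sketched with Ruby's `inherited` hook. This is a standalone toy, not the gem's code, and the name-to-key convention is one illustrative choice:

```ruby
# Hypothetical auto-discovery: every named Target subclass registers itself.
module DataPorter
  module Registry
    @targets = {}

    class << self
      attr_reader :targets

      def register(key, klass)
        @targets[key.to_sym] = klass
      end
    end
  end

  class Target
    def self.inherited(subclass)
      super
      name = subclass.name
      return if name.nil? # anonymous classes (Class.new) have no name

      # "GuestTarget" -> :guest; the key is now implicit in the class name.
      Registry.register(name.sub(/Target\z/, "").downcase.to_sym, subclass)
    end
  end
end

class GuestTarget < DataPorter::Target; end

DataPorter::Registry.targets # => { guest: GuestTarget }
```

This makes the tradeoff from the table concrete: registration now depends on load order and a naming convention, and anonymous test classes need special-casing.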

## Testing it

Target specs verify both the DSL declarations and the default hook behavior:

```ruby
# spec/data_porter/target_spec.rb
let(:target_class) do
  Class.new(DataPorter::Target) do
    label "Guests"
    model_name "Guest"
    sources :csv, :json

    columns do
      column :first_name, type: :string, required: true
      column :email, type: :email
    end

    csv_mapping do
      map "Prenom" => :first_name
    end
  end
end

it "sets the label" do
  expect(target_class._label).to eq("Guests")
end

it "defines columns" do
  expect(target_class._columns.size).to eq(2)
end

it "persist raises NotImplementedError" do
  expect { target_class.new.persist(nil, context: nil) }
    .to raise_error(NotImplementedError)
end
```

Registry specs confirm lookup, error handling, and the `available` summary:

```ruby
# spec/data_porter/registry_spec.rb
before { described_class.clear }

it "stores a target by key" do
  described_class.register(:guests, target_class)
  expect(described_class.find(:guests)).to eq(target_class)
end

it "raises TargetNotFound for unknown keys" do
  expect { described_class.find(:unknown) }
    .to raise_error(DataPorter::TargetNotFound)
end

it "returns target summaries" do
  described_class.register(:guests, target_class)
  result = described_class.available
  expect(result).to contain_exactly(
    { key: :guests, label: "Guests", icon: "fas fa-users" }
  )
end
```

Both suites run without a database. Anonymous classes (`Class.new(DataPorter::Target)`) let us define fresh targets per test without polluting the class hierarchy.
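
The isolation that makes anonymous classes safe comes from the class-instance-variable storage itself, which a tiny standalone sketch (illustrative, not the gem's code) can demonstrate:

```ruby
# Minimal sketch of class-instance-variable DSL storage.
class Target
  class << self
    # Setter-and-getter in one method, as DSLs commonly do.
    def label(text = nil)
      @_label = text if text
      @_label
    end
  end
end

guests   = Class.new(Target) { label "Guests" }
products = Class.new(Target) { label "Products" }

guests.label    # => "Guests"
products.label  # => "Products"
Target.label    # => nil -- state never leaks across siblings or up to the base
```

Each anonymous subclass gets its own `@_label`, so tests can define as many throwaway targets as they like without cross-contamination.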

## Recap

- The **Target base class** uses class methods as a declarative DSL: `label`, `model_name`, `columns`, `csv_mapping`, and `deduplicate_by`. Each stores its value in a class instance variable, keeping subclasses isolated.
- The **Column struct** is a lightweight value object that captures name, type, required flag, display label, and open-ended options. Struct gives us equality and `to_h` for free.
- **Instance method hooks** (`transform`, `validate`, `persist`, `after_import`, `on_error`) separate runtime behavior from static declaration. Only `persist` is mandatory; the rest default to no-ops.
- The **Registry** is an explicit registration system that maps symbolic keys to target classes, providing lookup for the Orchestrator and summaries for the UI.

## Next up

We have data models (part 4) and a target DSL (this part) that describes what each import expects. In part 6, we will wire them together with the **Source layer** -- starting with CSV parsing, ActiveStorage file handling, and automatic column mapping from CSV headers to target columns. That is where the first end-to-end flow comes together: upload a file, parse it, and see structured records.

---

*This is part 5 of the series "Building DataPorter - A Data Import Engine for Rails". [Previous: Modeling import data with StoreModel & JSONB](#) | [Next: Parsing CSV data with Sources](#)*
|