data_porter 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (159) hide show
  1. checksums.yaml +7 -0
  2. data/.claude/commands/blog-status.md +10 -0
  3. data/.claude/commands/blog.md +109 -0
  4. data/.claude/commands/task-done.md +27 -0
  5. data/.claude/commands/tm/add-dependency.md +58 -0
  6. data/.claude/commands/tm/add-subtask.md +79 -0
  7. data/.claude/commands/tm/add-task.md +81 -0
  8. data/.claude/commands/tm/analyze-complexity.md +124 -0
  9. data/.claude/commands/tm/analyze-project.md +100 -0
  10. data/.claude/commands/tm/auto-implement-tasks.md +100 -0
  11. data/.claude/commands/tm/command-pipeline.md +80 -0
  12. data/.claude/commands/tm/complexity-report.md +120 -0
  13. data/.claude/commands/tm/convert-task-to-subtask.md +74 -0
  14. data/.claude/commands/tm/expand-all-tasks.md +52 -0
  15. data/.claude/commands/tm/expand-task.md +52 -0
  16. data/.claude/commands/tm/fix-dependencies.md +82 -0
  17. data/.claude/commands/tm/help.md +101 -0
  18. data/.claude/commands/tm/init-project-quick.md +49 -0
  19. data/.claude/commands/tm/init-project.md +53 -0
  20. data/.claude/commands/tm/install-taskmaster.md +118 -0
  21. data/.claude/commands/tm/learn.md +106 -0
  22. data/.claude/commands/tm/list-tasks-by-status.md +42 -0
  23. data/.claude/commands/tm/list-tasks-with-subtasks.md +30 -0
  24. data/.claude/commands/tm/list-tasks.md +46 -0
  25. data/.claude/commands/tm/next-task.md +69 -0
  26. data/.claude/commands/tm/parse-prd-with-research.md +51 -0
  27. data/.claude/commands/tm/parse-prd.md +52 -0
  28. data/.claude/commands/tm/project-status.md +67 -0
  29. data/.claude/commands/tm/quick-install-taskmaster.md +23 -0
  30. data/.claude/commands/tm/remove-all-subtasks.md +94 -0
  31. data/.claude/commands/tm/remove-dependency.md +65 -0
  32. data/.claude/commands/tm/remove-subtask.md +87 -0
  33. data/.claude/commands/tm/remove-subtasks.md +89 -0
  34. data/.claude/commands/tm/remove-task.md +110 -0
  35. data/.claude/commands/tm/setup-models.md +52 -0
  36. data/.claude/commands/tm/show-task.md +85 -0
  37. data/.claude/commands/tm/smart-workflow.md +58 -0
  38. data/.claude/commands/tm/sync-readme.md +120 -0
  39. data/.claude/commands/tm/tm-main.md +147 -0
  40. data/.claude/commands/tm/to-cancelled.md +58 -0
  41. data/.claude/commands/tm/to-deferred.md +50 -0
  42. data/.claude/commands/tm/to-done.md +47 -0
  43. data/.claude/commands/tm/to-in-progress.md +39 -0
  44. data/.claude/commands/tm/to-pending.md +35 -0
  45. data/.claude/commands/tm/to-review.md +43 -0
  46. data/.claude/commands/tm/update-single-task.md +122 -0
  47. data/.claude/commands/tm/update-task.md +75 -0
  48. data/.claude/commands/tm/update-tasks-from-id.md +111 -0
  49. data/.claude/commands/tm/validate-dependencies.md +72 -0
  50. data/.claude/commands/tm/view-models.md +52 -0
  51. data/.env.example +12 -0
  52. data/.mcp.json +24 -0
  53. data/.taskmaster/CLAUDE.md +435 -0
  54. data/.taskmaster/config.json +44 -0
  55. data/.taskmaster/docs/prd.txt +2044 -0
  56. data/.taskmaster/state.json +6 -0
  57. data/.taskmaster/tasks/task_001.md +19 -0
  58. data/.taskmaster/tasks/task_002.md +19 -0
  59. data/.taskmaster/tasks/task_003.md +19 -0
  60. data/.taskmaster/tasks/task_004.md +19 -0
  61. data/.taskmaster/tasks/task_005.md +19 -0
  62. data/.taskmaster/tasks/task_006.md +19 -0
  63. data/.taskmaster/tasks/task_007.md +19 -0
  64. data/.taskmaster/tasks/task_008.md +19 -0
  65. data/.taskmaster/tasks/task_009.md +19 -0
  66. data/.taskmaster/tasks/task_010.md +19 -0
  67. data/.taskmaster/tasks/task_011.md +19 -0
  68. data/.taskmaster/tasks/task_012.md +19 -0
  69. data/.taskmaster/tasks/task_013.md +19 -0
  70. data/.taskmaster/tasks/task_014.md +19 -0
  71. data/.taskmaster/tasks/task_015.md +19 -0
  72. data/.taskmaster/tasks/task_016.md +19 -0
  73. data/.taskmaster/tasks/task_017.md +19 -0
  74. data/.taskmaster/tasks/task_018.md +19 -0
  75. data/.taskmaster/tasks/task_019.md +19 -0
  76. data/.taskmaster/tasks/task_020.md +19 -0
  77. data/.taskmaster/tasks/tasks.json +299 -0
  78. data/.taskmaster/templates/example_prd.txt +47 -0
  79. data/.taskmaster/templates/example_prd_rpg.txt +511 -0
  80. data/CHANGELOG.md +29 -0
  81. data/CLAUDE.md +65 -0
  82. data/CODE_OF_CONDUCT.md +10 -0
  83. data/CONTRIBUTING.md +49 -0
  84. data/LICENSE +21 -0
  85. data/README.md +463 -0
  86. data/Rakefile +12 -0
  87. data/app/assets/stylesheets/data_porter/application.css +646 -0
  88. data/app/channels/data_porter/import_channel.rb +10 -0
  89. data/app/controllers/data_porter/imports_controller.rb +68 -0
  90. data/app/javascript/data_porter/progress_controller.js +33 -0
  91. data/app/jobs/data_porter/dry_run_job.rb +12 -0
  92. data/app/jobs/data_porter/import_job.rb +12 -0
  93. data/app/jobs/data_porter/parse_job.rb +12 -0
  94. data/app/models/data_porter/data_import.rb +49 -0
  95. data/app/views/data_porter/imports/index.html.erb +142 -0
  96. data/app/views/data_porter/imports/new.html.erb +88 -0
  97. data/app/views/data_porter/imports/show.html.erb +49 -0
  98. data/config/database.yml +3 -0
  99. data/config/routes.rb +12 -0
  100. data/docs/SPEC.md +2012 -0
  101. data/docs/UI.md +32 -0
  102. data/docs/blog/001-why-build-a-data-import-engine.md +166 -0
  103. data/docs/blog/002-scaffolding-a-rails-engine.md +188 -0
  104. data/docs/blog/003-configuration-dsl.md +222 -0
  105. data/docs/blog/004-store-model-jsonb.md +237 -0
  106. data/docs/blog/005-target-dsl.md +284 -0
  107. data/docs/blog/006-parsing-csv-sources.md +300 -0
  108. data/docs/blog/007-orchestrator.md +247 -0
  109. data/docs/blog/008-actioncable-stimulus.md +376 -0
  110. data/docs/blog/009-phlex-ui-components.md +446 -0
  111. data/docs/blog/010-controllers-routing.md +374 -0
  112. data/docs/blog/011-generators.md +364 -0
  113. data/docs/blog/012-json-api-sources.md +323 -0
  114. data/docs/blog/013-testing-rails-engine.md +618 -0
  115. data/docs/blog/014-dry-run.md +307 -0
  116. data/docs/blog/015-publishing-retro.md +264 -0
  117. data/docs/blog/016-erb-view-templates.md +431 -0
  118. data/docs/blog/017-showcase-final-retro.md +220 -0
  119. data/docs/blog/BACKLOG.md +8 -0
  120. data/docs/blog/SERIES.md +154 -0
  121. data/docs/screenshots/index-with-previewing.jpg +0 -0
  122. data/docs/screenshots/index.jpg +0 -0
  123. data/docs/screenshots/modal-new-import.jpg +0 -0
  124. data/docs/screenshots/preview.jpg +0 -0
  125. data/lib/data_porter/broadcaster.rb +29 -0
  126. data/lib/data_porter/components/base.rb +10 -0
  127. data/lib/data_porter/components/failure_alert.rb +20 -0
  128. data/lib/data_porter/components/preview_table.rb +54 -0
  129. data/lib/data_porter/components/progress_bar.rb +33 -0
  130. data/lib/data_porter/components/results_summary.rb +19 -0
  131. data/lib/data_porter/components/status_badge.rb +16 -0
  132. data/lib/data_porter/components/summary_cards.rb +30 -0
  133. data/lib/data_porter/components.rb +14 -0
  134. data/lib/data_porter/configuration.rb +25 -0
  135. data/lib/data_porter/dsl/api_config.rb +25 -0
  136. data/lib/data_porter/dsl/column.rb +17 -0
  137. data/lib/data_porter/engine.rb +15 -0
  138. data/lib/data_porter/orchestrator.rb +141 -0
  139. data/lib/data_porter/record_validator.rb +32 -0
  140. data/lib/data_porter/registry.rb +33 -0
  141. data/lib/data_porter/sources/api.rb +49 -0
  142. data/lib/data_porter/sources/base.rb +35 -0
  143. data/lib/data_porter/sources/csv.rb +43 -0
  144. data/lib/data_porter/sources/json.rb +45 -0
  145. data/lib/data_porter/sources.rb +20 -0
  146. data/lib/data_porter/store_models/error.rb +13 -0
  147. data/lib/data_porter/store_models/import_record.rb +52 -0
  148. data/lib/data_porter/store_models/report.rb +21 -0
  149. data/lib/data_porter/target.rb +89 -0
  150. data/lib/data_porter/type_validator.rb +46 -0
  151. data/lib/data_porter/version.rb +5 -0
  152. data/lib/data_porter.rb +32 -0
  153. data/lib/generators/data_porter/install/install_generator.rb +33 -0
  154. data/lib/generators/data_porter/install/templates/create_data_porter_imports.rb.erb +21 -0
  155. data/lib/generators/data_porter/install/templates/initializer.rb +30 -0
  156. data/lib/generators/data_porter/target/target_generator.rb +44 -0
  157. data/lib/generators/data_porter/target/templates/target.rb.tt +20 -0
  158. data/sig/data_porter.rbs +4 -0
  159. metadata +274 -0
@@ -0,0 +1,364 @@
1
+ ---
2
+ title: "Building DataPorter #11 -- Generators: Install & Target Scaffolding"
3
+ series: "Building DataPorter - A Data Import Engine for Rails"
4
+ part: 11
5
+ tags: [ruby, rails, rails-engine, gem-development, generators, scaffolding, templates]
6
+ published: false
7
+ ---
8
+
9
+ # Generators: Install & Target Scaffolding
10
+
11
+ > A great gem installs in one command. A great engine scaffolds new import types from the command line. Here is how to build Rails generators that bootstrap everything -- migration, initializer, routes, and per-target files -- so adopters never have to wire anything by hand.
12
+
13
+ ## Context
14
+
15
+ This is part 11 of the series where we build **DataPorter**, a mountable Rails engine for data import workflows. In [part 10](#), we built the ImportsController, wired up engine routes, and solved the dynamic parent controller inheritance problem.
16
+
17
+ At this point the engine is feature-complete: targets, sources, the orchestrator, real-time progress, a Phlex UI, and controllers all work together. But onboarding a new host app still requires manually creating a migration, writing an initializer, mounting the engine in routes, and knowing the exact Target DSL to define an import type. That is too many steps for someone evaluating the gem for the first time. We need generators that collapse all of that into a single `rails generate` command.
18
+
19
+ ## The problem
20
+
21
+ Installing a Rails engine by hand means at least four discrete steps: copy a migration, create an initializer with sane defaults, add a route mount, and create the directory where import targets will live. Miss any one of these and the engine will not work -- but the error messages will not tell you which step you forgot. A missing migration produces an `ActiveRecord::StatementInvalid`; a missing route mount means a `NoMethodError` when the engine tries to resolve its URL helpers; a missing initializer silently uses defaults that may not match your app.
22
+
23
+ On top of that, every time a developer wants to add a new import type, they need to remember the Target DSL: which class to inherit from, which methods to define, how to declare columns. That is cognitive overhead that belongs in a generator, not in someone's memory.
24
+
25
+ ## What we are building
26
+
27
+ Two generators. The first bootstraps the entire engine into a host app:
28
+
29
+ ```bash
30
+ $ rails generate data_porter:install
31
+ create db/migrate/20260206120000_create_data_porter_imports.rb
32
+ create config/initializers/data_porter.rb
33
+ create app/importers
34
+ route mount DataPorter::Engine, at: "/imports"
35
+ ```
36
+
37
+ The second scaffolds a new target with parsed column definitions:
38
+
39
+ ```bash
40
+ $ rails generate data_porter:target guests first_name:string:required email:email last_name:string
41
+ create app/importers/guests_target.rb
42
+ ```
43
+
44
+ One command to install, one command per import type. No manual file creation, no DSL memorization.
45
+
46
+ ## Implementation
47
+
48
+ ### Step 1 -- The install generator
49
+
50
+ The install generator inherits from `Rails::Generators::Base` and mixes in `ActiveRecord::Generators::Migration` for timestamped migration support. Each public method in the generator becomes a step that Rails executes in definition order:
51
+
52
+ ```ruby
53
+ # lib/generators/data_porter/install/install_generator.rb
54
+ module DataPorter
55
+ module Generators
56
+ class InstallGenerator < Rails::Generators::Base
57
+ include ActiveRecord::Generators::Migration
58
+
59
+ source_root File.expand_path("templates", __dir__)
60
+
61
+ def copy_migration
62
+ migration_template(
63
+ "create_data_porter_imports.rb.erb",
64
+ "db/migrate/create_data_porter_imports.rb"
65
+ )
66
+ end
67
+
68
+ def copy_initializer
69
+ template("initializer.rb", "config/initializers/data_porter.rb")
70
+ end
71
+
72
+ def create_importers_directory
73
+ empty_directory("app/importers")
74
+ end
75
+
76
+ def mount_engine
77
+ route 'mount DataPorter::Engine, at: "/imports"'
78
+ end
79
+ end
80
+ end
81
+ end
82
+ ```
83
+
84
+ Four methods, four artifacts. Let us walk through each one.
85
+
86
+ `copy_migration` uses `migration_template` instead of the plain `template` method. The difference matters: `migration_template` adds a timestamp prefix to the filename and raises an error if a migration with the same name already exists, preventing duplicate migrations when someone accidentally runs the generator twice. The template itself is an ERB file that interpolates the current ActiveRecord migration version:
87
+
88
+ ```ruby
89
+ # lib/generators/data_porter/install/templates/create_data_porter_imports.rb.erb
90
+ class CreateDataPorterImports < ActiveRecord::Migration[<%= ActiveRecord::Migration.current_version %>]
91
+ def change
92
+ create_table :data_porter_imports do |t|
93
+ t.string :target_key, null: false
94
+ t.string :source_type, null: false, default: "csv"
95
+ t.integer :status, null: false, default: 0
96
+ t.jsonb :records, null: false, default: []
97
+ t.jsonb :report, null: false, default: {}
98
+ t.jsonb :config, null: false, default: {}
99
+
100
+ t.references :user, polymorphic: true, null: false
101
+
102
+ t.timestamps
103
+ end
104
+
105
+ add_index :data_porter_imports, :status
106
+ add_index :data_porter_imports, :target_key
107
+ end
108
+ end
109
+ ```
110
+
111
+ The `<%= ActiveRecord::Migration.current_version %>` call means the generated migration always matches the host app's Rails version -- a Rails 7.1 app gets `Migration[7.1]`, a Rails 7.2 app gets `Migration[7.2]`. No hardcoding, no compatibility issues.
112
+
113
+ `copy_initializer` generates a commented-out configuration file. Every option is present but disabled, so developers can see what is available without digging through source code:
114
+
115
+ ```ruby
116
+ # lib/generators/data_porter/install/templates/initializer.rb
117
+ DataPorter.configure do |config|
118
+ # Parent controller for the engine's controllers to inherit from.
119
+ # This controls authentication, layouts, and helpers.
120
+ # config.parent_controller = "ApplicationController"
121
+
122
+ # ActiveJob queue name for import jobs.
123
+ # config.queue_name = :imports
124
+
125
+ # ActiveStorage service for uploaded files.
126
+ # config.storage_service = :local
127
+
128
+ # ActionCable channel prefix.
129
+ # config.cable_channel_prefix = "data_porter"
130
+
131
+ # Context builder: inject business data into targets.
132
+ # Receives the current controller instance.
133
+ # config.context_builder = ->(controller) {
134
+ # OpenStruct.new(
135
+ # user: controller.current_user
136
+ # )
137
+ # }
138
+
139
+ # Maximum number of records displayed in preview.
140
+ # config.preview_limit = 500
141
+
142
+ # Enabled source types.
143
+ # config.enabled_sources = %i[csv json api]
144
+ end
145
+ ```
146
+
147
+ This is a deliberate design choice: every option documents itself with a comment that explains *what it controls*, not just what it is. The `context_builder` lambda even includes a usage example. When someone opens this file for the first time, they should understand the engine's configuration surface without reading the README.
148
+
149
+ `create_importers_directory` calls `empty_directory`, which creates `app/importers` and silently skips if it already exists. This is where host apps will place their target files.
150
+
151
+ `mount_engine` uses the built-in `route` helper, which injects the mount line into `config/routes.rb`. The default mount point is `/imports`, but because it is a standard route mount, developers can change the path or wrap it in constraints after generation.
152
+
153
+ ### Step 2 -- The target generator
154
+
155
+ The target generator inherits from `Rails::Generators::NamedBase` instead of `Base`. This gives us the `name` argument for free -- Rails automatically parses the first argument as the target name and provides helper methods like `class_name`, `file_name`, and `singular_name`:
156
+
157
+ ```ruby
158
+ # lib/generators/data_porter/target/target_generator.rb
159
+ module DataPorter
160
+ module Generators
161
+ class TargetGenerator < Rails::Generators::NamedBase
162
+ source_root File.expand_path("templates", __dir__)
163
+
164
+ argument :columns, type: :array, default: [], banner: "name:type[:required]"
165
+
166
+ def create_target_file
167
+ template("target.rb.tt", "app/importers/#{file_name}_target.rb")
168
+ end
169
+
170
+ private
171
+
172
+ def target_class_name
173
+ "#{class_name}Target"
174
+ end
175
+
176
+ def model_name
177
+ class_name.singularize
178
+ end
179
+
180
+ def target_label
181
+ class_name.titleize
182
+ end
183
+
184
+ def parsed_columns
185
+ columns.map { |col| parse_column(col) }
186
+ end
187
+
188
+ def parse_column(definition)
189
+ parts = definition.split(":")
190
+ {
191
+ name: parts[0],
192
+ type: parts[1] || "string",
193
+ required: parts[2] == "required"
194
+ }
195
+ end
196
+ end
197
+ end
198
+ end
199
+ ```
200
+
201
+ The `columns` argument uses `type: :array` with a `default: []`, which means you can generate a bare target with no columns and fill them in later, or you can specify everything up front. The `banner` string `"name:type[:required]"` tells `rails generate data_porter:target --help` exactly what format columns expect.
202
+
203
+ The `parse_column` method splits each column definition on colons. `first_name:string:required` becomes `{ name: "first_name", type: "string", required: true }`. `email:email` becomes `{ name: "email", type: "email", required: false }`. If you omit the type entirely, it defaults to `"string"`. This mirrors the familiar `rails generate model` syntax but adapts it to our column DSL.
204
+
205
+ The naming helpers derive everything from the single `name` argument. Pass `guests` and you get: `target_class_name` returns `"GuestsTarget"`, `model_name` returns `"Guest"` (singularized, because the target usually maps to one ActiveRecord model), and `target_label` returns `"Guests"` (titleized, for the UI).
206
+
207
+ ### Step 3 -- The target template
208
+
209
+ The template uses Thor's `.tt` format (which processes ERB tags using the generator's binding) to produce a complete, working target file:
210
+
211
+ ```erb
212
+ # lib/generators/data_porter/target/templates/target.rb.tt
213
+ class <%= target_class_name %> < DataPorter::Target
214
+ label "<%= target_label %>"
215
+ model_name "<%= model_name %>"
216
+ icon "fas fa-file-import"
217
+ sources :csv
218
+ <% if parsed_columns.any? %>
219
+
220
+ columns do
221
+ <% parsed_columns.each do |col| -%>
222
+ column :<%= col[:name] %>, type: :<%= col[:type] %><%= ", required: true" if col[:required] %>
223
+ <% end -%>
224
+ end
225
+ <% end %>
226
+
227
+ def persist(record, context:)
228
+ # <%= model_name %>.create!(record.attributes)
229
+ end
230
+ end
231
+ ```
232
+
233
+ Running `rails generate data_porter:target guests first_name:string:required email:email last_name:string` produces:
234
+
235
+ ```ruby
236
+ # app/importers/guests_target.rb
237
+ class GuestsTarget < DataPorter::Target
238
+ label "Guests"
239
+ model_name "Guest"
240
+ icon "fas fa-file-import"
241
+ sources :csv
242
+
243
+ columns do
244
+ column :first_name, type: :string, required: true
245
+ column :email, type: :email
246
+ column :last_name, type: :string
247
+ end
248
+
249
+ def persist(record, context:)
250
+ # Guest.create!(record.attributes)
251
+ end
252
+ end
253
+ ```
254
+
255
+ Two details worth noting. First, the `persist` method is generated with a commented-out implementation line. This is intentional: we want the developer to think about what persistence means for their domain rather than blindly accepting a default. Maybe they need `find_or_create_by`, maybe they need to call a service object, maybe they need to update existing records. The commented hint shows the simplest path while making it clear this is the one method they *must* customize.
256
+
257
+ Second, the `columns` block is conditionally rendered. If you run `rails generate data_porter:target guests` with no columns, you get a clean skeleton without an empty `columns do; end` block. You can add columns later as you figure out your CSV structure.
258
+
259
+ ## Decisions & tradeoffs
260
+
261
+ | Decision | We chose | Over | Because |
262
+ |----------|----------|------|---------|
263
+ | Generator base class | `NamedBase` for target, `Base` for install | Both using `Base` | `NamedBase` gives us `class_name`, `file_name`, and argument parsing for free; install does not need a name argument |
264
+ | Migration template format | ERB (`.rb.erb`) with `ActiveRecord::Migration.current_version` | Hardcoded migration version | The generated migration automatically matches the host app's Rails version |
265
+ | Column parsing syntax | `name:type[:required]` colon-separated | A flag-based approach (`--columns name:string --required name`) | Matches the familiar `rails generate model` convention; less typing, easier to remember |
266
+ | Initializer style | All options commented out with explanations | Only showing uncommented defaults | Developers discover every configuration option on first read; uncommenting is easier than looking up what is available |
267
+ | Target output directory | `app/importers/` | `app/data_porter/targets/` or `app/models/concerns/` | Short, clear, conventional; mirrors how apps organize service objects in `app/services/` |
268
+ | Persist method | Commented-out hint (`# Guest.create!(...)`) | Working default implementation | Forces the developer to make an intentional choice about their persistence strategy |
269
+
270
+ ## Testing it
271
+
272
+ Generator testing is unusual -- you are testing that files get created with the right content, not that objects behave correctly. The specs verify the generator's structure and its column-parsing logic without actually running the generator against a filesystem:
273
+
274
+ ```ruby
275
+ # spec/data_porter/generators/install_generator_spec.rb
276
+ RSpec.describe DataPorter::Generators::InstallGenerator do
277
+ it "inherits from Rails::Generators::Base" do
278
+ expect(described_class.superclass).to eq(Rails::Generators::Base)
279
+ end
280
+
281
+ it "has a source_root pointing to templates" do
282
+ expect(described_class.source_root).to end_with("lib/generators/data_porter/install/templates")
283
+ end
284
+
285
+ describe "generator methods" do
286
+ it "defines copy_migration" do
287
+ expect(described_class.instance_method(:copy_migration)).to be_a(UnboundMethod)
288
+ end
289
+
290
+ it "defines copy_initializer" do
291
+ expect(described_class.instance_method(:copy_initializer)).to be_a(UnboundMethod)
292
+ end
293
+
294
+ it "defines create_importers_directory" do
295
+ expect(described_class.instance_method(:create_importers_directory)).to be_a(UnboundMethod)
296
+ end
297
+
298
+ it "defines mount_engine" do
299
+ expect(described_class.instance_method(:mount_engine)).to be_a(UnboundMethod)
300
+ end
301
+ end
302
+ end
303
+ ```
304
+
305
+ The install generator specs verify that the class inherits from the right base, that the source root points to the templates directory, and that each expected method exists. This is a structural test: if someone renames a method or changes the inheritance chain, the spec catches it immediately.
306
+
307
+ The target generator specs go deeper, testing the column parsing and naming derivation:
308
+
309
+ ```ruby
310
+ # spec/data_porter/generators/target_generator_spec.rb
311
+ RSpec.describe DataPorter::Generators::TargetGenerator do
312
+ it "inherits from Rails::Generators::NamedBase" do
313
+ expect(described_class.superclass).to eq(Rails::Generators::NamedBase)
314
+ end
315
+
316
+ describe "column parsing" do
317
+ let(:generator) { described_class.new(["guests", "first_name:string:required", "email:email"]) }
318
+
319
+ it "parses column definitions" do
320
+ columns = generator.send(:parsed_columns)
321
+
322
+ expect(columns.size).to eq(2)
323
+ expect(columns[0]).to eq({ name: "first_name", type: "string", required: true })
324
+ expect(columns[1]).to eq({ name: "email", type: "email", required: false })
325
+ end
326
+ end
327
+
328
+ describe "naming" do
329
+ let(:generator) { described_class.new(["guests"]) }
330
+
331
+ it "derives the class name" do
332
+ expect(generator.send(:target_class_name)).to eq("GuestsTarget")
333
+ end
334
+
335
+ it "derives the model name" do
336
+ expect(generator.send(:model_name)).to eq("Guest")
337
+ end
338
+
339
+ it "derives the label" do
340
+ expect(generator.send(:target_label)).to eq("Guests")
341
+ end
342
+ end
343
+ end
344
+ ```
345
+
346
+ The column parsing spec is the most important one. It confirms that `first_name:string:required` correctly splits into a hash with `required: true`, while `email:email` (no third segment) defaults to `required: false`. The naming specs verify that `guests` as input produces `GuestsTarget` for the class, `Guest` for the model, and `Guests` for the label -- the singularization and titleization that the template depends on.
347
+
348
+ Notice that the specs instantiate the generator directly with `described_class.new(["guests", ...])` rather than invoking it through the Rails generator runner. This keeps the tests fast and focused: we are testing the logic, not the file I/O.
349
+
350
+ ## Recap
351
+
352
+ - The **install generator** bootstraps the entire engine in one command: migration, initializer, importers directory, and route mount -- four steps that previously required reading the README and manually creating files.
353
+ - The **target generator** scaffolds a new import type with parsed column definitions, producing a ready-to-customize target file that follows the DSL conventions established in earlier parts of the series.
354
+ - The **migration template** uses ERB with `ActiveRecord::Migration.current_version` to match the host app's Rails version automatically, avoiding hardcoded version numbers.
355
+ - The **column parsing syntax** (`name:type[:required]`) mirrors `rails generate model` conventions, keeping the learning curve flat for Rails developers.
356
+ - The **initializer template** documents every configuration option with comments, making the engine's surface area discoverable without external documentation.
357
+
358
+ ## Next up
359
+
360
+ The engine can now parse CSV files, but real-world data does not always arrive in a spreadsheet. In part 12, we extend the Source layer with **JSON and API sources** -- a JSON source that accepts files or raw text, and an API source that fetches data from HTTP endpoints with injectable parameters and authentication headers. The source abstraction we designed in part 6 is about to prove its worth.
361
+
362
+ ---
363
+
364
+ *This is part 11 of the series "Building DataPorter - A Data Import Engine for Rails". [Previous: Controllers & Routing in a Rails Engine](#) | [Next: Adding JSON & API Sources](#)*
@@ -0,0 +1,323 @@
1
+ ---
2
+ title: "Building DataPorter #12 -- Au-dela du CSV : Sources JSON et API"
3
+ series: "Building DataPorter - A Data Import Engine for Rails"
4
+ part: 12
5
+ tags: [ruby, rails, rails-engine, gem-development, json, api, http, sources, dsl]
6
+ published: false
7
+ ---
8
+
9
+ # Au-dela du CSV : Sources JSON et API
10
+
11
+ > Le CSV est le format roi de l'import de donnees -- mais dans la vraie vie, les donnees arrivent aussi en JSON depuis un fichier, ou directement depuis une API tierce. Voici comment DataPorter etend son architecture de sources pour absorber ces nouveaux formats sans rien casser.
12
+
13
+ ## Contexte
14
+
15
+ Ceci est la partie 12 de la serie ou nous construisons **DataPorter**, un engine Rails montable pour les workflows d'import de donnees. Dans la [partie 11](#), nous avons construit les generateurs install et target pour que l'adoption du gem se fasse en une seule commande.
16
+
17
+ Jusqu'ici, DataPorter ne sait lire que du CSV. C'est suffisant pour beaucoup de cas, mais le monde reel est plus varie : un partenaire envoie un export JSON, un service interne expose une API REST, un front-end pousse du JSON brut dans un formulaire. Si chaque nouveau format demande de rearchitecturer le moteur, on a rate quelque chose. L'abstraction `Sources::Base` que nous avons posee dans la partie 6 va maintenant montrer sa valeur.
18
+
19
+ ## Pourquoi plusieurs sources ?
20
+
21
+ Un moteur d'import qui ne parle que CSV force les utilisateurs a convertir leurs donnees avant de les importer. C'est de la friction inutile. En supportant JSON et API nativement, on couvre trois scenarios courants :
22
+
23
+ - **CSV** -- L'utilisateur uploade un fichier depuis son poste.
24
+ - **JSON** -- L'utilisateur uploade un fichier JSON, ou bien le systeme injecte du JSON brut via la configuration.
25
+ - **API** -- Le systeme va chercher les donnees directement sur un endpoint HTTP, avec authentification et parametres dynamiques.
26
+
27
+ Le point cle : chaque source doit respecter le meme contrat -- une methode `fetch` qui retourne un tableau de hashes avec des cles symboliques. Le reste du pipeline (validation, transformation, persistence) ne change pas.
28
+
29
+ ## La source JSON
30
+
31
+ La source JSON doit gerer trois manieres de recevoir du contenu : injection directe (pour les tests ou l'usage programmatique), JSON brut stocke dans la configuration de l'import, et telechargement depuis un fichier ActiveStorage.
32
+
33
+ ```ruby
34
+ # lib/data_porter/sources/json.rb
35
+ module DataPorter
36
+ module Sources
37
+ class Json < Base
38
+ def initialize(data_import, content: nil)
39
+ super(data_import)
40
+ @content = content
41
+ end
42
+
43
+ def fetch
44
+ parsed = ::JSON.parse(json_content)
45
+ records = extract_records(parsed)
46
+
47
+ Array(records).map do |hash|
48
+ hash.transform_keys { |k| k.parameterize(separator: "_").to_sym }
49
+ end
50
+ end
51
+
52
+ private
53
+
54
+ def json_content
55
+ @content || config_raw_json || download_file
56
+ end
57
+
58
+ def config_raw_json
59
+ config = @data_import.config
60
+ config["raw_json"] if config.is_a?(Hash)
61
+ end
62
+
63
+ def download_file
64
+ @data_import.file.download
65
+ end
66
+
67
+ def extract_records(parsed)
68
+ root = @target_class._json_root
69
+ return parsed unless root
70
+
71
+ parsed.dig(*root.split("."))
72
+ end
73
+ end
74
+ end
75
+ end
76
+ ```
77
+
78
+ Trois choses meritent attention.
79
+
80
+ **La cascade de `json_content`.** La methode essaie trois sources dans l'ordre : le contenu injecte au constructeur, la cle `raw_json` dans la configuration de l'import, et enfin le fichier ActiveStorage. Cette cascade permet une grande flexibilite sans parametrage explicite -- le bon chemin est choisi automatiquement selon ce qui est disponible.
81
+
82
+ **Le `json_root` pour les chemins imbriques.** Les API et les fichiers JSON du monde reel enveloppent souvent les donnees dans une structure : `{"data": {"guests": [...]}}`. Plutot que de forcer l'utilisateur a aplatir son JSON, on lui donne un DSL dans le Target :
83
+
84
+ ```ruby
85
+ class GuestsTarget < DataPorter::Target
86
+ label "Guests"
87
+ model_name "Guest"
88
+ json_root "data.guests"
89
+
90
+ columns do
91
+ column :name, type: :string
92
+ end
93
+ end
94
+ ```
95
+
96
+ La methode `extract_records` utilise `dig` en decoupant le chemin sur les points. `"data.guests"` devient `parsed.dig("data", "guests")`. Simple, lisible, et supporte n'importe quel niveau d'imbrication.
97
+
98
+ **La normalisation des cles.** Comme pour le CSV, chaque cle est transformee via `parameterize(separator: "_").to_sym`. `"First Name"` devient `:first_name`. Cela garantit que le reste du pipeline recoit toujours des cles au meme format, quel que soit le format source.
99
+
100
+ ## La source API
101
+
102
+ La source API va chercher les donnees sur un endpoint HTTP. Elle doit supporter des endpoints statiques et dynamiques, des headers fixes et generes a la volee, et l'extraction de donnees depuis une cle de reponse.
103
+
104
+ ```ruby
105
+ # lib/data_porter/sources/api.rb
106
+ module DataPorter
107
+ module Sources
108
+ class Api < Base
109
+ def fetch
110
+ api = @target_class._api_config
111
+ response = perform_request(api)
112
+ parsed = ::JSON.parse(response.body)
113
+ records = extract_records(parsed, api)
114
+
115
+ Array(records).map do |hash|
116
+ hash.transform_keys { |k| k.parameterize(separator: "_").to_sym }
117
+ end
118
+ end
119
+
120
+ private
121
+
122
+ def perform_request(api)
123
+ url = resolve_endpoint(api)
124
+ headers = resolve_headers(api)
125
+ uri = URI(url)
126
+
127
+ Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
128
+ request = Net::HTTP::Get.new(uri)
129
+ headers.each { |k, v| request[k] = v }
130
+ http.request(request)
131
+ end
132
+ end
133
+
134
+ def resolve_endpoint(api)
135
+ params = @data_import.config.symbolize_keys
136
+ api.endpoint.is_a?(Proc) ? api.endpoint.call(params) : api.endpoint
137
+ end
138
+
139
+ def resolve_headers(api)
140
+ api.headers.is_a?(Proc) ? api.headers.call : (api.headers || {})
141
+ end
142
+
143
+ def extract_records(parsed, api)
144
+ root = api.response_root
145
+ root ? parsed[root.to_s] : parsed
146
+ end
147
+ end
148
+ end
149
+ end
150
+ ```
151
+
152
+ Le coeur de la logique se trouve dans `resolve_endpoint` et `resolve_headers`. Chacune de ces methodes accepte soit une valeur statique, soit un lambda. Cela ouvre deux modes d'utilisation :
153
+
154
+ ```ruby
155
+ # Endpoint statique, headers fixes
156
+ api_config do
157
+ endpoint "https://api.example.com/stays"
158
+ headers({ "Authorization" => "Bearer abc123" })
159
+ response_root :stays
160
+ end
161
+
162
+ # Endpoint dynamique, headers generes a la volee
163
+ api_config do
164
+ endpoint ->(params) { "https://api.example.com/items?id=#{params[:item_id]}" }
165
+ headers(-> { { "Authorization" => "Bearer #{Token.current}" } })
166
+ end
167
+ ```
168
+
169
+ Dans le cas du lambda d'endpoint, les parametres proviennent de `@data_import.config.symbolize_keys`. L'utilisateur passe `config: { item_id: "42" }` au moment de la creation de l'import, et le lambda recoit ces parametres pour construire l'URL. Pour les headers, le lambda est appele sans argument -- il va chercher le token la ou il se trouve (variable d'environnement, modele, service externe).
170
+
171
+ Le `response_root` fonctionne comme le `json_root` de la source JSON, mais en plus simple : il extrait une seule cle du hash de reponse. `response_root :stays` sur une reponse `{"stays": [...]}` retourne directement le tableau. Si aucun `response_root` n'est defini, la reponse entiere est utilisee.
172
+
173
+ ## Le pattern DSL d'ApiConfig
174
+
175
+ La configuration API utilise un objet DSL dedie plutot que de simples `attr_accessor` :
176
+
177
+ ```ruby
178
+ # lib/data_porter/dsl/api_config.rb
179
+ module DataPorter
180
+ module DSL
181
+ class ApiConfig
182
+ def endpoint(value = nil)
183
+ return @endpoint if value.nil?
184
+
185
+ @endpoint = value
186
+ end
187
+
188
+ def headers(value = nil)
189
+ return @headers if value.nil?
190
+
191
+ @headers = value
192
+ end
193
+
194
+ def response_root(value = nil)
195
+ return @response_root if value.nil?
196
+
197
+ @response_root = value
198
+ end
199
+ end
200
+ end
201
+ end
202
+ ```
203
+
204
+ Chaque methode joue un double role : appelee avec un argument, elle agit comme un setter ; appelee sans argument, elle agit comme un getter. Ce pattern evite de separer `attr_reader` et `attr_writer` et produit un DSL naturel :
205
+
206
+ ```ruby
207
+ api_config do
208
+ endpoint "https://api.example.com/data" # setter
209
+ end
210
+
211
+ api.endpoint # => "https://api.example.com/data" (getter)
212
+ ```
213
+
214
+ Dans le Target, `api_config` cree une instance d'`ApiConfig` et execute le bloc dans son contexte via `instance_eval` :
215
+
216
+ ```ruby
217
+ # Dans DataPorter::Target
218
+ def api_config(&)
219
+ @_api_config = DSL::ApiConfig.new
220
+ @_api_config.instance_eval(&)
221
+ end
222
+ ```
223
+
224
+ Ce pattern -- objet DSL + `instance_eval` -- est le meme que celui utilise pour le bloc `columns`. C'est un idiome Ruby classique qui donne une syntaxe propre tout en gardant l'implementation testable (l'objet `ApiConfig` est un PORO normal, facile a instancier et inspecter dans les specs).
225
+
226
+ ## Le dispatch via Sources.resolve
227
+
228
+ L'ajout de nouvelles sources ne modifie rien au code existant. Le module `Sources` maintient un registre simple :
229
+
230
+ ```ruby
231
+ # lib/data_porter/sources.rb
232
+ module DataPorter
233
+ module Sources
234
+ REGISTRY = {
235
+ api: Api,
236
+ csv: Csv,
237
+ json: Json
238
+ }.freeze
239
+
240
+ def self.resolve(type)
241
+ REGISTRY.fetch(type.to_sym) { raise Error, "Unknown source type: #{type}" }
242
+ end
243
+ end
244
+ end
245
+ ```
246
+
247
+ L'Orchestrator appelle `Sources.resolve(import.source_type)` et recoit la bonne classe. Il instancie ensuite la source et appelle `fetch`. Ni l'Orchestrator ni les controllers ne savent quel type de source est utilise -- c'est le `source_type` stocke dans l'import qui decide. Ajouter une source XML ou Parquet demanderait : une classe heritant de `Base`, une entree dans le `REGISTRY`, et c'est tout.
248
+
249
+ ## L'approche TDD
250
+
251
+ Les deux sources ont ete construites en TDD. La source JSON est testee avec trois scenarios :
252
+
253
+ ```ruby
254
+ it "parses JSON array content" do
255
+ json = '[{"first_name": "Alice", "last_name": "Smith"}]'
256
+ source = described_class.new(import, content: json)
257
+ records = source.fetch
258
+
259
+ expect(records.first[:first_name]).to eq("Alice")
260
+ end
261
+
262
+ it "extracts records from a nested path" do
263
+ json = '{"data": {"guests": [{"name": "Alice"}, {"name": "Bob"}]}}'
264
+ source = described_class.new(import_with_root, content: json)
265
+
266
+ expect(source.fetch.size).to eq(2)
267
+ end
268
+
269
+ it "reads from config raw_json when no content provided" do
270
+ import.update!(config: { "raw_json" => '[{"first_name": "Config"}]' })
271
+ source = described_class.new(import)
272
+
273
+ expect(source.fetch.first[:first_name]).to eq("Config")
274
+ end
275
+ ```
276
+
277
+ Chaque test couvre un chemin de la cascade : injection directe, `json_root`, et fallback `raw_json`. Pour la source API, on stubbe `Net::HTTP.start` pour eviter les vrais appels HTTP, et on teste les quatre axes : endpoint statique, endpoint lambda, headers lambda, et absence de `response_root` :
278
+
279
+ ```ruby
280
+ it "fetches and parses records from response_root" do
281
+ response_body = '{"stays": [{"name": "Beach House"}, {"name": "Mountain Cabin"}]}'
282
+ stub_http_get(response_body)
283
+
284
+ source = described_class.new(import)
285
+ expect(source.fetch.size).to eq(2)
286
+ end
287
+
288
+ it "resolves the endpoint lambda with params" do
289
+ response_body = '[{"title": "Item 42"}]'
290
+ stub_http_get(response_body)
291
+
292
+ source = described_class.new(import_with_lambda)
293
+ expect(source.fetch.first[:title]).to eq("Item 42")
294
+ end
295
+ ```
296
+
297
+ Le stub est minimal : `allow(Net::HTTP).to receive(:start).and_return(response)`. On ne teste pas que `Net::HTTP` fonctionne -- on teste que notre code compose correctement l'URL, les headers, et extrait les bonnes donnees de la reponse.
298
+
299
+ ## Decisions et compromis
300
+
301
+ | Decision | Choix retenu | Alternative ecartee | Raison |
302
+ |----------|-------------|---------------------|--------|
303
+ | Client HTTP | `Net::HTTP` (stdlib) | Faraday, HTTParty | Zero dependance supplementaire ; suffisant pour des GET simples |
304
+ | Endpoint dynamique | Lambda recevant `params` | String avec interpolation | Le lambda permet toute logique (conditions, appels de service) sans eval de string |
305
+ | Headers dynamiques | Lambda sans argument | Callback avec contexte | Les headers viennent souvent d'un service global (ENV, token store), pas du contexte de l'import |
306
+ | Cascade JSON | `content` > `raw_json` > `file` | Argument obligatoire | Flexibilite maximale ; chaque cas d'usage trouve son chemin naturellement |
307
+ | Normalisation des cles | `parameterize` + `to_sym` | Mapping explicite | Coherent avec la source CSV ; le pipeline en aval recoit toujours le meme format |
308
+
309
+ ## Recap
310
+
311
+ - **La source JSON** supporte trois modes d'entree (injection, config `raw_json`, fichier) via une cascade de fallbacks, et utilise `json_root` pour naviguer dans des structures imbriquees.
312
+ - **La source API** resout dynamiquement endpoints et headers grace a un systeme dual statique/lambda, et extrait les donnees via `response_root`.
313
+ - **Le DSL `ApiConfig`** utilise un pattern getter/setter sans `attr_reader`, evalue dans un bloc `instance_eval` pour une syntaxe naturelle.
314
+ - **`Sources.resolve`** dispatche vers la bonne classe via un registre fige -- ajouter une source est une operation en deux lignes.
315
+ - **Les tests** couvrent chaque chemin de chaque source sans toucher le reseau, grace a l'injection de contenu et au stubbing HTTP.
316
+
317
+ ## La suite
318
+
319
+ Les sources JSON et API completent le trio de formats supportes. Mais nous n'avons pas encore parle de la strategie de test globale de l'engine -- comment tester un moteur Rails sans application hote complete, comment organiser les specs entre tests unitaires et integration, comment mocker ActiveStorage et ActionCable. Dans la partie 13, nous plongeons dans le **testing d'un Rails Engine avec RSpec** et les patterns qui gardent la suite rapide et fiable.
320
+
321
+ ---
322
+
323
+ *Ceci est la partie 12 de la serie "Building DataPorter - A Data Import Engine for Rails". [Precedent : Generators: Install & Target Scaffolding](#) | [Suivant : Testing a Rails Engine with RSpec](#)*