RubyGems - data_drain - Versions diffs - 0.3.2 → 0.5.0 - Mend

data_drain 0.3.2 → 0.5.0


Files changed (20)

checksums.yaml +4 -4
data/.rubocop.yml +12 -0
data/CHANGELOG.md +43 -0
data/README.md +30 -0
data/docs/IMPROVEMENT_PLAN.md +114 -0
data/docs/execution/v0.4.0-OBSERVACIONES.md +144 -0
data/docs/execution/v0.4.0.md +1216 -0
data/docs/execution/v0.5.0-OBSERVACIONES.md +167 -0
data/docs/execution/v0.5.0.md +900 -0
data/docs/glue-jobs-lifecycle.md +330 -0
data/docs/glue_pyspark_example.py +49 -19
data/lib/data_drain/glue_runner.rb +236 -1
data/lib/data_drain/storage/base.rb +12 -0
data/lib/data_drain/storage/local.rb +13 -0
data/lib/data_drain/storage/s3.rb +17 -0
data/lib/data_drain/validations.rb +8 -0
data/lib/data_drain/version.rb +1 -1
data/skill/SKILL.md +64 -3
data/skill/references/eventos-telemetria.md +8 -0
metadata +6 -1

data/docs/execution/v0.4.0.md ADDED

@@ -0,0 +1,1216 @@
+# Execution Plan: v0.4.0
+**Target release:** v0.4.0, Glue Jobs Lifecycle (post-roadmap)
+**Items:** 32, 33, 34, 35, 36 (new; not in the original roadmap)
+**Suggested branch:** `feature/v0.4.0`
+**Base:** `main` (contains v0.3.1, roadmap 24/24 closed)
+**Status:** Not started
+**Last updated:** 2026-04-15
+---
+## Context
+Today DataDrain only runs pre-existing Glue Jobs (`GlueRunner.run_and_wait`). To automate the full lifecycle (infra-as-code), this release adds creation, update, and deletion of Jobs.
+**Use case:** a scheduled process (monthly cron) that makes sure the job exists with the correct config before running it:
+```ruby
+DataDrain::GlueRunner.ensure_job(
+  name: "data-drain-export-versions",
+  role_arn: "arn:aws:iam::123:role/GlueServiceRole",
+  script_location: "s3://my-bucket/scripts/export.py",
+  glue_version: "4.0",
+  worker_type: "G.1X",
+  number_of_workers: 5
+)
+DataDrain::GlueRunner.run_and_wait("data-drain-export-versions", { ... }, max_wait_seconds: 3600)
+```
+## Approved design decisions
+| # | Decision | Chosen |
+|---|---------|---------|
+| A | API shape | **A2: idempotent `ensure_job` + atomic operations** |
+| B | Script upload to S3 | **B1: caller is responsible** (the gem does not upload) |
+| C | IAM/bucket validation before create | **C1: do not validate** (AWS errors are clear) |
+---
+## Release items
+| Phase | Item | Summary | Estimate |
+|------|------|---------|------------|
+| 0 | - | Baseline setup + branch | 15min |
+| 1 | **34** | Query helpers: `job_exists?`, `get_job` | 1h |
+| 2 | **32** | Atomic operations: `create_job`, `update_job`, `delete_job` | 4-5h |
+| 3 | **33** | Idempotent `ensure_job` (compares and reconciles) | 3-4h |
+| 4 | **35** | Tests with `Aws::Glue::Client.stub_responses` (consolidation) | 3h (partly inline in phases 1-3) |
+| 5 | **36** | Docs: `glue-jobs-lifecycle.md` + README examples | 2h |
+| 6 | - | Release | 30min |
+**Estimated total:** 13-16h, about 2 focused days.
+**Breaking changes:** none. Methods are only added to the existing `GlueRunner` module. `run_and_wait` is left untouched.
+---
+## Agent review: incorporated
+Review by **big-pickle** on 2026-04-15 (`docs/execution/v0.4.0-OBSERVACIONES.md`). 5 observations, all incorporated (with a correction to #5):
+| # | Severity | Resolution | Location |
+|---|-----------|-----------|-----------|
+| 1 | **Blocking** | Create `validate_glue_name!` (regex `\A[a-zA-Z0-9_-]+\z`): Glue accepts `-`, the current regex does not. Applied BEFORE Phase 1. | Phase 0.4 |
+| 2 | Medium | Refine `changed_fields` to treat `extracted[field].nil? && !desired_config.key?(field)` as "no opinion" (no diff) | Phase 3.2 |
+| 3 | Medium | Inline test for `update_job` (implemented in Phase 2.3) captures the request params to validate the API shape (`job_update` must not include `:name`; `command` with its full shape) | Phase 2.5 |
+| 4 | Low | Explicit check: `extract_current_config` does NOT include `created_on`, `last_modified_on`, `allocated_capacity`. Document it. | Phase 3.2 |
+| 5 | ⚠️ Partial correction | Big-pickle says `Hash#==` in Ruby compares references. **That is incorrect:** `Hash#==` compares content (keys + values). Still, it is worth adding an explicit test that confirms Hash equality for `default_arguments`. No code change, but a test is added. | Phase 4.1 |
+**Technical note for observation #5:**
+```ruby
+{ "--k" => "v" } == { "--k" => "v" }    # => true (compares content)
+{ "--k" => "v" } != { "--k" => "v" }    # => false
+```
+An actual existing risk (not flagged by big-pickle):
+- If AWS returns an SDK struct (such as `Aws::Glue::Types::JobUpdate`) rather than a plain Hash, the comparison would fail. The plan already calls `default_arguments&.to_h || {}`, which converts it to a Hash. OK.
+- If keys arrive from the AWS SDK as Symbol rather than String, the comparison fails. Verify with a test.
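The Symbol-versus-String risk can be illustrated with plain Ruby hashes. This is a minimal sketch, not the gem's code: `normalize_arguments` is a hypothetical helper, and the Symbol-keyed hash only simulates a response shape that the AWS SDK may never actually produce.

```ruby
# Hypothetical helper (not part of data_drain): coerce any Hash-like value
# into a plain Hash with String keys so Hash#== compares like with like.
def normalize_arguments(value)
  (value || {}).to_h.transform_keys(&:to_s)
end

desired  = { "--TempDir" => "s3://tmp/" }
from_sdk = { :"--TempDir" => "s3://tmp/" } # simulated Symbol-keyed response

puts desired == from_sdk                      # key types differ
puts desired == normalize_arguments(from_sdk) # compared after normalization
puts normalize_arguments(nil) == {}           # nil is treated as "no arguments"
```

Normalizing on the extraction side only would keep the caller's hash untouched, which may be preferable if callers are expected to pass String keys already.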
+---
+## Execution order and dependencies
+```
+Phase 0: setup
+   │
+   ▼
+Phase 1: Item 34 (job_exists? + get_job) ──► foundation, no deps
+   │                                          required by 32 (delete_job) and 33 (ensure_job)
+   ▼
+Phase 2: Item 32 (atomic create + update + delete) ──► uses get_job for delete safety
+   │
+   ▼
+Phase 3: Item 33 (idempotent ensure_job) ──► uses get_job + create_job/update_job
+   │                                          compares desired hash vs current
+   ▼
+Phase 4: Item 35 (test consolidation) ──► tests are inline in phases 1-3; this phase
+   │                                       validates final coverage
+   ▼
+Phase 5: Item 36 (docs) ──► glue-jobs-lifecycle.md + README + skill
+   │
+   ▼
+Phase 6: Release
+```
+---
+## Prerequisites (Phase 0)
+### 0.1 Verify environment
+- [ ] `git checkout main && git pull`
+- [ ] Current version in `lib/data_drain/version.rb` = `"0.3.1"`
+- [ ] `bundle exec rspec` passes (coverage ≥ 90%)
+- [ ] `bundle exec rubocop` reports no offenses
+- [ ] CI green on main for Ruby 3.2/3.3/3.4
+### 0.2 Create branch
+- [ ] `git checkout -b feature/v0.4.0`
+### 0.3 Mark items as in progress
+`IMPROVEMENT_PLAN.md` convention:
+- [ ] Edit the "Follow-ups post-roadmap" section of `docs/IMPROVEMENT_PLAN.md`:
+  - Add items 32, 33, 34, 35, 36 with detail (same structure as item 17).
+  - Mark them `[~]` (in progress).
+- [ ] Commit: `chore: agregar items 32-36 al roadmap como en progreso`
+### 0.4 Create `validate_glue_name!` (BLOCKING: big-pickle observation 1)
+AWS Glue Job names accept hyphens (`-`). The current `Validations.validate_identifier!` uses the regex `\A[a-zA-Z_][a-zA-Z0-9_]*\z`, which does NOT accept `-`. Without this fix, the examples in this plan (`name: "data-drain-export-versions"`) would fail.
+**Decision:** create a separate method instead of modifying `validate_identifier!` (that would affect `table_name`, `primary_key`, and `folder_name`, which must remain strict SQL identifiers).
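The difference between the two patterns is easy to check in isolation. The sketch below copies both regexes from this plan; the constant name `SQL_IDENTIFIER_REGEX` is illustrative only (the plan does not say what the existing constant is called).

```ruby
# Patterns copied from the plan. SQL_IDENTIFIER_REGEX is an illustrative name.
SQL_IDENTIFIER_REGEX = /\A[a-zA-Z_][a-zA-Z0-9_]*\z/
GLUE_NAME_REGEX      = /\A[a-zA-Z0-9_-]+\z/

samples = ["data-drain-export-versions", "my_job_v2", "my job", "job; DROP", ""]

samples.each do |name|
  # Print, for each candidate, whether each pattern accepts it.
  puts format("%-30s identifier=%-5s glue=%s",
              name.inspect,
              SQL_IDENTIFIER_REGEX.match?(name),
              GLUE_NAME_REGEX.match?(name))
end
```

Only the hyphenated name separates the two patterns in this sample set; spaces, semicolons, and the empty string are rejected by both.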
+- [ ] Edit `lib/data_drain/validations.rb`. Add:
+  ```ruby
+  GLUE_NAME_REGEX = /\A[a-zA-Z0-9_-]+\z/
+  # Validates an AWS Glue Job name. Allows alphanumerics, `_`, and `-`.
+  # More permissive than validate_identifier! because AWS allows it.
+  #
+  # @param field_name [Symbol]
+  # @param value [String]
+  # @raise [DataDrain::ConfigurationError]
+  def self.validate_glue_name!(field_name, value)
+    return if GLUE_NAME_REGEX.match?(value.to_s)
+    raise ConfigurationError,
+          "#{field_name} '#{value}' debe ser un Glue Job name válido (alfanumérico, '-', '_')"
+  end
+  ```
+- [ ] Tests in `spec/data_drain/validations_spec.rb`:
+  ```ruby
+  describe ".validate_glue_name!" do
+    it "no levanta para nombres con guiones" do
+      expect { described_class.validate_glue_name!(:name, "data-drain-export-versions") }.not_to raise_error
+    end
+    it "no levanta para alfanumérico simple" do
+      expect { described_class.validate_glue_name!(:name, "myJob123") }.not_to raise_error
+    end
+    it "no levanta para guiones bajos" do
+      expect { described_class.validate_glue_name!(:name, "my_job_v2") }.not_to raise_error
+    end
+    it "rechaza espacios" do
+      expect { described_class.validate_glue_name!(:name, "my job") }.to raise_error(DataDrain::ConfigurationError)
+    end
+    it "rechaza punto y coma (SQL injection)" do
+      expect { described_class.validate_glue_name!(:name, "job; DROP") }.to raise_error(DataDrain::ConfigurationError)
+    end
+    it "rechaza nombre vacío" do
+      expect { described_class.validate_glue_name!(:name, "") }.to raise_error(DataDrain::ConfigurationError)
+    end
+  end
+  ```
+- [ ] Validate: `bundle exec rspec spec/data_drain/validations_spec.rb`
+- [ ] Commit: `feat(validations): validate_glue_name! para Glue Job names con '-' (pre-fase v0.4.0)`
+### Checkpoint Phase 0
+- [ ] Branch created
+- [ ] Items in the roadmap marked `[~]`
+- [ ] Baseline green
+- [ ] `validate_glue_name!` implemented and tested (resolves big-pickle obs 1)
+---
+## Phase 1 - Item 34: Query helpers `job_exists?` + `get_job`
+**Foundation.** Required by items 32 (delete safety) and 33 (idempotent ensure).
+### 1.1 Implementation
+- [ ] Edit `lib/data_drain/glue_runner.rb`. Add at the end of the class:
+  ```ruby
+  # Checks whether a Glue Job exists.
+  #
+  # @param name [String] Job name in AWS.
+  # @return [Boolean]
+  def self.job_exists?(name)
+    !get_job(name).nil?
+  end
+  # Returns the Job's current data, or nil if it does not exist.
+  #
+  # @param name [String]
+  # @return [Aws::Glue::Types::Job, nil]
+  def self.get_job(name)
+    config = DataDrain.configuration
+    config.validate!
+    client = Aws::Glue::Client.new(region: config.aws_region)
+    client.get_job(job_name: name).job
+  rescue Aws::Glue::Errors::EntityNotFoundException
+    nil
+  end
+  ```
+- [ ] Keep the existing `extend Observability` and `private_class_method`.
+### 1.2 Tests
+- [ ] Edit `spec/data_drain/glue_runner_spec.rb`. Add at the end:
+  ```ruby
+  describe ".get_job" do
+    let(:glue_client) { Aws::Glue::Client.new(stub_responses: true, region: "us-east-1") }
+    before do
+      allow(Aws::Glue::Client).to receive(:new).and_return(glue_client)
+    end
+    it "retorna el Job si existe" do
+      glue_client.stub_responses(:get_job, {
+        job: { name: "my-job", role: "arn:aws:iam::123:role/Glue" }
+      })
+      job = described_class.get_job("my-job")
+      expect(job.name).to eq("my-job")
+    end
+    it "retorna nil si EntityNotFoundException" do
+      glue_client.stub_responses(:get_job, "EntityNotFoundException")
+      expect(described_class.get_job("nonexistent")).to be_nil
+    end
+    it "propaga otros errores AWS" do
+      glue_client.stub_responses(:get_job, "InternalServiceException")
+      expect { described_class.get_job("my-job") }.to raise_error(Aws::Glue::Errors::ServiceError)
+    end
+  end
+  describe ".job_exists?" do
+    let(:glue_client) { Aws::Glue::Client.new(stub_responses: true, region: "us-east-1") }
+    before do
+      allow(Aws::Glue::Client).to receive(:new).and_return(glue_client)
+    end
+    it "true si get_job retorna Job" do
+      glue_client.stub_responses(:get_job, { job: { name: "my-job" } })
+      expect(described_class.job_exists?("my-job")).to be true
+    end
+    it "false si get_job retorna nil" do
+      glue_client.stub_responses(:get_job, "EntityNotFoundException")
+      expect(described_class.job_exists?("nonexistent")).to be false
+    end
+  end
+  ```
+### 1.3 Validation
+- [ ] `bundle exec rspec spec/data_drain/glue_runner_spec.rb`
+- [ ] `bundle exec rubocop lib/data_drain/glue_runner.rb`
+### 1.4 Commit
+- [ ] Commit: `feat(glue): job_exists? + get_job helpers consultivos (item 34)`
+### Checkpoint Phase 1
+- [ ] Helpers working against stubs
+- [ ] EntityNotFoundException → nil (not propagated)
+- [ ] Other AWS errors do propagate
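The "not found becomes nil, everything else propagates" contract can be exercised without the AWS SDK. The following is a sketch under stated assumptions: `FakeServiceError`, `FakeNotFound`, `FakeJobStore`, `find_job`, and `job_present?` are all hypothetical stand-ins and are not part of data_drain or aws-sdk-glue.

```ruby
# Stand-ins for an SDK error hierarchy (hypothetical, for illustration only).
class FakeServiceError < StandardError; end
class FakeNotFound < FakeServiceError; end

# Hypothetical in-memory "service" that raises the way a remote API might.
class FakeJobStore
  def initialize(jobs)
    @jobs = jobs
  end

  def fetch_job(name)
    raise FakeServiceError, "backend unavailable" if name == "boom"

    @jobs.fetch(name) { raise FakeNotFound, name }
  end
end

# Maps only the "not found" error to nil; any other error propagates.
def find_job(store, name)
  store.fetch_job(name)
rescue FakeNotFound
  nil
end

def job_present?(store, name)
  !find_job(store, name).nil?
end

store = FakeJobStore.new({ "my-job" => { role: "arn:aws:iam::123:role/Glue" } })
puts job_present?(store, "my-job")
puts job_present?(store, "nonexistent")
```

Rescuing the specific subclass rather than the base error is what keeps unrelated failures visible to the caller.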
+---
+## Phase 2 - Item 32: Atomic `create_job` + `update_job` + `delete_job`
+### 2.1 Configuration hash definition
+To avoid signatures with many kwargs, use a well-defined config hash. Internal helpers translate it to the AWS API.
+- [ ] Document the supported options (a subset of Aws::Glue::Types::JobUpdate):
+  ```
+  name             [String, REQUIRED]
+  role_arn         [String, REQUIRED]
+  script_location  [String, REQUIRED] - "s3://..."
+  glue_version     [String, default "4.0"]
+  worker_type      [String, default "G.1X"]      - G.1X, G.2X, G.4X, G.8X
+  number_of_workers [Integer, default 5]
+  timeout_minutes  [Integer, default 2880]       - 48h
+  max_retries      [Integer, default 0]
+  max_concurrent_runs [Integer, default 1]
+  command_name     [String, default "glueetl"]   - or "pythonshell"
+  python_version   [String, default "3"]
+  default_arguments [Hash, default {}]
+  description      [String, optional]
+  ```
+### 2.2 Implement `create_job`
+- [ ] Add to `lib/data_drain/glue_runner.rb`:
+  ```ruby
+  # Creates a Glue Job.
+  #
+  # @param config [Hash] See "Configuration hash definition" in the docs.
+  # @return [String] Name of the created job.
+  # @raise [Aws::Glue::Errors::AlreadyExistsException] If it already exists.
+  # @raise [DataDrain::ConfigurationError] If required fields are missing.
+  def self.create_job(config)
+    validate_job_config!(config)
+    config_for_aws = build_aws_job_params(config)
+    aws_config = DataDrain.configuration
+    aws_config.validate!
+    client = Aws::Glue::Client.new(region: aws_config.aws_region)
+    @logger = aws_config.logger
+    safe_log(:info, "glue_runner.job_create",
+             { job: config[:name],
+               glue_version: config_for_aws[:glue_version],
+               worker_type: config_for_aws[:worker_type],
+               number_of_workers: config_for_aws[:number_of_workers] })
+    client.create_job(config_for_aws)
+    config[:name]
+  rescue Aws::Glue::Errors::ServiceError => e
+    safe_log(:error, "glue_runner.job_create_error",
+             { job: config[:name] }.merge(exception_metadata(e)))
+    raise
+  end
+  # @api private
+  def self.validate_job_config!(config)
+    %i[name role_arn script_location].each do |field|
+      val = config[field]
+      next unless val.nil? || val.to_s.empty?
+      raise DataDrain::ConfigurationError, "config[:#{field}] es obligatorio para Glue Job"
+    end
+    DataDrain::Validations.validate_glue_name!(:name, config[:name])
+  end
+  # @api private
+  def self.build_aws_job_params(config)
+    {
+      name: config[:name],
+      description: config[:description],
+      role: config[:role_arn],
+      command: {
+        name: config.fetch(:command_name, "glueetl"),
+        script_location: config[:script_location],
+        python_version: config.fetch(:python_version, "3")
+      }.compact,
+      default_arguments: config.fetch(:default_arguments, {}),
+      glue_version: config.fetch(:glue_version, "4.0"),
+      worker_type: config.fetch(:worker_type, "G.1X"),
+      number_of_workers: config.fetch(:number_of_workers, 5),
+      timeout: config.fetch(:timeout_minutes, 2880),
+      max_retries: config.fetch(:max_retries, 0),
+      execution_property: { max_concurrent_runs: config.fetch(:max_concurrent_runs, 1) }
+    }.compact
+  end
+  ```
+### 2.3 Implement `update_job`
+- [ ] Add:
+  ```ruby
+  # Updates an existing Glue Job.
+  #
+  # @param config [Hash] Same fields as create_job.
+  # @return [String] Name of the updated job.
+  # @raise [Aws::Glue::Errors::EntityNotFoundException] If it does not exist.
+  def self.update_job(config)
+    validate_job_config!(config)
+    aws_params = build_aws_job_params(config)
+    aws_config = DataDrain.configuration
+    aws_config.validate!
+    client = Aws::Glue::Client.new(region: aws_config.aws_region)
+    @logger = aws_config.logger
+    # AWS API: update_job takes {job_name:, job_update: {...}}, where job_update
+    # does NOT include :name (the job is identified by job_name) but does
+    # include :command, :role, etc.
+    job_update = aws_params.except(:name)
+    safe_log(:info, "glue_runner.job_update",
+             { job: config[:name] })
+    client.update_job(job_name: config[:name], job_update: job_update)
+    config[:name]
+  rescue Aws::Glue::Errors::ServiceError => e
+    safe_log(:error, "glue_runner.job_update_error",
+             { job: config[:name] }.merge(exception_metadata(e)))
+    raise
+  end
+  ```
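How create-style params map onto an update request can be sketched with plain hashes. This assumes the UpdateJob request of the AWS SDK for Ruby is keyed by `job_name` and `job_update`; `build_update_request` is a hypothetical helper, not the gem's method, and no AWS call is made.

```ruby
# Hypothetical helper: split create-style params into an update request.
# Uses reject rather than Hash#except so it also runs on Ruby < 3.0.
def build_update_request(create_params)
  job_update = create_params.reject { |key, _value| key == :name }
  { job_name: create_params.fetch(:name), job_update: job_update }
end

create_params = {
  name: "my-job",
  role: "arn:aws:iam::123:role/GlueRole",
  command: { name: "glueetl", script_location: "s3://my-bucket/scripts/export.py" }
}

request = build_update_request(create_params)
puts request.keys.inspect
puts request[:job_update].key?(:name)
```

Keeping the split in one small function makes the request shape easy to assert on in a unit test before any stubbed client is involved.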
+### 2.4 Implement `delete_job`
+- [ ] Add:
+  ```ruby
+  # Deletes a Glue Job. No-op if it does not exist (similar to DROP TABLE IF EXISTS).
+  #
+  # @param name [String]
+  # @return [Boolean] true if it was deleted, false if it did not exist.
+  def self.delete_job(name)
+    DataDrain::Validations.validate_glue_name!(:name, name)
+    config = DataDrain.configuration
+    config.validate!
+    client = Aws::Glue::Client.new(region: config.aws_region)
+    @logger = config.logger
+    client.delete_job(job_name: name)
+    safe_log(:info, "glue_runner.job_delete", { job: name })
+    true
+  rescue Aws::Glue::Errors::EntityNotFoundException
+    safe_log(:info, "glue_runner.job_delete_skipped", { job: name, reason: "not_found" })
+    false
+  rescue Aws::Glue::Errors::ServiceError => e
+    safe_log(:error, "glue_runner.job_delete_error",
+             { job: name }.merge(exception_metadata(e)))
+    raise
+  end
+  ```
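+**Decision:** `delete_job` is **idempotent** (it does not raise if the job is missing). This is better UX for callers that want a tear-down without checking existence first. Note that the AWS documentation for DeleteJob states that no exception is thrown when the job definition is not found, so the `false` branch may only be reachable through stubs; worth verifying against a real account.
+### 2.5 Tests
+- [ ] Add to `spec/data_drain/glue_runner_spec.rb`: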
+  ```ruby
+  describe ".create_job" do
+    let(:glue_client) { Aws::Glue::Client.new(stub_responses: true, region: "us-east-1") }
+    before do
+      allow(Aws::Glue::Client).to receive(:new).and_return(glue_client)
+    end
+    let(:valid_config) do
+      {
+        name: "my-job",
+        role_arn: "arn:aws:iam::123:role/GlueRole",
+        script_location: "s3://my-bucket/scripts/export.py"
+      }
+    end
+    it "crea con defaults razonables" do
+      created_params = nil
+      glue_client.stub_responses(:create_job, lambda { |context|
+        created_params = context.params
+        { name: "my-job" }
+      })
+      result = described_class.create_job(valid_config)
+      expect(result).to eq("my-job")
+      expect(created_params[:role]).to eq("arn:aws:iam::123:role/GlueRole")
+      expect(created_params[:glue_version]).to eq("4.0")
+      expect(created_params[:worker_type]).to eq("G.1X")
+      expect(created_params[:number_of_workers]).to eq(5)
+      expect(created_params[:command][:name]).to eq("glueetl")
+      expect(created_params[:command][:python_version]).to eq("3")
+    end
+    it "respeta worker_type custom" do
+      created_params = nil
+      glue_client.stub_responses(:create_job, lambda { |context|
+        created_params = context.params
+        { name: "my-job" }
+      })
+      described_class.create_job(valid_config.merge(worker_type: "G.4X", number_of_workers: 20))
+      expect(created_params[:worker_type]).to eq("G.4X")
+      expect(created_params[:number_of_workers]).to eq(20)
+    end
+    it "rechaza config sin name" do
+      expect {
+        described_class.create_job(valid_config.merge(name: nil))
+      }.to raise_error(DataDrain::ConfigurationError, /name/)
+    end
+    it "rechaza config sin role_arn" do
+      expect {
+        described_class.create_job(valid_config.merge(role_arn: nil))
+      }.to raise_error(DataDrain::ConfigurationError, /role_arn/)
+    end
+    it "rechaza name con caracteres inválidos" do
+      expect {
+        described_class.create_job(valid_config.merge(name: "my-job; DROP"))
+      }.to raise_error(DataDrain::ConfigurationError, /name/)
+    end
+    it "ACEPTA name con guiones (Glue convention)" do
+      glue_client.stub_responses(:create_job, { name: "data-drain-export-versions" })
+      expect {
+        described_class.create_job(valid_config.merge(name: "data-drain-export-versions"))
+      }.not_to raise_error
+    end
+    it "loguea glue_runner.job_create_error y propaga si falla" do
+      glue_client.stub_responses(:create_job, "AlreadyExistsException")
+      expect {
+        described_class.create_job(valid_config)
+      }.to raise_error(Aws::Glue::Errors::AlreadyExistsException)
+    end
+  end
+  describe ".update_job" do
+    let(:glue_client) { Aws::Glue::Client.new(stub_responses: true, region: "us-east-1") }
+    before do
+      allow(Aws::Glue::Client).to receive(:new).and_return(glue_client)
+    end
+    let(:valid_config) do
+      {
+        name: "my-job",
+        role_arn: "arn:aws:iam::123:role/GlueRole",
+        script_location: "s3://my-bucket/scripts/export.py"
+      }
+    end
+    # Observation 3 (big-pickle): capture params to validate the update_job API shape
+    it "envía job_update SIN :name (es path param)" do
+      captured_params = nil
+      glue_client.stub_responses(:update_job, lambda { |context|
+        captured_params = context.params
+        { job_name: "my-job" }
+      })
+      described_class.update_job(valid_config)
+      expect(captured_params[:job_name]).to eq("my-job")
+      expect(captured_params[:job_update]).to be_a(Hash)
+      expect(captured_params[:job_update]).not_to have_key(:name)  # ← key assertion
+      expect(captured_params[:job_update][:role]).to eq("arn:aws:iam::123:role/GlueRole")
+      expect(captured_params[:job_update][:command]).to be_a(Hash)
+      expect(captured_params[:job_update][:command][:script_location]).to eq("s3://my-bucket/scripts/export.py")
+    end
+    it "propaga EntityNotFoundException si el job no existe" do
+      glue_client.stub_responses(:update_job, "EntityNotFoundException")
+      expect {
+        described_class.update_job(valid_config)
+      }.to raise_error(Aws::Glue::Errors::EntityNotFoundException)
+    end
+  end
+  describe ".delete_job" do
+    let(:glue_client) { Aws::Glue::Client.new(stub_responses: true, region: "us-east-1") }
+    before do
+      allow(Aws::Glue::Client).to receive(:new).and_return(glue_client)
+    end
+    it "borra y retorna true si existe" do
+      glue_client.stub_responses(:delete_job, { job_name: "my-job" })
+      expect(described_class.delete_job("my-job")).to be true
+    end
+    it "retorna false si no existe (idempotente)" do
+      glue_client.stub_responses(:delete_job, "EntityNotFoundException")
+      expect(described_class.delete_job("nonexistent")).to be false
+    end
+    it "propaga otros errores AWS" do
+      glue_client.stub_responses(:delete_job, "ServiceUnavailable")
+      expect {
+        described_class.delete_job("my-job")
+      }.to raise_error(Aws::Glue::Errors::ServiceError)
+    end
+    it "valida name como identificador" do
+      expect {
+        described_class.delete_job("my-job; DROP")
+      }.to raise_error(DataDrain::ConfigurationError)
+    end
+  end
+  ```
+### 2.6 Validation Phase 2
+- [ ] `bundle exec rspec spec/data_drain/glue_runner_spec.rb`
+- [ ] `bundle exec rubocop lib/data_drain/glue_runner.rb`
+### 2.7 Commit
+- [ ] Commit: `feat(glue): create_job + update_job + delete_job atómicos (item 32)`
+### Checkpoint Phase 2
+- [ ] 3 atomic operations + validation helpers
+- [ ] Tests cover happy path + edge cases + AWS errors
+- [ ] `delete_job` is idempotent
+- [ ] `name` identifiers validated with regex
+---
+## Phase 3 - Item 33: Idempotent `ensure_job`
+### 3.1 Comparison design
+**Challenge:** AWS `get_job` returns default fields the caller never set (e.g. `MaxRetries: 0`). A naive comparison of the current hash against the desired one reports false diffs.
+**Solution:** compare ONLY the fields the caller sends explicitly. Everything else counts as "no opinion".
+Strategy:
+```ruby
+# Compare the caller's configured fields against the Job's current values
+def diff_fields(desired_config, current_job)
+  changes = []
+  comparable_fields(desired_config).each do |field|
+    desired = desired_config[field]
+    current = extract_current_value(current_job, field)
+    changes << field if desired != current
+  end
+  changes
+end
+```
+Comparable fields (from the subset the caller sets):
+- `description`, `role_arn`, `script_location`, `glue_version`,
+- `worker_type`, `number_of_workers`, `timeout_minutes`,
+- `max_retries`, `max_concurrent_runs`, `command_name`, `python_version`,
+- `default_arguments`
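The "compare only what the caller specified" rule can be tried on plain hashes. This is a reduced sketch, not the gem's implementation: `changed_keys` is a hypothetical name, and the field list is shortened to four entries for readability.

```ruby
# Shortened, illustrative field list (the plan's full list is longer).
COMPARABLE_FIELDS = %i[role_arn worker_type number_of_workers max_retries].freeze

# Returns the fields the caller set explicitly whose values differ from the
# current state. Fields absent from `desired` count as "no opinion".
def changed_keys(desired, current)
  COMPARABLE_FIELDS.select do |field|
    desired.key?(field) && desired[field] != current[field]
  end
end

current = {
  role_arn: "arn:aws:iam::123:role/GlueRole",
  worker_type: "G.1X",
  number_of_workers: 3,
  max_retries: 0 # server-side default the caller never set
}
desired = { role_arn: "arn:aws:iam::123:role/GlueRole", number_of_workers: 5 }

p changed_keys(desired, current)
```

Using `key?` rather than a nil check means a caller can still request an explicit `nil` for a field and have it compared.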
+### 3.2 Implementation
+- [ ] Add to `lib/data_drain/glue_runner.rb`:
+  ```ruby
+  # Ensures the Glue Job exists with the desired config. Idempotent.
+  #
+  # - If it does not exist → create_job
+  # - If it exists but differs → update_job (logs changed_fields)
+  # - If it exists and matches → no-op (logs unchanged)
+  #
+  # @param config [Hash] Same structure as create_job.
+  # @return [Symbol] :created | :updated | :unchanged
+  def self.ensure_job(config)
+    validate_job_config!(config)
+    current = get_job(config[:name])
+    return create_and_log(config) if current.nil?
+    changed = changed_fields(config, current)
+    if changed.empty?
+      @logger = DataDrain.configuration.logger
+      safe_log(:info, "glue_runner.job_unchanged", { job: config[:name] })
+      :unchanged
+    else
+      update_and_log(config, changed)
+    end
+  end
+  # @api private
+  def self.create_and_log(config)
+    create_job(config)
+    :created
+  end
+  # @api private
+  def self.update_and_log(config, changed_fields)
+    @logger = DataDrain.configuration.logger
+    safe_log(:info, "glue_runner.job_update",
+             { job: config[:name], changed_fields: changed_fields })
+    update_job(config)
+    :updated
+  end
+  # @api private
+  # @return [Array<Symbol>] keys that differ between desired and current
+  #
+  # Refinement (big-pickle observation 2): if extracted returns nil for a
+  # field AND the caller did not specify it, it is NOT considered a diff
+  # ("no opinion" on both sides).
+  def self.changed_fields(desired_config, current_job)
+    extracted = extract_current_config(current_job)
+    %i[description role_arn script_location glue_version worker_type
+       number_of_workers timeout_minutes max_retries max_concurrent_runs
+       command_name python_version default_arguments].select do |field|
+      next false unless desired_config.key?(field)
+      # If extracted is nil (AWS did not return the field) and the caller DID specify it,
+      # it is still a diff (needs an update). If the caller did not specify it either,
+      # the `unless desired_config.key?(field)` guard above already discarded it.
+      desired_config[field] != extracted[field]
+    end
+  end
+  # @api private
+  # @return [Hash] "desired"-shaped config extracted from the current Job
+  #
+  # IMPORTANT (big-pickle observation 4): this method does NOT extract:
+  # - created_on / last_modified_on (timestamps, always differ)
+  # - allocated_capacity (deprecated by AWS, replaced by number_of_workers)
+  # - log_uri / connections / non_overridable_arguments (not supported yet)
+  # If these are added later, make sure they have a counterpart in
+  # build_aws_job_params to avoid false diffs in ensure_job.
+  def self.extract_current_config(job)
+    {
+      description: job.description,
+      role_arn: job.role,
+      script_location: job.command&.script_location,
+      glue_version: job.glue_version,
+      worker_type: job.worker_type,
+      number_of_workers: job.number_of_workers,
+      timeout_minutes: job.timeout,
+      max_retries: job.max_retries,
+      max_concurrent_runs: job.execution_property&.max_concurrent_runs,
+      command_name: job.command&.name,
+      python_version: job.command&.python_version,
+      default_arguments: job.default_arguments&.to_h || {}
+    }
+  end
+  ```
+### 3.3 Tests
+- [ ] Add to `spec/data_drain/glue_runner_spec.rb`:
+  ```ruby
+  describe ".ensure_job" do
+    let(:glue_client) { Aws::Glue::Client.new(stub_responses: true, region: "us-east-1") }
+    before do
+      allow(Aws::Glue::Client).to receive(:new).and_return(glue_client)
+    end
+    let(:base_config) do
+      {
+        name: "my-job",
+        role_arn: "arn:aws:iam::123:role/GlueRole",
+        script_location: "s3://my-bucket/scripts/export.py",
+        worker_type: "G.1X",
+        number_of_workers: 5
+      }
+    end
+    it "retorna :created si el job no existe" do
+      glue_client.stub_responses(:get_job, "EntityNotFoundException")
+      glue_client.stub_responses(:create_job, { name: "my-job" })
+      expect(described_class.ensure_job(base_config)).to eq(:created)
+    end
+    it "retorna :unchanged si el job existe con misma config" do
+      glue_client.stub_responses(:get_job, {
+        job: {
+          name: "my-job",
+          role: "arn:aws:iam::123:role/GlueRole",
+          command: { script_location: "s3://my-bucket/scripts/export.py", name: "glueetl", python_version: "3" },
+          worker_type: "G.1X",
+          number_of_workers: 5
+        }
+      })
+      expect(described_class.ensure_job(base_config)).to eq(:unchanged)
+    end
+    it "retorna :updated si difiere algún campo" do
+      glue_client.stub_responses(:get_job, {
+        job: {
+          name: "my-job",
+          role: "arn:aws:iam::123:role/GlueRole",
+          command: { script_location: "s3://my-bucket/scripts/export.py" },
+          worker_type: "G.1X",
+          number_of_workers: 3   # ← differs from 5
+        }
+      })
+      glue_client.stub_responses(:update_job, { job_name: "my-job" })
+      expect(described_class.ensure_job(base_config)).to eq(:updated)
+    end
+    it "loguea changed_fields en :updated" do
+      glue_client.stub_responses(:get_job, {
+        job: {
+          name: "my-job",
+          role: "arn:aws:iam::123:role/GlueRole",
+          command: { script_location: "s3://OLD-bucket/scripts/export.py" },  # differs
+          worker_type: "G.1X",
+          number_of_workers: 3   # differs
+        }
+      })
+      glue_client.stub_responses(:update_job, { job_name: "my-job" })
+      logs = capture_logs { described_class.ensure_job(base_config) }
+      update_log = logs.find { |l| l.include?("glue_runner.job_update") }
+      expect(update_log).to include("changed_fields=")
+      expect(update_log).to match(/script_location|number_of_workers/)
+    end
+    it "ignora campos no especificados por el caller" do
+      # caller specifies only the required fields (name, role_arn, script_location)
+      partial_config = { name: "my-job", role_arn: "arn:...", script_location: "s3://..." }
+      glue_client.stub_responses(:get_job, {
+        job: {
+          name: "my-job",
+          role: "arn:...",
+          command: { script_location: "s3://..." },
+          max_retries: 5  # field NOT requested by the caller; must not trigger an update
+        }
+      })
+      expect(described_class.ensure_job(partial_config)).to eq(:unchanged)
+    end
+  end
+  ```
+### 3.4 Validation Phase 3
+- [ ] `bundle exec rspec spec/data_drain/glue_runner_spec.rb`
+- [ ] `bundle exec rubocop lib/data_drain/glue_runner.rb`
+### 3.5 Commit
+- [ ] Commit: `feat(glue): ensure_job idempotente con diff de campos (item 33)`
+### Checkpoint Phase 3
+- [ ] `ensure_job` returns `:created | :updated | :unchanged`
+- [ ] Comparison ignores fields the caller did not set (no false positives)
+- [ ] `changed_fields` is logged on update
+---
+## Phase 4 - Item 35: Test consolidation + coverage
+Tests are written inline in phases 1-3. This phase validates final coverage plus edge cases.
+### 4.1 Edge cases
+- [ ] Test: `ensure_job` with a changed `default_arguments: { "--key" => "val" }` → `:updated`.
+- [ ] Test: `ensure_job` with a timeout that differs only by type (Integer vs String).
+- [ ] Test: `delete_job` with an empty name → ConfigurationError.
+- [ ] Test: transient network errors (`Aws::Glue::Errors::ServiceUnavailable`) propagate.
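The Integer-versus-String case is worth seeing concretely, because a plain `!=` treats `"2880"` and `2880` as different values. The plan does not state which outcome it expects for this test, so the normalization below is only one possible approach; `normalize_numeric` and `NUMERIC_FIELDS` are hypothetical names.

```ruby
# Illustrative list of fields that are expected to hold integers.
NUMERIC_FIELDS = %i[number_of_workers timeout_minutes max_retries max_concurrent_runs].freeze

# Hypothetical normalizer: coerce numeric fields to Integer before comparing.
# Integer() raises ArgumentError on non-numeric strings instead of guessing.
def normalize_numeric(config)
  config.to_h do |key, value|
    numeric = NUMERIC_FIELDS.include?(key) && !value.nil?
    [key, numeric ? Integer(value) : value]
  end
end

desired = { timeout_minutes: "2880", worker_type: "G.1X" }
current = { timeout_minutes: 2880, worker_type: "G.1X" }

puts desired == current                                       # raw comparison
puts normalize_numeric(desired) == normalize_numeric(current) # after coercion
```

An alternative design would reject String values at validation time, which avoids silent coercion at the cost of a stricter caller contract.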
+### 4.1.1 Hash equality tests (big-pickle observation 5)
+Ruby `Hash#==` compares content, not references. There are still real edge cases with the AWS SDK that need explicit coverage:
+- [ ] Test: `default_arguments` with equal content (String keys) → `:unchanged`:
+  ```ruby
+  it "default_arguments con mismas String keys es :unchanged" do
+    config = base_config.merge(default_arguments: { "--TempDir" => "s3://tmp/" })
+    glue_client.stub_responses(:get_job, {
+      job: {
+        name: "my-job", role: "...",
+        command: { script_location: "..." },
+        default_arguments: { "--TempDir" => "s3://tmp/" }
+      }
+    })
+    expect(described_class.ensure_job(config)).to eq(:unchanged)
+  end
+  ```
+- [ ] Test: `default_arguments` in a different order but with the same data → `:unchanged` (Hash#== ignores order):
+  ```ruby
+  it "default_arguments with a different key order is :unchanged" do
+    config = base_config.merge(default_arguments: { "--A" => "1", "--B" => "2" })
+    glue_client.stub_responses(:get_job, {
+      job: {
+        name: "my-job", role: "...",
+        command: { script_location: "..." },
+        default_arguments: { "--B" => "2", "--A" => "1" }  # different order
+      }
+    })
+    expect(described_class.ensure_job(config)).to eq(:unchanged)
+  end
+  ```
+- [ ] Test: `default_arguments` with a different value → `:updated`:
+  ```ruby
+  it "default_arguments with a different value triggers :updated" do
+    config = base_config.merge(default_arguments: { "--TempDir" => "s3://NEW/" })
+    glue_client.stub_responses(:get_job, {
+      job: {
+        name: "my-job", role: "...",
+        command: { script_location: "..." },
+        default_arguments: { "--TempDir" => "s3://OLD/" }
+      }
+    })
+    glue_client.stub_responses(:update_job, { job_name: "my-job" })
+    expect(described_class.ensure_job(config)).to eq(:updated)
+  end
+  ```
+- [ ] Test: AWS returns Symbol keys (unlikely, but verify):
+  ```ruby
+  it "default_arguments works if AWS returns Symbol keys (defensive)" do
+    # If the AWS SDK ever returns { :"--TempDir" => ... } instead of String keys,
+    # the comparison would fail. Defensive test: if it fails, add normalization
+    # in extract_current_config: default_arguments.transform_keys(&:to_s).
+    config = base_config.merge(default_arguments: { "--TempDir" => "s3://tmp/" })
+    # AWS does not appear to return Symbol keys here, but validate the behavior.
+  end
+  ```
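The Hash equality cases above can be checked in plain Ruby, without the AWS SDK. The sketch below is illustrative only: `normalize_arguments` is a hypothetical helper name (not part of DataDrain) that mirrors the `transform_keys(&:to_s)` normalization suggested in the defensive test.

```ruby
# Hypothetical helper (assumption, not DataDrain API): coerce keys to String
# so that Symbol-keyed and String-keyed argument hashes compare as equal.
def normalize_arguments(arguments)
  (arguments || {}).to_h.transform_keys(&:to_s)
end

desired      = { "--A" => "1", "--B" => "2" }
same_reorder = { "--B" => "2", "--A" => "1" }
symbol_keys  = { :"--A" => "1", :"--B" => "2" }

# Hash#== ignores insertion order but distinguishes Symbol from String keys.
puts desired == same_reorder                      # true
puts desired == symbol_keys                       # false
puts desired == normalize_arguments(symbol_keys)  # true
puts normalize_arguments(nil).inspect             # {}
```

If the SDK only ever returns String keys, the normalization is a harmless no-op, which is why it can be added defensively.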
+### 4.2 Coverage
+- [ ] `bundle exec rspec`: verify coverage ≥ 90% (current threshold).
+- [ ] If it drops: add tests for the uncovered branches.
+### 4.3 Commit
+- [ ] Commit: `test(glue): edge-case coverage for Jobs lifecycle (item 35)`
+### Phase 4 checkpoint
+- [ ] Coverage stable or higher
+- [ ] No flaky tests in 3 consecutive runs
+---
+## Phase 5 - Item 36: Docs
+### 5.1 `skill/references/glue-jobs-lifecycle.md`
+- [ ] Create the new file:
+  ```markdown
+  # Glue Jobs Lifecycle
+  DataDrain v0.4.0+ provides lifecycle management for AWS Glue Jobs:
+  create, update, delete, and idempotently ensure (`ensure_job`).
+  ## Prerequisites
+  - IAM role with `glue:CreateJob`, `glue:UpdateJob`, `glue:DeleteJob`, `glue:GetJob`.
+  - PySpark/Python script already uploaded to S3 (the gem does NOT upload scripts).
+  - IAM role that runs the Glue Job (separate from the one that creates it; see the AWS docs).
+  ## Atomic operations
+  ### create_job
+  ```ruby
+  DataDrain::GlueRunner.create_job(
+    name: "data-drain-export-versions",
+    role_arn: "arn:aws:iam::123:role/GlueServiceRole",
+    script_location: "s3://my-bucket/scripts/export.py"
+  )
+  ```
+  Defaults: `glue_version: "4.0"`, `worker_type: "G.1X"`, `number_of_workers: 5`,
+  `timeout: 2880` (minutes, i.e. 48h), `command_name: "glueetl"`, `python_version: "3"`.
+  ### update_job
+  Takes the same config hash. Fails if the Job does not exist.
+  ### delete_job
+  ```ruby
+  DataDrain::GlueRunner.delete_job("data-drain-export-versions")
+  # => true if deleted, false if it did not exist (idempotent)
+  ```
+  ## Idempotent: `ensure_job` (recommended)
+  ```ruby
+  DataDrain::GlueRunner.ensure_job(
+    name: "data-drain-export-versions",
+    role_arn: "...",
+    script_location: "s3://...",
+    worker_type: "G.4X",
+    number_of_workers: 10,
+    default_arguments: { "--TempDir" => "s3://my-bucket/temp/" }
+  )
+  # => :created | :updated | :unchanged
+  ```
+  Algorithm:
+  1. `get_job(name)`: if nil → `create_job`, returns `:created`
+  2. If the Job exists, compare the fields set by the caller against the current Job
+  3. If they differ → `update_job`, returns `:updated` (logs `changed_fields`)
+  4. If they match → no-op, returns `:unchanged`
+  **Important:** `ensure_job` only compares fields that the caller sets. Undeclared
+  fields (e.g. `max_retries` if you do not pass it) do NOT trigger an update, even if AWS
+  returns them with default values.
+  ## Query helpers
+  ```ruby
+  DataDrain::GlueRunner.job_exists?("...")  # Boolean
+  DataDrain::GlueRunner.get_job("...")      # Aws::Glue::Types::Job or nil
+  ```
+  ## Full pattern: ensure + run + tear-down
+  ```ruby
+  job_name = "data-drain-export-versions"
+  DataDrain::GlueRunner.ensure_job(
+    name: job_name,
+    role_arn: ENV["GLUE_ROLE_ARN"],
+    script_location: "s3://#{bucket}/scripts/export.py",
+    worker_type: "G.1X",
+    number_of_workers: 5
+  )
+  DataDrain::GlueRunner.run_and_wait(
+    job_name,
+    { "--start_date" => start_date.to_fs(:db), ... },
+    max_wait_seconds: 3600
+  )
+  DataDrain::Engine.new(
+    bucket: bucket, table_name: "versions", ...,
+    skip_export: true
+  ).call
+  # Optional: cleanup
+  # DataDrain::GlueRunner.delete_job(job_name)
+  ```
+  ## Telemetry events
+  - `glue_runner.job_create` (INFO): `job`, `glue_version`, `worker_type`, `number_of_workers`
+  - `glue_runner.job_update` (INFO): `job`, `changed_fields` (in ensure)
+  - `glue_runner.job_unchanged` (INFO): `job`
+  - `glue_runner.job_delete` (INFO): `job`
+  - `glue_runner.job_delete_skipped` (INFO): `job`, `reason: "not_found"`
+  - `glue_runner.job_create_error` (ERROR): `job`, `error_class`, `error_message`
+  - `glue_runner.job_update_error` (ERROR): same fields
+  - `glue_runner.job_delete_error` (ERROR): same fields
+  ## Limitations (v0.4.0)
+  - **No script upload.** The caller is responsible for uploading scripts to S3 before `create_job`.
+  - **No pre-create validation of IAM/bucket.** AWS errors are clear; the gem adds no extra checks.
+  - **No management of Glue Workflows/Triggers/Crawlers.** Jobs only.
+  - **No support for `connections:` in the Job config.** If you need JDBC connections inside the Job, add them through the AWS Console or a direct `update_job`.
+  ```
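The `ensure_job` algorithm described in the document above can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not the gem's implementation: `FakeJobStore` and `ensure_job_sketch` are hypothetical names, the store is an in-memory stand-in for the Glue API, and its merge-on-update behavior is a simplification.

```ruby
# In-memory stand-in for the Glue API (assumption, not part of DataDrain).
class FakeJobStore
  def initialize
    @jobs = {}
  end

  # Returns the stored config for the job, or nil if it does not exist.
  def get(name)
    @jobs[name]
  end

  def put(config)
    @jobs[config.fetch(:name)] = config.dup
  end
end

# Hypothetical sketch of the four steps: get, create if missing,
# diff only the caller-supplied fields, update if any of them differ.
def ensure_job_sketch(store, config)
  current = store.get(config.fetch(:name))
  if current.nil?
    store.put(config)
    return :created
  end

  # Only keys present in `config` are compared; extra fields in `current`
  # (for example server-side defaults) never count as changes.
  changed = config.reject { |key, value| current[key] == value }.keys
  return :unchanged if changed.empty?

  store.put(current.merge(config))
  :updated
end

store  = FakeJobStore.new
config = { name: "my-job", worker_type: "G.1X", number_of_workers: 5 }

puts ensure_job_sketch(store, config)                                # created
puts ensure_job_sketch(store, config)                                # unchanged
puts ensure_job_sketch(store, config.merge(number_of_workers: 10))   # updated
```

Iterating over the caller's hash rather than the stored one is what keeps unspecified fields from producing false positives.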
+### 5.2 README
+- [ ] Edit the "Orquestación con AWS Glue" section of `README.md` and add a sub-section:
+  ```markdown
+  ### Glue Jobs Management (v0.4.0+)
+  To automate create/update/delete of Glue Jobs:
+      DataDrain::GlueRunner.ensure_job(
+        name: "my-export-job",
+        role_arn: ENV["GLUE_ROLE_ARN"],
+        script_location: "s3://my-bucket/scripts/export.py"
+      )
+  Details: [`skill/references/glue-jobs-lifecycle.md`](skill/references/glue-jobs-lifecycle.md).
+  ```
+### 5.3 SKILL.md + events
+- [ ] Edit the "Referencias" section of `skill/SKILL.md` and add:
+  ```markdown
+  - [Glue Jobs Lifecycle](references/glue-jobs-lifecycle.md): create, update, and delete Glue Jobs
+  ```
+- [ ] Edit `skill/references/eventos-telemetria.md`. Add a "GlueRunner - Lifecycle" section:
+  - `glue_runner.job_create`, `job_update`, `job_unchanged`, `job_delete`, `job_delete_skipped`
+  - `glue_runner.job_create_error`, `job_update_error`, `job_delete_error`
+- [ ] Edit the GlueRunner section of `skill/references/api-detallada.md`: add the new methods (`create_job`, `update_job`, `delete_job`, `ensure_job`, `get_job`, `job_exists?`).
+### 5.4 Commit
+- [ ] Commit: `docs: glue-jobs-lifecycle + README examples + events (item 36)`
+### Phase 5 checkpoint
+- [ ] `glue-jobs-lifecycle.md` covers all methods
+- [ ] README mentions the new feature
+- [ ] Events catalogued
+---
+## Phase 6 - Release
+### 6.1 Final lint + tests
+- [ ] `bundle exec rubocop`: 0 offenses.
+- [ ] `bundle exec rspec`: coverage ≥ 90%.
+- [ ] CI green on Ruby 3.2/3.3/3.4.
+### 6.2 CHANGELOG
+- [ ] Edit `CHANGELOG.md` and add at the top:
+  ```markdown
+  ## [0.4.0] - 2026-XX-XX
+  ### Features
+  - **Glue Jobs Lifecycle:** new `GlueRunner` methods for managing Glue Jobs:
+    - `create_job(config)`: creates the Job with sensible defaults (Glue 4.0, G.1X, 5 workers)
+    - `update_job(config)`: updates an existing Job
+    - `delete_job(name)`: idempotent (does not raise if the Job does not exist)
+    - `ensure_job(config)`: declarative and idempotent; returns `:created | :updated | :unchanged`
+    - `get_job(name)`: returns the Job or nil
+    - `job_exists?(name)`: boolean
+    Enables "infra as code" without managing Jobs through the Console or Terraform. (items 32, 33, 34)
+  ### New telemetry
+  - `glue_runner.job_create`, `job_update`, `job_unchanged`, `job_delete`, `job_delete_skipped`
+  - `glue_runner.job_create_error`, `job_update_error`, `job_delete_error`
+  ### Docs
+  - `skill/references/glue-jobs-lifecycle.md` (item 36)
+  - README "Glue Jobs Management" section updated
+  ### Tests
+  - Coverage stays ≥ 90% (actual ~97%+).
+  - `Aws::Glue::Client.new(stub_responses: true)` stubs for all new methods.
+  ```
+### 6.3 Version bump
+- [ ] `lib/data_drain/version.rb`: `VERSION = "0.4.0"`
+- [ ] `bundle install`
+### 6.4 Update roadmap
+- [ ] `docs/IMPROVEMENT_PLAN.md`, "Follow-ups post-roadmap" section:
+  - Items 32, 33, 34, 35, 36 → `[x]`
+### 6.5 Release commit
+- [ ] Commit: `chore: release v0.4.0 — Glue Jobs Lifecycle`
+### 6.6 PR + merge + tag
+- [ ] `git push origin feature/v0.4.0`
+- [ ] `gh pr create --title "v0.4.0: Glue Jobs Lifecycle (create/update/delete/ensure)"`
+- [ ] CI green
+- [ ] Merge
+- [ ] Tag: `git tag v0.4.0 && git push origin v0.4.0`
+### 6.7 Post-merge
+- [ ] Archive the plan: `git mv docs/execution/v0.4.0.md docs/execution/archive/v0.4.0.md`
+- [ ] Commit: `chore: archive v0.4.0 plan, items 32-36 [x]`
+- [ ] Update project memory.
+---
+## Final validation
+- [ ] CI green across the Ruby matrix
+- [ ] Coverage ≥ 90%
+- [ ] `bundle exec rubocop` with no offenses
+- [ ] v0.4.0 tag created
+- [ ] 5 items marked `[x]`
+- [ ] Plan archived
+- [ ] CHANGELOG updated
+---
+## Plan B - blocking scenarios
+| If... | Then... |
+|-------|-------------|
+| The `ensure_job` diff produces false positives (AWS returns default fields that we did not set) | Refine `extract_current_config` with `compact` or explicit exclusions. Add tests with a stub that returns extra fields. |
+| `Aws::Glue::Errors::EntityNotFoundException` is not the exact class name in some SDK version | Check the real name with `Aws::Glue::Errors.constants`. Adjust the rescue. |
+| `update_job` rejects `:name` inside `job_update` (contrary to what we documented) | Test against an actual stubbed client. Adjust with `aws_params.except(:name)`. |
+| Tests with `stub_responses` do not support `EntityNotFoundException` by string name | Use an instance: `glue_client.stub_responses(:get_job, Aws::Glue::Errors::EntityNotFoundException.new(nil, "msg"))`. |
+| Item 33 (ensure_job) takes >4h because of diff edge cases | Cut scope: ship v0.4.0 with only the atomic operations (items 32+34) + helpers, and defer `ensure_job` to v0.4.1. |
+| IAM permissions are missing in real testing | Document the minimum required set in glue-jobs-lifecycle.md. |
+| The AWS SDK returns `default_arguments` with Symbol keys (not String) | Normalize in `extract_current_config`: `default_arguments&.to_h&.transform_keys(&:to_s) \|\| {}`. The dedicated test in Phase 4.1.1 detects this case. |
+| The Glue API requires a complete `:command` shape in `update_job` (not partial) | `build_aws_job_params` already sends the full `:command` hash (name + script_location + python_version). The Phase 2.5 test validates the shape. |
+---
+## Notes for the executing agent
+- **Item 33 (`ensure_job`) is the most complex.** Tests must cover the field diff thoroughly to avoid false positives.
+- **Every commit ends green:** run rspec + rubocop before moving on.
+- **`delete_job` is idempotent by design.** Document this decision clearly in YARD.
+- **`name` identifiers** are validated with `Validations.validate_identifier!` (existing, regex `\A[a-zA-Z_][a-zA-Z0-9_]*\z`). If a caller uses names containing `-` (common in AWS), **adjust the regex or create a Glue-specific validation for Job names** (which allow `-`).
+  - **Pre-phase action:** verify whether Glue Job names allow `-`. If so (I believe they do), adapt `validate_identifier!` or add a separate `validate_glue_name!` in `Validations`.
+- **Coordination with big-pickle:** after uploading the plan, request a review (as with v0.3.0 and v0.3.1).
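The separate validator mentioned in the notes above could look roughly like the sketch below. Everything here is an assumption for illustration: the accepted character set (the existing identifier regex plus `-`), the 255-character cap, and the use of `ArgumentError` as a stand-in for the gem's own `ConfigurationError`. The real constraints should be confirmed against the AWS Glue documentation before adopting it.

```ruby
# Hypothetical pattern: same as the existing identifier regex, but also
# allowing hyphens. The length cap of 255 is an assumption to verify.
GLUE_JOB_NAME = /\A[a-zA-Z_][a-zA-Z0-9_-]{0,254}\z/

# Hypothetical validator; raises ArgumentError here only because this sketch
# does not load DataDrain's own error classes.
def validate_glue_name!(name)
  return name if name.is_a?(String) && name.match?(GLUE_JOB_NAME)

  raise ArgumentError, "invalid Glue Job name: #{name.inspect}"
end

puts validate_glue_name!("data-drain-export-versions")

begin
  validate_glue_name!("")
rescue ArgumentError => e
  puts e.message
end
```

Keeping this apart from `validate_identifier!` means table and column identifiers stay strict while Job names accept the hyphenated style used in the examples.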