RubyGems - data_drain - Versions diffs - 0.2.1 → 0.2.2 - Mend

data_drain 0.2.1 → 0.2.2


Files changed (20)

checksums.yaml +4 -4
data/CHANGELOG.md +23 -1
data/CLAUDE.md +3 -1
data/README.md +3 -0
data/docs/IMPROVEMENT_PLAN.md +262 -7
data/docs/execution/v0.2.2.md +891 -0
data/lib/data_drain/configuration.rb +49 -5
data/lib/data_drain/engine.rb +1 -0
data/lib/data_drain/file_ingestor.rb +1 -0
data/lib/data_drain/glue_runner.rb +22 -10
data/lib/data_drain/observability.rb +4 -2
data/lib/data_drain/record.rb +2 -1
data/lib/data_drain/storage/s3.rb +33 -37
data/lib/data_drain/version.rb +1 -1
data/skill/SKILL.md +1 -0
data/skill/references/antipatrones.md +20 -3
data/skill/references/api-detallada.md +18 -5
data/skill/references/eventos-telemetria.md +5 -0
data/skill/references/postgres-tuning.md +129 -0
metadata +3 -1

data/docs/execution/v0.2.2.md ADDED

@@ -0,0 +1,891 @@
+# Execution Plan: v0.2.2
+**Target release:** v0.2.2, operational robustness (cheap P1 items)
+**Roadmap items:** 9, 7, 8, 11a ([see IMPROVEMENT_PLAN.md](../IMPROVEMENT_PLAN.md#p1--performance-y-robustez-v021--v030))
+**Suggested branch:** `feature/v0.2.2`
+**Base:** `main` (contains v0.2.0 + v0.2.1)
+**Status:** Not started
+**Last updated:** 2026-04-14
+---
+## Context
+v0.2.0 closed the P0 items (security + testing). v0.2.1 was a patch release for CI and PR feedback. This release ships the "cheap" P1 items that add operational value without large refactors:
+| Item | Summary | Estimate |
+|------|---------|----------|
+| 9 | Regex-based secret filter in Observability | 30min |
+| 7 | `max_wait_seconds` in `GlueRunner.run_and_wait` | 1-2h |
+| 8 | `Configuration#validate!` | 2-3h |
+| 11a | Postgres tuning docs by table size | 4-6h |
+| cleanup A1 | Fix the `依赖` typo in the v0.2.1 CHANGELOG | 2min |
+| cleanup A2 | Comment in the `disconnect!` rescue | 2min |
+| cleanup A3 | Rename the test and add string/symbol key tests in `record_spec.rb` | 10min |
+| cleanup A4 | Close `db` + `conn` in `record_spec.rb#before(:all)` | 5min |
+| cleanup B1 | Reorder `public`/`private` in `storage/s3.rb` | 15min |
+**Estimated total:** 1-2 days of focused work.
+**Intentionally excluded** (moved to v0.3.0 because they are refactors or more expensive):
+- Item 5 (post-purge VACUUM): touches the critical purge path and requires tests against a real Postgres
+- Item 6 (Record sandboxing): requires validation against a real S3
+- Item 10 (Engine#call refactor): major refactor
+- Item 11b (runtime warning for slow purges): requires elaborate mocks
+---
+## Agent review (incorporated)
+Reviewed by **big-pickle (opencode)** on 2026-04-14 ([ClickUp task 86b9dka0c](https://app.clickup.com/t/86b9dka0c)). Four risks were raised and all are incorporated:
+| Risk | Resolution | Location in this plan |
+|------|------------|-----------------------|
+| 1: `db_port` defaults to 5432, so validation always passes | Excluded from `validate_db_config!` with an explanatory note | Phase 3.1 |
+| 2: `db_pass` is left unvalidated without justification | Excluded with a note: with peer/trust/IAM auth it can be nil | Phase 3.1 + new test in 3.2 |
+| 3: Mocking `Process.clock_gettime` may be flaky | Plan B already covers Timecop as a fallback | No change |
+| 4: The Wispro monorepo is not reachable from the gem | Marked as an explicit manual step | Phase 3.3 |
+---
+## Execution order and dependencies
+```
+Phase 0: branch setup + baseline + A1 (fix v0.2.1 CHANGELOG)
+   │
+   ▼
+Phase 1: Item 9 (regex secret filter) ──► isolated, warm-up
+   │
+   ▼
+Phase 2: Item 7 (max_wait_seconds GlueRunner) ──► isolated
+   │
+   ▼
+Phase 3: Item 8 (Configuration#validate!) ──► touches Engine/FileIngestor/GlueRunner
+   │                                          (depends on item 1 of v0.2.0, already merged)
+   ▼
+Phase 4: Item 11a (Postgres tuning docs) ──► pure docs, no deps
+   │
+   ▼
+Phase 4.5: Cleanup from the v0.2.0 review (A2, A3, A4, B1)
+   │
+   ▼
+Phase 5: Release (CHANGELOG, version bump, tag)
+```
+**Rationale:**
+- 9 first: XS, isolated to Observability, warm-up.
+- 7 second: isolated to GlueRunner, simple pattern.
+- 8 third: touches 3 classes and establishes the `raise ConfigurationError` pattern. It depends on item 1 (v0.2.0): validation must NOT require `aws_access_key_id`, because it is now optional with `credential_chain`.
+- 11a last: pure docs; can be drafted with the `postgresql-optimization` skill as support.
+---
+## Prerequisites (Phase 0)
+### 0.1 Verify the environment
+- [ ] `git checkout main && git pull`
+- [ ] Confirm the current version in `lib/data_drain/version.rb` = `"0.2.1"`
+- [ ] `bundle exec rspec` passes on main (112 specs, coverage ≥ 80%)
+- [ ] `bundle exec rubocop` reports no offenses in `lib/` (specs are excluded by `.rubocop.yml`)
+### 0.2 Create the branch
+- [ ] `git checkout -b feature/v0.2.2`
+### 0.3 Review the `postgresql-optimization` skill
+For item 11a. Located at `.agents/skills/postgresql-optimization/SKILL.md`. It has material on:
+- Composite/partial/expression/covering indexes
+- EXPLAIN ANALYZE
+- pg_stat_statements, pg_stat_activity
+- Declarative partitioning
+- VACUUM
+- [ ] Open the skill for reference during Phase 4
+### 0.4 Cleanup A1: fix the `依赖` typo in the v0.2.1 CHANGELOG
+- [ ] Edit `CHANGELOG.md` and find the line:
+  ```
+  CI: Descarga binario pre-compilado de DuckDB en vez de依赖 del sistema (`libduckdb-dev`).
+  ```
+  Replace `en vez de依赖 del sistema` with `en vez de depender del sistema`.
+- [ ] Commit: `fix(changelog): typo in the v0.2.1 CHANGELOG (cleanup A1)`
+### Checkpoint Phase 0
+- [ ] Branch created
+- [ ] Baseline tests green
+- [ ] Clean environment
+- [ ] v0.2.1 CHANGELOG fixed
+---
+## Phase 1: Item 9, regex-based secret filter (P1)
+**Roadmap:** [Item 9](../IMPROVEMENT_PLAN.md#item-9--filtro-de-secretos-por-regex-en-observability)
+### Context
+`Observability#safe_log` filters only exact keys (`%i[password token secret api_key auth]`). It does not filter variants such as `db_password`, `aws_secret_access_key`, or `bearer_token`. The change is trivial but significantly strengthens defense in depth.
+### 1.1 Implementation
+- [ ] Edit `lib/data_drain/observability.rb`:
+  - Add a constant to the module (before the `module_function` equivalent):
+    ```ruby
+    SENSITIVE_KEY_PATTERN = /password|passwd|pass|secret|token|api_key|apikey|auth|credential|private_key/i
+    ```
+  - In `safe_log`, replace:
+    ```ruby
+    val = %i[password token secret api_key auth].include?(k.to_sym) ? "[FILTERED]" : v
+    ```
+    with:
+    ```ruby
+    val = SENSITIVE_KEY_PATTERN.match?(k.to_s) ? "[FILTERED]" : v
+    ```
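A minimal sketch of how the regex-based check behaves, assuming a hypothetical `filter_sensitive` helper that stands in for the filtering step of `safe_log` (the real method is not reproduced here). Because the pattern is unanchored and case-insensitive, it matches any key that contains one of the fragments, which also means a harmless key such as `author` is filtered through `auth`.

```ruby
# Hypothetical stand-in for the filtering step of safe_log, for illustration only.
SENSITIVE_KEY_PATTERN = /password|passwd|pass|secret|token|api_key|apikey|auth|credential|private_key/i

# Returns a copy of the hash with sensitive values replaced by "[FILTERED]".
def filter_sensitive(fields)
  fields.to_h do |key, value|
    [key, SENSITIVE_KEY_PATTERN.match?(key.to_s) ? "[FILTERED]" : value]
  end
end

filtered = filter_sensitive({ db_password: "x", count: 42, author: "ana", "Bearer_Token" => "abc" })
# db_password and "Bearer_Token" are filtered, count is kept,
# and author is filtered too because it contains "auth".
```

If that kind of false positive matters for a given log field, renaming the field is likely simpler than tightening the pattern.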
+### 1.2 Tests
+- [ ] Edit `spec/data_drain/observability_spec.rb`. Add to the `describe "#safe_log"` block:
+  ```ruby
+  it "filters db_password (regex)" do
+    instance.emit(:info, "test", db_password: "x")
+    expect(test_logger.string).to include("db_password=[FILTERED]")
+    expect(test_logger.string).not_to include("db_password=x")
+  end
+  it "filters aws_secret_access_key (regex)" do
+    instance.emit(:info, "test", aws_secret_access_key: "akia123")
+    expect(test_logger.string).to include("aws_secret_access_key=[FILTERED]")
+  end
+  it "filters bearer_token (regex)" do
+    instance.emit(:info, "test", bearer_token: "eyJhbGc...")
+    expect(test_logger.string).to include("bearer_token=[FILTERED]")
+  end
+  it "filters private_key (regex)" do
+    instance.emit(:info, "test", private_key: "-----BEGIN RSA")
+    expect(test_logger.string).to include("private_key=[FILTERED]")
+  end
+  it "filters credential" do
+    instance.emit(:info, "test", aws_credential: "x")
+    expect(test_logger.string).to include("aws_credential=[FILTERED]")
+  end
+  it "does not filter non-sensitive fields" do
+    instance.emit(:info, "test", count: 42, table: "versions", user_id: 5)
+    expect(test_logger.string).to include("count=42")
+    expect(test_logger.string).to include("user_id=5")
+  end
+  ```
+### 1.3 Local validation
+- [ ] `bundle exec rspec spec/data_drain/observability_spec.rb`
+- [ ] `bundle exec rubocop lib/data_drain/observability.rb`
+### 1.4 Docs
+- [ ] Update the "Logging" section of `CLAUDE.md` to mention the regex filter
+- [ ] Update the Observability#safe_log section of `skill/references/api-detallada.md`
+- [ ] Align with the line `Filter sensitive keys (password|pass|passwd|secret|token|api_key|auth) → [FILTERED]` in `/Users/gabriel/.claude/CLAUDE.md`: the v0.2.2 regex is a **superset** (it adds `apikey`, `credential`, `private_key`), so it meets the global standard.
+### 1.5 Commit
+- [ ] `git add lib/data_drain/observability.rb spec/data_drain/observability_spec.rb`
+- [ ] `git add CLAUDE.md skill/`
+- [ ] Commit: `feat(security): regex-based secret filter in Observability (item 9)`
+### Checkpoint Phase 1
+- [ ] Tests green
+- [ ] Rubocop clean
+- [ ] Coverage did not drop
+---
+## Phase 2: Item 7, `max_wait_seconds` in GlueRunner (P1)
+**Roadmap:** [Item 7](../IMPROVEMENT_PLAN.md#item-7--max_wait_seconds-en-gluerunnerrun_and_wait)
+### Context
+`GlueRunner.run_and_wait` has no maximum timeout. If Glue hangs in `RUNNING`, the call blocks indefinitely.
+### 2.1 Implementation
+- [ ] Edit `lib/data_drain/glue_runner.rb`:
+  - Add a `max_wait_seconds:` parameter that defaults to `nil`:
+    ```ruby
+    def self.run_and_wait(job_name, arguments = {}, polling_interval: 30, max_wait_seconds: nil)
+    ```
+  - In the loop, before `get_job_run`, add a guard (`start_time` is the monotonic timestamp taken before the loop, as the test in 2.2 assumes):
+    ```ruby
+    loop do
+      if max_wait_seconds &&
+         (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start_time) > max_wait_seconds
+        safe_log(:error, "glue_runner.timeout", {
+                   job: job_name,
+                   run_id: run_id,
+                   max_wait_seconds: max_wait_seconds
+                 })
+        raise DataDrain::Error,
+              "Glue Job #{job_name} (Run ID: #{run_id}) exceeded max_wait_seconds=#{max_wait_seconds}"
+      end
+      run_info = client.get_job_run(...).job_run
+      # ... rest of the current case statement
+    end
+    ```
+  - Update the YARD docs:
+    ```ruby
+    # @param max_wait_seconds [Integer, nil] Maximum timeout in seconds.
+    #   nil = no limit (previous behavior).
+    # @raise [DataDrain::Error] if max_wait_seconds is exceeded before the run reaches SUCCEEDED
+    ```
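The guard above can be sketched in isolation as a monotonic-deadline polling loop. Everything here is hypothetical and not part of DataDrain: `poll_until`, `PollTimeout`, and the injectable `clock` and `sleeper` lambdas exist only to keep the sketch self-contained and testable without waiting for real time to pass.

```ruby
# Hypothetical error class and helper, for illustration only.
class PollTimeout < StandardError; end

# Calls the block until it returns a truthy value, or raises PollTimeout
# once more than max_wait_seconds have elapsed on the monotonic clock.
# max_wait_seconds: nil disables the deadline entirely.
def poll_until(max_wait_seconds: nil, interval: 0.01,
               clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
               sleeper: ->(seconds) { sleep(seconds) })
  start_time = clock.call
  loop do
    if max_wait_seconds && (clock.call - start_time) > max_wait_seconds
      raise PollTimeout, "exceeded max_wait_seconds=#{max_wait_seconds}"
    end

    result = yield
    return result if result

    sleeper.call(interval)
  end
end

attempts = 0
state = poll_until(max_wait_seconds: 5) do
  attempts += 1
  attempts >= 3 ? "SUCCEEDED" : nil
end
```

Using `CLOCK_MONOTONIC` rather than wall-clock time keeps the deadline unaffected by system clock adjustments. Passing the clock in as a lambda is one way to avoid stubbing `Process.clock_gettime` globally in tests.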
+### 2.2 Tests
+- [ ] Edit `spec/data_drain/glue_runner_spec.rb`. Add:
+  ```ruby
+  it "raises DataDrain::Error when max_wait_seconds is exceeded" do
+    start_response = double("start_resp", job_run_id: "run-timeout")
+    running_info = double("run_info", job_run_state: "RUNNING", error_message: nil)
+    allow(mock_client).to receive(:start_job_run).and_return(start_response)
+    allow(mock_client).to receive(:get_job_run)
+      .and_return(double("get_resp", job_run: running_info))
+    # Simulate time: start_time is t0, the first check passes, the second check exceeds the limit
+    times = [0.0, 0.1, 200.0]
+    allow(Process).to receive(:clock_gettime).with(Process::CLOCK_MONOTONIC) do
+      times.shift || 300.0
+    end
+    # Note: this only intercepts explicit `Kernel.sleep(...)` calls. If run_and_wait
+    # calls bare `sleep`, stub it on `described_class` instead, or the test really sleeps.
+    allow(Kernel).to receive(:sleep)
+    expect do
+      described_class.run_and_wait("slow-job", {}, polling_interval: 1, max_wait_seconds: 60)
+    end.to raise_error(DataDrain::Error, /max_wait_seconds=60/)
+  end
+  it "without max_wait_seconds keeps the previous behavior (no local timeout)" do
+    start_response = double("start_resp", job_run_id: "run-ok")
+    succeeded_info = double("run_info", job_run_state: "SUCCEEDED", error_message: nil)
+    allow(mock_client).to receive(:start_job_run).and_return(start_response)
+    allow(mock_client).to receive(:get_job_run)
+      .and_return(double("get_resp", job_run: succeeded_info))
+    # max_wait_seconds is nil by default, so no check runs
+    expect { described_class.run_and_wait("ok-job") }.not_to raise_error
+  end
+  ```
+### 2.3 Local validation
+- [ ] `bundle exec rspec spec/data_drain/glue_runner_spec.rb`
+- [ ] `bundle exec rubocop lib/data_drain/glue_runner.rb`
+### 2.4 Docs
+- [ ] `skill/references/api-detallada.md`, GlueRunner section: add `max_wait_seconds`
+- [ ] `skill/references/eventos-telemetria.md`: add the `glue_runner.timeout` event
+- [ ] `skill/references/antipatrones.md`: update anti-pattern 14 ("Trusting that GlueRunner has a maximum timeout") to mention that this is now possible with `max_wait_seconds:`
+### 2.5 Commit
+- [ ] Commit: `feat(glue): max_wait_seconds in GlueRunner.run_and_wait (item 7)`
+### Checkpoint Phase 2
+- [ ] Tests green
+- [ ] Anti-pattern 14 updated
+---
+## Phase 3: Item 8, `Configuration#validate!` (P1)
+**Roadmap:** [Item 8](../IMPROVEMENT_PLAN.md#item-8--configurationvalidate)
+### Context
+`Configuration` does not validate invariants. Typical mistakes (an invalid storage_mode, a missing aws_region with `:s3`, missing db_* settings in Engine) surface late as obscure errors (`NoMethodError`, `Aws::Errors`, `PG::ConnectionBad`).
+**Important:** item 1 of v0.2.0 made `aws_access_key_id`/`aws_secret_access_key` **optional** (credential_chain). `validate!` must NOT require them, only `aws_region`.
+### 3.1 Implementation
+- [ ] Edit `lib/data_drain/configuration.rb`:
+  ```ruby
+  # Validates general invariants (storage_mode + AWS when applicable).
+  # Called by FileIngestor#initialize and GlueRunner.run_and_wait.
+  #
+  # @raise [DataDrain::ConfigurationError]
+  def validate!
+    validate_storage_mode!
+    validate_aws_config! if storage_mode.to_sym == :s3
+  end
+  # Also validates the PostgreSQL connection settings.
+  # Called by Engine#initialize.
+  #
+  # @raise [DataDrain::ConfigurationError]
+  def validate_for_engine!
+    validate!
+    validate_db_config!
+  end
+  private
+  def validate_storage_mode!
+    # Safe navigation: a nil storage_mode raises ConfigurationError, not NoMethodError.
+    return if %i[local s3].include?(storage_mode&.to_sym)
+    raise DataDrain::ConfigurationError,
+          "storage_mode must be :local or :s3, got #{storage_mode.inspect}"
+  end
+  def validate_aws_config!
+    return unless aws_region.nil? || aws_region.to_s.empty?
+    raise DataDrain::ConfigurationError,
+          "aws_region is required with storage_mode = :s3"
+  end
+  # NOTE (big-pickle review, risk 1): db_port is excluded because it has a
+  # default of 5432, so the validation would always pass and be dead code.
+  # If someone intentionally sets it to nil, Postgres still fails with a
+  # descriptive error.
+  #
+  # NOTE (big-pickle review, risk 2): db_pass is excluded because it can be
+  # nil when Postgres uses peer/trust auth (local sockets) or IAM
+  # (RDS IAM authentication). Requiring it would break those valid cases.
+  def validate_db_config!
+    %i[db_host db_user db_name].each do |attr|
+      val = public_send(attr)
+      next unless val.nil? || val.to_s.empty?
+      raise DataDrain::ConfigurationError,
+            "config.#{attr} is required for Engine (storage_mode=#{storage_mode})"
+    end
+  end
+  ```
+- [ ] Edit `lib/data_drain/engine.rb#initialize`. After capturing `@config`:
+  ```ruby
+  @config = DataDrain.configuration
+  @config.validate_for_engine!
+  @logger = @config.logger
+  @adapter = DataDrain::Storage.adapter
+  ```
+- [ ] Edit `lib/data_drain/file_ingestor.rb#initialize`. After capturing `@config`:
+  ```ruby
+  @config = DataDrain.configuration
+  @config.validate!
+  @logger = @config.logger
+  @adapter = DataDrain::Storage.adapter
+  ```
+- [ ] Edit `run_and_wait` in `lib/data_drain/glue_runner.rb`. At the top:
+  ```ruby
+  def self.run_and_wait(job_name, arguments = {}, polling_interval: 30, max_wait_seconds: nil)
+    config = DataDrain.configuration
+    config.validate!
+    # ... rest
+  end
+  ```
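To illustrate the fail-fast behavior from a caller's point of view, here is a self-contained sketch. `SketchConfig` and `SketchConfigurationError` are hypothetical, trimmed-down stand-ins for `DataDrain::Configuration` and `DataDrain::ConfigurationError`; they mirror the rules described above (storage mode, `aws_region` only with `:s3`, and `db_host`/`db_user`/`db_name` for the engine) but are not the gem's code.

```ruby
# Hypothetical stand-ins, for illustration only.
class SketchConfigurationError < StandardError; end

SketchConfig = Struct.new(:storage_mode, :aws_region, :db_host, :db_user, :db_name,
                          keyword_init: true) do
  # General invariants: a valid storage mode, plus a region when using S3.
  def validate!
    mode = storage_mode&.to_sym
    unless %i[local s3].include?(mode)
      raise SketchConfigurationError,
            "storage_mode must be :local or :s3, got #{storage_mode.inspect}"
    end
    return unless mode == :s3 && aws_region.to_s.empty?

    raise SketchConfigurationError, "aws_region is required with storage_mode = :s3"
  end

  # Engine additionally needs the database settings; db_pass is deliberately not checked.
  def validate_for_engine!
    validate!
    missing = %i[db_host db_user db_name].find { |attr| public_send(attr).to_s.empty? }
    raise SketchConfigurationError, "config.#{missing} is required for Engine" if missing
  end
end

config = SketchConfig.new(storage_mode: :s3, aws_region: "us-east-1",
                          db_host: "localhost", db_user: "u")
message = begin
  config.validate_for_engine!
  nil
rescue SketchConfigurationError => e
  e.message
end
# message names the first missing attribute (db_name here)
```

The point of the design is that the error names the offending setting at construction time, instead of surfacing later as a driver-level failure.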
+### 3.2 Tests
+- [ ] Create `spec/data_drain/configuration_validate_spec.rb` (or add to `configuration_spec.rb`):
+  ```ruby
+  RSpec.describe DataDrain::Configuration do
+    describe "#validate!" do
+      it "does not raise with storage_mode :local" do
+        config = described_class.new
+        config.storage_mode = :local
+        expect { config.validate! }.not_to raise_error
+      end
+      it "raises with storage_mode :foo" do
+        config = described_class.new
+        config.storage_mode = :foo
+        expect { config.validate! }.to raise_error(DataDrain::ConfigurationError, /storage_mode/)
+      end
+      it "raises with storage_mode :s3 and no aws_region" do
+        config = described_class.new
+        config.storage_mode = :s3
+        config.aws_region = nil
+        expect { config.validate! }.to raise_error(DataDrain::ConfigurationError, /aws_region/)
+      end
+      it "does not raise with storage_mode :s3 + aws_region and no credentials (credential_chain)" do
+        config = described_class.new
+        config.storage_mode = :s3
+        config.aws_region = "us-east-1"
+        expect { config.validate! }.not_to raise_error
+      end
+    end
+    describe "#validate_for_engine!" do
+      it "raises without db_host" do
+        config = described_class.new
+        config.db_host = nil
+        config.db_user = "u"
+        config.db_name = "d"
+        expect { config.validate_for_engine! }.to raise_error(DataDrain::ConfigurationError, /db_host/)
+      end
+      it "raises without db_name" do
+        config = described_class.new
+        config.db_user = "u"
+        config.db_name = nil
+        expect { config.validate_for_engine! }.to raise_error(DataDrain::ConfigurationError, /db_name/)
+      end
+      it "does not raise when all required fields are set" do
+        config = described_class.new
+        config.db_user = "u"
+        config.db_name = "d"
+        # db_pass intentionally nil: valid with peer/trust/IAM auth
+        expect { config.validate_for_engine! }.not_to raise_error
+      end
+      it "does not raise with db_pass nil (peer/trust/IAM auth)" do
+        config = described_class.new
+        config.db_user = "u"
+        config.db_pass = nil
+        config.db_name = "d"
+        expect { config.validate_for_engine! }.not_to raise_error
+      end
+    end
+  end
+  ```
+- [ ] Add a `describe "configuration validation"` block to `engine_spec.rb`:
+  ```ruby
+  it "raises ConfigurationError if db_name is missing" do
+    DataDrain.configure { |c| c.db_user = "u"; c.db_name = nil }
+    expect do
+      described_class.new(base_options.merge(table_name: "versions"))
+    end.to raise_error(DataDrain::ConfigurationError, /db_name/)
+  ensure
+    DataDrain.reset_configuration!
+  end
+  ```
+- [ ] Add similar tests in `file_ingestor_spec.rb` and `glue_runner_spec.rb`.
+### 3.3 Backward-compatibility impact
+- [ ] **Risk:** if any current caller has a config with `db_user=""` or `db_name=nil` (already broken, but the path never ran), it will now fail in `Engine.new`.
+- [ ] **⚠️ Manual step (big-pickle review, risk 4):** agents working in this isolated gem do NOT have access to the Wispro monorepo. Run manually from the monorepo:
+  ```bash
+  cd ~/src/wispro-monorepo  # or wherever it lives
+  rg "DataDrain.configure" -A 20 --type ruby
+  rg "DataDrain::Engine.new" -A 10 --type ruby
+  ```
+  Verify that all required fields (db_host, db_user, db_name; aws_region if :s3) are set. Record the callers found:
+  > Callers with incomplete config: ___________________
+- [ ] Document in the CHANGELOG as "BREAKING (preventive)" (similar to item 2 of v0.2.0).
+### 3.4 Local validation
+- [ ] `bundle exec rspec`
+- [ ] `bundle exec rubocop lib/`
+### 3.5 Docs
+- [ ] `skill/references/api-detallada.md`, Configuration section: add `#validate!` and `#validate_for_engine!`
+- [ ] `CLAUDE.md`, "Configuration" section: add a note on automatic validation
+- [ ] `skill/references/antipatrones.md`: add a new anti-pattern: "Do not call `Engine.new` with a missing `db_name` expecting it to 'use the default'; it now fails fast with a descriptive error".
+### 3.6 Commit
+- [ ] Commit: `feat(config): Configuration#validate! invoked in Engine/FileIngestor/GlueRunner (item 8)`
+### Checkpoint Phase 3
+- [ ] Tests green (including the updated engine_spec / file_ingestor_spec)
+- [ ] Coverage did not drop
+- [ ] Documented as BREAKING (preventive) in the CHANGELOG (draft)
+---
+## Phase 4: Item 11a, Postgres tuning docs by table size (P1)
+**Roadmap:** [Item 11a](../IMPROVEMENT_PLAN.md#item-11a--documentación-de-postgres-tuning-por-tamaño-de-tabla)
+### Context
+DataDrain does not document Postgres tuning for massive purges. Recurring questions: which index helps the batched DELETE? When should a table move to partitioning? How are slow purges diagnosed?
+The `postgresql-optimization` skill (in `.agents/skills/`) provides base material on indexes, EXPLAIN ANALYZE, `pg_stat_statements`, and partitioning.
+### 4.1 Create `skill/references/postgres-tuning.md`
+Suggested structure:
+```markdown
+# Postgres Tuning for DataDrain
+Operational guide for tables that DataDrain archives and purges. Covers indexes,
+VACUUM, partitioning, and diagnostics.
+## Decision table by size
+| Size | Strategy |
+|------|----------|
+| <10GB | Composite index `(created_at, pk)` built with `CREATE INDEX CONCURRENTLY` |
+| 10-100GB | Same + `SET maintenance_work_mem='4GB'` + checklist |
+| 100GB-1TB | Declarative partitioning by month |
+| >1TB | Partitioning is mandatory; dropping a partition (`DROP TABLE` on it) replaces DELETE |
+## Recommended index
+For tables <100GB, DataDrain benefits from a composite index:
+    CREATE INDEX CONCURRENTLY idx_versions_created_at_id
+    ON versions (created_at, id);
+The batched DELETE uses `WHERE created_at >= X AND created_at < Y` + `IN (SELECT id LIMIT N)`.
+The composite index turns it into a range index scan plus direct access to the id.
+### Checklist before `CREATE INDEX CONCURRENTLY`
+- [ ] Current size: `SELECT pg_size_pretty(pg_total_relation_size('versions'));`
+- [ ] Free disk space (>2x the table)
+- [ ] `SET maintenance_work_mem = '4GB';` (session)
+- [ ] `SET statement_timeout = 0;`
+- [ ] Low-load window
+- [ ] Rollback plan: `DROP INDEX CONCURRENTLY` if I/O saturates
+### Risks of `CONCURRENTLY`
+1. **Two passes** (can take hours on 500GB)
+2. **Sustained I/O** (saturates IOPS on EBS gp3 without provisioned IOPS)
+3. **Can fail and leave an INVALID index** → recover with `DROP INDEX CONCURRENTLY idx; CREATE INDEX CONCURRENTLY idx ...`
+4. **High disk usage** during the build (external sort if `maintenance_work_mem` is low)
+## VACUUM ANALYZE after a purge
+On non-partitioned tables, purging millions of rows leaves dead tuples.
+Without VACUUM, the space is not reclaimed and sequential scans walk empty pages.
+    VACUUM ANALYZE versions;
+Roadmap item 5 adds `config.vacuum_after_purge` to automate this.
+Until v0.3.0, run it manually after each `Engine#call` on large
+non-partitioned tables.
+**Do NOT use `VACUUM FULL`**: it locks the entire table (ACCESS EXCLUSIVE lock).
+## Diagnosing a slow purge
+    -- Plan of the batched DELETE
+    EXPLAIN (ANALYZE, BUFFERS)
+    DELETE FROM versions
+    WHERE id IN (
+      SELECT id FROM versions
+      WHERE created_at >= '2026-01-01' AND created_at < '2026-02-01'
+      LIMIT 5000
+    );
+    -- Active sessions on the table
+    SELECT pid, state, wait_event, query_start, query
+    FROM pg_stat_activity
+    WHERE query LIKE '%versions%'
+      AND state != 'idle';
+    -- Table statistics
+    SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
+    FROM pg_stat_user_tables
+    WHERE relname = 'versions';
+    -- Top slow queries (requires pg_stat_statements)
+    SELECT substring(query, 1, 100) AS query, calls, mean_exec_time, rows
+    FROM pg_stat_statements
+    WHERE query LIKE '%versions%'
+    ORDER BY mean_exec_time DESC
+    LIMIT 10;
+## Declarative partitioning (tables > 100GB)
+Migrating to a partitioned table moves DataDrain from "throttled mass DELETE" to
+"instant partition drop".
+### Setup
+    -- 1. Create the partitioned table (empty, same structure as versions)
+    CREATE TABLE versions_new (
+      id UUID NOT NULL,
+      created_at TIMESTAMP NOT NULL,
+      ... -- remaining columns
+      PRIMARY KEY (id, created_at)  -- the PK must include the partition key
+    ) PARTITION BY RANGE (created_at);
+    -- 2. Create a partition per month
+    CREATE TABLE versions_2026_03 PARTITION OF versions_new
+      FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');
+    -- 3. Migrate data (in batches, one partition at a time)
+    INSERT INTO versions_2026_03
+    SELECT * FROM versions
+    WHERE created_at >= '2026-03-01' AND created_at < '2026-04-01';
+    -- 4. Swap names (minimal downtime)
+    BEGIN;
+      ALTER TABLE versions RENAME TO versions_old;
+      ALTER TABLE versions_new RENAME TO versions;
+    COMMIT;
+### Benefit for DataDrain
+    -- v0.2.x: batched DELETE, VACUUM afterwards, hours at TB scale
+    DataDrain::Engine.new(...).call
+    -- With partitions: DataDrain keeps working, but if the range
+    -- matches a partition, the operator can run:
+    DROP TABLE versions_2026_03;  -- instant, no bloat
+DataDrain does not detect partitions automatically (future item). Today the
+operator decides.
+## References
+- Skill: `.agents/skills/postgresql-optimization/SKILL.md`
+- PG docs: https://www.postgresql.org/docs/current/ddl-partitioning.html
+- Roadmap item 5 (automatic VACUUM): ../IMPROVEMENT_PLAN.md#item-5
+- Roadmap item 11b (runtime warning): ../IMPROVEMENT_PLAN.md#item-11b
+```
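The size-based decision table in the suggested document can be expressed as a small lookup. This is only a sketch: `tuning_strategy`, the symbol names, and the boundary handling (half-open ranges, 1TB taken as 1024GB) are assumptions made here, since the table does not say which row applies at exactly 10GB, 100GB, or 1TB.

```ruby
# Hypothetical helper, for illustration only: maps a table size in GB
# to the strategy row of the decision table.
def tuning_strategy(size_gb)
  raise ArgumentError, "size_gb must be non-negative" if size_gb.negative?

  case size_gb
  when 0...10 then :composite_index
  when 10...100 then :composite_index_plus_maintenance_work_mem
  when 100...1024 then :monthly_declarative_partitioning
  else :mandatory_partitioning_drop_partitions
  end
end

strategy = tuning_strategy(250)
```

The size itself would come from `pg_total_relation_size`, as in the checklist of the suggested document.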
+### 4.2 Cross-references
+- [ ] Add a link in the "References" section of `skill/SKILL.md`:
+  ```markdown
+  - [Postgres Tuning](references/postgres-tuning.md): indexes, VACUUM, partitioning, and diagnostics
+  ```
+- [ ] Add a "Postgres tuning" section to `CLAUDE.md` that links to the full doc (keep a 5-10 line summary in CLAUDE.md and link to the long doc in `skill/references/`).
+- [ ] Add a short note to a "Performance" (or "Tuning") section of `README.md` pointing to the skill for details.
+### 4.3 Validation
+- [ ] `bundle exec rspec` (no code is touched, so it should pass unchanged)
+- [ ] Verify relative links in the markdown (open locally in an editor or the GitHub preview)
+### 4.4 Commit
+- [ ] Commit: `docs: postgres tuning by table size (item 11a)`
+### Checkpoint Phase 4
+- [ ] `postgres-tuning.md` created and linked
+- [ ] CLAUDE.md and README updated
+- [ ] SKILL.md References section updated
+---
+## Phase 4.5: Cleanup from the v0.2.0 PR review (A2, A3, A4, B1)
+**Context:** items found during the v0.2.0 review (PR #6) that were left open. They are cheap and clear debt while the context is fresh.
+### 4.5.1 A2: comment in the `disconnect!` rescue
+- [ ] Edit `lib/data_drain/record.rb`. Find:
+  ```ruby
+  rescue StandardError # rubocop:disable Lint/SuppressedException
+  end
+  ```
+  Replace with:
+  ```ruby
+  rescue StandardError
+    # Swallowed so cleanup does not break the thread's flow.
+    # disconnect! is typically invoked from middlewares (Sidekiq/Puma);
+    # an exception here would propagate to unrelated jobs.
+    nil
+  end
+  ```
+  Note: adding an explicit `nil` makes it possible to remove the `rubocop:disable Lint/SuppressedException`.
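A self-contained sketch of the same rescue shape, to show what the caller observes. `FlakyConnection` and `safe_disconnect` are hypothetical names used only here; the real method is `Record.disconnect!`, whose body is not reproduced.

```ruby
# Hypothetical connection whose close always fails, for illustration only.
class FlakyConnection
  def close
    raise IOError, "socket already closed"
  end
end

def safe_disconnect(connection)
  connection.close
rescue StandardError
  # Swallowed on purpose: cleanup must not break the calling thread.
  nil
end

result = safe_disconnect(FlakyConnection.new)
# No exception reaches the caller; the method evaluates to nil.
```

The trade-off is that failures during cleanup become invisible, so this shape fits teardown paths only, not code whose errors the caller needs to see.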
+### 4.5.2 A3: rename the test and add real string/symbol key coverage
+- [ ] Edit `spec/data_drain/record_spec.rb`, `describe ".build_query_path"` block:
+  - Rename the current test:
+    ```ruby
+    it "interpolates symbol values as strings" do
+      path = record_class.send(:build_query_path, { year: :integer, month: :integer })
+      expect(path).to include("year=integer")
+    end
+    ```
+  - Add a real test for string keys:
+    ```ruby
+    it "accepts string keys in the partitions hash" do
+      path = record_class.send(:build_query_path, { "year" => 2026, "month" => 3 })
+      expect(path).to include("year=2026")
+      expect(path).to include("month=3")
+    end
+    it "combines string keys and symbol keys in the same hash" do
+      path = record_class.send(:build_query_path, { "year" => 2026, month: 3 })
+      expect(path).to include("year=2026")
+      expect(path).to include("month=3")
+    end
+    ```
+  Reason: the code does `partitions[k.to_sym] || partitions[k.to_s]`; without these tests the string-keys branch is not covered.
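The lookup named in the reason above can be sketched on its own. `build_partition_path`, the explicit `partition_keys` argument, and the `/` separator are assumptions for illustration; only the `partitions[k.to_sym] || partitions[k.to_s]` expression comes from the plan.

```ruby
# Hypothetical reimplementation, for illustration only: resolves each
# partition key as a symbol first and falls back to the string form.
def build_partition_path(partition_keys, partitions)
  partition_keys.map do |k|
    value = partitions[k.to_sym] || partitions[k.to_s]
    "#{k}=#{value}"
  end.join("/")
end

path = build_partition_path(%i[year month], { "year" => 2026, month: 3 })
# "year" is found through the string fallback, month through the symbol key.
```

Because the fallback uses `||`, a symbol key whose value is `nil` or `false` also falls through to the string key, which is worth keeping in mind when writing the tests.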
+### 4.5.3 A4: close DuckDB in `record_spec.rb#before(:all)`
+- [ ] Edit `spec/data_drain/record_spec.rb`, `before(:all)` block:
+  ```ruby
+  before(:all) do
+    path = "spec/fixtures/test_archive"
+    FileUtils.rm_rf(path)
+    db = DuckDB::Database.open(":memory:")
+    conn = db.connect
+    conn.query(<<~SQL)
+      COPY (...) TO '#{path}' (...);
+    SQL
+    conn.close  # ← add
+    db.close    # ← add
+  end
+  ```
+### 4.5.4 B1: reorder `public`/`private` in `storage/s3.rb`
+- [ ] Edit `lib/data_drain/storage/s3.rb`:
+  - It currently toggles visibility: `private` → definitions → `public` → `build_path` → `destroy_partitions` → `private` → `delete_in_batches`.
+  - Reorder: all public methods first (`setup_duckdb`, `build_path`, `destroy_partitions`), then a single `private`, then all private methods (`create_s3_secret`, `escape_sql`, `delete_in_batches`).
+  - No logic changes, only ordering.
+- [ ] Verify that the `public` re-toggle is gone.
+### 4.5.5 Phase 4.5 validation
+- [ ] `bundle exec rspec`: all green
+- [ ] `bundle exec rubocop lib/`: no offenses
+- [ ] `bundle exec rubocop lib/data_drain/storage/s3.rb` specifically (because of the reorder)
+### 4.5.6 Commits
+Make a separate commit per cleanup (they are independent):
+- [ ] `fix(record): comment in disconnect! rescue (cleanup A2)`
+- [ ] `test(record): add string vs symbol key coverage (cleanup A3)`
+- [ ] `test(record): close DuckDB conn+db in before(:all) (cleanup A4)`
+- [ ] `refactor(storage/s3): reorder public/private (cleanup B1)`
+### Checkpoint Phase 4.5
+- [ ] 4 cleanup commits (A2, A3, A4, B1)
+- [ ] Coverage stable or higher (A3 adds covered branches)
+- [ ] No regressions
+---
+## Phase 5: Release
+### 5.1 Global lint
+- [ ] `bundle exec rubocop lib/` reports no offenses
+- [ ] `bundle exec rspec` passes, coverage ≥ 80%
+### 5.2 CHANGELOG
+- [ ] Edit `CHANGELOG.md` and add at the top:
+  ```markdown
+  ## [0.2.2] - 2026-XX-XX
+  ### Security
+  - `Observability#safe_log` filters secrets with a regex instead of exact keys.
+    It now catches `db_password`, `aws_secret_access_key`, `bearer_token`, `private_key`,
+    `*credential*`, etc. (item 9)
+  ### Features
+  - `GlueRunner.run_and_wait` accepts `max_wait_seconds:` to avoid blocking
+    indefinitely on hung jobs. Default `nil` (no limit, previous behavior).
+    Emits `glue_runner.timeout` and raises `DataDrain::Error`. (item 7)
+  - `Configuration#validate!` and `Configuration#validate_for_engine!` are invoked
+    automatically in `Engine`, `FileIngestor`, and `GlueRunner`. They fail fast with
+    descriptive errors when configuration is missing (e.g. `aws_region` with `:s3`,
+    `db_*` with Engine). (item 8)
+  ### Docs
+  - `skill/references/postgres-tuning.md`: Postgres tuning guide by table
+    size (indexes, VACUUM, partitioning, diagnostics). (item 11a)
+  ### Cleanups (review PR #6)
+  - Fix the `依赖` typo in the v0.2.1 CHANGELOG (A1).
+  - Explanatory comment in the `Record.disconnect!` rescue (A2).
+  - Real string-keys vs symbol-keys coverage in `Record.build_query_path` (A3).
+  - Close conn+db in `record_spec.rb#before(:all)` to avoid a memory leak in the suite (A4).
+  - Reorder `public`/`private` in `storage/s3.rb` (B1).
+  ### BREAKING (preventive)
+  - `Engine.new` / `FileIngestor.new` / `GlueRunner.run_and_wait` now raise
+    `DataDrain::ConfigurationError` at boot if the configuration is incomplete.
+    Previously they failed late with obscure errors (`NoMethodError`, `PG::ConnectionBad`).
+  ```
+### 5.3 Bump the version
+- [ ] Edit `lib/data_drain/version.rb`: `VERSION = "0.2.2"`
+- [ ] `bundle install` (updates `Gemfile.lock`)
+### 5.4 Update the roadmap
+- [ ] Edit `docs/IMPROVEMENT_PLAN.md`:
+  - Items 7, 8, 9, 11a: `[ ]` → `[x]`
+  - Update the "Last updated" date
+### 5.5 Release commit
+- [ ] Commit: `chore: release v0.2.2, P1 items (filters, Glue timeout, config validate, PG docs)`
+### 5.6 Merge and tag
+- [ ] `git push origin feature/v0.2.2`
+- [ ] Open a PR to `main` with a body based on the CHANGELOG
+- [ ] Wait for green CI
+- [ ] Merge
+- [ ] Tag: `git tag v0.2.2 && git push origin v0.2.2`
+### 5.7 Post-merge
+- [ ] Archive this plan: `git mv docs/execution/v0.2.2.md docs/execution/archive/v0.2.2.md`
+- [ ] Commit: `chore: archive v0.2.2 plan, mark items 7/8/9/11a [x]`
+---
+## Final validation
+- [ ] `bundle exec rspec` green, coverage ≥ 80%
+- [ ] `bundle exec rubocop lib/` reports no offenses
+- [ ] CHANGELOG complete with the real date
+- [ ] Version = `0.2.2`
+- [ ] Tag `v0.2.2` created and pushed
+- [ ] Items 7, 8, 9, 11a marked `[x]` in the roadmap
+- [ ] Plan archived
+---
+## Plan B: if an item gets stuck
+| If... | Then... |
+|-------|---------|
+| Item 8 breaks callers in the Wispro monorepo (validate_for_engine!) | Relax the validation: log a warning instead of raising. Re-enable the raise in v0.3.0 after coordinating. |
+| Item 7 tests that mock Process.clock_gettime are flaky | Use `Timecop` or abstract the monotonic clock into a stubbable class method. |
+| Item 11a takes more than 1 day | Cut to the essentials: decision table + index checklist + diagnostic SQL. Partitioning can move to v0.3.0. |
+| CI fails on coverage | Check SimpleCov: the new `validate!` branches (item 8) are probably not covered. Add specific tests. |
+---
+## Notes for the executing agent
+- **Each phase ends with an atomic commit.** Do not commit while tests are red.
+- **Before each commit:** `bundle exec rspec` + `bundle exec rubocop lib/`.
+- **Item 8 (validate!) is the most invasive.** Do a final review of the monorepo callers before merging.
+- **Items 9, 7, 11a can run in parallel** if there are multiple agents. Item 8 must go alone (it touches 3 core files).
+- **If CI adds Ruby 3.2 and 3.3 to the matrix (item 14c, pending)**, verify that the changes pass on all of them.
+- **Update `skill/` in every phase**; do not let docs debt pile up until the end.