RubyGems - data_drain - Version diffs - 0.5.0 → 0.5.2 - Mend

data_drain 0.5.0 → 0.5.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

checksums.yaml +4 -4
data/CHANGELOG.md +13 -0
data/README.md +10 -6
data/data_drain.gemspec +1 -1
data/lib/data_drain/record.rb +4 -1
data/lib/data_drain/version.rb +1 -1
data/skill/SKILL.md +2 -2
data/skill/references/eventos-telemetria.md +9 -0
metadata +5 -5
/data/docs/execution/{v0.5.0-OBSERVACIONES.md → archives/v0.5.0-OBSERVACIONES.md} +0 -0
/data/docs/execution/{v0.5.0.md → archives/v0.5.0.md} +0 -0

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: c92c85e6232344565dc090539d3d58aa47904ca1e454b696ba4eba12e2648881
-  data.tar.gz: 1b3332ac50288dfd6793aed0c51ab8fa49d8dc309f7bc8dd4a642ecffd97e3cd
+  metadata.gz: 414600ce1230908cb1eef7e092ebf9287774ddbe4985286d8aa83995d0e47d4b
+  data.tar.gz: d300b31686ccf09320abc070a018566510e3e9a2d8488cb2bc83209dc56a3b21
 SHA512:
-  metadata.gz: d0cecb3d168ad96943b9cc70eb936e9b95e92d03dc0bd08ed5996ac1967ef627191802892097abf3b094d5b59515c150ca8edc4ffcefe13be8b4a2d7721a180a
-  data.tar.gz: 45018ca7e4287bf055cb7060e5720f83da318bd7c9ad44a30d8e832972feb1eb99d29286a9813128d4f84171d4c032e12f9a4f4dd7cc0022b36e44a3098d91c8
+  metadata.gz: f4a7177e6d412995216397de87e9e93806815c395dea206fed75b541f3dafb208a34ea2c1cb35d6226f0dd4d4bf118c9ba13b0e8dc522d73be16ab59487fc7c9
+  data.tar.gz: 40dd834ad6af6d0c291b35a4ddbd259ef635400b31f92852d50e9565ff9f14ce880c3d0b4ad04f8a1e924888e92327338356776d065bd07dabec5d450ea0d440
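checksums.yaml lists a SHA256 and a SHA512 digest for each of the two members of the .gem archive, metadata.gz and data.tar.gz. A minimal sketch for re-checking those digests with the Ruby standard library (`digest`, `yaml`, `zlib`) and RubyGems' `Gem::Package::TarReader`. The helper names `gem_members` and `checksum_mismatches` are hypothetical, and the sketch does not verify gem signatures.

```ruby
require "digest"
require "yaml"
require "zlib"
require "rubygems/package"

# Hypothetical helper: returns every top-level member of a .gem archive
# (metadata.gz, data.tar.gz, checksums.yaml.gz, ...) as name => raw bytes.
def gem_members(gem_path)
  members = {}
  File.open(gem_path, "rb") do |io|
    Gem::Package::TarReader.new(io).each do |entry|
      members[entry.full_name] = entry.read
    end
  end
  members
end

# Hypothetical helper: recomputes each digest listed in a checksums.yaml
# document and returns "ALGO:member" for every entry that does not match.
def checksum_mismatches(checksums_yaml, members)
  digesters = { "SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512 }
  YAML.safe_load(checksums_yaml).flat_map do |algo, files|
    files.filter_map do |name, expected_hex|
      actual_hex = digesters.fetch(algo).hexdigest(members.fetch(name))
      "#{algo}:#{name}" unless actual_hex == expected_hex
    end
  end
end

# Usage against a downloaded gem (file name assumed):
#   m = gem_members("data_drain-0.5.2.gem")
#   checksum_mismatches(Zlib.gunzip(m["checksums.yaml.gz"]), m) # expected [] for an intact gem
```

An empty result means every listed digest matched; any entry names the algorithm and member that differ.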

data/CHANGELOG.md CHANGED

@@ -1,5 +1,18 @@
 ## [Unreleased]
+## [0.5.2] - 2026-04-16
+### Fixes
+- `Record#where()` now uses wildcards (`key=*`) for unspecified partition keys instead of empty values (`key=`), consistent with `destroy_partitions`. Fixes #1.
+## [0.5.1] - 2026-04-15
+### Docs
+- `skill/references/eventos-telemetria.md`: new `script_uploaded` and `script_upload_error` events.
+- `README.md`: `script_path` examples in the GlueRunner and observability sections.
 ## [0.5.0] - 2026-04-15
 ### Features
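The 0.5.2 fix can be illustrated with glob matching against Hive-style partition directories. In this sketch Ruby's `File.fnmatch` stands in for the glob expansion applied when the query path is read (DuckDB's own glob rules may differ in detail), and the directory names are hypothetical. An empty `month=` segment matches no directory, while `month=*` matches every month under `year=2024`.

```ruby
# Hypothetical partition directories in the lake.
dirs = ["year=2024/month=3", "year=2024/month=4", "year=2025/month=1"]

# Path segment 0.5.0 produced for where(year: 2024): empty value for month.
old_pattern = "year=2024/month="
# Path segment 0.5.2 produces: wildcard for month.
new_pattern = "year=2024/month=*"

# FNM_PATHNAME keeps "*" from matching across a "/" boundary.
matching = ->(pattern) { dirs.select { |d| File.fnmatch(pattern, d, File::FNM_PATHNAME) } }

p matching.call(old_pattern) # => []
p matching.call(new_pattern) # => ["year=2024/month=3", "year=2024/month=4"]
```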

data/README.md CHANGED

@@ -1,6 +1,6 @@
 # DataDrain
-[![CI](https://github.com/gedera/data_drain/actions/workflows/main.yml/badge.svg)](https://github.com/gedera/data_drain/actions/workflows/main.yml)
+[![CI](https://github.com/sequre/data_drain/actions/workflows/main.yml/badge.svg)](https://github.com/sequre/data_drain/actions/workflows/main.yml)
 Ruby micro-framework for extracting, archiving, and purging historical PostgreSQL data into a Data Lake (S3 or local disk) in Parquet format, using in-memory DuckDB.
@@ -18,7 +18,7 @@ Ruby micro-framework for extracting, archiving, and purging historical data from Postgr
 ```ruby
 # Gemfile
-gem 'data_drain', git: 'https://github.com/gedera/data_drain.git', branch: 'main'
+gem 'data_drain', git: 'https://github.com/sequre/data_drain.git', branch: 'main'
 ```
 ```bash
@@ -115,11 +115,13 @@ DataDrain::GlueRunner.job_exists?("my-glue-export-job")
 job = DataDrain::GlueRunner.get_job("my-glue-export-job")
 # => Aws::Glue::Types::Job (Name, Command, DefaultArguments, etc.)
-# Create a job
+# Create a job from a local script (v0.5.0+)
 job = DataDrain::GlueRunner.create_job(
   "my-glue-export-job",
   role_arn: "arn:aws:iam::123:role/GlueServiceRole",
-  script_location: "s3://my-bucket/scripts/export.py",
+  script_path: "scripts/glue/export.py",  # local file, uploaded to S3 automatically
+  script_bucket: "my-bucket",
+  script_folder: "scripts",
   default_arguments: { "--extra-files" => "s3://my-bucket/scripts/udf.py" },
   timeout: 1440,
   max_retries: 2
@@ -129,8 +131,9 @@ job = DataDrain::GlueRunner.create_job(
 job = DataDrain::GlueRunner.ensure_job(
   "my-glue-export-job",
   role_arn: "arn:aws:iam::123:role/GlueServiceRole",
-  script_location: "s3://my-bucket/scripts/export.py",
-  timeout: 1440
+  script_path: "scripts/glue/export.py",
+  script_bucket: "my-bucket",
+  script_folder: "scripts"
 )
 # Delete a job
@@ -198,6 +201,7 @@ ArchivedVersion.destroy_all(year: 2024, month: 3)    # one month, globally
 ```
 component=data_drain event=engine.complete table=versions duration_s=12.4 export_duration_s=8.1 purge_duration_s=3.9 count=150000
 component=data_drain event=engine.purge_heartbeat table=versions batches_processed_count=100 rows_deleted_count=500000
+component=data_drain event=glue_runner.script_uploaded local_path=scripts/glue/export.py s3_path=s3://my-bucket/scripts/export.py bytes=4521
 component=data_drain event=glue_runner.failed job=my-export-job run_id=jr_abc123 status=FAILED duration_s=301.0
 ```
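The telemetry lines above are flat `key=value` pairs separated by spaces. A minimal sketch that turns one line into a Hash, assuming (as in these samples) that values contain no spaces or quoting; `parse_kv` is a hypothetical name, not part of DataDrain.

```ruby
# Hypothetical parser for space-separated key=value log lines.
# Splitting on the first "=" only keeps values such as URLs intact.
def parse_kv(line)
  line.split(" ").each_with_object({}) do |pair, fields|
    key, value = pair.split("=", 2)
    fields[key] = value
  end
end

line = "component=data_drain event=glue_runner.script_uploaded " \
       "local_path=scripts/glue/export.py s3_path=s3://my-bucket/scripts/export.py bytes=4521"
event = parse_kv(line)

p event["event"]      # => "glue_runner.script_uploaded"
p event["bytes"].to_i # => 4521
```

All values come back as strings; converting fields like `duration_s` or `count` to numbers is left to the caller.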

data/data_drain.gemspec CHANGED

@@ -11,7 +11,7 @@ Gem::Specification.new do |spec|
   spec.summary = "Micro-framework for draining PostgreSQL data to Parquet via DuckDB."
   spec.description = "Extracts transactional data, archives it in a Data Lake (S3/Local) " \
                      "in Parquet format using Hive Partitioning, and safely purges the source."
-  spec.homepage = "https://github.com/gedera/data_drain"
+  spec.homepage = "https://github.com/sequre/data_drain"
   spec.required_ruby_version = ">= 3.2"
   spec.files = Dir.chdir(__dir__) do

data/lib/data_drain/record.rb CHANGED

@@ -131,7 +131,10 @@ module DataDrain
       # @param partitions [Hash]
       # @return [String]
       def build_query_path(partitions)
-        partition_path = partition_keys.map { |k| "#{k}=#{partitions[k.to_sym] || partitions[k.to_s]}" }.join("/")
+        partition_path = partition_keys.map do |k|
+          val = partitions.key?(k.to_sym) ? partitions[k.to_sym] : partitions[k.to_s]
+          val.nil? || val.to_s.empty? ? "#{k}=*" : "#{k}=#{val}"
+        end.join("/")
         DataDrain::Storage.adapter.build_path(bucket, folder_name, partition_path)
       end
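To see the change in isolation, the partition-path part of `build_query_path` can be lifted into two standalone methods, one per version. This is a sketch: the method names and the `%i[year month]` keys are hypothetical, and the `DataDrain::Storage.adapter.build_path` call is left out because its output format is not shown in this diff.

```ruby
# 0.5.0 behaviour: a missing key becomes an empty segment ("key=").
def partition_path_v050(partition_keys, partitions)
  partition_keys.map { |k| "#{k}=#{partitions[k.to_sym] || partitions[k.to_s]}" }.join("/")
end

# 0.5.2 behaviour: a missing, nil, or empty value becomes a wildcard ("key=*").
# A present Symbol key takes precedence over the String key, even if its value is nil.
def partition_path_v052(partition_keys, partitions)
  partition_keys.map do |k|
    val = partitions.key?(k.to_sym) ? partitions[k.to_sym] : partitions[k.to_s]
    val.nil? || val.to_s.empty? ? "#{k}=*" : "#{k}=#{val}"
  end.join("/")
end

keys = %i[year month]
puts partition_path_v050(keys, { month: 3 }) # year=/month=3
puts partition_path_v052(keys, { month: 3 }) # year=*/month=3
```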

data/lib/data_drain/version.rb CHANGED

@@ -2,5 +2,5 @@
 module DataDrain
   # @return [String] semantic version of the gem
-  VERSION = "0.5.0"
+  VERSION = "0.5.2"
 end
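Application code that relies on the wildcard behaviour of `where()` could gate on the gem version. A sketch with the hypothetical helper `wildcard_where?`; in an application the argument would be `DataDrain::VERSION`. `Gem::Version` is used because plain string comparison orders `"0.5.10"` before `"0.5.2"`.

```ruby
require "rubygems"

# Hypothetical guard: true when the given data_drain version already
# builds wildcard partition paths in where() (0.5.2 or newer).
def wildcard_where?(version)
  Gem::Version.new(version) >= Gem::Version.new("0.5.2")
end

p wildcard_where?("0.5.0")  # => false
p wildcard_where?("0.5.2")  # => true
p wildcard_where?("0.5.10") # => true
p "0.5.10" >= "0.5.2"       # => false (plain string comparison gets this wrong)
```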

data/skill/SKILL.md CHANGED

@@ -70,7 +70,7 @@ DataDrain handles the lifecycle of historical data in relational databases
 - Ruby `>= 3.2.0`
 - Runtime: `activemodel >= 6.0`, `duckdb ~> 1.4`, `pg >= 1.2`, `aws-sdk-s3 ~> 1.114`, `aws-sdk-glue ~> 1.0`
-- Current version: `0.5.0`
+- Current version: `0.5.1`
 ## Public API (summary)
@@ -271,7 +271,7 @@ Full catalog in [Anti-patterns](references/antipatrones.md). Summary of the
 ## References
 - [Detailed API](references/api-detallada.md): Full signatures, parameters, return values, and behavior of every public class.
-- [Glue Jobs Lifecycle](https://github.com/gedera/data_drain/blob/main/docs/glue-jobs-lifecycle.md): Complete guide to managing AWS Glue Jobs (create, update, delete, check, and run jobs idempotently).
+- [Glue Jobs Lifecycle](https://github.com/sequre/data_drain/blob/main/docs/glue-jobs-lifecycle.md): Complete guide to managing AWS Glue Jobs (create, update, delete, check, and run jobs idempotently).
 - [Events and Telemetry](references/eventos-telemetria.md): Full catalog of KV events emitted by the gem.
 - [Anti-patterns](references/antipatrones.md): What NOT to do, and the correct alternatives.
 - [Postgres Tuning](references/postgres-tuning.md): Indexes, VACUUM, partitioning, and diagnostics by table size.

data/skill/references/eventos-telemetria.md CHANGED

@@ -115,6 +115,15 @@ Full catalog of KV events emitted by DataDrain. Format: Wispro-Observab
 **Level:** INFO. Emitted before `start_job_run`.
 **Fields:** `job`.
+### `glue_runner.script_uploaded`
+**Level:** INFO. Emitted after a script is uploaded to S3 (v0.5.0+).
+**Fields:** `local_path`, `s3_path`, `bytes`.
+### `glue_runner.script_upload_error`
+**Level:** ERROR. Emitted if the upload to S3 fails (v0.5.0+).
+**Fields:** `local_path`, `bucket`, `error_class`, `error_message`.
+**Consequence:** the `Aws::S3::Errors::ServiceError` is propagated.
 ### `glue_runner.job_exists`
 **Level:** INFO. Emitted by `ensure_job` when the job already exists and is updated.
 **Fields:** `job`.
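The `glue_runner.script_uploaded` event carries `local_path`, `s3_path`, and `bytes`. A sketch of how those fields could be derived from the `script_path`, `script_bucket`, and `script_folder` arguments shown in the README. ASSUMPTION: the S3 key layout `<script_folder>/<basename>` is inferred from the single README log sample, not taken from DataDrain's source, and `script_upload_fields` is a hypothetical helper that performs no upload.

```ruby
# Hypothetical helper: builds the fields carried by the
# glue_runner.script_uploaded event for a local Glue script.
# ASSUMPTION: S3 key is "<script_folder>/<basename of script_path>",
# inferred from one README log sample.
def script_upload_fields(script_path, script_bucket:, script_folder:)
  key = [script_folder, File.basename(script_path)].reject { |part| part.to_s.empty? }.join("/")
  # A real upload would go through the AWS SDK, for example:
  #   Aws::S3::Client.new.put_object(bucket: script_bucket, key: key, body: File.binread(script_path))
  {
    local_path: script_path,
    s3_path: "s3://#{script_bucket}/#{key}",
    bytes: File.size(script_path)
  }
end
```

For an existing `scripts/glue/export.py` with `script_bucket: "my-bucket"` and `script_folder: "scripts"`, this yields `s3_path` `s3://my-bucket/scripts/export.py`, matching the logged sample, because only the file's basename is kept.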

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: data_drain
 version: !ruby/object:Gem::Version
-  version: 0.5.0
+  version: 0.5.2
 platform: ruby
 authors:
 - Gabriel
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2026-04-15 00:00:00.000000000 Z
+date: 2026-04-16 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: activemodel
@@ -104,11 +104,11 @@ files:
 - docs/execution/archive/v0.3.0.md
 - docs/execution/archive/v0.3.1-OBSERVACIONES.md
 - docs/execution/archive/v0.3.1.md
+- docs/execution/archives/v0.5.0-OBSERVACIONES.md
+- docs/execution/archives/v0.5.0.md
 - docs/execution/v0.2.2.md
 - docs/execution/v0.4.0-OBSERVACIONES.md
 - docs/execution/v0.4.0.md
-- docs/execution/v0.5.0-OBSERVACIONES.md
-- docs/execution/v0.5.0.md
 - docs/glue-jobs-lifecycle.md
 - docs/glue_pyspark_example.py
 - lib/data_drain.rb
@@ -133,7 +133,7 @@ files:
 - skill/references/api-detallada.md
 - skill/references/eventos-telemetria.md
 - skill/references/postgres-tuning.md
-homepage: https://github.com/gedera/data_drain
+homepage: https://github.com/sequre/data_drain
 licenses: []
 metadata: {}
 post_install_message:

/data/docs/execution/{v0.5.0-OBSERVACIONES.md → archives/v0.5.0-OBSERVACIONES.md} RENAMED

File without changes

/data/docs/execution/{v0.5.0.md → archives/v0.5.0.md} RENAMED

File without changes