data_drain 0.3.0 → 0.3.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.rubocop.yml +40 -1
- data/CHANGELOG.md +38 -0
- data/CLAUDE.md +14 -0
- data/README.md +2 -0
- data/data_drain.gemspec +1 -1
- data/docs/IMPROVEMENT_PLAN.md +122 -21
- data/docs/execution/archive/v0.3.0-OBSERVACIONES.md +136 -0
- data/docs/execution/archive/v0.3.0.md +1111 -0
- data/docs/execution/archive/v0.3.1-OBSERVACIONES.md +146 -0
- data/docs/execution/archive/v0.3.1.md +842 -0
- data/lib/data_drain/engine.rb +1 -0
- data/lib/data_drain/observability.rb +2 -0
- data/lib/data_drain/storage/base.rb +12 -0
- data/lib/data_drain/storage/local.rb +1 -3
- data/lib/data_drain/storage/s3.rb +5 -3
- data/lib/data_drain/types/json_type.rb +1 -0
- data/lib/data_drain/validations.rb +2 -0
- data/lib/data_drain/version.rb +2 -1
- data/lib/data_drain.rb +1 -0
- data/skill/references/antipatrones.md +10 -0
- data/skill/references/postgres-tuning.md +14 -0
- metadata +6 -2
|
@@ -0,0 +1,1111 @@
# Execution Plan: v0.3.0

**Target release:** v0.3.0 - Refactor and advanced observability
**Roadmap items:** 10, 20, 5, 11b, 6 ([see IMPROVEMENT_PLAN.md](../IMPROVEMENT_PLAN.md))
**Suggested branch:** `feature/v0.3.0`
**Base:** `main` (contains v0.2.2)
**Status:** Not started
**Last updated:** 2026-04-15

---

## Context

v0.2.x closed out P0 (security + testing) and the cheap P1 items (validations + Glue timeout + filters + PG docs). v0.3.0 focuses on **refactoring the critical path (`Engine#call`)** to unblock the features that extend it (VACUUM, slow-purge warning) without reintroducing cyclomatic complexity. It also includes the final DuckDB sandboxing.

**Items:**

| Item | Summary | Estimate | Type |
|------|---------|----------|------|
| 10 | Refactor `Engine#call` (CC=13 → ~5) | 4-6h | `refactor` |
| 20 | Clean up the `rubocop:disable` directives in `lib/` added in v0.2.0 | 3-4h | `refactor` |
| 5 | Optional `vacuum_after_purge` after the purge | 3-4h | `feat` `perf` |
| 11b | Runtime warning for a slow purge that is not making progress | 3-4h | `feat` `perf` |
| 6 | Sandboxing of `Record.connection` | 2-3h | `security` |

**Total estimate:** 15-21h, 2-3 focused days.

**Breaking:** none (internal refactor + opt-in features).

**Intentionally excluded** (moved to v0.3.1):
- Items 12-19, 22-24 (quality/DX/CI)
- Test and YARD coverage improvements

---

## Agent review (incorporated)

Reviewed by **big-pickle** on 2026-04-15 (`docs/execution/v0.3.0-OBSERVACIONES.md`). 6 observations + 4 open questions, all incorporated:

| Obs | Severity | Resolution | Location |
|-----|----------|------------|----------|
| 1: Tight Item 10 → 20 dependency | Medium | Explicit entry checkpoint validating the `@durations`/`timed`/`purge_loop` API | Phase 2.0 |
| 2: `purge_loop` does not return `total_deleted` | **High** | Explicit refactor in sub-step 1.3 before moving on to item 5 | Phase 1.3 + 1.4 |
| 3: Duplicated Timecop require/return | Low | `require "timecop"` + `Timecop.return` centralized in `spec_helper.rb` | Phase 0.3 |
| 4: Bug in `vacuum_if_needed` (rescue without `dead_before`) | Medium | `engine.vacuum_error` includes `dead_tuples_before` and the partial `duration_s` | Phase 3.2 + test 3.3 |
| 5: `lock_configuration` + S3 is ambiguous | Medium | Prescriptive investigation with 3 tests in `bin/console` before implementation | Phase 5.1 |
| 6: CC is not re-confirmed after Phase 1 | Low | Checkpoint 2.0 validates CC ≤ 5 before the FileIngestor refactor | Phase 2.0 |

**Answers to the open questions:**

1. **Is the `lock_configuration` + S3 problem real or theoretical?** TBD. Phase 5.1 rules it out or confirms it before implementation.
2. **Fallback scope if the timeline stretches:** prioritize in this order: 10 > 20 > 5 > 11b > 6. If scope has to be cut, stop at 5 (VACUUM is the most requested feature).
3. **Integration tests A (mocks) or B (`:integration` tagged)?** Option A. It stays consistent with the current suite (DuckDB-only CI).
4. **Coordination with Gemini:** it is not currently working in parallel on this gem. If that changes, review `feedback_gemini_collaboration.md` (duplicated source=, field order, include vs extend).

---

## Execution order and dependencies

```
Phase 0: branch setup + baseline
   │
   ▼
Phase 1: Item 10 (refactor Engine#call) ──► FOUNDATION
   │       extracts step_count, step_export,
   │       step_verify, step_purge, timed helper
   ▼
Phase 2: Item 20 (rubocop:disable cleanup) ──► natural follow-up to 10
   │       engine/file_ingestor/record/s3
   ▼
Phase 3: Item 5 (post-purge VACUUM) ──────────► extends step_purge
   │       new event engine.vacuum_complete
   ▼
Phase 4: Item 11b (slow-purge warning) ──────► extends the step_purge loop
   │       Timecop, per big-pickle risk 3
   ▼
Phase 5: Item 6 (Record sandboxing) ──────────► isolated
   │       SET lock_configuration=true
   ▼
Phase 6: Release (CHANGELOG, version bump, tag)
```

**Rationale:**
- Item 10 first: it establishes `step_*` + `timed`. Items 5 and 11b build on the extracted `step_purge`.
- Item 20 immediately after: it uses the fresh context from the refactor to clean up the disables and avoids accumulating visible debt.
- Items 5 and 11b in series: both touch the purge path, so running them sequentially keeps the tests simpler.
- Item 6 last: it is isolated in `Record` and blocks nothing.

---

## Prerequisites (Phase 0)

### 0.1 Verify the environment

- [ ] `git checkout main && git pull`
- [ ] Current version in `lib/data_drain/version.rb` = `"0.2.2"`
- [ ] `bundle exec rspec` passes (~113 specs, coverage ≥ 80%)
- [ ] `bundle exec rubocop lib/` reports no offenses
- [ ] CI green on main (Ruby 3.4.4)

### 0.2 Create the branch

- [ ] `git checkout -b feature/v0.3.0`

### 0.3 Add Timecop to the Gemfile + centralize it in spec_helper (item 11b)

In the v0.2.2 review (ClickUp 86b9dka0c), big-pickle pointed out that mocking `Process.clock_gettime` directly can be flaky. For item 11b, which depends heavily on timing, use Timecop as the primary tool.

**big-pickle observation 3:** centralize `require "timecop"` and `Timecop.return` in `spec_helper.rb` to avoid duplicated setup and flakes caused by state left over between tests.

- [ ] Add to the `Gemfile`:
```ruby
group :test do
  gem "simplecov", require: false
  gem "timecop", require: false
end
```
- [ ] `bundle install`
- [ ] Edit `spec/spec_helper.rb` and add:
```ruby
require "timecop"

RSpec.configure do |config|
  config.after(:each) do
    Timecop.return # guarantees a reset between tests, even if the test fails
  end
end
```
- [ ] Item 11b tests call `Timecop.travel(...)` directly, with no local `require "timecop"` and no manual `Timecop.return`; the global helper handles both.
- [ ] Commit: `chore: add timecop, centralizar en spec_helper (item 11b)`

### 0.4 Decide the integration-test policy

Items 5 and 11b would benefit from tests against a real Postgres. Decide before starting:

- [ ] **Option A (recommended):** mock everything (PG::Connection + PG::Result) with elaborate mocks. Pros: CI stays DuckDB-only. Cons: the tests are fragile relative to real PG behavior.
- [ ] **Option B:** tag tests with `:integration`, skip them by default (the v0.2.2 plan already set this convention), and run them manually or with a Postgres service container in CI.

If B is chosen:
- [ ] Verify that `spec/spec_helper.rb` has the `:integration` convention (per v0.2.2 Phase 0.4).
- [ ] Document in the README "Contribuir" section how to run the integration tests.

**Proposed decision:** A (mocks). It stays consistent with the current suite.

### Phase 0 checkpoint

- [ ] Branch created
- [ ] Timecop available
- [ ] Integration-test policy defined
- [ ] Green baseline

---

## Phase 1 - Item 10: Refactor `Engine#call` (CC=13 → ~5)

**Roadmap:** [Item 10](../IMPROVEMENT_PLAN.md#item-10--refactor-enginecall-cc13--5)

### Context

`Engine#call` has CC=13 (high). It mixes setup, counting, conditional export, verification, purge, and granular logging, which makes it hard to test in isolation. The `rubocop:disable Metrics/ClassLength, AbcSize, MethodLength` is a workaround.

**Goal:** extract `step_count`, `step_export`, `step_verify`, and `step_purge` as private methods, with timing accumulated in the `@durations` hash. `#call` becomes a linear pipeline with CC ≤ 5.

### 1.1 Structural refactor

- [ ] Edit `lib/data_drain/engine.rb`. Replace the current `#call` with:
```ruby
def call
  @durations = {}
  start_time = monotonic
  log_start

  setup_duckdb
  return skip_empty(start_time) if step_count.zero?

  step_export unless @skip_export
  return integrity_failed(start_time) unless step_verify

  step_purge
  log_complete(start_time)
  true
end
```

- [ ] Add the private methods:
```ruby
# @api private
def monotonic
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# @api private
def timed(step_name)
  t = monotonic
  result = yield
  @durations[step_name] = monotonic - t
  result
end

# @api private
def log_start
  safe_log(:info, "engine.start",
           { table: @table_name, start_date: @start_date.to_date, end_date: @end_date.to_date })
end

# @api private
def step_count
  @pg_count = timed(:db_query) { get_postgres_count }
  @pg_count
end

# @api private
def skip_empty(start_time)
  duration = monotonic - start_time
  safe_log(:info, "engine.skip_empty", {
    table: @table_name,
    duration_s: duration.round(2),
    db_query_duration_s: @durations.fetch(:db_query, 0).round(2)
  })
  true
end

# @api private
def step_export
  safe_log(:info, "engine.export_start", { table: @table_name, count: @pg_count })
  timed(:export) { export_to_parquet }
end

# @api private
def step_verify
  timed(:integrity) { verify_integrity }
end

# @api private
def step_purge
  timed(:purge) { purge_from_postgres }
end

# @api private
def log_complete(start_time)
  duration = monotonic - start_time
  safe_log(:info, "engine.complete", {
    table: @table_name,
    duration_s: duration.round(2),
    db_query_duration_s: @durations.fetch(:db_query, 0).round(2),
    export_duration_s: @durations.fetch(:export, 0).round(2),
    integrity_duration_s: @durations.fetch(:integrity, 0).round(2),
    purge_duration_s: @durations.fetch(:purge, 0).round(2),
    count: @pg_count
  })
end

# @api private
def integrity_failed(start_time)
  duration = monotonic - start_time
  safe_log(:error, "engine.integrity_error", {
    table: @table_name,
    duration_s: duration.round(2),
    count: @pg_count
  })
  false
end
```

- [ ] Remove the `# rubocop:disable Metrics/ClassLength, Metrics/AbcSize, Metrics/MethodLength, Naming/AccessorMethodName` above the class.
- [ ] Keep `export_to_parquet`, `verify_integrity`, `purge_from_postgres`, `base_where_sql`, `setup_duckdb`, and `get_postgres_count` as they are (they are already private and tested).
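
The pipeline shape above can be tried outside the gem. What follows is a minimal sketch, not the gem's code: `MiniPipeline` and its injected lambdas are hypothetical stand-ins for `Engine` and its Postgres/DuckDB steps, used only to show how `timed` and the early returns interact.

```ruby
# Hypothetical stand-in for Engine: same shape as the refactored #call,
# but the steps are injected lambdas so the sketch runs without Postgres/DuckDB.
class MiniPipeline
  attr_reader :durations

  def initialize(count:, verify:)
    @count = count    # lambda returning the row count
    @verify = verify  # lambda returning true/false
    @durations = {}
  end

  def call
    @durations = {}
    return :skipped if timed(:db_query) { @count.call }.zero?
    return :integrity_failed unless timed(:integrity) { @verify.call }

    :complete
  end

  private

  def monotonic
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end

  # Runs the block, stores its elapsed seconds under step_name,
  # and returns the block's own result.
  def timed(step_name)
    t = monotonic
    result = yield
    @durations[step_name] = monotonic - t
    result
  end
end

pipeline = MiniPipeline.new(count: -> { 3 }, verify: -> { true })
outcome = pipeline.call
```

Because each early return happens right after its `timed` call, `durations` only holds the steps that actually ran, which is why the log helpers read it with `fetch(key, 0)`.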

### 1.2 Equivalence validation

**Critical:** the emitted events must match v0.2.2 behavior exactly in fields, order, and values.

- [ ] Create an equivalence test that captures every log emitted in a happy-path flow and compares it with a known snapshot:
```ruby
# spec/data_drain/engine_refactor_spec.rb
RSpec.describe DataDrain::Engine, "refactor step extraction" do
  # ... setup mocks ...

  it "emits the same events in the same order as v0.2.2" do
    expected_events = %w[
      engine.start
      engine.export_start
      engine.integrity_check
      engine.purge_start
      engine.complete
    ]

    emitted = capture_events { engine.call }
    expect(emitted.map { |e| e[:event] }).to eq(expected_events)
  end

  it "engine.complete includes all the granular metrics" do
    event = capture_events { engine.call }.last
    expect(event[:fields]).to include(
      :duration_s, :db_query_duration_s, :export_duration_s,
      :integrity_duration_s, :purge_duration_s, :count
    )
  end
end
```

### 1.3 Explicit refactor of `purge_from_postgres` to return `total_deleted`

**big-pickle observation 2 (HIGH):** the current `purge_from_postgres` (v0.2.2) does NOT return `total_deleted`: the loop ends with `break if count.zero?`, which makes it return `nil`. This blocks Phase 3 (item 5), which needs the value.

**Action:** refactor it here as an explicit sub-step; do not defer it.

- [ ] In `lib/data_drain/engine.rb`, `purge_from_postgres`:
  - Extract the inner loop into a private method `purge_loop(conn) → Integer`:
```ruby
# @api private
# @param conn [PG::Connection]
# @return [Integer] total number of deleted rows
def purge_loop(conn)
  batches_processed = 0
  total_deleted = 0

  loop do
    sql = build_delete_sql
    result = conn.exec(sql)
    count = result.cmd_tuples
    break if count.zero?

    batches_processed += 1
    total_deleted += count

    emit_heartbeat_if_due(batches_processed, total_deleted)
    sleep(@config.throttle_delay) if @config.throttle_delay.positive?
  end

  total_deleted
end

# @api private
def emit_heartbeat_if_due(batches_processed, total_deleted)
  return unless (batches_processed % 100).zero?

  safe_log(:info, "engine.purge_heartbeat", {
    table: @table_name,
    batches_processed_count: batches_processed,
    rows_deleted_count: total_deleted
  })
end

# @api private
def build_delete_sql
  <<~SQL
    DELETE FROM #{@table_name}
    WHERE #{@primary_key} IN (
      SELECT #{@primary_key} FROM #{@table_name}
      WHERE #{base_where_sql}
      LIMIT #{@config.batch_size}
    )
  SQL
end
```
  - `purge_from_postgres` becomes the orchestrator:
```ruby
def purge_from_postgres
  safe_log(:info, "engine.purge_start", { table: @table_name, batch_size: @config.batch_size })
  conn = open_pg_connection
  apply_session_timeout(conn)

  purge_loop(conn) # returns total_deleted (used in Phase 3, item 5)
ensure
  conn&.close
end
```

### 1.4 Equivalence test for `purge_loop`

- [ ] Add to `spec/data_drain/engine_spec.rb`:
```ruby
describe "#purge_loop (refactor)" do
  it "returns total_deleted (sum of cmd_tuples per batch)" do
    allow(mock_pg_result).to receive(:cmd_tuples).and_return(100, 50, 0)
    allow(mock_pg_conn).to receive(:exec).with(/DELETE/).and_return(mock_pg_result)

    engine.send(:setup_for_purge) # helper, if needed
    total = engine.send(:purge_loop, mock_pg_conn)
    expect(total).to eq(150)
  end
end
```
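
The return-value problem from observation 2 comes from Ruby itself: `loop` exited with a bare `break` evaluates to `nil`, so the running total has to be the method's last expression. The sketch below reproduces the fixed control flow with in-memory fakes. `FakeResult`, `FakeConn`, and `purge_in_batches` are hypothetical names for illustration; they are not part of `pg` or `data_drain`.

```ruby
# Hypothetical fakes: FakeResult mimics PG::Result#cmd_tuples and
# FakeConn mimics PG::Connection#exec. Neither is part of the pg gem.
FakeResult = Struct.new(:cmd_tuples)

class FakeConn
  attr_reader :statements

  def initialize(batch_counts)
    @batch_counts = batch_counts.dup
    @statements = []
  end

  def exec(sql)
    @statements << sql
    FakeResult.new(@batch_counts.shift || 0)
  end
end

# Same control flow as purge_loop: delete in batches until a batch
# affects zero rows, then return the running total instead of nil.
def purge_in_batches(conn, delete_sql)
  total_deleted = 0
  loop do
    count = conn.exec(delete_sql).cmd_tuples
    break if count.zero?

    total_deleted += count
  end
  total_deleted
end

conn = FakeConn.new([100, 50, 0])
total = purge_in_batches(conn, "DELETE FROM versions WHERE id IN (SELECT id FROM versions LIMIT 100)")
```

The final zero-row batch still costs one `exec` call, so three statements are issued for two productive batches.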

### 1.5 Rubocop validation

- [ ] `bundle exec rubocop lib/data_drain/engine.rb`: no offenses, no disables.
- [ ] CC of `#call` ≤ 5 (measured with `rubocop --only Metrics/CyclomaticComplexity lib/data_drain/engine.rb`).
- [ ] CC of `#purge_from_postgres` also ≤ 5 after the extraction.

### 1.6 Commit

- [ ] `git add lib/data_drain/engine.rb spec/data_drain/engine_spec.rb spec/data_drain/engine_refactor_spec.rb`
- [ ] Commit: `refactor(engine): extraer step_* + purge_loop retorna total_deleted (item 10)`

### Phase 1 checkpoint

- [ ] CC of `Engine#call` ≤ 5
- [ ] `rubocop:disable` removed from `engine.rb`
- [ ] Emitted events identical (equivalence test passes)
- [ ] Existing tests still pass unchanged
- [ ] Coverage did not drop

---

## Phase 2 - Item 20: Clean up the remaining `rubocop:disable` directives in `lib/`

**Roadmap:** [Item 20](../IMPROVEMENT_PLAN.md#item-20--limpiar-rubocopdisable-en-lib-agregados-en-v020)

### 2.0 Entry checkpoint (big-pickle observations 1 and 6)

**Do not start Phase 2 without confirming these Phase 1 invariants:**

- [ ] Actual CC of `Engine#call` ≤ 5. Verify:
```bash
bundle exec rubocop --only Metrics/CyclomaticComplexity lib/data_drain/engine.rb
```
If CC turns out > 5 in practice (e.g. 7-8), stop and adjust Phase 1 before moving on.
- [ ] `@durations` hash API confirmed with keys: `:db_query`, `:export`, `:integrity`, `:purge`.
- [ ] `timed(step_name)` API confirmed:
```ruby
timed(:db_query) { yield_result }
# runs the block, measures monotonic time, stores @durations[:db_query] = duration, returns yield_result
```
- [ ] `purge_loop` returns `total_deleted` (Integer).

If any of these fails → go back to Phase 1 and adjust. The FileIngestor refactor depends on this API.

### Context

v0.2.0 added `rubocop:disable` in 4 files. Engine is already clean after Phase 1. Remaining:
- `lib/data_drain/file_ingestor.rb`: `Metrics/AbcSize, CyclomaticComplexity, PerceivedComplexity, MethodLength`
- `lib/data_drain/record.rb`: `Metrics/AbcSize, MethodLength`
- `lib/data_drain/storage/s3.rb`: `Metrics/AbcSize, CyclomaticComplexity, MethodLength`

### 2.1 FileIngestor#call refactor

As with Engine#call, extract `step_validate_file`, `step_count_source`, `step_export`, `step_cleanup` (tentative names) and a reusable `timed` helper.

- [ ] Consider extracting `timed` into a mixin or module if Engine and FileIngestor share it (avoids duplication). Options:
  - **Option A:** `include Observability::Timing`, with `timed` + `monotonic` as shared methods.
  - **Option B:** leave `timed` duplicated in each class. Simple, but a DRY violation.
  - **Option C:** extract to a pure `DataDrain::Timing` module (not a mixin).

**Proposed decision:** A. `Observability` already sets the precedent.

- [ ] Create `lib/data_drain/observability/timing.rb`:
```ruby
module DataDrain
  module Observability
    module Timing
      private

      def monotonic
        Process.clock_gettime(Process::CLOCK_MONOTONIC)
      end

      def timed(step_name)
        t = monotonic
        result = yield
        @durations ||= {}
        @durations[step_name] = monotonic - t
        result
      end
    end
  end
end
```

- [ ] In `Engine` and `FileIngestor`: `include Observability::Timing`. Remove the local definitions.

- [ ] Refactor `FileIngestor#call` the same way as Engine. Split it into step_* methods.
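
To make Option A concrete, here is a small self-contained sketch of two classes sharing the mixin. The module is copied locally under the shortened name `Timing`, and `FakeEngine` / `FakeIngestor` are hypothetical hosts standing in for `Engine` and `FileIngestor`; the point is only that each instance keeps its own `@durations` and that the helpers stay private.

```ruby
# Local copy of the Timing mixin from the plan, so the sketch is self-contained.
module Timing
  private

  def monotonic
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end

  def timed(step_name)
    t = monotonic
    result = yield
    @durations ||= {} # lazily created, one hash per including instance
    @durations[step_name] = monotonic - t
    result
  end
end

# Hypothetical host classes standing in for Engine and FileIngestor.
class FakeEngine
  include Timing
  attr_reader :durations

  def call
    timed(:purge) { 42 }
  end
end

class FakeIngestor
  include Timing
  attr_reader :durations

  def call
    timed(:export) { "ok" }
  end
end

engine = FakeEngine.new
ingestor = FakeIngestor.new
engine_result = engine.call
ingestor_result = ingestor.call
```

Declaring the methods under `private` inside the module keeps `timed` and `monotonic` out of the public API of every class that includes it.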

### 2.2 Record refactor

- [ ] `Record#execute_and_instantiate` carries `Metrics/AbcSize, MethodLength`. Check whether it can be simplified with a guard clause or a method-level `rescue`:
```ruby
def execute_and_instantiate(sql, columns)
  @logger = DataDrain.configuration.logger
  result = connection.query(sql)
  result.map { |row| new(columns.zip(row).to_h) }
rescue DuckDB::Error => e
  safe_log(:warn, "record.parquet_not_found", exception_metadata(e))
  []
end
```
(This moves the `begin/rescue` up to a method-level `rescue`, which is cleaner.)
- [ ] Remove the `rubocop:disable` from `record.rb`.
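
The method-level `rescue` can be exercised in isolation. In this sketch `ParquetMissing` is a hypothetical stand-in for `DuckDB::Error`, `Row` stands in for the record class, and the query is replaced by a block, so none of it depends on the gem or on DuckDB.

```ruby
# Hypothetical stand-ins so the sketch runs without DuckDB.
class ParquetMissing < StandardError; end

Row = Struct.new(:attributes)

# Method-level rescue: no explicit begin/end, the whole body is protected.
# The block plays the role of connection.query(sql).
def instantiate_rows(columns)
  rows = yield
  rows.map { |row| Row.new(columns.zip(row).to_h) }
rescue ParquetMissing
  []
end

found = instantiate_rows(%w[id name]) { [[1, "a"], [2, "b"]] }
missing = instantiate_rows(%w[id name]) { raise ParquetMissing, "no files found" }
```

Callers get an empty array when the data is missing, so they can iterate over the result without a nil check.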

### 2.3 Storage::S3 refactor

- [ ] `Storage::S3#destroy_partitions` carries AbcSize/CyclomaticComplexity. Extract:
  - `build_destroy_pattern(partition_keys, partitions)` → regex + prefix
  - `collect_matching_objects(client, bucket, prefix, pattern_regex)` → array
- [ ] Result: `#destroy_partitions` becomes a pipeline of 3 calls.
- [ ] Remove the `rubocop:disable` from `s3.rb`.
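
The plan only names the two helpers, so the following is one possible reading, not the gem's implementation. It assumes Hive-style `key=value/` object keys and treats a `nil` partition value as "any value"; both are assumptions. `FakeS3Client` is a hypothetical in-memory object that mimics only the `list_objects_v2(bucket:, prefix:)` call shape of the AWS SDK and ignores pagination.

```ruby
# Assumption: partitions are laid out Hive-style (key=value/) and a nil value
# in `partitions` means "any value". The real helpers may differ.
def build_destroy_pattern(partition_keys, partitions)
  segments = partition_keys.map do |key|
    value = partitions[key]
    value.nil? ? "#{key}=[^/]+" : "#{key}=#{Regexp.escape(value.to_s)}"
  end
  # The listing prefix is the longest leading run of fully specified keys.
  fixed = partition_keys.take_while { |key| !partitions[key].nil? }
  prefix = fixed.map { |key| "#{key}=#{partitions[key]}/" }.join
  [prefix, Regexp.new("\\A#{segments.join('/')}/")]
end

FakeObject = Struct.new(:key)
FakePage = Struct.new(:contents)

# Hypothetical in-memory client exposing only the call used below.
class FakeS3Client
  def initialize(keys)
    @keys = keys
  end

  def list_objects_v2(bucket:, prefix:)
    FakePage.new(@keys.select { |k| k.start_with?(prefix) }.map { |k| FakeObject.new(k) })
  end
end

# Lists under the prefix, then keeps only the keys matching the regex.
def collect_matching_objects(client, bucket, prefix, pattern_regex)
  client.list_objects_v2(bucket: bucket, prefix: prefix)
        .contents.map(&:key).grep(pattern_regex)
end

keys = [
  "year=2024/month=1/a.parquet",
  "year=2024/month=2/b.parquet",
  "year=2023/month=1/c.parquet"
]
prefix, regex = build_destroy_pattern(%i[year month], { year: 2024, month: nil })
matches = collect_matching_objects(FakeS3Client.new(keys), "my-bucket", prefix, regex)
```

Splitting the work this way keeps the string handling testable without any S3 access, which fits the mock-based test policy chosen in Phase 0.4.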

### 2.4 Global validation

- [ ] `bundle exec rubocop lib/`: 0 offenses, 0 `Metrics/*` disables anywhere in `lib/`.
- [ ] `grep -r "rubocop:disable Metrics" lib/` returns nothing.
- [ ] `bundle exec rspec` green.
- [ ] Coverage stable or higher.

### 2.5 Commits (granular)

- [ ] `refactor(observability): extraer Timing mixin (item 20)`
- [ ] `refactor(file_ingestor): extraer step_* methods (item 20)`
- [ ] `refactor(record): guard clause en execute_and_instantiate (item 20)`
- [ ] `refactor(storage/s3): extraer build_destroy_pattern + collect_matching_objects (item 20)`

### Phase 2 checkpoint

- [ ] 0 `rubocop:disable Metrics/*` in `lib/`
- [ ] Tests green
- [ ] Coverage stable
- [ ] `Observability::Timing` reusable

---

## Phase 3 - Item 5: Optional `vacuum_after_purge`

**Roadmap:** [Item 5](../IMPROVEMENT_PLAN.md#item-5--vacuum-analyze-opcional-post-purga)

### Context

Purging millions of rows leaves dead tuples behind. Without VACUUM, the space is not reclaimed and sequential scans keep reading empty pages. Add the option `config.vacuum_after_purge = false` (default).

**Important:** VACUUM cannot run inside a transaction; it must run in autocommit mode. The `PG.connect` connection used by `purge_from_postgres` is in autocommit by default, so this is fine.

### 3.1 Add the option to Configuration

- [ ] Edit `lib/data_drain/configuration.rb`:
```ruby
attr_accessor :storage_mode, :aws_region,
              :aws_access_key_id, :aws_secret_access_key,
              :db_host, :db_port, :db_user, :db_pass, :db_name,
              :batch_size, :throttle_delay, :logger, :limit_ram, :tmp_directory,
              :idle_in_transaction_session_timeout,
              :vacuum_after_purge # ← NEW

def initialize
  # ... existing defaults ...
  @vacuum_after_purge = false # ← off by default (opt-in)
end
```

### 3.2 Implement VACUUM in Engine

- [ ] In `lib/data_drain/engine.rb`, extend `purge_from_postgres` (already refactored in Phase 1). Call `vacuum_if_needed` right after `purge_loop`, so it runs before the `ensure` block closes the connection with `conn&.close`:
```ruby
def purge_from_postgres
  safe_log(:info, "engine.purge_start", { table: @table_name, batch_size: @config.batch_size })
  conn = open_pg_connection
  apply_session_timeout(conn)

  total_deleted = purge_loop(conn)

  vacuum_if_needed(conn, total_deleted)
ensure
  conn&.close
end

private

# @api private
#
# Bug fix for big-pickle observation 4: dead_before must be available in the
# rescue so it can be reported with the error. It is measured BEFORE the begin
# and is included in both the complete and the error event.
def vacuum_if_needed(conn, total_deleted)
  return unless @config.vacuum_after_purge
  return if total_deleted.zero?

  vacuum_start = monotonic
  dead_before = fetch_dead_tuple_count(conn)

  begin
    conn.exec("VACUUM ANALYZE #{@table_name};")
  rescue PG::Error => e
    safe_log(:warn, "engine.vacuum_error", {
      table: @table_name,
      dead_tuples_before: dead_before,
      rows_deleted_count: total_deleted,
      duration_s: (monotonic - vacuum_start).round(2)
    }.merge(exception_metadata(e)))
    return
  end

  dead_after = fetch_dead_tuple_count(conn)
  vacuum_duration = monotonic - vacuum_start

  safe_log(:info, "engine.vacuum_complete", {
    table: @table_name,
    duration_s: vacuum_duration.round(2),
    dead_tuples_before: dead_before,
    dead_tuples_after: dead_after,
    rows_deleted_count: total_deleted
  })
end

# @api private
def fetch_dead_tuple_count(conn)
  result = conn.exec_params(
    "SELECT n_dead_tup FROM pg_stat_user_tables WHERE relname = $1",
    [@table_name]
  )
  result.first&.dig("n_dead_tup")&.to_i || 0
rescue PG::Error
  -1 # -1 signals "could not be measured"
end
```

- [ ] Note: `purge_loop` must return `total_deleted` (see Phase 1.3). Refactor it if it does not.
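
The scoping point behind observation 4 can be checked without a database. This is a sketch under stated assumptions: `VacuumFailed` plays the role of `PG::Error`, `FlakyConn` the role of `PG::Connection`, and an `events` array replaces `safe_log`. All three names are hypothetical.

```ruby
# Hypothetical stand-ins: VacuumFailed plays the role of PG::Error and
# FlakyConn the role of PG::Connection. `events` replaces safe_log.
class VacuumFailed < StandardError; end

class FlakyConn
  def exec(sql)
    raise VacuumFailed, "lock timeout" if sql.start_with?("VACUUM")
  end
end

def vacuum_with_report(conn, table, total_deleted, dead_before, events)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  begin
    conn.exec("VACUUM ANALYZE #{table};")
  rescue VacuumFailed => e
    # dead_before was measured before the begin, so it is still in scope here.
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    events << { event: "engine.vacuum_error", dead_tuples_before: dead_before,
                rows_deleted_count: total_deleted, error: e.message,
                duration_s: elapsed.round(2) }
    return false
  end
  events << { event: "engine.vacuum_complete", dead_tuples_before: dead_before,
              rows_deleted_count: total_deleted }
  true
end

events = []
ok = vacuum_with_report(FlakyConn.new, "versions", 150, 500, events)
```

The error event carries the same `dead_tuples_before` and `rows_deleted_count` fields as the success event, so a failed VACUUM still leaves enough context in the logs to judge how much bloat remains.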
|
|
615
|
+
|
|
616
|
+
### 3.3 Tests
|
|
617
|
+
|
|
618
|
+
- [ ] Agregar a `spec/data_drain/engine_spec.rb`:
|
|
619
|
+
```ruby
|
|
620
|
+
describe "VACUUM post-purge" do
|
|
621
|
+
it "no ejecuta VACUUM cuando vacuum_after_purge es false (default)" do
|
|
622
|
+
engine = described_class.new(base_options.merge(table_name: "versions"))
|
|
623
|
+
# ... mocks ...
|
|
624
|
+
expect(mock_pg_conn).not_to receive(:exec).with(/VACUUM/)
|
|
625
|
+
engine.call
|
|
626
|
+
end
|
|
627
|
+
|
|
628
|
+
it "ejecuta VACUUM ANALYZE cuando vacuum_after_purge es true y total_deleted > 0" do
|
|
629
|
+
DataDrain.configure { |c| c.vacuum_after_purge = true }
|
|
630
|
+
engine = described_class.new(base_options.merge(table_name: "versions"))
|
|
631
|
+
# ... mocks para que purge borre 100 rows ...
|
|
632
|
+
expect(mock_pg_conn).to receive(:exec).with(/VACUUM ANALYZE versions/)
|
|
633
|
+
# ... mock pg_stat_user_tables ...
|
|
634
|
+
engine.call
|
|
635
|
+
end
|
|
636
|
+
|
|
637
|
+
it "no ejecuta VACUUM si total_deleted es 0" do
|
|
638
|
+
DataDrain.configure { |c| c.vacuum_after_purge = true }
|
|
639
|
+
# ... mocks con cmd_tuples = 0 desde el primer lote ...
|
|
640
|
+
expect(mock_pg_conn).not_to receive(:exec).with(/VACUUM/)
|
|
641
|
+
end
|
|
642
|
+
|
|
643
|
+
it "emite engine.vacuum_complete con dead_tuples_before/after" do
|
|
644
|
+
# ... verifica log con fields correctos ...
|
|
645
|
+
end
|
|
646
|
+
|
|
647
|
+
it "captura PG::Error y loguea engine.vacuum_error sin levantar" do
|
|
648
|
+
DataDrain.configure { |c| c.vacuum_after_purge = true }
|
|
649
|
+
allow(mock_pg_conn).to receive(:exec).with(/VACUUM/)
|
|
650
|
+
.and_raise(PG::Error, "lock timeout")
|
|
651
|
+
expect(engine.call).to be true
|
|
652
|
+
# ... verifica log warning ...
|
|
653
|
+
end
|
|
654
|
+
|
|
655
|
+
# Observación 4 big-pickle: dead_before ya estaba medido cuando VACUUM falla
|
|
656
|
+
it "engine.vacuum_error incluye dead_tuples_before y rows_deleted_count" do
|
|
657
|
+
DataDrain.configure { |c| c.vacuum_after_purge = true }
|
|
658
|
+
# mock fetch_dead_tuple_count → 500 (pre-VACUUM)
|
|
659
|
+
# mock VACUUM → raise PG::Error
|
|
660
|
+
logs = capture_logs { engine.call }
|
|
661
|
+
error_log = logs.find { |l| l.include?("engine.vacuum_error") }
|
|
662
|
+
expect(error_log).to include("dead_tuples_before=500")
|
|
663
|
+
expect(error_log).to include("rows_deleted_count=")
|
|
664
|
+
end
|
|
665
|
+
end
|
|
666
|
+
```
|
|
667
|
+
|
|
668
|
+
### 3.4 Docs
|
|
669
|
+
|
|
670
|
+
- [ ] `skill/references/eventos-telemetria.md` — agregar `engine.vacuum_complete` y `engine.vacuum_error`.
|
|
671
|
+
- [ ] `skill/references/api-detallada.md` — Configuration table: agregar `vacuum_after_purge` (Boolean, default `false`).
|
|
672
|
+
- [ ] `skill/references/postgres-tuning.md` — actualizar sección "VACUUM ANALYZE post-purga" explicando que ahora hay opción automática. Link al config.
|
|
673
|
+
- [ ] `README.md` — en el bloque de configuración agregar:
|
|
674
|
+
```ruby
|
|
675
|
+
config.vacuum_after_purge = false # true: VACUUM ANALYZE tras cada purge
|
|
676
|
+
```
|
|
677
|
+
|
|
678
|
+
### 3.5 Commit
|
|
679
|
+
|
|
680
|
+
- [ ] Commit: `feat(engine): vacuum_after_purge opcional post-purge (item 5)`
|
|
681
|
+
|
|
682
|
+
### Checkpoint Phase 3

- [ ] The `false` default preserves backward compatibility.
- [ ] With `true`, VACUUM ANALYZE runs only if rows were deleted.
- [ ] `PG::Error` does not break the flow (warn and continue).
- [ ] `engine.vacuum_complete` is logged with metrics.

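
The checkpoint rules above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the gem's implementation: `FakeConn`, `VacuumFailed` (standing in for `PG::Error`), and `vacuum_if_needed` are hypothetical names, and the events are collected in a plain array instead of going through `safe_log`.

```ruby
# Hypothetical sketch of the Phase 3 checkpoint rules. FakeConn, VacuumFailed,
# and vacuum_if_needed are illustrative names, not DataDrain's real API.
class VacuumFailed < StandardError; end # stands in for PG::Error

class FakeConn
  attr_reader :statements

  def initialize(fail_vacuum: false)
    @fail_vacuum = fail_vacuum
    @statements = []
  end

  # Records the statement, or raises when configured to fail on VACUUM.
  def exec(sql)
    raise VacuumFailed, "lock timeout" if @fail_vacuum && sql.start_with?("VACUUM")

    @statements << sql
  end
end

# Returns :skipped, :vacuumed, or :failed; never raises on a VACUUM error.
def vacuum_if_needed(conn, table:, enabled:, rows_deleted_count:, events:)
  return :skipped unless enabled && rows_deleted_count.positive?

  conn.exec("VACUUM ANALYZE #{table}") # real code should quote the identifier
  events << [:info, "engine.vacuum_complete"]
  :vacuumed
rescue VacuumFailed => e
  events << [:warn, "engine.vacuum_error", e.message]
  :failed
end

events = []
args = { table: "versions", events: events }
p vacuum_if_needed(FakeConn.new, enabled: false, rows_deleted_count: 10, **args) # default off
p vacuum_if_needed(FakeConn.new, enabled: true, rows_deleted_count: 0, **args)   # nothing deleted
p vacuum_if_needed(FakeConn.new, enabled: true, rows_deleted_count: 10, **args)  # runs VACUUM
p vacuum_if_needed(FakeConn.new(fail_vacuum: true), enabled: true, rows_deleted_count: 10, **args)
```
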
---

## Phase 4: Item 11b, runtime warning for slow purges

**Roadmap:** [Item 11b](../IMPROVEMENT_PLAN.md#item-11b--warning-runtime-de-purga-lenta-sin-avance)

### Context

`engine.purge_heartbeat` is emitted every 100 batches. If each batch takes 5 minutes (missing index, lock contention), no alert fires until batch 100, which is roughly 8 hours away (100 × 5 min).

**Important:** use Timecop (added in Phase 0.3) for the tests. In the v0.2.2 review, big-pickle pointed out that mocking `Process.clock_gettime` directly is flaky.

### 4.1 Add options to Configuration

- [ ] Edit `lib/data_drain/configuration.rb`:
  ```ruby
  attr_accessor ...,
                :slow_batch_threshold_s, # ← NEW
                :slow_batch_alert_after  # ← NEW

  def initialize
    ...
    @slow_batch_threshold_s = 30
    @slow_batch_alert_after = 5
  end
  ```

### 4.2 Implement the warning in purge_loop

- [ ] Refactor `Engine#purge_loop` (already extracted in Phase 1) to time each batch:
  ```ruby
  # @api private
  def purge_loop(conn)
    batches_processed = 0
    total_deleted = 0
    slow_batch_streak = 0

    loop do
      batch_start = monotonic
      result = conn.exec(build_delete_sql)
      batch_duration = monotonic - batch_start
      count = result.cmd_tuples
      break if count.zero?

      batches_processed += 1
      total_deleted += count

      slow_batch_streak = handle_batch_timing(
        batch_duration, count, slow_batch_streak
      )

      emit_heartbeat_if_due(batches_processed, total_deleted)

      sleep(@config.throttle_delay) if @config.throttle_delay.positive?
    end

    total_deleted
  end

  # @api private
  def handle_batch_timing(batch_duration, count, streak)
    if batch_duration > @config.slow_batch_threshold_s
      streak += 1
      safe_log(:warn, "engine.slow_batch", {
        table: @table_name,
        batch_duration_s: batch_duration.round(2),
        batch_size: count,
        streak: streak,
        threshold_s: @config.slow_batch_threshold_s
      })

      if streak == @config.slow_batch_alert_after
        safe_log(:warn, "engine.purge_degraded", {
          table: @table_name,
          consecutive_slow_batches: streak,
          hint: "consider a composite index or partitioning (see postgres-tuning.md)"
        })
      end
      streak
    else
      0 # reset streak
    end
  end
  ```

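
The streak rules in `handle_batch_timing` can be modeled in isolation, which makes the "once per streak" and "reset on a fast batch" behaviors easy to check without a database or a clock. The sketch below is a hypothetical stand-alone model: `SlowBatchTracker` and its event tuples are illustrative names, not part of DataDrain.

```ruby
# Hypothetical, self-contained model of the streak rules in handle_batch_timing.
# SlowBatchTracker and its event tuples are illustrative, not DataDrain's API.
class SlowBatchTracker
  attr_reader :events

  def initialize(threshold_s:, alert_after:)
    @threshold_s = threshold_s
    @alert_after = alert_after
    @streak = 0
    @events = []
  end

  # Feeds one batch duration (seconds) and returns the current streak.
  def record(batch_duration_s)
    if batch_duration_s > @threshold_s
      @streak += 1
      @events << [:slow_batch, @streak]
      # Equality (not >=) means the alert fires only once per streak.
      @events << [:purge_degraded, @streak] if @streak == @alert_after
    else
      @streak = 0 # a fast batch resets the streak
    end
    @streak
  end
end

tracker = SlowBatchTracker.new(threshold_s: 5, alert_after: 3)
# Two slow batches, one fast, two slow: the streak never reaches 3.
[10, 10, 1, 10, 10].each { |duration| tracker.record(duration) }
degraded = tracker.events.count { |name, _| name == :purge_degraded }
puts "purge_degraded events: #{degraded}" # prints "purge_degraded events: 0"
```

Because the comparison is `==` rather than `>=`, a streak that keeps growing past `alert_after` emits `slow_batch` for every batch but `purge_degraded` only once.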
### 4.3 Tests with Timecop

- [ ] Add to `spec/data_drain/engine_spec.rb` (Timecop is already in spec_helper, with `Timecop.return` in the global `after(:each)`):
  ```ruby
  describe "slow purge warning" do
    it "emits engine.slow_batch when a batch exceeds the threshold" do
      DataDrain.configure do |c|
        c.slow_batch_threshold_s = 5
        c.slow_batch_alert_after = 3
      end

      engine = described_class.new(base_options.merge(table_name: "versions"))
      # ... mocks ...

      call_count = 0
      allow(mock_pg_result).to receive(:cmd_tuples) do
        call_count += 1
        call_count <= 2 ? 100 : 0
      end

      allow(mock_pg_conn).to receive(:exec).with(/DELETE/) do
        # Simulate 10s per batch. Timecop shifts Time.now; if `monotonic` wraps
        # Process.clock_gettime, confirm Timecop.mock_process_clock is enabled.
        Timecop.travel(Time.now + 10)
        mock_pg_result
      end

      logs = capture_logs { engine.call }
      expect(logs).to include(match(/engine\.slow_batch/))
    end

    it "emits engine.purge_degraded after N consecutive slow batches" do
      DataDrain.configure do |c|
        c.slow_batch_threshold_s = 5
        c.slow_batch_alert_after = 3
      end

      # ... simulate 3 consecutive slow batches ...
      logs = capture_logs { engine.call }
      expect(logs.select { |l| l.include?("engine.purge_degraded") }.size).to eq(1)
    end

    it "resets the streak when a batch is fast" do
      # ... 2 slow batches, 1 fast, 2 slow ...
      # Should NOT emit purge_degraded because the streak was reset
    end

    it "thresholds are configurable via Configuration" do
      # ...
    end
  end
  ```

### 4.4 Docs

- [ ] `skill/references/eventos-telemetria.md`: add `engine.slow_batch` (WARN) and `engine.purge_degraded` (WARN).
- [ ] `skill/references/api-detallada.md`: in the Configuration table, add `slow_batch_threshold_s` (Integer, default `30`) and `slow_batch_alert_after` (Integer, default `5`).
- [ ] `skill/references/postgres-tuning.md`: mention that DataDrain now warns automatically when it detects a slow purge, with a link to the "Índice recomendado" section.

### 4.5 Commit

- [ ] Commit: `feat(engine): warning runtime de purga lenta sin avance (item 11b)`

### Checkpoint Phase 4

- [ ] Thresholds are configurable, with reasonable defaults.
- [ ] The streak resets on a fast batch.
- [ ] `purge_degraded` is emitted only once per streak.
- [ ] Tests use Timecop (no fragile mocks).

---

## Phase 5: Item 6, sandboxing `Record.connection`

**Roadmap:** [Item 6](../IMPROVEMENT_PLAN.md#item-6--sandboxing-de-recordconnection)

### Context

`Record.connection` is read-only by design. Applying `SET lock_configuration=true` after setup reduces the blast radius if someone tries to run malicious SQL through `where_clause` (unlikely today, but defense in depth).

**Risk:** `lock_configuration=true` freezes ALL of the connection's settings. Any later attempt to change `s3_*` or load another extension fails. The setup currently loads httpfs and creates the SECRET before the lock, so this is fine.

### 5.1 Preliminary investigation: PREREQUISITE (observation 5, big-pickle)

**Do not implement 5.2 before completing this investigation.** Plan B covers the case where `lock_configuration` breaks httpfs/S3 at runtime; that must be ruled out first.

Run the 3 tests in `bin/console`:

- [ ] **Test 1: the lock does not affect an already-loaded httpfs**
  ```ruby
  # bin/console
  db = DuckDB::Database.open(":memory:"); conn = db.connect
  conn.query("INSTALL httpfs; LOAD httpfs;")
  conn.query("SET lock_configuration=true;")

  # Should work:
  conn.query("SELECT 1;") # → [[1]]

  # Should FAIL (expected):
  begin
    conn.query("SET memory_limit='1KB';")
    puts "ERROR: lock did not work"
  rescue DuckDB::Error => e
    puts "OK: lock rejects SET (#{e.message})"
  end
  ```

- [ ] **Test 2: the lock does not affect already-created secrets**
  ```ruby
  db = DuckDB::Database.open(":memory:"); conn = db.connect
  conn.query("INSTALL httpfs; LOAD httpfs;")
  conn.query("CREATE SECRET test_secret (TYPE S3, PROVIDER credential_chain, REGION 'us-east-1');")
  conn.query("SET lock_configuration=true;")

  # Verify that the secret is still available:
  result = conn.query("FROM duckdb_secrets();").to_a
  puts "Secrets post-lock: #{result.size}" # Must be ≥ 1
  ```

- [ ] **Test 3: read_parquet(s3://...) works post-lock** (requires valid AWS credentials)
  ```ruby
  # Only if a test bucket is available
  # conn.query("FROM read_parquet('s3://your-test-bucket/path/*.parquet') LIMIT 1;")
  # If it fails with an auth error → credentials problem, not the lock
  # If it fails with a lock error → blocker, adjust the code (see the plan below)
  ```

- [ ] **Record the results:**
  > Test 1 (SET rejected post-lock): _______________
  > Test 2 (secrets preserved): _______________
  > Test 3 (read_parquet S3): _______________

**Plan depending on the results:**

| Scenario | Action |
|----------|--------|
| All tests OK | Proceed with 5.2, applying the lock unconditionally |
| Test 3 fails because of the lock | Adjust: apply the lock only if `storage_mode == :local`; document the S3 limitation in CLAUDE.md and SKILL.md |
| Test 2 fails (secrets broken) | **STOP**: design an alternative (e.g. lock via SET scope=session instead of global) |

### 5.2 Implement

- [ ] Edit `lib/data_drain/record.rb`, in `self.connection`, after `DataDrain::Storage.adapter.setup_duckdb(conn)`:
  ```ruby
  def self.connection
    Thread.current[:data_drain_duckdb] ||= begin
      db = DuckDB::Database.open(":memory:")
      conn = db.connect

      config = DataDrain.configuration
      conn.query("SET max_memory='#{config.limit_ram}';") if config.limit_ram.present?
      conn.query("SET temp_directory='#{config.tmp_directory}'") if config.tmp_directory.present?

      DataDrain::Storage.adapter.setup_duckdb(conn)

      # Sandboxing: freeze config after setup (item 6 v0.3.0)
      # Does NOT affect secrets or already-loaded extensions; only blocks future SETs.
      conn.query("SET lock_configuration=true;")

      { db: db, conn: conn }
    end
    Thread.current[:data_drain_duckdb][:conn]
  end
  ```

### 5.3 Tests

- [ ] Add to `spec/data_drain/record_spec.rb`:
  ```ruby
  describe "sandboxing (item 6)" do
    after { described_class.disconnect! }

    it "applies lock_configuration=true after setup" do
      conn = record_class.connection
      expect {
        conn.query("SET memory_limit='1KB';")
      }.to raise_error(DuckDB::Error, /lock/i)
    end

    it "allows read queries after the lock" do
      expect {
        record_class.where(year: 2026, month: 3)
      }.not_to raise_error
    end

    it "S3 (httpfs) keeps working after the lock" do
      # Skip if there is no S3 fixture, or mock it
      # Requires S3 setup before the lock; verify it does not break
      skip "requires a real S3 fixture" unless ENV["TEST_S3"]
      # ...
    end
  end
  ```

### 5.4 Edge case: reconnect after `disconnect!`

When `Record.disconnect!` is called and a query runs afterwards, the setup starts over and the lock is applied again. Test:

- [ ] Add:
  ```ruby
  it "reapplies lock_configuration after reconnect" do
    record_class.connection
    record_class.disconnect!
    conn = record_class.connection # reconnect

    expect {
      conn.query("SET memory_limit='1KB';")
    }.to raise_error(DuckDB::Error, /lock/i)
  end
  ```

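
The reconnect behavior follows from the thread-local memoization: clearing the cached entry forces the whole setup block, lock included, to run again. The sketch below illustrates only that ordering with stand-ins; `FakeConnection` and `SandboxedPool` are hypothetical names that imitate the lock with a flag, and no DuckDB call is made.

```ruby
# Hypothetical sketch: FakeConnection and SandboxedPool stand in for the DuckDB
# connection and Record. Only the ordering (setup first, lock last) is the point.
class FakeConnection
  attr_reader :queries

  def initialize
    @queries = []
    @locked = false
  end

  # Rejects any SET once the lock statement has been seen.
  def query(sql)
    raise "configuration is locked" if @locked && sql.start_with?("SET ")

    @locked = true if sql.start_with?("SET lock_configuration=true")
    @queries << sql
  end
end

module SandboxedPool
  KEY = :sketch_duckdb_conn

  # Memoizes one connection per thread; the block runs again after disconnect!.
  def self.connection
    Thread.current[KEY] ||= begin
      conn = FakeConnection.new
      conn.query("SET max_memory='1GB';")        # setup must run before the lock
      conn.query("SET lock_configuration=true;") # the lock is the last setup step
      conn
    end
  end

  def self.disconnect!
    Thread.current[KEY] = nil
  end
end

first = SandboxedPool.connection
SandboxedPool.disconnect!
second = SandboxedPool.connection # setup block runs again, lock included
puts second.queries.last
```

Keeping the lock as the final statement inside the memoized block is what guarantees that every fresh connection is sandboxed, with no separate "re-lock" step to forget.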
### 5.5 Docs

- [ ] `CLAUDE.md`: in the "Seguridad" section, add a note about `lock_configuration` in Record.
- [ ] `skill/references/api-detallada.md`: under Record.connection, mention that the connection is sandboxed.
- [ ] `skill/references/antipatrones.md`: new antipattern: "Do not try to change DuckDB config on Record's connection; it is locked. For different settings, create an ephemeral connection (Engine/FileIngestor can)."

### 5.6 Commit

- [ ] Commit: `security(record): sandboxing con lock_configuration=true (item 6)`

### Checkpoint Phase 5

- [ ] Lock applied after setup.
- [ ] Read queries keep working.
- [ ] S3 queries (httpfs) keep working.
- [ ] Reconnect after `disconnect!` reapplies the lock.

---

## Phase 6: Release

### 6.1 Global lint

- [ ] `bundle exec rubocop lib/`: 0 offenses, 0 `rubocop:disable Metrics/*`.
- [ ] `bundle exec rspec`: all green, coverage ≥ 80%.
- [ ] CI green on the branch.

### 6.2 CHANGELOG

- [ ] Edit `CHANGELOG.md`, adding at the top:
  ```markdown
  ## [0.3.0] - 2026-XX-XX

  ### Refactor
  - `Engine#call` refactored: `step_count`, `step_export`, `step_verify`, and
    `step_purge` extracted as private methods using the `timed` helper. Cyclomatic
    complexity dropped from 13 to 5. Emitted events are identical to the previous
    behavior. (item 10)
  - Extracted the `DataDrain::Observability::Timing` mixin, shared by Engine and
    FileIngestor. (item 20)
  - `FileIngestor#call` refactored along the same lines as Engine. (item 20)
  - Removed every `# rubocop:disable Metrics/*` in `lib/`. (item 20)

  ### Features
  - `config.vacuum_after_purge = false` (default). When `true`, runs `VACUUM ANALYZE`
    after the purge if rows were deleted. Emits `engine.vacuum_complete` with
    dead_tuples before/after and duration. PG errors are captured as an
    `engine.vacuum_error` WARN. (item 5)
  - `config.slow_batch_threshold_s = 30` and `config.slow_batch_alert_after = 5`.
    Detects slow purge batches. Emits an `engine.slow_batch` WARN for each slow
    batch and an `engine.purge_degraded` WARN once per streak. Includes a hint
    pointing to the tuning docs. (item 11b)

  ### Security
  - `Record.connection` applies `SET lock_configuration=true` after setup. This
    blocks any future SET on the connection (defense in depth). It does NOT affect
    secrets or already-loaded extensions. (item 6)

  ### New telemetry
  - `engine.vacuum_complete`, `engine.vacuum_error`, `engine.slow_batch`,
    `engine.purge_degraded`.

  ### Tests
  - Coverage stays ≥ 80%.
  - New equivalence test for Engine (identical events pre/post refactor).
  - Timecop added for timing tests (item 11b).
  ```

### 6.3 Version bump

- [ ] `lib/data_drain/version.rb`: `VERSION = "0.3.0"`
- [ ] `bundle install` (updates `Gemfile.lock`)

### 6.4 Update the roadmap

- [ ] `docs/IMPROVEMENT_PLAN.md`: items 5, 6, 10, 11b, 20 → `[x]`.
- [ ] Update "Última actualización".

### 6.5 Release commit

- [ ] Commit: `chore: release v0.3.0 — refactor y observabilidad avanzada`

### 6.6 PR + merge + tag

- [ ] `git push origin feature/v0.3.0`
- [ ] `gh pr create` with the title `v0.3.0: refactor Engine + VACUUM + warning purga + sandboxing`
- [ ] Wait for green CI
- [ ] Merge
- [ ] Tag: `git tag v0.3.0 && git push origin v0.3.0`

### 6.7 Post-merge

- [ ] Archive the plan: `git mv docs/execution/v0.3.0.md docs/execution/archive/v0.3.0.md`
- [ ] Commit: `chore: archive v0.3.0 plan, mark items 5/6/10/11b/20 [x]`

---

## Final validation

- [ ] CI green on main after the merge
- [ ] Coverage ≥ 80%
- [ ] 0 `rubocop:disable Metrics/*` in `lib/`
- [ ] Tag v0.3.0 created
- [ ] 5 items marked `[x]` in the roadmap
- [ ] Plan archived
- [ ] CHANGELOG updated

---

## Plan B: blocking scenarios

| If... | Then... |
|-------|---------|
| Item 10 breaks existing tests because the event order changed | Review the equivalence test; adjust the order of the `safe_log` calls |
| Item 20's refactor of FileIngestor#call is costlier than expected | Split it into a sub-release, v0.3.0.1 |
| Item 5's mocked `pg_stat_user_tables` tests are fragile | Tag the tests `:integration` and run them manually against a real Postgres |
| Item 11b's Timecop tests are still flaky in CI | Increase the tolerance in assertions; use `Timecop.travel` with large offsets |
| Item 6's `lock_configuration` breaks httpfs/S3 at runtime | Apply the lock only when `storage_mode == :local`; document the limitation |
| Coverage drops below 80% after the refactor | Add tests for the new private `step_*` methods via `send(:step_count)`, etc. |

---

## Notes for the executing agent

- **Phase 1 is the most invasive.** Equivalence tests are critical to avoid breaking consumers that depend on the log format.
- **Granular commits**: within Phase 2, each refactored file = 1 commit. This makes bisecting easier if something breaks.
- **Every phase ends green**: rspec + rubocop before the next step.
- **Observability is critical**: the new events (`vacuum_complete`, `slow_batch`, `purge_degraded`) must follow the Wispro-Observability-Spec v1 standard (snake_case keys, `_s` Float, `_count` Integer, no units in values).
- **Do not touch** `skip_export`, `verify_integrity`, `base_where_sql`, `setup_duckdb`, `get_postgres_count`, `export_to_parquet`. They are already tested and do not need refactoring.
- **Coordination with Gemini**: if it is working in parallel, verify before consolidating (`source=`, field order in logs, `include` vs `extend`, etc.; see `feedback_gemini_collaboration.md`).
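
The field conventions in the observability note lend themselves to a mechanical check. The sketch below is a hypothetical helper, with rules inferred only from that one-line summary (snake_case keys, `_s` as Float, `_count` as Integer, no units in values) and not from the Wispro-Observability-Spec itself; `field_violations` and the unit pattern are assumptions.

```ruby
# Hypothetical checker for the field conventions listed in the note above.
# The rules are inferred from that summary, not taken from the spec itself.
SNAKE_CASE = /\A[a-z][a-z0-9]*(_[a-z0-9]+)*\z/
# Assumed pattern for "value carries a unit": a number followed by a unit suffix.
UNIT_SUFFIX = /\A\d+(\.\d+)?\s*(ms|s|kb|mb|gb)\z/i

# Returns a list of human-readable violations for a hash of log fields.
def field_violations(fields)
  fields.each_with_object([]) do |(key, value), errors|
    name = key.to_s
    errors << "#{name}: key is not snake_case" unless name.match?(SNAKE_CASE)
    errors << "#{name}: *_s must be a Float" if name.end_with?("_s") && !value.is_a?(Float)
    errors << "#{name}: *_count must be an Integer" if name.end_with?("_count") && !value.is_a?(Integer)
    errors << "#{name}: value carries a unit" if value.is_a?(String) && value.match?(UNIT_SUFFIX)
  end
end

good = field_violations({ table: "versions", duration_s: 1.25, rows_deleted_count: 500 })
bad  = field_violations({ duration_s: "1.25s", rowsDeleted: 500, rows_deleted_count: "500" })

puts good.empty?
bad.each { |violation| puts violation }
```

A helper along these lines could run inside the equivalence specs so that a new event with a mistyped suffix fails in CI before it reaches log consumers.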