data_shifter 0.1.0 → 0.2.0
- checksums.yaml +4 -4
- data/.husky/pre-commit +0 -3
- data/CHANGELOG.md +25 -0
- data/README.md +114 -44
- data/lib/data_shifter/configuration.rb +42 -0
- data/lib/data_shifter/errors.rb +46 -0
- data/lib/data_shifter/internal/env.rb +8 -6
- data/lib/data_shifter/internal/log_deduplicator.rb +149 -0
- data/lib/data_shifter/internal/output.rb +17 -3
- data/lib/data_shifter/internal/side_effect_guards.rb +120 -0
- data/lib/data_shifter/shift.rb +102 -19
- data/lib/data_shifter/version.rb +1 -1
- data/lib/data_shifter.rb +21 -0
- data/lib/generators/data_shift_generator.rb +10 -7
- metadata +20 -3
- data/lib/data_shifter/rubocop.rb +0 -4
- data/lib/rubocop/cop/data_shifter/skip_transaction_guard_dry_run.rb +0 -55
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 8143cec17a5f8cb7374ad327338e694cea0a9422bf0392e005a3eaeeee9ab83d
|
|
4
|
+
data.tar.gz: b9a246478df8ef89377482951e74ad36b63655510f4bf3bbc6a1b4480edec85e
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: e8f150f146151d8a82ccc79fd2e31f2ed813ee1b631d0eb9fc5490fa201ed31890a088f24bb7bb086d4158407f5bfb3f969c639dad75d85809c72d44e609172e
|
|
7
|
+
data.tar.gz: b776d810b0819d216436e169414c9c09c0a0c1f7cc3123f58912fa638675f55c998c3c7be71bae40868c5fddbb031bd42d3b2b8a915491d9e6d792852712405f
|
data/.husky/pre-commit
CHANGED
data/CHANGELOG.md
ADDED
|
@@ -0,0 +1,25 @@
|
|
|
1
|
+
# Changelog
|
|
2
|
+
|
|
3
|
+
## [Unreleased]
|
|
4
|
+
|
|
5
|
+
* N/A
|
|
6
|
+
|
|
7
|
+
## [0.2.0]
|
|
8
|
+
|
|
9
|
+
### Added
|
|
10
|
+
|
|
11
|
+
- **Configuration object**: New `DataShifter.configure` block for global settings.
|
|
12
|
+
- **Dry-run rollback for `transaction false`**: Shifts using `transaction false` (or `:none`) now roll back DB changes in dry-run mode, matching the behavior of other transaction modes.
|
|
13
|
+
- **Automatic side-effect guards in dry run**: When a shift runs in dry run mode, HTTP (via WebMock), ActionMailer, ActiveJob, and Sidekiq (if loaded) are now automatically blocked or faked so that unguarded external calls do not run. Restore happens in an `ensure` so state is reverted after the run.
|
|
14
|
+
- **HTTP**: All outbound requests are blocked unless allowed with the per-shift `allow_external_requests [...]` DSL or global `DataShifter.config.allow_external_requests`.
|
|
15
|
+
- **ActionMailer**: `perform_deliveries = false` for the duration of the dry run.
|
|
16
|
+
- **ActiveJob**: Queue adapter set to `:test` for the duration of the dry run.
|
|
17
|
+
- **Sidekiq**: `Sidekiq::Testing.fake!` for the duration of the dry run (only if `Sidekiq::Testing` is already loaded).
|
|
18
|
+
- Dependency on `webmock` (>= 3.18) for dry-run HTTP blocking.
|
|
19
|
+
- **Log deduplication**: Repeated log messages are now suppressed during shift runs (default: on). First occurrence logs normally; subsequent occurrences are counted and a summary is printed at the end. Configure globally with `config.suppress_repeated_logs` and `config.repeated_log_cap` (default 1000). Override per-shift with `suppress_repeated_logs false`.
|
|
20
|
+
- **Global progress bar default**: `config.progress_enabled` (default `true`) sets the default for all shifts. Per-shift `progress true/false` still overrides.
|
|
21
|
+
- **Global status interval**: `config.status_interval_seconds` (default `nil`) provides a fallback when `STATUS_INTERVAL` env var is not set.
|
|
22
|
+
- **skip! abort behavior**: `skip!` now terminates the current `process_record` (no `return` needed after calling it).
|
|
23
|
+
- **Grouped skip reasons**: Skip reasons are grouped and the top 10 (by count) are shown in the summary and status output instead of logging each skip inline.
|
|
24
|
+
|
|
25
|
+
## [0.1.0] - Initial release
|
data/README.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# DataShifter
|
|
2
2
|
|
|
3
|
-
Rake-backed data migrations (
|
|
3
|
+
Rake-backed data migrations ("shifts") for Rails apps, with **dry run by default**, progress output, and a consistent summary. Define shift classes in `lib/data_shifts/*.rb`; run them as `rake data:shift:<task_name>`.
|
|
4
4
|
|
|
5
5
|
## Installation
|
|
6
6
|
|
|
@@ -21,7 +21,7 @@ Generate a shift (optionally scoped to a model):
|
|
|
21
21
|
|
|
22
22
|
```bash
|
|
23
23
|
bin/rails generate data_shift backfill_foo
|
|
24
|
-
bin/rails generate data_shift backfill_users --model
|
|
24
|
+
bin/rails generate data_shift backfill_users --model User
|
|
25
25
|
```
|
|
26
26
|
|
|
27
27
|
Add your logic to the generated file in `lib/data_shifts/`.
|
|
@@ -33,19 +33,6 @@ rake data:shift:backfill_foo
|
|
|
33
33
|
COMMIT=1 rake data:shift:backfill_foo
|
|
34
34
|
```
|
|
35
35
|
|
|
36
|
-
## How shift files map to rake tasks
|
|
37
|
-
|
|
38
|
-
DataShifter defines one rake task per file in `lib/data_shifts/*.rb`.
|
|
39
|
-
|
|
40
|
-
- **Task name**: derived from the filename with any leading digits removed.
|
|
41
|
-
- `20260201120000_backfill_foo.rb` → `data:shift:backfill_foo` (leading `<digits>_` prefix is stripped)
|
|
42
|
-
- `backfill_foo.rb` → `data:shift:backfill_foo`
|
|
43
|
-
- **Class name**: task name camelized, inside the `DataShifts` module.
|
|
44
|
-
- `backfill_foo` → `DataShifts::BackfillFoo`
|
|
45
|
-
|
|
46
|
-
Shift files are **required only when the task runs** (tasks are defined up front; classes load lazily).
|
|
47
|
-
The `description "..."` line is extracted from the file and used for `rake -T` output without loading the shift class.
|
|
48
|
-
|
|
49
36
|
## Defining a shift
|
|
50
37
|
|
|
51
38
|
Typical shifts implement:
|
|
@@ -77,7 +64,39 @@ Shifts run in **dry run** mode by default. In the automatic transaction modes (`
|
|
|
77
64
|
- **Commit**: `COMMIT=1 rake data:shift:backfill_foo`
|
|
78
65
|
- (`COMMIT=true` or `DRY_RUN=false` also commit)
|
|
79
66
|
|
|
80
|
-
|
|
67
|
+
### Automatic side-effect guards (dry run)
|
|
68
|
+
|
|
69
|
+
In **dry run** mode, DataShifter automatically blocks or fakes these side effects so unguarded code is less likely to hit the network or send mail/jobs:
|
|
70
|
+
|
|
71
|
+
| Service | Behavior in dry run |
|
|
72
|
+
|-------------|----------------------|
|
|
73
|
+
| **HTTP** | Blocked via WebMock (`disable_net_connect!`). Allow specific hosts with `allow_external_requests [...]` or `DataShifter.config.allow_external_requests`. |
|
|
74
|
+
| **ActionMailer** | `perform_deliveries = false` (restored after run). |
|
|
75
|
+
| **ActiveJob** | Queue adapter set to `:test` (restored after run). |
|
|
76
|
+
| **Sidekiq** | `Sidekiq::Testing.fake!` (restored with `disable!` after run). Only applied if `Sidekiq::Testing` is already loaded. |
|
|
77
|
+
|
|
78
|
+
**Guarding other side effects:** For anything we don't cover (e.g. another service, or allowed HTTP that mutates), use e.g. `return if dry_run?` in your shift. DB changes are always rolled back in dry run; only non-DB side effects need this.
|
|
79
|
+
|
|
80
|
+
To allow HTTP to specific hosts during dry run (e.g. a migration that must call an API to compute values), use the per-shift DSL or global config (NOTE: it is your responsibility to ensure you only make readonly requests in `dry_run?` mode):
|
|
81
|
+
|
|
82
|
+
```ruby
|
|
83
|
+
# Per shift
|
|
84
|
+
module DataShifts
|
|
85
|
+
class BackfillFromApi < DataShifter::Shift
|
|
86
|
+
allow_external_requests ["api.readonly.example.com", %r{\.internal\.company\z}]
|
|
87
|
+
# ...
|
|
88
|
+
end
|
|
89
|
+
end
|
|
90
|
+
```
|
|
91
|
+
|
|
92
|
+
```ruby
|
|
93
|
+
# Global (e.g. in config/initializers/data_shifter.rb)
|
|
94
|
+
DataShifter.configure do |config|
|
|
95
|
+
config.allow_external_requests = ["api.readonly.example.com"]
|
|
96
|
+
end
|
|
97
|
+
```
|
|
98
|
+
|
|
99
|
+
Allowed hosts are combined (per-shift + global). Restore (WebMock, mail, jobs) happens in an `ensure` so later code and other specs are unaffected.
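The restore-in-`ensure` behavior described above can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not DataShifter's own code: `FakeMailer` and `with_deliveries_disabled` are hypothetical stand-ins for a global setting such as `perform_deliveries` and for the guard that toggles it.

```ruby
# Hypothetical stand-in for a class-level setting like ActionMailer's
# perform_deliveries. Not part of DataShifter.
class FakeMailer
  class << self
    attr_accessor :perform_deliveries
  end
  self.perform_deliveries = true
end

# Hypothetical guard: save the current value, override it for the block,
# and restore it in `ensure` so the override cannot leak past the block.
def with_deliveries_disabled
  saved = FakeMailer.perform_deliveries
  FakeMailer.perform_deliveries = false
  yield
ensure
  FakeMailer.perform_deliveries = saved
end

begin
  with_deliveries_disabled { raise "simulated failure inside the dry run" }
rescue RuntimeError
  # The error still propagates, but the setting has already been restored.
end

puts FakeMailer.perform_deliveries # prints "true"
```

Because the restore lives in `ensure`, it runs whether the block returns normally or raises, which is the property the README relies on.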
|
|
81
100
|
|
|
82
101
|
## Transaction modes
|
|
83
102
|
|
|
@@ -85,7 +104,7 @@ Set the transaction mode at the class level:
|
|
|
85
104
|
|
|
86
105
|
- **`transaction :single` / `transaction true` (default)**: one DB transaction for the entire run; dry run rolls back at the end; a record error aborts the run.
|
|
87
106
|
- **`transaction :per_record`**: in commit mode, each record runs in its own transaction (errors are collected and the run continues); in dry run, the run is wrapped in a single rollback transaction.
|
|
88
|
-
- **`transaction false` / `transaction :none`**:
|
|
107
|
+
- **`transaction false` / `transaction :none`**: No automatic transaction in **commit** mode only. In dry run, the run is still wrapped in a single rollback transaction so DB changes are never committed. Use when you have external side effects or your own transaction strategy in commit mode.
|
|
89
108
|
|
|
90
109
|
```ruby
|
|
91
110
|
module DataShifts
|
|
@@ -137,7 +156,53 @@ CONTINUE_FROM=123 COMMIT=1 rake data:shift:backfill_foo
|
|
|
137
156
|
Notes:
|
|
138
157
|
|
|
139
158
|
- Only supported for `ActiveRecord::Relation` collections (Array-based collections—like those from `find_exactly!`—cannot be resumed).
|
|
140
|
-
- The filter is `primary_key > CONTINUE_FROM`, so it
|
|
159
|
+
- The filter is `primary_key > CONTINUE_FROM`, so it's only useful with monotonically increasing primary keys (e.g. `find_each`'s default behavior).
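As an illustration of the `primary_key > CONTINUE_FROM` filter, here is a small sketch over in-memory records. `Row` and `resume_after` are hypothetical names; the gem applies the filter to an `ActiveRecord::Relation`, which this sketch does not try to reproduce.

```ruby
# Hypothetical record type with only a primary key.
Row = Struct.new(:id)

# Keep only rows whose id is strictly greater than the resume point.
# A nil resume point means "start from the beginning".
def resume_after(rows, continue_from)
  return rows if continue_from.nil?

  rows.select { |row| row.id > continue_from }
end

rows = [Row.new(121), Row.new(123), Row.new(124), Row.new(130)]
remaining = resume_after(rows, 123)
puts remaining.map(&:id).inspect # prints "[124, 130]"
```

The comparison only resumes correctly when later-processed records always have larger keys, which is why the note restricts it to monotonically increasing primary keys.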
|
|
160
|
+
|
|
161
|
+
## How shift files map to rake tasks
|
|
162
|
+
|
|
163
|
+
DataShifter defines one rake task per file in `lib/data_shifts/*.rb`.
|
|
164
|
+
|
|
165
|
+
- **Task name**: derived from the filename with any leading digits removed.
|
|
166
|
+
- `20260201120000_backfill_foo.rb` → `data:shift:backfill_foo` (leading `<digits>_` prefix is stripped)
|
|
167
|
+
- `backfill_foo.rb` → `data:shift:backfill_foo`
|
|
168
|
+
- **Class name**: task name camelized, inside the `DataShifts` module.
|
|
169
|
+
- `backfill_foo` → `DataShifts::BackfillFoo`
|
|
170
|
+
|
|
171
|
+
Shift files are **required only when the task runs** (tasks are defined up front; classes load lazily).
|
|
172
|
+
The `description "..."` line is extracted from the file and used for `rake -T` output without loading the shift class.
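The naming rules above can be sketched in a few lines of plain Ruby. The helper names `shift_task_name` and `shift_class_name` are hypothetical, and the camelization here is a simplified stand-in (the gem may use ActiveSupport's `camelize`, which treats acronyms and inflections differently).

```ruby
# Hypothetical helper: derive the rake task name from a shift filename by
# dropping the ".rb" extension and any leading "<digits>_" prefix.
def shift_task_name(filename)
  File.basename(filename, ".rb").sub(/\A\d+_/, "")
end

# Hypothetical helper: camelize the task name inside the DataShifts module.
# Simplified: capitalizes each underscore-separated part.
def shift_class_name(task_name)
  camelized = task_name.split("_").map(&:capitalize).join
  "DataShifts::#{camelized}"
end

task = shift_task_name("lib/data_shifts/20260201120000_backfill_foo.rb")
puts "data:shift:#{task}"   # prints "data:shift:backfill_foo"
puts shift_class_name(task) # prints "DataShifts::BackfillFoo"
```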
|
|
173
|
+
|
|
174
|
+
## Configuration
|
|
175
|
+
|
|
176
|
+
Configure DataShifter globally in an initializer:
|
|
177
|
+
|
|
178
|
+
```ruby
|
|
179
|
+
# config/initializers/data_shifter.rb
|
|
180
|
+
DataShifter.configure do |config|
|
|
181
|
+
# Hosts allowed for HTTP during dry run only (no effect in commit mode)
|
|
182
|
+
config.allow_external_requests = ["api.readonly.example.com"]
|
|
183
|
+
|
|
184
|
+
# Suppress repeated log messages during a shift run (default: true)
|
|
185
|
+
config.suppress_repeated_logs = true
|
|
186
|
+
|
|
187
|
+
# Max unique messages to track for deduplication (default: 1000)
|
|
188
|
+
config.repeated_log_cap = 1000
|
|
189
|
+
|
|
190
|
+
# Global default for progress bar visibility (default: true)
|
|
191
|
+
config.progress_enabled = true
|
|
192
|
+
|
|
193
|
+
# Default status print interval in seconds when ENV STATUS_INTERVAL is not set (default: nil)
|
|
194
|
+
config.status_interval_seconds = nil
|
|
195
|
+
end
|
|
196
|
+
```
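The precedence between the `STATUS_INTERVAL` environment variable and `config.status_interval_seconds` can be sketched as follows. This is a plain-Ruby approximation of the behavior described in this release, not the gem's code: `resolve_status_interval` is a hypothetical name, and the environment is passed in as a Hash so the example stays self-contained.

```ruby
# Hypothetical resolver: prefer a valid integer from the environment,
# otherwise fall back to the configured default (which may be nil).
def resolve_status_interval(env, config_default)
  raw = env["STATUS_INTERVAL"]
  return config_default if raw.nil? || raw.strip.empty?

  Integer(raw, 10) # raises ArgumentError for non-numeric input
rescue ArgumentError
  config_default
end

puts resolve_status_interval({ "STATUS_INTERVAL" => "30" }, nil).inspect  # prints "30"
puts resolve_status_interval({ "STATUS_INTERVAL" => "abc" }, 60).inspect  # prints "60"
puts resolve_status_interval({}, nil).inspect                             # prints "nil"
```

An unset, blank, or invalid environment value falls through to the configured default, so status output stays disabled only when both sources are empty.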
|
|
197
|
+
|
|
198
|
+
Per-shift overrides:
|
|
199
|
+
|
|
200
|
+
```ruby
|
|
201
|
+
class MyShift < DataShifter::Shift
|
|
202
|
+
progress false # Disable progress bar for this shift
|
|
203
|
+
suppress_repeated_logs false # Disable log deduplication for this shift
|
|
204
|
+
end
|
|
205
|
+
```
|
|
141
206
|
|
|
142
207
|
## Operational tips
|
|
143
208
|
|
|
@@ -145,7 +210,7 @@ Notes:
|
|
|
145
210
|
|
|
146
211
|
- **Start with a dry run**: run the task once with no environment variables set, confirm logs and summary look right, then re-run with `COMMIT=1`.
|
|
147
212
|
- **Make shifts idempotent**: structure `process_record` so re-running is safe (for example, update only when the target column is `NULL`, or compute the same derived value deterministically).
|
|
148
|
-
- **Guard side effects
|
|
213
|
+
- **Guard side effects we don't auto-block**: use `return if dry_run?` for any side effect not covered by Automatic side-effect guards (see above).
|
|
149
214
|
|
|
150
215
|
### Choosing a transaction mode (behavior + guidance)
|
|
151
216
|
|
|
@@ -156,8 +221,8 @@ Notes:
|
|
|
156
221
|
- **Behavior**: in commit mode, records are committed one-by-one; errors are collected and the run continues; the overall run fails at the end if any record failed.
|
|
157
222
|
- **Use when**: you want maximum progress and are OK investigating/fixing a subset of failures.
|
|
158
223
|
- **`transaction false` / `:none`**:
|
|
159
|
-
- **Behavior**: no automatic transaction
|
|
160
|
-
- **Use when**: you have intentional external side effects
|
|
224
|
+
- **Behavior**: in commit mode, no automatic transaction; in dry run, the run is still wrapped in a rollback transaction so DB changes are not committed.
|
|
225
|
+
- **Use when**: you have intentional external side effects or your own transaction/locking strategy in commit mode.
|
|
161
226
|
|
|
162
227
|
### Performance and operability (recommended)
|
|
163
228
|
|
|
@@ -182,17 +247,19 @@ def process_record(buyback)
|
|
|
182
247
|
end
|
|
183
248
|
```
|
|
184
249
|
|
|
185
|
-
### `skip!` (count but don
|
|
250
|
+
### `skip!` (count but don't update)
|
|
186
251
|
|
|
187
|
-
Mark a record as skipped (
|
|
252
|
+
Mark a record as skipped. Calling `skip!` terminates the current `process_record` immediately (no `return` needed). The record is counted as "Skipped" in the summary.
|
|
188
253
|
|
|
189
254
|
```ruby
|
|
190
255
|
def process_record(record)
|
|
191
256
|
skip!("already done") if record.foo.present?
|
|
192
|
-
record.update!(foo: value)
|
|
257
|
+
record.update!(foo: value) # not executed if skipped
|
|
193
258
|
end
|
|
194
259
|
```
|
|
195
260
|
|
|
261
|
+
Skip reasons are grouped: the summary shows the top 10 reasons by count (e.g. `"already done" (42), "not eligible" (3)`) instead of logging each skip inline. This keeps the progress bar clean.
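One way such a `skip!` could abort `process_record` without an explicit `return` is Ruby's `throw`/`catch`. The sketch below is an assumption for illustration only: `SketchRunner`, the `:skip_record` tag, and `SKIP_LIMIT` are hypothetical, and the gem's actual mechanism is not shown in this section. The summary formatting mirrors the top-10-by-count grouping described above.

```ruby
SKIP_LIMIT = 10 # assumed to mirror the top-10 display limit

# Hypothetical runner, not DataShifter::Shift.
class SketchRunner
  attr_reader :skip_reasons, :updated

  def initialize
    @skip_reasons = Hash.new(0)
    @updated = []
  end

  # Count the reason, then unwind to the enclosing catch so the rest of
  # process_record does not run.
  def skip!(reason)
    @skip_reasons[reason] += 1
    throw :skip_record
  end

  def process_record(record)
    skip!("already done") if record[:foo]
    @updated << record[:id] # not reached when the record was skipped
  end

  def run(records)
    records.each { |record| catch(:skip_record) { process_record(record) } }
  end

  # Group reasons by count, highest first, limited to SKIP_LIMIT entries.
  def skip_summary
    @skip_reasons.sort_by { |_reason, count| -count }
                 .first(SKIP_LIMIT)
                 .map { |reason, count| "\"#{reason}\" (#{count})" }
                 .join(", ")
  end
end

runner = SketchRunner.new
runner.run([{ id: 1, foo: "x" }, { id: 2, foo: nil }, { id: 3, foo: "y" }])
puts runner.updated.inspect # prints "[2]"
puts runner.skip_summary    # prints "already done" (2), with the quotes
```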
|
|
262
|
+
|
|
196
263
|
### Throttling and disabling the progress bar
|
|
197
264
|
|
|
198
265
|
```ruby
|
|
@@ -202,19 +269,28 @@ class SomeShift < DataShifter::Shift
|
|
|
202
269
|
end
|
|
203
270
|
```
|
|
204
271
|
|
|
272
|
+
|
|
205
273
|
## Generator
|
|
206
274
|
|
|
207
275
|
| Command | Generates |
|
|
208
276
|
|--------|----------|
|
|
209
277
|
| `bin/rails generate data_shift backfill_foo` | `lib/data_shifts/<timestamp>_backfill_foo.rb` with a `DataShifts::BackfillFoo` class |
|
|
210
|
-
| `bin/rails generate data_shift backfill_users --model
|
|
278
|
+
| `bin/rails generate data_shift backfill_users --model User` | Same, with `User.all` in `collection` and `process_record(user)` |
|
|
211
279
|
| `bin/rails generate data_shift backfill_users --spec` | Also generates `spec/lib/data_shifts/backfill_users_spec.rb` when RSpec is enabled |
|
|
212
280
|
|
|
213
281
|
The generator refuses to create a second shift if it would produce a duplicate rake task name.
|
|
214
282
|
|
|
215
283
|
## Testing shifts (RSpec)
|
|
216
284
|
|
|
217
|
-
This gem ships a small helper module for running shifts in tests:
|
|
285
|
+
This gem ships a small helper module for running shifts in tests. Require it and include `DataShifter::SpecHelper` in specs or in `RSpec.configure` for `type: :data_shift`.
|
|
286
|
+
|
|
287
|
+
**Helpers:**
|
|
288
|
+
|
|
289
|
+
- **`run_data_shift(shift_class, dry_run: true, commit: false)`** — Runs the shift; returns an `Axn::Result`. Use `commit: true` to run in commit mode.
|
|
290
|
+
- **`silence_data_shift_output`** — Suppresses STDOUT for the block (e.g. progress bar).
|
|
291
|
+
- **`capture_data_shift_output`** — Runs the block and returns `[result, output_string]` for asserting on printed output.
|
|
292
|
+
|
|
293
|
+
Use `expect { ... }.not_to change(...)` and `expect { ... }.to change(...)` to assert that data stays unchanged in dry run and changes when committed:
|
|
218
294
|
|
|
219
295
|
```ruby
|
|
220
296
|
require "data_shifter/spec_helper"
|
|
@@ -222,35 +298,29 @@ require "data_shifter/spec_helper"
|
|
|
222
298
|
RSpec.describe DataShifts::BackfillFoo do
|
|
223
299
|
include DataShifter::SpecHelper
|
|
224
300
|
|
|
225
|
-
before { allow($stdout).to receive(:puts) }
|
|
301
|
+
before { allow($stdout).to receive(:puts) }
|
|
226
302
|
|
|
227
303
|
it "does not persist changes in dry run" do
|
|
228
|
-
|
|
229
|
-
|
|
230
|
-
|
|
304
|
+
expect do
|
|
305
|
+
result = run_data_shift(described_class, dry_run: true)
|
|
306
|
+
expect(result).to be_ok
|
|
307
|
+
end.not_to change(Foo, :count)
|
|
231
308
|
end
|
|
232
309
|
|
|
233
310
|
it "persists changes when committed" do
|
|
234
|
-
|
|
235
|
-
|
|
236
|
-
|
|
311
|
+
expect do
|
|
312
|
+
result = run_data_shift(described_class, commit: true)
|
|
313
|
+
expect(result).to be_ok
|
|
314
|
+
end.to change(Foo, :count).by(1)
|
|
315
|
+
# Or for in-place updates: .to change { record.reload.bar }.from(nil).to("baz")
|
|
237
316
|
end
|
|
238
317
|
end
|
|
239
318
|
```
|
|
240
319
|
|
|
241
|
-
## Optional RuboCop cop
|
|
242
|
-
|
|
243
|
-
If you use `transaction false` / `transaction :none`, you should guard writes and side effects with `dry_run?`. You can help avoid mistakes by linting that the helper is at least called once via the bundled cop:
|
|
244
|
-
|
|
245
|
-
```yaml
|
|
246
|
-
# .rubocop.yml
|
|
247
|
-
require:
|
|
248
|
-
- data_shifter/rubocop
|
|
249
|
-
```
|
|
250
|
-
|
|
251
320
|
## Requirements
|
|
252
321
|
|
|
253
322
|
- Ruby ≥ 3.2.1
|
|
254
|
-
- Rails (ActiveRecord, ActiveSupport, Railties) ≥
|
|
323
|
+
- Rails (ActiveRecord, ActiveSupport, Railties) ≥ 7.0
|
|
255
324
|
- `axn` (Shift classes include `Axn`)
|
|
256
325
|
- `ruby-progressbar` (for progress bars)
|
|
326
|
+
- `webmock` (for dry-run HTTP blocking; optional allowlist via `allow_external_requests [...]` / `DataShifter.config.allow_external_requests`)
|
data/lib/data_shifter/configuration.rb
ADDED
|
@@ -0,0 +1,42 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
module DataShifter
|
|
4
|
+
# Global configuration for DataShifter.
|
|
5
|
+
#
|
|
6
|
+
# Configure via:
|
|
7
|
+
# DataShifter.configure do |config|
|
|
8
|
+
# config.allow_external_requests = ["api.readonly.example.com"]
|
|
9
|
+
# config.suppress_repeated_logs = true
|
|
10
|
+
# end
|
|
11
|
+
#
|
|
12
|
+
# Or access directly:
|
|
13
|
+
# DataShifter.config.progress_enabled = false
|
|
14
|
+
class Configuration
|
|
15
|
+
# Hosts or regexes allowed for HTTP during dry run only (combined with per-shift allow_external_requests).
|
|
16
|
+
# Has no effect in commit mode — HTTP is unrestricted when dry_run is false.
|
|
17
|
+
attr_accessor :allow_external_requests
|
|
18
|
+
|
|
19
|
+
# Whether to suppress repeated log messages during a shift run. Default: true.
|
|
20
|
+
# Can be overridden per shift with `suppress_repeated_logs true/false`.
|
|
21
|
+
attr_accessor :suppress_repeated_logs
|
|
22
|
+
|
|
23
|
+
# Maximum unique log messages to track for deduplication. Default: 1000.
|
|
24
|
+
# When exceeded, entries with count == 1 are cleared first; repeated entries are kept.
|
|
25
|
+
attr_accessor :repeated_log_cap
|
|
26
|
+
|
|
27
|
+
# Global default for progress bar visibility. Default: true.
|
|
28
|
+
# Per-shift `progress true/false` overrides this.
|
|
29
|
+
attr_accessor :progress_enabled
|
|
30
|
+
|
|
31
|
+
# Default status print interval in seconds when ENV STATUS_INTERVAL is not set. Default: nil.
|
|
32
|
+
attr_accessor :status_interval_seconds
|
|
33
|
+
|
|
34
|
+
def initialize
|
|
35
|
+
@allow_external_requests = []
|
|
36
|
+
@suppress_repeated_logs = true
|
|
37
|
+
@repeated_log_cap = 1000
|
|
38
|
+
@progress_enabled = true
|
|
39
|
+
@status_interval_seconds = nil
|
|
40
|
+
end
|
|
41
|
+
end
|
|
42
|
+
end
|
data/lib/data_shifter/errors.rb
ADDED
|
@@ -0,0 +1,46 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
module DataShifter
|
|
4
|
+
# Raised when a dry run attempts an outbound HTTP request to a host that is
|
|
5
|
+
# not allowed via allow_external_requests (per-shift or global config).
|
|
6
|
+
class ExternalRequestNotAllowedError < StandardError
|
|
7
|
+
def initialize(attempted_host: nil)
|
|
8
|
+
@attempted_host = attempted_host
|
|
9
|
+
super(build_message)
|
|
10
|
+
end
|
|
11
|
+
|
|
12
|
+
attr_reader :attempted_host
|
|
13
|
+
|
|
14
|
+
private
|
|
15
|
+
|
|
16
|
+
def build_message
|
|
17
|
+
intro = if @attempted_host && !@attempted_host.to_s.strip.empty?
|
|
18
|
+
"Dry run blocked an outbound HTTP request to #{@attempted_host}."
|
|
19
|
+
else
|
|
20
|
+
"Dry run blocked an outbound HTTP request."
|
|
21
|
+
end
|
|
22
|
+
|
|
23
|
+
if @attempted_host && !@attempted_host.to_s.strip.empty?
|
|
24
|
+
<<~MSG.strip
|
|
25
|
+
#{intro}
|
|
26
|
+
|
|
27
|
+
To allow this host during dry run, add to your shift class:
|
|
28
|
+
|
|
29
|
+
allow_external_requests ["#{@attempted_host}"]
|
|
30
|
+
|
|
31
|
+
Or set DataShifter.config.allow_external_requests in an initializer.
|
|
32
|
+
MSG
|
|
33
|
+
else
|
|
34
|
+
<<~MSG.strip
|
|
35
|
+
#{intro}
|
|
36
|
+
|
|
37
|
+
To allow specific hosts during dry run, add to your shift class:
|
|
38
|
+
|
|
39
|
+
allow_external_requests ["host.example.com"] # or use a regex
|
|
40
|
+
|
|
41
|
+
Or set DataShifter.config.allow_external_requests in an initializer.
|
|
42
|
+
MSG
|
|
43
|
+
end
|
|
44
|
+
end
|
|
45
|
+
end
|
|
46
|
+
end
|
data/lib/data_shifter/internal/env.rb
CHANGED
|
@@ -18,14 +18,16 @@ module DataShifter
|
|
|
18
18
|
end
|
|
19
19
|
end
|
|
20
20
|
|
|
21
|
-
# Parse STATUS_INTERVAL environment variable.
|
|
22
|
-
# Returns nil if not set
|
|
21
|
+
# Parse STATUS_INTERVAL environment variable, falling back to config.
|
|
22
|
+
# Returns nil if not set/invalid and config is nil.
|
|
23
23
|
def status_interval_seconds
|
|
24
|
-
|
|
25
|
-
|
|
26
|
-
|
|
24
|
+
if ENV["STATUS_INTERVAL"].present?
|
|
25
|
+
Integer(ENV.fetch("STATUS_INTERVAL", nil), 10)
|
|
26
|
+
else
|
|
27
|
+
DataShifter.config.status_interval_seconds
|
|
28
|
+
end
|
|
27
29
|
rescue ArgumentError
|
|
28
|
-
|
|
30
|
+
DataShifter.config.status_interval_seconds
|
|
29
31
|
end
|
|
30
32
|
|
|
31
33
|
# Get CONTINUE_FROM environment variable value.
|
data/lib/data_shifter/internal/log_deduplicator.rb
ADDED
|
@@ -0,0 +1,149 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
require "digest"
|
|
4
|
+
require "logger"
|
|
5
|
+
|
|
6
|
+
module DataShifter
|
|
7
|
+
module Internal
|
|
8
|
+
# A proxy logger that suppresses repeated log messages during a shift run.
|
|
9
|
+
# Uses a hash of the message as the key for memory efficiency.
|
|
10
|
+
# First occurrence is forwarded; subsequent occurrences are counted but not forwarded.
|
|
11
|
+
# At the end, prints a summary of suppressed messages via puts.
|
|
12
|
+
class LogDeduplicator
|
|
13
|
+
attr_reader :real_logger, :cap, :seen
|
|
14
|
+
|
|
15
|
+
def initialize(real_logger, cap:)
|
|
16
|
+
@real_logger = real_logger
|
|
17
|
+
@cap = cap
|
|
18
|
+
@seen = {}
|
|
19
|
+
end
|
|
20
|
+
|
|
21
|
+
def add(severity, message = nil, progname = nil, &block)
|
|
22
|
+
msg = block ? block.call : message
|
|
23
|
+
key = message_key(severity, progname, msg)
|
|
24
|
+
|
|
25
|
+
if @seen.key?(key)
|
|
26
|
+
@seen[key][:count] += 1
|
|
27
|
+
nil
|
|
28
|
+
else
|
|
29
|
+
enforce_cap
|
|
30
|
+
@seen[key] = { count: 1, message: truncate_message(msg || progname), severity: }
|
|
31
|
+
@real_logger.add(severity, message, progname, &block)
|
|
32
|
+
end
|
|
33
|
+
end
|
|
34
|
+
|
|
35
|
+
def debug(message = nil, progname = nil, &)
|
|
36
|
+
add(Logger::DEBUG, message, progname, &)
|
|
37
|
+
end
|
|
38
|
+
|
|
39
|
+
def info(message = nil, progname = nil, &)
|
|
40
|
+
add(Logger::INFO, message, progname, &)
|
|
41
|
+
end
|
|
42
|
+
|
|
43
|
+
def warn(message = nil, progname = nil, &)
|
|
44
|
+
add(Logger::WARN, message, progname, &)
|
|
45
|
+
end
|
|
46
|
+
|
|
47
|
+
def error(message = nil, progname = nil, &)
|
|
48
|
+
add(Logger::ERROR, message, progname, &)
|
|
49
|
+
end
|
|
50
|
+
|
|
51
|
+
def fatal(message = nil, progname = nil, &)
|
|
52
|
+
add(Logger::FATAL, message, progname, &)
|
|
53
|
+
end
|
|
54
|
+
|
|
55
|
+
def unknown(message = nil, progname = nil, &)
|
|
56
|
+
add(Logger::UNKNOWN, message, progname, &)
|
|
57
|
+
end
|
|
58
|
+
|
|
59
|
+
def <<(msg)
|
|
60
|
+
key = message_key(Logger::INFO, nil, msg)
|
|
61
|
+
if @seen.key?(key)
|
|
62
|
+
@seen[key][:count] += 1
|
|
63
|
+
else
|
|
64
|
+
enforce_cap
|
|
65
|
+
@seen[key] = { count: 1, message: truncate_message(msg), severity: Logger::INFO }
|
|
66
|
+
@real_logger << msg
|
|
67
|
+
end
|
|
68
|
+
end
|
|
69
|
+
|
|
70
|
+
def level
|
|
71
|
+
@real_logger.level
|
|
72
|
+
end
|
|
73
|
+
|
|
74
|
+
def level=(val)
|
|
75
|
+
@real_logger.level = val
|
|
76
|
+
end
|
|
77
|
+
|
|
78
|
+
def formatter
|
|
79
|
+
@real_logger.formatter
|
|
80
|
+
end
|
|
81
|
+
|
|
82
|
+
def formatter=(val)
|
|
83
|
+
@real_logger.formatter = val
|
|
84
|
+
end
|
|
85
|
+
|
|
86
|
+
def close
|
|
87
|
+
@real_logger.close
|
|
88
|
+
end
|
|
89
|
+
|
|
90
|
+
def suppressed_messages
|
|
91
|
+
@seen.select { |_k, v| v[:count] > 1 }
|
|
92
|
+
end
|
|
93
|
+
|
|
94
|
+
def print_summary
|
|
95
|
+
suppressed = suppressed_messages
|
|
96
|
+
return if suppressed.empty?
|
|
97
|
+
|
|
98
|
+
puts "\n[DataShifter] Suppressed repeated log messages:"
|
|
99
|
+
suppressed.each_value do |entry|
|
|
100
|
+
count = entry[:count] - 1
|
|
101
|
+
snippet = entry[:message].to_s[0, 100]
|
|
102
|
+
snippet = "#{snippet}..." if entry[:message].to_s.length > 100
|
|
103
|
+
puts " #{count}x suppressed: #{snippet.inspect}"
|
|
104
|
+
end
|
|
105
|
+
end
|
|
106
|
+
|
|
107
|
+
def method_missing(method, ...)
|
|
108
|
+
@real_logger.send(method, ...)
|
|
109
|
+
end
|
|
110
|
+
|
|
111
|
+
def respond_to_missing?(method, include_private = false)
|
|
112
|
+
@real_logger.respond_to?(method, include_private) || super
|
|
113
|
+
end
|
|
114
|
+
|
|
115
|
+
class << self
|
|
116
|
+
def with_deduplicating_logger(real_logger, cap:)
|
|
117
|
+
proxy = new(real_logger, cap:)
|
|
118
|
+
yield proxy
|
|
119
|
+
ensure
|
|
120
|
+
proxy&.print_summary
|
|
121
|
+
end
|
|
122
|
+
end
|
|
123
|
+
|
|
124
|
+
private
|
|
125
|
+
|
|
126
|
+
def message_key(severity, progname, message)
|
|
127
|
+
normalized = "#{severity}:#{progname}:#{message}"
|
|
128
|
+
Digest::SHA256.hexdigest(normalized)
|
|
129
|
+
end
|
|
130
|
+
|
|
131
|
+
def truncate_message(msg)
|
|
132
|
+
str = msg.to_s
|
|
133
|
+
str.length > 200 ? "#{str[0, 200]}..." : str
|
|
134
|
+
end
|
|
135
|
+
|
|
136
|
+
def enforce_cap
|
|
137
|
+
return if @seen.size < @cap
|
|
138
|
+
|
|
139
|
+
singles = @seen.select { |_k, v| v[:count] == 1 }
|
|
140
|
+
singles.each_key { |k| @seen.delete(k) } if singles.any?
|
|
141
|
+
|
|
142
|
+
return unless @seen.size >= @cap
|
|
143
|
+
|
|
144
|
+
oldest_key = @seen.keys.first
|
|
145
|
+
@seen.delete(oldest_key)
|
|
146
|
+
end
|
|
147
|
+
end
|
|
148
|
+
end
|
|
149
|
+
end
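The eviction policy in `enforce_cap` above can be exercised in isolation. The sketch below restates that logic as a standalone function over a plain Hash; the name `evict_for_cap` is hypothetical, and the example relies on Ruby Hashes preserving insertion order, as the original's `@seen.keys.first` does.

```ruby
# Standalone restatement of the cap policy: when the table is full, drop
# every entry seen exactly once; if it is still full, drop the oldest entry.
def evict_for_cap(seen, cap)
  return seen if seen.size < cap

  seen.delete_if { |_key, entry| entry[:count] == 1 }
  seen.delete(seen.keys.first) if seen.size >= cap
  seen
end

# Full table with single-occurrence entries: only the repeated one survives.
table = { "a" => { count: 3 }, "b" => { count: 1 }, "c" => { count: 1 } }
puts evict_for_cap(table, 3).keys.inspect # prints "[\"a\"]"

# Full table with no single-occurrence entries: the oldest key is dropped.
table = { "a" => { count: 2 }, "b" => { count: 5 } }
puts evict_for_cap(table, 2).keys.inspect # prints "[\"b\"]"
```

One consequence of this policy is that a message seen only once before the cap is reached is forgotten, so its next occurrence is treated as new and logged again.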
|
data/lib/data_shifter/internal/output.rb
CHANGED
|
@@ -11,6 +11,8 @@ module DataShifter
|
|
|
11
11
|
none: "none",
|
|
12
12
|
}.freeze
|
|
13
13
|
|
|
14
|
+
SKIP_REASONS_DISPLAY_LIMIT = 10
|
|
15
|
+
|
|
14
16
|
module_function
|
|
15
17
|
|
|
16
18
|
def print_header(io:, shift_class:, total:, label:, dry_run:, transaction_mode:, status_interval:)
|
|
@@ -30,7 +32,7 @@ module DataShifter
|
|
|
30
32
|
io.puts ""
|
|
31
33
|
end
|
|
32
34
|
|
|
33
|
-
def print_summary(io:, stats:, errors:, start_time:, dry_run:, transaction_mode:, interrupted:, task_name:, last_successful_id:)
|
|
35
|
+
def print_summary(io:, stats:, errors:, start_time:, dry_run:, transaction_mode:, interrupted:, task_name:, last_successful_id:, skip_reasons: {})
|
|
34
36
|
return unless start_time
|
|
35
37
|
|
|
36
38
|
elapsed = (Time.current - start_time).round(1)
|
|
@@ -43,6 +45,7 @@ module DataShifter
|
|
|
43
45
|
io.puts "Succeeded: #{stats[:succeeded]}"
|
|
44
46
|
io.puts "Failed: #{stats[:failed]}"
|
|
45
47
|
io.puts "Skipped: #{stats[:skipped]}"
|
|
48
|
+
print_skip_reasons(io:, skip_reasons:) if skip_reasons.any?
|
|
46
49
|
|
|
47
50
|
print_errors(io:, errors:) if errors.any?
|
|
48
51
|
print_interrupt_warning(io:, transaction_mode:, dry_run:) if interrupted
|
|
@@ -52,7 +55,7 @@ module DataShifter
|
|
|
52
55
|
io.puts "=" * 60
|
|
53
56
|
end
|
|
54
57
|
|
|
55
|
-
def print_progress(io:, stats:, errors:, start_time:, status_interval:)
|
|
58
|
+
def print_progress(io:, stats:, errors:, start_time:, status_interval:, skip_reasons: {})
|
|
56
59
|
return unless start_time
|
|
57
60
|
|
|
58
61
|
elapsed = (Time.current - start_time).round(1)
|
|
@@ -74,6 +77,7 @@ module DataShifter
|
|
|
74
77
|
io.puts "Succeeded: #{stats[:succeeded]}"
|
|
75
78
|
io.puts "Failed: #{stats[:failed]}"
|
|
76
79
|
io.puts "Skipped: #{stats[:skipped]}"
|
|
80
|
+
print_skip_reasons(io:, skip_reasons:) if skip_reasons.any?
|
|
77
81
|
|
|
78
82
|
print_errors(io:, errors:) if errors.any?
|
|
79
83
|
|
|
@@ -85,7 +89,9 @@ module DataShifter
|
|
|
85
89
|
io.puts ""
|
|
86
90
|
io.puts "ERRORS:"
|
|
87
91
|
errors.each do |err|
|
|
88
|
-
|
|
92
|
+
lines = err[:error].to_s.split("\n")
|
|
93
|
+
io.puts " #{err[:record]}: #{lines.first}"
|
|
94
|
+
lines.drop(1).each { |line| io.puts " #{line}" }
|
|
89
95
|
err[:backtrace]&.each { |line| io.puts " #{line}" }
|
|
90
96
|
end
|
|
91
97
|
end
|
|
@@ -145,6 +151,14 @@ module DataShifter
|
|
|
145
151
|
status_tips.join(" or ")
|
|
146
152
|
end
|
|
147
153
|
end
|
|
154
|
+
|
|
155
|
+
def print_skip_reasons(io:, skip_reasons:)
|
|
156
|
+
return if skip_reasons.empty?
|
|
157
|
+
|
|
158
|
+
top = skip_reasons.sort_by { |_reason, count| -count }.first(SKIP_REASONS_DISPLAY_LIMIT)
|
|
159
|
+
formatted = top.map { |reason, count| "\"#{reason}\" (#{count})" }.join(", ")
|
|
160
|
+
io.puts " #{formatted}"
|
|
161
|
+
end
|
|
148
162
|
end
|
|
149
163
|
end
|
|
150
164
|
end
|
data/lib/data_shifter/internal/side_effect_guards.rb
ADDED
|
@@ -0,0 +1,120 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
require "uri"
|
|
4
|
+
|
|
5
|
+
module DataShifter
|
|
6
|
+
module Internal
|
|
7
|
+
# Applies and restores side-effect guards during dry runs so that HTTP, mail,
|
|
8
|
+
# and job enqueues are blocked (or faked) unless explicitly allowed.
|
|
9
|
+
#
|
|
10
|
+
# Production impact:
|
|
11
|
+
# - WebMock: required only when apply_webmock runs (i.e. during a dry run), so commit-only
|
|
12
|
+
# production runs never load WebMock. On restore we revert to the previous state (enable!
|
|
13
|
+
# or disable!) so e.g. specs that had WebMock enabled are not left with it disabled.
|
|
14
|
+
# - ActionMailer / ActiveJob / Sidekiq: no extra loading; we only toggle existing config
|
|
15
|
+
# for the duration of the block and restore in ensure, so impact is scoped to the run.
|
|
16
|
+
module SideEffectGuards
|
|
17
|
+
class << self
|
|
18
|
+
# Applies side-effect guards, yields, then restores. Call only when running in dry run.
|
|
19
|
+
def with_guards(shift_class:, &block)
|
|
20
|
+
saved = {}
|
|
21
|
+
apply_guards(shift_class, saved)
|
|
22
|
+
block.call
|
|
23
|
+
rescue webmock_net_connect_error => e
|
|
24
|
+
host = extract_host_from_webmock_message(e.message)
|
|
25
|
+
raise DataShifter::ExternalRequestNotAllowedError.new(attempted_host: host), cause: e
|
|
26
|
+
ensure
|
|
27
|
+
restore_guards(saved) if saved.any?
|
|
28
|
+
end
|
|
29
|
+
|
|
30
|
+
private
|
|
31
|
+
|
|
32
|
+
def apply_guards(shift_class, saved)
|
|
33
|
+
apply_webmock(shift_class, saved)
|
|
34
|
+
# rubocop:disable Style/CombinableDefined -- parent must be checked first to avoid NameError when constant not loaded
|
|
35
|
+
apply_action_mailer(saved) if defined?(ActionMailer) && defined?(ActionMailer::Base)
|
|
36
|
+
apply_active_job(saved) if defined?(ActiveJob) && defined?(ActiveJob::Base)
|
|
37
|
+
apply_sidekiq(saved) if defined?(Sidekiq) && defined?(Sidekiq::Testing)
|
|
38
|
+
# rubocop:enable Style/CombinableDefined
|
|
39
|
+
end
|
|
40
|
+
|
|
41
|
+
def apply_webmock(shift_class, saved)
|
|
42
|
+
if defined?(WebMock)
|
|
43
|
+
# WebMock already loaded (e.g. in specs); capture so we can restore
|
|
44
|
+
saved[:webmock_was_enabled] = net_http_webmock_enabled?
|
|
45
|
+
else
|
|
46
|
+
require "webmock"
|
|
47
|
+
saved[:webmock_was_enabled] = false
|
|
48
|
+
end
|
|
49
|
+
WebMock.enable!
|
|
50
|
+
allowed = allowed_net_hosts(shift_class)
|
|
51
|
+
opts = allowed.any? ? { allow: allowed } : {}
|
|
52
|
+
WebMock.disable_net_connect!(**opts)
|
|
53
|
+
saved[:webmock] = true
|
|
54
|
+
end
|
|
55
|
+
|
|
56
|
+
def net_http_webmock_enabled?
|
|
57
|
+
Net::HTTP.socket_type.to_s.include?("StubSocket")
|
|
58
|
+
rescue StandardError
|
|
59
|
+
false
|
|
60
|
+
end
|
|
61
|
+
|
|
62
|
+
def allowed_net_hosts(shift_class)
|
|
63
|
+
per_shift = shift_class.respond_to?(:_allow_external_requests) ? shift_class._allow_external_requests : []
|
|
64
|
+
global = DataShifter.config.allow_external_requests
|
|
65
|
+
Array(per_shift) + Array(global)
|
|
66
|
+
end
|
|
67
|
+
|
|
68
|
+
def webmock_net_connect_error
|
|
69
|
+
return WebMock::NetConnectNotAllowedError if defined?(WebMock::NetConnectNotAllowedError)
|
|
70
|
+
|
|
71
|
+
Class.new(StandardError) # never matched when WebMock not loaded
|
|
72
|
+
end
|
|
73
|
+
|
|
74
|
+
def extract_host_from_webmock_message(message)
|
|
75
|
+
return nil unless message.is_a?(String)
|
|
76
|
+
|
|
77
|
+
# WebMock format: "Unregistered request: GET https://host/path with headers ..."
|
|
78
|
+
m = message.match(%r{Unregistered request: \w+ (https?://[^\s]+)})
|
|
79
|
+
return nil unless m
|
|
80
|
+
|
|
81
|
+
uri = URI.parse(m[1])
|
|
82
|
+
uri.host
|
|
83
|
+
rescue URI::InvalidURIError, ArgumentError
|
|
84
|
+
nil
|
|
85
|
+
end
|
|
86
|
+
|
|
87
|
+
def apply_action_mailer(saved)
|
|
88
|
+
saved[:action_mailer_perform_deliveries] = ActionMailer::Base.perform_deliveries
|
|
89
|
+
ActionMailer::Base.perform_deliveries = false
|
|
90
|
+
end
|
|
91
|
+
|
|
92
|
+
def apply_active_job(saved)
|
|
93
|
+
saved[:active_job_adapter] = ActiveJob::Base.queue_adapter
|
|
94
|
+
ActiveJob::Base.queue_adapter = :test
|
|
95
|
+
end
|
|
96
|
+
|
|
97
|
+
def apply_sidekiq(saved)
|
|
98
|
+
return unless Sidekiq::Testing.respond_to?(:fake!)
|
|
99
|
+
|
|
100
|
+
Sidekiq::Testing.fake!
|
|
101
|
+
saved[:sidekiq] = true
|
|
102
|
+
end
|
|
103
|
+
|
|
104
|
+
def restore_guards(saved)
|
|
105
|
+
if saved.delete(:webmock)
|
|
106
|
+
(saved.delete(:webmock_was_enabled) ? WebMock.enable! : WebMock.disable!)
|
|
107
|
+
end
|
|
108
|
+
|
|
109
|
+
ActionMailer::Base.perform_deliveries = saved.delete(:action_mailer_perform_deliveries) if saved.key?(:action_mailer_perform_deliveries)
|
|
110
|
+
|
|
111
|
+
ActiveJob::Base.queue_adapter = saved.delete(:active_job_adapter) if saved.key?(:active_job_adapter)
|
|
112
|
+
|
|
113
|
+
return unless saved.delete(:sidekiq)
|
|
114
|
+
|
|
115
|
+
Sidekiq::Testing.disable!
|
|
116
|
+
end
|
|
117
|
+
end
|
|
118
|
+
end
|
|
119
|
+
end
|
|
120
|
+
end
|
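The guard module above follows a save, toggle, restore-in-`ensure` pattern and parses the blocked host out of the WebMock error message. The sketch below illustrates both ideas in plain Ruby under stated assumptions: `FakeMailer`, `with_mail_guard`, and `extract_host` are hypothetical names invented for this example (standing in for `ActionMailer::Base.perform_deliveries` and the private helper in the diff), and the sample message only imitates the format quoted in the diff's comment.

```ruby
require "uri"

# Hypothetical stand-in for a framework setting such as
# ActionMailer::Base.perform_deliveries; not part of data_shifter.
class FakeMailer
  class << self
    attr_accessor :perform_deliveries
  end
  self.perform_deliveries = true
end

# Save the current value, switch deliveries off, yield, then restore in
# `ensure` so the previous value comes back even if the block raises.
def with_mail_guard
  saved = FakeMailer.perform_deliveries
  FakeMailer.perform_deliveries = false
  yield
ensure
  FakeMailer.perform_deliveries = saved
end

# Pull the host out of a message shaped like the one quoted in the diff
# ("Unregistered request: GET https://host/path with headers ...").
# Returns nil when the message does not match or the URL cannot be parsed.
def extract_host(message)
  m = message.to_s.match(%r{Unregistered request: \w+ (https?://[^\s]+)})
  m && URI.parse(m[1]).host
rescue URI::InvalidURIError
  nil
end

inside = with_mail_guard { FakeMailer.perform_deliveries }
puts inside                         # value observed while the guard is active
puts FakeMailer.perform_deliveries  # value after the guard has restored it
puts extract_host("Unregistered request: GET https://api.example.com/v1/users with headers {}")
```

Restoring inside `ensure` is what keeps the impact scoped to the run: a failing shift cannot leave deliveries disabled for whatever code runs next in the same process.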
data/lib/data_shifter/shift.rb
CHANGED
|
@@ -7,6 +7,8 @@ require_relative "internal/output"
|
|
|
7
7
|
require_relative "internal/signal_handler"
|
|
8
8
|
require_relative "internal/record_utils"
|
|
9
9
|
require_relative "internal/progress_bar"
|
|
10
|
+
require_relative "internal/side_effect_guards"
|
|
11
|
+
require_relative "internal/log_deduplicator"
|
|
10
12
|
|
|
11
13
|
# Base class for data shifts. Dry-run by default, progress bars, transaction modes, consistent summaries.
|
|
12
14
|
#
|
|
@@ -30,15 +32,15 @@ require_relative "internal/progress_bar"
|
|
|
30
32
|
# Running:
|
|
31
33
|
# - `rake data:shift:backfill_foo` (dry run by default)
|
|
32
34
|
# - `COMMIT=1 rake data:shift:backfill_foo` (apply changes)
|
|
33
|
-
# - Or call directly: `MyShift.call(dry_run: false)` (Axn semantics) -
|
|
35
|
+
# - You can also call it directly: `MyShift.call(dry_run: false)` (Axn semantics), with a caveat:
|
|
36
|
+
# NOTE: the default location is not auto-loaded, and using the rake task is strongly recommended.
|
|
34
37
|
#
|
|
35
38
|
# Transaction modes (set at class level with `transaction`):
|
|
36
39
|
# - `transaction :single` (default): one transaction for the whole run (all-or-nothing).
|
|
37
40
|
# - `transaction :per_record`: each record in its own transaction.
|
|
38
|
-
# - `transaction false`: no automatic
|
|
41
|
+
# - `transaction false`: no automatic transaction in commit mode; in dry run we still wrap in a rollback transaction.
|
|
39
42
|
#
|
|
40
|
-
# Dry run:
|
|
41
|
-
# Non-DB side effects are not rolled back; guard with `return if dry_run?` / `return unless dry_run?`.
|
|
43
|
+
# Dry run: DB changes are always rolled back (we wrap in a transaction and raise Rollback). Guard non-DB side effects with `return if dry_run?`.
|
|
42
44
|
#
|
|
43
45
|
# Fixed list of IDs (fail fast): Use find_exactly!(Model, [id1, id2, ...]) in `collection`.
|
|
44
46
|
# Large collections: Return an ActiveRecord::Relation and iteration uses `find_each`.
|
|
@@ -51,16 +53,25 @@ module DataShifter
|
|
|
51
53
|
|
|
52
54
|
log_calls false if respond_to?(:log_calls)
|
|
53
55
|
|
|
56
|
+
around :_with_log_deduplication
|
|
57
|
+
around :_with_side_effect_guards
|
|
54
58
|
around :_with_transaction_for_dry_run
|
|
55
59
|
before :_reset_tracking
|
|
56
60
|
on_success :_print_summary
|
|
57
61
|
on_error :_print_summary
|
|
58
62
|
|
|
59
63
|
class_attribute :_transaction_mode, default: :single
|
|
60
|
-
class_attribute :_progress_enabled, default:
|
|
64
|
+
class_attribute :_progress_enabled, default: nil
|
|
61
65
|
class_attribute :_description, default: nil
|
|
62
66
|
class_attribute :_task_name, default: nil
|
|
63
67
|
class_attribute :_throttle_interval, default: nil
|
|
68
|
+
class_attribute :_allow_external_requests, default: [], instance_accessor: false
|
|
69
|
+
class_attribute :_suppress_repeated_logs, default: nil, instance_accessor: false
|
|
70
|
+
|
|
71
|
+
# Internal exception used by skip! to abort the current process_record.
|
|
72
|
+
# Rescued in _process_one; not propagated.
|
|
73
|
+
class SkipRecord < StandardError; end
|
|
74
|
+
private_constant :SkipRecord
|
|
64
75
|
|
|
65
76
|
class << self
|
|
66
77
|
def description(text = nil)
|
|
@@ -104,6 +115,19 @@ module DataShifter
|
|
|
104
115
|
self._throttle_interval = interval
|
|
105
116
|
end
|
|
106
117
|
|
|
118
|
+
# Allow these hosts (or regexes) for HTTP during dry run only. Combines with DataShifter.config.allow_external_requests.
|
|
119
|
+
# Has no effect in commit mode — HTTP is unrestricted when dry_run is false.
|
|
120
|
+
# Example: allow_external_requests ["api.readonly.example.com", %r{\.internal\.company\z}]
|
|
121
|
+
def allow_external_requests(hosts)
|
|
122
|
+
self._allow_external_requests = Array(hosts)
|
|
123
|
+
end
|
|
124
|
+
|
|
125
|
+
# Enable/disable log deduplication for this shift. Overrides DataShifter.config.suppress_repeated_logs.
|
|
126
|
+
# Example: suppress_repeated_logs false
|
|
127
|
+
def suppress_repeated_logs(enabled)
|
|
128
|
+
self._suppress_repeated_logs = !!enabled
|
|
129
|
+
end
|
|
130
|
+
|
|
107
131
|
def run!
|
|
108
132
|
dry_run = Internal::Env.dry_run?
|
|
109
133
|
result = call(dry_run:)
|
|
@@ -133,8 +157,9 @@ module DataShifter
|
|
|
133
157
|
|
|
134
158
|
def skip!(reason = nil)
|
|
135
159
|
@stats[:skipped] += 1
|
|
136
|
-
|
|
137
|
-
|
|
160
|
+
key = reason.to_s.presence || "(no reason given)"
|
|
161
|
+
@skip_reasons[key] += 1
|
|
162
|
+
raise SkipRecord
|
|
138
163
|
end
|
|
139
164
|
|
|
140
165
|
def log(message)
|
|
@@ -145,24 +170,61 @@ module DataShifter
|
|
|
145
170
|
|
|
146
171
|
# --- Axn lifecycle hooks ---
|
|
147
172
|
|
|
173
|
+
def _with_log_deduplication(chain)
|
|
174
|
+
effective = self.class._suppress_repeated_logs.nil? ? DataShifter.config.suppress_repeated_logs : self.class._suppress_repeated_logs
|
|
175
|
+
unless effective && defined?(::Rails) && ::Rails.respond_to?(:logger) && ::Rails.logger
|
|
176
|
+
chain.call
|
|
177
|
+
return
|
|
178
|
+
end
|
|
179
|
+
|
|
180
|
+
original_logger = ::Rails.logger
|
|
181
|
+
original_ar_logger = ::ActiveRecord::Base.logger
|
|
182
|
+
|
|
183
|
+
Internal::LogDeduplicator.with_deduplicating_logger(original_logger, cap: DataShifter.config.repeated_log_cap) do |proxy|
|
|
184
|
+
::Rails.logger = proxy
|
|
185
|
+
::ActiveRecord::Base.logger = proxy
|
|
186
|
+
chain.call
|
|
187
|
+
end
|
|
188
|
+
ensure
|
|
189
|
+
if effective && defined?(::Rails) && ::Rails.respond_to?(:logger=)
|
|
190
|
+
::Rails.logger = original_logger
|
|
191
|
+
::ActiveRecord::Base.logger = original_ar_logger
|
|
192
|
+
end
|
|
193
|
+
end
|
|
194
|
+
|
|
195
|
+
def _with_side_effect_guards(chain)
|
|
196
|
+
if dry_run?
|
|
197
|
+
Internal::SideEffectGuards.with_guards(shift_class: self.class) { chain.call }
|
|
198
|
+
else
|
|
199
|
+
chain.call
|
|
200
|
+
end
|
|
201
|
+
end
|
|
202
|
+
|
|
148
203
|
def _with_transaction_for_dry_run(chain)
|
|
149
204
|
if _transaction_mode == :none
|
|
150
|
-
|
|
205
|
+
if dry_run?
|
|
206
|
+
::ActiveRecord::Base.transaction do
|
|
207
|
+
chain.call
|
|
208
|
+
raise ::ActiveRecord::Rollback
|
|
209
|
+
end
|
|
210
|
+
else
|
|
211
|
+
chain.call
|
|
212
|
+
end
|
|
151
213
|
return
|
|
152
214
|
end
|
|
153
215
|
|
|
154
216
|
if _transaction_mode == :single
|
|
155
|
-
ActiveRecord::Base.transaction do
|
|
217
|
+
::ActiveRecord::Base.transaction do
|
|
156
218
|
chain.call
|
|
157
|
-
raise ActiveRecord::Rollback if dry_run?
|
|
219
|
+
raise ::ActiveRecord::Rollback if dry_run?
|
|
158
220
|
end
|
|
159
221
|
return
|
|
160
222
|
end
|
|
161
223
|
|
|
162
224
|
if dry_run?
|
|
163
|
-
ActiveRecord::Base.transaction do
|
|
225
|
+
::ActiveRecord::Base.transaction do
|
|
164
226
|
chain.call
|
|
165
|
-
raise ActiveRecord::Rollback
|
|
227
|
+
raise ::ActiveRecord::Rollback
|
|
166
228
|
end
|
|
167
229
|
else
|
|
168
230
|
chain.call
|
|
@@ -172,6 +234,7 @@ module DataShifter
|
|
|
172
234
|
def _reset_tracking
|
|
173
235
|
@stats = { processed: 0, succeeded: 0, failed: 0, skipped: 0 }
|
|
174
236
|
@errors = []
|
|
237
|
+
@skip_reasons = Hash.new(0)
|
|
175
238
|
@start_time = Time.current
|
|
176
239
|
@last_status_print = @start_time
|
|
177
240
|
@_data_shift_interrupted = false
|
|
@@ -183,6 +246,7 @@ module DataShifter
|
|
|
183
246
|
io: $stdout,
|
|
184
247
|
stats: @stats,
|
|
185
248
|
errors: @errors,
|
|
249
|
+
skip_reasons: @skip_reasons,
|
|
186
250
|
start_time: @start_time,
|
|
187
251
|
dry_run: dry_run?,
|
|
188
252
|
transaction_mode: _transaction_mode,
|
|
@@ -209,6 +273,7 @@ module DataShifter
|
|
|
209
273
|
io: $stdout,
|
|
210
274
|
stats: @stats,
|
|
211
275
|
errors: @errors,
|
|
276
|
+
skip_reasons: @skip_reasons,
|
|
212
277
|
start_time: @start_time,
|
|
213
278
|
status_interval: Internal::Env.status_interval_seconds,
|
|
214
279
|
)
|
|
@@ -275,18 +340,19 @@ module DataShifter
|
|
|
275
340
|
# --- Transaction execution strategies ---
|
|
276
341
|
|
|
277
342
|
def _run_in_single_transaction(enum, total, &block)
|
|
278
|
-
ActiveRecord::Base.transaction do
|
|
343
|
+
::ActiveRecord::Base.transaction do
|
|
279
344
|
_iterate(enum, total, &block)
|
|
280
345
|
if dry_run?
|
|
281
346
|
log "\nDry run complete — rolling back all changes."
|
|
282
|
-
raise ActiveRecord::Rollback
|
|
347
|
+
raise ::ActiveRecord::Rollback
|
|
283
348
|
end
|
|
284
349
|
end
|
|
285
350
|
rescue StandardError => e
|
|
286
351
|
return if @errors.any?
|
|
287
352
|
|
|
288
353
|
@stats[:failed] += 1
|
|
289
|
-
|
|
354
|
+
error_text = _format_error(e)
|
|
355
|
+
@errors << { record: "transaction", error: error_text, backtrace: e.backtrace&.first(3) }
|
|
290
356
|
end
|
|
291
357
|
|
|
292
358
|
def _run_per_record(enum, total, &)
|
|
@@ -294,7 +360,7 @@ module DataShifter
|
|
|
294
360
|
if dry_run?
|
|
295
361
|
yield record
|
|
296
362
|
else
|
|
297
|
-
ActiveRecord::Base.transaction { yield record }
|
|
363
|
+
::ActiveRecord::Base.transaction { yield record }
|
|
298
364
|
end
|
|
299
365
|
end
|
|
300
366
|
end
|
|
@@ -304,7 +370,8 @@ module DataShifter
|
|
|
304
370
|
end
|
|
305
371
|
|
|
306
372
|
def _iterate(enum, total)
|
|
307
|
-
|
|
373
|
+
progress_on = _progress_enabled.nil? ? DataShifter.config.progress_enabled : _progress_enabled
|
|
374
|
+
bar = Internal::ProgressBar.create(total:, dry_run: dry_run?, enabled: progress_on)
|
|
308
375
|
if enum.respond_to?(:find_each)
|
|
309
376
|
enum.find_each do |record|
|
|
310
377
|
_process_one(record) { yield record }
|
|
@@ -325,11 +392,15 @@ module DataShifter
|
|
|
325
392
|
yield
|
|
326
393
|
@stats[:succeeded] += 1
|
|
327
394
|
@_last_successful_id = record.id if record.respond_to?(:id)
|
|
395
|
+
rescue SkipRecord
|
|
396
|
+
# skip! already incremented @stats[:skipped] and recorded the reason; just continue
|
|
397
|
+
nil
|
|
328
398
|
rescue StandardError => e
|
|
329
399
|
@stats[:failed] += 1
|
|
330
400
|
identifier = Internal::RecordUtils.identifier(record)
|
|
331
|
-
|
|
332
|
-
|
|
401
|
+
error_text = _format_error(e)
|
|
402
|
+
@errors << { record: identifier, error: error_text, backtrace: e.backtrace&.first(3) }
|
|
403
|
+
_log_error(identifier, error_text)
|
|
333
404
|
|
|
334
405
|
raise if _transaction_mode == :single
|
|
335
406
|
ensure
|
|
@@ -345,6 +416,18 @@ module DataShifter
|
|
|
345
416
|
_print_progress
|
|
346
417
|
end
|
|
347
418
|
|
|
419
|
+
def _format_error(e)
|
|
420
|
+
msg = e.message.to_s
|
|
421
|
+
msg += "\n Caused by: #{e.cause.class}: #{e.cause.message}" if e.respond_to?(:cause) && e.cause
|
|
422
|
+
msg
|
|
423
|
+
end
|
|
424
|
+
|
|
425
|
+
def _log_error(identifier, error_text)
|
|
426
|
+
lines = error_text.to_s.split("\n")
|
|
427
|
+
log "ERROR #{identifier}: #{lines.first}"
|
|
428
|
+
lines.drop(1).each { |line| log " #{line}" }
|
|
429
|
+
end
|
|
430
|
+
|
|
348
431
|
# --- Output helpers ---
|
|
349
432
|
|
|
350
433
|
def _print_header(total)
|
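Two changes in this file are easy to miss in the hunks above: `skip!` now raises an internal exception so the rest of `process_record` is abandoned, and error text now includes the exception's cause. The following is a minimal plain-Ruby sketch of that control flow, not the gem's implementation: `MiniShift` and its record handling are hypothetical, and `reason.to_s.empty?` is used in place of ActiveSupport's `presence` so the example runs without Rails.

```ruby
# Hypothetical miniature of the skip flow; method names mirror the diff,
# but this class is an illustration, not DataShifter::Shift.
class MiniShift
  # Raised by skip! to abandon the current record; rescued in process_one.
  class SkipRecord < StandardError; end

  attr_reader :stats, :skip_reasons, :errors

  def initialize
    @stats = { succeeded: 0, failed: 0, skipped: 0 }
    @skip_reasons = Hash.new(0) # tally of skips per reason
    @errors = []
  end

  def run(records)
    records.each { |record| process_one(record) }
    self
  end

  private

  def skip!(reason = nil)
    @stats[:skipped] += 1
    key = reason.to_s.empty? ? "(no reason given)" : reason.to_s
    @skip_reasons[key] += 1
    raise SkipRecord
  end

  def process_one(record)
    skip! if record.nil?
    skip!("negative") if record.is_a?(Integer) && record.negative?
    begin
      Integer(record)
    rescue ArgumentError
      # Raising inside a rescue sets #cause on the new exception.
      raise "could not convert #{record.inspect}"
    end
    @stats[:succeeded] += 1
  rescue SkipRecord
    nil # already counted by skip!; move on to the next record
  rescue StandardError => e
    @stats[:failed] += 1
    @errors << format_error(e)
  end

  def format_error(e)
    msg = e.message.to_s
    msg += "\n  Caused by: #{e.cause.class}: #{e.cause.message}" if e.cause
    msg
  end
end

shift = MiniShift.new.run([1, -2, nil, "abc", "7"])
p shift.stats
p shift.skip_reasons
puts shift.errors
```

The `rescue SkipRecord` clause has to come before `rescue StandardError`, because `SkipRecord` is itself a `StandardError` and would otherwise be counted as a failure.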
data/lib/data_shifter/version.rb
CHANGED
data/lib/data_shifter.rb
CHANGED
|
@@ -1,5 +1,26 @@
|
|
|
1
1
|
# frozen_string_literal: true
|
|
2
2
|
|
|
3
3
|
require_relative "data_shifter/version"
|
|
4
|
+
require_relative "data_shifter/configuration"
|
|
5
|
+
require_relative "data_shifter/errors"
|
|
4
6
|
require_relative "data_shifter/shift"
|
|
5
7
|
require_relative "data_shifter/railtie"
|
|
8
|
+
|
|
9
|
+
module DataShifter
|
|
10
|
+
class << self
|
|
11
|
+
# Returns the global configuration instance.
|
|
12
|
+
def config
|
|
13
|
+
@config ||= Configuration.new
|
|
14
|
+
end
|
|
15
|
+
|
|
16
|
+
# Yields the configuration for block-style setup.
|
|
17
|
+
#
|
|
18
|
+
# DataShifter.configure do |c|
|
|
19
|
+
# c.allow_external_requests = ["api.readonly.example.com"]
|
|
20
|
+
# c.suppress_repeated_logs = false
|
|
21
|
+
# end
|
|
22
|
+
def configure
|
|
23
|
+
yield config
|
|
24
|
+
end
|
|
25
|
+
end
|
|
26
|
+
end
|
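The `config` / `configure` pair above is a memoized settings object with a block-style setter, and the shift code resolves per-class options by treating `nil` as "not set, use the global value". Here is a small sketch of that combination, assuming a hypothetical `MiniConfigDemo` module; its `Configuration` struct and default values are invented for the example and are not taken from the gem's `configuration.rb`.

```ruby
# Hypothetical module for illustration; not DataShifter itself.
module MiniConfigDemo
  # Assumed defaults, chosen for this example only.
  Configuration = Struct.new(:allow_external_requests, :suppress_repeated_logs) do
    def initialize
      super([], true)
    end
  end

  class << self
    # Memoized global configuration instance.
    def config
      @config ||= Configuration.new
    end

    # Yields the configuration for block-style setup.
    def configure
      yield config
    end

    # A per-class value of nil means "not set": fall back to the global one.
    # An explicit false must win, so `||` would be the wrong operator here.
    def effective(per_class_value, global_value)
      per_class_value.nil? ? global_value : per_class_value
    end
  end
end

MiniConfigDemo.configure do |c|
  c.allow_external_requests = ["api.readonly.example.com"]
  c.suppress_repeated_logs = false
end

global = MiniConfigDemo.config.suppress_repeated_logs
puts MiniConfigDemo.effective(nil, global)  # no class-level setting: global value is used
puts MiniConfigDemo.effective(true, global) # class-level setting overrides the global value
```

Using `nil` as the class-level default is what lets a shift explicitly set `false` and still override a global `true`.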
data/lib/generators/data_shift_generator.rb
CHANGED
|
@@ -83,6 +83,7 @@ class DataShiftGenerator < Rails::Generators::NamedBase
|
|
|
83
83
|
underscored_name = name.underscore
|
|
84
84
|
record_arg = @model_name.present? ? @model_name.underscore : "record"
|
|
85
85
|
|
|
86
|
+
model_for_change = @model_name.present? ? @model_name : "Model"
|
|
86
87
|
create_file "spec/lib/data_shifts/#{underscored_name}_spec.rb", <<~RUBY
|
|
87
88
|
# frozen_string_literal: true
|
|
88
89
|
|
|
@@ -94,22 +95,24 @@ class DataShiftGenerator < Rails::Generators::NamedBase
|
|
|
94
95
|
|
|
95
96
|
before { allow($stdout).to receive(:puts) }
|
|
96
97
|
|
|
97
|
-
#
|
|
98
|
+
# Set up test records as needed
|
|
98
99
|
# let(:#{record_arg}) { create(:#{record_arg}) }
|
|
99
100
|
|
|
100
101
|
describe "dry run" do
|
|
101
102
|
it "does not persist changes" do
|
|
102
|
-
|
|
103
|
-
|
|
104
|
-
|
|
103
|
+
expect do
|
|
104
|
+
result = run_data_shift(described_class, dry_run: true)
|
|
105
|
+
expect(result).to be_ok
|
|
106
|
+
end.not_to change(#{model_for_change}, :count)
|
|
105
107
|
end
|
|
106
108
|
end
|
|
107
109
|
|
|
108
110
|
describe "commit" do
|
|
109
111
|
it "applies changes" do
|
|
110
|
-
|
|
111
|
-
|
|
112
|
-
|
|
112
|
+
expect do
|
|
113
|
+
result = run_data_shift(described_class, commit: true)
|
|
114
|
+
expect(result).to be_ok
|
|
115
|
+
end.to change(#{model_for_change}, :count)
|
|
113
116
|
end
|
|
114
117
|
end
|
|
115
118
|
end
|
metadata
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: data_shifter
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version: 0.
|
|
4
|
+
version: 0.2.0
|
|
5
5
|
platform: ruby
|
|
6
6
|
authors:
|
|
7
7
|
- Kali Donovan
|
|
@@ -85,6 +85,20 @@ dependencies:
|
|
|
85
85
|
- - ">="
|
|
86
86
|
- !ruby/object:Gem::Version
|
|
87
87
|
version: '1.13'
|
|
88
|
+
- !ruby/object:Gem::Dependency
|
|
89
|
+
name: webmock
|
|
90
|
+
requirement: !ruby/object:Gem::Requirement
|
|
91
|
+
requirements:
|
|
92
|
+
- - ">="
|
|
93
|
+
- !ruby/object:Gem::Version
|
|
94
|
+
version: '3.18'
|
|
95
|
+
type: :runtime
|
|
96
|
+
prerelease: false
|
|
97
|
+
version_requirements: !ruby/object:Gem::Requirement
|
|
98
|
+
requirements:
|
|
99
|
+
- - ">="
|
|
100
|
+
- !ruby/object:Gem::Version
|
|
101
|
+
version: '3.18'
|
|
88
102
|
description: 'DataShifter: backfills and one-off fixes as rake tasks. Dry run by default,
|
|
89
103
|
auto rollback, progress bars, consistent summaries.'
|
|
90
104
|
email:
|
|
@@ -95,22 +109,25 @@ extra_rdoc_files: []
|
|
|
95
109
|
files:
|
|
96
110
|
- ".husky/pre-commit"
|
|
97
111
|
- ".lintstagedrc"
|
|
112
|
+
- CHANGELOG.md
|
|
98
113
|
- LICENSE.txt
|
|
99
114
|
- README.md
|
|
100
115
|
- Rakefile
|
|
101
116
|
- lib/data_shifter.rb
|
|
117
|
+
- lib/data_shifter/configuration.rb
|
|
118
|
+
- lib/data_shifter/errors.rb
|
|
102
119
|
- lib/data_shifter/internal/env.rb
|
|
120
|
+
- lib/data_shifter/internal/log_deduplicator.rb
|
|
103
121
|
- lib/data_shifter/internal/output.rb
|
|
104
122
|
- lib/data_shifter/internal/progress_bar.rb
|
|
105
123
|
- lib/data_shifter/internal/record_utils.rb
|
|
124
|
+
- lib/data_shifter/internal/side_effect_guards.rb
|
|
106
125
|
- lib/data_shifter/internal/signal_handler.rb
|
|
107
126
|
- lib/data_shifter/railtie.rb
|
|
108
|
-
- lib/data_shifter/rubocop.rb
|
|
109
127
|
- lib/data_shifter/shift.rb
|
|
110
128
|
- lib/data_shifter/spec_helper.rb
|
|
111
129
|
- lib/data_shifter/version.rb
|
|
112
130
|
- lib/generators/data_shift_generator.rb
|
|
113
|
-
- lib/rubocop/cop/data_shifter/skip_transaction_guard_dry_run.rb
|
|
114
131
|
homepage: https://github.com/teamshares/data_shifter
|
|
115
132
|
licenses:
|
|
116
133
|
- MIT
|
data/lib/data_shifter/rubocop.rb
DELETED
data/lib/rubocop/cop/data_shifter/skip_transaction_guard_dry_run.rb
DELETED
|
@@ -1,55 +0,0 @@
|
|
|
1
|
-
# frozen_string_literal: true
|
|
2
|
-
|
|
3
|
-
module RuboCop
|
|
4
|
-
module Cop
|
|
5
|
-
module DataShifter
|
|
6
|
-
# In data shift files, `transaction false` disables automatic transaction
|
|
7
|
-
# and rollback. DB writes (and side effects) are not rolled back on dry run, so
|
|
8
|
-
# the shift must guard them with `return if dry_run?` or `return unless dry_run?`.
|
|
9
|
-
#
|
|
10
|
-
# @example
|
|
11
|
-
# # bad
|
|
12
|
-
# class BackfillUsers < DataShifter::Shift
|
|
13
|
-
# transaction false
|
|
14
|
-
# def process_record(record)
|
|
15
|
-
# record.update!(foo: 1)
|
|
16
|
-
# end
|
|
17
|
-
# end
|
|
18
|
-
#
|
|
19
|
-
# # good
|
|
20
|
-
# class BackfillUsers < DataShifter::Shift
|
|
21
|
-
# transaction false
|
|
22
|
-
# def process_record(record)
|
|
23
|
-
# return if dry_run?
|
|
24
|
-
# record.update!(foo: 1)
|
|
25
|
-
# end
|
|
26
|
-
# end
|
|
27
|
-
class SkipTransactionGuardDryRun < Base
|
|
28
|
-
MSG = "Data shifts using `transaction false` must guard writes/side effects with " \
|
|
29
|
-
"`return if dry_run?` or `return unless dry_run?`."
|
|
30
|
-
|
|
31
|
-
def_node_matcher :skip_transaction_call?, <<~PATTERN
|
|
32
|
-
(send _ :transaction {(sym :none) (false)})
|
|
33
|
-
PATTERN
|
|
34
|
-
|
|
35
|
-
def on_send(node)
|
|
36
|
-
return unless skip_transaction_call?(node)
|
|
37
|
-
return if file_contains_dry_run_guard?
|
|
38
|
-
|
|
39
|
-
add_offense(node, message: MSG)
|
|
40
|
-
end
|
|
41
|
-
|
|
42
|
-
private
|
|
43
|
-
|
|
44
|
-
def file_contains_dry_run_guard?
|
|
45
|
-
return true unless processed_source.ast
|
|
46
|
-
|
|
47
|
-
processed_source.ast.each_node(:send) do |send_node|
|
|
48
|
-
return true if send_node.method?(:dry_run?)
|
|
49
|
-
end
|
|
50
|
-
false
|
|
51
|
-
end
|
|
52
|
-
end
|
|
53
|
-
end
|
|
54
|
-
end
|
|
55
|
-
end
|