data_shifter 0.1.0 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 6e4e5f5aa36cfac3275fcb493a2555e6f093873c310395e92d4c7c64c42bb63b
4
- data.tar.gz: d7c3e9a682d237887960a0bb3946e73a19bd913709ffb7c069a17a89e45f876a
3
+ metadata.gz: 8143cec17a5f8cb7374ad327338e694cea0a9422bf0392e005a3eaeeee9ab83d
4
+ data.tar.gz: b9a246478df8ef89377482951e74ad36b63655510f4bf3bbc6a1b4480edec85e
5
5
  SHA512:
6
- metadata.gz: 74d1ab829e7d3a695d934f6d624fa6bdad25b7bf13ee11d680ecd90ab9ccfbfa8df33eadf9f3239193a246a183db4604691747db7ae5ea12c4548afed547645a
7
- data.tar.gz: c70d3e6be2982dc83501349aeee74ae2e3dd476fd2315f46a76b4e2b52e77cfa6f7c1042a7c4f613460880d07382731b2dd24db41d9e85991b54075d6cf76fa0
6
+ metadata.gz: e8f150f146151d8a82ccc79fd2e31f2ed813ee1b631d0eb9fc5490fa201ed31890a088f24bb7bb086d4158407f5bfb3f969c639dad75d85809c72d44e609172e
7
+ data.tar.gz: b776d810b0819d216436e169414c9c09c0a0c1f7cc3123f58912fa638675f55c998c3c7be71bae40868c5fddbb031bd42d3b2b8a915491d9e6d792852712405f
data/.husky/pre-commit CHANGED
@@ -1,4 +1 @@
1
- #!/bin/sh
2
- . "$(dirname "$0")/_/husky.sh"
3
-
4
1
  npx lint-staged
data/CHANGELOG.md ADDED
@@ -0,0 +1,25 @@
1
+ # Changelog
2
+
3
+ ## [Unreleased]
4
+
5
+ * N/A
6
+
7
+ ## [0.2.0]
8
+
9
+ ### Added
10
+
11
+ - **Configuration object**: New `DataShifter.configure` block for global settings.
12
+ - **Dry-run rollback for `transaction false`**: Shifts using `transaction false` (or `:none`) now roll back DB changes in dry-run mode, matching the behavior of other transaction modes.
13
+ - **Automatic side-effect guards in dry run**: When a shift runs in dry run mode, HTTP (via WebMock), ActionMailer, ActiveJob, and Sidekiq (if loaded) are now automatically blocked or faked so that unguarded external calls do not run. Restore happens in an `ensure` so state is reverted after the run.
14
+ - **HTTP**: All outbound requests are blocked unless allowed with the per-shift `allow_external_requests [...]` DSL or global `DataShifter.config.allow_external_requests`.
15
+ - **ActionMailer**: `perform_deliveries = false` for the duration of the dry run.
16
+ - **ActiveJob**: Queue adapter set to `:test` for the duration of the dry run.
17
+ - **Sidekiq**: `Sidekiq::Testing.fake!` for the duration of the dry run (only if `Sidekiq::Testing` is already loaded).
18
+ - Dependency on `webmock` (>= 3.18) for dry-run HTTP blocking.
19
+ - **Log deduplication**: Repeated log messages are now suppressed during shift runs (default: on). First occurrence logs normally; subsequent occurrences are counted and a summary is printed at the end. Configure globally with `config.suppress_repeated_logs` and `config.repeated_log_cap` (default 1000). Override per-shift with `suppress_repeated_logs false`.
20
+ - **Global progress bar default**: `config.progress_enabled` (default `true`) sets the default for all shifts. Per-shift `progress true/false` still overrides.
21
+ - **Global status interval**: `config.status_interval_seconds` (default `nil`) provides a fallback when `STATUS_INTERVAL` env var is not set.
22
+ - **skip! abort behavior**: `skip!` now terminates the current `process_record` (no `return` needed after calling it).
23
+ - **Grouped skip reasons**: Skip reasons are grouped and the top 10 (by count) are shown in the summary and status output instead of logging each skip inline.
24
+
25
+ ## [0.1.0] - Initial release
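The 0.2.0 entries for `skip!` abort behavior and grouped skip reasons can be illustrated with a small stand-alone sketch. It is not the gem's implementation, which this diff does not show for `skip!`. `SkipDemo`, `run`, and the `:skip_record` tag are hypothetical names, and `catch`/`throw` is only one assumed way to end `process_record` early without an explicit `return`.

```ruby
# Hypothetical sketch: NOT DataShifter's actual code.
# Shows how a `skip!` helper can end the current record's processing
# without the caller writing `return`, using Ruby's catch/throw.
class SkipDemo
  attr_reader :skip_reasons, :processed

  def initialize
    @skip_reasons = Hash.new(0) # reason => count
    @processed = []
  end

  def run(records)
    records.each do |record|
      # `throw :skip_record` inside process_record unwinds to this catch.
      catch(:skip_record) { process_record(record) }
    end
    self
  end

  private

  def skip!(reason)
    @skip_reasons[reason] += 1
    throw :skip_record
  end

  def process_record(record)
    skip!("already done") if record[:foo]
    @processed << record[:id] # not reached for skipped records
  end
end

demo = SkipDemo.new.run([{ id: 1, foo: "x" }, { id: 2, foo: nil }, { id: 3, foo: "y" }])
```

Counting reasons in a hash instead of logging each skip is what makes a grouped "top reasons" summary possible at the end of a run.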
data/README.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # DataShifter
2
2
 
3
- Rake-backed data migrations (shifts) for Rails apps, with **dry run by default**, progress output, and a consistent summary. Define shift classes in `lib/data_shifts/*.rb`; run them as `rake data:shift:<task_name>`.
3
+ Rake-backed data migrations ("shifts") for Rails apps, with **dry run by default**, progress output, and a consistent summary. Define shift classes in `lib/data_shifts/*.rb`; run them as `rake data:shift:<task_name>`.
4
4
 
5
5
  ## Installation
6
6
 
@@ -21,7 +21,7 @@ Generate a shift (optionally scoped to a model):
21
21
 
22
22
  ```bash
23
23
  bin/rails generate data_shift backfill_foo
24
- bin/rails generate data_shift backfill_users --model=User
24
+ bin/rails generate data_shift backfill_users --model User
25
25
  ```
26
26
 
27
27
  Add your logic to the generated file in `lib/data_shifts/`.
@@ -33,19 +33,6 @@ rake data:shift:backfill_foo
33
33
  COMMIT=1 rake data:shift:backfill_foo
34
34
  ```
35
35
 
36
- ## How shift files map to rake tasks
37
-
38
- DataShifter defines one rake task per file in `lib/data_shifts/*.rb`.
39
-
40
- - **Task name**: derived from the filename with any leading digits removed.
41
- - `20260201120000_backfill_foo.rb` → `data:shift:backfill_foo` (leading `<digits>_` prefix is stripped)
42
- - `backfill_foo.rb` → `data:shift:backfill_foo`
43
- - **Class name**: task name camelized, inside the `DataShifts` module.
44
- - `backfill_foo` → `DataShifts::BackfillFoo`
45
-
46
- Shift files are **required only when the task runs** (tasks are defined up front; classes load lazily).
47
- The `description "..."` line is extracted from the file and used for `rake -T` output without loading the shift class.
48
-
49
36
  ## Defining a shift
50
37
 
51
38
  Typical shifts implement:
@@ -77,7 +64,39 @@ Shifts run in **dry run** mode by default. In the automatic transaction modes (`
77
64
  - **Commit**: `COMMIT=1 rake data:shift:backfill_foo`
78
65
  - (`COMMIT=true` or `DRY_RUN=false` also commit)
79
66
 
80
- Non-DB side effects (API calls, emails, enqueued jobs, etc.) obviously cannot be automatically rolled back, so guard them with e.g. `return if dry_run?`.
67
+ ### Automatic side-effect guards (dry run)
68
+
69
+ In **dry run** mode, DataShifter automatically blocks or fakes these side effects so unguarded code is less likely to hit the network or send mail/jobs:
70
+
71
+ | Service | Behavior in dry run |
72
+ |-------------|----------------------|
73
+ | **HTTP** | Blocked via WebMock (`disable_net_connect!`). Allow specific hosts with `allow_external_requests [...]` or `DataShifter.config.allow_external_requests`. |
74
+ | **ActionMailer** | `perform_deliveries = false` (restored after run). |
75
+ | **ActiveJob** | Queue adapter set to `:test` (restored after run). |
76
+ | **Sidekiq** | `Sidekiq::Testing.fake!` (restored with `disable!` after run). Only applied if `Sidekiq::Testing` is already loaded. |
77
+
78
+ **Guarding other side effects:** For anything not covered above (e.g. another service, or an allowed HTTP request that mutates data), add a guard such as `return if dry_run?` in your shift. DB changes are always rolled back in dry run; only non-DB side effects need this.
79
+
80
+ To allow HTTP to specific hosts during dry run (e.g. a migration that must call an API to compute values), use the per-shift DSL or global config. NOTE: it is your responsibility to ensure you make only read-only requests while `dry_run?` is true:
81
+
82
+ ```ruby
83
+ # Per shift
84
+ module DataShifts
85
+ class BackfillFromApi < DataShifter::Shift
86
+ allow_external_requests ["api.readonly.example.com", %r{\.internal\.company\z}]
87
+ # ...
88
+ end
89
+ end
90
+ ```
91
+
92
+ ```ruby
93
+ # Global (e.g. in config/initializers/data_shifter.rb)
94
+ DataShifter.configure do |config|
95
+ config.allow_external_requests = ["api.readonly.example.com"]
96
+ end
97
+ ```
98
+
99
+ Allowed hosts are combined (per-shift + global). The guards (WebMock, mail, jobs) are restored in an `ensure` block, so later code and other specs are unaffected.
81
100
 
82
101
  ## Transaction modes
83
102
 
@@ -85,7 +104,7 @@ Set the transaction mode at the class level:
85
104
 
86
105
  - **`transaction :single` / `transaction true` (default)**: one DB transaction for the entire run; dry run rolls back at the end; a record error aborts the run.
87
106
  - **`transaction :per_record`**: in commit mode, each record runs in its own transaction (errors are collected and the run continues); in dry run, the run is wrapped in a single rollback transaction.
88
- - **`transaction false` / `transaction :none`**: CAUTION: NOT RECOMMENDED. No automatic transactions and no automatic rollback; ⚠️ **you must manually guard DB writes AND side effects with `dry_run?`.**
107
+ - **`transaction false` / `transaction :none`**: No automatic transaction in **commit** mode only. In dry run, the run is still wrapped in a single rollback transaction so DB changes are never committed. Use when you have external side effects or your own transaction strategy in commit mode.
89
108
 
90
109
  ```ruby
91
110
  module DataShifts
@@ -137,7 +156,53 @@ CONTINUE_FROM=123 COMMIT=1 rake data:shift:backfill_foo
137
156
  Notes:
138
157
 
139
158
  - Only supported for `ActiveRecord::Relation` collections (Array-based collections—like those from `find_exactly!`—cannot be resumed).
140
- - The filter is `primary_key > CONTINUE_FROM`, so its only useful with monotonically increasing primary keys (e.g. `find_each`'s default behavior).
159
+ - The filter is `primary_key > CONTINUE_FROM`, so it's only useful with monotonically increasing primary keys (e.g. `find_each`'s default behavior).
160
+
161
+ ## How shift files map to rake tasks
162
+
163
+ DataShifter defines one rake task per file in `lib/data_shifts/*.rb`.
164
+
165
+ - **Task name**: derived from the filename with any leading digits removed.
166
+ - `20260201120000_backfill_foo.rb` → `data:shift:backfill_foo` (leading `<digits>_` prefix is stripped)
167
+ - `backfill_foo.rb` → `data:shift:backfill_foo`
168
+ - **Class name**: task name camelized, inside the `DataShifts` module.
169
+ - `backfill_foo` → `DataShifts::BackfillFoo`
170
+
171
+ Shift files are **required only when the task runs** (tasks are defined up front; classes load lazily).
172
+ The `description "..."` line is extracted from the file and used for `rake -T` output without loading the shift class.
173
+
174
+ ## Configuration
175
+
176
+ Configure DataShifter globally in an initializer:
177
+
178
+ ```ruby
179
+ # config/initializers/data_shifter.rb
180
+ DataShifter.configure do |config|
181
+ # Hosts allowed for HTTP during dry run only (no effect in commit mode)
182
+ config.allow_external_requests = ["api.readonly.example.com"]
183
+
184
+ # Suppress repeated log messages during a shift run (default: true)
185
+ config.suppress_repeated_logs = true
186
+
187
+ # Max unique messages to track for deduplication (default: 1000)
188
+ config.repeated_log_cap = 1000
189
+
190
+ # Global default for progress bar visibility (default: true)
191
+ config.progress_enabled = true
192
+
193
+ # Default status print interval in seconds when ENV STATUS_INTERVAL is not set (default: nil)
194
+ config.status_interval_seconds = nil
195
+ end
196
+ ```
197
+
198
+ Per-shift overrides:
199
+
200
+ ```ruby
201
+ class MyShift < DataShifter::Shift
202
+ progress false # Disable progress bar for this shift
203
+ suppress_repeated_logs false # Disable log deduplication for this shift
204
+ end
205
+ ```
141
206
 
142
207
  ## Operational tips
143
208
 
@@ -145,7 +210,7 @@ Notes:
145
210
 
146
211
  - **Start with a dry run**: run the task once with no environment variables set, confirm logs and summary look right, then re-run with `COMMIT=1`.
147
212
  - **Make shifts idempotent**: structure `process_record` so re-running is safe (for example, update only when the target column is `NULL`, or compute the same derived value deterministically).
148
- - **Guard side effects explicitly**: even in dry run, API calls / emails / enqueues are not rolled back. Use `dry_run?` helper to skip side-effectful code.
213
+ - **Guard side effects we don't auto-block**: use `return if dry_run?` for any side effect not covered by Automatic side-effect guards (see above).
149
214
 
150
215
  ### Choosing a transaction mode (behavior + guidance)
151
216
 
@@ -156,8 +221,8 @@ Notes:
156
221
  - **Behavior**: in commit mode, records are committed one-by-one; errors are collected and the run continues; the overall run fails at the end if any record failed.
157
222
  - **Use when**: you want maximum progress and are OK investigating/fixing a subset of failures.
158
223
  - **`transaction false` / `:none`**:
159
- - **Behavior**: no automatic transaction wrapper (even in dry run) and no automatic rollback.
160
- - **Use when**: you have intentional external side effects, or you’re doing your own transaction/locking strategy—**but always guard writes/side effects with `dry_run?`.**
224
+ - **Behavior**: in commit mode, no automatic transaction; in dry run, the run is still wrapped in a rollback transaction so DB changes are not committed.
225
+ - **Use when**: you have intentional external side effects or your own transaction/locking strategy in commit mode.
161
226
 
162
227
  ### Performance and operability (recommended)
163
228
 
@@ -182,17 +247,19 @@ def process_record(buyback)
182
247
  end
183
248
  ```
184
249
 
185
- ### `skip!` (count but dont update)
250
+ ### `skip!` (count but don't update)
186
251
 
187
- Mark a record as skipped (it will increment Skipped in the summary):
252
+ Mark a record as skipped. Calling `skip!` terminates the current `process_record` immediately (no `return` needed). The record is counted as "Skipped" in the summary.
188
253
 
189
254
  ```ruby
190
255
  def process_record(record)
191
256
  skip!("already done") if record.foo.present?
192
- record.update!(foo: value)
257
+ record.update!(foo: value) # not executed if skipped
193
258
  end
194
259
  ```
195
260
 
261
+ Skip reasons are grouped: the summary shows the top 10 reasons by count (e.g. `"already done" (42), "not eligible" (3)`) instead of logging each skip inline. This keeps the progress bar clean.
262
+
196
263
  ### Throttling and disabling the progress bar
197
264
 
198
265
  ```ruby
@@ -202,19 +269,28 @@ class SomeShift < DataShifter::Shift
202
269
  end
203
270
  ```
204
271
 
272
+
205
273
  ## Generator
206
274
 
207
275
  | Command | Generates |
208
276
  |--------|----------|
209
277
  | `bin/rails generate data_shift backfill_foo` | `lib/data_shifts/<timestamp>_backfill_foo.rb` with a `DataShifts::BackfillFoo` class |
210
- | `bin/rails generate data_shift backfill_users --model=User` | Same, with `User.all` in `collection` and `process_record(user)` |
278
+ | `bin/rails generate data_shift backfill_users --model User` | Same, with `User.all` in `collection` and `process_record(user)` |
211
279
  | `bin/rails generate data_shift backfill_users --spec` | Also generates `spec/lib/data_shifts/backfill_users_spec.rb` when RSpec is enabled |
212
280
 
213
281
  The generator refuses to create a second shift if it would produce a duplicate rake task name.
214
282
 
215
283
  ## Testing shifts (RSpec)
216
284
 
217
- This gem ships a small helper module for running shifts in tests:
285
+ This gem ships a small helper module for running shifts in tests. Require it and include `DataShifter::SpecHelper` in specs or in `RSpec.configure` for `type: :data_shift`.
286
+
287
+ **Helpers:**
288
+
289
+ - **`run_data_shift(shift_class, dry_run: true, commit: false)`** — Runs the shift; returns an `Axn::Result`. Use `commit: true` to run in commit mode.
290
+ - **`silence_data_shift_output`** — Suppresses STDOUT for the block (e.g. progress bar).
291
+ - **`capture_data_shift_output`** — Runs the block and returns `[result, output_string]` for asserting on printed output.
292
+
293
+ Use `expect { ... }.not_to change(...)` and `expect { ... }.to change(...)` to assert that data stays unchanged in dry run and changes when committed:
218
294
 
219
295
  ```ruby
220
296
  require "data_shifter/spec_helper"
@@ -222,35 +298,29 @@ require "data_shifter/spec_helper"
222
298
  RSpec.describe DataShifts::BackfillFoo do
223
299
  include DataShifter::SpecHelper
224
300
 
225
- before { allow($stdout).to receive(:puts) } # silence shift output
301
+ before { allow($stdout).to receive(:puts) }
226
302
 
227
303
  it "does not persist changes in dry run" do
228
- result = run_data_shift(described_class, dry_run: true)
229
- expect(result).to be_ok
230
- # TODO: add some check confirming data is unchanged
304
+ expect do
305
+ result = run_data_shift(described_class, dry_run: true)
306
+ expect(result).to be_ok
307
+ end.not_to change(Foo, :count)
231
308
  end
232
309
 
233
310
  it "persists changes when committed" do
234
- result = run_data_shift(described_class, commit: true)
235
- expect(result).to be_ok
236
- # TODO: add some check confirming data is changed
311
+ expect do
312
+ result = run_data_shift(described_class, commit: true)
313
+ expect(result).to be_ok
314
+ end.to change(Foo, :count).by(1)
315
+ # Or for in-place updates: .to change { record.reload.bar }.from(nil).to("baz")
237
316
  end
238
317
  end
239
318
  ```
240
319
 
241
- ## Optional RuboCop cop
242
-
243
- If you use `transaction false` / `transaction :none`, you should guard writes and side effects with `dry_run?`. You can help avoid mistakes by linting that the helper is at least called once via the bundled cop:
244
-
245
- ```yaml
246
- # .rubocop.yml
247
- require:
248
- - data_shifter/rubocop
249
- ```
250
-
251
320
  ## Requirements
252
321
 
253
322
  - Ruby ≥ 3.2.1
254
- - Rails (ActiveRecord, ActiveSupport, Railties) ≥ 6.1
323
+ - Rails (ActiveRecord, ActiveSupport, Railties) ≥ 7.0
255
324
  - `axn` (Shift classes include `Axn`)
256
325
  - `ruby-progressbar` (for progress bars)
326
+ - `webmock` (for dry-run HTTP blocking; optional allowlist via `allow_external_requests [...]` / `DataShifter.config.allow_external_requests`)
@@ -0,0 +1,42 @@
1
+ # frozen_string_literal: true
2
+
3
+ module DataShifter
4
+ # Global configuration for DataShifter.
5
+ #
6
+ # Configure via:
7
+ # DataShifter.configure do |config|
8
+ # config.allow_external_requests = ["api.readonly.example.com"]
9
+ # config.suppress_repeated_logs = true
10
+ # end
11
+ #
12
+ # Or access directly:
13
+ # DataShifter.config.progress_enabled = false
14
+ class Configuration
15
+ # Hosts or regexes allowed for HTTP during dry run only (combined with per-shift allow_external_requests).
16
+ # Has no effect in commit mode — HTTP is unrestricted when dry_run is false.
17
+ attr_accessor :allow_external_requests
18
+
19
+ # Whether to suppress repeated log messages during a shift run. Default: true.
20
+ # Can be overridden per shift with `suppress_repeated_logs true/false`.
21
+ attr_accessor :suppress_repeated_logs
22
+
23
+ # Maximum unique log messages to track for deduplication. Default: 1000.
24
+ # When exceeded, entries with count == 1 are cleared first; repeated entries are kept.
25
+ attr_accessor :repeated_log_cap
26
+
27
+ # Global default for progress bar visibility. Default: true.
28
+ # Per-shift `progress true/false` overrides this.
29
+ attr_accessor :progress_enabled
30
+
31
+ # Default status print interval in seconds when ENV STATUS_INTERVAL is not set. Default: nil.
32
+ attr_accessor :status_interval_seconds
33
+
34
+ def initialize
35
+ @allow_external_requests = []
36
+ @suppress_repeated_logs = true
37
+ @repeated_log_cap = 1000
38
+ @progress_enabled = true
39
+ @status_interval_seconds = nil
40
+ end
41
+ end
42
+ end
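The `Configuration` class above is reached through `DataShifter.configure` and `DataShifter.config`, whose definitions are not part of this hunk. A common way to wire such accessors is a memoized module-level object, sketched below under that assumption. `ConfigDemo` and its trimmed-down `Configuration` are hypothetical stand-ins, not the gem's code.

```ruby
# Hypothetical sketch: the module-level `configure`/`config` methods are not
# shown in this diff, so `ConfigDemo` only illustrates a common pattern.
module ConfigDemo
  class Configuration
    attr_accessor :allow_external_requests, :progress_enabled

    def initialize
      @allow_external_requests = []
      @progress_enabled = true
    end
  end

  class << self
    # Memoized so every caller sees the same settings object.
    def config
      @config ||= Configuration.new
    end

    # Yields the shared configuration for block-style setup.
    def configure
      yield config
    end
  end
end

ConfigDemo.configure do |config|
  config.allow_external_requests = ["api.readonly.example.com"]
end
ConfigDemo.config.progress_enabled = false
```

Both access styles from the class comment (block form and direct assignment) end up mutating the same object in this sketch.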
@@ -0,0 +1,46 @@
1
+ # frozen_string_literal: true
2
+
3
+ module DataShifter
4
+ # Raised when a dry run attempts an outbound HTTP request to a host that is
5
+ # not allowed via allow_external_requests (per-shift or global config).
6
+ class ExternalRequestNotAllowedError < StandardError
7
+ def initialize(attempted_host: nil)
8
+ @attempted_host = attempted_host
9
+ super(build_message)
10
+ end
11
+
12
+ attr_reader :attempted_host
13
+
14
+ private
15
+
16
+ def build_message
17
+ intro = if @attempted_host && !@attempted_host.to_s.strip.empty?
18
+ "Dry run blocked an outbound HTTP request to #{@attempted_host}."
19
+ else
20
+ "Dry run blocked an outbound HTTP request."
21
+ end
22
+
23
+ if @attempted_host && !@attempted_host.to_s.strip.empty?
24
+ <<~MSG.strip
25
+ #{intro}
26
+
27
+ To allow this host during dry run, add to your shift class:
28
+
29
+ allow_external_requests ["#{@attempted_host}"]
30
+
31
+ Or set DataShifter.config.allow_external_requests in an initializer.
32
+ MSG
33
+ else
34
+ <<~MSG.strip
35
+ #{intro}
36
+
37
+ To allow specific hosts during dry run, add to your shift class:
38
+
39
+ allow_external_requests ["host.example.com"] # or use a regex
40
+
41
+ Or set DataShifter.config.allow_external_requests in an initializer.
42
+ MSG
43
+ end
44
+ end
45
+ end
46
+ end
@@ -18,14 +18,16 @@ module DataShifter
18
18
  end
19
19
  end
20
20
 
21
- # Parse STATUS_INTERVAL environment variable.
22
- # Returns nil if not set or invalid.
21
+ # Parse STATUS_INTERVAL environment variable, falling back to config.
22
+ # Returns nil if not set/invalid and config is nil.
23
23
  def status_interval_seconds
24
- return nil unless ENV["STATUS_INTERVAL"].present?
25
-
26
- Integer(ENV.fetch("STATUS_INTERVAL", nil), 10)
24
+ if ENV["STATUS_INTERVAL"].present?
25
+ Integer(ENV.fetch("STATUS_INTERVAL", nil), 10)
26
+ else
27
+ DataShifter.config.status_interval_seconds
28
+ end
27
29
  rescue ArgumentError
28
- nil
30
+ DataShifter.config.status_interval_seconds
29
31
  end
30
32
 
31
33
  # Get CONTINUE_FROM environment variable value.
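The lookup order in the revised `status_interval_seconds` (a valid `STATUS_INTERVAL` wins; a missing or unparseable value falls back to the configured default) can be restated as a stand-alone function. This is a sketch in plain Ruby rather than the gem's method: `resolve_status_interval`, `env`, and `fallback` are illustrative names, and the blank check replaces ActiveSupport's `present?`.

```ruby
# Hypothetical stand-alone version of the lookup order shown in the hunk above.
# `env` is any Hash-like object (e.g. ENV); `fallback` plays the role of
# the configured default.
def resolve_status_interval(env, fallback)
  raw = env["STATUS_INTERVAL"]
  return fallback if raw.nil? || raw.strip.empty?

  Integer(raw, 10) # raises ArgumentError for values like "abc" or "1.5"
rescue ArgumentError
  fallback
end

from_env    = resolve_status_interval({ "STATUS_INTERVAL" => "30" }, 5)
from_config = resolve_status_interval({}, 5)
invalid     = resolve_status_interval({ "STATUS_INTERVAL" => "abc" }, 5)
```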
@@ -0,0 +1,149 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "digest"
4
+ require "logger"
5
+
6
+ module DataShifter
7
+ module Internal
8
+ # A proxy logger that suppresses repeated log messages during a shift run.
9
+ # Uses a hash of the message as the key for memory efficiency.
10
+ # First occurrence is forwarded; subsequent occurrences are counted but not forwarded.
11
+ # At the end, prints a summary of suppressed messages via puts.
12
+ class LogDeduplicator
13
+ attr_reader :real_logger, :cap, :seen
14
+
15
+ def initialize(real_logger, cap:)
16
+ @real_logger = real_logger
17
+ @cap = cap
18
+ @seen = {}
19
+ end
20
+
21
+ def add(severity, message = nil, progname = nil, &block)
22
+ msg = block ? block.call : message
23
+ key = message_key(severity, progname, msg)
24
+
25
+ if @seen.key?(key)
26
+ @seen[key][:count] += 1
27
+ nil
28
+ else
29
+ enforce_cap
30
+ @seen[key] = { count: 1, message: truncate_message(msg || progname), severity: }
31
+ @real_logger.add(severity, message, progname, &block)
32
+ end
33
+ end
34
+
35
+ def debug(message = nil, progname = nil, &)
36
+ add(Logger::DEBUG, message, progname, &)
37
+ end
38
+
39
+ def info(message = nil, progname = nil, &)
40
+ add(Logger::INFO, message, progname, &)
41
+ end
42
+
43
+ def warn(message = nil, progname = nil, &)
44
+ add(Logger::WARN, message, progname, &)
45
+ end
46
+
47
+ def error(message = nil, progname = nil, &)
48
+ add(Logger::ERROR, message, progname, &)
49
+ end
50
+
51
+ def fatal(message = nil, progname = nil, &)
52
+ add(Logger::FATAL, message, progname, &)
53
+ end
54
+
55
+ def unknown(message = nil, progname = nil, &)
56
+ add(Logger::UNKNOWN, message, progname, &)
57
+ end
58
+
59
+ def <<(msg)
60
+ key = message_key(Logger::INFO, nil, msg)
61
+ if @seen.key?(key)
62
+ @seen[key][:count] += 1
63
+ else
64
+ enforce_cap
65
+ @seen[key] = { count: 1, message: truncate_message(msg), severity: Logger::INFO }
66
+ @real_logger << msg
67
+ end
68
+ end
69
+
70
+ def level
71
+ @real_logger.level
72
+ end
73
+
74
+ def level=(val)
75
+ @real_logger.level = val
76
+ end
77
+
78
+ def formatter
79
+ @real_logger.formatter
80
+ end
81
+
82
+ def formatter=(val)
83
+ @real_logger.formatter = val
84
+ end
85
+
86
+ def close
87
+ @real_logger.close
88
+ end
89
+
90
+ def suppressed_messages
91
+ @seen.select { |_k, v| v[:count] > 1 }
92
+ end
93
+
94
+ def print_summary
95
+ suppressed = suppressed_messages
96
+ return if suppressed.empty?
97
+
98
+ puts "\n[DataShifter] Suppressed repeated log messages:"
99
+ suppressed.each_value do |entry|
100
+ count = entry[:count] - 1
101
+ snippet = entry[:message].to_s[0, 100]
102
+ snippet = "#{snippet}..." if entry[:message].to_s.length > 100
103
+ puts " #{count}x suppressed: #{snippet.inspect}"
104
+ end
105
+ end
106
+
107
+ def method_missing(method, ...)
108
+ @real_logger.send(method, ...)
109
+ end
110
+
111
+ def respond_to_missing?(method, include_private = false)
112
+ @real_logger.respond_to?(method, include_private) || super
113
+ end
114
+
115
+ class << self
116
+ def with_deduplicating_logger(real_logger, cap:)
117
+ proxy = new(real_logger, cap:)
118
+ yield proxy
119
+ ensure
120
+ proxy&.print_summary
121
+ end
122
+ end
123
+
124
+ private
125
+
126
+ def message_key(severity, progname, message)
127
+ normalized = "#{severity}:#{progname}:#{message}"
128
+ Digest::SHA256.hexdigest(normalized)
129
+ end
130
+
131
+ def truncate_message(msg)
132
+ str = msg.to_s
133
+ str.length > 200 ? "#{str[0, 200]}..." : str
134
+ end
135
+
136
+ def enforce_cap
137
+ return if @seen.size < @cap
138
+
139
+ singles = @seen.select { |_k, v| v[:count] == 1 }
140
+ singles.each_key { |k| @seen.delete(k) } if singles.any?
141
+
142
+ return unless @seen.size >= @cap
143
+
144
+ oldest_key = @seen.keys.first
145
+ @seen.delete(oldest_key)
146
+ end
147
+ end
148
+ end
149
+ end
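One detail worth noting when reading `add` above: for a first occurrence, the block is called once to build the deduplication key and then passed on to the real logger, which evaluates it again when it builds the message. The sketch below shows a variant that evaluates the block once and forwards the resulting string. It is a simplified illustration, not the gem's class: `OnceEvalDeduplicator` is a hypothetical name, it handles only `info`, and it omits the cap and summary logic.

```ruby
require "logger"
require "stringio"

# Hypothetical sketch, not the gem's class: a deduplicating proxy that
# evaluates a message block once and forwards the resulting string.
class OnceEvalDeduplicator
  attr_reader :counts

  def initialize(real_logger)
    @real_logger = real_logger
    @counts = Hash.new(0) # [severity, text] => occurrences
  end

  def info(message = nil, &block)
    text = block ? block.call : message # block runs exactly once
    key = [Logger::INFO, text.to_s]
    @counts[key] += 1
    # Forward only the first occurrence, passing the already-built string.
    @real_logger.add(Logger::INFO, text) if @counts[key] == 1
    nil
  end
end

io = StringIO.new
proxy = OnceEvalDeduplicator.new(Logger.new(io))
block_calls = 0
3.times { proxy.info { block_calls += 1; "cache miss" } }
```

Here the underlying logger receives the message once while the proxy records three occurrences, and the block runs once per call.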
@@ -11,6 +11,8 @@ module DataShifter
11
11
  none: "none",
12
12
  }.freeze
13
13
 
14
+ SKIP_REASONS_DISPLAY_LIMIT = 10
15
+
14
16
  module_function
15
17
 
16
18
  def print_header(io:, shift_class:, total:, label:, dry_run:, transaction_mode:, status_interval:)
@@ -30,7 +32,7 @@ module DataShifter
30
32
  io.puts ""
31
33
  end
32
34
 
33
- def print_summary(io:, stats:, errors:, start_time:, dry_run:, transaction_mode:, interrupted:, task_name:, last_successful_id:)
35
+ def print_summary(io:, stats:, errors:, start_time:, dry_run:, transaction_mode:, interrupted:, task_name:, last_successful_id:, skip_reasons: {})
34
36
  return unless start_time
35
37
 
36
38
  elapsed = (Time.current - start_time).round(1)
@@ -43,6 +45,7 @@ module DataShifter
43
45
  io.puts "Succeeded: #{stats[:succeeded]}"
44
46
  io.puts "Failed: #{stats[:failed]}"
45
47
  io.puts "Skipped: #{stats[:skipped]}"
48
+ print_skip_reasons(io:, skip_reasons:) if skip_reasons.any?
46
49
 
47
50
  print_errors(io:, errors:) if errors.any?
48
51
  print_interrupt_warning(io:, transaction_mode:, dry_run:) if interrupted
@@ -52,7 +55,7 @@ module DataShifter
52
55
  io.puts "=" * 60
53
56
  end
54
57
 
55
- def print_progress(io:, stats:, errors:, start_time:, status_interval:)
58
+ def print_progress(io:, stats:, errors:, start_time:, status_interval:, skip_reasons: {})
56
59
  return unless start_time
57
60
 
58
61
  elapsed = (Time.current - start_time).round(1)
@@ -74,6 +77,7 @@ module DataShifter
74
77
  io.puts "Succeeded: #{stats[:succeeded]}"
75
78
  io.puts "Failed: #{stats[:failed]}"
76
79
  io.puts "Skipped: #{stats[:skipped]}"
80
+ print_skip_reasons(io:, skip_reasons:) if skip_reasons.any?
77
81
 
78
82
  print_errors(io:, errors:) if errors.any?
79
83
 
@@ -85,7 +89,9 @@ module DataShifter
85
89
  io.puts ""
86
90
  io.puts "ERRORS:"
87
91
  errors.each do |err|
88
- io.puts " #{err[:record]}: #{err[:error]}"
92
+ lines = err[:error].to_s.split("\n")
93
+ io.puts " #{err[:record]}: #{lines.first}"
94
+ lines.drop(1).each { |line| io.puts " #{line}" }
89
95
  err[:backtrace]&.each { |line| io.puts " #{line}" }
90
96
  end
91
97
  end
@@ -145,6 +151,14 @@ module DataShifter
145
151
  status_tips.join(" or ")
146
152
  end
147
153
  end
154
+
155
+ def print_skip_reasons(io:, skip_reasons:)
156
+ return if skip_reasons.empty?
157
+
158
+ top = skip_reasons.sort_by { |_reason, count| -count }.first(SKIP_REASONS_DISPLAY_LIMIT)
159
+ formatted = top.map { |reason, count| "\"#{reason}\" (#{count})" }.join(", ")
160
+ io.puts " #{formatted}"
161
+ end
148
162
  end
149
163
  end
150
164
  end
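The top-N formatting in `print_skip_reasons` can be tried in isolation with the stand-alone restatement below. `format_skip_reasons` and its `limit:` keyword are illustrative names, and the sample counts are made up.

```ruby
# Stand-alone sketch of the top-N formatting used by `print_skip_reasons`.
# Sorts reasons by descending count, keeps the first `limit`, and joins them.
def format_skip_reasons(skip_reasons, limit: 10)
  top = skip_reasons.sort_by { |_reason, count| -count }.first(limit)
  top.map { |reason, count| "\"#{reason}\" (#{count})" }.join(", ")
end

summary = format_skip_reasons(
  { "not eligible" => 3, "already done" => 42, "archived" => 1 },
  limit: 2
)
```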
@@ -0,0 +1,120 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "uri"
4
+
5
+ module DataShifter
6
+ module Internal
7
+ # Applies and restores side-effect guards during dry runs so that HTTP, mail,
8
+ # and job enqueues are blocked (or faked) unless explicitly allowed.
9
+ #
10
+ # Production impact:
11
+ # - WebMock: required only when apply_webmock runs (i.e. during a dry run), so commit-only
12
+ # production runs never load WebMock. On restore we revert to the previous state (enable!
13
+ # or disable!) so e.g. specs that had WebMock enabled are not left with it disabled.
14
+ # - ActionMailer / ActiveJob / Sidekiq: no extra loading; we only toggle existing config
15
+ # for the duration of the block and restore in ensure, so impact is scoped to the run.
16
+ module SideEffectGuards
17
+ class << self
18
+ # Applies side-effect guards, yields, then restores. Call only when running in dry run.
19
+ def with_guards(shift_class:, &block)
20
+ saved = {}
21
+ apply_guards(shift_class, saved)
22
+ block.call
23
+ rescue webmock_net_connect_error => e
24
+ host = extract_host_from_webmock_message(e.message)
25
+ raise DataShifter::ExternalRequestNotAllowedError.new(attempted_host: host), cause: e
26
+ ensure
27
+ restore_guards(saved) if saved.any?
28
+ end
29
+
30
+ private
31
+
32
+ def apply_guards(shift_class, saved)
33
+ apply_webmock(shift_class, saved)
34
+ # rubocop:disable Style/CombinableDefined -- parent must be checked first to avoid NameError when constant not loaded
35
+ apply_action_mailer(saved) if defined?(ActionMailer) && defined?(ActionMailer::Base)
36
+ apply_active_job(saved) if defined?(ActiveJob) && defined?(ActiveJob::Base)
37
+ apply_sidekiq(saved) if defined?(Sidekiq) && defined?(Sidekiq::Testing)
38
+ # rubocop:enable Style/CombinableDefined
39
+ end
40
+
41
+ def apply_webmock(shift_class, saved)
42
+ if defined?(WebMock)
43
+ # WebMock already loaded (e.g. in specs); capture so we can restore
44
+ saved[:webmock_was_enabled] = net_http_webmock_enabled?
45
+ else
46
+ require "webmock"
47
+ saved[:webmock_was_enabled] = false
48
+ end
49
+ WebMock.enable!
50
+ allowed = allowed_net_hosts(shift_class)
51
+ opts = allowed.any? ? { allow: allowed } : {}
52
+ WebMock.disable_net_connect!(**opts)
53
+ saved[:webmock] = true
54
+ end
55
+
56
+ def net_http_webmock_enabled?
57
+ Net::HTTP.socket_type.to_s.include?("StubSocket")
58
+ rescue StandardError
59
+ false
60
+ end
61
+
62
+ def allowed_net_hosts(shift_class)
63
+ per_shift = shift_class.respond_to?(:_allow_external_requests) ? shift_class._allow_external_requests : []
64
+ global = DataShifter.config.allow_external_requests
65
+ Array(per_shift) + Array(global)
66
+ end
67
+
68
+ def webmock_net_connect_error
69
+ return WebMock::NetConnectNotAllowedError if defined?(WebMock::NetConnectNotAllowedError)
70
+
71
+ Class.new(StandardError) # never matched when WebMock not loaded
72
+ end
73
+
74
+ def extract_host_from_webmock_message(message)
75
+ return nil unless message.is_a?(String)
76
+
77
+ # WebMock format: "Unregistered request: GET https://host/path with headers ..."
78
+ m = message.match(%r{Unregistered request: \w+ (https?://[^\s]+)})
79
+ return nil unless m
80
+
81
+ uri = URI.parse(m[1])
82
+ uri.host
83
+ rescue URI::InvalidURIError, ArgumentError
84
+ nil
85
+ end
86
+
87
+ def apply_action_mailer(saved)
88
+ saved[:action_mailer_perform_deliveries] = ActionMailer::Base.perform_deliveries
89
+ ActionMailer::Base.perform_deliveries = false
90
+ end
91
+
92
+ def apply_active_job(saved)
93
+ saved[:active_job_adapter] = ActiveJob::Base.queue_adapter
94
+ ActiveJob::Base.queue_adapter = :test
95
+ end
96
+
97
+ def apply_sidekiq(saved)
98
+ return unless Sidekiq::Testing.respond_to?(:fake!)
99
+
100
+ Sidekiq::Testing.fake!
101
+ saved[:sidekiq] = true
102
+ end
103
+
104
+ def restore_guards(saved)
105
+ if saved.delete(:webmock)
106
+ (saved.delete(:webmock_was_enabled) ? WebMock.enable! : WebMock.disable!)
107
+ end
108
+
109
+ ActionMailer::Base.perform_deliveries = saved.delete(:action_mailer_perform_deliveries) if saved.key?(:action_mailer_perform_deliveries)
110
+
111
+ ActiveJob::Base.queue_adapter = saved.delete(:active_job_adapter) if saved.key?(:active_job_adapter)
112
+
113
+ return unless saved.delete(:sidekiq)
114
+
115
+ Sidekiq::Testing.disable!
116
+ end
117
+ end
118
+ end
119
+ end
120
+ end
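The `extract_host_from_webmock_message` helper above pulls the blocked host out of a WebMock error message. A minimal standalone sketch of that parsing step follows, using only the standard library. The method name `host_from_unregistered_request` and the sample message are illustrative assumptions; the sample follows the format quoted in the diff's own comment, and none of this is the gem's actual code.

```ruby
require "uri"

# Hypothetical standalone helper that reuses the regex from the diff above.
# Returns the host of the first blocked URL, or nil if none can be parsed.
def host_from_unregistered_request(message)
  return nil unless message.is_a?(String)

  # Expected shape: "Unregistered request: GET https://host/path with headers ..."
  match = message.match(%r{Unregistered request: \w+ (https?://[^\s]+)})
  return nil unless match

  URI.parse(match[1]).host
rescue URI::InvalidURIError, ArgumentError
  nil
end

# Sample message (assumed format, for illustration only).
sample = "Real HTTP connections are disabled. " \
         "Unregistered request: GET https://api.example.com/v1/users with headers {'Accept'=>'*/*'}"

puts host_from_unregistered_request(sample).inspect
puts host_from_unregistered_request("connection refused").inspect
```

Returning `nil` on any parse failure keeps the helper safe to call from error-reporting code, where raising a second exception would hide the original one.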
@@ -7,6 +7,8 @@ require_relative "internal/output"
7
7
  require_relative "internal/signal_handler"
8
8
  require_relative "internal/record_utils"
9
9
  require_relative "internal/progress_bar"
10
+ require_relative "internal/side_effect_guards"
11
+ require_relative "internal/log_deduplicator"
10
12
 
11
13
  # Base class for data shifts. Dry-run by default, progress bars, transaction modes, consistent summaries.
12
14
  #
@@ -30,15 +32,15 @@ require_relative "internal/progress_bar"
30
32
  # Running:
31
33
  # - `rake data:shift:backfill_foo` (dry run by default)
32
34
  # - `COMMIT=1 rake data:shift:backfill_foo` (apply changes)
33
- # - Or call directly: `MyShift.call(dry_run: false)` (Axn semantics) - but note default location not auto-loaded
35
+ # - Or technically can call directly: `MyShift.call(dry_run: false)` (Axn semantics) - BUT:
36
+ # NOTES: default location not auto-loaded, and in general it is strongly recommended to use the rake task.
34
37
  #
35
38
  # Transaction modes (set at class level with `transaction`):
36
39
  # - `transaction :single` (default): one transaction for the whole run (all-or-nothing).
37
40
  # - `transaction :per_record`: each record in its own transaction.
38
- # - `transaction false`: no automatic transactions; guard writes with `return if dry_run?`.
41
+ # - `transaction false`: no automatic transaction in commit mode; in dry run we still wrap in a rollback transaction.
39
42
  #
40
- # Dry run: In `:single` and `:per_record`, dry_run rolls back DB changes automatically.
41
- # Non-DB side effects are not rolled back; guard with `return if dry_run?` / `return unless dry_run?`.
43
+ # Dry run: DB changes are always rolled back (we wrap in a transaction and raise Rollback). Guard non-DB side effects with `return if dry_run?`.
42
44
  #
43
45
  # Fixed list of IDs (fail fast): Use find_exactly!(Model, [id1, id2, ...]) in `collection`.
44
46
  # Large collections: Return an ActiveRecord::Relation and iteration uses `find_each`.
@@ -51,16 +53,25 @@ module DataShifter
51
53
 
52
54
  log_calls false if respond_to?(:log_calls)
53
55
 
56
+ around :_with_log_deduplication
57
+ around :_with_side_effect_guards
54
58
  around :_with_transaction_for_dry_run
55
59
  before :_reset_tracking
56
60
  on_success :_print_summary
57
61
  on_error :_print_summary
58
62
 
59
63
  class_attribute :_transaction_mode, default: :single
60
- class_attribute :_progress_enabled, default: true
64
+ class_attribute :_progress_enabled, default: nil
61
65
  class_attribute :_description, default: nil
62
66
  class_attribute :_task_name, default: nil
63
67
  class_attribute :_throttle_interval, default: nil
68
+ class_attribute :_allow_external_requests, default: [], instance_accessor: false
69
+ class_attribute :_suppress_repeated_logs, default: nil, instance_accessor: false
70
+
71
+ # Internal exception used by skip! to abort the current process_record.
72
+ # Rescued in _process_one; not propagated.
73
+ class SkipRecord < StandardError; end
74
+ private_constant :SkipRecord
64
75
 
65
76
  class << self
66
77
  def description(text = nil)
@@ -104,6 +115,19 @@ module DataShifter
104
115
  self._throttle_interval = interval
105
116
  end
106
117
 
118
+ # Allow these hosts (or regexes) for HTTP during dry run only. Combines with DataShifter.config.allow_external_requests.
119
+ # Has no effect in commit mode — HTTP is unrestricted when dry_run is false.
120
+ # Example: allow_external_requests ["api.readonly.example.com", %r{\.internal\.company\z}]
121
+ def allow_external_requests(hosts)
122
+ self._allow_external_requests = Array(hosts)
123
+ end
124
+
125
+ # Enable/disable log deduplication for this shift. Overrides DataShifter.config.suppress_repeated_logs.
126
+ # Example: suppress_repeated_logs false
127
+ def suppress_repeated_logs(enabled)
128
+ self._suppress_repeated_logs = !!enabled
129
+ end
130
+
107
131
  def run!
108
132
  dry_run = Internal::Env.dry_run?
109
133
  result = call(dry_run:)
@@ -133,8 +157,9 @@ module DataShifter
133
157
 
134
158
  def skip!(reason = nil)
135
159
  @stats[:skipped] += 1
136
- @stats[:succeeded] -= 1
137
- log " SKIP: #{reason}" if reason
160
+ key = reason.to_s.presence || "(no reason given)"
161
+ @skip_reasons[key] += 1
162
+ raise SkipRecord
138
163
  end
139
164
 
140
165
  def log(message)
@@ -145,24 +170,61 @@ module DataShifter
145
170
 
146
171
  # --- Axn lifecycle hooks ---
147
172
 
173
+ def _with_log_deduplication(chain)
174
+ effective = self.class._suppress_repeated_logs.nil? ? DataShifter.config.suppress_repeated_logs : self.class._suppress_repeated_logs
175
+ unless effective && defined?(::Rails) && ::Rails.respond_to?(:logger) && ::Rails.logger
176
+ chain.call
177
+ return
178
+ end
179
+
180
+ original_logger = ::Rails.logger
181
+ original_ar_logger = ::ActiveRecord::Base.logger
182
+
183
+ Internal::LogDeduplicator.with_deduplicating_logger(original_logger, cap: DataShifter.config.repeated_log_cap) do |proxy|
184
+ ::Rails.logger = proxy
185
+ ::ActiveRecord::Base.logger = proxy
186
+ chain.call
187
+ end
188
+ ensure
189
+ if effective && defined?(::Rails) && ::Rails.respond_to?(:logger=)
190
+ ::Rails.logger = original_logger
191
+ ::ActiveRecord::Base.logger = original_ar_logger
192
+ end
193
+ end
194
+
195
+ def _with_side_effect_guards(chain)
196
+ if dry_run?
197
+ Internal::SideEffectGuards.with_guards(shift_class: self.class) { chain.call }
198
+ else
199
+ chain.call
200
+ end
201
+ end
202
+
148
203
  def _with_transaction_for_dry_run(chain)
149
204
  if _transaction_mode == :none
150
- chain.call
205
+ if dry_run?
206
+ ::ActiveRecord::Base.transaction do
207
+ chain.call
208
+ raise ::ActiveRecord::Rollback
209
+ end
210
+ else
211
+ chain.call
212
+ end
151
213
  return
152
214
  end
153
215
 
154
216
  if _transaction_mode == :single
155
- ActiveRecord::Base.transaction do
217
+ ::ActiveRecord::Base.transaction do
156
218
  chain.call
157
- raise ActiveRecord::Rollback if dry_run?
219
+ raise ::ActiveRecord::Rollback if dry_run?
158
220
  end
159
221
  return
160
222
  end
161
223
 
162
224
  if dry_run?
163
- ActiveRecord::Base.transaction do
225
+ ::ActiveRecord::Base.transaction do
164
226
  chain.call
165
- raise ActiveRecord::Rollback
227
+ raise ::ActiveRecord::Rollback
166
228
  end
167
229
  else
168
230
  chain.call
@@ -172,6 +234,7 @@ module DataShifter
172
234
  def _reset_tracking
173
235
  @stats = { processed: 0, succeeded: 0, failed: 0, skipped: 0 }
174
236
  @errors = []
237
+ @skip_reasons = Hash.new(0)
175
238
  @start_time = Time.current
176
239
  @last_status_print = @start_time
177
240
  @_data_shift_interrupted = false
@@ -183,6 +246,7 @@ module DataShifter
183
246
  io: $stdout,
184
247
  stats: @stats,
185
248
  errors: @errors,
249
+ skip_reasons: @skip_reasons,
186
250
  start_time: @start_time,
187
251
  dry_run: dry_run?,
188
252
  transaction_mode: _transaction_mode,
@@ -209,6 +273,7 @@ module DataShifter
209
273
  io: $stdout,
210
274
  stats: @stats,
211
275
  errors: @errors,
276
+ skip_reasons: @skip_reasons,
212
277
  start_time: @start_time,
213
278
  status_interval: Internal::Env.status_interval_seconds,
214
279
  )
@@ -275,18 +340,19 @@ module DataShifter
275
340
  # --- Transaction execution strategies ---
276
341
 
277
342
  def _run_in_single_transaction(enum, total, &block)
278
- ActiveRecord::Base.transaction do
343
+ ::ActiveRecord::Base.transaction do
279
344
  _iterate(enum, total, &block)
280
345
  if dry_run?
281
346
  log "\nDry run complete — rolling back all changes."
282
- raise ActiveRecord::Rollback
347
+ raise ::ActiveRecord::Rollback
283
348
  end
284
349
  end
285
350
  rescue StandardError => e
286
351
  return if @errors.any?
287
352
 
288
353
  @stats[:failed] += 1
289
- @errors << { record: "transaction", error: e.message, backtrace: e.backtrace&.first(3) }
354
+ error_text = _format_error(e)
355
+ @errors << { record: "transaction", error: error_text, backtrace: e.backtrace&.first(3) }
290
356
  end
291
357
 
292
358
  def _run_per_record(enum, total, &)
@@ -294,7 +360,7 @@ module DataShifter
294
360
  if dry_run?
295
361
  yield record
296
362
  else
297
- ActiveRecord::Base.transaction { yield record }
363
+ ::ActiveRecord::Base.transaction { yield record }
298
364
  end
299
365
  end
300
366
  end
@@ -304,7 +370,8 @@ module DataShifter
304
370
  end
305
371
 
306
372
  def _iterate(enum, total)
307
- bar = Internal::ProgressBar.create(total:, dry_run: dry_run?, enabled: _progress_enabled)
373
+ progress_on = _progress_enabled.nil? ? DataShifter.config.progress_enabled : _progress_enabled
374
+ bar = Internal::ProgressBar.create(total:, dry_run: dry_run?, enabled: progress_on)
308
375
  if enum.respond_to?(:find_each)
309
376
  enum.find_each do |record|
310
377
  _process_one(record) { yield record }
@@ -325,11 +392,15 @@ module DataShifter
325
392
  yield
326
393
  @stats[:succeeded] += 1
327
394
  @_last_successful_id = record.id if record.respond_to?(:id)
395
+ rescue SkipRecord
396
+ # skip! already incremented @stats[:skipped] and recorded the reason; just continue
397
+ nil
328
398
  rescue StandardError => e
329
399
  @stats[:failed] += 1
330
400
  identifier = Internal::RecordUtils.identifier(record)
331
- @errors << { record: identifier, error: e.message, backtrace: e.backtrace&.first(3) }
332
- log "ERROR #{identifier}: #{e.message}"
401
+ error_text = _format_error(e)
402
+ @errors << { record: identifier, error: error_text, backtrace: e.backtrace&.first(3) }
403
+ _log_error(identifier, error_text)
333
404
 
334
405
  raise if _transaction_mode == :single
335
406
  ensure
@@ -345,6 +416,18 @@ module DataShifter
345
416
  _print_progress
346
417
  end
347
418
 
419
+ def _format_error(e)
420
+ msg = e.message.to_s
421
+ msg += "\n Caused by: #{e.cause.class}: #{e.cause.message}" if e.respond_to?(:cause) && e.cause
422
+ msg
423
+ end
424
+
425
+ def _log_error(identifier, error_text)
426
+ lines = error_text.to_s.split("\n")
427
+ log "ERROR #{identifier}: #{lines.first}"
428
+ lines.drop(1).each { |line| log " #{line}" }
429
+ end
430
+
348
431
  # --- Output helpers ---
349
432
 
350
433
  def _print_header(total)
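The `skip!` change in the hunks above replaces the old "decrement succeeded" bookkeeping with an internal exception that aborts the current record and is rescued by the record loop. The sketch below shows that control flow in isolation. `TinyRunner` and `process_all` are hypothetical names, the `processed` counter placement is an assumption, and plain Ruby stands in for ActiveSupport's `presence`, so whitespace-only reasons are handled differently than in the gem.

```ruby
# Hypothetical, simplified stand-in for the shift's record loop; not the gem's code.
class SkipRecord < StandardError; end

class TinyRunner
  attr_reader :stats, :skip_reasons

  def initialize
    @stats = { processed: 0, succeeded: 0, failed: 0, skipped: 0 }
    @skip_reasons = Hash.new(0) # reason text => number of skips
  end

  # Records the skip, then aborts the current record by raising.
  def skip!(reason = nil)
    @stats[:skipped] += 1
    key = reason.to_s.empty? ? "(no reason given)" : reason.to_s
    @skip_reasons[key] += 1
    raise SkipRecord
  end

  def process_all(records)
    records.each do |record|
      @stats[:processed] += 1
      begin
        yield record
        @stats[:succeeded] += 1 # only reached if the block neither skipped nor failed
      rescue SkipRecord
        # skip! already updated the counters; move on to the next record
      rescue StandardError
        @stats[:failed] += 1
      end
    end
  end
end

runner = TinyRunner.new
runner.process_all([1, 2, 3, 4]) do |n|
  runner.skip!("even") if n.even?
  raise ArgumentError, "three is bad" if n == 3
end

p runner.stats
p runner.skip_reasons
```

Because `SkipRecord` inherits from `StandardError`, its `rescue` clause has to come before the generic one, which matches the ordering in the diff.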
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module DataShifter
4
- VERSION = "0.1.0"
4
+ VERSION = "0.2.0"
5
5
  end
data/lib/data_shifter.rb CHANGED
@@ -1,5 +1,26 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  require_relative "data_shifter/version"
4
+ require_relative "data_shifter/configuration"
5
+ require_relative "data_shifter/errors"
4
6
  require_relative "data_shifter/shift"
5
7
  require_relative "data_shifter/railtie"
8
+
9
+ module DataShifter
10
+ class << self
11
+ # Returns the global configuration instance.
12
+ def config
13
+ @config ||= Configuration.new
14
+ end
15
+
16
+ # Yields the configuration for block-style setup.
17
+ #
18
+ # DataShifter.configure do |c|
19
+ # c.allow_external_requests = ["api.readonly.example.com"]
20
+ # c.suppress_repeated_logs = false
21
+ # end
22
+ def configure
23
+ yield config
24
+ end
25
+ end
26
+ end
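The `config` / `configure` pair above is a memoized module-level settings object with block-style setup. The sketch below reproduces the pattern in a self-contained form. The real `Configuration` class is not shown in this section, so the `TinyShifter` module, the attribute defaults, and the `effective_setting` helper are assumptions for illustration. The helper mirrors the "nil means inherit the global value" rule that the shift uses for `_suppress_repeated_logs` and `_progress_enabled`.

```ruby
module TinyShifter
  # Hypothetical settings holder; defaults here are assumed, not the gem's.
  class Configuration
    attr_accessor :allow_external_requests, :suppress_repeated_logs

    def initialize
      @allow_external_requests = []
      @suppress_repeated_logs = true
    end
  end

  class << self
    # Lazily builds and memoizes a single configuration instance.
    def config
      @config ||= Configuration.new
    end

    # Yields the configuration for block-style setup.
    def configure
      yield config
    end

    # A nil per-shift value falls back to the global setting.
    def effective_setting(per_shift_value, global_value)
      per_shift_value.nil? ? global_value : per_shift_value
    end
  end
end

TinyShifter.configure do |c|
  c.allow_external_requests = ["api.readonly.example.com"]
  c.suppress_repeated_logs = false
end

global = TinyShifter.config.suppress_repeated_logs
puts TinyShifter.effective_setting(nil, global).inspect
puts TinyShifter.effective_setting(true, global).inspect
```

Using `nil` as the "not set" marker is what lets a shift explicitly override a global `true` with `false`; a plain `||` fallback would not allow that.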
@@ -83,6 +83,7 @@ class DataShiftGenerator < Rails::Generators::NamedBase
83
83
  underscored_name = name.underscore
84
84
  record_arg = @model_name.present? ? @model_name.underscore : "record"
85
85
 
86
+ model_for_change = @model_name.present? ? @model_name : "Model"
86
87
  create_file "spec/lib/data_shifts/#{underscored_name}_spec.rb", <<~RUBY
87
88
  # frozen_string_literal: true
88
89
 
@@ -94,22 +95,24 @@ class DataShiftGenerator < Rails::Generators::NamedBase
94
95
 
95
96
  before { allow($stdout).to receive(:puts) }
96
97
 
97
- # TODO: Set up test records
98
+ # Set up test records as needed
98
99
  # let(:#{record_arg}) { create(:#{record_arg}) }
99
100
 
100
101
  describe "dry run" do
101
102
  it "does not persist changes" do
102
- result = run_data_shift(described_class, dry_run: true)
103
- expect(result).to be_ok
104
- # TODO: Assert that records are unchanged
103
+ expect do
104
+ result = run_data_shift(described_class, dry_run: true)
105
+ expect(result).to be_ok
106
+ end.not_to change(#{model_for_change}, :count)
105
107
  end
106
108
  end
107
109
 
108
110
  describe "commit" do
109
111
  it "applies changes" do
110
- result = run_data_shift(described_class, commit: true)
111
- expect(result).to be_ok
112
- # TODO: Assert that records are updated
112
+ expect do
113
+ result = run_data_shift(described_class, commit: true)
114
+ expect(result).to be_ok
115
+ end.to change(#{model_for_change}, :count)
113
116
  end
114
117
  end
115
118
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: data_shifter
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0
4
+ version: 0.2.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Kali Donovan
@@ -85,6 +85,20 @@ dependencies:
85
85
  - - ">="
86
86
  - !ruby/object:Gem::Version
87
87
  version: '1.13'
88
+ - !ruby/object:Gem::Dependency
89
+ name: webmock
90
+ requirement: !ruby/object:Gem::Requirement
91
+ requirements:
92
+ - - ">="
93
+ - !ruby/object:Gem::Version
94
+ version: '3.18'
95
+ type: :runtime
96
+ prerelease: false
97
+ version_requirements: !ruby/object:Gem::Requirement
98
+ requirements:
99
+ - - ">="
100
+ - !ruby/object:Gem::Version
101
+ version: '3.18'
88
102
  description: 'DataShifter: backfills and one-off fixes as rake tasks. Dry run by default,
89
103
  auto rollback, progress bars, consistent summaries.'
90
104
  email:
@@ -95,22 +109,25 @@ extra_rdoc_files: []
95
109
  files:
96
110
  - ".husky/pre-commit"
97
111
  - ".lintstagedrc"
112
+ - CHANGELOG.md
98
113
  - LICENSE.txt
99
114
  - README.md
100
115
  - Rakefile
101
116
  - lib/data_shifter.rb
117
+ - lib/data_shifter/configuration.rb
118
+ - lib/data_shifter/errors.rb
102
119
  - lib/data_shifter/internal/env.rb
120
+ - lib/data_shifter/internal/log_deduplicator.rb
103
121
  - lib/data_shifter/internal/output.rb
104
122
  - lib/data_shifter/internal/progress_bar.rb
105
123
  - lib/data_shifter/internal/record_utils.rb
124
+ - lib/data_shifter/internal/side_effect_guards.rb
106
125
  - lib/data_shifter/internal/signal_handler.rb
107
126
  - lib/data_shifter/railtie.rb
108
- - lib/data_shifter/rubocop.rb
109
127
  - lib/data_shifter/shift.rb
110
128
  - lib/data_shifter/spec_helper.rb
111
129
  - lib/data_shifter/version.rb
112
130
  - lib/generators/data_shift_generator.rb
113
- - lib/rubocop/cop/data_shifter/skip_transaction_guard_dry_run.rb
114
131
  homepage: https://github.com/teamshares/data_shifter
115
132
  licenses:
116
133
  - MIT
@@ -1,4 +0,0 @@
1
- # frozen_string_literal: true
2
-
3
- require "rubocop"
4
- require "rubocop/cop/data_shifter/skip_transaction_guard_dry_run"
@@ -1,55 +0,0 @@
1
- # frozen_string_literal: true
2
-
3
- module RuboCop
4
- module Cop
5
- module DataShifter
6
- # In data shift files, `transaction false` disables automatic transaction
7
- # and rollback. DB writes (and side effects) are not rolled back on dry run, so
8
- # the shift must guard them with `return if dry_run?` or `return unless dry_run?`.
9
- #
10
- # @example
11
- # # bad
12
- # class BackfillUsers < DataShifter::Shift
13
- # transaction false
14
- # def process_record(record)
15
- # record.update!(foo: 1)
16
- # end
17
- # end
18
- #
19
- # # good
20
- # class BackfillUsers < DataShifter::Shift
21
- # transaction false
22
- # def process_record(record)
23
- # return if dry_run?
24
- # record.update!(foo: 1)
25
- # end
26
- # end
27
- class SkipTransactionGuardDryRun < Base
28
- MSG = "Data shifts using `transaction false` must guard writes/side effects with " \
29
- "`return if dry_run?` or `return unless dry_run?`."
30
-
31
- def_node_matcher :skip_transaction_call?, <<~PATTERN
32
- (send _ :transaction {(sym :none) (false)})
33
- PATTERN
34
-
35
- def on_send(node)
36
- return unless skip_transaction_call?(node)
37
- return if file_contains_dry_run_guard?
38
-
39
- add_offense(node, message: MSG)
40
- end
41
-
42
- private
43
-
44
- def file_contains_dry_run_guard?
45
- return true unless processed_source.ast
46
-
47
- processed_source.ast.each_node(:send) do |send_node|
48
- return true if send_node.method?(:dry_run?)
49
- end
50
- false
51
- end
52
- end
53
- end
54
- end
55
- end