archive_storage 0.1.0 → 0.1.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 32099fdbd406f02b06d31a562372c848c0d520aa6d4da0278793aecb4ed6cfe5
4
- data.tar.gz: e28f3f328bcefdf82382b854a7dc7841367f9b6f42e83c5a65fb62082a6985a7
3
+ metadata.gz: 6441fbc9f1b8de1aa81f70d435a4be4c6355844160e23da06c65501273c8be63
4
+ data.tar.gz: 7949fa9dc2bd082d290a806ecf6858b6acf1643f50669db176bd99ce250dfef9
5
5
  SHA512:
6
- metadata.gz: 2db3a3be0b2d4300e53d00c6ae1e60f450e9201dd37efe62b2c703f1f50a075e212e9ac09265c1b1814a567b0b43ed4e142aa873eac32c20f3d240dab3860146
7
- data.tar.gz: 3fd152d37cb0f5e4f4f0566f9c1b0577fc877cddf728abde9b6a35fcbce5fa0834c1c2dbfa0c686f431e8537126a7aef8fe20606aa2b88694108cedff895a431
6
+ metadata.gz: 8c0a6a7d3b6e50582e0f9a115809139710f75b60fe77ce0f6ee74d83e55cfa5f7832e46bba109f9e1d32d878bbaaf1fce7f4621127c055f7074205cb3c026152
7
+ data.tar.gz: c462624a3b0b1b59c1b41ba9bec587ca6fc35f8de67c74f3b33c92b34637dec07491365ace8794fdc587e506e2a26a7324a02b1773d1005728604ff4c7987dac
data/README.md CHANGED
@@ -1,42 +1,50 @@
1
1
  # archive_storage
2
2
 
3
- Zero-downtime archival storage for CarrierWave uploads.
3
+ Archival storage for Rails uploaders.
4
4
 
5
- `archive_storage` moves older uploaded files from one storage backend to another, keeps a registry of the current file location, and routes reads to the right backend. It currently integrates with CarrierWave; support for other uploader libraries can be added later without changing the registry model.
5
+ `archive_storage` moves older uploaded files from a primary storage backend to one or more archive backends, records the current file location in a registry table, and keeps reads routed through the uploader.
6
6
 
7
- Supported storage adapters:
7
+ The gem currently supports CarrierWave. The storage, registry, and migration layers are intentionally not tied to CarrierWave, so support for other Rails uploader libraries can be added later.
8
8
 
9
- - S3-compatible object storage, including MinIO and AWS S3
10
- - filesystem/NFS
11
- - memory adapter for tests
9
+ ## Contents
12
10
 
13
- Typical use cases:
14
-
15
- - `main` S3/MinIO bucket -> `archive_001` cold bucket
16
- - `archive_001` -> `archive_002` when the first archive fills up
17
- - NFS/local disk -> S3-compatible archive storage
11
+ - [Features](#features)
12
+ - [Installation](#installation)
13
+ - [Getting Started](#getting-started)
14
+ - [Configuration](#configuration)
15
+ - [CarrierWave](#carrierwave)
16
+ - [Policies](#policies)
17
+ - [Scheduled Jobs](#scheduled-jobs)
18
+ - [Command Line](#command-line)
19
+ - [Migration Flow](#migration-flow)
20
+ - [Verification](#verification)
21
+ - [Cleanup](#cleanup)
22
+ - [Registry](#registry)
23
+ - [Development](#development)
18
24
 
19
25
  ## Features
20
26
 
21
- - model-first DSL: `archive_storage_for :file`
22
- - automatic CarrierWave storage wiring
23
- - ActiveRecord registry table: `archive_storage_files`
24
- - dry-run planning
25
- - scheduled enqueueing
26
- - background migration jobs
27
- - copy, verify, read switch, fallback read, delayed source cleanup
28
- - optional CarrierWave versions/thumbs migration
27
+ - Model-level archive policy with `archive_storage_for :file`
28
+ - CarrierWave integration without changing shared base uploaders globally
29
+ - Multiple archive storages, for example `archive_001`, then `archive_002`
30
+ - S3-compatible storage, filesystem/NFS storage, and a memory adapter for tests
31
+ - ActiveRecord registry table for file location and migration state
32
+ - Dry-run planning
33
+ - Scheduled enqueueing
34
+ - Background migration jobs
35
+ - Copy, verify, read switch, fallback read, and delayed source cleanup
36
+ - Optional CarrierWave version/thumb migration
29
37
  - GoodJob, ActiveJob, Sidekiq, `sidekiq-cron`, and `sidekiq-scheduler` support
30
38
 
31
39
  ## Installation
32
40
 
33
- Add the gem:
41
+ Add the gem to your Rails application:
34
42
 
35
43
  ```ruby
36
44
  gem "archive_storage"
37
45
  ```
38
46
 
39
- For S3-compatible storage:
47
+ For S3-compatible storage, also add:
40
48
 
41
49
  ```ruby
42
50
  gem "aws-sdk-s3"
@@ -49,9 +57,9 @@ bin/rails generate archive_storage:install
49
57
  bin/rails db:migrate
50
58
  ```
51
59
 
52
- ## Configuration
60
+ ## Getting Started
53
61
 
54
- Define the storage backends and scheduled archive jobs.
62
+ Configure storages:
55
63
 
56
64
  ```ruby
57
65
  # config/initializers/archive_storage.rb
@@ -86,6 +94,85 @@ ArchiveStorage.configure do |config|
86
94
  s.region = "us-east-1"
87
95
  s.path_style = true
88
96
  end
97
+ end
98
+ ```
99
+
100
+ Add a policy to the model that owns the upload:
101
+
102
+ ```ruby
103
+ class ProjectDocument < ApplicationRecord
104
+ scope :ready_for_archive, -> { where("created_at <= ?", 90.days.ago) }
105
+
106
+ mount_uploader :file, DocumentUploader
107
+
108
+ archive_storage_for :file do
109
+ primary :main
110
+
111
+ archive :archive_001,
112
+ after: 90.days,
113
+ scope: :ready_for_archive,
114
+ max_byte_size: 3.gigabytes,
115
+ if: ->(record) { record.closed? }
116
+
117
+ archive :archive_002,
118
+ after: 2.years,
119
+ scope: ->(records) { records.where(priority: "low") },
120
+ if: ->(record) { record.closed? }
121
+
122
+ read_fallbacks :main, :archive_001, :archive_002
123
+ end
124
+ end
125
+ ```
126
+
127
+ Keep the uploader focused on paths and filenames:
128
+
129
+ ```ruby
130
+ class DocumentUploader < CarrierWave::Uploader::Base
131
+ def store_dir
132
+ "uploads/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
133
+ end
134
+ end
135
+ ```
136
+
137
+ Run a dry plan:
138
+
139
+ ```bash
140
+ bin/rails archive_storage:plan MODEL=ProjectDocument MOUNT=file
141
+ ```
142
+
143
+ Enqueue migrations:
144
+
145
+ ```bash
146
+ bin/rails archive_storage:enqueue MODEL=ProjectDocument MOUNT=file LIMIT=10000
147
+ ```
148
+
149
+ ## Configuration
150
+
151
+ `archive_storage` needs storage definitions and, optionally, schedules and runtime defaults.
152
+
153
+ ```ruby
154
+ # config/initializers/archive_storage.rb
155
+
156
+ ArchiveStorage.configure do |config|
157
+ config.storage :main do |s|
158
+ s.provider = :s3
159
+ s.endpoint = ENV.fetch("MAIN_STORAGE_ENDPOINT")
160
+ s.bucket = "production-main"
161
+ s.access_key_id = ENV.fetch("MAIN_STORAGE_ACCESS_KEY")
162
+ s.secret_access_key = ENV.fetch("MAIN_STORAGE_SECRET_KEY")
163
+ s.region = "us-east-1"
164
+ s.path_style = true
165
+ end
166
+
167
+ config.storage :archive_001 do |s|
168
+ s.provider = :s3
169
+ s.endpoint = ENV.fetch("ARCHIVE_001_ENDPOINT")
170
+ s.bucket = "production-archive-001"
171
+ s.access_key_id = ENV.fetch("ARCHIVE_001_ACCESS_KEY")
172
+ s.secret_access_key = ENV.fetch("ARCHIVE_001_SECRET_KEY")
173
+ s.region = "us-east-1"
174
+ s.path_style = true
175
+ end
89
176
 
90
177
  config.schedule :archive_documents,
91
178
  cron: "0 0-6,22,23 * * 1-5",
@@ -93,7 +180,7 @@ ArchiveStorage.configure do |config|
93
180
  mounted_as: :file,
94
181
  migration_rate: 10_000
95
182
 
96
- # Optional defaults:
183
+ # Defaults:
97
184
  #
98
185
  # config.job_backend = :active_job # :active_job, :good_job, :sidekiq, or :inline
99
186
  # config.migration_queue = :default
@@ -105,18 +192,20 @@ ArchiveStorage.configure do |config|
105
192
  end
106
193
  ```
107
194
 
108
- Filesystem/NFS storage can be mixed with S3-compatible storage:
195
+ Filesystem or NFS storage can be used as either source or archive storage:
109
196
 
110
197
  ```ruby
111
- config.storage :nfs_main do |s|
112
- s.provider = :filesystem
113
- s.root_path = "/mnt/uploads"
198
+ ArchiveStorage.configure do |config|
199
+ config.storage :nfs_main do |s|
200
+ s.provider = :filesystem
201
+ s.root_path = "/mnt/uploads"
202
+ end
114
203
  end
115
204
  ```
116
205
 
117
- ## Model Policy
206
+ ## CarrierWave
118
207
 
119
- Put archive policy next to the model that owns the file.
208
+ `archive_storage_for` automatically wires the mounted CarrierWave uploader to `storage :archive_storage`.
120
209
 
121
210
  ```ruby
122
211
  class ProjectDocument < ApplicationRecord
@@ -124,62 +213,107 @@ class ProjectDocument < ApplicationRecord
124
213
 
125
214
  archive_storage_for :file do
126
215
  primary :main
216
+ archive :archive_001, after: 90.days, scope: :ready_for_archive
217
+ end
218
+ end
219
+ ```
127
220
 
128
- archive :archive_001,
129
- after: 90.days,
130
- scope: :ready_for_archive,
131
- if: ->(record) { record.closed? }
221
+ The gem creates a per-model/per-mount uploader subclass under the model and uses that subclass internally. This avoids changing a shared uploader class globally when the same uploader is mounted by many models.
132
222
 
133
- archive :archive_002,
134
- after: 2.years,
135
- scope: ->(records) { records.where(priority: "low") },
136
- if: ->(record) { record.closed? }
223
+ CarrierWave versions are not migrated by default. Enable them only when the generated files should follow the same archive policy:
137
224
 
138
- read_fallbacks :main, :archive_001, :archive_002
225
+ ```ruby
226
+ archive_storage_for :file do
227
+ include_versions true
228
+ end
229
+ ```
139
230
 
140
- # Optional:
141
- #
142
- # delete_source_after verification: true, delay: 7.days
143
- # include_versions true
144
- # versions :thumb, :preview
145
- # timestamp_attribute :created_at
146
- end
231
+ To migrate only selected versions:
232
+
233
+ ```ruby
234
+ archive_storage_for :file do
235
+ versions :thumb, :preview
147
236
  end
148
237
  ```
149
238
 
150
- `archive_storage_for` automatically wires the mounted CarrierWave uploader to `storage :archive_storage`. The uploader can stay focused on path, filename, and version behavior:
239
+ ## Policies
240
+
241
+ Policies are declared on the model:
151
242
 
152
243
  ```ruby
153
- class DocumentUploader < CarrierWave::Uploader::Base
154
- def store_dir
155
- "uploads/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
244
+ archive_storage_for :file do
245
+ primary :main
246
+
247
+ archive :archive_001,
248
+ after: 90.days,
249
+ scope: :ready_for_archive,
250
+ max_byte_size: 3.gigabytes,
251
+ if: ->(record) { record.closed? }
252
+
253
+ read_fallbacks :main, :archive_001
254
+
255
+ # delete_source_after verification: true, delay: 7.days
256
+ # include_versions true
257
+ # versions :thumb, :preview
258
+ # timestamp_attribute :created_at
259
+ end
260
+ ```
261
+
262
+ Policy options:
263
+
264
+ - `primary` sets the storage used for new uploads.
265
+ - `archive` adds an archive destination rule.
266
+ - `after` is checked in Ruby after records are loaded.
267
+ - `scope` narrows the ActiveRecord relation before scanning records.
268
+ - `if` applies a per-record Ruby predicate.
269
+ - `max_byte_size` skips oversized files using storage metadata and checks again before copy.
270
+ - `read_fallbacks` sets the read recovery order.
271
+ - `delete_source_after` configures the per-mount cleanup delay.
272
+ - `include_versions` and `versions` control CarrierWave versions.
273
+ - `timestamp_attribute` changes the attribute used by `after`.
274
+
275
+ For large tables, keep heavy filters in SQL:
276
+
277
+ ```ruby
278
+ class ProjectDocument < ApplicationRecord
279
+ scope :ready_for_archive, -> {
280
+ where("created_at <= ?", 90.days.ago).where(status: "closed")
281
+ }
282
+
283
+ archive_storage_for :file do
284
+ primary :main
285
+ archive :archive_001, after: 90.days, scope: :ready_for_archive
156
286
  end
157
287
  end
158
288
  ```
159
289
 
160
- Policy notes:
290
+ `after` is useful as a safety check, but it should not replace a selective SQL scope on large production tables.
161
291
 
162
- - `primary` is where new uploads are stored.
163
- - `archive` rules are checked in order; the last eligible rule wins.
164
- - `scope` narrows the model relation before records are scanned. It can be a model scope name, a relation, or a callable that receives the current relation.
165
- - `read_fallbacks` is the read-recovery order when registry metadata is missing or a configured fallback error is raised.
166
- - By default only the original CarrierWave file is planned. Use `include_versions true` or `versions ...` when thumbnails/previews must move too.
292
+ Archive rules are checked in order. The last eligible rule wins, which allows progressive archives:
293
+
294
+ ```ruby
295
+ archive_storage_for :file do
296
+ primary :main
297
+ archive :archive_001, after: 90.days, scope: :ready_for_archive
298
+ archive :archive_002, after: 2.years, scope: :ready_for_archive
299
+ end
300
+ ```
167
301
 
168
302
  ## Scheduled Jobs
169
303
 
170
- Schedules are declared in global configuration:
304
+ Schedules are declared in `ArchiveStorage.configure`:
171
305
 
172
306
  ```ruby
173
307
  ArchiveStorage.configure do |config|
174
308
  config.schedule :archive_documents,
175
- cron: "0 0-6,22,23 * * 1-5",
309
+ cron: "*/10 * * * *",
176
310
  model: "ProjectDocument",
177
311
  mounted_as: :file,
178
312
  migration_rate: 10_000
179
313
  end
180
314
  ```
181
315
 
182
- `migration_rate` means at most this many files are enqueued by one scheduled run.
316
+ `migration_rate` is the maximum number of files enqueued by one scheduled run. If the cron runs every 10 minutes, `migration_rate: 10_000` means up to 10,000 files per run, not per hour.
183
317
 
184
318
  `archive_storage` registers scheduler entries automatically. You do not need to merge `ArchiveStorage.good_job_cron` or `ArchiveStorage.sidekiq_cron` into your application config.
185
319
 
@@ -187,7 +321,7 @@ end
187
321
 
188
322
  When `good_job` is present, `archive_storage` appends its entries to `config.good_job.cron` after Rails initialization. Existing GoodJob cron entries are preserved.
189
323
 
190
- Enable GoodJob cron in the app environment where the scheduler should run:
324
+ Enable GoodJob cron in the environment where scheduling should run:
191
325
 
192
326
  ```ruby
193
327
  # config/environments/production.rb
@@ -199,7 +333,7 @@ end
199
333
 
200
334
  ### Sidekiq
201
335
 
202
- Use Sidekiq for migration jobs:
336
+ Use Sidekiq workers for archive jobs:
203
337
 
204
338
  ```ruby
205
339
  # config/initializers/archive_storage.rb
@@ -217,14 +351,14 @@ gem "sidekiq-cron"
217
351
  gem "sidekiq-scheduler"
218
352
  ```
219
353
 
220
- On Sidekiq server startup, `archive_storage` adds its own schedules without deleting existing jobs:
354
+ On Sidekiq server startup, `archive_storage` adds its schedules without deleting existing schedules:
221
355
 
222
- - with `sidekiq-cron`, it uses non-destructive `Sidekiq::Cron::Job.load_from_hash`
356
+ - with `sidekiq-cron`, it uses `Sidekiq::Cron::Job.load_from_hash`
223
357
  - with `sidekiq-scheduler`, it uses `Sidekiq.set_schedule` and reloads the scheduler
224
358
 
225
- Existing jobs from `sidekiq.yml`, `config/schedule.yml`, or custom initializers remain in place.
359
+ Existing jobs from `sidekiq.yml`, `config/schedule.yml`, and custom initializers remain in place.
226
360
 
227
- ## Commands
361
+ ## Command Line
228
362
 
229
363
  ```bash
230
364
  bin/rails archive_storage:plan MODEL=ProjectDocument MOUNT=file
@@ -235,66 +369,52 @@ bin/rails archive_storage:cleanup_source
235
369
  bin/rails archive_storage:status
236
370
  ```
237
371
 
238
- Options:
372
+ Supported environment options:
239
373
 
240
374
  ```bash
241
375
  MODEL=ProjectDocument
242
376
  MOUNT=file
377
+ UPLOADER=DocumentUploader
243
378
  OLDER_THAN=90d
244
379
  LIMIT=10000
245
380
  INLINE=true
246
381
  ESTIMATE_SIZES=false
247
382
  ```
248
383
 
249
- `UPLOADER=DocumentUploader` is still accepted for advanced/legacy uploader-level configurations.
250
-
251
384
  Command behavior:
252
385
 
253
386
  - `plan` prints a dry-run plan.
254
- - `enqueue` and `migrate` enqueue migration jobs by default.
387
+ - `enqueue` enqueues migration jobs.
388
+ - `migrate` enqueues migration jobs by default.
255
389
  - `migrate INLINE=true` runs migration inline.
256
- - `verify` re-checks already migrated files.
257
- - `cleanup_source` deletes verified source copies that are past the cleanup delay.
390
+ - `verify` rechecks already migrated files.
391
+ - `cleanup_source` deletes verified source copies after the cleanup delay.
258
392
  - `status` prints registry counters.
259
393
 
394
+ `MODEL` and `MOUNT` are recommended for model-level policies. `UPLOADER` is still accepted for advanced or legacy uploader-level configurations.
395
+
260
396
  ## Migration Flow
261
397
 
398
+ The migration process is intentionally staged:
399
+
262
400
  ```text
263
401
  source only
264
- source + destination copied
265
- destination verified
266
- registry points reads to destination
402
+ source + archive copied
403
+ archive verified
404
+ registry points reads to archive
267
405
  reads can fallback to source
268
406
  source deleted later when cleanup is enabled
269
407
  ```
270
408
 
271
- Source deletion is disabled by default:
272
-
273
- ```ruby
274
- config.delete_source_enabled = false
275
- ```
276
-
277
- Turn it on only after the migration path has been verified in production:
278
-
279
- ```ruby
280
- config.delete_source_enabled = true
281
- ```
282
-
283
- Per-mount cleanup delay:
284
-
285
- ```ruby
286
- archive_storage_for :file do
287
- delete_source_after verification: true, delay: 7.days
288
- end
289
- ```
409
+ This keeps the application reading through the uploader while files are being copied and verified.
290
410
 
291
411
  ## Verification
292
412
 
293
- The default strategy is `:auto`.
413
+ The default verification strategy is `:auto`.
294
414
 
295
415
  `archive_storage` does not blindly trust S3 ETags. Multipart S3 uploads can have ETags like `hash-3`, and uploading the same bytes to another storage can produce a different ETag.
296
416
 
297
- Strategies:
417
+ Available strategies:
298
418
 
299
419
  - `:auto` - size check, then checksum when available, then non-multipart ETag, otherwise size-only
300
420
  - `:checksum` - require matching checksums
@@ -309,40 +429,86 @@ ArchiveStorage.configure do |config|
309
429
  end
310
430
  ```
311
431
 
312
- ## Registry
432
+ For filesystem/NFS sources, checksums are based on the bytes read from disk. For S3-compatible sources, checksum and ETag metadata are used when available according to the configured strategy.
313
433
 
314
- The generated migration creates `archive_storage_files`.
434
+ ## Cleanup
315
435
 
316
- The registry stores:
436
+ Source deletion is disabled by default:
317
437
 
318
- - model identity: `record_type`, `record_id`, `mounted_as`, `uploader`
319
- - object identity: `identifier`, `storage_key`, source/target keys
320
- - storage state: `current_storage`, `source_storage`, `target_storage`
321
- - migration state: enqueue, migration, verification, cleanup timestamps
322
- - metadata: byte size, checksum, content type, attempts, last error
438
+ ```ruby
439
+ ArchiveStorage.configure do |config|
440
+ config.delete_source_enabled = false
441
+ end
442
+ ```
323
443
 
324
- Business tables do not need extra columns for archive location.
444
+ Enable it only after planning, migration, and reads have been verified in production:
325
445
 
326
- ## CarrierWave Versions
446
+ ```ruby
447
+ ArchiveStorage.configure do |config|
448
+ config.delete_source_enabled = true
449
+ end
450
+ ```
327
451
 
328
- CarrierWave versions are disabled by default.
452
+ It can also be a callable, which is useful for feature flags:
329
453
 
330
454
  ```ruby
331
- archive_storage_for :file do
332
- include_versions true
455
+ ArchiveStorage.configure do |config|
456
+ config.delete_source_enabled = -> { Unleash.enabled?(:archive_storage_delete_source) }
333
457
  end
334
458
  ```
335
459
 
336
- To migrate only selected versions:
460
+ Configure cleanup delay per mount:
337
461
 
338
462
  ```ruby
339
463
  archive_storage_for :file do
340
- versions :thumb, :preview
464
+ primary :main
465
+ archive :archive_001, after: 90.days, scope: :ready_for_archive
466
+ delete_source_after verification: true, delay: 7.days
341
467
  end
342
468
  ```
343
469
 
344
- Use this only when those files are stored and read as part of the same archival policy. It can multiply the number of objects planned for migration.
470
+ Run cleanup:
471
+
472
+ ```bash
473
+ bin/rails archive_storage:cleanup_source
474
+ ```
475
+
476
+ ## Registry
477
+
478
+ The generated migration creates `archive_storage_files`.
479
+
480
+ The registry stores:
481
+
482
+ - model identity: `record_type`, `record_id`, `mounted_as`, `uploader`
483
+ - object identity: `identifier`, `storage_key`, `source_storage_key`, `target_storage_key`
484
+ - storage state: `current_storage`, `source_storage`, `target_storage`
485
+ - migration state: `enqueued_at`, `migration_started_at`, `migrated_at`, `verified_at`, `source_deleted_at`
486
+ - metadata: `byte_size`, `checksum`, `content_type`, `attempts`, `last_error`
487
+
488
+ The registry has a unique identity index on:
489
+
490
+ ```text
491
+ record_type, record_id, mounted_as, identifier, storage_key
492
+ ```
493
+
494
+ Business tables do not need extra columns for archive location.
495
+
496
+ If an application generated an older migration without the unique identity index, add a migration that replaces the old identity index with the unique one before relying on parallel enqueueing.
497
+
498
+ ## Development
499
+
500
+ Run the test suite:
501
+
502
+ ```bash
503
+ bundle exec rake test
504
+ ```
505
+
506
+ Build the gem:
507
+
508
+ ```bash
509
+ bundle exec gem build archive_storage.gemspec
510
+ ```
345
511
 
346
- ## Current Scope
512
+ ## License
347
513
 
348
- This MVP is focused on Rails, ActiveRecord, and CarrierWave. The storage and registry layers are not CarrierWave-specific, so other uploader integrations can be added later.
514
+ MIT.
@@ -8,8 +8,8 @@ Gem::Specification.new do |spec|
8
8
  spec.authors = ["E. Tashkovyan"]
9
9
  spec.email = []
10
10
 
11
- spec.summary = "Policy-based archive storage and zero-downtime file migration."
12
- spec.description = "Move uploads across storage backends such as filesystem, NFS, MinIO, and S3 without downtime."
11
+ spec.summary = "Archival storage for Rails uploaders."
12
+ spec.description = "Move older Rails uploads across storage backends such as filesystem, NFS, MinIO, and S3."
13
13
  spec.homepage = "https://github.com/estashkovyan/archive_storage"
14
14
  spec.license = "MIT"
15
15
  spec.required_ruby_version = ">= 3.1.0"
@@ -44,6 +44,10 @@ module ArchiveStorage
44
44
  @verification_strategy = :checksum if value
45
45
  end
46
46
 
47
+ def delete_source_enabled?
48
+ delete_source_enabled.respond_to?(:call) ? !!delete_source_enabled.call : !!delete_source_enabled
49
+ end
50
+
47
51
  def storage(name, &block)
48
52
  config = (@storages[name.to_sym] ||= StorageConfig.new(name))
49
53
  block.call(config) if block
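
Note on the hunk above: `delete_source_enabled?` accepts either a plain value or a callable, which is what makes the feature-flag form in the README work. Below is a minimal standalone sketch of that pattern. `SketchConfig` is a hypothetical stand-in, not the gem's configuration class; only the predicate body is copied from the hunk.

```ruby
# Hypothetical stand-in for the gem's configuration object.
# Only the predicate body mirrors the hunk above.
class SketchConfig
  attr_accessor :delete_source_enabled

  def initialize
    @delete_source_enabled = false
  end

  # A callable is evaluated on every check, so a feature flag can
  # change the answer at runtime; any other value is coerced to a boolean.
  def delete_source_enabled?
    delete_source_enabled.respond_to?(:call) ? !!delete_source_enabled.call : !!delete_source_enabled
  end
end

config = SketchConfig.new
static_default = config.delete_source_enabled?

flag = false
config.delete_source_enabled = -> { flag }
before_flip = config.delete_source_enabled?
flag = true
after_flip = config.delete_source_enabled?
```

Because the lambda is called on each check, flipping `flag` changes the result without reassigning the setting.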
@@ -1,9 +1,10 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module ArchiveStorage
4
- Error = Class.new(StandardError)
5
- ConfigurationError = Class.new(Error)
6
- NotFoundError = Class.new(Error)
7
- VerificationError = Class.new(Error)
8
- RegistryUnavailableError = Class.new(Error)
4
+ Error = Class.new(StandardError)
5
+ ConfigurationError = Class.new(Error)
6
+ NotFoundError = Class.new(Error)
7
+ VerificationError = Class.new(Error)
8
+ RegistryUnavailableError = Class.new(Error)
9
+ MaxByteSizeExceededError = Class.new(Error)
9
10
  end
@@ -48,6 +48,8 @@ module ArchiveStorage
48
48
  source = ArchiveStorage.adapter(source_storage)
49
49
  target = ArchiveStorage.adapter(target_storage)
50
50
 
51
+ validate_max_byte_size!(file_record, source, source_key, target_storage)
52
+
51
53
  target.copy_from(source, source_key, target_key)
52
54
  verification = Verifier.new.verify!(
53
55
  source_adapter: source,
@@ -105,7 +107,7 @@ module ArchiveStorage
105
107
  attr_reader :planner
106
108
 
107
109
  def cleanup_ready?(file_record)
108
- return false unless ArchiveStorage.configuration.delete_source_enabled
110
+ return false unless ArchiveStorage.configuration.delete_source_enabled?
109
111
  return false unless file_record.source_storage
110
112
  return false if file_record.source_deleted_at
111
113
 
@@ -130,6 +132,17 @@ module ArchiveStorage
130
132
  nil
131
133
  end
132
134
 
135
+ def validate_max_byte_size!(file_record, source_adapter, source_key, target_storage)
136
+ rule = policy_for(file_record)&.rule_for_storage(target_storage)
137
+ return unless rule&.max_byte_size?
138
+
139
+ metadata = source_adapter.head(source_key)
140
+ return if rule.byte_size_allowed?(metadata.byte_size)
141
+
142
+ raise MaxByteSizeExceededError,
143
+ "object #{source_key.inspect} is #{metadata.byte_size} bytes; max is #{rule.max_byte_size}"
144
+ end
145
+
133
146
  def safe_update_error(file_record, error)
134
147
  file_record.update!(last_error: "#{error.class}: #{error.message}") if file_record.respond_to?(:update!)
135
148
  rescue StandardError
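
Note on the hunk above: `validate_max_byte_size!` reads the source object's metadata and raises before any bytes are copied. The sketch below reproduces only that control flow. `SketchAdapter`, `SketchMetadata`, and `SketchMaxByteSizeExceededError` are hypothetical names, and the `<=` comparison is an assumption about what `byte_size_allowed?` does, since that method is not shown in this diff.

```ruby
# Hypothetical names throughout; only the control flow follows the hunk above.
class SketchMaxByteSizeExceededError < StandardError; end

SketchMetadata = Struct.new(:byte_size)

# Stand-in for a storage adapter whose `head` returns object metadata.
class SketchAdapter
  def initialize(sizes)
    @sizes = sizes
  end

  def head(key)
    SketchMetadata.new(@sizes.fetch(key))
  end
end

# Returns nil when the object may be copied, raises otherwise.
# Assumption: a size equal to the limit is still allowed.
def validate_max_byte_size!(adapter, key, max_byte_size)
  return if max_byte_size.nil?

  byte_size = adapter.head(key).byte_size
  return if byte_size <= max_byte_size

  raise SketchMaxByteSizeExceededError,
        "object #{key.inspect} is #{byte_size} bytes; max is #{max_byte_size}"
end

adapter = SketchAdapter.new("small.pdf" => 10, "large.iso" => 5_000)
small_result = validate_max_byte_size!(adapter, "small.pdf", 1_000)
large_error =
  begin
    validate_max_byte_size!(adapter, "large.iso", 1_000)
    nil
  rescue SketchMaxByteSizeExceededError => e
    e.message
  end
```

Checking before `copy_from` means an oversized object costs one metadata request rather than a full transfer.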
@@ -7,9 +7,14 @@ module ArchiveStorage
7
7
 
8
8
  policy = PolicyBuilder.build(&block)
9
9
  uploader_class = archive_storage_uploader_for(mounted_as)
10
-
11
- ArchiveStorage.wire_carrierwave_uploader!(uploader_class)
12
- ArchiveStorage.register_mount(self, mounted_as, uploader: uploader_class, policy: policy)
10
+ archive_uploader_class = ArchiveStorage.build_mount_uploader!(
11
+ self,
12
+ mounted_as,
13
+ uploader_class
14
+ )
15
+
16
+ ArchiveStorage.wire_carrierwave_uploader!(archive_uploader_class)
17
+ ArchiveStorage.register_mount(self, mounted_as, uploader: archive_uploader_class, policy: policy)
13
18
 
14
19
  archive_storage_policies[mounted_as.to_sym] = policy
15
20
  end
@@ -126,11 +126,13 @@ module ArchiveStorage
126
126
  storage_key: storage_key,
127
127
  default: policy.primary_storage_key
128
128
  )
129
- target_storage = policy.target_storage_for(record)
130
- return nil unless target_storage
131
- return nil if current_storage.to_sym == target_storage.to_sym
129
+ now = Time.now
130
+ metadata = source_metadata_for(policy, record, current_storage, storage_key, now: now)
131
+ target_rule = policy.target_rule_for(record, now: now, byte_size: metadata&.byte_size)
132
+ return nil unless target_rule
132
133
 
133
- metadata = estimate_metadata(target_storage, policy.primary_storage_key, storage_key)
134
+ target_storage = target_rule.storage_key
135
+ return nil if current_storage.to_sym == target_storage.to_sym
134
136
 
135
137
  Candidate.new(
136
138
  record: record,
@@ -179,12 +181,21 @@ module ArchiveStorage
179
181
  nil
180
182
  end
181
183
 
182
- def estimate_metadata(_target_storage, source_storage, storage_key)
184
+ def estimate_metadata(source_storage, storage_key)
183
185
  return nil unless estimate_sizes
184
186
 
185
187
  ArchiveStorage.adapter(source_storage).head(storage_key)
186
188
  rescue StandardError
187
189
  nil
188
190
  end
191
+
192
+ def source_metadata_for(policy, record, source_storage, storage_key, now:)
193
+ return estimate_metadata(source_storage, storage_key) if estimate_sizes
194
+ return nil unless policy.requires_byte_size_for?(record, now: now)
195
+
196
+ ArchiveStorage.adapter(source_storage).head(storage_key)
197
+ rescue StandardError
198
+ nil
199
+ end
189
200
  end
190
201
  end
@@ -27,12 +27,39 @@ module ArchiveStorage
27
27
  primary_storage&.storage_key
28
28
  end
29
29
 
30
- def target_storage_for(record, now: Time.now)
30
+ def target_storage_for(record, now: Time.now, byte_size: nil)
31
+ target_rule_for(record, now: now, byte_size: byte_size)&.storage_key
32
+ end
33
+
34
+ def target_rule_for(record, now: Time.now, byte_size: nil)
31
35
  eligible_rules = rules.select do |rule|
32
- rule.eligible?(record, now: now, timestamp_attribute: timestamp_attribute)
36
+ rule.eligible?(
37
+ record,
38
+ now: now,
39
+ timestamp_attribute: timestamp_attribute,
40
+ byte_size: byte_size
41
+ )
33
42
  end
34
43
 
35
- eligible_rules.last&.storage_key
44
+ eligible_rules.last
45
+ end
46
+
47
+ def rule_for_storage(storage_key)
48
+ rules.reverse.find { |rule| rule.storage_key == storage_key.to_sym }
49
+ end
50
+
51
+ def requires_byte_size?
52
+ rules.any?(&:max_byte_size?)
53
+ end
54
+
55
+ def requires_byte_size_for?(record, now: Time.now)
56
+ rules.any? do |rule|
57
+ rule.requires_byte_size_for?(
58
+ record,
59
+ now: now,
60
+ timestamp_attribute: timestamp_attribute
61
+ )
62
+ end
36
63
  end
37
64
 
38
65
  def apply_rule_scopes(scope)
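
Note on the hunk above: `target_rule_for` keeps the "last eligible rule wins" behavior and now passes `byte_size` into each rule's eligibility check. The sketch below illustrates that selection with a simplified, hypothetical `SketchRule`; the real rule class is not part of this diff, and treating an unknown size as allowed is an assumption.

```ruby
# Hypothetical, simplified rule: age and size checks only.
SketchRule = Struct.new(:storage_key, :after_seconds, :max_byte_size) do
  def eligible?(age_seconds:, byte_size: nil)
    return false if age_seconds < after_seconds
    # Assumption: an unknown size (nil) is not treated as oversized.
    return false if max_byte_size && byte_size && byte_size > max_byte_size

    true
  end
end

DAY = 24 * 60 * 60

RULES = [
  SketchRule.new(:archive_001, 90 * DAY, 3 * 1024**3),
  SketchRule.new(:archive_002, 730 * DAY, nil)
].freeze

# Mirrors `eligible_rules.last`: rules are checked in declaration order
# and the last eligible one is selected.
def target_rule_for(age_seconds:, byte_size: nil)
  RULES.select { |rule| rule.eligible?(age_seconds: age_seconds, byte_size: byte_size) }.last
end

young = target_rule_for(age_seconds: 10 * DAY)
mid   = target_rule_for(age_seconds: 100 * DAY, byte_size: 1024)
old   = target_rule_for(age_seconds: 800 * DAY, byte_size: 1024)
huge  = target_rule_for(age_seconds: 100 * DAY, byte_size: 4 * 1024**3)
```

Returning the rule rather than only its storage key lets the caller read both `storage_key` and the size limit from one lookup, which is how the planner hunk uses it.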
@@ -44,7 +44,8 @@ module ArchiveStorage
44
44
  name,
45
45
  after: options[:after],
46
46
  condition: options[:if],
47
- scope: options[:scope]
47
+ scope: options[:scope],
48
+ max_byte_size: options[:max_byte_size]
48
49
  )
49
50
  end
50
51
 
@@ -4,46 +4,39 @@ require_relative "errors"
4
4
  require_relative "models/file_record"
5
5
 
6
6
  module ArchiveStorage
7
- class Registry
8
- def available?
9
- defined?(::ActiveRecord::Base) &&
10
- ::ActiveRecord::Base.connected? &&
11
- record_class.table_exists?
12
- rescue StandardError
13
- false
14
- end
7
+ class Registry
8
+ def available?
9
+ defined?(::ActiveRecord::Base) &&
10
+ ::ActiveRecord::Base.connected? &&
11
+ record_class.table_exists?
12
+ rescue StandardError
13
+ false
14
+ end
15
15
 
16
- def find_for_uploader(uploader, identifier:, storage_key:)
17
- return nil unless available?
18
- return nil unless uploader_identity_available?(uploader)
16
+ def find_for_uploader(uploader, identifier:, storage_key:)
17
+ return nil unless available?
18
+ return nil unless uploader_identity_available?(uploader)
19
19
 
20
- record_class.find_by(
21
- record_type: uploader.model.class.name,
22
- record_id: uploader.model.id,
23
- mounted_as: uploader.mounted_as.to_s,
24
- identifier: identifier.to_s,
25
- storage_key: storage_key.to_s
26
- )
27
- end
20
+ record_class.find_by(
21
+ identity_for_uploader(uploader, identifier: identifier, storage_key: storage_key)
22
+ )
23
+ end
28
24
 
29
- def current_storage_for(uploader, identifier:, storage_key:, default:)
30
- find_for_uploader(
31
- uploader,
32
- identifier: identifier,
33
- storage_key: storage_key
34
- )&.current_storage&.to_sym || default
35
- end
25
+ def current_storage_for(uploader, identifier:, storage_key:, default:)
26
+ find_for_uploader(
27
+ uploader,
28
+ identifier: identifier,
29
+ storage_key: storage_key
30
+ )&.current_storage&.to_sym || default
31
+ end
36
32
 
37
- def upsert_for_uploader(uploader, identifier:, storage_key:, current_storage:, metadata: {})
38
- return nil unless available?
39
- return nil unless uploader_identity_available?(uploader)
33
+ def upsert_for_uploader(uploader, identifier:, storage_key:, current_storage:, metadata: {})
34
+ return nil unless available?
35
+ return nil unless uploader_identity_available?(uploader)
40
36
 
37
+ with_unique_retry do
41
38
  record = record_class.find_or_initialize_by(
42
- record_type: uploader.model.class.name,
43
- record_id: uploader.model.id,
44
- mounted_as: uploader.mounted_as.to_s,
45
- identifier: identifier.to_s,
46
- storage_key: storage_key.to_s
39
+ identity_for_uploader(uploader, identifier: identifier, storage_key: storage_key)
47
40
  )
48
41
 
49
42
  record.uploader = uploader.class.name
@@ -54,16 +47,14 @@ module ArchiveStorage
54
47
  record.save!
55
48
  record
56
49
  end
50
+ end
57
51
 
58
- def claim_candidate(candidate)
59
- raise RegistryUnavailableError, "archive_storage_files table is not available" unless available?
52
+ def claim_candidate(candidate)
53
+ raise RegistryUnavailableError, "archive_storage_files table is not available" unless available?
60
54
 
55
+ with_unique_retry do
61
56
  record = record_class.find_or_initialize_by(
62
- record_type: candidate.record.class.name,
63
- record_id: candidate.record.id,
64
- mounted_as: candidate.mounted_as.to_s,
65
- identifier: candidate.identifier.to_s,
66
- storage_key: candidate.storage_key.to_s
57
+ identity_for_candidate(candidate)
67
58
  )
68
59
  return nil unless claimable?(record)
69
60
 
@@ -80,30 +71,64 @@ module ArchiveStorage
80
71
  record.save!
81
72
  record
82
73
  end
74
+ end
83
75
 
84
- alias ensure_for_candidate claim_candidate
76
+ alias ensure_for_candidate claim_candidate
85
77
 
86
- private
78
+ private
87
79
 
88
- def record_class
89
- ArchiveStorage.configuration.registry_class
90
- end
80
+ def record_class
81
+ ArchiveStorage.configuration.registry_class
82
+ end
91
83
 
92
- def claimable?(record)
93
- return false if record.respond_to?(:migrated_at) && record.migrated_at
94
- return true unless record.respond_to?(:enqueued_at)
95
- return true unless record.enqueued_at
84
+ def claimable?(record)
85
+ return false if record.respond_to?(:migrated_at) && record.migrated_at
86
+ return true unless record.respond_to?(:enqueued_at)
87
+ return true unless record.enqueued_at
96
88
 
97
- record.enqueued_at <= Time.now - ArchiveStorage.configuration.enqueue_claim_ttl
98
- end
89
+ record.enqueued_at <= Time.now - ArchiveStorage.configuration.enqueue_claim_ttl
90
+ end
99
91
 
100
- def uploader_identity_available?(uploader)
101
- uploader.respond_to?(:model) &&
102
- uploader.model &&
103
- uploader.model.respond_to?(:id) &&
104
- uploader.model.id &&
105
- uploader.respond_to?(:mounted_as) &&
106
- uploader.mounted_as
107
- end
92
+ def uploader_identity_available?(uploader)
93
+ uploader.respond_to?(:model) &&
94
+ uploader.model &&
95
+ uploader.model.respond_to?(:id) &&
96
+ uploader.model.id &&
97
+ uploader.respond_to?(:mounted_as) &&
98
+ uploader.mounted_as
99
+ end
100
+
101
+ def identity_for_uploader(uploader, identifier:, storage_key:)
102
+ {
103
+ record_type: uploader.model.class.name,
104
+ record_id: uploader.model.id,
105
+ mounted_as: uploader.mounted_as.to_s,
106
+ identifier: identifier.to_s,
107
+ storage_key: storage_key.to_s
108
+ }
109
+ end
110
+
111
+ def identity_for_candidate(candidate)
112
+ {
113
+ record_type: candidate.record.class.name,
114
+ record_id: candidate.record.id,
115
+ mounted_as: candidate.mounted_as.to_s,
116
+ identifier: candidate.identifier.to_s,
117
+ storage_key: candidate.storage_key.to_s
118
+ }
119
+ end
120
+
121
+ def with_unique_retry
122
+ yield
123
+ rescue StandardError => error
124
+ raise unless unique_violation?(error)
125
+
126
+ yield
127
+ end
128
+
129
+ def unique_violation?(error)
130
+ defined?(::ActiveRecord::RecordNotUnique) &&
131
+ error.is_a?(::ActiveRecord::RecordNotUnique)
108
132
  end
133
+ end
109
134
  end
@@ -2,19 +2,21 @@
2
2
 
3
3
  module ArchiveStorage
4
4
  class StorageRule
5
- attr_reader :role, :storage_key, :after, :condition, :scope
5
+ attr_reader :role, :storage_key, :after, :condition, :scope, :max_byte_size
6
6
 
7
- def initialize(role, storage_key, after: nil, condition: nil, scope: nil)
7
+ def initialize(role, storage_key, after: nil, condition: nil, scope: nil, max_byte_size: nil)
8
8
  @role = role.to_sym
9
9
  @storage_key = storage_key.to_sym
10
10
  @after = after
11
11
  @condition = condition
12
12
  @scope = scope
13
+ @max_byte_size = normalize_byte_size(max_byte_size)
13
14
  end
14
15
 
15
- def eligible?(record, now:, timestamp_attribute:)
16
+ def eligible?(record, now:, timestamp_attribute:, byte_size: nil)
16
17
  old_enough?(record, now: now, timestamp_attribute: timestamp_attribute) &&
17
- condition_matches?(record)
18
+ condition_matches?(record) &&
19
+ byte_size_allowed?(byte_size)
18
20
  end
19
21
 
20
22
  def scoped?
@@ -32,8 +34,31 @@ module ArchiveStorage
32
34
  end
33
35
  end
34
36
 
37
+ def max_byte_size?
38
+ !max_byte_size.nil?
39
+ end
40
+
41
+ def requires_byte_size_for?(record, now:, timestamp_attribute:)
42
+ max_byte_size? &&
43
+ old_enough?(record, now: now, timestamp_attribute: timestamp_attribute) &&
44
+ condition_matches?(record)
45
+ end
46
+
47
+ def byte_size_allowed?(byte_size)
48
+ return true unless max_byte_size?
49
+ return false if byte_size.nil?
50
+
51
+ byte_size <= max_byte_size
52
+ end
53
+
35
54
  private
36
55
 
56
+ def normalize_byte_size(value)
57
+ return nil if value.nil?
58
+
59
+ Integer(value)
60
+ end
61
+
37
62
  def old_enough?(record, now:, timestamp_attribute:)
38
63
  return true unless after
39
64
  return false unless record
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module ArchiveStorage
4
- VERSION = "0.1.0"
4
+ VERSION = "0.1.1"
5
5
  end
@@ -53,6 +53,15 @@ module ArchiveStorage
53
53
  configuration.mount(model, mounted_as, uploader: uploader, policy: policy)
54
54
  end
55
55
 
56
+ def build_mount_uploader!(model_class, mounted_as, uploader_class)
57
+ return uploader_class unless model_class.respond_to?(:uploaders)
58
+ return uploader_class unless model_class.uploaders.respond_to?(:[]=)
59
+
60
+ subclass = mount_uploader_subclass(model_class, mounted_as, uploader_class)
61
+ model_class.uploaders[mounted_as.to_sym] = subclass
62
+ subclass
63
+ end
64
+
56
65
  def wire_carrierwave_uploader!(uploader_class)
57
66
  return unless uploader_class
58
67
 
@@ -101,6 +110,17 @@ module ArchiveStorage
101
110
  nil
102
111
  end
103
112
 
113
+ def mount_uploader_subclass(model_class, mounted_as, uploader_class)
114
+ const_name = "ArchiveStorage#{camelize(mounted_as)}Uploader"
115
+ return model_class.const_get(const_name, false) if model_class.const_defined?(const_name, false)
116
+
117
+ model_class.const_set(const_name, Class.new(uploader_class))
118
+ end
119
+
120
+ def camelize(value)
121
+ value.to_s.split("_").map(&:capitalize).join
122
+ end
123
+
104
124
  def mount_policy_for_uploader(uploader)
105
125
  return nil unless uploader.respond_to?(:model) && uploader.model
106
126
  return nil unless uploader.respond_to?(:mounted_as) && uploader.mounted_as
@@ -35,7 +35,8 @@ class CreateArchiveStorageFiles < ActiveRecord::Migration[7.0]
35
35
  end
36
36
 
37
37
  add_index :archive_storage_files,
38
- [:record_type, :record_id, :mounted_as, :identifier],
38
+ [:record_type, :record_id, :mounted_as, :identifier, :storage_key],
39
+ unique: true,
39
40
  name: "idx_archive_storage_identity"
40
41
 
41
42
  add_index :archive_storage_files,
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: archive_storage
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0
4
+ version: 0.1.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - E. Tashkovyan
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2026-05-26 00:00:00.000000000 Z
11
+ date: 2026-06-08 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: activejob
@@ -152,8 +152,8 @@ dependencies:
152
152
  - - "~>"
153
153
  - !ruby/object:Gem::Version
154
154
  version: '13.0'
155
- description: Move uploads across storage backends such as filesystem, NFS, MinIO,
156
- and S3 without downtime.
155
+ description: Move older Rails uploads across storage backends such as filesystem,
156
+ NFS, MinIO, and S3.
157
157
  email: []
158
158
  executables: []
159
159
  extensions: []
@@ -223,5 +223,5 @@ requirements: []
223
223
  rubygems_version: 3.5.22
224
224
  signing_key:
225
225
  specification_version: 4
226
- summary: Policy-based archive storage and zero-downtime file migration.
226
+ summary: Archival storage for Rails uploaders.
227
227
  test_files: []