mass_encryption 0.1.1

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+   metadata.gz: cc8df4e76e1e8014ef21dbaa9602c2912d7ea52b548e0a16e9dda04e32b27ae8
+   data.tar.gz: e81be1a7a60e9d9866aa8a2d3dda9395b45ebde59f2edac3b5f0f0a336377412
+ SHA512:
+   metadata.gz: 92133d3921f5e5ad0021f6eb9088a90765be980cdb078dff480a09a36af6c01d267d5890fa392c0fab92abd4a904b6199f74d9f5558c005322e3c9a521cb4403
+   data.tar.gz: 6964dcdd7e9df1fdc5eb991407dc92947fa17ac7a3208166d7e5695c1d98b3adf4f4f9a5fedbd7d37288b60c5e7f8df67b60e97467a6751b528051f964094692
data/MIT-LICENSE ADDED
@@ -0,0 +1,20 @@
+ Copyright 2021 Jorge Manrubia
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,81 @@
+ ![example workflow](https://github.com/basecamp/mass_encryption/actions/workflows/build.yml/badge.svg)
+
+ # MassEncryption
+
+ MassEncryption lets you encrypt large sets of data using [Active Record encryption](https://edgeguides.rubyonrails.org/active_record_encryption.html).
+
+ Its main use case is adding encryption to existing applications where you have a large amount of existing data to encrypt.
+
+ It relies on [Active Job](https://guides.rubyonrails.org/active_job_basics.html) to create encryption jobs that take care of encrypting data in batches.
+
+ ## Installation
+
+ Add this line to your application's Gemfile:
+
+ ```ruby
+ gem 'mass_encryption'
+ ```
+
+ ## Usage
+
+ MassEncryption offers two modes of operation:
+
+ - Encrypt data in tracks (recommended)
+ - Encrypt data in parallel jobs
+
+ ### Encrypt data in tracks (recommended)
+
+ When encrypting in tracks, you create a limited number of jobs, each of which encrypts a batch of records. Each job represents a track. When a job finishes encrypting its batch, it enqueues the next batch in its track.
+
+ ![](docs/images/encryption-in-tracks.png)
+
+ This mode of encryption lets you keep the number of jobs you enqueue under control. This has two advantages:
+
+ - It avoids having to enqueue all the jobs ahead of time. For example, you don't normally want to enqueue millions of jobs up front to encrypt billions of rows.
+ - It lets you limit concurrency to avoid capacity issues.
+
+ You can launch the encryption in this mode with:
+
+ ```shell
+ rake mass_encryption:encrypt_all_in_tracks
+ ```
+
+ By default, it will encrypt all the models with encrypted attributes, using a batch size of 1000 records per job and one track, so only one job will encrypt data at any given moment.
+
+ For example:
+
+ ```shell
+ # Encrypt all the posts starting with id 10 using 6 tracks (6 concurrent encryption jobs)
+ rake mass_encryption:encrypt_all_in_tracks ONLY="Post" FROM_ID=10 TRACKS=6
+ ```
+
+ ### Encrypt data in parallel jobs
+
+ In this mode, it will simply loop through all the batches of records and enqueue a job for each.
+
+ By default, it will encrypt all the models with encrypted attributes using a batch size of 1000 records per job.
+
+ ```shell
+ # Encrypt all the posts starting with id 10 using as many jobs as needed to encrypt them in batches of 500 records
+ rake mass_encryption:encrypt_all_in_parallel_jobs ONLY="Post" FROM_ID=10 BATCH_SIZE=500
+ ```
+
+ ### Options
+
+ You can customize the encryption by passing the following environment variables when invoking the rake task:
+
+ * `ONLY`. Comma-separated list of class names to encrypt.
+ * `EXCEPT`. Comma-separated list of class names to exclude.
+ * `FROM_ID`. Id to use as an anchor to start encryption. This is handy to resume encryption operations that got interrupted. Records with lower ids won't be encrypted. By default, it is the id of the first record of each model.
+ * `BATCH_SIZE`. The number of records each job will encrypt. By default it's 1000.
+ * `TRACKS`. The number of tracks to use (only available when encrypting in tracks). By default it's 1.
+
+ ## How it works
+
+ * MassEncryption internally uses [`upsert_all`](https://edgeapi.rubyonrails.org/classes/ActiveRecord/Persistence/ClassMethods.html#method-i-upsert_all) to perform fast updates in bulk.
+
+ * If the bulk update fails, MassEncryption jobs will try to encrypt the records in the batch one by one. They will collect all the individual errors and raise a single `MassEncryption::BatchEncryptionError` error aggregating them all. This way, one record failing to encrypt won't prevent other records in the batch from being encrypted.
+
+ ## License
+
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
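As a rough illustration of the track scheme the README describes, the plain-Ruby sketch below simulates which ids each track would pick up. It is not the gem's code: `simulate_tracks` is a hypothetical helper, and it works on an in-memory array of ids instead of database queries, assuming each round covers `size * tracks_count` records and each track takes its own slice of that window.

```ruby
# Hypothetical helper (not part of the gem): simulates, in memory, which ids
# each track would encrypt when batches are chained one after another.
def simulate_tracks(ids, size:, tracks_count:)
  ids = ids.sort
  (0...tracks_count).to_h do |track|
    batches = []
    from_id = ids.first
    while from_id
      # One "round" spans size * tracks_count records starting at from_id.
      window = ids.select { |id| id >= from_id }.first(size * tracks_count)
      # Each track takes its own slice of the round.
      batch = window[track * size, size] || []
      break if batch.empty?
      batches << batch
      # The next batch in this track starts right after the current round.
      from_id = window.last + 1
    end
    [track, batches]
  end
end

result = simulate_tracks((1..10).to_a, size: 2, tracks_count: 2)
result.each { |track, batches| puts "track #{track}: #{batches.inspect}" }
```

With ten consecutive ids, a batch size of 2 and 2 tracks, track 0 ends up with `[1, 2]`, `[5, 6]`, `[9, 10]` and track 1 with `[3, 4]`, `[7, 8]`, so the tracks never overlap.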
data/Rakefile ADDED
@@ -0,0 +1,18 @@
+ require "bundler/setup"
+
+ APP_RAKEFILE = File.expand_path("test/dummy/Rakefile", __dir__)
+ load "rails/tasks/engine.rake"
+
+ load "rails/tasks/statistics.rake"
+
+ require "bundler/gem_tasks"
+
+ require "rake/testtask"
+
+ Rake::TestTask.new(:test) do |t|
+   t.libs << 'test'
+   t.pattern = 'test/**/*_test.rb'
+   t.verbose = false
+ end
+
+ task default: :test
data/app/jobs/mass_encryption/application_job.rb ADDED
@@ -0,0 +1,5 @@
+ module MassEncryption
+   class ApplicationJob < ActiveJob::Base
+     queue_as :encryption
+   end
+ end
data/app/jobs/mass_encryption/batch_encryption_job.rb ADDED
@@ -0,0 +1,10 @@
+ class MassEncryption::BatchEncryptionJob < MassEncryption::ApplicationJob
+   def perform(batch, auto_enqueue_next: true)
+     if batch.present?
+       batch.encrypt_now
+       self.class.perform_later batch.next if auto_enqueue_next
+     end
+   end
+
+   ActiveSupport.run_load_hooks(:mass_encryption_batch_job, self)
+ end
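The job above chains itself: it processes one batch and then enqueues the next one. A minimal sketch of that pattern without Active Job might look like the following. `FakeBatch` and `run_chain` are hypothetical names for illustration, and an in-memory array stands in for the job queue.

```ruby
# Hypothetical stand-in for a batch: a slice of an in-memory id list.
FakeBatch = Struct.new(:ids, :from_index, :size) do
  def records
    ids[from_index, size] || []
  end

  def present?
    !records.empty?
  end

  # The batch that follows this one.
  def next
    self.class.new(ids, from_index + size, size)
  end
end

# Processes a chain of batches, tracking how many jobs were ever queued at once.
def run_chain(first_batch)
  queue = [first_batch]
  processed = []
  max_queue_size = 0

  until queue.empty?
    max_queue_size = [max_queue_size, queue.size].max
    batch = queue.shift
    next unless batch.present? # an empty batch ends the chain

    processed << batch.records
    queue << batch.next        # the job enqueues its own successor
  end

  [processed, max_queue_size]
end

processed, max_queue_size = run_chain(FakeBatch.new([1, 2, 3, 4, 5], 0, 2))
puts processed.inspect  # [[1, 2], [3, 4], [5]]
puts max_queue_size     # 1
```

The point of the sketch is that the queue never holds more than one job per chain, which is what keeps the number of enqueued jobs bounded.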
data/config/routes.rb ADDED
@@ -0,0 +1,2 @@
+ MassEncryption::Engine.routes.draw do
+ end
data/lib/mass_encryption/batch.rb ADDED
@@ -0,0 +1,100 @@
+ class MassEncryption::Batch
+   attr_reader :from_id, :size, :track, :tracks_count
+
+   DEFAULT_BATCH_SIZE = 1000
+
+   delegate :logger, to: MassEncryption
+
+   def initialize(klass:, from_id:, size: DEFAULT_BATCH_SIZE, track: 0, tracks_count: 1)
+     @class_name = klass.name # not storing class as instance variable as it causes stack overflow error with json serialization
+     @from_id = from_id
+     @size = size
+     @track = track
+     @tracks_count = tracks_count
+   end
+
+   def klass
+     @class_name.constantize
+   end
+
+   def encrypt_now
+     if klass.encrypted_attributes.present? && present?
+       validate_encrypting_is_allowed
+       encrypt_records
+     end
+   end
+
+   def validate_encrypting_is_allowed
+     raise ActiveRecord::Encryption::Errors::Configuration, "can't mass encrypt while in protected mode" if ActiveRecord::Encryption.context.frozen_encryption?
+   end
+
+   def encrypt_later(auto_enqueue_next: false)
+     MassEncryption::BatchEncryptionJob.perform_later(self, auto_enqueue_next: auto_enqueue_next)
+   end
+
+   def present?
+     # we deliberately load the association to avoid 2 queries when checking if there are records
+     # before encrypting in +MassEncryption::BatchEncryptionJob+
+     records.present?
+   end
+
+   def next
+     self.class.new(klass: klass, from_id: next_track_records.last.id + 1, size: size, track: track, tracks_count: tracks_count)
+   end
+
+   def records
+     @records ||= klass.where("id >= ?", determine_from_id).order(id: :asc).limit(size)
+   end
+
+   def to_s
+     "<#{klass}> from: #{from_id} size: #{size} (track=#{track}, tracks_count=#{tracks_count}) | #{records.first.id} - #{records.last.id}"
+   end
+
+   private
+     def encrypt_records
+       encrypt_using_upsert
+     rescue StandardError => error
+       logger.error "Upsert failed with #{error.inspect}. Trying to encrypt record by record..."
+       encrypt_record_by_record
+     end
+
+     def encrypt_using_upsert
+       klass.upsert_all records.collect(&:attributes), update_only: klass.encrypted_attributes, record_timestamps: false
+     end
+
+     def encrypt_record_by_record
+       errors_by_record = {}
+
+       records.each do |record|
+         record.encrypt
+       rescue StandardError => error
+         errors_by_record[record] = error
+       end
+
+       raise MassEncryption::BatchEncryptionError.new(errors_by_record) if errors_by_record.present?
+     end
+
+     def determine_from_id
+       if track == 0
+         from_id # save a query to determine the id for the first track
+       else
+         last_track_id.present? ? (last_track_id + 1) : from_id
+       end
+     end
+
+     def last_track_id
+       @last_track_id ||= ids_in_the_same_track.last
+     end
+
+     def offset
+       track * size
+     end
+
+     def ids_in_the_same_track
+       klass.where("id >= ?", from_id).order(id: :asc).limit(offset).ids
+     end
+
+     def next_track_records
+       klass.where("id >= ?", from_id).order(id: :asc).limit(size * tracks_count)
+     end
+ end
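The fallback in `encrypt_records` (try the bulk operation first, then go record by record and aggregate the failures) can be sketched in plain Ruby as follows. `AggregatedError` and `process_batch` are hypothetical names, and the `bulk` and `single` callables are assumptions standing in for `upsert_all` and `record.encrypt`.

```ruby
# Hypothetical error that aggregates per-record failures, in the spirit of
# MassEncryption::BatchEncryptionError.
class AggregatedError < StandardError
  attr_reader :errors_by_record

  def initialize(errors_by_record)
    @errors_by_record = errors_by_record
    super(errors_by_record.map { |record, error| "[#{record}] #{error.message}" }.join(", "))
  end
end

# Tries the fast bulk path first; if it fails, processes records one by one
# so that a single bad record does not block the rest of the batch.
def process_batch(records, bulk:, single:)
  bulk.call(records)
rescue StandardError
  errors = {}

  records.each do |record|
    single.call(record)
  rescue StandardError => error
    errors[record] = error
  end

  raise AggregatedError.new(errors) unless errors.empty?
end

encrypted = []
failing_bulk = ->(_records) { raise "bulk update failed" }
one_by_one = ->(record) { record == 2 ? raise(ArgumentError, "cannot encrypt #{record}") : encrypted << record }

begin
  process_batch([1, 2, 3], bulk: failing_bulk, single: one_by_one)
rescue AggregatedError => error
  puts error.message      # [2] cannot encrypt 2
  puts encrypted.inspect  # [1, 3]
end
```

Records 1 and 3 are still processed even though record 2 fails, and the caller sees one error that names every failed record.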
data/lib/mass_encryption/batch_encryption_error.rb ADDED
@@ -0,0 +1,9 @@
+ class MassEncryption::BatchEncryptionError < StandardError
+   attr_reader :errors_by_record
+
+   def initialize(errors_by_record)
+     @errors_by_record = errors_by_record
+     message = errors_by_record.collect { |record, error| "[#{record.class}:#{record.id}] #{error.inspect}" }.join(", ")
+     super(message)
+   end
+ end
data/lib/mass_encryption/batch_serializer.rb ADDED
@@ -0,0 +1,19 @@
+ class MassEncryption::BatchSerializer < ActiveJob::Serializers::ObjectSerializer
+   def serialize?(argument)
+     argument.kind_of?(MassEncryption::Batch)
+   end
+
+   def serialize(batch)
+     super(
+       "klass" => batch.klass.name,
+       "from_id" => batch.from_id,
+       "size" => batch.size,
+       "track" => batch.track || 0,
+       "tracks_count" => batch.tracks_count || 1
+     )
+   end
+
+   def deserialize(hash)
+     MassEncryption::Batch.new(klass: hash["klass"].constantize, from_id: hash["from_id"], size: hash["size"], track: hash["track"], tracks_count: hash["tracks_count"])
+   end
+ end
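The serializer reduces a batch to a hash of primitives so it can travel through the job queue. The sketch below shows that round trip with plain JSON. `SketchBatch`, `serialize_batch` and `deserialize_batch` are hypothetical stand-ins, and storing only the class name (not the class) is an assumption that mirrors the comment in `Batch#initialize`.

```ruby
require "json"

# Hypothetical stand-in for MassEncryption::Batch; it keeps the class *name*.
SketchBatch = Struct.new(:class_name, :from_id, :size, :track, :tracks_count, keyword_init: true)

# Reduces the batch to JSON-friendly primitives, defaulting missing track data.
def serialize_batch(batch)
  {
    "klass" => batch.class_name,
    "from_id" => batch.from_id,
    "size" => batch.size,
    "track" => batch.track || 0,
    "tracks_count" => batch.tracks_count || 1
  }
end

# Rebuilds a batch from the hash produced by serialize_batch.
def deserialize_batch(hash)
  SketchBatch.new(
    class_name: hash["klass"],
    from_id: hash["from_id"],
    size: hash["size"],
    track: hash["track"],
    tracks_count: hash["tracks_count"]
  )
end

original = SketchBatch.new(class_name: "Post", from_id: 10, size: 500)
payload = JSON.generate(serialize_batch(original))
restored = deserialize_batch(JSON.parse(payload))
puts restored.to_h.inspect
```

After the round trip, the unset `track` and `tracks_count` come back as 0 and 1, matching the defaults applied during serialization.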
data/lib/mass_encryption/encryptor.rb ADDED
@@ -0,0 +1,89 @@
+ class MassEncryption::Encryptor
+   DEFAULT_BATCH_SIZE = 1000
+
+   delegate :logger, to: MassEncryption
+
+   def initialize(from_id: nil, only: nil, except: nil, batch_size: DEFAULT_BATCH_SIZE, tracks_count: nil, silent: true)
+     only = Array(only || all_encryptable_classes)
+     except = Array(except)
+
+     @from_id = from_id
+     @encryptable_classes = only - except
+     @batch_size = batch_size
+     @silent = silent
+     @tracks_count = tracks_count
+
+     logger.info info_message unless silent
+   end
+
+   def encrypt_all_later
+     encryptable_classes.each { |klass| enqueue_encryption_jobs_for(klass) }
+   end
+
+   private
+     attr_reader :from_id, :encryptable_classes, :batch_size, :silent, :tracks_count
+
+     def info_message
+       message = "Encrypting #{encryptable_classes.count} models"
+       message << if execute_in_sequential_tracks?
+         " with #{tracks_count} head jobs"
+       else
+         " with parallel jobs"
+       end
+       message << "\n\t#{encryptable_classes.collect(&:name).join(", ")}\n\n"
+       message << [ "Batch size: #{batch_size}", ("From id: #{from_id}" if from_id), "Tracks count: #{tracks_count}" ].compact.join(" | ")
+
+       message
+     end
+
+     def enqueue_encryption_jobs_for(klass)
+       if execute_in_sequential_tracks?
+         enqueue_track_encryption_jobs_for(klass)
+       else
+         enqueue_all_encryption_jobs_for(klass)
+       end
+     end
+
+     def execute_in_sequential_tracks?
+       tracks_count.present?
+     end
+
+     def enqueue_all_encryption_jobs_for(klass)
+       all_records_for(klass).select(:id).in_batches(of: batch_size, load: true) do |records|
+         MassEncryption::Batch.new(klass: klass, from_id: records.first.id, size: batch_size).encrypt_later(auto_enqueue_next: false)
+       end
+     end
+
+     def all_records_for(klass)
+       base = klass
+       base = base.where("id >= ?", from_id) if from_id.present?
+       base
+     end
+
+     def enqueue_track_encryption_jobs_for(klass)
+       tracks_count.times.each do |track|
+         if first_record = all_records_for(klass).first
+           MassEncryption::Batch.new(klass: klass, from_id: first_record.id, size: batch_size, track: track, tracks_count: tracks_count)&.encrypt_later(auto_enqueue_next: true)
+         end
+       end
+     end
+
+     def all_encryptable_classes
+       @all_encryptable_classes ||= begin
+         Rails.application.eager_load! unless Rails.application.config.eager_load
+         ActiveRecord::Base.descendants.find_all { |klass| encryptable_class?(klass) }
+       end
+     end
+
+     def encryptable_class?(klass)
+       has_encrypted_attributes?(klass) || has_encrypted_rich_text_attribute?(klass)
+     end
+
+     def has_encrypted_attributes?(klass)
+       klass.encrypted_attributes.present?
+     end
+
+     def has_encrypted_rich_text_attribute?(klass)
+       klass.reflect_on_all_associations(:has_one).find { |relation| relation.klass == ActionText::EncryptedRichText }
+     end
+ end
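To make the difference between the two modes concrete, the back-of-the-envelope sketch below estimates how many jobs each mode enqueues up front for one model. `jobs_enqueued_up_front` is a hypothetical helper; it assumes one job per batch in parallel mode and one head job per track in tracks mode, as the Encryptor above suggests.

```ruby
# Hypothetical helper: estimates the jobs enqueued up front for a single model.
# tracks_count: nil means "parallel jobs" mode, mirroring the Encryptor default.
def jobs_enqueued_up_front(record_count:, batch_size: 1000, tracks_count: nil)
  return 0 if record_count.zero? # nothing to encrypt, nothing to enqueue

  if tracks_count
    tracks_count # one head job per track; later jobs are chained
  else
    (record_count + batch_size - 1) / batch_size # one job per batch (ceiling division)
  end
end

puts jobs_enqueued_up_front(record_count: 2_500_000)                  # 2500
puts jobs_enqueued_up_front(record_count: 2_500_000, tracks_count: 6) # 6
```

For 2.5 million rows and the default batch size, parallel mode enqueues 2500 jobs immediately, while tracks mode with 6 tracks starts with only 6.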
data/lib/mass_encryption/engine.rb ADDED
@@ -0,0 +1,9 @@
+ module MassEncryption
+   class Engine < ::Rails::Engine
+     isolate_namespace MassEncryption
+
+     initializer "mass_encryption.active_job" do
+       config.active_job.custom_serializers << MassEncryption::BatchSerializer
+     end
+   end
+ end
data/lib/mass_encryption/version.rb ADDED
@@ -0,0 +1,3 @@
+ module MassEncryption
+   VERSION = '0.1.1'
+ end
data/lib/mass_encryption.rb ADDED
@@ -0,0 +1,10 @@
+ require "mass_encryption/version"
+ require "mass_encryption/engine"
+
+ require "zeitwerk"
+ loader = Zeitwerk::Loader.for_gem
+ loader.setup
+
+ module MassEncryption
+   mattr_accessor :logger, default: ActiveSupport::Logger.new(STDOUT)
+ end
data/lib/tasks/mass_encryption_tasks.rake ADDED
@@ -0,0 +1,31 @@
+ namespace :mass_encryption do
+   task encrypt_all_in_tracks: :environment do
+     from_id = ENV["FROM_ID"]
+     only = MassEncryption::Tasks.classes_from(ENV["ONLY"])
+     except = MassEncryption::Tasks.classes_from(ENV["EXCEPT"])
+     tracks = (ENV["TRACKS"] || 1).to_i
+     batch_size = (ENV["BATCH_SIZE"] || 1000).to_i
+
+     MassEncryption::Encryptor.new(from_id: from_id, only: only, except: except, tracks_count: tracks, silent: false, batch_size: batch_size).encrypt_all_later
+   end
+
+   task encrypt_all_in_parallel_jobs: :environment do
+     from_id = ENV["FROM_ID"]
+     only = MassEncryption::Tasks.classes_from(ENV["ONLY"])
+     except = MassEncryption::Tasks.classes_from(ENV["EXCEPT"])
+     batch_size = (ENV["BATCH_SIZE"] || 1000).to_i
+
+     MassEncryption::Encryptor.new(from_id: from_id, only: only, except: except, silent: false, batch_size: batch_size).encrypt_all_later
+   end
+ end
+
+ module MassEncryption::Tasks
+   extend self
+
+   def classes_from(string)
+     if string.present?
+       class_strings = string.split(/[\s,]/).filter(&:present?)
+       class_strings.collect(&:constantize)
+     end
+   end
+ end
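The `ONLY` and `EXCEPT` variables are parsed by splitting on commas and whitespace and resolving each name to a class. A plain-Ruby approximation without Active Support might look like this. `classes_from_env` is a hypothetical name, and `Object.const_get` is used here in place of `constantize`.

```ruby
# Hypothetical plain-Ruby version of the class list parsing: accepts names
# separated by commas and/or whitespace and resolves each one to a constant.
def classes_from_env(string)
  return nil if string.nil? || string.strip.empty?

  string.split(/[\s,]+/).reject(&:empty?).map { |name| Object.const_get(name) }
end

# Core classes stand in for application models in this sketch.
puts classes_from_env("String, Integer").inspect # [String, Integer]
puts classes_from_env(nil).inspect               # nil
```

Returning `nil` for a missing or blank variable lets the caller fall back to its own default, such as all encryptable classes.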
metadata ADDED
@@ -0,0 +1,157 @@
+ --- !ruby/object:Gem::Specification
+ name: mass_encryption
+ version: !ruby/object:Gem::Version
+   version: 0.1.1
+ platform: ruby
+ authors:
+ - Jorge Manrubia
+ autorequire:
+ bindir: bin
+ cert_chain: []
+ date: 2021-12-22 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: rails
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: 7.0.0
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: 7.0.0
+ - !ruby/object:Gem::Dependency
+   name: mysql2
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: pg
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: rubocop
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: rubocop-performance
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: rubocop-rails
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: mocha
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ description:
+ email:
+ - jorge@hey.com
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - MIT-LICENSE
+ - README.md
+ - Rakefile
+ - app/jobs/mass_encryption/application_job.rb
+ - app/jobs/mass_encryption/batch_encryption_job.rb
+ - config/routes.rb
+ - lib/mass_encryption.rb
+ - lib/mass_encryption/batch.rb
+ - lib/mass_encryption/batch_encryption_error.rb
+ - lib/mass_encryption/batch_serializer.rb
+ - lib/mass_encryption/encryptor.rb
+ - lib/mass_encryption/engine.rb
+ - lib/mass_encryption/version.rb
+ - lib/tasks/mass_encryption_tasks.rake
+ homepage: http://github.com/basecamp/mass_encryption
+ licenses:
+ - MIT
+ metadata:
+   homepage_uri: http://github.com/basecamp/mass_encryption
+   allowed_push_host: https://rubygems.org
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubygems_version: 3.2.32
+ signing_key:
+ specification_version: 4
+ summary: Encrypt data in mass with Active Record Encryption
+ test_files: []