rails-archiver 0.1

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: d8e9e0797cbf13a9816455c8a2386933afb8b413
4
+ data.tar.gz: 45098734416b669f4011416e0d8d1cec1b859742
5
+ SHA512:
6
+ metadata.gz: 5067af9aaa1d6a0c6986da09562b95332ea258e7053fc669f6ef5817a3115cc5d43e4202f3de266d31d8bbf842b21e490cd3e23e4f19bd699caa908bb1a056f8
7
+ data.tar.gz: e93df4a91c299d70d7e708968aa40cb02a364d88149b73cf12c1c0e0e03d8059a3aab63bd51c63c4f616b884f6774697fe13692b483dd44edc1c853068827479
data/README.md ADDED
@@ -0,0 +1,58 @@
1
+ # rails-archiver
2
+ **This project is currently in beta.**
3
+
4
+ This project allows you to archive an entire tree of a model and restore it
5
+ back. This is useful for cases where there is one central "work item"
6
+ which has many tangential tables associated with it. Rather than having a policy
7
+ to archive each table after a specific amount of time, or to manually
8
+ find all related items and archive them, this will search through the
9
+ associations of the model, archive them all at once, and allow you to restore them.
10
+
11
+ You can also use this to back up the full model and restore it, e.g. on
12
+ a development machine or another environment. This allows you to easily
13
+ "package up" related data.
14
+
15
+ The intended usage is to leave the actual work item where it is so that
16
+ you can actually figure out what's available and how to access the associated
17
+ tables - the assumption is that all the associations are what's taking up
18
+ the room in your database. You can inherit from the base `Archiver` class
19
+ if you want to change this behavior.
20
+
21
+ # Usage
22
+
23
+ There are two central classes, `Archiver` and `Unarchiver`. Each of them
24
+ takes a "Transport" which dictates how to store and retrieve the archive.
25
+ A sample which uses the AWS SDK to save the archive to S3 is provided.
26
+
27
+ archiver = RailsArchiver::Archiver.new(my_model,
28
+ :transport => :s3,
29
+ :delete_records => true)
30
+
31
+ archiver.transport.configure(
32
+ :bucket_name => 'my_bucket',
33
+ :base_path => '/path/to/directory')
34
+ archiver.archive
35
+
36
+ unarchiver = RailsArchiver::Unarchiver.new(my_model)
37
+
38
+ ## Special attributes
39
+
40
+ If the model has an attribute called "archived", it will automatically be set
41
+ to true when it's been archived, and false once it's been unarchived. In
42
+ addition, if using the S3 transport, it will also look for an attribute
43
+ called `archived_s3_key` and set it to the location of the archive.
44
+
45
+ ## Deciding what to archive
46
+
47
+ By default, the archiver will include all associations which are:
48
+ 1) dependent -> `destroy` or `delete_all`
49
+ 2) `has_many` or `has_one`
50
+
51
+ You can change this behavior by subclassing the archiver class and overriding
52
+ the `get_associations` method.
53
+
54
+ ## Compatibility
55
+
56
+ Currently this project has been tested with Rails 3.0. However, since it uses
57
+ fairly basic Rails model methods, it should be compatible with Rails 4 and 5
58
+ as well.
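The default rule described under "Deciding what to archive" can be illustrated without Rails. This is a minimal sketch, not the gem's code: `FakeReflection` and `archivable_names` are hypothetical names, and the struct only imitates the `macro` and `options` readers that Rails association reflections expose.

```ruby
# Hypothetical stand-in for a Rails association reflection.
FakeReflection = Struct.new(:name, :macro, :options)

# Returns the names of associations that pass the same two checks the
# default get_associations applies: a dependent option of :destroy or
# :delete_all, and a macro of :has_many or :has_one.
def archivable_names(associations)
  associations.select do |assoc|
    [:destroy, :delete_all].include?(assoc.options[:dependent]) &&
      [:has_many, :has_one].include?(assoc.macro)
  end.map(&:name)
end

# Made-up sample associations for illustration only.
SAMPLE_ASSOCIATIONS = [
  FakeReflection.new(:comments, :has_many, { dependent: :delete_all }),
  FakeReflection.new(:profile, :has_one, { dependent: :destroy }),
  FakeReflection.new(:tags, :has_many, {}),
  FakeReflection.new(:audits, :has_many, { dependent: :nullify }),
  FakeReflection.new(:owner, :belongs_to, { dependent: :destroy })
].freeze
```

With these samples, only `comments` and `profile` pass both checks: `tags` has no `dependent` option, `audits` uses `:nullify`, and `owner` is a `belongs_to`.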
data/lib/rails-archiver/archiver.rb ADDED
@@ -0,0 +1,161 @@
1
+ require 'tmpdir'
2
+ # Takes a database model and:
3
+ # 1) Visits all dependent associations
4
+ # 2) Saves everything in one giant JSON hash
5
+ # 3) Uploads the hash as configured
6
+ # 4) Deletes all current records from the database
7
+ # 5) Marks model as archived
8
+ module RailsArchiver
9
+ class Archiver
10
+
11
+ # Hash which determines equality solely based on the id key.
12
+ class IDHash < Hash
13
+ def ==(other)
14
+ self[:id] == other[:id]
15
+ end
16
+ end
17
+
18
+ attr_accessor :transport
19
+
20
+ # Create a new Archiver with the given model.
21
+ # @param model [ActiveRecord::Base] the model to archive or unarchive.
22
+ # @param options [Hash]
23
+ # * logger [Logger]
24
+ # * transport [Symbol] :in_memory or :s3 right now
25
+ # * delete_records [Boolean] whether or not we should delete existing
26
+ # records
27
+ def initialize(model, options={})
28
+ @model = model
29
+ @logger = options.delete(:logger) || ::Logger.new(STDOUT)
30
+ @hash = {}
31
+ self.transport = _get_transport(options.delete(:transport) || :in_memory)
32
+ @options = options
33
+ # hash of table name -> IDs to delete in that table
34
+ @ids_to_delete = {}
35
+ end
36
+
37
+ # Archive a model.
38
+ # @return [Hash] the hash that was archived.
39
+ def archive
40
+ @logger.info("Starting archive of #{@model.class.name} #{@model.id}")
41
+ @hash = {}
42
+ _visit_association(@model)
43
+ @logger.info('Completed loading data')
44
+ @transport.store_archive(@hash)
45
+ if @model.attribute_names.include?('archived')
46
+ @model.update_attribute(:archived, true)
47
+ end
48
+ @logger.info('Deleting rows') if @options[:delete_records]
49
+ _delete_records if @options[:delete_records]
50
+ @logger.info('All records deleted') if @options[:delete_records]
51
+ @hash
52
+ end
53
+
54
+ # Returns a single object in the database represented as a hash.
55
+ # Does not account for any associations, only prints out the columns
56
+ # associated with the object as they relate to the current schema.
57
+ # Can be extended but should not be overridden or called explicitly.
58
+ # @param node [ActiveRecord::Base] an object that inherits from AR::Base
59
+ # @return [Hash]
60
+ def visit(node)
61
+ return {} unless node.class.respond_to?(:column_names)
62
+ if @options[:delete_records] && node != @model
63
+ @ids_to_delete[node.class.table_name] ||= Set.new
64
+ @ids_to_delete[node.class.table_name] << node.id
65
+ end
66
+ IDHash[
67
+ node.class.column_names.select do |cn|
68
+ next unless node.respond_to?(cn)
69
+ # Only export columns that we actually have data for
70
+ !node[cn].nil?
71
+ end.map do |cn|
72
+ [cn.to_sym, node[cn]]
73
+ end
74
+ ]
75
+ end
76
+
77
+ # Delete rows from a table. Can be used in #delete_records.
78
+ # @param table [String] the table name.
79
+ # @param ids [Array<Integer>] the IDs to delete.
80
+ def delete_from_table(table, ids)
81
+ return if ids.blank?
82
+ @logger.info("Deleting #{ids.size} records from #{table}")
83
+ groups = ids.to_a.in_groups_of(10000)
84
+ groups.each_with_index do |group, i|
85
+ sleep(0.5) if i > 0 # throttle so we don't kill the DB
86
+ delete_query = <<-SQL
87
+ DELETE FROM `#{table}` WHERE `id` IN (#{group.compact.join(',')})
88
+ SQL
89
+ ActiveRecord::Base.connection.delete(delete_query)
90
+ end
91
+
92
+ @logger.info("Finished deleting from #{table}")
93
+ end
94
+
95
+ protected
96
+
97
+ # Callback that runs after deletion is finished.
98
+ def after_delete
99
+ end
100
+
101
+ # Indicate which associations to retrieve from the given model.
102
+ # @param node [ActiveRecord::Base]
103
+ def get_associations(node)
104
+ node.class.reflect_on_all_associations.select do |assoc|
105
+ [:destroy, :delete_all].include?(assoc.options[:dependent]) &&
106
+ [:has_many, :has_one].include?(assoc.macro)
107
+ end
108
+ end
109
+
110
+ private
111
+
112
+ # Delete the records corresponding to the model.
113
+ def _delete_records
114
+ @ids_to_delete.each do |table, ids|
115
+ delete_from_table(table, ids)
116
+ end
117
+ end
118
+
119
+ # @param symbol_or_object [Symbol|RailsArchiver::Transport::Base]
120
+ # @return [RailsArchiver::Transport::Base]
121
+ def _get_transport(symbol_or_object)
122
+ if symbol_or_object.is_a?(Symbol)
123
+ klass = if symbol_or_object.present?
124
+ "RailsArchiver::Transport::#{symbol_or_object.to_s.classify}".constantize
125
+ else
126
+ Transport::InMemory
127
+ end
128
+ klass.new(@model, @logger)
129
+ else
130
+ symbol_or_object
131
+ end
132
+ end
133
+
134
+ # Used to visit an association, and recursively calls down to
135
+ # all child objects through all other allowed associations.
136
+ # @param node [ActiveRecord::Base|Array<ActiveRecord::Base>]
137
+ # any object(s) that inherits from ActiveRecord::Base
138
+ def _visit_association(node)
139
+ return if node.blank?
140
+ if node.respond_to?(:each) # e.g. a list of nodes from a has_many association
141
+ node.each { |n| _visit_association(n) }
142
+ else
143
+ class_name = node.class.name
144
+ @hash[class_name] ||= Set.new
145
+ @hash[class_name] << visit(node)
146
+ get_associations(node).each do |assoc|
147
+ @logger.debug("Visiting #{assoc.name}")
148
+ new_nodes = node.send(assoc.name)
149
+ next if new_nodes.blank?
150
+
151
+ if new_nodes.respond_to?(:each)
152
+ new_nodes.each { |n| _visit_association(n) }
153
+ else
154
+ _visit_association(new_nodes)
155
+ end
156
+ end
157
+
158
+ end
159
+ end
160
+ end
161
+ end
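One subtlety in the archiver code: `IDHash` overrides `==`, but Ruby's `Set` decides membership with `hash` and `eql?`, which `IDHash` inherits from `Hash` and which compare full contents. The sketch below copies the `IDHash` definition to show the effect in plain Ruby. The sample records and the `distinct_count` helper are made up for illustration.

```ruby
require 'set'

# Copy of the IDHash class from the archiver, reproduced so this
# example runs on its own.
class IDHash < Hash
  def ==(other)
    self[:id] == other[:id]
  end
end

# Hypothetical helper: how many distinct entries a Set keeps.
def distinct_count(*records)
  Set.new(records).size
end

# Same id, different contents.
FIRST  = IDHash[[[:id, 1], [:name, 'first']]]
SECOND = IDHash[[[:id, 1], [:name, 'second']]]
# Same id, same contents as FIRST.
COPY   = IDHash[[[:id, 1], [:name, 'first']]]
```

Here `FIRST == SECOND` is true because only `:id` is compared, yet a `Set` keeps both entries since their contents differ. Only `FIRST` and `COPY`, whose contents are identical, collapse into a single entry.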
data/lib/rails-archiver/transport/base.rb ADDED
@@ -0,0 +1,37 @@
1
+ # Abstract class that represents a way to store and retrieve the generated
2
+ # JSON object.
3
+ module RailsArchiver
4
+ module Transport
5
+ class Base
6
+
7
+ # @param model [ActiveRecord::Base] the model we will be working with.
8
+ def initialize(model, logger=nil)
9
+ @model = model
10
+ @options = {}
11
+ @logger = logger || ::Logger.new(STDOUT)
12
+ end
13
+
14
+ # @param options [Hash] A set of options to work with.
15
+ def configure(options)
16
+ @options = options
17
+ end
18
+
19
+ # To be implemented by subclasses. Store the archive somewhere to be retrieved
20
+ # later. You should also be storing the location somewhere such as on the
21
+ # model. Use @model to reference it.
22
+ # @param hash [Hash] the hash to store. Generally you'll want to use
23
+ # .to_json on it.
24
+ def store_archive(hash)
25
+ raise NotImplementedError
26
+ end
27
+
28
+ # To be implemented by subclasses. Retrieve the archive that was previously
29
+ # created.
30
+ # @return [Hash] the retrieved hash.
31
+ def retrieve_archive
32
+ raise NotImplementedError
33
+ end
34
+
35
+ end
36
+ end
37
+ end
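To make the transport contract concrete, here is a minimal sketch of a custom transport that writes the archive to a local directory. It is an illustration under stated assumptions, not part of the gem: `LocalFile`, the `:directory` option, and the `WorkItem` struct are hypothetical, and the `Base` class is a trimmed stand-in for the one in this diff so the example runs without Rails.

```ruby
require 'json'
require 'logger'
require 'tmpdir'

module RailsArchiver
  module Transport
    # Trimmed stand-in for the Base class shown in this diff.
    class Base
      def initialize(model, logger = nil)
        @model = model
        @options = {}
        @logger = logger || ::Logger.new($stdout)
      end

      def configure(options)
        @options = options
      end
    end

    # Hypothetical transport: stores the archive as a JSON file on disk.
    class LocalFile < Base
      # Writes the hash as JSON and returns the path it was written to.
      def store_archive(hash)
        File.write(archive_path, JSON.generate(hash))
        archive_path
      end

      # Reads the JSON file back; keys come back as strings.
      def retrieve_archive
        JSON.parse(File.read(archive_path))
      end

      private

      # :directory is an assumed option name, set through #configure.
      def archive_path
        File.join(@options.fetch(:directory),
                  "#{@model.class.name}_#{@model.id}.json")
      end
    end
  end
end

# Hypothetical model stand-in; only #id and the class name are used.
WorkItem = Struct.new(:id)
```

Because `_get_transport` returns any non-Symbol argument unchanged, an instance of a class like this could be passed directly as the `:transport` option.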
data/lib/rails-archiver/transport/in_memory.rb ADDED
@@ -0,0 +1,17 @@
1
+ # Transport that just stores and retrieves the hash in memory.
2
+ module RailsArchiver
3
+ module Transport
4
+ class InMemory < Base
5
+
6
+ def store_archive(json)
7
+ @options[:json] = json
8
+ 'some-key-here'
9
+ end
10
+
11
+ def retrieve_archive
12
+ @options[:json]
13
+ end
14
+
15
+ end
16
+ end
17
+ end
data/lib/rails-archiver/transport/s3.rb ADDED
@@ -0,0 +1,73 @@
1
+ require 'aws-sdk'
2
+ require 'securerandom'
3
+ # Transport that stores to S3. Uses an archived_s3_key attribute.
4
+ module RailsArchiver
5
+ module Transport
6
+ class S3 < Base
7
+
8
+ def s3_client
9
+ option_hash = @options[:region] ? {:region => @options[:region]} : {}
10
+ Aws::S3::Client.new(option_hash)
11
+ end
12
+
13
+ # Gzips the file, returns the gzipped filename
14
+ def gzip(filename)
15
+ output = `gzip --force #{filename.shellescape} 2>&1`
16
+
17
+ raise output if $?.exitstatus != 0
18
+
19
+ "#{filename}.gz"
20
+ end
21
+
22
+ def gunzip(filename)
23
+ output = `gunzip --force #{filename.shellescape} 2>&1`
24
+
25
+ raise output if $?.exitstatus != 0
26
+ end
27
+
28
+ def store_archive(hash)
29
+ json = hash.to_json
30
+ file_path = "#{@model.id}_#{SecureRandom.hex(8)}.json"
31
+ s3_key = "#{@options[:base_path]}/#{file_path}.gz"
32
+ Dir.mktmpdir do |dir|
33
+ json_filename = "#{dir}/#{file_path}"
34
+ @logger.info('Writing hash to JSON')
35
+ File.write(json_filename, json)
36
+ @logger.info('Zipping file')
37
+ filename = gzip(json_filename)
38
+ @logger.info("Uploading file to #{s3_key}")
39
+ _save_archive_to_s3(s3_key, filename)
40
+ end
41
+ @model.update_attribute(:archived_s3_key, s3_key)
42
+ s3_key
43
+ end
44
+
45
+ def retrieve_archive
46
+ Dir.mktmpdir do |dir|
47
+ filename = "#{dir}/#{@model.id}.json"
48
+ _get_archive_from_s3(@model.archived_s3_key, "#{filename}.gz")
49
+ @logger.info('Unzipping file')
50
+ gunzip("#{filename}.gz")
51
+ @logger.info('Parsing JSON')
52
+ JSON.parse(File.read(filename))
53
+ end
54
+ end
55
+
56
+ private
57
+
58
+ def _get_archive_from_s3(s3_key, filename)
59
+ s3_client.get_object(
60
+ :response_target => filename,
61
+ :bucket => @options[:bucket_name],
62
+ :key => s3_key)
63
+ end
64
+
65
+ def _save_archive_to_s3(s3_key, filename)
66
+ s3_client.put_object(:bucket => @options[:bucket_name],
67
+ :key => s3_key,
68
+ :body => File.open(filename))
69
+
70
+ end
71
+ end
72
+ end
73
+ end
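The S3 transport shells out to `gzip` and `gunzip`. As an alternative sketch (a swapped-in technique, not what the gem does), the same compress-then-parse round trip can be done in-process with Ruby's standard `zlib` library. The helper names `write_gzipped_json` and `read_gzipped_json` are hypothetical.

```ruby
require 'json'
require 'tmpdir'
require 'zlib'

# Serializes the hash to JSON and writes it gzip-compressed to path.
# Returns the path so the caller can upload the file.
def write_gzipped_json(hash, path)
  Zlib::GzipWriter.open(path) { |gz| gz.write(JSON.generate(hash)) }
  path
end

# Reads a gzip-compressed JSON file and parses it back into a Hash.
# Keys come back as strings, as with any JSON round trip.
def read_gzipped_json(path)
  Zlib::GzipReader.open(path) { |gz| JSON.parse(gz.read) }
end
```

This avoids depending on external binaries and on inspecting `$?`, at the cost of holding the whole JSON string in memory, which the original code also does before writing the file.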
data/lib/rails-archiver/unarchiver.rb ADDED
@@ -0,0 +1,125 @@
1
+ # Class that loads a tree hash of objects representing ActiveRecord classes.
2
+ # We will use the models in the codebase to determine how to import them.
3
+
4
+ require 'activerecord-import'
5
+
6
+ module RailsArchiver
7
+ class Unarchiver
8
+
9
+ class ImportError < StandardError; end
10
+
11
+ attr_accessor :errors, :transport
12
+
13
+ # @param model [ActiveRecord::Base]
14
+ # @param options [Hash]
15
+ # * logger [Logger]
16
+ # * new_copy [Boolean] if true, create all new objects instead of
17
+ # replacing existing ones.
18
+ # * crash_on_errors [Boolean] if true, do not do any imports if any
19
+ # models' validations failed.
20
+ def initialize(model, options={})
21
+ @model = model
22
+ @logger = options.delete(:logger) || Logger.new(STDOUT)
23
+ @options = options
24
+ # Transport for downloading
25
+ self.transport = _get_transport(options.delete(:transport) || :in_memory)
26
+ self.errors = []
27
+ end
28
+
29
+ # Unarchive a model.
30
+ def unarchive
31
+ @errors = []
32
+ @logger.info('Downloading JSON file')
33
+ hash = @transport.retrieve_archive
34
+ @logger.info("Loading #{@model.class.name}")
35
+ load_classes(hash)
36
+ @model.reload
37
+ if @model.attribute_names.include?('archived')
38
+ @model.update_attribute(:archived, false)
39
+ end
40
+ @logger.info("#{@model.class.name} load complete!")
41
+ end
42
+
43
+ # Load a list of general classes that were saved as JSON.
44
+ # @param hash [Hash]
45
+ def load_classes(hash)
46
+ full_hash = hash.with_indifferent_access
47
+ full_hash.each do |key, vals|
48
+ save_models(key.constantize, vals)
49
+ end
50
+ if @options[:crash_on_errors] && self.errors.any?
51
+ raise ImportError.new("Errors occurred during load - please see 'errors' method for more details")
52
+ end
53
+ end
54
+
55
+ # Save all models into memory in the given hash on the given class.
56
+ # @param klass [Class] the starting class.
57
+ # @param hashes [Array<Hash>] the object hashes to import.
58
+ def save_models(klass, hashes)
59
+ models = hashes.map { |hash| init_model(klass, hash) }
60
+ import_objects(klass, models)
61
+ end
62
+
63
+ # Import saved objects.
64
+ # @param klass [Class]
65
+ # @param models [Array<ActiveRecord::Base>]
66
+ def import_objects(klass, models)
67
+ cols_to_update = klass.column_names - [klass.primary_key]
68
+ # check other unique indexes
69
+ indexes = ActiveRecord::Base.connection.indexes(klass.table_name).
70
+ select(&:unique)
71
+ indexes.each { |index| cols_to_update -= index.columns }
72
+ options = { :validate => false, :timestamps => false,
73
+ :on_duplicate_key_update => cols_to_update }
74
+
75
+ @logger.info("Importing #{models.length} for #{klass.name}")
76
+ models.in_groups_of(1000).each do |group|
77
+ klass.import(group.compact, options)
78
+ end
79
+ rescue => e
80
+ self.errors << "Error importing class #{klass.name}: #{e.message}"
81
+ end
82
+
83
+ def init_model(klass, hash)
84
+ attrs = hash.select do |x|
85
+ klass.column_names.include?(x) && x != klass.primary_key
86
+ end
87
+
88
+ # fix time zone issues
89
+ klass.columns.each do |col|
90
+ if col.type == :datetime && attrs[col.name]
91
+ attrs[col.name] = Time.zone.parse(attrs[col.name])
92
+ end
93
+ end
94
+
95
+ model = klass.where(klass.primary_key => hash[klass.primary_key]).first
96
+ if model.nil?
97
+ model = klass.new
98
+ model.send(:attributes=, attrs, false)
99
+ # can't set this in the attribute hash, it'll be overridden. Need
100
+ # to set it manually.
101
+ model[klass.primary_key] = hash[klass.primary_key]
102
+ else
103
+ model.send(:attributes=, attrs, false)
104
+ end
105
+
106
+ model
107
+ end
108
+
109
+ private
110
+
111
+ def _get_transport(symbol_or_object)
112
+ if symbol_or_object.is_a?(Symbol)
113
+ klass = if symbol_or_object.present?
114
+ "RailsArchiver::Transport::#{symbol_or_object.to_s.classify}".constantize
115
+ else
116
+ Transport::InMemory
117
+ end
118
+ klass.new(@model, @logger)
119
+ else
120
+ symbol_or_object
121
+ end
122
+ end
123
+
124
+ end
125
+ end
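Both `delete_from_table` and `import_objects` batch their work with ActiveSupport's `in_groups_of`, which pads the final group with `nil` (hence the `compact` calls). Below is a plain-Ruby sketch of the same batching using core `Enumerable#each_slice`, which does not pad. The helper names `batches` and `delete_statements` are hypothetical, and the SQL strings are only built, never executed.

```ruby
require 'set'

# Splits a collection of IDs into arrays of at most `size` elements.
# Accepts anything that responds to #to_a, such as an Array or a Set.
def batches(ids, size)
  ids.to_a.each_slice(size).to_a
end

# Builds one DELETE statement per batch, in the same shape as the
# archiver's query. Assumes integer IDs and a trusted table name.
def delete_statements(table, ids, size)
  batches(ids, size).map do |group|
    "DELETE FROM `#{table}` WHERE `id` IN (#{group.join(',')})"
  end
end
```

For example, five IDs with a batch size of two yield three batches, the last holding a single ID and no `nil` padding.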
data/lib/rails-archiver.rb ADDED
@@ -0,0 +1,6 @@
1
+ require 'rails-archiver/transport/base.rb'
2
+ require 'rails-archiver/transport/in_memory.rb'
3
+ require 'rails-archiver/transport/s3.rb'
4
+
5
+ require 'rails-archiver/archiver.rb'
6
+ require 'rails-archiver/unarchiver.rb'
data/rails-archiver.gemspec ADDED
@@ -0,0 +1,18 @@
1
+ Gem::Specification.new do |s|
2
+ s.name = 'rails-archiver'
3
+ s.require_paths = %w(. lib lib/rails-archiver)
4
+ s.version = '0.1'
5
+ s.date = '2016-03-19'
6
+ s.summary = 'Fully archive a Rails model'
7
+ s.description = <<-EOF
8
+ EOF
9
+ s.authors = ['Daniel Orner']
10
+ s.email = 'daniel.orner@wishabi.com'
11
+ s.files = `git ls-files`.split($/)
12
+ s.homepage = 'https://github.com/dmorner/rails-archiver'
13
+ s.license = 'MIT'
14
+
15
+ s.add_dependency 'rails', '>= 3.0'
16
+ s.add_dependency 'aws-sdk', '~> 2.6'
17
+
18
+ end
metadata ADDED
@@ -0,0 +1,81 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: rails-archiver
3
+ version: !ruby/object:Gem::Version
4
+ version: '0.1'
5
+ platform: ruby
6
+ authors:
7
+ - Daniel Orner
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2016-03-19 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: rails
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: '3.0'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ">="
25
+ - !ruby/object:Gem::Version
26
+ version: '3.0'
27
+ - !ruby/object:Gem::Dependency
28
+ name: aws-sdk
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '2.6'
34
+ type: :runtime
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '2.6'
41
+ description: ''
42
+ email: daniel.orner@wishabi.com
43
+ executables: []
44
+ extensions: []
45
+ extra_rdoc_files: []
46
+ files:
47
+ - README.md
48
+ - lib/rails-archiver.rb
49
+ - lib/rails-archiver/archiver.rb
50
+ - lib/rails-archiver/transport/base.rb
51
+ - lib/rails-archiver/transport/in_memory.rb
52
+ - lib/rails-archiver/transport/s3.rb
53
+ - lib/rails-archiver/unarchiver.rb
54
+ - rails-archiver.gemspec
55
+ homepage: https://github.com/dmorner/rails-archiver
56
+ licenses:
57
+ - MIT
58
+ metadata: {}
59
+ post_install_message:
60
+ rdoc_options: []
61
+ require_paths:
62
+ - "."
63
+ - lib
64
+ - lib/rails-archiver
65
+ required_ruby_version: !ruby/object:Gem::Requirement
66
+ requirements:
67
+ - - ">="
68
+ - !ruby/object:Gem::Version
69
+ version: '0'
70
+ required_rubygems_version: !ruby/object:Gem::Requirement
71
+ requirements:
72
+ - - ">="
73
+ - !ruby/object:Gem::Version
74
+ version: '0'
75
+ requirements: []
76
+ rubyforge_project:
77
+ rubygems_version: 2.5.2
78
+ signing_key:
79
+ specification_version: 4
80
+ summary: Fully archive a Rails model
81
+ test_files: []