rails-archiver 0.1

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: d8e9e0797cbf13a9816455c8a2386933afb8b413
4
+ data.tar.gz: 45098734416b669f4011416e0d8d1cec1b859742
5
+ SHA512:
6
+ metadata.gz: 5067af9aaa1d6a0c6986da09562b95332ea258e7053fc669f6ef5817a3115cc5d43e4202f3de266d31d8bbf842b21e490cd3e23e4f19bd699caa908bb1a056f8
7
+ data.tar.gz: e93df4a91c299d70d7e708968aa40cb02a364d88149b73cf12c1c0e0e03d8059a3aab63bd51c63c4f616b884f6774697fe13692b483dd44edc1c853068827479
data/README.md ADDED
@@ -0,0 +1,58 @@
1
+ # rails-archiver
2
+ **This project is currently in beta.**
3
+
4
+ This project allows you to archive an entire tree of a model and restore it
5
+ back. This is useful for cases where there is one central "work item"
6
+ which has many tangential tables associated with it. Rather than having a policy
7
+ to archive each table after a specific amount of time, or to manually
8
+ find all related items and archive them, this will search through the
9
+ associations of the model, archive them all at once, and allow you to restore them.
10
+
11
+ You can also use this to back up the full model and restore it, e.g. on
12
+ a development machine or another environment. This allows you to easily
13
+ "package up" related data.
14
+
15
+ The intended usage is to leave the actual work item where it is so that
16
+ you can actually figure out what's available and how to access the associated
17
+ tables - the assumption is that all the associations are what's taking up
18
+ the room in your database. You can inherit from the base `Archiver` class
19
+ if you want to change this behavior.
20
+
21
+ # Usage
22
+
23
+ There are two central classes, `Archiver` and `Unarchiver`. Each of them
24
+ takes a "Transport" which dictates how to store and retrieve the archive.
25
+ A sample which uses the AWS SDK to save the archive to S3 is provided.
26
+
27
+ archiver = RailsArchiver::Archiver.new(my_model,
28
+ :transport => :s3,
29
+ :delete_records => true)
30
+
31
+ archiver.transport.configure(
32
+ :bucket_name => 'my_bucket',
33
+ :base_path => '/path/to/directory')
34
+ archiver.archive
35
+
36
+ unarchiver = RailsArchiver::Unarchiver.new(my_model)
37
+
38
+ ## Special attributes
39
+
40
+ If the model has an attribute called `archived`, it will automatically be set
41
+ to true when it's been archived, and false once it's been unarchived. In
42
+ addition, if using the S3 transport, it will also look for an attribute
43
+ called `archived_s3_key` and set it to the location of the archive.
44
+
45
+ ## Deciding what to archive
46
+
47
+ By default, the archiver will include all associations which are:
48
+ 1) dependent -> `destroy` or `delete_all`
49
+ 2) `has_many` or `has_one`
50
+
51
+ You can change this behavior by subclassing the archiver class and overriding
52
+ the `get_associations` method.
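
As a rough illustration of the default rule above, the sketch below applies the same two conditions to plain Ruby structs. `FakeAssociation`, `archivable?`, `archivable_names` and `SAMPLE_ASSOCIATIONS` are hypothetical names invented for this example; they stand in for the reflection objects a real model would expose and are not part of rails-archiver or Rails.

```ruby
# Hypothetical stand-in for an association reflection (name, macro, options).
FakeAssociation = Struct.new(:name, :macro, :options)

# Mirrors the default rule: the association must be has_many or has_one
# AND declare :dependent => :destroy or :delete_all.
def archivable?(assoc)
  [:destroy, :delete_all].include?(assoc.options[:dependent]) &&
    [:has_many, :has_one].include?(assoc.macro)
end

# Returns the names of the associations that pass the rule.
def archivable_names(associations)
  associations.select { |assoc| archivable?(assoc) }.map(&:name)
end

# Made-up sample data for the sketch.
SAMPLE_ASSOCIATIONS = [
  FakeAssociation.new(:comments, :has_many, { :dependent => :destroy }),
  FakeAssociation.new(:profile, :has_one, { :dependent => :delete_all }),
  FakeAssociation.new(:author, :belongs_to, {}),
  FakeAssociation.new(:tags, :has_many, {})
]

puts archivable_names(SAMPLE_ASSOCIATIONS).inspect
```

A subclass that overrides `get_associations` would replace the predicate with its own rule, for example by excluding one association by name.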
53
+
54
+ ## Compatibility
55
+
56
+ This project has so far only been tested with Rails 3.0. Because it relies on
+ fairly basic Rails model methods, it is expected to work with Rails 4 and 5 as
+ well, but this has not been verified.
data/lib/rails-archiver/archiver.rb ADDED
@@ -0,0 +1,161 @@
1
+ require 'tmpdir'
2
+ # Takes a database model and:
3
+ # 1) Visits all dependent associations
4
+ # 2) Saves everything in one giant JSON hash
5
+ # 3) Uploads the hash as configured
6
+ # 4) Deletes all current records from the database
7
+ # 5) Marks model as archived
8
+ module RailsArchiver
9
+ class Archiver
10
+
11
+ # Hash which determines equality solely based on the id key.
12
+ class IDHash < Hash
13
+ def ==(other)
14
+ self[:id] == other[:id]
15
+ end
16
+ end
17
+
18
+ attr_accessor :transport
19
+
20
+ # Create a new Archiver with the given model.
21
+ # @param model [ActiveRecord::Base] the model to archive or unarchive.
22
+ # @param options [Hash]
23
+ # * logger [Logger]
24
+ # * transport [Symbol] :in_memory or :s3 right now
25
+ # * delete_records [Boolean] whether or not we should delete existing
26
+ # records
27
+ def initialize(model, options={})
28
+ @model = model
29
+ @logger = options.delete(:logger) || ::Logger.new(STDOUT)
30
+ @hash = {}
31
+ self.transport = _get_transport(options.delete(:transport) || :in_memory)
32
+ @options = options
33
+ # hash of table name -> IDs to delete in that table
34
+ @ids_to_delete = {}
35
+ end
36
+
37
+ # Archive a model.
38
+ # @return [Hash] the hash that was archived.
39
+ def archive
40
+ @logger.info("Starting archive of #{@model.class.name} #{@model.id}")
41
+ @hash = {}
42
+ _visit_association(@model)
43
+ @logger.info('Completed loading data')
44
+ @transport.store_archive(@hash)
45
+ if @model.attribute_names.include?('archived')
46
+ @model.update_attribute(:archived, true)
47
+ end
48
+ if @options[:delete_records]
+ @logger.info('Deleting rows')
+ _delete_records
+ @logger.info('All records deleted')
+ after_delete
+ end
51
+ @hash
52
+ end
53
+
54
+ # Returns a single object in the database represented as a hash.
55
+ # Does not account for any associations, only prints out the columns
56
+ # associated with the object as they relate to the current schema.
57
+ # Can be extended but should not be overridden or called explicitly.
58
+ # @param node [ActiveRecord::Base] an object that inherits from AR::Base
59
+ # @return [Hash]
60
+ def visit(node)
61
+ return {} unless node.class.respond_to?(:column_names)
62
+ if @options[:delete_records] && node != @model
63
+ @ids_to_delete[node.class.table_name] ||= Set.new
64
+ @ids_to_delete[node.class.table_name] << node.id
65
+ end
66
+ IDHash[
67
+ node.class.column_names.select do |cn|
68
+ next unless node.respond_to?(cn)
69
+ # Only export columns that we actually have data for
70
+ !node[cn].nil?
71
+ end.map do |cn|
72
+ [cn.to_sym, node[cn]]
73
+ end
74
+ ]
75
+ end
76
+
77
+ # Delete rows from a table. Can be used in #delete_records.
78
+ # @param table [String] the table name.
79
+ # @param ids [Array<Integer>] the IDs to delete.
80
+ def delete_from_table(table, ids)
81
+ return if ids.blank?
82
+ @logger.info("Deleting #{ids.size} records from #{table}")
83
+ groups = ids.to_a.in_groups_of(10000)
84
+ groups.each_with_index do |group, i|
85
+ sleep(0.5) if i > 0 # throttle so we don't kill the DB
86
+ delete_query = <<-SQL
87
+ DELETE FROM `#{table}` WHERE `id` IN (#{group.compact.join(',')})
88
+ SQL
89
+ ActiveRecord::Base.connection.delete(delete_query)
90
+ end
91
+
92
+ @logger.info("Finished deleting from #{table}")
93
+ end
94
+
95
+ protected
96
+
97
+ # Callback that runs after deletion is finished.
98
+ def after_delete
99
+ end
100
+
101
+ # Indicate which associations to retrieve from the given model.
102
+ # @param node [ActiveRecord::Base]
103
+ def get_associations(node)
104
+ node.class.reflect_on_all_associations.select do |assoc|
105
+ [:destroy, :delete_all].include?(assoc.options[:dependent]) &&
106
+ [:has_many, :has_one].include?(assoc.macro)
107
+ end
108
+ end
109
+
110
+ private
111
+
112
+ # Delete the records corresponding to the model.
113
+ def _delete_records
114
+ @ids_to_delete.each do |table, ids|
115
+ delete_from_table(table, ids)
116
+ end
117
+ end
118
+
119
+ # @param symbol_or_object [Symbol|RailsArchiver::Transport::Base]
120
+ # @return [RailsArchiver::Transport::Base]
121
+ def _get_transport(symbol_or_object)
122
+ if symbol_or_object.is_a?(Symbol)
123
+ klass = if symbol_or_object.present?
124
+ "RailsArchiver::Transport::#{symbol_or_object.to_s.classify}".constantize
125
+ else
126
+ Transport::InMemory
127
+ end
128
+ klass.new(@model, @logger)
129
+ else
130
+ symbol_or_object
131
+ end
132
+ end
133
+
134
+ # Used to visit an association, and recursively calls down to
135
+ # all child objects through all other allowed associations.
136
+ # @param node [ActiveRecord::Base|Array<ActiveRecord::Base>]
137
+ # any object(s) that inherits from ActiveRecord::Base
138
+ def _visit_association(node)
139
+ return if node.blank?
140
+ if node.respond_to?(:each) # e.g. a list of nodes from a has_many association
141
+ node.each { |n| _visit_association(n) }
142
+ else
143
+ class_name = node.class.name
144
+ @hash[class_name] ||= Set.new
145
+ @hash[class_name] << visit(node)
146
+ get_associations(node).each do |assoc|
147
+ @logger.debug("Visiting #{assoc.name}")
148
+ new_nodes = node.send(assoc.name)
149
+ next if new_nodes.blank?
150
+
151
+ if new_nodes.respond_to?(:each)
152
+ new_nodes.each { |n| _visit_association(n) }
153
+ else
154
+ _visit_association(new_nodes)
155
+ end
156
+ end
157
+
158
+ end
159
+ end
160
+ end
161
+ end
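
The traversal in `_visit_association` can be pictured with a small stand-alone sketch. It assumes plain Ruby objects instead of ActiveRecord models: `Node`, `collect_tree` and `SAMPLE_TREE` are hypothetical names, and `children` stands in for whatever `get_associations` would return for a real record.

```ruby
require 'set'

# Hypothetical plain-Ruby record; `children` plays the role of the
# associated records reached through the allowed associations.
Node = Struct.new(:kind, :id, :children)

# Walks the tree depth-first and stores one hash per record, grouped
# under the record's kind (the class name in the real archiver).
def collect_tree(node, result = {})
  return result if node.nil?
  result[node.kind] ||= Set.new
  result[node.kind] << { :id => node.id }
  node.children.each { |child| collect_tree(child, result) }
  result
end

# Made-up sample data for the sketch.
SAMPLE_TREE = Node.new('Order', 1, [
  Node.new('LineItem', 10, []),
  Node.new('LineItem', 11, [Node.new('Discount', 100, [])])
])

puts collect_tree(SAMPLE_TREE).inspect
```

Because a `Set` compares members with `eql?` and `hash` rather than `==`, two hashes with identical contents collapse into a single entry, while hashes that share an `:id` but differ in any other key are kept as separate entries.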
data/lib/rails-archiver/transport/base.rb ADDED
@@ -0,0 +1,37 @@
1
+ # Abstract class that represents a way to store and retrieve the generated
2
+ # JSON object.
3
+ module RailsArchiver
4
+ module Transport
5
+ class Base
6
+
7
+ # @param model [ActiveRecord::Base] the model we will be working with.
8
+ def initialize(model, logger=nil)
9
+ @model = model
10
+ @options = {}
11
+ @logger = logger || ::Logger.new(STDOUT)
12
+ end
13
+
14
+ # @param options [Hash] A set of options to work with.
15
+ def configure(options)
16
+ @options = options
17
+ end
18
+
19
+ # To be implemented by subclasses. Store the archive somewhere to be retrieved
20
+ # later. You should also be storing the location somewhere such as on the
21
+ # model. Use @model to reference it.
22
+ # @param hash [Hash] the hash to store. Generally you'll want to use
23
+ # .to_json on it.
24
+ def store_archive(hash)
25
+ raise NotImplementedError
26
+ end
27
+
28
+ # To be implemented by subclasses. Retrieve the archive that was previously
29
+ # created.
30
+ # @return [Hash] the retrieved hash.
31
+ def retrieve_archive
32
+ raise NotImplementedError
33
+ end
34
+
35
+ end
36
+ end
37
+ end
data/lib/rails-archiver/transport/in_memory.rb ADDED
@@ -0,0 +1,17 @@
1
+ # Transport that just stores and retrieves the hash in memory.
2
+ module RailsArchiver
3
+ module Transport
4
+ class InMemory < Base
5
+
6
+ def store_archive(json)
7
+ @options[:json] = json
8
+ 'some-key-here'
9
+ end
10
+
11
+ def retrieve_archive
12
+ @options[:json]
13
+ end
14
+
15
+ end
16
+ end
17
+ end
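
To show what a custom transport might look like, here is a minimal sketch that writes the archive to a local directory. It runs without Rails: `SketchTransportBase` is a hypothetical stand-in that only mirrors the `configure`, `store_archive` and `retrieve_archive` methods of the base class above, and `LocalFileTransport`, the `:directory` option and `local_file_round_trip` are invented for this example. The model is represented by a plain hash with an `:id` key.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical stand-in for the transport base class.
class SketchTransportBase
  def initialize(model)
    @model = model
    @options = {}
  end

  def configure(options)
    @options = options
  end

  def store_archive(_hash)
    raise NotImplementedError
  end

  def retrieve_archive
    raise NotImplementedError
  end
end

# Hypothetical transport that stores the archive as a JSON file on disk.
class LocalFileTransport < SketchTransportBase
  # Writes the hash as JSON and returns the path it was written to.
  def store_archive(hash)
    File.write(archive_path, JSON.generate(hash))
    archive_path
  end

  # Reads the JSON file back and returns the parsed hash.
  def retrieve_archive
    JSON.parse(File.read(archive_path))
  end

  private

  def archive_path
    File.join(@options.fetch(:directory), "#{@model.fetch(:id)}.json")
  end
end

# Stores and retrieves a hash inside a temporary directory.
def local_file_round_trip(hash)
  Dir.mktmpdir do |dir|
    transport = LocalFileTransport.new({ :id => 42 })
    transport.configure(:directory => dir)
    transport.store_archive(hash)
    transport.retrieve_archive
  end
end
```

Since both `Archiver` and `Unarchiver` accept a transport object in place of a symbol, an instance of a real subclass could be passed directly as the `:transport` option.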
data/lib/rails-archiver/transport/s3.rb ADDED
@@ -0,0 +1,73 @@
1
+ require 'aws-sdk'
2
+ require 'securerandom'
3
+ # Transport that stores to S3. Uses an archived_s3_key attribute.
4
+ module RailsArchiver
5
+ module Transport
6
+ class S3 < Base
7
+
8
+ def s3_client
9
+ option_hash = @options[:region] ? {:region => @options[:region]} : {}
10
+ Aws::S3::Client.new(option_hash)
11
+ end
12
+
13
+ # Gzips the file, returns the gzipped filename
14
+ def gzip(filename)
15
+ output = `gzip --force #{filename.shellescape} 2>&1`
16
+
17
+ raise output if $?.exitstatus != 0
18
+
19
+ "#{filename}.gz"
20
+ end
21
+
22
+ def gunzip(filename)
23
+ output = `gunzip --force #{filename.shellescape} 2>&1`
24
+
25
+ raise output if $?.exitstatus != 0
26
+ end
27
+
28
+ def store_archive(hash)
29
+ json = hash.to_json
30
+ file_path = "#{@model.id}_#{SecureRandom.hex(8)}.json"
31
+ s3_key = "#{@options[:base_path]}/#{file_path}.gz"
32
+ Dir.mktmpdir do |dir|
33
+ json_filename = "#{dir}/#{file_path}"
34
+ @logger.info('Writing hash to JSON')
35
+ File.write(json_filename, json)
36
+ @logger.info('Zipping file')
37
+ filename = gzip(json_filename)
38
+ @logger.info("Uploading file to #{s3_key}")
39
+ _save_archive_to_s3(s3_key, filename)
40
+ end
41
+ @model.update_attribute(:archived_s3_key, s3_key)
+ s3_key
43
+ end
44
+
45
+ def retrieve_archive
46
+ Dir.mktmpdir do |dir|
47
+ filename = "#{dir}/#{@model.id}.json"
48
+ _get_archive_from_s3(@model.archived_s3_key, "#{filename}.gz")
49
+ @logger.info('Unzipping file')
50
+ gunzip("#{filename}.gz")
51
+ @logger.info('Parsing JSON')
52
+ JSON.parse(File.read(filename))
53
+ end
54
+ end
55
+
56
+ private
57
+
58
+ def _get_archive_from_s3(s3_key, filename)
59
+ s3_client.get_object(
60
+ :response_target => filename,
61
+ :bucket => @options[:bucket_name],
62
+ :key => s3_key)
63
+ end
64
+
65
+ def _save_archive_to_s3(s3_key, filename)
66
+ # Use the block form so the file handle is closed after the upload.
+ File.open(filename, 'rb') do |file|
+ s3_client.put_object(:bucket => @options[:bucket_name],
+ :key => s3_key,
+ :body => file)
+ end
69
+
70
+ end
71
+ end
72
+ end
73
+ end
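
The S3 transport above shells out to the `gzip` and `gunzip` binaries. As an alternative, and not what the gem does, the compression step could be done in-process with Ruby's standard `zlib` library. The sketch below assumes the archive fits in memory; `gzip_json` and `gunzip_json` are hypothetical helper names.

```ruby
require 'json'
require 'zlib'

# Serializes the hash to JSON and returns the gzip-compressed bytes.
def gzip_json(hash)
  Zlib.gzip(JSON.generate(hash))
end

# Decompresses gzip bytes and parses the JSON back into a hash.
def gunzip_json(bytes)
  JSON.parse(Zlib.gunzip(bytes))
end

compressed = gzip_json({ 'Order' => [{ 'id' => 1 }] })
puts gunzip_json(compressed).inspect
```

This avoids the temporary files and the dependency on external binaries, at the cost of holding both the JSON and the compressed bytes in memory at once.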
data/lib/rails-archiver/unarchiver.rb ADDED
@@ -0,0 +1,125 @@
1
+ # Class that loads a tree hash of objects representing ActiveRecord classes.
2
+ # We will use the models in the codebase to determine how to import them.
3
+
4
+ require 'activerecord-import'
5
+
6
+ module RailsArchiver
7
+ class Unarchiver
8
+
9
+ class ImportError < StandardError; end
10
+
11
+ attr_accessor :errors, :transport
12
+
13
+ # @param model [ActiveRecord::Base]
14
+ # @param options [Hash]
15
+ # * logger [Logger]
16
+ # * new_copy [Boolean] if true, create all new objects instead of
17
+ # replacing existing ones.
18
+ # * crash_on_errors [Boolean] if true, do not do any imports if any
19
+ # models' validations failed.
20
+ def initialize(model, options={})
21
+ @model = model
22
+ @logger = options.delete(:logger) || Logger.new(STDOUT)
23
+ @options = options
24
+ # Transport for downloading
25
+ self.transport = _get_transport(options.delete(:transport) || :in_memory)
26
+ self.errors = []
27
+ end
28
+
29
+ # Unarchive a model.
30
+ def unarchive
31
+ @errors = []
32
+ @logger.info('Downloading JSON file')
33
+ hash = @transport.retrieve_archive
34
+ @logger.info("Loading #{@model.class.name}")
35
+ load_classes(hash)
36
+ @model.reload
37
+ if @model.attribute_names.include?('archived')
38
+ @model.update_attribute(:archived, false)
39
+ end
40
+ @logger.info("#{@model.class.name} load complete!")
41
+ end
42
+
43
+ # Load a list of general classes that were saved as JSON.
44
+ # @param hash [Hash]
45
+ def load_classes(hash)
46
+ full_hash = hash.with_indifferent_access
47
+ full_hash.each do |key, vals|
48
+ save_models(key.constantize, vals)
49
+ end
50
+ if @options[:crash_on_errors] && self.errors.any?
51
+ raise ImportError.new("Errors occurred during load - please see 'errors' method for more details")
52
+ end
53
+ end
54
+
55
+ # Save all models into memory in the given hash on the given class.
56
+ # @param klass [Class] the starting class.
57
+ # @param hashes [Array<Hash>] the object hashes to import.
58
+ def save_models(klass, hashes)
59
+ models = hashes.map { |hash| init_model(klass, hash) }
60
+ import_objects(klass, models)
61
+ end
62
+
63
+ # Import saved objects.
64
+ # @param klass [Class]
65
+ # @param models [Array<ActiveRecord::Base>]
66
+ def import_objects(klass, models)
67
+ cols_to_update = klass.column_names - [klass.primary_key]
68
+ # check other unique indexes
69
+ indexes = ActiveRecord::Base.connection.indexes(klass.table_name).
70
+ select(&:unique)
71
+ indexes.each { |index| cols_to_update -= index.columns }
72
+ options = { :validate => false, :timestamps => false,
73
+ :on_duplicate_key_update => cols_to_update }
74
+
75
+ @logger.info("Importing #{models.length} for #{klass.name}")
76
+ models.in_groups_of(1000).each do |group|
77
+ klass.import(group.compact, options)
78
+ end
79
+ rescue => e
80
+ self.errors << "Error importing class #{klass.name}: #{e.message}"
81
+ end
82
+
83
+ def init_model(klass, hash)
84
+ attrs = hash.select do |x|
85
+ klass.column_names.include?(x) && x != klass.primary_key
86
+ end
87
+
88
+ # fix time zone issues
89
+ klass.columns.each do |col|
90
+ if col.type == :datetime && attrs[col.name]
91
+ attrs[col.name] = Time.zone.parse(attrs[col.name])
92
+ end
93
+ end
94
+
95
+ model = klass.where(klass.primary_key => hash[klass.primary_key]).first
96
+ if model.nil?
97
+ model = klass.new
98
+ model.send(:attributes=, attrs, false)
99
+ # can't set this in the attribute hash, it'll be overridden. Need
100
+ # to set it manually.
101
+ model[klass.primary_key] = hash[klass.primary_key]
102
+ else
103
+ model.send(:attributes=, attrs, false)
104
+ end
105
+
106
+ model
107
+ end
108
+
109
+ private
110
+
111
+ def _get_transport(symbol_or_object)
112
+ if symbol_or_object.is_a?(Symbol)
113
+ klass = if symbol_or_object.present?
114
+ "RailsArchiver::Transport::#{symbol_or_object.to_s.classify}".constantize
115
+ else
116
+ Transport::InMemory
117
+ end
118
+ klass.new(@model, @logger)
119
+ else
120
+ symbol_or_object
121
+ end
122
+ end
123
+
124
+ end
125
+ end
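
Two steps of the unarchiver are easy to try in isolation: filtering an archived record down to known columns (as `init_model` does) and splitting the rows into batches before import. The sketch below uses plain hashes and arrays; `importable_attributes` and `batches` are hypothetical helper names, and the column list is made up.

```ruby
# Keeps only keys that match a known column, excluding the primary key,
# which the unarchiver assigns separately.
def importable_attributes(record, column_names, primary_key)
  record.select do |key, _value|
    column_names.include?(key) && key != primary_key
  end
end

# Splits rows into batches of at most `size` elements, without padding.
def batches(rows, size)
  rows.each_slice(size).to_a
end

record = { 'id' => 1, 'name' => 'Widget', 'legacy_flag' => true }
puts importable_attributes(record, %w(id name), 'id').inspect
puts batches([1, 2, 3, 4, 5], 2).inspect
```

ActiveSupport's `in_groups_of` pads the last group with `nil`, which is why the source calls `compact` on each group; `each_slice` from core Ruby does not pad, so no `compact` is needed in this variant.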
data/lib/rails-archiver.rb ADDED
@@ -0,0 +1,6 @@
1
+ require 'rails-archiver/transport/base.rb'
2
+ require 'rails-archiver/transport/in_memory.rb'
3
+ require 'rails-archiver/transport/s3.rb'
4
+
5
+ require 'rails-archiver/archiver.rb'
6
+ require 'rails-archiver/unarchiver.rb'
data/rails-archiver.gemspec ADDED
@@ -0,0 +1,18 @@
1
+ Gem::Specification.new do |s|
2
+ s.name = 'rails-archiver'
3
+ s.require_paths = %w(. lib lib/rails-archiver)
4
+ s.version = '0.1'
5
+ s.date = '2016-03-19'
6
+ s.summary = 'Fully archive a Rails model'
7
+ s.description = <<-EOF
8
+ EOF
9
+ s.authors = ['Daniel Orner']
10
+ s.email = 'daniel.orner@wishabi.com'
11
+ s.files = `git ls-files`.split($/)
12
+ s.homepage = 'https://github.com/dmorner/rails-archiver'
13
+ s.license = 'MIT'
14
+
15
+ s.add_dependency 'rails', '>= 3.0'
16
+ s.add_dependency 'aws-sdk', '~> 2.6'
17
+
18
+ end
metadata ADDED
@@ -0,0 +1,81 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: rails-archiver
3
+ version: !ruby/object:Gem::Version
4
+ version: '0.1'
5
+ platform: ruby
6
+ authors:
7
+ - Daniel Orner
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2016-03-19 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: rails
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: '3.0'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ">="
25
+ - !ruby/object:Gem::Version
26
+ version: '3.0'
27
+ - !ruby/object:Gem::Dependency
28
+ name: aws-sdk
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '2.6'
34
+ type: :runtime
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '2.6'
41
+ description: ''
42
+ email: daniel.orner@wishabi.com
43
+ executables: []
44
+ extensions: []
45
+ extra_rdoc_files: []
46
+ files:
47
+ - README.md
48
+ - lib/rails-archiver.rb
49
+ - lib/rails-archiver/archiver.rb
50
+ - lib/rails-archiver/transport/base.rb
51
+ - lib/rails-archiver/transport/in_memory.rb
52
+ - lib/rails-archiver/transport/s3.rb
53
+ - lib/rails-archiver/unarchiver.rb
54
+ - rails-archiver.gemspec
55
+ homepage: https://github.com/dmorner/rails-archiver
56
+ licenses:
57
+ - MIT
58
+ metadata: {}
59
+ post_install_message:
60
+ rdoc_options: []
61
+ require_paths:
62
+ - "."
63
+ - lib
64
+ - lib/rails-archiver
65
+ required_ruby_version: !ruby/object:Gem::Requirement
66
+ requirements:
67
+ - - ">="
68
+ - !ruby/object:Gem::Version
69
+ version: '0'
70
+ required_rubygems_version: !ruby/object:Gem::Requirement
71
+ requirements:
72
+ - - ">="
73
+ - !ruby/object:Gem::Version
74
+ version: '0'
75
+ requirements: []
76
+ rubyforge_project:
77
+ rubygems_version: 2.5.2
78
+ signing_key:
79
+ specification_version: 4
80
+ summary: Fully archive a Rails model
81
+ test_files: []