abstract_importer 1.0.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 3f5a316e4e45d1c7a09af7b4930dda62e690d5c8
4
+ data.tar.gz: 333effaf5481100cd62f0f7523161747e923d515
5
+ SHA512:
6
+ metadata.gz: a49953a3ecee96971314712785578a4b4fb1c09a2461b02448995fe1c3589f471e158c87c6e054aa613386a5ecaa064fd49a03e009d881bc5d29a6702dd23697
7
+ data.tar.gz: a18398b96c81a7f8f8ecb469c76fa06b7513f534a2c7d6c9b515479c76209161f08e844bc1b7bbf768881dc8b060730aaf2e7d47b44d44a6248d1087c24887fc
data/.gitignore ADDED
@@ -0,0 +1,17 @@
1
+ *.gem
2
+ *.rbc
3
+ .bundle
4
+ .config
5
+ .yardoc
6
+ Gemfile.lock
7
+ InstalledFiles
8
+ _yardoc
9
+ coverage
10
+ doc/
11
+ lib/bundler/man
12
+ pkg
13
+ rdoc
14
+ spec/reports
15
+ test/tmp
16
+ test/version_tmp
17
+ tmp
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in abstract_importer.gemspec
4
+ gemspec
data/LICENSE.txt ADDED
@@ -0,0 +1,22 @@
1
+ Copyright (c) 2013 Bob Lail
2
+
3
+ MIT License
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining
6
+ a copy of this software and associated documentation files (the
7
+ "Software"), to deal in the Software without restriction, including
8
+ without limitation the rights to use, copy, modify, merge, publish,
9
+ distribute, sublicense, and/or sell copies of the Software, and to
10
+ permit persons to whom the Software is furnished to do so, subject to
11
+ the following conditions:
12
+
13
+ The above copyright notice and this permission notice shall be
14
+ included in all copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,168 @@
1
+ # AbstractImporter
2
+
3
+ AbstractImporter provides services for importing complex data from an arbitrary data source. It:
4
+
5
+ * Preserves relationships between tables that are imported as a set
6
+ * Allows you to extend and modify the import process through a DSL and callbacks
7
+ * Supports partial and idempotent imports
8
+ * Sports flexible reporting and logging
9
+
10
+
11
+
12
+ ## Getting Started
13
+
14
+ ### Installation
15
+
16
+ Add this line to your application's `Gemfile`:
17
+
18
+ gem 'abstract_importer'
19
+
20
+ And then execute:
21
+
22
+ $ bundle
23
+
24
+ Or install it yourself as:
25
+
26
+ $ gem install abstract_importer
27
+
28
+
29
+
30
+ ### Usage
31
+
32
+ Derive your own importer from `AbstractImporter::Base` and specify the tables you intend to import:
33
+
34
+ ```ruby
35
+ class MyImporter < AbstractImporter::Base
36
+
37
+ import do |import|
38
+ import.students
39
+ import.parents
40
+ end
41
+
42
+ end
43
+ ```
44
+
45
+ `AbstractImporter` now knows it must import two collections: `students` and `parents`, in that order. It refers to this as its "Import Plan".
46
+
47
+
48
+ ##### Parent and Data Source
49
+
50
+ `MyImporter`'s initializer takes two arguments: `parent` and `data_source`:
51
+
52
+ * `parent` is any object that will respond to the names of your collections with an `ActiveRecord::Relation`.
53
+ * `data_source` is any object that will respond to the names of your collections by yielding a hash of attributes once for every record you should import.
54
+
55
+ Here are reasonable classes for `parent` and `data_source`:
56
+
57
+ ```ruby
58
+ # parent
59
+ class Account < ActiveRecord::Base
60
+ has_many :students
61
+ has_many :parents
62
+ end
63
+ ```
64
+
65
+ ```ruby
66
+ # data source
67
+ class Database
68
+ def students
69
+ yield id: 457, name: "Ron"
70
+ yield id: 458, name: "Ginny"
71
+ yield id: 459, name: "Fred"
72
+ yield id: 460, name: "George"
73
+ end
74
+
75
+ def parents
76
+ yield id: 88, name: "Arthur"
77
+ yield id: 89, name: "Molly"
78
+ end
79
+ end
80
+ ```
81
+
82
+
83
+ ##### legacy_id
84
+
85
+ For every record that AbstractImporter creates, it will assign the attribute `legacy_id`.
86
+
87
+ AbstractImporter uses this value to avoid importing the same record twice, for example when an interrupted import is retried or when a user imports the same legacy database more than once.
88
+
89
+
90
+ ##### Performing an Import
91
+
92
+ A straightforward import looks like this:
93
+
94
+ ```ruby
95
+ summary = MyImporter.new(parent, data_source).perform!
96
+ ```
97
+
98
+ AbstractImporter optionally takes a hash of settings as a third argument:
99
+
100
+ * `:dry_run` (default: `false`) when set to `true`, goes through all the steps except creating the records
101
+ * `:io` (default: `$stderr`) an IO object that is passed to the reporter
102
+ * `:reporter` (default: `AbstractImporter::Reporter.new(io)`) performs logging in response to various events
103
+
104
+
105
+
106
+ ### Customizing the Import Plan
107
+
108
+ You can customize the Import Plan by defining various callbacks on each collection you declare:
109
+
110
+ ```ruby
111
+ class MyImporter < AbstractImporter::Base
112
+
113
+ import do |import|
114
+ import.students do |options|
115
+ options.finder :find_student
116
+ options.before_build { |attrs| attrs.merge(name: attrs[:name].capitalize) }
117
+ options.on_complete :students_completed
118
+ end
119
+ import.parents
120
+ end
121
+
122
+ def find_student
123
+ ...
124
+ end
125
+
126
+ def students_completed
127
+ ...
128
+ end
129
+
130
+ end
131
+ ```
132
+
133
+ The complete list of callbacks is below.
134
+
135
+ ##### finder
136
+
137
+ `finder` accepts a hash of attributes for a record to be imported and returns a corresponding record (if one exists). This is useful for finding a preexisting counterpart to an imported record. For example, suppose the user has created the tag "Butterbeer" and then imports a tag with the same name. The legacy "Butterbeer" tag has never been imported, but it should not be created again; any legacy articles associated with it should instead be associated with the native tag.
138
+
139
+ ##### before_build
140
+
141
+ `before_build` allows a callback to modify the hash of attributes before it is passed to `ActiveRecord::Relation#build`.
142
+
143
+ ##### before_create
144
+
145
+ `before_create` allows a callback to modify a record before `save` is called on it.
146
+
147
+ ##### rescue
148
+
149
+ `rescue` (like `before_create`) is called with a record just before `save` is called. Unlike `before_create`, `rescue` is only called if the record does not pass validations.
150
+
151
+ ##### after_create
152
+
153
+ `after_create` is called with the original hash of attributes and the newly-saved record right after it is successfully saved.
154
+
155
+ ##### on_complete
156
+
157
+ `on_complete` is called when all of the records in a collection have been processed.
158
+
159
+
160
+
161
+
162
+ ## Contributing
163
+
164
+ 1. Fork it
165
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
166
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
167
+ 4. Push to the branch (`git push origin my-new-feature`)
168
+ 5. Create new Pull Request
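
The README's description of `data_source` and `legacy_id` can be illustrated with a small simulation. This is only a sketch under stated assumptions: `LegacyDatabase`, `imported`, and `import_students` are hypothetical names, and an in-memory array stands in for the ActiveRecord table that the gem would write to. It shows a data source yielding attribute hashes and a second run skipping records whose legacy ID has already been stored.

```ruby
# Hypothetical data source: responds to a collection name by yielding
# one attribute hash per legacy record, as the README describes.
class LegacyDatabase
  def students
    yield id: 457, name: "Ron"
    yield id: 458, name: "Ginny"
  end
end

# Hypothetical in-memory "table"; the gem itself saves ActiveRecord records.
imported = []

import_students = lambda do |source|
  # Legacy IDs that were stored by an earlier run are skipped.
  seen = imported.map { |row| row[:legacy_id] }
  created = 0

  source.students do |attrs|
    attrs = attrs.dup
    legacy_id = attrs.delete(:id) # the source's :id becomes legacy_id
    next if seen.include?(legacy_id)

    imported << attrs.merge(legacy_id: legacy_id)
    created += 1
  end

  created
end

source     = LegacyDatabase.new
first_run  = import_students.call(source)
second_run = import_students.call(source)
```

Running the same import twice leaves the stored rows unchanged, which is the behavior the README calls an idempotent import.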
data/Rakefile ADDED
@@ -0,0 +1,9 @@
1
+ require "bundler/gem_tasks"
2
+ require 'rake/testtask'
3
+
4
+ Rake::TestTask.new(:test) do |t|
5
+ t.libs << 'lib'
6
+ t.libs << 'test'
7
+ t.pattern = 'test/**/*_test.rb'
8
+ t.verbose = false
9
+ end
@@ -0,0 +1,33 @@
1
+ # coding: utf-8
2
+ lib = File.expand_path('../lib', __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require 'abstract_importer/version'
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = "abstract_importer"
8
+ spec.version = AbstractImporter::VERSION
9
+ spec.authors = ["Bob Lail"]
10
+ spec.email = ["bob.lail@cph.org"]
11
+ spec.summary = %q{Provides services for the mass-import of complex relational data}
12
+ spec.homepage = "https://github.com/concordia-publishing-house/abstract_importer"
13
+ spec.license = "MIT"
14
+
15
+ spec.files = `git ls-files`.split($/)
16
+ spec.executables = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
17
+ spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
18
+ spec.require_paths = ["lib"]
19
+
20
+ spec.add_dependency "activerecord"
21
+
22
+ spec.add_development_dependency "bundler", "~> 1.3"
23
+ spec.add_development_dependency "rake"
24
+ spec.add_development_dependency "rails"
25
+ spec.add_development_dependency "sqlite3"
26
+ spec.add_development_dependency "turn"
27
+ spec.add_development_dependency "pry"
28
+ spec.add_development_dependency "rr"
29
+ spec.add_development_dependency "database_cleaner"
30
+ spec.add_development_dependency "simplecov"
31
+ spec.add_development_dependency "shoulda-context"
32
+
33
+ end
@@ -0,0 +1,170 @@
1
+ require 'abstract_importer/import_options'
2
+ require 'abstract_importer/import_plan'
3
+ require 'abstract_importer/reporter'
4
+ require 'abstract_importer/collection'
5
+ require 'abstract_importer/collection_importer'
6
+ require 'abstract_importer/id_map'
7
+
8
+
9
+ module AbstractImporter
10
+ class Base
11
+
12
+ class << self
13
+ def import
14
+ yield @import_plan = ImportPlan.new
15
+ end
16
+
17
+ attr_reader :import_plan
18
+ end
19
+
20
+
21
+
22
+ def initialize(parent, source, options={})
23
+ @source = source
24
+ @parent = parent
25
+
26
+ io = options.fetch(:io, $stderr)
27
+ @reporter = Reporter.new(io, Rails.env.production?)
28
+ @dry_run = options.fetch(:dry_run, false)
29
+
30
+ @id_map = IdMap.new
31
+ @summary = {}
32
+ @import_plan = self.class.import_plan.to_h
33
+ @collections = []
34
+ end
35
+
36
+ attr_reader :source, :parent, :reporter, :id_map, :summary
37
+
38
+ def dry_run?
39
+ @dry_run
40
+ end
41
+
42
+
43
+
44
+
45
+
46
+ def perform!
47
+ reporter.start_all(self)
48
+
49
+ ms = Benchmark.ms do
50
+ setup
51
+ end
52
+ reporter.finish_setup(ms)
53
+
54
+ ms = Benchmark.ms do
55
+ collections.each &method(:import_collection)
56
+ end
57
+
58
+ teardown
59
+ reporter.finish_all(self, ms)
60
+ @summary
61
+ end
62
+
63
+ def setup
64
+ verify_source!
65
+ verify_parent!
66
+ instantiate_collections!
67
+ prepopulate_id_map!
68
+ end
69
+
70
+ def import_collection(collection)
71
+ @summary[collection.name] = CollectionImporter.new(self, collection).perform!
72
+ end
73
+
74
+ def teardown
75
+ end
76
+
77
+
78
+
79
+
80
+
81
+ def describe_source
82
+ source.to_s
83
+ end
84
+
85
+ def describe_destination
86
+ parent.to_s
87
+ end
88
+
89
+
90
+
91
+
92
+
93
+ def remap_foreign_key?(plural, foreign_key)
94
+ true
95
+ end
96
+
97
+ def map_foreign_key(legacy_id, plural, foreign_key, depends_on)
98
+ id_map.apply!(legacy_id, depends_on)
99
+ rescue IdMap::IdNotMappedError
100
+ record_no_id_in_map_error(legacy_id, plural, foreign_key, depends_on)
101
+ nil
102
+ end
103
+
104
+
105
+
106
+
107
+
108
+ private
109
+
110
+ attr_reader :collections, :import_plan
111
+
112
+ def verify_source!
113
+ import_plan.keys.each do |collection|
114
+ next if source.respond_to?(collection)
115
+
116
+ raise "#{source.class} does not respond to `#{collection}`; " <<
117
+ "but #{self.class} plans to import records with that name"
118
+ end
119
+ end
120
+
121
+ def verify_parent!
122
+ import_plan.keys.each do |collection|
123
+ next if parent.respond_to?(collection)
124
+
125
+ raise "#{parent.class} does not have a collection named `#{collection}`; " <<
126
+ "but #{self.class} plans to import records with that name"
127
+ end
128
+ end
129
+
130
+ def instantiate_collections!
131
+ @collections = import_plan.map do |name, block|
132
+ reflection = parent.class.reflect_on_association(name)
133
+ model = reflection.klass
134
+ table_name = model.table_name
135
+ scope = parent.public_send(name)
136
+
137
+ options = ImportOptions.new
138
+ instance_exec(options, &block) if block
139
+
140
+ Collection.new(name, model, table_name, scope, options)
141
+ end
142
+ end
143
+
144
+ def prepopulate_id_map!
145
+ collections.each do |collection|
146
+ query = collection.scope.where("#{collection.table_name}.legacy_id IS NOT NULL")
147
+ map = values_of(query, :id, :legacy_id) \
148
+ .each_with_object({}) { |(id, legacy_id), map| map[legacy_id] = id }
149
+
150
+ id_map.init collection.table_name, map
151
+ end
152
+ end
153
+
154
+ def values_of(query, *columns)
155
+ if Gem::Version.new(Rails.version) < Gem::Version.new("4.0.0")
156
+ query = query.select(columns.map { |column| "#{query.table_name}.#{column}" }.join(", "))
157
+ ActiveRecord::Base.connection.select_rows(query.to_sql)
158
+ else
159
+ query.pluck(*columns)
160
+ end
161
+ end
162
+
163
+
164
+
165
+ def record_no_id_in_map_error(legacy_id, plural, foreign_key, depends_on)
166
+ reporter.count_notice "#{plural}.#{foreign_key} will be nil: a #{depends_on.to_s.singularize} with the legacy id #{legacy_id} was not mapped."
167
+ end
168
+
169
+ end
170
+ end
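
`Base.import` yields an `ImportPlan` and later calls `to_h` on it, iterating the result as pairs of collection name and optional block. The `ImportPlan` source is not shown above, so the following is a guess at how such a recorder could work rather than the gem's implementation. `SketchImportPlan` and `SketchImporter` are hypothetical names.

```ruby
# Hypothetical re-creation of an import-plan recorder. The gem's own
# ImportPlan class is not shown in this diff and may differ.
class SketchImportPlan
  def initialize
    @plan = {} # Ruby hashes preserve insertion order
  end

  # Any unknown method name is treated as a collection declaration;
  # the optional block is stored for later, not called here.
  def method_missing(name, *_args, &block)
    @plan[name] = block
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end

  def to_h
    @plan.dup
  end
end

class SketchImporter
  class << self
    attr_reader :import_plan

    # Mirrors the shape of AbstractImporter::Base.import shown above.
    def import
      yield @import_plan = SketchImportPlan.new
    end
  end

  import do |import|
    import.students { |options| options << :customized }
    import.parents
  end
end

plan = SketchImporter.import_plan.to_h
```

Because the plan is an ordinary hash built in declaration order, iterating it visits `students` before `parents`, which matches the ordering the README promises.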
@@ -0,0 +1,4 @@
1
+ module AbstractImporter
2
+ class Collection < Struct.new(:name, :model, :table_name, :scope, :options)
3
+ end
4
+ end
@@ -0,0 +1,211 @@
1
+ module AbstractImporter
2
+ class CollectionImporter
3
+
4
+ def initialize(importer, collection)
5
+ @importer = importer
6
+ @collection = collection
7
+ end
8
+
9
+ attr_reader :importer, :collection, :summary
10
+
11
+ delegate :name,
12
+ :table_name,
13
+ :model,
14
+ :scope,
15
+ :options,
16
+ :to => :collection
17
+
18
+ delegate :dry_run?,
19
+ :parent,
20
+ :source,
21
+ :reporter,
22
+ :remap_foreign_key?,
23
+ :id_map,
24
+ :map_foreign_key,
25
+ :to => :importer
26
+
27
+
28
+
29
+ def perform!
30
+ reporter.start_collection(self)
31
+ prepare!
32
+
33
+ summary[5] = Benchmark.ms do
34
+ each_new_record &method(:process_record)
35
+ end
36
+
37
+ invoke_callback(:on_complete)
38
+ reporter.finish_collection(self, summary)
39
+ summary
40
+ end
41
+
42
+
43
+
44
+ def prepare!
45
+ # [total, existing_records, new_records, already_imported, invalid, milliseconds]
46
+ @summary = [ 0, 0, 0, 0, 0, 0]
47
+ @already_imported = load_already_imported_records!
48
+ @mappings = prepare_mappings!
49
+ end
50
+
51
+ def load_already_imported_records!
52
+ # We keep a _separate_ list of legacy IDs that
53
+ # have already been imported. It would be optimal to
54
+ # check @id_map to see if a record has been imported;
55
+ # but because of a bug with tags, that won't work:
56
+ #
57
+ # Tags import from three tables: Activity, Skill,
58
+ # and Training. Those tables yield tags whose
59
+ # legacy_ids collide. As a result several tags
60
+ # can share the same ID; and tags that would collide
61
+ # are [erroneously] not imported.
62
+ #
63
+ # Fixing this problem would involve changing the
64
+ # legacy_id identifier for each tag which would
65
+ # break the connection between already-imported tags
66
+ # and new imports.
67
+ #
68
+ id_map[table_name]
69
+ end
70
+
71
+ def prepare_mappings!
72
+ mappings = []
73
+ model.reflect_on_all_associations.each do |association|
74
+
75
+ # We only want the associations where this record
76
+ # has foreign keys that refer to another record
77
+ next unless association.macro == :belongs_to
78
+
79
+ # We don't at this time support polymorphic associations
80
+ # which would require extending id_map to take the foreign
81
+ # type fields into account.
82
+ #
83
+ # Rails can't return `association.table_name` so easily
84
+ # because `table_name` comes from `klass` and `klass`
85
+ # isn't predetermined.
86
+ next if association.options[:polymorphic]
87
+
88
+ depends_on = association.table_name.to_sym
89
+ foreign_key = association.foreign_key.to_sym
90
+
91
+ # We support skipping some mappings entirely. I believe
92
+ # this is largely to cut down on verbosity in the log
93
+ # files and should eventually be refactored elsewhere.
94
+ next unless remap_foreign_key?(name, foreign_key)
95
+
96
+ mappings << Proc.new do |hash|
97
+ if hash.key?(foreign_key)
98
+ hash[foreign_key] = map_foreign_key(hash[foreign_key], name, foreign_key, depends_on)
99
+ else
100
+ reporter.count_notice "#{name}.#{foreign_key} will not be mapped because it is not used"
101
+ end
102
+ end
103
+ end
104
+ mappings
105
+ end
106
+
107
+
108
+
109
+
110
+
111
+ def each_new_record
112
+ source.public_send(name) do |hash_or_hashes|
113
+ Array.wrap(hash_or_hashes).each do |hash|
114
+ yield hash.dup
115
+ end
116
+ end
117
+ end
118
+
119
+ def process_record(hash)
120
+ summary[0] += 1
121
+
122
+ if already_imported?(hash)
123
+ summary[3] += 1
124
+ return
125
+ end
126
+
127
+ remap_foreign_keys!(hash)
128
+
129
+ if redundant_record?(hash)
130
+ summary[1] += 1
131
+ return
132
+ end
133
+
134
+ if create_record(hash)
135
+ summary[2] += 1
136
+ else
137
+ summary[4] += 1
138
+ end
139
+ end
140
+
141
+
142
+
143
+
144
+
145
+ def already_imported?(hash)
146
+ @already_imported.key? hash[:id]
147
+ end
148
+
149
+ def remap_foreign_keys!(hash)
150
+ @mappings.each do |proc|
151
+ proc.call(hash)
152
+ end
153
+ end
154
+
155
+ def redundant_record?(hash)
156
+ existing_record = invoke_callback(:finder, hash)
157
+ if existing_record
158
+ id_map.register(record: existing_record, legacy_id: hash[:id])
159
+ true
160
+ else
161
+ false
162
+ end
163
+ end
164
+
165
+
166
+
167
+
168
+
169
+ def create_record(hash)
170
+ record = build_record(hash)
171
+
172
+ return true if dry_run?
173
+
174
+ invoke_callback(:before_create, record)
175
+
176
+ # rescue_callback has one shot to fix things
177
+ invoke_callback(:rescue, record) unless record.valid?
178
+
179
+ if record.save
180
+ invoke_callback(:after_create, hash, record)
181
+ id_map << record
182
+
183
+ reporter.record_created(record)
184
+ true
185
+ else
186
+
187
+ reporter.record_failed(record)
188
+ false
189
+ end
190
+ end
191
+
192
+ def build_record(hash)
193
+ hash = invoke_callback(:before_build, hash) || hash
194
+
195
+ legacy_id = hash.delete(:id)
196
+
197
+ scope.build hash.merge(legacy_id: legacy_id)
198
+ end
199
+
200
+
201
+
202
+ def invoke_callback(callback, *args)
203
+ callback_name = :"#{callback}_callback"
204
+ callback = options.public_send(callback_name)
205
+ return unless callback
206
+ callback = importer.method(callback) if callback.is_a?(Symbol)
207
+ callback.call(*args)
208
+ end
209
+
210
+ end
211
+ end
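
`CollectionImporter#perform!` returns a positional array whose layout is given only in the comment inside `prepare!`. The helper below is a hypothetical convenience, not part of the gem, that attaches names to those positions so a caller does not need to remember the indices. The sample numbers are made up for illustration.

```ruby
# Field order copied from the comment in CollectionImporter#prepare!.
SUMMARY_FIELDS = [
  :total, :existing_records, :new_records,
  :already_imported, :invalid, :milliseconds
].freeze

# Hypothetical helper (not part of the gem): converts one positional
# summary array into a hash keyed by field name.
def label_summary(summary)
  SUMMARY_FIELDS.zip(summary).to_h
end

# Shaped like the return value of Base#perform!:
# collection name => positional summary array. The values are invented.
results = { students: [4, 1, 2, 1, 0, 12.5] }

labeled = results.transform_values { |summary| label_summary(summary) }
```

With the labels in place, `labeled[:students][:new_records]` reads more clearly than `results[:students][2]`.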
@@ -0,0 +1,47 @@
1
+ module AbstractImporter
2
+ class IdMap
3
+
4
+ class IdNotMappedError < StandardError; end
5
+
6
+ def initialize
7
+ @id_map = Hash.new { |hash, key| hash[key] = {} }
8
+ end
9
+
10
+
11
+
12
+ def init(table_name, map)
13
+ table_name = table_name.to_sym
14
+ @id_map[table_name] = map
15
+ end
16
+
17
+ def get(table_name)
18
+ @id_map[table_name.to_sym].dup
19
+ end
20
+ alias :[] :get
21
+
22
+ def <<(record)
23
+ register(record: record)
24
+ end
25
+
26
+ def register(options={})
27
+ if options.key?(:record)
28
+ record = options[:record]
29
+ table_name, record_id, legacy_id = record.class.table_name, record.id, record.legacy_id
30
+ end
31
+ table_name = options[:table_name] if options.key?(:table_name)
32
+ legacy_id = options[:legacy_id] if options.key?(:legacy_id)
33
+ record_id = options[:record_id] if options.key?(:record_id)
34
+
35
+ table_name = table_name.to_sym
36
+ @id_map[table_name][legacy_id] = record_id
37
+ end
38
+
39
+ def apply!(legacy_id, depends_on)
40
+ return nil if legacy_id.blank?
41
+ id_map = @id_map[depends_on]
42
+ raise IdNotMappedError.new unless id_map.key?(legacy_id)
43
+ id_map[legacy_id]
44
+ end
45
+
46
+ end
47
+ end
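
The way `IdMap#apply!` supports foreign-key remapping can be tried without Rails. The class below is a simplified stand-in, not the gem's `IdMap`: it assumes a plain `nil` check in place of ActiveSupport's `blank?`, and it takes keyword arguments instead of an options hash. `SketchIdMap` and the sample records are hypothetical.

```ruby
# Simplified stand-in for AbstractImporter::IdMap that avoids ActiveSupport.
# Assumption: a nil check replaces `blank?`.
class SketchIdMap
  class IdNotMappedError < StandardError; end

  def initialize
    # table name => { legacy_id => new record id }
    @id_map = Hash.new { |hash, key| hash[key] = {} }
  end

  def register(table_name:, legacy_id:, record_id:)
    @id_map[table_name.to_sym][legacy_id] = record_id
  end

  # Translates a legacy foreign key into the ID of the imported record.
  def apply!(legacy_id, depends_on)
    return nil if legacy_id.nil?

    table = @id_map[depends_on.to_sym]
    raise IdNotMappedError unless table.key?(legacy_id)

    table[legacy_id]
  end
end

id_map = SketchIdMap.new
id_map.register(table_name: :parents, legacy_id: 88, record_id: 1)

# A student whose legacy parent (88) was imported as record 1.
student = { id: 457, name: "Ron", parent_id: 88 }
student[:parent_id] = id_map.apply!(student[:parent_id], :parents)

# A student whose legacy parent was never imported. As in
# Base#map_foreign_key, the error is rescued and the key becomes nil.
orphan = { id: 999, name: "Percy", parent_id: 12_345 }
begin
  orphan[:parent_id] = id_map.apply!(orphan[:parent_id], :parents)
rescue SketchIdMap::IdNotMappedError
  orphan[:parent_id] = nil
end
```

Importing `parents` before `students` matters here: the map for a table has to be populated before any record that refers to it is processed.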