data-import 0.0.1
- data/.gitignore +6 -0
- data/.rvmrc +1 -0
- data/Gemfile +4 -0
- data/README.md +157 -0
- data/Rakefile +20 -0
- data/data-import.gemspec +29 -0
- data/lib/data-import.rb +35 -0
- data/lib/data-import/adapters/sequel.rb +96 -0
- data/lib/data-import/database.rb +28 -0
- data/lib/data-import/definition.rb +20 -0
- data/lib/data-import/definition/simple.rb +59 -0
- data/lib/data-import/dsl.rb +54 -0
- data/lib/data-import/dsl/import.rb +48 -0
- data/lib/data-import/dsl/import/from.rb +35 -0
- data/lib/data-import/execution_plan.rb +16 -0
- data/lib/data-import/importer.rb +55 -0
- data/lib/data-import/runner.rb +62 -0
- data/lib/data-import/version.rb +3 -0
- data/scripts/ci.sh +12 -0
- data/spec/data-import/adapters/sequel_spec.rb +159 -0
- data/spec/data-import/database_spec.rb +24 -0
- data/spec/data-import/definition/simple_spec.rb +71 -0
- data/spec/data-import/definition_spec.rb +14 -0
- data/spec/data-import/dsl/import/from_spec.rb +43 -0
- data/spec/data-import/dsl/import_spec.rb +87 -0
- data/spec/data-import/dsl_spec.rb +99 -0
- data/spec/data-import/execution_plan_spec.rb +25 -0
- data/spec/data-import/importer_spec.rb +150 -0
- data/spec/data-import/runner_spec.rb +136 -0
- data/spec/data-import_spec.rb +34 -0
- data/spec/integration/before_block_spec.rb +59 -0
- data/spec/integration/simple_mappings_spec.rb +68 -0
- data/spec/integration/update_records_spec.rb +57 -0
- data/spec/junit_formatter.rb +106 -0
- data/spec/spec_helper.rb +8 -0
- metadata +164 -0
data/.gitignore
ADDED
data/.rvmrc
ADDED
@@ -0,0 +1 @@
+rvm --create ruby-1.9.2@data-import
data/Gemfile
ADDED
data/README.md
ADDED
@@ -0,0 +1,157 @@
+# DataImport
+
+data-import is a data-migration framework. The goal of the project is to provide a simple API to migrate data from a legacy schema into a new one. It's based on jeremyevans/sequel.
+
+## Installation
+
+```ruby
+gem 'data-import'
+```
+
+You can put your migration configuration in any file you like. We suggest something like `mapping.rb`.
+
+```ruby
+source :sequel, 'sqlite:/'
+target :sequel, 'sqlite:/'
+
+import 'Animals' do
+  from 'tblAnimal', :primary_key => 'sAnimalID'
+  to 'animals'
+
+  mapping 'sAnimalID' => 'id'
+  mapping 'strAnimalTitleText' => 'name'
+  mapping 'sAnimalAge' => 'age'
+  mapping 'strThreat' do |context, threat|
+    rating = ['none', 'medium', 'big'].index(threat) + 1
+    {:danger_rating => rating}
+  end
+end
+```
+
+To run the import, just execute:
+
+```ruby
+mapping_path = Rails.root + 'mapping.rb'
+DataImport.run_config! mapping_path
+```
+
+If you execute the import frequently, you can create a Rake task:
+
+```ruby
+desc "Imports the data from the source database"
+task :import do
+  mapping_path = Rails.root + 'mapping.rb'
+  options = {}
+  options[:only] = ENV['RUN_ONLY'].split(',') if ENV['RUN_ONLY'].present?
+
+  DataImport.run_config! mapping_path, options
+end
+```
+
+## Configuration
+
+data-import provides a clean DSL to define your mappings from the legacy schema to the new one.
+
+### Before Filter ###
+
+data-import allows you to define a global filter. This filter can be used to make global transformations like encoding fixes. You can define a filter that downcases every string like so:
+
+```ruby
+before_filter do |row|
+  row.each do |k, v|
+    row[k] = v.downcase if v.respond_to?(:downcase)
+  end
+end
+```
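To see what such a filter does to a single row, here is a minimal standalone sketch. It assumes a row behaves like a plain Ruby `Hash`; `downcase_filter` and the sample row are hypothetical names used for illustration and are not part of data-import.

```ruby
# Hypothetical stand-in for the block passed to `before_filter`.
# Assumption: a row is a plain Hash of column name => value.
downcase_filter = lambda do |row|
  row.each do |k, v|
    # Only values that know how to downcase (strings) are touched;
    # numbers and nil pass through unchanged.
    row[k] = v.downcase if v.respond_to?(:downcase)
  end
end

row = {'strName' => 'TIGER', 'sAge' => 7, 'strNote' => nil}
downcase_filter.call(row)
p row
```

Assigning to keys that already exist is safe while iterating over a hash; only adding new keys during iteration raises an error.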
+
+### Simple Mappings
+
+You've already seen a very basic example of the DSL in the Installation section. This part shows off the features of the mapping DSL.
+
+#### Structure ####
+
+Every mapping starts with a call to `import`, followed by the name of the mapping. You can name mappings however you like. The block passed to `import` contains the mapping itself. You can supply the source table with `from` and the target table with `to`. Make sure that you set the primary key on the source table; otherwise pagination will not work properly and the migration will fill up your RAM.
+
+```ruby
+import 'Users' do
+  from 'tblUser', :primary_key => 'sUserID'
+  to 'users'
+```
+
+#### Column-Mappings ####
+
+You can create simple name mappings with a call to `mapping`:
+
+```ruby
+mapping 'sUserID' => 'id'
+mapping 'strEmail' => 'email'
+mapping 'strUsername' => 'username'
+```
+
+If you need to process a column, you can add a block. The block receives the values of the columns you specified after `mapping` (following a leading `context` argument, as in the example below). Its return value should be a hash or `nil`: `nil` means no mapping at all, and a hash must use the column names of the target table as keys.
+
+```ruby
+mapping 'strThreat' do |context, threat|
+  rating = ['none', 'medium', 'big'].index(threat) + 1
+  {:danger_rating => rating}
+end
+```
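One detail of the block above is worth spelling out: `Array#index` returns `nil` for a value that is not in the list, so `+ 1` would raise a `NoMethodError` on unexpected data. The following standalone sketch shows a defensive variant that returns `nil` (no mapping) instead. `THREAT_LEVELS` and `threat_mapping` are hypothetical names; the lambda only imitates the block contract described above and does not call data-import.

```ruby
THREAT_LEVELS = ['none', 'medium', 'big']

# Hypothetical stand-in for the block passed to `mapping 'strThreat'`.
# The first argument mirrors the `context` parameter and is unused here.
threat_mapping = lambda do |_context, threat|
  index = THREAT_LEVELS.index(threat)
  next nil if index.nil?           # unknown value: return nil, i.e. map nothing
  {:danger_rating => index + 1}    # keys are column names of the target table
end

p threat_mapping.call(nil, 'medium')  # a hash for the target row
p threat_mapping.call(nil, 'huge')    # nil, because 'huge' is not a known level
```

Whether silently skipping unknown values or failing loudly is preferable depends on how much you trust the legacy data.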
+
+### Dependencies
+
+You can specify dependencies between definitions. Dependencies always run before the definition that declares them. Declaring all necessary dependencies also lets you run a subset of definitions instead of everything.
+
+```ruby
+import 'Roles' do
+  from 'tblRole', :primary_key => 'sRoleID'
+  to 'roles'
+end
+
+import 'SubscriptionPlans' do
+  from 'tblSubcriptionCat', :primary_key => 'sSubscriptionCatID'
+  to 'subscription_plans'
+end
+
+import 'Users' do
+  from 'tblUser', :primary_key => 'sUserID'
+  to 'users'
+  dependencies 'SubscriptionPlans'
+end
+
+import 'Permissions' do
+  from 'tblUserRoles'
+  to 'permissions'
+  dependencies 'Users', 'Roles'
+end
+```
+
+You can now run parts of your mappings using the `:only` option:
+
+```ruby
+DataImport.run_config! 'mappings.rb', :only => ['Users'] # => imports SubscriptionPlans then Users
+DataImport.run_config! 'mappings.rb', :only => ['Roles'] # => imports Roles only
+DataImport.run_config! 'mappings.rb', :only => ['Permissions'] # => imports Roles, SubscriptionPlans, Users and then Permissions
+```
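The "dependencies first" behaviour can be reproduced with a small depth-first resolution. This is only a sketch of the idea, not the gem's actual execution plan or runner: `DEPENDENCIES` and `resolve` are hypothetical names, and the order among independent dependencies follows the order in which they are declared, which may differ from the order data-import itself chooses.

```ruby
# Hypothetical dependency table mirroring the `dependencies` calls above.
DEPENDENCIES = {
  'Roles' => [],
  'SubscriptionPlans' => [],
  'Users' => ['SubscriptionPlans'],
  'Permissions' => ['Users', 'Roles']
}

# Returns the requested definitions plus everything they depend on,
# ordered so that each dependency appears before its dependents.
def resolve(names, dependencies, resolved = [], visiting = [])
  names.each do |name|
    next if resolved.include?(name)
    raise ArgumentError, "circular dependency at #{name}" if visiting.include?(name)
    raise ArgumentError, "unknown definition #{name}" unless dependencies.key?(name)
    # Resolve the dependencies first, then append the definition itself.
    resolve(dependencies.fetch(name), dependencies, resolved, visiting + [name])
    resolved << name
  end
  resolved
end

p resolve(['Users'], DEPENDENCIES)
p resolve(['Permissions'], DEPENDENCIES)
```

In this sketch, requesting `'Permissions'` yields SubscriptionPlans, Users, Roles and then Permissions: every dependency still precedes its dependents, even though independent definitions appear in a different order than in the comment above.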
+
+## Examples
+
+You can learn a lot from the [integration specs](https://github.com/garaio/data-import/tree/master/spec/integration).
+
+## Community
+
+### Got a question?
+
+Just send me a message and I'll try to get back to you as soon as possible.
+
+### Found a bug?
+
+Please submit a new issue.
+
+### Fixed something?
+
+1. Fork data-import
+2. Create a topic branch - `git checkout -b my_branch`
+3. Make your changes and update the History.txt file
+4. Push to your branch - `git push origin my_branch`
+5. Send me a pull-request for your topic branch
+6. That's it!
data/Rakefile
ADDED
@@ -0,0 +1,20 @@
+require "bundler/gem_tasks"
+
+require 'rspec/core/rake_task'
+
+namespace :ci do
+  task :setup do
+    include FileUtils
+
+    rm_rf 'reports'
+    mkdir_p 'reports/rspec'
+  end
+
+  RSpec::Core::RakeTask.new(:rspec => :setup) do |t|
+    t.rspec_opts = ['--no-color',
+                    '-r ./spec/junit_formatter.rb',
+                    '-f "JUnitFormatter"',
+                    '-o reports/rspec/junit.xml']
+    t.pattern = "spec/**/*_spec.rb"
+  end
+end
data/data-import.gemspec
ADDED
@@ -0,0 +1,29 @@
+# -*- encoding: utf-8 -*-
+$:.push File.expand_path("../lib", __FILE__)
+require "data-import/version"
+
+Gem::Specification.new do |s|
+  s.name = "data-import"
+  s.version = DataImport::VERSION
+  s.authors = ['Michael Stämpfli', 'Yves Senn']
+  s.email = ['michael.staempfli@garaio.com', 'yves.senn@garaio.com']
+  s.homepage = ""
+  s.summary = %q{migrate your data to a better place}
+  s.description = %q{sequel based dsl to migrate data from a legacy database to a new home}
+
+  s.rubyforge_project = "data-import"
+
+  s.files = `git ls-files`.split("\n")
+  s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
+  s.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
+  s.require_paths = ["lib"]
+
+  # specify any dependencies here; for example:
+  s.add_development_dependency "rspec"
+  s.add_development_dependency "sqlite3"
+
+  s.add_runtime_dependency "sequel"
+  s.add_runtime_dependency "progress"
+  s.add_runtime_dependency "activesupport"
+  s.add_runtime_dependency "i18n"
+end
data/lib/data-import.rb
ADDED
@@ -0,0 +1,35 @@
+
+require 'yaml'
+require 'progress'
+require 'active_support/all'
+
+require "data-import/version"
+require 'data-import/runner'
+require 'data-import/execution_plan'
+require 'data-import/dsl'
+require 'data-import/database'
+require 'data-import/definition'
+require 'data-import/importer'
+
+# Monkeypatch for active support (see https://github.com/rails/rails/pull/2801)
+class Time
+  class << self
+    def ===(other)
+      super || (self == Time && other.is_a?(ActiveSupport::TimeWithZone))
+    end
+  end
+end
+
+module DataImport
+
+  def self.run_config!(config_path, options = {})
+    plan = DataImport::Dsl.evaluate_import_config(config_path)
+    run_plan!(plan, options)
+  end
+
+  def self.run_plan!(plan, options = {})
+    runner = Runner.new(plan)
+    runner.run(options)
+  end
+
+end
data/lib/data-import/adapters/sequel.rb
ADDED
@@ -0,0 +1,96 @@
+require 'sequel'
+require 'iconv'
+
+module DataImport
+  module Adapters
+    class Sequel
+
+      attr_reader :db
+
+      def self.connect(options = {})
+        ::Sequel.identifier_output_method = :to_s
+        self.new ::Sequel.connect(options)
+      end
+
+      def initialize(db)
+        @db = db
+      end
+
+      def truncate(table)
+        @db.from(table).delete
+      end
+
+      def transaction(&block)
+        @db.transaction do
+          yield block
+        end
+      end
+
+      def each_row(table, options = {}, &block)
+        if options[:primary_key].nil? || !numeric_column?(table, options[:primary_key])
+          each_row_without_batches table, options, &block
+        else
+          each_row_in_batches table, options, &block
+        end
+      end
+
+      def each_row_without_batches(table, options = {}, &block)
+        sql = @db.from(table)
+        sql = sql.select(*options[:columns]) unless options[:columns].nil?
+        sql = sql.distinct if options[:distinct]
+        sql = sql.order(*options[:order]) unless options[:order].nil?
+        sql.each do |row|
+          yield row if block_given?
+        end
+      end
+
+      def each_row_in_batches(table, options = {}, &block)
+        personen = @db.from(table)
+        max = maximum_value(table, options[:primary_key]) || 0
+        lower_bound = 0
+        batch_size = 1000
+        while (lower_bound <= max) do
+          upper_bound = lower_bound + batch_size - 1
+          sql = personen.filter(options[:primary_key] => lower_bound..upper_bound)
+          sql = sql.select(*options[:columns]) unless options[:columns].nil?
+          sql = sql.distinct if options[:distinct]
+          sql = sql.order(*options[:order]) unless options[:order].nil?
+          sql.each do |result|
+            yield result if block_given?
+          end unless sql.nil?
+          lower_bound += batch_size
+        end
+      end
+
+      def maximum_value(table, column)
+        @db.from(table).max(column)
+      end
+
+      def count(table, options = {})
+        sql = @db.from(table)
+        sql = sql.select(*options[:columns]) unless options[:columns].nil?
+        sql = sql.distinct if options[:distinct]
+        sql.count
+      end
+
+      def insert_row(table, row)
+        @db.from(table).insert(row)
+      end
+
+      def update_row(table, row)
+        id = row.delete(:id) || row.delete('id')
+        @db.from(table).filter(:id => id).update(row)
+      end
+
+      def numeric_column?(table, column)
+        column_definition = @db.schema(table).select{|c| c.first == column}.first
+        column_definition[1][:type] == :integer unless column_definition.nil?
+      end
+
+      def unique_row(table, key)
+        @db.from(table)[:id => key]
+      end
+
+    end
+  end
+end
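The windowing done by `each_row_in_batches` in the adapter above can be illustrated without a database. The sketch below computes the same inclusive primary-key ranges: it starts at 0, uses windows 1000 keys wide, and stops once the lower bound passes the maximum key. `batch_ranges` is a hypothetical helper written for this illustration and is not part of the adapter.

```ruby
# Hypothetical helper: lists the primary-key windows that a loop shaped
# like each_row_in_batches would query for a given maximum key value.
def batch_ranges(max, batch_size = 1000)
  ranges = []
  lower_bound = 0
  while lower_bound <= max
    upper_bound = lower_bound + batch_size - 1
    ranges << (lower_bound..upper_bound)   # inclusive on both ends
    lower_bound += batch_size
  end
  ranges
end

p batch_ranges(2500)
```

Note that the windows cover key values rather than row counts: a table with sparse keys produces some windows that match few or no rows, and rows with a negative key would never be visited because the first window starts at 0.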
data/lib/data-import/database.rb
ADDED
@@ -0,0 +1,28 @@
+module DataImport
+  class Database
+
+    def self.connect(name, options = {})
+      adapter = find_adapter(name)
+      unless adapter.nil?
+        adapter.connect options
+      end
+    end
+
+    private
+
+    SUPPORTED_ADAPTERS = [:sequel]
+
+    def self.find_adapter(name)
+      @@loaded_adapters ||= {}
+      if SUPPORTED_ADAPTERS.include? name.to_sym
+        if @@loaded_adapters[name.to_sym].nil?
+          require "data-import/adapters/#{name.to_s}"
+          class_name = name.to_s.gsub(/\/(.?)/) { "::#{$1.upcase}" }.gsub(/(?:^|_)(.)/) { $1.upcase }
+          @@loaded_adapters[name.to_sym] = DataImport::Adapters.const_get(class_name)
+        end
+        @@loaded_adapters[name.to_sym]
+      end
+    end
+
+  end
+end
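The two chained `gsub` calls in `find_adapter` turn an adapter name into a class name. The standalone sketch below isolates that conversion so its effect is easier to see; `adapter_class_name` is a hypothetical helper, and the names other than `sequel` are made-up inputs, since `:sequel` is the only supported adapter in this version.

```ruby
# Hypothetical helper wrapping the same two substitutions used in
# Database.find_adapter.
def adapter_class_name(name)
  name.to_s
      .gsub(/\/(.?)/) { "::#{$1.upcase}" }   # "a/b" becomes "a::B"
      .gsub(/(?:^|_)(.)/) { $1.upcase }      # leading letter and "_x" become uppercase
end

puts adapter_class_name(:sequel)
puts adapter_class_name('my_adapter')
puts adapter_class_name('vendor/my_adapter')
```

The resulting string is what `find_adapter` passes to `DataImport::Adapters.const_get`.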
data/lib/data-import/definition.rb
ADDED
@@ -0,0 +1,20 @@
+require 'data-import/definition/simple'
+
+module DataImport
+  class Definition
+    attr_reader :name
+    attr_reader :source_database, :target_database
+    attr_reader :dependencies
+
+    def initialize(name, source_database, target_database)
+      @name = name
+      @source_database = source_database
+      @target_database = target_database
+      @dependencies = []
+    end
+
+    def add_dependency(dependency)
+      @dependencies << dependency
+    end
+  end
+end
data/lib/data-import/definition/simple.rb
ADDED
@@ -0,0 +1,59 @@
+module DataImport
+  class Definition
+    class Simple < Definition
+      attr_reader :id_mappings
+      attr_reader :source_primary_key
+      attr_accessor :source_table_name, :source_columns, :source_distinct_columns, :source_order_columns
+      attr_accessor :target_table_name
+      attr_accessor :after_blocks, :after_row_blocks
+      attr_reader :mode
+
+      def initialize(name, source_database, target_database)
+        super
+        @mode = :insert
+        @id_mappings = {}
+        @after_blocks = []
+        @after_row_blocks = []
+        @source_columns = []
+        @source_order_columns = []
+      end
+
+      def mappings
+        @mappings ||= {}
+      end
+
+      def source_primary_key=(value)
+        @source_primary_key = value.to_sym unless value.nil?
+      end
+
+      def add_id_mapping(mapping)
+        @id_mappings.merge! mapping
+      end
+
+      def new_id_of(value)
+        @id_mappings[value]
+      end
+
+      def definition(name = nil)
+        if name.nil?
+          self
+        else
+          DataImport.definitions[name] or raise ArgumentError
+        end
+      end
+
+      def use_mode(mode)
+        @mode = mode
+      end
+
+      def run(context)
+        options = {:columns => source_columns, :distinct => source_distinct_columns}
+        Progress.start("Importing #{name}", source_database.count(source_table_name, options)) do
+          Importer.new(context, self).run do
+            Progress.step
+          end
+        end
+      end
+    end
+  end
+end