data-import 0.0.1
- data/.gitignore +6 -0
- data/.rvmrc +1 -0
- data/Gemfile +4 -0
- data/README.md +157 -0
- data/Rakefile +20 -0
- data/data-import.gemspec +29 -0
- data/lib/data-import.rb +35 -0
- data/lib/data-import/adapters/sequel.rb +96 -0
- data/lib/data-import/database.rb +28 -0
- data/lib/data-import/definition.rb +20 -0
- data/lib/data-import/definition/simple.rb +59 -0
- data/lib/data-import/dsl.rb +54 -0
- data/lib/data-import/dsl/import.rb +48 -0
- data/lib/data-import/dsl/import/from.rb +35 -0
- data/lib/data-import/execution_plan.rb +16 -0
- data/lib/data-import/importer.rb +55 -0
- data/lib/data-import/runner.rb +62 -0
- data/lib/data-import/version.rb +3 -0
- data/scripts/ci.sh +12 -0
- data/spec/data-import/adapters/sequel_spec.rb +159 -0
- data/spec/data-import/database_spec.rb +24 -0
- data/spec/data-import/definition/simple_spec.rb +71 -0
- data/spec/data-import/definition_spec.rb +14 -0
- data/spec/data-import/dsl/import/from_spec.rb +43 -0
- data/spec/data-import/dsl/import_spec.rb +87 -0
- data/spec/data-import/dsl_spec.rb +99 -0
- data/spec/data-import/execution_plan_spec.rb +25 -0
- data/spec/data-import/importer_spec.rb +150 -0
- data/spec/data-import/runner_spec.rb +136 -0
- data/spec/data-import_spec.rb +34 -0
- data/spec/integration/before_block_spec.rb +59 -0
- data/spec/integration/simple_mappings_spec.rb +68 -0
- data/spec/integration/update_records_spec.rb +57 -0
- data/spec/junit_formatter.rb +106 -0
- data/spec/spec_helper.rb +8 -0
- metadata +164 -0
data/.gitignore
ADDED
data/.rvmrc
ADDED
@@ -0,0 +1 @@
rvm --create ruby-1.9.2@data-import
data/Gemfile
ADDED
data/README.md
ADDED
@@ -0,0 +1,157 @@
# DataImport

data-import is a data-migration framework. The goal of the project is to provide a simple API for migrating data from a legacy schema into a new one. It is based on [jeremyevans/sequel](https://github.com/jeremyevans/sequel).

## Installation

```ruby
gem 'data-import'
```

You can put your migration configuration in any file you like. We suggest something like `mapping.rb`:

```ruby
source :sequel, 'sqlite:/'
target :sequel, 'sqlite:/'

import 'Animals' do
  from 'tblAnimal', :primary_key => 'sAnimalID'
  to 'animals'

  mapping 'sAnimalID' => 'id'
  mapping 'strAnimalTitleText' => 'name'
  mapping 'sAnimalAge' => 'age'
  mapping 'strThreat' do |context, threat|
    rating = ['none', 'medium', 'big'].index(threat) + 1
    {:danger_rating => rating}
  end
end
```
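The `strThreat` block above derives a numeric rating from the value's position in a list. As a standalone sketch of that arithmetic (plain Ruby, outside the DSL; the nil guard for unknown values is an addition here — the original block would raise on them):

```ruby
# Position-based rating as in the strThreat mapping: index is
# zero-based, so 1 is added to get a rating of 1..3.
def danger_rating(threat)
  position = ['none', 'medium', 'big'].index(threat)
  position + 1 if position # nil for unknown values instead of raising
end

danger_rating('none') # => 1
danger_rating('big')  # => 3
```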

To run the import, just execute:

```ruby
mapping_path = Rails.root + 'mapping.rb'
DataImport.run_config! mapping_path
```

If you execute the import frequently, you can create a Rake task:

```ruby
desc "Imports the data from the source database"
task :import do
  mapping_path = Rails.root + 'mapping.rb'
  options = {}
  options[:only] = ENV['RUN_ONLY'].split(',') if ENV['RUN_ONLY'].present?

  DataImport.run_config! mapping_path, options
end
```
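The `RUN_ONLY` handling above can be exercised on its own. Note that `present?` comes from ActiveSupport; this sketch uses a plain-Ruby equivalent:

```ruby
# Builds the options hash the Rake task passes to DataImport.run_config!.
# run_only is the raw ENV['RUN_ONLY'] value: a comma-separated list of
# definition names, or nil when the variable is not set.
def import_options(run_only)
  options = {}
  options[:only] = run_only.split(',') if run_only && !run_only.empty?
  options
end

import_options('Users,Roles') # => {:only => ["Users", "Roles"]}
import_options(nil)           # => {}
```

The task would then be invoked as `rake import RUN_ONLY=Users,Roles`.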

## Configuration

data-import provides a clean DSL to define your mappings from the legacy schema to the new one.

### Before Filter ###

data-import allows you to define a global filter. This filter can be used to apply global transformations such as encoding fixes. For example, you can define a filter which downcases every string:

```ruby
before_filter do |row|
  row.each do |k, v|
    row[k] = v.downcase if v.respond_to?(:downcase)
  end
end
```

### Simple Mappings

You've already seen a very basic example of the DSL in the Installation section. This part shows off the features of the mapping DSL.

#### Structure ####

Every mapping starts with a call to `import`, followed by the name of the mapping. You can name mappings however you like. The block passed to `import` contains the mapping itself. You specify the source table with `from` and the target table with `to`. Make sure that you set the primary key on the source table; otherwise pagination will not work properly and the migration will fill up your RAM.

```ruby
import 'Users' do
  from 'tblUser', :primary_key => 'sUserID'
  to 'users'
end
```

#### Column-Mappings ####

You can create simple name mappings with a call to `mapping`:

```ruby
mapping 'sUserID' => 'id'
mapping 'strEmail' => 'email'
mapping 'strUsername' => 'username'
```

If you need to process a column, you can add a block. The block is passed the values of the columns you specified after `mapping`. The return value of the block must be a hash or nil. Nil means no mapping at all; in the case of a hash, you have to use the column names of the target table as keys.

```ruby
mapping 'strThreat' do |context, threat|
  rating = ['none', 'medium', 'big'].index(threat) + 1
  {:danger_rating => rating}
end
```
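The hash-or-nil contract can also be sketched outside the DSL. This hypothetical transform for an assumed legacy `strState` column returns nil for blank values (contributing nothing to the target row) and otherwise a hash keyed by the target column name:

```ruby
# Hypothetical column transform illustrating the hash-or-nil contract.
transform = lambda do |state|
  return nil if state.nil? || state.strip.empty? # nil => no mapping at all
  {:state => state.strip.downcase}               # keys are target-table columns
end

transform.call('  Bern ') # => {:state => "bern"}
transform.call('')        # => nil
```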

### Dependencies

You can specify dependencies between definitions. Dependencies are always run before a given definition is executed. Adding all necessary dependencies also allows you to run a subset of definitions instead of everything.

```ruby
import 'Roles' do
  from 'tblRole', :primary_key => 'sRoleID'
  to 'roles'
end

import 'SubscriptionPlans' do
  from 'tblSubcriptionCat', :primary_key => 'sSubscriptionCatID'
  to 'subscription_plans'
end

import 'Users' do
  from 'tblUser', :primary_key => 'sUserID'
  to 'users'
  dependencies 'SubscriptionPlans'
end

import 'Permissions' do
  from 'tblUserRoles'
  to 'permissions'
  dependencies 'Users', 'Roles'
end
```

You can now run parts of your mappings using the `:only` option:

```ruby
DataImport.run_config! 'mappings.rb', :only => ['Users']       # => imports SubscriptionPlans, then Users
DataImport.run_config! 'mappings.rb', :only => ['Roles']       # => imports Roles only
DataImport.run_config! 'mappings.rb', :only => ['Permissions'] # => imports Roles, SubscriptionPlans, Users and then Permissions
```

## Examples

You can learn a lot from the [integration specs](https://github.com/garaio/data-import/tree/master/spec/integration).

## Community

### Got a question?

Just send me a message and I'll try to get back to you as soon as possible.

### Found a bug?

Please submit a new issue.

### Fixed something?

1. Fork data-import
2. Create a topic branch - `git checkout -b my_branch`
3. Make your changes and update the History.txt file
4. Push to your branch - `git push origin my_branch`
5. Send me a pull-request for your topic branch
6. That's it!
data/Rakefile
ADDED
@@ -0,0 +1,20 @@
require "bundler/gem_tasks"

require 'rspec/core/rake_task'

namespace :ci do
  task :setup do
    include FileUtils

    rm_rf 'reports'
    mkdir_p 'reports/rspec'
  end

  RSpec::Core::RakeTask.new(:rspec => :setup) do |t|
    t.rspec_opts = ['--no-color',
                    '-r ./spec/junit_formatter.rb',
                    '-f "JUnitFormatter"',
                    '-o reports/rspec/junit.xml']
    t.pattern = "spec/**/*_spec.rb"
  end
end
data/data-import.gemspec
ADDED
@@ -0,0 +1,29 @@
# -*- encoding: utf-8 -*-
$:.push File.expand_path("../lib", __FILE__)
require "data-import/version"

Gem::Specification.new do |s|
  s.name        = "data-import"
  s.version     = DataImport::VERSION
  s.authors     = ['Michael Stämpfli', 'Yves Senn']
  s.email       = ['michael.staempli@garaio.com', 'yves.senn@garaio.com']
  s.homepage    = ""
  s.summary     = %q{migrate your data to a better place}
  s.description = %q{sequel based dsl to migrate data from a legacy database to a new home}

  s.rubyforge_project = "data-import"

  s.files         = `git ls-files`.split("\n")
  s.test_files    = `git ls-files -- {test,spec,features}/*`.split("\n")
  s.executables   = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
  s.require_paths = ["lib"]

  # specify any dependencies here; for example:
  s.add_development_dependency "rspec"
  s.add_development_dependency "sqlite3"

  s.add_runtime_dependency "sequel"
  s.add_runtime_dependency "progress"
  s.add_runtime_dependency "activesupport"
  s.add_runtime_dependency "i18n"
end
data/lib/data-import.rb
ADDED
@@ -0,0 +1,35 @@

require 'yaml'
require 'progress'
require 'active_support/all'

require "data-import/version"
require 'data-import/runner'
require 'data-import/execution_plan'
require 'data-import/dsl'
require 'data-import/database'
require 'data-import/definition'
require 'data-import/importer'

# Monkeypatch for ActiveSupport (see https://github.com/rails/rails/pull/2801)
class Time
  class << self
    def ===(other)
      super || (self == Time && other.is_a?(ActiveSupport::TimeWithZone))
    end
  end
end

module DataImport

  def self.run_config!(config_path, options = {})
    plan = DataImport::Dsl.evaluate_import_config(config_path)
    run_plan!(plan, options)
  end

  def self.run_plan!(plan, options = {})
    runner = Runner.new(plan)
    runner.run(options)
  end

end
data/lib/data-import/adapters/sequel.rb
ADDED
@@ -0,0 +1,96 @@
require 'sequel'
require 'iconv'

module DataImport
  module Adapters
    class Sequel

      attr_reader :db

      def self.connect(options = {})
        ::Sequel.identifier_output_method = :to_s
        self.new ::Sequel.connect(options)
      end

      def initialize(db)
        @db = db
      end

      def truncate(table)
        @db.from(table).delete
      end

      def transaction(&block)
        @db.transaction do
          yield
        end
      end

      def each_row(table, options = {}, &block)
        if options[:primary_key].nil? || !numeric_column?(table, options[:primary_key])
          each_row_without_batches table, options, &block
        else
          each_row_in_batches table, options, &block
        end
      end

      def each_row_without_batches(table, options = {}, &block)
        sql = @db.from(table)
        sql = sql.select(*options[:columns]) unless options[:columns].nil?
        sql = sql.distinct if options[:distinct]
        sql = sql.order(*options[:order]) unless options[:order].nil?
        sql.each do |row|
          yield row if block_given?
        end
      end

      def each_row_in_batches(table, options = {}, &block)
        personen = @db.from(table)
        max = maximum_value(table, options[:primary_key]) || 0
        lower_bound = 0
        batch_size = 1000
        while (lower_bound <= max) do
          upper_bound = lower_bound + batch_size - 1
          sql = personen.filter(options[:primary_key] => lower_bound..upper_bound)
          sql = sql.select(*options[:columns]) unless options[:columns].nil?
          sql = sql.distinct if options[:distinct]
          sql = sql.order(*options[:order]) unless options[:order].nil?
          sql.each do |result|
            yield result if block_given?
          end unless sql.nil?
          lower_bound += batch_size
        end
      end

      def maximum_value(table, column)
        @db.from(table).max(column)
      end

      def count(table, options = {})
        sql = @db.from(table)
        sql = sql.select(*options[:columns]) unless options[:columns].nil?
        sql = sql.distinct if options[:distinct]
        sql.count
      end

      def insert_row(table, row)
        @db.from(table).insert(row)
      end

      def update_row(table, row)
        id = row.delete(:id) || row.delete('id')
        @db.from(table).filter(:id => id).update(row)
      end

      def numeric_column?(table, column)
        column_definition = @db.schema(table).select{|c| c.first == column}.first
        column_definition[1][:type] == :integer unless column_definition.nil?
      end

      def unique_row(table, key)
        @db.from(table)[:id => key]
      end

    end
  end
end
data/lib/data-import/database.rb
ADDED
@@ -0,0 +1,28 @@
module DataImport
  class Database

    def self.connect(name, options = {})
      adapter = find_adapter(name)
      unless adapter.nil?
        adapter.connect options
      end
    end

    private

    SUPPORTED_ADAPTERS = [:sequel]

    def self.find_adapter(name)
      @@loaded_adapters ||= {}
      if SUPPORTED_ADAPTERS.include? name.to_sym
        if @@loaded_adapters[name.to_sym].nil?
          require "data-import/adapters/#{name.to_s}"
          class_name = name.to_s.gsub(/\/(.?)/) { "::#{$1.upcase}" }.gsub(/(?:^|_)(.)/) { $1.upcase }
          @@loaded_adapters[name.to_sym] = DataImport::Adapters.const_get(class_name)
        end
        @@loaded_adapters[name.to_sym]
      end
    end

  end
end
data/lib/data-import/definition.rb
ADDED
@@ -0,0 +1,20 @@
require 'data-import/definition/simple'

module DataImport
  class Definition
    attr_reader :name
    attr_reader :source_database, :target_database
    attr_reader :dependencies

    def initialize(name, source_database, target_database)
      @name = name
      @source_database = source_database
      @target_database = target_database
      @dependencies = []
    end

    def add_dependency(dependency)
      @dependencies << dependency
    end
  end
end
data/lib/data-import/definition/simple.rb
ADDED
@@ -0,0 +1,59 @@
module DataImport
  class Definition
    class Simple < Definition
      attr_reader :id_mappings
      attr_reader :source_primary_key
      attr_accessor :source_table_name, :source_columns, :source_distinct_columns, :source_order_columns
      attr_accessor :target_table_name
      attr_accessor :after_blocks, :after_row_blocks
      attr_reader :mode

      def initialize(name, source_database, target_database)
        super
        @mode = :insert
        @id_mappings = {}
        @after_blocks = []
        @after_row_blocks = []
        @source_columns = []
        @source_order_columns = []
      end

      def mappings
        @mappings ||= {}
      end

      def source_primary_key=(value)
        @source_primary_key = value.to_sym unless value.nil?
      end

      def add_id_mapping(mapping)
        @id_mappings.merge! mapping
      end

      def new_id_of(value)
        @id_mappings[value]
      end

      def definition(name = nil)
        if name.nil?
          self
        else
          DataImport.definitions[name] or raise ArgumentError
        end
      end

      def use_mode(mode)
        @mode = mode
      end

      def run(context)
        options = {:columns => source_columns, :distinct => source_distinct_columns}
        Progress.start("Importing #{name}", source_database.count(source_table_name, options)) do
          Importer.new(context, self).run do
            Progress.step
          end
        end
      end
    end
  end
end