data-import 0.0.1

@@ -0,0 +1,6 @@
*.gem
.bundle
Gemfile.lock
pkg/*
reports/*
bin/*
data/.rvmrc ADDED
@@ -0,0 +1 @@
rvm --create ruby-1.9.2@data-import
data/Gemfile ADDED
@@ -0,0 +1,4 @@
source "http://rubygems.org"

# Specify your gem's dependencies in data-import.gemspec
gemspec
@@ -0,0 +1,157 @@
# DataImport

data-import is a data-migration framework. The goal of the project is to provide a simple API to migrate data from a legacy schema into a new one. It's based on jeremyevans/sequel.

## Installation

```ruby
gem 'data-import'
```

You can put your migration configuration in any file you like. We suggest something like `mapping.rb`:

```ruby
source :sequel, 'sqlite:/'
target :sequel, 'sqlite:/'

import 'Animals' do
  from 'tblAnimal', :primary_key => 'sAnimalID'
  to 'animals'

  mapping 'sAnimalID' => 'id'
  mapping 'strAnimalTitleText' => 'name'
  mapping 'sAnimalAge' => 'age'
  mapping 'strThreat' do |context, threat|
    rating = ['none', 'medium', 'big'].index(threat) + 1
    {:danger_rating => rating}
  end
end
```

To run the import, just execute:

```ruby
mapping_path = Rails.root + 'mapping.rb'
DataImport.run_config! mapping_path
```

If you run the import frequently, you can create a Rake task:

```ruby
desc "Imports the data from the source database"
task :import do
  mapping_path = Rails.root + 'mapping.rb'
  options = {}
  options[:only] = ENV['RUN_ONLY'].split(',') if ENV['RUN_ONLY'].present?

  DataImport.run_config! mapping_path, options
end
```
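
The `RUN_ONLY` handling can be exercised on its own; this sketch uses plain Ruby instead of ActiveSupport's `present?`, and the helper name `import_options` is hypothetical, not part of data-import:

```ruby
# Sketch: build the options hash the Rake task passes to run_config!.
# RUN_ONLY holds a comma-separated list of definition names.
def import_options(env)
  options = {}
  run_only = env['RUN_ONLY']
  options[:only] = run_only.split(',') if run_only && !run_only.empty?
  options
end

import_options('RUN_ONLY' => 'Users,Roles') # => {:only => ["Users", "Roles"]}
import_options({})                          # => {}
```

Running the task as `rake import RUN_ONLY=Users,Roles` would then import only those definitions (and their dependencies).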

## Configuration

data-import provides a clean DSL to define your mappings from the legacy schema to the new one.

### Before Filter ###

data-import allows you to define a global filter. This filter can be used to apply global transformations such as encoding fixes. For example, you can define a filter which downcases every string:

```ruby
before_filter do |row|
  row.each do |k, v|
    row[k] = v.downcase if v.respond_to?(:downcase)
  end
end
```
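
The encoding-fix use case mentioned above could look like the following sketch. It assumes the legacy data arrives as Latin-1 bytes; adapt the encodings to your schema:

```ruby
# Sketch: a before_filter body that transcodes Latin-1 values to UTF-8.
# The source encoding (ISO-8859-1) is an assumption, not part of the gem.
fix_encoding = lambda do |row|
  row.each do |k, v|
    row[k] = v.dup.force_encoding('ISO-8859-1').encode('UTF-8') if v.is_a?(String)
  end
  row
end

row = fix_encoding.call(:city => "Z\xFCrich") # \xFC is "ü" in Latin-1
row[:city] # => "Zürich"
```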

### Simple Mappings

You've already seen a very basic example of the DSL in the Installation section. This part shows off the features of the mapping DSL.

#### Structure ####

Every mapping starts with a call to `import` followed by the name of the mapping. You can name mappings however you like. The block passed to `import` contains the mapping itself. You can supply the source table with `from` and the target table with `to`. Make sure that you set the primary key on the source table, otherwise pagination will not work properly and the migration will fill up your RAM.

```ruby
import 'Users' do
  from 'tblUser', :primary_key => 'sUserID'
  to 'users'
end
```

#### Column-Mappings ####

You can create simple name mappings with a call to `mapping`:

```ruby
mapping 'sUserID' => 'id'
mapping 'strEmail' => 'email'
mapping 'strUsername' => 'username'
```

If you need to process a column, you can add a block. The block receives the values of the columns you specified after `mapping`. The return value of the block should be a hash or nil. Nil means no mapping at all; in the case of a hash, you have to use the column names of the target table as keys.

```ruby
mapping 'strThreat' do |context, threat|
  rating = ['none', 'medium', 'big'].index(threat) + 1
  {:danger_rating => rating}
end
```
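
The hash-or-nil contract of such a block can be sketched outside the DSL; the lambda below is illustrative, not part of data-import's API:

```ruby
# Sketch of what a mapping block returns: a hash keyed by target-table
# column names, or nil when the value should produce no mapping at all.
THREAT_LEVELS = ['none', 'medium', 'big']

map_threat = lambda do |threat|
  index = THREAT_LEVELS.index(threat)
  return nil if index.nil? # unknown value -> skip, write nothing
  {:danger_rating => index + 1}
end

map_threat.call('medium')  # => {:danger_rating => 2}
map_threat.call('unknown') # => nil
```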

### Dependencies

You can specify dependencies between definitions. Dependencies are always run before a given definition is executed. Declaring all necessary dependencies also allows you to run a subset of definitions instead of everything.

```ruby
import 'Roles' do
  from 'tblRole', :primary_key => 'sRoleID'
  to 'roles'
end

import 'SubscriptionPlans' do
  from 'tblSubcriptionCat', :primary_key => 'sSubscriptionCatID'
  to 'subscription_plans'
end

import 'Users' do
  from 'tblUser', :primary_key => 'sUserID'
  to 'users'
  dependencies 'SubscriptionPlans'
end

import 'Permissions' do
  from 'tblUserRoles'
  to 'permissions'
  dependencies 'Users', 'Roles'
end
```

You can now run parts of your mappings using the `:only` option:

```ruby
DataImport.run_config! 'mappings.rb', :only => ['Users']       # => imports SubscriptionPlans, then Users
DataImport.run_config! 'mappings.rb', :only => ['Roles']       # => imports Roles only
DataImport.run_config! 'mappings.rb', :only => ['Permissions'] # => imports Roles, SubscriptionPlans, Users and then Permissions
```
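
The run orders in the comments above amount to a depth-first walk of the dependency graph. A minimal sketch of that idea (not the gem's implementation):

```ruby
# Sketch: resolve the execution order for one definition by visiting
# its dependencies first. DEPS mirrors the mappings defined above.
DEPS = {
  'Roles'             => [],
  'SubscriptionPlans' => [],
  'Users'             => ['SubscriptionPlans'],
  'Permissions'       => ['Users', 'Roles']
}

def run_order(name, deps, seen = [])
  deps[name].each { |d| run_order(d, deps, seen) }
  seen << name unless seen.include?(name)
  seen
end

run_order('Users', DEPS)       # => ["SubscriptionPlans", "Users"]
run_order('Permissions', DEPS) # => ["SubscriptionPlans", "Users", "Roles", "Permissions"]
```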

## Examples

You can learn a lot from the [integration specs](https://github.com/garaio/data-import/tree/master/spec/integration).

## Community

### Got a question?

Just send me a message and I'll try to get back to you as soon as possible.

### Found a bug?

Please submit a new issue.

### Fixed something?

1. Fork data-import
2. Create a topic branch - `git checkout -b my_branch`
3. Make your changes and update the History.txt file
4. Push to your branch - `git push origin my_branch`
5. Send me a pull request for your topic branch
6. That's it!
@@ -0,0 +1,20 @@
require "bundler/gem_tasks"

require 'rspec/core/rake_task'

namespace :ci do
  task :setup do
    include FileUtils

    rm_rf 'reports'
    mkdir_p 'reports/rspec'
  end

  RSpec::Core::RakeTask.new(:rspec => :setup) do |t|
    t.rspec_opts = ['--no-color',
                    '-r ./spec/junit_formatter.rb',
                    '-f "JUnitFormatter"',
                    '-o reports/rspec/junit.xml']
    t.pattern = "spec/**/*_spec.rb"
  end
end
@@ -0,0 +1,29 @@
# -*- encoding: utf-8 -*-
$:.push File.expand_path("../lib", __FILE__)
require "data-import/version"

Gem::Specification.new do |s|
  s.name        = "data-import"
  s.version     = DataImport::VERSION
  s.authors     = ['Michael Stämpfli', 'Yves Senn']
  s.email       = ['michael.staempfli@garaio.com', 'yves.senn@garaio.com']
  s.homepage    = ""
  s.summary     = %q{migrate your data to a better place}
  s.description = %q{sequel based dsl to migrate data from a legacy database to a new home}

  s.rubyforge_project = "data-import"

  s.files         = `git ls-files`.split("\n")
  s.test_files    = `git ls-files -- {test,spec,features}/*`.split("\n")
  s.executables   = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
  s.require_paths = ["lib"]

  # specify any dependencies here; for example:
  s.add_development_dependency "rspec"
  s.add_development_dependency "sqlite3"

  s.add_runtime_dependency "sequel"
  s.add_runtime_dependency "progress"
  s.add_runtime_dependency "activesupport"
  s.add_runtime_dependency "i18n"
end
@@ -0,0 +1,35 @@
require 'yaml'
require 'progress'
require 'active_support/all'

require "data-import/version"
require 'data-import/runner'
require 'data-import/execution_plan'
require 'data-import/dsl'
require 'data-import/database'
require 'data-import/definition'
require 'data-import/importer'

# Monkeypatch for active support (see https://github.com/rails/rails/pull/2801)
class Time
  class << self
    def ===(other)
      super || (self == Time && other.is_a?(ActiveSupport::TimeWithZone))
    end
  end
end

module DataImport

  def self.run_config!(config_path, options = {})
    plan = DataImport::Dsl.evaluate_import_config(config_path)
    run_plan!(plan, options)
  end

  def self.run_plan!(plan, options = {})
    runner = Runner.new(plan)
    runner.run(options)
  end

end
@@ -0,0 +1,96 @@
require 'sequel'
require 'iconv'

module DataImport
  module Adapters
    class Sequel

      attr_reader :db

      def self.connect(options = {})
        ::Sequel.identifier_output_method = :to_s
        self.new ::Sequel.connect(options)
      end

      def initialize(db)
        @db = db
      end

      def truncate(table)
        @db.from(table).delete
      end

      def transaction(&block)
        @db.transaction do
          block.call
        end
      end

      def each_row(table, options = {}, &block)
        if options[:primary_key].nil? || !numeric_column?(table, options[:primary_key])
          each_row_without_batches table, options, &block
        else
          each_row_in_batches table, options, &block
        end
      end

      def each_row_without_batches(table, options = {}, &block)
        sql = @db.from(table)
        sql = sql.select(*options[:columns]) unless options[:columns].nil?
        sql = sql.distinct if options[:distinct]
        sql = sql.order(*options[:order]) unless options[:order].nil?
        sql.each do |row|
          yield row if block_given?
        end
      end

      def each_row_in_batches(table, options = {}, &block)
        dataset = @db.from(table)
        max = maximum_value(table, options[:primary_key]) || 0
        lower_bound = 0
        batch_size = 1000
        while lower_bound <= max do
          upper_bound = lower_bound + batch_size - 1
          sql = dataset.filter(options[:primary_key] => lower_bound..upper_bound)
          sql = sql.select(*options[:columns]) unless options[:columns].nil?
          sql = sql.distinct if options[:distinct]
          sql = sql.order(*options[:order]) unless options[:order].nil?
          sql.each do |result|
            yield result if block_given?
          end
          lower_bound += batch_size
        end
      end

      def maximum_value(table, column)
        @db.from(table).max(column)
      end

      def count(table, options = {})
        sql = @db.from(table)
        sql = sql.select(*options[:columns]) unless options[:columns].nil?
        sql = sql.distinct if options[:distinct]
        sql.count
      end

      def insert_row(table, row)
        @db.from(table).insert(row)
      end

      def update_row(table, row)
        id = row.delete(:id) || row.delete('id')
        @db.from(table).filter(:id => id).update(row)
      end

      def numeric_column?(table, column)
        column_definition = @db.schema(table).select { |c| c.first == column }.first
        column_definition[1][:type] == :integer unless column_definition.nil?
      end

      def unique_row(table, key)
        @db.from(table)[:id => key]
      end

    end
  end
end
@@ -0,0 +1,28 @@
module DataImport
  class Database

    def self.connect(name, options = {})
      adapter = find_adapter(name)
      unless adapter.nil?
        adapter.connect options
      end
    end

    private

    SUPPORTED_ADAPTERS = [:sequel]

    def self.find_adapter(name)
      @@loaded_adapters ||= {}
      if SUPPORTED_ADAPTERS.include? name.to_sym
        if @@loaded_adapters[name.to_sym].nil?
          require "data-import/adapters/#{name.to_s}"
          class_name = name.to_s.gsub(/\/(.?)/) { "::#{$1.upcase}" }.gsub(/(?:^|_)(.)/) { $1.upcase }
          @@loaded_adapters[name.to_sym] = DataImport::Adapters.const_get(class_name)
        end
        @@loaded_adapters[name.to_sym]
      end
    end

  end
end
@@ -0,0 +1,20 @@
require 'data-import/definition/simple'

module DataImport
  class Definition
    attr_reader :name
    attr_reader :source_database, :target_database
    attr_reader :dependencies

    def initialize(name, source_database, target_database)
      @name = name
      @source_database = source_database
      @target_database = target_database
      @dependencies = []
    end

    def add_dependency(dependency)
      @dependencies << dependency
    end
  end
end
@@ -0,0 +1,59 @@
module DataImport
  class Definition
    class Simple < Definition
      attr_reader :id_mappings
      attr_reader :source_primary_key
      attr_accessor :source_table_name, :source_columns, :source_distinct_columns, :source_order_columns
      attr_accessor :target_table_name
      attr_accessor :after_blocks, :after_row_blocks
      attr_reader :mode

      def initialize(name, source_database, target_database)
        super
        @mode = :insert
        @id_mappings = {}
        @after_blocks = []
        @after_row_blocks = []
        @source_columns = []
        @source_order_columns = []
      end

      def mappings
        @mappings ||= {}
      end

      def source_primary_key=(value)
        @source_primary_key = value.to_sym unless value.nil?
      end

      def add_id_mapping(mapping)
        @id_mappings.merge! mapping
      end

      def new_id_of(value)
        @id_mappings[value]
      end

      def definition(name = nil)
        if name.nil?
          self
        else
          DataImport.definitions[name] or raise ArgumentError
        end
      end

      def use_mode(mode)
        @mode = mode
      end

      def run(context)
        options = {:columns => source_columns, :distinct => source_distinct_columns}
        Progress.start("Importing #{name}", source_database.count(source_table_name, options)) do
          Importer.new(context, self).run do
            Progress.step
          end
        end
      end
    end
  end
end