data-import 0.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,6 @@
+ *.gem
+ .bundle
+ Gemfile.lock
+ pkg/*
+ reports/*
+ bin/*
data/.rvmrc ADDED
@@ -0,0 +1 @@
+ rvm --create ruby-1.9.2@data-import
data/Gemfile ADDED
@@ -0,0 +1,4 @@
+ source "http://rubygems.org"
+
+ # Specify your gem's dependencies in data-import.gemspec
+ gemspec
@@ -0,0 +1,157 @@
+ # DataImport
+
+ data-import is a data-migration framework. The goal of the project is to provide a simple API to migrate data from a legacy schema into a new one. It's based on jeremyevans/sequel.
+
+ ## Installation
+
+ ```ruby
+ gem 'data-import'
+ ```
+
+ You can put your migration configuration in any file you like. We suggest something like `mapping.rb`:
+
+ ```ruby
+ source :sequel, 'sqlite:/'
+ target :sequel, 'sqlite:/'
+
+ import 'Animals' do
+   from 'tblAnimal', :primary_key => 'sAnimalID'
+   to 'animals'
+
+   mapping 'sAnimalID' => 'id'
+   mapping 'strAnimalTitleText' => 'name'
+   mapping 'sAnimalAge' => 'age'
+   mapping 'strThreat' do |context, threat|
+     rating = ['none', 'medium', 'big'].index(threat) + 1
+     {:danger_rating => rating}
+   end
+ end
+ ```
+
+ To run the import, just execute:
+
+ ```ruby
+ mapping_path = Rails.root + 'mapping.rb'
+ DataImport.run_config! mapping_path
+ ```
+
+ If you execute the import frequently, you can create a Rake task:
+
+ ```ruby
+ desc "Imports the data from the source database"
+ task :import do
+   mapping_path = Rails.root + 'mapping.rb'
+   options = {}
+   options[:only] = ENV['RUN_ONLY'].split(',') if ENV['RUN_ONLY'].present?
+
+   DataImport.run_config! mapping_path, options
+ end
+ ```
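The `RUN_ONLY` handling above boils down to splitting a comma-separated environment variable into the `:only` option. A minimal standalone sketch of that step, using plain Ruby instead of ActiveSupport's `present?`:

```ruby
# Sketch of how the Rake task turns RUN_ONLY=Users,Roles into the :only option.
# Plain Ruby stand-in for the ENV handling shown above.
def import_options(env)
  value = env['RUN_ONLY']
  return {} if value.nil? || value.strip.empty?
  { :only => value.split(',') }
end

import_options({})                          # => {}
import_options('RUN_ONLY' => 'Users,Roles') # => {:only => ["Users", "Roles"]}
```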
+
+ ## Configuration
+
+ data-import provides a clean DSL to define your mappings from the legacy schema to the new one.
+
+ ### Before Filter ###
+
+ data-import allows you to define a global filter. This filter can be used for global transformations such as encoding fixes. For example, you can define a filter that downcases every string:
+
+ ```ruby
+ before_filter do |row|
+   row.each do |k, v|
+     row[k] = v.downcase if v.respond_to?(:downcase)
+   end
+ end
+ ```
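The filter body is plain Ruby, so its effect can be seen in isolation. Run against a sample row (the column names here are made up), values that don't respond to `#downcase`, such as numbers or nil, pass through untouched:

```ruby
# The same downcasing logic as the before_filter above, applied to a sample row.
row = { :name => 'ALICE', :age => 30, :email => 'ALICE@EXAMPLE.COM' }
row.each do |k, v|
  row[k] = v.downcase if v.respond_to?(:downcase)
end
row # => {:name=>"alice", :age=>30, :email=>"alice@example.com"}
```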
+
+ ### Simple Mappings
+
+ You've already seen a very basic example of the DSL in the Installation section. This part shows off the features of the mapping DSL.
+
+ #### Structure ####
+
+ Every mapping starts with a call to `import` followed by the name of the mapping. You can name mappings however you like. The block passed to `import` contains the mapping itself. You supply the source table with `from` and the target table with `to`. Make sure that you set the primary key on the source table; otherwise pagination will not work properly and the migration will fill up your RAM.
+
+ ```ruby
+ import 'Users' do
+   from 'tblUser', :primary_key => 'sUserID'
+   to 'users'
+ end
+ ```
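The pagination mentioned above works by reading rows in primary-key ranges rather than loading the whole table into memory. A sketch of the idea (the batch size of 1000 matches the Sequel adapter in this release):

```ruby
# Sketch: walk primary-key ranges of a fixed size instead of loading every row
# at once. Each yielded range becomes one WHERE pk BETWEEN lower AND upper query.
def each_batch(max_id, batch_size = 1000)
  lower = 0
  while lower <= max_id
    yield lower..(lower + batch_size - 1)
    lower += batch_size
  end
end

ranges = []
each_batch(2500) { |r| ranges << r }
ranges # => [0..999, 1000..1999, 2000..2999]
```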
+
+ #### Column-Mappings ####
+
+ You can create simple name mappings with a call to `mapping`:
+
+ ```ruby
+ mapping 'sUserID' => 'id'
+ mapping 'strEmail' => 'email'
+ mapping 'strUsername' => 'username'
+ ```
+
+ If you need to process a column, you can add a block. It receives the values of the columns you specified after `mapping`. The return value of the block must be a hash or nil: nil means no mapping at all, and for a hash you have to use the column names of the target table as keys.
+
+ ```ruby
+ mapping 'strThreat' do |context, threat|
+   rating = ['none', 'medium', 'big'].index(threat) + 1
+   {:danger_rating => rating}
+ end
+ ```
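The hash-or-nil contract can be tried in isolation. The method below mimics the `strThreat` block as plain Ruby (not the gem's API, just the transformation), with a guard added so unknown values yield nil instead of raising:

```ruby
# Plain-Ruby version of the strThreat block: returns target columns as a hash,
# or nil ("no mapping at all") for unrecognized threat values.
def convert_threat(threat)
  rating = ['none', 'medium', 'big'].index(threat)
  return nil if rating.nil?  # unknown value -> skip the mapping entirely
  { :danger_rating => rating + 1 }
end

convert_threat('medium') # => {:danger_rating => 2}
convert_threat('bogus')  # => nil
```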
+
+ ### Dependencies
+
+ You can specify dependencies between definitions. Dependencies are always run before the definition that declares them. Declaring all necessary dependencies also allows you to run a subset of definitions instead of everything.
+
+ ```ruby
+ import 'Roles' do
+   from 'tblRole', :primary_key => 'sRoleID'
+   to 'roles'
+ end
+
+ import 'SubscriptionPlans' do
+   from 'tblSubcriptionCat', :primary_key => 'sSubscriptionCatID'
+   to 'subscription_plans'
+ end
+
+ import 'Users' do
+   from 'tblUser', :primary_key => 'sUserID'
+   to 'users'
+   dependencies 'SubscriptionPlans'
+ end
+
+ import 'Permissions' do
+   from 'tblUserRoles'
+   to 'permissions'
+   dependencies 'Users', 'Roles'
+ end
+ ```
+
+ You can now run parts of your mappings using the `:only` option:
+
+ ```ruby
+ DataImport.run_config! 'mappings.rb', :only => ['Users'] # => imports SubscriptionPlans then Users
+ DataImport.run_config! 'mappings.rb', :only => ['Roles'] # => imports Roles only
+ DataImport.run_config! 'mappings.rb', :only => ['Permissions'] # => imports Roles, SubscriptionPlans, Users and then Permissions
+ ```
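Conceptually, `:only` does a depth-first walk of the dependency graph, scheduling dependencies before dependents. A sketch of that resolution (not the gem's actual resolver; the relative order of independent definitions may differ from the examples above):

```ruby
# Sketch of dependency resolution for :only, using the graph from the example.
# Dependencies always run before the definition that declares them; ties
# between independent definitions are broken arbitrarily here.
DEPS = {
  'Roles'             => [],
  'SubscriptionPlans' => [],
  'Users'             => ['SubscriptionPlans'],
  'Permissions'       => ['Users', 'Roles']
}

def execution_order(only, deps = DEPS, seen = [])
  only.each do |name|
    next if seen.include?(name)
    execution_order(deps.fetch(name), deps, seen)  # dependencies first
    seen << name
  end
  seen
end

execution_order(['Users']) # => ["SubscriptionPlans", "Users"]
```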
+
+ ## Examples
+
+ You can learn a lot from the [integration specs](https://github.com/garaio/data-import/tree/master/spec/integration).
+
+ ## Community
+
+ ### Got a question?
+
+ Just send me a message and I'll try to get back to you as soon as possible.
+
+ ### Found a bug?
+
+ Please submit a new issue.
+
+ ### Fixed something?
+
+ 1. Fork data-import
+ 2. Create a topic branch - `git checkout -b my_branch`
+ 3. Make your changes and update the History.txt file
+ 4. Push to your branch - `git push origin my_branch`
+ 5. Send me a pull request for your topic branch
+ 6. That's it!
@@ -0,0 +1,20 @@
+ require "bundler/gem_tasks"
+
+ require 'rspec/core/rake_task'
+
+ namespace :ci do
+   task :setup do
+     include FileUtils
+
+     rm_rf 'reports'
+     mkdir_p 'reports/rspec'
+   end
+
+   RSpec::Core::RakeTask.new(:rspec => :setup) do |t|
+     t.rspec_opts = ['--no-color',
+                     '-r ./spec/junit_formatter.rb',
+                     '-f "JUnitFormatter"',
+                     '-o reports/rspec/junit.xml']
+     t.pattern = "spec/**/*_spec.rb"
+   end
+ end
@@ -0,0 +1,29 @@
+ # -*- encoding: utf-8 -*-
+ $:.push File.expand_path("../lib", __FILE__)
+ require "data-import/version"
+
+ Gem::Specification.new do |s|
+   s.name        = "data-import"
+   s.version     = DataImport::VERSION
+   s.authors     = ['Michael Stämpfli', 'Yves Senn']
+   s.email       = ['michael.staempfli@garaio.com', 'yves.senn@garaio.com']
+   s.homepage    = ""
+   s.summary     = %q{migrate your data to a better place}
+   s.description = %q{sequel based dsl to migrate data from a legacy database to a new home}
+
+   s.rubyforge_project = "data-import"
+
+   s.files         = `git ls-files`.split("\n")
+   s.test_files    = `git ls-files -- {test,spec,features}/*`.split("\n")
+   s.executables   = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
+   s.require_paths = ["lib"]
+
+   # specify any dependencies here; for example:
+   s.add_development_dependency "rspec"
+   s.add_development_dependency "sqlite3"
+
+   s.add_runtime_dependency "sequel"
+   s.add_runtime_dependency "progress"
+   s.add_runtime_dependency "activesupport"
+   s.add_runtime_dependency "i18n"
+ end
@@ -0,0 +1,35 @@
+
+ require 'yaml'
+ require 'progress'
+ require 'active_support/all'
+
+ require "data-import/version"
+ require 'data-import/runner'
+ require 'data-import/execution_plan'
+ require 'data-import/dsl'
+ require 'data-import/database'
+ require 'data-import/definition'
+ require 'data-import/importer'
+
+ # Monkeypatch for active support (see https://github.com/rails/rails/pull/2801)
+ class Time
+   class << self
+     def ===(other)
+       super || (self == Time && other.is_a?(ActiveSupport::TimeWithZone))
+     end
+   end
+ end
+
+ module DataImport
+
+   def self.run_config!(config_path, options = {})
+     plan = DataImport::Dsl.evaluate_import_config(config_path)
+     run_plan!(plan, options)
+   end
+
+   def self.run_plan!(plan, options = {})
+     runner = Runner.new(plan)
+     runner.run(options)
+   end
+
+ end
@@ -0,0 +1,96 @@
+ require 'sequel'
+ require 'iconv'
+
+ module DataImport
+   module Adapters
+     class Sequel
+
+       attr_reader :db
+
+       def self.connect(options = {})
+         ::Sequel.identifier_output_method = :to_s
+         self.new ::Sequel.connect(options)
+       end
+
+       def initialize(db)
+         @db = db
+       end
+
+       def truncate(table)
+         @db.from(table).delete
+       end
+
+       def transaction(&block)
+         @db.transaction do
+           yield block
+         end
+       end
+
+       def each_row(table, options = {}, &block)
+         if options[:primary_key].nil? || !numeric_column?(table, options[:primary_key])
+           each_row_without_batches table, options, &block
+         else
+           each_row_in_batches table, options, &block
+         end
+       end
+
+       def each_row_without_batches(table, options = {}, &block)
+         sql = @db.from(table)
+         sql = sql.select(*options[:columns]) unless options[:columns].nil?
+         sql = sql.distinct if options[:distinct]
+         sql = sql.order(*options[:order]) unless options[:order].nil?
+         sql.each do |row|
+           yield row if block_given?
+         end
+       end
+
+       def each_row_in_batches(table, options = {}, &block)
+         personen = @db.from(table)
+         max = maximum_value(table, options[:primary_key]) || 0
+         lower_bound = 0
+         batch_size = 1000
+         while (lower_bound <= max) do
+           upper_bound = lower_bound + batch_size - 1
+           sql = personen.filter(options[:primary_key] => lower_bound..upper_bound)
+           sql = sql.select(*options[:columns]) unless options[:columns].nil?
+           sql = sql.distinct if options[:distinct]
+           sql = sql.order(*options[:order]) unless options[:order].nil?
+           sql.each do |result|
+             yield result if block_given?
+           end unless sql.nil?
+           lower_bound += batch_size
+         end
+       end
+
+       def maximum_value(table, column)
+         @db.from(table).max(column)
+       end
+
+       def count(table, options = {})
+         sql = @db.from(table)
+         sql = sql.select(*options[:columns]) unless options[:columns].nil?
+         sql = sql.distinct if options[:distinct]
+         sql.count
+       end
+
+       def insert_row(table, row)
+         @db.from(table).insert(row)
+       end
+
+       def update_row(table, row)
+         id = row.delete(:id) || row.delete('id')
+         @db.from(table).filter(:id => id).update(row)
+       end
+
+       def numeric_column?(table, column)
+         column_definition = @db.schema(table).select{|c| c.first == column}.first
+         column_definition[1][:type] == :integer unless column_definition.nil?
+       end
+
+       def unique_row(table, key)
+         @db.from(table)[:id => key]
+       end
+
+     end
+   end
+ end
@@ -0,0 +1,28 @@
+ module DataImport
+   class Database
+
+     def self.connect(name, options = {})
+       adapter = find_adapter(name)
+       unless adapter.nil?
+         adapter.connect options
+       end
+     end
+
+     private
+
+     SUPPORTED_ADAPTERS = [:sequel]
+
+     def self.find_adapter(name)
+       @@loaded_adapters ||= {}
+       if SUPPORTED_ADAPTERS.include? name.to_sym
+         if @@loaded_adapters[name.to_sym].nil?
+           require "data-import/adapters/#{name.to_s}"
+           class_name = name.to_s.gsub(/\/(.?)/) { "::#{$1.upcase}" }.gsub(/(?:^|_)(.)/) { $1.upcase }
+           @@loaded_adapters[name.to_sym] = DataImport::Adapters.const_get(class_name)
+         end
+         @@loaded_adapters[name.to_sym]
+       end
+     end
+
+   end
+ end
@@ -0,0 +1,20 @@
+ require 'data-import/definition/simple'
+
+ module DataImport
+   class Definition
+     attr_reader :name
+     attr_reader :source_database, :target_database
+     attr_reader :dependencies
+
+     def initialize(name, source_database, target_database)
+       @name = name
+       @source_database = source_database
+       @target_database = target_database
+       @dependencies = []
+     end
+
+     def add_dependency(dependency)
+       @dependencies << dependency
+     end
+   end
+ end
@@ -0,0 +1,59 @@
+ module DataImport
+   class Definition
+     class Simple < Definition
+       attr_reader :id_mappings
+       attr_reader :source_primary_key
+       attr_accessor :source_table_name, :source_columns, :source_distinct_columns, :source_order_columns
+       attr_accessor :target_table_name
+       attr_accessor :after_blocks, :after_row_blocks
+       attr_reader :mode
+
+       def initialize(name, source_database, target_database)
+         super
+         @mode = :insert
+         @id_mappings = {}
+         @after_blocks = []
+         @after_row_blocks = []
+         @source_columns = []
+         @source_order_columns = []
+       end
+
+       def mappings
+         @mappings ||= {}
+       end
+
+       def source_primary_key=(value)
+         @source_primary_key = value.to_sym unless value.nil?
+       end
+
+       def add_id_mapping(mapping)
+         @id_mappings.merge! mapping
+       end
+
+       def new_id_of(value)
+         @id_mappings[value]
+       end
+
+       def definition(name = nil)
+         if name.nil?
+           self
+         else
+           DataImport.definitions[name] or raise ArgumentError
+         end
+       end
+
+       def use_mode(mode)
+         @mode = mode
+       end
+
+       def run(context)
+         options = {:columns => source_columns, :distinct => source_distinct_columns}
+         Progress.start("Importing #{name}", source_database.count(source_table_name, options)) do
+           Importer.new(context, self).run do
+             Progress.step
+           end
+         end
+       end
+     end
+   end
+ end