tapsoob 0.1.10

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+   metadata.gz: 1140973a2ff5d3c3acb175c2c58b21ac93ac1425
+   data.tar.gz: 1e0a3b8443be5d4fcbf567d8635a6a754d3e7a29
+ SHA512:
+   metadata.gz: 61d6f73a2f3ad4d4173e9cad5090af8e0ca94def18b925adb9716565d458f0addfe8030bc67fd2d7b16caf7706ac2e9b3653c35912010ec4fde60ac102c7eda9
+   data.tar.gz: 443ae1a295828478dd89ba039c6f842e973eaa1dd491465865c25c0165732c8f288b7325161a4c31ea74102be9066cbf8a12119ef749c035c0de0d54a2280075
data/.gitignore ADDED
@@ -0,0 +1,4 @@
+ # Ignore dev stuff
+ db/
+ *.dat
+ *.gem
data/Gemfile ADDED
@@ -0,0 +1,4 @@
+ source "http://rubygems.org"
+
+ # load the gem's dependencies
+ gemspec
data/Gemfile.lock ADDED
@@ -0,0 +1,24 @@
+ PATH
+   remote: .
+   specs:
+     tapsoob (0.1.0)
+       sequel (~> 3.45.0)
+
+ GEM
+   remote: http://rubygems.org/
+   specs:
+     mysql (2.9.1)
+     mysql2 (0.3.11)
+     pg (0.14.1)
+     sequel (3.45.0)
+     sqlite3 (1.3.7)
+
+ PLATFORMS
+   ruby
+
+ DEPENDENCIES
+   mysql (~> 2.9.1)
+   mysql2 (~> 0.3.11)
+   pg (~> 0.14.1)
+   sqlite3 (~> 1.3.7)
+   tapsoob!
data/README.md ADDED
@@ -0,0 +1,66 @@
+ # Tapsoob
+
+ Tapsoob is a simple tool to export and import databases. It is inspired by <https://github.com/ricardochimal/taps>, but instead of relying on a server and a client, databases are exported to the filesystem or imported from a previous export, hence the OOB (out-of-band). This was done to avoid exposing a whole database over HTTP (the HyperText Transfer Protocol), for security reasons.
+
+ Although the code is not quite perfect yet, and some improvements will probably be made in the future, it is usable. If you notice an issue, don't hesitate to report it.
+
+
+ ## Database support
+
+ Tapsoob currently relies on the Sequel ORM (<http://sequel.rubyforge.org/>), so it supports every database that Sequel supports. If additional support (e.g. for NoSQL databases) is required in the future, I'll do my best to add it to my to-do list.
+
+ ### Oracle
+
+ If you're using either Oracle or Oracle XE there are some extra requirements. On Ruby, you'll need your ORACLE_HOME environment variable set properly and the `ruby-oci8` gem installed. On JRuby, you'll need the official Oracle JDBC driver (see here for more information: <http://www.oracle.com/technetwork/articles/dsl/jruby-oracle11g-330825.html>), and it should be loaded prior to using Tapsoob, otherwise you won't be able to connect to the database.
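As an illustration, a Sequel-style Oracle database URL (hypothetical credentials and SID) would look something like this:

    oracle://scott:tiger@localhost:1521/XE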
+
+
+ ## Exporting your data
+
+     tapsoob pull [OPTIONS] <dump_path> <database_url>
+
+ The `dump_path` is the location on the filesystem where you want your database exported, and the `database_url` is a URL of the form `postgres://user:password@localhost/blog`. You can find out more about how to connect to your RDBMS on this page of the Sequel documentation: <http://sequel.rubyforge.org/rdoc/files/doc/opening_databases_rdoc.html>.
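For example, exporting a local PostgreSQL database (hypothetical credentials) into a `blog_dump` directory:

    tapsoob pull blog_dump postgres://user:password@localhost/blog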
+
+ A complete list of options can be displayed using the following command:
+
+     tapsoob pull -h
+
+
+ ## Importing your data
+
+     tapsoob push [OPTIONS] <dump_path> <database_url>
+
+ As with exporting, the `dump_path` is the path where you previously exported the database you now wish to import, and the `database_url` should conform to the same format described above.
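For example, importing the dump above into a local MySQL database (again with hypothetical credentials), which also switches the RDBMS along the way:

    tapsoob push blog_dump mysql2://user:password@localhost/blog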
+
+ You can list all available options using the command:
+
+     tapsoob push -h
+
+
+ ## Integration with Rails
+
+ If you're using Rails, two Rake tasks are also provided (example below):
+
+ * `tapsoob:pull` which dumps the database into a new folder under the `db` folder
+ * `tapsoob:push` which reads the last dump you made with `tapsoob:pull` from the `db` folder
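For example, from the root of the Rails application:

    bundle exec rake tapsoob:pull
    bundle exec rake tapsoob:push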
+
+
+ ## Notes
+
+ Your exports can be moved from one machine to another for backups or replication. You can also use Tapsoob to switch your RDBMS from one of the supported systems to another.
+
+
+ ## Feature(s) to come
+
+ * Tests
+
+
+ ## License
+
+ The MIT License (MIT)
+ Copyright © 2013 Félix Bellanger <felix.bellanger@gmail.com>
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/Rakefile ADDED
File without changes
data/bin/schema ADDED
@@ -0,0 +1,54 @@
+ #!/usr/bin/env ruby
+
+ require 'rubygems'
+ require 'sequel'
+
+ $:.unshift File.dirname(__FILE__) + '/../lib'
+
+ require 'tapsoob/schema'
+
+ cmd = ARGV.shift.strip rescue ''
+ database_url = ARGV.shift.strip rescue ''
+
+ def show_usage_and_exit
+   puts <<EOTXT
+   schema console <database_url>
+   schema dump <database_url>
+   schema dump_table <database_url> <table>
+   schema indexes <database_url>
+   schema indexes_individual <database_url>
+   schema reset_db_sequences <database_url>
+   schema load <database_url> <schema_file>
+   schema load_indexes <database_url> <indexes_file>
+ EOTXT
+   exit(1)
+ end
+
+ case cmd
+ when 'dump'
+   puts Tapsoob::Schema.dump(database_url)
+ when 'dump_table'
+   table = ARGV.shift.strip
+   puts Tapsoob::Schema.dump_table(database_url, table)
+ when 'indexes'
+   puts Tapsoob::Schema.indexes(database_url)
+ when 'indexes_individual'
+   puts Tapsoob::Schema.indexes_individual(database_url)
+ when 'load_indexes'
+   filename = ARGV.shift.strip rescue ''
+   indexes = File.read(filename) rescue show_usage_and_exit
+   Tapsoob::Schema.load_indexes(database_url, indexes)
+ when 'load'
+   filename = ARGV.shift.strip rescue ''
+   schema = File.read(filename) rescue show_usage_and_exit
+   Tapsoob::Schema.load(database_url, schema)
+ when 'reset_db_sequences'
+   Tapsoob::Schema.reset_db_sequences(database_url)
+ when 'console'
+   $db = Sequel.connect(database_url)
+   require 'irb'
+   require 'irb/completion'
+   IRB.start
+ else
+   show_usage_and_exit
+ end
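A quick sketch of how this helper script can be driven (hypothetical database URLs; `schema dump` writes the generated schema to stdout, so it can be redirected to a file and loaded back with `schema load`):

    bin/schema dump postgres://user:password@localhost/blog > blog_schema.rb
    bin/schema load postgres://user:password@localhost/blog_copy blog_schema.rb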
data/bin/tapsoob ADDED
@@ -0,0 +1,6 @@
+ #!/usr/bin/env ruby
+
+ $:.unshift File.dirname(__FILE__) + '/../lib'
+ require 'tapsoob/cli'
+
+ Tapsoob::Cli.new(ARGV.dup).run
data/lib/tapsoob.rb ADDED
@@ -0,0 +1,9 @@
+ # -*- encoding : utf-8 -*-
+ $:.unshift File.dirname(__FILE__)
+
+ # internal requires
+ require 'tapsoob/operation'
+
+ module Tapsoob
+   require 'tapsoob/railtie' if defined?(Rails)
+ end
data/lib/tapsoob/chunksize.rb ADDED
@@ -0,0 +1,53 @@
+ # -*- encoding : utf-8 -*-
+ require 'tapsoob/errors'
+
+ class Tapsoob::Chunksize
+   attr_accessor :idle_secs, :time_in_db, :start_time, :end_time, :retries
+   attr_reader :chunksize
+
+   def initialize(chunksize)
+     @chunksize = chunksize
+     @idle_secs = 0.0
+     @retries = 0
+   end
+
+   def to_i
+     chunksize
+   end
+
+   def reset_chunksize
+     @chunksize = (retries <= 1) ? 10 : 1
+   end
+
+   def diff
+     end_time - start_time - time_in_db - idle_secs
+   end
+
+   def time_in_db=(t)
+     @time_in_db = t
+     @time_in_db = @time_in_db.to_f rescue 0.0
+   end
+
+   def time_delta
+     t1 = Time.now
+     yield if block_given?
+     t2 = Time.now
+     t2 - t1
+   end
+
+   def calc_new_chunksize
+     new_chunksize = if retries > 0
+       chunksize
+     elsif diff > 3.0
+       (chunksize / 3).ceil
+     elsif diff > 1.1
+       chunksize - 100
+     elsif diff < 0.8
+       chunksize * 2
+     else
+       chunksize + 100
+     end
+     new_chunksize = 1 if new_chunksize < 1
+     new_chunksize
+   end
+ end
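A minimal sketch of how the adaptive chunk sizing above behaves (the values are illustrative):

    require 'tapsoob/chunksize'

    c = Tapsoob::Chunksize.new(1000)
    c.start_time = Time.now
    c.end_time   = c.start_time + 1.0  # total wall-clock time for the chunk
    c.time_in_db = 0.5                 # time actually spent in the database
    # diff = 1.0 - 0.5 - 0.0 (idle) = 0.5, which is below the 0.8 threshold,
    # so the next chunk doubles in size
    c.calc_new_chunksize               # => 2000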
data/lib/tapsoob/cli.rb ADDED
@@ -0,0 +1,145 @@
+ # -*- encoding : utf-8 -*-
+ require 'fileutils'
+ require 'optparse'
+ require 'tempfile'
+ require 'tapsoob/config'
+ require 'tapsoob/log'
+
+ Tapsoob::Config.tapsoob_database_url = ENV['TAPSOOB_DATABASE_URL'] || begin
+   # this is dirty but it solves a weird problem where the tempfile disappears mid-process
+   require 'sqlite3'
+   $__taps_database = Tempfile.new('tapsoob.db')
+   $__taps_database.open()
+   "sqlite://#{$__taps_database.path}"
+ end
+
+ module Tapsoob
+   class Cli
+     attr_accessor :argv
+
+     def initialize(argv)
+       @argv = argv
+     end
+
+     def run
+       method = (argv.shift || 'help').to_sym
+       if [:pull, :push, :version].include? method
+         send(method)
+       else
+         help
+       end
+     end
+
+     def pull
+       opts = clientoptparse(:pull)
+       Tapsoob.log.level = Logger::DEBUG if opts[:debug]
+       if opts[:resume_filename]
+         clientresumexfer(:pull, opts)
+       else
+         clientxfer(:pull, opts)
+       end
+     end
+
+     def push
+       opts = clientoptparse(:push)
+       Tapsoob.log.level = Logger::DEBUG if opts[:debug]
+       if opts[:resume_filename]
+         clientresumexfer(:push, opts)
+       else
+         clientxfer(:push, opts)
+       end
+     end
+
+     def version
+       puts Tapsoob.version
+     end
+
+     def help
+       puts <<EOHELP
+ Options
+ =======
+ pull    Pull a database and export it into a directory
+ push    Push a database from a directory
+ version Tapsoob version
+
+ Add '-h' to any command to see its usage
+ EOHELP
+     end
+
+     def clientoptparse(cmd)
+       opts = { :default_chunksize => 1000, :database_url => nil, :dump_path => nil, :debug => false, :resume_filename => nil, :disable_compression => false, :indexes_first => false }
+       parser = OptionParser.new do |o|
+         o.banner = "Usage: #{File.basename($0)} #{cmd} [OPTIONS] <dump_path> <database_url>"
+
+         case cmd
+         when :pull
+           o.define_head "Pull a database and export it into a directory"
+         when :push
+           o.define_head "Push a database from a directory"
+         end
+
+         o.on("-s", "--skip-schema", "Don't transfer the schema, just the data") { |v| opts[:skip_schema] = true }
+         o.on("-i", "--indexes-first", "Transfer indexes first, before data") { |v| opts[:indexes_first] = true }
+         o.on("-r", "--resume=file", "Resume a Tapsoob session from a stored file") { |v| opts[:resume_filename] = v }
+         o.on("-c", "--chunksize=N", "Initial chunksize") { |v| opts[:default_chunksize] = (v.to_i < 10 ? 10 : v.to_i) }
+         o.on("-g", "--disable-compression", "Disable compression") { |v| opts[:disable_compression] = true }
+         o.on("-f", "--filter=regex", "Regex filter for tables") { |v| opts[:table_filter] = v }
+         o.on("-t", "--tables=A,B,C", Array, "Shortcut to filter on a list of tables") do |v|
+           r_tables = v.collect { |t| "^#{t}" }.join("|")
+           opts[:table_filter] = "#{r_tables}"
+         end
+         o.on("-e", "--exclude-tables=A,B,C", Array, "Shortcut to exclude a list of tables") { |v| opts[:exclude_tables] = v }
+         o.on("-d", "--debug", "Enable debug messages") { |v| opts[:debug] = true }
+       end
+       parser.parse!(argv)
+
+       opts[:dump_path] = argv.shift
+       opts[:database_url] = argv.shift
+
+       if opts[:database_url].nil?
+         $stderr.puts "Missing Database URL"
+         puts parser
+         exit 1
+       end
+       if opts[:dump_path].nil?
+         $stderr.puts "Missing Tapsoob Dump Path"
+         puts parser
+         exit 1
+       end
+
+       opts
+     end
+
+     def clientxfer(method, opts)
+       database_url = opts.delete(:database_url)
+       dump_path = opts.delete(:dump_path)
+
+       Tapsoob::Config.verify_database_url(database_url)
+
+       FileUtils.mkpath "#{dump_path}/schemas"
+       FileUtils.mkpath "#{dump_path}/data"
+       FileUtils.mkpath "#{dump_path}/indexes"
+
+       require 'tapsoob/operation'
+
+       Tapsoob::Operation.factory(method, database_url, dump_path, opts).run
+     end
+
+     def clientresumexfer(method, opts)
+       session = JSON.parse(File.read(opts.delete(:resume_filename)))
+       session.symbolize_recursively!
+
+       database_url = opts.delete(:database_url)
+       dump_path = opts.delete(:dump_path) || session.delete(:dump_path)
+
+       require 'tapsoob/operation'
+
+       newsession = session.merge({
+         :default_chunksize => opts[:default_chunksize],
+         :disable_compression => opts[:disable_compression],
+         :resume => true
+       })
+
+       Tapsoob::Operation.factory(method, database_url, dump_path, newsession).run
+     end
+   end
+ end
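To illustrate the option parsing above: the `-t` shortcut expands a table list into the same anchored regex the `-f` flag accepts, so the two invocations below (hypothetical paths and credentials) are equivalent:

    tapsoob pull -t users,posts blog_dump postgres://user:password@localhost/blog
    tapsoob pull -f "^users|^posts" blog_dump postgres://user:password@localhost/blog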
data/lib/tapsoob/config.rb ADDED
@@ -0,0 +1,33 @@
+ # -*- encoding : utf-8 -*-
+ require 'sequel'
+ require 'tapsoob/version'
+
+ Sequel.datetime_class = DateTime
+
+ module Tapsoob
+   def self.exiting=(val)
+     @@exiting = val
+   end
+
+   def self.exiting?
+     (@@exiting ||= false) == true
+   end
+
+   class Config
+     class << self
+       attr_accessor :tapsoob_database_url
+       attr_accessor :login, :password, :database_url, :dump_path
+       attr_accessor :chunksize
+
+       def verify_database_url(db_url=nil)
+         db_url ||= self.database_url
+         db = Sequel.connect(db_url)
+         db.tables
+         db.disconnect
+       rescue Object => e
+         puts "Failed to connect to database:\n  #{e.class} -> #{e}"
+         exit 1
+       end
+     end
+   end
+ end
data/lib/tapsoob/data_stream.rb ADDED
@@ -0,0 +1,350 @@
+ # -*- encoding : utf-8 -*-
+ require 'tapsoob/log'
+ require 'tapsoob/utils'
+
+ module Tapsoob
+   class DataStream
+     DEFAULT_CHUNKSIZE = 1000
+
+     attr_reader :db, :state
+
+     def initialize(db, state)
+       @db = db
+       @state = {
+         :offset => 0,
+         :avg_chunksize => 0,
+         :num_chunksize => 0,
+         :total_chunksize => 0
+       }.merge(state)
+       @state[:chunksize] ||= DEFAULT_CHUNKSIZE
+       @complete = false
+     end
+
+     def log
+       Tapsoob.log
+     end
+
+     def error=(val)
+       state[:error] = val
+     end
+
+     def error
+       state[:error] || false
+     end
+
+     def table_name
+       state[:table_name].to_sym
+     end
+
+     def table_name_sql
+       table_name.identifier
+     end
+
+     def to_hash
+       state.merge(:klass => self.class.to_s)
+     end
+
+     def to_json
+       JSON.generate(to_hash)
+     end
+
+     def string_columns
+       @string_columns ||= Tapsoob::Utils.incorrect_blobs(db, table_name)
+     end
+
+     def table
+       @table ||= db[table_name_sql]
+     end
+
+     def order_by(name=nil)
+       @order_by ||= begin
+         name ||= table_name
+         Tapsoob::Utils.order_by(db, name)
+       end
+     end
+
+     def increment(row_count)
+       state[:offset] += row_count
+     end
+
+     # keep a record of the average chunksize within the first few hundred thousand records, after chunksize
+     # goes below 100 or maybe if offset is > 1000
+     def fetch_rows
+       state[:chunksize] = fetch_chunksize
+       ds = table.order(*order_by).limit(state[:chunksize], state[:offset])
+       log.debug "DataStream#fetch_rows SQL -> #{ds.sql}"
+       rows = Tapsoob::Utils.format_data(ds.all,
+         :string_columns => string_columns,
+         :schema => db.schema(table_name),
+         :table => table_name
+       )
+       update_chunksize_stats
+       rows
+     end
+
+     def fetch_file(dump_path)
+       state[:chunksize] = fetch_chunksize
+       ds = JSON.parse(File.read(File.join(dump_path, "data", "#{table_name}.json")))
+       log.debug "DataStream#fetch_file"
+       rows = {
+         :header => ds["header"],
+         :data => ds["data"][state[:offset], state[:chunksize]] || [ ]
+       }
+       update_chunksize_stats
+       rows
+     end
+
+     def max_chunksize_training
+       20
+     end
+
+     def fetch_chunksize
+       chunksize = state[:chunksize]
+       return chunksize if state[:num_chunksize] < max_chunksize_training
+       return chunksize if state[:avg_chunksize] == 0
+       return chunksize if state[:error]
+       state[:avg_chunksize] > chunksize ? state[:avg_chunksize] : chunksize
+     end
+
+     def update_chunksize_stats
+       return if state[:num_chunksize] >= max_chunksize_training
+       state[:total_chunksize] += state[:chunksize]
+       state[:num_chunksize] += 1
+       state[:avg_chunksize] = state[:total_chunksize] / state[:num_chunksize] rescue state[:chunksize]
+     end
+
+     def encode_rows(rows)
+       Tapsoob::Utils.base64encode(Marshal.dump(rows))
+     end
+
+     def fetch(opts = {})
+       opts = (opts.empty? ? { :type => "database", :source => db.uri } : opts)
+
+       log.debug "DataStream#fetch state -> #{state.inspect}"
+
+       t1 = Time.now
+       rows = (opts[:type] == "file" ? fetch_file(opts[:source]) : fetch_rows)
+       encoded_data = encode_rows(rows)
+       t2 = Time.now
+       elapsed_time = t2 - t1
+
+       if opts[:type] == "file"
+         @complete = rows[:data] == [ ]
+       else
+         @complete = rows == { }
+       end
+
+       [encoded_data, (@complete ? 0 : rows[:data].size), elapsed_time]
+     end
+
+     def complete?
+       @complete
+     end
+
+     def fetch_database(dump_path)
+       params = fetch_from_database
+       encoded_data = params[:encoded_data]
+       json = params[:json]
+
+       rows = parse_encoded_data(encoded_data, json[:checksum])
+
+       @complete = rows == { }
+
+       # update local state
+       state.merge!(json[:state].merge(:chunksize => state[:chunksize]))
+
+       unless @complete
+         Tapsoob::Utils.export_rows(dump_path, table_name, rows)
+         state[:offset] += rows[:data].size
+         rows[:data].size
+       else
+         0
+       end
+     end
+
+     def fetch_from_database
+       res = nil
+       log.debug "DataStream#fetch_from_database state -> #{state.inspect}"
+       state[:chunksize] = Tapsoob::Utils.calculate_chunksize(state[:chunksize]) do |c|
+         state[:chunksize] = c.to_i
+         encoded_data = fetch.first
+
+         checksum = Tapsoob::Utils.checksum(encoded_data).to_s
+
+         res = {
+           :json => { :checksum => checksum, :state => to_hash },
+           :encoded_data => encoded_data
+         }
+       end
+
+       res
+     end
+
+     def fetch_data_in_database(params)
+       encoded_data = params[:encoded_data]
+
+       rows = parse_encoded_data(encoded_data, params[:checksum])
+
+       @complete = rows[:data] == [ ]
+
+       unless @complete
+         import_rows(rows)
+         rows[:data].size
+       else
+         0
+       end
+     end
+
+     def self.parse_json(json)
+       hash = JSON.parse(json).symbolize_keys
+       hash[:state].symbolize_keys! if hash.has_key?(:state)
+       hash
+     end
+
+     def parse_encoded_data(encoded_data, checksum)
+       raise Tapsoob::CorruptedData.new("Checksum Failed") unless Tapsoob::Utils.valid_data?(encoded_data, checksum)
+
+       begin
+         return Marshal.load(Tapsoob::Utils.base64decode(encoded_data))
+       rescue Object => e
+         unless ENV['NO_DUMP_MARSHAL_ERRORS']
+           puts "Error encountered loading data, wrote the data chunk to dump.#{Process.pid}.dat"
+           File.open("dump.#{Process.pid}.dat", "w") { |f| f.write(encoded_data) }
+         end
+         raise e
+       end
+     end
+
+     def import_rows(rows)
+       table.import(rows[:header], rows[:data], :commit_every => 100)
+       state[:offset] += rows[:data].size
+     rescue Exception => ex
+       case ex.message
+       when /integer out of range/ then
+         raise Tapsoob::InvalidData, <<-ERROR, []
+ \nDetected integer data that exceeds the maximum allowable size for an integer type.
+ This generally occurs when importing from SQLite due to the fact that SQLite does
+ not enforce maximum values on integer types.
+         ERROR
+       else raise ex
+       end
+     end
+
+     def verify_stream
+       state[:offset] = table.count
+     end
+
+     def self.factory(db, state)
+       if defined?(Sequel::MySQL) && Sequel::MySQL.respond_to?(:convert_invalid_date_time=)
+         Sequel::MySQL.convert_invalid_date_time = :nil
+       end
+
+       if state.has_key?(:klass)
+         return eval(state[:klass]).new(db, state)
+       end
+
+       if Tapsoob::Utils.single_integer_primary_key(db, state[:table_name].to_sym)
+         DataStreamKeyed.new(db, state)
+       else
+         DataStream.new(db, state)
+       end
+     end
+   end
+
+   class DataStreamKeyed < DataStream
+     attr_accessor :buffer
+
+     def initialize(db, state)
+       super(db, state)
+       @state = { :primary_key => order_by(state[:table_name]).first, :filter => 0 }.merge(@state)
+       @state[:chunksize] ||= DEFAULT_CHUNKSIZE
+       @buffer = []
+     end
+
+     def primary_key
+       state[:primary_key].to_sym
+     end
+
+     def buffer_limit
+       if state[:last_fetched] and state[:last_fetched] < state[:filter] and self.buffer.size == 0
+         state[:last_fetched]
+       else
+         state[:filter]
+       end
+     end
+
+     def calc_limit(chunksize)
+       # we want to not fetch more than is needed while we're
+       # inside sinatra but locally we can select more than
+       # is strictly needed
+       if defined?(Sinatra)
+         (chunksize * 1.1).ceil
+       else
+         (chunksize * 3).ceil
+       end
+     end
+
+     def load_buffer(chunksize)
+       # make sure BasicObject is not polluted by subsequent requires
+       Sequel::BasicObject.remove_methods!
+
+       num = 0
+       loop do
+         limit = calc_limit(chunksize)
+         # we have to use local variables in order for the virtual row filter to work correctly
+         key = primary_key
+         buf_limit = buffer_limit
+         ds = table.order(*order_by).filter { key.sql_number > buf_limit }.limit(limit)
+         log.debug "DataStreamKeyed#load_buffer SQL -> #{ds.sql}"
+         data = ds.all
+         self.buffer += data
+         num += data.size
+         if data.size > 0
+           # keep a record of the last primary key value in the buffer
+           state[:filter] = self.buffer.last[ primary_key ]
+         end
+
+         break if num >= chunksize or data.size == 0
+       end
+     end
+
+     def fetch_buffered(chunksize)
+       load_buffer(chunksize) if self.buffer.size < chunksize
+       rows = buffer.slice(0, chunksize)
+       state[:last_fetched] = if rows.size > 0
+         rows.last[ primary_key ]
+       else
+         nil
+       end
+       rows
+     end
+
+     #def import_rows(rows)
+     #  table.import(rows[:header], rows[:data])
+     #end
+
+     #def fetch_rows
+     #  chunksize = state[:chunksize]
+     #  Tapsoob::Utils.format_data(fetch_buffered(chunksize) || [],
+     #    :string_columns => string_columns)
+     #end
+
+     def increment(row_count)
+       # pop the rows we just successfully sent off the buffer
+       @buffer.slice!(0, row_count)
+     end
+
+     def verify_stream
+       key = primary_key
+       ds = table.order(*order_by)
+       current_filter = ds.max(key.sql_number)
+
+       # set the current filter to the max of the primary key
+       state[:filter] = current_filter
+       # clear out the last_fetched value so it can restart from scratch
+       state[:last_fetched] = nil
+
+       log.debug "DataStreamKeyed#verify_stream -> state: #{state.inspect}"
+     end
+   end
+ end
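A rough sketch of how these streams are driven during a pull (hypothetical table and URL; in practice the orchestration, including creating the dump directories, is handled by `Tapsoob::Operation`):

    require 'sequel'
    require 'tapsoob/data_stream'

    db = Sequel.connect('postgres://user:password@localhost/blog')
    stream = Tapsoob::DataStream.factory(db, { :table_name => 'posts' })

    # each call exports one encoded chunk of rows under <dump_path>/data
    stream.fetch_database('blog_dump') until stream.complete?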