tapsoob 0.1.10
- checksums.yaml +7 -0
- data/.gitignore +4 -0
- data/Gemfile +4 -0
- data/Gemfile.lock +24 -0
- data/README.md +66 -0
- data/Rakefile +0 -0
- data/bin/schema +54 -0
- data/bin/tapsoob +6 -0
- data/lib/tapsoob.rb +9 -0
- data/lib/tapsoob/chunksize.rb +53 -0
- data/lib/tapsoob/cli.rb +145 -0
- data/lib/tapsoob/config.rb +33 -0
- data/lib/tapsoob/data_stream.rb +350 -0
- data/lib/tapsoob/errors.rb +16 -0
- data/lib/tapsoob/log.rb +16 -0
- data/lib/tapsoob/operation.rb +468 -0
- data/lib/tapsoob/progress_bar.rb +236 -0
- data/lib/tapsoob/railtie.rb +11 -0
- data/lib/tapsoob/schema.rb +83 -0
- data/lib/tapsoob/utils.rb +179 -0
- data/lib/tapsoob/version.rb +4 -0
- data/lib/tasks/tapsoob.rake +59 -0
- data/tapsoob.gemspec +30 -0
- metadata +138 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 1140973a2ff5d3c3acb175c2c58b21ac93ac1425
+  data.tar.gz: 1e0a3b8443be5d4fcbf567d8635a6a754d3e7a29
+SHA512:
+  metadata.gz: 61d6f73a2f3ad4d4173e9cad5090af8e0ca94def18b925adb9716565d458f0addfe8030bc67fd2d7b16caf7706ac2e9b3653c35912010ec4fde60ac102c7eda9
+  data.tar.gz: 443ae1a295828478dd89ba039c6f842e973eaa1dd491465865c25c0165732c8f288b7325161a4c31ea74102be9066cbf8a12119ef749c035c0de0d54a2280075
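The checksums above pair each packaged file with a SHA1 and a SHA512 digest. A minimal sketch of computing the same pair of digests with Ruby's standard `Digest` library (over a stand-in string rather than a real `metadata.gz`):

```ruby
require 'digest'

# Compute the two digests RubyGems records in checksums.yaml
# for a blob of package data.
def gem_checksums(data)
  {
    "SHA1"   => Digest::SHA1.hexdigest(data),
    "SHA512" => Digest::SHA512.hexdigest(data)
  }
end

sums = gem_checksums("example package contents")
puts sums["SHA1"]    # 40 hex chars
puts sums["SHA512"]  # 128 hex chars
```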
data/.gitignore
ADDED
data/Gemfile
ADDED
data/Gemfile.lock
ADDED
@@ -0,0 +1,24 @@
+PATH
+  remote: .
+  specs:
+    tapsoob (0.1.0)
+      sequel (~> 3.45.0)
+
+GEM
+  remote: http://rubygems.org/
+  specs:
+    mysql (2.9.1)
+    mysql2 (0.3.11)
+    pg (0.14.1)
+    sequel (3.45.0)
+    sqlite3 (1.3.7)
+
+PLATFORMS
+  ruby
+
+DEPENDENCIES
+  mysql (~> 2.9.1)
+  mysql2 (~> 0.3.11)
+  pg (~> 0.14.1)
+  sqlite3 (~> 1.3.7)
+  tapsoob!
data/README.md
ADDED
@@ -0,0 +1,66 @@
+# Tapsoob
+
+Tapsoob is a simple tool to export and import databases. It is inspired by <https://github.com/ricardochimal/taps>, but instead of relying on a server and a client, databases are exported to the filesystem or imported from a previous export, hence the OOB (out-of-band). This avoids exposing a whole database over HTTP, for security reasons.
+
+Although the code is not quite perfect yet and some improvements will probably be made in the future, it is usable. If you notice an issue, don't hesitate to report it.
+
+
+## Database support
+
+Tapsoob currently relies on the Sequel ORM (<http://sequel.rubyforge.org/>), so every database that Sequel supports is supported. If additional support is required in the future (e.g. NoSQL databases), I'll do my best to add it to my to-do list.
+
+### Oracle
+
+If you're using either Oracle or Oracle XE you will need some extra requirements. On Ruby you'll need your ORACLE_HOME environment variable set properly and the `ruby-oci8` gem installed. On JRuby you'll need the official Oracle JDBC driver (see <http://www.oracle.com/technetwork/articles/dsl/jruby-oracle11g-330825.html> for more information), and it must be loaded before using Tapsoob, otherwise you won't be able to connect to the database.
+
+
+## Exporting your data
+
+    tapsoob pull [OPTIONS] <dump_path> <database_url>
+
+The `dump_path` is the location on the filesystem where you want your database exported, and the `database_url` is a URL such as `postgres://user:password@localhost/blog`. You can find out more about connecting to your RDBMS on this page of the Sequel documentation: <http://sequel.rubyforge.org/rdoc/files/doc/opening_databases_rdoc.html>.
+
+A complete list of options can be displayed with:
+
+    tapsoob pull -h
+
+
+## Importing your data
+
+    tapsoob push [OPTIONS] <dump_path> <database_url>
+
+As with exporting, the `dump_path` is the path where you previously exported the database you now wish to import, and the `database_url` should conform to the same format described above.
+
+You can list all available options with:
+
+    tapsoob push -h
+
+
+## Integration with Rails
+
+If you're using Rails, two Rake tasks are also provided:
+
+* `tapsoob:pull`, which dumps the database into a new folder under the `db` folder
+* `tapsoob:push`, which reads the last dump made by `tapsoob:pull` from the `db` folder
+
+
+## Notes
+
+Your exports can be moved from one machine to another for backups or replication; you can also use Tapsoob to switch from one supported RDBMS to another.
+
+
+## Feature(s) to come
+
+* Tests
+
+
+## License
+
+The MIT License (MIT)
+Copyright © 2013 Félix Bellanger <felix.bellanger@gmail.com>
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
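The README refers to a `dump_path` that a pull writes into; the CLI creates three subfolders (`schemas`, `data`, `indexes`) under it before running an operation. A minimal standalone sketch of that layout:

```ruby
require 'fileutils'
require 'tmpdir'

# Sketch of the directory layout a `tapsoob pull` produces under
# <dump_path>: the same three subfolders the CLI's clientxfer creates.
def prepare_dump_path(dump_path)
  %w[schemas data indexes].each do |dir|
    FileUtils.mkpath(File.join(dump_path, dir))
  end
  dump_path
end

Dir.mktmpdir do |tmp|
  prepare_dump_path(tmp)
  puts Dir.children(tmp).sort.inspect  # ["data", "indexes", "schemas"]
end
```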
data/Rakefile
ADDED
File without changes
data/bin/schema
ADDED
@@ -0,0 +1,54 @@
+#!/usr/bin/env ruby
+
+require 'rubygems'
+require 'sequel'
+
+$:.unshift File.dirname(__FILE__) + '/../lib'
+
+require 'tapsoob/schema'
+
+cmd = ARGV.shift.strip rescue ''
+database_url = ARGV.shift.strip rescue ''
+
+def show_usage_and_exit
+  puts <<EOTXT
+schema console <database_url>
+schema dump <database_url>
+schema dump_table <database_url> <table>
+schema indexes <database_url>
+schema indexes_individual <database_url>
+schema reset_db_sequences <database_url>
+schema load <database_url> <schema_file>
+schema load_indexes <database_url> <indexes_file>
+EOTXT
+  exit(1)
+end
+
+case cmd
+when 'dump'
+  puts Tapsoob::Schema.dump(database_url)
+when 'dump_table'
+  table = ARGV.shift.strip
+  puts Tapsoob::Schema.dump_table(database_url, table)
+when 'indexes'
+  puts Tapsoob::Schema.indexes(database_url)
+when 'indexes_individual'
+  puts Tapsoob::Schema.indexes_individual(database_url)
+when 'load_indexes'
+  filename = ARGV.shift.strip rescue ''
+  indexes = File.read(filename) rescue show_usage_and_exit
+  Tapsoob::Schema.load_indexes(database_url, indexes)
+when 'load'
+  filename = ARGV.shift.strip rescue ''
+  schema = File.read(filename) rescue show_usage_and_exit
+  Tapsoob::Schema.load(database_url, schema)
+when 'reset_db_sequences'
+  Tapsoob::Schema.reset_db_sequences(database_url)
+when 'console'
+  $db = Sequel.connect(database_url)
+  require 'irb'
+  require 'irb/completion'
+  IRB.start
+else
+  show_usage_and_exit
+end
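The `bin/schema` script above dispatches on the first argument and falls back to a usage message when the command is missing or unknown. A standalone sketch of that dispatch pattern, returning strings instead of calling `Tapsoob::Schema`:

```ruby
# Minimal sketch of bin/schema's dispatch: take the first argument as a
# subcommand, fall back to usage text on anything else. The `rescue ''`
# mirrors the script's handling of an empty ARGV (nil.strip raises).
USAGE = <<EOTXT
schema dump <database_url>
schema load <database_url> <schema_file>
EOTXT

def dispatch(argv)
  cmd = argv.shift.strip rescue ''
  case cmd
  when 'dump' then "dumping #{argv.shift}"
  when 'load' then "loading #{argv.shift}"
  else USAGE
  end
end

puts dispatch(['dump', 'sqlite://test.db'])  # dumping sqlite://test.db
puts dispatch([])                            # prints the usage text
```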
data/bin/tapsoob
ADDED
data/lib/tapsoob.rb
ADDED
data/lib/tapsoob/chunksize.rb
ADDED
@@ -0,0 +1,53 @@
+# -*- encoding : utf-8 -*-
+require 'tapsoob/errors'
+
+class Tapsoob::Chunksize
+  attr_accessor :idle_secs, :time_in_db, :start_time, :end_time, :retries
+  attr_reader :chunksize
+
+  def initialize(chunksize)
+    @chunksize = chunksize
+    @idle_secs = 0.0
+    @retries = 0
+  end
+
+  def to_i
+    chunksize
+  end
+
+  def reset_chunksize
+    @chunksize = (retries <= 1) ? 10 : 1
+  end
+
+  def diff
+    end_time - start_time - time_in_db - idle_secs
+  end
+
+  def time_in_db=(t)
+    @time_in_db = t
+    @time_in_db = @time_in_db.to_f rescue 0.0
+  end
+
+  def time_delta
+    t1 = Time.now
+    yield if block_given?
+    t2 = Time.now
+    t2 - t1
+  end
+
+  def calc_new_chunksize
+    new_chunksize = if retries > 0
+      chunksize
+    elsif diff > 3.0
+      (chunksize / 3).ceil
+    elsif diff > 1.1
+      chunksize - 100
+    elsif diff < 0.8
+      chunksize * 2
+    else
+      chunksize + 100
+    end
+    new_chunksize = 1 if new_chunksize < 1
+    new_chunksize
+  end
+end
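The `calc_new_chunksize` rule above adapts the chunk size to round-trip time: after a retry it holds steady, when the measured diff is slow it shrinks the chunk, and when it is fast it grows it, never dropping below 1. A standalone restatement of the same thresholds, with the timing diff passed in directly instead of measured:

```ruby
# Standalone restatement of Chunksize#calc_new_chunksize's thresholds.
def next_chunksize(chunksize, diff, retries: 0)
  size = if retries > 0   then chunksize         # hold steady after a retry
         elsif diff > 3.0 then (chunksize / 3).ceil  # far too slow: cut to a third
         elsif diff > 1.1 then chunksize - 100       # a bit slow: back off
         elsif diff < 0.8 then chunksize * 2         # fast: double
         else                  chunksize + 100       # sweet spot: nudge up
         end
  size < 1 ? 1 : size  # clamp to the minimum chunk size
end

puts next_chunksize(1000, 5.0)  # 333
puts next_chunksize(1000, 0.5)  # 2000
puts next_chunksize(1000, 1.0)  # 1100
puts next_chunksize(50, 2.0)    # 1 (50 - 100 clamps to the minimum)
```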
data/lib/tapsoob/cli.rb
ADDED
@@ -0,0 +1,145 @@
+# -*- encoding : utf-8 -*-
+require 'fileutils'
+require 'optparse'
+require 'tempfile'
+require 'tapsoob/config'
+require 'tapsoob/log'
+
+Tapsoob::Config.tapsoob_database_url = ENV['TAPSOOB_DATABASE_URL'] || begin
+  # this is dirty but it solves a weird problem where the tempfile disappears mid-process
+  require 'sqlite3'
+  $__taps_database = Tempfile.new('tapsoob.db')
+  $__taps_database.open()
+  "sqlite://#{$__taps_database.path}"
+end
+
+module Tapsoob
+  class Cli
+    attr_accessor :argv
+
+    def initialize(argv)
+      @argv = argv
+    end
+
+    def run
+      method = (argv.shift || 'help').to_sym
+      if [:pull, :push, :version].include? method
+        send(method)
+      else
+        help
+      end
+    end
+
+    def pull
+      opts = clientoptparse(:pull)
+      Tapsoob.log.level = Logger::DEBUG if opts[:debug]
+      if opts[:resume_filename]
+        clientresumexfer(:pull, opts)
+      else
+        clientxfer(:pull, opts)
+      end
+    end
+
+    def push
+      opts = clientoptparse(:push)
+      Tapsoob.log.level = Logger::DEBUG if opts[:debug]
+      if opts[:resume_filename]
+        clientresumexfer(:push, opts)
+      else
+        clientxfer(:push, opts)
+      end
+    end
+
+    def version
+      puts Tapsoob.version
+    end
+
+    def help
+      puts <<EOHELP
+Options
+=======
+pull      Pull a database and export it into a directory
+push      Push a database from a directory
+version   Tapsoob version
+
+Add '-h' to any command to see their usage
+EOHELP
+    end
+
+    def clientoptparse(cmd)
+      opts = { :default_chunksize => 1000, :database_url => nil, :dump_path => nil, :debug => false, :resume_filename => nil, :disable_compression => false, :indexes_first => false }
+      parser = OptionParser.new do |o|
+        o.banner = "Usage: #{File.basename($0)} #{cmd} [OPTIONS] <dump_path> <database_url>"
+
+        case cmd
+        when :pull
+          o.define_head "Pull a database and export it into a directory"
+        when :push
+          o.define_head "Push a database from a directory"
+        end
+
+        o.on("-s", "--skip-schema", "Don't transfer the schema, just the data") { |v| opts[:skip_schema] = true }
+        o.on("-i", "--indexes-first", "Transfer indexes first, before data") { |v| opts[:indexes_first] = true }
+        o.on("-r", "--resume=file", "Resume a Tapsoob Session from a stored file") { |v| opts[:resume_filename] = v }
+        o.on("-c", "--chunksize=N", "Initial Chunksize") { |v| opts[:default_chunksize] = (v.to_i < 10 ? 10 : v.to_i) }
+        o.on("-g", "--disable-compression", "Disable Compression") { |v| opts[:disable_compression] = true }
+        o.on("-f", "--filter=regex", "Regex Filter for tables") { |v| opts[:table_filter] = v }
+        o.on("-t", "--tables=A,B,C", Array, "Shortcut to filter on a list of tables") do |v|
+          r_tables = v.collect { |t| "^#{t}" }.join("|")
+          opts[:table_filter] = "#{r_tables}"
+        end
+        o.on("-e", "--exclude-tables=A,B,C", Array, "Shortcut to exclude a list of tables") { |v| opts[:exclude_tables] = v }
+        o.on("-d", "--debug", "Enable Debug Messages") { |v| opts[:debug] = true }
+      end
+      parser.parse!(argv)
+
+      opts[:dump_path] = argv.shift
+      opts[:database_url] = argv.shift
+
+      if opts[:database_url].nil?
+        $stderr.puts "Missing Database URL"
+        puts parser
+        exit 1
+      end
+      if opts[:dump_path].nil?
+        $stderr.puts "Missing Tapsoob Dump Path"
+        puts parser
+        exit 1
+      end
+
+      opts
+    end
+
+    def clientxfer(method, opts)
+      database_url = opts.delete(:database_url)
+      dump_path = opts.delete(:dump_path)
+
+      Tapsoob::Config.verify_database_url(database_url)
+
+      FileUtils.mkpath "#{dump_path}/schemas"
+      FileUtils.mkpath "#{dump_path}/data"
+      FileUtils.mkpath "#{dump_path}/indexes"
+
+      require 'tapsoob/operation'
+
+      Tapsoob::Operation.factory(method, database_url, dump_path, opts).run
+    end
+
+    def clientresumexfer(method, opts)
+      session = JSON.parse(File.read(opts.delete(:resume_filename)))
+      session.symbolize_recursively!
+
+      database_url = opts.delete(:database_url)
+      dump_path = opts.delete(:dump_path) || session.delete(:dump_path)
+
+      require 'tapsoob/operation'
+
+      newsession = session.merge({
+        :default_chunksize => opts[:default_chunksize],
+        :disable_compression => opts[:disable_compression],
+        :resume => true
+      })
+
+      Tapsoob::Operation.factory(method, database_url, dump_path, newsession).run
+    end
+  end
+end
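`clientoptparse` follows a common OptionParser flow: parse the flags out of `argv`, then treat whatever remains as the positional `<dump_path> <database_url>` arguments. A self-contained sketch of that flow with just two of the flags (the chunk-size floor of 10 mirrors the CLI's `-c` handling):

```ruby
require 'optparse'

# Minimal sketch of the clientoptparse flow: flags first, then
# positional arguments from what parse! leaves behind in argv.
def parse_cli(argv)
  opts = { :default_chunksize => 1000, :debug => false }
  OptionParser.new do |o|
    o.on("-c", "--chunksize=N") { |v| opts[:default_chunksize] = (v.to_i < 10 ? 10 : v.to_i) }
    o.on("-d", "--debug") { opts[:debug] = true }
  end.parse!(argv)
  opts[:dump_path], opts[:database_url] = argv.shift(2)
  opts
end

opts = parse_cli(["-c", "500", "dump/", "sqlite://blog.db"])
puts opts[:default_chunksize]  # 500
puts opts[:dump_path]          # dump/
puts opts[:database_url]       # sqlite://blog.db
```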
data/lib/tapsoob/config.rb
ADDED
@@ -0,0 +1,33 @@
+# -*- encoding : utf-8 -*-
+require 'sequel'
+require 'tapsoob/version'
+
+Sequel.datetime_class = DateTime
+
+module Tapsoob
+  def self.exiting=(val)
+    @@exiting = val
+  end
+
+  def exiting?
+    (@@exiting ||= false) == true
+  end
+
+  class Config
+    class << self
+      attr_accessor :tapsoob_database_url
+      attr_accessor :login, :password, :database_url, :dump_path
+      attr_accessor :chunksize
+
+      def verify_database_url(db_url=nil)
+        db_url ||= self.database_url
+        db = Sequel.connect(db_url)
+        db.tables
+        db.disconnect
+      rescue Object => e
+        puts "Failed to connect to database:\n  #{e.class} -> #{e}"
+        exit 1
+      end
+    end
+  end
+end
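The `Config` class above uses the `class << self` idiom: accessors are defined on the singleton class, so settings live on the class itself and no instance is ever created. A minimal standalone sketch of the same pattern (class and values here are illustrative, not part of the gem):

```ruby
# Process-wide settings object with no instances: accessors are
# defined on the class's singleton class, as in Tapsoob::Config.
class AppConfig
  class << self
    attr_accessor :database_url, :chunksize
  end
end

AppConfig.database_url = "sqlite://blog.db"
AppConfig.chunksize = 1000
puts AppConfig.database_url  # sqlite://blog.db
puts AppConfig.chunksize     # 1000
```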
data/lib/tapsoob/data_stream.rb
ADDED
@@ -0,0 +1,350 @@
+# -*- encoding : utf-8 -*-
+require 'tapsoob/log'
+require 'tapsoob/utils'
+
+module Tapsoob
+  class DataStream
+    DEFAULT_CHUNKSIZE = 1000
+
+    attr_reader :db, :state
+
+    def initialize(db, state)
+      @db = db
+      @state = {
+        :offset => 0,
+        :avg_chunksize => 0,
+        :num_chunksize => 0,
+        :total_chunksize => 0
+      }.merge(state)
+      @state[:chunksize] ||= DEFAULT_CHUNKSIZE
+      @complete = false
+    end
+
+    def log
+      Tapsoob.log
+    end
+
+    def error=(val)
+      state[:error] = val
+    end
+
+    def error
+      state[:error] || false
+    end
+
+    def table_name
+      state[:table_name].to_sym
+    end
+
+    def table_name_sql
+      table_name.identifier
+    end
+
+    def to_hash
+      state.merge(:klass => self.class.to_s)
+    end
+
+    def to_json
+      JSON.generate(to_hash)
+    end
+
+    def string_columns
+      @string_columns ||= Tapsoob::Utils.incorrect_blobs(db, table_name)
+    end
+
+    def table
+      @table ||= db[table_name_sql]
+    end
+
+    def order_by(name=nil)
+      @order_by ||= begin
+        name ||= table_name
+        Tapsoob::Utils.order_by(db, name)
+      end
+    end
+
+    def increment(row_count)
+      state[:offset] += row_count
+    end
+
+    # keep a record of the average chunksize within the first few hundred thousand records, after chunksize
+    # goes below 100 or maybe if offset is > 1000
+    def fetch_rows
+      state[:chunksize] = fetch_chunksize
+      ds = table.order(*order_by).limit(state[:chunksize], state[:offset])
+      log.debug "DataStream#fetch_rows SQL -> #{ds.sql}"
+      rows = Tapsoob::Utils.format_data(ds.all,
+        :string_columns => string_columns,
+        :schema => db.schema(table_name),
+        :table => table_name
+      )
+      update_chunksize_stats
+      rows
+    end
+
+    def fetch_file(dump_path)
+      state[:chunksize] = fetch_chunksize
+      ds = JSON.parse(File.read(File.join(dump_path, "data", "#{table_name}.json")))
+      log.debug "DataStream#fetch_file"
+      rows = {
+        :header => ds["header"],
+        :data => ds["data"][state[:offset], state[:chunksize]] || [ ]
+      }
+      update_chunksize_stats
+      rows
+    end
+
+    def max_chunksize_training
+      20
+    end
+
+    def fetch_chunksize
+      chunksize = state[:chunksize]
+      return chunksize if state[:num_chunksize] < max_chunksize_training
+      return chunksize if state[:avg_chunksize] == 0
+      return chunksize if state[:error]
+      state[:avg_chunksize] > chunksize ? state[:avg_chunksize] : chunksize
+    end
+
+    def update_chunksize_stats
+      return if state[:num_chunksize] >= max_chunksize_training
+      state[:total_chunksize] += state[:chunksize]
+      state[:num_chunksize] += 1
+      state[:avg_chunksize] = state[:total_chunksize] / state[:num_chunksize] rescue state[:chunksize]
+    end
+
+    def encode_rows(rows)
+      Tapsoob::Utils.base64encode(Marshal.dump(rows))
+    end
+
+    def fetch(opts = {})
+      opts = (opts.empty? ? { :type => "database", :source => db.uri } : opts)
+
+      log.debug "DataStream#fetch state -> #{state.inspect}"
+
+      t1 = Time.now
+      rows = (opts[:type] == "file" ? fetch_file(opts[:source]) : fetch_rows)
+      encoded_data = encode_rows(rows)
+      t2 = Time.now
+      elapsed_time = t2 - t1
+
+      if opts[:type] == "file"
+        @complete = rows[:data] == [ ]
+      else
+        @complete = rows == { }
+      end
+
+      [encoded_data, (@complete ? 0 : rows[:data].size), elapsed_time]
+    end
+
+    def complete?
+      @complete
+    end
+
+    def fetch_database(dump_path)
+      params = fetch_from_database
+      encoded_data = params[:encoded_data]
+      json = params[:json]
+
+      rows = parse_encoded_data(encoded_data, json[:checksum])
+
+      @complete = rows == { }
+
+      # update local state
+      state.merge!(json[:state].merge(:chunksize => state[:chunksize]))
+
+      unless @complete
+        Tapsoob::Utils.export_rows(dump_path, table_name, rows)
+        state[:offset] += rows[:data].size
+        rows[:data].size
+      else
+        0
+      end
+    end
+
+    def fetch_from_database
+      res = nil
+      log.debug "DataStream#fetch_from_database state -> #{state.inspect}"
+      state[:chunksize] = Tapsoob::Utils.calculate_chunksize(state[:chunksize]) do |c|
+        state[:chunksize] = c.to_i
+        encoded_data = fetch.first
+
+        checksum = Tapsoob::Utils.checksum(encoded_data).to_s
+
+        res = {
+          :json => { :checksum => checksum, :state => to_hash },
+          :encoded_data => encoded_data
+        }
+      end
+
+      res
+    end
+
+    def fetch_data_in_database(params)
+      encoded_data = params[:encoded_data]
+
+      rows = parse_encoded_data(encoded_data, params[:checksum])
+
+      @complete = rows[:data] == [ ]
+
+      unless @complete
+        import_rows(rows)
+        rows[:data].size
+      else
+        0
+      end
+    end
+
+    def self.parse_json(json)
+      hash = JSON.parse(json).symbolize_keys
+      hash[:state].symbolize_keys! if hash.has_key?(:state)
+      hash
+    end
+
+    def parse_encoded_data(encoded_data, checksum)
+      raise Tapsoob::CorruptedData.new("Checksum Failed") unless Tapsoob::Utils.valid_data?(encoded_data, checksum)
+
+      begin
+        return Marshal.load(Tapsoob::Utils.base64decode(encoded_data))
+      rescue Object => e
+        unless ENV['NO_DUMP_MARSHAL_ERRORS']
+          puts "Error encountered loading data, wrote the data chunk to dump.#{Process.pid}.dat"
+          File.open("dump.#{Process.pid}.dat", "w") { |f| f.write(encoded_data) }
+        end
+        raise e
+      end
+    end
+
+    def import_rows(rows)
+      table.import(rows[:header], rows[:data], :commit_every => 100)
+      state[:offset] += rows[:data].size
+    rescue Exception => ex
+      case ex.message
+      when /integer out of range/ then
+        raise Tapsoob::InvalidData, <<-ERROR, []
+\nDetected integer data that exceeds the maximum allowable size for an integer type.
+This generally occurs when importing from SQLite due to the fact that SQLite does
+not enforce maximum values on integer types.
+        ERROR
+      else raise ex
+      end
+    end
+
+    def verify_stream
+      state[:offset] = table.count
+    end
+
+    def self.factory(db, state)
+      if defined?(Sequel::MySQL) && Sequel::MySQL.respond_to?(:convert_invalid_date_time=)
+        Sequel::MySQL.convert_invalid_date_time = :nil
+      end
+
+      if state.has_key?(:klass)
+        return eval(state[:klass]).new(db, state)
+      end
+
+      if Tapsoob::Utils.single_integer_primary_key(db, state[:table_name].to_sym)
+        DataStreamKeyed.new(db, state)
+      else
+        DataStream.new(db, state)
+      end
+    end
+  end
+
+  class DataStreamKeyed < DataStream
+    attr_accessor :buffer
+
+    def initialize(db, state)
+      super(db, state)
+      @state = { :primary_key => order_by(state[:table_name]).first, :filter => 0 }.merge(@state)
+      @state[:chunksize] ||= DEFAULT_CHUNKSIZE
+      @buffer = []
+    end
+
+    def primary_key
+      state[:primary_key].to_sym
+    end
+
+    def buffer_limit
+      if state[:last_fetched] and state[:last_fetched] < state[:filter] and self.buffer.size == 0
+        state[:last_fetched]
+      else
+        state[:filter]
+      end
+    end
+
+    def calc_limit(chunksize)
+      # we want to not fetch more than is needed while we're
+      # inside sinatra but locally we can select more than
+      # is strictly needed
+      if defined?(Sinatra)
+        (chunksize * 1.1).ceil
+      else
+        (chunksize * 3).ceil
+      end
+    end
+
+    def load_buffer(chunksize)
+      # make sure BasicObject is not polluted by subsequent requires
+      Sequel::BasicObject.remove_methods!
+
+      num = 0
+      loop do
+        limit = calc_limit(chunksize)
+        # we have to use local variables in order for the virtual row filter to work correctly
+        key = primary_key
+        buf_limit = buffer_limit
+        ds = table.order(*order_by).filter { key.sql_number > buf_limit }.limit(limit)
+        log.debug "DataStreamKeyed#load_buffer SQL -> #{ds.sql}"
+        data = ds.all
+        self.buffer += data
+        num += data.size
+        if data.size > 0
+          # keep a record of the last primary key value in the buffer
+          state[:filter] = self.buffer.last[ primary_key ]
+        end
+
+        break if num >= chunksize or data.size == 0
+      end
+    end
+
+    def fetch_buffered(chunksize)
+      load_buffer(chunksize) if self.buffer.size < chunksize
+      rows = buffer.slice(0, chunksize)
+      state[:last_fetched] = if rows.size > 0
+        rows.last[ primary_key ]
+      else
+        nil
+      end
+      rows
+    end
+
+    #def import_rows(rows)
+    #  table.import(rows[:header], rows[:data])
+    #end
+
+    #def fetch_rows
+    #  chunksize = state[:chunksize]
+    #  Tapsoob::Utils.format_data(fetch_buffered(chunksize) || [],
+    #    :string_columns => string_columns)
+    #end
+
+    def increment(row_count)
+      # pop the rows we just successfully sent off the buffer
+      @buffer.slice!(0, row_count)
+    end
+
+    def verify_stream
+      key = primary_key
+      ds = table.order(*order_by)
+      current_filter = ds.max(key.sql_number)
+
+      # set the current filter to the max of the primary key
+      state[:filter] = current_filter
+      # clear out the last_fetched value so it can restart from scratch
+      state[:last_fetched] = nil
+
+      log.debug "DataStreamKeyed#verify_stream -> state: #{state.inspect}"
+    end
+  end
+end
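`DataStream` pages through a table (or a dump file) by keeping an `:offset` in its state, slicing one chunk at a time, advancing the offset by the rows returned, and treating an empty chunk as completion. A standalone sketch of that pagination loop over a plain array:

```ruby
# Standalone sketch of the DataStream pagination loop: slice a dataset
# by (offset, chunksize), advance the offset by the chunk actually
# returned, and stop on an empty chunk -- the same termination test
# fetch/fetch_file use to set @complete.
def each_chunk(data, chunksize)
  offset = 0
  loop do
    chunk = data[offset, chunksize] || []
    break if chunk.empty?
    yield chunk
    offset += chunk.size
  end
end

sizes = []
each_chunk((1..10).to_a, 4) { |c| sizes << c.size }
puts sizes.inspect  # [4, 4, 2]
```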