taps2 0.5.5 → 0.6.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/{README.rdoc → README.md} +29 -22
- data/bin/{schema → schema2} +17 -8
- data/bin/{schema.cmd → schema2.cmd} +0 -0
- data/bin/{taps → taps2} +1 -1
- data/lib/taps/chunksize.rb +16 -12
- data/lib/taps/cli.rb +33 -38
- data/lib/taps/config.rb +2 -2
- data/lib/taps/data_stream.rb +256 -262
- data/lib/taps/errors.rb +1 -1
- data/lib/taps/log.rb +1 -1
- data/lib/taps/monkey.rb +10 -9
- data/lib/taps/multipart.rb +1 -2
- data/lib/taps/operation.rb +81 -86
- data/lib/taps/progress_bar.rb +42 -45
- data/lib/taps/schema.rb +2 -2
- data/lib/taps/server.rb +37 -38
- data/lib/taps/utils.rb +31 -34
- data/lib/taps/version.rb +13 -13
- data/lib/vendor/okjson.rb +115 -161
- data/spec/base.rb +2 -2
- data/spec/chunksize_spec.rb +6 -6
- data/spec/cli_spec.rb +6 -6
- data/spec/data_stream_spec.rb +4 -4
- data/spec/operation_spec.rb +17 -18
- data/spec/server_spec.rb +7 -7
- data/spec/utils_spec.rb +20 -20
- metadata +38 -39
- data/VERSION.yml +0 -5
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
---
SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: bd7e00f8e9f2311fc7b0d8e990cb6d225978e39b
+  data.tar.gz: ab2788206704d5cb6ec72811192e2584439eca14
SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e1fe890ae878b3f52e33d9dbf88caaa32eb51b7672a8967da9cc2a5c408ef6e877f8e0c8f5698ad9a095ece0cdaa09252bc566021a5b1293633045faa595e467
+  data.tar.gz: 10c5c43cfe50ac13c0428d62c9c2e007d2f2481240040827c47a301ba3af14e008a3dd972e52bf4256c99149e179a3c77f162db466bfb1016a72c27f013e5b16
data/{README.rdoc → README.md}
RENAMED
@@ -1,25 +1,25 @@
+# Taps (2) -- simple database import/export app

A simple database agnostic import/export app to transfer data to/from a remote database.

-*Forked and updated* with fixes and improvements. Integrates fixes and updates from
+*Forked and updated* with fixes and improvements. Integrates fixes and updates from [taps-taps](https://github.com/wijet/taps) and [tapsicle](https://github.com/jiffyondemand/tapsicle) forks.

+## Installation

Renamed gem

+    $ gem install taps2

By default, Taps will attempt to create a SQLite3 database for sessions. Unless you specify a different database type, you'll need to install SQLite 3. (See _Environment Variables_ for alternative session databases.)

+    $ gem install sqlite3

Install the gems to support databases you want to work with, such as MySQL or PostgreSQL.

+    $ gem install mysql2
+    $ gem install pg

+## Configuration: Environment Variables

_All environment variables are optional._

@@ -37,40 +37,42 @@ The `NO_DUMP_MARSHAL_ERRORS` variable allows you to disable dumping of marshalle

The `NO_DEFLATE` variable allows you to disable gzip compression (`Rack::Deflater`) on the server.

+## Usage: Server

Here's how you start a taps server

+    $ taps2 server postgres://localdbuser:localdbpass@localhost/dbname httpuser httppassword

You can also specify an encoding in the database url

+    $ taps2 server mysql://localdbuser:localdbpass@localhost/dbname?encoding=latin1 httpuser httppassword

+## Usage: Client

When you want to pull down a database from a taps server

+    $ taps2 pull postgres://dbuser:dbpassword@localhost/dbname http://httpuser:httppassword@example.com:5000

or when you want to push a local database to a taps server

+    $ taps2 push postgres://dbuser:dbpassword@localhost/dbname http://httpuser:httppassword@example.com:5000

or when you want to transfer a list of tables

+    $ taps2 push postgres://dbuser:dbpassword@localhost/dbname http://httpuser:httppassword@example.com:5000 --tables logs,tags

or when you want to transfer tables that start with a word

+    $ taps2 push postgres://dbuser:dbpassword@localhost/dbname http://httpuser:httppassword@example.com:5000 --filter '^log_'

+## Troubleshooting

* "Error: invalid byte sequence for encoding" can be resolved by adding `encoding` to database URI (https://github.com/ricardochimal/taps/issues/110)
-  * *Example:* `
+  * *Example:* `taps2 server mysql://root@localhost/example_database?encoding=UTF8 httpuser httppassword`
+* SQLite3 database URI may require three slashes (e.g. `sqlite3:///path/to/file.db`)
+  * Make sure to use an absolute/full path to the file on the server

+## Known Issues

* Foreign key constraints get lost in the schema transfer
* Indexes may drop the "order" (https://github.com/ricardochimal/taps/issues/111)

@@ -78,14 +80,19 @@ or when you want to transfer tables that start with a word

* Tables without primary keys will be incredibly slow to transfer. This is due to it being inefficient having large offset values in queries.
* Multiple schemas are currently not supported (https://github.com/ricardochimal/taps/issues/97)
* Taps does not drop tables when overwriting database (https://github.com/ricardochimal/taps/issues/94)
+* Oracle database classes not fully supported (https://github.com/ricardochimal/taps/issues/89)
+* Some blank default values may be converted to NULL in MySQL table schemas (https://github.com/ricardochimal/taps/issues/88)
+* Conversion of column data types can cause side effects when going from one database type to another
+  * MySQL `bigint` converts to PostgreSQL `string` (https://github.com/ricardochimal/taps/issues/77)
+* Passwords in database URI can cause issues with special characters (https://github.com/ricardochimal/taps/issues/74)

+## Feature Requests

* Allow a single Taps server to serve data from different databases (https://github.com/ricardochimal/taps/issues/103)

+## Meta

-Maintained by
+Maintained by [Joel Van Horn](http://github.com/joelvh)

Written by Ricardo Chimal, Jr. (ricardo at heroku dot com) and Adam Wiggins (adam at heroku dot com)
data/bin/{schema → schema2}
RENAMED
@@ -1,14 +1,15 @@
#!/usr/bin/env ruby

require 'rubygems'
-gem 'sequel', '~> 3.20.0'

+gem 'sequel', '~> 4.0'
+
+$LOAD_PATH.unshift File.dirname(__FILE__) + '/../lib'

require 'taps/schema'

-cmd = ARGV.shift.strip
-database_url = ARGV.shift.strip
+cmd = ARGV.shift.to_s.strip
+database_url = ARGV.shift.to_s.strip

def show_usage_and_exit
  puts <<EOTXT

@@ -35,12 +36,20 @@ when 'indexes'

when 'indexes_individual'
  puts Taps::Schema.indexes_individual(database_url)
when 'load_indexes'
-  filename = ARGV.shift.strip
-  indexes =
+  filename = ARGV.shift.to_s.strip
+  indexes = begin
+    File.read(filename)
+  rescue StandardError
+    show_usage_and_exit
+  end
  Taps::Schema.load_indexes(database_url, indexes)
when 'load'
-  filename = ARGV.shift.strip
-  schema =
+  filename = ARGV.shift.to_s.strip
+  schema = begin
+    File.read(filename)
+  rescue StandardError
+    show_usage_and_exit
+  end
  Taps::Schema.load(database_url, schema)
when 'reset_db_sequences'
  Taps::Schema.reset_db_sequences(database_url)
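The schema2 script now reads its input files through an inline begin/rescue assignment, so a missing or unreadable file drops back to the usage message instead of raising. A minimal standalone sketch of that idiom follows; the `filename` handling and the fallback value here are illustrative only, not taken from the gem:

    # Sketch of the assignment-from-begin/rescue idiom used above.
    # Falls back to an empty string instead of calling show_usage_and_exit.
    filename = ARGV.shift.to_s.strip
    contents = begin
      File.read(filename)
    rescue StandardError
      warn "could not read #{filename.inspect}, using empty input"
      ''
    end
    puts contents.bytesize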
data/bin/{schema.cmd → schema2.cmd}
RENAMED
File without changes
data/bin/{taps → taps2}
RENAMED
data/lib/taps/chunksize.rb
CHANGED
@@ -15,7 +15,7 @@ class Taps::Chunksize
  end

  def reset_chunksize
-    @chunksize =
+    @chunksize = retries <= 1 ? 10 : 1
  end

  def diff

@@ -24,7 +24,11 @@ class Taps::Chunksize

  def time_in_db=(t)
    @time_in_db = t
-    @time_in_db =
+    @time_in_db = begin
+      @time_in_db.to_f
+    rescue
+      0.0
+    end
  end

  def time_delta

@@ -36,16 +40,16 @@ class Taps::Chunksize

  def calc_new_chunksize
    new_chunksize = if retries > 0
+                      chunksize
+                    elsif diff > 3.0
+                      (chunksize / 3).ceil
+                    elsif diff > 1.1
+                      chunksize - 100
+                    elsif diff < 0.8
+                      chunksize * 2
+                    else
+                      chunksize + 100
+                    end
    new_chunksize = 1 if new_chunksize < 1
    new_chunksize
  end
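For reference, the adaptive step in `calc_new_chunksize` can be exercised on its own. The sketch below re-implements just the branch table from the diff as a free-standing method; `chunksize`, `diff` and `retries` are supplied as plain arguments here rather than read from a `Taps::Chunksize` instance, and the comments describe only the branch structure, not the gem's internals:

    # Free-standing re-implementation of the branch logic added above,
    # useful for seeing how the chunk size adapts to the diff ratio.
    def next_chunksize(chunksize, diff, retries)
      size = if retries > 0
               chunksize            # a retry happened: keep the current size
             elsif diff > 3.0
               (chunksize / 3).ceil # ratio far above 1: shrink hard
             elsif diff > 1.1
               chunksize - 100      # ratio somewhat above 1: shrink gently
             elsif diff < 0.8
               chunksize * 2        # ratio well below 1: grow aggressively
             else
               chunksize + 100      # roughly balanced: grow gently
             end
      size < 1 ? 1 : size
    end

    p next_chunksize(1000, 4.2, 0)  # => 333
    p next_chunksize(1000, 0.5, 0)  # => 2000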
data/lib/taps/cli.rb
CHANGED
@@ -9,7 +9,7 @@ Taps::Config.taps_database_url = ENV['TAPS_DATABASE_URL'] || ENV['DATABASE_URL']
  # this is dirty but it solves a weird problem where the tempfile disappears mid-process
  require 'sqlite3'
  $__taps_database = Tempfile.new('taps.db')
-  $__taps_database.open
+  $__taps_database.open
  "sqlite://#{$__taps_database.path}"
end

@@ -23,7 +23,7 @@ module Taps

    def run
      method = (argv.shift || 'help').to_sym
-      if [
+      if %i[pull push server version].include? method
        send(method)
      else
        help

@@ -59,12 +59,10 @@ module Taps

      Taps::Config.verify_database_url
      require 'taps/server'
-      Taps::Server.run!(
-        :dump_errors => true,
-      })
+      Taps::Server.run!(port: opts[:port],
+                        environment: :production,
+                        logging: true,
+                        dump_errors: true)
    end

    def version

@@ -85,13 +83,13 @@ EOHELP

    end

    def serveroptparse
-      opts={:
+      opts = { port: 5000, database_url: nil, login: nil, password: nil, debug: false }
      OptionParser.new do |o|
-        o.banner = "Usage: #{File.basename($
-        o.define_head
+        o.banner = "Usage: #{File.basename($PROGRAM_NAME)} server [OPTIONS] <local_database_url> <login> <password>"
+        o.define_head 'Start a taps database import/export server'

-        o.on(
-        o.on(
+        o.on('-p', '--port=N', 'Server Port') { |v| opts[:port] = v.to_i if v.to_i > 0 }
+        o.on('-d', '--debug', 'Enable Debug Messages') { |_v| opts[:debug] = true }
        o.parse!(argv)

        opts[:database_url] = argv.shift

@@ -99,17 +97,17 @@ EOHELP

        opts[:password] = argv.shift || ENV['TAPS_PASSWORD']

        if opts[:database_url].nil?
+          warn 'Missing Database URL'
          puts o
          exit 1
        end
        if opts[:login].nil?
+          warn 'Missing Login'
          puts o
          exit 1
        end
        if opts[:password].nil?
+          warn 'Missing Password'
          puts o
          exit 1
        end

@@ -118,41 +116,41 @@ EOHELP

    end

    def clientoptparse(cmd)
-      opts={:
+      opts = { default_chunksize: 1000, database_url: nil, remote_url: nil, debug: false, resume_filename: nil, disable_compresion: false, indexes_first: false }
      OptionParser.new do |o|
-        o.banner = "Usage: #{File.basename($
+        o.banner = "Usage: #{File.basename($PROGRAM_NAME)} #{cmd} [OPTIONS] <local_database_url> <remote_url>"

        case cmd
        when :pull
-          o.define_head
+          o.define_head 'Pull a database from a taps server'
        when :push
-          o.define_head
+          o.define_head 'Push a database to a taps server'
        end

-        o.on(
-        o.on(
-        o.on(
-        o.on(
-        o.on(
-        o.on(
-        o.on(
-        r_tables = v.collect { |t| "^#{t}$" }.join(
+        o.on('-s', '--skip-schema', "Don't transfer the schema, just data") { |_v| opts[:skip_schema] = true }
+        o.on('-i', '--indexes-first', 'Transfer indexes first before data') { |_v| opts[:indexes_first] = true }
+        o.on('-r', '--resume=file', 'Resume a Taps Session from a stored file') { |v| opts[:resume_filename] = v }
+        o.on('-c', '--chunksize=N', 'Initial Chunksize') { |v| opts[:default_chunksize] = (v.to_i < 10 ? 10 : v.to_i) }
+        o.on('-g', '--disable-compression', 'Disable Compression') { |_v| opts[:disable_compression] = true }
+        o.on('-f', '--filter=regex', 'Regex Filter for tables') { |v| opts[:table_filter] = v }
+        o.on('-t', '--tables=A,B,C', Array, 'Shortcut to filter on a list of tables') do |v|
+          r_tables = v.collect { |t| "^#{t}$" }.join('|')
          opts[:table_filter] = "(#{r_tables})"
        end
-        o.on(
-        o.on(
+        o.on('-e', '--exclude_tables=A,B,C', Array, 'Shortcut to exclude a list of tables') { |v| opts[:exclude_tables] = v }
+        o.on('-d', '--debug', 'Enable Debug Messages') { |_v| opts[:debug] = true }
        o.parse!(argv)

        opts[:database_url] = argv.shift
        opts[:remote_url] = argv.shift

        if opts[:database_url].nil?
+          warn 'Missing Database URL'
          puts o
          exit 1
        end
        if opts[:remote_url].nil?
+          warn 'Missing Remote Taps URL'
          puts o
          exit 1
        end

@@ -183,14 +181,11 @@ EOHELP

      require 'taps/operation'

-      newsession = session.merge(
-        :resume => true,
-      })
+      newsession = session.merge(default_chunksize: opts[:default_chunksize],
+                                 disable_compression: opts[:disable_compression],
+                                 resume: true)

      Taps::Operation.factory(method, database_url, remote_url, newsession).run
    end
  end
end
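The `run` method now whitelists commands with a `%i[...]` symbol-array literal instead of the old array syntax. A self-contained illustration of that dispatch pattern follows; the command list is copied from the diff, while `MiniCli` and its puts bodies are stand-ins, not the gem's implementations:

    # Illustration of the %i[] whitelist dispatch used in Taps::Cli#run.
    class MiniCli
      def initialize(argv)
        @argv = argv
      end

      def run
        method = (@argv.shift || 'help').to_sym
        if %i[pull push server version].include? method
          send(method)        # known command: call the matching method
        else
          help                # anything else falls through to help
        end
      end

      def pull;    puts 'would pull';    end
      def push;    puts 'would push';    end
      def server;  puts 'would serve';   end
      def version; puts 'taps2 0.6.0';   end
      def help;    puts 'usage: taps2 [pull|push|server|version]'; end
    end

    MiniCli.new(ARGV.dup).run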
data/lib/taps/config.rb
CHANGED
@@ -22,8 +22,8 @@ module Taps
    attr_accessor :login, :password, :database_url, :remote_url
    attr_accessor :chunksize

-    def verify_database_url(db_url=nil)
-      db_url ||=
+    def verify_database_url(db_url = nil)
+      db_url ||= database_url
      db = Sequel.connect(db_url)
      db.tables
      db.disconnect
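`verify_database_url` simply opens the URL with Sequel and lists the tables, so a bad URL fails fast before any transfer starts. Roughly, outside the gem (the sqlite URL below is only an example and assumes the sqlite3 gem is installed):

    # Rough equivalent of what Taps::Config.verify_database_url does:
    # connect, force a round trip, then disconnect.
    require 'sequel'

    db = Sequel.connect('sqlite://taps.db') # example URL; use your own database
    db.tables                               # raises if the connection is unusable
    db.disconnect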
data/lib/taps/data_stream.rb
CHANGED
@@ -6,338 +6,332 @@ require 'taps/errors'
require 'vendor/okjson'

module Taps
+  class DataStream
+    DEFAULT_CHUNKSIZE = 1000
+
+    attr_reader :db, :state
+
+    def initialize(db, state)
+      @db = db
+      @state = {
+        offset: 0,
+        avg_chunksize: 0,
+        num_chunksize: 0,
+        total_chunksize: 0
+      }.merge(state)
+      @state[:chunksize] ||= DEFAULT_CHUNKSIZE
+      @complete = false
+    end

-  attr_reader :db, :state
-  def initialize(db, state)
-    @db = db
-    @state = {
-      :offset => 0,
-      :avg_chunksize => 0,
-      :num_chunksize => 0,
-      :total_chunksize => 0,
-    }.merge(state)
-    @state[:chunksize] ||= DEFAULT_CHUNKSIZE
-    @complete = false
-  end
-  def log
-    Taps.log
-  end
+    def log
+      Taps.log
+    end

+    def error=(val)
+      state[:error] = val
+    end

+    def error
+      state[:error] || false
+    end

+    def table_name
+      state[:table_name].to_sym
+    end

+    def table_name_sql
+      table_name.identifier
+    end

+    def to_hash
+      state.merge(klass: self.class.to_s)
+    end

+    def to_json
+      ::OkJson.encode(to_hash)
+    end

+    def string_columns
+      @string_columns ||= Taps::Utils.incorrect_blobs(db, table_name)
+    end

+    def table
+      @table ||= db[table_name_sql]
+    end

+    def order_by(name = nil)
+      @order_by ||= begin
+        name ||= table_name
+        Taps::Utils.order_by(db, name)
+      end
    end
-  end

+    def increment(row_count)
+      state[:offset] += row_count
+    end

-  end
+    # keep a record of the average chunksize within the first few hundred thousand records, after chunksize
+    # goes below 100 or maybe if offset is > 1000
+    def fetch_rows
+      state[:chunksize] = fetch_chunksize
+      ds = table.order(*order_by).limit(state[:chunksize], state[:offset])
+      log.debug "DataStream#fetch_rows SQL -> #{ds.sql}"
+      rows = Taps::Utils.format_data(ds.all,
+                                     string_columns: string_columns,
+                                     schema: db.schema(table_name),
+                                     table: table_name)
+      update_chunksize_stats
+      rows
+    end

+    def max_chunksize_training
+      20
+    end

+    def fetch_chunksize
+      chunksize = state[:chunksize]
+      return chunksize if state[:num_chunksize] < max_chunksize_training
+      return chunksize if state[:avg_chunksize] == 0
+      return chunksize if state[:error]
+      state[:avg_chunksize] > chunksize ? state[:avg_chunksize] : chunksize
+    end

+    def update_chunksize_stats
+      return if state[:num_chunksize] >= max_chunksize_training
+      state[:total_chunksize] += state[:chunksize]
+      state[:num_chunksize] += 1
+      state[:avg_chunksize] = begin
+        state[:total_chunksize] / state[:num_chunksize]
+      rescue
+        state[:chunksize]
+      end
+    end

+    def encode_rows(rows)
+      Taps::Utils.base64encode(Marshal.dump(rows))
+    end

+    def fetch
+      log.debug "DataStream#fetch state -> #{state.inspect}"

+      t1 = Time.now
+      rows = fetch_rows
+      encoded_data = encode_rows(rows)
+      t2 = Time.now
+      elapsed_time = t2 - t1

+      @complete = rows == {}

+      [encoded_data, (@complete ? 0 : rows[:data].size), elapsed_time]
+    end

+    def complete?
+      @complete
+    end

+    def fetch_remote(resource, headers)
+      params = fetch_from_resource(resource, headers)
+      encoded_data = params[:encoded_data]
+      json = params[:json]

+      rows = parse_encoded_data(encoded_data, json[:checksum])
+      @complete = rows == {}

+      # update local state
+      state.merge!(json[:state].merge(chunksize: state[:chunksize]))

+      if @complete
+        0
+      else
+        import_rows(rows)
+        rows[:data].size
+      end
    end
-    end

+    # this one is used inside the server process
+    def fetch_remote_in_server(params)
+      json = self.class.parse_json(params[:json])
+      encoded_data = params[:encoded_data]

+      rows = parse_encoded_data(encoded_data, json[:checksum])
+      @complete = rows == {}

+      if @complete
+        0
+      else
+        import_rows(rows)
+        rows[:data].size
+      end
    end
-    end

+    def fetch_from_resource(resource, headers)
+      res = nil
+      log.debug "DataStream#fetch_from_resource state -> #{state.inspect}"
+      state[:chunksize] = Taps::Utils.calculate_chunksize(state[:chunksize]) do |c|
+        state[:chunksize] = c.to_i
+        res = resource.post({ state: ::OkJson.encode(to_hash) }, headers)
+      end

+      begin
+        params = Taps::Multipart.parse(res)
+        params[:json] = self.class.parse_json(params[:json]) if params.key?(:json)
+        return params
+      rescue ::OkJson::ParserError
+        raise Taps::CorruptedData, 'Invalid OkJson Received'
+      end
    end
-    end

-  def parse_encoded_data(encoded_data, checksum)
-    raise Taps::CorruptedData.new("Checksum Failed") unless Taps::Utils.valid_data?(encoded_data, checksum)
+    def self.parse_json(json)
+      hash = ::OkJson.decode(json).symbolize_keys
+      hash[:state].symbolize_keys! if hash.key?(:state)
+      hash
+    end

+    def parse_encoded_data(encoded_data, checksum)
+      raise Taps::CorruptedData, 'Checksum Failed' unless Taps::Utils.valid_data?(encoded_data, checksum)
+
+      begin
+        return Marshal.load(Taps::Utils.base64decode(encoded_data))
+      rescue Object
+        unless ENV['NO_DUMP_MARSHAL_ERRORS']
+          puts "Error encountered loading data, wrote the data chunk to dump.#{Process.pid}.dat"
+          File.open("dump.#{Process.pid}.dat", 'w') { |f| f.write(encoded_data) }
+        end
+        raise
      end
-    raise
    end
-  end

+    def import_rows(rows)
+      table.import(rows[:header], rows[:data])
+      state[:offset] += rows[:data].size
+    rescue Exception => ex
+      case ex.message
+      when /integer out of range/ then
+        raise Taps::InvalidData, <<-ERROR, []
\nDetected integer data that exceeds the maximum allowable size for an integer type.
This generally occurs when importing from SQLite due to the fact that SQLite does
not enforce maximum values on integer types.
      ERROR
+      else raise ex
+      end
    end
-  end

-  end
-  def verify_remote_stream(resource, headers)
-    json_raw = resource.post({:state => ::OkJson.encode(self)}, headers).to_s
-    json = self.class.parse_json(json_raw)
-    self.class.new(db, json[:state])
-  end
-  def self.factory(db, state)
-    if defined?(Sequel::MySQL) && Sequel::MySQL.respond_to?(:convert_invalid_date_time=)
-      Sequel::MySQL.convert_invalid_date_time = :nil
+    def verify_stream
+      state[:offset] = table.count
    end

+    def verify_remote_stream(resource, headers)
+      json_raw = resource.post({ state: ::OkJson.encode(self) }, headers).to_s
+      json = self.class.parse_json(json_raw)

-      DataStreamKeyed.new(db, state)
-    else
-      DataStream.new(db, state)
+      self.class.new(db, json[:state])
    end
-    end
-  end

+    def self.factory(db, state)
+      if defined?(Sequel::MySQL) && Sequel::MySQL.respond_to?(:convert_invalid_date_time=)
+        Sequel::MySQL.convert_invalid_date_time = :nil
+      end

-    attr_accessor :buffer
+      return eval(state[:klass]).new(db, state) if state.key?(:klass)

+      if Taps::Utils.single_integer_primary_key(db, state[:table_name].to_sym)
+        DataStreamKeyed.new(db, state)
+      else
+        DataStream.new(db, state)
+      end
+    end
  end

-  end
+  class DataStreamKeyed < DataStream
+    attr_accessor :buffer

-    state[:
+    def initialize(db, state)
+      super(db, state)
+      @state = { primary_key: order_by(state[:table_name]).first, filter: 0 }.merge(state)
+      @state[:chunksize] ||= DEFAULT_CHUNKSIZE
+      @buffer = []
    end
-  end

-    # inside sinatra but locally we can select more than
-    # is strictly needed
-    if defined?(Sinatra)
-      (chunksize * 1.1).ceil
-    else
-      (chunksize * 3).ceil
+    def primary_key
+      state[:primary_key].to_sym
    end
-  end

+    def buffer_limit
+      if state[:last_fetched] && (state[:last_fetched] < state[:filter]) && buffer.empty?
+        state[:last_fetched]
+      else
+        state[:filter]
+      end
+    end

-    #
-    data = ds.all
-    self.buffer += data
-    num += data.size
-    if data.size > 0
-      # keep a record of the last primary key value in the buffer
-      state[:filter] = self.buffer.last[ primary_key ]
+    def calc_limit(chunksize)
+      # we want to not fetch more than is needed while we're
+      # inside sinatra but locally we can select more than
+      # is strictly needed
+      if defined?(Sinatra)
+        (chunksize * 1.1).ceil
+      else
+        (chunksize * 3).ceil
      end
+    end

-    end
+    def load_buffer(chunksize)
+      # make sure BasicObject is not polluted by subsequent requires
+      Sequel::BasicObject.remove_methods!
+
+      num = 0
+      loop do
+        limit = calc_limit(chunksize)
+        # we have to use local variables in order for the virtual row filter to work correctly
+        key = primary_key
+        buf_limit = buffer_limit
+        ds = table.order(*order_by).filter { key.sql_number > buf_limit }.limit(limit)
+        log.debug "DataStreamKeyed#load_buffer SQL -> #{ds.sql}"
+        data = ds.all
+        self.buffer += data
+        num += data.size
+        unless data.empty?
+          # keep a record of the last primary key value in the buffer
+          state[:filter] = self.buffer.last[primary_key]
+        end
+
+        break if (num >= chunksize) || data.empty?
+      end
    end
-  end

-      rows
-    else
-      nil
+    def fetch_buffered(chunksize)
+      load_buffer(chunksize) if self.buffer.size < chunksize
+      rows = buffer.slice(0, chunksize)
+      state[:last_fetched] = (rows.last[primary_key] unless rows.empty?)
+      rows
    end
-    rows
-  end

+    def import_rows(rows)
+      table.import(rows[:header], rows[:data])
+    end

+    def fetch_rows
+      chunksize = state[:chunksize]
+      Taps::Utils.format_data(fetch_buffered(chunksize) || [],
+                              string_columns: string_columns)
+    end

+    def increment(row_count)
+      # pop the rows we just successfully sent off the buffer
+      @buffer.slice!(0, row_count)
+    end

+    def verify_stream
+      key = primary_key
+      ds = table.order(*order_by)
+      current_filter = ds.max(key.sql_number)

+      # set the current filter to the max of the primary key
+      state[:filter] = current_filter
+      # clear out the last_fetched value so it can restart from scratch
+      state[:last_fetched] = nil

+      log.debug "DataStreamKeyed#verify_stream -> state: #{state.inspect}"
+    end
  end
end
-end
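`DataStream.factory` now short-circuits on a serialized `state[:klass]` (so a resumed session reuses the stream class it started with) and otherwise picks `DataStreamKeyed` only when the table has a single integer primary key. The sketch below mirrors that selection logic with the database check stubbed out so the branching can be run in isolation; `FakeStream`, `FakeKeyedStream`, `KEYED_TABLES` and `stream_for` are inventions for the example, and it uses `Object.const_get` where the gem's code uses `eval` on the stored class name:

    # Stubbed-out mirror of the selection logic in DataStream.factory;
    # nothing here touches a real database.
    FakeStream      = Struct.new(:state)
    FakeKeyedStream = Struct.new(:state)

    KEYED_TABLES = [:users, :orders].freeze # pretend these have a single integer PK

    def single_integer_primary_key?(table_name)
      KEYED_TABLES.include?(table_name)
    end

    def stream_for(state)
      # a resumed session carries the class name it was created with
      return Object.const_get(state[:klass]).new(state) if state.key?(:klass)

      if single_integer_primary_key?(state[:table_name].to_sym)
        FakeKeyedStream.new(state)
      else
        FakeStream.new(state)
      end
    end

    p stream_for(table_name: 'users').class                      # => FakeKeyedStream
    p stream_for(table_name: 'logs').class                       # => FakeStream
    p stream_for(table_name: 'logs', klass: 'FakeStream').class  # => FakeStream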