taps2 0.5.5 → 0.6.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 33670f0428acea1fd1b319fa7709121d06e1ca44
- data.tar.gz: 62f305882ef883472629bf9f72ed41f8efcf11ab
+ metadata.gz: bd7e00f8e9f2311fc7b0d8e990cb6d225978e39b
+ data.tar.gz: ab2788206704d5cb6ec72811192e2584439eca14
  SHA512:
- metadata.gz: a6f3f8eeea90f1da98a93213866b1549464cb3058accade4528cb6130854b1634e25a2a4501d64ac3633c1eb9a8f857791db57c6bf7b59a404c03fe28f66248e
- data.tar.gz: 36ae8502a3165c24d1bd85cc3e80eebe89a14f77cf6d75deba118e565727c79da28296ae23eb489c87b9765d8dd3f3879351880b88ccefe70c1548e228eb5b4a
+ metadata.gz: e1fe890ae878b3f52e33d9dbf88caaa32eb51b7672a8967da9cc2a5c408ef6e877f8e0c8f5698ad9a095ece0cdaa09252bc566021a5b1293633045faa595e467
+ data.tar.gz: 10c5c43cfe50ac13c0428d62c9c2e007d2f2481240040827c47a301ba3af14e008a3dd972e52bf4256c99149e179a3c77f162db466bfb1016a72c27f013e5b16
@@ -1,25 +1,25 @@
- = Taps (2) -- simple database import/export app
+ # Taps (2) -- simple database import/export app

  A simple database agnostic import/export app to transfer data to/from a remote database.

- *Forked and updated* with fixes and improvements. Integrates fixes and updates from {taps-taps}(https://github.com/wijet/taps) and {tapsicle}(https://github.com/jiffyondemand/tapsicle) forks.
+ *Forked and updated* with fixes and improvements. Integrates fixes and updates from [taps-taps](https://github.com/wijet/taps) and [tapsicle](https://github.com/jiffyondemand/tapsicle) forks.

- == Installation
+ ## Installation

  Renamed gem

- $ gem install taps2
+ $ gem install taps2

  By default, Taps will attempt to create a SQLite3 database for sessions. Unless you specify a different database type, you'll need to install SQLite 3. (See _Environment Variables_ for alternative session databases.)

- $ gem install sqlite3
+ $ gem install sqlite3

  Install the gems to support databases you want to work with, such as MySQL or PostgreSQL.

- $ gem install mysql2
- $ gem install pg
+ $ gem install mysql2
+ $ gem install pg

- == Configuration: Environment Variables
+ ## Configuration: Environment Variables

  _All environment variables are optional._

@@ -37,40 +37,42 @@ The `NO_DUMP_MARSHAL_ERRORS` variable allows you to disable dumping of marshalle

  The `NO_DEFLATE` variable allows you to disable gzip compression (`Rack::Deflater`) on the server.

- == Usage: Server
+ ## Usage: Server

  Here's how you start a taps server

- $ taps server postgres://localdbuser:localdbpass@localhost/dbname httpuser httppassword
+ $ taps2 server postgres://localdbuser:localdbpass@localhost/dbname httpuser httppassword

  You can also specify an encoding in the database url

- $ taps server mysql://localdbuser:localdbpass@localhost/dbname?encoding=latin1 httpuser httppassword
+ $ taps2 server mysql://localdbuser:localdbpass@localhost/dbname?encoding=latin1 httpuser httppassword

- == Usage: Client
+ ## Usage: Client

  When you want to pull down a database from a taps server

- $ taps pull postgres://dbuser:dbpassword@localhost/dbname http://httpuser:httppassword@example.com:5000
+ $ taps2 pull postgres://dbuser:dbpassword@localhost/dbname http://httpuser:httppassword@example.com:5000

  or when you want to push a local database to a taps server

- $ taps push postgres://dbuser:dbpassword@localhost/dbname http://httpuser:httppassword@example.com:5000
+ $ taps2 push postgres://dbuser:dbpassword@localhost/dbname http://httpuser:httppassword@example.com:5000

  or when you want to transfer a list of tables

- $ taps push postgres://dbuser:dbpassword@localhost/dbname http://httpuser:httppassword@example.com:5000 --tables logs,tags
+ $ taps2 push postgres://dbuser:dbpassword@localhost/dbname http://httpuser:httppassword@example.com:5000 --tables logs,tags

  or when you want to transfer tables that start with a word

- $ taps push postgres://dbuser:dbpassword@localhost/dbname http://httpuser:httppassword@example.com:5000 --filter '^log_'
+ $ taps2 push postgres://dbuser:dbpassword@localhost/dbname http://httpuser:httppassword@example.com:5000 --filter '^log_'

- == Troubleshooting
+ ## Troubleshooting

  * "Error: invalid byte sequence for encoding" can be resolved by adding `encoding` to database URI (https://github.com/ricardochimal/taps/issues/110)
- * *Example:* `taps server mysql://root@localhost/example_database?encoding=UTF8 httpuser httppassword`
+ * *Example:* `taps2 server mysql://root@localhost/example_database?encoding=UTF8 httpuser httppassword`
+ * SQLite3 database URI may require three slashes (e.g. `sqlite3:///path/to/file.db`)
+ * Make sure to use an absolute/full path to the file on the server

- == Known Issues
+ ## Known Issues

  * Foreign key constraints get lost in the schema transfer
  * Indexes may drop the "order" (https://github.com/ricardochimal/taps/issues/111)
@@ -78,14 +80,19 @@ or when you want to transfer tables that start with a word
  * Tables without primary keys will be incredibly slow to transfer. This is due to it being inefficient having large offset values in queries.
  * Multiple schemas are currently not supported (https://github.com/ricardochimal/taps/issues/97)
  * Taps does not drop tables when overwriting database (https://github.com/ricardochimal/taps/issues/94)
+ * Oracle database classes not fully supported (https://github.com/ricardochimal/taps/issues/89)
+ * Some blank default values may be converted to NULL in MySQL table schemas (https://github.com/ricardochimal/taps/issues/88)
+ * Conversion of column data types can cause side effects when going from one database type to another
+ * MySQL `bigint` converts to PostgreSQL `string` (https://github.com/ricardochimal/taps/issues/77)
+ * Passwords in database URI can cause issues with special characters (https://github.com/ricardochimal/taps/issues/74)

- == Feature Requests
+ ## Feature Requests

  * Allow a single Taps server to serve data from different databases (https://github.com/ricardochimal/taps/issues/103)

- == Meta
+ ## Meta

- Maintained by {Joel Van Horn}(http://github.com/joelvh)
+ Maintained by [Joel Van Horn](http://github.com/joelvh)

  Written by Ricardo Chimal, Jr. (ricardo at heroku dot com) and Adam Wiggins (adam at heroku dot com)
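
A minimal end-to-end sketch of the workflow the README above describes, using the renamed `taps2` executable; the credentials, hostnames, and the absolute SQLite path are placeholders:

    # on the machine holding the source data (note the three slashes and the full path)
    $ gem install taps2 sqlite3
    $ taps2 server sqlite3:///var/data/source.db httpuser httppassword

    # on the machine receiving the data
    $ gem install taps2 pg
    $ taps2 pull postgres://dbuser:dbpassword@localhost/dbname http://httpuser:httppassword@example.com:5000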
 
@@ -1,14 +1,15 @@
  #!/usr/bin/env ruby

  require 'rubygems'
- gem 'sequel', '~> 3.20.0'

- $:.unshift File.dirname(__FILE__) + '/../lib'
+ gem 'sequel', '~> 4.0'
+
+ $LOAD_PATH.unshift File.dirname(__FILE__) + '/../lib'

  require 'taps/schema'

- cmd = ARGV.shift.strip rescue ''
- database_url = ARGV.shift.strip rescue ''
+ cmd = ARGV.shift.to_s.strip
+ database_url = ARGV.shift.to_s.strip

  def show_usage_and_exit
  puts <<EOTXT
@@ -35,12 +36,20 @@ when 'indexes'
  when 'indexes_individual'
  puts Taps::Schema.indexes_individual(database_url)
  when 'load_indexes'
- filename = ARGV.shift.strip rescue ''
- indexes = File.read(filename) rescue show_usage_and_exit
+ filename = ARGV.shift.to_s.strip
+ indexes = begin
+ File.read(filename)
+ rescue StandardError
+ show_usage_and_exit
+ end
  Taps::Schema.load_indexes(database_url, indexes)
  when 'load'
- filename = ARGV.shift.strip rescue ''
- schema = File.read(filename) rescue show_usage_and_exit
+ filename = ARGV.shift.to_s.strip
+ schema = begin
+ File.read(filename)
+ rescue StandardError
+ show_usage_and_exit
+ end
  Taps::Schema.load(database_url, schema)
  when 'reset_db_sequences'
  Taps::Schema.reset_db_sequences(database_url)
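
The hunk above replaces inline `rescue` modifiers with explicit handling; a tiny standalone illustration of why `.to_s` removes the need for a rescue when ARGV runs out of arguments (not part of the script itself):

    cmd = [].shift.to_s.strip   # [].shift is nil; nil.to_s.strip => ""
    cmd.empty?                  # => true, whereas nil.strip would raise NoMethodError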
File without changes
data/bin/{taps → taps2} RENAMED
@@ -1,6 +1,6 @@
  #!/usr/bin/env ruby

- $:.unshift File.dirname(__FILE__) + '/../lib'
+ $LOAD_PATH.unshift File.dirname(__FILE__) + '/../lib'
  require 'taps/cli'

  Taps::Cli.new(ARGV.dup).run
@@ -15,7 +15,7 @@ class Taps::Chunksize
  end

  def reset_chunksize
- @chunksize = (retries <= 1) ? 10 : 1
+ @chunksize = retries <= 1 ? 10 : 1
  end

  def diff
@@ -24,7 +24,11 @@ class Taps::Chunksize

  def time_in_db=(t)
  @time_in_db = t
- @time_in_db = @time_in_db.to_f rescue 0.0
+ @time_in_db = begin
+ @time_in_db.to_f
+ rescue
+ 0.0
+ end
  end

  def time_delta
@@ -36,16 +40,16 @@ class Taps::Chunksize

  def calc_new_chunksize
  new_chunksize = if retries > 0
- chunksize
- elsif diff > 3.0
- (chunksize / 3).ceil
- elsif diff > 1.1
- chunksize - 100
- elsif diff < 0.8
- chunksize * 2
- else
- chunksize + 100
- end
+ chunksize
+ elsif diff > 3.0
+ (chunksize / 3).ceil
+ elsif diff > 1.1
+ chunksize - 100
+ elsif diff < 0.8
+ chunksize * 2
+ else
+ chunksize + 100
+ end
  new_chunksize = 1 if new_chunksize < 1
  new_chunksize
  end
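
The `calc_new_chunksize` hunk above only re-indents the existing tuning rules; a standalone sketch of that decision table (method name and sample values assumed for illustration, not taken from the gem):

    # Shrink the chunk hard when the measured round trip is slow (diff > 3.0),
    # grow it when the round trip is fast (diff < 0.8), otherwise nudge it.
    def next_chunksize(chunksize, diff, retries)
      size = if retries > 0
               chunksize            # after a retry, keep the current size
             elsif diff > 3.0
               (chunksize / 3).ceil
             elsif diff > 1.1
               chunksize - 100
             elsif diff < 0.8
               chunksize * 2
             else
               chunksize + 100
             end
      size < 1 ? 1 : size            # never drop below one row per chunk
    end

    puts next_chunksize(1000, 3.5, 0) # => 333
    puts next_chunksize(1000, 0.5, 0) # => 2000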
data/lib/taps/cli.rb CHANGED
@@ -9,7 +9,7 @@ Taps::Config.taps_database_url = ENV['TAPS_DATABASE_URL'] || ENV['DATABASE_URL']
  # this is dirty but it solves a weird problem where the tempfile disappears mid-process
  require 'sqlite3'
  $__taps_database = Tempfile.new('taps.db')
- $__taps_database.open()
+ $__taps_database.open
  "sqlite://#{$__taps_database.path}"
  end

@@ -23,7 +23,7 @@ module Taps

  def run
  method = (argv.shift || 'help').to_sym
- if [:pull, :push, :server, :version].include? method
+ if %i[pull push server version].include? method
  send(method)
  else
  help
@@ -59,12 +59,10 @@ module Taps

  Taps::Config.verify_database_url
  require 'taps/server'
- Taps::Server.run!({
- :port => opts[:port],
- :environment => :production,
- :logging => true,
- :dump_errors => true,
- })
+ Taps::Server.run!(port: opts[:port],
+ environment: :production,
+ logging: true,
+ dump_errors: true)
  end

  def version
@@ -85,13 +83,13 @@ EOHELP
  end

  def serveroptparse
- opts={:port => 5000, :database_url => nil, :login => nil, :password => nil, :debug => false}
+ opts = { port: 5000, database_url: nil, login: nil, password: nil, debug: false }
  OptionParser.new do |o|
- o.banner = "Usage: #{File.basename($0)} server [OPTIONS] <local_database_url> <login> <password>"
- o.define_head "Start a taps database import/export server"
+ o.banner = "Usage: #{File.basename($PROGRAM_NAME)} server [OPTIONS] <local_database_url> <login> <password>"
+ o.define_head 'Start a taps database import/export server'

- o.on("-p", "--port=N", "Server Port") { |v| opts[:port] = v.to_i if v.to_i > 0 }
- o.on("-d", "--debug", "Enable Debug Messages") { |v| opts[:debug] = true }
+ o.on('-p', '--port=N', 'Server Port') { |v| opts[:port] = v.to_i if v.to_i > 0 }
+ o.on('-d', '--debug', 'Enable Debug Messages') { |_v| opts[:debug] = true }
  o.parse!(argv)

  opts[:database_url] = argv.shift
@@ -99,17 +97,17 @@ EOHELP
  opts[:password] = argv.shift || ENV['TAPS_PASSWORD']

  if opts[:database_url].nil?
- $stderr.puts "Missing Database URL"
+ warn 'Missing Database URL'
  puts o
  exit 1
  end
  if opts[:login].nil?
- $stderr.puts "Missing Login"
+ warn 'Missing Login'
  puts o
  exit 1
  end
  if opts[:password].nil?
- $stderr.puts "Missing Password"
+ warn 'Missing Password'
  puts o
  exit 1
  end
@@ -118,41 +116,41 @@ EOHELP
  end

  def clientoptparse(cmd)
- opts={:default_chunksize => 1000, :database_url => nil, :remote_url => nil, :debug => false, :resume_filename => nil, :disable_compresion => false, :indexes_first => false}
+ opts = { default_chunksize: 1000, database_url: nil, remote_url: nil, debug: false, resume_filename: nil, disable_compresion: false, indexes_first: false }
  OptionParser.new do |o|
- o.banner = "Usage: #{File.basename($0)} #{cmd} [OPTIONS] <local_database_url> <remote_url>"
+ o.banner = "Usage: #{File.basename($PROGRAM_NAME)} #{cmd} [OPTIONS] <local_database_url> <remote_url>"

  case cmd
  when :pull
- o.define_head "Pull a database from a taps server"
+ o.define_head 'Pull a database from a taps server'
  when :push
- o.define_head "Push a database to a taps server"
+ o.define_head 'Push a database to a taps server'
  end

- o.on("-s", "--skip-schema", "Don't transfer the schema, just data") { |v| opts[:skip_schema] = true }
- o.on("-i", "--indexes-first", "Transfer indexes first before data") { |v| opts[:indexes_first] = true }
- o.on("-r", "--resume=file", "Resume a Taps Session from a stored file") { |v| opts[:resume_filename] = v }
- o.on("-c", "--chunksize=N", "Initial Chunksize") { |v| opts[:default_chunksize] = (v.to_i < 10 ? 10 : v.to_i) }
- o.on("-g", "--disable-compression", "Disable Compression") { |v| opts[:disable_compression] = true }
- o.on("-f", "--filter=regex", "Regex Filter for tables") { |v| opts[:table_filter] = v }
- o.on("-t", "--tables=A,B,C", Array, "Shortcut to filter on a list of tables") do |v|
- r_tables = v.collect { |t| "^#{t}$" }.join("|")
+ o.on('-s', '--skip-schema', "Don't transfer the schema, just data") { |_v| opts[:skip_schema] = true }
+ o.on('-i', '--indexes-first', 'Transfer indexes first before data') { |_v| opts[:indexes_first] = true }
+ o.on('-r', '--resume=file', 'Resume a Taps Session from a stored file') { |v| opts[:resume_filename] = v }
+ o.on('-c', '--chunksize=N', 'Initial Chunksize') { |v| opts[:default_chunksize] = (v.to_i < 10 ? 10 : v.to_i) }
+ o.on('-g', '--disable-compression', 'Disable Compression') { |_v| opts[:disable_compression] = true }
+ o.on('-f', '--filter=regex', 'Regex Filter for tables') { |v| opts[:table_filter] = v }
+ o.on('-t', '--tables=A,B,C', Array, 'Shortcut to filter on a list of tables') do |v|
+ r_tables = v.collect { |t| "^#{t}$" }.join('|')
  opts[:table_filter] = "(#{r_tables})"
  end
- o.on("-e", "--exclude_tables=A,B,C", Array, "Shortcut to exclude a list of tables") { |v| opts[:exclude_tables] = v }
- o.on("-d", "--debug", "Enable Debug Messages") { |v| opts[:debug] = true }
+ o.on('-e', '--exclude_tables=A,B,C', Array, 'Shortcut to exclude a list of tables') { |v| opts[:exclude_tables] = v }
+ o.on('-d', '--debug', 'Enable Debug Messages') { |_v| opts[:debug] = true }
  o.parse!(argv)

  opts[:database_url] = argv.shift
  opts[:remote_url] = argv.shift

  if opts[:database_url].nil?
- $stderr.puts "Missing Database URL"
+ warn 'Missing Database URL'
  puts o
  exit 1
  end
  if opts[:remote_url].nil?
- $stderr.puts "Missing Remote Taps URL"
+ warn 'Missing Remote Taps URL'
  puts o
  exit 1
  end
@@ -183,14 +181,11 @@ EOHELP

  require 'taps/operation'

- newsession = session.merge({
- :default_chunksize => opts[:default_chunksize],
- :disable_compression => opts[:disable_compression],
- :resume => true,
- })
+ newsession = session.merge(default_chunksize: opts[:default_chunksize],
+ disable_compression: opts[:disable_compression],
+ resume: true)

  Taps::Operation.factory(method, database_url, remote_url, newsession).run
  end
-
  end
  end
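
One behaviour worth noting in the clientoptparse hunk above: the `--tables` shortcut is rewritten into the same regex used by `--filter`. A standalone sketch of that conversion (variable names assumed for illustration):

    tables = %w[logs tags]
    table_filter = "(#{tables.collect { |t| "^#{t}$" }.join('|')})"
    puts table_filter                                  # => (^logs$|^tags$)
    puts %w[logs tags users].grep(/#{table_filter}/)   # prints logs and tags only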
data/lib/taps/config.rb CHANGED
@@ -22,8 +22,8 @@ module Taps
  attr_accessor :login, :password, :database_url, :remote_url
  attr_accessor :chunksize

- def verify_database_url(db_url=nil)
- db_url ||= self.database_url
+ def verify_database_url(db_url = nil)
+ db_url ||= database_url
  db = Sequel.connect(db_url)
  db.tables
  db.disconnect
@@ -6,338 +6,332 @@ require 'taps/errors'
  require 'vendor/okjson'

  module Taps
+ class DataStream
+ DEFAULT_CHUNKSIZE = 1000
+
+ attr_reader :db, :state
+
+ def initialize(db, state)
+ @db = db
+ @state = {
+ offset: 0,
+ avg_chunksize: 0,
+ num_chunksize: 0,
+ total_chunksize: 0
+ }.merge(state)
+ @state[:chunksize] ||= DEFAULT_CHUNKSIZE
+ @complete = false
+ end

- class DataStream
- DEFAULT_CHUNKSIZE = 1000
-
- attr_reader :db, :state
-
- def initialize(db, state)
- @db = db
- @state = {
- :offset => 0,
- :avg_chunksize => 0,
- :num_chunksize => 0,
- :total_chunksize => 0,
- }.merge(state)
- @state[:chunksize] ||= DEFAULT_CHUNKSIZE
- @complete = false
- end
-
- def log
- Taps.log
- end
+ def log
+ Taps.log
+ end

- def error=(val)
- state[:error] = val
- end
+ def error=(val)
+ state[:error] = val
+ end

- def error
- state[:error] || false
- end
+ def error
+ state[:error] || false
+ end

- def table_name
- state[:table_name].to_sym
- end
+ def table_name
+ state[:table_name].to_sym
+ end

- def table_name_sql
- table_name.identifier
- end
+ def table_name_sql
+ table_name.identifier
+ end

- def to_hash
- state.merge(:klass => self.class.to_s)
- end
+ def to_hash
+ state.merge(klass: self.class.to_s)
+ end

- def to_json
- ::OkJson.encode(to_hash)
- end
+ def to_json
+ ::OkJson.encode(to_hash)
+ end

- def string_columns
- @string_columns ||= Taps::Utils.incorrect_blobs(db, table_name)
- end
+ def string_columns
+ @string_columns ||= Taps::Utils.incorrect_blobs(db, table_name)
+ end

- def table
- @table ||= db[table_name_sql]
- end
+ def table
+ @table ||= db[table_name_sql]
+ end

- def order_by(name=nil)
- @order_by ||= begin
- name ||= table_name
- Taps::Utils.order_by(db, name)
+ def order_by(name = nil)
+ @order_by ||= begin
+ name ||= table_name
+ Taps::Utils.order_by(db, name)
+ end
  end
- end

- def increment(row_count)
- state[:offset] += row_count
- end
+ def increment(row_count)
+ state[:offset] += row_count
+ end

- # keep a record of the average chunksize within the first few hundred thousand records, after chunksize
- # goes below 100 or maybe if offset is > 1000
- def fetch_rows
- state[:chunksize] = fetch_chunksize
- ds = table.order(*order_by).limit(state[:chunksize], state[:offset])
- log.debug "DataStream#fetch_rows SQL -> #{ds.sql}"
- rows = Taps::Utils.format_data(ds.all,
- :string_columns => string_columns,
- :schema => db.schema(table_name),
- :table => table_name
- )
- update_chunksize_stats
- rows
- end
+ # keep a record of the average chunksize within the first few hundred thousand records, after chunksize
+ # goes below 100 or maybe if offset is > 1000
+ def fetch_rows
+ state[:chunksize] = fetch_chunksize
+ ds = table.order(*order_by).limit(state[:chunksize], state[:offset])
+ log.debug "DataStream#fetch_rows SQL -> #{ds.sql}"
+ rows = Taps::Utils.format_data(ds.all,
+ string_columns: string_columns,
+ schema: db.schema(table_name),
+ table: table_name)
+ update_chunksize_stats
+ rows
+ end

- def max_chunksize_training
- 20
- end
+ def max_chunksize_training
+ 20
+ end

- def fetch_chunksize
- chunksize = state[:chunksize]
- return chunksize if state[:num_chunksize] < max_chunksize_training
- return chunksize if state[:avg_chunksize] == 0
- return chunksize if state[:error]
- state[:avg_chunksize] > chunksize ? state[:avg_chunksize] : chunksize
- end
+ def fetch_chunksize
+ chunksize = state[:chunksize]
+ return chunksize if state[:num_chunksize] < max_chunksize_training
+ return chunksize if state[:avg_chunksize] == 0
+ return chunksize if state[:error]
+ state[:avg_chunksize] > chunksize ? state[:avg_chunksize] : chunksize
+ end

- def update_chunksize_stats
- return if state[:num_chunksize] >= max_chunksize_training
- state[:total_chunksize] += state[:chunksize]
- state[:num_chunksize] += 1
- state[:avg_chunksize] = state[:total_chunksize] / state[:num_chunksize] rescue state[:chunksize]
- end
+ def update_chunksize_stats
+ return if state[:num_chunksize] >= max_chunksize_training
+ state[:total_chunksize] += state[:chunksize]
+ state[:num_chunksize] += 1
+ state[:avg_chunksize] = begin
+ state[:total_chunksize] / state[:num_chunksize]
+ rescue
+ state[:chunksize]
+ end
+ end

- def encode_rows(rows)
- Taps::Utils.base64encode(Marshal.dump(rows))
- end
+ def encode_rows(rows)
+ Taps::Utils.base64encode(Marshal.dump(rows))
+ end

- def fetch
- log.debug "DataStream#fetch state -> #{state.inspect}"
+ def fetch
+ log.debug "DataStream#fetch state -> #{state.inspect}"

- t1 = Time.now
- rows = fetch_rows
- encoded_data = encode_rows(rows)
- t2 = Time.now
- elapsed_time = t2 - t1
+ t1 = Time.now
+ rows = fetch_rows
+ encoded_data = encode_rows(rows)
+ t2 = Time.now
+ elapsed_time = t2 - t1

- @complete = rows == { }
+ @complete = rows == {}

- [encoded_data, (@complete ? 0 : rows[:data].size), elapsed_time]
- end
+ [encoded_data, (@complete ? 0 : rows[:data].size), elapsed_time]
+ end

- def complete?
- @complete
- end
+ def complete?
+ @complete
+ end

- def fetch_remote(resource, headers)
- params = fetch_from_resource(resource, headers)
- encoded_data = params[:encoded_data]
- json = params[:json]
+ def fetch_remote(resource, headers)
+ params = fetch_from_resource(resource, headers)
+ encoded_data = params[:encoded_data]
+ json = params[:json]

- rows = parse_encoded_data(encoded_data, json[:checksum])
- @complete = rows == { }
+ rows = parse_encoded_data(encoded_data, json[:checksum])
+ @complete = rows == {}

- # update local state
- state.merge!(json[:state].merge(:chunksize => state[:chunksize]))
+ # update local state
+ state.merge!(json[:state].merge(chunksize: state[:chunksize]))

- unless @complete
- import_rows(rows)
- rows[:data].size
- else
- 0
+ if @complete
+ 0
+ else
+ import_rows(rows)
+ rows[:data].size
+ end
  end
- end

- # this one is used inside the server process
- def fetch_remote_in_server(params)
- json = self.class.parse_json(params[:json])
- encoded_data = params[:encoded_data]
+ # this one is used inside the server process
+ def fetch_remote_in_server(params)
+ json = self.class.parse_json(params[:json])
+ encoded_data = params[:encoded_data]

- rows = parse_encoded_data(encoded_data, json[:checksum])
- @complete = rows == { }
+ rows = parse_encoded_data(encoded_data, json[:checksum])
+ @complete = rows == {}

- unless @complete
- import_rows(rows)
- rows[:data].size
- else
- 0
+ if @complete
+ 0
+ else
+ import_rows(rows)
+ rows[:data].size
+ end
  end
- end

- def fetch_from_resource(resource, headers)
- res = nil
- log.debug "DataStream#fetch_from_resource state -> #{state.inspect}"
- state[:chunksize] = Taps::Utils.calculate_chunksize(state[:chunksize]) do |c|
- state[:chunksize] = c.to_i
- res = resource.post({:state => ::OkJson.encode(self.to_hash)}, headers)
- end
+ def fetch_from_resource(resource, headers)
+ res = nil
+ log.debug "DataStream#fetch_from_resource state -> #{state.inspect}"
+ state[:chunksize] = Taps::Utils.calculate_chunksize(state[:chunksize]) do |c|
+ state[:chunksize] = c.to_i
+ res = resource.post({ state: ::OkJson.encode(to_hash) }, headers)
+ end

- begin
- params = Taps::Multipart.parse(res)
- params[:json] = self.class.parse_json(params[:json]) if params.has_key?(:json)
- return params
- rescue ::OkJson::ParserError
- raise Taps::CorruptedData.new("Invalid OkJson Received")
+ begin
+ params = Taps::Multipart.parse(res)
+ params[:json] = self.class.parse_json(params[:json]) if params.key?(:json)
+ return params
+ rescue ::OkJson::ParserError
+ raise Taps::CorruptedData, 'Invalid OkJson Received'
+ end
  end
- end

- def self.parse_json(json)
- hash = ::OkJson.decode(json).symbolize_keys
- hash[:state].symbolize_keys! if hash.has_key?(:state)
- hash
- end
-
- def parse_encoded_data(encoded_data, checksum)
- raise Taps::CorruptedData.new("Checksum Failed") unless Taps::Utils.valid_data?(encoded_data, checksum)
+ def self.parse_json(json)
+ hash = ::OkJson.decode(json).symbolize_keys
+ hash[:state].symbolize_keys! if hash.key?(:state)
+ hash
+ end

- begin
- return Marshal.load(Taps::Utils.base64decode(encoded_data))
- rescue Object
- unless ENV['NO_DUMP_MARSHAL_ERRORS']
- puts "Error encountered loading data, wrote the data chunk to dump.#{Process.pid}.dat"
- File.open("dump.#{Process.pid}.dat", "w") { |f| f.write(encoded_data) }
+ def parse_encoded_data(encoded_data, checksum)
+ raise Taps::CorruptedData, 'Checksum Failed' unless Taps::Utils.valid_data?(encoded_data, checksum)
+
+ begin
+ return Marshal.load(Taps::Utils.base64decode(encoded_data))
+ rescue Object
+ unless ENV['NO_DUMP_MARSHAL_ERRORS']
+ puts "Error encountered loading data, wrote the data chunk to dump.#{Process.pid}.dat"
+ File.open("dump.#{Process.pid}.dat", 'w') { |f| f.write(encoded_data) }
+ end
+ raise
  end
- raise
  end
- end

- def import_rows(rows)
- table.import(rows[:header], rows[:data])
- state[:offset] += rows[:data].size
- rescue Exception => ex
- case ex.message
- when /integer out of range/ then
- raise Taps::InvalidData, <<-ERROR, []
+ def import_rows(rows)
+ table.import(rows[:header], rows[:data])
+ state[:offset] += rows[:data].size
+ rescue Exception => ex
+ case ex.message
+ when /integer out of range/ then
+ raise Taps::InvalidData, <<-ERROR, []
  \nDetected integer data that exceeds the maximum allowable size for an integer type.
  This generally occurs when importing from SQLite due to the fact that SQLite does
  not enforce maximum values on integer types.
  ERROR
- else raise ex
+ else raise ex
+ end
  end
- end

- def verify_stream
- state[:offset] = table.count
- end
-
- def verify_remote_stream(resource, headers)
- json_raw = resource.post({:state => ::OkJson.encode(self)}, headers).to_s
- json = self.class.parse_json(json_raw)
-
- self.class.new(db, json[:state])
- end
-
- def self.factory(db, state)
- if defined?(Sequel::MySQL) && Sequel::MySQL.respond_to?(:convert_invalid_date_time=)
- Sequel::MySQL.convert_invalid_date_time = :nil
+ def verify_stream
+ state[:offset] = table.count
  end

- if state.has_key?(:klass)
- return eval(state[:klass]).new(db, state)
- end
+ def verify_remote_stream(resource, headers)
+ json_raw = resource.post({ state: ::OkJson.encode(self) }, headers).to_s
+ json = self.class.parse_json(json_raw)

- if Taps::Utils.single_integer_primary_key(db, state[:table_name].to_sym)
- DataStreamKeyed.new(db, state)
- else
- DataStream.new(db, state)
+ self.class.new(db, json[:state])
  end
- end
- end

+ def self.factory(db, state)
+ if defined?(Sequel::MySQL) && Sequel::MySQL.respond_to?(:convert_invalid_date_time=)
+ Sequel::MySQL.convert_invalid_date_time = :nil
+ end

- class DataStreamKeyed < DataStream
- attr_accessor :buffer
+ return eval(state[:klass]).new(db, state) if state.key?(:klass)

- def initialize(db, state)
- super(db, state)
- @state = { :primary_key => order_by(state[:table_name]).first, :filter => 0 }.merge(state)
- @state[:chunksize] ||= DEFAULT_CHUNKSIZE
- @buffer = []
+ if Taps::Utils.single_integer_primary_key(db, state[:table_name].to_sym)
+ DataStreamKeyed.new(db, state)
+ else
+ DataStream.new(db, state)
+ end
+ end
  end

- def primary_key
- state[:primary_key].to_sym
- end
+ class DataStreamKeyed < DataStream
+ attr_accessor :buffer

- def buffer_limit
- if state[:last_fetched] and state[:last_fetched] < state[:filter] and self.buffer.size == 0
- state[:last_fetched]
- else
- state[:filter]
+ def initialize(db, state)
+ super(db, state)
+ @state = { primary_key: order_by(state[:table_name]).first, filter: 0 }.merge(state)
+ @state[:chunksize] ||= DEFAULT_CHUNKSIZE
+ @buffer = []
  end
- end

- def calc_limit(chunksize)
- # we want to not fetch more than is needed while we're
- # inside sinatra but locally we can select more than
- # is strictly needed
- if defined?(Sinatra)
- (chunksize * 1.1).ceil
- else
- (chunksize * 3).ceil
+ def primary_key
+ state[:primary_key].to_sym
  end
- end

- def load_buffer(chunksize)
- # make sure BasicObject is not polluted by subsequent requires
- Sequel::BasicObject.remove_methods!
+ def buffer_limit
+ if state[:last_fetched] && (state[:last_fetched] < state[:filter]) && buffer.empty?
+ state[:last_fetched]
+ else
+ state[:filter]
+ end
+ end

- num = 0
- loop do
- limit = calc_limit(chunksize)
- # we have to use local variables in order for the virtual row filter to work correctly
- key = primary_key
- buf_limit = buffer_limit
- ds = table.order(*order_by).filter { key.sql_number > buf_limit }.limit(limit)
- log.debug "DataStreamKeyed#load_buffer SQL -> #{ds.sql}"
- data = ds.all
- self.buffer += data
- num += data.size
- if data.size > 0
- # keep a record of the last primary key value in the buffer
- state[:filter] = self.buffer.last[ primary_key ]
+ def calc_limit(chunksize)
+ # we want to not fetch more than is needed while we're
+ # inside sinatra but locally we can select more than
+ # is strictly needed
+ if defined?(Sinatra)
+ (chunksize * 1.1).ceil
+ else
+ (chunksize * 3).ceil
  end
+ end

- break if num >= chunksize or data.size == 0
+ def load_buffer(chunksize)
+ # make sure BasicObject is not polluted by subsequent requires
+ Sequel::BasicObject.remove_methods!
+
+ num = 0
+ loop do
+ limit = calc_limit(chunksize)
+ # we have to use local variables in order for the virtual row filter to work correctly
+ key = primary_key
+ buf_limit = buffer_limit
+ ds = table.order(*order_by).filter { key.sql_number > buf_limit }.limit(limit)
+ log.debug "DataStreamKeyed#load_buffer SQL -> #{ds.sql}"
+ data = ds.all
+ self.buffer += data
+ num += data.size
+ unless data.empty?
+ # keep a record of the last primary key value in the buffer
+ state[:filter] = self.buffer.last[primary_key]
+ end
+
+ break if (num >= chunksize) || data.empty?
+ end
  end
- end

- def fetch_buffered(chunksize)
- load_buffer(chunksize) if self.buffer.size < chunksize
- rows = buffer.slice(0, chunksize)
- state[:last_fetched] = if rows.size > 0
- rows.last[ primary_key ]
- else
- nil
+ def fetch_buffered(chunksize)
+ load_buffer(chunksize) if self.buffer.size < chunksize
+ rows = buffer.slice(0, chunksize)
+ state[:last_fetched] = (rows.last[primary_key] unless rows.empty?)
+ rows
  end
- rows
- end

- def import_rows(rows)
- table.import(rows[:header], rows[:data])
- end
+ def import_rows(rows)
+ table.import(rows[:header], rows[:data])
+ end

- def fetch_rows
- chunksize = state[:chunksize]
- Taps::Utils.format_data(fetch_buffered(chunksize) || [],
- :string_columns => string_columns)
- end
+ def fetch_rows
+ chunksize = state[:chunksize]
+ Taps::Utils.format_data(fetch_buffered(chunksize) || [],
+ string_columns: string_columns)
+ end

- def increment(row_count)
- # pop the rows we just successfully sent off the buffer
- @buffer.slice!(0, row_count)
- end
+ def increment(row_count)
+ # pop the rows we just successfully sent off the buffer
+ @buffer.slice!(0, row_count)
+ end

- def verify_stream
- key = primary_key
- ds = table.order(*order_by)
- current_filter = ds.max(key.sql_number)
+ def verify_stream
+ key = primary_key
+ ds = table.order(*order_by)
+ current_filter = ds.max(key.sql_number)

- # set the current filter to the max of the primary key
- state[:filter] = current_filter
- # clear out the last_fetched value so it can restart from scratch
- state[:last_fetched] = nil
+ # set the current filter to the max of the primary key
+ state[:filter] = current_filter
+ # clear out the last_fetched value so it can restart from scratch
+ state[:last_fetched] = nil

- log.debug "DataStreamKeyed#verify_stream -> state: #{state.inspect}"
+ log.debug "DataStreamKeyed#verify_stream -> state: #{state.inspect}"
+ end
  end
  end
-
- end
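
The DataStream hunk above keeps the two paging strategies intact: DataStream pages with LIMIT/OFFSET, while DataStreamKeyed filters on the last-seen primary key (which is why tables without primary keys transfer slowly, as the README notes). A standalone sketch of the two query shapes using Sequel against an in-memory SQLite database (requires the sequel and sqlite3 gems; table and column names are made up):

    require 'sequel'

    db = Sequel.sqlite
    db.create_table(:logs) { primary_key :id; String :message }
    logs = db[:logs]

    chunksize, offset, last_id = 2, 4, 10

    # DataStream#fetch_rows style: LIMIT/OFFSET paging, slows down as offset grows
    puts logs.order(:id).limit(chunksize, offset).sql
    # DataStreamKeyed#load_buffer style: filter on the last-seen primary key instead
    puts logs.order(:id).where { id > last_id }.limit(chunksize).sql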