multisert 0.0.1 → 0.0.2
- data/LICENSE +10 -19
- data/README.md +78 -1
- data/lib/multisert.rb +5 -7
- data/lib/multisert/version.rb +1 -1
- data/performance/multisert_performance_test.rb +85 -0
- data/performance/performance_helper.rb +62 -0
- data/spec/multisert_spec.rb +30 -2
- data/spec/spec_helper.rb +35 -34
- metadata +4 -2
data/LICENSE
CHANGED
@@ -1,22 +1,13 @@
-Copyright
+Copyright 2013 Jeff Iacono
 
-
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
 
-
-a copy of this software and associated documentation files (the
-"Software"), to deal in the Software without restriction, including
-without limitation the rights to use, copy, modify, merge, publish,
-distribute, sublicense, and/or sell copies of the Software, and to
-permit persons to whom the Software is furnished to do so, subject to
-the following conditions:
+http://www.apache.org/licenses/LICENSE-2.0
 
-
-
-
-
-
-MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
-NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
-LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
-OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
-WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
data/README.md
CHANGED
@@ -1,6 +1,7 @@
 # Multisert
 
-
+Multisert is a buffer that handles bundling up INSERTs, which increases runtime
+performance.
 
 ## Installation
 
@@ -70,6 +71,52 @@ script. This ensures that any pending entries are written to the database table
 that were not automatically taken care of by the auto-flush that will kick in
 during the iteration.
 
+## Performance
+
+The gem has a quick performance test built in that can be run via:
+```bash
+$ ruby ./performance/multisert_performance_test
+```
+We ran the performance test (with some modification to iterate the test 5
+times) and receive the following output:
+
+```bash
+$ ruby ./performance/multisert_performance_test
+# test 1:
+# insert w/o buffer took 53.37s to insert 100000 entries
+# multisert w/ buffer of 10000 took 1.77s to insert 100000 entries
+#
+# test 2:
+# insert w/o buffer took 53.22s to insert 100000 entries
+# multisert w/ buffer of 10000 took 1.84s to insert 100000 entries
+#
+# test 3:
+# insert w/o buffer took 54.42s to insert 100000 entries
+# multisert w/ buffer of 10000 took 1.9s to insert 100000 entries
+#
+# test 4:
+# insert w/o buffer took 53.38s to insert 100000 entries
+# multisert w/ buffer of 10000 took 1.81s to insert 100000 entries
+#
+# test 5:
+# insert w/o buffer took 53.52s to insert 100000 entries
+# multisert w/ buffer of 10000 took 1.78s to insert 100000 entries
+```
+
+As we can see, ~30x performance increase.
+
+The performance test was run on a computer with the following specs:
+
+Model Name: MacBook Air
+Model Identifier: MacBookAir4,2
+Processor Name: Intel Core i5
+Processor Speed: 1.7 GHz
+Number of Processors: 1
+Total Number of Cores: 2
+L2 Cache (per Core): 256 KB
+L3 Cache: 3 MB
+Memory: 4 GB
+
 ## FAQ
 
 ### Packet Too Large / Connection Lost Errors
@@ -83,6 +130,20 @@ To learn more, [read the documentation](http://dev.mysql.com/doc/refman/5.5/en//
 If you need to you can adjust the buffer size by setting `max_buffer_count`
 attribute. Generally, 10,000 to 100,000 is a pretty good starting range.
 
+### Does it work with Dates?
+
+Yes, just pass in a Date instance and it will be converted to a mysql
+friendly format under the hood ("%Y-%m-%d"). If you need a special format,
+convert the date to a string that is in the form you want before passing it into
+Multisert.
+
+### Does it work with Times?
+
+Yes, just pass in a Time instance and it will be converted to a mysql
+friendly format under the hood ("%Y-%m-%d %H:%M:%S"). If you need a special
+format, convert the time to a string that is in the form you want before passing
+it into Multisert.
+
 ## Contributing
 
 1. Fork it
@@ -90,3 +151,19 @@ attribute. Generally, 10,000 to 100,000 is a pretty good starting range.
 3. Commit your changes (`git commit -am 'Added some feature'`)
 4. Push to the branch (`git push origin my-new-feature`)
 5. Create new Pull Request
+
+## License
+
+Copyright 2013 Jeff Iacono
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
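The "~30x" figure in the README's new Performance section can be checked against the five runs it quotes. The following is a minimal sketch that uses only those reported timings; the variable and method names are illustrative and not part of the gem.

```ruby
# Timings in seconds, copied from the five test runs quoted in the README diff.
unbuffered = [53.37, 53.22, 54.42, 53.38, 53.52]
buffered   = [1.77, 1.84, 1.9, 1.81, 1.78]

# Arithmetic mean of a list of floats.
def mean(values)
  values.sum / values.size
end

speedup = mean(unbuffered) / mean(buffered)

puts "mean unbuffered: #{mean(unbuffered).round(2)}s"
puts "mean buffered:   #{mean(buffered).round(2)}s"
puts "speedup:         #{speedup.round(1)}x"
```

Averaged over the five runs the ratio comes out a little under 30, so the README's "~30x" is a fair rounding for that machine and buffer size.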
data/lib/multisert.rb
CHANGED
@@ -68,13 +68,11 @@ private
 
   def cast value
     case value
-
-
-
-    when
-
-    else
-      value
+    # TODO: want to escape the string too, checking for " and ;
+    when String then "'#{value}'"
+    when Date then "'#{value.strftime("%Y-%m-%d")}'"
+    when Time then "'#{value.strftime("%Y-%m-%d %H:%M:%S")}'"
+    else value
     end
   end
 end
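The new `cast` wraps strings in single quotes without escaping them, as its TODO acknowledges. The sketch below is a standalone illustration of the same type dispatch with a simple escaping step added. The name `cast_value` and the escaping rule (backslashes, then single quotes) are assumptions made for this example and are not the gem's code; in practice the driver's own escaping (for instance `Mysql2::Client.escape`) is likely the safer choice.

```ruby
require 'date'

# Hypothetical standalone variant of the gem's private `cast` method.
def cast_value(value)
  case value
  when String
    # Illustrative escaping only: backslashes first, then single quotes.
    escaped = value.gsub('\\') { '\\\\' }.gsub("'") { "\\'" }
    "'#{escaped}'"
  when Date then "'#{value.strftime('%Y-%m-%d')}'"
  when Time then "'#{value.strftime('%Y-%m-%d %H:%M:%S')}'"
  else value
  end
end

puts cast_value("O'Reilly")
puts cast_value(Date.new(2013, 1, 15))
puts cast_value(Time.new(2013, 1, 15, 1, 5, 11))
puts cast_value(42)
```

One detail worth knowing when relying on this dispatch: `DateTime` is a subclass of `Date`, so a `DateTime` value matches the `Date` branch and is formatted without its time part.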
data/lib/multisert/version.rb
CHANGED
data/performance/multisert_performance_test.rb
ADDED
@@ -0,0 +1,85 @@
+require './performance/performance_helper'
+
+PERFORMANCE_DATABASE = 'multisert_performance'
+PERFORMANCE_TABLE = 'performance_data'
+PERFORMANCE_DESTINATION = "#{PERFORMANCE_DATABASE}.#{PERFORMANCE_TABLE}"
+NUM_OF_OPERATIONS = 100_000
+CONNECTION = Mysql2::Client.new(host: 'localhost', username: 'root')
+
+def puts_with_time content
+  puts "[#{Time.now}] #{content}"
+end
+
+def generate_records records_count = NUM_OF_OPERATIONS
+  puts_with_time "generating #{records_count} random entries"
+  sample_records = (0...records_count).reduce([]) do |memo, i|
+    memo << {'field_1' => i,
+             'field_2' => i + 1,
+             'field_3' => i + 2,
+             'field_4' => i + 3}
+    memo
+  end
+  puts_with_time "generated #{records_count} random entries"
+  sample_records
+end
+
+def ensure_data_completeness! connection, datastore, expected_count
+  unless (res = connection.query("SELECT COUNT(*) AS the_count FROM #{datastore}").to_a.first['the_count']) == expected_count
+    raise RuntimeError, "data not written completely. Got #{res}, expected #{expected_count}"
+  end
+end
+
+def insert_performance_test connection, cleaner, sample_records, destination
+  fields = sample_records.first.keys.join(', ')
+
+  cleaner.ensure_clean_database!
+
+  (timer = Timer.new).start!
+  sample_records.each do |record|
+    connection.query %[
+      INSERT INTO #{destination} (#{fields})
+      VALUES (#{record.map { |k,v| v }.join(', ')})]
+  end
+  runtime = timer.stop!
+  ensure_data_completeness! connection, destination, sample_records.count
+  puts "insert w/o buffer took #{runtime.round(2)}s to insert #{sample_records.count} entries"
+end
+
+def multinsert_performance_test connection, cleaner, sample_records, destination, max_buffer_count = nil
+  database, table = destination.split('.')
+
+  buffer = Multisert.new connection: connection,
+                         database: database,
+                         table: table,
+                         fields: sample_records.first.keys,
+                         max_buffer_count: max_buffer_count
+
+  cleaner.ensure_clean_database!
+
+  (timer = Timer.new).start!
+  sample_records.each do |record|
+    buffer << record.map { |k, v| v }
+  end
+  buffer.flush!
+  runtime = timer.stop!
+  ensure_data_completeness! connection, destination, sample_records.count
+  puts "multisert w/ buffer of #{buffer.max_buffer_count} took #{runtime.round(2)}s to insert #{sample_records.count} entries"
+end
+
+cleaner = MrClean.new(database: PERFORMANCE_DATABASE, connection: CONNECTION)
+cleaner.create_table_schemas << %[
+  CREATE TABLE IF NOT EXISTS #{PERFORMANCE_DATABASE}.#{PERFORMANCE_TABLE} (
+    field_1 int default null
+  , field_2 int default null
+  , field_3 int default null
+  , field_4 int default null
+  )]
+
+sample_records = generate_records
+
+puts_with_time "starting performance test: using #{sample_records.count} random entries, writing to #{PERFORMANCE_DESTINATION}"
+
+#insert_performance_test CONNECTION, cleaner, sample_records, PERFORMANCE_DESTINATION
+(0..10_000).step(10) do |i|
+  multinsert_performance_test CONNECTION, cleaner, sample_records, PERFORMANCE_DESTINATION, i
+end
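The unbuffered test above sends one `INSERT` per record, while the buffered path presumably gains its speed by sending many rows in a single statement. Multisert's own statement builder is not part of this diff, so the following is only a sketch of the general multi-row `INSERT` technique. `build_multi_insert` is a hypothetical helper, and the values are assumed to be plain integers that need no quoting.

```ruby
# Hypothetical helper: builds one multi-row INSERT statement from many rows.
# Assumes every value is numeric, so no quoting or escaping is applied.
def build_multi_insert(destination, fields, rows)
  tuples = rows.map { |row| "(#{row.join(', ')})" }
  "INSERT INTO #{destination} (#{fields.join(', ')}) VALUES #{tuples.join(', ')}"
end

# Three sample rows shaped like the records generated by the performance test.
rows = (0...3).map { |i| [i, i + 1, i + 2, i + 3] }
sql = build_multi_insert('multisert_performance.performance_data',
                         %w[field_1 field_2 field_3 field_4],
                         rows)
puts sql
```

With a buffer of 10,000 entries, 100,000 records would need 10 statements of this shape instead of 100,000 single-row ones, which is consistent with the timings reported in the README.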
data/performance/performance_helper.rb
ADDED
@@ -0,0 +1,62 @@
+require 'rubygems'
+require 'bundler'
+Bundler.require
+require 'mysql2'
+require './lib/multisert'
+
+class Timer
+  def start!
+    @start = Time.now
+  end
+
+  def stop!
+    Time.now - @start
+  end
+end
+
+# TODO: convert to gem
+class MrClean
+  attr_accessor :connection, :database, :create_table_schemas
+
+  def initialize attrs = {}
+    @connection = attrs[:connection]
+    @database = attrs[:database]
+    @create_table_schemas = attrs[:create_table_schemas] || []
+    yield self if block_given?
+  end
+
+  def ensure_clean_database! opts = {}
+    clean_database! !!opts[:teardown_tables]
+    ensure_tables!
+  end
+
+  private
+
+  def database_exists?
+    @connection.query('show databases').to_a.map { |database|
+      database['Database']
+    }.include?(@database)
+  end
+
+  def ensure_database!
+    @connection.query "create database if not exists #{@database}"
+  end
+
+  def clean_database! teardown_tables
+    return unless database_exists?
+    @connection.query("show tables in #{@database}").to_a.each do |table|
+      if teardown_tables
+        @connection.query("drop table if exists #{@database}.#{table["Tables_in_#{@database}"]}")
+      else
+        @connection.query("truncate #{@database}.#{table["Tables_in_#{@database}"]}")
+      end
+    end
+  end
+
+  def ensure_tables!
+    ensure_database!
+    @create_table_schemas.each do |create_table_schema|
+      @connection.query create_table_schema
+    end
+  end
+end
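The helper's `Timer` measures elapsed time with `Time.now`, which follows the wall clock and can shift if the system clock is adjusted during a run. A possible alternative, sketched here under the assumption that only elapsed seconds matter, reads the monotonic clock instead. `MonotonicTimer` is a hypothetical name and is not part of the gem.

```ruby
# Hypothetical alternative to the helper's Timer, based on a monotonic clock.
class MonotonicTimer
  # Records the starting point on the monotonic clock.
  def start!
    @start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end

  # Returns the seconds elapsed since start! as a Float.
  def stop!
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - @start
  end
end

timer = MonotonicTimer.new
timer.start!
sleep 0.01
elapsed = timer.stop!
puts "elapsed: #{elapsed.round(3)}s"
```

It keeps the same `start!` / `stop!` interface, so it could stand in for `Timer` in the performance test without other changes.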
data/spec/multisert_spec.rb
CHANGED
@@ -9,7 +9,7 @@ TEST_TABLE = 'test_data'
 # TODO: make into yaml config
 $connection = Mysql2::Client.new(host: 'localhost', username: 'root')
 
-$cleaner = MrClean.new(database: TEST_DATABASE, connection: $connection) do |mgr|
+$cleaner = MultisertSpec::MrClean.new(database: TEST_DATABASE, connection: $connection) do |mgr|
   mgr.create_table_schemas << %[
     CREATE TABLE IF NOT EXISTS #{mgr.database}.#{TEST_TABLE} (
       test_field_int_1 int default null,
@@ -17,7 +17,8 @@ $cleaner = MrClean.new(database: TEST_DATABASE, connection: $connection) do |mgr
       test_field_int_3 int default null,
       test_field_int_4 int default null,
       test_field_varchar varchar(10) default null,
-      test_field_date DATE default null
+      test_field_date DATE default null,
+      test_field_datetime DATETIME default null
     )]
 end
 
@@ -147,5 +148,32 @@ describe Multisert do
 
       buffer.entries.should == []
     end
+
+    it "works with times" do
+      pre_flush_records = connection.query "SELECT * FROM #{TEST_DATABASE}.#{TEST_TABLE}"
+      pre_flush_records.to_a.should == []
+
+      buffer.connection = connection
+      buffer.database = TEST_DATABASE
+      buffer.table = TEST_TABLE
+      buffer.fields = ['test_field_datetime']
+
+      buffer << [Time.new(2013, 1, 15, 1, 5, 11)]
+      buffer << [Time.new(2013, 1, 16, 2, 6, 22)]
+      buffer << [Time.new(2013, 1, 17, 3, 7, 33)]
+      buffer << [Time.new(2013, 1, 18, 4, 8, 44)]
+
+      buffer.flush!
+
+      post_flush_records = connection.query %[SELECT test_field_datetime FROM #{TEST_DATABASE}.#{TEST_TABLE}]
+
+      post_flush_records.to_a.should == [
+        {'test_field_datetime' => Time.new(2013, 1, 15, 1, 5, 11)},
+        {'test_field_datetime' => Time.new(2013, 1, 16, 2, 6, 22)},
+        {'test_field_datetime' => Time.new(2013, 1, 17, 3, 7, 33)},
+        {'test_field_datetime' => Time.new(2013, 1, 18, 4, 8, 44)}]
+
+      buffer.entries.should == []
+    end
   end
 end
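The new spec compares the stored values against `Time` instances built from whole seconds. That matters because the format used by `cast`, `"%Y-%m-%d %H:%M:%S"`, has no sub-second field. The sketch below illustrates the effect with plain Ruby formatting and parsing; it does not touch MySQL or mysql2, and `round_trip` is a hypothetical helper.

```ruby
require 'time'

# Formats a Time the way `cast` does, then parses it back in the local zone.
def round_trip(time, format = '%Y-%m-%d %H:%M:%S')
  Time.strptime(time.strftime(format), format)
end

whole      = Time.new(2013, 1, 15, 1, 5, 11)    # whole seconds, as in the spec
fractional = Time.new(2013, 1, 15, 1, 5, 11.5)  # carries half a second

puts round_trip(whole) == whole            # whole seconds survive the format
puts round_trip(fractional) == fractional  # the fractional part is dropped
```

A value such as `Time.now`, which usually carries a fractional second, would therefore not compare equal after being written through this format.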
data/spec/spec_helper.rb
CHANGED
@@ -1,46 +1,47 @@
-
-
+module MultisertSpec
+  class MrClean
+    attr_accessor :connection, :database, :create_table_schemas
 
-
-
-
-
-
-
+    def initialize attrs = {}
+      @connection = attrs[:connection]
+      @database = attrs[:database]
+      @create_table_schemas = attrs[:create_table_schemas] || []
+      yield self if block_given?
+    end
 
-
-
-
-
+    def ensure_clean_database! opts = {}
+      clean_database! !!opts[:teardown_tables]
+      ensure_tables!
+    end
 
-private
+    private
 
-
-
-
-
-
+    def database_exists?
+      @connection.query('show databases').to_a.map { |database|
+        database['Database']
+      }.include?(@database)
+    end
 
-
-
-
+    def ensure_database!
+      @connection.query "create database if not exists #{@database}"
+    end
 
-
-
-
-
-
-
-
-
+    def clean_database! teardown_tables
+      return unless database_exists?
+      @connection.query("show tables in #{@database}").to_a.each do |table|
+        if teardown_tables
+          @connection.query("drop table if exists #{@database}.#{table["Tables_in_#{@database}"]}")
+        else
+          @connection.query("truncate #{@database}.#{table["Tables_in_#{@database}"]}")
+        end
       end
     end
-end
 
-
-
-
-
+    def ensure_tables!
+      ensure_database!
+      @create_table_schemas.each do |create_table_schema|
+        @connection.query create_table_schema
+      end
     end
   end
 end
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: multisert
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.0.2
 prerelease:
 platform: ruby
 authors:
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2013-03-
+date: 2013-03-09 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: mysql2
@@ -91,6 +91,8 @@ files:
 - lib/multisert.rb
 - lib/multisert/version.rb
 - multisert.gemspec
+- performance/multisert_performance_test.rb
+- performance/performance_helper.rb
 - spec/multisert_spec.rb
 - spec/spec_helper.rb
 homepage: https://github.com/jeffreyiacono/multisert