multisert 0.0.1 → 0.0.2
- data/LICENSE +10 -19
- data/README.md +78 -1
- data/lib/multisert.rb +5 -7
- data/lib/multisert/version.rb +1 -1
- data/performance/multisert_performance_test.rb +85 -0
- data/performance/performance_helper.rb +62 -0
- data/spec/multisert_spec.rb +30 -2
- data/spec/spec_helper.rb +35 -34
- metadata +4 -2
data/LICENSE
CHANGED
@@ -1,22 +1,13 @@
-Copyright
+Copyright 2013 Jeff Iacono
 
-
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
 
-
-a copy of this software and associated documentation files (the
-"Software"), to deal in the Software without restriction, including
-without limitation the rights to use, copy, modify, merge, publish,
-distribute, sublicense, and/or sell copies of the Software, and to
-permit persons to whom the Software is furnished to do so, subject to
-the following conditions:
+http://www.apache.org/licenses/LICENSE-2.0
 
-
-
-
-
-
-MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
-NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
-LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
-OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
-WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
data/README.md
CHANGED
@@ -1,6 +1,7 @@
 # Multisert
 
-
+Multisert is a buffer that handles bundling up INSERTs, which increases runtime
+performance.
 
 ## Installation
 
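The README line added in this hunk describes the core idea: rows are collected in a buffer and written as one multi-row INSERT rather than one statement per row. The sketch below illustrates that idea only. `TinyBuffer` and its `statements` array are hypothetical names and not the gem's implementation, and the values are assumed to be plain integers so that no quoting is needed.

```ruby
# Hypothetical illustration only; TinyBuffer is not part of multisert.
class TinyBuffer
  attr_reader :statements

  def initialize(table, fields, max_buffer_count)
    @table = table
    @fields = fields
    @max_buffer_count = max_buffer_count
    @entries = []
    @statements = [] # stands in for calls to connection.query
  end

  # Queue one row; auto-flush once the buffer is full.
  def <<(row)
    @entries << row
    flush! if @entries.size >= @max_buffer_count
    self
  end

  # Emit a single multi-row INSERT for everything queued so far.
  def flush!
    return if @entries.empty?
    values = @entries.map { |row| "(#{row.join(', ')})" }.join(', ')
    @statements << "INSERT INTO #{@table} (#{@fields.join(', ')}) VALUES #{values}"
    @entries = []
  end
end

buffer = TinyBuffer.new('db.t', %w[a b], 2)
buffer << [1, 2] << [3, 4] << [5, 6]
buffer.flush! # writes the one row left over after the auto-flush
puts buffer.statements
```

Three rows with a buffer size of 2 produce two statements: one from the auto-flush and one from the final `flush!`, which is why the README asks for an explicit flush at the end of a script.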
@@ -70,6 +71,52 @@ script. This ensures that any pending entries are written to the database table
 that were not automatically taken care of by the auto-flush that will kick in
 during the iteration.
 
+## Performance
+
+The gem has a quick performance test built in that can be run via:
+```bash
+$ ruby ./performance/multisert_performance_test
+```
+We ran the performance test (with some modification to iterate the test 5
+times) and receive the following output:
+
+```bash
+$ ruby ./performance/multisert_performance_test
+# test 1:
+# insert w/o buffer took 53.37s to insert 100000 entries
+# multisert w/ buffer of 10000 took 1.77s to insert 100000 entries
+#
+# test 2:
+# insert w/o buffer took 53.22s to insert 100000 entries
+# multisert w/ buffer of 10000 took 1.84s to insert 100000 entries
+#
+# test 3:
+# insert w/o buffer took 54.42s to insert 100000 entries
+# multisert w/ buffer of 10000 took 1.9s to insert 100000 entries
+#
+# test 4:
+# insert w/o buffer took 53.38s to insert 100000 entries
+# multisert w/ buffer of 10000 took 1.81s to insert 100000 entries
+#
+# test 5:
+# insert w/o buffer took 53.52s to insert 100000 entries
+# multisert w/ buffer of 10000 took 1.78s to insert 100000 entries
+```
+
+As we can see, ~30x performance increase.
+
+The performance test was run on a computer with the following specs:
+
+Model Name: MacBook Air
+Model Identifier: MacBookAir4,2
+Processor Name: Intel Core i5
+Processor Speed: 1.7 GHz
+Number of Processors: 1
+Total Number of Cores: 2
+L2 Cache (per Core): 256 KB
+L3 Cache: 3 MB
+Memory: 4 GB
+
 ## FAQ
 
 ### Packet Too Large / Connection Lost Errors
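The "~30x" figure in the new README section can be checked from the five runs it reports. The snippet below is a small sketch that averages the quoted timings; the `mean` helper is introduced here for illustration and the only inputs are the numbers printed in the README output.

```ruby
# Timings in seconds, copied from the five README test runs.
insert_times    = [53.37, 53.22, 54.42, 53.38, 53.52]
multisert_times = [1.77, 1.84, 1.9, 1.81, 1.78]

# Arithmetic mean of a list of floats.
def mean(values)
  values.sum / values.size
end

speedup = mean(insert_times) / mean(multisert_times)

puts "mean insert:    #{mean(insert_times).round(2)}s"
puts "mean multisert: #{mean(multisert_times).round(2)}s"
puts "speedup:        #{speedup.round(1)}x"
```

On these numbers the ratio comes out at roughly 29.4x, which is consistent with the README's rounded claim.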
@@ -83,6 +130,20 @@ To learn more, [read the documentation](http://dev.mysql.com/doc/refman/5.5/en//
 If you need to you can adjust the buffer size by setting `max_buffer_count`
 attribute. Generally, 10,000 to 100,000 is a pretty good starting range.
 
+### Does it work with Dates?
+
+Yes, just pass in a Date instance and it will be converted to a mysql
+friendly format under the hood ("%Y-%m-%d"). If you need a special format,
+convert the date to a string that is in the form you want before passing it into
+Multisert.
+
+### Does it work with Times?
+
+Yes, just pass in a Time instance and it will be converted to a mysql
+friendly format under the hood ("%Y-%m-%d %H:%M:%S"). If you need a special
+format, convert the time to a string that is in the form you want before passing
+it into Multisert.
+
 ## Contributing
 
 1. Fork it
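The two FAQ entries added here name the default formats and suggest converting to a string first when a different one is needed. As a minimal sketch using only Ruby's standard `strftime`, the lines below show what those defaults produce and what a caller-side conversion might look like. The compact `%Y%m%d` form is just an example choice, not something the README prescribes.

```ruby
require 'date'

date = Date.new(2013, 1, 15)
time = Time.new(2013, 1, 15, 1, 5, 11)

# Formats the README says are applied under the hood.
default_date = date.strftime('%Y-%m-%d')
default_time = time.strftime('%Y-%m-%d %H:%M:%S')

# A special format is produced by the caller and then passed in as a String.
custom_date = date.strftime('%Y%m%d')

puts default_date
puts default_time
puts custom_date
```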
@@ -90,3 +151,19 @@ attribute. Generally, 10,000 to 100,000 is a pretty good starting range.
 3. Commit your changes (`git commit -am 'Added some feature'`)
 4. Push to the branch (`git push origin my-new-feature`)
 5. Create new Pull Request
+
+## License
+
+Copyright 2013 Jeff Iacono
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
data/lib/multisert.rb
CHANGED
@@ -68,13 +68,11 @@ private
 
   def cast value
     case value
-
-
-
-when
-
-else
-value
+    # TODO: want to escape the string too, checking for " and ;
+    when String then "'#{value}'"
+    when Date then "'#{value.strftime("%Y-%m-%d")}'"
+    when Time then "'#{value.strftime("%Y-%m-%d %H:%M:%S")}'"
+    else value
     end
   end
 end
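The TODO in the new `cast` notes that string values are quoted but not yet escaped. The sketch below shows one way the String branch could escape backslashes and single quotes while keeping the Date and Time branches as in the diff. `cast_with_escaping` is a hypothetical name and this is not the gem's code. When a mysql2 client is available, its own `Mysql2::Client.escape` is probably the more robust choice.

```ruby
require 'date'

# Hypothetical variant of `cast`; not part of multisert.
def cast_with_escaping(value)
  case value
  when String
    # Prefix each backslash or single quote with a backslash.
    escaped = value.gsub(/[\\']/) { |ch| "\\" + ch }
    "'#{escaped}'"
  when Date then "'#{value.strftime('%Y-%m-%d')}'"
  when Time then "'#{value.strftime('%Y-%m-%d %H:%M:%S')}'"
  else value
  end
end

puts cast_with_escaping("O'Brien")
puts cast_with_escaping(Date.new(2013, 1, 15))
puts cast_with_escaping(Time.new(2013, 1, 15, 1, 5, 11))
puts cast_with_escaping(42)
```

Other values fall through unchanged, as in the original `else value` branch.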
data/lib/multisert/version.rb
CHANGED
data/performance/multisert_performance_test.rb
ADDED
@@ -0,0 +1,85 @@
+require './performance/performance_helper'
+
+PERFORMANCE_DATABASE = 'multisert_performance'
+PERFORMANCE_TABLE = 'performance_data'
+PERFORMANCE_DESTINATION = "#{PERFORMANCE_DATABASE}.#{PERFORMANCE_TABLE}"
+NUM_OF_OPERATIONS = 100_000
+CONNECTION = Mysql2::Client.new(host: 'localhost', username: 'root')
+
+def puts_with_time content
+  puts "[#{Time.now}] #{content}"
+end
+
+def generate_records records_count = NUM_OF_OPERATIONS
+  puts_with_time "generating #{records_count} random entries"
+  sample_records = (0...records_count).reduce([]) do |memo, i|
+    memo << {'field_1' => i,
+             'field_2' => i + 1,
+             'field_3' => i + 2,
+             'field_4' => i + 3}
+    memo
+  end
+  puts_with_time "generated #{records_count} random entries"
+  sample_records
+end
+
+def ensure_data_completeness! connection, datastore, expected_count
+  unless (res = connection.query("SELECT COUNT(*) AS the_count FROM #{datastore}").to_a.first['the_count']) == expected_count
+    raise RuntimeError, "data not written completely. Got #{res}, expected #{expected_count}"
+  end
+end
+
+def insert_performance_test connection, cleaner, sample_records, destination
+  fields = sample_records.first.keys.join(', ')
+
+  cleaner.ensure_clean_database!
+
+  (timer = Timer.new).start!
+  sample_records.each do |record|
+    connection.query %[
+      INSERT INTO #{destination} (#{fields})
+      VALUES (#{record.map { |k,v| v }.join(', ')})]
+  end
+  runtime = timer.stop!
+  ensure_data_completeness! connection, destination, sample_records.count
+  puts "insert w/o buffer took #{runtime.round(2)}s to insert #{sample_records.count} entries"
+end
+
+def multinsert_performance_test connection, cleaner, sample_records, destination, max_buffer_count = nil
+  database, table = destination.split('.')
+
+  buffer = Multisert.new connection: connection,
+                         database: database,
+                         table: table,
+                         fields: sample_records.first.keys,
+                         max_buffer_count: max_buffer_count
+
+  cleaner.ensure_clean_database!
+
+  (timer = Timer.new).start!
+  sample_records.each do |record|
+    buffer << record.map { |k, v| v }
+  end
+  buffer.flush!
+  runtime = timer.stop!
+  ensure_data_completeness! connection, destination, sample_records.count
+  puts "multisert w/ buffer of #{buffer.max_buffer_count} took #{runtime.round(2)}s to insert #{sample_records.count} entries"
+end
+
+cleaner = MrClean.new(database: PERFORMANCE_DATABASE, connection: CONNECTION)
+cleaner.create_table_schemas << %[
+  CREATE TABLE IF NOT EXISTS #{PERFORMANCE_DATABASE}.#{PERFORMANCE_TABLE} (
+    field_1 int default null
+  , field_2 int default null
+  , field_3 int default null
+  , field_4 int default null
+  )]
+
+sample_records = generate_records
+
+puts_with_time "starting performance test: using #{sample_records.count} random entries, writing to #{PERFORMANCE_DESTINATION}"
+
+#insert_performance_test CONNECTION, cleaner, sample_records, PERFORMANCE_DESTINATION
+(0..10_000).step(10) do |i|
+  multinsert_performance_test CONNECTION, cleaner, sample_records, PERFORMANCE_DESTINATION, i
+end
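`generate_records` in the performance test builds its sample rows by appending to an accumulator inside `reduce`. As a side note rather than a proposed change, the sketch below compares that pattern with `Array.new` and a block, which yields the same array. Both method names are hypothetical, and the logging calls are left out to keep the example self-contained.

```ruby
# Same construction as the performance test, minus the logging.
def generate_with_reduce(count)
  (0...count).reduce([]) do |memo, i|
    memo << { 'field_1' => i, 'field_2' => i + 1, 'field_3' => i + 2, 'field_4' => i + 3 }
    memo
  end
end

# Hypothetical alternative: Array.new calls the block once per index.
def generate_with_array_new(count)
  Array.new(count) do |i|
    { 'field_1' => i, 'field_2' => i + 1, 'field_3' => i + 2, 'field_4' => i + 3 }
  end
end

puts generate_with_reduce(3) == generate_with_array_new(3)
```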
data/performance/performance_helper.rb
ADDED
@@ -0,0 +1,62 @@
+require 'rubygems'
+require 'bundler'
+Bundler.require
+require 'mysql2'
+require './lib/multisert'
+
+class Timer
+  def start!
+    @start = Time.now
+  end
+
+  def stop!
+    Time.now - @start
+  end
+end
+
+# TODO: convert to gem
+class MrClean
+  attr_accessor :connection, :database, :create_table_schemas
+
+  def initialize attrs = {}
+    @connection = attrs[:connection]
+    @database = attrs[:database]
+    @create_table_schemas = attrs[:create_table_schemas] || []
+    yield self if block_given?
+  end
+
+  def ensure_clean_database! opts = {}
+    clean_database! !!opts[:teardown_tables]
+    ensure_tables!
+  end
+
+private
+
+  def database_exists?
+    @connection.query('show databases').to_a.map { |database|
+      database['Database']
+    }.include?(@database)
+  end
+
+  def ensure_database!
+    @connection.query "create database if not exists #{@database}"
+  end
+
+  def clean_database! teardown_tables
+    return unless database_exists?
+    @connection.query("show tables in #{@database}").to_a.each do |table|
+      if teardown_tables
+        @connection.query("drop table if exists #{@database}.#{table["Tables_in_#{@database}"]}")
+      else
+        @connection.query("truncate #{@database}.#{table["Tables_in_#{@database}"]}")
+      end
+    end
+  end
+
+  def ensure_tables!
+    ensure_database!
+    @create_table_schemas.each do |create_table_schema|
+      @connection.query create_table_schema
+    end
+  end
+end
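The helper's `Timer` measures elapsed time by subtracting two `Time.now` readings, which follows the wall clock. Below is a sketch of a variant that reads the monotonic clock through the standard `Process.clock_gettime` call, so the measurement is unaffected by system clock adjustments. `MonotonicTimer` is a hypothetical name and not part of the package.

```ruby
# Hypothetical variant of the helper's Timer; not part of multisert.
class MonotonicTimer
  def start!
    @start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end

  # Returns the elapsed seconds since start! as a Float.
  def stop!
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - @start
  end
end

(timer = MonotonicTimer.new).start!
sleep 0.01
elapsed = timer.stop!
puts "elapsed: #{elapsed.round(3)}s"
```

The interface matches the helper's `start!` and `stop!`, so under that assumption it could stand in wherever the performance test creates a `Timer`.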
data/spec/multisert_spec.rb
CHANGED
@@ -9,7 +9,7 @@ TEST_TABLE = 'test_data'
 # TODO: make into yaml config
 $connection = Mysql2::Client.new(host: 'localhost', username: 'root')
 
-$cleaner = MrClean.new(database: TEST_DATABASE, connection: $connection) do |mgr|
+$cleaner = MultisertSpec::MrClean.new(database: TEST_DATABASE, connection: $connection) do |mgr|
   mgr.create_table_schemas << %[
     CREATE TABLE IF NOT EXISTS #{mgr.database}.#{TEST_TABLE} (
       test_field_int_1 int default null,
@@ -17,7 +17,8 @@ $cleaner = MrClean.new(database: TEST_DATABASE, connection: $connection) do |mgr
       test_field_int_3 int default null,
       test_field_int_4 int default null,
       test_field_varchar varchar(10) default null,
-      test_field_date DATE default null
+      test_field_date DATE default null,
+      test_field_datetime DATETIME default null
     )]
 end
 
@@ -147,5 +148,32 @@ describe Multisert do
 
       buffer.entries.should == []
     end
+
+    it "works with times" do
+      pre_flush_records = connection.query "SELECT * FROM #{TEST_DATABASE}.#{TEST_TABLE}"
+      pre_flush_records.to_a.should == []
+
+      buffer.connection = connection
+      buffer.database = TEST_DATABASE
+      buffer.table = TEST_TABLE
+      buffer.fields = ['test_field_datetime']
+
+      buffer << [Time.new(2013, 1, 15, 1, 5, 11)]
+      buffer << [Time.new(2013, 1, 16, 2, 6, 22)]
+      buffer << [Time.new(2013, 1, 17, 3, 7, 33)]
+      buffer << [Time.new(2013, 1, 18, 4, 8, 44)]
+
+      buffer.flush!
+
+      post_flush_records = connection.query %[SELECT test_field_datetime FROM #{TEST_DATABASE}.#{TEST_TABLE}]
+
+      post_flush_records.to_a.should == [
+        {'test_field_datetime' => Time.new(2013, 1, 15, 1, 5, 11)},
+        {'test_field_datetime' => Time.new(2013, 1, 16, 2, 6, 22)},
+        {'test_field_datetime' => Time.new(2013, 1, 17, 3, 7, 33)},
+        {'test_field_datetime' => Time.new(2013, 1, 18, 4, 8, 44)}]
+
+      buffer.entries.should == []
+    end
   end
 end
data/spec/spec_helper.rb
CHANGED
@@ -1,46 +1,47 @@
-
-
+module MultisertSpec
+  class MrClean
+    attr_accessor :connection, :database, :create_table_schemas
 
-
-
-
-
-
-
+    def initialize attrs = {}
+      @connection = attrs[:connection]
+      @database = attrs[:database]
+      @create_table_schemas = attrs[:create_table_schemas] || []
+      yield self if block_given?
+    end
 
-
-
-
-
+    def ensure_clean_database! opts = {}
+      clean_database! !!opts[:teardown_tables]
+      ensure_tables!
+    end
 
-private
+  private
 
-
-
-
-
-
+    def database_exists?
+      @connection.query('show databases').to_a.map { |database|
+        database['Database']
+      }.include?(@database)
+    end
 
-
-
-
+    def ensure_database!
+      @connection.query "create database if not exists #{@database}"
+    end
 
-
-
-
-
-
-
-
-
+    def clean_database! teardown_tables
+      return unless database_exists?
+      @connection.query("show tables in #{@database}").to_a.each do |table|
+        if teardown_tables
+          @connection.query("drop table if exists #{@database}.#{table["Tables_in_#{@database}"]}")
+        else
+          @connection.query("truncate #{@database}.#{table["Tables_in_#{@database}"]}")
+        end
       end
     end
-  end
 
-
-
-
-
+    def ensure_tables!
+      ensure_database!
+      @create_table_schemas.each do |create_table_schema|
+        @connection.query create_table_schema
+      end
     end
   end
 end
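This hunk moves the spec's `MrClean` into a `MultisertSpec` module, and the spec now refers to it as `MultisertSpec::MrClean`. The source does not state the motivation. The sketch below, using stand-in classes with a hypothetical `origin` method, only illustrates the effect: a top-level `MrClean`, such as the one defined in `performance_helper.rb`, and the namespaced one are separate constants that can coexist.

```ruby
# Stand-in for a top-level class such as the one in performance_helper.rb.
class MrClean
  def origin
    'top level'
  end
end

# Stand-in for the namespaced class introduced in spec_helper.rb.
module MultisertSpec
  class MrClean
    def origin
      'MultisertSpec'
    end
  end
end

puts MrClean.new.origin
puts MultisertSpec::MrClean.new.origin
```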
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: multisert
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.0.2
 prerelease:
 platform: ruby
 authors:
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2013-03-
+date: 2013-03-09 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: mysql2
@@ -91,6 +91,8 @@ files:
 - lib/multisert.rb
 - lib/multisert/version.rb
 - multisert.gemspec
+- performance/multisert_performance_test.rb
+- performance/performance_helper.rb
 - spec/multisert_spec.rb
 - spec/spec_helper.rb
 homepage: https://github.com/jeffreyiacono/multisert