multisert 0.0.1 → 0.0.2

data/LICENSE CHANGED
@@ -1,22 +1,13 @@
1
- Copyright (c) 2013 Jeff Iacono
1
+ Copyright 2013 Jeff Iacono
2
2
 
3
- MIT License
3
+ Licensed under the Apache License, Version 2.0 (the "License");
4
+ you may not use this file except in compliance with the License.
5
+ You may obtain a copy of the License at
4
6
 
5
- Permission is hereby granted, free of charge, to any person obtaining
6
- a copy of this software and associated documentation files (the
7
- "Software"), to deal in the Software without restriction, including
8
- without limitation the rights to use, copy, modify, merge, publish,
9
- distribute, sublicense, and/or sell copies of the Software, and to
10
- permit persons to whom the Software is furnished to do so, subject to
11
- the following conditions:
7
+ http://www.apache.org/licenses/LICENSE-2.0
12
8
 
13
- The above copyright notice and this permission notice shall be
14
- included in all copies or substantial portions of the Software.
15
-
16
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
- EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
- MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
- NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
- LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
- OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
- WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
9
+ Unless required by applicable law or agreed to in writing, software
10
+ distributed under the License is distributed on an "AS IS" BASIS,
11
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ See the License for the specific language governing permissions and
13
+ limitations under the License.
data/README.md CHANGED
@@ -1,6 +1,7 @@
1
1
  # Multisert
2
2
 
3
- TODO: Write a gem description
3
+ Multisert is a buffer that handles bundling up INSERTs, which increases runtime
4
+ performance.
4
5
 
5
6
  ## Installation
6
7
 
@@ -70,6 +71,52 @@ script. This ensures that any pending entries are written to the database table
70
71
  that were not automatically taken care of by the auto-flush that will kick in
71
72
  during the iteration.
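The buffering and auto-flush behavior described above can be sketched roughly as follows. This is not Multisert's implementation: `SketchBuffer`, its keyword arguments, and the `statements` array (which stands in for a database connection) are all hypothetical names, and the exact point at which the real gem auto-flushes is an assumption here.

```ruby
# Hypothetical stand-in for Multisert; every name here is illustrative only.
class SketchBuffer
  attr_reader :statements

  def initialize(table:, fields:, max_buffer_count: 3)
    @table = table
    @fields = fields
    @max_buffer_count = max_buffer_count
    @entries = []
    @statements = [] # a real buffer would send these to the database
  end

  # Queue one row; auto-flush once the buffer is full (assumed threshold).
  def <<(row)
    @entries << row
    flush! if @entries.size >= @max_buffer_count
    self
  end

  # Emit a single multi-row INSERT for everything still pending.
  def flush!
    return if @entries.empty?
    values = @entries.map { |row| "(#{row.join(', ')})" }.join(', ')
    @statements << "INSERT INTO #{@table} (#{@fields.join(', ')}) VALUES #{values}"
    @entries = []
  end
end

buffer = SketchBuffer.new(table: 'db.t', fields: %w[a b], max_buffer_count: 2)
[[1, 2], [3, 4], [5, 6]].each { |row| buffer << row }
buffer.flush! # writes the one leftover row that never triggered an auto-flush
puts buffer.statements
```

Three rows with a buffer of two produce two statements instead of three, which is why the final explicit `flush!` matters: without it the last row would stay in memory.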
72
73
 
74
+ ## Performance
75
+
76
+ The gem has a quick performance test built in that can be run via:
77
+ ```bash
78
+ $ ruby ./performance/multisert_performance_test
79
+ ```
80
+ We ran the performance test (modified to repeat the test 5
81
+ times) and received the following output:
82
+
83
+ ```bash
84
+ $ ruby ./performance/multisert_performance_test
85
+ # test 1:
86
+ # insert w/o buffer took 53.37s to insert 100000 entries
87
+ # multisert w/ buffer of 10000 took 1.77s to insert 100000 entries
88
+ #
89
+ # test 2:
90
+ # insert w/o buffer took 53.22s to insert 100000 entries
91
+ # multisert w/ buffer of 10000 took 1.84s to insert 100000 entries
92
+ #
93
+ # test 3:
94
+ # insert w/o buffer took 54.42s to insert 100000 entries
95
+ # multisert w/ buffer of 10000 took 1.9s to insert 100000 entries
96
+ #
97
+ # test 4:
98
+ # insert w/o buffer took 53.38s to insert 100000 entries
99
+ # multisert w/ buffer of 10000 took 1.81s to insert 100000 entries
100
+ #
101
+ # test 5:
102
+ # insert w/o buffer took 53.52s to insert 100000 entries
103
+ # multisert w/ buffer of 10000 took 1.78s to insert 100000 entries
104
+ ```
105
+
106
+ As we can see, buffering gives a ~30x performance increase.
107
+
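The speedup figure can be checked from the five runs listed above. The snippet below is only an arithmetic sketch; the timings are copied from the output, and the `mean` helper is a hypothetical name.

```ruby
# Timings (seconds) copied from the five test runs in the README output.
unbuffered = [53.37, 53.22, 54.42, 53.38, 53.52]
buffered   = [1.77, 1.84, 1.9, 1.81, 1.78]

# Hypothetical helper: arithmetic mean of a list of floats.
def mean(values)
  values.sum / values.size
end

speedup = mean(unbuffered) / mean(buffered)
puts "mean speedup: #{speedup.round(1)}x"
```

Averaged over the runs, the ratio comes out a little under 30, consistent with the rounded figure quoted in the README.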
108
+ The performance test was run on a computer with the following specs:
109
+
110
+ Model Name: MacBook Air
111
+ Model Identifier: MacBookAir4,2
112
+ Processor Name: Intel Core i5
113
+ Processor Speed: 1.7 GHz
114
+ Number of Processors: 1
115
+ Total Number of Cores: 2
116
+ L2 Cache (per Core): 256 KB
117
+ L3 Cache: 3 MB
118
+ Memory: 4 GB
119
+
73
120
  ## FAQ
74
121
 
75
122
  ### Packet Too Large / Connection Lost Errors
@@ -83,6 +130,20 @@ To learn more, [read the documentation](http://dev.mysql.com/doc/refman/5.5/en//
83
130
  If you need to you can adjust the buffer size by setting `max_buffer_count`
84
131
  attribute. Generally, 10,000 to 100,000 is a pretty good starting range.
85
132
 
133
+ ### Does it work with Dates?
134
+
135
+ Yes, just pass in a Date instance and it will be converted to a MySQL
136
+ friendly format under the hood ("%Y-%m-%d"). If you need a special format,
137
+ convert the date to a string that is in the form you want before passing it into
138
+ Multisert.
139
+
140
+ ### Does it work with Times?
141
+
142
+ Yes, just pass in a Time instance and it will be converted to a MySQL
143
+ friendly format under the hood ("%Y-%m-%d %H:%M:%S"). If you need a special
144
+ format, convert the time to a string that is in the form you want before passing
145
+ it into Multisert.
146
+
86
147
  ## Contributing
87
148
 
88
149
  1. Fork it
@@ -90,3 +151,19 @@ attribute. Generally, 10,000 to 100,000 is a pretty good starting range.
90
151
  3. Commit your changes (`git commit -am 'Added some feature'`)
91
152
  4. Push to the branch (`git push origin my-new-feature`)
92
153
  5. Create new Pull Request
154
+
155
+ ## License
156
+
157
+ Copyright 2013 Jeff Iacono
158
+
159
+ Licensed under the Apache License, Version 2.0 (the "License");
160
+ you may not use this file except in compliance with the License.
161
+ You may obtain a copy of the License at
162
+
163
+ http://www.apache.org/licenses/LICENSE-2.0
164
+
165
+ Unless required by applicable law or agreed to in writing, software
166
+ distributed under the License is distributed on an "AS IS" BASIS,
167
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
168
+ See the License for the specific language governing permissions and
169
+ limitations under the License.
data/lib/multisert.rb CHANGED
@@ -68,13 +68,11 @@ private
68
68
 
69
69
  def cast value
70
70
  case value
71
- when String
72
- # TODO: want to escape the string too, checking for " and ;
73
- "'#{value}'"
74
- when Date
75
- "'#{value}'"
76
- else
77
- value
71
+ # TODO: want to escape the string too, checking for " and ;
72
+ when String then "'#{value}'"
73
+ when Date then "'#{value.strftime("%Y-%m-%d")}'"
74
+ when Time then "'#{value.strftime("%Y-%m-%d %H:%M:%S")}'"
75
+ else value
78
76
  end
79
77
  end
80
78
  end
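To see what the new `cast` branches produce, here is a standalone sketch. `cast_sketch` is a hypothetical name, and the string escaping is an assumption added for illustration (the gem still carries a TODO for it); it is naive and not a substitute for the database driver's own escaping. One thing worth noting: `DateTime` is a subclass of `Date` in Ruby, so a `when Date` branch placed first will also match `DateTime` values and format them as date-only.

```ruby
require 'date'

# Hypothetical standalone version of the cast logic, for illustration only.
def cast_sketch(value)
  case value
  when String
    # Naive escaping (an assumption, not the gem's behavior):
    # double backslashes first, then double single quotes.
    "'#{value.gsub("\\") { "\\\\" }.gsub("'") { "''" }}'"
  when Date then "'#{value.strftime('%Y-%m-%d')}'"
  when Time then "'#{value.strftime('%Y-%m-%d %H:%M:%S')}'"
  else value
  end
end

puts cast_sketch(Date.new(2013, 3, 9))
puts cast_sketch(Time.new(2013, 1, 15, 1, 5, 11))
puts cast_sketch("O'Brien")
puts cast_sketch(42)
```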
data/lib/multisert/version.rb CHANGED
@@ -1,3 +1,3 @@
1
1
  class Multisert
2
- VERSION = "0.0.1"
2
+ VERSION = "0.0.2"
3
3
  end
data/performance/multisert_performance_test.rb ADDED
@@ -0,0 +1,85 @@
1
+ require './performance/performance_helper'
2
+
3
+ PERFORMANCE_DATABASE = 'multisert_performance'
4
+ PERFORMANCE_TABLE = 'performance_data'
5
+ PERFORMANCE_DESTINATION = "#{PERFORMANCE_DATABASE}.#{PERFORMANCE_TABLE}"
6
+ NUM_OF_OPERATIONS = 100_000
7
+ CONNECTION = Mysql2::Client.new(host: 'localhost', username: 'root')
8
+
9
+ def puts_with_time content
10
+ puts "[#{Time.now}] #{content}"
11
+ end
12
+
13
+ def generate_records records_count = NUM_OF_OPERATIONS
14
+ puts_with_time "generating #{records_count} random entries"
15
+ sample_records = (0...records_count).reduce([]) do |memo, i|
16
+ memo << {'field_1' => i,
17
+ 'field_2' => i + 1,
18
+ 'field_3' => i + 2,
19
+ 'field_4' => i + 3}
20
+ memo
21
+ end
22
+ puts_with_time "generated #{records_count} random entries"
23
+ sample_records
24
+ end
25
+
26
+ def ensure_data_completeness! connection, datastore, expected_count
27
+ unless (res = connection.query("SELECT COUNT(*) AS the_count FROM #{datastore}").to_a.first['the_count']) == expected_count
28
+ raise RuntimeError, "data not written completely. Got #{res}, expected #{expected_count}"
29
+ end
30
+ end
31
+
32
+ def insert_performance_test connection, cleaner, sample_records, destination
33
+ fields = sample_records.first.keys.join(', ')
34
+
35
+ cleaner.ensure_clean_database!
36
+
37
+ (timer = Timer.new).start!
38
+ sample_records.each do |record|
39
+ connection.query %[
40
+ INSERT INTO #{destination} (#{fields})
41
+ VALUES (#{record.map { |k,v| v }.join(', ')})]
42
+ end
43
+ runtime = timer.stop!
44
+ ensure_data_completeness! connection, destination, sample_records.count
45
+ puts "insert w/o buffer took #{runtime.round(2)}s to insert #{sample_records.count} entries"
46
+ end
47
+
48
+ def multinsert_performance_test connection, cleaner, sample_records, destination, max_buffer_count = nil
49
+ database, table = destination.split('.')
50
+
51
+ buffer = Multisert.new connection: connection,
52
+ database: database,
53
+ table: table,
54
+ fields: sample_records.first.keys,
55
+ max_buffer_count: max_buffer_count
56
+
57
+ cleaner.ensure_clean_database!
58
+
59
+ (timer = Timer.new).start!
60
+ sample_records.each do |record|
61
+ buffer << record.map { |k, v| v }
62
+ end
63
+ buffer.flush!
64
+ runtime = timer.stop!
65
+ ensure_data_completeness! connection, destination, sample_records.count
66
+ puts "multisert w/ buffer of #{buffer.max_buffer_count} took #{runtime.round(2)}s to insert #{sample_records.count} entries"
67
+ end
68
+
69
+ cleaner = MrClean.new(database: PERFORMANCE_DATABASE, connection: CONNECTION)
70
+ cleaner.create_table_schemas << %[
71
+ CREATE TABLE IF NOT EXISTS #{PERFORMANCE_DATABASE}.#{PERFORMANCE_TABLE} (
72
+ field_1 int default null
73
+ , field_2 int default null
74
+ , field_3 int default null
75
+ , field_4 int default null
76
+ )]
77
+
78
+ sample_records = generate_records
79
+
80
+ puts_with_time "starting performance test: using #{sample_records.count} random entries, writing to #{PERFORMANCE_DESTINATION}"
81
+
82
+ #insert_performance_test CONNECTION, cleaner, sample_records, PERFORMANCE_DESTINATION
83
+ (0..10_000).step(10) do |i|
84
+ multinsert_performance_test CONNECTION, cleaner, sample_records, PERFORMANCE_DESTINATION, i
85
+ end
data/performance/performance_helper.rb ADDED
@@ -0,0 +1,62 @@
1
+ require 'rubygems'
2
+ require 'bundler'
3
+ Bundler.require
4
+ require 'mysql2'
5
+ require './lib/multisert'
6
+
7
+ class Timer
8
+ def start!
9
+ @start = Time.now
10
+ end
11
+
12
+ def stop!
13
+ Time.now - @start
14
+ end
15
+ end
16
+
17
+ # TODO: convert to gem
18
+ class MrClean
19
+ attr_accessor :connection, :database, :create_table_schemas
20
+
21
+ def initialize attrs = {}
22
+ @connection = attrs[:connection]
23
+ @database = attrs[:database]
24
+ @create_table_schemas = attrs[:create_table_schemas] || []
25
+ yield self if block_given?
26
+ end
27
+
28
+ def ensure_clean_database! opts = {}
29
+ clean_database! !!opts[:teardown_tables]
30
+ ensure_tables!
31
+ end
32
+
33
+ private
34
+
35
+ def database_exists?
36
+ @connection.query('show databases').to_a.map { |database|
37
+ database['Database']
38
+ }.include?(@database)
39
+ end
40
+
41
+ def ensure_database!
42
+ @connection.query "create database if not exists #{@database}"
43
+ end
44
+
45
+ def clean_database! teardown_tables
46
+ return unless database_exists?
47
+ @connection.query("show tables in #{@database}").to_a.each do |table|
48
+ if teardown_tables
49
+ @connection.query("drop table if exists #{@database}.#{table["Tables_in_#{@database}"]}")
50
+ else
51
+ @connection.query("truncate #{@database}.#{table["Tables_in_#{@database}"]}")
52
+ end
53
+ end
54
+ end
55
+
56
+ def ensure_tables!
57
+ ensure_database!
58
+ @create_table_schemas.each do |create_table_schema|
59
+ @connection.query create_table_schema
60
+ end
61
+ end
62
+ end
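The `Timer` in this helper measures elapsed time with `Time.now`, which follows the wall clock and can jump if the system clock is adjusted mid-run. A possible alternative, shown only as a sketch, reads the monotonic clock instead. `MonotonicTimer` is a hypothetical name and is not part of the gem.

```ruby
# Hypothetical variant of Timer that reads the monotonic clock,
# which is not affected by wall-clock adjustments.
class MonotonicTimer
  def start!
    @start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end

  # Returns elapsed seconds as a Float since the last start!.
  def stop!
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - @start
  end
end

timer = MonotonicTimer.new
timer.start!
sleep 0.05
elapsed = timer.stop!
puts "elapsed: #{elapsed.round(2)}s"
```

It keeps the same `start!`/`stop!` interface, so it could be swapped into the performance test without other changes.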
data/spec/multisert_spec.rb CHANGED
@@ -9,7 +9,7 @@ TEST_TABLE = 'test_data'
9
9
  # TODO: make into yaml config
10
10
  $connection = Mysql2::Client.new(host: 'localhost', username: 'root')
11
11
 
12
- $cleaner = MrClean.new(database: TEST_DATABASE, connection: $connection) do |mgr|
12
+ $cleaner = MultisertSpec::MrClean.new(database: TEST_DATABASE, connection: $connection) do |mgr|
13
13
  mgr.create_table_schemas << %[
14
14
  CREATE TABLE IF NOT EXISTS #{mgr.database}.#{TEST_TABLE} (
15
15
  test_field_int_1 int default null,
@@ -17,7 +17,8 @@ $cleaner = MrClean.new(database: TEST_DATABASE, connection: $connection) do |mgr
17
17
  test_field_int_3 int default null,
18
18
  test_field_int_4 int default null,
19
19
  test_field_varchar varchar(10) default null,
20
- test_field_date DATE default null
20
+ test_field_date DATE default null,
21
+ test_field_datetime DATETIME default null
21
22
  )]
22
23
  end
23
24
 
@@ -147,5 +148,32 @@ describe Multisert do
147
148
 
148
149
  buffer.entries.should == []
149
150
  end
151
+
152
+ it "works with times" do
153
+ pre_flush_records = connection.query "SELECT * FROM #{TEST_DATABASE}.#{TEST_TABLE}"
154
+ pre_flush_records.to_a.should == []
155
+
156
+ buffer.connection = connection
157
+ buffer.database = TEST_DATABASE
158
+ buffer.table = TEST_TABLE
159
+ buffer.fields = ['test_field_datetime']
160
+
161
+ buffer << [Time.new(2013, 1, 15, 1, 5, 11)]
162
+ buffer << [Time.new(2013, 1, 16, 2, 6, 22)]
163
+ buffer << [Time.new(2013, 1, 17, 3, 7, 33)]
164
+ buffer << [Time.new(2013, 1, 18, 4, 8, 44)]
165
+
166
+ buffer.flush!
167
+
168
+ post_flush_records = connection.query %[SELECT test_field_datetime FROM #{TEST_DATABASE}.#{TEST_TABLE}]
169
+
170
+ post_flush_records.to_a.should == [
171
+ {'test_field_datetime' => Time.new(2013, 1, 15, 1, 5, 11)},
172
+ {'test_field_datetime' => Time.new(2013, 1, 16, 2, 6, 22)},
173
+ {'test_field_datetime' => Time.new(2013, 1, 17, 3, 7, 33)},
174
+ {'test_field_datetime' => Time.new(2013, 1, 18, 4, 8, 44)}]
175
+
176
+ buffer.entries.should == []
177
+ end
150
178
  end
151
179
  end
data/spec/spec_helper.rb CHANGED
@@ -1,46 +1,47 @@
1
- class MrClean
2
- attr_accessor :connection, :database, :create_table_schemas
1
+ module MultisertSpec
2
+ class MrClean
3
+ attr_accessor :connection, :database, :create_table_schemas
3
4
 
4
- def initialize attrs = {}
5
- @connection = attrs[:connection]
6
- @database = attrs[:database]
7
- @create_table_schemas = attrs[:create_table_schemas] || []
8
- yield self if block_given?
9
- end
5
+ def initialize attrs = {}
6
+ @connection = attrs[:connection]
7
+ @database = attrs[:database]
8
+ @create_table_schemas = attrs[:create_table_schemas] || []
9
+ yield self if block_given?
10
+ end
10
11
 
11
- def ensure_clean_database! opts = {}
12
- clean_database! !!opts[:teardown_tables]
13
- ensure_tables!
14
- end
12
+ def ensure_clean_database! opts = {}
13
+ clean_database! !!opts[:teardown_tables]
14
+ ensure_tables!
15
+ end
15
16
 
16
- private
17
+ private
17
18
 
18
- def database_exists?
19
- @connection.query('show databases').to_a.map { |database|
20
- database['Database']
21
- }.include?(@database)
22
- end
19
+ def database_exists?
20
+ @connection.query('show databases').to_a.map { |database|
21
+ database['Database']
22
+ }.include?(@database)
23
+ end
23
24
 
24
- def ensure_database!
25
- @connection.query "create database if not exists #{@database}"
26
- end
25
+ def ensure_database!
26
+ @connection.query "create database if not exists #{@database}"
27
+ end
27
28
 
28
- def clean_database! teardown_tables
29
- return unless database_exists?
30
- @connection.query("show tables in #{@database}").to_a.each do |table|
31
- if teardown_tables
32
- puts "TEARING DOWN"
33
- @connection.query("drop table if exists #{@database}.#{table["Tables_in_#{@database}"]}")
34
- else
35
- @connection.query("truncate #{@database}.#{table["Tables_in_#{@database}"]}")
29
+ def clean_database! teardown_tables
30
+ return unless database_exists?
31
+ @connection.query("show tables in #{@database}").to_a.each do |table|
32
+ if teardown_tables
33
+ @connection.query("drop table if exists #{@database}.#{table["Tables_in_#{@database}"]}")
34
+ else
35
+ @connection.query("truncate #{@database}.#{table["Tables_in_#{@database}"]}")
36
+ end
36
37
  end
37
38
  end
38
- end
39
39
 
40
- def ensure_tables!
41
- ensure_database!
42
- @create_table_schemas.each do |create_table_schema|
43
- @connection.query create_table_schema
40
+ def ensure_tables!
41
+ ensure_database!
42
+ @create_table_schemas.each do |create_table_schema|
43
+ @connection.query create_table_schema
44
+ end
44
45
  end
45
46
  end
46
47
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: multisert
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.1
4
+ version: 0.0.2
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2013-03-07 00:00:00.000000000 Z
12
+ date: 2013-03-09 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: mysql2
@@ -91,6 +91,8 @@ files:
91
91
  - lib/multisert.rb
92
92
  - lib/multisert/version.rb
93
93
  - multisert.gemspec
94
+ - performance/multisert_performance_test.rb
95
+ - performance/performance_helper.rb
94
96
  - spec/multisert_spec.rb
95
97
  - spec/spec_helper.rb
96
98
  homepage: https://github.com/jeffreyiacono/multisert