multisert 0.0.1 → 0.0.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
data/LICENSE CHANGED
@@ -1,22 +1,13 @@
1
- Copyright (c) 2013 Jeff Iacono
1
+ Copyright 2013 Jeff Iacono
2
2
 
3
- MIT License
3
+ Licensed under the Apache License, Version 2.0 (the "License");
4
+ you may not use this file except in compliance with the License.
5
+ You may obtain a copy of the License at
4
6
 
5
- Permission is hereby granted, free of charge, to any person obtaining
6
- a copy of this software and associated documentation files (the
7
- "Software"), to deal in the Software without restriction, including
8
- without limitation the rights to use, copy, modify, merge, publish,
9
- distribute, sublicense, and/or sell copies of the Software, and to
10
- permit persons to whom the Software is furnished to do so, subject to
11
- the following conditions:
7
+ http://www.apache.org/licenses/LICENSE-2.0
12
8
 
13
- The above copyright notice and this permission notice shall be
14
- included in all copies or substantial portions of the Software.
15
-
16
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
- EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
- MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
- NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
- LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
- OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
- WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
9
+ Unless required by applicable law or agreed to in writing, software
10
+ distributed under the License is distributed on an "AS IS" BASIS,
11
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ See the License for the specific language governing permissions and
13
+ limitations under the License.
data/README.md CHANGED
@@ -1,6 +1,7 @@
1
1
  # Multisert
2
2
 
3
- TODO: Write a gem description
3
+ Multisert is a buffer that bundles individual INSERTs into multi-row INSERT
4
+ statements, which improves runtime performance.
4
5
 
5
6
  ## Installation
6
7
 
@@ -70,6 +71,52 @@ script. This ensures that any pending entries are written to the database table
70
71
  that were not automatically taken care of by the auto-flush that will kick in
71
72
  during the iteration.
72
73
 
74
+ ## Performance
75
+
76
+ The gem has a quick performance test built in that can be run via:
77
+ ```bash
78
+ $ ruby ./performance/multisert_performance_test.rb
79
+ ```
80
+ We ran the performance test (modified to run the comparison 5 times) and
80
+ received the following output:
82
+
83
+ ```bash
84
+ $ ruby ./performance/multisert_performance_test.rb
85
+ # test 1:
86
+ # insert w/o buffer took 53.37s to insert 100000 entries
87
+ # multisert w/ buffer of 10000 took 1.77s to insert 100000 entries
88
+ #
89
+ # test 2:
90
+ # insert w/o buffer took 53.22s to insert 100000 entries
91
+ # multisert w/ buffer of 10000 took 1.84s to insert 100000 entries
92
+ #
93
+ # test 3:
94
+ # insert w/o buffer took 54.42s to insert 100000 entries
95
+ # multisert w/ buffer of 10000 took 1.9s to insert 100000 entries
96
+ #
97
+ # test 4:
98
+ # insert w/o buffer took 53.38s to insert 100000 entries
99
+ # multisert w/ buffer of 10000 took 1.81s to insert 100000 entries
100
+ #
101
+ # test 5:
102
+ # insert w/o buffer took 53.52s to insert 100000 entries
103
+ # multisert w/ buffer of 10000 took 1.78s to insert 100000 entries
104
+ ```
105
+
106
+ In these runs, buffered inserts were roughly 30x faster than individual INSERTs.
107
+
108
+ The performance test was run on a computer with the following specs:
109
+
110
+ - Model Name: MacBook Air
111
+ - Model Identifier: MacBookAir4,2
112
+ - Processor Name: Intel Core i5
113
+ - Processor Speed: 1.7 GHz
114
+ - Number of Processors: 1
115
+ - Total Number of Cores: 2
116
+ - L2 Cache (per Core): 256 KB
117
+ - L3 Cache: 3 MB
118
+ - Memory: 4 GB
119
+
73
120
  ## FAQ
74
121
 
75
122
  ### Packet Too Large / Connection Lost Errors
@@ -83,6 +130,20 @@ To learn more, [read the documentation](http://dev.mysql.com/doc/refman/5.5/en//
83
130
  If you need to, you can adjust the buffer size by setting the `max_buffer_count`
84
131
  attribute. Generally, 10,000 to 100,000 is a pretty good starting range.
85
132
 
133
+ ### Does it work with Dates?
134
+
135
+ Yes. Pass in a Date instance and it will be converted to a MySQL-friendly
136
+ format under the hood (`"%Y-%m-%d"`). If you need a different format, convert
137
+ the date to a string in the form you want before passing it into
138
+ Multisert.
139
+
140
+ ### Does it work with Times?
141
+
142
+ Yes. Pass in a Time instance and it will be converted to a MySQL-friendly
143
+ format under the hood (`"%Y-%m-%d %H:%M:%S"`). If you need a different format,
144
+ convert the time to a string in the form you want before passing it into
145
+ Multisert.
146
+
86
147
  ## Contributing
87
148
 
88
149
  1. Fork it
@@ -90,3 +151,19 @@ attribute. Generally, 10,000 to 100,000 is a pretty good starting range.
90
151
  3. Commit your changes (`git commit -am 'Added some feature'`)
91
152
  4. Push to the branch (`git push origin my-new-feature`)
92
153
  5. Create new Pull Request
154
+
155
+ ## License
156
+
157
+ Copyright 2013 Jeff Iacono
158
+
159
+ Licensed under the Apache License, Version 2.0 (the "License");
160
+ you may not use this file except in compliance with the License.
161
+ You may obtain a copy of the License at
162
+
163
+ http://www.apache.org/licenses/LICENSE-2.0
164
+
165
+ Unless required by applicable law or agreed to in writing, software
166
+ distributed under the License is distributed on an "AS IS" BASIS,
167
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
168
+ See the License for the specific language governing permissions and
169
+ limitations under the License.
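The README describes Multisert as a buffer that collects rows and writes them in bundles once `max_buffer_count` entries have accumulated, with a final `flush!` for whatever is left. The following is a minimal sketch of that idea, not the gem's implementation: `SketchInsertBuffer` is a hypothetical class, it hands each generated SQL string to a block instead of a Mysql2 connection, and it assumes the row values are already safe SQL literals.

```ruby
# Hypothetical sketch of buffered multi-row INSERTs; this is NOT the gem's code.
# Generated SQL is handed to a block instead of a Mysql2 connection, and rows
# are assumed to contain values that are already safe SQL literals.
class SketchInsertBuffer
  attr_reader :entries

  def initialize(destination:, fields:, max_buffer_count: 10_000, &executor)
    @destination = destination
    @fields = fields
    @max_buffer_count = max_buffer_count
    @executor = executor
    @entries = []
  end

  # Queue one row and auto-flush once the buffer reaches max_buffer_count.
  def <<(row)
    @entries << row
    flush! if @entries.size >= @max_buffer_count
    self
  end

  # Send all pending rows as one INSERT ... VALUES (...), (...) statement.
  def flush!
    return if @entries.empty?
    values = @entries.map { |row| "(#{row.join(', ')})" }.join(', ')
    @executor.call("INSERT INTO #{@destination} (#{@fields.join(', ')}) VALUES #{values}")
    @entries = []
  end
end

statements = []
buffer = SketchInsertBuffer.new(destination: 'multisert_performance.performance_data',
                                fields: %w[field_1 field_2],
                                max_buffer_count: 2) { |sql| statements << sql }
buffer << [1, 2]
buffer << [3, 4]   # buffer is full, so the first statement is issued here
buffer << [5, 6]
buffer.flush!      # writes the remaining partial batch
puts statements
```

With 100,000 rows and a buffer of 10,000, a buffer like this issues 10 statements instead of 100,000; fewer round trips and statement parses is generally why batching inserts helps.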
data/lib/multisert.rb CHANGED
@@ -68,13 +68,11 @@ private
68
68
 
69
69
  def cast value
70
70
  case value
71
- when String
72
- # TODO: want to escape the string too, checking for " and ;
73
- "'#{value}'"
74
- when Date
75
- "'#{value}'"
76
- else
77
- value
71
+ # TODO: want to escape the string too, checking for " and ;
72
+ when String then "'#{value}'"
73
+ when Date then "'#{value.strftime("%Y-%m-%d")}'"
74
+ when Time then "'#{value.strftime("%Y-%m-%d %H:%M:%S")}'"
75
+ else value
78
76
  end
79
77
  end
80
78
  end
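The updated `cast` formats `Date` and `Time` values and still carries a TODO about escaping strings. One Ruby detail is relevant here: `DateTime` is a subclass of `Date`, so a `DateTime` value would match the `when Date` branch and lose its time portion. Below is a hypothetical variant (`sketch_cast` and `naive_escape` are illustrative names, not the gem's API) that checks `DateTime` before `Date`, renders `nil` as `NULL`, and addresses the TODO with a naive backslash/quote escape. It assumes MySQL's default `sql_mode`, where backslash is an escape character; with a live connection, `Mysql2::Client#escape` is the safer choice.

```ruby
require 'date'

# Naive escaping for MySQL string literals (assumes the default sql_mode, where
# backslash is an escape character). Prefer Mysql2::Client#escape when a
# connection is available.
def naive_escape(str)
  str.gsub(/[\\']/) { |char| "\\#{char}" }
end

# Hypothetical variant of Multisert#cast. DateTime is checked before Date
# because DateTime is a subclass of Date.
def sketch_cast(value)
  case value
  when nil      then 'NULL'
  when String   then "'#{naive_escape(value)}'"
  when DateTime then "'#{value.strftime('%Y-%m-%d %H:%M:%S')}'"
  when Date     then "'#{value.strftime('%Y-%m-%d')}'"
  when Time     then "'#{value.strftime('%Y-%m-%d %H:%M:%S')}'"
  else value
  end
end

puts sketch_cast("O'Brien")                           # 'O\'Brien'
puts sketch_cast(Date.new(2013, 1, 15))               # '2013-01-15'
puts sketch_cast(DateTime.new(2013, 1, 15, 1, 5, 11)) # '2013-01-15 01:05:11'
puts sketch_cast(Time.new(2013, 1, 15, 1, 5, 11))     # '2013-01-15 01:05:11'
```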
data/lib/multisert/version.rb CHANGED
@@ -1,3 +1,3 @@
1
1
  class Multisert
2
- VERSION = "0.0.1"
2
+ VERSION = "0.0.2"
3
3
  end
data/performance/multisert_performance_test.rb ADDED
@@ -0,0 +1,85 @@
1
+ require './performance/performance_helper'
2
+
3
+ PERFORMANCE_DATABASE = 'multisert_performance'
4
+ PERFORMANCE_TABLE = 'performance_data'
5
+ PERFORMANCE_DESTINATION = "#{PERFORMANCE_DATABASE}.#{PERFORMANCE_TABLE}"
6
+ NUM_OF_OPERATIONS = 100_000
7
+ CONNECTION = Mysql2::Client.new(host: 'localhost', username: 'root')
8
+
9
+ def puts_with_time content
10
+ puts "[#{Time.now}] #{content}"
11
+ end
12
+
13
+ def generate_records records_count = NUM_OF_OPERATIONS
14
+ puts_with_time "generating #{records_count} random entries"
15
+ sample_records = (0...records_count).reduce([]) do |memo, i|
16
+ memo << {'field_1' => i,
17
+ 'field_2' => i + 1,
18
+ 'field_3' => i + 2,
19
+ 'field_4' => i + 3}
20
+ memo
21
+ end
22
+ puts_with_time "generated #{records_count} random entries"
23
+ sample_records
24
+ end
25
+
26
+ def ensure_data_completeness! connection, datastore, expected_count
27
+ unless (res = connection.query("SELECT COUNT(*) AS the_count FROM #{datastore}").to_a.first['the_count']) == expected_count
28
+ raise RuntimeError, "data not written completely. Got #{res}, expected #{expected_count}"
29
+ end
30
+ end
31
+
32
+ def insert_performance_test connection, cleaner, sample_records, destination
33
+ fields = sample_records.first.keys.join(', ')
34
+
35
+ cleaner.ensure_clean_database!
36
+
37
+ (timer = Timer.new).start!
38
+ sample_records.each do |record|
39
+ connection.query %[
40
+ INSERT INTO #{destination} (#{fields})
41
+ VALUES (#{record.map { |k,v| v }.join(', ')})]
42
+ end
43
+ runtime = timer.stop!
44
+ ensure_data_completeness! connection, destination, sample_records.count
45
+ puts "insert w/o buffer took #{runtime.round(2)}s to insert #{sample_records.count} entries"
46
+ end
47
+
48
+ def multinsert_performance_test connection, cleaner, sample_records, destination, max_buffer_count = nil
49
+ database, table = destination.split('.')
50
+
51
+ buffer = Multisert.new connection: connection,
52
+ database: database,
53
+ table: table,
54
+ fields: sample_records.first.keys,
55
+ max_buffer_count: max_buffer_count
56
+
57
+ cleaner.ensure_clean_database!
58
+
59
+ (timer = Timer.new).start!
60
+ sample_records.each do |record|
61
+ buffer << record.map { |k, v| v }
62
+ end
63
+ buffer.flush!
64
+ runtime = timer.stop!
65
+ ensure_data_completeness! connection, destination, sample_records.count
66
+ puts "multisert w/ buffer of #{buffer.max_buffer_count} took #{runtime.round(2)}s to insert #{sample_records.count} entries"
67
+ end
68
+
69
+ cleaner = MrClean.new(database: PERFORMANCE_DATABASE, connection: CONNECTION)
70
+ cleaner.create_table_schemas << %[
71
+ CREATE TABLE IF NOT EXISTS #{PERFORMANCE_DATABASE}.#{PERFORMANCE_TABLE} (
72
+ field_1 int default null
73
+ , field_2 int default null
74
+ , field_3 int default null
75
+ , field_4 int default null
76
+ )]
77
+
78
+ sample_records = generate_records
79
+
80
+ puts_with_time "starting performance test: using #{sample_records.count} random entries, writing to #{PERFORMANCE_DESTINATION}"
81
+
82
+ #insert_performance_test CONNECTION, cleaner, sample_records, PERFORMANCE_DESTINATION
83
+ (0..10_000).step(10) do |i|
84
+ multinsert_performance_test CONNECTION, cleaner, sample_records, PERFORMANCE_DESTINATION, i
85
+ end
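The README summarizes its five runs as a ~30x increase. One way to check that figure is to compute the ratio of unbuffered to buffered time from the reported numbers. The timings below are copied from the README output; the script itself is only an illustration and is not part of the gem.

```ruby
# [unbuffered seconds, buffered seconds] for tests 1-5, copied from the README.
RUNS = [
  [53.37, 1.77],
  [53.22, 1.84],
  [54.42, 1.90],
  [53.38, 1.81],
  [53.52, 1.78]
].freeze

# Per-run speedup: how many times faster the buffered insert was.
speedups = RUNS.map { |unbuffered, buffered| unbuffered / buffered }

# Overall speedup: total unbuffered time divided by total buffered time.
overall = RUNS.sum(&:first) / RUNS.sum(&:last)

puts format('per run: %s', speedups.map { |s| s.round(1) }.join(', '))
puts format('overall: %.1fx', overall)
```

For these numbers the per-run speedups range from about 28.6x to 30.2x, and the overall ratio is about 29.4x, which is consistent with the README's ~30x summary.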
data/performance/performance_helper.rb ADDED
@@ -0,0 +1,62 @@
1
+ require 'rubygems'
2
+ require 'bundler'
3
+ Bundler.require
4
+ require 'mysql2'
5
+ require './lib/multisert'
6
+
7
+ class Timer
8
+ def start!
9
+ @start = Time.now
10
+ end
11
+
12
+ def stop!
13
+ Time.now - @start
14
+ end
15
+ end
16
+
17
+ # TODO: convert to gem
18
+ class MrClean
19
+ attr_accessor :connection, :database, :create_table_schemas
20
+
21
+ def initialize attrs = {}
22
+ @connection = attrs[:connection]
23
+ @database = attrs[:database]
24
+ @create_table_schemas = attrs[:create_table_schemas] || []
25
+ yield self if block_given?
26
+ end
27
+
28
+ def ensure_clean_database! opts = {}
29
+ clean_database! !!opts[:teardown_tables]
30
+ ensure_tables!
31
+ end
32
+
33
+ private
34
+
35
+ def database_exists?
36
+ @connection.query('show databases').to_a.map { |database|
37
+ database['Database']
38
+ }.include?(@database)
39
+ end
40
+
41
+ def ensure_database!
42
+ @connection.query "create database if not exists #{@database}"
43
+ end
44
+
45
+ def clean_database! teardown_tables
46
+ return unless database_exists?
47
+ @connection.query("show tables in #{@database}").to_a.each do |table|
48
+ if teardown_tables
49
+ @connection.query("drop table if exists #{@database}.#{table["Tables_in_#{@database}"]}")
50
+ else
51
+ @connection.query("truncate #{@database}.#{table["Tables_in_#{@database}"]}")
52
+ end
53
+ end
54
+ end
55
+
56
+ def ensure_tables!
57
+ ensure_database!
58
+ @create_table_schemas.each do |create_table_schema|
59
+ @connection.query create_table_schema
60
+ end
61
+ end
62
+ end
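The helper's `Timer` measures elapsed time with `Time.now`, which follows the wall clock and can jump if the system time is adjusted during a run. A hypothetical alternative (`MonotonicTimer` is an illustrative name, not part of the gem) reads `Process.clock_gettime(Process::CLOCK_MONOTONIC)` instead. It keeps the same `start!`/`stop!` interface, and `start!` returns `self` so the timer can be created and started in one expression.

```ruby
# Hypothetical variant of the performance helper's Timer that uses a
# monotonic clock, which is not affected by wall-clock adjustments.
class MonotonicTimer
  # Record the starting point and return self for chaining.
  def start!
    @start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    self
  end

  # Return the elapsed time in seconds as a Float.
  def stop!
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - @start
  end
end

timer = MonotonicTimer.new.start!
sleep 0.05
elapsed = timer.stop!
puts "took #{elapsed.round(2)}s"
```

The existing `(timer = Timer.new).start!` pattern in the performance test would work unchanged with this class, since only the clock source differs.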
@@ -9,7 +9,7 @@ TEST_TABLE = 'test_data'
9
9
  # TODO: make into yaml config
10
10
  $connection = Mysql2::Client.new(host: 'localhost', username: 'root')
11
11
 
12
- $cleaner = MrClean.new(database: TEST_DATABASE, connection: $connection) do |mgr|
12
+ $cleaner = MultisertSpec::MrClean.new(database: TEST_DATABASE, connection: $connection) do |mgr|
13
13
  mgr.create_table_schemas << %[
14
14
  CREATE TABLE IF NOT EXISTS #{mgr.database}.#{TEST_TABLE} (
15
15
  test_field_int_1 int default null,
@@ -17,7 +17,8 @@ $cleaner = MrClean.new(database: TEST_DATABASE, connection: $connection) do |mgr
17
17
  test_field_int_3 int default null,
18
18
  test_field_int_4 int default null,
19
19
  test_field_varchar varchar(10) default null,
20
- test_field_date DATE default null
20
+ test_field_date DATE default null,
21
+ test_field_datetime DATETIME default null
21
22
  )]
22
23
  end
23
24
 
@@ -147,5 +148,32 @@ describe Multisert do
147
148
 
148
149
  buffer.entries.should == []
149
150
  end
151
+
152
+ it "works with times" do
153
+ pre_flush_records = connection.query "SELECT * FROM #{TEST_DATABASE}.#{TEST_TABLE}"
154
+ pre_flush_records.to_a.should == []
155
+
156
+ buffer.connection = connection
157
+ buffer.database = TEST_DATABASE
158
+ buffer.table = TEST_TABLE
159
+ buffer.fields = ['test_field_datetime']
160
+
161
+ buffer << [Time.new(2013, 1, 15, 1, 5, 11)]
162
+ buffer << [Time.new(2013, 1, 16, 2, 6, 22)]
163
+ buffer << [Time.new(2013, 1, 17, 3, 7, 33)]
164
+ buffer << [Time.new(2013, 1, 18, 4, 8, 44)]
165
+
166
+ buffer.flush!
167
+
168
+ post_flush_records = connection.query %[SELECT test_field_datetime FROM #{TEST_DATABASE}.#{TEST_TABLE}]
169
+
170
+ post_flush_records.to_a.should == [
171
+ {'test_field_datetime' => Time.new(2013, 1, 15, 1, 5, 11)},
172
+ {'test_field_datetime' => Time.new(2013, 1, 16, 2, 6, 22)},
173
+ {'test_field_datetime' => Time.new(2013, 1, 17, 3, 7, 33)},
174
+ {'test_field_datetime' => Time.new(2013, 1, 18, 4, 8, 44)}]
175
+
176
+ buffer.entries.should == []
177
+ end
150
178
  end
151
179
  end
@@ -1,46 +1,47 @@
1
- class MrClean
2
- attr_accessor :connection, :database, :create_table_schemas
1
+ module MultisertSpec
2
+ class MrClean
3
+ attr_accessor :connection, :database, :create_table_schemas
3
4
 
4
- def initialize attrs = {}
5
- @connection = attrs[:connection]
6
- @database = attrs[:database]
7
- @create_table_schemas = attrs[:create_table_schemas] || []
8
- yield self if block_given?
9
- end
5
+ def initialize attrs = {}
6
+ @connection = attrs[:connection]
7
+ @database = attrs[:database]
8
+ @create_table_schemas = attrs[:create_table_schemas] || []
9
+ yield self if block_given?
10
+ end
10
11
 
11
- def ensure_clean_database! opts = {}
12
- clean_database! !!opts[:teardown_tables]
13
- ensure_tables!
14
- end
12
+ def ensure_clean_database! opts = {}
13
+ clean_database! !!opts[:teardown_tables]
14
+ ensure_tables!
15
+ end
15
16
 
16
- private
17
+ private
17
18
 
18
- def database_exists?
19
- @connection.query('show databases').to_a.map { |database|
20
- database['Database']
21
- }.include?(@database)
22
- end
19
+ def database_exists?
20
+ @connection.query('show databases').to_a.map { |database|
21
+ database['Database']
22
+ }.include?(@database)
23
+ end
23
24
 
24
- def ensure_database!
25
- @connection.query "create database if not exists #{@database}"
26
- end
25
+ def ensure_database!
26
+ @connection.query "create database if not exists #{@database}"
27
+ end
27
28
 
28
- def clean_database! teardown_tables
29
- return unless database_exists?
30
- @connection.query("show tables in #{@database}").to_a.each do |table|
31
- if teardown_tables
32
- puts "TEARING DOWN"
33
- @connection.query("drop table if exists #{@database}.#{table["Tables_in_#{@database}"]}")
34
- else
35
- @connection.query("truncate #{@database}.#{table["Tables_in_#{@database}"]}")
29
+ def clean_database! teardown_tables
30
+ return unless database_exists?
31
+ @connection.query("show tables in #{@database}").to_a.each do |table|
32
+ if teardown_tables
33
+ @connection.query("drop table if exists #{@database}.#{table["Tables_in_#{@database}"]}")
34
+ else
35
+ @connection.query("truncate #{@database}.#{table["Tables_in_#{@database}"]}")
36
+ end
36
37
  end
37
38
  end
38
- end
39
39
 
40
- def ensure_tables!
41
- ensure_database!
42
- @create_table_schemas.each do |create_table_schema|
43
- @connection.query create_table_schema
40
+ def ensure_tables!
41
+ ensure_database!
42
+ @create_table_schemas.each do |create_table_schema|
43
+ @connection.query create_table_schema
44
+ end
44
45
  end
45
46
  end
46
47
  end
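`MrClean` interpolates database and table names directly into its SQL. That works for the fixed names used in the specs and the performance test, but quoting them as identifiers is safer if names ever come from elsewhere. A minimal sketch, assuming MySQL backtick quoting (`quote_identifier` and `clean_statements` are hypothetical helpers, not part of `MrClean`), that builds the same TRUNCATE or DROP TABLE statements `clean_database!` issues for a list of tables:

```ruby
# Hypothetical helper: quote a MySQL identifier with backticks, doubling any
# embedded backtick.
def quote_identifier(name)
  "`#{name.to_s.gsub('`', '``')}`"
end

# Hypothetical helper: build the statements clean_database! would run for the
# given tables, dropping them on teardown and truncating them otherwise.
def clean_statements(database, tables, teardown_tables: false)
  tables.map do |table|
    qualified = "#{quote_identifier(database)}.#{quote_identifier(table)}"
    teardown_tables ? "DROP TABLE IF EXISTS #{qualified}" : "TRUNCATE #{qualified}"
  end
end

puts clean_statements('multisert_performance', %w[performance_data])
# TRUNCATE `multisert_performance`.`performance_data`
```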
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: multisert
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.1
4
+ version: 0.0.2
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2013-03-07 00:00:00.000000000 Z
12
+ date: 2013-03-09 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: mysql2
@@ -91,6 +91,8 @@ files:
91
91
  - lib/multisert.rb
92
92
  - lib/multisert/version.rb
93
93
  - multisert.gemspec
94
+ - performance/multisert_performance_test.rb
95
+ - performance/performance_helper.rb
94
96
  - spec/multisert_spec.rb
95
97
  - spec/spec_helper.rb
96
98
  homepage: https://github.com/jeffreyiacono/multisert