multisert 0.0.2 → 0.0.3

data/README.md CHANGED
@@ -27,7 +27,7 @@ CREATE TABLE IF NOT EXISTS some_database.some_table (
27
27
  field_2 int default null,
28
28
  field_3 int default null,
29
29
  field_4 int default null
30
- );
30
+ ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
31
31
  ```
32
32
 
33
33
  Now let's say we want to insert 1,000,000 records after running the
@@ -57,22 +57,24 @@ buffer = Multisert.new connection: dbclient,
57
57
  res = some_magical_calculation(i)
58
58
  buffer << res
59
59
  end
60
- buffer.flush!
60
+ buffer.write!
61
61
  ```
62
62
 
63
63
  We start by creating a new Multisert instance, providing the database
64
64
  connection, database and table, and fields as attributes. Next, as we get the
65
65
  results from `some_magical_calculation`, we shovel each into the Multisert
66
66
  instance. As we iterate through, the Multisert instance will build up the
67
- records and then flush itself to the specified database table when it hits an
67
+ records and then write itself to the specified database table when it hits an
68
68
  internal count (default is 10_000, but can be set via the `max_buffer_count`
69
- attribute). One last thing to note is the `buffer.flush!` at the end of the
69
+ attribute). One last thing to note is the `buffer.write!` at the end of the
70
70
  script. This ensures that any pending entries are written to the database table
71
- that were not automatically taken care of by the auto-flush that will kick in
71
+ that were not automatically taken care of by the auto-write that will kick in
72
72
  during the iteration.
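
The auto-write behaviour described above can be sketched without a database. This is a simplified illustration, not the gem's implementation: `RecordingConnection` and `SketchBuffer` are hypothetical names, the connection only records the SQL it receives, and the column name is assumed.

```ruby
# Hypothetical stand-in for a database client: it only records the SQL it is given.
class RecordingConnection
  attr_reader :queries

  def initialize
    @queries = []
  end

  def query(sql)
    @queries << sql
  end
end

# Simplified sketch of a buffer that writes itself once it holds max_buffer_count entries.
class SketchBuffer
  def initialize(connection, max_buffer_count)
    @connection = connection
    @max_buffer_count = max_buffer_count
    @entries = []
  end

  def <<(entry)
    @entries << entry
    write! if @entries.count >= @max_buffer_count
    entry
  end

  def write!
    return if @entries.empty?
    values = @entries.map { |row| "(#{row.join(',')})" }.join(",")
    # Table name taken from the README; the column name is an assumption.
    @connection.query "INSERT INTO some_database.some_table (field_2) VALUES #{values}"
    @entries = []
  end
end

connection = RecordingConnection.new
buffer = SketchBuffer.new(connection, 3)

7.times { |i| buffer << [i] }
auto_writes = connection.queries.count  # full batches written during the loop
buffer.write!                           # writes whatever is still pending
total_writes = connection.queries.count
```

With seven entries and a limit of three, two batches are written during the loop and the seventh entry stays pending until the explicit `write!`, which is why the final call matters.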
73
73
 
74
74
  ## Performance
75
75
 
76
+ ### Individual vs Buffer
77
+
76
78
  The gem has a quick performance test built in that can be run via:
77
79
  ```bash
78
80
  $ ruby ./performance/multisert_performance_test
@@ -117,6 +119,36 @@ The performance test was run on a computer with the following specs:
117
119
  L3 Cache: 3 MB
118
120
  Memory: 4 GB
119
121
 
122
+ All data was written to a mysql instance on localhost.
123
+
124
+ ### Buffer Sizes
125
+
126
+ Let's take a look at how buffer size comes into play.
127
+
128
+ We ran 3 separate and independent tests on the same computer as above.
129
+ Also note that buffer sizes of 0 and 1 are effectively identical.
130
+
131
+ If we look at using a buffer size ranging from 0 - 10, we see the following
132
+ performance:
133
+
134
+ <img src="https://raw.github.com/jeffreyiacono/images/master/multisert/multisert-performance-test-0-10.png" width="900" alt="Buffer size: 0 - 10" />
135
+
136
+ If we take a step back and look at buffer sizes ranging from 0 - 100, we see the
137
+ following performance:
138
+
139
+ <img src="https://raw.github.com/jeffreyiacono/images/master/multisert/multisert-performance-test-0-100.png" width="900" alt="Buffer size: 0 - 100" />
140
+
141
+ Finally, if we look at buffer sizes ranging from 0 - 1,000 and 0 - 10,000 we see
142
+ the following performance (spoiler alert: not much difference, just more data
143
+ points!):
144
+
145
+ <img src="https://raw.github.com/jeffreyiacono/images/master/multisert/multisert-performance-test-0-1000.png" width="900" alt="Buffer size: 0 - 1,000" />
146
+
147
+ <img src="https://raw.github.com/jeffreyiacono/images/master/multisert/multisert-performance-test-0-10000.png" width="900" alt="Buffer size: 0 - 10,000" />
148
+
149
+ As the charts show, performance improves sharply as the buffer size grows
150
+ from 0 to 100, then levels off.
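
One plausible reading of this curve is that the number of INSERT statements falls quickly at small buffer sizes and barely changes afterwards. The sketch below only counts statements and measures nothing. `insert_statement_count` is a hypothetical helper, and the record count reuses the 1,000,000 figure from the example earlier in the README.

```ruby
# Hypothetical helper: how many INSERT statements are issued for record_count
# records when the buffer writes itself once it holds buffer_size entries.
# Sizes 0 and 1 both write after every entry, so they are treated alike.
def insert_statement_count(record_count, buffer_size)
  effective_size = [buffer_size, 1].max
  (record_count + effective_size - 1) / effective_size # integer ceiling division
end

record_count = 1_000_000
statement_counts = [0, 1, 10, 100, 1_000, 10_000].map do |size|
  [size, insert_statement_count(record_count, size)]
end

statement_counts.each do |size, count|
  puts "buffer size #{size}: #{count} statements"
end
```

Going from a buffer size of 1 to 100 removes 990,000 statements, while going from 100 to 10,000 removes only 9,900 more, which is consistent with the levelling off seen in the charts.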
151
+
120
152
  ## FAQ
121
153
 
122
154
  ### Packet Too Large / Connection Lost Errors
@@ -18,37 +18,44 @@ class Multisert
18
18
  end
19
19
 
20
20
  def entries
21
- @entries ||= []
21
+ buffer
22
22
  end
23
23
 
24
24
  def << entry
25
25
  entries << entry
26
- flush! if flush_buffer?
26
+ write_buffer! if write_buffer?
27
27
  entry
28
28
  end
29
29
 
30
- def flush!
30
+ def write_buffer!
31
31
  return if buffer_empty?
32
32
  @connection.query multisert_sql
33
- reset_entries!
33
+ reset_buffer!
34
34
  end
35
35
 
36
+ alias_method :write!, :write_buffer!
37
+ alias_method :flush!, :write_buffer!
38
+
36
39
  def max_buffer_count
37
40
  @max_buffer_count || MAX_BUFFER_COUNT_DEFAULT
38
41
  end
39
42
 
40
43
  private
41
44
 
42
- def buffer_empty?
43
- entries.empty?
45
+ def buffer
46
+ @buffer ||= []
44
47
  end
45
48
 
46
- def flush_buffer?
47
- entries.count >= max_buffer_count
49
+ def reset_buffer!
50
+ @buffer = []
51
+ end
52
+
53
+ def buffer_empty?
54
+ buffer.empty?
48
55
  end
49
56
 
50
- def reset_entries!
51
- @entries = []
57
+ def write_buffer?
58
+ buffer.count >= max_buffer_count
52
59
  end
53
60
 
54
61
  def multisert_sql
@@ -60,7 +67,7 @@ private
60
67
  end
61
68
 
62
69
  def multisert_values
63
- @entries.reduce([]) { |memo, entries|
70
+ @buffer.reduce([]) { |memo, entries|
64
71
  memo << "(#{entries.map { |e| cast e }.join(',')})"
65
72
  memo
66
73
  }.join(",")
@@ -1,3 +1,3 @@
1
1
  class Multisert
2
- VERSION = "0.0.2"
2
+ VERSION = "0.0.3"
3
3
  end
@@ -8,7 +8,7 @@ Gem::Specification.new do |gem|
8
8
  gem.summary = %q{Buffer to handle bulk INSERTs}
9
9
  gem.homepage = "https://github.com/jeffreyiacono/multisert"
10
10
 
11
- gem.files = `git ls-files`.split($\)
11
+ gem.files = `git ls-files`.split($\).delete_if { |f| f =~ /^data\// }
12
12
  gem.executables = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
13
13
  gem.test_files = gem.files.grep(%r{^(test|spec|features)/})
14
14
  gem.name = "multisert"
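
The effect of the new `delete_if` filter on `gem.files` can be seen on a small list. The file names below are hypothetical and stand in for the output of `git ls-files`.

```ruby
# Hypothetical file list standing in for the output of `git ls-files`.
tracked_files = ["README.md", "data/results.csv", "lib/multisert.rb", "metadata/notes.txt"]

# Same filter as the gemspec: drop every path that starts with "data/".
packaged_files = tracked_files.dup.delete_if { |f| f =~ /^data\// }

puts packaged_files
```

Because the pattern is anchored with `^`, a path such as `metadata/notes.txt` is kept even though it contains the substring `data/`.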
@@ -79,7 +79,14 @@ sample_records = generate_records
79
79
 
80
80
  puts_with_time "starting performance test: using #{sample_records.count} random entries, writing to #{PERFORMANCE_DESTINATION}"
81
81
 
82
- #insert_performance_test CONNECTION, cleaner, sample_records, PERFORMANCE_DESTINATION
83
- (0..10_000).step(10) do |i|
82
+ # individual insert vs multisert
83
+ insert_performance_test CONNECTION, cleaner, sample_records, PERFORMANCE_DESTINATION
84
+ multinsert_performance_test CONNECTION, cleaner, sample_records, PERFORMANCE_DESTINATION, 10_000
85
+
86
+ mini_steps = (0..9)
87
+ big_steps = (10..10_000).step(10)
88
+
89
+ # buffer size performance test
90
+ [*mini_steps, *big_steps].each do |i|
84
91
  multinsert_performance_test CONNECTION, cleaner, sample_records, PERFORMANCE_DESTINATION, i
85
92
  end
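
To see which buffer sizes the loop above covers, the two ranges can be expanded on their own. This sketch only builds the list of sizes and does not run the performance test.

```ruby
mini_steps = (0..9)                # every size from 0 through 9
big_steps  = (10..10_000).step(10) # 10, 20, ... up to 10_000

# Splatting both into one array gives the full list of buffer sizes to test.
buffer_sizes = [*mini_steps, *big_steps]

puts "sizes tested: #{buffer_sizes.count}"
puts "first: #{buffer_sizes.first}, last: #{buffer_sizes.last}"
```

The small sizes are sampled one by one, which is where the README charts show the steepest change, and the remaining sizes are sampled every 10.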
@@ -28,18 +28,18 @@ describe Multisert do
28
28
 
29
29
  it "adds to the entries" do
30
30
  buffer << [1, 2, 3]
31
- buffer.entries.should == [[1, 2, 3]]
31
+ expect(buffer.entries).to eq [[1, 2, 3]]
32
32
  end
33
33
 
34
34
  it "calls #write_buffer! when the number of entries equals (or exceeds) max buffer count" do
35
35
  buffer.max_buffer_count = 2
36
- buffer.should_receive(:flush!)
36
+ buffer.should_receive(:write_buffer!)
37
37
  buffer << [1, 2, 3]
38
38
  buffer << [1, 2, 3]
39
39
  end
40
40
  end
41
41
 
42
- describe "#flush!" do
42
+ describe "#write_buffer!" do
43
43
  let(:connection) { $connection }
44
44
  let(:buffer) { described_class.new }
45
45
 
@@ -48,19 +48,19 @@ describe Multisert do
48
48
  end
49
49
 
50
50
  it "does not fall over when there are no entries" do
51
- flush_records = connection.query "DELETE FROM #{TEST_DATABASE}.#{TEST_TABLE}"
52
- flush_records.to_a.should == []
51
+ write_buffer_records = connection.query "DELETE FROM #{TEST_DATABASE}.#{TEST_TABLE}"
52
+ expect(write_buffer_records.to_a).to eq []
53
53
 
54
- buffer.flush!
54
+ buffer.write_buffer!
55
55
 
56
- flush_records = connection.query "SELECT * FROM #{TEST_DATABASE}.#{TEST_TABLE}"
57
- flush_records.to_a.should == []
58
- buffer.entries.should == []
56
+ write_buffer_records = connection.query "SELECT * FROM #{TEST_DATABASE}.#{TEST_TABLE}"
57
+ expect(write_buffer_records.to_a).to eq []
58
+ expect(buffer.entries).to eq []
59
59
  end
60
60
 
61
61
  it "multi-inserts all added entries" do
62
- pre_flush_records = connection.query "SELECT * FROM #{TEST_DATABASE}.#{TEST_TABLE}"
63
- pre_flush_records.to_a.should == []
62
+ pre_write_buffer_records = connection.query "SELECT * FROM #{TEST_DATABASE}.#{TEST_TABLE}"
63
+ expect(pre_write_buffer_records.to_a).to eq []
64
64
 
65
65
  buffer.connection = connection
66
66
  buffer.database = TEST_DATABASE
@@ -75,9 +75,9 @@ describe Multisert do
75
75
  buffer << [10, 11, 12, 13]
76
76
  buffer << [14, 15, 16, 17]
77
77
 
78
- buffer.flush!
78
+ buffer.write_buffer!
79
79
 
80
- post_flush_records = connection.query %[
80
+ post_write_buffer_records = connection.query %[
81
81
  SELECT
82
82
  test_field_int_1
83
83
  , test_field_int_2
@@ -85,18 +85,19 @@ describe Multisert do
85
85
  , test_field_int_4
86
86
  FROM #{TEST_DATABASE}.#{TEST_TABLE}]
87
87
 
88
- post_flush_records.to_a.should == [
88
+ expect(post_write_buffer_records.to_a).to eq [
89
89
  {'test_field_int_1' => 1, 'test_field_int_2' => 3, 'test_field_int_3' => 4, 'test_field_int_4' => 5},
90
90
  {'test_field_int_1' => 6, 'test_field_int_2' => 7, 'test_field_int_3' => 8, 'test_field_int_4' => 9},
91
91
  {'test_field_int_1' => 10, 'test_field_int_2' => 11, 'test_field_int_3' => 12, 'test_field_int_4' => 13},
92
- {'test_field_int_1' => 14, 'test_field_int_2' => 15, 'test_field_int_3' => 16, 'test_field_int_4' => 17}]
92
+ {'test_field_int_1' => 14, 'test_field_int_2' => 15, 'test_field_int_3' => 16, 'test_field_int_4' => 17}
93
+ ]
93
94
 
94
- buffer.entries.should == []
95
+ expect(buffer.entries).to eq []
95
96
  end
96
97
 
97
98
  it "works with strings" do
98
- pre_flush_records = connection.query "SELECT * FROM #{TEST_DATABASE}.#{TEST_TABLE}"
99
- pre_flush_records.to_a.should == []
99
+ pre_write_buffer_records = connection.query "SELECT * FROM #{TEST_DATABASE}.#{TEST_TABLE}"
100
+ expect(pre_write_buffer_records.to_a).to eq []
100
101
 
101
102
  buffer.connection = connection
102
103
  buffer.database = TEST_DATABASE
@@ -108,23 +109,24 @@ describe Multisert do
108
109
  buffer << ['c']
109
110
  buffer << ['d']
110
111
 
111
- buffer.flush!
112
+ buffer.write_buffer!
112
113
 
113
- post_flush_records = connection.query %[SELECT test_field_varchar FROM #{TEST_DATABASE}.#{TEST_TABLE}]
114
- post_flush_records.to_a.should == [
114
+ post_write_buffer_records = connection.query %[SELECT test_field_varchar FROM #{TEST_DATABASE}.#{TEST_TABLE}]
115
+ expect(post_write_buffer_records.to_a).to eq [
115
116
  {'test_field_varchar' => 'a'},
116
117
  {'test_field_varchar' => 'b'},
117
118
  {'test_field_varchar' => 'c'},
118
- {'test_field_varchar' => 'd'}]
119
+ {'test_field_varchar' => 'd'}
120
+ ]
119
121
 
120
- buffer.entries.should == []
122
+ expect(buffer.entries).to eq []
121
123
  end
122
124
 
123
125
  it "works with strings that have illegal characters"
124
126
 
125
127
  it "works with dates" do
126
- pre_flush_records = connection.query "SELECT * FROM #{TEST_DATABASE}.#{TEST_TABLE}"
127
- pre_flush_records.to_a.should == []
128
+ pre_write_buffer_records = connection.query "SELECT * FROM #{TEST_DATABASE}.#{TEST_TABLE}"
129
+ expect(pre_write_buffer_records.to_a).to eq []
128
130
 
129
131
  buffer.connection = connection
130
132
  buffer.database = TEST_DATABASE
@@ -136,22 +138,23 @@ describe Multisert do
136
138
  buffer << [Date.new(2013, 1, 17)]
137
139
  buffer << [Date.new(2013, 1, 18)]
138
140
 
139
- buffer.flush!
141
+ buffer.write_buffer!
140
142
 
141
- post_flush_records = connection.query %[SELECT test_field_date FROM #{TEST_DATABASE}.#{TEST_TABLE}]
143
+ post_write_buffer_records = connection.query %[SELECT test_field_date FROM #{TEST_DATABASE}.#{TEST_TABLE}]
142
144
 
143
- post_flush_records.to_a.should == [
145
+ expect(post_write_buffer_records.to_a).to eq [
144
146
  {'test_field_date' => Date.parse('2013-01-15')},
145
147
  {'test_field_date' => Date.parse('2013-01-16')},
146
148
  {'test_field_date' => Date.parse('2013-01-17')},
147
- {'test_field_date' => Date.parse('2013-01-18')}]
149
+ {'test_field_date' => Date.parse('2013-01-18')}
150
+ ]
148
151
 
149
- buffer.entries.should == []
152
+ expect(buffer.entries).to eq []
150
153
  end
151
154
 
152
155
  it "works with times" do
153
- pre_flush_records = connection.query "SELECT * FROM #{TEST_DATABASE}.#{TEST_TABLE}"
154
- pre_flush_records.to_a.should == []
156
+ pre_write_buffer_records = connection.query "SELECT * FROM #{TEST_DATABASE}.#{TEST_TABLE}"
157
+ expect(pre_write_buffer_records.to_a).to eq []
155
158
 
156
159
  buffer.connection = connection
157
160
  buffer.database = TEST_DATABASE
@@ -163,17 +166,33 @@ describe Multisert do
163
166
  buffer << [Time.new(2013, 1, 17, 3, 7, 33)]
164
167
  buffer << [Time.new(2013, 1, 18, 4, 8, 44)]
165
168
 
166
- buffer.flush!
169
+ buffer.write_buffer!
167
170
 
168
- post_flush_records = connection.query %[SELECT test_field_datetime FROM #{TEST_DATABASE}.#{TEST_TABLE}]
171
+ post_write_buffer_records = connection.query %[SELECT test_field_datetime FROM #{TEST_DATABASE}.#{TEST_TABLE}]
169
172
 
170
- post_flush_records.to_a.should == [
173
+ expect(post_write_buffer_records.to_a).to eq [
171
174
  {'test_field_datetime' => Time.new(2013, 1, 15, 1, 5, 11)},
172
175
  {'test_field_datetime' => Time.new(2013, 1, 16, 2, 6, 22)},
173
176
  {'test_field_datetime' => Time.new(2013, 1, 17, 3, 7, 33)},
174
177
  {'test_field_datetime' => Time.new(2013, 1, 18, 4, 8, 44)}]
175
178
 
176
- buffer.entries.should == []
179
+ expect(buffer.entries).to eq []
180
+ end
181
+ end
182
+
183
+ describe "#flush!" do
184
+ it "aliases #write_buffer!" do
185
+ instance = described_class.new
186
+ flush_method = instance.method(:flush!)
187
+ expect(flush_method).to eq instance.method(:write_buffer!)
188
+ end
189
+ end
190
+
191
+ describe "#write!" do
192
+ it "aliases #write_buffer!" do
193
+ instance = described_class.new
194
+ write_method = instance.method(:write!)
195
+ expect(write_method).to eq instance.method(:write_buffer!)
177
196
  end
178
197
  end
179
198
  end
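
The alias specs above rely on `alias_method` pointing several names at one method definition. A minimal sketch of that relationship outside RSpec, using a hypothetical `AliasSketch` class in place of `Multisert`:

```ruby
# Hypothetical class mirroring the alias setup added in this release.
class AliasSketch
  def write_buffer!
    :written
  end

  alias_method :write!, :write_buffer!
  alias_method :flush!, :write_buffer!
end

instance = AliasSketch.new

# All three names run the same method body.
results = [instance.write_buffer!, instance.write!, instance.flush!]

# Method#original_name reports the name the alias was created from.
flush_original = instance.method(:flush!).original_name
write_original = instance.method(:write!).original_name
```

Keeping `flush!` as an alias means code written against 0.0.2 should continue to work while new code uses `write!`.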
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: multisert
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.2
4
+ version: 0.0.3
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2013-03-09 00:00:00.000000000 Z
12
+ date: 2013-03-13 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: mysql2