upsert 0.2.2 → 0.3.0
- data/CHANGELOG +8 -1
- data/README.md +9 -10
- data/lib/upsert.rb +9 -6
- data/lib/upsert/mysql2_client.rb +23 -76
- data/lib/upsert/pg_connection.rb +2 -2
- data/lib/upsert/sqlite3_database.rb +2 -2
- data/lib/upsert/version.rb +1 -1
- data/test/helper.rb +2 -2
- data/test/shared/database.rb +4 -4
- data/test/shared/multibyte.rb +2 -2
- data/test/shared/threaded.rb +2 -2
- data/test/test_mysql2.rb +0 -65
- metadata +1 -1
data/CHANGELOG
CHANGED
@@ -1,8 +1,15 @@
+0.3.0 / 2012-06-21
+
+* Enhancements
+
+* Remove all the sampling - just keep a cumulative total of sql bytes as we build up an ON DUPLICATE KEY UPDATE query.
+* Deprecate Upsert.stream in favor of Upsert.batch (but provide an alias for backwards compat)
+
 0.2.2 / 2012-06-21
 
 * Bug fixes
 
-* Correct and simplify how sql length is calculated when batching
+* Correct and simplify how sql length is calculated when batching MySQL upserts.
 
 0.2.1 / 2012-06-21
 
data/README.md
CHANGED
@@ -10,24 +10,24 @@ The second argument is currently (mis)named a "document" because this was inspir
 
 ### One by one
 
-Faster than just doing `Pet.create`... 85% faster on PostgreSQL, for example. But no validations or anything.
+Faster than just doing `Pet.create`... 85% faster on PostgreSQL, for example, than all the different native ActiveRecord methods I've tried. But no validations or anything.
 
     upsert = Upsert.new Pet.connection, Pet.table_name
     upsert.row({:name => 'Jerry'}, :breed => 'beagle')
     upsert.row({:name => 'Pierre'}, :breed => 'tabby')
 
-###
+### Batch mode
 
 Rows are buffered in memory until it's efficient to send them to the database. Currently this only provides an advantage on MySQL because it uses `ON DUPLICATE KEY UPDATE`... but if a similar method appears in PostgreSQL, the same code will still work.
 
-    Upsert.stream(Pet.connection, Pet.table_name) do |upsert|
+    Upsert.batch(Pet.connection, Pet.table_name) do |upsert|
       upsert.row({:name => 'Jerry'}, :breed => 'beagle')
       upsert.row({:name => 'Pierre'}, :breed => 'tabby')
     end
 
 ### `ActiveRecord::Base.upsert` (optional)
 
-For bulk upserts, you probably still want to use `Upsert.stream`.
+For bulk upserts, you probably still want to use `Upsert.batch`.
 
     require 'upsert/active_record_upsert'
     Pet.upsert({:name => 'Jerry'}, :breed => 'beagle')
@@ -37,7 +37,7 @@ For bulk upserts, you probably still want to use `Upsert.stream`.
 
 Currently, the first row you pass in determines the columns that will be used. That's useful for mass importing of many rows with the same columns, but is surprising if you're trying to use a single `Upsert` object to add arbitrary data. For example, this won't work:
 
-    Upsert.stream(Pet.connection, Pet.table_name) do |upsert|
+    Upsert.batch(Pet.connection, Pet.table_name) do |upsert|
       upsert.row({:name => 'Jerry'}, :breed => 'beagle')
       upsert.row({:tag_number => 456}, :spiel => 'great cat') # won't work - doesn't use same columns
     end
@@ -51,10 +51,9 @@ You would need to use a new `Upsert` object. On the other hand, this is totally
 
 Pull requests for any of these would be greatly appreciated:
 
-1.
-2.
-3.
-4. Naming suggestions: should "document" be called "setters" or "attributes"? Should "stream" be "batch" instead?
+1. Fix SQLite tests.
+2. If you think there's a fix for the "fixed column set" gotcha...
+3. Naming suggestions: should "document" be called "setters" or "attributes"?
 
 ## Real-world usage
 
@@ -219,7 +218,7 @@ You could also use [activerecord-import](https://github.com/zdennis/activerecord
 
     Pet.import columns, all_values, :timestamps => false, :on_duplicate_key_update => columns
 
-This, however, only works on MySQL and requires ActiveRecord—and if all you are doing is upserts, `upsert` is tested to be 40% faster. And you don't have to put all of the rows to be upserted into a single huge array - you can
+This, however, only works on MySQL and requires ActiveRecord—and if all you are doing is upserts, `upsert` is tested to be 40% faster. And you don't have to put all of the rows to be upserted into a single huge array - you can batch them using `Upsert.batch`.
 
 ## Copyright
 
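The "fixed column set" gotcha above means a single `Upsert` object can only handle rows that share one column set. A hypothetical workaround (not part of the gem): group the rows by their combined column set first, then run one batch per group. A minimal runnable sketch of the grouping step, where `rows` and the key expression are illustrative:

```ruby
# Each entry pairs a selector with a document, as in upsert.row(selector, document).
rows = [
  [{:name => 'Jerry'},   {:breed => 'beagle'}],
  [{:tag_number => 456}, {:spiel => 'great cat'}],
  [{:name => 'Pierre'},  {:breed => 'tabby'}],
]

# Group by the union of selector and document keys, so every group
# shares exactly one column set.
groups = rows.group_by { |selector, document| selector.keys | document.keys }

# Each group is now safe to feed to a single Upsert, e.g. (sketch):
#   Upsert.batch(Pet.connection, Pet.table_name) do |upsert|
#     groups[[:name, :breed]].each { |sel, doc| upsert.row(sel, doc) }
#   end
```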
data/lib/upsert.rb
CHANGED
@@ -16,7 +16,7 @@ class Upsert
       Binary.new v
     end
 
-    # @yield [Upsert] An +Upsert+ object in
+    # @yield [Upsert] An +Upsert+ object in batch mode. You can call #row on it multiple times and it will try to optimize on speed.
     #
     # @note Buffered in memory until it's efficient to send to the server a packet.
     #
@@ -25,16 +25,19 @@ class Upsert
     # @return [nil]
     #
     # @example Many at once
-    #   Upsert.
+    #   Upsert.batch(Pet.connection, Pet.table_name) do |upsert|
     #     upsert.row({:name => 'Jerry'}, :breed => 'beagle')
     #     upsert.row({:name => 'Pierre'}, :breed => 'tabby')
     #   end
-    def stream(connection, table_name)
+    def batch(connection, table_name)
       upsert = new connection, table_name
       upsert.async!
       yield upsert
       upsert.sync!
     end
+
+    # @deprecated Use .batch instead.
+    alias :stream :batch
   end
 
 # Raised if a query would be too large to send in a single packet.
@@ -58,13 +61,13 @@ class Upsert
   attr_reader :table_name
 
   # @private
-  attr_reader :rows
+  attr_reader :buffer
 
   # @param [Mysql2::Client,Sqlite3::Database,PG::Connection,#raw_connection] connection A supported database connection.
   # @param [String,Symbol] table_name The name of the table into which you will be upserting.
   def initialize(connection, table_name)
     @table_name = table_name
-    @rows = []
+    @buffer = []
 
     @connection = if connection.respond_to?(:raw_connection)
       # deal with ActiveRecord::Base.connection or ActiveRecord::Base.connection_pool.checkout
@@ -92,7 +95,7 @@ class Upsert
   #   upsert.row({:name => 'Jerry'}, :breed => 'beagle')
   #   upsert.row({:name => 'Pierre'}, :breed => 'tabby')
   def row(selector, document)
-    rows.push Row.new(self, selector, document)
+    buffer.push Row.new(self, selector, document)
     if sql = chunk
       execute sql
     end
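The `alias :stream :batch` line above is what keeps existing `Upsert.stream` callers working after the rename. A minimal sketch of that backwards-compat pattern on a class-method singleton; `Greeter` is a stand-in, not the gem's class:

```ruby
class Greeter
  class << self
    # New, preferred name: yields a buffer, then returns a summary string.
    def batch(label)
      yield buffer = []
      "#{label}: #{buffer.join(', ')}"
    end

    # Deprecated old name keeps working because it points at the same method.
    alias :stream :batch
  end
end
```

Callers written against the old API, such as `Greeter.stream('pets') { |b| b << 'Jerry' }`, behave identically to `Greeter.batch`.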
data/lib/upsert/mysql2_client.rb
CHANGED
@@ -1,50 +1,34 @@
 class Upsert
   # @private
   module Mysql2_Client
-    SAMPLE = 0.1
-
     def chunk
-      return if
-
-
-
-
-      end
-      if async? and take == all
-        return
-      end
-      while take > 2 and oversize?(take)
-        $stderr.puts " Length prediction via sampling failed, shrinking" if ENV['UPSERT_DEBUG'] == 'true'
-        take -= 2
+      return if buffer.empty?
+      if not async?
+        retval = sql
+        buffer.clear
+        return retval
       end
-
-
-
-
-
-
-
-
+      @cumulative_sql_bytesize ||= static_sql_bytesize
+      new_row = buffer.pop
+      d = new_row.values_sql_bytesize + 3 # ),(
+      if @cumulative_sql_bytesize + d > max_sql_bytesize
+        retval = sql
+        buffer.clear
+        @cumulative_sql_bytesize = static_sql_bytesize + d
+      else
+        retval = nil
+        @cumulative_sql_bytesize += d
       end
-
-
-      chunk
+      buffer.push new_row
+      retval
     end
 
     def execute(sql)
       connection.query sql
     end
 
-    def probably_oversize?(take)
-      estimate_sql_bytesize(take) > max_sql_bytesize
-    end
-
-    def oversize?(take)
-      sql_bytesize(take) > max_sql_bytesize
-    end
-
     def columns
-      @columns ||=
+      @columns ||= buffer.first.columns
     end
 
     def insert_part
@@ -65,48 +49,11 @@ class Upsert
       @static_sql_bytesize ||= insert_part.bytesize + update_part.bytesize + 2
     end
 
-
-
-
-      if
-
-        memo += 3*(take-1)
-      end
-      memo
-    end
-
-    def estimate_variable_sql_bytesize(take)
-      n = (take * SAMPLE).ceil
-      sample = if RUBY_VERSION >= '1.9'
-        rows.first(take).sample(n)
-      else
-        # based on https://github.com/marcandre/backports/blob/master/lib/backports/1.8.7/array.rb
-        memo = rows.first(take)
-        n.times do |i|
-          r = i + Kernel.rand(take - i)
-          memo[i], memo[r] = memo[r], memo[i]
-        end
-        memo.first(n)
-      end
-      memo = sample.inject(0) { |sum, row| sum + row.values_sql_bytesize } / SAMPLE
-      if take > 0
-        # parens and comma
-        memo += 3*(take-1)
-      end
-      memo
-    end
-
-    def sql_bytesize(take)
-      static_sql_bytesize + variable_sql_bytesize(take)
-    end
-
-    def estimate_sql_bytesize(take)
-      static_sql_bytesize + estimate_variable_sql_bytesize(take)
-    end
-
-    def sql(take)
-      all_value_sql = rows.first(take).map { |row| row.values_sql }
-      [ insert_part, '(', all_value_sql.join('),('), ')', update_part ].join
+    def sql
+      all_value_sql = buffer.map { |row| row.values_sql }
+      retval = [ insert_part, '(', all_value_sql.join('),('), ')', update_part ].join
+      raise TooBig if retval.bytesize > max_sql_bytesize
+      retval
     end
 
 # since setting an option like :as => :hash actually persists that option to the client, don't pass any options
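The new `chunk` above replaces sampling with exact bookkeeping: a running byte total is updated once per row, and the buffer is flushed just before the ON DUPLICATE KEY UPDATE query would exceed the size limit. A toy model of that accounting, with no database involved; the class and method names are illustrative, not the gem's API:

```ruby
# Buffers value-SQL fragments and flushes when the running byte total
# would exceed max_bytes, mirroring the cumulative counter in the diff.
class ByteBudget
  def initialize(static_bytes, max_bytes)
    @static = static_bytes      # bytes of the fixed INSERT ... UPDATE parts
    @max = max_bytes            # stand-in for MySQL's packet size limit
    @cumulative = static_bytes
    @buffer = []
  end

  # Returns the flushed fragments when adding value_sql would overflow,
  # otherwise nil (the fragment just accumulates).
  def push(value_sql)
    d = value_sql.bytesize + 3  # "),(" separator, as in the diff
    if @cumulative + d > @max
      flushed = @buffer.dup
      @buffer = [value_sql]
      @cumulative = @static + d # restart the count with the carried-over row
      flushed
    else
      @buffer << value_sql
      @cumulative += d
      nil
    end
  end
end
```

With a static size of 10 bytes and a 30-byte budget, two 5-byte fragments accumulate (10 + 8 + 8 = 26) and the third triggers a flush, exactly the overflow-then-carry behavior of the rewritten `chunk`.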
data/lib/upsert/pg_connection.rb
CHANGED
@@ -2,8 +2,8 @@ class Upsert
   # @private
   module SQLite3_Database
     def chunk
-      return if rows.empty?
-      row = rows.shift
+      return if buffer.empty?
+      row = buffer.shift
       %{INSERT OR IGNORE INTO "#{table_name}" (#{row.columns_sql}) VALUES (#{row.values_sql});UPDATE "#{table_name}" SET #{row.set_sql} WHERE #{row.where_sql}}
     end
 
data/lib/upsert/version.rb
CHANGED
data/test/helper.rb
CHANGED
@@ -77,7 +77,7 @@ MiniTest::Spec.class_eval do
 
     Pet.delete_all
 
-    Upsert.stream(connection, :pets) do |upsert|
+    Upsert.batch(connection, :pets) do |upsert|
       records.each do |selector, document|
         upsert.row(selector, document)
       end
@@ -109,7 +109,7 @@ MiniTest::Spec.class_eval do
     sleep 1
 
     upsert_time = Benchmark.realtime do
-      Upsert.stream(connection, :pets) do |upsert|
+      Upsert.batch(connection, :pets) do |upsert|
        records.each do |selector, document|
          upsert.row(selector, document)
        end
data/test/shared/database.rb
CHANGED
@@ -44,17 +44,17 @@ shared_examples_for 'is a database with an upsert trick' do
       end
     end
   end
-  describe :stream do
+  describe :batch do
     it "works for multiple rows (base case)" do
       assert_creates(Pet, [{:name => 'Jerry', :gender => 'male'}]) do
-        Upsert.stream(connection, :pets) do |upsert|
+        Upsert.batch(connection, :pets) do |upsert|
           upsert.row({:name => 'Jerry'}, :gender => 'male')
         end
       end
     end
     it "works for multiple rows (not changing anything)" do
       assert_creates(Pet, [{:name => 'Jerry', :gender => 'male'}]) do
-        Upsert.stream(connection, :pets) do |upsert|
+        Upsert.batch(connection, :pets) do |upsert|
           upsert.row({:name => 'Jerry'}, :gender => 'male')
           upsert.row({:name => 'Jerry'}, :gender => 'male')
         end
@@ -62,7 +62,7 @@ shared_examples_for 'is a database with an upsert trick' do
     end
     it "works for multiple rows (changing something)" do
       assert_creates(Pet, [{:name => 'Jerry', :gender => 'neutered'}]) do
-        Upsert.stream(connection, :pets) do |upsert|
+        Upsert.batch(connection, :pets) do |upsert|
           upsert.row({:name => 'Jerry'}, :gender => 'male')
           upsert.row({:name => 'Jerry'}, :gender => 'neutered')
         end
data/test/shared/multibyte.rb
CHANGED
@@ -13,9 +13,9 @@ shared_examples_for "supports multibyte" do
       upsert.row({:name => 'I♥NY'}, {:gender => 'jÚrgen'})
     end
   end
-  it "works stream" do
+  it "works batch" do
     assert_creates(Pet, [{:name => 'I♥NY', :gender => 'jÚrgen'}]) do
-      Upsert.stream(connection, :pets) do |upsert|
+      Upsert.batch(connection, :pets) do |upsert|
         upsert.row({:name => 'I♥NY'}, {:gender => 'périferôl'})
         upsert.row({:name => 'I♥NY'}, {:gender => 'jÚrgen'})
       end
data/test/shared/threaded.rb
CHANGED
@@ -13,9 +13,9 @@ shared_examples_for 'is thread-safe' do
       end
     end
   end
-  it "is safe to use stream" do
+  it "is safe to use batch" do
     assert_creates(Pet, [{:name => 'Jerry', :gender => 'neutered'}]) do
-      Upsert.stream(connection, :pets) do |upsert|
+      Upsert.batch(connection, :pets) do |upsert|
         ts = []
         10.times do
           ts << Thread.new do
data/test/test_mysql2.rb
CHANGED
@@ -38,69 +38,4 @@ describe Upsert::Mysql2_Client do
   it_also "doesn't mess with timezones"
 
   it_also "doesn't blow up on reserved words"
-
-  describe '#sql_bytesize' do
-    def assert_exact(selector_proc, document_proc, show = false)
-      upsert = Upsert.new connection, :pets
-      0.upto(256) do |i|
-        upsert.rows << Upsert::Row.new(upsert, selector_proc.call(i), document_proc.call(i))
-        i.upto(upsert.rows.length) do |take|
-          expected_sql = upsert.sql(take)
-          actual = upsert.sql_bytesize(take)
-          if show and actual != expected_sql.bytesize
-            $stderr.puts
-            $stderr.puts "Expected: #{expected_sql.bytesize}"
-            $stderr.puts "Actual: #{actual}"
-            $stderr.puts expected_sql
-          end
-          actual.must_equal expected_sql.bytesize
-        end
-      end
-    end
-    def rand_string(length)
-      # http://www.dzone.com/snippets/generate-random-string-letters
-      # Array.new(length) { (rand(122-97) + 97).chr }.join
-      if RUBY_VERSION >= '1.9'
-        Array.new(length) { rand(512).chr(Encoding::UTF_8) }.join
-      else
-        Array.new(length) { rand(512) }.pack('C*')
-      end
-    end
-    it "is exact as selector length changes" do
-      selector_proc = proc do |i|
-        { :name => rand_string(i) }
-      end
-      document_proc = proc do |i|
-        {}
-      end
-      assert_exact selector_proc, document_proc
-    end
-    it "is exact as value length changes" do
-      selector_proc = proc do |i|
-        { :name => 'Jerry' }
-      end
-      document_proc = proc do |i|
-        { :spiel => rand_string(i) }
-      end
-      assert_exact selector_proc, document_proc
-    end
-    it "is exact as both selector and value length change" do
-      selector_proc = proc do |i|
-        { :name => rand_string(i) }
-      end
-      document_proc = proc do |i|
-        { :spiel => rand_string(i) }
-      end
-      assert_exact selector_proc, document_proc
-    end
-    it "is exact with numbers too" do
-      selector_proc = proc do |i|
-        { :tag_number => rand(1e5) }
-      end
-      document_proc = proc do |i|
-        { :lovability => rand }
-      end
-      assert_exact selector_proc, document_proc
-    end
-  end
 end
|