upsert 0.4.0 → 0.5.0

data/CHANGELOG CHANGED
@@ -1,3 +1,25 @@
+ 0.5.0 / 2012-09-21
+
+ * Breaking changes (well, not really)
+
+ * "document" (as in the second argument to #row) has been renamed to "setter"!
+
+ * Bug fixes
+
+ * If you say upsert({:name => 'Jerry', :color => 'red'}), make sure that it only affects rows really meeting those conditions
+ * Always sort selector and setter keys - i.e., column names - before doing anything with them
+ * Support PostgreSQL 9.1+
+ * Support MRI 1.8
+
+ * Enhancements
+
+ * Slightly faster benchmarks for SQlite3 and MySQL
+ * Slightly slower on PostgreSQL (probably because the merge function requires more arguments), but more accurate
+ * Slightly clearer code structure
+ * Use bind parameters instead of quoting for PostgreSQL and SQLite3.
+ * Provide Upsert.clear_database_functions(connection) (currently only for PostgreSQL)
+ * Don't subclass String for Upset::Binary... hopefully save some strcpy()s?
+
  0.4.0 / 2012-09-04
 
  * Bug fixes
@@ -18,7 +40,7 @@
 
  * Enhancements
 
- * Make document an optional argument
+ * Make setter an optional argument
 
  0.3.3 / 2012-06-26
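
The "always sort selector and setter keys" entry above can be pictured with a small sketch. `sorted_columns` is a hypothetical helper, not code from the gem. It only shows why sorting gives a stable column order no matter how the caller wrote the hash, which matters on MRI 1.8, where hashes do not preserve insertion order.

```ruby
# Hypothetical helper (not from the gem): derive a stable column order
# from a hash by converting its keys to strings and sorting them.
def sorted_columns(hash)
  hash.keys.map { |k| k.to_s }.sort
end

a = sorted_columns(:name => 'Jerry', :color => 'red')
b = sorted_columns(:color => 'red', :name => 'Jerry')

# Both calls yield the same column list regardless of key order in the literal.
puts a.inspect
puts a == b
```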
 
data/README.md CHANGED
@@ -26,7 +26,7 @@ Rows are buffered in memory until it's efficient to send them to the database.
  upsert.row({:name => 'Pierre'}, :breed => 'tabby')
  end
 
- Tested to be much about 85% faster on PostgreSQL and 50% faster on MySQL than comparable methods (see the tests).
+ Tested to be much about 60% faster on PostgreSQL and 60–90% faster on MySQL and SQLite3 than comparable methods (see the tests, which fail if they are not faster).
 
  ## Gotchas
 
@@ -34,9 +34,9 @@ Tested to be much about 85% faster on PostgreSQL and 50% faster on MySQL than co
 
  Make sure you're upserting against either primary key columns or columns with UNIQUE indexes or both.
 
- ### Columns are set based on the first row you pass
+ ### For MySQL, columns are set based on the first row you pass
 
- Currently, the first row you pass in determines the columns that will be used. That's useful for mass importing of many rows with the same columns, but is surprising if you're trying to use a single `Upsert` object to add arbitrary data. For example, this won't work:
+ Currently, on MySQL, the first row you pass in determines the columns that will be used for all future upserts using the same Upsert object. That's useful for mass importing of many rows with the same columns, but is surprising if you're trying to use a single `Upsert` object to add arbitrary data. For example, this won't work:
 
  Upsert.batch(Pet.connection, Pet.table_name) do |upsert|
  upsert.row({:name => 'Jerry'}, :breed => 'beagle')
@@ -52,9 +52,11 @@ You would need to use a new `Upsert` object. On the other hand, this is totally
 
  Pull requests for any of these would be greatly appreciated:
 
+ 1. More correctness tests! What is the dictionary definition of "upsert," anyway?
+ 1. Sanity check my three benchmarks (four if you include activerecord-import on MySQL). Do they accurately represent optimized alternatives?
  1. Provide `require 'upsert/debug'` that will make sure you are selecting on columns that have unique indexes
- 1. Make `Upsert` instances accept arbitrary columns, which is what people probably expect. (this should work on PG already)
- 1. Naming suggestions: should "document" be called "setters" or "attributes"?
+ 1. Make `Upsert` instances accept arbitrary columns, which is what people probably expect. (this should work on PostgreSQL and SQLite3 already)
+ 1. JRuby support
 
  ## Real-world usage
 
@@ -73,16 +75,16 @@ Originally written to speed up the [`data_miner`](https://github.com/seamusabshe
 
  Using the [mysql2](https://rubygems.org/gems/mysql2) driver.
 
- Upsert.new Mysql2::Connection.new([...]), :pets
+ upsert = Upsert.new(Mysql2::Connection.new(:username => 'root', :password => 'password', :database => 'upsert_test'), :pets)
 
  #### Speed
 
- From the tests:
+ From the tests (updated 9/21/12):
 
- Upsert was 77% faster than find + new/set/save
- Upsert was 58% faster than create + rescue/find/update
- Upsert was 80% faster than find_or_create + update_attributes
- Upsert was 39% faster than faking upserts with activerecord-import
+ Upsert was 88% faster than find + new/set/save
+ Upsert was 90% faster than create + rescue/find/update
+ Upsert was 90% faster than find_or_create + update_attributes
+ Upsert was 60% faster than faking upserts with activerecord-import
 
  #### SQL MERGE trick
 
@@ -92,21 +94,25 @@ From the tests:
  INSERT INTO table (a,b,c) VALUES (1,2,3), (4,5,6)
  ON DUPLICATE KEY UPDATE a=VALUES(a),b=VALUES(b),c=VALUES(c);
 
+ If `a` only appeared in the selector, then we avoid updating it in case of a duplicate key:
+
+ ON DUPLICATE KEY UPDATE a=a,b=VALUES(b),c=VALUES(c);
+
  Since this is an upsert helper library, not a general-use ON DUPLICATE KEY UPDATE wrapper, you **can't** do things like `c=c+1`.
 
  ### PostgreSQL
 
  Using the [pg](https://rubygems.org/gems/pg) driver.
 
- Upsert.new PG.connect([...]), :pets
+ upsert = Upsert.new(PG.connect(:dbname => 'upsert_test'), :pets)
 
  #### Speed
 
- From the tests:
+ From the tests (updated 9/21/12):
 
- Upsert was 73% faster than find + new/set/save
- Upsert was 84% faster than find_or_create + update_attributes
- Upsert was 87% faster than create + rescue/find/update
+ Upsert was 65% faster than find + new/set/save
+ Upsert was 79% faster than find_or_create + update_attributes
+ Upsert was 76% faster than create + rescue/find/update
  # (can't compare to activerecord-import because you can't fake it on pg)
 
  #### SQL MERGE trick
@@ -138,6 +144,8 @@ From the tests:
  SELECT merge_db(1, 'david');
  SELECT merge_db(1, 'dennis');
 
+ I slightly modified it so that it only retries once - don't want infinite loops.
+
  The decision was made **not** to use the following because it's not straight from the manual:
 
  # http://stackoverflow.com/questions/1109061/insert-on-duplicate-update-postgresql
@@ -166,11 +174,16 @@ This was also rejected because there's something we can use in the manual:
 
  Using the [sqlite3](https://rubygems.org/gems/sqlite3) driver.
 
- Upsert.new SQLite3::Database.open([...]), :pets
+ upsert = Upsert.new(SQLite3::Database.open(':memory:'), :pets)
 
  #### Speed
 
- FIXME tests are segfaulting. Pull request would be lovely.
+ From the tests (updated 9/21/12):
+
+ Upsert was 77% faster than find + new/set/save
+ Upsert was 80% faster than find_or_create + update_attributes
+ Upsert was 85% faster than create + rescue/find/update
+ # (can't compare to activerecord-import because you can't fake it on sqlite3)
 
  #### SQL MERGE trick
 
data/Rakefile CHANGED
@@ -11,6 +11,7 @@ task :rspec_all_databases do
  puts
  pid = POSIX::Spawn.spawn({'ADAPTER' => adapter}, 'rspec', '--format', 'documentation', File.expand_path('../spec', __FILE__))
  Process.waitpid pid
+ raise unless $?.success?
  end
  end
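
The added `raise unless $?.success?` line makes the Rake task fail as soon as one adapter's spec run fails. A minimal sketch of the same check follows. It uses the standard library's `Process.spawn` in place of `POSIX::Spawn.spawn`, and a throwaway child Ruby process in place of `rspec`. `child_succeeded?` is a hypothetical helper name.

```ruby
require 'rbconfig'

# Run a child Ruby process that exits with the given status and report
# whether it succeeded. Process.spawn stands in for POSIX::Spawn.spawn.
def child_succeeded?(exit_code)
  pid = Process.spawn(RbConfig.ruby, '-e', "exit #{exit_code}")
  Process.waitpid pid
  # $? now holds the Process::Status of the child we just waited on.
  $?.success?
end

puts child_succeeded?(0)
puts child_succeeded?(1)
```

Without the check, a non-zero exit from the child is silently ignored and the loop moves on to the next adapter.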
16
17
 
@@ -7,6 +7,7 @@ require 'upsert/binary'
  require 'upsert/buffer'
  require 'upsert/connection'
  require 'upsert/row'
+ require 'upsert/cell'
 
  class Upsert
  class << self
@@ -34,6 +35,16 @@ class Upsert
  end
  end
 
+ # @param [Mysql2::Client,Sqlite3::Database,PG::Connection,#raw_connection] connection A supported database connection.
+ #
+ # Clear any database functions that may have been created.
+ #
+ # Currently only applies to PostgreSQL.
+ def clear_database_functions(connection)
+ dummy = new(connection, :dummy)
+ dummy.buffer.clear_database_functions
+ end
+
  # @param [String] v A string containing binary data that should be inserted/escaped as such.
  #
  # @return [Upsert::Binary]
@@ -41,9 +52,14 @@ class Upsert
  Binary.new v
  end
 
- # @yield [Upsert] An +Upsert+ object in batch mode. You can call #row on it multiple times and it will try to optimize on speed.
+ # Guarantee that the most efficient way of buffering rows is used.
+ #
+ # Currently mostly helps for MySQL, but you should use it whenever possible in case future buffering-based optimizations become possible.
  #
- # @note Buffered in memory until it's efficient to send to the server a packet.
+ # @param [Mysql2::Client,Sqlite3::Database,PG::Connection,#raw_connection] connection A supported database connection.
+ # @param [String,Symbol] table_name The name of the table into which you will be upserting.
+ #
+ # @yield [Upsert] An +Upsert+ object in batch mode. You can call #row on it multiple times and it will try to optimize on speed.
  #
  # @return [nil]
  #
@@ -66,12 +82,17 @@ class Upsert
  SINGLE_QUOTE = %{'}
  DOUBLE_QUOTE = %{"}
  BACKTICK = %{`}
- E_AND_SINGLE_QUOTE = %{E'}
  X_AND_SINGLE_QUOTE = %{x'}
  USEC_SPRINTF = '%06d'
  ISO8601_DATETIME = '%Y-%m-%d %H:%M:%S'
  ISO8601_DATE = '%F'
  NULL_WORD = 'NULL'
+ HANDLER = {
+ 'SQLite3::Database' => 'SQLite3_Database',
+ 'PGConn' => 'PG_Connection',
+ 'PG::Connection' => 'PG_Connection',
+ 'Mysql2::Client' => 'Mysql2_Client',
+ }
 
  # @return [Upsert::Connection]
  attr_reader :connection
@@ -82,22 +103,32 @@ class Upsert
  # @private
  attr_reader :buffer
 
+ # @private
+ attr_reader :row_class
+
+ # @private
+ attr_reader :cell_class
+
  # @param [Mysql2::Client,Sqlite3::Database,PG::Connection,#raw_connection] connection A supported database connection.
  # @param [String,Symbol] table_name The name of the table into which you will be upserting.
  def initialize(connection, table_name)
  @table_name = table_name.to_s
  raw_connection = connection.respond_to?(:raw_connection) ? connection.raw_connection : connection
- n = raw_connection.class.name.gsub(/\W+/, '_')
- @connection = Connection.const_get(n).new self, raw_connection
- @buffer = Buffer.const_get(n).new self
+ connection_class_name = HANDLER[raw_connection.class.name]
+ @connection = Connection.const_get(connection_class_name).new self, raw_connection
+ @buffer = Buffer.const_get(connection_class_name).new self
+ @row_class = Row.const_get connection_class_name
+ @cell_class = Cell.const_get connection_class_name
  end
 
- # Upsert a row given a selector and a document.
+ # Upsert a row given a selector and a setter.
+ #
+ # The selector values are used as setters if it's a new row. So if your selector is `name=Jerry` and your setter is `age=4`, and there is no Jerry yet, then a new row will be created with name Jerry and age 4.
  #
  # @see http://api.mongodb.org/ruby/1.6.4/Mongo/Collection.html#update-instance_method Loosely based on the upsert functionality of the mongo-ruby-driver #update method
  #
  # @param [Hash] selector Key-value pairs that will be used to find or create a row.
- # @param [Hash] document Key-value pairs that will be set on the row, whether it previously existed or not.
+ # @param [Hash] setter Key-value pairs that will be set on the row, whether it previously existed or not.
  #
  # @return [nil]
  #
@@ -105,8 +136,8 @@ class Upsert
  # upsert = Upsert.new Pet.connection, Pet.table_name
  # upsert.row({:name => 'Jerry'}, :breed => 'beagle')
  # upsert.row({:name => 'Pierre'}, :breed => 'tabby')
- def row(selector, document = {})
- buffer << Row.new(self, selector, document)
+ def row(selector, setter = {})
+ buffer << row_class.new(self, selector, setter)
  nil
  end
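
The constructor change above replaces a handler name derived from the connection class name with an explicit lookup in `HANDLER`. The sketch below contrasts the two using plain strings. The `HANDLER` table is copied from the diff. `derived_name` and `handler_name` are hypothetical helpers, and the `ArgumentError` for unknown classes is an assumption added for illustration (the diff uses a plain `HANDLER[...]` lookup).

```ruby
# Copy of the HANDLER table from the diff above.
HANDLER = {
  'SQLite3::Database' => 'SQLite3_Database',
  'PGConn'            => 'PG_Connection',
  'PG::Connection'    => 'PG_Connection',
  'Mysql2::Client'    => 'Mysql2_Client',
}

# Old approach: derive the handler name from the class name itself.
def derived_name(class_name)
  class_name.gsub(/\W+/, '_')
end

# New approach: look the handler name up explicitly.
# Raising on unknown classes is an illustrative assumption, not the gem's behavior.
def handler_name(class_name)
  HANDLER.fetch(class_name) do
    raise ArgumentError, "unsupported connection class: #{class_name}"
  end
end

puts derived_name('PG::Connection')
puts handler_name('PG::Connection')
puts handler_name('Mysql2::Client')
```

With a table, several driver class names can share one handler without defining alias constants. This appears to be why the `PGconn = PG_Connection` alias is removed elsewhere in this diff.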
112
143
 
@@ -1,9 +1,9 @@
  class Upsert
  module ActiveRecordUpsert
- def upsert(selector, document = {})
+ def upsert(selector, setter = {})
  ActiveRecord::Base.connection_pool.with_connection do |c|
  upsert = Upsert.new c, table_name
- upsert.row selector, setter
+ upsert.row selector, setter
  end
  end
  end
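
The selector/setter semantics described in the `#row` documentation ("the selector values are used as setters if it's a new row") can be sketched against an in-memory array. `upsert_row` is a hypothetical stand-in that only illustrates the intended behavior; the gem itself issues SQL against a database.

```ruby
# In-memory stand-in for a table: an array of row hashes.
# This illustrates the semantics only; it is not how the gem works internally.
def upsert_row(table, selector, setter = {})
  existing = table.find { |row| selector.all? { |k, v| row[k] == v } }
  if existing
    # A row matches the selector: apply only the setter.
    existing.update(setter)
  else
    # No match: the selector values are used as setters for the new row.
    table << selector.merge(setter)
  end
  nil
end

pets = []
upsert_row(pets, { :name => 'Jerry' }, :age => 4)  # no Jerry yet, so a row is created
upsert_row(pets, { :name => 'Jerry' }, :age => 5)  # Jerry exists, so only age changes
puts pets.inspect
```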
@@ -2,6 +2,5 @@ class Upsert
  # A wrapper class for binary strings so that Upsert knows to escape them as such.
  #
  # Create them with +Upsert.binary(x)+
- class Binary < ::String
- end
+ Binary = Struct.new(:value)
  end
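
A short sketch of what the switch from a `String` subclass to a `Struct` implies for callers. `Binary` is redefined here at top level purely for illustration (in the gem it lives under `Upsert`). The wrapper holds a reference to the original string rather than being a string itself.

```ruby
# Stand-in for the new definition: a plain value wrapper, not a String subclass.
Binary = Struct.new(:value)

raw  = "\x00\x01binary"
blob = Binary.new(raw)

puts blob.is_a?(String)       # false: the wrapper is no longer a String
puts blob.value.equal?(raw)   # true: #value is the same object, not a copy
```

Code that previously treated an `Upsert::Binary` as a string would presumably need to go through `#value` after this change.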
@@ -4,9 +4,9 @@ class Upsert
  class Mysql2_Client < Buffer
  def ready
  return if rows.empty?
- c = parent.connection
+ connection = parent.connection
  if not async?
- c.execute sql
+ connection.execute sql
  rows.clear
  return
  end
@@ -14,7 +14,7 @@ class Upsert
  new_row = rows.pop
  d = new_row.values_sql_bytesize + 3 # ),(
  if @cumulative_sql_bytesize + d > max_sql_bytesize
- c.execute sql
+ connection.execute sql
  rows.clear
  @cumulative_sql_bytesize = static_sql_bytesize + d
  else
@@ -24,24 +24,33 @@ class Upsert
  nil
  end
 
- def columns
- @columns ||= rows.first.columns
+ def setter
+ @setter ||= rows.first.setter.keys
+ end
+
+ def original_setter
+ @original_setter ||= rows.first.original_setter_keys
  end
 
  def insert_part
  @insert_part ||= begin
  connection = parent.connection
- columns_sql = columns.map { |k| connection.quote_ident(k) }.join(',')
- %{INSERT INTO #{parent.quoted_table_name} (#{columns_sql}) VALUES }
+ column_names = setter.map { |k| connection.quote_ident(k) }
+ %{INSERT INTO #{parent.quoted_table_name} (#{column_names.join(',')}) VALUES }
  end
  end
 
  def update_part
  @update_part ||= begin
  connection = parent.connection
- updaters = columns.map do |k|
- qk = connection.quote_ident(k)
- [ qk, "VALUES(#{qk})" ].join('=')
+ updaters = setter.map do |k|
+ quoted_name = connection.quote_ident(k)
+ if original_setter.include?(k)
+ "#{quoted_name}=VALUES(#{quoted_name})"
+ else
+ # NOOP
+ "#{quoted_name}=#{quoted_name}"
+ end
  end.join(',')
  %{ ON DUPLICATE KEY UPDATE #{updaters}}
  end
@@ -53,9 +62,13 @@ class Upsert
  end
 
  def sql
- all_value_sql = rows.map { |row| row.values_sql }
- retval = [ insert_part, '(', all_value_sql.join('),('), ')', update_part ].join
- retval
+ [
+ insert_part,
+ '(',
+ rows.map { |row| row.quoted_setter_values.join(',') }.join('),('),
+ ')',
+ update_part
+ ].join
  end
 
  # since setting an option like :as => :hash actually persists that option to the client, don't pass any options
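
The new `update_part` logic in the MySQL buffer can be tried in isolation. The sketch below is a standalone approximation: `quote_ident` is a hypothetical backtick-quoting stand-in for `connection.quote_ident`, and the column lists are passed in as arguments instead of being read from buffered rows.

```ruby
# Hypothetical stand-in for connection.quote_ident: wrap the name in backticks.
def quote_ident(name)
  "`#{name.to_s.gsub('`', '``')}`"
end

# Build the ON DUPLICATE KEY UPDATE clause. Columns that came only from the
# selector get a no-op assignment (col=col); setter columns take VALUES(col).
def update_part(all_columns, setter_columns)
  updaters = all_columns.map do |k|
    quoted_name = quote_ident(k)
    if setter_columns.include?(k)
      "#{quoted_name}=VALUES(#{quoted_name})"
    else
      "#{quoted_name}=#{quoted_name}"
    end
  end
  " ON DUPLICATE KEY UPDATE #{updaters.join(',')}"
end

# "name" appears only in the selector, "breed" only in the setter.
puts update_part(%w[breed name], %w[breed])
```

This corresponds to the README's `a=a,b=VALUES(b),c=VALUES(c)` example: a selector-only column is left untouched when a duplicate key is hit.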
@@ -1,3 +1,4 @@
+ require 'upsert/buffer/pg_connection/column_definition'
  require 'upsert/buffer/pg_connection/merge_function'
 
  class Upsert
@@ -7,48 +8,12 @@ class Upsert
  def ready
  return if rows.empty?
  row = rows.shift
- MergeFunction.execute(self, row)
+ MergeFunction.execute self, row
  end
 
  def clear_database_functions
- connection = parent.connection
- # http://stackoverflow.com/questions/7622908/postgresql-drop-function-without-knowing-the-number-type-of-parameters
- connection.execute <<-EOS
- CREATE OR REPLACE FUNCTION pg_temp.upsert_delfunc(text)
- RETURNS void AS
- $BODY$
- DECLARE
- _sql text;
- BEGIN
-
- FOR _sql IN
- SELECT 'DROP FUNCTION ' || quote_ident(n.nspname)
- || '.' || quote_ident(p.proname)
- || '(' || pg_catalog.pg_get_function_identity_arguments(p.oid) || ');'
- FROM pg_catalog.pg_proc p
- LEFT JOIN pg_catalog.pg_namespace n ON n.oid = p.pronamespace
- WHERE p.proname = $1
- AND pg_catalog.pg_function_is_visible(p.oid) -- you may or may not want this
- LOOP
- EXECUTE _sql;
- END LOOP;
-
- END;
- $BODY$
- LANGUAGE plpgsql;
- EOS
- res = connection.execute(%{SELECT proname FROM pg_proc WHERE proname LIKE 'upsert_%'})
- res.each do |row|
- k = row['proname']
- next if k == 'upsert_delfunc'
- Upsert.logger.info %{[upsert] Dropping function #{k.inspect}}
- connection.execute %{SELECT pg_temp.upsert_delfunc('#{k}')}
- end
+ MergeFunction.clear self
  end
  end
-
- # @private
- # backwards compatibility - https://github.com/seamusabshere/upsert/issues/2
- PGconn = PG_Connection
  end
  end
@@ -0,0 +1,59 @@
+ class Upsert
+ class Buffer
+ class PG_Connection < Buffer
+ # @private
+ class ColumnDefinition
+ class << self
+ # activerecord-3.2.5/lib/active_record/connection_adapters/postgresql_adapter.rb#column_definitions
+ def all(buffer, table_name)
+ connection = buffer.parent.connection
+ res = connection.execute <<-EOS
+ SELECT a.attname AS name, format_type(a.atttypid, a.atttypmod) AS sql_type, d.adsrc AS default
+ FROM pg_attribute a LEFT JOIN pg_attrdef d
+ ON a.attrelid = d.adrelid AND a.attnum = d.adnum
+ WHERE a.attrelid = '#{connection.quote_ident(table_name)}'::regclass
+ AND a.attnum > 0 AND NOT a.attisdropped
+ EOS
+ res.map do |row|
+ new connection, row['name'], row['sql_type'], row['default']
+ end.sort_by do |cd|
+ cd.name
+ end
+ end
+ end
+
+ attr_reader :name
+ attr_reader :sql_type
+ attr_reader :default
+ attr_reader :quoted_name
+ attr_reader :quoted_selector_name
+ attr_reader :quoted_setter_name
+
+ def initialize(connection, name, sql_type, default)
+ @name = name
+ @sql_type = sql_type
+ @default = default
+ @quoted_name = connection.quote_ident name
+ @quoted_selector_name = connection.quote_ident "#{name}_selector"
+ @quoted_setter_name = connection.quote_ident "#{name}_setter"
+ end
+
+ def to_selector_arg
+ "#{quoted_selector_name} #{sql_type}"
+ end
+
+ def to_setter_arg
+ "#{quoted_setter_name} #{sql_type}"
+ end
+
+ def to_setter
+ "#{quoted_name} = #{quoted_setter_name}"
+ end
+
+ def to_selector
+ "#{quoted_name} = #{quoted_selector_name}"
+ end
+ end
+ end
+ end
+ end
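
To see which SQL fragments `ColumnDefinition` produces without a PostgreSQL server, here is a trimmed sketch. `FakeConnection` is a hypothetical stand-in whose `quote_ident` simply double-quotes the name, and `ColumnDefinitionSketch` copies only the relevant methods from the diff. How `MergeFunction` assembles these fragments is not shown in this part of the diff, so their use as function arguments and as `WHERE`/`SET` clauses is an assumption.

```ruby
# Hypothetical stand-in for the connection: only quote_ident is needed here.
class FakeConnection
  def quote_ident(name)
    '"' + name.to_s.gsub('"', '""') + '"'
  end
end

# Trimmed copy of ColumnDefinition, enough to show the generated fragments.
class ColumnDefinitionSketch
  attr_reader :sql_type, :quoted_name, :quoted_selector_name, :quoted_setter_name

  def initialize(connection, name, sql_type)
    @sql_type = sql_type
    @quoted_name = connection.quote_ident name
    @quoted_selector_name = connection.quote_ident "#{name}_selector"
    @quoted_setter_name = connection.quote_ident "#{name}_setter"
  end

  # Function argument declaration for the selector value.
  def to_selector_arg
    "#{quoted_selector_name} #{sql_type}"
  end

  # Assignment fragment, presumably for an UPDATE ... SET clause.
  def to_setter
    "#{quoted_name} = #{quoted_setter_name}"
  end

  # Comparison fragment, presumably for a WHERE clause.
  def to_selector
    "#{quoted_name} = #{quoted_selector_name}"
  end
end

cd = ColumnDefinitionSketch.new(FakeConnection.new, 'name', 'character varying(255)')
puts cd.to_selector_arg
puts cd.to_setter
puts cd.to_selector
```

Giving every column a separate `_selector` and `_setter` argument is consistent with the CHANGELOG remark that the merge function now requires more arguments.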