upsert 0.3.4 → 0.4.0

Files changed (51)
  1. data/CHANGELOG +12 -0
  2. data/README.md +6 -9
  3. data/Rakefile +9 -14
  4. data/lib/upsert.rb +40 -71
  5. data/lib/upsert/buffer.rb +36 -0
  6. data/lib/upsert/buffer/mysql2_client.rb +67 -0
  7. data/lib/upsert/buffer/pg_connection.rb +54 -0
  8. data/lib/upsert/buffer/pg_connection/merge_function.rb +138 -0
  9. data/lib/upsert/buffer/sqlite3_database.rb +13 -0
  10. data/lib/upsert/connection.rb +41 -0
  11. data/lib/upsert/connection/mysql2_client.rb +53 -0
  12. data/lib/upsert/connection/pg_connection.rb +39 -0
  13. data/lib/upsert/connection/sqlite3_database.rb +36 -0
  14. data/lib/upsert/row.rb +28 -24
  15. data/lib/upsert/version.rb +1 -1
  16. data/spec/active_record_upsert_spec.rb +16 -0
  17. data/spec/binary_spec.rb +21 -0
  18. data/spec/correctness_spec.rb +73 -0
  19. data/spec/database_functions_spec.rb +36 -0
  20. data/spec/database_spec.rb +97 -0
  21. data/spec/logger_spec.rb +37 -0
  22. data/{test → spec}/misc/get_postgres_reserved_words.rb +0 -0
  23. data/{test → spec}/misc/mysql_reserved.txt +0 -0
  24. data/{test → spec}/misc/pg_reserved.txt +0 -0
  25. data/spec/multibyte_spec.rb +27 -0
  26. data/spec/precision_spec.rb +11 -0
  27. data/spec/reserved_words_spec.rb +46 -0
  28. data/{test/helper.rb → spec/spec_helper.rb} +43 -43
  29. data/spec/speed_spec.rb +73 -0
  30. data/spec/threaded_spec.rb +34 -0
  31. data/spec/timezones_spec.rb +28 -0
  32. data/upsert.gemspec +6 -2
  33. metadata +99 -50
  34. data/lib/upsert/mysql2_client.rb +0 -104
  35. data/lib/upsert/pg_connection.rb +0 -92
  36. data/lib/upsert/pg_connection/column_definition.rb +0 -35
  37. data/lib/upsert/sqlite3_database.rb +0 -39
  38. data/test/shared/binary.rb +0 -18
  39. data/test/shared/correctness.rb +0 -72
  40. data/test/shared/database.rb +0 -94
  41. data/test/shared/multibyte.rb +0 -37
  42. data/test/shared/precision.rb +0 -8
  43. data/test/shared/reserved_words.rb +0 -45
  44. data/test/shared/speed.rb +0 -72
  45. data/test/shared/threaded.rb +0 -31
  46. data/test/shared/timezones.rb +0 -25
  47. data/test/test_active_record_connection_adapter.rb +0 -36
  48. data/test/test_active_record_upsert.rb +0 -23
  49. data/test/test_mysql2.rb +0 -43
  50. data/test/test_pg.rb +0 -45
  51. data/test/test_sqlite.rb +0 -47
data/CHANGELOG CHANGED
@@ -1,3 +1,15 @@
1
+ 0.4.0 / 2012-09-04
2
+
3
+ * Bug fixes
4
+
5
+ * Don't raise TooBig - rely on Mysql2 to complain about oversized packets
6
+
7
+ * Enhancements
8
+
9
+ * Re-use PostgreSQL merge functions across connections, even outside of batch mode. Huzzah!
10
+ * For MySQL, increase speed for one-off upserts by not checking packet size
11
+ * Allow configuring Upsert.logger. Defaults to Rails.logger or Logger.new($stderr). If you set env var UPSERT_DEBUG=true then it will set log level to debug.
12
+
1
13
  0.3.4 / 2012-07-03
2
14
 
3
15
  * Bug fixes
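The logger entry in the 0.4.0 changelog describes a lazily resolved default that a caller can override. A minimal sketch of that pattern follows, using only the standard library. `LoggerSketch` is a hypothetical stand-in, not the gem's code, and it leaves out the Rails and ActiveRecord detection that appears in the `data/lib/upsert.rb` diff.

```ruby
require 'logger'

# Hypothetical stand-in for the logger resolution described in the changelog.
# It is not the gem's implementation and skips Rails/ActiveRecord detection.
module LoggerSketch
  class << self
    # Assigning a logger (or nil to reset) overrides the lazy default.
    attr_writer :logger

    # Returns the assigned logger, or builds a $stderr logger on first use.
    # The UPSERT_DEBUG check only affects the lazily built default.
    def logger
      @logger ||= begin
        fallback = Logger.new($stderr)
        fallback.level = ENV['UPSERT_DEBUG'] == 'true' ? Logger::DEBUG : Logger::INFO
        fallback
      end
    end
  end
end
```

With this shape, `LoggerSketch.logger = Logger.new($stdout)` replaces the default, and an explicitly assigned logger keeps whatever level the caller gave it.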
data/README.md CHANGED
@@ -28,13 +28,13 @@ Rows are buffered in memory until it's efficient to send them to the database.
28
28
 
29
29
  Tested to be about 85% faster on PostgreSQL and 50% faster on MySQL than comparable methods (see the tests).
30
30
 
31
- ### Gotchas
31
+ ## Gotchas
32
32
 
33
- #### Undefined behavior if you use this without properly defining UNIQUE indexes
33
+ ### Undefined behavior without real UNIQUE indexes
34
34
 
35
35
  Make sure you're upserting against either primary key columns or columns with UNIQUE indexes or both.
36
36
 
37
- #### Columns are set based on the first row you pass
37
+ ### Columns are set based on the first row you pass
38
38
 
39
39
  Currently, the first row you pass in determines the columns that will be used. That's useful for mass importing of many rows with the same columns, but is surprising if you're trying to use a single `Upsert` object to add arbitrary data. For example, this won't work:
40
40
 
@@ -52,12 +52,9 @@ You would need to use a new `Upsert` object. On the other hand, this is totally
52
52
 
53
53
  Pull requests for any of these would be greatly appreciated:
54
54
 
55
- 1. Fix SQLite tests.
56
- 2. For PG, be smarter about when you create functions - try to re-use them within a connection.
57
- 3. Provide optional SQL logging.
58
- 4. Provide `require 'upsert/debug'` that will make sure you are selecting on columns that have unique indexes
59
- 5. Make `Upsert` instances accept arbitrary columns, which is what people probably expect.
60
- 6. Naming suggestions: should "document" be called "setters" or "attributes"?
55
+ 1. Provide `require 'upsert/debug'` that will make sure you are selecting on columns that have unique indexes
56
+ 1. Make `Upsert` instances accept arbitrary columns, which is what people probably expect. (this should work on PG already)
57
+ 1. Naming suggestions: should "document" be called "setters" or "attributes"?
61
58
 
62
59
  ## Real-world usage
63
60
 
data/Rakefile CHANGED
@@ -1,25 +1,20 @@
1
1
  #!/usr/bin/env rake
2
2
  require "bundler/gem_tasks"
3
3
 
4
- require 'rake'
5
- require 'rake/testtask'
6
- Rake::TestTask.new(:_test) do |test|
7
- test.libs << 'lib' << 'test'
8
- test.pattern = 'test/**/test_*.rb'
9
- test.verbose = true
10
- end
11
-
12
- task :test_each_db_adapter do
13
- %w{ active_record_upsert mysql2 sqlite pg active_record_connection_adapter }.each do |database|
4
+ task :rspec_all_databases do
5
+ require 'posix-spawn'
6
+ %w{ postgresql mysql2 sqlite3 }.each do |adapter|
14
7
  puts
15
- puts "#{'*'*10} Running #{database} tests"
8
+ puts '#'*50
9
+ puts "# Running specs against #{adapter}"
10
+ puts '#'*50
16
11
  puts
17
- puts `rake _test TEST=test/test_#{database}.rb`
12
+ pid = POSIX::Spawn.spawn({'ADAPTER' => adapter}, 'rspec', '--format', 'documentation', File.expand_path('../spec', __FILE__))
13
+ Process.waitpid pid
18
14
  end
19
15
  end
20
16
 
21
- task :default => :test_each_db_adapter
22
- task :test => :test_each_db_adapter
17
+ task :default => :rspec_all_databases
23
18
 
24
19
  require 'yard'
25
20
  YARD::Rake::YardocTask.new
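The new `rspec_all_databases` task selects the database by passing `ADAPTER` in the child process environment. The same idea can be sketched with the standard library's `Process.spawn` in place of the `posix-spawn` gem. `run_for_each_adapter` and `CHILD` are hypothetical names for illustration. Unlike the task in the diff, this sketch also records each child's exit status.

```ruby
require 'rbconfig'

# Runs one child process per adapter, passing the adapter name through the
# environment. Returns a hash of adapter name => whether the child succeeded.
def run_for_each_adapter(adapters, command)
  adapters.each_with_object({}) do |adapter, results|
    pid = Process.spawn({ 'ADAPTER' => adapter }, *command)
    _, status = Process.waitpid2(pid)
    results[adapter] = status.success?
  end
end

# Placeholder child command (an assumption standing in for rspec): it exits
# successfully only when ADAPTER is present in its environment.
CHILD = [RbConfig.ruby, '-e', 'exit(ENV["ADAPTER"].to_s.empty? ? 1 : 0)']
```

Calling `run_for_each_adapter(%w{ postgresql mysql2 sqlite3 }, CHILD)` runs the placeholder three times, once per adapter name.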
data/lib/upsert.rb CHANGED
@@ -1,14 +1,39 @@
1
1
  require 'bigdecimal'
2
+ require 'thread'
3
+ require 'logger'
2
4
 
3
5
  require 'upsert/version'
4
6
  require 'upsert/binary'
7
+ require 'upsert/buffer'
8
+ require 'upsert/connection'
5
9
  require 'upsert/row'
6
- require 'upsert/mysql2_client'
7
- require 'upsert/pg_connection'
8
- require 'upsert/sqlite3_database'
9
10
 
10
11
  class Upsert
11
12
  class << self
13
+ # What logger to use.
14
+ # @return [#info,#warn,#debug]
15
+ attr_writer :logger
16
+
17
+ # The current logger
18
+ # @return [#info,#warn,#debug]
19
+ def logger
20
+ @logger || Thread.exclusive do
21
+ @logger ||= if defined?(::Rails) and (rails_logger = ::Rails.logger)
22
+ rails_logger
23
+ elsif defined?(::ActiveRecord) and ::ActiveRecord.const_defined?(:Base) and (ar_logger = ::ActiveRecord::Base.logger)
24
+ ar_logger
25
+ else
26
+ my_logger = Logger.new $stderr
27
+ my_logger.level = Logger::INFO
28
+ my_logger
29
+ end
30
+ if ENV['UPSERT_DEBUG'] == 'true'
31
+ @logger.level = Logger::DEBUG
32
+ end
33
+ @logger
34
+ end
35
+ end
36
+
12
37
  # @param [String] v A string containing binary data that should be inserted/escaped as such.
13
38
  #
14
39
  # @return [Upsert::Binary]
@@ -20,8 +45,6 @@ class Upsert
20
45
  #
21
46
  # @note Buffered in memory until it's efficient to send to the server a packet.
22
47
  #
23
- # @raise [Upsert::TooBig] If any row is too big to fit inside a single packet.
24
- #
25
48
  # @return [nil]
26
49
  #
27
50
  # @example Many at once
@@ -31,19 +54,15 @@ class Upsert
31
54
  # end
32
55
  def batch(connection, table_name)
33
56
  upsert = new connection, table_name
34
- upsert.async!
57
+ upsert.buffer.async!
35
58
  yield upsert
36
- upsert.sync!
59
+ upsert.buffer.sync!
37
60
  end
38
61
 
39
62
  # @deprecated Use .batch instead.
40
63
  alias :stream :batch
41
64
  end
42
65
 
43
- # Raised if a query would be too large to send in a single packet.
44
- class TooBig < RuntimeError
45
- end
46
-
47
66
  SINGLE_QUOTE = %{'}
48
67
  DOUBLE_QUOTE = %{"}
49
68
  BACKTICK = %{`}
@@ -54,10 +73,10 @@ class Upsert
54
73
  ISO8601_DATE = '%F'
55
74
  NULL_WORD = 'NULL'
56
75
 
57
- # @return [Mysql2::Client,Sqlite3::Database,PG::Connection,#raw_connection]
76
+ # @return [Upsert::Connection]
58
77
  attr_reader :connection
59
78
 
60
- # @return [String,Symbol]
79
+ # @return [String]
61
80
  attr_reader :table_name
62
81
 
63
82
  # @private
@@ -66,17 +85,11 @@ class Upsert
66
85
  # @param [Mysql2::Client,Sqlite3::Database,PG::Connection,#raw_connection] connection A supported database connection.
67
86
  # @param [String,Symbol] table_name The name of the table into which you will be upserting.
68
87
  def initialize(connection, table_name)
69
- @table_name = table_name
70
- @buffer = []
71
-
72
- @connection = if connection.respond_to?(:raw_connection)
73
- # deal with ActiveRecord::Base.connection or ActiveRecord::Base.connection_pool.checkout
74
- connection.raw_connection
75
- else
76
- connection
77
- end
78
-
79
- extend Upsert.const_get(@connection.class.name.gsub(/\W+/, '_'))
88
+ @table_name = table_name.to_s
89
+ raw_connection = connection.respond_to?(:raw_connection) ? connection.raw_connection : connection
90
+ n = raw_connection.class.name.gsub(/\W+/, '_')
91
+ @connection = Connection.const_get(n).new self, raw_connection
92
+ @buffer = Buffer.const_get(n).new self
80
93
  end
81
94
 
82
95
  # Upsert a row given a selector and a document.
@@ -86,8 +99,6 @@ class Upsert
86
99
  # @param [Hash] selector Key-value pairs that will be used to find or create a row.
87
100
  # @param [Hash] document Key-value pairs that will be set on the row, whether it previously existed or not.
88
101
  #
89
- # @raise [Upsert::TooBig] If any row is too big to fit inside a single packet.
90
- #
91
102
  # @return [nil]
92
103
  #
93
104
  # @example One at a time
@@ -95,54 +106,12 @@ class Upsert
95
106
  # upsert.row({:name => 'Jerry'}, :breed => 'beagle')
96
107
  # upsert.row({:name => 'Pierre'}, :breed => 'tabby')
97
108
  def row(selector, document = {})
98
- buffer.push Row.new(self, selector, document)
99
- if sql = chunk
100
- execute sql
101
- end
109
+ buffer << Row.new(self, selector, document)
102
110
  nil
103
111
  end
104
112
 
105
113
  # @private
106
- def async?
107
- !!@async
108
- end
109
-
110
- # @private
111
- def async!
112
- @async = true
113
- end
114
-
115
- # @private
116
- def sync!
117
- @async = false
118
- while sql = chunk
119
- execute sql
120
- end
121
- end
122
-
123
- # @private
124
- def quote_value(v)
125
- case v
126
- when NilClass
127
- NULL_WORD
128
- when Upsert::Binary
129
- quote_binary v # must be defined by base
130
- when String
131
- quote_string v # must be defined by base
132
- when TrueClass, FalseClass
133
- quote_boolean v
134
- when BigDecimal
135
- quote_big_decimal v
136
- when Numeric
137
- v
138
- when Symbol
139
- quote_string v.to_s
140
- when Time, DateTime
141
- quote_time v # must be defined by base
142
- when Date
143
- quote_string v.strftime(ISO8601_DATE)
144
- else
145
- raise "not sure how to quote #{v.class}: #{v.inspect}"
146
- end
114
+ def quoted_table_name
115
+ @quoted_table_name ||= connection.quote_ident table_name
147
116
  end
148
117
  end
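The rewritten `initialize` picks its `Connection` and `Buffer` classes by turning the driver's class name into a constant name, so that `Mysql2::Client` maps to `Mysql2_Client`. A self-contained sketch of that dispatch is below. `FakeDriver` and `Adapters` are hypothetical names used only for illustration and do not exist in the gem.

```ruby
# Hypothetical driver class standing in for something like Mysql2::Client.
module FakeDriver
  class Client; end
end

# Hypothetical namespace playing the role of Upsert::Connection.
module Adapters
  # Wrapper whose name is the driver's class name with "::" turned into "_".
  class FakeDriver_Client
    attr_reader :raw

    def initialize(raw)
      @raw = raw
    end
  end

  # Maps a raw connection to its wrapper class by name, as in the diff:
  # "FakeDriver::Client" becomes "FakeDriver_Client".
  def self.for(raw_connection)
    name = raw_connection.class.name.gsub(/\W+/, '_')
    const_get(name).new(raw_connection)
  end
end
```

One consequence of this design is that an unsupported driver fails at constant lookup with a `NameError` rather than with a message written for the caller.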
data/lib/upsert/buffer.rb ADDED
@@ -0,0 +1,36 @@
1
+ require 'upsert/buffer/mysql2_client'
2
+ require 'upsert/buffer/pg_connection'
3
+ require 'upsert/buffer/sqlite3_database'
4
+
5
+ class Upsert
6
+ # @private
7
+ class Buffer
8
+ attr_reader :parent
9
+ attr_reader :rows
10
+
11
+ def initialize(parent)
12
+ @parent = parent
13
+ @rows = []
14
+ end
15
+
16
+ def <<(row)
17
+ rows << row
18
+ ready
19
+ end
20
+
21
+ def async?
22
+ !!@async
23
+ end
24
+
25
+ def async!
26
+ @async = true
27
+ end
28
+
29
+ def sync!
30
+ @async = false
31
+ until rows.empty?
32
+ ready
33
+ end
34
+ end
35
+ end
36
+ end
data/lib/upsert/buffer/mysql2_client.rb ADDED
@@ -0,0 +1,67 @@
1
+ class Upsert
2
+ # @private
3
+ class Buffer
4
+ class Mysql2_Client < Buffer
5
+ def ready
6
+ return if rows.empty?
7
+ c = parent.connection
8
+ if not async?
9
+ c.execute sql
10
+ rows.clear
11
+ return
12
+ end
13
+ @cumulative_sql_bytesize ||= static_sql_bytesize
14
+ new_row = rows.pop
15
+ d = new_row.values_sql_bytesize + 3 # ),(
16
+ if @cumulative_sql_bytesize + d > max_sql_bytesize
17
+ c.execute sql
18
+ rows.clear
19
+ @cumulative_sql_bytesize = static_sql_bytesize + d
20
+ else
21
+ @cumulative_sql_bytesize += d
22
+ end
23
+ rows << new_row
24
+ nil
25
+ end
26
+
27
+ def columns
28
+ @columns ||= rows.first.columns
29
+ end
30
+
31
+ def insert_part
32
+ @insert_part ||= begin
33
+ connection = parent.connection
34
+ columns_sql = columns.map { |k| connection.quote_ident(k) }.join(',')
35
+ %{INSERT INTO #{parent.quoted_table_name} (#{columns_sql}) VALUES }
36
+ end
37
+ end
38
+
39
+ def update_part
40
+ @update_part ||= begin
41
+ connection = parent.connection
42
+ updaters = columns.map do |k|
43
+ qk = connection.quote_ident(k)
44
+ [ qk, "VALUES(#{qk})" ].join('=')
45
+ end.join(',')
46
+ %{ ON DUPLICATE KEY UPDATE #{updaters}}
47
+ end
48
+ end
49
+
50
+ # where 2 is the parens
51
+ def static_sql_bytesize
52
+ @static_sql_bytesize ||= insert_part.bytesize + update_part.bytesize + 2
53
+ end
54
+
55
+ def sql
56
+ all_value_sql = rows.map { |row| row.values_sql }
57
+ retval = [ insert_part, '(', all_value_sql.join('),('), ')', update_part ].join
58
+ retval
59
+ end
60
+
61
+ # since setting an option like :as => :hash actually persists that option to the client, don't pass any options
62
+ def max_sql_bytesize
63
+ @max_sql_bytesize ||= parent.connection.database_variable_get(:MAX_ALLOWED_PACKET).to_i
64
+ end
65
+ end
66
+ end
67
+ end
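The MySQL buffer above sends many rows in one `INSERT ... ON DUPLICATE KEY UPDATE` statement and keeps a running byte count to stay under `max_allowed_packet`. The sketch below reproduces the statement shape and the bookkeeping without a database. Both helper names are hypothetical, and the backtick quoting is deliberately naive: it does not escape backticks inside identifiers, which real code must do.

```ruby
# Hypothetical helper: builds one multi-row upsert statement from
# already-quoted value lists such as "'Jerry','beagle'".
def multi_row_upsert_sql(table, columns, rows_values_sql)
  cols = columns.map { |c| "`#{c}`" } # naive quoting, for illustration only
  insert_part = "INSERT INTO `#{table}` (#{cols.join(',')}) VALUES "
  update_part = " ON DUPLICATE KEY UPDATE " +
                cols.map { |c| "#{c}=VALUES(#{c})" }.join(',')
  [insert_part, '(', rows_values_sql.join('),('), ')', update_part].join
end

# Size estimate following the buffer's bookkeeping: the fixed parts plus 2
# for the outer parentheses, then 3 bytes per row for the "),(" separator.
def estimated_bytesize(table, columns, rows_values_sql)
  static = multi_row_upsert_sql(table, columns, []).bytesize
  rows_values_sql.inject(static) { |sum, values| sum + values.bytesize + 3 }
end
```

Because every row is charged for a separator but the last row has none, the estimate runs 3 bytes above the real statement size, so the bookkeeping errs on the safe side.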
data/lib/upsert/buffer/pg_connection.rb ADDED
@@ -0,0 +1,54 @@
1
+ require 'upsert/buffer/pg_connection/merge_function'
2
+
3
+ class Upsert
4
+ class Buffer
5
+ # @private
6
+ class PG_Connection < Buffer
7
+ def ready
8
+ return if rows.empty?
9
+ row = rows.shift
10
+ MergeFunction.execute(self, row)
11
+ end
12
+
13
+ def clear_database_functions
14
+ connection = parent.connection
15
+ # http://stackoverflow.com/questions/7622908/postgresql-drop-function-without-knowing-the-number-type-of-parameters
16
+ connection.execute <<-EOS
17
+ CREATE OR REPLACE FUNCTION pg_temp.upsert_delfunc(text)
18
+ RETURNS void AS
19
+ $BODY$
20
+ DECLARE
21
+ _sql text;
22
+ BEGIN
23
+
24
+ FOR _sql IN
25
+ SELECT 'DROP FUNCTION ' || quote_ident(n.nspname)
26
+ || '.' || quote_ident(p.proname)
27
+ || '(' || pg_catalog.pg_get_function_identity_arguments(p.oid) || ');'
28
+ FROM pg_catalog.pg_proc p
29
+ LEFT JOIN pg_catalog.pg_namespace n ON n.oid = p.pronamespace
30
+ WHERE p.proname = $1
31
+ AND pg_catalog.pg_function_is_visible(p.oid) -- you may or may not want this
32
+ LOOP
33
+ EXECUTE _sql;
34
+ END LOOP;
35
+
36
+ END;
37
+ $BODY$
38
+ LANGUAGE plpgsql;
39
+ EOS
40
+ res = connection.execute(%{SELECT proname FROM pg_proc WHERE proname LIKE 'upsert_%'})
41
+ res.each do |row|
42
+ k = row['proname']
43
+ next if k == 'upsert_delfunc'
44
+ Upsert.logger.info %{[upsert] Dropping function #{k.inspect}}
45
+ connection.execute %{SELECT pg_temp.upsert_delfunc('#{k}')}
46
+ end
47
+ end
48
+ end
49
+
50
+ # @private
51
+ # backwards compatibility - https://github.com/seamusabshere/upsert/issues/2
52
+ PGconn = PG_Connection
53
+ end
54
+ end
data/lib/upsert/buffer/pg_connection/merge_function.rb ADDED
@@ -0,0 +1,138 @@
1
+ require 'digest/md5'
2
+
3
+ class Upsert
4
+ # @private
5
+ class Buffer
6
+ class PG_Connection < Buffer
7
+ class MergeFunction
8
+ class << self
9
+ def execute(buffer, row)
10
+ first_try = true
11
+ begin
12
+ buffer.parent.connection.execute sql(buffer, row)
13
+ rescue PG::Error => pg_error
14
+ if first_try and pg_error.message =~ /function upsert_(.+) does not exist/
15
+ Upsert.logger.info %{[upsert] Function #{"upsert_#{$1}".inspect} went missing, trying to recreate}
16
+ first_try = false
17
+ @lookup.clear
18
+ retry
19
+ else
20
+ raise pg_error
21
+ end
22
+ end
23
+ end
24
+
25
+ def sql(buffer, row)
26
+ merge_function = lookup buffer, row
27
+ %{SELECT #{merge_function.name}(#{merge_function.values_sql(row)})}
28
+ end
29
+
30
+ def unique_key(table_name, selector, columns)
31
+ [
32
+ table_name,
33
+ selector.join(','),
34
+ columns.join(',')
35
+ ].join '/'
36
+ end
37
+
38
+ def lookup(buffer, row)
39
+ @lookup ||= {}
40
+ s = row.selector.keys
41
+ c = row.columns
42
+ @lookup[unique_key(buffer.parent.table_name, s, c)] ||= new(buffer, s, c)
43
+ end
44
+ end
45
+
46
+ attr_reader :buffer
47
+ attr_reader :selector
48
+ attr_reader :columns
49
+
50
+ def initialize(buffer, selector, columns)
51
+ @buffer = buffer
52
+ @selector = selector
53
+ @columns = columns
54
+ create!
55
+ end
56
+
57
+ def name
58
+ @name ||= "upsert_#{Digest::MD5.hexdigest(unique_key)}"
59
+ end
60
+
61
+ def values_sql(row)
62
+ ordered_args = columns.map do |k|
63
+ row.quoted_value(k) || NULL_WORD
64
+ end.join(',')
65
+ end
66
+
67
+ private
68
+
69
+ def unique_key
70
+ @unique_key ||= MergeFunction.unique_key buffer.parent.table_name, selector, columns
71
+ end
72
+
73
+ def connection
74
+ buffer.parent.connection
75
+ end
76
+
77
+ def quoted_table_name
78
+ buffer.parent.quoted_table_name
79
+ end
80
+
81
+ ColumnDefinition = Struct.new(:quoted_name, :quoted_input_name, :sql_type, :default)
82
+
83
+ # activerecord-3.2.5/lib/active_record/connection_adapters/postgresql_adapter.rb#column_definitions
84
+ def get_column_definitions
85
+ res = connection.execute <<-EOS
86
+ SELECT a.attname AS name, format_type(a.atttypid, a.atttypmod) AS sql_type, d.adsrc AS default
87
+ FROM pg_attribute a LEFT JOIN pg_attrdef d
88
+ ON a.attrelid = d.adrelid AND a.attnum = d.adnum
89
+ WHERE a.attrelid = '#{quoted_table_name}'::regclass
90
+ AND a.attnum > 0 AND NOT a.attisdropped
91
+ EOS
92
+ unsorted = res.select do |row|
93
+ columns.include? row['name']
94
+ end.inject({}) do |memo, row|
95
+ k = row['name']
96
+ memo[k] = ColumnDefinition.new connection.quote_ident(k), connection.quote_ident("#{k}_input"), row['sql_type'], row['default']
97
+ memo
98
+ end
99
+ columns.map do |k|
100
+ unsorted[k]
101
+ end
102
+ end
103
+
104
+ # the "canonical example" from http://www.postgresql.org/docs/9.1/static/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE
105
+ def create!
106
+ Upsert.logger.info "[upsert] Creating or replacing database function #{name.inspect} on table #{buffer.parent.table_name.inspect} for selector #{selector.map(&:inspect).join(', ')} and columns #{columns.map(&:inspect).join(', ')}"
107
+ column_definitions = get_column_definitions
108
+ connection.execute <<-EOS
109
+ CREATE OR REPLACE FUNCTION #{name}(#{column_definitions.map { |c| "#{c.quoted_input_name} #{c.sql_type} DEFAULT #{c.default || 'NULL'}" }.join(',') }) RETURNS VOID AS
110
+ $$
111
+ BEGIN
112
+ LOOP
113
+ -- first try to update the key
114
+ UPDATE #{quoted_table_name} SET #{column_definitions.map { |c| "#{c.quoted_name} = #{c.quoted_input_name}" }.join(',')}
115
+ WHERE #{selector.map { |k| "#{connection.quote_ident(k)} = #{connection.quote_ident([k,'input'].join('_'))}" }.join(' AND ') };
116
+ IF found THEN
117
+ RETURN;
118
+ END IF;
119
+ -- not there, so try to insert the key
120
+ -- if someone else inserts the same key concurrently,
121
+ -- we could get a unique-key failure
122
+ BEGIN
123
+ INSERT INTO #{quoted_table_name}(#{column_definitions.map { |c| c.quoted_name }.join(',')}) VALUES (#{column_definitions.map { |c| c.quoted_input_name }.join(',')});
124
+ RETURN;
125
+ EXCEPTION WHEN unique_violation THEN
126
+ -- Do nothing, and loop to try the UPDATE again.
127
+ END;
128
+ END LOOP;
129
+ END;
130
+ $$
131
+ LANGUAGE plpgsql;
132
+ EOS
133
+ end
134
+
135
+ end
136
+ end
137
+ end
138
+ end
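`MergeFunction` derives each PostgreSQL function name from the table, selector keys and column list, and memoizes one function per combination. That is how functions get reused rather than recreated for every row. The sketch below shows only the naming and caching, with no database. `FunctionRegistry` is a hypothetical class, and its `created` list stands in for the `CREATE OR REPLACE FUNCTION` call.

```ruby
require 'digest/md5'

# Hypothetical sketch of the naming and caching idea; not the gem's class.
class FunctionRegistry
  # Names that would have been created in the database, in creation order.
  attr_reader :created

  def initialize
    @lookup = {}
    @created = []
  end

  # Same shape as the key in the diff: table/selector keys/columns.
  def unique_key(table_name, selector, columns)
    [table_name, selector.join(','), columns.join(',')].join('/')
  end

  # Returns a stable name for the combination, "creating" it only once.
  def function_name(table_name, selector, columns)
    key = unique_key(table_name, selector, columns)
    @lookup[key] ||= begin
      name = "upsert_#{Digest::MD5.hexdigest(key)}"
      @created << name # stands in for CREATE OR REPLACE FUNCTION
      name
    end
  end
end
```

Hashing the key gives a fixed-length name of 39 characters, which stays under PostgreSQL's default 63-byte identifier limit however many columns are involved.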