sequel-bigquery 0.4.0 → 0.6.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 3e8459abb689482e387bc4b0cd7dac2c582a40a9c449d5cd5bad5c8da57fffdb
-  data.tar.gz: 4409fa1b5704c03afcd068915311c4944bdaec1326a21ba6df3ed47d9ea118a2
+  metadata.gz: 6c15547bd0bbe53a384a987286cda32f9e1ad624c2c9e5339ac412322e810653
+  data.tar.gz: 26d87c1ff3d3aecf9d5401de88ad9048a239dd65bc3a682cfbb9d9d726675b9f
 SHA512:
-  metadata.gz: 8573f4d5d1f46d63fd062d97efa1bf19ffc7d2e9a3dd24433cdd079554380f7639e2d767657285a9cc1eea829a300c08aa40cfd3a7e24eb9f0f229c0f13e0e8c
-  data.tar.gz: 51371f99aba0a7799a0c8c34f9d3541c86c76113d4acda78badb420375abded01cdd456f184753793505be70f92eba1d9f0a99f35abdd1f766806023f3860979
+  metadata.gz: daa23d28fc8359540d61e57a5f4087f76da68045e6535d864f711c51d8fd0517bf9d74e332ea9cd3a5399ece489beb06df08dfaf409caf8877a1cf6aaa8140a3
+  data.tar.gz: eb73f36dbe0317d42e5b1fd9a774dd7d5440d36a8ddebd33c5683a67723efaf5538d4b915a4b518e49a6165854b42152762d57934a57dd050efee0f735563c4f
data/README.md CHANGED
@@ -4,11 +4,22 @@
 
 A Sequel adapter for [Google's BigQuery](https://cloud.google.com/bigquery).
 
+This gem was created in order to manage schema migrations of a BigQuery dataset at GreenSync. At the time of writing, we couldn't find any good tools in any language to manage changes to the schema as a set of migrations.
+
+Beyond migrations, I'm unsure how useful this gem is. I haven't yet tested what the performance would be for data interactions vs. directly using the `google-cloud-bigquery` gem's native facilities. If you're inserting a bunch of data, it's probably a better idea to use an [inserter from that gem](https://googleapis.dev/ruby/google-cloud-bigquery/latest/Google/Cloud/Bigquery/Dataset.html#insert_async-instance_method) rather than going through SQL.
+
 ## Contents
 
 <!-- MarkdownTOC autolink=true -->
 
 - [Intro](#intro)
+- [Quirks](#quirks)
+  - [Creating tables with column defaults](#creating-tables-with-column-defaults)
+  - [Transactions](#transactions)
+  - [Update statements without `WHERE`](#update-statements-without-where)
+  - [Combining statements](#combining-statements)
+  - [Alter table](#alter-table)
+  - [Column recreation](#column-recreation)
 - [Installation](#installation)
 - [Usage](#usage)
 - [Contributing](#contributing)
@@ -25,12 +36,12 @@ A Sequel adapter for [Google's BigQuery](https://cloud.google.com/bigquery).
 Features:
 
 - Connecting
-- Migrating
-- Table creation, with automatic removal of defaults from statements (since BigQuery doesn't support it)
+- Migrating (see quirks)
+- Table creation (see quirks)
 - Inserting rows
-- Updating rows, with automatic addition of `where 1 = 1` to statements (since BigQuery requires a `where` clause)
+- Updating rows (see quirks)
 - Querying
-- Transactions (buffered since BigQuery only supports them when you execute the whole transaction at once)
+- Transactions (see quirks)
 - Table partitioning
 - Ruby types:
   + String
@@ -42,6 +53,34 @@ Features:
   + BigDecimal
 - Selecting the BigQuery server location
 
+## Quirks
+
+### Creating tables with column defaults
+
+BigQuery doesn't support defaults on columns. As a workaround, all defaults are automatically removed from statements (crudely).
+
+### Transactions
+
+BigQuery doesn't support transactions where the statements are executed individually, though it does support them if the entire transaction's SQL is sent all at once. As a workaround, statements within a transaction are buffered and executed together. The impact of this is that no results can be returned within a transaction.
+
+### Update statements without `WHERE`
+
+BigQuery requires all `UPDATE` statements to have a `WHERE` clause. As a workaround, statements which lack one have `where 1 = 1` appended automatically (crudely).
+
+### Combining statements
+
+When combining multiple statements into one query (with `;`), and the final statement is not a `SELECT`, the `google-cloud-bigquery` gem has a [bug](https://github.com/googleapis/google-cloud-ruby/issues/9617) which causes an exception. Note that all the statements have been executed by the time this happens. A workaround is to append `; SELECT 1`.
+
+### Alter table
+
+BigQuery [rate-limits alter table statements](https://cloud.google.com/bigquery/quotas#dataset_limits) to 10 per second. This is mitigated somewhat by Sequel combining `ALTER TABLE` statements whenever possible, and by BigQuery's extremely high latency (\~2 seconds per query); but you may still run into this limitation.
+
+We've also noticed a bug with `google-cloud-bigquery` where an `ALTER TABLE` statement resulted in a `NoMethodError` on nil for `fields` within `from_gapi_json`. We're not yet sure what caused this.
+
+### Column recreation
+
+Be careful when deleting a column which you might want to re-add. BigQuery reserves the name of a deleted column for up to the time travel duration - which is [*seven days*](https://cloud.google.com/bigquery/docs/time-travel). Re-creating the entire dataset is a painful workaround.
+
 ## Installation
 
 Add it to the `Gemfile` of your project:
@@ -68,6 +107,7 @@ Connect to BigQuery:
 ```
 require 'sequel-bigquery'
+require 'logger'
 
 db = Sequel.connect(
   adapter: :bigquery,
@@ -80,6 +120,8 @@ db = Sequel.connect(
 
 And use Sequel like normal.
 
+Note that it is important to supply a logger that will at least output warning messages, so you know when your queries are being modified or buffered, which may be unexpected behaviour.
+
 ## Contributing
 
 Pull requests welcome! =)
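To make the `UPDATE` quirk above concrete, here is a hypothetical stand-alone sketch (not the gem's actual implementation) of the crude "append `where 1 = 1`" rewrite the README describes:

```ruby
# Hypothetical sketch of the workaround described above: BigQuery rejects
# UPDATE statements without a WHERE clause, so one is crudely appended.
# Note the crudeness: a literal "where" anywhere in the SQL suppresses it.
def ensure_where_clause(sql)
  return sql if sql.match?(/\bwhere\b/i)
  "#{sql.rstrip.chomp(';')} where 1 = 1"
end

puts ensure_where_clause('update t set a = 1')
# => update t set a = 1 where 1 = 1
```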
@@ -4,6 +4,7 @@ require 'delegate'
 require 'time'
 
 require 'google/cloud/bigquery'
+require 'amazing_print'
 require 'paint'
 require 'sequel'
 
@@ -16,39 +17,34 @@ module Sequel
     class Database < Sequel::Database # rubocop:disable Metrics/ClassLength
       set_adapter_scheme :bigquery
 
-      def initialize(*args, **kawrgs)
-        puts '.new'
-        @orig_opts = kawrgs.fetch(:orig_opts)
+      def initialize(*args, **kwargs)
+        @bigquery_config = kwargs.fetch(:orig_opts)
         @sql_buffer = []
         @sql_buffering = false
         super
       end
 
       def connect(*_args)
-        puts '#connect'
-        config = @orig_opts.dup
-        config.delete(:adapter)
-        config.delete(:logger)
-        location = config.delete(:location)
-        bq_dataset_name = config.delete(:dataset) || config.delete(:database)
-        @bigquery = Google::Cloud::Bigquery.new(config)
+        log_each(:debug, '#connect')
+        get_or_create_bigquery_dataset
+          .tap { log_each(:debug, '#connect end') }
+      end
+
+      def bigquery
         # ObjectSpace.each_object(HTTPClient).each { |c| c.debug_dev = STDOUT }
-        @bigquery.dataset(bq_dataset_name) || begin
-          @loggers[0].debug('BigQuery dataset %s does not exist; creating it' % bq_dataset_name)
-          @bigquery.create_dataset(bq_dataset_name, location: location)
-        end
-          .tap { puts '#connect end' }
+        @bigquery ||= Google::Cloud::Bigquery.new(google_cloud_bigquery_gem_config)
       end
 
       def disconnect_connection(_c)
-        puts '#disconnect_connection'
+        log_each(:debug, '#disconnect_connection')
         # c.disconnect
       end
 
       def drop_datasets(*dataset_names_to_drop)
         dataset_names_to_drop.each do |dataset_name_to_drop|
-          puts "Dropping dataset #{dataset_name_to_drop.inspect}"
-          dataset_to_drop = @bigquery.dataset(dataset_name_to_drop)
+          log_each(:debug, "Dropping dataset #{dataset_name_to_drop.inspect}")
+          dataset_to_drop = bigquery.dataset(dataset_name_to_drop)
+          next unless dataset_to_drop
           dataset_to_drop.tables.each(&:delete)
           dataset_to_drop.delete
         end
@@ -56,7 +52,7 @@ module Sequel
       alias drop_dataset drop_datasets
 
       def execute(sql, opts = OPTS) # rubocop:disable Metrics/MethodLength, Metrics/AbcSize, Metrics/CyclomaticComplexity, Metrics/PerceivedComplexity
-        puts '#execute'
+        log_each(:debug, '#execute')
         log_query(sql)
 
         # require 'pry'; binding.pry if sql =~ /CREATE TABLE IF NOT EXISTS/i
@@ -82,21 +78,31 @@ module Sequel
           warn("Warning: Will now execute entire buffered transaction:\n" + @sql_buffer.join("\n"))
         end
 
+        sql_to_execute = @sql_buffer.any? ? @sql_buffer.join("\n") : sql
+
         synchronize(opts[:server]) do |conn|
           results = log_connection_yield(sql, conn) do
-            sql_to_execute = @sql_buffer.any? ? @sql_buffer.join("\n") : sql
             conn.query(sql_to_execute)
           end
-          require 'amazing_print'
-          ap results
+          log_each(:debug, results.awesome_inspect)
           if block_given?
             yield results
           else
             results
           end
-        # TODO
-        # rescue ::ODBC::Error, ArgumentError => e
-        rescue Google::Cloud::InvalidArgumentError, ArgumentError => e
+        rescue Google::Cloud::InvalidArgumentError, Google::Cloud::PermissionDeniedError => e
+          if e.message.include?('too many table update operations for this table')
+            warn('Triggered rate limit of table update operations for this table. For more information, see https://cloud.google.com/bigquery/docs/troubleshoot-quotas')
+            if retryable_query?(sql_to_execute)
+              warn('Detected retryable query - re-running query after a 1 second sleep')
+              sleep 1
+              retry
+            else
+              log_each(:error, "Query not detected as retryable; can't automatically recover from being rate-limited")
+            end
+          end
+          raise_error(e)
+        rescue ArgumentError => e
           raise_error(e)
         end # rubocop:disable Style/MultilineBlockChain
           .tap do
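The rescue clause added above sleeps for a second and retries rate-limited queries. As a generic, stand-alone illustration of that sleep-and-retry pattern (the error class, retry budget, and messages here are invented for the example, not the gem's API):

```ruby
# Generic sketch of the sleep-and-retry pattern used in #execute above.
# RateLimitError and max_retries are illustrative, not part of the gem.
class RateLimitError < StandardError; end

def with_rate_limit_retry(max_retries: 1)
  attempts = 0
  begin
    yield
  rescue RateLimitError
    attempts += 1
    raise unless attempts <= max_retries # give up once the budget is spent
    sleep 0.01 # the adapter sleeps 1 second between attempts
    retry
  end
end

calls = 0
result = with_rate_limit_retry do
  calls += 1
  raise RateLimitError, 'too many table update operations for this table' if calls == 1
  :ok
end
puts result # => ok
```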
@@ -123,6 +129,33 @@
 
       private
 
+      attr_reader :bigquery_config
+
+      def google_cloud_bigquery_gem_config
+        bigquery_config.dup.tap do |config|
+          %i[
+            adapter
+            database
+            dataset
+            location
+            logger
+          ].each do |option|
+            config.delete(option)
+          end
+        end
+      end
+
+      def get_or_create_bigquery_dataset # rubocop:disable Naming/AccessorMethodName
+        bigquery.dataset(bigquery_dataset_name) || begin
+          log_each(:debug, 'BigQuery dataset %s does not exist; creating it' % bigquery_dataset_name)
+          bigquery.create_dataset(bigquery_dataset_name, location: bigquery_config[:location])
+        end
+      end
+
+      def bigquery_dataset_name
+        bigquery_config[:dataset] || bigquery_config[:database] || (raise ArgumentError, 'BigQuery dataset must be specified')
+      end
+
       def connection_execute_method
         :query
       end
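The config-splitting helper added above can be exercised in isolation. A stand-alone sketch (the option hash below is an invented example; only the five keys listed in the method are Sequel-level options):

```ruby
# Stand-alone sketch of google_cloud_bigquery_gem_config above: Sequel-level
# options are removed so the remainder can go to Google::Cloud::Bigquery.new.
ADAPTER_ONLY_KEYS = %i[adapter database dataset location logger].freeze

def strip_adapter_options(config)
  config.dup.tap { |c| ADAPTER_ONLY_KEYS.each { |k| c.delete(k) } }
end

opts = { adapter: :bigquery, project: 'my-project', dataset: 'my_dataset', location: 'australia-southeast1' }
puts strip_adapter_options(opts).keys.inspect # => [:project]
```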
@@ -137,9 +170,9 @@ module Sequel
       end
 
       def schema_parse_table(_table_name, _opts)
-        logger.debug(Paint['schema_parse_table', :red, :bold])
+        log_each(:debug, Paint['schema_parse_table', :red, :bold])
         # require 'pry'; binding.pry
-        @bigquery.datasets.map do |dataset|
+        bigquery.datasets.map do |dataset|
           [
             dataset.dataset_id,
             {},
@@ -154,13 +187,12 @@ module Sequel
 
       # Padded to horizontally align with post-execution log message which includes the execution time
       def log_query(sql)
-        pad = ' '
-        puts Paint[pad + sql, :cyan, :bold]
-        # @loggers[0]&.debug(' ' + sql)
+        pad = ' ' * 12
+        log_each(:debug, Paint[pad + sql, :cyan, :bold])
       end
 
       def warn(msg)
-        @loggers[0].warn(Paint[msg, '#FFA500', :bold])
+        log_each(:warn, Paint[msg, '#FFA500', :bold])
       end
 
       def warn_default_removal(sql)
@@ -190,11 +222,27 @@ module Sequel
 
         sql
       end
+
+      def supports_combining_alter_table_ops?
+        true
+      end
+
+      def retryable_query?(sql)
+        single_statement_query?(sql) && alter_table_query?(sql)
+      end
+
+      def single_statement_query?(sql)
+        !sql.rstrip.chomp(';').include?(';')
+      end
+
+      def alter_table_query?(sql)
+        sql.match?(/\Aalter table /i)
+      end
     end
 
     class Dataset < Sequel::Dataset
       def fetch_rows(sql, &block)
-        puts '#fetch_rows'
+        db.send(:log_each, :debug, '#fetch_rows')
 
         execute(sql) do |bq_result|
           self.columns = bq_result.fields.map { |field| field.name.to_sym }
@@ -220,7 +268,7 @@ module Sequel
 
       # Like MySQL, BigQuery uses the nonstandard ` (backtick) for quoting identifiers.
       def quoted_identifier_append(sql, c)
-        sql << '`%s`' % c
+        sql << ('`%s`' % c)
       end
 
       def input_identifier(v)
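The query-classification helpers this release adds can be tried stand-alone. Copied here outside the class purely for illustration:

```ruby
# Stand-alone copies of the helpers added in this release: a query is
# considered retryable after a rate-limit error when it is a
# single-statement ALTER TABLE.
def single_statement_query?(sql)
  !sql.rstrip.chomp(';').include?(';')
end

def alter_table_query?(sql)
  sql.match?(/\Aalter table /i)
end

def retryable_query?(sql)
  single_statement_query?(sql) && alter_table_query?(sql)
end

puts retryable_query?('alter table t add column c int64;')     # => true
puts retryable_query?('alter table t drop column c; select 1') # => false
```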
@@ -2,6 +2,6 @@
 
 module Sequel
   module Bigquery
-    VERSION = '0.4.0'
+    VERSION = '0.6.0'
   end
 end
metadata CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: sequel-bigquery
 version: !ruby/object:Gem::Version
-  version: 0.4.0
+  version: 0.6.0
 platform: ruby
 authors:
 - Brendan Weibrecht
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2021-10-28 00:00:00.000000000 Z
+date: 2021-11-15 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: amazing_print
@@ -98,7 +98,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 - !ruby/object:Gem::Version
   version: '0'
 requirements: []
-rubygems_version: 3.2.16
+rubygems_version: 3.1.6
 signing_key:
 specification_version: 4
 summary: A Sequel adapter for Google's BigQuery