sequel-bigquery 0.4.0 → 0.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 3e8459abb689482e387bc4b0cd7dac2c582a40a9c449d5cd5bad5c8da57fffdb
-  data.tar.gz: 4409fa1b5704c03afcd068915311c4944bdaec1326a21ba6df3ed47d9ea118a2
+  metadata.gz: 6c15547bd0bbe53a384a987286cda32f9e1ad624c2c9e5339ac412322e810653
+  data.tar.gz: 26d87c1ff3d3aecf9d5401de88ad9048a239dd65bc3a682cfbb9d9d726675b9f
 SHA512:
-  metadata.gz: 8573f4d5d1f46d63fd062d97efa1bf19ffc7d2e9a3dd24433cdd079554380f7639e2d767657285a9cc1eea829a300c08aa40cfd3a7e24eb9f0f229c0f13e0e8c
-  data.tar.gz: 51371f99aba0a7799a0c8c34f9d3541c86c76113d4acda78badb420375abded01cdd456f184753793505be70f92eba1d9f0a99f35abdd1f766806023f3860979
+  metadata.gz: daa23d28fc8359540d61e57a5f4087f76da68045e6535d864f711c51d8fd0517bf9d74e332ea9cd3a5399ece489beb06df08dfaf409caf8877a1cf6aaa8140a3
+  data.tar.gz: eb73f36dbe0317d42e5b1fd9a774dd7d5440d36a8ddebd33c5683a67723efaf5538d4b915a4b518e49a6165854b42152762d57934a57dd050efee0f735563c4f
data/README.md CHANGED
@@ -4,11 +4,22 @@
 
 A Sequel adapter for [Google's BigQuery](https://cloud.google.com/bigquery).
 
+This gem was created in order to manage schema migrations of a BigQuery dataset at GreenSync. At the time of writing, we couldn't find any good tools in any language to manage changes to the schema as a set of migrations.
+
+Beyond migrations, I'm unsure how useful this gem is. I haven't yet tested what the performance would be for data interactions vs. directly using the `google-cloud-bigquery` gem's native facilities. If you're inserting a bunch of data, it's probably a better idea to use an [inserter from that gem](https://googleapis.dev/ruby/google-cloud-bigquery/latest/Google/Cloud/Bigquery/Dataset.html#insert_async-instance_method) rather than going through SQL.
+
 ## Contents
 
 <!-- MarkdownTOC autolink=true -->
 
 - [Intro](#intro)
+- [Quirks](#quirks)
+  - [Creating tables with column defaults](#creating-tables-with-column-defaults)
+  - [Transactions](#transactions)
+  - [Update statements without `WHERE`](#update-statements-without-where)
+  - [Combining statements](#combining-statements)
+  - [Alter table](#alter-table)
+  - [Column recreation](#column-recreation)
 - [Installation](#installation)
 - [Usage](#usage)
 - [Contributing](#contributing)
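The new intro paragraph above recommends the gem's native async inserter over SQL `INSERT`s for bulk loads. A minimal sketch of that approach, per the `google-cloud-bigquery` docs linked above (the project, dataset, and table names here are invented):

```ruby
require 'google/cloud/bigquery'

bigquery = Google::Cloud::Bigquery.new(project: 'my-project') # hypothetical project
dataset  = bigquery.dataset('my_dataset')

# Buffers rows in background threads and streams them to the `events` table.
inserter = dataset.insert_async('events') do |result|
  warn("insert failed: #{result.error}") if result.error?
end

inserter.insert([{ name: 'signup', at: Time.now }])
inserter.stop.wait! # flush any remaining buffered rows before exiting
```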
@@ -25,12 +36,12 @@ A Sequel adapter for [Google's BigQuery](https://cloud.google.com/bigquery).
 Features:
 
 - Connecting
-- Migrating
-- Table creation, with automatic removal of defaults from statements (since BigQuery doesn't support it)
+- Migrating (see quirks)
+- Table creation (see quirks)
 - Inserting rows
-- Updating rows, with automatic addition of `where 1 = 1` to statements (since BigQuery requires a `where` clause)
+- Updating rows (see quirks)
 - Querying
-- Transactions (buffered since BigQuery only supports them when you execute the whole transaction at once)
+- Transactions (see quirks)
 - Table partitioning
 - Ruby types:
   + String
@@ -42,6 +53,34 @@ Features:
   + BigDecimal
 - Selecting the BigQuery server location
 
+## Quirks
+
+### Creating tables with column defaults
+
+BigQuery doesn't support defaults on columns. As a workaround, all defaults are automatically removed from statements (crudely).
+
+### Transactions
+
+BigQuery doesn't support transactions where the statements are executed individually. It does support them if the entire transaction's SQL is sent all at once, though. As a workaround, statements within a transaction are buffered and executed together. The impact of this is that no results can be returned within a transaction.
+
+### Update statements without `WHERE`
+
+BigQuery requires every `UPDATE` statement to have a `WHERE` clause. As a workaround, statements which lack one have `where 1 = 1` appended automatically (crudely).
+
+### Combining statements
+
+When combining multiple statements into one query (with `;`), and the final statement is not a `SELECT`, the `google-cloud-bigquery` gem has a [bug](https://github.com/googleapis/google-cloud-ruby/issues/9617) which causes an exception. Note that all of the statements have already been executed when this happens. A workaround is to append `; SELECT 1`.
+
+### Alter table
+
+BigQuery [rate-limits alter table statements](https://cloud.google.com/bigquery/quotas#dataset_limits) to 10 per second. This is mitigated somewhat by Sequel combining `ALTER TABLE` statements whenever possible, and by BigQuery's extremely high latency (~2 seconds per query); but you may still run into this limitation.
+
+We've also noticed a bug with `google-cloud-bigquery` where an `ALTER TABLE` statement resulted in a `NoMethodError` on nil for `fields` within `from_gapi_json`. We're not yet sure what caused this.
+
+### Column recreation
+
+Be careful when deleting a column which you might want to re-add. BigQuery reserves the name of a deleted column for up to the time-travel duration, which is [*seven days*](https://cloud.google.com/bigquery/docs/time-travel). Re-creating the entire dataset is a painful workaround.
+
 ## Installation
 
 Add it to the `Gemfile` of your project:
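To make the `UPDATE` and transaction quirks in the hunk above concrete, here is a hypothetical Sequel session (table and column names are invented, and the emitted SQL is approximate):

```ruby
# Assumes `db` came from Sequel.connect(adapter: :bigquery, ...).

# BigQuery rejects a bare UPDATE, so the adapter appends a clause, roughly:
db[:people].update(age: 30)
# => UPDATE `people` SET `age` = 30 where 1 = 1

# Statements inside a transaction are buffered and sent as one
# BEGIN ... COMMIT script, so results can't be read back mid-transaction:
db.transaction do
  db[:people].insert(name: 'Ada')
  db[:people].where(name: 'Ada').update(age: 36)
end

# Workaround for the multi-statement bug noted above: end with a SELECT.
db.run('DELETE FROM `people` WHERE 1 = 1; SELECT 1')
```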
@@ -68,6 +107,7 @@ Connect to BigQuery:
 
 ```
 require 'sequel-bigquery'
+require 'logger'
 
 db = Sequel.connect(
   adapter: :bigquery,
@@ -80,6 +120,8 @@ db = Sequel.connect(
 
 And use Sequel like normal.
 
+Note that it is important to supply a logger that will at least output warning messages, so that you know when your queries are being modified or buffered, which may be unexpected behaviour.
+
 ## Contributing
 
 Pull requests welcome! =)
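The connect snippet in the hunk above is truncated in this diff. For reference, a full call with a warning-capable logger might look like the following; the option values are illustrative, not taken from the source:

```ruby
require 'sequel-bigquery'
require 'logger'

db = Sequel.connect(
  adapter: :bigquery,
  project: 'your-gcp-project',        # hypothetical GCP project id
  database: 'your_bigquery_dataset',  # the BigQuery dataset name
  location: 'australia-southeast1',   # hypothetical server location
  logger: Logger.new($stdout),        # surfaces the modified/buffered-query warnings
)
```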
data/lib/sequel-bigquery.rb CHANGED
@@ -4,6 +4,7 @@ require 'delegate'
 require 'time'
 
 require 'google/cloud/bigquery'
+require 'amazing_print'
 require 'paint'
 require 'sequel'
 
@@ -16,39 +17,34 @@ module Sequel
     class Database < Sequel::Database # rubocop:disable Metrics/ClassLength
       set_adapter_scheme :bigquery
 
-      def initialize(*args, **kawrgs)
-        puts '.new'
-        @orig_opts = kawrgs.fetch(:orig_opts)
+      def initialize(*args, **kwargs)
+        @bigquery_config = kwargs.fetch(:orig_opts)
         @sql_buffer = []
         @sql_buffering = false
         super
       end
 
       def connect(*_args)
-        puts '#connect'
-        config = @orig_opts.dup
-        config.delete(:adapter)
-        config.delete(:logger)
-        location = config.delete(:location)
-        bq_dataset_name = config.delete(:dataset) || config.delete(:database)
-        @bigquery = Google::Cloud::Bigquery.new(config)
+        log_each(:debug, '#connect')
+        get_or_create_bigquery_dataset
+          .tap { log_each(:debug, '#connect end') }
+      end
+
+      def bigquery
         # ObjectSpace.each_object(HTTPClient).each { |c| c.debug_dev = STDOUT }
-        @bigquery.dataset(bq_dataset_name) || begin
-          @loggers[0].debug('BigQuery dataset %s does not exist; creating it' % bq_dataset_name)
-          @bigquery.create_dataset(bq_dataset_name, location: location)
-        end
-          .tap { puts '#connect end' }
+        @bigquery ||= Google::Cloud::Bigquery.new(google_cloud_bigquery_gem_config)
       end
 
       def disconnect_connection(_c)
-        puts '#disconnect_connection'
+        log_each(:debug, '#disconnect_connection')
         # c.disconnect
       end
 
       def drop_datasets(*dataset_names_to_drop)
         dataset_names_to_drop.each do |dataset_name_to_drop|
-          puts "Dropping dataset #{dataset_name_to_drop.inspect}"
-          dataset_to_drop = @bigquery.dataset(dataset_name_to_drop)
+          log_each(:debug, "Dropping dataset #{dataset_name_to_drop.inspect}")
+          dataset_to_drop = bigquery.dataset(dataset_name_to_drop)
+          next unless dataset_to_drop
           dataset_to_drop.tables.each(&:delete)
           dataset_to_drop.delete
         end
@@ -56,7 +52,7 @@ module Sequel
       alias drop_dataset drop_datasets
 
       def execute(sql, opts = OPTS) # rubocop:disable Metrics/MethodLength, Metrics/AbcSize, Metrics/CyclomaticComplexity, Metrics/PerceivedComplexity
-        puts '#execute'
+        log_each(:debug, '#execute')
         log_query(sql)
 
         # require 'pry'; binding.pry if sql =~ /CREATE TABLE IF NOT EXISTS/i
@@ -82,21 +78,31 @@ module Sequel
           warn("Warning: Will now execute entire buffered transaction:\n" + @sql_buffer.join("\n"))
         end
 
+        sql_to_execute = @sql_buffer.any? ? @sql_buffer.join("\n") : sql
+
         synchronize(opts[:server]) do |conn|
           results = log_connection_yield(sql, conn) do
-            sql_to_execute = @sql_buffer.any? ? @sql_buffer.join("\n") : sql
             conn.query(sql_to_execute)
           end
-          require 'amazing_print'
-          ap results
+          log_each(:debug, results.awesome_inspect)
           if block_given?
             yield results
           else
            results
          end
-        # TODO
-        # rescue ::ODBC::Error, ArgumentError => e
-        rescue Google::Cloud::InvalidArgumentError, ArgumentError => e
+        rescue Google::Cloud::InvalidArgumentError, Google::Cloud::PermissionDeniedError => e
+          if e.message.include?('too many table update operations for this table')
+            warn('Triggered rate limit of table update operations for this table. For more information, see https://cloud.google.com/bigquery/docs/troubleshoot-quotas')
+            if retryable_query?(sql_to_execute)
+              warn('Detected retryable query - re-running query after a 1 second sleep')
+              sleep 1
+              retry
+            else
+              log_each(:error, "Query not detected as retryable; can't automatically recover from being rate-limited")
+            end
+          end
+          raise_error(e)
+        rescue ArgumentError => e
           raise_error(e)
         end # rubocop:disable Style/MultilineBlockChain
           .tap do
@@ -123,6 +129,33 @@
 
       private
 
+      attr_reader :bigquery_config
+
+      def google_cloud_bigquery_gem_config
+        bigquery_config.dup.tap do |config|
+          %i[
+            adapter
+            database
+            dataset
+            location
+            logger
+          ].each do |option|
+            config.delete(option)
+          end
+        end
+      end
+
+      def get_or_create_bigquery_dataset # rubocop:disable Naming/AccessorMethodName
+        bigquery.dataset(bigquery_dataset_name) || begin
+          log_each(:debug, 'BigQuery dataset %s does not exist; creating it' % bigquery_dataset_name)
+          bigquery.create_dataset(bigquery_dataset_name, location: bigquery_config[:location])
+        end
+      end
+
+      def bigquery_dataset_name
+        bigquery_config[:dataset] || bigquery_config[:database] || (raise ArgumentError, 'BigQuery dataset must be specified')
+      end
+
       def connection_execute_method
         :query
       end
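As an illustration of the new config helpers above: given a hypothetical set of connect options (values invented), `google_cloud_bigquery_gem_config` strips the Sequel-level keys so that only options meaningful to `Google::Cloud::Bigquery.new` remain:

```ruby
require 'logger'

orig_opts = {
  adapter: :bigquery,
  project: 'my-project', # hypothetical values
  database: 'my_dataset',
  location: 'US',
  logger: Logger.new($stdout),
}
# google_cloud_bigquery_gem_config deletes :adapter, :database, :dataset,
# :location, and :logger from a copy, so only { project: 'my-project' }
# would be passed through to Google::Cloud::Bigquery.new.
```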
@@ -137,9 +170,9 @@
       end
 
       def schema_parse_table(_table_name, _opts)
-        logger.debug(Paint['schema_parse_table', :red, :bold])
+        log_each(:debug, Paint['schema_parse_table', :red, :bold])
         # require 'pry'; binding.pry
-        @bigquery.datasets.map do |dataset|
+        bigquery.datasets.map do |dataset|
           [
             dataset.dataset_id,
             {},
@@ -154,13 +187,12 @@
 
       # Padded to horizontally align with post-execution log message which includes the execution time
      def log_query(sql)
-        pad = '            '
-        puts Paint[pad + sql, :cyan, :bold]
-        # @loggers[0]&.debug('            ' + sql)
+        pad = ' ' * 12
+        log_each(:debug, Paint[pad + sql, :cyan, :bold])
       end
 
       def warn(msg)
-        @loggers[0].warn(Paint[msg, '#FFA500', :bold])
+        log_each(:warn, Paint[msg, '#FFA500', :bold])
       end
 
       def warn_default_removal(sql)
@@ -190,11 +222,27 @@
 
         sql
       end
+
+      def supports_combining_alter_table_ops?
+        true
+      end
+
+      def retryable_query?(sql)
+        single_statement_query?(sql) && alter_table_query?(sql)
+      end
+
+      def single_statement_query?(sql)
+        !sql.rstrip.chomp(';').include?(';')
+      end
+
+      def alter_table_query?(sql)
+        sql.match?(/\Aalter table /i)
+      end
     end
 
     class Dataset < Sequel::Dataset
       def fetch_rows(sql, &block)
-        puts '#fetch_rows'
+        db.send(:log_each, :debug, '#fetch_rows')
 
         execute(sql) do |bq_result|
           self.columns = bq_result.fields.map { |field| field.name.to_sym }
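For illustration, the new retryability predicates in the hunk above classify SQL like this. A hypothetical probe, assuming a connected `db` (the predicates are private, hence `send`; table names are invented):

```ruby
db.send(:retryable_query?, 'ALTER TABLE `t` ADD COLUMN c INT64')       # => true
db.send(:retryable_query?, 'alter table `t` drop column c;')           # => true  (trailing ; is stripped first)
db.send(:retryable_query?, 'ALTER TABLE `t` DROP COLUMN c; SELECT 1')  # => false (multi-statement)
db.send(:retryable_query?, 'UPDATE `t` SET a = 1 WHERE 1 = 1')         # => false (not ALTER TABLE)
```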
@@ -220,7 +268,7 @@
 
       # Like MySQL, BigQuery uses the nonstandard ` (backtick) for quoting identifiers.
       def quoted_identifier_append(sql, c)
-        sql << '`%s`' % c
+        sql << ('`%s`' % c)
       end
 
       def input_identifier(v)
data/lib/sequel-bigquery/version.rb CHANGED
@@ -2,6 +2,6 @@
 
 module Sequel
   module Bigquery
-    VERSION = '0.4.0'
+    VERSION = '0.6.0'
   end
 end
metadata CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: sequel-bigquery
 version: !ruby/object:Gem::Version
-  version: 0.4.0
+  version: 0.6.0
 platform: ruby
 authors:
 - Brendan Weibrecht
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2021-10-28 00:00:00.000000000 Z
+date: 2021-11-15 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: amazing_print
@@ -98,7 +98,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.2.16
+rubygems_version: 3.1.6
 signing_key:
 specification_version: 4
 summary: A Sequel adapter for Google's BigQuery