sequel-bigquery 0.4.0 → 0.6.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/README.md +46 -4
- data/lib/sequel-bigquery.rb +81 -33
- data/lib/sequel_bigquery/version.rb +1 -1
- metadata +3 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6c15547bd0bbe53a384a987286cda32f9e1ad624c2c9e5339ac412322e810653
+  data.tar.gz: 26d87c1ff3d3aecf9d5401de88ad9048a239dd65bc3a682cfbb9d9d726675b9f
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: daa23d28fc8359540d61e57a5f4087f76da68045e6535d864f711c51d8fd0517bf9d74e332ea9cd3a5399ece489beb06df08dfaf409caf8877a1cf6aaa8140a3
+  data.tar.gz: eb73f36dbe0317d42e5b1fd9a774dd7d5440d36a8ddebd33c5683a67723efaf5538d4b915a4b518e49a6165854b42152762d57934a57dd050efee0f735563c4f
data/README.md
CHANGED
@@ -4,11 +4,22 @@
 
 A Sequel adapter for [Google's BigQuery](https://cloud.google.com/bigquery).
 
+This gem was created in order to manage schema migrations of a BigQuery dataset at GreenSync. At the time of writing, we couldn't find any good tools in any language to manage changes to the schema as a set of migrations.
+
+Beyond migrations, I'm unsure how useful this gem is. I haven't yet tested what the performance would be for data interactions vs. directly using the `google-cloud-bigquery` gem's native facilities. If you're inserting a bunch of data, it's probably a better idea to use an [inserter from that gem](https://googleapis.dev/ruby/google-cloud-bigquery/latest/Google/Cloud/Bigquery/Dataset.html#insert_async-instance_method) rather than going through SQL.
+
 ## Contents
 
 <!-- MarkdownTOC autolink=true -->
 
 - [Intro](#intro)
+- [Quirks](#quirks)
+    - [Creating tables with column defaults](#creating-tables-with-column-defaults)
+    - [Transactions](#transactions)
+    - [Update statements without `WHERE`](#update-statements-without-where)
+    - [Combining statements](#combining-statements)
+    - [Alter table](#alter-table)
+    - [Column recreation](#column-recreation)
 - [Installation](#installation)
 - [Usage](#usage)
 - [Contributing](#contributing)
@@ -25,12 +36,12 @@ A Sequel adapter for [Google's BigQuery](https://cloud.google.com/bigquery).
 Features:
 
 - Connecting
-- Migrating
-- Table creation
+- Migrating (see quirks)
+- Table creation (see quirks)
 - Inserting rows
-- Updating rows
+- Updating rows (see quirks)
 - Querying
-- Transactions (
+- Transactions (see quirks)
 - Table partitioning
 - Ruby types:
   + String
@@ -42,6 +53,34 @@ Features:
   + BigDecimal
 - Selecting the BigQuery server location
 
+## Quirks
+
+### Creating tables with column defaults
+
+BigQuery doesn't support defaults on columns. As a workaround, all defaults are automatically removed from statements (crudely).
+
+### Transactions
+
+BigQuery doesn't support transactions where the statements are executed individually. It does support them if the entire transaction's SQL is sent all at once, though. As a workaround, buffering of statements within a transaction has been implemented. However, the impact of this is that no results can be returned within a transaction.
+
+### Update statements without `WHERE`
+
+BigQuery requires every `UPDATE` statement to have a `WHERE` clause. As a workaround, statements which lack one have `where 1 = 1` appended automatically (crudely).
+
+### Combining statements
+
+When combining multiple statements into one query (with `;`), and the final statement is not a `SELECT`, the `google-cloud-bigquery` gem has a [bug](https://github.com/googleapis/google-cloud-ruby/issues/9617) which causes an exception. Note that all the statements have been executed by the time this happens. A workaround is to append `; SELECT 1`.
+
+### Alter table
+
+BigQuery [rate-limits alter table statements](https://cloud.google.com/bigquery/quotas#dataset_limits) to 10 per second. This is mitigated somewhat by Sequel combining `ALTER TABLE` statements whenever possible, and by BigQuery's extremely high latency (~2 seconds per query); but you may still run into this limitation.
+
+We've also noticed a bug in `google-cloud-bigquery` where an `ALTER TABLE` statement resulted in a `NoMethodError` on nil for `fields` within `from_gapi_json`. We're not yet sure what caused this.
+
+### Column recreation
+
+Be careful when deleting a column which you might want to re-add. BigQuery reserves the name of a deleted column for up to the time travel duration - which is [*seven days*](https://cloud.google.com/bigquery/docs/time-travel). Re-creating the entire dataset is a painful workaround.
+
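The rewrites described in these quirks are simple string manipulations. As a rough illustration of the `WHERE`-appending workaround (a hypothetical helper, not the gem's actual code):

```ruby
# Hypothetical sketch of the crude workaround described above: append a
# tautological WHERE clause to UPDATE statements that lack one.
def ensure_where_clause(sql)
  return sql unless sql.match?(/\Aupdate\s/i)  # only UPDATE statements need it
  return sql if sql.match?(/\bwhere\b/i)       # already has a WHERE clause

  "#{sql.rstrip.chomp(';')} where 1 = 1"
end
```

With this sketch, `ensure_where_clause('UPDATE t SET a = 1')` yields `UPDATE t SET a = 1 where 1 = 1`, while statements that already have a `WHERE`, or aren't updates, pass through untouched.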
 ## Installation
 
 Add it to the `Gemfile` of your project:
@@ -68,6 +107,7 @@ Connect to BigQuery:
 
 ```
 require 'sequel-bigquery'
+require 'logger'
 
 db = Sequel.connect(
   adapter: :bigquery,
@@ -80,6 +120,8 @@ db = Sequel.connect(
 
 And use Sequel like normal.
 
+Note that it is important to supply a logger that will at least output warning messages, so you know when your queries are being modified or buffered, which may be unexpected behaviour.
+
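For example, a minimal stdlib logger at the warning threshold (the level choice here is just a suggestion; anything that surfaces warnings works):

```ruby
require 'logger'

# Log to stdout, showing warnings and above, so the adapter's messages
# about rewritten or buffered queries aren't silently lost.
logger = Logger.new($stdout)
logger.level = Logger::WARN
```

This `logger` can then be passed as the `logger:` option to `Sequel.connect`, as in the connection example above.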
 ## Contributing
 
 Pull requests welcome! =)
|
data/lib/sequel-bigquery.rb
CHANGED
@@ -4,6 +4,7 @@ require 'delegate'
 require 'time'
 
 require 'google/cloud/bigquery'
+require 'amazing_print'
 require 'paint'
 require 'sequel'
 
@@ -16,39 +17,34 @@ module Sequel
     class Database < Sequel::Database # rubocop:disable Metrics/ClassLength
       set_adapter_scheme :bigquery
 
-      def initialize(*args, **
-
-        @orig_opts = kawrgs.fetch(:orig_opts)
+      def initialize(*args, **kwargs)
+        @bigquery_config = kwargs.fetch(:orig_opts)
         @sql_buffer = []
         @sql_buffering = false
         super
       end
 
       def connect(*_args)
-
-
-
-
-
-
-        @bigquery = Google::Cloud::Bigquery.new(config)
+        log_each(:debug, '#connect')
+        get_or_create_bigquery_dataset
+          .tap { log_each(:debug, '#connect end') }
+      end
+
+      def bigquery
         # ObjectSpace.each_object(HTTPClient).each { |c| c.debug_dev = STDOUT }
-        @bigquery.
-        @loggers[0].debug('BigQuery dataset %s does not exist; creating it' % bq_dataset_name)
-        @bigquery.create_dataset(bq_dataset_name, location: location)
-        end
-        .tap { puts '#connect end' }
+        @bigquery ||= Google::Cloud::Bigquery.new(google_cloud_bigquery_gem_config)
       end
 
       def disconnect_connection(_c)
-
+        log_each(:debug, '#disconnect_connection')
         # c.disconnect
       end
 
       def drop_datasets(*dataset_names_to_drop)
         dataset_names_to_drop.each do |dataset_name_to_drop|
-
-          dataset_to_drop =
+          log_each(:debug, "Dropping dataset #{dataset_name_to_drop.inspect}")
+          dataset_to_drop = bigquery.dataset(dataset_name_to_drop)
+          next unless dataset_to_drop
           dataset_to_drop.tables.each(&:delete)
           dataset_to_drop.delete
         end
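As an aside on the rewritten `initialize` above: it reads the adapter options with `Hash#fetch`, which fails fast when the key is absent instead of propagating `nil`. A quick sketch with hypothetical option values:

```ruby
# Illustrative options hash; the nested values are hypothetical.
opts = { orig_opts: { project: 'demo-project', dataset: 'demo_dataset' } }

bigquery_config = opts.fetch(:orig_opts)  # the nested options hash

# Unlike Hash#[], Hash#fetch raises KeyError for a missing key:
missing = opts[:unknown]                  # nil
begin
  opts.fetch(:unknown)
rescue KeyError
  missing_raised = true
end
```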
@@ -56,7 +52,7 @@ module Sequel
       alias drop_dataset drop_datasets
 
       def execute(sql, opts = OPTS) # rubocop:disable Metrics/MethodLength, Metrics/AbcSize, Metrics/CyclomaticComplexity, Metrics/PerceivedComplexity
-
+        log_each(:debug, '#execute')
         log_query(sql)
 
         # require 'pry'; binding.pry if sql =~ /CREATE TABLE IF NOT EXISTS/i
@@ -82,21 +78,31 @@ module Sequel
           warn("Warning: Will now execute entire buffered transaction:\n" + @sql_buffer.join("\n"))
         end
 
+        sql_to_execute = @sql_buffer.any? ? @sql_buffer.join("\n") : sql
+
         synchronize(opts[:server]) do |conn|
           results = log_connection_yield(sql, conn) do
-            sql_to_execute = @sql_buffer.any? ? @sql_buffer.join("\n") : sql
             conn.query(sql_to_execute)
           end
-
-          ap results
+          log_each(:debug, results.awesome_inspect)
           if block_given?
             yield results
           else
             results
           end
-
-
-
+        rescue Google::Cloud::InvalidArgumentError, Google::Cloud::PermissionDeniedError => e
+          if e.message.include?('too many table update operations for this table')
+            warn('Triggered rate limit of table update operations for this table. For more information, see https://cloud.google.com/bigquery/docs/troubleshoot-quotas')
+            if retryable_query?(sql_to_execute)
+              warn('Detected retryable query - re-running query after a 1 second sleep')
+              sleep 1
+              retry
+            else
+              log_each(:error, "Query not detected as retryable; can't automatically recover from being rate-limited")
+            end
+          end
+          raise_error(e)
+        rescue ArgumentError => e
           raise_error(e)
         end # rubocop:disable Style/MultilineBlockChain
           .tap do
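The rescue-and-retry flow added in this hunk is plain Ruby `begin`/`rescue`/`retry`. A minimal self-contained sketch behaves the same way (using a stand-in error class, since the `Google::Cloud` error classes need the real gem loaded):

```ruby
# Stand-in for the rate-limit errors the adapter rescues.
class FakeRateLimitError < StandardError; end

attempts = 0
result =
  begin
    attempts += 1
    # Fail once to exercise the retry path; the gem sleeps 1 second here.
    raise FakeRateLimitError, 'too many table update operations for this table' if attempts == 1

    :query_result
  rescue FakeRateLimitError => e
    retry if e.message.include?('too many table update operations') && attempts < 3
    raise
  end
```

After one failed attempt, `retry` re-runs the whole `begin` body and the second attempt succeeds.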
@@ -123,6 +129,33 @@ module Sequel
 
       private
 
+      attr_reader :bigquery_config
+
+      def google_cloud_bigquery_gem_config
+        bigquery_config.dup.tap do |config|
+          %i[
+            adapter
+            database
+            dataset
+            location
+            logger
+          ].each do |option|
+            config.delete(option)
+          end
+        end
+      end
+
+      def get_or_create_bigquery_dataset # rubocop:disable Naming/AccessorMethodName
+        bigquery.dataset(bigquery_dataset_name) || begin
+          log_each(:debug, 'BigQuery dataset %s does not exist; creating it' % bigquery_dataset_name)
+          bigquery.create_dataset(bigquery_dataset_name, location: bigquery_config[:location])
+        end
+      end
+
+      def bigquery_dataset_name
+        bigquery_config[:dataset] || bigquery_config[:database] || (raise ArgumentError, 'BigQuery dataset must be specified')
+      end
+
       def connection_execute_method
         :query
       end
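The new `google_cloud_bigquery_gem_config` simply strips the adapter-level keys before handing the remaining options to `Google::Cloud::Bigquery.new`. The same filtering as a standalone sketch (rewritten to take the config as an argument):

```ruby
# Options consumed by the Sequel adapter itself (names taken from the method
# above); everything else is passed through to the google-cloud-bigquery gem.
ADAPTER_ONLY_OPTIONS = %i[adapter database dataset location logger].freeze

def google_cloud_bigquery_gem_config(bigquery_config)
  bigquery_config.dup.tap do |config|
    ADAPTER_ONLY_OPTIONS.each { |option| config.delete(option) }
  end
end
```

So `google_cloud_bigquery_gem_config(adapter: :bigquery, dataset: 'my_dataset', project: 'my-project')` leaves only `{ project: 'my-project' }`, and the original hash is untouched thanks to the `dup`.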
@@ -137,9 +170,9 @@ module Sequel
       end
 
       def schema_parse_table(_table_name, _opts)
-
+        log_each(:debug, Paint['schema_parse_table', :red, :bold])
         # require 'pry'; binding.pry
-
+        bigquery.datasets.map do |dataset|
           [
             dataset.dataset_id,
             {},
|
 
       # Padded to horizontally align with post-execution log message which includes the execution time
       def log_query(sql)
-        pad = '
-
-        # @loggers[0]&.debug(' ' + sql)
+        pad = ' ' * 12
+        log_each(:debug, Paint[pad + sql, :cyan, :bold])
       end
 
       def warn(msg)
-
+        log_each(:warn, Paint[msg, '#FFA500', :bold])
       end
 
       def warn_default_removal(sql)
|
 
         sql
       end
+
+      def supports_combining_alter_table_ops?
+        true
+      end
+
+      def retryable_query?(sql)
+        single_statement_query?(sql) && alter_table_query?(sql)
+      end
+
+      def single_statement_query?(sql)
+        !sql.rstrip.chomp(';').include?(';')
+      end
+
+      def alter_table_query?(sql)
+        sql.match?(/\Aalter table /i)
+      end
     end
 
     class Dataset < Sequel::Dataset
       def fetch_rows(sql, &block)
-
+        db.send(:log_each, :debug, '#fetch_rows')
 
         execute(sql) do |bq_result|
           self.columns = bq_result.fields.map { |field| field.name.to_sym }
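The predicates behind `retryable_query?` in the hunk above are pure string checks, so they can be exercised on their own, copied here as plain methods:

```ruby
def single_statement_query?(sql)
  # A trailing semicolon doesn't count as a statement separator.
  !sql.rstrip.chomp(';').include?(';')
end

def alter_table_query?(sql)
  sql.match?(/\Aalter table /i)
end

def retryable_query?(sql)
  single_statement_query?(sql) && alter_table_query?(sql)
end
```

Only a lone `ALTER TABLE` statement is considered safe to re-run after a rate-limit error; multi-statement queries are not, since the earlier statements would execute twice.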
@@ -220,7 +268,7 @@ module Sequel
 
       # Like MySQL, BigQuery uses the nonstandard ` (backtick) for quoting identifiers.
       def quoted_identifier_append(sql, c)
-        sql << '`%s`' % c
+        sql << ('`%s`' % c)
       end
 
       def input_identifier(v)
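The change to `quoted_identifier_append` only adds parentheses; `%` (string format) already binds tighter than `<<`, so behaviour is unchanged and the parentheses just make the precedence explicit (likely to satisfy a linter):

```ruby
# Both forms append the backtick-quoted identifier to the SQL buffer.
sql = +'SELECT '                  # unary + makes the string literal mutable
sql << ('`%s`' % :my_column)
```

`sql` ends up as `` SELECT `my_column` ``.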
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: sequel-bigquery
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.6.0
 platform: ruby
 authors:
 - Brendan Weibrecht
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2021-
+date: 2021-11-15 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: amazing_print
@@ -98,7 +98,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 - !ruby/object:Gem::Version
   version: '0'
 requirements: []
-rubygems_version: 3.
+rubygems_version: 3.1.6
 signing_key:
 specification_version: 4
 summary: A Sequel adapter for Google's BigQuery