sequel-bigquery 0.4.0 → 0.6.0
- checksums.yaml +4 -4
- data/README.md +46 -4
- data/lib/sequel-bigquery.rb +81 -33
- data/lib/sequel_bigquery/version.rb +1 -1
- metadata +3 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6c15547bd0bbe53a384a987286cda32f9e1ad624c2c9e5339ac412322e810653
+  data.tar.gz: 26d87c1ff3d3aecf9d5401de88ad9048a239dd65bc3a682cfbb9d9d726675b9f
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: daa23d28fc8359540d61e57a5f4087f76da68045e6535d864f711c51d8fd0517bf9d74e332ea9cd3a5399ece489beb06df08dfaf409caf8877a1cf6aaa8140a3
+  data.tar.gz: eb73f36dbe0317d42e5b1fd9a774dd7d5440d36a8ddebd33c5683a67723efaf5538d4b915a4b518e49a6165854b42152762d57934a57dd050efee0f735563c4f
data/README.md
CHANGED
@@ -4,11 +4,22 @@
 
 A Sequel adapter for [Google's BigQuery](https://cloud.google.com/bigquery).
 
+This gem was created in order to manage schema migrations of a BigQuery dataset at GreenSync. At the time of writing, we couldn't find any good tools in any language to manage changes to the schema as a set of migrations.
+
+Beyond migrations, I'm unsure how useful this gem is. I haven't yet tested what the performance would be for data interactions vs. directly using the `google-cloud-bigquery` gem's native facilities. If you're inserting a bunch of data, it's probably a better idea to use an [inserter from that gem](https://googleapis.dev/ruby/google-cloud-bigquery/latest/Google/Cloud/Bigquery/Dataset.html#insert_async-instance_method) rather than going through SQL.
+
 ## Contents
 
 <!-- MarkdownTOC autolink=true -->
 
 - [Intro](#intro)
+- [Quirks](#quirks)
+- [Creating tables with column defaults](#creating-tables-with-column-defaults)
+- [Transactions](#transactions)
+- [Update statements without `WHERE`](#update-statements-without-where)
+- [Combining statements](#combining-statements)
+- [Alter table](#alter-table)
+- [Column recreation](#column-recreation)
 - [Installation](#installation)
 - [Usage](#usage)
 - [Contributing](#contributing)
@@ -25,12 +36,12 @@ A Sequel adapter for [Google's BigQuery](https://cloud.google.com/bigquery).
 Features:
 
 - Connecting
-- Migrating
-- Table creation
+- Migrating (see quirks)
+- Table creation (see quirks)
 - Inserting rows
-- Updating rows
+- Updating rows (see quirks)
 - Querying
-- Transactions (
+- Transactions (see quirks)
 - Table partitioning
 - Ruby types:
   + String
@@ -42,6 +53,34 @@ Features:
   + BigDecimal
 - Selecting the BigQuery server location
 
+## Quirks
+
+### Creating tables with column defaults
+
+BigQuery doesn't support defaults on columns. As a workaround, all defaults are automatically removed from statements (crudely).
+
+### Transactions
+
+BigQuery doesn't support transactions where the statements are executed individually. It does support them if the entire transaction's SQL is sent all at once, though. As a workaround, buffering of statements within a transaction has been implemented. The impact of this is that no results can be returned within a transaction.
+
+### Update statements without `WHERE`
+
+BigQuery requires all `UPDATE` statements to have a `WHERE` clause. As a workaround, statements which lack one have `where 1 = 1` appended automatically (crudely).
+
+### Combining statements
+
+When combining multiple statements into one query (with `;`), and the final statement is not a `SELECT`, the `google-cloud-bigquery` gem has a [bug](https://github.com/googleapis/google-cloud-ruby/issues/9617) which causes an exception. Note that all the statements have been executed by the time this happens. A workaround is to append `; SELECT 1`.
+
+### Alter table
+
+BigQuery [rate-limits alter table statements](https://cloud.google.com/bigquery/quotas#dataset_limits) to 10 per second. This is mitigated somewhat by Sequel combining `ALTER TABLE` statements whenever possible, and by BigQuery's extremely high latency (\~2 seconds per query); but you may still run into this limitation.
+
+We've also noticed a bug with `google-cloud-bigquery` where an `ALTER TABLE` statement resulted in a `NoMethodError` on nil for `fields` within `from_gapi_json`. We're not yet sure what caused this.
+
+### Column recreation
+
+Be careful when deleting a column which you might want to re-add. BigQuery reserves the name of a deleted column for up to the time travel duration - which is [*seven days*](https://cloud.google.com/bigquery/docs/time-travel). Re-creating the entire dataset is a painful workaround.
+
 ## Installation
 
 Add it to the `Gemfile` of your project:
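The `where 1 = 1` workaround described in the quirks above can be sketched as follows. This is an illustrative, regex-based rewrite (method name and detection logic are my own, not the gem's actual implementation):

```ruby
# Illustrative sketch (not the gem's actual code): append a no-op WHERE
# clause to UPDATE statements that lack one, as the adapter does.
def ensure_where_clause(sql)
  return sql unless sql.match?(/\Aupdate\s/i) # only touch UPDATE statements
  return sql if sql.match?(/\bwhere\b/i)      # already has a WHERE clause
  "#{sql} where 1 = 1"
end

ensure_where_clause('update t set a = 1')
# => "update t set a = 1 where 1 = 1"
ensure_where_clause('update t set a = 1 where b = 2')
# => unchanged
```

A real implementation would need to be more careful about `where` appearing inside string literals; the gem itself warns you via the logger whenever it rewrites a query like this.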
@@ -68,6 +107,7 @@ Connect to BigQuery:
 
 ```
 require 'sequel-bigquery'
+require 'logger'
 
 db = Sequel.connect(
   adapter: :bigquery,
@@ -80,6 +120,8 @@ db = Sequel.connect(
 
 And use Sequel like normal.
 
+Note that it is important to supply a logger that will at least output warning messages, so you know when your queries are being modified or buffered, which may be unexpected behaviour.
+
 ## Contributing
 
 Pull requests welcome! =)
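Since those rewrites are surfaced only as warnings, a minimal logger setup that keeps warnings visible while silencing debug chatter could look like this (a sketch using Ruby's stdlib `Logger`; the messages are illustrative):

```ruby
require 'logger'

# Emit warnings and above to stderr; debug/info chatter is suppressed.
logger = Logger.new($stderr)
logger.level = Logger::WARN

logger.debug('#connect')                  # suppressed at WARN level
logger.warn('query modified or buffered') # emitted
```

This `logger` object is what you would pass as the `logger:` option to `Sequel.connect` in the snippet above.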
data/lib/sequel-bigquery.rb
CHANGED
@@ -4,6 +4,7 @@ require 'delegate'
 require 'time'
 
 require 'google/cloud/bigquery'
+require 'amazing_print'
 require 'paint'
 require 'sequel'
 
@@ -16,39 +17,34 @@ module Sequel
     class Database < Sequel::Database # rubocop:disable Metrics/ClassLength
       set_adapter_scheme :bigquery
 
-      def initialize(*args, **
-
-        @orig_opts = kawrgs.fetch(:orig_opts)
+      def initialize(*args, **kwargs)
+        @bigquery_config = kwargs.fetch(:orig_opts)
         @sql_buffer = []
         @sql_buffering = false
         super
       end
 
       def connect(*_args)
-
-
-
-
-
-
-        @bigquery = Google::Cloud::Bigquery.new(config)
+        log_each(:debug, '#connect')
+        get_or_create_bigquery_dataset
+          .tap { log_each(:debug, '#connect end') }
+      end
+
+      def bigquery
         # ObjectSpace.each_object(HTTPClient).each { |c| c.debug_dev = STDOUT }
-        @bigquery.
-          @loggers[0].debug('BigQuery dataset %s does not exist; creating it' % bq_dataset_name)
-          @bigquery.create_dataset(bq_dataset_name, location: location)
-        end
-          .tap { puts '#connect end' }
+        @bigquery ||= Google::Cloud::Bigquery.new(google_cloud_bigquery_gem_config)
       end
 
       def disconnect_connection(_c)
-
+        log_each(:debug, '#disconnect_connection')
         # c.disconnect
       end
 
       def drop_datasets(*dataset_names_to_drop)
         dataset_names_to_drop.each do |dataset_name_to_drop|
-
-          dataset_to_drop =
+          log_each(:debug, "Dropping dataset #{dataset_name_to_drop.inspect}")
+          dataset_to_drop = bigquery.dataset(dataset_name_to_drop)
+          next unless dataset_to_drop
           dataset_to_drop.tables.each(&:delete)
           dataset_to_drop.delete
         end
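`connect` now defers to `get_or_create_bigquery_dataset`, which uses Ruby's `|| begin ... end` find-or-create idiom. A generic sketch of that idiom, with a plain Hash standing in for the BigQuery client (method and key names are illustrative):

```ruby
# Return the existing entry, or lazily create it on first lookup --
# the same shape as `bigquery.dataset(name) || begin ... end`.
def find_or_create_dataset(store, name)
  store[name] || begin
    store[name] = { dataset_id: name }
  end
end

store = {}
first  = find_or_create_dataset(store, 'my_dataset') # creates the entry
second = find_or_create_dataset(store, 'my_dataset') # returns the existing one
```

The `begin ... end` block only runs when the left-hand side is nil, which is why the "does not exist; creating it" log line fires at most once per dataset.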
@@ -56,7 +52,7 @@ module Sequel
       alias drop_dataset drop_datasets
 
       def execute(sql, opts = OPTS) # rubocop:disable Metrics/MethodLength, Metrics/AbcSize, Metrics/CyclomaticComplexity, Metrics/PerceivedComplexity
-
+        log_each(:debug, '#execute')
         log_query(sql)
 
         # require 'pry'; binding.pry if sql =~ /CREATE TABLE IF NOT EXISTS/i
@@ -82,21 +78,31 @@ module Sequel
           warn("Warning: Will now execute entire buffered transaction:\n" + @sql_buffer.join("\n"))
         end
 
+        sql_to_execute = @sql_buffer.any? ? @sql_buffer.join("\n") : sql
+
         synchronize(opts[:server]) do |conn|
           results = log_connection_yield(sql, conn) do
-            sql_to_execute = @sql_buffer.any? ? @sql_buffer.join("\n") : sql
             conn.query(sql_to_execute)
           end
-
-          ap results
+          log_each(:debug, results.awesome_inspect)
           if block_given?
             yield results
           else
             results
           end
-
-
-
+        rescue Google::Cloud::InvalidArgumentError, Google::Cloud::PermissionDeniedError => e
+          if e.message.include?('too many table update operations for this table')
+            warn('Triggered rate limit of table update operations for this table. For more information, see https://cloud.google.com/bigquery/docs/troubleshoot-quotas')
+            if retryable_query?(sql_to_execute)
+              warn('Detected retryable query - re-running query after a 1 second sleep')
+              sleep 1
+              retry
+            else
+              log_each(:error, "Query not detected as retryable; can't automatically recover from being rate-limited")
+            end
+          end
+          raise_error(e)
+        rescue ArgumentError => e
           raise_error(e)
         end # rubocop:disable Style/MultilineBlockChain
         .tap do
@@ -123,6 +129,33 @@ module Sequel
 
       private
 
+      attr_reader :bigquery_config
+
+      def google_cloud_bigquery_gem_config
+        bigquery_config.dup.tap do |config|
+          %i[
+            adapter
+            database
+            dataset
+            location
+            logger
+          ].each do |option|
+            config.delete(option)
+          end
+        end
+      end
+
+      def get_or_create_bigquery_dataset # rubocop:disable Naming/AccessorMethodName
+        bigquery.dataset(bigquery_dataset_name) || begin
+          log_each(:debug, 'BigQuery dataset %s does not exist; creating it' % bigquery_dataset_name)
+          bigquery.create_dataset(bigquery_dataset_name, location: bigquery_config[:location])
+        end
+      end
+
+      def bigquery_dataset_name
+        bigquery_config[:dataset] || bigquery_config[:database] || (raise ArgumentError, 'BigQuery dataset must be specified')
+      end
+
       def connection_execute_method
         :query
       end
@@ -137,9 +170,9 @@ module Sequel
       end
 
       def schema_parse_table(_table_name, _opts)
-
+        log_each(:debug, Paint['schema_parse_table', :red, :bold])
         # require 'pry'; binding.pry
-
+        bigquery.datasets.map do |dataset|
           [
             dataset.dataset_id,
             {},
@@ -154,13 +187,12 @@ module Sequel
 
       # Padded to horizontally align with post-execution log message which includes the execution time
       def log_query(sql)
-        pad = '
-
-        # @loggers[0]&.debug(' ' + sql)
+        pad = ' ' * 12
+        log_each(:debug, Paint[pad + sql, :cyan, :bold])
       end
 
       def warn(msg)
-
+        log_each(:warn, Paint[msg, '#FFA500', :bold])
       end
 
       def warn_default_removal(sql)
@@ -190,11 +222,27 @@ module Sequel
 
         sql
       end
+
+      def supports_combining_alter_table_ops?
+        true
+      end
+
+      def retryable_query?(sql)
+        single_statement_query?(sql) && alter_table_query?(sql)
+      end
+
+      def single_statement_query?(sql)
+        !sql.rstrip.chomp(';').include?(';')
+      end
+
+      def alter_table_query?(sql)
+        sql.match?(/\Aalter table /i)
+      end
     end
 
     class Dataset < Sequel::Dataset
       def fetch_rows(sql, &block)
-
+        db.send(:log_each, :debug, '#fetch_rows')
 
         execute(sql) do |bq_result|
           self.columns = bq_result.fields.map { |field| field.name.to_sym }
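Taken together, the three predicates above mean that only a single-statement `ALTER TABLE` query is retried after a rate-limit error. Exercising the same logic standalone (the method bodies are copied from the diff; only the call sites are mine):

```ruby
# Same predicates as the adapter's private helpers above.
def single_statement_query?(sql)
  # A trailing semicolon is fine; any interior semicolon means multiple statements.
  !sql.rstrip.chomp(';').include?(';')
end

def alter_table_query?(sql)
  sql.match?(/\Aalter table /i)
end

def retryable_query?(sql)
  single_statement_query?(sql) && alter_table_query?(sql)
end

retryable_query?('ALTER TABLE t ADD COLUMN c INT64;')    # => true
retryable_query?('ALTER TABLE t ADD COLUMN c; SELECT 1') # => false (multi-statement)
retryable_query?('UPDATE t SET a = 1 WHERE true')        # => false (not ALTER TABLE)
```

Multi-statement queries are excluded because blindly re-running them would re-execute statements that may already have succeeded.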
@@ -220,7 +268,7 @@ module Sequel
 
       # Like MySQL, BigQuery uses the nonstandard ` (backtick) for quoting identifiers.
       def quoted_identifier_append(sql, c)
-        sql << '`%s`' % c
+        sql << ('`%s`' % c)
       end
 
       def input_identifier(v)
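The parentheses added to `quoted_identifier_append` make the format-then-append order explicit. The expression behaves like this (the identifier `my column` is illustrative):

```ruby
# Format the identifier with backticks, then append it to the SQL string --
# the same expression as quoted_identifier_append's body.
sql = +'SELECT '
sql << ('`%s`' % 'my column')
# sql is now: SELECT `my column`
```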
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: sequel-bigquery
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.6.0
 platform: ruby
 authors:
 - Brendan Weibrecht
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2021-
+date: 2021-11-15 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: amazing_print
@@ -98,7 +98,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 - !ruby/object:Gem::Version
   version: '0'
 requirements: []
-rubygems_version: 3.
+rubygems_version: 3.1.6
 signing_key:
 specification_version: 4
 summary: A Sequel adapter for Google's BigQuery