sequel-bigquery 0.3.0 → 0.5.0
- checksums.yaml +4 -4
- data/README.md +36 -4
- data/lib/sequel-bigquery.rb +57 -30
- data/lib/sequel_bigquery/version.rb +1 -1
- metadata +3 -3
checksums.yaml
CHANGED
```diff
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c37139c40fd486391d9d16ab147430352b67a1a2b5a55f7eed4189c368b5e87a
+  data.tar.gz: e3753842a9727cf53451ff80ac9abc250343a719d4215ab368c5a93a9d2b4830
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6189f0b53a6edd9d66c915316ce8e9e8c836181609405b54e246c0dbe5451bce995c6f5b73e7b320624c35b62bfd66874757734f839ef7f9023dbc099271b2a4
+  data.tar.gz: 715b5adbdb6e225ca9af37363fbc30b59bde54a0b4602d91afcca93506cac541d252b31767571605ee555f2fdaad8d316ccf67a083354bfdcc9cd5f925c51646
```
data/README.md
CHANGED
```diff
@@ -4,11 +4,20 @@
 
 A Sequel adapter for [Google's BigQuery](https://cloud.google.com/bigquery).
 
+This gem was created in order to manage schema migrations of a BigQuery dataset at GreenSync. At the time of writing, we couldn't find any good tools in any language to manage changes to the schema as a set of migrations.
+
+Beyond migrations, I'm unsure how useful this gem is. I haven't yet tested what the performance would be for data interactions vs. directly using the `google-cloud-bigquery` gem's native facilities. If you're inserting a bunch of data, it's probably a better idea to use an [inserter from that gem](https://googleapis.dev/ruby/google-cloud-bigquery/latest/Google/Cloud/Bigquery/Dataset.html#insert_async-instance_method) rather than going through SQL.
+
 ## Contents
 
 <!-- MarkdownTOC autolink=true -->
 
 - [Intro](#intro)
+- [Quirks](#quirks)
+  - [Creating tables with column defaults](#creating-tables-with-column-defaults)
+  - [Transactions](#transactions)
+  - [Update statements without `WHERE`](#update-statements-without-where)
+  - [Alter table](#alter-table)
 - [Installation](#installation)
 - [Usage](#usage)
 - [Contributing](#contributing)
```
```diff
@@ -25,12 +34,12 @@ A Sequel adapter for [Google's BigQuery](https://cloud.google.com/bigquery).
 
 Features:
 
 - Connecting
-- Migrating
-- Table creation
+- Migrating (see quirks)
+- Table creation (see quirks)
 - Inserting rows
-- Updating rows
+- Updating rows (see quirks)
 - Querying
-- Transactions (
+- Transactions (see quirks)
 - Table partitioning
 - Ruby types:
   + String
```
```diff
@@ -40,6 +49,25 @@ Features:
   + Date
   + Float
   + BigDecimal
+- Selecting the BigQuery server location
+
+## Quirks
+
+### Creating tables with column defaults
+
+BigQuery doesn't support defaults on columns. As a workaround, all defaults are automatically removed from statements (crudely).
+
+### Transactions
+
+BigQuery doesn't support transactions where the statements are executed individually. It does support them if the entire transaction's SQL is sent all at once, though. As a workaround, buffering of statements within a transaction has been implemented. However, the impact of this is that no results can be returned within a transaction.
+
+### Update statements without `WHERE`
+
+BigQuery requires all `UPDATE` statements to have a `WHERE` clause. As a workaround, statements which lack one have `where 1 = 1` appended automatically (crudely).
+
+### Alter table
+
+We've found that the `google-cloud-bigquery` gem seems to have a bug where an internal lack of result value results in a `NoMethodError` on nil for `fields` within `from_gapi_json`. [See issue #6](https://github.com/ZimbiX/sequel-bigquery/issues/6#issuecomment-968523731). As a workaround, all generated statements within `alter_table` are joined together with `;` and executed only at the end of the block. A `select 1` is also appended to try to ensure we have a result, avoiding the aforementioned exception. A bonus of batching the queries is that latency should be somewhat reduced.
 
 ## Installation
 
```
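The `WHERE` workaround described in the README above can be sketched in a few lines of plain Ruby. This is only an illustration of the idea (the `ensure_where_clause` helper name is hypothetical, not the gem's actual implementation):

```ruby
# Illustrative sketch of the quirk workaround: BigQuery rejects UPDATE
# statements without a WHERE clause, so append a vacuous `where 1 = 1`
# when one is missing. (Hypothetical helper, not the gem's real code.)
def ensure_where_clause(sql)
  return sql unless sql =~ /\A\s*update\s/i  # only UPDATE statements need this
  return sql if sql =~ /\bwhere\b/i          # already has a WHERE clause
  "#{sql} where 1 = 1"
end
```

For example, `ensure_where_clause('update t set a = 1')` gains the vacuous clause, while statements that already contain `WHERE` (and non-`UPDATE` statements) pass through unchanged.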
````diff
@@ -67,17 +95,21 @@ Connect to BigQuery:
 
 ```
 require 'sequel-bigquery'
+require 'logger'
 
 db = Sequel.connect(
   adapter: :bigquery,
   project: 'your-gcp-project',
   database: 'your_bigquery_dataset_name',
+  location: 'australia-southeast2',
   logger: Logger.new(STDOUT),
 )
 ```
 
 And use Sequel like normal.
 
+Note that it is important to supply a logger that will at least output warning messages, so you know when your queries are being modified or buffered, which may be unexpected behaviour.
+
 ## Contributing
 
 Pull requests welcome! =)
````
data/lib/sequel-bigquery.rb
CHANGED
```diff
@@ -4,6 +4,7 @@ require 'delegate'
 require 'time'
 
 require 'google/cloud/bigquery'
+require 'amazing_print'
 require 'paint'
 require 'sequel'
 
```
```diff
@@ -16,38 +17,34 @@ module Sequel
     class Database < Sequel::Database # rubocop:disable Metrics/ClassLength
       set_adapter_scheme :bigquery
 
-      def initialize(*args, **
-
-        @orig_opts = kawrgs.fetch(:orig_opts)
+      def initialize(*args, **kwargs)
+        @bigquery_config = kwargs.fetch(:orig_opts)
         @sql_buffer = []
         @sql_buffering = false
         super
       end
 
       def connect(*_args)
-
-
-
-
-
-
+        log_each(:debug, '#connect')
+        get_or_create_bigquery_dataset
+          .tap { log_each(:debug, '#connect end') }
+      end
+
+      def bigquery
         # ObjectSpace.each_object(HTTPClient).each { |c| c.debug_dev = STDOUT }
-        @bigquery.
-          @loggers[0].debug('BigQuery dataset %s does not exist; creating it' % bq_dataset_name)
-          @bigquery.create_dataset(bq_dataset_name)
-        end
-          .tap { puts '#connect end' }
+        @bigquery ||= Google::Cloud::Bigquery.new(google_cloud_bigquery_gem_config)
       end
 
       def disconnect_connection(_c)
-
+        log_each(:debug, '#disconnect_connection')
         # c.disconnect
       end
 
       def drop_datasets(*dataset_names_to_drop)
         dataset_names_to_drop.each do |dataset_name_to_drop|
-
-          dataset_to_drop =
+          log_each(:debug, "Dropping dataset #{dataset_name_to_drop.inspect}")
+          dataset_to_drop = bigquery.dataset(dataset_name_to_drop)
+          next unless dataset_to_drop
           dataset_to_drop.tables.each(&:delete)
           dataset_to_drop.delete
         end
```
```diff
@@ -55,7 +52,7 @@ module Sequel
       alias drop_dataset drop_datasets
 
       def execute(sql, opts = OPTS) # rubocop:disable Metrics/MethodLength, Metrics/AbcSize, Metrics/CyclomaticComplexity, Metrics/PerceivedComplexity
-
+        log_each(:debug, '#execute')
         log_query(sql)
 
         # require 'pry'; binding.pry if sql =~ /CREATE TABLE IF NOT EXISTS/i
```
```diff
@@ -86,15 +83,12 @@ module Sequel
           sql_to_execute = @sql_buffer.any? ? @sql_buffer.join("\n") : sql
           conn.query(sql_to_execute)
         end
-
-        ap results
+        log_each(:debug, results.awesome_inspect)
         if block_given?
           yield results
         else
           results
         end
-        # TODO
-        # rescue ::ODBC::Error, ArgumentError => e
       rescue Google::Cloud::InvalidArgumentError, ArgumentError => e
         raise_error(e)
       end # rubocop:disable Style/MultilineBlockChain
```
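The hunk above sends either the single statement or the whole `@sql_buffer` joined with newlines as one query. The transaction-buffering idea behind it can be sketched standalone (a hypothetical `SqlBuffer` class; the real adapter does this inside `Database#execute`):

```ruby
# Sketch of statement buffering: inside BEGIN..COMMIT, statements are
# collected instead of executed, then sent to BigQuery as one
# multi-statement query at COMMIT. Mid-transaction calls return nothing,
# which is why no results are available within a transaction.
class SqlBuffer
  def initialize
    @buffer = []
    @buffering = false
  end

  # Returns the SQL that would actually be sent, or nil if buffered.
  def execute(sql)
    @buffering = true if sql =~ /\A\s*begin\b/i
    return sql unless @buffering        # outside a transaction: pass through

    @buffer << sql
    return nil unless sql =~ /\A\s*commit\b/i

    @buffering = false
    batch = @buffer.join("\n")          # flush the whole transaction at once
    @buffer.clear
    batch
  end
end
```

A transaction of three statements thus results in a single query containing all three, separated by newlines.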
```diff
@@ -122,6 +116,33 @@ module Sequel
 
       private
 
+      attr_reader :bigquery_config
+
+      def google_cloud_bigquery_gem_config
+        bigquery_config.dup.tap do |config|
+          %i[
+            adapter
+            database
+            dataset
+            location
+            logger
+          ].each do |option|
+            config.delete(option)
+          end
+        end
+      end
+
+      def get_or_create_bigquery_dataset # rubocop:disable Naming/AccessorMethodName
+        bigquery.dataset(bigquery_dataset_name) || begin
+          log_each(:debug, 'BigQuery dataset %s does not exist; creating it' % bigquery_dataset_name)
+          bigquery.create_dataset(bigquery_dataset_name, location: bigquery_config[:location])
+        end
+      end
+
+      def bigquery_dataset_name
+        bigquery_config[:dataset] || bigquery_config[:database] || (raise ArgumentError, 'BigQuery dataset must be specified')
+      end
+
       def connection_execute_method
         :query
       end
```
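The new `google_cloud_bigquery_gem_config` strips Sequel-level options from the connection hash so that only options meaningful to `Google::Cloud::Bigquery.new` (such as `project` or `credentials`) are passed through. The filtering itself is plain hash manipulation and can be shown in isolation (hypothetical `gem_config` helper mirroring the diff's logic):

```ruby
# Options consumed by the Sequel adapter itself; everything else in the
# connect hash is forwarded to the google-cloud-bigquery gem.
SEQUEL_ONLY_OPTIONS = %i[adapter database dataset location logger].freeze

def gem_config(connect_opts)
  # dup so the caller's hash is left untouched, then drop adapter-only keys
  connect_opts.dup.tap do |config|
    SEQUEL_ONLY_OPTIONS.each { |option| config.delete(option) }
  end
end
```

Duplicating before deleting matters: the adapter still needs `dataset`, `database`, and `location` from the original hash (as `get_or_create_bigquery_dataset` shows).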
```diff
@@ -136,9 +157,9 @@ module Sequel
       end
 
       def schema_parse_table(_table_name, _opts)
-
+        log_each(:debug, Paint['schema_parse_table', :red, :bold])
         # require 'pry'; binding.pry
-
+        bigquery.datasets.map do |dataset|
           [
             dataset.dataset_id,
             {},
```
```diff
@@ -153,13 +174,12 @@ module Sequel
 
       # Padded to horizontally align with post-execution log message which includes the execution time
       def log_query(sql)
-        pad = '
-
-        # @loggers[0]&.debug('  ' + sql)
+        pad = ' ' * 12
+        log_each(:debug, Paint[pad + sql, :cyan, :bold])
       end
 
       def warn(msg)
-
+        log_each(:warn, Paint[msg, '#FFA500', :bold])
       end
 
       def warn_default_removal(sql)
```
```diff
@@ -189,11 +209,18 @@ module Sequel
 
         sql
       end
+
+      # Batch the alter table queries and make sure something is returned to avoid an error related to the return value
+      def apply_alter_table(name, ops)
+        sqls = alter_table_sql_list(name, ops)
+        sqls_joined = (sqls + ['select 1']).join(";\n")
+        execute_ddl(sqls_joined)
+      end
     end
 
     class Dataset < Sequel::Dataset
       def fetch_rows(sql, &block)
-
+        db.send(:log_each, :debug, '#fetch_rows')
 
         execute(sql) do |bq_result|
           self.columns = bq_result.fields.map { |field| field.name.to_sym }
```
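The batching step in `apply_alter_table` is a one-liner worth seeing on its own: statements are joined with `;` and a trailing `select 1` guarantees the batch returns rows, sidestepping the `NoMethodError` in `from_gapi_json` described in issue #6. A standalone sketch (hypothetical `batch_alter_table_sql` helper extracting just the string-building part):

```ruby
# Join generated ALTER TABLE statements into one batch and append
# `select 1` so the batched query always produces a result set.
def batch_alter_table_sql(sqls)
  (sqls + ['select 1']).join(";\n")
end
```

Note that an empty statement list still yields a valid `select 1` query, so the workaround is safe even when an `alter_table` block generates nothing.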
```diff
@@ -219,7 +246,7 @@ module Sequel
 
       # Like MySQL, BigQuery uses the nonstandard ` (backtick) for quoting identifiers.
       def quoted_identifier_append(sql, c)
-        sql << '`%s`' % c
+        sql << ('`%s`' % c)
       end
 
       def input_identifier(v)
```
metadata
CHANGED
```diff
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: sequel-bigquery
 version: !ruby/object:Gem::Version
-  version: 0.3.0
+  version: 0.5.0
 platform: ruby
 authors:
 - Brendan Weibrecht
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2021-
+date: 2021-11-15 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: amazing_print
```
```diff
@@ -98,7 +98,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
   - !ruby/object:Gem::Version
     version: '0'
 requirements: []
-rubygems_version: 3.
+rubygems_version: 3.1.6
 signing_key:
 specification_version: 4
 summary: A Sequel adapter for Google's BigQuery
```