sequel-bigquery 0.3.0 → 0.5.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/README.md +36 -4
- data/lib/sequel-bigquery.rb +57 -30
- data/lib/sequel_bigquery/version.rb +1 -1
- metadata +3 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c37139c40fd486391d9d16ab147430352b67a1a2b5a55f7eed4189c368b5e87a
+  data.tar.gz: e3753842a9727cf53451ff80ac9abc250343a719d4215ab368c5a93a9d2b4830
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6189f0b53a6edd9d66c915316ce8e9e8c836181609405b54e246c0dbe5451bce995c6f5b73e7b320624c35b62bfd66874757734f839ef7f9023dbc099271b2a4
+  data.tar.gz: 715b5adbdb6e225ca9af37363fbc30b59bde54a0b4602d91afcca93506cac541d252b31767571605ee555f2fdaad8d316ccf67a083354bfdcc9cd5f925c51646
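The digests in checksums.yaml are over the metadata.gz and data.tar.gz members of the packaged .gem archive (a tar file). A minimal sketch of reproducing them locally with Ruby's standard library — the file path is a placeholder, not part of the diff:

```ruby
require 'digest'

# Compute the SHA256 and SHA512 hexdigests of one member of a .gem
# archive (e.g. metadata.gz or data.tar.gz, extracted with `tar -xf`),
# for comparison against the values in checksums.yaml.
def gem_member_digests(path)
  bytes = File.binread(path)
  {
    sha256: Digest::SHA256.hexdigest(bytes),
    sha512: Digest::SHA512.hexdigest(bytes),
  }
end
```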
data/README.md
CHANGED
@@ -4,11 +4,20 @@
 
 A Sequel adapter for [Google's BigQuery](https://cloud.google.com/bigquery).
 
+This gem was created in order to manage schema migrations of a BigQuery dataset at GreenSync. At the time of writing, we couldn't find any good tools in any language to manage changes to the schema as a set of migrations.
+
+Beyond migrations, I'm unsure how useful this gem is. I haven't yet tested what the performance would be for data interactions vs. directly using the `google-cloud-bigquery` gem's native facilities. If you're inserting a bunch of data, it's probably a better idea to use an [inserter from that gem](https://googleapis.dev/ruby/google-cloud-bigquery/latest/Google/Cloud/Bigquery/Dataset.html#insert_async-instance_method) rather than going through SQL.
+
 ## Contents
 
 <!-- MarkdownTOC autolink=true -->
 
 - [Intro](#intro)
+- [Quirks](#quirks)
+    - [Creating tables with column defaults](#creating-tables-with-column-defaults)
+    - [Transactions](#transactions)
+    - [Update statements without `WHERE`](#update-statements-without-where)
+    - [Alter table](#alter-table)
 - [Installation](#installation)
 - [Usage](#usage)
 - [Contributing](#contributing)
@@ -25,12 +34,12 @@ A Sequel adapter for [Google's BigQuery](https://cloud.google.com/bigquery).
 Features:
 
 - Connecting
-- Migrating
-- Table creation
+- Migrating (see quirks)
+- Table creation (see quirks)
 - Inserting rows
-- Updating rows
+- Updating rows (see quirks)
 - Querying
-- Transactions (
+- Transactions (see quirks)
 - Table partitioning
 - Ruby types:
     + String
@@ -40,6 +49,25 @@ Features:
     + Date
     + Float
     + BigDecimal
+- Selecting the BigQuery server location
+
+## Quirks
+
+### Creating tables with column defaults
+
+BigQuery doesn't support defaults on columns. As a workaround, all defaults are automatically removed from statements (crudely).
+
+### Transactions
+
+BigQuery doesn't support transactions where the statements are executed individually. It does support them if the entire transaction's SQL is sent all at once, though. As a workaround, buffering of statements within a transaction has been implemented. However, the impact of this is that no results can be returned within a transaction.
+
+### Update statements without `WHERE`
+
+BigQuery requires all `UPDATE` statements to have a `WHERE` clause. As a workaround, statements which lack one have `where 1 = 1` appended automatically (crudely).
+
+### Alter table
+
+We've found that the `google-cloud-bigquery` gem seems to have a bug where an internal lack of result value results in a `NoMethodError` on nil for `fields` within `from_gapi_json`. [See issue #6](https://github.com/ZimbiX/sequel-bigquery/issues/6#issuecomment-968523731). As a workaround, all generated statements within `alter_table` are joined together with `;` and executed only at the end of the block. A `select 1` is also appended to try to ensure we have a result, avoiding the aforementioned exception. A bonus of batching the queries is that the latency should be somewhat reduced.
 
 ## Installation
 
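The "crude" rewrites described in the quirks above amount to string surgery on the generated SQL. A rough sketch of the idea — illustrative only, not the gem's actual implementation:

```ruby
# Naive regex-based rewrites in the spirit of the quirks above.
# Real SQL needs more care (quoting, casing, subqueries).

# Strip column defaults, since BigQuery doesn't support them.
def strip_column_defaults(sql)
  sql.gsub(/ DEFAULT [^,)]+/i, '')
end

# BigQuery rejects an UPDATE without a WHERE clause, so append a
# vacuous one when it's missing.
def ensure_update_has_where(sql)
  return sql unless sql =~ /\Aupdate\b/i && sql !~ /\bwhere\b/i

  "#{sql} where 1 = 1"
end
```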
@@ -67,17 +95,21 @@
 
 ```
 require 'sequel-bigquery'
+require 'logger'
 
 db = Sequel.connect(
   adapter: :bigquery,
   project: 'your-gcp-project',
   database: 'your_bigquery_dataset_name',
+  location: 'australia-southeast2',
   logger: Logger.new(STDOUT),
 )
 ```
 
 And use Sequel like normal.
 
+Note that it is important to supply a logger that will at least output warning messages, so you know when your queries are being modified or buffered, which may be unexpected behaviour.
+
 ## Contributing
 
 Pull requests welcome! =)
data/lib/sequel-bigquery.rb
CHANGED
@@ -4,6 +4,7 @@ require 'delegate'
 require 'time'
 
 require 'google/cloud/bigquery'
+require 'amazing_print'
 require 'paint'
 require 'sequel'
 
@@ -16,38 +17,34 @@ module Sequel
    class Database < Sequel::Database # rubocop:disable Metrics/ClassLength
      set_adapter_scheme :bigquery
 
-     def initialize(*args, **kawrgs)
-
-       @orig_opts = kawrgs.fetch(:orig_opts)
+     def initialize(*args, **kwargs)
+       @bigquery_config = kwargs.fetch(:orig_opts)
       @sql_buffer = []
       @sql_buffering = false
       super
     end
 
     def connect(*_args)
-
-
-
-
-
-
+       log_each(:debug, '#connect')
+       get_or_create_bigquery_dataset
+         .tap { log_each(:debug, '#connect end') }
+     end
+
+     def bigquery
       # ObjectSpace.each_object(HTTPClient).each { |c| c.debug_dev = STDOUT }
-       @bigquery.
-         @loggers[0].debug('BigQuery dataset %s does not exist; creating it' % bq_dataset_name)
-         @bigquery.create_dataset(bq_dataset_name)
-       end
-         .tap { puts '#connect end' }
+       @bigquery ||= Google::Cloud::Bigquery.new(google_cloud_bigquery_gem_config)
     end
 
     def disconnect_connection(_c)
-
+       log_each(:debug, '#disconnect_connection')
       # c.disconnect
     end
 
     def drop_datasets(*dataset_names_to_drop)
       dataset_names_to_drop.each do |dataset_name_to_drop|
-
-         dataset_to_drop =
+         log_each(:debug, "Dropping dataset #{dataset_name_to_drop.inspect}")
+         dataset_to_drop = bigquery.dataset(dataset_name_to_drop)
+         next unless dataset_to_drop
         dataset_to_drop.tables.each(&:delete)
         dataset_to_drop.delete
       end
@@ -55,7 +52,7 @@ module Sequel
     alias drop_dataset drop_datasets
 
     def execute(sql, opts = OPTS) # rubocop:disable Metrics/MethodLength, Metrics/AbcSize, Metrics/CyclomaticComplexity, Metrics/PerceivedComplexity
-
+       log_each(:debug, '#execute')
       log_query(sql)
 
       # require 'pry'; binding.pry if sql =~ /CREATE TABLE IF NOT EXISTS/i
@@ -86,15 +83,12 @@ module Sequel
         sql_to_execute = @sql_buffer.any? ? @sql_buffer.join("\n") : sql
         conn.query(sql_to_execute)
       end
-
-       ap results
+       log_each(:debug, results.awesome_inspect)
       if block_given?
         yield results
       else
         results
       end
-       # TODO
-       # rescue ::ODBC::Error, ArgumentError => e
     rescue Google::Cloud::InvalidArgumentError, ArgumentError => e
       raise_error(e)
     end # rubocop:disable Style/MultilineBlockChain
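The #execute changes above keep the statement-buffering workaround for transactions: while a transaction is open, statements accumulate in @sql_buffer, and the whole batch is later sent as one query via `@sql_buffer.join("\n")`. The mechanism can be sketched in isolation — names here are illustrative, not the gem's API:

```ruby
# Illustrative sketch of transaction statement buffering: inside a
# transaction, SQL is collected rather than executed; at the end of the
# block the whole batch is joined and sent as a single query. While
# buffering, no results are available, matching the quirk in the README.
class StatementBuffer
  def initialize(&executor)
    @executor = executor # receives the final SQL string
    @buffer = []
    @buffering = false
  end

  def execute(sql)
    return @executor.call(sql) unless @buffering

    @buffer << sql
    nil # no result while buffering
  end

  def transaction
    @buffering = true
    yield self
    @executor.call(@buffer.join("\n"))
  ensure
    @buffer.clear
    @buffering = false
  end
end
```

In the gem itself, the BEGIN/COMMIT statements are emitted by Sequel and buffered along with everything else, so the server receives the entire transaction in one request.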
@@ -122,6 +116,33 @@ module Sequel
 
     private
 
+     attr_reader :bigquery_config
+
+     def google_cloud_bigquery_gem_config
+       bigquery_config.dup.tap do |config|
+         %i[
+           adapter
+           database
+           dataset
+           location
+           logger
+         ].each do |option|
+           config.delete(option)
+         end
+       end
+     end
+
+     def get_or_create_bigquery_dataset # rubocop:disable Naming/AccessorMethodName
+       bigquery.dataset(bigquery_dataset_name) || begin
+         log_each(:debug, 'BigQuery dataset %s does not exist; creating it' % bigquery_dataset_name)
+         bigquery.create_dataset(bigquery_dataset_name, location: bigquery_config[:location])
+       end
+     end
+
+     def bigquery_dataset_name
+       bigquery_config[:dataset] || bigquery_config[:database] || (raise ArgumentError, 'BigQuery dataset must be specified')
+     end
+
     def connection_execute_method
       :query
     end
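The new google_cloud_bigquery_gem_config above just removes the adapter-only options (adapter, database, dataset, location, logger) before the remainder is passed to Google::Cloud::Bigquery.new. The filtering is plain Hash work; a standalone sketch (the method and constant names here are ours, not the gem's):

```ruby
# Options consumed by the Sequel adapter itself, which
# Google::Cloud::Bigquery.new should not receive; mirrors the
# %i[...] list in google_cloud_bigquery_gem_config above.
ADAPTER_ONLY_OPTIONS = %i[adapter database dataset location logger].freeze

# Return a copy of the connection options with the adapter-only keys
# removed, leaving e.g. :project and credential options intact.
def gem_config_from(connection_options)
  connection_options.reject { |key, _| ADAPTER_ONLY_OPTIONS.include?(key) }
end
```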
@@ -136,9 +157,9 @@ module Sequel
     end
 
     def schema_parse_table(_table_name, _opts)
-
+       log_each(:debug, Paint['schema_parse_table', :red, :bold])
       # require 'pry'; binding.pry
-
+       bigquery.datasets.map do |dataset|
         [
           dataset.dataset_id,
           {},
|
|
153
174
|
|
154
175
|
# Padded to horizontally align with post-execution log message which includes the execution time
|
155
176
|
def log_query(sql)
|
156
|
-
pad = '
|
157
|
-
|
158
|
-
# @loggers[0]&.debug(' ' + sql)
|
177
|
+
pad = ' ' * 12
|
178
|
+
log_each(:debug, Paint[pad + sql, :cyan, :bold])
|
159
179
|
end
|
160
180
|
|
161
181
|
def warn(msg)
|
162
|
-
|
182
|
+
log_each(:warn, Paint[msg, '#FFA500', :bold])
|
163
183
|
end
|
164
184
|
|
165
185
|
def warn_default_removal(sql)
|
@@ -189,11 +209,18 @@ module Sequel
 
       sql
     end
+
+     # Batch the alter table queries and make sure something is returned to avoid an error related to the return value
+     def apply_alter_table(name, ops)
+       sqls = alter_table_sql_list(name, ops)
+       sqls_joined = (sqls + ['select 1']).join(";\n")
+       execute_ddl(sqls_joined)
+     end
   end
 
   class Dataset < Sequel::Dataset
     def fetch_rows(sql, &block)
-
+       db.send(:log_each, :debug, '#fetch_rows')
 
       execute(sql) do |bq_result|
         self.columns = bq_result.fields.map { |field| field.name.to_sym }
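The batching in apply_alter_table above is a small string transform: join the generated statements with `;` and append a `select 1` so the batch always yields a result set, sidestepping the from_gapi_json NoMethodError. In isolation (function name is ours):

```ruby
# Join ALTER TABLE statements into one multi-statement batch, appending
# 'select 1' so the final statement always returns a result set.
def batch_alter_table_sql(statements)
  (statements + ['select 1']).join(";\n")
end
```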
@@ -219,7 +246,7 @@ module Sequel
 
     # Like MySQL, BigQuery uses the nonstandard ` (backtick) for quoting identifiers.
     def quoted_identifier_append(sql, c)
-       sql << '`%s`' % c
+       sql << ('`%s`' % c)
     end
 
     def input_identifier(v)
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: sequel-bigquery
 version: !ruby/object:Gem::Version
-  version: 0.3.0
+  version: 0.5.0
 platform: ruby
 authors:
 - Brendan Weibrecht
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2021-
+date: 2021-11-15 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: amazing_print
@@ -98,7 +98,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 - !ruby/object:Gem::Version
   version: '0'
 requirements: []
-rubygems_version: 3.
+rubygems_version: 3.1.6
 signing_key:
 specification_version: 4
 summary: A Sequel adapter for Google's BigQuery