sequel-bigquery 0.3.0 → 0.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: a1df3065759c1f700359a326b1e039e2e31c7d67785c5fe5c36d11d9933bc03a
-  data.tar.gz: 73c44cf0e5a3afea392ca675f7645a788698de6b357f9ec1023f72b5da803891
+  metadata.gz: c37139c40fd486391d9d16ab147430352b67a1a2b5a55f7eed4189c368b5e87a
+  data.tar.gz: e3753842a9727cf53451ff80ac9abc250343a719d4215ab368c5a93a9d2b4830
 SHA512:
-  metadata.gz: 5fe03d5ba85f4b7daf490f5f1f6038f8d2101c78afcc20911a2b909c2bb73c683d8a094c3cf9229871dc6f2862b746cfd14797fe8d5091904955450600a25383
-  data.tar.gz: bb26c418a72d251ab8d68d458ff51a108f053d83cbf326e962aa8cce446e0d5726b4da8d3d1ba7c50978d6e7de4ef362003eeea74fa414a4c5fbb6e2cf39a370
+  metadata.gz: 6189f0b53a6edd9d66c915316ce8e9e8c836181609405b54e246c0dbe5451bce995c6f5b73e7b320624c35b62bfd66874757734f839ef7f9023dbc099271b2a4
+  data.tar.gz: 715b5adbdb6e225ca9af37363fbc30b59bde54a0b4602d91afcca93506cac541d252b31767571605ee555f2fdaad8d316ccf67a083354bfdcc9cd5f925c51646
data/README.md CHANGED
@@ -4,11 +4,20 @@
 
 A Sequel adapter for [Google's BigQuery](https://cloud.google.com/bigquery).
 
+This gem was created in order to manage schema migrations of a BigQuery dataset at GreenSync. At the time of writing, we couldn't find any good tools in any language to manage changes to the schema as a set of migrations.
+
+Beyond migrations, I'm unsure how useful this gem is. I haven't yet tested what the performance would be for data interactions vs. directly using the `google-cloud-bigquery` gem's native facilities. If you're inserting a bunch of data, it's probably a better idea to use an [inserter from that gem](https://googleapis.dev/ruby/google-cloud-bigquery/latest/Google/Cloud/Bigquery/Dataset.html#insert_async-instance_method) rather than going through SQL.
+
 ## Contents
 
 <!-- MarkdownTOC autolink=true -->
 
 - [Intro](#intro)
+- [Quirks](#quirks)
+  - [Creating tables with column defaults](#creating-tables-with-column-defaults)
+  - [Transactions](#transactions)
+  - [Update statements without `WHERE`](#update-statements-without-where)
+  - [Alter table](#alter-table)
 - [Installation](#installation)
 - [Usage](#usage)
 - [Contributing](#contributing)
@@ -25,12 +34,12 @@ A Sequel adapter for [Google's BigQuery](https://cloud.google.com/bigquery).
 Features:
 
 - Connecting
-- Migrating
-- Table creation, with automatic removal of defaults from statements (since BigQuery doesn't support it)
+- Migrating (see quirks)
+- Table creation (see quirks)
 - Inserting rows
-- Updating rows, with automatic addition of `where 1 = 1` to statements (since BigQuery requires a `where` clause)
+- Updating rows (see quirks)
 - Querying
-- Transactions (buffered since BigQuery only supports them when you execute the whole transaction at once)
+- Transactions (see quirks)
 - Table partitioning
 - Ruby types:
   + String
@@ -40,6 +49,25 @@ Features:
   + Date
   + Float
   + BigDecimal
+- Selecting the BigQuery server location
+
+## Quirks
+
+### Creating tables with column defaults
+
+BigQuery doesn't support defaults on columns. As a workaround, all defaults are automatically removed from statements (crudely).
+
+### Transactions
+
+BigQuery doesn't support transactions where the statements are executed individually. It does support them if the entire transaction's SQL is sent all at once, though. As a workaround, buffering of statements within a transaction has been implemented. However, this means that no results can be returned within a transaction.
+
+### Update statements without `WHERE`
+
+BigQuery requires all `UPDATE` statements to have a `WHERE` clause. As a workaround, statements which lack one have `where 1 = 1` appended automatically (crudely).
+
+### Alter table
+
+We've found that the `google-cloud-bigquery` gem seems to have a bug where the absence of a result value causes a `NoMethodError` on nil for `fields` within `from_gapi_json`. [See issue #6](https://github.com/ZimbiX/sequel-bigquery/issues/6#issuecomment-968523731). As a workaround, all generated statements within `alter_table` are joined together with `;` and executed only at the end of the block. A `select 1` is also appended to try to ensure we have a result, to avoid the aforementioned exception. A bonus of batching the queries is that the latency should be somewhat reduced.
 
 ## Installation
 
@@ -67,17 +95,21 @@ Connect to BigQuery:
 
 ```
 require 'sequel-bigquery'
+require 'logger'
 
 db = Sequel.connect(
   adapter: :bigquery,
   project: 'your-gcp-project',
   database: 'your_bigquery_dataset_name',
+  location: 'australia-southeast2',
   logger: Logger.new(STDOUT),
 )
 ```
 
 And use Sequel like normal.
 
+Note that it is important to supply a logger that will at least output warning messages, so you know when your queries are being modified or buffered, which may be unexpected behaviour.
+
 ## Contributing
 
 Pull requests welcome! =)
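
The "Quirks" section of the README above describes two crude SQL rewrites: stripping column defaults from `CREATE TABLE` statements and appending `where 1 = 1` to `UPDATE` statements that lack a `WHERE` clause. A minimal standalone sketch of those rewrites, with hypothetical helper names (this is illustrative only, not the gem's actual implementation):

```ruby
# BigQuery rejects column defaults, so crudely strip any `default <value>`
# clause from a CREATE TABLE statement before sending it.
def strip_defaults(sql)
  sql.gsub(/ default [^,)\s]+/i, '')
end

# BigQuery requires every UPDATE statement to have a WHERE clause, so
# append a tautology when one is missing.
def ensure_where(sql)
  sql.match?(/\bwhere\b/i) ? sql : "#{sql} where 1 = 1"
end

strip_defaults('create table t (a int64 default 5)') # => "create table t (a int64)"
ensure_where('update t set a = 1')                   # => "update t set a = 1 where 1 = 1"
```

Being regex-based, rewrites like these can misfire on unusual SQL (e.g. a literal string containing the word `where`), which is presumably why the README stresses watching the warning log for modified queries.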
@@ -4,6 +4,7 @@ require 'delegate'
 require 'time'
 
 require 'google/cloud/bigquery'
+require 'amazing_print'
 require 'paint'
 require 'sequel'
 
@@ -16,38 +17,34 @@ module Sequel
     class Database < Sequel::Database # rubocop:disable Metrics/ClassLength
       set_adapter_scheme :bigquery
 
-      def initialize(*args, **kawrgs)
-        puts '.new'
-        @orig_opts = kawrgs.fetch(:orig_opts)
+      def initialize(*args, **kwargs)
+        @bigquery_config = kwargs.fetch(:orig_opts)
         @sql_buffer = []
         @sql_buffering = false
         super
       end
 
       def connect(*_args)
-        puts '#connect'
-        config = @orig_opts.dup
-        config.delete(:adapter)
-        config.delete(:logger)
-        bq_dataset_name = config.delete(:dataset) || config.delete(:database)
-        @bigquery = Google::Cloud::Bigquery.new(config)
+        log_each(:debug, '#connect')
+        get_or_create_bigquery_dataset
+          .tap { log_each(:debug, '#connect end') }
+      end
+
+      def bigquery
         # ObjectSpace.each_object(HTTPClient).each { |c| c.debug_dev = STDOUT }
-        @bigquery.dataset(bq_dataset_name) || begin
-          @loggers[0].debug('BigQuery dataset %s does not exist; creating it' % bq_dataset_name)
-          @bigquery.create_dataset(bq_dataset_name)
-        end
-          .tap { puts '#connect end' }
+        @bigquery ||= Google::Cloud::Bigquery.new(google_cloud_bigquery_gem_config)
       end
 
       def disconnect_connection(_c)
-        puts '#disconnect_connection'
+        log_each(:debug, '#disconnect_connection')
         # c.disconnect
       end
 
       def drop_datasets(*dataset_names_to_drop)
         dataset_names_to_drop.each do |dataset_name_to_drop|
-          puts "Dropping dataset #{dataset_name_to_drop.inspect}"
-          dataset_to_drop = @bigquery.dataset(dataset_name_to_drop)
+          log_each(:debug, "Dropping dataset #{dataset_name_to_drop.inspect}")
+          dataset_to_drop = bigquery.dataset(dataset_name_to_drop)
+          next unless dataset_to_drop
           dataset_to_drop.tables.each(&:delete)
           dataset_to_drop.delete
         end
@@ -55,7 +52,7 @@ module Sequel
       alias drop_dataset drop_datasets
 
       def execute(sql, opts = OPTS) # rubocop:disable Metrics/MethodLength, Metrics/AbcSize, Metrics/CyclomaticComplexity, Metrics/PerceivedComplexity
-        puts '#execute'
+        log_each(:debug, '#execute')
        log_query(sql)
 
         # require 'pry'; binding.pry if sql =~ /CREATE TABLE IF NOT EXISTS/i
@@ -86,15 +83,12 @@ module Sequel
           sql_to_execute = @sql_buffer.any? ? @sql_buffer.join("\n") : sql
           conn.query(sql_to_execute)
         end
-        require 'amazing_print'
-        ap results
+        log_each(:debug, results.awesome_inspect)
         if block_given?
           yield results
         else
           results
         end
-      # TODO
-      # rescue ::ODBC::Error, ArgumentError => e
       rescue Google::Cloud::InvalidArgumentError, ArgumentError => e
         raise_error(e)
       end # rubocop:disable Style/MultilineBlockChain
@@ -122,6 +116,33 @@ module Sequel
 
       private
 
+      attr_reader :bigquery_config
+
+      def google_cloud_bigquery_gem_config
+        bigquery_config.dup.tap do |config|
+          %i[
+            adapter
+            database
+            dataset
+            location
+            logger
+          ].each do |option|
+            config.delete(option)
+          end
+        end
+      end
+
+      def get_or_create_bigquery_dataset # rubocop:disable Naming/AccessorMethodName
+        bigquery.dataset(bigquery_dataset_name) || begin
+          log_each(:debug, 'BigQuery dataset %s does not exist; creating it' % bigquery_dataset_name)
+          bigquery.create_dataset(bigquery_dataset_name, location: bigquery_config[:location])
+        end
+      end
+
+      def bigquery_dataset_name
+        bigquery_config[:dataset] || bigquery_config[:database] || (raise ArgumentError, 'BigQuery dataset must be specified')
+      end
+
       def connection_execute_method
         :query
       end
@@ -136,9 +157,9 @@ module Sequel
       end
 
       def schema_parse_table(_table_name, _opts)
-        logger.debug(Paint['schema_parse_table', :red, :bold])
+        log_each(:debug, Paint['schema_parse_table', :red, :bold])
         # require 'pry'; binding.pry
-        @bigquery.datasets.map do |dataset|
+        bigquery.datasets.map do |dataset|
           [
             dataset.dataset_id,
             {},
@@ -153,13 +174,12 @@ module Sequel
 
       # Padded to horizontally align with post-execution log message which includes the execution time
       def log_query(sql)
-        pad = ' '
-        puts Paint[pad + sql, :cyan, :bold]
-        # @loggers[0]&.debug(' ' + sql)
+        pad = ' ' * 12
+        log_each(:debug, Paint[pad + sql, :cyan, :bold])
       end
 
       def warn(msg)
-        @loggers[0].warn(Paint[msg, '#FFA500', :bold])
+        log_each(:warn, Paint[msg, '#FFA500', :bold])
       end
 
       def warn_default_removal(sql)
@@ -189,11 +209,18 @@ module Sequel
 
         sql
       end
+
+      # Batch the alter table queries and make sure something is returned to avoid an error related to the return value
+      def apply_alter_table(name, ops)
+        sqls = alter_table_sql_list(name, ops)
+        sqls_joined = (sqls + ['select 1']).join(";\n")
+        execute_ddl(sqls_joined)
+      end
     end
 
     class Dataset < Sequel::Dataset
       def fetch_rows(sql, &block)
-        puts '#fetch_rows'
+        db.send(:log_each, :debug, '#fetch_rows')
 
         execute(sql) do |bq_result|
           self.columns = bq_result.fields.map { |field| field.name.to_sym }
@@ -219,7 +246,7 @@
 
       # Like MySQL, BigQuery uses the nonstandard ` (backtick) for quoting identifiers.
       def quoted_identifier_append(sql, c)
-        sql << '`%s`' % c
+        sql << ('`%s`' % c)
       end
 
       def input_identifier(v)
@@ -2,6 +2,6 @@
 
 module Sequel
   module Bigquery
-    VERSION = '0.3.0'
+    VERSION = '0.5.0'
   end
 end
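
The `@sql_buffer`/`@sql_buffering` state threaded through `initialize` and `execute` above implements the transaction quirk from the README: statements issued inside a transaction are collected and sent to BigQuery as one SQL string at commit. A simplified standalone sketch of that buffering idea (hypothetical class and method names, not the gem's actual code):

```ruby
# Buffers SQL statements between BEGIN and COMMIT so the whole transaction
# can be sent to BigQuery in a single request.
class StatementBuffer
  def initialize
    @buffer = []
    @buffering = false
  end

  # Returns the SQL that would be sent to BigQuery, or nil while buffering.
  def execute(sql)
    if sql.match?(/\Abegin\b/i)
      @buffering = true
      @buffer << sql
      nil # no results can be returned while buffering
    elsif @buffering
      @buffer << sql
      sql.match?(/\Acommit\b/i) ? flush : nil
    else
      sql # stand-in for sending a single statement straight to BigQuery
    end
  end

  private

  # Join the buffered statements and reset, ready for the next transaction.
  def flush
    combined = @buffer.join("\n")
    @buffer.clear
    @buffering = false
    combined # stand-in for sending the whole transaction at once
  end
end
```

As the README's quirks section notes, a consequence of this design is that no statement inside the transaction can return results until the buffer is flushed at commit.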
metadata CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: sequel-bigquery
 version: !ruby/object:Gem::Version
-  version: 0.3.0
+  version: 0.5.0
 platform: ruby
 authors:
 - Brendan Weibrecht
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2021-10-27 00:00:00.000000000 Z
+date: 2021-11-15 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: amazing_print
@@ -98,7 +98,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.2.16
+rubygems_version: 3.1.6
 signing_key:
 specification_version: 4
 summary: A Sequel adapter for Google's BigQuery