sequel-bigquery 0.3.0 → 0.5.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: a1df3065759c1f700359a326b1e039e2e31c7d67785c5fe5c36d11d9933bc03a
-  data.tar.gz: 73c44cf0e5a3afea392ca675f7645a788698de6b357f9ec1023f72b5da803891
+  metadata.gz: c37139c40fd486391d9d16ab147430352b67a1a2b5a55f7eed4189c368b5e87a
+  data.tar.gz: e3753842a9727cf53451ff80ac9abc250343a719d4215ab368c5a93a9d2b4830
 SHA512:
-  metadata.gz: 5fe03d5ba85f4b7daf490f5f1f6038f8d2101c78afcc20911a2b909c2bb73c683d8a094c3cf9229871dc6f2862b746cfd14797fe8d5091904955450600a25383
-  data.tar.gz: bb26c418a72d251ab8d68d458ff51a108f053d83cbf326e962aa8cce446e0d5726b4da8d3d1ba7c50978d6e7de4ef362003eeea74fa414a4c5fbb6e2cf39a370
+  metadata.gz: 6189f0b53a6edd9d66c915316ce8e9e8c836181609405b54e246c0dbe5451bce995c6f5b73e7b320624c35b62bfd66874757734f839ef7f9023dbc099271b2a4
+  data.tar.gz: 715b5adbdb6e225ca9af37363fbc30b59bde54a0b4602d91afcca93506cac541d252b31767571605ee555f2fdaad8d316ccf67a083354bfdcc9cd5f925c51646
data/README.md CHANGED
@@ -4,11 +4,20 @@
 
 A Sequel adapter for [Google's BigQuery](https://cloud.google.com/bigquery).
 
+This gem was created to manage schema migrations of a BigQuery dataset at GreenSync. At the time of writing, we couldn't find any good tools in any language to manage changes to a BigQuery schema as a set of migrations.
+
+Beyond migrations, I'm unsure how useful this gem is. I haven't yet tested how its performance for data interactions compares with directly using the `google-cloud-bigquery` gem's native facilities. If you're inserting a bunch of data, it's probably a better idea to use an [inserter from that gem](https://googleapis.dev/ruby/google-cloud-bigquery/latest/Google/Cloud/Bigquery/Dataset.html#insert_async-instance_method) rather than going through SQL.
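
For reference, the inserter approach mentioned above looks roughly like this; a minimal sketch based on the `google-cloud-bigquery` gem's documented API (the `events` table and row contents are hypothetical):

```ruby
require 'google/cloud/bigquery'

bigquery = Google::Cloud::Bigquery.new(project: 'your-gcp-project')
dataset = bigquery.dataset('your_bigquery_dataset_name')

# insert_async buffers rows client-side and streams them in batches.
inserter = dataset.insert_async('events') do |result|
  if result.error?
    puts result.error
  else
    puts "inserted #{result.insert_count} rows with #{result.error_count} errors"
  end
end

inserter.insert([{ name: 'signup', recorded_at: Time.now }])

# Flush remaining buffered rows and wait for in-flight requests to finish.
inserter.stop.wait!
```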
+
 ## Contents
 
 <!-- MarkdownTOC autolink=true -->
 
 - [Intro](#intro)
+- [Quirks](#quirks)
+  - [Creating tables with column defaults](#creating-tables-with-column-defaults)
+  - [Transactions](#transactions)
+  - [Update statements without `WHERE`](#update-statements-without-where)
+  - [Alter table](#alter-table)
 - [Installation](#installation)
 - [Usage](#usage)
 - [Contributing](#contributing)
@@ -25,12 +34,12 @@ A Sequel adapter for [Google's BigQuery](https://cloud.google.com/bigquery).
 Features:
 
 - Connecting
-- Migrating
-- Table creation, with automatic removal of defaults from statements (since BigQuery doesn't support it)
+- Migrating (see quirks)
+- Table creation (see quirks)
 - Inserting rows
-- Updating rows, with automatic addition of `where 1 = 1` to statements (since BigQuery requires a `where` clause)
+- Updating rows (see quirks)
 - Querying
-- Transactions (buffered since BigQuery only supports them when you execute the whole transaction at once)
+- Transactions (see quirks)
 - Table partitioning
 - Ruby types:
   + String
@@ -40,6 +49,25 @@ Features:
   + Date
   + Float
   + BigDecimal
+- Selecting the BigQuery server location
+
+## Quirks
+
+### Creating tables with column defaults
+
+BigQuery doesn't support defaults on columns. As a workaround, all defaults are automatically removed from statements (crudely).
+
+### Transactions
+
+BigQuery doesn't support transactions whose statements are executed individually, though it does support them when the entire transaction's SQL is sent at once. As a workaround, statements within a transaction are buffered and executed together at the end of the transaction. The impact of this is that no results can be returned within a transaction.
+
+### Update statements without `WHERE`
+
+BigQuery requires every `UPDATE` statement to have a `WHERE` clause. As a workaround, statements which lack one have `where 1 = 1` appended automatically (crudely).
+
+### Alter table
+
+We've found that the `google-cloud-bigquery` gem seems to have a bug where an internally-missing result value leads to a `NoMethodError` on nil for `fields` within `from_gapi_json`. [See issue #6](https://github.com/ZimbiX/sequel-bigquery/issues/6#issuecomment-968523731). As a workaround, all statements generated within `alter_table` are joined together with `;` and executed only at the end of the block. A `select 1` is also appended to try to ensure we have a result, avoiding the aforementioned exception. A bonus of batching the queries is that latency should be somewhat reduced.
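
To make the transaction and `UPDATE` quirks concrete, here is a sketch of plain Sequel usage against this adapter (`db` is a connection as in the Usage section; the `people` table and its columns are hypothetical):

```ruby
# Statements inside the block are buffered, then sent to BigQuery as one
# combined piece of SQL when the block ends - so nothing inside the block
# can read query results back.
db.transaction do
  db[:people].insert(name: 'Brendan', age: 25)
  db[:people].where(name: 'Brendan').update(age: 26)
end

# An UPDATE with no filter would be rejected by BigQuery; the adapter
# appends the catch-all clause for you, sending roughly:
#   UPDATE `people` SET `age` = 30 where 1 = 1
db[:people].update(age: 30)
```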
 
 ## Installation
 
@@ -67,17 +95,21 @@ Connect to BigQuery:
 
 ```
 require 'sequel-bigquery'
+require 'logger'
 
 db = Sequel.connect(
   adapter: :bigquery,
   project: 'your-gcp-project',
   database: 'your_bigquery_dataset_name',
+  location: 'australia-southeast2',
   logger: Logger.new(STDOUT),
 )
 ```
 
 And use Sequel like normal.
 
+Note that it is important to supply a logger that will at least output warning messages, so you know when your queries are being modified or buffered, which may be unexpected behaviour.
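
Since schema migrations are the gem's primary use case, a minimal migration sketch may be useful; this is standard Sequel, with a hypothetical path and table:

```ruby
# db/migrations/001_create_people.rb
Sequel.migration do
  change do
    create_table(:people) do
      String :name
      Integer :age
    end
  end
end
```

Migrations can then be applied with Sequel's usual migrator: `Sequel.extension(:migration)` followed by `Sequel::Migrator.run(db, 'db/migrations')`.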
+
 ## Contributing
 
 Pull requests welcome! =)
data/lib/sequel-bigquery.rb CHANGED
@@ -4,6 +4,7 @@ require 'delegate'
 require 'time'
 
 require 'google/cloud/bigquery'
+require 'amazing_print'
 require 'paint'
 require 'sequel'
 
@@ -16,38 +17,34 @@ module Sequel
     class Database < Sequel::Database # rubocop:disable Metrics/ClassLength
       set_adapter_scheme :bigquery
 
-      def initialize(*args, **kawrgs)
-        puts '.new'
-        @orig_opts = kawrgs.fetch(:orig_opts)
+      def initialize(*args, **kwargs)
+        @bigquery_config = kwargs.fetch(:orig_opts)
         @sql_buffer = []
         @sql_buffering = false
         super
       end
 
       def connect(*_args)
-        puts '#connect'
-        config = @orig_opts.dup
-        config.delete(:adapter)
-        config.delete(:logger)
-        bq_dataset_name = config.delete(:dataset) || config.delete(:database)
-        @bigquery = Google::Cloud::Bigquery.new(config)
+        log_each(:debug, '#connect')
+        get_or_create_bigquery_dataset
+          .tap { log_each(:debug, '#connect end') }
+      end
+
+      def bigquery
         # ObjectSpace.each_object(HTTPClient).each { |c| c.debug_dev = STDOUT }
-        @bigquery.dataset(bq_dataset_name) || begin
-          @loggers[0].debug('BigQuery dataset %s does not exist; creating it' % bq_dataset_name)
-          @bigquery.create_dataset(bq_dataset_name)
-        end
-          .tap { puts '#connect end' }
+        @bigquery ||= Google::Cloud::Bigquery.new(google_cloud_bigquery_gem_config)
       end
 
       def disconnect_connection(_c)
-        puts '#disconnect_connection'
+        log_each(:debug, '#disconnect_connection')
         # c.disconnect
       end
 
       def drop_datasets(*dataset_names_to_drop)
         dataset_names_to_drop.each do |dataset_name_to_drop|
-          puts "Dropping dataset #{dataset_name_to_drop.inspect}"
-          dataset_to_drop = @bigquery.dataset(dataset_name_to_drop)
+          log_each(:debug, "Dropping dataset #{dataset_name_to_drop.inspect}")
+          dataset_to_drop = bigquery.dataset(dataset_name_to_drop)
+          next unless dataset_to_drop
           dataset_to_drop.tables.each(&:delete)
           dataset_to_drop.delete
         end
@@ -55,7 +52,7 @@ module Sequel
       alias drop_dataset drop_datasets
 
       def execute(sql, opts = OPTS) # rubocop:disable Metrics/MethodLength, Metrics/AbcSize, Metrics/CyclomaticComplexity, Metrics/PerceivedComplexity
-        puts '#execute'
+        log_each(:debug, '#execute')
         log_query(sql)
 
         # require 'pry'; binding.pry if sql =~ /CREATE TABLE IF NOT EXISTS/i
@@ -86,15 +83,12 @@ module Sequel
           sql_to_execute = @sql_buffer.any? ? @sql_buffer.join("\n") : sql
           conn.query(sql_to_execute)
         end
-        require 'amazing_print'
-        ap results
+        log_each(:debug, results.awesome_inspect)
         if block_given?
           yield results
         else
           results
         end
-      # TODO
-      # rescue ::ODBC::Error, ArgumentError => e
       rescue Google::Cloud::InvalidArgumentError, ArgumentError => e
         raise_error(e)
       end # rubocop:disable Style/MultilineBlockChain
@@ -122,6 +116,33 @@ module Sequel
 
       private
 
+      attr_reader :bigquery_config
+
+      def google_cloud_bigquery_gem_config
+        bigquery_config.dup.tap do |config|
+          %i[
+            adapter
+            database
+            dataset
+            location
+            logger
+          ].each do |option|
+            config.delete(option)
+          end
+        end
+      end
+
+      def get_or_create_bigquery_dataset # rubocop:disable Naming/AccessorMethodName
+        bigquery.dataset(bigquery_dataset_name) || begin
+          log_each(:debug, 'BigQuery dataset %s does not exist; creating it' % bigquery_dataset_name)
+          bigquery.create_dataset(bigquery_dataset_name, location: bigquery_config[:location])
+        end
+      end
+
+      def bigquery_dataset_name
+        bigquery_config[:dataset] || bigquery_config[:database] || (raise ArgumentError, 'BigQuery dataset must be specified')
+      end
+
       def connection_execute_method
         :query
       end
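
In other words, `google_cloud_bigquery_gem_config` strips the options consumed by Sequel and this adapter, and everything else is handed to `Google::Cloud::Bigquery.new` untouched. A sketch of what that implies (passing `credentials` through `Sequel.connect` like this is an assumption based on the code above, not a documented feature):

```ruby
require 'logger'
require 'sequel-bigquery'

db = Sequel.connect(
  adapter: :bigquery,                      # consumed: selects this adapter
  project: 'your-gcp-project',             # forwarded to Google::Cloud::Bigquery.new
  database: 'your_bigquery_dataset_name',  # consumed: names the dataset
  location: 'australia-southeast2',        # consumed: used when creating the dataset
  credentials: '/path/to/keyfile.json',    # assumption: forwarded to the BigQuery client
  logger: Logger.new($stdout),             # consumed by Sequel for logging
)
```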
@@ -136,9 +157,9 @@ module Sequel
       end
 
       def schema_parse_table(_table_name, _opts)
-        logger.debug(Paint['schema_parse_table', :red, :bold])
+        log_each(:debug, Paint['schema_parse_table', :red, :bold])
         # require 'pry'; binding.pry
-        @bigquery.datasets.map do |dataset|
+        bigquery.datasets.map do |dataset|
           [
             dataset.dataset_id,
             {},
@@ -153,13 +174,12 @@ module Sequel
 
       # Padded to horizontally align with the post-execution log message, which includes the execution time
       def log_query(sql)
-        pad = '            '
-        puts Paint[pad + sql, :cyan, :bold]
-        # @loggers[0]&.debug('            ' + sql)
+        pad = ' ' * 12
+        log_each(:debug, Paint[pad + sql, :cyan, :bold])
       end
 
       def warn(msg)
-        @loggers[0].warn(Paint[msg, '#FFA500', :bold])
+        log_each(:warn, Paint[msg, '#FFA500', :bold])
       end
 
       def warn_default_removal(sql)
@@ -189,11 +209,18 @@ module Sequel
 
         sql
       end
+
+      # Batch the alter_table queries and make sure something is returned, to avoid an error related to the return value
+      def apply_alter_table(name, ops)
+        sqls = alter_table_sql_list(name, ops)
+        sqls_joined = (sqls + ['select 1']).join(";\n")
+        execute_ddl(sqls_joined)
+      end
     end
 
     class Dataset < Sequel::Dataset
       def fetch_rows(sql, &block)
-        puts '#fetch_rows'
+        db.send(:log_each, :debug, '#fetch_rows')
 
         execute(sql) do |bq_result|
           self.columns = bq_result.fields.map { |field| field.name.to_sym }
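
So with `apply_alter_table` above, an `alter_table` block like the following (hypothetical table and columns) results in a single request rather than one request per operation:

```ruby
db.alter_table(:people) do
  add_column :nickname, String
  add_column :email, String
end
# Both generated ALTER TABLE statements and the trailing `select 1`
# are joined with ';' and executed as one query.
```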
@@ -219,7 +246,7 @@ module Sequel
 
       # Like MySQL, BigQuery uses the nonstandard ` (backtick) for quoting identifiers.
       def quoted_identifier_append(sql, c)
-        sql << '`%s`' % c
+        sql << ('`%s`' % c)
       end
 
       def input_identifier(v)
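
The effect of the backtick quoting is easiest to see by rendering a query without running it; a quick illustration with a hypothetical table:

```ruby
db[:people].where(name: 'Brendan').sql
# => something like: SELECT * FROM `people` WHERE (`name` = 'Brendan')
```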
data/lib/sequel/bigquery/version.rb CHANGED
@@ -2,6 +2,6 @@
 
 module Sequel
   module Bigquery
-    VERSION = '0.3.0'
+    VERSION = '0.5.0'
   end
 end
metadata CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: sequel-bigquery
 version: !ruby/object:Gem::Version
-  version: 0.3.0
+  version: 0.5.0
 platform: ruby
 authors:
 - Brendan Weibrecht
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2021-10-27 00:00:00.000000000 Z
+date: 2021-11-15 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: amazing_print
@@ -98,7 +98,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 - !ruby/object:Gem::Version
   version: '0'
 requirements: []
-rubygems_version: 3.2.16
+rubygems_version: 3.1.6
 signing_key:
 specification_version: 4
 summary: A Sequel adapter for Google's BigQuery