sequel-bigquery 0.3.0 → 0.5.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/README.md +36 -4
- data/lib/sequel-bigquery.rb +57 -30
- data/lib/sequel_bigquery/version.rb +1 -1
- metadata +3 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c37139c40fd486391d9d16ab147430352b67a1a2b5a55f7eed4189c368b5e87a
+  data.tar.gz: e3753842a9727cf53451ff80ac9abc250343a719d4215ab368c5a93a9d2b4830
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6189f0b53a6edd9d66c915316ce8e9e8c836181609405b54e246c0dbe5451bce995c6f5b73e7b320624c35b62bfd66874757734f839ef7f9023dbc099271b2a4
+  data.tar.gz: 715b5adbdb6e225ca9af37363fbc30b59bde54a0b4602d91afcca93506cac541d252b31767571605ee555f2fdaad8d316ccf67a083354bfdcc9cd5f925c51646
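The digests in checksums.yaml are over the metadata.gz and data.tar.gz members of the packaged .gem archive (a tar file). A minimal sketch of reproducing them locally with Ruby's standard library — the file path is a placeholder, not part of the diff:

```ruby
require 'digest'

# Compute the SHA256 and SHA512 hexdigests of one member of a .gem
# archive (e.g. metadata.gz or data.tar.gz, extracted with `tar -xf`),
# for comparison against the values in checksums.yaml.
def gem_member_digests(path)
  bytes = File.binread(path)
  {
    sha256: Digest::SHA256.hexdigest(bytes),
    sha512: Digest::SHA512.hexdigest(bytes),
  }
end
```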
data/README.md
CHANGED
@@ -4,11 +4,20 @@
 
 A Sequel adapter for [Google's BigQuery](https://cloud.google.com/bigquery).
 
+This gem was created in order to manage schema migrations of a BigQuery dataset at GreenSync. At the time of writing, we couldn't find any good tools in any language to manage changes to the schema as a set of migrations.
+
+Beyond migrations, I'm unsure how useful this gem is. I haven't yet tested what the performance would be for data interactions vs. directly using the `google-cloud-bigquery` gem's native facilities. If you're inserting a bunch of data, it's probably a better idea to use an [inserter from that gem](https://googleapis.dev/ruby/google-cloud-bigquery/latest/Google/Cloud/Bigquery/Dataset.html#insert_async-instance_method) rather than going through SQL.
+
 ## Contents
 
 <!-- MarkdownTOC autolink=true -->
 
 - [Intro](#intro)
+- [Quirks](#quirks)
+    - [Creating tables with column defaults](#creating-tables-with-column-defaults)
+    - [Transactions](#transactions)
+    - [Update statements without `WHERE`](#update-statements-without-where)
+    - [Alter table](#alter-table)
 - [Installation](#installation)
 - [Usage](#usage)
 - [Contributing](#contributing)
@@ -25,12 +34,12 @@ A Sequel adapter for [Google's BigQuery](https://cloud.google.com/bigquery).
 Features:
 
 - Connecting
-- Migrating
-- Table creation
+- Migrating (see quirks)
+- Table creation (see quirks)
 - Inserting rows
-- Updating rows
+- Updating rows (see quirks)
 - Querying
-- Transactions (
+- Transactions (see quirks)
 - Table partitioning
 - Ruby types:
     + String
@@ -40,6 +49,25 @@ Features:
     + Date
     + Float
     + BigDecimal
+- Selecting the BigQuery server location
+
+## Quirks
+
+### Creating tables with column defaults
+
+BigQuery doesn't support defaults on columns. As a workaround, all defaults are automatically removed from statements (crudely).
+
+### Transactions
+
+BigQuery doesn't support transactions where the statements are executed individually. It does support them if the entire transaction's SQL is sent all at once, though. As a workaround, buffering of statements within a transaction has been implemented. However, the impact of this is that no results can be returned within a transaction.
+
+### Update statements without `WHERE`
+
+BigQuery requires all `UPDATE` statements to have a `WHERE` clause. As a workaround, statements which lack one have `where 1 = 1` appended automatically (crudely).
+
+### Alter table
+
+We've found that the `google-cloud-bigquery` gem seems to have a bug where an internal lack of result value results in a `NoMethodError` on nil for `fields` within `from_gapi_json`. [See issue #6](https://github.com/ZimbiX/sequel-bigquery/issues/6#issuecomment-968523731). As a workaround, all generated statements within `alter_table` are joined together with `;` and executed only at the end of the block. A `select 1` is also appended to try to ensure we have a result, avoiding the aforementioned exception. A bonus of batching the queries is that the latency should be somewhat reduced.
 
 ## Installation
 
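The "crude" rewrites described in the quirks above amount to string surgery on the generated SQL. A rough sketch of the idea — illustrative only, not the gem's actual implementation:

```ruby
# Naive regex-based rewrites in the spirit of the quirks above.
# Real SQL needs more care (quoting, casing, subqueries).

# Strip column defaults, since BigQuery doesn't support them.
def strip_column_defaults(sql)
  sql.gsub(/ DEFAULT [^,)]+/i, '')
end

# BigQuery rejects an UPDATE without a WHERE clause, so append a
# vacuous one when it's missing.
def ensure_update_has_where(sql)
  return sql unless sql =~ /\Aupdate\b/i && sql !~ /\bwhere\b/i

  "#{sql} where 1 = 1"
end
```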
@@ -67,17 +95,21 @@
 
 ```
 require 'sequel-bigquery'
+require 'logger'
 
 db = Sequel.connect(
   adapter: :bigquery,
   project: 'your-gcp-project',
   database: 'your_bigquery_dataset_name',
+  location: 'australia-southeast2',
   logger: Logger.new(STDOUT),
 )
 ```
 
 And use Sequel like normal.
 
+Note that it is important to supply a logger that will at least output warning messages, so you know when your queries are being modified or buffered, which may be unexpected behaviour.
+
 ## Contributing
 
 Pull requests welcome! =)
data/lib/sequel-bigquery.rb
CHANGED
@@ -4,6 +4,7 @@ require 'delegate'
 require 'time'
 
 require 'google/cloud/bigquery'
+require 'amazing_print'
 require 'paint'
 require 'sequel'
 
@@ -16,38 +17,34 @@ module Sequel
    class Database < Sequel::Database # rubocop:disable Metrics/ClassLength
      set_adapter_scheme :bigquery
 
-     def initialize(*args, **kawrgs)
-
-       @orig_opts = kawrgs.fetch(:orig_opts)
+     def initialize(*args, **kwargs)
+       @bigquery_config = kwargs.fetch(:orig_opts)
       @sql_buffer = []
       @sql_buffering = false
       super
     end
 
     def connect(*_args)
-
-
-
-
-
-
+       log_each(:debug, '#connect')
+       get_or_create_bigquery_dataset
+         .tap { log_each(:debug, '#connect end') }
+     end
+
+     def bigquery
       # ObjectSpace.each_object(HTTPClient).each { |c| c.debug_dev = STDOUT }
-       @bigquery.
-         @loggers[0].debug('BigQuery dataset %s does not exist; creating it' % bq_dataset_name)
-         @bigquery.create_dataset(bq_dataset_name)
-       end
-         .tap { puts '#connect end' }
+       @bigquery ||= Google::Cloud::Bigquery.new(google_cloud_bigquery_gem_config)
     end
 
     def disconnect_connection(_c)
-
+       log_each(:debug, '#disconnect_connection')
       # c.disconnect
     end
 
     def drop_datasets(*dataset_names_to_drop)
       dataset_names_to_drop.each do |dataset_name_to_drop|
-
-         dataset_to_drop =
+         log_each(:debug, "Dropping dataset #{dataset_name_to_drop.inspect}")
+         dataset_to_drop = bigquery.dataset(dataset_name_to_drop)
+         next unless dataset_to_drop
         dataset_to_drop.tables.each(&:delete)
         dataset_to_drop.delete
       end
@@ -55,7 +52,7 @@ module Sequel
     alias drop_dataset drop_datasets
 
     def execute(sql, opts = OPTS) # rubocop:disable Metrics/MethodLength, Metrics/AbcSize, Metrics/CyclomaticComplexity, Metrics/PerceivedComplexity
-
+       log_each(:debug, '#execute')
       log_query(sql)
 
       # require 'pry'; binding.pry if sql =~ /CREATE TABLE IF NOT EXISTS/i
@@ -86,15 +83,12 @@ module Sequel
         sql_to_execute = @sql_buffer.any? ? @sql_buffer.join("\n") : sql
         conn.query(sql_to_execute)
       end
-
-       ap results
+       log_each(:debug, results.awesome_inspect)
       if block_given?
         yield results
       else
         results
       end
-       # TODO
-       # rescue ::ODBC::Error, ArgumentError => e
     rescue Google::Cloud::InvalidArgumentError, ArgumentError => e
       raise_error(e)
     end # rubocop:disable Style/MultilineBlockChain
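The #execute changes above keep the statement-buffering workaround for transactions: while a transaction is open, statements accumulate in @sql_buffer, and the whole batch is later sent as one query via `@sql_buffer.join("\n")`. The mechanism can be sketched in isolation — names here are illustrative, not the gem's API:

```ruby
# Illustrative sketch of transaction statement buffering: inside a
# transaction, SQL is collected rather than executed; at the end of the
# block the whole batch is joined and sent as a single query. While
# buffering, no results are available, matching the quirk in the README.
class StatementBuffer
  def initialize(&executor)
    @executor = executor # receives the final SQL string
    @buffer = []
    @buffering = false
  end

  def execute(sql)
    return @executor.call(sql) unless @buffering

    @buffer << sql
    nil # no result while buffering
  end

  def transaction
    @buffering = true
    yield self
    @executor.call(@buffer.join("\n"))
  ensure
    @buffer.clear
    @buffering = false
  end
end
```

In the gem itself, the BEGIN/COMMIT statements are emitted by Sequel and buffered along with everything else, so the server receives the entire transaction in one request.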
@@ -122,6 +116,33 @@ module Sequel
 
     private
 
+     attr_reader :bigquery_config
+
+     def google_cloud_bigquery_gem_config
+       bigquery_config.dup.tap do |config|
+         %i[
+           adapter
+           database
+           dataset
+           location
+           logger
+         ].each do |option|
+           config.delete(option)
+         end
+       end
+     end
+
+     def get_or_create_bigquery_dataset # rubocop:disable Naming/AccessorMethodName
+       bigquery.dataset(bigquery_dataset_name) || begin
+         log_each(:debug, 'BigQuery dataset %s does not exist; creating it' % bigquery_dataset_name)
+         bigquery.create_dataset(bigquery_dataset_name, location: bigquery_config[:location])
+       end
+     end
+
+     def bigquery_dataset_name
+       bigquery_config[:dataset] || bigquery_config[:database] || (raise ArgumentError, 'BigQuery dataset must be specified')
+     end
+
     def connection_execute_method
       :query
     end
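The new google_cloud_bigquery_gem_config above just removes the adapter-only options (adapter, database, dataset, location, logger) before the remainder is passed to Google::Cloud::Bigquery.new. The filtering is plain Hash work; a standalone sketch (the method and constant names here are ours, not the gem's):

```ruby
# Options consumed by the Sequel adapter itself, which
# Google::Cloud::Bigquery.new should not receive; mirrors the
# %i[...] list in google_cloud_bigquery_gem_config above.
ADAPTER_ONLY_OPTIONS = %i[adapter database dataset location logger].freeze

# Return a copy of the connection options with the adapter-only keys
# removed, leaving e.g. :project and credential options intact.
def gem_config_from(connection_options)
  connection_options.reject { |key, _| ADAPTER_ONLY_OPTIONS.include?(key) }
end
```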
@@ -136,9 +157,9 @@ module Sequel
     end
 
     def schema_parse_table(_table_name, _opts)
-
+       log_each(:debug, Paint['schema_parse_table', :red, :bold])
       # require 'pry'; binding.pry
-
+       bigquery.datasets.map do |dataset|
         [
           dataset.dataset_id,
           {},
|
|
153
174
|
|
154
175
|
# Padded to horizontally align with post-execution log message which includes the execution time
|
155
176
|
def log_query(sql)
|
156
|
-
pad = '
|
157
|
-
|
158
|
-
# @loggers[0]&.debug(' ' + sql)
|
177
|
+
pad = ' ' * 12
|
178
|
+
log_each(:debug, Paint[pad + sql, :cyan, :bold])
|
159
179
|
end
|
160
180
|
|
161
181
|
def warn(msg)
|
162
|
-
|
182
|
+
log_each(:warn, Paint[msg, '#FFA500', :bold])
|
163
183
|
end
|
164
184
|
|
165
185
|
def warn_default_removal(sql)
|
@@ -189,11 +209,18 @@ module Sequel
 
       sql
     end
+
+     # Batch the alter table queries and make sure something is returned to avoid an error related to the return value
+     def apply_alter_table(name, ops)
+       sqls = alter_table_sql_list(name, ops)
+       sqls_joined = (sqls + ['select 1']).join(";\n")
+       execute_ddl(sqls_joined)
+     end
   end
 
   class Dataset < Sequel::Dataset
     def fetch_rows(sql, &block)
-
+       db.send(:log_each, :debug, '#fetch_rows')
 
       execute(sql) do |bq_result|
         self.columns = bq_result.fields.map { |field| field.name.to_sym }
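The batching in apply_alter_table above is a small string transform: join the generated statements with `;` and append a `select 1` so the batch always yields a result set, sidestepping the from_gapi_json NoMethodError. In isolation (function name is ours):

```ruby
# Join ALTER TABLE statements into one multi-statement batch, appending
# 'select 1' so the final statement always returns a result set.
def batch_alter_table_sql(statements)
  (statements + ['select 1']).join(";\n")
end
```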
@@ -219,7 +246,7 @@ module Sequel
 
     # Like MySQL, BigQuery uses the nonstandard ` (backtick) for quoting identifiers.
     def quoted_identifier_append(sql, c)
-       sql << '`%s`' % c
+       sql << ('`%s`' % c)
     end
 
     def input_identifier(v)
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: sequel-bigquery
 version: !ruby/object:Gem::Version
-  version: 0.3.0
+  version: 0.5.0
 platform: ruby
 authors:
 - Brendan Weibrecht
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2021-
+date: 2021-11-15 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: amazing_print
@@ -98,7 +98,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 - !ruby/object:Gem::Version
   version: '0'
 requirements: []
-rubygems_version: 3.
+rubygems_version: 3.1.6
 signing_key:
 specification_version: 4
 summary: A Sequel adapter for Google's BigQuery