embulk-output-bigquery 0.4.2 → 0.4.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +4 -0
- data/README.md +13 -24
- data/embulk-output-bigquery.gemspec +3 -2
- data/lib/embulk/output/bigquery/bigquery_client.rb +5 -4
- data/lib/embulk/output/bigquery/value_converter_factory.rb +6 -31
- data/test/test_file_writer.rb +3 -3
- data/test/test_value_converter_factory.rb +11 -10
- metadata +19 -19
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e8b074a351f22417a10571e1a9aa60a1bc82df0d
+  data.tar.gz: c50a49b3b99f5cab88e023af6d95b39990c40d89
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 52e5a630d3173d2baec83dd03fbf0e3e4cb7d46aeb870e192ce31c6c8178534cade8975ec586e2a07bc1c213d616505ed52b5ec154cabdc19669317f8ba673b3
+  data.tar.gz: f9b15c9ff54a64626b33ce123a11ac80a7984535eb9fc47c042c9647b3d1728bd6f5f4f3157c9fcbac9163b097f83de5de7e3db2fb443a93aa4bdf32bd9d7fd5
data/CHANGELOG.md
CHANGED
data/README.md
CHANGED
@@ -102,8 +102,8 @@ Following options are same as [bq command-line tools](https://cloud.google.com/b
 | allow_quoted_newlines | boolean | optional | false | Set true, if data contains newline characters. It may cause slow processing |
 | time_partitioning | hash | optional | `{"type":"DAY"}` if `table` parameter has a partition decorator, otherwise nil | See [Time Partitioning](#time-partitioning) |
 | time_partitioning.type | string | required | nil | The only type supported is DAY, which will generate one partition per day based on data loading time. |
-| time_partitioning.
-| schema_update_options | array | optional | nil | List of `ALLOW_FIELD_ADDITION` or `ALLOW_FIELD_RELAXATION` or both. See [jobs#configuration.load.schemaUpdateOptions](https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load.schemaUpdateOptions) |
+| time_partitioning.expiration_ms | int | optional | nil | Number of milliseconds for which to keep the storage for a partition. |
+| schema_update_options | array | optional | nil | (Experimental) List of `ALLOW_FIELD_ADDITION` or `ALLOW_FIELD_RELAXATION` or both. See [jobs#configuration.load.schemaUpdateOptions](https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load.schemaUpdateOptions). NOTE for the current status: `schema_update_options` does not work for the `copy` job, that is, it is not effective for most modes such as `append`, `append_direct`, `replace`, and `replace_backup` (except `delete_in_advance`) |
 
 ### Example
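For reference on the new `time_partitioning.expiration_ms` row: the value is a plain millisecond count. A minimal Ruby sketch (not part of the diff) of the arithmetic behind the `259200000` used in this README's examples:

```ruby
# expiration_ms is expressed in milliseconds; a three-day partition retention:
retention_days = 3
expiration_ms = retention_days * 24 * 60 * 60 * 1000
puts expiration_ms # => 259200000, the value used in the README examples below
```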
@@ -127,24 +127,25 @@ out:
 
 ##### append
 
-1. Load to temporary table
+1. Load to temporary table (Create and WRITE_APPEND in parallel)
 2. Copy temporary table to destination table (or partition). (WRITE_APPEND)
 
 ##### append_direct
 
-Insert data into existing table (or partition) directly.
+1. Insert data into existing table (or partition) directly. (WRITE_APPEND in parallel)
+
 This is not transactional, i.e., if it fails, the target table could have some rows inserted.
 
 ##### replace
 
-1. Load to temporary table
+1. Load to temporary table (Create and WRITE_APPEND in parallel)
 2. Copy temporary table to destination table (or partition). (WRITE_TRUNCATE)
 
 ```is_skip_job_result_check``` must be false in replace mode
 
 ##### replace_backup
 
-1. Load to temporary table
+1. Load to temporary table (Create and WRITE_APPEND in parallel)
 2. Copy destination table (or partition) to backup table (or partition). (dataset_old, table_old)
 3. Copy temporary table to destination table (or partition). (WRITE_TRUNCATE)
 
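All of the load-then-copy modes above share the same two-job shape. A minimal sketch of that sequence with the `google-api-client` gem, using hash-style job bodies as `bigquery_client.rb` itself does (project, dataset, table, and file names are placeholders; authorization setup omitted):

```ruby
require 'google/apis/bigquery_v2'

client = Google::Apis::BigqueryV2::BigqueryService.new
project, dataset = 'my-project', 'my_dataset'

# Step 1: load the file into a temporary table, creating it if needed.
load_body = {
  configuration: {
    load: {
      destination_table: { project_id: project, dataset_id: dataset, table_id: 'LOAD_TEMP_foo' },
      source_format: 'CSV',
      create_disposition: 'CREATE_IF_NEEDED',
      write_disposition: 'WRITE_APPEND',
    }
  }
}
client.insert_job(project, load_body,
                  upload_source: 'part_0000.csv',
                  content_type: 'application/octet-stream')

# Step 2: copy the temporary table onto the destination.
# WRITE_APPEND here corresponds to append mode; WRITE_TRUNCATE to replace / replace_backup.
copy_body = {
  configuration: {
    copy: {
      source_table:      { project_id: project, dataset_id: dataset, table_id: 'LOAD_TEMP_foo' },
      destination_table: { project_id: project, dataset_id: dataset, table_id: 'my_table' },
      write_disposition: 'WRITE_APPEND',
    }
  }
}
client.insert_job(project, copy_body)
```

Note that `append_direct` skips the copy step entirely, which is why it is not transactional.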
@@ -316,7 +317,7 @@ Therefore, it is recommended to format records with filter plugins written in Java
 filters:
 - type: to_json
   column: {name: payload, type: string}
-  default_format: %Y-%m-%d %H:%M:%S.%6N
+  default_format: "%Y-%m-%d %H:%M:%S.%6N"
 out:
   type: bigquery
   payload_column_index: 0 # or, payload_column: payload
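The quoting added in this hunk is not cosmetic: `%` cannot begin a plain scalar in YAML, so the unquoted form fails to parse. A quick check with Ruby's bundled Psych parser:

```ruby
require 'yaml'

# Quoted: parses fine.
YAML.load('default_format: "%Y-%m-%d %H:%M:%S.%6N"')
# => {"default_format"=>"%Y-%m-%d %H:%M:%S.%6N"}

# Unquoted: '%' is a reserved YAML indicator character, so parsing fails.
begin
  YAML.load('default_format: %Y-%m-%d %H:%M:%S.%6N')
rescue Psych::SyntaxError => e
  puts e.message
end
```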
@@ -397,24 +398,12 @@ out:
     expiration_ms: 259200000
 ```
 
-Use
-
-```yaml
-out:
-  type: bigquery
-  table: table_name$20160929
-  auto_create_table: true
-  time_partitioning:
-    type: DAY
-    expiration_ms: 259200000
-  schema_update_options:
-  - ALLOW_FIELD_ADDITION
-  - ALLOW_FIELD_RELAXATION
-```
+Use the [Tables: patch](https://cloud.google.com/bigquery/docs/reference/v2/tables/patch) API to update the schema of the partitioned table; embulk-output-bigquery itself does not support it, though.
+Note that only adding a new column, and relaxing non-required columns to be `NULLABLE`, are supported now. Deleting columns and renaming columns are not supported.
 
-
-
-
+MEMO: [jobs#configuration.load.schemaUpdateOptions](https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load.schemaUpdateOptions) is available
+to update the schema of the destination table as a side effect of the load job, but it is not available for copy jobs.
+Thus, it was not suitable for embulk-output-bigquery's idempotence modes `append`, `replace`, and `replace_backup`, sigh.
 
 ## Development
 
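As the README text above says, schema changes to an existing partitioned table go through the Tables: patch API rather than through the plugin. A minimal sketch with `google-api-client` (names are placeholders; authorization setup omitted; per the note above, only column additions and relaxations to `NULLABLE` are accepted):

```ruby
require 'google/apis/bigquery_v2'

client = Google::Apis::BigqueryV2::BigqueryService.new

# Fetch the current schema, append a new NULLABLE column, and patch it back.
table = client.get_table('my-project', 'my_dataset', 'my_table')
table.schema.fields << Google::Apis::BigqueryV2::TableFieldSchema.new(
  name: 'new_column', type: 'STRING', mode: 'NULLABLE'
)
client.patch_table('my-project', 'my_dataset', 'my_table', table)
```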
data/embulk-output-bigquery.gemspec
CHANGED
@@ -1,6 +1,6 @@
 Gem::Specification.new do |spec|
   spec.name = "embulk-output-bigquery"
-  spec.version = "0.4.2"
+  spec.version = "0.4.3"
   spec.authors = ["Satoshi Akama", "Naotoshi Seo"]
   spec.summary = "Google BigQuery output plugin for Embulk"
   spec.description = "Embulk plugin that insert records to Google BigQuery."
@@ -13,7 +13,8 @@ Gem::Specification.new do |spec|
   spec.require_paths = ["lib"]
 
   spec.add_dependency 'google-api-client'
-  spec.add_dependency 'tzinfo'
+  spec.add_dependency 'time_with_zone'
+
   spec.add_development_dependency 'embulk', ['>= 0.8.2']
   spec.add_development_dependency 'bundler', ['>= 1.10.6']
   spec.add_development_dependency 'rake', ['>= 10.0']
data/lib/embulk/output/bigquery/bigquery_client.rb
CHANGED
@@ -104,6 +104,11 @@ module Embulk
             }
           }
         }
+
+        if @task['schema_update_options']
+          body[:configuration][:load][:schema_update_options] = @task['schema_update_options']
+        end
+
         opts = {}
 
         Embulk.logger.debug { "embulk-output-bigquery: insert_job(#{@project}, #{body}, #{opts})" }
@@ -258,10 +263,6 @@ module Embulk
           }
         }
 
-        if @task['schema_update_options']
-          body[:configuration][:copy][:schema_update_options] = @task['schema_update_options']
-        end
-
        opts = {}
         Embulk.logger.debug { "embulk-output-bigquery: insert_job(#{@project}, #{body}, #{opts})" }
         response = with_network_retry { client.insert_job(@project, body, opts) }
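The net effect of these two hunks is that `schema_update_options` moves from the copy configuration (where BigQuery ignores it) to the load configuration. In isolation, the resulting body construction looks like this (values illustrative):

```ruby
task = { 'schema_update_options' => %w[ALLOW_FIELD_ADDITION ALLOW_FIELD_RELAXATION] }

body = { configuration: { load: {} } }  # destination_table, source_format, etc. elided

# Mirrors the inserted hunk: only set when the user configured it,
# and only on the load job, since copy jobs do not honor the option.
if task['schema_update_options']
  body[:configuration][:load][:schema_update_options] = task['schema_update_options']
end
```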
data/lib/embulk/output/bigquery/value_converter_factory.rb
CHANGED
@@ -1,5 +1,5 @@
 require 'time'
-require 'tzinfo'
+require 'time_with_zone'
 require 'json'
 require_relative 'helper'
 
@@ -23,8 +23,8 @@ module Embulk
       # @return [Array] an array whose key is column_index, and value is its converter (Proc)
       def self.create_converters(task, schema)
         column_options_map = Helper.column_options_map(task['column_options'])
-        default_timestamp_format = task['default_timestamp_format']
-        default_timezone = task['default_timezone']
+        default_timestamp_format = task['default_timestamp_format'] || DEFAULT_TIMESTAMP_FORMAT
+        default_timezone = task['default_timezone'] || DEFAULT_TIMEZONE
         schema.map do |column|
           column_name = column[:name]
           embulk_type = column[:type]
@@ -53,7 +53,7 @@ module Embulk
         @timestamp_format = timestamp_format
         @default_timestamp_format = default_timestamp_format
         @timezone = timezone || default_timezone
-        @zone_offset = get_zone_offset(@timezone)
+        @zone_offset = TimeWithZone.zone_offset(@timezone)
         @strict = strict.nil? ? true : strict
       end
 
@@ -194,7 +194,7 @@ module Embulk
           Proc.new {|val|
             next nil if val.nil?
             with_typecast_error(val) do |val|
-              strptime_with_zone(val, @timestamp_format, zone_offset).strftime("%Y-%m-%d %H:%M:%S.%6N %:z")
+              TimeWithZone.set_zone_offset(Time.strptime(val, @timestamp_format), zone_offset).strftime("%Y-%m-%d %H:%M:%S.%6N %:z")
             end
           }
         else
@@ -238,7 +238,7 @@ module Embulk
         when 'TIMESTAMP'
           Proc.new {|val|
             next nil if val.nil?
-            val.
+            val.strftime("%Y-%m-%d %H:%M:%S.%6N %:z")
           }
         else
           raise NotSupportedType, "cannot take column type #{type} for timestamp column"
@@ -261,31 +261,6 @@ module Embulk
           raise NotSupportedType, "cannot take column type #{type} for json column"
         end
       end
-
-      private
-
-      # [+-]HH:MM, [+-]HHMM, [+-]HH
-      NUMERIC_PATTERN = %r{\A[+-]\d\d(:?\d\d)?\z}
-
-      # Region/Zone, Region/Zone/Zone
-      NAME_PATTERN = %r{\A[^/]+/[^/]+(/[^/]+)?\z}
-
-      def strptime_with_zone(date, timestamp_format, zone_offset)
-        time = Time.strptime(date, timestamp_format)
-        utc_offset = time.utc_offset
-        time.localtime(zone_offset) + utc_offset - zone_offset
-      end
-
-      def get_zone_offset(timezone)
-        if NUMERIC_PATTERN === timezone
-          Time.zone_offset(timezone)
-        elsif NAME_PATTERN === timezone || 'UTC' == timezone
-          tz = TZInfo::Timezone.get(timezone)
-          tz.period_for_utc(Time.now).utc_total_offset
-        else
-          raise ArgumentError, "timezone format is invalid: #{timezone}"
-        end
-      end
     end
   end
 end
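The removed private helpers are superseded by the `time_with_zone` gem. A minimal sketch of the conversion path, assuming the gem's `zone_offset` and `set_zone_offset` module functions exactly as the hunks above call them:

```ruby
require 'time'
require 'time_with_zone'

timezone    = 'Asia/Tokyo'
zone_offset = TimeWithZone.zone_offset(timezone)   # 32400 seconds

# Parse a zone-less string as wall-clock time in that zone, then render it in
# the "%Y-%m-%d %H:%M:%S.%6N %:z" format the converters emit.
time = TimeWithZone.set_zone_offset(Time.strptime('2016-02-26', '%Y-%m-%d'), zone_offset)
puts time.strftime('%Y-%m-%d %H:%M:%S.%6N %:z')    # => 2016-02-26 00:00:00.000000 +09:00
```

This matches the `"2016-02-26 00:00:00.000000 +09:00"` expectation in the updated tests below.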
data/test/test_file_writer.rb
CHANGED
@@ -43,7 +43,7 @@ module Embulk
       end
 
       def record
-        [true, 1, 1.1, 'foo', Time.parse("2016-02-26 00:00:00 +
+        [true, 1, 1.1, 'foo', Time.parse("2016-02-26 00:00:00 +00:00").utc, {"foo"=>"foo"}]
       end
 
       def page
@@ -81,7 +81,7 @@ module Embulk
         formatter_proc = file_writer.instance_variable_get(:@formatter_proc)
         assert_equal :to_csv, formatter_proc.name
 
-        expected = %Q[true,1,1.1,foo,
+        expected = %Q[true,1,1.1,foo,2016-02-26 00:00:00.000000 +00:00,"{""foo"":""foo""}"\n]
         assert_equal expected, formatter_proc.call(record)
       end
 
@@ -91,7 +91,7 @@ module Embulk
         formatter_proc = file_writer.instance_variable_get(:@formatter_proc)
         assert_equal :to_jsonl, formatter_proc.name
 
-        expected = %Q[{"boolean":true,"long":1,"double":1.1,"string":"foo","timestamp":
+        expected = %Q[{"boolean":true,"long":1,"double":1.1,"string":"foo","timestamp":"2016-02-26 00:00:00.000000 +00:00","json":"{\\"foo\\":\\"foo\\"}"}\n]
         assert_equal expected, formatter_proc.call(record)
       end
     end
data/test/test_value_converter_factory.rb
CHANGED
@@ -23,8 +23,8 @@ module Embulk
       assert_equal 1, converters[1].call(1)
       assert_equal 1.1, converters[2].call(1.1)
       assert_equal 'foo', converters[3].call('foo')
-      timestamp = Time.parse("2016-02-26 00:00:00.
-      assert_equal
+      timestamp = Time.parse("2016-02-26 00:00:00.500000 +00:00")
+      assert_equal "2016-02-26 00:00:00.500000 +00:00", converters[4].call(timestamp)
       assert_equal %Q[{"foo":"foo"}], converters[5].call({'foo'=>'foo'})
     end
 
@@ -55,7 +55,7 @@ module Embulk
       assert_equal '1', converters[1].call(1)
       assert_equal '1.1', converters[2].call(1.1)
       assert_equal 1, converters[3].call('1')
-      timestamp = Time.parse("2016-02-26 00:00:00.100000
+      timestamp = Time.parse("2016-02-26 00:00:00.100000 +00:00")
       assert_equal 1456444800, converters[4].call(timestamp)
       assert_equal({'foo'=>'foo'}, converters[5].call({'foo'=>'foo'}))
     end
@@ -208,7 +208,7 @@ module Embulk
         timestamp_format: '%Y-%m-%d', timezone: 'Asia/Tokyo'
       ).create_converter
       assert_equal nil, converter.call(nil)
-      assert_equal
+      assert_equal "2016-02-26 00:00:00.000000 +09:00", converter.call("2016-02-26")
 
       # Users must take care of the BQ timestamp format themselves when no timestamp_format is given
       converter = ValueConverterFactory.new(SCHEMA_TYPE, 'TIMESTAMP').create_converter
@@ -240,22 +240,22 @@ module Embulk
     def test_float
       converter = ValueConverterFactory.new(SCHEMA_TYPE, 'FLOAT').create_converter
       assert_equal nil, converter.call(nil)
-      expected = 1456444800.
+      expected = 1456444800.500000
       assert_equal expected, converter.call(Time.at(expected))
     end
 
     def test_string
       converter = ValueConverterFactory.new(SCHEMA_TYPE, 'STRING').create_converter
       assert_equal nil, converter.call(nil)
-      timestamp = Time.parse("2016-02-26 00:00:00.
-      expected = "2016-02-26 00:00:00.
+      timestamp = Time.parse("2016-02-26 00:00:00.500000 +00:00")
+      expected = "2016-02-26 00:00:00.500000"
       assert_equal expected, converter.call(timestamp)
 
       converter = ValueConverterFactory.new(
         SCHEMA_TYPE, 'STRING',
         timestamp_format: '%Y-%m-%d', timezone: 'Asia/Tokyo'
       ).create_converter
-      timestamp = Time.parse("2016-02-25 15:00:00.
+      timestamp = Time.parse("2016-02-25 15:00:00.500000 +00:00")
       expected = "2016-02-26"
       assert_equal expected, converter.call(timestamp)
     end
@@ -263,8 +263,9 @@ module Embulk
     def test_timestamp
       converter = ValueConverterFactory.new(SCHEMA_TYPE, 'TIMESTAMP').create_converter
       assert_equal nil, converter.call(nil)
-
-
+      subject = 1456444800.500000
+      expected = "2016-02-26 00:00:00.500000 +00:00"
+      assert_equal expected, converter.call(Time.at(subject).utc)
     end
 
     def test_record
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: embulk-output-bigquery
 version: !ruby/object:Gem::Version
-  version: 0.4.2
+  version: 0.4.3
 platform: ruby
 authors:
 - Satoshi Akama
@@ -9,78 +9,78 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2017-02-11 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
-  name: google-api-client
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
      - !ruby/object:Gem::Version
        version: '0'
+  name: google-api-client
   prerelease: false
   type: :runtime
-- !ruby/object:Gem::Dependency
-  name: tzinfo
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
      - !ruby/object:Gem::Version
        version: '0'
+- !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
      - !ruby/object:Gem::Version
        version: '0'
+  name: time_with_zone
   prerelease: false
   type: :runtime
-- !ruby/object:Gem::Dependency
-  name: embulk
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
      - !ruby/object:Gem::Version
-        version: 0
+        version: '0'
+- !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
      - !ruby/object:Gem::Version
        version: 0.8.2
+  name: embulk
   prerelease: false
   type: :development
-- !ruby/object:Gem::Dependency
-  name: bundler
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
      - !ruby/object:Gem::Version
-        version:
+        version: 0.8.2
+- !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
      - !ruby/object:Gem::Version
        version: 1.10.6
+  name: bundler
   prerelease: false
   type: :development
-- !ruby/object:Gem::Dependency
-  name: rake
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
      - !ruby/object:Gem::Version
        version: 1.10.6
+  name: rake
   prerelease: false
   type: :development
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version:
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '10.0'
 description: Embulk plugin that insert records to Google BigQuery.
 email:
 - satoshiakama@gmail.com