RubyGems - embulk-output-bigquery - Versions diffs - 0.4.2 → 0.4.3 - Mend

embulk-output-bigquery 0.4.2 → 0.4.3

Files changed (9)

checksums.yaml +4 -4
data/CHANGELOG.md +4 -0
data/README.md +13 -24
data/embulk-output-bigquery.gemspec +3 -2
data/lib/embulk/output/bigquery/bigquery_client.rb +5 -4
data/lib/embulk/output/bigquery/value_converter_factory.rb +6 -31
data/test/test_file_writer.rb +3 -3
data/test/test_value_converter_factory.rb +11 -10
metadata +19 -19

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 4287886bb0467a77706c88ae428cc95f082cdca1
-  data.tar.gz: 5c3677c05609d29b5835be54bde777827ab6b607
+  metadata.gz: e8b074a351f22417a10571e1a9aa60a1bc82df0d
+  data.tar.gz: c50a49b3b99f5cab88e023af6d95b39990c40d89
 SHA512:
-  metadata.gz: 4d2ff6070fc2eb7a27c26513f712d056ef2e7f0a158125c8921293a5c8299f4963891928bf6e3c43679d4131bbdd7481acf96d1a9fe5cb338b5c39f8c3553b6b
-  data.tar.gz: 3bae2694c1d59218517b0f6755216d5050b522b20263c83ebde062df069c2e97fc0adbd586ddc192e43c80af4691c26d55665e363c13e1f8a467cc44acae5f04
+  metadata.gz: 52e5a630d3173d2baec83dd03fbf0e3e4cb7d46aeb870e192ce31c6c8178534cade8975ec586e2a07bc1c213d616505ed52b5ec154cabdc19669317f8ba673b3
+  data.tar.gz: f9b15c9ff54a64626b33ce123a11ac80a7984535eb9fc47c042c9647b3d1728bd6f5f4f3157c9fcbac9163b097f83de5de7e3db2fb443a93aa4bdf32bd9d7fd5

data/CHANGELOG.md CHANGED

@@ -1,3 +1,7 @@
+## 0.4.3 - 2017-02-11
+* [maintenance] Fix `schema_update_options` was not set with load_from_gcs (thanks to h10a-bf)
 ## 0.4.2 - 2016-10-12
 * [maintenance] Fix `schema_update_options` was not working (nil error)

data/README.md CHANGED

@@ -102,8 +102,8 @@ Following options are same as [bq command-line tools](https://cloud.google.com/b
 |  allow_quoted_newlines            | boolean  | optional  | false   | Set true, if data contains newline characters. It may cause slow procsssing |
 |  time_partitioning                | hash     | optional  | `{"type":"DAY"}` if `table` parameter has a partition decorator, otherwise nil | See [Time Partitioning](#time-partitioning) |
 |  time_partitioning.type           | string   | required  | nil     | The only type supported is DAY, which will generate one partition per day based on data loading time. |
-|  time_partitioning.expiration__ms | int      | optional  | nil     | Number of milliseconds for which to keep the storage for a partition. partition |
-|  schema_update_options            | array    | optional  | nil     | List of `ALLOW_FIELD_ADDITION` or `ALLOW_FIELD_RELAXATION` or both. See [jobs#configuration.load.schemaUpdateOptions](https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load.schemaUpdateOptions) |
+|  time_partitioning.expiration_ms | int      | optional  | nil     | Number of milliseconds for which to keep the storage for a partition. partition |
+|  schema_update_options            | array    | optional  | nil     | (Experimental) List of `ALLOW_FIELD_ADDITION` or `ALLOW_FIELD_RELAXATION` or both. See [jobs#configuration.load.schemaUpdateOptions](https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load.schemaUpdateOptions). NOTE for the current status: `schema_update_options` does not work for `copy` job, that is, is not effective for most of modes such as `append`, `append_direct`, `replace`, `replace_backup` (except `delete_in_advance`) |
 ### Example
@@ -127,24 +127,25 @@ out:
 ##### append
-1. Load to temporary table.
+1. Load to temporary table (Create and WRITE_APPEND in parallel)
 2. Copy temporary table to destination table (or partition). (WRITE_APPEND)
 ##### append_direct
-Insert data into existing table (or partition) directly.
+1. Insert data into existing table (or partition) directly. (WRITE_APPEND in parallel)
 This is not transactional, i.e., if fails, the target table could have some rows inserted.
 ##### replace
-1. Load to temporary table.
+1. Load to temporary table (Create and WRITE_APPEND in parallel)
 2. Copy temporary table to destination table (or partition). (WRITE_TRUNCATE)
 ```is_skip_job_result_check``` must be false when replace mode
 ##### replace_backup
-1. Load to temporary table.
+1. Load to temporary table (Create and WRITE_APPEND in parallel)
 2. Copy destination table (or partition) to backup table (or partition). (dataset_old, table_old)
 3. Copy temporary table to destination table (or partition). (WRITE_TRUNCATE)
@@ -316,7 +317,7 @@ Therefore, it is recommended to format records with filter plugins written in Ja
 filters:
   - type: to_json
     column: {name: payload, type: string}
-    default_format: %Y-%m-%d %H:%M:%S.%6N
+    default_format: "%Y-%m-%d %H:%M:%S.%6N"
 out:
   type: bigquery
   payload_column_index: 0 # or, payload_column: payload
@@ -397,24 +398,12 @@ out:
     expiration_ms: 259200000
 ```
-Use `schema_update_options` to allow the schema of the desitination table to be updated as a side effect of the load job as:
-```yaml
-out:
-  type: bigquery
-  table: table_name$20160929
-  auto_create_table: true
-  time_partitioning:
-    type: DAY
-    expiration_ms: 259200000
-  schema_update_options:
-    - ALLOW_FIELD_ADDITION
-    - ALLOW_FIELD_RELAXATION
-```
+Use [Tables: patch](https://cloud.google.com/bigquery/docs/reference/v2/tables/patch) API to update the schema of the partitioned table, embulk-output-bigquery itself does not support it, though.
+Note that only adding a new column, and relaxing non-necessary columns to be `NULLABLE` are supported now. Deleting columns, and renaming columns are not supported.
-It seems that only adding a new column, and relaxing non-necessary columns to be `NULLABLE` are supported now.
-Deleting columns, and renaming columns are not supported.
-See [jobs#configuration.load.schemaUpdateOptions](https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load.schemaUpdateOptions) for details.
+MEMO: [jobs#configuration.load.schemaUpdateOptions](https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load.schemaUpdateOptions) is available
+to update the schema of the destination table as a side effect of the load job, but it is not available for a copy job.
+Thus, it was not suitable for embulk-output-bigquery idempotence modes, `append`, `replace`, and `replace_backup`, sigh.
 ## Development
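
A note on the `default_format` change in the README hunk above: in YAML, a plain (unquoted) scalar may not begin with `%`, so the old example would likely fail to parse. The sketch below checks this with Ruby's bundled YAML parser. The `parse_format` helper and the two config fragments are hypothetical and only illustrate the quoting issue; they are not part of the gem.

```ruby
require 'yaml'

# Hypothetical config fragments, reduced to the one key that changed.
unquoted = "default_format: %Y-%m-%d %H:%M:%S.%6N\n"
quoted   = %(default_format: "%Y-%m-%d %H:%M:%S.%6N"\n)

# Hypothetical helper: returns the parsed format string,
# or nil when the YAML text is not syntactically valid.
def parse_format(yaml_text)
  YAML.safe_load(yaml_text)['default_format']
rescue Psych::SyntaxError
  nil
end

puts parse_format(unquoted).inspect
puts parse_format(quoted).inspect
```

Quoting the value keeps the `%` sequences as literal text for the filter plugin to interpret.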

data/embulk-output-bigquery.gemspec CHANGED

@@ -1,6 +1,6 @@
 Gem::Specification.new do |spec|
   spec.name          = "embulk-output-bigquery"
-  spec.version       = "0.4.2"
+  spec.version       = "0.4.3"
   spec.authors       = ["Satoshi Akama", "Naotoshi Seo"]
   spec.summary       = "Google BigQuery output plugin for Embulk"
   spec.description   = "Embulk plugin that insert records to Google BigQuery."
@@ -13,7 +13,8 @@ Gem::Specification.new do |spec|
   spec.require_paths = ["lib"]
   spec.add_dependency 'google-api-client'
-  spec.add_dependency "tzinfo"
+  spec.add_dependency 'time_with_zone'
   spec.add_development_dependency 'embulk', ['>= 0.8.2']
   spec.add_development_dependency 'bundler', ['>= 1.10.6']
   spec.add_development_dependency 'rake', ['>= 10.0']

data/lib/embulk/output/bigquery/bigquery_client.rb CHANGED

@@ -104,6 +104,11 @@ module Embulk
                   }
                 }
               }
+              if @task['schema_update_options']
+                body[:configuration][:load][:schema_update_options] = @task['schema_update_options']
+              end
               opts = {}
               Embulk.logger.debug { "embulk-output-bigquery: insert_job(#{@project}, #{body}, #{opts})" }
@@ -258,10 +263,6 @@ module Embulk
                 }
               }
-              if @task['schema_update_options']
-                body[:configuration][:copy][:schema_update_options] = @task['schema_update_options']
-              end
               opts = {}
               Embulk.logger.debug { "embulk-output-bigquery: insert_job(#{@project}, #{body}, #{opts})" }
               response = with_network_retry { client.insert_job(@project, body, opts) }
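
The two hunks above move the `schema_update_options` assignment from the copy job body to a load job body. As a minimal sketch of the resulting behavior, assuming a simplified body: the `load_job_body` helper and the `write_disposition` entry below are hypothetical stand-ins, since the plugin builds a much larger hash inline.

```ruby
# Hypothetical helper mirroring the added lines: the option is attached
# to the load configuration only when the task defines it.
def load_job_body(task, load_config)
  body = { configuration: { load: load_config.dup } }
  options = task['schema_update_options']
  body[:configuration][:load][:schema_update_options] = options if options
  body
end

task = { 'schema_update_options' => ['ALLOW_FIELD_ADDITION'] }
with_options    = load_job_body(task, { write_disposition: 'WRITE_APPEND' })
without_options = load_job_body({}, { write_disposition: 'WRITE_APPEND' })

puts with_options.inspect
puts without_options.inspect
```

When the task has no `schema_update_options`, the key is left out of the body entirely rather than being sent as nil.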

data/lib/embulk/output/bigquery/value_converter_factory.rb CHANGED

@@ -1,5 +1,5 @@
 require 'time'
-require 'tzinfo'
+require 'time_with_zone'
 require 'json'
 require_relative 'helper'
@@ -23,8 +23,8 @@ module Embulk
         # @return [Array] an arary whose key is column_index, and value is its converter (Proc)
         def self.create_converters(task, schema)
           column_options_map       = Helper.column_options_map(task['column_options'])
-          default_timestamp_format = task['default_timestamp_format']
-          default_timezone         = task['default_timezone']
+          default_timestamp_format = task['default_timestamp_format'] || DEFAULT_TIMESTAMP_FORMAT
+          default_timezone         = task['default_timezone'] || DEFAULT_TIMEZONE
           schema.map do |column|
             column_name   = column[:name]
             embulk_type   = column[:type]
@@ -53,7 +53,7 @@ module Embulk
           @timestamp_format = timestamp_format
           @default_timestamp_format = default_timestamp_format
           @timezone         = timezone || default_timezone
-          @zone_offset      = get_zone_offset(@timezone) if @timezone
+          @zone_offset      = TimeWithZone.zone_offset(@timezone)
           @strict           = strict.nil? ? true : strict
         end
@@ -194,7 +194,7 @@ module Embulk
               Proc.new {|val|
                 next nil if val.nil?
                 with_typecast_error(val) do |val|
-                  strptime_with_zone(val, @timestamp_format, zone_offset).to_f
+                  TimeWithZone.set_zone_offset(Time.strptime(val, @timestamp_format), zone_offset).strftime("%Y-%m-%d %H:%M:%S.%6N %:z")
                 end
               }
             else
@@ -238,7 +238,7 @@ module Embulk
           when 'TIMESTAMP'
             Proc.new {|val|
               next nil if val.nil?
-              val.to_f # BigQuery supports UNIX timestamp
+              val.strftime("%Y-%m-%d %H:%M:%S.%6N %:z")
             }
           else
             raise NotSupportedType, "cannot take column type #{type} for timestamp column"
@@ -261,31 +261,6 @@ module Embulk
             raise NotSupportedType, "cannot take column type #{type} for json column"
           end
         end
-        private
-        # [+-]HH:MM, [+-]HHMM, [+-]HH
-        NUMERIC_PATTERN = %r{\A[+-]\d\d(:?\d\d)?\z}
-        # Region/Zone, Region/Zone/Zone
-        NAME_PATTERN = %r{\A[^/]+/[^/]+(/[^/]+)?\z}
-        def strptime_with_zone(date, timestamp_format, zone_offset)
-          time = Time.strptime(date, timestamp_format)
-          utc_offset = time.utc_offset
-          time.localtime(zone_offset) + utc_offset - zone_offset
-        end
-        def get_zone_offset(timezone)
-          if NUMERIC_PATTERN === timezone
-            Time.zone_offset(timezone)
-          elsif NAME_PATTERN === timezone || 'UTC' == timezone
-            tz = TZInfo::Timezone.get(timezone)
-            tz.period_for_utc(Time.now).utc_total_offset
-          else
-            raise ArgumentError, "timezone format is invalid: #{timezone}"
-          end
-        end
       end
     end
   end
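
The converter hunks above replace the UNIX-epoch float (`to_f`) with a formatted string for TIMESTAMP columns. The sketch below contrasts the two representations using only Ruby's core `Time` class. The `bq_timestamp` helper is a hypothetical name, and `Time.new` with an explicit offset stands in for the `time_with_zone` gem, whose behavior is not reproduced here.

```ruby
# Hypothetical helper using the same strftime pattern as the new converter.
def bq_timestamp(time)
  time.strftime("%Y-%m-%d %H:%M:%S.%6N %:z")
end

# 2016-02-26 00:00:00.5 UTC (the seventh argument is microseconds).
utc = Time.utc(2016, 2, 26, 0, 0, 0, 500_000)

# The same wall-clock date at a fixed +09:00 offset (an assumed stand-in
# for an 'Asia/Tokyo' zone setting).
tokyo = Time.new(2016, 2, 26, 0, 0, 0, "+09:00")

puts utc.to_f              # old representation: epoch seconds as a float
puts bq_timestamp(utc)     # new representation: formatted string with offset
puts tokyo.to_f
puts bq_timestamp(tokyo)
```

The string form keeps the offset visible in the output, which matches the updated expectations in the test files of this diff.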

data/test/test_file_writer.rb CHANGED

@@ -43,7 +43,7 @@ module Embulk
       end
       def record
-        [true, 1, 1.1, 'foo', Time.parse("2016-02-26 00:00:00 +09:00"), {"foo"=>"foo"}]
+        [true, 1, 1.1, 'foo', Time.parse("2016-02-26 00:00:00 +00:00").utc, {"foo"=>"foo"}]
       end
       def page
@@ -81,7 +81,7 @@ module Embulk
           formatter_proc = file_writer.instance_variable_get(:@formatter_proc)
           assert_equal :to_csv, formatter_proc.name
-          expected = %Q[true,1,1.1,foo,1456412400.0,"{""foo"":""foo""}"\n]
+          expected = %Q[true,1,1.1,foo,2016-02-26 00:00:00.000000 +00:00,"{""foo"":""foo""}"\n]
           assert_equal expected, formatter_proc.call(record)
         end
@@ -91,7 +91,7 @@ module Embulk
           formatter_proc = file_writer.instance_variable_get(:@formatter_proc)
           assert_equal :to_jsonl, formatter_proc.name
-          expected = %Q[{"boolean":true,"long":1,"double":1.1,"string":"foo","timestamp":1456412400.0,"json":"{\\"foo\\":\\"foo\\"}"}\n]
+          expected = %Q[{"boolean":true,"long":1,"double":1.1,"string":"foo","timestamp":"2016-02-26 00:00:00.000000 +00:00","json":"{\\"foo\\":\\"foo\\"}"}\n]
           assert_equal expected, formatter_proc.call(record)
         end
       end

data/test/test_value_converter_factory.rb CHANGED

@@ -23,8 +23,8 @@ module Embulk
           assert_equal 1, converters[1].call(1)
           assert_equal 1.1, converters[2].call(1.1)
           assert_equal 'foo', converters[3].call('foo')
-          timestamp = Time.parse("2016-02-26 00:00:00.100000 UTC")
-          assert_equal 1456444800.1, converters[4].call(timestamp)
+          timestamp = Time.parse("2016-02-26 00:00:00.500000 +00:00")
+          assert_equal "2016-02-26 00:00:00.500000 +00:00", converters[4].call(timestamp)
           assert_equal %Q[{"foo":"foo"}], converters[5].call({'foo'=>'foo'})
         end
@@ -55,7 +55,7 @@ module Embulk
           assert_equal '1', converters[1].call(1)
           assert_equal '1.1', converters[2].call(1.1)
           assert_equal 1, converters[3].call('1')
-          timestamp = Time.parse("2016-02-26 00:00:00.100000 UTC")
+          timestamp = Time.parse("2016-02-26 00:00:00.100000 +00:00")
           assert_equal 1456444800, converters[4].call(timestamp)
           assert_equal({'foo'=>'foo'}, converters[5].call({'foo'=>'foo'}))
         end
@@ -208,7 +208,7 @@ module Embulk
             timestamp_format: '%Y-%m-%d', timezone: 'Asia/Tokyo'
           ).create_converter
           assert_equal nil, converter.call(nil)
-          assert_equal 1456412400.0, converter.call("2016-02-26")
+          assert_equal "2016-02-26 00:00:00.000000 +09:00", converter.call("2016-02-26")
           # Users must care of BQ timestamp format by themselves with no timestamp_format
           converter = ValueConverterFactory.new(SCHEMA_TYPE, 'TIMESTAMP').create_converter
@@ -240,22 +240,22 @@ module Embulk
         def test_float
           converter = ValueConverterFactory.new(SCHEMA_TYPE, 'FLOAT').create_converter
           assert_equal nil, converter.call(nil)
-          expected = 1456444800.100000
+          expected = 1456444800.500000
           assert_equal expected, converter.call(Time.at(expected))
         end
         def test_string
           converter = ValueConverterFactory.new(SCHEMA_TYPE, 'STRING').create_converter
           assert_equal nil, converter.call(nil)
-          timestamp = Time.parse("2016-02-26 00:00:00.100000 UTC")
-          expected = "2016-02-26 00:00:00.100000"
+          timestamp = Time.parse("2016-02-26 00:00:00.500000 +00:00")
+          expected = "2016-02-26 00:00:00.500000"
           assert_equal expected, converter.call(timestamp)
           converter = ValueConverterFactory.new(
             SCHEMA_TYPE, 'STRING',
             timestamp_format: '%Y-%m-%d', timezone: 'Asia/Tokyo'
           ).create_converter
-          timestamp = Time.parse("2016-02-25 15:00:00.100000 UTC")
+          timestamp = Time.parse("2016-02-25 15:00:00.500000 +00:00")
           expected = "2016-02-26"
           assert_equal expected, converter.call(timestamp)
         end
@@ -263,8 +263,9 @@ module Embulk
         def test_timestamp
           converter = ValueConverterFactory.new(SCHEMA_TYPE, 'TIMESTAMP').create_converter
           assert_equal nil, converter.call(nil)
-          expected = 1456444800.100000
-          assert_equal expected, converter.call(Time.at(expected))
+          subject = 1456444800.500000
+          expected = "2016-02-26 00:00:00.500000 +00:00"
+          assert_equal expected, converter.call(Time.at(subject).utc)
         end
         def test_record

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: embulk-output-bigquery
 version: !ruby/object:Gem::Version
-  version: 0.4.2
+  version: 0.4.3
 platform: ruby
 authors:
 - Satoshi Akama
@@ -9,78 +9,78 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2016-10-12 00:00:00.000000000 Z
+date: 2017-02-11 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
-  name: google-api-client
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
+  name: google-api-client
   prerelease: false
   type: :runtime
-- !ruby/object:Gem::Dependency
-  name: tzinfo
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
+- !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
+  name: time_with_zone
   prerelease: false
   type: :runtime
-- !ruby/object:Gem::Dependency
-  name: embulk
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version: 0.8.2
+        version: '0'
+- !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
         version: 0.8.2
+  name: embulk
   prerelease: false
   type: :development
-- !ruby/object:Gem::Dependency
-  name: bundler
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version: 1.10.6
+        version: 0.8.2
+- !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
         version: 1.10.6
+  name: bundler
   prerelease: false
   type: :development
-- !ruby/object:Gem::Dependency
-  name: rake
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version: '10.0'
+        version: 1.10.6
+- !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
         version: '10.0'
+  name: rake
   prerelease: false
   type: :development
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '10.0'
 description: Embulk plugin that insert records to Google BigQuery.
 email:
 - satoshiakama@gmail.com