embulk-output-bigquery 0.3.7 → 0.4.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +5 -0
- data/README.md +47 -17
- data/embulk-output-bigquery.gemspec +1 -1
- data/example/config_delete_in_advance_partitioned_table.yml +33 -0
- data/example/config_progress_log_interval.yml +31 -0
- data/example/config_replace_backup_paritioned_table.yml +34 -0
- data/example/config_replace_paritioned_table.yml +33 -0
- data/lib/embulk/output/bigquery.rb +55 -14
- data/lib/embulk/output/bigquery/bigquery_client.rb +63 -28
- data/lib/embulk/output/bigquery/file_writer.rb +13 -4
- data/lib/embulk/output/bigquery/helper.rb +10 -0
- data/test/test_bigquery_client.rb +41 -0
- data/test/test_configure.rb +17 -0
- data/test/test_example.rb +20 -11
- data/test/test_helper.rb +10 -0
- data/test/test_transaction.rb +169 -32
- metadata +6 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 71bc9b253f725436a06e183667cbc87720c3719b
+  data.tar.gz: a32e43da05a4f90ab72c5715ffdf6b08501996d4
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: bd3d8aefbc98c2f044b782f807f595603ac7b11052a06b6486803fd2f6871127058a50e9c69ffc1fac92b75de9561c57e99ad9ba3cd8899507e93085d45ed615
+  data.tar.gz: 813b6455f463940968232b4332b8553698b9ef99ad4f3f5af6800b10223c33498fde9f8915604090f85dc7c2f78d16a865cd90da2174447c7f84ab3ef80a4cf8
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,8 @@
+## 0.4.0 - 2016-10-01
+
+* [enhancement] Support partitioned table
+* [maintenance] Add `progress_log_interval` option to control the interval of showing progress log, and now showing progress log is off by default
+
 ## 0.3.7 - 2016-08-03
 
 * [maintenance] Fix Thread.new to use thread local variables to avoid nil idx error (thanks to @shyouhei and @umisora)
data/README.md
CHANGED
@@ -44,7 +44,7 @@ v0.3.x has incompatibility changes with v0.2.x. Please see [CHANGELOG.md](CHANGE
 | json_keyfile | string | required when auth_method is json_key | | Fullpath of json key |
 | project | string | required if json_keyfile is not given | | project_id |
 | dataset | string | required | | dataset |
-| table | string | required | | table name
+| table | string | required | | table name, or table name with a partition decorator such as `table_name$20160929`|
 | auto_create_dataset | boolean | optional | false | automatically create dataset |
 | auto_create_table | boolean | optional | false | See [Dynamic Table Creating](#dynamic-table-creating) |
 | schema_file | string | optional | | /path/to/schema.json |
@@ -63,6 +63,7 @@ v0.3.x has incompatibility changes with v0.2.x. Please see [CHANGELOG.md](CHANGE
 | payload_column_index | integer | optional | nil | See [Formatter Performance Issue](#formatter-performance-issue) |
 | gcs_bucket | stringr | optional | nil | See [GCS Bucket](#gcs-bucket) |
 | auto_create_gcs_bucket | boolean | optional | false | See [GCS Bucket](#gcs-bucket) |
+| progress_log_interval | float | optional | nil (Disabled) | Progress log interval. The progress log is disabled by nil (default). NOTE: This option may be removed in the future because a filter plugin can achieve the same goal |
 
 Client or request options
 
@@ -87,18 +88,21 @@ Options for intermediate local files
 
 `source_format` is also used to determine formatter (csv or jsonl).
 
-#### Same options of bq command-line tools or BigQuery job's
+#### Same options of bq command-line tools or BigQuery job's property
 
 Following options are same as [bq command-line tools](https://cloud.google.com/bigquery/bq-command-line-tool#creatingtablefromfile) or BigQuery [job's property](https://cloud.google.com/bigquery/docs/reference/v2/jobs#resource).
 
-| name
-
-| source_format
-| max_bad_records
-| field_delimiter
-| encoding
-| ignore_unknown_values
-| allow_quoted_newlines
+| name | type | required? | default | description |
+|:----------------------------------|:---------|:----------|:--------|:-----------------------|
+| source_format | string | required | "CSV" | File type (`NEWLINE_DELIMITED_JSON` or `CSV`) |
+| max_bad_records | int | optional | 0 | |
+| field_delimiter | char | optional | "," | |
+| encoding | string | optional | "UTF-8" | `UTF-8` or `ISO-8859-1` |
+| ignore_unknown_values | boolean | optional | false | |
+| allow_quoted_newlines | boolean | optional | false | Set true, if data contains newline characters. It may cause slow processing |
+| time_partitioning | hash | optional | nil | See [Time Partitioning](#time-partitioning) |
+| time_partitioning.type | string | required | nil | The only type supported is DAY, which will generate one partition per day based on data loading time. |
+| time_partitioning.expiration_ms | int | optional | nil | Number of milliseconds for which to keep the storage for a partition. |
 
 ### Example
 
@@ -123,32 +127,32 @@ out:
 ##### append
 
 1. Load to temporary table.
-2. Copy temporary table to destination table. (WRITE_APPEND)
+2. Copy temporary table to destination table (or partition). (WRITE_APPEND)
 
 ##### append_direct
 
-Insert data into existing table directly.
+Insert data into existing table (or partition) directly.
 This is not transactional, i.e., if fails, the target table could have some rows inserted.
 
 ##### replace
 
 1. Load to temporary table.
-2. Copy temporary table to destination table. (WRITE_TRUNCATE)
+2. Copy temporary table to destination table (or partition). (WRITE_TRUNCATE)
 
 ```is_skip_job_result_check``` must be false when replace mode
 
 ##### replace_backup
 
 1. Load to temporary table.
-2. Copy destination table to backup table. (dataset_old, table_old)
-3. Copy temporary table to destination table. (WRITE_TRUNCATE)
+2. Copy destination table (or partition) to backup table (or partition). (dataset_old, table_old)
+3. Copy temporary table to destination table (or partition). (WRITE_TRUNCATE)
 
 ```is_skip_job_result_check``` must be false when replace_backup mode.
 
 ##### delete_in_advance
 
-1. Delete destination table, if it exists.
-2. Load to destination table.
+1. Delete destination table (or partition), if it exists.
+2. Load to destination table (or partition).
 
 ### Authentication
 
@@ -366,6 +370,32 @@ out:
 
 ToDo: Use https://cloud.google.com/storage/docs/streaming if google-api-ruby-client supports streaming transfers into GCS.
 
+### Time Partitioning
+
+From 0.4.0, embulk-output-bigquery supports loading into a partitioned table.
+See also [Creating and Updating Date-Partitioned Tables](https://cloud.google.com/bigquery/docs/creating-partitioned-tables).
+
+To load into a partition, specify `table` parameter with a partition decorator as:
+
+```yaml
+out:
+  type: bigquery
+  table: table_name$20160929
+  auto_create_table: true
+```
+
+You may configure `time_partitioning` parameter together to create table via `auto_create_table: true` option as:
+
+```yaml
+out:
+  type: bigquery
+  table: table_name$20160929
+  auto_create_table: true
+  time_partitioning:
+    type: DAY
+    expiration_ms: 259200000
+```
+
 ## Development
 
 ### Run example:
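
A side note on the decorator format in the README examples above: the `$20160929` suffix is simply the target date rendered as `%Y%m%d` after a `$`. The file list also mentions `example/config_table_strftime.yml`, which suggests the plugin accepts strftime patterns in the `table` option; treat that as an assumption here. A minimal Ruby sketch of composing such a decorator (illustrative only, not plugin code):

```ruby
require 'time'

# Hypothetical helper: build a table name with a daily partition decorator.
# Only meant to show what the `table_name$YYYYMMDD` form expands to.
def partitioned_table(base, time = Time.now)
  "#{base}$#{time.strftime('%Y%m%d')}"
end

puts partitioned_table('table_name', Time.parse('2016-09-29')) #=> table_name$20160929
```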
data/embulk-output-bigquery.gemspec
CHANGED
@@ -1,6 +1,6 @@
 Gem::Specification.new do |spec|
   spec.name = "embulk-output-bigquery"
-  spec.version = "0.3.7"
+  spec.version = "0.4.0"
   spec.authors = ["Satoshi Akama", "Naotoshi Seo"]
   spec.summary = "Google BigQuery output plugin for Embulk"
   spec.description = "Embulk plugin that insert records to Google BigQuery."
data/example/config_delete_in_advance_partitioned_table.yml
ADDED
@@ -0,0 +1,33 @@
+in:
+  type: file
+  path_prefix: example/example.csv
+  parser:
+    type: csv
+    charset: UTF-8
+    newline: CRLF
+    null_string: 'NULL'
+    skip_header_lines: 1
+    comment_line_marker: '#'
+    columns:
+      - {name: date, type: string}
+      - {name: timestamp, type: timestamp, format: "%Y-%m-%d %H:%M:%S.%N", timezone: "+09:00"}
+      - {name: "null", type: string}
+      - {name: long, type: long}
+      - {name: string, type: string}
+      - {name: double, type: double}
+      - {name: boolean, type: boolean}
+out:
+  type: bigquery
+  mode: delete_in_advance
+  auth_method: json_key
+  json_keyfile: example/your-project-000.json
+  dataset: your_dataset_name
+  table: your_partitioned_table_name$20160929
+  source_format: NEWLINE_DELIMITED_JSON
+  compression: NONE
+  auto_create_dataset: true
+  auto_create_table: true
+  schema_file: example/schema.json
+  time_partitioning:
+    type: 'DAY'
+    expiration_ms: 100
data/example/config_progress_log_interval.yml
ADDED
@@ -0,0 +1,31 @@
+in:
+  type: file
+  path_prefix: example/example.csv
+  parser:
+    type: csv
+    charset: UTF-8
+    newline: CRLF
+    null_string: 'NULL'
+    skip_header_lines: 1
+    comment_line_marker: '#'
+    columns:
+      - {name: date, type: string}
+      - {name: timestamp, type: timestamp, format: "%Y-%m-%d %H:%M:%S.%N", timezone: "+09:00"}
+      - {name: "null", type: string}
+      - {name: long, type: long}
+      - {name: string, type: string}
+      - {name: double, type: double}
+      - {name: boolean, type: boolean}
+out:
+  type: bigquery
+  mode: replace
+  auth_method: json_key
+  json_keyfile: example/your-project-000.json
+  dataset: your_dataset_name
+  table: your_table_name
+  source_format: NEWLINE_DELIMITED_JSON
+  compression: NONE
+  auto_create_dataset: true
+  auto_create_table: true
+  schema_file: example/schema.json
+  progress_log_interval: 0.1
data/example/config_replace_backup_paritioned_table.yml
ADDED
@@ -0,0 +1,34 @@
+in:
+  type: file
+  path_prefix: example/example.csv
+  parser:
+    type: csv
+    charset: UTF-8
+    newline: CRLF
+    null_string: 'NULL'
+    skip_header_lines: 1
+    comment_line_marker: '#'
+    columns:
+      - {name: date, type: string}
+      - {name: timestamp, type: timestamp, format: "%Y-%m-%d %H:%M:%S.%N", timezone: "+09:00"}
+      - {name: "null", type: string}
+      - {name: long, type: long}
+      - {name: string, type: string}
+      - {name: double, type: double}
+      - {name: boolean, type: boolean}
+out:
+  type: bigquery
+  mode: replace_backup
+  auth_method: json_key
+  json_keyfile: example/your-project-000.json
+  dataset: your_dataset_name
+  table: your_partitioned_table_name$20160929
+  table_old: your_partitioned_table_name_old$20160929
+  source_format: NEWLINE_DELIMITED_JSON
+  compression: NONE
+  auto_create_dataset: true
+  auto_create_table: true
+  schema_file: example/schema.json
+  time_partitioning:
+    type: 'DAY'
+    expiration_ms: 100
data/example/config_replace_paritioned_table.yml
ADDED
@@ -0,0 +1,33 @@
+in:
+  type: file
+  path_prefix: example/example.csv
+  parser:
+    type: csv
+    charset: UTF-8
+    newline: CRLF
+    null_string: 'NULL'
+    skip_header_lines: 1
+    comment_line_marker: '#'
+    columns:
+      - {name: date, type: string}
+      - {name: timestamp, type: timestamp, format: "%Y-%m-%d %H:%M:%S.%N", timezone: "+09:00"}
+      - {name: "null", type: string}
+      - {name: long, type: long}
+      - {name: string, type: string}
+      - {name: double, type: double}
+      - {name: boolean, type: boolean}
+out:
+  type: bigquery
+  mode: replace
+  auth_method: json_key
+  json_keyfile: example/your-project-000.json
+  dataset: your_dataset_name
+  table: your_partitioned_table_name$20160929
+  source_format: NEWLINE_DELIMITED_JSON
+  compression: NONE
+  auto_create_dataset: true
+  auto_create_table: true
+  schema_file: example/schema.json
+  time_partitioning:
+    type: 'DAY'
+    expiration_ms: 100
data/lib/embulk/output/bigquery.rb
CHANGED
@@ -56,6 +56,7 @@ module Embulk
           'with_rehearsal' => config.param('with_rehearsal', :bool, :default => false),
           'rehearsal_counts' => config.param('rehearsal_counts', :integer, :default => 1000),
           'abort_on_error' => config.param('abort_on_error', :bool, :default => nil),
+          'progress_log_interval' => config.param('progress_log_interval', :float, :default => nil),
 
           'column_options' => config.param('column_options', :array, :default => []),
           'default_timezone' => config.param('default_timezone', :string, :default => ValueConverterFactory::DEFAULT_TIMEZONE),
@@ -84,6 +85,7 @@ module Embulk
           'encoding' => config.param('encoding', :string, :default => 'UTF-8'),
           'ignore_unknown_values' => config.param('ignore_unknown_values', :bool, :default => false),
           'allow_quoted_newlines' => config.param('allow_quoted_newlines', :bool, :default => false),
+          'time_partitioning' => config.param('time_partitioning', :hash, :default => nil),
 
           # for debug
           'skip_load' => config.param('skip_load', :bool, :default => false),
@@ -204,6 +206,8 @@ module Embulk
 
         if %w[replace replace_backup append].include?(task['mode'])
           task['temp_table'] ||= "LOAD_TEMP_#{unique_name}_#{task['table']}"
+        else
+          task['temp_table'] = nil
         end
 
         if task['with_rehearsal']
@@ -218,6 +222,14 @@ module Embulk
           task['abort_on_error'] = (task['max_bad_records'] == 0)
         end
 
+        if task['time_partitioning']
+          unless task['time_partitioning']['type']
+            raise ConfigError.new "`time_partitioning` must have `type` key"
+          end
+        elsif Helper.has_partition_decorator?(task['table'])
+          task['time_partitioning'] = {'type' => 'DAY'}
+        end
+
         task
       end
 
@@ -258,14 +270,7 @@ module Embulk
         }
       end
 
-      def self.transaction(config, schema, task_count, &control)
-        task = self.configure(config, schema, task_count)
-
-        @task = task
-        @schema = schema
-        @bigquery = BigqueryClient.new(task, schema)
-        @converters = ValueConverterFactory.create_converters(task, schema)
-
+      def self.auto_create(task, bigquery)
         if task['auto_create_dataset']
           bigquery.create_dataset(task['dataset'])
         else
@@ -282,18 +287,50 @@ module Embulk
 
         case task['mode']
         when 'delete_in_advance'
-          bigquery.delete_table(task['table'])
-          bigquery.create_table(task['table'])
+          if task['time_partitioning']
+            bigquery.delete_partition(task['table'])
+          else
+            bigquery.delete_table(task['table'])
+          end
+          bigquery.create_table(task['table'], options: task)
         when 'replace', 'replace_backup', 'append'
-          bigquery.create_table(task['temp_table'])
+          bigquery.create_table(task['temp_table'], options: task)
+          if task['time_partitioning']
+            if task['auto_create_table']
+              bigquery.create_table(task['table'], options: task)
+            else
+              bigquery.get_table(task['table']) # raises NotFoundError
+            end
+          end
        else # append_direct
          if task['auto_create_table']
-            bigquery.create_table(task['table'])
+            bigquery.create_table(task['table'], options: task)
          else
            bigquery.get_table(task['table']) # raises NotFoundError
          end
        end
+
+        if task['mode'] == 'replace_backup'
+          if task['time_partitioning'] and Helper.has_partition_decorator?(task['table_old'])
+            if task['auto_create_table']
+              bigquery.create_table(task['table_old'], dataset: task['dataset_old'], options: task)
+            else
+              bigquery.get_table(task['table_old'], dataset: task['dataset_old']) # raises NotFoundError
+            end
+          end
+        end
+      end
+
+      def self.transaction(config, schema, task_count, &control)
+        task = self.configure(config, schema, task_count)
+
+        @task = task
+        @schema = schema
+        @bigquery = BigqueryClient.new(task, schema)
+        @converters = ValueConverterFactory.create_converters(task, schema)
+
+        self.auto_create(@task, @bigquery)
+
         begin
           paths = []
           if task['skip_file_generation']
@@ -346,7 +383,11 @@ module Embulk
         end
 
         if task['mode'] == 'replace_backup'
-          bigquery.copy(task['table'], task['table_old'], task['dataset_old'])
+          begin
+            bigquery.get_table(task['table'])
+            bigquery.copy(task['table'], task['table_old'], task['dataset_old'])
+          rescue NotFoundError
+          end
         end
 
         if task['temp_table']
@@ -359,7 +400,7 @@ module Embulk
         end
       ensure
         begin
-          if task['temp_table'] # replace or replace_backup
+          if task['temp_table'] # append or replace or replace_backup
            bigquery.delete_table(task['temp_table'])
          end
        ensure
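
For readers skimming the `configure` change above: an explicit `time_partitioning` hash must carry a `type`, and a bare partition decorator on `table` now implies daily partitioning. A standalone sketch of that defaulting rule (the regexp and method name here are illustrative; the plugin does this via `Helper.has_partition_decorator?`):

```ruby
# Illustrative sketch of the configure-time defaulting added above.
PARTITION_DECORATOR = /\$.+\z/  # same pattern the plugin's Helper uses

def resolve_time_partitioning(task)
  if task['time_partitioning']
    raise ArgumentError, "`time_partitioning` must have `type` key" unless task['time_partitioning']['type']
  elsif task['table'] =~ PARTITION_DECORATOR
    task['time_partitioning'] = {'type' => 'DAY'}
  end
  task
end

p resolve_time_partitioning('table' => 'table_name$20160929')
#=> {"table"=>"table_name$20160929", "time_partitioning"=>{"type"=>"DAY"}}
p resolve_time_partitioning('table' => 'table_name')
#=> {"table"=>"table_name"}
```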
data/lib/embulk/output/bigquery/bigquery_client.rb
CHANGED
@@ -17,6 +17,14 @@ module Embulk
          reset_fields(fields) if fields
          @project = @task['project']
          @dataset = @task['dataset']
+
+          @task['source_format'] ||= 'CSV'
+          @task['max_bad_records'] ||= 0
+          @task['field_delimiter'] ||= ','
+          @task['source_format'] == 'CSV' ? @task['field_delimiter'] : nil
+          @task['encoding'] ||= 'UTF-8'
+          @task['ignore_unknown_values'] = false if @task['ignore_unknown_values'].nil?
+          @task['allow_quoted_newlines'] = false if @task['allow_quoted_newlines'].nil?
        end
 
        def fields
@@ -143,7 +151,7 @@ module Embulk
          responses
        end
 
-        def load(path, table)
+        def load(path, table, write_disposition: 'WRITE_APPEND')
          with_job_retry do
            begin
              if File.exist?(path)
@@ -175,7 +183,7 @@ module Embulk
                  schema: {
                    fields: fields,
                  },
-                  write_disposition: 'WRITE_APPEND',
+                  write_disposition: write_disposition,
                  source_format: @task['source_format'],
                  max_bad_records: @task['max_bad_records'],
                  field_delimiter: @task['source_format'] == 'CSV' ? @task['field_delimiter'] : nil,
@@ -233,15 +241,15 @@ module Embulk
                create_deposition: 'CREATE_IF_NEEDED',
                write_disposition: write_disposition,
                source_table: {
-                  project_id: @project,
-                  dataset_id: @dataset,
-                  table_id: source_table,
-                },
-                destination_table: {
-                  project_id: @project,
-                  dataset_id: @dataset,
-                  table_id: destination_table,
-                },
+                  project_id: @project,
+                  dataset_id: @dataset,
+                  table_id: source_table,
+                },
+                destination_table: {
+                  project_id: @project,
+                  dataset_id: destination_dataset,
+                  table_id: destination_table,
+                },
              }
            }
          }
@@ -363,9 +371,11 @@ module Embulk
          end
        end
 
-        def create_table(table)
+        def create_table(table, dataset: nil, options: {})
          begin
-            Embulk.logger.info { "embulk-output-bigquery: Create table... #{@project}:#{@dataset}.#{table}" }
+            table = Helper.chomp_partition_decorator(table)
+            dataset ||= @dataset
+            Embulk.logger.info { "embulk-output-bigquery: Create table... #{@project}:#{dataset}.#{table}" }
            body = {
              table_reference: {
                table_id: table,
@@ -374,9 +384,15 @@ module Embulk
                fields: fields,
              }
            }
+            if options['time_partitioning']
+              body[:time_partitioning] = {
+                type: options['time_partitioning']['type'],
+                expiration_ms: options['time_partitioning']['expiration_ms'],
+              }
+            end
            opts = {}
-            Embulk.logger.debug { "embulk-output-bigquery: insert_table(#{@project}, #{@dataset}, #{body}, #{opts})" }
-            with_network_retry { client.insert_table(@project, @dataset, body, opts) }
+            Embulk.logger.debug { "embulk-output-bigquery: insert_table(#{@project}, #{dataset}, #{body}, #{opts})" }
+            with_network_retry { client.insert_table(@project, dataset, body, opts) }
          rescue Google::Apis::ServerError, Google::Apis::ClientError, Google::Apis::AuthorizationError => e
            if e.status_code == 409 && /Already Exists:/ =~ e.message
              # ignore 'Already Exists' error
@@ -385,16 +401,18 @@ module Embulk
 
            response = {status_code: e.status_code, message: e.message, error_class: e.class}
            Embulk.logger.error {
-              "embulk-output-bigquery: insert_table(#{@project}, #{@dataset}, #{body}, #{opts}), response:#{response}"
+              "embulk-output-bigquery: insert_table(#{@project}, #{dataset}, #{body}, #{opts}), response:#{response}"
            }
-            raise Error, "failed to create table #{@project}:#{@dataset}.#{table}, response:#{response}"
+            raise Error, "failed to create table #{@project}:#{dataset}.#{table}, response:#{response}"
          end
        end
 
-        def delete_table(table)
+        def delete_table(table, dataset: nil)
          begin
-            Embulk.logger.info { "embulk-output-bigquery: Delete table... #{@project}:#{@dataset}.#{table}" }
-            with_network_retry { client.delete_table(@project, @dataset, table) }
+            table = Helper.chomp_partition_decorator(table)
+            dataset ||= @dataset
+            Embulk.logger.info { "embulk-output-bigquery: Delete table... #{@project}:#{dataset}.#{table}" }
+            with_network_retry { client.delete_table(@project, dataset, table) }
          rescue Google::Apis::ServerError, Google::Apis::ClientError, Google::Apis::AuthorizationError => e
            if e.status_code == 404 && /Not found:/ =~ e.message
              # ignore 'Not Found' error
@@ -403,26 +421,43 @@ module Embulk
 
            response = {status_code: e.status_code, message: e.message, error_class: e.class}
            Embulk.logger.error {
-              "embulk-output-bigquery: delete_table(#{@project}, #{@dataset}, #{table}), response:#{response}"
+              "embulk-output-bigquery: delete_table(#{@project}, #{dataset}, #{table}), response:#{response}"
            }
-            raise Error, "failed to delete table #{@project}:#{@dataset}.#{table}, response:#{response}"
+            raise Error, "failed to delete table #{@project}:#{dataset}.#{table}, response:#{response}"
          end
        end
 
-        def get_table(table)
+        def get_table(table, dataset: nil)
          begin
-            Embulk.logger.info { "embulk-output-bigquery: Get table... #{@project}:#{@dataset}.#{table}" }
-            with_network_retry { client.get_table(@project, @dataset, table) }
+            table = Helper.chomp_partition_decorator(table)
+            dataset ||= @dataset
+            Embulk.logger.info { "embulk-output-bigquery: Get table... #{@project}:#{dataset}.#{table}" }
+            with_network_retry { client.get_table(@project, dataset, table) }
          rescue Google::Apis::ServerError, Google::Apis::ClientError, Google::Apis::AuthorizationError => e
            if e.status_code == 404
-              raise NotFoundError, "Table #{@project}:#{@dataset}.#{table} is not found"
+              raise NotFoundError, "Table #{@project}:#{dataset}.#{table} is not found"
            end
 
            response = {status_code: e.status_code, message: e.message, error_class: e.class}
            Embulk.logger.error {
-              "embulk-output-bigquery: get_table(#{@project}, #{@dataset}, #{table}), response:#{response}"
+              "embulk-output-bigquery: get_table(#{@project}, #{dataset}, #{table}), response:#{response}"
            }
-            raise Error, "failed to get table #{@project}:#{@dataset}.#{table}, response:#{response}"
+            raise Error, "failed to get table #{@project}:#{dataset}.#{table}, response:#{response}"
+          end
+        end
+
+        # Is this only a way to drop partition?
+        def delete_partition(table_with_partition, dataset: nil)
+          dataset ||= @dataset
+          begin
+            table = Helper.chomp_partition_decorator(table_with_partition)
+            get_table(table, dataset: dataset)
+          rescue NotFoundError
+          else
+            Embulk.logger.info { "embulk-output-bigquery: Delete partition... #{@project}:#{dataset}.#{table_with_partition}" }
+            Tempfile.create('embulk_output_bigquery_empty_file_') do |fp|
+              load(fp.path, table_with_partition, write_disposition: 'WRITE_TRUNCATE')
+            end
          end
        end
      end
data/lib/embulk/output/bigquery/file_writer.rb
CHANGED
@@ -16,8 +16,11 @@ module Embulk
         @converters = converters || ValueConverterFactory.create_converters(task, schema)
 
         @num_rows = 0
-        @progress_log_timer = Time.now
-        @previous_num_rows = 0
+        if @task['progress_log_interval']
+          @progress_log_interval = @task['progress_log_interval']
+          @progress_log_timer = Time.now
+          @previous_num_rows = 0
+        end
 
         if @task['payload_column_index']
           @payload_column_index = @task['payload_column_index']
@@ -103,14 +106,20 @@ module Embulk
           _io.write formatted_record
           @num_rows += 1
         end
+        show_progress if @task['progress_log_interval']
+        @num_rows
+      end
+
+      private
+
+      def show_progress
         now = Time.now
-        if @progress_log_timer < now -
+        if @progress_log_timer < now - @progress_log_interval
           speed = ((@num_rows - @previous_num_rows) / (now - @progress_log_timer).to_f).round(1)
           @progress_log_timer = now
           @previous_num_rows = @num_rows
           Embulk.logger.info { "embulk-output-bigquery: num_rows #{num_format(@num_rows)} (#{num_format(speed)} rows/sec)" }
         end
-        @num_rows
       end
     end
   end
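
The progress log above reports throughput as rows written since the last log divided by the elapsed seconds. A tiny standalone sketch of that arithmetic (names are illustrative, not the plugin's internals):

```ruby
# Same rows/sec formula as show_progress, extracted for illustration.
def rows_per_sec(num_rows, previous_num_rows, now, last_logged_at)
  ((num_rows - previous_num_rows) / (now - last_logged_at).to_f).round(1)
end

last_logged_at = Time.now - 2.5
puts rows_per_sec(15_000, 10_000, Time.now, last_logged_at) # => roughly 2000.0 rows/sec
```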
data/lib/embulk/output/bigquery/helper.rb
CHANGED
@@ -5,6 +5,16 @@ module Embulk
   module Output
     class Bigquery < OutputPlugin
       class Helper
+        PARTITION_DECORATOR_REGEXP = /\$.+\z/
+
+        def self.has_partition_decorator?(table)
+          !!(table =~ PARTITION_DECORATOR_REGEXP)
+        end
+
+        def self.chomp_partition_decorator(table)
+          table.sub(PARTITION_DECORATOR_REGEXP, '')
+        end
+
         def self.bq_type_from_embulk_type(embulk_type)
           case embulk_type
           when :boolean then 'BOOLEAN'
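
The helper treats everything from the first `$` to the end of the name as a partition decorator. A quick sketch of the expected behavior, inlining the same regexp rather than loading the gem:

```ruby
# Mirrors Helper.has_partition_decorator? / Helper.chomp_partition_decorator.
PARTITION_DECORATOR_REGEXP = /\$.+\z/

p !!('table_name$20160929' =~ PARTITION_DECORATOR_REGEXP)   #=> true
p !!('table_name' =~ PARTITION_DECORATOR_REGEXP)            #=> false
p 'table_name$20160929'.sub(PARTITION_DECORATOR_REGEXP, '') #=> "table_name"
```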
data/test/test_bigquery_client.rb
CHANGED
@@ -105,6 +105,15 @@ else
       def test_create_table_already_exists
         assert_nothing_raised { client.create_table('your_table_name') }
       end
+
+      def test_create_partitioned_table
+        client.delete_table('your_table_name')
+        assert_nothing_raised do
+          client.create_table('your_table_name$20160929', options:{
+            'time_partitioning' => {'type'=>'DAY'}
+          })
+        end
+      end
     end
 
     sub_test_case "delete_table" do
@@ -116,6 +125,11 @@ else
       def test_delete_table_not_found
         assert_nothing_raised { client.delete_table('your_table_name') }
       end
+
+      def test_delete_partitioned_table
+        client.create_table('your_table_name')
+        assert_nothing_raised { client.delete_table('your_table_name$20160929') }
+      end
     end
 
     sub_test_case "get_table" do
@@ -130,6 +144,33 @@ else
           client.get_table('your_table_name')
         }
       end
+
+      def test_get_partitioned_table
+        client.create_table('your_table_name')
+        assert_nothing_raised { client.get_table('your_table_name$20160929') }
+      end
+    end
+
+    sub_test_case "delete_partition" do
+      def test_delete_partition
+        client.create_table('your_table_name$20160929', options:{
+          'time_partitioning' => {'type'=>'DAY'}
+        })
+        assert_nothing_raised { client.delete_partition('your_table_name$20160929') }
+      ensure
+        client.delete_table('your_table_name')
+      end
+
+      def test_delete_partition_of_non_partitioned_table
+        client.create_table('your_table_name')
+        assert_raise { client.delete_partition('your_table_name$20160929') }
+      ensure
+        client.delete_table('your_table_name')
+      end
+
+      def test_delete_partition_table_not_found
+        assert_nothing_raised { client.delete_partition('your_table_name$20160929') }
+      end
     end
 
     sub_test_case "fields" do
data/test/test_configure.rb
CHANGED
@@ -84,6 +84,7 @@ module Embulk
       assert_equal "UTF-8", task['encoding']
       assert_equal false, task['ignore_unknown_values']
       assert_equal false, task['allow_quoted_newlines']
+      assert_equal nil, task['time_partitioning']
       assert_equal false, task['skip_load']
     end
 
@@ -249,6 +250,22 @@ module Embulk
       task = Bigquery.configure(config, schema, processor_count)
       assert_equal '.foo', task['file_ext']
     end
+
+    def test_time_partitioning
+      config = least_config.merge('time_partitioning' => {'type' => 'DAY'})
+      assert_nothing_raised { Bigquery.configure(config, schema, processor_count) }
+
+      config = least_config.merge('time_partitioning' => {'foo' => 'bar'})
+      assert_raise { Bigquery.configure(config, schema, processor_count) }
+
+      config = least_config.merge('table' => 'table')
+      task = Bigquery.configure(config, schema, processor_count)
+      assert_equal nil, task['time_partitioning']
+
+      config = least_config.merge('table' => 'table_name$20160912')
+      task = Bigquery.configure(config, schema, processor_count)
+      assert_equal 'DAY', task['time_partitioning']['type']
+    end
   end
 end
 end
data/test/test_example.rb
CHANGED
@@ -18,19 +18,28 @@ else
     end
   end
 
-
-
-
+  def embulk_run(config_path)
+    Bundler.with_clean_env do
+      cmd = "#{embulk_path} run -X page_size=1 -b . -l trace #{config_path}"
+      puts "=" * 64
+      puts cmd
+      system(cmd)
+    end
+  end
+
+  files = Dir.glob("#{APP_ROOT}/example/config_*.yml").reject {|file| File.symlink?(file) }.sort
   files.each do |config_path|
-
-
-
-
-
-
-
+    if %w[
+      config_expose_errors.yml
+      config_prevent_duplicate_insert.yml
+    ].include?(File.basename(config_path))
+      define_method(:"test_#{File.basename(config_path, ".yml")}") do
+        assert_false embulk_run(config_path)
+      end
+    else
+      define_method(:"test_#{File.basename(config_path, ".yml")}") do
+        assert_true embulk_run(config_path)
       end
-      assert_true success
     end
   end
 end
data/test/test_helper.rb
CHANGED
@@ -14,6 +14,16 @@ module Embulk
     end
   end
 
+  def has_partition_decorator?
+    assert_true Helper.has_partition_decorator?('table$20160929')
+    assert_false Helper.has_partition_decorator?('table')
+  end
+
+  def chomp_partition_decorator
+    assert_equal 'table', Helper.chomp_partition_decorator?('table$20160929')
+    assert_equal 'table', Helper.chomp_partition_decorator?('table')
+  end
+
   def bq_type_from_embulk_type
     assert_equal 'BOOLEAN', Helper.bq_type_from_embulk_type(:boolean)
     assert_equal 'STRING', Helper.bq_type_from_embulk_type(:string)
data/test/test_transaction.rb
CHANGED
@@ -8,10 +8,12 @@ module Embulk
   class TestTransaction < Test::Unit::TestCase
     def least_config
       DataSource.new({
-        'project'
-        'dataset'
-        'table'
+        'project' => 'your_project_name',
+        'dataset' => 'your_dataset_name',
+        'table' => 'your_table_name',
         'p12_keyfile' => __FILE__, # fake
+        'temp_table' => 'temp_table', # randomly created is not good for our test
+        'path_prefix' => 'tmp/', # randomly created is not good for our test
       })
     end
 
@@ -38,17 +40,6 @@ module Embulk
       stub(Bigquery).transaction_report { {'num_input_rows' => 1, 'num_output_rows' => 1, 'num_rejected_rows' => 0} }
     end
 
-    def test_append
-      config = least_config.merge('mode' => 'append', 'temp_table' => 'temp_table')
-      any_instance_of(BigqueryClient) do |obj|
-        mock(obj).get_dataset(config['dataset'])
-        mock(obj).create_table(config['temp_table'])
-        mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_APPEND')
-        mock(obj).delete_table(config['temp_table'])
-      end
-      Bigquery.transaction(config, schema, processor_count, &control)
-    end
-
     sub_test_case "append_direct" do
       def test_append_direct
         config = least_config.merge('mode' => 'append_direct')
@@ -61,43 +52,108 @@ module Embulk
 
      def test_append_direct_with_auto_create
        config = least_config.merge('mode' => 'append_direct', 'auto_create_dataset' => true, 'auto_create_table' => true)
+        task = Bigquery.configure(config, schema, processor_count)
        any_instance_of(BigqueryClient) do |obj|
          mock(obj).create_dataset(config['dataset'])
-          mock(obj).create_table(config['table'])
+          mock(obj).create_table(config['table'], options: task)
+        end
+        Bigquery.transaction(config, schema, processor_count, &control)
+      end
+
+      def test_append_direct_with_partition
+        config = least_config.merge('mode' => 'append_direct', 'table' => 'table$20160929')
+        any_instance_of(BigqueryClient) do |obj|
+          mock(obj).get_dataset(config['dataset'])
+          mock(obj).get_table(config['table'])
+        end
+        Bigquery.transaction(config, schema, processor_count, &control)
+      end
+
+      def test_append_direct_with_partition_with_auto_create
+        config = least_config.merge('mode' => 'append_direct', 'table' => 'table$20160929', 'auto_create_dataset' => true, 'auto_create_table' => true)
+        task = Bigquery.configure(config, schema, processor_count)
+        any_instance_of(BigqueryClient) do |obj|
+          mock(obj).create_dataset(config['dataset'])
+          mock(obj).create_table(config['table'], options: task)
        end
        Bigquery.transaction(config, schema, processor_count, &control)
      end
    end
 
-
-
-
-
-
-
+    sub_test_case "delete_in_advance" do
+      def test_delete_in_advance
+        config = least_config.merge('mode' => 'delete_in_advance')
+        task = Bigquery.configure(config, schema, processor_count)
+        any_instance_of(BigqueryClient) do |obj|
+          mock(obj).get_dataset(config['dataset'])
+          mock(obj).delete_table(config['table'])
+          mock(obj).create_table(config['table'], options: task)
+        end
+        Bigquery.transaction(config, schema, processor_count, &control)
+      end
+
+      def test_delete_in_advance_with_partitioning
+        config = least_config.merge('mode' => 'delete_in_advance', 'table' => 'table$20160929')
+        task = Bigquery.configure(config, schema, processor_count)
+        any_instance_of(BigqueryClient) do |obj|
+          mock(obj).get_dataset(config['dataset'])
+          mock(obj).delete_partition(config['table'])
+          mock(obj).create_table(config['table'], options: task)
+        end
+        Bigquery.transaction(config, schema, processor_count, &control)
      end
-      Bigquery.transaction(config, schema, processor_count, &control)
    end
 
-
-
-
-
-
-
-
+    sub_test_case "replace" do
+      def test_replace
+        config = least_config.merge('mode' => 'replace')
+        task = Bigquery.configure(config, schema, processor_count)
+        any_instance_of(BigqueryClient) do |obj|
+          mock(obj).get_dataset(config['dataset'])
+          mock(obj).create_table(config['temp_table'], options: task)
+          mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
+          mock(obj).delete_table(config['temp_table'])
+        end
+        Bigquery.transaction(config, schema, processor_count, &control)
+      end
+
+      def test_replace_with_partitioning
+        config = least_config.merge('mode' => 'replace', 'table' => 'table$20160929')
+        task = Bigquery.configure(config, schema, processor_count)
+        any_instance_of(BigqueryClient) do |obj|
+          mock(obj).get_dataset(config['dataset'])
+          mock(obj).create_table(config['temp_table'], options: task)
+          mock(obj).get_table(config['table'])
+          mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
+          mock(obj).delete_table(config['temp_table'])
+        end
+        Bigquery.transaction(config, schema, processor_count, &control)
+      end
+
+      def test_replace_with_partitioning_with_auto_create_table
+        config = least_config.merge('mode' => 'replace', 'table' => 'table$20160929', 'auto_create_table' => true)
+        task = Bigquery.configure(config, schema, processor_count)
+        any_instance_of(BigqueryClient) do |obj|
+          mock(obj).get_dataset(config['dataset'])
+          mock(obj).create_table(config['temp_table'], options: task)
+          mock(obj).create_table(config['table'], options: task)
+          mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
+          mock(obj).delete_table(config['temp_table'])
+        end
+        Bigquery.transaction(config, schema, processor_count, &control)
      end
-      Bigquery.transaction(config, schema, processor_count, &control)
    end
 
    sub_test_case "replace_backup" do
      def test_replace_backup
        config = least_config.merge('mode' => 'replace_backup', 'dataset_old' => 'dataset_old', 'table_old' => 'table_old', 'temp_table' => 'temp_table')
+        task = Bigquery.configure(config, schema, processor_count)
        any_instance_of(BigqueryClient) do |obj|
          mock(obj).get_dataset(config['dataset'])
          mock(obj).get_dataset(config['dataset_old'])
-          mock(obj).create_table(config['temp_table'])
+          mock(obj).create_table(config['temp_table'], options: task)
 
+          mock(obj).get_table(task['table'])
          mock(obj).copy(config['table'], config['table_old'], config['dataset_old'])
 
          mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
@@ -108,11 +164,51 @@ module Embulk
 
      def test_replace_backup_auto_create_dataset
        config = least_config.merge('mode' => 'replace_backup', 'dataset_old' => 'dataset_old', 'table_old' => 'table_old', 'temp_table' => 'temp_table', 'auto_create_dataset' => true)
+        task = Bigquery.configure(config, schema, processor_count)
        any_instance_of(BigqueryClient) do |obj|
          mock(obj).create_dataset(config['dataset'])
          mock(obj).create_dataset(config['dataset_old'], reference: config['dataset'])
-          mock(obj).create_table(config['temp_table'])
+          mock(obj).create_table(config['temp_table'], options: task)
 
+          mock(obj).get_table(task['table'])
+          mock(obj).copy(config['table'], config['table_old'], config['dataset_old'])
+
+          mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
+          mock(obj).delete_table(config['temp_table'])
+        end
+        Bigquery.transaction(config, schema, processor_count, &control)
+      end
+
+      def test_replace_backup_with_partitioning
+        config = least_config.merge('mode' => 'replace_backup', 'table' => 'table$20160929', 'dataset_old' => 'dataset_old', 'table_old' => 'table_old$20190929', 'temp_table' => 'temp_table')
+        task = Bigquery.configure(config, schema, processor_count)
+        any_instance_of(BigqueryClient) do |obj|
+          mock(obj).get_dataset(config['dataset'])
+          mock(obj).get_dataset(config['dataset_old'])
+          mock(obj).create_table(config['temp_table'], options: task)
+          mock(obj).get_table(task['table'])
+          mock(obj).get_table(task['table_old'], dataset: config['dataset_old'])
+
+          mock(obj).get_table(task['table'])
+          mock(obj).copy(config['table'], config['table_old'], config['dataset_old'])
+
+          mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
+          mock(obj).delete_table(config['temp_table'])
+        end
+        Bigquery.transaction(config, schema, processor_count, &control)
+      end
+
+      def test_replace_backup_with_partitioning_auto_create_table
+        config = least_config.merge('mode' => 'replace_backup', 'table' => 'table$20160929', 'dataset_old' => 'dataset_old', 'table_old' => 'table_old$20160929', 'temp_table' => 'temp_table', 'auto_create_table' => true)
+        task = Bigquery.configure(config, schema, processor_count)
+        any_instance_of(BigqueryClient) do |obj|
+          mock(obj).get_dataset(config['dataset'])
+          mock(obj).get_dataset(config['dataset_old'])
+          mock(obj).create_table(config['temp_table'], options: task)
+          mock(obj).create_table(task['table'], options: task)
+          mock(obj).create_table(task['table_old'], dataset: config['dataset_old'], options: task)
+
+          mock(obj).get_table(task['table'])
          mock(obj).copy(config['table'], config['table_old'], config['dataset_old'])
 
          mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
@@ -121,6 +217,47 @@ module Embulk
        Bigquery.transaction(config, schema, processor_count, &control)
      end
    end
+
+    sub_test_case "append" do
+      def test_append
+        config = least_config.merge('mode' => 'append')
+        task = Bigquery.configure(config, schema, processor_count)
+        any_instance_of(BigqueryClient) do |obj|
+          mock(obj).get_dataset(config['dataset'])
+          mock(obj).create_table(config['temp_table'], options: task)
+          mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_APPEND')
+          mock(obj).delete_table(config['temp_table'])
+        end
+        Bigquery.transaction(config, schema, processor_count, &control)
+      end
+
+      def test_append_with_partitioning
+        config = least_config.merge('mode' => 'append', 'table' => 'table$20160929')
+        task = Bigquery.configure(config, schema, processor_count)
+        any_instance_of(BigqueryClient) do |obj|
+          mock(obj).get_dataset(config['dataset'])
+          mock(obj).create_table(config['temp_table'], options: task)
+          mock(obj).get_table(config['table'])
+          mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_APPEND')
+          mock(obj).delete_table(config['temp_table'])
+        end
+        Bigquery.transaction(config, schema, processor_count, &control)
+      end
+
+      def test_append_with_partitioning_with_auto_create_table
+        config = least_config.merge('mode' => 'append', 'table' => 'table$20160929', 'auto_create_table' => true)
+        task = Bigquery.configure(config, schema, processor_count)
+        any_instance_of(BigqueryClient) do |obj|
+          mock(obj).get_dataset(config['dataset'])
+          mock(obj).create_table(config['temp_table'], options: task)
+          mock(obj).create_table(config['table'], options: task)
+          mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_APPEND')
+          mock(obj).delete_table(config['temp_table'])
+        end
+        Bigquery.transaction(config, schema, processor_count, &control)
+      end
+    end
+
   end
 end
 end
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: embulk-output-bigquery
 version: !ruby/object:Gem::Version
-  version: 0.3.7
+  version: 0.4.0
 platform: ruby
 authors:
 - Satoshi Akama
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2016-
+date: 2016-10-01 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: google-api-client
@@ -100,6 +100,7 @@ files:
 - example/config_client_options.yml
 - example/config_csv.yml
 - example/config_delete_in_advance.yml
+- example/config_delete_in_advance_partitioned_table.yml
 - example/config_expose_errors.yml
 - example/config_gcs.yml
 - example/config_guess_from_embulk_schema.yml
@@ -114,8 +115,11 @@ files:
 - example/config_payload_column.yml
 - example/config_payload_column_index.yml
 - example/config_prevent_duplicate_insert.yml
+- example/config_progress_log_interval.yml
 - example/config_replace.yml
 - example/config_replace_backup.yml
+- example/config_replace_backup_paritioned_table.yml
+- example/config_replace_paritioned_table.yml
 - example/config_skip_file_generation.yml
 - example/config_table_strftime.yml
 - example/config_template_table.yml