embulk-output-bigquery 0.3.7 → 0.4.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: f0c3b2728451f241f860fdcea92a26470db4203d
4
- data.tar.gz: e38b0f175e46685bd25d9a31a33358bd3220b8c6
3
+ metadata.gz: 71bc9b253f725436a06e183667cbc87720c3719b
4
+ data.tar.gz: a32e43da05a4f90ab72c5715ffdf6b08501996d4
5
5
  SHA512:
6
- metadata.gz: 4639014f1f5ba0a6e791ceb35dccde7cb68fc75162d1a6cbfa06d3e99f882fd397470a97c3dd4edfe47cef786fd884ac53e787e48d01921ca7c1bc6edea58dd1
7
- data.tar.gz: 5f6db056c5119510db2f00edd2351bd240f9e715e7acf6ad3da7a9dde4b0f5b9fb8b187b97641532dc6215b72dcb9317150da7260ae39235ab3ed22968bd9f67
6
+ metadata.gz: bd3d8aefbc98c2f044b782f807f595603ac7b11052a06b6486803fd2f6871127058a50e9c69ffc1fac92b75de9561c57e99ad9ba3cd8899507e93085d45ed615
7
+ data.tar.gz: 813b6455f463940968232b4332b8553698b9ef99ad4f3f5af6800b10223c33498fde9f8915604090f85dc7c2f78d16a865cd90da2174447c7f84ab3ef80a4cf8
data/CHANGELOG.md CHANGED
@@ -1,3 +1,8 @@
1
+ ## 0.4.0 - 2016-10-01
2
+
3
+ * [enhancement] Support partitioned table
4
+ * [maintenance] Add `progress_log_interval` option to control the interval of progress logging; progress logging is now off by default
5
+
1
6
  ## 0.3.7 - 2016-08-03
2
7
 
3
8
  * [maintenance] Fix Thread.new to use thread local variables to avoid nil idx error (thanks to @shyouhei and @umisora)
data/README.md CHANGED
@@ -44,7 +44,7 @@ v0.3.x has incompatibility changes with v0.2.x. Please see [CHANGELOG.md](CHANGE
44
44
  | json_keyfile | string | required when auth_method is json_key | | Fullpath of json key |
45
45
  | project | string | required if json_keyfile is not given | | project_id |
46
46
  | dataset | string | required | | dataset |
47
- | table | string | required | | table name |
47
+ | table | string | required | | table name, or table name with a partition decorator such as `table_name$20160929` |
48
48
  | auto_create_dataset | boolean | optional | false | automatically create dataset |
49
49
  | auto_create_table | boolean | optional | false | See [Dynamic Table Creating](#dynamic-table-creating) |
50
50
  | schema_file | string | optional | | /path/to/schema.json |
@@ -63,6 +63,7 @@ v0.3.x has incompatibility changes with v0.2.x. Please see [CHANGELOG.md](CHANGE
63
63
  | payload_column_index | integer | optional | nil | See [Formatter Performance Issue](#formatter-performance-issue) |
64
64
  | gcs_bucket | string | optional | nil | See [GCS Bucket](#gcs-bucket) |
65
65
  | auto_create_gcs_bucket | boolean | optional | false | See [GCS Bucket](#gcs-bucket) |
66
+ | progress_log_interval | float | optional | nil (disabled) | Progress log interval in seconds. Progress logging is disabled when nil (default). NOTE: This option may be removed in the future because a filter plugin can achieve the same goal |
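As a rough sketch (mirroring `example/config_progress_log_interval.yml`, which is added in this release), progress logging can be turned on like this:

```yaml
out:
  type: bigquery
  mode: replace
  auth_method: json_key
  json_keyfile: example/your-project-000.json
  dataset: your_dataset_name
  table: your_table_name
  progress_log_interval: 0.1  # emit a progress log roughly every 0.1 seconds
```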
66
67
 
67
68
  Client or request options
68
69
 
@@ -87,18 +88,21 @@ Options for intermediate local files
87
88
 
88
89
  `source_format` is also used to determine formatter (csv or jsonl).
89
90
 
90
- #### Same options of bq command-line tools or BigQuery job's propery
91
+ #### Same options of bq command-line tools or BigQuery job's property
91
92
 
92
93
  The following options are the same as [bq command-line tools](https://cloud.google.com/bigquery/bq-command-line-tool#creatingtablefromfile) or BigQuery [job's property](https://cloud.google.com/bigquery/docs/reference/v2/jobs#resource).
93
94
 
94
- | name | type | required? | default | description |
95
- |:--------------------------|:------------|:-----------|:-------------|:-----------------------|
96
- | source_format | string | required | "CSV" | File type (`NEWLINE_DELIMITED_JSON` or `CSV`) |
97
- | max_bad_records | int | optional | 0 | |
98
- | field_delimiter | char | optional | "," | |
99
- | encoding | string | optional | "UTF-8" | `UTF-8` or `ISO-8859-1` |
100
- | ignore_unknown_values | boolean | optional | 0 | |
101
- | allow_quoted_newlines | boolean | optional | 0 | Set true, if data contains newline characters. It may cause slow procsssing |
95
+ | name | type | required? | default | description |
96
+ |:----------------------------------|:---------|:----------|:--------|:-----------------------|
97
+ | source_format | string | required | "CSV" | File type (`NEWLINE_DELIMITED_JSON` or `CSV`) |
98
+ | max_bad_records | int | optional | 0 | |
99
+ | field_delimiter | char | optional | "," | |
100
+ | encoding | string | optional | "UTF-8" | `UTF-8` or `ISO-8859-1` |
101
+ | ignore_unknown_values | boolean | optional | false | |
102
+ | allow_quoted_newlines | boolean | optional | false | Set true if data contains newline characters. It may cause slow processing |
103
+ | time_partitioning | hash | optional | nil | See [Time Partitioning](#time-partitioning) |
104
+ | time_partitioning.type | string | required | nil | The only type supported is DAY, which will generate one partition per day based on data loading time. |
105
+ | time_partitioning.expiration_ms | int | optional | nil | Number of milliseconds for which to keep the storage for a partition |
102
106
 
103
107
  ### Example
104
108
 
@@ -123,32 +127,32 @@ out:
123
127
  ##### append
124
128
 
125
129
  1. Load to temporary table.
126
- 2. Copy temporary table to destination table. (WRITE_APPEND)
130
+ 2. Copy temporary table to destination table (or partition). (WRITE_APPEND)
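For illustration only (this config is not among the bundled examples), an `append` run into a daily partition could be sketched as:

```yaml
out:
  type: bigquery
  mode: append
  auth_method: json_key
  json_keyfile: example/your-project-000.json
  dataset: your_dataset_name
  table: your_table_name$20160929  # partition decorator, see the `table` option above
```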
127
131
 
128
132
  ##### append_direct
129
133
 
130
- Insert data into existing table directly.
134
+ Insert data into existing table (or partition) directly.
131
135
  This is not transactional, i.e., if it fails, the target table could have some rows inserted.
132
136
 
133
137
  ##### replace
134
138
 
135
139
  1. Load to temporary table.
136
- 2. Copy temporary table to destination table. (WRITE_TRUNCATE)
140
+ 2. Copy temporary table to destination table (or partition). (WRITE_TRUNCATE)
137
141
 
138
142
  ```is_skip_job_result_check``` must be false in replace mode
139
143
 
140
144
  ##### replace_backup
141
145
 
142
146
  1. Load to temporary table.
143
- 2. Copy destination table to backup table. (dataset_old, table_old)
144
- 3. Copy temporary table to destination table. (WRITE_TRUNCATE)
147
+ 2. Copy destination table (or partition) to backup table (or partition). (dataset_old, table_old)
148
+ 3. Copy temporary table to destination table (or partition). (WRITE_TRUNCATE)
145
149
 
146
150
  ```is_skip_job_result_check``` must be false in replace_backup mode.
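A sketch based on `example/config_replace_backup_paritioned_table.yml` from this release, using `replace_backup` with partition decorators on both the destination and the backup table:

```yaml
out:
  type: bigquery
  mode: replace_backup
  auth_method: json_key
  json_keyfile: example/your-project-000.json
  dataset: your_dataset_name
  table: your_partitioned_table_name$20160929
  table_old: your_partitioned_table_name_old$20160929
  auto_create_table: true
  time_partitioning:
    type: DAY
```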
147
151
 
148
152
  ##### delete_in_advance
149
153
 
150
- 1. Delete destination table, if it exists.
151
- 2. Load to destination table.
154
+ 1. Delete destination table (or partition), if it exists.
155
+ 2. Load to destination table (or partition).
152
156
 
153
157
  ### Authentication
154
158
 
@@ -366,6 +370,32 @@ out:
366
370
 
367
371
  ToDo: Use https://cloud.google.com/storage/docs/streaming if google-api-ruby-client supports streaming transfers into GCS.
368
372
 
373
+ ### Time Partitioning
374
+
375
+ From 0.4.0, embulk-output-bigquery supports loading into a partitioned table.
376
+ See also [Creating and Updating Date-Partitioned Tables](https://cloud.google.com/bigquery/docs/creating-partitioned-tables).
377
+
378
+ To load into a partition, specify the `table` parameter with a partition decorator as:
379
+
380
+ ```yaml
381
+ out:
382
+ type: bigquery
383
+ table: table_name$20160929
384
+ auto_create_table: true
385
+ ```
386
+
387
+ You may also configure the `time_partitioning` parameter to create the table via the `auto_create_table: true` option as:
388
+
389
+ ```yaml
390
+ out:
391
+ type: bigquery
392
+ table: table_name$20160929
393
+ auto_create_table: true
394
+ time_partitioning:
395
+ type: DAY
396
+ expiration_ms: 259200000
397
+ ```
398
+
369
399
  ## Development
370
400
 
371
401
  ### Run example:
@@ -1,6 +1,6 @@
1
1
  Gem::Specification.new do |spec|
2
2
  spec.name = "embulk-output-bigquery"
3
- spec.version = "0.3.7"
3
+ spec.version = "0.4.0"
4
4
  spec.authors = ["Satoshi Akama", "Naotoshi Seo"]
5
5
  spec.summary = "Google BigQuery output plugin for Embulk"
6
6
  spec.description = "Embulk plugin that insert records to Google BigQuery."
@@ -0,0 +1,33 @@
1
+ in:
2
+ type: file
3
+ path_prefix: example/example.csv
4
+ parser:
5
+ type: csv
6
+ charset: UTF-8
7
+ newline: CRLF
8
+ null_string: 'NULL'
9
+ skip_header_lines: 1
10
+ comment_line_marker: '#'
11
+ columns:
12
+ - {name: date, type: string}
13
+ - {name: timestamp, type: timestamp, format: "%Y-%m-%d %H:%M:%S.%N", timezone: "+09:00"}
14
+ - {name: "null", type: string}
15
+ - {name: long, type: long}
16
+ - {name: string, type: string}
17
+ - {name: double, type: double}
18
+ - {name: boolean, type: boolean}
19
+ out:
20
+ type: bigquery
21
+ mode: delete_in_advance
22
+ auth_method: json_key
23
+ json_keyfile: example/your-project-000.json
24
+ dataset: your_dataset_name
25
+ table: your_partitioned_table_name$20160929
26
+ source_format: NEWLINE_DELIMITED_JSON
27
+ compression: NONE
28
+ auto_create_dataset: true
29
+ auto_create_table: true
30
+ schema_file: example/schema.json
31
+ time_partitioning:
32
+ type: 'DAY'
33
+ expiration_ms: 100
@@ -0,0 +1,31 @@
1
+ in:
2
+ type: file
3
+ path_prefix: example/example.csv
4
+ parser:
5
+ type: csv
6
+ charset: UTF-8
7
+ newline: CRLF
8
+ null_string: 'NULL'
9
+ skip_header_lines: 1
10
+ comment_line_marker: '#'
11
+ columns:
12
+ - {name: date, type: string}
13
+ - {name: timestamp, type: timestamp, format: "%Y-%m-%d %H:%M:%S.%N", timezone: "+09:00"}
14
+ - {name: "null", type: string}
15
+ - {name: long, type: long}
16
+ - {name: string, type: string}
17
+ - {name: double, type: double}
18
+ - {name: boolean, type: boolean}
19
+ out:
20
+ type: bigquery
21
+ mode: replace
22
+ auth_method: json_key
23
+ json_keyfile: example/your-project-000.json
24
+ dataset: your_dataset_name
25
+ table: your_table_name
26
+ source_format: NEWLINE_DELIMITED_JSON
27
+ compression: NONE
28
+ auto_create_dataset: true
29
+ auto_create_table: true
30
+ schema_file: example/schema.json
31
+ progress_log_interval: 0.1
@@ -0,0 +1,34 @@
1
+ in:
2
+ type: file
3
+ path_prefix: example/example.csv
4
+ parser:
5
+ type: csv
6
+ charset: UTF-8
7
+ newline: CRLF
8
+ null_string: 'NULL'
9
+ skip_header_lines: 1
10
+ comment_line_marker: '#'
11
+ columns:
12
+ - {name: date, type: string}
13
+ - {name: timestamp, type: timestamp, format: "%Y-%m-%d %H:%M:%S.%N", timezone: "+09:00"}
14
+ - {name: "null", type: string}
15
+ - {name: long, type: long}
16
+ - {name: string, type: string}
17
+ - {name: double, type: double}
18
+ - {name: boolean, type: boolean}
19
+ out:
20
+ type: bigquery
21
+ mode: replace_backup
22
+ auth_method: json_key
23
+ json_keyfile: example/your-project-000.json
24
+ dataset: your_dataset_name
25
+ table: your_partitioned_table_name$20160929
26
+ table_old: your_partitioned_table_name_old$20160929
27
+ source_format: NEWLINE_DELIMITED_JSON
28
+ compression: NONE
29
+ auto_create_dataset: true
30
+ auto_create_table: true
31
+ schema_file: example/schema.json
32
+ time_partitioning:
33
+ type: 'DAY'
34
+ expiration_ms: 100
@@ -0,0 +1,33 @@
1
+ in:
2
+ type: file
3
+ path_prefix: example/example.csv
4
+ parser:
5
+ type: csv
6
+ charset: UTF-8
7
+ newline: CRLF
8
+ null_string: 'NULL'
9
+ skip_header_lines: 1
10
+ comment_line_marker: '#'
11
+ columns:
12
+ - {name: date, type: string}
13
+ - {name: timestamp, type: timestamp, format: "%Y-%m-%d %H:%M:%S.%N", timezone: "+09:00"}
14
+ - {name: "null", type: string}
15
+ - {name: long, type: long}
16
+ - {name: string, type: string}
17
+ - {name: double, type: double}
18
+ - {name: boolean, type: boolean}
19
+ out:
20
+ type: bigquery
21
+ mode: replace
22
+ auth_method: json_key
23
+ json_keyfile: example/your-project-000.json
24
+ dataset: your_dataset_name
25
+ table: your_partitioned_table_name$20160929
26
+ source_format: NEWLINE_DELIMITED_JSON
27
+ compression: NONE
28
+ auto_create_dataset: true
29
+ auto_create_table: true
30
+ schema_file: example/schema.json
31
+ time_partitioning:
32
+ type: 'DAY'
33
+ expiration_ms: 100
@@ -56,6 +56,7 @@ module Embulk
56
56
  'with_rehearsal' => config.param('with_rehearsal', :bool, :default => false),
57
57
  'rehearsal_counts' => config.param('rehearsal_counts', :integer, :default => 1000),
58
58
  'abort_on_error' => config.param('abort_on_error', :bool, :default => nil),
59
+ 'progress_log_interval' => config.param('progress_log_interval', :float, :default => nil),
59
60
 
60
61
  'column_options' => config.param('column_options', :array, :default => []),
61
62
  'default_timezone' => config.param('default_timezone', :string, :default => ValueConverterFactory::DEFAULT_TIMEZONE),
@@ -84,6 +85,7 @@ module Embulk
84
85
  'encoding' => config.param('encoding', :string, :default => 'UTF-8'),
85
86
  'ignore_unknown_values' => config.param('ignore_unknown_values', :bool, :default => false),
86
87
  'allow_quoted_newlines' => config.param('allow_quoted_newlines', :bool, :default => false),
88
+ 'time_partitioning' => config.param('time_partitioning', :hash, :default => nil),
87
89
 
88
90
  # for debug
89
91
  'skip_load' => config.param('skip_load', :bool, :default => false),
@@ -204,6 +206,8 @@ module Embulk
204
206
 
205
207
  if %w[replace replace_backup append].include?(task['mode'])
206
208
  task['temp_table'] ||= "LOAD_TEMP_#{unique_name}_#{task['table']}"
209
+ else
210
+ task['temp_table'] = nil
207
211
  end
208
212
 
209
213
  if task['with_rehearsal']
@@ -218,6 +222,14 @@ module Embulk
218
222
  task['abort_on_error'] = (task['max_bad_records'] == 0)
219
223
  end
220
224
 
225
+ if task['time_partitioning']
226
+ unless task['time_partitioning']['type']
227
+ raise ConfigError.new "`time_partitioning` must have `type` key"
228
+ end
229
+ elsif Helper.has_partition_decorator?(task['table'])
230
+ task['time_partitioning'] = {'type' => 'DAY'}
231
+ end
232
+
221
233
  task
222
234
  end
223
235
 
@@ -258,14 +270,7 @@ module Embulk
258
270
  }
259
271
  end
260
272
 
261
- def self.transaction(config, schema, task_count, &control)
262
- task = self.configure(config, schema, task_count)
263
-
264
- @task = task
265
- @schema = schema
266
- @bigquery = BigqueryClient.new(task, schema)
267
- @converters = ValueConverterFactory.create_converters(task, schema)
268
-
273
+ def self.auto_create(task, bigquery)
269
274
  if task['auto_create_dataset']
270
275
  bigquery.create_dataset(task['dataset'])
271
276
  else
@@ -282,18 +287,50 @@ module Embulk
282
287
 
283
288
  case task['mode']
284
289
  when 'delete_in_advance'
285
- bigquery.delete_table(task['table'])
286
- bigquery.create_table(task['table'])
290
+ if task['time_partitioning']
291
+ bigquery.delete_partition(task['table'])
292
+ else
293
+ bigquery.delete_table(task['table'])
294
+ end
295
+ bigquery.create_table(task['table'], options: task)
287
296
  when 'replace', 'replace_backup', 'append'
288
- bigquery.create_table(task['temp_table'])
297
+ bigquery.create_table(task['temp_table'], options: task)
298
+ if task['time_partitioning']
299
+ if task['auto_create_table']
300
+ bigquery.create_table(task['table'], options: task)
301
+ else
302
+ bigquery.get_table(task['table']) # raises NotFoundError
303
+ end
304
+ end
289
305
  else # append_direct
290
306
  if task['auto_create_table']
291
- bigquery.create_table(task['table'])
307
+ bigquery.create_table(task['table'], options: task)
292
308
  else
293
309
  bigquery.get_table(task['table']) # raises NotFoundError
294
310
  end
295
311
  end
296
312
 
313
+ if task['mode'] == 'replace_backup'
314
+ if task['time_partitioning'] and Helper.has_partition_decorator?(task['table_old'])
315
+ if task['auto_create_table']
316
+ bigquery.create_table(task['table_old'], dataset: task['dataset_old'], options: task)
317
+ else
318
+ bigquery.get_table(task['table_old'], dataset: task['dataset_old']) # raises NotFoundError
319
+ end
320
+ end
321
+ end
322
+ end
323
+
324
+ def self.transaction(config, schema, task_count, &control)
325
+ task = self.configure(config, schema, task_count)
326
+
327
+ @task = task
328
+ @schema = schema
329
+ @bigquery = BigqueryClient.new(task, schema)
330
+ @converters = ValueConverterFactory.create_converters(task, schema)
331
+
332
+ self.auto_create(@task, @bigquery)
333
+
297
334
  begin
298
335
  paths = []
299
336
  if task['skip_file_generation']
@@ -346,7 +383,11 @@ module Embulk
346
383
  end
347
384
 
348
385
  if task['mode'] == 'replace_backup'
349
- bigquery.copy(task['table'], task['table_old'], task['dataset_old'])
386
+ begin
387
+ bigquery.get_table(task['table'])
388
+ bigquery.copy(task['table'], task['table_old'], task['dataset_old'])
389
+ rescue NotFoundError
390
+ end
350
391
  end
351
392
 
352
393
  if task['temp_table']
@@ -359,7 +400,7 @@ module Embulk
359
400
  end
360
401
  ensure
361
402
  begin
362
- if task['temp_table'] # replace or replace_backup
403
+ if task['temp_table'] # append or replace or replace_backup
363
404
  bigquery.delete_table(task['temp_table'])
364
405
  end
365
406
  ensure
@@ -17,6 +17,14 @@ module Embulk
17
17
  reset_fields(fields) if fields
18
18
  @project = @task['project']
19
19
  @dataset = @task['dataset']
20
+
21
+ @task['source_format'] ||= 'CSV'
22
+ @task['max_bad_records'] ||= 0
23
+ @task['field_delimiter'] ||= ','
24
+ @task['source_format'] == 'CSV' ? @task['field_delimiter'] : nil
25
+ @task['encoding'] ||= 'UTF-8'
26
+ @task['ignore_unknown_values'] = false if @task['ignore_unknown_values'].nil?
27
+ @task['allow_quoted_newlines'] = false if @task['allow_quoted_newlines'].nil?
20
28
  end
21
29
 
22
30
  def fields
@@ -143,7 +151,7 @@ module Embulk
143
151
  responses
144
152
  end
145
153
 
146
- def load(path, table)
154
+ def load(path, table, write_disposition: 'WRITE_APPEND')
147
155
  with_job_retry do
148
156
  begin
149
157
  if File.exist?(path)
@@ -175,7 +183,7 @@ module Embulk
175
183
  schema: {
176
184
  fields: fields,
177
185
  },
178
- write_disposition: 'WRITE_APPEND',
186
+ write_disposition: write_disposition,
179
187
  source_format: @task['source_format'],
180
188
  max_bad_records: @task['max_bad_records'],
181
189
  field_delimiter: @task['source_format'] == 'CSV' ? @task['field_delimiter'] : nil,
@@ -233,15 +241,15 @@ module Embulk
233
241
  create_deposition: 'CREATE_IF_NEEDED',
234
242
  write_disposition: write_disposition,
235
243
  source_table: {
236
- project_id: @project,
237
- dataset_id: @dataset,
238
- table_id: source_table,
239
- },
240
- destination_table: {
241
- project_id: @project,
242
- dataset_id: destination_dataset,
243
- table_id: destination_table,
244
- },
244
+ project_id: @project,
245
+ dataset_id: @dataset,
246
+ table_id: source_table,
247
+ },
248
+ destination_table: {
249
+ project_id: @project,
250
+ dataset_id: destination_dataset,
251
+ table_id: destination_table,
252
+ },
245
253
  }
246
254
  }
247
255
  }
@@ -363,9 +371,11 @@ module Embulk
363
371
  end
364
372
  end
365
373
 
366
- def create_table(table)
374
+ def create_table(table, dataset: nil, options: {})
367
375
  begin
368
- Embulk.logger.info { "embulk-output-bigquery: Create table... #{@project}:#{@dataset}.#{table}" }
376
+ table = Helper.chomp_partition_decorator(table)
377
+ dataset ||= @dataset
378
+ Embulk.logger.info { "embulk-output-bigquery: Create table... #{@project}:#{dataset}.#{table}" }
369
379
  body = {
370
380
  table_reference: {
371
381
  table_id: table,
@@ -374,9 +384,15 @@ module Embulk
374
384
  fields: fields,
375
385
  }
376
386
  }
387
+ if options['time_partitioning']
388
+ body[:time_partitioning] = {
389
+ type: options['time_partitioning']['type'],
390
+ expiration_ms: options['time_partitioning']['expiration_ms'],
391
+ }
392
+ end
377
393
  opts = {}
378
- Embulk.logger.debug { "embulk-output-bigquery: insert_table(#{@project}, #{@dataset}, #{body}, #{opts})" }
379
- with_network_retry { client.insert_table(@project, @dataset, body, opts) }
394
+ Embulk.logger.debug { "embulk-output-bigquery: insert_table(#{@project}, #{dataset}, #{body}, #{opts})" }
395
+ with_network_retry { client.insert_table(@project, dataset, body, opts) }
380
396
  rescue Google::Apis::ServerError, Google::Apis::ClientError, Google::Apis::AuthorizationError => e
381
397
  if e.status_code == 409 && /Already Exists:/ =~ e.message
382
398
  # ignore 'Already Exists' error
@@ -385,16 +401,18 @@ module Embulk
385
401
 
386
402
  response = {status_code: e.status_code, message: e.message, error_class: e.class}
387
403
  Embulk.logger.error {
388
- "embulk-output-bigquery: insert_table(#{@project}, #{@dataset}, #{body}, #{opts}), response:#{response}"
404
+ "embulk-output-bigquery: insert_table(#{@project}, #{dataset}, #{body}, #{opts}), response:#{response}"
389
405
  }
390
- raise Error, "failed to create table #{@project}:#{@dataset}.#{table}, response:#{response}"
406
+ raise Error, "failed to create table #{@project}:#{dataset}.#{table}, response:#{response}"
391
407
  end
392
408
  end
393
409
 
394
- def delete_table(table)
410
+ def delete_table(table, dataset: nil)
395
411
  begin
396
- Embulk.logger.info { "embulk-output-bigquery: Delete table... #{@project}:#{@dataset}.#{table}" }
397
- with_network_retry { client.delete_table(@project, @dataset, table) }
412
+ table = Helper.chomp_partition_decorator(table)
413
+ dataset ||= @dataset
414
+ Embulk.logger.info { "embulk-output-bigquery: Delete table... #{@project}:#{dataset}.#{table}" }
415
+ with_network_retry { client.delete_table(@project, dataset, table) }
398
416
  rescue Google::Apis::ServerError, Google::Apis::ClientError, Google::Apis::AuthorizationError => e
399
417
  if e.status_code == 404 && /Not found:/ =~ e.message
400
418
  # ignore 'Not Found' error
@@ -403,26 +421,43 @@ module Embulk
403
421
 
404
422
  response = {status_code: e.status_code, message: e.message, error_class: e.class}
405
423
  Embulk.logger.error {
406
- "embulk-output-bigquery: delete_table(#{@project}, #{@dataset}, #{table}), response:#{response}"
424
+ "embulk-output-bigquery: delete_table(#{@project}, #{dataset}, #{table}), response:#{response}"
407
425
  }
408
- raise Error, "failed to delete table #{@project}:#{@dataset}.#{table}, response:#{response}"
426
+ raise Error, "failed to delete table #{@project}:#{dataset}.#{table}, response:#{response}"
409
427
  end
410
428
  end
411
429
 
412
- def get_table(table)
430
+ def get_table(table, dataset: nil)
413
431
  begin
414
- Embulk.logger.info { "embulk-output-bigquery: Get table... #{@project}:#{@dataset}.#{table}" }
415
- with_network_retry { client.get_table(@project, @dataset, table) }
432
+ table = Helper.chomp_partition_decorator(table)
433
+ dataset ||= @dataset
434
+ Embulk.logger.info { "embulk-output-bigquery: Get table... #{@project}:#{dataset}.#{table}" }
435
+ with_network_retry { client.get_table(@project, dataset, table) }
416
436
  rescue Google::Apis::ServerError, Google::Apis::ClientError, Google::Apis::AuthorizationError => e
417
437
  if e.status_code == 404
418
- raise NotFoundError, "Table #{@project}:#{@dataset}.#{table} is not found"
438
+ raise NotFoundError, "Table #{@project}:#{dataset}.#{table} is not found"
419
439
  end
420
440
 
421
441
  response = {status_code: e.status_code, message: e.message, error_class: e.class}
422
442
  Embulk.logger.error {
423
- "embulk-output-bigquery: get_table(#{@project}, #{@dataset}, #{table}), response:#{response}"
443
+ "embulk-output-bigquery: get_table(#{@project}, #{dataset}, #{table}), response:#{response}"
424
444
  }
425
- raise Error, "failed to get table #{@project}:#{@dataset}.#{table}, response:#{response}"
445
+ raise Error, "failed to get table #{@project}:#{dataset}.#{table}, response:#{response}"
446
+ end
447
+ end
448
+
449
+ # Is this the only way to drop a partition?
450
+ def delete_partition(table_with_partition, dataset: nil)
451
+ dataset ||= @dataset
452
+ begin
453
+ table = Helper.chomp_partition_decorator(table_with_partition)
454
+ get_table(table, dataset: dataset)
455
+ rescue NotFoundError
456
+ else
457
+ Embulk.logger.info { "embulk-output-bigquery: Delete partition... #{@project}:#{dataset}.#{table_with_partition}" }
458
+ Tempfile.create('embulk_output_bigquery_empty_file_') do |fp|
459
+ load(fp.path, table_with_partition, write_disposition: 'WRITE_TRUNCATE')
460
+ end
426
461
  end
427
462
  end
428
463
  end
@@ -16,8 +16,11 @@ module Embulk
16
16
  @converters = converters || ValueConverterFactory.create_converters(task, schema)
17
17
 
18
18
  @num_rows = 0
19
- @progress_log_timer = Time.now
20
- @previous_num_rows = 0
19
+ if @task['progress_log_interval']
20
+ @progress_log_interval = @task['progress_log_interval']
21
+ @progress_log_timer = Time.now
22
+ @previous_num_rows = 0
23
+ end
21
24
 
22
25
  if @task['payload_column_index']
23
26
  @payload_column_index = @task['payload_column_index']
@@ -103,14 +106,20 @@ module Embulk
103
106
  _io.write formatted_record
104
107
  @num_rows += 1
105
108
  end
109
+ show_progress if @task['progress_log_interval']
110
+ @num_rows
111
+ end
112
+
113
+ private
114
+
115
+ def show_progress
106
116
  now = Time.now
107
- if @progress_log_timer < now - 10 # once in 10 seconds
117
+ if @progress_log_timer < now - @progress_log_interval
108
118
  speed = ((@num_rows - @previous_num_rows) / (now - @progress_log_timer).to_f).round(1)
109
119
  @progress_log_timer = now
110
120
  @previous_num_rows = @num_rows
111
121
  Embulk.logger.info { "embulk-output-bigquery: num_rows #{num_format(@num_rows)} (#{num_format(speed)} rows/sec)" }
112
122
  end
113
- @num_rows
114
123
  end
115
124
  end
116
125
  end
@@ -5,6 +5,16 @@ module Embulk
5
5
  module Output
6
6
  class Bigquery < OutputPlugin
7
7
  class Helper
8
+ PARTITION_DECORATOR_REGEXP = /\$.+\z/
9
+
10
+ def self.has_partition_decorator?(table)
11
+ !!(table =~ PARTITION_DECORATOR_REGEXP)
12
+ end
13
+
14
+ def self.chomp_partition_decorator(table)
15
+ table.sub(PARTITION_DECORATOR_REGEXP, '')
16
+ end
17
+
8
18
  def self.bq_type_from_embulk_type(embulk_type)
9
19
  case embulk_type
10
20
  when :boolean then 'BOOLEAN'
@@ -105,6 +105,15 @@ else
105
105
  def test_create_table_already_exists
106
106
  assert_nothing_raised { client.create_table('your_table_name') }
107
107
  end
108
+
109
+ def test_create_partitioned_table
110
+ client.delete_table('your_table_name')
111
+ assert_nothing_raised do
112
+ client.create_table('your_table_name$20160929', options:{
113
+ 'time_partitioning' => {'type'=>'DAY'}
114
+ })
115
+ end
116
+ end
108
117
  end
109
118
 
110
119
  sub_test_case "delete_table" do
@@ -116,6 +125,11 @@ else
116
125
  def test_delete_table_not_found
117
126
  assert_nothing_raised { client.delete_table('your_table_name') }
118
127
  end
128
+
129
+ def test_delete_partitioned_table
130
+ client.create_table('your_table_name')
131
+ assert_nothing_raised { client.delete_table('your_table_name$20160929') }
132
+ end
119
133
  end
120
134
 
121
135
  sub_test_case "get_table" do
@@ -130,6 +144,33 @@ else
130
144
  client.get_table('your_table_name')
131
145
  }
132
146
  end
147
+
148
+ def test_get_partitioned_table
149
+ client.create_table('your_table_name')
150
+ assert_nothing_raised { client.get_table('your_table_name$20160929') }
151
+ end
152
+ end
153
+
154
+ sub_test_case "delete_partition" do
155
+ def test_delete_partition
156
+ client.create_table('your_table_name$20160929', options:{
157
+ 'time_partitioning' => {'type'=>'DAY'}
158
+ })
159
+ assert_nothing_raised { client.delete_partition('your_table_name$20160929') }
160
+ ensure
161
+ client.delete_table('your_table_name')
162
+ end
163
+
164
+ def test_delete_partition_of_non_partitioned_table
165
+ client.create_table('your_table_name')
166
+ assert_raise { client.delete_partition('your_table_name$20160929') }
167
+ ensure
168
+ client.delete_table('your_table_name')
169
+ end
170
+
171
+ def test_delete_partition_table_not_found
172
+ assert_nothing_raised { client.delete_partition('your_table_name$20160929') }
173
+ end
133
174
  end
134
175
 
135
176
  sub_test_case "fields" do
@@ -84,6 +84,7 @@ module Embulk
84
84
  assert_equal "UTF-8", task['encoding']
85
85
  assert_equal false, task['ignore_unknown_values']
86
86
  assert_equal false, task['allow_quoted_newlines']
87
+ assert_equal nil, task['time_partitioning']
87
88
  assert_equal false, task['skip_load']
88
89
  end
89
90
 
@@ -249,6 +250,22 @@ module Embulk
249
250
  task = Bigquery.configure(config, schema, processor_count)
250
251
  assert_equal '.foo', task['file_ext']
251
252
  end
253
+
254
+ def test_time_partitioning
255
+ config = least_config.merge('time_partitioning' => {'type' => 'DAY'})
256
+ assert_nothing_raised { Bigquery.configure(config, schema, processor_count) }
257
+
258
+ config = least_config.merge('time_partitioning' => {'foo' => 'bar'})
259
+ assert_raise { Bigquery.configure(config, schema, processor_count) }
260
+
261
+ config = least_config.merge('table' => 'table')
262
+ task = Bigquery.configure(config, schema, processor_count)
263
+ assert_equal nil, task['time_partitioning']
264
+
265
+ config = least_config.merge('table' => 'table_name$20160912')
266
+ task = Bigquery.configure(config, schema, processor_count)
267
+ assert_equal 'DAY', task['time_partitioning']['type']
268
+ end
252
269
  end
253
270
  end
254
271
  end
data/test/test_example.rb CHANGED
@@ -18,19 +18,28 @@ else
18
18
  end
19
19
  end
20
20
 
21
- files = Dir.glob("#{APP_ROOT}/example/config_*.yml").sort
22
- files = files.reject {|file| File.symlink?(file) }
23
- # files.shift
21
+ def embulk_run(config_path)
22
+ Bundler.with_clean_env do
23
+ cmd = "#{embulk_path} run -X page_size=1 -b . -l trace #{config_path}"
24
+ puts "=" * 64
25
+ puts cmd
26
+ system(cmd)
27
+ end
28
+ end
29
+
30
+ files = Dir.glob("#{APP_ROOT}/example/config_*.yml").reject {|file| File.symlink?(file) }.sort
24
31
  files.each do |config_path|
25
- next if File.basename(config_path) == 'config_expose_errors.yml'
26
- define_method(:"test_#{File.basename(config_path, ".yml")}") do
27
- success = Bundler.with_clean_env do
28
- cmd = "#{embulk_path} run -X page_size=1 -b . -l trace #{config_path}"
29
- puts "=" * 64
30
- puts cmd
31
- system(cmd)
32
+ if %w[
33
+ config_expose_errors.yml
34
+ config_prevent_duplicate_insert.yml
35
+ ].include?(File.basename(config_path))
36
+ define_method(:"test_#{File.basename(config_path, ".yml")}") do
37
+ assert_false embulk_run(config_path)
38
+ end
39
+ else
40
+ define_method(:"test_#{File.basename(config_path, ".yml")}") do
41
+ assert_true embulk_run(config_path)
32
42
  end
33
- assert_true success
34
43
  end
35
44
  end
36
45
  end
data/test/test_helper.rb CHANGED
@@ -14,6 +14,16 @@ module Embulk
14
14
  end
15
15
  end
16
16
 
17
+ def has_partition_decorator?
18
+ assert_true Helper.has_partition_decorator?('table$20160929')
19
+ assert_false Helper.has_partition_decorator?('table')
20
+ end
21
+
22
+ def chomp_partition_decorator
23
+ assert_equal 'table', Helper.chomp_partition_decorator('table$20160929')
24
+ assert_equal 'table', Helper.chomp_partition_decorator('table')
25
+ end
26
+
17
27
  def bq_type_from_embulk_type
18
28
  assert_equal 'BOOLEAN', Helper.bq_type_from_embulk_type(:boolean)
19
29
  assert_equal 'STRING', Helper.bq_type_from_embulk_type(:string)
@@ -8,10 +8,12 @@ module Embulk
8
8
  class TestTransaction < Test::Unit::TestCase
9
9
  def least_config
10
10
  DataSource.new({
11
- 'project' => 'your_project_name',
12
- 'dataset' => 'your_dataset_name',
13
- 'table' => 'your_table_name',
11
+ 'project' => 'your_project_name',
12
+ 'dataset' => 'your_dataset_name',
13
+ 'table' => 'your_table_name',
14
14
  'p12_keyfile' => __FILE__, # fake
15
+ 'temp_table' => 'temp_table', # randomly created is not good for our test
16
+ 'path_prefix' => 'tmp/', # randomly created is not good for our test
15
17
  })
16
18
  end
17
19
 
@@ -38,17 +40,6 @@ module Embulk
38
40
  stub(Bigquery).transaction_report { {'num_input_rows' => 1, 'num_output_rows' => 1, 'num_rejected_rows' => 0} }
39
41
  end
40
42
 
41
- def test_append
42
- config = least_config.merge('mode' => 'append', 'temp_table' => 'temp_table')
43
- any_instance_of(BigqueryClient) do |obj|
44
- mock(obj).get_dataset(config['dataset'])
45
- mock(obj).create_table(config['temp_table'])
46
- mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_APPEND')
47
- mock(obj).delete_table(config['temp_table'])
48
- end
49
- Bigquery.transaction(config, schema, processor_count, &control)
50
- end
51
-
52
43
  sub_test_case "append_direct" do
53
44
  def test_append_direct
54
45
  config = least_config.merge('mode' => 'append_direct')
@@ -61,43 +52,108 @@ module Embulk
61
52
 
62
53
  def test_append_direct_with_auto_create
63
54
  config = least_config.merge('mode' => 'append_direct', 'auto_create_dataset' => true, 'auto_create_table' => true)
55
+ task = Bigquery.configure(config, schema, processor_count)
64
56
  any_instance_of(BigqueryClient) do |obj|
65
57
  mock(obj).create_dataset(config['dataset'])
66
- mock(obj).create_table(config['table'])
58
+ mock(obj).create_table(config['table'], options: task)
59
+ end
60
+ Bigquery.transaction(config, schema, processor_count, &control)
61
+ end
62
+
63
+ def test_append_direct_with_partition
64
+ config = least_config.merge('mode' => 'append_direct', 'table' => 'table$20160929')
65
+ any_instance_of(BigqueryClient) do |obj|
66
+ mock(obj).get_dataset(config['dataset'])
67
+ mock(obj).get_table(config['table'])
68
+ end
69
+ Bigquery.transaction(config, schema, processor_count, &control)
70
+ end
71
+
72
+ def test_append_direct_with_partition_with_auto_create
73
+ config = least_config.merge('mode' => 'append_direct', 'table' => 'table$20160929', 'auto_create_dataset' => true, 'auto_create_table' => true)
74
+ task = Bigquery.configure(config, schema, processor_count)
75
+ any_instance_of(BigqueryClient) do |obj|
76
+ mock(obj).create_dataset(config['dataset'])
77
+ mock(obj).create_table(config['table'], options: task)
67
78
  end
68
79
  Bigquery.transaction(config, schema, processor_count, &control)
69
80
  end
70
81
  end
71
82
 
72
- def test_delete_in_advance
73
- config = least_config.merge('mode' => 'delete_in_advance')
74
- any_instance_of(BigqueryClient) do |obj|
75
- mock(obj).get_dataset(config['dataset'])
76
- mock(obj).delete_table(config['table'])
77
- mock(obj).create_table(config['table'])
83
+ sub_test_case "delete_in_advance" do
84
+ def test_delete_in_advance
85
+ config = least_config.merge('mode' => 'delete_in_advance')
86
+ task = Bigquery.configure(config, schema, processor_count)
87
+ any_instance_of(BigqueryClient) do |obj|
88
+ mock(obj).get_dataset(config['dataset'])
89
+ mock(obj).delete_table(config['table'])
90
+ mock(obj).create_table(config['table'], options: task)
91
+ end
92
+ Bigquery.transaction(config, schema, processor_count, &control)
93
+ end
94
+
95
+ def test_delete_in_advance_with_partitioning
96
+ config = least_config.merge('mode' => 'delete_in_advance', 'table' => 'table$20160929')
97
+ task = Bigquery.configure(config, schema, processor_count)
98
+ any_instance_of(BigqueryClient) do |obj|
99
+ mock(obj).get_dataset(config['dataset'])
100
+ mock(obj).delete_partition(config['table'])
101
+ mock(obj).create_table(config['table'], options: task)
102
+ end
103
+ Bigquery.transaction(config, schema, processor_count, &control)
78
104
  end
79
- Bigquery.transaction(config, schema, processor_count, &control)
80
105
  end
81
106
 
82
- def test_replace
83
- config = least_config.merge('mode' => 'replace', 'temp_table' => 'temp_table')
84
- any_instance_of(BigqueryClient) do |obj|
85
- mock(obj).get_dataset(config['dataset'])
86
- mock(obj).create_table(config['temp_table'])
87
- mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
88
- mock(obj).delete_table(config['temp_table'])
107
+ sub_test_case "replace" do
108
+ def test_replace
109
+ config = least_config.merge('mode' => 'replace')
110
+ task = Bigquery.configure(config, schema, processor_count)
111
+ any_instance_of(BigqueryClient) do |obj|
112
+ mock(obj).get_dataset(config['dataset'])
113
+ mock(obj).create_table(config['temp_table'], options: task)
114
+ mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
115
+ mock(obj).delete_table(config['temp_table'])
116
+ end
117
+ Bigquery.transaction(config, schema, processor_count, &control)
118
+ end
119
+
120
+ def test_replace_with_partitioning
121
+ config = least_config.merge('mode' => 'replace', 'table' => 'table$20160929')
122
+ task = Bigquery.configure(config, schema, processor_count)
123
+ any_instance_of(BigqueryClient) do |obj|
124
+ mock(obj).get_dataset(config['dataset'])
125
+ mock(obj).create_table(config['temp_table'], options: task)
126
+ mock(obj).get_table(config['table'])
127
+ mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
128
+ mock(obj).delete_table(config['temp_table'])
129
+ end
130
+ Bigquery.transaction(config, schema, processor_count, &control)
131
+ end
132
+
133
+ def test_replace_with_partitioning_with_auto_create_table
134
+ config = least_config.merge('mode' => 'replace', 'table' => 'table$20160929', 'auto_create_table' => true)
135
+ task = Bigquery.configure(config, schema, processor_count)
136
+ any_instance_of(BigqueryClient) do |obj|
137
+ mock(obj).get_dataset(config['dataset'])
138
+ mock(obj).create_table(config['temp_table'], options: task)
139
+ mock(obj).create_table(config['table'], options: task)
140
+ mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
141
+ mock(obj).delete_table(config['temp_table'])
142
+ end
143
+ Bigquery.transaction(config, schema, processor_count, &control)
89
144
  end
90
- Bigquery.transaction(config, schema, processor_count, &control)
91
145
  end
92
146
 
93
147
  sub_test_case "replace_backup" do
94
148
  def test_replace_backup
95
149
  config = least_config.merge('mode' => 'replace_backup', 'dataset_old' => 'dataset_old', 'table_old' => 'table_old', 'temp_table' => 'temp_table')
150
+ task = Bigquery.configure(config, schema, processor_count)
96
151
  any_instance_of(BigqueryClient) do |obj|
97
152
  mock(obj).get_dataset(config['dataset'])
98
153
  mock(obj).get_dataset(config['dataset_old'])
99
- mock(obj).create_table(config['temp_table'])
154
+ mock(obj).create_table(config['temp_table'], options: task)
100
155
 
156
+ mock(obj).get_table(task['table'])
101
157
  mock(obj).copy(config['table'], config['table_old'], config['dataset_old'])
102
158
 
103
159
  mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
@@ -108,11 +164,51 @@ module Embulk
108
164
 
109
165
  def test_replace_backup_auto_create_dataset
110
166
  config = least_config.merge('mode' => 'replace_backup', 'dataset_old' => 'dataset_old', 'table_old' => 'table_old', 'temp_table' => 'temp_table', 'auto_create_dataset' => true)
167
+ task = Bigquery.configure(config, schema, processor_count)
111
168
  any_instance_of(BigqueryClient) do |obj|
112
169
  mock(obj).create_dataset(config['dataset'])
113
170
  mock(obj).create_dataset(config['dataset_old'], reference: config['dataset'])
114
- mock(obj).create_table(config['temp_table'])
171
+ mock(obj).create_table(config['temp_table'], options: task)
115
172
 
173
+ mock(obj).get_table(task['table'])
174
+ mock(obj).copy(config['table'], config['table_old'], config['dataset_old'])
175
+
176
+ mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
177
+ mock(obj).delete_table(config['temp_table'])
178
+ end
179
+ Bigquery.transaction(config, schema, processor_count, &control)
180
+ end
181
+
182
+ def test_replace_backup_with_partitioning
183
+ config = least_config.merge('mode' => 'replace_backup', 'table' => 'table$20160929', 'dataset_old' => 'dataset_old', 'table_old' => 'table_old$20190929', 'temp_table' => 'temp_table')
184
+ task = Bigquery.configure(config, schema, processor_count)
185
+ any_instance_of(BigqueryClient) do |obj|
186
+ mock(obj).get_dataset(config['dataset'])
187
+ mock(obj).get_dataset(config['dataset_old'])
188
+ mock(obj).create_table(config['temp_table'], options: task)
189
+ mock(obj).get_table(task['table'])
190
+ mock(obj).get_table(task['table_old'], dataset: config['dataset_old'])
191
+
192
+ mock(obj).get_table(task['table'])
193
+ mock(obj).copy(config['table'], config['table_old'], config['dataset_old'])
194
+
195
+ mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
196
+ mock(obj).delete_table(config['temp_table'])
197
+ end
198
+ Bigquery.transaction(config, schema, processor_count, &control)
199
+ end
200
+
201
+ def test_replace_backup_with_partitioning_auto_create_table
202
+ config = least_config.merge('mode' => 'replace_backup', 'table' => 'table$20160929', 'dataset_old' => 'dataset_old', 'table_old' => 'table_old$20160929', 'temp_table' => 'temp_table', 'auto_create_table' => true)
203
+ task = Bigquery.configure(config, schema, processor_count)
204
+ any_instance_of(BigqueryClient) do |obj|
205
+ mock(obj).get_dataset(config['dataset'])
206
+ mock(obj).get_dataset(config['dataset_old'])
207
+ mock(obj).create_table(config['temp_table'], options: task)
208
+ mock(obj).create_table(task['table'], options: task)
209
+ mock(obj).create_table(task['table_old'], dataset: config['dataset_old'], options: task)
210
+
211
+ mock(obj).get_table(task['table'])
116
212
  mock(obj).copy(config['table'], config['table_old'], config['dataset_old'])
117
213
 
118
214
  mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
@@ -121,6 +217,47 @@ module Embulk
121
217
  Bigquery.transaction(config, schema, processor_count, &control)
122
218
  end
123
219
  end
220
+
221
+ sub_test_case "append" do
222
+ def test_append
223
+ config = least_config.merge('mode' => 'append')
224
+ task = Bigquery.configure(config, schema, processor_count)
225
+ any_instance_of(BigqueryClient) do |obj|
226
+ mock(obj).get_dataset(config['dataset'])
227
+ mock(obj).create_table(config['temp_table'], options: task)
228
+ mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_APPEND')
229
+ mock(obj).delete_table(config['temp_table'])
230
+ end
231
+ Bigquery.transaction(config, schema, processor_count, &control)
232
+ end
233
+
234
+ def test_append_with_partitioning
235
+ config = least_config.merge('mode' => 'append', 'table' => 'table$20160929')
236
+ task = Bigquery.configure(config, schema, processor_count)
237
+ any_instance_of(BigqueryClient) do |obj|
238
+ mock(obj).get_dataset(config['dataset'])
239
+ mock(obj).create_table(config['temp_table'], options: task)
240
+ mock(obj).get_table(config['table'])
241
+ mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_APPEND')
242
+ mock(obj).delete_table(config['temp_table'])
243
+ end
244
+ Bigquery.transaction(config, schema, processor_count, &control)
245
+ end
246
+
247
+ def test_append_with_partitioning_with_auto_create_table
248
+ config = least_config.merge('mode' => 'append', 'table' => 'table$20160929', 'auto_create_table' => true)
249
+ task = Bigquery.configure(config, schema, processor_count)
250
+ any_instance_of(BigqueryClient) do |obj|
251
+ mock(obj).get_dataset(config['dataset'])
252
+ mock(obj).create_table(config['temp_table'], options: task)
253
+ mock(obj).create_table(config['table'], options: task)
254
+ mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_APPEND')
255
+ mock(obj).delete_table(config['temp_table'])
256
+ end
257
+ Bigquery.transaction(config, schema, processor_count, &control)
258
+ end
259
+ end
260
+
124
261
  end
125
262
  end
126
263
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: embulk-output-bigquery
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.7
4
+ version: 0.4.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Satoshi Akama
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2016-08-03 00:00:00.000000000 Z
12
+ date: 2016-10-01 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: google-api-client
@@ -100,6 +100,7 @@ files:
100
100
  - example/config_client_options.yml
101
101
  - example/config_csv.yml
102
102
  - example/config_delete_in_advance.yml
103
+ - example/config_delete_in_advance_partitioned_table.yml
103
104
  - example/config_expose_errors.yml
104
105
  - example/config_gcs.yml
105
106
  - example/config_guess_from_embulk_schema.yml
@@ -114,8 +115,11 @@ files:
114
115
  - example/config_payload_column.yml
115
116
  - example/config_payload_column_index.yml
116
117
  - example/config_prevent_duplicate_insert.yml
118
+ - example/config_progress_log_interval.yml
117
119
  - example/config_replace.yml
118
120
  - example/config_replace_backup.yml
121
+ - example/config_replace_backup_paritioned_table.yml
122
+ - example/config_replace_paritioned_table.yml
119
123
  - example/config_skip_file_generation.yml
120
124
  - example/config_table_strftime.yml
121
125
  - example/config_template_table.yml