embulk-output-bigquery 0.4.14 → 0.5.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +7 -0
- data/README.md +4 -36
- data/embulk-output-bigquery.gemspec +1 -1
- data/lib/embulk/output/bigquery.rb +11 -32
- data/lib/embulk/output/bigquery/bigquery_client.rb +2 -11
- data/test/test_configure.rb +6 -7
- data/test/test_example.rb +0 -1
- data/test/test_transaction.rb +17 -56
- metadata +1 -2
- data/example/config_prevent_duplicate_insert.yml +0 -30
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 3e0087103039718cb24224b6bb793d820b53b935194d412e4b2984aba3d7d7a8
+  data.tar.gz: 9ac27a3b881277450cbfaa096de0690c721a8f86f0e78abb692c8a4ed5b679d5
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6b0ccf4e349a5d15321cfcc97138a98676bddfd412fd6fadfc8b1e0d6cd31d9739a8a5f46ccd923644543ae43cc0134b3e7598f80d89c330a4ac8aec49c084c1
+  data.tar.gz: f02557cdd7956620ae59eb6bc0e5872992d20a65881bd69230b0b0442342a36203d1eedd8a20702d2000f412b909359657bfa300b3e82b5f494398ea6e5ea301
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,10 @@
+## 0.5.0 - 2019-08-10
+
+* [incompatibility change] Drop deprecated time\_partitioning.require\_partition\_filter
+* [incompatibility change] Drop prevent\_duplicate\_insert which has no use-case now
+* [incompatibility change] Change the default value of `auto_create_table` to `true` from `false`
+* Modes `replace`, `replace_backup`, `append`, and `delete_in_advance` (that is, all modes except `append_direct`) require `auto_create_table: true`.
+
 ## 0.4.14 - 2019-08-10
 
 * [enhancement] Support field partitioning correctly.
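To make the `auto_create_table` change concrete, here is a minimal sketch in the style of `test_configure.rb` below (`least_config`, `schema`, and `processor_count` are that suite's fixtures; the snippet is illustrative, not part of this release):

```ruby
# Under 0.5.0, a config that omits auto_create_table now resolves to true;
# through 0.4.x the same config resolved to false.
task = Bigquery.configure(least_config, schema, processor_count)
task['auto_create_table'] # => true
```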
data/README.md
CHANGED
@@ -23,14 +23,6 @@ https://developers.google.com/bigquery/loading-data-into-bigquery
 Current version of this plugin supports Google API with Service Account Authentication, but does not support
 OAuth flow for installed applications.
 
-### INCOMPATIBILITY CHANGES
-
-v0.3.x has incompatibility changes with v0.2.x. Please see [CHANGELOG.md](CHANGELOG.md) for details.
-
-* `formatter` option (formatter plugin support) is dropped. Use `source_format` option instead. (it already exists in v0.2.x too)
-* `encoders` option (encoder plugin support) is dropped. Use `compression` option instead (it already exists in v0.2.x too).
-* `mode: append` mode now expresses a transactional append, and `mode: append_direct` is one which is not transactional.
-
 ## Configuration
 
 #### Original options
@@ -47,10 +39,9 @@ v0.3.x has incompatibility changes with v0.2.x. Please see [CHANGELOG.md](CHANGE
 | location | string | optional | nil | geographic location of dataset. See [Location](#location) |
 | table | string | required | | table name, or table name with a partition decorator such as `table_name$20160929`|
 | auto_create_dataset | boolean | optional | false | automatically create dataset |
-| auto_create_table | boolean | optional |
+| auto_create_table | boolean | optional | true | `false` is available only for `append_direct` mode. Other modes require `true`. See [Dynamic Table Creating](#dynamic-table-creating) and [Time Partitioning](#time-partitioning) |
 | schema_file | string | optional | | /path/to/schema.json |
 | template_table | string | optional | | template table name. See [Dynamic Table Creating](#dynamic-table-creating) |
-| prevent_duplicate_insert | boolean | optional | false | See [Prevent Duplication](#prevent-duplication) |
 | job_status_max_polling_time | int | optional | 3600 sec | Max job status polling time |
 | job_status_polling_interval | int | optional | 10 sec | Job status polling interval |
 | is_skip_job_result_check | boolean | optional | false | Skip waiting Load job finishes. Available for append, or delete_in_advance mode |
@@ -107,7 +98,6 @@ Following options are same as [bq command-line tools](https://cloud.google.com/b
 | time_partitioning.type | string | required | nil | The only type supported is DAY, which will generate one partition per day based on data loading time. |
 | time_partitioning.expiration_ms | int | optional | nil | Number of milliseconds for which to keep the storage for a partition. |
 | time_partitioning.field | string | optional | nil | `DATE` or `TIMESTAMP` column used for partitioning |
-| time_partitioning.require_partition_filter | boolean | optional | nil | If true, valid partition filter is required when query |
 | clustering | hash | optional | nil | Currently, clustering is supported for partitioned tables, so must be used with `time_partitioning` option. See [clustered tables](https://cloud.google.com/bigquery/docs/clustered-tables) |
 | clustering.fields | array | required | nil | One or more fields on which data should be clustered. The order of the specified columns determines the sort order of the data. |
 | schema_update_options | array | optional | nil | (Experimental) List of `ALLOW_FIELD_ADDITION` or `ALLOW_FIELD_RELAXATION` or both. See [jobs#configuration.load.schemaUpdateOptions](https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load.schemaUpdateOptions). NOTE for the current status: `schema_update_options` does not work for `copy` job, that is, is not effective for most of modes such as `append`, `replace` and `replace_backup`. `delete_in_advance` deletes origin table so does not need to update schema. Only `append_direct` can utilize schema update. |
@@ -252,11 +242,6 @@ out:
 
 ### Dynamic table creating
 
-This plugin tries to create a table using BigQuery API when
-
-* mode is either of `delete_in_advance`, `replace`, `replace_backup`, `append`.
-* mode is `append_direct` and `auto_create_table` is true.
-
 There are 3 ways to set schema.
 
 #### Set schema.json
@@ -355,22 +340,6 @@ out:
   payload_column_index: 0 # or, payload_column: payload
 ```
 
-### Prevent Duplication
-
-`prevent_duplicate_insert` option is used to prevent inserting same data for modes `append` or `append_direct`.
-
-When `prevent_duplicate_insert` is set to true, embulk-output-bigquery generate job ID from md5 hash of file and other options.
-
-`job ID = md5(md5(file) + dataset + table + schema + source_format + file_delimiter + max_bad_records + encoding + ignore_unknown_values + allow_quoted_newlines)`
-
-[job ID must be unique(including failures)](https://cloud.google.com/bigquery/loading-data-into-bigquery#consistency) so that same data can't be inserted with same settings repeatedly.
-
-```yaml
-out:
-  type: bigquery
-  prevent_duplicate_insert: true
-```
-
 ### GCS Bucket
 
 This is useful to reduce number of consumed jobs, which is limited by [100,000 jobs per project per day](https://cloud.google.com/bigquery/quotas#load_jobs).
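For reference, the deterministic job ID that the removed section describes can be sketched from its own formula (a hypothetical reconstruction; the exact implementation of `Helper.create_load_job_id` is not shown in this diff):

```ruby
require 'digest/md5'

# A minimal sketch of the dropped scheme, following the quoted formula
# md5(md5(file) + dataset + table + ...); names are illustrative only.
def sketch_load_job_id(path, task)
  seed = Digest::MD5.file(path).hexdigest
  %w[dataset table source_format encoding].each do |key| # plus the other listed options
    seed += task[key].to_s
  end
  "embulk_load_job_#{Digest::MD5.hexdigest(seed)}"
end
```

Because BigQuery refuses to start two jobs with the same ID, re-running the same file with the same settings could not insert the data twice.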
@@ -401,32 +370,31 @@ To load into a partition, specify `table` parameter with a partition decorator a
 out:
   type: bigquery
   table: table_name$20160929
-  auto_create_table: true
 ```
 
-You may configure `time_partitioning` parameter together
+You may configure `time_partitioning` parameter together as:
 
 ```yaml
 out:
   type: bigquery
   table: table_name$20160929
-  auto_create_table: true
   time_partitioning:
     type: DAY
     expiration_ms: 259200000
 ```
 
 You can also create column-based partitioning table as:
+
 ```yaml
 out:
   type: bigquery
   mode: replace
-  auto_create_table: true
   table: table_name
   time_partitioning:
     type: DAY
     field: timestamp
 ```
+
 Note the `time_partitioning.field` should be top-level `DATE` or `TIMESTAMP`.
 
 Use [Tables: patch](https://cloud.google.com/bigquery/docs/reference/v2/tables/patch) API to update the schema of the partitioned table, embulk-output-bigquery itself does not support it, though.
data/embulk-output-bigquery.gemspec
CHANGED
@@ -1,6 +1,6 @@
 Gem::Specification.new do |spec|
   spec.name = "embulk-output-bigquery"
-  spec.version = "0.4.14"
+  spec.version = "0.5.0"
   spec.authors = ["Satoshi Akama", "Naotoshi Seo"]
   spec.summary = "Google BigQuery output plugin for Embulk"
   spec.description = "Embulk plugin that insert records to Google BigQuery."
data/lib/embulk/output/bigquery.rb
CHANGED
@@ -45,7 +45,7 @@ module Embulk
       'table_old' => config.param('table_old', :string, :default => nil),
       'table_name_old' => config.param('table_name_old', :string, :default => nil), # lower version compatibility
       'auto_create_dataset' => config.param('auto_create_dataset', :bool, :default => false),
-      'auto_create_table' => config.param('auto_create_table', :bool, :default => false),
+      'auto_create_table' => config.param('auto_create_table', :bool, :default => true),
       'schema_file' => config.param('schema_file', :string, :default => nil),
       'template_table' => config.param('template_table', :string, :default => nil),
 
@@ -53,7 +53,6 @@ module Embulk
       'job_status_max_polling_time' => config.param('job_status_max_polling_time', :integer, :default => 3600),
       'job_status_polling_interval' => config.param('job_status_polling_interval', :integer, :default => 10),
       'is_skip_job_result_check' => config.param('is_skip_job_result_check', :bool, :default => false),
-      'prevent_duplicate_insert' => config.param('prevent_duplicate_insert', :bool, :default => false),
       'with_rehearsal' => config.param('with_rehearsal', :bool, :default => false),
       'rehearsal_counts' => config.param('rehearsal_counts', :integer, :default => 1000),
       'abort_on_error' => config.param('abort_on_error', :bool, :default => nil),
@@ -105,10 +104,14 @@ module Embulk
         raise ConfigError.new "`mode` must be one of append, append_direct, replace, delete_in_advance, replace_backup"
       end
 
+      if %w[append replace delete_in_advance replace_backup].include?(task['mode']) and !task['auto_create_table']
+        raise ConfigError.new "`mode: #{task['mode']}` requires `auto_create_table: true`"
+      end
+
       if task['mode'] == 'replace_backup'
         task['table_old'] ||= task['table_name_old'] # for lower version compatibility
         if task['dataset_old'].nil? and task['table_old'].nil?
-          raise ConfigError.new "`mode replace_backup` requires either of `dataset_old` or `table_old`"
+          raise ConfigError.new "`mode: replace_backup` requires either of `dataset_old` or `table_old`"
         end
         task['dataset_old'] ||= task['dataset']
         task['table_old'] ||= task['table']
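The new guard can be exercised in the same style as `test_configure.rb` (a hypothetical snippet using that suite's `least_config`, `schema`, and `processor_count` fixtures, not part of this hunk):

```ruby
# Any mode other than append_direct now rejects auto_create_table: false
# at configure time.
config = least_config.merge('mode' => 'replace', 'auto_create_table' => false)
assert_raise(ConfigError) { Bigquery.configure(config, schema, processor_count) }

# append_direct remains the one mode where false is accepted.
config = least_config.merge('mode' => 'append_direct', 'auto_create_table' => false)
assert_nothing_raised { Bigquery.configure(config, schema, processor_count) }
```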
@@ -306,42 +309,18 @@ module Embulk
 
       case task['mode']
       when 'delete_in_advance'
-        bigquery.
+        bigquery.delete_table_or_partition(task['table'])
         bigquery.create_table_if_not_exists(task['table'])
       when 'replace'
         bigquery.create_table_if_not_exists(task['temp_table'])
-        if Helper.has_partition_decorator?(task['table'])
-          if task['auto_create_table']
-            bigquery.create_table_if_not_exists(task['table'])
-          else
-            bigquery.get_table(task['table']) # raises NotFoundError
-          end
-        end
+        bigquery.create_table_if_not_exists(task['table'])
       when 'append'
         bigquery.create_table_if_not_exists(task['temp_table'])
-        if Helper.has_partition_decorator?(task['table'])
-          if task['auto_create_table']
-            bigquery.create_table_if_not_exists(task['table'])
-          else
-            bigquery.get_table(task['table']) # raises NotFoundError
-          end
-        end
+        bigquery.create_table_if_not_exists(task['table'])
       when 'replace_backup'
         bigquery.create_table_if_not_exists(task['temp_table'])
-        if Helper.has_partition_decorator?(task['table'])
-          if task['auto_create_table']
-            bigquery.create_table_if_not_exists(task['table'])
-          else
-            bigquery.get_table(task['table']) # raises NotFoundError
-          end
-        end
-        if Helper.has_partition_decorator?(task['table_old'])
-          if task['auto_create_table']
-            bigquery.create_table_if_not_exists(task['table_old'], dataset: task['dataset_old'])
-          else
-            bigquery.get_table(task['table_old'], dataset: task['dataset_old']) # raises NotFoundError
-          end
-        end
+        bigquery.create_table_if_not_exists(task['table'])
+        bigquery.create_table_if_not_exists(task['table_old'], dataset: task['dataset_old'])
       else # append_direct
         if task['auto_create_table']
           bigquery.create_table_if_not_exists(task['table'])
data/lib/embulk/output/bigquery/bigquery_client.rb
CHANGED
@@ -79,11 +79,7 @@ module Embulk
       begin
         # As https://cloud.google.com/bigquery/docs/managing_jobs_datasets_projects#managingjobs says,
         # we should generate job_id in client code, otherwise, retrying would cause duplication
-        if @task['prevent_duplicate_insert'] and (@task['mode'] == 'append' or @task['mode'] == 'append_direct')
-          job_id = Helper.create_load_job_id(@task, path, fields)
-        else
-          job_id = "embulk_load_job_#{SecureRandom.uuid}"
-        end
+        job_id = "embulk_load_job_#{SecureRandom.uuid}"
         Embulk.logger.info { "embulk-output-bigquery: Load job starting... job_id:[#{job_id}] #{object_uris} => #{@project}:#{@dataset}.#{table} in #{@location_for_log}" }
 
         body = {
@@ -174,11 +170,7 @@ module Embulk
       if File.exist?(path)
         # As https://cloud.google.com/bigquery/docs/managing_jobs_datasets_projects#managingjobs says,
         # we should generate job_id in client code, otherwise, retrying would cause duplication
-        if @task['prevent_duplicate_insert'] and (@task['mode'] == 'append' or @task['mode'] == 'append_direct')
-          job_id = Helper.create_load_job_id(@task, path, fields)
-        else
-          job_id = "embulk_load_job_#{SecureRandom.uuid}"
-        end
+        job_id = "embulk_load_job_#{SecureRandom.uuid}"
         Embulk.logger.info { "embulk-output-bigquery: Load job starting... job_id:[#{job_id}] #{path} => #{@project}:#{@dataset}.#{table} in #{@location_for_log}" }
       else
         Embulk.logger.info { "embulk-output-bigquery: Load job starting... #{path} does not exist, skipped" }
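Both load paths now always use the random client-side ID. A minimal sketch of why that still protects against retry duplication (the `job_reference` shape is assumed from Ruby google-api-client conventions, not shown in this hunk):

```ruby
require 'securerandom'

# One ID per load attempt, generated client-side; BigQuery will not start a
# second job with an ID it has already seen, so retrying the same insert_job
# request cannot duplicate the load.
project = 'your-project-000' # illustrative
job_id  = "embulk_load_job_#{SecureRandom.uuid}"
body = {
  job_reference: { project_id: project, job_id: job_id },
  # configuration: { load: { ...destination table, source format, etc... } }
}
```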
@@ -437,7 +429,6 @@ module Embulk
           type: options['time_partitioning']['type'],
           expiration_ms: options['time_partitioning']['expiration_ms'],
           field: options['time_partitioning']['field'],
-          require_partition_filter: options['time_partitioning']['require_partition_filter'],
         }
       end
 
data/test/test_configure.rb
CHANGED
@@ -55,14 +55,13 @@ module Embulk
       assert_equal nil, task['table_old']
       assert_equal nil, task['table_name_old']
       assert_equal false, task['auto_create_dataset']
-      assert_equal false, task['auto_create_table']
+      assert_equal true, task['auto_create_table']
       assert_equal nil, task['schema_file']
       assert_equal nil, task['template_table']
       assert_equal true, task['delete_from_local_when_job_end']
       assert_equal 3600, task['job_status_max_polling_time']
       assert_equal 10, task['job_status_polling_interval']
       assert_equal false, task['is_skip_job_result_check']
-      assert_equal false, task['prevent_duplicate_insert']
       assert_equal false, task['with_rehearsal']
       assert_equal 1000, task['rehearsal_counts']
       assert_equal [], task['column_options']
@@ -162,22 +161,22 @@ module Embulk
     end
 
     def test_payload_column
-      config = least_config.merge('payload_column' => schema.first.name)
+      config = least_config.merge('payload_column' => schema.first.name, 'auto_create_table' => false, 'mode' => 'append_direct')
       task = Bigquery.configure(config, schema, processor_count)
       assert_equal task['payload_column_index'], 0
 
-      config = least_config.merge('payload_column' => 'not_exist')
+      config = least_config.merge('payload_column' => 'not_exist', 'auto_create_table' => false, 'mode' => 'append_direct')
       assert_raise { Bigquery.configure(config, schema, processor_count) }
     end
 
     def test_payload_column_index
-      config = least_config.merge('payload_column_index' => 0)
+      config = least_config.merge('payload_column_index' => 0, 'auto_create_table' => false, 'mode' => 'append_direct')
       assert_nothing_raised { Bigquery.configure(config, schema, processor_count) }
 
-      config = least_config.merge('payload_column_index' => -1)
+      config = least_config.merge('payload_column_index' => -1, 'auto_create_table' => false, 'mode' => 'append_direct')
       assert_raise { Bigquery.configure(config, schema, processor_count) }
 
-      config = least_config.merge('payload_column_index' => schema.size)
+      config = least_config.merge('payload_column_index' => schema.size, 'auto_create_table' => false, 'mode' => 'append_direct')
       assert_raise { Bigquery.configure(config, schema, processor_count) }
     end
 
data/test/test_example.rb
CHANGED
data/test/test_transaction.rb
CHANGED
@@ -41,8 +41,8 @@ module Embulk
     end
 
     sub_test_case "append_direct" do
-      def test_append_direct
-        config = least_config.merge('mode' => 'append_direct')
+      def test_append_direc_without_auto_create
+        config = least_config.merge('mode' => 'append_direct', 'auto_create_dataset' => false, 'auto_create_table' => false)
         any_instance_of(BigqueryClient) do |obj|
           mock(obj).get_dataset(config['dataset'])
           mock(obj).get_table(config['table'])
@@ -60,8 +60,8 @@ module Embulk
         Bigquery.transaction(config, schema, processor_count, &control)
       end
 
-      def test_append_direct_with_partition
-        config = least_config.merge('mode' => 'append_direct', 'table' => 'table$20160929')
+      def test_append_direct_with_partition_without_auto_create
+        config = least_config.merge('mode' => 'append_direct', 'table' => 'table$20160929', 'auto_create_dataset' => false, 'auto_create_table' => false)
         any_instance_of(BigqueryClient) do |obj|
           mock(obj).get_dataset(config['dataset'])
           mock(obj).get_table(config['table'])
@@ -86,7 +86,7 @@ module Embulk
         task = Bigquery.configure(config, schema, processor_count)
         any_instance_of(BigqueryClient) do |obj|
           mock(obj).get_dataset(config['dataset'])
-          mock(obj).
+          mock(obj).delete_table_or_partition(config['table'])
           mock(obj).create_table_if_not_exists(config['table'])
         end
         Bigquery.transaction(config, schema, processor_count, &control)
@@ -97,7 +97,7 @@ module Embulk
         task = Bigquery.configure(config, schema, processor_count)
         any_instance_of(BigqueryClient) do |obj|
           mock(obj).get_dataset(config['dataset'])
-          mock(obj).
+          mock(obj).delete_table_or_partition(config['table'])
           mock(obj).create_table_if_not_exists(config['table'])
         end
         Bigquery.transaction(config, schema, processor_count, &control)
@@ -111,6 +111,7 @@ module Embulk
         any_instance_of(BigqueryClient) do |obj|
           mock(obj).get_dataset(config['dataset'])
           mock(obj).create_table_if_not_exists(config['temp_table'])
+          mock(obj).create_table_if_not_exists(config['table'])
           mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
           mock(obj).delete_table(config['temp_table'])
         end
@@ -120,19 +121,6 @@ module Embulk
       def test_replace_with_partitioning
         config = least_config.merge('mode' => 'replace', 'table' => 'table$20160929')
         task = Bigquery.configure(config, schema, processor_count)
-        any_instance_of(BigqueryClient) do |obj|
-          mock(obj).get_dataset(config['dataset'])
-          mock(obj).create_table_if_not_exists(config['temp_table'])
-          mock(obj).get_table(config['table'])
-          mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
-          mock(obj).delete_table(config['temp_table'])
-        end
-        Bigquery.transaction(config, schema, processor_count, &control)
-      end
-
-      def test_replace_with_partitioning_with_auto_create_table
-        config = least_config.merge('mode' => 'replace', 'table' => 'table$20160929', 'auto_create_table' => true)
-        task = Bigquery.configure(config, schema, processor_count)
         any_instance_of(BigqueryClient) do |obj|
           mock(obj).get_dataset(config['dataset'])
           mock(obj).create_table_if_not_exists(config['temp_table'])
@@ -152,8 +140,10 @@ module Embulk
           mock(obj).get_dataset(config['dataset'])
           mock(obj).get_dataset(config['dataset_old'])
           mock(obj).create_table_if_not_exists(config['temp_table'])
+          mock(obj).create_table_if_not_exists(config['table'])
+          mock(obj).create_table_if_not_exists(config['table_old'], dataset: config['dataset_old'])
 
-          mock(obj).get_table_or_partition(task['table'])
+          mock(obj).get_table_or_partition(config['table'])
           mock(obj).copy(config['table'], config['table_old'], config['dataset_old'])
 
           mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
@@ -168,9 +158,11 @@ module Embulk
         any_instance_of(BigqueryClient) do |obj|
           mock(obj).create_dataset(config['dataset'])
           mock(obj).create_dataset(config['dataset_old'], reference: config['dataset'])
+          mock(obj).create_table_if_not_exists(config['table'])
           mock(obj).create_table_if_not_exists(config['temp_table'])
+          mock(obj).create_table_if_not_exists(config['table_old'], dataset: config['dataset_old'])
 
-          mock(obj).get_table_or_partition(task['table'])
+          mock(obj).get_table_or_partition(config['table'])
           mock(obj).copy(config['table'], config['table_old'], config['dataset_old'])
 
           mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
@@ -180,35 +172,16 @@ module Embulk
       end
 
       def test_replace_backup_with_partitioning
-        config = least_config.merge('mode' => 'replace_backup', 'table' => 'table$20160929', 'dataset_old' => 'dataset_old', 'table_old' => 'table_old$20190929', 'temp_table' => 'temp_table')
-        task = Bigquery.configure(config, schema, processor_count)
-        any_instance_of(BigqueryClient) do |obj|
-          mock(obj).get_dataset(config['dataset'])
-          mock(obj).get_dataset(config['dataset_old'])
-          mock(obj).create_table_if_not_exists(config['temp_table'])
-          mock(obj).get_table(task['table'])
-          mock(obj).get_table(task['table_old'], dataset: config['dataset_old'])
-
-          mock(obj).get_table_or_partition(task['table'])
-          mock(obj).copy(config['table'], config['table_old'], config['dataset_old'])
-
-          mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
-          mock(obj).delete_table(config['temp_table'])
-        end
-        Bigquery.transaction(config, schema, processor_count, &control)
-      end
-
-      def test_replace_backup_with_partitioning_auto_create_table
         config = least_config.merge('mode' => 'replace_backup', 'table' => 'table$20160929', 'dataset_old' => 'dataset_old', 'table_old' => 'table_old$20160929', 'temp_table' => 'temp_table', 'auto_create_table' => true)
         task = Bigquery.configure(config, schema, processor_count)
         any_instance_of(BigqueryClient) do |obj|
           mock(obj).get_dataset(config['dataset'])
           mock(obj).get_dataset(config['dataset_old'])
           mock(obj).create_table_if_not_exists(config['temp_table'])
-          mock(obj).create_table_if_not_exists(task['table'])
-          mock(obj).create_table_if_not_exists(task['table_old'], dataset: config['dataset_old'])
+          mock(obj).create_table_if_not_exists(config['table'])
+          mock(obj).create_table_if_not_exists(config['table_old'], dataset: config['dataset_old'])
 
-          mock(obj).get_table_or_partition(task['table'])
+          mock(obj).get_table_or_partition(config['table'])
           mock(obj).copy(config['table'], config['table_old'], config['dataset_old'])
 
           mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
@@ -225,6 +198,7 @@ module Embulk
         any_instance_of(BigqueryClient) do |obj|
           mock(obj).get_dataset(config['dataset'])
           mock(obj).create_table_if_not_exists(config['temp_table'])
+          mock(obj).create_table_if_not_exists(config['table'])
          mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_APPEND')
           mock(obj).delete_table(config['temp_table'])
         end
@@ -232,19 +206,6 @@ module Embulk
       end
 
       def test_append_with_partitioning
-        config = least_config.merge('mode' => 'append', 'table' => 'table$20160929')
-        task = Bigquery.configure(config, schema, processor_count)
-        any_instance_of(BigqueryClient) do |obj|
-          mock(obj).get_dataset(config['dataset'])
-          mock(obj).create_table_if_not_exists(config['temp_table'])
-          mock(obj).get_table(config['table'])
-          mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_APPEND')
-          mock(obj).delete_table(config['temp_table'])
-        end
-        Bigquery.transaction(config, schema, processor_count, &control)
-      end
-
-      def test_append_with_partitioning_with_auto_create_table
         config = least_config.merge('mode' => 'append', 'table' => 'table$20160929', 'auto_create_table' => true)
         task = Bigquery.configure(config, schema, processor_count)
         any_instance_of(BigqueryClient) do |obj|
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: embulk-output-bigquery
 version: !ruby/object:Gem::Version
-  version: 0.4.14
+  version: 0.5.0
 platform: ruby
 authors:
 - Satoshi Akama
@@ -102,7 +102,6 @@ files:
 - example/config_nested_record.yml
 - example/config_payload_column.yml
 - example/config_payload_column_index.yml
-- example/config_prevent_duplicate_insert.yml
 - example/config_progress_log_interval.yml
 - example/config_replace.yml
 - example/config_replace_backup.yml
data/example/config_prevent_duplicate_insert.yml
DELETED
@@ -1,30 +0,0 @@
-in:
-  type: file
-  path_prefix: example/example.csv
-  parser:
-    type: csv
-    charset: UTF-8
-    newline: CRLF
-    null_string: 'NULL'
-    skip_header_lines: 1
-    comment_line_marker: '#'
-    columns:
-    - {name: date, type: string}
-    - {name: timestamp, type: timestamp, format: "%Y-%m-%d %H:%M:%S.%N", timezone: "+09:00"}
-    - {name: "null", type: string}
-    - {name: long, type: long}
-    - {name: string, type: string}
-    - {name: double, type: double}
-    - {name: boolean, type: boolean}
-out:
-  type: bigquery
-  mode: append
-  auth_method: json_key
-  json_keyfile: example/your-project-000.json
-  dataset: your_dataset_name
-  table: your_table_name
-  source_format: NEWLINE_DELIMITED_JSON
-  auto_create_dataset: true
-  auto_create_table: true
-  schema_file: example/schema.json
-  prevent_duplicate_insert: true