embulk-output-bigquery 0.4.14 → 0.5.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 4fb376f288bfa86d632d727b3d0770ca4b94e364261c3f87a2569c801ee2fa00
- data.tar.gz: 2571a07afb9aac0774e0744f9d5118712bb83f44f82470dd4fd25bf515c7b9fa
+ metadata.gz: 3e0087103039718cb24224b6bb793d820b53b935194d412e4b2984aba3d7d7a8
+ data.tar.gz: 9ac27a3b881277450cbfaa096de0690c721a8f86f0e78abb692c8a4ed5b679d5
  SHA512:
- metadata.gz: 15f71decc69d34d8fbc3ee09452a6307107b71f759820b8a0521c6473b2231c4706febf216b59baae0e18fc3a06a056c18552d1093f0ac264ef84183a6d27992
- data.tar.gz: 7ee57f82766927cb804bf0d88550f7f3e4d0459315160a0eec98ccd4c00e2a2423a093cffd17e836d2dba8461cbc2ae4e227ff85d60c7c9628d32b1fd142b7eb
+ metadata.gz: 6b0ccf4e349a5d15321cfcc97138a98676bddfd412fd6fadfc8b1e0d6cd31d9739a8a5f46ccd923644543ae43cc0134b3e7598f80d89c330a4ac8aec49c084c1
+ data.tar.gz: f02557cdd7956620ae59eb6bc0e5872992d20a65881bd69230b0b0442342a36203d1eedd8a20702d2000f412b909359657bfa300b3e82b5f494398ea6e5ea301
data/CHANGELOG.md CHANGED
@@ -1,3 +1,10 @@
+ ## 0.5.0 - 2019-08-10
+
+ * [incompatibility change] Drop deprecated `time_partitioning.require_partition_filter`
+ * [incompatibility change] Drop `prevent_duplicate_insert`, which has no use-case now
+ * [incompatibility change] Change the default value of `auto_create_table` from `false` to `true`
+ * All modes except `append_direct` (that is, `replace`, `replace_backup`, `append`, and `delete_in_advance`) now require `auto_create_table: true`.
+
  ## 0.4.14 - 2019-08-10
 
  * [enhancement] Support field partitioning correctly.
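
For upgraders, the practical effect: a 0.4.x config is trimmed for 0.5.0 by deleting the dropped options. A sketch (dataset, table, and keyfile names are placeholders):

```yaml
out:
  type: bigquery
  mode: append
  auth_method: json_key
  json_keyfile: example/your-project-000.json  # placeholder
  dataset: your_dataset_name                   # placeholder
  table: your_table_name                       # placeholder
  # auto_create_table now defaults to true, so it can simply be omitted
  time_partitioning:
    type: DAY
    # require_partition_filter: ...   <- dropped in 0.5.0, delete this line
  # prevent_duplicate_insert: ...     <- dropped in 0.5.0, delete this line
```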
data/README.md CHANGED
@@ -23,14 +23,6 @@ https://developers.google.com/bigquery/loading-data-into-bigquery
  Current version of this plugin supports Google API with Service Account Authentication, but does not support
  OAuth flow for installed applications.
 
- ### INCOMPATIBILITY CHANGES
-
- v0.3.x has incompatibility changes with v0.2.x. Please see [CHANGELOG.md](CHANGELOG.md) for details.
-
- * `formatter` option (formatter plugin support) is dropped. Use `source_format` option instead. (it already exists in v0.2.x too)
- * `encoders` option (encoder plugin support) is dropped. Use `compression` option instead (it already exists in v0.2.x too).
- * `mode: append` mode now expresses a transactional append, and `mode: append_direct` is one which is not transactional.
-
  ## Configuration
 
  #### Original options
@@ -47,10 +39,9 @@ v0.3.x has incompatibility changes with v0.2.x. Please see [CHANGELOG.md](CHANGE
  | location | string | optional | nil | geographic location of dataset. See [Location](#location) |
  | table | string | required | | table name, or table name with a partition decorator such as `table_name$20160929`|
  | auto_create_dataset | boolean | optional | false | automatically create dataset |
- | auto_create_table | boolean | optional | false | See [Dynamic Table Creating](#dynamic-table-creating) and [Time Partitioning](#time-partitioning) |
+ | auto_create_table | boolean | optional | true | `false` is available only for `append_direct` mode. Other modes require `true`. See [Dynamic Table Creating](#dynamic-table-creating) and [Time Partitioning](#time-partitioning) |
  | schema_file | string | optional | | /path/to/schema.json |
  | template_table | string | optional | | template table name. See [Dynamic Table Creating](#dynamic-table-creating) |
- | prevent_duplicate_insert | boolean | optional | false | See [Prevent Duplication](#prevent-duplication) |
  | job_status_max_polling_time | int | optional | 3600 sec | Max job status polling time |
  | job_status_polling_interval | int | optional | 10 sec | Job status polling interval |
  | is_skip_job_result_check | boolean | optional | false | Skip waiting Load job finishes. Available for append, or delete_in_advance mode |
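
Per the updated row, the only remaining use of `auto_create_table: false` is `append_direct` into a table that already exists; a minimal sketch with placeholder names:

```yaml
out:
  type: bigquery
  mode: append_direct
  dataset: your_dataset_name   # placeholder
  table: your_table_name       # must already exist in BigQuery
  auto_create_table: false     # allowed only with append_direct
```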
@@ -107,7 +98,6 @@ Following options are same as [bq command-line tools](https://cloud.google.com/b
  | time_partitioning.type | string | required | nil | The only type supported is DAY, which will generate one partition per day based on data loading time. |
  | time_partitioning.expiration_ms | int | optional | nil | Number of milliseconds for which to keep the storage for a partition. |
  | time_partitioning.field | string | optional | nil | `DATE` or `TIMESTAMP` column used for partitioning |
- | time_partitioning.require_partition_filter | boolean | optional | nil | If true, valid partition filter is required when query |
  | clustering | hash | optional | nil | Currently, clustering is supported for partitioned tables, so must be used with `time_partitioning` option. See [clustered tables](https://cloud.google.com/bigquery/docs/clustered-tables) |
  | clustering.fields | array | required | nil | One or more fields on which data should be clustered. The order of the specified columns determines the sort order of the data. |
  | schema_update_options | array | optional | nil | (Experimental) List of `ALLOW_FIELD_ADDITION` or `ALLOW_FIELD_RELAXATION` or both. See [jobs#configuration.load.schemaUpdateOptions](https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load.schemaUpdateOptions). NOTE for the current status: `schema_update_options` does not work for `copy` job, that is, is not effective for most of modes such as `append`, `replace` and `replace_backup`. `delete_in_advance` deletes origin table so does not need to update schema. Only `append_direct` can utilize schema update. |
@@ -252,11 +242,6 @@ out:
 
  ### Dynamic table creating
 
- This plugin tries to create a table using BigQuery API when
-
- * mode is either of `delete_in_advance`, `replace`, `replace_backup`, `append`.
- * mode is `append_direct` and `auto_create_table` is true.
-
  There are 3 ways to set schema.
 
  #### Set schema.json
@@ -355,22 +340,6 @@ out:
    payload_column_index: 0 # or, payload_column: payload
  ```
 
- ### Prevent Duplication
-
- `prevent_duplicate_insert` option is used to prevent inserting same data for modes `append` or `append_direct`.
-
- When `prevent_duplicate_insert` is set to true, embulk-output-bigquery generate job ID from md5 hash of file and other options.
-
- `job ID = md5(md5(file) + dataset + table + schema + source_format + file_delimiter + max_bad_records + encoding + ignore_unknown_values + allow_quoted_newlines)`
-
- [job ID must be unique(including failures)](https://cloud.google.com/bigquery/loading-data-into-bigquery#consistency) so that same data can't be inserted with same settings repeatedly.
-
- ```yaml
- out:
-   type: bigquery
-   prevent_duplicate_insert: true
- ```
-
  ### GCS Bucket
 
  This is useful to reduce number of consumed jobs, which is limited by [100,000 jobs per project per day](https://cloud.google.com/bigquery/quotas#load_jobs).
@@ -401,32 +370,31 @@ To load into a partition, specify `table` parameter with a partition decorator a
  out:
    type: bigquery
    table: table_name$20160929
-   auto_create_table: true
  ```
 
- You may configure `time_partitioning` parameter together to create table via `auto_create_table: true` option as:
+ You may configure `time_partitioning` parameter together as:
 
  ```yaml
  out:
    type: bigquery
    table: table_name$20160929
-   auto_create_table: true
    time_partitioning:
      type: DAY
      expiration_ms: 259200000
  ```
 
  You can also create column-based partitioning table as:
+
  ```yaml
  out:
    type: bigquery
    mode: replace
-   auto_create_table: true
    table: table_name
    time_partitioning:
      type: DAY
      field: timestamp
  ```
+
  Note the `time_partitioning.field` should be top-level `DATE` or `TIMESTAMP`.
 
  Use [Tables: patch](https://cloud.google.com/bigquery/docs/reference/v2/tables/patch) API to update the schema of the partitioned table, embulk-output-bigquery itself does not support it, though.
data/embulk-output-bigquery.gemspec CHANGED
@@ -1,6 +1,6 @@
  Gem::Specification.new do |spec|
    spec.name = "embulk-output-bigquery"
-   spec.version = "0.4.14"
+   spec.version = "0.5.0"
    spec.authors = ["Satoshi Akama", "Naotoshi Seo"]
    spec.summary = "Google BigQuery output plugin for Embulk"
    spec.description = "Embulk plugin that insert records to Google BigQuery."
data/lib/embulk/output/bigquery.rb CHANGED
@@ -45,7 +45,7 @@ module Embulk
  'table_old' => config.param('table_old', :string, :default => nil),
  'table_name_old' => config.param('table_name_old', :string, :default => nil), # lower version compatibility
  'auto_create_dataset' => config.param('auto_create_dataset', :bool, :default => false),
- 'auto_create_table' => config.param('auto_create_table', :bool, :default => false),
+ 'auto_create_table' => config.param('auto_create_table', :bool, :default => true),
  'schema_file' => config.param('schema_file', :string, :default => nil),
  'template_table' => config.param('template_table', :string, :default => nil),
 
@@ -53,7 +53,6 @@ module Embulk
  'job_status_max_polling_time' => config.param('job_status_max_polling_time', :integer, :default => 3600),
  'job_status_polling_interval' => config.param('job_status_polling_interval', :integer, :default => 10),
  'is_skip_job_result_check' => config.param('is_skip_job_result_check', :bool, :default => false),
- 'prevent_duplicate_insert' => config.param('prevent_duplicate_insert', :bool, :default => false),
  'with_rehearsal' => config.param('with_rehearsal', :bool, :default => false),
  'rehearsal_counts' => config.param('rehearsal_counts', :integer, :default => 1000),
  'abort_on_error' => config.param('abort_on_error', :bool, :default => nil),
@@ -105,10 +104,14 @@ module Embulk
    raise ConfigError.new "`mode` must be one of append, append_direct, replace, delete_in_advance, replace_backup"
  end
 
+ if %w[append replace delete_in_advance replace_backup].include?(task['mode']) and !task['auto_create_table']
+   raise ConfigError.new "`mode: #{task['mode']}` requires `auto_create_table: true`"
+ end
+
  if task['mode'] == 'replace_backup'
    task['table_old'] ||= task['table_name_old'] # for lower version compatibility
    if task['dataset_old'].nil? and task['table_old'].nil?
-     raise ConfigError.new "`mode replace_backup` requires either of `dataset_old` or `table_old`"
+     raise ConfigError.new "`mode: replace_backup` requires either of `dataset_old` or `table_old`"
    end
    task['dataset_old'] ||= task['dataset']
    task['table_old'] ||= task['table']
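
The new guard means configure now fails fast instead of failing mid-load. For example, a config like the following (placeholder names) raises `ConfigError` with the message `` `mode: replace` requires `auto_create_table: true` ``:

```yaml
out:
  type: bigquery
  mode: replace                # any mode other than append_direct
  dataset: your_dataset_name   # placeholder
  table: your_table_name       # placeholder
  auto_create_table: false     # rejected in 0.5.0; omit it or set it to true
```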
@@ -306,42 +309,18 @@ module Embulk
 
  case task['mode']
  when 'delete_in_advance'
-   bigquery.delete_partition(task['table'])
+   bigquery.delete_table_or_partition(task['table'])
    bigquery.create_table_if_not_exists(task['table'])
  when 'replace'
    bigquery.create_table_if_not_exists(task['temp_table'])
-   if Helper.has_partition_decorator?(task['table'])
-     if task['auto_create_table']
-       bigquery.create_table_if_not_exists(task['table'])
-     else
-       bigquery.get_table(task['table']) # raises NotFoundError
-     end
-   end
+   bigquery.create_table_if_not_exists(task['table'])
  when 'append'
    bigquery.create_table_if_not_exists(task['temp_table'])
-   if Helper.has_partition_decorator?(task['table'])
-     if task['auto_create_table']
-       bigquery.create_table_if_not_exists(task['table'])
-     else
-       bigquery.get_table(task['table']) # raises NotFoundError
-     end
-   end
+   bigquery.create_table_if_not_exists(task['table'])
  when 'replace_backup'
    bigquery.create_table_if_not_exists(task['temp_table'])
-   if Helper.has_partition_decorator?(task['table'])
-     if task['auto_create_table']
-       bigquery.create_table_if_not_exists(task['table'])
-     else
-       bigquery.get_table(task['table']) # raises NotFoundError
-     end
-   end
-   if Helper.has_partition_decorator?(task['table_old'])
-     if task['auto_create_table']
-       bigquery.create_table_if_not_exists(task['table_old'], dataset: task['dataset_old'])
-     else
-       bigquery.get_table(task['table_old'], dataset: task['dataset_old']) # raises NotFoundError
-     end
-   end
+   bigquery.create_table_if_not_exists(task['table'])
+   bigquery.create_table_if_not_exists(task['table_old'], dataset: task['dataset_old'])
  else # append_direct
    if task['auto_create_table']
      bigquery.create_table_if_not_exists(task['table'])
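
With the branching removed, every transactional mode now unconditionally ensures the destination (and, for `replace_backup`, the backup table) exists before copying, whether or not `table` carries a partition decorator. So a partition load needs no extra flags; a minimal sketch with placeholder names:

```yaml
out:
  type: bigquery
  mode: replace
  dataset: your_dataset_name   # placeholder
  table: table_name$20160929   # partition decorator; table is created if missing
```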
data/lib/embulk/output/bigquery/bigquery_client.rb CHANGED
@@ -79,11 +79,7 @@ module Embulk
  begin
    # As https://cloud.google.com/bigquery/docs/managing_jobs_datasets_projects#managingjobs says,
    # we should generate job_id in client code, otherwise, retrying would cause duplication
-   if @task['prevent_duplicate_insert'] and (@task['mode'] == 'append' or @task['mode'] == 'append_direct')
-     job_id = Helper.create_load_job_id(@task, path, fields)
-   else
-     job_id = "embulk_load_job_#{SecureRandom.uuid}"
-   end
+   job_id = "embulk_load_job_#{SecureRandom.uuid}"
    Embulk.logger.info { "embulk-output-bigquery: Load job starting... job_id:[#{job_id}] #{object_uris} => #{@project}:#{@dataset}.#{table} in #{@location_for_log}" }
 
    body = {
@@ -174,11 +170,7 @@ module Embulk
  if File.exist?(path)
    # As https://cloud.google.com/bigquery/docs/managing_jobs_datasets_projects#managingjobs says,
    # we should generate job_id in client code, otherwise, retrying would cause duplication
-   if @task['prevent_duplicate_insert'] and (@task['mode'] == 'append' or @task['mode'] == 'append_direct')
-     job_id = Helper.create_load_job_id(@task, path, fields)
-   else
-     job_id = "embulk_load_job_#{SecureRandom.uuid}"
-   end
+   job_id = "embulk_load_job_#{SecureRandom.uuid}"
    Embulk.logger.info { "embulk-output-bigquery: Load job starting... job_id:[#{job_id}] #{path} => #{@project}:#{@dataset}.#{table} in #{@location_for_log}" }
  else
    Embulk.logger.info { "embulk-output-bigquery: Load job starting... #{path} does not exist, skipped" }
@@ -437,7 +429,6 @@ module Embulk
    type: options['time_partitioning']['type'],
    expiration_ms: options['time_partitioning']['expiration_ms'],
    field: options['time_partitioning']['field'],
-   require_partition_filter: options['time_partitioning']['require_partition_filter'],
  }
  end
 
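After this removal, the `time_partitioning` hash sent to the BigQuery API carries only the three remaining keys. A sketch of the corresponding config (values are illustrative):

```yaml
out:
  type: bigquery
  table: your_table_name       # placeholder
  time_partitioning:
    type: DAY                  # the only supported type
    expiration_ms: 259200000   # optional; per-partition storage lifetime
    field: timestamp           # optional; top-level DATE or TIMESTAMP column
```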
data/test/test_configure.rb CHANGED
@@ -55,14 +55,13 @@ module Embulk
  assert_equal nil, task['table_old']
  assert_equal nil, task['table_name_old']
  assert_equal false, task['auto_create_dataset']
- assert_equal false, task['auto_create_table']
+ assert_equal true, task['auto_create_table']
  assert_equal nil, task['schema_file']
  assert_equal nil, task['template_table']
  assert_equal true, task['delete_from_local_when_job_end']
  assert_equal 3600, task['job_status_max_polling_time']
  assert_equal 10, task['job_status_polling_interval']
  assert_equal false, task['is_skip_job_result_check']
- assert_equal false, task['prevent_duplicate_insert']
  assert_equal false, task['with_rehearsal']
  assert_equal 1000, task['rehearsal_counts']
  assert_equal [], task['column_options']
@@ -162,22 +161,22 @@ module Embulk
  end
 
  def test_payload_column
-   config = least_config.merge('payload_column' => schema.first.name)
+   config = least_config.merge('payload_column' => schema.first.name, 'auto_create_table' => false, 'mode' => 'append_direct')
    task = Bigquery.configure(config, schema, processor_count)
    assert_equal task['payload_column_index'], 0
 
-   config = least_config.merge('payload_column' => 'not_exist')
+   config = least_config.merge('payload_column' => 'not_exist', 'auto_create_table' => false, 'mode' => 'append_direct')
    assert_raise { Bigquery.configure(config, schema, processor_count) }
  end
 
  def test_payload_column_index
-   config = least_config.merge('payload_column_index' => 0)
+   config = least_config.merge('payload_column_index' => 0, 'auto_create_table' => false, 'mode' => 'append_direct')
    assert_nothing_raised { Bigquery.configure(config, schema, processor_count) }
 
-   config = least_config.merge('payload_column_index' => -1)
+   config = least_config.merge('payload_column_index' => -1, 'auto_create_table' => false, 'mode' => 'append_direct')
    assert_raise { Bigquery.configure(config, schema, processor_count) }
 
-   config = least_config.merge('payload_column_index' => schema.size)
+   config = least_config.merge('payload_column_index' => schema.size, 'auto_create_table' => false, 'mode' => 'append_direct')
    assert_raise { Bigquery.configure(config, schema, processor_count) }
  end
 
data/test/test_example.rb CHANGED
@@ -33,7 +33,6 @@ else
  files.each do |config_path|
    if %w[
      config_expose_errors.yml
-     config_prevent_duplicate_insert.yml
    ].include?(File.basename(config_path))
      define_method(:"test_#{File.basename(config_path, ".yml")}") do
        assert_false embulk_run(config_path)
data/test/test_transaction.rb CHANGED
@@ -41,8 +41,8 @@ module Embulk
  end
 
  sub_test_case "append_direct" do
-   def test_append_direct
-     config = least_config.merge('mode' => 'append_direct')
+   def test_append_direct_without_auto_create
+     config = least_config.merge('mode' => 'append_direct', 'auto_create_dataset' => false, 'auto_create_table' => false)
      any_instance_of(BigqueryClient) do |obj|
        mock(obj).get_dataset(config['dataset'])
        mock(obj).get_table(config['table'])
@@ -60,8 +60,8 @@ module Embulk
      Bigquery.transaction(config, schema, processor_count, &control)
    end
 
-   def test_append_direct_with_partition
-     config = least_config.merge('mode' => 'append_direct', 'table' => 'table$20160929')
+   def test_append_direct_with_partition_without_auto_create
+     config = least_config.merge('mode' => 'append_direct', 'table' => 'table$20160929', 'auto_create_dataset' => false, 'auto_create_table' => false)
      any_instance_of(BigqueryClient) do |obj|
        mock(obj).get_dataset(config['dataset'])
        mock(obj).get_table(config['table'])
@@ -86,7 +86,7 @@ module Embulk
  task = Bigquery.configure(config, schema, processor_count)
  any_instance_of(BigqueryClient) do |obj|
    mock(obj).get_dataset(config['dataset'])
-   mock(obj).delete_partition(config['table'])
+   mock(obj).delete_table_or_partition(config['table'])
    mock(obj).create_table_if_not_exists(config['table'])
  end
  Bigquery.transaction(config, schema, processor_count, &control)
@@ -97,7 +97,7 @@ module Embulk
  task = Bigquery.configure(config, schema, processor_count)
  any_instance_of(BigqueryClient) do |obj|
    mock(obj).get_dataset(config['dataset'])
-   mock(obj).delete_partition(config['table'])
+   mock(obj).delete_table_or_partition(config['table'])
    mock(obj).create_table_if_not_exists(config['table'])
  end
  Bigquery.transaction(config, schema, processor_count, &control)
@@ -111,6 +111,7 @@ module Embulk
  any_instance_of(BigqueryClient) do |obj|
    mock(obj).get_dataset(config['dataset'])
    mock(obj).create_table_if_not_exists(config['temp_table'])
+   mock(obj).create_table_if_not_exists(config['table'])
    mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
    mock(obj).delete_table(config['temp_table'])
  end
@@ -120,19 +121,6 @@ module Embulk
  def test_replace_with_partitioning
    config = least_config.merge('mode' => 'replace', 'table' => 'table$20160929')
    task = Bigquery.configure(config, schema, processor_count)
-   any_instance_of(BigqueryClient) do |obj|
-     mock(obj).get_dataset(config['dataset'])
-     mock(obj).create_table_if_not_exists(config['temp_table'])
-     mock(obj).get_table(config['table'])
-     mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
-     mock(obj).delete_table(config['temp_table'])
-   end
-   Bigquery.transaction(config, schema, processor_count, &control)
- end
-
- def test_replace_with_partitioning_with_auto_create_table
-   config = least_config.merge('mode' => 'replace', 'table' => 'table$20160929', 'auto_create_table' => true)
-   task = Bigquery.configure(config, schema, processor_count)
    any_instance_of(BigqueryClient) do |obj|
      mock(obj).get_dataset(config['dataset'])
      mock(obj).create_table_if_not_exists(config['temp_table'])
@@ -152,8 +140,10 @@ module Embulk
    mock(obj).get_dataset(config['dataset'])
    mock(obj).get_dataset(config['dataset_old'])
    mock(obj).create_table_if_not_exists(config['temp_table'])
+   mock(obj).create_table_if_not_exists(config['table'])
+   mock(obj).create_table_if_not_exists(config['table_old'], dataset: config['dataset_old'])
 
-   mock(obj).get_table_or_partition(task['table'])
+   mock(obj).get_table_or_partition(config['table'])
    mock(obj).copy(config['table'], config['table_old'], config['dataset_old'])
 
    mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
@@ -168,9 +158,11 @@ module Embulk
  any_instance_of(BigqueryClient) do |obj|
    mock(obj).create_dataset(config['dataset'])
    mock(obj).create_dataset(config['dataset_old'], reference: config['dataset'])
+   mock(obj).create_table_if_not_exists(config['table'])
    mock(obj).create_table_if_not_exists(config['temp_table'])
+   mock(obj).create_table_if_not_exists(config['table_old'], dataset: config['dataset_old'])
 
-   mock(obj).get_table_or_partition(task['table'])
+   mock(obj).get_table_or_partition(config['table'])
    mock(obj).copy(config['table'], config['table_old'], config['dataset_old'])
 
    mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
@@ -180,35 +172,16 @@ module Embulk
  end
 
  def test_replace_backup_with_partitioning
-   config = least_config.merge('mode' => 'replace_backup', 'table' => 'table$20160929', 'dataset_old' => 'dataset_old', 'table_old' => 'table_old$20190929', 'temp_table' => 'temp_table')
-   task = Bigquery.configure(config, schema, processor_count)
-   any_instance_of(BigqueryClient) do |obj|
-     mock(obj).get_dataset(config['dataset'])
-     mock(obj).get_dataset(config['dataset_old'])
-     mock(obj).create_table_if_not_exists(config['temp_table'])
-     mock(obj).get_table(task['table'])
-     mock(obj).get_table(task['table_old'], dataset: config['dataset_old'])
-
-     mock(obj).get_table_or_partition(task['table'])
-     mock(obj).copy(config['table'], config['table_old'], config['dataset_old'])
-
-     mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
-     mock(obj).delete_table(config['temp_table'])
-   end
-   Bigquery.transaction(config, schema, processor_count, &control)
- end
-
- def test_replace_backup_with_partitioning_auto_create_table
    config = least_config.merge('mode' => 'replace_backup', 'table' => 'table$20160929', 'dataset_old' => 'dataset_old', 'table_old' => 'table_old$20160929', 'temp_table' => 'temp_table', 'auto_create_table' => true)
    task = Bigquery.configure(config, schema, processor_count)
    any_instance_of(BigqueryClient) do |obj|
      mock(obj).get_dataset(config['dataset'])
      mock(obj).get_dataset(config['dataset_old'])
      mock(obj).create_table_if_not_exists(config['temp_table'])
-     mock(obj).create_table_if_not_exists(task['table'])
-     mock(obj).create_table_if_not_exists(task['table_old'], dataset: config['dataset_old'])
+     mock(obj).create_table_if_not_exists(config['table'])
+     mock(obj).create_table_if_not_exists(config['table_old'], dataset: config['dataset_old'])
 
-     mock(obj).get_table_or_partition(task['table'])
+     mock(obj).get_table_or_partition(config['table'])
      mock(obj).copy(config['table'], config['table_old'], config['dataset_old'])
 
      mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
@@ -225,6 +198,7 @@ module Embulk
  any_instance_of(BigqueryClient) do |obj|
    mock(obj).get_dataset(config['dataset'])
    mock(obj).create_table_if_not_exists(config['temp_table'])
+   mock(obj).create_table_if_not_exists(config['table'])
    mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_APPEND')
    mock(obj).delete_table(config['temp_table'])
  end
@@ -232,19 +206,6 @@ module Embulk
  end
 
  def test_append_with_partitioning
-   config = least_config.merge('mode' => 'append', 'table' => 'table$20160929')
-   task = Bigquery.configure(config, schema, processor_count)
-   any_instance_of(BigqueryClient) do |obj|
-     mock(obj).get_dataset(config['dataset'])
-     mock(obj).create_table_if_not_exists(config['temp_table'])
-     mock(obj).get_table(config['table'])
-     mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_APPEND')
-     mock(obj).delete_table(config['temp_table'])
-   end
-   Bigquery.transaction(config, schema, processor_count, &control)
- end
-
- def test_append_with_partitioning_with_auto_create_table
    config = least_config.merge('mode' => 'append', 'table' => 'table$20160929', 'auto_create_table' => true)
    task = Bigquery.configure(config, schema, processor_count)
    any_instance_of(BigqueryClient) do |obj|
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: embulk-output-bigquery
  version: !ruby/object:Gem::Version
-   version: 0.4.14
+   version: 0.5.0
  platform: ruby
  authors:
  - Satoshi Akama
@@ -102,7 +102,6 @@ files:
  - example/config_nested_record.yml
  - example/config_payload_column.yml
  - example/config_payload_column_index.yml
- - example/config_prevent_duplicate_insert.yml
  - example/config_progress_log_interval.yml
  - example/config_replace.yml
  - example/config_replace_backup.yml
data/example/config_prevent_duplicate_insert.yml DELETED
@@ -1,30 +0,0 @@
- in:
-   type: file
-   path_prefix: example/example.csv
-   parser:
-     type: csv
-     charset: UTF-8
-     newline: CRLF
-     null_string: 'NULL'
-     skip_header_lines: 1
-     comment_line_marker: '#'
-     columns:
-     - {name: date, type: string}
-     - {name: timestamp, type: timestamp, format: "%Y-%m-%d %H:%M:%S.%N", timezone: "+09:00"}
-     - {name: "null", type: string}
-     - {name: long, type: long}
-     - {name: string, type: string}
-     - {name: double, type: double}
-     - {name: boolean, type: boolean}
- out:
-   type: bigquery
-   mode: append
-   auth_method: json_key
-   json_keyfile: example/your-project-000.json
-   dataset: your_dataset_name
-   table: your_table_name
-   source_format: NEWLINE_DELIMITED_JSON
-   auto_create_dataset: true
-   auto_create_table: true
-   schema_file: example/schema.json
-   prevent_duplicate_insert: true