embulk-output-bigquery 0.4.14 → 0.5.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 4fb376f288bfa86d632d727b3d0770ca4b94e364261c3f87a2569c801ee2fa00
- data.tar.gz: 2571a07afb9aac0774e0744f9d5118712bb83f44f82470dd4fd25bf515c7b9fa
+ metadata.gz: 3e0087103039718cb24224b6bb793d820b53b935194d412e4b2984aba3d7d7a8
+ data.tar.gz: 9ac27a3b881277450cbfaa096de0690c721a8f86f0e78abb692c8a4ed5b679d5
  SHA512:
- metadata.gz: 15f71decc69d34d8fbc3ee09452a6307107b71f759820b8a0521c6473b2231c4706febf216b59baae0e18fc3a06a056c18552d1093f0ac264ef84183a6d27992
- data.tar.gz: 7ee57f82766927cb804bf0d88550f7f3e4d0459315160a0eec98ccd4c00e2a2423a093cffd17e836d2dba8461cbc2ae4e227ff85d60c7c9628d32b1fd142b7eb
+ metadata.gz: 6b0ccf4e349a5d15321cfcc97138a98676bddfd412fd6fadfc8b1e0d6cd31d9739a8a5f46ccd923644543ae43cc0134b3e7598f80d89c330a4ac8aec49c084c1
+ data.tar.gz: f02557cdd7956620ae59eb6bc0e5872992d20a65881bd69230b0b0442342a36203d1eedd8a20702d2000f412b909359657bfa300b3e82b5f494398ea6e5ea301
data/CHANGELOG.md CHANGED
@@ -1,3 +1,10 @@
+ ## 0.5.0 - 2019-08-10
+
+ * [incompatibility change] Drop deprecated time_partitioning.require_partition_filter
+ * [incompatibility change] Drop prevent_duplicate_insert which has no use-case now
+ * [incompatibility change] Change default value of `auto_create_table` to `true` from `false`
+ * Modes `replace`, `replace_backup`, `append`, and `delete_in_advance` (that is, every mode except `append_direct`) now require `auto_create_table: true`.
+
  ## 0.4.14 - 2019-08-10

  * [enhancement] Support field partitioning correctly.
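
Since most of this release is the `auto_create_table` flip, a quick orientation: configs that never set `auto_create_table` keep working because the default is now `true`; only configs that explicitly set it to `false` while using a mode other than `append_direct` must be updated. A minimal sketch of a 0.5.0-compatible config (dataset, table, and keyfile names are placeholders):

```yaml
out:
  type: bigquery
  mode: replace                  # replace, replace_backup, append, or delete_in_advance
  auth_method: json_key
  json_keyfile: /path/to/your-project-000.json
  dataset: your_dataset_name
  table: your_table_name
  # auto_create_table: true     # the new default; required by every mode except append_direct
```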
data/README.md CHANGED
@@ -23,14 +23,6 @@ https://developers.google.com/bigquery/loading-data-into-bigquery
  Current version of this plugin supports Google API with Service Account Authentication, but does not support
  OAuth flow for installed applications.

- ### INCOMPATIBILITY CHANGES
-
- v0.3.x has incompatibility changes with v0.2.x. Please see [CHANGELOG.md](CHANGELOG.md) for details.
-
- * `formatter` option (formatter plugin support) is dropped. Use `source_format` option instead. (it already exists in v0.2.x too)
- * `encoders` option (encoder plugin support) is dropped. Use `compression` option instead (it already exists in v0.2.x too).
- * `mode: append` mode now expresses a transactional append, and `mode: append_direct` is one which is not transactional.
-
  ## Configuration

  #### Original options
@@ -47,10 +39,9 @@ v0.3.x has incompatibility changes with v0.2.x. Please see [CHANGELOG.md](CHANGE
  | location | string | optional | nil | geographic location of dataset. See [Location](#location) |
  | table | string | required | | table name, or table name with a partition decorator such as `table_name$20160929`|
  | auto_create_dataset | boolean | optional | false | automatically create dataset |
- | auto_create_table | boolean | optional | false | See [Dynamic Table Creating](#dynamic-table-creating) and [Time Partitioning](#time-partitioning) |
+ | auto_create_table | boolean | optional | true | `false` is available only for `append_direct` mode. Other modes require `true`. See [Dynamic Table Creating](#dynamic-table-creating) and [Time Partitioning](#time-partitioning) |
  | schema_file | string | optional | | /path/to/schema.json |
  | template_table | string | optional | | template table name. See [Dynamic Table Creating](#dynamic-table-creating) |
- | prevent_duplicate_insert | boolean | optional | false | See [Prevent Duplication](#prevent-duplication) |
  | job_status_max_polling_time | int | optional | 3600 sec | Max job status polling time |
  | job_status_polling_interval | int | optional | 10 sec | Job status polling interval |
  | is_skip_job_result_check | boolean | optional | false | Skip waiting Load job finishes. Available for append, or delete_in_advance mode |
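
As the updated `auto_create_table` row above says, the only place `auto_create_table: false` remains valid is `append_direct` into a table that already exists. A sketch (placeholder names):

```yaml
out:
  type: bigquery
  mode: append_direct
  auto_create_table: false   # accepted only in append_direct mode as of 0.5.0
  dataset: your_dataset_name
  table: existing_table_name
```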
@@ -107,7 +98,6 @@ Following options are same as [bq command-line tools](https://cloud.google.com/b
  | time_partitioning.type | string | required | nil | The only type supported is DAY, which will generate one partition per day based on data loading time. |
  | time_partitioning.expiration_ms | int | optional | nil | Number of milliseconds for which to keep the storage for a partition. |
  | time_partitioning.field | string | optional | nil | `DATE` or `TIMESTAMP` column used for partitioning |
- | time_partitioning.require_partition_filter | boolean | optional | nil | If true, valid partition filter is required when query |
  | clustering | hash | optional | nil | Currently, clustering is supported for partitioned tables, so must be used with `time_partitioning` option. See [clustered tables](https://cloud.google.com/bigquery/docs/clustered-tables) |
  | clustering.fields | array | required | nil | One or more fields on which data should be clustered. The order of the specified columns determines the sort order of the data. |
  | schema_update_options | array | optional | nil | (Experimental) List of `ALLOW_FIELD_ADDITION` or `ALLOW_FIELD_RELAXATION` or both. See [jobs#configuration.load.schemaUpdateOptions](https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load.schemaUpdateOptions). NOTE for the current status: `schema_update_options` does not work for `copy` job, that is, is not effective for most of modes such as `append`, `replace` and `replace_backup`. `delete_in_advance` deletes origin table so does not need to update schema. Only `append_direct` can utilize schema update. |
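
For reference, the surviving `time_partitioning` and `clustering` options from this table combine as follows; a sketch with placeholder column names (`require_partition_filter` is gone in 0.5.0):

```yaml
out:
  type: bigquery
  mode: replace
  table: your_table_name
  time_partitioning:
    type: DAY                   # the only supported type
    field: event_time           # top-level DATE or TIMESTAMP column
    expiration_ms: 259200000    # optional; keep partition storage for 3 days
  clustering:
    fields: [country, user_id]  # column order determines the sort order
```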
@@ -252,11 +242,6 @@ out:

  ### Dynamic table creating

- This plugin tries to create a table using BigQuery API when
-
- * mode is either of `delete_in_advance`, `replace`, `replace_backup`, `append`.
- * mode is `append_direct` and `auto_create_table` is true.
-
  There are 3 ways to set schema.

  #### Set schema.json
@@ -355,22 +340,6 @@ out:
  payload_column_index: 0 # or, payload_column: payload
  ```

- ### Prevent Duplication
-
- `prevent_duplicate_insert` option is used to prevent inserting same data for modes `append` or `append_direct`.
-
- When `prevent_duplicate_insert` is set to true, embulk-output-bigquery generate job ID from md5 hash of file and other options.
-
- `job ID = md5(md5(file) + dataset + table + schema + source_format + file_delimiter + max_bad_records + encoding + ignore_unknown_values + allow_quoted_newlines)`
-
- [job ID must be unique(including failures)](https://cloud.google.com/bigquery/loading-data-into-bigquery#consistency) so that same data can't be inserted with same settings repeatedly.
-
- ```yaml
- out:
-   type: bigquery
-   prevent_duplicate_insert: true
- ```
-
  ### GCS Bucket

  This is useful to reduce number of consumed jobs, which is limited by [100,000 jobs per project per day](https://cloud.google.com/bigquery/quotas#load_jobs).
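
A sketch of the GCS-staged load this section describes, assuming the `gcs_bucket` and `auto_create_gcs_bucket` options documented in this README's option tables (bucket name is a placeholder):

```yaml
out:
  type: bigquery
  mode: append
  dataset: your_dataset_name
  table: your_table_name
  gcs_bucket: your_bucket_name   # stage files in GCS, then load them with a single job
  auto_create_gcs_bucket: true   # create the bucket if it does not already exist
```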
@@ -401,32 +370,31 @@ To load into a partition, specify `table` parameter with a partition decorator a
  out:
    type: bigquery
    table: table_name$20160929
-   auto_create_table: true
  ```

- You may configure `time_partitioning` parameter together to create table via `auto_create_table: true` option as:
+ You may configure `time_partitioning` parameter together as:

  ```yaml
  out:
    type: bigquery
    table: table_name$20160929
-   auto_create_table: true
    time_partitioning:
      type: DAY
      expiration_ms: 259200000
  ```

  You can also create column-based partitioning table as:
+
  ```yaml
  out:
    type: bigquery
    mode: replace
-   auto_create_table: true
    table: table_name
    time_partitioning:
      type: DAY
      field: timestamp
  ```
+
  Note the `time_partitioning.field` should be top-level `DATE` or `TIMESTAMP`.

  Use [Tables: patch](https://cloud.google.com/bigquery/docs/reference/v2/tables/patch) API to update the schema of the partitioned table, embulk-output-bigquery itself does not support it, though.
embulk-output-bigquery.gemspec CHANGED
@@ -1,6 +1,6 @@
  Gem::Specification.new do |spec|
    spec.name = "embulk-output-bigquery"
-   spec.version = "0.4.14"
+   spec.version = "0.5.0"
    spec.authors = ["Satoshi Akama", "Naotoshi Seo"]
    spec.summary = "Google BigQuery output plugin for Embulk"
    spec.description = "Embulk plugin that insert records to Google BigQuery."
lib/embulk/output/bigquery.rb CHANGED
@@ -45,7 +45,7 @@ module Embulk
  'table_old' => config.param('table_old', :string, :default => nil),
  'table_name_old' => config.param('table_name_old', :string, :default => nil), # lower version compatibility
  'auto_create_dataset' => config.param('auto_create_dataset', :bool, :default => false),
- 'auto_create_table' => config.param('auto_create_table', :bool, :default => false),
+ 'auto_create_table' => config.param('auto_create_table', :bool, :default => true),
  'schema_file' => config.param('schema_file', :string, :default => nil),
  'template_table' => config.param('template_table', :string, :default => nil),

@@ -53,7 +53,6 @@ module Embulk
  'job_status_max_polling_time' => config.param('job_status_max_polling_time', :integer, :default => 3600),
  'job_status_polling_interval' => config.param('job_status_polling_interval', :integer, :default => 10),
  'is_skip_job_result_check' => config.param('is_skip_job_result_check', :bool, :default => false),
- 'prevent_duplicate_insert' => config.param('prevent_duplicate_insert', :bool, :default => false),
  'with_rehearsal' => config.param('with_rehearsal', :bool, :default => false),
  'rehearsal_counts' => config.param('rehearsal_counts', :integer, :default => 1000),
  'abort_on_error' => config.param('abort_on_error', :bool, :default => nil),
@@ -105,10 +104,14 @@ module Embulk
    raise ConfigError.new "`mode` must be one of append, append_direct, replace, delete_in_advance, replace_backup"
  end

+ if %w[append replace delete_in_advance replace_backup].include?(task['mode']) and !task['auto_create_table']
+   raise ConfigError.new "`mode: #{task['mode']}` requires `auto_create_table: true`"
+ end
+
  if task['mode'] == 'replace_backup'
    task['table_old'] ||= task['table_name_old'] # for lower version compatibility
    if task['dataset_old'].nil? and task['table_old'].nil?
-     raise ConfigError.new "`mode replace_backup` requires either of `dataset_old` or `table_old`"
+     raise ConfigError.new "`mode: replace_backup` requires either of `dataset_old` or `table_old`"
    end
    task['dataset_old'] ||= task['dataset']
    task['table_old'] ||= task['table']
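
Concretely, the new guard above means a config such as the following (placeholder names) now fails at configure time with the error "`mode: replace` requires `auto_create_table: true`":

```yaml
out:
  type: bigquery
  mode: replace
  auto_create_table: false   # raises ConfigError under 0.5.0; drop this line or set it to true
  dataset: your_dataset_name
  table: your_table_name
```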
@@ -306,42 +309,18 @@ module Embulk

  case task['mode']
  when 'delete_in_advance'
-   bigquery.delete_partition(task['table'])
+   bigquery.delete_table_or_partition(task['table'])
    bigquery.create_table_if_not_exists(task['table'])
  when 'replace'
    bigquery.create_table_if_not_exists(task['temp_table'])
-   if Helper.has_partition_decorator?(task['table'])
-     if task['auto_create_table']
-       bigquery.create_table_if_not_exists(task['table'])
-     else
-       bigquery.get_table(task['table']) # raises NotFoundError
-     end
-   end
+   bigquery.create_table_if_not_exists(task['table'])
  when 'append'
    bigquery.create_table_if_not_exists(task['temp_table'])
-   if Helper.has_partition_decorator?(task['table'])
-     if task['auto_create_table']
-       bigquery.create_table_if_not_exists(task['table'])
-     else
-       bigquery.get_table(task['table']) # raises NotFoundError
-     end
-   end
+   bigquery.create_table_if_not_exists(task['table'])
  when 'replace_backup'
    bigquery.create_table_if_not_exists(task['temp_table'])
-   if Helper.has_partition_decorator?(task['table'])
-     if task['auto_create_table']
-       bigquery.create_table_if_not_exists(task['table'])
-     else
-       bigquery.get_table(task['table']) # raises NotFoundError
-     end
-   end
-   if Helper.has_partition_decorator?(task['table_old'])
-     if task['auto_create_table']
-       bigquery.create_table_if_not_exists(task['table_old'], dataset: task['dataset_old'])
-     else
-       bigquery.get_table(task['table_old'], dataset: task['dataset_old']) # raises NotFoundError
-     end
-   end
+   bigquery.create_table_if_not_exists(task['table'])
+   bigquery.create_table_if_not_exists(task['table_old'], dataset: task['dataset_old'])
  else # append_direct
    if task['auto_create_table']
      bigquery.create_table_if_not_exists(task['table'])
lib/embulk/output/bigquery/bigquery_client.rb CHANGED
@@ -79,11 +79,7 @@ module Embulk
  begin
    # As https://cloud.google.com/bigquery/docs/managing_jobs_datasets_projects#managingjobs says,
    # we should generate job_id in client code, otherwise, retrying would cause duplication
-   if @task['prevent_duplicate_insert'] and (@task['mode'] == 'append' or @task['mode'] == 'append_direct')
-     job_id = Helper.create_load_job_id(@task, path, fields)
-   else
-     job_id = "embulk_load_job_#{SecureRandom.uuid}"
-   end
+   job_id = "embulk_load_job_#{SecureRandom.uuid}"
    Embulk.logger.info { "embulk-output-bigquery: Load job starting... job_id:[#{job_id}] #{object_uris} => #{@project}:#{@dataset}.#{table} in #{@location_for_log}" }

    body = {
@@ -174,11 +170,7 @@ module Embulk
  if File.exist?(path)
    # As https://cloud.google.com/bigquery/docs/managing_jobs_datasets_projects#managingjobs says,
    # we should generate job_id in client code, otherwise, retrying would cause duplication
-   if @task['prevent_duplicate_insert'] and (@task['mode'] == 'append' or @task['mode'] == 'append_direct')
-     job_id = Helper.create_load_job_id(@task, path, fields)
-   else
-     job_id = "embulk_load_job_#{SecureRandom.uuid}"
-   end
+   job_id = "embulk_load_job_#{SecureRandom.uuid}"
    Embulk.logger.info { "embulk-output-bigquery: Load job starting... job_id:[#{job_id}] #{path} => #{@project}:#{@dataset}.#{table} in #{@location_for_log}" }
  else
    Embulk.logger.info { "embulk-output-bigquery: Load job starting... #{path} does not exist, skipped" }
@@ -437,7 +429,6 @@ module Embulk
    type: options['time_partitioning']['type'],
    expiration_ms: options['time_partitioning']['expiration_ms'],
    field: options['time_partitioning']['field'],
-   require_partition_filter: options['time_partitioning']['require_partition_filter'],
  }
  end

test/test_configure.rb CHANGED
@@ -55,14 +55,13 @@ module Embulk
  assert_equal nil, task['table_old']
  assert_equal nil, task['table_name_old']
  assert_equal false, task['auto_create_dataset']
- assert_equal false, task['auto_create_table']
+ assert_equal true, task['auto_create_table']
  assert_equal nil, task['schema_file']
  assert_equal nil, task['template_table']
  assert_equal true, task['delete_from_local_when_job_end']
  assert_equal 3600, task['job_status_max_polling_time']
  assert_equal 10, task['job_status_polling_interval']
  assert_equal false, task['is_skip_job_result_check']
- assert_equal false, task['prevent_duplicate_insert']
  assert_equal false, task['with_rehearsal']
  assert_equal 1000, task['rehearsal_counts']
  assert_equal [], task['column_options']
@@ -162,22 +161,22 @@ module Embulk
  end

  def test_payload_column
-   config = least_config.merge('payload_column' => schema.first.name)
+   config = least_config.merge('payload_column' => schema.first.name, 'auto_create_table' => false, 'mode' => 'append_direct')
    task = Bigquery.configure(config, schema, processor_count)
    assert_equal task['payload_column_index'], 0

-   config = least_config.merge('payload_column' => 'not_exist')
+   config = least_config.merge('payload_column' => 'not_exist', 'auto_create_table' => false, 'mode' => 'append_direct')
    assert_raise { Bigquery.configure(config, schema, processor_count) }
  end

  def test_payload_column_index
-   config = least_config.merge('payload_column_index' => 0)
+   config = least_config.merge('payload_column_index' => 0, 'auto_create_table' => false, 'mode' => 'append_direct')
    assert_nothing_raised { Bigquery.configure(config, schema, processor_count) }

-   config = least_config.merge('payload_column_index' => -1)
+   config = least_config.merge('payload_column_index' => -1, 'auto_create_table' => false, 'mode' => 'append_direct')
    assert_raise { Bigquery.configure(config, schema, processor_count) }

-   config = least_config.merge('payload_column_index' => schema.size)
+   config = least_config.merge('payload_column_index' => schema.size, 'auto_create_table' => false, 'mode' => 'append_direct')
    assert_raise { Bigquery.configure(config, schema, processor_count) }
  end

test/test_example.rb CHANGED
@@ -33,7 +33,6 @@ else
  files.each do |config_path|
    if %w[
      config_expose_errors.yml
-     config_prevent_duplicate_insert.yml
    ].include?(File.basename(config_path))
      define_method(:"test_#{File.basename(config_path, ".yml")}") do
        assert_false embulk_run(config_path)
test/test_transaction.rb CHANGED
@@ -41,8 +41,8 @@ module Embulk
  end

  sub_test_case "append_direct" do
-   def test_append_direct
-     config = least_config.merge('mode' => 'append_direct')
+   def test_append_direct_without_auto_create
+     config = least_config.merge('mode' => 'append_direct', 'auto_create_dataset' => false, 'auto_create_table' => false)
      any_instance_of(BigqueryClient) do |obj|
        mock(obj).get_dataset(config['dataset'])
        mock(obj).get_table(config['table'])
@@ -60,8 +60,8 @@ module Embulk
      Bigquery.transaction(config, schema, processor_count, &control)
    end

-   def test_append_direct_with_partition
-     config = least_config.merge('mode' => 'append_direct', 'table' => 'table$20160929')
+   def test_append_direct_with_partition_without_auto_create
+     config = least_config.merge('mode' => 'append_direct', 'table' => 'table$20160929', 'auto_create_dataset' => false, 'auto_create_table' => false)
      any_instance_of(BigqueryClient) do |obj|
        mock(obj).get_dataset(config['dataset'])
        mock(obj).get_table(config['table'])
@@ -86,7 +86,7 @@ module Embulk
    task = Bigquery.configure(config, schema, processor_count)
    any_instance_of(BigqueryClient) do |obj|
      mock(obj).get_dataset(config['dataset'])
-     mock(obj).delete_partition(config['table'])
+     mock(obj).delete_table_or_partition(config['table'])
      mock(obj).create_table_if_not_exists(config['table'])
    end
    Bigquery.transaction(config, schema, processor_count, &control)
@@ -97,7 +97,7 @@ module Embulk
    task = Bigquery.configure(config, schema, processor_count)
    any_instance_of(BigqueryClient) do |obj|
      mock(obj).get_dataset(config['dataset'])
-     mock(obj).delete_partition(config['table'])
+     mock(obj).delete_table_or_partition(config['table'])
      mock(obj).create_table_if_not_exists(config['table'])
    end
    Bigquery.transaction(config, schema, processor_count, &control)
@@ -111,6 +111,7 @@ module Embulk
    any_instance_of(BigqueryClient) do |obj|
      mock(obj).get_dataset(config['dataset'])
      mock(obj).create_table_if_not_exists(config['temp_table'])
+     mock(obj).create_table_if_not_exists(config['table'])
      mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
      mock(obj).delete_table(config['temp_table'])
    end
@@ -120,19 +121,6 @@ module Embulk
  def test_replace_with_partitioning
    config = least_config.merge('mode' => 'replace', 'table' => 'table$20160929')
    task = Bigquery.configure(config, schema, processor_count)
-   any_instance_of(BigqueryClient) do |obj|
-     mock(obj).get_dataset(config['dataset'])
-     mock(obj).create_table_if_not_exists(config['temp_table'])
-     mock(obj).get_table(config['table'])
-     mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
-     mock(obj).delete_table(config['temp_table'])
-   end
-   Bigquery.transaction(config, schema, processor_count, &control)
- end
-
- def test_replace_with_partitioning_with_auto_create_table
-   config = least_config.merge('mode' => 'replace', 'table' => 'table$20160929', 'auto_create_table' => true)
-   task = Bigquery.configure(config, schema, processor_count)
    any_instance_of(BigqueryClient) do |obj|
      mock(obj).get_dataset(config['dataset'])
      mock(obj).create_table_if_not_exists(config['temp_table'])
@@ -152,8 +140,10 @@ module Embulk
      mock(obj).get_dataset(config['dataset'])
      mock(obj).get_dataset(config['dataset_old'])
      mock(obj).create_table_if_not_exists(config['temp_table'])
+     mock(obj).create_table_if_not_exists(config['table'])
+     mock(obj).create_table_if_not_exists(config['table_old'], dataset: config['dataset_old'])

-     mock(obj).get_table_or_partition(task['table'])
+     mock(obj).get_table_or_partition(config['table'])
      mock(obj).copy(config['table'], config['table_old'], config['dataset_old'])

      mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
@@ -168,9 +158,11 @@ module Embulk
    any_instance_of(BigqueryClient) do |obj|
      mock(obj).create_dataset(config['dataset'])
      mock(obj).create_dataset(config['dataset_old'], reference: config['dataset'])
+     mock(obj).create_table_if_not_exists(config['table'])
      mock(obj).create_table_if_not_exists(config['temp_table'])
+     mock(obj).create_table_if_not_exists(config['table_old'], dataset: config['dataset_old'])

-     mock(obj).get_table_or_partition(task['table'])
+     mock(obj).get_table_or_partition(config['table'])
      mock(obj).copy(config['table'], config['table_old'], config['dataset_old'])

      mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
@@ -180,35 +172,16 @@ module Embulk
  end

  def test_replace_backup_with_partitioning
-   config = least_config.merge('mode' => 'replace_backup', 'table' => 'table$20160929', 'dataset_old' => 'dataset_old', 'table_old' => 'table_old$20190929', 'temp_table' => 'temp_table')
-   task = Bigquery.configure(config, schema, processor_count)
-   any_instance_of(BigqueryClient) do |obj|
-     mock(obj).get_dataset(config['dataset'])
-     mock(obj).get_dataset(config['dataset_old'])
-     mock(obj).create_table_if_not_exists(config['temp_table'])
-     mock(obj).get_table(task['table'])
-     mock(obj).get_table(task['table_old'], dataset: config['dataset_old'])
-
-     mock(obj).get_table_or_partition(task['table'])
-     mock(obj).copy(config['table'], config['table_old'], config['dataset_old'])
-
-     mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
-     mock(obj).delete_table(config['temp_table'])
-   end
-   Bigquery.transaction(config, schema, processor_count, &control)
- end
-
- def test_replace_backup_with_partitioning_auto_create_table
    config = least_config.merge('mode' => 'replace_backup', 'table' => 'table$20160929', 'dataset_old' => 'dataset_old', 'table_old' => 'table_old$20160929', 'temp_table' => 'temp_table', 'auto_create_table' => true)
    task = Bigquery.configure(config, schema, processor_count)
    any_instance_of(BigqueryClient) do |obj|
      mock(obj).get_dataset(config['dataset'])
      mock(obj).get_dataset(config['dataset_old'])
      mock(obj).create_table_if_not_exists(config['temp_table'])
-     mock(obj).create_table_if_not_exists(task['table'])
-     mock(obj).create_table_if_not_exists(task['table_old'], dataset: config['dataset_old'])
+     mock(obj).create_table_if_not_exists(config['table'])
+     mock(obj).create_table_if_not_exists(config['table_old'], dataset: config['dataset_old'])

-     mock(obj).get_table_or_partition(task['table'])
+     mock(obj).get_table_or_partition(config['table'])
      mock(obj).copy(config['table'], config['table_old'], config['dataset_old'])

      mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
@@ -225,6 +198,7 @@ module Embulk
    any_instance_of(BigqueryClient) do |obj|
      mock(obj).get_dataset(config['dataset'])
      mock(obj).create_table_if_not_exists(config['temp_table'])
+     mock(obj).create_table_if_not_exists(config['table'])
      mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_APPEND')
      mock(obj).delete_table(config['temp_table'])
    end
@@ -232,19 +206,6 @@ module Embulk
  end

  def test_append_with_partitioning
-   config = least_config.merge('mode' => 'append', 'table' => 'table$20160929')
-   task = Bigquery.configure(config, schema, processor_count)
-   any_instance_of(BigqueryClient) do |obj|
-     mock(obj).get_dataset(config['dataset'])
-     mock(obj).create_table_if_not_exists(config['temp_table'])
-     mock(obj).get_table(config['table'])
-     mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_APPEND')
-     mock(obj).delete_table(config['temp_table'])
-   end
-   Bigquery.transaction(config, schema, processor_count, &control)
- end
-
- def test_append_with_partitioning_with_auto_create_table
    config = least_config.merge('mode' => 'append', 'table' => 'table$20160929', 'auto_create_table' => true)
    task = Bigquery.configure(config, schema, processor_count)
    any_instance_of(BigqueryClient) do |obj|
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: embulk-output-bigquery
  version: !ruby/object:Gem::Version
-   version: 0.4.14
+   version: 0.5.0
  platform: ruby
  authors:
  - Satoshi Akama
@@ -102,7 +102,6 @@ files:
  - example/config_nested_record.yml
  - example/config_payload_column.yml
  - example/config_payload_column_index.yml
- - example/config_prevent_duplicate_insert.yml
  - example/config_progress_log_interval.yml
  - example/config_replace.yml
  - example/config_replace_backup.yml
example/config_prevent_duplicate_insert.yml DELETED
@@ -1,30 +0,0 @@
- in:
-   type: file
-   path_prefix: example/example.csv
-   parser:
-     type: csv
-     charset: UTF-8
-     newline: CRLF
-     null_string: 'NULL'
-     skip_header_lines: 1
-     comment_line_marker: '#'
-     columns:
-       - {name: date, type: string}
-       - {name: timestamp, type: timestamp, format: "%Y-%m-%d %H:%M:%S.%N", timezone: "+09:00"}
-       - {name: "null", type: string}
-       - {name: long, type: long}
-       - {name: string, type: string}
-       - {name: double, type: double}
-       - {name: boolean, type: boolean}
- out:
-   type: bigquery
-   mode: append
-   auth_method: json_key
-   json_keyfile: example/your-project-000.json
-   dataset: your_dataset_name
-   table: your_table_name
-   source_format: NEWLINE_DELIMITED_JSON
-   auto_create_dataset: true
-   auto_create_table: true
-   schema_file: example/schema.json
-   prevent_duplicate_insert: true