embulk-output-bigquery 0.4.14 → 0.5.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +7 -0
- data/README.md +4 -36
- data/embulk-output-bigquery.gemspec +1 -1
- data/lib/embulk/output/bigquery.rb +11 -32
- data/lib/embulk/output/bigquery/bigquery_client.rb +2 -11
- data/test/test_configure.rb +6 -7
- data/test/test_example.rb +0 -1
- data/test/test_transaction.rb +17 -56
- metadata +1 -2
- data/example/config_prevent_duplicate_insert.yml +0 -30
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 3e0087103039718cb24224b6bb793d820b53b935194d412e4b2984aba3d7d7a8
+  data.tar.gz: 9ac27a3b881277450cbfaa096de0690c721a8f86f0e78abb692c8a4ed5b679d5
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6b0ccf4e349a5d15321cfcc97138a98676bddfd412fd6fadfc8b1e0d6cd31d9739a8a5f46ccd923644543ae43cc0134b3e7598f80d89c330a4ac8aec49c084c1
+  data.tar.gz: f02557cdd7956620ae59eb6bc0e5872992d20a65881bd69230b0b0442342a36203d1eedd8a20702d2000f412b909359657bfa300b3e82b5f494398ea6e5ea301
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,10 @@
+## 0.5.0 - 2019-08-10
+
+* [incompatibility change] Drop deprecated time\_partitioning.require\_partition\_filter
+* [incompatibility change] Drop prevent\_duplicate\_insert which has no use-case now
+* [incompatibility change] Change the default value of `auto_create_table` to `true` from `false`
+* Modes `replace`, `replace_backup`, `append`, and `delete_in_advance` (that is, all modes except `append_direct`) require `auto_create_table: true`.
+
 ## 0.4.14 - 2019-08-10
 
 * [enhancement] Support field partitioning correctly.
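To make the `auto_create_table` change concrete, here is a minimal sketch in the style of `test_configure.rb` below (`least_config`, `schema`, and `processor_count` are that suite's fixtures; the snippet is illustrative, not part of this release):

```ruby
# Under 0.5.0, a config that omits auto_create_table now resolves to true;
# through 0.4.x the same config resolved to false.
task = Bigquery.configure(least_config, schema, processor_count)
task['auto_create_table'] # => true
```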
data/README.md
CHANGED
@@ -23,14 +23,6 @@ https://developers.google.com/bigquery/loading-data-into-bigquery
 Current version of this plugin supports Google API with Service Account Authentication, but does not support
 OAuth flow for installed applications.
 
-### INCOMPATIBILITY CHANGES
-
-v0.3.x has incompatibility changes with v0.2.x. Please see [CHANGELOG.md](CHANGELOG.md) for details.
-
-* `formatter` option (formatter plugin support) is dropped. Use `source_format` option instead. (it already exists in v0.2.x too)
-* `encoders` option (encoder plugin support) is dropped. Use `compression` option instead (it already exists in v0.2.x too).
-* `mode: append` mode now expresses a transactional append, and `mode: append_direct` is one which is not transactional.
-
 ## Configuration
 
 #### Original options
@@ -47,10 +39,9 @@ v0.3.x has incompatibility changes with v0.2.x. Please see [CHANGELOG.md](CHANGE
 | location | string | optional | nil | geographic location of dataset. See [Location](#location) |
 | table | string | required | | table name, or table name with a partition decorator such as `table_name$20160929`|
 | auto_create_dataset | boolean | optional | false | automatically create dataset |
-| auto_create_table | boolean | optional |
+| auto_create_table | boolean | optional | true | `false` is available only for `append_direct` mode. Other modes require `true`. See [Dynamic Table Creating](#dynamic-table-creating) and [Time Partitioning](#time-partitioning) |
 | schema_file | string | optional | | /path/to/schema.json |
 | template_table | string | optional | | template table name. See [Dynamic Table Creating](#dynamic-table-creating) |
-| prevent_duplicate_insert | boolean | optional | false | See [Prevent Duplication](#prevent-duplication) |
 | job_status_max_polling_time | int | optional | 3600 sec | Max job status polling time |
 | job_status_polling_interval | int | optional | 10 sec | Job status polling interval |
 | is_skip_job_result_check | boolean | optional | false | Skip waiting Load job finishes. Available for append, or delete_in_advance mode |
@@ -107,7 +98,6 @@ Following options are same as [bq command-line tools](https://cloud.google.com/b
 | time_partitioning.type | string | required | nil | The only type supported is DAY, which will generate one partition per day based on data loading time. |
 | time_partitioning.expiration_ms | int | optional | nil | Number of milliseconds for which to keep the storage for a partition. |
 | time_partitioning.field | string | optional | nil | `DATE` or `TIMESTAMP` column used for partitioning |
-| time_partitioning.require_partition_filter | boolean | optional | nil | If true, valid partition filter is required when query |
 | clustering | hash | optional | nil | Currently, clustering is supported for partitioned tables, so must be used with `time_partitioning` option. See [clustered tables](https://cloud.google.com/bigquery/docs/clustered-tables) |
 | clustering.fields | array | required | nil | One or more fields on which data should be clustered. The order of the specified columns determines the sort order of the data. |
 | schema_update_options | array | optional | nil | (Experimental) List of `ALLOW_FIELD_ADDITION` or `ALLOW_FIELD_RELAXATION` or both. See [jobs#configuration.load.schemaUpdateOptions](https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load.schemaUpdateOptions). NOTE for the current status: `schema_update_options` does not work for `copy` job, that is, is not effective for most of modes such as `append`, `replace` and `replace_backup`. `delete_in_advance` deletes origin table so does not need to update schema. Only `append_direct` can utilize schema update. |
@@ -252,11 +242,6 @@ out:
 
 ### Dynamic table creating
 
-This plugin tries to create a table using BigQuery API when
-
-* mode is either of `delete_in_advance`, `replace`, `replace_backup`, `append`.
-* mode is `append_direct` and `auto_create_table` is true.
-
 There are 3 ways to set schema.
 
 #### Set schema.json
@@ -355,22 +340,6 @@ out:
   payload_column_index: 0 # or, payload_column: payload
 ```
 
-### Prevent Duplication
-
-`prevent_duplicate_insert` option is used to prevent inserting same data for modes `append` or `append_direct`.
-
-When `prevent_duplicate_insert` is set to true, embulk-output-bigquery generate job ID from md5 hash of file and other options.
-
-`job ID = md5(md5(file) + dataset + table + schema + source_format + file_delimiter + max_bad_records + encoding + ignore_unknown_values + allow_quoted_newlines)`
-
-[job ID must be unique(including failures)](https://cloud.google.com/bigquery/loading-data-into-bigquery#consistency) so that same data can't be inserted with same settings repeatedly.
-
-```yaml
-out:
-  type: bigquery
-  prevent_duplicate_insert: true
-```
-
 ### GCS Bucket
 
 This is useful to reduce number of consumed jobs, which is limited by [100,000 jobs per project per day](https://cloud.google.com/bigquery/quotas#load_jobs).
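For reference, the deterministic job ID that the removed section describes can be sketched from its own formula (a hypothetical reconstruction; the exact implementation of `Helper.create_load_job_id` is not shown in this diff):

```ruby
require 'digest/md5'

# A minimal sketch of the dropped scheme, following the quoted formula
# md5(md5(file) + dataset + table + ...); names are illustrative only.
def sketch_load_job_id(path, task)
  seed = Digest::MD5.file(path).hexdigest
  %w[dataset table source_format encoding].each do |key| # plus the other listed options
    seed += task[key].to_s
  end
  "embulk_load_job_#{Digest::MD5.hexdigest(seed)}"
end
```

Because BigQuery refuses to start two jobs with the same ID, re-running the same file with the same settings could not insert the data twice.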
@@ -401,32 +370,31 @@ To load into a partition, specify `table` parameter with a partition decorator a
 out:
   type: bigquery
   table: table_name$20160929
-  auto_create_table: true
 ```
 
-You may configure `time_partitioning` parameter together
+You may configure `time_partitioning` parameter together as:
 
 ```yaml
 out:
   type: bigquery
   table: table_name$20160929
-  auto_create_table: true
   time_partitioning:
     type: DAY
     expiration_ms: 259200000
 ```
 
 You can also create column-based partitioning table as:
+
 ```yaml
 out:
   type: bigquery
   mode: replace
-  auto_create_table: true
   table: table_name
   time_partitioning:
     type: DAY
     field: timestamp
 ```
+
 Note the `time_partitioning.field` should be top-level `DATE` or `TIMESTAMP`.
 
 Use [Tables: patch](https://cloud.google.com/bigquery/docs/reference/v2/tables/patch) API to update the schema of the partitioned table, embulk-output-bigquery itself does not support it, though.
data/embulk-output-bigquery.gemspec
CHANGED
@@ -1,6 +1,6 @@
 Gem::Specification.new do |spec|
   spec.name = "embulk-output-bigquery"
-  spec.version = "0.4.14"
+  spec.version = "0.5.0"
   spec.authors = ["Satoshi Akama", "Naotoshi Seo"]
   spec.summary = "Google BigQuery output plugin for Embulk"
   spec.description = "Embulk plugin that insert records to Google BigQuery."
data/lib/embulk/output/bigquery.rb
CHANGED
@@ -45,7 +45,7 @@ module Embulk
       'table_old' => config.param('table_old', :string, :default => nil),
       'table_name_old' => config.param('table_name_old', :string, :default => nil), # lower version compatibility
       'auto_create_dataset' => config.param('auto_create_dataset', :bool, :default => false),
-      'auto_create_table' => config.param('auto_create_table', :bool, :default => false),
+      'auto_create_table' => config.param('auto_create_table', :bool, :default => true),
       'schema_file' => config.param('schema_file', :string, :default => nil),
       'template_table' => config.param('template_table', :string, :default => nil),
 
@@ -53,7 +53,6 @@ module Embulk
       'job_status_max_polling_time' => config.param('job_status_max_polling_time', :integer, :default => 3600),
       'job_status_polling_interval' => config.param('job_status_polling_interval', :integer, :default => 10),
       'is_skip_job_result_check' => config.param('is_skip_job_result_check', :bool, :default => false),
-      'prevent_duplicate_insert' => config.param('prevent_duplicate_insert', :bool, :default => false),
       'with_rehearsal' => config.param('with_rehearsal', :bool, :default => false),
       'rehearsal_counts' => config.param('rehearsal_counts', :integer, :default => 1000),
       'abort_on_error' => config.param('abort_on_error', :bool, :default => nil),
@@ -105,10 +104,14 @@ module Embulk
         raise ConfigError.new "`mode` must be one of append, append_direct, replace, delete_in_advance, replace_backup"
       end
 
+      if %w[append replace delete_in_advance replace_backup].include?(task['mode']) and !task['auto_create_table']
+        raise ConfigError.new "`mode: #{task['mode']}` requires `auto_create_table: true`"
+      end
+
       if task['mode'] == 'replace_backup'
         task['table_old'] ||= task['table_name_old'] # for lower version compatibility
         if task['dataset_old'].nil? and task['table_old'].nil?
-          raise ConfigError.new "`mode replace_backup` requires either of `dataset_old` or `table_old`"
+          raise ConfigError.new "`mode: replace_backup` requires either of `dataset_old` or `table_old`"
         end
         task['dataset_old'] ||= task['dataset']
         task['table_old'] ||= task['table']
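The new guard can be exercised in the same style as `test_configure.rb` (a hypothetical snippet using that suite's `least_config`, `schema`, and `processor_count` fixtures, not part of this hunk):

```ruby
# Any mode other than append_direct now rejects auto_create_table: false
# at configure time.
config = least_config.merge('mode' => 'replace', 'auto_create_table' => false)
assert_raise(ConfigError) { Bigquery.configure(config, schema, processor_count) }

# append_direct remains the one mode where false is accepted.
config = least_config.merge('mode' => 'append_direct', 'auto_create_table' => false)
assert_nothing_raised { Bigquery.configure(config, schema, processor_count) }
```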
@@ -306,42 +309,18 @@ module Embulk
 
       case task['mode']
       when 'delete_in_advance'
-        bigquery.
+        bigquery.delete_table_or_partition(task['table'])
         bigquery.create_table_if_not_exists(task['table'])
       when 'replace'
         bigquery.create_table_if_not_exists(task['temp_table'])
-        if Helper.has_partition_decorator?(task['table'])
-          if task['auto_create_table']
-            bigquery.create_table_if_not_exists(task['table'])
-          else
-            bigquery.get_table(task['table']) # raises NotFoundError
-          end
-        end
+        bigquery.create_table_if_not_exists(task['table'])
       when 'append'
         bigquery.create_table_if_not_exists(task['temp_table'])
-        if Helper.has_partition_decorator?(task['table'])
-          if task['auto_create_table']
-            bigquery.create_table_if_not_exists(task['table'])
-          else
-            bigquery.get_table(task['table']) # raises NotFoundError
-          end
-        end
+        bigquery.create_table_if_not_exists(task['table'])
       when 'replace_backup'
         bigquery.create_table_if_not_exists(task['temp_table'])
-        if Helper.has_partition_decorator?(task['table'])
-          if task['auto_create_table']
-            bigquery.create_table_if_not_exists(task['table'])
-          else
-            bigquery.get_table(task['table']) # raises NotFoundError
-          end
-        end
-        if Helper.has_partition_decorator?(task['table_old'])
-          if task['auto_create_table']
-            bigquery.create_table_if_not_exists(task['table_old'], dataset: task['dataset_old'])
-          else
-            bigquery.get_table(task['table_old'], dataset: task['dataset_old']) # raises NotFoundError
-          end
-        end
+        bigquery.create_table_if_not_exists(task['table'])
+        bigquery.create_table_if_not_exists(task['table_old'], dataset: task['dataset_old'])
       else # append_direct
         if task['auto_create_table']
           bigquery.create_table_if_not_exists(task['table'])
data/lib/embulk/output/bigquery/bigquery_client.rb
CHANGED
@@ -79,11 +79,7 @@ module Embulk
       begin
         # As https://cloud.google.com/bigquery/docs/managing_jobs_datasets_projects#managingjobs says,
         # we should generate job_id in client code, otherwise, retrying would cause duplication
-        if @task['prevent_duplicate_insert'] and (@task['mode'] == 'append' or @task['mode'] == 'append_direct')
-          job_id = Helper.create_load_job_id(@task, path, fields)
-        else
-          job_id = "embulk_load_job_#{SecureRandom.uuid}"
-        end
+        job_id = "embulk_load_job_#{SecureRandom.uuid}"
         Embulk.logger.info { "embulk-output-bigquery: Load job starting... job_id:[#{job_id}] #{object_uris} => #{@project}:#{@dataset}.#{table} in #{@location_for_log}" }
 
         body = {
@@ -174,11 +170,7 @@ module Embulk
       if File.exist?(path)
         # As https://cloud.google.com/bigquery/docs/managing_jobs_datasets_projects#managingjobs says,
         # we should generate job_id in client code, otherwise, retrying would cause duplication
-        if @task['prevent_duplicate_insert'] and (@task['mode'] == 'append' or @task['mode'] == 'append_direct')
-          job_id = Helper.create_load_job_id(@task, path, fields)
-        else
-          job_id = "embulk_load_job_#{SecureRandom.uuid}"
-        end
+        job_id = "embulk_load_job_#{SecureRandom.uuid}"
         Embulk.logger.info { "embulk-output-bigquery: Load job starting... job_id:[#{job_id}] #{path} => #{@project}:#{@dataset}.#{table} in #{@location_for_log}" }
       else
         Embulk.logger.info { "embulk-output-bigquery: Load job starting... #{path} does not exist, skipped" }
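Both load paths now always use the random client-side ID. A minimal sketch of why that still protects against retry duplication (the `job_reference` shape is assumed from Ruby google-api-client conventions, not shown in this hunk):

```ruby
require 'securerandom'

# One ID per load attempt, generated client-side; BigQuery will not start a
# second job with an ID it has already seen, so retrying the same insert_job
# request cannot duplicate the load.
project = 'your-project-000' # illustrative
job_id  = "embulk_load_job_#{SecureRandom.uuid}"
body = {
  job_reference: { project_id: project, job_id: job_id },
  # configuration: { load: { ...destination table, source format, etc... } }
}
```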
@@ -437,7 +429,6 @@ module Embulk
           type: options['time_partitioning']['type'],
           expiration_ms: options['time_partitioning']['expiration_ms'],
           field: options['time_partitioning']['field'],
-          require_partition_filter: options['time_partitioning']['require_partition_filter'],
         }
       end
 
data/test/test_configure.rb
CHANGED
@@ -55,14 +55,13 @@ module Embulk
       assert_equal nil, task['table_old']
       assert_equal nil, task['table_name_old']
       assert_equal false, task['auto_create_dataset']
-      assert_equal false, task['auto_create_table']
+      assert_equal true, task['auto_create_table']
       assert_equal nil, task['schema_file']
       assert_equal nil, task['template_table']
       assert_equal true, task['delete_from_local_when_job_end']
       assert_equal 3600, task['job_status_max_polling_time']
       assert_equal 10, task['job_status_polling_interval']
       assert_equal false, task['is_skip_job_result_check']
-      assert_equal false, task['prevent_duplicate_insert']
       assert_equal false, task['with_rehearsal']
       assert_equal 1000, task['rehearsal_counts']
       assert_equal [], task['column_options']
@@ -162,22 +161,22 @@ module Embulk
     end
 
     def test_payload_column
-      config = least_config.merge('payload_column' => schema.first.name)
+      config = least_config.merge('payload_column' => schema.first.name, 'auto_create_table' => false, 'mode' => 'append_direct')
       task = Bigquery.configure(config, schema, processor_count)
       assert_equal task['payload_column_index'], 0
 
-      config = least_config.merge('payload_column' => 'not_exist')
+      config = least_config.merge('payload_column' => 'not_exist', 'auto_create_table' => false, 'mode' => 'append_direct')
       assert_raise { Bigquery.configure(config, schema, processor_count) }
     end
 
     def test_payload_column_index
-      config = least_config.merge('payload_column_index' => 0)
+      config = least_config.merge('payload_column_index' => 0, 'auto_create_table' => false, 'mode' => 'append_direct')
       assert_nothing_raised { Bigquery.configure(config, schema, processor_count) }
 
-      config = least_config.merge('payload_column_index' => -1)
+      config = least_config.merge('payload_column_index' => -1, 'auto_create_table' => false, 'mode' => 'append_direct')
       assert_raise { Bigquery.configure(config, schema, processor_count) }
 
-      config = least_config.merge('payload_column_index' => schema.size)
+      config = least_config.merge('payload_column_index' => schema.size, 'auto_create_table' => false, 'mode' => 'append_direct')
       assert_raise { Bigquery.configure(config, schema, processor_count) }
     end
 
data/test/test_example.rb
CHANGED
data/test/test_transaction.rb
CHANGED
@@ -41,8 +41,8 @@ module Embulk
     end
 
     sub_test_case "append_direct" do
-      def test_append_direct
-        config = least_config.merge('mode' => 'append_direct')
+      def test_append_direc_without_auto_create
+        config = least_config.merge('mode' => 'append_direct', 'auto_create_dataset' => false, 'auto_create_table' => false)
         any_instance_of(BigqueryClient) do |obj|
           mock(obj).get_dataset(config['dataset'])
           mock(obj).get_table(config['table'])
@@ -60,8 +60,8 @@ module Embulk
         Bigquery.transaction(config, schema, processor_count, &control)
       end
 
-      def test_append_direct_with_partition
-        config = least_config.merge('mode' => 'append_direct', 'table' => 'table$20160929')
+      def test_append_direct_with_partition_without_auto_create
+        config = least_config.merge('mode' => 'append_direct', 'table' => 'table$20160929', 'auto_create_dataset' => false, 'auto_create_table' => false)
         any_instance_of(BigqueryClient) do |obj|
           mock(obj).get_dataset(config['dataset'])
           mock(obj).get_table(config['table'])
@@ -86,7 +86,7 @@ module Embulk
         task = Bigquery.configure(config, schema, processor_count)
         any_instance_of(BigqueryClient) do |obj|
           mock(obj).get_dataset(config['dataset'])
-          mock(obj).
+          mock(obj).delete_table_or_partition(config['table'])
           mock(obj).create_table_if_not_exists(config['table'])
         end
         Bigquery.transaction(config, schema, processor_count, &control)
@@ -97,7 +97,7 @@ module Embulk
         task = Bigquery.configure(config, schema, processor_count)
         any_instance_of(BigqueryClient) do |obj|
           mock(obj).get_dataset(config['dataset'])
-          mock(obj).
+          mock(obj).delete_table_or_partition(config['table'])
           mock(obj).create_table_if_not_exists(config['table'])
         end
         Bigquery.transaction(config, schema, processor_count, &control)
@@ -111,6 +111,7 @@ module Embulk
         any_instance_of(BigqueryClient) do |obj|
           mock(obj).get_dataset(config['dataset'])
           mock(obj).create_table_if_not_exists(config['temp_table'])
+          mock(obj).create_table_if_not_exists(config['table'])
           mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
           mock(obj).delete_table(config['temp_table'])
         end
@@ -120,19 +121,6 @@ module Embulk
       def test_replace_with_partitioning
         config = least_config.merge('mode' => 'replace', 'table' => 'table$20160929')
         task = Bigquery.configure(config, schema, processor_count)
-        any_instance_of(BigqueryClient) do |obj|
-          mock(obj).get_dataset(config['dataset'])
-          mock(obj).create_table_if_not_exists(config['temp_table'])
-          mock(obj).get_table(config['table'])
-          mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
-          mock(obj).delete_table(config['temp_table'])
-        end
-        Bigquery.transaction(config, schema, processor_count, &control)
-      end
-
-      def test_replace_with_partitioning_with_auto_create_table
-        config = least_config.merge('mode' => 'replace', 'table' => 'table$20160929', 'auto_create_table' => true)
-        task = Bigquery.configure(config, schema, processor_count)
         any_instance_of(BigqueryClient) do |obj|
           mock(obj).get_dataset(config['dataset'])
           mock(obj).create_table_if_not_exists(config['temp_table'])
@@ -152,8 +140,10 @@ module Embulk
           mock(obj).get_dataset(config['dataset'])
           mock(obj).get_dataset(config['dataset_old'])
           mock(obj).create_table_if_not_exists(config['temp_table'])
+          mock(obj).create_table_if_not_exists(config['table'])
+          mock(obj).create_table_if_not_exists(config['table_old'], dataset: config['dataset_old'])
 
-          mock(obj).get_table_or_partition(task['table'])
+          mock(obj).get_table_or_partition(config['table'])
           mock(obj).copy(config['table'], config['table_old'], config['dataset_old'])
 
           mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
@@ -168,9 +158,11 @@ module Embulk
         any_instance_of(BigqueryClient) do |obj|
           mock(obj).create_dataset(config['dataset'])
           mock(obj).create_dataset(config['dataset_old'], reference: config['dataset'])
+          mock(obj).create_table_if_not_exists(config['table'])
           mock(obj).create_table_if_not_exists(config['temp_table'])
+          mock(obj).create_table_if_not_exists(config['table_old'], dataset: config['dataset_old'])
 
-          mock(obj).get_table_or_partition(task['table'])
+          mock(obj).get_table_or_partition(config['table'])
           mock(obj).copy(config['table'], config['table_old'], config['dataset_old'])
 
           mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
@@ -180,35 +172,16 @@ module Embulk
       end
 
       def test_replace_backup_with_partitioning
-        config = least_config.merge('mode' => 'replace_backup', 'table' => 'table$20160929', 'dataset_old' => 'dataset_old', 'table_old' => 'table_old$20190929', 'temp_table' => 'temp_table')
-        task = Bigquery.configure(config, schema, processor_count)
-        any_instance_of(BigqueryClient) do |obj|
-          mock(obj).get_dataset(config['dataset'])
-          mock(obj).get_dataset(config['dataset_old'])
-          mock(obj).create_table_if_not_exists(config['temp_table'])
-          mock(obj).get_table(task['table'])
-          mock(obj).get_table(task['table_old'], dataset: config['dataset_old'])
-
-          mock(obj).get_table_or_partition(task['table'])
-          mock(obj).copy(config['table'], config['table_old'], config['dataset_old'])
-
-          mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
-          mock(obj).delete_table(config['temp_table'])
-        end
-        Bigquery.transaction(config, schema, processor_count, &control)
-      end
-
-      def test_replace_backup_with_partitioning_auto_create_table
         config = least_config.merge('mode' => 'replace_backup', 'table' => 'table$20160929', 'dataset_old' => 'dataset_old', 'table_old' => 'table_old$20160929', 'temp_table' => 'temp_table', 'auto_create_table' => true)
         task = Bigquery.configure(config, schema, processor_count)
         any_instance_of(BigqueryClient) do |obj|
           mock(obj).get_dataset(config['dataset'])
           mock(obj).get_dataset(config['dataset_old'])
           mock(obj).create_table_if_not_exists(config['temp_table'])
-          mock(obj).create_table_if_not_exists(task['table'])
-          mock(obj).create_table_if_not_exists(task['table_old'], dataset: config['dataset_old'])
+          mock(obj).create_table_if_not_exists(config['table'])
+          mock(obj).create_table_if_not_exists(config['table_old'], dataset: config['dataset_old'])
 
-          mock(obj).get_table_or_partition(task['table'])
+          mock(obj).get_table_or_partition(config['table'])
           mock(obj).copy(config['table'], config['table_old'], config['dataset_old'])
 
           mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
@@ -225,6 +198,7 @@ module Embulk
         any_instance_of(BigqueryClient) do |obj|
           mock(obj).get_dataset(config['dataset'])
           mock(obj).create_table_if_not_exists(config['temp_table'])
+          mock(obj).create_table_if_not_exists(config['table'])
          mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_APPEND')
           mock(obj).delete_table(config['temp_table'])
         end
@@ -232,19 +206,6 @@ module Embulk
       end
 
       def test_append_with_partitioning
-        config = least_config.merge('mode' => 'append', 'table' => 'table$20160929')
-        task = Bigquery.configure(config, schema, processor_count)
-        any_instance_of(BigqueryClient) do |obj|
-          mock(obj).get_dataset(config['dataset'])
-          mock(obj).create_table_if_not_exists(config['temp_table'])
-          mock(obj).get_table(config['table'])
-          mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_APPEND')
-          mock(obj).delete_table(config['temp_table'])
-        end
-        Bigquery.transaction(config, schema, processor_count, &control)
-      end
-
-      def test_append_with_partitioning_with_auto_create_table
         config = least_config.merge('mode' => 'append', 'table' => 'table$20160929', 'auto_create_table' => true)
         task = Bigquery.configure(config, schema, processor_count)
         any_instance_of(BigqueryClient) do |obj|
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: embulk-output-bigquery
 version: !ruby/object:Gem::Version
-  version: 0.4.14
+  version: 0.5.0
 platform: ruby
 authors:
 - Satoshi Akama
@@ -102,7 +102,6 @@ files:
 - example/config_nested_record.yml
 - example/config_payload_column.yml
 - example/config_payload_column_index.yml
-- example/config_prevent_duplicate_insert.yml
 - example/config_progress_log_interval.yml
 - example/config_replace.yml
 - example/config_replace_backup.yml
data/example/config_prevent_duplicate_insert.yml
DELETED
@@ -1,30 +0,0 @@
-in:
-  type: file
-  path_prefix: example/example.csv
-  parser:
-    type: csv
-    charset: UTF-8
-    newline: CRLF
-    null_string: 'NULL'
-    skip_header_lines: 1
-    comment_line_marker: '#'
-    columns:
-    - {name: date, type: string}
-    - {name: timestamp, type: timestamp, format: "%Y-%m-%d %H:%M:%S.%N", timezone: "+09:00"}
-    - {name: "null", type: string}
-    - {name: long, type: long}
-    - {name: string, type: string}
-    - {name: double, type: double}
-    - {name: boolean, type: boolean}
-out:
-  type: bigquery
-  mode: append
-  auth_method: json_key
-  json_keyfile: example/your-project-000.json
-  dataset: your_dataset_name
-  table: your_table_name
-  source_format: NEWLINE_DELIMITED_JSON
-  auto_create_dataset: true
-  auto_create_table: true
-  schema_file: example/schema.json
-  prevent_duplicate_insert: true