embulk-output-bigquery 0.4.14 → 0.5.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +7 -0
- data/README.md +4 -36
- data/embulk-output-bigquery.gemspec +1 -1
- data/lib/embulk/output/bigquery.rb +11 -32
- data/lib/embulk/output/bigquery/bigquery_client.rb +2 -11
- data/test/test_configure.rb +6 -7
- data/test/test_example.rb +0 -1
- data/test/test_transaction.rb +17 -56
- metadata +1 -2
- data/example/config_prevent_duplicate_insert.yml +0 -30
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 3e0087103039718cb24224b6bb793d820b53b935194d412e4b2984aba3d7d7a8
+  data.tar.gz: 9ac27a3b881277450cbfaa096de0690c721a8f86f0e78abb692c8a4ed5b679d5
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6b0ccf4e349a5d15321cfcc97138a98676bddfd412fd6fadfc8b1e0d6cd31d9739a8a5f46ccd923644543ae43cc0134b3e7598f80d89c330a4ac8aec49c084c1
+  data.tar.gz: f02557cdd7956620ae59eb6bc0e5872992d20a65881bd69230b0b0442342a36203d1eedd8a20702d2000f412b909359657bfa300b3e82b5f494398ea6e5ea301
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,10 @@
+## 0.5.0 - 2019-08-10
+
+* [incompatibility change] Drop deprecated time\_partitioning.require\_partition\_filter
+* [incompatibility change] Drop prevent\_duplicate\_insert, which has no use-case now
+* [incompatibility change] Change the default value of `auto\_create\_table` from `false` to `true`
+* Modes `replace`, `replace_backup`, `append`, and `delete_in_advance` (that is, every mode except `append_direct`) require `auto_create_table: true`.
+
 ## 0.4.14 - 2019-08-10
 
 * [enhancement] Support field partitioning correctly.
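Taken together, the 0.5.0 entries above mean most configs upgrade cleanly: omitting `auto_create_table` now opts into `true`, and only configs that explicitly pin `auto_create_table: false` while using a transactional mode start failing. A minimal sketch of the before/after, written as plain Ruby config hashes in the style the plugin's tests build (`your_dataset`/`your_table` are placeholders):

```ruby
# Illustrative 0.4.14 -> 0.5.0 upgrade scenarios; not plugin code.

# Fine on both versions: auto_create_table is omitted, so 0.5.0 defaults it
# to true and auto-creates the target table.
ok = { 'mode' => 'replace', 'dataset' => 'your_dataset', 'table' => 'your_table' }

# Accepted by 0.4.14, rejected by 0.5.0 with
# "`mode: replace` requires `auto_create_table: true`":
broken = ok.merge('auto_create_table' => false)

# Still fine on 0.5.0: append_direct is the only mode allowed to keep
# auto_create_table: false, since it loads straight into an existing table.
direct = ok.merge('mode' => 'append_direct', 'auto_create_table' => false)
```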
data/README.md
CHANGED
@@ -23,14 +23,6 @@ https://developers.google.com/bigquery/loading-data-into-bigquery
 Current version of this plugin supports Google API with Service Account Authentication, but does not support
 OAuth flow for installed applications.
 
-### INCOMPATIBILITY CHANGES
-
-v0.3.x has incompatibility changes with v0.2.x. Please see [CHANGELOG.md](CHANGELOG.md) for details.
-
-* `formatter` option (formatter plugin support) is dropped. Use `source_format` option instead. (it already exists in v0.2.x too)
-* `encoders` option (encoder plugin support) is dropped. Use `compression` option instead (it already exists in v0.2.x too).
-* `mode: append` mode now expresses a transactional append, and `mode: append_direct` is one which is not transactional.
-
 ## Configuration
 
 #### Original options
@@ -47,10 +39,9 @@ v0.3.x has incompatibility changes with v0.2.x. Please see [CHANGELOG.md](CHANGELOG.md) for details.
 | location | string | optional | nil | geographic location of dataset. See [Location](#location) |
 | table | string | required | | table name, or table name with a partition decorator such as `table_name$20160929`|
 | auto_create_dataset | boolean | optional | false | automatically create dataset |
-| auto_create_table | boolean | optional |
+| auto_create_table | boolean | optional | true | `false` is available only for `append_direct` mode. Other modes require `true`. See [Dynamic Table Creating](#dynamic-table-creating) and [Time Partitioning](#time-partitioning) |
 | schema_file | string | optional | | /path/to/schema.json |
 | template_table | string | optional | | template table name. See [Dynamic Table Creating](#dynamic-table-creating) |
-| prevent_duplicate_insert | boolean | optional | false | See [Prevent Duplication](#prevent-duplication) |
 | job_status_max_polling_time | int | optional | 3600 sec | Max job status polling time |
 | job_status_polling_interval | int | optional | 10 sec | Job status polling interval |
 | is_skip_job_result_check | boolean | optional | false | Skip waiting Load job finishes. Available for append, or delete_in_advance mode |
@@ -107,7 +98,6 @@ Following options are same as [bq command-line tools](https://cloud.google.com/b
 | time_partitioning.type | string | required | nil | The only type supported is DAY, which will generate one partition per day based on data loading time. |
 | time_partitioning.expiration_ms | int | optional | nil | Number of milliseconds for which to keep the storage for a partition. |
 | time_partitioning.field | string | optional | nil | `DATE` or `TIMESTAMP` column used for partitioning |
-| time_partitioning.require_partition_filter | boolean | optional | nil | If true, valid partition filter is required when query |
 | clustering | hash | optional | nil | Currently, clustering is supported for partitioned tables, so must be used with `time_partitioning` option. See [clustered tables](https://cloud.google.com/bigquery/docs/clustered-tables) |
 | clustering.fields | array | required | nil | One or more fields on which data should be clustered. The order of the specified columns determines the sort order of the data. |
 | schema_update_options | array | optional | nil | (Experimental) List of `ALLOW_FIELD_ADDITION` or `ALLOW_FIELD_RELAXATION` or both. See [jobs#configuration.load.schemaUpdateOptions](https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load.schemaUpdateOptions). NOTE for the current status: `schema_update_options` does not work for `copy` job, that is, is not effective for most of modes such as `append`, `replace` and `replace_backup`. `delete_in_advance` deletes origin table so does not need to update schema. Only `append_direct` can utilize schema update. |
@@ -252,11 +242,6 @@ out:
 
 ### Dynamic table creating
 
-This plugin tries to create a table using BigQuery API when
-
-* mode is either of `delete_in_advance`, `replace`, `replace_backup`, `append`.
-* mode is `append_direct` and `auto_create_table` is true.
-
 There are 3 ways to set schema.
 
 #### Set schema.json
@@ -355,22 +340,6 @@ out:
   payload_column_index: 0 # or, payload_column: payload
 ```
 
-### Prevent Duplication
-
-`prevent_duplicate_insert` option is used to prevent inserting same data for modes `append` or `append_direct`.
-
-When `prevent_duplicate_insert` is set to true, embulk-output-bigquery generate job ID from md5 hash of file and other options.
-
-`job ID = md5(md5(file) + dataset + table + schema + source_format + file_delimiter + max_bad_records + encoding + ignore_unknown_values + allow_quoted_newlines)`
-
-[job ID must be unique(including failures)](https://cloud.google.com/bigquery/loading-data-into-bigquery#consistency) so that same data can't be inserted with same settings repeatedly.
-
-```yaml
-out:
-  type: bigquery
-  prevent_duplicate_insert: true
-```
-
 ### GCS Bucket
 
 This is useful to reduce number of consumed jobs, which is limited by [100,000 jobs per project per day](https://cloud.google.com/bigquery/quotas#load_jobs).
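The `### Prevent Duplication` section removed in the hunk above described deterministic md5-based job IDs; in 0.5.0 a random job ID is always generated instead (see the bigquery_client.rb hunks further down). A standalone sketch contrasting the two schemes; the md5 recipe here is simplified from the removed text, and the strings are placeholders:

```ruby
require 'digest/md5'
require 'securerandom'

# 0.4.x with prevent_duplicate_insert: true -- deterministic, so re-submitting
# the same file with the same settings was rejected by BigQuery's
# unique-job-id rule. (Simplified: the real recipe also hashed schema,
# source_format, and other load options.)
file_md5 = Digest::MD5.hexdigest('...local load file contents...')
deterministic_id = 'embulk_load_job_' +
  Digest::MD5.hexdigest(file_md5 + 'your_dataset' + 'your_table')

# 0.5.0 -- always a fresh UUID, so identical loads are never deduplicated.
random_id = "embulk_load_job_#{SecureRandom.uuid}"
puts deterministic_id, random_id
```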
@@ -401,32 +370,31 @@ To load into a partition, specify `table` parameter with a partition decorator as:
 out:
   type: bigquery
   table: table_name$20160929
-  auto_create_table: true
 ```
 
-You may configure `time_partitioning` parameter together
+You may configure `time_partitioning` parameter together as:
 
 ```yaml
 out:
   type: bigquery
   table: table_name$20160929
-  auto_create_table: true
   time_partitioning:
     type: DAY
     expiration_ms: 259200000
 ```
 
 You can also create column-based partitioning table as:
+
 ```yaml
 out:
   type: bigquery
   mode: replace
-  auto_create_table: true
   table: table_name
   time_partitioning:
     type: DAY
     field: timestamp
 ```
+
 Note the `time_partitioning.field` should be top-level `DATE` or `TIMESTAMP`.
 
 Use [Tables: patch](https://cloud.google.com/bigquery/docs/reference/v2/tables/patch) API to update the schema of the partitioned table, embulk-output-bigquery itself does not support it, though.
data/embulk-output-bigquery.gemspec
CHANGED
@@ -1,6 +1,6 @@
 Gem::Specification.new do |spec|
   spec.name = "embulk-output-bigquery"
-  spec.version = "0.4.14"
+  spec.version = "0.5.0"
   spec.authors = ["Satoshi Akama", "Naotoshi Seo"]
   spec.summary = "Google BigQuery output plugin for Embulk"
   spec.description = "Embulk plugin that insert records to Google BigQuery."
data/lib/embulk/output/bigquery.rb
CHANGED
@@ -45,7 +45,7 @@ module Embulk
         'table_old' => config.param('table_old', :string, :default => nil),
         'table_name_old' => config.param('table_name_old', :string, :default => nil), # lower version compatibility
         'auto_create_dataset' => config.param('auto_create_dataset', :bool, :default => false),
-        'auto_create_table' => config.param('auto_create_table', :bool, :default => false),
+        'auto_create_table' => config.param('auto_create_table', :bool, :default => true),
         'schema_file' => config.param('schema_file', :string, :default => nil),
         'template_table' => config.param('template_table', :string, :default => nil),
 
@@ -53,7 +53,6 @@ module Embulk
         'job_status_max_polling_time' => config.param('job_status_max_polling_time', :integer, :default => 3600),
         'job_status_polling_interval' => config.param('job_status_polling_interval', :integer, :default => 10),
         'is_skip_job_result_check' => config.param('is_skip_job_result_check', :bool, :default => false),
-        'prevent_duplicate_insert' => config.param('prevent_duplicate_insert', :bool, :default => false),
         'with_rehearsal' => config.param('with_rehearsal', :bool, :default => false),
         'rehearsal_counts' => config.param('rehearsal_counts', :integer, :default => 1000),
         'abort_on_error' => config.param('abort_on_error', :bool, :default => nil),
@@ -105,10 +104,14 @@ module Embulk
         raise ConfigError.new "`mode` must be one of append, append_direct, replace, delete_in_advance, replace_backup"
       end
 
+      if %w[append replace delete_in_advance replace_backup].include?(task['mode']) and !task['auto_create_table']
+        raise ConfigError.new "`mode: #{task['mode']}` requires `auto_create_table: true`"
+      end
+
       if task['mode'] == 'replace_backup'
         task['table_old'] ||= task['table_name_old'] # for lower version compatibility
         if task['dataset_old'].nil? and task['table_old'].nil?
-          raise ConfigError.new "`mode replace_backup` requires either of `dataset_old` or `table_old`"
+          raise ConfigError.new "`mode: replace_backup` requires either of `dataset_old` or `table_old`"
         end
         task['dataset_old'] ||= task['dataset']
         task['table_old'] ||= task['table']
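The new guard added in this hunk is self-contained enough to read in isolation. A runnable paraphrase, with a standalone `ConfigError` standing in for Embulk's:

```ruby
# Standalone paraphrase of the 0.5.0 configure-time guard.
ConfigError = Class.new(StandardError)

def check_auto_create_table!(task)
  if %w[append replace delete_in_advance replace_backup].include?(task['mode']) && !task['auto_create_table']
    raise ConfigError, "`mode: #{task['mode']}` requires `auto_create_table: true`"
  end
end

check_auto_create_table!('mode' => 'append_direct', 'auto_create_table' => false) # passes
begin
  check_auto_create_table!('mode' => 'replace', 'auto_create_table' => false)
rescue ConfigError => e
  puts e.message # => `mode: replace` requires `auto_create_table: true`
end
```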
@@ -306,42 +309,18 @@ module Embulk
 
       case task['mode']
       when 'delete_in_advance'
-        bigquery.delete_partition(task['table'])
+        bigquery.delete_table_or_partition(task['table'])
         bigquery.create_table_if_not_exists(task['table'])
       when 'replace'
         bigquery.create_table_if_not_exists(task['temp_table'])
-        if Helper.has_partition_decorator?(task['table'])
-          if task['auto_create_table']
-            bigquery.create_table_if_not_exists(task['table'])
-          else
-            bigquery.get_table(task['table']) # raises NotFoundError
-          end
-        end
+        bigquery.create_table_if_not_exists(task['table'])
       when 'append'
         bigquery.create_table_if_not_exists(task['temp_table'])
-        if Helper.has_partition_decorator?(task['table'])
-          if task['auto_create_table']
-            bigquery.create_table_if_not_exists(task['table'])
-          else
-            bigquery.get_table(task['table']) # raises NotFoundError
-          end
-        end
+        bigquery.create_table_if_not_exists(task['table'])
       when 'replace_backup'
         bigquery.create_table_if_not_exists(task['temp_table'])
-        if Helper.has_partition_decorator?(task['table'])
-          if task['auto_create_table']
-            bigquery.create_table_if_not_exists(task['table'])
-          else
-            bigquery.get_table(task['table']) # raises NotFoundError
-          end
-        end
-        if Helper.has_partition_decorator?(task['table_old'])
-          if task['auto_create_table']
-            bigquery.create_table_if_not_exists(task['table_old'], dataset: task['dataset_old'])
-          else
-            bigquery.get_table(task['table_old'], dataset: task['dataset_old']) # raises NotFoundError
-          end
-        end
+        bigquery.create_table_if_not_exists(task['table'])
+        bigquery.create_table_if_not_exists(task['table_old'], dataset: task['dataset_old'])
       else # append_direct
         if task['auto_create_table']
           bigquery.create_table_if_not_exists(task['table'])
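Because `configure` now guarantees `auto_create_table: true` for every transactional mode, the branching above collapses into unconditional creates. A condensed paraphrase of what each mode now ensures before loading (not a verbatim excerpt; `ensure_tables` is a hypothetical name, and `bigquery` is the plugin's BigqueryClient instance):

```ruby
# Condensed paraphrase of the 0.5.0 transaction setup.
def ensure_tables(bigquery, task)
  case task['mode']
  when 'delete_in_advance'
    bigquery.delete_table_or_partition(task['table'])
    bigquery.create_table_if_not_exists(task['table'])
  when 'replace', 'append'
    bigquery.create_table_if_not_exists(task['temp_table'])
    bigquery.create_table_if_not_exists(task['table'])
  when 'replace_backup'
    bigquery.create_table_if_not_exists(task['temp_table'])
    bigquery.create_table_if_not_exists(task['table'])
    bigquery.create_table_if_not_exists(task['table_old'], dataset: task['dataset_old'])
  else # append_direct keeps its conditional
    if task['auto_create_table']
      bigquery.create_table_if_not_exists(task['table'])
    else
      bigquery.get_table(task['table']) # raises NotFoundError if missing
    end
  end
end
```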
data/lib/embulk/output/bigquery/bigquery_client.rb
CHANGED
@@ -79,11 +79,7 @@ module Embulk
         begin
           # As https://cloud.google.com/bigquery/docs/managing_jobs_datasets_projects#managingjobs says,
           # we should generate job_id in client code, otherwise, retrying would cause duplication
-          if @task['prevent_duplicate_insert'] and (@task['mode'] == 'append' or @task['mode'] == 'append_direct')
-            job_id = Helper.create_load_job_id(@task, path, fields)
-          else
-            job_id = "embulk_load_job_#{SecureRandom.uuid}"
-          end
+          job_id = "embulk_load_job_#{SecureRandom.uuid}"
           Embulk.logger.info { "embulk-output-bigquery: Load job starting... job_id:[#{job_id}] #{object_uris} => #{@project}:#{@dataset}.#{table} in #{@location_for_log}" }
 
           body = {
@@ -174,11 +170,7 @@ module Embulk
         if File.exist?(path)
           # As https://cloud.google.com/bigquery/docs/managing_jobs_datasets_projects#managingjobs says,
           # we should generate job_id in client code, otherwise, retrying would cause duplication
-          if @task['prevent_duplicate_insert'] and (@task['mode'] == 'append' or @task['mode'] == 'append_direct')
-            job_id = Helper.create_load_job_id(@task, path, fields)
-          else
-            job_id = "embulk_load_job_#{SecureRandom.uuid}"
-          end
+          job_id = "embulk_load_job_#{SecureRandom.uuid}"
           Embulk.logger.info { "embulk-output-bigquery: Load job starting... job_id:[#{job_id}] #{path} => #{@project}:#{@dataset}.#{table} in #{@location_for_log}" }
         else
           Embulk.logger.info { "embulk-output-bigquery: Load job starting... #{path} does not exist, skipped" }
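The comment retained in both hunks explains why the ID is still generated client-side even though it is now random: the ID is fixed before the first attempt, so a retried insert carries the same job_id and BigQuery can deduplicate it server-side instead of loading the data twice. A toy retry loop showing the pattern (`submit_load_job` is a hypothetical stand-in for the google-api-client call):

```ruby
require 'securerandom'

# Hypothetical stand-in for the BigQuery jobs.insert call.
def submit_load_job(job_id)
  puts "submitting load job #{job_id}"
end

# Generate once, before any attempt, so every retry reuses the same id and
# BigQuery treats re-submissions as the same job.
job_id = "embulk_load_job_#{SecureRandom.uuid}"
attempts = 0
begin
  attempts += 1
  submit_load_job(job_id)
rescue IOError
  retry if attempts < 3
end
```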
@@ -437,7 +429,6 @@ module Embulk
             type: options['time_partitioning']['type'],
             expiration_ms: options['time_partitioning']['expiration_ms'],
             field: options['time_partitioning']['field'],
-            require_partition_filter: options['time_partitioning']['require_partition_filter'],
           }
         end
 
data/test/test_configure.rb
CHANGED
@@ -55,14 +55,13 @@ module Embulk
       assert_equal nil, task['table_old']
       assert_equal nil, task['table_name_old']
       assert_equal false, task['auto_create_dataset']
-      assert_equal false, task['auto_create_table']
+      assert_equal true, task['auto_create_table']
       assert_equal nil, task['schema_file']
       assert_equal nil, task['template_table']
       assert_equal true, task['delete_from_local_when_job_end']
       assert_equal 3600, task['job_status_max_polling_time']
       assert_equal 10, task['job_status_polling_interval']
       assert_equal false, task['is_skip_job_result_check']
-      assert_equal false, task['prevent_duplicate_insert']
       assert_equal false, task['with_rehearsal']
       assert_equal 1000, task['rehearsal_counts']
       assert_equal [], task['column_options']
@@ -162,22 +161,22 @@ module Embulk
     end
 
     def test_payload_column
-      config = least_config.merge('payload_column' => schema.first.name)
+      config = least_config.merge('payload_column' => schema.first.name, 'auto_create_table' => false, 'mode' => 'append_direct')
       task = Bigquery.configure(config, schema, processor_count)
       assert_equal task['payload_column_index'], 0
 
-      config = least_config.merge('payload_column' => 'not_exist')
+      config = least_config.merge('payload_column' => 'not_exist', 'auto_create_table' => false, 'mode' => 'append_direct')
       assert_raise { Bigquery.configure(config, schema, processor_count) }
     end
 
     def test_payload_column_index
-      config = least_config.merge('payload_column_index' => 0)
+      config = least_config.merge('payload_column_index' => 0, 'auto_create_table' => false, 'mode' => 'append_direct')
       assert_nothing_raised { Bigquery.configure(config, schema, processor_count) }
 
-      config = least_config.merge('payload_column_index' => -1)
+      config = least_config.merge('payload_column_index' => -1, 'auto_create_table' => false, 'mode' => 'append_direct')
       assert_raise { Bigquery.configure(config, schema, processor_count) }
 
-      config = least_config.merge('payload_column_index' => schema.size)
+      config = least_config.merge('payload_column_index' => schema.size, 'auto_create_table' => false, 'mode' => 'append_direct')
       assert_raise { Bigquery.configure(config, schema, processor_count) }
     end
 
data/test/test_example.rb
CHANGED
data/test/test_transaction.rb
CHANGED
@@ -41,8 +41,8 @@ module Embulk
     end
 
     sub_test_case "append_direct" do
-      def test_append_direct
-        config = least_config.merge('mode' => 'append_direct')
+      def test_append_direc_without_auto_create
+        config = least_config.merge('mode' => 'append_direct', 'auto_create_dataset' => false, 'auto_create_table' => false)
         any_instance_of(BigqueryClient) do |obj|
           mock(obj).get_dataset(config['dataset'])
           mock(obj).get_table(config['table'])
@@ -60,8 +60,8 @@ module Embulk
         Bigquery.transaction(config, schema, processor_count, &control)
       end
 
-      def test_append_direct_with_partition
-        config = least_config.merge('mode' => 'append_direct', 'table' => 'table$20160929')
+      def test_append_direct_with_partition_without_auto_create
+        config = least_config.merge('mode' => 'append_direct', 'table' => 'table$20160929', 'auto_create_dataset' => false, 'auto_create_table' => false)
         any_instance_of(BigqueryClient) do |obj|
           mock(obj).get_dataset(config['dataset'])
           mock(obj).get_table(config['table'])
@@ -86,7 +86,7 @@ module Embulk
        task = Bigquery.configure(config, schema, processor_count)
        any_instance_of(BigqueryClient) do |obj|
          mock(obj).get_dataset(config['dataset'])
-         mock(obj).delete_partition(config['table'])
+         mock(obj).delete_table_or_partition(config['table'])
          mock(obj).create_table_if_not_exists(config['table'])
        end
        Bigquery.transaction(config, schema, processor_count, &control)
@@ -97,7 +97,7 @@ module Embulk
        task = Bigquery.configure(config, schema, processor_count)
        any_instance_of(BigqueryClient) do |obj|
          mock(obj).get_dataset(config['dataset'])
-         mock(obj).delete_partition(config['table'])
+         mock(obj).delete_table_or_partition(config['table'])
          mock(obj).create_table_if_not_exists(config['table'])
        end
        Bigquery.transaction(config, schema, processor_count, &control)
@@ -111,6 +111,7 @@ module Embulk
        any_instance_of(BigqueryClient) do |obj|
          mock(obj).get_dataset(config['dataset'])
          mock(obj).create_table_if_not_exists(config['temp_table'])
+         mock(obj).create_table_if_not_exists(config['table'])
          mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
          mock(obj).delete_table(config['temp_table'])
        end
@@ -120,19 +121,6 @@ module Embulk
      def test_replace_with_partitioning
        config = least_config.merge('mode' => 'replace', 'table' => 'table$20160929')
        task = Bigquery.configure(config, schema, processor_count)
-       any_instance_of(BigqueryClient) do |obj|
-         mock(obj).get_dataset(config['dataset'])
-         mock(obj).create_table_if_not_exists(config['temp_table'])
-         mock(obj).get_table(config['table'])
-         mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
-         mock(obj).delete_table(config['temp_table'])
-       end
-       Bigquery.transaction(config, schema, processor_count, &control)
-     end
-
-     def test_replace_with_partitioning_with_auto_create_table
-       config = least_config.merge('mode' => 'replace', 'table' => 'table$20160929', 'auto_create_table' => true)
-       task = Bigquery.configure(config, schema, processor_count)
        any_instance_of(BigqueryClient) do |obj|
          mock(obj).get_dataset(config['dataset'])
          mock(obj).create_table_if_not_exists(config['temp_table'])
@@ -152,8 +140,10 @@ module Embulk
          mock(obj).get_dataset(config['dataset'])
          mock(obj).get_dataset(config['dataset_old'])
          mock(obj).create_table_if_not_exists(config['temp_table'])
+         mock(obj).create_table_if_not_exists(config['table'])
+         mock(obj).create_table_if_not_exists(config['table_old'], dataset: config['dataset_old'])
 
-         mock(obj).get_table_or_partition(task['table'])
+         mock(obj).get_table_or_partition(config['table'])
          mock(obj).copy(config['table'], config['table_old'], config['dataset_old'])
 
          mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
@@ -168,9 +158,11 @@ module Embulk
        any_instance_of(BigqueryClient) do |obj|
          mock(obj).create_dataset(config['dataset'])
          mock(obj).create_dataset(config['dataset_old'], reference: config['dataset'])
+         mock(obj).create_table_if_not_exists(config['table'])
          mock(obj).create_table_if_not_exists(config['temp_table'])
+         mock(obj).create_table_if_not_exists(config['table_old'], dataset: config['dataset_old'])
 
-         mock(obj).get_table_or_partition(task['table'])
+         mock(obj).get_table_or_partition(config['table'])
          mock(obj).copy(config['table'], config['table_old'], config['dataset_old'])
 
          mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
@@ -180,35 +172,16 @@ module Embulk
      end
 
      def test_replace_backup_with_partitioning
-       config = least_config.merge('mode' => 'replace_backup', 'table' => 'table$20160929', 'dataset_old' => 'dataset_old', 'table_old' => 'table_old$20190929', 'temp_table' => 'temp_table')
-       task = Bigquery.configure(config, schema, processor_count)
-       any_instance_of(BigqueryClient) do |obj|
-         mock(obj).get_dataset(config['dataset'])
-         mock(obj).get_dataset(config['dataset_old'])
-         mock(obj).create_table_if_not_exists(config['temp_table'])
-         mock(obj).get_table(task['table'])
-         mock(obj).get_table(task['table_old'], dataset: config['dataset_old'])
-
-         mock(obj).get_table_or_partition(task['table'])
-         mock(obj).copy(config['table'], config['table_old'], config['dataset_old'])
-
-         mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
-         mock(obj).delete_table(config['temp_table'])
-       end
-       Bigquery.transaction(config, schema, processor_count, &control)
-     end
-
-     def test_replace_backup_with_partitioning_auto_create_table
        config = least_config.merge('mode' => 'replace_backup', 'table' => 'table$20160929', 'dataset_old' => 'dataset_old', 'table_old' => 'table_old$20160929', 'temp_table' => 'temp_table', 'auto_create_table' => true)
        task = Bigquery.configure(config, schema, processor_count)
        any_instance_of(BigqueryClient) do |obj|
          mock(obj).get_dataset(config['dataset'])
          mock(obj).get_dataset(config['dataset_old'])
          mock(obj).create_table_if_not_exists(config['temp_table'])
-         mock(obj).create_table_if_not_exists(task['table'])
-         mock(obj).create_table_if_not_exists(task['table_old'], dataset: config['dataset_old'])
+         mock(obj).create_table_if_not_exists(config['table'])
+         mock(obj).create_table_if_not_exists(config['table_old'], dataset: config['dataset_old'])
 
-         mock(obj).get_table_or_partition(task['table'])
+         mock(obj).get_table_or_partition(config['table'])
          mock(obj).copy(config['table'], config['table_old'], config['dataset_old'])
 
          mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_TRUNCATE')
@@ -225,6 +198,7 @@ module Embulk
        any_instance_of(BigqueryClient) do |obj|
          mock(obj).get_dataset(config['dataset'])
          mock(obj).create_table_if_not_exists(config['temp_table'])
+         mock(obj).create_table_if_not_exists(config['table'])
          mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_APPEND')
          mock(obj).delete_table(config['temp_table'])
        end
@@ -232,19 +206,6 @@ module Embulk
      end
 
      def test_append_with_partitioning
-       config = least_config.merge('mode' => 'append', 'table' => 'table$20160929')
-       task = Bigquery.configure(config, schema, processor_count)
-       any_instance_of(BigqueryClient) do |obj|
-         mock(obj).get_dataset(config['dataset'])
-         mock(obj).create_table_if_not_exists(config['temp_table'])
-         mock(obj).get_table(config['table'])
-         mock(obj).copy(config['temp_table'], config['table'], write_disposition: 'WRITE_APPEND')
-         mock(obj).delete_table(config['temp_table'])
-       end
-       Bigquery.transaction(config, schema, processor_count, &control)
-     end
-
-     def test_append_with_partitioning_with_auto_create_table
        config = least_config.merge('mode' => 'append', 'table' => 'table$20160929', 'auto_create_table' => true)
        task = Bigquery.configure(config, schema, processor_count)
        any_instance_of(BigqueryClient) do |obj|
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: embulk-output-bigquery
 version: !ruby/object:Gem::Version
-  version: 0.4.14
+  version: 0.5.0
 platform: ruby
 authors:
 - Satoshi Akama
@@ -102,7 +102,6 @@ files:
 - example/config_nested_record.yml
 - example/config_payload_column.yml
 - example/config_payload_column_index.yml
-- example/config_prevent_duplicate_insert.yml
 - example/config_progress_log_interval.yml
 - example/config_replace.yml
 - example/config_replace_backup.yml
data/example/config_prevent_duplicate_insert.yml
DELETED
@@ -1,30 +0,0 @@
-in:
-  type: file
-  path_prefix: example/example.csv
-  parser:
-    type: csv
-    charset: UTF-8
-    newline: CRLF
-    null_string: 'NULL'
-    skip_header_lines: 1
-    comment_line_marker: '#'
-    columns:
-      - {name: date, type: string}
-      - {name: timestamp, type: timestamp, format: "%Y-%m-%d %H:%M:%S.%N", timezone: "+09:00"}
-      - {name: "null", type: string}
-      - {name: long, type: long}
-      - {name: string, type: string}
-      - {name: double, type: double}
-      - {name: boolean, type: boolean}
-out:
-  type: bigquery
-  mode: append
-  auth_method: json_key
-  json_keyfile: example/your-project-000.json
-  dataset: your_dataset_name
-  table: your_table_name
-  source_format: NEWLINE_DELIMITED_JSON
-  auto_create_dataset: true
-  auto_create_table: true
-  schema_file: example/schema.json
-  prevent_duplicate_insert: true