embulk-output-bigquery 0.6.0 → 0.6.1

Files changed (50)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +7 -3
  3. data/README.md +12 -7
  4. data/embulk-output-bigquery.gemspec +4 -2
  5. data/lib/embulk/output/bigquery.rb +3 -3
  6. metadata +2 -46
  7. data/example/config_append_direct_schema_update_options.yml +0 -31
  8. data/example/config_client_options.yml +0 -33
  9. data/example/config_csv.yml +0 -30
  10. data/example/config_delete_in_advance.yml +0 -29
  11. data/example/config_delete_in_advance_field_partitioned_table.yml +0 -33
  12. data/example/config_delete_in_advance_partitioned_table.yml +0 -33
  13. data/example/config_expose_errors.yml +0 -30
  14. data/example/config_gcs.yml +0 -32
  15. data/example/config_guess_from_embulk_schema.yml +0 -29
  16. data/example/config_guess_with_column_options.yml +0 -40
  17. data/example/config_gzip.yml +0 -1
  18. data/example/config_jsonl.yml +0 -1
  19. data/example/config_max_threads.yml +0 -34
  20. data/example/config_min_ouput_tasks.yml +0 -34
  21. data/example/config_mode_append.yml +0 -30
  22. data/example/config_mode_append_direct.yml +0 -30
  23. data/example/config_nested_record.yml +0 -1
  24. data/example/config_payload_column.yml +0 -20
  25. data/example/config_payload_column_index.yml +0 -20
  26. data/example/config_progress_log_interval.yml +0 -31
  27. data/example/config_replace.yml +0 -30
  28. data/example/config_replace_backup.yml +0 -32
  29. data/example/config_replace_backup_field_partitioned_table.yml +0 -34
  30. data/example/config_replace_backup_partitioned_table.yml +0 -34
  31. data/example/config_replace_field_partitioned_table.yml +0 -33
  32. data/example/config_replace_partitioned_table.yml +0 -33
  33. data/example/config_replace_schema_update_options.yml +0 -33
  34. data/example/config_skip_file_generation.yml +0 -32
  35. data/example/config_table_strftime.yml +0 -30
  36. data/example/config_template_table.yml +0 -21
  37. data/example/config_uncompressed.yml +0 -1
  38. data/example/config_with_rehearsal.yml +0 -33
  39. data/example/example.csv +0 -17
  40. data/example/example.yml +0 -1
  41. data/example/example2_1.csv +0 -1
  42. data/example/example2_2.csv +0 -1
  43. data/example/example4_1.csv +0 -1
  44. data/example/example4_2.csv +0 -1
  45. data/example/example4_3.csv +0 -1
  46. data/example/example4_4.csv +0 -1
  47. data/example/json_key.json +0 -12
  48. data/example/nested_example.jsonl +0 -16
  49. data/example/schema.json +0 -30
  50. data/example/schema_expose_errors.json +0 -30
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 3c0942035a81c9180260f8329ccaa5ba99de2185ea5f9ec5f1b3ffe87d5e8a73
-  data.tar.gz: c543a1b9f1278cf5d543a96bd3b8c465b2727b03df67d1e1726bef40135a1d42
+  metadata.gz: ddfd10c5e85614e1dae0333494333653f1af95b8158dfda8977f8b00d64b3478
+  data.tar.gz: 2cec70eaa49c828d7fe9347bc0d9699b9398f21db96880e997a66bdab23deb89
 SHA512:
-  metadata.gz: 23559e485346f2f8d65fa76aef2284c8b8c682257ee317b2088b30515e6e2a2936cc3c5b8ab5c3020ee9f9790e735bc48c41bb8e5d30fc777174d681796128c1
-  data.tar.gz: 336988c0afb153c0b9b7532bf9d85523bb9e9641eca7a79b6ab491be5567e2be9204c47417b28a8d42bbae2907cb6892b1ce5abb98261564c696b15478deb3ad
+  metadata.gz: 4782a28272da610f8399aca50cc4ddaefea00b8dbf45a37bec24771d7ecdb05bbdcd6de85ff167c5c3745f6689413c215689bb8d420960705cd6cb2026e99932
+  data.tar.gz: 9dbabb787e2f1b5797ccb2a2cd8786ce28d0e0d01310cd522ea4894337a279e809de10abca14b50b836553b6de95df4afd886596d75e7193d4de60a5c6f95781
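The four checksums above are ordinary SHA-256 and SHA-512 hex digests of the gem's `metadata.gz` and `data.tar.gz` members. A minimal sketch of how such entries can be computed with Ruby's standard library (the path is a placeholder, not a file from this gem):

```ruby
require 'digest'

# Compute the hex digests that appear in a gem's checksums.yaml
# for one archive member (metadata.gz or data.tar.gz).
def digests_for(path)
  data = File.binread(path)
  {
    'SHA256' => Digest::SHA256.hexdigest(data),
    'SHA512' => Digest::SHA512.hexdigest(data),
  }
end
```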
data/CHANGELOG.md CHANGED
@@ -1,3 +1,7 @@
+## 0.6.1 - 2019-08-28
+
+* [maintenance] Release a new gem without symlinks so that it works on Windows.
+
 ## 0.6.0 - 2019-08-11
 
 Cleanup `auth_method`:
@@ -5,14 +9,14 @@ Cleanup `auth_method`:
 * [enhancement] Support `auth_method: authorized_user` (OAuth)
 * [incompatibility change] Rename `auth_method: json_key` to `auth_method: service_account` (`json_key` is kept for backward compatibility)
 * [incompatibility change] Remove deprecated `auth_method: private_key` (p12 key)
-* [incompatibility change] Change the default `auth_method` to `application_default` from `private_key`.
+* [incompatibility change] Change the default `auth_method` to `application_default` from `private_key` because `private_key` was dropped.
 
 ## 0.5.0 - 2019-08-10
 
 * [incompatibility change] Drop deprecated `time_partitioning`.`require_partition_filter`
 * [incompatibility change] Drop `prevent_duplicate_insert` which has no use-case now
-* [incompatibility change] Change default value of `auto_create_table` to `true` from `false`
-* Modes `replace`, `replace_backup`, `append`, `delete_in_advance`, that is, except `append_direct` requires `auto_create_table: true`.
+* [incompatibility change] Modes `replace`, `replace_backup`, `append`, and `delete_in_advance` now require `auto_create_table: true` because, previously, these modes created a target table even with `auto_create_table: false`, which confused users. Note that `auto_create_table: true` is always required even for a partition (a table name with a partition decorator), which may not require creating a table; this keeps the logic and implementation simple.
+* [incompatibility change] Change the default value of `auto_create_table` to `true` because the four modes above (that is, all modes except `append_direct`) now always require it.
 
 ## 0.4.14 - 2019-08-10
 
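The 0.5.0 and 0.6.0 notes reduce to a simple rule: every mode except `append_direct` requires `auto_create_table: true`. A hypothetical sketch of such a check (method name and error message are illustrative, not the plugin's actual code):

```ruby
# Modes that always create tables themselves and therefore
# cannot run with auto_create_table: false.
MODES_REQUIRING_AUTO_CREATE = %w[replace replace_backup append delete_in_advance].freeze

# Raise if the configured mode conflicts with auto_create_table: false.
def validate_auto_create_table!(mode, auto_create_table)
  if MODES_REQUIRING_AUTO_CREATE.include?(mode) && !auto_create_table
    raise ArgumentError, "mode: #{mode} requires auto_create_table: true"
  end
end
```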
data/README.md CHANGED
@@ -37,7 +37,7 @@ OAuth flow for installed applications.
 | location | string | optional | nil | geographic location of dataset. See [Location](#location) |
 | table | string | required | | table name, or table name with a partition decorator such as `table_name$20160929` |
 | auto_create_dataset | boolean | optional | false | automatically create dataset |
-| auto_create_table | boolean | optional | true | `false` is available only for `append_direct` mode. Other modes requires `true`. See [Dynamic Table Creating](#dynamic-table-creating) and [Time Partitioning](#time-partitioning) |
+| auto_create_table | boolean | optional | true | `false` is available only for `append_direct` mode. Other modes require `true`. See [Dynamic Table Creating](#dynamic-table-creating) and [Time Partitioning](#time-partitioning) |
 | schema_file | string | optional | | /path/to/schema.json |
 | template_table | string | optional | | template table name. See [Dynamic Table Creating](#dynamic-table-creating) |
 | job_status_max_polling_time | int | optional | 3600 sec | Max job status polling time |
@@ -213,7 +213,7 @@ You can also embed contents of `json_keyfile` at config.yml.
 ```yaml
 out:
   type: bigquery
-  auth_method: service_account
+  auth_method: authorized_user
   json_keyfile:
     content: |
       {
@@ -239,7 +239,12 @@ out:
 
 #### application\_default
 
-Use Application Default Credentials (ADC).
+Use Application Default Credentials (ADC). ADC is a strategy to locate Google Cloud service account credentials:
+
+1. ADC checks whether the environment variable `GOOGLE_APPLICATION_CREDENTIALS` is set. If it is, ADC uses the service account file that the variable points to.
+2. ADC checks whether `~/.config/gcloud/application_default_credentials.json` exists. This file is created by running `gcloud auth application-default login`.
+3. ADC uses the default service account for credentials if the application is running on Compute Engine, App Engine, Kubernetes Engine, Cloud Functions, or Cloud Run.
+
 See https://cloud.google.com/docs/authentication/production for details.
 
 ```yaml
@@ -256,12 +261,12 @@ Table ids are formatted at runtime
 using the local time of the embulk server.
 
 For example, with the configuration below,
-data is inserted into tables `table_2015_04`, `table_2015_05` and so on.
+data is inserted into tables `table_20150503`, `table_20150504`, and so on.
 
 ```yaml
 out:
   type: bigquery
-  table: table_%Y_%m
+  table: table_%Y%m%d
 ```
 
 ### Dynamic table creating
@@ -276,7 +281,7 @@ Please set file path of schema.json.
 out:
   type: bigquery
   auto_create_table: true
-  table: table_%Y_%m
+  table: table_%Y%m%d
   schema_file: /path/to/schema.json
 ```
 
@@ -288,7 +293,7 @@ Plugin will try to read schema from existing table and use it as schema template
 out:
   type: bigquery
   auto_create_table: true
-  table: table_%Y_%m
+  table: table_%Y%m%d
   template_table: existing_table_name
 ```
 
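The `table: table_%Y%m%d` option shown in the README excerpts is expanded with strftime using the Embulk server's local time. A plain-Ruby sketch of the idea (the plugin's own expansion may differ in detail):

```ruby
# Expand strftime placeholders in a table option, e.g. table_%Y%m%d.
def expand_table_name(template, now = Time.now)
  now.strftime(template)
end

expand_table_name('table_%Y%m%d', Time.new(2015, 5, 3))  # => "table_20150503"
```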
data/embulk-output-bigquery.gemspec CHANGED
@@ -1,6 +1,6 @@
 Gem::Specification.new do |spec|
   spec.name = "embulk-output-bigquery"
-  spec.version = "0.6.0"
+  spec.version = "0.6.1"
   spec.authors = ["Satoshi Akama", "Naotoshi Seo"]
   spec.summary = "Google BigQuery output plugin for Embulk"
   spec.description = "Embulk plugin that insert records to Google BigQuery."
@@ -8,7 +8,9 @@ Gem::Specification.new do |spec|
   spec.licenses = ["MIT"]
   spec.homepage = "https://github.com/embulk/embulk-output-bigquery"
 
-  spec.files = `git ls-files`.split("\n") + Dir["classpath/*.jar"]
+  # Exclude the example directory, which uses symlinks, from the generated gem.
+  # Symlinks do not work properly on Windows without administrator privileges.
+  spec.files = `git ls-files`.split("\n") + Dir["classpath/*.jar"] - Dir["example/*"]
   spec.test_files = spec.files.grep(%r{^(test|spec)/})
   spec.require_paths = ["lib"]
 
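The `spec.files` change relies on Ruby's `Array#-`, which removes every element that also appears in the right-hand array, so anything matched under `example/` is dropped from the `git ls-files` list. A self-contained sketch with in-memory data (`grep` stands in for `Dir["example/*"]`, which would need a real filesystem):

```ruby
# Mimic the gemspec expression: start from a tracked-file list and
# subtract everything under example/ (the symlinked directory).
tracked = [
  'README.md',
  'lib/embulk/output/bigquery.rb',
  'example/example.yml',
  'example/schema.json',
]
excluded = tracked.grep(%r{\Aexample/}) # stand-in for Dir["example/*"]
files = tracked - excluded
```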
data/lib/embulk/output/bigquery.rb CHANGED
@@ -304,14 +304,14 @@ module Embulk
           bigquery.create_table_if_not_exists(task['table'])
         when 'replace'
           bigquery.create_table_if_not_exists(task['temp_table'])
-          bigquery.create_table_if_not_exists(task['table'])
+          bigquery.create_table_if_not_exists(task['table']) # needed when task['table'] is a partition
         when 'append'
           bigquery.create_table_if_not_exists(task['temp_table'])
-          bigquery.create_table_if_not_exists(task['table'])
+          bigquery.create_table_if_not_exists(task['table']) # needed when task['table'] is a partition
         when 'replace_backup'
           bigquery.create_table_if_not_exists(task['temp_table'])
           bigquery.create_table_if_not_exists(task['table'])
-          bigquery.create_table_if_not_exists(task['table_old'], dataset: task['dataset_old'])
+          bigquery.create_table_if_not_exists(task['table_old'], dataset: task['dataset_old']) # needed when task['table_old'] is a partition
         else # append_direct
           if task['auto_create_table']
             bigquery.create_table_if_not_exists(task['table'])
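The added comments note that `create_table_if_not_exists` must still run when the table option carries a partition decorator such as `table$20160929`, because the base table has to exist before loading into the partition. A hypothetical helper for reading such a reference (illustrative only, not the plugin's actual code):

```ruby
# Split a table reference that may carry a partition decorator,
# e.g. "events$20160929" -> base table "events", partition "20160929".
def split_partition_decorator(table)
  name, partition = table.split('$', 2)
  [name, partition]
end
```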
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: embulk-output-bigquery
 version: !ruby/object:Gem::Version
-  version: 0.6.0
+  version: 0.6.1
 platform: ruby
 authors:
 - Satoshi Akama
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2019-08-10 00:00:00.000000000 Z
+date: 2019-08-28 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement
@@ -83,50 +83,6 @@ files:
 - README.md
 - Rakefile
 - embulk-output-bigquery.gemspec
-- example/config_append_direct_schema_update_options.yml
-- example/config_client_options.yml
-- example/config_csv.yml
-- example/config_delete_in_advance.yml
-- example/config_delete_in_advance_field_partitioned_table.yml
-- example/config_delete_in_advance_partitioned_table.yml
-- example/config_expose_errors.yml
-- example/config_gcs.yml
-- example/config_guess_from_embulk_schema.yml
-- example/config_guess_with_column_options.yml
-- example/config_gzip.yml
-- example/config_jsonl.yml
-- example/config_max_threads.yml
-- example/config_min_ouput_tasks.yml
-- example/config_mode_append.yml
-- example/config_mode_append_direct.yml
-- example/config_nested_record.yml
-- example/config_payload_column.yml
-- example/config_payload_column_index.yml
-- example/config_progress_log_interval.yml
-- example/config_replace.yml
-- example/config_replace_backup.yml
-- example/config_replace_backup_field_partitioned_table.yml
-- example/config_replace_backup_partitioned_table.yml
-- example/config_replace_field_partitioned_table.yml
-- example/config_replace_partitioned_table.yml
-- example/config_replace_schema_update_options.yml
-- example/config_skip_file_generation.yml
-- example/config_table_strftime.yml
-- example/config_template_table.yml
-- example/config_uncompressed.yml
-- example/config_with_rehearsal.yml
-- example/example.csv
-- example/example.yml
-- example/example2_1.csv
-- example/example2_2.csv
-- example/example4_1.csv
-- example/example4_2.csv
-- example/example4_3.csv
-- example/example4_4.csv
-- example/json_key.json
-- example/nested_example.jsonl
-- example/schema.json
-- example/schema_expose_errors.json
 - lib/embulk/output/bigquery.rb
 - lib/embulk/output/bigquery/auth.rb
 - lib/embulk/output/bigquery/bigquery_client.rb
data/example/config_append_direct_schema_update_options.yml DELETED
@@ -1,31 +0,0 @@
-in:
-  type: file
-  path_prefix: example/example.csv
-  parser:
-    type: csv
-    charset: UTF-8
-    newline: CRLF
-    null_string: 'NULL'
-    skip_header_lines: 1
-    comment_line_marker: '#'
-    columns:
-    - {name: date, type: string}
-    - {name: timestamp, type: timestamp, format: "%Y-%m-%d %H:%M:%S.%N", timezone: "+09:00"}
-    - {name: "null", type: string}
-    - {name: long, type: long}
-    - {name: string, type: string}
-    - {name: double, type: double}
-    - {name: boolean, type: boolean}
-out:
-  type: bigquery
-  mode: append_direct
-  auth_method: service_account
-  json_keyfile: example/your-project-000.json
-  dataset: your_dataset_name
-  table: your_table_name
-  source_format: NEWLINE_DELIMITED_JSON
-  compression: NONE
-  auto_create_dataset: true
-  auto_create_table: true
-  schema_file: example/schema.json
-  schema_update_options: [ALLOW_FIELD_ADDITION, ALLOW_FIELD_RELAXATION]
data/example/config_client_options.yml DELETED
@@ -1,33 +0,0 @@
-in:
-  type: file
-  path_prefix: example/example.csv
-  parser:
-    type: csv
-    charset: UTF-8
-    newline: CRLF
-    null_string: 'NULL'
-    skip_header_lines: 1
-    comment_line_marker: '#'
-    columns:
-    - {name: date, type: string}
-    - {name: timestamp, type: timestamp, format: "%Y-%m-%d %H:%M:%S.%N", timezone: "+09:00"}
-    - {name: "null", type: string}
-    - {name: long, type: long}
-    - {name: string, type: string}
-    - {name: double, type: double}
-    - {name: boolean, type: boolean}
-out:
-  type: bigquery
-  mode: replace
-  auth_method: service_account
-  json_keyfile: example/your-project-000.json
-  dataset: your_dataset_name
-  table: your_table_name
-  source_format: NEWLINE_DELIMITED_JSON
-  auto_create_dataset: true
-  auto_create_table: true
-  schema_file: example/schema.json
-  timeout_sec: 400
-  open_timeout_sec: 400
-  retries: 2
-  application_name: "Embulk BigQuery plugin test"
data/example/config_csv.yml DELETED
@@ -1,30 +0,0 @@
-in:
-  type: file
-  path_prefix: example/example.csv
-  parser:
-    type: csv
-    charset: UTF-8
-    newline: CRLF
-    null_string: 'NULL'
-    skip_header_lines: 1
-    comment_line_marker: '#'
-    columns:
-    - {name: date, type: string}
-    - {name: timestamp, type: timestamp, format: "%Y-%m-%d %H:%M:%S.%N", timezone: "+09:00"}
-    - {name: "null", type: string}
-    - {name: long, type: long}
-    - {name: string, type: string}
-    - {name: double, type: double}
-    - {name: boolean, type: boolean}
-out:
-  type: bigquery
-  mode: replace
-  auth_method: service_account
-  json_keyfile: example/your-project-000.json
-  dataset: your_dataset_name
-  table: your_table_name
-  source_format: CSV
-  compression: GZIP
-  auto_create_dataset: true
-  auto_create_table: true
-  schema_file: example/schema.json
data/example/config_delete_in_advance.yml DELETED
@@ -1,29 +0,0 @@
-in:
-  type: file
-  path_prefix: example/example.csv
-  parser:
-    type: csv
-    charset: UTF-8
-    newline: CRLF
-    null_string: 'NULL'
-    skip_header_lines: 1
-    comment_line_marker: '#'
-    columns:
-    - {name: date, type: string}
-    - {name: timestamp, type: timestamp, format: "%Y-%m-%d %H:%M:%S.%N", timezone: "+09:00"}
-    - {name: "null", type: string}
-    - {name: long, type: long}
-    - {name: string, type: string}
-    - {name: double, type: double}
-    - {name: boolean, type: boolean}
-out:
-  type: bigquery
-  mode: delete_in_advance
-  auth_method: service_account
-  json_keyfile: example/your-project-000.json
-  dataset: your_dataset_name
-  table: your_table_name
-  source_format: NEWLINE_DELIMITED_JSON
-  auto_create_dataset: true
-  auto_create_table: true
-  schema_file: example/schema.json
data/example/config_delete_in_advance_field_partitioned_table.yml DELETED
@@ -1,33 +0,0 @@
-in:
-  type: file
-  path_prefix: example/example.csv
-  parser:
-    type: csv
-    charset: UTF-8
-    newline: CRLF
-    null_string: 'NULL'
-    skip_header_lines: 1
-    comment_line_marker: '#'
-    columns:
-    - {name: date, type: string}
-    - {name: timestamp, type: timestamp, format: "%Y-%m-%d %H:%M:%S.%N", timezone: "+09:00"}
-    - {name: "null", type: string}
-    - {name: long, type: long}
-    - {name: string, type: string}
-    - {name: double, type: double}
-    - {name: boolean, type: boolean}
-out:
-  type: bigquery
-  mode: delete_in_advance
-  auth_method: service_account
-  json_keyfile: example/your-project-000.json
-  dataset: your_dataset_name
-  table: your_field_partitioned_table_name
-  source_format: NEWLINE_DELIMITED_JSON
-  compression: NONE
-  auto_create_dataset: true
-  auto_create_table: true
-  schema_file: example/schema.json
-  time_partitioning:
-    type: 'DAY'
-    field: timestamp
data/example/config_delete_in_advance_partitioned_table.yml DELETED
@@ -1,33 +0,0 @@
-in:
-  type: file
-  path_prefix: example/example.csv
-  parser:
-    type: csv
-    charset: UTF-8
-    newline: CRLF
-    null_string: 'NULL'
-    skip_header_lines: 1
-    comment_line_marker: '#'
-    columns:
-    - {name: date, type: string}
-    - {name: timestamp, type: timestamp, format: "%Y-%m-%d %H:%M:%S.%N", timezone: "+09:00"}
-    - {name: "null", type: string}
-    - {name: long, type: long}
-    - {name: string, type: string}
-    - {name: double, type: double}
-    - {name: boolean, type: boolean}
-out:
-  type: bigquery
-  mode: delete_in_advance
-  auth_method: service_account
-  json_keyfile: example/your-project-000.json
-  dataset: your_dataset_name
-  table: your_partitioned_table_name$20160929
-  source_format: NEWLINE_DELIMITED_JSON
-  compression: NONE
-  auto_create_dataset: true
-  auto_create_table: true
-  schema_file: example/schema.json
-  time_partitioning:
-    type: 'DAY'
-    expiration_ms: 100
data/example/config_expose_errors.yml DELETED
@@ -1,30 +0,0 @@
-in:
-  type: file
-  path_prefix: example/example.csv
-  parser:
-    type: csv
-    charset: UTF-8
-    newline: CRLF
-    null_string: 'NULL'
-    skip_header_lines: 1
-    comment_line_marker: '#'
-    columns:
-    - {name: date, type: string}
-    - {name: timestamp, type: timestamp, format: "%Y-%m-%d %H:%M:%S.%N", timezone: "+09:00"}
-    - {name: "null", type: string}
-    - {name: long, type: long}
-    - {name: string, type: string}
-    - {name: double, type: double}
-    - {name: boolean, type: boolean}
-out:
-  type: bigquery
-  mode: replace
-  auth_method: service_account
-  json_keyfile: example/your-project-000.json
-  dataset: your_dataset_name
-  table: your_table_name
-  source_format: NEWLINE_DELIMITED_JSON
-  compression: NONE
-  auto_create_dataset: true
-  auto_create_table: true
-  schema_file: example/schema_expose_errors.json
data/example/config_gcs.yml DELETED
@@ -1,32 +0,0 @@
-in:
-  type: file
-  path_prefix: example/example.csv
-  parser:
-    type: csv
-    charset: UTF-8
-    newline: CRLF
-    null_string: 'NULL'
-    skip_header_lines: 1
-    comment_line_marker: '#'
-    columns:
-    - {name: date, type: string}
-    - {name: timestamp, type: timestamp, format: "%Y-%m-%d %H:%M:%S.%N", timezone: "+09:00"}
-    - {name: "null", type: string}
-    - {name: long, type: long}
-    - {name: string, type: string}
-    - {name: double, type: double}
-    - {name: boolean, type: boolean}
-out:
-  type: bigquery
-  mode: replace
-  auth_method: service_account
-  json_keyfile: example/your-project-000.json
-  dataset: your_dataset_name
-  table: your_table_name
-  source_format: NEWLINE_DELIMITED_JSON
-  compression: GZIP
-  auto_create_dataset: true
-  auto_create_table: true
-  schema_file: example/schema.json
-  gcs_bucket: your_bucket_name
-  auto_create_gcs_bucket: true