embulk-output-bigquery 0.6.0 → 0.6.1

Files changed (50)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +7 -3
  3. data/README.md +12 -7
  4. data/embulk-output-bigquery.gemspec +4 -2
  5. data/lib/embulk/output/bigquery.rb +3 -3
  6. metadata +2 -46
  7. data/example/config_append_direct_schema_update_options.yml +0 -31
  8. data/example/config_client_options.yml +0 -33
  9. data/example/config_csv.yml +0 -30
  10. data/example/config_delete_in_advance.yml +0 -29
  11. data/example/config_delete_in_advance_field_partitioned_table.yml +0 -33
  12. data/example/config_delete_in_advance_partitioned_table.yml +0 -33
  13. data/example/config_expose_errors.yml +0 -30
  14. data/example/config_gcs.yml +0 -32
  15. data/example/config_guess_from_embulk_schema.yml +0 -29
  16. data/example/config_guess_with_column_options.yml +0 -40
  17. data/example/config_gzip.yml +0 -1
  18. data/example/config_jsonl.yml +0 -1
  19. data/example/config_max_threads.yml +0 -34
  20. data/example/config_min_ouput_tasks.yml +0 -34
  21. data/example/config_mode_append.yml +0 -30
  22. data/example/config_mode_append_direct.yml +0 -30
  23. data/example/config_nested_record.yml +0 -1
  24. data/example/config_payload_column.yml +0 -20
  25. data/example/config_payload_column_index.yml +0 -20
  26. data/example/config_progress_log_interval.yml +0 -31
  27. data/example/config_replace.yml +0 -30
  28. data/example/config_replace_backup.yml +0 -32
  29. data/example/config_replace_backup_field_partitioned_table.yml +0 -34
  30. data/example/config_replace_backup_partitioned_table.yml +0 -34
  31. data/example/config_replace_field_partitioned_table.yml +0 -33
  32. data/example/config_replace_partitioned_table.yml +0 -33
  33. data/example/config_replace_schema_update_options.yml +0 -33
  34. data/example/config_skip_file_generation.yml +0 -32
  35. data/example/config_table_strftime.yml +0 -30
  36. data/example/config_template_table.yml +0 -21
  37. data/example/config_uncompressed.yml +0 -1
  38. data/example/config_with_rehearsal.yml +0 -33
  39. data/example/example.csv +0 -17
  40. data/example/example.yml +0 -1
  41. data/example/example2_1.csv +0 -1
  42. data/example/example2_2.csv +0 -1
  43. data/example/example4_1.csv +0 -1
  44. data/example/example4_2.csv +0 -1
  45. data/example/example4_3.csv +0 -1
  46. data/example/example4_4.csv +0 -1
  47. data/example/json_key.json +0 -12
  48. data/example/nested_example.jsonl +0 -16
  49. data/example/schema.json +0 -30
  50. data/example/schema_expose_errors.json +0 -30
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 3c0942035a81c9180260f8329ccaa5ba99de2185ea5f9ec5f1b3ffe87d5e8a73
-  data.tar.gz: c543a1b9f1278cf5d543a96bd3b8c465b2727b03df67d1e1726bef40135a1d42
+  metadata.gz: ddfd10c5e85614e1dae0333494333653f1af95b8158dfda8977f8b00d64b3478
+  data.tar.gz: 2cec70eaa49c828d7fe9347bc0d9699b9398f21db96880e997a66bdab23deb89
 SHA512:
-  metadata.gz: 23559e485346f2f8d65fa76aef2284c8b8c682257ee317b2088b30515e6e2a2936cc3c5b8ab5c3020ee9f9790e735bc48c41bb8e5d30fc777174d681796128c1
-  data.tar.gz: 336988c0afb153c0b9b7532bf9d85523bb9e9641eca7a79b6ab491be5567e2be9204c47417b28a8d42bbae2907cb6892b1ce5abb98261564c696b15478deb3ad
+  metadata.gz: 4782a28272da610f8399aca50cc4ddaefea00b8dbf45a37bec24771d7ecdb05bbdcd6de85ff167c5c3745f6689413c215689bb8d420960705cd6cb2026e99932
+  data.tar.gz: 9dbabb787e2f1b5797ccb2a2cd8786ce28d0e0d01310cd522ea4894337a279e809de10abca14b50b836553b6de95df4afd886596d75e7193d4de60a5c6f95781
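The four checksums above are ordinary SHA-256 and SHA-512 hex digests of the gem's `metadata.gz` and `data.tar.gz` members. A minimal sketch of how such entries can be computed with Ruby's standard library (the path is a placeholder, not a file from this gem):

```ruby
require 'digest'

# Compute the hex digests that appear in a gem's checksums.yaml
# for one archive member (metadata.gz or data.tar.gz).
def digests_for(path)
  data = File.binread(path)
  {
    'SHA256' => Digest::SHA256.hexdigest(data),
    'SHA512' => Digest::SHA512.hexdigest(data),
  }
end
```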
data/CHANGELOG.md CHANGED
@@ -1,3 +1,7 @@
+## 0.6.1 - 2019-08-28
+
+* [maintenance] Release a new gem without symlinks so that it works on Windows.
+
 ## 0.6.0 - 2019-08-11
 
 Cleanup `auth_method`:
@@ -5,14 +9,14 @@ Cleanup `auth_method`:
 * [enhancement] Support `auth_method: authorized_user` (OAuth)
 * [incompatibility change] Rename `auth_method: json_key` to `auth_method: service_account` (`json_key` is kept for backward compatibility)
 * [incompatibility change] Remove deprecated `auth_method: private_key` (p12 key)
-* [incompatibility change] Change the default `auth_method` to `application_default` from `private_key`.
+* [incompatibility change] Change the default `auth_method` to `application_default` from `private_key` because `private_key` was dropped.
 
 ## 0.5.0 - 2019-08-10
 
 * [incompatibility change] Drop deprecated `time_partitioning`.`require_partition_filter`
 * [incompatibility change] Drop `prevent_duplicate_insert` which has no use-case now
-* [incompatibility change] Change default value of `auto_create_table` to `true` from `false`
-* Modes `replace`, `replace_backup`, `append`, `delete_in_advance`, that is, except `append_direct` requires `auto_create_table: true`.
+* [incompatibility change] Modes `replace`, `replace_backup`, `append`, and `delete_in_advance` now require `auto_create_table: true` because, previously, these modes created a target table even with `auto_create_table: false`, which confused users. Note that `auto_create_table: true` is always required even for a partition (a table name with a partition decorator), which may not require creating a table; this keeps the logic and implementation simple.
+* [incompatibility change] Change the default value of `auto_create_table` to `true` because the four modes above (that is, all modes except `append_direct`) now always require it.
 
 ## 0.4.14 - 2019-08-10
 
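The 0.5.0 and 0.6.0 notes reduce to a simple rule: every mode except `append_direct` requires `auto_create_table: true`. A hypothetical sketch of such a check (method name and error message are illustrative, not the plugin's actual code):

```ruby
# Modes that always create tables themselves and therefore
# cannot run with auto_create_table: false.
MODES_REQUIRING_AUTO_CREATE = %w[replace replace_backup append delete_in_advance].freeze

# Raise if the configured mode conflicts with auto_create_table: false.
def validate_auto_create_table!(mode, auto_create_table)
  if MODES_REQUIRING_AUTO_CREATE.include?(mode) && !auto_create_table
    raise ArgumentError, "mode: #{mode} requires auto_create_table: true"
  end
end
```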
data/README.md CHANGED
@@ -37,7 +37,7 @@ OAuth flow for installed applications.
 | location | string | optional | nil | geographic location of dataset. See [Location](#location) |
 | table | string | required | | table name, or table name with a partition decorator such as `table_name$20160929` |
 | auto_create_dataset | boolean | optional | false | automatically create dataset |
-| auto_create_table | boolean | optional | true | `false` is available only for `append_direct` mode. Other modes requires `true`. See [Dynamic Table Creating](#dynamic-table-creating) and [Time Partitioning](#time-partitioning) |
+| auto_create_table | boolean | optional | true | `false` is available only for `append_direct` mode. Other modes require `true`. See [Dynamic Table Creating](#dynamic-table-creating) and [Time Partitioning](#time-partitioning) |
 | schema_file | string | optional | | /path/to/schema.json |
 | template_table | string | optional | | template table name. See [Dynamic Table Creating](#dynamic-table-creating) |
 | job_status_max_polling_time | int | optional | 3600 sec | Max job status polling time |
@@ -213,7 +213,7 @@ You can also embed contents of `json_keyfile` at config.yml.
 ```yaml
 out:
   type: bigquery
-  auth_method: service_account
+  auth_method: authorized_user
   json_keyfile:
     content: |
       {
@@ -239,7 +239,12 @@ out:
 
 #### application\_default
 
-Use Application Default Credentials (ADC).
+Use Application Default Credentials (ADC). ADC is a strategy to locate Google Cloud service account credentials:
+
+1. ADC checks whether the environment variable `GOOGLE_APPLICATION_CREDENTIALS` is set. If it is, ADC uses the service account file that the variable points to.
+2. ADC checks whether `~/.config/gcloud/application_default_credentials.json` exists. This file is created by running `gcloud auth application-default login`.
+3. ADC uses the default service account for credentials if the application is running on Compute Engine, App Engine, Kubernetes Engine, Cloud Functions, or Cloud Run.
+
 See https://cloud.google.com/docs/authentication/production for details.
 
 ```yaml
@@ -256,12 +261,12 @@ Table ids are formatted at runtime
 using the local time of the embulk server.
 
 For example, with the configuration below,
-data is inserted into tables `table_2015_04`, `table_2015_05` and so on.
+data is inserted into tables `table_20150503`, `table_20150504`, and so on.
 
 ```yaml
 out:
   type: bigquery
-  table: table_%Y_%m
+  table: table_%Y%m%d
 ```
 
 ### Dynamic table creating
@@ -276,7 +281,7 @@ Please set file path of schema.json.
 out:
   type: bigquery
   auto_create_table: true
-  table: table_%Y_%m
+  table: table_%Y%m%d
   schema_file: /path/to/schema.json
 ```
 
@@ -288,7 +293,7 @@ Plugin will try to read schema from existing table and use it as schema template
 out:
   type: bigquery
   auto_create_table: true
-  table: table_%Y_%m
+  table: table_%Y%m%d
   template_table: existing_table_name
 ```
 
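The `table: table_%Y%m%d` option shown in the README excerpts is expanded with strftime using the Embulk server's local time. A plain-Ruby sketch of the idea (the plugin's own expansion may differ in detail):

```ruby
# Expand strftime placeholders in a table option, e.g. table_%Y%m%d.
def expand_table_name(template, now = Time.now)
  now.strftime(template)
end

expand_table_name('table_%Y%m%d', Time.new(2015, 5, 3))  # => "table_20150503"
```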
data/embulk-output-bigquery.gemspec CHANGED
@@ -1,6 +1,6 @@
 Gem::Specification.new do |spec|
   spec.name = "embulk-output-bigquery"
-  spec.version = "0.6.0"
+  spec.version = "0.6.1"
   spec.authors = ["Satoshi Akama", "Naotoshi Seo"]
   spec.summary = "Google BigQuery output plugin for Embulk"
   spec.description = "Embulk plugin that insert records to Google BigQuery."
@@ -8,7 +8,9 @@ Gem::Specification.new do |spec|
   spec.licenses = ["MIT"]
   spec.homepage = "https://github.com/embulk/embulk-output-bigquery"
 
-  spec.files = `git ls-files`.split("\n") + Dir["classpath/*.jar"]
+  # Exclude the example directory, which uses symlinks, from the generated gem.
+  # Symlinks do not work properly on Windows without administrator privileges.
+  spec.files = `git ls-files`.split("\n") + Dir["classpath/*.jar"] - Dir["example/*"]
   spec.test_files = spec.files.grep(%r{^(test|spec)/})
   spec.require_paths = ["lib"]
 
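The `spec.files` change relies on Ruby's `Array#-`, which removes every element that also appears in the right-hand array, so anything matched under `example/` is dropped from the `git ls-files` list. A self-contained sketch with in-memory data (`grep` stands in for `Dir["example/*"]`, which would need a real filesystem):

```ruby
# Mimic the gemspec expression: start from a tracked-file list and
# subtract everything under example/ (the symlinked directory).
tracked = [
  'README.md',
  'lib/embulk/output/bigquery.rb',
  'example/example.yml',
  'example/schema.json',
]
excluded = tracked.grep(%r{\Aexample/}) # stand-in for Dir["example/*"]
files = tracked - excluded
```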
data/lib/embulk/output/bigquery.rb CHANGED
@@ -304,14 +304,14 @@ module Embulk
           bigquery.create_table_if_not_exists(task['table'])
         when 'replace'
           bigquery.create_table_if_not_exists(task['temp_table'])
-          bigquery.create_table_if_not_exists(task['table'])
+          bigquery.create_table_if_not_exists(task['table']) # needed when task['table'] is a partition
         when 'append'
           bigquery.create_table_if_not_exists(task['temp_table'])
-          bigquery.create_table_if_not_exists(task['table'])
+          bigquery.create_table_if_not_exists(task['table']) # needed when task['table'] is a partition
         when 'replace_backup'
           bigquery.create_table_if_not_exists(task['temp_table'])
           bigquery.create_table_if_not_exists(task['table'])
-          bigquery.create_table_if_not_exists(task['table_old'], dataset: task['dataset_old'])
+          bigquery.create_table_if_not_exists(task['table_old'], dataset: task['dataset_old']) # needed when task['table_old'] is a partition
         else # append_direct
           if task['auto_create_table']
             bigquery.create_table_if_not_exists(task['table'])
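The added comments note that `create_table_if_not_exists` must still run when the table option carries a partition decorator such as `table$20160929`, because the base table has to exist before loading into the partition. A hypothetical helper for reading such a reference (illustrative only, not the plugin's actual code):

```ruby
# Split a table reference that may carry a partition decorator,
# e.g. "events$20160929" -> base table "events", partition "20160929".
def split_partition_decorator(table)
  name, partition = table.split('$', 2)
  [name, partition]
end
```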
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: embulk-output-bigquery
 version: !ruby/object:Gem::Version
-  version: 0.6.0
+  version: 0.6.1
 platform: ruby
 authors:
 - Satoshi Akama
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2019-08-10 00:00:00.000000000 Z
+date: 2019-08-28 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement
@@ -83,50 +83,6 @@ files:
 - README.md
 - Rakefile
 - embulk-output-bigquery.gemspec
-- example/config_append_direct_schema_update_options.yml
-- example/config_client_options.yml
-- example/config_csv.yml
-- example/config_delete_in_advance.yml
-- example/config_delete_in_advance_field_partitioned_table.yml
-- example/config_delete_in_advance_partitioned_table.yml
-- example/config_expose_errors.yml
-- example/config_gcs.yml
-- example/config_guess_from_embulk_schema.yml
-- example/config_guess_with_column_options.yml
-- example/config_gzip.yml
-- example/config_jsonl.yml
-- example/config_max_threads.yml
-- example/config_min_ouput_tasks.yml
-- example/config_mode_append.yml
-- example/config_mode_append_direct.yml
-- example/config_nested_record.yml
-- example/config_payload_column.yml
-- example/config_payload_column_index.yml
-- example/config_progress_log_interval.yml
-- example/config_replace.yml
-- example/config_replace_backup.yml
-- example/config_replace_backup_field_partitioned_table.yml
-- example/config_replace_backup_partitioned_table.yml
-- example/config_replace_field_partitioned_table.yml
-- example/config_replace_partitioned_table.yml
-- example/config_replace_schema_update_options.yml
-- example/config_skip_file_generation.yml
-- example/config_table_strftime.yml
-- example/config_template_table.yml
-- example/config_uncompressed.yml
-- example/config_with_rehearsal.yml
-- example/example.csv
-- example/example.yml
-- example/example2_1.csv
-- example/example2_2.csv
-- example/example4_1.csv
-- example/example4_2.csv
-- example/example4_3.csv
-- example/example4_4.csv
-- example/json_key.json
-- example/nested_example.jsonl
-- example/schema.json
-- example/schema_expose_errors.json
 - lib/embulk/output/bigquery.rb
 - lib/embulk/output/bigquery/auth.rb
 - lib/embulk/output/bigquery/bigquery_client.rb
data/example/config_append_direct_schema_update_options.yml DELETED
@@ -1,31 +0,0 @@
-in:
-  type: file
-  path_prefix: example/example.csv
-  parser:
-    type: csv
-    charset: UTF-8
-    newline: CRLF
-    null_string: 'NULL'
-    skip_header_lines: 1
-    comment_line_marker: '#'
-    columns:
-    - {name: date, type: string}
-    - {name: timestamp, type: timestamp, format: "%Y-%m-%d %H:%M:%S.%N", timezone: "+09:00"}
-    - {name: "null", type: string}
-    - {name: long, type: long}
-    - {name: string, type: string}
-    - {name: double, type: double}
-    - {name: boolean, type: boolean}
-out:
-  type: bigquery
-  mode: append_direct
-  auth_method: service_account
-  json_keyfile: example/your-project-000.json
-  dataset: your_dataset_name
-  table: your_table_name
-  source_format: NEWLINE_DELIMITED_JSON
-  compression: NONE
-  auto_create_dataset: true
-  auto_create_table: true
-  schema_file: example/schema.json
-  schema_update_options: [ALLOW_FIELD_ADDITION, ALLOW_FIELD_RELAXATION]
data/example/config_client_options.yml DELETED
@@ -1,33 +0,0 @@
-in:
-  type: file
-  path_prefix: example/example.csv
-  parser:
-    type: csv
-    charset: UTF-8
-    newline: CRLF
-    null_string: 'NULL'
-    skip_header_lines: 1
-    comment_line_marker: '#'
-    columns:
-    - {name: date, type: string}
-    - {name: timestamp, type: timestamp, format: "%Y-%m-%d %H:%M:%S.%N", timezone: "+09:00"}
-    - {name: "null", type: string}
-    - {name: long, type: long}
-    - {name: string, type: string}
-    - {name: double, type: double}
-    - {name: boolean, type: boolean}
-out:
-  type: bigquery
-  mode: replace
-  auth_method: service_account
-  json_keyfile: example/your-project-000.json
-  dataset: your_dataset_name
-  table: your_table_name
-  source_format: NEWLINE_DELIMITED_JSON
-  auto_create_dataset: true
-  auto_create_table: true
-  schema_file: example/schema.json
-  timeout_sec: 400
-  open_timeout_sec: 400
-  retries: 2
-  application_name: "Embulk BigQuery plugin test"
data/example/config_csv.yml DELETED
@@ -1,30 +0,0 @@
-in:
-  type: file
-  path_prefix: example/example.csv
-  parser:
-    type: csv
-    charset: UTF-8
-    newline: CRLF
-    null_string: 'NULL'
-    skip_header_lines: 1
-    comment_line_marker: '#'
-    columns:
-    - {name: date, type: string}
-    - {name: timestamp, type: timestamp, format: "%Y-%m-%d %H:%M:%S.%N", timezone: "+09:00"}
-    - {name: "null", type: string}
-    - {name: long, type: long}
-    - {name: string, type: string}
-    - {name: double, type: double}
-    - {name: boolean, type: boolean}
-out:
-  type: bigquery
-  mode: replace
-  auth_method: service_account
-  json_keyfile: example/your-project-000.json
-  dataset: your_dataset_name
-  table: your_table_name
-  source_format: CSV
-  compression: GZIP
-  auto_create_dataset: true
-  auto_create_table: true
-  schema_file: example/schema.json
data/example/config_delete_in_advance.yml DELETED
@@ -1,29 +0,0 @@
-in:
-  type: file
-  path_prefix: example/example.csv
-  parser:
-    type: csv
-    charset: UTF-8
-    newline: CRLF
-    null_string: 'NULL'
-    skip_header_lines: 1
-    comment_line_marker: '#'
-    columns:
-    - {name: date, type: string}
-    - {name: timestamp, type: timestamp, format: "%Y-%m-%d %H:%M:%S.%N", timezone: "+09:00"}
-    - {name: "null", type: string}
-    - {name: long, type: long}
-    - {name: string, type: string}
-    - {name: double, type: double}
-    - {name: boolean, type: boolean}
-out:
-  type: bigquery
-  mode: delete_in_advance
-  auth_method: service_account
-  json_keyfile: example/your-project-000.json
-  dataset: your_dataset_name
-  table: your_table_name
-  source_format: NEWLINE_DELIMITED_JSON
-  auto_create_dataset: true
-  auto_create_table: true
-  schema_file: example/schema.json
data/example/config_delete_in_advance_field_partitioned_table.yml DELETED
@@ -1,33 +0,0 @@
-in:
-  type: file
-  path_prefix: example/example.csv
-  parser:
-    type: csv
-    charset: UTF-8
-    newline: CRLF
-    null_string: 'NULL'
-    skip_header_lines: 1
-    comment_line_marker: '#'
-    columns:
-    - {name: date, type: string}
-    - {name: timestamp, type: timestamp, format: "%Y-%m-%d %H:%M:%S.%N", timezone: "+09:00"}
-    - {name: "null", type: string}
-    - {name: long, type: long}
-    - {name: string, type: string}
-    - {name: double, type: double}
-    - {name: boolean, type: boolean}
-out:
-  type: bigquery
-  mode: delete_in_advance
-  auth_method: service_account
-  json_keyfile: example/your-project-000.json
-  dataset: your_dataset_name
-  table: your_field_partitioned_table_name
-  source_format: NEWLINE_DELIMITED_JSON
-  compression: NONE
-  auto_create_dataset: true
-  auto_create_table: true
-  schema_file: example/schema.json
-  time_partitioning:
-    type: 'DAY'
-    field: timestamp
data/example/config_delete_in_advance_partitioned_table.yml DELETED
@@ -1,33 +0,0 @@
-in:
-  type: file
-  path_prefix: example/example.csv
-  parser:
-    type: csv
-    charset: UTF-8
-    newline: CRLF
-    null_string: 'NULL'
-    skip_header_lines: 1
-    comment_line_marker: '#'
-    columns:
-    - {name: date, type: string}
-    - {name: timestamp, type: timestamp, format: "%Y-%m-%d %H:%M:%S.%N", timezone: "+09:00"}
-    - {name: "null", type: string}
-    - {name: long, type: long}
-    - {name: string, type: string}
-    - {name: double, type: double}
-    - {name: boolean, type: boolean}
-out:
-  type: bigquery
-  mode: delete_in_advance
-  auth_method: service_account
-  json_keyfile: example/your-project-000.json
-  dataset: your_dataset_name
-  table: your_partitioned_table_name$20160929
-  source_format: NEWLINE_DELIMITED_JSON
-  compression: NONE
-  auto_create_dataset: true
-  auto_create_table: true
-  schema_file: example/schema.json
-  time_partitioning:
-    type: 'DAY'
-    expiration_ms: 100
data/example/config_expose_errors.yml DELETED
@@ -1,30 +0,0 @@
-in:
-  type: file
-  path_prefix: example/example.csv
-  parser:
-    type: csv
-    charset: UTF-8
-    newline: CRLF
-    null_string: 'NULL'
-    skip_header_lines: 1
-    comment_line_marker: '#'
-    columns:
-    - {name: date, type: string}
-    - {name: timestamp, type: timestamp, format: "%Y-%m-%d %H:%M:%S.%N", timezone: "+09:00"}
-    - {name: "null", type: string}
-    - {name: long, type: long}
-    - {name: string, type: string}
-    - {name: double, type: double}
-    - {name: boolean, type: boolean}
-out:
-  type: bigquery
-  mode: replace
-  auth_method: service_account
-  json_keyfile: example/your-project-000.json
-  dataset: your_dataset_name
-  table: your_table_name
-  source_format: NEWLINE_DELIMITED_JSON
-  compression: NONE
-  auto_create_dataset: true
-  auto_create_table: true
-  schema_file: example/schema_expose_errors.json
data/example/config_gcs.yml DELETED
@@ -1,32 +0,0 @@
-in:
-  type: file
-  path_prefix: example/example.csv
-  parser:
-    type: csv
-    charset: UTF-8
-    newline: CRLF
-    null_string: 'NULL'
-    skip_header_lines: 1
-    comment_line_marker: '#'
-    columns:
-    - {name: date, type: string}
-    - {name: timestamp, type: timestamp, format: "%Y-%m-%d %H:%M:%S.%N", timezone: "+09:00"}
-    - {name: "null", type: string}
-    - {name: long, type: long}
-    - {name: string, type: string}
-    - {name: double, type: double}
-    - {name: boolean, type: boolean}
-out:
-  type: bigquery
-  mode: replace
-  auth_method: service_account
-  json_keyfile: example/your-project-000.json
-  dataset: your_dataset_name
-  table: your_table_name
-  source_format: NEWLINE_DELIMITED_JSON
-  compression: GZIP
-  auto_create_dataset: true
-  auto_create_table: true
-  schema_file: example/schema.json
-  gcs_bucket: your_bucket_name
-  auto_create_gcs_bucket: true