
embulk-input-bigquery 0.0.3 → 0.0.4

Files changed (7)

checksums.yaml +4 -4
data/CHANGELOG.md +9 -0
data/README.md +65 -21
data/embulk-input-bigquery.gemspec +3 -4
data/lib/embulk/input/bigquery.rb +1 -2
data/lib/embulk/input/bigquery/version.rb +1 -1
metadata +21 -16

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: adc126def78ac278dafebe7ad7bf5830ad7f4f29
-  data.tar.gz: caa6d2d3500b9051889f8039d9bb42b5f6cbf13a
+  metadata.gz: 437379856a6f9006896f43638ed6234e26e6e8ee
+  data.tar.gz: 0fc79f8c9c19071da9b6d0f3093fa781ea9de576
 SHA512:
-  metadata.gz: 8d459d42c1d9c5c995f35010298657fc79df8fbc03eedfa26692f34090163fb28a60a6c1c73029c4e50ecd3a2595f5c85241b569aec07bf69d711f5b15c6059f
-  data.tar.gz: a670a9fde47bd8ca7cfb0412b35cd2c581549ba74788b8bbd96e32360d38f0a6d00e9b1434331ff2adbf593afe8c24450099856d18f05a9f3565b4be270a0c56
+  metadata.gz: 023f0a6dbf42e7600b1e7a019e5eec20fae556591cb8c199845994b48a20b6bfaadf6e8dedb5e8b7b481dce57d89d114cdbbe195e2bb6309ba406ab10d7780c0
+  data.tar.gz: 8757c7256f6072d20946de20a559931409beb8a746a5c86c71b7ae2e1dc2ee34d8b9191fa2764bf9ab1a69fbe8b9f799bfe063cbaabc57b491d1fb594856eb5e
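
The SHA1 and SHA512 entries above can be checked locally against the `metadata.gz` and `data.tar.gz` members unpacked from the `.gem` archive. The following is a minimal sketch, assuming the files have already been extracted; the helper name `checksum_matches?` is hypothetical and not part of RubyGems.

```ruby
require 'digest'

# Hypothetical helper: compare a file's digest with an expected hex string,
# such as a value copied from checksums.yaml.
# `algorithm` is any Digest class (Digest::SHA1, Digest::SHA512, ...).
def checksum_matches?(path, expected_hex, algorithm: Digest::SHA512)
  algorithm.file(path).hexdigest == expected_hex.strip.downcase
end
```

For example, `checksum_matches?('data.tar.gz', expected)` with `expected` set to the SHA512 value listed for `data.tar.gz` would be the check for the new release; pass `algorithm: Digest::SHA1` to compare against the SHA1 entry instead.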

data/CHANGELOG.md ADDED

@@ -0,0 +1,9 @@
+# 0.0.4 (2018/01/14)
+* Drop support for google-cloud-bigquery v0.23 (#23)
+# 0.0.3 (2017/12/11)
+* Add BigQuery query option configurations (#4)
+* Add a feature to automatically define columns using the getQueryResult API (#10)
+* Support embedded keyfile into config.yml (#15)

data/README.md CHANGED

@@ -8,7 +8,22 @@ install it yourself as:
     $ embulk gem install embulk-input-bigquery
-## Usage
+# Configuration
+## Options
+### Query Options
+This plugin uses the gem [`google-cloud(Google Cloud Client Library for Ruby)`](https://github.com/GoogleCloudPlatform/google-cloud-ruby) and queries data using [the synchronous method](https://github.com/GoogleCloudPlatform/google-cloud-ruby/blob/c26b404d06f39d0c0c868e553255fb8f530c07b5/google-cloud-bigquery/lib/google/cloud/bigquery/project.rb#L506). Optional configuration items comply with the Google Cloud Client Library.
+| name                                 | type        | required?  | default                  | description            |
+|:-------------------------------------|:------------|:-----------|:-------------------------|:-----------------------|
+| max                                  | integer     | optional   | `null`                   | The maximum number of rows of data to return per page of results. Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies. |
+| cache                                | boolean     | optional   | true                     | Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see [query caching](https://developers.google.com/bigquery/querying-data). |
+| standard\_sql                        | boolean     | optional   | true                     | Specifies whether to use BigQuery's [standard SQL](https://cloud.google.com/bigquery/docs/reference/standard-sql/) dialect for this query. If set to true, the query will use standard SQL rather than the [legacy SQL](https://cloud.google.com/bigquery/docs/reference/legacy-sql) dialect. When set to true, the values of `large_results` and `flatten` are ignored; the query will be run as if `large_results` is true and `flatten` is false. Optional. The default value is true. |
+| legacy\_sql                          | boolean     | optional   | false                    | Specifies whether to use BigQuery's [legacy SQL](https://cloud.google.com/bigquery/docs/reference/legacy-sql) dialect for this query. If set to false, the query will use BigQuery's [standard SQL](https://cloud.google.com/bigquery/docs/reference/standard-sql/) dialect. When set to false, the values of `large_results` and `flatten` are ignored; the query will be run as if `large_results` is true and `flatten` is false. The default value is false. |
+## Example
 ```
 in:
@@ -24,7 +39,7 @@ out:
   type: stdout
 ```
-If, table name is changeable, then
+If the table name is changeable, then
 ```
 in:
@@ -40,24 +55,25 @@ in:
     - {name: month, type: timestamp, format: '%Y-%m', eval: 'require "time"; Time.parse(params["date"]).to_i'}
 ```
-### Determine columns from query results if columns definition is empty
+## Authentication
+### JSON key of GCP's service account
+You first need to create a service account (client ID), download its JSON key, and deploy the key alongside embulk.
 ```
 in:
   type: bigquery
-  project: 'project-name'
-  keyfile: '/home/hogehoge/bigquery-keyfile.json'
-  sql: 'SELECT price,category_id FROM [ecsite.products] GROUP BY category_id'
-out:
-  type: stdout
+  project: project_name
+  keyfile: /path/to/keyfile.json
 ```
-### Embed keyfile content as string into config
+You can also embed the contents of the JSON keyfile in config.yml.
 ```
 in:
   type: bigquery
-  project: 'project-name'
+  project: project_name
   keyfile:
     content: |
       {
@@ -74,16 +90,44 @@ in:
       }
 ```
+## Automatically determine column schema from query results
+Column schema can be automatically determined from query results if `columns` definition is not given.
+Please note that the plugin has to wait until the BigQuery query job completes to get the schema information.
+```
+in:
+  type: bigquery
+  project: project_name
+  keyfile: /path/to/keyfile.json
+  sql: 'SELECT price,category_id FROM [ecsite.products] GROUP BY category_id'
+out:
+  type: stdout
+```
+# Another Choice
+`embulk-input-bigquery` runs queries against BigQuery, which incurs query costs. To save money, you can use the following procedure instead:
+1. [Export data](https://cloud.google.com/bigquery/docs/exporting-data?hl=en) from BigQuery to GCS in Avro format.
+2. Use [embulk-input-gcs](https://github.com/embulk/embulk-input-gcs) and [embulk-parser-avro](https://github.com/joker1007/embulk-parser-avro) to read the exported data from GCS.
+# Development
+## Run
+```
+embulk bundle install --path vendor/bundle
+embulk run -X page_size=1 -b . -l trace example/example.yml
+```
+## Release gem
+Bump the version in `lib/embulk/input/bigquery/version.rb`, then run
+```
+$ bundle exec rake release
+```
-## Optional Configuration
-This plugin uses the gem [`google-cloud(Google Cloud Client Library for Ruby)`](https://github.com/GoogleCloudPlatform/google-cloud-ruby) and queries data using [the synchronous method](https://github.com/GoogleCloudPlatform/google-cloud-ruby/blob/master/google-cloud-bigquery/lib/google/cloud/bigquery/project.rb#L281).
-Therefore some optional configuration items comply with the Google Cloud Client Library.
+# ChangeLog
-- [max](https://github.com/GoogleCloudPlatform/google-cloud-ruby/blob/master/google-cloud-bigquery/lib/google/cloud/bigquery/project.rb#L315) :
-  - default value : **null** and null value is interpreted as [no maximum row count](https://github.com/GoogleCloudPlatform/google-cloud-ruby/blob/master/google-cloud-bigquery/lib/google/cloud/bigquery/project.rb#L319) in the Google Cloud Client Library.
-- [cache](https://github.com/GoogleCloudPlatform/google-cloud-ruby/blob/master/google-cloud-bigquery/lib/google/cloud/bigquery/project.rb#L331) :
-  - default value : **null** and null value is interpreted as [true](https://github.com/GoogleCloudPlatform/google-cloud-ruby/blob/master/google-cloud-bigquery/lib/google/cloud/bigquery/project.rb#L333) in the Google Cloud Client Library.
-- [standard_sql](https://github.com/GoogleCloudPlatform/google-cloud-ruby/blob/master/google-cloud-bigquery/lib/google/cloud/bigquery/project.rb#L343):
-  - default value : **null** and null value is interpreted as [true](https://github.com/GoogleCloudPlatform/google-cloud-ruby/blob/master/google-cloud-bigquery/lib/google/cloud/bigquery/project.rb#L351) in the Google Cloud Client Library.
-- [legacy_sql](https://github.com/GoogleCloudPlatform/google-cloud-ruby/blob/master/google-cloud-bigquery/lib/google/cloud/bigquery/project.rb#L353):
-  - default value : **null** and null value is interpreted as [false](https://github.com/GoogleCloudPlatform/google-cloud-ruby/blob/master/google-cloud-bigquery/lib/google/cloud/bigquery/project.rb#L361) in the Google Cloud Client Library.
+[CHANGELOG.md](./CHANGELOG.md)
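
The defaults in the new Query Options table can be read as a small lookup applied to the user's config. The sketch below only illustrates that reading: `QUERY_OPTION_DEFAULTS` and `resolve_query_options` are hypothetical names, and this is not the plugin's actual code, which may simply pass unset values through to the client library.

```ruby
# Defaults copied from the README options table (max, cache, standard_sql, legacy_sql).
# Hypothetical constant and helper, for illustration only.
QUERY_OPTION_DEFAULTS = {
  max: nil,           # no maximum row count per page
  cache: true,        # look for the result in the query cache
  standard_sql: true, # use the standard SQL dialect
  legacy_sql: false   # do not use the legacy SQL dialect
}.freeze

# Return the effective option values: an explicit setting wins,
# a missing or nil setting falls back to the documented default.
def resolve_query_options(config)
  QUERY_OPTION_DEFAULTS.each_with_object({}) do |(key, default), opts|
    value = config.fetch(key, default)
    opts[key] = value.nil? ? default : value
  end
end
```

Under these assumptions, `resolve_query_options(max: 1000)` keeps the documented defaults for the other three options and only overrides the page size.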

data/embulk-input-bigquery.gemspec CHANGED

@@ -1,5 +1,4 @@
 # coding: utf-8
 lib = File.expand_path('../lib', __FILE__)
 $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
 require 'embulk/input/bigquery/version'
@@ -7,8 +6,8 @@ require 'embulk/input/bigquery/version'
 Gem::Specification.new do |spec|
   spec.name          = 'embulk-input-bigquery'
   spec.version       = Embulk::Input::Bigquery::VERSION
-  spec.authors       = ['Takeru Narita']
-  spec.email         = ['naritano77@gmail.com']
+  spec.authors       = ['potato2003', 'Naotoshi Seo', 'Takeru Narita']
+  spec.email         = ['potato2003@gmail.com', 'sonots@gmail.com', 'naritano77@gmail.com']
   spec.description   = 'embulk input plugin from bigquery.'
   spec.summary       = 'Embulk input plugin from bigquery.'
   spec.homepage      = 'https://github.com/medjed/embulk-input-bigquery'
@@ -19,7 +18,7 @@ Gem::Specification.new do |spec|
   spec.test_files    = spec.files.grep(%r{^(test|spec|features)/})
   spec.require_paths = ['lib']
+  spec.add_dependency 'google-cloud-bigquery', '~> 0.24'
   spec.add_development_dependency 'bundler', '~> 1.3'
   spec.add_development_dependency 'rake'
-  spec.add_dependency 'google-cloud-bigquery', '~> 0.23'
 end
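
The move from `'~> 0.23'` to `'~> 0.24'` changes which `google-cloud-bigquery` releases RubyGems will resolve. A short sketch using the RubyGems API shows the effect of the pessimistic operator; the helper name `allowed?` is hypothetical.

```ruby
require 'rubygems'

# Hypothetical helper: does `version` satisfy the gemspec-style `constraint`?
def allowed?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

%w[0.23.0 0.24.0 0.30.1 1.0.0].each do |version|
  status = allowed?('~> 0.24', version) ? 'allowed' : 'rejected'
  puts "#{version}: #{status}"
end
```

With a two-segment constraint, `~> 0.24` is equivalent to `>= 0.24, < 1.0`, so 0.23.x is excluded while later 0.x releases remain installable.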

data/lib/embulk/input/bigquery.rb CHANGED

@@ -11,7 +11,6 @@ module Embulk
       # keyfile:
       #   content: |
       class LocalFile
-        # return JSON string
         def self.load(v)
           if v.is_a?(String)
             v
@@ -27,7 +26,7 @@ module Embulk
         unless sql
           sql_erb = config[:sql_erb]
           erb = ERB.new(sql_erb)
-          erb_params = config[:erb_params]
+          erb_params = config[:erb_params] || {}
           erb_params.each do |k, v|
             params[k] = eval(v)
           end
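
The `|| {}` guard matters when a config sets `sql_erb` without `erb_params`: calling `each` on `nil` would raise `NoMethodError`. The sketch below reproduces that pattern in isolation. The method name `render_sql` and the way the template reads `params` through the binding are assumptions for illustration, not the plugin's exact code.

```ruby
require 'erb'

# Hypothetical stand-alone version of the sql_erb handling shown in the hunk above.
def render_sql(config)
  params = {}
  # Fall back to an empty hash so a missing erb_params key is a no-op.
  erb_params = config[:erb_params] || {}
  erb_params.each do |k, v|
    # Each value is a Ruby expression given as a string in the config.
    # eval runs arbitrary code, so the config must be trusted.
    params[k] = eval(v)
  end
  # The template can reference `params` because it is rendered with this binding.
  ERB.new(config[:sql_erb]).result(binding)
end
```

A call such as `render_sql(sql_erb: 'SELECT 1')` exercises the guarded path, since no `erb_params` key is present.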

data/lib/embulk/input/bigquery/version.rb CHANGED

@@ -1,7 +1,7 @@
 module Embulk
   module Input
     module Bigquery
-      VERSION = '0.0.3'.freeze
+      VERSION = '0.0.4'.freeze
     end
   end
 end

metadata CHANGED

@@ -1,15 +1,31 @@
 --- !ruby/object:Gem::Specification
 name: embulk-input-bigquery
 version: !ruby/object:Gem::Version
-  version: 0.0.3
+  version: 0.0.4
 platform: ruby
 authors:
+- potato2003
+- Naotoshi Seo
 - Takeru Narita
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2017-12-11 00:00:00.000000000 Z
+date: 2018-01-14 00:00:00.000000000 Z
 dependencies:
+- !ruby/object:Gem::Dependency
+  name: google-cloud-bigquery
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.24'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.24'
 - !ruby/object:Gem::Dependency
   name: bundler
   requirement: !ruby/object:Gem::Requirement
@@ -38,28 +54,17 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
-- !ruby/object:Gem::Dependency
-  name: google-cloud-bigquery
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: '0.23'
-  type: :runtime
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: '0.23'
 description: embulk input plugin from bigquery.
 email:
+- potato2003@gmail.com
+- sonots@gmail.com
 - naritano77@gmail.com
 executables: []
 extensions: []
 extra_rdoc_files: []
 files:
 - ".gitignore"
+- CHANGELOG.md
 - Gemfile
 - LICENSE.txt
 - README.md