embulk-input-bigquery 0.0.3 → 0.0.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: adc126def78ac278dafebe7ad7bf5830ad7f4f29
- data.tar.gz: caa6d2d3500b9051889f8039d9bb42b5f6cbf13a
+ metadata.gz: 437379856a6f9006896f43638ed6234e26e6e8ee
+ data.tar.gz: 0fc79f8c9c19071da9b6d0f3093fa781ea9de576
  SHA512:
- metadata.gz: 8d459d42c1d9c5c995f35010298657fc79df8fbc03eedfa26692f34090163fb28a60a6c1c73029c4e50ecd3a2595f5c85241b569aec07bf69d711f5b15c6059f
- data.tar.gz: a670a9fde47bd8ca7cfb0412b35cd2c581549ba74788b8bbd96e32360d38f0a6d00e9b1434331ff2adbf593afe8c24450099856d18f05a9f3565b4be270a0c56
+ metadata.gz: 023f0a6dbf42e7600b1e7a019e5eec20fae556591cb8c199845994b48a20b6bfaadf6e8dedb5e8b7b481dce57d89d114cdbbe195e2bb6309ba406ab10d7780c0
+ data.tar.gz: 8757c7256f6072d20946de20a559931409beb8a746a5c86c71b7ae2e1dc2ee34d8b9191fa2764bf9ab1a69fbe8b9f799bfe063cbaabc57b491d1fb594856eb5e
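
The digests recorded in `checksums.yaml` can be compared against the archive members of a downloaded package. The following is a minimal sketch, assuming `data.tar.gz` has already been extracted from the `.gem` file; the helper name `checksum_matches?` is hypothetical and not part of RubyGems.

```ruby
require 'digest'

# Hypothetical helper: compare the digest of raw bytes with an expected hex string.
# `algorithm` is any Digest class, for example Digest::SHA1 or Digest::SHA512.
def checksum_matches?(bytes, expected_hex, algorithm: Digest::SHA512)
  algorithm.hexdigest(bytes) == expected_hex.strip.downcase
end

# Usage (assumes data.tar.gz sits in the current directory):
#   checksum_matches?(File.binread('data.tar.gz'),
#                     '0fc79f8c9c19071da9b6d0f3093fa781ea9de576',
#                     algorithm: Digest::SHA1)
```

A mismatch here would suggest the archive is not the one this release describes, although it says nothing about where the difference was introduced.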
@@ -0,0 +1,9 @@
+ # 0.0.4 (2018/01/14)
+
+ * Unsupport google-cloud-bigquery v0.23 (#23)
+
+ # 0.0.3 (2017/12/11)
+
+ * Add BigQuery query option configurations (#4)
+ * Add a feature to automatically define columns using the getQueryResult API (#10)
+ * Support embedded keyfile into config.yml (#15)
data/README.md CHANGED
@@ -8,7 +8,22 @@ install it yourself as:
 
  $ embulk gem install embulk-input-bigquery
 
- ## Usage
+ # Configuration
+
+ ## Options
+
+ ### Query Options
+
+ This plugin uses the gem [`google-cloud(Google Cloud Client Library for Ruby)`](https://github.com/GoogleCloudPlatform/google-cloud-ruby) and queries data using [the synchronous method](https://github.com/GoogleCloudPlatform/google-cloud-ruby/blob/c26b404d06f39d0c0c868e553255fb8f530c07b5/google-cloud-bigquery/lib/google/cloud/bigquery/project.rb#L506). Optional configuration items comply with the Google Cloud Client Library.
+
+ | name | type | required? | default | description |
+ |:-------------------------------------|:------------|:-----------|:-------------------------|:-----------------------|
+ | max | integer | optional | `null` | The maximum number of rows of data to return per page of results. Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies. |
+ | cache | boolean | optional | true | Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see [query caching](https://developers.google.com/bigquery/querying-data). |
+ | standard\_sql | boolean | optional | true | Specifies whether to use BigQuery's [standard SQL](https://cloud.google.com/bigquery/docs/reference/standard-sql/) dialect for this query. If set to true, the query will use standard SQL rather than the [legacy SQL](https://cloud.google.com/bigquery/docs/reference/legacy-sql) dialect. When set to true, the values of `large_results` and `flatten` are ignored; the query will be run as if `large_results` is true and `flatten` is false. Optional. The default value is true. |
+ | legacy\_sql | boolean | optional | false | legacy_sql Specifies whether to use BigQuery's [legacy SQL](https://cloud.google.com/bigquery/docs/reference/legacy-sql) dialect for this query. If set to false, the query will use BigQuery's [standard SQL](https://cloud.google.com/bigquery/docs/reference/standard-sql/) When set to false, the values of `large_results` and `flatten` are ignored; the query will be run as if `large_results` is true and `flatten` is false. Optional. The default value is false. |
+
+ ## Example
 
  ```
  in:
@@ -24,7 +39,7 @@ out:
  type: stdout
  ```
 
- If, table name is changeable, then
+ If the table name is changeable, then
 
  ```
  in:
@@ -40,24 +55,25 @@ in:
  - {name: month, type: timestamp, format: '%Y-%m', eval: 'require "time"; Time.parse(params["date"]).to_i'}
  ```
 
- ### Determine columns from query results if columns definition is empty
+ ## Authentication
+
+ ### JSON key of GCP's service account
+
+ You first need to create a service account (client ID), download its json key and deploy the key with embulk.
 
  ```
  in:
  type: bigquery
- project: 'project-name'
- keyfile: '/home/hogehoge/bigquery-keyfile.json'
- sql: 'SELECT price,category_id FROM [ecsite.products] GROUP BY category_id'
- out:
- type: stdout
+ project: project_name
+ keyfile: /path/to/keyfile.json
  ```
 
- ### Embed keyfile content as string into config
+ You can also embed contents of json_keyfile at config.yml.
 
  ```
  in:
  type: bigquery
- project: 'project-name'
+ project: project_name
  keyfile:
  content: |
  {
@@ -74,16 +90,44 @@ in:
  }
  ```
 
+ ## Automatically determine column schema from query results
+
+ Column schema can be automatically determined from query results if `columns` definition is not given.
+ Please note that we have to wait until BigQuery query job complets to get the schema information.
+
+ ```
+ in:
+ type: bigquery
+ project: project_name
+ keyfile: /path/to/keyfile.json
+ sql: 'SELECT price,category_id FROM [ecsite.products] GROUP BY category_id'
+ out:
+ type: stdout
+ ```
+
+ # Another Choice
+
+ `embulk-input-bigquery` queries to BigQuery, so it costs. To save money, you may take following procedures instead:
+
+ 1. [Export data](https://cloud.google.com/bigquery/docs/exporting-data?hl=en) from BigQuery to GCS with avro format
+ 2. Use [embulk-input-gcs](https://github.com/embulk/embulk-input-gcs) and [embulk-parser-avro](https://github.com/joker1007/embulk-parser-avro) to read the exported data from GCS.
+
+ # Development
+
+ ## Run
+ ```
+ embulk bundle install --path vendor/bundle
+ embulk run -X page_size=1 -b . -l trace example/example.yml
+ ```
+
+ ## Release gem
+
+ Upgrade `lib/embulk/input/bigquery/version.rb`, then
+
+ ```
+ $ bundle exec rake release
+ ```
 
- ## Optional Configuration
- This plugin uses the gem [`google-cloud(Google Cloud Client Library for Ruby)`](https://github.com/GoogleCloudPlatform/google-cloud-ruby) and queries data using [the synchronous method](https://github.com/GoogleCloudPlatform/google-cloud-ruby/blob/master/google-cloud-bigquery/lib/google/cloud/bigquery/project.rb#L281).
- Therefore some optional configuration items comply with the Google Cloud Client Library.
+ # ChangeLog
 
- - [max](https://github.com/GoogleCloudPlatform/google-cloud-ruby/blob/master/google-cloud-bigquery/lib/google/cloud/bigquery/project.rb#L315) :
- - default value : **null** and null value is interpreted as [no maximum row count](https://github.com/GoogleCloudPlatform/google-cloud-ruby/blob/master/google-cloud-bigquery/lib/google/cloud/bigquery/project.rb#L319) in the Google Cloud Client Library.
- - [cache](https://github.com/GoogleCloudPlatform/google-cloud-ruby/blob/master/google-cloud-bigquery/lib/google/cloud/bigquery/project.rb#L331) :
- - default value : **null** and null value is interpreted as [true](https://github.com/GoogleCloudPlatform/google-cloud-ruby/blob/master/google-cloud-bigquery/lib/google/cloud/bigquery/project.rb#L333) in the Google Cloud Client Library.
- - [standard_sql](https://github.com/GoogleCloudPlatform/google-cloud-ruby/blob/master/google-cloud-bigquery/lib/google/cloud/bigquery/project.rb#L343):
- - default value : **null** and null value is interpreted as [true](https://github.com/GoogleCloudPlatform/google-cloud-ruby/blob/master/google-cloud-bigquery/lib/google/cloud/bigquery/project.rb#L351) in the Google Cloud Client Library.
- - [legacy_sql](https://github.com/GoogleCloudPlatform/google-cloud-ruby/blob/master/google-cloud-bigquery/lib/google/cloud/bigquery/project.rb#L353):
- - default value : **null** and null value is interpreted as [false](https://github.com/GoogleCloudPlatform/google-cloud-ruby/blob/master/google-cloud-bigquery/lib/google/cloud/bigquery/project.rb#L361) in the Google Cloud Client Library.
+ [CHANGELOG.md](./CHANGELOG.md)
@@ -1,5 +1,4 @@
  # coding: utf-8
-
  lib = File.expand_path('../lib', __FILE__)
  $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
  require 'embulk/input/bigquery/version'
@@ -7,8 +6,8 @@ require 'embulk/input/bigquery/version'
  Gem::Specification.new do |spec|
  spec.name = 'embulk-input-bigquery'
  spec.version = Embulk::Input::Bigquery::VERSION
- spec.authors = ['Takeru Narita']
- spec.email = ['naritano77@gmail.com']
+ spec.authors = ['potato2003', 'Naotoshi Seo', 'Takeru Narita']
+ spec.email = ['potato2003@gmail.com', 'sonots@gmail.com', 'naritano77@gmail.com']
  spec.description = 'embulk input plugin from bigquery.'
  spec.summary = 'Embulk input plugin from bigquery.'
  spec.homepage = 'https://github.com/medjed/embulk-input-bigquery'
@@ -19,7 +18,7 @@ Gem::Specification.new do |spec|
  spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
  spec.require_paths = ['lib']
 
+ spec.add_dependency 'google-cloud-bigquery', '~> 0.24'
  spec.add_development_dependency 'bundler', '~> 1.3'
  spec.add_development_dependency 'rake'
- spec.add_dependency 'google-cloud-bigquery', '~> 0.23'
  end
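
The move from `'~> 0.23'` to `'~> 0.24'` relies on RubyGems' pessimistic operator, which for a two-segment version admits anything from 0.24 up to, but not including, 1.0. A quick way to see which `google-cloud-bigquery` versions the new constraint accepts is sketched below; `REQUIREMENT` and `allowed?` are hypothetical names used only for illustration.

```ruby
require 'rubygems'

# Constraint copied from the new gemspec line.
REQUIREMENT = Gem::Requirement.new('~> 0.24')

# Hypothetical helper: true when the given version string satisfies the constraint.
def allowed?(version)
  REQUIREMENT.satisfied_by?(Gem::Version.new(version))
end

%w[0.23.0 0.24.0 0.28.0 1.0.0].each do |v|
  puts "#{v}: #{allowed?(v)}"
end
# prints:
#   0.23.0: false
#   0.24.0: true
#   0.28.0: true
#   1.0.0: false
```

This matches the changelog entry: 0.23.x no longer resolves, while later 0.x releases still do.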
@@ -11,7 +11,6 @@ module Embulk
  # keyfile:
  # content: |
  class LocalFile
- # return JSON string
  def self.load(v)
  if v.is_a?(String)
  v
@@ -27,7 +26,7 @@ module Embulk
  unless sql
  sql_erb = config[:sql_erb]
  erb = ERB.new(sql_erb)
- erb_params = config[:erb_params]
+ erb_params = config[:erb_params] || {}
  erb_params.each do |k, v|
  params[k] = eval(v)
  end
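
The `|| {}` guard matters when `sql_erb` is configured but `erb_params` is omitted: without it, `each` would be called on `nil` and raise `NoMethodError`. Below is a minimal sketch of the patched logic, assuming the template reads its values through a local `params` hash; `render_sql` is a hypothetical helper and not the plugin's actual method.

```ruby
require 'erb'

# Hypothetical helper mirroring the hunk above: evaluate each erb_params
# entry, then render the SQL template with `params` in scope.
def render_sql(config)
  params = {}
  erb_params = config[:erb_params] || {} # nil-safe when erb_params is omitted
  erb_params.each do |k, v|
    params[k] = eval(v) # evaluates Ruby from the config, so the config must be trusted
  end
  ERB.new(config[:sql_erb]).result(binding)
end

puts render_sql(sql_erb: 'SELECT 1')
# prints: SELECT 1
puts render_sql(sql_erb: 'SELECT * FROM t_<%= params["n"] %>',
                erb_params: { 'n' => '1 + 1' })
# prints: SELECT * FROM t_2
```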
@@ -1,7 +1,7 @@
  module Embulk
  module Input
  module Bigquery
- VERSION = '0.0.3'.freeze
+ VERSION = '0.0.4'.freeze
  end
  end
  end
metadata CHANGED
@@ -1,15 +1,31 @@
  --- !ruby/object:Gem::Specification
  name: embulk-input-bigquery
  version: !ruby/object:Gem::Version
- version: 0.0.3
+ version: 0.0.4
  platform: ruby
  authors:
+ - potato2003
+ - Naotoshi Seo
  - Takeru Narita
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2017-12-11 00:00:00.000000000 Z
+ date: 2018-01-14 00:00:00.000000000 Z
  dependencies:
+ - !ruby/object:Gem::Dependency
+ name: google-cloud-bigquery
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '0.24'
+ type: :runtime
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '0.24'
  - !ruby/object:Gem::Dependency
  name: bundler
  requirement: !ruby/object:Gem::Requirement
@@ -38,28 +54,17 @@ dependencies:
  - - ">="
  - !ruby/object:Gem::Version
  version: '0'
- - !ruby/object:Gem::Dependency
- name: google-cloud-bigquery
- requirement: !ruby/object:Gem::Requirement
- requirements:
- - - "~>"
- - !ruby/object:Gem::Version
- version: '0.23'
- type: :runtime
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- requirements:
- - - "~>"
- - !ruby/object:Gem::Version
- version: '0.23'
  description: embulk input plugin from bigquery.
  email:
+ - potato2003@gmail.com
+ - sonots@gmail.com
  - naritano77@gmail.com
  executables: []
  extensions: []
  extra_rdoc_files: []
  files:
  - ".gitignore"
+ - CHANGELOG.md
  - Gemfile
  - LICENSE.txt
  - README.md