tumugi-plugin-bigquery 0.1.0 → 0.2.0
- checksums.yaml +4 -4
- data/.travis.yml +6 -4
- data/CHANGELOG.md +48 -0
- data/README.md +23 -3
- data/examples/load.rb +24 -0
- data/examples/test.csv +6 -0
- data/examples/tumugi_config_example.rb +5 -5
- data/lib/tumugi/plugin/bigquery/client.rb +48 -18
- data/lib/tumugi/plugin/bigquery/version.rb +1 -1
- data/lib/tumugi/plugin/target/bigquery_dataset.rb +2 -2
- data/lib/tumugi/plugin/target/bigquery_table.rb +2 -2
- data/lib/tumugi/plugin/task/bigquery_copy.rb +3 -1
- data/lib/tumugi/plugin/task/bigquery_dataset.rb +1 -1
- data/lib/tumugi/plugin/task/bigquery_export.rb +112 -0
- data/lib/tumugi/plugin/task/bigquery_load.rb +73 -0
- data/lib/tumugi/plugin/task/bigquery_query.rb +1 -1
- data/tumugi-plugin-bigquery.gemspec +3 -2
- metadata +29 -10
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA1:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 1f82d5d752da3918795afc6cc669a0fb4711cf95
|
|
4
|
+
data.tar.gz: fed486ae8aeb9266d4fd11cf523a19a8507755af
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 8418f29dfe96d38bcdfa0c5098d59efd819edaa715a7e2af0945c57f70a4d08fa5c477560df636389eefe0bcf40712c4d240b2a97d3301cadebf6e5615808f2b
|
|
7
|
+
data.tar.gz: aa34ee20fdec506277f3ac40b8819b62f7d000db7c802ed4c9fcedd5af1bf33644ecf673b10326c8e64f8d554a9a708cdfc0f9ae7942b5570257adec3108ab28
|
data/.travis.yml
CHANGED
data/CHANGELOG.md
ADDED
|
@@ -0,0 +1,48 @@
|
|
|
1
|
+
# Change Log
|
|
2
|
+
|
|
3
|
+
## [0.2.0](https://github.com/tumugi/tumugi-plugin-bigquery/tree/0.2.0) (2016-06-06)
|
|
4
|
+
[Full Changelog](https://github.com/tumugi/tumugi-plugin-bigquery/compare/v0.1.0...0.2.0)
|
|
5
|
+
|
|
6
|
+
**Implemented enhancements:**
|
|
7
|
+
|
|
8
|
+
- Support extract table to FileSystemTarget [\#23](https://github.com/tumugi/tumugi-plugin-bigquery/issues/23)
|
|
9
|
+
- Support load from GCS [\#5](https://github.com/tumugi/tumugi-plugin-bigquery/issues/5)
|
|
10
|
+
- Support extract table to Google Cloud Storage [\#4](https://github.com/tumugi/tumugi-plugin-bigquery/issues/4)
|
|
11
|
+
- Support service account application default auth [\#22](https://github.com/tumugi/tumugi-plugin-bigquery/pull/22) ([hakobera](https://github.com/hakobera))
|
|
12
|
+
|
|
13
|
+
**Fixed bugs:**
|
|
14
|
+
|
|
15
|
+
- Fix typo and dependency [\#24](https://github.com/tumugi/tumugi-plugin-bigquery/pull/24) ([hakobera](https://github.com/hakobera))
|
|
16
|
+
- Fix missing project\_id of dataset/table [\#21](https://github.com/tumugi/tumugi-plugin-bigquery/pull/21) ([hakobera](https://github.com/hakobera))
|
|
17
|
+
- Fix private key file auth does not work [\#19](https://github.com/tumugi/tumugi-plugin-bigquery/pull/19) ([hakobera](https://github.com/hakobera))
|
|
18
|
+
- Fix support private key file in config section [\#13](https://github.com/tumugi/tumugi-plugin-bigquery/pull/13) ([hakobera](https://github.com/hakobera))
|
|
19
|
+
|
|
20
|
+
**Closed issues:**
|
|
21
|
+
|
|
22
|
+
- Update tumugi to v0.5.0 [\#8](https://github.com/tumugi/tumugi-plugin-bigquery/issues/8)
|
|
23
|
+
|
|
24
|
+
**Merged pull requests:**
|
|
25
|
+
|
|
26
|
+
- Cache output [\#26](https://github.com/tumugi/tumugi-plugin-bigquery/pull/26) ([hakobera](https://github.com/hakobera))
|
|
27
|
+
- Prepare release for 0.2.0 [\#25](https://github.com/tumugi/tumugi-plugin-bigquery/pull/25) ([hakobera](https://github.com/hakobera))
|
|
28
|
+
- Use Thor's invoke instead of system method [\#18](https://github.com/tumugi/tumugi-plugin-bigquery/pull/18) ([hakobera](https://github.com/hakobera))
|
|
29
|
+
- Change test ruby version [\#17](https://github.com/tumugi/tumugi-plugin-bigquery/pull/17) ([hakobera](https://github.com/hakobera))
|
|
30
|
+
- Change tumugi dependency version [\#16](https://github.com/tumugi/tumugi-plugin-bigquery/pull/16) ([hakobera](https://github.com/hakobera))
|
|
31
|
+
- Implement extract table to google cloud storage feature [\#15](https://github.com/tumugi/tumugi-plugin-bigquery/pull/15) ([hakobera](https://github.com/hakobera))
|
|
32
|
+
- Add BigqueryLoadTask [\#12](https://github.com/tumugi/tumugi-plugin-bigquery/pull/12) ([hakobera](https://github.com/hakobera))
|
|
33
|
+
- Update dependency gems [\#11](https://github.com/tumugi/tumugi-plugin-bigquery/pull/11) ([hakobera](https://github.com/hakobera))
|
|
34
|
+
- Update tumugi to v0.5.0 [\#9](https://github.com/tumugi/tumugi-plugin-bigquery/pull/9) ([hakobera](https://github.com/hakobera))
|
|
35
|
+
- Add rubygems badge [\#3](https://github.com/tumugi/tumugi-plugin-bigquery/pull/3) ([hakobera](https://github.com/hakobera))
|
|
36
|
+
|
|
37
|
+
## [v0.1.0](https://github.com/tumugi/tumugi-plugin-bigquery/tree/v0.1.0) (2016-05-16)
|
|
38
|
+
**Fixed bugs:**
|
|
39
|
+
|
|
40
|
+
- Fix unused arguments [\#2](https://github.com/tumugi/tumugi-plugin-bigquery/pull/2) ([hakobera](https://github.com/hakobera))
|
|
41
|
+
|
|
42
|
+
**Merged pull requests:**
|
|
43
|
+
|
|
44
|
+
- First implementation [\#1](https://github.com/tumugi/tumugi-plugin-bigquery/pull/1) ([hakobera](https://github.com/hakobera))
|
|
45
|
+
|
|
46
|
+
|
|
47
|
+
|
|
48
|
+
\* *This Change Log was automatically generated by [github_changelog_generator](https://github.com/skywinder/Github-Changelog-Generator)*
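The "Cache output" entry (#26) corresponds to the `@output ||= ...` and `return @output if @output` lines that appear in the task hunks of this diff. A minimal sketch of that memoization pattern follows; `ExampleTarget` and `ExampleTask` are hypothetical stand-ins, not classes from the gem.

```ruby
# Hypothetical stand-in for a target object; not part of the gem.
class ExampleTarget
  attr_reader :opts

  def initialize(opts)
    @opts = opts
  end
end

# Hypothetical task showing the memoized `output` pattern.
class ExampleTask
  def output
    # Build the target once, then return the same instance on later calls.
    @output ||= ExampleTarget.new(dataset_id: 'test', table_id: 'dest_table')
  end
end

task = ExampleTask.new
puts task.output.equal?(task.output) # prints "true": both calls return one object
```

Without the instance variable, each call to `output` would construct a new target, which in the real tasks would presumably also mean constructing a new client each time.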
|
data/README.md
CHANGED
|
@@ -1,4 +1,4 @@
|
|
|
1
|
-
[](https://travis-ci.org/tumugi/tumugi-plugin-bigquery) [](https://codeclimate.com/github/tumugi/tumugi-plugin-bigquery) [](https://coveralls.io/github/tumugi/tumugi-plugin-bigquery)
|
|
1
|
+
[](https://travis-ci.org/tumugi/tumugi-plugin-bigquery) [](https://codeclimate.com/github/tumugi/tumugi-plugin-bigquery) [](https://coveralls.io/github/tumugi/tumugi-plugin-bigquery) [](https://badge.fury.io/rb/tumugi-plugin-bigquery)
|
|
2
2
|
|
|
3
3
|
# tumugi-plugin-bigquery
|
|
4
4
|
|
|
@@ -68,6 +68,8 @@ end
|
|
|
68
68
|
|
|
69
69
|
#### Usage
|
|
70
70
|
|
|
71
|
+
Copy `test.src_table` to `test.dest_table`.
|
|
72
|
+
|
|
71
73
|
```rb
|
|
72
74
|
task :task1, type: :bigquery_copy do
|
|
73
75
|
param_set :src_dataset_id, 'test'
|
|
@@ -77,6 +79,24 @@ task :task1, type: :bigquery_copy do
|
|
|
77
79
|
end
|
|
78
80
|
```
|
|
79
81
|
|
|
82
|
+
### Tumugi::Plugin::BigqueryLoadTask
|
|
83
|
+
|
|
84
|
+
`Tumugi::Plugin::BigqueryLoadTask` is task to load structured data from GCS into BigQuery.
|
|
85
|
+
|
|
86
|
+
#### Usage
|
|
87
|
+
|
|
88
|
+
Load `gs://test_bucket/load_data.csv` into `dest_project:dest_dataset.dest_table`
|
|
89
|
+
|
|
90
|
+
```rb
|
|
91
|
+
task :task1, type: :bigquery_load do
|
|
92
|
+
param_set :bucket, 'test_bucket'
|
|
93
|
+
param_set :key, 'load_data.csv'
|
|
94
|
+
param_set :project_id, 'dest_project'
|
|
95
|
+
param_set :datset_id, 'dest_dataset'
|
|
96
|
+
param_set :table_id, 'dest_table'
|
|
97
|
+
end
|
|
98
|
+
```
|
|
99
|
+
|
|
80
100
|
### Config Section
|
|
81
101
|
|
|
82
102
|
tumugi-plugin-bigquery provide config section named "bigquery" which can specified BigQuery autenticaion info.
|
|
@@ -84,7 +104,7 @@ tumugi-plugin-bigquery provide config section named "bigquery" which can specifi
|
|
|
84
104
|
#### Authenticate by client_email and private_key
|
|
85
105
|
|
|
86
106
|
```rb
|
|
87
|
-
Tumugi.
|
|
107
|
+
Tumugi.configure do |config|
|
|
88
108
|
config.section("bigquery") do |section|
|
|
89
109
|
section.project_id = "xxx"
|
|
90
110
|
section.client_email = "yyy@yyy.iam.gserviceaccount.com"
|
|
@@ -96,7 +116,7 @@ end
|
|
|
96
116
|
#### Authenticate by JSON key file
|
|
97
117
|
|
|
98
118
|
```rb
|
|
99
|
-
Tumugi.
|
|
119
|
+
Tumugi.configure do |config|
|
|
100
120
|
config.section("bigquery") do |section|
|
|
101
121
|
section.private_key_file = "/path/to/key.json"
|
|
102
122
|
end
|
data/examples/load.rb
ADDED
|
@@ -0,0 +1,24 @@
|
|
|
1
|
+
task :task1, type: :bigquery_load do
|
|
2
|
+
requires :task2
|
|
3
|
+
param_set :bucket, 'tumugi-plugin-bigquery'
|
|
4
|
+
param_set :key, 'test.csv'
|
|
5
|
+
param_set :dataset_id, -> { input.dataset_id }
|
|
6
|
+
param_set :table_id, 'load_test'
|
|
7
|
+
param_set :skip_leading_rows, 1
|
|
8
|
+
param_set :schema, [
|
|
9
|
+
{
|
|
10
|
+
name: 'row_number',
|
|
11
|
+
type: 'INTEGER',
|
|
12
|
+
mode: 'NULLABLE'
|
|
13
|
+
},
|
|
14
|
+
{
|
|
15
|
+
name: 'value',
|
|
16
|
+
type: 'INTEGER',
|
|
17
|
+
mode: 'NULLABLE'
|
|
18
|
+
},
|
|
19
|
+
]
|
|
20
|
+
end
|
|
21
|
+
|
|
22
|
+
task :task2, type: :bigquery_dataset do
|
|
23
|
+
param_set :dataset_id, 'test'
|
|
24
|
+
end
|
data/examples/test.csv
ADDED
data/examples/tumugi_config_example.rb
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
|
-
Tumugi.
|
|
2
|
-
|
|
3
|
-
|
|
4
|
-
|
|
5
|
-
|
|
1
|
+
Tumugi.configure do |config|
|
|
2
|
+
config.section('bigquery') do |section|
|
|
3
|
+
section.project_id = ENV["PROJECT_ID"]
|
|
4
|
+
section.client_email = ENV["CLIENT_EMAIL"]
|
|
5
|
+
section.private_key = ENV["PRIVATE_KEY"].gsub(/\\n/, "\n")
|
|
6
6
|
end
|
|
7
7
|
end
|
data/lib/tumugi/plugin/bigquery/client.rb
CHANGED
|
@@ -1,4 +1,5 @@
|
|
|
1
1
|
require 'kura'
|
|
2
|
+
require 'json'
|
|
2
3
|
require_relative './error'
|
|
3
4
|
|
|
4
5
|
Tumugi::Config.register_section('bigquery', :project_id, :client_email, :private_key, :private_key_file)
|
|
@@ -9,12 +10,22 @@ module Tumugi
|
|
|
9
10
|
class Client
|
|
10
11
|
attr_reader :project_id
|
|
11
12
|
|
|
12
|
-
def initialize(project_id: nil, client_email: nil, private_key: nil)
|
|
13
|
-
|
|
14
|
-
|
|
15
|
-
|
|
16
|
-
|
|
17
|
-
|
|
13
|
+
def initialize(project_id: nil, client_email: nil, private_key: nil, private_key_file: nil)
|
|
14
|
+
@project_id = project_id
|
|
15
|
+
|
|
16
|
+
if client_email.nil? && private_key.nil? && !private_key_file.nil?
|
|
17
|
+
@client = Kura.client(private_key_file)
|
|
18
|
+
if @project_id.nil?
|
|
19
|
+
key = JSON.parse(File.read(private_key_file))
|
|
20
|
+
@project_id = key['project_id']
|
|
21
|
+
end
|
|
22
|
+
else
|
|
23
|
+
# This method call style is needed for jruby.
|
|
24
|
+
# JRuby cannot handle correctly if method using keyword hash and last hash argument.
|
|
25
|
+
# see https://bugs.ruby-lang.org/issues/7529
|
|
26
|
+
@client = Kura.client(project_id = { "project_id" => @project_id, "client_email" => client_email, "private_key" => private_key },
|
|
27
|
+
client_email = nil, private_key = nil, {http_options: {timeout: 60}})
|
|
28
|
+
end
|
|
18
29
|
rescue Kura::ApiError => e
|
|
19
30
|
process_error(e)
|
|
20
31
|
end
|
|
@@ -77,6 +88,12 @@ module Tumugi
|
|
|
77
88
|
process_error(e)
|
|
78
89
|
end
|
|
79
90
|
|
|
91
|
+
def table(dataset_id, table_id, project_id: nil)
|
|
92
|
+
@client.table(dataset_id, table_id, project_id: project_id || @project_id)
|
|
93
|
+
rescue Kura::ApiError => e
|
|
94
|
+
process_error(e)
|
|
95
|
+
end
|
|
96
|
+
|
|
80
97
|
def table_exist?(dataset_id, table_id, project_id: nil)
|
|
81
98
|
!@client.table(dataset_id, table_id, project_id: project_id || @project_id).nil?
|
|
82
99
|
rescue Kura::ApiError => e
|
|
@@ -163,6 +180,7 @@ module Tumugi
|
|
|
163
180
|
use_query_cache: true,
|
|
164
181
|
user_defined_function_resources: nil,
|
|
165
182
|
project_id: nil,
|
|
183
|
+
job_project_id: nil,
|
|
166
184
|
job_id: nil,
|
|
167
185
|
wait: nil,
|
|
168
186
|
dry_run: false,
|
|
@@ -175,7 +193,7 @@ module Tumugi
|
|
|
175
193
|
use_query_cache: use_query_cache,
|
|
176
194
|
user_defined_function_resources: user_defined_function_resources,
|
|
177
195
|
project_id: project_id || @project_id,
|
|
178
|
-
job_project_id:
|
|
196
|
+
job_project_id: job_project_id || @project_id,
|
|
179
197
|
job_id: job_id,
|
|
180
198
|
wait: wait,
|
|
181
199
|
dry_run: dry_run,
|
|
@@ -185,28 +203,38 @@ module Tumugi
|
|
|
185
203
|
end
|
|
186
204
|
|
|
187
205
|
def load(dataset_id, table_id, source_uris=nil,
|
|
188
|
-
schema: nil,
|
|
189
|
-
|
|
206
|
+
schema: nil,
|
|
207
|
+
field_delimiter: ",",
|
|
208
|
+
mode: :append,
|
|
209
|
+
allow_jagged_rows: false,
|
|
210
|
+
max_bad_records: 0,
|
|
190
211
|
ignore_unknown_values: false,
|
|
191
212
|
allow_quoted_newlines: false,
|
|
192
|
-
quote: '"',
|
|
213
|
+
quote: '"',
|
|
214
|
+
skip_leading_rows: 0,
|
|
193
215
|
source_format: "CSV",
|
|
194
216
|
project_id: nil,
|
|
217
|
+
job_project_id: nil,
|
|
195
218
|
job_id: nil,
|
|
196
219
|
file: nil, wait: nil,
|
|
197
220
|
dry_run: false,
|
|
198
221
|
&blk)
|
|
199
222
|
@client.load(dataset_id, table_id, source_uris=source_uris,
|
|
200
|
-
schema: schema,
|
|
201
|
-
|
|
223
|
+
schema: schema,
|
|
224
|
+
field_delimiter: field_delimiter,
|
|
225
|
+
mode: mode,
|
|
226
|
+
allow_jagged_rows: allow_jagged_rows,
|
|
227
|
+
max_bad_records: max_bad_records,
|
|
202
228
|
ignore_unknown_values: ignore_unknown_values,
|
|
203
229
|
allow_quoted_newlines: allow_quoted_newlines,
|
|
204
|
-
quote: quote,
|
|
230
|
+
quote: quote,
|
|
231
|
+
skip_leading_rows: skip_leading_rows,
|
|
205
232
|
source_format: source_format,
|
|
206
233
|
project_id: project_id || @project_id,
|
|
207
|
-
job_project_id:
|
|
234
|
+
job_project_id: job_project_id || @project_id,
|
|
208
235
|
job_id: job_id,
|
|
209
|
-
file: file,
|
|
236
|
+
file: file,
|
|
237
|
+
wait: wait,
|
|
210
238
|
dry_run: dry_run,
|
|
211
239
|
&blk)
|
|
212
240
|
rescue Kura::ApiError => e
|
|
@@ -219,6 +247,7 @@ module Tumugi
|
|
|
219
247
|
field_delimiter: ",",
|
|
220
248
|
print_header: true,
|
|
221
249
|
project_id: nil,
|
|
250
|
+
job_project_id: nil,
|
|
222
251
|
job_id: nil,
|
|
223
252
|
wait: nil,
|
|
224
253
|
dry_run: false,
|
|
@@ -229,7 +258,7 @@ module Tumugi
|
|
|
229
258
|
field_delimiter: field_delimiter,
|
|
230
259
|
print_header: print_header,
|
|
231
260
|
project_id: project_id || @project_id,
|
|
232
|
-
job_project_id:
|
|
261
|
+
job_project_id: job_project_id || @project_id,
|
|
233
262
|
job_id: job_id,
|
|
234
263
|
wait: wait,
|
|
235
264
|
dry_run: dry_run,
|
|
@@ -242,6 +271,7 @@ module Tumugi
|
|
|
242
271
|
mode: :truncate,
|
|
243
272
|
src_project_id: nil,
|
|
244
273
|
dest_project_id: nil,
|
|
274
|
+
job_project_id: dest_project_id,
|
|
245
275
|
job_id: nil,
|
|
246
276
|
wait: nil,
|
|
247
277
|
dry_run: false,
|
|
@@ -250,7 +280,7 @@ module Tumugi
|
|
|
250
280
|
mode: mode,
|
|
251
281
|
src_project_id: src_project_id || @project_id,
|
|
252
282
|
dest_project_id: dest_project_id || @project_id,
|
|
253
|
-
job_project_id:
|
|
283
|
+
job_project_id: job_project_id || @project_id,
|
|
254
284
|
job_id: job_id,
|
|
255
285
|
wait: wait,
|
|
256
286
|
dry_run: dry_run,
|
|
@@ -280,7 +310,7 @@ module Tumugi
|
|
|
280
310
|
private
|
|
281
311
|
|
|
282
312
|
def process_error(e)
|
|
283
|
-
raise Tumugi::Plugin::Bigquery::BigqueryError.new(e.
|
|
313
|
+
raise Tumugi::Plugin::Bigquery::BigqueryError.new(e.message, e.reason)
|
|
284
314
|
end
|
|
285
315
|
end
|
|
286
316
|
end
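The new initializer above falls back to the `project_id` stored in the JSON key file when none is passed explicitly. The sketch below isolates that fallback so it can be run without Kura or real credentials. `resolve_project_id` and `demo_key_file` are hypothetical helpers, and the key file contents are made-up sample values.

```ruby
require 'json'
require 'tempfile'

# Hypothetical helper, not part of the gem: mirrors the fallback in the
# initializer, where an explicit project_id wins over the key file's value.
def resolve_project_id(project_id, private_key_file)
  return project_id unless project_id.nil?

  key = JSON.parse(File.read(private_key_file))
  key['project_id']
end

# Writes a throwaway key file holding only the fields this sketch needs.
def demo_key_file
  file = Tempfile.new(['key', '.json'])
  file.write(JSON.generate('project_id' => 'example-project',
                           'client_email' => 'svc@example.invalid'))
  file.flush
  file
end

key_file = demo_key_file
puts resolve_project_id(nil, key_file.path)        # prints "example-project"
puts resolve_project_id('explicit', key_file.path) # prints "explicit"
```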
|
data/lib/tumugi/plugin/bigquery/version.rb
CHANGED
data/lib/tumugi/plugin/target/bigquery_dataset.rb
CHANGED
|
@@ -17,8 +17,8 @@ module Tumugi
|
|
|
17
17
|
cfg = Tumugi.config.section('bigquery')
|
|
18
18
|
@project_id = project_id || cfg.project_id
|
|
19
19
|
@dataset_id = dataset_id
|
|
20
|
-
@client = client || Tumugi::Plugin::Bigquery::Client.new(project_id: @project_id)
|
|
21
|
-
@dataset = Tumugi::Plugin::Bigquery::Dataset.new(project_id: @project_id, dataset_id: @dataset_id)
|
|
20
|
+
@client = client || Tumugi::Plugin::Bigquery::Client.new(cfg.to_h.merge(project_id: @project_id))
|
|
21
|
+
@dataset = Tumugi::Plugin::Bigquery::Dataset.new(project_id: @client.project_id, dataset_id: @dataset_id)
|
|
22
22
|
end
|
|
23
23
|
|
|
24
24
|
def exist?
|
data/lib/tumugi/plugin/target/bigquery_table.rb
CHANGED
|
@@ -18,8 +18,8 @@ module Tumugi
|
|
|
18
18
|
@project_id = project_id || cfg.project_id
|
|
19
19
|
@dataset_id = dataset_id
|
|
20
20
|
@table_id = table_id
|
|
21
|
-
@client = client || Tumugi::Plugin::Bigquery::Client.new(project_id: @project_id)
|
|
22
|
-
@table = Tumugi::Plugin::Bigquery::Table.new(project_id: @project_id, dataset_id: @dataset_id, table_id: @table_id)
|
|
21
|
+
@client = client || Tumugi::Plugin::Bigquery::Client.new(cfg.to_h.merge(project_id: @project_id))
|
|
22
|
+
@table = Tumugi::Plugin::Bigquery::Table.new(project_id: @client.project_id, dataset_id: @dataset_id, table_id: @table_id)
|
|
23
23
|
end
|
|
24
24
|
|
|
25
25
|
def exist?
|
data/lib/tumugi/plugin/task/bigquery_copy.rb
CHANGED
|
@@ -15,9 +15,11 @@ module Tumugi
|
|
|
15
15
|
param :wait, type: :int, default: 60
|
|
16
16
|
|
|
17
17
|
def output
|
|
18
|
+
return @output if @output
|
|
19
|
+
|
|
18
20
|
opts = { dataset_id: dest_dataset_id, table_id: dest_table_id }
|
|
19
21
|
opts[:project_id] = dest_project_id if dest_project_id
|
|
20
|
-
Tumugi::Plugin::BigqueryTableTarget.new(opts)
|
|
22
|
+
@output = Tumugi::Plugin::BigqueryTableTarget.new(opts)
|
|
21
23
|
end
|
|
22
24
|
|
|
23
25
|
def run
|
data/lib/tumugi/plugin/task/bigquery_dataset.rb
CHANGED
|
@@ -10,7 +10,7 @@ module Tumugi
|
|
|
10
10
|
param :dataset_id, type: :string, required: true
|
|
11
11
|
|
|
12
12
|
def output
|
|
13
|
-
Tumugi::Plugin::BigqueryDatasetTarget.new(project_id: project_id, dataset_id: dataset_id)
|
|
13
|
+
@output ||= Tumugi::Plugin::BigqueryDatasetTarget.new(project_id: project_id, dataset_id: dataset_id)
|
|
14
14
|
end
|
|
15
15
|
|
|
16
16
|
def run
|
data/lib/tumugi/plugin/task/bigquery_export.rb
ADDED
|
@@ -0,0 +1,112 @@
|
|
|
1
|
+
require 'json'
|
|
2
|
+
require 'tumugi'
|
|
3
|
+
require 'tumugi/plugin/file_system_target'
|
|
4
|
+
require_relative '../target/bigquery_table'
|
|
5
|
+
|
|
6
|
+
module Tumugi
|
|
7
|
+
module Plugin
|
|
8
|
+
class BigqueryExportTask < Tumugi::Task
|
|
9
|
+
Tumugi::Plugin.register_task('bigquery_export', self)
|
|
10
|
+
|
|
11
|
+
param :project_id, type: :string
|
|
12
|
+
param :job_project_id, type: :string
|
|
13
|
+
param :dataset_id, type: :string, required: true
|
|
14
|
+
param :table_id, type: :string, required: true
|
|
15
|
+
|
|
16
|
+
param :compression, type: :string, default: 'NONE' # GZIP
|
|
17
|
+
param :destination_format, type: :string, default: 'CSV' # NEWLINE_DELIMITED_JSON, AVRO
|
|
18
|
+
|
|
19
|
+
# Only effected if destiation_format == 'CSV'
|
|
20
|
+
param :field_delimiter, type: :string, default: ','
|
|
21
|
+
param :print_header, type: :bool, default: true
|
|
22
|
+
|
|
23
|
+
param :page_size, type: :integer, default: 10000
|
|
24
|
+
|
|
25
|
+
param :wait, type: :integer, default: 120
|
|
26
|
+
|
|
27
|
+
def run
|
|
28
|
+
unless output.is_a?(Tumugi::Plugin::FileSystemTarget)
|
|
29
|
+
raise Tumugi::TumugiError.new("BigqueryExportTask#output must be return a instance of Tumugi::Plugin::FileSystemTarget")
|
|
30
|
+
end
|
|
31
|
+
|
|
32
|
+
client = Tumugi::Plugin::Bigquery::Client.new(config)
|
|
33
|
+
table = Tumugi::Plugin::Bigquery::Table.new(project_id: client.project_id, dataset_id: dataset_id, table_id: table_id)
|
|
34
|
+
job_project_id = client.project_id if job_project_id.nil?
|
|
35
|
+
|
|
36
|
+
log "Source: #{table}"
|
|
37
|
+
log "Destination: #{output}"
|
|
38
|
+
|
|
39
|
+
if is_gcs?(output)
|
|
40
|
+
export_to_gcs(client)
|
|
41
|
+
else
|
|
42
|
+
if destination_format.upcase == 'AVRO'
|
|
43
|
+
raise Tumugi::TumugiError.new("destination_format='AVRO' is only supported when export to Google Cloud Storage")
|
|
44
|
+
end
|
|
45
|
+
if compression.upcase == 'GZIP'
|
|
46
|
+
logger.warn("compression parameter is ignored, it's only supported when export to Google Cloud Storage")
|
|
47
|
+
end
|
|
48
|
+
export_to_file_system(client)
|
|
49
|
+
end
|
|
50
|
+
end
|
|
51
|
+
|
|
52
|
+
private
|
|
53
|
+
|
|
54
|
+
def is_gcs?(target)
|
|
55
|
+
not target.to_s.match(/^gs:\/\/[^\/]+\/.+$/).nil?
|
|
56
|
+
end
|
|
57
|
+
|
|
58
|
+
def export_to_gcs(client)
|
|
59
|
+
options = {
|
|
60
|
+
compression: compression.upcase,
|
|
61
|
+
destination_format: destination_format.upcase,
|
|
62
|
+
field_delimiter: field_delimiter,
|
|
63
|
+
print_header: print_header,
|
|
64
|
+
project_id: client.project_id,
|
|
65
|
+
job_project_id: job_project_id || client.project_id,
|
|
66
|
+
wait: wait
|
|
67
|
+
}
|
|
68
|
+
client.extract(dataset_id, table_id, output.to_s, options)
|
|
69
|
+
end
|
|
70
|
+
|
|
71
|
+
def export_to_file_system(client)
|
|
72
|
+
schema ||= client.table(dataset_id, table_id, project_id: client.project_id).schema.fields
|
|
73
|
+
field_names = schema.map{|f| f.respond_to?(:[]) ? (f["name"] || f[:name]) : f.name }
|
|
74
|
+
start_index = 0
|
|
75
|
+
page_token = nil
|
|
76
|
+
options = {
|
|
77
|
+
max_result: page_size,
|
|
78
|
+
project_id: client.project_id,
|
|
79
|
+
}
|
|
80
|
+
|
|
81
|
+
output.open('w') do |file|
|
|
82
|
+
file.puts field_names.join(field_delimiter) if destination_format == 'CSV' && print_header
|
|
83
|
+
begin
|
|
84
|
+
table_data_list = client.list_tabledata(dataset_id, table_id, options.merge(start_index: start_index, page_token: page_token))
|
|
85
|
+
start_index += page_size
|
|
86
|
+
page_token = table_data_list[:next_token]
|
|
87
|
+
table_data_list[:rows].each do |row|
|
|
88
|
+
file.puts line(field_names, row, destination_format)
|
|
89
|
+
end
|
|
90
|
+
end while not page_token.nil?
|
|
91
|
+
end
|
|
92
|
+
end
|
|
93
|
+
|
|
94
|
+
def line(field_names, row, format)
|
|
95
|
+
case format
|
|
96
|
+
when 'CSV'
|
|
97
|
+
row.map{|v| v[1]}.join(field_delimiter)
|
|
98
|
+
when 'NEWLINE_DELIMITED_JSON'
|
|
99
|
+
JSON.generate(row.to_h)
|
|
100
|
+
end
|
|
101
|
+
end
|
|
102
|
+
|
|
103
|
+
def config
|
|
104
|
+
cfg = Tumugi.config.section('bigquery').to_h
|
|
105
|
+
unless project_id.nil?
|
|
106
|
+
cfg[:project_id] = project_id
|
|
107
|
+
end
|
|
108
|
+
cfg
|
|
109
|
+
end
|
|
110
|
+
end
|
|
111
|
+
end
|
|
112
|
+
end
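The file-system export path above pages through table rows until no next token is returned, formatting each row as CSV or newline-delimited JSON. A self-contained sketch of that loop follows. `FakeClient` is a hypothetical stand-in that only mimics the `:rows` / `:next_token` shape used in the diff, and representing each row as a Hash is an assumption made for illustration.

```ruby
require 'json'

# Hypothetical stand-in for the BigQuery client: serves two fixed pages.
class FakeClient
  PAGES = {
    nil  => { next_token: 'p2',
              rows: [{ 'row_number' => 1, 'value' => 10 },
                     { 'row_number' => 2, 'value' => 20 }] },
    'p2' => { next_token: nil,
              rows: [{ 'row_number' => 3, 'value' => 30 }] }
  }.freeze

  def list_tabledata(_dataset_id, _table_id, options = {})
    PAGES.fetch(options[:page_token])
  end
end

# Formats one row, following the two formats handled by `line` in the diff.
def format_line(row, format, delimiter)
  case format
  when 'CSV' then row.map { |_name, value| value }.join(delimiter)
  when 'NEWLINE_DELIMITED_JSON' then JSON.generate(row.to_h)
  end
end

# Collects formatted lines page by page until the token runs out.
def export_lines(client, format: 'CSV', delimiter: ',')
  lines = []
  page_token = nil
  loop do
    page = client.list_tabledata('dataset', 'table', { page_token: page_token })
    page[:rows].each { |row| lines << format_line(row, format, delimiter) }
    page_token = page[:next_token]
    break if page_token.nil?
  end
  lines
end

puts export_lines(FakeClient.new)
# prints:
# 1,10
# 2,20
# 3,30
```

The sketch collects lines in an array for easy inspection; the task itself writes each line straight to the output target.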
|
data/lib/tumugi/plugin/task/bigquery_load.rb
ADDED
|
@@ -0,0 +1,73 @@
|
|
|
1
|
+
require 'tumugi'
|
|
2
|
+
require_relative '../target/bigquery_table'
|
|
3
|
+
|
|
4
|
+
module Tumugi
|
|
5
|
+
module Plugin
|
|
6
|
+
class BigqueryLoadTask < Tumugi::Task
|
|
7
|
+
Tumugi::Plugin.register_task('bigquery_load', self)
|
|
8
|
+
|
|
9
|
+
param :bucket, type: :string, required: true
|
|
10
|
+
param :key, type: :string, required: true
|
|
11
|
+
param :project_id, type: :string
|
|
12
|
+
param :dataset_id, type: :string, required: true
|
|
13
|
+
param :table_id, type: :string, required: true
|
|
14
|
+
|
|
15
|
+
param :schema # type: :array
|
|
16
|
+
param :field_delimiter, type: :string, default: ','
|
|
17
|
+
param :mode, type: :string, default: 'append' # truncate, empty
|
|
18
|
+
param :allow_jagged_rows, type: :bool, default: false
|
|
19
|
+
param :max_bad_records, type: :integer, default: 0
|
|
20
|
+
param :ignore_unknown_values, type: :bool, default: false
|
|
21
|
+
param :allow_quoted_newlines, type: :bool, default: false
|
|
22
|
+
param :quote, type: :string, default: '"'
|
|
23
|
+
param :skip_leading_rows, type: :interger, default: 0
|
|
24
|
+
param :source_format, type: :string, default: 'CSV' # NEWLINE_DELIMITED_JSON, AVRO
|
|
25
|
+
param :wait, type: :integer, default: 60
|
|
26
|
+
|
|
27
|
+
def output
|
|
28
|
+
return @output if @output
|
|
29
|
+
|
|
30
|
+
opts = { dataset_id: dataset_id, table_id: table_id }
|
|
31
|
+
opts[:project_id] = project_id if project_id
|
|
32
|
+
@output = Tumugi::Plugin::BigqueryTableTarget.new(opts)
|
|
33
|
+
end
|
|
34
|
+
|
|
35
|
+
def run
|
|
36
|
+
if mode != 'append'
|
|
37
|
+
raise Tumugi::ParameterError.new("Parameter 'schema' is required when 'mode' is 'truncate' or 'empty'") if schema.nil?
|
|
38
|
+
end
|
|
39
|
+
|
|
40
|
+
src_uri = "gs://#{bucket}#{normalize_path(key)}"
|
|
41
|
+
log "Source: #{src_uri}"
|
|
42
|
+
log "Destination: #{output}"
|
|
43
|
+
|
|
44
|
+
bq_client = output.client
|
|
45
|
+
opts = {
|
|
46
|
+
schema: schema,
|
|
47
|
+
field_delimiter: field_delimiter,
|
|
48
|
+
mode: mode.to_sym,
|
|
49
|
+
allow_jagged_rows: allow_jagged_rows,
|
|
50
|
+
max_bad_records: max_bad_records,
|
|
51
|
+
ignore_unknown_values: ignore_unknown_values,
|
|
52
|
+
allow_quoted_newlines: allow_quoted_newlines,
|
|
53
|
+
quote: quote,
|
|
54
|
+
skip_leading_rows: skip_leading_rows,
|
|
55
|
+
source_format: source_format,
|
|
56
|
+
project_id: output.project_id,
|
|
57
|
+
wait: wait
|
|
58
|
+
}
|
|
59
|
+
bq_client.load(output.dataset_id, output.table_id, src_uri, opts)
|
|
60
|
+
end
|
|
61
|
+
|
|
62
|
+
private
|
|
63
|
+
|
|
64
|
+
def normalize_path(path)
|
|
65
|
+
unless path.start_with?('/')
|
|
66
|
+
"/#{path}"
|
|
67
|
+
else
|
|
68
|
+
path
|
|
69
|
+
end
|
|
70
|
+
end
|
|
71
|
+
end
|
|
72
|
+
end
|
|
73
|
+
end
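The load task above builds its source URI from `bucket` and `key`, adding a leading slash to the key when it is missing, and requires `schema` for any mode other than `append`. The sketch below reproduces both rules as standalone functions. `gcs_uri` and `validate_mode!` are hypothetical names, and `ArgumentError` stands in for `Tumugi::ParameterError` so the example runs without the gem.

```ruby
# Same rule as the task's private helper: ensure exactly one leading slash.
def normalize_path(path)
  path.start_with?('/') ? path : "/#{path}"
end

# Hypothetical helper: builds the gs:// URI the way `run` does.
def gcs_uri(bucket, key)
  "gs://#{bucket}#{normalize_path(key)}"
end

# Hypothetical helper: mirrors the schema check at the top of `run`.
def validate_mode!(mode, schema)
  return if mode == 'append' || !schema.nil?

  raise ArgumentError,
        "Parameter 'schema' is required when 'mode' is 'truncate' or 'empty'"
end

puts gcs_uri('test_bucket', 'load_data.csv')  # prints "gs://test_bucket/load_data.csv"
puts gcs_uri('test_bucket', '/load_data.csv') # prints "gs://test_bucket/load_data.csv"
validate_mode!('append', nil)                 # passes: append needs no schema
```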
|
data/lib/tumugi/plugin/task/bigquery_query.rb
CHANGED
|
@@ -13,7 +13,7 @@ module Tumugi
|
|
|
13
13
|
param :wait, type: :int, default: 60
|
|
14
14
|
|
|
15
15
|
def output
|
|
16
|
-
Tumugi::Plugin::BigqueryTableTarget.new(project_id: project_id, dataset_id: dataset_id, table_id: table_id)
|
|
16
|
+
@output ||= Tumugi::Plugin::BigqueryTableTarget.new(project_id: project_id, dataset_id: dataset_id, table_id: table_id)
|
|
17
17
|
end
|
|
18
18
|
|
|
19
19
|
def run
|
data/tumugi-plugin-bigquery.gemspec
CHANGED
|
@@ -20,8 +20,8 @@ Gem::Specification.new do |spec|
|
|
|
20
20
|
spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
|
|
21
21
|
spec.require_paths = ["lib"]
|
|
22
22
|
|
|
23
|
-
spec.add_runtime_dependency "tumugi", "
|
|
24
|
-
spec.add_runtime_dependency "kura", "0.2.
|
|
23
|
+
spec.add_runtime_dependency "tumugi", ">= 0.5.1"
|
|
24
|
+
spec.add_runtime_dependency "kura", "~> 0.2.17"
|
|
25
25
|
|
|
26
26
|
spec.add_development_dependency 'bundler', '~> 1.11'
|
|
27
27
|
spec.add_development_dependency 'rake', '~> 10.0'
|
|
@@ -29,4 +29,5 @@ Gem::Specification.new do |spec|
|
|
|
29
29
|
spec.add_development_dependency 'test-unit-rr'
|
|
30
30
|
spec.add_development_dependency 'coveralls'
|
|
31
31
|
spec.add_development_dependency 'github_changelog_generator'
|
|
32
|
+
spec.add_development_dependency 'tumugi-plugin-google_cloud_storage'
|
|
32
33
|
end
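The gemspec now uses two different constraint styles: an open lower bound for tumugi (`>= 0.5.1`) and a pessimistic constraint for kura (`~> 0.2.17`). The sketch below uses RubyGems' own `Gem::Requirement` to show which versions each one accepts. `allowed?` is a hypothetical helper, and the version numbers checked are arbitrary samples.

```ruby
require 'rubygems'

TUMUGI = Gem::Requirement.new('>= 0.5.1')
KURA   = Gem::Requirement.new('~> 0.2.17')

# Hypothetical helper: true when the version string satisfies the requirement.
def allowed?(requirement, version)
  requirement.satisfied_by?(Gem::Version.new(version))
end

puts allowed?(TUMUGI, '0.5.0')  # prints "false": below the lower bound
puts allowed?(TUMUGI, '1.0.0')  # prints "true": no upper bound
puts allowed?(KURA, '0.2.20')   # prints "true": same 0.2.x series
puts allowed?(KURA, '0.3.0')    # prints "false": "~> 0.2.17" stops before 0.3
```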
|
metadata
CHANGED
|
@@ -1,43 +1,43 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: tumugi-plugin-bigquery
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version: 0.
|
|
4
|
+
version: 0.2.0
|
|
5
5
|
platform: ruby
|
|
6
6
|
authors:
|
|
7
7
|
- Kazuyuki Honda
|
|
8
8
|
autorequire:
|
|
9
9
|
bindir: exe
|
|
10
10
|
cert_chain: []
|
|
11
|
-
date: 2016-
|
|
11
|
+
date: 2016-06-06 00:00:00.000000000 Z
|
|
12
12
|
dependencies:
|
|
13
13
|
- !ruby/object:Gem::Dependency
|
|
14
14
|
name: tumugi
|
|
15
15
|
requirement: !ruby/object:Gem::Requirement
|
|
16
16
|
requirements:
|
|
17
|
-
- - "
|
|
17
|
+
- - ">="
|
|
18
18
|
- !ruby/object:Gem::Version
|
|
19
|
-
version: 0.
|
|
19
|
+
version: 0.5.1
|
|
20
20
|
type: :runtime
|
|
21
21
|
prerelease: false
|
|
22
22
|
version_requirements: !ruby/object:Gem::Requirement
|
|
23
23
|
requirements:
|
|
24
|
-
- - "
|
|
24
|
+
- - ">="
|
|
25
25
|
- !ruby/object:Gem::Version
|
|
26
|
-
version: 0.
|
|
26
|
+
version: 0.5.1
|
|
27
27
|
- !ruby/object:Gem::Dependency
|
|
28
28
|
name: kura
|
|
29
29
|
requirement: !ruby/object:Gem::Requirement
|
|
30
30
|
requirements:
|
|
31
|
-
- -
|
|
31
|
+
- - "~>"
|
|
32
32
|
- !ruby/object:Gem::Version
|
|
33
|
-
version: 0.2.
|
|
33
|
+
version: 0.2.17
|
|
34
34
|
type: :runtime
|
|
35
35
|
prerelease: false
|
|
36
36
|
version_requirements: !ruby/object:Gem::Requirement
|
|
37
37
|
requirements:
|
|
38
|
-
- -
|
|
38
|
+
- - "~>"
|
|
39
39
|
- !ruby/object:Gem::Version
|
|
40
|
-
version: 0.2.
|
|
40
|
+
version: 0.2.17
|
|
41
41
|
- !ruby/object:Gem::Dependency
|
|
42
42
|
name: bundler
|
|
43
43
|
requirement: !ruby/object:Gem::Requirement
|
|
@@ -122,6 +122,20 @@ dependencies:
|
|
|
122
122
|
- - ">="
|
|
123
123
|
- !ruby/object:Gem::Version
|
|
124
124
|
version: '0'
|
|
125
|
+
- !ruby/object:Gem::Dependency
|
|
126
|
+
name: tumugi-plugin-google_cloud_storage
|
|
127
|
+
requirement: !ruby/object:Gem::Requirement
|
|
128
|
+
requirements:
|
|
129
|
+
- - ">="
|
|
130
|
+
- !ruby/object:Gem::Version
|
|
131
|
+
version: '0'
|
|
132
|
+
type: :development
|
|
133
|
+
prerelease: false
|
|
134
|
+
version_requirements: !ruby/object:Gem::Requirement
|
|
135
|
+
requirements:
|
|
136
|
+
- - ">="
|
|
137
|
+
- !ruby/object:Gem::Version
|
|
138
|
+
version: '0'
|
|
125
139
|
description:
|
|
126
140
|
email:
|
|
127
141
|
- hakobera@gmail.com
|
|
@@ -131,13 +145,16 @@ extra_rdoc_files: []
|
|
|
131
145
|
files:
|
|
132
146
|
- ".gitignore"
|
|
133
147
|
- ".travis.yml"
|
|
148
|
+
- CHANGELOG.md
|
|
134
149
|
- Gemfile
|
|
135
150
|
- README.md
|
|
136
151
|
- Rakefile
|
|
137
152
|
- bin/setup
|
|
138
153
|
- examples/copy.rb
|
|
139
154
|
- examples/dataset.rb
|
|
155
|
+
- examples/load.rb
|
|
140
156
|
- examples/query.rb
|
|
157
|
+
- examples/test.csv
|
|
141
158
|
- examples/tumugi_config_example.rb
|
|
142
159
|
- lib/tumugi/plugin/bigquery/client.rb
|
|
143
160
|
- lib/tumugi/plugin/bigquery/dataset.rb
|
|
@@ -148,6 +165,8 @@ files:
|
|
|
148
165
|
- lib/tumugi/plugin/target/bigquery_table.rb
|
|
149
166
|
- lib/tumugi/plugin/task/bigquery_copy.rb
|
|
150
167
|
- lib/tumugi/plugin/task/bigquery_dataset.rb
|
|
168
|
+
- lib/tumugi/plugin/task/bigquery_export.rb
|
|
169
|
+
- lib/tumugi/plugin/task/bigquery_load.rb
|
|
151
170
|
- lib/tumugi/plugin/task/bigquery_query.rb
|
|
152
171
|
- tumugi-plugin-bigquery.gemspec
|
|
153
172
|
homepage: https://github.com/tumugi/tumugi-plugin-bigquery
|