tumugi-plugin-bigquery 0.1.0 → 0.2.0
- checksums.yaml +4 -4
- data/.travis.yml +6 -4
- data/CHANGELOG.md +48 -0
- data/README.md +23 -3
- data/examples/load.rb +24 -0
- data/examples/test.csv +6 -0
- data/examples/tumugi_config_example.rb +5 -5
- data/lib/tumugi/plugin/bigquery/client.rb +48 -18
- data/lib/tumugi/plugin/bigquery/version.rb +1 -1
- data/lib/tumugi/plugin/target/bigquery_dataset.rb +2 -2
- data/lib/tumugi/plugin/target/bigquery_table.rb +2 -2
- data/lib/tumugi/plugin/task/bigquery_copy.rb +3 -1
- data/lib/tumugi/plugin/task/bigquery_dataset.rb +1 -1
- data/lib/tumugi/plugin/task/bigquery_export.rb +112 -0
- data/lib/tumugi/plugin/task/bigquery_load.rb +73 -0
- data/lib/tumugi/plugin/task/bigquery_query.rb +1 -1
- data/tumugi-plugin-bigquery.gemspec +3 -2
- metadata +29 -10
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 1f82d5d752da3918795afc6cc669a0fb4711cf95
+  data.tar.gz: fed486ae8aeb9266d4fd11cf523a19a8507755af
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 8418f29dfe96d38bcdfa0c5098d59efd819edaa715a7e2af0945c57f70a4d08fa5c477560df636389eefe0bcf40712c4d240b2a97d3301cadebf6e5615808f2b
+  data.tar.gz: aa34ee20fdec506277f3ac40b8819b62f7d000db7c802ed4c9fcedd5af1bf33644ecf673b10326c8e64f8d554a9a708cdfc0f9ae7942b5570257adec3108ab28
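checksums.yaml records SHA1 and SHA512 digests of the gem's two inner archives, metadata.gz and data.tar.gz. (The 0.1.0 values are truncated in this diff.) The following is a minimal sketch of checking an extracted archive against one of those digests with Ruby's standard Digest library. The helper name `checksum_matches?` and the throwaway file contents are illustrative assumptions, not part of the gem.

```ruby
require 'digest'
require 'digest/sha1'
require 'digest/sha2'
require 'tempfile'

# Compare a file's hex digest with an expected value such as one listed in
# checksums.yaml. `algorithm` is :SHA1 or :SHA512 (hypothetical helper).
def checksum_matches?(path, expected_hex, algorithm: :SHA512)
  digest_class = Digest.const_get(algorithm)
  digest_class.file(path).hexdigest == expected_hex.downcase
end

# A temporary file stands in for an extracted data.tar.gz (made-up content).
Tempfile.create('data.tar.gz') do |f|
  f.write('abc')
  f.flush
  expected = Digest::SHA512.hexdigest('abc')
  puts checksum_matches?(f.path, expected) # prints true
end
```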
data/.travis.yml
CHANGED
data/CHANGELOG.md
ADDED
@@ -0,0 +1,48 @@
+# Change Log
+
+## [0.2.0](https://github.com/tumugi/tumugi-plugin-bigquery/tree/0.2.0) (2016-06-06)
+[Full Changelog](https://github.com/tumugi/tumugi-plugin-bigquery/compare/v0.1.0...0.2.0)
+
+**Implemented enhancements:**
+
+- Support extract table to FileSystemTarget [\#23](https://github.com/tumugi/tumugi-plugin-bigquery/issues/23)
+- Support load from GCS [\#5](https://github.com/tumugi/tumugi-plugin-bigquery/issues/5)
+- Support extract table to Google Cloud Storage [\#4](https://github.com/tumugi/tumugi-plugin-bigquery/issues/4)
+- Support service account application default auth [\#22](https://github.com/tumugi/tumugi-plugin-bigquery/pull/22) ([hakobera](https://github.com/hakobera))
+
+**Fixed bugs:**
+
+- Fix typo and dependency [\#24](https://github.com/tumugi/tumugi-plugin-bigquery/pull/24) ([hakobera](https://github.com/hakobera))
+- Fix missing project\_id of dataset/table [\#21](https://github.com/tumugi/tumugi-plugin-bigquery/pull/21) ([hakobera](https://github.com/hakobera))
+- Fix private key file auth does not work [\#19](https://github.com/tumugi/tumugi-plugin-bigquery/pull/19) ([hakobera](https://github.com/hakobera))
+- Fix support private key file in config section [\#13](https://github.com/tumugi/tumugi-plugin-bigquery/pull/13) ([hakobera](https://github.com/hakobera))
+
+**Closed issues:**
+
+- Update tumugi to v0.5.0 [\#8](https://github.com/tumugi/tumugi-plugin-bigquery/issues/8)
+
+**Merged pull requests:**
+
+- Cache output [\#26](https://github.com/tumugi/tumugi-plugin-bigquery/pull/26) ([hakobera](https://github.com/hakobera))
+- Prepare release for 0.2.0 [\#25](https://github.com/tumugi/tumugi-plugin-bigquery/pull/25) ([hakobera](https://github.com/hakobera))
+- Use Thor's invoke instead of system method [\#18](https://github.com/tumugi/tumugi-plugin-bigquery/pull/18) ([hakobera](https://github.com/hakobera))
+- Change test ruby version [\#17](https://github.com/tumugi/tumugi-plugin-bigquery/pull/17) ([hakobera](https://github.com/hakobera))
+- Change tumugi dependency version [\#16](https://github.com/tumugi/tumugi-plugin-bigquery/pull/16) ([hakobera](https://github.com/hakobera))
+- Implement extract table to google cloud storage feature [\#15](https://github.com/tumugi/tumugi-plugin-bigquery/pull/15) ([hakobera](https://github.com/hakobera))
+- Add BigqueryLoadTask [\#12](https://github.com/tumugi/tumugi-plugin-bigquery/pull/12) ([hakobera](https://github.com/hakobera))
+- Update dependency gems [\#11](https://github.com/tumugi/tumugi-plugin-bigquery/pull/11) ([hakobera](https://github.com/hakobera))
+- Update tumugi to v0.5.0 [\#9](https://github.com/tumugi/tumugi-plugin-bigquery/pull/9) ([hakobera](https://github.com/hakobera))
+- Add rubygems badge [\#3](https://github.com/tumugi/tumugi-plugin-bigquery/pull/3) ([hakobera](https://github.com/hakobera))
+
+## [v0.1.0](https://github.com/tumugi/tumugi-plugin-bigquery/tree/v0.1.0) (2016-05-16)
+**Fixed bugs:**
+
+- Fix unused arguments [\#2](https://github.com/tumugi/tumugi-plugin-bigquery/pull/2) ([hakobera](https://github.com/hakobera))
+
+**Merged pull requests:**
+
+- First implementation [\#1](https://github.com/tumugi/tumugi-plugin-bigquery/pull/1) ([hakobera](https://github.com/hakobera))
+
+
+
+\* *This Change Log was automatically generated by [github_changelog_generator](https://github.com/skywinder/Github-Changelog-Generator)*
data/README.md
CHANGED
@@ -1,4 +1,4 @@
-[![Build Status](https://travis-ci.org/tumugi/tumugi-plugin-bigquery.svg?branch=master)](https://travis-ci.org/tumugi/tumugi-plugin-bigquery) [![Code Climate](https://codeclimate.com/github/tumugi/tumugi-plugin-bigquery/badges/gpa.svg)](https://codeclimate.com/github/tumugi/tumugi-plugin-bigquery) [![Coverage Status](https://coveralls.io/repos/github/tumugi/tumugi-plugin-bigquery/badge.svg?branch=master)](https://coveralls.io/github/tumugi/tumugi-plugin-bigquery)
+[![Build Status](https://travis-ci.org/tumugi/tumugi-plugin-bigquery.svg?branch=master)](https://travis-ci.org/tumugi/tumugi-plugin-bigquery) [![Code Climate](https://codeclimate.com/github/tumugi/tumugi-plugin-bigquery/badges/gpa.svg)](https://codeclimate.com/github/tumugi/tumugi-plugin-bigquery) [![Coverage Status](https://coveralls.io/repos/github/tumugi/tumugi-plugin-bigquery/badge.svg?branch=master)](https://coveralls.io/github/tumugi/tumugi-plugin-bigquery) [![Gem Version](https://badge.fury.io/rb/tumugi-plugin-bigquery.svg)](https://badge.fury.io/rb/tumugi-plugin-bigquery)
 
 # tumugi-plugin-bigquery
 
@@ -68,6 +68,8 @@ end
 
 #### Usage
 
+Copy `test.src_table` to `test.dest_table`.
+
 ```rb
 task :task1, type: :bigquery_copy do
   param_set :src_dataset_id, 'test'
@@ -77,6 +79,24 @@ task :task1, type: :bigquery_copy do
 end
 ```
 
+### Tumugi::Plugin::BigqueryLoadTask
+
+`Tumugi::Plugin::BigqueryLoadTask` is task to load structured data from GCS into BigQuery.
+
+#### Usage
+
+Load `gs://test_bucket/load_data.csv` into `dest_project:dest_dataset.dest_table`
+
+```rb
+task :task1, type: :bigquery_load do
+  param_set :bucket, 'test_bucket'
+  param_set :key, 'load_data.csv'
+  param_set :project_id, 'dest_project'
+  param_set :datset_id, 'dest_dataset'
+  param_set :table_id, 'dest_table'
+end
+```
+
 ### Config Section
 
 tumugi-plugin-bigquery provide config section named "bigquery" which can specified BigQuery autenticaion info.
@@ -84,7 +104,7 @@ tumugi-plugin-bigquery provide config section named "bigquery" which can specifi
 #### Authenticate by client_email and private_key
 
 ```rb
-Tumugi.
+Tumugi.configure do |config|
   config.section("bigquery") do |section|
     section.project_id = "xxx"
     section.client_email = "yyy@yyy.iam.gserviceaccount.com"
@@ -96,7 +116,7 @@ end
 #### Authenticate by JSON key file
 
 ```rb
-Tumugi.
+Tumugi.configure do |config|
   config.section("bigquery") do |section|
     section.private_key_file = "/path/to/key.json"
   end
data/examples/load.rb
ADDED
@@ -0,0 +1,24 @@
+task :task1, type: :bigquery_load do
+  requires :task2
+  param_set :bucket, 'tumugi-plugin-bigquery'
+  param_set :key, 'test.csv'
+  param_set :dataset_id, -> { input.dataset_id }
+  param_set :table_id, 'load_test'
+  param_set :skip_leading_rows, 1
+  param_set :schema, [
+    {
+      name: 'row_number',
+      type: 'INTEGER',
+      mode: 'NULLABLE'
+    },
+    {
+      name: 'value',
+      type: 'INTEGER',
+      mode: 'NULLABLE'
+    },
+  ]
+end
+
+task :task2, type: :bigquery_dataset do
+  param_set :dataset_id, 'test'
+end
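examples/load.rb skips one leading row (the CSV header) and declares two NULLABLE INTEGER columns. The following is a rough local approximation of what that means for each row, assuming simple unquoted CSV. It is not how BigQuery parses files, and `preview_load` is a hypothetical helper. The sample rows are made up, because the diff does not show the contents of examples/test.csv.

```ruby
# Schema copied from examples/load.rb: two NULLABLE INTEGER columns.
SCHEMA = [
  { name: 'row_number', type: 'INTEGER', mode: 'NULLABLE' },
  { name: 'value',      type: 'INTEGER', mode: 'NULLABLE' }
].freeze

# Local approximation of a CSV load (not what BigQuery runs): drop
# `skip_leading_rows` lines, split each remaining line on `field_delimiter`,
# and cast cells by schema type. Empty cells become nil, which NULLABLE allows.
# Quoted fields are not handled.
def preview_load(text, schema:, skip_leading_rows: 0, field_delimiter: ',')
  text.each_line.map(&:chomp).reject(&:empty?).drop(skip_leading_rows).map do |line|
    cells = line.split(field_delimiter, -1)
    schema.each_with_index.map do |field, i|
      raw = cells[i]
      value =
        if raw.nil? || raw.empty?
          raise ArgumentError, "#{field[:name]} is REQUIRED" if field[:mode] == 'REQUIRED'
          nil
        elsif field[:type] == 'INTEGER'
          Integer(raw, 10) # raises ArgumentError for non-numeric text
        else
          raw
        end
      [field[:name], value]
    end.to_h
  end
end

# Hypothetical rows; not the real examples/test.csv.
sample = "row_number,value\n1,10\n2,\n"
p preview_load(sample, schema: SCHEMA, skip_leading_rows: 1)
```

Without `skip_leading_rows: 1`, the header line fails the INTEGER cast in this sketch. A real load would likely count it as a bad record, and with the task's default `max_bad_records` of 0 that would fail the job.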
data/examples/test.csv
ADDED
data/examples/tumugi_config_example.rb
CHANGED
@@ -1,7 +1,7 @@
-Tumugi.
-
-
-
-
+Tumugi.configure do |config|
+  config.section('bigquery') do |section|
+    section.project_id = ENV["PROJECT_ID"]
+    section.client_email = ENV["CLIENT_EMAIL"]
+    section.private_key = ENV["PRIVATE_KEY"].gsub(/\\n/, "\n")
   end
 end
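The example config reads the private key from `ENV["PRIVATE_KEY"]` and rewrites every literal backslash-n pair into a real newline. This is needed when a multi-line PEM key has been flattened into one line to fit in an environment variable. The following sketch isolates that step. The `unescape_private_key` name and the placeholder key are illustrative. The explicit nil check is an addition, since the example itself would raise `NoMethodError` if the variable were unset.

```ruby
# Turn a one-line, backslash-n-escaped PEM string back into multi-line form,
# using the same gsub as the example config (hypothetical helper).
def unescape_private_key(raw)
  raise ArgumentError, 'PRIVATE_KEY is not set' if raw.nil?
  raw.gsub(/\\n/, "\n")
end

# Single quotes keep "\n" as two literal characters, as an env var would.
flattened = '-----BEGIN PRIVATE KEY-----\nMIIB...\n-----END PRIVATE KEY-----\n' # placeholder, not a real key
key = unescape_private_key(flattened)
puts key.lines.size # prints 3
```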
data/lib/tumugi/plugin/bigquery/client.rb
CHANGED
@@ -1,4 +1,5 @@
 require 'kura'
+require 'json'
 require_relative './error'
 
 Tumugi::Config.register_section('bigquery', :project_id, :client_email, :private_key, :private_key_file)
@@ -9,12 +10,22 @@ module Tumugi
       class Client
         attr_reader :project_id
 
-        def initialize(project_id: nil, client_email: nil, private_key: nil)
-
-
-
-
-
+        def initialize(project_id: nil, client_email: nil, private_key: nil, private_key_file: nil)
+          @project_id = project_id
+
+          if client_email.nil? && private_key.nil? && !private_key_file.nil?
+            @client = Kura.client(private_key_file)
+            if @project_id.nil?
+              key = JSON.parse(File.read(private_key_file))
+              @project_id = key['project_id']
+            end
+          else
+            # This method call style is needed for jruby.
+            # JRuby cannot handle correctly if method using keyword hash and last hash argument.
+            # see https://bugs.ruby-lang.org/issues/7529
+            @client = Kura.client(project_id = { "project_id" => @project_id, "client_email" => client_email, "private_key" => private_key },
+              client_email = nil, private_key = nil, {http_options: {timeout: 60}})
+          end
         rescue Kura::ApiError => e
           process_error(e)
         end
@@ -77,6 +88,12 @@ module Tumugi
           process_error(e)
         end
 
+        def table(dataset_id, table_id, project_id: nil)
+          @client.table(dataset_id, table_id, project_id: project_id || @project_id)
+        rescue Kura::ApiError => e
+          process_error(e)
+        end
+
         def table_exist?(dataset_id, table_id, project_id: nil)
           !@client.table(dataset_id, table_id, project_id: project_id || @project_id).nil?
         rescue Kura::ApiError => e
@@ -163,6 +180,7 @@ module Tumugi
             use_query_cache: true,
             user_defined_function_resources: nil,
             project_id: nil,
+            job_project_id: nil,
             job_id: nil,
             wait: nil,
             dry_run: false,
@@ -175,7 +193,7 @@ module Tumugi
               use_query_cache: use_query_cache,
               user_defined_function_resources: user_defined_function_resources,
               project_id: project_id || @project_id,
-              job_project_id:
+              job_project_id: job_project_id || @project_id,
               job_id: job_id,
               wait: wait,
               dry_run: dry_run,
@@ -185,28 +203,38 @@ module Tumugi
         end
 
         def load(dataset_id, table_id, source_uris=nil,
-            schema: nil,
-
+            schema: nil,
+            field_delimiter: ",",
+            mode: :append,
+            allow_jagged_rows: false,
+            max_bad_records: 0,
             ignore_unknown_values: false,
             allow_quoted_newlines: false,
-            quote: '"',
+            quote: '"',
+            skip_leading_rows: 0,
             source_format: "CSV",
             project_id: nil,
+            job_project_id: nil,
             job_id: nil,
             file: nil, wait: nil,
             dry_run: false,
             &blk)
           @client.load(dataset_id, table_id, source_uris=source_uris,
-              schema: schema,
-
+              schema: schema,
+              field_delimiter: field_delimiter,
+              mode: mode,
+              allow_jagged_rows: allow_jagged_rows,
+              max_bad_records: max_bad_records,
               ignore_unknown_values: ignore_unknown_values,
               allow_quoted_newlines: allow_quoted_newlines,
-              quote: quote,
+              quote: quote,
+              skip_leading_rows: skip_leading_rows,
               source_format: source_format,
               project_id: project_id || @project_id,
-              job_project_id:
+              job_project_id: job_project_id || @project_id,
               job_id: job_id,
-              file: file,
+              file: file,
+              wait: wait,
               dry_run: dry_run,
               &blk)
         rescue Kura::ApiError => e
@@ -219,6 +247,7 @@ module Tumugi
             field_delimiter: ",",
             print_header: true,
             project_id: nil,
+            job_project_id: nil,
             job_id: nil,
             wait: nil,
             dry_run: false,
@@ -229,7 +258,7 @@ module Tumugi
               field_delimiter: field_delimiter,
               print_header: print_header,
               project_id: project_id || @project_id,
-              job_project_id:
+              job_project_id: job_project_id || @project_id,
               job_id: job_id,
               wait: wait,
               dry_run: dry_run,
@@ -242,6 +271,7 @@ module Tumugi
             mode: :truncate,
             src_project_id: nil,
             dest_project_id: nil,
+            job_project_id: dest_project_id,
             job_id: nil,
             wait: nil,
             dry_run: false,
@@ -250,7 +280,7 @@ module Tumugi
               mode: mode,
               src_project_id: src_project_id || @project_id,
               dest_project_id: dest_project_id || @project_id,
-              job_project_id:
+              job_project_id: job_project_id || @project_id,
               job_id: job_id,
               wait: wait,
               dry_run: dry_run,
@@ -280,7 +310,7 @@ module Tumugi
         private
 
         def process_error(e)
-          raise Tumugi::Plugin::Bigquery::BigqueryError.new(e.
+          raise Tumugi::Plugin::Bigquery::BigqueryError.new(e.message, e.reason)
         end
       end
     end
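The new `Client#initialize` takes a key-file branch when only `private_key_file` is given. If no `project_id` was passed, it reads `project_id` from the service-account JSON. Each job method also falls back to the client's project for `job_project_id`. The following sketch reproduces just that resolution logic without Kura. `resolve_project_id` and `effective_job_project` are hypothetical helper names, and the key-file contents are made up.

```ruby
require 'json'
require 'tempfile'

# Same branch condition as Client#initialize: key-file auth is used only when
# client_email and private_key are both absent and a key file is given.
def resolve_project_id(project_id: nil, client_email: nil, private_key: nil, private_key_file: nil)
  return project_id unless client_email.nil? && private_key.nil? && !private_key_file.nil?
  project_id || JSON.parse(File.read(private_key_file))['project_id']
end

# Mirrors `job_project_id: job_project_id || @project_id`.
def effective_job_project(job_project_id, client_project_id)
  job_project_id || client_project_id
end

Tempfile.create(['key', '.json']) do |f|
  f.write(JSON.generate('project_id' => 'example-project')) # hypothetical key contents
  f.flush
  puts resolve_project_id(private_key_file: f.path)                         # prints example-project
  puts resolve_project_id(project_id: 'override', private_key_file: f.path) # prints override
end
```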
data/lib/tumugi/plugin/target/bigquery_dataset.rb
CHANGED
@@ -17,8 +17,8 @@ module Tumugi
         cfg = Tumugi.config.section('bigquery')
         @project_id = project_id || cfg.project_id
         @dataset_id = dataset_id
-        @client = client || Tumugi::Plugin::Bigquery::Client.new(project_id: @project_id)
-        @dataset = Tumugi::Plugin::Bigquery::Dataset.new(project_id: @project_id, dataset_id: @dataset_id)
+        @client = client || Tumugi::Plugin::Bigquery::Client.new(cfg.to_h.merge(project_id: @project_id))
+        @dataset = Tumugi::Plugin::Bigquery::Dataset.new(project_id: @client.project_id, dataset_id: @dataset_id)
       end
 
       def exist?
data/lib/tumugi/plugin/target/bigquery_table.rb
CHANGED
@@ -18,8 +18,8 @@ module Tumugi
         @project_id = project_id || cfg.project_id
         @dataset_id = dataset_id
         @table_id = table_id
-        @client = client || Tumugi::Plugin::Bigquery::Client.new(project_id: @project_id)
-        @table = Tumugi::Plugin::Bigquery::Table.new(project_id: @project_id, dataset_id: @dataset_id, table_id: @table_id)
+        @client = client || Tumugi::Plugin::Bigquery::Client.new(cfg.to_h.merge(project_id: @project_id))
+        @table = Tumugi::Plugin::Bigquery::Table.new(project_id: @client.project_id, dataset_id: @dataset_id, table_id: @table_id)
       end
 
       def exist?
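Both targets now build their client from the whole `bigquery` config section merged with the explicit `project_id`, instead of passing `project_id` alone. They also read the project back from `@client.project_id`, which lets a project ID taken from a key file reach the dataset/table objects. The following sketch models that flow with plain hashes. `FakeClient`, `KEY_FILES`, and `target_project_id` are hypothetical stand-ins, and the sketch assumes the section's `to_h` yields symbol keys.

```ruby
# In-memory stand-in for key files on disk (hypothetical data).
KEY_FILES = { '/path/to/key.json' => { 'project_id' => 'from-key-file' } }.freeze

# Stand-in for Tumugi::Plugin::Bigquery::Client's project_id resolution.
class FakeClient
  attr_reader :project_id

  def initialize(project_id: nil, client_email: nil, private_key: nil, private_key_file: nil)
    @project_id = project_id
    if client_email.nil? && private_key.nil? && !private_key_file.nil? && @project_id.nil?
      @project_id = KEY_FILES.fetch(private_key_file)['project_id']
    end
  end
end

# 0.2.0 target behaviour: merge config with the explicit project_id, then
# trust whatever project the client resolved.
def target_project_id(cfg, project_id = nil)
  pid = project_id || cfg[:project_id]
  FakeClient.new(**cfg.merge(project_id: pid)).project_id
end

cfg = { project_id: nil, client_email: nil, private_key: nil, private_key_file: '/path/to/key.json' }
puts target_project_id(cfg)                       # prints from-key-file
puts target_project_id(cfg, 'explicit')           # prints explicit
p FakeClient.new(project_id: cfg[:project_id]).project_id # prints nil (0.1.0 style: credentials dropped)
```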
data/lib/tumugi/plugin/task/bigquery_copy.rb
CHANGED
@@ -15,9 +15,11 @@ module Tumugi
       param :wait, type: :int, default: 60
 
       def output
+        return @output if @output
+
         opts = { dataset_id: dest_dataset_id, table_id: dest_table_id }
         opts[:project_id] = dest_project_id if dest_project_id
-        Tumugi::Plugin::BigqueryTableTarget.new(opts)
+        @output = Tumugi::Plugin::BigqueryTableTarget.new(opts)
       end
 
       def run
data/lib/tumugi/plugin/task/bigquery_dataset.rb
CHANGED
@@ -10,7 +10,7 @@ module Tumugi
       param :dataset_id, type: :string, required: true
 
       def output
-        Tumugi::Plugin::BigqueryDatasetTarget.new(project_id: project_id, dataset_id: dataset_id)
+        @output ||= Tumugi::Plugin::BigqueryDatasetTarget.new(project_id: project_id, dataset_id: dataset_id)
       end
 
       def run
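The "Cache output (#26)" change makes `output` build its target once, through `@output ||= ...` or `return @output if @output`. Before this change, every call created a fresh target, and each target constructs its own BigQuery client. The following sketch counts constructions under each style. The class names are hypothetical, and `Object.new` stands in for the target.

```ruby
# Counts how many target objects `output` builds. Object.new stands in for
# BigqueryDatasetTarget.new(...), which in the plugin also constructs a client.
class UncachedTask
  attr_reader :builds

  def initialize
    @builds = 0
  end

  def output
    @builds += 1
    Object.new
  end
end

# 0.2.0 style: build once, then return the cached instance.
class CachedTask < UncachedTask
  def output
    @output ||= super
  end
end

[UncachedTask, CachedTask].each do |klass|
  task = klass.new
  3.times { task.output }
  puts "#{klass}: #{task.builds} build(s)"
end
```

`||=` would rebuild if the cached value were `nil` or `false`, but a target object is always truthy, so one construction per task instance is what you get here.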
data/lib/tumugi/plugin/task/bigquery_export.rb
ADDED
@@ -0,0 +1,112 @@
+require 'json'
+require 'tumugi'
+require 'tumugi/plugin/file_system_target'
+require_relative '../target/bigquery_table'
+
+module Tumugi
+  module Plugin
+    class BigqueryExportTask < Tumugi::Task
+      Tumugi::Plugin.register_task('bigquery_export', self)
+
+      param :project_id, type: :string
+      param :job_project_id, type: :string
+      param :dataset_id, type: :string, required: true
+      param :table_id, type: :string, required: true
+
+      param :compression, type: :string, default: 'NONE' # GZIP
+      param :destination_format, type: :string, default: 'CSV' # NEWLINE_DELIMITED_JSON, AVRO
+
+      # Only effected if destiation_format == 'CSV'
+      param :field_delimiter, type: :string, default: ','
+      param :print_header, type: :bool, default: true
+
+      param :page_size, type: :integer, default: 10000
+
+      param :wait, type: :integer, default: 120
+
+      def run
+        unless output.is_a?(Tumugi::Plugin::FileSystemTarget)
+          raise Tumugi::TumugiError.new("BigqueryExportTask#output must be return a instance of Tumugi::Plugin::FileSystemTarget")
+        end
+
+        client = Tumugi::Plugin::Bigquery::Client.new(config)
+        table = Tumugi::Plugin::Bigquery::Table.new(project_id: client.project_id, dataset_id: dataset_id, table_id: table_id)
+        job_project_id = client.project_id if job_project_id.nil?
+
+        log "Source: #{table}"
+        log "Destination: #{output}"
+
+        if is_gcs?(output)
+          export_to_gcs(client)
+        else
+          if destination_format.upcase == 'AVRO'
+            raise Tumugi::TumugiError.new("destination_format='AVRO' is only supported when export to Google Cloud Storage")
+          end
+          if compression.upcase == 'GZIP'
+            logger.warn("compression parameter is ignored, it's only supported when export to Google Cloud Storage")
+          end
+          export_to_file_system(client)
+        end
+      end
+
+      private
+
+      def is_gcs?(target)
+        not target.to_s.match(/^gs:\/\/[^\/]+\/.+$/).nil?
+      end
+
+      def export_to_gcs(client)
+        options = {
+          compression: compression.upcase,
+          destination_format: destination_format.upcase,
+          field_delimiter: field_delimiter,
+          print_header: print_header,
+          project_id: client.project_id,
+          job_project_id: job_project_id || client.project_id,
+          wait: wait
+        }
+        client.extract(dataset_id, table_id, output.to_s, options)
+      end
+
+      def export_to_file_system(client)
+        schema ||= client.table(dataset_id, table_id, project_id: client.project_id).schema.fields
+        field_names = schema.map{|f| f.respond_to?(:[]) ? (f["name"] || f[:name]) : f.name }
+        start_index = 0
+        page_token = nil
+        options = {
+          max_result: page_size,
+          project_id: client.project_id,
+        }
+
+        output.open('w') do |file|
+          file.puts field_names.join(field_delimiter) if destination_format == 'CSV' && print_header
+          begin
+            table_data_list = client.list_tabledata(dataset_id, table_id, options.merge(start_index: start_index, page_token: page_token))
+            start_index += page_size
+            page_token = table_data_list[:next_token]
+            table_data_list[:rows].each do |row|
+              file.puts line(field_names, row, destination_format)
+            end
+          end while not page_token.nil?
+        end
+      end
+
+      def line(field_names, row, format)
+        case format
+        when 'CSV'
+          row.map{|v| v[1]}.join(field_delimiter)
+        when 'NEWLINE_DELIMITED_JSON'
+          JSON.generate(row.to_h)
+        end
+      end
+
+      def config
+        cfg = Tumugi.config.section('bigquery').to_h
+        unless project_id.nil?
+          cfg[:project_id] = project_id
+        end
+        cfg
+      end
+    end
+  end
+end
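For a non-GCS output, `BigqueryExportTask` pages through table data until `next_token` is nil and writes one line per row. Each row is written as CSV (values joined by `field_delimiter`) or as newline-delimited JSON. The following self-contained sketch reproduces that loop against an in-memory fake. `FakeTableData`, `gcs?`, `format_row`, and `export_lines` are hypothetical names, and the row data is invented.

Two choices in this sketch differ from the task:

- The GCS check uses `\A`/`\z` rather than `^`/`$`, which match at any line boundary.
- Unknown formats raise an error here. The task's `line` would instead return nil. For example, a lowercase `'csv'` passes `run`'s upcased checks, but `export_to_file_system` compares it case-sensitively.

```ruby
require 'json'

# In-memory stand-in for client.list_tabledata: returns :rows and
# :next_token, the two keys BigqueryExportTask reads.
class FakeTableData
  def initialize(rows, page_size)
    @pages = rows.each_slice(page_size).to_a
  end

  def list_tabledata(start_index:, page_token:, **_opts)
    index = page_token.nil? ? 0 : Integer(page_token)
    next_token = index + 1 < @pages.size ? (index + 1).to_s : nil
    { rows: @pages.fetch(index, []), next_token: next_token }
  end
end

GCS_URI = %r{\Ags://[^/]+/.+\z}

def gcs?(target)
  GCS_URI.match?(target.to_s)
end

# Per-row formatting equivalent to BigqueryExportTask#line.
def format_row(row, format, field_delimiter)
  case format
  when 'CSV' then row.values.join(field_delimiter)
  when 'NEWLINE_DELIMITED_JSON' then JSON.generate(row)
  else raise ArgumentError, "unsupported format: #{format}"
  end
end

# Page until next_token is nil, collecting output lines (header first for CSV).
def export_lines(client, field_names, format: 'CSV', field_delimiter: ',', print_header: true, page_size: 2)
  lines = []
  lines << field_names.join(field_delimiter) if format == 'CSV' && print_header
  start_index = 0
  page_token = nil
  loop do
    page = client.list_tabledata(start_index: start_index, page_token: page_token, max_result: page_size)
    start_index += page_size
    page_token = page[:next_token]
    page[:rows].each { |row| lines << format_row(row, format, field_delimiter) }
    break if page_token.nil?
  end
  lines
end

rows = [{ 'id' => 1, 'v' => 'a' }, { 'id' => 2, 'v' => 'b' }, { 'id' => 3, 'v' => 'c' }]
puts export_lines(FakeTableData.new(rows, 2), %w[id v])
```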
data/lib/tumugi/plugin/task/bigquery_load.rb
ADDED
@@ -0,0 +1,73 @@
+require 'tumugi'
+require_relative '../target/bigquery_table'
+
+module Tumugi
+  module Plugin
+    class BigqueryLoadTask < Tumugi::Task
+      Tumugi::Plugin.register_task('bigquery_load', self)
+
+      param :bucket, type: :string, required: true
+      param :key, type: :string, required: true
+      param :project_id, type: :string
+      param :dataset_id, type: :string, required: true
+      param :table_id, type: :string, required: true
+
+      param :schema # type: :array
+      param :field_delimiter, type: :string, default: ','
+      param :mode, type: :string, default: 'append' # truncate, empty
+      param :allow_jagged_rows, type: :bool, default: false
+      param :max_bad_records, type: :integer, default: 0
+      param :ignore_unknown_values, type: :bool, default: false
+      param :allow_quoted_newlines, type: :bool, default: false
+      param :quote, type: :string, default: '"'
+      param :skip_leading_rows, type: :interger, default: 0
+      param :source_format, type: :string, default: 'CSV' # NEWLINE_DELIMITED_JSON, AVRO
+      param :wait, type: :integer, default: 60
+
+      def output
+        return @output if @output
+
+        opts = { dataset_id: dataset_id, table_id: table_id }
+        opts[:project_id] = project_id if project_id
+        @output = Tumugi::Plugin::BigqueryTableTarget.new(opts)
+      end
+
+      def run
+        if mode != 'append'
+          raise Tumugi::ParameterError.new("Parameter 'schema' is required when 'mode' is 'truncate' or 'empty'") if schema.nil?
+        end
+
+        src_uri = "gs://#{bucket}#{normalize_path(key)}"
+        log "Source: #{src_uri}"
+        log "Destination: #{output}"
+
+        bq_client = output.client
+        opts = {
+          schema: schema,
+          field_delimiter: field_delimiter,
+          mode: mode.to_sym,
+          allow_jagged_rows: allow_jagged_rows,
+          max_bad_records: max_bad_records,
+          ignore_unknown_values: ignore_unknown_values,
+          allow_quoted_newlines: allow_quoted_newlines,
+          quote: quote,
+          skip_leading_rows: skip_leading_rows,
+          source_format: source_format,
+          project_id: output.project_id,
+          wait: wait
+        }
+        bq_client.load(output.dataset_id, output.table_id, src_uri, opts)
+      end
+
+      private
+
+      def normalize_path(path)
+        unless path.start_with?('/')
+          "/#{path}"
+        else
+          path
+        end
+      end
+    end
+  end
+end
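Before loading, `BigqueryLoadTask#run` builds the source URI as `gs://<bucket>` followed by the key with exactly one leading slash. It also requires `schema` whenever `mode` is not `append`. The following sketch isolates those two checks. The explicit list of allowed modes is an addition taken from the `# truncate, empty` comment, and `ArgumentError` stands in for `Tumugi::ParameterError`. One more point: the README snippet sets `:datset_id`, while the task declares `param :dataset_id, required: true`.

```ruby
# Allowed modes, taken from the param comment (the task itself does not validate this).
VALID_MODES = %w[append truncate empty].freeze

# Same behaviour as BigqueryLoadTask#normalize_path.
def normalize_path(path)
  path.start_with?('/') ? path : "/#{path}"
end

def load_source_uri(bucket, key)
  "gs://#{bucket}#{normalize_path(key)}"
end

# Returns the symbol the task would pass to the client as `mode:`.
def check_load_params!(mode:, schema:)
  raise ArgumentError, "unknown mode: #{mode}" unless VALID_MODES.include?(mode)
  if mode != 'append' && schema.nil?
    raise ArgumentError, "Parameter 'schema' is required when 'mode' is 'truncate' or 'empty'"
  end
  mode.to_sym
end

puts load_source_uri('test_bucket', 'load_data.csv')  # prints gs://test_bucket/load_data.csv
puts load_source_uri('test_bucket', '/load_data.csv') # prints gs://test_bucket/load_data.csv
```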
data/lib/tumugi/plugin/task/bigquery_query.rb
CHANGED
@@ -13,7 +13,7 @@ module Tumugi
       param :wait, type: :int, default: 60
 
       def output
-        Tumugi::Plugin::BigqueryTableTarget.new(project_id: project_id, dataset_id: dataset_id, table_id: table_id)
+        @output ||= Tumugi::Plugin::BigqueryTableTarget.new(project_id: project_id, dataset_id: dataset_id, table_id: table_id)
       end
 
       def run
data/tumugi-plugin-bigquery.gemspec
CHANGED
@@ -20,8 +20,8 @@ Gem::Specification.new do |spec|
   spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
   spec.require_paths = ["lib"]
 
-  spec.add_runtime_dependency "tumugi", "
-  spec.add_runtime_dependency "kura", "0.2.
+  spec.add_runtime_dependency "tumugi", ">= 0.5.1"
+  spec.add_runtime_dependency "kura", "~> 0.2.17"
 
   spec.add_development_dependency 'bundler', '~> 1.11'
   spec.add_development_dependency 'rake', '~> 10.0'
@@ -29,4 +29,5 @@ Gem::Specification.new do |spec|
   spec.add_development_dependency 'test-unit-rr'
   spec.add_development_dependency 'coveralls'
   spec.add_development_dependency 'github_changelog_generator'
+  spec.add_development_dependency 'tumugi-plugin-google_cloud_storage'
 end
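The 0.2.0 gemspec declares tumugi as `">= 0.5.1"`, which has no upper bound. It declares kura as `"~> 0.2.17"`, a pessimistic constraint meaning at least 0.2.17 but below 0.3. The following snippet checks a few candidate versions with RubyGems' own `Gem::Requirement`. The version strings tested are arbitrary examples.

```ruby
require 'rubygems'

TUMUGI_REQ = Gem::Requirement.new('>= 0.5.1')
KURA_REQ   = Gem::Requirement.new('~> 0.2.17') # same as >= 0.2.17, < 0.3

%w[0.2.16 0.2.17 0.2.99 0.3.0].each do |v|
  puts "kura #{v}: #{KURA_REQ.satisfied_by?(Gem::Version.new(v))}"
end
# kura 0.2.16: false / 0.2.17: true / 0.2.99: true / 0.3.0: false
```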
metadata
CHANGED
@@ -1,43 +1,43 @@
 --- !ruby/object:Gem::Specification
 name: tumugi-plugin-bigquery
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.2.0
 platform: ruby
 authors:
 - Kazuyuki Honda
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2016-
+date: 2016-06-06 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: tumugi
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - "
+    - - ">="
       - !ruby/object:Gem::Version
-        version: 0.
+        version: 0.5.1
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - "
+    - - ">="
       - !ruby/object:Gem::Version
-        version: 0.
+        version: 0.5.1
 - !ruby/object:Gem::Dependency
   name: kura
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: 0.2.
+        version: 0.2.17
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: 0.2.
+        version: 0.2.17
 - !ruby/object:Gem::Dependency
   name: bundler
   requirement: !ruby/object:Gem::Requirement
@@ -122,6 +122,20 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
+- !ruby/object:Gem::Dependency
+  name: tumugi-plugin-google_cloud_storage
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 description:
 email:
 - hakobera@gmail.com
@@ -131,13 +145,16 @@ extra_rdoc_files: []
 files:
 - ".gitignore"
 - ".travis.yml"
+- CHANGELOG.md
 - Gemfile
 - README.md
 - Rakefile
 - bin/setup
 - examples/copy.rb
 - examples/dataset.rb
+- examples/load.rb
 - examples/query.rb
+- examples/test.csv
 - examples/tumugi_config_example.rb
 - lib/tumugi/plugin/bigquery/client.rb
 - lib/tumugi/plugin/bigquery/dataset.rb
@@ -148,6 +165,8 @@ files:
 - lib/tumugi/plugin/target/bigquery_table.rb
 - lib/tumugi/plugin/task/bigquery_copy.rb
 - lib/tumugi/plugin/task/bigquery_dataset.rb
+- lib/tumugi/plugin/task/bigquery_export.rb
+- lib/tumugi/plugin/task/bigquery_load.rb
 - lib/tumugi/plugin/task/bigquery_query.rb
 - tumugi-plugin-bigquery.gemspec
 homepage: https://github.com/tumugi/tumugi-plugin-bigquery