tumugi-plugin-bigquery 0.2.0 → 0.3.0
- checksums.yaml +4 -4
- data/.gitignore +1 -0
- data/CHANGELOG.md +27 -4
- data/README.md +254 -33
- data/examples/copy.rb +9 -9
- data/examples/dataset.rb +1 -1
- data/examples/export.rb +13 -0
- data/examples/force_copy.rb +22 -0
- data/examples/load.rb +7 -7
- data/examples/query.rb +3 -3
- data/examples/query_append.rb +13 -0
- data/lib/tumugi/plugin/bigquery/client.rb +2 -0
- data/lib/tumugi/plugin/bigquery/version.rb +1 -1
- data/lib/tumugi/plugin/task/bigquery_copy.rb +11 -2
- data/lib/tumugi/plugin/task/bigquery_query.rb +20 -2
- data/tumugi-plugin-bigquery.gemspec +2 -2
- metadata +21 -18
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 63e4b8a538949b06c7a63d62b60e965c3d167e21
+  data.tar.gz: ead04218cb01d036f9c0c457d6a036ab8b6a12b1
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 1d21aa4a556541f906d566f18fd61b94960eb6021f3c5c749de07dd2be14444533d228d5229a5ffe4d4b934071e670708aeb6c43dad6976e4feb4c8613dd1474
+  data.tar.gz: 389387e0fbcf5e4ab0719260bcefa70d8176ce0b4f96bb8b6d40c4078a23bde30dfd1b03c1189282cc78cdaddda4498c0e57c72325d6a9cbe747a2f0ef44e2b1
data/.gitignore
CHANGED
data/CHANGELOG.md
CHANGED
@@ -1,7 +1,29 @@
 # Change Log

-## [
-[Full Changelog](https://github.com/tumugi/tumugi-plugin-bigquery/compare/v0.
+## [v0.3.0](https://github.com/tumugi/tumugi-plugin-bigquery/tree/v0.3.0) (2016-07-16)
+[Full Changelog](https://github.com/tumugi/tumugi-plugin-bigquery/compare/v0.2.0...v0.3.0)
+
+**Implemented enhancements:**
+
+- Support flatten\_result flag [\#30](https://github.com/tumugi/tumugi-plugin-bigquery/issues/30)
+- Support mode parameter for BigqueryQueryTask [\#28](https://github.com/tumugi/tumugi-plugin-bigquery/issues/28)
+- Support standard SQL [\#20](https://github.com/tumugi/tumugi-plugin-bigquery/issues/20)
+- Support force copy table [\#7](https://github.com/tumugi/tumugi-plugin-bigquery/issues/7)
+
+**Fixed bugs:**
+
+- Fix JSON export for FileSystemTarget does not work [\#31](https://github.com/tumugi/tumugi-plugin-bigquery/issues/31)
+
+**Merged pull requests:**
+
+- Update tumugi to 0.6 [\#35](https://github.com/tumugi/tumugi-plugin-bigquery/pull/35) ([hakobera](https://github.com/hakobera))
+- Add JSON export test [\#34](https://github.com/tumugi/tumugi-plugin-bigquery/pull/34) ([hakobera](https://github.com/hakobera))
+- Fix misc [\#33](https://github.com/tumugi/tumugi-plugin-bigquery/pull/33) ([hakobera](https://github.com/hakobera))
+- Support force\_copy parameter for bigquery\_copy task [\#32](https://github.com/tumugi/tumugi-plugin-bigquery/pull/32) ([hakobera](https://github.com/hakobera))
+- Support append mode query and use legacy SQL flag [\#29](https://github.com/tumugi/tumugi-plugin-bigquery/pull/29) ([hakobera](https://github.com/hakobera))
+
+## [v0.2.0](https://github.com/tumugi/tumugi-plugin-bigquery/tree/v0.2.0) (2016-06-06)
+[Full Changelog](https://github.com/tumugi/tumugi-plugin-bigquery/compare/v0.1.0...v0.2.0)

 **Implemented enhancements:**

@@ -23,8 +45,10 @@

 **Merged pull requests:**

--
+- Update changelog [\#27](https://github.com/tumugi/tumugi-plugin-bigquery/pull/27) ([hakobera](https://github.com/hakobera))
 - Prepare release for 0.2.0 [\#25](https://github.com/tumugi/tumugi-plugin-bigquery/pull/25) ([hakobera](https://github.com/hakobera))
+- Add rubygems badge [\#3](https://github.com/tumugi/tumugi-plugin-bigquery/pull/3) ([hakobera](https://github.com/hakobera))
+- Cache output [\#26](https://github.com/tumugi/tumugi-plugin-bigquery/pull/26) ([hakobera](https://github.com/hakobera))
 - Use Thor's invoke instead of system method [\#18](https://github.com/tumugi/tumugi-plugin-bigquery/pull/18) ([hakobera](https://github.com/hakobera))
 - Change test ruby version [\#17](https://github.com/tumugi/tumugi-plugin-bigquery/pull/17) ([hakobera](https://github.com/hakobera))
 - Change tumugi dependency version [\#16](https://github.com/tumugi/tumugi-plugin-bigquery/pull/16) ([hakobera](https://github.com/hakobera))
@@ -32,7 +56,6 @@
 - Add BigqueryLoadTask [\#12](https://github.com/tumugi/tumugi-plugin-bigquery/pull/12) ([hakobera](https://github.com/hakobera))
 - Update dependency gems [\#11](https://github.com/tumugi/tumugi-plugin-bigquery/pull/11) ([hakobera](https://github.com/hakobera))
 - Update tumugi to v0.5.0 [\#9](https://github.com/tumugi/tumugi-plugin-bigquery/pull/9) ([hakobera](https://github.com/hakobera))
-- Add rubygems badge [\#3](https://github.com/tumugi/tumugi-plugin-bigquery/pull/3) ([hakobera](https://github.com/hakobera))

 ## [v0.1.0](https://github.com/tumugi/tumugi-plugin-bigquery/tree/v0.1.0) (2016-05-16)
 **Fixed bugs:**
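The "Support standard SQL" entry above corresponds to the new `use_legacy_sql` parameter documented in the README diff below (default `true`, i.e. legacy SQL). A minimal sketch of opting into standard SQL; the task, dataset, and table names here are placeholders, not taken from this release:

```rb
# Sketch only: names are illustrative; use_legacy_sql is the documented parameter.
task :standard_sql_query, type: :bigquery_query do
  # Standard SQL references tables as `project.dataset.table` instead of [project:dataset.table].
  query "SELECT COUNT(*) AS cnt FROM `bigquery-public-data.samples.wikipedia`"
  dataset_id "test"
  table_id "dest_standard_sql"
  use_legacy_sql false # new in v0.3.0; the default remains true (legacy SQL)
end
```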
data/README.md
CHANGED
@@ -1,8 +1,8 @@
 [![Build Status](https://travis-ci.org/tumugi/tumugi-plugin-bigquery.svg?branch=master)](https://travis-ci.org/tumugi/tumugi-plugin-bigquery) [![Code Climate](https://codeclimate.com/github/tumugi/tumugi-plugin-bigquery/badges/gpa.svg)](https://codeclimate.com/github/tumugi/tumugi-plugin-bigquery) [![Coverage Status](https://coveralls.io/repos/github/tumugi/tumugi-plugin-bigquery/badge.svg?branch=master)](https://coveralls.io/github/tumugi/tumugi-plugin-bigquery) [![Gem Version](https://badge.fury.io/rb/tumugi-plugin-bigquery.svg)](https://badge.fury.io/rb/tumugi-plugin-bigquery)

-# tumugi
+# Google BigQuery plugin for [tumugi](https://github.com/tumugi/tumugi)

-tumugi-plugin-bigquery is a plugin for integrate [Google BigQuery](https://cloud.google.com/bigquery/) and [
+tumugi-plugin-bigquery is a plugin for integrate [Google BigQuery](https://cloud.google.com/bigquery/) and [tumugi](https://github.com/tumugi/tumugi).

 ## Installation

@@ -12,17 +12,7 @@ Add this line to your application's Gemfile:
 gem 'tumugi-plugin-bigquery'
 ```

-And then execute
-
-```sh
-$ bundle
-```
-
-Or install it yourself as:
-
-```sb
-$ gem install tumugi-plugin-bigquery
-```
+And then execute `bundle install`.

 ## Target

@@ -30,21 +20,65 @@ $ gem install tumugi-plugin-bigquery

 `Tumugi::Plugin::BigqueryDatasetTarget` is target for BigQuery dataset.

+#### Parameters
+
+| Name | type | required? | default | description |
+|------------|--------|-----------|---------|------------------------------------------------------------------|
+| dataset_id | string | required | | Dataset ID |
+| project_id | string | optional | | [Project](https://cloud.google.com/compute/docs/projects) ID |
+
+#### Examples
+
+```rb
+task :task1 do
+  output target(:bigquery_dataset, dataset_id: "your_dataset_id")
+end
+```
+
+```rb
+task :task1 do
+  output target(:bigquery_dataset, project_id: "project_id", dataset_id: "dataset_id")
+end
+```
+
 #### Tumugi::Plugin::BigqueryTableTarget

 `Tumugi::Plugin::BigqueryDatasetTarget` is target for BigQuery table.

+#### Parameters
+
+| name | type | required? | default | description |
+|------------|--------|-----------|---------|------------------------------------------------------------------|
+| table_id | string | required | | Table ID |
+| dataset_id | string | required | | Dataset ID |
+| project_id | string | optional | | [Project](https://cloud.google.com/compute/docs/projects) ID |
+
+#### Examples
+
+```rb
+task :task1 do
+  output target(:bigquery_table, table_id: "table_id", dataset_id: "your_dataset_id")
+end
+```
+
 ## Task

 ### Tumugi::Plugin::BigqueryDatasetTask

 `Tumugi::Plugin::BigqueryDatasetTask` is task to create a dataset.

-####
+#### Parameters
+
+| name | type | required? | default | description |
+|------------|--------|-----------|---------|------------------------------------------------------------------|
+| dataset_id | string | required | | Dataset ID |
+| project_id | string | optional | | [Project](https://cloud.google.com/compute/docs/projects) ID |
+
+#### Examples

 ```rb
 task :task1, type: :bigquery_dataset do
-
+  dataset_id 'test'
 end
 ```

@@ -52,13 +86,41 @@ end

 `Tumugi::Plugin::BigqueryQueryTask` is task to run `query` and save the result into the table which specified by parameter.

-####
+#### Parameters
+
+| name | type | required? | default | description |
+|-----------------|---------|-----------|------------|-----------------------------------------------------------------------------------------------------------------------------------------------|
+| query | string | required | | query to execute |
+| table_id | string | required | | destination table ID |
+| dataset_id | string | required | | destination dataset ID |
+| project_id | string | optional | | destination project ID |
+| mode | string | optional | "truncate" | specifies the action that occurs if the destination table already exists. [see](#mode) |
+| flatten_results | boolean | optional | true | when you query nested data, BigQuery automatically flattens the table data or not. [see](https://cloud.google.com/bigquery/docs/data#flatten) |
+| use_legacy_sql | bool | optional | true | use legacy SQL syntanx for BigQuery or not |
+| wait | integer | optional | 60 | wait time (seconds) for query execution |
+
+#### Examples
+
+##### truncate mode (default)

 ```rb
 task :task1, type: :bigquery_query do
-
-
-
+  query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+  table_id "dest_table#{Time.now.to_i}"
+  dataset_id "test"
+end
+```
+
+##### append mode
+
+If you set `mode` to `append`, query result append to existing table.
+
+```rb
+task :task1, type: :bigquery_query do
+  query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+  table_id "dest_table#{Time.now.to_i}"
+  dataset_id "test"
+  mode "append"
 end
 ```

@@ -66,16 +128,46 @@ end

 `Tumugi::Plugin::BigqueryCopyTask` is task to copy table which specified by parameter.

-####
+#### Parameters
+
+| name | type | required? | default | description |
+|-----------------|--------|-----------|---------|---------------------------------------------------------|
+| src_table_id | string | required | | source table ID |
+| src_dataset_id | string | required | | source dataset ID |
+| src_project_id | string | optional | | source project ID |
+| dest_table_id | string | required | | destination table ID |
+| dest_dataset_id | string | required | | destination dataset ID |
+| dest_project_id | string | optional | | destination project ID |
+| force_copy | bool | optional | false | force copy when destination table already exists or not |
+| wait | integer| optional | 60 | wait time (seconds) for query execution |
+
+#### Examples

 Copy `test.src_table` to `test.dest_table`.

+##### Normal usecase
+
+```rb
+task :task1, type: :bigquery_copy do
+  src_table_id "src_table"
+  src_dataset_id "test"
+  dest_table_id "dest_table"
+  dest_dataset_id "test"
+end
+```
+
+##### force_copy
+
+If `force_copy` is `true`, copy operation always execute even if destination table exists.
+This means data of destination table data is deleted, so be carefull to enable this parameter.
+
 ```rb
 task :task1, type: :bigquery_copy do
-
-
-
-
+  src_table_id "src_table"
+  src_dataset_id "test"
+  dest_table_id "dest_table"
+  dest_dataset_id "test"
+  force_copy true
 end
 ```

@@ -83,25 +175,154 @@ end

 `Tumugi::Plugin::BigqueryLoadTask` is task to load structured data from GCS into BigQuery.

-####
+#### Parameters
+
+| name | type | required? | default | description |
+|-----------------------|-----------------|------------------------------------|---------------------|----------------------------------------------------------------------------------------------------------------------------------------------|
+| bucket | string | required | | source GCS bucket name |
+| key | string | required | | source path of file like "/path/to/file.csv" |
+| table_id | string | required | | destination table ID |
+| dataset_id | string | required | | destination dataset ID |
+| project_id | string | optional | | destination project ID |
+| schema | array of object | required when mode is not "append" | | see [schema format](#schema) |
+| mode | string | optional | "append" | specifies the action that occurs if the destination table already exists. [see](#mode) |
+| source_format | string | optional | "CSV" | source file format. [see](#format) |
+| ignore_unknown_values | bool | optional | false | indicates if BigQuery should allow extra values that are not represented in the table schema |
+| max_bad_records | integer | optional | 0 | maximum number of bad records that BigQuery can ignore when running the job |
+| field_delimiter | string | optional | "," | separator for fields in a CSV file. used only when source_format is "CSV" |
+| allow_jagged_rows | bool | optional | false | accept rows that are missing trailing optional columns. The missing values are treated as null. used only when source_format is "CSV" |
+| allow_quoted_newlines | bool | optional | false | indicates if BigQuery should allow quoted data sections that contain newline characters in a CSV file. used only when source_format is "CSV" |
+| quote | string | optional | "\"" (double-quote) | value that is used to quote data sections in a CSV file. used only when source_format is "CSV" |
+| skip_leading_rows | integer | optional | 0 | .number of rows at the top of a CSV file that BigQuery will skip when loading the data. used only when source_format is "CSV" |
+| wait | integer | optional | 60 | wait time (seconds) for query execution |
+
+#### Example

 Load `gs://test_bucket/load_data.csv` into `dest_project:dest_dataset.dest_table`

 ```rb
 task :task1, type: :bigquery_load do
-
-
-
-
-
+  bucket "test_bucket"
+  key "load_data.csv"
+  table_id "dest_table"
+  datset_id "dest_dataset"
+  project_id "dest_project"
+end
+```
+
+### Tumugi::Plugin::BigqueryExportTask
+
+`Tumugi::Plugin::BigqueryExportTask` is task to export BigQuery table.
+
+#### Parameters
+
+| name | type | required? | default | description |
+|--------------------|---------|-----------|--------------------|-------------------------------------------------------------------------------------|
+| project_id | string | optional | | source project ID |
+| job_project_id | string | optional | same as project_id | job running project ID |
+| dataset_id | string | required | true | source dataset ID |
+| table_id | string | required | true | source table ID |
+| compression | string | optional | "NONE" | [destination file compression]. "NONE": no compression, "GZIP": compression by gzip |
+| destination_format | string | optional | "CSV" | [destination file format](#format) |
+| field_delimiter | string | optional | "," | separator for fields in a CSV file. used only when destination_format is "CSV" |
+| print_header | bool | optional | true | print header row in a CSV file. used only when destination_format is "CSV" |
+| page_size | integer | optional | 10000 | Fetch number of rows in one request |
+| wait | integer | optional | 60 | wait time (seconds) for query execution |
+
+#### Examples
+
+##### Export `src_dataset.src_table` to local file `data.csv`
+
+```rb
+task :task1, type: :bigquery_export do
+  table_id "src_table"
+  datset_id "src_dataset"
+
+  output target(:local_file, "data.csv")
+end
+```
+
+##### Export `src_dataset.src_table` to Google Cloud Storage
+
+You need [tumugi-plugin-google_cloud_storage](https://github.com/tumugi/tumugi-plugin-google_cloud_storage)
+
+```rb
+task :task1, type: :bigquery_export do
+  table_id "src_table"
+  datset_id "src_dataset"
+
+  output target(:google_cloud_storage_file, bucket: "bucket", key: "data.csv")
 end
 ```

-
+##### Export `src_dataset.src_table` to Google Drive
+
+You need [tumugi-plugin-google_drive](https://github.com/tumugi/tumugi-plugin-google_drive)
+
+```rb
+task :task1, type: :bigquery_export do
+  table_id "src_table"
+  datset_id "src_dataset"
+
+  output target(:google_drive_file, name: "data.csv")
+end
+```
+
+## Common parameter value
+
+### mode
+
+| value | description |
+|----------|-------------|
+| truncate | If the table already exists, BigQuery overwrites the table data. |
+| append | If the table already exists, BigQuery appends the data to the table. |
+| empty | If the table already exists and contains data, a 'duplicate' error is returned in the job result. |
+
+### format
+
+| value | description |
+|------------------------|--------------------------------------------|
+| CSV | CSV |
+| NEWLINE_DELIMITED_JSON | Each line is JSON + new line |
+| AVRO | [see](https://avro.apache.org/docs/1.2.0/) |
+
+### schema
+
+Format of `schema` parameter is array of nested object like below:
+
+```js
+[
+  {
+    "name": "column1",
+    "type": "string"
+  },
+  {
+    "name": "column2",
+    "type": "integer",
+    "mode": "repeated"
+  },
+  {
+    "name": "record1",
+    "type": "record",
+    "fields": [
+      {
+        "name": "key1",
+        "type": "integer",
+      },
+      {
+        "name": "key2",
+        "type": "integer"
+      }
+    ]
+  }
+]
+```
+
+## Config Section

 tumugi-plugin-bigquery provide config section named "bigquery" which can specified BigQuery autenticaion info.

-
+### Authenticate by client_email and private_key

 ```rb
 Tumugi.configure do |config|
@@ -113,7 +334,7 @@ Tumugi.configure do |config|
 end
 ```

-
+### Authenticate by JSON key file

 ```rb
 Tumugi.configure do |config|
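The Config Section hunks above are cut off inside the `Tumugi.configure` blocks, so only the section name "bigquery" and the two authentication styles (client_email/private_key and JSON key file) are visible in this diff. A hedged sketch of what such a block could look like; the `config.section` call and the attribute names are assumptions for illustration, not taken from this diff:

```rb
# Assumed shape only: the exact config API and key names are not shown in this diff.
Tumugi.configure do |config|
  config.section("bigquery") do |section|
    section.project_id   = "your-project-id"
    section.client_email = "service-account@your-project-id.iam.gserviceaccount.com"
    section.private_key  = "-----BEGIN PRIVATE KEY-----\n..."
  end
end
```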
data/examples/copy.rb
CHANGED
@@ -1,21 +1,21 @@
 task :task1, type: :bigquery_copy do
-
-
-
-
-
+  src_project_id { input.project_id }
+  src_dataset_id { input.dataset_id }
+  src_table_id { input.table_id }
+  dest_dataset_id "test"
+  dest_table_id { "dest_table_#{Time.now.to_i}" }

   requires :task2
 end

 task :task2, type: :bigquery_query do
-
-
-
+  query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+  dataset_id { input.dataset_id }
+  table_id "dest_#{Time.now.to_i}"

   requires :task3
 end

 task :task3, type: :bigquery_dataset do
-
+  dataset_id "test"
 end
data/examples/dataset.rb
CHANGED
data/examples/export.rb
ADDED
@@ -0,0 +1,13 @@
+task :task1, type: :bigquery_export do
+  dataset_id { input.dataset_id }
+  table_id { input.table_id }
+
+  requires :task2
+  output target(:local_file, "tmp/export.csv")
+end
+
+task :task2, type: :bigquery_query do
+  query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+  dataset_id "test"
+  table_id "dest_#{Time.now.to_i}"
+end
data/examples/force_copy.rb
ADDED
@@ -0,0 +1,22 @@
+task :task1, type: :bigquery_copy do
+  src_project_id { input.project_id }
+  src_dataset_id { input.dataset_id }
+  src_table_id { input.table_id }
+  dest_dataset_id "test"
+  dest_table_id "dest_table_1"
+  force_copy true
+
+  requires :task2
+end
+
+task :task2, type: :bigquery_query do
+  query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+  dataset_id { input.dataset_id }
+  table_id "dest_#{Time.now.to_i}"
+
+  requires :task3
+end
+
+task :task3, type: :bigquery_dataset do
+  dataset_id "test"
+end
data/examples/load.rb
CHANGED
@@ -1,11 +1,11 @@
 task :task1, type: :bigquery_load do
   requires :task2
-
-
-
-
-
-
+  bucket 'tumugi-plugin-bigquery'
+  key 'test.csv'
+  dataset_id { input.dataset_id }
+  table_id 'load_test'
+  skip_leading_rows 1
+  schema [
     {
       name: 'row_number',
       type: 'INTEGER',
@@ -20,5 +20,5 @@ task :task1, type: :bigquery_load do
 end

 task :task2, type: :bigquery_dataset do
-
+  dataset_id "test"
 end
data/examples/query.rb
CHANGED
@@ -6,7 +6,7 @@ task :task1 do
 end

 task :task2, type: :bigquery_query do
-
-
-
+  query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+  dataset_id "test"
+  table_id "dest_#{Time.now.to_i}"
 end
data/examples/query_append.rb
ADDED
@@ -0,0 +1,13 @@
+task :task1 do
+  requires :task2
+  run do
+    log input.table_name
+  end
+end
+
+task :task2, type: :bigquery_query do
+  query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+  dataset_id "test"
+  table_id "dest_append"
+  mode "append"
+end
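Besides the "append" value shown in the example above, the README's mode table also documents "empty", which makes the job fail with a 'duplicate' error when the destination table already holds data. A short sketch under those assumptions (the task and table names are placeholders):

```rb
# Sketch: with mode "empty" the query job errors out instead of writing to an existing, non-empty table.
task :guarded_query, type: :bigquery_query do
  query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
  dataset_id "test"
  table_id "dest_once_only"
  mode "empty"
end
```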
data/lib/tumugi/plugin/bigquery/client.rb
CHANGED
@@ -178,6 +178,7 @@ module Tumugi
        flatten_results: true,
        priority: "INTERACTIVE",
        use_query_cache: true,
+       use_legacy_sql: true,
        user_defined_function_resources: nil,
        project_id: nil,
        job_project_id: nil,
@@ -191,6 +192,7 @@ module Tumugi
        flatten_results: flatten_results,
        priority: priority,
        use_query_cache: use_query_cache,
+       use_legacy_sql: use_legacy_sql,
        user_defined_function_resources: user_defined_function_resources,
        project_id: project_id || @project_id,
        job_project_id: job_project_id || @project_id,
data/lib/tumugi/plugin/task/bigquery_copy.rb
CHANGED
@@ -12,16 +12,25 @@ module Tumugi
       param :dest_project_id, type: :string
       param :dest_dataset_id, type: :string, required: true
       param :dest_table_id, type: :string, required: true
-      param :
+      param :force_copy, type: :bool, default: false
+      param :wait, type: :integer, default: 60

       def output
         return @output if @output
-
+
         opts = { dataset_id: dest_dataset_id, table_id: dest_table_id }
         opts[:project_id] = dest_project_id if dest_project_id
         @output = Tumugi::Plugin::BigqueryTableTarget.new(opts)
       end

+      def completed?
+        if force_copy && !finished?
+          false
+        else
+          super
+        end
+      end
+
       def run
         log "Source: bq://#{src_project_id}/#{src_dataset_id}/#{src_table_id}"
         log "Destination: #{output}"
data/lib/tumugi/plugin/task/bigquery_query.rb
CHANGED
@@ -10,19 +10,37 @@ module Tumugi
       param :project_id, type: :string
       param :dataset_id, type: :string, required: true
       param :table_id, type: :string, required: true
-      param :
+      param :mode, type: :string, default: 'truncate' # append, empty
+      param :flatten_results, type: :bool, default: true
+      param :use_legacy_sql, type: :bool, default: true
+      param :wait, type: :integer, default: 60

       def output
         @output ||= Tumugi::Plugin::BigqueryTableTarget.new(project_id: project_id, dataset_id: dataset_id, table_id: table_id)
       end

+      def completed?
+        if mode.to_sym == :append && !finished?
+          false
+        else
+          super
+        end
+      end
+
       def run
         log "Launching Query"
         log "Query: #{query}"
         log "Query destination: #{output}"

         bq_client = output.client
-        bq_client.query(query,
+        bq_client.query(query,
+                        project_id: project_id,
+                        dataset_id: output.dataset_id,
+                        table_id: output.table_id,
+                        mode: mode.to_sym,
+                        flatten_results: flatten_results,
+                        use_legacy_sql: use_legacy_sql,
+                        wait: wait)
       end
     end
   end
tumugi-plugin-bigquery.gemspec
CHANGED
@@ -20,14 +20,14 @@ Gem::Specification.new do |spec|
   spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
   spec.require_paths = ["lib"]

-  spec.add_runtime_dependency "tumugi", ">= 0.
+  spec.add_runtime_dependency "tumugi", ">= 0.6.1"
   spec.add_runtime_dependency "kura", "~> 0.2.17"
+  spec.add_runtime_dependency "json", "~> 1.8.3" # json 2.0 does not work with JRuby + MultiJson

   spec.add_development_dependency 'bundler', '~> 1.11'
   spec.add_development_dependency 'rake', '~> 10.0'
   spec.add_development_dependency 'test-unit', '~> 3.1'
   spec.add_development_dependency 'test-unit-rr'
   spec.add_development_dependency 'coveralls'
-  spec.add_development_dependency 'github_changelog_generator'
   spec.add_development_dependency 'tumugi-plugin-google_cloud_storage'
 end
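Given the bumped runtime dependency above, a Gemfile for a project using this release might look like the following; the constraints are illustrative and taken from the gemspec lines in this diff:

```rb
# Illustrative Gemfile entries; tumugi >= 0.6.1 and plugin 0.3.0 come from this diff.
gem 'tumugi', '>= 0.6.1'
gem 'tumugi-plugin-bigquery', '~> 0.3.0'
```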
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: tumugi-plugin-bigquery
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.3.0
 platform: ruby
 authors:
 - Kazuyuki Honda
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2016-
+date: 2016-07-17 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: tumugi
@@ -16,14 +16,14 @@ dependencies:
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version: 0.
+        version: 0.6.1
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version: 0.
+        version: 0.6.1
 - !ruby/object:Gem::Dependency
   name: kura
   requirement: !ruby/object:Gem::Requirement
@@ -38,6 +38,20 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: 0.2.17
+- !ruby/object:Gem::Dependency
+  name: json
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 1.8.3
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 1.8.3
 - !ruby/object:Gem::Dependency
   name: bundler
   requirement: !ruby/object:Gem::Requirement
@@ -108,20 +122,6 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
-- !ruby/object:Gem::Dependency
-  name: github_changelog_generator
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
-  type: :development
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
 - !ruby/object:Gem::Dependency
   name: tumugi-plugin-google_cloud_storage
   requirement: !ruby/object:Gem::Requirement
@@ -152,8 +152,11 @@ files:
 - bin/setup
 - examples/copy.rb
 - examples/dataset.rb
+- examples/export.rb
+- examples/force_copy.rb
 - examples/load.rb
 - examples/query.rb
+- examples/query_append.rb
 - examples/test.csv
 - examples/tumugi_config_example.rb
 - lib/tumugi/plugin/bigquery/client.rb
|