tumugi-plugin-bigquery 0.2.0 → 0.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.gitignore +1 -0
- data/CHANGELOG.md +27 -4
- data/README.md +254 -33
- data/examples/copy.rb +9 -9
- data/examples/dataset.rb +1 -1
- data/examples/export.rb +13 -0
- data/examples/force_copy.rb +22 -0
- data/examples/load.rb +7 -7
- data/examples/query.rb +3 -3
- data/examples/query_append.rb +13 -0
- data/lib/tumugi/plugin/bigquery/client.rb +2 -0
- data/lib/tumugi/plugin/bigquery/version.rb +1 -1
- data/lib/tumugi/plugin/task/bigquery_copy.rb +11 -2
- data/lib/tumugi/plugin/task/bigquery_query.rb +20 -2
- data/tumugi-plugin-bigquery.gemspec +2 -2
- metadata +21 -18
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 63e4b8a538949b06c7a63d62b60e965c3d167e21
+  data.tar.gz: ead04218cb01d036f9c0c457d6a036ab8b6a12b1
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 1d21aa4a556541f906d566f18fd61b94960eb6021f3c5c749de07dd2be14444533d228d5229a5ffe4d4b934071e670708aeb6c43dad6976e4feb4c8613dd1474
+  data.tar.gz: 389387e0fbcf5e4ab0719260bcefa70d8176ce0b4f96bb8b6d40c4078a23bde30dfd1b03c1189282cc78cdaddda4498c0e57c72325d6a9cbe747a2f0ef44e2b1
data/.gitignore
CHANGED
data/CHANGELOG.md
CHANGED
@@ -1,7 +1,29 @@
 # Change Log
 
-## [
-[Full Changelog](https://github.com/tumugi/tumugi-plugin-bigquery/compare/v0.
+## [v0.3.0](https://github.com/tumugi/tumugi-plugin-bigquery/tree/v0.3.0) (2016-07-16)
+[Full Changelog](https://github.com/tumugi/tumugi-plugin-bigquery/compare/v0.2.0...v0.3.0)
+
+**Implemented enhancements:**
+
+- Support flatten\_result flag [\#30](https://github.com/tumugi/tumugi-plugin-bigquery/issues/30)
+- Support mode parameter for BigqueryQueryTask [\#28](https://github.com/tumugi/tumugi-plugin-bigquery/issues/28)
+- Support standard SQL [\#20](https://github.com/tumugi/tumugi-plugin-bigquery/issues/20)
+- Support force copy table [\#7](https://github.com/tumugi/tumugi-plugin-bigquery/issues/7)
+
+**Fixed bugs:**
+
+- Fix JSON export for FileSystemTarget does not work [\#31](https://github.com/tumugi/tumugi-plugin-bigquery/issues/31)
+
+**Merged pull requests:**
+
+- Update tumugi to 0.6 [\#35](https://github.com/tumugi/tumugi-plugin-bigquery/pull/35) ([hakobera](https://github.com/hakobera))
+- Add JSON export test [\#34](https://github.com/tumugi/tumugi-plugin-bigquery/pull/34) ([hakobera](https://github.com/hakobera))
+- Fix misc [\#33](https://github.com/tumugi/tumugi-plugin-bigquery/pull/33) ([hakobera](https://github.com/hakobera))
+- Support force\_copy parameter for bigquery\_copy task [\#32](https://github.com/tumugi/tumugi-plugin-bigquery/pull/32) ([hakobera](https://github.com/hakobera))
+- Support append mode query and use legacy SQL flag [\#29](https://github.com/tumugi/tumugi-plugin-bigquery/pull/29) ([hakobera](https://github.com/hakobera))
+
+## [v0.2.0](https://github.com/tumugi/tumugi-plugin-bigquery/tree/v0.2.0) (2016-06-06)
+[Full Changelog](https://github.com/tumugi/tumugi-plugin-bigquery/compare/v0.1.0...v0.2.0)
 
 **Implemented enhancements:**
 
@@ -23,8 +45,10 @@
 
 **Merged pull requests:**
 
--
+- Update changelog [\#27](https://github.com/tumugi/tumugi-plugin-bigquery/pull/27) ([hakobera](https://github.com/hakobera))
 - Prepare release for 0.2.0 [\#25](https://github.com/tumugi/tumugi-plugin-bigquery/pull/25) ([hakobera](https://github.com/hakobera))
+- Add rubygems badge [\#3](https://github.com/tumugi/tumugi-plugin-bigquery/pull/3) ([hakobera](https://github.com/hakobera))
+- Cache output [\#26](https://github.com/tumugi/tumugi-plugin-bigquery/pull/26) ([hakobera](https://github.com/hakobera))
 - Use Thor's invoke instead of system method [\#18](https://github.com/tumugi/tumugi-plugin-bigquery/pull/18) ([hakobera](https://github.com/hakobera))
 - Change test ruby version [\#17](https://github.com/tumugi/tumugi-plugin-bigquery/pull/17) ([hakobera](https://github.com/hakobera))
 - Change tumugi dependency version [\#16](https://github.com/tumugi/tumugi-plugin-bigquery/pull/16) ([hakobera](https://github.com/hakobera))
@@ -32,7 +56,6 @@
 - Add BigqueryLoadTask [\#12](https://github.com/tumugi/tumugi-plugin-bigquery/pull/12) ([hakobera](https://github.com/hakobera))
 - Update dependency gems [\#11](https://github.com/tumugi/tumugi-plugin-bigquery/pull/11) ([hakobera](https://github.com/hakobera))
 - Update tumugi to v0.5.0 [\#9](https://github.com/tumugi/tumugi-plugin-bigquery/pull/9) ([hakobera](https://github.com/hakobera))
-- Add rubygems badge [\#3](https://github.com/tumugi/tumugi-plugin-bigquery/pull/3) ([hakobera](https://github.com/hakobera))
 
 ## [v0.1.0](https://github.com/tumugi/tumugi-plugin-bigquery/tree/v0.1.0) (2016-05-16)
 **Fixed bugs:**
data/README.md
CHANGED
@@ -1,8 +1,8 @@
 [](https://travis-ci.org/tumugi/tumugi-plugin-bigquery) [](https://codeclimate.com/github/tumugi/tumugi-plugin-bigquery) [](https://coveralls.io/github/tumugi/tumugi-plugin-bigquery) [](https://badge.fury.io/rb/tumugi-plugin-bigquery)
 
-# tumugi
+# Google BigQuery plugin for [tumugi](https://github.com/tumugi/tumugi)
 
-tumugi-plugin-bigquery is a plugin for integrate [Google BigQuery](https://cloud.google.com/bigquery/) and [
+tumugi-plugin-bigquery is a plugin for integrate [Google BigQuery](https://cloud.google.com/bigquery/) and [tumugi](https://github.com/tumugi/tumugi).
 
 ## Installation
 
@@ -12,17 +12,7 @@ Add this line to your application's Gemfile:
 gem 'tumugi-plugin-bigquery'
 ```
 
-And then execute
-
-```sh
-$ bundle
-```
-
-Or install it yourself as:
-
-```sb
-$ gem install tumugi-plugin-bigquery
-```
+And then execute `bundle install`.
 
 ## Target
 
@@ -30,21 +20,65 @@ $ gem install tumugi-plugin-bigquery
 
 `Tumugi::Plugin::BigqueryDatasetTarget` is target for BigQuery dataset.
 
+#### Parameters
+
+| Name | type | required? | default | description |
+|------------|--------|-----------|---------|------------------------------------------------------------------|
+| dataset_id | string | required | | Dataset ID |
+| project_id | string | optional | | [Project](https://cloud.google.com/compute/docs/projects) ID |
+
+#### Examples
+
+```rb
+task :task1 do
+  output target(:bigquery_dataset, dataset_id: "your_dataset_id")
+end
+```
+
+```rb
+task :task1 do
+  output target(:bigquery_dataset, project_id: "project_id", dataset_id: "dataset_id")
+end
+```
+
 #### Tumugi::Plugin::BigqueryTableTarget
 
 `Tumugi::Plugin::BigqueryDatasetTarget` is target for BigQuery table.
 
+#### Parameters
+
+| name | type | required? | default | description |
+|------------|--------|-----------|---------|------------------------------------------------------------------|
+| table_id | string | required | | Table ID |
+| dataset_id | string | required | | Dataset ID |
+| project_id | string | optional | | [Project](https://cloud.google.com/compute/docs/projects) ID |
+
+#### Examples
+
+```rb
+task :task1 do
+  output target(:bigquery_table, table_id: "table_id", dataset_id: "your_dataset_id")
+end
+```
+
 ## Task
 
 ### Tumugi::Plugin::BigqueryDatasetTask
 
 `Tumugi::Plugin::BigqueryDatasetTask` is task to create a dataset.
 
-####
+#### Parameters
+
+| name | type | required? | default | description |
+|------------|--------|-----------|---------|------------------------------------------------------------------|
+| dataset_id | string | required | | Dataset ID |
+| project_id | string | optional | | [Project](https://cloud.google.com/compute/docs/projects) ID |
+
+#### Examples
 
 ```rb
 task :task1, type: :bigquery_dataset do
-
+  dataset_id 'test'
 end
 ```
 
@@ -52,13 +86,41 @@ end
 
 `Tumugi::Plugin::BigqueryQueryTask` is task to run `query` and save the result into the table which specified by parameter.
 
-####
+#### Parameters
+
+| name | type | required? | default | description |
+|-----------------|---------|-----------|------------|-----------------------------------------------------------------------------------------------------------------------------------------------|
+| query | string | required | | query to execute |
+| table_id | string | required | | destination table ID |
+| dataset_id | string | required | | destination dataset ID |
+| project_id | string | optional | | destination project ID |
+| mode | string | optional | "truncate" | specifies the action that occurs if the destination table already exists. [see](#mode) |
+| flatten_results | boolean | optional | true | when you query nested data, BigQuery automatically flattens the table data or not. [see](https://cloud.google.com/bigquery/docs/data#flatten) |
+| use_legacy_sql | bool | optional | true | use legacy SQL syntanx for BigQuery or not |
+| wait | integer | optional | 60 | wait time (seconds) for query execution |
+
+#### Examples
+
+##### truncate mode (default)
 
 ```rb
 task :task1, type: :bigquery_query do
-
-
-
+  query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+  table_id "dest_table#{Time.now.to_i}"
+  dataset_id "test"
+end
+```
+
+##### append mode
+
+If you set `mode` to `append`, query result append to existing table.
+
+```rb
+task :task1, type: :bigquery_query do
+  query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+  table_id "dest_table#{Time.now.to_i}"
+  dataset_id "test"
+  mode "append"
 end
 ```
 
@@ -66,16 +128,46 @@ end
 
 `Tumugi::Plugin::BigqueryCopyTask` is task to copy table which specified by parameter.
 
-####
+#### Parameters
+
+| name | type | required? | default | description |
+|-----------------|--------|-----------|---------|---------------------------------------------------------|
+| src_table_id | string | required | | source table ID |
+| src_dataset_id | string | required | | source dataset ID |
+| src_project_id | string | optional | | source project ID |
+| dest_table_id | string | required | | destination table ID |
+| dest_dataset_id | string | required | | destination dataset ID |
+| dest_project_id | string | optional | | destination project ID |
+| force_copy | bool | optional | false | force copy when destination table already exists or not |
+| wait | integer| optional | 60 | wait time (seconds) for query execution |
+
+#### Examples
 
 Copy `test.src_table` to `test.dest_table`.
 
+##### Normal usecase
+
+```rb
+task :task1, type: :bigquery_copy do
+  src_table_id "src_table"
+  src_dataset_id "test"
+  dest_table_id "dest_table"
+  dest_dataset_id "test"
+end
+```
+
+##### force_copy
+
+If `force_copy` is `true`, copy operation always execute even if destination table exists.
+This means data of destination table data is deleted, so be carefull to enable this parameter.
+
 ```rb
 task :task1, type: :bigquery_copy do
-
-
-
-
+  src_table_id "src_table"
+  src_dataset_id "test"
+  dest_table_id "dest_table"
+  dest_dataset_id "test"
+  force_copy true
 end
 ```
 
@@ -83,25 +175,154 @@ end
 
 `Tumugi::Plugin::BigqueryLoadTask` is task to load structured data from GCS into BigQuery.
 
-####
+#### Parameters
+
+| name | type | required? | default | description |
+|-----------------------|-----------------|------------------------------------|---------------------|----------------------------------------------------------------------------------------------------------------------------------------------|
+| bucket | string | required | | source GCS bucket name |
+| key | string | required | | source path of file like "/path/to/file.csv" |
+| table_id | string | required | | destination table ID |
+| dataset_id | string | required | | destination dataset ID |
+| project_id | string | optional | | destination project ID |
+| schema | array of object | required when mode is not "append" | | see [schema format](#schema) |
+| mode | string | optional | "append" | specifies the action that occurs if the destination table already exists. [see](#mode) |
+| source_format | string | optional | "CSV" | source file format. [see](#format) |
+| ignore_unknown_values | bool | optional | false | indicates if BigQuery should allow extra values that are not represented in the table schema |
+| max_bad_records | integer | optional | 0 | maximum number of bad records that BigQuery can ignore when running the job |
+| field_delimiter | string | optional | "," | separator for fields in a CSV file. used only when source_format is "CSV" |
+| allow_jagged_rows | bool | optional | false | accept rows that are missing trailing optional columns. The missing values are treated as null. used only when source_format is "CSV" |
+| allow_quoted_newlines | bool | optional | false | indicates if BigQuery should allow quoted data sections that contain newline characters in a CSV file. used only when source_format is "CSV" |
+| quote | string | optional | "\"" (double-quote) | value that is used to quote data sections in a CSV file. used only when source_format is "CSV" |
+| skip_leading_rows | integer | optional | 0 | number of rows at the top of a CSV file that BigQuery will skip when loading the data. used only when source_format is "CSV" |
+| wait | integer | optional | 60 | wait time (seconds) for query execution |
+
+#### Example
 
 Load `gs://test_bucket/load_data.csv` into `dest_project:dest_dataset.dest_table`
 
 ```rb
 task :task1, type: :bigquery_load do
-
-
-
-
-
+  bucket "test_bucket"
+  key "load_data.csv"
+  table_id "dest_table"
+  datset_id "dest_dataset"
+  project_id "dest_project"
+end
+```
+
+### Tumugi::Plugin::BigqueryExportTask
+
+`Tumugi::Plugin::BigqueryExportTask` is task to export BigQuery table.
+
+#### Parameters
+
+| name | type | required? | default | description |
+|--------------------|---------|-----------|--------------------|-------------------------------------------------------------------------------------|
+| project_id | string | optional | | source project ID |
+| job_project_id | string | optional | same as project_id | job running project ID |
+| dataset_id | string | required | true | source dataset ID |
+| table_id | string | required | true | source table ID |
+| compression | string | optional | "NONE" | [destination file compression]. "NONE": no compression, "GZIP": compression by gzip |
+| destination_format | string | optional | "CSV" | [destination file format](#format) |
+| field_delimiter | string | optional | "," | separator for fields in a CSV file. used only when destination_format is "CSV" |
+| print_header | bool | optional | true | print header row in a CSV file. used only when destination_format is "CSV" |
+| page_size | integer | optional | 10000 | Fetch number of rows in one request |
+| wait | integer | optional | 60 | wait time (seconds) for query execution |
+
+#### Examples
+
+##### Export `src_dataset.src_table` to local file `data.csv`
+
+```rb
+task :task1, type: :bigquery_export do
+  table_id "src_table"
+  datset_id "src_dataset"
+
+  output target(:local_file, "data.csv")
+end
+```
+
+##### Export `src_dataset.src_table` to Google Cloud Storage
+
+You need [tumugi-plugin-google_cloud_storage](https://github.com/tumugi/tumugi-plugin-google_cloud_storage)
+
+```rb
+task :task1, type: :bigquery_export do
+  table_id "src_table"
+  datset_id "src_dataset"
+
+  output target(:google_cloud_storage_file, bucket: "bucket", key: "data.csv")
 end
 ```
 
-
+##### Export `src_dataset.src_table` to Google Drive
+
+You need [tumugi-plugin-google_drive](https://github.com/tumugi/tumugi-plugin-google_drive)
+
+```rb
+task :task1, type: :bigquery_export do
+  table_id "src_table"
+  datset_id "src_dataset"
+
+  output target(:google_drive_file, name: "data.csv")
+end
+```
+
+## Common parameter value
+
+### mode
+
+| value | description |
+|----------|-------------|
+| truncate | If the table already exists, BigQuery overwrites the table data. |
+| append | If the table already exists, BigQuery appends the data to the table. |
+| empty | If the table already exists and contains data, a 'duplicate' error is returned in the job result. |
+
+### format
+
+| value | description |
+|------------------------|--------------------------------------------|
+| CSV | CSV |
+| NEWLINE_DELIMITED_JSON | Each line is JSON + new line |
+| AVRO | [see](https://avro.apache.org/docs/1.2.0/) |
+
+### schema
+
+Format of `schema` parameter is array of nested object like below:
+
+```js
+[
+  {
+    "name": "column1",
+    "type": "string"
+  },
+  {
+    "name": "column2",
+    "type": "integer",
+    "mode": "repeated"
+  },
+  {
+    "name": "record1",
+    "type": "record",
+    "fields": [
+      {
+        "name": "key1",
+        "type": "integer",
+      },
+      {
+        "name": "key2",
+        "type": "integer"
+      }
+    ]
+  }
+]
+```
+
+## Config Section
 
 tumugi-plugin-bigquery provide config section named "bigquery" which can specified BigQuery autenticaion info.
 
-
+### Authenticate by client_email and private_key
 
 ```rb
 Tumugi.configure do |config|
@@ -113,7 +334,7 @@ Tumugi.configure do |config|
 end
 ```
 
-
+### Authenticate by JSON key file
 
 ```rb
 Tumugi.configure do |config|
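The `mode` values documented in the README's "Common parameter value" table correspond to BigQuery's write dispositions. As a rough illustration only (this helper is not part of the plugin; the method name is hypothetical, and the disposition strings are the standard BigQuery job-API values), the mapping can be sketched in plain Ruby:

```ruby
# Hypothetical helper (not from tumugi-plugin-bigquery): translate the
# plugin's "mode" parameter values into BigQuery writeDisposition strings.
def write_disposition(mode)
  case mode.to_s
  when 'truncate' then 'WRITE_TRUNCATE' # overwrite existing table data
  when 'append'   then 'WRITE_APPEND'   # add rows to the existing table
  when 'empty'    then 'WRITE_EMPTY'    # fail with a 'duplicate' error if data exists
  else raise ArgumentError, "unknown mode: #{mode}"
  end
end
```

Any other value raises, mirroring the fact that the README documents only these three modes.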
data/examples/copy.rb
CHANGED
@@ -1,21 +1,21 @@
 task :task1, type: :bigquery_copy do
-
-
-
-
-
+  src_project_id { input.project_id }
+  src_dataset_id { input.dataset_id }
+  src_table_id { input.table_id }
+  dest_dataset_id "test"
+  dest_table_id { "dest_table_#{Time.now.to_i}" }
 
   requires :task2
 end
 
 task :task2, type: :bigquery_query do
-
-
-
+  query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+  dataset_id { input.dataset_id }
+  table_id "dest_#{Time.now.to_i}"
 
   requires :task3
 end
 
 task :task3, type: :bigquery_dataset do
-
+  dataset_id "test"
 end
data/examples/dataset.rb
CHANGED
data/examples/export.rb
ADDED
@@ -0,0 +1,13 @@
+task :task1, type: :bigquery_export do
+  dataset_id { input.dataset_id }
+  table_id { input.table_id }
+
+  requires :task2
+  output target(:local_file, "tmp/export.csv")
+end
+
+task :task2, type: :bigquery_query do
+  query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+  dataset_id "test"
+  table_id "dest_#{Time.now.to_i}"
+end
data/examples/force_copy.rb
ADDED
@@ -0,0 +1,22 @@
+task :task1, type: :bigquery_copy do
+  src_project_id { input.project_id }
+  src_dataset_id { input.dataset_id }
+  src_table_id { input.table_id }
+  dest_dataset_id "test"
+  dest_table_id "dest_table_1"
+  force_copy true
+
+  requires :task2
+end
+
+task :task2, type: :bigquery_query do
+  query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+  dataset_id { input.dataset_id }
+  table_id "dest_#{Time.now.to_i}"
+
+  requires :task3
+end
+
+task :task3, type: :bigquery_dataset do
+  dataset_id "test"
+end
data/examples/load.rb
CHANGED
@@ -1,11 +1,11 @@
 task :task1, type: :bigquery_load do
   requires :task2
-
-
-
-
-
-
+  bucket 'tumugi-plugin-bigquery'
+  key 'test.csv'
+  dataset_id { input.dataset_id }
+  table_id 'load_test'
+  skip_leading_rows 1
+  schema [
     {
       name: 'row_number',
       type: 'INTEGER',
@@ -20,5 +20,5 @@ task :task1, type: :bigquery_load do
 end
 
 task :task2, type: :bigquery_dataset do
-
+  dataset_id "test"
 end
data/examples/query.rb
CHANGED
@@ -6,7 +6,7 @@ task :task1 do
 end
 
 task :task2, type: :bigquery_query do
-
-
-
+  query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+  dataset_id "test"
+  table_id "dest_#{Time.now.to_i}"
 end
data/examples/query_append.rb
ADDED
@@ -0,0 +1,13 @@
+task :task1 do
+  requires :task2
+  run do
+    log input.table_name
+  end
+end
+
+task :task2, type: :bigquery_query do
+  query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+  dataset_id "test"
+  table_id "dest_append"
+  mode "append"
+end
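The append-mode example above always re-runs the query task. The reason is the `completed?` override added to `BigqueryQueryTask` in this release: in append mode, the mere existence of the destination table no longer marks the task as done. A simplified sketch in plain Ruby (not the plugin's actual class; the standalone method and its arguments are invented here to isolate the decision logic):

```ruby
# Simplified sketch of the append-mode completion check: an append-mode
# task is only complete once it has finished in the current invocation,
# while other modes fall back to "done if the output table exists".
def completed?(mode, finished, table_exists)
  return false if mode == :append && !finished
  table_exists # default behaviour: complete once the output table exists
end
```

So a truncate-mode task is skipped when its table already exists, but an append-mode task runs again on every invocation.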
data/lib/tumugi/plugin/bigquery/client.rb
CHANGED
@@ -178,6 +178,7 @@ module Tumugi
                 flatten_results: true,
                 priority: "INTERACTIVE",
                 use_query_cache: true,
+                use_legacy_sql: true,
                 user_defined_function_resources: nil,
                 project_id: nil,
                 job_project_id: nil,
@@ -191,6 +192,7 @@ module Tumugi
                 flatten_results: flatten_results,
                 priority: priority,
                 use_query_cache: use_query_cache,
+                use_legacy_sql: use_legacy_sql,
                 user_defined_function_resources: user_defined_function_resources,
                 project_id: project_id || @project_id,
                 job_project_id: job_project_id || @project_id,
@@ -12,16 +12,25 @@ module Tumugi
|
|
|
12
12
|
param :dest_project_id, type: :string
|
|
13
13
|
param :dest_dataset_id, type: :string, required: true
|
|
14
14
|
param :dest_table_id, type: :string, required: true
|
|
15
|
-
param :
|
|
15
|
+
param :force_copy, type: :bool, default: false
|
|
16
|
+
param :wait, type: :integer, default: 60
|
|
16
17
|
|
|
17
18
|
def output
|
|
18
19
|
return @output if @output
|
|
19
|
-
|
|
20
|
+
|
|
20
21
|
opts = { dataset_id: dest_dataset_id, table_id: dest_table_id }
|
|
21
22
|
opts[:project_id] = dest_project_id if dest_project_id
|
|
22
23
|
@output = Tumugi::Plugin::BigqueryTableTarget.new(opts)
|
|
23
24
|
end
|
|
24
25
|
|
|
26
|
+
def completed?
|
|
27
|
+
if force_copy && !finished?
|
|
28
|
+
false
|
|
29
|
+
else
|
|
30
|
+
super
|
|
31
|
+
end
|
|
32
|
+
end
|
|
33
|
+
|
|
25
34
|
def run
|
|
26
35
|
log "Source: bq://#{src_project_id}/#{src_dataset_id}/#{src_table_id}"
|
|
27
36
|
log "Destination: #{output}"
|
|
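The `force_copy` parameter introduced above works the same way as append mode does for queries: when enabled, an existing destination table no longer short-circuits the task. A plain-Ruby sketch of that decision (not the plugin class; the standalone method and its arguments are invented here for illustration):

```ruby
# Simplified sketch of the force_copy behaviour: with force_copy enabled,
# the copy runs once per invocation even if the destination table exists;
# otherwise the copy is only needed when the destination is missing.
def copy_needed?(force_copy, finished, dest_exists)
  return true if force_copy && !finished
  !dest_exists # default behaviour: copy only into a missing destination
end
```

This is why the README warns that enabling `force_copy` will overwrite existing destination data on every run.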
data/lib/tumugi/plugin/task/bigquery_query.rb
CHANGED
@@ -10,19 +10,37 @@ module Tumugi
       param :project_id, type: :string
       param :dataset_id, type: :string, required: true
       param :table_id, type: :string, required: true
-      param :
+      param :mode, type: :string, default: 'truncate' # append, empty
+      param :flatten_results, type: :bool, default: true
+      param :use_legacy_sql, type: :bool, default: true
+      param :wait, type: :integer, default: 60
 
       def output
         @output ||= Tumugi::Plugin::BigqueryTableTarget.new(project_id: project_id, dataset_id: dataset_id, table_id: table_id)
       end
 
+      def completed?
+        if mode.to_sym == :append && !finished?
+          false
+        else
+          super
+        end
+      end
+
       def run
         log "Launching Query"
         log "Query: #{query}"
         log "Query destination: #{output}"
 
         bq_client = output.client
-        bq_client.query(query,
+        bq_client.query(query,
+                        project_id: project_id,
+                        dataset_id: output.dataset_id,
+                        table_id: output.table_id,
+                        mode: mode.to_sym,
+                        flatten_results: flatten_results,
+                        use_legacy_sql: use_legacy_sql,
+                        wait: wait)
       end
     end
 end
data/tumugi-plugin-bigquery.gemspec
CHANGED
@@ -20,14 +20,14 @@ Gem::Specification.new do |spec|
   spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
   spec.require_paths = ["lib"]
 
-  spec.add_runtime_dependency "tumugi", ">= 0.
+  spec.add_runtime_dependency "tumugi", ">= 0.6.1"
   spec.add_runtime_dependency "kura", "~> 0.2.17"
+  spec.add_runtime_dependency "json", "~> 1.8.3" # json 2.0 does not work with JRuby + MultiJson
 
   spec.add_development_dependency 'bundler', '~> 1.11'
   spec.add_development_dependency 'rake', '~> 10.0'
   spec.add_development_dependency 'test-unit', '~> 3.1'
   spec.add_development_dependency 'test-unit-rr'
   spec.add_development_dependency 'coveralls'
-  spec.add_development_dependency 'github_changelog_generator'
   spec.add_development_dependency 'tumugi-plugin-google_cloud_storage'
 end
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: tumugi-plugin-bigquery
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.3.0
 platform: ruby
 authors:
 - Kazuyuki Honda
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2016-
+date: 2016-07-17 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: tumugi
@@ -16,14 +16,14 @@ dependencies:
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version: 0.
+        version: 0.6.1
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
    - - ">="
       - !ruby/object:Gem::Version
-        version: 0.
+        version: 0.6.1
 - !ruby/object:Gem::Dependency
   name: kura
   requirement: !ruby/object:Gem::Requirement
@@ -38,6 +38,20 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: 0.2.17
+- !ruby/object:Gem::Dependency
+  name: json
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 1.8.3
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 1.8.3
 - !ruby/object:Gem::Dependency
   name: bundler
   requirement: !ruby/object:Gem::Requirement
@@ -108,20 +122,6 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
-- !ruby/object:Gem::Dependency
-  name: github_changelog_generator
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
-  type: :development
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
 - !ruby/object:Gem::Dependency
   name: tumugi-plugin-google_cloud_storage
   requirement: !ruby/object:Gem::Requirement
@@ -152,8 +152,11 @@ files:
 - bin/setup
 - examples/copy.rb
 - examples/dataset.rb
+- examples/export.rb
+- examples/force_copy.rb
 - examples/load.rb
 - examples/query.rb
+- examples/query_append.rb
 - examples/test.csv
 - examples/tumugi_config_example.rb
 - lib/tumugi/plugin/bigquery/client.rb