tumugi-plugin-bigquery 0.2.0 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 1f82d5d752da3918795afc6cc669a0fb4711cf95
- data.tar.gz: fed486ae8aeb9266d4fd11cf523a19a8507755af
+ metadata.gz: 63e4b8a538949b06c7a63d62b60e965c3d167e21
+ data.tar.gz: ead04218cb01d036f9c0c457d6a036ab8b6a12b1
  SHA512:
- metadata.gz: 8418f29dfe96d38bcdfa0c5098d59efd819edaa715a7e2af0945c57f70a4d08fa5c477560df636389eefe0bcf40712c4d240b2a97d3301cadebf6e5615808f2b
- data.tar.gz: aa34ee20fdec506277f3ac40b8819b62f7d000db7c802ed4c9fcedd5af1bf33644ecf673b10326c8e64f8d554a9a708cdfc0f9ae7942b5570257adec3108ab28
+ metadata.gz: 1d21aa4a556541f906d566f18fd61b94960eb6021f3c5c749de07dd2be14444533d228d5229a5ffe4d4b934071e670708aeb6c43dad6976e4feb4c8613dd1474
+ data.tar.gz: 389387e0fbcf5e4ab0719260bcefa70d8176ce0b4f96bb8b6d40c4078a23bde30dfd1b03c1189282cc78cdaddda4498c0e57c72325d6a9cbe747a2f0ef44e2b1
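The checksum entries above can be verified locally against a downloaded `.gem` artifact. A minimal sketch using Ruby's standard `Digest` library; the helper name and file path are illustrative, not part of the gem:

```ruby
# Compute the same SHA1/SHA512 hex digests that appear in
# checksums.yaml, so a downloaded artifact can be compared
# against the published values.
require "digest"

def file_checksums(path)
  data = File.binread(path)
  {
    "SHA1"   => Digest::SHA1.hexdigest(data),
    "SHA512" => Digest::SHA512.hexdigest(data),
  }
end
```

Comparing both digests guards against a truncated or tampered download before installing the gem.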
data/.gitignore CHANGED
@@ -7,4 +7,5 @@
  /pkg/
  /spec/reports/
  /tmp/
+ .ruby-version
  tumugi_config.rb
data/CHANGELOG.md CHANGED
@@ -1,7 +1,29 @@
  # Change Log
 
- ## [0.2.0](https://github.com/tumugi/tumugi-plugin-bigquery/tree/0.2.0) (2016-06-06)
- [Full Changelog](https://github.com/tumugi/tumugi-plugin-bigquery/compare/v0.1.0...0.2.0)
+ ## [v0.3.0](https://github.com/tumugi/tumugi-plugin-bigquery/tree/v0.3.0) (2016-07-16)
+ [Full Changelog](https://github.com/tumugi/tumugi-plugin-bigquery/compare/v0.2.0...v0.3.0)
+
+ **Implemented enhancements:**
+
+ - Support flatten\_result flag [\#30](https://github.com/tumugi/tumugi-plugin-bigquery/issues/30)
+ - Support mode parameter for BigqueryQueryTask [\#28](https://github.com/tumugi/tumugi-plugin-bigquery/issues/28)
+ - Support standard SQL [\#20](https://github.com/tumugi/tumugi-plugin-bigquery/issues/20)
+ - Support force copy table [\#7](https://github.com/tumugi/tumugi-plugin-bigquery/issues/7)
+
+ **Fixed bugs:**
+
+ - Fix JSON export for FileSystemTarget does not work [\#31](https://github.com/tumugi/tumugi-plugin-bigquery/issues/31)
+
+ **Merged pull requests:**
+
+ - Update tumugi to 0.6 [\#35](https://github.com/tumugi/tumugi-plugin-bigquery/pull/35) ([hakobera](https://github.com/hakobera))
+ - Add JSON export test [\#34](https://github.com/tumugi/tumugi-plugin-bigquery/pull/34) ([hakobera](https://github.com/hakobera))
+ - Fix misc [\#33](https://github.com/tumugi/tumugi-plugin-bigquery/pull/33) ([hakobera](https://github.com/hakobera))
+ - Support force\_copy parameter for bigquery\_copy task [\#32](https://github.com/tumugi/tumugi-plugin-bigquery/pull/32) ([hakobera](https://github.com/hakobera))
+ - Support append mode query and use legacy SQL flag [\#29](https://github.com/tumugi/tumugi-plugin-bigquery/pull/29) ([hakobera](https://github.com/hakobera))
+
+ ## [v0.2.0](https://github.com/tumugi/tumugi-plugin-bigquery/tree/v0.2.0) (2016-06-06)
+ [Full Changelog](https://github.com/tumugi/tumugi-plugin-bigquery/compare/v0.1.0...v0.2.0)
 
  **Implemented enhancements:**
 
@@ -23,8 +45,10 @@
 
  **Merged pull requests:**
 
- - Cache output [\#26](https://github.com/tumugi/tumugi-plugin-bigquery/pull/26) ([hakobera](https://github.com/hakobera))
+ - Update changelog [\#27](https://github.com/tumugi/tumugi-plugin-bigquery/pull/27) ([hakobera](https://github.com/hakobera))
  - Prepare release for 0.2.0 [\#25](https://github.com/tumugi/tumugi-plugin-bigquery/pull/25) ([hakobera](https://github.com/hakobera))
+ - Add rubygems badge [\#3](https://github.com/tumugi/tumugi-plugin-bigquery/pull/3) ([hakobera](https://github.com/hakobera))
+ - Cache output [\#26](https://github.com/tumugi/tumugi-plugin-bigquery/pull/26) ([hakobera](https://github.com/hakobera))
  - Use Thor's invoke instead of system method [\#18](https://github.com/tumugi/tumugi-plugin-bigquery/pull/18) ([hakobera](https://github.com/hakobera))
  - Change test ruby version [\#17](https://github.com/tumugi/tumugi-plugin-bigquery/pull/17) ([hakobera](https://github.com/hakobera))
  - Change tumugi dependency version [\#16](https://github.com/tumugi/tumugi-plugin-bigquery/pull/16) ([hakobera](https://github.com/hakobera))
@@ -32,7 +56,6 @@
  - Add BigqueryLoadTask [\#12](https://github.com/tumugi/tumugi-plugin-bigquery/pull/12) ([hakobera](https://github.com/hakobera))
  - Update dependency gems [\#11](https://github.com/tumugi/tumugi-plugin-bigquery/pull/11) ([hakobera](https://github.com/hakobera))
  - Update tumugi to v0.5.0 [\#9](https://github.com/tumugi/tumugi-plugin-bigquery/pull/9) ([hakobera](https://github.com/hakobera))
- - Add rubygems badge [\#3](https://github.com/tumugi/tumugi-plugin-bigquery/pull/3) ([hakobera](https://github.com/hakobera))
 
  ## [v0.1.0](https://github.com/tumugi/tumugi-plugin-bigquery/tree/v0.1.0) (2016-05-16)
  **Fixed bugs:**
data/README.md CHANGED
@@ -1,8 +1,8 @@
  [![Build Status](https://travis-ci.org/tumugi/tumugi-plugin-bigquery.svg?branch=master)](https://travis-ci.org/tumugi/tumugi-plugin-bigquery) [![Code Climate](https://codeclimate.com/github/tumugi/tumugi-plugin-bigquery/badges/gpa.svg)](https://codeclimate.com/github/tumugi/tumugi-plugin-bigquery) [![Coverage Status](https://coveralls.io/repos/github/tumugi/tumugi-plugin-bigquery/badge.svg?branch=master)](https://coveralls.io/github/tumugi/tumugi-plugin-bigquery) [![Gem Version](https://badge.fury.io/rb/tumugi-plugin-bigquery.svg)](https://badge.fury.io/rb/tumugi-plugin-bigquery)
 
- # tumugi-plugin-bigquery
+ # Google BigQuery plugin for [tumugi](https://github.com/tumugi/tumugi)
 
- tumugi-plugin-bigquery is a plugin for integrate [Google BigQuery](https://cloud.google.com/bigquery/) and [Tumugi](https://github.com/tumugi/tumugi).
+ tumugi-plugin-bigquery is a plugin to integrate [Google BigQuery](https://cloud.google.com/bigquery/) with [tumugi](https://github.com/tumugi/tumugi).
 
  ## Installation
 
@@ -12,17 +12,7 @@ Add this line to your application's Gemfile:
  gem 'tumugi-plugin-bigquery'
  ```
 
- And then execute:
-
- ```sh
- $ bundle
- ```
-
- Or install it yourself as:
-
- ```sb
- $ gem install tumugi-plugin-bigquery
- ```
+ And then execute `bundle install`.
 
  ## Target
 
@@ -30,21 +20,65 @@ $ gem install tumugi-plugin-bigquery
 
  `Tumugi::Plugin::BigqueryDatasetTarget` is a target for a BigQuery dataset.
 
+ #### Parameters
+
+ | name       | type   | required? | default | description |
+ |------------|--------|-----------|---------|------------------------------------------------------------------|
+ | dataset_id | string | required  |         | Dataset ID |
+ | project_id | string | optional  |         | [Project](https://cloud.google.com/compute/docs/projects) ID |
+
+ #### Examples
+
+ ```rb
+ task :task1 do
+ output target(:bigquery_dataset, dataset_id: "your_dataset_id")
+ end
+ ```
+
+ ```rb
+ task :task1 do
+ output target(:bigquery_dataset, project_id: "project_id", dataset_id: "dataset_id")
+ end
+ ```
+
  #### Tumugi::Plugin::BigqueryTableTarget
 
  `Tumugi::Plugin::BigqueryTableTarget` is a target for a BigQuery table.
 
+ #### Parameters
+
+ | name       | type   | required? | default | description |
+ |------------|--------|-----------|---------|------------------------------------------------------------------|
+ | table_id   | string | required  |         | Table ID |
+ | dataset_id | string | required  |         | Dataset ID |
+ | project_id | string | optional  |         | [Project](https://cloud.google.com/compute/docs/projects) ID |
+
+ #### Examples
+
+ ```rb
+ task :task1 do
+ output target(:bigquery_table, table_id: "table_id", dataset_id: "your_dataset_id")
+ end
+ ```
+
  ## Task
 
  ### Tumugi::Plugin::BigqueryDatasetTask
 
  `Tumugi::Plugin::BigqueryDatasetTask` is a task to create a dataset.
 
- #### Usage
+ #### Parameters
+
+ | name       | type   | required? | default | description |
+ |------------|--------|-----------|---------|------------------------------------------------------------------|
+ | dataset_id | string | required  |         | Dataset ID |
+ | project_id | string | optional  |         | [Project](https://cloud.google.com/compute/docs/projects) ID |
+
+ #### Examples
 
  ```rb
  task :task1, type: :bigquery_dataset do
- param_set :dataset_id, 'test'
+ dataset_id 'test'
  end
  ```
 
@@ -52,13 +86,41 @@ end
 
  `Tumugi::Plugin::BigqueryQueryTask` is a task to run `query` and save the result into the table specified by the parameters.
 
- #### Usage
+ #### Parameters
+
+ | name            | type    | required? | default    | description |
+ |-----------------|---------|-----------|------------|-----------------------------------------------------------------------------------------------------------------------------------------------|
+ | query           | string  | required  |            | query to execute |
+ | table_id        | string  | required  |            | destination table ID |
+ | dataset_id      | string  | required  |            | destination dataset ID |
+ | project_id      | string  | optional  |            | destination project ID |
+ | mode            | string  | optional  | "truncate" | specifies the action that occurs if the destination table already exists. [see](#mode) |
+ | flatten_results | bool    | optional  | true       | whether BigQuery automatically flattens nested data in the result table or not. [see](https://cloud.google.com/bigquery/docs/data#flatten) |
+ | use_legacy_sql  | bool    | optional  | true       | use legacy SQL syntax for BigQuery or not |
+ | wait            | integer | optional  | 60         | wait time (seconds) for query execution |
+
+ #### Examples
+
+ ##### truncate mode (default)
 
  ```rb
  task :task1, type: :bigquery_query do
- param_set :query, "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
- param_set :dataset_id, 'test'
- param_set :table_id, "dest_table#{Time.now.to_i}"
+ query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+ table_id "dest_table#{Time.now.to_i}"
+ dataset_id "test"
+ end
+ ```
+
+ ##### append mode
+
+ If you set `mode` to `append`, the query result is appended to the existing table.
+
+ ```rb
+ task :task1, type: :bigquery_query do
+ query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+ table_id "dest_table#{Time.now.to_i}"
+ dataset_id "test"
+ mode "append"
  end
  ```
 
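The three `mode` values documented above (truncate / append / empty) can be sketched without any BigQuery access, using an in-memory array standing in for a table. `write_rows` is a hypothetical helper for illustration only, not part of the plugin:

```ruby
# Model the write disposition described in the README's mode table:
# "truncate" overwrites existing rows, "append" adds to them, and
# "empty" raises a 'duplicate' error when the table already has data.
def write_rows(table, rows, mode: "truncate")
  case mode
  when "truncate"
    table.replace(rows)          # overwrite whatever was there
  when "append"
    table.concat(rows)           # keep existing rows, add new ones
  when "empty"
    raise "duplicate: table already contains data" unless table.empty?
    table.replace(rows)
  else
    raise ArgumentError, "unknown mode: #{mode}"
  end
  table
end
```

Defaulting to "truncate" mirrors the task's own default, so re-running a query task is idempotent unless append mode is requested explicitly.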
@@ -66,16 +128,46 @@ end
 
  `Tumugi::Plugin::BigqueryCopyTask` is a task to copy the table specified by the parameters.
 
- #### Usage
+ #### Parameters
+
+ | name            | type    | required? | default | description |
+ |-----------------|---------|-----------|---------|---------------------------------------------------------|
+ | src_table_id    | string  | required  |         | source table ID |
+ | src_dataset_id  | string  | required  |         | source dataset ID |
+ | src_project_id  | string  | optional  |         | source project ID |
+ | dest_table_id   | string  | required  |         | destination table ID |
+ | dest_dataset_id | string  | required  |         | destination dataset ID |
+ | dest_project_id | string  | optional  |         | destination project ID |
+ | force_copy      | bool    | optional  | false   | force copy when destination table already exists or not |
+ | wait            | integer | optional  | 60      | wait time (seconds) for query execution |
+
+ #### Examples
 
  Copy `test.src_table` to `test.dest_table`.
 
+ ##### Normal use case
+
+ ```rb
+ task :task1, type: :bigquery_copy do
+ src_table_id "src_table"
+ src_dataset_id "test"
+ dest_table_id "dest_table"
+ dest_dataset_id "test"
+ end
+ ```
+
+ ##### force_copy
+
+ If `force_copy` is `true`, the copy operation always executes even if the destination table exists.
+ This means the data in the destination table is deleted, so be careful when enabling this parameter.
+
  ```rb
  task :task1, type: :bigquery_copy do
- param_set :src_dataset_id, 'test'
- param_set :src_table_id, 'src_table'
- param_set :dest_dataset_id, 'test'
- param_set :dest_table_id, 'dest_table'
+ src_table_id "src_table"
+ src_dataset_id "test"
+ dest_table_id "dest_table"
+ dest_dataset_id "test"
+ force_copy true
  end
  ```
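The `force_copy` behaviour described above can likewise be sketched on an in-memory model; `copy_table` is a hypothetical helper (not the plugin's implementation), where tables live in a plain Hash:

```ruby
# Without force_copy, a copy into an existing destination is skipped
# (the task is considered complete); with force_copy, the destination
# is overwritten with a copy of the source rows.
def copy_table(tables, src, dest, force_copy: false)
  return :skipped if tables.key?(dest) && !force_copy
  tables[dest] = tables.fetch(src).dup
  :copied
end
```

This mirrors why enabling `force_copy` is destructive: the destination's previous contents are replaced on every run.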
@@ -83,25 +175,154 @@ end
 
  `Tumugi::Plugin::BigqueryLoadTask` is a task to load structured data from GCS into BigQuery.
 
- #### Usage
+ #### Parameters
+
+ | name                  | type            | required?                          | default             | description |
+ |-----------------------|-----------------|------------------------------------|---------------------|----------------------------------------------------------------------------------------------------------------------------------------------|
+ | bucket                | string          | required                           |                     | source GCS bucket name |
+ | key                   | string          | required                           |                     | source path of file like "/path/to/file.csv" |
+ | table_id              | string          | required                           |                     | destination table ID |
+ | dataset_id            | string          | required                           |                     | destination dataset ID |
+ | project_id            | string          | optional                           |                     | destination project ID |
+ | schema                | array of object | required when mode is not "append" |                     | see [schema format](#schema) |
+ | mode                  | string          | optional                           | "append"            | specifies the action that occurs if the destination table already exists. [see](#mode) |
+ | source_format         | string          | optional                           | "CSV"               | source file format. [see](#format) |
+ | ignore_unknown_values | bool            | optional                           | false               | indicates if BigQuery should allow extra values that are not represented in the table schema |
+ | max_bad_records       | integer         | optional                           | 0                   | maximum number of bad records that BigQuery can ignore when running the job |
+ | field_delimiter       | string          | optional                           | ","                 | separator for fields in a CSV file. used only when source_format is "CSV" |
+ | allow_jagged_rows     | bool            | optional                           | false               | accept rows that are missing trailing optional columns. The missing values are treated as null. used only when source_format is "CSV" |
+ | allow_quoted_newlines | bool            | optional                           | false               | indicates if BigQuery should allow quoted data sections that contain newline characters in a CSV file. used only when source_format is "CSV" |
+ | quote                 | string          | optional                           | "\"" (double-quote) | value that is used to quote data sections in a CSV file. used only when source_format is "CSV" |
+ | skip_leading_rows     | integer         | optional                           | 0                   | number of rows at the top of a CSV file that BigQuery will skip when loading the data. used only when source_format is "CSV" |
+ | wait                  | integer         | optional                           | 60                  | wait time (seconds) for query execution |
+
+ #### Example
 
  Load `gs://test_bucket/load_data.csv` into `dest_project:dest_dataset.dest_table`
 
  ```rb
  task :task1, type: :bigquery_load do
- param_set :bucket, 'test_bucket'
- param_set :key, 'load_data.csv'
- param_set :project_id, 'dest_project'
- param_set :datset_id, 'dest_dataset'
- param_set :table_id, 'dest_table'
+ bucket "test_bucket"
+ key "load_data.csv"
+ table_id "dest_table"
+ dataset_id "dest_dataset"
+ project_id "dest_project"
+ end
+ ```
+
+ ### Tumugi::Plugin::BigqueryExportTask
+
+ `Tumugi::Plugin::BigqueryExportTask` is a task to export a BigQuery table.
+
+ #### Parameters
+
+ | name               | type    | required? | default            | description |
+ |--------------------|---------|-----------|--------------------|-------------------------------------------------------------------------------------|
+ | project_id         | string  | optional  |                    | source project ID |
+ | job_project_id     | string  | optional  | same as project_id | job running project ID |
+ | dataset_id         | string  | required  |                    | source dataset ID |
+ | table_id           | string  | required  |                    | source table ID |
+ | compression        | string  | optional  | "NONE"             | destination file compression. "NONE": no compression, "GZIP": compression by gzip |
+ | destination_format | string  | optional  | "CSV"              | [destination file format](#format) |
+ | field_delimiter    | string  | optional  | ","                | separator for fields in a CSV file. used only when destination_format is "CSV" |
+ | print_header       | bool    | optional  | true               | print header row in a CSV file. used only when destination_format is "CSV" |
+ | page_size          | integer | optional  | 10000              | number of rows to fetch in one request |
+ | wait               | integer | optional  | 60                 | wait time (seconds) for query execution |
+
+ #### Examples
+
+ ##### Export `src_dataset.src_table` to local file `data.csv`
+
+ ```rb
+ task :task1, type: :bigquery_export do
+ table_id "src_table"
+ dataset_id "src_dataset"
+
+ output target(:local_file, "data.csv")
+ end
+ ```
+
+ ##### Export `src_dataset.src_table` to Google Cloud Storage
+
+ You need [tumugi-plugin-google_cloud_storage](https://github.com/tumugi/tumugi-plugin-google_cloud_storage)
+
+ ```rb
+ task :task1, type: :bigquery_export do
+ table_id "src_table"
+ dataset_id "src_dataset"
+
+ output target(:google_cloud_storage_file, bucket: "bucket", key: "data.csv")
  end
  ```
 
- ### Config Section
+ ##### Export `src_dataset.src_table` to Google Drive
+
+ You need [tumugi-plugin-google_drive](https://github.com/tumugi/tumugi-plugin-google_drive)
+
+ ```rb
+ task :task1, type: :bigquery_export do
+ table_id "src_table"
+ dataset_id "src_dataset"
+
+ output target(:google_drive_file, name: "data.csv")
+ end
+ ```
+
+ ## Common parameter values
+
+ ### mode
+
+ | value    | description |
+ |----------|-------------|
+ | truncate | If the table already exists, BigQuery overwrites the table data. |
+ | append   | If the table already exists, BigQuery appends the data to the table. |
+ | empty    | If the table already exists and contains data, a 'duplicate' error is returned in the job result. |
+
+ ### format
+
+ | value                  | description |
+ |------------------------|--------------------------------------------|
+ | CSV                    | CSV |
+ | NEWLINE_DELIMITED_JSON | Each line is JSON + new line |
+ | AVRO                   | [see](https://avro.apache.org/docs/1.2.0/) |
+
+ ### schema
+
+ The format of the `schema` parameter is an array of nested objects like below:
+
+ ```js
+ [
+ {
+ "name": "column1",
+ "type": "string"
+ },
+ {
+ "name": "column2",
+ "type": "integer",
+ "mode": "repeated"
+ },
+ {
+ "name": "record1",
+ "type": "record",
+ "fields": [
+ {
+ "name": "key1",
+ "type": "integer"
+ },
+ {
+ "name": "key2",
+ "type": "integer"
+ }
+ ]
+ }
+ ]
+ ```
+
+ ## Config Section
 
  tumugi-plugin-bigquery provides a config section named "bigquery" which can specify BigQuery authentication info.
 
- #### Authenticate by client_email and private_key
+ ### Authenticate by client_email and private_key
 
  ```rb
  Tumugi.configure do |config|
@@ -113,7 +334,7 @@ Tumugi.configure do |config|
  end
  ```
 
- #### Authenticate by JSON key file
+ ### Authenticate by JSON key file
 
  ```rb
  Tumugi.configure do |config|
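The nested `schema` format shown in the README diff above (objects with `"name"`/`"type"`, an optional `"mode"`, and recursive `"fields"` for `"record"` types) can be sanity-checked with a short sketch. `valid_schema?` is a hypothetical helper for illustration, not part of the plugin's API:

```ruby
# Recursively check that every field has "name" and "type",
# and that "record" fields carry a valid nested "fields" array.
require "json"

def valid_schema?(fields)
  return false unless fields.is_a?(Array)
  fields.all? do |f|
    f.is_a?(Hash) &&
      f.key?("name") && f.key?("type") &&
      (f["type"] != "record" || valid_schema?(f["fields"]))
  end
end

# The README's example schema, as parsed JSON.
SCHEMA = JSON.parse(<<~JSON)
  [
    { "name": "column1", "type": "string" },
    { "name": "column2", "type": "integer", "mode": "repeated" },
    { "name": "record1", "type": "record",
      "fields": [
        { "name": "key1", "type": "integer" },
        { "name": "key2", "type": "integer" }
      ] }
  ]
JSON
```

Running such a check before submitting a load job catches malformed schemas locally instead of in a failed BigQuery job.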
@@ -1,21 +1,21 @@
  task :task1, type: :bigquery_copy do
- param_set :src_project_id, ->{ input.project_id }
- param_set :src_dataset_id, ->{ input.dataset_id }
- param_set :src_table_id, ->{ input.table_id }
- param_set :dest_dataset_id, "test"
- param_set :dest_table_id, ->{ "dest_table_#{Time.now.to_i}" }
+ src_project_id { input.project_id }
+ src_dataset_id { input.dataset_id }
+ src_table_id { input.table_id }
+ dest_dataset_id "test"
+ dest_table_id { "dest_table_#{Time.now.to_i}" }
 
  requires :task2
  end
 
  task :task2, type: :bigquery_query do
- param_set :query, "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
- param_set :dataset_id, "test" #->{ input.dataset_id }
- param_set :table_id, "dest_#{Time.now.to_i}"
+ query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+ dataset_id { input.dataset_id }
+ table_id "dest_#{Time.now.to_i}"
 
  requires :task3
  end
 
  task :task3, type: :bigquery_dataset do
- param_set :dataset_id, "test"
+ dataset_id "test"
  end
@@ -6,5 +6,5 @@ task :task1 do
  end
 
  task :task2, type: :bigquery_dataset do
- param_set :dataset_id, 'test'
+ dataset_id "test"
  end
@@ -0,0 +1,13 @@
+ task :task1, type: :bigquery_export do
+ dataset_id { input.dataset_id }
+ table_id { input.table_id }
+
+ requires :task2
+ output target(:local_file, "tmp/export.csv")
+ end
+
+ task :task2, type: :bigquery_query do
+ query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+ dataset_id "test"
+ table_id "dest_#{Time.now.to_i}"
+ end
@@ -0,0 +1,22 @@
+ task :task1, type: :bigquery_copy do
+ src_project_id { input.project_id }
+ src_dataset_id { input.dataset_id }
+ src_table_id { input.table_id }
+ dest_dataset_id "test"
+ dest_table_id "dest_table_1"
+ force_copy true
+
+ requires :task2
+ end
+
+ task :task2, type: :bigquery_query do
+ query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+ dataset_id { input.dataset_id }
+ table_id "dest_#{Time.now.to_i}"
+
+ requires :task3
+ end
+
+ task :task3, type: :bigquery_dataset do
+ dataset_id "test"
+ end
@@ -1,11 +1,11 @@
  task :task1, type: :bigquery_load do
  requires :task2
- param_set :bucket, 'tumugi-plugin-bigquery'
- param_set :key, 'test.csv'
- param_set :dataset_id, -> { input.dataset_id }
- param_set :table_id, 'load_test'
- param_set :skip_leading_rows, 1
- param_set :schema, [
+ bucket 'tumugi-plugin-bigquery'
+ key 'test.csv'
+ dataset_id { input.dataset_id }
+ table_id 'load_test'
+ skip_leading_rows 1
+ schema [
  {
  name: 'row_number',
  type: 'INTEGER',
@@ -20,5 +20,5 @@ task :task1, type: :bigquery_load do
  end
 
  task :task2, type: :bigquery_dataset do
- param_set :dataset_id, 'test'
+ dataset_id "test"
  end
@@ -6,7 +6,7 @@ task :task1 do
  end
 
  task :task2, type: :bigquery_query do
- param_set :query, "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
- param_set :dataset_id, 'test'
- param_set :table_id, "dest_#{Time.now.to_i}"
+ query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+ dataset_id "test"
+ table_id "dest_#{Time.now.to_i}"
  end
@@ -0,0 +1,13 @@
+ task :task1 do
+ requires :task2
+ run do
+ log input.table_name
+ end
+ end
+
+ task :task2, type: :bigquery_query do
+ query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+ dataset_id "test"
+ table_id "dest_append"
+ mode "append"
+ end
@@ -178,6 +178,7 @@ module Tumugi
  flatten_results: true,
  priority: "INTERACTIVE",
  use_query_cache: true,
+ use_legacy_sql: true,
  user_defined_function_resources: nil,
  project_id: nil,
  job_project_id: nil,
@@ -191,6 +192,7 @@ module Tumugi
  flatten_results: flatten_results,
  priority: priority,
  use_query_cache: use_query_cache,
+ use_legacy_sql: use_legacy_sql,
  user_defined_function_resources: user_defined_function_resources,
  project_id: project_id || @project_id,
  job_project_id: job_project_id || @project_id,
@@ -1,7 +1,7 @@
  module Tumugi
  module Plugin
  module Bigquery
- VERSION = "0.2.0"
+ VERSION = "0.3.0"
  end
  end
  end
@@ -12,16 +12,25 @@ module Tumugi
  param :dest_project_id, type: :string
  param :dest_dataset_id, type: :string, required: true
  param :dest_table_id, type: :string, required: true
- param :wait, type: :int, default: 60
+ param :force_copy, type: :bool, default: false
+ param :wait, type: :integer, default: 60
 
  def output
  return @output if @output
-
+
  opts = { dataset_id: dest_dataset_id, table_id: dest_table_id }
  opts[:project_id] = dest_project_id if dest_project_id
  @output = Tumugi::Plugin::BigqueryTableTarget.new(opts)
  end
 
+ def completed?
+ if force_copy && !finished?
+ false
+ else
+ super
+ end
+ end
+
  def run
  log "Source: bq://#{src_project_id}/#{src_dataset_id}/#{src_table_id}"
  log "Destination: #{output}"
@@ -10,19 +10,37 @@ module Tumugi
  param :project_id, type: :string
  param :dataset_id, type: :string, required: true
  param :table_id, type: :string, required: true
- param :wait, type: :int, default: 60
+ param :mode, type: :string, default: 'truncate' # append, empty
+ param :flatten_results, type: :bool, default: true
+ param :use_legacy_sql, type: :bool, default: true
+ param :wait, type: :integer, default: 60
 
  def output
  @output ||= Tumugi::Plugin::BigqueryTableTarget.new(project_id: project_id, dataset_id: dataset_id, table_id: table_id)
  end
 
+ def completed?
+ if mode.to_sym == :append && !finished?
+ false
+ else
+ super
+ end
+ end
+
  def run
  log "Launching Query"
  log "Query: #{query}"
  log "Query destination: #{output}"
 
  bq_client = output.client
- bq_client.query(query, project_id: project_id, dataset_id: output.dataset_id, table_id: output.table_id, wait: wait)
+ bq_client.query(query,
+ project_id: project_id,
+ dataset_id: output.dataset_id,
+ table_id: output.table_id,
+ mode: mode.to_sym,
+ flatten_results: flatten_results,
+ use_legacy_sql: use_legacy_sql,
+ wait: wait)
  end
  end
  end
@@ -20,14 +20,14 @@ Gem::Specification.new do |spec|
  spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
  spec.require_paths = ["lib"]
 
- spec.add_runtime_dependency "tumugi", ">= 0.5.1"
+ spec.add_runtime_dependency "tumugi", ">= 0.6.1"
  spec.add_runtime_dependency "kura", "~> 0.2.17"
+ spec.add_runtime_dependency "json", "~> 1.8.3" # json 2.0 does not work with JRuby + MultiJson
 
  spec.add_development_dependency 'bundler', '~> 1.11'
  spec.add_development_dependency 'rake', '~> 10.0'
  spec.add_development_dependency 'test-unit', '~> 3.1'
  spec.add_development_dependency 'test-unit-rr'
  spec.add_development_dependency 'coveralls'
- spec.add_development_dependency 'github_changelog_generator'
  spec.add_development_dependency 'tumugi-plugin-google_cloud_storage'
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: tumugi-plugin-bigquery
  version: !ruby/object:Gem::Version
- version: 0.2.0
+ version: 0.3.0
  platform: ruby
  authors:
  - Kazuyuki Honda
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2016-06-06 00:00:00.000000000 Z
+ date: 2016-07-17 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: tumugi
@@ -16,14 +16,14 @@ dependencies:
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: 0.5.1
+ version: 0.6.1
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: 0.5.1
+ version: 0.6.1
  - !ruby/object:Gem::Dependency
  name: kura
  requirement: !ruby/object:Gem::Requirement
@@ -38,6 +38,20 @@ dependencies:
  - - "~>"
  - !ruby/object:Gem::Version
  version: 0.2.17
+ - !ruby/object:Gem::Dependency
+ name: json
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: 1.8.3
+ type: :runtime
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: 1.8.3
  - !ruby/object:Gem::Dependency
  name: bundler
  requirement: !ruby/object:Gem::Requirement
@@ -108,20 +122,6 @@ dependencies:
  - - ">="
  - !ruby/object:Gem::Version
  version: '0'
- - !ruby/object:Gem::Dependency
- name: github_changelog_generator
- requirement: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- type: :development
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
  - !ruby/object:Gem::Dependency
  name: tumugi-plugin-google_cloud_storage
  requirement: !ruby/object:Gem::Requirement
@@ -152,8 +152,11 @@ files:
  - bin/setup
  - examples/copy.rb
  - examples/dataset.rb
+ - examples/export.rb
+ - examples/force_copy.rb
  - examples/load.rb
  - examples/query.rb
+ - examples/query_append.rb
  - examples/test.csv
  - examples/tumugi_config_example.rb
  - lib/tumugi/plugin/bigquery/client.rb