tumugi-plugin-bigquery 0.2.0 → 0.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.gitignore +1 -0
- data/CHANGELOG.md +27 -4
- data/README.md +254 -33
- data/examples/copy.rb +9 -9
- data/examples/dataset.rb +1 -1
- data/examples/export.rb +13 -0
- data/examples/force_copy.rb +22 -0
- data/examples/load.rb +7 -7
- data/examples/query.rb +3 -3
- data/examples/query_append.rb +13 -0
- data/lib/tumugi/plugin/bigquery/client.rb +2 -0
- data/lib/tumugi/plugin/bigquery/version.rb +1 -1
- data/lib/tumugi/plugin/task/bigquery_copy.rb +11 -2
- data/lib/tumugi/plugin/task/bigquery_query.rb +20 -2
- data/tumugi-plugin-bigquery.gemspec +2 -2
- metadata +21 -18
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 63e4b8a538949b06c7a63d62b60e965c3d167e21
+  data.tar.gz: ead04218cb01d036f9c0c457d6a036ab8b6a12b1
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 1d21aa4a556541f906d566f18fd61b94960eb6021f3c5c749de07dd2be14444533d228d5229a5ffe4d4b934071e670708aeb6c43dad6976e4feb4c8613dd1474
+  data.tar.gz: 389387e0fbcf5e4ab0719260bcefa70d8176ce0b4f96bb8b6d40c4078a23bde30dfd1b03c1189282cc78cdaddda4498c0e57c72325d6a9cbe747a2f0ef44e2b1
data/.gitignore
CHANGED
data/CHANGELOG.md
CHANGED
@@ -1,7 +1,29 @@
 # Change Log
 
-## [
-[Full Changelog](https://github.com/tumugi/tumugi-plugin-bigquery/compare/v0.
+## [v0.3.0](https://github.com/tumugi/tumugi-plugin-bigquery/tree/v0.3.0) (2016-07-16)
+[Full Changelog](https://github.com/tumugi/tumugi-plugin-bigquery/compare/v0.2.0...v0.3.0)
+
+**Implemented enhancements:**
+
+- Support flatten\_result flag [\#30](https://github.com/tumugi/tumugi-plugin-bigquery/issues/30)
+- Support mode parameter for BigqueryQueryTask [\#28](https://github.com/tumugi/tumugi-plugin-bigquery/issues/28)
+- Support standard SQL [\#20](https://github.com/tumugi/tumugi-plugin-bigquery/issues/20)
+- Support force copy table [\#7](https://github.com/tumugi/tumugi-plugin-bigquery/issues/7)
+
+**Fixed bugs:**
+
+- Fix JSON export for FileSystemTarget does not work [\#31](https://github.com/tumugi/tumugi-plugin-bigquery/issues/31)
+
+**Merged pull requests:**
+
+- Update tumugi to 0.6 [\#35](https://github.com/tumugi/tumugi-plugin-bigquery/pull/35) ([hakobera](https://github.com/hakobera))
+- Add JSON export test [\#34](https://github.com/tumugi/tumugi-plugin-bigquery/pull/34) ([hakobera](https://github.com/hakobera))
+- Fix misc [\#33](https://github.com/tumugi/tumugi-plugin-bigquery/pull/33) ([hakobera](https://github.com/hakobera))
+- Support force\_copy parameter for bigquery\_copy task [\#32](https://github.com/tumugi/tumugi-plugin-bigquery/pull/32) ([hakobera](https://github.com/hakobera))
+- Support append mode query and use legacy SQL flag [\#29](https://github.com/tumugi/tumugi-plugin-bigquery/pull/29) ([hakobera](https://github.com/hakobera))
+
+## [v0.2.0](https://github.com/tumugi/tumugi-plugin-bigquery/tree/v0.2.0) (2016-06-06)
+[Full Changelog](https://github.com/tumugi/tumugi-plugin-bigquery/compare/v0.1.0...v0.2.0)
 
 **Implemented enhancements:**
 
@@ -23,8 +45,10 @@
 
 **Merged pull requests:**
 
--
+- Update changelog [\#27](https://github.com/tumugi/tumugi-plugin-bigquery/pull/27) ([hakobera](https://github.com/hakobera))
 - Prepare release for 0.2.0 [\#25](https://github.com/tumugi/tumugi-plugin-bigquery/pull/25) ([hakobera](https://github.com/hakobera))
+- Add rubygems badge [\#3](https://github.com/tumugi/tumugi-plugin-bigquery/pull/3) ([hakobera](https://github.com/hakobera))
+- Cache output [\#26](https://github.com/tumugi/tumugi-plugin-bigquery/pull/26) ([hakobera](https://github.com/hakobera))
 - Use Thor's invoke instead of system method [\#18](https://github.com/tumugi/tumugi-plugin-bigquery/pull/18) ([hakobera](https://github.com/hakobera))
 - Change test ruby version [\#17](https://github.com/tumugi/tumugi-plugin-bigquery/pull/17) ([hakobera](https://github.com/hakobera))
 - Change tumugi dependency version [\#16](https://github.com/tumugi/tumugi-plugin-bigquery/pull/16) ([hakobera](https://github.com/hakobera))
@@ -32,7 +56,6 @@
 - Add BigqueryLoadTask [\#12](https://github.com/tumugi/tumugi-plugin-bigquery/pull/12) ([hakobera](https://github.com/hakobera))
 - Update dependency gems [\#11](https://github.com/tumugi/tumugi-plugin-bigquery/pull/11) ([hakobera](https://github.com/hakobera))
 - Update tumugi to v0.5.0 [\#9](https://github.com/tumugi/tumugi-plugin-bigquery/pull/9) ([hakobera](https://github.com/hakobera))
-- Add rubygems badge [\#3](https://github.com/tumugi/tumugi-plugin-bigquery/pull/3) ([hakobera](https://github.com/hakobera))
 
 ## [v0.1.0](https://github.com/tumugi/tumugi-plugin-bigquery/tree/v0.1.0) (2016-05-16)
 **Fixed bugs:**
data/README.md
CHANGED
@@ -1,8 +1,8 @@
 [](https://travis-ci.org/tumugi/tumugi-plugin-bigquery) [](https://codeclimate.com/github/tumugi/tumugi-plugin-bigquery) [](https://coveralls.io/github/tumugi/tumugi-plugin-bigquery) [](https://badge.fury.io/rb/tumugi-plugin-bigquery)
 
-# tumugi
+# Google BigQuery plugin for [tumugi](https://github.com/tumugi/tumugi)
 
-tumugi-plugin-bigquery is a plugin for integrate [Google BigQuery](https://cloud.google.com/bigquery/) and [
+tumugi-plugin-bigquery is a plugin for integrate [Google BigQuery](https://cloud.google.com/bigquery/) and [tumugi](https://github.com/tumugi/tumugi).
 
 ## Installation
 
@@ -12,17 +12,7 @@ Add this line to your application's Gemfile:
 gem 'tumugi-plugin-bigquery'
 ```
 
-And then execute
-
-```sh
-$ bundle
-```
-
-Or install it yourself as:
-
-```sb
-$ gem install tumugi-plugin-bigquery
-```
+And then execute `bundle install`.
 
 ## Target
 
@@ -30,21 +20,65 @@ $ gem install tumugi-plugin-bigquery
 
 `Tumugi::Plugin::BigqueryDatasetTarget` is target for BigQuery dataset.
 
+#### Parameters
+
+| Name | type | required? | default | description |
+|------------|--------|-----------|---------|------------------------------------------------------------------|
+| dataset_id | string | required | | Dataset ID |
+| project_id | string | optional | | [Project](https://cloud.google.com/compute/docs/projects) ID |
+
+#### Examples
+
+```rb
+task :task1 do
+  output target(:bigquery_dataset, dataset_id: "your_dataset_id")
+end
+```
+
+```rb
+task :task1 do
+  output target(:bigquery_dataset, project_id: "project_id", dataset_id: "dataset_id")
+end
+```
+
 #### Tumugi::Plugin::BigqueryTableTarget
 
 `Tumugi::Plugin::BigqueryDatasetTarget` is target for BigQuery table.
 
+#### Parameters
+
+| name | type | required? | default | description |
+|------------|--------|-----------|---------|------------------------------------------------------------------|
+| table_id | string | required | | Table ID |
+| dataset_id | string | required | | Dataset ID |
+| project_id | string | optional | | [Project](https://cloud.google.com/compute/docs/projects) ID |
+
+#### Examples
+
+```rb
+task :task1 do
+  output target(:bigquery_table, table_id: "table_id", dataset_id: "your_dataset_id")
+end
+```
+
 ## Task
 
 ### Tumugi::Plugin::BigqueryDatasetTask
 
 `Tumugi::Plugin::BigqueryDatasetTask` is task to create a dataset.
 
-####
+#### Parameters
+
+| name | type | required? | default | description |
+|------------|--------|-----------|---------|------------------------------------------------------------------|
+| dataset_id | string | required | | Dataset ID |
+| project_id | string | optional | | [Project](https://cloud.google.com/compute/docs/projects) ID |
+
+#### Examples
 
 ```rb
 task :task1, type: :bigquery_dataset do
-
+  dataset_id 'test'
 end
 ```
 
@@ -52,13 +86,41 @@ end
 
 `Tumugi::Plugin::BigqueryQueryTask` is task to run `query` and save the result into the table which specified by parameter.
 
-####
+#### Parameters
+
+| name | type | required? | default | description |
+|-----------------|---------|-----------|------------|-----------------------------------------------------------------------------------------------------------------------------------------------|
+| query | string | required | | query to execute |
+| table_id | string | required | | destination table ID |
+| dataset_id | string | required | | destination dataset ID |
+| project_id | string | optional | | destination project ID |
+| mode | string | optional | "truncate" | specifies the action that occurs if the destination table already exists. [see](#mode) |
+| flatten_results | boolean | optional | true | when you query nested data, BigQuery automatically flattens the table data or not. [see](https://cloud.google.com/bigquery/docs/data#flatten) |
+| use_legacy_sql | bool | optional | true | use legacy SQL syntanx for BigQuery or not |
+| wait | integer | optional | 60 | wait time (seconds) for query execution |
+
+#### Examples
+
+##### truncate mode (default)
 
 ```rb
 task :task1, type: :bigquery_query do
-
-
-
+  query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+  table_id "dest_table#{Time.now.to_i}"
+  dataset_id "test"
+end
+```
+
+##### append mode
+
+If you set `mode` to `append`, query result append to existing table.
+
+```rb
+task :task1, type: :bigquery_query do
+  query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+  table_id "dest_table#{Time.now.to_i}"
+  dataset_id "test"
+  mode "append"
 end
 ```
 
@@ -66,16 +128,46 @@ end
 
 `Tumugi::Plugin::BigqueryCopyTask` is task to copy table which specified by parameter.
 
-####
+#### Parameters
+
+| name | type | required? | default | description |
+|-----------------|--------|-----------|---------|---------------------------------------------------------|
+| src_table_id | string | required | | source table ID |
+| src_dataset_id | string | required | | source dataset ID |
+| src_project_id | string | optional | | source project ID |
+| dest_table_id | string | required | | destination table ID |
+| dest_dataset_id | string | required | | destination dataset ID |
+| dest_project_id | string | optional | | destination project ID |
+| force_copy | bool | optional | false | force copy when destination table already exists or not |
+| wait | integer| optional | 60 | wait time (seconds) for query execution |
+
+#### Examples
 
 Copy `test.src_table` to `test.dest_table`.
 
+##### Normal usecase
+
+```rb
+task :task1, type: :bigquery_copy do
+  src_table_id "src_table"
+  src_dataset_id "test"
+  dest_table_id "dest_table"
+  dest_dataset_id "test"
+end
+```
+
+##### force_copy
+
+If `force_copy` is `true`, copy operation always execute even if destination table exists.
+This means data of destination table data is deleted, so be carefull to enable this parameter.
+
 ```rb
 task :task1, type: :bigquery_copy do
-
-
-
-
+  src_table_id "src_table"
+  src_dataset_id "test"
+  dest_table_id "dest_table"
+  dest_dataset_id "test"
+  force_copy true
 end
 ```
 
@@ -83,25 +175,154 @@ end
 
 `Tumugi::Plugin::BigqueryLoadTask` is task to load structured data from GCS into BigQuery.
 
-####
+#### Parameters
+
+| name | type | required? | default | description |
+|-----------------------|-----------------|------------------------------------|---------------------|----------------------------------------------------------------------------------------------------------------------------------------------|
+| bucket | string | required | | source GCS bucket name |
+| key | string | required | | source path of file like "/path/to/file.csv" |
+| table_id | string | required | | destination table ID |
+| dataset_id | string | required | | destination dataset ID |
+| project_id | string | optional | | destination project ID |
+| schema | array of object | required when mode is not "append" | | see [schema format](#schema) |
+| mode | string | optional | "append" | specifies the action that occurs if the destination table already exists. [see](#mode) |
+| source_format | string | optional | "CSV" | source file format. [see](#format) |
+| ignore_unknown_values | bool | optional | false | indicates if BigQuery should allow extra values that are not represented in the table schema |
+| max_bad_records | integer | optional | 0 | maximum number of bad records that BigQuery can ignore when running the job |
+| field_delimiter | string | optional | "," | separator for fields in a CSV file. used only when source_format is "CSV" |
+| allow_jagged_rows | bool | optional | false | accept rows that are missing trailing optional columns. The missing values are treated as null. used only when source_format is "CSV" |
+| allow_quoted_newlines | bool | optional | false | indicates if BigQuery should allow quoted data sections that contain newline characters in a CSV file. used only when source_format is "CSV" |
+| quote | string | optional | "\"" (double-quote) | value that is used to quote data sections in a CSV file. used only when source_format is "CSV" |
+| skip_leading_rows | integer | optional | 0 | number of rows at the top of a CSV file that BigQuery will skip when loading the data. used only when source_format is "CSV" |
+| wait | integer | optional | 60 | wait time (seconds) for query execution |
+
+#### Example
 
 Load `gs://test_bucket/load_data.csv` into `dest_project:dest_dataset.dest_table`
 
 ```rb
 task :task1, type: :bigquery_load do
-
-
-
-
-
+  bucket "test_bucket"
+  key "load_data.csv"
+  table_id "dest_table"
+  datset_id "dest_dataset"
+  project_id "dest_project"
+end
+```
+
+### Tumugi::Plugin::BigqueryExportTask
+
+`Tumugi::Plugin::BigqueryExportTask` is task to export BigQuery table.
+
+#### Parameters
+
+| name | type | required? | default | description |
+|--------------------|---------|-----------|--------------------|-------------------------------------------------------------------------------------|
+| project_id | string | optional | | source project ID |
+| job_project_id | string | optional | same as project_id | job running project ID |
+| dataset_id | string | required | true | source dataset ID |
+| table_id | string | required | true | source table ID |
+| compression | string | optional | "NONE" | [destination file compression]. "NONE": no compression, "GZIP": compression by gzip |
+| destination_format | string | optional | "CSV" | [destination file format](#format) |
+| field_delimiter | string | optional | "," | separator for fields in a CSV file. used only when destination_format is "CSV" |
+| print_header | bool | optional | true | print header row in a CSV file. used only when destination_format is "CSV" |
+| page_size | integer | optional | 10000 | Fetch number of rows in one request |
+| wait | integer | optional | 60 | wait time (seconds) for query execution |
+
+#### Examples
+
+##### Export `src_dataset.src_table` to local file `data.csv`
+
+```rb
+task :task1, type: :bigquery_export do
+  table_id "src_table"
+  datset_id "src_dataset"
+
+  output target(:local_file, "data.csv")
+end
+```
+
+##### Export `src_dataset.src_table` to Google Cloud Storage
+
+You need [tumugi-plugin-google_cloud_storage](https://github.com/tumugi/tumugi-plugin-google_cloud_storage)
+
+```rb
+task :task1, type: :bigquery_export do
+  table_id "src_table"
+  datset_id "src_dataset"
+
+  output target(:google_cloud_storage_file, bucket: "bucket", key: "data.csv")
 end
 ```
 
-
+##### Export `src_dataset.src_table` to Google Drive
+
+You need [tumugi-plugin-google_drive](https://github.com/tumugi/tumugi-plugin-google_drive)
+
+```rb
+task :task1, type: :bigquery_export do
+  table_id "src_table"
+  datset_id "src_dataset"
+
+  output target(:google_drive_file, name: "data.csv")
+end
+```
+
+## Common parameter value
+
+### mode
+
+| value | description |
+|----------|-------------|
+| truncate | If the table already exists, BigQuery overwrites the table data. |
+| append | If the table already exists, BigQuery appends the data to the table. |
+| empty | If the table already exists and contains data, a 'duplicate' error is returned in the job result. |
+
+### format
+
+| value | description |
+|------------------------|--------------------------------------------|
+| CSV | CSV |
+| NEWLINE_DELIMITED_JSON | Each line is JSON + new line |
+| AVRO | [see](https://avro.apache.org/docs/1.2.0/) |
+
+### schema
+
+Format of `schema` parameter is array of nested object like below:
+
+```js
+[
+  {
+    "name": "column1",
+    "type": "string"
+  },
+  {
+    "name": "column2",
+    "type": "integer",
+    "mode": "repeated"
+  },
+  {
+    "name": "record1",
+    "type": "record",
+    "fields": [
+      {
+        "name": "key1",
+        "type": "integer",
+      },
+      {
+        "name": "key2",
+        "type": "integer"
+      }
+    ]
+  }
+]
+```
+
+## Config Section
 
 tumugi-plugin-bigquery provide config section named "bigquery" which can specified BigQuery autenticaion info.
 
-
+### Authenticate by client_email and private_key
 
 ```rb
 Tumugi.configure do |config|
@@ -113,7 +334,7 @@ Tumugi.configure do |config|
 end
 ```
 
-
+### Authenticate by JSON key file
 
 ```rb
 Tumugi.configure do |config|
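The `mode` values documented in the README's "Common parameter value" table correspond to BigQuery's write dispositions. As a rough illustration only (this helper is not part of the plugin; the method name is hypothetical, and the disposition strings are the standard BigQuery job-API values), the mapping can be sketched in plain Ruby:

```ruby
# Hypothetical helper (not from tumugi-plugin-bigquery): translate the
# plugin's "mode" parameter values into BigQuery writeDisposition strings.
def write_disposition(mode)
  case mode.to_s
  when 'truncate' then 'WRITE_TRUNCATE' # overwrite existing table data
  when 'append'   then 'WRITE_APPEND'   # add rows to the existing table
  when 'empty'    then 'WRITE_EMPTY'    # fail with a 'duplicate' error if data exists
  else raise ArgumentError, "unknown mode: #{mode}"
  end
end
```

Any other value raises, mirroring the fact that the README documents only these three modes.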
data/examples/copy.rb
CHANGED
@@ -1,21 +1,21 @@
 task :task1, type: :bigquery_copy do
-
-
-
-
-
+  src_project_id { input.project_id }
+  src_dataset_id { input.dataset_id }
+  src_table_id { input.table_id }
+  dest_dataset_id "test"
+  dest_table_id { "dest_table_#{Time.now.to_i}" }
 
   requires :task2
 end
 
 task :task2, type: :bigquery_query do
-
-
-
+  query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+  dataset_id { input.dataset_id }
+  table_id "dest_#{Time.now.to_i}"
 
   requires :task3
 end
 
 task :task3, type: :bigquery_dataset do
-
+  dataset_id "test"
 end
data/examples/dataset.rb
CHANGED
data/examples/export.rb
ADDED
@@ -0,0 +1,13 @@
+task :task1, type: :bigquery_export do
+  dataset_id { input.dataset_id }
+  table_id { input.table_id }
+
+  requires :task2
+  output target(:local_file, "tmp/export.csv")
+end
+
+task :task2, type: :bigquery_query do
+  query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+  dataset_id "test"
+  table_id "dest_#{Time.now.to_i}"
+end
data/examples/force_copy.rb
ADDED
@@ -0,0 +1,22 @@
+task :task1, type: :bigquery_copy do
+  src_project_id { input.project_id }
+  src_dataset_id { input.dataset_id }
+  src_table_id { input.table_id }
+  dest_dataset_id "test"
+  dest_table_id "dest_table_1"
+  force_copy true
+
+  requires :task2
+end
+
+task :task2, type: :bigquery_query do
+  query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+  dataset_id { input.dataset_id }
+  table_id "dest_#{Time.now.to_i}"
+
+  requires :task3
+end
+
+task :task3, type: :bigquery_dataset do
+  dataset_id "test"
+end
data/examples/load.rb
CHANGED
@@ -1,11 +1,11 @@
 task :task1, type: :bigquery_load do
   requires :task2
-
-
-
-
-
-
+  bucket 'tumugi-plugin-bigquery'
+  key 'test.csv'
+  dataset_id { input.dataset_id }
+  table_id 'load_test'
+  skip_leading_rows 1
+  schema [
     {
       name: 'row_number',
       type: 'INTEGER',
@@ -20,5 +20,5 @@ task :task1, type: :bigquery_load do
 end
 
 task :task2, type: :bigquery_dataset do
-
+  dataset_id "test"
 end
data/examples/query.rb
CHANGED
@@ -6,7 +6,7 @@ task :task1 do
 end
 
 task :task2, type: :bigquery_query do
-
-
-
+  query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+  dataset_id "test"
+  table_id "dest_#{Time.now.to_i}"
 end
data/examples/query_append.rb
ADDED
@@ -0,0 +1,13 @@
+task :task1 do
+  requires :task2
+  run do
+    log input.table_name
+  end
+end
+
+task :task2, type: :bigquery_query do
+  query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+  dataset_id "test"
+  table_id "dest_append"
+  mode "append"
+end
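The append-mode example above always re-runs the query task. The reason is the `completed?` override added to `BigqueryQueryTask` in this release: in append mode, the mere existence of the destination table no longer marks the task as done. A simplified sketch in plain Ruby (not the plugin's actual class; the standalone method and its arguments are invented here to isolate the decision logic):

```ruby
# Simplified sketch of the append-mode completion check: an append-mode
# task is only complete once it has finished in the current invocation,
# while other modes fall back to "done if the output table exists".
def completed?(mode, finished, table_exists)
  return false if mode == :append && !finished
  table_exists # default behaviour: complete once the output table exists
end
```

So a truncate-mode task is skipped when its table already exists, but an append-mode task runs again on every invocation.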
data/lib/tumugi/plugin/bigquery/client.rb
CHANGED
@@ -178,6 +178,7 @@ module Tumugi
                 flatten_results: true,
                 priority: "INTERACTIVE",
                 use_query_cache: true,
+                use_legacy_sql: true,
                 user_defined_function_resources: nil,
                 project_id: nil,
                 job_project_id: nil,
@@ -191,6 +192,7 @@ module Tumugi
                 flatten_results: flatten_results,
                 priority: priority,
                 use_query_cache: use_query_cache,
+                use_legacy_sql: use_legacy_sql,
                 user_defined_function_resources: user_defined_function_resources,
                 project_id: project_id || @project_id,
                 job_project_id: job_project_id || @project_id,
@@ -12,16 +12,25 @@ module Tumugi
|
|
|
12
12
|
param :dest_project_id, type: :string
|
|
13
13
|
param :dest_dataset_id, type: :string, required: true
|
|
14
14
|
param :dest_table_id, type: :string, required: true
|
|
15
|
-
param :
|
|
15
|
+
param :force_copy, type: :bool, default: false
|
|
16
|
+
param :wait, type: :integer, default: 60
|
|
16
17
|
|
|
17
18
|
def output
|
|
18
19
|
return @output if @output
|
|
19
|
-
|
|
20
|
+
|
|
20
21
|
opts = { dataset_id: dest_dataset_id, table_id: dest_table_id }
|
|
21
22
|
opts[:project_id] = dest_project_id if dest_project_id
|
|
22
23
|
@output = Tumugi::Plugin::BigqueryTableTarget.new(opts)
|
|
23
24
|
end
|
|
24
25
|
|
|
26
|
+
def completed?
|
|
27
|
+
if force_copy && !finished?
|
|
28
|
+
false
|
|
29
|
+
else
|
|
30
|
+
super
|
|
31
|
+
end
|
|
32
|
+
end
|
|
33
|
+
|
|
25
34
|
def run
|
|
26
35
|
log "Source: bq://#{src_project_id}/#{src_dataset_id}/#{src_table_id}"
|
|
27
36
|
log "Destination: #{output}"
|
|
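The `force_copy` parameter introduced above works the same way as append mode does for queries: when enabled, an existing destination table no longer short-circuits the task. A plain-Ruby sketch of that decision (not the plugin class; the standalone method and its arguments are invented here for illustration):

```ruby
# Simplified sketch of the force_copy behaviour: with force_copy enabled,
# the copy runs once per invocation even if the destination table exists;
# otherwise the copy is only needed when the destination is missing.
def copy_needed?(force_copy, finished, dest_exists)
  return true if force_copy && !finished
  !dest_exists # default behaviour: copy only into a missing destination
end
```

This is why the README warns that enabling `force_copy` will overwrite existing destination data on every run.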
data/lib/tumugi/plugin/task/bigquery_query.rb
CHANGED
@@ -10,19 +10,37 @@ module Tumugi
       param :project_id, type: :string
       param :dataset_id, type: :string, required: true
       param :table_id, type: :string, required: true
-      param :
+      param :mode, type: :string, default: 'truncate' # append, empty
+      param :flatten_results, type: :bool, default: true
+      param :use_legacy_sql, type: :bool, default: true
+      param :wait, type: :integer, default: 60
 
       def output
         @output ||= Tumugi::Plugin::BigqueryTableTarget.new(project_id: project_id, dataset_id: dataset_id, table_id: table_id)
       end
 
+      def completed?
+        if mode.to_sym == :append && !finished?
+          false
+        else
+          super
+        end
+      end
+
       def run
         log "Launching Query"
         log "Query: #{query}"
         log "Query destination: #{output}"
 
         bq_client = output.client
-        bq_client.query(query,
+        bq_client.query(query,
+                        project_id: project_id,
+                        dataset_id: output.dataset_id,
+                        table_id: output.table_id,
+                        mode: mode.to_sym,
+                        flatten_results: flatten_results,
+                        use_legacy_sql: use_legacy_sql,
+                        wait: wait)
       end
     end
 end
data/tumugi-plugin-bigquery.gemspec
CHANGED
@@ -20,14 +20,14 @@ Gem::Specification.new do |spec|
   spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
   spec.require_paths = ["lib"]
 
-  spec.add_runtime_dependency "tumugi", ">= 0.
+  spec.add_runtime_dependency "tumugi", ">= 0.6.1"
   spec.add_runtime_dependency "kura", "~> 0.2.17"
+  spec.add_runtime_dependency "json", "~> 1.8.3" # json 2.0 does not work with JRuby + MultiJson
 
   spec.add_development_dependency 'bundler', '~> 1.11'
   spec.add_development_dependency 'rake', '~> 10.0'
   spec.add_development_dependency 'test-unit', '~> 3.1'
   spec.add_development_dependency 'test-unit-rr'
   spec.add_development_dependency 'coveralls'
-  spec.add_development_dependency 'github_changelog_generator'
   spec.add_development_dependency 'tumugi-plugin-google_cloud_storage'
 end
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: tumugi-plugin-bigquery
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.3.0
 platform: ruby
 authors:
 - Kazuyuki Honda
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2016-
+date: 2016-07-17 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: tumugi
@@ -16,14 +16,14 @@ dependencies:
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version: 0.
+        version: 0.6.1
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
    - - ">="
       - !ruby/object:Gem::Version
-        version: 0.
+        version: 0.6.1
 - !ruby/object:Gem::Dependency
   name: kura
   requirement: !ruby/object:Gem::Requirement
@@ -38,6 +38,20 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: 0.2.17
+- !ruby/object:Gem::Dependency
+  name: json
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 1.8.3
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 1.8.3
 - !ruby/object:Gem::Dependency
   name: bundler
   requirement: !ruby/object:Gem::Requirement
@@ -108,20 +122,6 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
-- !ruby/object:Gem::Dependency
-  name: github_changelog_generator
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
-  type: :development
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
 - !ruby/object:Gem::Dependency
   name: tumugi-plugin-google_cloud_storage
   requirement: !ruby/object:Gem::Requirement
@@ -152,8 +152,11 @@ files:
 - bin/setup
 - examples/copy.rb
 - examples/dataset.rb
+- examples/export.rb
+- examples/force_copy.rb
 - examples/load.rb
 - examples/query.rb
+- examples/query_append.rb
 - examples/test.csv
 - examples/tumugi_config_example.rb
 - lib/tumugi/plugin/bigquery/client.rb