tumugi-plugin-bigquery 0.1.0 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 3ab49f7c9e04361951e1f7ab9eff4019d09c8829
4
- data.tar.gz: 1b5ed8b53e77fb413a98ce6cfe4a682611b33f62
3
+ metadata.gz: 1f82d5d752da3918795afc6cc669a0fb4711cf95
4
+ data.tar.gz: fed486ae8aeb9266d4fd11cf523a19a8507755af
5
5
  SHA512:
6
- metadata.gz: ea2e037bc46885c0e7cec71d085a5883dd9508f597d6e98d275b11464d2a9f27596a26cfd4a9141f51090b9ab2c9a96df63eb27c59ebba12113c00e2cac4d239
7
- data.tar.gz: 1e18b4578eda83a38d200896cf0b8e22a85726b08f956bd20e9e794d81322df18416004d58f977e78b053b3609e559064d04cfe683ab1ecc60d32e8f89f34067
6
+ metadata.gz: 8418f29dfe96d38bcdfa0c5098d59efd819edaa715a7e2af0945c57f70a4d08fa5c477560df636389eefe0bcf40712c4d240b2a97d3301cadebf6e5615808f2b
7
+ data.tar.gz: aa34ee20fdec506277f3ac40b8819b62f7d000db7c802ed4c9fcedd5af1bf33644ecf673b10326c8e64f8d554a9a708cdfc0f9ae7942b5570257adec3108ab28
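The checksums above are the SHA1 and SHA512 digests recorded for the gem's `metadata.gz` and `data.tar.gz` members. As a minimal sketch, assuming the members have been unpacked to local files, a digest can be recomputed with Ruby's standard `digest` library and compared with the recorded value. The helper name `digest_matches?` and the temporary file are illustrative only and are not part of RubyGems.

```ruby
require 'digest'
require 'tempfile'

# Hypothetical helper: true when the file's digest equals the expected hex string.
# `algorithm` is a Digest class such as Digest::SHA1 or Digest::SHA512.
def digest_matches?(path, expected_hex, algorithm: Digest::SHA512)
  algorithm.file(path).hexdigest == expected_hex.strip.downcase
end

# Demo with a throwaway file; in practice `path` would point at an unpacked
# metadata.gz or data.tar.gz and `expected_hex` at the value in checksums.yaml.
Tempfile.create('sample') do |f|
  f.write('abc')
  f.flush
  puts digest_matches?(f.path, 'a9993e364706816aba3e25717850c26c9cd0d89d', algorithm: Digest::SHA1)
end
```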
data/.travis.yml CHANGED
@@ -1,7 +1,9 @@
1
1
  language: ruby
2
2
  cache: bundler
3
3
  rvm:
4
- - 2.1.10
5
- - 2.2.5
6
- - 2.3.1
7
- - jruby-9.0.5.0
4
+ - 2.1
5
+ - 2.3.1
6
+ - jruby-9.1.2.0
7
+ before_install:
8
+ - gem install bundler
9
+
data/CHANGELOG.md ADDED
@@ -0,0 +1,48 @@
1
+ # Change Log
2
+
3
+ ## [0.2.0](https://github.com/tumugi/tumugi-plugin-bigquery/tree/0.2.0) (2016-06-06)
4
+ [Full Changelog](https://github.com/tumugi/tumugi-plugin-bigquery/compare/v0.1.0...0.2.0)
5
+
6
+ **Implemented enhancements:**
7
+
8
+ - Support extract table to FileSystemTarget [\#23](https://github.com/tumugi/tumugi-plugin-bigquery/issues/23)
9
+ - Support load from GCS [\#5](https://github.com/tumugi/tumugi-plugin-bigquery/issues/5)
10
+ - Support extract table to Google Cloud Storage [\#4](https://github.com/tumugi/tumugi-plugin-bigquery/issues/4)
11
+ - Support service account application default auth [\#22](https://github.com/tumugi/tumugi-plugin-bigquery/pull/22) ([hakobera](https://github.com/hakobera))
12
+
13
+ **Fixed bugs:**
14
+
15
+ - Fix typo and dependency [\#24](https://github.com/tumugi/tumugi-plugin-bigquery/pull/24) ([hakobera](https://github.com/hakobera))
16
+ - Fix missing project\_id of dataset/table [\#21](https://github.com/tumugi/tumugi-plugin-bigquery/pull/21) ([hakobera](https://github.com/hakobera))
17
+ - Fix private key file auth does not work [\#19](https://github.com/tumugi/tumugi-plugin-bigquery/pull/19) ([hakobera](https://github.com/hakobera))
18
+ - Fix support private key file in config section [\#13](https://github.com/tumugi/tumugi-plugin-bigquery/pull/13) ([hakobera](https://github.com/hakobera))
19
+
20
+ **Closed issues:**
21
+
22
+ - Update tumugi to v0.5.0 [\#8](https://github.com/tumugi/tumugi-plugin-bigquery/issues/8)
23
+
24
+ **Merged pull requests:**
25
+
26
+ - Cache output [\#26](https://github.com/tumugi/tumugi-plugin-bigquery/pull/26) ([hakobera](https://github.com/hakobera))
27
+ - Prepare release for 0.2.0 [\#25](https://github.com/tumugi/tumugi-plugin-bigquery/pull/25) ([hakobera](https://github.com/hakobera))
28
+ - Use Thor's invoke instead of system method [\#18](https://github.com/tumugi/tumugi-plugin-bigquery/pull/18) ([hakobera](https://github.com/hakobera))
29
+ - Change test ruby version [\#17](https://github.com/tumugi/tumugi-plugin-bigquery/pull/17) ([hakobera](https://github.com/hakobera))
30
+ - Change tumugi dependency version [\#16](https://github.com/tumugi/tumugi-plugin-bigquery/pull/16) ([hakobera](https://github.com/hakobera))
31
+ - Implement extract table to google cloud storage feature [\#15](https://github.com/tumugi/tumugi-plugin-bigquery/pull/15) ([hakobera](https://github.com/hakobera))
32
+ - Add BigqueryLoadTask [\#12](https://github.com/tumugi/tumugi-plugin-bigquery/pull/12) ([hakobera](https://github.com/hakobera))
33
+ - Update dependency gems [\#11](https://github.com/tumugi/tumugi-plugin-bigquery/pull/11) ([hakobera](https://github.com/hakobera))
34
+ - Update tumugi to v0.5.0 [\#9](https://github.com/tumugi/tumugi-plugin-bigquery/pull/9) ([hakobera](https://github.com/hakobera))
35
+ - Add rubygems badge [\#3](https://github.com/tumugi/tumugi-plugin-bigquery/pull/3) ([hakobera](https://github.com/hakobera))
36
+
37
+ ## [v0.1.0](https://github.com/tumugi/tumugi-plugin-bigquery/tree/v0.1.0) (2016-05-16)
38
+ **Fixed bugs:**
39
+
40
+ - Fix unused arguments [\#2](https://github.com/tumugi/tumugi-plugin-bigquery/pull/2) ([hakobera](https://github.com/hakobera))
41
+
42
+ **Merged pull requests:**
43
+
44
+ - First implementation [\#1](https://github.com/tumugi/tumugi-plugin-bigquery/pull/1) ([hakobera](https://github.com/hakobera))
45
+
46
+
47
+
48
+ \* *This Change Log was automatically generated by [github_changelog_generator](https://github.com/skywinder/Github-Changelog-Generator)*
data/README.md CHANGED
@@ -1,4 +1,4 @@
1
- [![Build Status](https://travis-ci.org/tumugi/tumugi-plugin-bigquery.svg?branch=master)](https://travis-ci.org/tumugi/tumugi-plugin-bigquery) [![Code Climate](https://codeclimate.com/github/tumugi/tumugi-plugin-bigquery/badges/gpa.svg)](https://codeclimate.com/github/tumugi/tumugi-plugin-bigquery) [![Coverage Status](https://coveralls.io/repos/github/tumugi/tumugi-plugin-bigquery/badge.svg?branch=master)](https://coveralls.io/github/tumugi/tumugi-plugin-bigquery)
1
+ [![Build Status](https://travis-ci.org/tumugi/tumugi-plugin-bigquery.svg?branch=master)](https://travis-ci.org/tumugi/tumugi-plugin-bigquery) [![Code Climate](https://codeclimate.com/github/tumugi/tumugi-plugin-bigquery/badges/gpa.svg)](https://codeclimate.com/github/tumugi/tumugi-plugin-bigquery) [![Coverage Status](https://coveralls.io/repos/github/tumugi/tumugi-plugin-bigquery/badge.svg?branch=master)](https://coveralls.io/github/tumugi/tumugi-plugin-bigquery) [![Gem Version](https://badge.fury.io/rb/tumugi-plugin-bigquery.svg)](https://badge.fury.io/rb/tumugi-plugin-bigquery)
2
2
 
3
3
  # tumugi-plugin-bigquery
4
4
 
@@ -68,6 +68,8 @@ end
68
68
 
69
69
  #### Usage
70
70
 
71
+ Copy `test.src_table` to `test.dest_table`.
72
+
71
73
  ```rb
72
74
  task :task1, type: :bigquery_copy do
73
75
  param_set :src_dataset_id, 'test'
@@ -77,6 +79,24 @@ task :task1, type: :bigquery_copy do
77
79
  end
78
80
  ```
79
81
 
82
+ ### Tumugi::Plugin::BigqueryLoadTask
83
+
84
+ `Tumugi::Plugin::BigqueryLoadTask` is a task that loads structured data from GCS into BigQuery.
85
+
86
+ #### Usage
87
+
88
+ Load `gs://test_bucket/load_data.csv` into `dest_project:dest_dataset.dest_table`
89
+
90
+ ```rb
91
+ task :task1, type: :bigquery_load do
92
+ param_set :bucket, 'test_bucket'
93
+ param_set :key, 'load_data.csv'
94
+ param_set :project_id, 'dest_project'
95
+ param_set :dataset_id, 'dest_dataset'
96
+ param_set :table_id, 'dest_table'
97
+ end
98
+ ```
99
+
80
100
  ### Config Section
81
101
 
82
102
  tumugi-plugin-bigquery provides a config section named "bigquery" which can specify BigQuery authentication info.
@@ -84,7 +104,7 @@ tumugi-plugin-bigquery provide config section named "bigquery" which can specifi
84
104
  #### Authenticate by client_email and private_key
85
105
 
86
106
  ```rb
87
- Tumugi.config do |config|
107
+ Tumugi.configure do |config|
88
108
  config.section("bigquery") do |section|
89
109
  section.project_id = "xxx"
90
110
  section.client_email = "yyy@yyy.iam.gserviceaccount.com"
@@ -96,7 +116,7 @@ end
96
116
  #### Authenticate by JSON key file
97
117
 
98
118
  ```rb
99
- Tumugi.config do |config|
119
+ Tumugi.configure do |config|
100
120
  config.section("bigquery") do |section|
101
121
  section.private_key_file = "/path/to/key.json"
102
122
  end
data/examples/load.rb ADDED
@@ -0,0 +1,24 @@
+ task :task1, type: :bigquery_load do
+   requires :task2
+   param_set :bucket, 'tumugi-plugin-bigquery'
+   param_set :key, 'test.csv'
+   param_set :dataset_id, -> { input.dataset_id }
+   param_set :table_id, 'load_test'
+   param_set :skip_leading_rows, 1
+   param_set :schema, [
+     {
+       name: 'row_number',
+       type: 'INTEGER',
+       mode: 'NULLABLE'
+     },
+     {
+       name: 'value',
+       type: 'INTEGER',
+       mode: 'NULLABLE'
+     },
+   ]
+ end
+
+ task :task2, type: :bigquery_dataset do
+   param_set :dataset_id, 'test'
+ end
data/examples/test.csv ADDED
@@ -0,0 +1,6 @@
+ row_number,value
+ 1,1
+ 2,2
+ 3,3
+ 4,4
+ 5,5
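`examples/load.rb` sets `skip_leading_rows` to 1, presumably because `test.csv` begins with a header line. The sketch below only imitates that behaviour locally in plain Ruby to show which rows would end up in the table. `parse_rows` is a hypothetical helper; the real parsing happens inside BigQuery.

```ruby
CSV_TEXT = <<~CSV
  row_number,value
  1,1
  2,2
  3,3
  4,4
  5,5
CSV

SCHEMA = [
  { name: 'row_number', type: 'INTEGER', mode: 'NULLABLE' },
  { name: 'value',      type: 'INTEGER', mode: 'NULLABLE' }
].freeze

# Hypothetical helper: drop the first `skip_leading_rows` lines, then map the
# remaining fields onto the schema's column names as integers.
def parse_rows(text, schema, skip_leading_rows: 0, field_delimiter: ',')
  names = schema.map { |field| field[:name] }
  lines = text.lines.map(&:chomp).reject(&:empty?)
  lines.drop(skip_leading_rows).map do |line|
    values = line.split(field_delimiter).map { |v| Integer(v) }
    names.zip(values).to_h
  end
end

rows = parse_rows(CSV_TEXT, SCHEMA, skip_leading_rows: 1)
puts rows.length
p rows.first
```

With `skip_leading_rows: 0` the helper would raise on the header line, because `row_number` is not an integer.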
data/examples/tumugi_config_example.rb CHANGED
@@ -1,7 +1,7 @@
1
- Tumugi.config do |c|
2
- c.section('bigquery') do |s|
3
- s.project_id = ENV["PROJECT_ID"]
4
- s.client_email = ENV["CLIENT_EMAIL"]
5
- s.private_key = ENV["PRIVATE_KEY"].gsub(/\\n/, "\n")
1
+ Tumugi.configure do |config|
2
+ config.section('bigquery') do |section|
3
+ section.project_id = ENV["PROJECT_ID"]
4
+ section.client_email = ENV["CLIENT_EMAIL"]
5
+ section.private_key = ENV["PRIVATE_KEY"].gsub(/\\n/, "\n")
6
6
  end
7
7
  end
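The example config reads the private key from an environment variable and passes it through a `gsub`. A likely reason is that a PEM key stored on a single line carries literal backslash-n sequences instead of real newlines. A small sketch of that conversion follows; the helper name and the key body are placeholders.

```ruby
# Hypothetical helper mirroring the gsub in the example config: turn each
# literal backslash + "n" pair into a real newline character.
def unescape_newlines(value)
  value.gsub(/\\n/, "\n")
end

# Single-quoted, so \n below is two characters (backslash and n), as it would
# be in a one-line environment variable. The key body is a placeholder.
escaped = '-----BEGIN PRIVATE KEY-----\nAAAA\n-----END PRIVATE KEY-----\n'
puts unescape_newlines(escaped).lines.length
```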
data/lib/tumugi/plugin/bigquery/client.rb CHANGED
@@ -1,4 +1,5 @@
1
1
  require 'kura'
2
+ require 'json'
2
3
  require_relative './error'
3
4
 
4
5
  Tumugi::Config.register_section('bigquery', :project_id, :client_email, :private_key, :private_key_file)
@@ -9,12 +10,22 @@ module Tumugi
9
10
  class Client
10
11
  attr_reader :project_id
11
12
 
12
- def initialize(project_id: nil, client_email: nil, private_key: nil)
13
- config = Tumugi.config.section('bigquery')
14
- @project_id = project_id || config.project_id
15
- @client_email = client_email || config.client_email
16
- @private_key = private_key || config.private_key
17
- @client = Kura.client(@project_id, @client_email, @private_key)
13
+ def initialize(project_id: nil, client_email: nil, private_key: nil, private_key_file: nil)
14
+ @project_id = project_id
15
+
16
+ if client_email.nil? && private_key.nil? && !private_key_file.nil?
17
+ @client = Kura.client(private_key_file)
18
+ if @project_id.nil?
19
+ key = JSON.parse(File.read(private_key_file))
20
+ @project_id = key['project_id']
21
+ end
22
+ else
23
+ # This method call style is needed for jruby.
24
+ # JRuby cannot handle correctly if method using keyword hash and last hash argument.
25
+ # see https://bugs.ruby-lang.org/issues/7529
26
+ @client = Kura.client(project_id = { "project_id" => @project_id, "client_email" => client_email, "private_key" => private_key },
27
+ client_email = nil, private_key = nil, {http_options: {timeout: 60}})
28
+ end
18
29
  rescue Kura::ApiError => e
19
30
  process_error(e)
20
31
  end
@@ -77,6 +88,12 @@ module Tumugi
77
88
  process_error(e)
78
89
  end
79
90
 
91
+ def table(dataset_id, table_id, project_id: nil)
92
+ @client.table(dataset_id, table_id, project_id: project_id || @project_id)
93
+ rescue Kura::ApiError => e
94
+ process_error(e)
95
+ end
96
+
80
97
  def table_exist?(dataset_id, table_id, project_id: nil)
81
98
  !@client.table(dataset_id, table_id, project_id: project_id || @project_id).nil?
82
99
  rescue Kura::ApiError => e
@@ -163,6 +180,7 @@ module Tumugi
163
180
  use_query_cache: true,
164
181
  user_defined_function_resources: nil,
165
182
  project_id: nil,
183
+ job_project_id: nil,
166
184
  job_id: nil,
167
185
  wait: nil,
168
186
  dry_run: false,
@@ -175,7 +193,7 @@ module Tumugi
175
193
  use_query_cache: use_query_cache,
176
194
  user_defined_function_resources: user_defined_function_resources,
177
195
  project_id: project_id || @project_id,
178
- job_project_id: project_id || @project_id,
196
+ job_project_id: job_project_id || @project_id,
179
197
  job_id: job_id,
180
198
  wait: wait,
181
199
  dry_run: dry_run,
@@ -185,28 +203,38 @@ module Tumugi
185
203
  end
186
204
 
187
205
  def load(dataset_id, table_id, source_uris=nil,
188
- schema: nil, delimiter: ",", field_delimiter: delimiter, mode: :append,
189
- allow_jagged_rows: false, max_bad_records: 0,
206
+ schema: nil,
207
+ field_delimiter: ",",
208
+ mode: :append,
209
+ allow_jagged_rows: false,
210
+ max_bad_records: 0,
190
211
  ignore_unknown_values: false,
191
212
  allow_quoted_newlines: false,
192
- quote: '"', skip_leading_rows: 0,
213
+ quote: '"',
214
+ skip_leading_rows: 0,
193
215
  source_format: "CSV",
194
216
  project_id: nil,
217
+ job_project_id: nil,
195
218
  job_id: nil,
196
219
  file: nil, wait: nil,
197
220
  dry_run: false,
198
221
  &blk)
199
222
  @client.load(dataset_id, table_id, source_uris=source_uris,
200
- schema: schema, delimiter: delimiter, field_delimiter: field_delimiter, mode: mode,
201
- allow_jagged_rows: allow_jagged_rows, max_bad_records: max_bad_records,
223
+ schema: schema,
224
+ field_delimiter: field_delimiter,
225
+ mode: mode,
226
+ allow_jagged_rows: allow_jagged_rows,
227
+ max_bad_records: max_bad_records,
202
228
  ignore_unknown_values: ignore_unknown_values,
203
229
  allow_quoted_newlines: allow_quoted_newlines,
204
- quote: quote, skip_leading_rows: skip_leading_rows,
230
+ quote: quote,
231
+ skip_leading_rows: skip_leading_rows,
205
232
  source_format: source_format,
206
233
  project_id: project_id || @project_id,
207
- job_project_id: project_id || @project_id,
234
+ job_project_id: job_project_id || @project_id,
208
235
  job_id: job_id,
209
- file: file, wait: wait,
236
+ file: file,
237
+ wait: wait,
210
238
  dry_run: dry_run,
211
239
  &blk)
212
240
  rescue Kura::ApiError => e
@@ -219,6 +247,7 @@ module Tumugi
219
247
  field_delimiter: ",",
220
248
  print_header: true,
221
249
  project_id: nil,
250
+ job_project_id: nil,
222
251
  job_id: nil,
223
252
  wait: nil,
224
253
  dry_run: false,
@@ -229,7 +258,7 @@ module Tumugi
229
258
  field_delimiter: field_delimiter,
230
259
  print_header: print_header,
231
260
  project_id: project_id || @project_id,
232
- job_project_id: project_id || @project_id,
261
+ job_project_id: job_project_id || @project_id,
233
262
  job_id: job_id,
234
263
  wait: wait,
235
264
  dry_run: dry_run,
@@ -242,6 +271,7 @@ module Tumugi
242
271
  mode: :truncate,
243
272
  src_project_id: nil,
244
273
  dest_project_id: nil,
274
+ job_project_id: dest_project_id,
245
275
  job_id: nil,
246
276
  wait: nil,
247
277
  dry_run: false,
@@ -250,7 +280,7 @@ module Tumugi
250
280
  mode: mode,
251
281
  src_project_id: src_project_id || @project_id,
252
282
  dest_project_id: dest_project_id || @project_id,
253
- job_project_id: dest_project_id || @project_id,
283
+ job_project_id: job_project_id || @project_id,
254
284
  job_id: job_id,
255
285
  wait: wait,
256
286
  dry_run: dry_run,
@@ -280,7 +310,7 @@ module Tumugi
280
310
  private
281
311
 
282
312
  def process_error(e)
283
- raise Tumugi::Plugin::Bigquery::BigqueryError.new(e.reason, e.message)
313
+ raise Tumugi::Plugin::Bigquery::BigqueryError.new(e.message, e.reason)
284
314
  end
285
315
  end
286
316
  end
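Two fallbacks stand out in the client changes: when only `private_key_file` is given, `project_id` is read from the JSON key file, and `job_project_id` now defaults to the client's own project instead of the per-call `project_id`. The sketch below reproduces just those two fallbacks without Kura. `resolve_project_id` and `resolve_job_project_id` are hypothetical names, and the key file contents are made up.

```ruby
require 'json'
require 'tempfile'

# Hypothetical helper: prefer an explicit project_id, otherwise read it from
# the service-account JSON key file.
def resolve_project_id(project_id: nil, private_key_file: nil)
  return project_id unless project_id.nil?
  return nil if private_key_file.nil?

  JSON.parse(File.read(private_key_file))['project_id']
end

# Hypothetical helper: the project the job runs under defaults to the
# client's own project when the caller does not pass one.
def resolve_job_project_id(job_project_id, client_project_id)
  job_project_id || client_project_id
end

Tempfile.create(['key', '.json']) do |f|
  f.write(JSON.generate('project_id' => 'project-from-key-file'))
  f.flush
  puts resolve_project_id(private_key_file: f.path)
end
puts resolve_job_project_id(nil, 'client-project')
```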
@@ -1,7 +1,7 @@
1
1
  module Tumugi
2
2
  module Plugin
3
3
  module Bigquery
4
- VERSION = "0.1.0"
4
+ VERSION = "0.2.0"
5
5
  end
6
6
  end
7
7
  end
@@ -17,8 +17,8 @@ module Tumugi
17
17
  cfg = Tumugi.config.section('bigquery')
18
18
  @project_id = project_id || cfg.project_id
19
19
  @dataset_id = dataset_id
20
- @client = client || Tumugi::Plugin::Bigquery::Client.new(project_id: @project_id)
21
- @dataset = Tumugi::Plugin::Bigquery::Dataset.new(project_id: @project_id, dataset_id: @dataset_id)
20
+ @client = client || Tumugi::Plugin::Bigquery::Client.new(cfg.to_h.merge(project_id: @project_id))
21
+ @dataset = Tumugi::Plugin::Bigquery::Dataset.new(project_id: @client.project_id, dataset_id: @dataset_id)
22
22
  end
23
23
 
24
24
  def exist?
@@ -18,8 +18,8 @@ module Tumugi
18
18
  @project_id = project_id || cfg.project_id
19
19
  @dataset_id = dataset_id
20
20
  @table_id = table_id
21
- @client = client || Tumugi::Plugin::Bigquery::Client.new(project_id: @project_id)
22
- @table = Tumugi::Plugin::Bigquery::Table.new(project_id: @project_id, dataset_id: @dataset_id, table_id: @table_id)
21
+ @client = client || Tumugi::Plugin::Bigquery::Client.new(cfg.to_h.merge(project_id: @project_id))
22
+ @table = Tumugi::Plugin::Bigquery::Table.new(project_id: @client.project_id, dataset_id: @dataset_id, table_id: @table_id)
23
23
  end
24
24
 
25
25
  def exist?
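Both targets now build the client from the whole `bigquery` config section (`cfg.to_h.merge(project_id: @project_id)`), so credentials such as `private_key_file` reach `Client.new`. Here is a rough illustration, with a `Struct` standing in for the config section and a stub in place of the real client; both are assumptions made for this sketch.

```ruby
# Stand-in for the "bigquery" config section; the real object comes from Tumugi.
BigqueryConfig = Struct.new(:project_id, :client_email, :private_key, :private_key_file)

# Stub that accepts the same keyword arguments as the client constructor in this diff.
class StubClient
  attr_reader :options

  def initialize(project_id: nil, client_email: nil, private_key: nil, private_key_file: nil)
    @options = { project_id: project_id, client_email: client_email,
                 private_key: private_key, private_key_file: private_key_file }
  end
end

def build_client(cfg, project_id: nil)
  resolved = project_id || cfg.project_id
  # `**` spreads the hash as keyword arguments explicitly (needed on Ruby 3+;
  # the diff passes the hash directly, which Ruby 2.x accepted).
  StubClient.new(**cfg.to_h.merge(project_id: resolved))
end

cfg = BigqueryConfig.new('config-project', nil, nil, '/path/to/key.json')
p build_client(cfg, project_id: 'override-project').options
```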
data/lib/tumugi/plugin/task/bigquery_copy.rb CHANGED
@@ -15,9 +15,11 @@ module Tumugi
15
15
  param :wait, type: :int, default: 60
16
16
 
17
17
  def output
18
+ return @output if @output
19
+
18
20
  opts = { dataset_id: dest_dataset_id, table_id: dest_table_id }
19
21
  opts[:project_id] = dest_project_id if dest_project_id
20
- Tumugi::Plugin::BigqueryTableTarget.new(opts)
22
+ @output = Tumugi::Plugin::BigqueryTableTarget.new(opts)
21
23
  end
22
24
 
23
25
  def run
data/lib/tumugi/plugin/task/bigquery_dataset.rb CHANGED
@@ -10,7 +10,7 @@ module Tumugi
10
10
  param :dataset_id, type: :string, required: true
11
11
 
12
12
  def output
13
- Tumugi::Plugin::BigqueryDatasetTarget.new(project_id: project_id, dataset_id: dataset_id)
13
+ @output ||= Tumugi::Plugin::BigqueryDatasetTarget.new(project_id: project_id, dataset_id: dataset_id)
14
14
  end
15
15
 
16
16
  def run
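The task changes above cache the target in `@output`, so repeated calls to `output` return the same object instead of constructing a new target each time. A minimal sketch of the pattern, using placeholder classes rather than the real Tumugi task and target:

```ruby
# Placeholder for a target; it counts how many instances have been built.
class FakeTarget
  @instances = 0
  class << self
    attr_accessor :instances
  end

  def initialize
    self.class.instances += 1
  end
end

class FakeTask
  # Memoize: build the target on the first call, reuse it afterwards.
  def output
    @output ||= FakeTarget.new
  end
end

task = FakeTask.new
first = task.output
second = task.output
puts first.equal?(second)
puts FakeTarget.instances
```

`||=` re-evaluates the right-hand side whenever the cached value is `nil` or `false`, which is fine here because a target object is always truthy.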
data/lib/tumugi/plugin/task/bigquery_export.rb ADDED
@@ -0,0 +1,112 @@
+ require 'json'
+ require 'tumugi'
+ require 'tumugi/plugin/file_system_target'
+ require_relative '../target/bigquery_table'
+
+ module Tumugi
+   module Plugin
+     class BigqueryExportTask < Tumugi::Task
+       Tumugi::Plugin.register_task('bigquery_export', self)
+
+       param :project_id, type: :string
+       param :job_project_id, type: :string
+       param :dataset_id, type: :string, required: true
+       param :table_id, type: :string, required: true
+
+       param :compression, type: :string, default: 'NONE' # GZIP
+       param :destination_format, type: :string, default: 'CSV' # NEWLINE_DELIMITED_JSON, AVRO
+
+       # Only takes effect if destination_format == 'CSV'
+       param :field_delimiter, type: :string, default: ','
+       param :print_header, type: :bool, default: true
+
+       param :page_size, type: :integer, default: 10000
+
+       param :wait, type: :integer, default: 120
+
+       def run
+         unless output.is_a?(Tumugi::Plugin::FileSystemTarget)
+           raise Tumugi::TumugiError.new("BigqueryExportTask#output must return an instance of Tumugi::Plugin::FileSystemTarget")
+         end
+
+         client = Tumugi::Plugin::Bigquery::Client.new(config)
+         table = Tumugi::Plugin::Bigquery::Table.new(project_id: client.project_id, dataset_id: dataset_id, table_id: table_id)
+         job_project_id = client.project_id if job_project_id.nil?
+
+         log "Source: #{table}"
+         log "Destination: #{output}"
+
+         if is_gcs?(output)
+           export_to_gcs(client)
+         else
+           if destination_format.upcase == 'AVRO'
+             raise Tumugi::TumugiError.new("destination_format='AVRO' is only supported when export to Google Cloud Storage")
+           end
+           if compression.upcase == 'GZIP'
+             logger.warn("compression parameter is ignored, it's only supported when export to Google Cloud Storage")
+           end
+           export_to_file_system(client)
+         end
+       end
+
+       private
+
+       def is_gcs?(target)
+         not target.to_s.match(/^gs:\/\/[^\/]+\/.+$/).nil?
+       end
+
+       def export_to_gcs(client)
+         options = {
+           compression: compression.upcase,
+           destination_format: destination_format.upcase,
+           field_delimiter: field_delimiter,
+           print_header: print_header,
+           project_id: client.project_id,
+           job_project_id: job_project_id || client.project_id,
+           wait: wait
+         }
+         client.extract(dataset_id, table_id, output.to_s, options)
+       end
+
+       def export_to_file_system(client)
+         schema ||= client.table(dataset_id, table_id, project_id: client.project_id).schema.fields
+         field_names = schema.map{|f| f.respond_to?(:[]) ? (f["name"] || f[:name]) : f.name }
+         start_index = 0
+         page_token = nil
+         options = {
+           max_result: page_size,
+           project_id: client.project_id,
+         }
+
+         output.open('w') do |file|
+           file.puts field_names.join(field_delimiter) if destination_format == 'CSV' && print_header
+           begin
+             table_data_list = client.list_tabledata(dataset_id, table_id, options.merge(start_index: start_index, page_token: page_token))
+             start_index += page_size
+             page_token = table_data_list[:next_token]
+             table_data_list[:rows].each do |row|
+               file.puts line(field_names, row, destination_format)
+             end
+           end while not page_token.nil?
+         end
+       end
+
+       def line(field_names, row, format)
+         case format
+         when 'CSV'
+           row.map{|v| v[1]}.join(field_delimiter)
+         when 'NEWLINE_DELIMITED_JSON'
+           JSON.generate(row.to_h)
+         end
+       end
+
+       def config
+         cfg = Tumugi.config.section('bigquery').to_h
+         unless project_id.nil?
+           cfg[:project_id] = project_id
+         end
+         cfg
+       end
+     end
+   end
+ end
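The export task makes three decisions that are easy to try in isolation: whether the output is a GCS URI, how a row is serialized for each format, and how pages are fetched until no token remains. The sketch below mirrors that logic with plain Ruby and a fake page source. `gcs_path?`, `format_line`, `each_row` and the `PAGES` data are hypothetical names for this illustration, and rows are assumed to be Hash-like, as the `row.to_h` call in the task suggests.

```ruby
require 'json'

# Same pattern as is_gcs? in the task: "gs://<bucket>/<object path>".
def gcs_path?(target)
  !target.to_s.match(%r{^gs://[^/]+/.+$}).nil?
end

# Serialize one row (column name => value) in the requested format.
def format_line(row, format, field_delimiter: ',')
  case format
  when 'CSV'
    row.map { |_name, value| value }.join(field_delimiter)
  when 'NEWLINE_DELIMITED_JSON'
    JSON.generate(row.to_h)
  end
end

# Hypothetical pager: `fetch_page` is any callable that takes a page token and
# returns { rows: [...], next_token: token_or_nil }. Yields every row in order.
def each_row(fetch_page)
  page_token = nil
  loop do
    page = fetch_page.call(page_token)
    page[:rows].each { |row| yield row }
    page_token = page[:next_token]
    break if page_token.nil?
  end
end

# Fake two-page result set standing in for the tabledata API.
PAGES = {
  nil  => { rows: [{ 'row_number' => 1, 'value' => 1 }], next_token: 't1' },
  't1' => { rows: [{ 'row_number' => 2, 'value' => 2 }], next_token: nil }
}.freeze

each_row(->(token) { PAGES.fetch(token) }) { |row| puts format_line(row, 'CSV') }
```

As in the task, the CSV branch joins raw values without quoting or escaping, so values that contain the delimiter would need extra handling.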
data/lib/tumugi/plugin/task/bigquery_load.rb ADDED
@@ -0,0 +1,73 @@
+ require 'tumugi'
+ require_relative '../target/bigquery_table'
+
+ module Tumugi
+   module Plugin
+     class BigqueryLoadTask < Tumugi::Task
+       Tumugi::Plugin.register_task('bigquery_load', self)
+
+       param :bucket, type: :string, required: true
+       param :key, type: :string, required: true
+       param :project_id, type: :string
+       param :dataset_id, type: :string, required: true
+       param :table_id, type: :string, required: true
+
+       param :schema # type: :array
+       param :field_delimiter, type: :string, default: ','
+       param :mode, type: :string, default: 'append' # truncate, empty
+       param :allow_jagged_rows, type: :bool, default: false
+       param :max_bad_records, type: :integer, default: 0
+       param :ignore_unknown_values, type: :bool, default: false
+       param :allow_quoted_newlines, type: :bool, default: false
+       param :quote, type: :string, default: '"'
+       param :skip_leading_rows, type: :integer, default: 0
+       param :source_format, type: :string, default: 'CSV' # NEWLINE_DELIMITED_JSON, AVRO
+       param :wait, type: :integer, default: 60
+
+       def output
+         return @output if @output
+
+         opts = { dataset_id: dataset_id, table_id: table_id }
+         opts[:project_id] = project_id if project_id
+         @output = Tumugi::Plugin::BigqueryTableTarget.new(opts)
+       end
+
+       def run
+         if mode != 'append'
+           raise Tumugi::ParameterError.new("Parameter 'schema' is required when 'mode' is 'truncate' or 'empty'") if schema.nil?
+         end
+
+         src_uri = "gs://#{bucket}#{normalize_path(key)}"
+         log "Source: #{src_uri}"
+         log "Destination: #{output}"
+
+         bq_client = output.client
+         opts = {
+           schema: schema,
+           field_delimiter: field_delimiter,
+           mode: mode.to_sym,
+           allow_jagged_rows: allow_jagged_rows,
+           max_bad_records: max_bad_records,
+           ignore_unknown_values: ignore_unknown_values,
+           allow_quoted_newlines: allow_quoted_newlines,
+           quote: quote,
+           skip_leading_rows: skip_leading_rows,
+           source_format: source_format,
+           project_id: output.project_id,
+           wait: wait
+         }
+         bq_client.load(output.dataset_id, output.table_id, src_uri, opts)
+       end
+
+       private
+
+       def normalize_path(path)
+         unless path.start_with?('/')
+           "/#{path}"
+         else
+           path
+         end
+       end
+     end
+   end
+ end
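The load task builds its source URI from `bucket` and `key`, and refuses to run without a schema unless the mode is `append`. Those two steps can be sketched without any gem dependencies. `gcs_uri` and `validate_load_params!` are hypothetical names, and `ArgumentError` stands in for `Tumugi::ParameterError`.

```ruby
# Same logic as normalize_path in the task: prefix a slash when the key lacks one.
def normalize_path(path)
  path.start_with?('/') ? path : "/#{path}"
end

def gcs_uri(bucket, key)
  "gs://#{bucket}#{normalize_path(key)}"
end

# Mirrors the guard at the top of `run`: a schema is mandatory unless the
# load appends to an existing table.
def validate_load_params!(mode:, schema:)
  return if mode == 'append' || !schema.nil?

  raise ArgumentError, "Parameter 'schema' is required when 'mode' is 'truncate' or 'empty'"
end

puts gcs_uri('test_bucket', 'load_data.csv')
puts gcs_uri('test_bucket', '/nested/load_data.csv')
validate_load_params!(mode: 'append', schema: nil)
```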
data/lib/tumugi/plugin/task/bigquery_query.rb CHANGED
@@ -13,7 +13,7 @@ module Tumugi
13
13
  param :wait, type: :int, default: 60
14
14
 
15
15
  def output
16
- Tumugi::Plugin::BigqueryTableTarget.new(project_id: project_id, dataset_id: dataset_id, table_id: table_id)
16
+ @output ||= Tumugi::Plugin::BigqueryTableTarget.new(project_id: project_id, dataset_id: dataset_id, table_id: table_id)
17
17
  end
18
18
 
19
19
  def run
data/tumugi-plugin-bigquery.gemspec CHANGED
@@ -20,8 +20,8 @@ Gem::Specification.new do |spec|
20
20
  spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
21
21
  spec.require_paths = ["lib"]
22
22
 
23
- spec.add_runtime_dependency "tumugi", "~> 0.4.5"
24
- spec.add_runtime_dependency "kura", "0.2.16"
23
+ spec.add_runtime_dependency "tumugi", ">= 0.5.1"
24
+ spec.add_runtime_dependency "kura", "~> 0.2.17"
25
25
 
26
26
  spec.add_development_dependency 'bundler', '~> 1.11'
27
27
  spec.add_development_dependency 'rake', '~> 10.0'
@@ -29,4 +29,5 @@ Gem::Specification.new do |spec|
29
29
  spec.add_development_dependency 'test-unit-rr'
30
30
  spec.add_development_dependency 'coveralls'
31
31
  spec.add_development_dependency 'github_changelog_generator'
32
+ spec.add_development_dependency 'tumugi-plugin-google_cloud_storage'
32
33
  end
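The gemspec loosens both runtime constraints: `tumugi` moves from `~> 0.4.5` to `>= 0.5.1`, and `kura` from an exact pin on `0.2.16` to `~> 0.2.17`. The effect of each operator can be checked with RubyGems' own `Gem::Requirement` class. The `satisfied?` helper is just a convenience for this sketch, and the candidate version numbers are illustrative.

```ruby
require 'rubygems'

# Convenience wrapper: does `version` satisfy the `requirement` string?
def satisfied?(requirement, version)
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
end

puts satisfied?('~> 0.2.17', '0.2.18')  # later patch releases are allowed
puts satisfied?('~> 0.2.17', '0.3.0')   # the next minor series is not
puts satisfied?('= 0.2.16', '0.2.17')   # the old exact pin rejects any other version
puts satisfied?('>= 0.5.1', '1.0.0')    # open-ended lower bound
puts satisfied?('~> 0.4.5', '0.5.1')    # the old tumugi constraint excludes 0.5.x
```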
metadata CHANGED
@@ -1,43 +1,43 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: tumugi-plugin-bigquery
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0
4
+ version: 0.2.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Kazuyuki Honda
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2016-05-16 00:00:00.000000000 Z
11
+ date: 2016-06-06 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: tumugi
15
15
  requirement: !ruby/object:Gem::Requirement
16
16
  requirements:
17
- - - "~>"
17
+ - - ">="
18
18
  - !ruby/object:Gem::Version
19
- version: 0.4.5
19
+ version: 0.5.1
20
20
  type: :runtime
21
21
  prerelease: false
22
22
  version_requirements: !ruby/object:Gem::Requirement
23
23
  requirements:
24
- - - "~>"
24
+ - - ">="
25
25
  - !ruby/object:Gem::Version
26
- version: 0.4.5
26
+ version: 0.5.1
27
27
  - !ruby/object:Gem::Dependency
28
28
  name: kura
29
29
  requirement: !ruby/object:Gem::Requirement
30
30
  requirements:
31
- - - '='
31
+ - - "~>"
32
32
  - !ruby/object:Gem::Version
33
- version: 0.2.16
33
+ version: 0.2.17
34
34
  type: :runtime
35
35
  prerelease: false
36
36
  version_requirements: !ruby/object:Gem::Requirement
37
37
  requirements:
38
- - - '='
38
+ - - "~>"
39
39
  - !ruby/object:Gem::Version
40
- version: 0.2.16
40
+ version: 0.2.17
41
41
  - !ruby/object:Gem::Dependency
42
42
  name: bundler
43
43
  requirement: !ruby/object:Gem::Requirement
@@ -122,6 +122,20 @@ dependencies:
122
122
  - - ">="
123
123
  - !ruby/object:Gem::Version
124
124
  version: '0'
125
+ - !ruby/object:Gem::Dependency
126
+ name: tumugi-plugin-google_cloud_storage
127
+ requirement: !ruby/object:Gem::Requirement
128
+ requirements:
129
+ - - ">="
130
+ - !ruby/object:Gem::Version
131
+ version: '0'
132
+ type: :development
133
+ prerelease: false
134
+ version_requirements: !ruby/object:Gem::Requirement
135
+ requirements:
136
+ - - ">="
137
+ - !ruby/object:Gem::Version
138
+ version: '0'
125
139
  description:
126
140
  email:
127
141
  - hakobera@gmail.com
@@ -131,13 +145,16 @@ extra_rdoc_files: []
131
145
  files:
132
146
  - ".gitignore"
133
147
  - ".travis.yml"
148
+ - CHANGELOG.md
134
149
  - Gemfile
135
150
  - README.md
136
151
  - Rakefile
137
152
  - bin/setup
138
153
  - examples/copy.rb
139
154
  - examples/dataset.rb
155
+ - examples/load.rb
140
156
  - examples/query.rb
157
+ - examples/test.csv
141
158
  - examples/tumugi_config_example.rb
142
159
  - lib/tumugi/plugin/bigquery/client.rb
143
160
  - lib/tumugi/plugin/bigquery/dataset.rb
@@ -148,6 +165,8 @@ files:
148
165
  - lib/tumugi/plugin/target/bigquery_table.rb
149
166
  - lib/tumugi/plugin/task/bigquery_copy.rb
150
167
  - lib/tumugi/plugin/task/bigquery_dataset.rb
168
+ - lib/tumugi/plugin/task/bigquery_export.rb
169
+ - lib/tumugi/plugin/task/bigquery_load.rb
151
170
  - lib/tumugi/plugin/task/bigquery_query.rb
152
171
  - tumugi-plugin-bigquery.gemspec
153
172
  homepage: https://github.com/tumugi/tumugi-plugin-bigquery