gcloud 0.2.0 → 0.3.0
- checksums.yaml +8 -8
- data/AUTHENTICATION.md +3 -3
- data/CHANGELOG.md +12 -0
- data/OVERVIEW.md +30 -0
- data/lib/gcloud.rb +126 -9
- data/lib/gcloud/bigquery.rb +399 -0
- data/lib/gcloud/bigquery/connection.rb +592 -0
- data/lib/gcloud/bigquery/copy_job.rb +98 -0
- data/lib/gcloud/bigquery/credentials.rb +29 -0
- data/lib/gcloud/bigquery/data.rb +134 -0
- data/lib/gcloud/bigquery/dataset.rb +662 -0
- data/lib/gcloud/bigquery/dataset/list.rb +51 -0
- data/lib/gcloud/bigquery/errors.rb +62 -0
- data/lib/gcloud/bigquery/extract_job.rb +117 -0
- data/lib/gcloud/bigquery/insert_response.rb +80 -0
- data/lib/gcloud/bigquery/job.rb +283 -0
- data/lib/gcloud/bigquery/job/list.rb +55 -0
- data/lib/gcloud/bigquery/load_job.rb +199 -0
- data/lib/gcloud/bigquery/project.rb +512 -0
- data/lib/gcloud/bigquery/query_data.rb +135 -0
- data/lib/gcloud/bigquery/query_job.rb +151 -0
- data/lib/gcloud/bigquery/table.rb +827 -0
- data/lib/gcloud/bigquery/table/list.rb +55 -0
- data/lib/gcloud/bigquery/view.rb +419 -0
- data/lib/gcloud/credentials.rb +3 -3
- data/lib/gcloud/datastore.rb +15 -3
- data/lib/gcloud/datastore/credentials.rb +3 -2
- data/lib/gcloud/datastore/dataset.rb +5 -1
- data/lib/gcloud/datastore/transaction.rb +1 -1
- data/lib/gcloud/pubsub.rb +14 -3
- data/lib/gcloud/pubsub/credentials.rb +4 -4
- data/lib/gcloud/pubsub/project.rb +5 -1
- data/lib/gcloud/pubsub/topic.rb +5 -0
- data/lib/gcloud/storage.rb +14 -24
- data/lib/gcloud/storage/bucket.rb +10 -4
- data/lib/gcloud/storage/credentials.rb +3 -2
- data/lib/gcloud/storage/file.rb +8 -1
- data/lib/gcloud/storage/project.rb +5 -1
- data/lib/gcloud/upload.rb +54 -0
- data/lib/gcloud/version.rb +1 -1
- metadata +78 -2
checksums.yaml
CHANGED
@@ -1,15 +1,15 @@
 ---
 !binary "U0hBMQ==":
   metadata.gz: !binary |-
-
+    ZDhjZWRiZTI5NTJjY2FhMmNlNGUzNmRjNTYwYWY0MzhiYmFjOWU2ZA==
   data.tar.gz: !binary |-
-
+    MGE5MDc4NzkzOTQ2ZWE3NWE0ZDZhMWI5ZDc2M2M0NGNiOTFkMGFjYg==
 SHA512:
   metadata.gz: !binary |-
-
-
-
+    ODkwZjlhNTcxNTU5ZmQ0OWUzODUxNmY4MDQ1NzIyZTEyOWRiZGI0ZDNkMzc2
+    MDZjNTk0YTM2YzE5M2MzYjI3YTUzYjQ0YzI4YjM3YzUwMDQxYzg4NzdhNWI5
+    NDFkNzE1MDk1YmNkMmEyZTc4ZmNjNzM4NDQzNmVmYTY2ZjA2YzY=
   data.tar.gz: !binary |-
-
-
-
+    Y2FmNTFjMjhjMWJlNWI4NGQ4YmI4ZjQzYmJiYTU0ZGRhNzNmMTZiYjI3YTFk
+    NDI3YjdiNDZlOTE4M2FkOTlhMDViYWM4NmRkOGMzYmJhM2RjMTE0MjA2NjMz
+    NDA3NWM5N2ViMjk3NmE2ZGY1YzdkZDA0YzcwMjIwOGQxMTQ2ZDk=
data/AUTHENTICATION.md
CHANGED
@@ -45,14 +45,14 @@ The **Project ID** and **Credentials JSON** can be placed in environment variables
 Here are the environment variables that Datastore checks for project ID:
 
 1. DATASTORE_PROJECT
-2.
+2. GCLOUD_PROJECT
 
 Here are the environment variables that Datastore checks for credentials:
 
 1. DATASTORE_KEYFILE - Path to JSON file
-2.
+2. GCLOUD_KEYFILE - Path to JSON file
 3. DATASTORE_KEYFILE_JSON - JSON contents
-4.
+4. GCLOUD_KEYFILE_JSON - JSON contents
 
 ### Cloud SDK
 
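The new generic `GCLOUD_PROJECT` and `GCLOUD_KEYFILE*` variables are checked after the service-specific `DATASTORE_*` variables, so existing setups keep working. A minimal sketch of relying on them follows; the project ID and keyfile path are placeholders, and it assumes `Gcloud.new` falls back to these variables as the guide describes:

```ruby
# Placeholders for illustration only; in practice set these in your shell or
# deployment environment rather than in code.
ENV["GCLOUD_PROJECT"] = "my-todo-project"
ENV["GCLOUD_KEYFILE"] = "/path/to/keyfile.json"

require "gcloud"

gcloud  = Gcloud.new       # no project or keyfile passed explicitly
dataset = gcloud.datastore # credentials resolved from the environment
```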
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,17 @@
 # Release History
 
+### 0.3.0 / 2015-08-21
+
+#### Major changes
+
+Add BigQuery service
+
+#### Minor changes
+
+* Improve error messaging when uploading files to Storage
+* Add `GCLOUD_PROJECT` and `GCLOUD_KEYFILE` environment variables
+* Specify OAuth 2.0 scopes when connecting to services
+
 ### 0.2.0 / 2015-07-22
 
 #### Major changes
data/OVERVIEW.md
CHANGED
@@ -8,6 +8,36 @@ $ gem install gcloud
 
 Gcloud aims to make authentication as simple as possible. Google Cloud requires a **Project ID** and **Service Account Credentials** to connect to the APIs. You can learn more about various options for connection on the [Authentication Guide](AUTHENTICATION.md).
 
+# BigQuery
+
+[Google Cloud BigQuery](https://cloud.google.com/bigquery/) ([docs](https://cloud.google.com/bigquery/docs)) enables super-fast, SQL-like queries against append-only tables, using the processing power of Google's infrastructure. Simply move your data into BigQuery and let it handle the hard work. You can control access to both the project and your data based on your business needs, such as giving others the ability to view or query your data.
+
+See the [gcloud-ruby BigQuery API documentation](rdoc-ref:Gcloud::Bigquery) to learn how to connect to Cloud BigQuery using this library.
+
+```ruby
+require "gcloud"
+
+gcloud = Gcloud.new
+bigquery = gcloud.bigquery
+
+# Create a new table to archive todos
+dataset = bigquery.dataset "my-todo-archive"
+table = dataset.create_table "todos",
+                             name: "Todos Archive",
+                             description: "Archive for completed TODO records"
+
+# Load data into the table
+file = File.open "/archive/todos/completed-todos.csv"
+load_job = table.load file
+
+# Run a query for the number of completed todos by owner
+count_sql = "SELECT owner, COUNT(*) AS complete_count FROM todos GROUP BY owner"
+data = bigquery.query count_sql
+data.each do |row|
+  puts row["name"]
+end
+```
+
 # Datastore
 
 [Google Cloud Datastore](https://cloud.google.com/datastore/) ([docs](https://cloud.google.com/datastore/docs)) is a fully managed, schemaless database for storing non-relational data. Cloud Datastore automatically scales with your users and supports ACID transactions, high availability of reads and writes, strong consistency for reads and ancestor queries, and eventual consistency for all other queries.
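In the OVERVIEW example above, `table.load` returns a load job rather than blocking until the data is available. To query immediately after loading, the job can be polled first using the `done?`, `refresh!`, and `failed?` methods documented later in this diff. A sketch, reusing the names from the example above:

```ruby
require "gcloud"

gcloud   = Gcloud.new
bigquery = gcloud.bigquery
dataset  = bigquery.dataset "my-todo-archive"
table    = dataset.table "todos"

file     = File.open "/archive/todos/completed-todos.csv"
load_job = table.load file

# Wait for the load job to finish before querying the new rows.
loop do
  break if load_job.done?
  sleep 1
  load_job.refresh!
end
puts "load failed" if load_job.failed?
```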
data/lib/gcloud.rb
CHANGED
@@ -70,11 +70,26 @@ module Gcloud
   # Creates a new object for connecting to the Datastore service.
   # Each call creates a new connection.
   #
+  # === Parameters
+  #
+  # +options+::
+  #   An optional Hash for controlling additional behavior. (+Hash+)
+  # <code>options[:scope]</code>::
+  #   The OAuth 2.0 scopes controlling the set of resources and operations that
+  #   the connection can access. See {Using OAuth 2.0 to Access Google
+  #   APIs}[https://developers.google.com/identity/protocols/OAuth2]. (+String+
+  #   or +Array+)
+  #
+  #   The default scopes are:
+  #
+  #   * +https://www.googleapis.com/auth/datastore+
+  #   * +https://www.googleapis.com/auth/userinfo.email+
+  #
   # === Returns
   #
   # Gcloud::Datastore::Dataset
   #
-  # ===
+  # === Examples
   #
   #   require "gcloud"
   #
@@ -88,20 +103,43 @@ module Gcloud
   #
   #   dataset.save entity
   #
-  def datastore
+  # You shouldn't need to override the default scope, but it is possible to do
+  # so with the +scope+ option:
+  #
+  #   require "gcloud"
+  #
+  #   gcloud = Gcloud.new
+  #   platform_scope = "https://www.googleapis.com/auth/cloud-platform"
+  #   dataset = gcloud.datastore scope: platform_scope
+  #
+  def datastore options = {}
     require "gcloud/datastore"
-    Gcloud.datastore @project, @keyfile
+    Gcloud.datastore @project, @keyfile, options
   end
 
   ##
   # Creates a new object for connecting to the Storage service.
   # Each call creates a new connection.
   #
+  # === Parameters
+  #
+  # +options+::
+  #   An optional Hash for controlling additional behavior. (+Hash+)
+  # <code>options[:scope]</code>::
+  #   The OAuth 2.0 scopes controlling the set of resources and operations that
+  #   the connection can access. See {Using OAuth 2.0 to Access Google
+  #   APIs}[https://developers.google.com/identity/protocols/OAuth2]. (+String+
+  #   or +Array+)
+  #
+  #   The default scope is:
+  #
+  #   * +https://www.googleapis.com/auth/devstorage.full_control+
+  #
   # === Returns
   #
   # Gcloud::Storage::Project
   #
-  # ===
+  # === Examples
   #
   #   require "gcloud"
   #
@@ -110,20 +148,44 @@ module Gcloud
   #   bucket = storage.bucket "my-bucket"
   #   file = bucket.file "path/to/my-file.ext"
   #
-  def storage
+  # The default scope can be overridden with the +scope+ option. For more
+  # information see {Storage OAuth 2.0
+  # Authentication}[https://cloud.google.com/storage/docs/authentication#oauth].
+  #
+  #   require "gcloud"
+  #
+  #   gcloud = Gcloud.new
+  #   readonly_scope = "https://www.googleapis.com/auth/devstorage.read_only"
+  #   readonly_storage = gcloud.storage scope: readonly_scope
+  #
+  def storage options = {}
     require "gcloud/storage"
-    Gcloud.storage @project, @keyfile
+    Gcloud.storage @project, @keyfile, options
   end
 
   ##
   # Creates a new object for connecting to the Pub/Sub service.
   # Each call creates a new connection.
   #
+  # === Parameters
+  #
+  # +options+::
+  #   An optional Hash for controlling additional behavior. (+Hash+)
+  # <code>options[:scope]</code>::
+  #   The OAuth 2.0 scopes controlling the set of resources and operations that
+  #   the connection can access. See {Using OAuth 2.0 to Access Google
+  #   APIs}[https://developers.google.com/identity/protocols/OAuth2]. (+String+
+  #   or +Array+)
+  #
+  #   The default scope is:
+  #
+  #   * +https://www.googleapis.com/auth/pubsub+
+  #
   # === Returns
   #
   # Gcloud::Pubsub::Project
   #
-  # ===
+  # === Examples
   #
   #   require "gcloud"
   #
@@ -132,9 +194,64 @@ module Gcloud
   #   topic = pubsub.topic "my-topic"
   #   topic.publish "task completed"
   #
-  def pubsub
+  # The default scope can be overridden with the +scope+ option:
+  #
+  #   require "gcloud"
+  #
+  #   gcloud = Gcloud.new
+  #   platform_scope = "https://www.googleapis.com/auth/cloud-platform"
+  #   pubsub = gcloud.pubsub scope: platform_scope
+  #
+  def pubsub options = {}
     require "gcloud/pubsub"
-    Gcloud.pubsub @project, @keyfile
+    Gcloud.pubsub @project, @keyfile, options
+  end
+
+  ##
+  # Creates a new object for connecting to the BigQuery service.
+  # Each call creates a new connection.
+  #
+  # === Parameters
+  #
+  # +options+::
+  #   An optional Hash for controlling additional behavior. (+Hash+)
+  # <code>options[:scope]</code>::
+  #   The OAuth 2.0 scopes controlling the set of resources and operations that
+  #   the connection can access. See {Using OAuth 2.0 to Access Google
+  #   APIs}[https://developers.google.com/identity/protocols/OAuth2]. (+String+
+  #   or +Array+)
+  #
+  #   The default scope is:
+  #
+  #   * +https://www.googleapis.com/auth/bigquery+
+  #
+  # === Returns
+  #
+  # Gcloud::Bigquery::Project
+  #
+  # === Examples
+  #
+  #   require "gcloud"
+  #
+  #   gcloud = Gcloud.new
+  #   bigquery = gcloud.bigquery
+  #   dataset = bigquery.dataset "my-dataset"
+  #   table = dataset.table "my-table"
+  #   table.data.each do |row|
+  #     puts row
+  #   end
+  #
+  # The default scope can be overridden with the +scope+ option:
+  #
+  #   require "gcloud"
+  #
+  #   gcloud = Gcloud.new
+  #   platform_scope = "https://www.googleapis.com/auth/cloud-platform"
+  #   bigquery = gcloud.bigquery scope: platform_scope
+  #
+  def bigquery options = {}
+    require "gcloud/bigquery"
+    Gcloud.bigquery @project, @keyfile, options
   end
 
   ##
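The `options` hash added to `datastore`, `storage`, `pubsub`, and the new `bigquery` helper is forwarded, along with the connection's project and keyfile, to the corresponding module-level constructor, so the two call styles below should produce equivalent connections. A sketch; the project ID and keyfile path are placeholders, and the explicit `Gcloud.new project, keyfile` arguments assume the same signature as 0.2.0:

```ruby
require "gcloud"

platform_scope = "https://www.googleapis.com/auth/cloud-platform"

# Instance helper: forwards @project, @keyfile, and the options hash.
gcloud   = Gcloud.new "my-project", "/path/to/keyfile.json"
bigquery = gcloud.bigquery scope: platform_scope

# Module-level equivalent.
bigquery = Gcloud.bigquery "my-project", "/path/to/keyfile.json",
                           scope: platform_scope
```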
data/lib/gcloud/bigquery.rb
ADDED
@@ -0,0 +1,399 @@
+# Copyright 2015 Google Inc. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+require "gcloud"
+require "gcloud/bigquery/project"
+
+#--
+# Google Cloud BigQuery
+module Gcloud
+  ##
+  # Creates a new +Project+ instance connected to the BigQuery service.
+  # Each call creates a new connection.
+  #
+  # === Parameters
+  #
+  # +project+::
+  #   Identifier for a BigQuery project. If not present, the default project for
+  #   the credentials is used. (+String+)
+  # +keyfile+::
+  #   Keyfile downloaded from Google Cloud. If file path the file must be
+  #   readable. (+String+ or +Hash+)
+  # +options+::
+  #   An optional Hash for controlling additional behavior. (+Hash+)
+  # <code>options[:scope]</code>::
+  #   The OAuth 2.0 scopes controlling the set of resources and operations that
+  #   the connection can access. See {Using OAuth 2.0 to Access Google
+  #   APIs}[https://developers.google.com/identity/protocols/OAuth2]. (+String+
+  #   or +Array+)
+  #
+  #   The default scope is:
+  #
+  #   * +https://www.googleapis.com/auth/bigquery+
+  #
+  # === Returns
+  #
+  # Gcloud::Bigquery::Project
+  #
+  # === Example
+  #
+  #   require "gcloud/bigquery"
+  #
+  #   bigquery = Gcloud.bigquery
+  #   dataset = bigquery.dataset "my_dataset"
+  #   table = dataset.table "my_table"
+  #
+  def self.bigquery project = nil, keyfile = nil, options = {}
+    project ||= Gcloud::Bigquery::Project.default_project
+    if keyfile.nil?
+      credentials = Gcloud::Bigquery::Credentials.default options
+    else
+      credentials = Gcloud::Bigquery::Credentials.new keyfile, options
+    end
+    Gcloud::Bigquery::Project.new project, credentials
+  end
+
+  ##
+  # = Google Cloud BigQuery
+  #
+  # Google Cloud BigQuery enables super-fast, SQL-like queries against massive
+  # datasets, using the processing power of Google's infrastructure. To learn
+  # more, read {What is
+  # BigQuery?}[https://cloud.google.com/bigquery/what-is-bigquery].
+  #
+  # Gcloud's goal is to provide an API that is familiar and comfortable to
+  # Rubyists. Authentication is handled by Gcloud#bigquery. You can provide
+  # the project and credential information to connect to the BigQuery service,
+  # or if you are running on Google Compute Engine this configuration is taken
+  # care of for you. You can read more about the options for connecting in the
+  # {Authentication Guide}[link:AUTHENTICATION.md].
+  #
+  # To help you get started quickly, the first few examples below use a public
+  # dataset provided by Google. As soon as you have {signed
+  # up}[https://cloud.google.com/bigquery/sign-up] to use BigQuery, and provided
+  # that you stay in the free tier for queries, you should be able to run these
+  # first examples without the need to set up billing or to load data (although
+  # we'll show you how to do that too.)
+  #
+  # == Listing Datasets and Tables
+  #
+  # A BigQuery project holds datasets, which in turn hold tables. Assuming that
+  # you have not yet created datasets or tables in your own project, let's
+  # connect to Google's +publicdata+ project, and see what you find.
+  #
+  #   require "gcloud"
+  #
+  #   gcloud = Gcloud.new "publicdata"
+  #   bigquery = gcloud.bigquery
+  #
+  #   bigquery.datasets.count #=> 1
+  #   bigquery.datasets.first.dataset_id #=> "samples"
+  #
+  #   dataset = bigquery.datasets.first
+  #   tables = dataset.tables
+  #
+  #   tables.count #=> 7
+  #   tables.map &:table_id #=> [..., "shakespeare", "trigrams", "wikipedia"]
+  #
+  # In addition listing all datasets and tables in the project, you can also
+  # retrieve individual datasets and tables by ID. Let's look at the structure
+  # of the +shakespeare+ table, which contains an entry for every word in every
+  # play written by Shakespeare.
+  #
+  #   require "gcloud"
+  #
+  #   gcloud = Gcloud.new "publicdata"
+  #   bigquery = gcloud.bigquery
+  #
+  #   dataset = bigquery.dataset "samples"
+  #   table = dataset.table "shakespeare"
+  #
+  #   table.headers #=> ["word", "word_count", "corpus", "corpus_date"]
+  #   table.rows_count #=> 164656
+  #
+  # Now that you know the column names for the Shakespeare table, you can write
+  # and run a query.
+  #
+  # == Running queries
+  #
+  # BigQuery offers both synchronous and asynchronous methods, as explained in
+  # {Querying Data}[https://cloud.google.com/bigquery/querying-data].
+  #
+  # === Synchronous queries
+  #
+  # Let's start with the simpler synchronous approach. Notice that this time you
+  # are connecting using your own default project. This is necessary for running
+  # a query, since queries need to be able to create tables to hold results.
+  #
+  #   require "gcloud"
+  #
+  #   gcloud = Gcloud.new
+  #   bigquery = gcloud.bigquery
+  #
+  #   sql = "SELECT TOP(word, 50) as word, COUNT(*) as count " +
+  #         "FROM publicdata:samples.shakespeare"
+  #   data = bigquery.query sql
+  #
+  #   data.count #=> 50
+  #   data.next? #=> false
+  #   data.first #=> {"word"=>"you", "count"=>42}
+  #
+  # The +TOP+ function shown above is just one of a variety of functions
+  # offered by BigQuery. See the {Query
+  # Reference}[https://cloud.google.com/bigquery/query-reference] for a full
+  # listing.
+  #
+  # === Asynchronous queries
+  #
+  # Because you probably should not block for most BigQuery operations,
+  # including querying as well as importing, exporting, and copying data, the
+  # BigQuery API enables you to manage longer-running jobs. In the asynchronous
+  # approach to running a query, an instance of Gcloud::Bigquery::QueryJob is
+  # returned, rather than an instance of Gcloud::Bigquery::QueryData.
+  #
+  #   require "gcloud"
+  #
+  #   gcloud = Gcloud.new
+  #   bigquery = gcloud.bigquery
+  #
+  #   sql = "SELECT TOP(word, 50) as word, COUNT(*) as count " +
+  #         "FROM publicdata:samples.shakespeare"
+  #   job = bigquery.query_job sql
+  #
+  #   loop do
+  #     break if job.done?
+  #     sleep 1
+  #     job.refresh!
+  #   end
+  #   if !job.failed?
+  #     job.query_results.each do |row|
+  #       puts row["word"]
+  #     end
+  #   end
+  #
+  # Once you have determined that the job is done and has not failed, you can
+  # obtain an instance of Gcloud::Bigquery::QueryData by calling
+  # Gcloud::Bigquery::QueryJob#query_results. The query results for both of
+  # the above examples are stored in temporary tables with a lifetime of about
+  # 24 hours. See the final example below for a demonstration of how to store
+  # query results in a permanent table.
+  #
+  # == Creating Datasets and Tables
+  #
+  # The first thing you need to do in a new BigQuery project is to create a
+  # Gcloud::Bigquery::Dataset. Datasets hold tables and control access to them.
+  #
+  #   require "gcloud/bigquery"
+  #
+  #   gcloud = Gcloud.new
+  #   bigquery = gcloud.bigquery
+  #   dataset = bigquery.create_dataset "my_dataset"
+  #
+  # Now that you have a dataset, you can use it to create a table. Every table
+  # is defined by a schema that may contain nested and repeated fields. The
+  # example below shows a schema with a repeated record field named
+  # +cities_lived+. (For more information about nested and repeated fields, see
+  # {Preparing Data for
+  # BigQuery}[https://cloud.google.com/bigquery/preparing-data-for-bigquery].)
+  #
+  #   require "gcloud"
+  #
+  #   gcloud = Gcloud.new
+  #   bigquery = gcloud.bigquery
+  #   dataset = bigquery.dataset "my_dataset"
+  #
+  #   schema = {
+  #     "fields" => [
+  #       {
+  #         "name" => "first_name",
+  #         "type" => "STRING",
+  #         "mode" => "REQUIRED"
+  #       },
+  #       {
+  #         "name" => "cities_lived",
+  #         "type" => "RECORD",
+  #         "mode" => "REPEATED",
+  #         "fields" => [
+  #           {
+  #             "name" => "place",
+  #             "type" => "STRING",
+  #             "mode" => "REQUIRED"
+  #           },
+  #           {
+  #             "name" => "number_of_years",
+  #             "type" => "INTEGER",
+  #             "mode" => "REQUIRED"
+  #           }
+  #         ]
+  #       }
+  #     ]
+  #   }
+  #   table = dataset.create_table "people", schema: schema
+  #
+  # Because of the repeated field in this schema, we cannot use the CSV format
+  # to load data into the table.
+  #
+  # == Loading records
+  #
+  # In addition to CSV, data can be imported from files that are formatted as
+  # {Newline-delimited JSON}[http://jsonlines.org/] or
+  # {Avro}[http://avro.apache.org/], or from a Google Cloud Datastore backup. It
+  # can also be "streamed" into BigQuery.
+  #
+  # To follow along with these examples, you will need to set up billing on the
+  # {Google Developers Console}[https://console.developers.google.com].
+  #
+  # === Streaming records
+  #
+  # For situations in which you want new data to be available for querying as
+  # soon as possible, inserting individual records directly from your Ruby
+  # application is a great approach.
+  #
+  #   require "gcloud"
+  #
+  #   gcloud = Gcloud.new
+  #   bigquery = gcloud.bigquery
+  #   dataset = bigquery.dataset "my_dataset"
+  #   table = dataset.table "people"
+  #
+  #   rows = [
+  #     {
+  #       "first_name" => "Anna",
+  #       "cities_lived" => [
+  #         {
+  #           "place" => "Stockholm",
+  #           "number_of_years" => 2
+  #         }
+  #       ]
+  #     },
+  #     {
+  #       "first_name" => "Bob",
+  #       "cities_lived" => [
+  #         {
+  #           "place" => "Seattle",
+  #           "number_of_years" => 5
+  #         },
+  #         {
+  #           "place" => "Austin",
+  #           "number_of_years" => 6
+  #         }
+  #       ]
+  #     }
+  #   ]
+  #   table.insert rows
+  #
+  # There are some trade-offs involved with streaming, so be sure to read the
+  # discussion of data consistency in {Streaming Data Into
+  # BigQuery}[https://cloud.google.com/bigquery/streaming-data-into-bigquery].
+  #
+  # === Uploading a file
+  #
+  # To follow along with this example, please download the
+  # {names.zip}[http://www.ssa.gov/OACT/babynames/names.zip] archive from the
+  # U.S. Social Security Administration. Inside the archive you will find over
+  # 100 files containing baby name records since the year 1880. A PDF file also
+  # contained in the archive specifies the schema used below.
+  #
+  #   require "gcloud"
+  #
+  #   gcloud = Gcloud.new
+  #   bigquery = gcloud.bigquery
+  #   dataset = bigquery.dataset "my_dataset"
+  #   schema = {
+  #     "fields" => [
+  #       {
+  #         "name" => "name",
+  #         "type" => "STRING",
+  #         "mode" => "REQUIRED"
+  #       },
+  #       {
+  #         "name" => "sex",
+  #         "type" => "STRING",
+  #         "mode" => "REQUIRED"
+  #       },
+  #       {
+  #         "name" => "number",
+  #         "type" => "INTEGER",
+  #         "mode" => "REQUIRED"
+  #       }
+  #     ]
+  #   }
+  #   table = dataset.create_table "baby_names", schema: schema
+  #
+  #   file = File.open "names/yob2014.txt"
+  #   load_job = table.load file, format: "csv"
+  #
+  # Because the names data, although formatted as CSV, is distributed in files
+  # with a +.txt+ extension, this example explicitly passes the +format+ option
+  # in order to demonstrate how to handle such situations. Because CSV is the
+  # default format for load operations, the option is not actually necessary.
+  # For JSON saved with a +.txt+ extension, however, it would be.
+  #
+  # == Exporting query results to Google Cloud Storage
+  #
+  # The example below shows how to pass the +table+ option with a query in order
+  # to store results in a permanent table. It also shows how to export the
+  # result data to a Google Cloud Storage file. In order to follow along, you
+  # will need to enable the Google Cloud Storage API in addition to setting up
+  # billing.
+  #
+  #   require "gcloud"
+  #
+  #   gcloud = Gcloud.new
+  #   bigquery = gcloud.bigquery
+  #   dataset = bigquery.dataset "my_dataset"
+  #   source_table = dataset.table "baby_names"
+  #   result_table = dataset.create_table "baby_names_results"
+  #
+  #   sql = "SELECT name, number as count " +
+  #         "FROM baby_names " +
+  #         "WHERE name CONTAINS 'Sam' " +
+  #         "ORDER BY count DESC"
+  #   query_job = dataset.query_job sql, table: result_table
+  #
+  #   loop do
+  #     break if query_job.done?
+  #     sleep 1
+  #     query_job.refresh!
+  #   end
+  #
+  #   if !query_job.failed?
+  #
+  #     storage = gcloud.storage
+  #     bucket_id = "bigquery-exports-#{SecureRandom.uuid}"
+  #     bucket = storage.create_bucket bucket_id
+  #     extract_url = "gs://#{bucket.id}/baby-names-sam.csv"
+  #
+  #     extract_job = result_table.extract extract_url
+  #
+  #     loop do
+  #       break if extract_job.done?
+  #       sleep 1
+  #       extract_job.refresh!
+  #     end
+  #
+  #     # Download to local filesystem
+  #     bucket.files.first.download "baby-names-sam.csv"
+  #
+  #   end
+  #
+  # If a table you wish to export contains a large amount of data, you can pass
+  # a wildcard URI to export to multiple files (for sharding), or an array of
+  # URIs (for partitioning), or both. See {Exporting Data From
+  # BigQuery}[https://cloud.google.com/bigquery/exporting-data-from-bigquery]
+  # for details.
+  #
+  module Bigquery
+  end
+end
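The asynchronous examples above repeat the same `done?` / `refresh!` polling loop for query, load, and extract jobs. A small hypothetical helper (not part of the gem) that wraps that pattern, using only the job methods shown in the documentation above:

```ruby
require "gcloud"

# Hypothetical helper: poll any BigQuery job until it finishes.
def wait_for_job job, interval: 1
  loop do
    break if job.done?
    sleep interval
    job.refresh!
  end
  job
end

gcloud   = Gcloud.new
bigquery = gcloud.bigquery

sql = "SELECT TOP(word, 50) as word, COUNT(*) as count " +
      "FROM publicdata:samples.shakespeare"
job = wait_for_job bigquery.query_job(sql)

unless job.failed?
  job.query_results.each { |row| puts row["word"] }
end
```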