gcloud 0.2.0 → 0.3.0

Files changed (41)
  1. checksums.yaml +8 -8
  2. data/AUTHENTICATION.md +3 -3
  3. data/CHANGELOG.md +12 -0
  4. data/OVERVIEW.md +30 -0
  5. data/lib/gcloud.rb +126 -9
  6. data/lib/gcloud/bigquery.rb +399 -0
  7. data/lib/gcloud/bigquery/connection.rb +592 -0
  8. data/lib/gcloud/bigquery/copy_job.rb +98 -0
  9. data/lib/gcloud/bigquery/credentials.rb +29 -0
  10. data/lib/gcloud/bigquery/data.rb +134 -0
  11. data/lib/gcloud/bigquery/dataset.rb +662 -0
  12. data/lib/gcloud/bigquery/dataset/list.rb +51 -0
  13. data/lib/gcloud/bigquery/errors.rb +62 -0
  14. data/lib/gcloud/bigquery/extract_job.rb +117 -0
  15. data/lib/gcloud/bigquery/insert_response.rb +80 -0
  16. data/lib/gcloud/bigquery/job.rb +283 -0
  17. data/lib/gcloud/bigquery/job/list.rb +55 -0
  18. data/lib/gcloud/bigquery/load_job.rb +199 -0
  19. data/lib/gcloud/bigquery/project.rb +512 -0
  20. data/lib/gcloud/bigquery/query_data.rb +135 -0
  21. data/lib/gcloud/bigquery/query_job.rb +151 -0
  22. data/lib/gcloud/bigquery/table.rb +827 -0
  23. data/lib/gcloud/bigquery/table/list.rb +55 -0
  24. data/lib/gcloud/bigquery/view.rb +419 -0
  25. data/lib/gcloud/credentials.rb +3 -3
  26. data/lib/gcloud/datastore.rb +15 -3
  27. data/lib/gcloud/datastore/credentials.rb +3 -2
  28. data/lib/gcloud/datastore/dataset.rb +5 -1
  29. data/lib/gcloud/datastore/transaction.rb +1 -1
  30. data/lib/gcloud/pubsub.rb +14 -3
  31. data/lib/gcloud/pubsub/credentials.rb +4 -4
  32. data/lib/gcloud/pubsub/project.rb +5 -1
  33. data/lib/gcloud/pubsub/topic.rb +5 -0
  34. data/lib/gcloud/storage.rb +14 -24
  35. data/lib/gcloud/storage/bucket.rb +10 -4
  36. data/lib/gcloud/storage/credentials.rb +3 -2
  37. data/lib/gcloud/storage/file.rb +8 -1
  38. data/lib/gcloud/storage/project.rb +5 -1
  39. data/lib/gcloud/upload.rb +54 -0
  40. data/lib/gcloud/version.rb +1 -1
  41. metadata +78 -2
checksums.yaml CHANGED
@@ -1,15 +1,15 @@
 ---
 !binary "U0hBMQ==":
   metadata.gz: !binary |-
-    MmIyMDI3NTljZjg4NGM3YzA2ODY0YmMwMmE5ZDUwOGJmM2I5MmZlMA==
+    ZDhjZWRiZTI5NTJjY2FhMmNlNGUzNmRjNTYwYWY0MzhiYmFjOWU2ZA==
   data.tar.gz: !binary |-
-    ZmY3ZTI0NmQ4MWIzMmYyZmZjY2ViYThkZDRlN2MxZjZiODUzNjhiNg==
+    MGE5MDc4NzkzOTQ2ZWE3NWE0ZDZhMWI5ZDc2M2M0NGNiOTFkMGFjYg==
 SHA512:
   metadata.gz: !binary |-
-    ODE0ZTZjNzI4M2I2MDhlMWYyNTNjYjVhMTIwNzc5NTI2ZmE4MmVlY2NiOGEw
-    YjZjMjlkODNjOWNmMWEwZDdlNmMyYWVkNTEzOTgyYzM2NDJiM2YyZWUyOGUy
-    NmFmYzEzMzQwYWUzNjc5N2Q0MWRhOWY0ZDIyNjEyYzRkMTU0ODg=
+    ODkwZjlhNTcxNTU5ZmQ0OWUzODUxNmY4MDQ1NzIyZTEyOWRiZGI0ZDNkMzc2
+    MDZjNTk0YTM2YzE5M2MzYjI3YTUzYjQ0YzI4YjM3YzUwMDQxYzg4NzdhNWI5
+    NDFkNzE1MDk1YmNkMmEyZTc4ZmNjNzM4NDQzNmVmYTY2ZjA2YzY=
   data.tar.gz: !binary |-
-    YzFmZGEzYTExZTI4Y2FjN2E4MmRlMjdmNjI5ZGE3Y2Q2Mjk2ZDUwYzcxOWNj
-    MDVmZDc1NTI5OWEwN2Y2M2RiYmJmYzJiZTQyNjU5MmY0NDFjZWJlNzM4MGZh
-    ZWMzZDkzMDhlMTkzYmU5ZGNkYzUwYmVkY2RlMTEwMWZlYmQ5NzA=
+    Y2FmNTFjMjhjMWJlNWI4NGQ4YmI4ZjQzYmJiYTU0ZGRhNzNmMTZiYjI3YTFk
+    NDI3YjdiNDZlOTE4M2FkOTlhMDViYWM4NmRkOGMzYmJhM2RjMTE0MjA2NjMz
+    NDA3NWM5N2ViMjk3NmE2ZGY1YzdkZDA0YzcwMjIwOGQxMTQ2ZDk=
data/AUTHENTICATION.md CHANGED
@@ -45,14 +45,14 @@ The **Project ID** and **Credentials JSON** can be placed in environment variabl
 Here are the environment variables that Datastore checks for project ID:
 
 1. DATASTORE_PROJECT
-2. GOOGLE_CLOUD_PROJECT
+2. GCLOUD_PROJECT
 
 Here are the environment variables that Datastore checks for credentials:
 
 1. DATASTORE_KEYFILE - Path to JSON file
-2. GOOGLE_CLOUD_KEYFILE - Path to JSON file
+2. GCLOUD_KEYFILE - Path to JSON file
 3. DATASTORE_KEYFILE_JSON - JSON contents
-4. GOOGLE_CLOUD_KEYFILE_JSON - JSON contents
+4. GCLOUD_KEYFILE_JSON - JSON contents
 
 ### Cloud SDK
 
data/CHANGELOG.md CHANGED
@@ -1,5 +1,17 @@
 # Release History
 
+### 0.3.0 / 2015-08-21
+
+#### Major changes
+
+Add BigQuery service
+
+#### Minor changes
+
+* Improve error messaging when uploading files to Storage
+* Add `GCLOUD_PROJECT` and `GCLOUD_KEYFILE` environment variables
+* Specify OAuth 2.0 scopes when connecting to services
+
 ### 0.2.0 / 2015-07-22
 
 #### Major changes
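
As a quick illustration of the `GCLOUD_PROJECT` and `GCLOUD_KEYFILE` variables introduced above (and listed in the AUTHENTICATION.md hunk), here is a minimal sketch that is not part of the released files; the project ID and keyfile path are placeholders, and the fallback order is the one documented above.

```ruby
require "gcloud"

# Placeholder values; per the documentation above, these variables are
# consulted when no project or keyfile is passed explicitly.
ENV["GCLOUD_PROJECT"] = "my-todo-project"
ENV["GCLOUD_KEYFILE"] = "/path/to/keyfile.json"

gcloud  = Gcloud.new        # project resolved from GCLOUD_PROJECT
dataset = gcloud.datastore  # credentials resolved from GCLOUD_KEYFILE
```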
data/OVERVIEW.md CHANGED
@@ -8,6 +8,36 @@ $ gem install gcloud
 
 Gcloud aims to make authentication as simple as possible. Google Cloud requires a **Project ID** and **Service Account Credentials** to connect to the APIs. You can learn more about various options for connection on the [Authentication Guide](AUTHENTICATION.md).
 
+ # BigQuery
+
+ [Google Cloud BigQuery](https://cloud.google.com/bigquery/) ([docs](https://cloud.google.com/bigquery/docs)) enables super-fast, SQL-like queries against append-only tables, using the processing power of Google's infrastructure. Simply move your data into BigQuery and let it handle the hard work. You can control access to both the project and your data based on your business needs, such as giving others the ability to view or query your data.
+
+ See the [gcloud-ruby BigQuery API documentation](rdoc-ref:Gcloud::Bigquery) to learn how to connect to Cloud BigQuery using this library.
+
+ ```ruby
+ require "gcloud"
+
+ gcloud = Gcloud.new
+ bigquery = gcloud.bigquery
+
+ # Create a new table to archive todos
+ dataset = bigquery.dataset "my-todo-archive"
+ table = dataset.create_table "todos",
+                              name: "Todos Archive",
+                              description: "Archive for completed TODO records"
+
+ # Load data into the table
+ file = File.open "/archive/todos/completed-todos.csv"
+ load_job = table.load file
+
+ # Run a query for the number of completed todos by owner
+ count_sql = "SELECT owner, COUNT(*) AS complete_count FROM todos GROUP BY owner"
+ data = bigquery.query count_sql
+ data.each do |row|
+   puts row["owner"]
+ end
+ ```
+
 # Datastore
 
 [Google Cloud Datastore](https://cloud.google.com/datastore/) ([docs](https://cloud.google.com/datastore/docs)) is a fully managed, schemaless database for storing non-relational data. Cloud Datastore automatically scales with your users and supports ACID transactions, high availability of reads and writes, strong consistency for reads and ancestor queries, and eventual consistency for all other queries.
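
The overview above notes that a **Project ID** and **Service Account Credentials** are required to connect. For readers who prefer not to rely on environment variables, here is a minimal sketch, not part of the released files, that passes them directly; the values are placeholders, and passing the keyfile as the second positional argument is an assumption based on how `@project` and `@keyfile` are used in `lib/gcloud.rb` below.

```ruby
require "gcloud"

# Placeholder project ID and keyfile path; see AUTHENTICATION.md for the
# supported ways to obtain credentials.
gcloud    = Gcloud.new "my-todo-project", "/path/to/keyfile.json"
bigquery  = gcloud.bigquery
datastore = gcloud.datastore
```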
data/lib/gcloud.rb CHANGED
@@ -70,11 +70,26 @@ module Gcloud
  # Creates a new object for connecting to the Datastore service.
  # Each call creates a new connection.
  #
+ # === Parameters
+ #
+ # +options+::
+ #   An optional Hash for controlling additional behavior. (+Hash+)
+ # <code>options[:scope]</code>::
+ #   The OAuth 2.0 scopes controlling the set of resources and operations that
+ #   the connection can access. See {Using OAuth 2.0 to Access Google
+ #   APIs}[https://developers.google.com/identity/protocols/OAuth2]. (+String+
+ #   or +Array+)
+ #
+ #   The default scopes are:
+ #
+ #   * +https://www.googleapis.com/auth/datastore+
+ #   * +https://www.googleapis.com/auth/userinfo.email+
+ #
  # === Returns
  #
  # Gcloud::Datastore::Dataset
  #
- # === Example
+ # === Examples
  #
  #   require "gcloud"
  #
@@ -88,20 +103,43 @@ module Gcloud
  #
  #   dataset.save entity
  #
- def datastore
+ # You shouldn't need to override the default scope, but it is possible to do
+ # so with the +scope+ option:
+ #
+ #   require "gcloud"
+ #
+ #   gcloud = Gcloud.new
+ #   platform_scope = "https://www.googleapis.com/auth/cloud-platform"
+ #   dataset = gcloud.datastore scope: platform_scope
+ #
+ def datastore options = {}
    require "gcloud/datastore"
-   Gcloud.datastore @project, @keyfile
+   Gcloud.datastore @project, @keyfile, options
  end

  ##
  # Creates a new object for connecting to the Storage service.
  # Each call creates a new connection.
  #
+ # === Parameters
+ #
+ # +options+::
+ #   An optional Hash for controlling additional behavior. (+Hash+)
+ # <code>options[:scope]</code>::
+ #   The OAuth 2.0 scopes controlling the set of resources and operations that
+ #   the connection can access. See {Using OAuth 2.0 to Access Google
+ #   APIs}[https://developers.google.com/identity/protocols/OAuth2]. (+String+
+ #   or +Array+)
+ #
+ #   The default scope is:
+ #
+ #   * +https://www.googleapis.com/auth/devstorage.full_control+
+ #
  # === Returns
  #
  # Gcloud::Storage::Project
  #
- # === Example
+ # === Examples
  #
  #   require "gcloud"
  #
@@ -110,20 +148,44 @@ module Gcloud
  #   bucket = storage.bucket "my-bucket"
  #   file = bucket.file "path/to/my-file.ext"
  #
- def storage
+ # The default scope can be overridden with the +scope+ option. For more
+ # information see {Storage OAuth 2.0
+ # Authentication}[https://cloud.google.com/storage/docs/authentication#oauth].
+ #
+ #   require "gcloud"
+ #
+ #   gcloud = Gcloud.new
+ #   readonly_scope = "https://www.googleapis.com/auth/devstorage.read_only"
+ #   readonly_storage = gcloud.storage scope: readonly_scope
+ #
+ def storage options = {}
    require "gcloud/storage"
-   Gcloud.storage @project, @keyfile
+   Gcloud.storage @project, @keyfile, options
  end

  ##
  # Creates a new object for connecting to the Pub/Sub service.
  # Each call creates a new connection.
  #
+ # === Parameters
+ #
+ # +options+::
+ #   An optional Hash for controlling additional behavior. (+Hash+)
+ # <code>options[:scope]</code>::
+ #   The OAuth 2.0 scopes controlling the set of resources and operations that
+ #   the connection can access. See {Using OAuth 2.0 to Access Google
+ #   APIs}[https://developers.google.com/identity/protocols/OAuth2]. (+String+
+ #   or +Array+)
+ #
+ #   The default scope is:
+ #
+ #   * +https://www.googleapis.com/auth/pubsub+
+ #
  # === Returns
  #
  # Gcloud::Pubsub::Project
  #
- # === Example
+ # === Examples
  #
  #   require "gcloud"
  #
@@ -132,9 +194,64 @@ module Gcloud
  #   topic = pubsub.topic "my-topic"
  #   topic.publish "task completed"
  #
- def pubsub
+ # The default scope can be overridden with the +scope+ option:
+ #
+ #   require "gcloud"
+ #
+ #   gcloud = Gcloud.new
+ #   platform_scope = "https://www.googleapis.com/auth/cloud-platform"
+ #   pubsub = gcloud.pubsub scope: platform_scope
+ #
+ def pubsub options = {}
    require "gcloud/pubsub"
-   Gcloud.pubsub @project, @keyfile
+   Gcloud.pubsub @project, @keyfile, options
+ end
+
+ ##
+ # Creates a new object for connecting to the BigQuery service.
+ # Each call creates a new connection.
+ #
+ # === Parameters
+ #
+ # +options+::
+ #   An optional Hash for controlling additional behavior. (+Hash+)
+ # <code>options[:scope]</code>::
+ #   The OAuth 2.0 scopes controlling the set of resources and operations that
+ #   the connection can access. See {Using OAuth 2.0 to Access Google
+ #   APIs}[https://developers.google.com/identity/protocols/OAuth2]. (+String+
+ #   or +Array+)
+ #
+ #   The default scope is:
+ #
+ #   * +https://www.googleapis.com/auth/bigquery+
+ #
+ # === Returns
+ #
+ # Gcloud::Bigquery::Project
+ #
+ # === Examples
+ #
+ #   require "gcloud"
+ #
+ #   gcloud = Gcloud.new
+ #   bigquery = gcloud.bigquery
+ #   dataset = bigquery.dataset "my-dataset"
+ #   table = dataset.table "my-table"
+ #   table.data.each do |row|
+ #     puts row
+ #   end
+ #
+ # The default scope can be overridden with the +scope+ option:
+ #
+ #   require "gcloud"
+ #
+ #   gcloud = Gcloud.new
+ #   platform_scope = "https://www.googleapis.com/auth/cloud-platform"
+ #   bigquery = gcloud.bigquery scope: platform_scope
+ #
+ def bigquery options = {}
+   require "gcloud/bigquery"
+   Gcloud.bigquery @project, @keyfile, options
  end

  ##
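
Each of the rdoc blocks above documents the new +scope+ option as accepting a +String+ or an +Array+, but the bundled examples only show a single string. Here is a minimal sketch, not part of the released files, passing several scopes at once; the particular combination of scopes is illustrative only.

```ruby
require "gcloud"

gcloud = Gcloud.new

# The :scope option is documented above as accepting a String or an Array;
# here two standard Google OAuth 2.0 scopes are combined for one connection.
scopes = [
  "https://www.googleapis.com/auth/devstorage.read_only",
  "https://www.googleapis.com/auth/userinfo.email"
]
readonly_storage = gcloud.storage scope: scopes
```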
data/lib/gcloud/bigquery.rb ADDED
@@ -0,0 +1,399 @@
+ # Copyright 2015 Google Inc. All rights reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
+ require "gcloud"
+ require "gcloud/bigquery/project"
+
+ #--
+ # Google Cloud BigQuery
+ module Gcloud
+ ##
+ # Creates a new +Project+ instance connected to the BigQuery service.
+ # Each call creates a new connection.
+ #
+ # === Parameters
+ #
+ # +project+::
+ #   Identifier for a BigQuery project. If not present, the default project for
+ #   the credentials is used. (+String+)
+ # +keyfile+::
+ #   Keyfile downloaded from Google Cloud. If file path the file must be
+ #   readable. (+String+ or +Hash+)
+ # +options+::
+ #   An optional Hash for controlling additional behavior. (+Hash+)
+ # <code>options[:scope]</code>::
+ #   The OAuth 2.0 scopes controlling the set of resources and operations that
+ #   the connection can access. See {Using OAuth 2.0 to Access Google
+ #   APIs}[https://developers.google.com/identity/protocols/OAuth2]. (+String+
+ #   or +Array+)
+ #
+ #   The default scope is:
+ #
+ #   * +https://www.googleapis.com/auth/bigquery+
+ #
+ # === Returns
+ #
+ # Gcloud::Bigquery::Project
+ #
+ # === Example
+ #
+ #   require "gcloud/bigquery"
+ #
+ #   bigquery = Gcloud.bigquery
+ #   dataset = bigquery.dataset "my_dataset"
+ #   table = dataset.table "my_table"
+ #
+ def self.bigquery project = nil, keyfile = nil, options = {}
+   project ||= Gcloud::Bigquery::Project.default_project
+   if keyfile.nil?
+     credentials = Gcloud::Bigquery::Credentials.default options
+   else
+     credentials = Gcloud::Bigquery::Credentials.new keyfile, options
+   end
+   Gcloud::Bigquery::Project.new project, credentials
+ end
+
+ ##
+ # = Google Cloud BigQuery
+ #
+ # Google Cloud BigQuery enables super-fast, SQL-like queries against massive
+ # datasets, using the processing power of Google's infrastructure. To learn
+ # more, read {What is
+ # BigQuery?}[https://cloud.google.com/bigquery/what-is-bigquery].
+ #
+ # Gcloud's goal is to provide an API that is familiar and comfortable to
+ # Rubyists. Authentication is handled by Gcloud#bigquery. You can provide
+ # the project and credential information to connect to the BigQuery service,
+ # or if you are running on Google Compute Engine this configuration is taken
+ # care of for you. You can read more about the options for connecting in the
+ # {Authentication Guide}[link:AUTHENTICATION.md].
+ #
+ # To help you get started quickly, the first few examples below use a public
+ # dataset provided by Google. As soon as you have {signed
+ # up}[https://cloud.google.com/bigquery/sign-up] to use BigQuery, and provided
+ # that you stay in the free tier for queries, you should be able to run these
+ # first examples without the need to set up billing or to load data (although
+ # we'll show you how to do that too.)
+ #
+ # == Listing Datasets and Tables
+ #
+ # A BigQuery project holds datasets, which in turn hold tables. Assuming that
+ # you have not yet created datasets or tables in your own project, let's
+ # connect to Google's +publicdata+ project, and see what you find.
+ #
+ #   require "gcloud"
+ #
+ #   gcloud = Gcloud.new "publicdata"
+ #   bigquery = gcloud.bigquery
+ #
+ #   bigquery.datasets.count #=> 1
+ #   bigquery.datasets.first.dataset_id #=> "samples"
+ #
+ #   dataset = bigquery.datasets.first
+ #   tables = dataset.tables
+ #
+ #   tables.count #=> 7
+ #   tables.map &:table_id #=> [..., "shakespeare", "trigrams", "wikipedia"]
+ # In addition to listing all datasets and tables in the project, you can also
+ # retrieve individual datasets and tables by ID. Let's look at the structure
+ # of the +shakespeare+ table, which contains an entry for every word in every
+ # play written by Shakespeare.
+ #
+ #   require "gcloud"
+ #
+ #   gcloud = Gcloud.new "publicdata"
+ #   bigquery = gcloud.bigquery
+ #
+ #   dataset = bigquery.dataset "samples"
+ #   table = dataset.table "shakespeare"
+ #
+ #   table.headers #=> ["word", "word_count", "corpus", "corpus_date"]
+ #   table.rows_count #=> 164656
+ #
+ # Now that you know the column names for the Shakespeare table, you can write
+ # and run a query.
+ #
+ # == Running queries
+ #
+ # BigQuery offers both synchronous and asynchronous methods, as explained in
+ # {Querying Data}[https://cloud.google.com/bigquery/querying-data].
+ #
+ # === Synchronous queries
+ #
+ # Let's start with the simpler synchronous approach. Notice that this time you
+ # are connecting using your own default project. This is necessary for running
+ # a query, since queries need to be able to create tables to hold results.
+ #
+ #   require "gcloud"
+ #
+ #   gcloud = Gcloud.new
+ #   bigquery = gcloud.bigquery
+ #
+ #   sql = "SELECT TOP(word, 50) as word, COUNT(*) as count " +
+ #         "FROM publicdata:samples.shakespeare"
+ #   data = bigquery.query sql
+ #
+ #   data.count #=> 50
+ #   data.next? #=> false
+ #   data.first #=> {"word"=>"you", "count"=>42}
+ #
+ # The +TOP+ function shown above is just one of a variety of functions
+ # offered by BigQuery. See the {Query
+ # Reference}[https://cloud.google.com/bigquery/query-reference] for a full
+ # listing.
+ #
+ # === Asynchronous queries
+ #
+ # Because you probably should not block for most BigQuery operations,
+ # including querying as well as importing, exporting, and copying data, the
+ # BigQuery API enables you to manage longer-running jobs. In the asynchronous
+ # approach to running a query, an instance of Gcloud::Bigquery::QueryJob is
+ # returned, rather than an instance of Gcloud::Bigquery::QueryData.
+ #
+ #   require "gcloud"
+ #
+ #   gcloud = Gcloud.new
+ #   bigquery = gcloud.bigquery
+ #
+ #   sql = "SELECT TOP(word, 50) as word, COUNT(*) as count " +
+ #         "FROM publicdata:samples.shakespeare"
+ #   job = bigquery.query_job sql
+ #
+ #   loop do
+ #     break if job.done?
+ #     sleep 1
+ #     job.refresh!
+ #   end
+ #   if !job.failed?
+ #     job.query_results.each do |row|
+ #       puts row["word"]
+ #     end
+ #   end
+ #
+ # Once you have determined that the job is done and has not failed, you can
+ # obtain an instance of Gcloud::Bigquery::QueryData by calling
+ # Gcloud::Bigquery::QueryJob#query_results. The query results for both of
+ # the above examples are stored in temporary tables with a lifetime of about
+ # 24 hours. See the final example below for a demonstration of how to store
+ # query results in a permanent table.
+ #
+ # == Creating Datasets and Tables
+ #
+ # The first thing you need to do in a new BigQuery project is to create a
+ # Gcloud::Bigquery::Dataset. Datasets hold tables and control access to them.
+ #
+ #   require "gcloud/bigquery"
+ #
+ #   gcloud = Gcloud.new
+ #   bigquery = gcloud.bigquery
+ #   dataset = bigquery.create_dataset "my_dataset"
+ #
+ # Now that you have a dataset, you can use it to create a table. Every table
+ # is defined by a schema that may contain nested and repeated fields. The
+ # example below shows a schema with a repeated record field named
+ # +cities_lived+. (For more information about nested and repeated fields, see
+ # {Preparing Data for
+ # BigQuery}[https://cloud.google.com/bigquery/preparing-data-for-bigquery].)
+ #
+ #   require "gcloud"
+ #
+ #   gcloud = Gcloud.new
+ #   bigquery = gcloud.bigquery
+ #   dataset = bigquery.dataset "my_dataset"
+ #
+ #   schema = {
+ #     "fields" => [
+ #       {
+ #         "name" => "first_name",
+ #         "type" => "STRING",
+ #         "mode" => "REQUIRED"
+ #       },
+ #       {
+ #         "name" => "cities_lived",
+ #         "type" => "RECORD",
+ #         "mode" => "REPEATED",
+ #         "fields" => [
+ #           {
+ #             "name" => "place",
+ #             "type" => "STRING",
+ #             "mode" => "REQUIRED"
+ #           },
+ #           {
+ #             "name" => "number_of_years",
+ #             "type" => "INTEGER",
+ #             "mode" => "REQUIRED"
+ #           }
+ #         ]
+ #       }
+ #     ]
+ #   }
+ #   table = dataset.create_table "people", schema: schema
+ #
+ # Because of the repeated field in this schema, we cannot use the CSV format
+ # to load data into the table.
+ #
+ # == Loading records
+ #
+ # In addition to CSV, data can be imported from files that are formatted as
+ # {Newline-delimited JSON}[http://jsonlines.org/] or
+ # {Avro}[http://avro.apache.org/], or from a Google Cloud Datastore backup. It
+ # can also be "streamed" into BigQuery.
+ #
+ # To follow along with these examples, you will need to set up billing on the
+ # {Google Developers Console}[https://console.developers.google.com].
+ #
+ # === Streaming records
+ #
+ # For situations in which you want new data to be available for querying as
+ # soon as possible, inserting individual records directly from your Ruby
+ # application is a great approach.
+ #
+ #   require "gcloud"
+ #
+ #   gcloud = Gcloud.new
+ #   bigquery = gcloud.bigquery
+ #   dataset = bigquery.dataset "my_dataset"
+ #   table = dataset.table "people"
+ #
+ #   rows = [
+ #     {
+ #       "first_name" => "Anna",
+ #       "cities_lived" => [
+ #         {
+ #           "place" => "Stockholm",
+ #           "number_of_years" => 2
+ #         }
+ #       ]
+ #     },
+ #     {
+ #       "first_name" => "Bob",
+ #       "cities_lived" => [
+ #         {
+ #           "place" => "Seattle",
+ #           "number_of_years" => 5
+ #         },
+ #         {
+ #           "place" => "Austin",
+ #           "number_of_years" => 6
+ #         }
+ #       ]
+ #     }
+ #   ]
+ #   table.insert rows
+ #
+ # There are some trade-offs involved with streaming, so be sure to read the
+ # discussion of data consistency in {Streaming Data Into
+ # BigQuery}[https://cloud.google.com/bigquery/streaming-data-into-bigquery].
+ #
+ # === Uploading a file
+ #
+ # To follow along with this example, please download the
+ # {names.zip}[http://www.ssa.gov/OACT/babynames/names.zip] archive from the
+ # U.S. Social Security Administration. Inside the archive you will find over
+ # 100 files containing baby name records since the year 1880. A PDF file also
+ # contained in the archive specifies the schema used below.
+ #
+ #   require "gcloud"
+ #
+ #   gcloud = Gcloud.new
+ #   bigquery = gcloud.bigquery
+ #   dataset = bigquery.dataset "my_dataset"
+ #   schema = {
+ #     "fields" => [
+ #       {
+ #         "name" => "name",
+ #         "type" => "STRING",
+ #         "mode" => "REQUIRED"
+ #       },
+ #       {
+ #         "name" => "sex",
+ #         "type" => "STRING",
+ #         "mode" => "REQUIRED"
+ #       },
+ #       {
+ #         "name" => "number",
+ #         "type" => "INTEGER",
+ #         "mode" => "REQUIRED"
+ #       }
+ #     ]
+ #   }
+ #   table = dataset.create_table "baby_names", schema: schema
+ #
+ #   file = File.open "names/yob2014.txt"
+ #   load_job = table.load file, format: "csv"
+ #
+ # Because the names data, although formatted as CSV, is distributed in files
+ # with a +.txt+ extension, this example explicitly passes the +format+ option
+ # in order to demonstrate how to handle such situations. Because CSV is the
+ # default format for load operations, the option is not actually necessary.
+ # For JSON saved with a +.txt+ extension, however, it would be.
+ #
+ # == Exporting query results to Google Cloud Storage
+ #
+ # The example below shows how to pass the +table+ option with a query in order
+ # to store results in a permanent table. It also shows how to export the
+ # result data to a Google Cloud Storage file. In order to follow along, you
+ # will need to enable the Google Cloud Storage API in addition to setting up
+ # billing.
+ #
+ #   require "gcloud"
+ #
+ #   gcloud = Gcloud.new
+ #   bigquery = gcloud.bigquery
+ #   dataset = bigquery.dataset "my_dataset"
+ #   source_table = dataset.table "baby_names"
+ #   result_table = dataset.create_table "baby_names_results"
+ #
+ #   sql = "SELECT name, number as count " +
+ #         "FROM baby_names " +
+ #         "WHERE name CONTAINS 'Sam' " +
+ #         "ORDER BY count DESC"
+ #   query_job = dataset.query_job sql, table: result_table
+ #
+ #   loop do
+ #     break if query_job.done?
+ #     sleep 1
+ #     query_job.refresh!
+ #   end
+ #
+ #   if !query_job.failed?
+ #
+ #     storage = gcloud.storage
+ #     bucket_id = "bigquery-exports-#{SecureRandom.uuid}"
+ #     bucket = storage.create_bucket bucket_id
+ #     extract_url = "gs://#{bucket.id}/baby-names-sam.csv"
+ #
+ #     extract_job = result_table.extract extract_url
+ #
+ #     loop do
+ #       break if extract_job.done?
+ #       sleep 1
+ #       extract_job.refresh!
+ #     end
+ #
+ #     # Download to local filesystem
+ #     bucket.files.first.download "baby-names-sam.csv"
+ #
+ #   end
+ #
+ # If a table you wish to export contains a large amount of data, you can pass
+ # a wildcard URI to export to multiple files (for sharding), or an array of
+ # URIs (for partitioning), or both. See {Exporting Data From
+ # BigQuery}[https://cloud.google.com/bigquery/exporting-data-from-bigquery]
+ # for details.
+ #
+ module Bigquery
+ end
+ end
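
The closing paragraph of the module documentation above mentions wildcard and multi-URI exports without showing one. Here is a minimal sketch, not part of the released files, based on that description and on the export example above; the dataset, table, and bucket names are placeholders.

```ruby
require "gcloud"

gcloud = Gcloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
result_table = dataset.table "baby_names_results"

# Shard the export across multiple files with a wildcard URI...
sharded_job = result_table.extract "gs://my-export-bucket/baby-names-*.csv"

# ...or partition it across destinations with an array of URIs, as the
# documentation above describes.
partitioned_job = result_table.extract ["gs://my-export-bucket/part-1-*.csv",
                                        "gs://my-export-bucket/part-2-*.csv"]
```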