google-cloud-bigquery 1.7.1 → 1.8.0
- checksums.yaml +4 -4
- data/.yardopts +9 -1
- data/README.md +24 -12
- data/lib/google-cloud-bigquery.rb +4 -4
- data/lib/google/cloud/bigquery.rb +3 -490
- data/lib/google/cloud/bigquery/convert.rb +2 -0
- data/lib/google/cloud/bigquery/dataset.rb +2 -0
- data/lib/google/cloud/bigquery/encryption_configuration.rb +1 -2
- data/lib/google/cloud/bigquery/job.rb +3 -1
- data/lib/google/cloud/bigquery/load_job.rb +1 -0
- data/lib/google/cloud/bigquery/table.rb +2 -0
- data/lib/google/cloud/bigquery/version.rb +1 -1
- metadata +6 -6
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6ecc40c252508d7433540e93257afe130072806aa75ec6c5feb02499d0a10913
+  data.tar.gz: 3e2caa63acd11cbcddf71bf4992c80535fd13e7bfd355a4e38e829684128f095
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 256c2c11f97bd389faf49423bbab4b03a3f46e97be055ad47ed7bd9cd1b430966b9f3c5d6651f75ea2af80f16a89e22d6caf796cef61bcee97a9ad9a0ac9633e
+  data.tar.gz: c1f2ad9a2d8b2f977330d37ceed965df8e2e0e0681072abd0e2fb700dc7f5f9333518a6899793c209b804c5d069d33e77f1e24669a966bf0d35234cfb3c7047b
data/.yardopts
CHANGED
@@ -2,7 +2,15 @@
 --title=Google Cloud BigQuery
 --markup markdown
 --markup-provider redcarpet
+--main OVERVIEW.md
 
 ./lib/**/*.rb
 -
-
+OVERVIEW.md
+AUTHENTICATION.md
+LOGGING.md
+CONTRIBUTING.md
+TROUBLESHOOTING.md
+CHANGELOG.md
+CODE_OF_CONDUCT.md
+LICENSE
data/README.md
CHANGED
@@ -2,7 +2,7 @@
 
 [Google BigQuery](https://cloud.google.com/bigquery/) ([docs](https://cloud.google.com/bigquery/docs)) enables super-fast, SQL-like queries against append-only tables, using the processing power of Google's infrastructure. Simply move your data into BigQuery and let it handle the hard work. You can control access to both the project and your data based on your business needs, such as giving others the ability to view or query your data.
 
-- [google-cloud-bigquery API documentation](http://googlecloudplatform.github.io/google-cloud-ruby
+- [google-cloud-bigquery API documentation](http://googlecloudplatform.github.io/google-cloud-ruby/docs/google-cloud-bigquery/latest)
 - [google-cloud-bigquery on RubyGems](https://rubygems.org/gems/google-cloud-bigquery)
 - [Google BigQuery documentation](https://cloud.google.com/bigquery/docs)
 
@@ -16,7 +16,7 @@ $ gem install google-cloud-bigquery
 
 This library uses Service Account credentials to connect to Google Cloud services. When running on Compute Engine the credentials will be discovered automatically. When running on other environments the Service Account credentials can be specified by providing the path to the JSON file, or the JSON itself, in environment variables.
 
-Instructions and configuration options are covered in the [Authentication Guide](https://googlecloudplatform.github.io/google-cloud-ruby
+Instructions and configuration options are covered in the [Authentication Guide](https://googlecloudplatform.github.io/google-cloud-ruby/docs/google-cloud-bigquery/latest/file.AUTHENTICATION).
 
 ## Example
 
@@ -49,7 +49,7 @@ end
 
 ## Enabling Logging
 
-To enable logging for this library, set the logger for the underlying [Google API Client](https://github.com/google/google-api-ruby-client/blob/master/README.md#logging) library. The logger that you set may be a Ruby stdlib [`Logger`](https://ruby-doc.org/stdlib-2.4.0/libdoc/logger/rdoc/Logger.html) as shown below, or a [`Google::Cloud::Logging::Logger`](https://googlecloudplatform.github.io/google-cloud-ruby
+To enable logging for this library, set the logger for the underlying [Google API Client](https://github.com/google/google-api-ruby-client/blob/master/README.md#logging) library. The logger that you set may be a Ruby stdlib [`Logger`](https://ruby-doc.org/stdlib-2.4.0/libdoc/logger/rdoc/Logger.html) as shown below, or a [`Google::Cloud::Logging::Logger`](https://googlecloudplatform.github.io/google-cloud-ruby/docs/google-cloud-logging/latest/Google/Cloud/Logging/Logger) that will write logs to [Stackdriver Logging](https://cloud.google.com/logging/).
 
 If you do not set the logger explicitly and your application is running in a Rails environment, it will default to `Rails.logger`. Otherwise, if you do not set the logger and you are not using Rails, logging is disabled by default.
 
@@ -70,9 +70,9 @@ Google::Apis.logger = my_logger
 This library is supported on Ruby 2.3+.
 
 Google provides official support for Ruby versions that are actively supported
-by Ruby Core—that is, Ruby versions that are either in normal maintenance or
-
-
+by Ruby Core—that is, Ruby versions that are either in normal maintenance or in
+security maintenance, and not end of life. Currently, this means Ruby 2.3 and
+later. Older versions of Ruby _may_ still work, but are unsupported and not
 recommended. See https://www.ruby-lang.org/en/downloads/branches/ for details
 about the Ruby support schedule.
 
@@ -80,21 +80,33 @@ about the Ruby support schedule.
 
 This library follows [Semantic Versioning](http://semver.org/).
 
-It is currently in major version zero (0.y.z), which means that anything may
+It is currently in major version zero (0.y.z), which means that anything may
+change at any time and the public API should not be considered stable.
 
 ## Contributing
 
 Contributions to this library are always welcome and highly encouraged.
 
-See the [Contributing
+See the [Contributing
+Guide](https://googlecloudplatform.github.io/google-cloud-ruby/docs/google-cloud-bigquery/latest/file.CONTRIBUTING)
+for more information on how to get started.
 
-Please note that this project is released with a Contributor Code of Conduct. By
+Please note that this project is released with a Contributor Code of Conduct. By
+participating in this project you agree to abide by its terms. See [Code of
+Conduct](https://googlecloudplatform.github.io/google-cloud-ruby/docs/google-cloud-bigquery/latest/file.CODE_OF_CONDUCT)
+for more information.
 
 ## License
 
-This library is licensed under Apache 2.0. Full license text is available in
+This library is licensed under Apache 2.0. Full license text is available in
+[LICENSE](https://googlecloudplatform.github.io/google-cloud-ruby/docs/google-cloud-bigquery/latest/file.LICENSE).
 
 ## Support
 
-Please [report bugs at the project on
-
+Please [report bugs at the project on
+Github](https://github.com/GoogleCloudPlatform/google-cloud-ruby/issues). Don't
+hesitate to [ask
+questions](http://stackoverflow.com/questions/tagged/google-cloud-platform+ruby)
+about the client or APIs on [StackOverflow](http://stackoverflow.com). For more
+see the [Troubleshooting
+guide](https://googlecloudplatform.github.io/google-cloud-ruby/docs/google-cloud-bigquery/latest/file.TROUBLESHOOTING)
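For context on the "Enabling Logging" hunk above: the stdlib logger setup that the README refers to is unchanged in this release, and looks like this (reproduced from the library's own documentation):

```ruby
require "logger"

# Route Google API Client logging to a stdlib logger at WARN level.
my_logger = Logger.new $stderr
my_logger.level = Logger::WARN

# Set the Google API Client logger
Google::Apis.logger = my_logger
```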
data/lib/google-cloud-bigquery.rb
CHANGED
@@ -29,8 +29,8 @@ module Google
 # Creates a new object for connecting to the BigQuery service.
 # Each call creates a new connection.
 #
-# For more information on connecting to Google Cloud see the
-# Guide
+# For more information on connecting to Google Cloud see the
+# {file:AUTHENTICATION.md Authentication Guide}.
 #
 # @param [String, Array<String>] scope The OAuth 2.0 scopes controlling the
 #   set of resources and operations that the connection can access. See
@@ -74,8 +74,8 @@ module Google
 # Creates a new `Project` instance connected to the BigQuery service.
 # Each call creates a new connection.
 #
-# For more information on connecting to Google Cloud see the
-# Guide
+# For more information on connecting to Google Cloud see the
+# {file:AUTHENTICATION.md Authentication Guide}.
 #
 # @param [String] project_id Identifier for a BigQuery project. If not
 #   present, the default project for the credentials is used.
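The two hunks above only change where the authentication docs live; the connection API itself is unchanged. A minimal sketch of what they document, with a hypothetical project ID and key file path:

```ruby
require "google/cloud/bigquery"

# Credentials are discovered automatically on Google Cloud (GCE, GKE, GAE).
# Elsewhere, pass a service account key explicitly, as described in
# AUTHENTICATION.md. The project ID and key path below are placeholders.
bigquery = Google::Cloud::Bigquery.new project_id:  "my-project",
                                       credentials: "/path/to/keyfile.json"

dataset = bigquery.dataset "my_dataset"
```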
data/lib/google/cloud/bigquery.rb
CHANGED
@@ -24,503 +24,16 @@ module Google
 # # Google Cloud BigQuery
 #
 # Google BigQuery enables super-fast, SQL-like queries against massive
-# datasets, using the processing power of Google's infrastructure.
-# more, read [What is
-# BigQuery?](https://cloud.google.com/bigquery/what-is-bigquery).
+# datasets, using the processing power of Google's infrastructure.
 #
-#
-# Rubyists. Your authentication credentials are detected automatically in
-# Google Cloud Platform environments such as Google Compute Engine, Google
-# App Engine and Google Kubernetes Engine. In other environments you can
-# configure authentication easily, either directly in your code or via
-# environment variables. Read more about the options for connecting in the
-# [Authentication
-# Guide](https://googlecloudplatform.github.io/google-cloud-ruby/#/docs/guides/authentication).
-#
-# To help you get started quickly, the first few examples below use a public
-# dataset provided by Google. As soon as you have [signed
-# up](https://cloud.google.com/bigquery/sign-up) to use BigQuery, and
-# provided that you stay in the free tier for queries, you should be able to
-# run these first examples without the need to set up billing or to load
-# data (although we'll show you how to do that too.)
-#
-# ## Enabling Logging
-#
-# To enable logging for this library, set the logger for the underlying
-# [Google API Client](https://github.com/google/google-api-ruby-client/blob/master/README.md#logging)
-# library. The logger that you set may be a Ruby stdlib
-# [`Logger`](https://ruby-doc.org/stdlib-2.4.0/libdoc/logger/rdoc/Logger.html)
-# as shown below, or a
-# [`Google::Cloud::Logging::Logger`](https://googlecloudplatform.github.io/google-cloud-ruby/#/docs/google-cloud-logging/latest/google/cloud/logging/logger)
-# that will write logs to [Stackdriver
-# Logging](https://cloud.google.com/logging/).
-#
-# If you do not set the logger explicitly and your application is running in
-# a Rails environment, it will default to `Rails.logger`. Otherwise, if you
-# do not set the logger and you are not using Rails, logging is disabled by
-# default.
-#
-# Configuring a Ruby stdlib logger:
-#
-# ```ruby
-# require "logger"
-#
-# my_logger = Logger.new $stderr
-# my_logger.level = Logger::WARN
-#
-# # Set the Google API Client logger
-# Google::Apis.logger = my_logger
-# ```
-#
-# ## Listing Datasets and Tables
-#
-# A BigQuery project contains datasets, which in turn contain tables.
-# Assuming that you have not yet created datasets or tables in your own
-# project, let's connect to Google's `bigquery-public-data` project, and see
-# what we find.
-#
-# ```ruby
-# require "google/cloud/bigquery"
-#
-# bigquery = Google::Cloud::Bigquery.new project: "bigquery-public-data"
-#
-# bigquery.datasets.count #=> 1
-# bigquery.datasets.first.dataset_id #=> "samples"
-#
-# dataset = bigquery.datasets.first
-# tables = dataset.tables
-#
-# tables.count #=> 7
-# tables.map &:table_id #=> [..., "shakespeare", "trigrams", "wikipedia"]
-# ```
-#
-# In addition to listing all datasets and tables in the project, you can
-# also retrieve individual datasets and tables by ID. Let's look at the
-# structure of the `shakespeare` table, which contains an entry for every
-# word in every play written by Shakespeare.
-#
-# ```ruby
-# require "google/cloud/bigquery"
-#
-# bigquery = Google::Cloud::Bigquery.new project: "bigquery-public-data"
-#
-# dataset = bigquery.dataset "samples"
-# table = dataset.table "shakespeare"
-#
-# table.headers #=> [:word, :word_count, :corpus, :corpus_date]
-# table.rows_count #=> 164656
-# ```
-#
-# Now that you know the column names for the Shakespeare table, let's
-# write and run a few queries against it.
-#
-# ## Running queries
-#
-# BigQuery supports two SQL dialects: [standard
-# SQL](https://cloud.google.com/bigquery/docs/reference/standard-sql/)
-# and the older [legacy SQl (BigQuery
-# SQL)](https://cloud.google.com/bigquery/docs/reference/legacy-sql),
-# as discussed in the guide [Migrating from legacy
-# SQL](https://cloud.google.com/bigquery/docs/reference/standard-sql/migrating-from-legacy-sql).
-#
-# ### Standard SQL
-#
-# Standard SQL is the preferred SQL dialect for querying data stored in
-# BigQuery. It is compliant with the SQL 2011 standard, and has extensions
-# that support querying nested and repeated data. This is the default
-# syntax. It has several advantages over legacy SQL, including:
-#
-# * Composability using `WITH` clauses and SQL functions
-# * Subqueries in the `SELECT` list and `WHERE` clause
-# * Correlated subqueries
-# * `ARRAY` and `STRUCT` data types
-# * Inserts, updates, and deletes
-# * `COUNT(DISTINCT <expr>)` is exact and scalable, providing the accuracy
-#   of `EXACT_COUNT_DISTINCT` without its limitations
-# * Automatic predicate push-down through `JOIN`s
-# * Complex `JOIN` predicates, including arbitrary expressions
-#
-# For examples that demonstrate some of these features, see [Standard SQL
-# highlights](https://cloud.google.com/bigquery/docs/reference/standard-sql/migrating-from-legacy-sql#standard_sql_highlights).
-#
-# As shown in this example, standard SQL is the library default:
-#
-# ```ruby
-# require "google/cloud/bigquery"
-#
-# bigquery = Google::Cloud::Bigquery.new
-#
-# sql = "SELECT word, SUM(word_count) AS word_count " \
-#       "FROM `bigquery-public-data.samples.shakespeare`" \
-#       "WHERE word IN ('me', 'I', 'you') GROUP BY word"
-# data = bigquery.query sql
-# ```
-#
-# Notice that in standard SQL, a fully-qualified table name uses the
-# following format: <code>`my-dashed-project.dataset1.tableName`</code>.
-#
-# ### Legacy SQL (formerly BigQuery SQL)
-#
-# Before version 2.0, BigQuery executed queries using a non-standard SQL
-# dialect known as BigQuery SQL. This variant is optional, and can be
-# enabled by passing the flag `legacy_sql: true` with your query. (If you
-# get an SQL syntax error with a query that may be written in legacy SQL,
-# be sure that you are passing this option.)
-#
-# To use legacy SQL, pass the option `legacy_sql: true` with your query:
-#
-# ```ruby
-# require "google/cloud/bigquery"
-#
-# bigquery = Google::Cloud::Bigquery.new
-#
-# sql = "SELECT TOP(word, 50) as word, COUNT(*) as count " \
-#       "FROM [bigquery-public-data:samples.shakespeare]"
-# data = bigquery.query sql, legacy_sql: true
-# ```
-#
-# Notice that in legacy SQL, a fully-qualified table name uses brackets
-# instead of back-ticks, and a colon instead of a dot to separate the
-# project and the dataset: `[my-dashed-project:dataset1.tableName]`.
-#
-# #### Query parameters
-#
-# With standard SQL, you can use positional or named query parameters. This
-# example shows the use of named parameters:
-#
-# ```ruby
-# require "google/cloud/bigquery"
-#
-# bigquery = Google::Cloud::Bigquery.new
-#
-# sql = "SELECT word, SUM(word_count) AS word_count " \
-#       "FROM `bigquery-public-data.samples.shakespeare`" \
-#       "WHERE word IN UNNEST(@words) GROUP BY word"
-# data = bigquery.query sql, params: { words: ['me', 'I', 'you'] }
-# ```
-#
-# As demonstrated above, passing the `params` option will automatically set
-# `standard_sql` to `true`.
-#
-# #### Data types
-#
-# BigQuery standard SQL supports simple data types such as integers, as well
-# as more complex types such as `ARRAY` and `STRUCT`.
-#
-# The BigQuery data types are converted to and from Ruby types as follows:
-#
-# | BigQuery | Ruby | Notes |
-# |-------------|----------------|---|
-# | `BOOL` | `true`/`false` | |
-# | `INT64` | `Integer` | |
-# | `FLOAT64` | `Float` | |
-# | `NUMERIC` | `BigDecimal` | Will be rounded to 9 decimal places |
-# | `STRING` | `String` | |
-# | `DATETIME` | `DateTime` | `DATETIME` does not support time zone. |
-# | `DATE` | `Date` | |
-# | `TIMESTAMP` | `Time` | |
-# | `TIME` | `Google::Cloud::BigQuery::Time` | |
-# | `BYTES` | `File`, `IO`, `StringIO`, or similar | |
-# | `ARRAY` | `Array` | Nested arrays and `nil` values are not supported. |
-# | `STRUCT` | `Hash` | Hash keys may be strings or symbols. |
-#
-# See [Data Types](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types)
-# for an overview of each BigQuery data type, including allowed values.
-#
-# ### Running Queries
-#
-# Let's start with the simplest way to run a query. Notice that this time
-# you are connecting using your own default project. It is necessary to have
-# write access to the project for running a query, since queries need to
-# create tables to hold results.
-#
-# ```ruby
-# require "google/cloud/bigquery"
-#
-# bigquery = Google::Cloud::Bigquery.new
-#
-# sql = "SELECT APPROX_TOP_COUNT(corpus, 10) as title, " \
-#       "COUNT(*) as unique_words " \
-#       "FROM `bigquery-public-data.samples.shakespeare`"
-# data = bigquery.query sql
-#
-# data.next? #=> false
-# data.first #=> {:title=>[{:value=>"hamlet", :count=>5318}, ...}
-# ```
-#
-# The `APPROX_TOP_COUNT` function shown above is just one of a variety of
-# functions offered by BigQuery. See the [Query Reference (standard
-# SQL)](https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators)
-# for a full listing.
-#
-# ### Query Jobs
-#
-# It is usually best not to block for most BigQuery operations, including
-# querying as well as importing, exporting, and copying data. Therefore, the
-# BigQuery API provides facilities for managing longer-running jobs. With
-# this approach, an instance of {Google::Cloud::Bigquery::QueryJob} is
-# returned, rather than an instance of {Google::Cloud::Bigquery::Data}.
-#
-# ```ruby
-# require "google/cloud/bigquery"
-#
-# bigquery = Google::Cloud::Bigquery.new
-#
-# sql = "SELECT APPROX_TOP_COUNT(corpus, 10) as title, " \
-#       "COUNT(*) as unique_words " \
-#       "FROM `bigquery-public-data.samples.shakespeare`"
-# job = bigquery.query_job sql
-#
-# job.wait_until_done!
-# if !job.failed?
-#   job.data.first
-#   #=> {:title=>[{:value=>"hamlet", :count=>5318}, ...}
-# end
-# ```
-#
-# Once you have determined that the job is done and has not failed, you can
-# obtain an instance of {Google::Cloud::Bigquery::Data} by calling `data` on
-# the job instance. The query results for both of the above examples are
-# stored in temporary tables with a lifetime of about 24 hours. See the
-# final example below for a demonstration of how to store query results in a
-# permanent table.
-#
-# ## Creating Datasets and Tables
-#
-# The first thing you need to do in a new BigQuery project is to create a
-# {Google::Cloud::Bigquery::Dataset}. Datasets hold tables and control
-# access to them.
-#
-# ```ruby
-# require "google/cloud/bigquery"
-#
-# bigquery = Google::Cloud::Bigquery.new
-#
-# dataset = bigquery.create_dataset "my_dataset"
-# ```
-#
-# Now that you have a dataset, you can use it to create a table. Every table
-# is defined by a schema that may contain nested and repeated fields. The
-# example below shows a schema with a repeated record field named
-# `cities_lived`. (For more information about nested and repeated fields,
-# see [Preparing Data for
-# Loading](https://cloud.google.com/bigquery/preparing-data-for-loading).)
-#
-# ```ruby
-# require "google/cloud/bigquery"
-#
-# bigquery = Google::Cloud::Bigquery.new
-# dataset = bigquery.dataset "my_dataset"
-#
-# table = dataset.create_table "people" do |schema|
-#   schema.string "first_name", mode: :required
-#   schema.record "cities_lived", mode: :repeated do |nested_schema|
-#     nested_schema.string "place", mode: :required
-#     nested_schema.integer "number_of_years", mode: :required
-#   end
-# end
-# ```
-#
-# Because of the repeated field in this schema, we cannot use the CSV format
-# to load data into the table.
-#
-# ## Loading records
-#
-# To follow along with these examples, you will need to set up billing on
-# the [Google Developers Console](https://console.developers.google.com).
-#
-# In addition to CSV, data can be imported from files that are formatted as
-# [Newline-delimited JSON](http://jsonlines.org/),
-# [Avro](http://avro.apache.org/), [Parquet](https://parquet.apache.org/)
-# or from a Google Cloud Datastore backup.
-# It can also be "streamed" into BigQuery.
-#
-# ### Streaming records
-#
-# For situations in which you want new data to be available for querying as
-# soon as possible, inserting individual records directly from your Ruby
-# application is a great approach.
-#
-# ```ruby
-# require "google/cloud/bigquery"
-#
-# bigquery = Google::Cloud::Bigquery.new
-# dataset = bigquery.dataset "my_dataset"
-# table = dataset.table "people"
-#
-# rows = [
-#   {
-#     "first_name" => "Anna",
-#     "cities_lived" => [
-#       {
-#         "place" => "Stockholm",
-#         "number_of_years" => 2
-#       }
-#     ]
-#   },
-#   {
-#     "first_name" => "Bob",
-#     "cities_lived" => [
-#       {
-#         "place" => "Seattle",
-#         "number_of_years" => 5
-#       },
-#       {
-#         "place" => "Austin",
-#         "number_of_years" => 6
-#       }
-#     ]
-#   }
-# ]
-# table.insert rows
-# ```
-#
-# To avoid making RPCs (network requests) to retrieve the dataset and table
-# resources when streaming records, pass the `skip_lookup` option. This
-# creates local objects without verifying that the resources exist on the
-# BigQuery service.
-#
-# ```ruby
-# require "google/cloud/bigquery"
-#
-# bigquery = Google::Cloud::Bigquery.new
-# dataset = bigquery.dataset "my_dataset", skip_lookup: true
-# table = dataset.table "people", skip_lookup: true
-#
-# rows = [
-#   {
-#     "first_name" => "Anna",
-#     "cities_lived" => [
-#       {
-#         "place" => "Stockholm",
-#         "number_of_years" => 2
-#       }
-#     ]
-#   },
-#   {
-#     "first_name" => "Bob",
-#     "cities_lived" => [
-#       {
-#         "place" => "Seattle",
-#         "number_of_years" => 5
-#       },
-#       {
-#         "place" => "Austin",
-#         "number_of_years" => 6
-#       }
-#     ]
-#   }
-# ]
-# table.insert rows
-# ```
-#
-# There are some trade-offs involved with streaming, so be sure to read the
-# discussion of data consistency in [Streaming Data Into
-# BigQuery](https://cloud.google.com/bigquery/streaming-data-into-bigquery).
-#
-# ### Uploading a file
-#
-# To follow along with this example, please download the
-# [names.zip](http://www.ssa.gov/OACT/babynames/names.zip) archive from the
-# U.S. Social Security Administration. Inside the archive you will find over
-# 100 files containing baby name records since the year 1880.
-#
-# ```ruby
-# require "google/cloud/bigquery"
-#
-# bigquery = Google::Cloud::Bigquery.new
-# dataset = bigquery.dataset "my_dataset"
-# table = dataset.create_table "baby_names" do |schema|
-#   schema.string "name", mode: :required
-#   schema.string "gender", mode: :required
-#   schema.integer "count", mode: :required
-# end
-#
-# file = File.open "names/yob2014.txt"
-# table.load file, format: "csv"
-# ```
-#
-# Because the names data, although formatted as CSV, is distributed in files
-# with a `.txt` extension, this example explicitly passes the `format`
-# option in order to demonstrate how to handle such situations. Because CSV
-# is the default format for load operations, the option is not actually
-# necessary. For JSON saved with a `.txt` extension, however, it would be.
-#
-# ## Exporting query results to Google Cloud Storage
-#
-# The example below shows how to pass the `table` option with a query in
-# order to store results in a permanent table. It also shows how to export
-# the result data to a Google Cloud Storage file. In order to follow along,
-# you will need to enable the Google Cloud Storage API in addition to
-# setting up billing.
-#
-# ```ruby
-# require "google/cloud/bigquery"
-#
-# bigquery = Google::Cloud::Bigquery.new
-# dataset = bigquery.dataset "my_dataset"
-# source_table = dataset.table "baby_names"
-# result_table = dataset.create_table "baby_names_results"
-#
-# sql = "SELECT name, count " \
-#       "FROM baby_names " \
-#       "WHERE gender = 'M' " \
-#       "ORDER BY count ASC LIMIT 5"
-# query_job = dataset.query_job sql, table: result_table
-#
-# query_job.wait_until_done!
-#
-# if !query_job.failed?
-#   require "google/cloud/storage"
-#
-#   storage = Google::Cloud::Storage.new
-#   bucket_id = "bigquery-exports-#{SecureRandom.uuid}"
-#   bucket = storage.create_bucket bucket_id
-#   extract_url = "gs://#{bucket.id}/baby-names.csv"
-#
-#   result_table.extract extract_url
-#
-#   # Download to local filesystem
-#   bucket.files.first.download "baby-names.csv"
-# end
-# ```
-#
-# If a table you wish to export contains a large amount of data, you can
-# pass a wildcard URI to export to multiple files (for sharding), or an
-# array of URIs (for partitioning), or both. See [Exporting
-# Data](https://cloud.google.com/bigquery/docs/exporting-data)
-# for details.
-#
-# ## Configuring retries and timeout
-#
-# You can configure how many times API requests may be automatically
-# retried. When an API request fails, the response will be inspected to see
-# if the request meets criteria indicating that it may succeed on retry,
-# such as `500` and `503` status codes or a specific internal error code
-# such as `rateLimitExceeded`. If it meets the criteria, the request will be
-# retried after a delay. If another error occurs, the delay will be
-# increased before a subsequent attempt, until the `retries` limit is
-# reached.
-#
-# You can also set the request `timeout` value in seconds.
-#
-# ```ruby
-# require "google/cloud/bigquery"
-#
-# bigquery = Google::Cloud::Bigquery.new retries: 10, timeout: 120
-# ```
-#
-# See the [BigQuery error
-# table](https://cloud.google.com/bigquery/troubleshooting-errors#errortable)
-# for a list of error conditions.
+# See {file:OVERVIEW.md BigQuery Overview}.
 #
 module Bigquery
 # Creates a new `Project` instance connected to the BigQuery service.
 # Each call creates a new connection.
 #
 # For more information on connecting to Google Cloud see the
-#
-# Guide](https://googlecloudplatform.github.io/google-cloud-ruby/#/docs/guides/authentication).
+# {file:AUTHENTICATION.md Authentication Guide}.
 #
 # @param [String] project_id Identifier for a BigQuery project. If not
 #   present, the default project for the credentials is used.
data/lib/google/cloud/bigquery/convert.rb
CHANGED
@@ -323,6 +323,7 @@ module Google
 "json" => "NEWLINE_DELIMITED_JSON",
 "newline_delimited_json" => "NEWLINE_DELIMITED_JSON",
 "avro" => "AVRO",
+"orc" => "ORC",
 "parquet" => "PARQUET",
 "datastore" => "DATASTORE_BACKUP",
 "backup" => "DATASTORE_BACKUP",
@@ -354,6 +355,7 @@ module Google
 return "CSV" if path.end_with? ".csv"
 return "NEWLINE_DELIMITED_JSON" if path.end_with? ".json"
 return "AVRO" if path.end_with? ".avro"
+return "ORC" if path.end_with? ".orc"
 return "PARQUET" if path.end_with? ".parquet"
 return "DATASTORE_BACKUP" if path.end_with? ".backup_info"
 nil
data/lib/google/cloud/bigquery/dataset.rb
CHANGED
@@ -1184,6 +1184,7 @@ module Google
 # * `csv` - CSV
 # * `json` - [Newline-delimited JSON](http://jsonlines.org/)
 # * `avro` - [Avro](http://avro.apache.org/)
+# * `orc` - [ORC](https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-orc)
 # * `parquet` - [Parquet](https://parquet.apache.org/)
 # * `datastore_backup` - Cloud Datastore backup
 # @param [String] create Specifies whether the job is allowed to create
@@ -1444,6 +1445,7 @@ module Google
 # * `csv` - CSV
 # * `json` - [Newline-delimited JSON](http://jsonlines.org/)
 # * `avro` - [Avro](http://avro.apache.org/)
+# * `orc` - [ORC](https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-orc)
 # * `parquet` - [Parquet](https://parquet.apache.org/)
 # * `datastore_backup` - Cloud Datastore backup
 # @param [String] create Specifies whether the job is allowed to create
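Together with the `Convert` changes above, these doc additions reflect the user-visible feature of this release: ORC is now an accepted source format for load jobs. A minimal sketch (bucket, dataset, and table names are hypothetical):

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset  = bigquery.dataset "my_dataset"

# The format is inferred from the .orc extension, but can also be
# passed explicitly with format: "orc".
load_job = dataset.load_job "my_new_table", "gs://my-bucket/data.orc",
                            format: "orc"
load_job.wait_until_done!
```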
data/lib/google/cloud/bigquery/encryption_configuration.rb
CHANGED
@@ -21,8 +21,7 @@ module Google
 # # Encryption Configuration
 #
 # A builder for BigQuery table encryption configurations, passed to block
-# arguments to {Dataset#create_table} and
-# {Table#encryption_configuration}.
+# arguments to {Dataset#create_table} and {Table#encryption}.
 #
 # @see https://cloud.google.com/bigquery/docs/customer-managed-encryption
 #   Protecting Data with Cloud KMS Keys
data/lib/google/cloud/bigquery/job.rb
CHANGED
@@ -275,7 +275,9 @@ module Google
 # provided when the job is created, and used to organize and group jobs.
 #
 # The returned hash is frozen and changes are not allowed. Use
-# {#labels=}
+# {CopyJob::Updater#labels=} or {ExtractJob::Updater#labels=} or
+# {LoadJob::Updater#labels=} or {QueryJob::Updater#labels=} to replace
+# the entire hash.
 #
 # @return [Hash] The job labels.
 #
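A sketch of what the corrected cross-references describe: `Job#labels` returns a frozen hash, so labels are assigned through the updater yielded when the job is created (the label values here are made up):

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset  = bigquery.dataset "my_dataset"

# Labels must be set via the job updater; Job#labels itself is read-only.
query_job = dataset.query_job "SELECT 1" do |update|
  update.labels = { "team" => "data-eng" }
end
query_job.wait_until_done!
```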
data/lib/google/cloud/bigquery/load_job.rb
CHANGED
@@ -855,6 +855,7 @@ module Google
 # * `csv` - CSV
 # * `json` - [Newline-delimited JSON](http://jsonlines.org/)
 # * `avro` - [Avro](http://avro.apache.org/)
+# * `orc` - [ORC](https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-orc)
 # * `parquet` - [Parquet](https://parquet.apache.org/)
 # * `datastore_backup` - Cloud Datastore backup
 #
data/lib/google/cloud/bigquery/table.rb
CHANGED
@@ -1505,6 +1505,7 @@ module Google
 # * `csv` - CSV
 # * `json` - [Newline-delimited JSON](http://jsonlines.org/)
 # * `avro` - [Avro](http://avro.apache.org/)
+# * `orc` - [ORC](https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-orc)
 # * `parquet` - [Parquet](https://parquet.apache.org/)
 # * `datastore_backup` - Cloud Datastore backup
 # @param [String] create Specifies whether the job is allowed to create
@@ -1714,6 +1715,7 @@ module Google
 # * `csv` - CSV
 # * `json` - [Newline-delimited JSON](http://jsonlines.org/)
 # * `avro` - [Avro](http://avro.apache.org/)
+# * `orc` - [ORC](https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-orc)
 # * `parquet` - [Parquet](https://parquet.apache.org/)
 # * `datastore_backup` - Cloud Datastore backup
 # @param [String] create Specifies whether the job is allowed to create
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: google-cloud-bigquery
 version: !ruby/object:Gem::Version
-  version: 1.
+  version: 1.8.0
 platform: ruby
 authors:
 - Mike Moore
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-
+date: 2018-09-10 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: google-cloud-core
@@ -197,16 +197,16 @@ dependencies:
   name: yard-doctest
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - "
+    - - "~>"
     - !ruby/object:Gem::Version
-      version: 0.1.
+      version: 0.1.13
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - "
+    - - "~>"
     - !ruby/object:Gem::Version
-      version: 0.1.
+      version: 0.1.13
 description: google-cloud-bigquery is the official library for Google BigQuery.
 email:
 - mike@blowmage.com