google-cloud-bigquery 1.8.0 → 1.8.1
- checksums.yaml +4 -4
- data/AUTHENTICATION.md +178 -0
- data/CHANGELOG.md +208 -0
- data/CODE_OF_CONDUCT.md +40 -0
- data/CONTRIBUTING.md +188 -0
- data/LOGGING.md +27 -0
- data/OVERVIEW.md +463 -0
- data/TROUBLESHOOTING.md +37 -0
- data/lib/google/cloud/bigquery/version.rb +1 -1
- metadata +9 -3
- data/README.md +0 -112
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 852851bfe145222ae3691888ddf5bd1accf17e2113ce7ad2e7e5ea1f8b35e1a3
+  data.tar.gz: f4324c534a43652b21938dfb0fdb94a42ac9091cc540faf590d591062924bc85
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a6ba7cd948020964e20933a658ef4b1bd9d96b1f0c76a5d15050c0e4c31eae56517f8e5eacf5f7b666de026b391bdc9535dc270e46c19f38db59e45c915fda48
+  data.tar.gz: 5862b2eb1d70f7c7eebd7204d6d197b7665962233c0c49d8305663364a11a35bd5c48ce8f291517d15cc37c0fec9f038d0189676cc9bb2e9137e3a292914abf7
data/AUTHENTICATION.md
ADDED
@@ -0,0 +1,178 @@
# Authentication

In general, the google-cloud-bigquery library uses [Service
Account](https://cloud.google.com/iam/docs/creating-managing-service-accounts)
credentials to connect to Google Cloud services. When running on Compute Engine
the credentials will be discovered automatically. When running on other
environments, the Service Account credentials can be specified by providing the
path to the [JSON
keyfile](https://cloud.google.com/iam/docs/managing-service-account-keys) for
the account (or the JSON itself) in environment variables. Additionally, Cloud
SDK credentials can also be discovered automatically, but this is only
recommended during development.

## Project and Credential Lookup

The google-cloud-bigquery library aims to make authentication as simple as
possible, and provides several mechanisms to configure your system without
providing **Project ID** and **Service Account Credentials** directly in code.

**Project ID** is discovered in the following order:

1. Specify project ID in method arguments
2. Specify project ID in configuration
3. Discover project ID in environment variables
4. Discover GCE project ID

**Credentials** are discovered in the following order:

1. Specify credentials in method arguments (see the sketch below)
2. Specify credentials in configuration
3. Discover credentials path in environment variables
4. Discover credentials JSON in environment variables
5. Discover credentials file in the Cloud SDK's path
6. Discover GCE credentials
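
A minimal sketch of the first option in each list, passing the project ID and
credentials directly when constructing the client (the values shown are
placeholders):

```ruby
require "google/cloud/bigquery"

# Both values can also come from configuration or environment variables,
# as described in the sections below.
bigquery = Google::Cloud::Bigquery.new project_id:  "my-project-id",
                                       credentials: "path/to/keyfile.json"
```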

### Google Cloud Platform environments

While running on Google Cloud Platform environments such as Google Compute
Engine, Google App Engine and Google Kubernetes Engine, no extra work is needed.
The **Project ID** and **Credentials** are discovered automatically. Code
should be written as if already authenticated. Just be sure when you [set up the
GCE instance][gce-how-to], you add the correct scopes for the APIs you want to
access. For example:

* **All APIs**
  * `https://www.googleapis.com/auth/cloud-platform`
  * `https://www.googleapis.com/auth/cloud-platform.read-only`
* **BigQuery**
  * `https://www.googleapis.com/auth/bigquery`
  * `https://www.googleapis.com/auth/bigquery.insertdata`
* **Compute Engine**
  * `https://www.googleapis.com/auth/compute`
* **Datastore**
  * `https://www.googleapis.com/auth/datastore`
  * `https://www.googleapis.com/auth/userinfo.email`
* **DNS**
  * `https://www.googleapis.com/auth/ndev.clouddns.readwrite`
* **Pub/Sub**
  * `https://www.googleapis.com/auth/pubsub`
* **Storage**
  * `https://www.googleapis.com/auth/devstorage.full_control`
  * `https://www.googleapis.com/auth/devstorage.read_only`
  * `https://www.googleapis.com/auth/devstorage.read_write`

### Environment Variables

The **Project ID** and **Credentials JSON** can be placed in environment
variables instead of declaring them directly in code. Each service has its own
environment variable, allowing for different service accounts to be used for
different services. (See the READMEs for the individual service gems for
details.) The path to the **Credentials JSON** file can be stored in the
environment variable, or the **Credentials JSON** itself can be stored for
environments such as Docker containers where writing files is difficult or not
encouraged.

The environment variables that BigQuery checks for project ID are:

1. `BIGQUERY_PROJECT`
2. `GOOGLE_CLOUD_PROJECT`

The environment variables that BigQuery checks for credentials are configured on {Google::Cloud::Bigquery::Credentials}:

1. `BIGQUERY_CREDENTIALS` - Path to JSON file, or JSON contents
2. `BIGQUERY_KEYFILE` - Path to JSON file, or JSON contents
3. `GOOGLE_CLOUD_CREDENTIALS` - Path to JSON file, or JSON contents
4. `GOOGLE_CLOUD_KEYFILE` - Path to JSON file, or JSON contents
5. `GOOGLE_APPLICATION_CREDENTIALS` - Path to JSON file

```ruby
require "google/cloud/bigquery"

ENV["BIGQUERY_PROJECT"] = "my-project-id"
ENV["BIGQUERY_CREDENTIALS"] = "path/to/keyfile.json"

bigquery = Google::Cloud::Bigquery.new
```

### Configuration

The **Project ID** and **Credentials JSON** can be configured instead of placing
them in environment variables or providing them as arguments.

```ruby
require "google/cloud/bigquery"

Google::Cloud::Bigquery.configure do |config|
  config.project_id = "my-project-id"
  config.credentials = "path/to/keyfile.json"
end

bigquery = Google::Cloud::Bigquery.new
```

### Cloud SDK

This option allows for an easy way to authenticate during development. If
credentials are not provided in code or in environment variables, then Cloud SDK
credentials are discovered.

To configure your system for this, simply:

1. [Download and install the Cloud SDK](https://cloud.google.com/sdk)
2. Authenticate using OAuth 2.0: `$ gcloud auth login`
3. Write code as if already authenticated (see the sketch below).
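
Once `gcloud auth login` has completed, the client can be constructed without
explicit credentials (a minimal sketch; `my-project-id` is a placeholder and
could also come from an environment variable):

```ruby
require "google/cloud/bigquery"

# Credentials are discovered from the Cloud SDK; only the project is supplied here.
bigquery = Google::Cloud::Bigquery.new project_id: "my-project-id"
```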

**NOTE:** This is _not_ recommended for running in production. The Cloud SDK
*should* only be used during development.

[gce-how-to]: https://cloud.google.com/compute/docs/authentication#using
[dev-console]: https://console.cloud.google.com/project

[enable-apis]: https://raw.githubusercontent.com/GoogleCloudPlatform/gcloud-common/master/authentication/enable-apis.png

[create-new-service-account]: https://raw.githubusercontent.com/GoogleCloudPlatform/gcloud-common/master/authentication/create-new-service-account.png
[create-new-service-account-existing-keys]: https://raw.githubusercontent.com/GoogleCloudPlatform/gcloud-common/master/authentication/create-new-service-account-existing-keys.png
[reuse-service-account]: https://raw.githubusercontent.com/GoogleCloudPlatform/gcloud-common/master/authentication/reuse-service-account.png

## Creating a Service Account

Google Cloud requires a **Project ID** and **Service Account Credentials** to
connect to the APIs. You will use the **Project ID** and **JSON key file** to
connect to most services with google-cloud-bigquery.

If you are not running this client on Google Compute Engine, you need a Google
Developers service account.

1. Visit the [Google Developers Console][dev-console].
1. Create a new project or click on an existing project.
1. Activate the slide-out navigation tray and select **API Manager**. From
   here, you will enable the APIs that your application requires.

   ![Enable the APIs that your application requires][enable-apis]

   *Note: You may need to enable billing in order to use these services.*

1. Select **Credentials** from the side navigation.

   You should see a screen like one of the following.

   ![Create a new service account][create-new-service-account]

   ![Create a new service account With Existing Keys][create-new-service-account-existing-keys]

   Find the "Add credentials" drop down and select "Service account" to be
   guided through downloading a new JSON key file.

   If you want to re-use an existing service account, you can easily generate a
   new key file. Just select the account you wish to re-use, and click "Generate
   new JSON key":

   ![Re-use an existing service account][reuse-service-account]

   The key file you download will be used by this library to authenticate API
   requests and should be stored in a secure location.

## Troubleshooting

If you're having trouble authenticating you can ask for help by following the
{file:TROUBLESHOOTING.md Troubleshooting Guide}.

data/CHANGELOG.md
ADDED
@@ -0,0 +1,208 @@
# Release History

### 1.8.1 / 2018-09-12

* Add missing documentation files to package.

### 1.8.0 / 2018-09-10

* Add support for ORC format.
* Update documentation.

### 1.7.1 / 2018-08-21

* Update documentation.

### 1.7.0 / 2018-06-29

* Add #schema_update_options to LoadJob and #schema_update_options= to LoadJob::Updater.
* Add time partitioning for the target table to LoadJob and QueryJob.
* Add #statement_type, #ddl_operation_performed, #ddl_target_table to QueryJob.

### 1.6.0 / 2018-06-22

* Documentation updates.
* Updated dependencies.

### 1.5.0 / 2018-05-21

* Add Schema.load and Schema.dump to read/write a table schema from/to a JSON file or other IO source. The JSON file schema is the same as for the bq CLI.
* Add support for the NUMERIC data type.
* Add documentation for enabling logging.

### 1.4.0 / 2018-05-07

* Add Parquet support to #load and #load_job.

### 1.3.0 / 2018-04-05

* Add insert_ids option to #insert in Dataset, Table, and AsyncInserter.
* Add BigQuery Project#service_account_email.
* Add support for setting Job location to nil in blocks for Job properties.

### 1.2.0 / 2018-03-31

* Add geo-regionalization (location) support to Jobs.
* Add Project#encryption support to Jobs.
* Rename Encryption to EncryptionConfiguration.
* Add blocks for setting Job properties to all Job creation methods.
* Add support for lists of URLs to #load and #load_job. (jeremywadsack)
* Fix Schema::Field type helpers.
* Fix Table#load example in README.

### 1.1.0 / 2018-02-27

* Support table partitioning by field.
* Support Shared Configuration.
* Improve AsyncInserter performance.

### 1.0.0 / 2018-01-10

* Release 1.0.0
* Update authentication documentation
* Update Data documentation and code examples
* Remove reference to sync and async queries
* Allow use of URI objects for Dataset#load, Table#load, and Table#load_job

### 0.30.0 / 2017-11-14

* Add `Google::Cloud::Bigquery::Credentials` class.
* Rename constructor arguments to `project_id` and `credentials`.
  (The previous arguments `project` and `keyfile` are still supported.)
* Support creating `Dataset` and `Table` objects without making API calls using
  `skip_lookup` argument.
  * Add `Dataset#reference?` and `Dataset#resource?` helper method.
  * Add `Table#reference?` and `Table#resource?` and `Table#resource_partial?`
    and `Table#resource_full?` helper methods.
* `Dataset#insert_async` and `Table#insert_async` now yield a
  `Table::AsyncInserter::Result` object.
* `View` is removed, now uses `Table` class.
  * Needed to support `skip_lookup` argument.
  * Calling `Table#data` on a view now raises (breaking change).
* Performance improvements for queries.
* Updated `google-api-client`, `googleauth` dependencies.

### 0.29.0 / 2017-10-09

This is a major release with many new features and several breaking changes.

#### Major Changes

* All queries now use a new implementation, using a job and polling for results.
* The copy, load, extract methods now all have high-level and low-level versions, similar to `query` and `query_job`.
* Added asynchronous row insertion, allowing data to be collected and inserted in batches.
* Support external data sources for both queries and table views.
* Added create-on-insert support for tables.
* Allow for customizing job IDs to aid in organizing jobs.

#### Change Details

* Update high-level queries as follows:
  * Update `QueryJob#wait_until_done!` to use `getQueryResults`.
  * Update `Project#query` and `Dataset#query` with breaking changes:
    * Remove `timeout` and `dryrun` parameters.
    * Change return type from `QueryData` to `Data`.
  * Add `QueryJob#data`
  * Alias `QueryJob#query_results` to `QueryJob#data` with breaking changes:
    * Remove the `timeout` parameter.
    * Change the return type from `QueryData` to `Data`.
  * Update `View#data` with breaking changes:
    * Remove the `timeout` and `dryrun` parameters.
    * Change the return type from `QueryData` to `Data`.
  * Remove `QueryData`.
  * Update `Project#query` and `Dataset#query` with improved errors, replacing the previous simple error with one that contains all available information for why the job failed.
* Rename `Dataset#load` to `Dataset#load_job`; add high-level, synchronous version as `Dataset#load`.
* Rename `Table#copy` to `Table#copy_job`; add high-level, synchronous version as `Table#copy`.
* Rename `Table#extract` to `Table#extract_job`; add high-level, synchronous version as `Table#extract`.
* Rename `Table#load` to `Table#load_job`; add high-level, synchronous version as `Table#load`.
* Add support for querying external data sources with `External`.
* Add `Table::AsyncInserter`, `Dataset#insert_async` and `Table#insert_async` to collect and insert rows in batches.
* Add `Dataset#insert` to support creating a table while inserting rows if the table does not exist.
* Update retry logic to conform to the [BigQuery SLA](https://cloud.google.com/bigquery/sla).
  * Use a minimum back-off interval of 1 second; for each consecutive error, increase the back-off interval exponentially up to 32 seconds.
  * Retry if all error reasons are retriable, not if any of the error reasons are retriable.
* Add support for labels to `Dataset`, `Table`, `View` and `Job`.
* Add `filter` option to `Project#datasets` and `Project#jobs`.
* Add support for user-defined functions to `Project#query_job`, `Dataset#query_job`, `QueryJob` and `View`.
* In `Dataset`, `Table`, and `View` updates, add the use of ETags for optimistic concurrency control.
* Update `Dataset#load` and `Table#load`:
  * Add `null_marker` option and `LoadJob#null_marker`.
  * Add `autodetect` option and `LoadJob#autodetect?`.
  * Fix the default value for `LoadJob#quoted_newlines?`.
* Add `job_id` and `prefix` options for controlling client-side job ID generation to `Project#query_job`, `Dataset#load`, `Dataset#query_job`, `Table#copy`, `Table#extract`, and `Table#load`.
* Add `Job#user_email`.
* Set the maximum delay of `Job#wait_until_done!` polling to 60 seconds.
* Automatically retry `Job#cancel`.
* Allow users to specify if a `View` query is using Standard vs. Legacy SQL.
* Add `project` option to `Project#query_job`.
* Add `QueryJob#query_plan`, `QueryJob::Stage` and `QueryJob::Step` to expose query plan information.
* Add `Table#buffer_bytes`, `Table#buffer_rows` and `Table#buffer_oldest_at` to expose streaming buffer information.
* Update `Dataset#insert` and `Table#insert` to raise an error if `rows` is empty.
* Update `Error` with a mapping from code 412 to `FailedPreconditionError`.
* Update `Data#schema` to freeze the returned `Schema` object (as in `View` and `LoadJob`.)

### 0.28.0 / 2017-09-28

* Update Google API Client dependency to 0.14.x.

### 0.27.1 / 2017-07-11

* Add `InsertResponse::InsertError#index` (zedalaye)

### 0.27.0 / 2017-06-28

* Add `maximum_billing_tier` and `maximum_bytes_billed` to `QueryJob`, `Project#query_job` and `Dataset#query_job`.
* Add `Dataset#load` to support creating, configuring and loading a table in one API call.
* Add `Project#schema`.
* Upgrade dependency on Google API Client.
* Update gem spec homepage links.
* Update examples of field access to use symbols instead of strings in the documentation.

### 0.26.0 / 2017-04-05

* Upgrade dependency on Google API Client

### 0.25.0 / 2017-03-31

* Add `#cancel` to `Job`
* Updated documentation

### 0.24.0 / 2017-03-03

Major release, several new features, some breaking changes.

* Standard SQL is now the default syntax.
* Legacy SQL syntax can be enabled by providing `legacy_sql: true`.
* Several fixes to how data values are formatted when returned from BigQuery.
* Returned data rows are now hashes with Symbol keys instead of String keys.
* Several fixes to how data values are formatted when importing to BigQuery.
* Several improvements to manipulating table schema fields.
* Removal of `Schema#fields=` and `Data#raw` methods.
* Removal of `fields` argument from `Dataset#create_table` method.
* Dependency on Google API Client has been updated to 0.10.x.

### 0.23.0 / 2016-12-08

* Support Query Parameters using `params` method arguments to `query` and `query_job`
* Add `standard_sql`/`legacy_sql` method arguments to `query` and `query_job`
* Add `standard_sql?`/`legacy_sql?` attributes to `QueryJob`
* Many documentation improvements

### 0.21.0 / 2016-10-20

* New service constructor Google::Cloud::Bigquery.new

### 0.20.2 / 2016-09-30

* Add list of projects that the current credentials can access. (remi)

### 0.20.1 / 2016-09-02

* Fix for timeout on uploads.

### 0.20.0 / 2016-08-26

This gem contains the Google BigQuery service implementation for the `google-cloud` gem. The `google-cloud` gem replaces the old `gcloud` gem. Legacy code can continue to use the `gcloud` gem.

* Namespace is now `Google::Cloud`
* The `google-cloud` gem is now an umbrella package for individual gems

data/CODE_OF_CONDUCT.md
ADDED
@@ -0,0 +1,40 @@
# Contributor Code of Conduct

As contributors and maintainers of this project, and in the interest of
fostering an open and welcoming community, we pledge to respect all people who
contribute through reporting issues, posting feature requests, updating
documentation, submitting pull requests or patches, and other activities.

We are committed to making participation in this project a harassment-free
experience for everyone, regardless of level of experience, gender, gender
identity and expression, sexual orientation, disability, personal appearance,
body size, race, ethnicity, age, religion, or nationality.

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery
* Personal attacks
* Trolling or insulting/derogatory comments
* Public or private harassment
* Publishing other's private information, such as physical or electronic
  addresses, without explicit permission
* Other unethical or unprofessional conduct.

Project maintainers have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct. By adopting this Code of Conduct, project
maintainers commit themselves to fairly and consistently applying these
principles to every aspect of managing this project. Project maintainers who do
not follow or enforce the Code of Conduct may be permanently removed from the
project team.

This code of conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community.

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by opening an issue or contacting one or more of the project
maintainers.

This Code of Conduct is adapted from the [Contributor
Covenant](http://contributor-covenant.org), version 1.2.0, available at
[http://contributor-covenant.org/version/1/2/0/](http://contributor-covenant.org/version/1/2/0/)

data/CONTRIBUTING.md
ADDED
@@ -0,0 +1,188 @@
# Contributing to Google Cloud BigQuery

1. **Sign one of the contributor license agreements below.**
2. Fork the repo, develop and test your code changes.
3. Send a pull request.

## Contributor License Agreements

Before we can accept your pull requests you'll need to sign a Contributor
License Agreement (CLA):

- **If you are an individual writing original source code** and **you own the
  intellectual property**, then you'll need to sign an [individual
  CLA](https://developers.google.com/open-source/cla/individual).
- **If you work for a company that wants to allow you to contribute your work**,
  then you'll need to sign a [corporate
  CLA](https://developers.google.com/open-source/cla/corporate).

You can sign these electronically (just scroll to the bottom). After that, we'll
be able to accept your pull requests.

## Setup

In order to use the google-cloud-bigquery console and run the project's tests,
there is a small amount of setup:

1. Install Ruby. google-cloud-bigquery requires Ruby 2.3+. You may choose to
   manage your Ruby and gem installations with [RVM](https://rvm.io/),
   [rbenv](https://github.com/rbenv/rbenv), or
   [chruby](https://github.com/postmodern/chruby).

2. Install [Bundler](http://bundler.io/).

   ```sh
   $ gem install bundler
   ```

3. Install the top-level project dependencies.

   ```sh
   $ bundle install
   ```

4. Install the BigQuery dependencies.

   ```sh
   $ cd google-cloud-bigquery/
   $ bundle exec rake bundleupdate
   ```

## Console

In order to run code interactively, you can automatically load
google-cloud-bigquery and its dependencies in IRB. This requires that your
developer environment has already been configured by following the steps
described in the {file:AUTHENTICATION.md Authentication Guide}. An IRB console
can be created with:

```sh
$ cd google-cloud-bigquery/
$ bundle exec rake console
```

## BigQuery Tests

Tests are a very important part of google-cloud-bigquery. All contributions
should include tests that ensure the contributed code behaves as expected.

To run the unit tests, documentation tests, and code style checks together for a
package:

```sh
$ cd google-cloud-bigquery/
$ bundle exec rake ci
```

To run the command above, plus all acceptance tests, use `rake ci:acceptance` or
its handy alias, `rake ci:a`.

### BigQuery Unit Tests

The project uses the [minitest](https://github.com/seattlerb/minitest) library,
including [specs](https://github.com/seattlerb/minitest#specs),
[mocks](https://github.com/seattlerb/minitest#mocks) and
[minitest-autotest](https://github.com/seattlerb/minitest-autotest).

To run the BigQuery unit tests:

```sh
$ cd google-cloud-bigquery/
$ bundle exec rake test
```

### BigQuery Documentation Tests

The project tests the code examples in the gem's
[YARD](https://github.com/lsegal/yard)-based documentation.

The example testing functions in a way that is very similar to unit testing, and
in fact the library providing it,
[yard-doctest](https://github.com/p0deje/yard-doctest), is based on the
project's unit test library, [minitest](https://github.com/seattlerb/minitest).

To run the BigQuery documentation tests:

```sh
$ cd google-cloud-bigquery/
$ bundle exec rake doctest
```

If you add, remove or modify documentation examples when working on a pull
request, you may need to update the setup for the tests. The stubs and mocks
required to run the tests are located in `support/doctest_helper.rb`. Please
note that much of the setup is matched by the title of the
[`@example`](http://www.rubydoc.info/gems/yard/file/docs/Tags.md#example) tag.
If you alter an example's title, you may encounter breaking tests.
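
For instance, a documented method in the library source looks roughly like the
following (a hypothetical doc comment and example title, shown only to
illustrate how the `@example` title ties an example to its stubs in
`support/doctest_helper.rb`):

```ruby
##
# Looks up a dataset by ID.
#
# @example Retrieve a dataset by its ID
#   require "google/cloud/bigquery"
#
#   bigquery = Google::Cloud::Bigquery.new
#   dataset = bigquery.dataset "my_dataset"
#
def dataset dataset_id, skip_lookup: nil
  # ...
end
```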

### BigQuery Acceptance Tests

The BigQuery acceptance tests interact with the live service API. Follow the
instructions in the {file:AUTHENTICATION.md Authentication guide} for enabling
the BigQuery API. Occasionally, some API features may not yet be generally
available, making it difficult for some contributors to successfully run the
entire acceptance test suite. However, please ensure that you do successfully
run acceptance tests for any code areas covered by your pull request.

To run the acceptance tests, first create and configure a project in the Google
Developers Console, as described in the {file:AUTHENTICATION.md Authentication
guide}. Be sure to download the JSON KEY file. Make note of the PROJECT_ID and
the KEYFILE location on your system.

Before you can run the BigQuery acceptance tests, you must first create indexes
used in the tests.

#### Running the BigQuery acceptance tests

To run the BigQuery acceptance tests:

```sh
$ cd google-cloud-bigquery/
$ bundle exec rake acceptance[\\{my-project-id},\\{/path/to/keyfile.json}]
```

Or, if you prefer you can store the values in the `GCLOUD_TEST_PROJECT` and
`GCLOUD_TEST_KEYFILE` environment variables:

```sh
$ cd google-cloud-bigquery/
$ export GCLOUD_TEST_PROJECT=\\{my-project-id}
$ export GCLOUD_TEST_KEYFILE=\\{/path/to/keyfile.json}
$ bundle exec rake acceptance
```

If you want to use a different project and credentials for acceptance tests, you
can use the more specific `BIGQUERY_TEST_PROJECT` and `BIGQUERY_TEST_KEYFILE`
environment variables:

```sh
$ cd google-cloud-bigquery/
$ export BIGQUERY_TEST_PROJECT=\\{my-project-id}
$ export BIGQUERY_TEST_KEYFILE=\\{/path/to/keyfile.json}
$ bundle exec rake acceptance
```

## Coding Style

Please follow the established coding style in the library. The style is
largely based on [The Ruby Style
Guide](https://github.com/bbatsov/ruby-style-guide) with a few exceptions based
on seattle-style:

* Avoid parentheses when possible, including in method definitions.
* Always use double-quoted strings. ([Option
  B](https://github.com/bbatsov/ruby-style-guide#strings))

You can check your code against these rules by running Rubocop like so:

```sh
$ cd google-cloud-bigquery/
$ bundle exec rake rubocop
```

## Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By
participating in this project you agree to abide by its terms. See
{file:CODE_OF_CONDUCT.md Code of Conduct} for more information.

data/LOGGING.md
ADDED
@@ -0,0 +1,27 @@
# Enabling Logging

To enable logging for this library, set the logger for the underlying [Google
API
Client](https://github.com/google/google-api-ruby-client/blob/master/README.md#logging)
library. The logger that you set may be a Ruby stdlib
[`Logger`](https://ruby-doc.org/stdlib-2.4.0/libdoc/logger/rdoc/Logger.html) as
shown below, or a
[`Google::Cloud::Logging::Logger`](https://googlecloudplatform.github.io/google-cloud-ruby/docs/google-cloud-logging/latest/Google/Cloud/Logging/Logger)
that will write logs to [Stackdriver
Logging](https://cloud.google.com/logging/).

If you do not set the logger explicitly and your application is running in a
Rails environment, it will default to `Rails.logger`. Otherwise, if you do not
set the logger and you are not using Rails, logging is disabled by default.

Configuring a Ruby stdlib logger:

```ruby
require "logger"

my_logger = Logger.new $stderr
my_logger.level = Logger::WARN

# Set the Google API Client logger
Google::Apis.logger = my_logger
```
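
A sketch of the second option, routing the client's logs to Stackdriver Logging
(this assumes the google-cloud-logging gem is installed; the log name and
monitored-resource labels are placeholders):

```ruby
require "google/cloud/logging"

logging = Google::Cloud::Logging.new

# Describe the resource the log entries should be attributed to (placeholder labels)
resource = logging.resource "gce_instance",
                            "zone"        => "us-central1-a",
                            "instance_id" => "1234567890"

# Google::Cloud::Logging::Logger duck-types the stdlib Logger
stackdriver_logger = logging.logger "my_app_log", resource, env: :production

# Set the Google API Client logger
Google::Apis.logger = stackdriver_logger
```
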
data/OVERVIEW.md
ADDED
@@ -0,0 +1,463 @@
# Google Cloud BigQuery

Google BigQuery enables super-fast, SQL-like queries against massive datasets,
using the processing power of Google's infrastructure. To learn more, read [What
is BigQuery?](https://cloud.google.com/bigquery/what-is-bigquery).

The goal of google-cloud is to provide an API that is comfortable to Rubyists.
Your authentication credentials are detected automatically in Google Cloud
Platform environments such as Google Compute Engine, Google App Engine and
Google Kubernetes Engine. In other environments you can configure authentication
easily, either directly in your code or via environment variables. Read more
about the options for connecting in the {file:AUTHENTICATION.md Authentication
Guide}.

To help you get started quickly, the first few examples below use a public
dataset provided by Google. As soon as you have [signed
up](https://cloud.google.com/bigquery/sign-up) to use BigQuery, and provided
that you stay in the free tier for queries, you should be able to run these
first examples without the need to set up billing or to load data (although
we'll show you how to do that too.)

## Listing Datasets and Tables

A BigQuery project contains datasets, which in turn contain tables. Assuming
that you have not yet created datasets or tables in your own project, let's
connect to Google's `bigquery-public-data` project, and see what we find.

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new project: "bigquery-public-data"

bigquery.datasets.count #=> 1
bigquery.datasets.first.dataset_id #=> "samples"

dataset = bigquery.datasets.first
tables = dataset.tables

tables.count #=> 7
tables.map &:table_id #=> [..., "shakespeare", "trigrams", "wikipedia"]
```

In addition to listing all datasets and tables in the project, you can also
retrieve individual datasets and tables by ID. Let's look at the structure of
the `shakespeare` table, which contains an entry for every word in every play
written by Shakespeare.

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new project: "bigquery-public-data"

dataset = bigquery.dataset "samples"
table = dataset.table "shakespeare"

table.headers #=> [:word, :word_count, :corpus, :corpus_date]
table.rows_count #=> 164656
```

Now that you know the column names for the Shakespeare table, let's write and
run a few queries against it.

## Running queries

BigQuery supports two SQL dialects: [standard
SQL](https://cloud.google.com/bigquery/docs/reference/standard-sql/) and the
older [legacy SQL (BigQuery
SQL)](https://cloud.google.com/bigquery/docs/reference/legacy-sql), as discussed
in the guide [Migrating from legacy
SQL](https://cloud.google.com/bigquery/docs/reference/standard-sql/migrating-from-legacy-sql).

### Standard SQL

Standard SQL is the preferred SQL dialect for querying data stored in BigQuery.
It is compliant with the SQL 2011 standard, and has extensions that support
querying nested and repeated data. This is the default syntax. It has several
advantages over legacy SQL, including:

* Composability using `WITH` clauses and SQL functions
* Subqueries in the `SELECT` list and `WHERE` clause
* Correlated subqueries
* `ARRAY` and `STRUCT` data types
* Inserts, updates, and deletes
* `COUNT(DISTINCT <expr>)` is exact and scalable, providing the accuracy of
  `EXACT_COUNT_DISTINCT` without its limitations
* Automatic predicate push-down through `JOIN`s
* Complex `JOIN` predicates, including arbitrary expressions

For examples that demonstrate some of these features, see [Standard SQL
highlights](https://cloud.google.com/bigquery/docs/reference/standard-sql/migrating-from-legacy-sql#standard_sql_highlights).

As shown in this example, standard SQL is the library default:

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

sql = "SELECT word, SUM(word_count) AS word_count " \
      "FROM `bigquery-public-data.samples.shakespeare` " \
      "WHERE word IN ('me', 'I', 'you') GROUP BY word"
data = bigquery.query sql
```

Notice that in standard SQL, a fully-qualified table name uses the following
format: <code>`my-dashed-project.dataset1.tableName`</code>.

### Legacy SQL (formerly BigQuery SQL)

Before version 2.0, BigQuery executed queries using a non-standard SQL dialect
known as BigQuery SQL. This variant is optional, and can be enabled by passing
the flag `legacy_sql: true` with your query. (If you get an SQL syntax error
with a query that may be written in legacy SQL, be sure that you are passing
this option.)

To use legacy SQL, pass the option `legacy_sql: true` with your query:

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

sql = "SELECT TOP(word, 50) as word, COUNT(*) as count " \
      "FROM [bigquery-public-data:samples.shakespeare]"
data = bigquery.query sql, legacy_sql: true
```

Notice that in legacy SQL, a fully-qualified table name uses brackets instead of
back-ticks, and a colon instead of a dot to separate the project and the
dataset: `[my-dashed-project:dataset1.tableName]`.

#### Query parameters

With standard SQL, you can use positional or named query parameters. This
example shows the use of named parameters:

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

sql = "SELECT word, SUM(word_count) AS word_count " \
      "FROM `bigquery-public-data.samples.shakespeare` " \
      "WHERE word IN UNNEST(@words) GROUP BY word"
data = bigquery.query sql, params: { words: ['me', 'I', 'you'] }
```

As demonstrated above, passing the `params` option will automatically set
`standard_sql` to `true`.
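
Positional parameters are also supported. A minimal sketch using a `?`
placeholder and an array of values (a variation on the query above):

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

sql = "SELECT word, SUM(word_count) AS word_count " \
      "FROM `bigquery-public-data.samples.shakespeare` " \
      "WHERE word = ? GROUP BY word"
# Values in the params array are bound to the ? placeholders in order
data = bigquery.query sql, params: ["me"]
```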

#### Data types

BigQuery standard SQL supports simple data types such as integers, as well as
more complex types such as `ARRAY` and `STRUCT`.

The BigQuery data types are converted to and from Ruby types as follows:

| BigQuery    | Ruby                                 | Notes                                              |
|-------------|--------------------------------------|----------------------------------------------------|
| `BOOL`      | `true`/`false`                       |                                                    |
| `INT64`     | `Integer`                            |                                                    |
| `FLOAT64`   | `Float`                              |                                                    |
| `NUMERIC`   | `BigDecimal`                         | Will be rounded to 9 decimal places                |
| `STRING`    | `String`                             |                                                    |
| `DATETIME`  | `DateTime`                           | `DATETIME` does not support time zone.             |
| `DATE`      | `Date`                               |                                                    |
| `TIMESTAMP` | `Time`                               |                                                    |
| `TIME`      | `Google::Cloud::BigQuery::Time`      |                                                    |
| `BYTES`     | `File`, `IO`, `StringIO`, or similar |                                                    |
| `ARRAY`     | `Array`                              | Nested arrays and `nil` values are not supported.  |
| `STRUCT`    | `Hash`                               | Hash keys may be strings or symbols.               |

See [Data
Types](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types)
for an overview of each BigQuery data type, including allowed values.
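
As a rough illustration of these conversions, Ruby values passed as query
parameters come back from the query as the corresponding Ruby types (a sketch;
the column aliases are arbitrary):

```ruby
require "date"
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

data = bigquery.query "SELECT @n AS n, @day AS day, @tags AS tags",
                      params: { n: 42, day: Date.new(2018, 9, 12), tags: ["a", "b"] }

row = data.first
row[:n]    #=> 42                  (INT64)
row[:day]  #=> #<Date 2018-09-12>  (DATE)
row[:tags] #=> ["a", "b"]          (ARRAY<STRING>)
```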

### Running Queries

Let's start with the simplest way to run a query. Notice that this time you are
connecting using your own default project. It is necessary to have write access
to the project for running a query, since queries need to create tables to hold
results.

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

sql = "SELECT APPROX_TOP_COUNT(corpus, 10) as title, " \
      "COUNT(*) as unique_words " \
      "FROM `bigquery-public-data.samples.shakespeare`"
data = bigquery.query sql

data.next? #=> false
data.first #=> {:title=>[{:value=>"hamlet", :count=>5318}, ...}
```

The `APPROX_TOP_COUNT` function shown above is just one of a variety of
functions offered by BigQuery. See the [Query Reference (standard
SQL)](https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators)
for a full listing.

### Query Jobs

It is usually best not to block for most BigQuery operations, including querying
as well as importing, exporting, and copying data. Therefore, the BigQuery API
provides facilities for managing longer-running jobs. With this approach, an
instance of {Google::Cloud::Bigquery::QueryJob} is returned, rather than an
instance of {Google::Cloud::Bigquery::Data}.

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

sql = "SELECT APPROX_TOP_COUNT(corpus, 10) as title, " \
      "COUNT(*) as unique_words " \
      "FROM `bigquery-public-data.samples.shakespeare`"
job = bigquery.query_job sql

job.wait_until_done!
if !job.failed?
  job.data.first
  #=> {:title=>[{:value=>"hamlet", :count=>5318}, ...}
end
```

Once you have determined that the job is done and has not failed, you can obtain
an instance of {Google::Cloud::Bigquery::Data} by calling `data` on the job
instance. The query results for both of the above examples are stored in
temporary tables with a lifetime of about 24 hours. See the final example below
for a demonstration of how to store query results in a permanent table.

## Creating Datasets and Tables

The first thing you need to do in a new BigQuery project is to create a
{Google::Cloud::Bigquery::Dataset}. Datasets hold tables and control access to
them.

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.create_dataset "my_dataset"
```

Now that you have a dataset, you can use it to create a table. Every table is
defined by a schema that may contain nested and repeated fields. The example
below shows a schema with a repeated record field named `cities_lived`. (For
more information about nested and repeated fields, see [Preparing Data for
Loading](https://cloud.google.com/bigquery/preparing-data-for-loading).)

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "people" do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end
```

Because of the repeated field in this schema, we cannot use the CSV format to
load data into the table.
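
Instead, a format that supports nested and repeated records, such as
newline-delimited JSON, can be used. A minimal sketch, assuming a local file
`people.json` with one JSON record per line matching the schema above:

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "people"

# Each line of people.json holds one record, for example:
# {"first_name":"Anna","cities_lived":[{"place":"Stockholm","number_of_years":2}]}
file = File.open "people.json"
table.load file, format: "json"
```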

## Loading records

To follow along with these examples, you will need to set up billing on the
[Google Developers Console](https://console.developers.google.com).

In addition to CSV, data can be imported from files that are formatted as
[Newline-delimited JSON](http://jsonlines.org/),
[Avro](http://avro.apache.org/),
[ORC](https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-orc),
[Parquet](https://parquet.apache.org/) or from a Google Cloud Datastore backup.
It can also be "streamed" into BigQuery.

### Streaming records

For situations in which you want new data to be available for querying as soon
as possible, inserting individual records directly from your Ruby application is
a great approach.

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "people"

rows = [
  {
    "first_name" => "Anna",
    "cities_lived" => [
      {
        "place" => "Stockholm",
        "number_of_years" => 2
      }
    ]
  },
  {
    "first_name" => "Bob",
    "cities_lived" => [
      {
        "place" => "Seattle",
        "number_of_years" => 5
      },
      {
        "place" => "Austin",
        "number_of_years" => 6
      }
    ]
  }
]
table.insert rows
```

To avoid making RPCs (network requests) to retrieve the dataset and table
resources when streaming records, pass the `skip_lookup` option. This creates
local objects without verifying that the resources exist on the BigQuery
service.

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset", skip_lookup: true
table = dataset.table "people", skip_lookup: true

rows = [
  {
    "first_name" => "Anna",
    "cities_lived" => [
      {
        "place" => "Stockholm",
        "number_of_years" => 2
      }
    ]
  },
  {
    "first_name" => "Bob",
    "cities_lived" => [
      {
        "place" => "Seattle",
        "number_of_years" => 5
      },
      {
        "place" => "Austin",
        "number_of_years" => 6
      }
    ]
  }
]
table.insert rows
```

There are some trade-offs involved with streaming, so be sure to read the
discussion of data consistency in [Streaming Data Into
BigQuery](https://cloud.google.com/bigquery/streaming-data-into-bigquery).

### Uploading a file

To follow along with this example, please download the
[names.zip](http://www.ssa.gov/OACT/babynames/names.zip) archive from the U.S.
Social Security Administration. Inside the archive you will find over 100 files
containing baby name records since the year 1880.

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "baby_names" do |schema|
  schema.string "name", mode: :required
  schema.string "gender", mode: :required
  schema.integer "count", mode: :required
end

file = File.open "names/yob2014.txt"
table.load file, format: "csv"
```

Because the names data, although formatted as CSV, is distributed in files with
a `.txt` extension, this example explicitly passes the `format` option in order
to demonstrate how to handle such situations. Because CSV is the default format
for load operations, the option is not actually necessary. For JSON saved with a
`.txt` extension, however, it would be.

## Exporting query results to Google Cloud Storage

The example below shows how to pass the `table` option with a query in order to
store results in a permanent table. It also shows how to export the result data
to a Google Cloud Storage file. In order to follow along, you will need to
enable the Google Cloud Storage API in addition to setting up billing.

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
source_table = dataset.table "baby_names"
result_table = dataset.create_table "baby_names_results"

sql = "SELECT name, count " \
      "FROM baby_names " \
      "WHERE gender = 'M' " \
      "ORDER BY count ASC LIMIT 5"
query_job = dataset.query_job sql, table: result_table

query_job.wait_until_done!

if !query_job.failed?
  require "securerandom"
  require "google/cloud/storage"

  storage = Google::Cloud::Storage.new
  bucket_id = "bigquery-exports-#{SecureRandom.uuid}"
  bucket = storage.create_bucket bucket_id
  extract_url = "gs://#{bucket.id}/baby-names.csv"

  result_table.extract extract_url

  # Download to local filesystem
  bucket.files.first.download "baby-names.csv"
end
```

If a table you wish to export contains a large amount of data, you can pass a
wildcard URI to export to multiple files (for sharding), or an array of URIs
(for partitioning), or both. See [Exporting
Data](https://cloud.google.com/bigquery/docs/exporting-data) for details.
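
For example, a wildcard in the destination URI shards the export across as many
files as BigQuery needs (a sketch; the bucket name is a placeholder):

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
result_table = dataset.table "baby_names_results"

# The * is replaced with a zero-padded file number (000000000000, 000000000001, ...)
result_table.extract "gs://my-bucket/baby-names-*.csv"
```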

## Configuring retries and timeout

You can configure how many times API requests may be automatically retried. When
an API request fails, the response will be inspected to see if the request meets
criteria indicating that it may succeed on retry, such as `500` and `503` status
codes or a specific internal error code such as `rateLimitExceeded`. If it meets
the criteria, the request will be retried after a delay. If another error
occurs, the delay will be increased before a subsequent attempt, until the
`retries` limit is reached.

You can also set the request `timeout` value in seconds.

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new retries: 10, timeout: 120
```

See the [BigQuery error
table](https://cloud.google.com/bigquery/troubleshooting-errors#errortable) for
a list of error conditions.

## Additional information

Google BigQuery can be configured to use logging. To learn more, see the
{file:LOGGING.md Logging guide}.

data/TROUBLESHOOTING.md
ADDED
@@ -0,0 +1,37 @@
# Troubleshooting

## Where can I get more help?

### Ask the Community

If you have a question about how to use a Google Cloud client library in your
project or are stuck in the Developer's console and don't know where to turn,
it's possible your questions have already been addressed by the community.

First, check out the appropriate tags on StackOverflow:

- [`google-cloud-platform+ruby+bigquery`][so-ruby]

Next, try searching through the issues on GitHub:

- [`api:bigquery` issues][gh-search-ruby]

Still nothing?

### Ask the Developers

If you're experiencing a bug with the code, or have an idea for how it can be
improved, *please* create a new issue on GitHub so we can talk about it.

- [New issue][gh-ruby]

Or, you can ask questions on the [Google Cloud Platform Slack][slack-ruby]. You
can use the "ruby" channel for general Ruby questions, or use the
"google-cloud-ruby" channel if you have questions about this gem in particular.

[so-ruby]: http://stackoverflow.com/questions/tagged/google-cloud-platform+ruby+bigquery

[gh-search-ruby]: https://github.com/googlecloudplatform/google-cloud-ruby/issues?q=label%3A%22api%3A+bigquery%22

[gh-ruby]: https://github.com/googlecloudplatform/google-cloud-ruby/issues/new

[slack-ruby]: https://gcp-slack.appspot.com/

metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: google-cloud-bigquery
 version: !ruby/object:Gem::Version
-  version: 1.8.0
+  version: 1.8.1
 platform: ruby
 authors:
 - Mike Moore
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-09-10 00:00:00.000000000 Z
+date: 2018-09-12 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: google-cloud-core
@@ -216,8 +216,14 @@ extensions: []
 extra_rdoc_files: []
 files:
 - ".yardopts"
+- AUTHENTICATION.md
+- CHANGELOG.md
+- CODE_OF_CONDUCT.md
+- CONTRIBUTING.md
 - LICENSE
-- README.md
+- LOGGING.md
+- OVERVIEW.md
+- TROUBLESHOOTING.md
 - lib/google-cloud-bigquery.rb
 - lib/google/cloud/bigquery.rb
 - lib/google/cloud/bigquery/convert.rb
data/README.md
DELETED
@@ -1,112 +0,0 @@
# google-cloud-bigquery

[Google BigQuery](https://cloud.google.com/bigquery/) ([docs](https://cloud.google.com/bigquery/docs)) enables super-fast, SQL-like queries against append-only tables, using the processing power of Google's infrastructure. Simply move your data into BigQuery and let it handle the hard work. You can control access to both the project and your data based on your business needs, such as giving others the ability to view or query your data.

- [google-cloud-bigquery API documentation](http://googlecloudplatform.github.io/google-cloud-ruby/docs/google-cloud-bigquery/latest)
- [google-cloud-bigquery on RubyGems](https://rubygems.org/gems/google-cloud-bigquery)
- [Google BigQuery documentation](https://cloud.google.com/bigquery/docs)

## Quick Start

```sh
$ gem install google-cloud-bigquery
```

## Authentication

This library uses Service Account credentials to connect to Google Cloud services. When running on Compute Engine the credentials will be discovered automatically. When running on other environments the Service Account credentials can be specified by providing the path to the JSON file, or the JSON itself, in environment variables.

Instructions and configuration options are covered in the [Authentication Guide](https://googlecloudplatform.github.io/google-cloud-ruby/docs/google-cloud-bigquery/latest/file.AUTHENTICATION).

## Example

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.create_dataset "my_dataset"

table = dataset.create_table "my_table" do |t|
  t.name = "My Table"
  t.description = "A description of my table."
  t.schema do |s|
    s.string "first_name", mode: :required
    s.string "last_name", mode: :required
    s.integer "age", mode: :required
  end
end

# Load data into the table from Google Cloud Storage
table.load "gs://my-bucket/file-name.csv"

# Run a query
data = dataset.query "SELECT first_name FROM my_table"

data.each do |row|
  puts row[:first_name]
end
```

## Enabling Logging

To enable logging for this library, set the logger for the underlying [Google API Client](https://github.com/google/google-api-ruby-client/blob/master/README.md#logging) library. The logger that you set may be a Ruby stdlib [`Logger`](https://ruby-doc.org/stdlib-2.4.0/libdoc/logger/rdoc/Logger.html) as shown below, or a [`Google::Cloud::Logging::Logger`](https://googlecloudplatform.github.io/google-cloud-ruby/docs/google-cloud-logging/latest/Google/Cloud/Logging/Logger) that will write logs to [Stackdriver Logging](https://cloud.google.com/logging/).

If you do not set the logger explicitly and your application is running in a Rails environment, it will default to `Rails.logger`. Otherwise, if you do not set the logger and you are not using Rails, logging is disabled by default.

Configuring a Ruby stdlib logger:

```ruby
require "logger"

my_logger = Logger.new $stderr
my_logger.level = Logger::WARN

# Set the Google API Client logger
Google::Apis.logger = my_logger
```

## Supported Ruby Versions

This library is supported on Ruby 2.3+.

Google provides official support for Ruby versions that are actively supported
by Ruby Core—that is, Ruby versions that are either in normal maintenance or in
security maintenance, and not end of life. Currently, this means Ruby 2.3 and
later. Older versions of Ruby _may_ still work, but are unsupported and not
recommended. See https://www.ruby-lang.org/en/downloads/branches/ for details
about the Ruby support schedule.

## Versioning

This library follows [Semantic Versioning](http://semver.org/).

It is currently in major version zero (0.y.z), which means that anything may
change at any time and the public API should not be considered stable.

## Contributing

Contributions to this library are always welcome and highly encouraged.

See the [Contributing
Guide](https://googlecloudplatform.github.io/google-cloud-ruby/docs/google-cloud-bigquery/latest/file.CONTRIBUTING)
for more information on how to get started.

Please note that this project is released with a Contributor Code of Conduct. By
participating in this project you agree to abide by its terms. See [Code of
Conduct](https://googlecloudplatform.github.io/google-cloud-ruby/docs/google-cloud-bigquery/latest/file.CODE_OF_CONDUCT)
for more information.

## License

This library is licensed under Apache 2.0. Full license text is available in
[LICENSE](https://googlecloudplatform.github.io/google-cloud-ruby/docs/google-cloud-bigquery/latest/file.LICENSE).

## Support

Please [report bugs at the project on
Github](https://github.com/GoogleCloudPlatform/google-cloud-ruby/issues). Don't
hesitate to [ask
questions](http://stackoverflow.com/questions/tagged/google-cloud-platform+ruby)
about the client or APIs on [StackOverflow](http://stackoverflow.com). For more,
see the [Troubleshooting
guide](https://googlecloudplatform.github.io/google-cloud-ruby/docs/google-cloud-bigquery/latest/file.TROUBLESHOOTING).