google-cloud-speech 0.31.0 → 0.31.1
This diff shows the changes between publicly released versions of the package, as they appear in their respective public registries. It is provided for informational purposes only.
- checksums.yaml +4 -4
- data/README.md +5 -5
- data/lib/google/cloud/speech.rb +4 -4
- data/lib/google/cloud/speech/v1.rb +4 -4
- data/lib/google/cloud/speech/v1/doc/google/cloud/speech/v1/cloud_speech.rb +97 -97
- data/lib/google/cloud/speech/v1/doc/google/longrunning/operations.rb +9 -9
- data/lib/google/cloud/speech/v1/doc/google/protobuf/any.rb +8 -8
- data/lib/google/cloud/speech/v1/doc/google/protobuf/duration.rb +3 -3
- data/lib/google/cloud/speech/v1/doc/google/rpc/status.rb +11 -11
- data/lib/google/cloud/speech/v1/speech_client.rb +2 -2
- data/lib/google/cloud/speech/v1p1beta1.rb +4 -4
- data/lib/google/cloud/speech/v1p1beta1/doc/google/cloud/speech/v1p1beta1/cloud_speech.rb +107 -107
- data/lib/google/cloud/speech/v1p1beta1/doc/google/longrunning/operations.rb +9 -9
- data/lib/google/cloud/speech/v1p1beta1/doc/google/protobuf/any.rb +8 -8
- data/lib/google/cloud/speech/v1p1beta1/doc/google/protobuf/duration.rb +3 -3
- data/lib/google/cloud/speech/v1p1beta1/doc/google/rpc/status.rb +11 -11
- data/lib/google/cloud/speech/v1p1beta1/speech_client.rb +2 -2
- metadata +2 -4
- data/lib/google/cloud/speech/v1/doc/overview.rb +0 -99
- data/lib/google/cloud/speech/v1p1beta1/doc/overview.rb +0 -99
checksums.yaml
CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6c020aae8d167db11676cfec941720c95d808833a976c7dff4cc5ed70adb76bb
+  data.tar.gz: ba7bd5abf5fbfe047a62e2208d14df69a295782edbe8a5d96ecdd161ebe07bde
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: cc0c3b397c928b5c9f5b12c113aa03e9547aee071e5f5ff7e6591d2dd1b24566210b6c1581c0652a039851faab139175282a9a8735d213f0a3831db41701013e
+  data.tar.gz: 394fe29602211aa0a6a847631ebe7d26cd0996f9ba9d4a1c31a5144394b1726bc9ecb289e0e6eaf6e4a23eed5235b4cbdc08b1e701ac7f7fa0a4027ffa748791
data/README.md
CHANGED

@@ -1,4 +1,4 @@
-# Ruby Client for Cloud Speech API ([Alpha](https://github.com/
+# Ruby Client for Cloud Speech API ([Alpha](https://github.com/googleapis/google-cloud-ruby#versioning))
 
 [Cloud Speech API][Product Documentation]:
 Converts audio to text by applying powerful neural network models.
@@ -12,7 +12,7 @@ steps:
 1. [Select or create a Cloud Platform project.](https://console.cloud.google.com/project)
 2. [Enable billing for your project.](https://cloud.google.com/billing/docs/how-to/modify-project#enable_billing_for_a_project)
 3. [Enable the Cloud Speech API.](https://console.cloud.google.com/apis/library/speech.googleapis.com)
-4. [Setup Authentication.](https://
+4. [Setup Authentication.](https://googleapis.github.io/google-cloud-ruby/#/docs/google-cloud/master/guides/authentication)
 
 ### Installation
 ```
@@ -50,17 +50,17 @@ response = speech_client.recognize(config, audio)
 to see other available methods on the client.
 - Read the [Cloud Speech API Product documentation][Product Documentation]
 to learn more about the product and see How-to Guides.
-- View this [repository's main README](https://github.com/
+- View this [repository's main README](https://github.com/googleapis/google-cloud-ruby/blob/master/README.md)
 to see the full list of Cloud APIs that we cover.
 
-[Client Library Documentation]: https://
+[Client Library Documentation]: https://googleapis.github.io/google-cloud-ruby/#/docs/google-cloud-speech/latest/google/cloud/speech/v1
 [Product Documentation]: https://cloud.google.com/speech
 
 ## Enabling Logging
 
 To enable logging for this library, set the logger for the underlying [gRPC](https://github.com/grpc/grpc/tree/master/src/ruby) library.
 The logger that you set may be a Ruby stdlib [`Logger`](https://ruby-doc.org/stdlib-2.5.0/libdoc/logger/rdoc/Logger.html) as shown below,
-or a [`Google::Cloud::Logging::Logger`](https://
+or a [`Google::Cloud::Logging::Logger`](https://googleapis.github.io/google-cloud-ruby/#/docs/google-cloud-logging/latest/google/cloud/logging/logger)
 that will write logs to [Stackdriver Logging](https://cloud.google.com/logging/). See [grpc/logconfig.rb](https://github.com/grpc/grpc/blob/master/src/ruby/lib/grpc/logconfig.rb)
 and the gRPC [spec_helper.rb](https://github.com/grpc/grpc/blob/master/src/ruby/spec/spec_helper.rb) for additional information.
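For orientation, here is a minimal Ruby sketch of the two arguments in the README's `speech_client.recognize(config, audio)` example, written as plain hashes whose keys mirror the `RecognitionConfig` and `RecognitionAudio` fields documented in this release. The hash form and the `gs://` URI value are illustrative assumptions, not part of the diff.

```ruby
# Sketch (assumption): plain hashes shaped like the documented
# RecognitionConfig and RecognitionAudio messages.
config = {
  encoding:          :LINEAR16,  # optional for FLAC/WAV files, required otherwise
  sample_rate_hertz: 16_000,     # 16000 Hz is optimal per the docs
  language_code:     "en-US"
}

# RecognitionAudio: exactly one of :content or :uri must be supplied;
# supplying both or neither returns INVALID_ARGUMENT.
audio = { uri: "gs://bucket_name/object_name" }  # hypothetical bucket/object

supplied = [:content, :uri].count { |k| audio.key?(k) }
raise ArgumentError, "supply exactly one of content/uri" unless supplied == 1

puts config[:language_code] # => "en-US"
```

The `encoding`, `sample_rate_hertz`, and `language_code` keys come straight from the `RecognitionConfig` documentation changed in this release.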
data/lib/google/cloud/speech.rb
CHANGED

@@ -21,7 +21,7 @@ module Google
 # rubocop:disable LineLength
 
 ##
-# # Ruby Client for Cloud Speech API ([Alpha](https://github.com/
+# # Ruby Client for Cloud Speech API ([Alpha](https://github.com/googleapis/google-cloud-ruby#versioning))
 #
 # [Cloud Speech API][Product Documentation]:
 # Converts audio to text by applying powerful neural network models.
@@ -34,7 +34,7 @@ module Google
 # 1. [Select or create a Cloud Platform project.](https://console.cloud.google.com/project)
 # 2. [Enable billing for your project.](https://cloud.google.com/billing/docs/how-to/modify-project#enable_billing_for_a_project)
 # 3. [Enable the Cloud Speech API.](https://console.cloud.google.com/apis/library/speech.googleapis.com)
-# 4. [Setup Authentication.](https://
+# 4. [Setup Authentication.](https://googleapis.github.io/google-cloud-ruby/#/docs/google-cloud/master/guides/authentication)
 #
 # ### Installation
 # ```
@@ -70,7 +70,7 @@ module Google
 # ### Next Steps
 # - Read the [Cloud Speech API Product documentation][Product Documentation]
 # to learn more about the product and see How-to Guides.
-# - View this [repository's main README](https://github.com/
+# - View this [repository's main README](https://github.com/googleapis/google-cloud-ruby/blob/master/README.md)
 # to see the full list of Cloud APIs that we cover.
 #
 # [Product Documentation]: https://cloud.google.com/speech
@@ -79,7 +79,7 @@ module Google
 #
 # To enable logging for this library, set the logger for the underlying [gRPC](https://github.com/grpc/grpc/tree/master/src/ruby) library.
 # The logger that you set may be a Ruby stdlib [`Logger`](https://ruby-doc.org/stdlib-2.5.0/libdoc/logger/rdoc/Logger.html) as shown below,
-# or a [`Google::Cloud::Logging::Logger`](https://
+# or a [`Google::Cloud::Logging::Logger`](https://googleapis.github.io/google-cloud-ruby/#/docs/google-cloud-logging/latest/google/cloud/logging/logger)
 # that will write logs to [Stackdriver Logging](https://cloud.google.com/logging/). See [grpc/logconfig.rb](https://github.com/grpc/grpc/blob/master/src/ruby/lib/grpc/logconfig.rb)
 # and the gRPC [spec_helper.rb](https://github.com/grpc/grpc/blob/master/src/ruby/spec/spec_helper.rb) for additional information.
 #
data/lib/google/cloud/speech/v1.rb
CHANGED

@@ -22,7 +22,7 @@ module Google
 # rubocop:disable LineLength
 
 ##
-# # Ruby Client for Cloud Speech API ([Alpha](https://github.com/
+# # Ruby Client for Cloud Speech API ([Alpha](https://github.com/googleapis/google-cloud-ruby#versioning))
 #
 # [Cloud Speech API][Product Documentation]:
 # Converts audio to text by applying powerful neural network models.
@@ -35,7 +35,7 @@ module Google
 # 1. [Select or create a Cloud Platform project.](https://console.cloud.google.com/project)
 # 2. [Enable billing for your project.](https://cloud.google.com/billing/docs/how-to/modify-project#enable_billing_for_a_project)
 # 3. [Enable the Cloud Speech API.](https://console.cloud.google.com/apis/library/speech.googleapis.com)
-# 4. [Setup Authentication.](https://
+# 4. [Setup Authentication.](https://googleapis.github.io/google-cloud-ruby/#/docs/google-cloud/master/guides/authentication)
 #
 # ### Installation
 # ```
@@ -64,7 +64,7 @@ module Google
 # ### Next Steps
 # - Read the [Cloud Speech API Product documentation][Product Documentation]
 # to learn more about the product and see How-to Guides.
-# - View this [repository's main README](https://github.com/
+# - View this [repository's main README](https://github.com/googleapis/google-cloud-ruby/blob/master/README.md)
 # to see the full list of Cloud APIs that we cover.
 #
 # [Product Documentation]: https://cloud.google.com/speech
@@ -73,7 +73,7 @@ module Google
 #
 # To enable logging for this library, set the logger for the underlying [gRPC](https://github.com/grpc/grpc/tree/master/src/ruby) library.
 # The logger that you set may be a Ruby stdlib [`Logger`](https://ruby-doc.org/stdlib-2.5.0/libdoc/logger/rdoc/Logger.html) as shown below,
-# or a [`Google::Cloud::Logging::Logger`](https://
+# or a [`Google::Cloud::Logging::Logger`](https://googleapis.github.io/google-cloud-ruby/#/docs/google-cloud-logging/latest/google/cloud/logging/logger)
 # that will write logs to [Stackdriver Logging](https://cloud.google.com/logging/). See [grpc/logconfig.rb](https://github.com/grpc/grpc/blob/master/src/ruby/lib/grpc/logconfig.rb)
 # and the gRPC [spec_helper.rb](https://github.com/grpc/grpc/blob/master/src/ruby/spec/spec_helper.rb) for additional information.
 #
data/lib/google/cloud/speech/v1/doc/google/cloud/speech/v1/cloud_speech.rb
CHANGED

@@ -17,7 +17,7 @@ module Google
 module Cloud
 module Speech
 module V1
-# The top-level message sent by the client for the
+# The top-level message sent by the client for the `Recognize` method.
 # @!attribute [rw] config
 # @return [Google::Cloud::Speech::V1::RecognitionConfig]
 # *Required* Provides information to the recognizer that specifies how to
@@ -27,7 +27,7 @@ module Google
 # *Required* The audio data to be recognized.
 class RecognizeRequest; end
 
-# The top-level message sent by the client for the
+# The top-level message sent by the client for the `LongRunningRecognize`
 # method.
 # @!attribute [rw] config
 # @return [Google::Cloud::Speech::V1::RecognitionConfig]
@@ -38,24 +38,24 @@ module Google
 # *Required* The audio data to be recognized.
 class LongRunningRecognizeRequest; end
 
-# The top-level message sent by the client for the
-# Multiple
-# must contain a
-# All subsequent messages must contain
-#
+# The top-level message sent by the client for the `StreamingRecognize` method.
+# Multiple `StreamingRecognizeRequest` messages are sent. The first message
+# must contain a `streaming_config` message and must not contain `audio` data.
+# All subsequent messages must contain `audio` data and must not contain a
+# `streaming_config` message.
 # @!attribute [rw] streaming_config
 # @return [Google::Cloud::Speech::V1::StreamingRecognitionConfig]
 # Provides information to the recognizer that specifies how to process the
-# request. The first
-#
+# request. The first `StreamingRecognizeRequest` message must contain a
+# `streaming_config` message.
 # @!attribute [rw] audio_content
 # @return [String]
 # The audio data to be recognized. Sequential chunks of audio data are sent
-# in sequential
-#
-# and all subsequent
-#
-#
+# in sequential `StreamingRecognizeRequest` messages. The first
+# `StreamingRecognizeRequest` message must not contain `audio_content` data
+# and all subsequent `StreamingRecognizeRequest` messages must contain
+# `audio_content` data. The audio bytes must be encoded as specified in
+# `RecognitionConfig`. Note: as with all bytes fields, protobuffers use a
 # pure binary representation (not base64). See
 # [content limits](https://cloud.google.com/speech-to-text/quotas#content).
 class StreamingRecognizeRequest; end
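The `StreamingRecognizeRequest` ordering rule documented above can be sketched with plain Ruby hashes standing in for the protobuf messages (the hash shapes are illustrative assumptions; the field names `streaming_config` and `audio_content` are from the docs):

```ruby
# Valid order per the docs: the first request carries only streaming_config,
# every subsequent request carries only audio_content.
def valid_request_order?(requests)
  return false if requests.empty?
  first, *rest = requests
  first.key?(:streaming_config) && !first.key?(:audio_content) &&
    rest.all? { |r| r.key?(:audio_content) && !r.key?(:streaming_config) }
end

good = [
  { streaming_config: { config: { language_code: "en-US" } } },
  { audio_content: "\x01\x02" },
  { audio_content: "\x03\x04" }
]
bad = good.rotate # an audio chunk arrives before the config message

puts valid_request_order?(good) # => true
puts valid_request_order?(bad)  # => false
```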
@@ -68,40 +68,40 @@ module Google
 # process the request.
 # @!attribute [rw] single_utterance
 # @return [true, false]
-# *Optional* If
+# *Optional* If `false` or omitted, the recognizer will perform continuous
 # recognition (continuing to wait for and process audio even if the user
 # pauses speaking) until the client closes the input stream (gRPC API) or
 # until the maximum time limit has been reached. May return multiple
-#
+# `StreamingRecognitionResult`s with the `is_final` flag set to `true`.
 #
-# If
+# If `true`, the recognizer will detect a single spoken utterance. When it
 # detects that the user has paused or stopped speaking, it will return an
-#
-# more than one
-#
+# `END_OF_SINGLE_UTTERANCE` event and cease recognition. It will return no
+# more than one `StreamingRecognitionResult` with the `is_final` flag set to
+# `true`.
 # @!attribute [rw] interim_results
 # @return [true, false]
-# *Optional* If
+# *Optional* If `true`, interim results (tentative hypotheses) may be
 # returned as they become available (these interim results are indicated with
-# the
-# If
+# the `is_final=false` flag).
+# If `false` or omitted, only `is_final=true` result(s) are returned.
 class StreamingRecognitionConfig; end
 
 # Provides information to the recognizer that specifies how to process the
 # request.
 # @!attribute [rw] encoding
 # @return [Google::Cloud::Speech::V1::RecognitionConfig::AudioEncoding]
-# Encoding of audio data sent in all
-# This field is optional for
+# Encoding of audio data sent in all `RecognitionAudio` messages.
+# This field is optional for `FLAC` and `WAV` audio files and required
 # for all other audio formats. For details, see {Google::Cloud::Speech::V1::RecognitionConfig::AudioEncoding AudioEncoding}.
 # @!attribute [rw] sample_rate_hertz
 # @return [Integer]
 # Sample rate in Hertz of the audio data sent in all
-#
+# `RecognitionAudio` messages. Valid values are: 8000-48000.
 # 16000 is optimal. For best results, set the sampling rate of the audio
 # source to 16000 Hz. If that's not possible, use the native sample rate of
 # the audio source (instead of re-sampling).
-# This field is optional for
+# This field is optional for `FLAC` and `WAV` audio files and required
 # for all other audio formats. For details, see {Google::Cloud::Speech::V1::RecognitionConfig::AudioEncoding AudioEncoding}.
 # @!attribute [rw] language_code
 # @return [String]
@@ -113,16 +113,16 @@ module Google
 # @!attribute [rw] max_alternatives
 # @return [Integer]
 # *Optional* Maximum number of recognition hypotheses to be returned.
-# Specifically, the maximum number of
-# within each
-# The server may return fewer than
-# Valid values are
+# Specifically, the maximum number of `SpeechRecognitionAlternative` messages
+# within each `SpeechRecognitionResult`.
+# The server may return fewer than `max_alternatives`.
+# Valid values are `0`-`30`. A value of `0` or `1` will return a maximum of
 # one. If omitted, will return a maximum of one.
 # @!attribute [rw] profanity_filter
 # @return [true, false]
-# *Optional* If set to
+# *Optional* If set to `true`, the server will attempt to filter out
 # profanities, replacing all but the initial character in each filtered word
-# with asterisks, e.g. "f***". If set to
+# with asterisks, e.g. "f***". If set to `false` or omitted, profanities
 # won't be filtered out.
 # @!attribute [rw] speech_contexts
 # @return [Array<Google::Cloud::Speech::V1::SpeechContext>]
@@ -131,10 +131,10 @@ module Google
 # information, see [Phrase Hints](https://cloud.google.com/speech-to-text/docs/basics#phrase-hints).
 # @!attribute [rw] enable_word_time_offsets
 # @return [true, false]
-# *Optional* If
+# *Optional* If `true`, the top result includes a list of words and
 # the start and end time offsets (timestamps) for those words. If
-#
-#
+# `false`, no word-level time offset information is returned. The default is
+# `false`.
 # @!attribute [rw] enable_automatic_punctuation
 # @return [true, false]
 # *Optional* If 'true', adds punctuation to recognition result hypotheses.
@@ -181,15 +181,15 @@ module Google
 # @!attribute [rw] use_enhanced
 # @return [true, false]
 # *Optional* Set to true to use an enhanced model for speech recognition.
-# You must also set the
-#
-#
+# You must also set the `model` field to a valid, enhanced model. If
+# `use_enhanced` is set to true and the `model` field is not set, then
+# `use_enhanced` is ignored. If `use_enhanced` is true and an enhanced
 # version of the specified model does not exist, then the speech is
 # recognized using the standard version of the specified model.
 #
 # Enhanced speech models require that you opt-in to data logging using
 # instructions in the [documentation](https://cloud.google.com/speech-to-text/enable-data-logging).
-# If you set
+# If you set `use_enhanced` to true and you have not enabled audio logging,
 # then you will receive an error.
 class RecognitionConfig
 # The encoding of the audio data sent in the request.
@@ -197,18 +197,18 @@ module Google
 # All encodings support only 1 channel (mono) audio.
 #
 # For best results, the audio source should be captured and transmitted using
-# a lossless encoding (
+# a lossless encoding (`FLAC` or `LINEAR16`). The accuracy of the speech
 # recognition can be reduced if lossy codecs are used to capture or transmit
 # audio, particularly if background noise is present. Lossy codecs include
-#
+# `MULAW`, `AMR`, `AMR_WB`, `OGG_OPUS`, and `SPEEX_WITH_HEADER_BYTE`.
 #
-# The
-# included audio content. You can request recognition for
-# contain either
-# If you send
-# your request, you do not need to specify an
+# The `FLAC` and `WAV` audio file formats include a header that describes the
+# included audio content. You can request recognition for `WAV` files that
+# contain either `LINEAR16` or `MULAW` encoded audio.
+# If you send `FLAC` or `WAV` audio file format in
+# your request, you do not need to specify an `AudioEncoding`; the audio
 # encoding format is determined from the file header. If you specify
-# an
+# an `AudioEncoding` when you send send `FLAC` or `WAV` audio, the
 # encoding configuration must match the encoding described in the audio
 # header; otherwise the request returns an
 # {Google::Rpc::Code::INVALID_ARGUMENT} error code.
@@ -219,33 +219,33 @@ module Google
 # Uncompressed 16-bit signed little-endian samples (Linear PCM).
 LINEAR16 = 1
 
-#
+# `FLAC` (Free Lossless Audio
 # Codec) is the recommended encoding because it is
 # lossless--therefore recognition is not compromised--and
-# requires only about half the bandwidth of
+# requires only about half the bandwidth of `LINEAR16`. `FLAC` stream
 # encoding supports 16-bit and 24-bit samples, however, not all fields in
-#
+# `STREAMINFO` are supported.
 FLAC = 2
 
 # 8-bit samples that compand 14-bit audio samples using G.711 PCMU/mu-law.
 MULAW = 3
 
-# Adaptive Multi-Rate Narrowband codec.
+# Adaptive Multi-Rate Narrowband codec. `sample_rate_hertz` must be 8000.
 AMR = 4
 
-# Adaptive Multi-Rate Wideband codec.
+# Adaptive Multi-Rate Wideband codec. `sample_rate_hertz` must be 16000.
 AMR_WB = 5
 
 # Opus encoded audio frames in Ogg container
 # ([OggOpus](https://wiki.xiph.org/OggOpus)).
-#
+# `sample_rate_hertz` must be one of 8000, 12000, 16000, 24000, or 48000.
 OGG_OPUS = 6
 
 # Although the use of lossy encodings is not recommended, if a very low
-# bitrate encoding is required,
+# bitrate encoding is required, `OGG_OPUS` is highly preferred over
 # Speex encoding. The [Speex](https://speex.org/) encoding supported by
 # Cloud Speech API has a header byte in each block, as in MIME type
-#
+# `audio/x-speex-with-header-byte`.
 # It is a variant of the RTP Speex encoding defined in
 # [RFC 5574](https://tools.ietf.org/html/rfc5574).
 # The stream is a sequence of blocks, one block per RTP packet. Each block
@@ -253,7 +253,7 @@ module Google
 # by one or more frames of Speex data, padded to an integral number of
 # bytes (octets) as specified in RFC 5574. In other words, each RTP header
 # is replaced with a single byte containing the block length. Only Speex
-# wideband is supported.
+# wideband is supported. `sample_rate_hertz` must be 16000.
 SPEEX_WITH_HEADER_BYTE = 7
 end
 end
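The per-codec sample-rate requirements added to the `AudioEncoding` comments in this release can be collected into one lookup. This is a sketch for illustration only; the constant and helper names are not part of the library.

```ruby
# Fixed sample rates documented per encoding; nil means the encoding imposes
# no single fixed rate (the general 8000-48000 range still applies).
REQUIRED_RATES = {
  AMR:                    [8_000],
  AMR_WB:                 [16_000],
  OGG_OPUS:               [8_000, 12_000, 16_000, 24_000, 48_000],
  SPEEX_WITH_HEADER_BYTE: [16_000]
}.freeze

def rate_allowed?(encoding, hertz)
  allowed = REQUIRED_RATES[encoding]
  allowed.nil? || allowed.include?(hertz)
end

puts rate_allowed?(:AMR, 8_000)       # => true
puts rate_allowed?(:AMR_WB, 8_000)    # => false
puts rate_allowed?(:LINEAR16, 44_100) # => true (no codec-specific constraint)
```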
@@ -270,28 +270,28 @@ module Google
 # [usage limits](https://cloud.google.com/speech-to-text/quotas#content).
 class SpeechContext; end
 
-# Contains audio data in the encoding specified in the
-# Either
+# Contains audio data in the encoding specified in the `RecognitionConfig`.
+# Either `content` or `uri` must be supplied. Supplying both or neither
 # returns {Google::Rpc::Code::INVALID_ARGUMENT}. See
 # [content limits](https://cloud.google.com/speech-to-text/quotas#content).
 # @!attribute [rw] content
 # @return [String]
 # The audio data bytes encoded as specified in
-#
+# `RecognitionConfig`. Note: as with all bytes fields, protobuffers use a
 # pure binary representation, whereas JSON representations use base64.
 # @!attribute [rw] uri
 # @return [String]
 # URI that points to a file that contains audio data bytes as specified in
-#
+# `RecognitionConfig`. The file must not be compressed (for example, gzip).
 # Currently, only Google Cloud Storage URIs are
 # supported, which must be specified in the following format:
-#
+# `gs://bucket_name/object_name` (other URI formats return
 # {Google::Rpc::Code::INVALID_ARGUMENT}). For more information, see
 # [Request URIs](https://cloud.google.com/storage/docs/reference-uris).
 class RecognitionAudio; end
 
-# The only message returned to the client by the
-# contains the result as zero or more sequential
+# The only message returned to the client by the `Recognize` method. It
+# contains the result as zero or more sequential `SpeechRecognitionResult`
 # messages.
 # @!attribute [rw] results
 # @return [Array<Google::Cloud::Speech::V1::SpeechRecognitionResult>]
@@ -299,10 +299,10 @@ module Google
 # sequential portions of audio.
 class RecognizeResponse; end
 
-# The only message returned to the client by the
-# It contains the result as zero or more sequential
-# messages. It is included in the
-# returned by the
+# The only message returned to the client by the `LongRunningRecognize` method.
+# It contains the result as zero or more sequential `SpeechRecognitionResult`
+# messages. It is included in the `result.response` field of the `Operation`
+# returned by the `GetOperation` call of the `google::longrunning::Operations`
 # service.
 # @!attribute [rw] results
 # @return [Array<Google::Cloud::Speech::V1::SpeechRecognitionResult>]
@@ -310,9 +310,9 @@ module Google
 # sequential portions of audio.
 class LongRunningRecognizeResponse; end
 
-# Describes the progress of a long-running
-# included in the
-#
+# Describes the progress of a long-running `LongRunningRecognize` call. It is
+# included in the `metadata` field of the `Operation` returned by the
+# `GetOperation` call of the `google::longrunning::Operations` service.
 # @!attribute [rw] progress_percent
 # @return [Integer]
 # Approximate percentage of audio processed thus far. Guaranteed to be 100
@@ -325,13 +325,13 @@ module Google
 # Time of the most recent processing update.
 class LongRunningRecognizeMetadata; end
 
-#
-#
+# `StreamingRecognizeResponse` is the only message returned to the client by
+# `StreamingRecognize`. A series of zero or more `StreamingRecognizeResponse`
 # messages are streamed back to the client. If there is no recognizable
-# audio, and
+# audio, and `single_utterance` is set to false, then no messages are streamed
 # back to the client.
 #
-# Here's an example of a series of ten
+# Here's an example of a series of ten `StreamingRecognizeResponse`s that might
 # be returned while processing audio:
 #
 # 1. results { alternatives { transcript: "tube" } stability: 0.01 }
@@ -359,21 +359,21 @@ module Google
 # Notes:
 #
 # * Only two of the above responses #4 and #7 contain final results; they are
-# indicated by
+# indicated by `is_final: true`. Concatenating these together generates the
 # full transcript: "to be or not to be that is the question".
 #
-# * The others contain interim
-#
+# * The others contain interim `results`. #3 and #6 contain two interim
+# `results`: the first portion has a high stability and is less likely to
 # change; the second portion has a low stability and is very likely to
-# change. A UI designer might choose to show only high stability
+# change. A UI designer might choose to show only high stability `results`.
 #
-# * The specific
+# * The specific `stability` and `confidence` values shown above are only for
 # illustrative purposes. Actual values may vary.
 #
 # * In each response, only one of these fields will be set:
-#
-#
-# one or more (repeated)
+# `error`,
+# `speech_event_type`, or
+# one or more (repeated) `results`.
 # @!attribute [rw] error
 # @return [Google::Rpc::Status]
 # Output only. If set, returns a {Google::Rpc::Status} message that
@@ -382,8 +382,8 @@ module Google
 # @return [Array<Google::Cloud::Speech::V1::StreamingRecognitionResult>]
 # Output only. This repeated list contains zero or more results that
 # correspond to consecutive portions of the audio currently being processed.
-# It contains zero or one
-# followed by zero or more
+# It contains zero or one `is_final=true` result (the newly settled portion),
+# followed by zero or more `is_final=false` results (the interim results).
 # @!attribute [rw] speech_event_type
 # @return [Google::Cloud::Speech::V1::StreamingRecognizeResponse::SpeechEventType]
 # Output only. Indicates the type of speech event.
@@ -399,7 +399,7 @@ module Google
 # additional results). The client should stop sending additional audio
 # data, half-close the gRPC connection, and wait for any additional results
 # until the server closes the gRPC connection. This event is only sent if
-#
+# `single_utterance` was set to `true`, and is not used otherwise.
 END_OF_SINGLE_UTTERANCE = 1
 end
 end
@@ -409,14 +409,14 @@ module Google
 # @!attribute [rw] alternatives
 # @return [Array<Google::Cloud::Speech::V1::SpeechRecognitionAlternative>]
 # Output only. May contain one or more recognition hypotheses (up to the
-# maximum specified in
+# maximum specified in `max_alternatives`).
 # These alternatives are ordered in terms of accuracy, with the top (first)
 # alternative being the most probable, as ranked by the recognizer.
 # @!attribute [rw] is_final
 # @return [true, false]
-# Output only. If
-# interim result that may change. If
-# speech service will return this particular
+# Output only. If `false`, this `StreamingRecognitionResult` represents an
+# interim result that may change. If `true`, this is the final time the
+# speech service will return this particular `StreamingRecognitionResult`,
 # the recognizer will not return any further hypotheses for this portion of
 # the transcript and corresponding audio.
 # @!attribute [rw] stability
@@ -424,15 +424,15 @@ module Google
 # Output only. An estimate of the likelihood that the recognizer will not
 # change its guess about this interim result. Values range from 0.0
 # (completely unstable) to 1.0 (completely stable).
-# This field is only provided for interim results (
-# The default of 0.0 is a sentinel value indicating
+# This field is only provided for interim results (`is_final=false`).
+# The default of 0.0 is a sentinel value indicating `stability` was not set.
 class StreamingRecognitionResult; end
 
 # A speech recognition result corresponding to a portion of the audio.
 # @!attribute [rw] alternatives
 # @return [Array<Google::Cloud::Speech::V1::SpeechRecognitionAlternative>]
 # Output only. May contain one or more recognition hypotheses (up to the
-# maximum specified in
+# maximum specified in `max_alternatives`).
 # These alternatives are ordered in terms of accuracy, with the top (first)
 # alternative being the most probable, as ranked by the recognizer.
 class SpeechRecognitionResult; end
@@ -446,10 +446,10 @@ module Google
 # Output only. The confidence estimate between 0.0 and 1.0. A higher number
 # indicates an estimated greater likelihood that the recognized words are
 # correct. This field is set only for the top alternative of a non-streaming
-# result or, of a streaming result where
+# result or, of a streaming result where `is_final=true`.
 # This field is not guaranteed to be accurate and users should not rely on it
 # to be always provided.
-# The default of 0.0 is a sentinel value indicating
+# The default of 0.0 is a sentinel value indicating `confidence` was not set.
 # @!attribute [rw] words
 # @return [Array<Google::Cloud::Speech::V1::WordInfo>]
 # Output only. A list of word-specific information for each recognized word.
@@ -460,7 +460,7 @@ module Google
 # @return [Google::Protobuf::Duration]
 # Output only. Time offset relative to the beginning of the audio,
 # and corresponding to the start of the spoken word.
-# This field is only set if
+# This field is only set if `enable_word_time_offsets=true` and only
 # in the top hypothesis.
 # This is an experimental feature and the accuracy of the time offset can
 # vary.
@@ -468,7 +468,7 @@ module Google
 # @return [Google::Protobuf::Duration]
 # Output only. Time offset relative to the beginning of the audio,
 # and corresponding to the end of the spoken word.
-# This field is only set if
+# This field is only set if `enable_word_time_offsets=true` and only
 # in the top hypothesis.
 # This is an experimental feature and the accuracy of the time offset can
 # vary.