google-cloud-speech 0.37.0 → 0.38.0

This diff shows the changes between publicly released versions of the package, as published to the supported registries. It is provided for informational purposes only.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: '097260285e9fc0974b445e01b73f0fae6738787bf0719045b1475185aae5c2e0'
- data.tar.gz: 1653536be5b74266a3aa463c57f0101d7347229c375120878f0d0aa19f88f720
+ metadata.gz: 418c1c3abd5c10d1bb15b06d09f9f31bf8c9f48a7ba40b6d9c01f6699d2b920e
+ data.tar.gz: 5b01b710d26d3d6bfcba2eb4231b8a535f86c6306fb62a5daf71db406c8a140d
  SHA512:
- metadata.gz: e6b0f379e2d75556c4ce91d067df3d6c3b7ff072af174c39e72cdae749cbd8a8559b0adb360fcf24712cc14f9fa2008b88834daf0f3fa3246056910df235c31e
- data.tar.gz: 37212dc923146ee789b8631da1b80a5d6289d3fed52f96aae9b0e1c04a5944bf5bd9413007ff8a1cbcdadd5c367bb33817b0dc28315751acfa126aa6eaeb3f13
+ metadata.gz: b45ab7560fea6d5472532e3894328f80673e2498481bbdbcf2d5ab9b9b6fd83f541b7505562052bfb2c08f42e279845615d42100d885bab68695e425447151ad
+ data.tar.gz: 2ed95a11fb97cdc6840d8e241d4c12db07cbaa8de94f1d8d749745a696a8c4de8cac0b25e3fa574ee4e4ba490c2743ad1f590b175b0771e23aed7765b1e2e6dd
data/.yardopts CHANGED
@@ -1,5 +1,5 @@
  --no-private
- --title=Cloud Speech API
+ --title=Cloud Speech-to-Text API
  --exclude _pb\.rb$
  --markup markdown
  --markup-provider redcarpet
data/README.md CHANGED
@@ -1,6 +1,6 @@
- # Ruby Client for Cloud Speech API ([Alpha](https://github.com/googleapis/google-cloud-ruby#versioning))
+ # Ruby Client for Cloud Speech-to-Text API ([Alpha](https://github.com/googleapis/google-cloud-ruby#versioning))

- [Cloud Speech API][Product Documentation]:
+ [Cloud Speech-to-Text API][Product Documentation]:
  Converts audio to text by applying powerful neural network models.
  - [Client Library Documentation][]
  - [Product Documentation][]
@@ -11,7 +11,7 @@ steps:

  1. [Select or create a Cloud Platform project.](https://console.cloud.google.com/project)
  2. [Enable billing for your project.](https://cloud.google.com/billing/docs/how-to/modify-project#enable_billing_for_a_project)
- 3. [Enable the Cloud Speech API.](https://console.cloud.google.com/apis/library/speech.googleapis.com)
+ 3. [Enable the Cloud Speech-to-Text API.](https://console.cloud.google.com/apis/library/speech.googleapis.com)
  4. [Setup Authentication.](https://googleapis.dev/ruby/google-cloud-speech/latest/file.AUTHENTICATION.html)

  ### Installation
@@ -46,9 +46,9 @@ response = speech_client.recognize(config, audio)
  ```

  ### Next Steps
- - Read the [Client Library Documentation][] for Cloud Speech API
+ - Read the [Client Library Documentation][] for Cloud Speech-to-Text API
  to see other available methods on the client.
- - Read the [Cloud Speech API Product documentation][Product Documentation]
+ - Read the [Cloud Speech-to-Text API Product documentation][Product Documentation]
  to learn more about the product and see How-to Guides.
  - View this [repository's main README](https://github.com/googleapis/google-cloud-ruby/blob/master/README.md)
  to see the full list of Cloud APIs that we cover.
@@ -21,9 +21,9 @@ module Google
  # rubocop:disable LineLength

  ##
- # # Ruby Client for Cloud Speech API ([Alpha](https://github.com/googleapis/google-cloud-ruby#versioning))
+ # # Ruby Client for Cloud Speech-to-Text API ([Alpha](https://github.com/googleapis/google-cloud-ruby#versioning))
  #
- # [Cloud Speech API][Product Documentation]:
+ # [Cloud Speech-to-Text API][Product Documentation]:
  # Converts audio to text by applying powerful neural network models.
  # - [Product Documentation][]
  #
@@ -33,7 +33,7 @@ module Google
  #
  # 1. [Select or create a Cloud Platform project.](https://console.cloud.google.com/project)
  # 2. [Enable billing for your project.](https://cloud.google.com/billing/docs/how-to/modify-project#enable_billing_for_a_project)
- # 3. [Enable the Cloud Speech API.](https://console.cloud.google.com/apis/library/speech.googleapis.com)
+ # 3. [Enable the Cloud Speech-to-Text API.](https://console.cloud.google.com/apis/library/speech.googleapis.com)
  # 4. [Setup Authentication.](https://googleapis.dev/ruby/google-cloud-speech/latest/file.AUTHENTICATION.html)
  #
  # ### Installation
@@ -68,7 +68,7 @@ module Google
  # ```
  #
  # ### Next Steps
- # - Read the [Cloud Speech API Product documentation][Product Documentation]
+ # - Read the [Cloud Speech-to-Text API Product documentation][Product Documentation]
  # to learn more about the product and see How-to Guides.
  # - View this [repository's main README](https://github.com/googleapis/google-cloud-ruby/blob/master/README.md)
  # to see the full list of Cloud APIs that we cover.
@@ -23,9 +23,9 @@ module Google
  # rubocop:disable LineLength

  ##
- # # Ruby Client for Cloud Speech API ([Alpha](https://github.com/googleapis/google-cloud-ruby#versioning))
+ # # Ruby Client for Cloud Speech-to-Text API ([Alpha](https://github.com/googleapis/google-cloud-ruby#versioning))
  #
- # [Cloud Speech API][Product Documentation]:
+ # [Cloud Speech-to-Text API][Product Documentation]:
  # Converts audio to text by applying powerful neural network models.
  # - [Product Documentation][]
  #
@@ -35,7 +35,7 @@ module Google
  #
  # 1. [Select or create a Cloud Platform project.](https://console.cloud.google.com/project)
  # 2. [Enable billing for your project.](https://cloud.google.com/billing/docs/how-to/modify-project#enable_billing_for_a_project)
- # 3. [Enable the Cloud Speech API.](https://console.cloud.google.com/apis/library/speech.googleapis.com)
+ # 3. [Enable the Cloud Speech-to-Text API.](https://console.cloud.google.com/apis/library/speech.googleapis.com)
  # 4. [Setup Authentication.](https://googleapis.dev/ruby/google-cloud-speech/latest/file.AUTHENTICATION.html)
  #
  # ### Installation
@@ -63,7 +63,7 @@ module Google
  # ```
  #
  # ### Next Steps
- # - Read the [Cloud Speech API Product documentation][Product Documentation]
+ # - Read the [Cloud Speech-to-Text API Product documentation][Product Documentation]
  # to learn more about the product and see How-to Guides.
  # - View this [repository's main README](https://github.com/googleapis/google-cloud-ruby/blob/master/README.md)
  # to see the full list of Cloud APIs that we cover.
@@ -5,7 +5,10 @@
  require 'google/protobuf'

  require 'google/api/annotations_pb'
+ require 'google/api/client_pb'
+ require 'google/api/field_behavior_pb'
  require 'google/longrunning/operations_pb'
+ require 'google/protobuf/any_pb'
  require 'google/protobuf/duration_pb'
  require 'google/protobuf/timestamp_pb'
  require 'google/rpc/status_pb'
@@ -40,6 +43,7 @@ Google::Protobuf::DescriptorPool.generated_pool.build do
  repeated :speech_contexts, :message, 6, "google.cloud.speech.v1.SpeechContext"
  optional :enable_word_time_offsets, :bool, 8
  optional :enable_automatic_punctuation, :bool, 11
+ optional :diarization_config, :message, 19, "google.cloud.speech.v1.SpeakerDiarizationConfig"
  optional :metadata, :message, 9, "google.cloud.speech.v1.RecognitionMetadata"
  optional :model, :string, 13
  optional :use_enhanced, :bool, 14
@@ -54,6 +58,12 @@ Google::Protobuf::DescriptorPool.generated_pool.build do
  value :OGG_OPUS, 6
  value :SPEEX_WITH_HEADER_BYTE, 7
  end
+ add_message "google.cloud.speech.v1.SpeakerDiarizationConfig" do
+ optional :enable_speaker_diarization, :bool, 1
+ optional :min_speaker_count, :int32, 2
+ optional :max_speaker_count, :int32, 3
+ optional :speaker_tag, :int32, 5
+ end
  add_message "google.cloud.speech.v1.RecognitionMetadata" do
  optional :interaction_type, :enum, 1, "google.cloud.speech.v1.RecognitionMetadata.InteractionType"
  optional :industry_naics_code_of_audio, :uint32, 3
@@ -158,6 +168,7 @@ module Google
  StreamingRecognitionConfig = Google::Protobuf::DescriptorPool.generated_pool.lookup("google.cloud.speech.v1.StreamingRecognitionConfig").msgclass
  RecognitionConfig = Google::Protobuf::DescriptorPool.generated_pool.lookup("google.cloud.speech.v1.RecognitionConfig").msgclass
  RecognitionConfig::AudioEncoding = Google::Protobuf::DescriptorPool.generated_pool.lookup("google.cloud.speech.v1.RecognitionConfig.AudioEncoding").enummodule
+ SpeakerDiarizationConfig = Google::Protobuf::DescriptorPool.generated_pool.lookup("google.cloud.speech.v1.SpeakerDiarizationConfig").msgclass
  RecognitionMetadata = Google::Protobuf::DescriptorPool.generated_pool.lookup("google.cloud.speech.v1.RecognitionMetadata").msgclass
  RecognitionMetadata::InteractionType = Google::Protobuf::DescriptorPool.generated_pool.lookup("google.cloud.speech.v1.RecognitionMetadata.InteractionType").enummodule
  RecognitionMetadata::MicrophoneDistance = Google::Protobuf::DescriptorPool.generated_pool.lookup("google.cloud.speech.v1.RecognitionMetadata.MicrophoneDistance").enummodule
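The descriptor changes above add a `SpeakerDiarizationConfig` message and a `diarization_config` field (number 19) on `RecognitionConfig`. As a minimal sketch of what enabling the new field could look like: the hash below mirrors the shape the generated client accepts in place of a proto message; the encoding, sample rate, and speaker counts are illustrative values, not part of this diff.

```ruby
# Sketch of a recognition config exercising the new diarization_config
# field added in 0.38.0. A plain hash in this shape can be passed where
# the client expects a RecognitionConfig; all values are illustrative.
config = {
  encoding:           :LINEAR16,
  sample_rate_hertz:  16_000,
  language_code:      "en-US",
  diarization_config: {
    enable_speaker_diarization: true,
    min_speaker_count: 2, # documented default when unset
    max_speaker_count: 6  # documented default when unset
  }
}

puts config[:diarization_config][:enable_speaker_diarization]
```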
@@ -1,7 +1,7 @@
  # Generated by the protocol buffer compiler. DO NOT EDIT!
  # Source: google/cloud/speech/v1/cloud_speech.proto for package 'google.cloud.speech.v1'
  # Original file comments:
- # Copyright 2018 Google LLC.
+ # Copyright 2019 Google LLC.
  #
  # Licensed under the Apache License, Version 2.0 (the "License");
  # you may not use this file except in compliance with the License.
@@ -42,6 +42,8 @@ module Google
  # google.longrunning.Operations interface. Returns either an
  # `Operation.error` or an `Operation.response` which contains
  # a `LongRunningRecognizeResponse` message.
+ # For more information on asynchronous speech recognition, see the
+ # [how-to](https://cloud.google.com/speech-to-text/docs/async-recognize).
  rpc :LongRunningRecognize, LongRunningRecognizeRequest, Google::Longrunning::Operation
  # Performs bidirectional streaming speech recognition: receive results while
  # sending audio. This method is only available via the gRPC API (not REST).
@@ -20,29 +20,29 @@ module Google
  # The top-level message sent by the client for the `Recognize` method.
  # @!attribute [rw] config
  # @return [Google::Cloud::Speech::V1::RecognitionConfig]
- # *Required* Provides information to the recognizer that specifies how to
+ # Required. Provides information to the recognizer that specifies how to
  # process the request.
  # @!attribute [rw] audio
  # @return [Google::Cloud::Speech::V1::RecognitionAudio]
- # *Required* The audio data to be recognized.
+ # Required. The audio data to be recognized.
  class RecognizeRequest; end

  # The top-level message sent by the client for the `LongRunningRecognize`
  # method.
  # @!attribute [rw] config
  # @return [Google::Cloud::Speech::V1::RecognitionConfig]
- # *Required* Provides information to the recognizer that specifies how to
+ # Required. Provides information to the recognizer that specifies how to
  # process the request.
  # @!attribute [rw] audio
  # @return [Google::Cloud::Speech::V1::RecognitionAudio]
- # *Required* The audio data to be recognized.
+ # Required. The audio data to be recognized.
  class LongRunningRecognizeRequest; end

  # The top-level message sent by the client for the `StreamingRecognize` method.
  # Multiple `StreamingRecognizeRequest` messages are sent. The first message
- # must contain a `streaming_config` message and must not contain `audio` data.
- # All subsequent messages must contain `audio` data and must not contain a
- # `streaming_config` message.
+ # must contain a `streaming_config` message and must not contain
+ # `audio_content`. All subsequent messages must contain `audio_content` and
+ # must not contain a `streaming_config` message.
  # @!attribute [rw] streaming_config
  # @return [Google::Cloud::Speech::V1::StreamingRecognitionConfig]
  # Provides information to the recognizer that specifies how to process the
@@ -55,7 +55,7 @@ module Google
  # `StreamingRecognizeRequest` message must not contain `audio_content` data
  # and all subsequent `StreamingRecognizeRequest` messages must contain
  # `audio_content` data. The audio bytes must be encoded as specified in
- # `RecognitionConfig`. Note: as with all bytes fields, protobuffers use a
+ # `RecognitionConfig`. Note: as with all bytes fields, proto buffers use a
  # pure binary representation (not base64). See
  # [content limits](https://cloud.google.com/speech-to-text/quotas#content).
  class StreamingRecognizeRequest; end
@@ -64,11 +64,11 @@ module Google
  # request.
  # @!attribute [rw] config
  # @return [Google::Cloud::Speech::V1::RecognitionConfig]
- # *Required* Provides information to the recognizer that specifies how to
+ # Required. Provides information to the recognizer that specifies how to
  # process the request.
  # @!attribute [rw] single_utterance
  # @return [true, false]
- # *Optional* If `false` or omitted, the recognizer will perform continuous
+ # If `false` or omitted, the recognizer will perform continuous
  # recognition (continuing to wait for and process audio even if the user
  # pauses speaking) until the client closes the input stream (gRPC API) or
  # until the maximum time limit has been reached. May return multiple
@@ -81,7 +81,7 @@ module Google
  # `true`.
  # @!attribute [rw] interim_results
  # @return [true, false]
- # *Optional* If `true`, interim results (tentative hypotheses) may be
+ # If `true`, interim results (tentative hypotheses) may be
  # returned as they become available (these interim results are indicated with
  # the `is_final=false` flag).
  # If `false` or omitted, only `is_final=true` result(s) are returned.
@@ -93,8 +93,7 @@ module Google
  # @return [Google::Cloud::Speech::V1::RecognitionConfig::AudioEncoding]
  # Encoding of audio data sent in all `RecognitionAudio` messages.
  # This field is optional for `FLAC` and `WAV` audio files and required
- # for all other audio formats. For details, see
- # {Google::Cloud::Speech::V1::RecognitionConfig::AudioEncoding AudioEncoding}.
+ # for all other audio formats. For details, see {Google::Cloud::Speech::V1::RecognitionConfig::AudioEncoding AudioEncoding}.
  # @!attribute [rw] sample_rate_hertz
  # @return [Integer]
  # Sample rate in Hertz of the audio data sent in all
@@ -102,12 +101,11 @@ module Google
  # 16000 is optimal. For best results, set the sampling rate of the audio
  # source to 16000 Hz. If that's not possible, use the native sample rate of
  # the audio source (instead of re-sampling).
- # This field is optional for `FLAC` and `WAV` audio files and required
- # for all other audio formats. For details, see
- # {Google::Cloud::Speech::V1::RecognitionConfig::AudioEncoding AudioEncoding}.
+ # This field is optional for FLAC and WAV audio files, but is
+ # required for all other audio formats. For details, see {Google::Cloud::Speech::V1::RecognitionConfig::AudioEncoding AudioEncoding}.
  # @!attribute [rw] audio_channel_count
  # @return [Integer]
- # *Optional* The number of channels in the input audio data.
+ # The number of channels in the input audio data.
  # ONLY set this for MULTI-CHANNEL recognition.
  # Valid values for LINEAR16 and FLAC are `1`-`8`.
  # Valid values for OGG_OPUS are '1'-'254'.
@@ -126,14 +124,15 @@
  # `audio_channel_count` multiplied by the length of the audio.
  # @!attribute [rw] language_code
  # @return [String]
- # *Required* The language of the supplied audio as a
+ # Required. The language of the supplied audio as a
  # [BCP-47](https://www.rfc-editor.org/rfc/bcp/bcp47.txt) language tag.
  # Example: "en-US".
- # See [Language Support](https://cloud.google.com/speech-to-text/docs/languages)
- # for a list of the currently supported language codes.
+ # See [Language
+ # Support](https://cloud.google.com/speech-to-text/docs/languages) for a list
+ # of the currently supported language codes.
  # @!attribute [rw] max_alternatives
  # @return [Integer]
- # *Optional* Maximum number of recognition hypotheses to be returned.
+ # Maximum number of recognition hypotheses to be returned.
  # Specifically, the maximum number of `SpeechRecognitionAlternative` messages
  # within each `SpeechRecognitionResult`.
  # The server may return fewer than `max_alternatives`.
@@ -141,36 +140,48 @@
  # one. If omitted, will return a maximum of one.
  # @!attribute [rw] profanity_filter
  # @return [true, false]
- # *Optional* If set to `true`, the server will attempt to filter out
+ # If set to `true`, the server will attempt to filter out
  # profanities, replacing all but the initial character in each filtered word
  # with asterisks, e.g. "f***". If set to `false` or omitted, profanities
  # won't be filtered out.
  # @!attribute [rw] speech_contexts
  # @return [Array<Google::Cloud::Speech::V1::SpeechContext>]
- # *Optional* array of {Google::Cloud::Speech::V1::SpeechContext SpeechContext}.
+ # Array of {Google::Cloud::Speech::V1::SpeechContext SpeechContext}.
  # A means to provide context to assist the speech recognition. For more
- # information, see [Phrase Hints](https://cloud.google.com/speech-to-text/docs/basics#phrase-hints).
+ # information, see
+ # [speech
+ # adaptation](https://cloud.google.com/speech-to-text/docs/context-strength).
  # @!attribute [rw] enable_word_time_offsets
  # @return [true, false]
- # *Optional* If `true`, the top result includes a list of words and
+ # If `true`, the top result includes a list of words and
  # the start and end time offsets (timestamps) for those words. If
  # `false`, no word-level time offset information is returned. The default is
  # `false`.
  # @!attribute [rw] enable_automatic_punctuation
  # @return [true, false]
- # *Optional* If 'true', adds punctuation to recognition result hypotheses.
+ # If 'true', adds punctuation to recognition result hypotheses.
  # This feature is only available in select languages. Setting this for
  # requests in other languages has no effect at all.
  # The default 'false' value does not add punctuation to result hypotheses.
  # Note: This is currently offered as an experimental service, complimentary
  # to all users. In the future this may be exclusively available as a
  # premium feature.
+ # @!attribute [rw] diarization_config
+ # @return [Google::Cloud::Speech::V1::SpeakerDiarizationConfig]
+ # Config to enable speaker diarization and set additional
+ # parameters to make diarization better suited for your application.
+ # Note: When this is enabled, we send all the words from the beginning of the
+ # audio for the top alternative in every consecutive STREAMING responses.
+ # This is done in order to improve our speaker tags as our models learn to
+ # identify the speakers in the conversation over time.
+ # For non-streaming requests, the diarization results will be provided only
+ # in the top alternative of the FINAL SpeechRecognitionResult.
  # @!attribute [rw] metadata
  # @return [Google::Cloud::Speech::V1::RecognitionMetadata]
- # *Optional* Metadata regarding this request.
+ # Metadata regarding this request.
  # @!attribute [rw] model
  # @return [String]
- # *Optional* Which model to select for the given request. Select the model
+ # Which model to select for the given request. Select the model
  # best suited to your domain to get best results. If a model is not
  # explicitly specified, then we auto-select a model based on the parameters
  # in the RecognitionConfig.
@@ -204,7 +215,7 @@
  # </table>
  # @!attribute [rw] use_enhanced
  # @return [true, false]
- # *Optional* Set to true to use an enhanced model for speech recognition.
+ # Set to true to use an enhanced model for speech recognition.
  # If `use_enhanced` is set to true and the `model` field is not set, then
  # an appropriate enhanced model is chosen if an enhanced model exists for
  # the audio.
@@ -215,13 +226,15 @@
  class RecognitionConfig
  # The encoding of the audio data sent in the request.
  #
- # All encodings support only 1 channel (mono) audio.
+ # All encodings support only 1 channel (mono) audio, unless the
+ # `audio_channel_count` and `enable_separate_recognition_per_channel` fields
+ # are set.
  #
  # For best results, the audio source should be captured and transmitted using
  # a lossless encoding (`FLAC` or `LINEAR16`). The accuracy of the speech
  # recognition can be reduced if lossy codecs are used to capture or transmit
  # audio, particularly if background noise is present. Lossy codecs include
- # `MULAW`, `AMR`, `AMR_WB`, `OGG_OPUS`, and `SPEEX_WITH_HEADER_BYTE`.
+ # `MULAW`, `AMR`, `AMR_WB`, `OGG_OPUS`, `SPEEX_WITH_HEADER_BYTE`, and `MP3`.
  #
  # The `FLAC` and `WAV` audio file formats include a header that describes the
  # included audio content. You can request recognition for `WAV` files that
@@ -232,8 +245,7 @@
  # an `AudioEncoding` when you send send `FLAC` or `WAV` audio, the
  # encoding configuration must match the encoding described in the audio
  # header; otherwise the request returns an
- # {Google::Rpc::Code::INVALID_ARGUMENT} error
- # code.
+ # {Google::Rpc::Code::INVALID_ARGUMENT} error code.
  module AudioEncoding
  # Not specified.
  ENCODING_UNSPECIFIED = 0
@@ -280,6 +292,31 @@
  end
  end

+ # Config to enable speaker diarization.
+ # @!attribute [rw] enable_speaker_diarization
+ # @return [true, false]
+ # If 'true', enables speaker detection for each recognized word in
+ # the top alternative of the recognition result using a speaker_tag provided
+ # in the WordInfo.
+ # @!attribute [rw] min_speaker_count
+ # @return [Integer]
+ # Minimum number of speakers in the conversation. This range gives you more
+ # flexibility by allowing the system to automatically determine the correct
+ # number of speakers. If not set, the default value is 2.
+ # @!attribute [rw] max_speaker_count
+ # @return [Integer]
+ # Maximum number of speakers in the conversation. This range gives you more
+ # flexibility by allowing the system to automatically determine the correct
+ # number of speakers. If not set, the default value is 6.
+ # @!attribute [rw] speaker_tag
+ # @return [Integer]
+ # A distinct integer value is assigned for every speaker within
+ # the audio. This field specifies which one of those speakers was detected to
+ # have spoken this word. Value ranges from '1' to diarization_speaker_count.
+ # speaker_tag is set if enable_speaker_diarization = 'true' and only in the
+ # top alternative.
+ class SpeakerDiarizationConfig; end
+
  # Description of audio data to be recognized.
  # @!attribute [rw] interaction_type
  # @return [Google::Cloud::Speech::V1::RecognitionMetadata::InteractionType]
@@ -411,22 +448,28 @@
  # in the results.
  # @!attribute [rw] phrases
  # @return [Array<String>]
- # *Optional* A list of strings containing words and phrases "hints" so that
+ # A list of strings containing words and phrases "hints" so that
  # the speech recognition is more likely to recognize them. This can be used
  # to improve the accuracy for specific words and phrases, for example, if
  # specific commands are typically spoken by the user. This can also be used
  # to add additional words to the vocabulary of the recognizer. See
  # [usage limits](https://cloud.google.com/speech-to-text/quotas#content).
+ #
+ # List items can also be set to classes for groups of words that represent
+ # common concepts that occur in natural language. For example, rather than
+ # providing phrase hints for every month of the year, using the $MONTH class
+ # improves the likelihood of correctly transcribing audio that includes
+ # months.
  class SpeechContext; end

  # Contains audio data in the encoding specified in the `RecognitionConfig`.
  # Either `content` or `uri` must be supplied. Supplying both or neither
- # returns {Google::Rpc::Code::INVALID_ARGUMENT}.
- # See [content limits](https://cloud.google.com/speech-to-text/quotas#content).
+ # returns {Google::Rpc::Code::INVALID_ARGUMENT}. See
+ # [content limits](https://cloud.google.com/speech-to-text/quotas#content).
  # @!attribute [rw] content
  # @return [String]
  # The audio data bytes encoded as specified in
- # `RecognitionConfig`. Note: as with all bytes fields, protobuffers use a
+ # `RecognitionConfig`. Note: as with all bytes fields, proto buffers use a
  # pure binary representation, whereas JSON representations use base64.
  # @!attribute [rw] uri
  # @return [String]
@@ -435,9 +478,8 @@
  # Currently, only Google Cloud Storage URIs are
  # supported, which must be specified in the following format:
  # `gs://bucket_name/object_name` (other URI formats return
- # {Google::Rpc::Code::INVALID_ARGUMENT}).
- # For more information, see [Request
- # URIs](https://cloud.google.com/storage/docs/reference-uris).
+ # {Google::Rpc::Code::INVALID_ARGUMENT}). For more information, see
+ # [Request URIs](https://cloud.google.com/storage/docs/reference-uris).
  class RecognitionAudio; end

  # The only message returned to the client by the `Recognize` method. It
@@ -445,7 +487,7 @@
  # messages.
  # @!attribute [rw] results
  # @return [Array<Google::Cloud::Speech::V1::SpeechRecognitionResult>]
- # Output only. Sequential list of transcription results corresponding to
+ # Sequential list of transcription results corresponding to
  # sequential portions of audio.
  class RecognizeResponse; end

@@ -456,7 +498,7 @@
  # service.
  # @!attribute [rw] results
  # @return [Array<Google::Cloud::Speech::V1::SpeechRecognitionResult>]
- # Output only. Sequential list of transcription results corresponding to
+ # Sequential list of transcription results corresponding to
  # sequential portions of audio.
  class LongRunningRecognizeResponse; end

@@ -526,17 +568,17 @@
  # one or more (repeated) `results`.
  # @!attribute [rw] error
  # @return [Google::Rpc::Status]
- # Output only. If set, returns a {Google::Rpc::Status}
- # message that specifies the error for the operation.
+ # If set, returns a {Google::Rpc::Status} message that
+ # specifies the error for the operation.
  # @!attribute [rw] results
  # @return [Array<Google::Cloud::Speech::V1::StreamingRecognitionResult>]
- # Output only. This repeated list contains zero or more results that
+ # This repeated list contains zero or more results that
  # correspond to consecutive portions of the audio currently being processed.
  # It contains zero or one `is_final=true` result (the newly settled portion),
  # followed by zero or more `is_final=false` results (the interim results).
  # @!attribute [rw] speech_event_type
  # @return [Google::Cloud::Speech::V1::StreamingRecognizeResponse::SpeechEventType]
- # Output only. Indicates the type of speech event.
+ # Indicates the type of speech event.
  class StreamingRecognizeResponse
  # Indicates the type of speech event.
  module SpeechEventType
@@ -558,27 +600,27 @@
  # that is currently being processed.
  # @!attribute [rw] alternatives
  # @return [Array<Google::Cloud::Speech::V1::SpeechRecognitionAlternative>]
- # Output only. May contain one or more recognition hypotheses (up to the
+ # May contain one or more recognition hypotheses (up to the
  # maximum specified in `max_alternatives`).
  # These alternatives are ordered in terms of accuracy, with the top (first)
  # alternative being the most probable, as ranked by the recognizer.
  # @!attribute [rw] is_final
  # @return [true, false]
- # Output only. If `false`, this `StreamingRecognitionResult` represents an
+ # If `false`, this `StreamingRecognitionResult` represents an
  # interim result that may change. If `true`, this is the final time the
  # speech service will return this particular `StreamingRecognitionResult`,
  # the recognizer will not return any further hypotheses for this portion of
  # the transcript and corresponding audio.
  # @!attribute [rw] stability
  # @return [Float]
- # Output only. An estimate of the likelihood that the recognizer will not
+ # An estimate of the likelihood that the recognizer will not
  # change its guess about this interim result. Values range from 0.0
  # (completely unstable) to 1.0 (completely stable).
  # This field is only provided for interim results (`is_final=false`).
  # The default of 0.0 is a sentinel value indicating `stability` was not set.
  # @!attribute [rw] result_end_time
  # @return [Google::Protobuf::Duration]
- # Output only. Time offset of the end of this result relative to the
+ # Time offset of the end of this result relative to the
  # beginning of the audio.
  # @!attribute [rw] channel_tag
  # @return [Integer]
@@ -587,16 +629,15 @@
  # For audio_channel_count = N, its output values can range from '1' to 'N'.
  # @!attribute [rw] language_code
  # @return [String]
- # Output only. The
- # [BCP-47](https://www.rfc-editor.org/rfc/bcp/bcp47.txt) language tag of the
- # language in this result. This language code was detected to have the most
- # likelihood of being spoken in the audio.
+ # The [BCP-47](https://www.rfc-editor.org/rfc/bcp/bcp47.txt) language tag of
+ # the language in this result. This language code was detected to have the
+ # most likelihood of being spoken in the audio.
  class StreamingRecognitionResult; end

  # A speech recognition result corresponding to a portion of the audio.
  # @!attribute [rw] alternatives
  # @return [Array<Google::Cloud::Speech::V1::SpeechRecognitionAlternative>]
- # Output only. May contain one or more recognition hypotheses (up to the
+ # May contain one or more recognition hypotheses (up to the
  # maximum specified in `max_alternatives`).
  # These alternatives are ordered in terms of accuracy, with the top (first)
  # alternative being the most probable, as ranked by the recognizer.
@@ -610,10 +651,10 @@
  # Alternative hypotheses (a.k.a. n-best list).
  # @!attribute [rw] transcript
  # @return [String]
- # Output only. Transcript text representing the words that the user spoke.
+ # Transcript text representing the words that the user spoke.
  # @!attribute [rw] confidence
  # @return [Float]
- # Output only. The confidence estimate between 0.0 and 1.0. A higher number
+ # The confidence estimate between 0.0 and 1.0. A higher number
  # indicates an estimated greater likelihood that the recognized words are
  # correct. This field is set only for the top alternative of a non-streaming
  # result or, of a streaming result where `is_final=true`.
@@ -622,7 +663,7 @@
  # The default of 0.0 is a sentinel value indicating `confidence` was not set.
  # @!attribute [rw] words
  # @return [Array<Google::Cloud::Speech::V1::WordInfo>]
- # Output only. A list of word-specific information for each recognized word.
+ # A list of word-specific information for each recognized word.
  # Note: When `enable_speaker_diarization` is true, you will see all the words
  # from the beginning of the audio.
  class SpeechRecognitionAlternative; end
@@ -630,7 +671,7 @@
  # Word-specific information for recognized words.
  # @!attribute [rw] start_time
  # @return [Google::Protobuf::Duration]
- # Output only. Time offset relative to the beginning of the audio,
+ # Time offset relative to the beginning of the audio,
  # and corresponding to the start of the spoken word.
  # This field is only set if `enable_word_time_offsets=true` and only
  # in the top hypothesis.
@@ -638,7 +679,7 @@
  # vary.
  # @!attribute [rw] end_time
  # @return [Google::Protobuf::Duration]
- # Output only. Time offset relative to the beginning of the audio,
+ # Time offset relative to the beginning of the audio,
  # and corresponding to the end of the spoken word.
  # This field is only set if `enable_word_time_offsets=true` and only
  # in the top hypothesis.
@@ -646,7 +687,7 @@
  # vary.
  # @!attribute [rw] word
  # @return [String]
- # Output only. The word corresponding to this set of information.
+ # The word corresponding to this set of information.
  class WordInfo; end
  end
  end
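Per the documentation changes above, when diarization is enabled the final result's top alternative carries every recognized word with its speaker tag. A rough sketch of consuming such a result: `response` below is a hand-built stand-in hash in place of a real `RecognizeResponse` (a real response would come from `speech_client.recognize(config, audio)`), and the transcript and tags are invented for illustration.

```ruby
# Stand-in for a RecognizeResponse with diarization enabled: the top
# alternative of the final result lists each word with a speaker_tag.
response = {
  results: [
    { alternatives: [
        { transcript: "hi there",
          words: [
            { word: "hi",    speaker_tag: 1 },
            { word: "there", speaker_tag: 2 }
          ] }
      ] }
  ]
}

# Attribute each recognized word to the speaker the service assigned it.
words  = response[:results].last[:alternatives].first[:words]
tagged = words.map { |w| "speaker #{w[:speaker_tag]}: #{w[:word]}" }
puts tagged
```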