ibm_watson 2.0.1 → 2.1.2
- checksums.yaml +4 -4
- data/README.md +9 -29
- data/lib/ibm_watson/assistant_v1.rb +111 -77
- data/lib/ibm_watson/assistant_v2.rb +83 -59
- data/lib/ibm_watson/compare_comply_v1.rb +11 -4
- data/lib/ibm_watson/discovery_v1.rb +2 -3
- data/lib/ibm_watson/discovery_v2.rb +97 -7
- data/lib/ibm_watson/language_translator_v3.rb +1 -2
- data/lib/ibm_watson/natural_language_classifier_v1.rb +9 -3
- data/lib/ibm_watson/natural_language_understanding_v1.rb +692 -3
- data/lib/ibm_watson/personality_insights_v3.rb +13 -11
- data/lib/ibm_watson/speech_to_text_v1.rb +257 -106
- data/lib/ibm_watson/text_to_speech_v1.rb +599 -19
- data/lib/ibm_watson/tone_analyzer_v3.rb +1 -2
- data/lib/ibm_watson/version.rb +1 -1
- data/lib/ibm_watson/visual_recognition_v3.rb +1 -2
- data/lib/ibm_watson/visual_recognition_v4.rb +11 -8
- data/test/integration/test_discovery_v2.rb +15 -0
- data/test/integration/test_natural_language_understanding_v1.rb +134 -1
- data/test/integration/test_text_to_speech_v1.rb +57 -0
- data/test/unit/test_discovery_v2.rb +29 -0
- data/test/unit/test_natural_language_understanding_v1.rb +231 -0
- data/test/unit/test_text_to_speech_v1.rb +145 -0
- metadata +7 -7
@@ -14,17 +14,20 @@
|
|
14
14
|
# See the License for the specific language governing permissions and
|
15
15
|
# limitations under the License.
|
16
16
|
#
|
17
|
-
# IBM OpenAPI SDK Code Generator Version: 3.
|
17
|
+
# IBM OpenAPI SDK Code Generator Version: 3.31.0-902c9336-20210504-161156
|
18
18
|
#
|
19
|
-
# IBM
|
20
|
-
#
|
21
|
-
#
|
22
|
-
#
|
23
|
-
# Watson™ Natural Language
|
24
|
-
#
|
25
|
-
#
|
26
|
-
#
|
27
|
-
#
|
19
|
+
# IBM Watson™ Personality Insights is discontinued. Existing instances are
|
20
|
+
# supported until 1 December 2021, but as of 1 December 2020, you cannot create new
|
21
|
+
# instances. Any instance that exists on 1 December 2021 will be deleted.<br/><br/>No
|
22
|
+
# direct replacement exists for Personality Insights. However, you can consider using [IBM
|
23
|
+
# Watson™ Natural Language
|
24
|
+
# Understanding](https://cloud.ibm.com/docs/natural-language-understanding?topic=natural-language-understanding-about)
|
25
|
+
# on IBM Cloud® as part of a replacement analytic workflow for your Personality
|
26
|
+
# Insights use cases. You can use Natural Language Understanding to extract data and
|
27
|
+
# insights from text, such as keywords, categories, sentiment, emotion, and syntax. For
|
28
|
+
# more information about the personality models in Personality Insights, see [The science
|
29
|
+
# behind the
|
30
|
+
# service](https://cloud.ibm.com/docs/personality-insights?topic=personality-insights-science).
|
28
31
|
# {: deprecated}
|
29
32
|
#
|
30
33
|
# The IBM Watson Personality Insights service enables applications to derive insights from
|
@@ -54,7 +57,6 @@ require "json"
|
|
54
57
|
require "ibm_cloud_sdk_core"
|
55
58
|
require_relative "./common.rb"
|
56
59
|
|
57
|
-
# Module for the Watson APIs
|
58
60
|
module IBMWatson
|
59
61
|
##
|
60
62
|
# The Personality Insights V3 service.
|
@@ -14,14 +14,21 @@
|
|
14
14
|
# See the License for the specific language governing permissions and
|
15
15
|
# limitations under the License.
|
16
16
|
#
|
17
|
-
# IBM OpenAPI SDK Code Generator Version: 3.
|
17
|
+
# IBM OpenAPI SDK Code Generator Version: 3.31.0-902c9336-20210504-161156
|
18
18
|
#
|
19
19
|
# The IBM Watson™ Speech to Text service provides APIs that use IBM's
|
20
20
|
# speech-recognition capabilities to produce transcripts of spoken audio. The service can
|
21
21
|
# transcribe speech from various languages and audio formats. In addition to basic
|
22
22
|
# transcription, the service can produce detailed information about many different aspects
|
23
|
-
# of the audio.
|
24
|
-
#
|
23
|
+
# of the audio. It returns all JSON response content in the UTF-8 character set.
|
24
|
+
#
|
25
|
+
# The service supports two types of models: previous-generation models that include the
|
26
|
+
# terms `Broadband` and `Narrowband` in their names, and beta next-generation models that
|
27
|
+
# include the terms `Multimedia` and `Telephony` in their names. Broadband and multimedia
|
28
|
+
# models have minimum sampling rates of 16 kHz. Narrowband and telephony models have
|
29
|
+
# minimum sampling rates of 8 kHz. The beta next-generation models currently support fewer
|
30
|
+
# languages and features, but they offer high throughput and greater transcription
|
31
|
+
# accuracy.
|
25
32
|
#
|
26
33
|
# For speech recognition, the service supports synchronous and asynchronous HTTP
|
27
34
|
# Representational State Transfer (REST) interfaces. It also supports a WebSocket
|
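The naming convention described in the updated header comment (Broadband and Multimedia at 16 kHz, Narrowband and Telephony at 8 kHz) can be expressed as a quick lookup. This is only a sketch: `minimum_sampling_rate_hz` is a hypothetical helper, not part of the `ibm_watson` gem, and the model names are illustrative. The authoritative rate for a model is whatever the service reports when you list or get the model.

```ruby
# Hypothetical helper (not part of ibm_watson): infers the minimum sampling
# rate from the term embedded in a model name, following the header comment.
# Returns nil when the name contains none of the four terms.
def minimum_sampling_rate_hz(model_name)
  case model_name
  when /Broadband|Multimedia/ then 16_000
  when /Narrowband|Telephony/ then 8_000
  end
end

puts minimum_sampling_rate_hz("en-US_BroadbandModel").inspect
puts minimum_sampling_rate_hz("en-US_Telephony").inspect
```

A name-based guess like this is handy for a pre-flight check on audio, but it should not replace the value returned by the service.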
@@ -37,8 +44,9 @@
|
|
37
44
|
# can recognize.
|
38
45
|
#
|
39
46
|
# Language model customization and acoustic model customization are generally available
|
40
|
-
# for production use with all
|
41
|
-
# beta functionality for all
|
47
|
+
# for production use with all previous-generation models that are generally available.
|
48
|
+
# Grammars are beta functionality for all previous-generation models that support language
|
49
|
+
# model customization. Next-generation models do not support customization at this time.
|
42
50
|
|
43
51
|
require "concurrent"
|
44
52
|
require "erb"
|
@@ -46,7 +54,6 @@ require "json"
|
|
46
54
|
require "ibm_cloud_sdk_core"
|
47
55
|
require_relative "./common.rb"
|
48
56
|
|
49
|
-
# Module for the Watson APIs
|
50
57
|
module IBMWatson
|
51
58
|
##
|
52
59
|
# The Speech to Text V1 service.
|
@@ -89,8 +96,8 @@ module IBMWatson
|
|
89
96
|
# among other things. The ordering of the list of models can change from call to
|
90
97
|
# call; do not rely on an alphabetized or static list of models.
|
91
98
|
#
|
92
|
-
# **See also:** [
|
93
|
-
# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models
|
99
|
+
# **See also:** [Listing
|
100
|
+
# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-list).
|
94
101
|
# @return [IBMCloudSdkCore::DetailedResponse] A `IBMCloudSdkCore::DetailedResponse` object representing the response.
|
95
102
|
def list_models
|
96
103
|
headers = {
|
@@ -116,10 +123,11 @@ module IBMWatson
|
|
116
123
|
# with the service. The information includes the name of the model and its minimum
|
117
124
|
# sampling rate in Hertz, among other things.
|
118
125
|
#
|
119
|
-
# **See also:** [
|
120
|
-
# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models
|
126
|
+
# **See also:** [Listing
|
127
|
+
# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-list).
|
121
128
|
# @param model_id [String] The identifier of the model in the form of its name from the output of the **Get a
|
122
|
-
# model** method.
|
129
|
+
# model** method. (**Note:** The model `ar-AR_BroadbandModel` is deprecated; use
|
130
|
+
# `ar-MS_BroadbandModel` instead.).
|
123
131
|
# @return [IBMCloudSdkCore::DetailedResponse] A `IBMCloudSdkCore::DetailedResponse` object representing the response.
|
124
132
|
def get_model(model_id:)
|
125
133
|
raise ArgumentError.new("model_id must be provided") if model_id.nil?
|
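The deprecation note added to the `model_id` documentation suggests callers may want to translate the old Arabic model name before making a request. A minimal sketch of that idea follows, assuming a caller-side mapping. `DEPRECATED_MODELS` and `resolve_model_id` are hypothetical names and are not provided by the gem.

```ruby
# Hypothetical caller-side mapping (not part of ibm_watson), based on the
# deprecation note in the doc comment above.
DEPRECATED_MODELS = {
  "ar-AR_BroadbandModel" => "ar-MS_BroadbandModel"
}.freeze

# Returns the replacement for a deprecated model ID, or the ID unchanged.
def resolve_model_id(model_id)
  DEPRECATED_MODELS.fetch(model_id, model_id)
end

puts resolve_model_id("ar-AR_BroadbandModel")
puts resolve_model_id("en-US_BroadbandModel")
```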
@@ -144,7 +152,7 @@ module IBMWatson
|
|
144
152
|
#########################
|
145
153
|
|
146
154
|
##
|
147
|
-
# @!method recognize(audio:, content_type: nil, model: nil, language_customization_id: nil, acoustic_customization_id: nil, base_model_version: nil, customization_weight: nil, inactivity_timeout: nil, keywords: nil, keywords_threshold: nil, max_alternatives: nil, word_alternatives_threshold: nil, word_confidence: nil, timestamps: nil, profanity_filter: nil, smart_formatting: nil, speaker_labels: nil, customization_id: nil, grammar_name: nil, redaction: nil, audio_metrics: nil, end_of_phrase_silence_time: nil, split_transcript_at_phrase_end: nil, speech_detector_sensitivity: nil, background_audio_suppression: nil)
|
155
|
+
# @!method recognize(audio:, content_type: nil, model: nil, language_customization_id: nil, acoustic_customization_id: nil, base_model_version: nil, customization_weight: nil, inactivity_timeout: nil, keywords: nil, keywords_threshold: nil, max_alternatives: nil, word_alternatives_threshold: nil, word_confidence: nil, timestamps: nil, profanity_filter: nil, smart_formatting: nil, speaker_labels: nil, customization_id: nil, grammar_name: nil, redaction: nil, audio_metrics: nil, end_of_phrase_silence_time: nil, split_transcript_at_phrase_end: nil, speech_detector_sensitivity: nil, background_audio_suppression: nil, low_latency: nil)
|
148
156
|
# Recognize audio.
|
149
157
|
# Sends audio and returns transcription results for a recognition request. You can
|
150
158
|
# pass a maximum of 100 MB and a minimum of 100 bytes of audio with a request. The
|
@@ -211,8 +219,40 @@ module IBMWatson
|
|
211
219
|
# sampling rate of the audio is lower than the minimum required rate, the request
|
212
220
|
# fails.
|
213
221
|
#
|
214
|
-
# **See also:** [
|
215
|
-
# formats](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-audio-formats
|
222
|
+
# **See also:** [Supported audio
|
223
|
+
# formats](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-audio-formats).
|
224
|
+
#
|
225
|
+
#
|
226
|
+
# ### Next-generation models
|
227
|
+
#
|
228
|
+
# **Note:** The next-generation language models are beta functionality. They
|
229
|
+
# support a limited number of languages and features at this time. The supported
|
230
|
+
# languages, models, and features will increase with future releases.
|
231
|
+
#
|
232
|
+
# The service supports next-generation `Multimedia` (16 kHz) and `Telephony` (8 kHz)
|
233
|
+
# models for many languages. Next-generation models have higher throughput than the
|
234
|
+
# service's previous generation of `Broadband` and `Narrowband` models. When you use
|
235
|
+
# next-generation models, the service can return transcriptions more quickly and
|
236
|
+
# also provide noticeably better transcription accuracy.
|
237
|
+
#
|
238
|
+
# You specify a next-generation model by using the `model` query parameter, as you
|
239
|
+
# do a previous-generation model. Next-generation models support the same request
|
240
|
+
# headers as previous-generation models, but they support only the following
|
241
|
+
# additional query parameters:
|
242
|
+
# * `background_audio_suppression`
|
243
|
+
# * `inactivity_timeout`
|
244
|
+
# * `profanity_filter`
|
245
|
+
# * `redaction`
|
246
|
+
# * `smart_formatting`
|
247
|
+
# * `speaker_labels`
|
248
|
+
# * `speech_detector_sensitivity`
|
249
|
+
# * `timestamps`
|
250
|
+
#
|
251
|
+
# Many next-generation models also support the beta `low_latency` parameter, which
|
252
|
+
# is not available with previous-generation models.
|
253
|
+
#
|
254
|
+
# **See also:** [Next-generation languages and
|
255
|
+
# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-ng).
|
216
256
|
#
|
217
257
|
#
|
218
258
|
# ### Multipart speech recognition
|
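The new "Next-generation models" section lists the only additional query parameters those models accept. The sketch below shows how a caller might flag unsupported parameters before sending a request. It assumes the list in the comment is complete for this release. `NEXT_GEN_QUERY_PARAMS` and `unsupported_next_gen_params` are hypothetical names, not part of the gem, and `low_latency` is included even though the comment says only some next-generation models support it.

```ruby
# Hypothetical allow-list (not part of ibm_watson), copied from the
# "Next-generation models" doc comment; low_latency is beta and
# model-dependent.
NEXT_GEN_QUERY_PARAMS = %w[
  background_audio_suppression inactivity_timeout profanity_filter redaction
  smart_formatting speaker_labels speech_detector_sensitivity timestamps
  low_latency
].freeze

# Returns the names of parameters that the doc comment does not list as
# supported by next-generation models. The model parameter is always allowed.
def unsupported_next_gen_params(params)
  params.keys.map(&:to_s) - NEXT_GEN_QUERY_PARAMS - ["model"]
end

request = { model: "en-US_Telephony", timestamps: true, keywords: ["ibm"] }
puts unsupported_next_gen_params(request).inspect
```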
@@ -235,15 +275,19 @@ module IBMWatson
|
|
235
275
|
# @param audio [File] The audio to transcribe.
|
236
276
|
# @param content_type [String] The format (MIME type) of the audio. For more information about specifying an
|
237
277
|
# audio format, see **Audio formats (content types)** in the method description.
|
238
|
-
# @param model [String] The identifier of the model that is to be used for the recognition request.
|
239
|
-
#
|
240
|
-
#
|
278
|
+
# @param model [String] The identifier of the model that is to be used for the recognition request.
|
279
|
+
# (**Note:** The model `ar-AR_BroadbandModel` is deprecated; use
|
280
|
+
# `ar-MS_BroadbandModel` instead.) See [Languages and
|
281
|
+
# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models) and
|
282
|
+
# [Next-generation languages and
|
283
|
+
# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-ng).
|
241
284
|
# @param language_customization_id [String] The customization ID (GUID) of a custom language model that is to be used with the
|
242
285
|
# recognition request. The base model of the specified custom language model must
|
243
286
|
# match the model specified with the `model` parameter. You must make the request
|
244
287
|
# with credentials for the instance of the service that owns the custom model. By
|
245
|
-
# default, no custom language model is used. See [
|
246
|
-
#
|
288
|
+
# default, no custom language model is used. See [Using a custom language model for
|
289
|
+
# speech
|
290
|
+
# recognition](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-languageUse).
|
247
291
|
#
|
248
292
|
#
|
249
293
|
# **Note:** Use this parameter instead of the deprecated `customization_id`
|
@@ -252,14 +296,16 @@ module IBMWatson
|
|
252
296
|
# recognition request. The base model of the specified custom acoustic model must
|
253
297
|
# match the model specified with the `model` parameter. You must make the request
|
254
298
|
# with credentials for the instance of the service that owns the custom model. By
|
255
|
-
# default, no custom acoustic model is used. See [
|
256
|
-
#
|
299
|
+
# default, no custom acoustic model is used. See [Using a custom acoustic model for
|
300
|
+
# speech
|
301
|
+
# recognition](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-acousticUse).
|
257
302
|
# @param base_model_version [String] The version of the specified base model that is to be used with the recognition
|
258
303
|
# request. Multiple versions of a base model can exist when a model is updated for
|
259
304
|
# internal improvements. The parameter is intended primarily for use with custom
|
260
305
|
# models that have been upgraded for a new base model. The default value depends on
|
261
|
-
# whether the parameter is used with or without a custom model. See [
|
262
|
-
#
|
306
|
+
# whether the parameter is used with or without a custom model. See [Making speech
|
307
|
+
# recognition requests with upgraded custom
|
308
|
+
# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-upgrade-use#custom-upgrade-use-recognition).
|
263
309
|
# @param customization_weight [Float] If you specify the customization ID (GUID) of a custom language model with the
|
264
310
|
# recognition request, the customization weight tells the service how much weight to
|
265
311
|
# give to words from the custom language model compared to those from the base model
|
@@ -276,8 +322,8 @@ module IBMWatson
|
|
276
322
|
# custom model's domain, but it can negatively affect performance on non-domain
|
277
323
|
# phrases.
|
278
324
|
#
|
279
|
-
# See [
|
280
|
-
#
|
325
|
+
# See [Using customization
|
326
|
+
# weight](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-languageUse#weight).
|
281
327
|
# @param inactivity_timeout [Fixnum] The time in seconds after which, if only silence (no speech) is detected in
|
282
328
|
# streaming audio, the connection is closed with a 400 error. The parameter is
|
283
329
|
# useful for stopping audio submission from a live microphone when a user simply
|
@@ -294,34 +340,34 @@ module IBMWatson
|
|
294
340
|
# for double-byte languages might be shorter. Keywords are case-insensitive.
|
295
341
|
#
|
296
342
|
# See [Keyword
|
297
|
-
# spotting](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
|
343
|
+
# spotting](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-spotting#keyword-spotting).
|
298
344
|
# @param keywords_threshold [Float] A confidence value that is the lower bound for spotting a keyword. A word is
|
299
345
|
# considered to match a keyword if its confidence is greater than or equal to the
|
300
346
|
# threshold. Specify a probability between 0.0 and 1.0. If you specify a threshold,
|
301
347
|
# you must also specify one or more keywords. The service performs no keyword
|
302
348
|
# spotting if you omit either parameter. See [Keyword
|
303
|
-
# spotting](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
|
349
|
+
# spotting](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-spotting#keyword-spotting).
|
304
350
|
# @param max_alternatives [Fixnum] The maximum number of alternative transcripts that the service is to return. By
|
305
351
|
# default, the service returns a single transcript. If you specify a value of `0`,
|
306
352
|
# the service uses the default value, `1`. See [Maximum
|
307
|
-
# alternatives](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
|
353
|
+
# alternatives](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metadata#max-alternatives).
|
308
354
|
# @param word_alternatives_threshold [Float] A confidence value that is the lower bound for identifying a hypothesis as a
|
309
355
|
# possible word alternative (also known as "Confusion Networks"). An alternative
|
310
356
|
# word is considered if its confidence is greater than or equal to the threshold.
|
311
357
|
# Specify a probability between 0.0 and 1.0. By default, the service computes no
|
312
358
|
# alternative words. See [Word
|
313
|
-
# alternatives](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
|
359
|
+
# alternatives](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-spotting#word-alternatives).
|
314
360
|
# @param word_confidence [Boolean] If `true`, the service returns a confidence measure in the range of 0.0 to 1.0 for
|
315
361
|
# each word. By default, the service returns no word confidence scores. See [Word
|
316
|
-
# confidence](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
|
362
|
+
# confidence](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metadata#word-confidence).
|
317
363
|
# @param timestamps [Boolean] If `true`, the service returns time alignment for each word. By default, no
|
318
364
|
# timestamps are returned. See [Word
|
319
|
-
# timestamps](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
|
365
|
+
# timestamps](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metadata#word-timestamps).
|
320
366
|
# @param profanity_filter [Boolean] If `true`, the service filters profanity from all output except for keyword
|
321
367
|
# results by replacing inappropriate words with a series of asterisks. Set the
|
322
368
|
# parameter to `false` to return results with no censoring. Applies to US English
|
323
|
-
# transcription only. See [Profanity
|
324
|
-
# filtering](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
|
369
|
+
# and Japanese transcription only. See [Profanity
|
370
|
+
# filtering](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-formatting#profanity-filtering).
|
325
371
|
# @param smart_formatting [Boolean] If `true`, the service converts dates, times, series of digits and numbers, phone
|
326
372
|
# numbers, currency values, and internet addresses into more readable, conventional
|
327
373
|
# representations in the final transcript of a recognition request. For US English,
|
@@ -331,19 +377,21 @@ module IBMWatson
|
|
331
377
|
# **Note:** Applies to US English, Japanese, and Spanish transcription only.
|
332
378
|
#
|
333
379
|
# See [Smart
|
334
|
-
# formatting](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
|
380
|
+
# formatting](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-formatting#smart-formatting).
|
335
381
|
# @param speaker_labels [Boolean] If `true`, the response includes labels that identify which words were spoken by
|
336
382
|
# which participants in a multi-person exchange. By default, the service returns no
|
337
383
|
# speaker labels. Setting `speaker_labels` to `true` forces the `timestamps`
|
338
384
|
# parameter to be `true`, regardless of whether you specify `false` for the
|
339
385
|
# parameter.
|
340
|
-
#
|
341
|
-
#
|
342
|
-
#
|
343
|
-
#
|
344
|
-
#
|
345
|
-
#
|
346
|
-
# labels
|
386
|
+
# * For previous-generation models, can be used for US English, Australian English,
|
387
|
+
# German, Japanese, Korean, and Spanish (both broadband and narrowband models) and
|
388
|
+
# UK English (narrowband model) transcription only.
|
389
|
+
# * For next-generation models, can be used for English (Australian, UK, and US),
|
390
|
+
# German, and Spanish transcription only.
|
391
|
+
#
|
392
|
+
# Restrictions and limitations apply to the use of speaker labels for both types of
|
393
|
+
# models. See [Speaker
|
394
|
+
# labels](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-speaker-labels).
|
347
395
|
# @param customization_id [String] **Deprecated.** Use the `language_customization_id` parameter to specify the
|
348
396
|
# customization ID (GUID) of a custom language model that is to be used with the
|
349
397
|
# recognition request. Do not specify both parameters with a request.
|
@@ -352,7 +400,8 @@ module IBMWatson
|
|
352
400
|
# specify the name of the custom language model for which the grammar is defined.
|
353
401
|
# The service recognizes only strings that are recognized by the specified grammar;
|
354
402
|
# it does not recognize other custom words from the model's words resource. See
|
355
|
-
# [
|
403
|
+
# [Using a grammar for speech
|
404
|
+
# recognition](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-grammarUse).
|
356
405
|
# @param redaction [Boolean] If `true`, the service redacts, or masks, numeric data from final transcripts. The
|
357
406
|
# feature redacts any number that has three or more consecutive digits by replacing
|
358
407
|
# each digit with an `X` character. It is intended to redact sensitive numeric data,
|
@@ -367,13 +416,13 @@ module IBMWatson
|
|
367
416
|
# **Note:** Applies to US English, Japanese, and Korean transcription only.
|
368
417
|
#
|
369
418
|
# See [Numeric
|
370
|
-
# redaction](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
|
419
|
+
# redaction](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-formatting#numeric-redaction).
|
371
420
|
# @param audio_metrics [Boolean] If `true`, requests detailed information about the signal characteristics of the
|
372
421
|
# input audio. The service returns audio metrics with the final transcription
|
373
422
|
# results. By default, the service returns no audio metrics.
|
374
423
|
#
|
375
424
|
# See [Audio
|
376
|
-
# metrics](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metrics#
|
425
|
+
# metrics](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metrics#audio-metrics).
|
377
426
|
# @param end_of_phrase_silence_time [Float] If `true`, specifies the duration of the pause interval at which the service
|
378
427
|
# splits a transcript into multiple final results. If the service detects pauses or
|
379
428
|
# extended silence before it reaches the end of the audio stream, its response can
|
@@ -390,7 +439,7 @@ module IBMWatson
|
|
390
439
|
# Chinese is 0.6 seconds.
|
391
440
|
#
|
392
441
|
# See [End of phrase silence
|
393
|
-
# time](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
|
442
|
+
# time](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-parsing#silence-time).
|
394
443
|
# @param split_transcript_at_phrase_end [Boolean] If `true`, directs the service to split the transcript into multiple final results
|
395
444
|
# based on semantic features of the input, for example, at the conclusion of
|
396
445
|
# meaningful phrases such as sentences. The service bases its understanding of
|
@@ -400,7 +449,7 @@ module IBMWatson
|
|
400
449
|
# interval.
|
401
450
|
#
|
402
451
|
# See [Split transcript at phrase
|
403
|
-
# end](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
|
452
|
+
# end](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-parsing#split-transcript).
|
404
453
|
# @param speech_detector_sensitivity [Float] The sensitivity of speech activity detection that the service is to perform. Use
|
405
454
|
# the parameter to suppress word insertions from music, coughing, and other
|
406
455
|
# non-speech events. The service biases the audio it passes for speech recognition
|
@@ -412,8 +461,8 @@ module IBMWatson
|
|
412
461
|
# * 0.5 (the default) provides a reasonable compromise for the level of sensitivity.
|
413
462
|
# * 1.0 suppresses no audio (speech detection sensitivity is disabled).
|
414
463
|
#
|
415
|
-
# The values increase on a monotonic curve. See [Speech
|
416
|
-
#
|
464
|
+
# The values increase on a monotonic curve. See [Speech detector
|
465
|
+
# sensitivity](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-detection#detection-parameters-sensitivity).
|
417
466
|
# @param background_audio_suppression [Float] The level to which the service is to suppress background audio based on its volume
|
418
467
|
# to prevent it from being transcribed as speech. Use the parameter to suppress side
|
419
468
|
# conversations or background noise.
|
@@ -424,10 +473,27 @@ module IBMWatson
|
|
424
473
|
# * 0.5 provides a reasonable level of audio suppression for general usage.
|
425
474
|
# * 1.0 suppresses all audio (no audio is transcribed).
|
426
475
|
#
|
427
|
-
# The values increase on a monotonic curve. See [
|
428
|
-
#
|
476
|
+
# The values increase on a monotonic curve. See [Background audio
|
477
|
+
# suppression](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-detection#detection-parameters-suppression).
|
478
|
+
# @param low_latency [Boolean] If `true` for next-generation `Multimedia` and `Telephony` models that support low
|
479
|
+
# latency, directs the service to produce results even more quickly than it usually
|
480
|
+
# does. Next-generation models produce transcription results faster than
|
481
|
+
# previous-generation models. The `low_latency` parameter causes the models to
|
482
|
+
# produce results even more quickly, though the results might be less accurate when
|
483
|
+
# the parameter is used.
|
484
|
+
#
|
485
|
+
# **Note:** The parameter is beta functionality. It is not available for
|
486
|
+
# previous-generation `Broadband` and `Narrowband` models. It is available only for
|
487
|
+
# some next-generation models.
|
488
|
+
#
|
489
|
+
# * For a list of next-generation models that support low latency, see [Supported
|
490
|
+
# language
|
491
|
+
# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-ng#models-ng-supported)
|
492
|
+
# for next-generation models.
|
493
|
+
# * For more information about the `low_latency` parameter, see [Low
|
494
|
+
# latency](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-interim#low-latency).
|
429
495
|
# @return [IBMCloudSdkCore::DetailedResponse] A `IBMCloudSdkCore::DetailedResponse` object representing the response.
|
430
|
-
def recognize(audio:, content_type: nil, model: nil, language_customization_id: nil, acoustic_customization_id: nil, base_model_version: nil, customization_weight: nil, inactivity_timeout: nil, keywords: nil, keywords_threshold: nil, max_alternatives: nil, word_alternatives_threshold: nil, word_confidence: nil, timestamps: nil, profanity_filter: nil, smart_formatting: nil, speaker_labels: nil, customization_id: nil, grammar_name: nil, redaction: nil, audio_metrics: nil, end_of_phrase_silence_time: nil, split_transcript_at_phrase_end: nil, speech_detector_sensitivity: nil, background_audio_suppression: nil)
|
496
|
+
def recognize(audio:, content_type: nil, model: nil, language_customization_id: nil, acoustic_customization_id: nil, base_model_version: nil, customization_weight: nil, inactivity_timeout: nil, keywords: nil, keywords_threshold: nil, max_alternatives: nil, word_alternatives_threshold: nil, word_confidence: nil, timestamps: nil, profanity_filter: nil, smart_formatting: nil, speaker_labels: nil, customization_id: nil, grammar_name: nil, redaction: nil, audio_metrics: nil, end_of_phrase_silence_time: nil, split_transcript_at_phrase_end: nil, speech_detector_sensitivity: nil, background_audio_suppression: nil, low_latency: nil)
|
431
497
|
raise ArgumentError.new("audio must be provided") if audio.nil?
|
432
498
|
|
433
499
|
headers = {
|
@@ -460,7 +526,8 @@ module IBMWatson
|
|
460
526
|
"end_of_phrase_silence_time" => end_of_phrase_silence_time,
|
461
527
|
"split_transcript_at_phrase_end" => split_transcript_at_phrase_end,
|
462
528
|
"speech_detector_sensitivity" => speech_detector_sensitivity,
|
463
|
-
"background_audio_suppression" => background_audio_suppression
|
529
|
+
"background_audio_suppression" => background_audio_suppression,
|
530
|
+
"low_latency" => low_latency
|
464
531
|
}
|
465
532
|
|
466
533
|
data = audio
|
@@ -479,7 +546,7 @@ module IBMWatson
|
|
479
546
|
end
|
480
547
|
|
481
548
|
##
|
482
|
-
# @!method recognize_using_websocket(content_type: nil,recognize_callback:,audio: nil,chunk_data: false,model: nil,customization_id: nil,acoustic_customization_id: nil,customization_weight: nil,base_model_version: nil,inactivity_timeout: nil,interim_results: nil,keywords: nil,keywords_threshold: nil,max_alternatives: nil,word_alternatives_threshold: nil,word_confidence: nil,timestamps: nil,profanity_filter: nil,smart_formatting: nil,speaker_labels: nil, end_of_phrase_silence_time: nil, split_transcript_at_phrase_end: nil, speech_detector_sensitivity: nil, background_audio_suppression: nil)
|
549
|
+
# @!method recognize_using_websocket(content_type: nil,recognize_callback:,audio: nil,chunk_data: false,model: nil,customization_id: nil,acoustic_customization_id: nil,customization_weight: nil,base_model_version: nil,inactivity_timeout: nil,interim_results: nil,keywords: nil,keywords_threshold: nil,max_alternatives: nil,word_alternatives_threshold: nil,word_confidence: nil,timestamps: nil,profanity_filter: nil,smart_formatting: nil,speaker_labels: nil, end_of_phrase_silence_time: nil, split_transcript_at_phrase_end: nil, speech_detector_sensitivity: nil, background_audio_suppression: nil, low_latency: nil)
|
483
550
|
# Sends audio for speech recognition using web sockets.
|
484
551
|
# @param content_type [String] The type of the input: audio/basic, audio/flac, audio/l16, audio/mp3, audio/mpeg, audio/mulaw, audio/ogg, audio/ogg;codecs=opus, audio/ogg;codecs=vorbis, audio/wav, audio/webm, audio/webm;codecs=opus, audio/webm;codecs=vorbis, or multipart/form-data.
|
485
552
|
# @param recognize_callback [RecognizeCallback] The instance handling events returned from the service.
|
@@ -596,6 +663,23 @@ module IBMWatson
 #
 # The values increase on a monotonic curve. See [Speech Activity
 # Detection](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-input#detection).
+# @param low_latency [Boolean] If `true` for next-generation `Multimedia` and `Telephony` models that support low
+# latency, directs the service to produce results even more quickly than it usually
+# does. Next-generation models produce transcription results faster than
+# previous-generation models. The `low_latency` parameter causes the models to
+# produce results even more quickly, though the results might be less accurate when
+# the parameter is used.
+#
+# **Note:** The parameter is beta functionality. It is not available for
+# previous-generation `Broadband` and `Narrowband` models. It is available only for
+# some next-generation models.
+#
+# * For a list of next-generation models that support low latency, see [Supported
+# language
+# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-ng#models-ng-supported)
+# for next-generation models.
+# * For more information about the `low_latency` parameter, see [Low
+# latency](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-interim#low-latency).
 # @return [IBMCloudSdkCore::DetailedResponse] A `IBMCloudSdkCore::DetailedResponse` object representing the response.
 def recognize_using_websocket(
 content_type: nil,
@@ -627,7 +711,8 @@ module IBMWatson
 end_of_phrase_silence_time: nil,
 split_transcript_at_phrase_end: nil,
 speech_detector_sensitivity: nil,
-background_audio_suppression: nil
+background_audio_suppression: nil,
+low_latency: nil
 )
 raise ArgumentError("Audio must be provided") if audio.nil? && !chunk_data
 raise ArgumentError("Recognize callback must be provided") if recognize_callback.nil?
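The guard clauses at the end of the hunk above reject a call that supplies neither audio nor chunked data, or that omits the callback. A minimal standalone sketch of the same two checks, written with the `raise ArgumentError, "message"` form. The method name `validate_websocket_args` is hypothetical and is not part of the gem:

```ruby
# Hypothetical helper (not part of ibm_watson): reproduces the two
# preconditions checked at the top of recognize_using_websocket.
def validate_websocket_args(audio:, chunk_data:, recognize_callback:)
  # Audio may be omitted only when the caller streams it in chunks.
  raise ArgumentError, "Audio must be provided" if audio.nil? && !chunk_data
  # A callback is always required to receive service events.
  raise ArgumentError, "Recognize callback must be provided" if recognize_callback.nil?

  true
end
```

With `chunk_data: true` the first check passes even when `audio` is `nil`, because the audio is expected to arrive later.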
@@ -669,7 +754,8 @@ module IBMWatson
 "end_of_phrase_silence_time" => end_of_phrase_silence_time,
 "split_transcript_at_phrase_end" => split_transcript_at_phrase_end,
 "speech_detector_sensitivity" => speech_detector_sensitivity,
-"background_audio_suppression" => background_audio_suppression
+"background_audio_suppression" => background_audio_suppression,
+"low_latency" => low_latency
 }
 options.delete_if { |_, v| v.nil? }
 WebSocketClient.new(audio: audio, chunk_data: chunk_data, options: options, recognize_callback: recognize_callback, service_url: service_url, headers: headers, disable_ssl_verification: @disable_ssl_verification)
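The hunk above adds `low_latency` to the options hash, and the `delete_if` call that follows drops every entry whose value is `nil`. A minimal sketch of that behavior in isolation, reduced to two of the parameters. The helper name `build_options` is hypothetical and is not part of the gem:

```ruby
# Hypothetical helper (not part of ibm_watson): builds a reduced options
# hash the way the hunk above does, then discards parameters left at nil.
def build_options(background_audio_suppression: nil, low_latency: nil)
  options = {
    "background_audio_suppression" => background_audio_suppression,
    "low_latency" => low_latency
  }
  # Same pruning step as in the diff: only explicitly set values remain.
  options.delete_if { |_, v| v.nil? }
end

p build_options                     # nothing was set, so nothing remains
p build_options(low_latency: true)  # only "low_latency" remains
```

Because the filter tests for `nil` rather than truthiness, an explicit `low_latency: false` stays in the hash.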
@@ -787,7 +873,7 @@ module IBMWatson
 end
 
 ##
-# @!method create_job(audio:, content_type: nil, model: nil, callback_url: nil, events: nil, user_token: nil, results_ttl: nil, language_customization_id: nil, acoustic_customization_id: nil, base_model_version: nil, customization_weight: nil, inactivity_timeout: nil, keywords: nil, keywords_threshold: nil, max_alternatives: nil, word_alternatives_threshold: nil, word_confidence: nil, timestamps: nil, profanity_filter: nil, smart_formatting: nil, speaker_labels: nil, customization_id: nil, grammar_name: nil, redaction: nil, processing_metrics: nil, processing_metrics_interval: nil, audio_metrics: nil, end_of_phrase_silence_time: nil, split_transcript_at_phrase_end: nil, speech_detector_sensitivity: nil, background_audio_suppression: nil)
+# @!method create_job(audio:, content_type: nil, model: nil, callback_url: nil, events: nil, user_token: nil, results_ttl: nil, language_customization_id: nil, acoustic_customization_id: nil, base_model_version: nil, customization_weight: nil, inactivity_timeout: nil, keywords: nil, keywords_threshold: nil, max_alternatives: nil, word_alternatives_threshold: nil, word_confidence: nil, timestamps: nil, profanity_filter: nil, smart_formatting: nil, speaker_labels: nil, customization_id: nil, grammar_name: nil, redaction: nil, processing_metrics: nil, processing_metrics_interval: nil, audio_metrics: nil, end_of_phrase_silence_time: nil, split_transcript_at_phrase_end: nil, speech_detector_sensitivity: nil, background_audio_suppression: nil, low_latency: nil)
 # Create a job.
 # Creates a job for a new asynchronous recognition request. The job is owned by the
 # instance of the service whose credentials are used to create it. How you learn the
@@ -883,14 +969,49 @@ module IBMWatson
 # sampling rate of the audio is lower than the minimum required rate, the request
 # fails.
 #
-# **See also:** [
-# formats](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-audio-formats
+# **See also:** [Supported audio
+# formats](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-audio-formats).
+#
+#
+# ### Next-generation models
+#
+# **Note:** The next-generation language models are beta functionality. They
+# support a limited number of languages and features at this time. The supported
+# languages, models, and features will increase with future releases.
+#
+# The service supports next-generation `Multimedia` (16 kHz) and `Telephony` (8 kHz)
+# models for many languages. Next-generation models have higher throughput than the
+# service's previous generation of `Broadband` and `Narrowband` models. When you use
+# next-generation models, the service can return transcriptions more quickly and
+# also provide noticeably better transcription accuracy.
+#
+# You specify a next-generation model by using the `model` query parameter, as you
+# do a previous-generation model. Next-generation models support the same request
+# headers as previous-generation models, but they support only the following
+# additional query parameters:
+# * `background_audio_suppression`
+# * `inactivity_timeout`
+# * `profanity_filter`
+# * `redaction`
+# * `smart_formatting`
+# * `speaker_labels`
+# * `speech_detector_sensitivity`
+# * `timestamps`
+#
+# Many next-generation models also support the beta `low_latency` parameter, which
+# is not available with previous-generation models.
+#
+# **See also:** [Next-generation languages and
+# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-ng).
 # @param audio [File] The audio to transcribe.
 # @param content_type [String] The format (MIME type) of the audio. For more information about specifying an
 # audio format, see **Audio formats (content types)** in the method description.
-# @param model [String] The identifier of the model that is to be used for the recognition request.
-#
-#
+# @param model [String] The identifier of the model that is to be used for the recognition request.
+# (**Note:** The model `ar-AR_BroadbandModel` is deprecated; use
+# `ar-MS_BroadbandModel` instead.) See [Languages and
+# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models) and
+# [Next-generation languages and
+# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-ng).
 # @param callback_url [String] A URL to which callback notifications are to be sent. The URL must already be
 # successfully allowlisted by using the **Register a callback** method. You can
 # include the same callback URL with any number of job creation requests. Omit the
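The doc comment added in the hunk above lists the only recognition parameters that next-generation models accept. A rough client-side check based on that list might look like the sketch below. Everything named here is an assumption for illustration: the constant `NEXT_GEN_PARAMS`, the helpers `next_generation?` and `unsupported_params`, and in particular the rule that a model name containing `Multimedia` or `Telephony` identifies a next-generation model. The gem itself performs no such check, and the helper only inspects recognition parameters, not request-level arguments such as `model` or `audio`.

```ruby
# Recognition parameters listed for next-generation models in the doc
# comment above, plus the beta low_latency parameter that many support.
NEXT_GEN_PARAMS = %w[
  background_audio_suppression inactivity_timeout profanity_filter redaction
  smart_formatting speaker_labels speech_detector_sensitivity timestamps
  low_latency
].freeze

# Assumption: the model name reveals the generation.
def next_generation?(model)
  model.to_s.match?(/Multimedia|Telephony/)
end

# Returns the names of explicitly set parameters that the list above does
# not include. Previous-generation models are never flagged.
def unsupported_params(model, params)
  return [] unless next_generation?(model)

  params.reject { |_, v| v.nil? }.keys.map(&:to_s) - NEXT_GEN_PARAMS
end
```

For example, passing `{ keywords: ["watson"], timestamps: true }` with a `Telephony` model name would report `keywords` only, since `timestamps` is on the list.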
@@ -929,8 +1050,9 @@ module IBMWatson
 # recognition request. The base model of the specified custom language model must
 # match the model specified with the `model` parameter. You must make the request
 # with credentials for the instance of the service that owns the custom model. By
-# default, no custom language model is used. See [
-#
+# default, no custom language model is used. See [Using a custom language model for
+# speech
+# recognition](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-languageUse).
 #
 #
 # **Note:** Use this parameter instead of the deprecated `customization_id`
@@ -939,14 +1061,16 @@ module IBMWatson
 # recognition request. The base model of the specified custom acoustic model must
 # match the model specified with the `model` parameter. You must make the request
 # with credentials for the instance of the service that owns the custom model. By
-# default, no custom acoustic model is used. See [
-#
+# default, no custom acoustic model is used. See [Using a custom acoustic model for
+# speech
+# recognition](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-acousticUse).
 # @param base_model_version [String] The version of the specified base model that is to be used with the recognition
 # request. Multiple versions of a base model can exist when a model is updated for
 # internal improvements. The parameter is intended primarily for use with custom
 # models that have been upgraded for a new base model. The default value depends on
-# whether the parameter is used with or without a custom model. See [
-#
+# whether the parameter is used with or without a custom model. See [Making speech
+# recognition requests with upgraded custom
+# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-upgrade-use#custom-upgrade-use-recognition).
 # @param customization_weight [Float] If you specify the customization ID (GUID) of a custom language model with the
 # recognition request, the customization weight tells the service how much weight to
 # give to words from the custom language model compared to those from the base model
@@ -963,8 +1087,8 @@ module IBMWatson
 # custom model's domain, but it can negatively affect performance on non-domain
 # phrases.
 #
-# See [
-#
+# See [Using customization
+# weight](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-languageUse#weight).
 # @param inactivity_timeout [Fixnum] The time in seconds after which, if only silence (no speech) is detected in
 # streaming audio, the connection is closed with a 400 error. The parameter is
 # useful for stopping audio submission from a live microphone when a user simply
@@ -981,34 +1105,34 @@ module IBMWatson
 # for double-byte languages might be shorter. Keywords are case-insensitive.
 #
 # See [Keyword
-# spotting](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# spotting](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-spotting#keyword-spotting).
 # @param keywords_threshold [Float] A confidence value that is the lower bound for spotting a keyword. A word is
 # considered to match a keyword if its confidence is greater than or equal to the
 # threshold. Specify a probability between 0.0 and 1.0. If you specify a threshold,
 # you must also specify one or more keywords. The service performs no keyword
 # spotting if you omit either parameter. See [Keyword
-# spotting](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# spotting](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-spotting#keyword-spotting).
 # @param max_alternatives [Fixnum] The maximum number of alternative transcripts that the service is to return. By
 # default, the service returns a single transcript. If you specify a value of `0`,
 # the service uses the default value, `1`. See [Maximum
-# alternatives](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# alternatives](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metadata#max-alternatives).
 # @param word_alternatives_threshold [Float] A confidence value that is the lower bound for identifying a hypothesis as a
 # possible word alternative (also known as "Confusion Networks"). An alternative
 # word is considered if its confidence is greater than or equal to the threshold.
 # Specify a probability between 0.0 and 1.0. By default, the service computes no
 # alternative words. See [Word
-# alternatives](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# alternatives](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-spotting#word-alternatives).
 # @param word_confidence [Boolean] If `true`, the service returns a confidence measure in the range of 0.0 to 1.0 for
 # each word. By default, the service returns no word confidence scores. See [Word
-# confidence](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# confidence](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metadata#word-confidence).
 # @param timestamps [Boolean] If `true`, the service returns time alignment for each word. By default, no
 # timestamps are returned. See [Word
-# timestamps](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# timestamps](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metadata#word-timestamps).
 # @param profanity_filter [Boolean] If `true`, the service filters profanity from all output except for keyword
 # results by replacing inappropriate words with a series of asterisks. Set the
 # parameter to `false` to return results with no censoring. Applies to US English
-# transcription only. See [Profanity
-# filtering](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# and Japanese transcription only. See [Profanity
+# filtering](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-formatting#profanity-filtering).
 # @param smart_formatting [Boolean] If `true`, the service converts dates, times, series of digits and numbers, phone
 # numbers, currency values, and internet addresses into more readable, conventional
 # representations in the final transcript of a recognition request. For US English,
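The `keywords_threshold` description in the hunk above states two rules: the threshold is a probability between 0.0 and 1.0, and keyword spotting happens only when both a threshold and at least one keyword are given. A minimal sketch of those rules as a client-side predicate. The method `keyword_spotting_enabled?` is hypothetical, and the gem leaves this validation to the service:

```ruby
# Hypothetical helper (not part of ibm_watson): tells whether a request
# would trigger keyword spotting under the rules in the doc comment above.
def keyword_spotting_enabled?(keywords: nil, keywords_threshold: nil)
  # The threshold, when given, must be a probability.
  unless keywords_threshold.nil? || (0.0..1.0).cover?(keywords_threshold)
    raise ArgumentError, "keywords_threshold must be between 0.0 and 1.0"
  end

  # Spotting needs at least one keyword and a threshold.
  !keywords.nil? && !keywords.empty? && !keywords_threshold.nil?
end
```

Omitting either argument yields `false`, matching the statement that the service performs no keyword spotting in that case.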
@@ -1018,19 +1142,21 @@ module IBMWatson
 # **Note:** Applies to US English, Japanese, and Spanish transcription only.
 #
 # See [Smart
-# formatting](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# formatting](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-formatting#smart-formatting).
 # @param speaker_labels [Boolean] If `true`, the response includes labels that identify which words were spoken by
 # which participants in a multi-person exchange. By default, the service returns no
 # speaker labels. Setting `speaker_labels` to `true` forces the `timestamps`
 # parameter to be `true`, regardless of whether you specify `false` for the
 # parameter.
-#
-#
-#
-#
-#
-#
-# labels
+# * For previous-generation models, can be used for US English, Australian English,
+# German, Japanese, Korean, and Spanish (both broadband and narrowband models) and
+# UK English (narrowband model) transcription only.
+# * For next-generation models, can be used for English (Australian, UK, and US),
+# German, and Spanish transcription only.
+#
+# Restrictions and limitations apply to the use of speaker labels for both types of
+# models. See [Speaker
+# labels](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-speaker-labels).
 # @param customization_id [String] **Deprecated.** Use the `language_customization_id` parameter to specify the
 # customization ID (GUID) of a custom language model that is to be used with the
 # recognition request. Do not specify both parameters with a request.
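The `speaker_labels` description in the hunk above says that setting it to `true` forces `timestamps` to `true`, whatever the caller passes. That override happens in the service; the sketch below only models the resulting value so the interaction is easy to see. The method `effective_timestamps` is hypothetical:

```ruby
# Hypothetical model of the documented interaction (the real override is
# applied by the service, not by the gem).
def effective_timestamps(speaker_labels: nil, timestamps: nil)
  # Speaker labels need word timings, so they win over an explicit false.
  return true if speaker_labels == true

  # Otherwise timestamps are returned only when explicitly requested.
  timestamps == true
end
```

So a request with `speaker_labels: true, timestamps: false` still behaves as if timestamps were requested.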
@@ -1039,7 +1165,8 @@ module IBMWatson
 # specify the name of the custom language model for which the grammar is defined.
 # The service recognizes only strings that are recognized by the specified grammar;
 # it does not recognize other custom words from the model's words resource. See
-# [
+# [Using a grammar for speech
+# recognition](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-grammarUse).
 # @param redaction [Boolean] If `true`, the service redacts, or masks, numeric data from final transcripts. The
 # feature redacts any number that has three or more consecutive digits by replacing
 # each digit with an `X` character. It is intended to redact sensitive numeric data,
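The `redaction` description in the hunk above gives a concrete rule: any number with three or more consecutive digits has each digit replaced by `X`. A small local approximation of that rule with a regular expression. The method `redact_numbers` is hypothetical, and the service's own redaction may treat spoken or formatted numbers differently from this plain-text version:

```ruby
# Hypothetical approximation of the documented redaction rule: runs of
# three or more digits are masked digit by digit; shorter runs are kept.
def redact_numbers(transcript)
  transcript.gsub(/\d{3,}/) { |digits| "X" * digits.length }
end

puts redact_numbers("call 5551234 before 10")
```

The masked run keeps its original length, which is why the block multiplies `"X"` by the match length instead of substituting a fixed string.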
@@ -1054,7 +1181,7 @@ module IBMWatson
 # **Note:** Applies to US English, Japanese, and Korean transcription only.
 #
 # See [Numeric
-# redaction](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# redaction](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-formatting#numeric-redaction).
 # @param processing_metrics [Boolean] If `true`, requests processing metrics about the service's transcription of the
 # input audio. The service returns processing metrics at the interval specified by
 # the `processing_metrics_interval` parameter. It also returns processing metrics
@@ -1062,7 +1189,7 @@ module IBMWatson
 # the service returns no processing metrics.
 #
 # See [Processing
-# metrics](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metrics#
+# metrics](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metrics#processing-metrics).
 # @param processing_metrics_interval [Float] Specifies the interval in real wall-clock seconds at which the service is to
 # return processing metrics. The parameter is ignored unless the
 # `processing_metrics` parameter is set to `true`.
@@ -1076,13 +1203,13 @@ module IBMWatson
 # the service returns processing metrics only for transcription events.
 #
 # See [Processing
-# metrics](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metrics#
+# metrics](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metrics#processing-metrics).
 # @param audio_metrics [Boolean] If `true`, requests detailed information about the signal characteristics of the
 # input audio. The service returns audio metrics with the final transcription
 # results. By default, the service returns no audio metrics.
 #
 # See [Audio
-# metrics](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metrics#
+# metrics](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metrics#audio-metrics).
 # @param end_of_phrase_silence_time [Float] If `true`, specifies the duration of the pause interval at which the service
 # splits a transcript into multiple final results. If the service detects pauses or
 # extended silence before it reaches the end of the audio stream, its response can
@@ -1099,7 +1226,7 @@ module IBMWatson
 # Chinese is 0.6 seconds.
 #
 # See [End of phrase silence
-# time](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# time](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-parsing#silence-time).
 # @param split_transcript_at_phrase_end [Boolean] If `true`, directs the service to split the transcript into multiple final results
 # based on semantic features of the input, for example, at the conclusion of
 # meaningful phrases such as sentences. The service bases its understanding of
@@ -1109,7 +1236,7 @@ module IBMWatson
 # interval.
 #
 # See [Split transcript at phrase
-# end](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# end](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-parsing#split-transcript).
 # @param speech_detector_sensitivity [Float] The sensitivity of speech activity detection that the service is to perform. Use
 # the parameter to suppress word insertions from music, coughing, and other
 # non-speech events. The service biases the audio it passes for speech recognition
@@ -1121,8 +1248,8 @@ module IBMWatson
 # * 0.5 (the default) provides a reasonable compromise for the level of sensitivity.
 # * 1.0 suppresses no audio (speech detection sensitivity is disabled).
 #
-# The values increase on a monotonic curve. See [Speech
-#
+# The values increase on a monotonic curve. See [Speech detector
+# sensitivity](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-detection#detection-parameters-sensitivity).
 # @param background_audio_suppression [Float] The level to which the service is to suppress background audio based on its volume
 # to prevent it from being transcribed as speech. Use the parameter to suppress side
 # conversations or background noise.
@@ -1133,10 +1260,27 @@ module IBMWatson
 # * 0.5 provides a reasonable level of audio suppression for general usage.
 # * 1.0 suppresses all audio (no audio is transcribed).
 #
-# The values increase on a monotonic curve. See [
-#
+# The values increase on a monotonic curve. See [Background audio
+# suppression](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-detection#detection-parameters-suppression).
+# @param low_latency [Boolean] If `true` for next-generation `Multimedia` and `Telephony` models that support low
+# latency, directs the service to produce results even more quickly than it usually
+# does. Next-generation models produce transcription results faster than
+# previous-generation models. The `low_latency` parameter causes the models to
+# produce results even more quickly, though the results might be less accurate when
+# the parameter is used.
+#
+# **Note:** The parameter is beta functionality. It is not available for
+# previous-generation `Broadband` and `Narrowband` models. It is available only for
+# some next-generation models.
+#
+# * For a list of next-generation models that support low latency, see [Supported
+# language
+# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-ng#models-ng-supported)
+# for next-generation models.
+# * For more information about the `low_latency` parameter, see [Low
+# latency](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-interim#low-latency).
 # @return [IBMCloudSdkCore::DetailedResponse] A `IBMCloudSdkCore::DetailedResponse` object representing the response.
-def create_job(audio:, content_type: nil, model: nil, callback_url: nil, events: nil, user_token: nil, results_ttl: nil, language_customization_id: nil, acoustic_customization_id: nil, base_model_version: nil, customization_weight: nil, inactivity_timeout: nil, keywords: nil, keywords_threshold: nil, max_alternatives: nil, word_alternatives_threshold: nil, word_confidence: nil, timestamps: nil, profanity_filter: nil, smart_formatting: nil, speaker_labels: nil, customization_id: nil, grammar_name: nil, redaction: nil, processing_metrics: nil, processing_metrics_interval: nil, audio_metrics: nil, end_of_phrase_silence_time: nil, split_transcript_at_phrase_end: nil, speech_detector_sensitivity: nil, background_audio_suppression: nil)
+def create_job(audio:, content_type: nil, model: nil, callback_url: nil, events: nil, user_token: nil, results_ttl: nil, language_customization_id: nil, acoustic_customization_id: nil, base_model_version: nil, customization_weight: nil, inactivity_timeout: nil, keywords: nil, keywords_threshold: nil, max_alternatives: nil, word_alternatives_threshold: nil, word_confidence: nil, timestamps: nil, profanity_filter: nil, smart_formatting: nil, speaker_labels: nil, customization_id: nil, grammar_name: nil, redaction: nil, processing_metrics: nil, processing_metrics_interval: nil, audio_metrics: nil, end_of_phrase_silence_time: nil, split_transcript_at_phrase_end: nil, speech_detector_sensitivity: nil, background_audio_suppression: nil, low_latency: nil)
 raise ArgumentError.new("audio must be provided") if audio.nil?
 
 headers = {
@@ -1175,7 +1319,8 @@ module IBMWatson
 "end_of_phrase_silence_time" => end_of_phrase_silence_time,
 "split_transcript_at_phrase_end" => split_transcript_at_phrase_end,
 "speech_detector_sensitivity" => speech_detector_sensitivity,
-"background_audio_suppression" => background_audio_suppression
+"background_audio_suppression" => background_audio_suppression,
+"low_latency" => low_latency
 }
 
 data = audio
@@ -1393,7 +1538,8 @@ module IBMWatson
 # models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageLanguageModels#listModels-language).
 # @param language [String] The identifier of the language for which custom language or custom acoustic models
 # are to be returned. Omit the parameter to see all custom language or custom
-# acoustic models that are owned by the requesting credentials.
+# acoustic models that are owned by the requesting credentials. (**Note:** The
+# identifier `ar-AR` is deprecated; use `ar-MS` instead.)
 #
 # To determine the languages for which customization is available, see [Language
 # support for
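The notes added in this diff deprecate the `ar-AR` language identifier and the `ar-AR_BroadbandModel` model in favor of their `ar-MS` equivalents. Callers that still hold the old values could translate them before making a request, as in the sketch below. The constant `DEPRECATED_IDENTIFIERS` and the method `current_identifier` are hypothetical; the gem does not remap identifiers itself:

```ruby
# Hypothetical mapping built from the two deprecation notes in this diff.
DEPRECATED_IDENTIFIERS = {
  "ar-AR" => "ar-MS",
  "ar-AR_BroadbandModel" => "ar-MS_BroadbandModel"
}.freeze

# Returns the replacement for a deprecated identifier, or the identifier
# itself when no replacement is recorded.
def current_identifier(id)
  DEPRECATED_IDENTIFIERS.fetch(id, id)
end

puts current_identifier("ar-AR")
puts current_identifier("en-US")
```

`Hash#fetch` with a default keeps unknown identifiers untouched, so the helper is safe to apply to every `language` or `model` value.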
@@ -1548,6 +1694,9 @@ module IBMWatson
 # The value that you assign is used for all recognition requests that use the model.
 # You can override it for any recognition request by specifying a customization
 # weight for that request.
+#
+# See [Using customization
+# weight](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-languageUse#weight).
 # @return [IBMCloudSdkCore::DetailedResponse] A `IBMCloudSdkCore::DetailedResponse` object representing the response.
 def train_language_model(customization_id:, word_type_to_add: nil, customization_weight: nil)
 raise ArgumentError.new("customization_id must be provided") if customization_id.nil?
@@ -1629,7 +1778,7 @@ module IBMWatson
 # subsequent requests for the model until the upgrade completes.
 #
 # **See also:** [Upgrading a custom language
-# model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-upgrade#custom-upgrade-language).
 # @param customization_id [String] The customization ID (GUID) of the custom language model that is to be used for
 # the request. You must make the request with credentials for the instance of the
 # service that owns the custom model.
@@ -2468,7 +2617,8 @@ module IBMWatson
 # custom model`.
 # @param base_model_name [String] The name of the base language model that is to be customized by the new custom
 # acoustic model. The new custom model can be used only with the base model that it
-# customizes.
+# customizes. (**Note:** The model `ar-AR_BroadbandModel` is deprecated; use
+# `ar-MS_BroadbandModel` instead.)
 #
 # To determine whether a base model supports acoustic model customization, refer to
 # [Language support for
@@ -2517,7 +2667,8 @@ module IBMWatson
 # models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageAcousticModels#listModels-acoustic).
 # @param language [String] The identifier of the language for which custom language or custom acoustic models
 # are to be returned. Omit the parameter to see all custom language or custom
-# acoustic models that are owned by the requesting credentials.
+# acoustic models that are owned by the requesting credentials. (**Note:** The
+# identifier `ar-AR` is deprecated; use `ar-MS` instead.)
 #
 # To determine the languages for which customization is available, see [Language
 # support for
@@ -2771,7 +2922,7 @@ module IBMWatson
 # acoustic model was not trained with a custom language model.
 #
 # **See also:** [Upgrading a custom acoustic
-# model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-upgrade#custom-upgrade-acoustic).
 # @param customization_id [String] The customization ID (GUID) of the custom acoustic model that is to be used for
 # the request. You must make the request with credentials for the instance of the
 # service that owns the custom model.
@@ -2785,7 +2936,7 @@ module IBMWatson
 # upgrade of a custom acoustic model that is trained with a custom language model,
 # and only if you receive a 400 response code and the message `No input data
 # modified since last training`. See [Upgrading a custom acoustic
-# model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-upgrade#custom-upgrade-acoustic).
 # @return [nil]
 def upgrade_acoustic_model(customization_id:, custom_language_model_id: nil, force: nil)
 raise ArgumentError.new("customization_id must be provided") if customization_id.nil?
@@ -2923,8 +3074,8 @@ module IBMWatson
 # If the sampling rate of the audio is lower than the minimum required rate, the
 # service labels the audio file as `invalid`.
 #
-# **See also:** [
-# formats](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-audio-formats
+# **See also:** [Supported audio
+# formats](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-audio-formats).
 #
 #
 # ### Content types for archive-type resources