ibm_watson 2.0.1 → 2.1.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/README.md +9 -29
- data/lib/ibm_watson/assistant_v1.rb +111 -77
- data/lib/ibm_watson/assistant_v2.rb +83 -59
- data/lib/ibm_watson/compare_comply_v1.rb +11 -4
- data/lib/ibm_watson/discovery_v1.rb +2 -3
- data/lib/ibm_watson/discovery_v2.rb +97 -7
- data/lib/ibm_watson/language_translator_v3.rb +1 -2
- data/lib/ibm_watson/natural_language_classifier_v1.rb +9 -3
- data/lib/ibm_watson/natural_language_understanding_v1.rb +692 -3
- data/lib/ibm_watson/personality_insights_v3.rb +13 -11
- data/lib/ibm_watson/speech_to_text_v1.rb +257 -106
- data/lib/ibm_watson/text_to_speech_v1.rb +599 -19
- data/lib/ibm_watson/tone_analyzer_v3.rb +1 -2
- data/lib/ibm_watson/version.rb +1 -1
- data/lib/ibm_watson/visual_recognition_v3.rb +1 -2
- data/lib/ibm_watson/visual_recognition_v4.rb +11 -8
- data/test/integration/test_discovery_v2.rb +15 -0
- data/test/integration/test_natural_language_understanding_v1.rb +134 -1
- data/test/integration/test_text_to_speech_v1.rb +57 -0
- data/test/unit/test_discovery_v2.rb +29 -0
- data/test/unit/test_natural_language_understanding_v1.rb +231 -0
- data/test/unit/test_text_to_speech_v1.rb +145 -0
- metadata +7 -7
data/lib/ibm_watson/personality_insights_v3.rb

@@ -14,17 +14,20 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
-# IBM OpenAPI SDK Code Generator Version: 3.
+# IBM OpenAPI SDK Code Generator Version: 3.31.0-902c9336-20210504-161156
 #
-# IBM
-#
-#
-#
-# Watson™ Natural Language
-#
-#
-#
-#
+# IBM Watson™ Personality Insights is discontinued. Existing instances are
+# supported until 1 December 2021, but as of 1 December 2020, you cannot create new
+# instances. Any instance that exists on 1 December 2021 will be deleted.<br/><br/>No
+# direct replacement exists for Personality Insights. However, you can consider using [IBM
+# Watson™ Natural Language
+# Understanding](https://cloud.ibm.com/docs/natural-language-understanding?topic=natural-language-understanding-about)
+# on IBM Cloud® as part of a replacement analytic workflow for your Personality
+# Insights use cases. You can use Natural Language Understanding to extract data and
+# insights from text, such as keywords, categories, sentiment, emotion, and syntax. For
+# more information about the personality models in Personality Insights, see [The science
+# behind the
+# service](https://cloud.ibm.com/docs/personality-insights?topic=personality-insights-science).
 # {: deprecated}
 #
 # The IBM Watson Personality Insights service enables applications to derive insights from

@@ -54,7 +57,6 @@ require "json"
 require "ibm_cloud_sdk_core"
 require_relative "./common.rb"
 
-# Module for the Watson APIs
 module IBMWatson
 ##
 # The Personality Insights V3 service.
data/lib/ibm_watson/speech_to_text_v1.rb

@@ -14,14 +14,21 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
-# IBM OpenAPI SDK Code Generator Version: 3.
+# IBM OpenAPI SDK Code Generator Version: 3.31.0-902c9336-20210504-161156
 #
 # The IBM Watson™ Speech to Text service provides APIs that use IBM's
 # speech-recognition capabilities to produce transcripts of spoken audio. The service can
 # transcribe speech from various languages and audio formats. In addition to basic
 # transcription, the service can produce detailed information about many different aspects
-# of the audio.
-#
+# of the audio. It returns all JSON response content in the UTF-8 character set.
+#
+# The service supports two types of models: previous-generation models that include the
+# terms `Broadband` and `Narrowband` in their names, and beta next-generation models that
+# include the terms `Multimedia` and `Telephony` in their names. Broadband and multimedia
+# models have minimum sampling rates of 16 kHz. Narrowband and telephony models have
+# minimum sampling rates of 8 kHz. The beta next-generation models currently support fewer
+# languages and features, but they offer high throughput and greater transcription
+# accuracy.
 #
 # For speech recognition, the service supports synchronous and asynchronous HTTP
 # Representational State Transfer (REST) interfaces. It also supports a WebSocket

@@ -37,8 +44,9 @@
 # can recognize.
 #
 # Language model customization and acoustic model customization are generally available
-# for production use with all
-# beta functionality for all
+# for production use with all previous-generation models that are generally available.
+# Grammars are beta functionality for all previous-generation models that support language
+# model customization. Next-generation models do not support customization at this time.
 
 require "concurrent"
 require "erb"

@@ -46,7 +54,6 @@ require "json"
 require "ibm_cloud_sdk_core"
 require_relative "./common.rb"
 
-# Module for the Watson APIs
 module IBMWatson
 ##
 # The Speech to Text V1 service.
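The updated service header distinguishes the two model families by the terms in their names and their minimum sampling rates. A minimal Ruby sketch of that naming rule (the `model_info` helper is illustrative only, not part of the SDK):

```ruby
# Illustrative helper: infer the model generation and minimum sampling rate
# from a Speech to Text model name, per the rules in the updated header.
def model_info(model_name)
  case model_name
  when /Multimedia/ then { generation: :next, min_rate_hz: 16_000 }
  when /Telephony/  then { generation: :next, min_rate_hz: 8_000 }
  when /Broadband/  then { generation: :previous, min_rate_hz: 16_000 }
  when /Narrowband/ then { generation: :previous, min_rate_hz: 8_000 }
  end
end

model_info("en-US_Telephony")
# => { generation: :next, min_rate_hz: 8000 }
```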
@@ -89,8 +96,8 @@ module IBMWatson
 # among other things. The ordering of the list of models can change from call to
 # call; do not rely on an alphabetized or static list of models.
 #
-# **See also:** [
-# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models
+# **See also:** [Listing
+# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-list).
 # @return [IBMCloudSdkCore::DetailedResponse] A `IBMCloudSdkCore::DetailedResponse` object representing the response.
 def list_models
 headers = {

@@ -116,10 +123,11 @@ module IBMWatson
 # with the service. The information includes the name of the model and its minimum
 # sampling rate in Hertz, among other things.
 #
-# **See also:** [
-# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models
+# **See also:** [Listing
+# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-list).
 # @param model_id [String] The identifier of the model in the form of its name from the output of the **Get a
-# model** method.
+# model** method. (**Note:** The model `ar-AR_BroadbandModel` is deprecated; use
+# `ar-MS_BroadbandModel` instead.).
 # @return [IBMCloudSdkCore::DetailedResponse] A `IBMCloudSdkCore::DetailedResponse` object representing the response.
 def get_model(model_id:)
 raise ArgumentError.new("model_id must be provided") if model_id.nil?

@@ -144,7 +152,7 @@ module IBMWatson
 #########################
 
 ##
-# @!method recognize(audio:, content_type: nil, model: nil, language_customization_id: nil, acoustic_customization_id: nil, base_model_version: nil, customization_weight: nil, inactivity_timeout: nil, keywords: nil, keywords_threshold: nil, max_alternatives: nil, word_alternatives_threshold: nil, word_confidence: nil, timestamps: nil, profanity_filter: nil, smart_formatting: nil, speaker_labels: nil, customization_id: nil, grammar_name: nil, redaction: nil, audio_metrics: nil, end_of_phrase_silence_time: nil, split_transcript_at_phrase_end: nil, speech_detector_sensitivity: nil, background_audio_suppression: nil)
+# @!method recognize(audio:, content_type: nil, model: nil, language_customization_id: nil, acoustic_customization_id: nil, base_model_version: nil, customization_weight: nil, inactivity_timeout: nil, keywords: nil, keywords_threshold: nil, max_alternatives: nil, word_alternatives_threshold: nil, word_confidence: nil, timestamps: nil, profanity_filter: nil, smart_formatting: nil, speaker_labels: nil, customization_id: nil, grammar_name: nil, redaction: nil, audio_metrics: nil, end_of_phrase_silence_time: nil, split_transcript_at_phrase_end: nil, speech_detector_sensitivity: nil, background_audio_suppression: nil, low_latency: nil)
 # Recognize audio.
 # Sends audio and returns transcription results for a recognition request. You can
 # pass a maximum of 100 MB and a minimum of 100 bytes of audio with a request. The
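The `get_model` note above deprecates `ar-AR_BroadbandModel` in favor of `ar-MS_BroadbandModel`. A hypothetical client-side guard that remaps the deprecated identifier before calling the service (neither the constant nor the helper exists in the SDK):

```ruby
# Hypothetical remapping of the deprecated Arabic model id noted in the diff.
DEPRECATED_MODELS = { "ar-AR_BroadbandModel" => "ar-MS_BroadbandModel" }.freeze

# Return the current model id, substituting any deprecated name.
def resolve_model_id(model_id)
  DEPRECATED_MODELS.fetch(model_id, model_id)
end
```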
@@ -211,8 +219,40 @@ module IBMWatson
 # sampling rate of the audio is lower than the minimum required rate, the request
 # fails.
 #
-# **See also:** [
-# formats](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-audio-formats
+# **See also:** [Supported audio
+# formats](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-audio-formats).
+#
+#
+# ### Next-generation models
+#
+# **Note:** The next-generation language models are beta functionality. They
+# support a limited number of languages and features at this time. The supported
+# languages, models, and features will increase with future releases.
+#
+# The service supports next-generation `Multimedia` (16 kHz) and `Telephony` (8 kHz)
+# models for many languages. Next-generation models have higher throughput than the
+# service's previous generation of `Broadband` and `Narrowband` models. When you use
+# next-generation models, the service can return transcriptions more quickly and
+# also provide noticeably better transcription accuracy.
+#
+# You specify a next-generation model by using the `model` query parameter, as you
+# do a previous-generation model. Next-generation models support the same request
+# headers as previous-generation models, but they support only the following
+# additional query parameters:
+# * `background_audio_suppression`
+# * `inactivity_timeout`
+# * `profanity_filter`
+# * `redaction`
+# * `smart_formatting`
+# * `speaker_labels`
+# * `speech_detector_sensitivity`
+# * `timestamps`
+#
+# Many next-generation models also support the beta `low_latency` parameter, which
+# is not available with previous-generation models.
+#
+# **See also:** [Next-generation languages and
+# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-ng).
 #
 #
 # ### Multipart speech recognition
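Per the list above, next-generation models accept only a subset of the `recognize` query parameters, plus the beta `low_latency` parameter on many models. A hedged sketch of filtering a request's options down to that subset (the parameter names come from the diff; the filtering helper itself is illustrative, not SDK code):

```ruby
# Query parameters the updated doc comment lists as supported by
# next-generation models, plus the beta low_latency parameter.
NEXT_GEN_PARAMS = %w[
  background_audio_suppression inactivity_timeout profanity_filter
  redaction smart_formatting speaker_labels speech_detector_sensitivity
  timestamps low_latency
].freeze

# Illustrative helper: keep only the options a next-generation model supports.
def next_gen_options(params)
  params.select { |k, _| NEXT_GEN_PARAMS.include?(k.to_s) }
end

next_gen_options("smart_formatting" => true, "max_alternatives" => 3)
# max_alternatives is dropped: next-generation models do not support it
```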
@@ -235,15 +275,19 @@ module IBMWatson
 # @param audio [File] The audio to transcribe.
 # @param content_type [String] The format (MIME type) of the audio. For more information about specifying an
 # audio format, see **Audio formats (content types)** in the method description.
-# @param model [String] The identifier of the model that is to be used for the recognition request.
-#
-#
+# @param model [String] The identifier of the model that is to be used for the recognition request.
+# (**Note:** The model `ar-AR_BroadbandModel` is deprecated; use
+# `ar-MS_BroadbandModel` instead.) See [Languages and
+# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models) and
+# [Next-generation languages and
+# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-ng).
 # @param language_customization_id [String] The customization ID (GUID) of a custom language model that is to be used with the
 # recognition request. The base model of the specified custom language model must
 # match the model specified with the `model` parameter. You must make the request
 # with credentials for the instance of the service that owns the custom model. By
-# default, no custom language model is used. See [
-#
+# default, no custom language model is used. See [Using a custom language model for
+# speech
+# recognition](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-languageUse).
 #
 #
 # **Note:** Use this parameter instead of the deprecated `customization_id`

@@ -252,14 +296,16 @@ module IBMWatson
 # recognition request. The base model of the specified custom acoustic model must
 # match the model specified with the `model` parameter. You must make the request
 # with credentials for the instance of the service that owns the custom model. By
-# default, no custom acoustic model is used. See [
-#
+# default, no custom acoustic model is used. See [Using a custom acoustic model for
+# speech
+# recognition](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-acousticUse).
 # @param base_model_version [String] The version of the specified base model that is to be used with the recognition
 # request. Multiple versions of a base model can exist when a model is updated for
 # internal improvements. The parameter is intended primarily for use with custom
 # models that have been upgraded for a new base model. The default value depends on
-# whether the parameter is used with or without a custom model. See [
-#
+# whether the parameter is used with or without a custom model. See [Making speech
+# recognition requests with upgraded custom
+# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-upgrade-use#custom-upgrade-use-recognition).
 # @param customization_weight [Float] If you specify the customization ID (GUID) of a custom language model with the
 # recognition request, the customization weight tells the service how much weight to
 # give to words from the custom language model compared to those from the base model
@@ -276,8 +322,8 @@ module IBMWatson
 # custom model's domain, but it can negatively affect performance on non-domain
 # phrases.
 #
-# See [
-#
+# See [Using customization
+# weight](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-languageUse#weight).
 # @param inactivity_timeout [Fixnum] The time in seconds after which, if only silence (no speech) is detected in
 # streaming audio, the connection is closed with a 400 error. The parameter is
 # useful for stopping audio submission from a live microphone when a user simply

@@ -294,34 +340,34 @@ module IBMWatson
 # for double-byte languages might be shorter. Keywords are case-insensitive.
 #
 # See [Keyword
-# spotting](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# spotting](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-spotting#keyword-spotting).
 # @param keywords_threshold [Float] A confidence value that is the lower bound for spotting a keyword. A word is
 # considered to match a keyword if its confidence is greater than or equal to the
 # threshold. Specify a probability between 0.0 and 1.0. If you specify a threshold,
 # you must also specify one or more keywords. The service performs no keyword
 # spotting if you omit either parameter. See [Keyword
-# spotting](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# spotting](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-spotting#keyword-spotting).
 # @param max_alternatives [Fixnum] The maximum number of alternative transcripts that the service is to return. By
 # default, the service returns a single transcript. If you specify a value of `0`,
 # the service uses the default value, `1`. See [Maximum
-# alternatives](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# alternatives](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metadata#max-alternatives).
 # @param word_alternatives_threshold [Float] A confidence value that is the lower bound for identifying a hypothesis as a
 # possible word alternative (also known as "Confusion Networks"). An alternative
 # word is considered if its confidence is greater than or equal to the threshold.
 # Specify a probability between 0.0 and 1.0. By default, the service computes no
 # alternative words. See [Word
-# alternatives](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# alternatives](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-spotting#word-alternatives).
 # @param word_confidence [Boolean] If `true`, the service returns a confidence measure in the range of 0.0 to 1.0 for
 # each word. By default, the service returns no word confidence scores. See [Word
-# confidence](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# confidence](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metadata#word-confidence).
 # @param timestamps [Boolean] If `true`, the service returns time alignment for each word. By default, no
 # timestamps are returned. See [Word
-# timestamps](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# timestamps](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metadata#word-timestamps).
 # @param profanity_filter [Boolean] If `true`, the service filters profanity from all output except for keyword
 # results by replacing inappropriate words with a series of asterisks. Set the
 # parameter to `false` to return results with no censoring. Applies to US English
-# transcription only. See [Profanity
-# filtering](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# and Japanese transcription only. See [Profanity
+# filtering](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-formatting#profanity-filtering).
 # @param smart_formatting [Boolean] If `true`, the service converts dates, times, series of digits and numbers, phone
 # numbers, currency values, and internet addresses into more readable, conventional
 # representations in the final transcript of a recognition request. For US English,
@@ -331,19 +377,21 @@ module IBMWatson
 # **Note:** Applies to US English, Japanese, and Spanish transcription only.
 #
 # See [Smart
-# formatting](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# formatting](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-formatting#smart-formatting).
 # @param speaker_labels [Boolean] If `true`, the response includes labels that identify which words were spoken by
 # which participants in a multi-person exchange. By default, the service returns no
 # speaker labels. Setting `speaker_labels` to `true` forces the `timestamps`
 # parameter to be `true`, regardless of whether you specify `false` for the
 # parameter.
-#
-#
-#
-#
-#
-#
-# labels
+# * For previous-generation models, can be used for US English, Australian English,
+# German, Japanese, Korean, and Spanish (both broadband and narrowband models) and
+# UK English (narrowband model) transcription only.
+# * For next-generation models, can be used for English (Australian, UK, and US),
+# German, and Spanish transcription only.
+#
+# Restrictions and limitations apply to the use of speaker labels for both types of
+# models. See [Speaker
+# labels](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-speaker-labels).
 # @param customization_id [String] **Deprecated.** Use the `language_customization_id` parameter to specify the
 # customization ID (GUID) of a custom language model that is to be used with the
 # recognition request. Do not specify both parameters with a request.

@@ -352,7 +400,8 @@ module IBMWatson
 # specify the name of the custom language model for which the grammar is defined.
 # The service recognizes only strings that are recognized by the specified grammar;
 # it does not recognize other custom words from the model's words resource. See
-# [
+# [Using a grammar for speech
+# recognition](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-grammarUse).
 # @param redaction [Boolean] If `true`, the service redacts, or masks, numeric data from final transcripts. The
 # feature redacts any number that has three or more consecutive digits by replacing
 # each digit with an `X` character. It is intended to redact sensitive numeric data,
@@ -367,13 +416,13 @@ module IBMWatson
 # **Note:** Applies to US English, Japanese, and Korean transcription only.
 #
 # See [Numeric
-# redaction](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# redaction](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-formatting#numeric-redaction).
 # @param audio_metrics [Boolean] If `true`, requests detailed information about the signal characteristics of the
 # input audio. The service returns audio metrics with the final transcription
 # results. By default, the service returns no audio metrics.
 #
 # See [Audio
-# metrics](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metrics#
+# metrics](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metrics#audio-metrics).
 # @param end_of_phrase_silence_time [Float] If `true`, specifies the duration of the pause interval at which the service
 # splits a transcript into multiple final results. If the service detects pauses or
 # extended silence before it reaches the end of the audio stream, its response can

@@ -390,7 +439,7 @@ module IBMWatson
 # Chinese is 0.6 seconds.
 #
 # See [End of phrase silence
-# time](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# time](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-parsing#silence-time).
 # @param split_transcript_at_phrase_end [Boolean] If `true`, directs the service to split the transcript into multiple final results
 # based on semantic features of the input, for example, at the conclusion of
 # meaningful phrases such as sentences. The service bases its understanding of

@@ -400,7 +449,7 @@ module IBMWatson
 # interval.
 #
 # See [Split transcript at phrase
-# end](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# end](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-parsing#split-transcript).
 # @param speech_detector_sensitivity [Float] The sensitivity of speech activity detection that the service is to perform. Use
 # the parameter to suppress word insertions from music, coughing, and other
 # non-speech events. The service biases the audio it passes for speech recognition
@@ -412,8 +461,8 @@ module IBMWatson
 # * 0.5 (the default) provides a reasonable compromise for the level of sensitivity.
 # * 1.0 suppresses no audio (speech detection sensitivity is disabled).
 #
-# The values increase on a monotonic curve. See [Speech
-#
+# The values increase on a monotonic curve. See [Speech detector
+# sensitivity](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-detection#detection-parameters-sensitivity).
 # @param background_audio_suppression [Float] The level to which the service is to suppress background audio based on its volume
 # to prevent it from being transcribed as speech. Use the parameter to suppress side
 # conversations or background noise.

@@ -424,10 +473,27 @@ module IBMWatson
 # * 0.5 provides a reasonable level of audio suppression for general usage.
 # * 1.0 suppresses all audio (no audio is transcribed).
 #
-# The values increase on a monotonic curve. See [
-#
+# The values increase on a monotonic curve. See [Background audio
+# suppression](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-detection#detection-parameters-suppression).
+# @param low_latency [Boolean] If `true` for next-generation `Multimedia` and `Telephony` models that support low
+# latency, directs the service to produce results even more quickly than it usually
+# does. Next-generation models produce transcription results faster than
+# previous-generation models. The `low_latency` parameter causes the models to
+# produce results even more quickly, though the results might be less accurate when
+# the parameter is used.
+#
+# **Note:** The parameter is beta functionality. It is not available for
+# previous-generation `Broadband` and `Narrowband` models. It is available only for
+# some next-generation models.
+#
+# * For a list of next-generation models that support low latency, see [Supported
+# language
+# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-ng#models-ng-supported)
+# for next-generation models.
+# * For more information about the `low_latency` parameter, see [Low
+# latency](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-interim#low-latency).
 # @return [IBMCloudSdkCore::DetailedResponse] A `IBMCloudSdkCore::DetailedResponse` object representing the response.
-def recognize(audio:, content_type: nil, model: nil, language_customization_id: nil, acoustic_customization_id: nil, base_model_version: nil, customization_weight: nil, inactivity_timeout: nil, keywords: nil, keywords_threshold: nil, max_alternatives: nil, word_alternatives_threshold: nil, word_confidence: nil, timestamps: nil, profanity_filter: nil, smart_formatting: nil, speaker_labels: nil, customization_id: nil, grammar_name: nil, redaction: nil, audio_metrics: nil, end_of_phrase_silence_time: nil, split_transcript_at_phrase_end: nil, speech_detector_sensitivity: nil, background_audio_suppression: nil)
+def recognize(audio:, content_type: nil, model: nil, language_customization_id: nil, acoustic_customization_id: nil, base_model_version: nil, customization_weight: nil, inactivity_timeout: nil, keywords: nil, keywords_threshold: nil, max_alternatives: nil, word_alternatives_threshold: nil, word_confidence: nil, timestamps: nil, profanity_filter: nil, smart_formatting: nil, speaker_labels: nil, customization_id: nil, grammar_name: nil, redaction: nil, audio_metrics: nil, end_of_phrase_silence_time: nil, split_transcript_at_phrase_end: nil, speech_detector_sensitivity: nil, background_audio_suppression: nil, low_latency: nil)
 raise ArgumentError.new("audio must be provided") if audio.nil?
 
 headers = {
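Both `speech_detector_sensitivity` and `background_audio_suppression` take values from 0.0 to 1.0 on a monotonic curve. A hypothetical client-side range check for these detection parameters (the SDK itself performs no such validation; this helper is illustrative only):

```ruby
# Hypothetical check for the detection parameters described above:
# both accept a Float from 0.0 to 1.0, or nil to use the service default.
def valid_detection_param?(value)
  value.nil? || (0.0..1.0).cover?(value)
end
```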
@@ -460,7 +526,8 @@ module IBMWatson
 "end_of_phrase_silence_time" => end_of_phrase_silence_time,
 "split_transcript_at_phrase_end" => split_transcript_at_phrase_end,
 "speech_detector_sensitivity" => speech_detector_sensitivity,
-"background_audio_suppression" => background_audio_suppression
+"background_audio_suppression" => background_audio_suppression,
+"low_latency" => low_latency
 }
 
 data = audio

@@ -479,7 +546,7 @@ module IBMWatson
 end
 
 ##
-# @!method recognize_using_websocket(content_type: nil,recognize_callback:,audio: nil,chunk_data: false,model: nil,customization_id: nil,acoustic_customization_id: nil,customization_weight: nil,base_model_version: nil,inactivity_timeout: nil,interim_results: nil,keywords: nil,keywords_threshold: nil,max_alternatives: nil,word_alternatives_threshold: nil,word_confidence: nil,timestamps: nil,profanity_filter: nil,smart_formatting: nil,speaker_labels: nil, end_of_phrase_silence_time: nil, split_transcript_at_phrase_end: nil, speech_detector_sensitivity: nil, background_audio_suppression: nil)
+# @!method recognize_using_websocket(content_type: nil,recognize_callback:,audio: nil,chunk_data: false,model: nil,customization_id: nil,acoustic_customization_id: nil,customization_weight: nil,base_model_version: nil,inactivity_timeout: nil,interim_results: nil,keywords: nil,keywords_threshold: nil,max_alternatives: nil,word_alternatives_threshold: nil,word_confidence: nil,timestamps: nil,profanity_filter: nil,smart_formatting: nil,speaker_labels: nil, end_of_phrase_silence_time: nil, split_transcript_at_phrase_end: nil, speech_detector_sensitivity: nil, background_audio_suppression: nil, low_latency: nil)
 # Sends audio for speech recognition using web sockets.
 # @param content_type [String] The type of the input: audio/basic, audio/flac, audio/l16, audio/mp3, audio/mpeg, audio/mulaw, audio/ogg, audio/ogg;codecs=opus, audio/ogg;codecs=vorbis, audio/wav, audio/webm, audio/webm;codecs=opus, audio/webm;codecs=vorbis, or multipart/form-data.
 # @param recognize_callback [RecognizeCallback] The instance handling events returned from the service.
@@ -596,6 +663,23 @@ module IBMWatson
|
|
596
663
|
#
|
597
664
|
# The values increase on a monotonic curve. See [Speech Activity
|
598
665
|
# Detection](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-input#detection).
|
666
|
+
# @param low_latency [Boolean] If `true` for next-generation `Multimedia` and `Telephony` models that support low
|
667
|
+
# latency, directs the service to produce results even more quickly than it usually
|
668
|
+
# does. Next-generation models produce transcription results faster than
|
669
|
+
# previous-generation models. The `low_latency` parameter causes the models to
|
670
|
+
# produce results even more quickly, though the results might be less accurate when
|
671
|
+
# the parameter is used.
|
672
|
+
#
|
673
|
+
# **Note:** The parameter is beta functionality. It is not available for
|
674
|
+
# previous-generation `Broadband` and `Narrowband` models. It is available only for
|
675
|
+
# some next-generation models.
|
676
|
+
#
|
677
|
+
# * For a list of next-generation models that support low latency, see [Supported
|
678
|
+
# language
|
679
|
+
# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-ng#models-ng-supported)
|
680
|
+
# for next-generation models.
|
681
|
+
# * For more information about the `low_latency` parameter, see [Low
|
682
|
+
# latency](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-interim#low-latency).
|
599
683
|
# @return [IBMCloudSdkCore::DetailedResponse] A `IBMCloudSdkCore::DetailedResponse` object representing the response.
|
600
684
|
def recognize_using_websocket(
|
601
685
|
content_type: nil,
|
@@ -627,7 +711,8 @@ module IBMWatson
 end_of_phrase_silence_time: nil,
 split_transcript_at_phrase_end: nil,
 speech_detector_sensitivity: nil,
-background_audio_suppression: nil
+background_audio_suppression: nil,
+low_latency: nil
 )
 raise ArgumentError("Audio must be provided") if audio.nil? && !chunk_data
 raise ArgumentError("Recognize callback must be provided") if recognize_callback.nil?
@@ -669,7 +754,8 @@ module IBMWatson
 "end_of_phrase_silence_time" => end_of_phrase_silence_time,
 "split_transcript_at_phrase_end" => split_transcript_at_phrase_end,
 "speech_detector_sensitivity" => speech_detector_sensitivity,
-"background_audio_suppression" => background_audio_suppression
+"background_audio_suppression" => background_audio_suppression,
+"low_latency" => low_latency
 }
 options.delete_if { |_, v| v.nil? }
 WebSocketClient.new(audio: audio, chunk_data: chunk_data, options: options, recognize_callback: recognize_callback, service_url: service_url, headers: headers, disable_ssl_verification: @disable_ssl_verification)
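As the `recognize_using_websocket` hunks show, the new `low_latency` keyword is folded into the same options hash as the other query parameters, and `delete_if` strips any entry the caller left as `nil` before the `WebSocketClient` is created. A minimal sketch of that assembly, using hypothetical parameter values that are not part of this diff:

```ruby
# Sketch of the options assembly from the diff above: every keyword argument
# is placed in the hash, and nil entries (parameters the caller did not set)
# are stripped before the websocket connection is opened.
options = {
  "model" => "en-US_Telephony",          # hypothetical model choice
  "interim_results" => true,
  "background_audio_suppression" => 0.5,
  "low_latency" => true,                 # new in 2.1.x
  "speaker_labels" => nil                # unset, so removed below
}
options.delete_if { |_, v| v.nil? }
puts options.keys.sort.inspect
# → ["background_audio_suppression", "interim_results", "low_latency", "model"]
```

Because unset parameters never reach the service, adding `low_latency: nil` to the signature is backward compatible: existing callers see no change in the request that is sent.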
@@ -787,7 +873,7 @@ module IBMWatson
 end
 
 ##
-# @!method create_job(audio:, content_type: nil, model: nil, callback_url: nil, events: nil, user_token: nil, results_ttl: nil, language_customization_id: nil, acoustic_customization_id: nil, base_model_version: nil, customization_weight: nil, inactivity_timeout: nil, keywords: nil, keywords_threshold: nil, max_alternatives: nil, word_alternatives_threshold: nil, word_confidence: nil, timestamps: nil, profanity_filter: nil, smart_formatting: nil, speaker_labels: nil, customization_id: nil, grammar_name: nil, redaction: nil, processing_metrics: nil, processing_metrics_interval: nil, audio_metrics: nil, end_of_phrase_silence_time: nil, split_transcript_at_phrase_end: nil, speech_detector_sensitivity: nil, background_audio_suppression: nil)
+# @!method create_job(audio:, content_type: nil, model: nil, callback_url: nil, events: nil, user_token: nil, results_ttl: nil, language_customization_id: nil, acoustic_customization_id: nil, base_model_version: nil, customization_weight: nil, inactivity_timeout: nil, keywords: nil, keywords_threshold: nil, max_alternatives: nil, word_alternatives_threshold: nil, word_confidence: nil, timestamps: nil, profanity_filter: nil, smart_formatting: nil, speaker_labels: nil, customization_id: nil, grammar_name: nil, redaction: nil, processing_metrics: nil, processing_metrics_interval: nil, audio_metrics: nil, end_of_phrase_silence_time: nil, split_transcript_at_phrase_end: nil, speech_detector_sensitivity: nil, background_audio_suppression: nil, low_latency: nil)
 # Create a job.
 # Creates a job for a new asynchronous recognition request. The job is owned by the
 # instance of the service whose credentials are used to create it. How you learn the
@@ -883,14 +969,49 @@ module IBMWatson
 # sampling rate of the audio is lower than the minimum required rate, the request
 # fails.
 #
-# **See also:** [
-# formats](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-audio-formats
+# **See also:** [Supported audio
+# formats](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-audio-formats).
+#
+#
+# ### Next-generation models
+#
+# **Note:** The next-generation language models are beta functionality. They
+# support a limited number of languages and features at this time. The supported
+# languages, models, and features will increase with future releases.
+#
+# The service supports next-generation `Multimedia` (16 kHz) and `Telephony` (8 kHz)
+# models for many languages. Next-generation models have higher throughput than the
+# service's previous generation of `Broadband` and `Narrowband` models. When you use
+# next-generation models, the service can return transcriptions more quickly and
+# also provide noticeably better transcription accuracy.
+#
+# You specify a next-generation model by using the `model` query parameter, as you
+# do a previous-generation model. Next-generation models support the same request
+# headers as previous-generation models, but they support only the following
+# additional query parameters:
+# * `background_audio_suppression`
+# * `inactivity_timeout`
+# * `profanity_filter`
+# * `redaction`
+# * `smart_formatting`
+# * `speaker_labels`
+# * `speech_detector_sensitivity`
+# * `timestamps`
+#
+# Many next-generation models also support the beta `low_latency` parameter, which
+# is not available with previous-generation models.
+#
+# **See also:** [Next-generation languages and
+# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-ng).
 # @param audio [File] The audio to transcribe.
 # @param content_type [String] The format (MIME type) of the audio. For more information about specifying an
 # audio format, see **Audio formats (content types)** in the method description.
-# @param model [String] The identifier of the model that is to be used for the recognition request.
-#
-#
+# @param model [String] The identifier of the model that is to be used for the recognition request.
+# (**Note:** The model `ar-AR_BroadbandModel` is deprecated; use
+# `ar-MS_BroadbandModel` instead.) See [Languages and
+# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models) and
+# [Next-generation languages and
+# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-ng).
 # @param callback_url [String] A URL to which callback notifications are to be sent. The URL must already be
 # successfully allowlisted by using the **Register a callback** method. You can
 # include the same callback URL with any number of job creation requests. Omit the
@@ -929,8 +1050,9 @@ module IBMWatson
 # recognition request. The base model of the specified custom language model must
 # match the model specified with the `model` parameter. You must make the request
 # with credentials for the instance of the service that owns the custom model. By
-# default, no custom language model is used. See [
-#
+# default, no custom language model is used. See [Using a custom language model for
+# speech
+# recognition](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-languageUse).
 #
 #
 # **Note:** Use this parameter instead of the deprecated `customization_id`
@@ -939,14 +1061,16 @@ module IBMWatson
 # recognition request. The base model of the specified custom acoustic model must
 # match the model specified with the `model` parameter. You must make the request
 # with credentials for the instance of the service that owns the custom model. By
-# default, no custom acoustic model is used. See [
-#
+# default, no custom acoustic model is used. See [Using a custom acoustic model for
+# speech
+# recognition](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-acousticUse).
 # @param base_model_version [String] The version of the specified base model that is to be used with the recognition
 # request. Multiple versions of a base model can exist when a model is updated for
 # internal improvements. The parameter is intended primarily for use with custom
 # models that have been upgraded for a new base model. The default value depends on
-# whether the parameter is used with or without a custom model. See [
-#
+# whether the parameter is used with or without a custom model. See [Making speech
+# recognition requests with upgraded custom
+# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-upgrade-use#custom-upgrade-use-recognition).
 # @param customization_weight [Float] If you specify the customization ID (GUID) of a custom language model with the
 # recognition request, the customization weight tells the service how much weight to
 # give to words from the custom language model compared to those from the base model
@@ -963,8 +1087,8 @@ module IBMWatson
 # custom model's domain, but it can negatively affect performance on non-domain
 # phrases.
 #
-# See [
-#
+# See [Using customization
+# weight](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-languageUse#weight).
 # @param inactivity_timeout [Fixnum] The time in seconds after which, if only silence (no speech) is detected in
 # streaming audio, the connection is closed with a 400 error. The parameter is
 # useful for stopping audio submission from a live microphone when a user simply
@@ -981,34 +1105,34 @@ module IBMWatson
 # for double-byte languages might be shorter. Keywords are case-insensitive.
 #
 # See [Keyword
-# spotting](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# spotting](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-spotting#keyword-spotting).
 # @param keywords_threshold [Float] A confidence value that is the lower bound for spotting a keyword. A word is
 # considered to match a keyword if its confidence is greater than or equal to the
 # threshold. Specify a probability between 0.0 and 1.0. If you specify a threshold,
 # you must also specify one or more keywords. The service performs no keyword
 # spotting if you omit either parameter. See [Keyword
-# spotting](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# spotting](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-spotting#keyword-spotting).
 # @param max_alternatives [Fixnum] The maximum number of alternative transcripts that the service is to return. By
 # default, the service returns a single transcript. If you specify a value of `0`,
 # the service uses the default value, `1`. See [Maximum
-# alternatives](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# alternatives](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metadata#max-alternatives).
 # @param word_alternatives_threshold [Float] A confidence value that is the lower bound for identifying a hypothesis as a
 # possible word alternative (also known as "Confusion Networks"). An alternative
 # word is considered if its confidence is greater than or equal to the threshold.
 # Specify a probability between 0.0 and 1.0. By default, the service computes no
 # alternative words. See [Word
-# alternatives](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# alternatives](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-spotting#word-alternatives).
 # @param word_confidence [Boolean] If `true`, the service returns a confidence measure in the range of 0.0 to 1.0 for
 # each word. By default, the service returns no word confidence scores. See [Word
-# confidence](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# confidence](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metadata#word-confidence).
 # @param timestamps [Boolean] If `true`, the service returns time alignment for each word. By default, no
 # timestamps are returned. See [Word
-# timestamps](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# timestamps](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metadata#word-timestamps).
 # @param profanity_filter [Boolean] If `true`, the service filters profanity from all output except for keyword
 # results by replacing inappropriate words with a series of asterisks. Set the
 # parameter to `false` to return results with no censoring. Applies to US English
-# transcription only. See [Profanity
-# filtering](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# and Japanese transcription only. See [Profanity
+# filtering](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-formatting#profanity-filtering).
 # @param smart_formatting [Boolean] If `true`, the service converts dates, times, series of digits and numbers, phone
 # numbers, currency values, and internet addresses into more readable, conventional
 # representations in the final transcript of a recognition request. For US English,
@@ -1018,19 +1142,21 @@ module IBMWatson
 # **Note:** Applies to US English, Japanese, and Spanish transcription only.
 #
 # See [Smart
-# formatting](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# formatting](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-formatting#smart-formatting).
 # @param speaker_labels [Boolean] If `true`, the response includes labels that identify which words were spoken by
 # which participants in a multi-person exchange. By default, the service returns no
 # speaker labels. Setting `speaker_labels` to `true` forces the `timestamps`
 # parameter to be `true`, regardless of whether you specify `false` for the
 # parameter.
-#
-#
-#
-#
-#
-#
-# labels
+# * For previous-generation models, can be used for US English, Australian English,
+# German, Japanese, Korean, and Spanish (both broadband and narrowband models) and
+# UK English (narrowband model) transcription only.
+# * For next-generation models, can be used for English (Australian, UK, and US),
+# German, and Spanish transcription only.
+#
+# Restrictions and limitations apply to the use of speaker labels for both types of
+# models. See [Speaker
+# labels](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-speaker-labels).
 # @param customization_id [String] **Deprecated.** Use the `language_customization_id` parameter to specify the
 # customization ID (GUID) of a custom language model that is to be used with the
 # recognition request. Do not specify both parameters with a request.
@@ -1039,7 +1165,8 @@ module IBMWatson
 # specify the name of the custom language model for which the grammar is defined.
 # The service recognizes only strings that are recognized by the specified grammar;
 # it does not recognize other custom words from the model's words resource. See
-# [
+# [Using a grammar for speech
+# recognition](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-grammarUse).
 # @param redaction [Boolean] If `true`, the service redacts, or masks, numeric data from final transcripts. The
 # feature redacts any number that has three or more consecutive digits by replacing
 # each digit with an `X` character. It is intended to redact sensitive numeric data,
@@ -1054,7 +1181,7 @@ module IBMWatson
 # **Note:** Applies to US English, Japanese, and Korean transcription only.
 #
 # See [Numeric
-# redaction](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# redaction](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-formatting#numeric-redaction).
 # @param processing_metrics [Boolean] If `true`, requests processing metrics about the service's transcription of the
 # input audio. The service returns processing metrics at the interval specified by
 # the `processing_metrics_interval` parameter. It also returns processing metrics
@@ -1062,7 +1189,7 @@ module IBMWatson
 # the service returns no processing metrics.
 #
 # See [Processing
-# metrics](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metrics#
+# metrics](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metrics#processing-metrics).
 # @param processing_metrics_interval [Float] Specifies the interval in real wall-clock seconds at which the service is to
 # return processing metrics. The parameter is ignored unless the
 # `processing_metrics` parameter is set to `true`.
@@ -1076,13 +1203,13 @@ module IBMWatson
 # the service returns processing metrics only for transcription events.
 #
 # See [Processing
-# metrics](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metrics#
+# metrics](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metrics#processing-metrics).
 # @param audio_metrics [Boolean] If `true`, requests detailed information about the signal characteristics of the
 # input audio. The service returns audio metrics with the final transcription
 # results. By default, the service returns no audio metrics.
 #
 # See [Audio
-# metrics](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metrics#
+# metrics](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metrics#audio-metrics).
 # @param end_of_phrase_silence_time [Float] If `true`, specifies the duration of the pause interval at which the service
 # splits a transcript into multiple final results. If the service detects pauses or
 # extended silence before it reaches the end of the audio stream, its response can
@@ -1099,7 +1226,7 @@ module IBMWatson
 # Chinese is 0.6 seconds.
 #
 # See [End of phrase silence
-# time](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# time](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-parsing#silence-time).
 # @param split_transcript_at_phrase_end [Boolean] If `true`, directs the service to split the transcript into multiple final results
 # based on semantic features of the input, for example, at the conclusion of
 # meaningful phrases such as sentences. The service bases its understanding of
@@ -1109,7 +1236,7 @@ module IBMWatson
 # interval.
 #
 # See [Split transcript at phrase
-# end](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# end](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-parsing#split-transcript).
 # @param speech_detector_sensitivity [Float] The sensitivity of speech activity detection that the service is to perform. Use
 # the parameter to suppress word insertions from music, coughing, and other
 # non-speech events. The service biases the audio it passes for speech recognition
@@ -1121,8 +1248,8 @@ module IBMWatson
 # * 0.5 (the default) provides a reasonable compromise for the level of sensitivity.
 # * 1.0 suppresses no audio (speech detection sensitivity is disabled).
 #
-# The values increase on a monotonic curve. See [Speech
-#
+# The values increase on a monotonic curve. See [Speech detector
+# sensitivity](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-detection#detection-parameters-sensitivity).
 # @param background_audio_suppression [Float] The level to which the service is to suppress background audio based on its volume
 # to prevent it from being transcribed as speech. Use the parameter to suppress side
 # conversations or background noise.
@@ -1133,10 +1260,27 @@ module IBMWatson
 # * 0.5 provides a reasonable level of audio suppression for general usage.
 # * 1.0 suppresses all audio (no audio is transcribed).
 #
-# The values increase on a monotonic curve. See [
-#
+# The values increase on a monotonic curve. See [Background audio
+# suppression](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-detection#detection-parameters-suppression).
+# @param low_latency [Boolean] If `true` for next-generation `Multimedia` and `Telephony` models that support low
+# latency, directs the service to produce results even more quickly than it usually
+# does. Next-generation models produce transcription results faster than
+# previous-generation models. The `low_latency` parameter causes the models to
+# produce results even more quickly, though the results might be less accurate when
+# the parameter is used.
+#
+# **Note:** The parameter is beta functionality. It is not available for
+# previous-generation `Broadband` and `Narrowband` models. It is available only for
+# some next-generation models.
+#
+# * For a list of next-generation models that support low latency, see [Supported
+# language
+# models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-ng#models-ng-supported)
+# for next-generation models.
+# * For more information about the `low_latency` parameter, see [Low
+# latency](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-interim#low-latency).
 # @return [IBMCloudSdkCore::DetailedResponse] A `IBMCloudSdkCore::DetailedResponse` object representing the response.
-def create_job(audio:, content_type: nil, model: nil, callback_url: nil, events: nil, user_token: nil, results_ttl: nil, language_customization_id: nil, acoustic_customization_id: nil, base_model_version: nil, customization_weight: nil, inactivity_timeout: nil, keywords: nil, keywords_threshold: nil, max_alternatives: nil, word_alternatives_threshold: nil, word_confidence: nil, timestamps: nil, profanity_filter: nil, smart_formatting: nil, speaker_labels: nil, customization_id: nil, grammar_name: nil, redaction: nil, processing_metrics: nil, processing_metrics_interval: nil, audio_metrics: nil, end_of_phrase_silence_time: nil, split_transcript_at_phrase_end: nil, speech_detector_sensitivity: nil, background_audio_suppression: nil)
+def create_job(audio:, content_type: nil, model: nil, callback_url: nil, events: nil, user_token: nil, results_ttl: nil, language_customization_id: nil, acoustic_customization_id: nil, base_model_version: nil, customization_weight: nil, inactivity_timeout: nil, keywords: nil, keywords_threshold: nil, max_alternatives: nil, word_alternatives_threshold: nil, word_confidence: nil, timestamps: nil, profanity_filter: nil, smart_formatting: nil, speaker_labels: nil, customization_id: nil, grammar_name: nil, redaction: nil, processing_metrics: nil, processing_metrics_interval: nil, audio_metrics: nil, end_of_phrase_silence_time: nil, split_transcript_at_phrase_end: nil, speech_detector_sensitivity: nil, background_audio_suppression: nil, low_latency: nil)
 raise ArgumentError.new("audio must be provided") if audio.nil?
 
 headers = {
@@ -1175,7 +1319,8 @@ module IBMWatson
 "end_of_phrase_silence_time" => end_of_phrase_silence_time,
 "split_transcript_at_phrase_end" => split_transcript_at_phrase_end,
 "speech_detector_sensitivity" => speech_detector_sensitivity,
-"background_audio_suppression" => background_audio_suppression
+"background_audio_suppression" => background_audio_suppression,
+"low_latency" => low_latency
 }
 
 data = audio
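The `create_job` doc comments note that `low_latency` is not available for previous-generation `Broadband` and `Narrowband` models and only for some next-generation models. A caller can therefore guard the parameter before building the request. This is a hedged sketch: the `supports_low_latency?` helper and the suffix convention are illustrative assumptions, not part of the SDK, and only some next-generation models actually support the parameter, so the exact model list in the docs is still authoritative.

```ruby
# Hypothetical guard: drop low_latency when the requested model is not a
# next-generation Multimedia/Telephony model, since previous-generation
# Broadband/Narrowband models do not support the parameter at all.
NEXT_GEN_SUFFIXES = %w[_Multimedia _Telephony].freeze

def supports_low_latency?(model)
  NEXT_GEN_SUFFIXES.any? { |suffix| model.end_with?(suffix) }
end

params = { "model" => "en-US_BroadbandModel", "low_latency" => true }
params.delete("low_latency") unless supports_low_latency?(params["model"])
```

With a previous-generation model such as `en-US_BroadbandModel`, the guard removes `low_latency` so the request carries only parameters the model accepts.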
@@ -1393,7 +1538,8 @@ module IBMWatson
 # models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageLanguageModels#listModels-language).
 # @param language [String] The identifier of the language for which custom language or custom acoustic models
 # are to be returned. Omit the parameter to see all custom language or custom
-# acoustic models that are owned by the requesting credentials.
+# acoustic models that are owned by the requesting credentials. (**Note:** The
+# identifier `ar-AR` is deprecated; use `ar-MS` instead.)
 #
 # To determine the languages for which customization is available, see [Language
 # support for
@@ -1548,6 +1694,9 @@ module IBMWatson
 # The value that you assign is used for all recognition requests that use the model.
 # You can override it for any recognition request by specifying a customization
 # weight for that request.
+#
+# See [Using customization
+# weight](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-languageUse#weight).
 # @return [IBMCloudSdkCore::DetailedResponse] A `IBMCloudSdkCore::DetailedResponse` object representing the response.
 def train_language_model(customization_id:, word_type_to_add: nil, customization_weight: nil)
 raise ArgumentError.new("customization_id must be provided") if customization_id.nil?
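The new cross-reference above documents the precedence that `customization_weight` follows: a weight assigned at training time via `train_language_model` applies to every recognition request that uses the model, and a weight on an individual request overrides it. A tiny sketch of that rule, with an illustrative helper name that is not SDK API:

```ruby
# Illustrative precedence for customization_weight: the weight stored on the
# custom model at training time is the default; a weight supplied with the
# recognition request takes precedence when present.
def effective_customization_weight(model_weight, request_weight)
  request_weight.nil? ? model_weight : request_weight
end

effective_customization_weight(0.4, nil)  # → 0.4 (model-level default applies)
effective_customization_weight(0.4, 0.2)  # → 0.2 (per-request value overrides)
```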
@@ -1629,7 +1778,7 @@ module IBMWatson
 # subsequent requests for the model until the upgrade completes.
 #
 # **See also:** [Upgrading a custom language
-# model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-upgrade#custom-upgrade-language).
 # @param customization_id [String] The customization ID (GUID) of the custom language model that is to be used for
 # the request. You must make the request with credentials for the instance of the
 # service that owns the custom model.
@@ -2468,7 +2617,8 @@ module IBMWatson
 # custom model`.
 # @param base_model_name [String] The name of the base language model that is to be customized by the new custom
 # acoustic model. The new custom model can be used only with the base model that it
-# customizes.
+# customizes. (**Note:** The model `ar-AR_BroadbandModel` is deprecated; use
+# `ar-MS_BroadbandModel` instead.)
 #
 # To determine whether a base model supports acoustic model customization, refer to
 # [Language support for
@@ -2517,7 +2667,8 @@ module IBMWatson
 # models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageAcousticModels#listModels-acoustic).
 # @param language [String] The identifier of the language for which custom language or custom acoustic models
 # are to be returned. Omit the parameter to see all custom language or custom
-# acoustic models that are owned by the requesting credentials.
+# acoustic models that are owned by the requesting credentials. (**Note:** The
+# identifier `ar-AR` is deprecated; use `ar-MS` instead.)
 #
 # To determine the languages for which customization is available, see [Language
 # support for
@@ -2771,7 +2922,7 @@ module IBMWatson
 # acoustic model was not trained with a custom language model.
 #
 # **See also:** [Upgrading a custom acoustic
-# model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-upgrade#custom-upgrade-acoustic).
 # @param customization_id [String] The customization ID (GUID) of the custom acoustic model that is to be used for
 # the request. You must make the request with credentials for the instance of the
 # service that owns the custom model.
@@ -2785,7 +2936,7 @@ module IBMWatson
 # upgrade of a custom acoustic model that is trained with a custom language model,
 # and only if you receive a 400 response code and the message `No input data
 # modified since last training`. See [Upgrading a custom acoustic
-# model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-
+# model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-upgrade#custom-upgrade-acoustic).
 # @return [nil]
 def upgrade_acoustic_model(customization_id:, custom_language_model_id: nil, force: nil)
 raise ArgumentError.new("customization_id must be provided") if customization_id.nil?
@@ -2923,8 +3074,8 @@ module IBMWatson
 # If the sampling rate of the audio is lower than the minimum required rate, the
 # service labels the audio file as `invalid`.
 #
-# **See also:** [
-# formats](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-audio-formats
+# **See also:** [Supported audio
+# formats](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-audio-formats).
 #
 #
 # ### Content types for archive-type resources