openai 0.23.0 → 0.23.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +16 -0
- data/README.md +1 -1
- data/lib/openai/models/realtime/client_secret_create_response.rb +6 -8
- data/lib/openai/models/realtime/input_audio_buffer_timeout_triggered.rb +25 -5
- data/lib/openai/models/realtime/realtime_audio_config_input.rb +14 -11
- data/lib/openai/models/realtime/realtime_audio_input_turn_detection.rb +173 -117
- data/lib/openai/models/realtime/realtime_client_event.rb +2 -6
- data/lib/openai/models/realtime/{models.rb → realtime_function_tool.rb} +6 -6
- data/lib/openai/models/realtime/realtime_response_create_params.rb +4 -4
- data/lib/openai/models/realtime/realtime_server_event.rb +14 -9
- data/lib/openai/models/realtime/realtime_session.rb +182 -121
- data/lib/openai/models/realtime/realtime_session_create_request.rb +2 -2
- data/lib/openai/models/realtime/realtime_session_create_response.rb +204 -154
- data/lib/openai/models/realtime/realtime_tools_config_union.rb +2 -2
- data/lib/openai/models/realtime/realtime_transcription_session_audio_input.rb +16 -11
- data/lib/openai/models/realtime/realtime_transcription_session_audio_input_turn_detection.rb +175 -117
- data/lib/openai/models/realtime/realtime_transcription_session_create_response.rb +117 -40
- data/lib/openai/models/realtime/transcription_session_updated_event.rb +152 -3
- data/lib/openai/models/responses/response.rb +8 -8
- data/lib/openai/models/responses/response_create_params.rb +8 -8
- data/lib/openai/version.rb +1 -1
- data/lib/openai.rb +1 -4
- data/rbi/openai/models/realtime/input_audio_buffer_timeout_triggered.rbi +24 -5
- data/rbi/openai/models/realtime/realtime_audio_config_input.rbi +44 -28
- data/rbi/openai/models/realtime/realtime_audio_input_turn_detection.rbi +264 -203
- data/rbi/openai/models/realtime/realtime_client_event.rbi +1 -2
- data/rbi/openai/models/realtime/{models.rbi → realtime_function_tool.rbi} +27 -9
- data/rbi/openai/models/realtime/realtime_response_create_params.rbi +5 -5
- data/rbi/openai/models/realtime/realtime_server_event.rbi +0 -2
- data/rbi/openai/models/realtime/realtime_session.rbi +316 -235
- data/rbi/openai/models/realtime/realtime_session_create_request.rbi +4 -4
- data/rbi/openai/models/realtime/realtime_session_create_response.rbi +325 -307
- data/rbi/openai/models/realtime/realtime_tools_config_union.rbi +1 -1
- data/rbi/openai/models/realtime/realtime_transcription_session_audio_input.rbi +39 -28
- data/rbi/openai/models/realtime/realtime_transcription_session_audio_input_turn_detection.rbi +264 -200
- data/rbi/openai/models/realtime/realtime_transcription_session_create_response.rbi +290 -101
- data/rbi/openai/models/realtime/transcription_session_updated_event.rbi +311 -4
- data/rbi/openai/models/responses/response.rbi +12 -12
- data/rbi/openai/models/responses/response_create_params.rbi +12 -12
- data/rbi/openai/resources/responses.rbi +8 -8
- data/sig/openai/models/realtime/realtime_audio_config_input.rbs +4 -8
- data/sig/openai/models/realtime/realtime_audio_input_turn_detection.rbs +91 -65
- data/sig/openai/models/realtime/realtime_client_event.rbs +0 -1
- data/sig/openai/models/realtime/{models.rbs → realtime_function_tool.rbs} +9 -9
- data/sig/openai/models/realtime/realtime_response_create_params.rbs +1 -1
- data/sig/openai/models/realtime/realtime_server_event.rbs +0 -2
- data/sig/openai/models/realtime/realtime_session.rbs +101 -75
- data/sig/openai/models/realtime/realtime_session_create_response.rbs +108 -104
- data/sig/openai/models/realtime/realtime_tools_config_union.rbs +1 -1
- data/sig/openai/models/realtime/realtime_transcription_session_audio_input.rbs +4 -8
- data/sig/openai/models/realtime/realtime_transcription_session_audio_input_turn_detection.rbs +91 -65
- data/sig/openai/models/realtime/realtime_transcription_session_create_response.rbs +123 -35
- data/sig/openai/models/realtime/transcription_session_updated_event.rbs +118 -4
- metadata +5 -14
- data/lib/openai/models/realtime/realtime_transcription_session_client_secret.rb +0 -38
- data/lib/openai/models/realtime/realtime_transcription_session_input_audio_transcription.rb +0 -66
- data/lib/openai/models/realtime/transcription_session_created.rb +0 -43
- data/rbi/openai/models/realtime/realtime_transcription_session_client_secret.rbi +0 -51
- data/rbi/openai/models/realtime/realtime_transcription_session_input_audio_transcription.rbi +0 -144
- data/rbi/openai/models/realtime/transcription_session_created.rbi +0 -79
- data/sig/openai/models/realtime/realtime_transcription_session_client_secret.rbs +0 -20
- data/sig/openai/models/realtime/realtime_transcription_session_input_audio_transcription.rbs +0 -59
- data/sig/openai/models/realtime/transcription_session_created.rbs +0 -32
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7603886aae923da50eb83881be4d69cf1b5be3616b9903a784f16bcd124c023d
+  data.tar.gz: 03feb2933c7a27590301dc9c805d66ce2e7f45e743515e7e46bf9bbf9d61dfe3
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6de05c188535d0ed867bf936a6e97a7f452bd6924678b672d23f7868feeda7144df86dac454bfa29300d81844799c1ab77adc42767a314975d566acacb3aaefd
+  data.tar.gz: bf693ec699be028060d13db1fa9bd5ef3bb980118d5c4b292877bc6a16a88bd743ee10cd6c6a9d3828aa15d31e8e55058d660d831af3179050c68b110d8c705f
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,21 @@
 # Changelog
 
+## 0.23.2 (2025-09-11)
+
+Full Changelog: [v0.23.1...v0.23.2](https://github.com/openai/openai-ruby/compare/v0.23.1...v0.23.2)
+
+### Chores
+
+* **api:** Minor docs and type updates for realtime ([ccef982](https://github.com/openai/openai-ruby/commit/ccef9827b31206fc9ba40d2b6165eeefda7621f5))
+
+## 0.23.1 (2025-09-10)
+
+Full Changelog: [v0.23.0...v0.23.1](https://github.com/openai/openai-ruby/compare/v0.23.0...v0.23.1)
+
+### Chores
+
+* **api:** fix realtime GA types ([342f8d9](https://github.com/openai/openai-ruby/commit/342f8d9a4322cc1afba9aeabc1ff0fda5daec5c3))
+
 ## 0.23.0 (2025-09-08)
 
 Full Changelog: [v0.22.1...v0.23.0](https://github.com/openai/openai-ruby/compare/v0.22.1...v0.23.0)
data/README.md
CHANGED
data/lib/openai/models/realtime/client_secret_create_response.rb
CHANGED
@@ -41,16 +41,14 @@ module OpenAI
         module Session
           extend OpenAI::Internal::Type::Union
 
+          discriminator :type
+
           # A new Realtime session configuration, with an ephemeral key. Default TTL
           # for keys is one minute.
-          variant -> { OpenAI::Realtime::RealtimeSessionCreateResponse }
-
-          # A
-
-          # When a session is created on the server via REST API, the session object
-          # also contains an ephemeral key. Default TTL for keys is 10 minutes. This
-          # property is not present when a session is updated via the WebSocket API.
-          variant -> { OpenAI::Realtime::RealtimeTranscriptionSessionCreateResponse }
+          variant :realtime, -> { OpenAI::Realtime::RealtimeSessionCreateResponse }
+
+          # A Realtime transcription session configuration object.
+          variant :transcription, -> { OpenAI::Realtime::RealtimeTranscriptionSessionCreateResponse }
 
           # @!method self.variants
           #   @return [Array(OpenAI::Models::Realtime::RealtimeSessionCreateResponse, OpenAI::Models::Realtime::RealtimeTranscriptionSessionCreateResponse)]
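The union now declares `discriminator :type`, so the SDK can route a payload to the right session class by its `type` field instead of trying each variant in turn. A minimal sketch of that dispatch, with a hypothetical payload; the `"realtime"` and `"transcription"` strings come from the variant names declared above:

require "json"

# Hypothetical server payload; the discriminator reads its "type" field.
payload = JSON.parse('{"type": "transcription"}')

# The same routing the union performs via discriminator :type.
variant =
  case payload["type"]
  when "realtime"      then "RealtimeSessionCreateResponse"
  when "transcription" then "RealtimeTranscriptionSessionCreateResponse"
  end

puts variant # => RealtimeTranscriptionSessionCreateResponse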
data/lib/openai/models/realtime/input_audio_buffer_timeout_triggered.rb
CHANGED
@@ -5,13 +5,15 @@ module OpenAI
   module Realtime
     class InputAudioBufferTimeoutTriggered < OpenAI::Internal::Type::BaseModel
       # @!attribute audio_end_ms
-      #   Millisecond offset
+      #   Millisecond offset of audio written to the input audio buffer at the time the
+      #   timeout was triggered.
       #
       #   @return [Integer]
       required :audio_end_ms, Integer
 
       # @!attribute audio_start_ms
-      #   Millisecond offset
+      #   Millisecond offset of audio written to the input audio buffer that was after the
+      #   playback time of the last model response.
       #
       #   @return [Integer]
       required :audio_start_ms, Integer
@@ -35,11 +37,29 @@ module OpenAI
       required :type, const: :"input_audio_buffer.timeout_triggered"
 
       # @!method initialize(audio_end_ms:, audio_start_ms:, event_id:, item_id:, type: :"input_audio_buffer.timeout_triggered")
-      #
+      #   Some parameter documentations has been truncated, see
+      #   {OpenAI::Models::Realtime::InputAudioBufferTimeoutTriggered} for more details.
       #
-      #
+      #   Returned when the Server VAD timeout is triggered for the input audio buffer.
+      #   This is configured with `idle_timeout_ms` in the `turn_detection` settings of
+      #   the session, and it indicates that there hasn't been any speech detected for the
+      #   configured duration.
       #
-      #
+      #   The `audio_start_ms` and `audio_end_ms` fields indicate the segment of audio
+      #   after the last model response up to the triggering time, as an offset from the
+      #   beginning of audio written to the input audio buffer. This means it demarcates
+      #   the segment of audio that was silent and the difference between the start and
+      #   end values will roughly match the configured timeout.
+      #
+      #   The empty audio will be committed to the conversation as an `input_audio` item
+      #   (there will be a `input_audio_buffer.committed` event) and a model response will
+      #   be generated. There may be speech that didn't trigger VAD but is still detected
+      #   by the model, so the model may respond with something relevant to the
+      #   conversation or a prompt to continue speaking.
+      #
+      #   @param audio_end_ms [Integer] Millisecond offset of audio written to the input audio buffer at the time the ti
+      #
+      #   @param audio_start_ms [Integer] Millisecond offset of audio written to the input audio buffer that was after the
       #
       #   @param event_id [String] The unique ID of the server event.
       #
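The expanded docs spell out the event shape well enough to sketch a client-side handler. The JSON below is a hypothetical event; the surrounding WebSocket plumbing is assumed:

require "json"

event = JSON.parse(<<~JSON)
  {
    "type": "input_audio_buffer.timeout_triggered",
    "event_id": "event_123",
    "item_id": "item_456",
    "audio_start_ms": 12000,
    "audio_end_ms": 18000
  }
JSON

if event["type"] == "input_audio_buffer.timeout_triggered"
  # Per the docs above, this span roughly matches the configured idle_timeout_ms.
  silent_ms = event["audio_end_ms"] - event["audio_start_ms"]
  puts "server VAD idle timeout fired after ~#{silent_ms}ms of silence"
end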
data/lib/openai/models/realtime/realtime_audio_config_input.rb
CHANGED
@@ -36,17 +36,20 @@ module OpenAI
       # @!attribute turn_detection
       #   Configuration for turn detection, ether Server VAD or Semantic VAD. This can be
       #   set to `null` to turn off, in which case the client must manually trigger model
-      #   response.
-      #   speech based on audio volume and respond at the end of user speech. Semantic VAD
-      #   is more advanced and uses a turn detection model (in conjunction with VAD) to
-      #   semantically estimate whether the user has finished speaking, then dynamically
-      #   sets a timeout based on this probability. For example, if user audio trails off
-      #   with "uhhm", the model will score a low probability of turn end and wait longer
-      #   for the user to continue speaking. This can be useful for more natural
-      #   conversations, but may have a higher latency.
+      #   response.
       #
-      #
-
+      #   Server VAD means that the model will detect the start and end of speech based on
+      #   audio volume and respond at the end of user speech.
+      #
+      #   Semantic VAD is more advanced and uses a turn detection model (in conjunction
+      #   with VAD) to semantically estimate whether the user has finished speaking, then
+      #   dynamically sets a timeout based on this probability. For example, if user audio
+      #   trails off with "uhhm", the model will score a low probability of turn end and
+      #   wait longer for the user to continue speaking. This can be useful for more
+      #   natural conversations, but may have a higher latency.
+      #
+      #   @return [OpenAI::Models::Realtime::RealtimeAudioInputTurnDetection::ServerVad, OpenAI::Models::Realtime::RealtimeAudioInputTurnDetection::SemanticVad, nil]
+      optional :turn_detection, union: -> { OpenAI::Realtime::RealtimeAudioInputTurnDetection }, nil?: true
 
       # @!method initialize(format_: nil, noise_reduction: nil, transcription: nil, turn_detection: nil)
       #   Some parameter documentations has been truncated, see
@@ -58,7 +61,7 @@ module OpenAI
       #
       #   @param transcription [OpenAI::Models::Realtime::AudioTranscription] Configuration for input audio transcription, defaults to off and can be set to `
       #
-      #   @param turn_detection [OpenAI::Models::Realtime::RealtimeAudioInputTurnDetection] Configuration for turn detection, ether Server VAD or Semantic VAD. This can be
+      #   @param turn_detection [OpenAI::Models::Realtime::RealtimeAudioInputTurnDetection::ServerVad, OpenAI::Models::Realtime::RealtimeAudioInputTurnDetection::SemanticVad, nil] Configuration for turn detection, ether Server VAD or Semantic VAD. This can be
 
       # @see OpenAI::Models::Realtime::RealtimeAudioConfigInput#noise_reduction
       class NoiseReduction < OpenAI::Internal::Type::BaseModel
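A wire-shape sketch of enabling this configuration through a `session.update` event, assuming the GA session layout (`session.audio.input.turn_detection`); the values are illustrative, not defaults you must send:

require "json"

session_update = {
  type: "session.update",
  session: {
    type: "realtime",
    audio: {
      input: {
        turn_detection: {
          type: "server_vad",
          threshold: 0.5,
          prefix_padding_ms: 300,
          silence_duration_ms: 500,
          idle_timeout_ms: 6000,     # nudge the user after ~6s of silence
          create_response: true,
          interrupt_response: true
        }
      }
    }
  }
}

puts JSON.generate(session_update)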
data/lib/openai/models/realtime/realtime_audio_input_turn_detection.rb
CHANGED
@@ -3,128 +3,184 @@
 module OpenAI
   module Models
     module Realtime
-      #
-      #
-      #
-      # @return [Array<Symbol>]
+      # Configuration for turn detection, ether Server VAD or Semantic VAD. This can be
+      # set to `null` to turn off, in which case the client must manually trigger model
+      # response.
+      #
+      # Server VAD means that the model will detect the start and end of speech based on
+      # audio volume and respond at the end of user speech.
+      #
+      # Semantic VAD is more advanced and uses a turn detection model (in conjunction
+      # with VAD) to semantically estimate whether the user has finished speaking, then
+      # dynamically sets a timeout based on this probability. For example, if user audio
+      # trails off with "uhhm", the model will score a low probability of turn end and
+      # wait longer for the user to continue speaking. This can be useful for more
+      # natural conversations, but may have a higher latency.
+      module RealtimeAudioInputTurnDetection
+        extend OpenAI::Internal::Type::Union
+
+        discriminator :type
+
+        # Server-side voice activity detection (VAD) which flips on when user speech is detected and off after a period of silence.
+        variant :server_vad, -> { OpenAI::Realtime::RealtimeAudioInputTurnDetection::ServerVad }
+
+        # Server-side semantic turn detection which uses a model to determine when the user has finished speaking.
+        variant :semantic_vad, -> { OpenAI::Realtime::RealtimeAudioInputTurnDetection::SemanticVad }
+
+        class ServerVad < OpenAI::Internal::Type::BaseModel
+          # @!attribute type
+          #   Type of turn detection, `server_vad` to turn on simple Server VAD.
+          #
+          #   @return [Symbol, :server_vad]
+          required :type, const: :server_vad
+
+          # @!attribute create_response
+          #   Whether or not to automatically generate a response when a VAD stop event
+          #   occurs.
+          #
+          #   @return [Boolean, nil]
+          optional :create_response, OpenAI::Internal::Type::Boolean
+
+          # @!attribute idle_timeout_ms
+          #   Optional timeout after which a model response will be triggered automatically.
+          #   This is useful for situations in which a long pause from the user is unexpected,
+          #   such as a phone call. The model will effectively prompt the user to continue the
+          #   conversation based on the current context.
+          #
+          #   The timeout value will be applied after the last model response's audio has
+          #   finished playing, i.e. it's set to the `response.done` time plus audio playback
+          #   duration.
+          #
+          #   An `input_audio_buffer.timeout_triggered` event (plus events associated with the
+          #   Response) will be emitted when the timeout is reached. Idle timeout is currently
+          #   only supported for `server_vad` mode.
+          #
+          #   @return [Integer, nil]
+          optional :idle_timeout_ms, Integer, nil?: true
+
+          # @!attribute interrupt_response
+          #   Whether or not to automatically interrupt any ongoing response with output to
+          #   the default conversation (i.e. `conversation` of `auto`) when a VAD start event
+          #   occurs.
+          #
+          #   @return [Boolean, nil]
+          optional :interrupt_response, OpenAI::Internal::Type::Boolean
+
+          # @!attribute prefix_padding_ms
+          #   Used only for `server_vad` mode. Amount of audio to include before the VAD
+          #   detected speech (in milliseconds). Defaults to 300ms.
+          #
+          #   @return [Integer, nil]
+          optional :prefix_padding_ms, Integer
+
+          # @!attribute silence_duration_ms
+          #   Used only for `server_vad` mode. Duration of silence to detect speech stop (in
+          #   milliseconds). Defaults to 500ms. With shorter values the model will respond
+          #   more quickly, but may jump in on short pauses from the user.
+          #
+          #   @return [Integer, nil]
+          optional :silence_duration_ms, Integer
+
+          # @!attribute threshold
+          #   Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0), this
+          #   defaults to 0.5. A higher threshold will require louder audio to activate the
+          #   model, and thus might perform better in noisy environments.
+          #
+          #   @return [Float, nil]
+          optional :threshold, Float
+
+          # @!method initialize(create_response: nil, idle_timeout_ms: nil, interrupt_response: nil, prefix_padding_ms: nil, silence_duration_ms: nil, threshold: nil, type: :server_vad)
+          #   Some parameter documentations has been truncated, see
+          #   {OpenAI::Models::Realtime::RealtimeAudioInputTurnDetection::ServerVad} for more
+          #   details.
+          #
+          #   Server-side voice activity detection (VAD) which flips on when user speech is
+          #   detected and off after a period of silence.
+          #
+          #   @param create_response [Boolean] Whether or not to automatically generate a response when a VAD stop event occurs
+          #
+          #   @param idle_timeout_ms [Integer, nil] Optional timeout after which a model response will be triggered automatically. T
+          #
+          #   @param interrupt_response [Boolean] Whether or not to automatically interrupt any ongoing response with output to th
+          #
+          #   @param prefix_padding_ms [Integer] Used only for `server_vad` mode. Amount of audio to include before the VAD detec
+          #
+          #   @param silence_duration_ms [Integer] Used only for `server_vad` mode. Duration of silence to detect speech stop (in m
+          #
+          #   @param threshold [Float] Used only for `server_vad` mode. Activation threshold for VAD (0.0 to 1.0), this
+          #
+          #   @param type [Symbol, :server_vad] Type of turn detection, `server_vad` to turn on simple Server VAD.
         end
 
+        class SemanticVad < OpenAI::Internal::Type::BaseModel
+          # @!attribute type
+          #   Type of turn detection, `semantic_vad` to turn on Semantic VAD.
+          #
+          #   @return [Symbol, :semantic_vad]
+          required :type, const: :semantic_vad
+
+          # @!attribute create_response
+          #   Whether or not to automatically generate a response when a VAD stop event
+          #   occurs.
+          #
+          #   @return [Boolean, nil]
+          optional :create_response, OpenAI::Internal::Type::Boolean
+
+          # @!attribute eagerness
+          #   Used only for `semantic_vad` mode. The eagerness of the model to respond. `low`
+          #   will wait longer for the user to continue speaking, `high` will respond more
+          #   quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`,
+          #   and `high` have max timeouts of 8s, 4s, and 2s respectively.
+          #
+          #   @return [Symbol, OpenAI::Models::Realtime::RealtimeAudioInputTurnDetection::SemanticVad::Eagerness, nil]
+          optional :eagerness,
+                   enum: -> { OpenAI::Realtime::RealtimeAudioInputTurnDetection::SemanticVad::Eagerness }
 
+          # @!attribute interrupt_response
+          #   Whether or not to automatically interrupt any ongoing response with output to
+          #   the default conversation (i.e. `conversation` of `auto`) when a VAD start event
+          #   occurs.
+          #
+          #   @return [Boolean, nil]
+          optional :interrupt_response, OpenAI::Internal::Type::Boolean
 
-          # @!method
-          #
+          # @!method initialize(create_response: nil, eagerness: nil, interrupt_response: nil, type: :semantic_vad)
+          #   Some parameter documentations has been truncated, see
+          #   {OpenAI::Models::Realtime::RealtimeAudioInputTurnDetection::SemanticVad} for
+          #   more details.
+          #
+          #   Server-side semantic turn detection which uses a model to determine when the
+          #   user has finished speaking.
+          #
+          #   @param create_response [Boolean] Whether or not to automatically generate a response when a VAD stop event occurs
+          #
+          #   @param eagerness [Symbol, OpenAI::Models::Realtime::RealtimeAudioInputTurnDetection::SemanticVad::Eagerness] Used only for `semantic_vad` mode. The eagerness of the model to respond. `low`
+          #
+          #   @param interrupt_response [Boolean] Whether or not to automatically interrupt any ongoing response with output to th
+          #
+          #   @param type [Symbol, :semantic_vad] Type of turn detection, `semantic_vad` to turn on Semantic VAD.
+
+          # Used only for `semantic_vad` mode. The eagerness of the model to respond. `low`
+          # will wait longer for the user to continue speaking, `high` will respond more
+          # quickly. `auto` is the default and is equivalent to `medium`. `low`, `medium`,
+          # and `high` have max timeouts of 8s, 4s, and 2s respectively.
+          #
+          # @see OpenAI::Models::Realtime::RealtimeAudioInputTurnDetection::SemanticVad#eagerness
+          module Eagerness
+            extend OpenAI::Internal::Type::Enum
+
+            LOW = :low
+            MEDIUM = :medium
+            HIGH = :high
+            AUTO = :auto
+
+            # @!method self.values
+            #   @return [Array<Symbol>]
+          end
         end
+
+        # @!method self.variants
+        #   @return [Array(OpenAI::Models::Realtime::RealtimeAudioInputTurnDetection::ServerVad, OpenAI::Models::Realtime::RealtimeAudioInputTurnDetection::SemanticVad)]
       end
     end
   end
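A sketch of building these variants directly, assuming the classes accept keyword arguments like other SDK models (the `type` constant is filled in by the class itself):

require "openai"

semantic = OpenAI::Realtime::RealtimeAudioInputTurnDetection::SemanticVad.new(
  eagerness: :low,        # waits longest before deciding the turn is over
  create_response: true,
  interrupt_response: true
)

server = OpenAI::Realtime::RealtimeAudioInputTurnDetection::ServerVad.new(
  idle_timeout_ms: 5_000, # prompt the user after ~5s of post-response silence
  silence_duration_ms: 500
)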
data/lib/openai/models/realtime/realtime_client_event.rb
CHANGED
@@ -110,8 +110,7 @@ module OpenAI
 
       # Send this event to update the session’s configuration.
       # The client may send this event at any time to update any field
-      # except for `voice` and `model`. `voice` can be updated only if there have been no other
-      # audio outputs yet.
+      # except for `voice` and `model`. `voice` can be updated only if there have been no other audio outputs yet.
       #
       # When the server receives a `session.update`, it will respond
       # with a `session.updated` event showing the full, effective configuration.
@@ -120,11 +119,8 @@ module OpenAI
       # To clear a field like `turn_detection`, pass `null`.
       variant :"session.update", -> { OpenAI::Realtime::SessionUpdateEvent }
 
-      # Send this event to update a transcription session.
-      variant :"transcription_session.update", -> { OpenAI::Realtime::TranscriptionSessionUpdate }
-
       # @!method self.variants
-      #   @return [Array(OpenAI::Models::Realtime::ConversationItemCreateEvent, OpenAI::Models::Realtime::ConversationItemDeleteEvent, OpenAI::Models::Realtime::ConversationItemRetrieveEvent, OpenAI::Models::Realtime::ConversationItemTruncateEvent, OpenAI::Models::Realtime::InputAudioBufferAppendEvent, OpenAI::Models::Realtime::InputAudioBufferClearEvent, OpenAI::Models::Realtime::OutputAudioBufferClearEvent, OpenAI::Models::Realtime::InputAudioBufferCommitEvent, OpenAI::Models::Realtime::ResponseCancelEvent, OpenAI::Models::Realtime::ResponseCreateEvent, OpenAI::Models::Realtime::SessionUpdateEvent
+      #   @return [Array(OpenAI::Models::Realtime::ConversationItemCreateEvent, OpenAI::Models::Realtime::ConversationItemDeleteEvent, OpenAI::Models::Realtime::ConversationItemRetrieveEvent, OpenAI::Models::Realtime::ConversationItemTruncateEvent, OpenAI::Models::Realtime::InputAudioBufferAppendEvent, OpenAI::Models::Realtime::InputAudioBufferClearEvent, OpenAI::Models::Realtime::OutputAudioBufferClearEvent, OpenAI::Models::Realtime::InputAudioBufferCommitEvent, OpenAI::Models::Realtime::ResponseCancelEvent, OpenAI::Models::Realtime::ResponseCreateEvent, OpenAI::Models::Realtime::SessionUpdateEvent)]
     end
   end
 end
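A sketch of the "pass `null` to clear" behavior called out above, assuming the GA session layout: sending an explicit JSON `null` for `turn_detection` disables VAD, rather than leaving the field unchanged:

require "json"

clear_vad = {
  type: "session.update",
  session: { audio: { input: { turn_detection: nil } } }
}

# Ruby's nil serializes to JSON null, which is what clears the field.
puts JSON.generate(clear_vad)
# => {"type":"session.update","session":{"audio":{"input":{"turn_detection":null}}}}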
data/lib/openai/models/realtime/{models.rb → realtime_function_tool.rb}
CHANGED
@@ -3,7 +3,7 @@
 module OpenAI
   module Models
     module Realtime
-      class
+      class RealtimeFunctionTool < OpenAI::Internal::Type::BaseModel
         # @!attribute description
         #   The description of the function, including guidance on when and how to call it,
         #   and guidance about what to tell the user when calling (if anything).
@@ -26,12 +26,12 @@ module OpenAI
       # @!attribute type
       #   The type of the tool, i.e. `function`.
       #
-      #   @return [Symbol, OpenAI::Models::Realtime::
-      optional :type, enum: -> { OpenAI::Realtime::
+      #   @return [Symbol, OpenAI::Models::Realtime::RealtimeFunctionTool::Type, nil]
+      optional :type, enum: -> { OpenAI::Realtime::RealtimeFunctionTool::Type }
 
       # @!method initialize(description: nil, name: nil, parameters: nil, type: nil)
       #   Some parameter documentations has been truncated, see
-      #   {OpenAI::Models::Realtime::
+      #   {OpenAI::Models::Realtime::RealtimeFunctionTool} for more details.
       #
       #   @param description [String] The description of the function, including guidance on when and how
       #
@@ -39,11 +39,11 @@ module OpenAI
       #
       #   @param parameters [Object] Parameters of the function in JSON Schema.
       #
-      #   @param type [Symbol, OpenAI::Models::Realtime::
+      #   @param type [Symbol, OpenAI::Models::Realtime::RealtimeFunctionTool::Type] The type of the tool, i.e. `function`.
 
       # The type of the tool, i.e. `function`.
       #
-      # @see OpenAI::Models::Realtime::
+      # @see OpenAI::Models::Realtime::RealtimeFunctionTool#type
       module Type
         extend OpenAI::Internal::Type::Enum
 
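A sketch of defining a function tool with the renamed class; the keyword arguments mirror the attributes above, and the weather tool itself is a made-up example:

require "openai"

tool = OpenAI::Realtime::RealtimeFunctionTool.new(
  type: :function,
  name: "get_weather",
  description: "Look up current weather for a city. Tell the user you are checking.",
  parameters: {
    type: "object",
    properties: { city: { type: "string" } },
    required: ["city"]
  }
)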
data/lib/openai/models/realtime/realtime_response_create_params.rb
CHANGED
@@ -90,7 +90,7 @@ module OpenAI
       # @!attribute tools
       #   Tools available to the model.
       #
-      #   @return [Array<OpenAI::Models::Realtime::
+      #   @return [Array<OpenAI::Models::Realtime::RealtimeFunctionTool, OpenAI::Models::Realtime::RealtimeResponseCreateMcpTool>, nil]
       optional :tools,
                -> { OpenAI::Internal::Type::ArrayOf[union: OpenAI::Realtime::RealtimeResponseCreateParams::Tool] }
 
@@ -118,7 +118,7 @@ module OpenAI
       #
       #   @param tool_choice [Symbol, OpenAI::Models::Responses::ToolChoiceOptions, OpenAI::Models::Responses::ToolChoiceFunction, OpenAI::Models::Responses::ToolChoiceMcp] How the model chooses tools. Provide one of the string modes or force a specific
       #
-      #   @param tools [Array<OpenAI::Models::Realtime::
+      #   @param tools [Array<OpenAI::Models::Realtime::RealtimeFunctionTool, OpenAI::Models::Realtime::RealtimeResponseCreateMcpTool>] Tools available to the model.
 
       # Controls which conversation the response is added to. Currently supports `auto`
       # and `none`, with `auto` as the default value. The `auto` value means that the
@@ -210,14 +210,14 @@ module OpenAI
       module Tool
         extend OpenAI::Internal::Type::Union
 
-        variant -> { OpenAI::Realtime::
+        variant -> { OpenAI::Realtime::RealtimeFunctionTool }
 
         # Give the model access to additional tools via remote Model Context Protocol
         # (MCP) servers. [Learn more about MCP](https://platform.openai.com/docs/guides/tools-remote-mcp).
         variant -> { OpenAI::Realtime::RealtimeResponseCreateMcpTool }
 
         # @!method self.variants
-        #   @return [Array(OpenAI::Models::Realtime::
+        #   @return [Array(OpenAI::Models::Realtime::RealtimeFunctionTool, OpenAI::Models::Realtime::RealtimeResponseCreateMcpTool)]
       end
     end
   end
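A wire-shape sketch of a `response.create` carrying both tool variants from the union above; the MCP server label and URL are placeholders, and the exact MCP tool fields should be checked against the API reference:

require "json"

response_create = {
  type: "response.create",
  response: {
    tools: [
      { type: "function", name: "get_weather", parameters: { type: "object" } },
      { type: "mcp", server_label: "docs", server_url: "https://example.com/mcp" }
    ]
  }
}

puts JSON.generate(response_create)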
data/lib/openai/models/realtime/realtime_server_event.rb
CHANGED
@@ -173,13 +173,6 @@ module OpenAI
       # there is an error.
       variant :"session.updated", -> { OpenAI::Realtime::SessionUpdatedEvent }
 
-      # Returned when a transcription session is updated with a `transcription_session.update` event, unless
-      # there is an error.
-      variant :"transcription_session.updated", -> { OpenAI::Realtime::TranscriptionSessionUpdatedEvent }
-
-      # Returned when a transcription session is created.
-      variant :"transcription_session.created", -> { OpenAI::Realtime::TranscriptionSessionCreated }
-
       # **WebRTC Only:** Emitted when the server begins streaming audio to the client. This event is
       # emitted after an audio content part has been added (`response.content_part.added`)
       # to the response.
@@ -215,7 +208,19 @@ module OpenAI
       # The event will include the full content of the Item except for audio data, which can be retrieved separately with a `conversation.item.retrieve` event if needed.
       variant :"conversation.item.done", -> { OpenAI::Realtime::ConversationItemDone }
 
-      # Returned when the
+      # Returned when the Server VAD timeout is triggered for the input audio buffer. This is configured
+      # with `idle_timeout_ms` in the `turn_detection` settings of the session, and it indicates that
+      # there hasn't been any speech detected for the configured duration.
+      #
+      # The `audio_start_ms` and `audio_end_ms` fields indicate the segment of audio after the last
+      # model response up to the triggering time, as an offset from the beginning of audio written
+      # to the input audio buffer. This means it demarcates the segment of audio that was silent and
+      # the difference between the start and end values will roughly match the configured timeout.
+      #
+      # The empty audio will be committed to the conversation as an `input_audio` item (there will be a
+      # `input_audio_buffer.committed` event) and a model response will be generated. There may be speech
+      # that didn't trigger VAD but is still detected by the model, so the model may respond with
+      # something relevant to the conversation or a prompt to continue speaking.
       variant :"input_audio_buffer.timeout_triggered", -> { OpenAI::Realtime::InputAudioBufferTimeoutTriggered }
 
       # Returned when an input audio transcription segment is identified for an item.
@@ -378,7 +383,7 @@ module OpenAI
       end
 
       # @!method self.variants
-      #   @return [Array(OpenAI::Models::Realtime::ConversationCreatedEvent, OpenAI::Models::Realtime::ConversationItemCreatedEvent, OpenAI::Models::Realtime::ConversationItemDeletedEvent, OpenAI::Models::Realtime::ConversationItemInputAudioTranscriptionCompletedEvent, OpenAI::Models::Realtime::ConversationItemInputAudioTranscriptionDeltaEvent, OpenAI::Models::Realtime::ConversationItemInputAudioTranscriptionFailedEvent, OpenAI::Models::Realtime::RealtimeServerEvent::ConversationItemRetrieved, OpenAI::Models::Realtime::ConversationItemTruncatedEvent, OpenAI::Models::Realtime::RealtimeErrorEvent, OpenAI::Models::Realtime::InputAudioBufferClearedEvent, OpenAI::Models::Realtime::InputAudioBufferCommittedEvent, OpenAI::Models::Realtime::InputAudioBufferSpeechStartedEvent, OpenAI::Models::Realtime::InputAudioBufferSpeechStoppedEvent, OpenAI::Models::Realtime::RateLimitsUpdatedEvent, OpenAI::Models::Realtime::ResponseAudioDeltaEvent, OpenAI::Models::Realtime::ResponseAudioDoneEvent, OpenAI::Models::Realtime::ResponseAudioTranscriptDeltaEvent, OpenAI::Models::Realtime::ResponseAudioTranscriptDoneEvent, OpenAI::Models::Realtime::ResponseContentPartAddedEvent, OpenAI::Models::Realtime::ResponseContentPartDoneEvent, OpenAI::Models::Realtime::ResponseCreatedEvent, OpenAI::Models::Realtime::ResponseDoneEvent, OpenAI::Models::Realtime::ResponseFunctionCallArgumentsDeltaEvent, OpenAI::Models::Realtime::ResponseFunctionCallArgumentsDoneEvent, OpenAI::Models::Realtime::ResponseOutputItemAddedEvent, OpenAI::Models::Realtime::ResponseOutputItemDoneEvent, OpenAI::Models::Realtime::ResponseTextDeltaEvent, OpenAI::Models::Realtime::ResponseTextDoneEvent, OpenAI::Models::Realtime::SessionCreatedEvent, OpenAI::Models::Realtime::SessionUpdatedEvent, OpenAI::Models::Realtime::
+      #   @return [Array(OpenAI::Models::Realtime::ConversationCreatedEvent, OpenAI::Models::Realtime::ConversationItemCreatedEvent, OpenAI::Models::Realtime::ConversationItemDeletedEvent, OpenAI::Models::Realtime::ConversationItemInputAudioTranscriptionCompletedEvent, OpenAI::Models::Realtime::ConversationItemInputAudioTranscriptionDeltaEvent, OpenAI::Models::Realtime::ConversationItemInputAudioTranscriptionFailedEvent, OpenAI::Models::Realtime::RealtimeServerEvent::ConversationItemRetrieved, OpenAI::Models::Realtime::ConversationItemTruncatedEvent, OpenAI::Models::Realtime::RealtimeErrorEvent, OpenAI::Models::Realtime::InputAudioBufferClearedEvent, OpenAI::Models::Realtime::InputAudioBufferCommittedEvent, OpenAI::Models::Realtime::InputAudioBufferSpeechStartedEvent, OpenAI::Models::Realtime::InputAudioBufferSpeechStoppedEvent, OpenAI::Models::Realtime::RateLimitsUpdatedEvent, OpenAI::Models::Realtime::ResponseAudioDeltaEvent, OpenAI::Models::Realtime::ResponseAudioDoneEvent, OpenAI::Models::Realtime::ResponseAudioTranscriptDeltaEvent, OpenAI::Models::Realtime::ResponseAudioTranscriptDoneEvent, OpenAI::Models::Realtime::ResponseContentPartAddedEvent, OpenAI::Models::Realtime::ResponseContentPartDoneEvent, OpenAI::Models::Realtime::ResponseCreatedEvent, OpenAI::Models::Realtime::ResponseDoneEvent, OpenAI::Models::Realtime::ResponseFunctionCallArgumentsDeltaEvent, OpenAI::Models::Realtime::ResponseFunctionCallArgumentsDoneEvent, OpenAI::Models::Realtime::ResponseOutputItemAddedEvent, OpenAI::Models::Realtime::ResponseOutputItemDoneEvent, OpenAI::Models::Realtime::ResponseTextDeltaEvent, OpenAI::Models::Realtime::ResponseTextDoneEvent, OpenAI::Models::Realtime::SessionCreatedEvent, OpenAI::Models::Realtime::SessionUpdatedEvent, OpenAI::Models::Realtime::RealtimeServerEvent::OutputAudioBufferStarted, OpenAI::Models::Realtime::RealtimeServerEvent::OutputAudioBufferStopped, OpenAI::Models::Realtime::RealtimeServerEvent::OutputAudioBufferCleared, OpenAI::Models::Realtime::ConversationItemAdded, OpenAI::Models::Realtime::ConversationItemDone, OpenAI::Models::Realtime::InputAudioBufferTimeoutTriggered, OpenAI::Models::Realtime::ConversationItemInputAudioTranscriptionSegment, OpenAI::Models::Realtime::McpListToolsInProgress, OpenAI::Models::Realtime::McpListToolsCompleted, OpenAI::Models::Realtime::McpListToolsFailed, OpenAI::Models::Realtime::ResponseMcpCallArgumentsDelta, OpenAI::Models::Realtime::ResponseMcpCallArgumentsDone, OpenAI::Models::Realtime::ResponseMcpCallInProgress, OpenAI::Models::Realtime::ResponseMcpCallCompleted, OpenAI::Models::Realtime::ResponseMcpCallFailed)]
     end
   end
 end
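A sketch of client-side dispatch over this server event union; only a few of the variant type strings listed above are handled here, and the rest fall through:

# Hypothetical handler for decoded server events (plain hashes from JSON.parse).
def handle_server_event(event)
  case event["type"]
  when "session.created", "session.updated"
    # event["session"] carries the full, effective session configuration.
  when "input_audio_buffer.timeout_triggered"
    # Idle timeout fired; per the docs above, a model response follows automatically.
  when "response.done"
    # Final status and usage for a completed response.
  else
    # Ignore event types this client does not care about.
  end
end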