elevenlabs_client 0.4.0 → 0.5.0

This diff shows the changes between publicly released versions of the package, as they appear in the public registry, and is provided for informational purposes only.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 2b65be08b17b9ae232158f2c004511a7840688e76853c014f9be37984a9639d1
- data.tar.gz: 2de88d74e59af044cfe943e32f0d2f2d8b45f71200469a69b3f95fc7605436d2
+ metadata.gz: f1c75ecb60655822ec4d8b88e22ebae5e0a1714e5573000cd5a36c3e80bcb886
+ data.tar.gz: 5d05b4e838bc30cbc1c290b615b1c0d686ea6d6aafe9521097ddc00d0ba28189
  SHA512:
- metadata.gz: f919ebdf7090d2f4cdd812589eccd1425e8be5e86615c04f3bf2882c7e8e8e07058db4f37cdf420e6b1948c668c3acbb177fe796b48c7f23563fa41a7204f4ce
- data.tar.gz: 23b7dc77bb3ca90e2019d4098887b8555d103d66c1fdf259f883f7952b4c26f57671f9bd2e250e27491acc42c2518e85b37bcb48cdb678fd163f9b3be9b1d7e4
+ metadata.gz: e26733f1b2ddaaec79432e7458f2af56b50d0f29bb52bdddc4fcbdbb564c85eea40949c7304fef7a4af3da5ff2c364bb42341b3755bf385cc6e81bb429f81aa5
+ data.tar.gz: 59f527fa65e17375fa3a33eb8f7d140a3f59ccd87f51168e6f43bac5c94c3d93fa49026dc4ee6fbe5eb4fb7b0a772f6e6925b6051092f112b12936e3b154009e
data/CHANGELOG.md CHANGED
@@ -5,6 +5,38 @@ All notable changes to this project will be documented in this file.
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+ ## [0.5.0] - 2025-09-14
+
+ ### Added
+
+ - Text-to-Speech With Timestamps
+   - `client.text_to_speech_with_timestamps.generate(voice_id, text, **options)`
+   - Character-level `alignment` and `normalized_alignment`
+ - Streaming Text-to-Speech With Timestamps
+   - `client.text_to_speech_stream_with_timestamps.stream(voice_id, text, **options, &block)`
+   - JSON streaming with audio chunks and timing per chunk
+ - WebSocket Streaming Enhancements
+   - Single-context and multi-context improvements; correct query param ordering and filtering
+   - Docs: `docs/WEBSOCKET_STREAMING.md`
+ - Text-to-Dialogue Streaming
+   - `client.text_to_dialogue_stream.stream(inputs, **options, &block)`
+   - Docs: `docs/TEXT_TO_DIALOGUE_STREAMING.md`
+
+ ### Improved
+
+ - Client streaming JSON handling for timestamp streams (`post_streaming_with_timestamps`)
+ - Robust parsing and block yielding across streaming tests
+ - URL query parameter ordering to match expectations in tests
+
+ ### Tests
+
+ - Added comprehensive unit and integration tests for all new endpoints
+ - Full suite now passes: 687 examples, 0 failures
+
+ ### Notes
+
+ - These features require a valid ElevenLabs API key and the appropriate model/voice permissions
+
  ## [0.4.0] - 2025-09-12
 
  ### Added
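The character-level alignment added in 0.5.0 pairs each character of the input text with start and end times. As a rough sketch of consuming such a payload (the field names `characters`, `character_start_times_seconds`, and `character_end_times_seconds` are assumptions about the response shape, not taken from this changelog):

```ruby
# Sketch only: assumes the alignment object carries parallel arrays of
# characters and per-character start/end times (field names are assumed).
def character_timings(response)
  alignment = response["alignment"]
  return [] unless alignment

  alignment["characters"].zip(
    alignment["character_start_times_seconds"],
    alignment["character_end_times_seconds"]
  ).map { |ch, start_s, end_s| { char: ch, start: start_s, end: end_s } }
end
```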
data/README.md CHANGED
@@ -2,7 +2,7 @@
 
  [![Gem Version](https://badge.fury.io/rb/elevenlabs_client.svg)](https://badge.fury.io/rb/elevenlabs_client)
 
- A comprehensive Ruby client library for the ElevenLabs API, supporting voice synthesis, dubbing, dialogue generation, sound effects, and AI music composition.
+ A comprehensive Ruby client library for the ElevenLabs API, supporting voice synthesis, dubbing, dialogue generation, sound effects, AI music composition, voice transformation, speech transcription, audio isolation, and advanced audio processing features.
 
  ## Features
 
@@ -13,6 +13,11 @@ A comprehensive Ruby client library for the ElevenLabs API, supporting voice syn
  🎵 **Music Generation** - AI-powered music composition and streaming
  🎨 **Voice Design** - Create custom voices from text descriptions
  🎭 **Voice Management** - Create, edit, and manage individual voices
+ 🔄 **Speech-to-Speech** - Transform audio from one voice to another (Voice Changer)
+ 📝 **Speech-to-Text** - Transcribe audio and video files with advanced features
+ 🔇 **Audio Isolation** - Remove background noise from audio files
+ 📱 **Audio Native** - Create embeddable audio players for websites
+ ⏱️ **Forced Alignment** - Get precise timing information for audio transcripts
  🤖 **Models** - List available models and their capabilities
  📡 **Streaming** - Real-time audio streaming
  ⚙️ **Configurable** - Flexible configuration options
@@ -141,6 +146,60 @@ music_data = client.music.compose(
  )
  File.open("generated_music.mp3", "wb") { |f| f.write(music_data) }
 
+ # Speech-to-Speech (Voice Changer)
+ File.open("input_audio.mp3", "rb") do |audio_file|
+   converted_audio = client.speech_to_speech.convert(
+     "target_voice_id",
+     audio_file,
+     "input_audio.mp3",
+     remove_background_noise: true
+   )
+   File.open("converted_audio.mp3", "wb") { |f| f.write(converted_audio) }
+ end
+
+ # Speech-to-Text Transcription
+ File.open("audio.mp3", "rb") do |audio_file|
+   transcription = client.speech_to_text.create(
+     "scribe_v1",
+     file: audio_file,
+     filename: "audio.mp3",
+     diarize: true,
+     timestamps_granularity: "word"
+   )
+   puts "Transcribed: #{transcription['text']}"
+ end
+
+ # Audio Isolation (Background Noise Removal)
+ File.open("noisy_audio.mp3", "rb") do |audio_file|
+   clean_audio = client.audio_isolation.isolate(audio_file, "noisy_audio.mp3")
+   File.open("clean_audio.mp3", "wb") { |f| f.write(clean_audio) }
+ end
+
+ # Audio Native (Embeddable Player)
+ File.open("article.html", "rb") do |html_file|
+   project = client.audio_native.create(
+     "My Article",
+     file: html_file,
+     filename: "article.html",
+     voice_id: "voice_id",
+     auto_convert: true
+   )
+   puts "Player HTML: #{project['html_snippet']}"
+ end
+
+ # Forced Alignment
+ File.open("speech.wav", "rb") do |audio_file|
+   alignment = client.forced_alignment.create(
+     audio_file,
+     "speech.wav",
+     "Hello world, this is a test transcript"
+   )
+
+   alignment['words'].each do |word|
+     puts "#{word['text']}: #{word['start']}s - #{word['end']}s"
+   end
+ end
+
  # Streaming Text-to-Speech
  client.text_to_speech_stream.stream("voice_id", "Streaming text") do |chunk|
    # Process audio chunk in real-time
@@ -160,6 +219,11 @@ end
  - **[Music Generation API](docs/MUSIC.md)** - AI-powered music composition and streaming
  - **[Text-to-Voice API](docs/TEXT_TO_VOICE.md)** - Design and create custom voices
  - **[Voice Management API](docs/VOICES.md)** - Manage individual voices (CRUD operations)
+ - **[Speech-to-Speech API](docs/SPEECH_TO_SPEECH.md)** - Transform audio from one voice to another
+ - **[Speech-to-Text API](docs/SPEECH_TO_TEXT.md)** - Transcribe audio and video files
+ - **[Audio Isolation API](docs/AUDIO_ISOLATION.md)** - Remove background noise from audio
+ - **[Audio Native API](docs/AUDIO_NATIVE.md)** - Create embeddable audio players
+ - **[Forced Alignment API](docs/FORCED_ALIGNMENT.md)** - Get precise timing information
  - **[Models API](docs/MODELS.md)** - List available models and capabilities
 
  ### Available Endpoints
@@ -174,6 +238,11 @@ end
  | `client.music.*` | AI music composition and streaming | [MUSIC.md](docs/MUSIC.md) |
  | `client.text_to_voice.*` | Voice design and creation | [TEXT_TO_VOICE.md](docs/TEXT_TO_VOICE.md) |
  | `client.voices.*` | Voice management (CRUD) | [VOICES.md](docs/VOICES.md) |
+ | `client.speech_to_speech.*` | Voice changer and audio transformation | [SPEECH_TO_SPEECH.md](docs/SPEECH_TO_SPEECH.md) |
+ | `client.speech_to_text.*` | Audio/video transcription | [SPEECH_TO_TEXT.md](docs/SPEECH_TO_TEXT.md) |
+ | `client.audio_isolation.*` | Background noise removal | [AUDIO_ISOLATION.md](docs/AUDIO_ISOLATION.md) |
+ | `client.audio_native.*` | Embeddable audio players | [AUDIO_NATIVE.md](docs/AUDIO_NATIVE.md) |
+ | `client.forced_alignment.*` | Audio-text timing alignment | [FORCED_ALIGNMENT.md](docs/FORCED_ALIGNMENT.md) |
  | `client.models.*` | Model information and capabilities | [MODELS.md](docs/MODELS.md) |
 
  ## Configuration Options
@@ -221,6 +290,9 @@ end
  - `AuthenticationError` - Invalid API key or authentication failure
  - `RateLimitError` - Rate limit exceeded
  - `ValidationError` - Invalid request parameters
+ - `NotFoundError` - Resource not found (e.g., voice ID, transcript ID)
+ - `BadRequestError` - Bad request with invalid parameters
+ - `UnprocessableEntityError` - Request cannot be processed (e.g., invalid file format)
  - `APIError` - General API errors
 
  ## Rails Integration
@@ -235,6 +307,11 @@ The gem is designed to work seamlessly with Rails applications. See the [example
  - [MusicController](examples/music_controller.rb) - AI music composition and streaming
  - [TextToVoiceController](examples/text_to_voice_controller.rb) - Voice design and creation
  - [VoicesController](examples/voices_controller.rb) - Voice management (CRUD operations)
+ - [SpeechToSpeechController](examples/speech_to_speech_controller.rb) - Voice changer and audio transformation
+ - [SpeechToTextController](examples/speech_to_text_controller.rb) - Audio/video transcription with advanced features
+ - [AudioIsolationController](examples/audio_isolation_controller.rb) - Background noise removal and audio cleanup
+ - [AudioNativeController](examples/audio_native_controller.rb) - Embeddable audio players for websites
+ - [ForcedAlignmentController](examples/forced_alignment_controller.rb) - Audio-text timing alignment and subtitle generation
 
  ## Development
 
@@ -2,12 +2,13 @@
 
  require "faraday"
  require "faraday/multipart"
+ require "json"
 
  module ElevenlabsClient
    class Client
      DEFAULT_BASE_URL = "https://api.elevenlabs.io"
 
-     attr_reader :base_url, :api_key, :dubs, :text_to_speech, :text_to_speech_stream, :text_to_dialogue, :sound_generation, :text_to_voice, :models, :voices, :music
+     attr_reader :base_url, :api_key, :dubs, :text_to_speech, :text_to_speech_stream, :text_to_speech_with_timestamps, :text_to_speech_stream_with_timestamps, :text_to_dialogue, :text_to_dialogue_stream, :sound_generation, :text_to_voice, :models, :voices, :music, :audio_isolation, :audio_native, :forced_alignment, :speech_to_speech, :speech_to_text, :websocket_text_to_speech
 
      def initialize(api_key: nil, base_url: nil, api_key_env: "ELEVENLABS_API_KEY", base_url_env: "ELEVENLABS_BASE_URL")
        @api_key = api_key || fetch_api_key(api_key_env)
@@ -16,12 +17,21 @@ module ElevenlabsClient
        @dubs = Dubs.new(self)
        @text_to_speech = TextToSpeech.new(self)
        @text_to_speech_stream = TextToSpeechStream.new(self)
+       @text_to_speech_with_timestamps = TextToSpeechWithTimestamps.new(self)
+       @text_to_speech_stream_with_timestamps = TextToSpeechStreamWithTimestamps.new(self)
        @text_to_dialogue = TextToDialogue.new(self)
+       @text_to_dialogue_stream = TextToDialogueStream.new(self)
        @sound_generation = SoundGeneration.new(self)
        @text_to_voice = TextToVoice.new(self)
        @models = Models.new(self)
        @voices = Voices.new(self)
        @music = Endpoints::Music.new(self)
+       @audio_isolation = AudioIsolation.new(self)
+       @audio_native = AudioNative.new(self)
+       @forced_alignment = ForcedAlignment.new(self)
+       @speech_to_speech = SpeechToSpeech.new(self)
+       @speech_to_text = SpeechToText.new(self)
+       @websocket_text_to_speech = WebSocketTextToSpeech.new(self)
      end
 
      # Makes an authenticated GET request
@@ -144,6 +154,44 @@ module ElevenlabsClient
        handle_response(response)
      end
 
+     # Makes an authenticated POST request with streaming response for timestamp data
+     # @param path [String] API endpoint path
+     # @param body [Hash, nil] Request body
+     # @param block [Proc] Block to handle each JSON chunk with timestamps
+     # @return [Faraday::Response] Response object
+     def post_streaming_with_timestamps(path, body = nil, &block)
+       buffer = ""
+
+       response = @conn.post(path) do |req|
+         req.headers["xi-api-key"] = api_key
+         req.headers["Content-Type"] = "application/json"
+         req.body = body.to_json if body
+
+         # Set up streaming callback for JSON chunks
+         req.options.on_data = proc do |chunk, _|
+           if block_given?
+             buffer += chunk
+
+             # Process complete JSON objects
+             while buffer.include?("\n")
+               line, buffer = buffer.split("\n", 2)
+               next if line.strip.empty?
+
+               begin
+                 json_data = JSON.parse(line)
+                 block.call(json_data)
+               rescue JSON::ParserError
+                 # Skip malformed JSON lines
+                 next
+               end
+             end
+           end
+         end
+       end
+
+       handle_response(response)
+     end
+
      # Helper method to create Faraday::Multipart::FilePart
      # @param file_io [IO] File IO object
      # @param filename [String] Original filename
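The `post_streaming_with_timestamps` hunk above buffers partial network chunks until a newline completes one JSON object, since a chunk boundary can fall mid-object. The same line-buffering logic, pulled out as a standalone sketch for clarity:

```ruby
require "json"

# Replays the client's buffering strategy: accumulate chunks until a
# newline arrives, then parse the stream one complete line at a time.
def parse_ndjson_chunks(chunks)
  buffer = ""
  events = []

  chunks.each do |chunk|
    buffer += chunk
    while buffer.include?("\n")
      line, buffer = buffer.split("\n", 2)
      next if line.strip.empty?

      begin
        events << JSON.parse(line)
      rescue JSON::ParserError
        next # skip malformed lines, mirroring the client
      end
    end
  end

  events
end
```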
@@ -0,0 +1,71 @@
+ # frozen_string_literal: true
+
+ module ElevenlabsClient
+   class AudioIsolation
+     def initialize(client)
+       @client = client
+     end
+
+     # POST /v1/audio-isolation
+     # Removes background noise from audio
+     # Documentation: https://elevenlabs.io/docs/api-reference/audio-isolation
+     #
+     # @param audio_file [IO, File] The audio file from which vocals/speech will be isolated
+     # @param filename [String] Original filename for the audio file
+     # @param options [Hash] Optional parameters
+     # @option options [String] :file_format Format of input audio ('pcm_s16le_16' or 'other', defaults to 'other')
+     # @return [String] Binary audio data with background noise removed
+     def isolate(audio_file, filename, **options)
+       endpoint = "/v1/audio-isolation"
+
+       payload = {
+         audio: @client.file_part(audio_file, filename)
+       }
+
+       # Add optional parameters if provided
+       payload[:file_format] = options[:file_format] if options[:file_format]
+
+       @client.post_multipart(endpoint, payload)
+     end
+
+     # POST /v1/audio-isolation/stream
+     # Removes background noise from audio with streaming response
+     # Documentation: https://elevenlabs.io/docs/api-reference/audio-isolation/stream
+     #
+     # @param audio_file [IO, File] The audio file from which vocals/speech will be isolated
+     # @param filename [String] Original filename for the audio file
+     # @param options [Hash] Optional parameters
+     # @option options [String] :file_format Format of input audio ('pcm_s16le_16' or 'other', defaults to 'other')
+     # @param block [Proc] Block to handle each chunk of streaming audio data
+     # @return [Faraday::Response] Response object for streaming
+     def isolate_stream(audio_file, filename, **options, &block)
+       endpoint = "/v1/audio-isolation/stream"
+
+       payload = {
+         audio: @client.file_part(audio_file, filename)
+       }
+
+       # Add optional parameters if provided
+       payload[:file_format] = options[:file_format] if options[:file_format]
+
+       # Use streaming multipart request
+       response = @client.instance_variable_get(:@conn).post(endpoint) do |req|
+         req.headers["xi-api-key"] = @client.api_key
+         req.body = payload
+
+         # Set up streaming callback if block provided
+         if block_given?
+           req.options.on_data = proc do |chunk, _|
+             block.call(chunk)
+           end
+         end
+       end
+
+       @client.send(:handle_response, response)
+     end
+
+     private
+
+     attr_reader :client
+   end
+ end
@@ -0,0 +1,103 @@
+ # frozen_string_literal: true
+
+ module ElevenlabsClient
+   class AudioNative
+     def initialize(client)
+       @client = client
+     end
+
+     # POST /v1/audio-native
+     # Creates an Audio Native enabled project, optionally starts conversion, and returns the project ID and an embeddable HTML snippet
+     # Documentation: https://elevenlabs.io/docs/api-reference/audio-native/create
+     #
+     # @param name [String] Project name
+     # @param options [Hash] Optional parameters
+     # @option options [String] :image Image URL used in the player (deprecated)
+     # @option options [String] :author Author used in the player
+     # @option options [String] :title Title used in the player
+     # @option options [Boolean] :small Whether to use small player (deprecated, defaults to false)
+     # @option options [String] :text_color Text color used in the player
+     # @option options [String] :background_color Background color used in the player
+     # @option options [Integer] :sessionization Minutes to persist session (deprecated, defaults to 0)
+     # @option options [String] :voice_id Voice ID used to voice the content
+     # @option options [String] :model_id TTS Model ID used in the player
+     # @option options [IO, File] :file Text or HTML input file containing article content
+     # @option options [String] :filename Original filename for the file
+     # @option options [Boolean] :auto_convert Whether to auto convert project to audio (defaults to false)
+     # @option options [String] :apply_text_normalization Text normalization mode ('auto', 'on', 'off', 'apply_english')
+     # @return [Hash] JSON response containing project_id, converting status, and html_snippet
+     def create(name, **options)
+       endpoint = "/v1/audio-native"
+
+       payload = { name: name }
+
+       # Add optional parameters if provided
+       payload[:image] = options[:image] if options[:image]
+       payload[:author] = options[:author] if options[:author]
+       payload[:title] = options[:title] if options[:title]
+       payload[:small] = options[:small] unless options[:small].nil?
+       payload[:text_color] = options[:text_color] if options[:text_color]
+       payload[:background_color] = options[:background_color] if options[:background_color]
+       payload[:sessionization] = options[:sessionization] if options[:sessionization]
+       payload[:voice_id] = options[:voice_id] if options[:voice_id]
+       payload[:model_id] = options[:model_id] if options[:model_id]
+       payload[:auto_convert] = options[:auto_convert] unless options[:auto_convert].nil?
+       payload[:apply_text_normalization] = options[:apply_text_normalization] if options[:apply_text_normalization]
+
+       # Add file if provided
+       if options[:file] && options[:filename]
+         payload[:file] = @client.file_part(options[:file], options[:filename])
+       end
+
+       @client.post_multipart(endpoint, payload)
+     end
+
+     # POST /v1/audio-native/:project_id/content
+     # Updates content for the specified Audio Native project
+     # Documentation: https://elevenlabs.io/docs/api-reference/audio-native/update
+     #
+     # @param project_id [String] The ID of the project to be used
+     # @param options [Hash] Optional parameters
+     # @option options [IO, File] :file Text or HTML input file containing article content
+     # @option options [String] :filename Original filename for the file
+     # @option options [Boolean] :auto_convert Whether to auto convert project to audio (defaults to false)
+     # @option options [Boolean] :auto_publish Whether to auto publish after conversion (defaults to false)
+     # @return [Hash] JSON response containing project_id, converting, publishing status, and html_snippet
+     def update_content(project_id, **options)
+       endpoint = "/v1/audio-native/#{project_id}/content"
+
+       payload = {}
+
+       # Add optional parameters if provided
+       payload[:auto_convert] = options[:auto_convert] unless options[:auto_convert].nil?
+       payload[:auto_publish] = options[:auto_publish] unless options[:auto_publish].nil?
+
+       # Add file if provided
+       if options[:file] && options[:filename]
+         payload[:file] = @client.file_part(options[:file], options[:filename])
+       end
+
+       @client.post_multipart(endpoint, payload)
+     end
+
+     # GET /v1/audio-native/:project_id/settings
+     # Gets player settings for the specified project
+     # Documentation: https://elevenlabs.io/docs/api-reference/audio-native/settings
+     #
+     # @param project_id [String] The ID of the Studio project
+     # @return [Hash] JSON response containing enabled status, snapshot_id, and settings
+     def get_settings(project_id)
+       endpoint = "/v1/audio-native/#{project_id}/settings"
+       @client.get(endpoint)
+     end
+
+     # Alias methods for convenience
+     alias_method :create_project, :create
+     alias_method :update_project_content, :update_content
+     alias_method :project_settings, :get_settings
+
+     private
+
+     attr_reader :client
+   end
+ end
@@ -8,6 +8,7 @@ module ElevenlabsClient
 
      # POST /v1/dubbing (multipart)
      # Creates a new dubbing job
+     # Documentation: https://elevenlabs.io/docs/api-reference/dubbing/create
      #
      # @param file_io [IO] The audio/video file to dub
      # @param filename [String] Original filename
@@ -19,8 +20,9 @@ module ElevenlabsClient
        payload = {
          file: @client.file_part(file_io, filename),
          mode: "automatic",
-         target_languages: target_languages,
-         name: name
+         name: name,
+         target_lang: target_languages.first,
+         num_speakers: 1
        }.compact.merge(options)
 
        @client.post_multipart("/v1/dubbing", payload)
28
30
 
29
31
  # GET /v1/dubbing/{id}
30
32
  # Retrieves dubbing job details
33
+ # Documentation: https://elevenlabs.io/docs/api-reference/dubbing/get
31
34
  #
32
35
  # @param dubbing_id [String] The dubbing job ID
33
36
  # @return [Hash] Dubbing job details
@@ -37,6 +40,7 @@ module ElevenlabsClient
37
40
 
38
41
  # GET /v1/dubbing
39
42
  # Lists dubbing jobs
43
+ # Documentation: https://elevenlabs.io/docs/api-reference/dubbing
40
44
  #
41
45
  # @param params [Hash] Query parameters (dubbing_status, page_size, etc.)
42
46
  # @return [Hash] List of dubbing jobs
@@ -46,6 +50,7 @@ module ElevenlabsClient
46
50
 
47
51
  # GET /v1/dubbing/{id}/resources
48
52
  # Retrieves dubbing resources for editing (if dubbing_studio: true was used)
53
+ # Documentation: https://elevenlabs.io/docs/api-reference/dubbing/resources/get-resource
49
54
  #
50
55
  # @param dubbing_id [String] The dubbing job ID
51
56
  # @return [Hash] Dubbing resources
@@ -55,6 +60,7 @@ module ElevenlabsClient
55
60
 
56
61
  # DELETE /v1/dubbing/{id}
57
62
  # Deletes a dubbing project
63
+ # Documentation: https://elevenlabs.io/docs/api-reference/dubbing/delete
58
64
  #
59
65
  # @param dubbing_id [String] The dubbing job ID
60
66
  # @return [Hash] Response with status
@@ -64,6 +70,7 @@ module ElevenlabsClient
64
70
 
65
71
  # GET /v1/dubbing/resource/{dubbing_id}
66
72
  # Gets dubbing resource with detailed information including segments, speakers, etc.
73
+ # Documentation: https://elevenlabs.io/docs/api-reference/dubbing/resources/get-resource
67
74
  #
68
75
  # @param dubbing_id [String] The dubbing job ID
69
76
  # @return [Hash] Detailed dubbing resource information
@@ -73,6 +80,7 @@ module ElevenlabsClient
 
      # POST /v1/dubbing/resource/{dubbing_id}/speaker/{speaker_id}/segment
      # Creates a new segment in dubbing resource
+     # Documentation: https://elevenlabs.io/docs/api-reference/dubbing/resources/create-segment
      #
      # @param dubbing_id [String] The dubbing job ID
      # @param speaker_id [String] The speaker ID
@@ -94,6 +102,7 @@ module ElevenlabsClient
 
      # DELETE /v1/dubbing/resource/{dubbing_id}/segment/{segment_id}
      # Deletes a single segment from the dubbing
+     # Documentation: https://elevenlabs.io/docs/api-reference/dubbing/resources/delete-segment
      #
      # @param dubbing_id [String] The dubbing job ID
      # @param segment_id [String] The segment ID
@@ -104,6 +113,7 @@ module ElevenlabsClient
 
      # PATCH /v1/dubbing/resource/{dubbing_id}/segment/{segment_id}/{language}
      # Updates a single segment with new text and/or start/end times
+     # Documentation: https://elevenlabs.io/docs/api-reference/dubbing/resources/update-segment
      #
      # @param dubbing_id [String] The dubbing job ID
      # @param segment_id [String] The segment ID
@@ -124,6 +134,7 @@ module ElevenlabsClient
 
      # POST /v1/dubbing/resource/{dubbing_id}/transcribe
      # Regenerates transcriptions for specified segments
+     # Documentation: https://elevenlabs.io/docs/api-reference/dubbing/resources/transcribe-segment
      #
      # @param dubbing_id [String] The dubbing job ID
      # @param segments [Array<String>] List of segment IDs to transcribe
@@ -135,6 +146,7 @@ module ElevenlabsClient
 
      # POST /v1/dubbing/resource/{dubbing_id}/translate
      # Regenerates translations for specified segments/languages
+     # Documentation: https://elevenlabs.io/docs/api-reference/dubbing/resources/translate-segment
      #
      # @param dubbing_id [String] The dubbing job ID
      # @param segments [Array<String>] List of segment IDs to translate
@@ -151,6 +163,7 @@ module ElevenlabsClient
 
      # POST /v1/dubbing/resource/{dubbing_id}/dub
      # Regenerates dubs for specified segments/languages
+     # Documentation: https://elevenlabs.io/docs/api-reference/dubbing/resources/dub-segment
      #
      # @param dubbing_id [String] The dubbing job ID
      # @param segments [Array<String>] List of segment IDs to dub
@@ -167,6 +180,7 @@ module ElevenlabsClient
 
      # POST /v1/dubbing/resource/{dubbing_id}/render/{language}
      # Renders the output media for a language
+     # Documentation: https://elevenlabs.io/docs/api-reference/dubbing/resources/render-project
      #
      # @param dubbing_id [String] The dubbing job ID
      # @param language [String] The language to render
@@ -184,6 +198,7 @@ module ElevenlabsClient
 
      # PATCH /v1/dubbing/resource/{dubbing_id}/speaker/{speaker_id}
      # Updates speaker metadata such as voice
+     # Documentation: https://elevenlabs.io/docs/api-reference/dubbing/resources/update-speaker
      #
      # @param dubbing_id [String] The dubbing job ID
      # @param speaker_id [String] The speaker ID
@@ -201,6 +216,7 @@ module ElevenlabsClient
 
      # GET /v1/dubbing/resource/{dubbing_id}/speaker/{speaker_id}/similar-voices
      # Gets similar voices for a speaker
+     # Documentation: https://elevenlabs.io/docs/api-reference/dubbing/resources/get-similar-voices
      #
      # @param dubbing_id [String] The dubbing job ID
      # @param speaker_id [String] The speaker ID
@@ -209,6 +225,40 @@ module ElevenlabsClient
        @client.get("/v1/dubbing/resource/#{dubbing_id}/speaker/#{speaker_id}/similar-voices")
      end
 
+     # GET /v1/dubbing/{dubbing_id}/audio/{language_code}
+     # Returns dub as a streamed MP3 or MP4 file
+     # Documentation: https://elevenlabs.io/docs/api-reference/dubbing/audio/get
+     #
+     # @param dubbing_id [String] ID of the dubbing project
+     # @param language_code [String] ID of the language
+     # @return [String] Binary audio/video data
+     def get_dubbed_audio(dubbing_id, language_code)
+       endpoint = "/v1/dubbing/#{dubbing_id}/audio/#{language_code}"
+       @client.get(endpoint)
+     end
+
+     # GET /v1/dubbing/{dubbing_id}/transcript/{language_code}
+     # Returns transcript for the dub as an SRT or WEBVTT file
+     # Documentation: https://elevenlabs.io/docs/api-reference/dubbing/transcript/get-transcript-for-dub
+     #
+     # @param dubbing_id [String] ID of the dubbing project
+     # @param language_code [String] ID of the language
+     # @param options [Hash] Optional parameters
+     # @option options [String] :format_type Format to use ("srt" or "webvtt", default: "srt")
+     # @return [String] Transcript in specified format
+     def get_dubbed_transcript(dubbing_id, language_code, **options)
+       endpoint = "/v1/dubbing/#{dubbing_id}/transcript/#{language_code}"
+
+       params = {}
+       params[:format_type] = options[:format_type] if options[:format_type]
+
+       @client.get(endpoint, params)
+     end
+
+     # Alias methods for convenience
+     alias_method :dubbed_audio, :get_dubbed_audio
+     alias_method :dubbed_transcript, :get_dubbed_transcript
+
      private
 
      attr_reader :client
@@ -0,0 +1,41 @@
+ # frozen_string_literal: true
+
+ module ElevenlabsClient
+   class ForcedAlignment
+     def initialize(client)
+       @client = client
+     end
+
+     # POST /v1/forced-alignment
+     # Force-aligns an audio file to text, returning timing information for each character and word
+     # Documentation: https://elevenlabs.io/docs/api-reference/forced-alignment
+     #
+     # @param audio_file [IO, File] The audio file to align (must be less than 1GB)
+     # @param filename [String] Original filename for the audio file
+     # @param text [String] The text to align with the audio
+     # @param options [Hash] Optional parameters
+     # @option options [Boolean] :enabled_spooled_file Stream file in chunks for large files (defaults to false)
+     # @return [Hash] JSON response containing characters, words arrays with timing info, and loss score
+     def create(audio_file, filename, text, **options)
+       endpoint = "/v1/forced-alignment"
+
+       payload = {
+         file: @client.file_part(audio_file, filename),
+         text: text
+       }
+
+       # Add optional parameters if provided
+       payload[:enabled_spooled_file] = options[:enabled_spooled_file] unless options[:enabled_spooled_file].nil?
+
+       @client.post_multipart(endpoint, payload)
+     end
+
+     # Alias methods for convenience
+     alias_method :align, :create
+     alias_method :force_align, :create
+
+     private
+
+     attr_reader :client
+   end
+ end
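The word-level `start`/`end` values in the forced-alignment response are plain seconds, and the README lists the ForcedAlignmentController example as covering subtitle generation. Turning a seconds value into a subtitle timestamp is a small formatting exercise (hypothetical helper, not part of the gem):

```ruby
# Hypothetical helper, not part of the gem: format a seconds value from
# the forced-alignment response as an SRT timestamp (HH:MM:SS,mmm).
def srt_timestamp(seconds)
  millis = (seconds * 1000).round
  hours, rem = millis.divmod(3_600_000)
  minutes, rem = rem.divmod(60_000)
  secs, ms = rem.divmod(1_000)
  format("%02d:%02d:%02d,%03d", hours, minutes, secs, ms)
end
```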