elevenlabs_client 0.3.0 → 0.5.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +52 -1
- data/README.md +78 -1
- data/lib/elevenlabs_client/client.rb +63 -1
- data/lib/elevenlabs_client/endpoints/audio_isolation.rb +71 -0
- data/lib/elevenlabs_client/endpoints/audio_native.rb +103 -0
- data/lib/elevenlabs_client/endpoints/dubs.rb +208 -2
- data/lib/elevenlabs_client/endpoints/forced_alignment.rb +41 -0
- data/lib/elevenlabs_client/endpoints/speech_to_speech.rb +125 -0
- data/lib/elevenlabs_client/endpoints/speech_to_text.rb +108 -0
- data/lib/elevenlabs_client/endpoints/text_to_dialogue_stream.rb +50 -0
- data/lib/elevenlabs_client/endpoints/text_to_speech_stream.rb +1 -0
- data/lib/elevenlabs_client/endpoints/text_to_speech_stream_with_timestamps.rb +75 -0
- data/lib/elevenlabs_client/endpoints/text_to_speech_with_timestamps.rb +73 -0
- data/lib/elevenlabs_client/endpoints/voices.rb +362 -0
- data/lib/elevenlabs_client/endpoints/websocket_text_to_speech.rb +250 -0
- data/lib/elevenlabs_client/version.rb +1 -1
- data/lib/elevenlabs_client.rb +9 -2
- metadata +25 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f1c75ecb60655822ec4d8b88e22ebae5e0a1714e5573000cd5a36c3e80bcb886
+  data.tar.gz: 5d05b4e838bc30cbc1c290b615b1c0d686ea6d6aafe9521097ddc00d0ba28189
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e26733f1b2ddaaec79432e7458f2af56b50d0f29bb52bdddc4fcbdbb564c85eea40949c7304fef7a4af3da5ff2c364bb42341b3755bf385cc6e81bb429f81aa5
+  data.tar.gz: 59f527fa65e17375fa3a33eb8f7d140a3f59ccd87f51168e6f43bac5c94c3d93fa49026dc4ee6fbe5eb4fb7b0a772f6e6925b6051092f112b12936e3b154009e
data/CHANGELOG.md
CHANGED
@@ -5,6 +5,57 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.5.0] - 2025-09-14
+
+### Added
+
+- Text-to-Speech With Timestamps
+  - `client.text_to_speech_with_timestamps.generate(voice_id, text, **options)`
+  - Character-level `alignment` and `normalized_alignment`
+- Streaming Text-to-Speech With Timestamps
+  - `client.text_to_speech_stream_with_timestamps.stream(voice_id, text, **options, &block)`
+  - JSON streaming with audio chunks and timing per chunk
+- WebSocket Streaming Enhancements
+  - Single-context and multi-context improvements; correct query param ordering and filtering
+  - Docs: `docs/WEBSOCKET_STREAMING.md`
+- Text-to-Dialogue Streaming
+  - `client.text_to_dialogue_stream.stream(inputs, **options, &block)`
+  - Docs: `docs/TEXT_TO_DIALOGUE_STREAMING.md`
+
+### Improved
+
+- Client streaming JSON handling for timestamp streams (`post_streaming_with_timestamps`)
+- Robust parsing and block yielding across streaming tests
+- URL query parameter ordering to match expectations in tests
+
+### Tests
+
+- Added comprehensive unit and integration tests for all new endpoints
+- Full suite now: 687 examples, 0 failures
+
+### Notes
+
+- These features require valid ElevenLabs API keys and correct model/voice permissions
+
+## [0.4.0] - 2025-09-12
+
+### Added
+
+- **🎵 Dubbing Generation API**
+  - `delete(dubbing_id)` - Delete dubbing projects
+  - `get_resource(dubbing_id)` - Get detailed resource information
+  - `create_segment(options)` - Create new segments
+  - `delete_segment(options)` - Delete segments
+  - `update_segment(options)` - Update segment text/timing
+  - `transcribe_segment(options)` - Regenerate transcriptions
+  - `translate_segment(options)` - Regenerate translations
+  - `dub_segment(options)` - Regenerate dubs
+  - `render_project(options)` - Render output media
+  - `update_speaker(options)` - Update speaker voices
+  - `get_similar_voices(options)` - Get voice recommendations
+- **🔧 HTTP Client Improvements** - Added HTTP method
+  - Added `patch` method for PATCH requests
+
 ## [0.3.0] - 2025-09-12
 
 ### Added
@@ -252,4 +303,4 @@ client.dubs.create(file_io: file, filename: "video.mp4", target_languages: ["es"
 - **File Support**: Multiple video and audio formats (MP4, MOV, MP3, WAV, etc.)
 - **Language Support**: Multiple target languages for dubbing
 - **Configuration**: Flexible API key and endpoint configuration
-- **Testing**: Comprehensive test suite with integration tests
+- **Testing**: Comprehensive test suite with integration tests
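The character-level `alignment` data introduced in 0.5.0 pairs each character with start/end times. As an illustrative sketch of consuming it, the snippet below derives word timings from character timings; the field names follow the API's documented alignment shape, but the sample values and the `words_with_timings` helper are fabricated here for demonstration, not part of the gem:

```ruby
# Hypothetical sample of the character-level alignment structure returned by
# the with-timestamps endpoints: parallel arrays of characters and their
# start/end times in seconds. Values below are made up for illustration.
alignment = {
  "characters" => ["H", "i", " ", "y", "o", "u"],
  "character_start_times_seconds" => [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
  "character_end_times_seconds"   => [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
}

# Group consecutive non-space characters into words, merging their timings.
def words_with_timings(alignment)
  words = []
  current = nil
  alignment["characters"].each_with_index do |ch, i|
    start_t = alignment["character_start_times_seconds"][i]
    end_t   = alignment["character_end_times_seconds"][i]
    if ch == " "
      words << current if current
      current = nil
    else
      current ||= { "text" => "", "start" => start_t }
      current["text"] += ch
      current["end"] = end_t
    end
  end
  words << current if current
  words
end

words_with_timings(alignment).each do |word|
  puts "#{word['text']}: #{word['start']}s - #{word['end']}s"
end
```

The same grouping applies to `normalized_alignment`, which uses the normalized form of the input text.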
data/README.md
CHANGED
@@ -2,7 +2,7 @@
 
 [![Gem Version](https://badge.fury.io/rb/elevenlabs_client.svg)](https://badge.fury.io/rb/elevenlabs_client)
 
-A comprehensive Ruby client library for the ElevenLabs API, supporting voice synthesis, dubbing, dialogue generation, sound effects,
+A comprehensive Ruby client library for the ElevenLabs API, supporting voice synthesis, dubbing, dialogue generation, sound effects, AI music composition, voice transformation, speech transcription, audio isolation, and advanced audio processing features.
 
 ## Features
 
@@ -13,6 +13,11 @@ A comprehensive Ruby client library for the ElevenLabs API, supporting voice syn
 🎵 **Music Generation** - AI-powered music composition and streaming
 🎨 **Voice Design** - Create custom voices from text descriptions
 🎭 **Voice Management** - Create, edit, and manage individual voices
+🔄 **Speech-to-Speech** - Transform audio from one voice to another (Voice Changer)
+📝 **Speech-to-Text** - Transcribe audio and video files with advanced features
+🔇 **Audio Isolation** - Remove background noise from audio files
+📱 **Audio Native** - Create embeddable audio players for websites
+⏱️ **Forced Alignment** - Get precise timing information for audio transcripts
 🤖 **Models** - List available models and their capabilities
 📡 **Streaming** - Real-time audio streaming
 ⚙️ **Configurable** - Flexible configuration options
@@ -141,6 +146,60 @@ music_data = client.music.compose(
 )
 File.open("generated_music.mp3", "wb") { |f| f.write(music_data) }
 
+# Speech-to-Speech (Voice Changer)
+File.open("input_audio.mp3", "rb") do |audio_file|
+  converted_audio = client.speech_to_speech.convert(
+    "target_voice_id",
+    audio_file,
+    "input_audio.mp3",
+    remove_background_noise: true
+  )
+  File.open("converted_audio.mp3", "wb") { |f| f.write(converted_audio) }
+end
+
+# Speech-to-Text Transcription
+File.open("audio.mp3", "rb") do |audio_file|
+  transcription = client.speech_to_text.create(
+    "scribe_v1",
+    file: audio_file,
+    filename: "audio.mp3",
+    diarize: true,
+    timestamps_granularity: "word"
+  )
+  puts "Transcribed: #{transcription['text']}"
+end
+
+# Audio Isolation (Background Noise Removal)
+File.open("noisy_audio.mp3", "rb") do |audio_file|
+  clean_audio = client.audio_isolation.isolate(audio_file, "noisy_audio.mp3")
+  File.open("clean_audio.mp3", "wb") { |f| f.write(clean_audio) }
+end
+
+# Audio Native (Embeddable Player)
+File.open("article.html", "rb") do |html_file|
+  project = client.audio_native.create(
+    "My Article",
+    file: html_file,
+    filename: "article.html",
+    voice_id: "voice_id",
+    auto_convert: true
+  )
+  puts "Player HTML: #{project['html_snippet']}"
+end
+
+# Forced Alignment
+File.open("speech.wav", "rb") do |audio_file|
+  alignment = client.forced_alignment.create(
+    audio_file,
+    "speech.wav",
+    "Hello world, this is a test transcript"
+  )
+
+  alignment['words'].each do |word|
+    puts "#{word['text']}: #{word['start']}s - #{word['end']}s"
+  end
+end
+
 # Streaming Text-to-Speech
 client.text_to_speech_stream.stream("voice_id", "Streaming text") do |chunk|
   # Process audio chunk in real-time
@@ -160,6 +219,11 @@ end
 - **[Music Generation API](docs/MUSIC.md)** - AI-powered music composition and streaming
 - **[Text-to-Voice API](docs/TEXT_TO_VOICE.md)** - Design and create custom voices
 - **[Voice Management API](docs/VOICES.md)** - Manage individual voices (CRUD operations)
+- **[Speech-to-Speech API](docs/SPEECH_TO_SPEECH.md)** - Transform audio from one voice to another
+- **[Speech-to-Text API](docs/SPEECH_TO_TEXT.md)** - Transcribe audio and video files
+- **[Audio Isolation API](docs/AUDIO_ISOLATION.md)** - Remove background noise from audio
+- **[Audio Native API](docs/AUDIO_NATIVE.md)** - Create embeddable audio players
+- **[Forced Alignment API](docs/FORCED_ALIGNMENT.md)** - Get precise timing information
 - **[Models API](docs/MODELS.md)** - List available models and capabilities
 
 ### Available Endpoints
@@ -174,6 +238,11 @@ end
 | `client.music.*` | AI music composition and streaming | [MUSIC.md](docs/MUSIC.md) |
 | `client.text_to_voice.*` | Voice design and creation | [TEXT_TO_VOICE.md](docs/TEXT_TO_VOICE.md) |
 | `client.voices.*` | Voice management (CRUD) | [VOICES.md](docs/VOICES.md) |
+| `client.speech_to_speech.*` | Voice changer and audio transformation | [SPEECH_TO_SPEECH.md](docs/SPEECH_TO_SPEECH.md) |
+| `client.speech_to_text.*` | Audio/video transcription | [SPEECH_TO_TEXT.md](docs/SPEECH_TO_TEXT.md) |
+| `client.audio_isolation.*` | Background noise removal | [AUDIO_ISOLATION.md](docs/AUDIO_ISOLATION.md) |
+| `client.audio_native.*` | Embeddable audio players | [AUDIO_NATIVE.md](docs/AUDIO_NATIVE.md) |
+| `client.forced_alignment.*` | Audio-text timing alignment | [FORCED_ALIGNMENT.md](docs/FORCED_ALIGNMENT.md) |
 | `client.models.*` | Model information and capabilities | [MODELS.md](docs/MODELS.md) |
 
 ## Configuration Options
@@ -221,6 +290,9 @@ end
 - `AuthenticationError` - Invalid API key or authentication failure
 - `RateLimitError` - Rate limit exceeded
 - `ValidationError` - Invalid request parameters
+- `NotFoundError` - Resource not found (e.g., voice ID, transcript ID)
+- `BadRequestError` - Bad request with invalid parameters
+- `UnprocessableEntityError` - Request cannot be processed (e.g., invalid file format)
 - `APIError` - General API errors
 
 ## Rails Integration
@@ -235,6 +307,11 @@ The gem is designed to work seamlessly with Rails applications. See the [example
 - [MusicController](examples/music_controller.rb) - AI music composition and streaming
 - [TextToVoiceController](examples/text_to_voice_controller.rb) - Voice design and creation
 - [VoicesController](examples/voices_controller.rb) - Voice management (CRUD operations)
+- [SpeechToSpeechController](examples/speech_to_speech_controller.rb) - Voice changer and audio transformation
+- [SpeechToTextController](examples/speech_to_text_controller.rb) - Audio/video transcription with advanced features
+- [AudioIsolationController](examples/audio_isolation_controller.rb) - Background noise removal and audio cleanup
+- [AudioNativeController](examples/audio_native_controller.rb) - Embeddable audio players for websites
+- [ForcedAlignmentController](examples/forced_alignment_controller.rb) - Audio-text timing alignment and subtitle generation
 
 ## Development
 
data/lib/elevenlabs_client/client.rb
CHANGED
@@ -2,12 +2,13 @@
 
 require "faraday"
 require "faraday/multipart"
+require "json"
 
 module ElevenlabsClient
   class Client
     DEFAULT_BASE_URL = "https://api.elevenlabs.io"
 
-    attr_reader :base_url, :api_key, :dubs, :text_to_speech, :text_to_speech_stream, :text_to_dialogue, :sound_generation, :text_to_voice, :models, :voices, :music
+    attr_reader :base_url, :api_key, :dubs, :text_to_speech, :text_to_speech_stream, :text_to_speech_with_timestamps, :text_to_speech_stream_with_timestamps, :text_to_dialogue, :text_to_dialogue_stream, :sound_generation, :text_to_voice, :models, :voices, :music, :audio_isolation, :audio_native, :forced_alignment, :speech_to_speech, :speech_to_text, :websocket_text_to_speech
 
     def initialize(api_key: nil, base_url: nil, api_key_env: "ELEVENLABS_API_KEY", base_url_env: "ELEVENLABS_BASE_URL")
       @api_key = api_key || fetch_api_key(api_key_env)
@@ -16,12 +17,21 @@ module ElevenlabsClient
       @dubs = Dubs.new(self)
       @text_to_speech = TextToSpeech.new(self)
       @text_to_speech_stream = TextToSpeechStream.new(self)
+      @text_to_speech_with_timestamps = TextToSpeechWithTimestamps.new(self)
+      @text_to_speech_stream_with_timestamps = TextToSpeechStreamWithTimestamps.new(self)
       @text_to_dialogue = TextToDialogue.new(self)
+      @text_to_dialogue_stream = TextToDialogueStream.new(self)
       @sound_generation = SoundGeneration.new(self)
       @text_to_voice = TextToVoice.new(self)
       @models = Models.new(self)
       @voices = Voices.new(self)
       @music = Endpoints::Music.new(self)
+      @audio_isolation = AudioIsolation.new(self)
+      @audio_native = AudioNative.new(self)
+      @forced_alignment = ForcedAlignment.new(self)
+      @speech_to_speech = SpeechToSpeech.new(self)
+      @speech_to_text = SpeechToText.new(self)
+      @websocket_text_to_speech = WebSocketTextToSpeech.new(self)
     end
 
     # Makes an authenticated GET request
@@ -61,6 +71,20 @@ module ElevenlabsClient
       handle_response(response)
     end
 
+    # Makes an authenticated PATCH request
+    # @param path [String] API endpoint path
+    # @param body [Hash, nil] Request body
+    # @return [Hash] Response body
+    def patch(path, body = nil)
+      response = @conn.patch(path) do |req|
+        req.headers["xi-api-key"] = api_key
+        req.headers["Content-Type"] = "application/json"
+        req.body = body.to_json if body
+      end
+
+      handle_response(response)
+    end
+
     # Makes an authenticated multipart POST request
     # @param path [String] API endpoint path
     # @param payload [Hash] Multipart payload
@@ -130,6 +154,44 @@ module ElevenlabsClient
       handle_response(response)
     end
 
+    # Makes an authenticated POST request with streaming response for timestamp data
+    # @param path [String] API endpoint path
+    # @param body [Hash, nil] Request body
+    # @param block [Proc] Block to handle each JSON chunk with timestamps
+    # @return [Faraday::Response] Response object
+    def post_streaming_with_timestamps(path, body = nil, &block)
+      buffer = ""
+
+      response = @conn.post(path) do |req|
+        req.headers["xi-api-key"] = api_key
+        req.headers["Content-Type"] = "application/json"
+        req.body = body.to_json if body
+
+        # Set up streaming callback for JSON chunks
+        req.options.on_data = proc do |chunk, _|
+          if block_given?
+            buffer += chunk
+
+            # Process complete JSON objects
+            while buffer.include?("\n")
+              line, buffer = buffer.split("\n", 2)
+              next if line.strip.empty?
+
+              begin
+                json_data = JSON.parse(line)
+                block.call(json_data)
+              rescue JSON::ParserError
+                # Skip malformed JSON lines
+                next
+              end
+            end
+          end
+        end
+      end
+
+      handle_response(response)
+    end
+
     # Helper method to create Faraday::Multipart::FilePart
     # @param file_io [IO] File IO object
     # @param filename [String] Original filename
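The `post_streaming_with_timestamps` helper accumulates network chunks in a buffer and yields each complete newline-delimited JSON object, discarding malformed lines. That buffering loop can be exercised on its own, independent of Faraday; in this standalone sketch, `NdjsonBuffer` is a name invented here for illustration, not part of the gem:

```ruby
require "json"

# Feed arbitrary chunks in; the block receives each complete JSON line.
# Mirrors the buffer-and-split loop in Client#post_streaming_with_timestamps.
class NdjsonBuffer
  def initialize(&block)
    @buffer = ""
    @block = block
  end

  def <<(chunk)
    @buffer += chunk
    # Process every complete line currently in the buffer.
    while @buffer.include?("\n")
      line, @buffer = @buffer.split("\n", 2)
      next if line.strip.empty?
      begin
        @block.call(JSON.parse(line))
      rescue JSON::ParserError
        next # skip malformed JSON lines, as the client does
      end
    end
  end
end

events = []
buf = NdjsonBuffer.new { |obj| events << obj }
# A JSON object may arrive split across chunks; a bad line is skipped.
buf << %({"audio_base64":"abc","ali)
buf << %(gnment":null}\n{"audio_base64":"def"}\nnot-json\n)
```

Keeping the partial tail in the buffer between chunks is what makes the loop safe against objects that straddle chunk boundaries.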
data/lib/elevenlabs_client/endpoints/audio_isolation.rb
ADDED
@@ -0,0 +1,71 @@
+# frozen_string_literal: true
+
+module ElevenlabsClient
+  class AudioIsolation
+    def initialize(client)
+      @client = client
+    end
+
+    # POST /v1/audio-isolation
+    # Removes background noise from audio
+    # Documentation: https://elevenlabs.io/docs/api-reference/audio-isolation
+    #
+    # @param audio_file [IO, File] The audio file from which vocals/speech will be isolated
+    # @param filename [String] Original filename for the audio file
+    # @param options [Hash] Optional parameters
+    # @option options [String] :file_format Format of input audio ('pcm_s16le_16' or 'other', defaults to 'other')
+    # @return [String] Binary audio data with background noise removed
+    def isolate(audio_file, filename, **options)
+      endpoint = "/v1/audio-isolation"
+
+      payload = {
+        audio: @client.file_part(audio_file, filename)
+      }
+
+      # Add optional parameters if provided
+      payload[:file_format] = options[:file_format] if options[:file_format]
+
+      @client.post_multipart(endpoint, payload)
+    end
+
+    # POST /v1/audio-isolation/stream
+    # Removes background noise from audio with streaming response
+    # Documentation: https://elevenlabs.io/docs/api-reference/audio-isolation/stream
+    #
+    # @param audio_file [IO, File] The audio file from which vocals/speech will be isolated
+    # @param filename [String] Original filename for the audio file
+    # @param options [Hash] Optional parameters
+    # @option options [String] :file_format Format of input audio ('pcm_s16le_16' or 'other', defaults to 'other')
+    # @param block [Proc] Block to handle each chunk of streaming audio data
+    # @return [Faraday::Response] Response object for streaming
+    def isolate_stream(audio_file, filename, **options, &block)
+      endpoint = "/v1/audio-isolation/stream"
+
+      payload = {
+        audio: @client.file_part(audio_file, filename)
+      }
+
+      # Add optional parameters if provided
+      payload[:file_format] = options[:file_format] if options[:file_format]
+
+      # Use streaming multipart request
+      response = @client.instance_variable_get(:@conn).post(endpoint) do |req|
+        req.headers["xi-api-key"] = @client.api_key
+        req.body = payload
+
+        # Set up streaming callback if block provided
+        if block_given?
+          req.options.on_data = proc do |chunk, _|
+            block.call(chunk)
+          end
+        end
+      end
+
+      @client.send(:handle_response, response)
+    end
+
+    private
+
+    attr_reader :client
+  end
+end
data/lib/elevenlabs_client/endpoints/audio_native.rb
ADDED
@@ -0,0 +1,103 @@
+# frozen_string_literal: true
+
+module ElevenlabsClient
+  class AudioNative
+    def initialize(client)
+      @client = client
+    end
+
+    # POST /v1/audio-native
+    # Creates Audio Native enabled project, optionally starts conversion and returns project ID and embeddable HTML snippet
+    # Documentation: https://elevenlabs.io/docs/api-reference/audio-native/create
+    #
+    # @param name [String] Project name
+    # @param options [Hash] Optional parameters
+    # @option options [String] :image Image URL used in the player (deprecated)
+    # @option options [String] :author Author used in the player
+    # @option options [String] :title Title used in the player
+    # @option options [Boolean] :small Whether to use small player (deprecated, defaults to false)
+    # @option options [String] :text_color Text color used in the player
+    # @option options [String] :background_color Background color used in the player
+    # @option options [Integer] :sessionization Minutes to persist session (deprecated, defaults to 0)
+    # @option options [String] :voice_id Voice ID used to voice the content
+    # @option options [String] :model_id TTS Model ID used in the player
+    # @option options [IO, File] :file Text or HTML input file containing article content
+    # @option options [String] :filename Original filename for the file
+    # @option options [Boolean] :auto_convert Whether to auto convert project to audio (defaults to false)
+    # @option options [String] :apply_text_normalization Text normalization mode ('auto', 'on', 'off', 'apply_english')
+    # @return [Hash] JSON response containing project_id, converting status, and html_snippet
+    def create(name, **options)
+      endpoint = "/v1/audio-native"
+
+      payload = { name: name }
+
+      # Add optional parameters if provided
+      payload[:image] = options[:image] if options[:image]
+      payload[:author] = options[:author] if options[:author]
+      payload[:title] = options[:title] if options[:title]
+      payload[:small] = options[:small] unless options[:small].nil?
+      payload[:text_color] = options[:text_color] if options[:text_color]
+      payload[:background_color] = options[:background_color] if options[:background_color]
+      payload[:sessionization] = options[:sessionization] if options[:sessionization]
+      payload[:voice_id] = options[:voice_id] if options[:voice_id]
+      payload[:model_id] = options[:model_id] if options[:model_id]
+      payload[:auto_convert] = options[:auto_convert] unless options[:auto_convert].nil?
+      payload[:apply_text_normalization] = options[:apply_text_normalization] if options[:apply_text_normalization]
+
+      # Add file if provided
+      if options[:file] && options[:filename]
+        payload[:file] = @client.file_part(options[:file], options[:filename])
+      end
+
+      @client.post_multipart(endpoint, payload)
+    end
+
+    # POST /v1/audio-native/:project_id/content
+    # Updates content for the specific AudioNative Project
+    # Documentation: https://elevenlabs.io/docs/api-reference/audio-native/update
+    #
+    # @param project_id [String] The ID of the project to be used
+    # @param options [Hash] Optional parameters
+    # @option options [IO, File] :file Text or HTML input file containing article content
+    # @option options [String] :filename Original filename for the file
+    # @option options [Boolean] :auto_convert Whether to auto convert project to audio (defaults to false)
+    # @option options [Boolean] :auto_publish Whether to auto publish after conversion (defaults to false)
+    # @return [Hash] JSON response containing project_id, converting, publishing status, and html_snippet
+    def update_content(project_id, **options)
+      endpoint = "/v1/audio-native/#{project_id}/content"
+
+      payload = {}
+
+      # Add optional parameters if provided
+      payload[:auto_convert] = options[:auto_convert] unless options[:auto_convert].nil?
+      payload[:auto_publish] = options[:auto_publish] unless options[:auto_publish].nil?
+
+      # Add file if provided
+      if options[:file] && options[:filename]
+        payload[:file] = @client.file_part(options[:file], options[:filename])
+      end
+
+      @client.post_multipart(endpoint, payload)
+    end
+
+    # GET /v1/audio-native/:project_id/settings
+    # Get player settings for the specific project
+    # Documentation: https://elevenlabs.io/docs/api-reference/audio-native/settings
+    #
+    # @param project_id [String] The ID of the Studio project
+    # @return [Hash] JSON response containing enabled status, snapshot_id, and settings
+    def get_settings(project_id)
+      endpoint = "/v1/audio-native/#{project_id}/settings"
+      @client.get(endpoint)
+    end
+
+    # Alias methods for convenience
+    alias_method :create_project, :create
+    alias_method :update_project_content, :update_content
+    alias_method :project_settings, :get_settings
+
+    private
+
+    attr_reader :client
+  end
+end
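One detail of `AudioNative#create` worth noting: boolean options are guarded with `unless options[...].nil?` so an explicit `false` still reaches the payload, whereas the truthy `if options[...]` guard used for string options would silently drop it. A standalone sketch of that filtering pattern (the helper name is hypothetical, not part of the gem):

```ruby
# Sketch of the option-filtering pattern in AudioNative#create:
# string-ish options are included when truthy; boolean options are included
# whenever explicitly set, so `auto_convert: false` is not lost.
def build_audio_native_payload(name, **options)
  payload = { name: name }
  %i[image author title text_color background_color sessionization
     voice_id model_id apply_text_normalization].each do |key|
    payload[key] = options[key] if options[key]
  end
  %i[small auto_convert].each do |key|
    payload[key] = options[key] unless options[key].nil?
  end
  payload
end

# Explicit false survives; absent and nil options are omitted.
p build_audio_native_payload("My Article", voice_id: "v1", auto_convert: false)
```

The same `.nil?` guard appears in `update_content` for `auto_convert` and `auto_publish`.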