RubyGems - eleven_rb - Versions diffs - 0.4.0 → 1.0.0 - Mend

eleven_rb 0.4.0 → 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10)

checksums.yaml +4 -4
data/CHANGELOG.md +17 -0
data/README.md +42 -2
data/lib/eleven_rb/client.rb +8 -0
data/lib/eleven_rb/objects/cost_info.rb +5 -3
data/lib/eleven_rb/resources/models.rb +7 -0
data/lib/eleven_rb/resources/text_to_dialogue.rb +113 -0
data/lib/eleven_rb/version.rb +1 -1
data/lib/eleven_rb.rb +1 -0
metadata +6 -5

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: fa71eff851a0c6b80f139e801962bceaf1bb371f6e9a0cd325c47b2ee6f4994c
-  data.tar.gz: aa640970faba75afe3cbacdc16958ee63c1f2ecfb009512cd9e8182117e16d29
+  metadata.gz: ed711abcce18771ad13f10bcb29754605be61f7d02f7114f0e0b28b0dad4d556
+  data.tar.gz: 146285726bc80b0c3eab0b307a7ec4b788a8f3465903992bb12fc2b34bc1694b
 SHA512:
-  metadata.gz: a537ba9de014afc366c348a71613f257b6380ae784979fb42cd55522610c85661e0e34907c770642fa268051cbb91b24110e00d5e8b7305d5c913a81e04705a6
-  data.tar.gz: c1f4e236fb327b737b4e6346bf3a71bfec357d4f7f09a0e1eaf9c8889ef2251ff40b8eeb3be959e98b58e2a76ff6305e7391d64f0c33a9ce1fb38897b958f1fa
+  metadata.gz: 6bf8e216c83287bb099e4a6bbed4ef718329f361fb7dfb4c70bf122f2512c74916eb1540fe6a1dfd4ae01e0edc53edc05408a017946f504c09611a54a6c2370b
+  data.tar.gz: 1839c52e3adf4efed58c410f08fa5c5e4818fde0964922e0752019b6606d726ed66a4f48a767b4e2110aa3b0cf98e7f4666eaef64552ec9f0f976378b1ef5094
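
The entries above are digests of the `metadata.gz` and `data.tar.gz` archives packed inside the `.gem` file. A minimal sketch of checking one of them locally, assuming the archive has already been extracted to disk; `checksum_matches?` and the example path are hypothetical names, not part of eleven_rb or RubyGems:

```ruby
require 'digest'

# Hypothetical helper: compare a file's SHA256 hex digest to an expected value.
# The comparison is case-insensitive on the expected side.
def checksum_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end

# Example usage (the path is an assumption about where data.tar.gz was extracted):
# checksum_matches?(
#   'eleven_rb-1.0.0/data.tar.gz',
#   '146285726bc80b0c3eab0b307a7ec4b788a8f3465903992bb12fc2b34bc1694b'
# )
```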

data/CHANGELOG.md CHANGED

@@ -7,6 +7,23 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
+## [1.0.0] - 2026-03-10
+### Added
+- Text-to-Dialogue multi-speaker audio generation via `client.text_to_dialogue.generate` (`POST /v1/text-to-dialogue`)
+- `Client#text_to_dialogue` resource with `dialogue` alias
+- Multi-speaker input validation (max 10 unique voices, 5000 character limit)
+- `eleven_v3` model added to `CostInfo::COST_PER_1K_CHARS` ($0.30/1K chars)
+- `Models#latest` method returning the most capable model (`eleven_v3`)
+- Audio tags support via v3 model (`[laughs]`, `[whispers]`, `[excited]`, etc.)
+- `CostInfo` now accepts `character_count:` keyword as alternative to `text:`
+- TTS generation with word-level timestamps via `client.tts.generate_with_timestamps`
+### Changed
+- `CostInfo#initialize` signature: `text:` is now optional when `character_count:` is provided (backwards-compatible)
 ## [0.4.0] - 2026-03-10
 ### Added

data/README.md CHANGED

@@ -4,12 +4,13 @@
 [![CI](https://github.com/webventures/eleven_rb/actions/workflows/ci.yml/badge.svg)](https://github.com/webventures/eleven_rb/actions/workflows/ci.yml)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
-A Ruby client for the [ElevenLabs](https://try.elevenlabs.io/qyk2j8gumrjz) Text-to-Speech, Speech-to-Speech, Sound Effects, and Music API.
+A Ruby client for the [ElevenLabs](https://try.elevenlabs.io/qyk2j8gumrjz) Text-to-Speech, Speech-to-Speech, Text-to-Dialogue, Sound Effects, and Music API.
 ## Features
 - Text-to-Speech generation and streaming
 - Speech-to-Speech voice conversion
+- Text-to-Dialogue multi-speaker generation with audio tags
 - Sound effects generation from text descriptions
 - Music generation from prompts or composition plans
 - Voice management (list, get, create, update, delete)
@@ -73,7 +74,7 @@ audio.save_to_file("output.mp3")
 audio = client.tts.generate(
   "Hello world",
   voice_id: "voice_id",
-  model_id: "eleven_multilingual_v2",
+  model_id: "eleven_v3",             # Most expressive, 70+ languages, audio tags
   voice_settings: {
     stability: 0.5,
     similarity_boost: 0.75
@@ -111,6 +112,42 @@ io = File.open("input.mp3", "rb")
 audio = client.sts.convert(io, voice_id: "voice_id")
 ```
+### Text-to-Dialogue
+```ruby
+# Generate multi-speaker dialogue
+audio = client.text_to_dialogue.generate([
+  { text: "[excited] Welcome to the show!", voice_id: "voice_abc" },
+  { text: "[laughs] Thanks for having me.", voice_id: "voice_xyz" },
+  { text: "So tell us about your project...", voice_id: "voice_abc" }
+])
+audio.save_to_file("dialogue.mp3")
+# With options
+audio = client.dialogue.generate(
+  inputs,
+  model_id: "eleven_v3",
+  language_code: "en",
+  settings: { stability: 0.5 },
+  seed: 42,
+  output_format: "mp3_44100_192"
+)
+```
+### Audio Tags
+The `eleven_v3` model supports inline audio tags for expressive speech:
+```ruby
+audio = client.tts.generate(
+  "[excited] Oh wow, this is AMAZING! [laughs] I can't believe it...",
+  voice_id: "voice_id",
+  model_id: "eleven_v3"
+)
+```
+Supported tags include `[laughs]`, `[whispers]`, `[sighs]`, `[excited]`, `[sarcastic]`, `[curious]`, `[pause]`, and more. Use CAPS for emphasis, `...` for pauses, and `—` for interruptions. See the [ElevenLabs v3 documentation](https://elevenlabs.io/docs/guides/audio-tags) for the full list.
 ### Sound Effects
 ```ruby
@@ -274,6 +311,9 @@ client = ElevenRb::Client.new(
 models = client.models.list
 models.each { |m| puts "#{m.name} (#{m.model_id})" }
+# Get the latest/most capable model
+client.models.latest  # => "eleven_v3"
 # Get multilingual models
 client.models.multilingual

data/lib/eleven_rb/client.rb CHANGED

@@ -101,6 +101,14 @@ module ElevenRb
       @music ||= Resources::Music.new(http_client)
     end
+    # Text-to-dialogue resource
+    #
+    # @return [Resources::TextToDialogue]
+    def text_to_dialogue
+      @text_to_dialogue ||= Resources::TextToDialogue.new(http_client)
+    end
+    alias dialogue text_to_dialogue
     # Voice slot manager
     #
     # @return [VoiceSlotManager]
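
The hunk above adds a memoized accessor plus an alias. A small sketch of why both names hand back the same object, using stand-in classes (`FakeClient` and `FakeDialogueResource` are hypothetical and only imitate the shape of the gem's code):

```ruby
# Stand-in for Resources::TextToDialogue; holds the HTTP client it was given.
class FakeDialogueResource
  def initialize(http_client)
    @http_client = http_client
  end
end

# Stand-in for ElevenRb::Client, reduced to the accessor pattern in the diff.
class FakeClient
  def initialize(http_client)
    @http_client = http_client
  end

  # Memoized: the resource is built on first access and reused afterwards.
  def text_to_dialogue
    @text_to_dialogue ||= FakeDialogueResource.new(@http_client)
  end
  # `alias` copies the method body, so both names share the same instance variable.
  alias dialogue text_to_dialogue
end

client = FakeClient.new(:http)
puts client.dialogue.equal?(client.text_to_dialogue)
```

Because the alias shares the memoized instance variable, calling `dialogue` first and `text_to_dialogue` later (or the reverse) should not build a second resource.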

data/lib/eleven_rb/objects/cost_info.rb CHANGED

@@ -12,6 +12,7 @@ module ElevenRb
         'eleven_monolingual_v1' => 0.30,
         'eleven_multilingual_v1' => 0.30,
         'eleven_multilingual_v2' => 0.30,
+        'eleven_v3' => 0.30,
         'eleven_turbo_v2' => 0.18,
         'eleven_turbo_v2_5' => 0.18,
         'eleven_english_sts_v2' => 0.30,
@@ -23,11 +24,12 @@ module ElevenRb
       # Initialize cost info
       #
-      # @param text [String] the text being converted
+      # @param text [String, nil] the text being converted
+      # @param character_count [Integer, nil] direct character count (alternative to text)
       # @param voice_id [String] the voice ID
       # @param model_id [String] the model ID
-      def initialize(text:, voice_id:, model_id:)
-        @character_count = text.length
+      def initialize(voice_id:, model_id:, text: nil, character_count: nil)
+        @character_count = character_count || text&.length || 0
         @voice_id = voice_id
         @model_id = model_id
       end
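
The new initializer resolves the character count with `character_count || text&.length || 0`. The sketch below isolates that precedence and pairs it with a rough cost estimate. `resolve_character_count` and `estimated_cost` are hypothetical helpers, the rate table is only a subset of the one in the diff, and the linear pricing formula is an assumption, since the gem's own cost calculation is not shown in this diff:

```ruby
# Illustrative subset of the rate table from the diff (USD per 1,000 characters).
COST_PER_1K_CHARS = {
  'eleven_v3' => 0.30,
  'eleven_turbo_v2_5' => 0.18
}.freeze

# Same precedence as the new initializer: an explicit count wins, then the
# text length, then zero when neither is given.
def resolve_character_count(text: nil, character_count: nil)
  character_count || text&.length || 0
end

# Assumption: cost scales linearly with the character count.
def estimated_cost(model_id, chars)
  (chars / 1000.0 * COST_PER_1K_CHARS.fetch(model_id)).round(4)
end

puts resolve_character_count(text: 'Hello')                      # 5
puts resolve_character_count(text: 'Hello', character_count: 12) # 12
puts resolve_character_count                                     # 0
puts estimated_cost('eleven_v3', 5000)                           # 1.5
```

Under these assumptions, a dialogue request at the 5000-character limit on `eleven_v3` would come to about $1.50.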

data/lib/eleven_rb/resources/models.rb CHANGED

@@ -54,6 +54,13 @@ module ElevenRb
         get('eleven_multilingual_v2') || tts_capable.first
       end
+      # Get the latest/most capable model
+      #
+      # @return [Objects::Model, nil]
+      def latest
+        get('eleven_v3') || default
+      end
       # Get model IDs as array
       #
       # @return [Array<String>]
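
`latest` falls back through a chain: `eleven_v3` first, then whatever `default` returns. A sketch of that chain with a stand-in class, assuming the method whose body appears just above `latest` in the hunk is `default` and that `get` returns `nil` for an unknown ID (neither is confirmed by this diff). `FakeModels` works on plain ID strings rather than `Objects::Model` instances:

```ruby
# Hypothetical stand-in for Resources::Models, backed by an array of model IDs.
class FakeModels
  def initialize(ids)
    @ids = ids
  end

  # Assumption: returns nil when the ID is not available.
  def get(id)
    @ids.find { |candidate| candidate == id }
  end

  # Imitates the diff's default: prefer multilingual v2, else the first model.
  def default
    get('eleven_multilingual_v2') || @ids.first
  end

  # Same shape as the added method: v3 if present, otherwise the default.
  def latest
    get('eleven_v3') || default
  end
end

puts FakeModels.new(%w[eleven_turbo_v2 eleven_v3]).latest
puts FakeModels.new(%w[eleven_turbo_v2 eleven_multilingual_v2]).latest
```

Per the `@return [Objects::Model, nil]` tag, the real method appears to return a model object rather than an ID string, so callers that need the ID would presumably read `model_id` from the result and handle `nil`.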

data/lib/eleven_rb/resources/text_to_dialogue.rb ADDED

@@ -0,0 +1,113 @@
+# frozen_string_literal: true
+module ElevenRb
+  module Resources
+    # Text-to-dialogue resource for multi-speaker audio generation
+    #
+    # @example Generate dialogue
+    #   audio = client.text_to_dialogue.generate([
+    #     { text: "[excited] Welcome!", voice_id: "voice_abc" },
+    #     { text: "[laughs] Thanks!", voice_id: "voice_xyz" }
+    #   ])
+    #   audio.save_to_file("dialogue.mp3")
+    class TextToDialogue < Base
+      DEFAULT_MODEL = 'eleven_v3'
+      MAX_VOICES_PER_REQUEST = 10
+      MAX_TEXT_LENGTH = 5000
+      # Generate dialogue audio from multiple speaker inputs
+      #
+      # @param inputs [Array<Hash>] Array of { text:, voice_id: } hashes
+      # @param model_id [String] Model to use (only eleven_v3 supported)
+      # @param language_code [String, nil] ISO 639-1 language code
+      # @param settings [Hash, nil] Generation settings (stability: 0.0-1.0)
+      # @param seed [Integer, nil] Seed for reproducibility
+      # @param output_format [String] Audio output format
+      # @param apply_text_normalization [String] "auto", "on", or "off"
+      # @return [Objects::Audio]
+      def generate(
+        inputs,
+        model_id: DEFAULT_MODEL,
+        language_code: nil,
+        settings: nil,
+        seed: nil,
+        output_format: 'mp3_44100_128',
+        apply_text_normalization: 'auto'
+      )
+        validate_inputs!(inputs)
+        body = build_request_body(inputs, model_id, language_code, settings, seed,
+                                  apply_text_normalization)
+        response = post_binary(
+          "/text-to-dialogue?output_format=#{output_format}",
+          body
+        )
+        build_audio_response(response, inputs, output_format, model_id)
+      end
+      private
+      def build_request_body(inputs, model_id, language_code, settings, seed,
+                             apply_text_normalization)
+        body = {
+          inputs: inputs.map { |i| { text: i[:text], voice_id: i[:voice_id] } },
+          model_id: model_id,
+          apply_text_normalization: apply_text_normalization
+        }
+        body[:language_code] = language_code if language_code
+        body[:settings] = settings if settings
+        body[:seed] = seed if seed
+        body
+      end
+      def build_audio_response(response, inputs, output_format, model_id)
+        total_text = inputs.map { |i| i[:text] }.join("\n")
+        total_chars = inputs.sum { |i| i[:text].length }
+        primary_voice = inputs.first[:voice_id]
+        audio = Objects::Audio.new(
+          data: response, format: output_format,
+          voice_id: primary_voice, text: total_text, model_id: model_id
+        )
+        cost_info = Objects::CostInfo.new(
+          character_count: total_chars, voice_id: primary_voice, model_id: model_id
+        )
+        http_client.config.trigger(
+          :on_audio_generated,
+          audio: audio, voice_id: primary_voice,
+          text: total_text, cost_info: cost_info.to_h
+        )
+        audio
+      end
+      def validate_inputs!(inputs)
+        raise Errors::ValidationError, 'inputs must be a non-empty array' unless inputs.is_a?(Array) && !inputs.empty?
+        inputs.each_with_index do |input, i|
+          validate_presence!(input[:text], "inputs[#{i}].text")
+          validate_presence!(input[:voice_id], "inputs[#{i}].voice_id")
+        end
+        unique_voices = inputs.map { |i| i[:voice_id] }.uniq
+        if unique_voices.length > MAX_VOICES_PER_REQUEST
+          raise Errors::ValidationError,
+                "Maximum #{MAX_VOICES_PER_REQUEST} unique voices per request " \
+                "(got #{unique_voices.length})"
+        end
+        total_chars = inputs.sum { |i| i[:text].length }
+        return unless total_chars > MAX_TEXT_LENGTH
+        raise Errors::ValidationError,
+              "Total text length #{total_chars} exceeds maximum " \
+              "#{MAX_TEXT_LENGTH} characters"
+      end
+    end
+  end
+end
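
The validation rules in the new file can be exercised in isolation. The sketch below restates them as a standalone function: `ValidationError` is a local stand-in for the gem's `Errors::ValidationError`, and the presence check is an assumption about what `validate_presence!` does, since that helper is not part of this diff:

```ruby
# Local stand-in for Errors::ValidationError.
class ValidationError < StandardError; end

MAX_VOICES_PER_REQUEST = 10
MAX_TEXT_LENGTH = 5000

def validate_dialogue_inputs!(inputs)
  raise ValidationError, 'inputs must be a non-empty array' unless inputs.is_a?(Array) && !inputs.empty?

  # Assumed presence rule: nil or blank values are rejected.
  inputs.each_with_index do |input, i|
    %i[text voice_id].each do |key|
      value = input[key]
      raise ValidationError, "inputs[#{i}].#{key} is required" if value.nil? || value.to_s.strip.empty?
    end
  end

  # The voice limit counts unique voices, not dialogue lines.
  unique_voices = inputs.map { |i| i[:voice_id] }.uniq
  if unique_voices.length > MAX_VOICES_PER_REQUEST
    raise ValidationError, "Maximum #{MAX_VOICES_PER_REQUEST} unique voices per request (got #{unique_voices.length})"
  end

  # The length limit applies to the combined text of all inputs.
  total_chars = inputs.sum { |i| i[:text].length }
  return unless total_chars > MAX_TEXT_LENGTH

  raise ValidationError, "Total text length #{total_chars} exceeds maximum #{MAX_TEXT_LENGTH} characters"
end

# Convenience wrapper for trying inputs without rescuing by hand.
def dialogue_valid?(inputs)
  validate_dialogue_inputs!(inputs)
  true
rescue ValidationError
  false
end

puts dialogue_valid?([{ text: '[excited] Welcome!', voice_id: 'voice_abc' }])
```

Two consequences follow from the code as written: the hashes are read with symbol keys (`i[:text]`, `i[:voice_id]`), so string-keyed hashes would likely fail the presence check, and audio tags such as `[excited]` count toward the character total because the raw text length is summed.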

data/lib/eleven_rb/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module ElevenRb
-  VERSION = '0.4.0'
+  VERSION = '1.0.0'
 end
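
The bump from 0.4.0 to 1.0.0 crosses a major version, which matters for projects that pin the gem pessimistically. A short sketch using RubyGems' own version classes; the `~> 0.4` constraint is an assumed example of what a consuming Gemfile might contain, not something taken from this diff:

```ruby
require 'rubygems'

old_version = Gem::Version.new('0.4.0')
new_version = Gem::Version.new('1.0.0')

# Assumed consumer constraint: allows 0.4 and later, but stays below 1.0.
pinned = Gem::Requirement.new('~> 0.4')

puts new_version > old_version          # true
puts pinned.satisfied_by?(old_version)  # true
puts pinned.satisfied_by?(new_version)  # false
```

A project pinned this way would need its constraint widened before a bundle update picks up 1.0.0.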

data/lib/eleven_rb.rb CHANGED

@@ -109,6 +109,7 @@ require_relative 'eleven_rb/resources/user'
 require_relative 'eleven_rb/resources/sound_effects'
 require_relative 'eleven_rb/resources/music'
 require_relative 'eleven_rb/resources/speech_to_speech'
+require_relative 'eleven_rb/resources/text_to_dialogue'
 # High-level components
 require_relative 'eleven_rb/voice_slot_manager'

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: eleven_rb
 version: !ruby/object:Gem::Version
-  version: 0.4.0
+  version: 1.0.0
 platform: ruby
 authors:
 - Web Ventures Ltd
@@ -122,9 +122,9 @@ dependencies:
       - !ruby/object:Gem::Version
         version: '0.9'
 description: |
-  A well-structured Ruby gem for ElevenLabs TTS with voice library management,
-  streaming support, voice slot optimization, and comprehensive callbacks for
-  logging, error tracking, and cost monitoring.
+  A comprehensive Ruby client for ElevenLabs covering Text-to-Speech,
+  Speech-to-Speech, Text-to-Dialogue, Sound Effects, and Music generation
+  with voice management, streaming, and built-in cost tracking.
 email:
 - gems@dev.webven.nz
 executables: []
@@ -158,6 +158,7 @@ files:
 - lib/eleven_rb/resources/music.rb
 - lib/eleven_rb/resources/sound_effects.rb
 - lib/eleven_rb/resources/speech_to_speech.rb
+- lib/eleven_rb/resources/text_to_dialogue.rb
 - lib/eleven_rb/resources/text_to_speech.rb
 - lib/eleven_rb/resources/user.rb
 - lib/eleven_rb/resources/voice_library.rb
@@ -189,5 +190,5 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 requirements: []
 rubygems_version: 3.6.9
 specification_version: 4
-summary: Ruby client for the ElevenLabs Text-to-Speech API
+summary: Ruby client for the ElevenLabs Audio AI API
 test_files: []