
ruby-gemini-api 1.0.0 → 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14)

checksums.yaml +4 -4
data/CHANGELOG.md +27 -0
data/README.md +278 -0
data/lib/gemini/client.rb +28 -3
data/lib/gemini/embeddings.rb +108 -17
data/lib/gemini/live/configuration.rb +65 -0
data/lib/gemini/live/connection.rb +83 -0
data/lib/gemini/live/message_builder.rb +217 -0
data/lib/gemini/live/session.rb +223 -0
data/lib/gemini/live.rb +102 -0
data/lib/gemini/response.rb +43 -3
data/lib/gemini/version.rb +1 -1
data/lib/gemini.rb +1 -0
metadata +21 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 92b2d6f35fbff3fda1457ef3244ae730ee99e156909c310b62878540d47d4076
-  data.tar.gz: 710191ae9ac4f136ea586782be43b8aa18390d74a8ba1d844a9524fc120fdaa4
+  metadata.gz: 31487006e959d8d9a755743f6471b54e075a1bc91aa36718274203deb9fda84d
+  data.tar.gz: bc7dbbbea933ed2343b2ec1a32d3cf9b03fe41a4494801c47e5d83842bf31603
 SHA512:
-  metadata.gz: 31a2f84ff0f7d9d5127fe3cfa2eba75c2395cc4105e5faa6d45b472ccc4dc8e6502f28635f6a7dfce0c071e445010af9a1781c833a18c6f491f55b9e2989957b
-  data.tar.gz: 499e659d3a3284f461deede5fbcaff16ebb3476ced2c3c5c44d4d5e333d3751e45483ed221a654b0b560651127496933342e0fed5bfea8307de40edab80faa21
+  metadata.gz: cc033c4ab711800c56f2f1d8884c38a74359d7c3b66fb66ed39ea03b23b98c3748d08208dc560e3276ac3d6e25660ad3a2c5aef69ef25b523bbd6512a5b5d246
+  data.tar.gz: 998ca95babf2803241a9d0a01c00eefe7fcbd37990723ffc9378668bc30f5529a90daa92b20919e5c0e681f68f4e8a5d4206e66520eeec46da9aa18c169fc9be

data/CHANGELOG.md CHANGED

@@ -1,5 +1,32 @@
 ## [Unreleased]
+## [1.1.0] - 2026-04-29
+### Added
+- Live API support for real-time bidirectional audio/video/text conversations over WebSocket
+  - `Gemini::Live::Session` with event-driven API (`:setup_complete`, `:text`, `:audio`, `:tool_call`, `:turn_complete`, `:interrupted`, `:usage_metadata`, `:session_resumption`, `:go_away`, `:close`, `:error`)
+  - `Gemini::Live::Configuration` with response modality, voice, system instruction, tools, context-window compression, session resumption, manual VAD, output audio transcription
+  - `Gemini::Live::MessageBuilder` for setup, clientContent, realtimeInput, activity start/end, and tool response messages
+- Live API audio demos: `live_audio_demo.rb` (low-latency streaming), `live_audio_simple.rb`
+- Manual VAD (Voice Activity Detection) support via `automatic_activity_detection: false`
+- Live API Function Calling
+  - `Session#send_realtime_text(text)` — universal text input via `realtimeInput.text`, required by newer Live models such as `gemini-3.1-flash-live-preview`
+  - `MessageBuilder.realtime_text(text)` builder
+  - Async (NON_BLOCKING) function call support: `MessageBuilder.tool_response` validates and normalizes the `scheduling` field (`INTERRUPT`, `WHEN_IDLE`, `SILENT`), accepted either inside the response payload or as a top-level shortcut
+  - Demos: `live_function_calling_demo.rb` / `live_function_calling_demo_ja.rb`
+- Embeddings API support (`embedContent` and `batchEmbedContents`)
+  - `client.embeddings_api.create(input:, ...)` for single embeddings
+  - `client.embeddings_api.batch_create(inputs:, ...)` for batch embeddings
+  - `client.embed_content(input, ...)` shortcut that auto-routes Array inputs to batch
+  - Optional parameters: `task_type` (RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, SEMANTIC_SIMILARITY, CLASSIFICATION, CLUSTERING, QUESTION_ANSWERING, FACT_VERIFICATION, CODE_RETRIEVAL_QUERY), `title` (RETRIEVAL_DOCUMENT only), `output_dimensionality`
+  - Default model: `gemini-embedding-001`
+- `Response` helpers for embeddings: `#embedding`, `#embeddings`, `#embedding_dimension`, `#embedding_response?`
+- Demos: `embeddings_demo.rb` / `embeddings_demo_ja.rb`
+### Notes
+- Verified Live model compatibility on the `bidiGenerateContent` endpoint: only the native-audio variants and `gemini-3.1-flash-live-preview` are deployed today. The latter requires `realtimeInput.text` (i.e., `Session#send_realtime_text`) and `AUDIO` modality. The `gemini-2.5-flash-live-preview` model name listed in the public tools docs is not yet deployed.
+- `MessageBuilder.realtime_input` (legacy `mediaChunks` path) is documented as deprecated by the upstream API; prefer `realtime_text` going forward.
 ## [1.0.0] - 2026-01-28
 ### Added

data/README.md CHANGED

@@ -30,6 +30,8 @@ This project is inspired by and pays homage to [ruby-openai](https://github.com/
 - Structured output with JSON schema and enum constraints
 - Document processing (PDFs and other formats)
 - Context caching for efficient processing
+- Text embeddings (single and batch) with task type, title, and output dimensionality control
+- Live API: real-time bidirectional conversations with text/audio/video and function calling (sync and async)
 ### Function Calling
@@ -992,6 +994,275 @@ end
 For a complete example of context caching, check out the `demo/document_cache_demo.rb` file.
+### Live API (Real-time Conversations)
+The Gemini Live API provides bidirectional WebSocket-based real-time conversations with audio, video, and text support. The library wraps the protocol behind an event-driven `Gemini::Live::Session`.
+#### Basic Audio Conversation
+The default model (`gemini-2.5-flash-native-audio-preview-12-2025`) responds with audio. You receive Base64-encoded 24 kHz 16-bit PCM chunks via the `:audio` event.
+```ruby
+require 'gemini'
+require 'base64'
+client = Gemini::Client.new(ENV['GEMINI_API_KEY'])
+client.live.connect(
+  response_modality: "AUDIO",
+  voice_name: "Kore",
+  system_instruction: "You are a helpful assistant. Be brief."
+) do |session|
+  setup_complete = false
+  audio_chunks = []
+  session.on(:setup_complete) { setup_complete = true }
+  session.on(:audio)          { |data, _mime| audio_chunks << Base64.decode64(data) }
+  session.on(:turn_complete)  { puts "[#{audio_chunks.sum(&:bytesize)} bytes]" }
+  session.on(:error)          { |e| puts "Error: #{e.message}" }
+  sleep 0.05 until setup_complete
+  session.send_realtime_text("What is the capital of Japan?")
+  sleep 8
+end
+```
+For text-only responses, see the note below about Live model availability.
+#### Function Calling (Synchronous)
+The Live API supports function calling. Define your tools, register a `:tool_call` handler, and reply with `session.send_tool_response`.
+> **Note on Live model input format**
+> Newer Live models such as `gemini-3.1-flash-live-preview` reject the
+> legacy `clientContent.turns[]` payload that older models (including the
+> native-audio variants) accept. Use `session.send_realtime_text(...)`
+> instead of `session.send_text(...)`, which emits the universal
+> `realtimeInput.text` form and works on every currently-deployed Live
+> model. The `gemini-2.5-flash-live-preview` model name listed in the
+> public tools docs is not deployed on the `bidiGenerateContent` endpoint
+> at the time of writing.
+```ruby
+require 'base64'
+tools = [
+  {
+    functionDeclarations: [
+      {
+        name: "get_weather",
+        description: "Get the current weather for a location",
+        parameters: {
+          type: "object",
+          properties: {
+            location: { type: "string", description: "City name" }
+          },
+          required: ["location"]
+        }
+      }
+    ]
+  }
+]
+audio_chunks = []
+client.live.connect(
+  response_modality: "AUDIO",
+  voice_name: "Kore",
+  tools: tools,
+  system_instruction: "Use the available functions when asked about weather."
+) do |session|
+  session.on(:audio) { |data, _mime| audio_chunks << Base64.decode64(data) }
+  session.on(:tool_call) do |function_calls|
+    responses = function_calls.map do |call|
+      result = case call[:name]
+               when "get_weather"
+                 { temperature: 22, condition: "sunny", location: call[:args]["location"] }
+               end
+      { id: call[:id], name: call[:name], response: result }
+    end
+    session.send_tool_response(responses)
+  end
+  sleep 0.5  # wait for setup
+  session.send_realtime_text("What's the weather in Tokyo?")
+  sleep 18
+end
+# audio_chunks now contains 24 kHz, 16-bit PCM mono audio of the spoken reply.
+```
+A complete example is in `demo/live_function_calling_demo.rb`.
+#### Function Calling (Asynchronous / NON_BLOCKING)
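+According to the public Live API tools docs, `gemini-2.5-flash-live-preview` supports asynchronous function calls, although that model is not yet deployed on the `bidiGenerateContent` endpoint at the time of writing. Mark a function declaration with `behavior: "NON_BLOCKING"` so the model can keep talking while the call runs, then control how the result is delivered via `scheduling`.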
+```ruby
+tools = [
+  {
+    functionDeclarations: [
+      {
+        name: "fetch_long_running_data",
+        behavior: "NON_BLOCKING",
+        description: "Slow data lookup",
+        parameters: { type: "object", properties: {} }
+      }
+    ]
+  }
+]
+session.on(:tool_call) do |function_calls|
+  responses = function_calls.map do |call|
+    {
+      id: call[:id],
+      name: call[:name],
+      response: { result: "data ready" },
+      scheduling: "INTERRUPT"  # or "WHEN_IDLE", "SILENT"
+    }
+  end
+  session.send_tool_response(responses)
+end
+```
+`scheduling` can also be placed inside the `response:` hash directly. Valid values: `INTERRUPT`, `WHEN_IDLE`, `SILENT`. The library validates and uppercases the value automatically; an unknown value raises `ArgumentError`.
+#### Built-in Tools
+Google Search grounding is supported in the Live API:
+```ruby
+client.live.connect(
+  model: "gemini-2.5-flash-live-preview",
+  tools: [{ google_search: {} }]
+) do |session|
+  # ...
+end
+```
+#### Supported Live API Models for Tools
+The public Live API tools docs list:
+| Model | Sync Function Calling | Async (NON_BLOCKING) | Google Search |
+|---|---|---|---|
+| `gemini-2.5-flash-live-preview` | ✓ | ✓ | ✓ |
+| `gemini-3.1-flash-live-preview` | ✓ | — | ✓ |
+In practice, on the `bidiGenerateContent` endpoint as of writing:
+- `gemini-3.1-flash-live-preview` is deployed and works with **AUDIO** response modality + tools, **but only when text input is sent via `session.send_realtime_text(...)`** (i.e., `realtimeInput.text`). It rejects the legacy `clientContent.turns[]` payload.
+- `gemini-2.5-flash-native-audio-preview-12-2025` (the library default) is deployed and accepts both `send_realtime_text` and `send_text` (legacy `clientContent.turns[]`).
+- `gemini-2.5-flash-live-preview` from the docs table is **not yet deployed**.
+Once a TEXT-modality-capable Live model ships, the same code works with `response_modality: "TEXT"` and the `voice_name:` argument removed.
+Demos available:
+- `demo/live_text_demo.rb` - Live API text conversation
+- `demo/live_audio_demo.rb` - Live API audio conversation
+- `demo/live_function_calling_demo.rb` - Live API function calling
+### Embeddings
+You can generate text embeddings using the Gemini Embeddings API. Embeddings are vector representations of text that can be used for semantic similarity, classification, clustering, retrieval, and more.
+#### Single Embedding
+```ruby
+require 'gemini'
+client = Gemini::Client.new(ENV['GEMINI_API_KEY'])
+response = client.embed_content(
+  "What is the meaning of life?",
+  model: "gemini-embedding-001"
+)
+if response.success?
+  puts "Dimension: #{response.embedding_dimension}"
+  puts "Vector (first 5 values): #{response.embedding.first(5).inspect}"
+end
+```
+#### Batch Embeddings
+Pass an Array of strings to embed multiple texts in a single batch request (uses `batchEmbedContents` under the hood):
+```ruby
+response = client.embed_content(
+  [
+    "I love programming in Ruby.",
+    "Rubies are red gemstones.",
+    "Python is also a programming language."
+  ],
+  model: "gemini-embedding-001",
+  task_type: :semantic_similarity
+)
+response.embeddings.each_with_index do |values, i|
+  puts "Embedding #{i}: dimension=#{values.size}"
+end
+```
+#### Task Type, Title, and Output Dimensionality
+You can specify a `task_type` to optimize the embedding for a particular downstream task. When `task_type: :retrieval_document` is used, you may also pass a `title`. Use `output_dimensionality` to truncate the vector length (recommended values: 768, 1536, 3072).
+```ruby
+response = client.embed_content(
+  "Ruby is a dynamic, open-source programming language.",
+  model: "gemini-embedding-001",
+  task_type: :retrieval_document,
+  title: "Ruby Overview",
+  output_dimensionality: 768
+)
+```
+Supported task types:
+- `RETRIEVAL_QUERY`
+- `RETRIEVAL_DOCUMENT`
+- `SEMANTIC_SIMILARITY`
+- `CLASSIFICATION`
+- `CLUSTERING`
+- `QUESTION_ANSWERING`
+- `FACT_VERIFICATION`
+- `CODE_RETRIEVAL_QUERY`
+You can pass them as a String or a Symbol, in any letter case (e.g. `:retrieval_query`, `"RETRIEVAL_QUERY"`).
+#### Direct Access via `embeddings_api`
+For more control, you can call the embeddings API directly:
+```ruby
+# Single
+client.embeddings_api.create(input: "Hello", model: "gemini-embedding-001")
+# Batch
+client.embeddings_api.batch_create(
+  inputs: ["First", "Second", "Third"],
+  model: "gemini-embedding-001",
+  task_type: :clustering
+)
+```
+#### Response Helpers
+The Response object exposes a few helpers for embedding payloads:
+```ruby
+response.embedding            # First embedding values (Array of Floats)
+response.embeddings           # All embedding value arrays (Array of Arrays)
+response.embedding_dimension  # Length of the first embedding vector
+response.embedding_response?  # true if the payload contains embedding data
+```
+A complete example is available in `demo/embeddings_demo.rb`.
 ### Structured Output with JSON Schema
 You can request responses in structured JSON format by specifying a JSON schema:
@@ -1232,6 +1503,10 @@ The gem includes several demo applications that showcase its functionality:
 - `demo/document_chat_demo.rb` - Document processing
 - `demo/document_conversation_demo.rb` - Conversation with documents
 - `demo/document_cache_demo.rb` - Document caching
+- `demo/embeddings_demo.rb` - Text embeddings (single and batch)
+- `demo/live_text_demo.rb` - Live API text conversation
+- `demo/live_audio_demo.rb` - Live API audio conversation
+- `demo/live_function_calling_demo.rb` - Live API function calling
 Run the demos with:
@@ -1286,6 +1561,9 @@ ruby demo/document_conversation_demo.rb path/to/document.pdf
 # Document caching and querying
 ruby demo/document_cache_demo.rb path/to/document.pdf
+# Text embeddings (single and batch)
+ruby demo/embeddings_demo.rb
 ```
 ## Models
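
Note on the README audio examples above: they collect decoded chunks in `audio_chunks` and describe them as 24 kHz, 16-bit mono PCM, but stop short of producing a playable file. A minimal sketch of wrapping such raw PCM in a WAV container follows. The helper name `pcm_to_wav` is hypothetical and not part of the gem, and the format parameters are assumptions taken from the README text (little-endian samples are assumed as well).

```ruby
# Hypothetical helper (not part of ruby-gemini-api): prepend a 44-byte
# canonical WAV header to raw PCM data.
# Assumes 24 kHz, 16-bit, mono, little-endian PCM as described in the README.
def pcm_to_wav(pcm, sample_rate: 24_000, channels: 1, bits_per_sample: 16)
  pcm = pcm.b                                   # work on a binary copy
  block_align = channels * bits_per_sample / 8  # bytes per sample frame
  byte_rate   = sample_rate * block_align       # bytes per second
  header = [
    "RIFF", 36 + pcm.bytesize, "WAVE",          # RIFF chunk descriptor
    "fmt ", 16, 1, channels, sample_rate,       # fmt sub-chunk, format 1 = PCM
    byte_rate, block_align, bits_per_sample,
    "data", pcm.bytesize                        # data sub-chunk header
  ].pack("a4Va4a4VvvVVvva4V")
  header + pcm
end

# One second of silence as stand-in data for the decoded audio chunks.
silence = "\x00\x00" * 24_000
wav = pcm_to_wav(silence)

# With real session output this would typically be:
#   File.binwrite("reply.wav", pcm_to_wav(audio_chunks.join))
```

Joining the chunks before wrapping keeps the header sizes correct, since the WAV header records the total data length up front.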

data/lib/gemini/client.rb CHANGED

@@ -70,6 +70,16 @@ module Gemini
       @cached_content ||= Gemini::CachedContent.new(client: self)
     end
+    # Live API accessor
+    def live
+      @live ||= Gemini::Live.new(client: self)
+    end
+    # Embeddings API accessor
+    def embeddings_api
+      @embeddings_api ||= Gemini::Embeddings.new(client: self)
+    end
     def reset_headers
       @extra_headers = {}
     end
@@ -112,10 +122,25 @@ module Gemini
       end
     end
-    # Method corresponding to OpenAI's embeddings
+    # Generate embeddings for the given input.
+    # input can be a String (single embed) or Array of Strings (batch embed).
+    # Supports task_type, title (RETRIEVAL_DOCUMENT only), and output_dimensionality.
+    def embed_content(input, model: Gemini::Embeddings::DEFAULT_MODEL, task_type: nil,
+                      title: nil, output_dimensionality: nil, **parameters)
+      embeddings_api.create(
+        input: input,
+        model: model,
+        task_type: task_type,
+        title: title,
+        output_dimensionality: output_dimensionality,
+        **parameters
+      )
+    end
+    # Method corresponding to OpenAI's embeddings (kept for compatibility)
     def embeddings(parameters: {})
-      model = parameters.delete(:model) || "text-embedding-model"
-      path = "models/#{model}:embedContent"
+      model = parameters.delete(:model) || Gemini::Embeddings::DEFAULT_MODEL
+      path = "models/#{model.to_s.delete_prefix("models/")}:embedContent"
       response = json_post(path: path, parameters: parameters)
       Gemini::Response.new(response)
     end
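
Note on the path change in `embeddings` above: stripping a leading `models/` before interpolation means a caller can pass either the bare model name or the fully qualified one without producing a doubled `models/models/...` segment. The following stand-alone sketch restates that one line outside the gem. The method name `embed_path` and its `action:` parameter are hypothetical, introduced only for illustration.

```ruby
# Stand-alone restatement of the path logic (hypothetical helper, not the gem's API).
def embed_path(model, action: "embedContent")
  # delete_prefix is a no-op when the prefix is absent (Ruby 2.5+).
  "models/#{model.to_s.delete_prefix("models/")}:#{action}"
end

bare      = embed_path("gemini-embedding-001")
qualified = embed_path("models/gemini-embedding-001")
batch     = embed_path(:"gemini-embedding-001", action: "batchEmbedContents")

puts bare
puts qualified
puts batch
```

Both of the first two calls are expected to yield the same path, which is the point of the change.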

data/lib/gemini/embeddings.rb CHANGED

@@ -1,27 +1,118 @@
 module Gemini
   class Embeddings
+    DEFAULT_MODEL = "gemini-embedding-001".freeze
+    VALID_TASK_TYPES = %w[
+      RETRIEVAL_QUERY
+      RETRIEVAL_DOCUMENT
+      SEMANTIC_SIMILARITY
+      CLASSIFICATION
+      CLUSTERING
+      QUESTION_ANSWERING
+      FACT_VERIFICATION
+      CODE_RETRIEVAL_QUERY
+    ].freeze
     def initialize(client:)
       @client = client
     end
-    def create(input:, model: "text-embedding-model", **parameters)
-      content = case input
-                when String
-                  { parts: [{ text: input }] }
-                when Array
-                  { parts: input.map { |text| { text: text.to_s } } }
-                else
-                  { parts: [{ text: input.to_s }] }
-                end
-      payload = {
-        content: content
-      }.merge(parameters)
-      @client.json_post(
-        path: "models/#{model}:embedContent",
+    # Generate an embedding for a single content, or batch when input is an Array
+    def create(input:, model: DEFAULT_MODEL, task_type: nil, title: nil,
+               output_dimensionality: nil, **parameters)
+      if input.is_a?(Array)
+        return batch_create(
+          inputs: input,
+          model: model,
+          task_type: task_type,
+          title: title,
+          output_dimensionality: output_dimensionality,
+          **parameters
+        )
+      end
+      payload = build_embed_payload(
+        input: input,
+        task_type: task_type,
+        title: title,
+        output_dimensionality: output_dimensionality
+      ).merge(parameters)
+      response = @client.json_post(
+        path: "models/#{normalize_model(model)}:embedContent",
         parameters: payload
       )
+      Gemini::Response.new(response)
+    end
+    # Generate embeddings for multiple inputs in a single batch request
+    def batch_create(inputs:, model: DEFAULT_MODEL, task_type: nil, title: nil,
+                     output_dimensionality: nil, **parameters)
+      requests = inputs.map do |input|
+        req = build_embed_payload(
+          input: input,
+          task_type: task_type,
+          title: title,
+          output_dimensionality: output_dimensionality
+        )
+        req[:model] = "models/#{normalize_model(model)}"
+        req
+      end
+      payload = { requests: requests }.merge(parameters)
+      response = @client.json_post(
+        path: "models/#{normalize_model(model)}:batchEmbedContents",
+        parameters: payload
+      )
+      Gemini::Response.new(response)
+    end
+    private
+    def build_embed_payload(input:, task_type:, title:, output_dimensionality:)
+      payload = { content: format_content(input) }
+      if task_type
+        validate_task_type!(task_type)
+        payload[:taskType] = task_type.to_s.upcase
+      end
+      payload[:title] = title if title
+      payload[:outputDimensionality] = output_dimensionality if output_dimensionality
+      payload
+    end
+    def format_content(input)
+      case input
+      when String
+        { parts: [{ text: input }] }
+      when Hash
+        if input.key?(:parts) || input.key?("parts")
+          input
+        elsif input.key?(:text) || input.key?("text") ||
+              input.key?(:inline_data) || input.key?("inline_data") ||
+              input.key?(:file_data) || input.key?("file_data")
+          { parts: [input] }
+        else
+          input
+        end
+      else
+        { parts: [{ text: input.to_s }] }
+      end
+    end
+    def normalize_model(model)
+      model_str = model.to_s
+      model_str.start_with?("models/") ? model_str.delete_prefix("models/") : model_str
+    end
+    def validate_task_type!(task_type)
+      task_type_str = task_type.to_s.upcase
+      unless VALID_TASK_TYPES.include?(task_type_str)
+        raise ArgumentError, "task_type must be one of: #{VALID_TASK_TYPES.join(', ')}"
+      end
     end
   end
-end
+end
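
Note on the request shape: `batch_create` above builds one request object per input, each carrying its own `model` field, and applies the same `task_type`, `title`, and `output_dimensionality` to every entry. The diff does not appear to check that `title` is only combined with `RETRIEVAL_DOCUMENT`. The sketch below re-creates the payload for plain String inputs as a stand-alone function so the resulting JSON shape can be inspected without the gem. `embed_request` is a hypothetical name, and the Hash-input branch of `format_content` is deliberately omitted.

```ruby
require "json"

# Copied from the diff for illustration.
VALID_TASK_TYPES = %w[
  RETRIEVAL_QUERY RETRIEVAL_DOCUMENT SEMANTIC_SIMILARITY CLASSIFICATION
  CLUSTERING QUESTION_ANSWERING FACT_VERIFICATION CODE_RETRIEVAL_QUERY
].freeze

# Hypothetical stand-alone equivalent of one entry in the batch payload.
# Only String inputs are handled in this sketch.
def embed_request(text, model:, task_type: nil, title: nil, output_dimensionality: nil)
  req = {
    model: "models/#{model.to_s.delete_prefix("models/")}",
    content: { parts: [{ text: text.to_s }] }
  }
  if task_type
    type = task_type.to_s.upcase
    unless VALID_TASK_TYPES.include?(type)
      raise ArgumentError, "task_type must be one of: #{VALID_TASK_TYPES.join(', ')}"
    end
    req[:taskType] = type
  end
  req[:title] = title if title
  req[:outputDimensionality] = output_dimensionality if output_dimensionality
  req
end

inputs  = ["First", "Second"]
payload = {
  requests: inputs.map { |t| embed_request(t, model: "gemini-embedding-001", task_type: :clustering) }
}
puts JSON.generate(payload)
```

Validation happens while each request is built, so an invalid `task_type` raises before any network call would be made.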

data/lib/gemini/live/configuration.rb ADDED

@@ -0,0 +1,65 @@
+# frozen_string_literal: true
+module Gemini
+  class Live
+    # Configuration class for Live API sessions
+    class Configuration
+      attr_accessor :model, :response_modality, :voice_name,
+                    :system_instruction, :tools,
+                    :context_window_compression, :session_resumption,
+                    :automatic_activity_detection,
+                    :media_resolution, :output_audio_transcription
+      VALID_MODALITIES = %w[TEXT AUDIO].freeze
+      VALID_VOICES = %w[Puck Charon Kore Fenrir Aoede Leda Orus Zephyr].freeze
+      # NOTE: gemini-2.5-flash-live-preview is listed in the public Live API
+      # tools documentation as the recommended model, but is not currently
+      # deployed (returns "model not found" on bidiGenerateContent). The
+      # native-audio preview model is the only Live model on which function
+      # calling currently works in practice (with AUDIO modality).
+      DEFAULT_MODEL = "gemini-2.5-flash-native-audio-preview-12-2025"
+      def initialize(
+        model: DEFAULT_MODEL,
+        response_modality: "TEXT",
+        voice_name: nil,
+        system_instruction: nil,
+        tools: nil,
+        context_window_compression: nil,
+        session_resumption: nil,
+        automatic_activity_detection: true,
+        media_resolution: nil,
+        output_audio_transcription: false
+      )
+        @model = model
+        @response_modality = validate_modality(response_modality)
+        @voice_name = validate_voice(voice_name)
+        @system_instruction = system_instruction
+        @tools = tools
+        @context_window_compression = context_window_compression
+        @session_resumption = session_resumption
+        @automatic_activity_detection = automatic_activity_detection
+        @media_resolution = media_resolution
+        @output_audio_transcription = output_audio_transcription
+      end
+      private
+      def validate_modality(modality)
+        modality = modality.to_s.upcase
+        unless VALID_MODALITIES.include?(modality)
+          raise ArgumentError, "Invalid modality: #{modality}. Must be one of: #{VALID_MODALITIES.join(', ')}"
+        end
+        modality
+      end
+      def validate_voice(voice)
+        return nil if voice.nil?
+        unless VALID_VOICES.include?(voice)
+          raise ArgumentError, "Invalid voice: #{voice}. Must be one of: #{VALID_VOICES.join(', ')}"
+        end
+        voice
+      end
+    end
+  end
+end
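
Note on the validators: `validate_modality` upcases its argument before checking it, while `validate_voice` compares the value verbatim. As written, `:audio` or `"audio"` is accepted as a modality, but a lowercase voice name such as `"kore"` is rejected. The stand-alone sketch below copies both checks out of the class (with shortened error messages) to make that asymmetry visible. It is an illustration only and does not load the gem.

```ruby
# Constants copied from the diff for illustration.
VALID_MODALITIES = %w[TEXT AUDIO].freeze
VALID_VOICES = %w[Puck Charon Kore Fenrir Aoede Leda Orus Zephyr].freeze

# Case-insensitive: the value is normalized before the membership check.
def validate_modality(modality)
  value = modality.to_s.upcase
  raise ArgumentError, "Invalid modality: #{value}" unless VALID_MODALITIES.include?(value)
  value
end

# Case-sensitive: the value is checked exactly as given.
def validate_voice(voice)
  return nil if voice.nil?
  raise ArgumentError, "Invalid voice: #{voice}" unless VALID_VOICES.include?(voice)
  voice
end

modality = validate_modality(:audio)
voice    = validate_voice("Kore")

lowercase_voice_error =
  begin
    validate_voice("kore")
    nil
  rescue ArgumentError => e
    e.message
  end

puts modality
puts voice
puts lowercase_voice_error
```

The constructor default is also worth keeping in mind: `response_modality` defaults to `"TEXT"`, whereas the README examples pass `"AUDIO"` explicitly for the default native-audio model.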

data/lib/gemini/live/connection.rb ADDED

@@ -0,0 +1,83 @@
+# frozen_string_literal: true
+require "websocket-client-simple"
+require "json"
+module Gemini
+  class Live
+    # WebSocket connection manager for Live API
+    class Connection
+      WEBSOCKET_BASE_URL = "wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent"
+      attr_reader :connected
+      def initialize(api_key:, on_message:, on_open:, on_error:, on_close:)
+        @api_key = api_key
+        @on_message = on_message
+        @on_open = on_open
+        @on_error = on_error
+        @on_close = on_close
+        @ws = nil
+        @connected = false
+        @mutex = Mutex.new
+      end
+      def connect
+        url = "#{WEBSOCKET_BASE_URL}?key=#{@api_key}"
+        # Store callbacks in local variables for closure
+        on_message_callback = @on_message
+        on_open_callback = @on_open
+        on_error_callback = @on_error
+        on_close_callback = @on_close
+        connection = self
+        @ws = WebSocket::Client::Simple.connect(url) do |ws|
+          ws.on :open do
+            connection.instance_variable_set(:@connected, true)
+            on_open_callback.call if on_open_callback
+          end
+          ws.on :message do |msg|
+            on_message_callback.call(msg.data) if on_message_callback
+          end
+          ws.on :error do |e|
+            on_error_callback.call(e) if on_error_callback
+          end
+          ws.on :close do |e|
+            connection.instance_variable_set(:@connected, false)
+            code = e.respond_to?(:code) ? e.code : nil
+            reason = e.respond_to?(:reason) ? e.reason : nil
+            on_close_callback.call(code, reason) if on_close_callback
+          end
+        end
+        self
+      end
+      def send(data)
+        return false unless @ws && @connected
+        @mutex.synchronize do
+          json_data = data.is_a?(String) ? data : data.to_json
+          @ws.send(json_data)
+        end
+        true
+      rescue StandardError => e
+        @on_error&.call(e)
+        false
+      end
+      def close
+        @ws&.close
+        @connected = false
+      end
+      def connected?
+        @connected && @ws && !@ws.closed?
+      end
+    end
+  end
+end
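
Note on "Store callbacks in local variables for closure" in `connect` above: a likely reason, stated here as an assumption about how `websocket-client-simple` dispatches events, is that handler blocks may run with `self` rebound to the WebSocket client, so instance variables such as `@on_message` would resolve against the wrong object. The sketch below reproduces that effect with a tiny emitter that uses `instance_exec`. `TinyEmitter` and `Wrapper` are hypothetical classes written for this note, not part of either library.

```ruby
# Hypothetical stand-in for an emitter that runs handlers via instance_exec.
class TinyEmitter
  def initialize
    @handlers = Hash.new { |h, k| h[k] = [] }
  end

  def on(event, &block)
    @handlers[event] << block
  end

  def emit(event, *args)
    # self inside each handler becomes this emitter instance.
    @handlers[event].each { |handler| instance_exec(*args, &handler) }
  end
end

# Hypothetical wrapper that mirrors the local-variable pattern in the diff.
class Wrapper
  attr_reader :seen_via_ivar, :seen_via_local

  def initialize
    @on_message = ->(data) { data.upcase }
  end

  def wire(emitter)
    on_message_callback = @on_message # captured by the closure below
    wrapper = self
    emitter.on(:message) do |data|
      # Here @on_message is looked up on the emitter, where it is unset.
      wrapper.instance_variable_set(:@seen_via_ivar, @on_message)
      # The captured local still points at the wrapper's callback.
      wrapper.instance_variable_set(:@seen_via_local, on_message_callback.call(data))
    end
  end
end

emitter = TinyEmitter.new
wrapper = Wrapper.new
wrapper.wire(emitter)
emitter.emit(:message, "hello")

p wrapper.seen_via_ivar
p wrapper.seen_via_local
```

The same reasoning would explain why the diff uses `connection.instance_variable_set(:@connected, ...)` rather than assigning `@connected` directly inside the handlers.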

data/lib/gemini/live/message_builder.rb ADDED

@@ -0,0 +1,217 @@
+# frozen_string_literal: true
+module Gemini
+  class Live
+    # Helper class to build Live API messages
+    class MessageBuilder
+      VALID_SCHEDULING = %w[INTERRUPT WHEN_IDLE SILENT].freeze
+      class << self
+        # Build setup message from configuration
+        def setup(config)
+          message = {
+            setup: {
+              model: normalize_model_name(config.model)
+            }
+          }
+          generation_config = build_generation_config(config)
+          message[:setup][:generationConfig] = generation_config unless generation_config.empty?
+          # System instruction
+          if config.system_instruction
+            message[:setup][:systemInstruction] = {
+              parts: [{ text: config.system_instruction }]
+            }
+          end
+          # Tools configuration
+          message[:setup][:tools] = config.tools if config.tools
+          # Context window compression
+          if config.context_window_compression
+            message[:setup][:contextWindowCompression] = config.context_window_compression
+          end
+          # Session resumption
+          if config.session_resumption
+            message[:setup][:sessionResumption] = config.session_resumption
+          end
+          # VAD (Voice Activity Detection) settings
+          unless config.automatic_activity_detection
+            message[:setup][:realtimeInputConfig] = {
+              automaticActivityDetection: {
+                disabled: true
+              }
+            }
+          end
+          message
+        end
+        # Build client content message (text)
+        def client_content(text:, turn_complete: true, role: "user")
+          {
+            clientContent: {
+              turns: [
+                {
+                  role: role,
+                  parts: [{ text: text }]
+                }
+              ],
+              turnComplete: turn_complete
+            }
+          }
+        end
+        # Build client content with multiple parts
+        def client_content_parts(parts:, turn_complete: true, role: "user")
+          {
+            clientContent: {
+              turns: [
+                {
+                  role: role,
+                  parts: parts
+                }
+              ],
+              turnComplete: turn_complete
+            }
+          }
+        end
+        # Build realtime input message (audio/video) using the legacy
+        # mediaChunks field. NOTE: mediaChunks is deprecated by the API in
+        # favor of the dedicated audio/video fields built by realtime_audio
+        # and realtime_video. Kept for backward compatibility with older
+        # Live models that still accept it.
+        def realtime_input(audio_data: nil, video_data: nil, mime_type:)
+          data = audio_data || video_data
+          {
+            realtimeInput: {
+              mediaChunks: [
+                {
+                  mimeType: mime_type,
+                  data: data
+                }
+              ]
+            }
+          }
+        end
+        # Build a realtime text input message. This is the universal
+        # text-input form for the Live API and is required by newer Live
+        # models such as gemini-3.1-flash-live-preview, which reject the
+        # turn-based clientContent payload.
+        def realtime_text(text)
+          { realtimeInput: { text: text.to_s } }
+        end
+        # Build activity start message (for manual VAD)
+        def activity_start
+          {
+            realtimeInput: {
+              activityStart: {}
+            }
+          }
+        end
+        # Build activity end message (for manual VAD)
+        def activity_end
+          {
+            realtimeInput: {
+              activityEnd: {}
+            }
+          }
+        end
+        # Build tool response message.
+        #
+        # Each function response hash supports:
+        #   :id       - The function call id from the server
+        #   :name     - The function name
+        #   :response - The function result (Hash or scalar). When using
+        #               NON_BLOCKING (async) function calls, include
+        #               `scheduling: "INTERRUPT" | "WHEN_IDLE" | "SILENT"`
+        #               inside the response hash.
+        #   :scheduling - (optional) Top-level shortcut. When provided,
+        #                 it is merged into the response hash as
+        #                 `response[:scheduling]`. Accepts Symbol or String.
+        #
+        # Raises ArgumentError if scheduling is not one of the valid values.
+        def tool_response(function_responses)
+          {
+            toolResponse: {
+              functionResponses: function_responses.map { |resp| build_function_response(resp) }
+            }
+          }
+        end
+        private
+        def build_function_response(resp)
+          response_payload =
+            case resp[:response]
+            when Hash then resp[:response].dup
+            when nil  then {}
+            else { result: resp[:response] }
+            end
+          if (top_level_scheduling = resp[:scheduling])
+            response_payload[:scheduling] = normalize_scheduling(top_level_scheduling)
+          elsif (sched = response_payload[:scheduling] || response_payload["scheduling"])
+            normalized = normalize_scheduling(sched)
+            response_payload.delete("scheduling")
+            response_payload[:scheduling] = normalized
+          end
+          { id: resp[:id], name: resp[:name], response: response_payload }
+        end
+        def normalize_scheduling(value)
+          value_str = value.to_s.upcase
+          unless VALID_SCHEDULING.include?(value_str)
+            raise ArgumentError,
+                  "scheduling must be one of: #{VALID_SCHEDULING.join(', ')} (got #{value.inspect})"
+          end
+          value_str
+        end
+        def normalize_model_name(model)
+          model.start_with?("models/") ? model : "models/#{model}"
+        end
+        def build_generation_config(config)
+          generation_config = {}
+          # Response modality
+          generation_config[:responseModalities] = [config.response_modality]
+          # Speech/Voice configuration for AUDIO modality
+          if config.response_modality == "AUDIO" && config.voice_name
+            generation_config[:speechConfig] = {
+              voiceConfig: {
+                prebuiltVoiceConfig: {
+                  voiceName: config.voice_name
+                }
+              }
+            }
+          end
+          # Media resolution
+          if config.media_resolution
+            generation_config[:mediaResolution] = config.media_resolution
+          end
+          # Output audio transcription
+          if config.output_audio_transcription
+            generation_config[:outputAudioTranscription] = {}
+          end
+          generation_config
+        end
+      end
+    end
+  end
+end
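
The normalization in `build_function_response` is easier to follow with concrete inputs. Below is a minimal standalone sketch of the same logic, not the gem's own module: `FunctionResponseSketch` is a hypothetical name, and the contents of `VALID_SCHEDULING` are an assumption, since the constant's definition is not visible in this part of the diff.

```ruby
# Standalone sketch of the function-response normalization shown above.
module FunctionResponseSketch
  # Assumed values; the real constant is defined elsewhere in the gem.
  VALID_SCHEDULING = %w[INTERRUPT WHEN_IDLE SILENT].freeze

  module_function

  # Upcase the value and reject anything outside the allowed set.
  def normalize_scheduling(value)
    value_str = value.to_s.upcase
    unless VALID_SCHEDULING.include?(value_str)
      raise ArgumentError,
            "scheduling must be one of: #{VALID_SCHEDULING.join(', ')} (got #{value.inspect})"
    end
    value_str
  end

  def build_function_response(resp)
    # Hashes are copied, nil becomes {}, any other value is wrapped as :result.
    payload =
      case resp[:response]
      when Hash then resp[:response].dup
      when nil  then {}
      else { result: resp[:response] }
      end

    # A top-level :scheduling wins; otherwise a nested key (symbol or string)
    # is normalized and stored under the symbol key only.
    if (top_level = resp[:scheduling])
      payload[:scheduling] = normalize_scheduling(top_level)
    elsif (sched = payload[:scheduling] || payload["scheduling"])
      normalized = normalize_scheduling(sched)
      payload.delete("scheduling")
      payload[:scheduling] = normalized
    end

    { id: resp[:id], name: resp[:name], response: payload }
  end
end

p FunctionResponseSketch.build_function_response({ id: "1", name: "get_time", response: "12:00" })
p FunctionResponseSketch.build_function_response({ id: "2", name: "get_time", response: { "scheduling" => "silent" } })
```

In this sketch a scalar response ends up under `:result`, and a lowercase string-keyed `"scheduling"` entry is rewritten to an uppercase value under the symbol key, which matches the behavior of the diffed code as far as it is shown here.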

data/lib/gemini/live/session.rb ADDED

@@ -0,0 +1,223 @@
+# frozen_string_literal: true
+require "json"
+require "base64"
+module Gemini
+  class Live
+    # Live API session manager
+    class Session
+      attr_reader :configuration, :last_resumption_token, :usage_metadata
+      def initialize(api_key:, configuration:)
+        @api_key = api_key
+        @configuration = configuration
+        @event_handlers = Hash.new { |h, k| h[k] = [] }
+        @connected = false
+        @setup_complete = false
+        @last_resumption_token = nil
+        @usage_metadata = nil
+        @connection = nil
+        setup_connection
+      end
+      # Register event handler
+      # Supported events:
+      #   :setup_complete - Session setup completed
+      #   :text           - Text response received (text)
+      #   :audio          - Audio data received (base64_data, mime_type)
+      #   :data           - Other inline data received (base64_data, mime_type)
+      #   :tool_call      - Tool call requested (function_calls)
+      #   :interrupted    - User interrupted the model
+      #   :turn_complete  - Model turn completed
+      #   :generation_complete - Generation completed
+      #   :usage_metadata - Token usage info received (metadata)
+      #   :session_resumption - Session resumption token updated (update)
+      #   :go_away        - Connection will close soon (info)
+      #   :error          - Error occurred (error)
+      #   :close          - Connection closed (code, reason)
+      def on(event, &block)
+        @event_handlers[event.to_sym] << block
+        self
+      end
+      # Send text message via clientContent.turns. This is the legacy form
+      # used by native-audio Live models. Newer models such as
+      # gemini-3.1-flash-live-preview reject this payload — use
+      # #send_realtime_text instead, which works on every Live model.
+      def send_text(text, turn_complete: true)
+        ensure_setup_complete!
+        message = MessageBuilder.client_content(
+          text: text,
+          turn_complete: turn_complete
+        )
+        @connection.send(message)
+      end
+      # Send text input via realtimeInput.text (universal form).
+      # Works with every currently-deployed Live model, including
+      # gemini-3.1-flash-live-preview and native-audio variants.
+      def send_realtime_text(text)
+        ensure_setup_complete!
+        @connection.send(MessageBuilder.realtime_text(text))
+      end
+      # Send audio data (Base64 encoded PCM)
+      def send_audio(audio_data, mime_type: "audio/pcm;rate=16000")
+        ensure_setup_complete!
+        encoded_data = audio_data.is_a?(String) && audio_data.encoding == Encoding::BINARY ?
+          Base64.strict_encode64(audio_data) : audio_data
+        message = MessageBuilder.realtime_input(
+          audio_data: encoded_data,
+          mime_type: mime_type
+        )
+        @connection.send(message)
+      end
+      # Send video/image data (Base64 encoded)
+      def send_video(image_data, mime_type: "image/jpeg")
+        ensure_setup_complete!
+        encoded_data = image_data.is_a?(String) && image_data.encoding == Encoding::BINARY ?
+          Base64.strict_encode64(image_data) : image_data
+        message = MessageBuilder.realtime_input(
+          video_data: encoded_data,
+          mime_type: mime_type
+        )
+        @connection.send(message)
+      end
+      # Send tool response
+      def send_tool_response(function_responses)
+        ensure_setup_complete!
+        message = MessageBuilder.tool_response(function_responses)
+        @connection.send(message)
+      end
+      # Manual VAD control - signal activity start
+      def activity_start
+        ensure_setup_complete!
+        @connection.send(MessageBuilder.activity_start)
+      end
+      # Manual VAD control - signal activity end
+      def activity_end
+        ensure_setup_complete!
+        @connection.send(MessageBuilder.activity_end)
+      end
+      # Close the session
+      def close
+        @connection&.close
+        @connected = false
+        @setup_complete = false
+      end
+      def connected?
+        @connected && @connection&.connected?
+      end
+      def setup_complete?
+        @setup_complete
+      end
+      private
+      def setup_connection
+        @connection = Connection.new(
+          api_key: @api_key,
+          on_message: method(:handle_message),
+          on_open: method(:handle_open),
+          on_error: method(:handle_error),
+          on_close: method(:handle_close)
+        )
+        @connection.connect
+        @connected = true
+      end
+      def handle_open
+        # Send setup message immediately after connection opens
+        setup_message = MessageBuilder.setup(@configuration)
+        @connection.send(setup_message)
+      end
+      def handle_message(data)
+        parsed = JSON.parse(data, symbolize_names: true)
+        if parsed[:setupComplete]
+          @setup_complete = true
+          emit(:setup_complete)
+        elsif parsed[:serverContent]
+          handle_server_content(parsed[:serverContent])
+        elsif parsed[:toolCall]
+          emit(:tool_call, parsed[:toolCall][:functionCalls])
+        elsif parsed[:usageMetadata]
+          @usage_metadata = parsed[:usageMetadata]
+          emit(:usage_metadata, parsed[:usageMetadata])
+        elsif parsed[:sessionResumptionUpdate]
+          handle_session_resumption(parsed[:sessionResumptionUpdate])
+        elsif parsed[:goAway]
+          emit(:go_away, parsed[:goAway])
+        end
+      rescue JSON::ParserError => e
+        emit(:error, e)
+      end
+      def handle_server_content(content)
+        # Check for interruption
+        if content[:interrupted]
+          emit(:interrupted)
+          return
+        end
+        # Check for generation complete
+        if content[:generationComplete]
+          emit(:generation_complete)
+        end
+        # Process model turn
+        model_turn = content[:modelTurn]
+        if model_turn
+          model_turn[:parts]&.each do |part|
+            if part[:text]
+              emit(:text, part[:text])
+            elsif part[:inlineData]
+              inline = part[:inlineData]
+              if inline[:mimeType]&.start_with?("audio/")
+                emit(:audio, inline[:data], inline[:mimeType])
+              else
+                emit(:data, inline[:data], inline[:mimeType])
+              end
+            end
+          end
+        end
+        # Check for turn complete
+        emit(:turn_complete) if content[:turnComplete]
+      end
+      def handle_session_resumption(update)
+        @last_resumption_token = update[:newHandle]
+        emit(:session_resumption, update)
+      end
+      def handle_error(error)
+        emit(:error, error)
+      end
+      def handle_close(code, reason)
+        @connected = false
+        @setup_complete = false
+        emit(:close, code, reason)
+      end
+      def emit(event, *args)
+        @event_handlers[event].each { |handler| handler.call(*args) }
+      end
+      def ensure_setup_complete!
+        raise Gemini::Error, "Session setup not complete. Wait for :setup_complete event." unless @setup_complete
+      end
+    end
+  end
+end
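
The handler registry and message routing in `Session` can be tried without a network connection. The following is a reduced sketch under stated assumptions: `EventDispatchSketch` is a hypothetical class, it handles only the `setupComplete` and `serverContent` text cases, and it omits the WebSocket connection, audio, tool calls and session resumption entirely.

```ruby
require "json"

# Reduced stand-in for the event handling in Session; no WebSocket involved.
class EventDispatchSketch
  def initialize
    # Unknown events default to an empty handler list, so emit never sees nil.
    @event_handlers = Hash.new { |h, k| h[k] = [] }
  end

  # Register a handler; returning self allows chained registrations.
  def on(event, &block)
    @event_handlers[event.to_sym] << block
    self
  end

  # Parse one raw server frame (a JSON string) and route it to handlers.
  def handle_message(data)
    parsed = JSON.parse(data, symbolize_names: true)
    if parsed[:setupComplete]
      emit(:setup_complete)
    elsif parsed[:serverContent]
      content = parsed[:serverContent]
      (content.dig(:modelTurn, :parts) || []).each do |part|
        emit(:text, part[:text]) if part[:text]
      end
      emit(:turn_complete) if content[:turnComplete]
    end
  rescue JSON::ParserError => e
    # Malformed frames are reported through the :error event.
    emit(:error, e)
  end

  private

  def emit(event, *args)
    @event_handlers[event].each { |handler| handler.call(*args) }
  end
end

log = []
sketch = EventDispatchSketch.new
sketch.on(:text) { |text| log << text }
      .on(:turn_complete) { log << :done }
sketch.handle_message('{"serverContent":{"modelTurn":{"parts":[{"text":"hi"}]},"turnComplete":true}}')
p log
```

Because the routing is an `if`/`elsif` chain, each frame triggers at most one branch, so a frame that carried several top-level keys would only have the first matching one handled.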

data/lib/gemini/live.rb ADDED

@@ -0,0 +1,102 @@
+# frozen_string_literal: true
+require_relative "live/configuration"
+require_relative "live/message_builder"
+require_relative "live/connection"
+require_relative "live/session"
+module Gemini
+  # Live API client for real-time audio/video/text interactions
+  #
+  # @example Basic text conversation
+  #   client = Gemini::Client.new(api_key)
+  #   session = client.live.connect(model: "gemini-2.5-flash-live-preview")
+  #
+  #   session.on(:setup_complete) { puts "Connected!" }
+  #   session.on(:text) { |text| puts "AI: #{text}" }
+  #   session.on(:error) { |e| puts "Error: #{e}" }
+  #
+  #   session.send_text("Hello!")
+  #   sleep 5
+  #   session.close
+  #
+  # @example Audio conversation
+  #   session = client.live.connect(
+  #     model: "gemini-2.5-flash-live-preview",
+  #     response_modality: "AUDIO",
+  #     voice_name: "Puck"
+  #   )
+  #
+  #   session.on(:audio) { |data, mime| play_audio(data) }
+  #   session.send_audio(pcm_data)  # 16-bit PCM, 16kHz, mono
+  #
+  # @example With block (auto-close)
+  #   client.live.connect(model: "gemini-2.5-flash-live-preview") do |session|
+  #     session.on(:text) { |text| puts text }
+  #     session.send_text("Hello!")
+  #     sleep 5
+  #   end  # session.close called automatically
+  #
+  class Live
+    def initialize(client:)
+      @client = client
+    end
+    # Establish a WebSocket connection and return a session
+    #
+    # @param model [String] Model to use (default: "gemini-2.5-flash-live-preview")
+    # @param response_modality [String] "TEXT" or "AUDIO" (default: "TEXT")
+    # @param voice_name [String] Voice for audio responses (Puck, Charon, Kore, etc.)
+    # @param system_instruction [String] System prompt
+    # @param tools [Array] Tool definitions for function calling
+    # @param context_window_compression [Hash] Compression settings for long sessions
+    # @param session_resumption [Hash] Session resumption settings
+    # @param automatic_activity_detection [Boolean] Enable/disable automatic VAD (default: true)
+    # @param media_resolution [String] Media resolution setting
+    # @param output_audio_transcription [Boolean] Enable audio transcription (default: false)
+    # @yield [session] If block given, yields the session and closes it when block returns
+    # @return [Gemini::Live::Session] The live session
+    #
+    def connect(
+      model: Configuration::DEFAULT_MODEL,
+      response_modality: "TEXT",
+      voice_name: nil,
+      system_instruction: nil,
+      tools: nil,
+      context_window_compression: nil,
+      session_resumption: nil,
+      automatic_activity_detection: true,
+      media_resolution: nil,
+      output_audio_transcription: false,
+      &block
+    )
+      config = Configuration.new(
+        model: model,
+        response_modality: response_modality,
+        voice_name: voice_name,
+        system_instruction: system_instruction,
+        tools: tools,
+        context_window_compression: context_window_compression,
+        session_resumption: session_resumption,
+        automatic_activity_detection: automatic_activity_detection,
+        media_resolution: media_resolution,
+        output_audio_transcription: output_audio_transcription
+      )
+      session = Session.new(
+        api_key: @client.api_key,
+        configuration: config
+      )
+      if block_given?
+        begin
+          yield session
+        ensure
+          session.close
+        end
+      else
+        session
+      end
+    end
+  end
+end
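
The block form of `connect` relies on `begin`/`ensure` so that the session is closed even when the block raises. A minimal sketch of that pattern follows, using hypothetical names (`ConnectSketch`, `FakeSession`, `with_session`) in place of the real session and WebSocket connection.

```ruby
module ConnectSketch
  # Hypothetical stand-in for Gemini::Live::Session; it only records
  # whether close was called.
  class FakeSession
    attr_reader :closed

    def initialize
      @closed = false
    end

    def close
      @closed = true
    end
  end

  # Same shape as the block handling in Live#connect:
  # without a block the session is returned and the caller must close it;
  # with a block the session is yielded and always closed afterwards.
  def self.with_session(session)
    return session unless block_given?

    begin
      yield session
    ensure
      session.close
    end
  end
end

session = ConnectSketch::FakeSession.new
begin
  ConnectSketch.with_session(session) { |_s| raise "boom" }
rescue RuntimeError
  # The error still propagates to the caller; ensure has already run.
end
puts session.closed
```

With a block, the return value is the block's result rather than the session, so callers that need the session object afterwards would use the non-block form and call `close` themselves.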

data/lib/gemini/response.rb CHANGED

@@ -70,9 +70,49 @@ module Gemini
     # Check if response is valid
     def valid?
-      !@raw_data.nil? &&
-      ((@raw_data.key?("candidates") && !@raw_data["candidates"].empty?) ||
-       (@raw_data.key?("predictions") && !@raw_data["predictions"].empty?))
+      !@raw_data.nil? &&
+      ((@raw_data.key?("candidates") && !@raw_data["candidates"].empty?) ||
+       (@raw_data.key?("predictions") && !@raw_data["predictions"].empty?) ||
+       embedding_response?)
+    end
+    # Check if the raw response contains embedding data
+    def embedding_response?
+      return false if @raw_data.nil?
+      (@raw_data.key?("embedding") && !@raw_data["embedding"].nil?) ||
+        (@raw_data.key?("embeddings") && @raw_data["embeddings"].is_a?(Array) && !@raw_data["embeddings"].empty?)
+    end
+    # Get the embedding values as an Array of Floats.
+    # For single embedContent responses returns the values array.
+    # For batchEmbedContents responses returns the first embedding's values.
+    def embedding
+      return nil unless @raw_data
+      if @raw_data["embedding"].is_a?(Hash)
+        @raw_data["embedding"]["values"]
+      elsif @raw_data["embeddings"].is_a?(Array) && @raw_data["embeddings"].first.is_a?(Hash)
+        @raw_data["embeddings"].first["values"]
+      end
+    end
+    # Get all embedding value arrays for batch responses.
+    # Returns an Array of Arrays of Floats.
+    # For single embedContent responses, returns a single-element array.
+    def embeddings
+      return [] unless @raw_data
+      if @raw_data["embeddings"].is_a?(Array)
+        @raw_data["embeddings"].map { |e| e["values"] }.compact
+      elsif @raw_data["embedding"].is_a?(Hash) && @raw_data["embedding"]["values"]
+        [@raw_data["embedding"]["values"]]
+      else
+        []
+      end
+    end
+    # Get the dimensionality (length) of the first embedding vector
+    def embedding_dimension
+      values = embedding
+      values.is_a?(Array) ? values.length : 0
     end
     # Get error message if any
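
To show how the new embedding accessors treat the two response shapes, here is a small sketch that copies their logic into a hypothetical `EmbeddingView` class operating on plain hashes. The sample payloads are made up for illustration and use shortened vectors; they are not real API output.

```ruby
# Hypothetical wrapper mirroring the accessors added to Gemini::Response.
class EmbeddingView
  def initialize(raw_data)
    @raw_data = raw_data
  end

  # Values of the single embedding, or of the first one in a batch.
  def embedding
    return nil unless @raw_data

    if @raw_data["embedding"].is_a?(Hash)
      @raw_data["embedding"]["values"]
    elsif @raw_data["embeddings"].is_a?(Array) && @raw_data["embeddings"].first.is_a?(Hash)
      @raw_data["embeddings"].first["values"]
    end
  end

  # Always an Array of value arrays, whichever shape the response has.
  def embeddings
    return [] unless @raw_data

    if @raw_data["embeddings"].is_a?(Array)
      @raw_data["embeddings"].map { |e| e["values"] }.compact
    elsif @raw_data["embedding"].is_a?(Hash) && @raw_data["embedding"]["values"]
      [@raw_data["embedding"]["values"]]
    else
      []
    end
  end

  # Length of the first vector, or 0 when there is none.
  def embedding_dimension
    values = embedding
    values.is_a?(Array) ? values.length : 0
  end
end

single = EmbeddingView.new({ "embedding" => { "values" => [0.1, 0.2, 0.3] } })
batch  = EmbeddingView.new({ "embeddings" => [{ "values" => [1.0, 2.0] }, { "values" => [3.0, 4.0] }] })

p single.embeddings           # a one-element array wrapping the single vector
p batch.embedding             # the first vector of the batch
p batch.embedding_dimension
```

The practical consequence is that code iterating over `embeddings` works for both single and batch calls, while `embedding` on a batch response silently returns only the first vector.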

data/lib/gemini/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module Gemini
-  VERSION = "1.0.0"
+  VERSION = "1.1.0"
 end

data/lib/gemini.rb CHANGED

@@ -20,6 +20,7 @@ require_relative "gemini/function_calling_helper"
 require_relative "gemini/documents"
 require_relative "gemini/cached_content"
 require_relative "gemini/video"
+require_relative "gemini/live"
 module Gemini
   class Error < StandardError; end

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: ruby-gemini-api
 version: !ruby/object:Gem::Version
-  version: 1.0.0
+  version: 1.1.0
 platform: ruby
 authors:
 - rira100000000
@@ -51,6 +51,20 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '2.0'
+- !ruby/object:Gem::Dependency
+  name: websocket-client-simple
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.8'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.8'
 - !ruby/object:Gem::Dependency
   name: rake
   requirement: !ruby/object:Gem::Requirement
@@ -156,6 +170,11 @@ files:
 - lib/gemini/http.rb
 - lib/gemini/http_headers.rb
 - lib/gemini/images.rb
+- lib/gemini/live.rb
+- lib/gemini/live/configuration.rb
+- lib/gemini/live/connection.rb
+- lib/gemini/live/message_builder.rb
+- lib/gemini/live/session.rb
 - lib/gemini/messages.rb
 - lib/gemini/models.rb
 - lib/gemini/response.rb
@@ -187,7 +206,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.7.2
+rubygems_version: 3.6.9
 specification_version: 4
 summary: Ruby client for Google's Gemini API
 test_files: []
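
The new runtime dependency on `websocket-client-simple` uses the pessimistic constraint `~> 0.8`. As a quick check of which versions that admits, the sketch below evaluates the constraint with RubyGems' own `Gem::Requirement`; the version numbers tested are arbitrary examples and not a list of actual releases of that gem.

```ruby
require "rubygems"

# "~> 0.8" allows any version >= 0.8 and < 1.0.
requirement = Gem::Requirement.new("~> 0.8")

%w[0.7.9 0.8.0 0.9.3 1.0.0].each do |candidate|
  ok = requirement.satisfied_by?(Gem::Version.new(candidate))
  puts "#{candidate}: #{ok ? 'allowed' : 'rejected'}"
end
```

A project that upgrades to 1.1.0 will therefore pull in a 0.x release of the WebSocket client at 0.8 or later, and Bundler would refuse to resolve if another dependency pins that gem below 0.8 or at 1.0 and above.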