ruby-gemini-api 0.1.7 → 1.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +59 -21
- data/README.md +397 -0
- data/lib/gemini/client.rb +85 -7
- data/lib/gemini/embeddings.rb +108 -17
- data/lib/gemini/function_calling_helper.rb +45 -0
- data/lib/gemini/live/configuration.rb +65 -0
- data/lib/gemini/live/connection.rb +83 -0
- data/lib/gemini/live/message_builder.rb +217 -0
- data/lib/gemini/live/session.rb +223 -0
- data/lib/gemini/live.rb +102 -0
- data/lib/gemini/response.rb +89 -4
- data/lib/gemini/version.rb +1 -1
- data/lib/gemini.rb +2 -0
- metadata +23 -6
checksums.yaml
CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 31487006e959d8d9a755743f6471b54e075a1bc91aa36718274203deb9fda84d
+  data.tar.gz: bc7dbbbea933ed2343b2ec1a32d3cf9b03fe41a4494801c47e5d83842bf31603
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: cc033c4ab711800c56f2f1d8884c38a74359d7c3b66fb66ed39ea03b23b98c3748d08208dc560e3276ac3d6e25660ad3a2c5aef69ef25b523bbd6512a5b5d246
+  data.tar.gz: 998ca95babf2803241a9d0a01c00eefe7fcbd37990723ffc9378668bc30f5529a90daa92b20919e5c0e681f68f4e8a5d4206e66520eeec46da9aa18c169fc9be
data/CHANGELOG.md
CHANGED

@@ -1,39 +1,77 @@

## [Unreleased]

## [1.1.0] - 2026-04-29

### Added

- Live API support for real-time bidirectional audio/video/text conversations over WebSocket
- `Gemini::Live::Session` with event-driven API (`:setup_complete`, `:text`, `:audio`, `:tool_call`, `:turn_complete`, `:interrupted`, `:usage_metadata`, `:session_resumption`, `:go_away`, `:close`, `:error`)
- `Gemini::Live::Configuration` with response modality, voice, system instruction, tools, context-window compression, session resumption, manual VAD, and output audio transcription
- `Gemini::Live::MessageBuilder` for setup, clientContent, realtimeInput, activity start/end, and tool response messages
- Live API audio demos: `live_audio_demo.rb` (low-latency streaming), `live_audio_simple.rb`
- Manual VAD (Voice Activity Detection) support via `automatic_activity_detection: false`
- Live API Function Calling
- `Session#send_realtime_text(text)` — universal text input via `realtimeInput.text`, required by newer Live models such as `gemini-3.1-flash-live-preview`
- `MessageBuilder.realtime_text(text)` builder
- Async (NON_BLOCKING) function call support: `MessageBuilder.tool_response` validates and normalizes the `scheduling` field (`INTERRUPT`, `WHEN_IDLE`, `SILENT`), accepted either inside the response payload or as a top-level shortcut
- Demos: `live_function_calling_demo.rb` / `live_function_calling_demo_ja.rb`
- Embeddings API support (`embedContent` and `batchEmbedContents`)
- `client.embeddings_api.create(input:, ...)` for single embeddings
- `client.embeddings_api.batch_create(inputs:, ...)` for batch embeddings
- `client.embed_content(input, ...)` shortcut that auto-routes Array inputs to batch
- Optional parameters: `task_type` (RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, SEMANTIC_SIMILARITY, CLASSIFICATION, CLUSTERING, QUESTION_ANSWERING, FACT_VERIFICATION, CODE_RETRIEVAL_QUERY), `title` (RETRIEVAL_DOCUMENT only), `output_dimensionality`
- Default model: `gemini-embedding-001`
- `Response` helpers for embeddings: `#embedding`, `#embeddings`, `#embedding_dimension`, `#embedding_response?`
- Demos: `embeddings_demo.rb` / `embeddings_demo_ja.rb`

### Notes

- Verified Live model compatibility on the `bidiGenerateContent` endpoint: only the native-audio variants and `gemini-3.1-flash-live-preview` are deployed today. The latter requires `realtimeInput.text` (i.e., `Session#send_realtime_text`) and `AUDIO` modality. The `gemini-2.5-flash-live-preview` model name listed in the public tools docs is not yet deployed.
- `MessageBuilder.realtime_input` (legacy `mediaChunks` path) is documented as deprecated by the upstream API; prefer `realtime_text` going forward.

## [1.0.0] - 2026-01-28

### Added

- Thinking feature support for Gemini 2.5 and Gemini 3 models
- `thinking_budget` parameter for Gemini 2.5 (1-32768 tokens, -1 for dynamic, 0 to disable)
- `thinking_level` parameter for Gemini 3 (`:minimal`, `:low`, `:medium`, `:high`)
- Thought Signatures support for Function Calling with Thinking
- `FunctionCallingHelper.build_continuation` for automatic signature management
- Response methods: `thought_signatures`, `first_thought_signature`, `has_thought_signature?`
- Response helper methods: `thoughts_token_count`, `model_version`, `gemini_3?`

## [0.1.7] - 2026-01-13

- Remove dotenv dependency

## [0.1.6] - 2025-12-11

- Add support for video understanding
- Analyze local video files (Files API and inline data)
- Analyze YouTube videos
- Helper methods: describe, ask, extract_timestamps, analyze_segment
- Support for MP4, MPEG, MOV, AVI, FLV, WebM, WMV, 3GPP formats
- Change default model to gemini-2.5-flash

## [0.1.5] - 2025-11-13

- Add support for URL Context tool
- Add simplified method for accessing grounding search sources

## [0.1.4] - 2025-11-08

- Add support for grounding search

## [0.1.3] - 2025-10-09

- Add support for multi-image input

## [0.1.2] - 2025-07-10

- Add function calling

## [0.1.1] - 2025-05-04

- Changed generate_contents to accept temperature parameter

## [0.1.0] - 2025-04-05

- Initial release
data/README.md
CHANGED

@@ -1,10 +1,22 @@
[README ‐ 日本語](https://github.com/rira100000000/ruby-gemini-api/blob/main/README_ja.md)

# Ruby-Gemini-API

[](https://badge.fury.io/rb/ruby-gemini-api)
[](https://opensource.org/licenses/MIT)

A Ruby client library for Google's Gemini API. This gem provides a simple, intuitive interface for interacting with Gemini's generative AI capabilities, following patterns similar to other AI client libraries.

This project is inspired by and pays homage to [ruby-openai](https://github.com/alexrudall/ruby-openai), aiming to provide a familiar and consistent experience for Ruby developers working with Gemini's AI models.

## Why This Gem?

- **Familiar Interface**: API design inspired by ruby-openai for a smooth transition
- **Comprehensive Features**: Text generation, vision, audio, video, function calling, and more
- **Response Object**: Convenient wrapper for easy access to generated content
- **Streaming Support**: Real-time text generation with a block-based API
- **Thinking Support**: Built-in support for Gemini 2.5/3 thinking features
- **Production Ready**: Stable 1.0 release with thorough documentation

## Features

- Text generation with Gemini models
@@ -18,6 +30,8 @@ This project is inspired by and pays homage to [ruby-openai](https://github.com/
- Structured output with JSON schema and enum constraints
- Document processing (PDFs and other formats)
- Context caching for efficient processing
- Text embeddings (single and batch) with task type, title, and output dimensionality control
- Live API: real-time bidirectional conversations with text/audio/video and function calling (sync and async)

### Function Calling
@@ -94,6 +108,105 @@ puts "After deleting a function: #{all_tools.list_functions}"
# => After deleting a function: [:get_current_weather, :send_email]
```

### Thinking Feature

Gemini 2.5 and later models support the Thinking feature, which lets the model run an internal reasoning pass on complex problems to produce higher-quality answers.

#### Using with Gemini 2.5: `thinking_budget`

```ruby
require 'gemini'

client = Gemini::Client.new(ENV['GEMINI_API_KEY'])

# Specify thinking token count (1-32768)
response = client.generate_content(
  "Solve this complex math problem step by step",
  model: "gemini-2.5-flash",
  thinking_budget: 2048
)

puts "Thoughts token count: #{response.thoughts_token_count}"
puts "Answer: #{response.text}"

# Dynamic thinking (model decides automatically)
response = client.generate_content(
  "A difficult logic puzzle",
  model: "gemini-2.5-flash",
  thinking_budget: -1
)

# Disable thinking
response = client.generate_content(
  "Simple question",
  thinking_budget: 0
)
```

#### Using with Gemini 3: `thinking_level`

```ruby
# Specify thinking level (:minimal, :low, :medium, :high)
response = client.generate_content(
  "Complex analysis task",
  model: "gemini-3-flash-preview",
  thinking_level: :high
)
```

#### Function Calling with Thinking (Thought Signatures)

When using Function Calling with Gemini 3, Thought Signatures must be carried across turns. The library handles the signatures for you automatically.

```ruby
# Initial request
response = client.generate_content(
  "What's the weather in Tokyo?",
  tools: tools,
  thinking_level: :medium
)

# If function calls are present
if response.function_calls.any?
  # Execute the function
  weather_data = get_weather("Tokyo")

  # Build the continuation request (signatures are attached automatically)
  contents = Gemini::FunctionCallingHelper.build_continuation(
    original_contents: [{ role: "user", parts: [{ text: "What's the weather in Tokyo?" }] }],
    model_response: response,
    function_responses: [
      { name: "get_weather", response: weather_data }
    ]
  )

  # Continuation request
  final_response = client.generate_content(
    contents,
    tools: tools,
    thinking_level: :medium
  )

  puts final_response.text
end
```

#### Response Methods for Thinking

```ruby
# Get thoughts token count
response.thoughts_token_count  # => 150

# Get thought signatures (for Function Calling)
response.thought_signatures    # => ["base64encoded..."]
response.first_thought_signature
response.has_thought_signature?

# Check model version
response.model_version         # => "gemini-3-flash-preview"
response.gemini_3?             # => true
```

## Installation

Add this line to your application's Gemfile:
@@ -881,6 +994,275 @@ end

For a complete example of context caching, check out the `demo/document_cache_demo.rb` file.

### Live API (Real-time Conversations)

The Gemini Live API provides bidirectional, WebSocket-based real-time conversations with audio, video, and text support. The library wraps the protocol behind an event-driven `Gemini::Live::Session`.

#### Basic Audio Conversation

The default model (`gemini-2.5-flash-native-audio-preview-12-2025`) responds with audio. You receive Base64-encoded 24 kHz, 16-bit PCM chunks via the `:audio` event.

```ruby
require 'gemini'
require 'base64'

client = Gemini::Client.new(ENV['GEMINI_API_KEY'])

client.live.connect(
  response_modality: "AUDIO",
  voice_name: "Kore",
  system_instruction: "You are a helpful assistant. Be brief."
) do |session|
  setup_complete = false
  audio_chunks = []

  session.on(:setup_complete) { setup_complete = true }
  session.on(:audio) { |data, _mime| audio_chunks << Base64.decode64(data) }
  session.on(:turn_complete) { puts "[#{audio_chunks.sum(&:bytesize)} bytes]" }
  session.on(:error) { |e| puts "Error: #{e.message}" }

  sleep 0.05 until setup_complete

  session.send_realtime_text("What is the capital of Japan?")
  sleep 8
end
```

For text-only responses, see the note below about Live model availability.
#### Function Calling (Synchronous)

The Live API supports function calling. Define your tools, register a `:tool_call` handler, and reply with `session.send_tool_response`.

> **Note on Live model input format**
> Newer Live models such as `gemini-3.1-flash-live-preview` reject the
> legacy `clientContent.turns[]` payload that older models (including the
> native-audio variants) accept. Use `session.send_realtime_text(...)`
> instead of `session.send_text(...)`: it emits the universal
> `realtimeInput.text` form and works on every currently deployed Live
> model. The `gemini-2.5-flash-live-preview` model name listed in the
> public tools docs is not deployed on the `bidiGenerateContent` endpoint
> at the time of writing.

```ruby
require 'base64'

tools = [
  {
    functionDeclarations: [
      {
        name: "get_weather",
        description: "Get the current weather for a location",
        parameters: {
          type: "object",
          properties: {
            location: { type: "string", description: "City name" }
          },
          required: ["location"]
        }
      }
    ]
  }
]

audio_chunks = []

client.live.connect(
  response_modality: "AUDIO",
  voice_name: "Kore",
  tools: tools,
  system_instruction: "Use the available functions when asked about weather."
) do |session|
  session.on(:audio) { |data, _mime| audio_chunks << Base64.decode64(data) }

  session.on(:tool_call) do |function_calls|
    responses = function_calls.map do |call|
      result = case call[:name]
               when "get_weather"
                 { temperature: 22, condition: "sunny", location: call[:args]["location"] }
               end
      { id: call[:id], name: call[:name], response: result }
    end
    session.send_tool_response(responses)
  end

  sleep 0.5 # wait for setup
  session.send_realtime_text("What's the weather in Tokyo?")
  sleep 18
end

# audio_chunks now contains 24 kHz, 16-bit PCM mono audio of the spoken reply.
```

A complete example is in `demo/live_function_calling_demo.rb`.

#### Function Calling (Asynchronous / NON_BLOCKING)

According to the public tools docs, `gemini-2.5-flash-live-preview` supports asynchronous function calls (see the deployment note below). Mark a function declaration with `behavior: "NON_BLOCKING"` so the model can keep talking while the call runs, then control how the result is delivered back via `scheduling`.

```ruby
tools = [
  {
    functionDeclarations: [
      {
        name: "fetch_long_running_data",
        behavior: "NON_BLOCKING",
        description: "Slow data lookup",
        parameters: { type: "object", properties: {} }
      }
    ]
  }
]

session.on(:tool_call) do |function_calls|
  responses = function_calls.map do |call|
    {
      id: call[:id],
      name: call[:name],
      response: { result: "data ready" },
      scheduling: "INTERRUPT" # or "WHEN_IDLE", "SILENT"
    }
  end
  session.send_tool_response(responses)
end
```

`scheduling` can also be placed inside the `response:` hash directly. Valid values: `INTERRUPT`, `WHEN_IDLE`, `SILENT`. The library validates and uppercases the value automatically; an unknown value raises `ArgumentError`.
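The validation just described can be sketched in isolation. This is an illustrative reimplementation of the rule (accept String or Symbol in any case, uppercase it, reject unknown values), not the gem's actual `MessageBuilder` code:

```ruby
# Allowed NON_BLOCKING delivery modes for tool responses.
VALID_SCHEDULING = %w[INTERRUPT WHEN_IDLE SILENT].freeze

# Normalize a user-supplied scheduling value: uppercase it and
# raise ArgumentError on anything outside the allowed set.
def normalize_scheduling(value)
  normalized = value.to_s.upcase
  unless VALID_SCHEDULING.include?(normalized)
    raise ArgumentError, "invalid scheduling: #{value.inspect}"
  end
  normalized
end
```

So `normalize_scheduling(:when_idle)`, `normalize_scheduling("When_Idle")`, and `normalize_scheduling("WHEN_IDLE")` all yield `"WHEN_IDLE"`.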
#### Built-in Tools

Google Search grounding is supported in the Live API:

```ruby
client.live.connect(
  model: "gemini-2.5-flash-live-preview",
  tools: [{ google_search: {} }]
) do |session|
  # ...
end
```

#### Supported Live API Models for Tools

The public Live API tools docs list:

| Model | Sync Function Calling | Async (NON_BLOCKING) | Google Search |
|---|---|---|---|
| `gemini-2.5-flash-live-preview` | ✓ | ✓ | ✓ |
| `gemini-3.1-flash-live-preview` | ✓ | — | ✓ |

In practice, on the `bidiGenerateContent` endpoint at the time of writing:

- `gemini-3.1-flash-live-preview` is deployed and works with **AUDIO** response modality + tools, **but only when text input is sent via `session.send_realtime_text(...)`** (i.e., `realtimeInput.text`). It rejects the legacy `clientContent.turns[]` payload.
- `gemini-2.5-flash-native-audio-preview-12-2025` (the library default) is deployed and accepts both `send_realtime_text` and `send_text` (legacy `clientContent.turns[]`).
- `gemini-2.5-flash-live-preview` from the docs table is **not yet deployed**.

Once a TEXT-modality-capable Live model ships, the same code works with `response_modality: "TEXT"` and the `voice_name:` argument removed.

Demos available:

- `demo/live_text_demo.rb` - Live API text conversation
- `demo/live_audio_demo.rb` - Live API audio conversation
- `demo/live_function_calling_demo.rb` - Live API function calling

### Embeddings

You can generate text embeddings using the Gemini Embeddings API. Embeddings are vector representations of text that can be used for semantic similarity, classification, clustering, retrieval, and more.

#### Single Embedding

```ruby
require 'gemini'

client = Gemini::Client.new(ENV['GEMINI_API_KEY'])

response = client.embed_content(
  "What is the meaning of life?",
  model: "gemini-embedding-001"
)

if response.success?
  puts "Dimension: #{response.embedding_dimension}"
  puts "Vector (first 5 values): #{response.embedding.first(5).inspect}"
end
```

#### Batch Embeddings

Pass an Array of strings to embed multiple texts in a single batch request (uses `batchEmbedContents` under the hood):

```ruby
response = client.embed_content(
  [
    "I love programming in Ruby.",
    "Rubies are red gemstones.",
    "Python is also a programming language."
  ],
  model: "gemini-embedding-001",
  task_type: :semantic_similarity
)

response.embeddings.each_with_index do |values, i|
  puts "Embedding #{i}: dimension=#{values.size}"
end
```

#### Task Type, Title, and Output Dimensionality

You can specify a `task_type` to optimize the embedding for a particular downstream task. When `task_type: :retrieval_document` is used, you may also pass a `title`. Use `output_dimensionality` to truncate the vector length (recommended values: 768, 1536, 3072).

```ruby
response = client.embed_content(
  "Ruby is a dynamic, open-source programming language.",
  model: "gemini-embedding-001",
  task_type: :retrieval_document,
  title: "Ruby Overview",
  output_dimensionality: 768
)
```

Supported task types:

- `RETRIEVAL_QUERY`
- `RETRIEVAL_DOCUMENT`
- `SEMANTIC_SIMILARITY`
- `CLASSIFICATION`
- `CLUSTERING`
- `QUESTION_ANSWERING`
- `FACT_VERIFICATION`
- `CODE_RETRIEVAL_QUERY`

You can pass them as a String, Symbol, or in any case (e.g. `:retrieval_query`, `"RETRIEVAL_QUERY"`).

#### Direct Access via `embeddings_api`

For more control, you can call the embeddings API directly:

```ruby
# Single
client.embeddings_api.create(input: "Hello", model: "gemini-embedding-001")

# Batch
client.embeddings_api.batch_create(
  inputs: ["First", "Second", "Third"],
  model: "gemini-embedding-001",
  task_type: :clustering
)
```

#### Response Helpers

The Response object exposes a few helpers for embedding payloads:

```ruby
response.embedding           # First embedding values (Array of Floats)
response.embeddings          # All embedding value arrays (Array of Arrays)
response.embedding_dimension # Length of the first embedding vector
response.embedding_response? # true if the payload contains embedding data
```

A complete example is available in `demo/embeddings_demo.rb`.
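The vectors returned by these helpers can be compared directly, for example to rank batch results by semantic similarity. A minimal cosine-similarity sketch in plain Ruby (the `cosine_similarity` helper is illustrative, not part of the gem):

```ruby
# Cosine similarity between two equal-length embedding vectors:
# dot(a, b) / (|a| * |b|), in [-1.0, 1.0].
def cosine_similarity(a, b)
  dot    = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  dot / (norm_a * norm_b)
end

# e.g. score every batch embedding against the first one:
# sims = response.embeddings.map { |v| cosine_similarity(response.embedding, v) }
```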
### Structured Output with JSON Schema

You can request responses in structured JSON format by specifying a JSON schema:
@@ -1116,9 +1498,15 @@ The gem includes several demo applications that showcase its functionality:
- `demo/file_audio_demo.rb` - Audio transcription with large audio files
- `demo/structured_output_demo.rb` - Structured JSON output with schema
- `demo/enum_response_demo.rb` - Enum-constrained responses
- `demo/thinking_demo.rb` - Thinking feature (Gemini 2.5)
- `demo/thinking_gemini3_demo.rb` - Thinking feature (Gemini 3)
- `demo/document_chat_demo.rb` - Document processing
- `demo/document_conversation_demo.rb` - Conversation with documents
- `demo/document_cache_demo.rb` - Document caching
- `demo/embeddings_demo.rb` - Text embeddings (single and batch)
- `demo/live_text_demo.rb` - Live API text conversation
- `demo/live_audio_demo.rb` - Live API audio conversation
- `demo/live_function_calling_demo.rb` - Live API function calling

Run the demos with:
@@ -1159,6 +1547,12 @@ ruby demo/structured_output_demo.rb
# Enum-constrained responses
ruby demo/enum_response_demo.rb

# Thinking feature (Gemini 2.5)
ruby demo/thinking_demo.rb

# Thinking feature (Gemini 3)
ruby demo/thinking_gemini3_demo.rb

# Document processing
ruby demo/document_chat_demo.rb path/to/document.pdf
@@ -1167,6 +1561,9 @@ ruby demo/document_conversation_demo.rb path/to/document.pdf

# Document caching and querying
ruby demo/document_cache_demo.rb path/to/document.pdf

# Text embeddings (single and batch)
ruby demo/embeddings_demo.rb
```

## Models