ruby-gemini-api 0.1.7 → 1.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +59 -21
- data/README.md +397 -0
- data/lib/gemini/client.rb +85 -7
- data/lib/gemini/embeddings.rb +108 -17
- data/lib/gemini/function_calling_helper.rb +45 -0
- data/lib/gemini/live/configuration.rb +65 -0
- data/lib/gemini/live/connection.rb +83 -0
- data/lib/gemini/live/message_builder.rb +217 -0
- data/lib/gemini/live/session.rb +223 -0
- data/lib/gemini/live.rb +102 -0
- data/lib/gemini/response.rb +89 -4
- data/lib/gemini/version.rb +1 -1
- data/lib/gemini.rb +2 -0
- metadata +23 -6
checksums.yaml
CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 31487006e959d8d9a755743f6471b54e075a1bc91aa36718274203deb9fda84d
+  data.tar.gz: bc7dbbbea933ed2343b2ec1a32d3cf9b03fe41a4494801c47e5d83842bf31603
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: cc033c4ab711800c56f2f1d8884c38a74359d7c3b66fb66ed39ea03b23b98c3748d08208dc560e3276ac3d6e25660ad3a2c5aef69ef25b523bbd6512a5b5d246
+  data.tar.gz: 998ca95babf2803241a9d0a01c00eefe7fcbd37990723ffc9378668bc30f5529a90daa92b20919e5c0e681f68f4e8a5d4206e66520eeec46da9aa18c169fc9be
data/CHANGELOG.md
CHANGED

@@ -1,39 +1,77 @@

## [Unreleased]

## [1.1.0] - 2026-04-29

### Added

- Live API support for real-time bidirectional audio/video/text conversations over WebSocket
- `Gemini::Live::Session` with event-driven API (`:setup_complete`, `:text`, `:audio`, `:tool_call`, `:turn_complete`, `:interrupted`, `:usage_metadata`, `:session_resumption`, `:go_away`, `:close`, `:error`)
- `Gemini::Live::Configuration` with response modality, voice, system instruction, tools, context-window compression, session resumption, manual VAD, and output audio transcription
- `Gemini::Live::MessageBuilder` for setup, clientContent, realtimeInput, activity start/end, and tool response messages
- Live API audio demos: `live_audio_demo.rb` (low-latency streaming), `live_audio_simple.rb`
- Manual VAD (Voice Activity Detection) support via `automatic_activity_detection: false`
- Live API Function Calling
- `Session#send_realtime_text(text)` — universal text input via `realtimeInput.text`, required by newer Live models such as `gemini-3.1-flash-live-preview`
- `MessageBuilder.realtime_text(text)` builder
- Async (NON_BLOCKING) function call support: `MessageBuilder.tool_response` validates and normalizes the `scheduling` field (`INTERRUPT`, `WHEN_IDLE`, `SILENT`), accepted either inside the response payload or as a top-level shortcut
- Demos: `live_function_calling_demo.rb` / `live_function_calling_demo_ja.rb`
- Embeddings API support (`embedContent` and `batchEmbedContents`)
- `client.embeddings_api.create(input:, ...)` for single embeddings
- `client.embeddings_api.batch_create(inputs:, ...)` for batch embeddings
- `client.embed_content(input, ...)` shortcut that auto-routes Array inputs to batch
- Optional parameters: `task_type` (RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, SEMANTIC_SIMILARITY, CLASSIFICATION, CLUSTERING, QUESTION_ANSWERING, FACT_VERIFICATION, CODE_RETRIEVAL_QUERY), `title` (RETRIEVAL_DOCUMENT only), `output_dimensionality`
- Default model: `gemini-embedding-001`
- `Response` helpers for embeddings: `#embedding`, `#embeddings`, `#embedding_dimension`, `#embedding_response?`
- Demos: `embeddings_demo.rb` / `embeddings_demo_ja.rb`

### Notes

- Verified Live model compatibility on the `bidiGenerateContent` endpoint: only the native-audio variants and `gemini-3.1-flash-live-preview` are deployed today. The latter requires `realtimeInput.text` (i.e., `Session#send_realtime_text`) and `AUDIO` modality. The `gemini-2.5-flash-live-preview` model name listed in the public tools docs is not yet deployed.
- `MessageBuilder.realtime_input` (legacy `mediaChunks` path) is documented as deprecated by the upstream API; prefer `realtime_text` going forward.

## [1.0.0] - 2026-01-28

### Added

- Thinking feature support for Gemini 2.5 and Gemini 3 models
- `thinking_budget` parameter for Gemini 2.5 (1-32768 tokens, -1 for dynamic, 0 to disable)
- `thinking_level` parameter for Gemini 3 (`:minimal`, `:low`, `:medium`, `:high`)
- Thought Signatures support for Function Calling with Thinking
- `FunctionCallingHelper.build_continuation` for automatic signature management
- Response methods: `thought_signatures`, `first_thought_signature`, `has_thought_signature?`
- Response helper methods: `thoughts_token_count`, `model_version`, `gemini_3?`

## [0.1.7] - 2026-01-13

- Remove dotenv dependency

## [0.1.6] - 2025-12-11

- Add support for video understanding
- Analyze local video files (Files API and inline data)
- Analyze YouTube videos
- Helper methods: describe, ask, extract_timestamps, analyze_segment
- Support for MP4, MPEG, MOV, AVI, FLV, WebM, WMV, 3GPP formats
- Change default model to gemini-2.5-flash

## [0.1.5] - 2025-11-13

- Add support for URL Context tool
- Add simplified method for accessing grounding search sources

## [0.1.4] - 2025-11-08

- Add support for grounding search

## [0.1.3] - 2025-10-09

- Add support for multi-image input

## [0.1.2] - 2025-07-10

- Add function calling

## [0.1.1] - 2025-05-04

- Changed generate_contents to accept temperature parameter

## [0.1.0] - 2025-04-05

- Initial release
data/README.md
CHANGED

@@ -1,10 +1,22 @@
[README ‐ 日本語](https://github.com/rira100000000/ruby-gemini-api/blob/main/README_ja.md)

# Ruby-Gemini-API

[](https://badge.fury.io/rb/ruby-gemini-api)
[](https://opensource.org/licenses/MIT)

A Ruby client library for Google's Gemini API. This gem provides a simple, intuitive interface for interacting with Gemini's generative AI capabilities, following patterns similar to other AI client libraries.

This project is inspired by and pays homage to [ruby-openai](https://github.com/alexrudall/ruby-openai), aiming to provide a familiar and consistent experience for Ruby developers working with Gemini's AI models.

## Why This Gem?

- **Familiar Interface**: API design inspired by ruby-openai for a smooth transition
- **Comprehensive Features**: Text generation, vision, audio, video, function calling, and more
- **Response Object**: Convenient wrapper for easy access to generated content
- **Streaming Support**: Real-time text generation with a block-based API
- **Thinking Support**: Built-in support for Gemini 2.5/3 thinking features
- **Production Ready**: Stable 1.0 release with thorough documentation

## Features

- Text generation with Gemini models
@@ -18,6 +30,8 @@ This project is inspired by and pays homage to [ruby-openai](https://github.com/
- Structured output with JSON schema and enum constraints
- Document processing (PDFs and other formats)
- Context caching for efficient processing
- Text embeddings (single and batch) with task type, title, and output dimensionality control
- Live API: real-time bidirectional conversations with text/audio/video and function calling (sync and async)

### Function Calling
@@ -94,6 +108,105 @@ puts "After deleting a function: #{all_tools.list_functions}"
# => After deleting a function: [:get_current_weather, :send_email]
```

### Thinking Feature

Gemini 2.5 and later models support the Thinking feature, which lets the model run an internal reasoning pass on complex problems to produce higher-quality answers.

#### Using with Gemini 2.5: `thinking_budget`

```ruby
require 'gemini'

client = Gemini::Client.new(ENV['GEMINI_API_KEY'])

# Specify thinking token count (1-32768)
response = client.generate_content(
  "Solve this complex math problem step by step",
  model: "gemini-2.5-flash",
  thinking_budget: 2048
)

puts "Thoughts token count: #{response.thoughts_token_count}"
puts "Answer: #{response.text}"

# Dynamic thinking (model decides automatically)
response = client.generate_content(
  "A difficult logic puzzle",
  model: "gemini-2.5-flash",
  thinking_budget: -1
)

# Disable thinking
response = client.generate_content(
  "Simple question",
  thinking_budget: 0
)
```

#### Using with Gemini 3: `thinking_level`

```ruby
# Specify thinking level (:minimal, :low, :medium, :high)
response = client.generate_content(
  "Complex analysis task",
  model: "gemini-3-flash-preview",
  thinking_level: :high
)
```

#### Function Calling with Thinking (Thought Signatures)

When using Function Calling with Gemini 3, Thought Signatures must be carried across turns. The library handles the signatures for you automatically.

```ruby
# Initial request
response = client.generate_content(
  "What's the weather in Tokyo?",
  tools: tools,
  thinking_level: :medium
)

# If function calls are present
if response.function_calls.any?
  # Execute the function
  weather_data = get_weather("Tokyo")

  # Build the continuation request (signatures are attached automatically)
  contents = Gemini::FunctionCallingHelper.build_continuation(
    original_contents: [{ role: "user", parts: [{ text: "What's the weather in Tokyo?" }] }],
    model_response: response,
    function_responses: [
      { name: "get_weather", response: weather_data }
    ]
  )

  # Continuation request
  final_response = client.generate_content(
    contents,
    tools: tools,
    thinking_level: :medium
  )

  puts final_response.text
end
```

#### Response Methods for Thinking

```ruby
# Get thoughts token count
response.thoughts_token_count  # => 150

# Get thought signatures (for Function Calling)
response.thought_signatures    # => ["base64encoded..."]
response.first_thought_signature
response.has_thought_signature?

# Check model version
response.model_version         # => "gemini-3-flash-preview"
response.gemini_3?             # => true
```

## Installation

Add this line to your application's Gemfile:
@@ -881,6 +994,275 @@ end

For a complete example of context caching, check out the `demo/document_cache_demo.rb` file.

### Live API (Real-time Conversations)

The Gemini Live API provides bidirectional, WebSocket-based real-time conversations with audio, video, and text support. The library wraps the protocol behind an event-driven `Gemini::Live::Session`.

#### Basic Audio Conversation

The default model (`gemini-2.5-flash-native-audio-preview-12-2025`) responds with audio. You receive Base64-encoded 24 kHz, 16-bit PCM chunks via the `:audio` event.

```ruby
require 'gemini'
require 'base64'

client = Gemini::Client.new(ENV['GEMINI_API_KEY'])

client.live.connect(
  response_modality: "AUDIO",
  voice_name: "Kore",
  system_instruction: "You are a helpful assistant. Be brief."
) do |session|
  setup_complete = false
  audio_chunks = []

  session.on(:setup_complete) { setup_complete = true }
  session.on(:audio) { |data, _mime| audio_chunks << Base64.decode64(data) }
  session.on(:turn_complete) { puts "[#{audio_chunks.sum(&:bytesize)} bytes]" }
  session.on(:error) { |e| puts "Error: #{e.message}" }

  sleep 0.05 until setup_complete

  session.send_realtime_text("What is the capital of Japan?")
  sleep 8
end
```

For text-only responses, see the note below about Live model availability.
#### Function Calling (Synchronous)

The Live API supports function calling. Define your tools, register a `:tool_call` handler, and reply with `session.send_tool_response`.

> **Note on Live model input format**
> Newer Live models such as `gemini-3.1-flash-live-preview` reject the
> legacy `clientContent.turns[]` payload that older models (including the
> native-audio variants) accept. Use `session.send_realtime_text(...)`
> instead of `session.send_text(...)`: it emits the universal
> `realtimeInput.text` form and works on every currently deployed Live
> model. The `gemini-2.5-flash-live-preview` model name listed in the
> public tools docs is not deployed on the `bidiGenerateContent` endpoint
> at the time of writing.

```ruby
require 'base64'

tools = [
  {
    functionDeclarations: [
      {
        name: "get_weather",
        description: "Get the current weather for a location",
        parameters: {
          type: "object",
          properties: {
            location: { type: "string", description: "City name" }
          },
          required: ["location"]
        }
      }
    ]
  }
]

audio_chunks = []

client.live.connect(
  response_modality: "AUDIO",
  voice_name: "Kore",
  tools: tools,
  system_instruction: "Use the available functions when asked about weather."
) do |session|
  session.on(:audio) { |data, _mime| audio_chunks << Base64.decode64(data) }

  session.on(:tool_call) do |function_calls|
    responses = function_calls.map do |call|
      result = case call[:name]
               when "get_weather"
                 { temperature: 22, condition: "sunny", location: call[:args]["location"] }
               end
      { id: call[:id], name: call[:name], response: result }
    end
    session.send_tool_response(responses)
  end

  sleep 0.5 # wait for setup
  session.send_realtime_text("What's the weather in Tokyo?")
  sleep 18
end

# audio_chunks now contains 24 kHz, 16-bit PCM mono audio of the spoken reply.
```

A complete example is in `demo/live_function_calling_demo.rb`.

#### Function Calling (Asynchronous / NON_BLOCKING)

According to the public tools docs, `gemini-2.5-flash-live-preview` supports asynchronous function calls (see the deployment note below). Mark a function declaration with `behavior: "NON_BLOCKING"` so the model can keep talking while the call runs, then control how the result is delivered back via `scheduling`.

```ruby
tools = [
  {
    functionDeclarations: [
      {
        name: "fetch_long_running_data",
        behavior: "NON_BLOCKING",
        description: "Slow data lookup",
        parameters: { type: "object", properties: {} }
      }
    ]
  }
]

session.on(:tool_call) do |function_calls|
  responses = function_calls.map do |call|
    {
      id: call[:id],
      name: call[:name],
      response: { result: "data ready" },
      scheduling: "INTERRUPT" # or "WHEN_IDLE", "SILENT"
    }
  end
  session.send_tool_response(responses)
end
```

`scheduling` can also be placed inside the `response:` hash directly. Valid values: `INTERRUPT`, `WHEN_IDLE`, `SILENT`. The library validates and uppercases the value automatically; an unknown value raises `ArgumentError`.
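The validation just described can be sketched in isolation. This is an illustrative reimplementation of the rule (accept String or Symbol in any case, uppercase it, reject unknown values), not the gem's actual `MessageBuilder` code:

```ruby
# Allowed NON_BLOCKING delivery modes for tool responses.
VALID_SCHEDULING = %w[INTERRUPT WHEN_IDLE SILENT].freeze

# Normalize a user-supplied scheduling value: uppercase it and
# raise ArgumentError on anything outside the allowed set.
def normalize_scheduling(value)
  normalized = value.to_s.upcase
  unless VALID_SCHEDULING.include?(normalized)
    raise ArgumentError, "invalid scheduling: #{value.inspect}"
  end
  normalized
end
```

So `normalize_scheduling(:when_idle)`, `normalize_scheduling("When_Idle")`, and `normalize_scheduling("WHEN_IDLE")` all yield `"WHEN_IDLE"`.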
#### Built-in Tools

Google Search grounding is supported in the Live API:

```ruby
client.live.connect(
  model: "gemini-2.5-flash-live-preview",
  tools: [{ google_search: {} }]
) do |session|
  # ...
end
```

#### Supported Live API Models for Tools

The public Live API tools docs list:

| Model | Sync Function Calling | Async (NON_BLOCKING) | Google Search |
|---|---|---|---|
| `gemini-2.5-flash-live-preview` | ✓ | ✓ | ✓ |
| `gemini-3.1-flash-live-preview` | ✓ | — | ✓ |

In practice, on the `bidiGenerateContent` endpoint at the time of writing:

- `gemini-3.1-flash-live-preview` is deployed and works with **AUDIO** response modality + tools, **but only when text input is sent via `session.send_realtime_text(...)`** (i.e., `realtimeInput.text`). It rejects the legacy `clientContent.turns[]` payload.
- `gemini-2.5-flash-native-audio-preview-12-2025` (the library default) is deployed and accepts both `send_realtime_text` and `send_text` (legacy `clientContent.turns[]`).
- `gemini-2.5-flash-live-preview` from the docs table is **not yet deployed**.

Once a TEXT-modality-capable Live model ships, the same code works with `response_modality: "TEXT"` and the `voice_name:` argument removed.

Demos available:

- `demo/live_text_demo.rb` - Live API text conversation
- `demo/live_audio_demo.rb` - Live API audio conversation
- `demo/live_function_calling_demo.rb` - Live API function calling

### Embeddings

You can generate text embeddings using the Gemini Embeddings API. Embeddings are vector representations of text that can be used for semantic similarity, classification, clustering, retrieval, and more.

#### Single Embedding

```ruby
require 'gemini'

client = Gemini::Client.new(ENV['GEMINI_API_KEY'])

response = client.embed_content(
  "What is the meaning of life?",
  model: "gemini-embedding-001"
)

if response.success?
  puts "Dimension: #{response.embedding_dimension}"
  puts "Vector (first 5 values): #{response.embedding.first(5).inspect}"
end
```

#### Batch Embeddings

Pass an Array of strings to embed multiple texts in a single batch request (uses `batchEmbedContents` under the hood):

```ruby
response = client.embed_content(
  [
    "I love programming in Ruby.",
    "Rubies are red gemstones.",
    "Python is also a programming language."
  ],
  model: "gemini-embedding-001",
  task_type: :semantic_similarity
)

response.embeddings.each_with_index do |values, i|
  puts "Embedding #{i}: dimension=#{values.size}"
end
```

#### Task Type, Title, and Output Dimensionality

You can specify a `task_type` to optimize the embedding for a particular downstream task. When `task_type: :retrieval_document` is used, you may also pass a `title`. Use `output_dimensionality` to truncate the vector length (recommended values: 768, 1536, 3072).

```ruby
response = client.embed_content(
  "Ruby is a dynamic, open-source programming language.",
  model: "gemini-embedding-001",
  task_type: :retrieval_document,
  title: "Ruby Overview",
  output_dimensionality: 768
)
```

Supported task types:

- `RETRIEVAL_QUERY`
- `RETRIEVAL_DOCUMENT`
- `SEMANTIC_SIMILARITY`
- `CLASSIFICATION`
- `CLUSTERING`
- `QUESTION_ANSWERING`
- `FACT_VERIFICATION`
- `CODE_RETRIEVAL_QUERY`

You can pass them as a String, Symbol, or in any case (e.g. `:retrieval_query`, `"RETRIEVAL_QUERY"`).

#### Direct Access via `embeddings_api`

For more control, you can call the embeddings API directly:

```ruby
# Single
client.embeddings_api.create(input: "Hello", model: "gemini-embedding-001")

# Batch
client.embeddings_api.batch_create(
  inputs: ["First", "Second", "Third"],
  model: "gemini-embedding-001",
  task_type: :clustering
)
```

#### Response Helpers

The Response object exposes a few helpers for embedding payloads:

```ruby
response.embedding           # First embedding values (Array of Floats)
response.embeddings          # All embedding value arrays (Array of Arrays)
response.embedding_dimension # Length of the first embedding vector
response.embedding_response? # true if the payload contains embedding data
```

A complete example is available in `demo/embeddings_demo.rb`.
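The vectors returned by these helpers can be compared directly, for example to rank batch results by semantic similarity. A minimal cosine-similarity sketch in plain Ruby (the `cosine_similarity` helper is illustrative, not part of the gem):

```ruby
# Cosine similarity between two equal-length embedding vectors:
# dot(a, b) / (|a| * |b|), in [-1.0, 1.0].
def cosine_similarity(a, b)
  dot    = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  dot / (norm_a * norm_b)
end

# e.g. score every batch embedding against the first one:
# sims = response.embeddings.map { |v| cosine_similarity(response.embedding, v) }
```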
### Structured Output with JSON Schema

You can request responses in structured JSON format by specifying a JSON schema:
@@ -1116,9 +1498,15 @@ The gem includes several demo applications that showcase its functionality:
- `demo/file_audio_demo.rb` - Audio transcription with large audio files
- `demo/structured_output_demo.rb` - Structured JSON output with schema
- `demo/enum_response_demo.rb` - Enum-constrained responses
- `demo/thinking_demo.rb` - Thinking feature (Gemini 2.5)
- `demo/thinking_gemini3_demo.rb` - Thinking feature (Gemini 3)
- `demo/document_chat_demo.rb` - Document processing
- `demo/document_conversation_demo.rb` - Conversation with documents
- `demo/document_cache_demo.rb` - Document caching
- `demo/embeddings_demo.rb` - Text embeddings (single and batch)
- `demo/live_text_demo.rb` - Live API text conversation
- `demo/live_audio_demo.rb` - Live API audio conversation
- `demo/live_function_calling_demo.rb` - Live API function calling

Run the demos with:
@@ -1159,6 +1547,12 @@ ruby demo/structured_output_demo.rb
# Enum-constrained responses
ruby demo/enum_response_demo.rb

# Thinking feature (Gemini 2.5)
ruby demo/thinking_demo.rb

# Thinking feature (Gemini 3)
ruby demo/thinking_gemini3_demo.rb

# Document processing
ruby demo/document_chat_demo.rb path/to/document.pdf
@@ -1167,6 +1561,9 @@ ruby demo/document_conversation_demo.rb path/to/document.pdf

# Document caching and querying
ruby demo/document_cache_demo.rb path/to/document.pdf

# Text embeddings (single and batch)
ruby demo/embeddings_demo.rb
```

## Models