ruby-gemini-api 1.0.0 → 1.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +43 -0
- data/README.md +455 -0
- data/lib/gemini/client.rb +68 -3
- data/lib/gemini/embeddings.rb +108 -17
- data/lib/gemini/live/configuration.rb +65 -0
- data/lib/gemini/live/connection.rb +83 -0
- data/lib/gemini/live/message_builder.rb +217 -0
- data/lib/gemini/live/session.rb +223 -0
- data/lib/gemini/live.rb +102 -0
- data/lib/gemini/response.rb +141 -4
- data/lib/gemini/tokens.rb +77 -0
- data/lib/gemini/tts.rb +83 -0
- data/lib/gemini/version.rb +1 -1
- data/lib/gemini.rb +3 -0
- metadata +23 -2
checksums.yaml
CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: cd1be2a2b81543d21686e9d4ade4de2e6aa42ea8b26abc4bb0929e9aa77fada0
+  data.tar.gz: 9b96121c0a8a68220e4e368057424211e29e0652f28f1812b56049842d31c5c9
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: be505ec75d011d31cd3c741924b2e845024d4bccdc36b3fbdfa63e0468771a45f7dd4abbc31ec6ade82ff4e992b46bd9e3aa65a06bd874a73a799f1cdb246bed
+  data.tar.gz: edcc02ff33b17e56a004836b2c55c7a133d883bc38c9f67893f3d3a59afb239bf7329454ef8278cf618e2d393867e445dc88bf539350999258c856f02c775b69
data/CHANGELOG.md
CHANGED

@@ -1,5 +1,48 @@
 ## [Unreleased]
 
+## [1.2.0] - 2026-05-14
+
+### Added
+
+- TTS (speech generation) API support
+  - `client.tts.generate(text, voice:)` and `client.generate_speech(text, voice:)` shortcut
+  - Single-speaker mode via `voice:` and multi-speaker mode via `multi_speaker: [{ speaker:, voice: }, ...]`
+  - 30 prebuilt voices exposed as `Gemini::TTS::VOICES`
+  - Default model `gemini-2.5-flash-preview-tts` (override via `model:`)
+  - `Response` helpers: `#audio_data`, `#audio_mime_type`, `#audio_response?`, `#save_audio(path)`, which auto-wraps L16 PCM in a RIFF/WAVE header
+  - Demos: `tts_demo.rb` / `tts_demo_ja.rb`
+- `countTokens` API support
+  - `client.tokens.count(input, ...)` and `client.count_tokens(input, ...)` shortcut
+  - Accepts String / Array / Hash inputs, a full `contents:` array, plus optional `system_instruction:`, `tools:`, `generation_config:`, `cached_content:` (auto-wraps the payload in a `generateContentRequest` when extra fields are present)
+  - `Response` helpers: `#count_tokens`, `#prompt_tokens_details`, `#cached_content_token_count`, `#count_tokens_response?`
+  - Demos: `count_tokens_demo.rb` / `count_tokens_demo_ja.rb`
+
+## [1.1.0] - 2026-04-29
+
+### Added
+
+- Live API support for real-time bidirectional audio/video/text conversations over WebSocket
+  - `Gemini::Live::Session` with an event-driven API (`:setup_complete`, `:text`, `:audio`, `:tool_call`, `:turn_complete`, `:interrupted`, `:usage_metadata`, `:session_resumption`, `:go_away`, `:close`, `:error`)
+  - `Gemini::Live::Configuration` with response modality, voice, system instruction, tools, context-window compression, session resumption, manual VAD, and output audio transcription
+  - `Gemini::Live::MessageBuilder` for setup, clientContent, realtimeInput, activity start/end, and tool response messages
+  - Live API audio demos: `live_audio_demo.rb` (low-latency streaming), `live_audio_simple.rb`
+  - Manual VAD (Voice Activity Detection) support via `automatic_activity_detection: false`
+- Live API function calling
+  - `Session#send_realtime_text(text)` - universal text input via `realtimeInput.text`, required by newer Live models such as `gemini-3.1-flash-live-preview`
+  - `MessageBuilder.realtime_text(text)` builder
+  - Async (NON_BLOCKING) function call support: `MessageBuilder.tool_response` validates and normalizes the `scheduling` field (`INTERRUPT`, `WHEN_IDLE`, `SILENT`), accepted either inside the response payload or as a top-level shortcut
+  - Demos: `live_function_calling_demo.rb` / `live_function_calling_demo_ja.rb`
+- Embeddings API support (`embedContent` and `batchEmbedContents`)
+  - `client.embeddings_api.create(input:, ...)` for single embeddings
+  - `client.embeddings_api.batch_create(inputs:, ...)` for batch embeddings
+  - `client.embed_content(input, ...)` shortcut that auto-routes Array inputs to batch
+  - Optional parameters: `task_type` (RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, SEMANTIC_SIMILARITY, CLASSIFICATION, CLUSTERING, QUESTION_ANSWERING, FACT_VERIFICATION, CODE_RETRIEVAL_QUERY), `title` (RETRIEVAL_DOCUMENT only), `output_dimensionality`
+  - Default model: `gemini-embedding-001`
+  - `Response` helpers for embeddings: `#embedding`, `#embeddings`, `#embedding_dimension`, `#embedding_response?`
+  - Demos: `embeddings_demo.rb` / `embeddings_demo_ja.rb`
+
+### Notes
+
+- Verified Live model compatibility on the `bidiGenerateContent` endpoint: only the native-audio variants and `gemini-3.1-flash-live-preview` are deployed today. The latter requires `realtimeInput.text` (i.e., `Session#send_realtime_text`) and the `AUDIO` modality. The `gemini-2.5-flash-live-preview` model name listed in the public tools docs is not yet deployed.
+- `MessageBuilder.realtime_input` (the legacy `mediaChunks` path) is documented as deprecated by the upstream API; prefer `realtime_text` going forward.
+
 ## [1.0.0] - 2026-01-28
 
 ### Added
data/README.md
CHANGED

@@ -30,6 +30,10 @@ This project is inspired by and pays homage to [ruby-openai](https://github.com/
 - Structured output with JSON schema and enum constraints
 - Document processing (PDFs and other formats)
 - Context caching for efficient processing
+- Text embeddings (single and batch) with task type, title, and output dimensionality control
+- Token counting (`countTokens`) for prompts, chat history, and full requests with system instruction / tools / cached content
+- Speech generation (TTS) with 30 prebuilt voices, single-speaker and multi-speaker modes, and one-line WAV file output
+- Live API: real-time bidirectional conversations with text/audio/video and function calling (sync and async)
 
 ### Function Calling
 
@@ -992,6 +996,450 @@ end
 
 For a complete example of context caching, check out the `demo/document_cache_demo.rb` file.
 
+### Live API (Real-time Conversations)
+
+The Gemini Live API provides bidirectional, WebSocket-based real-time conversations with audio, video, and text support. The library wraps the protocol behind an event-driven `Gemini::Live::Session`.
+
+#### Basic Audio Conversation
+
+The default model (`gemini-2.5-flash-native-audio-preview-12-2025`) responds with audio. You receive Base64-encoded 24 kHz 16-bit PCM chunks via the `:audio` event.
+
+```ruby
+require 'gemini'
+require 'base64'
+
+client = Gemini::Client.new(ENV['GEMINI_API_KEY'])
+
+client.live.connect(
+  response_modality: "AUDIO",
+  voice_name: "Kore",
+  system_instruction: "You are a helpful assistant. Be brief."
+) do |session|
+  setup_complete = false
+  audio_chunks = []
+
+  session.on(:setup_complete) { setup_complete = true }
+  session.on(:audio) { |data, _mime| audio_chunks << Base64.decode64(data) }
+  session.on(:turn_complete) { puts "[#{audio_chunks.sum(&:bytesize)} bytes]" }
+  session.on(:error) { |e| puts "Error: #{e.message}" }
+
+  sleep 0.05 until setup_complete
+
+  session.send_realtime_text("What is the capital of Japan?")
+  sleep 8
+end
+```
+
+For text-only responses, see the note below about Live model availability.
+
+#### Function Calling (Synchronous)
+
+The Live API supports function calling. Define your tools, register a `:tool_call` handler, and reply with `session.send_tool_response`.
+
+> **Note on Live model input format**
+> Newer Live models such as `gemini-3.1-flash-live-preview` reject the
+> legacy `clientContent.turns[]` payload that older models (including the
+> native-audio variants) accept. Use `session.send_realtime_text(...)`,
+> which emits the universal `realtimeInput.text` form and works on every
+> currently deployed Live model, instead of `session.send_text(...)`. The
+> `gemini-2.5-flash-live-preview` model name listed in the public tools
+> docs is not deployed on the `bidiGenerateContent` endpoint at the time
+> of writing.
+
+```ruby
+require 'base64'
+
+tools = [
+  {
+    functionDeclarations: [
+      {
+        name: "get_weather",
+        description: "Get the current weather for a location",
+        parameters: {
+          type: "object",
+          properties: {
+            location: { type: "string", description: "City name" }
+          },
+          required: ["location"]
+        }
+      }
+    ]
+  }
+]
+
+audio_chunks = []
+
+client.live.connect(
+  response_modality: "AUDIO",
+  voice_name: "Kore",
+  tools: tools,
+  system_instruction: "Use the available functions when asked about weather."
+) do |session|
+  session.on(:audio) { |data, _mime| audio_chunks << Base64.decode64(data) }
+
+  session.on(:tool_call) do |function_calls|
+    responses = function_calls.map do |call|
+      result = case call[:name]
+               when "get_weather"
+                 { temperature: 22, condition: "sunny", location: call[:args]["location"] }
+               end
+      { id: call[:id], name: call[:name], response: result }
+    end
+    session.send_tool_response(responses)
+  end
+
+  sleep 0.5 # wait for setup
+  session.send_realtime_text("What's the weather in Tokyo?")
+  sleep 18
+end
+
+# audio_chunks now contains 24 kHz, 16-bit PCM mono audio of the spoken reply.
+```
+
+A complete example is in `demo/live_function_calling_demo.rb`.
+
+#### Function Calling (Asynchronous / NON_BLOCKING)
+
+`gemini-2.5-flash-live-preview` supports asynchronous function calls. Mark a function declaration with `behavior: "NON_BLOCKING"` so the model can keep talking while the call runs, then control how the result is delivered back via `scheduling`.
+
+```ruby
+tools = [
+  {
+    functionDeclarations: [
+      {
+        name: "fetch_long_running_data",
+        behavior: "NON_BLOCKING",
+        description: "Slow data lookup",
+        parameters: { type: "object", properties: {} }
+      }
+    ]
+  }
+]
+
+session.on(:tool_call) do |function_calls|
+  responses = function_calls.map do |call|
+    {
+      id: call[:id],
+      name: call[:name],
+      response: { result: "data ready" },
+      scheduling: "INTERRUPT" # or "WHEN_IDLE", "SILENT"
+    }
+  end
+  session.send_tool_response(responses)
+end
+```
+
+`scheduling` can also be placed inside the `response:` hash directly. Valid values: `INTERRUPT`, `WHEN_IDLE`, `SILENT`. The library validates and uppercases the value automatically; an unknown value raises `ArgumentError`.
+
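That validation rule is simple enough to sketch in a few lines of plain Ruby. The following is an illustrative approximation of the behavior described above, not the gem's actual `MessageBuilder` source:

```ruby
# Illustrative sketch of the scheduling normalization described above
# (assumed behavior; the gem's real implementation may differ).
VALID_SCHEDULING = %w[INTERRUPT WHEN_IDLE SILENT].freeze

# Accepts a String or Symbol in any case, uppercases it, and rejects
# anything outside the three values the Live API understands.
def normalize_scheduling(value)
  normalized = value.to_s.upcase
  unless VALID_SCHEDULING.include?(normalized)
    raise ArgumentError, "invalid scheduling value: #{value.inspect}"
  end
  normalized
end
```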
+#### Built-in Tools
+
+Google Search grounding is supported in the Live API:
+
+```ruby
+client.live.connect(
+  model: "gemini-2.5-flash-live-preview",
+  tools: [{ google_search: {} }]
+) do |session|
+  # ...
+end
+```
+
+#### Supported Live API Models for Tools
+
+The public Live API tools docs list:
+
+| Model | Sync Function Calling | Async (NON_BLOCKING) | Google Search |
+|---|---|---|---|
+| `gemini-2.5-flash-live-preview` | ✓ | ✓ | ✓ |
+| `gemini-3.1-flash-live-preview` | ✓ | — | ✓ |
+
+In practice, on the `bidiGenerateContent` endpoint as of writing:
+
+- `gemini-3.1-flash-live-preview` is deployed and works with **AUDIO** response modality + tools, **but only when text input is sent via `session.send_realtime_text(...)`** (i.e., `realtimeInput.text`). It rejects the legacy `clientContent.turns[]` payload.
+- `gemini-2.5-flash-native-audio-preview-12-2025` (the library default) is deployed and accepts both `send_realtime_text` and `send_text` (legacy `clientContent.turns[]`).
+- `gemini-2.5-flash-live-preview` from the docs table is **not yet deployed**.
+
+Once a TEXT-modality-capable Live model ships, the same code works with `response_modality: "TEXT"` and the `voice_name:` argument removed.
+
+Demos available:
+
+- `demo/live_text_demo.rb` - Live API text conversation
+- `demo/live_audio_demo.rb` - Live API audio conversation
+- `demo/live_function_calling_demo.rb` - Live API function calling
+
+### Embeddings
+
+You can generate text embeddings using the Gemini Embeddings API. Embeddings are vector representations of text that can be used for semantic similarity, classification, clustering, retrieval, and more.
+
+#### Single Embedding
+
+```ruby
+require 'gemini'
+
+client = Gemini::Client.new(ENV['GEMINI_API_KEY'])
+
+response = client.embed_content(
+  "What is the meaning of life?",
+  model: "gemini-embedding-001"
+)
+
+if response.success?
+  puts "Dimension: #{response.embedding_dimension}"
+  puts "Vector (first 5 values): #{response.embedding.first(5).inspect}"
+end
+```
+
+#### Batch Embeddings
+
+Pass an Array of strings to embed multiple texts in a single batch request (uses `batchEmbedContents` under the hood):
+
+```ruby
+response = client.embed_content(
+  [
+    "I love programming in Ruby.",
+    "Rubies are red gemstones.",
+    "Python is also a programming language."
+  ],
+  model: "gemini-embedding-001",
+  task_type: :semantic_similarity
+)
+
+response.embeddings.each_with_index do |values, i|
+  puts "Embedding #{i}: dimension=#{values.size}"
+end
+```
+
+#### Task Type, Title, and Output Dimensionality
+
+You can specify a `task_type` to optimize the embedding for a particular downstream task. When `task_type: :retrieval_document` is used, you may also pass a `title`. Use `output_dimensionality` to truncate the vector length (recommended values: 768, 1536, 3072).
+
+```ruby
+response = client.embed_content(
+  "Ruby is a dynamic, open-source programming language.",
+  model: "gemini-embedding-001",
+  task_type: :retrieval_document,
+  title: "Ruby Overview",
+  output_dimensionality: 768
+)
+```
+
+Supported task types:
+
+- `RETRIEVAL_QUERY`
+- `RETRIEVAL_DOCUMENT`
+- `SEMANTIC_SIMILARITY`
+- `CLASSIFICATION`
+- `CLUSTERING`
+- `QUESTION_ANSWERING`
+- `FACT_VERIFICATION`
+- `CODE_RETRIEVAL_QUERY`
+
+You can pass them as a String or Symbol, in any case (e.g. `:retrieval_query`, `"RETRIEVAL_QUERY"`).
+
+#### Direct Access via `embeddings_api`
+
+For more control, you can call the embeddings API directly:
+
+```ruby
+# Single
+client.embeddings_api.create(input: "Hello", model: "gemini-embedding-001")
+
+# Batch
+client.embeddings_api.batch_create(
+  inputs: ["First", "Second", "Third"],
+  model: "gemini-embedding-001",
+  task_type: :clustering
+)
+```
+
+#### Response Helpers
+
+The Response object exposes a few helpers for embedding payloads:
+
+```ruby
+response.embedding           # First embedding values (Array of Floats)
+response.embeddings         # All embedding value arrays (Array of Arrays)
+response.embedding_dimension # Length of the first embedding vector
+response.embedding_response? # true if the payload contains embedding data
+```
+
+A complete example is available in `demo/embeddings_demo.rb`.
+
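Because the embedding helpers return plain Arrays of Floats, downstream similarity math needs no extra dependencies. A minimal cosine-similarity helper in plain Ruby (independent of the gem):

```ruby
# Cosine similarity between two embedding vectors (Arrays of Floats).
# Returns 1.0 for identical directions and 0.0 for orthogonal vectors.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end
```

With `task_type: :semantic_similarity`, pairs of texts that mean similar things should score noticeably higher than unrelated pairs.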
+### Token Counting
+
+Estimate how many tokens an input would consume before sending it to a generation endpoint. Useful for cost/quota planning and for staying within a model's context window.
+
+#### Basic Usage
+
+```ruby
+require 'gemini'
+
+client = Gemini::Client.new(ENV['GEMINI_API_KEY'])
+
+response = client.count_tokens("The quick brown fox jumps over the lazy dog.")
+
+puts response.count_tokens          # => 9 (totalTokens)
+puts response.prompt_tokens_details # => [{"modality"=>"TEXT", "tokenCount"=>9}]
+```
+
+By default the request goes to `gemini-2.5-flash`. Override it with `model:`:
+
+```ruby
+client.count_tokens("Hello", model: "gemini-2.5-pro")
+```
+
+#### Multi-turn Chat History
+
+Pass a fully formed `contents:` array (the same shape used by `generateContent`) to count tokens for an entire conversation:
+
+```ruby
+response = client.count_tokens(
+  contents: [
+    { role: "user", parts: [{ text: "Hi, my name is Bob." }] },
+    { role: "model", parts: [{ text: "Hi Bob!" }] },
+    { role: "user", parts: [{ text: "What's the weather like today?" }] }
+  ]
+)
+```
+
+#### With System Instruction, Tools, or Cached Content
+
+When you include `system_instruction:`, `tools:`, `generation_config:`, or `cached_content:`, the request is automatically wrapped as a `generateContentRequest` so the count reflects the full payload:
+
+```ruby
+response = client.count_tokens(
+  "What is the weather in Tokyo?",
+  system_instruction: "You are a concise weather assistant.",
+  tools: [
+    {
+      function_declarations: [
+        {
+          name: "get_weather",
+          description: "Get the current weather for a city.",
+          parameters: {
+            type: "object",
+            properties: { city: { type: "string" } },
+            required: ["city"]
+          }
+        }
+      ]
+    }
+  ]
+)
+
+puts response.count_tokens
+```
+
+#### Direct Access via `tokens`
+
+```ruby
+client.tokens.count("Hello", model: "gemini-2.5-flash")
+```
+
+#### Response Helpers
+
+```ruby
+response.count_tokens               # totalTokens from the API (Integer)
+response.prompt_tokens_details      # per-modality breakdown (Array<Hash>)
+response.cached_content_token_count # tokens reused from cachedContent (Integer)
+response.count_tokens_response?     # true if the payload is a countTokens response
+```
+
+A complete example is available in `demo/count_tokens_demo.rb`.
+
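The wrapping rule can be pictured as follows. This is an assumed sketch of the payload shape (the camelCase field names follow the public REST API; the gem's internal builder may differ):

```ruby
# Sketch of the countTokens payload rule: a bare contents array is sent
# as-is; when extra fields are present, everything is wrapped in a
# generateContentRequest (which the REST API requires to carry the model).
def build_count_tokens_payload(model:, contents:, system_instruction: nil, tools: nil)
  extras = { systemInstruction: system_instruction, tools: tools }.compact
  return { contents: contents } if extras.empty?

  { generateContentRequest: { model: "models/#{model}", contents: contents }.merge(extras) }
end
```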
+### Speech Generation (TTS)
+
+Generate spoken audio from text using Gemini's TTS preview models. The API returns 24 kHz, 16-bit, mono PCM (L16) audio; `Response#save_audio` wraps it in a RIFF/WAVE header so the result is directly playable.
+
+#### Single-Speaker
+
+```ruby
+require 'gemini'
+
+client = Gemini::Client.new(ENV['GEMINI_API_KEY'])
+
+response = client.generate_speech(
+  "Say cheerfully: Have a wonderful day!",
+  voice: "Kore"
+)
+
+if response.success?
+  response.save_audio("hello.wav")
+  puts response.audio_mime_type # => "audio/L16;codec=pcm;rate=24000"
+end
+```
+
+Phrase the prompt as an instruction to read text aloud (`Say ...:` / `Read the following:`); a bare phrase like `"Hello"` is treated as a chat message and rejected with a 400 error.
+
+#### Multi-Speaker
+
+Provide a `multi_speaker:` array to assign different voices to named speakers (up to 2 speakers in the current preview models). Reference the speakers by the same names in your prompt.
+
+```ruby
+script = <<~SCRIPT
+  TTS the following conversation between Joe and Jane:
+  Joe: How's it going today, Jane?
+  Jane: Not too bad, how about you?
+SCRIPT
+
+response = client.generate_speech(
+  script,
+  multi_speaker: [
+    { speaker: "Joe", voice: "Kore" },
+    { speaker: "Jane", voice: "Puck" }
+  ]
+)
+
+response.save_audio("dialogue.wav")
+```
+
+#### Style Control
+
+You can steer tone, pace, and emotion in two ways.
+
+**1. Natural-language instruction** - describe the delivery as part of the prompt.
+
+```ruby
+client.generate_speech(
+  "Read this in a soft whisper: I have a secret... and you must never tell anyone.",
+  voice: "Zephyr"
+)
+```
+
+**2. Inline bracket tag** - put a directive like `[whispers]`, `[excited]`, `[laughs]`, `[sighs]`, `[shouting]`, etc. at the start of the text to apply that style to what follows.
+
+```ruby
+client.generate_speech(
+  "[whispers] I have a secret... and you must never tell anyone.",
+  voice: "Zephyr"
+)
+```
+
+Stick to **one style per call**: switching style mid-prompt (e.g. `[whispers] ... [excited] ...`) tends to leave the second segment in the first style or drop it entirely. If you need multiple styles, call `generate_speech` once per sentence and concatenate the audio yourself.
+
+#### Models and Voices
+
+- Default model: `gemini-2.5-flash-preview-tts` (override via `model:`)
+- Other models: `gemini-2.5-pro-preview-tts`, `gemini-3.1-flash-tts-preview`
+- 30 prebuilt voices are listed in `Gemini::TTS::VOICES` (Zephyr, Puck, Charon, Kore, Fenrir, Leda, Orus, Aoede, …). Unknown names raise `ArgumentError` at build time.
+
+#### Direct Access via `tts`
+
+```ruby
+client.tts.generate("Say hello.", voice: "Kore")
+```
+
+#### Response Helpers
+
+```ruby
+response.audio_data       # Base64-encoded PCM payload
+response.audio_mime_type  # e.g. "audio/L16;codec=pcm;rate=24000"
+response.audio_response?  # true if the payload contains audio inlineData
+response.save_audio(path) # writes a playable .wav file and returns the path
+```
+
+A complete example is available in `demo/tts_demo.rb`.
+
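The RIFF/WAVE wrapping that `save_audio` is described as performing amounts to prepending a fixed 44-byte header to the raw PCM. A standalone sketch in plain Ruby (illustrative only; the gem's actual implementation may differ):

```ruby
# Wrap raw 24 kHz, 16-bit, mono PCM bytes in a canonical 44-byte WAV header.
def wrap_pcm_in_wav(pcm_bytes, sample_rate: 24_000, channels: 1, bits_per_sample: 16)
  data = pcm_bytes.b # treat input as raw bytes
  byte_rate   = sample_rate * channels * bits_per_sample / 8
  block_align = channels * bits_per_sample / 8
  header  = "RIFF".b + [36 + data.bytesize].pack("V") + "WAVE".b
  header += "fmt ".b + [16, 1, channels, sample_rate, byte_rate,
                        block_align, bits_per_sample].pack("VvvVVvv")
  header += "data".b + [data.bytesize].pack("V")
  header + data
end
```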
 ### Structured Output with JSON Schema
 
 You can request responses in structured JSON format by specifying a JSON schema:
 
@@ -1232,6 +1680,10 @@ The gem includes several demo applications that showcase its functionality:
 - `demo/document_chat_demo.rb` - Document processing
 - `demo/document_conversation_demo.rb` - Conversation with documents
 - `demo/document_cache_demo.rb` - Document caching
+- `demo/embeddings_demo.rb` - Text embeddings (single and batch)
+- `demo/live_text_demo.rb` - Live API text conversation
+- `demo/live_audio_demo.rb` - Live API audio conversation
+- `demo/live_function_calling_demo.rb` - Live API function calling
 
 Run the demos with:
 
@@ -1286,6 +1738,9 @@ ruby demo/document_conversation_demo.rb path/to/document.pdf
 
 # Document caching and querying
 ruby demo/document_cache_demo.rb path/to/document.pdf
+
+# Text embeddings (single and batch)
+ruby demo/embeddings_demo.rb
 ```
 
 ## Models
data/lib/gemini/client.rb
CHANGED

@@ -70,6 +70,56 @@ module Gemini
       @cached_content ||= Gemini::CachedContent.new(client: self)
     end
 
+    # Live API accessor
+    def live
+      @live ||= Gemini::Live.new(client: self)
+    end
+
+    # Embeddings API accessor
+    def embeddings_api
+      @embeddings_api ||= Gemini::Embeddings.new(client: self)
+    end
+
+    # Token counting API accessor
+    def tokens
+      @tokens ||= Gemini::Tokens.new(client: self)
+    end
+
+    # TTS (speech generation) API accessor
+    def tts
+      @tts ||= Gemini::TTS.new(client: self)
+    end
+
+    # Convenience wrapper for TTS speech generation.
+    def generate_speech(text, voice: nil, multi_speaker: nil, model: Gemini::TTS::DEFAULT_MODEL,
+                        speech_config: nil, **parameters)
+      tts.generate(
+        text,
+        voice: voice,
+        multi_speaker: multi_speaker,
+        model: model,
+        speech_config: speech_config,
+        **parameters
+      )
+    end
+
+    # Convenience wrapper for countTokens.
+    # input can be a String, Array of parts/strings, Hash, or omitted when contents: is given.
+    def count_tokens(input = nil, model: Gemini::Tokens::DEFAULT_MODEL, contents: nil,
+                     system_instruction: nil, tools: nil, generation_config: nil,
+                     cached_content: nil, **parameters)
+      tokens.count(
+        input,
+        model: model,
+        contents: contents,
+        system_instruction: system_instruction,
+        tools: tools,
+        generation_config: generation_config,
+        cached_content: cached_content,
+        **parameters
+      )
+    end
+
     def reset_headers
       @extra_headers = {}
     end
@@ -112,10 +162,25 @@ module Gemini
       end
     end
 
-    #
+    # Generate embeddings for the given input.
+    # input can be a String (single embed) or Array of Strings (batch embed).
+    # Supports task_type, title (RETRIEVAL_DOCUMENT only), and output_dimensionality.
+    def embed_content(input, model: Gemini::Embeddings::DEFAULT_MODEL, task_type: nil,
+                      title: nil, output_dimensionality: nil, **parameters)
+      embeddings_api.create(
+        input: input,
+        model: model,
+        task_type: task_type,
+        title: title,
+        output_dimensionality: output_dimensionality,
+        **parameters
+      )
+    end
+
+    # Method corresponding to OpenAI's embeddings (kept for compatibility)
     def embeddings(parameters: {})
-      model = parameters.delete(:model) ||
-      path = "models/#{model}:embedContent"
+      model = parameters.delete(:model) || Gemini::Embeddings::DEFAULT_MODEL
+      path = "models/#{model.to_s.delete_prefix("models/")}:embedContent"
       response = json_post(path: path, parameters: parameters)
       Gemini::Response.new(response)
    end
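The path fix in the `embeddings` hunk is worth noting: `delete_prefix` makes the endpoint tolerant of callers that pass either a bare model name or one already prefixed with `models/`. A standalone sketch of that normalization (illustrative, extracted from the diff's logic):

```ruby
# Normalize a model name into an embedContent path.
# Accepts "gemini-embedding-001" or "models/gemini-embedding-001" alike.
def embed_path(model)
  "models/#{model.to_s.delete_prefix("models/")}:embedContent"
end
```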