ruby-gemini-api 1.1.0 → 1.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 31487006e959d8d9a755743f6471b54e075a1bc91aa36718274203deb9fda84d
-   data.tar.gz: bc7dbbbea933ed2343b2ec1a32d3cf9b03fe41a4494801c47e5d83842bf31603
+   metadata.gz: 325e8812a1eb58643280cdfba86f4a8a9a8e3659d882d8fb5fc5e4ac2025a35c
+   data.tar.gz: 1bcf959e036341fe71a41a693accc9f0d85a0fe790f43811a2af7ae3b455e10b
  SHA512:
-   metadata.gz: cc033c4ab711800c56f2f1d8884c38a74359d7c3b66fb66ed39ea03b23b98c3748d08208dc560e3276ac3d6e25660ad3a2c5aef69ef25b523bbd6512a5b5d246
-   data.tar.gz: 998ca95babf2803241a9d0a01c00eefe7fcbd37990723ffc9378668bc30f5529a90daa92b20919e5c0e681f68f4e8a5d4206e66520eeec46da9aa18c169fc9be
+   metadata.gz: 243dd3d0f56a645305e1345de08ca7512088e0d7ae2dfd45f2b5747614c469ed35523991da8cb7d4bef08f060b0cbfb666903e0b5c262e18cebece66405753b9
+   data.tar.gz: fa2234b64394200ab6039010f88d5b9ac97c70e4e4266943b0ee57848e7f0141bebe3a50929d315bb54b4ddd88c990c07d3a26cf54f904b1e89727a53a4c8a12
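The SHA256 entries above are plain hex digests of the packaged archives. A minimal sketch of checking one of them, assuming you have already extracted `data.tar.gz` from the `.gem` file; the helper name `sha256_matches?` is hypothetical and not part of RubyGems or this gem:

```ruby
require "digest"

# Hypothetical helper: compare the SHA256 hex digest of some bytes
# against the value recorded in checksums.yaml.
def sha256_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.downcase
end

# Usage sketch (the file path is an assumption; adjust it to where you
# unpacked the gem):
#   bytes = File.binread("data.tar.gz")
#   sha256_matches?(bytes, "1bcf959e036341fe71a41a693accc9f0d85a0fe790f43811a2af7ae3b455e10b")
```

The same pattern should work for the SHA512 entries by swapping in `Digest::SHA512`.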
data/CHANGELOG.md CHANGED
@@ -1,5 +1,25 @@
  ## [Unreleased]
  
+ ### Added
+ - Code Execution shortcut support via `code_execution: true` on `generate_content` / `generate_content_stream`
+ - `Response` helpers for Code Execution results: `#code_execution?`, `#executable_code`, `#executable_codes`, `#code_execution_output`, `#code_execution_outcome`, `#code_execution_success?`, `#code_execution_results`
+
+ ## [1.2.0] - 2026-05-14
+
+ ### Added
+ - TTS (speech generation) API support
+   - `client.tts.generate(text, voice:)` and `client.generate_speech(text, voice:)` shortcut
+   - Single-speaker mode via `voice:` and multi-speaker mode via `multi_speaker: [{ speaker:, voice: }, ...]`
+   - 30 prebuilt voices exposed as `Gemini::TTS::VOICES`
+   - Default model `gemini-2.5-flash-preview-tts` (override via `model:`)
+   - `Response` helpers: `#audio_data`, `#audio_mime_type`, `#audio_response?`, `#save_audio(path)` which auto-wraps L16 PCM in a RIFF/WAVE header
+   - Demos: `tts_demo.rb` / `tts_demo_ja.rb`
+ - `countTokens` API support
+   - `client.tokens.count(input, ...)` and `client.count_tokens(input, ...)` shortcut
+   - Accepts String / Array / Hash inputs, full `contents:` array, plus optional `system_instruction:`, `tools:`, `generation_config:`, `cached_content:` (auto-wraps payload in `generateContentRequest` when extra fields are present)
+   - `Response` helpers: `#count_tokens`, `#prompt_tokens_details`, `#cached_content_token_count`, `#count_tokens_response?`
+   - Demos: `count_tokens_demo.rb` / `count_tokens_demo_ja.rb`
+
  ## [1.1.0] - 2026-04-29
  
  ### Added
data/README.md CHANGED
@@ -31,7 +31,10 @@ This project is inspired by and pays homage to [ruby-openai](https://github.com/
  - Document processing (PDFs and other formats)
  - Context caching for efficient processing
  - Text embeddings (single and batch) with task type, title, and output dimensionality control
+ - Token counting (`countTokens`) for prompts, chat history, and full requests with system instruction / tools / cached content
+ - Speech generation (TTS) with 30 prebuilt voices, single-speaker and multi-speaker modes, and one-line WAV file output
  - Live API: real-time bidirectional conversations with text/audio/video and function calling (sync and async)
+ - Code Execution: let the model generate and run Python code, then inspect generated code and execution results
  
  ### Function Calling
  
@@ -108,11 +111,65 @@ puts "After deleting a function: #{all_tools.list_functions}"
  # => After deleting a function: [:get_current_weather, :send_email]
  ```
  
+ ### Code Execution
+
+ Pass `code_execution: true` to let Gemini generate and run Python code when it helps answer calculation or data-processing tasks.
+
+ ```ruby
+ require 'gemini'
+
+ client = Gemini::Client.new(ENV['GEMINI_API_KEY'])
+
+ response = client.generate_content(
+   "Calculate the sum of the first 50 prime numbers and show the code you used",
+   model: "gemini-3.5-flash",
+   code_execution: true
+ )
+
+ puts response.text
+
+ if response.code_execution?
+   puts response.executable_code
+   puts response.code_execution_outcome
+   puts response.code_execution_output
+ end
+ ```
+
+ You can also inspect all generated code and execution results:
+
+ ```ruby
+ response.executable_codes # => [{"language"=>"PYTHON", "code"=>"..."}]
+ response.code_execution_results # => [{"outcome"=>"OUTCOME_OK", "output"=>"..."}]
+ response.code_execution_success? # => true
+ ```
+
+ Code Execution also works with image inputs. For Gemini 3 models, combine it with `thinking_level` when you want the model to inspect an image with code.
+
+ ```ruby
+ response = client.generate_content(
+   [
+     { type: "text", text: "Read the small numbers in this image" },
+     { type: "image_file", image_file: { file_path: "meter.jpg" } }
+   ],
+   model: "gemini-3.5-flash",
+   code_execution: true,
+   thinking_level: :medium
+ )
+ ```
+
+ A complete example is available in `demo/code_execution_demo.rb`.
+
  ### Thinking Feature
  
  Gemini 2.5 and later models support the Thinking feature, which enables the model to perform internal reasoning processes for complex problems to generate higher-quality answers.
  
- #### Using with Gemini 2.5: `thinking_budget`
+ #### Deprecation notice: thinking and sampling controls
+
+ For Gemini 3 and later models, prefer `thinking_level` (`:minimal`, `:low`, `:medium`, `:high`) instead of `thinking_budget`. The `thinking_budget` option remains available for Gemini 2.5 compatibility, but it should be treated as a legacy control when targeting newer models.
+
+ Sampling parameters such as `temperature`, `top_p`, and `top_k` are also considered legacy tuning knobs for newer Gemini models. Existing code can continue to pass them through for backward compatibility, but new integrations should rely on model defaults and Thinking controls first.
+
+ #### Legacy Gemini 2.5 usage: `thinking_budget`
  
  ```ruby
  require 'gemini'
@@ -143,7 +200,7 @@ response = client.generate_content(
  )
  ```
  
- #### Using with Gemini 3: `thinking_level`
+ #### Recommended Gemini 3 usage: `thinking_level`
  
  ```ruby
  # Specify thinking level (:minimal, :low, :medium, :high)
@@ -1263,6 +1320,181 @@ response.embedding_response? # true if the payload contains embedding data
  
  A complete example is available in `demo/embeddings_demo.rb`.
  
+ ### Token Counting
+
+ Estimate how many tokens an input would consume before sending it to a generation endpoint. Useful for cost/quota planning and for staying within a model's context window.
+
+ #### Basic Usage
+
+ ```ruby
+ require 'gemini'
+
+ client = Gemini::Client.new(ENV['GEMINI_API_KEY'])
+
+ response = client.count_tokens("The quick brown fox jumps over the lazy dog.")
+
+ puts response.count_tokens # => 9 (totalTokens)
+ puts response.prompt_tokens_details # => [{"modality"=>"TEXT", "tokenCount"=>9}]
+ ```
+
+ By default the request goes to `gemini-2.5-flash`. Override it with `model:`:
+
+ ```ruby
+ client.count_tokens("Hello", model: "gemini-2.5-pro")
+ ```
+
+ #### Multi-turn Chat History
+
+ Pass a fully formed `contents:` array (the same shape used by `generateContent`) to count tokens for an entire conversation:
+
+ ```ruby
+ response = client.count_tokens(
+   contents: [
+     { role: "user", parts: [{ text: "Hi, my name is Bob." }] },
+     { role: "model", parts: [{ text: "Hi Bob!" }] },
+     { role: "user", parts: [{ text: "What's the weather like today?" }] }
+   ]
+ )
+ ```
+
+ #### With System Instruction, Tools, or Cached Content
+
+ When you include `system_instruction:`, `tools:`, `generation_config:`, or `cached_content:`, the request is automatically wrapped as a `generateContentRequest` so the count reflects the full payload:
+
+ ```ruby
+ response = client.count_tokens(
+   "What is the weather in Tokyo?",
+   system_instruction: "You are a concise weather assistant.",
+   tools: [
+     {
+       function_declarations: [
+         {
+           name: "get_weather",
+           description: "Get the current weather for a city.",
+           parameters: {
+             type: "object",
+             properties: { city: { type: "string" } },
+             required: ["city"]
+           }
+         }
+       ]
+     }
+   ]
+ )
+
+ puts response.count_tokens
+ ```
+
+ #### Direct Access via `tokens`
+
+ ```ruby
+ client.tokens.count("Hello", model: "gemini-2.5-flash")
+ ```
+
+ #### Response Helpers
+
+ ```ruby
+ response.count_tokens # totalTokens from the API (Integer)
+ response.prompt_tokens_details # per-modality breakdown (Array<Hash>)
+ response.cached_content_token_count # tokens reused from cachedContent (Integer)
+ response.count_tokens_response? # true if the payload is a countTokens response
+ ```
+
+ A complete example is available in `demo/count_tokens_demo.rb`.
+
+ ### Speech Generation (TTS)
+
+ Generate spoken audio from text using Gemini's TTS preview models. The API returns 24 kHz, 16-bit, mono PCM (L16) audio; `Response#save_audio` wraps it in a RIFF/WAVE header so the result is directly playable.
+
+ #### Single-Speaker
+
+ ```ruby
+ require 'gemini'
+
+ client = Gemini::Client.new(ENV['GEMINI_API_KEY'])
+
+ response = client.generate_speech(
+   "Say cheerfully: Have a wonderful day!",
+   voice: "Kore"
+ )
+
+ if response.success?
+   response.save_audio("hello.wav")
+   puts response.audio_mime_type # => "audio/L16;codec=pcm;rate=24000"
+ end
+ ```
+
+ Phrase the prompt as an instruction to read text aloud (`Say ...:` / `Read the following:`); a bare phrase like `"Hello"` is treated as a chat message and rejected with a 400 error.
+
+ #### Multi-Speaker
+
+ Provide a `multi_speaker:` array to assign different voices to named speakers (up to 2 speakers in the current preview models). Reference the speakers by the same names in your prompt.
+
+ ```ruby
+ script = <<~SCRIPT
+   TTS the following conversation between Joe and Jane:
+   Joe: How's it going today, Jane?
+   Jane: Not too bad, how about you?
+ SCRIPT
+
+ response = client.generate_speech(
+   script,
+   multi_speaker: [
+     { speaker: "Joe", voice: "Kore" },
+     { speaker: "Jane", voice: "Puck" }
+   ]
+ )
+
+ response.save_audio("dialogue.wav")
+ ```
+
+ #### Style Control
+
+ You can steer tone, pace, and emotion in two ways.
+
+ **1. Natural-language instruction** — describe the delivery as part of the prompt.
+
+ ```ruby
+ client.generate_speech(
+   "Read this in a soft whisper: I have a secret... and you must never tell anyone.",
+   voice: "Zephyr"
+ )
+ ```
+
+ **2. Inline bracket tag** — put a directive like `[whispers]`, `[excited]`, `[laughs]`, `[sighs]`, `[shouting]`, etc. at the start of the text to apply that style to what follows.
+
+ ```ruby
+ client.generate_speech(
+   "[whispers] I have a secret... and you must never tell anyone.",
+   voice: "Zephyr"
+ )
+ ```
+
+ Stick to **one style per call**: switching style mid-prompt (e.g. `[whispers] ... [excited] ...`) tends to leave the second segment in the first style or drop it entirely. If you need multiple styles, call `generate_speech` once per sentence and concatenate the audio yourself.
+
+ #### Models and Voices
+
+ - Default model: `gemini-2.5-flash-preview-tts` (override via `model:`)
+ - Other models: `gemini-2.5-pro-preview-tts`, `gemini-3.1-flash-tts-preview`
+ - 30 prebuilt voices are listed in `Gemini::TTS::VOICES` (Zephyr, Puck, Charon, Kore, Fenrir, Leda, Orus, Aoede, …). Unknown names raise `ArgumentError` at build time.
+
+ #### Direct Access via `tts`
+
+ ```ruby
+ client.tts.generate("Say hello.", voice: "Kore")
+ ```
+
+ #### Response Helpers
+
+ ```ruby
+ response.audio_data # Base64-encoded PCM payload
+ response.audio_mime_type # e.g. "audio/L16;codec=pcm;rate=24000"
+ response.audio_response? # true if the payload contains audio inlineData
+ response.save_audio(path) # writes a playable .wav file and returns the path
+ ```
+
+ A complete example is available in `demo/tts_demo.rb`.
+
  ### Structured Output with JSON Schema
  
  You can request responses in structured JSON format by specifying a JSON schema:
@@ -1498,6 +1730,7 @@ The gem includes several demo applications that showcase its functionality:
  - `demo/file_audio_demo.rb` - Audio transcription with large audio files
  - `demo/structured_output_demo.rb` - Structured JSON output with schema
  - `demo/enum_response_demo.rb` - Enum-constrained responses
+ - `demo/code_execution_demo.rb` - Code Execution with generated Python code and execution output
  - `demo/thinking_demo.rb` - Thinking feature (Gemini 2.5)
  - `demo/thinking_gemini3_demo.rb` - Thinking feature (Gemini 3)
  - `demo/document_chat_demo.rb` - Document processing
@@ -1547,6 +1780,12 @@ ruby demo/structured_output_demo.rb
  # Enum-constrained responses
  ruby demo/enum_response_demo.rb
  
+ # Code Execution
+ ruby demo/code_execution_demo.rb
+
+ # Code Execution with a different model
+ GEMINI_MODEL=gemini-3.5-pro ruby demo/code_execution_demo.rb
+
  # Thinking feature (Gemini 2.5)
  ruby demo/thinking_demo.rb
  
@@ -1568,11 +1807,16 @@ ruby demo/embeddings_demo.rb
  
  ## Models
  
- The library supports various Gemini models:
+ Model names can be passed as strings. Common examples:
  
+ - `gemini-3.5-flash`
+ - `gemini-3.5-pro`
+ - `gemini-3-flash-preview`
  - `gemini-2.5-flash`
  - `gemini-2.5-pro`
  
+ Use `client.models.list` to check models available to your API key.
+
  ## Requirements
  
  - Ruby 3.0 or higher
@@ -1589,4 +1833,4 @@ The library supports various Gemini models:
  
  ## License
  
- The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
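The README's advice to call `generate_speech` once per style and concatenate the audio yourself can be sketched as follows. This assumes every call returns raw L16 PCM at the same sample rate, so the decoded byte strings can simply be joined; `concat_pcm` is a hypothetical helper and not part of the gem:

```ruby
# Hypothetical helper: join several Base64-encoded PCM payloads
# (for example the values returned by Response#audio_data)
# into one raw PCM byte string.
def concat_pcm(base64_chunks)
  # "m0" is strict Base64 decoding; it raises ArgumentError on malformed input.
  base64_chunks.map { |chunk| chunk.unpack1("m0") }.join
end
```

The joined string is still headerless PCM, so it would need a RIFF/WAVE header before most players accept it; `Response#save_audio` only does that for a single response.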
data/lib/gemini/client.rb CHANGED
@@ -80,6 +80,46 @@ module Gemini
        @embeddings_api ||= Gemini::Embeddings.new(client: self)
      end
  
+     # Token counting API accessor
+     def tokens
+       @tokens ||= Gemini::Tokens.new(client: self)
+     end
+
+     # TTS (speech generation) API accessor
+     def tts
+       @tts ||= Gemini::TTS.new(client: self)
+     end
+
+     # Convenience wrapper for TTS speech generation.
+     def generate_speech(text, voice: nil, multi_speaker: nil, model: Gemini::TTS::DEFAULT_MODEL,
+                         speech_config: nil, **parameters)
+       tts.generate(
+         text,
+         voice: voice,
+         multi_speaker: multi_speaker,
+         model: model,
+         speech_config: speech_config,
+         **parameters
+       )
+     end
+
+     # Convenience wrapper for countTokens.
+     # input can be a String, Array of parts/strings, Hash, or omitted when contents: is given.
+     def count_tokens(input = nil, model: Gemini::Tokens::DEFAULT_MODEL, contents: nil,
+                      system_instruction: nil, tools: nil, generation_config: nil,
+                      cached_content: nil, **parameters)
+       tokens.count(
+         input,
+         model: model,
+         contents: contents,
+         system_instruction: system_instruction,
+         tools: tools,
+         generation_config: generation_config,
+         cached_content: cached_content,
+         **parameters
+       )
+     end
+
      def reset_headers
        @extra_headers = {}
      end
@@ -162,6 +202,7 @@ module Gemini
      def generate_content(prompt, model: "gemini-2.5-flash", system_instruction: nil,
                           response_mime_type: nil, response_schema: nil, temperature: 0.5, tools: nil,
                           url_context: false, google_search: false,
+                          code_execution: false,
                           thinking_budget: nil, thinking_level: nil,
                           **parameters, &stream_callback)
        content = format_content(prompt)
@@ -190,7 +231,12 @@ module Gemini
        end
  
        # Handle tool shortcuts
-       tools = build_tools_array(tools, url_context: url_context, google_search: google_search)
+       tools = build_tools_array(
+         tools,
+         url_context: url_context,
+         google_search: google_search,
+         code_execution: code_execution
+       )
        params[:tools] = tools if tools && !tools.empty?
  
        params.merge!(parameters)
@@ -205,7 +251,8 @@ module Gemini
      # Streaming text generation
      def generate_content_stream(prompt, model: "gemini-2.5-flash", system_instruction: nil,
                                  response_mime_type: nil, response_schema: nil, temperature: 0.5,
-                                 url_context: false, google_search: false, **parameters, &block)
+                                 url_context: false, google_search: false, code_execution: false,
+                                 **parameters, &block)
        raise ArgumentError, "Block is required for streaming" unless block_given?
  
        content = format_content(prompt)
@@ -230,7 +277,12 @@ module Gemini
        params[:generation_config]["temperature"] = temperature
  
        # Handle tool shortcuts
-       tools = build_tools_array(nil, url_context: url_context, google_search: google_search)
+       tools = build_tools_array(
+         nil,
+         url_context: url_context,
+         google_search: google_search,
+         code_execution: code_execution
+       )
        params[:tools] = tools if tools && !tools.empty?
  
        # Merge other parameters
@@ -495,7 +547,7 @@ module Gemini
      end
  
      # Build tools array from explicit tools parameter and shortcuts
-     def build_tools_array(tools, url_context: false, google_search: false)
+     def build_tools_array(tools, url_context: false, google_search: false, code_execution: false)
        result_tools = []
  
        # Add existing tools if provided
@@ -511,6 +563,9 @@ module Gemini
        # Add google_search tool if requested
        result_tools << { google_search: {} } if google_search
  
+       # Add code_execution tool if requested
+       result_tools << { code_execution: {} } if code_execution
+
        # Remove duplicates based on tool keys and return
        return nil if result_tools.empty?
        result_tools.uniq { |tool| tool.keys.first }
@@ -645,4 +700,4 @@ module Gemini
        end
      end
    end
- end
+ end
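The tool-shortcut merging in `build_tools_array` can be illustrated with a standalone sketch. The diff elides how explicit `tools` are appended, so the handling of that argument below is an assumption, and `merge_tools_sketch` is a hypothetical name rather than the gem's method:

```ruby
# Sketch of merging explicit tools with boolean shortcuts and
# de-duplicating by each tool's first key, as the diff's final lines do.
def merge_tools_sketch(tools = nil, google_search: false, code_execution: false)
  # Assumption: explicit tools arrive as an Array of Hashes (or a single Hash).
  result = tools.is_a?(Array) ? tools.dup : [tools].compact

  result << { google_search: {} } if google_search
  result << { code_execution: {} } if code_execution

  return nil if result.empty?

  # Keep only the first tool for each top-level key.
  result.uniq { |tool| tool.keys.first }
end
```

With this shape, passing an explicit `{ code_execution: {} }` tool together with `code_execution: true` would yield a single entry rather than two.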
data/lib/gemini/response.rb CHANGED
@@ -37,13 +37,135 @@ module Gemini
  
        parts.select { |part| part.key?("text") }.map { |part| part["text"] }
      end
+
+     # Get all executableCode parts returned by the Code Execution tool.
+     def executable_codes
+       return [] unless valid?
+
+       parts.map { |part| part["executableCode"] || part["executable_code"] }.compact
+     end
+
+     # Get the first generated code string from a Code Execution response.
+     def executable_code
+       code_part = executable_codes.first
+       return nil unless code_part
+
+       code_part["code"] || code_part[:code]
+     end
+
+     # Get all codeExecutionResult parts returned by the Code Execution tool.
+     def code_execution_results
+       return [] unless valid?
+
+       parts.map { |part| part["codeExecutionResult"] || part["code_execution_result"] }.compact
+     end
+
+     # Get the first execution output string from a Code Execution response.
+     def code_execution_output
+       result_part = code_execution_results.first
+       return nil unless result_part
+
+       result_part["output"] || result_part[:output]
+     end
+
+     # Get the first execution outcome (for example "OUTCOME_OK").
+     def code_execution_outcome
+       result_part = code_execution_results.first
+       return nil unless result_part
+
+       result_part["outcome"] || result_part[:outcome]
+     end
+
+     # True when the first Code Execution result completed successfully.
+     def code_execution_success?
+       code_execution_outcome == "OUTCOME_OK"
+     end
+
+     # True if the response contains generated code or execution results.
+     def code_execution?
+       !executable_codes.empty? || !code_execution_results.empty?
+     end
  
      # Get image parts (if any)
      def image_parts
        return [] unless valid?
-
+
        parts.select { |part| part.key?("inline_data") && part["inline_data"]["mime_type"].start_with?("image/") }
      end
+
+     # Get the first audio inlineData part (TTS responses use camelCase "inlineData")
+     def audio_part
+       return nil unless valid?
+
+       parts.find do |part|
+         data_key = part["inlineData"] || part["inline_data"]
+         next false unless data_key
+         mt = data_key["mimeType"] || data_key["mime_type"]
+         mt.is_a?(String) && mt.start_with?("audio/")
+       end
+     end
+
+     # Base64-encoded audio data from a TTS response
+     def audio_data
+       part = audio_part
+       return nil unless part
+       data_key = part["inlineData"] || part["inline_data"]
+       data_key["data"]
+     end
+
+     # MIME type of the audio payload (e.g. "audio/L16;codec=pcm;rate=24000")
+     def audio_mime_type
+       part = audio_part
+       return nil unless part
+       data_key = part["inlineData"] || part["inline_data"]
+       data_key["mimeType"] || data_key["mime_type"]
+     end
+
+     # True if the response contains audio inlineData
+     def audio_response?
+       !audio_part.nil?
+     end
+
+     # Save audio to a file. PCM (L16) payloads are wrapped in a WAV header so
+     # the result is directly playable; other audio MIME types are written as-is.
+     # Returns the written file path or nil if no audio is present.
+     def save_audio(filepath)
+       data_b64 = audio_data
+       return nil unless data_b64
+
+       require 'base64'
+       raw = Base64.strict_decode64(data_b64)
+       mime = audio_mime_type.to_s
+
+       if mime.include?("L16") || mime.include?("pcm")
+         rate = mime[/rate=(\d+)/, 1]&.to_i || 24000
+         channels = 1
+         bits_per_sample = 16
+         byte_rate = rate * channels * bits_per_sample / 8
+         block_align = channels * bits_per_sample / 8
+         data_size = raw.bytesize
+
+         header = +""
+         header << "RIFF"
+         header << [36 + data_size].pack("V")
+         header << "WAVE"
+         header << "fmt "
+         header << [16].pack("V")
+         header << [1].pack("v")
+         header << [channels].pack("v")
+         header << [rate].pack("V")
+         header << [byte_rate].pack("V")
+         header << [block_align].pack("v")
+         header << [bits_per_sample].pack("v")
+         header << "data"
+         header << [data_size].pack("V")
+
+         File.binwrite(filepath, header + raw)
+       else
+         File.binwrite(filepath, raw)
+       end
+       filepath
+     end
  
      # Get all content with string representation
      def full_content
@@ -73,7 +195,8 @@ module Gemini
        !@raw_data.nil? &&
          ((@raw_data.key?("candidates") && !@raw_data["candidates"].empty?) ||
           (@raw_data.key?("predictions") && !@raw_data["predictions"].empty?) ||
-          embedding_response?)
+          embedding_response? ||
+          count_tokens_response?)
      end
  
      # Check if the raw response contains embedding data
@@ -231,6 +354,28 @@ module Gemini
      def total_tokens
        usage&.dig("totalTokens") || 0
      end
+
+     # Check whether this response is a countTokens API result
+     def count_tokens_response?
+       !@raw_data.nil? && @raw_data.key?("totalTokens") &&
+         !@raw_data.key?("candidates") && !@raw_data.key?("predictions") &&
+         !embedding_response?
+     end
+
+     # Total tokens reported by the countTokens API (top-level totalTokens)
+     def count_tokens
+       @raw_data&.dig("totalTokens")
+     end
+
+     # Cached content token count reported by countTokens
+     def cached_content_token_count
+       @raw_data&.dig("cachedContentTokenCount") || 0
+     end
+
+     # Per-modality token breakdown reported by countTokens
+     def prompt_tokens_details
+       @raw_data&.dig("promptTokensDetails") || []
+     end
  
      # Process chunks for streaming responses
      def stream_chunks
@@ -553,4 +698,4 @@ module Gemini
        end
      end
    end
- end
+ end
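The 44-byte RIFF/WAVE header that `save_audio` assembles field by field can also be produced with a single `pack` call. This is a minimal sketch under the same assumptions as the diff (uncompressed PCM, defaults of 24 kHz, mono, 16-bit); `wav_header` is a hypothetical helper name:

```ruby
# Build a canonical 44-byte WAV header for uncompressed PCM audio.
# All integers are little-endian: "V" is 32-bit, "v" is 16-bit, "a4" is a 4-byte tag.
def wav_header(data_size, rate: 24_000, channels: 1, bits_per_sample: 16)
  block_align = channels * bits_per_sample / 8 # bytes per sample frame
  byte_rate = rate * block_align               # bytes per second

  [
    "RIFF", 36 + data_size, "WAVE",            # RIFF chunk descriptor
    "fmt ", 16, 1, channels, rate,             # fmt chunk: size 16, format 1 = PCM
    byte_rate, block_align, bits_per_sample,
    "data", data_size                          # data chunk header
  ].pack("a4Va4a4VvvVVvva4V")
end
```

Prepending `wav_header(raw.bytesize)` to the decoded PCM bytes should give the same file layout as the step-by-step version in `save_audio`.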
data/lib/gemini/tokens.rb ADDED
@@ -0,0 +1,77 @@
+ module Gemini
+   class Tokens
+     DEFAULT_MODEL = "gemini-2.5-flash".freeze
+
+     def initialize(client:)
+       @client = client
+     end
+
+     # Count tokens for the given input.
+     #
+     # input: String, Array of parts/contents, or Hash. Optional when `contents:` is given.
+     # contents: full Array of Content objects (overrides input).
+     # system_instruction: String or Content hash.
+     # tools: Array of tool definitions (passed via generateContentRequest form).
+     # generation_config: Hash forwarded as generationConfig.
+     # cached_content: cachedContents/* resource name.
+     def count(input = nil, model: DEFAULT_MODEL, contents: nil, system_instruction: nil,
+               tools: nil, generation_config: nil, cached_content: nil, **parameters)
+       normalized_model = normalize_model(model)
+
+       payload = build_payload(
+         model: normalized_model,
+         input: input,
+         contents: contents,
+         system_instruction: system_instruction,
+         tools: tools,
+         generation_config: generation_config,
+         cached_content: cached_content
+       ).merge(parameters)
+
+       response = @client.json_post(
+         path: "models/#{normalized_model}:countTokens",
+         parameters: payload
+       )
+       Gemini::Response.new(response)
+     end
+
+     private
+
+     def build_payload(model:, input:, contents:, system_instruction:, tools:, generation_config:, cached_content:)
+       resolved_contents = contents || [format_content(input)]
+
+       # Use generateContentRequest form when extra request fields are present
+       if system_instruction || tools || generation_config || cached_content
+         # model is required inside the nested GenerateContentRequest
+         gc_request = { model: "models/#{model}", contents: resolved_contents }
+         gc_request[:systemInstruction] = format_content(system_instruction) if system_instruction
+         gc_request[:tools] = tools if tools
+         gc_request[:generationConfig] = generation_config if generation_config
+         gc_request[:cachedContent] = cached_content if cached_content
+         { generateContentRequest: gc_request }
+       else
+         { contents: resolved_contents }
+       end
+     end
+
+     def format_content(input)
+       case input
+       when nil
+         raise ArgumentError, "input or contents parameter is required"
+       when String
+         { parts: [{ text: input }] }
+       when Array
+         { parts: input.map { |part| part.is_a?(String) ? { text: part } : part } }
+       when Hash
+         input.key?(:parts) || input.key?("parts") ? input : { parts: [input] }
+       else
+         { parts: [{ text: input.to_s }] }
+       end
+     end
+
+     def normalize_model(model)
+       model_str = model.to_s
+       model_str.start_with?("models/") ? model_str.delete_prefix("models/") : model_str
+     end
+   end
+ end
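The two request shapes that `build_payload` chooses between can be seen in isolation with a reduced sketch. It handles only String input plus `system_instruction:` and `tools:`, and `count_tokens_payload` is a hypothetical standalone function, not part of the gem:

```ruby
# Reduced sketch of the countTokens payload selection:
# a plain `contents` body, or a nested `generateContentRequest`
# when extra request fields are present.
def count_tokens_payload(text, model: "gemini-2.5-flash", system_instruction: nil, tools: nil)
  contents = [{ parts: [{ text: text }] }]
  return { contents: contents } unless system_instruction || tools

  # The nested request carries its own fully qualified model name.
  request = { model: "models/#{model}", contents: contents }
  request[:systemInstruction] = { parts: [{ text: system_instruction }] } if system_instruction
  request[:tools] = tools if tools
  { generateContentRequest: request }
end
```

Calling it with only a string returns the flat form, while adding `system_instruction:` switches to the wrapped form, which is why the resulting count can reflect the full request.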
data/lib/gemini/tts.rb ADDED
@@ -0,0 +1,83 @@
+ module Gemini
+   class TTS
+     DEFAULT_MODEL = "gemini-2.5-flash-preview-tts".freeze
+
+     # 30 prebuilt voice names available for the prebuiltVoiceConfig
+     VOICES = %w[
+       Zephyr Puck Charon Kore Fenrir Leda Orus Aoede Callirrhoe Autonoe
+       Enceladus Iapetus Umbriel Algieba Despina Erinome Algenib Rasalgethi
+       Laomedeia Achernar Alnilam Schedar Gacrux Pulcherrima Achird
+       Zubenelgenubi Vindemiatrix Sadachbia Sadaltager Sulafat
+     ].freeze
+
+     def initialize(client:)
+       @client = client
+     end
+
+     # Generate speech audio from text.
+     #
+     # text: prompt String (use style cues / bracket tags like [excited] for control,
+     #       or "Speaker 1: ... Speaker 2: ..." for multi-speaker).
+     # voice: a single voice name (prebuiltVoiceConfig). Mutually exclusive with multi_speaker.
+     # multi_speaker: Array of { speaker:, voice: } Hashes for multi-speaker output.
+     # model: TTS preview model name. Defaults to gemini-2.5-flash-preview-tts.
+     # speech_config: raw speechConfig Hash override (skips voice/multi_speaker handling).
+     def generate(text, voice: nil, multi_speaker: nil, model: DEFAULT_MODEL,
+                  speech_config: nil, **parameters)
+       raise ArgumentError, "text is required" if text.nil? || text.to_s.empty?
+       if voice && multi_speaker
+         raise ArgumentError, "voice and multi_speaker are mutually exclusive"
+       end
+
+       resolved_speech_config = speech_config || build_speech_config(voice: voice, multi_speaker: multi_speaker)
+       raise ArgumentError, "voice, multi_speaker, or speech_config is required" unless resolved_speech_config
+
+       payload = {
+         contents: [{ parts: [{ text: text }] }],
+         generationConfig: {
+           responseModalities: ["AUDIO"],
+           speechConfig: resolved_speech_config
+         }
+       }
+
+       payload.merge!(parameters) if parameters && !parameters.empty?
+
+       response = @client.json_post(
+         path: "models/#{normalize_model(model)}:generateContent",
+         parameters: payload
+       )
+       Gemini::Response.new(response)
+     end
+
+     private
+
+     def build_speech_config(voice:, multi_speaker:)
+       if multi_speaker
+         speaker_voice_configs = multi_speaker.map do |entry|
+           speaker = entry[:speaker] || entry["speaker"]
+           v = entry[:voice] || entry["voice"]
+           raise ArgumentError, "multi_speaker entries require :speaker and :voice" unless speaker && v
+           validate_voice!(v)
+           {
+             speaker: speaker,
+             voiceConfig: { prebuiltVoiceConfig: { voiceName: v } }
+           }
+         end
+         { multiSpeakerVoiceConfig: { speakerVoiceConfigs: speaker_voice_configs } }
+       elsif voice
+         validate_voice!(voice)
+         { voiceConfig: { prebuiltVoiceConfig: { voiceName: voice } } }
+       end
+     end
+
+     def validate_voice!(voice)
+       return if VOICES.include?(voice.to_s)
+       raise ArgumentError, "Unknown voice '#{voice}'. Available voices: #{VOICES.join(', ')}"
+     end
+
+     def normalize_model(model)
+       model_str = model.to_s
+       model_str.start_with?("models/") ? model_str.delete_prefix("models/") : model_str
+     end
+   end
+ end
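To make the two `speechConfig` shapes easier to compare, here is a standalone sketch of what `build_speech_config` returns for single-speaker and multi-speaker input. Voice validation and String-keyed entries are left out for brevity, and `speech_config_sketch` is a hypothetical name:

```ruby
# Sketch of the speechConfig hash for single- and multi-speaker requests.
# Returns nil when neither option is given, mirroring the diff's behavior.
def speech_config_sketch(voice: nil, multi_speaker: nil)
  if multi_speaker
    configs = multi_speaker.map do |entry|
      {
        speaker: entry[:speaker],
        voiceConfig: { prebuiltVoiceConfig: { voiceName: entry[:voice] } }
      }
    end
    { multiSpeakerVoiceConfig: { speakerVoiceConfigs: configs } }
  elsif voice
    { voiceConfig: { prebuiltVoiceConfig: { voiceName: voice } } }
  end
end
```

Because the result is `nil` when no voice information is supplied, the caller can raise a single `ArgumentError` for the missing-configuration case, as `generate` does.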
data/lib/gemini/version.rb CHANGED
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
  
  module Gemini
-   VERSION = "1.1.0"
+   VERSION = "1.3.0"
  end
data/lib/gemini.rb CHANGED
@@ -12,6 +12,8 @@ require_relative "gemini/threads"
  require_relative "gemini/messages"
  require_relative "gemini/runs"
  require_relative "gemini/embeddings"
+ require_relative "gemini/tokens"
+ require_relative "gemini/tts"
  require_relative "gemini/audio"
  require_relative "gemini/files"
  require_relative "gemini/images"
metadata CHANGED
@@ -1,13 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: ruby-gemini-api
  version: !ruby/object:Gem::Version
-   version: 1.1.0
+   version: 1.3.0
  platform: ruby
  authors:
  - rira100000000
+ autorequire:
  bindir: bin
  cert_chain: []
- date: 1980-01-02 00:00:00.000000000 Z
+ date: 2026-06-11 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: faraday
@@ -180,7 +181,9 @@ files:
  - lib/gemini/response.rb
  - lib/gemini/runs.rb
  - lib/gemini/threads.rb
+ - lib/gemini/tokens.rb
  - lib/gemini/tool_definition.rb
+ - lib/gemini/tts.rb
  - lib/gemini/version.rb
  - lib/gemini/video.rb
  - lib/ruby/gemini.rb
@@ -192,6 +195,7 @@ metadata:
    source_code_uri: https://github.com/rira100000000/ruby-gemini-api
    changelog_uri: https://github.com/rira100000000/ruby-gemini-api/blob/main/CHANGELOG.md
    rubygems_mfa_required: 'true'
+ post_install_message:
  rdoc_options: []
  require_paths:
  - lib
@@ -206,7 +210,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
      - !ruby/object:Gem::Version
        version: '0'
  requirements: []
- rubygems_version: 3.6.9
+ rubygems_version: 3.3.26
+ signing_key:
  specification_version: 4
  summary: Ruby client for Google's Gemini API
  test_files: []