omniai-google 2.6.5 → 2.7.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: a1f3c06628af28183c2e2224b794ed0f105b136208dc06231ad9fc43e77254e5
- data.tar.gz: f267290388bc2a067f202d0da53581c0142363b69d99d47c6a4e1d5ef722f720
+ metadata.gz: eb0ebf5227f6a88f24418703712cccfad44d0fa41251f7e84e279211e45bf8aa
+ data.tar.gz: 465f216908801d1c440bf6bb22f1d15df8a39558d3cb2ec91bae290ff959f48d
  SHA512:
- metadata.gz: 4f013c574fc90c3fe31dccdc4ac424020908866fb8efd0b027c3d92b6e901b2a7656bfba527a37af80af17be9b33e42f1b6caa3b9a784f80e678558e1bbeee53
- data.tar.gz: '08dea5482ac8679c3d382ecf1830f634e85e652a94f470654ebcb5468898ec30473b7be6047535a056475542d30e1ef0f29dbba851fc1f252050ececfeb3af1a'
+ metadata.gz: 7c2dacf677dc673f2b3c2b8ccf439f59be0cd901b23c886d40b2f8f04e11edd1574616ca2a07a59f4c4406d5bb5e06d5aca0285ba45481b22aafb552efdc5741
+ data.tar.gz: 704d882c87af6d51800d86eaef592b1bf84e749686ad0129d7b837a7e8b800417ec1c2757c7d3728f796d0c53b640175b2eb13e369d8a179028f6f9e48fa064b
data/README.md CHANGED
@@ -58,6 +58,8 @@ OmniAI::Google.configure do |config|
  end
  ```

+ **Note for Transcription**: When using transcription features, ensure your service account has the necessary permissions for the Google Cloud Speech-to-Text API and for Google Cloud Storage (used for automatic file uploads). See the [GCS Setup](#gcs-setup-for-transcription) section below for detailed configuration.
+
  Credentials may be configured using:

  1. A `File` / `String` / `Pathname`.
@@ -143,6 +145,204 @@ end

  [Google API Reference `stream`](https://ai.google.dev/gemini-api/docs/api-overview#stream)

+ ### Transcribe
+
+ Audio files can be transcribed using Google's Speech-to-Text API. The implementation automatically selects synchronous or asynchronous recognition based on file size and model type.
+
+ #### Basic Usage
+
+ ```ruby
+ # Transcribe a local audio file
+ result = client.transcribe("path/to/audio.mp3")
+ result.text # "Hello, this is the transcribed text..."
+
+ # Transcribe with a specific model
+ result = client.transcribe("path/to/audio.mp3", model: "latest_long")
+ result.text # "Hello, this is the transcribed text..."
+ ```
+
+ #### Multi-Language Detection
+
+ Transcription automatically detects multiple languages when no language is specified:
+
+ ```ruby
+ # Auto-detect English and Spanish
+ result = client.transcribe("bilingual_audio.mp3", model: "latest_long")
+ result.text # "Hello, how are you? Hola, ¿cómo estás?"
+
+ # Specify expected languages explicitly
+ result = client.transcribe("audio.mp3", language: ["en-US", "es-US"], model: "latest_long")
+ ```
+
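+ Chirp models handle language detection natively: when no language is specified, the request defaults to `"auto"` rather than a fixed language list. A minimal sketch:
+
+ ```ruby
+ # Chirp resolves the spoken language automatically
+ result = client.transcribe("audio.mp3", model: OmniAI::Google::Transcribe::Model::CHIRP_2)
+ result.text
+ ```
+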
+ #### Detailed Transcription with Timestamps
+
+ Use the `VERBOSE_JSON` format to get per-segment timing, confidence scores, and language detection:
+
+ ```ruby
+ result = client.transcribe("audio.mp3",
+   model: "latest_long",
+   format: OmniAI::Transcribe::Format::VERBOSE_JSON
+ )
+
+ # Access the full transcript
+ result.text # "Complete transcribed text..."
+
+ # Access detailed segment information
+ result.segments.each do |segment|
+   puts "Segment #{segment[:segment_id]}: #{segment[:text]}"
+   puts "Language: #{segment[:language_code]}"
+   puts "Confidence: #{segment[:confidence]}"
+   puts "End time: #{segment[:end_time]}"
+
+   # Word-level timing (if available)
+   segment[:words].each do |word|
+     puts "  #{word[:word]} (#{word[:start_time]} - #{word[:end_time]})"
+   end
+ end
+
+ # Total audio duration
+ puts "Total duration: #{result.total_duration}"
+ ```
+
+ #### Models
+
+ Several models are available, each optimized for a different use case:
+
+ ```ruby
+ # For short audio (< 60 seconds)
+ client.transcribe("short_audio.mp3", model: OmniAI::Google::Transcribe::Model::LATEST_SHORT)
+
+ # For long-form audio (> 60 seconds) - automatically uses async processing
+ client.transcribe("long_audio.mp3", model: OmniAI::Google::Transcribe::Model::LATEST_LONG)
+
+ # For phone/telephony audio
+ client.transcribe("phone_call.mp3", model: OmniAI::Google::Transcribe::Model::TELEPHONY_LONG)
+
+ # For medical conversations
+ client.transcribe("medical_interview.mp3", model: OmniAI::Google::Transcribe::Model::MEDICAL_CONVERSATION)
+
+ # Other available models
+ client.transcribe("audio.mp3", model: OmniAI::Google::Transcribe::Model::CHIRP_2) # Enhanced model
+ client.transcribe("audio.mp3", model: OmniAI::Google::Transcribe::Model::CHIRP) # Universal model
+ ```
+
+ **Available Model Constants:**
+ - `OmniAI::Google::Transcribe::Model::LATEST_SHORT` - Optimized for audio < 60 seconds
+ - `OmniAI::Google::Transcribe::Model::LATEST_LONG` - Optimized for long-form audio
+ - `OmniAI::Google::Transcribe::Model::TELEPHONY_SHORT` - For short phone calls
+ - `OmniAI::Google::Transcribe::Model::TELEPHONY_LONG` - For long phone calls
+ - `OmniAI::Google::Transcribe::Model::MEDICAL_CONVERSATION` - For medical conversations
+ - `OmniAI::Google::Transcribe::Model::MEDICAL_DICTATION` - For medical dictation
+ - `OmniAI::Google::Transcribe::Model::CHIRP_2` - Enhanced universal model
+ - `OmniAI::Google::Transcribe::Model::CHIRP` - Universal model
+
+ #### Supported Formats
+
+ - **Input**: MP3, WAV, FLAC, and other common audio formats
+ - **GCS URIs**: Direct transcription from Google Cloud Storage (see the example below)
+ - **File uploads**: Automatic upload to GCS for files > 10MB or long-form models
+
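+ Audio already stored in Google Cloud Storage can be transcribed in place by passing the `gs://` URI directly; no upload is performed (the bucket and object below are placeholders):
+
+ ```ruby
+ # Transcribe straight from GCS - the file is used where it lives
+ result = client.transcribe("gs://my-bucket/recordings/meeting.mp3", model: "latest_long")
+ result.text
+ ```
+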
+ #### Advanced Features
+
+ **Automatic Processing Selection:**
+ - Audio under 60 seconds: uses synchronous recognition
+ - Audio over 60 seconds or long-form models: uses asynchronous batch recognition
+ - Large files: automatically uploaded to Google Cloud Storage
+
+ **GCS Integration:**
+ - Automatic file upload and cleanup
+ - Support for existing GCS URIs
+ - Configurable bucket names
+
+ **Error Handling:**
+ - Automatic retry logic for temporary failures
+ - Clear error messages for common issues
+ - Graceful handling of network timeouts
+
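+ Recognition settings can also be applied globally via the `transcribe_options` config accessor added in this version. A sketch: options are merged verbatim into the recognition config, so keys use the API's camelCase field names:
+
+ ```ruby
+ OmniAI::Google.configure do |config|
+   # Merged into every recognition config built by this library
+   config.transcribe_options = { languageCodes: %w[en-GB] }
+ end
+ ```
+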
263
+
264
+ #### GCS Setup for Transcription
265
+
266
+ For transcription to work properly with automatic file uploads, you need to set up Google Cloud Storage and configure the appropriate permissions.
267
+
268
+ ##### 1. Create a GCS Bucket
269
+
270
+ You must create a bucket named `{project_id}-speech-audio` manually before using transcription features:
271
+
272
+ ```bash
273
+ # Using gcloud CLI
274
+ gsutil mb gs://your-project-id-speech-audio
275
+
276
+ # Or create via Google Cloud Console
277
+ # Navigate to Cloud Storage > Browser > Create Bucket
278
+ ```
279
+
280
+ ##### 2. Service Account Permissions
281
+
282
+ Your service account needs the following IAM roles for transcription to work:
283
+
284
+ **Required Roles:**
285
+ - **Cloud Speech Editor** - Grants access to edit resources in Speech-to-Text
286
+ - **Storage Bucket Viewer** - Grants permission to view buckets and their metadata, excluding IAM policies
287
+ - **Storage Object Admin** - Grants full control over objects, including listing, creating, viewing, and deleting objects
288
+
289
+ **To assign roles via gcloud CLI:**
290
+
291
+ ```bash
292
+ # Replace YOUR_SERVICE_ACCOUNT_EMAIL and YOUR_PROJECT_ID with actual values
293
+ SERVICE_ACCOUNT="your-service-account@your-project-id.iam.gserviceaccount.com"
294
+ PROJECT_ID="your-project-id"
295
+
296
+ # Grant Speech-to-Text permissions
297
+ gcloud projects add-iam-policy-binding $PROJECT_ID \
298
+ --member="serviceAccount:$SERVICE_ACCOUNT" \
299
+ --role="roles/speech.editor"
300
+
301
+ # Grant Storage permissions
302
+ gcloud projects add-iam-policy-binding $PROJECT_ID \
303
+ --member="serviceAccount:$SERVICE_ACCOUNT" \
304
+ --role="roles/storage.objectAdmin"
305
+
306
+ gcloud projects add-iam-policy-binding $PROJECT_ID \
307
+ --member="serviceAccount:$SERVICE_ACCOUNT" \
308
+ --role="roles/storage.legacyBucketReader"
309
+ ```
310
+
311
+ **Or via Google Cloud Console:**
312
+ 1. Go to IAM & Admin > IAM
313
+ 2. Find your service account
314
+ 3. Click "Edit Principal"
315
+ 4. Add the required roles listed above
316
+
317
+ ##### 3. Enable Required APIs
318
+
319
+ Ensure the following APIs are enabled in your Google Cloud Project:
320
+
321
+ ```bash
322
+ # Enable Speech-to-Text API
323
+ gcloud services enable speech.googleapis.com
324
+
325
+ # Enable Cloud Storage API
326
+ gcloud services enable storage.googleapis.com
327
+ ```
328
+
329
+ ##### 4. Bucket Configuration (Optional)
330
+
331
+ You can customize the bucket name by configuring it in your application:
332
+
333
+ ```ruby
334
+ # Custom bucket name in your transcription calls
335
+ # The bucket must exist and your service account must have access
336
+ client.transcribe("audio.mp3", bucket_name: "my-custom-audio-bucket")
337
+ ```
338
+
339
+ **Important Notes:**
340
+ - The default bucket name follows the pattern: `{project_id}-speech-audio`
341
+ - You must create the bucket manually before using transcription features
342
+ - Choose an appropriate region for your bucket based on your location and compliance requirements
343
+ - Audio files are automatically deleted after successful transcription
344
+ - If transcription fails, temporary files may remain and should be cleaned up manually
345
+
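+ A cleanup sketch using the `google-cloud-storage` gem; the bucket name and `audio_` prefix follow this library's upload defaults, and the project ID is a placeholder:
+
+ ```ruby
+ require "google/cloud/storage"
+
+ storage = Google::Cloud::Storage.new(project_id: "your-project-id")
+ bucket  = storage.bucket("your-project-id-speech-audio")
+
+ # Uploaded audio is named audio_<timestamp>_<suffix>.<ext>; only run this
+ # when no transcriptions are in flight.
+ bucket.files(prefix: "audio_").all.each(&:delete)
+ ```
+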
  ### Embed

  Text can be converted into a vector embedding for similarity comparison usage via:
data/lib/omniai/google/bucket.rb ADDED
@@ -0,0 +1,115 @@
+ # frozen_string_literal: true
+
+ require "google/cloud/storage"
+ require "securerandom"
+
+ module OmniAI
+   module Google
+     # Uploads audio files to Google Cloud Storage for transcription.
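+     #
+     # Usage (a sketch - the path and the returned URI are illustrative):
+     #
+     #   uri = OmniAI::Google::Bucket.process!(client: client, io: "path/to/audio.mp3")
+     #   uri # => "gs://my-project-speech-audio/audio_20250616_120000_ab12cd34.mp3"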
+     class Bucket
+       class UploadError < StandardError; end
+
+       # @param client [Client]
+       # @param io [IO, String]
+       # @param bucket_name [String] optional - bucket name (defaults to project_id-speech-audio)
+       def self.process!(client:, io:, bucket_name: nil)
+         new(client:, io:, bucket_name:).process!
+       end
+
+       # @param client [Client]
+       # @param io [File, String]
+       # @param bucket_name [String] optional - bucket name
+       def initialize(client:, io:, bucket_name: nil)
+         @client = client
+         @io = io
+         @bucket_name = bucket_name || default_bucket_name
+       end
+
+       # @raise [UploadError]
+       #
+       # @return [String] GCS URI (gs://bucket/object)
+       def process!
+         # Create storage client with same credentials as main client
+         credentials = @client.instance_variable_get(:@credentials)
+         storage = ::Google::Cloud::Storage.new(
+           project_id:,
+           credentials:
+         )
+
+         # Get bucket (don't auto-create if it doesn't exist)
+         bucket = storage.bucket(@bucket_name)
+         unless bucket
+           raise UploadError, "Bucket '#{@bucket_name}' not found. " \
+                              "Please create it manually or ensure the service account has access."
+         end
+
+         # Generate unique filename
+         timestamp = Time.now.strftime("%Y%m%d_%H%M%S")
+         random_suffix = SecureRandom.hex(4)
+         filename = "audio_#{timestamp}_#{random_suffix}.#{file_extension}"
+
+         # Upload file - create StringIO for binary content
+         content = file_content
+         if content.is_a?(String) && content.include?("\0")
+           # Binary content - wrap in StringIO
+           require "stringio"
+           content = StringIO.new(content)
+         end
+
+         bucket.create_file(content, filename)
+
+         # Return GCS URI
+         "gs://#{@bucket_name}/#{filename}"
+       rescue ::Google::Cloud::Error => e
+         raise UploadError, "Failed to upload to GCS: #{e.message}"
+       end
+
+       private
+
+       # @return [String]
+       def project_id
+         @client.instance_variable_get(:@project_id) ||
+           raise(ArgumentError, "project_id is required for GCS upload")
+       end
+
+       # @return [String]
+       def location_id
+         @client.instance_variable_get(:@location_id) || "global"
+       end
+
+       # @return [String]
+       def default_bucket_name
+         "#{project_id}-speech-audio"
+       end
+
+       # @return [String]
+       def file_content
+         case @io
+         when String
+           # Check if it's a file path or binary content
+           if @io.include?("\0") || !File.exist?(@io)
+             # It's binary content, return as-is
+             @io
+           else
+             # It's a file path, read the file
+             File.read(@io)
+           end
+         when File, IO, StringIO
+           @io.rewind if @io.respond_to?(:rewind)
+           @io.read
+         else
+           raise ArgumentError, "Unsupported input type: #{@io.class}"
+         end
+       end
+
+       # @return [String]
+       def file_extension
+         case @io
+         when String
+           File.extname(@io)[1..] || "wav"
+         else
+           "wav" # Default extension
+         end
+       end
+     end
+   end
+ end
data/lib/omniai/google/chat.rb CHANGED
@@ -15,7 +15,7 @@ module OmniAI
  module Model
    GEMINI_1_0_PRO = "gemini-1.0-pro"
    GEMINI_1_5_PRO = "gemini-1.5-pro"
-   GEMINI_2_5_PRO = "gemini-2.5-pro-preview-05-06"
+   GEMINI_2_5_PRO = "gemini-2.5-pro-preview-06-05"
    GEMINI_1_5_FLASH = "gemini-1.5-flash"
    GEMINI_2_0_FLASH = "gemini-2.0-flash"
    GEMINI_2_5_FLASH = "gemini-2.5-flash-preview-04-17"
data/lib/omniai/google/client.rb CHANGED
@@ -88,6 +88,16 @@ module OmniAI
    Embed.process!(input, model:, client: self)
  end

+ # @raise [OmniAI::Error]
+ #
+ # @param input [String, File, IO] required - audio file path, file object, or GCS URI
+ # @param model [String] optional
+ # @param language [String, Array<String>] optional - language codes for transcription
+ # @param format [Symbol] optional - :json or :verbose_json
+ def transcribe(input, model: Transcribe::DEFAULT_MODEL, language: nil, format: nil)
+   Transcribe.process!(input, model:, language:, format:, client: self)
+ end
+
  # @return [String]
  def path
    if @project_id && @location_id
data/lib/omniai/google/config.rb CHANGED
@@ -55,6 +55,14 @@ module OmniAI
    def credentials=(value)
      @credentials = Credentials.parse(value)
    end
+
+   # @return [Hash]
+   def transcribe_options
+     @transcribe_options ||= {}
+   end
+
+   # @param value [Hash]
+   attr_writer :transcribe_options
  end
  end
  end
data/lib/omniai/google/transcribe.rb ADDED
@@ -0,0 +1,143 @@
+ # frozen_string_literal: true
+
+ module OmniAI
+   module Google
+     # A Google transcribe implementation.
+     #
+     # Usage:
+     #
+     #   transcribe = OmniAI::Google::Transcribe.new(client: client)
+     #   transcribe.process!(audio_file)
+     class Transcribe < OmniAI::Transcribe
+       include TranscribeHelpers
+
+       module Model
+         CHIRP_2 = "chirp_2"
+         CHIRP = "chirp"
+         LATEST_LONG = "latest_long"
+         LATEST_SHORT = "latest_short"
+         TELEPHONY_LONG = "telephony_long"
+         TELEPHONY_SHORT = "telephony_short"
+         MEDICAL_CONVERSATION = "medical_conversation"
+         MEDICAL_DICTATION = "medical_dictation"
+       end
+
+       DEFAULT_MODEL = Model::LATEST_SHORT
+       DEFAULT_RECOGNIZER = "_"
+
+       # @return [Context]
+       CONTEXT = Context.build do |context|
+         # No custom deserializers needed - let base class handle parsing
+       end
+
+       # @raise [HTTPError]
+       #
+       # @return [OmniAI::Transcribe::Transcription]
+       def process!
+         if needs_async_recognition?
+           process_async!
+         else
+           process_sync!
+         end
+       end
+
+       private
+
+       # @return [Boolean]
+       def needs_async_recognition?
+         # Use async for long-form models or when GCS is needed
+         needs_long_form_recognition? || needs_gcs_upload?
+       end
+
+       # @raise [HTTPError]
+       #
+       # @return [OmniAI::Transcribe::Transcription]
+       def process_sync!
+         response = request!
+         handle_sync_response_errors(response)
+
+         data = response.parse
+         transcript = data.dig("results", 0, "alternatives", 0, "transcript") || ""
+
+         transformed_data = build_sync_response_data(data, transcript)
+         Transcription.parse(model: @model, format: @format, data: transformed_data)
+       end
+
+       # @raise [HTTPError]
+       #
+       # @return [OmniAI::Transcribe::Transcription]
+       def process_async!
+         # Track if we uploaded the file for cleanup
+         uploaded_gcs_uri = nil
+
+         # Start the batch recognition job
+         response = request_batch!
+
+         raise HTTPError, response unless response.status.ok?
+
+         operation_data = response.parse
+         operation_name = operation_data["name"]
+
+         raise HTTPError, "No operation name returned from batch recognition request" unless operation_name
+
+         # Extract GCS URI for cleanup if we uploaded it
+         if operation_data.dig("metadata", "batchRecognizeRequest", "files")
+           file_uri = operation_data.dig("metadata", "batchRecognizeRequest", "files", 0, "uri")
+           # Only mark for cleanup if it's not a user-provided GCS URI
+           uploaded_gcs_uri = file_uri unless @io.is_a?(String) && @io.start_with?("gs://")
+         end
+
+         # Poll for completion
+         result = poll_operation!(operation_name)
+
+         # Extract transcript from completed operation
+         transcript_data = extract_batch_transcript(result)
+
+         # Clean up uploaded file if we created it
+         cleanup_gcs_file(uploaded_gcs_uri) if uploaded_gcs_uri
+
+         Transcription.parse(model: @model, format: @format, data: transcript_data)
+       end
+
+       protected
+
+       # @return [Context]
+       def context
+         CONTEXT
+       end
+
+       # @return [HTTP::Response]
+       def request!
+         # Speech-to-Text API uses different endpoints for regional vs global
+         endpoint = speech_endpoint
+         speech_connection = HTTP.persistent(endpoint)
+           .timeout(connect: @client.timeout, write: @client.timeout, read: @client.timeout)
+           .accept(:json)
+
+         # Add authentication if using credentials
+         speech_connection = speech_connection.auth("Bearer #{@client.send(:auth).split.last}") if @client.credentials?
+
+         speech_connection.post(path, params:, json: payload)
+       end
+
+       # @return [Hash]
+       def payload
+         config = build_config
+         payload_data = { config: }
+         add_audio_data(payload_data)
+         payload_data
+       end
+
+       # @return [String]
+       def path
+         # Always use Speech-to-Text API v2 with recognizers
+         recognizer_path = "projects/#{project_id}/locations/#{location_id}/recognizers/#{recognizer_name}"
+         "/v2/#{recognizer_path}:recognize"
+       end
+
+       # @return [Hash]
+       def params
+         { key: (@client.api_key unless @client.credentials?) }.compact
+       end
+     end
+   end
+ end
data/lib/omniai/google/transcribe_helpers.rb ADDED
@@ -0,0 +1,461 @@
+ # frozen_string_literal: true
+
+ module OmniAI
+   module Google
+     # Helper methods for transcription functionality
+     module TranscribeHelpers # rubocop:disable Metrics/ModuleLength
+       private
+
+       # @return [String]
+       def project_id
+         @client.instance_variable_get(:@project_id) ||
+           raise(ArgumentError, "project_id is required for transcription")
+       end
+
+       # @return [String]
+       def location_id
+         case @model
+         when "chirp_2"
+           "us-central1"
+         else
+           @client.instance_variable_get(:@location_id) || "global"
+         end
+       end
+
+       # @return [String]
+       def speech_endpoint
+         location_id == "global" ? "https://speech.googleapis.com" : "https://#{location_id}-speech.googleapis.com"
+       end
+
+       # @return [Array<String>, nil]
+       def language_codes
+         case @language
+         when String
+           [@language] unless @language.strip.empty?
+         when Array
+           cleaned = @language.compact.reject(&:empty?)
+           cleaned if cleaned.any?
+         when nil, ""
+           nil # Auto-detect language when not specified
+         else
+           ["en-US"] # Default to English (multi-language only supported in global/us/eu locations)
+         end
+       end
+
+       # @param input [String, Pathname, File, IO]
+       # @return [String] Base64 encoded audio content
+       def encode_audio(input)
+         case input
+         when String
+           if File.exist?(input)
+             Base64.strict_encode64(File.read(input))
+           else
+             input # Assume it's already base64 encoded
+           end
+         when Pathname, File, IO, StringIO
+           Base64.strict_encode64(input.read)
+         else
+           raise ArgumentError, "Unsupported input type: #{input.class}"
+         end
+       end
+
+       # @return [Boolean]
+       def needs_gcs_upload?
+         return false if @io.is_a?(String) && @io.start_with?("gs://")
+
+         file_size = calculate_file_size
+         # Force GCS upload for files > 10MB or if using long models for longer audio
+         file_size > 10_000_000 || needs_long_form_recognition?
+       end
+
+       # @return [Boolean]
+       def needs_long_form_recognition?
+         # Use long-form models for potentially longer audio files
+         return true if @model&.include?("long")
+
+         # Chirp models process speech in larger chunks and prefer BatchRecognize
+         return true if @model&.include?("chirp")
+
+         # For large files, assume they might be longer than 60 seconds
+         # Approximate: files larger than 1MB might be longer than 60 seconds
+         calculate_file_size > 1_000_000
+       end
+
+       # @return [Integer]
+       def calculate_file_size
+         case @io
+         when String
+           File.exist?(@io) ? File.size(@io) : 0
+         when File, IO, StringIO
+           @io.respond_to?(:size) ? @io.size : 0
+         else
+           0
+         end
+       end
+
+       # @return [Hash]
+       def build_config
+         config = {
+           model: @model,
+           autoDecodingConfig: {},
+         }
+
+         # Only include languageCodes if specified and non-empty (omit for auto-detection)
+         lang_codes = language_codes
+         config[:languageCodes] = if lang_codes&.any?
+           lang_codes
+         else
+           # Handle language detection based on model capabilities
+           default_language_codes
+         end
+
+         features = build_features
+         config[:features] = features unless features.empty?
+
+         if OmniAI::Google.config.respond_to?(:transcribe_options)
+           config.merge!(OmniAI::Google.config.transcribe_options)
+         end
+
+         config
+       end
+
+       # @return [Array<String>] Default language codes based on model
+       def default_language_codes
+         if @model&.include?("chirp")
+           # Chirp models use "auto" for automatic language detection
+           ["auto"]
+         else
+           # Other models use multiple languages for auto-detection
+           %w[en-US es-US]
+         end
+       end
+
+       # @return [Hash]
+       def build_features
+         case @format
+         when "verbose_json"
+           {
+             enableAutomaticPunctuation: true,
+             enableWordTimeOffsets: true,
+             enableWordConfidence: true,
+           }
+         when "json"
+           { enableAutomaticPunctuation: true }
+         else
+           {}
+         end
+       end
+
+       # @param payload_data [Hash]
+       def add_audio_data(payload_data)
+         if @io.is_a?(String) && @io.start_with?("gs://")
+           payload_data[:uri] = @io
+         elsif needs_gcs_upload?
+           gcs_uri = Bucket.process!(client: @client, io: @io)
+           payload_data[:uri] = gcs_uri
+         else
+           payload_data[:content] = encode_audio(@io)
+         end
+       end
+
+       # @return [Hash] Payload for batch recognition
+       def batch_payload
+         config = build_config
+
+         # Get audio URI for batch processing
+         audio_uri = if @io.is_a?(String) && @io.start_with?("gs://")
+           @io
+         else
+           # Force GCS upload for batch recognition
+           Bucket.process!(client: @client, io: @io)
+         end
+
+         {
+           config:,
+           files: [{ uri: audio_uri }],
+           recognitionOutputConfig: {
+             inlineResponseConfig: {},
+           },
+         }
+       end
+
+       # @param operation_name [String]
+       # @raise [HTTPError]
+       #
+       # @return [Hash]
+       def poll_operation!(operation_name)
+         endpoint = speech_endpoint
+         connection = HTTP.persistent(endpoint)
+           .timeout(connect: @client.timeout, write: @client.timeout, read: @client.timeout)
+           .accept(:json)
+
+         # Add authentication if using credentials
+         connection = connection.auth("Bearer #{@client.send(:auth).split.last}") if @client.credentials?
+
+         max_attempts = 60 # Maximum 15 minutes (15 second intervals)
+         attempt = 0
+
+         loop do
+           attempt += 1
+
+           raise HTTPError, "Operation timed out after #{max_attempts * 15} seconds" if attempt > max_attempts
+
+           operation_response = connection.get("/v2/#{operation_name}", params: operation_params)
+
+           raise HTTPError, operation_response unless operation_response.status.ok?
+
+           operation_data = operation_response.parse
+
+           # Check for errors
+           if operation_data["error"]
+             error_message = operation_data.dig("error", "message") || "Unknown error"
+             raise HTTPError, "Operation failed: #{error_message}"
+           end
+
+           # Check if done
+           return operation_data if operation_data["done"]
+
+           # Wait before polling again
+           sleep(15)
+         end
+       end
+
+       # @return [HTTP::Response]
+       def request_batch!
+         endpoint = speech_endpoint
+         connection = HTTP.persistent(endpoint)
+           .timeout(connect: @client.timeout, write: @client.timeout, read: @client.timeout)
+           .accept(:json)
+
+         # Add authentication if using credentials
+         connection = connection.auth("Bearer #{@client.send(:auth).split.last}") if @client.credentials?
+
+         connection.post(batch_path, params: operation_params, json: batch_payload)
+       end
+
+       # @return [String]
+       def batch_path
+         # Use batchRecognize endpoint for async recognition
+         recognizer_path = "projects/#{project_id}/locations/#{location_id}/recognizers/#{recognizer_name}"
+         "/v2/#{recognizer_path}:batchRecognize"
+       end
+
+       # @return [Hash]
+       def operation_params
+         { key: (@client.api_key unless @client.credentials?) }.compact
+       end
+
+       # @return [String]
+       def recognizer_name
+         # Always use the default recognizer - the model is specified in the config
+         "_"
+       end
+
+       # @param result [Hash] Operation result from batch recognition
+       # @return [Hash] Data formatted for OmniAI::Transcribe::Transcription.parse
+       def extract_batch_transcript(result)
+         batch_results = result.dig("response", "results")
+         return empty_transcript_data unless batch_results
+
+         file_result = batch_results.values.first
+         return empty_transcript_data unless file_result
+
+         transcript_segments = file_result.dig("transcript", "results")
+         return empty_transcript_data unless transcript_segments&.any?
+
+         build_transcript_data(transcript_segments, file_result)
+       end
+
+       # @return [Hash]
+       def empty_transcript_data
+         { "text" => "" }
+       end
+
+       # @param transcript_segments [Array]
+       # @param file_result [Hash]
+       # @return [Hash]
+       def build_transcript_data(transcript_segments, file_result)
+         transcript_text = extract_transcript_text(transcript_segments)
+         result_data = { "text" => transcript_text }
+
+         add_duration_if_available(result_data, file_result)
+         add_segments_if_verbose(result_data, transcript_segments)
+
+         result_data
+       end
+
+       # @param transcript_segments [Array]
+       # @return [String]
+       def extract_transcript_text(transcript_segments)
+         text_segments = transcript_segments.map do |segment|
+           segment.dig("alternatives", 0, "transcript")
+         end.compact
+
+         text_segments.join(" ")
+       end
+
+       # @param result_data [Hash]
+       # @param file_result [Hash]
+       def add_duration_if_available(result_data, file_result)
+         duration = file_result.dig("metadata", "totalBilledDuration")
+         result_data["duration"] = parse_duration(duration) if duration
+       end
+
+       # @param result_data [Hash]
+       # @param transcript_segments [Array]
+       def add_segments_if_verbose(result_data, transcript_segments)
+         result_data["segments"] = build_segments(transcript_segments) if @format == "verbose_json"
+       end
+
+       # @param duration_string [String] Duration in Google's format (e.g., "123.456s")
+       # @return [Float] Duration in seconds
+       def parse_duration(duration_string)
+         return nil unless duration_string
+
+         duration_string.to_s.sub(/s$/, "").to_f
+       end
+
+       # @param segments [Array] Transcript segments from Google API
+       # @return [Array<Hash>] Segments formatted for base class
+       def build_segments(segments)
+         segments.map.with_index do |segment, index|
+           alternative = segment.dig("alternatives", 0)
+           next unless alternative
+
+           segment_data = {
+             "id" => index,
+             "text" => alternative["transcript"],
+             "start" => calculate_segment_start(segments, index),
+             "end" => parse_duration(segment["resultEndOffset"]),
+             "confidence" => alternative["confidence"],
+           }
+
+           # Words removed - segments provide sufficient granularity for most use cases
+
+           segment_data
+         end.compact
+       end
+
+       # @param segments [Array] All segments
+       # @param index [Integer] Current segment index
+       # @return [Float] Start time estimated from previous segment end
+       def calculate_segment_start(segments, index)
+         return 0.0 if index.zero?
+
+         prev_segment = segments[index - 1]
+         parse_duration(prev_segment["resultEndOffset"]) || 0.0
+       end
+
+       # @param response [HTTP::Response]
+       # @raise [HTTPError]
+       def handle_sync_response_errors(response)
+         return if response.status.ok?
+
+         error_data = parse_error_data(response)
+         raise_timeout_error(response) if timeout_error?(error_data)
+         raise HTTPError, response
+       end
+
+       # @param response [HTTP::Response]
+       # @return [Hash]
+       def parse_error_data(response)
+         response.parse
+       rescue StandardError
+         {}
+       end
+
+       # @param error_data [Hash]
+       # @return [Boolean]
+       def timeout_error?(error_data)
+         error_data.dig("error", "message")&.include?("60 seconds")
+       end
+
+       # @param response [HTTP::Response]
+       # @raise [HTTPError]
+       def raise_timeout_error(response)
+         raise HTTPError, (response.tap do |r|
+           r.instance_variable_set(:@body, "Audio file exceeds 60-second limit for direct upload. " \
+             "Use a long-form model (e.g., 'latest_long') or upload to GCS first. " \
+             "Original error: #{response.flush}")
+         end)
+       end
+
+       # @param data [Hash]
+       # @param transcript [String]
+       # @return [Hash]
+       def build_sync_response_data(data, transcript)
+         return { "text" => transcript } unless verbose_json_format?(data)
+
+         build_verbose_sync_data(data, transcript)
+       end
+
+       # @param data [Hash]
+       # @return [Boolean]
+       def verbose_json_format?(data)
+         @format == "verbose_json" &&
+           data["results"]&.any? &&
+           data["results"][0]["alternatives"]&.any?
+       end
+
+       # @param data [Hash]
+       # @param transcript [String]
+       # @return [Hash]
+       def build_verbose_sync_data(data, transcript)
+         alternative = data["results"][0]["alternatives"][0]
+         {
+           "text" => transcript,
+           "segments" => [{
+             "id" => 0,
+             "text" => transcript,
+             "start" => 0.0,
+             "end" => nil,
+             "confidence" => alternative["confidence"],
+           }],
+         }
+       end
+
+       # @param gcs_uri [String] GCS URI to delete (e.g., "gs://bucket/file.mp3")
+       def cleanup_gcs_file(gcs_uri)
+         return unless valid_gcs_uri?(gcs_uri)
+
+         bucket_name, object_name = parse_gcs_uri(gcs_uri)
+         return unless bucket_name && object_name
+
+         delete_gcs_object(bucket_name, object_name, gcs_uri)
+       end
+
+       # @param gcs_uri [String]
+       # @return [Boolean]
+       def valid_gcs_uri?(gcs_uri)
+         gcs_uri&.start_with?("gs://")
+       end
+
+       # @param gcs_uri [String]
+       # @return [Array<String>] [bucket_name, object_name]
+       def parse_gcs_uri(gcs_uri)
+         uri_parts = gcs_uri.sub("gs://", "").split("/", 2)
+         [uri_parts[0], uri_parts[1]]
+       end
+
+       # @param bucket_name [String]
+       # @param object_name [String]
+       # @param gcs_uri [String]
+       def delete_gcs_object(bucket_name, object_name, gcs_uri)
+         storage = create_storage_client
+         bucket = storage.bucket(bucket_name)
+         return unless bucket
+
+         file = bucket.file(object_name)
+         file&.delete
+       rescue ::Google::Cloud::Error => e
+         @client.logger&.warn("Failed to cleanup GCS file #{gcs_uri}: #{e.message}")
+       end
+
+       # @return [Google::Cloud::Storage]
+       def create_storage_client
+         credentials = @client.instance_variable_get(:@credentials)
+         ::Google::Cloud::Storage.new(project_id:, credentials:)
+       end
+     end
+   end
+ end
data/lib/omniai/google/version.rb CHANGED
@@ -2,6 +2,6 @@

  module OmniAI
    module Google
-     VERSION = "2.6.5"
+     VERSION = "2.7.7"
    end
  end
metadata CHANGED
@@ -1,13 +1,13 @@
  --- !ruby/object:Gem::Specification
  name: omniai-google
  version: !ruby/object:Gem::Version
-   version: 2.6.5
+   version: 2.7.7
  platform: ruby
  authors:
  - Kevin Sylvestre
  bindir: exe
  cert_chain: []
- date: 2025-05-14 00:00:00.000000000 Z
+ date: 2025-06-16 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: event_stream_parser
@@ -37,6 +37,20 @@ dependencies:
      - - ">="
        - !ruby/object:Gem::Version
          version: '0'
+ - !ruby/object:Gem::Dependency
+   name: google-cloud-storage
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
  - !ruby/object:Gem::Dependency
    name: omniai
    requirement: !ruby/object:Gem::Requirement
@@ -75,6 +89,7 @@ files:
  - Gemfile
  - README.md
  - lib/omniai/google.rb
+ - lib/omniai/google/bucket.rb
  - lib/omniai/google/chat.rb
  - lib/omniai/google/chat/choice_serializer.rb
  - lib/omniai/google/chat/content_serializer.rb
@@ -92,6 +107,8 @@ files:
  - lib/omniai/google/config.rb
  - lib/omniai/google/credentials.rb
  - lib/omniai/google/embed.rb
+ - lib/omniai/google/transcribe.rb
+ - lib/omniai/google/transcribe_helpers.rb
  - lib/omniai/google/upload.rb
  - lib/omniai/google/upload/file.rb
  - lib/omniai/google/version.rb