clip-rb 1.0.2 → 2.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +50 -0
- data/README.md +22 -1
- data/UPGRADING.md +103 -0
- data/exe/clip-embed-image +35 -5
- data/exe/clip-embed-text +35 -5
- data/lib/clip/image_preprocessor.rb +26 -19
- data/lib/clip/model.rb +12 -5
- data/lib/clip/multilingual_model.rb +77 -0
- data/lib/clip/tokenizer.rb +11 -3
- data/lib/clip/version.rb +1 -1
- data/lib/clip-rb.rb +2 -0
- data/lib/clip.rb +95 -32
- metadata +28 -11
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: d6eff0db629a49c59e9c26a4ec877094ac84803e6d086bd8ba29d7c58648dc48
+  data.tar.gz: 3227054c9a2c3d74ab81e13d76ee4bc58451715b6e1ed708d3f7cbb1c497014e
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c80c6fd029598a4000319a250191c202ab2fbdc3e636c0e6025325b11984d883740c3c2aa439f11baef25c33b413945becfb1be9171a4d2efefcc6ced729b01e
+  data.tar.gz: cdab5e8c1beebf3f1d55f9e8f2810fb087b822ba01c3493c7660bbb7653d2409b1e41a4dfc79010ba43855278193044d3fc0d66e0eb6a5d0a5a86d960d92640b
data/CHANGELOG.md
ADDED
@@ -0,0 +1,50 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [2.0.0] - 2024-12-13
+
+### Added
+- `Clip.similarity` method for calculating cosine similarity between embeddings
+- `Clip.normalize` method for L2 normalization of embedding vectors
+- `Clip::ImagePreprocessor::InvalidImageError` for better error handling
+- Image format validation with supported formats: jpg, jpeg, png, gif, bmp, webp, tiff
+- HTTP timeout (5 minutes default) for model downloads
+- `Clip::DownloadError` custom exception for download failures
+- Thread safety for lazy-loaded ONNX models using Mutex
+- CLI tools now support a `--json` flag for JSON output
+- CLI tools now support a `--multilingual` flag to use the multilingual model
+- CLI tools now have a `--help` flag with usage information
+
+### Fixed
+- Resource leak in tokenizer: GzipReader is now properly closed after reading the BPE vocabulary
+- `basic_clean` method now properly implements HTML entity decoding and Unicode normalization
+- Inconsistent hash key types in MultilingualModel ONNX predict calls
+- Module-level `attr_accessor :logger` now works correctly
+- Changelog URL in gemspec now points to the correct path
+- Multilingual tokenizer now downloads correctly (workaround for a tokenizers gem bug)
+- HTTP redirects with relative URLs are now handled correctly
+
+### Changed
+- **Breaking:** Removed `add_batch_dimension` method from ImagePreprocessor
+- **Breaking:** MultilingualModel tokenizer is now downloaded automatically instead of using `Tokenizers.from_pretrained`
+- Model downloads now skip files that already exist
+- CLI tools use OptionParser for proper argument handling
+
+## [1.1.0] - 2024-12-10
+
+### Added
+- XLM Roberta model for multilingual text embedding support
+- `Clip::MultilingualModel` class for multilingual CLIP
+
+## [1.0.0] - 2024-12-01
+
+### Added
+- Initial release
+- OpenAI CLIP ViT-B-32 model support
+- Text and image embedding generation
+- Automatic model downloading from Hugging Face
+- CLI tools: `clip-embed-text` and `clip-embed-image`
data/README.md
CHANGED
@@ -21,6 +21,7 @@ See [neighbor gem](https://github.com/ankane/neighbor) to learn more about vecto
 
 - Ruby 3.0.0 or later
 - ONNX CLIP models (downloaded automatically on first use)
+- XLM Roberta CLIP model (for multilingual support)
 
 ---
 
@@ -43,7 +44,9 @@ gem install clip-rb
 ```ruby
 require 'clip'
 
-
+# This will download the models on first use (default path is .clip_models)
+# If you don't want this behavior you can pass the path to the models as an argument.
+clip = Clip::Model.new
 
 text_embedding = clip.encode_text("a photo of a cat")
 # => [0.15546110272407532, 0.07329428941011429, ...]
@@ -54,6 +57,24 @@ image_embedding = clip.encode_image("test/fixtures/test.jpg")
 
 💡 Tip: Use cosine similarity for KNN vector search when comparing embeddings!
 
+## Multilingual text embeddings
+
+Since the original CLIP only supports English embeddings, this gem also supports multilingual text embeddings using the XLM Roberta model.
+
+```ruby
+require 'clip'
+
+# This will download the models on first use (default path is .clip_models/multilingual)
+# If you don't want this behavior you can pass the path to the models as an argument.
+clip = Clip::MultilingualModel.new
+
+text_embedding = clip.encode_text("una foto de un gato")
+# => [0.15546110272407532, 0.07329428941011429, ...]
+
+image_embedding = clip.encode_image("test/fixtures/test.jpg")
+# => [0.22115306556224823, 0.19343754649162292, ...]
+```
+
 ## CLI
 
 Additionally you can fetch embeddings by calling:
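
For reference, the two README snippets above compose naturally with the `Clip.similarity` helper introduced in 2.0.0. A minimal sketch, reusing the caption and fixture path from the README:

```ruby
require "clip"

# Models are fetched into .clip_models on first use, as noted above.
clip = Clip::Model.new

text_embedding  = clip.encode_text("a photo of a cat")
image_embedding = clip.encode_image("test/fixtures/test.jpg")

# Cosine similarity between the text and image embeddings; higher means a closer match.
puts Clip.similarity(text_embedding, image_embedding)
```
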
data/UPGRADING.md
ADDED
@@ -0,0 +1,103 @@
+# Upgrading Guide
+
+## Upgrading from 1.x to 2.0
+
+### Breaking Changes
+
+#### 1. ImagePreprocessor: `add_batch_dimension` method removed
+
+The `add_batch_dimension` method was removed from `Clip::ImagePreprocessor` because it was misleadingly named - it didn't actually add a batch dimension.
+
+**Before (1.x):**
+```ruby
+preprocessor = Clip::ImagePreprocessor.new
+tensor = preprocessor.preprocess(image_path)
+# tensor shape was [3, 224, 224] despite method name suggesting [1, 3, 224, 224]
+```
+
+**After (2.0):**
+```ruby
+preprocessor = Clip::ImagePreprocessor.new
+tensor = preprocessor.preprocess(image_path)
+# tensor shape is [3, 224, 224] - same behavior, clearer code
+```
+
+If you were calling `add_batch_dimension` directly (unlikely since it was private), you'll need to remove those calls.
+
+#### 2. MultilingualModel: Tokenizer loading changed
+
+The `MultilingualModel` no longer uses `Tokenizers.from_pretrained` due to a bug in the tokenizers gem. Instead, it downloads the tokenizer.json file directly and loads it from disk.
+
+**Before (1.x):**
+```ruby
+model = Clip::MultilingualModel.new(
+  tokenizer: Tokenizers.from_pretrained("M-CLIP/XLM-Roberta-Large-Vit-B-32")
+)
+```
+
+**After (2.0):**
+```ruby
+# Tokenizer is downloaded automatically - no need to specify
+model = Clip::MultilingualModel.new
+
+# Or provide a custom tokenizer loaded from file
+model = Clip::MultilingualModel.new(
+  tokenizer: Tokenizers::Tokenizer.from_file("/path/to/tokenizer.json")
+)
+```
+
+### New Features
+
+#### Similarity and Normalization Helpers
+
+```ruby
+# Calculate cosine similarity between embeddings
+similarity = Clip.similarity(embedding1, embedding2)
+
+# Normalize embeddings to unit length
+normalized = Clip.normalize(embedding)
+```
+
+#### Image Validation
+
+Images are now validated before processing:
+
+```ruby
+begin
+  model.encode_image("invalid.xyz")
+rescue Clip::ImagePreprocessor::InvalidImageError => e
+  puts e.message # "Unsupported image format: xyz. Supported: jpg, jpeg, png, gif, bmp, webp, tiff"
+end
+```
+
+#### CLI Improvements
+
+```bash
+# JSON output for piping
+clip-embed-text --json "a photo of a cat"
+
+# Use multilingual model
+clip-embed-text --multilingual "une photo d'un chat"
+
+# Help
+clip-embed-text --help
+```
+
+#### Thread Safety
+
+Both `Model` and `MultilingualModel` now use mutex locks for thread-safe lazy loading of ONNX models.
+
+#### Download Improvements
+
+- HTTP timeout of 5 minutes (configurable)
+- Downloads skip files that already exist
+- Relative redirects handled correctly
+- Custom `Clip::DownloadError` exception
+
+### Migration Checklist
+
+- [ ] Remove any calls to `add_batch_dimension` (if applicable)
+- [ ] Update custom tokenizer initialization for `MultilingualModel` to use `from_file` instead of `from_pretrained`
+- [ ] Consider using new `Clip.similarity` and `Clip.normalize` helpers
+- [ ] Update error handling to catch `Clip::ImagePreprocessor::InvalidImageError`
+- [ ] Update error handling to catch `Clip::DownloadError`
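
The last two checklist items above concern the new exception classes. A minimal sketch of what that rescue logic might look like; the file path is illustrative:

```ruby
require "clip"

begin
  # Instantiating a model may download ONNX files; network or HTTP failures
  # now raise Clip::DownloadError rather than a bare RuntimeError.
  clip = Clip::Model.new
  clip.encode_image("photos/cat.heic") # raises InvalidImageError (.heic is unsupported)
rescue Clip::DownloadError => e
  warn "Model download failed: #{e.message}"
rescue Clip::ImagePreprocessor::InvalidImageError => e
  warn "Bad input image: #{e.message}"
end
```
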
data/exe/clip-embed-image
CHANGED
@@ -1,16 +1,46 @@
 #!/usr/bin/env ruby
+# frozen_string_literal: true
 
 require_relative "../lib/clip"
+require "json"
+require "optparse"
 
+options = { format: :ruby }
 
-
-
+OptionParser.new do |opts|
+  opts.banner = "Usage: clip-embed-image [options] <image_file>"
+
+  opts.on("-j", "--json", "Output as JSON") do
+    options[:format] = :json
+  end
+
+  opts.on("-m", "--multilingual", "Use multilingual model") do
+    options[:multilingual] = true
+  end
+
+  opts.on("-h", "--help", "Show this help") do
+    puts opts
+    exit
+  end
+end.parse!
+
+if ARGV.empty?
+  puts "Usage: clip-embed-image [options] <image_file>"
+  puts "Run 'clip-embed-image --help' for options"
   exit 1
 end
 
 begin
-
-
-
+  model = options[:multilingual] ? Clip::MultilingualModel.new : Clip::Model.new
+  embedding = model.encode_image(ARGV[0])
+
+  case options[:format]
+  when :json
+    puts JSON.generate(embedding)
+  else
+    puts embedding.inspect
+  end
+rescue StandardError => e
+  warn "Error: #{e.message}"
   exit 1
 end
data/exe/clip-embed-text
CHANGED
@@ -1,16 +1,46 @@
 #!/usr/bin/env ruby
+# frozen_string_literal: true
 
 require_relative "../lib/clip"
+require "json"
+require "optparse"
 
+options = { format: :ruby }
 
-
-
+OptionParser.new do |opts|
+  opts.banner = "Usage: clip-embed-text [options] <text>"
+
+  opts.on("-j", "--json", "Output as JSON") do
+    options[:format] = :json
+  end
+
+  opts.on("-m", "--multilingual", "Use multilingual model") do
+    options[:multilingual] = true
+  end
+
+  opts.on("-h", "--help", "Show this help") do
+    puts opts
+    exit
+  end
+end.parse!
+
+if ARGV.empty?
+  puts "Usage: clip-embed-text [options] <text>"
+  puts "Run 'clip-embed-text --help' for options"
   exit 1
 end
 
 begin
-
-
-
+  model = options[:multilingual] ? Clip::MultilingualModel.new : Clip::Model.new
+  embedding = model.encode_text(ARGV[0])
+
+  case options[:format]
+  when :json
+    puts JSON.generate(embedding)
+  else
+    puts embedding.inspect
+  end
+rescue StandardError => e
+  warn "Error: #{e.message}"
   exit 1
 end
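
With the new `--json` flag, the executables above can also be driven from another Ruby process and their output parsed back into an array. A sketch, assuming the gem's executables are on PATH:

```ruby
require "json"
require "open3"

# Ask the CLI for a JSON embedding and parse it.
stdout, status = Open3.capture2("clip-embed-text", "--json", "a photo of a cat")
raise "clip-embed-text failed" unless status.success?

embedding = JSON.parse(stdout)
puts embedding.length
```
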
data/lib/clip/image_preprocessor.rb
CHANGED
@@ -1,31 +1,48 @@
+# frozen_string_literal: true
+
 require "mini_magick"
 require "numo/narray"
 
 module Clip
   class ImagePreprocessor
     # CLIP's expected image normalization parameters
-    MEAN = Numo::DFloat[
-    STD = Numo::DFloat[
+    MEAN = Numo::DFloat[0.48145466, 0.4578275, 0.40821073]
+    STD = Numo::DFloat[0.26862954, 0.26130258, 0.27577711]
+    SUPPORTED_FORMATS = %w[jpg jpeg png gif bmp webp tiff].freeze
+
+    class InvalidImageError < StandardError; end
 
     def initialize(target_size: 224)
      @target_size = target_size
     end
 
-    # Preprocess the image and return a tensor with shape [
+    # Preprocess the image and return a tensor with shape [3, 224, 224]
     def preprocess(image_path)
+      validate_image!(image_path)
      image = load_and_resize(image_path)
      tensor = image_to_tensor(image)
-
-      add_batch_dimension(normalized)
+      normalize(tensor)
     end
 
     private
 
+    # Validate that the image file exists and has a supported format
+    def validate_image!(image_path)
+      path = image_path.is_a?(File) ? image_path.path : image_path.to_s
+
+      raise InvalidImageError, "Image file not found: #{path}" unless File.exist?(path)
+
+      extension = File.extname(path).delete(".").downcase
+      return if SUPPORTED_FORMATS.include?(extension)
+
+      raise InvalidImageError, "Unsupported image format: #{extension}. Supported: #{SUPPORTED_FORMATS.join(', ')}"
+    end
+
     # Load image, convert to RGB, and resize to target size
     def load_and_resize(image_path)
      image = MiniMagick::Image.open(image_path)
-      image.format "png"
-      image
+      image.format "png"
+      image.combine_options do |c|
        c.resize "#{@target_size}x#{@target_size}!"
        c.quality 100
        c.colorspace "RGB"
@@ -33,30 +50,20 @@ module Clip
      image
     end
 
-    # Convert the image to a normalized
+    # Convert the image to a normalized tensor with shape [3, 224, 224]
     def image_to_tensor(image)
-      pixels = image.get_pixels
-      # Convert to Numo::NArray and reshape
+      pixels = image.get_pixels
      pixel_array = Numo::UInt8.asarray(pixels).cast_to(Numo::DFloat)
-      # Reshape to [height, width, channels]
      pixel_array = pixel_array.reshape(@target_size, @target_size, 3)
-      # Transpose to [channels, height, width]
      pixel_array = pixel_array.transpose(2, 0, 1)
-      # Normalize to [0, 1]
      pixel_array / 255.0
     end
 
     # Apply CLIP normalization: (x - mean) / std
     def normalize(tensor)
-      # Expand mean and std to match tensor shape
      mean = MEAN.reshape(3, 1, 1)
      std = STD.reshape(3, 1, 1)
      (tensor - mean) / std
     end
-
-    # Add batch dimension: [1, 3, 224, 224]
-    def add_batch_dimension(tensor)
-      tensor.reshape(3, @target_size, @target_size)
-    end
   end
 end
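
The removal of `add_batch_dimension` is easier to see with a concrete check: `preprocess` returns a plain `[3, 224, 224]` Numo array, and the batch dimension is added at the call site by wrapping the converted array once more (compare the `predict` calls in `model.rb` below). A sketch reusing the README's fixture path:

```ruby
require "clip"

preprocessor = Clip::ImagePreprocessor.new
tensor = preprocessor.preprocess("test/fixtures/test.jpg")

p tensor.shape            # => [3, 224, 224]

# Wrapping the converted array in an outer array is what produces the
# [1, 3, 224, 224] batch the ONNX models expect.
batched = [tensor.to_a]
p [batched.length, batched.first.length]  # => [1, 3]
```
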
data/lib/clip/model.rb
CHANGED
@@ -1,3 +1,5 @@
+# frozen_string_literal: true
+
 require "onnxruntime"
 
 module Clip
@@ -15,24 +17,29 @@ module Clip
      Clip.download_models(download_dir) if download_models && !Clip.models_exist?(textual_model_path: textual_model_path, visual_model_path: visual_model_path)
      @tokenizer = tokenizer
      @image_preprocessor = image_preprocessor
+      @model_mutex = Mutex.new
     end
 
     def encode_text(text)
      tokens = tokenizer.encode(text)
-      text_model.predict({ input
+      text_model.predict({ "input" => [tokens] })["output"].first
     end
 
     def encode_image(image)
-
-      image_model.predict({ input
+      image_tensor = image_preprocessor.preprocess(image).to_a
+      image_model.predict({ "input" => [image_tensor] })["output"].first
     end
 
     def text_model
-      @
+      @model_mutex.synchronize do
+        @text_model ||= OnnxRuntime::Model.new(textual_model_path)
+      end
     end
 
     def image_model
-      @
+      @model_mutex.synchronize do
+        @image_model ||= OnnxRuntime::Model.new(visual_model_path)
+      end
     end
 
     private
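
The `@model_mutex` introduced above guards the lazy `||=` construction of the ONNX sessions. A small sketch of the concurrent use it is meant to protect; the prompts are arbitrary:

```ruby
require "clip"

clip = Clip::Model.new

# Several threads may call encode_text before the textual model has been
# built; the mutex ensures OnnxRuntime::Model.new runs only once.
threads = ["a cat", "a dog", "a bird"].map do |prompt|
  Thread.new { clip.encode_text(prompt) }
end

puts threads.map(&:value).map(&:length).inspect
```
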
data/lib/clip/multilingual_model.rb
ADDED
@@ -0,0 +1,77 @@
+# frozen_string_literal: true
+
+require "onnxruntime"
+require "tokenizers"
+
+module Clip
+  class MultilingualModel
+    TOKENIZER_FILENAME = "tokenizer.json"
+
+    def initialize(
+      textual_model_path: ".clip_models/multilingual/textual.onnx",
+      visual_model_path: ".clip_models/multilingual/visual.onnx",
+      tokenizer: nil,
+      image_preprocessor: Clip::ImagePreprocessor.new,
+      download_models: true,
+      download_dir: ".clip_models/multilingual"
+    )
+      @textual_model_path = textual_model_path
+      @visual_model_path = visual_model_path
+      @download_dir = download_dir
+
+      if download_models
+        Clip.download_models(download_dir, Clip::MULTILINGUAL_MODELS) unless Clip.models_exist?(textual_model_path: textual_model_path, visual_model_path: visual_model_path)
+        download_tokenizer unless tokenizer
+      end
+
+      @tokenizer = tokenizer || load_tokenizer
+      @image_preprocessor = image_preprocessor
+      @model_mutex = Mutex.new
+    end
+
+    def encode_text(text)
+      encoding = tokenizer.encode(text)
+      input_ids = [encoding.ids]
+      attention_mask = [Array.new(encoding.ids.size, 1)]
+
+      text_model.predict({ "input_ids" => input_ids, "attention_mask" => attention_mask })["output"].first
+    end
+
+    def encode_image(image)
+      image_tensor = image_preprocessor.preprocess(image).to_a
+      image_model.predict({ "pixel_values" => [image_tensor] })["output"].first
+    end
+
+    def text_model
+      @model_mutex.synchronize do
+        @text_model ||= OnnxRuntime::Model.new(textual_model_path)
+      end
+    end
+
+    def image_model
+      @model_mutex.synchronize do
+        @image_model ||= OnnxRuntime::Model.new(visual_model_path)
+      end
+    end
+
+    private
+
+    attr_reader :textual_model_path, :visual_model_path, :tokenizer, :image_preprocessor
+
+    def tokenizer_path
+      File.join(@download_dir, TOKENIZER_FILENAME)
+    end
+
+    def download_tokenizer
+      return if File.exist?(tokenizer_path)
+
+      Clip.logger ||= Logger.new($stdout)
+      Clip.logger.info("Downloading tokenizer from #{Clip::MULTILINGUAL_TOKENIZER_URL}")
+      Clip.download_file(Clip::MULTILINGUAL_TOKENIZER_URL, tokenizer_path)
+    end
+
+    def load_tokenizer
+      Tokenizers::Tokenizer.from_file(tokenizer_path)
+    end
+  end
+end
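
Since the multilingual model embeds different languages into the same space, translations of one caption should land close together. A sketch; the captions are illustrative:

```ruby
require "clip"

clip = Clip::MultilingualModel.new

english = clip.encode_text("a photo of a cat")
spanish = clip.encode_text("una foto de un gato")

# Embeddings of equivalent captions should have high cosine similarity.
puts Clip.similarity(english, spanish)
```
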
data/lib/clip/tokenizer.rb
CHANGED
@@ -1,3 +1,5 @@
+# frozen_string_literal: true
+
 require "zlib"
 require "set"
 
@@ -5,10 +7,10 @@ module Clip
   class Tokenizer
     INPUT_VECTOR_SIZE = 77
 
-    def initialize(bpe_path = __dir__
+    def initialize(bpe_path = File.join(__dir__, "..", "bpe_simple_vocab_16e6.txt.gz"))
      @byte_encoder = bytes_to_unicode
      @byte_decoder = @byte_encoder.invert
-      merges = Zlib::GzipReader.open(bpe_path).read.split("\n")[1..(49152 - 256 - 2)]
+      merges = Zlib::GzipReader.open(bpe_path) { |gz| gz.read }.split("\n")[1..(49152 - 256 - 2)]
      merges = merges.map { |merge| merge.split(" ") }
      vocab = @byte_encoder.values
      vocab += vocab.map { |v| "#{v}</w>" }
@@ -53,8 +55,14 @@ module Clip
      pairs
     end
 
+    # Clean text by decoding HTML entities and normalizing unicode
+    # Matches Python CLIP's basic_clean which uses ftfy.fix_text and html.unescape
     def basic_clean(text)
-
+      require "cgi"
+      # Decode HTML entities (called twice like Python original)
+      text = CGI.unescapeHTML(CGI.unescapeHTML(text))
+      # Normalize unicode to NFC form (similar to ftfy's fix_text for most cases)
+      text.unicode_normalize(:nfc).strip
     end
 
     def whitespace_clean(text)
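
The rewritten `basic_clean` is private, but its two steps can be seen in isolation. A standalone sketch of the same calls on a sample string containing double-escaped numeric entities:

```ruby
require "cgi"

text = "Caf&amp;#233; &amp; cr&amp;#232;me"

# First pass turns &amp; back into &, exposing the numeric entities.
text = CGI.unescapeHTML(text)  # => "Caf&#233; & cr&#232;me"

# Second pass decodes the numeric entities themselves.
text = CGI.unescapeHTML(text)  # => "Café & crème"

# NFC normalization and strip, as basic_clean now does.
puts text.unicode_normalize(:nfc).strip
```
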
data/lib/clip/version.rb
CHANGED
data/lib/clip-rb.rb
CHANGED
data/lib/clip.rb
CHANGED
@@ -1,50 +1,113 @@
+# frozen_string_literal: true
+
 require_relative "clip/model"
+require_relative "clip/multilingual_model"
 require_relative "clip/tokenizer"
 require_relative "clip/image_preprocessor"
 require "net/http"
+require "uri"
 require "fileutils"
 require "logger"
 
 module Clip
-  attr_accessor :logger
-
   BASE_URL = "https://huggingface.co/khasinski/"
   MODELS = {
-    textual
-    visual
-  }
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+    "textual.onnx" => "clip-ViT-B-32-onnx/resolve/main/textual.onnx?download=true",
+    "visual.onnx" => "clip-ViT-B-32-onnx/resolve/main/visual.onnx?download=true"
+  }.freeze
+
+  MULTILINGUAL_MODELS = {
+    "textual.onnx" => "XLM-Roberta-Large-Vit-B-32-onnx/resolve/main/textual.onnx?download=true",
+    "visual.onnx" => "XLM-Roberta-Large-Vit-B-32-onnx/resolve/main/visual.onnx?download=true",
+    "data.bin" => "XLM-Roberta-Large-Vit-B-32-onnx/resolve/main/data.bin?download=true"
+  }.freeze
+
+  MULTILINGUAL_TOKENIZER_URL = "https://huggingface.co/M-CLIP/XLM-Roberta-Large-Vit-B-32/resolve/main/tokenizer.json"
+
+  DEFAULT_TIMEOUT = 300 # 5 minutes for large model files
+
+  class DownloadError < StandardError; end
+
+  class << self
+    attr_accessor :logger
+
+    def download_models(download_dir, models = MODELS)
+      @logger ||= Logger.new($stdout)
+      FileUtils.mkdir_p(download_dir)
+
+      models.each do |filename, path|
+        uri = URI.join(BASE_URL, path)
+        destination = File.join(download_dir, filename)
+
+        next if File.exist?(destination)
+
+        logger.info("Downloading #{filename} model from #{uri}")
+        download_file(uri.to_s, destination)
+      end
+    end
+
+    def download_file(url, destination, limit: 10, timeout: DEFAULT_TIMEOUT)
+      raise DownloadError, "Too many HTTP redirects" if limit == 0
+
+      uri = URI.parse(url)
+      http = Net::HTTP.new(uri.host, uri.port)
+      http.use_ssl = (uri.scheme == "https")
+      http.open_timeout = timeout
+      http.read_timeout = timeout
+
+      request = Net::HTTP::Get.new(uri.request_uri)
+
+      http.request(request) do |response|
+        case response
+        when Net::HTTPRedirection
+          location = response["location"]
+          # Handle relative redirects
+          new_url = if location.start_with?("/")
+                      "#{uri.scheme}://#{uri.host}#{location}"
+                    else
+                      location
+                    end
+          download_file(new_url, destination, limit: limit - 1, timeout: timeout)
+        when Net::HTTPSuccess
+          File.open(destination, "wb") do |file|
+            response.read_body do |chunk|
+              file.write(chunk)
+            end
          end
-          logger.info("Successfully downloaded #{type} model")
-          break
        else
-
-          raise "Failed to download #{type} model from #{uri}"
+          raise DownloadError, "Failed to download file: #{response.code} #{response.message}"
        end
      end
    end
-  end
 
-
-
+    def models_exist?(textual_model_path:, visual_model_path:)
+      File.exist?(textual_model_path) && File.exist?(visual_model_path)
+    end
+
+    # Normalize an embedding vector to unit length (L2 normalization)
+    # @param embedding [Array<Float>] The embedding vector
+    # @return [Array<Float>] The normalized embedding vector
+    def normalize(embedding)
+      magnitude = Math.sqrt(embedding.sum { |x| x * x })
+      return embedding if magnitude.zero?
+
+      embedding.map { |x| x / magnitude }
+    end
+
+    # Calculate cosine similarity between two embeddings
+    # @param embedding1 [Array<Float>] First embedding vector
+    # @param embedding2 [Array<Float>] Second embedding vector
+    # @return [Float] Cosine similarity score between -1 and 1
+    def similarity(embedding1, embedding2)
+      raise ArgumentError, "Embeddings must have the same length" if embedding1.length != embedding2.length
+
+      dot_product = embedding1.zip(embedding2).sum { |a, b| a * b }
+      magnitude1 = Math.sqrt(embedding1.sum { |x| x * x })
+      magnitude2 = Math.sqrt(embedding2.sum { |x| x * x })
+
+      return 0.0 if magnitude1.zero? || magnitude2.zero?
+
+      dot_product / (magnitude1 * magnitude2)
+    end
  end
 end
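
The two module helpers above relate in a simple way: once vectors have been through `Clip.normalize`, `Clip.similarity` reduces to a plain dot product. A small worked check with hand-picked vectors:

```ruby
require "clip"

a = [3.0, 4.0] # magnitude 5
b = [4.0, 3.0] # magnitude 5

# Cosine similarity from the raw vectors: (3*4 + 4*3) / (5 * 5) = 0.96
puts Clip.similarity(a, b)

# After L2 normalization the same value is just the dot product.
na = Clip.normalize(a) # => [0.6, 0.8]
nb = Clip.normalize(b) # => [0.8, 0.6]
puts na.zip(nb).sum { |x, y| x * y } # => 0.96
```
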
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: clip-rb
 version: !ruby/object:Gem::Version
-  version:
+  version: 2.0.0
 platform: ruby
 authors:
 - Krzysztof Hasiński
-autorequire:
+autorequire:
 bindir: exe
 cert_chain: []
-date: 2025-
+date: 2025-12-12 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: onnxruntime
@@ -84,16 +84,30 @@ dependencies:
   name: mini_magick
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - "
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '0'
+        version: '5.0'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - "
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '5.0'
+- !ruby/object:Gem::Dependency
+  name: tokenizers
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.5'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '0'
+        version: '0.5'
 description: OpenAI CLIP embeddings, uses ONNX models. Allows to create embeddings
   for images and text
 email:
@@ -106,10 +120,12 @@ extra_rdoc_files: []
 files:
 - ".clip_models/.gitkeep"
 - ".rspec"
+- CHANGELOG.md
 - CODE_OF_CONDUCT.md
 - LICENSE.txt
 - README.md
 - Rakefile
+- UPGRADING.md
 - exe/clip-embed-image
 - exe/clip-embed-text
 - lib/bpe_simple_vocab_16e6.txt.gz
@@ -117,6 +133,7 @@ files:
 - lib/clip.rb
 - lib/clip/image_preprocessor.rb
 - lib/clip/model.rb
+- lib/clip/multilingual_model.rb
 - lib/clip/tokenizer.rb
 - lib/clip/version.rb
 - sig/clip.rbs
@@ -127,8 +144,8 @@ licenses:
 metadata:
   homepage_uri: https://github.com/khasinski/clip-rb
   source_code_uri: https://github.com/khasinski/clip-rb
-  changelog_uri: https://github.com/khasinski/clip-rb/CHANGELOG.md
-post_install_message:
+  changelog_uri: https://github.com/khasinski/clip-rb/blob/main/CHANGELOG.md
+post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -143,8 +160,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
      version: '0'
 requirements: []
-rubygems_version: 3.
-signing_key:
+rubygems_version: 3.0.3.1
+signing_key:
 specification_version: 4
 summary: OpenAI CLIP embeddings, uses ONNX models
 test_files: []