RubyGems - fastembed - Versions diffs - 1.0.0 → 1.1.0 - Mend

fastembed 1.0.0 → 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (38)

checksums.yaml +4 -4
data/.rubocop.yml +1 -0
data/.yardopts +6 -0
data/BENCHMARKS.md +124 -1
data/CHANGELOG.md +14 -0
data/README.md +395 -74
data/benchmark/compare_all.rb +167 -0
data/benchmark/compare_python.py +60 -0
data/benchmark/memory_profile.rb +70 -0
data/benchmark/profile.rb +198 -0
data/benchmark/reranker_benchmark.rb +158 -0
data/exe/fastembed +6 -0
data/fastembed.gemspec +3 -0
data/lib/fastembed/async.rb +193 -0
data/lib/fastembed/base_model.rb +247 -0
data/lib/fastembed/base_model_info.rb +61 -0
data/lib/fastembed/cli.rb +745 -0
data/lib/fastembed/custom_model_registry.rb +255 -0
data/lib/fastembed/image_embedding.rb +313 -0
data/lib/fastembed/late_interaction_embedding.rb +260 -0
data/lib/fastembed/late_interaction_model_info.rb +91 -0
data/lib/fastembed/model_info.rb +59 -19
data/lib/fastembed/model_management.rb +82 -23
data/lib/fastembed/onnx_embedding_model.rb +25 -4
data/lib/fastembed/pooling.rb +39 -3
data/lib/fastembed/progress.rb +52 -0
data/lib/fastembed/quantization.rb +75 -0
data/lib/fastembed/reranker_model_info.rb +91 -0
data/lib/fastembed/sparse_embedding.rb +261 -0
data/lib/fastembed/sparse_model_info.rb +80 -0
data/lib/fastembed/text_cross_encoder.rb +217 -0
data/lib/fastembed/text_embedding.rb +161 -28
data/lib/fastembed/validators.rb +59 -0
data/lib/fastembed/version.rb +1 -1
data/lib/fastembed.rb +42 -1
data/plan.md +257 -0
data/scripts/verify_models.rb +229 -0
metadata +70 -3

data/plan.md ADDED

@@ -0,0 +1,257 @@
+# FastEmbed-rb Roadmap
+This document outlines features from the original [FastEmbed Python library](https://github.com/qdrant/fastembed) that are not yet implemented in fastembed-rb.
+## Current Status (v1.0.0)
+### Implemented
+- Dense text embeddings with 12 models
+- Automatic model downloading from HuggingFace
+- Lazy evaluation via `Enumerator`
+- Query/passage prefixes for retrieval models
+- Mean pooling and L2 normalization
+- Configurable batch size and threading
+- CoreML execution provider support
+- CLI tool (`fastembed`)
+- **Reranking / Cross-Encoder models** (5 models)
+## Feature Gap Analysis
+### High Priority
+#### 1. Sparse Text Embeddings
+The Python library supports sparse embedding models that return indices and values rather than dense vectors. These are useful for hybrid search combining keyword and semantic matching.
+**Models to support:**
+- `Qdrant/bm25` - Classic BM25 (0.010 GB)
+- `Qdrant/bm42-all-minilm-l6-v2-attentions` - Attention-based sparse (0.090 GB)
+- `prithivida/Splade_PP_en_v1` - SPLADE++ (0.532 GB)
+**API design:**
+```ruby
+sparse = Fastembed::TextSparseEmbedding.new
+result = sparse.embed(["hello world"]).first
+# => { indices: [123, 456, 789], values: [0.5, 0.3, 0.2] }
+```
+**Implementation notes:**
+- Need new `TextSparseEmbedding` class
+- Different output format (sparse vectors instead of dense)
+- May require different tokenization approach for BM25
+#### 2. Late Interaction (ColBERT) Models
+ColBERT-style models produce token-level embeddings rather than a single vector per document. This enables more fine-grained matching.
+**Models to support:**
+- `answerdotai/answerai-colbert-small-v1` (96 dim)
+- `colbert-ir/colbertv2.0` (128 dim)
+- `jinaai/jina-colbert-v2` (128 dim)
+**API design:**
+```ruby
+colbert = Fastembed::LateInteractionTextEmbedding.new
+result = colbert.embed(["hello world"]).first
+# => Array of token embeddings, shape: [num_tokens, dim]
+```
+**Implementation notes:**
+- Returns 2D array per document (tokens × dimensions)
+- Different pooling strategy (no pooling, keep all tokens)
+- Scoring requires MaxSim operation between query and document tokens
+#### ~~3. Reranking / Cross-Encoder Models~~ ✅ IMPLEMENTED
+See `Fastembed::TextCrossEncoder` class.
+### Medium Priority
+#### ~~4. Image Embeddings~~ ✅ IMPLEMENTED
+Vision models for converting images to vectors. Requires `mini_magick` gem.
+**Supported models:**
+- `Qdrant/resnet50-onnx` (2048 dim)
+- `Qdrant/clip-ViT-B-32-vision` (512 dim)
+- `jinaai/jina-clip-v1` (768 dim)
+**Usage:**
+```ruby
+# Add to Gemfile: gem "mini_magick"
+image_embed = Fastembed::ImageEmbedding.new
+vector = image_embed.embed(["path/to/image.jpg"]).first
+```
+#### ~~5. Custom Model Support~~ ✅ IMPLEMENTED
+Implemented via `CustomModelRegistry` module. Users can register custom models:
+```ruby
+Fastembed.register_model(
+  model_name: "my-org/my-model",
+  dim: 768,
+  sources: { hf: "my-org/my-model" }
+)
+embed = Fastembed::TextEmbedding.new(model_name: "my-org/my-model")
+```
+Also supports local model loading via `local_model_dir` parameter.
+### Low Priority
+#### 6. Multimodal Late Interaction (ColPali)
+ColPali models that can embed both images and text for document retrieval.
+**Models to support:**
+- `vidore/colpali-v1.2`
+- `vidore/colqwen2-v1.0`
+**Implementation notes:**
+- Combines image and text embedding
+- Requires vision preprocessing
+- Complex architecture, lower priority
+#### 7. Quantized Models
+Support for INT8/INT4 quantized models for faster inference and lower memory usage.
+**Implementation notes:**
+- ONNX Runtime supports quantized models natively
+- Need to add quantized model variants to registry
+- Trade-off between speed and accuracy
+## ~~CLI Enhancements~~ ✅ IMPLEMENTED
+All planned CLI features have been implemented:
+- ✅ `fastembed download <model>` - Pre-download models for offline use
+- ✅ `fastembed benchmark` - Run performance benchmarks with configurable iterations
+- ✅ `fastembed info <model>` - Show detailed model information including cache status
+- ✅ `-i input.txt` - Read texts from file (one per line)
+- ✅ `-p` / `--progress` - Show progress bar during embedding
+- ✅ `-q` / `--quiet` - Suppress progress output for scripting
+## Breaking Changes for v2.0
+If we do a major version bump:
+1. Consider making `embed()` return an Array instead of Enumerator by default
+2. Rename `query_embed`/`passage_embed` to `embed_query`/`embed_passage` for consistency
+3. Use keyword arguments consistently throughout
+---
+## Refactoring Plan
+### Completed: Phase 1 - Extract Shared Helpers
+- [x] Create `Validators` module for document validation
+- [x] Extract `prepare_model_inputs` to BaseModel
+- [x] Extract `setup_model_and_tokenizer` to BaseModel
+- [x] Update all model classes to use shared helpers
+**Result:** Removed ~60 lines of duplicated code across 4 model classes.
+---
+### Completed: Phase 2 - Add Missing Features (Medium Risk)
+Goal: Achieve API consistency across all model types.
+#### 2.1 Add `passage_embed` to TextSparseEmbedding ✅ IMPLEMENTED
+Added to TextSparseEmbedding.
+```ruby
+# lib/fastembed/sparse_embedding.rb
+def passage_embed(passages, batch_size: 32)
+  passages = [passages] if passages.is_a?(String)
+  embed(passages, batch_size: batch_size)
+end
+```
+#### 2.2 Add async methods to all embedding classes ✅ IMPLEMENTED
+Added async methods to all model classes:
+- TextSparseEmbedding: embed_async, query_embed_async, passage_embed_async
+- LateInteractionTextEmbedding: embed_async, query_embed_async, passage_embed_async
+- TextCrossEncoder: rerank_async, rerank_with_scores_async
+```ruby
+# Add to TextSparseEmbedding
+def embed_async(documents, batch_size: 32)
+  Async::Future.new { embed(documents, batch_size: batch_size).to_a }
+end
+def query_embed_async(queries, batch_size: 32)
+  Async::Future.new { query_embed(queries, batch_size: batch_size).to_a }
+end
+def passage_embed_async(passages, batch_size: 32)
+  Async::Future.new { passage_embed(passages, batch_size: batch_size).to_a }
+end
+# Add to TextCrossEncoder
+def rerank_async(query:, documents:, batch_size: 64)
+  Async::Future.new { rerank(query: query, documents: documents, batch_size: batch_size) }
+end
+```
+#### 2.3 Add progress callback support to all embedding classes ✅ IMPLEMENTED
+Added progress callback support to TextSparseEmbedding and LateInteractionTextEmbedding.
+#### 2.4 Add `show_progress` parameter to TextCrossEncoder ✅ IMPLEMENTED
+Made configurable (was hardcoded to true).
+---
+### Completed: Phase 3 - Unify Initialization (Higher Risk)
+Goal: Consistent initialization API across all model types.
+#### 3.1 Add quantization support to all models ✅ IMPLEMENTED
+Added quantization parameter to all model classes (TextSparseEmbedding, LateInteractionTextEmbedding, TextCrossEncoder).
+#### 3.2 Add local_model_dir support to all models ✅ IMPLEMENTED
+Added local_model_dir, model_file, and tokenizer_file parameters to all model classes. Shared logic extracted to BaseModel (initialize_from_local, create_local_model_info).
+#### 3.3 Document batch size rationale ✅ DOCUMENTED
+Default batch sizes vary by model type based on memory requirements:
+| Model Type | Default Batch Size | Rationale |
+|------------|-------------------|-----------|
+| TextEmbedding | 256 | Dense embeddings have fixed output size (e.g., 384 floats). Memory is predictable and efficient. |
+| TextSparseEmbedding | 32 | SPLADE models output logits for entire vocabulary (~30k tokens) per sequence position. Much higher memory per document. |
+| LateInteractionTextEmbedding | 32 | ColBERT keeps per-token embeddings (not pooled), so output size scales with sequence length × embedding dim. |
+| TextCrossEncoder | 64 | Processes query-document pairs together. Each pair requires more memory than single documents, but less than sparse/late interaction. |
+Users can override these defaults via the `batch_size` parameter if they have different memory constraints.
+---
+### Implementation Priority
+| Task | Risk | Effort | Value |
+|------|------|--------|-------|
+| 2.1 Add passage_embed to Sparse | Low | Small | Medium |
+| 2.2 Add async to all classes | Low | Medium | High |
+| 2.3 Add progress to all classes | Medium | Medium | Medium |
+| 2.4 Add show_progress to CrossEncoder | Low | Small | Low |
+| 3.1 Add quantization to all | Medium | Medium | Medium |
+| 3.2 Add local_model_dir to all | Medium | Large | Medium |
+| 3.3 Document batch size rationale | Low | Small | Low |
+---
+## Contributing
+Contributions are welcome! If you'd like to implement any of these features:
+1. Open an issue to discuss the approach
+2. Follow the existing code style (run `bundle exec rubocop`)
+3. Add tests for new functionality
+4. Update the README and CHANGELOG
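
The roadmap above notes that scoring late interaction (ColBERT) output requires a MaxSim operation between query and document tokens. The following is a minimal sketch of that operation in plain Ruby, assuming each side is a non-empty array of equal-length token vectors. The helper names `dot` and `max_sim` are hypothetical and are not taken from the gem.

```ruby
# Hypothetical helpers for illustration; not part of the fastembed gem.

# Dot product of two equal-length numeric arrays.
def dot(a, b)
  a.zip(b).sum { |x, y| x * y }
end

# MaxSim: for each query token, take the similarity of its best-matching
# document token, then sum those maxima over all query tokens.
# Assumes both arguments are non-empty Array<Array<Float>>.
def max_sim(query_tokens, doc_tokens)
  query_tokens.sum do |q|
    doc_tokens.map { |d| dot(q, d) }.max
  end
end

query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.2, 0.8]] # two tokens, each close to one query token
doc_b = [[0.5, 0.5]]             # a single, less specific token

puts max_sim(query, doc_a)
puts max_sim(query, doc_b)
```

With these toy vectors `doc_a` scores higher than `doc_b`, because each query token finds a closer match among its tokens. Whether the token vectors returned by `LateInteractionTextEmbedding#embed` are already normalized is not stated in this diff, so treat the raw dot product here as an assumption.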

data/scripts/verify_models.rb ADDED

@@ -0,0 +1,229 @@
+#!/usr/bin/env ruby
+# frozen_string_literal: true
+# Model Verification Script
+# Downloads and tests all supported models to ensure they work correctly.
+#
+# Usage:
+#   ruby scripts/verify_models.rb           # Run all model tests
+#   ruby scripts/verify_models.rb --quick   # Quick test (first model of each type only)
+#   ruby scripts/verify_models.rb --type embedding  # Test only embedding models
+require 'bundler/setup'
+require 'fastembed'
+require 'optparse'
+class ModelVerifier
+  COLORS = {
+    green: "\e[32m",
+    red: "\e[31m",
+    yellow: "\e[33m",
+    cyan: "\e[36m",
+    reset: "\e[0m"
+  }.freeze
+  def initialize(quick: false, type: nil)
+    @quick = quick
+    @type = type
+    @results = { passed: [], failed: [], skipped: [] }
+  end
+  def run
+    puts "#{COLORS[:cyan]}=== FastEmbed Model Verification ===#{COLORS[:reset]}"
+    puts "Mode: #{@quick ? 'Quick' : 'Full'}"
+    puts
+    verify_embedding_models if should_test?(:embedding)
+    verify_sparse_models if should_test?(:sparse)
+    verify_late_interaction_models if should_test?(:late_interaction)
+    verify_reranker_models if should_test?(:reranker)
+    verify_image_models if should_test?(:image)
+    print_summary
+  end
+  private
+  def should_test?(model_type)
+    @type.nil? || @type.to_sym == model_type
+  end
+  def verify_embedding_models
+    section("Text Embedding Models")
+    models = Fastembed::SUPPORTED_MODELS.keys
+    models = [models.first] if @quick
+    models.each do |model_name|
+      verify_model(model_name, :embedding) do
+        embedding = Fastembed::TextEmbedding.new(model_name: model_name, show_progress: false)
+        vectors = embedding.embed(['Hello world', 'Test document']).to_a
+        raise "Expected 2 vectors, got #{vectors.length}" unless vectors.length == 2
+        raise "Vector dimension mismatch" unless vectors.first.length == embedding.dim
+        "dim=#{embedding.dim}"
+      end
+    end
+  end
+  def verify_sparse_models
+    section("Sparse Embedding Models")
+    models = Fastembed::SUPPORTED_SPARSE_MODELS.keys
+    models = [models.first] if @quick
+    models.each do |model_name|
+      verify_model(model_name, :sparse) do
+        embedding = Fastembed::TextSparseEmbedding.new(model_name: model_name, show_progress: false)
+        vectors = embedding.embed(['Hello world']).to_a
+        raise "Expected sparse vector with indices" unless vectors.first[:indices].is_a?(Array)
+        raise "Expected sparse vector with values" unless vectors.first[:values].is_a?(Array)
+        "nnz=#{vectors.first[:indices].length}"
+      end
+    end
+  end
+  def verify_late_interaction_models
+    section("Late Interaction Models")
+    models = Fastembed::SUPPORTED_LATE_INTERACTION_MODELS.keys
+    models = [models.first] if @quick
+    models.each do |model_name|
+      verify_model(model_name, :late_interaction) do
+        embedding = Fastembed::LateInteractionTextEmbedding.new(model_name: model_name, show_progress: false)
+        vectors = embedding.embed(['Hello world']).to_a
+        raise "Expected token embeddings array" unless vectors.first.is_a?(Array)
+        raise "Token embedding dimension mismatch" unless vectors.first.first.length == embedding.dim
+        "dim=#{embedding.dim}, tokens=#{vectors.first.length}"
+      end
+    end
+  end
+  def verify_reranker_models
+    section("Reranker Models")
+    models = Fastembed::SUPPORTED_RERANKER_MODELS.keys
+    models = [models.first] if @quick
+    models.each do |model_name|
+      verify_model(model_name, :reranker) do
+        encoder = Fastembed::TextCrossEncoder.new(model_name: model_name, show_progress: false)
+        results = encoder.rerank_with_scores(
+          query: 'What is Ruby?',
+          documents: ['Ruby is a programming language', 'The sky is blue']
+        )
+        raise "Expected ranked results" unless results.is_a?(Array)
+        raise "Expected results with scores" unless results.first.key?(:score)
+        "top_score=#{results.first[:score].round(4)}"
+      end
+    end
+  end
+  def verify_image_models
+    section("Image Embedding Models")
+    begin
+      require 'mini_magick'
+    rescue LoadError
+      puts "#{COLORS[:yellow]}  Skipped (mini_magick not installed)#{COLORS[:reset]}"
+      Fastembed::SUPPORTED_IMAGE_MODELS.keys.each do |model_name|
+        @results[:skipped] << { name: model_name, type: :image, reason: 'mini_magick not installed' }
+      end
+      return
+    end
+    models = Fastembed::SUPPORTED_IMAGE_MODELS.keys
+    models = [models.first] if @quick
+    models.each do |model_name|
+      verify_model(model_name, :image) do
+        # Create a test image. Keep a reference to the Tempfile so its
+        # finalizer cannot unlink the file before the model reads it.
+        require 'tempfile'
+        tempfile = Tempfile.new(['test', '.png'])
+        path = tempfile.path
+        MiniMagick::Tool::Convert.new do |convert|
+          convert.size '224x224'
+          convert.xc 'white'
+          convert << path
+        end
+        begin
+          embedding = Fastembed::ImageEmbedding.new(model_name: model_name, show_progress: false)
+          vectors = embedding.embed(path).to_a
+          raise "Expected image embedding" unless vectors.first.is_a?(Array)
+          raise "Dimension mismatch" unless vectors.first.length == embedding.dim
+          "dim=#{embedding.dim}"
+        ensure
+          # Close the handle and remove the temporary image.
+          tempfile.close!
+        end
+      end
+    end
+  end
+  def verify_model(model_name, type)
+    print "  #{model_name}... "
+    $stdout.flush
+    start_time = Time.now
+    begin
+      result = yield
+      elapsed = (Time.now - start_time).round(2)
+      puts "#{COLORS[:green]}PASS#{COLORS[:reset]} (#{elapsed}s) [#{result}]"
+      @results[:passed] << { name: model_name, type: type, time: elapsed }
+    rescue => e
+      elapsed = (Time.now - start_time).round(2)
+      puts "#{COLORS[:red]}FAIL#{COLORS[:reset]} (#{elapsed}s)"
+      puts "    Error: #{e.message}"
+      @results[:failed] << { name: model_name, type: type, error: e.message }
+    end
+  end
+  def section(title)
+    puts "#{COLORS[:cyan]}#{title}:#{COLORS[:reset]}"
+  end
+  def print_summary
+    puts
+    puts "#{COLORS[:cyan]}=== Summary ===#{COLORS[:reset]}"
+    puts "Passed: #{COLORS[:green]}#{@results[:passed].length}#{COLORS[:reset]}"
+    puts "Failed: #{COLORS[:red]}#{@results[:failed].length}#{COLORS[:reset]}"
+    puts "Skipped: #{COLORS[:yellow]}#{@results[:skipped].length}#{COLORS[:reset]}"
+    puts
+    unless @results[:failed].empty?
+      puts "#{COLORS[:red]}Failed models:#{COLORS[:reset]}"
+      @results[:failed].each do |r|
+        puts "  - #{r[:name]} (#{r[:type]}): #{r[:error]}"
+      end
+    end
+    exit(@results[:failed].empty? ? 0 : 1)
+  end
+end
+# Parse options
+options = { quick: false, type: nil }
+OptionParser.new do |opts|
+  opts.banner = "Usage: #{$PROGRAM_NAME} [options]"
+  opts.on('-q', '--quick', 'Quick mode (first model of each type only)') do
+    options[:quick] = true
+  end
+  opts.on('-t', '--type TYPE', %w[embedding sparse late_interaction reranker image],
+          'Test only models of this type') do |type|
+    options[:type] = type
+  end
+  opts.on('-h', '--help', 'Show this help') do
+    puts opts
+    exit
+  end
+end.parse!
+ModelVerifier.new(**options).run
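
The `verify_model` method in the script above times each check with `Time.now` and turns any exception into a FAIL entry. As a hedged sketch of the same pattern, the variant below uses the monotonic clock, which is not affected by wall-clock adjustments. The helper names `timed_check` and `elapsed_since` are hypothetical and do not appear in the script.

```ruby
# Hypothetical helpers for illustration; not part of scripts/verify_models.rb.

# Seconds elapsed since a monotonic timestamp, rounded as in the script.
def elapsed_since(started)
  (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started).round(2)
end

# Runs the block and returns [status, detail, elapsed_seconds].
# A raised StandardError becomes a :failed result instead of propagating.
def timed_check
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  detail = yield
  [:passed, detail, elapsed_since(started)]
rescue StandardError => e
  [:failed, e.message, elapsed_since(started)]
end

p timed_check { 'dim=384' }
p timed_check { raise 'Vector dimension mismatch' }
```

Returning a plain array keeps reporting separate from measurement, so the caller can decide how to print or collect results.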

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: fastembed
 version: !ruby/object:Gem::Version
-  version: 1.0.0
+  version: 1.1.0
 platform: ruby
 authors:
 - Chris Hasinski
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2026-01-08 00:00:00.000000000 Z
+date: 2026-01-11 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: onnxruntime
@@ -38,6 +38,20 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '0.5'
+- !ruby/object:Gem::Dependency
+  name: mini_magick
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '4.0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '4.0'
 - !ruby/object:Gem::Dependency
   name: rake
   requirement: !ruby/object:Gem::Requirement
@@ -94,30 +108,83 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '3.0'
+- !ruby/object:Gem::Dependency
+  name: webmock
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.0'
+- !ruby/object:Gem::Dependency
+  name: yard
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.9'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.9'
 description: A Ruby port of FastEmbed - fast text embeddings using ONNX Runtime
 email:
 - krzysztof.hasinski@gmail.com
-executables: []
+executables:
+- fastembed
 extensions: []
 extra_rdoc_files: []
 files:
 - ".mise.toml"
 - ".rspec"
 - ".rubocop.yml"
+- ".yardopts"
 - BENCHMARKS.md
 - CHANGELOG.md
 - Gemfile
 - LICENSE
 - README.md
 - Rakefile
+- benchmark/compare_all.rb
+- benchmark/compare_python.py
+- benchmark/memory_profile.rb
+- benchmark/profile.rb
+- benchmark/reranker_benchmark.rb
+- exe/fastembed
 - fastembed.gemspec
 - lib/fastembed.rb
+- lib/fastembed/async.rb
+- lib/fastembed/base_model.rb
+- lib/fastembed/base_model_info.rb
+- lib/fastembed/cli.rb
+- lib/fastembed/custom_model_registry.rb
+- lib/fastembed/image_embedding.rb
+- lib/fastembed/late_interaction_embedding.rb
+- lib/fastembed/late_interaction_model_info.rb
 - lib/fastembed/model_info.rb
 - lib/fastembed/model_management.rb
 - lib/fastembed/onnx_embedding_model.rb
 - lib/fastembed/pooling.rb
+- lib/fastembed/progress.rb
+- lib/fastembed/quantization.rb
+- lib/fastembed/reranker_model_info.rb
+- lib/fastembed/sparse_embedding.rb
+- lib/fastembed/sparse_model_info.rb
+- lib/fastembed/text_cross_encoder.rb
 - lib/fastembed/text_embedding.rb
+- lib/fastembed/validators.rb
 - lib/fastembed/version.rb
+- plan.md
+- scripts/verify_models.rb
 homepage: https://github.com/khasinski/fastembed-rb
 licenses:
 - MIT