RubyGems - vectra-client - Versions diffs - 1.1.0 → 1.1.1 - Mend

vectra-client 1.1.0 → 1.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15)

checksums.yaml +4 -4
data/CHANGELOG.md +37 -0
data/README.md +22 -0
data/docs/_layouts/page.html +7 -0
data/docs/api/cheatsheet.md +17 -0
data/docs/api/methods.md +45 -0
data/docs/guides/roadmap.md +53 -0
data/lib/vectra/client.rb +61 -0
data/lib/vectra/middleware/request.rb +1 -1
data/lib/vectra/providers/memory.rb +56 -0
data/lib/vectra/providers/pgvector.rb +50 -0
data/lib/vectra/providers/qdrant.rb +39 -0
data/lib/vectra/providers/weaviate.rb +64 -0
data/lib/vectra/version.rb +1 -1
metadata +2 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 70730299dab8475b05688f7017dccd1965b87a49589c96be7633a356af8d57f2
-  data.tar.gz: 14999ebde62586578b444ba45d396cb38f1a4dd5dcbf2fd126e042ebb1c6fae5
+  metadata.gz: 0f1fd06b5874c1bc1da1244fb05321cda4a4759e234d9175159c5a6ba7ed8d40
+  data.tar.gz: 9d69e983f4ef5ed4d6bdd58e7a003e06045f3570ba1b3329587f82bcf07902a3
 SHA512:
-  metadata.gz: 69c8fa722ee4abfe3ddf6b19f8bd12f46d09edefee50612021142c3951600666ed854018e8a5c1bc33249895fec42ae18ab1d821beffa989d2700da99c28bcf4
-  data.tar.gz: f27a0df4bbcf618659297b1376ebe977588d32d5dfef76fe7c8f5f38c8780b37e055a0e8d91f6bc3ae8eb7cad88606c760d4ed5882c61756ac1495dedfc339a5
+  metadata.gz: 7c1911470f96d83470dd98cdc6bb3e6c438a11e71b7317265388d22ae4c3792cea3e595661c927a42587b6008fc891940d511815932b3da478bfeb056ccf8c30
+  data.tar.gz: a3a17643736f8b9a19b92c81a87ce8749a4e55798085e474ef1bc2f13ab9f8c688b2bc7537467a851e59a31cf06390444a064a9d525cff8efb175dbae1de7c7c

data/CHANGELOG.md CHANGED

@@ -1,5 +1,42 @@
 # Changelog
+## [v1.1.0](https://github.com/stokry/vectra/tree/v1.1.0) (2026-01-15)
+[Full Changelog](https://github.com/stokry/vectra/compare/v1.0.8...v1.1.0)
+### 🎉 Major Feature: Middleware System
+This release introduces a **Rack-style middleware system** for all vector database operations.
+#### Added
+- **Middleware Stack** - All client operations now route through a composable middleware pipeline
+- **5 Built-in Middleware**:
+  - `Vectra::Middleware::Logging` - Structured logs with timing for all operations
+  - `Vectra::Middleware::Retry` - Automatic retry with exponential/linear backoff for transient errors
+  - `Vectra::Middleware::Instrumentation` - Hooks for metrics and APM integration
+  - `Vectra::Middleware::PIIRedaction` - Automatic PII redaction (email, phone, SSN, credit cards)
+  - `Vectra::Middleware::CostTracker` - Track API costs per operation with callbacks
+- **Request/Response Objects** - Type-safe objects with metadata attachment
+- **Extensible Framework** - Create custom middleware by extending `Vectra::Middleware::Base`
+- **Global & Per-Client Middleware** - Apply middleware globally (`Client.use`) or per-instance (`new(middleware: [...])`)
+#### Changed
+- All client operations (`upsert`, `query`, `fetch`, `update`, `delete`, `stats`, `list_indexes`, etc.) now route through middleware stack for consistency
+- Middleware has complete visibility into all client operations
+#### Documentation
+- Added comprehensive middleware section to README
+- Created `examples/middleware_demo.rb` demonstrating all 5 built-in middleware
+- Full YARD documentation for all middleware classes
+- Published [middleware guide](https://dev.to/stokry/rack-style-middleware-for-vector-databases-in-ruby-vectra-client-110-2jh3) on Dev.to
+#### Migration Notes
+No breaking changes. Middleware is opt-in - existing code works without modification.
 ## [v1.0.8](https://github.com/stokry/vectra/tree/v1.0.8) (2026-01-14)
 [Full Changelog](https://github.com/stokry/vectra/compare/v1.0.7...v1.0.8)

data/README.md CHANGED

@@ -109,6 +109,14 @@ results = client.hybrid_search(
   text: 'ruby programming',
   alpha: 0.7  # 70% semantic, 30% keyword
 )
+# Text-only search (keyword search without embeddings)
+# Supported by: Qdrant, Weaviate, pgvector
+results = client.text_search(
+  index: 'products',
+  text: 'iPhone 15 Pro',
+  top_k: 10
+)
 ```
 ## Provider Examples
@@ -307,6 +315,20 @@ Vectra includes 7 production-ready patterns out of the box:
 - **Health Checks** - `healthy?`, `ping`, and `health_check` methods
 - **Instrumentation** - Datadog, New Relic, Sentry, Honeybadger support
+## Roadmap
+High-level roadmap for `vectra-client`:
+- **1.x (near term)**
+  - Reranking middleware built on top of the existing Rack-style middleware stack.
+  - Additional middleware building blocks (sampling, tracing, score normalization).
+  - Smoother Rails UX for multi-tenant setups and larger demos (e‑commerce, RAG, recommendations).
+- **Mid term**
+  - Additional providers where it makes sense and stays maintainable.
+  - Deeper documentation and recipes around reranking and hybrid search.
+For a more detailed, always-up-to-date version, see the online roadmap: https://vectra-docs.netlify.app/guides/roadmap/
 ## Development
 ```bash

data/docs/_layouts/page.html CHANGED

@@ -91,6 +91,13 @@
           <li><a href="https://github.com/stokry/vectra/issues" class="tma-sidebar__link" target="_blank">Report Issue ↗</a></li>
         </ul>
       </div>
+      <div class="tma-sidebar__section">
+        <h3 class="tma-sidebar__title">Resources</h3>
+        <ul class="tma-sidebar__list">
+          <li><a href="{{ site.baseurl }}/guides/roadmap" class="tma-sidebar__link {% if page.url == '/guides/roadmap/' %}tma-sidebar__link--active{% endif %}">Roadmap</a></li>
+        </ul>
+      </div>
     </aside>
     <!-- Main Content -->

data/docs/api/cheatsheet.md CHANGED

@@ -98,6 +98,23 @@ results = client.hybrid_search(
 Supported providers: Qdrant ✅, Weaviate ✅, pgvector ✅, Pinecone ⚠️
+### Text Search (keyword-only, no embeddings)
+```ruby
+results = client.text_search(
+  index: 'products',
+  text: 'iPhone 15 Pro',
+  top_k: 10,
+  filter: { category: 'electronics' }
+)
+results.each do |match|
+  puts "#{match.id} (score=#{match.score.round(3)}): #{match.metadata['title']}"
+end
+```
+Supported providers: Qdrant ✅ (BM25), Weaviate ✅ (BM25), pgvector ✅ (PostgreSQL full-text)
 ### Fetch
 ```ruby

data/docs/api/methods.md CHANGED

@@ -147,6 +147,51 @@ results = client.hybrid_search(
 ---
+### `client.text_search(index:, text:, top_k: 10, namespace: nil, filter: nil, include_values: false, include_metadata: true)`
+Text-only search (keyword search without requiring embeddings).
+**Parameters:**
+- `index` (String) - Index/collection name (uses client's default index when omitted)
+- `text` (String) - Text query for keyword search
+- `top_k` (Integer) - Number of results (default: 10)
+- `namespace` (String, optional) - Namespace
+- `filter` (Hash, optional) - Metadata filter
+- `include_values` (Boolean) - Include vector values (default: false)
+- `include_metadata` (Boolean) - Include metadata (default: true)
+**Returns:** `Vectra::QueryResult`
+**Provider Support:**
+- ✅ Qdrant (BM25)
+- ✅ Weaviate (BM25)
+- ✅ pgvector (PostgreSQL full-text search)
+- ✅ Memory (simple keyword matching - for testing only)
+- ❌ Pinecone (not supported - use sparse vectors instead)
+**Example:**
+```ruby
+# Keyword search for exact matches
+results = client.text_search(
+  index: 'products',
+  text: 'iPhone 15 Pro',
+  top_k: 10,
+  filter: { category: 'electronics' }
+)
+results.each do |match|
+  puts "#{match.id}: #{match.score} - #{match.metadata['title']}"
+end
+```
+**Use Cases:**
+- Product name search (exact matches)
+- Function/class name search in documentation
+- Keyword-based filtering when semantic search is not needed
+- Faster search when embeddings are not available
+---
 ### `client.fetch(index:, ids:, namespace: nil)`
 Fetch vectors by their IDs.

data/docs/guides/roadmap.md ADDED

@@ -0,0 +1,53 @@
+---
+layout: page
+title: Roadmap
+permalink: /guides/roadmap/
+---
+# Vectra Roadmap
+This page outlines the high-level roadmap for **vectra-client**, the unified Ruby client for vector databases.
+The roadmap is intentionally focused on **production features** that make AI workloads reliable, observable, and easy to operate in Ruby.
+## Near Term (1.x)
+- **Reranking middleware**
+  - Middleware that can call external rerankers (e.g., Cohere, Jina, custom HTTP) and reorder search results after a `query`.
+  - Pluggable providers, configurable `top_n`, and safe fallbacks when reranking fails.
+- **More middleware building blocks**
+  - Request sampling / tracing for debugging complex production issues.
+  - Response shaping (e.g., score normalization, custom thresholds) as reusable middleware.
+- **Rails UX improvements**
+  - Convenience generators and helpers for multi-tenant setups.
+  - Better defaults and examples for 1k+ records demos (e‑commerce, blogs, RAG, recommendations).
+## Mid Term
+- **Additional providers**
+  - Support for more hosted / self-hosted vector solutions where it makes sense and stays maintainable.
+- **First-class reranking guides**
+  - End-to-end documentation for combining vectra-client with external LLMs / rerankers.
+- **More recipes & patterns**
+  - Deeper recipes for analytics, recommendations, and hybrid search in large Rails apps.
+## Long Term Vision
+Keep **vectra-client** the most **production-ready Ruby toolkit** for vector databases:
+- Strong guarantees around retries, circuit breakers, and backpressure.
+- Excellent observability out of the box.
+- Stable, provider-agnostic API that lets you change infra without rewriting your app.
+If you have ideas or needs that fit this direction, please open an issue on GitHub so we can prioritise the roadmap around real-world use cases.
+{
+  "cells": [],
+  "metadata": {
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 2
+}

data/lib/vectra/client.rb CHANGED

@@ -494,6 +494,67 @@ module Vectra
       )
     end
+    # Text-only search (keyword search without embeddings)
+    #
+    # Performs keyword/text search without requiring vector embeddings.
+    # Useful for exact matches, product names, function names, etc.
+    #
+    # @param index [String] the index/collection name
+    # @param text [String] text query for keyword search
+    # @param top_k [Integer] number of results to return (default: 10)
+    # @param namespace [String, nil] optional namespace
+    # @param filter [Hash, nil] metadata filter
+    # @param include_values [Boolean] include vector values in results
+    # @param include_metadata [Boolean] include metadata in results
+    # @return [QueryResult] search results
+    #
+    # @example Basic text search
+    #   results = client.text_search(
+    #     index: 'products',
+    #     text: 'iPhone 15 Pro',
+    #     top_k: 10
+    #   )
+    #
+    # @example Text search with filter
+    #   results = client.text_search(
+    #     index: 'products',
+    #     text: 'laptop',
+    #     filter: { category: 'electronics', in_stock: true }
+    #   )
+    #
+    # @raise [UnsupportedFeatureError] if provider doesn't support text search
+    def text_search(index:, text:, top_k: 10, namespace: nil, filter: nil,
+                    include_values: false, include_metadata: true)
+      index ||= default_index
+      namespace ||= default_namespace
+      validate_index!(index)
+      raise ValidationError, "Text query cannot be nil or empty" if text.nil? || text.empty?
+      unless provider.respond_to?(:text_search)
+        raise UnsupportedFeatureError,
+              "Text search is not supported by #{provider_name} provider"
+      end
+      Instrumentation.instrument(
+        operation: :text_search,
+        provider: provider_name,
+        index: index,
+        metadata: { top_k: top_k }
+      ) do
+        @middleware.call(
+          :text_search,
+          index: index,
+          text: text,
+          top_k: top_k,
+          namespace: namespace,
+          filter: filter,
+          include_values: include_values,
+          include_metadata: include_metadata,
+          provider: provider_name
+        )
+      end
+    end
     # Get the provider name
     #
     # @return [Symbol]
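
The guard logic in the `text_search` hunk above (validate the query, then check provider capability with `respond_to?`) can be sketched in isolation. This is a minimal illustration, not the gem's code: `KeywordProvider`, `VectorOnlyProvider`, and `guarded_text_search` are hypothetical stand-ins, and the error classes are redefined locally so the snippet runs without `vectra-client` installed. Note that `index:` is declared as a required keyword in the hunk, so the `index ||= default_index` fallback appears to apply only when `nil` is passed explicitly.

```ruby
# Local stand-ins for the gem's error classes (assumed names, defined here
# only so the sketch is self-contained).
class UnsupportedFeatureError < StandardError; end
class ValidationError < StandardError; end

# Hypothetical provider that implements text search.
class KeywordProvider
  def text_search(index:, text:, top_k:)
    ["#{index}:#{text}"].first(top_k)
  end
end

# Hypothetical provider without text search support.
class VectorOnlyProvider; end

# Mirrors the order of checks in the hunk: validate the query first,
# then verify the provider responds to :text_search before delegating.
def guarded_text_search(provider, index:, text:, top_k: 10)
  raise ValidationError, "Text query cannot be nil or empty" if text.nil? || text.empty?

  unless provider.respond_to?(:text_search)
    raise UnsupportedFeatureError,
          "Text search is not supported by #{provider.class.name} provider"
  end

  provider.text_search(index: index, text: text, top_k: top_k)
end

begin
  guarded_text_search(VectorOnlyProvider.new, index: "products", text: "laptop")
rescue UnsupportedFeatureError => e
  puts "Falling back to another search path: #{e.message}"
end
```

Callers that target several providers may want to rescue the unsupported-feature error and fall back to a vector or hybrid query, as the `begin`/`rescue` at the end suggests.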

data/lib/vectra/middleware/request.rb CHANGED

@@ -55,7 +55,7 @@ module Vectra
       #
       # @return [Boolean]
       def read_operation?
-        [:query, :fetch, :list_indexes, :describe_index, :stats].include?(operation)
+        [:query, :text_search, :hybrid_search, :fetch, :list_indexes, :describe_index, :stats].include?(operation)
       end
     end
   end
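
To illustrate why the `read_operation?` change above matters, here is a small sketch of a middleware-style policy that treats the two new search operations as reads. `SketchRequest` and `retryable?` are hypothetical names; the retry rule is an assumption for illustration and is not taken from `Vectra::Middleware::Retry`.

```ruby
# Operation list copied from the updated line in the hunk above.
READ_OPERATIONS = %i[
  query text_search hybrid_search fetch list_indexes describe_index stats
].freeze

# Hypothetical minimal request object exposing the same predicate.
SketchRequest = Struct.new(:operation) do
  def read_operation?
    READ_OPERATIONS.include?(operation)
  end
end

# Hypothetical policy: only operations that do not mutate data are retried.
def retryable?(request)
  request.read_operation?
end

puts retryable?(SketchRequest.new(:text_search))
puts retryable?(SketchRequest.new(:upsert))
```

Before this change, a policy like this would have classified `:text_search` and `:hybrid_search` as writes.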

data/lib/vectra/providers/memory.rb CHANGED

@@ -80,6 +80,32 @@ module Vectra
         QueryResult.from_response(matches: matches, namespace: namespace)
       end
+      # Text-only search using simple keyword matching in metadata
+      #
+      # For testing purposes only. Performs case-insensitive keyword matching
+      # in metadata values. Not a real BM25/full-text search implementation.
+      #
+      # @param index [String] index name
+      # @param text [String] text query for keyword search
+      # @param top_k [Integer] number of results
+      # @param namespace [String, nil] optional namespace
+      # @param filter [Hash, nil] metadata filter
+      # @param include_values [Boolean] include vector values
+      # @param include_metadata [Boolean] include metadata
+      # @return [QueryResult] search results
+      def text_search(index:, text:, top_k:, namespace: nil, filter: nil,
+                      include_values: false, include_metadata: true)
+        ns = namespace || ""
+        candidates = filter_candidates(@storage[index][ns].values, filter)
+        text_lower = text.to_s.downcase
+        matches = find_text_matches(candidates, text_lower, include_values, include_metadata)
+        matches = matches.sort_by { |m| -m[:score] }.first(top_k)
+        log_debug("Text search returned #{matches.size} results")
+        QueryResult.from_response(matches: matches, namespace: namespace)
+      end
       # @see Base#fetch
       def fetch(index:, ids:, namespace: nil)
         ns = namespace || ""
@@ -293,6 +319,36 @@ module Vectra
         true
       end
       # rubocop:enable Naming/PredicateMethod
+      # Filter candidates by metadata filter
+      def filter_candidates(candidates, filter)
+        return candidates unless filter
+        candidates.select { |v| matches_filter?(v, filter) }
+      end
+      # Find text matches in candidates
+      def find_text_matches(candidates, text_lower, include_values, include_metadata)
+        candidates.map do |vec|
+          metadata_text = build_metadata_text(vec)
+          next unless metadata_text.include?(text_lower)
+          score = calculate_text_score(text_lower, metadata_text)
+          build_match(vec, score, include_values, include_metadata)
+        end.compact
+      end
+      # Build metadata text string for searching
+      def build_metadata_text(vector)
+        (vector.metadata || {}).values.map(&:to_s).join(" ").downcase
+      end
+      # Calculate text match score based on word matches
+      def calculate_text_score(query_text, metadata_text)
+        query_words = query_text.split(/\s+/)
+        matched_words = query_words.count { |word| metadata_text.include?(word) }
+        matched_words.to_f / query_words.size
+      end
     end
   end
 end
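
The Memory provider's matching and scoring helpers above can be exercised on plain hashes. The sketch below copies `build_metadata_text` and `calculate_text_score` from the hunk but takes a metadata hash instead of a Vectra vector; `text_match_score` is a hypothetical wrapper that combines the substring gate from `find_text_matches` with the scoring step. It suggests that a record is returned only when the whole lowercased query appears as one contiguous substring, in which case every query word is necessarily found as well.

```ruby
# Same logic as the hunk, but operating on a metadata hash directly.
def build_metadata_text(metadata)
  (metadata || {}).values.map(&:to_s).join(" ").downcase
end

# Fraction of query words that occur anywhere in the metadata text.
def calculate_text_score(query_text, metadata_text)
  query_words = query_text.split(/\s+/)
  matched_words = query_words.count { |word| metadata_text.include?(word) }
  matched_words.to_f / query_words.size
end

# Hypothetical wrapper: returns nil when the record would be skipped,
# otherwise the score it would receive.
def text_match_score(metadata, query)
  text_lower = query.to_s.downcase
  metadata_text = build_metadata_text(metadata)
  return nil unless metadata_text.include?(text_lower)

  calculate_text_score(text_lower, metadata_text)
end

exact     = text_match_score({ "title" => "Apple iPhone 15 Pro Max" }, "iPhone 15 Pro")
partial   = text_match_score({ "title" => "iPhone 15" }, "iPhone 15 Pro")
reordered = text_match_score({ "title" => "Pro iPhone 15" }, "iPhone 15 Pro")

p exact      # 1.0: the full phrase is present
p partial    # nil: the phrase is not a substring, so the record is skipped
p reordered  # nil: same words, different order
```

Because of the substring gate, partial or reordered matches never reach the scoring step, which is consistent with the hunk's own comment that this provider is for testing only.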

data/lib/vectra/providers/pgvector.rb CHANGED

@@ -28,6 +28,7 @@ module Vectra
     #   )
     #   client.upsert(index: 'documents', vectors: [...])
     #
+    # rubocop:disable Metrics/ClassLength
     class Pgvector < Base
       include Connection
       include SqlHelpers
@@ -162,6 +163,54 @@ module Vectra
         )
       end
+      # Text-only search using PostgreSQL full-text search
+      #
+      # @param index [String] table name
+      # @param text [String] text query for full-text search
+      # @param top_k [Integer] number of results
+      # @param namespace [String, nil] optional namespace
+      # @param filter [Hash, nil] metadata filter
+      # @param include_values [Boolean] include vector values
+      # @param include_metadata [Boolean] include metadata
+      # @param text_column [String] column name for full-text search (default: 'content')
+      # @return [QueryResult] search results
+      #
+      # @note Your table should have a text column with a tsvector index:
+      #   CREATE INDEX idx_content_fts ON my_index USING gin(to_tsvector('english', content));
+      def text_search(index:, text:, top_k:, namespace: nil, filter: nil,
+                      include_values: false, include_metadata: true,
+                      text_column: "content")
+        ensure_table_exists!(index)
+        select_cols = ["id"]
+        select_cols << "embedding" if include_values
+        select_cols << "metadata" if include_metadata
+        # Use ts_rank for scoring
+        text_score = "ts_rank(to_tsvector('english', COALESCE(#{quote_ident(text_column)}, '')), " \
+                     "plainto_tsquery('english', #{escape_literal(text)}))"
+        select_cols << "#{text_score} AS score"
+        where_clauses = build_where_clauses(namespace, filter)
+        where_clauses << "to_tsvector('english', COALESCE(#{quote_ident(text_column)}, '')) @@ " \
+                         "plainto_tsquery('english', #{escape_literal(text)})"
+        sql = "SELECT #{select_cols.join(', ')} FROM #{quote_ident(index)}"
+        sql += " WHERE #{where_clauses.join(' AND ')}" if where_clauses.any?
+        sql += " ORDER BY score DESC"
+        sql += " LIMIT #{top_k.to_i}"
+        result = execute(sql)
+        matches = result.map { |row| build_match_from_row(row, include_values, include_metadata) }
+        log_debug("Text search returned #{matches.size} results")
+        QueryResult.from_response(
+          matches: matches,
+          namespace: namespace
+        )
+      end
       # @see Base#fetch
       def fetch(index:, ids:, namespace: nil)
         ensure_table_exists!(index)
@@ -361,5 +410,6 @@ module Vectra
         raise ConfigurationError, "Host (connection URL or hostname) must be configured for pgvector"
       end
     end
+    # rubocop:enable Metrics/ClassLength
   end
 end
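
To make the SQL assembled by the pgvector `text_search` hunk easier to inspect, the sketch below rebuilds the statement as a plain string. The `quote_ident` and `escape_literal` helpers here are simplified stand-ins (the gem's own helpers live in `SqlHelpers` and are not shown in this diff), the select list is fixed to `id, metadata`, and namespace and metadata filters are omitted.

```ruby
# Simplified stand-in: double any embedded double quotes in an identifier.
def quote_ident(name)
  %("#{name.to_s.gsub('"', '""')}")
end

# Simplified stand-in: double any embedded single quotes in a literal.
def escape_literal(value)
  "'#{value.to_s.gsub("'", "''")}'"
end

# Builds the same shape of query as the hunk: ts_rank for scoring,
# the @@ match operator for filtering, ordered by score with a LIMIT.
def text_search_sql(table:, text:, top_k:, text_column: "content")
  document = "to_tsvector('english', COALESCE(#{quote_ident(text_column)}, ''))"
  query = "plainto_tsquery('english', #{escape_literal(text)})"

  sql = "SELECT id, metadata, ts_rank(#{document}, #{query}) AS score " \
        "FROM #{quote_ident(table)}"
  sql += " WHERE #{document} @@ #{query}"
  sql += " ORDER BY score DESC"
  sql + " LIMIT #{top_k.to_i}"
end

puts text_search_sql(table: "products", text: "O'Reilly ruby", top_k: 5)
```

As in the hunk, `top_k.to_i` coerces the limit to an integer, so a non-numeric suffix in the argument cannot reach the SQL text.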

data/lib/vectra/providers/qdrant.rb CHANGED

@@ -110,6 +110,45 @@ module Vectra
         handle_hybrid_search_response(response, alpha, namespace)
       end
+      # Text-only search using Qdrant's BM25 text search
+      #
+      # @param index [String] collection name
+      # @param text [String] text query for keyword search
+      # @param top_k [Integer] number of results
+      # @param namespace [String, nil] optional namespace
+      # @param filter [Hash, nil] metadata filter
+      # @param include_values [Boolean] include vector values
+      # @param include_metadata [Boolean] include metadata
+      # @return [QueryResult] search results
+      def text_search(index:, text:, top_k:, namespace: nil, filter: nil,
+                      include_values: false, include_metadata: true)
+        qdrant_filter = build_filter(filter, namespace)
+        body = {
+          query: { text: text },
+          limit: top_k,
+          with_vector: include_values,
+          with_payload: include_metadata
+        }
+        body[:filter] = qdrant_filter if qdrant_filter
+        response = with_error_handling do
+          connection.post("/collections/#{index}/points/query", body)
+        end
+        if response.success?
+          matches = transform_search_results(response.body["result"] || [])
+          log_debug("Text search returned #{matches.size} results")
+          QueryResult.from_response(
+            matches: matches,
+            namespace: namespace
+          )
+        else
+          handle_error(response)
+        end
+      end
       # @see Base#fetch
       def fetch(index:, ids:, namespace: nil) # rubocop:disable Lint/UnusedMethodArgument
         point_ids = ids.map { |id| generate_point_id(id) }

data/lib/vectra/providers/weaviate.rb CHANGED

@@ -139,6 +139,36 @@ module Vectra
                                       include_values, include_metadata)
       end
+      # Text-only search using Weaviate's BM25 text search
+      #
+      # @param index [String] class name
+      # @param text [String] text query for BM25 search
+      # @param top_k [Integer] number of results
+      # @param namespace [String, nil] optional namespace (not used in Weaviate)
+      # @param filter [Hash, nil] metadata filter
+      # @param include_values [Boolean] include vector values
+      # @param include_metadata [Boolean] include metadata
+      # @return [QueryResult] search results
+      def text_search(index:, text:, top_k:, namespace: nil, filter: nil,
+                      include_values: false, include_metadata: true)
+        where_filter = build_where(filter, namespace)
+        graphql = build_text_search_graphql(
+          index: index,
+          text: text,
+          top_k: top_k,
+          where_filter: where_filter,
+          include_values: include_values,
+          include_metadata: include_metadata
+        )
+        body = { "query" => graphql }
+        response = with_error_handling do
+          connection.post("#{API_BASE_PATH}/graphql", body)
+        end
+        handle_text_search_response(response, index, namespace, include_values, include_metadata)
+      end
       # rubocop:disable Metrics/PerceivedComplexity
       def fetch(index:, ids:, namespace: nil)
         body = {
@@ -337,6 +367,26 @@ module Vectra
         build_graphql_query(index, top_k, text, alpha, vector, where_filter, selection_block)
       end
+      def build_text_search_graphql(index:, text:, top_k:, where_filter:,
+                                    include_values:, include_metadata:)
+        selection_block = build_selection_fields(include_values, include_metadata).join(" ")
+        <<~GRAPHQL
+          {
+            Get {
+              #{index}(
+                limit: #{top_k}
+                bm25: {
+                  query: "#{text.gsub('"', '\\"')}"
+                }
+                #{"where: #{JSON.generate(where_filter)}" if where_filter}
+              ) {
+                #{selection_block}
+              }
+            }
+          }
+        GRAPHQL
+      end
       def build_graphql_query(index, top_k, text, alpha, vector, where_filter, selection_block)
         <<~GRAPHQL
           {
@@ -379,6 +429,20 @@ module Vectra
         end
       end
+      def handle_text_search_response(response, index, namespace, include_values, include_metadata)
+        if response.success?
+          matches = extract_query_matches(response.body, index, include_values, include_metadata)
+          log_debug("Text search returned #{matches.size} results")
+          QueryResult.from_response(
+            matches: matches,
+            namespace: namespace
+          )
+        else
+          handle_error(response)
+        end
+      end
       def validate_config!
         super
         raise ConfigurationError, "Host must be configured for Weaviate" if config.host.nil? || config.host.empty?
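
`build_text_search_graphql` above interpolates the query text after escaping double quotes only. The sketch below compares that approach with a possible alternative that lets `JSON.generate` produce the quoted string, on the assumption that GraphQL string literals accept the same backslash escapes as JSON. Both helper names are hypothetical and neither is part of the gem.

```ruby
require "json"

# The escaping used in the hunk: only double quotes are handled.
def escape_quotes_only(text)
  text.gsub('"', '\\"')
end

# Alternative sketch: JSON.generate returns a quoted string with quotes,
# backslashes, and control characters escaped (requires json 2.x, which
# accepts a bare string as input).
def graphql_string(text)
  JSON.generate(text.to_s)
end

# Contains one double quote and one literal backslash.
tricky = '5" display \\ stand'

puts %("#{escape_quotes_only(tricky)}")
puts graphql_string(tricky)
```

With the quote-only version, the original backslash is passed through unescaped, which a GraphQL parser may reject or misread; the JSON-based version escapes it as well.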

data/lib/vectra/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module Vectra
-  VERSION = "1.1.0"
+  VERSION = "1.1.1"
 end

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: vectra-client
 version: !ruby/object:Gem::Version
-  version: 1.1.0
+  version: 1.1.1
 platform: ruby
 authors:
 - Mijo Kristo
@@ -274,6 +274,7 @@ files:
 - docs/guides/rails-integration.md
 - docs/guides/rails-troubleshooting.md
 - docs/guides/recipes.md
+- docs/guides/roadmap.md
 - docs/guides/runbooks/cache-issues.md
 - docs/guides/runbooks/high-error-rate.md
 - docs/guides/runbooks/high-latency.md