vectra-client 0.4.0 → 1.0.0

This diff compares the content of publicly available package versions as released to their respective public registries; it is provided for informational purposes only.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: b0d48b8a6205df9a0d6545e3772e26da09f826f03d56751a6fa997e1a73d89f1
- data.tar.gz: 25dc65ad327e03e7912a0b938739b6acf1d6708efd8fa41773d9bbae1053e3dc
+ metadata.gz: f9965925f32b1b497e306ba25776f06edaba86b3c8abf8b750c8a449fe68ba5a
+ data.tar.gz: 1622d500398e1fb95b146d2792c0f29abdde93888e7b74af8fd1dc674c9bee58
  SHA512:
- metadata.gz: 68fc7d7a2c941733bb8f42007dffa52989daedeb362da62c094d05dec7d02f8ca6d7ca8763194e9b51b4ecfe3f7a6e8db4418c7589caab31943f055f75f4c48a
- data.tar.gz: f40ab28eb011943d4961e22c5d31b5a816a81cea1a2bf6afa238004adaa1cdd8558f10b0ee8d56376d6fc737788b16fc1acb7088778d8461ecb3a8db2ff85580
+ metadata.gz: 1911cce768648f9c48e9c94e13a7d0c51f39e81eb381a0defcbc581eb27b4cfacb56d41e1cfb776f6a2ef4522e344c32d842a550c79d5a7584bb1a84a637e605
+ data.tar.gz: bfb7cc7174a739591a8061436f47963dab2c33809db3437558f5addf9cf4dff2b15b93bc72934f1dd04821f2f93884b07f0c9f5886b4d618d9e3eff15ce907a6
data/CHANGELOG.md CHANGED
@@ -1,22 +1,59 @@
  # Changelog

- ## [v0.4.0](https://github.com/stokry/vectra/tree/v0.4.0) (2026-01-12)
+ ## [v1.0.0](https://github.com/stokry/vectra/tree/v1.0.0) (2026-01-12)

  ### Added
- - Memory provider for in-memory vector storage
- - QueryBuilder for chainable query API
- - Batch operations with concurrent processing
- - Vector normalization methods (L2, L1)
- - Enhanced error message extraction
-
- ### Fixed
- - Weaviate DELETE request query parameters
- - Qdrant error message extraction from nested status
- - Client ping error info capture
- - Credential rotation timeout parameter handling
+ - Hybrid search functionality for Qdrant and Weaviate providers
+ - Enhanced provider capabilities and error handling
+ - Support for advanced filtering and namespace operations
+ - Improved vector search performance
+
+ ### Changed
+ - Major API refinements and provider implementations
+ - Enhanced test coverage and documentation
+
+ [Full Changelog](https://github.com/stokry/vectra/compare/v0.4.0...v1.0.0)
+
+ ## [v0.4.0](https://github.com/stokry/vectra/tree/v0.4.0) (2026-01-12)

  [Full Changelog](https://github.com/stokry/vectra/compare/v0.3.4...v0.4.0)

+ ### Added
+ - **Hybrid Search** - Combine semantic (vector) and keyword (text) search across all providers
+   - Full support for Qdrant (prefetch + rescore API)
+   - Full support for Weaviate (hybrid GraphQL with BM25)
+   - Full support for pgvector (vector similarity + PostgreSQL full-text search)
+   - Partial support for Pinecone (requires sparse vectors for true hybrid search)
+   - Alpha parameter (0.0 = pure keyword, 1.0 = pure semantic) for fine-tuning balance
+ - **Batch Progress Callbacks** - Real-time visibility into batch operations
+   - `on_progress` callback with detailed statistics (processed, total, percentage, chunk info)
+   - Thread-safe progress tracking with `Concurrent::AtomicFixnum`
+   - Support for `upsert_async`, `delete_async`, and `fetch_async` methods
+ - **Vector Normalization Helper** - Improve cosine similarity results
+   - `Vector.normalize!` instance method (L2 and L1 normalization)
+   - `Vector.normalize` class method for non-mutating normalization
+   - Automatic handling of zero vectors
+ - **Dimension Validation** - Automatic validation of vector dimension consistency
+   - Validates all vectors in a batch have the same dimension
+   - Detailed error messages with index and expected/actual dimensions
+   - Works with both Vector objects and hash vectors
+ - **Better Error Messages** - Enhanced error context and debugging
+   - Includes error details, field-specific errors, and context
+   - Improved error message format: "Main message (details) [Fields: field1, field2]"
+ - **Connection Health Check** - Simple health monitoring methods
+   - `healthy?` method for quick boolean health check
+   - `ping` method with latency measurement and detailed status
+   - Automatic error logging when logger is configured
+
+ ### Changed
+ - Improved error handling with more context in error messages
+ - Enhanced batch operations with progress tracking capabilities
+
+ ### Documentation
+ - Added comprehensive hybrid search examples and provider support matrix
+ - Updated getting started guide with normalization, health checks, and dimension validation
+ - Added real-world examples demonstrating new features
+
  ## [v0.3.4](https://github.com/stokry/vectra/tree/v0.3.4) (2026-01-12)

  [Full Changelog](https://github.com/stokry/vectra/compare/v0.3.3...v0.3.4)
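
Taken together, the changelog entries above describe the new 1.0.0 surface area. A minimal sketch combining them follows; it assumes an already-configured `client` and an `embedding` array, both placeholders rather than part of this diff.

```ruby
# Hedged sketch: exercising the features listed in the changelog above.
# `client` and `embedding` are illustrative placeholders.
if client.healthy?                                  # connection health check
  normalized = Vectra::Vector.normalize(embedding)  # non-mutating L2 normalization
  client.upsert(vectors: [{ id: 'doc-1', values: normalized }])

  results = client.hybrid_search(                   # semantic + keyword search
    index: 'docs',
    vector: normalized,
    text: 'ruby programming',
    alpha: 0.7                                      # 70% semantic, 30% keyword
  )
  results.each { |match| puts "#{match.id}: #{match.score}" }
end
```
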
data/README.md CHANGED
@@ -81,6 +81,15 @@ end
  # Ping with latency
  status = client.ping
  puts "Provider: #{status[:provider]}, Latency: #{status[:latency_ms]}ms"
+
+ # Hybrid search (semantic + keyword)
+ # Supported by: Qdrant, Weaviate, Pinecone, pgvector
+ results = client.hybrid_search(
+   index: 'docs',
+   vector: embedding,
+   text: 'ruby programming',
+   alpha: 0.7 # 70% semantic, 30% keyword
+ )
  ```

  ## Provider Examples
data/docs/api/overview.md CHANGED
@@ -111,6 +111,98 @@ Get index statistics.
  }
  ```

+ ### `hybrid_search(index:, vector:, text:, alpha:, top_k:)`
+
+ Combine semantic (vector) and keyword (text) search.
+
+ **Parameters:**
+ - `index` (String) - Index/collection name
+ - `vector` (Array) - Query vector for semantic search
+ - `text` (String) - Text query for keyword search
+ - `alpha` (Float) - Balance between semantic and keyword (0.0 = pure keyword, 1.0 = pure semantic)
+ - `top_k` (Integer) - Number of results (default: 10)
+ - `namespace` (String, optional) - Namespace
+ - `filter` (Hash, optional) - Metadata filter
+ - `include_values` (Boolean) - Include vector values (default: false)
+ - `include_metadata` (Boolean) - Include metadata (default: true)
+
+ **Example:**
+ ```ruby
+ results = client.hybrid_search(
+   index: 'docs',
+   vector: embedding,
+   text: 'ruby programming',
+   alpha: 0.7 # 70% semantic, 30% keyword
+ )
+ ```
+
+ **Provider Support:** Qdrant ✅, Weaviate ✅, pgvector ✅, Pinecone ⚠️
+
+ ### `healthy?`
+
+ Quick health check - returns true if provider connection is healthy.
+
+ **Returns:** Boolean
+
+ **Example:**
+ ```ruby
+ if client.healthy?
+   client.upsert(...)
+ end
+ ```
+
+ ### `ping`
+
+ Ping provider and get connection health status with latency.
+
+ **Returns:**
+ ```ruby
+ {
+   healthy: true,
+   provider: :pinecone,
+   latency_ms: 45.23
+ }
+ ```
+
+ **Example:**
+ ```ruby
+ status = client.ping
+ puts "Latency: #{status[:latency_ms]}ms"
+ ```
+
+ ### `Vector.normalize(vector, type: :l2)`
+
+ Normalize a vector array (non-mutating).
+
+ **Parameters:**
+ - `vector` (Array) - Vector values to normalize
+ - `type` (Symbol) - Normalization type: `:l2` (default) or `:l1`
+
+ **Returns:** Array of normalized values
+
+ **Example:**
+ ```ruby
+ embedding = openai_response['data'][0]['embedding']
+ normalized = Vectra::Vector.normalize(embedding)
+ client.upsert(vectors: [{ id: 'doc-1', values: normalized }])
+ ```
+
+ ### `vector.normalize!(type: :l2)`
+
+ Normalize vector in-place (mutates the vector).
+
+ **Parameters:**
+ - `type` (Symbol) - Normalization type: `:l2` (default) or `:l1`
+
+ **Returns:** Self (for method chaining)
+
+ **Example:**
+ ```ruby
+ vector = Vectra::Vector.new(id: 'doc-1', values: embedding)
+ vector.normalize! # L2 normalization
+ client.upsert(vectors: [vector])
+ ```
+
  ## Error Handling

  ```ruby
@@ -121,6 +121,52 @@ if status[:error]
  end
  ```

+ ### Hybrid Search (Semantic + Keyword)
+
+ Combine the best of both worlds: semantic understanding from vectors and exact keyword matching:
+
+ ```ruby
+ # Hybrid search with 70% semantic, 30% keyword
+ results = client.hybrid_search(
+   index: 'docs',
+   vector: embedding,        # Semantic search
+   text: 'ruby programming', # Keyword search
+   alpha: 0.7,               # 0.0 = pure keyword, 1.0 = pure semantic
+   top_k: 10
+ )
+
+ results.each do |match|
+   puts "#{match.id}: #{match.score}"
+ end
+
+ # Pure semantic (alpha = 1.0)
+ results = client.hybrid_search(
+   index: 'docs',
+   vector: embedding,
+   text: 'ruby',
+   alpha: 1.0
+ )
+
+ # Pure keyword (alpha = 0.0)
+ results = client.hybrid_search(
+   index: 'docs',
+   vector: embedding,
+   text: 'ruby programming',
+   alpha: 0.0
+ )
+ ```
+
+ **Provider Support:**
+ - **Qdrant**: ✅ Full support (prefetch + rescore API)
+ - **Weaviate**: ✅ Full support (hybrid GraphQL with BM25)
+ - **Pinecone**: ⚠️ Partial support (requires sparse vectors for true hybrid search)
+ - **pgvector**: ✅ Full support (combines vector similarity + PostgreSQL full-text search)
+
+ **Note for pgvector:** Your table needs a text column with a tsvector index:
+ ```sql
+ CREATE INDEX idx_content_fts ON my_index USING gin(to_tsvector('english', content));
+ ```
+
  ### Dimension Validation

  Vectra automatically validates that all vectors in a batch have the same dimension:
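
The example that follows this sentence in the full guide falls outside the diff hunk. A hedged sketch of what the validation guards against; the exact error class and message format are not shown in this diff and are assumptions:

```ruby
# Hedged sketch: a batch with inconsistent dimensions should be rejected.
# Per the changelog, the error reports the offending index and the
# expected/actual dimensions; the exact error class is an assumption.
client.upsert(
  vectors: [
    { id: 'a', values: Array.new(384) { rand } },  # 384 dimensions
    { id: 'b', values: Array.new(512) { rand } }   # mismatched: 512 dimensions
  ]
)
# => raises a Vectra validation error identifying vector index 1 (expected 384, got 512)
```
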
@@ -26,13 +26,47 @@ vectors = 10_000.times.map { |i| { id: "vec_#{i}", values: Array.new(384) { rand
  result = batch.upsert_async(
    index: 'my-index',
    vectors: vectors,
-   chunk_size: 100
+   chunk_size: 100,
+   on_progress: proc { |stats|
+     progress = stats[:percentage]
+     processed = stats[:processed]
+     total = stats[:total]
+     chunk = stats[:current_chunk] + 1
+     total_chunks = stats[:total_chunks]
+
+     puts "Progress: #{progress}% (#{processed}/#{total})"
+     puts " Chunk #{chunk}/#{total_chunks} | Success: #{stats[:success_count]}, Failed: #{stats[:failed_count]}"
+   }
  )

  puts "Upserted: #{result[:upserted_count]} vectors in #{result[:chunks]} chunks"
  puts "Errors: #{result[:errors].size}" if result[:errors].any?
  ```

+ ### Progress Tracking
+
+ Monitor batch operations in real-time with progress callbacks:
+
+ ```ruby
+ batch.upsert_async(
+   index: 'my-index',
+   vectors: large_vector_array,
+   chunk_size: 100,
+   on_progress: proc { |stats|
+     # stats contains:
+     # - processed: number of processed vectors
+     # - total: total number of vectors
+     # - percentage: progress percentage (0-100)
+     # - current_chunk: current chunk index (0-based)
+     # - total_chunks: total number of chunks
+     # - success_count: number of successful chunks
+     # - failed_count: number of failed chunks
+
+     puts "Progress: #{stats[:percentage]}% (#{stats[:processed]}/#{stats[:total]})"
+   }
+ )
+ ```
+
  ### Batch Delete

  ```ruby
@@ -43,6 +43,7 @@ client = Vectra::Client.new(
  - ✅ ACID transactions
  - ✅ Complex queries
  - ✅ Rails ActiveRecord integration
+ - ✅ Hybrid search (vector + full-text search)

  ## Example

@@ -63,6 +64,17 @@ client.upsert(

  # Search using cosine distance
  results = client.query(vector: [0.1, 0.2, 0.3], top_k: 5)
+
+ # Hybrid search (requires text column with tsvector index)
+ # First, create the index:
+ # CREATE INDEX idx_content_fts ON my_index USING gin(to_tsvector('english', content));
+ results = client.hybrid_search(
+   index: 'my_index',
+   vector: embedding,
+   text: 'ruby programming',
+   alpha: 0.7,
+   text_column: 'content' # default: 'content'
+ )
  ```

  ## ActiveRecord Integration
@@ -32,6 +32,7 @@ client = Vectra::Client.new(
  - ✅ Index statistics
  - ✅ Metadata filtering
  - ✅ Namespace support
+ - ⚠️ Hybrid search (partial - requires sparse vectors)

  ## Example

@@ -56,6 +57,15 @@ results = client.query(vector: [0.1, 0.2, 0.3], top_k: 5)
  results.matches.each do |match|
    puts "#{match['id']}: #{match['score']}"
  end
+
+ # Hybrid search (note: requires sparse vectors for true hybrid search)
+ # For now, this uses dense vector search only
+ results = client.hybrid_search(
+   index: 'my-index',
+   vector: embedding,
+   text: 'ruby programming',
+   alpha: 0.7
+ )
  ```

  ## Configuration Options
@@ -56,6 +56,14 @@ client.upsert(

  # Search
  results = client.query(vector: [0.1, 0.2, 0.3], top_k: 10)
+
+ # Hybrid search (semantic + keyword)
+ results = client.hybrid_search(
+   index: 'my-collection',
+   vector: embedding,
+   text: 'ruby programming',
+   alpha: 0.7 # 70% semantic, 30% keyword
+ )
  ```

  ## Configuration Options
@@ -47,6 +47,7 @@ client = Vectra::Client.new(
  - ✅ Delete by IDs or filter
  - ✅ List and describe classes
  - ✅ Basic stats via GraphQL `Aggregate`
+ - ✅ Hybrid search (BM25 + vector similarity)

  ## Basic Example

@@ -89,6 +90,15 @@ results = client.query(
  results.each do |match|
    puts "#{match.id} (score=#{match.score.round(3)}): #{match.metadata["title"]}"
  end
+
+ # Hybrid search (BM25 + vector)
+ results = client.hybrid_search(
+   index: index,
+   vector: embedding,
+   text: 'ruby programming',
+   alpha: 0.7, # 70% semantic, 30% keyword
+   top_k: 10
+ )
  ```

  ## Advanced Filtering
data/lib/vectra/client.rb CHANGED
@@ -287,6 +287,71 @@ module Vectra
  provider.stats(index: index, namespace: namespace)
  end

+ # Hybrid search combining semantic (vector) and keyword (text) search
+ #
+ # Combines the best of both worlds: semantic understanding from vectors
+ # and exact keyword matching from text search.
+ #
+ # @param index [String] the index/collection name
+ # @param vector [Array<Float>] query vector for semantic search
+ # @param text [String] text query for keyword search
+ # @param alpha [Float] balance between semantic and keyword (0.0 = pure keyword, 1.0 = pure semantic)
+ # @param top_k [Integer] number of results to return
+ # @param namespace [String, nil] optional namespace
+ # @param filter [Hash, nil] metadata filter
+ # @param include_values [Boolean] include vector values in results
+ # @param include_metadata [Boolean] include metadata in results
+ # @return [QueryResult] search results
+ #
+ # @example Basic hybrid search
+ #   results = client.hybrid_search(
+ #     index: 'docs',
+ #     vector: embedding,
+ #     text: 'ruby programming',
+ #     alpha: 0.7 # 70% semantic, 30% keyword
+ #   )
+ #
+ # @example Pure semantic (alpha = 1.0)
+ #   results = client.hybrid_search(
+ #     index: 'docs',
+ #     vector: embedding,
+ #     text: 'ruby',
+ #     alpha: 1.0
+ #   )
+ #
+ # @example Pure keyword (alpha = 0.0)
+ #   results = client.hybrid_search(
+ #     index: 'docs',
+ #     vector: embedding,
+ #     text: 'ruby programming',
+ #     alpha: 0.0
+ #   )
+ #
+ def hybrid_search(index:, vector:, text:, alpha: 0.5, top_k: 10, namespace: nil,
+                   filter: nil, include_values: false, include_metadata: true)
+   validate_index!(index)
+   validate_query_vector!(vector)
+   raise ValidationError, "Text query cannot be nil or empty" if text.nil? || text.empty?
+   raise ValidationError, "Alpha must be between 0.0 and 1.0" unless (0.0..1.0).include?(alpha)
+
+   unless provider.respond_to?(:hybrid_search)
+     raise UnsupportedFeatureError,
+           "Hybrid search is not supported by #{provider_name} provider"
+   end
+
+   provider.hybrid_search(
+     index: index,
+     vector: vector,
+     text: text,
+     alpha: alpha,
+     top_k: top_k,
+     namespace: namespace,
+     filter: filter,
+     include_values: include_values,
+     include_metadata: include_metadata
+   )
+ end
+
  # Get the provider name
  #
  # @return [Symbol]
data/lib/vectra/errors.rb CHANGED
@@ -57,6 +57,9 @@ module Vectra
  # Raised when the provider is not supported
  class UnsupportedProviderError < Error; end

+ # Raised when a feature is not supported by the provider
+ class UnsupportedFeatureError < Error; end
+
  # Raised when an operation times out
  class TimeoutError < Error; end

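
`UnsupportedFeatureError` is raised by `Client#hybrid_search` when the configured provider does not implement `hybrid_search` (see the client.rb hunk above). A hedged usage sketch, assuming an already-configured `client` and an `embedding` placeholder:

```ruby
# Hedged sketch: fall back to plain vector search when the provider
# does not implement hybrid search. `client` and `embedding` are placeholders.
begin
  results = client.hybrid_search(
    index: 'docs',
    vector: embedding,
    text: 'ruby programming',
    alpha: 0.7
  )
rescue Vectra::UnsupportedFeatureError
  results = client.query(vector: embedding, top_k: 10)
end
```
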
@@ -94,6 +94,74 @@ module Vectra
  QueryResult.from_response(matches: matches, namespace: namespace)
  end

+ # Hybrid search combining vector similarity and PostgreSQL full-text search
+ #
+ # Combines pgvector similarity search with PostgreSQL's native full-text search.
+ # Requires a text search column (tsvector) in your table.
+ #
+ # @param index [String] table name
+ # @param vector [Array<Float>] query vector
+ # @param text [String] text query for full-text search
+ # @param alpha [Float] balance (0.0 = full-text, 1.0 = vector)
+ # @param top_k [Integer] number of results
+ # @param namespace [String, nil] optional namespace
+ # @param filter [Hash, nil] metadata filter
+ # @param include_values [Boolean] include vector values
+ # @param include_metadata [Boolean] include metadata
+ # @param text_column [String] column name for full-text search (default: 'content')
+ # @return [QueryResult] search results
+ #
+ # @note Your table should have a text column with a tsvector index:
+ #   CREATE INDEX idx_content_fts ON my_index USING gin(to_tsvector('english', content));
+ def hybrid_search(index:, vector:, text:, alpha:, top_k:, namespace: nil,
+                   filter: nil, include_values: false, include_metadata: true,
+                   text_column: "content")
+   ensure_table_exists!(index)
+
+   vector_literal = format_vector(vector)
+   distance_op = DISTANCE_FUNCTIONS[table_metric(index)]
+
+   # Build hybrid score: alpha * vector_similarity + (1-alpha) * text_rank
+   # Vector similarity: 1 - (distance / max_distance)
+   # Text rank: ts_rank from full-text search
+   select_cols = ["id"]
+   select_cols << "embedding" if include_values
+   select_cols << "metadata" if include_metadata
+
+   # Calculate hybrid score
+   # For vector: use cosine distance (1 - distance gives similarity)
+   # For text: use ts_rank
+   vector_score = "1.0 - (embedding #{distance_op} '#{vector_literal}'::vector)"
+   text_score = "ts_rank(to_tsvector('english', COALESCE(#{quote_ident(text_column)}, '')), " \
+                "plainto_tsquery('english', #{escape_literal(text)}))"
+
+   # Normalize scores to 0-1 range and combine with alpha
+   hybrid_score = "(#{alpha} * #{vector_score} + (1.0 - #{alpha}) * #{text_score})"
+
+   select_cols << "#{hybrid_score} AS score"
+   select_cols << "#{vector_score} AS vector_score"
+   select_cols << "#{text_score} AS text_score"
+
+   where_clauses = build_where_clauses(namespace, filter)
+   where_clauses << "to_tsvector('english', COALESCE(#{quote_ident(text_column)}, '')) @@ " \
+                    "plainto_tsquery('english', #{escape_literal(text)})"
+
+   sql = "SELECT #{select_cols.join(', ')} FROM #{quote_ident(index)}"
+   sql += " WHERE #{where_clauses.join(' AND ')}" if where_clauses.any?
+   sql += " ORDER BY score DESC"
+   sql += " LIMIT #{top_k.to_i}"
+
+   result = execute(sql)
+   matches = result.map { |row| build_match_from_row(row, include_values, include_metadata) }
+
+   log_debug("Hybrid search returned #{matches.size} results (alpha: #{alpha})")
+
+   QueryResult.from_response(
+     matches: matches,
+     namespace: namespace
+   )
+ end
+
  # @see Base#fetch
  def fetch(index:, ids:, namespace: nil)
    ensure_table_exists!(index)
@@ -67,6 +67,63 @@ module Vectra
  end
  end

+ # Hybrid search combining dense (vector) and sparse (keyword) search
+ #
+ # Pinecone supports hybrid search using sparse-dense vectors.
+ # For text-based keyword search, you need to provide sparse vectors.
+ #
+ # @param index [String] index name
+ # @param vector [Array<Float>] dense query vector
+ # @param text [String] text query (converted to sparse vector)
+ # @param alpha [Float] balance (0.0 = sparse, 1.0 = dense)
+ # @param top_k [Integer] number of results
+ # @param namespace [String, nil] optional namespace
+ # @param filter [Hash, nil] metadata filter
+ # @param include_values [Boolean] include vector values
+ # @param include_metadata [Boolean] include metadata
+ # @return [QueryResult] search results
+ #
+ # @note For proper hybrid search, you should generate sparse vectors
+ #   from text using a tokenizer (e.g., BM25). This method accepts text
+ #   but requires sparse vector generation externally.
+ def hybrid_search(index:, vector:, alpha:, top_k:, namespace: nil,
+                   filter: nil, include_values: false, include_metadata: true, text: nil)
+   # Pinecone hybrid search requires sparse vectors
+   # For now, we'll use dense vector only and log a warning
+   # In production, users should generate sparse vectors from text
+   if text
+     log_debug("Pinecone hybrid search: text parameter ignored. " \
+               "For true hybrid search, provide sparse vectors via sparse_values parameter.")
+   end
+
+   # Use dense vector search with alpha weighting
+   # Note: Pinecone's actual hybrid search requires sparse vectors
+   # This is a simplified implementation
+   body = {
+     vector: vector.map(&:to_f),
+     topK: top_k,
+     includeValues: include_values,
+     includeMetadata: include_metadata
+   }
+   body[:namespace] = namespace if namespace
+   body[:filter] = transform_filter(filter) if filter
+
+   # Alpha is used conceptually here - Pinecone's actual hybrid search
+   # requires sparse vectors in the query
+   response = data_connection(index).post("/query", body)
+
+   if response.success?
+     log_debug("Hybrid search returned #{response.body['matches']&.size || 0} results (alpha: #{alpha})")
+     QueryResult.from_response(
+       matches: transform_matches(response.body["matches"] || []),
+       namespace: response.body["namespace"],
+       usage: response.body["usage"]
+     )
+   else
+     handle_error(response)
+   end
+ end
+
  # @see Base#fetch
  def fetch(index:, ids:, namespace: nil)
    params = { ids: ids }
@@ -83,6 +83,33 @@ module Vectra
  end
  end

+ # Hybrid search combining vector and text search
+ #
+ # Uses Qdrant's prefetch + rescore API for efficient hybrid search
+ #
+ # @param index [String] collection name
+ # @param vector [Array<Float>] query vector
+ # @param text [String] text query for keyword search
+ # @param alpha [Float] balance (0.0 = keyword, 1.0 = vector)
+ # @param top_k [Integer] number of results
+ # @param namespace [String, nil] optional namespace
+ # @param filter [Hash, nil] metadata filter
+ # @param include_values [Boolean] include vector values
+ # @param include_metadata [Boolean] include metadata
+ # @return [QueryResult] search results
+ def hybrid_search(index:, vector:, text:, alpha:, top_k:, namespace: nil,
+                   filter: nil, include_values: false, include_metadata: true)
+   qdrant_filter = build_filter(filter, namespace)
+   body = build_hybrid_search_body(vector, text, alpha, top_k, qdrant_filter,
+                                   include_values, include_metadata)
+
+   response = with_error_handling do
+     connection.post("/collections/#{index}/points/query", body)
+   end
+
+   handle_hybrid_search_response(response, alpha, namespace)
+ end
+
  # @see Base#fetch
  def fetch(index:, ids:, namespace: nil) # rubocop:disable Lint/UnusedMethodArgument
    point_ids = ids.map { |id| generate_point_id(id) }
@@ -280,6 +307,38 @@ module Vectra

  private

+ def build_hybrid_search_body(vector, text, alpha, top_k, filter, include_values, include_metadata)
+   body = {
+     prefetch: {
+       query: { text: text },
+       limit: top_k * 2
+     },
+     query: { vector: vector.map(&:to_f) },
+     limit: top_k,
+     params: { alpha: alpha },
+     with_vector: include_values,
+     with_payload: include_metadata
+   }
+
+   body[:prefetch][:filter] = filter if filter
+   body[:query][:filter] = filter if filter
+   body
+ end
+
+ def handle_hybrid_search_response(response, alpha, namespace)
+   if response.success?
+     matches = transform_search_results(response.body["result"] || [])
+     log_debug("Hybrid search returned #{matches.size} results (alpha: #{alpha})")
+
+     QueryResult.from_response(
+       matches: matches,
+       namespace: namespace
+     )
+   else
+     handle_error(response)
+   end
+ end
+
  def validate_config!
    super
    raise ConfigurationError, "Host must be configured for Qdrant" if config.host.nil? || config.host.empty?
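
For reference, this is the request body `build_hybrid_search_body` above would produce for a small call; the values are illustrative only and derived by reading the helper, not taken from the package's tests or docs.

```ruby
# Illustrative only: the hash build_hybrid_search_body returns for
# vector = [0.1, 0.2], text = 'ruby', alpha = 0.7, top_k = 10, filter = nil,
# with the client defaults include_values: false, include_metadata: true.
{
  prefetch: {
    query: { text: 'ruby' },
    limit: 20                    # top_k * 2
  },
  query: { vector: [0.1, 0.2] }, # vector.map(&:to_f)
  limit: 10,
  params: { alpha: 0.7 },
  with_vector: false,
  with_payload: true
}
```
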
@@ -102,6 +102,43 @@ module Vectra
  end
  end

+ # Hybrid search combining vector and BM25 text search
+ #
+ # Uses Weaviate's hybrid search API with alpha parameter
+ #
+ # @param index [String] class name
+ # @param vector [Array<Float>] query vector
+ # @param text [String] text query for BM25 search
+ # @param alpha [Float] balance (0.0 = BM25, 1.0 = vector)
+ # @param top_k [Integer] number of results
+ # @param namespace [String, nil] optional namespace (not used in Weaviate)
+ # @param filter [Hash, nil] metadata filter
+ # @param include_values [Boolean] include vector values
+ # @param include_metadata [Boolean] include metadata
+ # @return [QueryResult] search results
+ def hybrid_search(index:, vector:, text:, alpha:, top_k:, namespace: nil,
+                   filter: nil, include_values: false, include_metadata: true)
+   where_filter = build_where(filter, namespace)
+   graphql = build_hybrid_search_graphql(
+     index: index,
+     vector: vector,
+     text: text,
+     alpha: alpha,
+     top_k: top_k,
+     where_filter: where_filter,
+     include_values: include_values,
+     include_metadata: include_metadata
+   )
+   body = { "query" => graphql }
+
+   response = with_error_handling do
+     connection.post("#{API_BASE_PATH}/graphql", body)
+   end
+
+   handle_hybrid_search_response(response, index, alpha, namespace,
+                                 include_values, include_metadata)
+ end
+
  # rubocop:disable Metrics/PerceivedComplexity
  def fetch(index:, ids:, namespace: nil)
    body = {
@@ -294,6 +331,54 @@ module Vectra

  private

+ def build_hybrid_search_graphql(index:, vector:, text:, alpha:, top_k:,
+                                 where_filter:, include_values:, include_metadata:)
+   selection_block = build_selection_fields(include_values, include_metadata).join(" ")
+   build_graphql_query(index, top_k, text, alpha, vector, where_filter, selection_block)
+ end
+
+ def build_graphql_query(index, top_k, text, alpha, vector, where_filter, selection_block)
+   <<~GRAPHQL
+     {
+       Get {
+         #{index}(
+           limit: #{top_k}
+           hybrid: {
+             query: "#{text.gsub('"', '\\"')}"
+             alpha: #{alpha}
+           }
+           nearVector: { vector: [#{vector.map { |v| format('%.10f', v.to_f) }.join(', ')}] }
+           #{"where: #{JSON.generate(where_filter)}" if where_filter}
+         ) {
+           #{selection_block}
+         }
+       }
+     }
+   GRAPHQL
+ end
+
+ def build_selection_fields(include_values, include_metadata)
+   fields = ["_additional { id distance }"]
+   fields << "vector" if include_values
+   fields << "metadata" if include_metadata
+   fields
+ end
+
+ def handle_hybrid_search_response(response, index, alpha, namespace,
+                                   include_values, include_metadata)
+   if response.success?
+     matches = extract_query_matches(response.body, index, include_values, include_metadata)
+     log_debug("Hybrid search returned #{matches.size} results (alpha: #{alpha})")
+
+     QueryResult.from_response(
+       matches: matches,
+       namespace: namespace
+     )
+   else
+     handle_error(response)
+   end
+ end
+
  def validate_config!
    super
    raise ConfigurationError, "Host must be configured for Weaviate" if config.host.nil? || config.host.empty?
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module Vectra
-   VERSION = "0.4.0"
+   VERSION = "1.0.0"
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: vectra-client
  version: !ruby/object:Gem::Version
-   version: 0.4.0
+   version: 1.0.0
  platform: ruby
  authors:
  - Mijo Kristo