vectra-client 1.1.0 → 1.1.1
- checksums.yaml +4 -4
- data/CHANGELOG.md +37 -0
- data/README.md +22 -0
- data/docs/_layouts/page.html +7 -0
- data/docs/api/cheatsheet.md +17 -0
- data/docs/api/methods.md +45 -0
- data/docs/guides/roadmap.md +53 -0
- data/lib/vectra/client.rb +61 -0
- data/lib/vectra/middleware/request.rb +1 -1
- data/lib/vectra/providers/memory.rb +56 -0
- data/lib/vectra/providers/pgvector.rb +50 -0
- data/lib/vectra/providers/qdrant.rb +39 -0
- data/lib/vectra/providers/weaviate.rb +64 -0
- data/lib/vectra/version.rb +1 -1
- metadata +2 -1
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 0f1fd06b5874c1bc1da1244fb05321cda4a4759e234d9175159c5a6ba7ed8d40
|
|
4
|
+
data.tar.gz: 9d69e983f4ef5ed4d6bdd58e7a003e06045f3570ba1b3329587f82bcf07902a3
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 7c1911470f96d83470dd98cdc6bb3e6c438a11e71b7317265388d22ae4c3792cea3e595661c927a42587b6008fc891940d511815932b3da478bfeb056ccf8c30
|
|
7
|
+
data.tar.gz: a3a17643736f8b9a19b92c81a87ce8749a4e55798085e474ef1bc2f13ab9f8c688b2bc7537467a851e59a31cf06390444a064a9d525cff8efb175dbae1de7c7c
|
data/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,42 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
+
## [v1.1.0](https://github.com/stokry/vectra/tree/v1.1.0) (2026-01-15)
|
|
4
|
+
|
|
5
|
+
[Full Changelog](https://github.com/stokry/vectra/compare/v1.0.8...v1.1.0)
|
|
6
|
+
|
|
7
|
+
### 🎉 Major Feature: Middleware System
|
|
8
|
+
|
|
9
|
+
This release introduces a **Rack-style middleware system** for all vector database operations.
|
|
10
|
+
|
|
11
|
+
#### Added
|
|
12
|
+
|
|
13
|
+
- **Middleware Stack** - All client operations now route through a composable middleware pipeline
|
|
14
|
+
- **5 Built-in Middleware**:
|
|
15
|
+
- `Vectra::Middleware::Logging` - Structured logs with timing for all operations
|
|
16
|
+
- `Vectra::Middleware::Retry` - Automatic retry with exponential/linear backoff for transient errors
|
|
17
|
+
- `Vectra::Middleware::Instrumentation` - Hooks for metrics and APM integration
|
|
18
|
+
- `Vectra::Middleware::PIIRedaction` - Automatic PII redaction (email, phone, SSN, credit cards)
|
|
19
|
+
- `Vectra::Middleware::CostTracker` - Track API costs per operation with callbacks
|
|
20
|
+
- **Request/Response Objects** - Type-safe objects with metadata attachment
|
|
21
|
+
- **Extensible Framework** - Create custom middleware by extending `Vectra::Middleware::Base`
|
|
22
|
+
- **Global & Per-Client Middleware** - Apply middleware globally (`Client.use`) or per-instance (`new(middleware: [...])`)
|
|
23
|
+
|
|
24
|
+
#### Changed
|
|
25
|
+
|
|
26
|
+
- All client operations (`upsert`, `query`, `fetch`, `update`, `delete`, `stats`, `list_indexes`, etc.) now route through middleware stack for consistency
|
|
27
|
+
- Middleware has complete visibility into all client operations
|
|
28
|
+
|
|
29
|
+
#### Documentation
|
|
30
|
+
|
|
31
|
+
- Added comprehensive middleware section to README
|
|
32
|
+
- Created `examples/middleware_demo.rb` demonstrating all 5 built-in middleware
|
|
33
|
+
- Full YARD documentation for all middleware classes
|
|
34
|
+
- Published [middleware guide](https://dev.to/stokry/rack-style-middleware-for-vector-databases-in-ruby-vectra-client-110-2jh3) on Dev.to
|
|
35
|
+
|
|
36
|
+
#### Migration Notes
|
|
37
|
+
|
|
38
|
+
No breaking changes. Middleware is opt-in - existing code works without modification.
|
|
39
|
+
|
|
3
40
|
## [v1.0.8](https://github.com/stokry/vectra/tree/v1.0.8) (2026-01-14)
|
|
4
41
|
|
|
5
42
|
[Full Changelog](https://github.com/stokry/vectra/compare/v1.0.7...v1.0.8)
|
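The v1.1.0 changelog entry above describes a Rack-style pipeline in which every client operation passes through a chain of middleware. A minimal, self-contained sketch of that general pattern follows. `SketchStack`, `TaggingMiddleware`, `TimingMiddleware`, and the `call(request, app)` shape are hypothetical names chosen for illustration; they are not taken from the Vectra source and may differ from `Vectra::Middleware::Base`.

```ruby
# Hypothetical Rack-style middleware chain (not the Vectra implementation).
# Each middleware receives the request and the next callable in the chain.
class TimingMiddleware
  def call(request, app)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    response = app.call(request)
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    response.merge(elapsed: elapsed)
  end
end

class TaggingMiddleware
  def initialize(tag)
    @tag = tag
  end

  # Prepend this middleware's tag on the way back out of the chain.
  def call(request, app)
    response = app.call(request)
    response.merge(tags: [@tag] + response.fetch(:tags, []))
  end
end

class SketchStack
  def initialize(middlewares, &handler)
    @middlewares = middlewares
    @handler = handler
  end

  # Wrap the handler from the inside out so the first middleware runs first.
  def call(request)
    chain = @middlewares.reverse.inject(@handler) do |app, middleware|
      ->(req) { middleware.call(req, app) }
    end
    chain.call(request)
  end
end

middlewares = [TaggingMiddleware.new(:outer), TaggingMiddleware.new(:inner), TimingMiddleware.new]
stack = SketchStack.new(middlewares) do |request|
  { operation: request[:operation], tags: [] }
end

result = stack.call(operation: :query)
```

Because the chain is built from the inside out, the first middleware in the list is the outermost wrapper: it sees the request first and the response last.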
data/README.md
CHANGED
|
@@ -109,6 +109,14 @@ results = client.hybrid_search(
|
|
|
109
109
|
text: 'ruby programming',
|
|
110
110
|
alpha: 0.7 # 70% semantic, 30% keyword
|
|
111
111
|
)
|
|
112
|
+
|
|
113
|
+
# Text-only search (keyword search without embeddings)
|
|
114
|
+
# Supported by: Qdrant, Weaviate, pgvector
|
|
115
|
+
results = client.text_search(
|
|
116
|
+
index: 'products',
|
|
117
|
+
text: 'iPhone 15 Pro',
|
|
118
|
+
top_k: 10
|
|
119
|
+
)
|
|
112
120
|
```
|
|
113
121
|
|
|
114
122
|
## Provider Examples
|
|
@@ -307,6 +315,20 @@ Vectra includes 7 production-ready patterns out of the box:
|
|
|
307
315
|
- **Health Checks** - `healthy?`, `ping`, and `health_check` methods
|
|
308
316
|
- **Instrumentation** - Datadog, New Relic, Sentry, Honeybadger support
|
|
309
317
|
|
|
318
|
+
## Roadmap
|
|
319
|
+
|
|
320
|
+
High-level roadmap for `vectra-client`:
|
|
321
|
+
|
|
322
|
+
- **1.x (near term)**
|
|
323
|
+
- Reranking middleware built on top of the existing Rack-style middleware stack.
|
|
324
|
+
- Additional middleware building blocks (sampling, tracing, score normalization).
|
|
325
|
+
- Smoother Rails UX for multi-tenant setups and larger demos (e‑commerce, RAG, recommendations).
|
|
326
|
+
- **Mid term**
|
|
327
|
+
- Additional providers where it makes sense and stays maintainable.
|
|
328
|
+
- Deeper documentation and recipes around reranking and hybrid search.
|
|
329
|
+
|
|
330
|
+
For a more detailed, always-up-to-date version, see the online roadmap: https://vectra-docs.netlify.app/guides/roadmap/
|
|
331
|
+
|
|
310
332
|
## Development
|
|
311
333
|
|
|
312
334
|
```bash
|
data/docs/_layouts/page.html
CHANGED
|
@@ -91,6 +91,13 @@
|
|
|
91
91
|
<li><a href="https://github.com/stokry/vectra/issues" class="tma-sidebar__link" target="_blank">Report Issue ↗</a></li>
|
|
92
92
|
</ul>
|
|
93
93
|
</div>
|
|
94
|
+
|
|
95
|
+
<div class="tma-sidebar__section">
|
|
96
|
+
<h3 class="tma-sidebar__title">Resources</h3>
|
|
97
|
+
<ul class="tma-sidebar__list">
|
|
98
|
+
<li><a href="{{ site.baseurl }}/guides/roadmap" class="tma-sidebar__link {% if page.url == '/guides/roadmap/' %}tma-sidebar__link--active{% endif %}">Roadmap</a></li>
|
|
99
|
+
</ul>
|
|
100
|
+
</div>
|
|
94
101
|
</aside>
|
|
95
102
|
|
|
96
103
|
<!-- Main Content -->
|
data/docs/api/cheatsheet.md
CHANGED
|
@@ -98,6 +98,23 @@ results = client.hybrid_search(
|
|
|
98
98
|
|
|
99
99
|
Supported providers: Qdrant ✅, Weaviate ✅, pgvector ✅, Pinecone ⚠️
|
|
100
100
|
|
|
101
|
+
### Text Search (keyword-only, no embeddings)
|
|
102
|
+
|
|
103
|
+
```ruby
|
|
104
|
+
results = client.text_search(
|
|
105
|
+
index: 'products',
|
|
106
|
+
text: 'iPhone 15 Pro',
|
|
107
|
+
top_k: 10,
|
|
108
|
+
filter: { category: 'electronics' }
|
|
109
|
+
)
|
|
110
|
+
|
|
111
|
+
results.each do |match|
|
|
112
|
+
puts "#{match.id} (score=#{match.score.round(3)}): #{match.metadata['title']}"
|
|
113
|
+
end
|
|
114
|
+
```
|
|
115
|
+
|
|
116
|
+
Supported providers: Qdrant ✅ (BM25), Weaviate ✅ (BM25), pgvector ✅ (PostgreSQL full-text)
|
|
117
|
+
|
|
101
118
|
### Fetch
|
|
102
119
|
|
|
103
120
|
```ruby
|
data/docs/api/methods.md
CHANGED
|
@@ -147,6 +147,51 @@ results = client.hybrid_search(
|
|
|
147
147
|
|
|
148
148
|
---
|
|
149
149
|
|
|
150
|
+
### `client.text_search(index:, text:, top_k: 10, namespace: nil, filter: nil, include_values: false, include_metadata: true)`
|
|
151
|
+
|
|
152
|
+
Text-only search (keyword search without requiring embeddings).
|
|
153
|
+
|
|
154
|
+
**Parameters:**
|
|
155
|
+
- `index` (String) - Index/collection name (uses client's default index when omitted)
|
|
156
|
+
- `text` (String) - Text query for keyword search
|
|
157
|
+
- `top_k` (Integer) - Number of results (default: 10)
|
|
158
|
+
- `namespace` (String, optional) - Namespace
|
|
159
|
+
- `filter` (Hash, optional) - Metadata filter
|
|
160
|
+
- `include_values` (Boolean) - Include vector values (default: false)
|
|
161
|
+
- `include_metadata` (Boolean) - Include metadata (default: true)
|
|
162
|
+
|
|
163
|
+
**Returns:** `Vectra::QueryResult`
|
|
164
|
+
|
|
165
|
+
**Provider Support:**
|
|
166
|
+
- ✅ Qdrant (BM25)
|
|
167
|
+
- ✅ Weaviate (BM25)
|
|
168
|
+
- ✅ pgvector (PostgreSQL full-text search)
|
|
169
|
+
- ✅ Memory (simple keyword matching - for testing only)
|
|
170
|
+
- ❌ Pinecone (not supported - use sparse vectors instead)
|
|
171
|
+
|
|
172
|
+
**Example:**
|
|
173
|
+
```ruby
|
|
174
|
+
# Keyword search for exact matches
|
|
175
|
+
results = client.text_search(
|
|
176
|
+
index: 'products',
|
|
177
|
+
text: 'iPhone 15 Pro',
|
|
178
|
+
top_k: 10,
|
|
179
|
+
filter: { category: 'electronics' }
|
|
180
|
+
)
|
|
181
|
+
|
|
182
|
+
results.each do |match|
|
|
183
|
+
puts "#{match.id}: #{match.score} - #{match.metadata['title']}"
|
|
184
|
+
end
|
|
185
|
+
```
|
|
186
|
+
|
|
187
|
+
**Use Cases:**
|
|
188
|
+
- Product name search (exact matches)
|
|
189
|
+
- Function/class name search in documentation
|
|
190
|
+
- Keyword-based filtering when semantic search is not needed
|
|
191
|
+
- Faster search when embeddings are not available
|
|
192
|
+
|
|
193
|
+
---
|
|
194
|
+
|
|
150
195
|
### `client.fetch(index:, ids:, namespace: nil)`
|
|
151
196
|
|
|
152
197
|
Fetch vectors by their IDs.
|
|
data/docs/guides/roadmap.md
ADDED
|
@@ -0,0 +1,53 @@
|
|
|
1
|
+
---
|
|
2
|
+
layout: page
|
|
3
|
+
title: Roadmap
|
|
4
|
+
permalink: /guides/roadmap/
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# Vectra Roadmap
|
|
8
|
+
|
|
9
|
+
This page outlines the high-level roadmap for **vectra-client**, the unified Ruby client for vector databases.
|
|
10
|
+
|
|
11
|
+
The roadmap is intentionally focused on **production features** that make AI workloads reliable, observable, and easy to operate in Ruby.
|
|
12
|
+
|
|
13
|
+
## Near Term (1.x)
|
|
14
|
+
|
|
15
|
+
- **Reranking middleware**
|
|
16
|
+
- Middleware that can call external rerankers (e.g., Cohere, Jina, custom HTTP) and reorder search results after a `query`.
|
|
17
|
+
- Pluggable providers, configurable `top_n`, and safe fallbacks when reranking fails.
|
|
18
|
+
- **More middleware building blocks**
|
|
19
|
+
- Request sampling / tracing for debugging complex production issues.
|
|
20
|
+
- Response shaping (e.g., score normalization, custom thresholds) as reusable middleware.
|
|
21
|
+
- **Rails UX improvements**
|
|
22
|
+
- Convenience generators and helpers for multi-tenant setups.
|
|
23
|
+
- Better defaults and examples for 1k+ records demos (e‑commerce, blogs, RAG, recommendations).
|
|
24
|
+
|
|
25
|
+
## Mid Term
|
|
26
|
+
|
|
27
|
+
- **Additional providers**
|
|
28
|
+
- Support for more hosted / self-hosted vector solutions where it makes sense and stays maintainable.
|
|
29
|
+
- **First-class reranking guides**
|
|
30
|
+
- End-to-end documentation for combining vectra-client with external LLMs / rerankers.
|
|
31
|
+
- **More recipes & patterns**
|
|
32
|
+
- Deeper recipes for analytics, recommendations, and hybrid search in large Rails apps.
|
|
33
|
+
|
|
34
|
+
## Long Term Vision
|
|
35
|
+
|
|
36
|
+
Keep **vectra-client** the most **production-ready Ruby toolkit** for vector databases:
|
|
37
|
+
|
|
38
|
+
- Strong guarantees around retries, circuit breakers, and backpressure.
|
|
39
|
+
- Excellent observability out of the box.
|
|
40
|
+
- Stable, provider-agnostic API that lets you change infra without rewriting your app.
|
|
41
|
+
|
|
42
|
+
If you have ideas or needs that fit this direction, please open an issue on GitHub so we can prioritise the roadmap around real-world use cases.
|
|
43
|
+
|
|
44
|
+
{
|
|
45
|
+
"cells": [],
|
|
46
|
+
"metadata": {
|
|
47
|
+
"language_info": {
|
|
48
|
+
"name": "python"
|
|
49
|
+
}
|
|
50
|
+
},
|
|
51
|
+
"nbformat": 4,
|
|
52
|
+
"nbformat_minor": 2
|
|
53
|
+
}
|
data/lib/vectra/client.rb
CHANGED
|
@@ -494,6 +494,67 @@ module Vectra
|
|
|
494
494
|
)
|
|
495
495
|
end
|
|
496
496
|
|
|
497
|
+
# Text-only search (keyword search without embeddings)
|
|
498
|
+
#
|
|
499
|
+
# Performs keyword/text search without requiring vector embeddings.
|
|
500
|
+
# Useful for exact matches, product names, function names, etc.
|
|
501
|
+
#
|
|
502
|
+
# @param index [String] the index/collection name
|
|
503
|
+
# @param text [String] text query for keyword search
|
|
504
|
+
# @param top_k [Integer] number of results to return (default: 10)
|
|
505
|
+
# @param namespace [String, nil] optional namespace
|
|
506
|
+
# @param filter [Hash, nil] metadata filter
|
|
507
|
+
# @param include_values [Boolean] include vector values in results
|
|
508
|
+
# @param include_metadata [Boolean] include metadata in results
|
|
509
|
+
# @return [QueryResult] search results
|
|
510
|
+
#
|
|
511
|
+
# @example Basic text search
|
|
512
|
+
# results = client.text_search(
|
|
513
|
+
# index: 'products',
|
|
514
|
+
# text: 'iPhone 15 Pro',
|
|
515
|
+
# top_k: 10
|
|
516
|
+
# )
|
|
517
|
+
#
|
|
518
|
+
# @example Text search with filter
|
|
519
|
+
# results = client.text_search(
|
|
520
|
+
# index: 'products',
|
|
521
|
+
# text: 'laptop',
|
|
522
|
+
# filter: { category: 'electronics', in_stock: true }
|
|
523
|
+
# )
|
|
524
|
+
#
|
|
525
|
+
# @raise [UnsupportedFeatureError] if provider doesn't support text search
|
|
526
|
+
def text_search(index:, text:, top_k: 10, namespace: nil, filter: nil,
|
|
527
|
+
include_values: false, include_metadata: true)
|
|
528
|
+
index ||= default_index
|
|
529
|
+
namespace ||= default_namespace
|
|
530
|
+
validate_index!(index)
|
|
531
|
+
raise ValidationError, "Text query cannot be nil or empty" if text.nil? || text.empty?
|
|
532
|
+
|
|
533
|
+
unless provider.respond_to?(:text_search)
|
|
534
|
+
raise UnsupportedFeatureError,
|
|
535
|
+
"Text search is not supported by #{provider_name} provider"
|
|
536
|
+
end
|
|
537
|
+
|
|
538
|
+
Instrumentation.instrument(
|
|
539
|
+
operation: :text_search,
|
|
540
|
+
provider: provider_name,
|
|
541
|
+
index: index,
|
|
542
|
+
metadata: { top_k: top_k }
|
|
543
|
+
) do
|
|
544
|
+
@middleware.call(
|
|
545
|
+
:text_search,
|
|
546
|
+
index: index,
|
|
547
|
+
text: text,
|
|
548
|
+
top_k: top_k,
|
|
549
|
+
namespace: namespace,
|
|
550
|
+
filter: filter,
|
|
551
|
+
include_values: include_values,
|
|
552
|
+
include_metadata: include_metadata,
|
|
553
|
+
provider: provider_name
|
|
554
|
+
)
|
|
555
|
+
end
|
|
556
|
+
end
|
|
557
|
+
|
|
497
558
|
# Get the provider name
|
|
498
559
|
#
|
|
499
560
|
# @return [Symbol]
|
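The `text_search` method added to the client above rejects a nil or empty query and then checks `provider.respond_to?(:text_search)` before dispatching. The sketch below reproduces that guard order with stand-in classes. `SketchClient`, `KeywordProvider`, `VectorOnlyProvider`, and `error_class_for` are hypothetical, and the two error classes are defined locally only so the example runs on its own.

```ruby
# Stand-in error classes named after those raised in the diff above.
class ValidationError < StandardError; end
class UnsupportedFeatureError < StandardError; end

# Hypothetical provider that implements text_search.
class KeywordProvider
  def text_search(index:, text:, top_k:)
    { index: index, text: text, top_k: top_k }
  end
end

# Hypothetical provider without text_search (a vector-only backend).
class VectorOnlyProvider; end

class SketchClient
  def initialize(provider)
    @provider = provider
  end

  def text_search(index:, text:, top_k: 10)
    # Reject nil or empty query text before touching the provider.
    raise ValidationError, "Text query cannot be nil or empty" if text.nil? || text.empty?

    # Capability check: only dispatch when the provider defines text_search.
    unless @provider.respond_to?(:text_search)
      raise UnsupportedFeatureError, "Text search is not supported by #{@provider.class.name} provider"
    end

    @provider.text_search(index: index, text: text, top_k: top_k)
  end
end

# Helper for the demo: returns the class of the raised error, or nil.
def error_class_for
  yield
  nil
rescue StandardError => e
  e.class
end

supported   = SketchClient.new(KeywordProvider.new).text_search(index: "products", text: "laptop")
empty_query = error_class_for { SketchClient.new(KeywordProvider.new).text_search(index: "products", text: "") }
unsupported = error_class_for { SketchClient.new(VectorOnlyProvider.new).text_search(index: "products", text: "laptop") }
```

Note that `String#empty?` is false for a whitespace-only string, so a query such as `"  "` passes this check and reaches the provider.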
|
data/lib/vectra/middleware/request.rb
CHANGED
|
@@ -55,7 +55,7 @@ module Vectra
|
|
|
55
55
|
#
|
|
56
56
|
# @return [Boolean]
|
|
57
57
|
def read_operation?
|
|
58
|
-
[:query, :fetch, :list_indexes, :describe_index, :stats].include?(operation)
|
|
58
|
+
[:query, :text_search, :hybrid_search, :fetch, :list_indexes, :describe_index, :stats].include?(operation)
|
|
59
59
|
end
|
|
60
60
|
end
|
|
61
61
|
end
|
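The one-line change to `read_operation?` above adds `:text_search` and `:hybrid_search` to the list of read operations, so middleware that branches on that predicate now treats both as reads. A small sketch of the predicate in isolation, using a hypothetical `SketchRequest` class rather than the gem's request object:

```ruby
# Hypothetical request object mirroring the predicate from the diff above.
class SketchRequest
  READ_OPERATIONS = %i[
    query text_search hybrid_search fetch list_indexes describe_index stats
  ].freeze

  attr_reader :operation

  def initialize(operation)
    @operation = operation
  end

  # True for operations that do not modify stored vectors.
  def read_operation?
    READ_OPERATIONS.include?(operation)
  end
end

reads  = %i[query text_search hybrid_search].map { |op| SketchRequest.new(op).read_operation? }
writes = %i[upsert delete].map { |op| SketchRequest.new(op).read_operation? }
```

With the pre-1.1.1 list shown on the removed line, `:text_search` and `:hybrid_search` would have been reported as non-read operations.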
|
data/lib/vectra/providers/memory.rb
CHANGED
|
@@ -80,6 +80,32 @@ module Vectra
|
|
|
80
80
|
QueryResult.from_response(matches: matches, namespace: namespace)
|
|
81
81
|
end
|
|
82
82
|
|
|
83
|
+
# Text-only search using simple keyword matching in metadata
|
|
84
|
+
#
|
|
85
|
+
# For testing purposes only. Performs case-insensitive keyword matching
|
|
86
|
+
# in metadata values. Not a real BM25/full-text search implementation.
|
|
87
|
+
#
|
|
88
|
+
# @param index [String] index name
|
|
89
|
+
# @param text [String] text query for keyword search
|
|
90
|
+
# @param top_k [Integer] number of results
|
|
91
|
+
# @param namespace [String, nil] optional namespace
|
|
92
|
+
# @param filter [Hash, nil] metadata filter
|
|
93
|
+
# @param include_values [Boolean] include vector values
|
|
94
|
+
# @param include_metadata [Boolean] include metadata
|
|
95
|
+
# @return [QueryResult] search results
|
|
96
|
+
def text_search(index:, text:, top_k:, namespace: nil, filter: nil,
|
|
97
|
+
include_values: false, include_metadata: true)
|
|
98
|
+
ns = namespace || ""
|
|
99
|
+
candidates = filter_candidates(@storage[index][ns].values, filter)
|
|
100
|
+
text_lower = text.to_s.downcase
|
|
101
|
+
|
|
102
|
+
matches = find_text_matches(candidates, text_lower, include_values, include_metadata)
|
|
103
|
+
matches = matches.sort_by { |m| -m[:score] }.first(top_k)
|
|
104
|
+
|
|
105
|
+
log_debug("Text search returned #{matches.size} results")
|
|
106
|
+
QueryResult.from_response(matches: matches, namespace: namespace)
|
|
107
|
+
end
|
|
108
|
+
|
|
83
109
|
# @see Base#fetch
|
|
84
110
|
def fetch(index:, ids:, namespace: nil)
|
|
85
111
|
ns = namespace || ""
|
|
@@ -293,6 +319,36 @@ module Vectra
|
|
|
293
319
|
true
|
|
294
320
|
end
|
|
295
321
|
# rubocop:enable Naming/PredicateMethod
|
|
322
|
+
|
|
323
|
+
# Filter candidates by metadata filter
|
|
324
|
+
def filter_candidates(candidates, filter)
|
|
325
|
+
return candidates unless filter
|
|
326
|
+
|
|
327
|
+
candidates.select { |v| matches_filter?(v, filter) }
|
|
328
|
+
end
|
|
329
|
+
|
|
330
|
+
# Find text matches in candidates
|
|
331
|
+
def find_text_matches(candidates, text_lower, include_values, include_metadata)
|
|
332
|
+
candidates.map do |vec|
|
|
333
|
+
metadata_text = build_metadata_text(vec)
|
|
334
|
+
next unless metadata_text.include?(text_lower)
|
|
335
|
+
|
|
336
|
+
score = calculate_text_score(text_lower, metadata_text)
|
|
337
|
+
build_match(vec, score, include_values, include_metadata)
|
|
338
|
+
end.compact
|
|
339
|
+
end
|
|
340
|
+
|
|
341
|
+
# Build metadata text string for searching
|
|
342
|
+
def build_metadata_text(vector)
|
|
343
|
+
(vector.metadata || {}).values.map(&:to_s).join(" ").downcase
|
|
344
|
+
end
|
|
345
|
+
|
|
346
|
+
# Calculate text match score based on word matches
|
|
347
|
+
def calculate_text_score(query_text, metadata_text)
|
|
348
|
+
query_words = query_text.split(/\s+/)
|
|
349
|
+
matched_words = query_words.count { |word| metadata_text.include?(word) }
|
|
350
|
+
matched_words.to_f / query_words.size
|
|
351
|
+
end
|
|
296
352
|
end
|
|
297
353
|
end
|
|
298
354
|
end
|
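The in-memory provider above works in two steps: `find_text_matches` keeps a vector only when the whole lowercased query is a substring of its joined metadata values, and `calculate_text_score` then returns the fraction of query words found in that text. The sketch below reproduces both steps on plain hashes. The helper names `metadata_text` and `keyword_score` are hypothetical, and the empty-word guard is an addition of this sketch.

```ruby
# Join metadata values into one lowercase string, as build_metadata_text does.
def metadata_text(metadata)
  (metadata || {}).values.map(&:to_s).join(" ").downcase
end

# Fraction of query words that occur somewhere in the metadata text.
def keyword_score(query, text)
  words = query.downcase.split(/\s+/)
  return 0.0 if words.empty? # guard added in this sketch to avoid 0/0

  words.count { |word| text.include?(word) }.to_f / words.size
end

doc  = { "title" => "iPhone 15 Pro", "category" => "Electronics" }
text = metadata_text(doc)

phrase_hit    = text.include?("iphone 15 pro") # whole query is a substring
phrase_miss   = text.include?("pro iphone")    # same words, different order
full_score    = keyword_score("iPhone 15 Pro", text)
partial_score = keyword_score("iphone 16", text)
```

One consequence of the substring pre-filter: any vector that survives it contains every query word, so as written in the diff each returned match scores 1.0 and `top_k` effectively truncates an unranked list. Partial scores such as the 0.5 above only appear if the scoring function is called without the pre-filter.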
|
data/lib/vectra/providers/pgvector.rb
CHANGED
|
@@ -28,6 +28,7 @@ module Vectra
|
|
|
28
28
|
# )
|
|
29
29
|
# client.upsert(index: 'documents', vectors: [...])
|
|
30
30
|
#
|
|
31
|
+
# rubocop:disable Metrics/ClassLength
|
|
31
32
|
class Pgvector < Base
|
|
32
33
|
include Connection
|
|
33
34
|
include SqlHelpers
|
|
@@ -162,6 +163,54 @@ module Vectra
|
|
|
162
163
|
)
|
|
163
164
|
end
|
|
164
165
|
|
|
166
|
+
# Text-only search using PostgreSQL full-text search
|
|
167
|
+
#
|
|
168
|
+
# @param index [String] table name
|
|
169
|
+
# @param text [String] text query for full-text search
|
|
170
|
+
# @param top_k [Integer] number of results
|
|
171
|
+
# @param namespace [String, nil] optional namespace
|
|
172
|
+
# @param filter [Hash, nil] metadata filter
|
|
173
|
+
# @param include_values [Boolean] include vector values
|
|
174
|
+
# @param include_metadata [Boolean] include metadata
|
|
175
|
+
# @param text_column [String] column name for full-text search (default: 'content')
|
|
176
|
+
# @return [QueryResult] search results
|
|
177
|
+
#
|
|
178
|
+
# @note Your table should have a text column with a tsvector index:
|
|
179
|
+
# CREATE INDEX idx_content_fts ON my_index USING gin(to_tsvector('english', content));
|
|
180
|
+
def text_search(index:, text:, top_k:, namespace: nil, filter: nil,
|
|
181
|
+
include_values: false, include_metadata: true,
|
|
182
|
+
text_column: "content")
|
|
183
|
+
ensure_table_exists!(index)
|
|
184
|
+
|
|
185
|
+
select_cols = ["id"]
|
|
186
|
+
select_cols << "embedding" if include_values
|
|
187
|
+
select_cols << "metadata" if include_metadata
|
|
188
|
+
|
|
189
|
+
# Use ts_rank for scoring
|
|
190
|
+
text_score = "ts_rank(to_tsvector('english', COALESCE(#{quote_ident(text_column)}, '')), " \
|
|
191
|
+
"plainto_tsquery('english', #{escape_literal(text)}))"
|
|
192
|
+
select_cols << "#{text_score} AS score"
|
|
193
|
+
|
|
194
|
+
where_clauses = build_where_clauses(namespace, filter)
|
|
195
|
+
where_clauses << "to_tsvector('english', COALESCE(#{quote_ident(text_column)}, '')) @@ " \
|
|
196
|
+
"plainto_tsquery('english', #{escape_literal(text)})"
|
|
197
|
+
|
|
198
|
+
sql = "SELECT #{select_cols.join(', ')} FROM #{quote_ident(index)}"
|
|
199
|
+
sql += " WHERE #{where_clauses.join(' AND ')}" if where_clauses.any?
|
|
200
|
+
sql += " ORDER BY score DESC"
|
|
201
|
+
sql += " LIMIT #{top_k.to_i}"
|
|
202
|
+
|
|
203
|
+
result = execute(sql)
|
|
204
|
+
matches = result.map { |row| build_match_from_row(row, include_values, include_metadata) }
|
|
205
|
+
|
|
206
|
+
log_debug("Text search returned #{matches.size} results")
|
|
207
|
+
|
|
208
|
+
QueryResult.from_response(
|
|
209
|
+
matches: matches,
|
|
210
|
+
namespace: namespace
|
|
211
|
+
)
|
|
212
|
+
end
|
|
213
|
+
|
|
165
214
|
# @see Base#fetch
|
|
166
215
|
def fetch(index:, ids:, namespace: nil)
|
|
167
216
|
ensure_table_exists!(index)
|
|
@@ -361,5 +410,6 @@ module Vectra
|
|
|
361
410
|
raise ConfigurationError, "Host (connection URL or hostname) must be configured for pgvector"
|
|
362
411
|
end
|
|
363
412
|
end
|
|
413
|
+
# rubocop:enable Metrics/ClassLength
|
|
364
414
|
end
|
|
365
415
|
end
|
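The pgvector `text_search` above ranks rows with `ts_rank` and filters them with the `@@` match operator against `plainto_tsquery`. The sketch below builds the same SQL shape as a pure function so the resulting string can be inspected. `text_search_sql` is a hypothetical name, and `quote_ident` and `escape_literal` here are simplified stand-ins written for this sketch, not the gem's helpers; namespace and metadata filters are omitted.

```ruby
# Simplified stand-ins for the gem's quoting helpers (assumptions of this sketch).
def quote_ident(name)
  "\"#{name.to_s.gsub('"', '""')}\""
end

def escape_literal(value)
  "'#{value.to_s.gsub("'", "''")}'"
end

# Build a full-text query: rank with ts_rank, filter with the @@ operator.
def text_search_sql(table:, text:, top_k:, text_column: "content")
  document = "to_tsvector('english', COALESCE(#{quote_ident(text_column)}, ''))"
  query    = "plainto_tsquery('english', #{escape_literal(text)})"

  "SELECT id, ts_rank(#{document}, #{query}) AS score " \
    "FROM #{quote_ident(table)} " \
    "WHERE #{document} @@ #{query} " \
    "ORDER BY score DESC LIMIT #{top_k.to_i}"
end

sql = text_search_sql(table: "products", text: "men's laptop", top_k: 5)
puts sql
```

A point worth checking against a real database: the `@note` in the diff suggests an index on `to_tsvector('english', content)`, while the query wraps the column in `COALESCE(..., '')`. PostgreSQL matches expression indexes on the indexed expression, so that index may not be used for this query unless it is created on the same `COALESCE` expression.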
|
data/lib/vectra/providers/qdrant.rb
CHANGED
|
@@ -110,6 +110,45 @@ module Vectra
|
|
|
110
110
|
handle_hybrid_search_response(response, alpha, namespace)
|
|
111
111
|
end
|
|
112
112
|
|
|
113
|
+
# Text-only search using Qdrant's BM25 text search
|
|
114
|
+
#
|
|
115
|
+
# @param index [String] collection name
|
|
116
|
+
# @param text [String] text query for keyword search
|
|
117
|
+
# @param top_k [Integer] number of results
|
|
118
|
+
# @param namespace [String, nil] optional namespace
|
|
119
|
+
# @param filter [Hash, nil] metadata filter
|
|
120
|
+
# @param include_values [Boolean] include vector values
|
|
121
|
+
# @param include_metadata [Boolean] include metadata
|
|
122
|
+
# @return [QueryResult] search results
|
|
123
|
+
def text_search(index:, text:, top_k:, namespace: nil, filter: nil,
|
|
124
|
+
include_values: false, include_metadata: true)
|
|
125
|
+
qdrant_filter = build_filter(filter, namespace)
|
|
126
|
+
body = {
|
|
127
|
+
query: { text: text },
|
|
128
|
+
limit: top_k,
|
|
129
|
+
with_vector: include_values,
|
|
130
|
+
with_payload: include_metadata
|
|
131
|
+
}
|
|
132
|
+
|
|
133
|
+
body[:filter] = qdrant_filter if qdrant_filter
|
|
134
|
+
|
|
135
|
+
response = with_error_handling do
|
|
136
|
+
connection.post("/collections/#{index}/points/query", body)
|
|
137
|
+
end
|
|
138
|
+
|
|
139
|
+
if response.success?
|
|
140
|
+
matches = transform_search_results(response.body["result"] || [])
|
|
141
|
+
log_debug("Text search returned #{matches.size} results")
|
|
142
|
+
|
|
143
|
+
QueryResult.from_response(
|
|
144
|
+
matches: matches,
|
|
145
|
+
namespace: namespace
|
|
146
|
+
)
|
|
147
|
+
else
|
|
148
|
+
handle_error(response)
|
|
149
|
+
end
|
|
150
|
+
end
|
|
151
|
+
|
|
113
152
|
# @see Base#fetch
|
|
114
153
|
def fetch(index:, ids:, namespace: nil) # rubocop:disable Lint/UnusedMethodArgument
|
|
115
154
|
point_ids = ids.map { |id| generate_point_id(id) }
|
|
data/lib/vectra/providers/weaviate.rb
CHANGED
|
@@ -139,6 +139,36 @@ module Vectra
|
|
|
139
139
|
include_values, include_metadata)
|
|
140
140
|
end
|
|
141
141
|
|
|
142
|
+
# Text-only search using Weaviate's BM25 text search
|
|
143
|
+
#
|
|
144
|
+
# @param index [String] class name
|
|
145
|
+
# @param text [String] text query for BM25 search
|
|
146
|
+
# @param top_k [Integer] number of results
|
|
147
|
+
# @param namespace [String, nil] optional namespace (not used in Weaviate)
|
|
148
|
+
# @param filter [Hash, nil] metadata filter
|
|
149
|
+
# @param include_values [Boolean] include vector values
|
|
150
|
+
# @param include_metadata [Boolean] include metadata
|
|
151
|
+
# @return [QueryResult] search results
|
|
152
|
+
def text_search(index:, text:, top_k:, namespace: nil, filter: nil,
|
|
153
|
+
include_values: false, include_metadata: true)
|
|
154
|
+
where_filter = build_where(filter, namespace)
|
|
155
|
+
graphql = build_text_search_graphql(
|
|
156
|
+
index: index,
|
|
157
|
+
text: text,
|
|
158
|
+
top_k: top_k,
|
|
159
|
+
where_filter: where_filter,
|
|
160
|
+
include_values: include_values,
|
|
161
|
+
include_metadata: include_metadata
|
|
162
|
+
)
|
|
163
|
+
body = { "query" => graphql }
|
|
164
|
+
|
|
165
|
+
response = with_error_handling do
|
|
166
|
+
connection.post("#{API_BASE_PATH}/graphql", body)
|
|
167
|
+
end
|
|
168
|
+
|
|
169
|
+
handle_text_search_response(response, index, namespace, include_values, include_metadata)
|
|
170
|
+
end
|
|
171
|
+
|
|
142
172
|
# rubocop:disable Metrics/PerceivedComplexity
|
|
143
173
|
def fetch(index:, ids:, namespace: nil)
|
|
144
174
|
body = {
|
|
@@ -337,6 +367,26 @@ module Vectra
|
|
|
337
367
|
build_graphql_query(index, top_k, text, alpha, vector, where_filter, selection_block)
|
|
338
368
|
end
|
|
339
369
|
|
|
370
|
+
def build_text_search_graphql(index:, text:, top_k:, where_filter:,
|
|
371
|
+
include_values:, include_metadata:)
|
|
372
|
+
selection_block = build_selection_fields(include_values, include_metadata).join(" ")
|
|
373
|
+
<<~GRAPHQL
|
|
374
|
+
{
|
|
375
|
+
Get {
|
|
376
|
+
#{index}(
|
|
377
|
+
limit: #{top_k}
|
|
378
|
+
bm25: {
|
|
379
|
+
query: "#{text.gsub('"', '\\"')}"
|
|
380
|
+
}
|
|
381
|
+
#{"where: #{JSON.generate(where_filter)}" if where_filter}
|
|
382
|
+
) {
|
|
383
|
+
#{selection_block}
|
|
384
|
+
}
|
|
385
|
+
}
|
|
386
|
+
}
|
|
387
|
+
GRAPHQL
|
|
388
|
+
end
|
|
389
|
+
|
|
340
390
|
def build_graphql_query(index, top_k, text, alpha, vector, where_filter, selection_block)
|
|
341
391
|
<<~GRAPHQL
|
|
342
392
|
{
|
|
@@ -379,6 +429,20 @@ module Vectra
|
|
|
379
429
|
end
|
|
380
430
|
end
|
|
381
431
|
|
|
432
|
+
def handle_text_search_response(response, index, namespace, include_values, include_metadata)
|
|
433
|
+
if response.success?
|
|
434
|
+
matches = extract_query_matches(response.body, index, include_values, include_metadata)
|
|
435
|
+
log_debug("Text search returned #{matches.size} results")
|
|
436
|
+
|
|
437
|
+
QueryResult.from_response(
|
|
438
|
+
matches: matches,
|
|
439
|
+
namespace: namespace
|
|
440
|
+
)
|
|
441
|
+
else
|
|
442
|
+
handle_error(response)
|
|
443
|
+
end
|
|
444
|
+
end
|
|
445
|
+
|
|
382
446
|
def validate_config!
|
|
383
447
|
super
|
|
384
448
|
raise ConfigurationError, "Host must be configured for Weaviate" if config.host.nil? || config.host.empty?
|
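In the Weaviate provider above, `build_text_search_graphql` interpolates the query text into a GraphQL string literal after escaping only double quotes. The sketch below compares that approach with JSON string encoding, which also escapes backslashes and control characters. `quote_only` and `graphql_string` are hypothetical helper names, and treating JSON string output as a valid GraphQL string literal is an assumption of this sketch (the escape sequences Ruby's JSON generator emits are a subset of those GraphQL string values accept).

```ruby
require "json"

# Escaping as written in the diff: only double quotes are handled.
def quote_only(text)
  escaped = text.gsub('"', '\\"')
  "\"#{escaped}\""
end

# Alternative: JSON string encoding also escapes backslashes and newlines.
def graphql_string(text)
  text.to_json
end

tricky = 'C:\\temp "docs"' # the characters: C:\temp "docs"

quoted_only = quote_only(tricky)      # backslash left as-is
json_quoted = graphql_string(tricky)  # backslash doubled

puts quoted_only
puts json_quoted
```

With the quote-only version, the `\t` in `C:\temp` reaches the GraphQL parser as a tab escape, so the server would search for different text than the caller supplied. The JSON-encoded version round-trips to the original string.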
data/lib/vectra/version.rb
CHANGED
metadata
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: vectra-client
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version: 1.1.
|
|
4
|
+
version: 1.1.1
|
|
5
5
|
platform: ruby
|
|
6
6
|
authors:
|
|
7
7
|
- Mijo Kristo
|
|
@@ -274,6 +274,7 @@ files:
|
|
|
274
274
|
- docs/guides/rails-integration.md
|
|
275
275
|
- docs/guides/rails-troubleshooting.md
|
|
276
276
|
- docs/guides/recipes.md
|
|
277
|
+
- docs/guides/roadmap.md
|
|
277
278
|
- docs/guides/runbooks/cache-issues.md
|
|
278
279
|
- docs/guides/runbooks/high-error-rate.md
|
|
279
280
|
- docs/guides/runbooks/high-latency.md
|