vectra-client 1.0.6 → 1.0.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,211 @@
1
+ ---
2
+ layout: page
3
+ title: Testing Guide
4
+ permalink: /guides/testing/
5
+ ---
6
+
7
+ # Testing Guide
8
+
9
+ How to test code that uses Vectra without running a real vector database.
10
+
11
+ Vectra ships with a **Memory provider** – an in-memory vector store designed for **RSpec/Minitest, local dev, and CI**.
12
+
13
+ > Not for production use. All data is stored in memory and lost when the process exits.
14
+
15
+ ---
16
+
17
+ ## 1. Configure Vectra for Tests
18
+
19
+ ### Option A: Global Config (Rails)
20
+
21
+ ```ruby
22
+ # config/initializers/vectra.rb
23
+ require 'vectra'
24
+
25
+ Vectra.configure do |config|
26
+ if Rails.env.test?
27
+ config.provider = :memory
28
+ else
29
+ config.provider = :qdrant
30
+ config.host = ENV.fetch('QDRANT_HOST', 'http://localhost:6333')
31
+ config.api_key = ENV['QDRANT_API_KEY']
32
+ end
33
+ end
34
+ ```
35
+
36
+ Then in your application code and tests:
37
+
38
+ ```ruby
39
+ client = Vectra::Client.new
40
+ ```
41
+
42
+ ### Option B: Direct Construction in Tests
43
+
44
+ ```ruby
45
+ require 'vectra'
46
+
47
+ RSpec.describe 'My vector code' do
48
+ let(:client) { Vectra.memory }
49
+
50
+ it 'searches using in-memory provider' do
51
+ client.upsert(
52
+ index: 'documents',
53
+ vectors: [
54
+ { id: 'doc-1', values: [0.1, 0.2, 0.3], metadata: { title: 'Hello' } }
55
+ ]
56
+ )
57
+
58
+ results = client.query(index: 'documents', vector: [0.1, 0.2, 0.3], top_k: 5)
59
+ expect(results.ids).to include('doc-1')
60
+ end
61
+ end
62
+ ```
63
+
64
+ ---
65
+
66
+ ## 2. Reset State Between Tests
67
+
68
+ The Memory provider exposes `clear!` to wipe all in-memory data.
69
+
70
+ ```ruby
71
+ # spec/support/vectra_memory.rb
72
+ RSpec.configure do |config|
73
+ config.before(:each, vectra: :memory) do
74
+ Vectra::Providers::Memory.instance.clear!
75
+ end
76
+ end
77
+ ```
78
+
79
+ Usage:
80
+
81
+ ```ruby
82
+ RSpec.describe SearchService, vectra: :memory do
83
+ it 'returns expected results' do
84
+ # state is clean here
85
+ end
86
+ end
87
+ ```
88
+
89
+ If you set `config.provider = :memory` globally in the test environment, this hook keeps tests isolated from each other.
90
+
91
+ ---
92
+
93
+ ## 3. Testing Application Code (Service Example)
94
+
95
+ Assume you have a simple service that wraps Vectra:
96
+
97
+ ```ruby
98
+ # app/services/search_service.rb
99
+ class SearchService
100
+ def self.search(query:, limit: 10)
101
+ client = Vectra::Client.new
102
+
103
+ embedding = EmbeddingService.generate(query)
104
+
105
+ client.query(
106
+ index: 'documents',
107
+ vector: embedding,
108
+ top_k: limit,
109
+ include_metadata: true
110
+ )
111
+ end
112
+ end
113
+ ```
114
+
115
+ You can test it **without** any external DB:
116
+
117
+ ```ruby
118
+ # spec/services/search_service_spec.rb
119
+ require 'rails_helper'
120
+
121
+ RSpec.describe SearchService, vectra: :memory do
122
+ let(:client) { Vectra.memory }
123
+
124
+ before do
125
+ # Seed in-memory vectors
126
+ client.upsert(
127
+ index: 'documents',
128
+ vectors: [
129
+ { id: 'doc-1', values: [1.0, 0.0, 0.0], metadata: { title: 'Ruby Vectors' } },
130
+ { id: 'doc-2', values: [0.0, 1.0, 0.0], metadata: { title: 'PostgreSQL' } }
131
+ ]
132
+ )
133
+
134
+ # Stub client used inside SearchService to our memory client
135
+ allow(Vectra::Client).to receive(:new).and_return(client)
136
+
137
+ # Stub embedding service to return a deterministic vector
138
+ allow(EmbeddingService).to receive(:generate).and_return([1.0, 0.0, 0.0])
139
+ end
140
+
141
+ it 'returns most relevant document first' do
142
+ results = SearchService.search(query: 'ruby vectors', limit: 2)
143
+
144
+ expect(results.first.metadata['title']).to eq('Ruby Vectors')
145
+ end
146
+ end
147
+ ```
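The `EmbeddingService` stub above returns one fixed vector. When a spec needs several distinct but repeatable vectors, a deterministic generator can stand in for the real service. A minimal sketch (`FakeEmbedding` is a hypothetical test helper, not part of Vectra):

```ruby
require 'digest'

# Hypothetical test helper: derive a stable unit vector from input text
# so specs get repeatable, distinct embeddings without a real embedding API.
module FakeEmbedding
  module_function

  def generate(text, dimension: 3)
    bytes = Digest::SHA256.digest(text).bytes
    # Map the first `dimension` digest bytes into [0, 1] floats
    values = Array.new(dimension) { |i| bytes[i % bytes.size] / 255.0 }
    # Normalize to unit length so cosine-style comparisons behave
    norm = Math.sqrt(values.sum { |v| v * v })
    norm.zero? ? values : values.map { |v| v / norm }
  end
end
```

In the spec above you could then write `allow(EmbeddingService).to receive(:generate) { |q| FakeEmbedding.generate(q) }` so each query gets a stable vector.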
148
+
149
+ ---
150
+
151
+ ## 4. Testing `has_vector` Models
152
+
153
+ Example model:
154
+
155
+ ```ruby
156
+ # app/models/document.rb
157
+ class Document < ApplicationRecord
158
+ include Vectra::ActiveRecord
159
+
160
+ has_vector :embedding,
161
+ provider: :memory,
162
+ index: 'documents',
163
+ dimension: 3,
164
+ metadata_fields: [:title]
165
+ end
166
+ ```
167
+
168
+ ### Model Spec
169
+
170
+ ```ruby
171
+ # spec/models/document_spec.rb
172
+ require 'rails_helper'
173
+
174
+ RSpec.describe Document, vectra: :memory do
175
+ it 'indexes on create and can be searched' do
176
+ doc = Document.create!(
177
+ title: 'Hello',
178
+ embedding: [1.0, 0.0, 0.0]
179
+ )
180
+
181
+ results = Document.vector_search(
182
+ embedding: [1.0, 0.0, 0.0],
183
+ limit: 5
184
+ )
185
+
186
+ expect(results.map(&:id)).to include(doc.id)
187
+ end
188
+ end
189
+ ```
190
+
191
+ ---
192
+
193
+ ## 5. CI Considerations
194
+
195
+ - ✅ No external services needed (no Docker, no cloud credentials)
196
+ - ✅ Fast and deterministic tests
197
+ - ✅ Same Vectra API as real providers
198
+
199
+ Recommended CI pattern:
200
+
201
+ 1. Use `provider: :memory` for the majority of tests.
202
+ 2. Have a **small set of integration tests** (separate job) that hit a real provider (e.g. Qdrant in Docker) if needed.
203
+
204
+ ---
205
+
206
+ ## 6. Useful Links
207
+
208
+ - [Memory Provider Docs](/providers/memory/)
209
+ - [Recipes & Patterns](/guides/recipes/)
210
+ - [Rails Integration Guide](/guides/rails-integration/)
211
+ - [API Cheatsheet](/api/cheatsheet/)
@@ -0,0 +1,215 @@
1
+ ---
2
+ layout: page
3
+ title: Provider Selection Guide
4
+ permalink: /providers/selection/
5
+ ---
6
+
7
+ # Provider Selection Guide
8
+
9
+ A short guide to choosing the right provider for your use case.
10
+
11
+ Vectra supports 5 providers:
12
+
13
+ - **Pinecone** – managed cloud
14
+ - **Qdrant** – open source, self-host + cloud
15
+ - **Weaviate** – AI-native, GraphQL, open source
16
+ - **pgvector** – PostgreSQL extension
17
+ - **Memory** – in-memory, for testing only
18
+
19
+ ---
20
+
21
+ ## Quick decision tree
22
+
23
+ ### 1. You already use PostgreSQL and want minimal changes
24
+
25
+ **Recommendation:** `pgvector`
26
+
27
+ Use pgvector if:
28
+
29
+ - Everything is already in Postgres
30
+ - You don't want an extra service in your infrastructure
31
+ - You want **SQL + ACID** and transactions
32
+ - Your dataset is **medium-sized** (tens or hundreds of thousands up to a few million vectors)
33
+
34
+ ```ruby
35
+ # Gemfile
36
+ # pg + pgvector extension in the database
37
+
38
+ Vectra.configure do |config|
39
+ config.provider = :pgvector
40
+ config.host = ENV['DATABASE_URL']
41
+ end
42
+
43
+ client = Vectra::Client.new
44
+ ```
45
+
46
+ **Pros:**
47
+ - No extra database → less ops overhead
48
+ - You can use JOINs, transactions, and migrations as usual
49
+
50
+ **Cons:**
51
+ - Not a dedicated vector database (fewer specialized features)
52
+ - Scaling is tied to Postgres
53
+
54
+ ---
55
+
56
+ ### 2. You want open source and the option to self-host
57
+
58
+ **Recommendation:** `Qdrant` (or `Weaviate` if you need GraphQL and AI-native features)
59
+
60
+ Use **Qdrant** if:
61
+
62
+ - You want **OSS** and full control
63
+ - You want excellent performance and a good filter engine
64
+ - You can run Docker/Kubernetes or use Qdrant Cloud
65
+
66
+ ```ruby
67
+ Vectra.configure do |config|
68
+ config.provider = :qdrant
69
+ config.host = ENV.fetch('QDRANT_HOST', 'http://localhost:6333')
70
+ config.api_key = ENV['QDRANT_API_KEY'] # optional for local setups
71
+ end
72
+
73
+ client = Vectra::Client.new
74
+ ```
75
+
76
+ Use **Weaviate** if:
77
+
78
+ - You want a **GraphQL API** and a rich schema model
79
+ - You want AI-native features (built-in vectorizers, hybrid search, cross-references)
80
+
81
+ ```ruby
82
+ Vectra.configure do |config|
83
+ config.provider = :weaviate
84
+ config.host = ENV['WEAVIATE_HOST']
85
+ config.api_key = ENV['WEAVIATE_API_KEY']
86
+ end
87
+ ```
88
+
89
+ ---
90
+
91
+ ### 3. You want a managed cloud and "zero ops"
92
+
93
+ **Recommendation:** `Pinecone`
94
+
95
+ Use Pinecone if:
96
+
97
+ - You don't want to worry about indexes, sharding, or backups
98
+ - You have larger data volumes and need a stable cloud service
99
+ - You want multi-region support, SLAs, and enterprise support
100
+
101
+ ```ruby
102
+ Vectra.configure do |config|
103
+ config.provider = :pinecone
104
+ config.api_key = ENV['PINECONE_API_KEY']
105
+ config.environment = ENV['PINECONE_ENVIRONMENT'] # e.g. 'us-west-4'
106
+ end
107
+
108
+ client = Vectra::Client.new
109
+ ```
110
+
111
+ **Pros:**
112
+ - The least ops overhead
113
+ - Good performance and scaling out of the box
114
+
115
+ **Cons:**
116
+ - You are tied to a cloud provider
117
+ - Paid service
118
+
119
+ ---
120
+
121
+ ### 4. You just want something that works locally for prototyping / testing
122
+
123
+ **Recommendation:** `Memory` or a local `Qdrant`
124
+
125
+ - **Memory provider** (`:memory`):
126
+   - Great for RSpec / Minitest and CI
127
+   - No external dependencies
128
+
129
+ ```ruby
130
+ Vectra.configure do |config|
131
+ config.provider = :memory if Rails.env.test?
132
+ end
133
+
134
+ client = Vectra::Client.new
135
+ ```
136
+
137
+ - **Local Qdrant**:
138
+   - Run `docker run qdrant/qdrant`
139
+   - You get a real vector engine with local disk storage
140
+
141
+ ```ruby
142
+ Vectra.configure do |config|
143
+ config.provider = :qdrant
144
+ config.host = 'http://localhost:6333'
145
+ end
146
+ ```
147
+
148
+ ---
149
+
150
+ ## Typical scenarios
151
+
152
+ ### E-commerce (1000–1M products)
153
+
154
+ - **If you already use Postgres** → `pgvector`
155
+ - **If you want a dedicated vector database** → `Qdrant`
156
+
157
+ For a Rails example, see the [Rails Integration Guide](/guides/rails-integration/) and [Recipes & Patterns](/guides/recipes/).
158
+
159
+ ---
160
+
161
+ ### SaaS application with multi-tenant support
162
+
163
+ - **Qdrant** or **Weaviate** for good filtering and flexibility
164
+ - **pgvector** if everything already lives in a single Postgres database
165
+
166
+ Use a **namespace per tenant** plus a `tenant_id` metadata field (example in [Recipes](/guides/recipes/#multi-tenant-saas-namespace-isolation)).
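The namespace-per-tenant pattern can be captured in a small helper that both scopes writes to a tenant namespace and stamps `tenant_id` into metadata for defensive filtering. A sketch (`tenant_upsert_params` is a hypothetical name, not a Vectra API):

```ruby
# Hypothetical helper: build upsert arguments that isolate a tenant in its
# own namespace and record tenant_id in metadata as a second line of defense.
def tenant_upsert_params(tenant_id:, vectors:)
  {
    namespace: "tenant_#{tenant_id}",
    vectors: vectors.map do |v|
      v.merge(metadata: (v[:metadata] || {}).merge('tenant_id' => tenant_id))
    end
  }
end
```

The result could then be splatted into an upsert call, e.g. `client.upsert(index: 'documents', **params)`.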
167
+
168
+ ---
169
+
170
+ ### RAG chatbot / documentation / knowledge base
171
+
172
+ - **Qdrant** or **Weaviate** for strong text and filter support
173
+ - **Pinecone** if you want a managed cloud without the operational burden
174
+
175
+ Important:
176
+
177
+ - Chunk documents into smaller pieces (e.g. 200–500 tokens)
178
+ - Store `document_id`, `chunk_index`, and `source_url` in metadata
179
+
180
+ Example: [RAG Chatbot recipe](/guides/recipes/#rag-chatbot-context-retrieval).
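The chunking guidance above can be sketched in plain Ruby. Token counts are approximated with word counts here, and `chunk_document` is an illustrative helper rather than a Vectra API:

```ruby
# Illustrative chunker: split a document into word-based chunks and attach
# the recommended metadata keys for later embedding and upserting.
def chunk_document(text, document_id:, source_url:, chunk_size: 300)
  words = text.split
  words.each_slice(chunk_size).with_index.map do |slice, index|
    {
      id: "#{document_id}-#{index}",
      text: slice.join(' '),
      metadata: {
        'document_id' => document_id,
        'chunk_index' => index,
        'source_url'  => source_url
      }
    }
  end
end
```

Each chunk's `text` would then be embedded, and the resulting vectors upserted with `client.upsert`.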
181
+
182
+ ---
183
+
184
+ ### Internal tools / reporting / dashboards
186
+
187
+ - **pgvector** is often the best choice:
188
+   - Data is already in Postgres
189
+   - You can combine vector queries with regular SQL joins
189
+
190
+ ---
191
+
192
+ ## What if I want to change providers later?
193
+
194
+ That is exactly Vectra's biggest advantage 🙂
195
+
196
+ 1. Change `config.provider` and the relevant credentials in your configuration
197
+ 2. Optionally run a data migration (dual-write or batch migration)
198
+ 3. Your application code (`client.upsert`, `client.query`, `has_vector`) stays the same
199
+
200
+ For a detailed example, see:
201
+
202
+ - [Zero-Downtime Provider Migration](/guides/recipes/#zero-downtime-provider-migration)
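The dual-write option mentioned in step 2 can be as small as a wrapper that fans writes out to both providers while reads stay on the old one until the backfill finishes. A sketch, assuming both objects respond to Vectra's `upsert`/`query` interface (`DualWriteClient` is illustrative, not shipped with the gem):

```ruby
# Minimal dual-write wrapper: writes go to both the old and the new
# provider; reads are served by the old provider until migration completes.
class DualWriteClient
  def initialize(primary:, secondary:)
    @primary = primary
    @secondary = secondary
  end

  def upsert(**args)
    result = @primary.upsert(**args)
    @secondary.upsert(**args)
    result
  end

  def query(**args)
    @primary.query(**args)
  end
end
```

Once the batch migration catches up, you swap `primary` to the new provider and drop the wrapper.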
203
+
204
+ ---
205
+
206
+ ## Summary of recommendations
207
+
208
+ - **Postgres only, minimal changes:** `pgvector`
209
+ - **OSS + self-hosting, strong filter engine:** `Qdrant`
210
+ - **AI-native, GraphQL:** `Weaviate`
211
+ - **Managed cloud, zero ops:** `Pinecone`
212
+ - **Testing / CI:** `Memory`
213
+
214
+ You use all of these options through the **same Vectra API**, so you can later switch providers with minimal code changes.
215
+
@@ -2,8 +2,9 @@
2
2
 
3
3
  require "active_support/concern"
4
4
 
5
- # Ensure Client and Providers are loaded (for Rails autoloading compatibility)
5
+ # Ensure Client and supporting classes are loaded (for Rails autoloading compatibility)
6
6
  require_relative "client" unless defined?(Vectra::Client)
7
+ require_relative "batch" unless defined?(Vectra::Batch)
7
8
 
8
9
  module Vectra
9
10
  # ActiveRecord integration for vector embeddings
@@ -26,6 +27,7 @@ module Vectra
26
27
  # # Search similar documents
27
28
  # results = Document.vector_search([0.1, 0.2, ...], limit: 10)
28
29
  #
30
+ # rubocop:disable Metrics/ModuleLength
29
31
  module ActiveRecord
30
32
  extend ActiveSupport::Concern
31
33
 
@@ -86,6 +88,54 @@ module Vectra
86
88
  end
87
89
  end
88
90
 
91
+ # Reindex all vectors for this model using current configuration.
92
+ #
93
+ # @param scope [ActiveRecord::Relation] records to reindex (default: all)
94
+ # @param batch_size [Integer] number of records per batch
95
+ # @param on_progress [Proc, nil] optional callback called after each batch
96
+ # Receives a hash with :processed and :total keys (and any other stats from Batch)
97
+ #
98
+ # @return [Integer] number of records processed
99
+ def reindex_vectors(scope: all, batch_size: 1_000, on_progress: nil)
100
+ config = _vectra_config
101
+ client = vectra_client
102
+ batch = Vectra::Batch.new(client)
103
+
104
+ processed = 0
105
+
106
+ scope.in_batches(of: batch_size).each do |relation|
107
+ records = relation.to_a
108
+
109
+ vectors = records.map do |record|
110
+ vector = record.send(config[:attribute])
111
+ next if vector.nil?
112
+
113
+ metadata = config[:metadata_fields].each_with_object({}) do |field, hash|
114
+ hash[field.to_s] = record.send(field) if record.respond_to?(field)
115
+ end
116
+
117
+ {
118
+ id: "#{config[:index]}_#{record.id}",
119
+ values: vector,
120
+ metadata: metadata
121
+ }
122
+ end.compact
123
+
124
+ next if vectors.empty?
125
+
126
+ batch.upsert_async(
127
+ index: config[:index],
128
+ vectors: vectors,
129
+ namespace: nil,
130
+ on_progress: on_progress
131
+ )
132
+
133
+ processed += vectors.size
134
+ end
135
+
136
+ processed
137
+ end
138
+
89
139
  # Search vectors
90
140
  #
91
141
  # @api private
@@ -195,4 +245,5 @@ module Vectra
195
245
  "#{self.class._vectra_config[:index]}_#{id}"
196
246
  end
197
247
  end
248
+ # rubocop:enable Metrics/ModuleLength
198
249
  end
data/lib/vectra/batch.rb CHANGED
@@ -1,6 +1,7 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  require "concurrent"
4
+ require_relative "query_result" unless defined?(Vectra::QueryResult)
4
5
 
5
6
  module Vectra
6
7
  # Batch operations with concurrent processing
@@ -112,6 +113,74 @@ module Vectra
112
113
  merge_fetch_results(results)
113
114
  end
114
115
 
116
+ # Perform async batch query with concurrent requests
117
+ #
118
+ # Useful for finding similar items for multiple vectors at once (e.g., recommendation engine).
119
+ #
120
+ # @param index [String] the index name
121
+ # @param vectors [Array<Array<Float>>] query vectors
122
+ # @param top_k [Integer] number of results per query (default: 10)
123
+ # @param namespace [String, nil] optional namespace
124
+ # @param filter [Hash, nil] metadata filter
125
+ # @param include_values [Boolean] include vector values in response
126
+ # @param include_metadata [Boolean] include metadata in response
127
+ # @param chunk_size [Integer] queries per chunk for progress tracking (default: 10)
128
+ # @param on_progress [Proc, nil] optional callback called after each chunk completes
129
+ # Callback receives hash with: processed, total, percentage, current_chunk, total_chunks, success_count, failed_count
130
+ # @return [Array<QueryResult>] array of query results, one per input vector
131
+ #
132
+ # @example Find similar items for multiple products
133
+ # product_embeddings = products.map(&:embedding)
134
+ # results = batch.query_async(
135
+ # index: 'products',
136
+ # vectors: product_embeddings,
137
+ # top_k: 5,
138
+ # on_progress: ->(stats) {
139
+ # puts "Processed #{stats[:processed]}/#{stats[:total]} queries"
140
+ # }
141
+ # )
142
+ # results.each_with_index do |result, i|
143
+ # puts "Similar to product #{i}: #{result.ids}"
144
+ # end
145
+ #
146
+ def query_async(index:, vectors:, top_k: 10, namespace: nil, filter: nil,
147
+ include_values: false, include_metadata: true,
148
+ chunk_size: 10, on_progress: nil)
149
+ return [] if vectors.empty?
150
+
151
+ # Process queries in chunks for progress tracking
152
+ chunks = vectors.each_slice(chunk_size).to_a
153
+ results = process_chunks_concurrently(chunks, total_items: vectors.size, on_progress: on_progress) do |chunk|
154
+ # Execute queries sequentially within chunk (each query is already fast)
155
+ chunk.map do |vector|
156
+ client.query(
157
+ index: index,
158
+ vector: vector,
159
+ top_k: top_k,
160
+ namespace: namespace,
161
+ filter: filter,
162
+ include_values: include_values,
163
+ include_metadata: include_metadata
164
+ )
165
+ end
166
+ end
167
+
168
+ # Flatten results and handle errors
169
+ all_results = []
170
+ results.each_with_index do |chunk_result, chunk_index|
171
+ if chunk_result[:error]
172
+ # On error, return empty QueryResult for each vector in chunk
173
+ # Use actual chunk size (last chunk might be smaller)
174
+ actual_chunk_size = chunk_index < chunks.size ? chunks[chunk_index].size : chunk_size
175
+ actual_chunk_size.times { all_results << QueryResult.new(matches: []) }
176
+ else
177
+ all_results.concat(chunk_result[:result] || [])
178
+ end
179
+ end
180
+
181
+ all_results
182
+ end
183
+
115
184
  private
116
185
 
117
186
  # rubocop:disable Metrics/AbcSize, Metrics/MethodLength, Metrics/BlockLength
data/lib/vectra/cache.rb CHANGED
@@ -258,4 +258,53 @@ module Vectra
258
258
  "#{index}:f:#{id}:#{namespace || 'default'}"
259
259
  end
260
260
  end
261
+
262
+ # Helper for caching embeddings based on model, record ID and input text.
263
+ #
264
+ # @example
265
+ # cache = Vectra::Cache.new(ttl: 600, max_size: 1000)
266
+ #
267
+ # embedding = Vectra::Embeddings.fetch(
268
+ # cache: cache,
269
+ # model_name: "Product",
270
+ # id: product.id,
271
+ # input: product.description,
272
+ # field: :description
273
+ # ) do
274
+ # EmbeddingService.generate(product.description)
275
+ # end
276
+ #
277
+ module Embeddings
278
+ module_function
279
+
280
+ # Build a stable cache key for an embedding.
281
+ #
282
+ # @param model_name [String] model class name (e.g. "Product")
283
+ # @param id [Integer, String] record ID
284
+ # @param input [String] raw input used for embedding
285
+ # @param field [Symbol, String, nil] optional field name
286
+ #
287
+ # @return [String] cache key
288
+ def cache_key(model_name:, id:, input:, field: nil)
289
+ field_part = field ? field.to_s : "default"
290
+ base = "#{model_name}:#{field_part}:#{id}:#{input}"
291
+ digest = Digest::SHA256.hexdigest(base)[0, 32]
292
+ "emb:#{model_name}:#{field_part}:#{digest}"
293
+ end
294
+
295
+ # Fetch an embedding from cache or compute and store it.
296
+ #
297
+ # @param cache [Vectra::Cache] cache instance
298
+ # @param model_name [String] model class name
299
+ # @param id [Integer, String] record ID
300
+ # @param input [String] input used for embedding
301
+ # @param field [Symbol, String, nil] optional field name
302
+ #
303
+ # @yield block that computes the embedding when not cached
304
+ # @return [Object] cached or computed embedding
305
+ def fetch(cache:, model_name:, id:, input:, field: nil, &block)
306
+ key = cache_key(model_name: model_name, id: id, input: input, field: field)
307
+ cache.fetch(key, &block)
308
+ end
309
+ end
261
310
  end