vectra-client 0.2.1 → 0.3.0

Files changed (55)
  1. checksums.yaml +4 -4
  2. data/.rubocop.yml +77 -37
  3. data/CHANGELOG.md +49 -6
  4. data/README.md +52 -393
  5. data/docs/Gemfile +9 -0
  6. data/docs/_config.yml +37 -0
  7. data/docs/_layouts/default.html +14 -0
  8. data/docs/_layouts/home.html +187 -0
  9. data/docs/_layouts/page.html +82 -0
  10. data/docs/_site/api/overview/index.html +145 -0
  11. data/docs/_site/assets/main.css +649 -0
  12. data/docs/_site/assets/main.css.map +1 -0
  13. data/docs/_site/assets/minima-social-icons.svg +33 -0
  14. data/docs/_site/assets/style.css +295 -0
  15. data/docs/_site/community/contributing/index.html +110 -0
  16. data/docs/_site/examples/basic-usage/index.html +117 -0
  17. data/docs/_site/examples/index.html +58 -0
  18. data/docs/_site/feed.xml +1 -0
  19. data/docs/_site/guides/getting-started/index.html +106 -0
  20. data/docs/_site/guides/installation/index.html +82 -0
  21. data/docs/_site/index.html +92 -0
  22. data/docs/_site/providers/index.html +119 -0
  23. data/docs/_site/providers/pgvector/index.html +155 -0
  24. data/docs/_site/providers/pinecone/index.html +121 -0
  25. data/docs/_site/providers/qdrant/index.html +124 -0
  26. data/docs/_site/providers/weaviate/index.html +123 -0
  27. data/docs/_site/robots.txt +1 -0
  28. data/docs/_site/sitemap.xml +39 -0
  29. data/docs/api/overview.md +126 -0
  30. data/docs/assets/style.css +927 -0
  31. data/docs/community/contributing.md +89 -0
  32. data/docs/examples/basic-usage.md +102 -0
  33. data/docs/examples/index.md +54 -0
  34. data/docs/guides/getting-started.md +90 -0
  35. data/docs/guides/installation.md +67 -0
  36. data/docs/guides/performance.md +200 -0
  37. data/docs/index.md +37 -0
  38. data/docs/providers/index.md +81 -0
  39. data/docs/providers/pgvector.md +95 -0
  40. data/docs/providers/pinecone.md +72 -0
  41. data/docs/providers/qdrant.md +73 -0
  42. data/docs/providers/weaviate.md +72 -0
  43. data/lib/vectra/batch.rb +148 -0
  44. data/lib/vectra/cache.rb +261 -0
  45. data/lib/vectra/configuration.rb +6 -1
  46. data/lib/vectra/pool.rb +256 -0
  47. data/lib/vectra/streaming.rb +153 -0
  48. data/lib/vectra/version.rb +1 -1
  49. data/lib/vectra.rb +4 -0
  50. data/netlify.toml +12 -0
  51. metadata +58 -5
  52. data/IMPLEMENTATION_GUIDE.md +0 -686
  53. data/NEW_FEATURES_v0.2.0.md +0 -459
  54. data/RELEASE_CHECKLIST_v0.2.0.md +0 -383
  55. data/USAGE_EXAMPLES.md +0 -787
@@ -1,459 +0,0 @@
1
- # 🚀 NEW FEATURES in v0.2.0
2
-
3
- ## Overview
4
-
5
- Version 0.2.0 adds **enterprise-grade features** for production use:
6
-
7
- - 📊 **Instrumentation & Monitoring** - New Relic, Datadog, custom handlers
8
- - 🎨 **Rails Generator** - `rails g vectra:install`
9
- - 💎 **ActiveRecord Integration** - `has_vector` DSL for seamless Rails integration
10
- - 🔄 **Automatic Retry Logic** - Resilience for transient database errors
11
- - ⚡ **Performance Benchmarks** - Measure and optimize your setup
12
-
13
- ---
14
-
15
- ## 📊 Instrumentation & Monitoring
16
-
17
- Track all vector operations in production.
18
-
19
- ### Quick Start
20
-
21
- ```ruby
22
- # config/initializers/vectra.rb
23
- Vectra.configure do |config|
24
- config.instrumentation = true
25
- end
26
-
27
- # Custom handler
28
- Vectra.on_operation do |event|
29
- Rails.logger.info "Vectra: #{event.operation} took #{event.duration}ms"
30
-
31
- # Send to monitoring
32
- StatsD.timing("vectra.#{event.operation}", event.duration)
33
- StatsD.increment("vectra.#{event.success? ? 'success' : 'error'}")
34
- end
35
- ```
36
-
37
- ### New Relic Integration
38
-
39
- ```ruby
40
- require 'vectra/instrumentation/new_relic'
41
-
42
- Vectra.configure { |c| c.instrumentation = true }
43
- Vectra::Instrumentation::NewRelic.setup!
44
- ```
45
-
46
- Automatically tracks:
47
- - `Custom/Vectra/pgvector/query/duration`
48
- - `Custom/Vectra/pgvector/upsert/success`
49
- - And more...
50
-
51
- ### Datadog Integration
52
-
53
- ```ruby
54
- require 'vectra/instrumentation/datadog'
55
-
56
- Vectra.configure { |c| c.instrumentation = true }
57
- Vectra::Instrumentation::Datadog.setup!(
58
- host: ENV['DD_AGENT_HOST'],
59
- port: 8125
60
- )
61
- ```
62
-
63
- Metrics:
64
- - `vectra.operation.duration` (timing)
65
- - `vectra.operation.count` (counter)
66
- - `vectra.operation.error` (counter)
67
-
68
- ### Event API
69
-
70
- ```ruby
71
- Vectra.on_operation do |event|
72
- event.operation # :query, :upsert, :fetch, :update, :delete
73
- event.provider # :pgvector, :pinecone
74
- event.index # 'documents'
75
- event.duration # 123.45 (milliseconds)
76
- event.metadata # { vector_count: 10, top_k: 5, ... }
77
- event.success? # true/false
78
- event.error # Exception or nil
79
- end
80
- ```
81
-
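As an illustration of how a handler might consume these fields, the sketch below models the event as a plain `Struct` and aggregates durations per operation. `FakeEvent` and `DurationStats` are hypothetical stand-ins and not part of Vectra; only the field names are taken from the Event API listing above.

```ruby
# Hypothetical stand-in for the event object; only the field names
# mirror the Event API listing above.
FakeEvent = Struct.new(:operation, :provider, :index, :duration,
                       :metadata, :error, keyword_init: true) do
  def success?
    error.nil?
  end
end

# Collects per-operation call counts, error counts, and total duration (ms).
class DurationStats
  attr_reader :stats

  def initialize
    @stats = Hash.new { |h, k| h[k] = { count: 0, errors: 0, total_ms: 0.0 } }
  end

  # Takes one event, the same argument a handler block would receive.
  def call(event)
    entry = @stats[event.operation]
    entry[:count] += 1
    entry[:errors] += 1 unless event.success?
    entry[:total_ms] += event.duration
  end

  def average_ms(operation)
    entry = @stats[operation]
    entry[:count].zero? ? 0.0 : entry[:total_ms] / entry[:count]
  end
end

stats = DurationStats.new
stats.call(FakeEvent.new(operation: :query, provider: :pgvector, index: 'documents',
                         duration: 100.0, metadata: {}, error: nil))
stats.call(FakeEvent.new(operation: :query, provider: :pgvector, index: 'documents',
                         duration: 50.0, metadata: {}, error: RuntimeError.new('boom')))
puts stats.average_ms(:query)
```

In an application, the same object could presumably be wired in with `Vectra.on_operation { |event| stats.call(event) }`, assuming the real event responds to the methods listed above.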
82
- ---
83
-
84
- ## 🎨 Rails Generator
85
-
86
- One command to set up Vectra in Rails.
87
-
88
- ### Usage
89
-
90
- ```bash
91
- rails generate vectra:install --provider=pgvector --instrumentation=true
92
- ```
93
-
94
- ### What it creates:
95
-
96
- 1. **config/initializers/vectra.rb** - Configuration with smart defaults
97
- 2. **db/migrate/XXX_enable_pgvector_extension.rb** - Enables pgvector (if using PostgreSQL)
98
- 3. **Setup instructions** - Provider-specific next steps
99
-
100
- ### Options
101
-
102
- ```bash
103
- --provider=NAME # pinecone, pgvector, qdrant, weaviate (default: pgvector)
104
- --database-url=URL # PostgreSQL connection URL (for pgvector)
105
- --api-key=KEY # API key for the provider
106
- --instrumentation # Enable instrumentation
107
- ```
108
-
109
- ### Example
110
-
111
- ```bash
112
- rails g vectra:install --provider=pgvector --instrumentation=true
113
- rails db:migrate
114
-
115
- # Creates initializer with:
116
- # - Connection pooling (10 connections)
117
- # - Batch operations (100 vectors/batch)
118
- # - Instrumentation enabled
119
- # - Logging to Rails.logger
120
- ```
121
-
122
- ---
123
-
124
- ## 💎 ActiveRecord Integration
125
-
126
- Add vector search to any Rails model.
127
-
128
- ### Quick Start
129
-
130
- ```ruby
131
- class Document < ApplicationRecord
132
- include Vectra::ActiveRecord
133
-
134
- has_vector :embedding,
135
- dimension: 384,
136
- provider: :pgvector,
137
- index: 'documents',
138
- auto_index: true,
139
- metadata_fields: [:title, :category, :status]
140
-
141
- # Generate embeddings (use OpenAI, Cohere, etc.)
142
- before_validation :generate_embedding, if: :content_changed?
143
-
144
- def generate_embedding
145
- self.embedding = OpenAI::Client.new.embeddings(
146
- parameters: { model: 'text-embedding-3-small', input: content }
147
- ).dig('data', 0, 'embedding')
148
- end
149
- end
150
- ```
151
-
152
- ### Usage
153
-
154
- ```ruby
155
- # Create (automatically indexed)
156
- doc = Document.create!(
157
- title: 'Getting Started',
158
- content: 'Learn how to...',
159
- category: 'tutorial'
160
- )
161
-
162
- # Vector search
163
- query_vector = generate_embedding('how to get started')
164
- results = Document.vector_search(query_vector, limit: 10)
165
-
166
- results.each do |doc|
167
- puts "#{doc.title} - Score: #{doc.vector_score}"
168
- end
169
-
170
- # Find similar documents
171
- similar = doc.similar(limit: 5)
172
-
173
- # Search with filters
174
- results = Document.vector_search(
175
- query_vector,
176
- limit: 10,
177
- filter: { category: 'tutorial', status: 'published' }
178
- )
179
-
180
- # Manual control
181
- doc.index_vector! # Force indexing
182
- doc.delete_vector! # Remove from index
183
- ```
184
-
185
- ### Features
186
-
187
- - ✅ **Auto-indexing** - On create/update
188
- - ✅ **Auto-deletion** - On destroy
189
- - ✅ **Metadata sync** - Specified fields included in vector metadata
190
- - ✅ **AR object loading** - Search returns ActiveRecord objects, not just vectors
191
- - ✅ **Score access** - `doc.vector_score` available on results
192
- - ✅ **Background jobs** - Disable auto-index for async processing
193
-
194
- ### Background Indexing
195
-
196
- ```ruby
197
- class Document < ApplicationRecord
198
- include Vectra::ActiveRecord
199
-
200
- has_vector :embedding, dimension: 384, auto_index: false
201
-
202
- after_commit :index_async, on: [:create, :update]
203
-
204
- def index_async
205
- IndexVectorJob.perform_later(id)
206
- end
207
- end
208
-
209
- # Job
210
- class IndexVectorJob < ApplicationJob
211
- def perform(document_id)
212
- doc = Document.find(document_id)
213
- doc.index_vector!
214
- end
215
- end
216
- ```
217
-
218
- ---
219
-
220
- ## 🔄 Automatic Retry Logic
221
-
222
- Resilience for transient database errors.
223
-
224
- ### What it does
225
-
226
- Automatically retries operations on:
227
- - `PG::ConnectionBad`
228
- - `PG::UnableToSend`
229
- - `PG::TooManyConnections`
230
- - `PG::SerializationFailure`
231
- - `PG::DeadlockDetected`
232
- - `ConnectionPool::TimeoutError`
233
- - Any error with "timeout" or "connection" in message
234
-
235
- ### Configuration
236
-
237
- ```ruby
238
- Vectra.configure do |config|
239
- config.max_retries = 3 # Default: 3
240
- config.retry_delay = 1.0 # Initial delay: 1 second
241
- end
242
- ```
243
-
244
- ### How it works
245
-
246
- - **Exponential backoff**: 1s, 2s, 4s, 8s, ...
247
- - **Jitter**: ±25% randomness to prevent thundering herd
248
- - **Max delay**: Capped at 30 seconds
249
- - **Logging**: Each retry logged with delay and reason
250
-
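The delay schedule described above can be sketched as follows. This is a minimal illustration, not Vectra's actual implementation: the method name `retry_delay_for` and its parameters are hypothetical, and applying the 30-second cap before the jitter is an assumption.

```ruby
# Delay in seconds before retry number `attempt` (0-based): exponential
# growth, capped at max_delay, then jittered by +/-25%.
# `rand_value` (0.0..1.0) is injectable so the result is reproducible.
def retry_delay_for(attempt, base: 1.0, max_delay: 30.0, jitter: 0.25, rand_value: rand)
  capped = [base * (2**attempt), max_delay].min
  # Map rand_value onto a factor between 1 - jitter and 1 + jitter.
  factor = 1.0 + jitter * (2.0 * rand_value - 1.0)
  capped * factor
end

# rand_value 0.5 gives a factor of exactly 1.0, i.e. no jitter.
(0..5).each { |n| puts retry_delay_for(n, rand_value: 0.5) }
```

With the jitter neutralized, the loop prints 1.0, 2.0, 4.0, 8.0, 16.0, and 30.0, showing the doubling and the cap. With the default random input, each value varies by up to a quarter in either direction, so concurrent clients are less likely to retry in lockstep.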
251
- ### Already integrated
252
-
253
- Retry logic is automatically used in:
254
- - pgvector `execute()` method
255
- - All database operations
256
-
257
- No code changes needed - it just works!
258
-
259
- ---
260
-
261
- ## ⚡ Performance Benchmarks
262
-
263
- Measure and optimize your setup.
264
-
265
- ### Run benchmarks
266
-
267
- ```bash
268
- # Batch operations
269
- DATABASE_URL=postgres://localhost/vectra_bench \
270
- ruby benchmarks/batch_operations_benchmark.rb
271
-
272
- # Connection pooling
273
- DATABASE_URL=postgres://localhost/vectra_bench \
274
- ruby benchmarks/connection_pooling_benchmark.rb
275
- ```
276
-
277
- ### What they measure
278
-
279
- **Batch Operations:**
280
- - Vector counts: 100, 500, 1K, 5K, 10K
281
- - Batch sizes: 50, 100, 250, 500
282
- - Metrics: avg time, vectors/sec, batch count
283
-
284
- **Connection Pooling:**
285
- - Pool sizes: 5, 10, 20
286
- - Thread counts: 1, 2, 5, 10, 20
287
- - Metrics: total time, ops/sec, pool availability
288
-
289
- ### Example output
290
-
291
- ```
292
- 1000 vectors:
293
- Batch size 50: 2.134s avg (468 vectors/sec, 20 batches)
294
- Batch size 100: 1.892s avg (529 vectors/sec, 10 batches)
295
- Batch size 250: 1.645s avg (608 vectors/sec, 4 batches)
296
- Batch size 500: 1.523s avg (657 vectors/sec, 2 batches)
297
-
298
- Pool Size: 10
299
- 1 threads: 5.21s total (9.6 ops/sec) Pool: 9/10 available
300
- 5 threads: 5.34s total (46.8 ops/sec) Pool: 5/10 available
301
- 10 threads: 5.67s total (88.2 ops/sec) Pool: 0/10 available
302
- ```
303
-
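The batch counts in the example output follow from splitting the input into fixed-size slices. The sketch below reproduces them with `Enumerable#each_slice`; the `in_batches` helper and the vector hashes are hypothetical and say nothing about how Vectra batches internally.

```ruby
# Split vectors into batches of at most batch_size items.
def in_batches(vectors, batch_size)
  vectors.each_slice(batch_size).to_a
end

# Placeholder vectors; the shape is illustrative only.
vectors = Array.new(1000) { |i| { id: "vec-#{i}", values: [0.0] * 3 } }

[50, 100, 250, 500].each do |size|
  puts "Batch size #{size}: #{in_batches(vectors, size).length} batches"
end
```

For 1000 vectors this yields 20, 10, 4, and 2 batches, matching the counts shown above. Larger batches mean fewer round trips, which is consistent with the higher throughput reported for sizes 250 and 500.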
304
- ### Recommendations
305
-
306
- Based on benchmarks:
307
- - **Batch size**: 250-500 for best performance
308
- - **Pool size**: Match your app server thread count
309
- - **Monitoring**: Use instrumentation to track in production
310
-
311
- ---
312
-
313
- ## 📚 Documentation
314
-
315
- New comprehensive guides:
316
-
317
- - **USAGE_EXAMPLES.md** - 10 practical examples
318
- - E-commerce semantic search
319
- - RAG chatbot
320
- - Duplicate detection
321
- - Rails integration
322
- - Performance tips
323
-
324
- - **IMPLEMENTATION_GUIDE.md** - Developer guide
325
- - Feature implementation
326
- - Testing strategies
327
- - Customization examples
328
- - Troubleshooting
329
-
330
- ---
331
-
332
- ## 🚀 Migration Guide
333
-
334
- ### From v0.1.x to v0.2.0
335
-
336
- **No breaking changes!** v0.2.0 is fully backward compatible.
337
-
338
- ### New configuration options
339
-
340
- ```ruby
341
- Vectra.configure do |config|
342
- # Existing options still work
343
- config.provider = :pgvector
344
- config.host = 'postgres://...'
345
-
346
- # NEW: Instrumentation (optional)
347
- config.instrumentation = false # default
348
-
349
- # NEW: Already had these, now documented
350
- config.pool_size = 5 # Connection pool size
351
- config.pool_timeout = 5 # Seconds to wait for connection
352
- config.batch_size = 100 # Vectors per batch
353
- config.max_retries = 3 # Retry attempts
354
- config.retry_delay = 1 # Initial retry delay
355
- end
356
- ```
357
-
358
- ### Enabling new features
359
-
360
- ```ruby
361
- # 1. Enable instrumentation
362
- Vectra.configure { |c| c.instrumentation = true }
363
-
364
- # 2. Add handler (optional)
365
- Vectra.on_operation { |event| puts event.operation }
366
-
367
- # 3. Use ActiveRecord integration (opt-in)
368
- class Document < ApplicationRecord
369
- include Vectra::ActiveRecord
370
- has_vector :embedding, dimension: 384
371
- end
372
- ```
373
-
374
- ### No action required
375
-
376
- These features work automatically:
377
- - ✅ Retry logic (already integrated)
378
- - ✅ Performance improvements (transparent)
379
-
380
- ---
381
-
382
- ## 📊 Performance Improvements
383
-
384
- ### Before v0.2.0
385
-
386
- ```ruby
387
- # 10,000 individual operations
388
- 10_000.times { client.upsert(index: 'docs', vectors: [vec]) }
389
- # ~50 seconds
390
- ```
391
-
392
- ### After v0.2.0
393
-
394
- ```ruby
395
- # Batch operations (automatic)
396
- client.upsert(index: 'docs', vectors: 10_000_vectors)
397
- # ~5 seconds (10x faster!)
398
- ```
399
-
400
- ### Connection Pooling
401
-
402
- ```ruby
403
- # Before: New connection per request
404
- # After: Reuse from pool (5-10x faster in multi-threaded apps)
405
-
406
- Vectra.configure do |config|
407
- config.pool_size = 20 # Match your app threads
408
- end
409
- ```
410
-
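To make the reuse idea concrete, here is a minimal pool built on Ruby's thread-safe `Queue`. `TinyPool` is a hypothetical teaching sketch and not the pool Vectra ships; strings stand in for real database connections.

```ruby
# Minimal fixed-size pool: connections are checked out of a thread-safe
# queue and always returned, so at most `size` exist at any time.
class TinyPool
  def initialize(size:, &factory)
    @queue = Queue.new
    size.times { @queue << factory.call }
  end

  # Blocks until a connection is free, yields it, then returns it.
  def with
    conn = @queue.pop
    yield conn
  ensure
    @queue << conn if conn
  end

  def available
    @queue.size
  end
end

created = 0
pool = TinyPool.new(size: 2) do
  created += 1
  "conn-#{created}" # stand-in for opening a real connection
end

# Eight threads share the two pre-built connections.
threads = 8.times.map do
  Thread.new { pool.with { |conn| conn.length } }
end
threads.each(&:join)

puts "connections created: #{created}, available: #{pool.available}"
```

Eight threads complete their work while only two connections are ever created, which is the saving the "Before/After" comment refers to. A pool smaller than the thread count makes threads wait on `pop`, which is why the notes suggest matching the pool size to the app's threads.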
411
- ---
412
-
413
- ## 🎯 Next Steps
414
-
415
- 1. **Update your gem:**
416
- ```bash
417
- bundle update vectra
418
- ```
419
-
420
- 2. **Enable instrumentation:**
421
- ```ruby
422
- Vectra.configure { |c| c.instrumentation = true }
423
- ```
424
-
425
- 3. **Try ActiveRecord integration:**
426
- ```bash
427
- rails g vectra:install --provider=pgvector
428
- ```
429
-
430
- 4. **Run benchmarks:**
431
- ```bash
432
- ruby benchmarks/batch_operations_benchmark.rb
433
- ```
434
-
435
- 5. **Read examples:**
436
- - See USAGE_EXAMPLES.md for 10 practical examples
437
- - See IMPLEMENTATION_GUIDE.md for detailed docs
438
-
439
- ---
440
-
441
- ## 🤝 Contributing
442
-
443
- We welcome contributions! See:
444
- - CONTRIBUTING.md - Contribution guide
445
- - IMPLEMENTATION_GUIDE.md - Feature implementation guide
446
-
447
- ---
448
-
449
- ## 📝 Changelog
450
-
451
- See CHANGELOG.md for complete version history.
452
-
453
- ---
454
-
455
- ## ❓ Questions?
456
-
457
- - GitHub Issues: https://github.com/stokry/vectra/issues
458
- - Documentation: See README.md, USAGE_EXAMPLES.md
459
- - Examples: See examples/ directory