vectra-client 0.2.1 → 0.3.0

Files changed (55)
  1. checksums.yaml +4 -4
  2. data/.rubocop.yml +77 -37
  3. data/CHANGELOG.md +49 -6
  4. data/README.md +52 -393
  5. data/docs/Gemfile +9 -0
  6. data/docs/_config.yml +37 -0
  7. data/docs/_layouts/default.html +14 -0
  8. data/docs/_layouts/home.html +187 -0
  9. data/docs/_layouts/page.html +82 -0
  10. data/docs/_site/api/overview/index.html +145 -0
  11. data/docs/_site/assets/main.css +649 -0
  12. data/docs/_site/assets/main.css.map +1 -0
  13. data/docs/_site/assets/minima-social-icons.svg +33 -0
  14. data/docs/_site/assets/style.css +295 -0
  15. data/docs/_site/community/contributing/index.html +110 -0
  16. data/docs/_site/examples/basic-usage/index.html +117 -0
  17. data/docs/_site/examples/index.html +58 -0
  18. data/docs/_site/feed.xml +1 -0
  19. data/docs/_site/guides/getting-started/index.html +106 -0
  20. data/docs/_site/guides/installation/index.html +82 -0
  21. data/docs/_site/index.html +92 -0
  22. data/docs/_site/providers/index.html +119 -0
  23. data/docs/_site/providers/pgvector/index.html +155 -0
  24. data/docs/_site/providers/pinecone/index.html +121 -0
  25. data/docs/_site/providers/qdrant/index.html +124 -0
  26. data/docs/_site/providers/weaviate/index.html +123 -0
  27. data/docs/_site/robots.txt +1 -0
  28. data/docs/_site/sitemap.xml +39 -0
  29. data/docs/api/overview.md +126 -0
  30. data/docs/assets/style.css +927 -0
  31. data/docs/community/contributing.md +89 -0
  32. data/docs/examples/basic-usage.md +102 -0
  33. data/docs/examples/index.md +54 -0
  34. data/docs/guides/getting-started.md +90 -0
  35. data/docs/guides/installation.md +67 -0
  36. data/docs/guides/performance.md +200 -0
  37. data/docs/index.md +37 -0
  38. data/docs/providers/index.md +81 -0
  39. data/docs/providers/pgvector.md +95 -0
  40. data/docs/providers/pinecone.md +72 -0
  41. data/docs/providers/qdrant.md +73 -0
  42. data/docs/providers/weaviate.md +72 -0
  43. data/lib/vectra/batch.rb +148 -0
  44. data/lib/vectra/cache.rb +261 -0
  45. data/lib/vectra/configuration.rb +6 -1
  46. data/lib/vectra/pool.rb +256 -0
  47. data/lib/vectra/streaming.rb +153 -0
  48. data/lib/vectra/version.rb +1 -1
  49. data/lib/vectra.rb +4 -0
  50. data/netlify.toml +12 -0
  51. metadata +58 -5
  52. data/IMPLEMENTATION_GUIDE.md +0 -686
  53. data/NEW_FEATURES_v0.2.0.md +0 -459
  54. data/RELEASE_CHECKLIST_v0.2.0.md +0 -383
  55. data/USAGE_EXAMPLES.md +0 -787
data/README.md CHANGED
@@ -1,462 +1,121 @@
1
1
  # Vectra
2
2
 
3
- [![Gem Version](https://badge.fury.io/rb/vectra.svg)](https://badge.fury.io/rb/vectra)
3
+ [![Gem Version](https://badge.fury.io/rb/vectra-client.svg)](https://rubygems.org/gems/vectra-client)
4
4
  [![CI](https://github.com/stokry/vectra/actions/workflows/ci.yml/badge.svg)](https://github.com/stokry/vectra/actions)
5
5
  [![codecov](https://codecov.io/gh/stokry/vectra/branch/main/graph/badge.svg)](https://codecov.io/gh/stokry/vectra)
6
- [![Ruby Style Guide](https://img.shields.io/badge/code_style-rubocop-brightgreen.svg)](https://github.com/rubocop/rubocop)
7
6
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
8
- [![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-2.0-4baaaa.svg)](CODE_OF_CONDUCT.md)
9
7
 
10
- **Vectra** is a unified Ruby client for vector databases. Write once, switch providers seamlessly.
8
+ **A unified Ruby client for vector databases.** Write once, switch providers seamlessly.
11
9
 
12
- ## Features
13
-
14
- - 🔌 **Unified API** - One interface for multiple vector databases
15
- - 🚀 **Modern Ruby** - Built for Ruby 3.2+ with modern patterns
16
- - 🔄 **Automatic Retries** - Built-in retry logic with exponential backoff
17
- - 📊 **Rich Results** - Enumerable query results with filtering capabilities
18
- - 🛡️ **Type Safety** - Comprehensive validation and meaningful errors
19
- - 📝 **Well Documented** - Extensive YARD documentation
10
+ 📖 **Documentation:** [vectra-docs.netlify.app](https://vectra-docs.netlify.app/)
20
11
 
21
12
  ## Supported Providers
22
13
 
23
- | Provider | Status | Version |
24
- |----------|--------|---------|
25
- | [Pinecone](https://pinecone.io) | Fully Supported | v0.1.0 |
26
- | [PostgreSQL + pgvector](https://github.com/pgvector/pgvector) | ✅ Fully Supported | v0.1.1 |
27
- | [Qdrant](https://qdrant.tech) | Fully Supported | v0.2.1 |
28
- | [Weaviate](https://weaviate.io) | 🚧 Planned | v0.3.0 |
14
+ | Provider | Type | Status |
15
+ |----------|------|--------|
16
+ | **Pinecone** | Managed Cloud | Supported |
17
+ | **Qdrant** | Open Source | ✅ Supported |
18
+ | **Weaviate** | Open Source | Supported |
19
+ | **pgvector** | PostgreSQL | Supported |
29
20
 
30
21
  ## Installation
31
22
 
32
- Add this line to your application's Gemfile:
33
-
34
23
  ```ruby
35
- gem 'vectra'
24
+ gem 'vectra-client'
36
25
  ```
37
26
 
38
- And then execute:
39
-
40
27
  ```bash
41
28
  bundle install
42
29
  ```
43
30
 
44
- Or install it yourself:
45
-
46
- ```bash
47
- gem install vectra
48
- ```
49
-
50
- ### Provider-Specific Dependencies
51
-
52
- For **pgvector** support, add the `pg` gem:
53
-
54
- ```ruby
55
- gem 'pg', '~> 1.5'
56
- ```
57
-
58
31
  ## Quick Start
59
32
 
60
- ### Configuration
61
-
62
33
  ```ruby
63
34
  require 'vectra'
64
35
 
65
- # Global configuration
66
- Vectra.configure do |config|
67
- config.provider = :pinecone
68
- config.api_key = ENV['PINECONE_API_KEY']
69
- config.environment = 'us-east-1' # or config.host = 'your-index-host.pinecone.io'
70
- end
71
-
72
- # Create a client
73
- client = Vectra::Client.new
74
- ```
75
-
76
- Or use per-client configuration:
77
-
78
- ```ruby
79
- # Shortcut for Pinecone
80
- client = Vectra.pinecone(
81
- api_key: ENV['PINECONE_API_KEY'],
82
- environment: 'us-east-1'
83
- )
84
-
85
- # Shortcut for pgvector (PostgreSQL)
86
- client = Vectra.pgvector(
87
- connection_url: 'postgres://user:password@localhost/mydb'
88
- )
89
-
90
- # Shortcut for Qdrant
91
- client = Vectra.qdrant(
92
- host: 'http://localhost:6333', # Local Qdrant
93
- api_key: ENV['QDRANT_API_KEY'] # Optional for local instances
94
- )
95
-
96
- # Generic client with options
36
+ # Initialize client (works with any provider)
97
37
  client = Vectra::Client.new(
98
38
  provider: :pinecone,
99
39
  api_key: ENV['PINECONE_API_KEY'],
100
- environment: 'us-east-1',
101
- timeout: 60,
102
- max_retries: 5
40
+ environment: 'us-west-4'
103
41
  )
104
- ```
105
42
 
106
- ### Basic Operations
107
-
108
- #### Upsert Vectors
109
-
110
- ```ruby
43
+ # Upsert vectors
111
44
  client.upsert(
112
- index: 'my-index',
113
45
  vectors: [
114
- { id: 'vec1', values: [0.1, 0.2, 0.3], metadata: { text: 'Hello world' } },
115
- { id: 'vec2', values: [0.4, 0.5, 0.6], metadata: { text: 'Ruby is great' } }
46
+ { id: 'doc-1', values: [0.1, 0.2, 0.3], metadata: { title: 'Hello' } },
47
+ { id: 'doc-2', values: [0.4, 0.5, 0.6], metadata: { title: 'World' } }
116
48
  ]
117
49
  )
118
- # => { upserted_count: 2 }
119
- ```
120
-
121
- #### Query Vectors
122
-
123
- ```ruby
124
- results = client.query(
125
- index: 'my-index',
126
- vector: [0.1, 0.2, 0.3],
127
- top_k: 5,
128
- include_metadata: true
129
- )
130
-
131
- # Iterate over results
132
- results.each do |match|
133
- puts "ID: #{match.id}, Score: #{match.score}"
134
- puts "Metadata: #{match.metadata}"
135
- end
136
-
137
- # Access specific results
138
- results.first # First match
139
- results.ids # All matching IDs
140
- results.scores # All scores
141
- results.max_score # Highest score
142
50
 
143
- # Filter by score
144
- high_quality = results.above_score(0.8)
145
- ```
146
-
147
- #### Query with Filters
51
+ # Search
52
+ results = client.query(vector: [0.1, 0.2, 0.3], top_k: 5)
53
+ results.each { |match| puts "#{match.id}: #{match.score}" }
148
54
 
149
- ```ruby
150
- results = client.query(
151
- index: 'my-index',
152
- vector: [0.1, 0.2, 0.3],
153
- top_k: 10,
154
- filter: { category: 'programming', language: 'ruby' }
155
- )
55
+ # Delete
56
+ client.delete(ids: ['doc-1', 'doc-2'])
156
57
  ```
157
58
 
158
- #### Fetch Vectors by ID
59
+ ## Provider Examples
159
60
 
160
61
  ```ruby
161
- vectors = client.fetch(index: 'my-index', ids: ['vec1', 'vec2'])
62
+ # Pinecone
63
+ client = Vectra.pinecone(api_key: ENV['PINECONE_API_KEY'], environment: 'us-west-4')
162
64
 
163
- vectors['vec1'].values # [0.1, 0.2, 0.3]
164
- vectors['vec1'].metadata # { 'text' => 'Hello world' }
165
- ```
65
+ # Qdrant (local)
66
+ client = Vectra.qdrant(host: 'http://localhost:6333')
166
67
 
167
- #### Update Vector Metadata
68
+ # Qdrant (cloud)
69
+ client = Vectra.qdrant(host: 'https://your-cluster.qdrant.io', api_key: ENV['QDRANT_API_KEY'])
168
70
 
169
- ```ruby
170
- client.update(
171
- index: 'my-index',
172
- id: 'vec1',
173
- metadata: { category: 'updated', processed: true }
174
- )
175
- ```
71
+ # Weaviate
72
+ client = Vectra.weaviate(host: 'http://localhost:8080', api_key: ENV['WEAVIATE_API_KEY'])
176
73
 
177
- #### Delete Vectors
178
-
179
- ```ruby
180
- # Delete by IDs
181
- client.delete(index: 'my-index', ids: ['vec1', 'vec2'])
182
-
183
- # Delete by filter
184
- client.delete(index: 'my-index', filter: { category: 'old' })
185
-
186
- # Delete all (use with caution!)
187
- client.delete(index: 'my-index', delete_all: true)
188
- ```
189
-
190
- ### Working with Vectors
191
-
192
- ```ruby
193
- # Create a Vector object
194
- vector = Vectra::Vector.new(
195
- id: 'my-vector',
196
- values: [0.1, 0.2, 0.3],
197
- metadata: { text: 'Example' }
198
- )
199
-
200
- vector.dimension # => 3
201
- vector.metadata? # => true
202
- vector.to_h # Convert to hash
203
-
204
- # Calculate similarity
205
- other = Vectra::Vector.new(id: 'other', values: [0.1, 0.2, 0.3])
206
- vector.cosine_similarity(other) # => 1.0 (identical)
207
- vector.euclidean_distance(other) # => 0.0
208
- ```
209
-
210
- ### Index Management
211
-
212
- ```ruby
213
- # List all indexes
214
- indexes = client.list_indexes
215
- indexes.each { |idx| puts idx[:name] }
216
-
217
- # Describe an index
218
- info = client.describe_index(index: 'my-index')
219
- puts info[:dimension] # => 384
220
- puts info[:metric] # => "cosine"
221
-
222
- # Get index statistics
223
- stats = client.stats(index: 'my-index')
224
- puts stats[:total_vector_count]
74
+ # pgvector (PostgreSQL)
75
+ client = Vectra.pgvector(connection_url: 'postgres://user:pass@localhost/mydb')
225
76
  ```
226
77
 
227
- ### Namespaces
228
-
229
- Namespaces allow you to partition vectors within an index:
230
-
231
- ```ruby
232
- # Upsert to a namespace
233
- client.upsert(
234
- index: 'my-index',
235
- namespace: 'production',
236
- vectors: [...]
237
- )
238
-
239
- # Query within a namespace
240
- client.query(
241
- index: 'my-index',
242
- namespace: 'production',
243
- vector: [0.1, 0.2, 0.3],
244
- top_k: 5
245
- )
246
- ```
247
-
248
- ### pgvector (PostgreSQL)
249
-
250
- pgvector uses PostgreSQL tables as indexes. Each "index" is a table with a vector column.
251
-
252
- #### Setup PostgreSQL with pgvector
253
-
254
- ```bash
255
- # Using Docker
256
- docker run -d --name pgvector \
257
- -e POSTGRES_PASSWORD=password \
258
- -p 5432:5432 \
259
- pgvector/pgvector:pg16
260
- ```
261
-
262
- #### Create an Index (Table)
263
-
264
- ```ruby
265
- client = Vectra.pgvector(connection_url: 'postgres://postgres:password@localhost/postgres')
266
-
267
- # Create a new index with cosine similarity
268
- client.provider.create_index(
269
- name: 'documents',
270
- dimension: 384,
271
- metric: 'cosine' # or 'euclidean', 'inner_product'
272
- )
273
- ```
274
-
275
- #### Supported Metrics
276
-
277
- | Metric | Description | pgvector Operator |
278
- |--------|-------------|-------------------|
279
- | `cosine` | Cosine similarity (default) | `<=>` |
280
- | `euclidean` | Euclidean distance | `<->` |
281
- | `inner_product` | Inner product / dot product | `<#>` |
282
-
283
- #### Table Structure
284
-
285
- Vectra creates tables with the following structure:
286
-
287
- ```sql
288
- CREATE TABLE documents (
289
- id TEXT PRIMARY KEY,
290
- embedding vector(384),
291
- metadata JSONB DEFAULT '{}',
292
- namespace TEXT DEFAULT '',
293
- created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
294
- );
295
-
296
- -- IVFFlat index for fast similarity search
297
- CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops);
298
- ```
299
-
300
- ## Configuration Options
301
-
302
- | Option | Description | Default |
303
- |--------|-------------|---------|
304
- | `provider` | Vector database provider (`:pinecone`, `:pgvector`, `:qdrant`, `:weaviate`) | Required |
305
- | `api_key` | API key for authentication (password for pgvector) | Required* |
306
- | `environment` | Environment/region (Pinecone) | - |
307
- | `host` | Direct host URL or PostgreSQL connection URL | - |
308
- | `timeout` | Request timeout in seconds | 30 |
309
- | `open_timeout` | Connection timeout in seconds | 10 |
310
- | `max_retries` | Maximum retry attempts | 3 |
311
- | `retry_delay` | Initial retry delay in seconds | 1 |
312
- | `logger` | Logger instance for debugging | nil |
313
-
314
- *For pgvector, `api_key` is used as the PostgreSQL password.
315
-
316
- ## Error Handling
317
-
318
- Vectra provides specific error classes for different failure scenarios:
319
-
320
- ```ruby
321
- begin
322
- client.query(index: 'my-index', vector: [0.1, 0.2], top_k: 5)
323
- rescue Vectra::AuthenticationError => e
324
- puts "Authentication failed: #{e.message}"
325
- rescue Vectra::RateLimitError => e
326
- puts "Rate limited. Retry after #{e.retry_after} seconds"
327
- rescue Vectra::NotFoundError => e
328
- puts "Resource not found: #{e.message}"
329
- rescue Vectra::ValidationError => e
330
- puts "Invalid request: #{e.message}"
331
- rescue Vectra::ServerError => e
332
- puts "Server error (#{e.status_code}): #{e.message}"
333
- rescue Vectra::Error => e
334
- puts "General error: #{e.message}"
335
- end
336
- ```
78
+ ## Features
337
79
 
338
- ## Logging
80
+ - **Provider Agnostic** - Switch providers with one line change
81
+ - **Production Ready** - Ruby 3.2+, 95%+ test coverage
82
+ - **Resilient** - Retry logic with exponential backoff
83
+ - **Observable** - Datadog & New Relic instrumentation
84
+ - **Rails Ready** - ActiveRecord integration with `has_vector` DSL
339
85
 
340
- Enable debug logging to see request details:
86
+ ## Rails Integration
341
87
 
342
88
  ```ruby
343
- require 'logger'
89
+ class Document < ApplicationRecord
90
+ include Vectra::ActiveRecord
344
91
 
345
- Vectra.configure do |config|
346
- config.provider = :pinecone
347
- config.api_key = ENV['PINECONE_API_KEY']
348
- config.environment = 'us-east-1'
349
- config.logger = Logger.new($stdout)
92
+ has_vector :embedding,
93
+ provider: :qdrant,
94
+ index: 'documents',
95
+ dimension: 1536
350
96
  end
351
- ```
352
-
353
- ## Best Practices
354
-
355
- ### Batch Upserts
356
-
357
- For large datasets, batch your upserts:
358
-
359
- ```ruby
360
- vectors = large_dataset.each_slice(100).map do |batch|
361
- client.upsert(index: 'my-index', vectors: batch)
362
- end
363
- ```
364
-
365
- ### Connection Reuse
366
97
 
367
- Create a single client instance and reuse it:
98
+ # Auto-indexes on save
99
+ doc = Document.create!(title: 'Hello', embedding: [0.1, 0.2, ...])
368
100
 
369
- ```ruby
370
- # Good: Reuse the client
371
- client = Vectra::Client.new(...)
372
- client.query(...)
373
- client.upsert(...)
374
-
375
- # Avoid: Creating new clients for each operation
376
- Vectra::Client.new(...).query(...)
377
- Vectra::Client.new(...).upsert(...)
378
- ```
379
-
380
- ### Error Recovery
381
-
382
- Implement retry logic for transient failures:
383
-
384
- ```ruby
385
- def query_with_retry(client, **params, retries: 3)
386
- client.query(**params)
387
- rescue Vectra::RateLimitError => e
388
- if retries > 0
389
- sleep(e.retry_after || 1)
390
- retry(retries: retries - 1)
391
- else
392
- raise
393
- end
394
- end
101
+ # Search
102
+ Document.vector_search(embedding: query_vector, limit: 10)
395
103
  ```
396
104
 
397
105
  ## Development
398
106
 
399
- After checking out the repo:
400
-
401
107
  ```bash
402
- # Install dependencies
108
+ git clone https://github.com/stokry/vectra.git
109
+ cd vectra
403
110
  bundle install
404
-
405
- # Run tests
406
111
  bundle exec rspec
407
-
408
- # Run linter
409
112
  bundle exec rubocop
410
-
411
- # Generate documentation
412
- bundle exec rake docs
413
113
  ```
414
114
 
415
- ## Roadmap
416
-
417
- ### v0.1.0
418
- - ✅ Pinecone provider
419
- - ✅ Basic CRUD operations
420
- - ✅ Configuration system
421
- - ✅ Error handling with retries
422
- - ✅ Comprehensive tests
423
-
424
- ### v0.1.1 (Current)
425
- - ✅ pgvector (PostgreSQL) provider
426
- - ✅ Multiple similarity metrics (cosine, euclidean, inner product)
427
- - ✅ Namespace support for pgvector
428
- - ✅ IVFFlat index creation
429
-
430
- ### v0.2.1
431
- - ✅ Qdrant provider (fully implemented)
432
- - ✅ Enhanced error handling
433
- - ✅ Improved retry middleware
434
-
435
- ### v0.3.0
436
- - 🚧 Weaviate provider
437
- - 🚧 Batch operations
438
- - 🚧 Performance optimizations
439
-
440
- ### v1.0.0
441
- - 🚧 Rails integration
442
- - 🚧 ActiveRecord-like DSL
443
- - 🚧 Background job support
444
- - 🚧 Full documentation
445
-
446
115
  ## Contributing
447
116
 
448
- Bug reports and pull requests are welcome on GitHub at https://github.com/stokry/vectra.
449
-
450
- 1. Fork it
451
- 2. Create your feature branch (`git checkout -b feature/my-new-feature`)
452
- 3. Commit your changes (`git commit -am 'Add some feature'`)
453
- 4. Push to the branch (`git push origin feature/my-new-feature`)
454
- 5. Create a new Pull Request
117
+ Bug reports and pull requests welcome at [github.com/stokry/vectra](https://github.com/stokry/vectra).
455
118
 
456
119
  ## License
457
120
 
458
- The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
459
-
460
- ## Acknowledgments
461
-
462
- Inspired by the simplicity of Ruby database gems and the need for a unified vector database interface.
121
+ MIT License - see [LICENSE](LICENSE) file.
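
Note on the README change: the new Features list keeps "Retry logic with exponential backoff" while the hand-rolled `query_with_retry` example is removed. The technique itself can be sketched as follows. This is only an illustration, not Vectra's internal implementation: `TransientError`, `with_backoff`, and the `sleeper` parameter are hypothetical names introduced here and are not part of the gem.

```ruby
# Hypothetical error class standing in for a transient failure
# (for example a rate limit); NOT part of the vectra-client gem.
class TransientError < StandardError; end

# Hypothetical helper: runs the block and retries on TransientError,
# doubling the delay after each failed attempt (base, 2*base, 4*base, ...).
# `sleeper` is injectable so the delays can be inspected without waiting.
def with_backoff(max_retries: 3, base_delay: 1.0, sleeper: ->(s) { sleep(s) })
  attempt = 0
  begin
    yield attempt
  rescue TransientError
    # Give up and re-raise once the retry budget is spent.
    raise if attempt >= max_retries

    sleeper.call(base_delay * (2**attempt))
    attempt += 1
    retry # re-runs the begin block; takes no arguments
  end
end
```

A caller would wrap any operation that may fail transiently, for instance `with_backoff { client.query(vector: [0.1, 0.2, 0.3], top_k: 5) }`, rescuing whichever error class the real client raises in place of the placeholder `TransientError`.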
data/docs/Gemfile ADDED
@@ -0,0 +1,9 @@
1
+ # frozen_string_literal: true
2
+
3
+ source "https://rubygems.org"
4
+
5
+ gem "jekyll", "~> 4.4"
6
+ gem "jekyll-feed", "~> 0.12"
7
+ gem "jekyll-seo-tag"
8
+ gem "jekyll-sitemap"
9
+ gem "webrick", "~> 1.8"
data/docs/_config.yml ADDED
@@ -0,0 +1,37 @@
1
+ title: Vectra Documentation
2
+ description: A unified Ruby client for vector databases. Build AI-powered search, RAG applications, and recommendation systems.
3
+ url: "https://vectra-docs.netlify.app"
4
+ baseurl: ""
5
+
6
+ plugins:
7
+ - jekyll-feed
8
+ - jekyll-sitemap
9
+ - jekyll-seo-tag
10
+
11
+ markdown: kramdown
12
+ highlighter: rouge
13
+
14
+ kramdown:
15
+ syntax_highlighter: rouge
16
+ syntax_highlighter_opts:
17
+ block:
18
+ line_numbers: false
19
+
20
+ exclude:
21
+ - Gemfile
22
+ - Gemfile.lock
23
+ - node_modules
24
+ - vendor/bundle/
25
+ - vendor/cache/
26
+ - vendor/gems/
27
+ - vendor/ruby/
28
+ - _site/
29
+
30
+ permalink: /:categories/:title/
31
+
32
+ defaults:
33
+ - scope:
34
+ path: ""
35
+ type: "pages"
36
+ values:
37
+ layout: "page"
data/docs/_layouts/default.html ADDED
@@ -0,0 +1,14 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="utf-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1">
6
+ <meta name="description" content="{{ page.description | default: site.description }}">
7
+ <title>{{ page.title }} - {{ site.title }}</title>
8
+ <link rel="stylesheet" href="{{ site.baseurl }}/assets/style.css">
9
+ <link rel="icon" type="image/svg+xml" href="data:image/svg+xml,<svg xmlns=%22http://www.w3.org/2000/svg%22 viewBox=%220 0 100 100%22><text y=%22.9em%22 font-size=%2290%22>◆</text></svg>">
10
+ </head>
11
+ <body>
12
+ {{ content }}
13
+ </body>
14
+ </html>