vectra-client 0.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
data/README.md ADDED
@@ -0,0 +1,456 @@
1
+ # Vectra
2
+
3
+ [![Gem Version](https://badge.fury.io/rb/vectra.svg)](https://badge.fury.io/rb/vectra)
4
+ [![CI](https://github.com/stokry/vectra/actions/workflows/ci.yml/badge.svg)](https://github.com/stokry/vectra/actions)
5
+ [![codecov](https://codecov.io/gh/stokry/vectra/branch/main/graph/badge.svg)](https://codecov.io/gh/stokry/vectra)
6
+ [![Ruby Style Guide](https://img.shields.io/badge/code_style-rubocop-brightgreen.svg)](https://github.com/rubocop/rubocop)
7
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
8
+ [![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-2.0-4baaaa.svg)](CODE_OF_CONDUCT.md)
9
+
10
+ **Vectra** is a unified Ruby client for vector databases. Write once, switch providers seamlessly.
11
+
12
+ ## Features
13
+
14
+ - 🔌 **Unified API** - One interface for multiple vector databases
15
+ - 🚀 **Modern Ruby** - Built for Ruby 3.2+ with modern patterns
16
+ - 🔄 **Automatic Retries** - Built-in retry logic with exponential backoff
17
+ - 📊 **Rich Results** - Enumerable query results with filtering capabilities
18
+ - 🛡️ **Type Safety** - Comprehensive validation and meaningful errors
19
+ - 📝 **Well Documented** - Extensive YARD documentation
20
+
21
+ ## Supported Providers
22
+
23
+ | Provider | Status | Version |
24
+ |----------|--------|---------|
25
+ | [Pinecone](https://pinecone.io) | ✅ Fully Supported | v0.1.0 |
26
+ | [PostgreSQL + pgvector](https://github.com/pgvector/pgvector) | ✅ Fully Supported | v0.1.1 |
27
+ | [Qdrant](https://qdrant.tech) | 🚧 Planned | v0.2.0 |
28
+ | [Weaviate](https://weaviate.io) | 🚧 Planned | v0.3.0 |
29
+
30
+ ## Installation
31
+
32
+ Add this line to your application's Gemfile:
33
+
34
+ ```ruby
35
+ gem 'vectra-client'
36
+ ```
37
+
38
+ And then execute:
39
+
40
+ ```bash
41
+ bundle install
42
+ ```
43
+
44
+ Or install it yourself:
45
+
46
+ ```bash
47
+ gem install vectra-client
48
+ ```
49
+
50
+ ### Provider-Specific Dependencies
51
+
52
+ For **pgvector** support, add the `pg` gem:
53
+
54
+ ```ruby
55
+ gem 'pg', '~> 1.5'
56
+ ```
57
+
58
+ ## Quick Start
59
+
60
+ ### Configuration
61
+
62
+ ```ruby
63
+ require 'vectra'
64
+
65
+ # Global configuration
66
+ Vectra.configure do |config|
67
+ config.provider = :pinecone
68
+ config.api_key = ENV['PINECONE_API_KEY']
69
+ config.environment = 'us-east-1' # or config.host = 'your-index-host.pinecone.io'
70
+ end
71
+
72
+ # Create a client
73
+ client = Vectra::Client.new
74
+ ```
75
+
76
+ Or use per-client configuration:
77
+
78
+ ```ruby
79
+ # Shortcut for Pinecone
80
+ client = Vectra.pinecone(
81
+ api_key: ENV['PINECONE_API_KEY'],
82
+ environment: 'us-east-1'
83
+ )
84
+
85
+ # Shortcut for pgvector (PostgreSQL)
86
+ client = Vectra.pgvector(
87
+ connection_url: 'postgres://user:password@localhost/mydb'
88
+ )
89
+
90
+ # Generic client with options
91
+ client = Vectra::Client.new(
92
+ provider: :pinecone,
93
+ api_key: ENV['PINECONE_API_KEY'],
94
+ environment: 'us-east-1',
95
+ timeout: 60,
96
+ max_retries: 5
97
+ )
98
+ ```
99
+
100
+ ### Basic Operations
101
+
102
+ #### Upsert Vectors
103
+
104
+ ```ruby
105
+ client.upsert(
106
+ index: 'my-index',
107
+ vectors: [
108
+ { id: 'vec1', values: [0.1, 0.2, 0.3], metadata: { text: 'Hello world' } },
109
+ { id: 'vec2', values: [0.4, 0.5, 0.6], metadata: { text: 'Ruby is great' } }
110
+ ]
111
+ )
112
+ # => { upserted_count: 2 }
113
+ ```
114
+
115
+ #### Query Vectors
116
+
117
+ ```ruby
118
+ results = client.query(
119
+ index: 'my-index',
120
+ vector: [0.1, 0.2, 0.3],
121
+ top_k: 5,
122
+ include_metadata: true
123
+ )
124
+
125
+ # Iterate over results
126
+ results.each do |match|
127
+ puts "ID: #{match.id}, Score: #{match.score}"
128
+ puts "Metadata: #{match.metadata}"
129
+ end
130
+
131
+ # Access specific results
132
+ results.first # First match
133
+ results.ids # All matching IDs
134
+ results.scores # All scores
135
+ results.max_score # Highest score
136
+
137
+ # Filter by score
138
+ high_quality = results.above_score(0.8)
139
+ ```
140
+
141
+ #### Query with Filters
142
+
143
+ ```ruby
144
+ results = client.query(
145
+ index: 'my-index',
146
+ vector: [0.1, 0.2, 0.3],
147
+ top_k: 10,
148
+ filter: { category: 'programming', language: 'ruby' }
149
+ )
150
+ ```
151
+
152
+ #### Fetch Vectors by ID
153
+
154
+ ```ruby
155
+ vectors = client.fetch(index: 'my-index', ids: ['vec1', 'vec2'])
156
+
157
+ vectors['vec1'].values # [0.1, 0.2, 0.3]
158
+ vectors['vec1'].metadata # { 'text' => 'Hello world' }
159
+ ```
160
+
161
+ #### Update Vector Metadata
162
+
163
+ ```ruby
164
+ client.update(
165
+ index: 'my-index',
166
+ id: 'vec1',
167
+ metadata: { category: 'updated', processed: true }
168
+ )
169
+ ```
170
+
171
+ #### Delete Vectors
172
+
173
+ ```ruby
174
+ # Delete by IDs
175
+ client.delete(index: 'my-index', ids: ['vec1', 'vec2'])
176
+
177
+ # Delete by filter
178
+ client.delete(index: 'my-index', filter: { category: 'old' })
179
+
180
+ # Delete all (use with caution!)
181
+ client.delete(index: 'my-index', delete_all: true)
182
+ ```
183
+
184
+ ### Working with Vectors
185
+
186
+ ```ruby
187
+ # Create a Vector object
188
+ vector = Vectra::Vector.new(
189
+ id: 'my-vector',
190
+ values: [0.1, 0.2, 0.3],
191
+ metadata: { text: 'Example' }
192
+ )
193
+
194
+ vector.dimension # => 3
195
+ vector.metadata? # => true
196
+ vector.to_h # Convert to hash
197
+
198
+ # Calculate similarity
199
+ other = Vectra::Vector.new(id: 'other', values: [0.1, 0.2, 0.3])
200
+ vector.cosine_similarity(other) # => 1.0 (identical)
201
+ vector.euclidean_distance(other) # => 0.0
202
+ ```
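For reference, the two measures above follow the standard textbook formulas. The plain-Ruby sketch below is not Vectra's source. The method names are hypothetical, and it assumes `Vectra::Vector` computes the same quantities on its `values` arrays.

```ruby
# Standard formulas, written out for illustration (not Vectra's implementation).
def dot_product(a, b)
  a.zip(b).sum { |x, y| x * y }
end

# cos(theta) = (a . b) / (|a| * |b|); ranges from -1.0 to 1.0
def cosine_similarity(a, b)
  raise ArgumentError, "dimension mismatch" unless a.size == b.size

  magnitude = Math.sqrt(dot_product(a, a)) * Math.sqrt(dot_product(b, b))
  raise ArgumentError, "cosine is undefined for a zero vector" if magnitude.zero?

  dot_product(a, b) / magnitude
end

# sqrt(sum((a_i - b_i)^2)); 0.0 means the vectors are identical
def euclidean_distance(a, b)
  raise ArgumentError, "dimension mismatch" unless a.size == b.size

  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

cosine_similarity([1.0, 0.0], [0.0, 1.0])  # => 0.0 (orthogonal)
euclidean_distance([0.0, 0.0], [3.0, 4.0]) # => 5.0
```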
203
+
204
+ ### Index Management
205
+
206
+ ```ruby
207
+ # List all indexes
208
+ indexes = client.list_indexes
209
+ indexes.each { |idx| puts idx[:name] }
210
+
211
+ # Describe an index
212
+ info = client.describe_index(index: 'my-index')
213
+ puts info[:dimension] # => 384
214
+ puts info[:metric] # => "cosine"
215
+
216
+ # Get index statistics
217
+ stats = client.stats(index: 'my-index')
218
+ puts stats[:total_vector_count]
219
+ ```
220
+
221
+ ### Namespaces
222
+
223
+ Namespaces allow you to partition vectors within an index:
224
+
225
+ ```ruby
226
+ # Upsert to a namespace
227
+ client.upsert(
228
+ index: 'my-index',
229
+ namespace: 'production',
230
+ vectors: [...]
231
+ )
232
+
233
+ # Query within a namespace
234
+ client.query(
235
+ index: 'my-index',
236
+ namespace: 'production',
237
+ vector: [0.1, 0.2, 0.3],
238
+ top_k: 5
239
+ )
240
+ ```
241
+
242
+ ### pgvector (PostgreSQL)
243
+
244
+ Vectra's pgvector provider stores each "index" as a PostgreSQL table with a `vector` column.
245
+
246
+ #### Setup PostgreSQL with pgvector
247
+
248
+ ```bash
249
+ # Using Docker
250
+ docker run -d --name pgvector \
251
+ -e POSTGRES_PASSWORD=password \
252
+ -p 5432:5432 \
253
+ pgvector/pgvector:pg16
254
+ ```
255
+
256
+ #### Create an Index (Table)
257
+
258
+ ```ruby
259
+ client = Vectra.pgvector(connection_url: 'postgres://postgres:password@localhost/postgres')
260
+
261
+ # Create a new index with cosine similarity
262
+ client.provider.create_index(
263
+ name: 'documents',
264
+ dimension: 384,
265
+ metric: 'cosine' # or 'euclidean', 'inner_product'
266
+ )
267
+ ```
268
+
269
+ #### Supported Metrics
270
+
271
+ | Metric | Description | pgvector Operator |
272
+ |--------|-------------|-------------------|
273
+ | `cosine` | Cosine similarity (default) | `<=>` (returns cosine distance) |
274
+ | `euclidean` | Euclidean (L2) distance | `<->` |
275
+ | `inner_product` | Inner product / dot product | `<#>` (returns negative inner product) |
276
+
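pgvector's operators return distances rather than similarities: `<->` is L2 distance, `<=>` is cosine distance (1 minus cosine similarity), and `<#>` is the *negative* inner product, so a smaller value always means "closer" in an ascending `ORDER BY`. This plain-Ruby sketch mirrors those return values so you can sanity-check scores. How Vectra converts them into the `score` on each match is not documented here, so treat any conversion as an assumption.

```ruby
# Plain-Ruby mirrors of pgvector's distance operators (illustration only).
def pg_dot(a, b)
  a.zip(b).sum { |x, y| x * y }
end

# `<->`: Euclidean (L2) distance
def l2_distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# `<=>`: cosine distance = 1 - cosine similarity (0.0 = same direction)
def cosine_distance(a, b)
  1.0 - (pg_dot(a, b) / (Math.sqrt(pg_dot(a, a)) * Math.sqrt(pg_dot(b, b))))
end

# `<#>`: negative inner product, so larger dot products sort first when ascending
def negative_inner_product(a, b)
  -pg_dot(a, b)
end

l2_distance([0.0, 0.0], [3.0, 4.0])            # => 5.0
cosine_distance([1.0, 0.0], [0.0, 1.0])        # => 1.0
negative_inner_product([1.0, 2.0], [3.0, 4.0]) # => -11.0
```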
277
+ #### Table Structure
278
+
279
+ Vectra creates tables with the following structure:
280
+
281
+ ```sql
282
+ CREATE TABLE documents (
283
+ id TEXT PRIMARY KEY,
284
+ embedding vector(384),
285
+ metadata JSONB DEFAULT '{}',
286
+ namespace TEXT DEFAULT '',
287
+ created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
288
+ );
289
+
290
+ -- IVFFlat index for fast similarity search
291
+ CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops);
292
+ ```
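The `CREATE INDEX` above sets no `lists` option, so pgvector falls back to its default of 100 lists. If you manage the index yourself, the pgvector README suggests starting with `rows / 1000` lists for up to 1M rows and `sqrt(rows)` beyond that, probing `sqrt(lists)` lists at query time, and building the index after the table already holds data. The helpers below are hypothetical (Vectra is not documented to expose them) and simply apply those rules of thumb:

```ruby
# Rules of thumb from the pgvector README, as hypothetical helpers.
def ivfflat_lists(row_count)
  lists = row_count <= 1_000_000 ? row_count / 1000 : Math.sqrt(row_count).round
  [lists, 1].max # an IVFFlat index needs at least one list
end

def ivfflat_probes(lists)
  [Math.sqrt(lists).round, 1].max
end

# `table` is interpolated directly, so only pass trusted identifiers.
def ivfflat_index_sql(table, row_count, opclass: "vector_cosine_ops")
  "CREATE INDEX ON #{table} USING ivfflat (embedding #{opclass}) " \
    "WITH (lists = #{ivfflat_lists(row_count)});"
end

ivfflat_lists(500_000) # => 500
ivfflat_probes(500)    # => 22
ivfflat_index_sql("documents", 500_000)
# => "CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 500);"
```

At query time the matching probe count would be applied per session with `SET ivfflat.probes = 22;`.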
293
+
294
+ ## Configuration Options
295
+
296
+ | Option | Description | Default |
297
+ |--------|-------------|---------|
298
+ | `provider` | Vector database provider (`:pinecone`, `:pgvector`, `:qdrant`, `:weaviate`) | Required |
299
+ | `api_key` | API key for authentication (password for pgvector) | Required* |
300
+ | `environment` | Environment/region (Pinecone) | - |
301
+ | `host` | Direct host URL or PostgreSQL connection URL | - |
302
+ | `timeout` | Request timeout in seconds | 30 |
303
+ | `open_timeout` | Connection timeout in seconds | 10 |
304
+ | `max_retries` | Maximum retry attempts | 3 |
305
+ | `retry_delay` | Initial retry delay in seconds | 1 |
306
+ | `logger` | Logger instance for debugging | nil |
307
+
308
+ *For pgvector, `api_key` is used as the PostgreSQL password.
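To reason about worst-case latency under retries, it helps to see the wait schedule that `retry_delay` and `max_retries` imply. This sketch *assumes* each retry doubles the previous delay, as "exponential backoff" in the feature list suggests. The exact growth factor and any jitter Vectra applies are not documented here.

```ruby
# ASSUMPTION: delays double on each retry, starting from retry_delay.
def backoff_schedule(retry_delay: 1, max_retries: 3)
  Array.new(max_retries) { |attempt| retry_delay * (2**attempt) }
end

backoff_schedule                                   # => [1, 2, 4]
backoff_schedule.sum                               # => 7 (seconds of sleeping, at most)
backoff_schedule(retry_delay: 0.5, max_retries: 5) # => [0.5, 1.0, 2.0, 4.0, 8.0]
```

Under that assumption, the defaults add at most 7 seconds of sleeping on top of the time spent in the attempts themselves, each bounded by `timeout`.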
309
+
310
+ ## Error Handling
311
+
312
+ Vectra provides specific error classes for different failure scenarios:
313
+
314
+ ```ruby
315
+ begin
316
+ client.query(index: 'my-index', vector: [0.1, 0.2], top_k: 5)
317
+ rescue Vectra::AuthenticationError => e
318
+ puts "Authentication failed: #{e.message}"
319
+ rescue Vectra::RateLimitError => e
320
+ puts "Rate limited. Retry after #{e.retry_after} seconds"
321
+ rescue Vectra::NotFoundError => e
322
+ puts "Resource not found: #{e.message}"
323
+ rescue Vectra::ValidationError => e
324
+ puts "Invalid request: #{e.message}"
325
+ rescue Vectra::ServerError => e
326
+ puts "Server error (#{e.status_code}): #{e.message}"
327
+ rescue Vectra::Error => e
328
+ puts "General error: #{e.message}"
329
+ end
330
+ ```
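The `rescue` clauses above run from most to least specific, with `Vectra::Error` last. Assuming the specific classes subclass `Vectra::Error` (the hierarchy is not spelled out here), that order matters because Ruby picks the first clause that matches. A self-contained demonstration with hypothetical stand-in classes:

```ruby
# Hypothetical stand-ins; assumes specific errors subclass a common base.
module Demo
  class Error < StandardError; end
  class RateLimitError < Error; end
end

# Specific class first: every clause is reachable.
def classify
  yield
  :ok
rescue Demo::RateLimitError
  :rate_limited
rescue Demo::Error
  :generic
end

# Base class first: it also matches RateLimitError, so the second clause is dead code.
def classify_base_first
  yield
  :ok
rescue Demo::Error
  :generic
rescue Demo::RateLimitError
  :rate_limited
end

classify { raise Demo::RateLimitError }            # => :rate_limited
classify_base_first { raise Demo::RateLimitError } # => :generic
```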
331
+
332
+ ## Logging
333
+
334
+ Enable debug logging to see request details:
335
+
336
+ ```ruby
337
+ require 'logger'
338
+
339
+ Vectra.configure do |config|
340
+ config.provider = :pinecone
341
+ config.api_key = ENV['PINECONE_API_KEY']
342
+ config.environment = 'us-east-1'
343
+ config.logger = Logger.new($stdout)
344
+ end
345
+ ```
346
+
347
+ ## Best Practices
348
+
349
+ ### Batch Upserts
350
+
351
+ For large datasets, batch your upserts:
352
+
353
+ ```ruby
354
+ large_dataset.each_slice(100) do |batch|
355
+   client.upsert(index: 'my-index', vectors: batch)
356
+ end
357
+ ```
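A sketch of the same batching pattern that also totals the `upserted_count` each call reports (the response shape shown in the upsert example). `RecordingClient` is a hypothetical stand-in for `Vectra::Client` so the snippet runs without a provider, and the batch size of 100 comes from the example above rather than from a documented provider limit.

```ruby
# Hypothetical stand-in for Vectra::Client that records batch sizes.
class RecordingClient
  attr_reader :calls

  def initialize
    @calls = []
  end

  def upsert(index:, vectors:)
    @calls << vectors.size
    { upserted_count: vectors.size }
  end
end

# Upsert in fixed-size batches and add up the counts each call reports.
def batch_upsert(client, index:, vectors:, batch_size: 100)
  vectors.each_slice(batch_size).sum do |batch|
    client.upsert(index: index, vectors: batch)[:upserted_count]
  end
end

client = RecordingClient.new
data = Array.new(250) { |i| { id: "vec#{i}", values: [i.to_f, 0.0, 1.0] } }
batch_upsert(client, index: "my-index", vectors: data) # => 250
client.calls                                           # => [100, 100, 50]
```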
358
+
359
+ ### Connection Reuse
360
+
361
+ Create a single client instance and reuse it:
362
+
363
+ ```ruby
364
+ # Good: Reuse the client
365
+ client = Vectra::Client.new(...)
366
+ client.query(...)
367
+ client.upsert(...)
368
+
369
+ # Avoid: Creating new clients for each operation
370
+ Vectra::Client.new(...).query(...)
371
+ Vectra::Client.new(...).upsert(...)
372
+ ```
373
+
374
+ ### Error Recovery
375
+
376
+ Implement retry logic for transient failures:
377
+
378
+ ```ruby
379
+ def query_with_retry(client, **params, retries: 3)
380
+ client.query(**params)
381
+ rescue Vectra::RateLimitError => e
382
+ if retries > 0
383
+ sleep(e.retry_after || 1)
384
+ retry(retries: retries - 1)
385
+ else
386
+ raise
387
+ end
388
+ end
389
+ ```
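To check a retry helper without touching a real provider, drive it with a fake client. Everything here is hypothetical scaffolding: `FakeRateLimitError` stands in for `Vectra::RateLimitError` (assuming only that it exposes `retry_after`), and `FlakyClient` fails a set number of times before succeeding. Keyword parameters such as `retries:` must be declared before `**params`, because Ruby rejects them after a double-splat.

```ruby
# Stand-in for Vectra::RateLimitError (hypothetical).
class FakeRateLimitError < StandardError
  attr_reader :retry_after

  def initialize(retry_after = nil)
    super("rate limited")
    @retry_after = retry_after
  end
end

# Raises a rate-limit error for the first `failures` calls, then succeeds.
class FlakyClient
  attr_reader :attempts

  def initialize(failures)
    @failures = failures
    @attempts = 0
  end

  def query(**params)
    @attempts += 1
    raise FakeRateLimitError.new(0) if @attempts <= @failures

    { matches: [], top_k: params[:top_k] }
  end
end

def query_with_retry(client, retries: 3, **params)
  client.query(**params)
rescue FakeRateLimitError => e
  raise if retries <= 0

  retries -= 1
  sleep(e.retry_after || 1) # retry_after is 0 here, so the test does not wait
  retry
end

client = FlakyClient.new(2)
query_with_retry(client, top_k: 5) # => { matches: [], top_k: 5 }
client.attempts                    # => 3
```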
390
+
391
+ ## Development
392
+
393
+ After checking out the repo:
394
+
395
+ ```bash
396
+ # Install dependencies
397
+ bundle install
398
+
399
+ # Run tests
400
+ bundle exec rspec
401
+
402
+ # Run linter
403
+ bundle exec rubocop
404
+
405
+ # Generate documentation
406
+ bundle exec rake docs
407
+ ```
408
+
409
+ ## Roadmap
410
+
411
+ ### v0.1.0
412
+ - ✅ Pinecone provider
413
+ - ✅ Basic CRUD operations
414
+ - ✅ Configuration system
415
+ - ✅ Error handling with retries
416
+ - ✅ Comprehensive tests
417
+
418
+ ### v0.1.1 (Current)
419
+ - ✅ pgvector (PostgreSQL) provider
420
+ - ✅ Multiple similarity metrics (cosine, euclidean, inner product)
421
+ - ✅ Namespace support for pgvector
422
+ - ✅ IVFFlat index creation
423
+
424
+ ### v0.2.0
425
+ - 🚧 Qdrant provider
426
+ - 🚧 Enhanced error handling
427
+ - 🚧 Connection pooling
428
+
429
+ ### v0.3.0
430
+ - 🚧 Weaviate provider
431
+ - 🚧 Batch operations
432
+ - 🚧 Performance optimizations
433
+
434
+ ### v1.0.0
435
+ - 🚧 Rails integration
436
+ - 🚧 ActiveRecord-like DSL
437
+ - 🚧 Background job support
438
+ - 🚧 Full documentation
439
+
440
+ ## Contributing
441
+
442
+ Bug reports and pull requests are welcome on GitHub at https://github.com/stokry/vectra.
443
+
444
+ 1. Fork it
445
+ 2. Create your feature branch (`git checkout -b feature/my-new-feature`)
446
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
447
+ 4. Push to the branch (`git push origin feature/my-new-feature`)
448
+ 5. Create a new Pull Request
449
+
450
+ ## License
451
+
452
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
453
+
454
+ ## Acknowledgments
455
+
456
+ Inspired by the simplicity of Ruby database gems and the need for a unified vector database interface.
data/Rakefile ADDED
@@ -0,0 +1,34 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "bundler/gem_tasks"
4
+ require "rspec/core/rake_task"
5
+ require "rubocop/rake_task"
6
+
7
+ RSpec::Core::RakeTask.new(:spec)
8
+ RuboCop::RakeTask.new
9
+
10
+ task default: %i[spec rubocop]
11
+
12
+ namespace :spec do
13
+ desc "Run unit tests only"
14
+ RSpec::Core::RakeTask.new(:unit) do |t|
15
+ t.pattern = "spec/vectra/**/*_spec.rb"
16
+ end
17
+
18
+ desc "Run integration tests only"
19
+ RSpec::Core::RakeTask.new(:integration) do |t|
20
+ t.pattern = "spec/integration/**/*_spec.rb"
21
+ end
22
+ end
23
+
24
+ desc "Generate documentation"
25
+ task :docs do
26
+ require "yard"
27
+ YARD::CLI::Yardoc.run("--output-dir", "doc", "lib/**/*.rb", "-", "README.md", "CHANGELOG.md")
28
+ end
29
+
30
+ desc "Generate CHANGELOG.md"
31
+ task :changelog do
32
+ puts "Generating CHANGELOG.md..."
33
+ system("github_changelog_generator") || puts("Install with: gem install github_changelog_generator")
34
+ end
data/SECURITY.md ADDED
@@ -0,0 +1,196 @@
1
+ # Security Policy
2
+
3
+ ## Supported Versions
4
+
5
+ We release patches for security vulnerabilities for the following versions:
6
+
7
+ | Version | Supported |
8
+ | ------- | ------------------ |
9
+ | 0.1.x | :white_check_mark: |
10
+ | < 0.1 | :x: |
11
+
12
+ ## Reporting a Vulnerability
13
+
14
+ We take the security of Vectra seriously. If you believe you have found a security vulnerability, please report it to us as described below.
15
+
16
+ ### Where to Report
17
+
18
+ **Please do NOT report security vulnerabilities through public GitHub issues.**
19
+
20
+ Instead, please report them via email to: **mijo@mijokristo.com**
21
+
22
+ ### What to Include
23
+
24
+ Please include the following information in your report:
25
+
26
+ - Type of vulnerability (e.g., authentication bypass, SQL injection, credential exposure)
27
+ - Full paths of source file(s) related to the manifestation of the vulnerability
28
+ - The location of the affected source code (tag/branch/commit or direct URL)
29
+ - Step-by-step instructions to reproduce the issue
30
+ - Proof-of-concept or exploit code (if possible)
31
+ - Impact of the issue, including how an attacker might exploit it
32
+
33
+ ### Response Timeline
34
+
35
+ - **Initial Response**: Within 48 hours
36
+ - **Status Update**: Within 7 days
37
+ - **Fix Timeline**: Depends on severity, typically 30-90 days
38
+
39
+ We will:
40
+ 1. Confirm the receipt of your vulnerability report
41
+ 2. Provide an estimated timeline for a fix
42
+ 3. Notify you when the vulnerability is fixed
43
+ 4. Credit you in the security advisory (unless you prefer to remain anonymous)
44
+
45
+ ## Security Best Practices for Users
46
+
47
+ ### API Key Management
48
+
49
+ - **Never commit API keys** to version control
50
+ - Store API keys in environment variables or secure vaults
51
+ - Use different API keys for development, staging, and production
52
+ - Rotate API keys regularly
53
+ - Limit API key permissions to minimum required access
54
+
55
+ ```ruby
56
+ # ✅ Good - Use environment variables
57
+ Vectra.configure do |config|
58
+ config.api_key = ENV['PINECONE_API_KEY']
59
+ end
60
+
61
+ # ❌ Bad - Hardcoded API key
62
+ Vectra.configure do |config|
63
+ config.api_key = "pk-123456789" # Never do this!
64
+ end
65
+ ```
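`ENV['PINECONE_API_KEY']` silently yields `nil` when the variable is unset, which only surfaces later as an authentication failure. A minimal sketch of failing at boot with a clear message instead; `fetch_api_key` is a hypothetical helper, not part of Vectra:

```ruby
# Hypothetical helper: fail fast on a missing or blank key.
def fetch_api_key(name = "PINECONE_API_KEY", env: ENV)
  key = env.fetch(name) { raise KeyError, "#{name} is not set" }
  raise KeyError, "#{name} is empty" if key.strip.empty?

  key
end

fetch_api_key(env: { "PINECONE_API_KEY" => "abc" }) # => "abc"

# In an initializer:
# Vectra.configure do |config|
#   config.api_key = fetch_api_key
# end
```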
66
+
67
+ ### Network Security
68
+
69
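+ - Always use HTTPS for API connections (enforced by default); for pgvector, enable TLS on the PostgreSQL connection (for example `sslmode=require` in the connection URL)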
70
+ - Verify SSL certificates (enabled by default)
71
+ - Use VPN or private networks when possible
72
+ - Monitor API usage for unusual patterns
73
+
74
+ ### Data Security
75
+
76
+ - **Sanitize input data** before upserting to vector databases
77
+ - **Validate vector dimensions** match your index configuration
78
+ - **Review metadata** for sensitive information before upserting
79
+ - **Implement access controls** at the application level
80
+ - **Encrypt sensitive metadata** before storage if needed
81
+
82
+ ```ruby
83
+ # Example: Sanitizing metadata
84
+ def sanitize_metadata(metadata)
85
+ metadata.reject { |k, _| k.to_s.match?(/password|secret|token/i) }
86
+ end
87
+
88
+ vectors = [{
89
+ id: "vec1",
90
+ values: embedding,
91
+ metadata: sanitize_metadata(user_data)
92
+ }]
93
+
94
+ client.upsert(index: "my-index", vectors: vectors)
95
+ ```
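`sanitize_metadata` above only inspects top-level keys, so a secret nested inside a hash or an array of hashes would still be stored. A recursive variant, sketched with the same denylist approach (the extra `api_key` pattern and the `deep_sanitize` name are illustrative additions):

```ruby
SENSITIVE_KEY = /password|secret|token|api[_-]?key/i

# Recursively drop sensitive keys from nested hashes and arrays of hashes.
def deep_sanitize(value)
  case value
  when Hash
    value.each_with_object({}) do |(k, v), out|
      out[k] = deep_sanitize(v) unless k.to_s.match?(SENSITIVE_KEY)
    end
  when Array
    value.map { |v| deep_sanitize(v) }
  else
    value
  end
end

deep_sanitize({ title: "Doc", auth: { token: "t0k", user: "ann" }, tags: [{ secret: 1, name: "x" }] })
# => { title: "Doc", auth: { user: "ann" }, tags: [{ name: "x" }] }
```

Key-name denylists are coarse: they miss secrets stored under innocuous keys and drop harmless ones such as `tokenizer`. When your metadata schema is fixed, an allowlist of known-safe fields is stricter.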
96
+
97
+ ### Dependency Security
98
+
99
+ - Keep Vectra and its dependencies up to date
100
+ - Run `bundle audit` regularly to check for known vulnerabilities
101
+ - Review dependency changes in updates
102
+
103
+ ```bash
104
+ # Check for vulnerabilities
105
+ gem install bundler-audit
106
+ bundle audit check --update
107
+ ```
108
+
109
+ ### Rate Limiting
110
+
111
+ - Implement application-level rate limiting
112
+ - Handle `RateLimitError` exceptions appropriately
113
+ - Use exponential backoff for retries
114
+
115
+ ```ruby
116
+ def safe_query_with_backoff(client, max_retries: 3, **params)
117
+ retries = 0
118
+
119
+ begin
120
+ client.query(**params)
121
+ rescue Vectra::RateLimitError => e
122
+ retries += 1
123
+ if retries <= max_retries
124
+ sleep_time = e.retry_after || (2 ** retries)
125
+ sleep(sleep_time)
126
+ retry
127
+ else
128
+ raise
129
+ end
130
+ end
131
+ end
132
+ ```
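When `retry_after` is absent, the example above sleeps exactly `2 ** retries` seconds, so many clients throttled at the same moment retry in lockstep. A common refinement is "full jitter": sleep a random duration between zero and the capped exponential delay. This is a sketch of that technique, not something Vectra is documented to do; `base`, `cap`, and the injectable `rng` are illustrative parameters.

```ruby
# "Full jitter" backoff: a random delay in [0, min(cap, base * 2**attempt)).
def backoff_with_jitter(attempt, base: 1.0, cap: 30.0, rng: Random.new)
  ceiling = [cap, base * (2**attempt)].min
  rng.rand * ceiling # Random#rand with no argument returns a Float in [0, 1)
end

backoff_with_jitter(3)  # some value in 0.0...8.0
backoff_with_jitter(10) # capped: some value in 0.0...30.0
```

Inside `safe_query_with_backoff`, the fallback would become `sleep(e.retry_after || backoff_with_jitter(retries))`. Passing a seeded `Random.new(seed)` as `rng` makes the delays reproducible in tests.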
133
+
134
+ ### Logging and Monitoring
135
+
136
+ - **Do not log API keys** or sensitive data
137
+ - Monitor for authentication failures
138
+ - Track unusual query patterns
139
+ - Set up alerts for rate limit violations
140
+
141
+ ```ruby
142
+ # ❌ Bad - Logs API key
143
+ logger.info("Using API key: #{config.api_key}")
144
+
145
+ # ✅ Good - Logs without sensitive data
146
+ logger.info("Initializing Vectra client for #{config.provider}")
147
+ ```
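If you need to tell keys apart in logs, for example to confirm which environment's key is loaded, log a masked form instead. A minimal sketch with a hypothetical `redact` helper that keeps only the last four characters:

```ruby
# Hypothetical helper: keep only the last few characters of a secret for logs.
def redact(secret, visible: 4)
  return "[REDACTED]" if secret.nil? || secret.length <= visible

  "****#{secret[-visible..]}" # fixed-width mask so the key length is not revealed
end

redact("pk-123456789") # => "****6789"
redact(nil)            # => "[REDACTED]"

# logger.info("Using API key #{redact(config.api_key)}")
```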
148
+
149
+ ## Known Security Considerations
150
+
151
+ ### API Key Exposure
152
+
153
+ API keys are transmitted in HTTP headers. Connections use HTTPS, but you should still ensure that:
154
+ - API keys are never logged or exposed in error messages
155
+ - API keys are not included in client-side code
156
+ - Development/test API keys are separate from production
157
+
158
+ ### Metadata Privacy
159
+
160
+ Metadata stored with vectors may contain sensitive information:
161
+ - Review metadata fields before upserting
162
+ - Consider encryption for sensitive fields
163
+ - Implement data retention policies
164
+ - Follow GDPR/privacy regulations for user data
165
+
166
+ ### Dependency Chain
167
+
168
+ Vectra depends on:
169
+ - `faraday` - HTTP client library
170
+ - `faraday-retry` - Retry middleware
171
+
172
+ We monitor these dependencies for security issues and update promptly.
173
+
174
+ ## Security Updates
175
+
176
+ Security updates will be released as patch versions (e.g., 0.1.1) and announced:
177
+ - On GitHub Security Advisories
178
+ - In the CHANGELOG.md
179
+ - Via RubyGems security notifications
180
+
181
+ Subscribe to GitHub releases to be notified of security updates.
182
+
183
+ ## Compliance
184
+
185
+ Vectra is designed to work with various vector database providers. Ensure your usage complies with:
186
+ - Your provider's security requirements
187
+ - Data protection regulations (GDPR, CCPA, etc.)
188
+ - Industry-specific compliance (HIPAA, PCI-DSS, etc.)
189
+
190
+ ## Questions?
191
+
192
+ If you have questions about security that are not covered here, please email: mijo@mijokristo.com
193
+
194
+ ## Attribution
195
+
196
+ We appreciate responsible disclosure and will acknowledge security researchers who help improve Vectra's security.