vectra-client 0.1.2
- checksums.yaml +7 -0
- data/.codecov.yml +31 -0
- data/.rspec +4 -0
- data/.rubocop.yml +183 -0
- data/.ruby-version +1 -0
- data/CHANGELOG.md +88 -0
- data/CODE_OF_CONDUCT.md +127 -0
- data/CONTRIBUTING.md +239 -0
- data/LICENSE +21 -0
- data/README.md +456 -0
- data/Rakefile +34 -0
- data/SECURITY.md +196 -0
- data/lib/vectra/client.rb +304 -0
- data/lib/vectra/configuration.rb +169 -0
- data/lib/vectra/errors.rb +73 -0
- data/lib/vectra/providers/base.rb +265 -0
- data/lib/vectra/providers/pgvector/connection.rb +75 -0
- data/lib/vectra/providers/pgvector/index_management.rb +122 -0
- data/lib/vectra/providers/pgvector/sql_helpers.rb +115 -0
- data/lib/vectra/providers/pgvector.rb +297 -0
- data/lib/vectra/providers/pinecone.rb +308 -0
- data/lib/vectra/providers/qdrant.rb +48 -0
- data/lib/vectra/providers/weaviate.rb +48 -0
- data/lib/vectra/query_result.rb +257 -0
- data/lib/vectra/vector.rb +155 -0
- data/lib/vectra/version.rb +5 -0
- data/lib/vectra.rb +133 -0
- metadata +226 -0
data/README.md
ADDED
|
@@ -0,0 +1,456 @@
|
|
|
1
|
+
# Vectra
|
|
2
|
+
|
|
3
|
+
[](https://badge.fury.io/rb/vectra)
|
|
4
|
+
[](https://github.com/stokry/vectra/actions)
|
|
5
|
+
[](https://codecov.io/gh/stokry/vectra)
|
|
6
|
+
[](https://github.com/rubocop/rubocop)
|
|
7
|
+
[](https://opensource.org/licenses/MIT)
|
|
8
|
+
[](CODE_OF_CONDUCT.md)
|
|
9
|
+
|
|
10
|
+
**Vectra** is a unified Ruby client for vector databases. Write once, switch providers seamlessly.
|
|
11
|
+
|
|
12
|
+
## Features
|
|
13
|
+
|
|
14
|
+
- 🔌 **Unified API** - One interface for multiple vector databases
|
|
15
|
+
- 🚀 **Modern Ruby** - Built for Ruby 3.2+ with modern patterns
|
|
16
|
+
- 🔄 **Automatic Retries** - Built-in retry logic with exponential backoff
|
|
17
|
+
- 📊 **Rich Results** - Enumerable query results with filtering capabilities
|
|
18
|
+
- 🛡️ **Type Safety** - Comprehensive validation and meaningful errors
|
|
19
|
+
- 📝 **Well Documented** - Extensive YARD documentation
|
|
20
|
+
|
|
21
|
+
## Supported Providers
|
|
22
|
+
|
|
23
|
+
| Provider | Status | Version |
|
|
24
|
+
|----------|--------|---------|
|
|
25
|
+
| [Pinecone](https://pinecone.io) | ✅ Fully Supported | v0.1.0 |
|
|
26
|
+
| [PostgreSQL + pgvector](https://github.com/pgvector/pgvector) | ✅ Fully Supported | v0.1.1 |
|
|
27
|
+
| [Qdrant](https://qdrant.tech) | 🚧 Planned | v0.2.0 |
|
|
28
|
+
| [Weaviate](https://weaviate.io) | 🚧 Planned | v0.3.0 |
|
|
29
|
+
|
|
30
|
+
## Installation
|
|
31
|
+
|
|
32
|
+
Add this line to your application's Gemfile:
|
|
33
|
+
|
|
34
|
+
```ruby
|
|
35
|
+
gem 'vectra'
|
|
36
|
+
```
|
|
37
|
+
|
|
38
|
+
And then execute:
|
|
39
|
+
|
|
40
|
+
```bash
|
|
41
|
+
bundle install
|
|
42
|
+
```
|
|
43
|
+
|
|
44
|
+
Or install it yourself:
|
|
45
|
+
|
|
46
|
+
```bash
|
|
47
|
+
gem install vectra
|
|
48
|
+
```
|
|
49
|
+
|
|
50
|
+
### Provider-Specific Dependencies
|
|
51
|
+
|
|
52
|
+
For **pgvector** support, add the `pg` gem:
|
|
53
|
+
|
|
54
|
+
```ruby
|
|
55
|
+
gem 'pg', '~> 1.5'
|
|
56
|
+
```
|
|
57
|
+
|
|
58
|
+
## Quick Start
|
|
59
|
+
|
|
60
|
+
### Configuration
|
|
61
|
+
|
|
62
|
+
```ruby
|
|
63
|
+
require 'vectra'
|
|
64
|
+
|
|
65
|
+
# Global configuration
|
|
66
|
+
Vectra.configure do |config|
|
|
67
|
+
config.provider = :pinecone
|
|
68
|
+
config.api_key = ENV['PINECONE_API_KEY']
|
|
69
|
+
config.environment = 'us-east-1' # or config.host = 'your-index-host.pinecone.io'
|
|
70
|
+
end
|
|
71
|
+
|
|
72
|
+
# Create a client
|
|
73
|
+
client = Vectra::Client.new
|
|
74
|
+
```
|
|
75
|
+
|
|
76
|
+
Or use per-client configuration:
|
|
77
|
+
|
|
78
|
+
```ruby
|
|
79
|
+
# Shortcut for Pinecone
|
|
80
|
+
client = Vectra.pinecone(
|
|
81
|
+
api_key: ENV['PINECONE_API_KEY'],
|
|
82
|
+
environment: 'us-east-1'
|
|
83
|
+
)
|
|
84
|
+
|
|
85
|
+
# Shortcut for pgvector (PostgreSQL)
|
|
86
|
+
client = Vectra.pgvector(
|
|
87
|
+
connection_url: 'postgres://user:password@localhost/mydb'
|
|
88
|
+
)
|
|
89
|
+
|
|
90
|
+
# Generic client with options
|
|
91
|
+
client = Vectra::Client.new(
|
|
92
|
+
provider: :pinecone,
|
|
93
|
+
api_key: ENV['PINECONE_API_KEY'],
|
|
94
|
+
environment: 'us-east-1',
|
|
95
|
+
timeout: 60,
|
|
96
|
+
max_retries: 5
|
|
97
|
+
)
|
|
98
|
+
```
|
|
99
|
+
|
|
100
|
+
### Basic Operations
|
|
101
|
+
|
|
102
|
+
#### Upsert Vectors
|
|
103
|
+
|
|
104
|
+
```ruby
|
|
105
|
+
client.upsert(
|
|
106
|
+
index: 'my-index',
|
|
107
|
+
vectors: [
|
|
108
|
+
{ id: 'vec1', values: [0.1, 0.2, 0.3], metadata: { text: 'Hello world' } },
|
|
109
|
+
{ id: 'vec2', values: [0.4, 0.5, 0.6], metadata: { text: 'Ruby is great' } }
|
|
110
|
+
]
|
|
111
|
+
)
|
|
112
|
+
# => { upserted_count: 2 }
|
|
113
|
+
```
|
|
114
|
+
|
|
115
|
+
#### Query Vectors
|
|
116
|
+
|
|
117
|
+
```ruby
|
|
118
|
+
results = client.query(
|
|
119
|
+
index: 'my-index',
|
|
120
|
+
vector: [0.1, 0.2, 0.3],
|
|
121
|
+
top_k: 5,
|
|
122
|
+
include_metadata: true
|
|
123
|
+
)
|
|
124
|
+
|
|
125
|
+
# Iterate over results
|
|
126
|
+
results.each do |match|
|
|
127
|
+
puts "ID: #{match.id}, Score: #{match.score}"
|
|
128
|
+
puts "Metadata: #{match.metadata}"
|
|
129
|
+
end
|
|
130
|
+
|
|
131
|
+
# Access specific results
|
|
132
|
+
results.first # First match
|
|
133
|
+
results.ids # All matching IDs
|
|
134
|
+
results.scores # All scores
|
|
135
|
+
results.max_score # Highest score
|
|
136
|
+
|
|
137
|
+
# Filter by score
|
|
138
|
+
high_quality = results.above_score(0.8)
|
|
139
|
+
```
|
|
140
|
+
|
|
141
|
+
#### Query with Filters
|
|
142
|
+
|
|
143
|
+
```ruby
|
|
144
|
+
results = client.query(
|
|
145
|
+
index: 'my-index',
|
|
146
|
+
vector: [0.1, 0.2, 0.3],
|
|
147
|
+
top_k: 10,
|
|
148
|
+
filter: { category: 'programming', language: 'ruby' }
|
|
149
|
+
)
|
|
150
|
+
```
|
|
151
|
+
|
|
152
|
+
#### Fetch Vectors by ID
|
|
153
|
+
|
|
154
|
+
```ruby
|
|
155
|
+
vectors = client.fetch(index: 'my-index', ids: ['vec1', 'vec2'])
|
|
156
|
+
|
|
157
|
+
vectors['vec1'].values # [0.1, 0.2, 0.3]
|
|
158
|
+
vectors['vec1'].metadata # { 'text' => 'Hello world' }
|
|
159
|
+
```
|
|
160
|
+
|
|
161
|
+
#### Update Vector Metadata
|
|
162
|
+
|
|
163
|
+
```ruby
|
|
164
|
+
client.update(
|
|
165
|
+
index: 'my-index',
|
|
166
|
+
id: 'vec1',
|
|
167
|
+
metadata: { category: 'updated', processed: true }
|
|
168
|
+
)
|
|
169
|
+
```
|
|
170
|
+
|
|
171
|
+
#### Delete Vectors
|
|
172
|
+
|
|
173
|
+
```ruby
|
|
174
|
+
# Delete by IDs
|
|
175
|
+
client.delete(index: 'my-index', ids: ['vec1', 'vec2'])
|
|
176
|
+
|
|
177
|
+
# Delete by filter
|
|
178
|
+
client.delete(index: 'my-index', filter: { category: 'old' })
|
|
179
|
+
|
|
180
|
+
# Delete all (use with caution!)
|
|
181
|
+
client.delete(index: 'my-index', delete_all: true)
|
|
182
|
+
```
|
|
183
|
+
|
|
184
|
+
### Working with Vectors
|
|
185
|
+
|
|
186
|
+
```ruby
|
|
187
|
+
# Create a Vector object
|
|
188
|
+
vector = Vectra::Vector.new(
|
|
189
|
+
id: 'my-vector',
|
|
190
|
+
values: [0.1, 0.2, 0.3],
|
|
191
|
+
metadata: { text: 'Example' }
|
|
192
|
+
)
|
|
193
|
+
|
|
194
|
+
vector.dimension # => 3
|
|
195
|
+
vector.metadata? # => true
|
|
196
|
+
vector.to_h # Convert to hash
|
|
197
|
+
|
|
198
|
+
# Calculate similarity
|
|
199
|
+
other = Vectra::Vector.new(id: 'other', values: [0.1, 0.2, 0.3])
|
|
200
|
+
vector.cosine_similarity(other) # => 1.0 (identical)
|
|
201
|
+
vector.euclidean_distance(other) # => 0.0
|
|
202
|
+
```
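
For reference, the two measures above follow the standard textbook definitions. The sketch below computes them on plain Ruby arrays. It is an illustration only, not Vectra's implementation, and the free-standing method names and the zero-vector convention are assumptions made for this example.

```ruby
# Standard definitions on plain arrays (illustrative, not Vectra's code).
def cosine_similarity(a, b)
  raise ArgumentError, "dimension mismatch" unless a.size == b.size

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  # Assumption: treat similarity with a zero vector as 0.0 instead of dividing by zero.
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

def euclidean_distance(a, b)
  raise ArgumentError, "dimension mismatch" unless a.size == b.size

  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end
```

Orthogonal vectors give a cosine similarity of 0.0, and the distance between `[0.0, 0.0]` and `[3.0, 4.0]` is 5.0.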
|
|
203
|
+
|
|
204
|
+
### Index Management
|
|
205
|
+
|
|
206
|
+
```ruby
|
|
207
|
+
# List all indexes
|
|
208
|
+
indexes = client.list_indexes
|
|
209
|
+
indexes.each { |idx| puts idx[:name] }
|
|
210
|
+
|
|
211
|
+
# Describe an index
|
|
212
|
+
info = client.describe_index(index: 'my-index')
|
|
213
|
+
puts info[:dimension] # => 384
|
|
214
|
+
puts info[:metric] # => "cosine"
|
|
215
|
+
|
|
216
|
+
# Get index statistics
|
|
217
|
+
stats = client.stats(index: 'my-index')
|
|
218
|
+
puts stats[:total_vector_count]
|
|
219
|
+
```
|
|
220
|
+
|
|
221
|
+
### Namespaces
|
|
222
|
+
|
|
223
|
+
Namespaces allow you to partition vectors within an index:
|
|
224
|
+
|
|
225
|
+
```ruby
|
|
226
|
+
# Upsert to a namespace
|
|
227
|
+
client.upsert(
|
|
228
|
+
index: 'my-index',
|
|
229
|
+
namespace: 'production',
|
|
230
|
+
vectors: [...]
|
|
231
|
+
)
|
|
232
|
+
|
|
233
|
+
# Query within a namespace
|
|
234
|
+
client.query(
|
|
235
|
+
index: 'my-index',
|
|
236
|
+
namespace: 'production',
|
|
237
|
+
vector: [0.1, 0.2, 0.3],
|
|
238
|
+
top_k: 5
|
|
239
|
+
)
|
|
240
|
+
```
|
|
241
|
+
|
|
242
|
+
### pgvector (PostgreSQL)
|
|
243
|
+
|
|
244
|
+
pgvector uses PostgreSQL tables as indexes. Each "index" is a table with a vector column.
|
|
245
|
+
|
|
246
|
+
#### Set Up PostgreSQL with pgvector
|
|
247
|
+
|
|
248
|
+
```bash
|
|
249
|
+
# Using Docker
|
|
250
|
+
docker run -d --name pgvector \
|
|
251
|
+
-e POSTGRES_PASSWORD=password \
|
|
252
|
+
-p 5432:5432 \
|
|
253
|
+
pgvector/pgvector:pg16
|
|
254
|
+
```
|
|
255
|
+
|
|
256
|
+
#### Create an Index (Table)
|
|
257
|
+
|
|
258
|
+
```ruby
|
|
259
|
+
client = Vectra.pgvector(connection_url: 'postgres://postgres:password@localhost/postgres')
|
|
260
|
+
|
|
261
|
+
# Create a new index with cosine similarity
|
|
262
|
+
client.provider.create_index(
|
|
263
|
+
name: 'documents',
|
|
264
|
+
dimension: 384,
|
|
265
|
+
metric: 'cosine' # or 'euclidean', 'inner_product'
|
|
266
|
+
)
|
|
267
|
+
```
|
|
268
|
+
|
|
269
|
+
#### Supported Metrics
|
|
270
|
+
|
|
271
|
+
| Metric | Description | pgvector Operator |
|
|
272
|
+
|--------|-------------|-------------------|
|
|
273
|
+
| `cosine` | Cosine similarity (default) | `<=>` |
|
|
274
|
+
| `euclidean` | Euclidean distance | `<->` |
|
|
275
|
+
| `inner_product` | Inner product / dot product | `<#>` |
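
To make the operator column concrete, here is a minimal sketch of how a metric name could be turned into a nearest-neighbour query. The constant `PGVECTOR_OPERATORS` and the method `nearest_neighbours_sql` are hypothetical names for this example and may not match how Vectra builds its SQL; the `embedding` column name is likewise an assumption. All three pgvector operators return a distance, so smaller values mean closer matches (`<=>` is cosine distance and `<#>` is the negative inner product).

```ruby
# Hypothetical mapping from metric name to pgvector distance operator.
PGVECTOR_OPERATORS = {
  "cosine" => "<=>",
  "euclidean" => "<->",
  "inner_product" => "<#>"
}.freeze

# Builds a parameterised query: $1 is the query vector, $2 is the row limit.
# The `embedding` column name is an assumption for this sketch.
def nearest_neighbours_sql(table:, metric: "cosine")
  operator = PGVECTOR_OPERATORS.fetch(metric) do
    raise ArgumentError, "unsupported metric: #{metric}"
  end
  # Table names cannot be bound as parameters, so allow plain identifiers only.
  raise ArgumentError, "invalid table name" unless table.match?(/\A[a-z_][a-z0-9_]*\z/i)

  "SELECT id, embedding #{operator} $1::vector AS distance " \
    "FROM #{table} ORDER BY distance LIMIT $2"
end
```

Calling `nearest_neighbours_sql(table: "documents", metric: "euclidean")` yields a query that orders rows by `embedding <-> $1::vector`.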
|
|
276
|
+
|
|
277
|
+
#### Table Structure
|
|
278
|
+
|
|
279
|
+
Vectra creates tables with the following structure:
|
|
280
|
+
|
|
281
|
+
```sql
|
|
282
|
+
CREATE TABLE documents (
|
|
283
|
+
id TEXT PRIMARY KEY,
|
|
284
|
+
embedding vector(384),
|
|
285
|
+
metadata JSONB DEFAULT '{}',
|
|
286
|
+
namespace TEXT DEFAULT '',
|
|
287
|
+
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
|
|
288
|
+
);
|
|
289
|
+
|
|
290
|
+
-- IVFFlat index for fast similarity search
|
|
291
|
+
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops);
|
|
292
|
+
```
|
|
293
|
+
|
|
294
|
+
## Configuration Options
|
|
295
|
+
|
|
296
|
+
| Option | Description | Default |
|
|
297
|
+
|--------|-------------|---------|
|
|
298
|
+
| `provider` | Vector database provider (`:pinecone`, `:pgvector`, `:qdrant`, `:weaviate`) | Required |
|
|
299
|
+
| `api_key` | API key for authentication (password for pgvector) | Required* |
|
|
300
|
+
| `environment` | Environment/region (Pinecone) | - |
|
|
301
|
+
| `host` | Direct host URL or PostgreSQL connection URL | - |
|
|
302
|
+
| `timeout` | Request timeout in seconds | 30 |
|
|
303
|
+
| `open_timeout` | Connection timeout in seconds | 10 |
|
|
304
|
+
| `max_retries` | Maximum retry attempts | 3 |
|
|
305
|
+
| `retry_delay` | Initial retry delay in seconds | 1 |
|
|
306
|
+
| `logger` | Logger instance for debugging | nil |
|
|
307
|
+
|
|
308
|
+
*For pgvector, `api_key` is used as the PostgreSQL password.
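
As a rough illustration of how `retry_delay` and `max_retries` interact under exponential backoff, the sketch below lists the wait before each retry. It assumes the delay doubles on every attempt; the README does not state Vectra's actual multiplier, jitter, or cap, and `backoff_delays` is a hypothetical helper, not part of the gem.

```ruby
# Assumption: the delay doubles on each attempt. Vectra's real schedule
# (multiplier, jitter, upper bound) is not specified in this README.
def backoff_delays(retry_delay: 1, max_retries: 3)
  Array.new(max_retries) { |attempt| retry_delay * (2**attempt) }
end
```

With the defaults from the table this gives waits of 1, 2, and 4 seconds.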
|
|
309
|
+
|
|
310
|
+
## Error Handling
|
|
311
|
+
|
|
312
|
+
Vectra provides specific error classes for different failure scenarios:
|
|
313
|
+
|
|
314
|
+
```ruby
|
|
315
|
+
begin
|
|
316
|
+
client.query(index: 'my-index', vector: [0.1, 0.2], top_k: 5)
|
|
317
|
+
rescue Vectra::AuthenticationError => e
|
|
318
|
+
puts "Authentication failed: #{e.message}"
|
|
319
|
+
rescue Vectra::RateLimitError => e
|
|
320
|
+
puts "Rate limited. Retry after #{e.retry_after} seconds"
|
|
321
|
+
rescue Vectra::NotFoundError => e
|
|
322
|
+
puts "Resource not found: #{e.message}"
|
|
323
|
+
rescue Vectra::ValidationError => e
|
|
324
|
+
puts "Invalid request: #{e.message}"
|
|
325
|
+
rescue Vectra::ServerError => e
|
|
326
|
+
puts "Server error (#{e.status_code}): #{e.message}"
|
|
327
|
+
rescue Vectra::Error => e
|
|
328
|
+
puts "General error: #{e.message}"
|
|
329
|
+
end
|
|
330
|
+
```
|
|
331
|
+
|
|
332
|
+
## Logging
|
|
333
|
+
|
|
334
|
+
Enable debug logging to see request details:
|
|
335
|
+
|
|
336
|
+
```ruby
|
|
337
|
+
require 'logger'
|
|
338
|
+
|
|
339
|
+
Vectra.configure do |config|
|
|
340
|
+
config.provider = :pinecone
|
|
341
|
+
config.api_key = ENV['PINECONE_API_KEY']
|
|
342
|
+
config.environment = 'us-east-1'
|
|
343
|
+
config.logger = Logger.new($stdout)
|
|
344
|
+
end
|
|
345
|
+
```
|
|
346
|
+
|
|
347
|
+
## Best Practices
|
|
348
|
+
|
|
349
|
+
### Batch Upserts
|
|
350
|
+
|
|
351
|
+
For large datasets, batch your upserts:
|
|
352
|
+
|
|
353
|
+
```ruby
|
|
354
|
+
results = large_dataset.each_slice(100).map do |batch|
|
|
355
|
+
client.upsert(index: 'my-index', vectors: batch)
|
|
356
|
+
end
|
|
357
|
+
```
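
If you also want a running total, the batching pattern can be wrapped in a small helper. This is a sketch under assumptions: `upsert_in_batches` is a hypothetical name, and it relies on `client.upsert` returning a hash with `:upserted_count`, as shown in the upsert example earlier in this README.

```ruby
# Hypothetical helper: upserts in fixed-size batches and totals the counts.
# Assumes each response is a hash containing :upserted_count.
def upsert_in_batches(client, index:, vectors:, batch_size: 100)
  vectors.each_slice(batch_size).sum do |batch|
    response = client.upsert(index: index, vectors: batch)
    response[:upserted_count].to_i
  end
end
```

The helper accepts any object that responds to `upsert(index:, vectors:)`, which also makes it easy to exercise with a stub in tests.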
|
|
358
|
+
|
|
359
|
+
### Connection Reuse
|
|
360
|
+
|
|
361
|
+
Create a single client instance and reuse it:
|
|
362
|
+
|
|
363
|
+
```ruby
|
|
364
|
+
# Good: Reuse the client
|
|
365
|
+
client = Vectra::Client.new(...)
|
|
366
|
+
client.query(...)
|
|
367
|
+
client.upsert(...)
|
|
368
|
+
|
|
369
|
+
# Avoid: Creating new clients for each operation
|
|
370
|
+
Vectra::Client.new(...).query(...)
|
|
371
|
+
Vectra::Client.new(...).upsert(...)
|
|
372
|
+
```
|
|
373
|
+
|
|
374
|
+
### Error Recovery
|
|
375
|
+
|
|
376
|
+
Implement retry logic for transient failures:
|
|
377
|
+
|
|
378
|
+
```ruby
# `retry` takes no arguments, so the remaining attempts live in a local variable.
# Keyword arguments must be declared before the `**params` splat.
def query_with_retry(client, retries: 3, **params)
  attempts_left = retries
  begin
    client.query(**params)
  rescue Vectra::RateLimitError => e
    raise if attempts_left <= 0

    attempts_left -= 1
    sleep(e.retry_after || 1)
    retry
  end
end
```
|
|
390
|
+
|
|
391
|
+
## Development
|
|
392
|
+
|
|
393
|
+
After checking out the repo:
|
|
394
|
+
|
|
395
|
+
```bash
|
|
396
|
+
# Install dependencies
|
|
397
|
+
bundle install
|
|
398
|
+
|
|
399
|
+
# Run tests
|
|
400
|
+
bundle exec rspec
|
|
401
|
+
|
|
402
|
+
# Run linter
|
|
403
|
+
bundle exec rubocop
|
|
404
|
+
|
|
405
|
+
# Generate documentation
|
|
406
|
+
bundle exec rake docs
|
|
407
|
+
```
|
|
408
|
+
|
|
409
|
+
## Roadmap
|
|
410
|
+
|
|
411
|
+
### v0.1.0
|
|
412
|
+
- ✅ Pinecone provider
|
|
413
|
+
- ✅ Basic CRUD operations
|
|
414
|
+
- ✅ Configuration system
|
|
415
|
+
- ✅ Error handling with retries
|
|
416
|
+
- ✅ Comprehensive tests
|
|
417
|
+
|
|
418
|
+
### v0.1.1 (Current)
|
|
419
|
+
- ✅ pgvector (PostgreSQL) provider
|
|
420
|
+
- ✅ Multiple similarity metrics (cosine, euclidean, inner product)
|
|
421
|
+
- ✅ Namespace support for pgvector
|
|
422
|
+
- ✅ IVFFlat index creation
|
|
423
|
+
|
|
424
|
+
### v0.2.0
|
|
425
|
+
- 🚧 Qdrant provider
|
|
426
|
+
- 🚧 Enhanced error handling
|
|
427
|
+
- 🚧 Connection pooling
|
|
428
|
+
|
|
429
|
+
### v0.3.0
|
|
430
|
+
- 🚧 Weaviate provider
|
|
431
|
+
- 🚧 Batch operations
|
|
432
|
+
- 🚧 Performance optimizations
|
|
433
|
+
|
|
434
|
+
### v1.0.0
|
|
435
|
+
- 🚧 Rails integration
|
|
436
|
+
- 🚧 ActiveRecord-like DSL
|
|
437
|
+
- 🚧 Background job support
|
|
438
|
+
- 🚧 Full documentation
|
|
439
|
+
|
|
440
|
+
## Contributing
|
|
441
|
+
|
|
442
|
+
Bug reports and pull requests are welcome on GitHub at https://github.com/stokry/vectra.
|
|
443
|
+
|
|
444
|
+
1. Fork it
|
|
445
|
+
2. Create your feature branch (`git checkout -b feature/my-new-feature`)
|
|
446
|
+
3. Commit your changes (`git commit -am 'Add some feature'`)
|
|
447
|
+
4. Push to the branch (`git push origin feature/my-new-feature`)
|
|
448
|
+
5. Create a new Pull Request
|
|
449
|
+
|
|
450
|
+
## License
|
|
451
|
+
|
|
452
|
+
The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
|
|
453
|
+
|
|
454
|
+
## Acknowledgments
|
|
455
|
+
|
|
456
|
+
Inspired by the simplicity of Ruby database gems and the need for a unified vector database interface.
|
data/Rakefile
ADDED
|
@@ -0,0 +1,34 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
require "bundler/gem_tasks"
|
|
4
|
+
require "rspec/core/rake_task"
|
|
5
|
+
require "rubocop/rake_task"
|
|
6
|
+
|
|
7
|
+
RSpec::Core::RakeTask.new(:spec)
|
|
8
|
+
RuboCop::RakeTask.new
|
|
9
|
+
|
|
10
|
+
task default: %i[spec rubocop]
|
|
11
|
+
|
|
12
|
+
namespace :spec do
|
|
13
|
+
desc "Run unit tests only"
|
|
14
|
+
RSpec::Core::RakeTask.new(:unit) do |t|
|
|
15
|
+
t.pattern = "spec/vectra/**/*_spec.rb"
|
|
16
|
+
end
|
|
17
|
+
|
|
18
|
+
desc "Run integration tests only"
|
|
19
|
+
RSpec::Core::RakeTask.new(:integration) do |t|
|
|
20
|
+
t.pattern = "spec/integration/**/*_spec.rb"
|
|
21
|
+
end
|
|
22
|
+
end
|
|
23
|
+
|
|
24
|
+
desc "Generate documentation"
|
|
25
|
+
task :docs do
|
|
26
|
+
require "yard"
|
|
27
|
+
YARD::CLI::Yardoc.run("--output-dir", "doc", "lib/**/*.rb", "-", "README.md", "CHANGELOG.md")
|
|
28
|
+
end
|
|
29
|
+
|
|
30
|
+
desc "Generate CHANGELOG.md"
|
|
31
|
+
task :changelog do
|
|
32
|
+
puts "Generating CHANGELOG.md..."
|
|
33
|
+
system("github_changelog_generator") || puts("Install with: gem install github_changelog_generator")
|
|
34
|
+
end
|
data/SECURITY.md
ADDED
|
@@ -0,0 +1,196 @@
|
|
|
1
|
+
# Security Policy
|
|
2
|
+
|
|
3
|
+
## Supported Versions
|
|
4
|
+
|
|
5
|
+
We release patches for security vulnerabilities for the following versions:
|
|
6
|
+
|
|
7
|
+
| Version | Supported |
|
|
8
|
+
| ------- | ------------------ |
|
|
9
|
+
| 0.1.x | :white_check_mark: |
|
|
10
|
+
| < 0.1 | :x: |
|
|
11
|
+
|
|
12
|
+
## Reporting a Vulnerability
|
|
13
|
+
|
|
14
|
+
We take the security of Vectra seriously. If you believe you have found a security vulnerability, please report it to us as described below.
|
|
15
|
+
|
|
16
|
+
### Where to Report
|
|
17
|
+
|
|
18
|
+
**Please do NOT report security vulnerabilities through public GitHub issues.**
|
|
19
|
+
|
|
20
|
+
Instead, please report them via email to: **mijo@mijokristo.com**
|
|
21
|
+
|
|
22
|
+
### What to Include
|
|
23
|
+
|
|
24
|
+
Please include the following information in your report:
|
|
25
|
+
|
|
26
|
+
- Type of vulnerability (e.g., authentication bypass, SQL injection, credential exposure)
|
|
27
|
+
- Full paths of source file(s) related to the manifestation of the vulnerability
|
|
28
|
+
- The location of the affected source code (tag/branch/commit or direct URL)
|
|
29
|
+
- Step-by-step instructions to reproduce the issue
|
|
30
|
+
- Proof-of-concept or exploit code (if possible)
|
|
31
|
+
- Impact of the issue, including how an attacker might exploit it
|
|
32
|
+
|
|
33
|
+
### Response Timeline
|
|
34
|
+
|
|
35
|
+
- **Initial Response**: Within 48 hours
|
|
36
|
+
- **Status Update**: Within 7 days
|
|
37
|
+
- **Fix Timeline**: Depends on severity, typically 30-90 days
|
|
38
|
+
|
|
39
|
+
We will:
|
|
40
|
+
1. Confirm the receipt of your vulnerability report
|
|
41
|
+
2. Provide an estimated timeline for a fix
|
|
42
|
+
3. Notify you when the vulnerability is fixed
|
|
43
|
+
4. Credit you in the security advisory (unless you prefer to remain anonymous)
|
|
44
|
+
|
|
45
|
+
## Security Best Practices for Users
|
|
46
|
+
|
|
47
|
+
### API Key Management
|
|
48
|
+
|
|
49
|
+
- **Never commit API keys** to version control
|
|
50
|
+
- Store API keys in environment variables or secure vaults
|
|
51
|
+
- Use different API keys for development, staging, and production
|
|
52
|
+
- Rotate API keys regularly
|
|
53
|
+
- Limit API key permissions to minimum required access
|
|
54
|
+
|
|
55
|
+
```ruby
|
|
56
|
+
# ✅ Good - Use environment variables
|
|
57
|
+
Vectra.configure do |config|
|
|
58
|
+
config.api_key = ENV['PINECONE_API_KEY']
|
|
59
|
+
end
|
|
60
|
+
|
|
61
|
+
# ❌ Bad - Hardcoded API key
|
|
62
|
+
Vectra.configure do |config|
|
|
63
|
+
config.api_key = "pk-123456789" # Never do this!
|
|
64
|
+
end
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
### Network Security
|
|
68
|
+
|
|
69
|
+
- Always use HTTPS for API connections (enforced by default)
|
|
70
|
+
- Verify SSL certificates (enabled by default)
|
|
71
|
+
- Use VPN or private networks when possible
|
|
72
|
+
- Monitor API usage for unusual patterns
|
|
73
|
+
|
|
74
|
+
### Data Security
|
|
75
|
+
|
|
76
|
+
- **Sanitize input data** before upserting to vector databases
|
|
77
|
+
- **Validate vector dimensions** match your index configuration
|
|
78
|
+
- **Review metadata** for sensitive information before upserting
|
|
79
|
+
- **Implement access controls** at the application level
|
|
80
|
+
- **Encrypt sensitive metadata** before storage if needed
|
|
81
|
+
|
|
82
|
+
```ruby
|
|
83
|
+
# Example: Sanitizing metadata
|
|
84
|
+
def sanitize_metadata(metadata)
|
|
85
|
+
metadata.reject { |k, _| k.to_s.match?(/password|secret|token/i) }
|
|
86
|
+
end
|
|
87
|
+
|
|
88
|
+
vectors = [{
|
|
89
|
+
id: "vec1",
|
|
90
|
+
values: embedding,
|
|
91
|
+
metadata: sanitize_metadata(user_data)
|
|
92
|
+
}]
|
|
93
|
+
|
|
94
|
+
client.upsert(index: "my-index", vectors: vectors)
|
|
95
|
+
```
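
The dimension check from the list above can be done before any network call. The following is a minimal sketch; `validate_vectors!` is a hypothetical helper, not part of Vectra, and it assumes vectors are hashes with `:id` and `:values` keys as in the examples here.

```ruby
# Hypothetical pre-flight check: rejects vectors whose length differs from the
# index dimension or whose values are not finite numbers.
def validate_vectors!(vectors, dimension:)
  vectors.each do |vec|
    values = vec[:values]
    unless values.is_a?(Array) && values.size == dimension
      raise ArgumentError, "vector #{vec[:id].inspect}: expected #{dimension} values"
    end

    finite = values.all? { |v| v.is_a?(Integer) || (v.is_a?(Float) && v.finite?) }
    raise ArgumentError, "vector #{vec[:id].inspect}: values must be finite numbers" unless finite
  end
  vectors
end
```

Because the method returns its input, it can be chained directly into an upsert call, for example `client.upsert(index: "my-index", vectors: validate_vectors!(vectors, dimension: 384))`.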
|
|
96
|
+
|
|
97
|
+
### Dependency Security
|
|
98
|
+
|
|
99
|
+
- Keep Vectra and its dependencies up to date
|
|
100
|
+
- Run `bundle audit` regularly to check for known vulnerabilities
|
|
101
|
+
- Review dependency changes in updates
|
|
102
|
+
|
|
103
|
+
```bash
|
|
104
|
+
# Check for vulnerabilities
|
|
105
|
+
gem install bundler-audit
|
|
106
|
+
bundle audit --update
|
|
107
|
+
```
|
|
108
|
+
|
|
109
|
+
### Rate Limiting
|
|
110
|
+
|
|
111
|
+
- Implement application-level rate limiting
|
|
112
|
+
- Handle `RateLimitError` exceptions appropriately
|
|
113
|
+
- Use exponential backoff for retries
|
|
114
|
+
|
|
115
|
+
```ruby
|
|
116
|
+
def safe_query_with_backoff(client, **params, max_retries: 3)
|
|
117
|
+
retries = 0
|
|
118
|
+
|
|
119
|
+
begin
|
|
120
|
+
client.query(**params)
|
|
121
|
+
rescue Vectra::RateLimitError => e
|
|
122
|
+
retries += 1
|
|
123
|
+
if retries <= max_retries
|
|
124
|
+
sleep_time = e.retry_after || (2 ** retries)
|
|
125
|
+
sleep(sleep_time)
|
|
126
|
+
retry
|
|
127
|
+
else
|
|
128
|
+
raise
|
|
129
|
+
end
|
|
130
|
+
end
|
|
131
|
+
end
|
|
132
|
+
```
|
|
133
|
+
|
|
134
|
+
### Logging and Monitoring
|
|
135
|
+
|
|
136
|
+
- **Do not log API keys** or sensitive data
|
|
137
|
+
- Monitor for authentication failures
|
|
138
|
+
- Track unusual query patterns
|
|
139
|
+
- Set up alerts for rate limit violations
|
|
140
|
+
|
|
141
|
+
```ruby
|
|
142
|
+
# ❌ Bad - Logs API key
|
|
143
|
+
logger.info("Using API key: #{config.api_key}")
|
|
144
|
+
|
|
145
|
+
# ✅ Good - Logs without sensitive data
|
|
146
|
+
logger.info("Initializing Vectra client for #{config.provider}")
|
|
147
|
+
```
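
When a log line really does need to identify which key is in use, one option is to mask all but the last few characters. The sketch below is an illustration only; `mask_secret` is a hypothetical helper and is not provided by Vectra.

```ruby
# Hypothetical helper: shows only the last few characters of a secret.
def mask_secret(secret, visible: 4)
  text = secret.to_s
  return "[not set]" if text.empty?
  # Very short secrets are masked entirely so nothing meaningful leaks.
  return "*" * text.length if text.length <= visible

  ("*" * (text.length - visible)) + text[-visible..]
end
```

For example, `mask_secret("pk-123456789")` returns `"********6789"`, which is usually enough to tell two keys apart without exposing either one.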
|
|
148
|
+
|
|
149
|
+
## Known Security Considerations
|
|
150
|
+
|
|
151
|
+
### API Key Exposure
|
|
152
|
+
|
|
153
|
+
API keys are transmitted in HTTP headers. While connections use HTTPS, ensure:
|
|
154
|
+
- API keys are never logged or exposed in error messages
|
|
155
|
+
- API keys are not included in client-side code
|
|
156
|
+
- Development/test API keys are separate from production
|
|
157
|
+
|
|
158
|
+
### Metadata Privacy
|
|
159
|
+
|
|
160
|
+
Metadata stored with vectors may contain sensitive information:
|
|
161
|
+
- Review metadata fields before upserting
|
|
162
|
+
- Consider encryption for sensitive fields
|
|
163
|
+
- Implement data retention policies
|
|
164
|
+
- Follow GDPR/privacy regulations for user data
|
|
165
|
+
|
|
166
|
+
### Dependency Chain
|
|
167
|
+
|
|
168
|
+
Vectra depends on:
|
|
169
|
+
- `faraday` - HTTP client library
|
|
170
|
+
- `faraday-retry` - Retry middleware
|
|
171
|
+
|
|
172
|
+
We monitor these dependencies for security issues and update promptly.
|
|
173
|
+
|
|
174
|
+
## Security Updates
|
|
175
|
+
|
|
176
|
+
Security updates will be released as patch versions (e.g., 0.1.1) and announced:
|
|
177
|
+
- On GitHub Security Advisories
|
|
178
|
+
- In the CHANGELOG.md
|
|
179
|
+
- Via RubyGems security notifications
|
|
180
|
+
|
|
181
|
+
Subscribe to GitHub releases to be notified of security updates.
|
|
182
|
+
|
|
183
|
+
## Compliance
|
|
184
|
+
|
|
185
|
+
Vectra is designed to work with various vector database providers. Ensure your usage complies with:
|
|
186
|
+
- Your provider's security requirements
|
|
187
|
+
- Data protection regulations (GDPR, CCPA, etc.)
|
|
188
|
+
- Industry-specific compliance (HIPAA, PCI-DSS, etc.)
|
|
189
|
+
|
|
190
|
+
## Questions?
|
|
191
|
+
|
|
192
|
+
If you have questions about security that are not covered here, please email: mijo@mijokristo.com
|
|
193
|
+
|
|
194
|
+
## Attribution
|
|
195
|
+
|
|
196
|
+
We appreciate responsible disclosure and will acknowledge security researchers who help improve Vectra's security.
|