vectra-client 0.2.2 → 0.3.1

@@ -0,0 +1,200 @@
---
layout: page
title: Performance & Optimization
permalink: /guides/performance/
---

# Performance & Optimization

Vectra provides several performance optimization features for high-throughput applications.

## Async Batch Operations

Process large vector sets concurrently with automatic chunking:

```ruby
require 'vectra'

client = Vectra::Client.new(provider: :pinecone, api_key: ENV['PINECONE_API_KEY'])

# Create a batch processor with 4 concurrent workers
batch = Vectra::Batch.new(client, concurrency: 4)

# Async upsert with automatic chunking
vectors = 10_000.times.map { |i| { id: "vec_#{i}", values: Array.new(384) { rand } } }

result = batch.upsert_async(
  index: 'my-index',
  vectors: vectors,
  chunk_size: 100
)

puts "Upserted: #{result[:upserted_count]} vectors in #{result[:chunks]} chunks"
puts "Errors: #{result[:errors].size}" if result[:errors].any?
```

### Batch Delete

```ruby
ids = 1000.times.map { |i| "vec_#{i}" }

result = batch.delete_async(
  index: 'my-index',
  ids: ids,
  chunk_size: 100
)
```

### Batch Fetch

```ruby
ids = ['vec_1', 'vec_2', 'vec_3']

vectors = batch.fetch_async(
  index: 'my-index',
  ids: ids,
  chunk_size: 50
)
```

## Streaming Results

For large query result sets, use streaming to reduce memory usage:

```ruby
stream = Vectra::Streaming.new(client, page_size: 100)

# Stream with a block
stream.query_each(
  index: 'my-index',
  vector: query_vector,
  total: 1000
) do |match|
  process_match(match)
end

# Or use lazy enumerator
results = stream.query_stream(
  index: 'my-index',
  vector: query_vector,
  total: 1000
)

# Only fetches what you need
results.take(50).each { |m| puts m.id }
```

## Caching Layer

Cache frequently queried vectors to reduce database load:

```ruby
# Create cache with 5-minute TTL
cache = Vectra::Cache.new(ttl: 300, max_size: 1000)

# Wrap client with caching
cached_client = Vectra::CachedClient.new(client, cache: cache)

# First query hits the database
result1 = cached_client.query(index: 'idx', vector: vec, top_k: 10)

# Second identical query returns cached result
result2 = cached_client.query(index: 'idx', vector: vec, top_k: 10)

# Invalidate cache when data changes
cached_client.invalidate_index('idx')

# Clear all cache
cached_client.clear_cache
```

### Cache Statistics

```ruby
stats = cache.stats
puts "Cache size: #{stats[:size]}/#{stats[:max_size]}"
puts "TTL: #{stats[:ttl]} seconds"
```

## Connection Pooling (pgvector)

For pgvector, use connection pooling with warm-up:

```ruby
# Configure pool size
Vectra.configure do |config|
  config.provider = :pgvector
  config.host = ENV['DATABASE_URL']
  config.pool_size = 10
  config.pool_timeout = 5
end

client = Vectra::Client.new

# Warm up connections at startup
client.provider.warmup_pool(5)

# Check pool stats
stats = client.provider.pool_stats
puts "Available connections: #{stats[:available]}"
puts "Checked out: #{stats[:checked_out]}"

# Shut down the pool when done
client.provider.shutdown_pool
```

## Configuration Options

```ruby
Vectra.configure do |config|
  # Provider settings
  config.provider = :pinecone
  config.api_key = ENV['PINECONE_API_KEY']

  # Timeouts
  config.timeout = 30
  config.open_timeout = 10

  # Retry settings
  config.max_retries = 3
  config.retry_delay = 1

  # Batch operations
  config.batch_size = 100
  config.async_concurrency = 4

  # Connection pooling (pgvector)
  config.pool_size = 10
  config.pool_timeout = 5

  # Caching
  config.cache_enabled = true
  config.cache_ttl = 300
  config.cache_max_size = 1000
end
```

## Benchmarking

Run the included benchmarks:

```bash
# Batch operations benchmark
bundle exec ruby benchmarks/batch_operations_benchmark.rb

# Connection pooling benchmark
bundle exec ruby benchmarks/connection_pooling_benchmark.rb
```
## Best Practices

1. **Batch Size**: Use batch sizes of 100-500 for optimal throughput
2. **Concurrency**: Set concurrency to 2-4x your CPU cores
3. **Connection Pool**: Size the pool to expected concurrent requests + 20%
4. **Cache TTL**: Set TTL based on data freshness requirements
5. **Warm-up**: Always warm up connections in production
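
These rules of thumb translate directly into starting numbers. A minimal sketch: `recommended_settings` and its parameters are our own illustrative names, not part of the Vectra API, and `Etc.nprocessors` (Ruby stdlib) supplies the core count.

```ruby
require 'etc'

# Illustrative sizing helper derived from the rules of thumb above.
def recommended_settings(expected_concurrent_requests:, cores: Etc.nprocessors)
  {
    batch_size: 250,                                      # middle of the 100-500 range
    concurrency: cores * 3,                               # 2-4x CPU cores
    pool_size: (expected_concurrent_requests * 1.2).ceil  # concurrent requests + 20%
  }
end

recommended_settings(expected_concurrent_requests: 8, cores: 4)
# concurrency: 12, pool_size: 10 for a 4-core host serving 8 concurrent requests
```

The resulting values can then be plugged into `Vectra.configure`.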

## Next Steps

- [API Reference]({{ site.baseurl }}/api/overview)
- [Provider Guides]({{ site.baseurl }}/providers)
@@ -0,0 +1,267 @@
---
layout: page
title: "Runbook: Cache Issues"
permalink: /guides/runbooks/cache-issues/
---

# Runbook: Cache Issues

**Alert:** `VectraLowCacheHitRatio`
**Severity:** Warning
**Threshold:** Cache hit ratio <50% for 10 minutes

## Symptoms

- High cache miss rate
- Increased database load
- Higher latency than expected
- Stale data being returned

## Quick Diagnosis

```ruby
cache = Vectra::Cache.new
stats = cache.stats

puts "Size: #{stats[:size]} / #{stats[:max_size]}"
puts "TTL: #{stats[:ttl]} seconds"
puts "Keys: #{stats[:keys].count}"
```

```promql
# Prometheus: Check hit ratio
sum(vectra_cache_hits_total) /
  (sum(vectra_cache_hits_total) + sum(vectra_cache_misses_total))
```

## Investigation Steps

### 1. Check Cache Configuration

```ruby
# Current config
puts Vectra.configuration.cache_enabled   # Should be true
puts Vectra.configuration.cache_ttl       # Default: 300
puts Vectra.configuration.cache_max_size  # Default: 1000
```

### 2. Analyze Access Patterns

```ruby
# Check what's being cached
cache.stats[:keys].each do |key|
  parts = key.split(":")
  puts "Index: #{parts[0]}, Type: #{parts[1]}"
end

# Count by type
keys = cache.stats[:keys]
queries = keys.count { |k| k.include?(":q:") }
fetches = keys.count { |k| k.include?(":f:") }
puts "Query cache entries: #{queries}"
puts "Fetch cache entries: #{fetches}"
```

### 3. Check for Cache Thrashing

```ruby
# If max_size is too small, cache thrashes
# Sign: entries being evicted immediately after creation
# Solution: Increase max_size

stats = cache.stats
if stats[:size] >= stats[:max_size] * 0.9
  puts "WARNING: Cache near capacity - consider increasing max_size"
end
```

### 4. Check TTL Appropriateness

```ruby
# If TTL is too short, cache misses are high
# If TTL is too long, stale data is served

# Check data freshness requirements
# - Real-time data: TTL 30-60s
# - Semi-static data: TTL 300-600s
# - Static data: TTL 3600s+
```
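
The freshness guidelines above can be encoded in a small helper when picking a TTL. `ttl_for` is our own illustrative name, not part of the Vectra API:

```ruby
# Map a data-freshness class to a TTL (seconds), per the guidelines above.
def ttl_for(freshness)
  case freshness
  when :real_time   then 60    # 30-60s
  when :semi_static then 300   # 300-600s
  when :static      then 3600  # 3600s+
  else raise ArgumentError, "unknown freshness class: #{freshness.inspect}"
  end
end

ttl_for(:semi_static) # => 300
```

The result can then be passed to `Vectra::Cache.new(ttl: ...)`.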

## Resolution Steps

### Low Hit Ratio

#### Increase Cache Size

```ruby
cache = Vectra::Cache.new(
  ttl: 300,
  max_size: 5000  # Increase from 1000
)
cached_client = Vectra::CachedClient.new(client, cache: cache)
```

#### Adjust TTL

```ruby
# For high-churn data
cache = Vectra::Cache.new(ttl: 60)  # 1 minute

# For stable data
cache = Vectra::Cache.new(ttl: 3600)  # 1 hour
```

#### Cache Warming

```ruby
# Pre-populate cache on startup
common_queries = load_common_queries
common_queries.each do |q|
  cached_client.query(
    index: q[:index],
    vector: q[:vector],
    top_k: q[:top_k]
  )
end
```

### Stale Data

#### Reduce TTL

```ruby
cache = Vectra::Cache.new(ttl: 60)  # Reduce from 300
```

#### Implement Cache Invalidation

```ruby
# After upsert, invalidate affected cache
def upsert_with_invalidation(index:, vectors:)
  result = client.upsert(index: index, vectors: vectors)
  cached_client.invalidate_index(index)
  result
end
```

#### Use Cache-Aside Pattern

```ruby
def get_vector(id)
  # Check cache first
  cached = cache.get("vector:#{id}")
  return cached if cached

  # Fetch from source
  vector = client.fetch(index: "main", ids: [id])[id]

  # Cache with appropriate TTL
  cache.set("vector:#{id}", vector)
  vector
end
```

### Cache Thrashing

#### Increase Max Size

```ruby
# Rule of thumb: max_size = unique_queries_per_ttl * 1.5
# Example: 1000 unique queries per 5 min, max_size = 1500
cache = Vectra::Cache.new(
  ttl: 300,
  max_size: 1500
)
```

#### Implement Tiered Caching

```ruby
# Hot cache: Small, short TTL
hot_cache = Vectra::Cache.new(ttl: 60, max_size: 100)

# Warm cache: Large, longer TTL
warm_cache = Vectra::Cache.new(ttl: 600, max_size: 5000)

# Check hot first, then warm (a lambda so it can see the caches above)
cached_query = lambda do |index:, vector:, top_k:|
  key = "#{index}:q:#{vector.hash}:#{top_k}"  # illustrative key format
  hot_cache.fetch(key) do
    warm_cache.fetch(key) do
      client.query(index: index, vector: vector, top_k: top_k)
    end
  end
end
```

### Memory Issues

#### Monitor Memory Usage

```ruby
# Estimate cache memory usage
# Approximate: 1KB per cached query result
estimated_mb = cache.stats[:size] * 1.0 / 1000
puts "Estimated cache memory: #{estimated_mb} MB"
```

#### Implement LRU Eviction

```ruby
# Vectra::Cache already implements LRU
# If memory is still an issue, reduce max_size
cache = Vectra::Cache.new(max_size: 500)
```
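
For intuition, LRU eviction can be sketched in a few lines on top of Ruby's insertion-ordered `Hash`. This `TinyLRU` is a simplified stand-in for illustration, not Vectra's actual implementation:

```ruby
# Minimal LRU sketch: a Ruby Hash preserves insertion order, so
# re-inserting a key on read marks it most-recently-used, and the
# first key is always the eviction candidate.
class TinyLRU
  def initialize(max_size:)
    @max_size = max_size
    @store = {}
  end

  def get(key)
    return nil unless @store.key?(key)
    @store[key] = @store.delete(key) # move to most-recently-used position
  end

  def set(key, value)
    @store.delete(key)
    @store[key] = value
    @store.delete(@store.keys.first) while @store.size > @max_size # evict LRU
    value
  end

  def keys
    @store.keys
  end
end

lru = TinyLRU.new(max_size: 2)
lru.set(:a, 1)
lru.set(:b, 2)
lru.get(:a)    # :a becomes most recently used
lru.set(:c, 3) # evicts :b, the least recently used
lru.keys       # => [:a, :c]
```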

## Prevention

### 1. Right-size Cache

```ruby
# Calculate based on query patterns
unique_queries_per_minute = 100
ttl_minutes = 5
buffer = 1.5

max_size = unique_queries_per_minute * ttl_minutes * buffer
# = 100 * 5 * 1.5 = 750
```

### 2. Monitor Cache Metrics

```promql
# Alert on low hit ratio
sum(rate(vectra_cache_hits_total[5m])) /
  (sum(rate(vectra_cache_hits_total[5m])) +
   sum(rate(vectra_cache_misses_total[5m]))) < 0.5
```

### 3. Implement Cache Warm-up

```ruby
# In application boot
Rails.application.config.after_initialize do
  VectraCacheWarmer.perform_async
end
```

### 4. Use Cache Namespacing

```ruby
# Separate caches for different use cases
search_cache = Vectra::Cache.new(ttl: 60)   # Fast invalidation
embed_cache = Vectra::Cache.new(ttl: 3600)  # Long-lived embeddings
```

## Escalation

| Time | Action |
|------|--------|
| 10 min | Adjust TTL/max_size |
| 30 min | Implement cache warming |
| 1 hour | Review access patterns |
| 2 hours | Consider Redis/Memcached |

## Related

- [Performance Guide]({{ site.baseurl }}/guides/performance)
- [Monitoring Guide]({{ site.baseurl }}/guides/monitoring)
@@ -0,0 +1,152 @@
---
layout: page
title: "Runbook: High Error Rate"
permalink: /guides/runbooks/high-error-rate/
---

# Runbook: High Error Rate

**Alert:** `VectraHighErrorRate`
**Severity:** Critical
**Threshold:** Error rate >5% for 5 minutes

## Symptoms

- Alert firing for high error rate
- Users reporting failed operations
- Increased latency alongside errors

## Quick Diagnosis

```bash
# Check recent errors in logs
grep -i "vectra.*error" /var/log/app.log | tail -50

# Check error breakdown by type (URL-encode the PromQL so the shell
# doesn't interpret the ? and parentheses)
curl -sG 'localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(vectra_errors_total) by (error_type)' | jq
```

## Investigation Steps

### 1. Identify Error Type

```ruby
# In Rails console
Vectra::Client.new.stats(index: "your-index")
```

| Error Type | Likely Cause | Action |
|------------|--------------|--------|
| `AuthenticationError` | Invalid/expired API key | Check credentials |
| `RateLimitError` | Too many requests | Implement backoff |
| `ServerError` | Provider outage | Check provider status |
| `ConnectionError` | Network issues | Check connectivity |
| `ValidationError` | Bad request data | Check input validation |
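
The table above can also drive a structured rescue at the call site. A hedged sketch: the wrapper name and action strings are ours, and it matches on class names so it stays independent of which error classes are loaded:

```ruby
# Illustrative triage wrapper: log a suggested action (from the table
# above) for any raised error, then re-raise for normal handling.
def with_error_triage
  yield
rescue StandardError => e
  action = case e.class.name
           when /AuthenticationError/ then "Check credentials"
           when /RateLimitError/      then "Implement backoff"
           when /ServerError/         then "Check provider status"
           when /ConnectionError/     then "Check connectivity"
           when /ValidationError/     then "Check input validation"
           else "Investigate manually"
           end
  warn "#{e.class}: #{action}"
  raise
end
```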

### 2. Check Provider Status

- **Pinecone:** [status.pinecone.io](https://status.pinecone.io)
- **Qdrant:** Check self-hosted logs or cloud dashboard
- **pgvector:** `SELECT * FROM pg_stat_activity WHERE state = 'active';`

### 3. Check Application Logs

```bash
# Filter by error class
grep "Vectra::RateLimitError" /var/log/app.log | wc -l
grep "Vectra::ServerError" /var/log/app.log | wc -l
grep "Vectra::AuthenticationError" /var/log/app.log | wc -l
```

## Resolution Steps

### Authentication Errors

```ruby
# Verify API key is set
puts ENV['PINECONE_API_KEY'].nil? ? "MISSING" : "SET"

# Test connection
client = Vectra::Client.new
client.list_indexes
```

### Rate Limit Errors

```ruby
# Implement exponential backoff
Vectra.configure do |config|
  config.max_retries = 5
  config.retry_delay = 2  # Start with 2s delay
end

# Or use batch operations with concurrency limit
batch = Vectra::Batch.new(client, concurrency: 2)  # Reduce from 4
```
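
If the built-in retry settings aren't enough, for example around a whole batch job, the same exponential backoff can be hand-rolled. A sketch with our own helper name; in practice you would rescue `Vectra::RateLimitError` specifically rather than `StandardError`:

```ruby
# Illustrative exponential backoff: the delay doubles on each attempt
# (base, 2*base, 4*base, ...), with a little jitter to avoid
# synchronized retries across workers.
def retry_with_backoff(max_retries: 5, base_delay: 2)
  attempt = 0
  begin
    yield
  rescue StandardError
    attempt += 1
    raise if attempt > max_retries
    sleep(base_delay * (2**(attempt - 1)) + rand * 0.5)
    retry
  end
end
```

For instance, `retry_with_backoff { batch.upsert_async(index: 'my-index', vectors: vectors, chunk_size: 100) }` retries the whole upsert with 2s, 4s, 8s, ... pauses.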

### Server Errors

1. Check provider status page
2. If provider is down, enable fallback or circuit breaker
3. Consider failover to backup provider

```ruby
# Simple circuit breaker: after a server error, serve a cached/stale
# response for OPEN_DURATION instead of hammering the provider
class VectraCircuitBreaker
  OPEN_DURATION = 30 # seconds

  def self.call
    return cached_response if circuit_open?

    yield
  rescue Vectra::ServerError
    @opened_at = Time.now
    cached_response
  end

  def self.circuit_open?
    @opened_at && (Time.now - @opened_at) < OPEN_DURATION
  end

  def self.cached_response
    nil # return a cached or degraded response in a real implementation
  end
end
```

### Connection Errors

```bash
# Test network connectivity
curl -I https://api.pinecone.io/health

# Check DNS resolution
nslookup api.pinecone.io

# Check firewall rules
iptables -L -n | grep -i pinecone
```

## Prevention

1. **Set up retry logic:**
   ```ruby
   config.max_retries = 3
   config.retry_delay = 1
   ```

2. **Monitor error rate trends:**
   ```promql
   increase(vectra_errors_total[1h])
   ```

3. **Implement circuit breakers** for provider outages

4. **Cache frequently accessed data:**
   ```ruby
   cached_client = Vectra::CachedClient.new(client)
   ```

## Escalation

| Time | Action |
|------|--------|
| 5 min | Page on-call engineer |
| 15 min | Escalate to team lead |
| 30 min | Consider provider failover |
| 1 hour | Engage provider support |

## Related

- [High Latency Runbook]({{ site.baseurl }}/guides/runbooks/high-latency)
- [Monitoring Guide]({{ site.baseurl }}/guides/monitoring)