vectra-client 0.2.2 → 0.3.1

@@ -0,0 +1,200 @@
---
layout: page
title: Performance & Optimization
permalink: /guides/performance/
---

# Performance & Optimization

Vectra provides several performance optimization features for high-throughput applications.

## Async Batch Operations

Process large vector sets concurrently with automatic chunking:

```ruby
require 'vectra'

client = Vectra::Client.new(provider: :pinecone, api_key: ENV['PINECONE_API_KEY'])

# Create a batch processor with 4 concurrent workers
batch = Vectra::Batch.new(client, concurrency: 4)

# Async upsert with automatic chunking
vectors = 10_000.times.map { |i| { id: "vec_#{i}", values: Array.new(384) { rand } } }

result = batch.upsert_async(
  index: 'my-index',
  vectors: vectors,
  chunk_size: 100
)

puts "Upserted: #{result[:upserted_count]} vectors in #{result[:chunks]} chunks"
puts "Errors: #{result[:errors].size}" if result[:errors].any?
```

### Batch Delete

```ruby
ids = 1000.times.map { |i| "vec_#{i}" }

result = batch.delete_async(
  index: 'my-index',
  ids: ids,
  chunk_size: 100
)
```

### Batch Fetch

```ruby
ids = ['vec_1', 'vec_2', 'vec_3']

vectors = batch.fetch_async(
  index: 'my-index',
  ids: ids,
  chunk_size: 50
)
```

## Streaming Results

For large query result sets, use streaming to reduce memory usage:

```ruby
stream = Vectra::Streaming.new(client, page_size: 100)

# Stream with a block
stream.query_each(
  index: 'my-index',
  vector: query_vector,
  total: 1000
) do |match|
  process_match(match)
end

# Or use lazy enumerator
results = stream.query_stream(
  index: 'my-index',
  vector: query_vector,
  total: 1000
)

# Only fetches what you need
results.take(50).each { |m| puts m.id }
```

## Caching Layer

Cache frequently queried vectors to reduce database load:

```ruby
# Create cache with 5-minute TTL
cache = Vectra::Cache.new(ttl: 300, max_size: 1000)

# Wrap client with caching
cached_client = Vectra::CachedClient.new(client, cache: cache)

# First query hits the database
result1 = cached_client.query(index: 'idx', vector: vec, top_k: 10)

# Second identical query returns cached result
result2 = cached_client.query(index: 'idx', vector: vec, top_k: 10)

# Invalidate cache when data changes
cached_client.invalidate_index('idx')

# Clear all cache
cached_client.clear_cache
```

### Cache Statistics

```ruby
stats = cache.stats
puts "Cache size: #{stats[:size]}/#{stats[:max_size]}"
puts "TTL: #{stats[:ttl]} seconds"
```

## Connection Pooling (pgvector)

For pgvector, use connection pooling with warm-up:

```ruby
# Configure pool size
Vectra.configure do |config|
  config.provider = :pgvector
  config.host = ENV['DATABASE_URL']
  config.pool_size = 10
  config.pool_timeout = 5
end

client = Vectra::Client.new

# Warm up connections at startup
client.provider.warmup_pool(5)

# Check pool stats
stats = client.provider.pool_stats
puts "Available connections: #{stats[:available]}"
puts "Checked out: #{stats[:checked_out]}"

# Shut down the pool when done
client.provider.shutdown_pool
```

## Configuration Options

```ruby
Vectra.configure do |config|
  # Provider settings
  config.provider = :pinecone
  config.api_key = ENV['PINECONE_API_KEY']

  # Timeouts
  config.timeout = 30
  config.open_timeout = 10

  # Retry settings
  config.max_retries = 3
  config.retry_delay = 1

  # Batch operations
  config.batch_size = 100
  config.async_concurrency = 4

  # Connection pooling (pgvector)
  config.pool_size = 10
  config.pool_timeout = 5

  # Caching
  config.cache_enabled = true
  config.cache_ttl = 300
  config.cache_max_size = 1000
end
```

## Benchmarking

Run the included benchmarks:

```bash
# Batch operations benchmark
bundle exec ruby benchmarks/batch_operations_benchmark.rb

# Connection pooling benchmark
bundle exec ruby benchmarks/connection_pooling_benchmark.rb
```
## Best Practices

1. **Batch Size**: Use batch sizes of 100-500 for optimal throughput
2. **Concurrency**: Set concurrency to 2-4x your CPU cores
3. **Connection Pool**: Size the pool to expected concurrent requests + 20%
4. **Cache TTL**: Set TTL based on data freshness requirements
5. **Warm-up**: Always warm up connections in production
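
These rules of thumb translate directly into starting numbers. A minimal sketch: `recommended_settings` and its parameters are our own illustrative names, not part of the Vectra API, and `Etc.nprocessors` (Ruby stdlib) supplies the core count.

```ruby
require 'etc'

# Illustrative sizing helper derived from the rules of thumb above.
def recommended_settings(expected_concurrent_requests:, cores: Etc.nprocessors)
  {
    batch_size: 250,                                      # middle of the 100-500 range
    concurrency: cores * 3,                               # 2-4x CPU cores
    pool_size: (expected_concurrent_requests * 1.2).ceil  # concurrent requests + 20%
  }
end

recommended_settings(expected_concurrent_requests: 8, cores: 4)
# concurrency: 12, pool_size: 10 for a 4-core host serving 8 concurrent requests
```

The resulting values can then be plugged into `Vectra.configure`.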

## Next Steps

- [API Reference]({{ site.baseurl }}/api/overview)
- [Provider Guides]({{ site.baseurl }}/providers)
@@ -0,0 +1,267 @@
---
layout: page
title: "Runbook: Cache Issues"
permalink: /guides/runbooks/cache-issues/
---

# Runbook: Cache Issues

**Alert:** `VectraLowCacheHitRatio`
**Severity:** Warning
**Threshold:** Cache hit ratio <50% for 10 minutes

## Symptoms

- High cache miss rate
- Increased database load
- Higher latency than expected
- Stale data being returned

## Quick Diagnosis

```ruby
cache = Vectra::Cache.new
stats = cache.stats

puts "Size: #{stats[:size]} / #{stats[:max_size]}"
puts "TTL: #{stats[:ttl]} seconds"
puts "Keys: #{stats[:keys].count}"
```

```promql
# Prometheus: Check hit ratio
sum(vectra_cache_hits_total) /
  (sum(vectra_cache_hits_total) + sum(vectra_cache_misses_total))
```

## Investigation Steps

### 1. Check Cache Configuration

```ruby
# Current config
puts Vectra.configuration.cache_enabled   # Should be true
puts Vectra.configuration.cache_ttl       # Default: 300
puts Vectra.configuration.cache_max_size  # Default: 1000
```

### 2. Analyze Access Patterns

```ruby
# Check what's being cached
cache.stats[:keys].each do |key|
  parts = key.split(":")
  puts "Index: #{parts[0]}, Type: #{parts[1]}"
end

# Count by type
keys = cache.stats[:keys]
queries = keys.count { |k| k.include?(":q:") }
fetches = keys.count { |k| k.include?(":f:") }
puts "Query cache entries: #{queries}"
puts "Fetch cache entries: #{fetches}"
```

### 3. Check for Cache Thrashing

```ruby
# If max_size is too small, cache thrashes
# Sign: entries being evicted immediately after creation
# Solution: Increase max_size

stats = cache.stats
if stats[:size] >= stats[:max_size] * 0.9
  puts "WARNING: Cache near capacity - consider increasing max_size"
end
```

### 4. Check TTL Appropriateness

```ruby
# If TTL is too short, cache misses are high
# If TTL is too long, stale data is served

# Check data freshness requirements
# - Real-time data: TTL 30-60s
# - Semi-static data: TTL 300-600s
# - Static data: TTL 3600s+
```
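
The freshness guidelines above can be encoded in a small helper when picking a TTL. `ttl_for` is our own illustrative name, not part of the Vectra API:

```ruby
# Map a data-freshness class to a TTL (seconds), per the guidelines above.
def ttl_for(freshness)
  case freshness
  when :real_time   then 60    # 30-60s
  when :semi_static then 300   # 300-600s
  when :static      then 3600  # 3600s+
  else raise ArgumentError, "unknown freshness class: #{freshness.inspect}"
  end
end

ttl_for(:semi_static) # => 300
```

The result can then be passed to `Vectra::Cache.new(ttl: ...)`.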

## Resolution Steps

### Low Hit Ratio

#### Increase Cache Size

```ruby
cache = Vectra::Cache.new(
  ttl: 300,
  max_size: 5000  # Increase from 1000
)
cached_client = Vectra::CachedClient.new(client, cache: cache)
```

#### Adjust TTL

```ruby
# For high-churn data
cache = Vectra::Cache.new(ttl: 60)  # 1 minute

# For stable data
cache = Vectra::Cache.new(ttl: 3600)  # 1 hour
```

#### Cache Warming

```ruby
# Pre-populate cache on startup
common_queries = load_common_queries
common_queries.each do |q|
  cached_client.query(
    index: q[:index],
    vector: q[:vector],
    top_k: q[:top_k]
  )
end
```

### Stale Data

#### Reduce TTL

```ruby
cache = Vectra::Cache.new(ttl: 60)  # Reduce from 300
```

#### Implement Cache Invalidation

```ruby
# After upsert, invalidate affected cache
def upsert_with_invalidation(index:, vectors:)
  result = client.upsert(index: index, vectors: vectors)
  cached_client.invalidate_index(index)
  result
end
```

#### Use Cache-Aside Pattern

```ruby
def get_vector(id)
  # Check cache first
  cached = cache.get("vector:#{id}")
  return cached if cached

  # Fetch from source
  vector = client.fetch(index: "main", ids: [id])[id]

  # Cache with appropriate TTL
  cache.set("vector:#{id}", vector)
  vector
end
```

### Cache Thrashing

#### Increase Max Size

```ruby
# Rule of thumb: max_size = unique_queries_per_ttl * 1.5
# Example: 1000 unique queries per 5 min, max_size = 1500
cache = Vectra::Cache.new(
  ttl: 300,
  max_size: 1500
)
```

#### Implement Tiered Caching

```ruby
# Hot cache: Small, short TTL
hot_cache = Vectra::Cache.new(ttl: 60, max_size: 100)

# Warm cache: Large, longer TTL
warm_cache = Vectra::Cache.new(ttl: 600, max_size: 5000)

# Check hot first, then warm (a lambda so it can see the caches above)
cached_query = lambda do |index:, vector:, top_k:|
  key = "#{index}:q:#{vector.hash}:#{top_k}"  # illustrative key format
  hot_cache.fetch(key) do
    warm_cache.fetch(key) do
      client.query(index: index, vector: vector, top_k: top_k)
    end
  end
end
```

### Memory Issues

#### Monitor Memory Usage

```ruby
# Estimate cache memory usage
# Approximate: 1KB per cached query result
estimated_mb = cache.stats[:size] * 1.0 / 1000
puts "Estimated cache memory: #{estimated_mb} MB"
```

#### Implement LRU Eviction

```ruby
# Vectra::Cache already implements LRU
# If memory is still an issue, reduce max_size
cache = Vectra::Cache.new(max_size: 500)
```
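
For intuition, LRU eviction can be sketched in a few lines on top of Ruby's insertion-ordered `Hash`. This `TinyLRU` is a simplified stand-in for illustration, not Vectra's actual implementation:

```ruby
# Minimal LRU sketch: a Ruby Hash preserves insertion order, so
# re-inserting a key on read marks it most-recently-used, and the
# first key is always the eviction candidate.
class TinyLRU
  def initialize(max_size:)
    @max_size = max_size
    @store = {}
  end

  def get(key)
    return nil unless @store.key?(key)
    @store[key] = @store.delete(key) # move to most-recently-used position
  end

  def set(key, value)
    @store.delete(key)
    @store[key] = value
    @store.delete(@store.keys.first) while @store.size > @max_size # evict LRU
    value
  end

  def keys
    @store.keys
  end
end

lru = TinyLRU.new(max_size: 2)
lru.set(:a, 1)
lru.set(:b, 2)
lru.get(:a)    # :a becomes most recently used
lru.set(:c, 3) # evicts :b, the least recently used
lru.keys       # => [:a, :c]
```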

## Prevention

### 1. Right-size Cache

```ruby
# Calculate based on query patterns
unique_queries_per_minute = 100
ttl_minutes = 5
buffer = 1.5

max_size = unique_queries_per_minute * ttl_minutes * buffer
# = 100 * 5 * 1.5 = 750
```

### 2. Monitor Cache Metrics

```promql
# Alert on low hit ratio
sum(rate(vectra_cache_hits_total[5m])) /
  (sum(rate(vectra_cache_hits_total[5m])) +
   sum(rate(vectra_cache_misses_total[5m]))) < 0.5
```

### 3. Implement Cache Warm-up

```ruby
# In application boot
Rails.application.config.after_initialize do
  VectraCacheWarmer.perform_async
end
```

### 4. Use Cache Namespacing

```ruby
# Separate caches for different use cases
search_cache = Vectra::Cache.new(ttl: 60)   # Fast invalidation
embed_cache = Vectra::Cache.new(ttl: 3600)  # Long-lived embeddings
```

## Escalation

| Time | Action |
|------|--------|
| 10 min | Adjust TTL/max_size |
| 30 min | Implement cache warming |
| 1 hour | Review access patterns |
| 2 hours | Consider Redis/Memcached |

## Related

- [Performance Guide]({{ site.baseurl }}/guides/performance)
- [Monitoring Guide]({{ site.baseurl }}/guides/monitoring)
@@ -0,0 +1,152 @@
---
layout: page
title: "Runbook: High Error Rate"
permalink: /guides/runbooks/high-error-rate/
---

# Runbook: High Error Rate

**Alert:** `VectraHighErrorRate`
**Severity:** Critical
**Threshold:** Error rate >5% for 5 minutes

## Symptoms

- Alert firing for high error rate
- Users reporting failed operations
- Increased latency alongside errors

## Quick Diagnosis

```bash
# Check recent errors in logs
grep -i "vectra.*error" /var/log/app.log | tail -50

# Check error breakdown by type (URL-encode the PromQL so the shell
# doesn't interpret the ? and parentheses)
curl -sG 'localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(vectra_errors_total) by (error_type)' | jq
```

## Investigation Steps

### 1. Identify Error Type

```ruby
# In Rails console
Vectra::Client.new.stats(index: "your-index")
```

| Error Type | Likely Cause | Action |
|------------|--------------|--------|
| `AuthenticationError` | Invalid/expired API key | Check credentials |
| `RateLimitError` | Too many requests | Implement backoff |
| `ServerError` | Provider outage | Check provider status |
| `ConnectionError` | Network issues | Check connectivity |
| `ValidationError` | Bad request data | Check input validation |
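
The table above can also drive a structured rescue at the call site. A hedged sketch: the wrapper name and action strings are ours, and it matches on class names so it stays independent of which error classes are loaded:

```ruby
# Illustrative triage wrapper: log a suggested action (from the table
# above) for any raised error, then re-raise for normal handling.
def with_error_triage
  yield
rescue StandardError => e
  action = case e.class.name
           when /AuthenticationError/ then "Check credentials"
           when /RateLimitError/      then "Implement backoff"
           when /ServerError/         then "Check provider status"
           when /ConnectionError/     then "Check connectivity"
           when /ValidationError/     then "Check input validation"
           else "Investigate manually"
           end
  warn "#{e.class}: #{action}"
  raise
end
```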

### 2. Check Provider Status

- **Pinecone:** [status.pinecone.io](https://status.pinecone.io)
- **Qdrant:** Check self-hosted logs or cloud dashboard
- **pgvector:** `SELECT * FROM pg_stat_activity WHERE state = 'active';`

### 3. Check Application Logs

```bash
# Filter by error class
grep "Vectra::RateLimitError" /var/log/app.log | wc -l
grep "Vectra::ServerError" /var/log/app.log | wc -l
grep "Vectra::AuthenticationError" /var/log/app.log | wc -l
```

## Resolution Steps

### Authentication Errors

```ruby
# Verify API key is set
puts ENV['PINECONE_API_KEY'].nil? ? "MISSING" : "SET"

# Test connection
client = Vectra::Client.new
client.list_indexes
```

### Rate Limit Errors

```ruby
# Implement exponential backoff
Vectra.configure do |config|
  config.max_retries = 5
  config.retry_delay = 2  # Start with 2s delay
end

# Or use batch operations with concurrency limit
batch = Vectra::Batch.new(client, concurrency: 2)  # Reduce from 4
```
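
If the built-in retry settings aren't enough, for example around a whole batch job, the same exponential backoff can be hand-rolled. A sketch with our own helper name; in practice you would rescue `Vectra::RateLimitError` specifically rather than `StandardError`:

```ruby
# Illustrative exponential backoff: the delay doubles on each attempt
# (base, 2*base, 4*base, ...), with a little jitter to avoid
# synchronized retries across workers.
def retry_with_backoff(max_retries: 5, base_delay: 2)
  attempt = 0
  begin
    yield
  rescue StandardError
    attempt += 1
    raise if attempt > max_retries
    sleep(base_delay * (2**(attempt - 1)) + rand * 0.5)
    retry
  end
end
```

For instance, `retry_with_backoff { batch.upsert_async(index: 'my-index', vectors: vectors, chunk_size: 100) }` retries the whole upsert with 2s, 4s, 8s, ... pauses.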

### Server Errors

1. Check provider status page
2. If provider is down, enable fallback or circuit breaker
3. Consider failover to backup provider

```ruby
# Simple circuit breaker: after a server error, serve a cached/stale
# response for OPEN_DURATION instead of hammering the provider
class VectraCircuitBreaker
  OPEN_DURATION = 30 # seconds

  def self.call
    return cached_response if circuit_open?

    yield
  rescue Vectra::ServerError
    @opened_at = Time.now
    cached_response
  end

  def self.circuit_open?
    @opened_at && (Time.now - @opened_at) < OPEN_DURATION
  end

  def self.cached_response
    nil # return a cached or degraded response in a real implementation
  end
end
```

### Connection Errors

```bash
# Test network connectivity
curl -I https://api.pinecone.io/health

# Check DNS resolution
nslookup api.pinecone.io

# Check firewall rules
iptables -L -n | grep -i pinecone
```

## Prevention

1. **Set up retry logic:**
   ```ruby
   config.max_retries = 3
   config.retry_delay = 1
   ```

2. **Monitor error rate trends:**
   ```promql
   increase(vectra_errors_total[1h])
   ```

3. **Implement circuit breakers** for provider outages

4. **Cache frequently accessed data:**
   ```ruby
   cached_client = Vectra::CachedClient.new(client)
   ```

## Escalation

| Time | Action |
|------|--------|
| 5 min | Page on-call engineer |
| 15 min | Escalate to team lead |
| 30 min | Consider provider failover |
| 1 hour | Engage provider support |

## Related

- [High Latency Runbook]({{ site.baseurl }}/guides/runbooks/high-latency)
- [Monitoring Guide]({{ site.baseurl }}/guides/monitoring)