RubyGems - vectra-client - Versions diffs - 0.2.1 → 0.3.0 - Mend

vectra-client 0.2.1 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (55)

checksums.yaml +4 -4
data/.rubocop.yml +77 -37
data/CHANGELOG.md +49 -6
data/README.md +52 -393
data/docs/Gemfile +9 -0
data/docs/_config.yml +37 -0
data/docs/_layouts/default.html +14 -0
data/docs/_layouts/home.html +187 -0
data/docs/_layouts/page.html +82 -0
data/docs/_site/api/overview/index.html +145 -0
data/docs/_site/assets/main.css +649 -0
data/docs/_site/assets/main.css.map +1 -0
data/docs/_site/assets/minima-social-icons.svg +33 -0
data/docs/_site/assets/style.css +295 -0
data/docs/_site/community/contributing/index.html +110 -0
data/docs/_site/examples/basic-usage/index.html +117 -0
data/docs/_site/examples/index.html +58 -0
data/docs/_site/feed.xml +1 -0
data/docs/_site/guides/getting-started/index.html +106 -0
data/docs/_site/guides/installation/index.html +82 -0
data/docs/_site/index.html +92 -0
data/docs/_site/providers/index.html +119 -0
data/docs/_site/providers/pgvector/index.html +155 -0
data/docs/_site/providers/pinecone/index.html +121 -0
data/docs/_site/providers/qdrant/index.html +124 -0
data/docs/_site/providers/weaviate/index.html +123 -0
data/docs/_site/robots.txt +1 -0
data/docs/_site/sitemap.xml +39 -0
data/docs/api/overview.md +126 -0
data/docs/assets/style.css +927 -0
data/docs/community/contributing.md +89 -0
data/docs/examples/basic-usage.md +102 -0
data/docs/examples/index.md +54 -0
data/docs/guides/getting-started.md +90 -0
data/docs/guides/installation.md +67 -0
data/docs/guides/performance.md +200 -0
data/docs/index.md +37 -0
data/docs/providers/index.md +81 -0
data/docs/providers/pgvector.md +95 -0
data/docs/providers/pinecone.md +72 -0
data/docs/providers/qdrant.md +73 -0
data/docs/providers/weaviate.md +72 -0
data/lib/vectra/batch.rb +148 -0
data/lib/vectra/cache.rb +261 -0
data/lib/vectra/configuration.rb +6 -1
data/lib/vectra/pool.rb +256 -0
data/lib/vectra/streaming.rb +153 -0
data/lib/vectra/version.rb +1 -1
data/lib/vectra.rb +4 -0
data/netlify.toml +12 -0
metadata +58 -5
data/IMPLEMENTATION_GUIDE.md +0 -686
data/NEW_FEATURES_v0.2.0.md +0 -459
data/RELEASE_CHECKLIST_v0.2.0.md +0 -383
data/USAGE_EXAMPLES.md +0 -787

data/docs/community/contributing.md ADDED

@@ -0,0 +1,89 @@
+---
+layout: page
+title: Contributing
+permalink: /community/contributing/
+---
+# Contributing to Vectra
+We welcome contributions! Here's how to get started.
+## Development Setup
+1. Clone the repository:
+```bash
+git clone https://github.com/stokry/vectra.git
+cd vectra
+```
+2. Install dependencies:
+```bash
+bundle install
+```
+3. Run tests:
+```bash
+bundle exec rspec
+```
+## Making Changes
+1. Create a feature branch:
+```bash
+git checkout -b feature/your-feature-name
+```
+2. Make your changes and write tests
+3. Run linter:
+```bash
+bundle exec rubocop
+```
+4. Commit and push:
+```bash
+git add .
+git commit -m "Description of changes"
+git push origin feature/your-feature-name
+```
+5. Create a Pull Request
+## Code Style
+We use RuboCop for code style. Ensure your code passes:
+```bash
+bundle exec rubocop
+```
+## Testing
+All changes require tests:
+```bash
+# Run all tests
+bundle exec rspec
+# Run specific suite
+bundle exec rspec spec/vectra
+# Run with coverage
+bundle exec rspec --cov
+```
+## Documentation
+Please update documentation for any changes:
+- Update README.md for user-facing changes
+- Update CHANGELOG.md
+- Add examples if needed
+## Questions?
+Feel free to:
+- Open an issue on GitHub
+- Check existing issues and discussions
+- Read the [Implementation Guide](https://github.com/stokry/vectra/blob/main/IMPLEMENTATION_GUIDE.md)
+Thank you for contributing! 🙌

data/docs/examples/basic-usage.md ADDED

@@ -0,0 +1,102 @@
+---
+layout: page
+title: Basic Usage
+permalink: /examples/basic-usage/
+---
+# Basic Usage Examples
+## Simple Search
+```ruby
+require 'vectra'
+client = Vectra::Client.new(
+  provider: :pinecone,
+  api_key: ENV['PINECONE_API_KEY'],
+  environment: 'us-west-4'
+)
+# Search for similar vectors
+results = client.query(
+  vector: [0.1, 0.2, 0.3],
+  top_k: 5
+)
+puts results.matches.count
+```
+## Batch Operations
+```ruby
+# Upsert multiple vectors at once
+vectors = [
+  { id: '1', values: [0.1, 0.2, 0.3], metadata: { title: 'Doc 1' } },
+  { id: '2', values: [0.2, 0.3, 0.4], metadata: { title: 'Doc 2' } },
+  { id: '3', values: [0.3, 0.4, 0.5], metadata: { title: 'Doc 3' } }
+]
+client.upsert(vectors: vectors)
+# Delete multiple vectors
+client.delete(ids: ['1', '2', '3'])
+```
+## Rails Integration
+```ruby
+# config/initializers/vectra.rb
+Vectra.configure do |config|
+  config.provider = :pgvector
+  config.database = Rails.configuration.database_configuration[Rails.env]['database']
+end
+# app/models/document.rb
+class Document < ApplicationRecord
+  include Vectra::ActiveRecord
+  vector_search :embedding
+  def generate_embedding
+    # Generate embedding using OpenAI, Cohere, etc.
+    embedding_vector = generate_vector_from_text(content)
+    self.embedding = embedding_vector
+  end
+end
+# Usage
+doc = Document.find(1)
+similar_docs = doc.vector_search(limit: 10)
+```
+## Including Metadata in Results
+```ruby
+results = client.query(
+  vector: [0.1, 0.2, 0.3],
+  top_k: 10,
+  include_metadata: true
+)
+results.matches.each do |match|
+  puts "ID: #{match['id']}"
+  puts "Score: #{match['score']}"
+  puts "Metadata: #{match['metadata']}"
+end
+```
+## Error Handling
+```ruby
+begin
+  client.query(vector: [0.1, 0.2, 0.3])
+rescue Vectra::ConnectionError => e
+  puts "Connection failed: #{e.message}"
+rescue Vectra::ValidationError => e
+  puts "Invalid input: #{e.message}"
+rescue => e
+  puts "Unexpected error: #{e.message}"
+end
+```
+See [Getting Started]({{ site.baseurl }}/guides/getting-started) for more examples.
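
The Error Handling example above rescues each error class once. If transient failures should be retried, a small wrapper can retry only connection errors and let validation errors fail immediately. This is a minimal sketch, not Vectra's own retry behavior: `with_retries` is a hypothetical helper, its `max_retries:` and `retry_delay:` keywords only echo the configuration names in the Performance guide, and `ConnectionError`/`ValidationError` are local stand-ins for `Vectra::ConnectionError` and `Vectra::ValidationError` so the snippet runs without the gem.

```ruby
# Local stand-ins so this sketch runs without the gem; real code would
# rescue Vectra::ConnectionError / Vectra::ValidationError instead.
class ConnectionError < StandardError; end
class ValidationError < StandardError; end

# Hypothetical helper: retries the block on transient errors only.
def with_retries(max_retries: 3, retry_delay: 1, retry_on: [ConnectionError])
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue *retry_on
    raise if attempts > max_retries # give up after 1 try + max_retries retries
    sleep(retry_delay * attempts)   # linear backoff between attempts
    retry
  end
end

# Usage: a fake query that fails twice with a connection error, then succeeds.
result = with_retries(retry_delay: 0) do |attempt|
  raise ConnectionError, "timeout on attempt #{attempt}" if attempt < 3
  "ok after #{attempt} attempts"
end
puts result # prints "ok after 3 attempts"
```

Retrying a `ValidationError` would only resend the same bad request, which is why the sketch leaves it out of `retry_on` by default.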

data/docs/examples/index.md ADDED

@@ -0,0 +1,54 @@
+---
+layout: page
+title: Examples
+permalink: /examples/
+---
+# Code Examples
+Practical examples to get started with Vectra.
+## Quick Examples
+<div class="tma-comparison-grid">
+  <div class="tma-comparison-card">
+    <h4>Basic Usage</h4>
+    <p>Simple searches and CRUD operations</p>
+    <a href="{{ site.baseurl }}/examples/basic-usage/">View Guide →</a>
+  </div>
+  <div class="tma-comparison-card">
+    <h4>Rails Integration</h4>
+    <p>ActiveRecord integration with has_vector</p>
+    <a href="{{ site.baseurl }}/providers/pgvector/">View Guide →</a>
+  </div>
+</div>
+## Provider Examples
+<div class="tma-comparison-grid">
+  <div class="tma-comparison-card">
+    <h4>Pinecone</h4>
+    <p>Managed cloud vector database</p>
+    <a href="{{ site.baseurl }}/providers/pinecone/">View Guide →</a>
+  </div>
+  <div class="tma-comparison-card">
+    <h4>Qdrant</h4>
+    <p>Open source, self-hosted</p>
+    <a href="{{ site.baseurl }}/providers/qdrant/">View Guide →</a>
+  </div>
+  <div class="tma-comparison-card">
+    <h4>Weaviate</h4>
+    <p>Semantic search with GraphQL</p>
+    <a href="{{ site.baseurl }}/providers/weaviate/">View Guide →</a>
+  </div>
+  <div class="tma-comparison-card">
+    <h4>pgvector</h4>
+    <p>PostgreSQL with vector support</p>
+    <a href="{{ site.baseurl }}/providers/pgvector/">View Guide →</a>
+  </div>
+</div>
+## More Resources
+- [GitHub Examples](https://github.com/stokry/vectra/tree/main/examples) - Full example files
+- [Integration Tests](https://github.com/stokry/vectra/tree/main/spec/integration) - Real-world test cases

data/docs/guides/getting-started.md ADDED

@@ -0,0 +1,90 @@
+---
+layout: page
+title: Getting Started
+permalink: /guides/getting-started/
+---
+# Getting Started with Vectra
+## Initialize a Client
+```ruby
+require 'vectra'
+# Initialize with Pinecone
+client = Vectra::Client.new(
+  provider: :pinecone,
+  api_key: ENV['PINECONE_API_KEY'],
+  environment: 'us-west-4'
+)
+```
+## Basic Operations
+### Upsert Vectors
+```ruby
+client.upsert(
+  vectors: [
+    {
+      id: 'vec-1',
+      values: [0.1, 0.2, 0.3],
+      metadata: { title: 'Document 1' }
+    },
+    {
+      id: 'vec-2',
+      values: [0.2, 0.3, 0.4],
+      metadata: { title: 'Document 2' }
+    }
+  ]
+)
+```
+### Query (Search)
+```ruby
+results = client.query(
+  vector: [0.1, 0.2, 0.3],
+  top_k: 5,
+  include_metadata: true
+)
+results.matches.each do |match|
+  puts "ID: #{match['id']}, Score: #{match['score']}"
+end
+```
+### Delete Vectors
+```ruby
+client.delete(ids: ['vec-1', 'vec-2'])
+```
+### Get Vector Stats
+```ruby
+stats = client.stats
+puts "Index dimension: #{stats['dimension']}"
+puts "Vector count: #{stats['vector_count']}"
+```
+## Configuration
+Alternatively, configure Vectra once globally (in Rails, in `config/initializers/vectra.rb`) and create clients without arguments:
+```ruby
+Vectra.configure do |config|
+  config.provider = :pinecone
+  config.api_key = ENV['PINECONE_API_KEY']
+  config.environment = 'us-west-4'
+end
+# Later in your code:
+client = Vectra::Client.new
+```
+## Next Steps
+- [API Reference]({{ site.baseurl }}/api/overview)
+- [Provider Guides]({{ site.baseurl }}/providers)
+- [Examples]({{ site.baseurl }}/examples/basic-usage)
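
For unit tests it can be convenient to exercise the operations above without a live index. The class below is a hypothetical in-memory test double, not part of Vectra. It copies only the call shapes shown in this guide (`upsert(vectors:)`, `query(vector:, top_k:, include_metadata:)`, `delete(ids:)`, `stats`) and assumes cosine similarity for scoring and string-keyed match hashes; real providers may use other metrics and result types.

```ruby
# Hypothetical in-memory test double; mirrors this guide's call shapes only.
class InMemoryVectorStore
  QueryResult = Struct.new(:matches)

  def initialize
    @vectors = {} # id => { values:, metadata: }
  end

  # Insert new ids, replace existing ones (upsert semantics).
  def upsert(vectors:)
    vectors.each do |v|
      @vectors[v[:id]] = { values: v[:values], metadata: v[:metadata] || {} }
    end
    { upserted_count: vectors.size }
  end

  # Return the top_k stored vectors ranked by cosine similarity (assumed metric).
  def query(vector:, top_k: 10, include_metadata: false)
    scored = @vectors.map do |id, v|
      match = { 'id' => id, 'score' => cosine(vector, v[:values]) }
      match['metadata'] = v[:metadata] if include_metadata
      match
    end
    QueryResult.new(scored.max_by(top_k) { |m| m['score'] })
  end

  def delete(ids:)
    ids.each { |id| @vectors.delete(id) }
  end

  def stats
    first = @vectors.values.first
    { 'dimension' => first && first[:values].size, 'vector_count' => @vectors.size }
  end

  private

  def cosine(a, b)
    raise ArgumentError, 'dimension mismatch' unless a.size == b.size
    dot = a.zip(b).sum { |x, y| x * y }
    norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
    norm.zero? ? 0.0 : dot / norm
  end
end

store = InMemoryVectorStore.new
store.upsert(vectors: [
  { id: 'vec-1', values: [0.1, 0.2, 0.3], metadata: { title: 'Document 1' } },
  { id: 'vec-2', values: [0.2, 0.3, 0.4], metadata: { title: 'Document 2' } }
])
results = store.query(vector: [0.1, 0.2, 0.3], top_k: 1, include_metadata: true)
puts results.matches.first['id'] # prints "vec-1"
```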

data/docs/guides/installation.md ADDED

@@ -0,0 +1,67 @@
+---
+layout: page
+title: Installation
+permalink: /guides/installation/
+---
+# Installation Guide
+## Requirements
+- Ruby 3.2.0 or higher
+- Bundler
+## Install via Bundler
+Add Vectra to your Gemfile:
+```ruby
+gem 'vectra-client'
+```
+Then run:
+```bash
+bundle install
+```
+## Install Standalone
+Alternatively, install via RubyGems:
+```bash
+gem install vectra-client
+```
+## Rails Integration
+For Rails applications, run the install generator:
+```bash
+rails generate vectra:install
+```
+This will create an initializer file at `config/initializers/vectra.rb`.
+## Provider-Specific Setup
+Each vector database provider may require additional dependencies:
+### PostgreSQL with pgvector
+```ruby
+gem 'pg', '~> 1.5'
+```
+## Instrumentation (Optional)
+Monitoring integrations need their own gems:
+### Datadog
+```ruby
+gem 'dogstatsd-ruby'
+```
+### New Relic
+```ruby
+gem 'newrelic_rpm'
+```
+See [Provider Guides]({{ site.baseurl }}/providers) for detailed setup instructions.

data/docs/guides/performance.md ADDED

@@ -0,0 +1,200 @@
+---
+layout: page
+title: Performance & Optimization
+permalink: /guides/performance/
+---
+# Performance & Optimization
+Vectra provides several performance optimization features for high-throughput applications.
+## Async Batch Operations
+Process large vector sets concurrently with automatic chunking:
+```ruby
+require 'vectra'
+client = Vectra::Client.new(provider: :pinecone, api_key: ENV['PINECONE_API_KEY'])
+# Create a batch processor with 4 concurrent workers
+batch = Vectra::Batch.new(client, concurrency: 4)
+# Async upsert with automatic chunking
+vectors = 10_000.times.map { |i| { id: "vec_#{i}", values: Array.new(384) { rand } } }
+result = batch.upsert_async(
+  index: 'my-index',
+  vectors: vectors,
+  chunk_size: 100
+)
+puts "Upserted: #{result[:upserted_count]} vectors in #{result[:chunks]} chunks"
+puts "Errors: #{result[:errors].size}" if result[:errors].any?
+```
+### Batch Delete
+```ruby
+ids = 1000.times.map { |i| "vec_#{i}" }
+result = batch.delete_async(
+  index: 'my-index',
+  ids: ids,
+  chunk_size: 100
+)
+```
+### Batch Fetch
+```ruby
+ids = ['vec_1', 'vec_2', 'vec_3']
+vectors = batch.fetch_async(
+  index: 'my-index',
+  ids: ids,
+  chunk_size: 50
+)
+```
+## Streaming Results
+For large query result sets, use streaming to reduce memory usage:
+```ruby
+stream = Vectra::Streaming.new(client, page_size: 100)
+# Stream with a block
+stream.query_each(
+  index: 'my-index',
+  vector: query_vector,
+  total: 1000
+) do |match|
+  process_match(match)
+end
+# Or use lazy enumerator
+results = stream.query_stream(
+  index: 'my-index',
+  vector: query_vector,
+  total: 1000
+)
+# Only fetches what you need
+results.take(50).each { |m| puts m.id }
+```
+## Caching Layer
+Cache the results of frequently repeated queries to reduce database load:
+```ruby
+# Create cache with 5-minute TTL
+cache = Vectra::Cache.new(ttl: 300, max_size: 1000)
+# Wrap client with caching
+cached_client = Vectra::CachedClient.new(client, cache: cache)
+# First query hits the database
+result1 = cached_client.query(index: 'idx', vector: vec, top_k: 10)
+# Second identical query returns cached result
+result2 = cached_client.query(index: 'idx', vector: vec, top_k: 10)
+# Invalidate cache when data changes
+cached_client.invalidate_index('idx')
+# Clear all cache
+cached_client.clear_cache
+```
+### Cache Statistics
+```ruby
+stats = cache.stats
+puts "Cache size: #{stats[:size]}/#{stats[:max_size]}"
+puts "TTL: #{stats[:ttl]} seconds"
+```
+## Connection Pooling (pgvector)
+For pgvector, use connection pooling with warmup:
+```ruby
+# Configure pool size
+Vectra.configure do |config|
+  config.provider = :pgvector
+  config.host = ENV['DATABASE_URL']
+  config.pool_size = 10
+  config.pool_timeout = 5
+end
+client = Vectra::Client.new
+# Warm up connections at startup
+client.provider.warmup_pool(5)
+# Check pool stats
+stats = client.provider.pool_stats
+puts "Available connections: #{stats[:available]}"
+puts "Checked out: #{stats[:checked_out]}"
+# Shutdown pool when done
+client.provider.shutdown_pool
+```
+## Configuration Options
+```ruby
+Vectra.configure do |config|
+  # Provider settings
+  config.provider = :pinecone
+  config.api_key = ENV['PINECONE_API_KEY']
+  # Timeouts
+  config.timeout = 30
+  config.open_timeout = 10
+  # Retry settings
+  config.max_retries = 3
+  config.retry_delay = 1
+  # Batch operations
+  config.batch_size = 100
+  config.async_concurrency = 4
+  # Connection pooling (pgvector)
+  config.pool_size = 10
+  config.pool_timeout = 5
+  # Caching
+  config.cache_enabled = true
+  config.cache_ttl = 300
+  config.cache_max_size = 1000
+end
+```
+## Benchmarking
+Run the included benchmarks:
+```bash
+# Batch operations benchmark
+bundle exec ruby benchmarks/batch_operations_benchmark.rb
+# Connection pooling benchmark
+bundle exec ruby benchmarks/connection_pooling_benchmark.rb
+```
+## Best Practices
+1. **Batch Size**: Use batch sizes of 100-500 for optimal throughput
+2. **Concurrency**: Set concurrency to 2-4x your CPU cores
+3. **Connection Pool**: Size pool to expected concurrent requests + 20%
+4. **Cache TTL**: Set TTL based on data freshness requirements
+5. **Warmup**: Always warm up connections in production
+## Next Steps
+- [API Reference]({{ site.baseurl }}/api/overview)
+- [Provider Guides]({{ site.baseurl }}/providers)
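
`Batch#upsert_async` is described above as splitting vectors into `chunk_size` pieces and processing them with `concurrency` workers. The sketch below shows that general technique with only the Ruby standard library (a `Queue` feeding a fixed pool of threads); it is not Vectra's implementation. `upsert_in_chunks` and the `upsert_chunk` callable are hypothetical stand-ins for a provider call, and the result keys simply mirror the ones printed in the batch example.

```ruby
# Generic chunk-and-fan-out sketch (not Vectra::Batch itself).
def upsert_in_chunks(vectors, chunk_size:, concurrency:, upsert_chunk:)
  queue = Queue.new
  vectors.each_slice(chunk_size).with_index { |chunk, i| queue << [i, chunk] }
  chunk_count = queue.size
  queue.close # workers exit once the queue is drained

  mutex = Mutex.new
  upserted = 0
  errors = []

  workers = Array.new(concurrency) do
    Thread.new do
      while (job = queue.pop) # pop returns nil when the queue is closed and empty
        index, chunk = job
        begin
          count = upsert_chunk.call(chunk)
          mutex.synchronize { upserted += count }
        rescue StandardError => e
          # Record the failure and keep processing the remaining chunks.
          mutex.synchronize { errors << { chunk: index, error: e.message } }
        end
      end
    end
  end
  workers.each(&:join)

  { upserted_count: upserted, chunks: chunk_count, errors: errors }
end

vectors = 1_050.times.map { |i| { id: "vec_#{i}", values: Array.new(4) { rand } } }
fake_upsert = ->(chunk) { chunk.size } # pretend every vector in the chunk was stored
result = upsert_in_chunks(vectors, chunk_size: 100, concurrency: 4, upsert_chunk: fake_upsert)
puts "Upserted: #{result[:upserted_count]} vectors in #{result[:chunks]} chunks"
# prints "Upserted: 1050 vectors in 11 chunks"
```

Threads pay off here because each provider call is mostly network I/O, during which MRI releases the global VM lock; CPU-bound work would not speed up the same way.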

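The streaming example says `results.take(50)` only fetches what you need. The sketch below shows why a lazy enumerator behaves that way: pages are requested one at a time inside an `Enumerator`, so taking 50 items stops pulling after the first page. `stream_matches` and `fetch_page` are hypothetical names, not Vectra APIs, and how Vectra actually pages results is provider-specific.

```ruby
# Hypothetical paged stream: fetch_page(offset, limit) returns up to limit items.
def stream_matches(total:, page_size:, fetch_page:)
  Enumerator.new do |yielder|
    offset = 0
    while offset < total
      page = fetch_page.call(offset, [page_size, total - offset].min)
      break if page.empty? # source ran out early
      page.each { |match| yielder << match }
      offset += page.size
    end
  end.lazy
end

pages_fetched = 0
fake_fetch = lambda do |offset, limit|
  pages_fetched += 1
  (offset...(offset + limit)).map { |i| { id: "vec_#{i}", score: 1.0 / (i + 1) } }
end

results = stream_matches(total: 1000, page_size: 100, fetch_page: fake_fetch)
first_fifty = results.take(50).to_a
puts first_fifty.size # prints 50
puts pages_fetched    # prints 1 (only the first page of 100 was requested)
```

In this sketch each new pass over `results` (for example a later `results.to_a`) restarts the enumerator and requests pages from the beginning, so a single pass is preferable when the page call is expensive.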
data/docs/index.md ADDED

@@ -0,0 +1,37 @@
+---
+layout: home
+title: Vectra
+---
+```ruby
+require 'vectra'
+# Initialize any provider with the same API
+client = Vectra::Client.new(
+  provider: :pinecone,     # or :qdrant, :weaviate, :pgvector
+  api_key: ENV['API_KEY'],
+  host: 'your-host.example.com'
+)
+# Store vectors with metadata
+client.upsert(
+  vectors: [
+    {
+      id: 'doc-1',
+      values: [0.1, 0.2, 0.3, ...],  # Your embedding
+      metadata: { title: 'Getting Started with AI' }
+    }
+  ]
+)
+# Search by similarity
+results = client.query(
+  vector: [0.1, 0.2, 0.3, ...],
+  top_k: 10,
+  filter: { category: 'tutorials' }
+)
+results.each do |match|
+  puts "#{match['id']}: #{match['score']}"
+end
+```

data/docs/providers/index.md ADDED

@@ -0,0 +1,81 @@
+---
+layout: page
+title: Providers
+permalink: /providers/
+---
+# Vector Database Providers
+Vectra supports multiple vector database providers. Choose the one that best fits your needs.
+## Supported Providers
+| Provider | Type | Best For |
+|----------|------|----------|
+| [**Pinecone**]({{ site.baseurl }}/providers/pinecone) | Managed Cloud | Production, Zero ops |
+| [**Qdrant**]({{ site.baseurl }}/providers/qdrant) | Open Source | Self-hosted, Performance |
+| [**Weaviate**]({{ site.baseurl }}/providers/weaviate) | Open Source | Semantic search, GraphQL |
+| [**pgvector**]({{ site.baseurl }}/providers/pgvector) | PostgreSQL | SQL integration, ACID |
+## Quick Comparison
+<div class="tma-comparison-grid">
+  <div class="tma-comparison-card">
+    <h4>Pinecone</h4>
+    <ul>
+      <li class="pro">Fully managed service</li>
+      <li class="pro">Easy setup</li>
+      <li class="pro">Highly scalable</li>
+      <li class="con">Cloud only</li>
+      <li class="con">Paid service</li>
+    </ul>
+  </div>
+  <div class="tma-comparison-card">
+    <h4>Qdrant</h4>
+    <ul>
+      <li class="pro">Open source</li>
+      <li class="pro">Self-hosted option</li>
+      <li class="pro">High performance</li>
+      <li class="pro">Cloud option available</li>
+      <li class="con">More configuration</li>
+    </ul>
+  </div>
+  <div class="tma-comparison-card">
+    <h4>Weaviate</h4>
+    <ul>
+      <li class="pro">Open source</li>
+      <li class="pro">Semantic search</li>
+      <li class="pro">GraphQL API</li>
+      <li class="pro">Multi-model support</li>
+      <li class="con">More complex setup</li>
+    </ul>
+  </div>
+  <div class="tma-comparison-card">
+    <h4>pgvector</h4>
+    <ul>
+      <li class="pro">SQL database</li>
+      <li class="pro">ACID transactions</li>
+      <li class="pro">Use existing Postgres</li>
+      <li class="pro">Very affordable</li>
+      <li class="con">Not vector-specialized</li>
+    </ul>
+  </div>
+</div>
+## Switching Providers
+One of Vectra's key features is easy provider switching:
+```ruby
+# Just change the provider - your code stays the same!
+client = Vectra::Client.new(provider: :qdrant, host: 'localhost:6333')
+# All operations work identically
+client.upsert(vectors: [...])
+results = client.query(vector: [...], top_k: 5)
+```
+## Next Steps
+- [Getting Started Guide]({{ site.baseurl }}/guides/getting-started)
+- [API Reference]({{ site.baseurl }}/api/overview)
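
The provider-switching example works because every provider sits behind the same method names. The sketch below shows one common way to build that, an adapter registry keyed by the `provider:` symbol. It illustrates the adapter pattern under that assumption and does not describe how Vectra is organized internally; `UnifiedClient`, `BaseAdapter`, and the adapter classes are hypothetical, and their toy in-memory scoring stands in for real provider calls.

```ruby
# Hypothetical base adapter with a toy in-memory store.
class BaseAdapter
  def initialize(**options)
    @options = options
    @store = {}
  end

  def upsert(vectors:)
    vectors.each { |v| @store[v[:id]] = v[:values] }
    { upserted_count: vectors.size }
  end

  # Toy scoring: negative squared Euclidean distance (higher means closer).
  def query(vector:, top_k: 10)
    scored = @store.map do |id, values|
      distance = values.zip(vector).sum { |a, b| (a - b)**2 }
      { 'id' => id, 'score' => -distance }
    end
    scored.max_by(top_k) { |m| m['score'] }
  end
end

# A real adapter would translate these calls to its provider's API.
class PineconeAdapter < BaseAdapter; end
class QdrantAdapter < BaseAdapter; end

class UnifiedClient
  ADAPTERS = { pinecone: PineconeAdapter, qdrant: QdrantAdapter }.freeze

  def initialize(provider:, **options)
    adapter_class = ADAPTERS.fetch(provider) do
      raise ArgumentError, "unsupported provider: #{provider.inspect}"
    end
    @adapter = adapter_class.new(**options)
  end

  # Delegate shared operations; calling code never branches on the provider.
  def upsert(...)
    @adapter.upsert(...)
  end

  def query(...)
    @adapter.query(...)
  end
end

%i[pinecone qdrant].each do |provider|
  client = UnifiedClient.new(provider: provider, host: 'localhost')
  client.upsert(vectors: [{ id: 'a', values: [0.0, 1.0] }, { id: 'b', values: [1.0, 0.0] }])
  puts "#{provider}: #{client.query(vector: [0.9, 0.1], top_k: 1).first['id']}"
end
# prints "pinecone: b" then "qdrant: b"
```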