prescient 0.1.0 → 0.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.rubocop.yml +6 -2
- data/.yardopts +14 -0
- data/CHANGELOG.md +64 -0
- data/CHANGELOG.pdf +0 -0
- data/INTEGRATION_GUIDE.md +363 -0
- data/README.md +96 -38
- data/Rakefile +2 -1
- data/VECTOR_SEARCH_GUIDE.md +42 -39
- data/lib/prescient/base.rb +123 -19
- data/lib/prescient/client.rb +125 -21
- data/lib/prescient/provider/huggingface.rb +1 -3
- data/lib/prescient/version.rb +1 -1
- data/lib/prescient.rb +103 -1
- data/prescient.gemspec +17 -15
- metadata +67 -32
data/README.md
CHANGED
@@ -117,6 +117,52 @@ Prescient.configure do |config|
 end
 ```

+### Provider Fallback Configuration
+
+Prescient supports automatic fallback to backup providers when the primary provider fails. This ensures high availability for your AI applications.
+
+```ruby
+Prescient.configure do |config|
+  # Configure primary provider
+  config.add_provider(:primary, Prescient::Provider::OpenAI,
+    api_key: ENV['OPENAI_API_KEY'],
+    embedding_model: 'text-embedding-3-small',
+    chat_model: 'gpt-3.5-turbo'
+  )
+
+  # Configure backup providers
+  config.add_provider(:backup1, Prescient::Provider::Anthropic,
+    api_key: ENV['ANTHROPIC_API_KEY'],
+    model: 'claude-3-haiku-20240307'
+  )
+
+  config.add_provider(:backup2, Prescient::Provider::Ollama,
+    url: 'http://localhost:11434',
+    embedding_model: 'nomic-embed-text',
+    chat_model: 'llama3.1:8b'
+  )
+
+  # Configure fallback order
+  config.fallback_providers = [:backup1, :backup2]
+end
+
+# Client with fallback enabled (default)
+client = Prescient::Client.new(:primary, enable_fallback: true)
+
+# Client without fallback
+client_no_fallback = Prescient::Client.new(:primary, enable_fallback: false)
+
+# Convenience methods also support fallback
+response = Prescient.generate_response("Hello", provider: :primary, enable_fallback: true)
+```
+
+**Fallback Behavior:**
+- When a provider fails with a persistent error, Prescient automatically tries the next available provider
+- Only available (healthy) providers are tried during fallback
+- If no fallback providers are configured, all available providers are tried as fallbacks
+- Transient errors (rate limits, timeouts) still use retry logic before fallback
+- The fallback process preserves all method arguments and options
+
 ## Usage

 ### Quick Start
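
The fallback rules added above pair per-provider retries for transient failures with provider-level failover for persistent ones. A minimal sketch of that control flow (hypothetical error-class and method names, not the gem's actual source):

```ruby
# Hypothetical sketch mirroring the fallback bullets above. Assumes transient
# failures raise Prescient::RateLimitError or Prescient::TimeoutError and
# persistent ones raise Prescient::Error; these names are illustrative.
def call_with_fallback(providers, max_retries: 3)
  providers.each do |provider|
    next unless provider.available? # only healthy providers are tried

    attempts = 0
    begin
      return yield(provider)
    rescue Prescient::RateLimitError, Prescient::TimeoutError
      attempts += 1
      retry if attempts < max_retries # transient: retry the same provider first
    rescue Prescient::Error
      next # persistent: fall through to the next provider
    end
  end
  raise Prescient::Error, 'all providers failed'
end
```
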
@@ -170,8 +216,8 @@ response = Prescient.generate_response(query, context_items,
 )

 puts response[:response]
-puts "Model: #{response[:model]}"
-puts "Provider: #{response[:provider]}"
+puts "Model: " + response[:model]
+puts "Provider: " + response[:provider]
 ```

 ### Error Handling
@@ -214,14 +260,14 @@ Prescient.configure do |config|
   prompt_templates: {
     system_prompt: 'You are a friendly customer service representative.',
     no_context_template: <<~TEMPLATE.strip,
-      %{system_prompt}
+      %{system_prompt}

       Customer Question: %{query}

       Please provide a helpful response.
     TEMPLATE
     with_context_template: <<~TEMPLATE.strip
-      %{system_prompt} Use the company info below to help answer.
+      %{system_prompt} Use the company info below to help answer.

       Company Information:
       %{context}
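
The `%{...}` placeholders in these templates are Ruby's named format references, resolved with `Kernel#format`. A standalone illustration using the template shown in the hunk above:

```ruby
# Plain Ruby, outside the gem: fill a template's named references from a hash.
template = <<~TEMPLATE.strip
  %{system_prompt}

  Customer Question: %{query}

  Please provide a helpful response.
TEMPLATE

puts format(template,
            system_prompt: 'You are a friendly customer service representative.',
            query: 'How do I reset my password?')
```
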
@@ -259,6 +305,7 @@ prompt_templates: {
   system_prompt: 'You are a technical documentation assistant. Provide detailed explanations with code examples.',
   # ... templates
 }
+
 ```

 #### Creative Writing
@@ -283,12 +330,12 @@ Prescient.configure do |config|
   context_configs: {
     'product' => {
       fields: %w[name description price category brand],
-      format: '%{name} by %{brand}: %{description} - $%{price} (%{category})',
+      format: '%{name} by %{brand}: %{description} - $%{price} (%{category})',
       embedding_fields: %w[name description category brand]
     },
     'review' => {
       fields: %w[product_name rating review_text reviewer_name],
-      format: '%{product_name} - %{rating}/5 stars: "%{review_text}"',
+      format: '%{product_name} - %{rating}/5 stars: "%{review_text}"',
       embedding_fields: %w[product_name review_text]
     }
   }
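
A sketch of how a context config like the one above could be applied to a record: the `format` string renders the display text, while `embedding_fields` selects the values joined into embedding input. Hypothetical glue code under those assumptions, not the gem's internals:

```ruby
product_config = {
  format: '%{name} by %{brand}: %{description} - $%{price} (%{category})',
  embedding_fields: %w[name description category brand]
}

item = { 'name' => 'Widget', 'brand' => 'Acme', 'description' => 'A sturdy widget',
         'price' => '9.99', 'category' => 'tools' }

# Display string via named format references
display = format(product_config[:format], item.transform_keys(&:to_sym))

# Text fed to the embedding model
embedding_text = product_config[:embedding_fields].map { |f| item[f] }.join(' ')
```
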
@@ -409,10 +456,10 @@ query_embedding = client.generate_embedding(query_text)
 query_vector = "[#{query_embedding.join(',')}]"

 results = db.exec_params(
-  "SELECT d.title, d.content, de.embedding <=> $1::vector AS distance
-   FROM documents d
-   JOIN document_embeddings de ON d.id = de.document_id
-   ORDER BY de.embedding <=> $1::vector
+  "SELECT d.title, d.content, de.embedding <=> $1::vector AS distance
+   FROM documents d
+   JOIN document_embeddings de ON d.id = de.document_id
+   ORDER BY de.embedding <=> $1::vector
    LIMIT 5",
   [query_vector]
 )
@@ -423,14 +470,14 @@ results = db.exec_params(
 pgvector supports three distance functions:

 - **Cosine Distance** (`<=>`): Best for normalized embeddings
-- **L2 Distance** (`<->`): Euclidean distance, good general purpose
+- **L2 Distance** (`<->`): Euclidean distance, good general purpose
 - **Inner Product** (`<#>`): Dot product, useful for specific cases

 ```sql
 -- Cosine similarity (most common)
 ORDER BY embedding <=> query_vector

--- L2 distance
+-- L2 distance
 ORDER BY embedding <-> query_vector

 -- Inner product
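
For intuition, here is what the three operators compute, worked on two small unit vectors in plain Ruby (no pgvector required):

```ruby
a = [1.0, 0.0]
b = [0.6, 0.8]

dot = a.zip(b).sum { |x, y| x * y }                 # 0.6
l2  = Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 }) # ~0.894
cos = 1 - dot / (Math.sqrt(a.sum { |x| x**2 }) *
                 Math.sqrt(b.sum { |x| x**2 }))     # 0.4

puts "cosine distance (<=>): #{cos.round(3)}"
puts "L2 distance (<->):     #{l2.round(3)}"
# pgvector's <#> returns the negated dot product, so an ascending
# ORDER BY still ranks the best matches first:
puts "inner product (<#>):   #{(-dot).round(3)}"
```
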
@@ -443,8 +490,8 @@ The setup automatically creates HNSW indexes for fast similarity search:

 ```sql
 -- Example index for cosine distance
-CREATE INDEX idx_embeddings_cosine
-ON document_embeddings
+CREATE INDEX idx_embeddings_cosine
+ON document_embeddings
 USING hnsw (embedding vector_cosine_ops)
 WITH (m = 16, ef_construction = 64);
 ```
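
In HNSW, `m` is the number of graph links per node and `ef_construction` is the build-time search breadth; raising either improves recall at the cost of memory and build time. A minimal sketch of issuing the same statement from Ruby, assuming the `db` connection (pg gem) used elsewhere in this README:

```ruby
# Create the HNSW index from Ruby; SQL is taken from the hunk above.
db.exec(<<~SQL)
  CREATE INDEX IF NOT EXISTS idx_embeddings_cosine
  ON document_embeddings
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);
SQL
```
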
@@ -457,22 +504,22 @@ Combine vector similarity with metadata filtering:
 # Search with tag filtering
 results = db.exec_params(
   "SELECT d.title, de.embedding <=> $1::vector as distance
-   FROM documents d
+   FROM documents d
    JOIN document_embeddings de ON d.id = de.document_id
    WHERE d.metadata->'tags' ? 'programming'
-   ORDER BY de.embedding <=> $1::vector
+   ORDER BY de.embedding <=> $1::vector
    LIMIT 5",
   [query_vector]
 )

-# Search with difficulty and tag filters
+# Search with difficulty and tag filters
 results = db.exec_params(
   "SELECT d.title, de.embedding <=> $1::vector as distance
-   FROM documents d
+   FROM documents d
    JOIN document_embeddings de ON d.id = de.document_id
    WHERE d.metadata->>'difficulty' = 'beginner'
    AND d.metadata->'tags' ?| $2::text[]
-   ORDER BY de.embedding <=> $1::vector
+   ORDER BY de.embedding <=> $1::vector
    LIMIT 5",
   [query_vector, ['ruby', 'programming']]
 )
@@ -488,7 +535,7 @@ For large datasets, tune HNSW parameters:
 -- High accuracy (slower build, more memory)
 WITH (m = 32, ef_construction = 128)

--- Fast build (lower accuracy, less memory)
+-- Fast build (lower accuracy, less memory)
 WITH (m = 8, ef_construction = 32)

 -- Balanced (recommended default)
@@ -502,9 +549,9 @@ WITH (m = 16, ef_construction = 64)
 SET hnsw.ef_search = 100; -- Higher = more accurate, slower

 -- Use EXPLAIN ANALYZE to optimize queries
-EXPLAIN ANALYZE
-SELECT * FROM document_embeddings
-ORDER BY embedding <=> '[0.1,0.2,...]'::vector
+EXPLAIN ANALYZE
+SELECT * FROM document_embeddings
+ORDER BY embedding <=> '[0.1,0.2,...]'::vector
 LIMIT 10;
 ```

@@ -516,14 +563,14 @@ For large documents, use chunking for better search granularity:
 def chunk_document(text, chunk_size: 500, overlap: 50)
   chunks = []
   start = 0
-
+
   while start < text.length
     end_pos = [start + chunk_size, text.length].min
     chunk = text[start...end_pos]
     chunks << chunk
     start += chunk_size - overlap
   end
-
+
   chunks
 end

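
A quick usage check of `chunk_document` as defined in the hunk above: with `chunk_size: 500` and `overlap: 50`, chunks start at offsets 0, 450, 900, and so on.

```ruby
text = "lorem " * 200                # 1,200 characters of filler
chunks = chunk_document(text, chunk_size: 500, overlap: 50)
puts chunks.length                   # => 3
chunks.each { |c| puts c.length }    # => 500, 500, 300
```
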
@@ -548,6 +595,7 @@ DB_HOST=localhost ruby examples/vector_search.rb
 ```

 The example demonstrates:
+
 - Document embedding generation and storage
 - Similarity search with different distance functions
 - Metadata filtering and advanced queries
@@ -596,7 +644,7 @@ info = client.provider_info
 puts info[:name] # => :ollama
 puts info[:class] # => "Prescient::Ollama::Provider"
 puts info[:available] # => true
-puts info[:options] # => {...} (excluding sensitive data)
+puts info[:options] # => {...} (excluding sensitive data)
 ```

 ## Provider-Specific Features
@@ -633,6 +681,7 @@ The easiest way to get started with Prescient and Ollama is using Docker Compose
 Before starting, ensure your system meets the minimum requirements for running Ollama:

 #### **Minimum Requirements:**
+
 - **CPU**: 4+ cores (x86_64 or ARM64)
 - **RAM**: 8GB+ (16GB recommended)
 - **Storage**: 10GB+ free space for models
@@ -640,20 +689,22 @@ Before starting, ensure your system meets the minimum requirements for running Ollama:

 #### **Model-Specific Requirements:**

-| Model
-
-| `nomic-embed-text` | 1GB
-| `llama3.1:8b`
-| `llama3.1:70b`
-| `codellama:7b`
+| Model              | RAM Required | Storage | Notes                              |
+| ------------------ | ------------ | ------- | ---------------------------------- |
+| `nomic-embed-text` | 1GB          | 274MB   | Embedding model                    |
+| `llama3.1:8b`      | 8GB          | 4.7GB   | Chat model (8B parameters)         |
+| `llama3.1:70b`     | 64GB+        | 40GB    | Large chat model (70B parameters)  |
+| `codellama:7b`     | 8GB          | 3.8GB   | Code generation model              |

 #### **Performance Recommendations:**
+
 - **SSD Storage**: Significantly faster model loading
 - **GPU (Optional)**: NVIDIA GPU with 8GB+ VRAM for acceleration
 - **Network**: Stable internet for initial model downloads
 - **Docker**: 4GB+ memory limit configured

 #### **GPU Acceleration (Optional):**
+
 - **NVIDIA GPU**: RTX 3060+ with 8GB+ VRAM recommended
 - **CUDA**: Version 11.8+ required
 - **Docker**: NVIDIA Container Toolkit installed
@@ -664,24 +715,27 @@ Before starting, ensure your system meets the minimum requirements for running Ollama:
 ### Quick Start with Docker

 1. **Start Ollama service:**
+
    ```bash
    docker-compose up -d ollama
    ```

 2. **Pull required models:**
+
    ```bash
    # Automatic setup
    docker-compose up ollama-init
-
+
    # Or manual setup
    ./scripts/setup-ollama-models.sh
    ```

 3. **Run examples:**
+
    ```bash
    # Set environment variable
    export OLLAMA_URL=http://localhost:11434
-
+
    # Run examples
    ruby examples/custom_contexts.rb
    ```
@@ -702,9 +756,9 @@ The included `docker-compose.yml` provides:
 services:
   ollama:
     ports:
-      - "11434:11434"
+      - "11434:11434" # Ollama API port
     volumes:
-      - ollama_data:/root/.ollama
+      - ollama_data:/root/.ollama # Persist models
     environment:
       - OLLAMA_HOST=0.0.0.0
       - OLLAMA_ORIGINS=*
@@ -749,7 +803,7 @@ curl http://localhost:11434/api/tags
 # Pull a specific model
 curl -X POST http://localhost:11434/api/pull \
   -H "Content-Type: application/json" \
-  -d '{"name": "llama3.1:8b"}'
+  -d '{"name": "llama3.1:8b"}'

 # Health check
 curl http://localhost:11434/api/version
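
The same pull request can be issued from Ruby with only the standard library; a minimal sketch mirroring the curl call above (endpoint and payload taken from the hunk; timeout value is an assumption):

```ruby
require 'net/http'
require 'json'

uri = URI('http://localhost:11434/api/pull')
req = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json')
req.body = { name: 'llama3.1:8b' }.to_json

Net::HTTP.start(uri.host, uri.port) do |http|
  http.read_timeout = 600       # model pulls can take several minutes
  res = http.request(req)
  puts res.code                 # => "200" on success
end
```
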
@@ -770,6 +824,7 @@ For production use:
 #### **Common Issues:**

 **Out of Memory Errors:**
+
 ```bash
 # Check available memory
 free -h
@@ -782,6 +837,7 @@ OLLAMA_CHAT_MODEL=llama3.1:7b ruby examples/custom_contexts.rb
 ```

 **Slow Model Loading:**
+
 ```bash
 # Check disk I/O
 iostat -x 1
@@ -791,6 +847,7 @@ iostat -x 1
 ```

 **Model Download Failures:**
+
 ```bash
 # Check disk space
 df -h
@@ -800,6 +857,7 @@ docker exec prescient-ollama ollama pull llama3.1:8b
 ```

 **GPU Not Detected:**
+
 ```bash
 # Check NVIDIA Docker runtime
 docker run --rm --gpus all nvidia/cuda:11.8-base nvidia-smi
@@ -820,7 +878,7 @@ docker logs prescient-ollama
 # Test API response time
 time curl -X POST http://localhost:11434/api/generate \
   -H "Content-Type: application/json" \
-  -d '{"model": "llama3.1:8b", "prompt": "Hello", "stream": false}'
+  -d '{"model": "llama3.1:8b", "prompt": "Hello", "stream": false}'
 ```

 ## Testing
data/Rakefile
CHANGED
data/VECTOR_SEARCH_GUIDE.md
CHANGED
@@ -130,17 +130,17 @@ query_embedding = client.generate_embedding(query_text)
 query_vector = "[#{query_embedding.join(',')}]"

 results = db.exec_params(
-  "SELECT d.title, d.content, de.embedding <=> $1::vector AS distance
-   FROM documents d
-   JOIN document_embeddings de ON d.id = de.document_id
-   ORDER BY de.embedding <=> $1::vector
+  "SELECT d.title, d.content, de.embedding <=> $1::vector AS distance
+   FROM documents d
+   JOIN document_embeddings de ON d.id = de.document_id
+   ORDER BY de.embedding <=> $1::vector
    LIMIT 5",
   [query_vector]
 )

 results.each do |row|
   similarity = 1 - row['distance'].to_f
-  puts "#{row['title']} (#{(similarity * 100).round(1)}% similar)"
+  puts "#{row['title']} (#{(similarity * 100).round(1)}% similar)"
 end
 ```

@@ -150,11 +150,11 @@ end
 # Search with metadata filtering
 results = db.exec_params(
   "SELECT d.title, de.embedding <=> $1::vector as distance
-   FROM documents d
+   FROM documents d
    JOIN document_embeddings de ON d.id = de.document_id
    WHERE d.metadata->'tags' ? 'programming'
    AND d.metadata->>'difficulty' = 'beginner'
-   ORDER BY de.embedding <=> $1::vector
+   ORDER BY de.embedding <=> $1::vector
    LIMIT 10",
   [query_vector]
 )
@@ -168,17 +168,17 @@ For large documents, split into chunks for better search granularity:
 def chunk_document(text, chunk_size: 500, overlap: 50)
   chunks = []
   start = 0
-
+
   while start < text.length
     end_pos = [start + chunk_size, text.length].min
-
+
     # Find word boundary to avoid cutting words
     if end_pos < text.length
       while end_pos > start && text[end_pos] != ' '
        end_pos -= 1
      end
    end
-
+
     chunk = text[start...end_pos].strip
     chunks << {
       text: chunk,
@@ -186,11 +186,11 @@ def chunk_document(text, chunk_size: 500, overlap: 50)
       end_pos: end_pos,
       index: chunks.length
     }
-
+
     start = end_pos - overlap
     break if start >= text.length
   end
-
+
   chunks
 end

@@ -200,14 +200,14 @@ chunks.each do |chunk|
   # Insert chunk
   chunk_result = db.exec_params(
     "INSERT INTO document_chunks (document_id, chunk_index, chunk_text, chunk_metadata) VALUES ($1, $2, $3, $4) RETURNING id",
-    [document_id, chunk[:index], chunk[:text], {start_pos: chunk[:start_pos], end_pos: chunk[:end_pos]}.to_json]
+    [document_id, chunk[:index], chunk[:text], { start_pos: chunk[:start_pos], end_pos: chunk[:end_pos] }.to_json]
   )
   chunk_id = chunk_result[0]['id']
-
+
   # Generate embedding for chunk
   chunk_embedding = client.generate_embedding(chunk[:text])
   chunk_vector = "[#{chunk_embedding.join(',')}]"
-
+
   # Store chunk embedding
   db.exec_params(
     "INSERT INTO chunk_embeddings (chunk_id, document_id, embedding_provider, embedding_model, embedding_dimensions, embedding) VALUES ($1, $2, $3, $4, $5, $6)",
@@ -224,20 +224,20 @@ For different dataset sizes and performance requirements:

 ```sql
 -- Small datasets (< 100K vectors): Fast build, good accuracy
-CREATE INDEX idx_embeddings_small
-ON document_embeddings
+CREATE INDEX idx_embeddings_small
+ON document_embeddings
 USING hnsw (embedding vector_cosine_ops)
 WITH (m = 8, ef_construction = 32);

 -- Medium datasets (100K - 1M vectors): Balanced
-CREATE INDEX idx_embeddings_medium
-ON document_embeddings
+CREATE INDEX idx_embeddings_medium
+ON document_embeddings
 USING hnsw (embedding vector_cosine_ops)
 WITH (m = 16, ef_construction = 64);

 -- Large datasets (> 1M vectors): High accuracy
-CREATE INDEX idx_embeddings_large
-ON document_embeddings
+CREATE INDEX idx_embeddings_large
+ON document_embeddings
 USING hnsw (embedding vector_cosine_ops)
 WITH (m = 32, ef_construction = 128);
 ```
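
The three tiers above translate naturally into a small Ruby helper; a hypothetical convenience sketch (not part of the gem) that picks build parameters from a vector count and creates a matching index via the `db` connection used throughout this guide:

```ruby
# Map the dataset-size tiers from the hunk above to HNSW parameters.
def hnsw_params(vector_count)
  case vector_count
  when 0...100_000         then { m: 8,  ef_construction: 32 }   # small
  when 100_000...1_000_000 then { m: 16, ef_construction: 64 }   # medium
  else                          { m: 32, ef_construction: 128 }  # large
  end
end

params = hnsw_params(250_000)
db.exec(<<~SQL)
  CREATE INDEX idx_embeddings_tuned
  ON document_embeddings
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = #{params[:m]}, ef_construction = #{params[:ef_construction]});
SQL
```
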
@@ -251,9 +251,9 @@ SET hnsw.ef_search = 100; -- Balanced (default)
 SET hnsw.ef_search = 200; -- High accuracy, slower

 -- Monitor query performance
-EXPLAIN (ANALYZE, BUFFERS)
-SELECT * FROM document_embeddings
-ORDER BY embedding <=> '[0.1,0.2,...]'::vector
+EXPLAIN (ANALYZE, BUFFERS)
+SELECT * FROM document_embeddings
+ORDER BY embedding <=> '[0.1,0.2,...]'::vector
 LIMIT 10;
 ```

@@ -268,7 +268,7 @@ texts.each_slice(10) do |batch|
   batch.each do |text|
     embedding = client.generate_embedding(text)
     embeddings << embedding
-
+
     # Small delay to avoid rate limiting
     sleep(0.1)
   end
@@ -295,13 +295,13 @@ Combine vector similarity with traditional text search:
 ```sql
 WITH vector_results AS (
   SELECT document_id, embedding <=> $1::vector as distance
-  FROM document_embeddings
-  ORDER BY embedding <=> $1::vector
+  FROM document_embeddings
+  ORDER BY embedding <=> $1::vector
   LIMIT 20
 ),
 text_results AS (
   SELECT id as document_id, ts_rank(to_tsvector(content), plainto_tsquery($2)) as rank
-  FROM documents
+  FROM documents
   WHERE to_tsvector(content) @@ plainto_tsquery($2)
 )
 SELECT d.title, d.content,
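
Invoking a hybrid query like this from Ruby follows the same `exec_params` pattern used elsewhere in this guide: `$1` binds the pgvector literal and `$2` the raw query text. A sketch assuming the full CTE is kept in a file (the path is hypothetical):

```ruby
hybrid_sql = File.read('sql/hybrid_search.sql')  # hypothetical location of the CTE above
results = db.exec_params(hybrid_sql, [query_vector, query_text])

results.each do |row|
  puts "#{row['title']} (matched by vector and text search)"
end
```
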
@@ -328,10 +328,10 @@ providers = [

 providers.each do |provider|
   next unless provider[:client].available?
-
+
   embedding = provider[:client].generate_embedding(text)
   vector_str = "[#{embedding.join(',')}]"
-
+
   db.exec_params(
     "INSERT INTO document_embeddings (document_id, embedding_provider, embedding_model, embedding_dimensions, embedding, embedding_text) VALUES ($1, $2, $3, $4, $5, $6)",
     [document_id, provider[:name], provider[:model], provider[:dims], vector_str, text]
@@ -348,14 +348,14 @@ end
 def track_search(query_text, results, provider, model)
   query_embedding = client.generate_embedding(query_text)
   query_vector = "[#{query_embedding.join(',')}]"
-
+
   # Insert search query
   query_result = db.exec_params(
     "INSERT INTO search_queries (query_text, embedding_provider, embedding_model, query_embedding, result_count) VALUES ($1, $2, $3, $4, $5) RETURNING id",
     [query_text, provider, model, query_vector, results.length]
   )
   query_id = query_result[0]['id']
-
+
   # Insert query results
   results.each_with_index do |result, index|
     db.exec_params(
@@ -371,14 +371,14 @@ end
 ```sql
 -- Popular search terms
 SELECT query_text, COUNT(*) as search_count
-FROM search_queries
+FROM search_queries
 WHERE created_at > NOW() - INTERVAL '7 days'
 GROUP BY query_text
 ORDER BY search_count DESC
 LIMIT 10;

 -- Average similarity scores
-SELECT embedding_provider, embedding_model,
+SELECT embedding_provider, embedding_model,
        AVG(similarity_score) as avg_similarity,
        COUNT(*) as result_count
 FROM query_results qr
@@ -400,11 +400,12 @@ ORDER BY hour;
 ### Common Issues

 **Slow queries:**
+
 ```sql
 -- Check if indexes are being used
-EXPLAIN (ANALYZE, BUFFERS)
-SELECT * FROM document_embeddings
-ORDER BY embedding <=> '[...]'::vector
+EXPLAIN (ANALYZE, BUFFERS)
+SELECT * FROM document_embeddings
+ORDER BY embedding <=> '[...]'::vector
 LIMIT 10;

 -- Rebuild indexes if needed
@@ -412,10 +413,11 @@ REINDEX INDEX idx_document_embeddings_cosine;
 ```

 **Memory issues:**
+
 ```sql
 -- Check index sizes
 SELECT schemaname, tablename, indexname, pg_size_pretty(pg_relation_size(indexrelid)) as size
-FROM pg_stat_user_indexes
+FROM pg_stat_user_indexes
 WHERE tablename LIKE '%embedding%'
 ORDER BY pg_relation_size(indexrelid) DESC;

@@ -424,6 +426,7 @@ SET work_mem = '256MB';
 ```

 **Dimension mismatches:**
+
 ```ruby
 # Validate embedding dimensions before storing
 expected_dims = 768
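
The hunk above shows only the start of the validation snippet; a fuller sketch of the same check, raising before an INSERT that pgvector's fixed-dimension column would reject anyway (the error class is an assumption; substitute your own):

```ruby
expected_dims = 768

embedding = client.generate_embedding(text)
unless embedding.length == expected_dims
  # Fail fast instead of letting the database reject the vector.
  raise Prescient::Error,
        "expected #{expected_dims} dimensions, got #{embedding.length}"
end
```
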
@@ -447,4 +450,4 @@ end
 - [pgvector Documentation](https://github.com/pgvector/pgvector)
 - [HNSW Algorithm](https://arxiv.org/abs/1603.09320)
 - [Vector Database Concepts](https://www.pinecone.io/learn/vector-database/)
-- [Embedding Best Practices](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings)
+- [Embedding Best Practices](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings)