prescient 0.0.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
data/README.md CHANGED

# Prescient

Prescient provides a unified interface to AI providers, including Ollama (local), Anthropic Claude, OpenAI GPT, and HuggingFace models. It is built for applications that need AI predictions with provider switching, error handling, and fallback mechanisms.

## Features

- **Unified Interface**: Single API for multiple AI providers
- **Local and Cloud Support**: Ollama for local/private deployments, cloud APIs for scale
- **Embedding Generation**: Vector embeddings for semantic search and AI applications
- **Text Completion**: Chat completions with context support
- **Error Handling**: Robust error handling with automatic retries
- **Health Monitoring**: Built-in health checks for all providers
- **Flexible Configuration**: Environment variable and programmatic configuration

## Supported Providers

### Ollama (Local)

- **Models**: Any Ollama-compatible model (llama3.1, nomic-embed-text, etc.)
- **Capabilities**: Embeddings, Text Generation, Model Management
- **Use Case**: Privacy-focused, local deployments

### Anthropic Claude

- **Models**: Claude 3 (Haiku, Sonnet, Opus)
- **Capabilities**: Text Generation only (no embeddings)
- **Use Case**: High-quality conversational AI

### OpenAI

- **Models**: GPT-3.5, GPT-4, text-embedding-3-small/large
- **Capabilities**: Embeddings, Text Generation
- **Use Case**: Proven performance, wide model selection

### HuggingFace

- **Models**: sentence-transformers, open-source chat models
- **Capabilities**: Embeddings, Text Generation
- **Use Case**: Open-source models, research

## Installation

Add this line to your application's Gemfile:

```ruby
gem 'prescient'
```

And then execute:

```bash
bundle install
```

Or install it yourself as:

```bash
gem install prescient
```

## Configuration

### Environment Variables

```bash
# Ollama (Local)
OLLAMA_URL=http://localhost:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
OLLAMA_CHAT_MODEL=llama3.1:8b

# Anthropic
ANTHROPIC_API_KEY=your_api_key
ANTHROPIC_MODEL=claude-3-haiku-20240307

# OpenAI
OPENAI_API_KEY=your_api_key
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
OPENAI_CHAT_MODEL=gpt-3.5-turbo

# HuggingFace
HUGGINGFACE_API_KEY=your_api_key
HUGGINGFACE_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
HUGGINGFACE_CHAT_MODEL=microsoft/DialoGPT-medium
```

### Programmatic Configuration

```ruby
require 'prescient'

# Configure providers
Prescient.configure do |config|
  config.default_provider = :ollama
  config.timeout = 60
  config.retry_attempts = 3
  config.retry_delay = 1.0

  # Add custom Ollama configuration
  config.add_provider(:ollama, Prescient::Provider::Ollama,
    url: 'http://localhost:11434',
    embedding_model: 'nomic-embed-text',
    chat_model: 'llama3.1:8b'
  )

  # Add Anthropic
  config.add_provider(:anthropic, Prescient::Provider::Anthropic,
    api_key: ENV['ANTHROPIC_API_KEY'],
    model: 'claude-3-haiku-20240307'
  )

  # Add OpenAI
  config.add_provider(:openai, Prescient::Provider::OpenAI,
    api_key: ENV['OPENAI_API_KEY'],
    embedding_model: 'text-embedding-3-small',
    chat_model: 'gpt-3.5-turbo'
  )
end
```

### Provider Fallback Configuration

Prescient supports automatic fallback to backup providers when the primary provider fails. This improves availability for your AI applications.

```ruby
Prescient.configure do |config|
  # Configure the primary provider
  config.add_provider(:primary, Prescient::Provider::OpenAI,
    api_key: ENV['OPENAI_API_KEY'],
    embedding_model: 'text-embedding-3-small',
    chat_model: 'gpt-3.5-turbo'
  )

  # Configure backup providers
  config.add_provider(:backup1, Prescient::Provider::Anthropic,
    api_key: ENV['ANTHROPIC_API_KEY'],
    model: 'claude-3-haiku-20240307'
  )

  config.add_provider(:backup2, Prescient::Provider::Ollama,
    url: 'http://localhost:11434',
    embedding_model: 'nomic-embed-text',
    chat_model: 'llama3.1:8b'
  )

  # Configure fallback order
  config.fallback_providers = [:backup1, :backup2]
end

# Client with fallback enabled (default)
client = Prescient::Client.new(:primary, enable_fallback: true)

# Client without fallback
client_no_fallback = Prescient::Client.new(:primary, enable_fallback: false)

# Convenience methods also support fallback
response = Prescient.generate_response("Hello", provider: :primary, enable_fallback: true)
```

**Fallback Behavior:**

- When a provider fails with a persistent error, Prescient automatically tries the next available provider
- Only available (healthy) providers are tried during fallback
- If no fallback providers are configured, all available providers are tried as fallbacks
- Transient errors (rate limits, timeouts) still use retry logic before fallback
- The fallback process preserves all method arguments and options
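
If retries and every fallback provider are exhausted, the call raises. A minimal sketch of guarding against that case, using the `Prescient::Error` base class from the error-handling section below:

```ruby
client = Prescient::Client.new(:primary, enable_fallback: true)

begin
  response = client.generate_response("Summarize today's tickets")
  puts response[:response]
rescue Prescient::Error => e
  # Reached only after retries and all fallback providers have failed
  warn "All providers failed: #{e.message}"
end
```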

## Usage

### Quick Start

```ruby
require 'prescient'

# Use default provider (Ollama)
client = Prescient.client

# Generate embeddings
embedding = client.generate_embedding("Your text here")
# => [0.1, 0.2, 0.3, ...] (768-dimensional vector)

# Generate text responses
response = client.generate_response("What is Ruby?")
puts response[:response]
# => "Ruby is a dynamic, open-source programming language..."

# Health check
health = client.health_check
puts health[:status] # => "healthy"
```

### Provider-Specific Usage

```ruby
# Use specific provider
openai_client = Prescient.client(:openai)
anthropic_client = Prescient.client(:anthropic)

# Direct method calls
embedding = Prescient.generate_embedding("text", provider: :openai)
response = Prescient.generate_response("prompt", provider: :anthropic)
```

### Context-Aware Generation

```ruby
# Generate embeddings for document chunks
documents = ["Document 1 content", "Document 2 content"]
embeddings = documents.map { |doc| Prescient.generate_embedding(doc) }

# Later, find relevant context and generate response
query = "What is mentioned about Ruby?"
context_items = find_relevant_documents(query, embeddings) # Your similarity search

response = Prescient.generate_response(query, context_items,
  max_tokens: 1000,
  temperature: 0.7
)

puts response[:response]
puts "Model: #{response[:model]}"
puts "Provider: #{response[:provider]}"
```
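
The `find_relevant_documents` helper above is left to you. A minimal in-memory sketch using cosine similarity (illustrative only; any vector store works, and the pgvector section below shows a database-backed approach):

```ruby
# Score every document embedding against the query embedding using
# cosine similarity, then return the top_k best matches.
def find_relevant_documents(query, embeddings, documents: nil, top_k: 3)
  documents ||= (0...embeddings.length).to_a # fall back to indices
  query_embedding = Prescient.generate_embedding(query)
  query_norm = Math.sqrt(query_embedding.sum { |v| v * v })

  scored = embeddings.each_with_index.map do |embedding, i|
    dot  = embedding.zip(query_embedding).sum { |a, b| a * b }
    norm = Math.sqrt(embedding.sum { |v| v * v })
    [documents[i], dot / (norm * query_norm)]
  end

  scored.max_by(top_k) { |_doc, score| score }.map(&:first)
end
```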

### Error Handling

```ruby
begin
  response = client.generate_response("Your prompt")
rescue Prescient::ConnectionError => e
  puts "Connection failed: #{e.message}"
rescue Prescient::RateLimitError => e
  puts "Rate limited: #{e.message}"
rescue Prescient::AuthenticationError => e
  puts "Auth failed: #{e.message}"
rescue Prescient::Error => e
  puts "General error: #{e.message}"
end
```
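
Prescient already retries transient errors internally (see `retry_attempts` and `retry_delay` in the configuration section); if you want an additional application-level layer, a sketch with exponential backoff could look like this:

```ruby
# Retry transient errors with exponential backoff: 1s, 2s, 4s, ...
def with_retries(max_attempts: 3, base_delay: 1.0)
  attempts = 0
  begin
    yield
  rescue Prescient::RateLimitError, Prescient::ConnectionError
    attempts += 1
    raise if attempts >= max_attempts
    sleep(base_delay * (2**(attempts - 1)))
    retry
  end
end

response = with_retries { client.generate_response("Your prompt") }
```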

### Health Monitoring

```ruby
# Check all providers
[:ollama, :anthropic, :openai, :huggingface].each do |provider|
  health = Prescient.health_check(provider: provider)
  puts "#{provider}: #{health[:status]}"
  puts "Ready: #{health[:ready]}" if health[:ready]
end
```

## Custom Prompt Templates

Prescient allows you to customize the AI assistant's behavior through configurable prompt templates:

```ruby
Prescient.configure do |config|
  config.add_provider(:customer_service, Prescient::Provider::OpenAI,
    api_key: ENV['OPENAI_API_KEY'],
    embedding_model: 'text-embedding-3-small',
    chat_model: 'gpt-3.5-turbo',
    prompt_templates: {
      system_prompt: 'You are a friendly customer service representative.',
      no_context_template: <<~TEMPLATE.strip,
        %{system_prompt}

        Customer Question: %{query}

        Please provide a helpful response.
      TEMPLATE
      with_context_template: <<~TEMPLATE.strip
        %{system_prompt} Use the company info below to help answer.

        Company Information:
        %{context}

        Customer Question: %{query}

        Respond based on our company policies above.
      TEMPLATE
    }
  )
end

client = Prescient.client(:customer_service)
response = client.generate_response("What's your return policy?")
```

### Template Placeholders

- `%{system_prompt}` - The system/role instruction
- `%{query}` - The user's question
- `%{context}` - Formatted context items (when provided)
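
Placeholders use Ruby's named-reference format syntax, so a filled-in template behaves like this standalone sketch (plain Ruby, not the gem's internals):

```ruby
template = <<~TEMPLATE.strip
  %{system_prompt}

  Customer Question: %{query}

  Please provide a helpful response.
TEMPLATE

puts format(template,
            system_prompt: 'You are a friendly customer service representative.',
            query: "What's your return policy?")
```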

### Template Types

- **system_prompt** - Defines the AI's role and behavior
- **no_context_template** - Used when no context items are provided
- **with_context_template** - Used when context items are provided

### Examples by Use Case

#### Technical Documentation

```ruby
prompt_templates: {
  system_prompt: 'You are a technical documentation assistant. Provide detailed explanations with code examples.',
  # ... templates
}
```

#### Creative Writing

```ruby
prompt_templates: {
  system_prompt: 'You are a creative writing assistant. Be imaginative and inspiring.',
  # ... templates
}
```

See `examples/custom_prompts.rb` for complete examples.

## Custom Context Configurations

Define how different data types should be formatted and which fields to use for embeddings:

```ruby
Prescient.configure do |config|
  config.add_provider(:ecommerce, Prescient::Provider::OpenAI,
    api_key: ENV['OPENAI_API_KEY'],
    context_configs: {
      'product' => {
        fields: %w[name description price category brand],
        format: '%{name} by %{brand}: %{description} - $%{price} (%{category})',
        embedding_fields: %w[name description category brand]
      },
      'review' => {
        fields: %w[product_name rating review_text reviewer_name],
        format: '%{product_name} - %{rating}/5 stars: "%{review_text}"',
        embedding_fields: %w[product_name review_text]
      }
    }
  )
end

# Context items with explicit type
products = [
  {
    'type' => 'product',
    'name' => 'UltraBook Pro',
    'description' => 'High-performance laptop',
    'price' => '1299.99',
    'category' => 'Laptops',
    'brand' => 'TechCorp'
  }
]

client = Prescient.client(:ecommerce)
response = client.generate_response("I need a laptop for work", products)
```

### Context Configuration Options

- **fields** - Array of field names available for this context type
- **format** - Template string for displaying context items
- **embedding_fields** - Specific fields to use when generating embeddings
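
To make the `format` template concrete, here is an illustrative substitution using plain Ruby's `Kernel#format` and the product item above (not the gem's internal code path):

```ruby
config = { format: '%{name} by %{brand}: %{description} - $%{price} (%{category})' }
item   = products.first

puts format(config[:format], item.transform_keys(&:to_sym))
# => "UltraBook Pro by TechCorp: High-performance laptop - $1299.99 (Laptops)"
```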

### Automatic Context Detection

The system automatically detects context types based on **your** configured field patterns:

1. **Explicit Type Fields**: Uses `type`, `context_type`, or `model_type` field values
2. **Field Matching**: Matches items to configured contexts based on field overlap (≥50% match required)
3. **Default Fallback**: Uses generic formatting when no context configuration matches

There are no hardcoded context types; detection is driven entirely by your configuration.
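
To make the matching rules concrete, here is a sketch of such a detector (illustrative only; the gem's actual implementation may differ):

```ruby
# Pick the context config whose fields best overlap the item's keys.
# An explicit type field wins; less than 50% overlap falls back to nil
# (generic formatting).
def detect_context_type(item, context_configs)
  explicit = item['type'] || item['context_type'] || item['model_type']
  return explicit if explicit && context_configs.key?(explicit)

  name, overlap = context_configs.map { |key, config|
    [key, (config[:fields] & item.keys).size.to_f / config[:fields].size]
  }.max_by { |_key, score| score }

  overlap && overlap >= 0.5 ? name : nil
end

configs = { 'product' => { fields: %w[name description price category brand] } }
detect_context_type(products.first, configs) # => 'product'
```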

### Without Context Configuration

The system also works without any context configuration. In that case it will:

- Use intelligent fallback formatting for any hash structure
- Extract text fields for embeddings while excluding common metadata (id, timestamps, etc.)
- Provide consistent behavior across different data types

```ruby
# No context_configs needed - works with any data!
client = Prescient.client(:default)
response = client.generate_response("Analyze this", [
  { 'title' => 'Issue', 'content' => 'Server down', 'created_at' => '2024-01-01' },
  { 'name' => 'Alert', 'message' => 'High CPU usage', 'timestamp' => 1234567 }
])
```

See `examples/custom_contexts.rb` for complete examples.

## Vector Database Integration (pgvector)

Prescient integrates with PostgreSQL's pgvector extension for storing and searching embeddings:

### Setup with Docker

The included `docker-compose.yml` provides a complete setup with PostgreSQL + pgvector:

```bash
# Start PostgreSQL with pgvector
docker-compose up -d postgres

# The database will automatically:
# - Install the pgvector extension
# - Create tables for documents and embeddings
# - Set up optimized vector indexes
# - Insert sample data for testing
```

### Database Schema

The setup creates these key tables:

- **`documents`** - Store original content and metadata
- **`document_embeddings`** - Store vector embeddings for documents
- **`document_chunks`** - Break large documents into searchable chunks
- **`chunk_embeddings`** - Store embeddings for document chunks
- **`search_queries`** - Track search queries and performance
- **`query_results`** - Store search results for analysis

### Vector Search Example

```ruby
require 'prescient'
require 'pg'

# Connect to database
db = PG.connect(
  host: 'localhost',
  port: 5432,
  dbname: 'prescient_development',
  user: 'prescient',
  password: 'prescient_password'
)

# Generate embedding for a document
client = Prescient.client(:ollama)
text = "Ruby is a dynamic programming language"
embedding = client.generate_embedding(text)

# Store embedding in database
# (doc_id is the primary key of an existing row in the documents table)
vector_str = "[#{embedding.join(',')}]"
db.exec_params(
  "INSERT INTO document_embeddings (document_id, embedding_provider, embedding_model, embedding_dimensions, embedding, embedding_text) VALUES ($1, $2, $3, $4, $5, $6)",
  [doc_id, 'ollama', 'nomic-embed-text', 768, vector_str, text]
)

# Perform similarity search
query_text = "What is Ruby programming?"
query_embedding = client.generate_embedding(query_text)
query_vector = "[#{query_embedding.join(',')}]"

results = db.exec_params(
  "SELECT d.title, d.content, de.embedding <=> $1::vector AS distance
   FROM documents d
   JOIN document_embeddings de ON d.id = de.document_id
   ORDER BY de.embedding <=> $1::vector
   LIMIT 5",
  [query_vector]
)
```

### Distance Functions

pgvector supports three distance functions:

- **Cosine Distance** (`<=>`): Best for normalized embeddings
- **L2 Distance** (`<->`): Euclidean distance, good general purpose
- **Inner Product** (`<#>`): Dot product, useful for specific cases

```sql
-- Cosine distance (most common)
ORDER BY embedding <=> query_vector

-- L2 distance
ORDER BY embedding <-> query_vector

-- Inner product
ORDER BY embedding <#> query_vector
```
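
Note that `<=>` returns a cosine distance (0 for identical direction, up to 2 for opposite vectors), not a similarity. If you need a similarity score, the conversion is a one-liner; this sketch reuses the `results` set from the vector search example above:

```ruby
# Cosine similarity = 1 - cosine distance
results.each do |row|
  similarity = 1.0 - row['distance'].to_f
  puts "#{row['title']}: #{similarity.round(3)}"
end
```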

### Vector Indexes

The setup automatically creates HNSW indexes for fast similarity search:

```sql
-- Example index for cosine distance
CREATE INDEX idx_embeddings_cosine
ON document_embeddings
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
```

### Advanced Search with Filters

Combine vector similarity with metadata filtering:

```ruby
# Search with tag filtering
results = db.exec_params(
  "SELECT d.title, de.embedding <=> $1::vector AS distance
   FROM documents d
   JOIN document_embeddings de ON d.id = de.document_id
   WHERE d.metadata->'tags' ? 'programming'
   ORDER BY de.embedding <=> $1::vector
   LIMIT 5",
  [query_vector]
)

# Search with difficulty and tag filters
# (the text[] parameter is passed as a Postgres array literal)
results = db.exec_params(
  "SELECT d.title, de.embedding <=> $1::vector AS distance
   FROM documents d
   JOIN document_embeddings de ON d.id = de.document_id
   WHERE d.metadata->>'difficulty' = 'beginner'
     AND d.metadata->'tags' ?| $2::text[]
   ORDER BY de.embedding <=> $1::vector
   LIMIT 5",
  [query_vector, '{ruby,programming}']
)
```

### Performance Optimization

#### Index Configuration

For large datasets, tune HNSW parameters:

```sql
-- High accuracy (slower build, more memory)
WITH (m = 32, ef_construction = 128)

-- Fast build (lower accuracy, less memory)
WITH (m = 8, ef_construction = 32)

-- Balanced (recommended default)
WITH (m = 16, ef_construction = 64)
```

#### Query Performance

```sql
-- Set ef_search for the query-time accuracy/speed tradeoff
SET hnsw.ef_search = 100; -- Higher = more accurate, slower

-- Use EXPLAIN ANALYZE to optimize queries
EXPLAIN ANALYZE
SELECT * FROM document_embeddings
ORDER BY embedding <=> '[0.1,0.2,...]'::vector
LIMIT 10;
```

#### Chunking Strategy

For large documents, use chunking for better search granularity:

```ruby
# Split text into fixed-size chunks with some overlap between neighbours
# so context isn't lost at chunk boundaries.
# Note: overlap must be smaller than chunk_size or the loop won't advance.
def chunk_document(text, chunk_size: 500, overlap: 50)
  chunks = []
  start = 0

  while start < text.length
    end_pos = [start + chunk_size, text.length].min
    chunks << text[start...end_pos]
    start += chunk_size - overlap
  end

  chunks
end

# Generate embeddings for each chunk
chunks = chunk_document(document.content)
chunks.each_with_index do |chunk, index|
  embedding = client.generate_embedding(chunk)
  # Store chunk and embedding...
end
```

### Example Usage

Run the complete vector search example:

```bash
# Start services
docker-compose up -d postgres ollama

# Run example
DB_HOST=localhost ruby examples/vector_search.rb
```

The example demonstrates:

- Document embedding generation and storage
- Similarity search with different distance functions
- Metadata filtering and advanced queries
- Performance comparison between approaches

## Advanced Usage

### Custom Provider Implementation

```ruby
class MyCustomProvider < Prescient::BaseProvider
  def generate_embedding(text, **options)
    # Your implementation
  end

  def generate_response(prompt, context_items = [], **options)
    # Your implementation
  end

  def health_check
    # Your implementation
  end

  protected

  def validate_configuration!
    # Validate required options
  end
end

# Register your provider
Prescient.configure do |config|
  config.add_provider(:mycustom, MyCustomProvider,
    api_key: 'your_key',
    model: 'your_model'
  )
end
```
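
For a feel of the full contract, here is a toy provider filled in end to end. It is purely illustrative: the return shapes mirror the examples in this README, and the real base class may require more.

```ruby
class EchoProvider < Prescient::BaseProvider
  def generate_embedding(text, **_options)
    # Fake embedding: first 8 bytes normalized to the 0..1 range
    text.each_byte.take(8).map { |b| b / 255.0 }
  end

  def generate_response(prompt, _context_items = [], **_options)
    { response: "Echo: #{prompt}", model: 'echo-1', provider: :echo }
  end

  def health_check
    { status: 'healthy', ready: true }
  end

  protected

  def validate_configuration!
    true # nothing to validate for this toy provider
  end
end
```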

### Provider Information

```ruby
client = Prescient.client(:ollama)
info = client.provider_info

puts info[:name]      # => :ollama
puts info[:class]     # => "Prescient::Provider::Ollama"
puts info[:available] # => true
puts info[:options]   # => { ... } (excluding sensitive data)
```

## Provider-Specific Features

### Ollama

- Model management: `pull_model`, `list_models` (see the sketch below)
- Local deployment support
- No API costs
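
A hedged usage sketch for the model-management helpers named above (method names are taken from the list; check the gem for exact signatures and return values):

```ruby
ollama = Prescient.client(:ollama)

ollama.pull_model('nomic-embed-text') # download a model to the local Ollama
puts ollama.list_models               # enumerate locally installed models
```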

### Anthropic

- High-quality responses
- No embedding support (use with OpenAI/HuggingFace for embeddings)

### OpenAI

- Multiple embedding model sizes
- Latest GPT models
- Reliable performance

### HuggingFace

- Open-source models
- Research-friendly
- Free tier available

## Docker Setup (Recommended for Ollama)

The easiest way to get started with Prescient and Ollama is using Docker Compose:

### Hardware Requirements

Before starting, ensure your system meets the minimum requirements for running Ollama:

#### Minimum Requirements

- **CPU**: 4+ cores (x86_64 or ARM64)
- **RAM**: 8GB+ (16GB recommended)
- **Storage**: 10GB+ free space for models
- **OS**: Linux, macOS, or Windows with Docker

#### Model-Specific Requirements

| Model              | RAM Required | Storage | Notes                             |
| ------------------ | ------------ | ------- | --------------------------------- |
| `nomic-embed-text` | 1GB          | 274MB   | Embedding model                   |
| `llama3.1:8b`      | 8GB          | 4.7GB   | Chat model (8B parameters)        |
| `llama3.1:70b`     | 64GB+        | 40GB    | Large chat model (70B parameters) |
| `codellama:7b`     | 8GB          | 3.8GB   | Code generation model             |

#### Performance Recommendations

- **SSD Storage**: Significantly faster model loading
- **GPU (Optional)**: NVIDIA GPU with 8GB+ VRAM for acceleration
- **Network**: Stable internet for initial model downloads
- **Docker**: 4GB+ memory limit configured

#### GPU Acceleration (Optional)

- **NVIDIA GPU**: RTX 3060+ with 8GB+ VRAM recommended
- **CUDA**: Version 11.8+ required
- **Docker**: NVIDIA Container Toolkit installed
- **Performance**: 3-10x faster inference with compatible models

> **💡 Tip**: Start with smaller models like `llama3.1:8b` and upgrade based on your hardware capabilities and performance needs.

### Quick Start with Docker

1. **Start Ollama service:**

   ```bash
   docker-compose up -d ollama
   ```

2. **Pull required models:**

   ```bash
   # Automatic setup
   docker-compose up ollama-init

   # Or manual setup
   ./scripts/setup-ollama-models.sh
   ```

3. **Run examples:**

   ```bash
   # Set environment variable
   export OLLAMA_URL=http://localhost:11434

   # Run examples
   ruby examples/custom_contexts.rb
   ```

### Docker Compose Services

The included `docker-compose.yml` provides:

- **ollama**: Ollama AI service with persistent model storage
- **ollama-init**: Automatically pulls required models on startup
- **redis**: Optional caching layer for embeddings
- **prescient-app**: Example Ruby application container

### Configuration Options

```yaml
# docker-compose.yml excerpt
services:
  ollama:
    ports:
      - "11434:11434" # Ollama API port
    volumes:
      - ollama_data:/root/.ollama # Persist models
    environment:
      - OLLAMA_HOST=0.0.0.0
      - OLLAMA_ORIGINS=*
```

### GPU Support (Optional)

For GPU acceleration, uncomment the GPU configuration in `docker-compose.yml`:

```yaml
services:
  ollama:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

### Environment Variables

```bash
# Ollama configuration
OLLAMA_URL=http://localhost:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
OLLAMA_CHAT_MODEL=llama3.1:8b

# Optional: other AI providers
OPENAI_API_KEY=your_key_here
ANTHROPIC_API_KEY=your_key_here
HUGGINGFACE_API_KEY=your_key_here
```

### Model Management

```bash
# Check available models
curl http://localhost:11434/api/tags

# Pull a specific model
curl -X POST http://localhost:11434/api/pull \
  -H "Content-Type: application/json" \
  -d '{"name": "llama3.1:8b"}'

# Health check
curl http://localhost:11434/api/version
```

### Production Deployment

For production use:

1. Use specific image tags instead of `latest`
2. Configure proper resource limits
3. Set up monitoring and logging
4. Use secrets management for API keys
5. Configure backups for model data

### Troubleshooting

#### Common Issues

**Out of Memory Errors:**

```bash
# Check available memory
free -h

# Increase Docker memory limit (Docker Desktop)
# Settings > Resources > Memory: 8GB+

# Use a smaller model if hardware is limited
OLLAMA_CHAT_MODEL=llama3.2:3b ruby examples/custom_contexts.rb
```

**Slow Model Loading:**

```bash
# Check disk I/O
iostat -x 1

# Move Docker data to SSD if on HDD
# Docker Desktop: Settings > Resources > Disk image location
```

**Model Download Failures:**

```bash
# Check disk space
df -h

# Manually pull models with retry
docker exec prescient-ollama ollama pull llama3.1:8b
```

**GPU Not Detected:**

```bash
# Check NVIDIA Docker runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# Install NVIDIA Container Toolkit if missing
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
```

#### Performance Monitoring

```bash
# Monitor resource usage
docker stats prescient-ollama

# Check Ollama logs
docker logs prescient-ollama

# Test API response time
time curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.1:8b", "prompt": "Hello", "stream": false}'
```

## Testing

The gem includes comprehensive test coverage:

```bash
bundle exec rspec
```

## Development

After checking out the repo, run:

```bash
bundle install
```

To install this gem onto your local machine:

```bash
bundle exec rake install
```

## Contributing

1. Fork it
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create a new Pull Request

## License

The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).

## Roadmap

### Version 0.2.0 (Planned)

- **MariaDB Vector Support**: Integration with MariaDB using external vector databases
- **Hybrid Database Architecture**: Support for MariaDB + Milvus/Qdrant combinations
- **Vector Database Adapters**: Pluggable adapters for different vector storage backends
- **Enhanced Chunking Strategies**: Smart document splitting with multiple algorithms
- **Search Result Ranking**: Advanced scoring and re-ranking capabilities

### Version 0.3.0 (Future)

- **Streaming Responses**: Real-time response streaming for chat applications
- **Multi-Model Ensembles**: Combine responses from multiple AI providers
- **Advanced Analytics**: Search performance insights and usage analytics
- **Cloud Provider Integration**: Direct support for Pinecone, Weaviate, etc.

## Changelog

### Version 0.1.0

- Initial release
- Support for Ollama, Anthropic, OpenAI, and HuggingFace
- Unified interface for embeddings and text generation
- Comprehensive error handling and retry logic
- Health monitoring capabilities
- PostgreSQL pgvector integration with complete Docker setup
- Vector similarity search with multiple distance functions
- Document chunking and metadata filtering
- Performance optimization guides and troubleshooting