prescient 0.0.0 → 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
data/README.md CHANGED
@@ -1,43 +1,889 @@
1
1
  # Prescient
2
2
 
3
- TODO: Delete this and the text below, and describe your gem
3
+ Prescient provides a unified interface for AI providers including Ollama (local), Anthropic Claude, OpenAI GPT, and HuggingFace models. It is built for applications that need AI predictions with provider switching, robust error handling, and fallback mechanisms.
4
4
 
5
- Welcome to your new gem! In this directory, you'll find the files you need to be able to package up your Ruby library into a gem. Put your Ruby code in the file `lib/prescient`. To experiment with that code, run `bin/console` for an interactive prompt.
5
+ ## Features
6
+
7
+ - **Unified Interface**: Single API for multiple AI providers
8
+ - **Local and Cloud Support**: Ollama for local/private deployments, cloud APIs for scale
9
+ - **Embedding Generation**: Vector embeddings for semantic search and AI applications
10
+ - **Text Completion**: Chat completions with context support
11
+ - **Error Handling**: Robust error handling with automatic retries
12
+ - **Health Monitoring**: Built-in health checks for all providers
13
+ - **Flexible Configuration**: Environment variable and programmatic configuration
14
+
15
+ ## Supported Providers
16
+
17
+ ### Ollama (Local)
18
+
19
+ - **Models**: Any Ollama-compatible model (llama3.1, nomic-embed-text, etc.)
20
+ - **Capabilities**: Embeddings, Text Generation, Model Management
21
+ - **Use Case**: Privacy-focused, local deployments
22
+
23
+ ### Anthropic Claude
24
+
25
+ - **Models**: Claude 3 (Haiku, Sonnet, Opus)
26
+ - **Capabilities**: Text Generation only (no embeddings)
27
+ - **Use Case**: High-quality conversational AI
28
+
29
+ ### OpenAI
30
+
31
+ - **Models**: GPT-3.5, GPT-4, text-embedding-3-small/large
32
+ - **Capabilities**: Embeddings, Text Generation
33
+ - **Use Case**: Proven performance, wide model selection
34
+
35
+ ### HuggingFace
36
+
37
+ - **Models**: sentence-transformers, open-source chat models
38
+ - **Capabilities**: Embeddings, Text Generation
39
+ - **Use Case**: Open-source models, research
6
40
 
7
41
  ## Installation
8
42
 
9
- TODO: Replace `UPDATE_WITH_YOUR_GEM_NAME_IMMEDIATELY_AFTER_RELEASE_TO_RUBYGEMS_ORG` with your gem name right after releasing it to RubyGems.org. Please do not do it earlier due to security reasons. Alternatively, replace this section with instructions to install your gem from git if you don't plan to release to RubyGems.org.
43
+ Add this line to your application's Gemfile:
44
+
45
+ ```ruby
46
+ gem 'prescient'
47
+ ```
48
+
49
+ And then execute:
50
+
51
+ ```bash
52
+ bundle install
53
+ ```
10
54
 
11
- Install the gem and add to the application's Gemfile by executing:
55
+ Or install it yourself as:
12
56
 
13
57
  ```bash
14
- bundle add UPDATE_WITH_YOUR_GEM_NAME_IMMEDIATELY_AFTER_RELEASE_TO_RUBYGEMS_ORG
58
+ gem install prescient
15
59
  ```
16
60
 
17
- If bundler is not being used to manage dependencies, install the gem by executing:
61
+ ## Configuration
62
+
63
+ ### Environment Variables
18
64
 
19
65
  ```bash
20
- gem install UPDATE_WITH_YOUR_GEM_NAME_IMMEDIATELY_AFTER_RELEASE_TO_RUBYGEMS_ORG
66
+ # Ollama (Local)
67
+ OLLAMA_URL=http://localhost:11434
68
+ OLLAMA_EMBEDDING_MODEL=nomic-embed-text
69
+ OLLAMA_CHAT_MODEL=llama3.1:8b
70
+
71
+ # Anthropic
72
+ ANTHROPIC_API_KEY=your_api_key
73
+ ANTHROPIC_MODEL=claude-3-haiku-20240307
74
+
75
+ # OpenAI
76
+ OPENAI_API_KEY=your_api_key
77
+ OPENAI_EMBEDDING_MODEL=text-embedding-3-small
78
+ OPENAI_CHAT_MODEL=gpt-3.5-turbo
79
+
80
+ # HuggingFace
81
+ HUGGINGFACE_API_KEY=your_api_key
82
+ HUGGINGFACE_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
83
+ HUGGINGFACE_CHAT_MODEL=microsoft/DialoGPT-medium
84
+ ```
85
+
86
+ ### Programmatic Configuration
87
+
88
+ ```ruby
89
+ require 'prescient'
90
+
91
+ # Configure providers
92
+ Prescient.configure do |config|
93
+   config.default_provider = :ollama
94
+   config.timeout = 60
95
+   config.retry_attempts = 3
96
+   config.retry_delay = 1.0
97
+
98
+   # Add custom Ollama configuration
99
+   config.add_provider(:ollama, Prescient::Ollama::Provider,
100
+     url: 'http://localhost:11434',
101
+     embedding_model: 'nomic-embed-text',
102
+     chat_model: 'llama3.1:8b'
103
+   )
104
+
105
+   # Add Anthropic
106
+   config.add_provider(:anthropic, Prescient::Anthropic::Provider,
107
+     api_key: ENV['ANTHROPIC_API_KEY'],
108
+     model: 'claude-3-haiku-20240307'
109
+   )
110
+
111
+   # Add OpenAI
112
+   config.add_provider(:openai, Prescient::OpenAI::Provider,
113
+     api_key: ENV['OPENAI_API_KEY'],
114
+     embedding_model: 'text-embedding-3-small',
115
+     chat_model: 'gpt-3.5-turbo'
116
+   )
117
+ end
21
118
  ```
22
119
 
23
120
  ## Usage
24
121
 
25
- TODO: Write usage instructions here
122
+ ### Quick Start
123
+
124
+ ```ruby
125
+ require 'prescient'
126
+
127
+ # Use default provider (Ollama)
128
+ client = Prescient.client
129
+
130
+ # Generate embeddings
131
+ embedding = client.generate_embedding("Your text here")
132
+ # => [0.1, 0.2, 0.3, ...] (768-dimensional vector)
133
+
134
+ # Generate text responses
135
+ response = client.generate_response("What is Ruby?")
136
+ puts response[:response]
137
+ # => "Ruby is a dynamic, open-source programming language..."
138
+
139
+ # Health check
140
+ health = client.health_check
141
+ puts health[:status] # => "healthy"
142
+ ```
143
+
144
+ ### Provider-Specific Usage
145
+
146
+ ```ruby
147
+ # Use specific provider
148
+ openai_client = Prescient.client(:openai)
149
+ anthropic_client = Prescient.client(:anthropic)
150
+
151
+ # Direct method calls
152
+ embedding = Prescient.generate_embedding("text", provider: :openai)
153
+ response = Prescient.generate_response("prompt", provider: :anthropic)
154
+ ```
155
+
156
+ ### Context-Aware Generation
157
+
158
+ ```ruby
159
+ # Generate embeddings for document chunks
160
+ documents = ["Document 1 content", "Document 2 content"]
161
+ embeddings = documents.map { |doc| Prescient.generate_embedding(doc) }
162
+
163
+ # Later, find relevant context and generate response
164
+ query = "What is mentioned about Ruby?"
165
+ context_items = find_relevant_documents(query, embeddings) # Your similarity search
166
+
167
+ response = Prescient.generate_response(query, context_items,
168
+   max_tokens: 1000,
169
+   temperature: 0.7
170
+ )
171
+
172
+ puts response[:response]
173
+ puts "Model: #{response[:model]}"
174
+ puts "Provider: #{response[:provider]}"
175
+ ```
176
+
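+ The similarity search itself is up to you. One workable in-memory shape is sketched below; find_relevant_documents is illustrative, not part of the gem's API, and it also needs the original documents, so its signature differs slightly from the call above:
+
+ ```ruby
+ # Illustrative helper: rank stored documents by cosine similarity to the query
+ def find_relevant_documents(query, embeddings, documents:, top_k: 3)
+   query_embedding = Prescient.generate_embedding(query)
+   query_norm = Math.sqrt(query_embedding.sum { |a| a * a })
+
+   scored = documents.zip(embeddings).map do |doc, embedding|
+     dot = query_embedding.zip(embedding).sum { |a, b| a * b }
+     norm = Math.sqrt(embedding.sum { |a| a * a })
+     [doc, dot / (query_norm * norm)]
+   end
+
+   scored.max_by(top_k) { |_doc, score| score }.map(&:first)
+ end
+ ```
+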
177
+ ### Error Handling
178
+
179
+ ```ruby
180
+ begin
181
+   response = client.generate_response("Your prompt")
182
+ rescue Prescient::ConnectionError => e
183
+   puts "Connection failed: #{e.message}"
184
+ rescue Prescient::RateLimitError => e
185
+   puts "Rate limited: #{e.message}"
186
+ rescue Prescient::AuthenticationError => e
187
+   puts "Auth failed: #{e.message}"
188
+ rescue Prescient::Error => e
189
+   puts "General error: #{e.message}"
190
+ end
191
+ ```
192
+
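+ These error classes also make the fallback mechanisms mentioned in the introduction easy to build. A minimal sketch (the provider order is illustrative):
+
+ ```ruby
+ # Try providers in order, falling back on connection or rate-limit errors
+ def generate_with_fallback(prompt, providers: [:ollama, :openai, :anthropic])
+   providers.each do |provider|
+     return Prescient.generate_response(prompt, provider: provider)
+   rescue Prescient::ConnectionError, Prescient::RateLimitError => e
+     warn "#{provider} failed (#{e.message}), trying next provider"
+   end
+   raise Prescient::Error, 'all providers failed'
+ end
+ ```
+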
193
+ ### Health Monitoring
194
+
195
+ ```ruby
196
+ # Check all providers
197
+ [:ollama, :anthropic, :openai, :huggingface].each do |provider|
198
+   health = Prescient.health_check(provider: provider)
199
+   puts "#{provider}: #{health[:status]}"
200
+   puts "Ready: #{health[:ready]}" if health[:ready]
201
+ end
202
+ ```
203
+
204
+ ## Custom Prompt Templates
205
+
206
+ Prescient allows you to customize the AI assistant's behavior through configurable prompt templates:
207
+
208
+ ```ruby
209
+ Prescient.configure do |config|
210
+   config.add_provider(:customer_service, Prescient::OpenAI::Provider,
211
+     api_key: ENV['OPENAI_API_KEY'],
212
+     embedding_model: 'text-embedding-3-small',
213
+     chat_model: 'gpt-3.5-turbo',
214
+     prompt_templates: {
215
+       system_prompt: 'You are a friendly customer service representative.',
216
+       no_context_template: <<~TEMPLATE.strip,
217
+         %{system_prompt}
218
+
219
+         Customer Question: %{query}
220
+
221
+         Please provide a helpful response.
222
+       TEMPLATE
223
+       with_context_template: <<~TEMPLATE.strip
224
+         %{system_prompt} Use the company info below to help answer.
225
+
226
+         Company Information:
227
+         %{context}
228
+
229
+         Customer Question: %{query}
230
+
231
+         Respond based on our company policies above.
232
+       TEMPLATE
233
+     }
234
+   )
235
+ end
236
+
237
+ client = Prescient.client(:customer_service)
238
+ response = client.generate_response("What's your return policy?")
239
+ ```
240
+
241
+ ### Template Placeholders
242
+
243
+ - `%{system_prompt}` - The system/role instruction
244
+ - `%{query}` - The user's question
245
+ - `%{context}` - Formatted context items (when provided)
246
+
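+ Placeholders use Ruby's standard %{...} named references, so you can see the substitution behavior with plain Kernel#format, independent of the gem:
+
+ ```ruby
+ template = "%{system_prompt}\n\nCustomer Question: %{query}"
+ puts format(template,
+             system_prompt: 'You are a friendly customer service representative.',
+             query: "What's your return policy?")
+ ```
+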
247
+ ### Template Types
248
+
249
+ - **system_prompt** - Defines the AI's role and behavior
250
+ - **no_context_template** - Used when no context items provided
251
+ - **with_context_template** - Used when context items are provided
252
+
253
+ ### Examples by Use Case
254
+
255
+ #### Technical Documentation
256
+
257
+ ```ruby
258
+ prompt_templates: {
259
+ system_prompt: 'You are a technical documentation assistant. Provide detailed explanations with code examples.',
260
+ # ... templates
261
+ }
262
+ ```
263
+
264
+ #### Creative Writing
265
+
266
+ ```ruby
267
+ prompt_templates: {
268
+ system_prompt: 'You are a creative writing assistant. Be imaginative and inspiring.',
269
+ # ... templates
270
+ }
271
+ ```
272
+
273
+ See `examples/custom_prompts.rb` for complete examples.
274
+
275
+ ## Custom Context Configurations
276
+
277
+ Define how different data types should be formatted and which fields to use for embeddings:
278
+
279
+ ```ruby
280
+ Prescient.configure do |config|
281
+   config.add_provider(:ecommerce, Prescient::OpenAI::Provider,
282
+     api_key: ENV['OPENAI_API_KEY'],
283
+     context_configs: {
284
+       'product' => {
285
+         fields: %w[name description price category brand],
286
+         format: '%{name} by %{brand}: %{description} - $%{price} (%{category})',
287
+         embedding_fields: %w[name description category brand]
288
+       },
289
+       'review' => {
290
+         fields: %w[product_name rating review_text reviewer_name],
291
+         format: '%{product_name} - %{rating}/5 stars: "%{review_text}"',
292
+         embedding_fields: %w[product_name review_text]
293
+       }
294
+     }
295
+   )
296
+ end
297
+
298
+ # Context items with explicit type
299
+ products = [
300
+   {
301
+     'type' => 'product',
302
+     'name' => 'UltraBook Pro',
303
+     'description' => 'High-performance laptop',
304
+     'price' => '1299.99',
305
+     'category' => 'Laptops',
306
+     'brand' => 'TechCorp'
307
+   }
308
+ ]
309
+
310
+ client = Prescient.client(:ecommerce)
311
+ response = client.generate_response("I need a laptop for work", products)
312
+ ```
313
+
314
+ ### Context Configuration Options
315
+
316
+ - **fields** - Array of field names available for this context type
317
+ - **format** - Template string for displaying context items
318
+ - **embedding_fields** - Specific fields to use when generating embeddings
319
+
320
+ ### Automatic Context Detection
321
+
322
+ The system automatically detects context types based on YOUR configured field patterns:
323
+
324
+ 1. **Explicit Type Fields**: Uses `type`, `context_type`, or `model_type` field values
325
+ 2. **Field Matching**: Matches items to configured contexts based on field overlap (≥50% match required; see the sketch below)
326
+ 3. **Default Fallback**: Uses generic formatting when no context configuration matches
327
+
328
+ The system has NO hardcoded context types - it's entirely driven by your configuration!
329
+
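+ As a rough illustration of the field-matching step above (this mirrors the described ≥50% rule, not the gem's actual source):
+
+ ```ruby
+ # Pick the configured context type whose fields best overlap the item's keys
+ def detect_context_type(item, context_configs)
+   explicit = item['type'] || item['context_type'] || item['model_type']
+   return explicit if explicit && context_configs.key?(explicit)
+
+   best_type, best_overlap = context_configs.map { |type, config|
+     [type, (config[:fields] & item.keys).size.to_f / config[:fields].size]
+   }.max_by { |_type, overlap| overlap }
+
+   best_overlap && best_overlap >= 0.5 ? best_type : nil # nil => generic fallback
+ end
+ ```
+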
330
+ ### Without Context Configuration
331
+
332
+ The system works perfectly without any context configuration - it will:
333
+
334
+ - Use intelligent fallback formatting for any hash structure
335
+ - Extract text fields for embeddings while excluding common metadata (id, timestamps, etc.)
336
+ - Provide consistent behavior across different data types
337
+
338
+ ```ruby
339
+ # No context_configs needed - works with any data!
340
+ client = Prescient.client(:default)
341
+ response = client.generate_response("Analyze this", [
342
+   { 'title' => 'Issue', 'content' => 'Server down', 'created_at' => '2024-01-01' },
343
+   { 'name' => 'Alert', 'message' => 'High CPU usage', 'timestamp' => 1234567 }
344
+ ])
345
+ ```
346
+
347
+ See `examples/custom_contexts.rb` for complete examples.
348
+
349
+ ## Vector Database Integration (pgvector)
350
+
351
+ Prescient integrates seamlessly with PostgreSQL's pgvector extension for storing and searching embeddings:
352
+
353
+ ### Setup with Docker
354
+
355
+ The included `docker-compose.yml` provides a complete setup with PostgreSQL + pgvector:
356
+
357
+ ```bash
358
+ # Start PostgreSQL with pgvector
359
+ docker-compose up -d postgres
360
+
361
+ # The database will automatically:
362
+ # - Install pgvector extension
363
+ # - Create tables for documents and embeddings
364
+ # - Set up optimized vector indexes
365
+ # - Insert sample data for testing
366
+ ```
367
+
368
+ ### Database Schema
369
+
370
+ The setup creates these key tables:
371
+
372
+ - **`documents`** - Store original content and metadata
373
+ - **`document_embeddings`** - Store vector embeddings for documents
374
+ - **`document_chunks`** - Break large documents into searchable chunks
375
+ - **`chunk_embeddings`** - Store embeddings for document chunks
376
+ - **`search_queries`** - Track search queries and performance
377
+ - **`query_results`** - Store search results for analysis
378
+
379
+ ### Vector Search Example
380
+
381
+ ```ruby
382
+ require 'prescient'
383
+ require 'pg'
384
+
385
+ # Connect to database
386
+ db = PG.connect(
387
+   host: 'localhost',
388
+   port: 5432,
389
+   dbname: 'prescient_development',
390
+   user: 'prescient',
391
+   password: 'prescient_password'
392
+ )
393
+
394
+ # Generate embedding for a document
395
+ client = Prescient.client(:ollama)
396
+ text = "Ruby is a dynamic programming language"
397
+ embedding = client.generate_embedding(text)
398
+
399
+ # Store embedding in database
400
+ vector_str = "[#{embedding.join(',')}]"
401
+ db.exec_params(
402
+   "INSERT INTO document_embeddings (document_id, embedding_provider, embedding_model, embedding_dimensions, embedding, embedding_text) VALUES ($1, $2, $3, $4, $5, $6)",
403
+   [doc_id, 'ollama', 'nomic-embed-text', 768, vector_str, text] # doc_id: id of an existing documents row
404
+ )
405
+
406
+ # Perform similarity search
407
+ query_text = "What is Ruby programming?"
408
+ query_embedding = client.generate_embedding(query_text)
409
+ query_vector = "[#{query_embedding.join(',')}]"
410
+
411
+ results = db.exec_params(
412
+   "SELECT d.title, d.content, de.embedding <=> $1::vector AS distance
413
+    FROM documents d
414
+    JOIN document_embeddings de ON d.id = de.document_id
415
+    ORDER BY de.embedding <=> $1::vector
416
+    LIMIT 5",
417
+   [query_vector]
418
+ )
419
+ ```
420
+
421
+ ### Distance Functions
422
+
423
+ pgvector supports three distance functions:
424
+
425
+ - **Cosine Distance** (`<=>`): Best for normalized embeddings
426
+ - **L2 Distance** (`<->`): Euclidean distance, good general purpose
427
+ - **Inner Product** (`<#>`): Negative inner product; matches cosine ranking when vectors are normalized
428
+
429
+ ```sql
430
+ -- Cosine similarity (most common)
431
+ ORDER BY embedding <=> query_vector
432
+
433
+ -- L2 distance
434
+ ORDER BY embedding <-> query_vector
435
+
436
+ -- Inner product
437
+ ORDER BY embedding <#> query_vector
438
+ ```
439
+
440
+ ### Vector Indexes
441
+
442
+ The setup automatically creates HNSW indexes for fast similarity search:
443
+
444
+ ```sql
445
+ -- Example index for cosine distance
446
+ CREATE INDEX idx_embeddings_cosine
447
+   ON document_embeddings
448
+   USING hnsw (embedding vector_cosine_ops)
449
+   WITH (m = 16, ef_construction = 64);
450
+ ```
451
+
452
+ ### Advanced Search with Filters
453
+
454
+ Combine vector similarity with metadata filtering:
455
+
456
+ ```ruby
457
+ # Search with tag filtering
458
+ results = db.exec_params(
459
+   "SELECT d.title, de.embedding <=> $1::vector AS distance
460
+    FROM documents d
461
+    JOIN document_embeddings de ON d.id = de.document_id
462
+    WHERE d.metadata->'tags' ? 'programming'
463
+    ORDER BY de.embedding <=> $1::vector
464
+    LIMIT 5",
465
+   [query_vector]
466
+ )
467
+
468
+ # Search with difficulty and tag filters
469
+ results = db.exec_params(
470
+   "SELECT d.title, de.embedding <=> $1::vector AS distance
471
+    FROM documents d
472
+    JOIN document_embeddings de ON d.id = de.document_id
473
+    WHERE d.metadata->>'difficulty' = 'beginner'
474
+      AND d.metadata->'tags' ?| $2::text[]
475
+    ORDER BY de.embedding <=> $1::vector
476
+    LIMIT 5",
477
+   [query_vector, '{ruby,programming}'] # Postgres array literal for the text[] parameter
478
+ )
479
+ ```
480
+
481
+ ### Performance Optimization
482
+
483
+ #### Index Configuration
484
+
485
+ For large datasets, tune HNSW parameters:
486
+
487
+ ```sql
488
+ -- High accuracy (slower build, more memory)
489
+ WITH (m = 32, ef_construction = 128)
490
+
491
+ -- Fast build (lower accuracy, less memory)
492
+ WITH (m = 8, ef_construction = 32)
493
+
494
+ -- Balanced (recommended default)
495
+ WITH (m = 16, ef_construction = 64)
496
+ ```
497
+
498
+ #### Query Performance
499
+
500
+ ```sql
501
+ -- Set ef_search for query-time accuracy/speed tradeoff
502
+ SET hnsw.ef_search = 100; -- Higher = more accurate, slower
503
+
504
+ -- Use EXPLAIN ANALYZE to optimize queries
505
+ EXPLAIN ANALYZE
506
+ SELECT * FROM document_embeddings
507
+ ORDER BY embedding <=> '[0.1,0.2,...]'::vector
508
+ LIMIT 10;
509
+ ```
510
+
511
+ #### Chunking Strategy
512
+
513
+ For large documents, use chunking for better search granularity:
514
+
515
+ ```ruby
516
+ def chunk_document(text, chunk_size: 500, overlap: 50)
517
+   chunks = []
518
+   start = 0
519
+
520
+   while start < text.length
521
+     end_pos = [start + chunk_size, text.length].min
522
+     chunk = text[start...end_pos]
523
+     chunks << chunk
524
+     start += chunk_size - overlap
525
+   end
526
+
527
+   chunks
528
+ end
529
+
530
+ # Generate embeddings for each chunk
531
+ chunks = chunk_document(document.content)
532
+ chunks.each_with_index do |chunk, index|
533
+   embedding = client.generate_embedding(chunk)
534
+   # Store chunk and embedding...
535
+ end
536
+ ```
537
+
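+ The stored chunks map naturally onto the document_chunks and chunk_embeddings tables from the schema above. A sketch, assuming simple column names (chunk_index, content, embedding) that may differ from your actual schema:
+
+ ```ruby
+ chunks.each_with_index do |chunk, index|
+   embedding = client.generate_embedding(chunk)
+   chunk_id = db.exec_params(
+     "INSERT INTO document_chunks (document_id, chunk_index, content)
+      VALUES ($1, $2, $3) RETURNING id",
+     [doc_id, index, chunk] # doc_id: id of the parent documents row
+   )[0]['id']
+
+   db.exec_params(
+     "INSERT INTO chunk_embeddings (chunk_id, embedding) VALUES ($1, $2::vector)",
+     [chunk_id, "[#{embedding.join(',')}]"]
+   )
+ end
+ ```
+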
538
+ ### Example Usage
539
+
540
+ Run the complete vector search example:
541
+
542
+ ```bash
543
+ # Start services
544
+ docker-compose up -d postgres ollama
545
+
546
+ # Run example
547
+ DB_HOST=localhost ruby examples/vector_search.rb
548
+ ```
549
+
550
+ The example demonstrates:
551
+ - Document embedding generation and storage
552
+ - Similarity search with different distance functions
553
+ - Metadata filtering and advanced queries
554
+ - Performance comparison between approaches
555
+
556
+ ## Advanced Usage
557
+
558
+ ### Custom Provider Implementation
559
+
560
+ ```ruby
561
+ class MyCustomProvider < Prescient::BaseProvider
562
+   def generate_embedding(text, **options)
563
+     # Your implementation
564
+   end
565
+
566
+   def generate_response(prompt, context_items = [], **options)
567
+     # Your implementation
568
+   end
569
+
570
+   def health_check
571
+     # Your implementation
572
+   end
573
+
574
+   protected
575
+
576
+   def validate_configuration!
577
+     # Validate required options
578
+   end
579
+ end
580
+
581
+ # Register your provider
582
+ Prescient.configure do |config|
583
+   config.add_provider(:mycustom, MyCustomProvider,
584
+     api_key: 'your_key',
585
+     model: 'your_model'
586
+   )
587
+ end
588
+ ```
589
+
590
+ ### Provider Information
591
+
592
+ ```ruby
593
+ client = Prescient.client(:ollama)
594
+ info = client.provider_info
595
+
596
+ puts info[:name] # => :ollama
597
+ puts info[:class] # => "Prescient::Ollama::Provider"
598
+ puts info[:available] # => true
599
+ puts info[:options] # => {...} (excluding sensitive data)
600
+ ```
601
+
602
+ ## Provider-Specific Features
603
+
604
+ ### Ollama
605
+
606
+ - Model management: `pull_model`, `list_models` (see the sketch below)
607
+ - Local deployment support
608
+ - No API costs
609
+
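+ For example, assuming `pull_model` takes the model name and `list_models` takes no arguments (check the provider class for the exact signatures):
+
+ ```ruby
+ client = Prescient.client(:ollama)
+ client.pull_model('llama3.1:8b') # download the model to the local Ollama instance
+ puts client.list_models          # inspect which models are available locally
+ ```
+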
610
+ ### Anthropic
611
+
612
+ - High-quality responses
613
+ - No embedding support (use with OpenAI/HuggingFace for embeddings, as sketched below)
614
+
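+ In practice that pairing uses the per-call provider selection shown earlier (`document_text` and `user_prompt` stand in for your own data):
+
+ ```ruby
+ # Claude for generation, OpenAI for embeddings
+ embedding = Prescient.generate_embedding(document_text, provider: :openai)
+ response  = Prescient.generate_response(user_prompt, provider: :anthropic)
+ ```
+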
615
+ ### OpenAI
616
+
617
+ - Multiple embedding model sizes
618
+ - Latest GPT models
619
+ - Reliable performance
620
+
621
+ ### HuggingFace
622
+
623
+ - Open-source models
624
+ - Research-friendly
625
+ - Free tier available
626
+
627
+ ## Docker Setup (Recommended for Ollama)
628
+
629
+ The easiest way to get started with Prescient and Ollama is using Docker Compose:
630
+
631
+ ### Hardware Requirements
632
+
633
+ Before starting, ensure your system meets the minimum requirements for running Ollama:
634
+
635
+ #### **Minimum Requirements:**
636
+ - **CPU**: 4+ cores (x86_64 or ARM64)
637
+ - **RAM**: 8GB+ (16GB recommended)
638
+ - **Storage**: 10GB+ free space for models
639
+ - **OS**: Linux, macOS, or Windows with Docker
640
+
641
+ #### **Model-Specific Requirements:**
642
+
643
+ | Model | RAM Required | Storage | Notes |
644
+ |-------|-------------|---------|-------|
645
+ | `nomic-embed-text` | 1GB | 274MB | Embedding model |
646
+ | `llama3.1:8b` | 8GB | 4.7GB | Chat model (8B parameters) |
647
+ | `llama3.1:70b` | 64GB+ | 40GB | Large chat model (70B parameters) |
648
+ | `codellama:7b` | 8GB | 3.8GB | Code generation model |
649
+
650
+ #### **Performance Recommendations:**
651
+ - **SSD Storage**: Significantly faster model loading
652
+ - **GPU (Optional)**: NVIDIA GPU with 8GB+ VRAM for acceleration
653
+ - **Network**: Stable internet for initial model downloads
654
+ - **Docker**: 4GB+ memory limit configured
655
+
656
+ #### **GPU Acceleration (Optional):**
657
+ - **NVIDIA GPU**: RTX 3060+ with 8GB+ VRAM recommended
658
+ - **CUDA**: Version 11.8+ required
659
+ - **Docker**: NVIDIA Container Toolkit installed
660
+ - **Performance**: 3-10x faster inference with compatible models
661
+
662
+ > **💡 Tip**: Start with smaller models like `llama3.1:8b` and upgrade based on your hardware capabilities and performance needs.
663
+
664
+ ### Quick Start with Docker
665
+
666
+ 1. **Start Ollama service:**
667
+    ```bash
668
+    docker-compose up -d ollama
669
+    ```
670
+
671
+ 2. **Pull required models:**
672
+    ```bash
673
+    # Automatic setup
674
+    docker-compose up ollama-init
675
+
676
+    # Or manual setup
677
+    ./scripts/setup-ollama-models.sh
678
+    ```
679
+
680
+ 3. **Run examples:**
681
+    ```bash
682
+    # Set environment variable
683
+    export OLLAMA_URL=http://localhost:11434
684
+
685
+    # Run examples
686
+    ruby examples/custom_contexts.rb
687
+    ```
688
+
689
+ ### Docker Compose Services
690
+
691
+ The included `docker-compose.yml` provides:
692
+
693
+ - **ollama**: Ollama AI service with persistent model storage
694
+ - **ollama-init**: Automatically pulls required models on startup
695
+ - **redis**: Optional caching layer for embeddings
696
+ - **prescient-app**: Example Ruby application container
697
+
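+ If you enable the redis service, a simple cache in front of embedding generation might look like this (a sketch using the redis gem; the key scheme and TTL are your choice):
+
+ ```ruby
+ require 'redis'
+ require 'json'
+ require 'digest'
+
+ redis = Redis.new(url: 'redis://localhost:6379')
+ client = Prescient.client(:ollama)
+
+ def cached_embedding(redis, client, text)
+   key = "embedding:#{Digest::SHA256.hexdigest(text)}"
+   cached = redis.get(key)
+   return JSON.parse(cached) if cached
+
+   embedding = client.generate_embedding(text)
+   redis.set(key, embedding.to_json, ex: 86_400) # cache for one day
+   embedding
+ end
+ ```
+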
698
+ ### Configuration Options
699
+
700
+ ```yaml
701
+ # docker-compose.yml environment variables
702
+ services:
703
+   ollama:
704
+     ports:
705
+       - "11434:11434" # Ollama API port
706
+     volumes:
707
+       - ollama_data:/root/.ollama # Persist models
708
+     environment:
709
+       - OLLAMA_HOST=0.0.0.0
710
+       - OLLAMA_ORIGINS=*
711
+ ```
712
+
713
+ ### GPU Support (Optional)
714
+
715
+ For GPU acceleration, uncomment the GPU configuration in `docker-compose.yml`:
716
+
717
+ ```yaml
718
+ services:
719
+   ollama:
720
+     deploy:
721
+       resources:
722
+         reservations:
723
+           devices:
724
+             - driver: nvidia
725
+               count: 1
726
+               capabilities: [gpu]
727
+ ```
728
+
729
+ ### Environment Variables
730
+
731
+ ```bash
732
+ # Ollama Configuration
733
+ OLLAMA_URL=http://localhost:11434
734
+ OLLAMA_EMBEDDING_MODEL=nomic-embed-text
735
+ OLLAMA_CHAT_MODEL=llama3.1:8b
736
+
737
+ # Optional: Other AI providers
738
+ OPENAI_API_KEY=your_key_here
739
+ ANTHROPIC_API_KEY=your_key_here
740
+ HUGGINGFACE_API_KEY=your_key_here
741
+ ```
742
+
743
+ ### Model Management
744
+
745
+ ```bash
746
+ # Check available models
747
+ curl http://localhost:11434/api/tags
748
+
749
+ # Pull a specific model
750
+ curl -X POST http://localhost:11434/api/pull \
751
+   -H "Content-Type: application/json" \
752
+   -d '{"name": "llama3.1:8b"}'
753
+
754
+ # Health check
755
+ curl http://localhost:11434/api/version
756
+ ```
757
+
758
+ ### Production Deployment
759
+
760
+ For production use:
761
+
762
+ 1. Use specific image tags instead of `latest`
763
+ 2. Configure proper resource limits
764
+ 3. Set up monitoring and logging
765
+ 4. Use secrets management for API keys
766
+ 5. Configure backups for model data
767
+
768
+ ### Troubleshooting
769
+
770
+ #### **Common Issues:**
771
+
772
+ **Out of Memory Errors:**
773
+ ```bash
774
+ # Check available memory
775
+ free -h
776
+
777
+ # Increase Docker memory limit (Docker Desktop)
778
+ # Settings > Resources > Memory: 8GB+
779
+
780
+ # Use a smaller model if hardware is limited
781
+ OLLAMA_CHAT_MODEL=llama3.2:3b ruby examples/custom_contexts.rb
782
+ ```
783
+
784
+ **Slow Model Loading:**
785
+ ```bash
786
+ # Check disk I/O
787
+ iostat -x 1
788
+
789
+ # Move Docker data to SSD if on HDD
790
+ # Docker Desktop: Settings > Resources > Disk image location
791
+ ```
792
+
793
+ **Model Download Failures:**
794
+ ```bash
795
+ # Check disk space
796
+ df -h
797
+
798
+ # Manually pull models with retry
799
+ docker exec prescient-ollama ollama pull llama3.1:8b
800
+ ```
801
+
802
+ **GPU Not Detected:**
803
+ ```bash
804
+ # Check NVIDIA Docker runtime
805
+ docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
806
+
807
+ # Install NVIDIA Container Toolkit if missing
808
+ # https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
809
+ ```
810
+
811
+ #### **Performance Monitoring:**
812
+
813
+ ```bash
814
+ # Monitor resource usage
815
+ docker stats prescient-ollama
816
+
817
+ # Check Ollama logs
818
+ docker logs prescient-ollama
819
+
820
+ # Test API response time
821
+ time curl -X POST http://localhost:11434/api/generate \
822
+   -H "Content-Type: application/json" \
823
+   -d '{"model": "llama3.1:8b", "prompt": "Hello", "stream": false}'
824
+ ```
825
+
826
+ ## Testing
827
+
828
+ The gem includes comprehensive test coverage:
829
+
830
+ ```bash
831
+ bundle exec rspec
832
+ ```
26
833
 
27
834
  ## Development
28
835
 
29
- After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
836
+ After checking out the repo, run:
837
+
838
+ ```bash
839
+ bundle install
840
+ ```
30
841
 
31
- To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
842
+ To install this gem onto your local machine:
843
+
844
+ ```bash
845
+ bundle exec rake install
846
+ ```
32
847
 
33
848
  ## Contributing
34
849
 
35
- Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/prescient. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [code of conduct](https://github.com/[USERNAME]/prescient/blob/main/CODE_OF_CONDUCT.md).
850
+ 1. Fork it
851
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
852
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
853
+ 4. Push to the branch (`git push origin my-new-feature`)
854
+ 5. Create a new Pull Request
36
855
 
37
856
  ## License
38
857
 
39
858
  The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
40
859
 
41
- ## Code of Conduct
860
+ ## Roadmap
861
+
862
+ ### Version 0.2.0 (Planned)
863
+
864
+ - **MariaDB Vector Support**: Integration with MariaDB using external vector databases
865
+ - **Hybrid Database Architecture**: Support for MariaDB + Milvus/Qdrant combinations
866
+ - **Vector Database Adapters**: Pluggable adapters for different vector storage backends
867
+ - **Enhanced Chunking Strategies**: Smart document splitting with multiple algorithms
868
+ - **Search Result Ranking**: Advanced scoring and re-ranking capabilities
869
+
870
+ ### Version 0.3.0 (Future)
871
+
872
+ - **Streaming Responses**: Real-time response streaming for chat applications
873
+ - **Multi-Model Ensembles**: Combine responses from multiple AI providers
874
+ - **Advanced Analytics**: Search performance insights and usage analytics
875
+ - **Cloud Provider Integration**: Direct support for Pinecone, Weaviate, etc.
876
+
877
+ ## Changelog
878
+
879
+ ### Version 0.1.0
42
880
 
43
- Everyone interacting in the Prescient project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/[USERNAME]/prescient/blob/main/CODE_OF_CONDUCT.md).
881
+ - Initial release
882
+ - Support for Ollama, Anthropic, OpenAI, and HuggingFace
883
+ - Unified interface for embeddings and text generation
884
+ - Comprehensive error handling and retry logic
885
+ - Health monitoring capabilities
886
+ - PostgreSQL pgvector integration with complete Docker setup
887
+ - Vector similarity search with multiple distance functions
888
+ - Document chunking and metadata filtering
889
+ - Performance optimization guides and troubleshooting