prescient 0.0.0 → 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.env.example +37 -0
- data/.rubocop.yml +326 -0
- data/Dockerfile.example +41 -0
- data/README.md +859 -13
- data/Rakefile +25 -3
- data/VECTOR_SEARCH_GUIDE.md +450 -0
- data/db/init/01_enable_pgvector.sql +30 -0
- data/db/init/02_create_schema.sql +108 -0
- data/db/init/03_create_indexes.sql +96 -0
- data/db/init/04_insert_sample_data.sql +121 -0
- data/db/migrate/001_create_prescient_tables.rb +158 -0
- data/docker-compose.yml +153 -0
- data/examples/basic_usage.rb +123 -0
- data/examples/custom_contexts.rb +355 -0
- data/examples/custom_prompts.rb +212 -0
- data/examples/vector_search.rb +330 -0
- data/lib/prescient/base.rb +270 -0
- data/lib/prescient/client.rb +107 -0
- data/lib/prescient/provider/anthropic.rb +146 -0
- data/lib/prescient/provider/huggingface.rb +202 -0
- data/lib/prescient/provider/ollama.rb +172 -0
- data/lib/prescient/provider/openai.rb +181 -0
- data/lib/prescient/version.rb +1 -1
- data/lib/prescient.rb +84 -2
- data/prescient.gemspec +51 -0
- data/scripts/setup-ollama-models.sh +77 -0
- metadata +215 -12
- data/.vscode/settings.json +0 -1
data/README.md
CHANGED
@@ -1,43 +1,889 @@
# Prescient

Prescient provides a unified interface for AI providers including Ollama (local), Anthropic Claude, OpenAI GPT, and HuggingFace models. Built for prescient applications that need AI predictions with provider switching, error handling, and fallback mechanisms.

## Features

- **Unified Interface**: Single API for multiple AI providers
- **Local and Cloud Support**: Ollama for local/private deployments, cloud APIs for scale
- **Embedding Generation**: Vector embeddings for semantic search and AI applications
- **Text Completion**: Chat completions with context support
- **Error Handling**: Robust error handling with automatic retries
- **Health Monitoring**: Built-in health checks for all providers
- **Flexible Configuration**: Environment variable and programmatic configuration

## Supported Providers

### Ollama (Local)

- **Models**: Any Ollama-compatible model (llama3.1, nomic-embed-text, etc.)
- **Capabilities**: Embeddings, Text Generation, Model Management
- **Use Case**: Privacy-focused, local deployments

### Anthropic Claude

- **Models**: Claude 3 (Haiku, Sonnet, Opus)
- **Capabilities**: Text Generation only (no embeddings)
- **Use Case**: High-quality conversational AI

### OpenAI

- **Models**: GPT-3.5, GPT-4, text-embedding-3-small/large
- **Capabilities**: Embeddings, Text Generation
- **Use Case**: Proven performance, wide model selection

### HuggingFace

- **Models**: sentence-transformers, open-source chat models
- **Capabilities**: Embeddings, Text Generation
- **Use Case**: Open-source models, research

## Installation

Add this line to your application's Gemfile:

```ruby
gem 'prescient'
```

And then execute:

```bash
bundle install
```

Or install it yourself as:

```bash
gem install prescient
```

## Configuration

### Environment Variables

```bash
# Ollama (Local)
OLLAMA_URL=http://localhost:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
OLLAMA_CHAT_MODEL=llama3.1:8b

# Anthropic
ANTHROPIC_API_KEY=your_api_key
ANTHROPIC_MODEL=claude-3-haiku-20240307

# OpenAI
OPENAI_API_KEY=your_api_key
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
OPENAI_CHAT_MODEL=gpt-3.5-turbo

# HuggingFace
HUGGINGFACE_API_KEY=your_api_key
HUGGINGFACE_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
HUGGINGFACE_CHAT_MODEL=microsoft/DialoGPT-medium
```

### Programmatic Configuration

```ruby
require 'prescient'

# Configure providers
Prescient.configure do |config|
  config.default_provider = :ollama
  config.timeout = 60
  config.retry_attempts = 3
  config.retry_delay = 1.0

  # Add custom Ollama configuration
  config.add_provider(:ollama, Prescient::Ollama::Provider,
    url: 'http://localhost:11434',
    embedding_model: 'nomic-embed-text',
    chat_model: 'llama3.1:8b'
  )

  # Add Anthropic
  config.add_provider(:anthropic, Prescient::Anthropic::Provider,
    api_key: ENV['ANTHROPIC_API_KEY'],
    model: 'claude-3-haiku-20240307'
  )

  # Add OpenAI
  config.add_provider(:openai, Prescient::OpenAI::Provider,
    api_key: ENV['OPENAI_API_KEY'],
    embedding_model: 'text-embedding-3-small',
    chat_model: 'gpt-3.5-turbo'
  )
end
```

## Usage

### Quick Start

```ruby
require 'prescient'

# Use default provider (Ollama)
client = Prescient.client

# Generate embeddings
embedding = client.generate_embedding("Your text here")
# => [0.1, 0.2, 0.3, ...] (768-dimensional vector)

# Generate text responses
response = client.generate_response("What is Ruby?")
puts response[:response]
# => "Ruby is a dynamic, open-source programming language..."

# Health check
health = client.health_check
puts health[:status] # => "healthy"
```

### Provider-Specific Usage

```ruby
# Use specific provider
openai_client = Prescient.client(:openai)
anthropic_client = Prescient.client(:anthropic)

# Direct method calls
embedding = Prescient.generate_embedding("text", provider: :openai)
response = Prescient.generate_response("prompt", provider: :anthropic)
```

### Context-Aware Generation

```ruby
# Generate embeddings for document chunks
documents = ["Document 1 content", "Document 2 content"]
embeddings = documents.map { |doc| Prescient.generate_embedding(doc) }

# Later, find relevant context and generate response
query = "What is mentioned about Ruby?"
context_items = find_relevant_documents(query, embeddings) # Your similarity search

response = Prescient.generate_response(query, context_items,
  max_tokens: 1000,
  temperature: 0.7
)

puts response[:response]
puts "Model: #{response[:model]}"
puts "Provider: #{response[:provider]}"
```

### Error Handling

```ruby
begin
  response = client.generate_response("Your prompt")
rescue Prescient::ConnectionError => e
  puts "Connection failed: #{e.message}"
rescue Prescient::RateLimitError => e
  puts "Rate limited: #{e.message}"
rescue Prescient::AuthenticationError => e
  puts "Auth failed: #{e.message}"
rescue Prescient::Error => e
  puts "General error: #{e.message}"
end
```
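
Because all provider-specific errors inherit from `Prescient::Error`, the same rescue pattern can drive a simple provider fallback chain. The sketch below is application-level code, not a built-in gem feature; it only reuses the `Prescient.generate_response(provider:)` call and the error classes shown in this README.

```ruby
# Minimal fallback sketch: try providers in order, return the first success.
def generate_with_fallback(prompt, providers: [:ollama, :openai, :anthropic])
  last_error = nil

  providers.each do |provider|
    return Prescient.generate_response(prompt, provider: provider)
  rescue Prescient::Error => e
    last_error = e
    warn "#{provider} failed: #{e.message} - trying next provider"
  end

  raise last_error
end

response = generate_with_fallback("What is Ruby?")
puts response[:response]
```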

### Health Monitoring

```ruby
# Check all providers
[:ollama, :anthropic, :openai, :huggingface].each do |provider|
  health = Prescient.health_check(provider: provider)
  puts "#{provider}: #{health[:status]}"
  puts "Ready: #{health[:ready]}" if health[:ready]
end
```

## Custom Prompt Templates

Prescient allows you to customize the AI assistant's behavior through configurable prompt templates:

```ruby
Prescient.configure do |config|
  config.add_provider(:customer_service, Prescient::Provider::OpenAI,
    api_key: ENV['OPENAI_API_KEY'],
    embedding_model: 'text-embedding-3-small',
    chat_model: 'gpt-3.5-turbo',
    prompt_templates: {
      system_prompt: 'You are a friendly customer service representative.',
      no_context_template: <<~TEMPLATE.strip,
        %{system_prompt}

        Customer Question: %{query}

        Please provide a helpful response.
      TEMPLATE
      with_context_template: <<~TEMPLATE.strip
        %{system_prompt} Use the company info below to help answer.

        Company Information:
        %{context}

        Customer Question: %{query}

        Respond based on our company policies above.
      TEMPLATE
    }
  )
end

client = Prescient.client(:customer_service)
response = client.generate_response("What's your return policy?")
```

### Template Placeholders

- `%{system_prompt}` - The system/role instruction
- `%{query}` - The user's question
- `%{context}` - Formatted context items (when provided)

### Template Types

- **system_prompt** - Defines the AI's role and behavior
- **no_context_template** - Used when no context items provided
- **with_context_template** - Used when context items are provided

### Examples by Use Case

#### Technical Documentation

```ruby
prompt_templates: {
  system_prompt: 'You are a technical documentation assistant. Provide detailed explanations with code examples.',
  # ... templates
}
```

#### Creative Writing

```ruby
prompt_templates: {
  system_prompt: 'You are a creative writing assistant. Be imaginative and inspiring.',
  # ... templates
}
```

See `examples/custom_prompts.rb` for complete examples.

## Custom Context Configurations

Define how different data types should be formatted and which fields to use for embeddings:

```ruby
Prescient.configure do |config|
  config.add_provider(:ecommerce, Prescient::Provider::OpenAI,
    api_key: ENV['OPENAI_API_KEY'],
    context_configs: {
      'product' => {
        fields: %w[name description price category brand],
        format: '%{name} by %{brand}: %{description} - $%{price} (%{category})',
        embedding_fields: %w[name description category brand]
      },
      'review' => {
        fields: %w[product_name rating review_text reviewer_name],
        format: '%{product_name} - %{rating}/5 stars: "%{review_text}"',
        embedding_fields: %w[product_name review_text]
      }
    }
  )
end

# Context items with explicit type
products = [
  {
    'type' => 'product',
    'name' => 'UltraBook Pro',
    'description' => 'High-performance laptop',
    'price' => '1299.99',
    'category' => 'Laptops',
    'brand' => 'TechCorp'
  }
]

client = Prescient.client(:ecommerce)
response = client.generate_response("I need a laptop for work", products)
```

### Context Configuration Options

- **fields** - Array of field names available for this context type
- **format** - Template string for displaying context items
- **embedding_fields** - Specific fields to use when generating embeddings
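
The `format` string uses Ruby-style `%{...}` placeholders that are filled from an item's fields. The gem applies its own formatting internally, so the following is only an illustration of how a template and a hash combine:

```ruby
# Illustration of %{...} placeholder substitution using Kernel#format.
item = {
  name: 'UltraBook Pro',
  brand: 'TechCorp',
  description: 'High-performance laptop',
  price: '1299.99',
  category: 'Laptops'
}

template = '%{name} by %{brand}: %{description} - $%{price} (%{category})'
puts format(template, item)
# => "UltraBook Pro by TechCorp: High-performance laptop - $1299.99 (Laptops)"
```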

### Automatic Context Detection

The system automatically detects context types based on *your* configured field patterns:

1. **Explicit Type Fields**: Uses `type`, `context_type`, or `model_type` field values
2. **Field Matching**: Matches items to configured contexts based on field overlap (≥50% match required; see the sketch below)
3. **Default Fallback**: Uses generic formatting when no context configuration matches

The system has *no* hardcoded context types - it's entirely driven by your configuration!
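
As a rough mental model of the field-matching step, the check can be pictured like this. It is not the gem's actual implementation, just an illustrative sketch of the ≥50% overlap rule applied to the `fields` arrays from `context_configs`:

```ruby
# Illustrative only: an item matches a configured context type when at least
# half of that context's configured fields are present in the item.
def detect_context_type(item, context_configs)
  return item['type'] if item['type'] # explicit type field wins

  match = context_configs.find do |_name, config|
    overlap = (config[:fields] & item.keys).size
    overlap >= config[:fields].size / 2.0
  end
  match&.first
end

configs = { 'product' => { fields: %w[name description price category brand] } }
item    = { 'name' => 'UltraBook Pro', 'description' => 'Laptop', 'price' => '1299.99' }
detect_context_type(item, configs)
# => "product" (3 of 5 configured fields present = 60% overlap)
```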

### Without Context Configuration

The system works perfectly without any context configuration - it will:

- Use intelligent fallback formatting for any hash structure
- Extract text fields for embeddings while excluding common metadata (id, timestamps, etc.)
- Provide consistent behavior across different data types

```ruby
# No context_configs needed - works with any data!
client = Prescient.client(:default)
response = client.generate_response("Analyze this", [
  { 'title' => 'Issue', 'content' => 'Server down', 'created_at' => '2024-01-01' },
  { 'name' => 'Alert', 'message' => 'High CPU usage', 'timestamp' => 1234567 }
])
```

See `examples/custom_contexts.rb` for complete examples.

## Vector Database Integration (pgvector)

Prescient integrates seamlessly with PostgreSQL's pgvector extension for storing and searching embeddings:

### Setup with Docker

The included `docker-compose.yml` provides a complete setup with PostgreSQL + pgvector:

```bash
# Start PostgreSQL with pgvector
docker-compose up -d postgres

# The database will automatically:
# - Install pgvector extension
# - Create tables for documents and embeddings
# - Set up optimized vector indexes
# - Insert sample data for testing
```

### Database Schema

The setup creates these key tables:

- **`documents`** - Store original content and metadata
- **`document_embeddings`** - Store vector embeddings for documents
- **`document_chunks`** - Break large documents into searchable chunks
- **`chunk_embeddings`** - Store embeddings for document chunks
- **`search_queries`** - Track search queries and performance
- **`query_results`** - Store search results for analysis

### Vector Search Example

```ruby
require 'prescient'
require 'pg'

# Connect to database
db = PG.connect(
  host: 'localhost',
  port: 5432,
  dbname: 'prescient_development',
  user: 'prescient',
  password: 'prescient_password'
)

# Generate embedding for a document
client = Prescient.client(:ollama)
text = "Ruby is a dynamic programming language"
embedding = client.generate_embedding(text)

# Store the document first so there is a doc_id to reference
# (column names assumed from the queries below)
doc_id = db.exec_params(
  "INSERT INTO documents (title, content) VALUES ($1, $2) RETURNING id",
  ['Ruby overview', text]
)[0]['id']

# Store embedding in database
vector_str = "[#{embedding.join(',')}]"
db.exec_params(
  "INSERT INTO document_embeddings (document_id, embedding_provider, embedding_model, embedding_dimensions, embedding, embedding_text) VALUES ($1, $2, $3, $4, $5, $6)",
  [doc_id, 'ollama', 'nomic-embed-text', 768, vector_str, text]
)

# Perform similarity search
query_text = "What is Ruby programming?"
query_embedding = client.generate_embedding(query_text)
query_vector = "[#{query_embedding.join(',')}]"

results = db.exec_params(
  "SELECT d.title, d.content, de.embedding <=> $1::vector AS distance
   FROM documents d
   JOIN document_embeddings de ON d.id = de.document_id
   ORDER BY de.embedding <=> $1::vector
   LIMIT 5",
  [query_vector]
)
```

### Distance Functions

pgvector supports three distance functions:

- **Cosine Distance** (`<=>`): Best for normalized embeddings
- **L2 Distance** (`<->`): Euclidean distance, good general purpose
- **Inner Product** (`<#>`): Dot product, useful for specific cases

```sql
-- Cosine similarity (most common)
ORDER BY embedding <=> query_vector

-- L2 distance
ORDER BY embedding <-> query_vector

-- Inner product
ORDER BY embedding <#> query_vector
```

### Vector Indexes

The setup automatically creates HNSW indexes for fast similarity search:

```sql
-- Example index for cosine distance
CREATE INDEX idx_embeddings_cosine
ON document_embeddings
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
```

### Advanced Search with Filters

Combine vector similarity with metadata filtering:

```ruby
# Search with tag filtering
results = db.exec_params(
  "SELECT d.title, de.embedding <=> $1::vector AS distance
   FROM documents d
   JOIN document_embeddings de ON d.id = de.document_id
   WHERE d.metadata->'tags' ? 'programming'
   ORDER BY de.embedding <=> $1::vector
   LIMIT 5",
  [query_vector]
)

# Search with difficulty and tag filters
results = db.exec_params(
  "SELECT d.title, de.embedding <=> $1::vector AS distance
   FROM documents d
   JOIN document_embeddings de ON d.id = de.document_id
   WHERE d.metadata->>'difficulty' = 'beginner'
     AND d.metadata->'tags' ?| $2::text[]
   ORDER BY de.embedding <=> $1::vector
   LIMIT 5",
  [query_vector, '{ruby,programming}'] # text[] parameter passed as a Postgres array literal
)
```

### Performance Optimization

#### Index Configuration

For large datasets, tune HNSW parameters:

```sql
-- High accuracy (slower build, more memory)
WITH (m = 32, ef_construction = 128)

-- Fast build (lower accuracy, less memory)
WITH (m = 8, ef_construction = 32)

-- Balanced (recommended default)
WITH (m = 16, ef_construction = 64)
```

#### Query Performance

```sql
-- Set ef_search for query-time accuracy/speed tradeoff
SET hnsw.ef_search = 100; -- Higher = more accurate, slower

-- Use EXPLAIN ANALYZE to optimize queries
EXPLAIN ANALYZE
SELECT * FROM document_embeddings
ORDER BY embedding <=> '[0.1,0.2,...]'::vector
LIMIT 10;
```

#### Chunking Strategy

For large documents, use chunking for better search granularity:

```ruby
def chunk_document(text, chunk_size: 500, overlap: 50)
  chunks = []
  start = 0

  while start < text.length
    end_pos = [start + chunk_size, text.length].min
    chunk = text[start...end_pos]
    chunks << chunk
    start += chunk_size - overlap
  end

  chunks
end

# Generate embeddings for each chunk
chunks = chunk_document(document.content)
chunks.each_with_index do |chunk, index|
  embedding = client.generate_embedding(chunk)
  # Store chunk and embedding...
end
```

### Example Usage

Run the complete vector search example:

```bash
# Start services
docker-compose up -d postgres ollama

# Run example
DB_HOST=localhost ruby examples/vector_search.rb
```

The example demonstrates:

- Document embedding generation and storage
- Similarity search with different distance functions
- Metadata filtering and advanced queries
- Performance comparison between approaches

## Advanced Usage

### Custom Provider Implementation

```ruby
class MyCustomProvider < Prescient::BaseProvider
  def generate_embedding(text, **options)
    # Your implementation
  end

  def generate_response(prompt, context_items = [], **options)
    # Your implementation
  end

  def health_check
    # Your implementation
  end

  protected

  def validate_configuration!
    # Validate required options
  end
end

# Register your provider
Prescient.configure do |config|
  config.add_provider(:mycustom, MyCustomProvider,
    api_key: 'your_key',
    model: 'your_model'
  )
end
```

### Provider Information

```ruby
client = Prescient.client(:ollama)
info = client.provider_info

puts info[:name]      # => :ollama
puts info[:class]     # => "Prescient::Ollama::Provider"
puts info[:available] # => true
puts info[:options]   # => {...} (excluding sensitive data)
```

## Provider-Specific Features

### Ollama

- Model management: `pull_model`, `list_models` (see the sketch below)
- Local deployment support
- No API costs
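
A minimal sketch of how those model-management helpers might be called; the method names come from the list above, but the exact signatures and return values are assumptions, so treat this as illustrative rather than definitive:

```ruby
# Illustrative only - signatures and return values are assumed.
client = Prescient.client(:ollama)

client.pull_model('nomic-embed-text') # download a model into the local Ollama instance
models = client.list_models           # inspect which models are available locally
puts models.inspect
```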

### Anthropic

- High-quality responses
- No embedding support (use with OpenAI/HuggingFace for embeddings)

### OpenAI

- Multiple embedding model sizes
- Latest GPT models
- Reliable performance

### HuggingFace

- Open-source models
- Research-friendly
- Free tier available

## Docker Setup (Recommended for Ollama)

The easiest way to get started with Prescient and Ollama is using Docker Compose:

### Hardware Requirements

Before starting, ensure your system meets the minimum requirements for running Ollama:

#### **Minimum Requirements:**

- **CPU**: 4+ cores (x86_64 or ARM64)
- **RAM**: 8GB+ (16GB recommended)
- **Storage**: 10GB+ free space for models
- **OS**: Linux, macOS, or Windows with Docker

#### **Model-Specific Requirements:**

| Model | RAM Required | Storage | Notes |
|-------|--------------|---------|-------|
| `nomic-embed-text` | 1GB | 274MB | Embedding model |
| `llama3.1:8b` | 8GB | 4.7GB | Chat model (8B parameters) |
| `llama3.1:70b` | 64GB+ | 40GB | Large chat model (70B parameters) |
| `codellama:7b` | 8GB | 3.8GB | Code generation model |

#### **Performance Recommendations:**

- **SSD Storage**: Significantly faster model loading
- **GPU (Optional)**: NVIDIA GPU with 8GB+ VRAM for acceleration
- **Network**: Stable internet for initial model downloads
- **Docker**: 4GB+ memory limit configured

#### **GPU Acceleration (Optional):**

- **NVIDIA GPU**: RTX 3060+ with 8GB+ VRAM recommended
- **CUDA**: Version 11.8+ required
- **Docker**: NVIDIA Container Toolkit installed
- **Performance**: 3-10x faster inference with compatible models

> **💡 Tip**: Start with smaller models like `llama3.1:8b` and upgrade based on your hardware capabilities and performance needs.

### Quick Start with Docker

1. **Start Ollama service:**

```bash
docker-compose up -d ollama
```

2. **Pull required models:**

```bash
# Automatic setup
docker-compose up ollama-init

# Or manual setup
./scripts/setup-ollama-models.sh
```

3. **Run examples:**

```bash
# Set environment variable
export OLLAMA_URL=http://localhost:11434

# Run examples
ruby examples/custom_contexts.rb
```

### Docker Compose Services

The included `docker-compose.yml` provides:

- **ollama**: Ollama AI service with persistent model storage
- **ollama-init**: Automatically pulls required models on startup
- **redis**: Optional caching layer for embeddings
- **prescient-app**: Example Ruby application container

### Configuration Options

```yaml
# docker-compose.yml environment variables
services:
  ollama:
    ports:
      - "11434:11434" # Ollama API port
    volumes:
      - ollama_data:/root/.ollama # Persist models
    environment:
      - OLLAMA_HOST=0.0.0.0
      - OLLAMA_ORIGINS=*
```

### GPU Support (Optional)

For GPU acceleration, uncomment the GPU configuration in `docker-compose.yml`:

```yaml
services:
  ollama:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

### Environment Variables

```bash
# Ollama Configuration
OLLAMA_URL=http://localhost:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
OLLAMA_CHAT_MODEL=llama3.1:8b

# Optional: Other AI providers
OPENAI_API_KEY=your_key_here
ANTHROPIC_API_KEY=your_key_here
HUGGINGFACE_API_KEY=your_key_here
```

### Model Management

```bash
# Check available models
curl http://localhost:11434/api/tags

# Pull a specific model
curl -X POST http://localhost:11434/api/pull \
  -H "Content-Type: application/json" \
  -d '{"name": "llama3.1:8b"}'

# Health check
curl http://localhost:11434/api/version
```

### Production Deployment

For production use:

1. Use specific image tags instead of `latest`
2. Configure proper resource limits
3. Set up monitoring and logging
4. Use secrets management for API keys
5. Configure backups for model data

### Troubleshooting

#### **Common Issues:**

**Out of Memory Errors:**

```bash
# Check available memory
free -h

# Increase Docker memory limit (Docker Desktop)
# Settings > Resources > Memory: 8GB+

# Use a smaller model if hardware is limited
OLLAMA_CHAT_MODEL=phi3:mini ruby examples/custom_contexts.rb
```

**Slow Model Loading:**

```bash
# Check disk I/O
iostat -x 1

# Move Docker data to SSD if on HDD
# Docker Desktop: Settings > Resources > Disk image location
```

**Model Download Failures:**

```bash
# Check disk space
df -h

# Manually pull models with retry
docker exec prescient-ollama ollama pull llama3.1:8b
```

**GPU Not Detected:**

```bash
# Check NVIDIA Docker runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# Install NVIDIA Container Toolkit if missing
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
```

#### **Performance Monitoring:**

```bash
# Monitor resource usage
docker stats prescient-ollama

# Check Ollama logs
docker logs prescient-ollama

# Test API response time
time curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.1:8b", "prompt": "Hello", "stream": false}'
```

## Testing

The gem includes comprehensive test coverage:

```bash
bundle exec rspec
```

## Development

After checking out the repo, run:

```bash
bundle install
```

To install this gem onto your local machine:

```bash
bundle exec rake install
```

## Contributing

1. Fork it
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create a new Pull Request

## License

The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).

## Roadmap

### Version 0.2.0 (Planned)

- **MariaDB Vector Support**: Integration with MariaDB using external vector databases
- **Hybrid Database Architecture**: Support for MariaDB + Milvus/Qdrant combinations
- **Vector Database Adapters**: Pluggable adapters for different vector storage backends
- **Enhanced Chunking Strategies**: Smart document splitting with multiple algorithms
- **Search Result Ranking**: Advanced scoring and re-ranking capabilities

### Version 0.3.0 (Future)

- **Streaming Responses**: Real-time response streaming for chat applications
- **Multi-Model Ensembles**: Combine responses from multiple AI providers
- **Advanced Analytics**: Search performance insights and usage analytics
- **Cloud Provider Integration**: Direct support for Pinecone, Weaviate, etc.

## Changelog

### Version 0.1.0

- Initial release
- Support for Ollama, Anthropic, OpenAI, and HuggingFace
- Unified interface for embeddings and text generation
- Comprehensive error handling and retry logic
- Health monitoring capabilities
- PostgreSQL pgvector integration with complete Docker setup
- Vector similarity search with multiple distance functions
- Document chunking and metadata filtering
- Performance optimization guides and troubleshooting