ragdoll 0.1.10 → 0.1.12
- checksums.yaml +4 -4
- data/CHANGELOG.md +22 -0
- data/README.md +326 -351
- data/app/models/ragdoll/document.rb +1 -1
- data/app/models/ragdoll/search.rb +1 -1
- data/app/models/ragdoll/unified_content.rb +216 -0
- data/app/models/ragdoll/unified_document.rb +338 -0
- data/app/services/ragdoll/audio_to_text_service.rb +200 -0
- data/app/services/ragdoll/document_converter.rb +216 -0
- data/app/services/ragdoll/document_management.rb +117 -9
- data/app/services/ragdoll/document_processor.rb +213 -311
- data/app/services/ragdoll/image_to_text_service.rb +322 -0
- data/app/services/ragdoll/migration_service.rb +340 -0
- data/app/services/ragdoll/text_extraction_service.rb +422 -0
- data/app/services/ragdoll/unified_document_management.rb +300 -0
- data/db/migrate/20250923000001_create_ragdoll_unified_contents.rb +87 -0
- data/lib/ragdoll/core/client.rb +2 -2
- data/lib/ragdoll/core/version.rb +1 -1
- data/lib/ragdoll/core.rb +7 -0
- metadata +11 -2
data/README.md
CHANGED
@@ -1,7 +1,7 @@
-<
-
-
-
+> [!CAUTION]<br />
+> **Software Under Development by a Crazy Man**<br />
+> Gave up on the multi-modal vectorization approach,<br />
+> now using a unified text-based RAG architecture.
 <br />
 <div align="center">
 <table>
@@ -12,7 +12,8 @@
 </a>
 </td>
 <td width="50%" valign="top">
-<p
+<p><strong>🔄 NEW: Unified Text-Based RAG Architecture</strong></p>
+<p>Ragdoll has evolved to a unified text-based RAG (Retrieval-Augmented Generation) architecture that converts all media types—text, images, audio, and video—to comprehensive text representations before vectorization. This approach enables true cross-modal search where you can find images through their AI-generated descriptions, audio through transcripts, and all content through a single, powerful text-based search index.</p>
 </td>
 </tr>
 </table>
@@ -20,62 +21,66 @@
 
 # Ragdoll
 
-
+**Unified Text-Based RAG (Retrieval-Augmented Generation) library built on ActiveRecord.** Features PostgreSQL + pgvector for high-performance semantic search with a simplified architecture that converts all media types to searchable text.
+
+RAG does not have to be hard. The new unified approach eliminates the complexity of multi-modal vectorization while enabling powerful cross-modal search capabilities. See: [https://0x1eef.github.io/posts/an-introduction-to-rag-with-llm.rb/](https://0x1eef.github.io/posts/an-introduction-to-rag-with-llm.rb/)
+
+## 🆕 **What's New: Unified Text-Based Architecture**
+
+Ragdoll 2.0 introduces a revolutionary unified approach:
 
-
+- **All Media → Text**: Images become comprehensive descriptions, audio becomes transcripts
+- **Single Embedding Model**: One text embedding model for all content types
+- **Cross-Modal Search**: Find images through descriptions, audio through transcripts
+- **Simplified Architecture**: No more complex STI (Single Table Inheritance) models
+- **Better Search**: Unified text index enables more sophisticated queries
+- **Migration Path**: Smooth transition from the previous multi-modal system
 
 ## Overview
 
-Ragdoll is a database-first,
+Ragdoll is a database-first, unified text-based Retrieval-Augmented Generation (RAG) library for Ruby. It pairs PostgreSQL + pgvector with an ActiveRecord-driven schema to deliver fast, production-grade semantic search through a simplified unified architecture.
 
-The library
+The library converts all document types to rich text representations: PDFs and documents are extracted as text, images are converted to comprehensive AI-generated descriptions, and audio files are transcribed. This unified approach enables powerful cross-modal search while maintaining simplicity.
 
-### Why
+### Why the New Unified Architecture?
 
--
--
--
--
--
--
+- **Simplified Complexity**: Single content model instead of multiple polymorphic types
+- **Cross-Modal Search**: Find images by searching for objects or concepts in their descriptions
+- **Unified Index**: One text-based search index for all content types
+- **Better Retrieval**: Text descriptions often contain more searchable information than raw media
+- **Cost Effective**: Single embedding model instead of specialized models per media type
+- **Easier Maintenance**: One embedding pipeline to maintain and optimize
 
 ### Key Capabilities
 
--
--
--
-- Search
--
--
+- **Universal Text Conversion**: Converts any media type to searchable text
+- **AI-Powered Descriptions**: Comprehensive image descriptions using vision models
+- **Audio Transcription**: Speech-to-text conversion for audio content
+- **Semantic Search**: Vector similarity search across all converted content
+- **Cross-Modal Retrieval**: Search for images using text descriptions of their content
+- **Content Quality Assessment**: Automatic scoring of converted content quality
+- **Migration Support**: Tools to migrate from previous multi-modal architecture
 
 ## Table of Contents
 
 - [Quick Start](#quick-start)
+- [Unified Architecture Guide](#unified-architecture-guide)
+- [Document Processing Pipeline](#document-processing-pipeline)
+- [Cross-Modal Search](#cross-modal-search)
+- [Migration from Multi-Modal](#migration-from-multi-modal)
 - [API Overview](#api-overview)
-- [Search and Retrieval](#search-and-retrieval)
-- [Search Analytics and Tracking](#search-analytics-and-tracking)
-- [System Operations](#system-operations)
 - [Configuration](#configuration)
-- [Current Implementation Status](#current-implementation-status)
-- [Architecture Highlights](#architecture-highlights)
-- [Text Document Processing](#text-document-processing-current)
-- [PostgreSQL + pgvector Configuration](#postgresql--pgvector-configuration)
-- [Performance Features](#performance-features)
 - [Installation](#installation)
 - [Requirements](#requirements)
-- [
-- [Environment Variables](#environment-variables)
+- [Performance Features](#performance-features)
 - [Troubleshooting](#troubleshooting)
-- [Related Projects](#related-projects)
-- [Key Design Principles](#key-design-principles)
-- [Contributing & Support](#contributing--support)
 
 ## Quick Start
 
 ```ruby
 require 'ragdoll'
 
-# Configure with
+# Configure with unified text-based architecture
 Ragdoll.configure do |config|
 # Database configuration (PostgreSQL only)
 config.database_config = {
@@ -88,225 +93,234 @@ Ragdoll.configure do |config|
 auto_migrate: true
 }
 
-#
-config.
-
-
+# Enable unified text-based models
+config.use_unified_models = true
+
+# Text conversion settings
+config.text_conversion = {
+image_detail_level: :comprehensive, # :minimal, :standard, :comprehensive, :analytical
+audio_transcription_provider: :openai, # :azure, :google, :whisper_local
+enable_fallback_descriptions: true
+}
 
-#
-config.
-config.
+# Single embedding model for all content
+config.embedding_model = "text-embedding-3-large"
+config.embedding_provider = :openai
 
-#
-config.
-config.logging_config[:log_filepath] = File.join(Dir.home, '.ragdoll', 'ragdoll.log')
+# Ruby LLM configuration
+config.ruby_llm_config[:openai][:api_key] = ENV['OPENAI_API_KEY']
 end
 
-# Add documents -
+# Add documents - all types converted to text
 result = Ragdoll.add_document(path: 'research_paper.pdf')
-
-
+image_result = Ragdoll.add_document(path: 'diagram.png') # Converted to description
+audio_result = Ragdoll.add_document(path: 'lecture.mp3') # Converted to transcript
 
-#
-
-
+# Cross-modal search - find images by describing their content
+results = Ragdoll.search(query: 'neural network architecture diagram')
+# This can return the image document if its AI description mentions neural networks
 
-# Search
-results = Ragdoll.search(query: '
+# Search for audio content by transcript content
+results = Ragdoll.search(query: 'machine learning discussion')
+# Returns audio documents whose transcripts mention machine learning
 
-#
-document = Ragdoll.get_document(id:
+# Check content quality
+document = Ragdoll.get_document(id: result[:document_id])
+puts document[:content_quality_score] # 0.0 to 1.0 rating
 ```
 
-##
+## Unified Architecture Guide
 
-
+### Document Processing Pipeline
 
-
+The new unified pipeline converts all media types to searchable text:
 
 ```ruby
-#
-
-
-puts result[:document_id] # "123"
-puts result[:message] # "Document 'document' added successfully with ID 123"
-puts result[:embeddings_queued] # true
-
-# Check document processing status
-status = Ragdoll.document_status(id: result[:document_id])
-puts status[:status] # "processed"
-puts status[:embeddings_count] # 15
-puts status[:embeddings_ready] # true
-puts status[:message] # "Document processed successfully with 15 embeddings"
-
-# Get detailed document information
-document = Ragdoll.get_document(id: result[:document_id])
-puts document[:title] # "document"
-puts document[:status] # "processed"
-puts document[:embeddings_count] # 15
-puts document[:content_length] # 5000
+# Text files: Direct extraction
+text_doc = Ragdoll.add_document(path: 'article.md')
+# Content: Original markdown text
 
-#
-Ragdoll.
+# PDF/DOCX: Text extraction
+pdf_doc = Ragdoll.add_document(path: 'research.pdf')
+# Content: Extracted text from all pages
 
-#
-Ragdoll.
+# Images: AI-generated descriptions
+image_doc = Ragdoll.add_document(path: 'chart.png')
+# Content: "Bar chart showing quarterly sales data with increasing trend..."
 
-#
-
+# Audio: Speech-to-text transcription
+audio_doc = Ragdoll.add_document(path: 'meeting.mp3')
+# Content: "In this meeting we discussed the quarterly results..."
 
-#
-
-
-puts stats[:total_embeddings] # 1250
+# Video: Audio transcription + metadata
+video_doc = Ragdoll.add_document(path: 'presentation.mp4')
+# Content: Combination of audio transcript and video metadata
 ```
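
The pipeline above picks a conversion path per media type. A minimal sketch of how such routing could look, assuming dispatch by file extension; the table, the strategy names, and `conversion_strategy_for` are hypothetical and are not taken from the gem's source:

```ruby
# Hypothetical routing table: file extension -> conversion strategy.
# The strategy symbols are illustrative names, not Ragdoll identifiers.
CONVERSION_BY_EXTENSION = {
  '.txt' => :direct_text, '.md' => :direct_text,
  '.pdf' => :text_extraction, '.docx' => :text_extraction,
  '.png' => :image_description, '.jpg' => :image_description,
  '.mp3' => :audio_transcription, '.wav' => :audio_transcription,
  '.mp4' => :video_transcription, '.mov' => :video_transcription
}.freeze

# Look up the strategy for a path, ignoring extension case.
# Unknown extensions fall through to a placeholder description.
def conversion_strategy_for(path)
  CONVERSION_BY_EXTENSION.fetch(File.extname(path).downcase, :fallback_description)
end

%w[article.md research.pdf chart.png meeting.mp3 presentation.mp4 data.bin].each do |file|
  puts "#{file} -> #{conversion_strategy_for(file)}"
end
```

A real implementation would probably inspect MIME types rather than trust extensions; the lookup table just keeps the idea visible.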
 
-###
+### Text Conversion Services
 
 ```ruby
-#
-
-
-
-results = Ragdoll.search(
-query: 'machine learning',
-session_id: 123, # Optional: track user sessions
-user_id: 456 # Optional: track by user
-)
+# Use individual conversion services
+text_content = Ragdoll::TextExtractionService.extract('document.pdf')
+image_description = Ragdoll::ImageToTextService.convert('photo.jpg', detail_level: :comprehensive)
+audio_transcript = Ragdoll::AudioToTextService.transcribe('speech.wav')
 
-#
-
-image_results = Ragdoll.search(query: 'neural network diagram', content_type: 'image')
-audio_results = Ragdoll.search(query: 'AI discussion', content_type: 'audio')
+# Use unified converter (orchestrates all services)
+unified_text = Ragdoll::DocumentConverter.convert_to_text('any_file.ext')
 
-#
-
-
-
-keywords: ['AI', 'neural networks'],
-tags: ['technical']
-)
+# Manage documents with unified approach
+management = Ragdoll::UnifiedDocumentManagement.new
+document = management.add_document('mixed_media_file.mov')
+```
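
The configuration exposes an `enable_fallback_descriptions` flag, but this README does not show what a fallback looks like. One plausible reading, sketched under that assumption: when a converter raises, a placeholder description is stored instead of failing the document. `convert_with_fallback` and the placeholder wording are hypothetical:

```ruby
# Run a converter block; on failure either re-raise (strict mode) or
# return a placeholder description so the document stays indexable.
def convert_with_fallback(path, enable_fallback: true)
  yield(path)
rescue StandardError => e
  raise unless enable_fallback

  "[#{File.basename(path)}: no text could be extracted (#{e.message})]"
end

ok = convert_with_fallback('notes.txt') { |p| "contents of #{p}" }
failed = convert_with_fallback('photo.jpg') { |_p| raise IOError, 'vision model unavailable' }

puts ok
puts failed
```

Placeholders of this kind would presumably land at the bottom of the quality ranking ("text > documents > descriptions > placeholders").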
 
-
-context = Ragdoll.get_context(query: 'machine learning', limit: 5)
+### Content Quality Assessment
 
-
-
-
-
+```ruby
+# Get content quality scores
+document = Ragdoll::UnifiedDocument.find(id)
+quality = document.content_quality_score # 0.0 to 1.0
+
+# Quality factors:
+# - Content length (50-2000 words optimal)
+# - Original media type (text > documents > descriptions > placeholders)
+# - Conversion success (full content > partial > fallback)
+
+# Batch quality assessment
+stats = Ragdoll::UnifiedContent.stats
+puts stats[:content_quality_distribution]
+# => { high: 150, medium: 75, low: 25 }
+```
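
The three quality factors listed above can be combined in many ways, and the README does not give the formula. A minimal sketch, assuming the factors are multiplied together; every weight below is an invented illustration, and only the 50-2000 word range and the two orderings come from the text:

```ruby
# Assumed weights that respect the stated orderings:
# text > documents > descriptions > placeholders, and full > partial > fallback.
MEDIA_WEIGHTS = { text: 1.0, document: 0.9, description: 0.7, placeholder: 0.3 }.freeze
CONVERSION_WEIGHTS = { full: 1.0, partial: 0.7, fallback: 0.4 }.freeze

# 1.0 inside the 50-2000 word range, scaled down outside it.
def length_factor(word_count)
  return 0.0 if word_count <= 0
  return word_count / 50.0 if word_count < 50
  return 1.0 if word_count <= 2000

  [2000.0 / word_count, 0.5].max
end

# Multiply the three factors into a score between 0.0 and 1.0.
def content_quality_score(text, media: :text, conversion: :full)
  score = length_factor(text.split.size) *
          MEDIA_WEIGHTS.fetch(media) *
          CONVERSION_WEIGHTS.fetch(conversion)
  score.round(2)
end

puts content_quality_score('word ' * 100)
puts content_quality_score('word ' * 100, media: :description)
puts content_quality_score('word ' * 25, conversion: :fallback)
```

Multiplication means one weak factor drags the whole score down; a weighted sum would be a gentler alternative.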
+
+## Cross-Modal Search
+
+The unified architecture enables powerful cross-modal search capabilities:
+
+```ruby
+# Find images by describing their visual content
+image_results = Ragdoll.search(query: 'red sports car in parking lot')
+# Returns image documents whose AI descriptions match the query
+
+# Search for audio by spoken content
+audio_results = Ragdoll.search(query: 'quarterly sales meeting discussion')
+# Returns audio documents whose transcripts contain these topics
+
+# Mixed results across all media types
+all_results = Ragdoll.search(query: 'artificial intelligence')
+# Returns text documents, images with AI descriptions, and audio transcripts
+# all ranked by relevance to the query
+
+# Filter by original media type while searching text
+image_only = Ragdoll.search(
+query: 'machine learning workflow',
+original_media_type: 'image'
 )
 
-#
-
-query: '
-
-text_weight: 0.3
+# Search with quality filtering
+high_quality = Ragdoll.search(
+query: 'deep learning',
+min_quality_score: 0.7
 )
 ```
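
Cross-modal search works because every document, whatever its source medium, ends up as a text embedding in one index. The sketch below imitates that with in-memory records and hand-written three-dimensional vectors. `Record`, `search`, and the sample data are hypothetical stand-ins; the gem itself is described as querying PostgreSQL + pgvector:

```ruby
# A converted document: text-derived embedding plus its source medium.
Record = Struct.new(:title, :original_media_type, :quality, :embedding)

# Cosine similarity between two equal-length numeric vectors.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  norm.zero? ? 0.0 : dot / norm
end

# Filter by media type and quality, then rank by similarity to the query.
def search(records, query_embedding, original_media_type: nil, min_quality_score: 0.0, limit: 10)
  records
    .select { |r| original_media_type.nil? || r.original_media_type == original_media_type }
    .select { |r| r.quality >= min_quality_score }
    .sort_by { |r| -cosine_similarity(r.embedding, query_embedding) }
    .first(limit)
end

RECORDS = [
  Record.new('ml_paper.pdf', 'text', 0.9, [0.9, 0.1, 0.0]),
  Record.new('workflow_diagram.png', 'image', 0.8, [0.8, 0.2, 0.1]),
  Record.new('sales_meeting.mp3', 'audio', 0.6, [0.1, 0.9, 0.2])
].freeze

query = [1.0, 0.0, 0.0] # toy embedding standing in for the query text
puts search(RECORDS, query).map(&:title).inspect
puts search(RECORDS, query, original_media_type: 'image').map(&:title).inspect
puts search(RECORDS, query, min_quality_score: 0.7).map(&:title).inspect
```

The image and the PDF compete in the same ranking because both are compared as text vectors; the media type is only metadata used for filtering.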
 
-
+## Migration from Multi-Modal
 
-
+Migrate smoothly from the previous multi-modal architecture:
 
 ```ruby
-#
-
+# Check migration readiness
+migration_service = Ragdoll::MigrationService.new
+report = migration_service.create_comparison_report
 
-
-
-puts "#{doc.title}: #{doc.keywords_match_count} matches"
-end
+puts "Migration Benefits:"
+report[:benefits].each { |benefit, description| puts "- #{description}" }
 
-#
-results = Ragdoll::
-
-
-results.each do |doc|
-puts "#{doc.title}: #{doc.total_keywords_count} total keywords"
-end
-
-# Combined semantic + keywords search for best results
-results = Ragdoll.search(
-query: 'artificial intelligence applications',
-keywords: ['ai', 'machine learning', 'neural networks'],
-limit: 10
+# Migrate all documents
+results = Ragdoll::MigrationService.migrate_all_documents(
+batch_size: 50,
+process_embeddings: true
 )
 
-
-
-
-
-
+puts "Migrated: #{results[:migrated]} documents"
+puts "Errors: #{results[:errors].length}"
+
+# Validate migration integrity
+validation = migration_service.validate_migration
+puts "Validation passed: #{validation[:passed]}/#{validation[:total_checks]} checks"
 
-#
-
-# Will match documents with keywords: ['python', 'data-science', 'ai']
+# Migrate individual document
+migrated_doc = Ragdoll::MigrationService.migrate_document(old_document_id)
 ```
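
The migration call above reports a `:migrated` count and an `:errors` list. A minimal sketch of a batch loop that produces a result of that shape, assuming one failing document should not abort the run; `migrate_in_batches` and the sample records are hypothetical, not the `MigrationService` implementation:

```ruby
# Walk the legacy documents in batches, converting each one via the block.
# Failures are collected instead of aborting the whole run.
def migrate_in_batches(legacy_documents, batch_size: 50)
  results = { migrated: 0, errors: [] }

  legacy_documents.each_slice(batch_size) do |batch|
    batch.each do |doc|
      begin
        yield doc
        results[:migrated] += 1
      rescue StandardError => e
        results[:errors] << { id: doc[:id], message: e.message }
      end
    end
  end

  results
end

legacy = (1..5).map { |i| { id: i, content: i == 3 ? nil : "text #{i}" } }

results = migrate_in_batches(legacy, batch_size: 2) do |doc|
  raise ArgumentError, 'no content to convert' if doc[:content].nil?
end

puts "Migrated: #{results[:migrated]} documents"
puts "Errors: #{results[:errors].length}"
```

Keeping the per-document error records makes it possible to retry only the failures afterwards.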
 
-
-
-
-- **Smart Scoring**: Results ordered by match count or document focus
-- **Case Insensitive**: Automatic keyword normalization
-- **Integration Ready**: Works seamlessly with semantic search
-- **Inspired by `find_matching_entries.rb`**: Optimized for PostgreSQL arrays
+## API Overview
+
+### Unified Document Management
 
-
+```ruby
+# Add documents with automatic text conversion
+result = Ragdoll.add_document(path: 'any_file.ext')
+puts result[:document_id]
+puts result[:content_preview] # First 100 characters of converted text
+
+# Batch processing with unified pipeline
+files = ['doc.pdf', 'image.jpg', 'audio.mp3']
+results = Ragdoll::UnifiedDocumentManagement.new.batch_process_documents(files)
+
+# Reprocess with different conversion settings
+Ragdoll::UnifiedDocumentManagement.new.reprocess_document(
+document_id,
+image_detail_level: :analytical
+)
+```
 
-
+### Search API
 
 ```ruby
-#
-
-puts "Total searches: #{analytics[:total_searches]}"
-puts "Unique queries: #{analytics[:unique_queries]}"
-puts "Average execution time: #{analytics[:avg_execution_time]}ms"
-puts "Click-through rate: #{analytics[:click_through_rate]}%"
-
-# Find similar searches using vector similarity
-search = Ragdoll::Search.first
-similar_searches = search.nearest_neighbors(:query_embedding, distance: :cosine).limit(5)
-
-similar_searches.each do |similar|
-puts "Query: #{similar.query}"
-puts "Similarity: #{similar.neighbor_distance}"
-puts "Results: #{similar.results_count}"
-end
+# Unified search across all content types
+results = Ragdoll.search(query: 'machine learning algorithms')
 
-#
-
-
+# Search with original media type context
+results.each do |doc|
+puts "#{doc.title} (originally #{doc.original_media_type})"
+puts "Quality: #{doc.content_quality_score.round(2)}"
+puts "Content: #{doc.content[0..100]}..."
+end
 
-#
-
-query: '
-
+# Advanced search with content quality
+high_quality_results = Ragdoll.search(
+query: 'neural networks',
+min_quality_score: 0.8,
+limit: 10
 )
 ```
 
-###
+### Content Analysis
 
 ```ruby
-#
-
-# Returns information about documents, content types, embeddings, etc.
+# Analyze converted content
+document = Ragdoll::UnifiedDocument.find(id)
 
-#
-
+# Check original media type
+puts document.unified_contents.first.original_media_type # 'image', 'audio', 'text', etc.
 
-#
-
+# View conversion metadata
+content = document.unified_contents.first
+puts content.conversion_method # 'image_to_text', 'audio_transcription', etc.
+puts content.metadata # Conversion settings and results
 
-#
-
+# Quality metrics
+puts content.word_count
+puts content.character_count
+puts content.content_quality_score
 ```
 
-
+## Configuration
 
 ```ruby
-# Configure the system
 Ragdoll.configure do |config|
-#
+# Enable unified text-based architecture
+config.use_unified_models = true
+
+# Database configuration (PostgreSQL required)
 config.database_config = {
 adapter: 'postgresql',
 database: 'ragdoll_production',
@@ -317,141 +331,74 @@ Ragdoll.configure do |config|
 auto_migrate: true
 }
 
-#
-config.
-
-
+# Text conversion settings
+config.text_conversion = {
+# Image conversion detail levels:
+# :minimal - Brief one-sentence description
+# :standard - Main elements and composition
+# :comprehensive - Detailed description including objects, colors, mood
+# :analytical - Thorough analysis including artistic elements
+image_detail_level: :comprehensive,
+
+# Audio transcription providers
+audio_transcription_provider: :openai, # :azure, :google, :whisper_local
+
+# Fallback behavior
+enable_fallback_descriptions: true,
+fallback_timeout: 30 # seconds
+}
+
+# Single embedding model for all content types
+config.embedding_model = "text-embedding-3-large"
+config.embedding_provider = :openai
 
+# Ruby LLM configuration for text conversion
+config.ruby_llm_config[:openai][:api_key] = ENV['OPENAI_API_KEY']
 config.ruby_llm_config[:anthropic][:api_key] = ENV['ANTHROPIC_API_KEY']
-config.ruby_llm_config[:google][:api_key] = ENV['GOOGLE_API_KEY']
 
-#
-config.
-
-
-
-
-config.models[:embedding][:audio] = 'audio-embedding-3-small'
+# Vision model configuration for image descriptions
+config.vision_config = {
+primary_model: 'gpt-4-vision-preview',
+fallback_model: 'gemini-pro-vision',
+temperature: 0.2
+}
 
-#
-config.
-
+# Audio transcription configuration
+config.audio_config = {
+openai: {
+model: 'whisper-1',
+temperature: 0.0
+},
+azure: {
+endpoint: ENV['AZURE_SPEECH_ENDPOINT'],
+api_key: ENV['AZURE_SPEECH_KEY']
+}
+}
 
 # Processing settings
 config.chunking[:text][:max_tokens] = 1000
 config.chunking[:text][:overlap] = 200
 config.search[:similarity_threshold] = 0.7
 config.search[:max_results] = 10
-end
-```
-
-## Current Implementation Status
-
-### ✅ **Fully Implemented**
-- **Text document processing**: PDF, DOCX, HTML, Markdown, plain text files
-- **Embedding generation**: Text chunking and vector embedding creation
-- **Database schema**: Multi-modal polymorphic architecture with PostgreSQL + pgvector
-- **Dual metadata architecture**: Separate LLM-generated content analysis and file properties
-- **Search functionality**: Semantic search with cosine similarity and usage analytics
-- **Search tracking system**: Comprehensive analytics with query embeddings, click-through tracking, and performance monitoring
-- **Document management**: Add, update, delete, list operations
-- **Background processing**: ActiveJob integration for async embedding generation
-- **LLM metadata generation**: AI-powered structured content analysis with schema validation
-- **Logging**: Configurable file-based logging with multiple levels
-
-### 🚧 **In Development**
-- **Image processing**: Framework exists but vision AI integration needs completion
-- **Audio processing**: Framework exists but speech-to-text integration needs completion
-- **Hybrid search**: Combining semantic and full-text search capabilities
-
-### 📋 **Planned Features**
-- **Multi-modal search**: Search across text, image, and audio content types
-- **Content-type specific embedding models**: Different models for text, image, audio
-- **Enhanced metadata schemas**: Domain-specific metadata templates
-
-## Architecture Highlights
-
-### Dual Metadata Design
-
-Ragdoll uses a sophisticated dual metadata architecture to separate concerns:
-
-- **`metadata` (JSON)**: LLM-generated content analysis including summary, keywords, classification, topics, sentiment, and domain-specific insights
-- **`file_metadata` (JSON)**: System-generated file properties including size, MIME type, dimensions, processing parameters, and technical characteristics
-
-This separation enables both semantic search operations on content meaning and efficient file management operations.
-
-### Polymorphic Multi-Modal Architecture
-
-The database schema uses polymorphic associations to elegantly support multiple content types:
-
-- **Documents**: Central entity with dual metadata columns
-- **Content Types**: Specialized tables for `text_contents`, `image_contents`, `audio_contents`
-- **Embeddings**: Unified vector storage via polymorphic `embeddable` associations
-
-## Text Document Processing (Current)
-
-Currently, Ragdoll processes text documents through:
-
-1. **Content Extraction**: Extracts text from PDF, DOCX, HTML, Markdown, and plain text
-2. **Metadata Generation**: AI-powered analysis creates structured content metadata
-3. **Text Chunking**: Splits content into manageable chunks with configurable size/overlap
-4. **Embedding Generation**: Creates vector embeddings using OpenAI or other providers
-5. **Database Storage**: Stores in polymorphic multi-modal architecture with dual metadata
-6. **Search**: Semantic search using cosine similarity with usage analytics
-
-### Example Usage
-
-```ruby
-# Add a text document
-result = Ragdoll.add_document(path: 'document.pdf')
-
-# Check processing status
-status = Ragdoll.document_status(id: result[:document_id])
-
-# Search the content
-results = Ragdoll.search(query: 'machine learning')
-```
-
-## PostgreSQL + pgvector Configuration
-
-### Database Setup
-
-```bash
-# Install PostgreSQL and pgvector
-brew install postgresql pgvector # macOS
-# or
-apt-get install postgresql postgresql-contrib # Ubuntu
-
-# Create database and enable pgvector extension
-createdb ragdoll_production
-psql -d ragdoll_production -c "CREATE EXTENSION IF NOT EXISTS vector;"
-```
 
-
-
-
-
-
-adapter: 'postgresql',
-database: 'ragdoll_production',
-username: 'ragdoll',
-password: ENV['DATABASE_PASSWORD'],
-host: 'localhost',
-port: 5432,
-pool: 20,
-auto_migrate: true
+# Quality thresholds
+config.quality_thresholds = {
+high_quality: 0.8,
+medium_quality: 0.5,
+min_content_length: 50
 }
 end
 ```
 
 ## Performance Features
 
-- **
-- **
-- **
-- **Batch
-- **
-- **
+- **Unified Index**: Single text-based search index for all content types
+- **Optimized Conversion**: Efficient text extraction and AI-powered description generation
+- **Quality Scoring**: Automatic assessment of converted content quality
+- **Batch Processing**: Efficient bulk document processing with progress tracking
+- **Smart Caching**: Caches conversion results to avoid reprocessing
+- **Background Jobs**: Asynchronous processing for large files
+- **Cross-Modal Optimization**: Specialized optimizations for different media type conversions
 
 ## Installation
 
@@ -461,6 +408,12 @@ brew install postgresql pgvector # macOS
 # or
 apt-get install postgresql postgresql-contrib # Ubuntu
 
+# For image processing
+brew install imagemagick
+
+# For audio processing (optional, depending on provider)
+brew install ffmpeg
+
 # Install gem
 gem install ragdoll
 
@@ -471,61 +424,83 @@ gem 'ragdoll'
 ## Requirements
 
 - **Ruby**: 3.2+
-- **PostgreSQL**: 12+ with pgvector extension
-- **
+- **PostgreSQL**: 12+ with pgvector extension
+- **ImageMagick**: For image processing and metadata extraction
+- **FFmpeg**: Optional, for advanced audio/video processing
+- **Dependencies**: activerecord, pg, pgvector, neighbor, ruby_llm, pdf-reader, docx, rmagick, tempfile
 
-
+### Vision Model Requirements
 
-
--
--
--
--
+For comprehensive image descriptions:
+- **OpenAI**: GPT-4 Vision (recommended)
+- **Google**: Gemini Pro Vision
+- **Anthropic**: Claude 3 with vision capabilities
+- **Local**: Ollama with vision-capable models
 
-
+### Audio Transcription Requirements
 
-
-
--
--
-- `OPENAI_PROJECT` — optional, for OpenAI project scoping
-- `ANTHROPIC_API_KEY` — optional, for Anthropic models
-- `GOOGLE_API_KEY` — optional, for Google models
-- `DATABASE_PASSWORD` — your PostgreSQL password if not using peer auth
+- **OpenAI**: Whisper API (recommended)
+- **Azure**: Speech Services
+- **Google**: Cloud Speech-to-Text
+- **Local**: Whisper installation
 
 ## Troubleshooting
 
-###
-
-- Ensure the extension is enabled in your database:
+### Image Processing Issues
 
 ```bash
-
+# Verify ImageMagick installation
+convert -version
+
+# Check vision model access
+irb -r ragdoll
+> Ragdoll::ImageToTextService.new.convert('test_image.jpg')
 ```
 
-
+### Audio Processing Issues
 
-
+```bash
+# For Whisper local installation
+pip install openai-whisper
 
-
--
-
+# Test audio file support
+irb -r ragdoll
+> Ragdoll::AudioToTextService.new.transcribe('test_audio.wav')
+```
 
-
+### Content Quality Issues
 
-
-
+```ruby
+# Check content quality distribution
+stats = Ragdoll::UnifiedContent.stats
+puts stats[:content_quality_distribution]
+
+# Reprocess low-quality content
+low_quality = Ragdoll::UnifiedDocument.joins(:unified_contents)
+.where('unified_contents.content_quality_score < 0.5')
+
+low_quality.each do |doc|
+Ragdoll::UnifiedDocumentManagement.new.reprocess_document(
+doc.id,
+image_detail_level: :analytical
+)
+end
+```
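
The `0.5` cutoff in the query above matches the `medium_quality` threshold from the Configuration section. A sketch of how scores might be bucketed into the `{ high:, medium:, low: }` distribution and how the low bucket could be picked for reprocessing, assuming inclusive lower bounds; `quality_bucket`, `quality_distribution`, and the sample scores are hypothetical:

```ruby
# Threshold values taken from the README's quality_thresholds example.
THRESHOLDS = { high_quality: 0.8, medium_quality: 0.5 }.freeze

# Classify one score; lower bounds are treated as inclusive (an assumption).
def quality_bucket(score, thresholds = THRESHOLDS)
  return :high if score >= thresholds[:high_quality]
  return :medium if score >= thresholds[:medium_quality]

  :low
end

# Count how many scores fall into each bucket.
def quality_distribution(scores)
  counts = { high: 0, medium: 0, low: 0 }
  scores.each { |score| counts[quality_bucket(score)] += 1 }
  counts
end

# Sample data: document id => content quality score.
scores_by_id = { 1 => 0.92, 2 => 0.61, 3 => 0.34, 4 => 0.8, 5 => 0.49 }

puts quality_distribution(scores_by_id.values).inspect
reprocess_ids = scores_by_id.select { |_id, score| quality_bucket(score) == :low }.keys
puts "Reprocess: #{reprocess_ids.inspect}"
```

Only the ids in the low bucket would be handed to a reprocessing step such as the `reprocess_document` call shown above.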
 
-##
+## Use Cases
 
-
+- **Knowledge Bases**: Search across text documents, presentation images, and recorded meetings
+- **Media Libraries**: Find images by visual content, audio by spoken topics
+- **Research Collections**: Unified search across papers (text), charts (images), and interviews (audio)
+- **Documentation Systems**: Search technical docs, architecture diagrams, and explanation videos
+- **Educational Content**: Find learning materials across all media types through unified text search
 
 ## Key Design Principles
 
-1. **
-2. **
-3. **
-4. **
-5. **
-6. **
-7. **
+1. **Unified Text Representation**: All media types converted to searchable text
+2. **Cross-Modal Search**: Images findable through descriptions, audio through transcripts
+3. **Quality-Driven**: Automatic assessment and optimization of converted content
+4. **Simplified Architecture**: Single content model instead of complex polymorphic relationships
+5. **AI-Enhanced Conversion**: Leverages latest vision and speech models for rich text conversion
+6. **Migration-Friendly**: Smooth transition path from previous multi-modal architecture
+7. **Performance-Optimized**: Single embedding model and unified search index for speed