ragdoll 0.1.10 → 0.1.12

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
data/README.md CHANGED
@@ -1,7 +1,7 @@
- <div align="center" style="background-color: yellow; color: black; padding: 20px; margin: 20px 0; border: 2px solid black; font-size: 48px; font-weight: bold;">
- ⚠️ CAUTION ⚠️<br />
- Software Under Development by a Crazy Man
- </div>
+ > [!CAUTION]<br />
+ > **Software Under Development by a Crazy Man**<br />
+ > Gave up on the multi-modal vectorization approach,<br />
+ > now using a unified text-based RAG architecture.
  <br />
  <div align="center">
  <table>
@@ -12,7 +12,8 @@
  </a>
  </td>
  <td width="50%" valign="top">
- <p>Multi-modal RAG (Retrieval-Augmented Generation) is an architecture that integrates multiple data types (such as text, images, and audio) to enhance AI response generation. It combines retrieval-based methods, which fetch relevant information from a knowledge base, with generative large language models (LLMs) that create coherent and contextually appropriate outputs. This approach allows for more comprehensive and engaging user interactions, such as chatbots that respond with both text and images or educational tools that incorporate visual aids into learning materials. By leveraging various modalities, multi-modal RAG systems improve context understanding and user experience.</p>
+ <p><strong>🔄 NEW: Unified Text-Based RAG Architecture</strong></p>
+ <p>Ragdoll has evolved to a unified text-based RAG (Retrieval-Augmented Generation) architecture that converts all media types—text, images, audio, and video—to comprehensive text representations before vectorization. This approach enables true cross-modal search where you can find images through their AI-generated descriptions, audio through transcripts, and all content through a single, powerful text-based search index.</p>
  </td>
  </tr>
  </table>
@@ -20,62 +21,66 @@
 
  # Ragdoll
 
- Database-oriented multi-modal RAG (Retrieval-Augmented Generation) library built on ActiveRecord. Features PostgreSQL + pgvector for high-performance semantic search, polymorphic content architecture, and dual metadata design for sophisticated document analysis.
+ **Unified Text-Based RAG (Retrieval-Augmented Generation) library built on ActiveRecord.** Features PostgreSQL + pgvector for high-performance semantic search with a simplified architecture that converts all media types to searchable text.
+
+ RAG does not have to be hard. The new unified approach eliminates the complexity of multi-modal vectorization while enabling powerful cross-modal search capabilities. See: [https://0x1eef.github.io/posts/an-introduction-to-rag-with-llm.rb/](https://0x1eef.github.io/posts/an-introduction-to-rag-with-llm.rb/)
+
+ ## 🆕 **What's New: Unified Text-Based Architecture**
+
+ Ragdoll 2.0 introduces a unified approach:
 
- RAG does not have to be hard. Every week its getting simpler. The frontier LLM providers are starting to encorporate RAG services. For example OpenAI offers a vector search service. See: [https://0x1eef.github.io/posts/an-introduction-to-rag-with-llm.rb/](https://0x1eef.github.io/posts/an-introduction-to-rag-with-llm.rb/)
+ - **All Media → Text**: Images become comprehensive descriptions, audio becomes transcripts
+ - **Single Embedding Model**: One text embedding model for all content types
+ - **Cross-Modal Search**: Find images through descriptions, audio through transcripts
+ - **Simplified Architecture**: No more complex STI (Single Table Inheritance) models
+ - **Better Search**: Unified text index enables more sophisticated queries
+ - **Migration Path**: Smooth transition from the previous multi-modal system
 
  ## Overview
 
- Ragdoll is a database-first, multi-modal Retrieval-Augmented Generation (RAG) library for Ruby. It pairs PostgreSQL + pgvector with an ActiveRecord-driven schema to deliver fast, production-grade semantic search and clean data modeling. Today it ships with robust text processing; image and audio pipelines are scaffolded and actively being completed.
+ Ragdoll is a database-first, unified text-based Retrieval-Augmented Generation (RAG) library for Ruby. It pairs PostgreSQL + pgvector with an ActiveRecord-driven schema to deliver fast, production-grade semantic search through a simplified unified architecture.
 
- The library emphasizes a dual-metadata design: LLM-derived semantic metadata for understanding content, and system file metadata for managing assets. With built-in analytics, background processing, and a high-level API, you can go from ingest to answer quickly—and scale confidently.
+ The library converts all document types to rich text representations: PDFs and documents are extracted as text, images are converted to comprehensive AI-generated descriptions, and audio files are transcribed. This unified approach enables powerful cross-modal search while maintaining simplicity.
 
- ### Why Ragdoll?
+ ### Why the New Unified Architecture?
 
- - Database-first foundation on ActiveRecord (PostgreSQL + pgvector only) for performance and reliability
- - Multi-modal architecture (text today; image/audio next) via polymorphic content design
- - Dual metadata model separating semantic analysis from file properties
- - Provider-agnostic LLM integration via `ruby_llm` (OpenAI, Anthropic, Google)
- - Production-friendly: background jobs, connection pooling, indexing, and search analytics
- - Simple, ergonomic high-level API to keep your application code clean
+ - **Reduced Complexity**: Single content model instead of multiple polymorphic types
+ - **Cross-Modal Search**: Find images by searching for objects or concepts in their descriptions
+ - **Unified Index**: One text-based search index for all content types
+ - **Better Retrieval**: Text descriptions often contain more searchable information than raw media
+ - **Cost Effective**: Single embedding model instead of specialized models per media type
+ - **Easier Maintenance**: One embedding pipeline to maintain and optimize
 
  ### Key Capabilities
 
- - Semantic search with vector similarity (cosine) across polymorphic content
- - Text ingestion, chunking, and embedding generation
- - LLM-powered structured metadata with schema validation
- - Search tracking and analytics (CTR, performance, similarity of queries)
- - Hybrid search (semantic + full-text) planned
- - Extensible model and configuration system
+ - **Universal Text Conversion**: Converts any media type to searchable text
+ - **AI-Powered Descriptions**: Comprehensive image descriptions using vision models
+ - **Audio Transcription**: Speech-to-text conversion for audio content
+ - **Semantic Search**: Vector similarity search across all converted content
+ - **Cross-Modal Retrieval**: Search for images using text descriptions of their content
+ - **Content Quality Assessment**: Automatic scoring of converted content quality
+ - **Migration Support**: Tools to migrate from the previous multi-modal architecture
 
  ## Table of Contents
 
  - [Quick Start](#quick-start)
+ - [Unified Architecture Guide](#unified-architecture-guide)
+ - [Document Processing Pipeline](#document-processing-pipeline)
+ - [Cross-Modal Search](#cross-modal-search)
+ - [Migration from Multi-Modal](#migration-from-multi-modal)
  - [API Overview](#api-overview)
- - [Search and Retrieval](#search-and-retrieval)
- - [Search Analytics and Tracking](#search-analytics-and-tracking)
- - [System Operations](#system-operations)
  - [Configuration](#configuration)
- - [Current Implementation Status](#current-implementation-status)
- - [Architecture Highlights](#architecture-highlights)
- - [Text Document Processing](#text-document-processing-current)
- - [PostgreSQL + pgvector Configuration](#postgresql--pgvector-configuration)
- - [Performance Features](#performance-features)
  - [Installation](#installation)
  - [Requirements](#requirements)
- - [Use Cases](#use-cases)
- - [Environment Variables](#environment-variables)
+ - [Performance Features](#performance-features)
  - [Troubleshooting](#troubleshooting)
- - [Related Projects](#related-projects)
- - [Key Design Principles](#key-design-principles)
- - [Contributing & Support](#contributing--support)
 
  ## Quick Start
 
  ```ruby
  require 'ragdoll'
 
- # Configure with PostgreSQL + pgvector
+ # Configure with unified text-based architecture
  Ragdoll.configure do |config|
  # Database configuration (PostgreSQL only)
  config.database_config = {
@@ -88,225 +93,234 @@ Ragdoll.configure do |config|
  auto_migrate: true
  }
 
- # Ruby LLM configuration
- config.ruby_llm_config[:openai][:api_key] = ENV['OPENAI_API_KEY']
- config.ruby_llm_config[:openai][:organization] = ENV['OPENAI_ORGANIZATION']
- config.ruby_llm_config[:openai][:project] = ENV['OPENAI_PROJECT']
+ # Enable unified text-based models
+ config.use_unified_models = true
+
+ # Text conversion settings
+ config.text_conversion = {
+ image_detail_level: :comprehensive, # :minimal, :standard, :comprehensive, :analytical
+ audio_transcription_provider: :openai, # :azure, :google, :whisper_local
+ enable_fallback_descriptions: true
+ }
 
- # Model configuration
- config.models[:default] = 'openai/gpt-4o'
- config.models[:embedding][:text] = 'text-embedding-3-small'
+ # Single embedding model for all content
+ config.embedding_model = "text-embedding-3-large"
+ config.embedding_provider = :openai
 
- # Logging configuration
- config.logging_config[:log_level] = :warn
- config.logging_config[:log_filepath] = File.join(Dir.home, '.ragdoll', 'ragdoll.log')
+ # Ruby LLM configuration
+ config.ruby_llm_config[:openai][:api_key] = ENV['OPENAI_API_KEY']
  end
 
- # Add documents - returns detailed result
+ # Add documents - all types converted to text
  result = Ragdoll.add_document(path: 'research_paper.pdf')
- puts result[:message] # "Document 'research_paper' added successfully with ID 123"
- doc_id = result[:document_id]
+ image_result = Ragdoll.add_document(path: 'diagram.png') # Converted to description
+ audio_result = Ragdoll.add_document(path: 'lecture.mp3') # Converted to transcript
 
- # Check document status
- status = Ragdoll.document_status(id: doc_id)
- puts status[:message] # Shows processing status and embeddings count
+ # Cross-modal search - find images by describing their content
+ results = Ragdoll.search(query: 'neural network architecture diagram')
+ # This can return the image document if its AI description mentions neural networks
 
- # Search across content
- results = Ragdoll.search(query: 'neural networks')
+ # Search audio content by its transcript
+ results = Ragdoll.search(query: 'machine learning discussion')
+ # Returns audio documents whose transcripts mention machine learning
 
- # Get detailed document information
- document = Ragdoll.get_document(id: doc_id)
+ # Check content quality
+ document = Ragdoll.get_document(id: result[:document_id])
+ puts document[:content_quality_score] # 0.0 to 1.0 rating
  ```
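
*Editor's note:* the quick start above leans on one idea: once every media type has been converted to text, cross-modal search reduces to ordinary vector similarity over text embeddings. A dependency-free sketch of that idea (the tiny vectors and the cosine helper are illustrative stand-ins, not Ragdoll internals):

```ruby
# Illustrative only: after conversion, every document is text, so cross-modal
# search is plain vector similarity. These tiny vectors are hand-made stand-ins
# for a real text embedding model's output.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = ->(v) { Math.sqrt(v.sum { |x| x * x }) }
  dot / (norm.call(a) * norm.call(b))
end

corpus = {
  "diagram.png (AI description mentions neural network layers)" => [0.9, 0.1, 0.0],
  "lecture.mp3 (transcript about cooking recipes)"              => [0.0, 0.2, 0.9]
}
query_vec = [0.8, 0.2, 0.1] # stand-in embedding of "neural network architecture"

best = corpus.max_by { |_doc, vec| cosine_similarity(query_vec, vec) }
puts best.first # the image wins, found purely through its text description
```

The same ranking step is what a pgvector cosine-distance query performs at scale inside the database.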
 
- ## API Overview
+ ## Unified Architecture Guide
 
- The `Ragdoll` module provides a convenient high-level API for common operations:
+ ### Document Processing Pipeline
 
- ### Document Management
+ The new unified pipeline converts all media types to searchable text:
 
  ```ruby
- # Add single document - returns detailed result hash
- result = Ragdoll.add_document(path: 'document.pdf')
- puts result[:success] # true
- puts result[:document_id] # "123"
- puts result[:message] # "Document 'document' added successfully with ID 123"
- puts result[:embeddings_queued] # true
-
- # Check document processing status
- status = Ragdoll.document_status(id: result[:document_id])
- puts status[:status] # "processed"
- puts status[:embeddings_count] # 15
- puts status[:embeddings_ready] # true
- puts status[:message] # "Document processed successfully with 15 embeddings"
-
- # Get detailed document information
- document = Ragdoll.get_document(id: result[:document_id])
- puts document[:title] # "document"
- puts document[:status] # "processed"
- puts document[:embeddings_count] # 15
- puts document[:content_length] # 5000
+ # Text files: Direct extraction
+ text_doc = Ragdoll.add_document(path: 'article.md')
+ # Content: Original markdown text
 
- # Update document metadata
- Ragdoll.update_document(id: result[:document_id], title: 'New Title')
+ # PDF/DOCX: Text extraction
+ pdf_doc = Ragdoll.add_document(path: 'research.pdf')
+ # Content: Extracted text from all pages
 
- # Delete document
- Ragdoll.delete_document(id: result[:document_id])
+ # Images: AI-generated descriptions
+ image_doc = Ragdoll.add_document(path: 'chart.png')
+ # Content: "Bar chart showing quarterly sales data with increasing trend..."
 
- # List all documents
- documents = Ragdoll.list_documents(limit: 10)
+ # Audio: Speech-to-text transcription
+ audio_doc = Ragdoll.add_document(path: 'meeting.mp3')
+ # Content: "In this meeting we discussed the quarterly results..."
 
- # System statistics
- stats = Ragdoll.stats
- puts stats[:total_documents] # 50
- puts stats[:total_embeddings] # 1250
+ # Video: Audio transcription + metadata
+ video_doc = Ragdoll.add_document(path: 'presentation.mp4')
+ # Content: Combination of audio transcript and video metadata
  ```
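
*Editor's note:* the pipeline above implies a dispatch on file type before conversion. A minimal sketch of such routing (the extension table and strategy names are hypothetical, not Ragdoll's internals):

```ruby
# Hypothetical dispatcher: route each file to a conversion strategy by extension.
# The table and strategy names are made up for illustration.
CONVERTERS = {
  %w[.txt .md]        => :direct_text,
  %w[.pdf .docx]      => :text_extraction,
  %w[.png .jpg .jpeg] => :image_to_text_description,
  %w[.mp3 .wav]       => :audio_transcription,
  %w[.mp4 .mov]       => :audio_transcription_plus_metadata
}.freeze

def conversion_for(path)
  ext = File.extname(path).downcase
  CONVERTERS.each { |exts, strategy| return strategy if exts.include?(ext) }
  :fallback_description # unknown types still get a searchable placeholder
end

puts conversion_for('research.pdf') # text_extraction
puts conversion_for('chart.PNG')    # image_to_text_description
```

Whatever the strategy returns, the output is plain text, which is what makes the single shared embedding model possible.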
 
- ### Search and Retrieval
+ ### Text Conversion Services
 
  ```ruby
- # Semantic search across all content types
- results = Ragdoll.search(query: 'artificial intelligence')
-
- # Search with automatic tracking (default)
- results = Ragdoll.search(
- query: 'machine learning',
- session_id: 123, # Optional: track user sessions
- user_id: 456 # Optional: track by user
- )
+ # Use individual conversion services
+ text_content = Ragdoll::TextExtractionService.extract('document.pdf')
+ image_description = Ragdoll::ImageToTextService.convert('photo.jpg', detail_level: :comprehensive)
+ audio_transcript = Ragdoll::AudioToTextService.transcribe('speech.wav')
 
- # Search specific content types
- text_results = Ragdoll.search(query: 'machine learning', content_type: 'text')
- image_results = Ragdoll.search(query: 'neural network diagram', content_type: 'image')
- audio_results = Ragdoll.search(query: 'AI discussion', content_type: 'audio')
+ # Use unified converter (orchestrates all services)
+ unified_text = Ragdoll::DocumentConverter.convert_to_text('any_file.ext')
 
- # Advanced search with metadata filters
- results = Ragdoll.search(
- query: 'deep learning',
- classification: 'research',
- keywords: ['AI', 'neural networks'],
- tags: ['technical']
- )
+ # Manage documents with unified approach
+ management = Ragdoll::UnifiedDocumentManagement.new
+ document = management.add_document('mixed_media_file.mov')
+ ```
 
- # Get context for RAG applications
- context = Ragdoll.get_context(query: 'machine learning', limit: 5)
+ ### Content Quality Assessment
 
- # Enhanced prompt with context
- enhanced = Ragdoll.enhance_prompt(
- prompt: 'What is machine learning?',
- context_limit: 5
+ ```ruby
+ # Get content quality scores
+ document = Ragdoll::UnifiedDocument.find(id)
+ quality = document.content_quality_score # 0.0 to 1.0
+
+ # Quality factors:
+ # - Content length (50-2000 words optimal)
+ # - Original media type (text > documents > descriptions > placeholders)
+ # - Conversion success (full content > partial > fallback)
+
+ # Batch quality assessment
+ stats = Ragdoll::UnifiedContent.stats
+ puts stats[:content_quality_distribution]
+ # => { high: 150, medium: 75, low: 25 }
+ ```
+
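
*Editor's note:* the quality factors listed above come without a formula. One plausible dependency-free heuristic consistent with them (the weights and exact formula are invented for illustration; Ragdoll's real `content_quality_score` may differ):

```ruby
# Hypothetical heuristic built only from the factors listed above; the weights
# and exact formula are invented, not Ragdoll's implementation.
def quality_score(text, original_media_type)
  words = text.split.size

  # Factor: length, with a 50-2000 word sweet spot
  length_score =
    if words.between?(50, 2000) then 1.0
    elsif words < 50 then words / 50.0
    else 2000.0 / words
    end

  # Factor: how direct the conversion was (text > extracted documents > descriptions)
  type_score = { text: 1.0, document: 0.9, image: 0.7, audio: 0.7 }
               .fetch(original_media_type, 0.5)

  (0.6 * length_score + 0.4 * type_score).round(2)
end

puts quality_score("word " * 100, :text)  # in-range text scores 1.0
puts quality_score("too short", :image)   # short fallback description scores low
```

A score like this is useful mainly as a filter: drop or reprocess conversions whose text is too thin to retrieve against.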
+ ## Cross-Modal Search
+
+ The unified architecture enables powerful cross-modal search capabilities:
+
+ ```ruby
+ # Find images by describing their visual content
+ image_results = Ragdoll.search(query: 'red sports car in parking lot')
+ # Returns image documents whose AI descriptions match the query
+
+ # Search for audio by spoken content
+ audio_results = Ragdoll.search(query: 'quarterly sales meeting discussion')
+ # Returns audio documents whose transcripts contain these topics
+
+ # Mixed results across all media types
+ all_results = Ragdoll.search(query: 'artificial intelligence')
+ # Returns text documents, images with AI descriptions, and audio transcripts
+ # all ranked by relevance to the query
+
+ # Filter by original media type while searching text
+ image_only = Ragdoll.search(
+ query: 'machine learning workflow',
+ original_media_type: 'image'
  )
 
- # Hybrid search combining semantic and full-text
- results = Ragdoll.hybrid_search(
- query: 'neural networks',
- semantic_weight: 0.7,
- text_weight: 0.3
+ # Search with quality filtering
+ high_quality = Ragdoll.search(
+ query: 'deep learning',
+ min_quality_score: 0.7
  )
  ```
 
- ### Keywords Search
+ ## Migration from Multi-Modal
 
- Ragdoll supports powerful keywords-based search that can be used standalone or combined with semantic search. The keywords system uses PostgreSQL array operations for high performance and supports both partial matching (overlap) and exact matching (contains all).
+ Migrate smoothly from the previous multi-modal architecture:
 
  ```ruby
- # Keywords-only search (overlap - documents containing any of the keywords)
- results = Ragdoll::Document.search_by_keywords(['machine', 'learning', 'ai'])
+ # Check migration readiness
+ migration_service = Ragdoll::MigrationService.new
+ report = migration_service.create_comparison_report
 
- # Results are sorted by match count (documents with more keyword matches rank higher)
- results.each do |doc|
- puts "#{doc.title}: #{doc.keywords_match_count} matches"
- end
+ puts "Migration Benefits:"
+ report[:benefits].each { |_benefit, description| puts "- #{description}" }
 
- # Exact keywords search (contains all - documents must have ALL keywords)
- results = Ragdoll::Document.search_by_keywords_all(['ruby', 'programming'])
-
- # Results are sorted by focus (fewer total keywords = more focused document)
- results.each do |doc|
- puts "#{doc.title}: #{doc.total_keywords_count} total keywords"
- end
-
- # Combined semantic + keywords search for best results
- results = Ragdoll.search(
- query: 'artificial intelligence applications',
- keywords: ['ai', 'machine learning', 'neural networks'],
- limit: 10
+ # Migrate all documents
+ results = Ragdoll::MigrationService.migrate_all_documents(
+ batch_size: 50,
+ process_embeddings: true
  )
 
- # Keywords search with options
- results = Ragdoll::Document.search_by_keywords(
- ['web', 'javascript', 'frontend'],
- limit: 20
- )
+ puts "Migrated: #{results[:migrated]} documents"
+ puts "Errors: #{results[:errors].length}"
+
+ # Validate migration integrity
+ validation = migration_service.validate_migration
+ puts "Validation passed: #{validation[:passed]}/#{validation[:total_checks]} checks"
 
- # Case-insensitive keyword matching (automatically normalized)
- results = Ragdoll::Document.search_by_keywords(['Python', 'DATA-SCIENCE', 'ai'])
- # Will match documents with keywords: ['python', 'data-science', 'ai']
+ # Migrate individual document
+ migrated_doc = Ragdoll::MigrationService.migrate_document(old_document_id)
  ```
 
- **Keywords Search Features:**
- - **High Performance**: Uses PostgreSQL GIN indexes for fast array operations
- - **Flexible Matching**: Supports both overlap (`&&`) and contains (`@>`) operators
- - **Smart Scoring**: Results ordered by match count or document focus
- - **Case Insensitive**: Automatic keyword normalization
- - **Integration Ready**: Works seamlessly with semantic search
- - **Inspired by `find_matching_entries.rb`**: Optimized for PostgreSQL arrays
+ ## API Overview
+
+ ### Unified Document Management
 
- ### Search Analytics and Tracking
+ ```ruby
+ # Add documents with automatic text conversion
+ result = Ragdoll.add_document(path: 'any_file.ext')
+ puts result[:document_id]
+ puts result[:content_preview] # First 100 characters of converted text
+
+ # Batch processing with unified pipeline
+ files = ['doc.pdf', 'image.jpg', 'audio.mp3']
+ results = Ragdoll::UnifiedDocumentManagement.new.batch_process_documents(files)
+
+ # Reprocess with different conversion settings
+ Ragdoll::UnifiedDocumentManagement.new.reprocess_document(
+ document_id,
+ image_detail_level: :analytical
+ )
+ ```
 
- Ragdoll automatically tracks all searches to provide comprehensive analytics and improve search relevance over time:
+ ### Search API
 
  ```ruby
- # Get search analytics for the last 30 days
- analytics = Ragdoll::Search.search_analytics(days: 30)
- puts "Total searches: #{analytics[:total_searches]}"
- puts "Unique queries: #{analytics[:unique_queries]}"
- puts "Average execution time: #{analytics[:avg_execution_time]}ms"
- puts "Click-through rate: #{analytics[:click_through_rate]}%"
-
- # Find similar searches using vector similarity
- search = Ragdoll::Search.first
- similar_searches = search.nearest_neighbors(:query_embedding, distance: :cosine).limit(5)
-
- similar_searches.each do |similar|
- puts "Query: #{similar.query}"
- puts "Similarity: #{similar.neighbor_distance}"
- puts "Results: #{similar.results_count}"
- end
+ # Unified search across all content types
+ results = Ragdoll.search(query: 'machine learning algorithms')
 
- # Track user interactions (clicks on search results)
- search_result = Ragdoll::SearchResult.first
- search_result.mark_as_clicked!
+ # Search with original media type context
+ results.each do |doc|
+ puts "#{doc.title} (originally #{doc.original_media_type})"
+ puts "Quality: #{doc.content_quality_score.round(2)}"
+ puts "Content: #{doc.content[0..100]}..."
+ end
 
- # Disable tracking for specific searches if needed
- results = Ragdoll.search(
- query: 'private query',
- track_search: false
+ # Advanced search with content quality
+ high_quality_results = Ragdoll.search(
+ query: 'neural networks',
+ min_quality_score: 0.8,
+ limit: 10
  )
  ```
 
- ### System Operations
+ ### Content Analysis
 
  ```ruby
- # Get system statistics
- stats = Ragdoll.stats
- # Returns information about documents, content types, embeddings, etc.
+ # Analyze converted content
+ document = Ragdoll::UnifiedDocument.find(id)
 
- # Health check
- healthy = Ragdoll.healthy?
+ # Check original media type
+ puts document.unified_contents.first.original_media_type # 'image', 'audio', 'text', etc.
 
- # Get configuration
- config = Ragdoll.configuration
+ # View conversion metadata
+ content = document.unified_contents.first
+ puts content.conversion_method # 'image_to_text', 'audio_transcription', etc.
+ puts content.metadata # Conversion settings and results
 
- # Reset configuration (useful for testing)
- Ragdoll.reset_configuration!
+ # Quality metrics
+ puts content.word_count
+ puts content.character_count
+ puts content.content_quality_score
  ```
 
- ### Configuration
+ ## Configuration
 
  ```ruby
- # Configure the system
  Ragdoll.configure do |config|
- # Database configuration (PostgreSQL only - REQUIRED)
+ # Enable unified text-based architecture
+ config.use_unified_models = true
+
+ # Database configuration (PostgreSQL required)
  config.database_config = {
  adapter: 'postgresql',
  database: 'ragdoll_production',
@@ -317,141 +331,74 @@ Ragdoll.configure do |config|
  auto_migrate: true
  }
 
- # Ruby LLM configuration for multiple providers
- config.ruby_llm_config[:openai][:api_key] = ENV['OPENAI_API_KEY']
- config.ruby_llm_config[:openai][:organization] = ENV['OPENAI_ORGANIZATION']
- config.ruby_llm_config[:openai][:project] = ENV['OPENAI_PROJECT']
+ # Text conversion settings
+ config.text_conversion = {
+ # Image conversion detail levels:
+ # :minimal - Brief one-sentence description
+ # :standard - Main elements and composition
+ # :comprehensive - Detailed description including objects, colors, mood
+ # :analytical - Thorough analysis including artistic elements
+ image_detail_level: :comprehensive,
+
+ # Audio transcription providers
+ audio_transcription_provider: :openai, # :azure, :google, :whisper_local
+
+ # Fallback behavior
+ enable_fallback_descriptions: true,
+ fallback_timeout: 30 # seconds
+ }
+
+ # Single embedding model for all content types
+ config.embedding_model = "text-embedding-3-large"
+ config.embedding_provider = :openai
 
+ # Ruby LLM configuration for text conversion
+ config.ruby_llm_config[:openai][:api_key] = ENV['OPENAI_API_KEY']
  config.ruby_llm_config[:anthropic][:api_key] = ENV['ANTHROPIC_API_KEY']
- config.ruby_llm_config[:google][:api_key] = ENV['GOOGLE_API_KEY']
 
- # Model configuration
- config.models[:default] = 'openai/gpt-4o'
- config.models[:summary] = 'openai/gpt-4o'
- config.models[:keywords] = 'openai/gpt-4o'
- config.models[:embedding][:text] = 'text-embedding-3-small'
- config.models[:embedding][:image] = 'image-embedding-3-small'
- config.models[:embedding][:audio] = 'audio-embedding-3-small'
+ # Vision model configuration for image descriptions
+ config.vision_config = {
+ primary_model: 'gpt-4-vision-preview',
+ fallback_model: 'gemini-pro-vision',
+ temperature: 0.2
+ }
 
- # Logging configuration
- config.logging_config[:log_level] = :warn # :debug, :info, :warn, :error, :fatal
- config.logging_config[:log_filepath] = File.join(Dir.home, '.ragdoll', 'ragdoll.log')
+ # Audio transcription configuration
+ config.audio_config = {
+ openai: {
+ model: 'whisper-1',
+ temperature: 0.0
+ },
+ azure: {
+ endpoint: ENV['AZURE_SPEECH_ENDPOINT'],
+ api_key: ENV['AZURE_SPEECH_KEY']
+ }
+ }
 
  # Processing settings
  config.chunking[:text][:max_tokens] = 1000
  config.chunking[:text][:overlap] = 200
  config.search[:similarity_threshold] = 0.7
  config.search[:max_results] = 10
- end
- ```
-
- ## Current Implementation Status
-
- ### ✅ **Fully Implemented**
- - **Text document processing**: PDF, DOCX, HTML, Markdown, plain text files
- - **Embedding generation**: Text chunking and vector embedding creation
- - **Database schema**: Multi-modal polymorphic architecture with PostgreSQL + pgvector
- - **Dual metadata architecture**: Separate LLM-generated content analysis and file properties
- - **Search functionality**: Semantic search with cosine similarity and usage analytics
- - **Search tracking system**: Comprehensive analytics with query embeddings, click-through tracking, and performance monitoring
- - **Document management**: Add, update, delete, list operations
- - **Background processing**: ActiveJob integration for async embedding generation
- - **LLM metadata generation**: AI-powered structured content analysis with schema validation
- - **Logging**: Configurable file-based logging with multiple levels
-
- ### 🚧 **In Development**
- - **Image processing**: Framework exists but vision AI integration needs completion
- - **Audio processing**: Framework exists but speech-to-text integration needs completion
- - **Hybrid search**: Combining semantic and full-text search capabilities
-
- ### 📋 **Planned Features**
- - **Multi-modal search**: Search across text, image, and audio content types
- - **Content-type specific embedding models**: Different models for text, image, audio
- - **Enhanced metadata schemas**: Domain-specific metadata templates
-
- ## Architecture Highlights
-
- ### Dual Metadata Design
-
- Ragdoll uses a sophisticated dual metadata architecture to separate concerns:
-
- - **`metadata` (JSON)**: LLM-generated content analysis including summary, keywords, classification, topics, sentiment, and domain-specific insights
- - **`file_metadata` (JSON)**: System-generated file properties including size, MIME type, dimensions, processing parameters, and technical characteristics
-
- This separation enables both semantic search operations on content meaning and efficient file management operations.
-
- ### Polymorphic Multi-Modal Architecture
-
- The database schema uses polymorphic associations to elegantly support multiple content types:
-
- - **Documents**: Central entity with dual metadata columns
- - **Content Types**: Specialized tables for `text_contents`, `image_contents`, `audio_contents`
- - **Embeddings**: Unified vector storage via polymorphic `embeddable` associations
-
- ## Text Document Processing (Current)
-
- Currently, Ragdoll processes text documents through:
-
- 1. **Content Extraction**: Extracts text from PDF, DOCX, HTML, Markdown, and plain text
- 2. **Metadata Generation**: AI-powered analysis creates structured content metadata
- 3. **Text Chunking**: Splits content into manageable chunks with configurable size/overlap
- 4. **Embedding Generation**: Creates vector embeddings using OpenAI or other providers
- 5. **Database Storage**: Stores in polymorphic multi-modal architecture with dual metadata
- 6. **Search**: Semantic search using cosine similarity with usage analytics
-
- ### Example Usage
-
- ```ruby
- # Add a text document
- result = Ragdoll.add_document(path: 'document.pdf')
-
- # Check processing status
- status = Ragdoll.document_status(id: result[:document_id])
-
- # Search the content
- results = Ragdoll.search(query: 'machine learning')
- ```
-
- ## PostgreSQL + pgvector Configuration
-
- ### Database Setup
-
- ```bash
- # Install PostgreSQL and pgvector
- brew install postgresql pgvector # macOS
- # or
- apt-get install postgresql postgresql-contrib # Ubuntu
-
- # Create database and enable pgvector extension
- createdb ragdoll_production
- psql -d ragdoll_production -c "CREATE EXTENSION IF NOT EXISTS vector;"
- ```
 
- ### Configuration Example
431
-
432
- ```ruby
433
- Ragdoll.configure do |config|
434
- config.database_config = {
435
- adapter: 'postgresql',
436
- database: 'ragdoll_production',
437
- username: 'ragdoll',
438
- password: ENV['DATABASE_PASSWORD'],
439
- host: 'localhost',
440
- port: 5432,
441
- pool: 20,
442
- auto_migrate: true
384
+ # Quality thresholds
385
+ config.quality_thresholds = {
386
+ high_quality: 0.8,
387
+ medium_quality: 0.5,
388
+ min_content_length: 50
443
389
  }
444
390
  end
445
391
  ```
446
392
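The thresholds above read as score cut-offs plus a minimum-length gate. A minimal sketch of how such values might be applied when bucketing converted content (`classify_quality` is illustrative, not part of the ragdoll API):

```ruby
# Hypothetical helper showing how quality thresholds could bucket
# converted content. Values mirror the config example above.
QUALITY_THRESHOLDS = {
  high_quality: 0.8,
  medium_quality: 0.5,
  min_content_length: 50
}.freeze

def classify_quality(score, content_length, thresholds = QUALITY_THRESHOLDS)
  # Too little text to be a useful search target, regardless of score.
  return :rejected if content_length < thresholds[:min_content_length]

  if score >= thresholds[:high_quality]
    :high
  elsif score >= thresholds[:medium_quality]
    :medium
  else
    :low
  end
end

classify_quality(0.9, 500)  # => :high
classify_quality(0.6, 500)  # => :medium
classify_quality(0.9, 10)   # => :rejected
```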
 
447
393
  ## Performance Features
448
394
 
449
- - **Native pgvector**: Hardware-accelerated similarity search
450
- - **IVFFlat indexing**: Fast approximate nearest neighbor search
451
- - **Polymorphic embeddings**: Unified search across content types
452
- - **Batch processing**: Efficient bulk operations
453
- - **Background jobs**: Asynchronous document processing
454
- - **Connection pooling**: High-concurrency support
395
+ - **Unified Index**: Single text-based search index for all content types
396
+ - **Optimized Conversion**: Efficient text extraction and AI-powered description generation
397
+ - **Quality Scoring**: Automatic assessment of converted content quality
398
+ - **Batch Processing**: Efficient bulk document processing with progress tracking
399
+ - **Smart Caching**: Caches conversion results to avoid reprocessing
400
+ - **Background Jobs**: Asynchronous processing for large files
401
+ - **Cross-Modal Optimization**: Specialized optimizations for different media type conversions
455
402
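The "Smart Caching" behavior can be approximated by keying conversion results on a digest of the file's bytes, so an unchanged file is never converted twice. A sketch under that assumption (`ConversionCache` is a hypothetical name, not a ragdoll class):

```ruby
require 'digest'

# Hypothetical cache for conversion results, keyed by a SHA-256 digest
# of the file contents: re-adding an unchanged file skips the expensive
# AI description / transcription step.
class ConversionCache
  def initialize
    @store = {}
  end

  # Returns the cached text for this file's contents, or runs the
  # conversion block once and memoizes its result.
  def fetch(path)
    key = Digest::SHA256.file(path).hexdigest
    @store[key] ||= yield(path)
  end
end
```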
 
456
403
  ## Installation
457
404
 
@@ -461,6 +408,12 @@ brew install postgresql pgvector # macOS
461
408
  # or
462
409
  apt-get install postgresql postgresql-contrib # Ubuntu
463
410
 
411
+ # For image processing
412
+ brew install imagemagick
413
+
414
+ # For audio processing (optional, depending on provider)
415
+ brew install ffmpeg
416
+
464
417
  # Install gem
465
418
  gem install ragdoll
466
419
 
@@ -471,61 +424,83 @@ gem 'ragdoll'
471
424
  ## Requirements
472
425
 
473
426
  - **Ruby**: 3.2+
474
- - **PostgreSQL**: 12+ with pgvector extension (REQUIRED - no other databases supported)
475
- - **Dependencies**: activerecord, pg, pgvector, neighbor, ruby_llm, pdf-reader, docx, rubyzip, shrine, rmagick, opensearch-ruby, searchkick, ruby-progressbar
427
+ - **PostgreSQL**: 12+ with pgvector extension
428
+ - **ImageMagick**: For image processing and metadata extraction
429
+ - **FFmpeg**: Optional, for advanced audio/video processing
430
+ - **Dependencies**: activerecord, pg, pgvector, neighbor, ruby_llm, pdf-reader, docx, rmagick, tempfile
476
431
 
477
- ## Use Cases
432
+ ### Vision Model Requirements
478
433
 
479
- - Internal knowledge bases and chat assistants grounded in your documents
480
- - Product documentation and support search with analytics and relevance feedback
481
- - Research corpora exploration (summaries, topics, similarity) across large text sets
482
- - Incident retrospectives and operational analytics with searchable write-ups
483
- - Media libraries preparing for text + image + audio pipelines (image/audio in progress)
434
+ For comprehensive image descriptions:
435
+ - **OpenAI**: GPT-4 Vision (recommended)
436
+ - **Google**: Gemini Pro Vision
437
+ - **Anthropic**: Claude 3 with vision capabilities
438
+ - **Local**: Ollama with vision-capable models
484
439
 
485
- ## Environment Variables
440
+ ### Audio Transcription Requirements
486
441
 
487
- Set the following as environment variables (do not commit secrets to source control):
488
-
489
- - `OPENAI_API_KEY` required for OpenAI models
490
- - `OPENAI_ORGANIZATION` optional, for OpenAI org scoping
491
- - `OPENAI_PROJECT` — optional, for OpenAI project scoping
492
- - `ANTHROPIC_API_KEY` — optional, for Anthropic models
493
- - `GOOGLE_API_KEY` — optional, for Google models
494
- - `DATABASE_PASSWORD` — your PostgreSQL password if not using peer auth
442
+ - **OpenAI**: Whisper API (recommended)
443
+ - **Azure**: Speech Services
444
+ - **Google**: Cloud Speech-to-Text
445
+ - **Local**: Whisper installation
495
446
 
496
447
  ## Troubleshooting
497
448
 
498
- ### pgvector extension missing
499
-
500
- - Ensure the extension is enabled in your database:
449
+ ### Image Processing Issues
501
450
 
502
451
  ```bash
503
- psql -d ragdoll_production -c "CREATE EXTENSION IF NOT EXISTS vector;"
452
+ # Verify ImageMagick installation
453
+ magick -version   # ImageMagick 7; on version 6 use: convert -version
454
+
455
+ # Check vision model access
456
+ irb -r ragdoll
457
+ > Ragdoll::ImageToTextService.new.convert('test_image.jpg')
504
458
  ```
505
459
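When a vision call fails (network, quota, corrupt file), ingestion can fall back to indexing minimal file metadata rather than dropping the document. A hypothetical pattern, where `describe_image` stands in for whatever vision-model converter you use:

```ruby
# Hypothetical fallback: if AI description fails, index the image under
# a placeholder string instead of losing it from the search index.
def safe_describe(path)
  describe_image(path)  # stand-in for a vision-model call
rescue StandardError => e
  "image file #{File.basename(path)} (description unavailable: #{e.class})"
end
```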
 
506
- - If the command fails, verify PostgreSQL and pgvector are installed and that you’re connecting to the correct database.
460
+ ### Audio Processing Issues
507
461
 
508
- ### Document stuck in "processing"
462
+ ```bash
463
+ # For Whisper local installation
464
+ pip install openai-whisper
509
465
 
510
- - Confirm your API keys are set and valid.
511
- - Ensure `auto_migrate: true` in configuration (or run migrations if you manage schema yourself).
512
- - Check logs at the path configured by `logging_config[:log_filepath]` for errors.
466
+ # Test audio file support
467
+ irb -r ragdoll
468
+ > Ragdoll::AudioToTextService.new.transcribe('test_audio.wav')
469
+ ```
513
470
 
514
- ## Related Projects
471
+ ### Content Quality Issues
515
472
 
516
- - **ragdoll-cli**: Standalone CLI application using ragdoll
517
- - **ragdoll-rails**: Rails engine with web interface for ragdoll
473
+ ```ruby
474
+ # Check content quality distribution
475
+ stats = Ragdoll::UnifiedContent.stats
476
+ puts stats[:content_quality_distribution]
477
+
478
+ # Reprocess low-quality content
479
+ low_quality = Ragdoll::UnifiedDocument.joins(:unified_contents)
480
+ .where('unified_contents.content_quality_score < 0.5')
481
+
482
+ low_quality.each do |doc|
483
+ Ragdoll::UnifiedDocumentManagement.new.reprocess_document(
484
+ doc.id,
485
+ image_detail_level: :analytical
486
+ )
487
+ end
488
+ ```
518
489
 
519
- ## Contributing & Support
490
+ ## Use Cases
520
491
 
521
- Contributions are welcome! If you find a bug or have a feature request, please open an issue or submit a pull request. For questions and feedback, open an issue in this repository.
492
+ - **Knowledge Bases**: Search across text documents, presentation images, and recorded meetings
493
+ - **Media Libraries**: Find images by visual content, audio by spoken topics
494
+ - **Research Collections**: Unified search across papers (text), charts (images), and interviews (audio)
495
+ - **Documentation Systems**: Search technical docs, architecture diagrams, and explanation videos
496
+ - **Educational Content**: Find learning materials across all media types through unified text search
522
497
 
523
498
  ## Key Design Principles
524
499
 
525
- 1. **Database-Oriented**: Built on ActiveRecord with PostgreSQL + pgvector for production performance
526
- 2. **Multi-Modal First**: Text, image, and audio content as first-class citizens via polymorphic architecture
527
- 3. **Dual Metadata Design**: Separates LLM-generated content analysis from file properties
528
- 4. **LLM-Enhanced**: Structured metadata generation with schema validation using latest AI capabilities
529
- 5. **High-Level API**: Simple, intuitive interface for complex operations
530
- 6. **Scalable**: Designed for production workloads with background processing and proper indexing
531
- 7. **Extensible**: Easy to add new content types and embedding models through polymorphic design
500
+ 1. **Unified Text Representation**: All media types converted to searchable text
501
+ 2. **Cross-Modal Search**: Images findable through descriptions, audio through transcripts
502
+ 3. **Quality-Driven**: Automatic assessment and optimization of converted content
503
+ 4. **Simplified Architecture**: Single content model instead of complex polymorphic relationships
504
+ 5. **AI-Enhanced Conversion**: Leverages latest vision and speech models for rich text conversion
505
+ 6. **Migration-Friendly**: Smooth transition path from previous multi-modal architecture
506
+ 7. **Performance-Optimized**: Single embedding model and unified search index for speed
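Principle 1 can be sketched as a single dispatch point that funnels every media type into one text representation before embedding. The converter calls below are illustrative placeholders, not the gem's actual classes:

```ruby
# Hypothetical sketch of the unified text pipeline: detect the media
# type, then route to the matching text converter.
EXTENSION_TYPES = {
  '.txt' => :text, '.md' => :text, '.pdf' => :text,
  '.jpg' => :image, '.png' => :image,
  '.wav' => :audio, '.mp3' => :audio
}.freeze

def media_type(path)
  EXTENSION_TYPES.fetch(File.extname(path).downcase, :unknown)
end

def to_unified_text(path)
  case media_type(path)
  when :text  then "extracted text from #{path}"  # e.g. pdf-reader
  when :image then "AI description of #{path}"    # e.g. a vision model
  when :audio then "transcript of #{path}"        # e.g. Whisper
  else raise ArgumentError, "unsupported media: #{path}"
  end
end

to_unified_text('chart.png')  # => "AI description of chart.png"
```

Whatever the real converters are, the point is that the output of every branch is plain text, so a single embedding model and one search index cover all media types.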