RubyGems - ragdoll - Versions diffs - 0.1.8 → 0.1.10 - Mend

ragdoll 0.1.8 → 0.1.10

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (27)

checksums.yaml +4 -4
data/CHANGELOG.md +243 -0
data/README.md +209 -31
data/Rakefile +4 -5
data/app/models/ragdoll/document.rb +115 -12
data/app/models/ragdoll/embedding.rb +108 -2
data/app/models/ragdoll/search.rb +165 -0
data/app/models/ragdoll/search_result.rb +121 -0
data/app/services/ragdoll/configuration_service.rb +3 -3
data/app/services/ragdoll/document_processor.rb +124 -1
data/app/services/ragdoll/embedding_service.rb +10 -0
data/app/services/ragdoll/search_engine.rb +75 -6
data/db/migrate/{001_enable_postgresql_extensions.rb → 20250815234901_enable_postgresql_extensions.rb} +7 -8
data/db/migrate/20250815234902_create_ragdoll_documents.rb +117 -0
data/db/migrate/{005_create_ragdoll_embeddings.rb → 20250815234903_create_ragdoll_embeddings.rb} +13 -10
data/db/migrate/{006_create_ragdoll_contents.rb → 20250815234904_create_ragdoll_contents.rb} +14 -11
data/db/migrate/20250815234905_create_ragdoll_searches.rb +77 -0
data/db/migrate/20250815234906_create_ragdoll_search_results.rb +49 -0
data/lib/ragdoll/core/client.rb +75 -8
data/lib/ragdoll/core/database.rb +8 -3
data/lib/ragdoll/core/model.rb +13 -0
data/lib/ragdoll/core/version.rb +1 -1
data/lib/ragdoll/core.rb +2 -0
data/lib/ragdoll.rb +17 -0
data/lib/tasks/db.rake +75 -27
metadata +375 -6
data/db/migrate/004_create_ragdoll_documents.rb +0 -70
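Three migrations in the list above are renamed from sequential prefixes (`001`, `005`, `006`) to timestamp prefixes. `004_create_ragdoll_documents.rb` is deleted and appears to be superseded by the new `20250815234902_create_ragdoll_documents.rb`. ActiveRecord tracks applied migrations by that numeric prefix, so a database migrated under 0.1.8 would treat every renamed file as a new, pending migration. The plain-Ruby sketch below illustrates that effect. The helper names are illustrative and are not ragdoll or ActiveRecord code. The applied-version set is hypothetical and uses only the prefixes visible in this diff. This is likely why the Rakefile's test setup now resets the database instead of migrating in place.

```ruby
# Plain-Ruby illustration of how migration filenames map to schema versions.
# Not ragdoll or ActiveRecord code; versions are simplified to integers
# (ActiveRecord stores them as strings in schema_migrations).

# Extract the leading numeric version, e.g. "005_create_x.rb" -> 5.
def migration_version(filename)
  match = File.basename(filename).match(/\A(\d+)_/)
  raise ArgumentError, "not a migration filename: #{filename}" unless match

  match[1].to_i
end

# Versions that exist on disk but are not recorded as applied.
def pending_versions(filenames, applied_versions)
  filenames.map { |f| migration_version(f) }.sort - applied_versions
end

# Hypothetical versions recorded by a database migrated under 0.1.8
# (only the prefixes visible in this diff).
old_applied = [1, 4, 5, 6]

new_files = %w[
  20250815234901_enable_postgresql_extensions.rb
  20250815234902_create_ragdoll_documents.rb
  20250815234903_create_ragdoll_embeddings.rb
  20250815234904_create_ragdoll_contents.rb
  20250815234905_create_ragdoll_searches.rb
  20250815234906_create_ragdoll_search_results.rb
]

# None of the timestamped versions were recorded, so all six are pending.
p pending_versions(new_files, old_applied)
```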

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 7fb2f70ebe6d95bfcfca1ba44e84f140f1d75d17e27ead66ce9b7643f3571688
-  data.tar.gz: 61e3ccb7dc45bb6196e70770d4eaed9cae17602a9b442e5f525752c1e4a53445
+  metadata.gz: 4f7b2c95ede1523e9e01af70394217387d876da6317fed651df3e27cf337cfe9
+  data.tar.gz: a82ae7d541fd06876acb3acaf8f02639234f8b118274621851678a2799c5f559
 SHA512:
-  metadata.gz: 318e00ff0df2e4b075b9379ffc4a13de4700c4fa6c2c544be8678b700e4810d7cc80479eed3f709e6f25891a394741a8dccfc8e1fed6017d31607946c9267549
-  data.tar.gz: a8261e8a3f2740599564f4dd3b2c31914903339035664c01bfdea4800227858f071d25675ffd17c419b59d47baf8c0eb91313600355ac86bfc8d21eaf5e34add
+  metadata.gz: ba14828a6e743677c84072b9f1bb27743e429531ebdd9fbd3d8553add7bbdad070d709cd617dc620fef4ddc6846085ca79d3bb6d32bae8465c6b3b10acc0692f
+  data.tar.gz: de630ebf15168b562ef686ec6cd9f1cfe532b5bbf495e33a74085b567cf53ce7bb87e7c5c543756c47bd68c98290221b879a1b4d8e5888aac4916d1c1554fe99
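`checksums.yaml` records SHA256 and SHA512 digests of the two archives packed inside the `.gem` file: `metadata.gz` and `data.tar.gz`. Below is a minimal sketch for checking a file against one of these digests with Ruby's standard `digest` library. The helper name `checksum_matches?` is hypothetical. The demo uses the standard SHA256 test vector for `"abc"` rather than the real archives.

```ruby
require "digest"
require "tempfile"

# Algorithm names as they appear in checksums.yaml, mapped to digest classes.
DIGESTS = { "SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512 }.freeze

# Hypothetical helper: true when the file's hex digest equals the expected one.
def checksum_matches?(path, algorithm, expected_hex)
  digest_class = DIGESTS.fetch(algorithm) do
    raise ArgumentError, "unsupported algorithm: #{algorithm}"
  end
  digest_class.file(path).hexdigest == expected_hex.downcase
end

# Demo on a temp file holding "abc", whose SHA256 is a standard test vector.
Tempfile.create("checksum-demo") do |file|
  file.write("abc")
  file.flush
  abc_sha256 = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
  puts checksum_matches?(file.path, "SHA256", abc_sha256) # prints true
end
```

To check the real archives, you could first download and unpack the gem. A `.gem` file is a plain tar archive, so `gem fetch ragdoll -v 0.1.10` followed by `tar -xf ragdoll-0.1.10.gem` yields `data.tar.gz`. Then call `checksum_matches?("data.tar.gz", "SHA256", "a82ae7d541fd06876acb3acaf8f02639234f8b118274621851678a2799c5f559")`.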

data/CHANGELOG.md ADDED

@@ -0,0 +1,243 @@
+# Changelog
+All notable changes to the Ragdoll Core project will be documented in this file.
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [Unreleased]
+## [0.1.10] - 2025-01-15
+### Changed
+- Continued improvements to search performance and accuracy
+### Added
+- **Hybrid Search**: Complete implementation combining semantic and full-text search capabilities
+  - Configurable weights for semantic vs text search (default: 70% semantic, 30% text)
+  - Deduplication of results by document ID
+  - Combined scoring system for unified result ranking
+- **Full-text Search**: PostgreSQL full-text search with tsvector indexing
+  - Per-word match ratio scoring (0.0 to 1.0)
+  - GIN index for high-performance text search
+  - Search across title, summary, keywords, and description fields
+- **Enhanced Search API**: Complete search type delegation at top-level Ragdoll namespace
+  - `Ragdoll.hybrid_search` method for combined semantic and text search
+  - `Ragdoll::Document.search_content` for full-text search capabilities
+  - Consistent parameter handling across all search methods
+### Changed
+- **Search Architecture**: Unified search interface supporting semantic, fulltext, and hybrid modes
+- **Database Schema**: Added search_vector column with GIN indexing for full-text search performance
+### Technical Details
+- Full-text search uses PostgreSQL's built-in tsvector capabilities
+- Hybrid search combines cosine similarity (semantic) with text match ratios
+- Results are ranked by weighted combined scores
+- All search methods maintain backward compatibility
+## [0.1.9] - 2025-01-10
+### Added
+- **Initial CHANGELOG**: Added comprehensive CHANGELOG.md following Keep a Changelog format
+  - Complete version history from git log analysis
+  - Feature status tracking (implemented vs planned)
+  - Migration guides and breaking changes documentation
+  - Structured release notes with proper categorization
+- **Search Tracking System**: Comprehensive analytics with query embeddings, click-through tracking, and performance monitoring
+  - Automatic search recording with vector embeddings for similarity analysis
+  - Click-through rate tracking and user engagement monitoring
+  - Session and user behavior tracking capabilities
+  - Performance metrics including execution time and result quality analysis
+  - Search similarity analysis using vector embeddings
+  - Automatic cleanup of orphaned and unused searches
+- **Enhanced README**: Updated documentation with search tracking examples and analytics usage
+  - Comprehensive search analytics examples and usage patterns
+  - Updated API examples to use proper top-level Ragdoll methods
+  - Added search tracking configuration and usage examples
+- **API Method Consistency**: Added `hybrid_search` delegation to top-level Ragdoll namespace
+  - Complete documentation with examples and parameter descriptions
+  - Consistent API experience across all search methods
+  - Verified method availability at both Ragdoll and Ragdoll::Core levels
+### Fixed
+- **Model Resolution Warning**: Fixed "undefined method 'empty?' for an instance of Ragdoll::Core::Model" warning
+  - Added defensive `empty?` method to Model class
+  - Enhanced constructor to handle polymorphic Model objects
+  - Added nil/empty checks in embedding service
+### Changed
+- **Test Coverage**: Added coverage directory to .gitignore for cleaner repository state
+### Technical Details
+- Commits: `9186067`, `cb952d3`, `e902a5f`, `632527b`
+- All changes maintain backward compatibility
+- No breaking API changes
+## [0.1.8] - 2025-01-04
+### Added
+- **Search Analytics Foundation**: Added `Ragdoll::Search` model with query embedding and result tracking capabilities
+- **Embedding Service Enhancements**: Fallback mechanism for model resolution in embedding service
+- **Test Coverage**: Added coverage directory to gitignore and improved test infrastructure
+### Changed
+- Updated Gemfile.lock with latest gem versions
+- Enhanced runtime dependencies and version management
+### Fixed
+- Package directory exclusion in gitignore
+## [0.1.7] - 2025-01-04
+### Added
+- **Multi-Modal Content Models**: Added AudioContent model for comprehensive audio processing support
+- **Background Job Processing**: New Ragdoll job classes for asynchronous document processing
+- **Metadata Schemas**: Structured metadata schemas for text and image documents with validation
+### Changed
+- Updated ragdoll gem dependencies
+- Improved submodule management for documentation
+## [0.1.6] - 2025-01-04
+### Added
+- **Documentation Restructure**: Replaced local docs with ragdoll-docs submodule
+- **Conventional Commits**: Updated and restructured Conventional Commits specification
+- **CI/CD Improvements**: Enhanced GitHub Actions workflow and dropped JRuby support for RMagick compatibility
+### Fixed
+- Test skipping logic for CI environments
+- Automated release workflow adjustments
+## [0.1.5] - 2025-01-04
+### Added
+- Enhanced document processing pipeline
+- Improved error handling and logging
+### Fixed
+- Version management and release process refinements
+## [0.1.4] - 2025-01-04
+### Added
+- Extended multi-modal architecture support
+- Performance optimizations for large document processing
+### Changed
+- Refined version numbering and release process
+## [0.1.3] - 2025-01-04
+### Added
+- **Core RAG Architecture**: Multi-modal RAG (Retrieval-Augmented Generation) library built on ActiveRecord
+- **PostgreSQL + pgvector Integration**: High-performance semantic search with vector similarity
+- **Polymorphic Content Architecture**: Unified handling of text, image, and audio content types
+- **Dual Metadata Design**: Separation of LLM-generated content analysis and system file properties
+- **Document Processing Pipeline**: Support for PDF, DOCX, HTML, Markdown, and plain text files
+- **Embedding Generation**: Text chunking and vector embedding creation with multiple LLM provider support
+- **Semantic Search**: Cosine similarity search with usage analytics
+- **Background Processing**: ActiveJob integration for asynchronous document processing
+- **Logging System**: Configurable file-based logging with multiple levels
+### Technical Features
+- **Database Schema**: Multi-modal polymorphic architecture optimized for PostgreSQL
+- **IVFFlat Indexing**: Fast approximate nearest neighbor search for vector similarity
+- **Connection Pooling**: High-concurrency support for production workloads
+- **Configuration Management**: Comprehensive configuration system for LLM providers and processing settings
+## [0.1.1] - 2024-12-XX
+### Added
+- Initial project structure and basic functionality
+- Core document management capabilities
+- Basic search and retrieval features
+## [0.0.2] - 2024-12-XX
+### Added
+- Initial alpha release
+- Basic RAG architecture foundation
+- PostgreSQL database integration
+---
+## Feature Status
+### ✅ Fully Implemented
+- **Text Document Processing**: PDF, DOCX, HTML, Markdown, plain text files
+- **Embedding Generation**: Text chunking and vector embedding creation
+- **Database Schema**: Multi-modal polymorphic architecture with PostgreSQL + pgvector
+- **Dual Metadata Architecture**: Separate LLM-generated content analysis and file properties
+- **Search Functionality**: Semantic search with cosine similarity and usage analytics
+- **Hybrid Search**: Complete implementation combining semantic and full-text search with configurable weights
+- **Full-text Search**: PostgreSQL tsvector-based text search with GIN indexing
+- **Search Tracking System**: Comprehensive analytics with query embeddings, click-through tracking, and performance monitoring
+- **Document Management**: Add, update, delete, list operations
+- **Background Processing**: ActiveJob integration for async embedding generation
+- **LLM Metadata Generation**: AI-powered structured content analysis with schema validation
+- **Logging**: Configurable file-based logging with multiple levels
+### 🚧 In Development
+- **Image Processing**: Framework exists but vision AI integration needs completion
+- **Audio Processing**: Framework exists but speech-to-text integration needs completion
+### 📋 Planned Features
+- **Multi-modal Search**: Search across text, image, and audio content types
+- **Content-type Specific Embedding Models**: Different models for text, image, audio
+- **Enhanced Metadata Schemas**: Domain-specific metadata templates
+---
+## Migration Guide
+### From 0.1.9 to 0.1.10
+- **New Search Methods**: `Ragdoll.hybrid_search` and `Ragdoll::Document.search_content` methods now available
+- **Database Migration**: New search_vector column added to documents table with GIN index for full-text search
+- **API Enhancement**: All search methods now support unified parameter interface
+- **Backward Compatibility**: Existing `Ragdoll.search` method unchanged, continues to work as before
+- **CLI Integration**: ragdoll-cli now requires ragdoll >= 0.1.10 for hybrid and full-text search support
+### From 0.1.8 to 0.1.9
+- **CHANGELOG Addition**: Comprehensive changelog and feature tracking added
+- **API Method Consistency**: `hybrid_search` method properly delegated to top-level namespace
+- **No Breaking Changes**: All existing functionality remains compatible
+### From 0.1.7 to 0.1.8
+- New search tracking tables will be automatically created via migrations
+- No breaking changes to existing API
+- Search tracking is enabled by default but can be disabled per search
+### From 0.1.6 to 0.1.7
+- AudioContent model added - existing installations will auto-migrate
+- New background job classes available for improved processing
+- Metadata schemas provide enhanced validation
+### From 0.1.5 to 0.1.6
+- Documentation moved to submodule - update local references
+- CI/CD improvements may affect development workflows
+- JRuby support removed due to RMagick dependency
+---
+## Breaking Changes
+### Version 0.1.6
+- **JRuby Support Removed**: RMagick dependency incompatibility
+- **Documentation Structure**: Local docs replaced with submodule
+---
+## Contributors
+- **Dewayne VanHoozer** - Primary developer and maintainer
+---
+## License
+This project is licensed under the MIT License - see the LICENSE file for details.
+---
+*This changelog is automatically maintained and reflects the actual implementation status of features.*
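The 0.1.10 changelog entry describes hybrid ranking as a weighted mix of cosine similarity and a per-word text match ratio (default 70% semantic, 30% text), deduplicated by document ID. The plain-Ruby sketch below shows one way those pieces can fit together. It is not ragdoll's implementation. The following are all assumptions:

- the hit shape (`{ document_id:, score: }`)
- keeping each document's best score per mode
- the word tokenization

```ruby
require "set"

# Fraction of distinct query words found in the text, from 0.0 to 1.0.
def word_match_ratio(query, text)
  query_words = query.downcase.scan(/\w+/).uniq
  return 0.0 if query_words.empty?

  text_words = text.downcase.scan(/\w+/).to_set
  query_words.count { |word| text_words.include?(word) }.fdiv(query_words.size)
end

# Cosine similarity of two equal-length numeric vectors.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norms = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |y| y * y })
  norms.zero? ? 0.0 : dot / norms
end

# Merge semantic and text hits (assumed shape: { document_id:, score: }),
# keep each document's best score per mode (deduplication by document ID),
# then rank by the weighted sum of the two scores.
def hybrid_rank(semantic_hits, text_hits, semantic_weight: 0.7, text_weight: 0.3)
  scores = Hash.new { |hash, id| hash[id] = { semantic: 0.0, text: 0.0 } }
  { semantic: semantic_hits, text: text_hits }.each do |mode, hits|
    hits.each do |hit|
      entry = scores[hit[:document_id]]
      entry[mode] = [entry[mode], hit[:score]].max
    end
  end

  ranked = scores.map do |id, s|
    { document_id: id, score: (semantic_weight * s[:semantic]) + (text_weight * s[:text]) }
  end
  ranked.sort_by { |result| -result[:score] }
end

# Usage: document 2 appears in both lists but is returned only once.
semantic = [{ document_id: 1, score: 0.9 }, { document_id: 2, score: 0.4 }]
text = [{ document_id: 2, score: word_match_ratio("neural networks", "Neural networks explained") },
        { document_id: 3, score: 0.5 }]
hybrid_rank(semantic, text).each { |r| puts format("%d %.2f", r[:document_id], r[:score]) }
```

Keeping the two scores separate until the final weighted sum makes the weights easy to tune. It also means a document found by only one mode still receives a partial score.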

data/README.md CHANGED

@@ -18,17 +18,65 @@
   </table>
 </div>
-# Ragdoll::Core
+# Ragdoll
 Database-oriented multi-modal RAG (Retrieval-Augmented Generation) library built on ActiveRecord. Features PostgreSQL + pgvector for high-performance semantic search, polymorphic content architecture, and dual metadata design for sophisticated document analysis.
+RAG does not have to be hard. Every week it's getting simpler. The frontier LLM providers are starting to incorporate RAG services. For example, OpenAI offers a vector search service. See: [https://0x1eef.github.io/posts/an-introduction-to-rag-with-llm.rb/](https://0x1eef.github.io/posts/an-introduction-to-rag-with-llm.rb/)
+## Overview
+Ragdoll is a database-first, multi-modal Retrieval-Augmented Generation (RAG) library for Ruby. It pairs PostgreSQL + pgvector with an ActiveRecord-driven schema to deliver fast, production-grade semantic search and clean data modeling. Today it ships with robust text processing; image and audio pipelines are scaffolded and actively being completed.
+The library emphasizes a dual-metadata design: LLM-derived semantic metadata for understanding content, and system file metadata for managing assets. With built-in analytics, background processing, and a high-level API, you can go from ingest to answer quickly—and scale confidently.
+### Why Ragdoll?
+- Database-first foundation on ActiveRecord (PostgreSQL + pgvector only) for performance and reliability
+- Multi-modal architecture (text today; image/audio next) via polymorphic content design
+- Dual metadata model separating semantic analysis from file properties
+- Provider-agnostic LLM integration via `ruby_llm` (OpenAI, Anthropic, Google)
+- Production-friendly: background jobs, connection pooling, indexing, and search analytics
+- Simple, ergonomic high-level API to keep your application code clean
+### Key Capabilities
+- Semantic search with vector similarity (cosine) across polymorphic content
+- Text ingestion, chunking, and embedding generation
+- LLM-powered structured metadata with schema validation
+- Search tracking and analytics (CTR, performance, similarity of queries)
+- Hybrid search (semantic + full-text) planned
+- Extensible model and configuration system
+## Table of Contents
+- [Quick Start](#quick-start)
+- [API Overview](#api-overview)
+- [Search and Retrieval](#search-and-retrieval)
+- [Search Analytics and Tracking](#search-analytics-and-tracking)
+- [System Operations](#system-operations)
+- [Configuration](#configuration)
+- [Current Implementation Status](#current-implementation-status)
+- [Architecture Highlights](#architecture-highlights)
+- [Text Document Processing](#text-document-processing-current)
+- [PostgreSQL + pgvector Configuration](#postgresql--pgvector-configuration)
+- [Performance Features](#performance-features)
+- [Installation](#installation)
+- [Requirements](#requirements)
+- [Use Cases](#use-cases)
+- [Environment Variables](#environment-variables)
+- [Troubleshooting](#troubleshooting)
+- [Related Projects](#related-projects)
+- [Key Design Principles](#key-design-principles)
+- [Contributing & Support](#contributing--support)
 ## Quick Start
 ```ruby
 require 'ragdoll'
 # Configure with PostgreSQL + pgvector
-Ragdoll::Core.configure do |config|
+Ragdoll.configure do |config|
   # Database configuration (PostgreSQL only)
   config.database_config = {
     adapter: 'postgresql',
@@ -55,22 +103,22 @@ Ragdoll::Core.configure do |config|
 end
 # Add documents - returns detailed result
-result = Ragdoll::Core.add_document(path: 'research_paper.pdf')
+result = Ragdoll.add_document(path: 'research_paper.pdf')
 puts result[:message]  # "Document 'research_paper' added successfully with ID 123"
 doc_id = result[:document_id]
 # Check document status
-status = Ragdoll::Core.document_status(id: doc_id)
+status = Ragdoll.document_status(id: doc_id)
 puts status[:message]  # Shows processing status and embeddings count
 # Search across content
-results = Ragdoll::Core.search(query: 'neural networks')
+results = Ragdoll.search(query: 'neural networks')
 # Get detailed document information
-document = Ragdoll::Core.get_document(id: doc_id)
+document = Ragdoll.get_document(id: doc_id)
 ```
-## High-Level API
+## API Overview
 The `Ragdoll` module provides a convenient high-level API for common operations:
@@ -78,37 +126,37 @@ The `Ragdoll` module provides a convenient high-level API for common operations:
 ```ruby
 # Add single document - returns detailed result hash
-result = Ragdoll::Core.add_document(path: 'document.pdf')
+result = Ragdoll.add_document(path: 'document.pdf')
 puts result[:success]         # true
 puts result[:document_id]     # "123"
 puts result[:message]         # "Document 'document' added successfully with ID 123"
 puts result[:embeddings_queued] # true
 # Check document processing status
-status = Ragdoll::Core.document_status(id: result[:document_id])
+status = Ragdoll.document_status(id: result[:document_id])
 puts status[:status]          # "processed"
 puts status[:embeddings_count] # 15
 puts status[:embeddings_ready] # true
 puts status[:message]         # "Document processed successfully with 15 embeddings"
 # Get detailed document information
-document = Ragdoll::Core.get_document(id: result[:document_id])
+document = Ragdoll.get_document(id: result[:document_id])
 puts document[:title]         # "document"
 puts document[:status]        # "processed"
 puts document[:embeddings_count] # 15
 puts document[:content_length]   # 5000
 # Update document metadata
-Ragdoll::Core.update_document(id: result[:document_id], title: 'New Title')
+Ragdoll.update_document(id: result[:document_id], title: 'New Title')
 # Delete document
-Ragdoll::Core.delete_document(id: result[:document_id])
+Ragdoll.delete_document(id: result[:document_id])
 # List all documents
-documents = Ragdoll::Core.list_documents(limit: 10)
+documents = Ragdoll.list_documents(limit: 10)
 # System statistics
-stats = Ragdoll::Core.stats
+stats = Ragdoll.stats
 puts stats[:total_documents]  # 50
 puts stats[:total_embeddings] # 1250
 ```
@@ -117,15 +165,22 @@ puts stats[:total_embeddings] # 1250
 ```ruby
 # Semantic search across all content types
-results = Ragdoll::Core.search(query: 'artificial intelligence')
+results = Ragdoll.search(query: 'artificial intelligence')
+# Search with automatic tracking (default)
+results = Ragdoll.search(
+  query: 'machine learning',
+  session_id: 123,  # Optional: track user sessions
+  user_id:    456   # Optional: track by user
+)
 # Search specific content types
-text_results = Ragdoll::Core.search(query: 'machine learning', content_type: 'text')
-image_results = Ragdoll::Core.search(query: 'neural network diagram', content_type: 'image')
-audio_results = Ragdoll::Core.search(query: 'AI discussion', content_type: 'audio')
+text_results = Ragdoll.search(query: 'machine learning', content_type: 'text')
+image_results = Ragdoll.search(query: 'neural network diagram', content_type: 'image')
+audio_results = Ragdoll.search(query: 'AI discussion', content_type: 'audio')
 # Advanced search with metadata filters
-results = Ragdoll::Core.search(
+results = Ragdoll.search(
   query: 'deep learning',
   classification: 'research',
   keywords: ['AI', 'neural networks'],
@@ -133,44 +188,124 @@ results = Ragdoll::Core.search(
 )
 # Get context for RAG applications
-context = Ragdoll::Core.get_context(query: 'machine learning', limit: 5)
+context = Ragdoll.get_context(query: 'machine learning', limit: 5)
 # Enhanced prompt with context
-enhanced = Ragdoll::Core.enhance_prompt(
+enhanced = Ragdoll.enhance_prompt(
   prompt: 'What is machine learning?',
   context_limit: 5
 )
 # Hybrid search combining semantic and full-text
-results = Ragdoll::Core.hybrid_search(
+results = Ragdoll.hybrid_search(
   query: 'neural networks',
   semantic_weight: 0.7,
   text_weight: 0.3
 )
 ```
+### Keywords Search
+Ragdoll supports powerful keywords-based search that can be used standalone or combined with semantic search. The keywords system uses PostgreSQL array operations for high performance and supports both partial matching (overlap) and exact matching (contains all).
+```ruby
+# Keywords-only search (overlap - documents containing any of the keywords)
+results = Ragdoll::Document.search_by_keywords(['machine', 'learning', 'ai'])
+# Results are sorted by match count (documents with more keyword matches rank higher)
+results.each do |doc|
+  puts "#{doc.title}: #{doc.keywords_match_count} matches"
+end
+# Exact keywords search (contains all - documents must have ALL keywords)
+results = Ragdoll::Document.search_by_keywords_all(['ruby', 'programming'])
+# Results are sorted by focus (fewer total keywords = more focused document)
+results.each do |doc|
+  puts "#{doc.title}: #{doc.total_keywords_count} total keywords"
+end
+# Combined semantic + keywords search for best results
+results = Ragdoll.search(
+  query: 'artificial intelligence applications',
+  keywords: ['ai', 'machine learning', 'neural networks'],
+  limit: 10
+)
+# Keywords search with options
+results = Ragdoll::Document.search_by_keywords(
+  ['web', 'javascript', 'frontend'],
+  limit: 20
+)
+# Case-insensitive keyword matching (automatically normalized)
+results = Ragdoll::Document.search_by_keywords(['Python', 'DATA-SCIENCE', 'ai'])
+# Will match documents with keywords: ['python', 'data-science', 'ai']
+```
+**Keywords Search Features:**
+- **High Performance**: Uses PostgreSQL GIN indexes for fast array operations
+- **Flexible Matching**: Supports both overlap (`&&`) and contains (`@>`) operators
+- **Smart Scoring**: Results ordered by match count or document focus
+- **Case Insensitive**: Automatic keyword normalization
+- **Integration Ready**: Works seamlessly with semantic search
+- **Inspired by `find_matching_entries.rb`**: Optimized for PostgreSQL arrays
+### Search Analytics and Tracking
+Ragdoll automatically tracks all searches to provide comprehensive analytics and improve search relevance over time:
+```ruby
+# Get search analytics for the last 30 days
+analytics = Ragdoll::Search.search_analytics(days: 30)
+puts "Total searches: #{analytics[:total_searches]}"
+puts "Unique queries: #{analytics[:unique_queries]}"
+puts "Average execution time: #{analytics[:avg_execution_time]}ms"
+puts "Click-through rate: #{analytics[:click_through_rate]}%"
+# Find similar searches using vector similarity
+search = Ragdoll::Search.first
+similar_searches = search.nearest_neighbors(:query_embedding, distance: :cosine).limit(5)
+similar_searches.each do |similar|
+  puts "Query: #{similar.query}"
+  puts "Similarity: #{similar.neighbor_distance}"
+  puts "Results: #{similar.results_count}"
+end
+# Track user interactions (clicks on search results)
+search_result = Ragdoll::SearchResult.first
+search_result.mark_as_clicked!
+# Disable tracking for specific searches if needed
+results = Ragdoll.search(
+  query: 'private query',
+  track_search: false
+)
+```
 ### System Operations
 ```ruby
 # Get system statistics
-stats = Ragdoll::Core.stats
+stats = Ragdoll.stats
 # Returns information about documents, content types, embeddings, etc.
 # Health check
-healthy = Ragdoll::Core.healthy?
+healthy = Ragdoll.healthy?
 # Get configuration
-config = Ragdoll::Core.configuration
+config = Ragdoll.configuration
 # Reset configuration (useful for testing)
-Ragdoll::Core.reset_configuration!
+Ragdoll.reset_configuration!
 ```
 ### Configuration
 ```ruby
 # Configure the system
-Ragdoll::Core.configure do |config|
+Ragdoll.configure do |config|
   # Database configuration (PostgreSQL only - REQUIRED)
   config.database_config = {
     adapter: 'postgresql',
@@ -218,6 +353,7 @@ end
 - **Database schema**: Multi-modal polymorphic architecture with PostgreSQL + pgvector
 - **Dual metadata architecture**: Separate LLM-generated content analysis and file properties
 - **Search functionality**: Semantic search with cosine similarity and usage analytics
+- **Search tracking system**: Comprehensive analytics with query embeddings, click-through tracking, and performance monitoring
 - **Document management**: Add, update, delete, list operations
 - **Background processing**: ActiveJob integration for async embedding generation
 - **LLM metadata generation**: AI-powered structured content analysis with schema validation
@@ -264,15 +400,16 @@ Currently, Ragdoll processes text documents through:
 6. **Search**: Semantic search using cosine similarity with usage analytics
 ### Example Usage
 ```ruby
 # Add a text document
-result = Ragdoll::Core.add_document(path: 'document.pdf')
+result = Ragdoll.add_document(path: 'document.pdf')
 # Check processing status
-status = Ragdoll::Core.document_status(id: result[:document_id])
+status = Ragdoll.document_status(id: result[:document_id])
 # Search the content
-results = Ragdoll::Core.search(query: 'machine learning')
+results = Ragdoll.search(query: 'machine learning')
 ```
 ## PostgreSQL + pgvector Configuration
@@ -293,7 +430,7 @@ psql -d ragdoll_production -c "CREATE EXTENSION IF NOT EXISTS vector;"
 ### Configuration Example
 ```ruby
-Ragdoll::Core.configure do |config|
+Ragdoll.configure do |config|
   config.database_config = {
     adapter: 'postgresql',
     database: 'ragdoll_production',
@@ -337,11 +474,52 @@ gem 'ragdoll'
 - **PostgreSQL**: 12+ with pgvector extension (REQUIRED - no other databases supported)
 - **Dependencies**: activerecord, pg, pgvector, neighbor, ruby_llm, pdf-reader, docx, rubyzip, shrine, rmagick, opensearch-ruby, searchkick, ruby-progressbar
+## Use Cases
+- Internal knowledge bases and chat assistants grounded in your documents
+- Product documentation and support search with analytics and relevance feedback
+- Research corpora exploration (summaries, topics, similarity) across large text sets
+- Incident retrospectives and operational analytics with searchable write-ups
+- Media libraries preparing for text + image + audio pipelines (image/audio in progress)
+## Environment Variables
+Set the following as environment variables (do not commit secrets to source control):
+- `OPENAI_API_KEY` — required for OpenAI models
+- `OPENAI_ORGANIZATION` — optional, for OpenAI org scoping
+- `OPENAI_PROJECT` — optional, for OpenAI project scoping
+- `ANTHROPIC_API_KEY` — optional, for Anthropic models
+- `GOOGLE_API_KEY` — optional, for Google models
+- `DATABASE_PASSWORD` — your PostgreSQL password if not using peer auth
+## Troubleshooting
+### pgvector extension missing
+- Ensure the extension is enabled in your database:
+```bash
+psql -d ragdoll_production -c "CREATE EXTENSION IF NOT EXISTS vector;"
+```
+- If the command fails, verify PostgreSQL and pgvector are installed and that you’re connecting to the correct database.
+### Document stuck in "processing"
+- Confirm your API keys are set and valid.
+- Ensure `auto_migrate: true` in configuration (or run migrations if you manage schema yourself).
+- Check logs at the path configured by `logging_config[:log_filepath]` for errors.
 ## Related Projects
 - **ragdoll-cli**: Standalone CLI application using ragdoll
 - **ragdoll-rails**: Rails engine with web interface for ragdoll
+## Contributing & Support
+Contributions are welcome! If you find a bug or have a feature request, please open an issue or submit a pull request. For questions and feedback, open an issue in this repository.
 ## Key Design Principles
 1. **Database-Oriented**: Built on ActiveRecord with PostgreSQL + pgvector for production performance
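The README's Keywords Search section distinguishes two matching modes, both using case-insensitive normalization:

- overlap matching (PostgreSQL `&&`), ranked by match count
- contains-all matching (`@>`), ranked by fewer total keywords

The in-memory sketch below mirrors those semantics so the two ranking rules are easy to compare. `Doc` is a hypothetical stand-in for `Ragdoll::Document`. The title tie-break is an added assumption.

```ruby
# Hypothetical stand-in for Ragdoll::Document with a keywords array.
Doc = Struct.new(:title, :keywords)

# Case-insensitive normalization, as the README describes.
def normalize_keywords(keywords)
  keywords.map { |k| k.to_s.strip.downcase }.uniq
end

# Overlap (like PostgreSQL `keywords && query`): any shared keyword matches;
# more shared keywords rank higher, ties broken by title (an assumption).
def search_by_overlap(docs, query)
  wanted = normalize_keywords(query)
  docs.map { |doc| [doc, (normalize_keywords(doc.keywords) & wanted).size] }
      .select { |_doc, count| count.positive? }
      .sort_by { |doc, count| [-count, doc.title] }
      .map(&:first)
end

# Contains-all (like `keywords @> query`): every query keyword must be present;
# documents with fewer total keywords (more focused) rank higher.
def search_by_all(docs, query)
  wanted = normalize_keywords(query)
  docs.select { |doc| (wanted - normalize_keywords(doc.keywords)).empty? }
      .sort_by { |doc| [normalize_keywords(doc.keywords).size, doc.title] }
end

docs = [
  Doc.new("A", %w[Ruby programming web]),
  Doc.new("B", %w[ruby programming]),
  Doc.new("C", %w[python ai])
]
puts search_by_overlap(docs, %w[ruby ai web]).map(&:title).inspect
puts search_by_all(docs, %w[RUBY Programming]).map(&:title).inspect
```

In the database, the GIN index lets PostgreSQL evaluate both operators without scanning every row. The sketch only models which rows match and how they are ordered.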

data/Rakefile CHANGED

@@ -1,8 +1,5 @@
 # frozen_string_literal: true
-require "simplecov"
-SimpleCov.start
 # Suppress bundler/rubygems warnings
 $VERBOSE = nil
@@ -52,8 +49,10 @@ task :setup_test_db do
     puts "Warning: Could not install pgvector extension: #{e.message}"
   end
-  # Run migrations
-  Ragdoll::Core::Database.setup(test_db_config.merge(auto_migrate: true, logger: nil))
+  # Reset and run migrations (drops all tables and re-runs migrations)
+  # This ensures clean state for tests regardless of previous migration versions
+  Ragdoll::Core::Database.setup(test_db_config.merge(auto_migrate: false, logger: nil))
+  Ragdoll::Core::Database.reset!
   puts "Test database setup complete"
 end