RubyGems - ragdoll - Versions diffs - 0.1.9 → 0.1.10 - Mend

ragdoll 0.1.9 → 0.1.10

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (18)

checksums.yaml +4 -4
data/CHANGELOG.md +46 -4
data/README.md +49 -0
data/Rakefile +4 -2
data/app/models/ragdoll/document.rb +115 -12
data/app/models/ragdoll/embedding.rb +36 -4
data/app/services/ragdoll/search_engine.rb +13 -2
data/db/migrate/{001_enable_postgresql_extensions.rb → 20250815234901_enable_postgresql_extensions.rb} +7 -8
data/db/migrate/20250815234902_create_ragdoll_documents.rb +117 -0
data/db/migrate/{005_create_ragdoll_embeddings.rb → 20250815234903_create_ragdoll_embeddings.rb} +13 -10
data/db/migrate/{006_create_ragdoll_contents.rb → 20250815234904_create_ragdoll_contents.rb} +14 -11
data/db/migrate/{007_create_ragdoll_searches.rb → 20250815234905_create_ragdoll_searches.rb} +24 -20
data/db/migrate/{008_create_ragdoll_search_results.rb → 20250815234906_create_ragdoll_search_results.rb} +16 -16
data/lib/ragdoll/core/database.rb +8 -3
data/lib/ragdoll/core/version.rb +1 -1
data/lib/tasks/db.rake +63 -15
metadata +7 -7
data/db/migrate/004_create_ragdoll_documents.rb +0 -70

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: cde84c4b5bbf1e8296bdd762ee78acb2f69663e493ce23b0941ada9d1201bdcd
-  data.tar.gz: f8bc456d3c536a295920bc1c806974b2b39f08977a8761604c7a192b83e756d2
+  metadata.gz: 4f7b2c95ede1523e9e01af70394217387d876da6317fed651df3e27cf337cfe9
+  data.tar.gz: a82ae7d541fd06876acb3acaf8f02639234f8b118274621851678a2799c5f559
 SHA512:
-  metadata.gz: c1ce0e46be45fe8004930ec231a83a59f31039f4908be2a0e0ba67043237f1ea03bc00991820f6928a6ef5baa6ca910547876f21ddad5a7ead2d6384192e7708
-  data.tar.gz: e3f50e1205b4ba755c6a978acb06240b7b1fa729f4fa9bef33f956a9b245ad3d3323612f300902051237ffa71a763fc6db8d8e0fedc4f2761c46a977b42d6958
+  metadata.gz: ba14828a6e743677c84072b9f1bb27743e429531ebdd9fbd3d8553add7bbdad070d709cd617dc620fef4ddc6846085ca79d3bb6d32bae8465c6b3b10acc0692f
+  data.tar.gz: de630ebf15168b562ef686ec6cd9f1cfe532b5bbf495e33a74085b567cf53ce7bb87e7c5c543756c47bd68c98290221b879a1b4d8e5888aac4916d1c1554fe99

data/CHANGELOG.md CHANGED

@@ -6,7 +6,36 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 ## [Unreleased]
-*Note: These features will be included in the next release (likely v0.1.9) featuring comprehensive search tracking and analytics capabilities.*
+## [0.1.10] - 2025-01-15
+### Changed
+- Continued improvements to search performance and accuracy
+### Added
+- **Hybrid Search**: Complete implementation combining semantic and full-text search capabilities
+  - Configurable weights for semantic vs text search (default: 70% semantic, 30% text)
+  - Deduplication of results by document ID
+  - Combined scoring system for unified result ranking
+- **Full-text Search**: PostgreSQL full-text search with tsvector indexing
+  - Per-word match ratio scoring (0.0 to 1.0)
+  - GIN index for high-performance text search
+  - Search across title, summary, keywords, and description fields
+- **Enhanced Search API**: Complete search type delegation at top-level Ragdoll namespace
+  - `Ragdoll.hybrid_search` method for combined semantic and text search
+  - `Ragdoll::Document.search_content` for full-text search capabilities
+  - Consistent parameter handling across all search methods
+### Changed
+- **Search Architecture**: Unified search interface supporting semantic, fulltext, and hybrid modes
+- **Database Schema**: Added search_vector column with GIN indexing for full-text search performance
+### Technical Details
+- Full-text search uses PostgreSQL's built-in tsvector capabilities
+- Hybrid search combines cosine similarity (semantic) with text match ratios
+- Results are ranked by weighted combined scores
+- All search methods maintain backward compatibility
+## [0.1.9] - 2025-01-10
 ### Added
 - **Initial CHANGELOG**: Added comprehensive CHANGELOG.md following Keep a Changelog format
@@ -40,7 +69,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 - **Test Coverage**: Added coverage directory to .gitignore for cleaner repository state
 ### Technical Details
-- Commits: `9186067`, `cb952d3`, `e902a5f`, `632527b`
+- Commits: `9186067`, `cb952d3`, `e902a5f`, `632527b`
 - All changes maintain backward compatibility
 - No breaking API changes
@@ -141,6 +170,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 - **Database Schema**: Multi-modal polymorphic architecture with PostgreSQL + pgvector
 - **Dual Metadata Architecture**: Separate LLM-generated content analysis and file properties
 - **Search Functionality**: Semantic search with cosine similarity and usage analytics
+- **Hybrid Search**: Complete implementation combining semantic and full-text search with configurable weights
+- **Full-text Search**: PostgreSQL tsvector-based text search with GIN indexing
 - **Search Tracking System**: Comprehensive analytics with query embeddings, click-through tracking, and performance monitoring
 - **Document Management**: Add, update, delete, list operations
 - **Background Processing**: ActiveJob integration for async embedding generation
@@ -150,7 +181,6 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 ### 🚧 In Development
 - **Image Processing**: Framework exists but vision AI integration needs completion
 - **Audio Processing**: Framework exists but speech-to-text integration needs completion
-- **Hybrid Search**: Combining semantic and full-text search capabilities
 ### 📋 Planned Features
 - **Multi-modal Search**: Search across text, image, and audio content types
@@ -161,6 +191,18 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 ## Migration Guide
+### From 0.1.9 to 0.1.10
+- **New Search Methods**: `Ragdoll.hybrid_search` and `Ragdoll::Document.search_content` methods now available
+- **Database Migration**: New search_vector column added to documents table with GIN index for full-text search
+- **API Enhancement**: All search methods now support unified parameter interface
+- **Backward Compatibility**: Existing `Ragdoll.search` method unchanged, continues to work as before
+- **CLI Integration**: ragdoll-cli now requires ragdoll >= 0.1.10 for hybrid and full-text search support
+### From 0.1.8 to 0.1.9
+- **CHANGELOG Addition**: Comprehensive changelog and feature tracking added
+- **API Method Consistency**: `hybrid_search` method properly delegated to top-level namespace
+- **No Breaking Changes**: All existing functionality remains compatible
 ### From 0.1.7 to 0.1.8
 - New search tracking tables will be automatically created via migrations
 - No breaking changes to existing API
@@ -198,4 +240,4 @@ This project is licensed under the MIT License - see the LICENSE file for detail
 ---
-*This changelog is automatically maintained and reflects the actual implementation status of features.*
+*This changelog is automatically maintained and reflects the actual implementation status of features.*
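
The 0.1.10 changelog entry above describes hybrid search as a weighted combination of semantic and full-text scores (default 70% / 30%), deduplicated by document ID and ranked by the combined score. The implementation of `Ragdoll.hybrid_search` is not part of this diff, so the following is only a minimal in-memory sketch of that scoring idea. The method name `combine_hybrid_results` and the hash keys `:similarity` and `:fulltext_similarity` are assumptions made for illustration, not the gem's API.

```ruby
# Hypothetical helper, not part of ragdoll: merges semantic and full-text
# result lists into one ranking using weighted scores.
SEMANTIC_WEIGHT = 0.7
TEXT_WEIGHT     = 0.3

def combine_hybrid_results(semantic_results, text_results,
                           semantic_weight: SEMANTIC_WEIGHT,
                           text_weight: TEXT_WEIGHT)
  # One entry per document ID, so each document appears once in the output.
  combined = Hash.new do |hash, id|
    hash[id] = { document_id: id, semantic: 0.0, text: 0.0 }
  end

  # Keep the best semantic score seen for each document.
  semantic_results.each do |result|
    entry = combined[result[:document_id]]
    entry[:semantic] = [entry[:semantic], result[:similarity]].max
  end

  # Keep the best full-text match ratio seen for each document.
  text_results.each do |result|
    entry = combined[result[:document_id]]
    entry[:text] = [entry[:text], result[:fulltext_similarity]].max
  end

  # Weighted sum, highest combined score first.
  combined.values
          .map { |e| e.merge(combined_score: semantic_weight * e[:semantic] + text_weight * e[:text]) }
          .sort_by { |e| -e[:combined_score] }
end
```

A document found by only one of the two searches still appears in the ranking; its missing score is treated as 0.0, which is one possible choice among several.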

data/README.md CHANGED

@@ -22,6 +22,8 @@
 Database-oriented multi-modal RAG (Retrieval-Augmented Generation) library built on ActiveRecord. Features PostgreSQL + pgvector for high-performance semantic search, polymorphic content architecture, and dual metadata design for sophisticated document analysis.
+RAG does not have to be hard. Every week it's getting simpler. The frontier LLM providers are starting to incorporate RAG services. For example, OpenAI offers a vector search service. See: [https://0x1eef.github.io/posts/an-introduction-to-rag-with-llm.rb/](https://0x1eef.github.io/posts/an-introduction-to-rag-with-llm.rb/)
 ## Overview
 Ragdoll is a database-first, multi-modal Retrieval-Augmented Generation (RAG) library for Ruby. It pairs PostgreSQL + pgvector with an ActiveRecord-driven schema to deliver fast, production-grade semantic search and clean data modeling. Today it ships with robust text processing; image and audio pipelines are scaffolded and actively being completed.
@@ -202,6 +204,53 @@ results = Ragdoll.hybrid_search(
 )
 ```
+### Keywords Search
+Ragdoll supports powerful keywords-based search that can be used standalone or combined with semantic search. The keywords system uses PostgreSQL array operations for high performance and supports both partial matching (overlap) and exact matching (contains all).
+```ruby
+# Keywords-only search (overlap - documents containing any of the keywords)
+results = Ragdoll::Document.search_by_keywords(['machine', 'learning', 'ai'])
+# Results are sorted by match count (documents with more keyword matches rank higher)
+results.each do |doc|
+  puts "#{doc.title}: #{doc.keywords_match_count} matches"
+end
+# Exact keywords search (contains all - documents must have ALL keywords)
+results = Ragdoll::Document.search_by_keywords_all(['ruby', 'programming'])
+# Results are sorted by focus (fewer total keywords = more focused document)
+results.each do |doc|
+  puts "#{doc.title}: #{doc.total_keywords_count} total keywords"
+end
+# Combined semantic + keywords search for best results
+results = Ragdoll.search(
+  query: 'artificial intelligence applications',
+  keywords: ['ai', 'machine learning', 'neural networks'],
+  limit: 10
+)
+# Keywords search with options
+results = Ragdoll::Document.search_by_keywords(
+  ['web', 'javascript', 'frontend'],
+  limit: 20
+)
+# Case-insensitive keyword matching (automatically normalized)
+results = Ragdoll::Document.search_by_keywords(['Python', 'DATA-SCIENCE', 'ai'])
+# Will match documents with keywords: ['python', 'data-science', 'ai']
+```
+**Keywords Search Features:**
+- **High Performance**: Uses PostgreSQL GIN indexes for fast array operations
+- **Flexible Matching**: Supports both overlap (`&&`) and contains (`@>`) operators
+- **Smart Scoring**: Results ordered by match count or document focus
+- **Case Insensitive**: Automatic keyword normalization
+- **Integration Ready**: Works seamlessly with semantic search
+- **Inspired by `find_matching_entries.rb`**: Optimized for PostgreSQL arrays
 ### Search Analytics and Tracking
 Ragdoll automatically tracks all searches to provide comprehensive analytics and improve search relevance over time:
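
The README section added above distinguishes overlap matching (`&&`, any keyword in common) from contains-all matching (`@>`, every query keyword present). As a rough illustration of those two semantics, here is a plain-Ruby sketch that works on in-memory arrays. The helper names are hypothetical and do not exist in ragdoll, and the sketch normalizes both sides for simplicity, whereas the PostgreSQL operators themselves compare array elements case-sensitively.

```ruby
# Hypothetical in-memory stand-ins for the PostgreSQL array operators.
def normalize_keywords(keywords)
  Array(keywords).map { |k| k.to_s.strip.downcase }.reject(&:empty?).uniq
end

# Analogue of `keywords && ARRAY[...]`: at least one keyword in common.
def overlaps?(document_keywords, query_keywords)
  (normalize_keywords(document_keywords) & normalize_keywords(query_keywords)).any?
end

# Analogue of `keywords @> ARRAY[...]`: every query keyword is present.
def contains_all?(document_keywords, query_keywords)
  (normalize_keywords(query_keywords) - normalize_keywords(document_keywords)).empty?
end

# Number of query keywords found in the document, usable as a ranking key.
def match_count(document_keywords, query_keywords)
  (normalize_keywords(document_keywords) & normalize_keywords(query_keywords)).size
end
```

For example, a document tagged `['ruby', 'rails']` overlaps the query `['Ruby', 'python']` but does not contain all of its keywords.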

data/Rakefile CHANGED

@@ -49,8 +49,10 @@ task :setup_test_db do
     puts "Warning: Could not install pgvector extension: #{e.message}"
   end
-  # Run migrations
-  Ragdoll::Core::Database.setup(test_db_config.merge(auto_migrate: true, logger: nil))
+  # Reset and run migrations (drops all tables and re-runs migrations)
+  # This ensures clean state for tests regardless of previous migration versions
+  Ragdoll::Core::Database.setup(test_db_config.merge(auto_migrate: false, logger: nil))
+  Ragdoll::Core::Database.reset!
   puts "Test database setup complete"
 end

data/app/models/ragdoll/document.rb CHANGED

@@ -142,10 +142,12 @@ module Ragdoll
     def keywords_array
       return [] unless keywords.present?
+      # After migration, keywords is now a PostgreSQL array
       case keywords
       when Array
-        keywords
+        keywords.map(&:to_s).map(&:strip).reject(&:empty?)
       when String
+        # Fallback for any remaining string data (shouldn't happen after migration)
         keywords.split(",").map(&:strip).reject(&:empty?)
       else
         []
@@ -153,17 +155,23 @@ module Ragdoll
     end
     def add_keyword(keyword)
+      return if keyword.blank?
       current_keywords = keywords_array
-      return if current_keywords.include?(keyword.strip)
+      normalized_keyword = keyword.to_s.strip.downcase
+      return if current_keywords.map(&:downcase).include?(normalized_keyword)
-      current_keywords << keyword.strip
-      self.keywords = current_keywords.join(", ")
+      current_keywords << normalized_keyword
+      self.keywords = current_keywords
     end
     def remove_keyword(keyword)
+      return if keyword.blank?
       current_keywords = keywords_array
-      current_keywords.delete(keyword.strip)
-      self.keywords = current_keywords.join(", ")
+      normalized_keyword = keyword.to_s.strip.downcase
+      current_keywords.reject! { |k| k.downcase == normalized_keyword }
+      self.keywords = current_keywords
     end
     # Metadata accessors for common fields
@@ -249,15 +257,110 @@ module Ragdoll
       puts "Metadata generation failed: #{e.message}"
     end
-    # PostgreSQL full-text search on metadata fields
+    # PostgreSQL full-text search on metadata fields with per-word match-ratio [0.0..1.0]
     def self.search_content(query, **options)
       return none if query.blank?
-      # Use PostgreSQL's built-in full-text search across metadata fields
-      where(
-        "to_tsvector('english', COALESCE(title, '') || ' ' || COALESCE(metadata->>'summary', '') || ' ' || COALESCE(metadata->>'keywords', '') || ' ' || COALESCE(metadata->>'description', '')) @@ plainto_tsquery('english', ?)",
-        query
-      ).limit(options[:limit] || 20)
+      # Split into unique alphanumeric words
+      words = query.downcase.scan(/[[:alnum:]]+/).uniq
+      return none if words.empty?
+      limit = options[:limit] || 20
+      threshold = options[:threshold] || 0.0
+      # Use precomputed tsvector column if it exists, otherwise build on the fly
+      if column_names.include?("search_vector")
+        tsvector = "#{table_name}.search_vector"
+      else
+        # Build tsvector from title and metadata fields
+        text_expr =
+          "COALESCE(title, '') || ' ' || " \
+          "COALESCE(metadata->>'summary', '') || ' ' || " \
+          "COALESCE(metadata->>'keywords', '') || ' ' || " \
+          "COALESCE(metadata->>'description', '')"
+        tsvector = "to_tsvector('english', #{text_expr})"
+      end
+      # Prepare sanitized tsquery terms
+      tsqueries = words.map do |word|
+        sanitize_sql_array(["plainto_tsquery('english', ?)", word])
+      end
+      # Combine per-word tsqueries with OR so PostgreSQL can use the GIN index
+      combined_tsquery = tsqueries.join(' || ')
+      # Score each match (1 if present, 0 if not), sum them
+      score_terms = tsqueries.map { |tsq| "(#{tsvector} @@ #{tsq})::int" }
+      score_sum   = score_terms.join(' + ')
+      # Similarity ratio: fraction of query words present
+      similarity_sql = "(#{score_sum})::float / #{words.size}"
+      # Start with basic search query
+      query = select("#{table_name}.*, #{similarity_sql} AS fulltext_similarity")
+      # Build where conditions
+      conditions = ["#{tsvector} @@ (#{combined_tsquery})"]
+      # Add status filter (default to processed unless overridden)
+      status = options[:status] || 'processed'
+      conditions << "#{table_name}.status = '#{status}'"
+      # Add document type filter if specified
+      if options[:document_type].present?
+        conditions << sanitize_sql_array(["#{table_name}.document_type = ?", options[:document_type]])
+      end
+      # Add threshold filtering if specified
+      if threshold > 0.0
+        conditions << "#{similarity_sql} >= #{threshold}"
+      end
+      # Combine all conditions
+      where_clause = conditions.join(' AND ')
+      # Materialize to array to avoid COUNT/SELECT alias conflicts in some AR versions
+      query.where(where_clause)
+        .order(Arel.sql("fulltext_similarity DESC, updated_at DESC"))
+        .limit(limit)
+        .to_a
+    end
+    # Search documents by keywords using PostgreSQL array operations
+    # Returns documents that match keywords with scoring based on match count
+    # Inspired by find_matching_entries.rb algorithm but optimized for PostgreSQL arrays
+    def self.search_by_keywords(keywords_array, **options)
+      return where("1 = 0") if keywords_array.blank?
+      # Normalize keywords to lowercase strings array
+      normalized_keywords = Array(keywords_array).map(&:to_s).map(&:downcase).reject(&:empty?)
+      return where("1 = 0") if normalized_keywords.empty?
+      limit = options[:limit] || 20
+      # Use PostgreSQL array overlap operator with proper array literal
+      quoted_keywords = normalized_keywords.map { |k| "\"#{k}\"" }.join(',')
+      array_literal = "'{#{quoted_keywords}}'::text[]"
+      where("keywords && #{array_literal}")
+        .order("created_at DESC")
+        .limit(limit)
+    end
+    # Find documents that contain ALL specified keywords (exact array matching)
+    def self.search_by_keywords_all(keywords_array, **options)
+      return where("1 = 0") if keywords_array.blank?
+      normalized_keywords = Array(keywords_array).map(&:to_s).map(&:downcase).reject(&:empty?)
+      return where("1 = 0") if normalized_keywords.empty?
+      limit = options[:limit] || 20
+      # Use PostgreSQL array contains operator with proper array literal
+      quoted_keywords = normalized_keywords.map { |k| "\"#{k}\"" }.join(',')
+      array_literal = "'{#{quoted_keywords}}'::text[]"
+      where("keywords @> #{array_literal}")
+        .order("created_at DESC")
+        .limit(limit)
     end
     # Faceted search by metadata fields
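
The new `search_content` above scores each document by the fraction of unique query words that match its tsvector. The same ratio can be sketched in plain Ruby without a database. This is an approximation under stated assumptions: `match_ratio` is a hypothetical helper, and it compares literal lowercase words, so it does not reproduce the stemming or stop-word handling of PostgreSQL's `'english'` text search configuration.

```ruby
# Hypothetical helper: fraction of unique query words present in a text,
# in the range 0.0..1.0. Uses the same word-splitting regex as the diff.
def match_ratio(query, document_text)
  words = query.to_s.downcase.scan(/[[:alnum:]]+/).uniq
  return 0.0 if words.empty?

  document_words = document_text.to_s.downcase.scan(/[[:alnum:]]+/).uniq

  # Each query word contributes 1 if present and 0 if absent,
  # mirroring the `(tsvector @@ tsquery)::int` terms in the SQL.
  present = words.count { |word| document_words.include?(word) }
  present.to_f / words.size
end
```

With this definition, a three-word query of which two words occur in the document scores 2/3, and a `threshold` option like the one in the diff would simply drop documents whose ratio falls below it.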

data/app/models/ragdoll/embedding.rb CHANGED

@@ -64,10 +64,26 @@ module Ragdoll
       scope = scope.by_model(filters[:embedding_model]) if filters[:embedding_model]
       # Document-level filters require joining through embeddable (STI Content) to documents
-      if filters[:document_type]
+      needs_document_join = filters[:document_type] || filters[:keywords]
+      if needs_document_join
         scope = scope.joins("JOIN ragdoll_contents ON ragdoll_contents.id = ragdoll_embeddings.embeddable_id")
                      .joins("JOIN ragdoll_documents ON ragdoll_documents.id = ragdoll_contents.document_id")
-                     .where("ragdoll_documents.document_type = ?", filters[:document_type])
+      end
+      if filters[:document_type]
+        scope = scope.where("ragdoll_documents.document_type = ?", filters[:document_type])
+      end
+      # Keywords filtering using PostgreSQL array operations
+      if filters[:keywords] && filters[:keywords].any?
+        normalized_keywords = Array(filters[:keywords]).map(&:to_s).map(&:downcase).reject(&:empty?)
+        if normalized_keywords.any?
+          # Use PostgreSQL array overlap operator with proper array literal
+          quoted_keywords = normalized_keywords.map { |k| "\"#{k}\"" }.join(',')
+          array_literal = "'{#{quoted_keywords}}'::text[]"
+          scope = scope.where("ragdoll_documents.keywords && #{array_literal}")
+        end
       end
       # Use pgvector for similarity search
@@ -83,10 +99,26 @@ module Ragdoll
       scope = scope.by_model(filters[:embedding_model]) if filters[:embedding_model]
       # Document-level filters require joining through embeddable (STI Content) to documents
-      if filters[:document_type]
+      needs_document_join = filters[:document_type] || filters[:keywords]
+      if needs_document_join
         scope = scope.joins("JOIN ragdoll_contents ON ragdoll_contents.id = ragdoll_embeddings.embeddable_id")
                      .joins("JOIN ragdoll_documents ON ragdoll_documents.id = ragdoll_contents.document_id")
-                     .where("ragdoll_documents.document_type = ?", filters[:document_type])
+      end
+      if filters[:document_type]
+        scope = scope.where("ragdoll_documents.document_type = ?", filters[:document_type])
+      end
+      # Keywords filtering using PostgreSQL array operations
+      if filters[:keywords] && filters[:keywords].any?
+        normalized_keywords = Array(filters[:keywords]).map(&:to_s).map(&:downcase).reject(&:empty?)
+        if normalized_keywords.any?
+          # Use PostgreSQL array overlap operator with proper array literal
+          quoted_keywords = normalized_keywords.map { |k| "\"#{k}\"" }.join(',')
+          array_literal = "'{#{quoted_keywords}}'::text[]"
+          scope = scope.where("ragdoll_documents.keywords && #{array_literal}")
+        end
       end
       search_with_pgvector_stats(query_embedding, scope, limit, threshold)
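
The keyword filters above build the `text[]` literal by interpolating each keyword between double quotes. A keyword containing a double quote, backslash, or single quote would break that literal, so it may be worth escaping elements and passing the literal as a bound parameter instead. The following is a sketch of one way to do this, not the gem's code: `pg_text_array_literal` is a hypothetical helper, and the `where` calls shown in comments assume ActiveRecord's standard `?` placeholder quoting.

```ruby
# Hypothetical helper: builds the body of a PostgreSQL array literal,
# e.g. {"ai","machine learning"}, escaping backslashes and double quotes
# inside each element as the array-literal syntax requires.
def pg_text_array_literal(values)
  elements = values.map do |value|
    escaped = value.to_s
                   .gsub("\\") { "\\\\" } # backslash -> two backslashes
                   .gsub('"') { '\\"' }   # double quote -> backslash + quote
    %("#{escaped}")
  end
  "{#{elements.join(',')}}"
end

# Possible usage inside a model (assumes ActiveRecord; not executed here):
#   where("keywords && ?::text[]", pg_text_array_literal(normalized_keywords))
#
# A commonly used alternative lets ActiveRecord expand the array itself:
#   where("keywords && ARRAY[?]::text[]", normalized_keywords)
```

Binding the literal as a parameter leaves single-quote escaping to the database adapter, so only the array-level escaping has to be handled by hand.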

data/app/services/ragdoll/search_engine.rb CHANGED

@@ -33,6 +33,10 @@ module Ragdoll
       threshold = options[:threshold] || search_config[:similarity_threshold]
       filters = options[:filters] || {}
+      # Extract keywords option and normalize
+      keywords = options[:keywords] || []
+      keywords = Array(keywords).map(&:to_s).reject(&:empty?)
       # Extract tracking options
       session_id = options[:session_id]
       user_id = options[:user_id]
@@ -49,6 +53,11 @@ module Ragdoll
         return [] if query_embedding.nil?
       end
+      # Add keywords to filters if provided
+      if keywords.any?
+        filters[:keywords] = keywords
+      end
       # Search using ActiveRecord models with statistics
       # Try enhanced search first, fall back to original if it fails
       begin
@@ -81,13 +90,15 @@ module Ragdoll
             }
           end
+          search_type = keywords.any? ? "semantic_with_keywords" : "semantic"
           Ragdoll::Search.record_search(
             query: query_string,
             query_embedding: query_embedding,
             results: search_results,
-            search_type: "semantic",
+            search_type: search_type,
             filters: filters,
-            options: { limit: limit, threshold: threshold },
+            options: { limit: limit, threshold: threshold, keywords: keywords },
             execution_time_ms: execution_time,
             session_id: session_id,
             user_id: user_id
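
The three hunks above thread a `keywords` option through the search engine: it is normalized, copied into the filters, and used to pick the recorded `search_type`. That flow can be condensed into a small standalone sketch. `normalize_search_options` is a hypothetical name used only for illustration. Unlike the diff, which assigns into the `filters` hash it received, this version duplicates the hash first so the caller's options are left untouched.

```ruby
# Hypothetical helper condensing the keyword handling shown in the diff.
def normalize_search_options(options)
  # Accept nil, a single value, or an array; drop empty strings.
  keywords = Array(options[:keywords]).map(&:to_s).reject(&:empty?)

  # Copy the filters so the caller's hash is not mutated.
  filters = (options[:filters] || {}).dup
  filters[:keywords] = keywords if keywords.any?

  {
    keywords: keywords,
    filters: filters,
    search_type: keywords.any? ? "semantic_with_keywords" : "semantic"
  }
end
```

Calling it with `{ keywords: ['ai', '', :ml] }` yields the keywords `['ai', 'ml']` and the search type `"semantic_with_keywords"`, while an empty options hash falls back to `"semantic"` with no keyword filter.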

data/db/migrate/{001_enable_postgresql_extensions.rb → 20250815234901_enable_postgresql_extensions.rb} RENAMED

@@ -1,8 +1,5 @@
 class EnablePostgresqlExtensions < ActiveRecord::Migration[7.0]
   def up
-    # This migration is now handled by the db:create rake task
-    # Just ensure required extensions are available
     # Vector similarity search (required for embeddings)
     execute "CREATE EXTENSION IF NOT EXISTS vector"
@@ -15,9 +12,11 @@ class EnablePostgresqlExtensions < ActiveRecord::Migration[7.0]
   end
   def down
-    execute <<-SQL
-      DROP DATABASE IF EXISTS ragdoll_development;
-      DROP ROLE IF EXISTS ragdoll;
-    SQL
+    # Extensions are typically not dropped as they might be used by other databases
+    # If you really need to drop them, uncomment the following:
+    # execute "DROP EXTENSION IF EXISTS vector"
+    # execute "DROP EXTENSION IF EXISTS unaccent"
+    # execute "DROP EXTENSION IF EXISTS pg_trgm"
+    # execute "DROP EXTENSION IF EXISTS \"uuid-ossp\""
   end
-end
+end

data/db/migrate/20250815234902_create_ragdoll_documents.rb ADDED

@@ -0,0 +1,117 @@
+class CreateRagdollDocuments < ActiveRecord::Migration[7.0]
+  # For concurrent index creation (PostgreSQL)
+  disable_ddl_transaction!
+  def up
+    create_table :ragdoll_documents,
+      comment: "Core documents table with LLM-generated structured metadata" do |t|
+      t.string :location, null: false,
+        comment: "Source location of document (file path, URL, or identifier)"
+      t.string :title, null: false,
+        comment: "Human-readable document title for display and search"
+      t.text :summary, null: false, default: "",
+        comment: "LLM-generated summary of document content"
+      t.string :document_type, null: false, default: "text",
+        comment: "Document format type"
+      t.string :status, null: false, default: "pending",
+        comment: "Document processing status"
+      t.json :metadata, default: {},
+        comment: "LLM-generated structured metadata about the file"
+      t.timestamp :file_modified_at, null: false, default: -> { "CURRENT_TIMESTAMP" },
+        comment: "Timestamp when the source file was last modified"
+      t.timestamps null: false,
+        comment: "Standard creation and update timestamps"
+      # Add tsvector column for full-text search
+      t.tsvector :search_vector
+      # Add keywords as array column
+      t.text :keywords, array: true, default: []
+    end
+    ###########
+    # Indexes #
+    ###########
+    add_index :ragdoll_documents, :location, unique: true,
+      comment: "Unique index for document source lookup"
+    add_index :ragdoll_documents, :title,
+      comment: "Index for title-based search"
+    add_index :ragdoll_documents, :document_type,
+      comment: "Index for filtering by document type"
+    add_index :ragdoll_documents, :status,
+      comment: "Index for filtering by processing status"
+    add_index :ragdoll_documents, :created_at,
+      comment: "Index for chronological sorting"
+    add_index :ragdoll_documents, [:document_type, :status],
+      comment: "Composite index for type+status filtering"
+    # Full-text search index
+    execute <<-SQL
+      CREATE INDEX CONCURRENTLY index_ragdoll_documents_on_fulltext_search
+      ON ragdoll_documents
+      USING gin(to_tsvector('english',
+        COALESCE(title, '') || ' ' ||
+        COALESCE(metadata->>'summary', '') || ' ' ||
+        COALESCE(metadata->>'keywords', '') || ' ' ||
+        COALESCE(metadata->>'description', '')
+      ))
+    SQL
+    add_index :ragdoll_documents, "(metadata->>'document_type')",
+      name: "index_ragdoll_documents_on_metadata_type",
+      comment: "Index for filtering by document type"
+    add_index :ragdoll_documents, "(metadata->>'classification')",
+      name: "index_ragdoll_documents_on_metadata_classification",
+      comment: "Index for filtering by document classification"
+    # GIN index on search_vector
+    add_index :ragdoll_documents, :search_vector, using: :gin, algorithm: :concurrently
+    # GIN index on keywords array
+    add_index :ragdoll_documents, :keywords, using: :gin,
+      name: 'index_ragdoll_documents_on_keywords_gin'
+    # Trigger to keep search_vector up to date on INSERT/UPDATE
+    execute <<-SQL
+      CREATE FUNCTION ragdoll_documents_vector_update() RETURNS trigger AS $$
+      BEGIN
+        NEW.search_vector := to_tsvector('english',
+          COALESCE(NEW.title, '') || ' ' ||
+          COALESCE(NEW.metadata->>'summary', '') || ' ' ||
+          COALESCE(NEW.metadata->>'keywords', '') || ' ' ||
+          COALESCE(NEW.metadata->>'description', '')
+        );
+        RETURN NEW;
+      END
+      $$ LANGUAGE plpgsql;
+      CREATE TRIGGER ragdoll_search_vector_update
+      BEFORE INSERT OR UPDATE ON ragdoll_documents
+      FOR EACH ROW EXECUTE FUNCTION ragdoll_documents_vector_update();
+    SQL
+  end
+  def down
+    execute <<-SQL
+      DROP TRIGGER IF EXISTS ragdoll_search_vector_update ON ragdoll_documents;
+      DROP FUNCTION IF EXISTS ragdoll_documents_vector_update();
+    SQL
+    drop_table :ragdoll_documents
+  end
+end

data/db/migrate/{005_create_ragdoll_embeddings.rb → 20250815234903_create_ragdoll_embeddings.rb} RENAMED

@@ -3,7 +3,7 @@ class CreateRagdollEmbeddings < ActiveRecord::Migration[7.0]
     create_table :ragdoll_embeddings,
       comment: "Polymorphic vector embeddings storage for semantic similarity search" do |t|
-        t.references :embeddable, polymorphic: true, null: false,
+      t.references :embeddable, polymorphic: true, null: false,
         comment: "Polymorphic reference to embeddable content"
       t.text :content, null: false, default: "",
@@ -26,16 +26,19 @@ class CreateRagdollEmbeddings < ActiveRecord::Migration[7.0]
       t.timestamps null: false,
         comment: "Standard creation and update timestamps"
+    end
-      ###########
-      # Indexes #
-      ###########
+    ###########
+    # Indexes #
+    ###########
-      t.index %i[embeddable_type embeddable_id],
-        comment: "Index for finding embeddings by embeddable content"
+    add_index :ragdoll_embeddings, [:embeddable_type, :embeddable_id],
+      comment: "Index for finding embeddings by embeddable content"
-      t.index :embedding_vector, using: :ivfflat, opclass: :vector_cosine_ops, name: "index_ragdoll_embeddings_on_embedding_vector_cosine",
-        comment: "IVFFlat index for fast cosine similarity search"
-    end
+    add_index :ragdoll_embeddings, :embedding_vector,
+      using: :ivfflat,
+      opclass: :vector_cosine_ops,
+      name: "index_ragdoll_embeddings_on_embedding_vector_cosine",
+      comment: "IVFFlat index for fast cosine similarity search"
   end
-end
+end

data/db/migrate/{006_create_ragdoll_contents.rb → 20250815234904_create_ragdoll_contents.rb} RENAMED

@@ -29,19 +29,22 @@ class CreateRagdollContents < ActiveRecord::Migration[7.0]
       t.timestamps null: false,
         comment: "Standard creation and update timestamps"
+    end
-      ###########
-      # Indexes #
-      ###########
+    ###########
+    # Indexes #
+    ###########
-      t.index :embedding_model,
-        comment: "Index for filtering by embedding model"
+    add_index :ragdoll_contents, :embedding_model,
+      comment: "Index for filtering by embedding model"
-      t.index :type,
-        comment: "Index for filtering by content type"
+    add_index :ragdoll_contents, :type,
+      comment: "Index for filtering by content type"
-      t.index "to_tsvector('english', COALESCE(content, ''))", using: :gin, name: "index_ragdoll_contents_on_fulltext_search",
-        comment: "Full-text search index for text content"
-    end
+    execute <<-SQL
+      CREATE INDEX index_ragdoll_contents_on_fulltext_search
+      ON ragdoll_contents
+      USING gin(to_tsvector('english', COALESCE(content, '')))
+    SQL
   end
-end
+end

data/db/migrate/{007_create_ragdoll_searches.rb → 20250815234905_create_ragdoll_searches.rb} RENAMED

@@ -41,33 +41,37 @@ class CreateRagdollSearches < ActiveRecord::Migration[7.0]
       t.timestamps null: false,
         comment: "Standard creation and update timestamps"
+    end
-      ###########
-      # Indexes #
-      ###########
+    ###########
+    # Indexes #
+    ###########
-      t.index :query_embedding, using: :ivfflat, opclass: :vector_cosine_ops,
-        name: "index_ragdoll_searches_on_query_embedding_cosine",
-        comment: "IVFFlat index for finding similar search queries"
+    add_index :ragdoll_searches, :query_embedding,
+      using: :ivfflat,
+      opclass: :vector_cosine_ops,
+      name: "index_ragdoll_searches_on_query_embedding_cosine",
+      comment: "IVFFlat index for finding similar search queries"
-      t.index :search_type,
-        comment: "Index for filtering by search type"
+    add_index :ragdoll_searches, :search_type,
+      comment: "Index for filtering by search type"
-      t.index :session_id,
-        comment: "Index for grouping searches by session"
+    add_index :ragdoll_searches, :session_id,
+      comment: "Index for grouping searches by session"
-      t.index :user_id,
-        comment: "Index for filtering searches by user"
+    add_index :ragdoll_searches, :user_id,
+      comment: "Index for filtering searches by user"
-      t.index :created_at,
-        comment: "Index for chronological search history"
+    add_index :ragdoll_searches, :created_at,
+      comment: "Index for chronological search history"
-      t.index :results_count,
-        comment: "Index for analyzing search effectiveness"
+    add_index :ragdoll_searches, :results_count,
+      comment: "Index for analyzing search effectiveness"
-      t.index "to_tsvector('english', query)", using: :gin,
-        name: "index_ragdoll_searches_on_fulltext_query",
-        comment: "Full-text search index for finding searches by query text"
-    end
+    execute <<-SQL
+      CREATE INDEX index_ragdoll_searches_on_fulltext_query
+      ON ragdoll_searches
+      USING gin(to_tsvector('english', query))
+    SQL
   end
 end

data/db/migrate/{008_create_ragdoll_search_results.rb → 20250815234906_create_ragdoll_search_results.rb} RENAMED

@@ -24,26 +24,26 @@ class CreateRagdollSearchResults < ActiveRecord::Migration[7.0]
       t.timestamps null: false,
         comment: "Standard creation and update timestamps"
+    end
-      ###########
-      # Indexes #
-      ###########
+    ###########
+    # Indexes #
+    ###########
-      t.index [:search_id, :result_rank],
-        name: "idx_search_results_search_rank",
-        comment: "Index for retrieving results in ranked order"
+    add_index :ragdoll_search_results, [:search_id, :result_rank],
+      name: "idx_search_results_search_rank",
+      comment: "Index for retrieving results in ranked order"
-      t.index [:embedding_id, :similarity_score],
-        name: "idx_search_results_embedding_score",
-        comment: "Index for analyzing embedding performance"
+    add_index :ragdoll_search_results, [:embedding_id, :similarity_score],
+      name: "idx_search_results_embedding_score",
+      comment: "Index for analyzing embedding performance"
-      t.index :similarity_score,
-        name: "idx_search_results_similarity",
-        comment: "Index for similarity score analysis"
+    add_index :ragdoll_search_results, :similarity_score,
+      name: "idx_search_results_similarity",
+      comment: "Index for similarity score analysis"
-      t.index [:clicked, :clicked_at],
-        name: "idx_search_results_clicks",
-        comment: "Index for click-through analysis"
-    end
+    add_index :ragdoll_search_results, [:clicked, :clicked_at],
+      name: "idx_search_results_clicks",
+      comment: "Index for click-through analysis"
   end
 end

data/lib/ragdoll/core/database.rb CHANGED

@@ -90,10 +90,10 @@ module Ragdoll
         # Drop all tables in correct order (respecting foreign key constraints)
         # Order: dependent tables first, then parent tables
         tables_to_drop = %w[
+          ragdoll_search_results
+          ragdoll_searches
           ragdoll_embeddings
-          ragdoll_text_contents
-          ragdoll_image_contents
-          ragdoll_audio_contents
+          ragdoll_contents
           ragdoll_documents
           schema_migrations
         ]
@@ -109,6 +109,11 @@ module Ragdoll
           end
         end
+        # Also drop any functions/triggers that might exist
+        if ActiveRecord::Base.connection.adapter_name.downcase.include?("postgresql")
+          ActiveRecord::Base.connection.execute("DROP FUNCTION IF EXISTS ragdoll_documents_vector_update() CASCADE")
+        end
         migrate!
       end
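
The comment in this hunk states the rule behind `tables_to_drop`: dependent tables are dropped before the tables they reference. A minimal sketch of deriving such an order with Ruby's standard `TSort` module follows. The foreign-key map is an assumption made for illustration (only the `search_id` and `embedding_id` columns are visible in this diff), and `ASSUMED_REFERENCES` and `drop_order` are hypothetical names, not part of ragdoll.

```ruby
require "tsort"

# ASSUMED foreign-key relationships (table => tables it references).
# Illustrative guesses only; not taken from the ragdoll schema.
ASSUMED_REFERENCES = {
  "ragdoll_search_results" => %w[ragdoll_searches ragdoll_embeddings],
  "ragdoll_searches"       => [],
  "ragdoll_embeddings"     => %w[ragdoll_contents],
  "ragdoll_contents"       => %w[ragdoll_documents],
  "ragdoll_documents"      => []
}.freeze

# TSort lists referenced tables first, which is a safe creation order.
# Reversing that list gives an order in which dependents are dropped first.
def drop_order(references)
  each_node  = ->(&block) { references.each_key(&block) }
  each_child = ->(table, &block) { references.fetch(table, []).each(&block) }
  TSort.tsort(each_node, each_child).reverse
end

puts drop_order(ASSUMED_REFERENCES)
```

Any order that satisfies the constraints is acceptable, so the result may differ from the hand-written list in the hunk while still being valid.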

data/lib/ragdoll/core/version.rb CHANGED

@@ -3,6 +3,6 @@
 module Ragdoll
   module Core
-    VERSION = "0.1.9"
+    VERSION = "0.1.10"
   end
 end
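
The bump from 0.1.9 to 0.1.10 is a case where plain string comparison orders the two releases incorrectly. A small sketch using RubyGems' `Gem::Version` and `Gem::Requirement` shows the difference; the constant names are illustrative only, and the `~> 0.1.9` constraint is a hypothetical example of what a consumer's Gemfile might contain.

```ruby
require "rubygems"

OLD_VERSION = Gem::Version.new("0.1.9")
NEW_VERSION = Gem::Version.new("0.1.10")

# Segment-wise comparison: 10 > 9, so 0.1.10 is the newer release.
VERSION_SAYS_NEWER = NEW_VERSION > OLD_VERSION

# Plain string comparison gets this wrong because "1" sorts before "9".
STRING_SAYS_NEWER = "0.1.10" > "0.1.9"

# A hypothetical pessimistic constraint that still admits the new release.
CONSTRAINT_ALLOWS_NEW = Gem::Requirement.new("~> 0.1.9").satisfied_by?(NEW_VERSION)

puts "Gem::Version: #{VERSION_SAYS_NEWER}"
puts "String:       #{STRING_SAYS_NEWER}"
puts "~> 0.1.9:     #{CONSTRAINT_ALLOWS_NEW}"
```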

data/lib/tasks/db.rake CHANGED

@@ -25,22 +25,17 @@ namespace :db do
       )
       # Run individual SQL commands to avoid transaction block issues
-      begin
-        ActiveRecord::Base.connection.execute("DROP DATABASE IF EXISTS ragdoll_development")
-      rescue => e
-        puts "Note: #{e.message}" if e.message.include?("does not exist")
-      end
-      begin
-        ActiveRecord::Base.connection.execute("DROP ROLE IF EXISTS ragdoll")
-      rescue => e
-        puts "Note: #{e.message}" if e.message.include?("does not exist")
-      end
+      # Note: Removed the DROP DATABASE/ROLE here since that should be done via db:drop task
       begin
         ActiveRecord::Base.connection.execute("CREATE ROLE ragdoll WITH LOGIN CREATEDB")
+        puts "Role 'ragdoll' created successfully"
       rescue => e
-        puts "Note: Role already exists, continuing..." if e.message.include?("already exists")
+        if e.message.include?("already exists")
+          puts "Note: Role 'ragdoll' already exists, continuing..."
+        else
+          raise e
+        end
       end
       begin
@@ -50,8 +45,16 @@ namespace :db do
             ENCODING = 'UTF8'
             CONNECTION LIMIT = -1
         SQL
+        puts "Database 'ragdoll_development' created successfully"
       rescue => e
-        puts "Note: Database already exists, continuing..." if e.message.include?("already exists")
+        if e.message.include?("already exists")
+          puts "ERROR: Database 'ragdoll_development' already exists!"
+          puts "Please run 'rake db:drop' first to remove the existing database, then run 'rake db:create' again."
+          puts "Or use 'rake db:reset' to drop, create, and migrate in one step."
+          exit 1
+        else
+          raise e
+        end
       end
       ActiveRecord::Base.connection.execute("GRANT ALL PRIVILEGES ON DATABASE ragdoll_development TO ragdoll")
@@ -97,8 +100,53 @@ namespace :db do
     puts "Dropping database with config: #{config.database.inspect}"
     case config.database[:adapter]
-    when "postgresql", "mysql2"
-      puts "For #{config.database[:adapter]}, please drop the database manually on your server"
+    when "postgresql"
+      puts "PostgreSQL database drop - running as superuser to drop database and role..."
+      # Connect as superuser to drop database and role
+      ActiveRecord::Base.establish_connection(
+        adapter: 'postgresql',
+        database: 'postgres', # Connect to postgres database initially
+        username: ENV.fetch('POSTGRES_SUPERUSER', 'postgres'),
+        password: ENV['POSTGRES_SUPERUSER_PASSWORD'],
+        host: config.database[:host] || 'localhost',
+        port: config.database[:port] || 5432
+      )
+      # Drop the database if it exists
+      begin
+        ActiveRecord::Base.connection.execute("DROP DATABASE IF EXISTS ragdoll_development")
+        puts "Database 'ragdoll_development' dropped successfully"
+      rescue => e
+        puts "Error dropping database: #{e.message}"
+      end
+      # Optionally drop the role (commented out by default to preserve user)
+      # begin
+      #   ActiveRecord::Base.connection.execute("DROP ROLE IF EXISTS ragdoll")
+      #   puts "Role 'ragdoll' dropped successfully"
+      # rescue => e
+      #   puts "Error dropping role: #{e.message}"
+      # end
+    when "mysql2"
+      puts "MySQL database drop - connecting to drop database..."
+      # Connect without specifying database
+      ActiveRecord::Base.establish_connection(
+        adapter: 'mysql2',
+        username: config.database[:username],
+        password: config.database[:password],
+        host: config.database[:host] || 'localhost',
+        port: config.database[:port] || 3306
+      )
+      begin
+        ActiveRecord::Base.connection.execute("DROP DATABASE IF EXISTS #{config.database[:database]}")
+        puts "Database '#{config.database[:database]}' dropped successfully"
+      rescue => e
+        puts "Error dropping database: #{e.message}"
+      end
     end
     puts "Database drop completed"
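
The role-creation change in this file replaces a rescue that silently swallowed every error with one that tolerates only "already exists" and re-raises anything else. A minimal sketch of that pattern in isolation follows. `create_unless_exists` is a hypothetical helper, not part of ragdoll, and the block that raises stands in for the real `ActiveRecord::Base.connection.execute` call.

```ruby
# Hypothetical helper: runs a creation step and treats only
# "already exists" failures as non-fatal.
def create_unless_exists(label)
  yield
  :created
rescue StandardError => e
  # Anything other than a duplicate-object error is re-raised unchanged.
  raise unless e.message.include?("already exists")

  puts "Note: #{label} already exists, continuing..."
  :already_exists
end

# Stand-in for the real CREATE ROLE statement failing on a duplicate.
result = create_unless_exists("Role 'ragdoll'") do
  raise 'role "ragdoll" already exists'
end
puts result
```

The database-creation branch in the diff takes a stricter line than this sketch: on "already exists" it prints instructions and exits with status 1 instead of continuing.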

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: ragdoll
 version: !ruby/object:Gem::Version
-  version: 0.1.9
+  version: 0.1.10
 platform: ruby
 authors:
 - Dewayne VanHoozer
@@ -406,12 +406,12 @@ files:
 - app/services/ragdoll/search_engine.rb
 - app/services/ragdoll/text_chunker.rb
 - app/services/ragdoll/text_generation_service.rb
-- db/migrate/001_enable_postgresql_extensions.rb
-- db/migrate/004_create_ragdoll_documents.rb
-- db/migrate/005_create_ragdoll_embeddings.rb
-- db/migrate/006_create_ragdoll_contents.rb
-- db/migrate/007_create_ragdoll_searches.rb
-- db/migrate/008_create_ragdoll_search_results.rb
+- db/migrate/20250815234901_enable_postgresql_extensions.rb
+- db/migrate/20250815234902_create_ragdoll_documents.rb
+- db/migrate/20250815234903_create_ragdoll_embeddings.rb
+- db/migrate/20250815234904_create_ragdoll_contents.rb
+- db/migrate/20250815234905_create_ragdoll_searches.rb
+- db/migrate/20250815234906_create_ragdoll_search_results.rb
 - lib/ragdoll-core.rb
 - lib/ragdoll.rb
 - lib/ragdoll/core.rb
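
The file list above shows every migration moving from a short numeric prefix to a timestamp prefix. Because a migration's version is taken from the leading digits of its filename, none of the old version numbers carries over, so a database migrated under 0.1.9 would likely treat the renamed files as pending. The sketch below extracts the versions from both sets of filenames. The regular expression is a simplified assumption rather than ActiveRecord's exact pattern, and `migration_version` is a hypothetical helper.

```ruby
# Simplified filename pattern, assumed for illustration.
MIGRATION_FILENAME = /\A(\d+)_([a-z0-9_]+)\.rb\z/

# Returns the integer version encoded in a migration filename, or nil.
def migration_version(path)
  match = MIGRATION_FILENAME.match(File.basename(path))
  match ? match[1].to_i : nil
end

OLD_MIGRATIONS = %w[
  db/migrate/001_enable_postgresql_extensions.rb
  db/migrate/004_create_ragdoll_documents.rb
  db/migrate/005_create_ragdoll_embeddings.rb
  db/migrate/006_create_ragdoll_contents.rb
  db/migrate/007_create_ragdoll_searches.rb
  db/migrate/008_create_ragdoll_search_results.rb
].freeze

NEW_MIGRATIONS = %w[
  db/migrate/20250815234901_enable_postgresql_extensions.rb
  db/migrate/20250815234902_create_ragdoll_documents.rb
  db/migrate/20250815234903_create_ragdoll_embeddings.rb
  db/migrate/20250815234904_create_ragdoll_contents.rb
  db/migrate/20250815234905_create_ragdoll_searches.rb
  db/migrate/20250815234906_create_ragdoll_search_results.rb
].freeze

OLD_VERSIONS = OLD_MIGRATIONS.map { |path| migration_version(path) }
NEW_VERSIONS = NEW_MIGRATIONS.map { |path| migration_version(path) }

puts "old versions:    #{OLD_VERSIONS.inspect}"
puts "new versions:    #{NEW_VERSIONS.inspect}"
puts "shared versions: #{(OLD_VERSIONS & NEW_VERSIONS).inspect}"
```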

data/db/migrate/004_create_ragdoll_documents.rb DELETED

@@ -1,70 +0,0 @@
-class CreateRagdollDocuments < ActiveRecord::Migration[7.0]
-  def change
-    create_table :ragdoll_documents,
-      comment: "Core documents table with LLM-generated structured metadata" do |t|
-      t.string :location, null: false,
-        comment: "Source location of document (file path, URL, or identifier)"
-      t.string :title, null: false,
-        comment: "Human-readable document title for display and search"
-      t.text :summary, null: false, default: "",
-        comment: "LLM-generated summary of document content"
-      t.text :keywords , null: false, default: "",
-        comment: "LLM-generated comma-separated keywords of document"
-      t.string :document_type, null: false, default: "text",
-        comment: "Document format type"
-      t.string :status, null: false, default: "pending",
-        comment: "Document processing status"
-      t.json :metadata, default: {},
-        comment: "LLM-generated structured metadata about the file"
-      t.timestamp :file_modified_at, null: false, default: -> { "CURRENT_TIMESTAMP" },
-        comment: "Timestamp when the source file was last modified"
-      t.timestamps null: false,
-        comment: "Standard creation and update timestamps"
-      ###########
-      # Indexes #
-      ###########
-      t.index :location, unique: true,
-        comment: "Unique index for document source lookup"
-      t.index :title,
-        comment: "Index for title-based search"
-      t.index :document_type,
-        comment: "Index for filtering by document type"
-      t.index :status,
-        comment: "Index for filtering by processing status"
-      t.index :created_at,
-        comment: "Index for chronological sorting"
-      t.index %i[document_type status],
-        comment: "Composite index for type+status filtering"
-      t.index "to_tsvector('english', COALESCE(title, '') ||
-        ' ' ||
-        COALESCE(metadata->>'summary', '') ||
-        ' ' || COALESCE(metadata->>'keywords', '') ||
-        ' ' || COALESCE(metadata->>'description', ''))",
-        using: :gin, name: "index_ragdoll_documents_on_fulltext_search",
-        comment: "Full-text search across title and metadata fields"
-      t.index "(metadata->>'document_type')", name: "index_ragdoll_documents_on_metadata_type",
-        comment: "Index for filtering by document type"
-      t.index "(metadata->>'classification')", name: "index_ragdoll_documents_on_metadata_classification",
-        comment: "Index for filtering by document classification"
-    end
-  end
-end