ragdoll 0.0.2 → 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (54)
  1. checksums.yaml +4 -4
  2. data/README.md +40 -318
  3. data/Rakefile +4 -15
  4. data/app/models/ragdoll/document.rb +9 -0
  5. data/app/models/ragdoll/embedding.rb +9 -0
  6. data/config/initializers/ragdoll.rb +6 -0
  7. data/config/routes.rb +5 -0
  8. data/db/migrate/20250218123456_create_documents.rb +20 -0
  9. data/lib/config/database.yml +28 -0
  10. data/lib/config/ragdoll.yml +31 -0
  11. data/lib/ragdoll/engine.rb +16 -0
  12. data/lib/ragdoll/import_job.rb +15 -0
  13. data/lib/ragdoll/ingestion.rb +30 -0
  14. data/lib/ragdoll/search.rb +18 -0
  15. data/lib/ragdoll/version.rb +7 -0
  16. data/lib/ragdoll.rb +6 -243
  17. data/lib/tasks/import_task.thor +32 -0
  18. data/lib/tasks/jobs_task.thor +40 -0
  19. data/lib/tasks/ragdoll_tasks.thor +7 -0
  20. data/lib/tasks/search_task.thor +55 -0
  21. metadata +37 -40
  22. data/db/migrate/001_enable_postgresql_extensions.rb +0 -23
  23. data/db/migrate/004_create_ragdoll_documents.rb +0 -70
  24. data/db/migrate/005_create_ragdoll_embeddings.rb +0 -41
  25. data/db/migrate/006_create_ragdoll_contents.rb +0 -47
  26. data/lib/ragdoll/core/client.rb +0 -315
  27. data/lib/ragdoll/core/configuration.rb +0 -273
  28. data/lib/ragdoll/core/database.rb +0 -141
  29. data/lib/ragdoll/core/document_management.rb +0 -110
  30. data/lib/ragdoll/core/document_processor.rb +0 -344
  31. data/lib/ragdoll/core/embedding_service.rb +0 -183
  32. data/lib/ragdoll/core/errors.rb +0 -11
  33. data/lib/ragdoll/core/jobs/extract_keywords.rb +0 -32
  34. data/lib/ragdoll/core/jobs/extract_text.rb +0 -42
  35. data/lib/ragdoll/core/jobs/generate_embeddings.rb +0 -32
  36. data/lib/ragdoll/core/jobs/generate_summary.rb +0 -29
  37. data/lib/ragdoll/core/metadata_schemas.rb +0 -334
  38. data/lib/ragdoll/core/models/audio_content.rb +0 -175
  39. data/lib/ragdoll/core/models/content.rb +0 -126
  40. data/lib/ragdoll/core/models/document.rb +0 -678
  41. data/lib/ragdoll/core/models/embedding.rb +0 -204
  42. data/lib/ragdoll/core/models/image_content.rb +0 -227
  43. data/lib/ragdoll/core/models/text_content.rb +0 -169
  44. data/lib/ragdoll/core/search_engine.rb +0 -50
  45. data/lib/ragdoll/core/services/image_description_service.rb +0 -230
  46. data/lib/ragdoll/core/services/metadata_generator.rb +0 -335
  47. data/lib/ragdoll/core/shrine_config.rb +0 -71
  48. data/lib/ragdoll/core/text_chunker.rb +0 -210
  49. data/lib/ragdoll/core/text_generation_service.rb +0 -360
  50. data/lib/ragdoll/core/version.rb +0 -8
  51. data/lib/ragdoll/core.rb +0 -73
  52. data/lib/ragdoll-core.rb +0 -3
  53. data/lib/tasks/annotate.rake +0 -126
  54. data/lib/tasks/db.rake +0 -338
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 2d5c41105ebbbb39c2c38db7518607f45d446e6cd3b024abde75beb434ff3d2b
-   data.tar.gz: 2d9714a078bd7b9a8adda80869af3f983e3cb637356cba64e1550f0fe77911ee
+   metadata.gz: bea4621e2b802db79d78f8b1d0679cf2f81ed35b91d35683ce0afcb83ddc54e1
+   data.tar.gz: ec12fb975b154f77a42d54fb3c716d523e1b90e4e0122b3576c5aac15a957340
  SHA512:
-   metadata.gz: bf742b9919e1d542b45325e11e197bc9aca313b892c34567ea51d4efd7e7815ecc8081165af5cfef12e2453b86ee215d3a92a807a3c98d486b3797ac3dbaf214
-   data.tar.gz: da808c5780e3fecd02ef2ab0c4414b0f70cf3eef6a69ba08311c473ed19f98f3d3d4d0b5912c294dc0441a03f58a3ff510be126736d47881190cf56841e6233d
+   metadata.gz: 3702308d3b772dfc0ebf429a26bae0f0378456e9d6c48357b8e2a4cdeb3744e78b43fd610ef308ff6348f1bec28bb37d51bd5e335d78583574b3212c8f544a33
+   data.tar.gz: 9beebfebafe1ed2e815a3042949b68e5f208d8c555a92228eb4098fa99900f07d2985654766e2907de3069f5d10b6cf3e65fb7a3b431ee9db836e6111f1e27f2
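`checksums.yaml` lists SHA256 and SHA512 digests of the two members of the `.gem` archive, `metadata.gz` and `data.tar.gz`, so every value changes whenever either member does. A minimal sketch of re-checking them with Ruby's standard `digest` library follows. It assumes the member bytes have already been read out of the outer tar (RubyGems' `Gem::Package::TarReader` can do that). `gem_checksums` and `checksums_match?` are illustrative helper names, not RubyGems APIs.

```ruby
require "digest"

# Recompute checksums.yaml-style digests for the members of a .gem archive.
# `members` maps a member name (e.g. "data.tar.gz") to its raw bytes.
def gem_checksums(members)
  {
    "SHA256" => members.transform_values { |bytes| Digest::SHA256.hexdigest(bytes) },
    "SHA512" => members.transform_values { |bytes| Digest::SHA512.hexdigest(bytes) }
  }
end

# True when every digest listed in `expected` (a Hash shaped like the parsed
# checksums.yaml) equals the digest recomputed from `members`.
def checksums_match?(expected, members)
  actual = gem_checksums(members)
  expected.all? do |algorithm, digests|
    digests.all? { |name, hex| actual.dig(algorithm, name) == hex }
  end
end
```

Parsing the published file with `YAML.safe_load` yields a nested Hash of the same shape as `expected`.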
data/README.md CHANGED
@@ -1,353 +1,75 @@
- <div align="center" style="background-color: yellow; color: black; padding: 20px; margin: 20px 0; border: 2px solid black; font-size: 48px; font-weight: bold;">
- ⚠️ CAUTION ⚠️<br />
- Software Under Development by a Crazy Man
- </div>
- <br />
- <div align="center">
- <table>
- <tr>
- <td width="50%">
- <a href="https://research.ibm.com/blog/retrieval-augmented-generation-RAG" target="_blank">
- <img src="rag_doll.png" alt="Ragdoll" width="800">
- </a>
- </td>
- <td width="50%" valign="top">
- <p>Multi-modal RAG (Retrieval-Augmented Generation) is an architecture that integrates multiple data types (such as text, images, and audio) to enhance AI response generation. It combines retrieval-based methods, which fetch relevant information from a knowledge base, with generative large language models (LLMs) that create coherent and contextually appropriate outputs. This approach allows for more comprehensive and engaging user interactions, such as chatbots that respond with both text and images or educational tools that incorporate visual aids into learning materials. By leveraging various modalities, multi-modal RAG systems improve context understanding and user experience.</p>
- </td>
- </tr>
- </table>
- </div>
+ # Ragdoll
 
- # Ragdoll::Core
+ Ragdoll is a Rails Engine designed for document ingestion and search. It allows you to import documents, vectorize them, and perform searches using vector representations.
 
- Database-oriented multi-modal RAG (Retrieval-Augmented Generation) library built on ActiveRecord. Features PostgreSQL + pgvector for high-performance semantic search, polymorphic content architecture, and dual metadata design for sophisticated document analysis.
+ ## Installation as a Rails Engine
 
- ## Quick Start
+ To use Ragdoll as a Rails Engine, add this line to your application's Gemfile:
 
- ```ruby
- require 'ragdoll'
-
- # Configure with PostgreSQL + pgvector
- Ragdoll::Core.configure do |config|
-   # Database configuration (PostgreSQL only)
-   config.database_config = {
-     adapter: 'postgresql',
-     database: 'ragdoll_production',
-     username: 'ragdoll',
-     password: ENV['DATABASE_PASSWORD'],
-     host: 'localhost',
-     port: 5432,
-     auto_migrate: true
-   }
-
-   # Ruby LLM configuration
-   config.ruby_llm_config[:openai][:api_key] = ENV['OPENAI_API_KEY']
-   config.ruby_llm_config[:openai][:organization] = ENV['OPENAI_ORGANIZATION']
-   config.ruby_llm_config[:openai][:project] = ENV['OPENAI_PROJECT']
-
-   # Model configuration
-   config.models[:default] = 'openai/gpt-4o'
-   config.models[:embedding][:text] = 'text-embedding-3-small'
-
-   # Logging configuration
-   config.logging_config[:log_level] = :warn
-   config.logging_config[:log_filepath] = File.join(Dir.home, '.ragdoll', 'ragdoll.log')
- end
-
- # Add documents - returns detailed result
- result = Ragdoll::Core.add_document(path: 'research_paper.pdf')
- puts result[:message] # "Document 'research_paper' added successfully with ID 123"
- doc_id = result[:document_id]
-
- # Check document status
- status = Ragdoll::Core.document_status(id: doc_id)
- puts status[:message] # Shows processing status and embeddings count
-
- # Search across content
- results = Ragdoll::Core.search(query: 'neural networks')
-
- # Get detailed document information
- document = Ragdoll::Core.get_document(id: doc_id)
- ```
-
- ## High-Level API
-
- The `Ragdoll` module provides a convenient high-level API for common operations:
-
- ### Document Management
-
- ```ruby
- # Add single document - returns detailed result hash
- result = Ragdoll::Core.add_document(path: 'document.pdf')
- puts result[:success] # true
- puts result[:document_id] # "123"
- puts result[:message] # "Document 'document' added successfully with ID 123"
- puts result[:embeddings_queued] # true
-
- # Check document processing status
- status = Ragdoll::Core.document_status(id: result[:document_id])
- puts status[:status] # "processed"
- puts status[:embeddings_count] # 15
- puts status[:embeddings_ready] # true
- puts status[:message] # "Document processed successfully with 15 embeddings"
-
- # Get detailed document information
- document = Ragdoll::Core.get_document(id: result[:document_id])
- puts document[:title] # "document"
- puts document[:status] # "processed"
- puts document[:embeddings_count] # 15
- puts document[:content_length] # 5000
-
- # Update document metadata
- Ragdoll::Core.update_document(id: result[:document_id], title: 'New Title')
-
- # Delete document
- Ragdoll::Core.delete_document(id: result[:document_id])
-
- # List all documents
- documents = Ragdoll::Core.list_documents(limit: 10)
-
- # System statistics
- stats = Ragdoll::Core.stats
- puts stats[:total_documents] # 50
- puts stats[:total_embeddings] # 1250
+ ```bash
+ bundle add ragdoll
  ```
 
- ### Search and Retrieval
-
- ```ruby
- # Semantic search across all content types
- results = Ragdoll::Core.search(query: 'artificial intelligence')
-
- # Search specific content types
- text_results = Ragdoll::Core.search(query: 'machine learning', content_type: 'text')
- image_results = Ragdoll::Core.search(query: 'neural network diagram', content_type: 'image')
- audio_results = Ragdoll::Core.search(query: 'AI discussion', content_type: 'audio')
-
- # Advanced search with metadata filters
- results = Ragdoll::Core.search(
-   query: 'deep learning',
-   classification: 'research',
-   keywords: ['AI', 'neural networks'],
-   tags: ['technical']
- )
+ And then execute:
 
- # Get context for RAG applications
- context = Ragdoll::Core.get_context(query: 'machine learning', limit: 5)
-
- # Enhanced prompt with context
- enhanced = Ragdoll::Core.enhance_prompt(
-   prompt: 'What is machine learning?',
-   context_limit: 5
- )
-
- # Hybrid search combining semantic and full-text
- results = Ragdoll::Core.hybrid_search(
-   query: 'neural networks',
-   semantic_weight: 0.7,
-   text_weight: 0.3
- )
+ ```bash
+ bundle install
  ```
 
- ### System Operations
-
- ```ruby
- # Get system statistics
- stats = Ragdoll::Core.stats
- # Returns information about documents, content types, embeddings, etc.
+ Or install it yourself as:
 
- # Health check
- healthy = Ragdoll::Core.healthy?
-
- # Get configuration
- config = Ragdoll::Core.configuration
-
- # Reset configuration (useful for testing)
- Ragdoll::Core.reset_configuration!
+ ```bash
+ gem install ragdoll
  ```
 
- ### Configuration
-
- ```ruby
- # Configure the system
- Ragdoll::Core.configure do |config|
-   # Database configuration (PostgreSQL only - REQUIRED)
-   config.database_config = {
-     adapter: 'postgresql',
-     database: 'ragdoll_production',
-     username: 'ragdoll',
-     password: ENV['DATABASE_PASSWORD'],
-     host: 'localhost',
-     port: 5432,
-     auto_migrate: true
-   }
-
-   # Ruby LLM configuration for multiple providers
-   config.ruby_llm_config[:openai][:api_key] = ENV['OPENAI_API_KEY']
-   config.ruby_llm_config[:openai][:organization] = ENV['OPENAI_ORGANIZATION']
-   config.ruby_llm_config[:openai][:project] = ENV['OPENAI_PROJECT']
-
-   config.ruby_llm_config[:anthropic][:api_key] = ENV['ANTHROPIC_API_KEY']
-   config.ruby_llm_config[:google][:api_key] = ENV['GOOGLE_API_KEY']
+ ## Usage as a Rails Engine
 
-   # Model configuration
-   config.models[:default] = 'openai/gpt-4o'
-   config.models[:summary] = 'openai/gpt-4o'
-   config.models[:keywords] = 'openai/gpt-4o'
-   config.models[:embedding][:text] = 'text-embedding-3-small'
-   config.models[:embedding][:image] = 'image-embedding-3-small'
-   config.models[:embedding][:audio] = 'audio-embedding-3-small'
+ ### Importing Documents
 
-   # Logging configuration
-   config.logging_config[:log_level] = :warn # :debug, :info, :warn, :error, :fatal
-   config.logging_config[:log_filepath] = File.join(Dir.home, '.ragdoll', 'ragdoll.log')
+ To import documents from a file, glob, or directory, use the following command:
 
-   # Processing settings
-   config.chunking[:text][:max_tokens] = 1000
-   config.chunking[:text][:overlap] = 200
-   config.search[:similarity_threshold] = 0.7
-   config.search[:max_results] = 10
- end
+ ```bash
+ ragdoll import PATH
  ```
 
- ## Current Implementation Status
-
- ### **Fully Implemented**
- - **Text document processing**: PDF, DOCX, HTML, Markdown, plain text files
- - **Embedding generation**: Text chunking and vector embedding creation
- - **Database schema**: Multi-modal polymorphic architecture with PostgreSQL + pgvector
- - **Dual metadata architecture**: Separate LLM-generated content analysis and file properties
- - **Search functionality**: Semantic search with cosine similarity and usage analytics
- - **Document management**: Add, update, delete, list operations
- - **Background processing**: ActiveJob integration for async embedding generation
- - **LLM metadata generation**: AI-powered structured content analysis with schema validation
- - **Logging**: Configurable file-based logging with multiple levels
-
- ### 🚧 **In Development**
- - **Image processing**: Framework exists but vision AI integration needs completion
- - **Audio processing**: Framework exists but speech-to-text integration needs completion
- - **Hybrid search**: Combining semantic and full-text search capabilities
-
- ### 📋 **Planned Features**
- - **Multi-modal search**: Search across text, image, and audio content types
- - **Content-type specific embedding models**: Different models for text, image, audio
- - **Enhanced metadata schemas**: Domain-specific metadata templates
-
- ## Architecture Highlights
-
- ### Dual Metadata Design
-
- Ragdoll uses a sophisticated dual metadata architecture to separate concerns:
-
- - **`metadata` (JSON)**: LLM-generated content analysis including summary, keywords, classification, topics, sentiment, and domain-specific insights
- - **`file_metadata` (JSON)**: System-generated file properties including size, MIME type, dimensions, processing parameters, and technical characteristics
-
- This separation enables both semantic search operations on content meaning and efficient file management operations.
-
- ### Polymorphic Multi-Modal Architecture
-
- The database schema uses polymorphic associations to elegantly support multiple content types:
+ - `PATH`: The path to the file or directory to import.
+ - Use the `-r` or `--recursive` option to import files recursively from directories.
+ - Use the `-j` or `--jobs` option to specify the number of concurrent import jobs.
 
- - **Documents**: Central entity with dual metadata columns
- - **Content Types**: Specialized tables for `text_contents`, `image_contents`, `audio_contents`
- - **Embeddings**: Unified vector storage via polymorphic `embeddable` associations
+ ### Managing Jobs
 
- ## Text Document Processing (Current)
-
- Currently, Ragdoll processes text documents through:
-
- 1. **Content Extraction**: Extracts text from PDF, DOCX, HTML, Markdown, and plain text
- 2. **Metadata Generation**: AI-powered analysis creates structured content metadata
- 3. **Text Chunking**: Splits content into manageable chunks with configurable size/overlap
- 4. **Embedding Generation**: Creates vector embeddings using OpenAI or other providers
- 5. **Database Storage**: Stores in polymorphic multi-modal architecture with dual metadata
- 6. **Search**: Semantic search using cosine similarity with usage analytics
-
- ### Example Usage
- ```ruby
- # Add a text document
- result = Ragdoll::Core.add_document(path: 'document.pdf')
-
- # Check processing status
- status = Ragdoll::Core.document_status(id: result[:document_id])
-
- # Search the content
- results = Ragdoll::Core.search(query: 'machine learning')
- ```
-
- ## PostgreSQL + pgvector Configuration
-
- ### Database Setup
+ To manage import jobs, use the following command:
 
  ```bash
- # Install PostgreSQL and pgvector
- brew install postgresql pgvector # macOS
- # or
- apt-get install postgresql postgresql-contrib # Ubuntu
-
- # Create database and enable pgvector extension
- createdb ragdoll_production
- psql -d ragdoll_production -c "CREATE EXTENSION IF NOT EXISTS vector;"
- ```
-
- ### Configuration Example
-
- ```ruby
- Ragdoll::Core.configure do |config|
-   config.database_config = {
-     adapter: 'postgresql',
-     database: 'ragdoll_production',
-     username: 'ragdoll',
-     password: ENV['DATABASE_PASSWORD'],
-     host: 'localhost',
-     port: 5432,
-     pool: 20,
-     auto_migrate: true
-   }
- end
+ ragdoll jobs [JOB_ID]
  ```
 
- ## Performance Features
+ - `JOB_ID`: The ID of a specific job to manage.
+ - Use `--stop`, `--pause`, or `--resume` to control a specific job.
+ - Use `--stop-all`, `--pause-all`, or `--resume-all` to control all jobs.
 
- - **Native pgvector**: Hardware-accelerated similarity search
- - **IVFFlat indexing**: Fast approximate nearest neighbor search
- - **Polymorphic embeddings**: Unified search across content types
- - **Batch processing**: Efficient bulk operations
- - **Background jobs**: Asynchronous document processing
- - **Connection pooling**: High-concurrency support
+ ### Searching Documents
 
- ## Installation
+ To search the database with a prompt, use the following command:
 
  ```bash
- # Install system dependencies
- brew install postgresql pgvector # macOS
- # or
- apt-get install postgresql postgresql-contrib # Ubuntu
+ ragdoll search PROMPT
+ ```
 
- # Install gem
- gem install ragdoll
+ - `PROMPT`: The search prompt as a string or use the `-p` option to specify a file containing the prompt text.
+ - Use the `--max_count` option to specify the maximum number of results to return.
+ - Use the `--rerank` option to rerank results using keyword search.
 
- # Or add to Gemfile
- gem 'ragdoll'
- ```
+ ## Development and Contribution
 
- ## Requirements
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
 
- - **Ruby**: 3.2+
- - **PostgreSQL**: 12+ with pgvector extension (REQUIRED - no other databases supported)
- - **Dependencies**: activerecord, pg, pgvector, neighbor, ruby_llm, pdf-reader, docx, rubyzip, shrine, rmagick, opensearch-ruby, searchkick, ruby-progressbar
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
 
- ## Related Projects
+ ## Contributing
 
- - **ragdoll-cli**: Standalone CLI application using ragdoll
- - **ragdoll-rails**: Rails engine with web interface for ragdoll
+ Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/ragdoll.
 
- ## Key Design Principles
+ ## License
 
- 1. **Database-Oriented**: Built on ActiveRecord with PostgreSQL + pgvector for production performance
- 2. **Multi-Modal First**: Text, image, and audio content as first-class citizens via polymorphic architecture
- 3. **Dual Metadata Design**: Separates LLM-generated content analysis from file properties
- 4. **LLM-Enhanced**: Structured metadata generation with schema validation using latest AI capabilities
- 5. **High-Level API**: Simple, intuitive interface for complex operations
- 6. **Scalable**: Designed for production workloads with background processing and proper indexing
- 7. **Extensible**: Easy to add new content types and embedding models through polymorphic design
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
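The new README documents `ragdoll import PATH`, where `PATH` may be a file, a glob, or a directory, plus `-r`/`--recursive` for descending into subdirectories. The release implements the command as a Thor task (`data/lib/tasks/import_task.thor`, whose body is not part of this section), so the sketch below is not that code. It is a hypothetical `expand_import_path` helper, built only on Ruby's `File` and `Dir`, showing one way such an argument could be resolved to a concrete list of files.

```ruby
# Resolve an import PATH (file, glob, or directory) to a sorted list of files.
# Hypothetical helper; the released Thor task may behave differently.
def expand_import_path(path, recursive: false)
  if File.file?(path)
    [path]
  elsif File.directory?(path)
    # "**/*" descends into subdirectories; "*" stays at the top level.
    pattern = recursive ? File.join(path, "**", "*") : File.join(path, "*")
    Dir.glob(pattern).select { |entry| File.file?(entry) }.sort
  else
    # Anything else is treated as a glob pattern such as "docs/*.md".
    Dir.glob(path).select { |entry| File.file?(entry) }.sort
  end
end
```

Filtering with `File.file?` keeps subdirectory entries out of the list, so only readable documents would be handed to the import jobs.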
data/Rakefile CHANGED
@@ -1,21 +1,10 @@
- # frozen_string_literal: true
-
- require 'simplecov'
- SimpleCov.start
+ # This file defines the Rake tasks for the Ragdoll gem, including tasks for testing.
 
- # Suppress bundler/rubygems warnings
- $VERBOSE = nil
+ # frozen_string_literal: true
 
  require "bundler/gem_tasks"
- require "rake/testtask"
-
- Rake::TestTask.new(:test) do |t|
-   t.libs << "test"
-   t.libs << "lib"
-   t.test_files = FileList["test/**/*_test.rb"]
- end
+ require "minitest/test_task"
 
- # Load annotate tasks
- Dir.glob("lib/tasks/*.rake").each { |r| load r }
+ Minitest::TestTask.create
 
  task default: :test
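The new Rakefile replaces the `Rake::TestTask` block with a bare `Minitest::TestTask.create`, and drops SimpleCov and the loading of `lib/tasks/*.rake`. Assuming minitest 5.16 or later (the release that introduced `minitest/test_task`), the removed settings can be written out explicitly. The sketch below registers a `test` task whose `libs` and `test_globs` mirror the old block; it illustrates the API and is not a change made in this release.

```ruby
require "rake"
require "minitest/test_task"

# Register a `test` Rake task equivalent to the removed Rake::TestTask block:
# put lib/ and test/ on the load path and run every test/**/*_test.rb file.
Minitest::TestTask.create(:test) do |t|
  t.libs = %w[lib test]
  t.test_globs = ["test/**/*_test.rb"]
end
```

The bare `create` call in the release relies on the task's defaults, which should already include `lib`, `test` and `*_test.rb` files; spelling them out just makes that assumption visible.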
data/app/models/ragdoll/document.rb ADDED
@@ -0,0 +1,9 @@
+ # This file defines the Document model for the Ragdoll gem.
+
+ # frozen_string_literal: true
+
+ module Ragdoll
+   class Document < ApplicationRecord
+     has_many :embeddings, dependent: :destroy
+   end
+ end
data/app/models/ragdoll/embedding.rb ADDED
@@ -0,0 +1,9 @@
+ # This file defines the Embedding model for the Ragdoll gem.
+
+ # frozen_string_literal: true
+
+ module Ragdoll
+   class Embedding < ApplicationRecord
+     belongs_to :document
+   end
+ end
data/config/initializers/ragdoll.rb ADDED
@@ -0,0 +1,6 @@
+ # frozen_string_literal: true
+
+ # Initializer for Ragdoll engine
+ Ragdoll.configure do |config|
+   # Set configuration options here
+ end
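The generated initializer calls `Ragdoll.configure` with a block, but `lib/ragdoll.rb` shrinks to six lines in this release and its body is not shown here, so this section cannot confirm where `configure` is defined. A common way to support such a call is a memoized settings object that is yielded to the block. In the sketch below, `RagdollSketch` stands in for `Ragdoll`, and the `Configuration` class, its attribute names (borrowed from the `llm:` keys in `ragdoll.yml`) and `reset_configuration!` are all hypothetical.

```ruby
# Hypothetical configure-block API; RagdollSketch stands in for the Ragdoll
# module, whose 0.1.0 implementation is not shown in this diff.
module RagdollSketch
  # Settings object yielded to the configure block. Attribute names mirror
  # the llm: keys in lib/config/ragdoll.yml, with defaults copied from there.
  class Configuration
    attr_accessor :embeddings_model, :reranking_model, :chat_model

    def initialize
      @embeddings_model = "llama-2-7b"
      @reranking_model = "llama-2-13b"
      @chat_model = "llama-2-70b"
    end
  end

  class << self
    # One Configuration instance per process, created on first access.
    def configuration
      @configuration ||= Configuration.new
    end

    # Yield the shared configuration so an initializer can override defaults.
    def configure
      yield configuration
      configuration
    end

    # Discard overrides, e.g. between tests.
    def reset_configuration!
      @configuration = nil
    end
  end
end
```

Memoizing the object means every caller reads the same settings after the initializer has run.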
data/config/routes.rb ADDED
@@ -0,0 +1,5 @@
+ # frozen_string_literal: true
+
+ Ragdoll::Engine.routes.draw do
+   # Define your engine routes here
+ end
data/db/migrate/20250218123456_create_documents.rb ADDED
@@ -0,0 +1,20 @@
+ # This migration creates the documents table with necessary extensions for PostgreSQL.
+
+ module Ragdoll
+   class CreateDocuments < ActiveRecord::Migration[7.0]
+     def change
+       enable_extension 'pg_trgm'
+       enable_extension 'fuzzystrmatch'
+
+       create_table :documents do |t|
+         t.string :location
+         t.string :summary
+         t.string :type
+         t.datetime :processing_started_at
+         t.datetime :processing_finished_at
+
+         t.timestamps
+       end
+     end
+   end
+ end
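The new migration enables PostgreSQL's `pg_trgm` and `fuzzystrmatch` extensions but not `vector` (pgvector), which the removed 0.0.2 README listed as required. `pg_trgm` scores string similarity by the trigrams (three-character sequences) two strings share. Per PostgreSQL's documentation, it lowercases the text, ignores non-alphanumeric characters, and pads each word with two leading spaces and one trailing space. The plain-Ruby sketch below approximates that `similarity()` calculation to show what the extension computes. It is an illustration, not a byte-for-byte port; in particular, PostgreSQL's idea of "alphanumeric" depends on the database locale.

```ruby
require "set"

# Approximate pg_trgm's trigram extraction: lowercase, split on
# non-alphanumeric characters, pad each word with two leading spaces and one
# trailing space, then collect every 3-character window as a set.
def trigrams(text)
  text.downcase.scan(/[[:alnum:]]+/).each_with_object(Set.new) do |word, set|
    padded = "  #{word} "
    (0..padded.length - 3).each { |i| set << padded[i, 3] }
  end
end

# Similarity in the style of pg_trgm's similarity(): shared trigrams
# divided by the number of distinct trigrams across both strings.
def trigram_similarity(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  union = ta | tb
  return 0.0 if union.empty?

  (ta & tb).size.to_f / union.size
end
```

For example, `"cat"` yields the trigrams `"  c"`, `" ca"`, `"cat"` and `"at "`, matching the example in the PostgreSQL manual.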
data/lib/config/database.yml ADDED
@@ -0,0 +1,28 @@
+ # This file contains the database configuration for the Ragdoll gem, using environment variables.
+
+ default: &default
+   adapter: postgresql
+   encoding: unicode
+   pool: <%= ENV.fetch("RAGDOLL_POOL", 5) %>
+   timeout: <%= ENV.fetch("RAGDOLL_TIMEOUT", 5000) %>
+
+ development:
+   <<: *default
+   host: <%= ENV.fetch("RAGDOLL_HOST", "localhost") %>
+   database: <%= ENV.fetch("RAGDOLL_DATABASE", "ragdoll_development") %>
+   username: <%= ENV.fetch("RAGDOLL_USER", "user") %>
+   password: <%= ENV.fetch("RAGDOLL_PASSWORD", "password") %>
+
+ test:
+   <<: *default
+   host: <%= ENV.fetch("RAGDOLL_HOST", "localhost") %>
+   database: <%= ENV.fetch("RAGDOLL_DATABASE", "ragdoll_test") %>
+   username: <%= ENV.fetch("RAGDOLL_USER", "user") %>
+   password: <%= ENV.fetch("RAGDOLL_PASSWORD", "password") %>
+
+ production:
+   <<: *default
+   host: <%= ENV.fetch("RAGDOLL_HOST") %>
+   database: <%= ENV.fetch("RAGDOLL_DATABASE") %>
+   username: <%= ENV.fetch("RAGDOLL_USER") %>
+   password: <%= ENV.fetch("RAGDOLL_PASSWORD") %>
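`database.yml` is ERB-templated YAML: the `<%= ... %>` tags are evaluated over the whole file before the YAML is parsed, and `<<: *default` copies the shared settings into each environment. Because ERB renders every section, the `production` calls to `ENV.fetch` without a default raise `KeyError` whenever, say, `RAGDOLL_HOST` is unset, even if the caller only wants `development`. The sketch below renders a trimmed copy with Ruby's standard `erb` and `yaml` libraries to make both behaviours concrete. `load_database_config` is a hypothetical helper; in a Rails app the framework performs this step for the host application's own `config/database.yml`.

```ruby
require "erb"
require "yaml"

# Trimmed copy of lib/config/database.yml (comment, credentials and the test
# section omitted).
TEMPLATE = <<~YAML
  default: &default
    adapter: postgresql
    encoding: unicode
    pool: <%= ENV.fetch("RAGDOLL_POOL", 5) %>
    timeout: <%= ENV.fetch("RAGDOLL_TIMEOUT", 5000) %>

  development:
    <<: *default
    host: <%= ENV.fetch("RAGDOLL_HOST", "localhost") %>
    database: <%= ENV.fetch("RAGDOLL_DATABASE", "ragdoll_development") %>

  production:
    <<: *default
    host: <%= ENV.fetch("RAGDOLL_HOST") %>
    database: <%= ENV.fetch("RAGDOLL_DATABASE") %>
YAML

# Render every ERB tag, then parse with aliases enabled so `<<: *default`
# merges work, and return one environment's settings.
def load_database_config(template, env_name)
  rendered = ERB.new(template).result
  YAML.safe_load(rendered, aliases: true).fetch(env_name)
end
```

Using `ENV["RAGDOLL_HOST"]` (which returns `nil`) instead of a bare `ENV.fetch` would defer the failure to connection time; which behaviour is preferable is a design choice the diff does not settle.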
data/lib/config/ragdoll.yml ADDED
@@ -0,0 +1,31 @@
+ # This file contains the default configuration settings for the Ragdoll gem, including database configurations.
+
+ default: &default
+   database:
+     host: localhost
+     database: ragdoll_development
+     user: user
+     password: password
+     pool: 5
+     timeout: 5000
+
+   llm:
+     embeddings_model: "llama-2-7b"
+     reranking_model: "llama-2-13b"
+     chat_model: "llama-2-70b"
+
+ development:
+   <<: *default
+
+ test:
+   <<: *default
+   database:
+     database: ragdoll_test
+
+ production:
+   <<: *default
+   database:
+     host: <%= ENV.fetch("RAGDOLL_HOST") %>
+     database: <%= ENV.fetch("RAGDOLL_DATABASE") %>
+     user: <%= ENV.fetch("RAGDOLL_USER") %>
+     password: <%= ENV.fetch("RAGDOLL_PASSWORD") %>
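In `ragdoll.yml`, `test` and `production` merge `*default` and then redefine the nested `database:` key. Assuming `database: ragdoll_test` sits inside `test`'s `database:` mapping (the nesting the keys imply), that redefinition replaces the whole mapping, because YAML merge keys are shallow. As parsed, `test` would keep only `database: ragdoll_test` and lose `host`, `user`, `password`, `pool` and `timeout`. Whether the gem compensates elsewhere is not visible in this section. The sketch parses a trimmed copy to show the effect, then rebuilds `test` with a hypothetical `deep_merge` helper (ActiveSupport's `Hash#deep_merge` serves the same purpose in Rails apps).

```ruby
require "yaml"

# Trimmed copy of lib/config/ragdoll.yml (llm section and production omitted).
CONFIG = <<~YAML
  default: &default
    database:
      host: localhost
      database: ragdoll_development
      pool: 5

  test:
    <<: *default
    database:
      database: ragdoll_test
YAML

# Recursively merge `override` into `base`; nested hashes are combined key
# by key instead of being replaced wholesale.
def deep_merge(base, override)
  base.merge(override) do |_key, base_value, override_value|
    if base_value.is_a?(Hash) && override_value.is_a?(Hash)
      deep_merge(base_value, override_value)
    else
      override_value
    end
  end
end

settings = YAML.safe_load(CONFIG, aliases: true)
# The merge key is shallow: test's database mapping replaced default's.
shallow_test = settings["test"]
# Rebuild test on top of default so the nested keys survive.
deep_test = deep_merge(settings["default"], settings["test"])
```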
data/lib/ragdoll/engine.rb ADDED
@@ -0,0 +1,16 @@
+ # This file defines the Ragdoll engine, which integrates the gem with Rails applications.
+
+ # frozen_string_literal: true
+
+ require "rails/engine"
+
+ module Ragdoll
+   class Engine < ::Rails::Engine
+     isolate_namespace Ragdoll
+     config.generators do |g|
+       g.test_framework :minitest
+       g.fixture_replacement :factory_bot
+       g.factory_bot dir: 'test/factories'
+     end
+   end
+ end
data/lib/ragdoll/import_job.rb ADDED
@@ -0,0 +1,15 @@
+ # This file defines the ImportJob class for handling document import tasks in the background.
+
+ # frozen_string_literal: true
+
+ module Ragdoll
+   class ImportJob < SolidJob::Base
+     def perform(file)
+       document = File.read(file)
+       ingestion = Ragdoll::Ingestion.new(document)
+       vectorized_chunks = ingestion.chunk_and_vectorize
+       ingestion.store_in_database
+       puts "Imported #{file} successfully."
+     end
+   end
+ end
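`ImportJob` subclasses `SolidJob::Base`, which is not a class that Rails or Solid Queue provides. Solid Queue is an Active Job backend, so the jobs it runs normally subclass `ActiveJob::Base` (often through an app's `ApplicationJob`). Unless something outside this diff defines `SolidJob`, loading this file would likely raise `NameError`. The job also assigns `vectorized_chunks` without using it. Separately, the README's `-j`/`--jobs` option caps how many imports run at once. Without depending on Active Job, that idea can be sketched with Ruby's `Thread` and `Queue`: a fixed number of workers drain a shared queue of paths. `run_imports` is a hypothetical helper, not the gem's scheduler.

```ruby
# Run `work` for each path using at most `jobs` concurrent worker threads and
# return { path => result }. Hypothetical sketch of a --jobs limit.
def run_imports(paths, jobs:, &work)
  queue = Queue.new
  paths.each { |path| queue << path }
  queue.close # pop returns nil once the closed queue is drained

  results = {}
  lock = Mutex.new
  workers = Array.new([jobs, 1].max) do
    Thread.new do
      while (path = queue.pop)
        result = work.call(path)
        lock.synchronize { results[path] = result }
      end
    end
  end
  workers.each(&:join) # re-raises any exception from a worker
  results
end
```

Closing the queue up front lets each worker exit cleanly when the queue is drained, without a sentinel value.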
data/lib/ragdoll/ingestion.rb ADDED
@@ -0,0 +1,30 @@
+ # This file contains the Ingestion class responsible for processing documents by chunking and vectorizing them.
+
+ # frozen_string_literal: true
+
+ module Ragdoll
+   class Ingestion
+     def initialize(document)
+       @document = document
+     end
+
+     def chunk_and_vectorize
+       # Example logic for chunking and vectorization
+       chunks = @document.split("\n\n") # Split document into paragraphs
+       vectorized_chunks = chunks.map { |chunk| vectorize(chunk) }
+       vectorized_chunks
+     end
+
+     def store_in_database
+       # Implement logic to store vectorized data in the database
+     end
+
+     private
+
+     def vectorize(chunk)
+       # Placeholder for vectorization logic
+       # Convert chunk to a vector representation
+       chunk.split.map(&:downcase) # Simple example: split words and downcase
+     end
+   end
+ end
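`Ingestion#vectorize` returns lowercased word lists, which are tokens rather than vectors, and `store_in_database` is an empty stub. `ragdoll.yml` names an `embeddings_model` that would presumably fill this role. To show what a fixed-length vector per paragraph could look like without calling any model, the sketch below substitutes feature hashing: each token is hashed into one of `dims` buckets with a stable CRC32, and the counts are scaled to unit length so chunks can be compared by dot product. This is a stand-in technique, not an embedding model and not what the gem does. `paragraph_chunks`, `hashed_vector`, `vectorize_paragraphs` and the 64-bucket default are assumptions.

```ruby
require "zlib"

# Split on blank lines (the gem splits on "\n\n") and drop empty chunks.
def paragraph_chunks(text)
  text.split(/\n{2,}/).map(&:strip).reject(&:empty?)
end

# Feature hashing: map each lowercased token to one of `dims` buckets via a
# stable CRC32 hash, count occurrences, then scale to unit (L2) length.
def hashed_vector(chunk, dims: 64)
  vector = Array.new(dims, 0.0)
  chunk.downcase.scan(/[[:alnum:]]+/).each do |token|
    vector[Zlib.crc32(token) % dims] += 1.0
  end
  norm = Math.sqrt(vector.sum { |x| x * x })
  norm.zero? ? vector : vector.map { |x| x / norm }
end

# Pair each paragraph with its hashed vector, ready to be stored.
def vectorize_paragraphs(text, dims: 64)
  paragraph_chunks(text).map { |chunk| { text: chunk, vector: hashed_vector(chunk, dims: dims) } }
end
```

CRC32 is used instead of `String#hash` because Ruby randomizes `String#hash` per process, which would make stored vectors incomparable across runs.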
data/lib/ragdoll/search.rb ADDED
@@ -0,0 +1,18 @@
+ # This file contains the Search class responsible for querying the database with a prompt.
+
+ # frozen_string_literal: true
+
+ module Ragdoll
+   class Search
+     def initialize(prompt)
+       @prompt = prompt
+     end
+
+     def search_database(max_count)
+       # Example logic for searching the database
+       # This is a placeholder for actual database search logic
+       results = [] # Placeholder for actual database query results
+       results.select { |entry| entry.include?(@prompt) }
+     end
+   end
+ end
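`Search#search_database` filters a literal empty array, so it returns `[]` for every prompt and never uses `max_count`. Vector search usually ranks stored chunks by cosine similarity to the query vector and keeps the top `max_count`; the README's `--rerank` option describes a keyword-based second pass. The sketch below shows both steps over in-memory entries, assuming query and document vectors come from the same function. `top_matches`, the `:text`/`:vector` keys, the 3x candidate pool and the 0.8/0.2 blend of cosine and keyword-overlap scores are illustrative assumptions, not the gem's behaviour.

```ruby
# Cosine similarity of two equal-length numeric vectors (0.0 if either is zero).
def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norms = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  norms.zero? ? 0.0 : dot / norms
end

# Fraction of the prompt's distinct words that also appear in the text.
def keyword_overlap(prompt, text)
  wanted = prompt.downcase.scan(/[[:alnum:]]+/).uniq
  return 0.0 if wanted.empty?

  present = text.downcase.scan(/[[:alnum:]]+/)
  wanted.count { |word| present.include?(word) }.to_f / wanted.size
end

# Rank entries ({ text:, vector: }) by cosine similarity to query_vector and
# return the best max_count, each with a :score added.
def top_matches(entries, query_vector, prompt:, max_count: 10, rerank: false)
  scored = entries.map { |e| e.merge(score: cosine(query_vector, e[:vector])) }
  # First pass: vector similarity only (a wider candidate pool when reranking).
  pool = scored.max_by(rerank ? max_count * 3 : max_count) { |e| e[:score] }
  return pool unless rerank

  # Second pass: blend in keyword overlap and re-sort the candidates.
  pool.map { |e| e.merge(score: 0.8 * e[:score] + 0.2 * keyword_overlap(prompt, e[:text])) }
      .max_by(max_count) { |e| e[:score] }
end
```

Reranking only a widened candidate pool keeps the keyword pass cheap while still letting it reorder near-ties from the vector pass.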
data/lib/ragdoll/version.rb ADDED
@@ -0,0 +1,7 @@
+ # This file defines the version number for the Ragdoll gem.
+
+ # frozen_string_literal: true
+
+ module Ragdoll
+   VERSION = "0.1.0"
+ end
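This release bumps `VERSION` from 0.0.2 to 0.1.0 and, per the file list, deletes the `lib/ragdoll/core/` files behind the `Ragdoll::Core` API that the 0.0.2 README documented, so code written against 0.0.2 is unlikely to work unchanged. RubyGems' pessimistic operator lets an application stay on the 0.0.x line. The check below uses `Gem::Version` and `Gem::Requirement` (part of RubyGems, which Ruby loads by default) to show which versions `~> 0.0.2` accepts.

```ruby
require "rubygems"

# `~> 0.0.2` means ">= 0.0.2 and < 0.1", so it excludes this release.
pin = Gem::Requirement.new("~> 0.0.2")
old_version = Gem::Version.new("0.0.2")
new_version = Gem::Version.new("0.1.0")

puts pin.satisfied_by?(old_version) # true
puts pin.satisfied_by?(new_version) # false
puts new_version > old_version      # true
```

In a Gemfile the same constraint reads `gem "ragdoll", "~> 0.0.2"`.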