universal_document_processor 1.0.1 → 1.0.3

data/AI_USAGE_GUIDE.md DELETED
@@ -1,404 +0,0 @@
1
- # 🤖 Universal Document Processor - AI Agent Usage Guide
2
-
3
- ## Overview
4
-
5
- The Universal Document Processor gem includes powerful AI-powered document analysis capabilities through its built-in **Agentic AI** features. Once you've installed the gem, you can leverage AI to analyze, summarize, extract information, and interact with your documents intelligently.
6
-
7
- ## 🚀 Quick Setup
8
-
9
- ### 1. Install the Gem
10
-
11
- ```bash
12
- gem install universal_document_processor
13
- ```
14
-
15
- ### 2. Set Up Your OpenAI API Key
16
-
17
- ```bash
18
- # Set environment variable
19
- export OPENAI_API_KEY="your-openai-api-key-here"
20
- ```
21
-
22
- Or pass it directly in your code:
23
-
24
- ```ruby
25
- options = { api_key: 'your-openai-api-key-here' }
26
- ```
27
-
28
- ### 3. Basic AI Usage
29
-
30
- ```ruby
31
- require 'universal_document_processor'
32
-
33
- # Basic AI analysis
34
- result = UniversalDocumentProcessor.ai_analyze('document.pdf')
35
- puts result
36
- ```
37
-
38
- ## 🧠 AI Features Overview
39
-
40
- ### Available AI Methods
41
-
42
- 1. **`ai_analyze`** - Comprehensive document analysis
43
- 2. **`ai_summarize`** - Generate summaries of different lengths
44
- 3. **`ai_extract_info`** - Extract specific information categories
45
- 4. **`ai_translate`** - Translate document content
46
- 5. **`ai_classify`** - Classify document type and purpose
47
- 6. **`ai_insights`** - Generate insights and recommendations
48
- 7. **`ai_action_items`** - Extract actionable items
49
- 8. **`ai_compare`** - Compare multiple documents
50
- 9. **`ai_chat`** - Interactive chat about documents
51
-
52
- ## 📝 Detailed Usage Examples
53
-
54
- ### 1. Document Analysis
55
-
56
- #### General Analysis
57
- ```ruby
58
- # Analyze any document comprehensively
59
- analysis = UniversalDocumentProcessor.ai_analyze('report.pdf')
60
- puts analysis
61
- ```
62
-
63
- #### Specific Query Analysis
64
- ```ruby
65
- # Ask specific questions about the document
66
- analysis = UniversalDocumentProcessor.ai_analyze('contract.pdf', {
67
- query: "What are the key terms and conditions?"
68
- })
69
- puts analysis
70
- ```
71
-
72
- ### 2. Document Summarization
73
-
74
- ```ruby
75
- # Short summary (2-3 sentences)
76
- summary = UniversalDocumentProcessor.ai_summarize('document.pdf', length: :short)
77
-
78
- # Medium summary (1-2 paragraphs) - default
79
- summary = UniversalDocumentProcessor.ai_summarize('document.pdf', length: :medium)
80
-
81
- # Detailed summary
82
- summary = UniversalDocumentProcessor.ai_summarize('document.pdf', length: :long)
83
-
84
- puts summary
85
- ```
86
-
87
- ### 3. Information Extraction
88
-
89
- ```ruby
90
- # Extract default categories
91
- info = UniversalDocumentProcessor.ai_extract_info('meeting_notes.pdf')
92
-
93
- # Extract specific categories
94
- info = UniversalDocumentProcessor.ai_extract_info('contract.pdf', [
95
- 'parties', 'dates', 'financial_terms', 'obligations', 'deadlines'
96
- ])
97
-
98
- puts info
99
- ```
100
-
101
- ### 4. Document Translation
102
-
103
- ```ruby
104
- # Translate to different languages
105
- spanish_content = UniversalDocumentProcessor.ai_translate('document.pdf', 'Spanish')
106
- japanese_content = UniversalDocumentProcessor.ai_translate('document.pdf', 'Japanese')
107
- french_content = UniversalDocumentProcessor.ai_translate('document.pdf', 'French')
108
-
109
- puts spanish_content
110
- ```
111
-
112
- ### 5. Document Classification
113
-
114
- ```ruby
115
- # Classify document type and purpose
116
- classification = UniversalDocumentProcessor.ai_classify('unknown_document.pdf')
117
-
118
- # Returns structured information about document type
119
- puts classification
120
- ```
121
-
122
- ### 6. Generate Insights
123
-
124
- ```ruby
125
- # Get AI-powered insights and recommendations
126
- insights = UniversalDocumentProcessor.ai_insights('business_plan.pdf')
127
-
128
- # Returns analysis of key themes, recommendations, etc.
129
- puts insights
130
- ```
131
-
132
- ### 7. Extract Action Items
133
-
134
- ```ruby
135
- # Extract actionable items from documents
136
- action_items = UniversalDocumentProcessor.ai_action_items('meeting_minutes.pdf')
137
-
138
- # Returns structured list of tasks, deadlines, assignments
139
- puts action_items
140
- ```
141
-
142
- ### 8. Compare Documents
143
-
144
- ```ruby
145
- # Compare multiple documents
146
- comparison = UniversalDocumentProcessor.ai_compare([
147
- 'version1.pdf',
148
- 'version2.pdf',
149
- 'version3.pdf'
150
- ], :content)
151
-
152
- puts comparison
153
- ```
154
-
155
- ## 🎯 Advanced Usage with Document Objects
156
-
157
- ### Using Document Objects for More Control
158
-
159
- ```ruby
160
- # Create document object for advanced operations
161
- doc = UniversalDocumentProcessor::Document.new('complex_document.pdf')
162
-
163
- # Use AI methods on the document object
164
- summary = doc.ai_summarize(length: :medium)
165
- insights = doc.ai_insights
166
- action_items = doc.ai_action_items
167
-
168
- # Interactive chat about the document
169
- response = doc.ai_chat("What are the main risks mentioned in this document?")
170
- puts response
171
- ```
172
-
173
- ### Creating and Reusing AI Agent
174
-
175
- ```ruby
176
- # Create an AI agent with custom configuration
177
- ai_agent = UniversalDocumentProcessor.create_ai_agent({
178
- model: 'gpt-4',
179
- temperature: 0.7,
180
- api_key: 'your-api-key'
181
- })
182
-
183
- # Process document
184
- doc_result = UniversalDocumentProcessor.process('document.pdf')
185
-
186
- # Use AI agent for multiple operations
187
- summary = ai_agent.summarize_document(doc_result, length: :short)
188
- insights = ai_agent.generate_insights(doc_result)
189
- classification = ai_agent.classify_document(doc_result)
190
-
191
- # Interactive chat
192
- response = ai_agent.chat("Tell me about the financial projections", doc_result)
193
- ```
194
-
195
- ## 🛠️ Configuration Options
196
-
197
- ### AI Agent Configuration
198
-
199
- ```ruby
200
- options = {
201
- api_key: 'your-openai-api-key', # OpenAI API key
202
- model: 'gpt-4', # AI model to use
203
- temperature: 0.7, # Response creativity (0.0-1.0)
204
- max_history: 10, # Conversation history limit
205
- base_url: 'https://api.openai.com/v1' # API endpoint
206
- }
207
-
208
- # Use with any AI method
209
- result = UniversalDocumentProcessor.ai_analyze('document.pdf', options)
210
- ```
211
-
212
- ## 💡 Use Case Examples
213
-
214
- ### 1. Legal Document Analysis
215
-
216
- ```ruby
217
- # Analyze legal contracts
218
- contract_analysis = UniversalDocumentProcessor.ai_analyze('contract.pdf', {
219
- query: "Extract all key terms, obligations, and potential risks"
220
- })
221
-
222
- # Extract specific legal information
223
- legal_info = UniversalDocumentProcessor.ai_extract_info('contract.pdf', [
224
- 'parties', 'effective_date', 'termination_clauses', 'payment_terms', 'liabilities'
225
- ])
226
- ```
227
-
228
- ### 2. Business Report Processing
229
-
230
- ```ruby
231
- # Summarize quarterly reports
232
- summary = UniversalDocumentProcessor.ai_summarize('q4_report.pdf', length: :medium)
233
-
234
- # Extract key business metrics
235
- metrics = UniversalDocumentProcessor.ai_extract_info('q4_report.pdf', [
236
- 'revenue', 'expenses', 'profit_margins', 'growth_metrics', 'forecasts'
237
- ])
238
-
239
- # Get strategic insights
240
- insights = UniversalDocumentProcessor.ai_insights('q4_report.pdf')
241
- ```
242
-
243
- ### 3. Meeting Minutes Processing
244
-
245
- ```ruby
246
- # Extract action items from meeting notes
247
- action_items = UniversalDocumentProcessor.ai_action_items('meeting_notes.pdf')
248
-
249
- # Summarize meeting outcomes
250
- summary = UniversalDocumentProcessor.ai_summarize('meeting_notes.pdf', length: :short)
251
-
252
- # Extract key decisions and follow-ups
253
- decisions = UniversalDocumentProcessor.ai_extract_info('meeting_notes.pdf', [
254
- 'decisions_made', 'action_items', 'deadlines', 'assigned_people'
255
- ])
256
- ```
257
-
258
- ### 4. Research Paper Analysis
259
-
260
- ```ruby
261
- # Analyze research papers
262
- analysis = UniversalDocumentProcessor.ai_analyze('research_paper.pdf', {
263
- query: "What are the main findings and methodology used?"
264
- })
265
-
266
- # Extract research data
267
- research_info = UniversalDocumentProcessor.ai_extract_info('research_paper.pdf', [
268
- 'hypothesis', 'methodology', 'results', 'conclusions', 'future_work'
269
- ])
270
- ```
271
-
272
- ## 🔄 Interactive Document Chat
273
-
274
- ```ruby
275
- # Create document object
276
- doc = UniversalDocumentProcessor::Document.new('document.pdf')
277
-
278
- # Start interactive chat session
279
- puts "Chat with your document (type 'exit' to quit):"
280
-
281
- loop do
282
- print "> "
283
- user_input = gets.chomp
284
- break if user_input.downcase == 'exit'
285
-
286
- response = doc.ai_chat(user_input)
287
- puts "AI: #{response}\n\n"
288
- end
289
- ```
290
-
291
- ## 📊 Batch AI Processing
292
-
293
- ```ruby
294
- # Process multiple documents with AI
295
- documents = ['doc1.pdf', 'doc2.docx', 'doc3.xlsx']
296
-
297
- # Batch summarization
298
- summaries = documents.map do |file|
299
- {
300
- file: file,
301
- summary: UniversalDocumentProcessor.ai_summarize(file, length: :short)
302
- }
303
- end
304
-
305
- # Batch classification
306
- classifications = documents.map do |file|
307
- {
308
- file: file,
309
- classification: UniversalDocumentProcessor.ai_classify(file)
310
- }
311
- end
312
- ```
313
-
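In the batch loops above, one unreadable file would abort the whole run. The following is a minimal sketch of per-file error isolation. `process_batch` is a hypothetical helper, not part of the gem, and the block passed to it is a stand-in for a real call such as `UniversalDocumentProcessor.ai_summarize(file, length: :short)`.

```ruby
# Hypothetical helper, not part of the gem: runs the block once per file and
# records either the result or the error, so one bad file does not stop the batch.
def process_batch(files)
  files.map do |file|
    begin
      { file: file, ok: true, result: yield(file) }
    rescue StandardError => e
      { file: file, ok: false, error: "#{e.class}: #{e.message}" }
    end
  end
end

# Stand-in block; real code would call one of the gem's AI methods here.
results = process_batch(['doc1.pdf', 'missing.pdf']) do |file|
  raise ArgumentError, 'file not found' if file.start_with?('missing')

  "summary of #{file}"
end

failed = results.reject { |r| r[:ok] }
puts "#{results.size - failed.size} succeeded, #{failed.size} failed"
```

Collecting failures instead of raising lets the caller decide afterwards whether to retry or report the files that failed.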
314
- ## 🚨 Error Handling
315
-
316
- ```ruby
317
- begin
318
- result = UniversalDocumentProcessor.ai_analyze('document.pdf')
319
- puts result
320
- rescue ArgumentError => e
321
- puts "Configuration error: #{e.message}"
322
- puts "Please check your OpenAI API key"
323
- rescue UniversalDocumentProcessor::ProcessingError => e
324
- puts "Processing error: #{e.message}"
325
- rescue StandardError => e
326
- puts "Unexpected error: #{e.message}"
327
- end
328
- ```
329
-
330
- ## 🎛️ Environment Variables
331
-
332
- Set these environment variables for seamless operation:
333
-
334
- ```bash
335
- # Required
336
- export OPENAI_API_KEY="your-openai-api-key"
337
-
338
- # Optional
339
- export OPENAI_MODEL="gpt-4"
340
- export OPENAI_TEMPERATURE="0.7"
341
- export OPENAI_BASE_URL="https://api.openai.com/v1"
342
- ```
343
-
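If you prefer to pass options explicitly, the variables above can be mapped onto the options hash from "AI Agent Configuration". This is a sketch only: `ai_options_from_env` is a hypothetical helper, and the defaults are the values shown in this guide rather than confirmed gem behaviour.

```ruby
# Hypothetical helper: builds the options hash from environment variables,
# falling back to the defaults listed in this guide.
def ai_options_from_env(env = ENV)
  api_key = env['OPENAI_API_KEY'].to_s.strip
  raise ArgumentError, 'OPENAI_API_KEY is required' if api_key.empty?

  {
    api_key: api_key,
    model: env.fetch('OPENAI_MODEL', 'gpt-4'),
    temperature: Float(env.fetch('OPENAI_TEMPERATURE', '0.7')),
    base_url: env.fetch('OPENAI_BASE_URL', 'https://api.openai.com/v1')
  }
end

# A plain Hash stands in for ENV so the example runs without real credentials.
options = ai_options_from_env({ 'OPENAI_API_KEY' => 'test-key', 'OPENAI_TEMPERATURE' => '0.2' })
puts options[:model]
puts options[:temperature]
```

Using `Float(...)` rather than `to_f` makes a malformed `OPENAI_TEMPERATURE` fail loudly instead of silently becoming `0.0`.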
344
- ## 🔧 Troubleshooting
345
-
346
- ### Common Issues and Solutions
347
-
348
- 1. **Missing API Key**
349
- ```ruby
350
- # Error: ArgumentError: OpenAI API key is required
351
- # Solution: Set OPENAI_API_KEY environment variable or pass api_key in options
352
- ```
353
-
354
- 2. **API Rate Limits**
355
- ```ruby
356
- # Add delays between requests for large batch operations
357
- documents.each_with_index do |doc, index|
358
- result = UniversalDocumentProcessor.ai_analyze(doc)
359
- sleep(1) if (index + 1) % 10 == 0 # Pause after every 10 requests
360
- end
361
- ```
362
-
363
- 3. **Large Documents**
364
- ```ruby
365
- # For very large documents, consider processing in chunks
366
- options = { max_content_length: 10000 }
367
- result = UniversalDocumentProcessor.ai_analyze('large_doc.pdf', options)
368
- ```
369
-
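A fixed pause helps with rate limits, but retrying with exponential backoff usually recovers more gracefully. A minimal sketch follows, assuming a hypothetical `with_retries` helper that is not part of the gem. It rescues `StandardError` for illustration; in real code you would narrow this to whichever error class your client raises for rate limiting.

```ruby
# Hypothetical helper: retries the block with exponential backoff.
# `sleeper` is injectable so delays can be recorded or skipped in tests.
def with_retries(max_attempts: 3, base_delay: 1.0, sleeper: ->(s) { sleep(s) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue StandardError
    raise if attempt >= max_attempts

    sleeper.call(base_delay * (2**(attempt - 1))) # 1s, 2s, 4s, ...
    retry
  end
end

# Stand-in for an API call that fails twice before succeeding.
delays = []
result = with_retries(sleeper: ->(s) { delays << s }) do |attempt|
  raise 'rate limited' if attempt < 3

  "ok on attempt #{attempt}"
end
puts result
```

Once `max_attempts` is reached the last error is re-raised, so persistent failures still surface to the caller.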
370
- ## 📚 Best Practices
371
-
372
- 1. **Optimize API Usage**
373
- - Cache results for repeated analysis
374
- - Use appropriate summary lengths
375
- - Batch similar operations
376
-
377
- 2. **Security**
378
- - Store API keys securely
379
- - Don't log sensitive document content
380
- - Use environment variables for configuration
381
-
382
- 3. **Performance**
383
- - Process documents in parallel when possible
384
- - Use specific queries rather than general analysis
385
- - Consider document size when choosing AI operations
386
-
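The "cache results" advice can be sketched with a small in-memory cache keyed by the operation and a digest of the content. `AnalysisCache` is a hypothetical class for illustration, not something the gem provides, and the block stands in for an expensive AI request.

```ruby
require 'digest'

# Hypothetical in-memory cache keyed by operation name and content digest,
# so repeated analysis of unchanged content does not trigger a second call.
class AnalysisCache
  def initialize
    @store = {}
  end

  def fetch(operation, content)
    key = "#{operation}:#{Digest::SHA256.hexdigest(content)}"
    return @store[key] if @store.key?(key)

    @store[key] = yield
  end
end

cache = AnalysisCache.new
calls = 0
2.times do
  cache.fetch(:summarize, 'same document text') do
    calls += 1 # stand-in for an expensive AI request
    'cached summary'
  end
end
puts calls
```

Keying on a content digest rather than the file name means an edited file is analysed again, while a renamed copy is not.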
387
- ## 🎯 Next Steps
388
-
389
- 1. **Explore Advanced Features**: Try different AI models and temperature settings
390
- 2. **Integrate with Your Application**: Build AI-powered document workflows
391
- 3. **Customize for Your Domain**: Create domain-specific extraction categories
392
- 4. **Scale Your Usage**: Implement batch processing for large document sets
393
-
394
- ## 📞 Support
395
-
396
- For issues with AI functionality:
397
- 1. Check your OpenAI API key and credits
398
- 2. Verify document format compatibility
399
- 3. Review error messages for specific guidance
400
- 4. Consult the main gem documentation for additional features
401
-
402
- ---
403
-
404
- *This guide covers the AI capabilities of the Universal Document Processor gem. The AI features require an OpenAI API key and internet connection to function.*
data/GEM_RELEASE_GUIDE.md DELETED
@@ -1,288 +0,0 @@
1
- # 🚀 Universal Document Processor - Gem Release Guide
2
-
3
- ## Overview
4
-
5
- This guide will walk you through the complete process of releasing your Universal Document Processor gem through GitHub and publishing it to RubyGems.
6
-
7
- ## 📋 Prerequisites
8
-
9
- 1. **GitHub Account**: Make sure you have a GitHub account
10
- 2. **RubyGems Account**: Create an account at [rubygems.org](https://rubygems.org)
11
- 3. **Git**: Ensure Git is installed and configured
12
- 4. **Ruby**: Ruby 2.7+ installed
13
- 5. **Bundler**: Latest version of Bundler
14
-
15
- ## 🛠️ Step-by-Step Release Process
16
-
17
- ### Step 1: Prepare Your Local Repository
18
-
19
- ```bash
20
- # Navigate to your gem directory
21
- cd universal_document_processor
22
-
23
- # Check current status
24
- git status
25
-
26
- # Add all files to git
27
- git add .
28
-
29
- # Commit your changes
30
- git commit -m "Initial gem setup with AI features"
31
-
32
- # Check your remote origin (should point to GitHub)
33
- git remote -v
34
- ```
35
-
36
- ### Step 2: Create GitHub Repository
37
-
38
- 1. **Go to GitHub**: Visit [github.com](https://github.com)
39
- 2. **Create New Repository**:
40
- - Repository name: `universal_document_processor`
41
- - Description: "Universal document processor with AI capabilities for all file formats"
42
- - Make it **Public** (recommended so users can read the source; RubyGems itself does not require a public repository)
43
- - Don't initialize with README (you already have one)
44
-
45
- 3. **Add GitHub as remote origin**:
46
- ```bash
47
- # Replace YOUR_USERNAME with your actual GitHub username
48
- git remote add origin https://github.com/YOUR_USERNAME/universal_document_processor.git
49
-
50
- # Or if you already have origin set, update it:
51
- git remote set-url origin https://github.com/YOUR_USERNAME/universal_document_processor.git
52
- ```
53
-
54
- ### Step 3: Push to GitHub
55
-
56
- ```bash
57
- # Push to GitHub
58
- git branch -M main
59
- git push -u origin main
60
- ```
61
-
62
- ### Step 4: Set Up RubyGems Account
63
-
64
- 1. **Create RubyGems Account**: Visit [rubygems.org](https://rubygems.org) and sign up
65
- 2. **Get API Key**:
66
- - Go to your profile → "Edit Profile" → "API Keys"
67
- - Create a new API key with appropriate permissions
68
- 3. **Configure local gem credentials**:
69
- ```bash
70
- # Prompts for your RubyGems.org credentials and stores an API key locally
71
- gem signin
72
- ```
73
-
74
- ### Step 5: Build and Test Your Gem Locally
75
-
76
- ```bash
77
- # Install dependencies
78
- bundle install
79
-
80
- # Build the gem
81
- gem build universal_document_processor.gemspec
82
-
83
- # Test installation locally
84
- gem install ./universal_document_processor-1.0.0.gem
85
-
86
- # Test that it works
87
- ruby -e "require 'universal_document_processor'; puts 'Gem loaded successfully!'"
88
- ```
89
-
90
- ### Step 6: Publish to RubyGems
91
-
92
- ```bash
93
- # Push the gem to RubyGems
94
- gem push universal_document_processor-1.0.0.gem
95
- ```
96
-
97
- If successful, you'll see:
98
- ```
99
- Pushing gem to https://rubygems.org...
100
- Successfully registered gem: universal_document_processor (1.0.0)
101
- ```
102
-
103
- ### Step 7: Create GitHub Release
104
-
105
- 1. **Go to your GitHub repository**
106
- 2. **Click "Releases"** → **"Create a new release"**
107
- 3. **Fill in the details**:
108
- - **Tag version**: `v1.0.0`
109
- - **Release title**: `Universal Document Processor v1.0.0`
110
- - **Description**: Copy from your CHANGELOG.md
111
- 4. **Publish release**
112
-
113
- ### Step 8: Update Repository Links
114
-
115
- Update your gemspec file with the correct GitHub URLs:
116
-
117
- ```ruby
118
- # In universal_document_processor.gemspec
119
- spec.homepage = "https://github.com/YOUR_USERNAME/universal_document_processor"
120
- spec.metadata = {
121
- "homepage_uri" => "https://github.com/YOUR_USERNAME/universal_document_processor",
122
- "source_code_uri" => "https://github.com/YOUR_USERNAME/universal_document_processor",
123
- "bug_tracker_uri" => "https://github.com/YOUR_USERNAME/universal_document_processor/issues",
124
- # ... other metadata
125
- }
126
- ```
127
-
128
- ## 🔄 Future Updates and Versioning
129
-
130
- ### Semantic Versioning
131
-
132
- Follow [Semantic Versioning](https://semver.org/):
133
- - **MAJOR** (1.0.0 → 2.0.0): Breaking changes
134
- - **MINOR** (1.0.0 → 1.1.0): New features, backward compatible
135
- - **PATCH** (1.0.0 → 1.0.1): Bug fixes, backward compatible
136
-
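The three bump rules can be expressed in a few lines. This is an illustrative sketch; `bump_version` is a hypothetical helper and handles only plain `MAJOR.MINOR.PATCH` strings, not pre-release suffixes.

```ruby
# Hypothetical helper: bumps a MAJOR.MINOR.PATCH version string.
def bump_version(version, level)
  major, minor, patch = version.split('.').map { |part| Integer(part, 10) }
  case level
  when :major then "#{major + 1}.0.0"
  when :minor then "#{major}.#{minor + 1}.0"
  when :patch then "#{major}.#{minor}.#{patch + 1}"
  else raise ArgumentError, "unknown level: #{level.inspect}"
  end
end

puts bump_version('1.0.0', :patch)
puts bump_version('1.0.1', :minor)
puts bump_version('1.1.0', :major)
```

Note that a MINOR or MAJOR bump resets the lower segments to zero.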
137
- ### Release Process for Updates
138
-
139
- 1. **Update version** in `lib/universal_document_processor/version.rb`:
140
- ```ruby
141
- module UniversalDocumentProcessor
142
- VERSION = "1.1.0"
143
- end
144
- ```
145
-
146
- 2. **Update CHANGELOG.md** with new changes
147
-
148
- 3. **Commit and tag**:
149
- ```bash
150
- git add .
151
- git commit -m "Release v1.1.0"
152
- git tag v1.1.0
153
- git push origin main --tags
154
- ```
155
-
156
- 4. **Build and publish**:
157
- ```bash
158
- gem build universal_document_processor.gemspec
159
- gem push universal_document_processor-1.1.0.gem
160
- ```
161
-
162
- 5. **Create GitHub Release** for the new version
163
-
164
- ## 🛡️ Security Best Practices
165
-
166
- ### 1. Enable MFA on RubyGems
167
- ```bash
168
- # MFA is enabled in your rubygems.org account settings; afterwards, pass the one-time code when pushing
169
- gem push universal_document_processor-1.0.0.gem --otp 123456
170
- ```
171
-
172
- ### 2. Secure API Keys
173
- - Never commit API keys to the repository
174
- - Use environment variables for sensitive data
175
- - Add `.env` files to `.gitignore`
176
-
177
- ### 3. Gem Signing (Optional but Recommended)
178
- ```bash
179
- # Create a self-signed certificate
180
- gem cert --build your@email.com
181
-
182
- # Sign your gem: set spec.cert_chain and spec.signing_key in the gemspec, then build as usual
183
- gem build universal_document_processor.gemspec
184
- ```
185
-
186
- ## 📊 GitHub Actions for Automated Testing (Optional)
187
-
188
- Create `.github/workflows/ci.yml`:
189
-
190
- ```yaml
191
- name: CI
192
-
193
- on:
194
-   push:
195
-     branches: [ main ]
196
-   pull_request:
197
-     branches: [ main ]
198
-
199
- jobs:
200
-   test:
201
-     runs-on: ubuntu-latest
202
-     strategy:
203
-       matrix:
204
-         ruby-version: ['2.7', '3.0', '3.1', '3.2']
205
-
206
-     steps:
207
-     - uses: actions/checkout@v3
208
-
209
-     - name: Set up Ruby
210
-       uses: ruby/setup-ruby@v1
211
-       with:
212
-         ruby-version: ${{ matrix.ruby-version }}
213
-         bundler-cache: true
214
-
215
-     - name: Run tests
216
-       run: bundle exec rspec
217
-
218
-     - name: Run rubocop
219
-       run: bundle exec rubocop
220
- ```
221
-
222
- ## 🎯 Post-Release Checklist
223
-
224
- - [ ] Gem is available on [rubygems.org](https://rubygems.org)
225
- - [ ] GitHub repository is public and accessible
226
- - [ ] README.md is comprehensive and up-to-date
227
- - [ ] CHANGELOG.md reflects all changes
228
- - [ ] License file is present
229
- - [ ] GitHub release is created with proper tags
230
- - [ ] Links in gemspec point to correct repositories
231
- - [ ] Documentation is clear for users
232
-
233
- ## 📈 Promotion and Marketing
234
-
235
- ### 1. Announce Your Gem
236
- - Write a blog post about your gem
237
- - Share on social media (Twitter, LinkedIn)
238
- - Post in Ruby communities and forums
239
- - Submit to Ruby newsletter curators
240
-
241
- ### 2. Documentation
242
- - Create detailed documentation using YARD
243
- - Add code examples and tutorials
244
- - Create video demonstrations
245
-
246
- ### 3. Community Engagement
247
- - Respond to issues and pull requests promptly
248
- - Maintain active development
249
- - Gather user feedback and iterate
250
-
251
- ## 🆘 Troubleshooting
252
-
253
- ### Common Issues
254
-
255
- 1. **"Permission denied" when pushing to RubyGems**
256
- - Check your API credentials
257
- - Ensure you have push permissions
258
- - Verify gem name isn't already taken
259
-
260
- 2. **Git push rejected**
261
- - Pull latest changes: `git pull origin main`
262
- - Resolve any conflicts
263
- - Try push again
264
-
265
- 3. **Gem build fails**
266
- - Check gemspec syntax
267
- - Ensure all required files are present
268
- - Verify Ruby version compatibility
269
-
270
- 4. **GitHub repository access issues**
271
- - Check repository visibility (should be public)
272
- - Verify SSH keys or access tokens
273
- - Ensure correct remote URL
274
-
275
- ## 📞 Support
276
-
277
- If you encounter issues during the release process:
278
-
279
- 1. **Check the logs**: Most tools provide detailed error messages
280
- 2. **GitHub Documentation**: [GitHub Docs](https://docs.github.com)
281
- 3. **RubyGems Guides**: [RubyGems.org Guides](https://guides.rubygems.org)
282
- 4. **Ruby Community**: Stack Overflow, Reddit r/ruby
283
-
284
- ---
285
-
286
- **Congratulations! Your gem is now live and available to the Ruby community! 🎉**
287
-
288
- Remember to maintain your gem regularly, respond to user feedback, and keep it updated with new features and bug fixes.