RubyGems - legal_summariser - Versions diffs - 0.1.0 → 0.3.0 - Mend

legal_summariser 0.1.0 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14)

checksums.yaml +4 -4
data/CHANGELOG.md +48 -0
data/CONTRIBUTING.md +231 -0
data/examples/advanced_configuration.rb +195 -0
data/examples/basic_usage.rb +101 -0
data/examples/batch_processing.rb +123 -0
data/exe/legal_summariser +131 -1
data/lib/legal_summariser/cache.rb +81 -0
data/lib/legal_summariser/configuration.rb +43 -0
data/lib/legal_summariser/performance_monitor.rb +108 -0
data/lib/legal_summariser/text_extractor.rb +125 -7
data/lib/legal_summariser/version.rb +1 -1
data/lib/legal_summariser.rb +205 -44
metadata +8 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 2a32e0da3e5422be003d79a333a6f3ea9417fadcc362164e3cef9cae0d84dafb
-  data.tar.gz: 3219d6167c936a2f056f43b5e2491bc4c67a697ef9d72169c7741be03f5a2726
+  metadata.gz: 71b638897796c0db2653eefb9456f1fdc821c3c91b3320f6aecce4430e750019
+  data.tar.gz: 1d915044c3946c8a34656af00d22d9a9d91d06f0258b74fbf870164f72b1104f
 SHA512:
-  metadata.gz: 9481e9eb32d6770586b21f8c56ced7f37d99afe8c9ba162fd284cc086b8f02f71b042bef0200bd61104446c0309763da7c362a3e5abae202ccf295c04ef63281
-  data.tar.gz: c41d771b2ef842b185ebf0114de4921060ad6e55a17377acfc47412790237428fcab8a5ceff3efe2112b3341a9380a30160244c21b0b078eecc108181e9d4ce8
+  metadata.gz: 339c0f2674c8509e5ffed3da42eaffde1d151c535bcb454d4bba7dfa7d9b060636b31afd66b29b405ea73b01e5067280c488661e902944e6e38778d8288854be
+  data.tar.gz: dcb9f001f384c872380c4d42bf6ebd9c079156e14131d4dd620af185226c71d7d79e7ea95a251c0aa7e81834ee0534704484fbff5e4ae0e908b8344f61add1f2
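
The values above are the digests recorded for the two archives packed inside the `.gem` file. A minimal sketch of checking one of them locally, assuming the `.gem` (a plain tar archive) has already been unpacked so that `data.tar.gz` is on disk. The helper name `digest_matches?` and the throwaway demo file are illustrative and not part of the gem:

```ruby
require 'digest'
require 'tempfile'

# Hypothetical helper: compare a file's SHA256 with an expected hex digest.
def digest_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end

# Demo on a temporary file so the sketch runs without the real archive.
demo = Tempfile.new('data.tar.gz')
demo.write('example payload')
demo.close

expected = Digest::SHA256.hexdigest('example payload')
puts digest_matches?(demo.path, expected)  # true
puts digest_matches?(demo.path, '0' * 64)  # false
```

To check the real archive, pass the path of the unpacked `data.tar.gz` and the new `SHA256` value listed for it in `checksums.yaml`.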

data/CHANGELOG.md CHANGED

@@ -5,6 +5,54 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [0.3.0] - 2025-01-09
+### Added
+- **Plain Language Generator**: AI-powered legal text simplification with fine-tuned models
+- **Model Training System**: Complete training pipeline for custom legal language models
+- **Advanced Multilingual Support**: Enhanced processing for 8 languages with cultural adaptations
+- **PDF Annotation System**: Rich PDF output with highlighting, comments, and risk indicators
+- **AI/ML Integration**: Support for external AI APIs and local model training
+- **Advanced NLP Features**: Readability scoring, complexity reduction metrics, and text analysis
+### Enhanced
+- **Legal Text Processing**: 30+ legal jargon mappings and sentence pattern simplification
+- **Cross-Language Translation**: Legal term mapping across multiple languages
+- **Document Analysis**: Enhanced with plain language generation and multilingual processing
+- **Performance Monitoring**: Extended metrics for new AI/ML operations
+- **Error Handling**: Comprehensive error management for AI operations and model training
+### Technical Improvements
+- **Model Architecture**: Pattern-based, statistical, and neural model support
+- **Caching System**: Extended for translation and model results
+- **API Integration**: Support for external translation and AI services
+- **Cultural Adaptations**: Legal system-specific processing for different countries
+## [0.2.0] - 2025-01-09
+### Added
+- **Configuration System**: Comprehensive configuration management with validation
+- **Caching System**: Result caching with TTL and size management
+- **Performance Monitoring**: Built-in performance tracking and metrics
+- **Enhanced CLI**: New commands for batch processing, statistics, and configuration
+- **Batch Processing**: Process multiple documents simultaneously
+- **Enhanced Document Support**: Added RTF support and improved text extraction
+- **Advanced Error Handling**: Better error messages and recovery mechanisms
+- **Comprehensive Testing**: 75 test cases with full coverage
+- **Documentation**: Complete examples and contribution guidelines
+### Enhanced
+- **Text Extraction**: Multiple encoding support, better PDF/DOCX handling
+- **Document Type Detection**: Improved scoring system for 9 document types
+- **Risk Analysis**: More comprehensive risk patterns and compliance checking
+- **Summarization**: Better plain English conversion and key point extraction
+- **CLI Interface**: Verbose logging, caching options, and performance stats
+### Fixed
+- Text cleaning and normalization issues
+- Memory leaks in document processing
+- Error handling for edge cases
 ## [0.1.0] - 2024-09-09
 ### Added
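
The 0.2.0 entry mentions result caching "with TTL and size management". The diff shown here does not include the body of `cache.rb`, so the following is only a sketch of what those two ideas mean, using a hypothetical in-memory class `TtlCache` (the gem's own cache appears to be directory-based, judging by its `cache_dir` option):

```ruby
# Hypothetical in-memory cache with a time-to-live and a maximum entry count.
class TtlCache
  Entry = Struct.new(:value, :expires_at)

  def initialize(ttl_seconds:, max_entries:)
    @ttl = ttl_seconds
    @max = max_entries
    @store = {} # Ruby hashes keep insertion order, so the first key is the oldest
  end

  # Return the cached value if it is still fresh; otherwise compute and store it.
  def fetch(key, now: Time.now)
    entry = @store[key]
    return entry.value if entry && entry.expires_at > now

    value = yield
    @store.delete(key) # re-insert so this key becomes the newest
    @store[key] = Entry.new(value, now + @ttl)
    @store.delete(@store.keys.first) while @store.size > @max
    value
  end

  def size
    @store.size
  end
end

cache = TtlCache.new(ttl_seconds: 60, max_entries: 2)
t0 = Time.now
cache.fetch('a.pdf', now: t0) { 'summary A' }
puts cache.fetch('a.pdf', now: t0 + 30) { 'recomputed' } # summary A (still fresh)
puts cache.fetch('a.pdf', now: t0 + 61) { 'recomputed' } # recomputed (expired)
```

Passing `now:` explicitly keeps the expiry logic testable without sleeping; evicting the oldest key is one simple size policy among several.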

data/CONTRIBUTING.md ADDED

@@ -0,0 +1,231 @@
+# Contributing to Legal Summariser
+Thank you for your interest in contributing to Legal Summariser! This document provides guidelines for contributing to the project.
+## 🚀 Getting Started
+### Prerequisites
+- Ruby 2.6 or higher
+- Bundler gem
+- Git
+### Development Setup
+1. Fork the repository
+2. Clone your fork:
+   ```bash
+   git clone https://github.com/your-username/legal_summariser.git
+   cd legal_summariser
+   ```
+3. Install dependencies:
+   ```bash
+   bundle install
+   ```
+4. Run tests to ensure everything works:
+   ```bash
+   bundle exec rspec
+   ```
+## 🧪 Testing
+We maintain comprehensive test coverage. Please ensure all tests pass before submitting a PR:
+```bash
+# Run all tests
+bundle exec rspec
+# Run tests with documentation-style output
+bundle exec rspec --format documentation
+# Run specific test file
+bundle exec rspec spec/text_extractor_spec.rb
+# Run linter
+bundle exec rubocop
+```
+### Writing Tests
+- Write tests for all new functionality
+- Follow existing test patterns and naming conventions
+- Use descriptive test names that explain the expected behavior
+- Include both positive and negative test cases
+- Test edge cases and error conditions
+## 📝 Code Style
+We follow Ruby community standards:
+- Use 2 spaces for indentation
+- Keep lines under 120 characters
+- Use descriptive variable and method names
+- Add comments for complex logic
+- Follow RuboCop guidelines
+Run the linter before submitting:
+```bash
+bundle exec rubocop
+```
+## 🔧 Development Guidelines
+### Architecture
+The gem follows a modular architecture:
+- `TextExtractor`: Document parsing and text extraction
+- `Summariser`: Text summarization and key point extraction
+- `ClauseDetector`: Legal clause identification
+- `RiskAnalyzer`: Risk assessment and compliance checking
+- `Formatter`: Output formatting (JSON, Markdown, Text)
+- `Cache`: Result caching system
+- `PerformanceMonitor`: Performance tracking
+- `Configuration`: Gem configuration management
+### Adding New Features
+1. **Create an issue** describing the feature
+2. **Write tests** for the new functionality
+3. **Implement the feature** following existing patterns
+4. **Update documentation** including README and code comments
+5. **Add examples** if applicable
+6. **Ensure all tests pass**
+### Adding New Document Types
+To add support for a new document type:
+1. Add detection patterns in `detect_document_type` method
+2. Update supported formats documentation
+3. Add test cases for the new format
+4. Update CLI help text if needed
+### Adding New Risk Patterns
+To add new risk detection patterns:
+1. Add patterns to `RiskAnalyzer` class
+2. Include severity levels and recommendations
+3. Add corresponding test cases
+4. Update documentation
+## 📚 Documentation
+- Update README.md for user-facing changes
+- Add inline documentation for new methods
+- Include examples for new features
+- Update CHANGELOG.md following the Keep a Changelog format and semantic versioning
+## 🐛 Bug Reports
+When reporting bugs, please include:
+- Ruby version
+- Gem version
+- Operating system
+- Steps to reproduce
+- Expected vs actual behavior
+- Sample files (if applicable and not confidential)
+## 💡 Feature Requests
+Feature requests should include:
+- Clear description of the feature
+- Use case and motivation
+- Proposed implementation approach
+- Potential impact on existing functionality
+## 🔄 Pull Request Process
+1. **Create a feature branch** from `main`:
+   ```bash
+   git checkout -b feature/your-feature-name
+   ```
+2. **Make your changes** following the guidelines above
+3. **Commit with descriptive messages**:
+   ```bash
+   git commit -m "Add support for new document format"
+   ```
+4. **Push to your fork**:
+   ```bash
+   git push origin feature/your-feature-name
+   ```
+5. **Create a Pull Request** with:
+   - Clear title and description
+   - Reference to related issues
+   - List of changes made
+   - Test results
+### PR Requirements
+- [ ] All tests pass
+- [ ] Code follows style guidelines
+- [ ] Documentation is updated
+- [ ] CHANGELOG.md is updated
+- [ ] No breaking changes (or clearly documented)
+## 🏷️ Release Process
+Releases follow semantic versioning:
+- **MAJOR**: Breaking changes
+- **MINOR**: New features (backward compatible)
+- **PATCH**: Bug fixes (backward compatible)
+## 🤝 Code of Conduct
+- Be respectful and inclusive
+- Focus on constructive feedback
+- Help others learn and grow
+- Maintain professionalism
+## 📞 Getting Help
+- Create an issue for bugs or feature requests
+- Join discussions in existing issues
+- Contact maintainers for questions
+## 🎯 Areas for Contribution
+We welcome contributions in these areas:
+### High Priority
+- Additional document format support (ODT, RTF, HTML)
+- Enhanced clause detection patterns
+- Multi-language support improvements
+- Performance optimizations
+### Medium Priority
+- Additional risk assessment rules
+- Better error handling and recovery
+- Enhanced caching strategies
+- CLI improvements
+### Documentation
+- More usage examples
+- Video tutorials
+- API documentation improvements
+- Translation to other languages
+## 🙏 Recognition
+Contributors will be:
+- Listed in the README.md
+- Mentioned in release notes
+- Given credit in commit messages
+## 📄 License
+By contributing, you agree that your contributions will be licensed under the MIT License.
+---
+Thank you for contributing to Legal Summariser! Your efforts help make legal document analysis more accessible to everyone. 🚀
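
The release process in the added CONTRIBUTING.md follows semantic versioning (MAJOR, MINOR, PATCH). A small sketch of classifying a version change with `Gem::Version`, which ships with Ruby. The helpers `numeric_triple` and `bump_type` are hypothetical names, and the sketch assumes plain numeric versions without pre-release suffixes:

```ruby
require 'rubygems' # normally already loaded; provides Gem::Version

# Hypothetical helper: first three numeric segments, padded with zeros.
def numeric_triple(version)
  (Gem::Version.new(version).segments + [0, 0, 0]).first(3)
end

# Hypothetical helper: name the semantic-versioning component that changed.
def bump_type(from, to)
  old = numeric_triple(from)
  target = numeric_triple(to)
  return :none if old == target
  return :downgrade if (target <=> old) == -1
  return :major if target[0] != old[0]
  return :minor if target[1] != old[1]

  :patch
end

puts bump_type('0.1.0', '0.3.0') # minor
```

By this reading, the 0.1.0 → 0.3.0 change covered by this diff is a MINOR bump, which under semantic versioning signals new, backward-compatible features.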

data/examples/advanced_configuration.rb ADDED

@@ -0,0 +1,195 @@
+#!/usr/bin/env ruby
+# frozen_string_literal: true
+# Example: Advanced configuration and customization
+require 'legal_summariser'
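+require 'logger'
+require 'json' # JSON.pretty_generate is used for the custom report below
+require 'time' # Time#iso8601 is provided by the 'time' standard library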
+puts "=== Advanced Legal Summariser Configuration ==="
+# Example 1: Custom logging configuration
+puts "\n1. Custom Logging Setup"
+custom_logger = Logger.new('legal_analysis.log')
+custom_logger.level = Logger::DEBUG
+custom_logger.formatter = proc do |severity, datetime, _progname, msg|
+  "[#{datetime.strftime('%Y-%m-%d %H:%M:%S')}] #{severity}: #{msg}\n"
+end
+LegalSummariser.configure do |config|
+  config.logger = custom_logger
+  config.language = 'en'
+  config.max_file_size = 20 * 1024 * 1024 # 20MB
+  config.timeout = 60 # 60 seconds
+  config.enable_caching = true
+  config.cache_dir = './custom_cache'
+end
+puts "Configuration applied successfully!"
+# Example 2: Multi-language support
+puts "\n2. Multi-language Configuration"
+LegalSummariser.configure do |config|
+  config.language = 'tr' # Turkish
+end
+puts "Language set to Turkish (TR)"
+puts "Supported languages: #{LegalSummariser.configuration.supported_languages.join(', ')}"
+# Example 3: Performance monitoring
+puts "\n3. Performance Monitoring"
+monitor = LegalSummariser.performance_monitor
+# Simulate some operations for demonstration
+monitor.start_timer(:demo_operation)
+sleep(0.1) # Simulate work
+monitor.end_timer(:demo_operation)
+monitor.record(:demo_metric, 42.5)
+monitor.record(:demo_metric, 38.2)
+puts "Performance Report:"
+puts monitor.report
+# Example 4: Cache management
+puts "\n4. Cache Management"
+cache = LegalSummariser::Cache.new
+# Show cache statistics
+cache_stats = cache.stats
+puts "Cache Status: #{cache_stats[:enabled] ? 'Enabled' : 'Disabled'}"
+if cache_stats[:enabled]
+  puts "Cache Directory: #{cache_stats[:cache_dir]}"
+  puts "Cached Files: #{cache_stats[:file_count]}"
+  puts "Cache Size: #{cache_stats[:total_size_mb]} MB"
+end
+# Example 5: Error handling and validation
+puts "\n5. Configuration Validation"
+begin
+  LegalSummariser.configure do |config|
+    config.language = 'invalid_language'
+  end
+rescue LegalSummariser::Error => e
+  puts "Configuration error caught: #{e.message}"
+end
+# Reset to valid configuration
+LegalSummariser.configure do |config|
+  config.language = 'en'
+end
+# Example 6: Custom analysis workflow
+puts "\n6. Custom Analysis Workflow"
+def analyze_with_custom_workflow(file_path)
+  puts "Starting custom analysis workflow for: #{file_path}"
+  # Start performance monitoring
+  monitor = LegalSummariser.performance_monitor
+  monitor.start_timer(:custom_workflow)
+  begin
+    # Step 1: Basic analysis
+    puts "Step 1: Performing basic analysis..."
+    result = LegalSummariser.summarise(file_path)
+    # Step 2: Custom risk assessment
+    puts "Step 2: Custom risk assessment..."
+    risk_score = result[:risks][:risk_score][:score]
+    custom_risk_level = case risk_score
+                       when 0..5 then 'Very Low'
+                       when 6..15 then 'Low'
+                       when 16..30 then 'Medium'
+                       when 31..50 then 'High'
+                       else 'Critical'
+                       end
+    # Step 3: Generate custom report
+    puts "Step 3: Generating custom report..."
+    custom_report = {
+      file_path: file_path,
+      analysis_timestamp: Time.now.iso8601,
+      document_info: {
+        type: result[:metadata][:document_type],
+        word_count: result[:metadata][:word_count],
+        processing_time: result[:metadata][:extraction_time_seconds]
+      },
+      summary: result[:plain_text],
+      risk_assessment: {
+        standard_score: risk_score,
+        custom_level: custom_risk_level,
+        high_priority_issues: result[:risks][:high_risks].length,
+        compliance_gaps: result[:risks][:compliance_gaps].length
+      },
+      recommendations: generate_custom_recommendations(result)
+    }
+    workflow_time = monitor.end_timer(:custom_workflow)
+    custom_report[:workflow_time_seconds] = workflow_time.round(3)
+    puts "Custom workflow completed in #{workflow_time.round(3)}s"
+    return custom_report
+  rescue => e
+    monitor.end_timer(:custom_workflow)
+    puts "Workflow failed: #{e.message}"
+    return nil
+  end
+end
+def generate_custom_recommendations(analysis_result)
+  recommendations = []
+  # Risk-based recommendations
+  high_risks = analysis_result[:risks][:high_risks]
+  if high_risks.any?
+    recommendations << "URGENT: Address #{high_risks.length} high-risk issues before signing"
+    high_risks.each { |risk| recommendations << "- #{risk[:recommendation]}" }
+  end
+  # Compliance recommendations
+  compliance_gaps = analysis_result[:risks][:compliance_gaps]
+  if compliance_gaps.any?
+    recommendations << "COMPLIANCE: Review #{compliance_gaps.length} regulatory gaps"
+    compliance_gaps.each { |gap| recommendations << "- #{gap[:recommendation]}" }
+  end
+  # Document type specific recommendations
+  doc_type = analysis_result[:metadata][:document_type]
+  case doc_type
+  when 'nda'
+    recommendations << "NDA: Verify confidentiality scope and duration"
+  when 'employment_contract'
+    recommendations << "EMPLOYMENT: Check termination clauses and benefits"
+  when 'service_agreement'
+    recommendations << "SERVICE: Review deliverables and payment terms"
+  end
+  recommendations
+end
+# Example usage of custom workflow
+puts "\n7. Custom Workflow Example"
+# Replace with actual file path
+sample_file = '/tmp/sample_contract.txt'
+File.write(sample_file, "Sample contract content for demonstration purposes.")
+custom_result = analyze_with_custom_workflow(sample_file)
+if custom_result
+  puts "\nCustom Analysis Result:"
+  puts JSON.pretty_generate(custom_result)
+end
+# Cleanup
+File.delete(sample_file) if File.exist?(sample_file)
+# Example 8: System statistics and monitoring
+puts "\n8. System Statistics"
+system_stats = LegalSummariser.stats
+puts "System Performance Overview:"
+puts "- Performance Metrics: #{system_stats[:performance].keys.join(', ')}"
+puts "- Cache Status: #{system_stats[:cache][:enabled] ? 'Active' : 'Inactive'}"
+puts "- Memory Usage: #{system_stats[:memory][:memory_mb]} MB" if system_stats[:memory][:available]
+puts "\nAdvanced configuration examples completed!"
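
The custom workflow in this example maps a numeric risk score to a label using integer ranges (`0..5`, `6..15`, and so on). This diff does not say whether the gem's score is always an integer. Assuming it could be fractional, a value such as 5.5 would match none of those ranges and fall through to the `else` branch. A sketch of the same thresholds with half-open ranges, in a hypothetical helper `risk_level`:

```ruby
# Hypothetical mapping with the same thresholds as the example workflow,
# written with half-open ranges so fractional scores leave no gaps.
def risk_level(score)
  case score
  when 0...6   then 'Very Low'
  when 6...16  then 'Low'
  when 16...31 then 'Medium'
  when 31...51 then 'High'
  else 'Critical' # also reached by negative scores, as in the original ranges
  end
end

[0, 5, 5.5, 15, 30.9, 50, 51].each do |score|
  puts "#{score} -> #{risk_level(score)}"
end
```

For integer scores this gives the same labels as the ranges in the example; only the handling of values between the original range boundaries differs.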

data/examples/basic_usage.rb ADDED

@@ -0,0 +1,101 @@
+#!/usr/bin/env ruby
+# frozen_string_literal: true
+# Example: Basic usage of Legal Summariser gem
+require 'legal_summariser'
+# Configure the gem (optional)
+LegalSummariser.configure do |config|
+  config.language = 'en'
+  config.enable_caching = true
+  config.max_file_size = 10 * 1024 * 1024 # 10MB
+end
+# Example 1: Basic document analysis
+puts "=== Basic Document Analysis ==="
+begin
+  # Analyze a document (replace with your actual file path)
+  result = LegalSummariser.summarise('sample_contract.pdf')
+  puts "Document Type: #{result[:metadata][:document_type]}"
+  puts "Word Count: #{result[:metadata][:word_count]}"
+  puts "\nSummary:"
+  puts result[:plain_text]
+  puts "\nKey Points:"
+  result[:key_points].each_with_index do |point, index|
+    puts "#{index + 1}. #{point}"
+  end
+rescue LegalSummariser::DocumentNotFoundError => e
+  puts "Error: #{e.message}"
+rescue LegalSummariser::UnsupportedFormatError => e
+  puts "Error: #{e.message}"
+end
+# Example 2: Analysis with custom options
+puts "\n=== Custom Analysis Options ==="
+options = {
+  max_sentences: 3,
+  format: 'markdown'
+}
+begin
+  result = LegalSummariser.summarise('sample_contract.pdf', options)
+  puts result
+rescue => e
+  puts "Error: #{e.message}"
+end
+# Example 3: Risk analysis focus
+puts "\n=== Risk Analysis ==="
+begin
+  result = LegalSummariser.summarise('sample_contract.pdf')
+  risks = result[:risks]
+  puts "Overall Risk Level: #{risks[:risk_score][:level].upcase}"
+  puts "Risk Score: #{risks[:risk_score][:score]}"
+  if risks[:high_risks].any?
+    puts "\nHigh Risks Found:"
+    risks[:high_risks].each do |risk|
+      puts "- #{risk[:type]}: #{risk[:description]}"
+      puts "  Recommendation: #{risk[:recommendation]}"
+    end
+  end
+  if risks[:compliance_gaps].any?
+    puts "\nCompliance Gaps:"
+    risks[:compliance_gaps].each do |gap|
+      puts "- #{gap[:type]} (#{gap[:regulation]}): #{gap[:description]}"
+    end
+  end
+rescue => e
+  puts "Error: #{e.message}"
+end
+# Example 4: Clause detection
+puts "\n=== Clause Detection ==="
+begin
+  result = LegalSummariser.summarise('sample_contract.pdf')
+  result[:clauses].each do |clause_type, clauses|
+    next if clauses.empty?
+    puts "\n#{clause_type.to_s.split('_').map(&:capitalize).join(' ')} Clauses:"
+    clauses.each_with_index do |clause, index|
+      puts "#{index + 1}. #{clause[:content][0..100]}..."
+    end
+  end
+rescue => e
+  puts "Error: #{e.message}"
+end
+# Example 5: Performance monitoring
+puts "\n=== Performance Statistics ==="
+stats = LegalSummariser.stats
+puts "Performance: #{stats[:performance]}"
+puts "Cache: #{stats[:cache]}"
+puts "Memory: #{stats[:memory]}"
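
The examples above read nested keys such as `result[:risks][:risk_score][:level]` directly, which raises `NoMethodError` if an intermediate key is missing. A sketch of a more defensive read with `Hash#dig`, using a stub hash that only mirrors the keys seen in these examples. The values are made up, and the real result may contain more or different keys:

```ruby
# Stub with the same keys the examples read; the values are invented.
result = {
  metadata: { document_type: 'nda', word_count: 1200 },
  risks: {
    risk_score: { score: 12, level: 'low' },
    high_risks: [],
    compliance_gaps: []
  }
}

# Hash#dig returns nil instead of raising when a key along the path is absent.
level = result.dig(:risks, :risk_score, :level) || 'unknown'
termination = result.dig(:clauses, :termination) || []

puts "Risk level: #{level}"
puts "Termination clauses: #{termination.length}"
```

Whether a missing key should be tolerated or treated as an error depends on the caller; the fallbacks here are just one choice.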