legal_summariser 0.2.0 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 99da5ab12240efdb658eafc5b3e76ef46834f7a7d76bf86edfe1958ea75c4f58
4
- data.tar.gz: aa0ee6b2406771e99c22af8d5ab00145eeee8666a8ccbda9b96c48ed87e0e408
3
+ metadata.gz: 71b638897796c0db2653eefb9456f1fdc821c3c91b3320f6aecce4430e750019
4
+ data.tar.gz: 1d915044c3946c8a34656af00d22d9a9d91d06f0258b74fbf870164f72b1104f
5
5
  SHA512:
6
- metadata.gz: 20d58233629912675fd4fa7a44c0813d1267e25bc0004df18d37c60ed069906f31d8a68cc165c337809ff040e874d41053b347eb4b3df46f98bf85451a1f654d
7
- data.tar.gz: f7bc3b2feab8929485a5387e93ecc0762b32d18903460ba5e33ea1a7c3dd010102c8cbe10feff018694c0ea2a9641c6919904da574a041a226d1de0f1134122b
6
+ metadata.gz: 339c0f2674c8509e5ffed3da42eaffde1d151c535bcb454d4bba7dfa7d9b060636b31afd66b29b405ea73b01e5067280c488661e902944e6e38778d8288854be
7
+ data.tar.gz: dcb9f001f384c872380c4d42bf6ebd9c079156e14131d4dd620af185226c71d7d79e7ea95a251c0aa7e81834ee0534704484fbff5e4ae0e908b8344f61add1f2
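The digests recorded above can be compared against the members of an unpacked `.gem` archive. A minimal sketch, assuming `metadata.gz` or `data.tar.gz` has already been extracted to a local path; `verify_digest` is a hypothetical helper name, not part of RubyGems.

```ruby
require 'digest'

# Hypothetical helper: compare a file's SHA256 with an expected hex digest,
# such as one of the values listed in checksums.yaml.
def verify_digest(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end
```

Passing the path of the extracted `data.tar.gz` together with the SHA256 value from the 0.3.0 side of the diff would return `true` only if the archive is byte-identical to the published one.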
data/CHANGELOG.md CHANGED
@@ -5,6 +5,29 @@ All notable changes to this project will be documented in this file.
5
5
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
6
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
7
 
8
+ ## [0.3.0] - 2025-01-09
9
+
10
+ ### Added
11
+ - **Plain Language Generator**: AI-powered legal text simplification with fine-tuned models
12
+ - **Model Training System**: Complete training pipeline for custom legal language models
13
+ - **Advanced Multilingual Support**: Enhanced processing for 8 languages with cultural adaptations
14
+ - **PDF Annotation System**: Rich PDF output with highlighting, comments, and risk indicators
15
+ - **AI/ML Integration**: Support for external AI APIs and local model training
16
+ - **Advanced NLP Features**: Readability scoring, complexity reduction metrics, and text analysis
17
+
18
+ ### Enhanced
19
+ - **Legal Text Processing**: 30+ legal jargon mappings and sentence pattern simplification
20
+ - **Cross-Language Translation**: Legal term mapping across multiple languages
21
+ - **Document Analysis**: Enhanced with plain language generation and multilingual processing
22
+ - **Performance Monitoring**: Extended metrics for new AI/ML operations
23
+ - **Error Handling**: Comprehensive error management for AI operations and model training
24
+
25
+ ### Technical Improvements
26
+ - **Model Architecture**: Pattern-based, statistical, and neural model support
27
+ - **Caching System**: Extended for translation and model results
28
+ - **API Integration**: Support for external translation and AI services
29
+ - **Cultural Adaptations**: Legal system-specific processing for different countries
30
+
8
31
  ## [0.2.0] - 2025-01-09
9
32
 
10
33
  ### Added
data/CONTRIBUTING.md ADDED
@@ -0,0 +1,231 @@
1
+ # Contributing to Legal Summariser
2
+
3
+ Thank you for your interest in contributing to Legal Summariser! This document provides guidelines for contributing to the project.
4
+
5
+ ## 🚀 Getting Started
6
+
7
+ ### Prerequisites
8
+
9
+ - Ruby 2.6 or higher
10
+ - Bundler gem
11
+ - Git
12
+
13
+ ### Development Setup
14
+
15
+ 1. Fork the repository
16
+ 2. Clone your fork:
17
+ ```bash
18
+ git clone https://github.com/your-username/legal_summariser.git
19
+ cd legal_summariser
20
+ ```
21
+
22
+ 3. Install dependencies:
23
+ ```bash
24
+ bundle install
25
+ ```
26
+
27
+ 4. Run tests to ensure everything works:
28
+ ```bash
29
+ bundle exec rspec
30
+ ```
31
+
32
+ ## 🧪 Testing
33
+
34
+ We maintain comprehensive test coverage. Please ensure all tests pass before submitting a PR:
35
+
36
+ ```bash
37
+ # Run all tests
38
+ bundle exec rspec
39
+
40
+ # Run tests with documentation-style output
41
+ bundle exec rspec --format documentation
42
+
43
+ # Run specific test file
44
+ bundle exec rspec spec/text_extractor_spec.rb
45
+
46
+ # Run linter
47
+ bundle exec rubocop
48
+ ```
49
+
50
+ ### Writing Tests
51
+
52
+ - Write tests for all new functionality
53
+ - Follow existing test patterns and naming conventions
54
+ - Use descriptive test names that explain the expected behavior
55
+ - Include both positive and negative test cases
56
+ - Test edge cases and error conditions
57
+
58
+ ## 📝 Code Style
59
+
60
+ We follow Ruby community standards:
61
+
62
+ - Use 2 spaces for indentation
63
+ - Keep lines under 120 characters
64
+ - Use descriptive variable and method names
65
+ - Add comments for complex logic
66
+ - Follow RuboCop guidelines
67
+
68
+ Run the linter before submitting:
69
+ ```bash
70
+ bundle exec rubocop
71
+ ```
72
+
73
+ ## 🔧 Development Guidelines
74
+
75
+ ### Architecture
76
+
77
+ The gem follows a modular architecture:
78
+
79
+ - `TextExtractor`: Document parsing and text extraction
80
+ - `Summariser`: Text summarization and key point extraction
81
+ - `ClauseDetector`: Legal clause identification
82
+ - `RiskAnalyzer`: Risk assessment and compliance checking
83
+ - `Formatter`: Output formatting (JSON, Markdown, Text)
84
+ - `Cache`: Result caching system
85
+ - `PerformanceMonitor`: Performance tracking
86
+ - `Configuration`: Gem configuration management
87
+
88
+ ### Adding New Features
89
+
90
+ 1. **Create an issue** describing the feature
91
+ 2. **Write tests** for the new functionality
92
+ 3. **Implement the feature** following existing patterns
93
+ 4. **Update documentation** including README and code comments
94
+ 5. **Add examples** if applicable
95
+ 6. **Ensure all tests pass**
96
+
97
+ ### Adding New Document Types
98
+
99
+ To add support for a new document type:
100
+
101
+ 1. Add detection patterns in `detect_document_type` method
102
+ 2. Update supported formats documentation
103
+ 3. Add test cases for the new format
104
+ 4. Update CLI help text if needed
105
+
106
+ ### Adding New Risk Patterns
107
+
108
+ To add new risk detection patterns:
109
+
110
+ 1. Add patterns to `RiskAnalyzer` class
111
+ 2. Include severity levels and recommendations
112
+ 3. Add corresponding test cases
113
+ 4. Update documentation
114
+
115
+ ## 📚 Documentation
116
+
117
+ - Update README.md for user-facing changes
118
+ - Add inline documentation for new methods
119
+ - Include examples for new features
120
+ - Update CHANGELOG.md following the Keep a Changelog format
121
+
122
+ ## 🐛 Bug Reports
123
+
124
+ When reporting bugs, please include:
125
+
126
+ - Ruby version
127
+ - Gem version
128
+ - Operating system
129
+ - Steps to reproduce
130
+ - Expected vs actual behavior
131
+ - Sample files (if applicable and not confidential)
132
+
133
+ ## 💡 Feature Requests
134
+
135
+ Feature requests should include:
136
+
137
+ - Clear description of the feature
138
+ - Use case and motivation
139
+ - Proposed implementation approach
140
+ - Potential impact on existing functionality
141
+
142
+ ## 🔄 Pull Request Process
143
+
144
+ 1. **Create a feature branch** from `main`:
145
+ ```bash
146
+ git checkout -b feature/your-feature-name
147
+ ```
148
+
149
+ 2. **Make your changes** following the guidelines above
150
+
151
+ 3. **Commit with descriptive messages**:
152
+ ```bash
153
+ git commit -m "Add support for new document format"
154
+ ```
155
+
156
+ 4. **Push to your fork**:
157
+ ```bash
158
+ git push origin feature/your-feature-name
159
+ ```
160
+
161
+ 5. **Create a Pull Request** with:
162
+ - Clear title and description
163
+ - Reference to related issues
164
+ - List of changes made
165
+ - Test results
166
+
167
+ ### PR Requirements
168
+
169
+ - [ ] All tests pass
170
+ - [ ] Code follows style guidelines
171
+ - [ ] Documentation is updated
172
+ - [ ] CHANGELOG.md is updated
173
+ - [ ] No breaking changes (or clearly documented)
174
+
175
+ ## 🏷️ Release Process
176
+
177
+ Releases follow semantic versioning:
178
+
179
+ - **MAJOR**: Breaking changes
180
+ - **MINOR**: New features (backward compatible)
181
+ - **PATCH**: Bug fixes (backward compatible)
182
+
183
+ ## 🤝 Code of Conduct
184
+
185
+ - Be respectful and inclusive
186
+ - Focus on constructive feedback
187
+ - Help others learn and grow
188
+ - Maintain professionalism
189
+
190
+ ## 📞 Getting Help
191
+
192
+ - Create an issue for bugs or feature requests
193
+ - Join discussions in existing issues
194
+ - Contact maintainers for questions
195
+
196
+ ## 🎯 Areas for Contribution
197
+
198
+ We welcome contributions in these areas:
199
+
200
+ ### High Priority
201
+ - Additional document format support (ODT, RTF, HTML)
202
+ - Enhanced clause detection patterns
203
+ - Multi-language support improvements
204
+ - Performance optimizations
205
+
206
+ ### Medium Priority
207
+ - Additional risk assessment rules
208
+ - Better error handling and recovery
209
+ - Enhanced caching strategies
210
+ - CLI improvements
211
+
212
+ ### Documentation
213
+ - More usage examples
214
+ - Video tutorials
215
+ - API documentation improvements
216
+ - Translation to other languages
217
+
218
+ ## 🙏 Recognition
219
+
220
+ Contributors will be:
221
+ - Listed in the README.md
222
+ - Mentioned in release notes
223
+ - Given credit in commit messages
224
+
225
+ ## 📄 License
226
+
227
+ By contributing, you agree that your contributions will be licensed under the MIT License.
228
+
229
+ ---
230
+
231
+ Thank you for contributing to Legal Summariser! Your efforts help make legal document analysis more accessible to everyone. 🚀
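The "Adding New Risk Patterns" steps in CONTRIBUTING.md above mention patterns, severity levels, and recommendations. How `RiskAnalyzer` actually stores them is not shown in this diff, so the following is only a sketch under assumed names (`RISK_PATTERNS`, `scan_risks`) and made-up example patterns.

```ruby
# Hypothetical pattern table; the real RiskAnalyzer layout is not shown in this diff.
RISK_PATTERNS = [
  {
    type: :unlimited_liability,
    pattern: /unlimited\s+liability/i,
    severity: :high,
    recommendation: 'Negotiate a liability cap'
  },
  {
    type: :auto_renewal,
    pattern: /automatically\s+renew/i,
    severity: :medium,
    recommendation: 'Confirm the notice period for cancelling renewal'
  }
].freeze

# Return every matching entry, without its regex, for the given text.
def scan_risks(text)
  RISK_PATTERNS.select { |entry| entry[:pattern].match?(text) }
               .map { |entry| entry.reject { |key, _| key == :pattern } }
end
```

A matching test case for step 3 would then assert that a sentence containing the phrase yields one entry with the expected `:severity` and `:recommendation`.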
data/examples/advanced_configuration.rb ADDED
@@ -0,0 +1,195 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ # Example: Advanced configuration and customization
5
+ require 'legal_summariser'
6
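+ require 'logger'
+ require 'json' # JSON.pretty_generate is used in the workflow example below
+ require 'time' # Time#iso8601 is defined by the 'time' standard library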
7
+
8
+ puts "=== Advanced Legal Summariser Configuration ==="
9
+
10
+ # Example 1: Custom logging configuration
11
+ puts "\n1. Custom Logging Setup"
12
+ custom_logger = Logger.new('legal_analysis.log')
13
+ custom_logger.level = Logger::DEBUG
14
+ custom_logger.formatter = proc do |severity, datetime, progname, msg|
15
+ "[#{datetime.strftime('%Y-%m-%d %H:%M:%S')}] #{severity}: #{msg}\n"
16
+ end
17
+
18
+ LegalSummariser.configure do |config|
19
+ config.logger = custom_logger
20
+ config.language = 'en'
21
+ config.max_file_size = 20 * 1024 * 1024 # 20MB
22
+ config.timeout = 60 # 60 seconds
23
+ config.enable_caching = true
24
+ config.cache_dir = './custom_cache'
25
+ end
26
+
27
+ puts "Configuration applied successfully!"
28
+
29
+ # Example 2: Multi-language support
30
+ puts "\n2. Multi-language Configuration"
31
+ LegalSummariser.configure do |config|
32
+ config.language = 'tr' # Turkish
33
+ end
34
+
35
+ puts "Language set to Turkish (TR)"
36
+ puts "Supported languages: #{LegalSummariser.configuration.supported_languages.join(', ')}"
37
+
38
+ # Example 3: Performance monitoring
39
+ puts "\n3. Performance Monitoring"
40
+ monitor = LegalSummariser.performance_monitor
41
+
42
+ # Simulate some operations for demonstration
43
+ monitor.start_timer(:demo_operation)
44
+ sleep(0.1) # Simulate work
45
+ monitor.end_timer(:demo_operation)
46
+
47
+ monitor.record(:demo_metric, 42.5)
48
+ monitor.record(:demo_metric, 38.2)
49
+
50
+ puts "Performance Report:"
51
+ puts monitor.report
52
+
53
+ # Example 4: Cache management
54
+ puts "\n4. Cache Management"
55
+ cache = LegalSummariser::Cache.new
56
+
57
+ # Show cache statistics
58
+ cache_stats = cache.stats
59
+ puts "Cache Status: #{cache_stats[:enabled] ? 'Enabled' : 'Disabled'}"
60
+
61
+ if cache_stats[:enabled]
62
+ puts "Cache Directory: #{cache_stats[:cache_dir]}"
63
+ puts "Cached Files: #{cache_stats[:file_count]}"
64
+ puts "Cache Size: #{cache_stats[:total_size_mb]} MB"
65
+ end
66
+
67
+ # Example 5: Error handling and validation
68
+ puts "\n5. Configuration Validation"
69
+ begin
70
+ LegalSummariser.configure do |config|
71
+ config.language = 'invalid_language'
72
+ end
73
+ rescue LegalSummariser::Error => e
74
+ puts "Configuration error caught: #{e.message}"
75
+ end
76
+
77
+ # Reset to valid configuration
78
+ LegalSummariser.configure do |config|
79
+ config.language = 'en'
80
+ end
81
+
82
+ # Example 6: Custom analysis workflow
83
+ puts "\n6. Custom Analysis Workflow"
84
+ def analyze_with_custom_workflow(file_path)
85
+ puts "Starting custom analysis workflow for: #{file_path}"
86
+
87
+ # Start performance monitoring
88
+ monitor = LegalSummariser.performance_monitor
89
+ monitor.start_timer(:custom_workflow)
90
+
91
+ begin
92
+ # Step 1: Basic analysis
93
+ puts "Step 1: Performing basic analysis..."
94
+ result = LegalSummariser.summarise(file_path)
95
+
96
+ # Step 2: Custom risk assessment
97
+ puts "Step 2: Custom risk assessment..."
98
+ risk_score = result[:risks][:risk_score][:score]
99
+
100
+ custom_risk_level = case risk_score
101
+ when 0..5 then 'Very Low'
102
+ when 6..15 then 'Low'
103
+ when 16..30 then 'Medium'
104
+ when 31..50 then 'High'
105
+ else 'Critical'
106
+ end
107
+
108
+ # Step 3: Generate custom report
109
+ puts "Step 3: Generating custom report..."
110
+ custom_report = {
111
+ file_path: file_path,
112
+ analysis_timestamp: Time.now.iso8601,
113
+ document_info: {
114
+ type: result[:metadata][:document_type],
115
+ word_count: result[:metadata][:word_count],
116
+ processing_time: result[:metadata][:extraction_time_seconds]
117
+ },
118
+ summary: result[:plain_text],
119
+ risk_assessment: {
120
+ standard_score: risk_score,
121
+ custom_level: custom_risk_level,
122
+ high_priority_issues: result[:risks][:high_risks].length,
123
+ compliance_gaps: result[:risks][:compliance_gaps].length
124
+ },
125
+ recommendations: generate_custom_recommendations(result)
126
+ }
127
+
128
+ workflow_time = monitor.end_timer(:custom_workflow)
129
+ custom_report[:workflow_time_seconds] = workflow_time.round(3)
130
+
131
+ puts "Custom workflow completed in #{workflow_time.round(3)}s"
132
+ return custom_report
133
+
134
+ rescue => e
135
+ monitor.end_timer(:custom_workflow)
136
+ puts "Workflow failed: #{e.message}"
137
+ return nil
138
+ end
139
+ end
140
+
141
+ def generate_custom_recommendations(analysis_result)
142
+ recommendations = []
143
+
144
+ # Risk-based recommendations
145
+ high_risks = analysis_result[:risks][:high_risks]
146
+ if high_risks.any?
147
+ recommendations << "URGENT: Address #{high_risks.length} high-risk issues before signing"
148
+ high_risks.each { |risk| recommendations << "- #{risk[:recommendation]}" }
149
+ end
150
+
151
+ # Compliance recommendations
152
+ compliance_gaps = analysis_result[:risks][:compliance_gaps]
153
+ if compliance_gaps.any?
154
+ recommendations << "COMPLIANCE: Review #{compliance_gaps.length} regulatory gaps"
155
+ compliance_gaps.each { |gap| recommendations << "- #{gap[:recommendation]}" }
156
+ end
157
+
158
+ # Document type specific recommendations
159
+ doc_type = analysis_result[:metadata][:document_type]
160
+ case doc_type
161
+ when 'nda'
162
+ recommendations << "NDA: Verify confidentiality scope and duration"
163
+ when 'employment_contract'
164
+ recommendations << "EMPLOYMENT: Check termination clauses and benefits"
165
+ when 'service_agreement'
166
+ recommendations << "SERVICE: Review deliverables and payment terms"
167
+ end
168
+
169
+ recommendations
170
+ end
171
+
172
+ # Example usage of custom workflow
173
+ puts "\n7. Custom Workflow Example"
174
+ # Replace with actual file path
175
+ sample_file = '/tmp/sample_contract.txt'
176
+ File.write(sample_file, "Sample contract content for demonstration purposes.")
177
+
178
+ custom_result = analyze_with_custom_workflow(sample_file)
179
+ if custom_result
180
+ puts "\nCustom Analysis Result:"
181
+ puts JSON.pretty_generate(custom_result)
182
+ end
183
+
184
+ # Cleanup
185
+ File.delete(sample_file) if File.exist?(sample_file)
186
+
187
+ # Example 8: System statistics and monitoring
188
+ puts "\n8. System Statistics"
189
+ system_stats = LegalSummariser.stats
190
+ puts "System Performance Overview:"
191
+ puts "- Performance Metrics: #{system_stats[:performance].keys.join(', ')}"
192
+ puts "- Cache Status: #{system_stats[:cache][:enabled] ? 'Active' : 'Inactive'}"
193
+ puts "- Memory Usage: #{system_stats[:memory][:memory_mb]} MB" if system_stats[:memory][:available]
194
+
195
+ puts "\nAdvanced configuration examples completed!"
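The `case` in Step 2 of the custom workflow above uses integer ranges (`0..5`, `6..15`, and so on). If `risk_score` were ever a non-integer such as 5.5, it would match none of them and fall through to `'Critical'`. Whether the gem can return fractional scores is not shown in this diff; the sketch below, with the hypothetical name `custom_risk_level`, uses half-open ranges so the bands have no gaps.

```ruby
# Hypothetical helper: the same bands as the example, written as half-open
# ranges so fractional scores cannot fall between two bands.
def custom_risk_level(score)
  case score
  when 0...6   then 'Very Low'
  when 6...16  then 'Low'
  when 16...31 then 'Medium'
  when 31...51 then 'High'
  else 'Critical'
  end
end
```

For integer scores this returns the same labels as the original `case`; only values between the original bands are classified differently.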
data/examples/basic_usage.rb ADDED
@@ -0,0 +1,101 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ # Example: Basic usage of Legal Summariser gem
5
+ require 'legal_summariser'
6
+
7
+ # Configure the gem (optional)
8
+ LegalSummariser.configure do |config|
9
+ config.language = 'en'
10
+ config.enable_caching = true
11
+ config.max_file_size = 10 * 1024 * 1024 # 10MB
12
+ end
13
+
14
+ # Example 1: Basic document analysis
15
+ puts "=== Basic Document Analysis ==="
16
+ begin
17
+ # Analyze a document (replace with your actual file path)
18
+ result = LegalSummariser.summarise('sample_contract.pdf')
19
+
20
+ puts "Document Type: #{result[:metadata][:document_type]}"
21
+ puts "Word Count: #{result[:metadata][:word_count]}"
22
+ puts "\nSummary:"
23
+ puts result[:plain_text]
24
+
25
+ puts "\nKey Points:"
26
+ result[:key_points].each_with_index do |point, index|
27
+ puts "#{index + 1}. #{point}"
28
+ end
29
+
30
+ rescue LegalSummariser::DocumentNotFoundError => e
31
+ puts "Error: #{e.message}"
32
+ rescue LegalSummariser::UnsupportedFormatError => e
33
+ puts "Error: #{e.message}"
34
+ end
35
+
36
+ # Example 2: Analysis with custom options
37
+ puts "\n=== Custom Analysis Options ==="
38
+ options = {
39
+ max_sentences: 3,
40
+ format: 'markdown'
41
+ }
42
+
43
+ begin
44
+ result = LegalSummariser.summarise('sample_contract.pdf', options)
45
+ puts result
46
+ rescue => e
47
+ puts "Error: #{e.message}"
48
+ end
49
+
50
+ # Example 3: Risk analysis focus
51
+ puts "\n=== Risk Analysis ==="
52
+ begin
53
+ result = LegalSummariser.summarise('sample_contract.pdf')
54
+
55
+ risks = result[:risks]
56
+ puts "Overall Risk Level: #{risks[:risk_score][:level].upcase}"
57
+ puts "Risk Score: #{risks[:risk_score][:score]}"
58
+
59
+ if risks[:high_risks].any?
60
+ puts "\nHigh Risks Found:"
61
+ risks[:high_risks].each do |risk|
62
+ puts "- #{risk[:type]}: #{risk[:description]}"
63
+ puts " Recommendation: #{risk[:recommendation]}"
64
+ end
65
+ end
66
+
67
+ if risks[:compliance_gaps].any?
68
+ puts "\nCompliance Gaps:"
69
+ risks[:compliance_gaps].each do |gap|
70
+ puts "- #{gap[:type]} (#{gap[:regulation]}): #{gap[:description]}"
71
+ end
72
+ end
73
+
74
+ rescue => e
75
+ puts "Error: #{e.message}"
76
+ end
77
+
78
+ # Example 4: Clause detection
79
+ puts "\n=== Clause Detection ==="
80
+ begin
81
+ result = LegalSummariser.summarise('sample_contract.pdf')
82
+
83
+ result[:clauses].each do |clause_type, clauses|
84
+ next if clauses.empty?
85
+
86
+ puts "\n#{clause_type.to_s.split('_').map(&:capitalize).join(' ')} Clauses:"
87
+ clauses.each_with_index do |clause, index|
88
+ puts "#{index + 1}. #{clause[:content][0..100]}..."
89
+ end
90
+ end
91
+
92
+ rescue => e
93
+ puts "Error: #{e.message}"
94
+ end
95
+
96
+ # Example 5: Performance monitoring
97
+ puts "\n=== Performance Statistics ==="
98
+ stats = LegalSummariser.stats
99
+ puts "Performance: #{stats[:performance]}"
100
+ puts "Cache: #{stats[:cache]}"
101
+ puts "Memory: #{stats[:memory]}"
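Example 4 above prints `clause[:content][0..100]` followed by `...`, which keeps 101 characters and appends an ellipsis even when nothing was cut. A small sketch of a preview helper; the name `preview` and the 100-character default are assumptions made for illustration.

```ruby
# Hypothetical helper: shorten text to at most `limit` characters and add
# an ellipsis only when something was actually removed.
def preview(text, limit = 100)
  return text if text.length <= limit

  "#{text[0, limit]}..."
end
```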
data/examples/batch_processing.rb ADDED
@@ -0,0 +1,123 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ # Example: Batch processing multiple legal documents
5
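+ require 'legal_summariser'
+ require 'logger' # Logger is used in the configuration block below
+ require 'json' # JSON.parse and JSON.pretty_generate are used below
+ require 'time' # Time#iso8601 is defined by the 'time' standard library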
6
+
7
+ # Configure for batch processing
8
+ LegalSummariser.configure do |config|
9
+ config.enable_caching = true
10
+ config.logger = Logger.new(STDOUT, level: Logger::INFO)
11
+ end
12
+
13
+ # Example file paths (replace with your actual files)
14
+ file_paths = [
15
+ 'contracts/nda_company_a.pdf',
16
+ 'contracts/service_agreement_b.docx',
17
+ 'contracts/employment_contract_c.txt',
18
+ 'policies/privacy_policy.pdf'
19
+ ]
20
+
21
+ puts "=== Batch Processing Legal Documents ==="
22
+ puts "Processing #{file_paths.length} documents..."
23
+
24
+ # Batch process all documents
25
+ results = LegalSummariser.batch_summarise(file_paths, {
26
+ format: 'json',
27
+ max_sentences: 4
28
+ })
29
+
30
+ # Analyze results
31
+ successful = results.select { |r| r[:success] }
32
+ failed = results.reject { |r| r[:success] }
33
+
34
+ puts "\nBatch Processing Results:"
35
+ puts "✓ Successful: #{successful.length}"
36
+ puts "✗ Failed: #{failed.length}"
37
+
38
+ # Process successful results
39
+ if successful.any?
40
+ puts "\n=== Successful Analyses ==="
41
+
42
+ successful.each do |result|
43
+ analysis = JSON.parse(result[:result], symbolize_names: true)
44
+
45
+ puts "\nFile: #{File.basename(result[:file_path])}"
46
+ puts "Type: #{analysis[:metadata][:document_type]}"
47
+ puts "Words: #{analysis[:metadata][:word_count]}"
48
+ puts "Risk Level: #{analysis[:risks][:risk_score][:level].upcase}"
49
+
50
+ # Show key risks
51
+ high_risks = analysis[:risks][:high_risks]
52
+ if high_risks.any?
53
+ puts "High Risks: #{high_risks.map { |r| r[:type] }.join(', ')}"
54
+ end
55
+ end
56
+
57
+ # Generate summary report
58
+ puts "\n=== Summary Report ==="
59
+
60
+ # Document type distribution
61
+ doc_types = successful.map do |result|
62
+ JSON.parse(result[:result], symbolize_names: true)[:metadata][:document_type]
63
+ end
64
+
65
+ type_counts = doc_types.group_by(&:itself).transform_values(&:count)
66
+ puts "Document Types:"
67
+ type_counts.each { |type, count| puts " #{type}: #{count}" }
68
+
69
+ # Risk level distribution
70
+ risk_levels = successful.map do |result|
71
+ JSON.parse(result[:result], symbolize_names: true)[:risks][:risk_score][:level]
72
+ end
73
+
74
+ risk_counts = risk_levels.group_by(&:itself).transform_values(&:count)
75
+ puts "Risk Levels:"
76
+ risk_counts.each { |level, count| puts " #{level}: #{count}" }
77
+
78
+ # Average processing metrics
79
+ word_counts = successful.map do |result|
80
+ JSON.parse(result[:result], symbolize_names: true)[:metadata][:word_count]
81
+ end
82
+
83
+ avg_words = word_counts.sum.to_f / word_counts.length
84
+ puts "Average Document Size: #{avg_words.round} words"
85
+ end
86
+
87
+ # Show failed analyses
88
+ if failed.any?
89
+ puts "\n=== Failed Analyses ==="
90
+ failed.each do |result|
91
+ puts "✗ #{File.basename(result[:file_path])}: #{result[:error]}"
92
+ end
93
+ end
94
+
95
+ # Export results to files
96
+ puts "\n=== Exporting Results ==="
97
+ require 'fileutils'
98
+
99
+ output_dir = 'analysis_results'
100
+ FileUtils.mkdir_p(output_dir)
101
+
102
+ successful.each do |result|
103
+ filename = File.basename(result[:file_path], '.*')
104
+ output_file = File.join(output_dir, "#{filename}_analysis.json")
105
+
106
+ File.write(output_file, result[:result])
107
+ puts "Exported: #{output_file}"
108
+ end
109
+
110
+ # Generate consolidated report
111
+ consolidated_report = {
112
+ processed_at: Time.now.iso8601,
113
+ total_files: file_paths.length,
114
+ successful: successful.length,
115
+ failed: failed.length,
116
+ results: results
117
+ }
118
+
119
+ report_file = File.join(output_dir, 'batch_report.json')
120
+ File.write(report_file, JSON.pretty_generate(consolidated_report))
121
+ puts "Consolidated report: #{report_file}"
122
+
123
+ puts "\nBatch processing completed!"
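The summary report in the batch script above parses each `result[:result]` JSON string several times. One way to tidy that, sketched here with a hypothetical `summarise_batch` helper, is to parse once and count from the parsed hashes. The result shape (`:success`, `:result`, `metadata.document_type`, `risks.risk_score.level`) is taken from the example script and has not been checked against the gem itself.

```ruby
require 'json'

# Count occurrences of each value without Enumerable#tally, which is only
# available from Ruby 2.7 (CONTRIBUTING.md lists Ruby 2.6 as the minimum).
def count_values(values)
  values.each_with_object(Hash.new(0)) { |value, counts| counts[value] += 1 }
end

# Hypothetical helper: parse every successful result once, then aggregate.
def summarise_batch(results)
  analyses = results.select { |r| r[:success] }
                    .map { |r| JSON.parse(r[:result], symbolize_names: true) }
  word_counts = analyses.map { |a| a[:metadata][:word_count] }

  {
    document_types: count_values(analyses.map { |a| a[:metadata][:document_type] }),
    risk_levels: count_values(analyses.map { |a| a[:risks][:risk_score][:level] }),
    # Guard against an empty batch before dividing.
    average_words: word_counts.empty? ? 0 : (word_counts.sum.to_f / word_counts.length).round
  }
end
```

The returned hash can be printed with the same `each { |type, count| ... }` loops the script already uses.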
data/lib/legal_summariser/cache.rb ADDED
@@ -0,0 +1,81 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'digest'
4
+ require 'json'
5
+ require 'fileutils'
6
+
7
+ module LegalSummariser
8
+ # Caching system for analysis results
9
+ class Cache
10
+ def initialize(cache_dir = nil)
11
+ @cache_dir = cache_dir || LegalSummariser.configuration.cache_dir
12
+ FileUtils.mkdir_p(@cache_dir) if LegalSummariser.configuration.enable_caching
13
+ end
14
+
15
+ # Generate cache key for a file
16
+ # @param file_path [String] Path to the file
17
+ # @param options [Hash] Analysis options
18
+ # @return [String] Cache key
19
+ def cache_key(file_path, options = {})
20
+ file_stat = File.stat(file_path)
21
+ content = "#{file_path}:#{file_stat.mtime}:#{file_stat.size}:#{options.to_json}"
22
+ Digest::SHA256.hexdigest(content)
23
+ end
24
+
25
+ # Get cached result
26
+ # @param key [String] Cache key
27
+ # @return [Hash, nil] Cached result or nil
28
+ def get(key)
29
+ return nil unless LegalSummariser.configuration.enable_caching
30
+
31
+ cache_file = File.join(@cache_dir, "#{key}.json")
32
+ return nil unless File.exist?(cache_file)
33
+
34
+ # Check if cache is expired (24 hours)
35
+ return nil if File.mtime(cache_file) < Time.now - (24 * 60 * 60)
36
+
37
+ JSON.parse(File.read(cache_file), symbolize_names: true)
38
+ rescue JSON::ParserError, Errno::ENOENT
39
+ nil
40
+ end
41
+
42
+ # Store result in cache
43
+ # @param key [String] Cache key
44
+ # @param result [Hash] Result to cache
45
+ def set(key, result)
46
+ return unless LegalSummariser.configuration.enable_caching
47
+
48
+ cache_file = File.join(@cache_dir, "#{key}.json")
49
+ File.write(cache_file, JSON.pretty_generate(result))
50
+ rescue => e
51
+ # Don't let a failed cache write break the main functionality; log a warning if a logger is configured
52
+ LegalSummariser.configuration.logger&.warn("Cache write failed: #{e.message}")
53
+ end
54
+
55
+ # Clear cache
56
+ def clear!
57
+ return unless Dir.exist?(@cache_dir)
58
+
59
+ Dir.glob(File.join(@cache_dir, "*.json")).each do |file|
60
+ File.delete(file)
61
+ end
62
+ end
63
+
64
+ # Get cache statistics
65
+ # @return [Hash] Cache statistics
66
+ def stats
67
+ return { enabled: false } unless LegalSummariser.configuration.enable_caching
68
+
69
+ cache_files = Dir.glob(File.join(@cache_dir, "*.json"))
70
+ total_size = cache_files.sum { |file| File.size(file) }
71
+
72
+ {
73
+ enabled: true,
74
+ file_count: cache_files.length,
75
+ total_size_bytes: total_size,
76
+ total_size_mb: (total_size / 1024.0 / 1024.0).round(2),
77
+ cache_dir: @cache_dir
78
+ }
79
+ end
80
+ end
81
+ end
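The cache key above combines path, mtime, size, and options, so editing a file or changing the options yields a new key, and `get` treats entries older than 24 hours as missing. The standalone sketch below reproduces that idea outside the gem. `DemoCache` and its `ttl:` argument are hypothetical names, and it interpolates `mtime.to_f` instead of `mtime` so that sub-second modifications also change the key, which is a deliberate difference from the code above.

```ruby
require 'digest'
require 'json'
require 'fileutils'

# Hypothetical stand-in for the pattern used by LegalSummariser::Cache.
class DemoCache
  def initialize(dir, ttl: 24 * 60 * 60)
    @dir = dir
    @ttl = ttl
    FileUtils.mkdir_p(@dir)
  end

  # The key depends on the file's path, mtime, and size, and on the options.
  def key_for(path, options = {})
    stat = File.stat(path)
    Digest::SHA256.hexdigest("#{path}:#{stat.mtime.to_f}:#{stat.size}:#{options.to_json}")
  end

  def set(key, value)
    File.write(entry_path(key), JSON.generate(value))
  end

  # Returns nil for missing, expired, or unparsable entries.
  def get(key)
    file = entry_path(key)
    return nil unless File.exist?(file)
    return nil if File.mtime(file) < Time.now - @ttl

    JSON.parse(File.read(file), symbolize_names: true)
  rescue JSON::ParserError, Errno::ENOENT
    nil
  end

  private

  def entry_path(key)
    File.join(@dir, "#{key}.json")
  end
end
```

Because values round-trip through JSON, keys come back as symbols but nested values such as symbols or times come back as strings.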
data/lib/legal_summariser/configuration.rb ADDED
@@ -0,0 +1,43 @@
1
+ # frozen_string_literal: true
2
+
3
+ module LegalSummariser
4
+ # Configuration class for gem settings
5
+ class Configuration
6
+ attr_accessor :logger, :max_file_size, :timeout, :language, :enable_caching, :cache_dir
7
+
8
+ def initialize
9
+ @logger = nil
10
+ @max_file_size = 50 * 1024 * 1024 # 50MB default
11
+ @timeout = 30 # 30 seconds default
12
+ @language = 'en'
13
+ @enable_caching = false
14
+ @cache_dir = '/tmp/legal_summariser_cache'
15
+ end
16
+
17
+ # Supported languages for analysis
18
+ def supported_languages
19
+ %w[en tr de fr es it]
20
+ end
21
+
22
+ # Validate configuration
23
+ def validate!
24
+ raise Error, "Invalid language: #{@language}" unless supported_languages.include?(@language)
25
+ raise Error, "Max file size must be positive" if @max_file_size <= 0
26
+ raise Error, "Timeout must be positive" if @timeout <= 0
27
+ end
28
+ end
29
+
30
+ # Global configuration
31
+ def self.configuration
32
+ @configuration ||= Configuration.new
33
+ end
34
+
35
+ def self.configure
36
+ yield(configuration)
37
+ configuration.validate!
38
+ end
39
+
40
+ def self.reset_configuration!
41
+ @configuration = Configuration.new
42
+ end
43
+ end
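`configure` above yields the live configuration and validates afterwards, so when `validate!` raises, the invalid value has already been assigned (the advanced configuration example resets `language` to `'en'` by hand after the rescued error). A minimal sketch of one alternative, validating a copy before committing it; `DemoSettings` is a hypothetical module and not the gem's API.

```ruby
# Hypothetical stand-in illustrating "validate a copy, then swap it in".
module DemoSettings
  class Error < StandardError; end

  class Configuration
    attr_accessor :language, :timeout

    SUPPORTED = %w[en tr de fr es it].freeze

    def initialize
      @language = 'en'
      @timeout = 30
    end

    def validate!
      raise Error, "Invalid language: #{@language}" unless SUPPORTED.include?(@language)
      raise Error, 'Timeout must be positive' unless @timeout.positive?
    end
  end

  def self.configuration
    @configuration ||= Configuration.new
  end

  # Apply the block to a shallow copy; keep the old settings if validation fails.
  def self.configure
    candidate = configuration.dup
    yield(candidate)
    candidate.validate!
    @configuration = candidate
  end
end
```

The trade-off is that references to the old configuration object held elsewhere will not see the new settings, since the object is replaced rather than mutated.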
data/lib/legal_summariser/performance_monitor.rb ADDED
@@ -0,0 +1,108 @@
1
+ # frozen_string_literal: true
2
+
3
+ module LegalSummariser
4
+ # Performance monitoring and metrics collection
5
+ class PerformanceMonitor
6
+ def initialize
7
+ @metrics = {}
8
+ @start_times = {}
9
+ end
10
+
11
+ # Start timing an operation
12
+ # @param operation [String, Symbol] Operation name
13
+ def start_timer(operation)
14
+ @start_times[operation] = Time.now
15
+ end
16
+
17
+ # End timing an operation
18
+ # @param operation [String, Symbol] Operation name
19
+ def end_timer(operation)
20
+ return unless @start_times[operation]
21
+
22
+ duration = Time.now - @start_times[operation]
23
+ @metrics[operation] ||= []
24
+ @metrics[operation] << duration
25
+ @start_times.delete(operation)
26
+ duration
27
+ end
28
+
29
+ # Record a metric value
30
+ # @param metric [String] Metric name
31
+ # @param value [Numeric] Metric value
32
+ def record(metric, value)
33
+ @metrics[metric] ||= []
34
+ @metrics[metric] << value
35
+ end
36
+
37
+ # Get performance statistics
38
+ # @return [Hash] Performance statistics
39
+ def stats
40
+ stats = {}
41
+
42
+ @metrics.each do |metric, values|
43
+ next if values.empty?
44
+
45
+ stats[metric] = {
46
+ count: values.length,
47
+ total: values.sum.round(4),
48
+ average: (values.sum.to_f / values.length).round(4), # to_f avoids integer division for integer samples
49
+ min: values.min.round(4),
50
+ max: values.max.round(4)
51
+ }
52
+ end
53
+
54
+ stats
55
+ end
56
+
57
+ # Reset all metrics
58
+ def reset!
59
+ @metrics.clear
60
+ @start_times.clear
61
+ end
62
+
63
+ # Get current memory usage (if available)
64
+ # @return [Hash] Memory usage information
65
+ def memory_usage
66
+ if defined?(GC)
67
+ {
68
+ object_count: GC.stat[:heap_live_slots],
69
+ gc_count: GC.count,
70
+ memory_mb: (GC.stat[:heap_live_slots] * 40 / 1024.0 / 1024.0).round(2) # Rough estimate
71
+ }
72
+ else
73
+ { available: false }
74
+ end
75
+ end
76
+
77
+ # Generate performance report
78
+ # @return [String] Formatted performance report
79
+ def report
80
+ report = ["Performance Report", "=" * 50, ""]
81
+
82
+ stats.each do |metric, data|
83
+ report << "#{metric.to_s.tr('_', ' ').capitalize}:"
84
+ report << " Count: #{data[:count]}"
85
+ report << " Total: #{data[:total]}s"
86
+ report << " Average: #{data[:average]}s"
87
+ report << " Min: #{data[:min]}s"
88
+ report << " Max: #{data[:max]}s"
89
+ report << ""
90
+ end
91
+
92
+ memory = memory_usage
93
+ if memory[:available] != false
94
+ report << "Memory Usage:"
95
+ report << " Objects: #{memory[:object_count]}"
96
+ report << " GC Count: #{memory[:gc_count]}"
97
+ report << " Estimated Memory: #{memory[:memory_mb]} MB"
98
+ end
99
+
100
+ report.join("\n")
101
+ end
102
+ end
103
+
104
+ # Global performance monitor
105
+ def self.performance_monitor
106
+ @performance_monitor ||= PerformanceMonitor.new
107
+ end
108
+ end
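`start_timer` and `end_timer` above measure with `Time.now`, which can jump if the system clock is adjusted during a run. A common alternative is the monotonic clock. The sketch below shows the same start/stop/stats shape on top of `Process.clock_gettime`; `DemoTimer` is a hypothetical class, not a replacement taken from the gem.

```ruby
# Hypothetical timer using the monotonic clock instead of Time.now.
class DemoTimer
  def initialize
    @metrics = Hash.new { |hash, key| hash[key] = [] }
    @started = {}
  end

  def start(operation)
    @started[operation] = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end

  # Returns the elapsed seconds, or nil if the operation was never started.
  def stop(operation)
    began = @started.delete(operation)
    return nil unless began

    duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - began
    @metrics[operation] << duration
    duration
  end

  def record(metric, value)
    @metrics[metric] << value
  end

  def stats
    @metrics.each_with_object({}) do |(metric, values), out|
      next if values.empty?

      out[metric] = {
        count: values.length,
        # Float division so integer samples are not truncated.
        average: (values.sum.to_f / values.length).round(4),
        min: values.min,
        max: values.max
      }
    end
  end
end
```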
data/lib/legal_summariser/version.rb CHANGED
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module LegalSummariser
4
- VERSION = "0.2.0"
4
+ VERSION = "0.3.0"
5
5
  end
data/lib/legal_summariser.rb CHANGED
@@ -1,15 +1,19 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require_relative "legal_summariser/version"
4
- require_relative "legal_summariser/configuration"
5
- require_relative "legal_summariser/cache"
6
- require_relative "legal_summariser/performance_monitor"
7
- require_relative "legal_summariser/document_parser"
8
- require_relative "legal_summariser/text_extractor"
9
- require_relative "legal_summariser/summariser"
10
- require_relative "legal_summariser/clause_detector"
11
- require_relative "legal_summariser/risk_analyzer"
12
- require_relative "legal_summariser/formatter"
3
+ require_relative 'legal_summariser/version'
4
+ require_relative 'legal_summariser/text_extractor'
5
+ require_relative 'legal_summariser/summariser'
6
+ require_relative 'legal_summariser/clause_detector'
7
+ require_relative 'legal_summariser/risk_analyzer'
8
+ require_relative 'legal_summariser/formatter'
9
+ require_relative 'legal_summariser/document_parser'
10
+ require_relative 'legal_summariser/configuration'
11
+ require_relative 'legal_summariser/cache'
12
+ require_relative 'legal_summariser/performance_monitor'
13
+ require_relative 'legal_summariser/plain_language_generator'
14
+ require_relative 'legal_summariser/model_trainer'
15
+ require_relative 'legal_summariser/multilingual_processor'
16
+ require_relative 'legal_summariser/pdf_annotator'
13
17
 
14
18
  module LegalSummariser
15
19
  class Error < StandardError; end
@@ -68,12 +72,15 @@ module LegalSummariser
68
72
 
69
73
  # Format results
70
74
  result = {
75
+ file_path: file_path,
76
+ document_type: detect_document_type(text),
77
+ processing_time: monitor.end_timer(:total_analysis),
71
78
  plain_text: summary[:plain_text],
72
79
  key_points: summary[:key_points],
73
80
  clauses: clauses,
74
81
  risks: risks,
75
82
  metadata: {
76
- document_type: detect_document_type(text),
83
+ file_size: File.size(file_path),
77
84
  word_count: text_stats[:word_count],
78
85
  character_count: text_stats[:character_count],
79
86
  sentence_count: text_stats[:sentence_count],
@@ -82,7 +89,8 @@ module LegalSummariser
82
89
  extraction_time_seconds: extraction_time.round(3),
83
90
  processed_at: Time.now.strftime("%Y-%m-%dT%H:%M:%S%z"),
84
91
  gem_version: VERSION,
85
- language: configuration.language
92
+ language: configuration.language,
93
+ document_type: detect_document_type(text)
86
94
  },
87
95
  performance: monitor.stats
88
96
  }
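In 0.3.0 the result hash gains top-level `file_path`, `document_type`, and `processing_time` keys, while `document_type` also remains under `metadata`. Code that has to read results produced by either version could use a fallback like the sketch below; `document_type_of` is a hypothetical helper written for illustration.

```ruby
# Hypothetical helper: prefer the 0.3.0 top-level key, fall back to metadata.
def document_type_of(result)
  result[:document_type] || result.dig(:metadata, :document_type)
end
```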
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: legal_summariser
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.0
4
+ version: 0.3.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Legal Summariser Team
@@ -162,14 +162,21 @@ extra_rdoc_files: []
162
162
  files:
163
163
  - ".rspec"
164
164
  - CHANGELOG.md
165
+ - CONTRIBUTING.md
165
166
  - Gemfile
166
167
  - README.md
167
168
  - Rakefile
169
+ - examples/advanced_configuration.rb
170
+ - examples/basic_usage.rb
171
+ - examples/batch_processing.rb
168
172
  - exe/legal_summariser
169
173
  - lib/legal_summariser.rb
174
+ - lib/legal_summariser/cache.rb
170
175
  - lib/legal_summariser/clause_detector.rb
176
+ - lib/legal_summariser/configuration.rb
171
177
  - lib/legal_summariser/document_parser.rb
172
178
  - lib/legal_summariser/formatter.rb
179
+ - lib/legal_summariser/performance_monitor.rb
173
180
  - lib/legal_summariser/risk_analyzer.rb
174
181
  - lib/legal_summariser/summariser.rb
175
182
  - lib/legal_summariser/text_extractor.rb