legal_summariser 0.1.0 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 2a32e0da3e5422be003d79a333a6f3ea9417fadcc362164e3cef9cae0d84dafb
- data.tar.gz: 3219d6167c936a2f056f43b5e2491bc4c67a697ef9d72169c7741be03f5a2726
+ metadata.gz: 71b638897796c0db2653eefb9456f1fdc821c3c91b3320f6aecce4430e750019
+ data.tar.gz: 1d915044c3946c8a34656af00d22d9a9d91d06f0258b74fbf870164f72b1104f
  SHA512:
- metadata.gz: 9481e9eb32d6770586b21f8c56ced7f37d99afe8c9ba162fd284cc086b8f02f71b042bef0200bd61104446c0309763da7c362a3e5abae202ccf295c04ef63281
- data.tar.gz: c41d771b2ef842b185ebf0114de4921060ad6e55a17377acfc47412790237428fcab8a5ceff3efe2112b3341a9380a30160244c21b0b078eecc108181e9d4ce8
+ metadata.gz: 339c0f2674c8509e5ffed3da42eaffde1d151c535bcb454d4bba7dfa7d9b060636b31afd66b29b405ea73b01e5067280c488661e902944e6e38778d8288854be
+ data.tar.gz: dcb9f001f384c872380c4d42bf6ebd9c079156e14131d4dd620af185226c71d7d79e7ea95a251c0aa7e81834ee0534704484fbff5e4ae0e908b8344f61add1f2
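The hunk above only swaps digest values. As a rough sketch of how such a value could be checked locally, the Ruby below compares a file's SHA256 with an expected hex string using the standard `digest` library. The helper name `checksum_matches?` is an illustrative assumption and is not part of the gem or of RubyGems.

```ruby
require 'digest'

# Hypothetical helper: true when the file's SHA256 hex digest equals the
# expected value (compared case-insensitively, ignoring surrounding whitespace).
def checksum_matches?(path, expected_sha256)
  Digest::SHA256.file(path).hexdigest == expected_sha256.strip.downcase
end
```

A caller would pass the path of an unpacked `metadata.gz` or `data.tar.gz` together with the matching value from `checksums.yaml`.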
data/CHANGELOG.md CHANGED
@@ -5,6 +5,54 @@ All notable changes to this project will be documented in this file.
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+ ## [0.3.0] - 2025-01-09
+
+ ### Added
+ - **Plain Language Generator**: AI-powered legal text simplification with fine-tuned models
+ - **Model Training System**: Complete training pipeline for custom legal language models
+ - **Advanced Multilingual Support**: Enhanced processing for 8 languages with cultural adaptations
+ - **PDF Annotation System**: Rich PDF output with highlighting, comments, and risk indicators
+ - **AI/ML Integration**: Support for external AI APIs and local model training
+ - **Advanced NLP Features**: Readability scoring, complexity reduction metrics, and text analysis
+
+ ### Enhanced
+ - **Legal Text Processing**: 30+ legal jargon mappings and sentence pattern simplification
+ - **Cross-Language Translation**: Legal term mapping across multiple languages
+ - **Document Analysis**: Enhanced with plain language generation and multilingual processing
+ - **Performance Monitoring**: Extended metrics for new AI/ML operations
+ - **Error Handling**: Comprehensive error management for AI operations and model training
+
+ ### Technical Improvements
+ - **Model Architecture**: Pattern-based, statistical, and neural model support
+ - **Caching System**: Extended for translation and model results
+ - **API Integration**: Support for external translation and AI services
+ - **Cultural Adaptations**: Legal system-specific processing for different countries
+
+ ## [0.2.0] - 2025-01-09
+
+ ### Added
+ - **Configuration System**: Comprehensive configuration management with validation
+ - **Caching System**: Result caching with TTL and size management
+ - **Performance Monitoring**: Built-in performance tracking and metrics
+ - **Enhanced CLI**: New commands for batch processing, statistics, and configuration
+ - **Batch Processing**: Process multiple documents simultaneously
+ - **Enhanced Document Support**: Added RTF support and improved text extraction
+ - **Advanced Error Handling**: Better error messages and recovery mechanisms
+ - **Comprehensive Testing**: 75 test cases with full coverage
+ - **Documentation**: Complete examples and contribution guidelines
+
+ ### Enhanced
+ - **Text Extraction**: Multiple encoding support, better PDF/DOCX handling
+ - **Document Type Detection**: Improved scoring system for 9 document types
+ - **Risk Analysis**: More comprehensive risk patterns and compliance checking
+ - **Summarization**: Better plain English conversion and key point extraction
+ - **CLI Interface**: Verbose logging, caching options, and performance stats
+
+ ### Fixed
+ - Text cleaning and normalization issues
+ - Memory leaks in document processing
+ - Error handling for edge cases
+
  ## [0.1.0] - 2024-09-09
 
  ### Added
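The 0.2.0 entry mentions result caching "with TTL and size management" without showing how it works. Purely as an illustration of that idea, here is a minimal in-memory sketch. The class name `TinyTtlCache`, its keyword arguments, and the injectable `clock` are all hypothetical, and the gem's own `Cache` class (which the examples suggest is directory-backed) may work quite differently.

```ruby
# Hypothetical sketch of a TTL + size-bounded cache; not the gem's Cache class.
class TinyTtlCache
  def initialize(ttl_seconds:, max_entries:, clock: -> { Time.now.to_f })
    @ttl = ttl_seconds
    @max = max_entries
    @clock = clock # injectable so expiry can be exercised without sleeping
    @store = {}    # Ruby hashes keep insertion order, oldest entry first
  end

  def write(key, value)
    @store.delete(key) # re-inserting moves the key to the newest position
    @store[key] = [value, @clock.call + @ttl]
    @store.shift while @store.size > @max # evict oldest entries over the limit
    value
  end

  def read(key)
    value, expires_at = @store[key]
    return nil if expires_at.nil? # key was never written or already evicted

    if @clock.call >= expires_at
      @store.delete(key) # drop the expired entry lazily on read
      return nil
    end
    value
  end
end
```

Eviction here is by insertion order only. A production cache would likely also track hits and persist entries, which this sketch leaves out.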
data/CONTRIBUTING.md ADDED
@@ -0,0 +1,231 @@
+ # Contributing to Legal Summariser
+
+ Thank you for your interest in contributing to Legal Summariser! This document provides guidelines for contributing to the project.
+
+ ## 🚀 Getting Started
+
+ ### Prerequisites
+
+ - Ruby 2.6 or higher
+ - Bundler gem
+ - Git
+
+ ### Development Setup
+
+ 1. Fork the repository
+ 2. Clone your fork:
+ ```bash
+ git clone https://github.com/your-username/legal_summariser.git
+ cd legal_summariser
+ ```
+
+ 3. Install dependencies:
+ ```bash
+ bundle install
+ ```
+
+ 4. Run tests to ensure everything works:
+ ```bash
+ bundle exec rspec
+ ```
+
+ ## 🧪 Testing
+
+ We maintain comprehensive test coverage. Please ensure all tests pass before submitting a PR:
+
+ ```bash
+ # Run all tests
+ bundle exec rspec
+
+ # Run tests with documentation-style output
+ bundle exec rspec --format documentation
+
+ # Run specific test file
+ bundle exec rspec spec/text_extractor_spec.rb
+
+ # Run linter
+ bundle exec rubocop
+ ```
+
+ ### Writing Tests
+
+ - Write tests for all new functionality
+ - Follow existing test patterns and naming conventions
+ - Use descriptive test names that explain the expected behavior
+ - Include both positive and negative test cases
+ - Test edge cases and error conditions
+
+ ## 📝 Code Style
+
+ We follow Ruby community standards:
+
+ - Use 2 spaces for indentation
+ - Keep lines under 120 characters
+ - Use descriptive variable and method names
+ - Add comments for complex logic
+ - Follow RuboCop guidelines
+
+ Run the linter before submitting:
+ ```bash
+ bundle exec rubocop
+ ```
+
+ ## 🔧 Development Guidelines
+
+ ### Architecture
+
+ The gem follows a modular architecture:
+
+ - `TextExtractor`: Document parsing and text extraction
+ - `Summariser`: Text summarization and key point extraction
+ - `ClauseDetector`: Legal clause identification
+ - `RiskAnalyzer`: Risk assessment and compliance checking
+ - `Formatter`: Output formatting (JSON, Markdown, Text)
+ - `Cache`: Result caching system
+ - `PerformanceMonitor`: Performance tracking
+ - `Configuration`: Gem configuration management
+
+ ### Adding New Features
+
+ 1. **Create an issue** describing the feature
+ 2. **Write tests** for the new functionality
+ 3. **Implement the feature** following existing patterns
+ 4. **Update documentation** including README and code comments
+ 5. **Add examples** if applicable
+ 6. **Ensure all tests pass**
+
+ ### Adding New Document Types
+
+ To add support for a new document type:
+
+ 1. Add detection patterns in `detect_document_type` method
+ 2. Update supported formats documentation
+ 3. Add test cases for the new format
+ 4. Update CLI help text if needed
+
+ ### Adding New Risk Patterns
+
+ To add new risk detection patterns:
+
+ 1. Add patterns to `RiskAnalyzer` class
+ 2. Include severity levels and recommendations
+ 3. Add corresponding test cases
+ 4. Update documentation
+
+ ## 📚 Documentation
+
+ - Update README.md for user-facing changes
+ - Add inline documentation for new methods
+ - Include examples for new features
+ - Update CHANGELOG.md following semantic versioning
+
+ ## 🐛 Bug Reports
+
+ When reporting bugs, please include:
+
+ - Ruby version
+ - Gem version
+ - Operating system
+ - Steps to reproduce
+ - Expected vs actual behavior
+ - Sample files (if applicable and not confidential)
+
+ ## 💡 Feature Requests
+
+ Feature requests should include:
+
+ - Clear description of the feature
+ - Use case and motivation
+ - Proposed implementation approach
+ - Potential impact on existing functionality
+
+ ## 🔄 Pull Request Process
+
+ 1. **Create a feature branch** from `main`:
+ ```bash
+ git checkout -b feature/your-feature-name
+ ```
+
+ 2. **Make your changes** following the guidelines above
+
+ 3. **Commit with descriptive messages**:
+ ```bash
+ git commit -m "Add support for new document format"
+ ```
+
+ 4. **Push to your fork**:
+ ```bash
+ git push origin feature/your-feature-name
+ ```
+
+ 5. **Create a Pull Request** with:
+ - Clear title and description
+ - Reference to related issues
+ - List of changes made
+ - Test results
+
+ ### PR Requirements
+
+ - [ ] All tests pass
+ - [ ] Code follows style guidelines
+ - [ ] Documentation is updated
+ - [ ] CHANGELOG.md is updated
+ - [ ] No breaking changes (or clearly documented)
+
+ ## 🏷️ Release Process
+
+ Releases follow semantic versioning:
+
+ - **MAJOR**: Breaking changes
+ - **MINOR**: New features (backward compatible)
+ - **PATCH**: Bug fixes (backward compatible)
+
+ ## 🤝 Code of Conduct
+
+ - Be respectful and inclusive
+ - Focus on constructive feedback
+ - Help others learn and grow
+ - Maintain professionalism
+
+ ## 📞 Getting Help
+
+ - Create an issue for bugs or feature requests
+ - Join discussions in existing issues
+ - Contact maintainers for questions
+
+ ## 🎯 Areas for Contribution
+
+ We welcome contributions in these areas:
+
+ ### High Priority
+ - Additional document format support (ODT, RTF, HTML)
+ - Enhanced clause detection patterns
+ - Multi-language support improvements
+ - Performance optimizations
+
+ ### Medium Priority
+ - Additional risk assessment rules
+ - Better error handling and recovery
+ - Enhanced caching strategies
+ - CLI improvements
+
+ ### Documentation
+ - More usage examples
+ - Video tutorials
+ - API documentation improvements
+ - Translation to other languages
+
+ ## 🙏 Recognition
+
+ Contributors will be:
+ - Listed in the README.md
+ - Mentioned in release notes
+ - Given credit in commit messages
+
+ ## 📄 License
+
+ By contributing, you agree that your contributions will be licensed under the MIT License.
+
+ ---
+
+ Thank you for contributing to Legal Summariser! Your efforts help make legal document analysis more accessible to everyone. 🚀
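The "Adding New Risk Patterns" steps in CONTRIBUTING.md ask for patterns that carry severity levels and recommendations. This diff does not show how `RiskAnalyzer` stores them, so the following is only a sketch of one possible shape. The constant `RISK_PATTERNS`, the method `detect_risks`, and the two sample patterns are hypothetical. The `:type`, `:description`, and `:recommendation` keys mirror the ones read by the example scripts.

```ruby
# Hypothetical pattern table; the real RiskAnalyzer's structure is not shown in this diff.
RISK_PATTERNS = [
  {
    type: 'unlimited_liability',
    regex: /unlimited\s+liability/i,
    severity: :high,
    description: 'Clause may expose a party to unlimited liability',
    recommendation: 'Negotiate a liability cap'
  },
  {
    type: 'auto_renewal',
    regex: /automatically\s+renew/i,
    severity: :medium,
    description: 'Contract renews without explicit consent',
    recommendation: 'Check the notice period for opting out'
  }
].freeze

# Returns one hash per matching pattern, without the internal :regex key.
def detect_risks(text)
  RISK_PATTERNS
    .select { |pattern| text.match?(pattern[:regex]) }
    .map { |pattern| pattern.reject { |key, _| key == :regex } }
end
```

Keeping each pattern as plain data makes the matching test cases from step 3 straightforward: one positive and one negative sample sentence per entry.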
@@ -0,0 +1,195 @@
+ #!/usr/bin/env ruby
+ # frozen_string_literal: true
+
+ # Example: Advanced configuration and customization
+ require 'legal_summariser'
+ require 'logger'
+ require 'json' # JSON.pretty_generate is used in the workflow example below
+ require 'time' # Time#iso8601 is used when building the custom report
+
+ puts "=== Advanced Legal Summariser Configuration ==="
+
+ # Example 1: Custom logging configuration
+ puts "\n1. Custom Logging Setup"
+ custom_logger = Logger.new('legal_analysis.log')
+ custom_logger.level = Logger::DEBUG
+ custom_logger.formatter = proc do |severity, datetime, progname, msg|
+   "[#{datetime.strftime('%Y-%m-%d %H:%M:%S')}] #{severity}: #{msg}\n"
+ end
+
+ LegalSummariser.configure do |config|
+   config.logger = custom_logger
+   config.language = 'en'
+   config.max_file_size = 20 * 1024 * 1024 # 20MB
+   config.timeout = 60 # 60 seconds
+   config.enable_caching = true
+   config.cache_dir = './custom_cache'
+ end
+
+ puts "Configuration applied successfully!"
+
+ # Example 2: Multi-language support
+ puts "\n2. Multi-language Configuration"
+ LegalSummariser.configure do |config|
+   config.language = 'tr' # Turkish
+ end
+
+ puts "Language set to Turkish (TR)"
+ puts "Supported languages: #{LegalSummariser.configuration.supported_languages.join(', ')}"
+
+ # Example 3: Performance monitoring
+ puts "\n3. Performance Monitoring"
+ monitor = LegalSummariser.performance_monitor
+
+ # Simulate some operations for demonstration
+ monitor.start_timer(:demo_operation)
+ sleep(0.1) # Simulate work
+ monitor.end_timer(:demo_operation)
+
+ monitor.record(:demo_metric, 42.5)
+ monitor.record(:demo_metric, 38.2)
+
+ puts "Performance Report:"
+ puts monitor.report
+
+ # Example 4: Cache management
+ puts "\n4. Cache Management"
+ cache = LegalSummariser::Cache.new
+
+ # Show cache statistics
+ cache_stats = cache.stats
+ puts "Cache Status: #{cache_stats[:enabled] ? 'Enabled' : 'Disabled'}"
+
+ if cache_stats[:enabled]
+   puts "Cache Directory: #{cache_stats[:cache_dir]}"
+   puts "Cached Files: #{cache_stats[:file_count]}"
+   puts "Cache Size: #{cache_stats[:total_size_mb]} MB"
+ end
+
+ # Example 5: Error handling and validation
+ puts "\n5. Configuration Validation"
+ begin
+   LegalSummariser.configure do |config|
+     config.language = 'invalid_language'
+   end
+ rescue LegalSummariser::Error => e
+   puts "Configuration error caught: #{e.message}"
+ end
+
+ # Reset to valid configuration
+ LegalSummariser.configure do |config|
+   config.language = 'en'
+ end
+
+ # Example 6: Custom analysis workflow
+ puts "\n6. Custom Analysis Workflow"
+ def analyze_with_custom_workflow(file_path)
+   puts "Starting custom analysis workflow for: #{file_path}"
+
+   # Start performance monitoring
+   monitor = LegalSummariser.performance_monitor
+   monitor.start_timer(:custom_workflow)
+
+   begin
+     # Step 1: Basic analysis
+     puts "Step 1: Performing basic analysis..."
+     result = LegalSummariser.summarise(file_path)
+
+     # Step 2: Custom risk assessment
+     puts "Step 2: Custom risk assessment..."
+     risk_score = result[:risks][:risk_score][:score]
+
+     custom_risk_level = case risk_score
+                         when 0..5 then 'Very Low'
+                         when 6..15 then 'Low'
+                         when 16..30 then 'Medium'
+                         when 31..50 then 'High'
+                         else 'Critical'
+                         end
+
+     # Step 3: Generate custom report
+     puts "Step 3: Generating custom report..."
+     custom_report = {
+       file_path: file_path,
+       analysis_timestamp: Time.now.iso8601,
+       document_info: {
+         type: result[:metadata][:document_type],
+         word_count: result[:metadata][:word_count],
+         processing_time: result[:metadata][:extraction_time_seconds]
+       },
+       summary: result[:plain_text],
+       risk_assessment: {
+         standard_score: risk_score,
+         custom_level: custom_risk_level,
+         high_priority_issues: result[:risks][:high_risks].length,
+         compliance_gaps: result[:risks][:compliance_gaps].length
+       },
+       recommendations: generate_custom_recommendations(result)
+     }
+
+     workflow_time = monitor.end_timer(:custom_workflow)
+     custom_report[:workflow_time_seconds] = workflow_time.round(3)
+
+     puts "Custom workflow completed in #{workflow_time.round(3)}s"
+     return custom_report
+
+   rescue => e
+     monitor.end_timer(:custom_workflow)
+     puts "Workflow failed: #{e.message}"
+     return nil
+   end
+ end
+
+ def generate_custom_recommendations(analysis_result)
+   recommendations = []
+
+   # Risk-based recommendations
+   high_risks = analysis_result[:risks][:high_risks]
+   if high_risks.any?
+     recommendations << "URGENT: Address #{high_risks.length} high-risk issues before signing"
+     high_risks.each { |risk| recommendations << "- #{risk[:recommendation]}" }
+   end
+
+   # Compliance recommendations
+   compliance_gaps = analysis_result[:risks][:compliance_gaps]
+   if compliance_gaps.any?
+     recommendations << "COMPLIANCE: Review #{compliance_gaps.length} regulatory gaps"
+     compliance_gaps.each { |gap| recommendations << "- #{gap[:recommendation]}" }
+   end
+
+   # Document type specific recommendations
+   doc_type = analysis_result[:metadata][:document_type]
+   case doc_type
+   when 'nda'
+     recommendations << "NDA: Verify confidentiality scope and duration"
+   when 'employment_contract'
+     recommendations << "EMPLOYMENT: Check termination clauses and benefits"
+   when 'service_agreement'
+     recommendations << "SERVICE: Review deliverables and payment terms"
+   end
+
+   recommendations
+ end
+
+ # Example usage of custom workflow
+ puts "\n7. Custom Workflow Example"
+ # Replace with actual file path
+ sample_file = '/tmp/sample_contract.txt'
+ File.write(sample_file, "Sample contract content for demonstration purposes.")
+
+ custom_result = analyze_with_custom_workflow(sample_file)
+ if custom_result
+   puts "\nCustom Analysis Result:"
+   puts JSON.pretty_generate(custom_result)
+ end
+
+ # Cleanup
+ File.delete(sample_file) if File.exist?(sample_file)
+
+ # Example 8: System statistics and monitoring
+ puts "\n8. System Statistics"
+ system_stats = LegalSummariser.stats
+ puts "System Performance Overview:"
+ puts "- Performance Metrics: #{system_stats[:performance].keys.join(', ')}"
+ puts "- Cache Status: #{system_stats[:cache][:enabled] ? 'Active' : 'Inactive'}"
+ puts "- Memory Usage: #{system_stats[:memory][:memory_mb]} MB" if system_stats[:memory][:available]
+
+ puts "\nAdvanced configuration examples completed!"
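The `case` in Step 2 of the workflow above matches integer ranges (`0..5`, `6..15`, and so on). Whether `risk_score` can be fractional is not stated in this diff. If it can (an assumption), a score such as 5.5 matches none of the ranges and falls through to `'Critical'`. The following sketch uses upper-bound comparisons so that there are no gaps. The method name `custom_risk_level` is illustrative, and the thresholds are copied from the example.

```ruby
# Hypothetical gap-free variant of the Step 2 bucketing; thresholds taken from the example.
def custom_risk_level(score)
  if score <= 5 then 'Very Low'
  elsif score <= 15 then 'Low'
  elsif score <= 30 then 'Medium'
  elsif score <= 50 then 'High'
  else 'Critical'
  end
end
```

For integer scores this returns the same labels as the original ranges (treating negative scores as 'Very Low'), so it could replace the `case` without changing existing reports.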
@@ -0,0 +1,101 @@
+ #!/usr/bin/env ruby
+ # frozen_string_literal: true
+
+ # Example: Basic usage of Legal Summariser gem
+ require 'legal_summariser'
+
+ # Configure the gem (optional)
+ LegalSummariser.configure do |config|
+   config.language = 'en'
+   config.enable_caching = true
+   config.max_file_size = 10 * 1024 * 1024 # 10MB
+ end
+
+ # Example 1: Basic document analysis
+ puts "=== Basic Document Analysis ==="
+ begin
+   # Analyze a document (replace with your actual file path)
+   result = LegalSummariser.summarise('sample_contract.pdf')
+
+   puts "Document Type: #{result[:metadata][:document_type]}"
+   puts "Word Count: #{result[:metadata][:word_count]}"
+   puts "\nSummary:"
+   puts result[:plain_text]
+
+   puts "\nKey Points:"
+   result[:key_points].each_with_index do |point, index|
+     puts "#{index + 1}. #{point}"
+   end
+
+ rescue LegalSummariser::DocumentNotFoundError => e
+   puts "Error: #{e.message}"
+ rescue LegalSummariser::UnsupportedFormatError => e
+   puts "Error: #{e.message}"
+ end
+
+ # Example 2: Analysis with custom options
+ puts "\n=== Custom Analysis Options ==="
+ options = {
+   max_sentences: 3,
+   format: 'markdown'
+ }
+
+ begin
+   result = LegalSummariser.summarise('sample_contract.pdf', options)
+   puts result
+ rescue => e
+   puts "Error: #{e.message}"
+ end
+
+ # Example 3: Risk analysis focus
+ puts "\n=== Risk Analysis ==="
+ begin
+   result = LegalSummariser.summarise('sample_contract.pdf')
+
+   risks = result[:risks]
+   puts "Overall Risk Level: #{risks[:risk_score][:level].upcase}"
+   puts "Risk Score: #{risks[:risk_score][:score]}"
+
+   if risks[:high_risks].any?
+     puts "\nHigh Risks Found:"
+     risks[:high_risks].each do |risk|
+       puts "- #{risk[:type]}: #{risk[:description]}"
+       puts " Recommendation: #{risk[:recommendation]}"
+     end
+   end
+
+   if risks[:compliance_gaps].any?
+     puts "\nCompliance Gaps:"
+     risks[:compliance_gaps].each do |gap|
+       puts "- #{gap[:type]} (#{gap[:regulation]}): #{gap[:description]}"
+     end
+   end
+
+ rescue => e
+   puts "Error: #{e.message}"
+ end
+
+ # Example 4: Clause detection
+ puts "\n=== Clause Detection ==="
+ begin
+   result = LegalSummariser.summarise('sample_contract.pdf')
+
+   result[:clauses].each do |clause_type, clauses|
+     next if clauses.empty?
+
+     puts "\n#{clause_type.to_s.split('_').map(&:capitalize).join(' ')} Clauses:"
+     clauses.each_with_index do |clause, index|
+       puts "#{index + 1}. #{clause[:content][0..100]}..."
+     end
+   end
+
+ rescue => e
+   puts "Error: #{e.message}"
+ end
+
+ # Example 5: Performance monitoring
+ puts "\n=== Performance Statistics ==="
+ stats = LegalSummariser.stats
+ puts "Performance: #{stats[:performance]}"
+ puts "Cache: #{stats[:cache]}"
+ puts "Memory: #{stats[:memory]}"
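In Example 4 above, `clause[:content][0..100]` keeps up to 101 characters, and the trailing `...` is printed even when the clause is shorter than that. A small helper could make the preview exact. This is only a sketch: the name `preview` and the default limit are assumptions and are not part of the gem.

```ruby
# Hypothetical helper: returns text unchanged when it fits within the limit,
# otherwise the first `limit` characters followed by an ellipsis.
def preview(text, limit = 100)
  return text if text.length <= limit

  "#{text[0, limit]}..."
end
```

With this, the loop body in Example 4 could print `preview(clause[:content])` instead of slicing inline.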