legal_summariser 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+   metadata.gz: 2a32e0da3e5422be003d79a333a6f3ea9417fadcc362164e3cef9cae0d84dafb
+   data.tar.gz: 3219d6167c936a2f056f43b5e2491bc4c67a697ef9d72169c7741be03f5a2726
+ SHA512:
+   metadata.gz: 9481e9eb32d6770586b21f8c56ced7f37d99afe8c9ba162fd284cc086b8f02f71b042bef0200bd61104446c0309763da7c362a3e5abae202ccf295c04ef63281
+   data.tar.gz: c41d771b2ef842b185ebf0114de4921060ad6e55a17377acfc47412790237428fcab8a5ceff3efe2112b3341a9380a30160244c21b0b078eecc108181e9d4ce8
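These digests cover the archives packed inside the `.gem` file, which is a plain tar holding `metadata.gz`, `data.tar.gz` and a gzipped copy of this `checksums.yaml`. A minimal sketch for re-checking the SHA256 entries with Ruby's standard library, assuming that standard layout; `verify_gem_checksums` is a hypothetical helper, not part of this gem.

```ruby
require "digest"
require "rubygems/package"
require "yaml"
require "zlib"

# Hypothetical helper: recompute the SHA256 digest of every member listed in
# checksums.yaml.gz and compare it with the recorded value.
# Returns a Hash of member name => true (digest matches) or false.
def verify_gem_checksums(io)
  members = {}
  Gem::Package::TarReader.new(io) do |tar|
    # Read every tar member into memory; .gem files are small enough for this
    tar.each { |entry| members[entry.full_name] = entry.read.to_s }
  end

  yaml = Zlib.gunzip(members.fetch("checksums.yaml.gz")).force_encoding(Encoding::UTF_8)
  YAML.safe_load(yaml).fetch("SHA256").to_h do |name, digest|
    [name, Digest::SHA256.hexdigest(members.fetch(name, "")) == digest]
  end
end
```

Usage would look like `File.open("legal_summariser-0.1.0.gem", "rb") { |io| verify_gem_checksums(io) }`, which should return `true` for both `metadata.gz` and `data.tar.gz` on an untampered download.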
data/.rspec ADDED
@@ -0,0 +1,3 @@
+ --format documentation
+ --color
+ --require spec_helper
data/CHANGELOG.md ADDED
@@ -0,0 +1,46 @@
+ # Changelog
+
+ All notable changes to this project will be documented in this file.
+
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+ ## [0.1.0] - 2024-09-09
+
+ ### Added
+ - Initial release of Legal Summariser
+ - Text extraction from PDF, DOCX, and TXT files
+ - Basic legal document summarisation with plain English conversion
+ - Clause detection for 8 key legal areas:
+   - Data Processing & Privacy
+   - Liability & Indemnification
+   - Confidentiality & Non-disclosure
+   - Termination & Cancellation
+   - Payment & Fees
+   - Intellectual Property
+   - Dispute Resolution
+   - Governing Law
+ - Risk analysis system with high/medium risk detection
+ - Compliance gap identification for GDPR and KVKK
+ - Unfair terms detection
+ - Multiple output formats (JSON, Markdown, Plain Text)
+ - Command-line interface with Thor
+ - Comprehensive test suite with RSpec
+ - Document type auto-detection
+ - Offline processing capabilities
+
+ ### Features
+ - Rule-based clause extraction using regex patterns
+ - Smart sentence scoring for summarisation
+ - Legal language simplification
+ - Risk scoring algorithm
+ - Compliance checking framework
+ - Multi-format document support
+ - CLI demo mode with sample NDA
+
+ ### Technical
+ - Ruby gem structure with proper gemspec
+ - Modular architecture with separate classes for each function
+ - Error handling for unsupported formats and missing files
+ - Text cleaning and normalization
+ - Comprehensive documentation and examples
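The changelog credits summarisation to "smart sentence scoring", but the `Summariser` class is not part of this diff, so its actual weighting is unknown. As a hedged illustration of the general idea only, the sketch below ranks sentences by the average document frequency of their non-stopword terms and keeps the top `max_sentences` in their original order; `top_sentences` and the stopword list are hypothetical.

```ruby
# Hypothetical frequency-based extractive summariser (not the gem's Summariser).
STOPWORDS = %w[the a an and or of to in on for by with this that shall be is are any all].freeze

def top_sentences(text, max_sentences: 3)
  sentences = text.split(/(?<=[.!?])\s+/).map(&:strip).reject(&:empty?)

  # Document-wide term frequencies, ignoring stopwords
  freq = Hash.new(0)
  (text.downcase.scan(/[a-z]+/) - STOPWORDS).each { |word| freq[word] += 1 }

  scored = sentences.each_with_index.map do |sentence, index|
    terms = sentence.downcase.scan(/[a-z]+/) - STOPWORDS
    # Average rather than total frequency, so long sentences are not automatically favoured
    score = terms.empty? ? 0.0 : terms.sum { |term| freq[term] }.fdiv(terms.size)
    [score, index, sentence]
  end

  # Highest score first (the earlier sentence wins ties), then restore document order
  scored.max_by(max_sentences) { |score, index, _| [score, -index] }
        .sort_by { |_, index, _| index }
        .map(&:last)
end
```

Averaging keeps a single long boilerplate sentence from crowding out shorter obligation clauses; a production summariser would likely add legal-specific weights on top.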
data/Gemfile ADDED
@@ -0,0 +1,6 @@
+ # frozen_string_literal: true
+
+ source "https://rubygems.org"
+
+ # Specify your gem's dependencies in legal_summariser.gemspec
+ gemspec
data/README.md ADDED
@@ -0,0 +1,281 @@
+ # Legal Summariser 📋⚖️
+
+ > A Ruby-based AI-powered toolkit for legal document analysis that summarises contracts, extracts key clauses, flags risks, and translates legal jargon into plain English while preserving legal accuracy.
+
+ **Created by [Ahmet KAHRAMAN](https://ahmetxhero.web.app)** 👨‍💻
+
+ [![Ruby](https://img.shields.io/badge/Ruby-2.6+-red.svg)](https://ruby-lang.org)
+ [![License](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
+ [![Tests](https://img.shields.io/badge/Tests-26%20passing-green.svg)](#development)
+
+ ---
+
+ ## 👋 About the Author
+
+ **Ahmet KAHRAMAN** - Mobile Developer & Cyber Security Expert
+
+ - 🌐 **Portfolio**: [ahmetxhero.web.app](https://ahmetxhero.web.app)
+ - 🎥 **YouTube**: [@ahmetxhero](https://youtube.com/@ahmetxhero)
+ - 💼 **LinkedIn**: [linkedin.com/in/ahmetxhero](https://linkedin.com/in/ahmetxhero)
+ - 🐤 **Twitter**: [@ahmetxhero](https://x.com/ahmetxhero)
+ - 📧 **Email**: ahmetxhero@gmail.com
+ - 🏠 **Location**: Ankara, Turkey 🇹🇷
+
+ *"Security first, innovation always"* - Building secure, innovative solutions for a better digital future 🚀
+
+ ---
+
+ ## Features
+
+ - **Document Processing**: Supports PDF, DOCX, and plain text files
+ - **Smart Summarisation**: Converts legal documents into concise plain English
+ - **Clause Detection**: Automatically identifies key legal clauses including:
+   - Data Processing & Privacy (GDPR/KVKK compliance)
+   - Liability & Indemnification
+   - Confidentiality & Non-disclosure
+   - Termination & Cancellation
+   - Payment & Fees
+   - Intellectual Property
+   - Dispute Resolution
+   - Governing Law
+ - **Risk Analysis**: Flags potential legal risks and unfair terms
+ - **Compliance Checking**: Identifies gaps in GDPR, KVKK, and other regulations
+ - **Multiple Output Formats**: JSON, Markdown, and plain text
+ - **CLI**: Command-line tool for batch processing
+ - **Offline Processing**: Works without internet for sensitive documents
+
+ ## Installation
+
+ Add this line to your application's Gemfile:
+
+ ```ruby
+ gem 'legal_summariser'
+ ```
+
+ And then execute:
+
+ ```bash
+ bundle install
+ ```
+
+ Or install it yourself as:
+
+ ```bash
+ gem install legal_summariser
+ ```
+
+ ## Usage
+
+ ### Ruby API
+
+ ```ruby
+ require "legal_summariser"
+
+ # Basic usage
+ summary = LegalSummariser.summarise("contracts/nda.pdf")
+ puts summary[:plain_text]
+ # => "This Non-Disclosure Agreement establishes confidentiality, valid for 2 years. The company may terminate at any time..."
+
+ # With options
+ result = LegalSummariser.summarise("contract.pdf", {
+   format: 'markdown',
+   max_sentences: 3
+ })
+
+ # Access different parts of the analysis
+ puts result[:key_points] # Key contract points
+ puts result[:clauses]    # Detected legal clauses
+ puts result[:risks]      # Risk analysis
+ puts result[:metadata]   # Document metadata
+ ```
+
+ ### Command Line Interface
+
+ ```bash
+ # Analyze a document
+ legal_summariser analyze contract.pdf
+
+ # Specify output format
+ legal_summariser analyze contract.pdf --format markdown
+
+ # Save to file
+ legal_summariser analyze contract.pdf --output summary.md --format markdown
+
+ # Run demo
+ legal_summariser demo
+
+ # Show supported formats
+ legal_summariser supported_formats
+
+ # Show version
+ legal_summariser version
+ ```
+
+ ## Example Output
+
+ ### Plain Text Summary
+ ```
+ This Non-Disclosure Agreement establishes confidentiality obligations between parties.
+ The agreement will remain valid for 2 years from the date of signing. Either party may
+ terminate with 30 days written notice. The receiving party will be liable for any
+ breach of confidentiality obligations.
+ ```
+
+ ### Risk Analysis
+ ```
+ High Risks Found:
+ - Unlimited Liability: Agreement may expose party to unlimited financial liability
+ - Broad Indemnification: Very broad indemnification obligations that could be costly
+
+ Compliance Gaps:
+ - Missing GDPR Reference: Document processes personal data but lacks GDPR compliance language
+ - Missing Data Subject Rights: No mention of data subject rights under GDPR
+ ```
+
+ ### Detected Clauses
+ - **Confidentiality**: 3 clauses found
+ - **Liability**: 2 clauses found
+ - **Termination**: 1 clause found
+ - **Data Processing**: 2 clauses found
+
+ ## Supported Document Types
+
+ The system automatically detects and optimizes analysis for:
+
+ - **Non-Disclosure Agreements (NDAs)**
+ - **Service Agreements**
+ - **Employment Contracts**
+ - **Privacy Policies**
+ - **License Agreements**
+ - **General Contracts**
+
+ ## Supported File Formats
+
+ ### Input Formats
+ - PDF (.pdf)
+ - Microsoft Word (.docx)
+ - Plain Text (.txt)
+
+ ### Output Formats
+ - JSON (structured data)
+ - Markdown (formatted report)
+ - Plain Text (simple summary)
+
+ ## Target Users
+
+ - **Law firms & compliance teams**: Faster contract reviews
+ - **Startups & SMEs**: Understanding investor or supplier contracts
+ - **Forensics experts**: Extracting critical legal clauses for reports
+ - **Academics & NGOs**: Analysing legal policies and regulations
+
+ ## Technical Architecture
+
+ ### Hybrid Approach
+ - **Rule-based extractors**: For structured clause detection
+ - **NLP processing**: For summarisation and risk detection
+ - **Pattern matching**: For compliance gap identification
+
+ ### Key Components
+ - **TextExtractor**: Multi-format document parsing
+ - **Summariser**: Plain English conversion engine
+ - **ClauseDetector**: Legal clause identification
+ - **RiskAnalyzer**: Risk assessment and flagging
+ - **Formatter**: Multi-format output generation
+
+ ## Development
+
+ After checking out the repo, run `bundle install` to install dependencies. Then, run `rake spec` to run the tests.
+
+ ```bash
+ # Install dependencies
+ bundle install
+
+ # Run tests
+ bundle exec rspec
+
+ # Run linter
+ bundle exec rubocop
+
+ # Build gem
+ gem build legal_summariser.gemspec
+
+ # Install local gem
+ gem install ./legal_summariser-*.gem
+ ```
+
+ ## Roadmap
+
+ - **v0.1** ✅ Text extraction + basic summarisation
+ - **v0.2** ✅ Clause detection + risk flagging
+ - **v0.3** 🔄 Plain language generator (fine-tuned models)
+ - **v1.0** 📋 Multi-language support + PDF annotation output
+
+ ## Contributing
+
+ Bug reports and pull requests are welcome on GitHub at https://github.com/legal-summariser/legal_summariser.
+
+ ## License
+
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
+
+ ## Disclaimer
+
+ This tool is designed to assist with legal document analysis but should not replace professional legal advice. Always consult with qualified legal professionals for important legal matters.
+
+ ## 🌍 Global Impact
+
+ - **Innovation**: Bridges AI with law and compliance in a developer-friendly way
+ - **Contribution**: Open-source library for legal NLP in Ruby
+ - **Public benefit**: Helps both professionals and citizens better understand their rights and obligations
+ - **Global relevance**: Applicable across jurisdictions (GDPR, KVKK, HIPAA, CCPA)
+
+ ## 🛠️ Tech Stack
+
+ This project leverages my expertise in:
+
+ - **Ruby Development**: Gem architecture, modular design patterns
+ - **AI & NLP**: Rule-based text analysis, pattern recognition
+ - **Cybersecurity**: Compliance frameworks (GDPR, KVKK), risk assessment
+ - **Digital Forensics**: Legal document analysis, evidence extraction
+ - **Software Engineering**: Test-driven development, CLI tools
+
+ ## 🎓 Professional Background
+
+ As a **Mobile Developer & Cyber Security Expert** with 10+ years in Public Sector IT:
+
+ - 🎓 **Master's in Forensic Informatics** - Gazi University (2021-2023)
+ - 🏢 **Mobile Developer** - Gendarmerie General Command (2024-Present)
+ - 🔒 **Certified Ethical Hacker (CEH)**
+ - 📱 **iOS & Android Development Expert**
+
+ ## 🤝 Connect & Collaborate
+
+ | Platform | Link | Purpose |
+ |----------|------|---------|
+ | 🌐 **Portfolio** | [ahmetxhero.web.app](https://ahmetxhero.web.app) | Professional showcase |
+ | 🎥 **YouTube** | [@ahmetxhero](https://youtube.com/@ahmetxhero) | Tech tutorials & content |
+ | 💼 **LinkedIn** | [ahmetxhero](https://linkedin.com/in/ahmetxhero) | Professional network |
+ | 🐤 **Twitter** | [@ahmetxhero](https://x.com/ahmetxhero) | Tech updates & thoughts |
+ | 📝 **Medium** | [ahmetxhero.medium.com](https://ahmetxhero.medium.com) | Technical articles |
+ | 📷 **Instagram** | [@ahmetxhero](https://instagram.com/ahmetxhero) | Behind the scenes |
+
+ ## ☕ Support My Work
+
+ If this project helps you or your organization, consider:
+
+ - ⭐ **Star this repository** on GitHub
+ - 🔄 **Share it** with your network
+ - ☕ **Buy me a coffee** to support open source development
+ - 🗣️ **Invite me to speak** at your tech event about legal tech innovation
+
+ ## 🎯 Current Focus
+
+ - 🔒 **Cybersecurity**: Developing secure applications with privacy-first approach
+ - 📱 **Mobile Development**: Creating innovative iOS and Android solutions
+ - 🔍 **Digital Forensics**: Advancing forensic investigation techniques
+ - 📚 **Knowledge Sharing**: Contributing to the tech community through open source
+ - ⚖️ **Legal Tech**: Building tools that make legal processes more accessible
+
+ ---
+
+ *Building secure, innovative solutions for a better digital future* 🚀
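The README's sample risk report names rules such as "Unlimited Liability", "Broad Indemnification" and "Missing GDPR Reference", but the `RiskAnalyzer` source is not part of this diff. A minimal sketch of how keyword rules could produce that kind of report, assuming plain regular expressions in the same spirit as `ClauseDetector`; `flag_risks`, `HIGH_RISK_RULES` and every pattern in them are hypothetical and deliberately simple.

```ruby
# Hypothetical rule-based risk flagger; rule names mirror the README's sample report.
HIGH_RISK_RULES = {
  "Unlimited Liability" => /\bunlimited\s+liability\b|\bwithout\s+(?:any\s+)?limit(?:ation)?\b/i,
  "Broad Indemnification" => /\bindemnif\w*\b.*\b(?:all|any)\s+claims\b/i
}.freeze

def flag_risks(text)
  high_risks = HIGH_RISK_RULES.select { |_name, pattern| text.match?(pattern) }.keys

  compliance_gaps = []
  # Gaps are only reported when the document handles personal data at all
  if text.match?(/\bpersonal\s+data\b/i)
    unless text.match?(/\bgdpr\b|general\s+data\s+protection\s+regulation/i)
      compliance_gaps << "Missing GDPR Reference"
    end
    compliance_gaps << "Missing Data Subject Rights" unless text.match?(/\bdata\s+subjects?\b/i)
  end

  { high_risks: high_risks, compliance_gaps: compliance_gaps }
end
```

Keeping the rules in a frozen hash makes it easy to add or tune a rule without touching the matching loop, which is the same trade-off `ClauseDetector` makes with its per-category pattern arrays.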
data/Rakefile ADDED
@@ -0,0 +1,12 @@
+ # frozen_string_literal: true
+
+ require "bundler/gem_tasks"
+ require "rspec/core/rake_task"
+
+ RSpec::Core::RakeTask.new(:spec)
+
+ require "rubocop/rake_task"
+
+ RuboCop::RakeTask.new
+
+ task default: %i[spec rubocop]
@@ -0,0 +1,121 @@
+ #!/usr/bin/env ruby
+ # frozen_string_literal: true
+
+ require_relative '../lib/legal_summariser'
+ require 'thor'
+ require 'tmpdir'
+
+ module LegalSummariser
+   class CLI < Thor
+     # Exit with a non-zero status when Thor itself rejects the arguments
+     def self.exit_on_failure?
+       true
+     end
+
+     desc "analyze FILE", "Analyze a legal document and generate summary"
+     option :format, aliases: '-f', default: 'text', desc: 'Output format (json, markdown, text)'
+     option :output, aliases: '-o', desc: 'Output file path (optional)'
+     option :max_sentences, type: :numeric, default: 5, desc: 'Maximum sentences in summary'
+     def analyze(file_path)
+       puts "Analyzing: #{file_path}"
+       puts "Format: #{options[:format]}"
+       puts "-" * 50
+
+       # Perform analysis
+       results = LegalSummariser.summarise(file_path, {
+         format: options[:format],
+         max_sentences: options[:max_sentences]
+       })
+
+       # Output results
+       if options[:output]
+         File.write(options[:output], results)
+         puts "Results saved to: #{options[:output]}"
+       else
+         puts results
+       end
+     rescue LegalSummariser::DocumentNotFoundError, LegalSummariser::UnsupportedFormatError => e
+       # Expected user errors: report on stderr and exit non-zero
+       warn "Error: #{e.message}"
+       exit 1
+     rescue StandardError => e
+       warn "Unexpected error: #{e.message}"
+       warn e.backtrace.join("\n") if ENV['DEBUG']
+       exit 1
+     end
+
+     desc "version", "Show version information"
+     def version
+       puts "Legal Summariser v#{LegalSummariser::VERSION}"
+       puts "Ruby-based AI-powered legal document analysis toolkit"
+     end
+
+     desc "supported_formats", "List supported document formats"
+     def supported_formats
+       puts "Supported document formats:"
+       puts "- PDF (.pdf)"
+       puts "- Microsoft Word (.docx)"
+       puts "- Plain text (.txt)"
+       puts ""
+       puts "Output formats:"
+       puts "- JSON (json)"
+       puts "- Markdown (markdown, md)"
+       puts "- Plain text (text, txt)"
+     end
+
+     desc "demo", "Run demo analysis on a sample NDA"
+     def demo
+       puts "Legal Summariser Demo"
+       puts "=" * 50
+       puts ""
+
+       # Write the sample NDA to the platform's temp directory rather than a hard-coded /tmp
+       sample_file = File.join(Dir.tmpdir, "sample_nda.txt")
+       File.write(sample_file, create_sample_nda)
+
+       puts "Analyzing sample NDA document..."
+       puts ""
+
+       results = LegalSummariser.summarise(sample_file, { format: 'markdown' })
+       puts results
+     ensure
+       # Clean up even if the analysis raises
+       File.delete(sample_file) if sample_file && File.exist?(sample_file)
+     end
+
+     private
+
+     def create_sample_nda
+       <<~NDA
+         NON-DISCLOSURE AGREEMENT
+
+         This Non-Disclosure Agreement ("Agreement") is entered into on [DATE] between Company ABC ("Disclosing Party") and John Doe ("Receiving Party").
+
+         1. CONFIDENTIAL INFORMATION
+         The Disclosing Party may disclose certain confidential and proprietary information to the Receiving Party. Confidential information includes all technical data, trade secrets, know-how, research, product plans, products, services, customers, customer lists, markets, software, developments, inventions, processes, formulas, technology, designs, drawings, engineering, hardware configuration information, marketing, finances, or other business information.
+
+         2. OBLIGATIONS
+         The Receiving Party agrees to hold and maintain the Confidential Information in strict confidence for a period of two (2) years from the date of disclosure. The Receiving Party shall not disclose any Confidential Information to third parties without prior written consent.
+
+         3. LIABILITY
+         The Receiving Party shall be liable for any breach of this Agreement and shall indemnify the Disclosing Party against all claims, damages, and expenses arising from such breach.
+
+         4. TERMINATION
+         This Agreement may be terminated by either party with thirty (30) days written notice. Upon termination, all Confidential Information must be returned or destroyed.
+
+         5. GOVERNING LAW
+         This Agreement shall be governed by the laws of England and Wales. Any disputes shall be resolved through binding arbitration.
+
+         6. DATA PROTECTION
+         Both parties acknowledge their obligations under the General Data Protection Regulation (GDPR) regarding any personal data processed under this Agreement.
+
+         IN WITNESS WHEREOF, the parties have executed this Agreement as of the date first written above.
+       NDA
+     end
+   end
+ end
+
+ # Start the CLI when run directly or via a RubyGems/Bundler binstub (which `load`s
+ # this file, so __FILE__ and $0 differ); comparing basenames still lets specs
+ # require this file without launching the CLI.
+ LegalSummariser::CLI.start(ARGV) if File.basename($PROGRAM_NAME) == File.basename(__FILE__)
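`supported_formats` advertises `markdown`/`md` and `text`/`txt` as interchangeable names, while `--format` is passed to `LegalSummariser.summarise` unchanged. The formatter is not part of this diff, so here is a hedged sketch of alias handling plus rendering of a result hash; `normalize_format`, `render` and `FORMAT_ALIASES` are hypothetical, and the `:plain_text`/`:key_points` keys are borrowed from the README's Ruby API example.

```ruby
require "json"

# Hypothetical alias table mirroring the list printed by `supported_formats`.
FORMAT_ALIASES = {
  "json" => :json,
  "markdown" => :markdown, "md" => :markdown,
  "text" => :text, "txt" => :text
}.freeze

# Map a user-supplied --format value to a canonical symbol, rejecting unknown names.
def normalize_format(name)
  FORMAT_ALIASES.fetch(name.to_s.downcase) do
    raise ArgumentError, "unsupported output format: #{name.inspect}"
  end
end

# Render a result hash; assumes a :plain_text String and a :key_points Array.
def render(result, format)
  points = Array(result[:key_points])

  case normalize_format(format)
  when :json
    JSON.pretty_generate(result)
  when :markdown
    (["# Summary", "", result[:plain_text].to_s, "", "## Key Points"] +
      points.map { |point| "- #{point}" }).join("\n")
  else # :text
    ([result[:plain_text].to_s] + points.map { |point| "* #{point}" }).join("\n")
  end
end
```

Normalising up front means an unknown value such as `--format pdf` fails with one clear error before any analysis work is done.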
@@ -0,0 +1,206 @@
+ # frozen_string_literal: true
+
+ module LegalSummariser
+   class ClauseDetector
+     attr_reader :text
+
+     def initialize(text)
+       @text = text.downcase
+     end
+
+     # Detect key legal clauses in the document
+     # @return [Hash] Detected clauses with their content
+     def detect
+       # Every detector returns an Array (possibly empty), so all eight keys are always present
+       {
+         data_processing: detect_data_processing_clauses,
+         liability: detect_liability_clauses,
+         confidentiality: detect_confidentiality_clauses,
+         termination: detect_termination_clauses,
+         payment: detect_payment_clauses,
+         intellectual_property: detect_ip_clauses,
+         dispute_resolution: detect_dispute_resolution_clauses,
+         governing_law: detect_governing_law_clauses
+       }
+     end
+
+     private
+
+     # Detect data processing and privacy clauses
+     # @return [Array<Hash>] Data processing clauses
+     def detect_data_processing_clauses
+       patterns = [
+         /data\s+processing/,
+         /personal\s+data/,
+         /gdpr/,
+         /kvkk/,
+         /data\s+protection/,
+         /privacy\s+policy/,
+         /data\s+subject/,
+         /data\s+controller/,
+         /data\s+processor/
+       ]
+
+       find_clauses_by_patterns(patterns, "Data Processing")
+     end
+
+     # Detect liability and indemnification clauses
+     # @return [Array<Hash>] Liability clauses
+     def detect_liability_clauses
+       patterns = [
+         /liabilit/,
+         /liable/,
+         /indemnif/,
+         /damages/,
+         /limitation\s+of\s+liability/,
+         /exclude.*liability/,
+         /consequential\s+damages/,
+         /indirect\s+damages/
+       ]
+
+       find_clauses_by_patterns(patterns, "Liability")
+     end
+
+     # Detect confidentiality and non-disclosure clauses
+     # @return [Array<Hash>] Confidentiality clauses
+     def detect_confidentiality_clauses
+       patterns = [
+         /confidential/,
+         /non.?disclosure/,
+         /proprietary\s+information/,
+         /trade\s+secret/,
+         /confidentiality\s+agreement/,
+         # Word boundaries stop "nda" matching inside "standard", "calendar" or "mandate"
+         /\bnda\b/
+       ]
+
+       find_clauses_by_patterns(patterns, "Confidentiality")
+     end
+
+     # Detect termination clauses
+     # @return [Array<Hash>] Termination clauses
+     def detect_termination_clauses
+       patterns = [
+         /terminat/,
+         /end\s+this\s+agreement/,
+         /breach.*agreement/,
+         /notice\s+of\s+termination/,
+         /expir/,
+         /cancel/
+       ]
+
+       find_clauses_by_patterns(patterns, "Termination")
+     end
+
+     # Detect payment and fee clauses
+     # @return [Array<Hash>] Payment clauses
+     def detect_payment_clauses
+       patterns = [
+         /payment/,
+         # Word boundaries avoid false hits such as "feel" or "coffee"
+         /\bfees?\b/,
+         /\$[\d,]+/,
+         /invoice/,
+         /billing/,
+         /compensation/,
+         /remuneration/,
+         /salary/,
+         /wage/
+       ]
+
+       find_clauses_by_patterns(patterns, "Payment")
+     end
+
+     # Detect intellectual property clauses
+     # @return [Array<Hash>] IP clauses
+     def detect_ip_clauses
+       patterns = [
+         /intellectual\s+property/,
+         /copyright/,
+         /trademark/,
+         /patent/,
+         /trade\s+mark/,
+         /proprietary\s+rights/,
+         /ownership/,
+         /license/,
+         /licensing/
+       ]
+
+       find_clauses_by_patterns(patterns, "Intellectual Property")
+     end
+
+     # Detect dispute resolution clauses
+     # @return [Array<Hash>] Dispute resolution clauses
+     def detect_dispute_resolution_clauses
+       patterns = [
+         /dispute/,
+         /arbitration/,
+         /mediation/,
+         /litigation/,
+         # Word boundaries avoid matching "courtesy"
+         /\bcourts?\b/,
+         /jurisdiction/,
+         /resolution\s+of\s+disputes/,
+         /legal\s+proceedings/
+       ]
+
+       find_clauses_by_patterns(patterns, "Dispute Resolution")
+     end
+
+     # Detect governing law clauses
+     # @return [Array<Hash>] Governing law clauses
+     def detect_governing_law_clauses
+       patterns = [
+         /governing\s+law/,
+         /applicable\s+law/,
+         /laws?\s+of/,
+         /jurisdiction/,
+         /governed\s+by/,
+         /subject\s+to.*law/
+       ]
+
+       find_clauses_by_patterns(patterns, "Governing Law")
+     end
+
+     # Find clauses matching given patterns
+     # @param patterns [Array<Regexp>] Regex patterns to match
+     # @param clause_type [String] Type of clause being detected
+     # @return [Array<Hash>] Found clauses
+     def find_clauses_by_patterns(patterns, clause_type)
+       clauses = []
+
+       extract_sentences.each_with_index do |sentence, index|
+         # Use only the first matching pattern, so a sentence is recorded once per clause type
+         pattern = patterns.find { |candidate| sentence.match?(candidate) }
+         next unless pattern
+
+         clauses << {
+           type: clause_type,
+           content: sentence.strip,
+           position: index + 1,
+           keywords: extract_keywords(sentence, pattern)
+         }
+       end
+
+       clauses.uniq { |clause| clause[:content] }
+     end
+
+     # Extract sentences from text (memoised: all eight detectors reuse the same list)
+     # @return [Array<String>] Array of sentences
+     def extract_sentences
+       # Split on sentence boundaries, drop very short fragments, normalise whitespace
+       @sentences ||= text.split(/(?<=[.!?])\s+/)
+                          .select { |s| s.length > 20 }
+                          .map { |s| s.strip.gsub(/\s+/, ' ') }
+     end
+
+     # Extract relevant keywords from a sentence based on pattern
+     # @param sentence [String] The sentence
+     # @param pattern [Regexp] The matching pattern
+     # @return [Array<String>] Extracted keywords
+     def extract_keywords(sentence, pattern)
+       matches = sentence.scan(pattern).flatten
+       matches.map(&:strip).reject(&:empty?)
+     end
+   end
+ end
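Because every pattern is matched against lowercased free text, short keywords need word boundaries: a bare `/nda/`, for instance, also fires on "standard". The condensed, standalone sketch below follows the same steps as `ClauseDetector` (lowercase, split into sentences, drop fragments of 20 characters or fewer, first matching pattern wins) without loading the gem; `detect_clauses` and its pattern tables are illustrative only.

```ruby
# Condensed, standalone version of ClauseDetector's approach (illustrative only).
def detect_clauses(text, patterns_by_type)
  sentences = text.downcase.split(/(?<=[.!?])\s+/).select { |s| s.length > 20 }

  patterns_by_type.each_with_object({}) do |(type, patterns), found|
    matches = sentences.each_with_index.map do |sentence, index|
      # First matching pattern wins, as in ClauseDetector#find_clauses_by_patterns
      pattern = patterns.find { |candidate| sentence.match?(candidate) }
      { content: sentence, position: index + 1, keyword: sentence[pattern] } if pattern
    end
    found[type] = matches.compact
  end
end
```

Comparing `detect_clauses(text, confidentiality: [/nda/])` with `detect_clauses(text, confidentiality: [/\bnda\b/])` on a text containing "Standard delivery times..." shows the unanchored pattern reporting a spurious confidentiality clause.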
@@ -0,0 +1,10 @@
+ # frozen_string_literal: true
+
+ module LegalSummariser
+   # Legacy compatibility - DocumentParser is now handled by TextExtractor
+   class DocumentParser
+     def self.parse(file_path)
+       TextExtractor.extract(file_path)
+     end
+   end
+ end
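`DocumentParser.parse` only forwards to `TextExtractor.extract`, whose source is not in this diff. A hedged sketch of extension-based dispatch follows; the module name and both error classes are hypothetical stand-ins (named after the errors the CLI rescues), only `.txt` is implemented, and the PDF/DOCX branch is a placeholder because the gem's parsing libraries are not visible here.

```ruby
# Hypothetical sketch of extension-based dispatch; not the gem's TextExtractor.
module TextExtractorSketch
  class DocumentNotFoundError < StandardError; end
  class UnsupportedFormatError < StandardError; end

  SUPPORTED = %w[.pdf .docx .txt].freeze

  def self.extract(file_path)
    raise DocumentNotFoundError, "File not found: #{file_path}" unless File.exist?(file_path)

    ext = File.extname(file_path).downcase
    raise UnsupportedFormatError, "Unsupported format: #{ext}" unless SUPPORTED.include?(ext)

    case ext
    when ".txt"
      # Normalise line endings, collapse runs of spaces/tabs, trim the ends
      File.read(file_path, encoding: "UTF-8").gsub(/\r\n?/, "\n").gsub(/[ \t]+/, " ").strip
    else
      # Placeholder: a PDF or DOCX parser would be plugged in here
      raise NotImplementedError, "#{ext} extraction is outside this sketch"
    end
  end
end
```

Checking existence and extension before dispatching lets the CLI map both failures to a clean "Error: ..." message and exit status 1.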