legal_summariser 0.3.0 → 0.3.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +8 -0
- data/README.md +92 -29
- data/lib/legal_summariser/model_trainer.rb +707 -0
- data/lib/legal_summariser/multilingual_processor.rb +683 -0
- data/lib/legal_summariser/pdf_annotator.rb +601 -0
- data/lib/legal_summariser/plain_language_generator.rb +463 -0
- data/lib/legal_summariser/version.rb +1 -1
- metadata +19 -11
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 17172c652ce52a7b5707e22bcec192c56e3538255cc579cdf4d331de5caf2d6e
+  data.tar.gz: 16383b701c48ce0e674edfccd9eece8fa31b6659ca29ff4f073939bf735b7564
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: aa1b6fd284af5525e90b8cf15333eb2cfddecc1179142b7f8ea0df306e7a04f92a637e208e985de6517414ec60aa565ba8baba37f993952969a565641ad45d6e
+  data.tar.gz: 638343a4a36085b27561abe93e0374e1e034ad3dd701646d1d3d9d38537c41bb3bef243a32b191526b5237e16a8f933ee9055e51b5dc62b21411752bf40958ca
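The new digests above can be checked locally. A `.gem` file is a tar archive that contains `metadata.gz` and `data.tar.gz`, so after unpacking it (for example with `tar -xf`), each inner file can be hashed and compared against the value listed in `checksums.yaml`. The following is a minimal sketch using Ruby's standard `Digest` library; the helper name `sha256_matches?` is hypothetical and not part of RubyGems or this gem.

```ruby
require "digest"

# Hypothetical helper (not part of RubyGems or legal_summariser):
# returns true when the file's SHA256 hex digest equals the expected value.
def sha256_matches?(path, expected_hex)
  actual = Digest::SHA256.file(path).hexdigest
  actual == expected_hex.strip.downcase
end
```

Assuming the 0.3.1 gem has been unpacked into the current directory, `sha256_matches?("data.tar.gz", "16383b701c48ce0e674edfccd9eece8fa31b6659ca29ff4f073939bf735b7564")` should return `true` if the archive matches the registry copy.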
data/CHANGELOG.md
CHANGED
@@ -5,6 +5,14 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.3.1] - 2025-01-09
+
+### Updated
+- **Gem Metadata**: Updated author information and contact details
+- **RubyGems Page**: Enhanced description and metadata for better discoverability
+- **Links**: Updated all GitHub repository links to correct owner
+- **Documentation**: Added additional metadata links (bug tracker, wiki, documentation)
+
 ## [0.3.0] - 2025-01-09
 
 ### Added
data/README.md
CHANGED
@@ -6,7 +6,8 @@
 
 [](https://ruby-lang.org)
 [](LICENSE)
-[](#testing)
+[](https://rubygems.org/gems/legal_summariser)
 
 ---
 
@@ -27,7 +28,8 @@
 
 ## Features
 
-
+### 🚀 Core Analysis
+- **Document Processing**: Supports PDF, DOCX, RTF, and plain text files
 - **Smart Summarisation**: Converts legal documents into concise plain English
 - **Clause Detection**: Automatically identifies key legal clauses including:
   - Data Processing & Privacy (GDPR/KVKK compliance)
@@ -40,8 +42,29 @@
   - Governing Law
 - **Risk Analysis**: Flags potential legal risks and unfair terms
 - **Compliance Checking**: Identifies gaps in GDPR, KVKK, and other regulations
-
-
+
+### 🤖 AI/ML Features (v0.3.0)
+- **Plain Language Generator**: AI-powered legal text simplification with 30+ jargon mappings
+- **Model Training System**: Train custom legal language models (pattern-based, statistical, neural)
+- **Readability Scoring**: Calculate complexity reduction and readability metrics
+- **Fine-tuning Support**: Customize models for specific legal domains
+
+### 🌍 Multilingual Support
+- **8 Languages Supported**: English, Spanish, French, German, Italian, Portuguese, Turkish, Dutch
+- **Legal Term Translation**: Cross-language legal terminology mapping
+- **Cultural Adaptations**: Legal system-specific processing for different countries
+- **AI Translation Integration**: Support for external translation APIs
+
+### 📄 Advanced Output
+- **PDF Annotations**: Rich PDF output with highlighting, comments, and risk indicators
+- **Multiple Formats**: JSON, Markdown, plain text, and annotated PDF
+- **Batch Processing**: Process multiple documents simultaneously
+- **Performance Monitoring**: Built-in metrics and caching system
+
+### 🛠️ Developer Tools
+- **CLI Interface**: Comprehensive command-line tool
+- **Configuration System**: Flexible configuration with validation
+- **Caching System**: Result caching with TTL and size management
 - **Offline Processing**: Works without internet for sensitive documents
 
 ## Installation
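The README hunk above lists a "Caching System" with TTL and size management but the diff does not show how it works. As a rough illustration of the general technique only, here is a minimal sketch of a cache that expires entries after a fixed time and evicts the oldest entry once a size limit is exceeded. The class name `TtlCache`, its parameters, and the injectable `clock` are all assumptions made for this example; none of them are taken from the gem's source.

```ruby
# Hypothetical sketch of a TTL + size-bounded cache.
# Names and behaviour are illustrative assumptions, not legal_summariser's API.
class TtlCache
  Entry = Struct.new(:value, :expires_at)

  # ttl:      seconds an entry stays valid
  # max_size: maximum number of entries kept
  # clock:    callable returning the current time in seconds (injectable for tests)
  def initialize(ttl:, max_size:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl
    @max_size = max_size
    @clock = clock
    @store = {} # Ruby hashes preserve insertion order, so the first key is the oldest
  end

  # Returns the cached value for key, or computes it with the block and stores it.
  def fetch(key)
    now = @clock.call
    entry = @store[key]
    return entry.value if entry && entry.expires_at > now

    value = yield
    @store.delete(key) # re-insert so a refreshed key becomes the newest entry
    @store[key] = Entry.new(value, now + @ttl)
    @store.shift while @store.size > @max_size # evict oldest entries first
    value
  end

  def size
    @store.size
  end
end
```

A caller might wrap an expensive analysis step as `cache.fetch(path) { analyse(path) }`, where `analyse` stands in for whatever work is being cached. Evicting by insertion order keeps the sketch short; a production cache might prefer least-recently-used eviction instead.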
@@ -73,41 +96,76 @@ require "legal_summariser"
 
 # Basic usage
 summary = LegalSummariser.summarise("contracts/nda.pdf")
-puts summary[:plain_text]
-#
+puts summary[:plain_text] # AI-generated plain language version
+puts summary[:summary] # Original summary
+puts summary[:multilingual] # Multi-language processing results
 
-#
+# Advanced AI features
 result = LegalSummariser.summarise("contract.pdf", {
   format: 'markdown',
-  max_sentences: 3
+  max_sentences: 3,
+  language: 'es', # Process in Spanish
+  plain_language: true, # Enable AI plain language generation
+  generate_annotations: true # Create PDF annotations
 })
 
-# Access
+# Access AI-enhanced analysis
 puts result[:key_points] # Key contract points
 puts result[:clauses] # Detected legal clauses
 puts result[:risks] # Risk analysis
-puts result[:
+puts result[:plain_text] # AI-simplified version
+puts result[:multilingual] # Multi-language results
+puts result[:metadata] # Enhanced metadata with AI metrics
+
+# Plain Language Generator
+generator = LegalSummariser::PlainLanguageGenerator.new
+plain_result = generator.generate("The party of the first part shall indemnify...")
+puts plain_result[:text] # "The first party will compensate..."
+puts plain_result[:readability_score] # Readability improvement metrics
+
+# Model Training
+trainer = LegalSummariser::ModelTrainer.new
+trainer.train_model('contract_model', training_data, type: 'statistical')
+trainer.fine_tune_model('contract_model', fine_tuning_data)
+
+# Multilingual Processing
+processor = LegalSummariser::MultilingualProcessor.new
+result = processor.process_multilingual("contract.pdf", source: 'en', target: 'es')
+
+# PDF Annotations
+annotator = LegalSummariser::PDFAnnotator.new
+annotator.create_annotated_pdf("contract.pdf", analysis_results, "annotated_contract.pdf")
 ```
 
 ### Command Line Interface
 
 ```bash
-#
+# Basic analysis
 legal_summariser analyze contract.pdf
 
-#
-legal_summariser analyze contract.pdf --format markdown
+# AI-enhanced analysis with plain language
+legal_summariser analyze contract.pdf --plain-language --format markdown
 
-#
-legal_summariser analyze contract.pdf --
+# Multilingual processing
+legal_summariser analyze contract.pdf --language es --translate-to en
 
-#
-legal_summariser
+# Generate annotated PDF
+legal_summariser analyze contract.pdf --annotate --output annotated_contract.pdf
 
-#
-legal_summariser
+# Batch processing with AI features
+legal_summariser batch contracts/ --plain-language --multilingual
+
+# Configuration and stats
+legal_summariser config --set language=es
+legal_summariser stats
 
-#
+# Model management
+legal_summariser train-model --type statistical --data training_data.json
+legal_summariser list-models
+
+# Utility commands
+legal_summariser demo
+legal_summariser supported_formats
 legal_summariser version
 ```
 
@@ -176,11 +234,16 @@ The system automatically detects and optimizes analysis for:
 - **Pattern matching**: For compliance gap identification
 
 ### Key Components
-- **TextExtractor**: Multi-format document parsing
-- **Summariser**:
-- **ClauseDetector**:
-- **RiskAnalyzer**:
-- **
+- **TextExtractor**: Multi-format document parsing (PDF, DOCX, RTF, TXT)
+- **Summariser**: Enhanced plain English conversion engine
+- **ClauseDetector**: Advanced legal clause identification
+- **RiskAnalyzer**: Comprehensive risk assessment and flagging
+- **PlainLanguageGenerator**: AI-powered legal text simplification
+- **ModelTrainer**: Custom model training and fine-tuning system
+- **MultilingualProcessor**: Cross-language processing and translation
+- **PDFAnnotator**: Rich PDF annotation and highlighting
+- **Formatter**: Multi-format output generation (JSON, Markdown, PDF)
+- **Cache & Performance**: Advanced caching and performance monitoring
 
 ## Development
 
@@ -206,9 +269,9 @@ gem install ./legal_summariser-*.gem
 ## Roadmap
 
 - **v0.1** ✅ Text extraction + basic summarisation
-- **v0.2** ✅ Clause detection + risk flagging
-- **v0.3**
-- **v1.0** 📋
+- **v0.2** ✅ Clause detection + risk flagging + performance enhancements
+- **v0.3** ✅ AI/ML features + multilingual support + PDF annotations
+- **v1.0** 📋 Advanced neural models + enterprise features + API service
 
 ## Contributing
 
@@ -234,7 +297,7 @@ This tool is designed to assist with legal document analysis but should not repl
 This project leverages my expertise in:
 
 - **Ruby Development**: Gem architecture, modular design patterns
-- **AI & NLP**:
+- **AI & NLP**: Advanced machine learning, neural networks, multilingual processing
 - **Cybersecurity**: Compliance frameworks (GDPR, KVKK), risk assessment
 - **Digital Forensics**: Legal document analysis, evidence extraction
 - **Software Engineering**: Test-driven development, CLI tools