legal_summariser 0.3.0 → 0.3.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 71b638897796c0db2653eefb9456f1fdc821c3c91b3320f6aecce4430e750019
- data.tar.gz: 1d915044c3946c8a34656af00d22d9a9d91d06f0258b74fbf870164f72b1104f
+ metadata.gz: 17172c652ce52a7b5707e22bcec192c56e3538255cc579cdf4d331de5caf2d6e
+ data.tar.gz: 16383b701c48ce0e674edfccd9eece8fa31b6659ca29ff4f073939bf735b7564
  SHA512:
- metadata.gz: 339c0f2674c8509e5ffed3da42eaffde1d151c535bcb454d4bba7dfa7d9b060636b31afd66b29b405ea73b01e5067280c488661e902944e6e38778d8288854be
- data.tar.gz: dcb9f001f384c872380c4d42bf6ebd9c079156e14131d4dd620af185226c71d7d79e7ea95a251c0aa7e81834ee0534704484fbff5e4ae0e908b8344f61add1f2
+ metadata.gz: aa1b6fd284af5525e90b8cf15333eb2cfddecc1179142b7f8ea0df306e7a04f92a637e208e985de6517414ec60aa565ba8baba37f993952969a565641ad45d6e
+ data.tar.gz: 638343a4a36085b27561abe93e0374e1e034ad3dd701646d1d3d9d38537c41bb3bef243a32b191526b5237e16a8f933ee9055e51b5dc62b21411752bf40958ca
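A `.gem` file is a tar archive whose members include `metadata.gz`, `data.tar.gz`, and a gzipped `checksums.yaml`; the hunk above shows the SHA256 and SHA512 digests of both members changing between 0.3.0 and 0.3.1, as they must whenever the packaged files or metadata change. The sketch below recomputes those digests for already-extracted members and compares them with the parsed checksums table. It assumes you have unpacked the archive yourself (for example with `tar -xf legal_summariser-0.3.1.gem`); the `checksum_mismatches` helper is hypothetical.

```ruby
require "digest"
require "yaml"
require "zlib"

# Digest classes keyed by the section names used in checksums.yaml.
ALGORITHMS = {
  "SHA256" => Digest::SHA256,
  "SHA512" => Digest::SHA512
}.freeze

# Hypothetical helper: compare digests of extracted gem members in +dir+
# against a table shaped like checksums.yaml, e.g.
#   {"SHA256" => {"metadata.gz" => "<hex>", ...}, "SHA512" => {...}}
# Returns [algorithm, member] pairs whose digests do not match.
def checksum_mismatches(dir, checksums)
  mismatches = []
  checksums.each do |algo, members|
    digest_class = ALGORITHMS.fetch(algo)
    members.each do |member, expected|
      actual = digest_class.file(File.join(dir, member)).hexdigest
      mismatches << [algo, member] unless actual == expected
    end
  end
  mismatches
end

# Usage (run inside the directory where the .gem was unpacked):
#   checksums = YAML.safe_load(Zlib.gunzip(File.binread("checksums.yaml.gz")))
#   checksum_mismatches(".", checksums) # empty when every digest matches
```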
data/CHANGELOG.md CHANGED
@@ -5,6 +5,14 @@ All notable changes to this project will be documented in this file.
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+ ## [0.3.1] - 2025-01-09
+
+ ### Updated
+ - **Gem Metadata**: Updated author information and contact details
+ - **RubyGems Page**: Enhanced description and metadata for better discoverability
+ - **Links**: Updated all GitHub repository links to correct owner
+ - **Documentation**: Added additional metadata links (bug tracker, wiki, documentation)
+
  ## [0.3.0] - 2025-01-09

  ### Added
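The 0.3.1 entry lists metadata-only changes, which fits a patch-level bump under Semantic Versioning. On rubygems.org, extra links such as a bug tracker, wiki, and documentation come from recognised keys in the gemspec `metadata` hash. The following is a hedged sketch of such a gemspec, not the project's actual file: the `OWNER` URLs are placeholders for the real repository, the `main` branch name and the rubydoc.info URL are assumptions, and the summary string is illustrative. Only the name, version, and MIT license come from this diff.

```ruby
require "rubygems"

# Placeholder repository URL; OWNER stands in for the real GitHub account.
REPO = "https://github.com/OWNER/legal_summariser"

spec = Gem::Specification.new do |s|
  s.name     = "legal_summariser"
  s.version  = "0.3.1"
  s.summary  = "Plain-English summaries and risk flags for legal documents" # illustrative
  s.license  = "MIT"
  s.homepage = REPO
  s.files    = []

  # Keys recognised by rubygems.org; each is shown as a link on the gem page.
  s.metadata = {
    "homepage_uri"      => REPO,
    "source_code_uri"   => REPO,
    "changelog_uri"     => "#{REPO}/blob/main/CHANGELOG.md", # assumes a "main" branch
    "bug_tracker_uri"   => "#{REPO}/issues",
    "documentation_uri" => "https://rubydoc.info/gems/legal_summariser",
    "wiki_uri"          => "#{REPO}/wiki"
  }
end

puts spec.metadata["bug_tracker_uri"]
```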
data/README.md CHANGED
@@ -6,7 +6,8 @@

  [![Ruby](https://img.shields.io/badge/Ruby-2.6+-red.svg)](https://ruby-lang.org)
  [![License](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
- [![Tests](https://img.shields.io/badge/Tests-26%20passing-green.svg)](#testing)
+ [![Tests](https://img.shields.io/badge/Tests-100+%20passing-green.svg)](#testing)
+ [![Version](https://img.shields.io/badge/Version-0.3.0-blue.svg)](https://rubygems.org/gems/legal_summariser)

  ---

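The Version badge added in this hunk hard-codes `0.3.0`, although it ships in the 0.3.1 release. One way to keep such a badge from drifting is to generate the shields.io static-badge URL from the gem's version string when the README is built. This is a minimal sketch: the `badge_url` helper is hypothetical, and passing `LegalSummariser::VERSION` to it assumes the gem defines that constant. It follows the shields.io static-badge path form `label-message-color`, where a literal dash is written `--` and a literal underscore `__`.

```ruby
require "erb"

# Hypothetical helper: build a shields.io static badge URL of the form
#   https://img.shields.io/badge/<label>-<message>-<color>.svg
def badge_url(label, message, color)
  escape = lambda do |text|
    # shields.io uses a single "-" as a field separator and "_" as a space,
    # so literal ones are doubled; everything else is percent-encoded
    # (a space becomes %20 and "+" becomes %2B).
    ERB::Util.url_encode(text.to_s.gsub("-", "--").gsub("_", "__"))
  end
  "https://img.shields.io/badge/#{escape.(label)}-#{escape.(message)}-#{color}.svg"
end

# In a README generator, the version string could come from the gem itself.
puts badge_url("Version", "0.3.1", "blue")
```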
@@ -27,7 +28,8 @@

  ## Features

- - **Document Processing**: Supports PDF, DOCX, and plain text files
+ ### 🚀 Core Analysis
+ - **Document Processing**: Supports PDF, DOCX, RTF, and plain text files
  - **Smart Summarisation**: Converts legal documents into concise plain English
  - **Clause Detection**: Automatically identifies key legal clauses including:
  - Data Processing & Privacy (GDPR/KVKK compliance)
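The Clause Detection bullet describes recognising clause types such as data processing and governing law, and the README's technical section lists pattern matching among its techniques. The sketch below illustrates keyword-pattern clause tagging in general; it is not the gem's `ClauseDetector`, and the small `CLAUSE_PATTERNS` table and `detect_clauses` method are hypothetical.

```ruby
# Hypothetical keyword patterns per clause type (illustrative, not exhaustive).
CLAUSE_PATTERNS = {
  data_processing: /\b(personal data|data processing|GDPR|KVKK)\b/i,
  termination:     /\bterminat(e|es|ed|ion)\b/i,
  liability:       /\b(liab(le|ility)|indemnif(y|ies|ication))\b/i,
  governing_law:   /\bgoverned by the laws? of\b/i
}.freeze

# Split text into rough sentences and collect, per clause type, the sentences
# that match its pattern. Clause types with no matches are omitted.
def detect_clauses(text)
  sentences = text.split(/(?<=[.!?])\s+/)
  CLAUSE_PATTERNS.each_with_object({}) do |(type, pattern), found|
    hits = sentences.grep(pattern)
    found[type] = hits unless hits.empty?
  end
end

contract = "This Agreement is governed by the laws of England. " \
           "Either party may terminate on 30 days' notice. " \
           "The Supplier shall process personal data only on instructions."
detect_clauses(contract).each { |type, hits| puts "#{type}: #{hits.first}" }
```

Real clause detection needs far richer patterns (or a trained model) than this; the point is only the shape of a pattern table applied sentence by sentence.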
@@ -40,8 +42,29 @@
  - Governing Law
  - **Risk Analysis**: Flags potential legal risks and unfair terms
  - **Compliance Checking**: Identifies gaps in GDPR, KVKK, and other regulations
- - **Multiple Output Formats**: JSON, Markdown, and plain text
- - **CLI Interface**: Command-line tool for batch processing
+
+ ### 🤖 AI/ML Features (v0.3.0)
+ - **Plain Language Generator**: AI-powered legal text simplification with 30+ jargon mappings
+ - **Model Training System**: Train custom legal language models (pattern-based, statistical, neural)
+ - **Readability Scoring**: Calculate complexity reduction and readability metrics
+ - **Fine-tuning Support**: Customize models for specific legal domains
+
+ ### 🌍 Multilingual Support
+ - **8 Languages Supported**: English, Spanish, French, German, Italian, Portuguese, Turkish, Dutch
+ - **Legal Term Translation**: Cross-language legal terminology mapping
+ - **Cultural Adaptations**: Legal system-specific processing for different countries
+ - **AI Translation Integration**: Support for external translation APIs
+
+ ### 📄 Advanced Output
+ - **PDF Annotations**: Rich PDF output with highlighting, comments, and risk indicators
+ - **Multiple Formats**: JSON, Markdown, plain text, and annotated PDF
+ - **Batch Processing**: Process multiple documents simultaneously
+ - **Performance Monitoring**: Built-in metrics and caching system
+
+ ### 🛠️ Developer Tools
+ - **CLI Interface**: Comprehensive command-line tool
+ - **Configuration System**: Flexible configuration with validation
+ - **Caching System**: Result caching with TTL and size management
  - **Offline Processing**: Works without internet for sensitive documents

  ## Installation
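The Plain Language Generator bullet mentions a table of jargon mappings. A simple way to apply such a table is ordered, case-insensitive phrase substitution, longest phrase first, so that multi-word terms are replaced before any shorter entry can match inside them. The sketch below is hypothetical: the four-entry `JARGON` table and the `simplify` method are illustrations, not the gem's 30+ mappings or its `PlainLanguageGenerator` API.

```ruby
# Hypothetical jargon table; a real one would be much larger.
JARGON = {
  "party of the first part" => "first party",
  "shall"                   => "will",
  "indemnify"               => "compensate",
  "prior to"                => "before"
}.freeze

# Replace longest phrases first so multi-word terms are not broken up by
# shorter entries. Returns the rewritten text and the number of replacements.
def simplify(text)
  count = 0
  result = JARGON.keys.sort_by { |k| -k.length }.reduce(text) do |acc, phrase|
    acc.gsub(/\b#{Regexp.escape(phrase)}\b/i) do |match|
      count += 1
      plain = JARGON[phrase]
      # Keep a leading capital, e.g. "Shall" becomes "Will".
      match[0] == match[0].upcase ? plain[0].upcase + plain[1..] : plain
    end
  end
  { text: result, replacements: count }
end

p simplify("The party of the first part shall indemnify the other party prior to closing.")
```

The replacement count gives a crude signal of how much jargon was removed; a fuller readability score would also look at sentence and word length.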
@@ -73,41 +96,76 @@ require "legal_summariser"

  # Basic usage
  summary = LegalSummariser.summarise("contracts/nda.pdf")
- puts summary[:plain_text]
- # => "This Non-Disclosure Agreement establishes confidentiality, valid for 2 years. The company may terminate at any time..."
+ puts summary[:plain_text] # AI-generated plain language version
+ puts summary[:summary] # Original summary
+ puts summary[:multilingual] # Multi-language processing results

- # With options
+ # Advanced AI features
  result = LegalSummariser.summarise("contract.pdf", {
  format: 'markdown',
- max_sentences: 3
+ max_sentences: 3,
+ language: 'es', # Process in Spanish
+ plain_language: true, # Enable AI plain language generation
+ generate_annotations: true # Create PDF annotations
  })

- # Access different parts of the analysis
+ # Access AI-enhanced analysis
  puts result[:key_points] # Key contract points
  puts result[:clauses] # Detected legal clauses
  puts result[:risks] # Risk analysis
- puts result[:metadata] # Document metadata
+ puts result[:plain_text] # AI-simplified version
+ puts result[:multilingual] # Multi-language results
+ puts result[:metadata] # Enhanced metadata with AI metrics
+
+ # Plain Language Generator
+ generator = LegalSummariser::PlainLanguageGenerator.new
+ plain_result = generator.generate("The party of the first part shall indemnify...")
+ puts plain_result[:text] # "The first party will compensate..."
+ puts plain_result[:readability_score] # Readability improvement metrics
+
+ # Model Training
+ trainer = LegalSummariser::ModelTrainer.new
+ trainer.train_model('contract_model', training_data, type: 'statistical')
+ trainer.fine_tune_model('contract_model', fine_tuning_data)
+
+ # Multilingual Processing
+ processor = LegalSummariser::MultilingualProcessor.new
+ result = processor.process_multilingual("contract.pdf", source: 'en', target: 'es')
+
+ # PDF Annotations
+ annotator = LegalSummariser::PDFAnnotator.new
+ annotator.create_annotated_pdf("contract.pdf", analysis_results, "annotated_contract.pdf")
  ```

  ### Command Line Interface

  ```bash
- # Analyze a document
+ # Basic analysis
  legal_summariser analyze contract.pdf

- # Specify output format
- legal_summariser analyze contract.pdf --format markdown
+ # AI-enhanced analysis with plain language
+ legal_summariser analyze contract.pdf --plain-language --format markdown

- # Save to file
- legal_summariser analyze contract.pdf --output summary.md --format markdown
+ # Multilingual processing
+ legal_summariser analyze contract.pdf --language es --translate-to en

- # Run demo
- legal_summariser demo
+ # Generate annotated PDF
+ legal_summariser analyze contract.pdf --annotate --output annotated_contract.pdf

- # Show supported formats
- legal_summariser supported_formats
+ # Batch processing with AI features
+ legal_summariser batch contracts/ --plain-language --multilingual
+
+ # Configuration and stats
+ legal_summariser config --set language=es
+ legal_summariser stats

- # Show version
+ # Model management
+ legal_summariser train-model --type statistical --data training_data.json
+ legal_summariser list-models
+
+ # Utility commands
+ legal_summariser demo
+ legal_summariser supported_formats
  legal_summariser version
  ```

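The new `batch` command and the Batch Processing feature describe analysing many documents at once. The gem's own implementation is not part of this diff; the sketch below shows one common Ruby approach, a fixed pool of threads draining a closed `Queue`, with a block standing in for the per-file work (in practice perhaps a call such as `LegalSummariser.summarise(path)`, which is an assumption here). The `batch_process` name and the default of four workers are hypothetical.

```ruby
# Hypothetical batch runner: a fixed pool of worker threads drains a closed
# queue of paths and records each result (or the exception raised) by path.
def batch_process(paths, workers: 4, &work)
  queue = Queue.new
  paths.each { |path| queue << path }
  queue.close # once empty, pop returns nil instead of blocking

  results = {}
  lock = Mutex.new
  threads = Array.new([workers, paths.size].min) do
    Thread.new do
      while (path = queue.pop)
        outcome = begin
          work.call(path)
        rescue StandardError => e
          e # keep going; the failure is reported in the results
        end
        lock.synchronize { results[path] = outcome }
      end
    end
  end
  threads.each(&:join)
  results
end

# Stand-in work: file size instead of a real analysis call.
files = Dir.glob("contracts/*.{pdf,docx,rtf,txt}")
sizes = batch_process(files) { |path| File.size(path) }
```

Under CRuby's global VM lock, threads mainly help when the per-file work waits on I/O or external tools; CPU-heavy analysis would need processes or Ractors to run truly in parallel.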
@@ -176,11 +234,16 @@ The system automatically detects and optimizes analysis for:
  - **Pattern matching**: For compliance gap identification

  ### Key Components
- - **TextExtractor**: Multi-format document parsing
- - **Summariser**: Plain English conversion engine
- - **ClauseDetector**: Legal clause identification
- - **RiskAnalyzer**: Risk assessment and flagging
- - **Formatter**: Multi-format output generation
+ - **TextExtractor**: Multi-format document parsing (PDF, DOCX, RTF, TXT)
+ - **Summariser**: Enhanced plain English conversion engine
+ - **ClauseDetector**: Advanced legal clause identification
+ - **RiskAnalyzer**: Comprehensive risk assessment and flagging
+ - **PlainLanguageGenerator**: AI-powered legal text simplification
+ - **ModelTrainer**: Custom model training and fine-tuning system
+ - **MultilingualProcessor**: Cross-language processing and translation
+ - **PDFAnnotator**: Rich PDF annotation and highlighting
+ - **Formatter**: Multi-format output generation (JSON, Markdown, PDF)
+ - **Cache & Performance**: Advanced caching and performance monitoring

  ## Development

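The Key Components list now includes a caching layer, and the Features section describes result caching with TTL and size management. The gem's cache is not shown in this diff; the sketch below illustrates the general idea with a hypothetical `TTLCache` class that expires entries after `ttl` seconds and evicts the least recently used entry once `max_size` is reached. It relies on Ruby hashes preserving insertion order, and takes an injectable clock so expiry can be tested without sleeping.

```ruby
# Hypothetical result cache with per-entry TTL and LRU eviction by size.
class TTLCache
  Entry = Struct.new(:value, :expires_at)

  def initialize(ttl:, max_size:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl
    @max_size = max_size
    @clock = clock
    @store = {} # insertion order doubles as recency order (oldest first)
  end

  # Return the cached value for +key+, or compute it with the block and cache it.
  def fetch(key)
    entry = @store.delete(key)
    if entry && entry.expires_at > @clock.call
      @store[key] = entry # re-insert to mark as most recently used
      return entry.value
    end
    value = yield
    @store.shift if @store.size >= @max_size # evict least recently used
    @store[key] = Entry.new(value, @clock.call + @ttl)
    value
  end

  def size
    @store.size
  end
end
```

A caller might key entries by a file path plus its modification time, for example `cache.fetch([path, File.mtime(path)]) { analyse(path) }`, where `analyse` is a hypothetical stand-in, so that an edited document is never served a stale result.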
@@ -206,9 +269,9 @@ gem install ./legal_summariser-*.gem
  ## Roadmap

  - **v0.1** ✅ Text extraction + basic summarisation
- - **v0.2** ✅ Clause detection + risk flagging
- - **v0.3** 🔄 Plain language generator (fine-tuned models)
- - **v1.0** 📋 Multi-language support + PDF annotation output
+ - **v0.2** ✅ Clause detection + risk flagging + performance enhancements
+ - **v0.3** AI/ML features + multilingual support + PDF annotations
+ - **v1.0** 📋 Advanced neural models + enterprise features + API service

  ## Contributing

@@ -234,7 +297,7 @@ This tool is designed to assist with legal document analysis but should not repl
  This project leverages my expertise in:

  - **Ruby Development**: Gem architecture, modular design patterns
- - **AI & NLP**: Rule-based text analysis, pattern recognition
+ - **AI & NLP**: Advanced machine learning, neural networks, multilingual processing
  - **Cybersecurity**: Compliance frameworks (GDPR, KVKK), risk assessment
  - **Digital Forensics**: Legal document analysis, evidence extraction
  - **Software Engineering**: Test-driven development, CLI tools