lexm 0.2.0 → 0.3.0
- checksums.yaml +4 -4
- data/README.md +110 -3
- data/bin/lexm +2 -2
- data/lib/lexm/lemma.rb +11 -3
- data/lib/lexm/lemma_list.rb +390 -17
- data/lib/lexm/lemma_redirect.rb +2 -2
- data/lib/lexm/sublemma.rb +11 -3
- data/lib/lexm/version.rb +3 -3
- data/lib/lexm.rb +2 -2
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 147eccae18795425b54c13045269798e8c438523c5453279c5274fc44a95fc65
+  data.tar.gz: b5055ec1bb29595732129402875e58f79c9d2440053624a88dec447f8e82752f
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6144cfd7f2eb44f7ef4a4f134925a39ff2c9a675f5176de4c09f17d02f8805aac7ba71ed5a6c81fbf48435cf99e11b4e0b912c263e3759c880a8c655983b902b
+  data.tar.gz: '0479a14b133a3437314bce00e2d12ae9fedd45c5c69170ff76ee0b35338aaf43d2b5efa22462b240fa2c7c1c0a06d0b6255ec7d8ea53c0156674bdf9f46c59ce'
data/README.md
CHANGED
@@ -2,12 +2,37 @@

 <img align="center" width="400" src="icon.png"/>

-### Lemma Markup Format
+### Lemma Markup Format
+
+
+
+
 </div>

 ---

-
+<!--ts-->
+* [Lemma Markup Format](#lemma-markup-format)
+* [Installation](#installation)
+* [Basic Format](#basic-format)
+* [Examples](#examples)
+* [Entry Types](#entry-types)
+* [Standard Lemma](#standard-lemma)
+* [Lemma with Sublemmas](#lemma-with-sublemmas)
+* [Redirection Entry](#redirection-entry)
+* [Mixed Format](#mixed-format)
+* [Advanced Features](#advanced-features)
+* [Validation](#validation)
+* [File Operations](#file-operations)
+* [LexM Format Specification](#lexm-format-specification)
+* [Attribution](#-attribution)
+* [How to Cite](#how-to-cite)
+* [License](#license)
+<!--te-->
+
+---
+
+LexM is a concise, human-readable format for representing dictionary-ready lexical entries with their various forms, relationships, and redirections. It's designed to be both easy to write by hand and simple to parse programmatically.

 ## Installation
@@ -37,6 +62,9 @@ A LexM entry consists of a lemma (headword) and optional elements:
 lemma[annotations]|sublemma1,sublemma2,>(relation)target
 ```

+> [!NOTE]
+> The format is designed to be human-readable while still being structured enough for programmatic processing. This makes it ideal for dictionary development, computational linguistics, and language learning applications.
+
 ## Examples

 ```ruby
@@ -85,6 +113,9 @@ list.eachWord do |word|
 end
 ```

+> [!TIP]
+> When using `addLemma`, the method will automatically merge lemmas with the same headword by default, combining their annotations and adding new sublemmas. Use `addLemma(lemma, false)` to add a lemma without merging.
+
 ## Entry Types

 ### Standard Lemma
@@ -119,6 +150,62 @@ A lemma that has sublemmas including a redirection:
 left|left-handed,>(sp,pp)leave
 ```

+## Advanced Features
+
+### Validation
+
+LexM includes comprehensive validation to ensure your dictionary data is consistent and free of conflicts:
+
+```ruby
+list = LemmaList.new
+# Add lemmas...
+
+# Option 1: Validates and returns true/false
+if list.validate
+  puts "Dictionary is valid!"
+else
+  puts "Dictionary contains errors"
+end
+
+# Option 2: Get a list of all validation errors
+errors = list.validateAll
+if errors.empty?
+  puts "Dictionary is valid!"
+else
+  puts "Validation errors:"
+  errors.each { |error| puts "- #{error}" }
+end
+```
+
+> [!IMPORTANT]
+> The `validateAll` method checks for all validation issues at once, including:
+> - Duplicate headwords
+> - Words that appear as both headwords and sublemmas
+> - Words that appear as both normal headwords and redirection headwords
+> - Circular dependencies and redirections
+
+### File Operations
+
+Load from and save to LexM files:
+
+```ruby
+# Load from file
+lemmas = LemmaList.new("dictionary.lexm")
+
+# Save to file
+lemmas.save("updated_dictionary.lexm")
+```
+
+## LexM Format Specification
+
+| Element | Syntax | Example |
+|---------|--------|---------|
+| Lemma | `word` | `run` |
+| Annotations | `[key:value,key2:value2]` | `[sp:ran,pp:run]` |
+| Sublemmas | `\|sublemma1,sublemma2` | `\|run away,run up` |
+| Redirection | `>>(type)target` | `>>(pl)child` |
+| Sublemma Redirection | `>(type)target` | `>(sp)rise` |
+
 ## Attribution
 LexM was created and developed by Yanis Zafirópulos (a.k.a. Dr.Kameleon). If you use this software, please maintain this attribution.
@@ -129,4 +216,24 @@ If you use LexM in your research or applications, please cite it as:

 ## License

-
+MIT License
+
+Copyright (c) 2025 Yanis Zafirópulos (aka Dr.Kameleon)
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
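The syntax table added to the README can be exercised with a small standalone splitter. This is only a sketch: `split_lexm_line` is a hypothetical helper and not the gem's parser, and it assumes that a headword-level redirection is written as `word>>(type)target` and that annotations are comma-separated `key:value` pairs.

```ruby
# Hypothetical helper (not part of the lexm gem): splits one LexM line into
# headword, annotations and sublemmas, following the README syntax table.
def split_lexm_line(line)
  head, subs = line.strip.split("|", 2)
  raise ArgumentError, "empty entry" if head.nil? || head.empty?

  # Assumed form of a headword-level redirection: word>>(type)target
  redirect = head.match(/\A(?<word>[^\[>]+)>>\((?<types>[^)]*)\)(?<target>.+)\z/)
  if redirect
    return { headword: redirect[:word],
             redirect: { types: redirect[:types].split(","), target: redirect[:target] } }
  end

  entry = head.match(/\A(?<word>[^\[]+)(?:\[(?<ann>[^\]]*)\])?\z/)
  raise ArgumentError, "malformed entry: #{line}" unless entry

  # "sp:ran,pp:run" becomes {"sp" => "ran", "pp" => "run"}
  annotations = (entry[:ann] || "").split(",").map do |pair|
    key, value = pair.split(":", 2)
    [key, value]
  end.to_h

  # Split on commas that are not inside parentheses, so that a sublemma
  # redirection such as ">(sp,pp)leave" stays in one piece.
  sublemmas = subs ? subs.scan(/(?:\([^)]*\)|[^,])+/) : []

  { headword: entry[:word], annotations: annotations, sublemmas: sublemmas }
end

p split_lexm_line("left|left-handed,>(sp,pp)leave")
```

Splitting sublemmas with a parenthesis-aware pattern rather than a plain `split(",")` matters because relation lists such as `(sp,pp)` contain commas themselves.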
data/bin/lexm
CHANGED
@@ -2,11 +2,11 @@
 #############################################################
 # LexM - Lemma Markup Format
 #
-# A specification for representing
+# A specification for representing dictionary-ready,
 # lexical entries and their relationships
 #
 # File: bin/lexm
-#
+# (c) 2025 Yanis Zafirópulos (aka Dr.Kameleon)
 #############################################################

 require "lexm"
data/lib/lexm/lemma.rb
CHANGED
@@ -1,25 +1,33 @@
 #############################################################
 # LexM - Lemma Markup Format
 #
-# A specification for representing
+# A specification for representing dictionary-ready,
 # lexical entries and their relationships
 #
 # File: lib/lexm/lemma.rb
-#
+# (c) 2025 Yanis Zafirópulos (aka Dr.Kameleon)
 #############################################################

 module LexM
   # Represents a lemma, the main entry in a lexicon
   class Lemma
     attr_accessor :text, :annotations, :sublemmas, :redirect
+    # Source location information
+    attr_accessor :source_file, :source_line, :source_column

     # Initialize from either a string or direct components
     # @param input [String, nil] input string in LexM format to parse
-
+    # @param source_file [String, nil] source file path
+    # @param source_line [Integer, nil] source line number
+    # @param source_column [Integer, nil] source column number
+    def initialize(input = nil, source_file = nil, source_line = nil, source_column = nil)
       @text = nil
       @annotations = {}
       @sublemmas = []
       @redirect = nil
+      @source_file = source_file
+      @source_line = source_line
+      @source_column = source_column

       parse(input) if input.is_a?(String)
     end
data/lib/lexm/lemma_list.rb
CHANGED
@@ -1,11 +1,11 @@
 #############################################################
 # LexM - Lemma Markup Format
 #
-# A specification for representing
+# A specification for representing dictionary-ready,
 # lexical entries and their relationships
 #
 # File: lib/lexm/lemma_list.rb
-#
+# (c) 2025 Yanis Zafirópulos (aka Dr.Kameleon)
 #############################################################

 module LexM
@@ -63,10 +63,13 @@ module LexM
     # @param text [String] text to parse
     # @return [LemmaList] self
     def parseString(text)
+      line_number = 0
       text.each_line do |line|
+        line_number += 1
         line = line.strip
         next if line.empty? || line.start_with?('#')
-
+        lemma = Lemma.new(line, "string input", line_number, 1)
+        @lemmas << lemma
       end
       self
     end
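The counter added to `parseString` numbers every physical line, including blank lines and comments, so the reported line matches the input. A minimal standalone sketch of the same bookkeeping, using `each_line.with_index(1)` instead of a manual counter (the helper name `numbered_entries` is hypothetical, not part of the gem):

```ruby
# Hypothetical helper: pairs each non-blank, non-comment line with its
# 1-based line number in the original text.
def numbered_entries(text)
  text.each_line.with_index(1).map do |line, line_number|
    line = line.strip
    # Skipped lines still consume a line number, as in the hunk above.
    next if line.empty? || line.start_with?("#")

    [line_number, line]
  end.compact
end

sample = "# verbs\nrun|run away\n\nleft|left-handed\n"
numbered_entries(sample).each { |number, entry| puts "#{number}: #{entry}" }
```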
@@ -84,7 +87,12 @@ module LexM
         next if line.empty? || line.start_with?('#')

         begin
-
+          # Create lemma with source location info
+          lemma = Lemma.new(line, filename, line_number, 1)
+          @lemmas << lemma
+
+          # Track sublemma positions
+          track_sublemma_positions(lemma, line, filename, line_number)
         rescue StandardError => e
           raise "Error on line #{line_number}: #{e.message} (#{line})"
         end
@@ -100,17 +108,61 @@ module LexM
       self
     end

+    # Track source positions for sublemmas
+    # @param lemma [Lemma] The lemma containing sublemmas
+    # @param line [String] The original line from the file
+    # @param filename [String] Source filename
+    # @param line_number [Integer] Source line number
+    # @return [void]
+    def track_sublemma_positions(lemma, line, filename, line_number)
+      return if line.nil? || lemma.redirected? || !line.include?("|")
+
+      # Find where sublemmas begin
+      sublemmas_start = line.index("|") + 1
+
+      # For each sublemma, try to find its position in the line
+      lemma.sublemmas.each do |sublemma|
+        sublemma.source_file = filename
+        sublemma.source_line = line_number
+
+        # Determine column position
+        if sublemma.text
+          # Find position of this sublemma text in the line
+          text_pos = line.index(sublemma.text, sublemmas_start)
+          sublemma.source_column = text_pos ? text_pos + 1 : sublemmas_start
+        elsif sublemma.redirect
+          # Find position of redirection marker
+          redirect_pos = line.index('>', sublemmas_start)
+          sublemma.source_column = redirect_pos ? redirect_pos + 1 : sublemmas_start
+        end
+      end
+    end
+
+    # Helper method to format source location
+    # @param item [Object] Object with source location attributes
+    # @return [String] Formatted source location
+    def source_location_str(item)
+      if item.source_file && item.source_line
+        col_info = item.source_column ? ", col: #{item.source_column}" : ""
+        "#{item.source_file}:#{item.source_line}#{col_info}"
+      else
+        "unknown location"
+      end
+    end
+
     # Check for circular redirection chains
     # For example, if A redirects to B, which redirects back to A
     # @return [Boolean] true if no circular redirections are found
     # @raise [StandardError] with cycle path if circular redirections are detected
     def validateRedirections
-      # Build a redirection graph
+      # Build a redirection graph with locations
       redirection_map = {}
+      location_map = {}

       @lemmas.each do |lemma|
         if lemma.redirected?
           redirection_map[lemma.text] = lemma.redirect.target
+          location_map[lemma.text] = source_location_str(lemma)
         end
       end
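The column tracking in this hunk relies on `String#index` with a start offset and converts the 0-based result to a 1-based column. Because every search starts at the same `sublemmas_start`, a sublemma whose text also occurs earlier in the line would be given the earlier position. The sketch below shows one possible variation that advances a cursor after each match; it is not the gem's code. `Located`, `locate_sublemmas` and `location_str` are hypothetical names, and the sketch assumes plain-text sublemmas separated by commas.

```ruby
# Hypothetical stand-in for an object carrying source location fields.
Located = Struct.new(:text, :source_file, :source_line, :source_column)

# Assigns a 1-based column to each comma-separated sublemma after the "|".
def locate_sublemmas(line, filename, line_number)
  bar = line.index("|")
  return [] if bar.nil?

  cursor = bar + 1
  line[cursor..-1].split(",").map do |text|
    # The pieces come from the line itself, in order, so the search from
    # the cursor always succeeds.
    pos = line.index(text, cursor)
    cursor = pos + text.length
    Located.new(text, filename, line_number, pos + 1)
  end
end

# Same output shape as source_location_str in the hunk above.
def location_str(item)
  return "unknown location" unless item.source_file && item.source_line

  col = item.source_column ? ", col: #{item.source_column}" : ""
  "#{item.source_file}:#{item.source_line}#{col}"
end

locate_sublemmas("run|run away,run", "dict.lexm", 3).each do |sub|
  puts "#{sub.text} at #{location_str(sub)}"
end
```

With a fixed search start, the second sublemma `run` in this example would match inside `run away` and report column 5; advancing the cursor reports column 14.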
@@ -125,21 +177,310 @@ module LexM
         end

         if redirection_map.key?(current) && current == start
-
-
+          # Format the cycle with locations
+          cycle_path = visited.map do |word|
+            loc = location_map[word] || "unknown location"
+            "#{word} (#{loc})"
+          end
+
+          cycle_path << "#{current} (#{location_map[current]})"
+          raise "Circular redirection detected: #{cycle_path.join(' -> ')}"
+        end
+      end
+
+      true
+    end
+
+    # Ensures no headword appears more than once in the list
+    # This prevents ambiguity and conflicts in the dictionary
+    # @return [Boolean] true if no duplicate headwords are found
+    # @raise [StandardError] if duplicate headwords are detected
+    def validateHeadwords
+      # Check for duplicate headwords
+      headwords = {}
+
+      @lemmas.each do |lemma|
+        if headwords.key?(lemma.text)
+          location1 = source_location_str(headwords[lemma.text])
+          location2 = source_location_str(lemma)
+          raise "Duplicate headword detected: '#{lemma.text}' at #{location1} and #{location2}"
+        end
+        headwords[lemma.text] = lemma
+      end
+
+      true
+    end
+
+    # Ensures that words don't appear as both headwords and sublemmas,
+    # and that the same sublemma doesn't appear under multiple headwords
+    # @return [Boolean] true if no conflicts are found
+    # @raise [StandardError] if conflicts are detected
+    def validateSublemmaRelationships
+      # Build word maps with source tracking
+      normal_headwords = {}
+      redirection_headwords = {}
+      sublemmas_map = {}
+
+      # First, capture all headwords and their sublemmas
+      @lemmas.each do |lemma|
+        if lemma.redirected?
+          redirection_headwords[lemma.text] = lemma
+        else
+          normal_headwords[lemma.text] = lemma
+
+          # Process sublemmas for non-redirecting lemmas
+          lemma.sublemmas.each do |sublemma|
+            # Skip redirecting sublemmas, we only care about actual sublemmas with text
+            next if sublemma.redirected?
+
+            # Record which headword this sublemma belongs to
+            if sublemmas_map.key?(sublemma.text)
+              sublemmas_map[sublemma.text] << [lemma, sublemma]
+            else
+              sublemmas_map[sublemma.text] = [[lemma, sublemma]]
+            end
+          end
+        end
+      end
+
+      # Check for words that are both normal headwords and redirection headwords
+      normal_headwords.each do |word, lemma|
+        if redirection_headwords.key?(word)
+          location1 = source_location_str(lemma)
+          location2 = source_location_str(redirection_headwords[word])
+          raise "Word '#{word}' is both a normal headword (#{location1}) and a redirection headword (#{location2})"
+        end
+      end
+
+      # Check for words that are both headwords and sublemmas
+      normal_headwords.each do |word, lemma|
+        if sublemmas_map.key?(word)
+          location1 = source_location_str(lemma)
+          sublemma_info = sublemmas_map[word].map do |l, s|
+            "#{l.text} (#{source_location_str(s)})"
+          end.join(', ')
+          raise "Word '#{word}' is both a headword (#{location1}) and a sublemma of #{sublemma_info}"
+        end
+      end
+
+      # Check for words that are both redirection headwords and sublemmas
+      redirection_headwords.each do |word, lemma|
+        if sublemmas_map.key?(word)
+          location1 = source_location_str(lemma)
+          sublemma_info = sublemmas_map[word].map do |l, s|
+            "#{l.text} (#{source_location_str(s)})"
+          end.join(', ')
+          raise "Word '#{word}' is both a redirection headword (#{location1}) and a sublemma of #{sublemma_info}"
+        end
+      end
+
+      # Check for sublemmas that appear in multiple entries
+      sublemmas_map.each do |sublemma, entries|
+        if entries.size > 1
+          headword_info = entries.map do |l, s|
+            "#{l.text} (#{source_location_str(s)})"
+          end.join(', ')
+          raise "Sublemma '#{sublemma}' appears in multiple entries: #{headword_info}"
+        end
+      end
+
+      true
+    end
+
+    # Detects circular dependencies between lemmas and sublemmas
+    # A circular dependency would result in infinite recursion when
+    # expanding or processing the lemma structure
+    # @return [Boolean] true if no circular dependencies are found
+    # @raise [StandardError] if circular dependencies are detected
+    def validateCircularDependencies
+      # Build a graph of dependencies (headword -> sublemmas) with locations
+      dependency_graph = {}
+      location_map = {}
+
+      @lemmas.each do |lemma|
+        next if lemma.redirected?
+
+        # Track lemma location
+        location_map[lemma.text] = source_location_str(lemma)
+
+        # Initialize headword in the graph if not present
+        dependency_graph[lemma.text] ||= []
+
+        # Add all non-redirecting sublemmas as dependencies
+        lemma.sublemmas.each do |sublemma|
+          next if sublemma.redirected?
+          dependency_graph[lemma.text] << sublemma.text
+          location_map[sublemma.text] ||= source_location_str(sublemma)
+        end
+      end
+
+      # For each headword, check for circular dependencies
+      dependency_graph.each_key do |start|
+        detectCycles(dependency_graph, start, [], [], location_map)
+      end
+
+      true
+    end
+
+    # Helper method for validateCircularDependencies
+    # Recursively traverses the dependency graph to find cycles using DFS
+    # @param graph [Hash] The dependency graph mapping lemmas to their sublemmas
+    # @param start [String] The starting node for cycle detection
+    # @param visited [Array] Nodes already visited in any path
+    # @param path [Array] Nodes visited in the current path
+    # @param location_map [Hash] Map of words to their source locations
+    # @return [Boolean] True if no cycles are detected
+    # @raise [StandardError] if a cycle is detected
+    def detectCycles(graph, start, visited = [], path = [], location_map = {})
+      # Mark the current node as visited and add to path
+      visited << start
+      path << start
+
+      # Visit all neighbors
+      if graph.key?(start)
+        graph[start].each do |neighbor|
+          # Skip if neighbor is not a headword (not in graph)
+          next unless graph.key?(neighbor)
+
+          if !visited.include?(neighbor)
+            detectCycles(graph, neighbor, visited, path, location_map)
+          elsif path.include?(neighbor)
+            # Cycle detected
+            cycle_start_index = path.index(neighbor)
+            cycle = path[cycle_start_index..-1] << neighbor
+
+            # Format the cycle with source locations
+            cycle_with_locations = cycle.map do |word|
+              loc = location_map[word] || "unknown location"
+              "#{word} (#{loc})"
+            end
+
+            raise "Circular dependency detected: #{cycle_with_locations.join(' -> ')}"
+          end
         end
       end

+      # Remove the current node from path
+      path.pop
       true
     end

     # Validate the entire lemma list for consistency
     # Runs all validation checks
     # @return [Boolean] true if validation passes
-    # @raise [StandardError] with detailed message if validation fails
     def validate
-
-
+      begin
+        validateHeadwords
+        validateSublemmaRelationships
+        validateCircularDependencies
+        validateRedirections
+        return true
+      rescue StandardError => e
+        puts "Validation error: #{e.message}"
+        return false
+      end
+    end
+
+    # Performs all validation checks and returns an array of all errors
+    # instead of raising on the first error encountered
+    # @return [Array<String>] List of validation errors or empty array if valid
+    def validateAll
+      errors = []
+
+      # Create maps for tracking word usage with source locations
+      normal_headwords = {}
+      redirection_headwords = {}
+      sublemmas_map = {}
+
+      # First, map out all words and their locations
+      @lemmas.each do |lemma|
+        location = source_location_str(lemma)
+
+        if lemma.redirected?
+          redirection_headwords[lemma.text] = location
+        else
+          normal_headwords[lemma.text] = location
+
+          # Process sublemmas for non-redirecting lemmas
+          lemma.sublemmas.each do |sublemma|
+            next if sublemma.redirected?
+
+            sub_location = source_location_str(sublemma)
+
+            # Record which headword this sublemma belongs to with location
+            if sublemmas_map.key?(sublemma.text)
+              sublemmas_map[sublemma.text] << [lemma.text, sub_location]
+            else
+              sublemmas_map[sublemma.text] = [[lemma.text, sub_location]]
+            end
+          end
+        end
+      end
+
+      # Check for duplicate headwords with locations
+      headword_locations = {}
+      @lemmas.each do |lemma|
+        location = source_location_str(lemma)
+        if headword_locations.key?(lemma.text)
+          headword_locations[lemma.text] << location
+        else
+          headword_locations[lemma.text] = [location]
+        end
+      end
+
+      headword_locations.each do |word, locations|
+        if locations.size > 1
+          errors << "Duplicate headword detected: '#{word}' at #{locations.join(' and ')}"
+        end
+      end
+
+      # Check for words that are both normal headwords and redirection headwords
+      normal_headwords.each do |word, location|
+        if redirection_headwords.key?(word)
+          errors << "Word '#{word}' is both a normal headword (#{location}) and a redirection headword (#{redirection_headwords[word]})"
+        end
+      end
+
+      # Check for words that are both headwords and sublemmas
+      normal_headwords.each do |word, location|
+        if sublemmas_map.key?(word)
+          sublemma_info = sublemmas_map[word].map { |h, l| "#{h} (#{l})" }.join(', ')
+          errors << "Word '#{word}' is both a headword (#{location}) and a sublemma of #{sublemma_info}"
+        end
+      end
+
+      # Check for words that are both redirection headwords and sublemmas
+      redirection_headwords.each do |word, location|
+        if sublemmas_map.key?(word)
+          sublemma_info = sublemmas_map[word].map { |h, l| "#{h} (#{l})" }.join(', ')
+          errors << "Word '#{word}' is both a redirection headword (#{location}) and a sublemma of #{sublemma_info}"
+        end
+      end
+
+      # Check for sublemmas that appear in multiple entries
+      sublemmas_map.each do |sublemma, headword_list|
+        if headword_list.size > 1
+          headword_info = headword_list.map { |h, l| "#{h} (#{l})" }.join(', ')
+          errors << "Sublemma '#{sublemma}' appears in multiple entries: #{headword_info}"
+        end
+      end
+
+      # Check for circular dependencies and redirections if no errors so far
+      if errors.empty?
+        begin
+          validateCircularDependencies
+        rescue StandardError => e
+          errors << e.message
+        end
+
+        begin
+          validateRedirections
+        rescue StandardError => e
+          errors << e.message
+        end
+      end
+
+      errors
     end

     # Find lemmas by lemma text
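The `detectCycles` helper added in this hunk is a depth-first search that keeps two collections: nodes seen on any path (`visited`) and nodes on the current path (`path`). A neighbor that is already on the current path closes a cycle. The standalone sketch below shows the same idea on a plain hash. The names `cycle_from` and `find_cycle` are hypothetical, and unlike the gem's method it returns the cycle instead of raising and shares one `visited` set across all start nodes.

```ruby
# Depth-first search from `node`; returns the cycle as an array of nodes
# (first node repeated at the end) or nil if none is reachable.
def cycle_from(graph, node, visited, path)
  visited[node] = true
  path << node

  graph.fetch(node, []).each do |neighbor|
    # Only headwords (keys of the graph) can continue a chain.
    next unless graph.key?(neighbor)

    if !visited[neighbor]
      cycle = cycle_from(graph, neighbor, visited, path)
      return cycle if cycle
    elsif path.include?(neighbor)
      # The neighbor is on the current path, so the path loops back to it.
      return path[path.index(neighbor)..-1] + [neighbor]
    end
  end

  # Backtrack: this node is no longer on the current path.
  path.pop
  nil
end

def find_cycle(graph)
  visited = {}
  graph.each_key do |start|
    next if visited[start]

    cycle = cycle_from(graph, start, visited, [])
    return cycle if cycle
  end
  nil
end

graph = { "a" => ["b"], "b" => ["c"], "c" => ["a"] }
cycle = find_cycle(graph)
puts cycle ? "Circular dependency: #{cycle.join(' -> ')}" : "No cycles"
```

Popping the node on the way out is what separates "seen before" from "on the current path"; without it, two headwords that share a sublemma chain would be reported as a cycle.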
@@ -193,19 +534,51 @@ module LexM
       end
     end

-    #
-    #
-    #
-
-
+    # Adds a lemma to the list
+    # If a lemma with the same headword already exists, it will merge the
+    # annotations and sublemmas from the new lemma into the existing one
+    # @param lemma [Lemma] The lemma to add
+    # @param merge [Boolean] Whether to merge with existing lemmas (default: true)
+    # @return [LemmaList] self for method chaining
+    def addLemma(lemma, merge = true)
+      # Find existing lemma with the same headword
+      existing = findByText(lemma.text).first
+
+      if existing && merge
+        # Merge annotations
+        lemma.annotations.each do |key, value|
+          existing.setAnnotation(key, value)
+        end
+
+        # Merge sublemmas
+        lemma.sublemmas.each do |sublemma|
+          # Check if this sublemma already exists
+          sublemma_exists = existing.sublemmas.any? do |existing_sublemma|
+            existing_sublemma.text == sublemma.text &&
+              (!existing_sublemma.redirected? && !sublemma.redirected?)
+          end
+
+          # Add the sublemma if it doesn't exist
+          unless sublemma_exists
+            existing.sublemmas << sublemma
+          end
+        end
+      else
+        # Add as new lemma
+        @lemmas << lemma
+      end
+
       self
     end

     # Add multiple lemmas at once
     # @param lemmas [Array<Lemma>] lemmas to add
+    # @param merge [Boolean] Whether to merge with existing lemmas (default: true)
     # @return [LemmaList] self
-    def addLemmas(lemmas)
-
+    def addLemmas(lemmas, merge = true)
+      lemmas.each do |lemma|
+        addLemma(lemma, merge)
+      end
       self
     end
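The merge behaviour of `addLemma` can be illustrated without the gem. The sketch below uses a hypothetical `Entry` struct and `add_entry` function with plain strings as sublemmas. It assumes that an annotation from the incoming entry replaces an existing value for the same key, which the diff does not show (that depends on `setAnnotation`).

```ruby
# Hypothetical stand-in for a lemma: headword, annotation hash, sublemma texts.
Entry = Struct.new(:text, :annotations, :sublemmas)

# Adds `entry` to `entries`; with merge enabled, an entry that shares a
# headword with an existing one is folded into it instead of appended.
def add_entry(entries, entry, merge = true)
  existing = entries.find { |candidate| candidate.text == entry.text }

  if existing && merge
    # Assumption: incoming annotation values win on key collisions.
    existing.annotations.merge!(entry.annotations)

    entry.sublemmas.each do |sublemma|
      existing.sublemmas << sublemma unless existing.sublemmas.include?(sublemma)
    end
  else
    entries << entry
  end

  entries
end

entries = []
add_entry(entries, Entry.new("run", { "sp" => "ran" }, ["run away"]))
add_entry(entries, Entry.new("run", { "pp" => "run" }, ["run away", "run up"]))
puts entries.size            # one merged entry
puts entries.first.sublemmas.inspect
```

Passing `false` as the third argument appends a second entry with the same headword, which the duplicate-headword validation would then be expected to report.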
data/lib/lexm/lemma_redirect.rb
CHANGED
@@ -1,11 +1,11 @@
 #############################################################
 # LexM - Lemma Markup Format
 #
-# A specification for representing
+# A specification for representing dictionary-ready,
 # lexical entries and their relationships
 #
 # File: lib/lexm/lemma_redirect.rb
-#
+# (c) 2025 Yanis Zafirópulos (aka Dr.Kameleon)
 #############################################################

 module LexM
data/lib/lexm/sublemma.rb
CHANGED
@@ -1,24 +1,32 @@
 #############################################################
 # LexM - Lemma Markup Format
 #
-# A specification for representing
+# A specification for representing dictionary-ready,
 # lexical entries and their relationships
 #
 # File: lib/lexm/sublemma.rb
-#
+# (c) 2025 Yanis Zafirópulos (aka Dr.Kameleon)
 #############################################################

 module LexM
   # Represents a sublemma, which can be either a textual sublemma or a redirection
   class Sublemma
     attr_accessor :text, :redirect
+    # Source location information
+    attr_accessor :source_file, :source_line, :source_column

     # Initialize a new sublemma
     # @param text [String, nil] the text of the sublemma (nil for pure redirections)
     # @param redirect [LemmaRedirect, nil] redirection information (nil for normal sublemmas)
-
+    # @param source_file [String, nil] source file path
+    # @param source_line [Integer, nil] source line number
+    # @param source_column [Integer, nil] source column number
+    def initialize(text = nil, redirect = nil, source_file = nil, source_line = nil, source_column = nil)
       @text = text
       @redirect = redirect
+      @source_file = source_file
+      @source_line = source_line
+      @source_column = source_column
     end

     # Is this a pure redirection sublemma?
data/lib/lexm/version.rb
CHANGED
@@ -1,13 +1,13 @@
 #############################################################
 # LexM - Lemma Markup Format
 #
-# A specification for representing
+# A specification for representing dictionary-ready,
 # lexical entries and their relationships
 #
 # File: lib/lexm/version.rb
-#
+# (c) 2025 Yanis Zafirópulos (aka Dr.Kameleon)
 #############################################################

 module LexM
-  VERSION = "0.
+  VERSION = "0.3.0"
 end
data/lib/lexm.rb
CHANGED
@@ -1,11 +1,11 @@
 #############################################################
 # LexM - Lemma Markup Format
 #
-# A specification for representing
+# A specification for representing dictionary-ready,
 # lexical entries and their relationships
 #
 # File: lib/lexm.rb
-#
+# (c) 2025 Yanis Zafirópulos (aka Dr.Kameleon)
 #############################################################

 require 'lexm/version'
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: lexm
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.3.0
 platform: ruby
 authors:
 - Yanis Zafirópulos
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2025-03-
+date: 2025-03-21 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rspec