lexm 0.2.0 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 1d4d7c862b94b78fad36be7529b9a2ae2b7fc0ddcb7071703146ebbdb504576b
4
- data.tar.gz: 7e5e2750d5b226fd6e6782c6e3337826d1d94a2771d4cada0ea2bf74c69dc2cd
3
+ metadata.gz: 147eccae18795425b54c13045269798e8c438523c5453279c5274fc44a95fc65
4
+ data.tar.gz: b5055ec1bb29595732129402875e58f79c9d2440053624a88dec447f8e82752f
5
5
  SHA512:
6
- metadata.gz: d91386b44fb2ad409d236a13e7685b3df4ebe6bc442de2826d488bbc927d2c28af1b6f6d3c52aabb64493f3a0b3b7e4d77951e43b022750041c5c7933a8d04fa
7
- data.tar.gz: ae7d3b47fa6a7ff65e5fb1a125d257ad4aaf83fdf03c2d7cc803f98a0f5b0dbf76e1ce5df9de885e552e5d346e27e7c5a96dc8cbd0d3446f9a3de90668560c50
6
+ metadata.gz: 6144cfd7f2eb44f7ef4a4f134925a39ff2c9a675f5176de4c09f17d02f8805aac7ba71ed5a6c81fbf48435cf99e11b4e0b912c263e3759c880a8c655983b902b
7
+ data.tar.gz: '0479a14b133a3437314bce00e2d12ae9fedd45c5c69170ff76ee0b35338aaf43d2b5efa22462b240fa2c7c1c0a06d0b6255ec7d8ea53c0156674bdf9f46c59ce'
data/README.md CHANGED
@@ -2,12 +2,37 @@
2
2
 
3
3
  <img align="center" width="400" src="icon.png"/>
4
4
 
5
- ### Lemma Markup Format<br><br>![License](https://img.shields.io/github/license/drkameleon/lexm?style=for-the-badge)
5
+ ### Lemma Markup Format
6
+
7
+ ![License](https://img.shields.io/github/license/drkameleon/lexm?style=for-the-badge)
8
+ ![Version](https://img.shields.io/badge/version-0.3.0-blue.svg?style=for-the-badge)
9
+
6
10
  </div>
7
11
 
8
12
  ---
9
13
 
10
- LexM is a concise, human-readable format for representing dictionary-ready, lexical entries with their various forms, relationships, and redirections.
14
+ <!--ts-->
15
+ * [Lemma Markup Format](#lemma-markup-format)
16
+ * [Installation](#installation)
17
+ * [Basic Format](#basic-format)
18
+ * [Examples](#examples)
19
+ * [Entry Types](#entry-types)
20
+ * [Standard Lemma](#standard-lemma)
21
+ * [Lemma with Sublemmas](#lemma-with-sublemmas)
22
+ * [Redirection Entry](#redirection-entry)
23
+ * [Mixed Format](#mixed-format)
24
+ * [Advanced Features](#advanced-features)
25
+ * [Validation](#validation)
26
+ * [File Operations](#file-operations)
27
+ * [LexM Format Specification](#lexm-format-specification)
28
+ * [Attribution](#-attribution)
29
+ * [How to Cite](#how-to-cite)
30
+ * [License](#license)
31
+ <!--te-->
32
+
33
+ ---
34
+
35
+ LexM is a concise, human-readable format for representing dictionary-ready lexical entries with their various forms, relationships, and redirections. It's designed to be both easy to write by hand and simple to parse programmatically.
11
36
 
12
37
  ## Installation
13
38
 
@@ -37,6 +62,9 @@ A LexM entry consists of a lemma (headword) and optional elements:
37
62
  lemma[annotations]|sublemma1,sublemma2,>(relation)target
38
63
  ```
39
64
 
65
+ > [!NOTE]
66
+ > The format is designed to be human-readable while still being structured enough for programmatic processing. This makes it ideal for dictionary development, computational linguistics, and language learning applications.
67
+
40
68
  ## Examples
41
69
 
42
70
  ```ruby
@@ -85,6 +113,9 @@ list.eachWord do |word|
85
113
  end
86
114
  ```
87
115
 
116
+ > [!TIP]
117
+ > When using `addLemma`, the method will automatically merge lemmas with the same headword by default, combining their annotations and adding new sublemmas. Use `addLemma(lemma, false)` to add a lemma without merging.
118
+
88
119
  ## Entry Types
89
120
 
90
121
  ### Standard Lemma
@@ -119,6 +150,62 @@ A lemma that has sublemmas including a redirection:
119
150
  left|left-handed,>(sp,pp)leave
120
151
  ```
121
152
 
153
+ ## Advanced Features
154
+
155
+ ### Validation
156
+
157
+ LexM includes comprehensive validation to ensure your dictionary data is consistent and free of conflicts:
158
+
159
+ ```ruby
160
+ list = LemmaList.new
161
+ # Add lemmas...
162
+
163
+ # Option 1: Validates and returns true/false
164
+ if list.validate
165
+ puts "Dictionary is valid!"
166
+ else
167
+ puts "Dictionary contains errors"
168
+ end
169
+
170
+ # Option 2: Get a list of all validation errors
171
+ errors = list.validateAll
172
+ if errors.empty?
173
+ puts "Dictionary is valid!"
174
+ else
175
+ puts "Validation errors:"
176
+ errors.each { |error| puts "- #{error}" }
177
+ end
178
+ ```
179
+
180
+ > [!IMPORTANT]
181
+ > The `validateAll` method checks for all validation issues at once, including:
182
+ > - Duplicate headwords
183
+ > - Words that appear as both headwords and sublemmas
184
+ > - Words that appear as both normal headwords and redirection headwords
185
+ > - Circular dependencies and redirections
186
+
187
+ ### File Operations
188
+
189
+ Load from and save to LexM files:
190
+
191
+ ```ruby
192
+ # Load from file
193
+ lemmas = LemmaList.new("dictionary.lexm")
194
+
195
+ # Save to file
196
+ lemmas.save("updated_dictionary.lexm")
197
+ ```
198
+
199
+ ## LexM Format Specification
200
+
201
+ | Element | Syntax | Example |
202
+ |---------|--------|---------|
203
+ | Lemma | `word` | `run` |
204
+ | Annotations | `[key:value,key2:value2]` | `[sp:ran,pp:run]` |
205
+ | Sublemmas | `\|sublemma1,sublemma2` | `\|run away,run up` |
206
+ | Redirection | `>>(type)target` | `>>(pl)child` |
207
+ | Sublemma Redirection | `>(type)target` | `>(sp)rise` |
208
+
122
209
  ## Attribution
123
210
  LexM was created and developed by Yanis Zafirópulos (a.k.a. Dr.Kameleon). If you use this software, please maintain this attribution.
124
211
 
@@ -129,4 +216,24 @@ If you use LexM in your research or applications, please cite it as:
129
216
 
130
217
  ## License
131
218
 
132
- This library is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
219
+ MIT License
220
+
221
+ Copyright (c) 2025 Yanis Zafirópulos (aka Dr.Kameleon)
222
+
223
+ Permission is hereby granted, free of charge, to any person obtaining a copy
224
+ of this software and associated documentation files (the "Software"), to deal
225
+ in the Software without restriction, including without limitation the rights
226
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
227
+ copies of the Software, and to permit persons to whom the Software is
228
+ furnished to do so, subject to the following conditions:
229
+
230
+ The above copyright notice and this permission notice shall be included in all
231
+ copies or substantial portions of the Software.
232
+
233
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
234
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
235
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
236
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
237
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
238
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
239
+ SOFTWARE.
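
Note on the README change above: the new "LexM Format Specification" table describes an entry as a headword, optional `[key:value]` annotations, and `|`-separated sublemmas. The sketch below is only a reading aid for that table, not the gem's parser. `split_entry` is a hypothetical helper, it ignores headword-level `>>` redirections, and it assumes that commas inside parentheses (as in `>(sp,pp)leave`) belong to the relation list rather than separating sublemmas.

```ruby
# Standalone sketch, not the lexm gem's parser. `split_entry` is hypothetical.
def split_entry(line)
  head, subs = line.strip.split("|", 2)
  match = head.match(/\A([^\[\]]+)(?:\[(.*)\])?\z/)
  raise ArgumentError, "cannot read headword: #{head.inspect}" unless match

  # "sp:ran,pp:run" -> {"sp"=>"ran", "pp"=>"run"}; a bare key maps to nil.
  annotations = (match[2] || "").split(",").to_h do |pair|
    key, value = pair.split(":", 2)
    [key, value]
  end

  # Split on commas that are not inside a parenthesised relation list.
  sublemmas = (subs || "").split(/,(?![^()]*\))/)

  { lemma: match[1], annotations: annotations, sublemmas: sublemmas }
end

entry = split_entry("left|left-handed,>(sp,pp)leave")
entry[:sublemmas] # => ["left-handed", ">(sp,pp)leave"]
```

The lookahead in the sublemma split is the one design choice worth noting: a plain `split(",")` would cut `>(sp,pp)leave` in two.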
data/bin/lexm CHANGED
@@ -2,11 +2,11 @@
2
2
  #############################################################
3
3
  # LexM - Lemma Markup Format
4
4
  #
5
- # A specification for representing, dictionary-ready
5
+ # A specification for representing dictionary-ready,
6
6
  # lexical entries and their relationships
7
7
  #
8
8
  # File: bin/lexm
9
- # Author: Yanis Zafirópulos (aka Dr.Kameleon)
9
+ # (c) 2025 Yanis Zafirópulos (aka Dr.Kameleon)
10
10
  #############################################################
11
11
 
12
12
  require "lexm"
data/lib/lexm/lemma.rb CHANGED
@@ -1,25 +1,33 @@
1
1
  #############################################################
2
2
  # LexM - Lemma Markup Format
3
3
  #
4
- # A specification for representing, dictionary-ready
4
+ # A specification for representing dictionary-ready,
5
5
  # lexical entries and their relationships
6
6
  #
7
7
  # File: lib/lexm/lemma.rb
8
- # Author: Yanis Zafirópulos (aka Dr.Kameleon)
8
+ # (c) 2025 Yanis Zafirópulos (aka Dr.Kameleon)
9
9
  #############################################################
10
10
 
11
11
  module LexM
12
12
  # Represents a lemma, the main entry in a lexicon
13
13
  class Lemma
14
14
  attr_accessor :text, :annotations, :sublemmas, :redirect
15
+ # Source location information
16
+ attr_accessor :source_file, :source_line, :source_column
15
17
 
16
18
  # Initialize from either a string or direct components
17
19
  # @param input [String, nil] input string in LexM format to parse
18
- def initialize(input = nil)
20
+ # @param source_file [String, nil] source file path
21
+ # @param source_line [Integer, nil] source line number
22
+ # @param source_column [Integer, nil] source column number
23
+ def initialize(input = nil, source_file = nil, source_line = nil, source_column = nil)
19
24
  @text = nil
20
25
  @annotations = {}
21
26
  @sublemmas = []
22
27
  @redirect = nil
28
+ @source_file = source_file
29
+ @source_line = source_line
30
+ @source_column = source_column
23
31
 
24
32
  parse(input) if input.is_a?(String)
25
33
  end
data/lib/lexm/lemma_list.rb CHANGED
@@ -1,11 +1,11 @@
1
1
  #############################################################
2
2
  # LexM - Lemma Markup Format
3
3
  #
4
- # A specification for representing, dictionary-ready
4
+ # A specification for representing dictionary-ready,
5
5
  # lexical entries and their relationships
6
6
  #
7
7
  # File: lib/lexm/lemma_list.rb
8
- # Author: Yanis Zafirópulos (aka Dr.Kameleon)
8
+ # (c) 2025 Yanis Zafirópulos (aka Dr.Kameleon)
9
9
  #############################################################
10
10
 
11
11
  module LexM
@@ -63,10 +63,13 @@ module LexM
63
63
  # @param text [String] text to parse
64
64
  # @return [LemmaList] self
65
65
  def parseString(text)
66
+ line_number = 0
66
67
  text.each_line do |line|
68
+ line_number += 1
67
69
  line = line.strip
68
70
  next if line.empty? || line.start_with?('#')
69
- @lemmas << Lemma.new(line)
71
+ lemma = Lemma.new(line, "string input", line_number, 1)
72
+ @lemmas << lemma
70
73
  end
71
74
  self
72
75
  end
@@ -84,7 +87,12 @@ module LexM
84
87
  next if line.empty? || line.start_with?('#')
85
88
 
86
89
  begin
87
- @lemmas << Lemma.new(line)
90
+ # Create lemma with source location info
91
+ lemma = Lemma.new(line, filename, line_number, 1)
92
+ @lemmas << lemma
93
+
94
+ # Track sublemma positions
95
+ track_sublemma_positions(lemma, line, filename, line_number)
88
96
  rescue StandardError => e
89
97
  raise "Error on line #{line_number}: #{e.message} (#{line})"
90
98
  end
@@ -100,17 +108,61 @@ module LexM
100
108
  self
101
109
  end
102
110
 
111
+ # Track source positions for sublemmas
112
+ # @param lemma [Lemma] The lemma containing sublemmas
113
+ # @param line [String] The original line from the file
114
+ # @param filename [String] Source filename
115
+ # @param line_number [Integer] Source line number
116
+ # @return [void]
117
+ def track_sublemma_positions(lemma, line, filename, line_number)
118
+ return if line.nil? || lemma.redirected? || !line.include?("|")
119
+
120
+ # Find where sublemmas begin
121
+ sublemmas_start = line.index("|") + 1
122
+
123
+ # For each sublemma, try to find its position in the line
124
+ lemma.sublemmas.each do |sublemma|
125
+ sublemma.source_file = filename
126
+ sublemma.source_line = line_number
127
+
128
+ # Determine column position
129
+ if sublemma.text
130
+ # Find position of this sublemma text in the line
131
+ text_pos = line.index(sublemma.text, sublemmas_start)
132
+ sublemma.source_column = text_pos ? text_pos + 1 : sublemmas_start
133
+ elsif sublemma.redirect
134
+ # Find position of redirection marker
135
+ redirect_pos = line.index('>', sublemmas_start)
136
+ sublemma.source_column = redirect_pos ? redirect_pos + 1 : sublemmas_start
137
+ end
138
+ end
139
+ end
140
+
141
+ # Helper method to format source location
142
+ # @param item [Object] Object with source location attributes
143
+ # @return [String] Formatted source location
144
+ def source_location_str(item)
145
+ if item.source_file && item.source_line
146
+ col_info = item.source_column ? ", col: #{item.source_column}" : ""
147
+ "#{item.source_file}:#{item.source_line}#{col_info}"
148
+ else
149
+ "unknown location"
150
+ end
151
+ end
152
+
103
153
  # Check for circular redirection chains
104
154
  # For example, if A redirects to B, which redirects back to A
105
155
  # @return [Boolean] true if no circular redirections are found
106
156
  # @raise [StandardError] with cycle path if circular redirections are detected
107
157
  def validateRedirections
108
- # Build a redirection graph
158
+ # Build a redirection graph with locations
109
159
  redirection_map = {}
160
+ location_map = {}
110
161
 
111
162
  @lemmas.each do |lemma|
112
163
  if lemma.redirected?
113
164
  redirection_map[lemma.text] = lemma.redirect.target
165
+ location_map[lemma.text] = source_location_str(lemma)
114
166
  end
115
167
  end
116
168
 
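
Note on `track_sublemma_positions` in the hunk above: the column for each textual sublemma comes from `line.index(sublemma.text, sublemmas_start)`, so the search always begins right after the `|` separator. The standalone sketch below reproduces just that lookup with `String#index` to show what it reports. `sublemma_column` is a hypothetical helper and does not use the gem. One consequence, assuming the diff is read correctly, is that a sublemma whose text also occurs inside an earlier sublemma gets the earlier position.

```ruby
# Standalone sketch (no lexm gem needed): mirrors the column lookup in
# track_sublemma_positions using only String#index.
def sublemma_column(line, text)
  return nil unless line.include?("|")

  start = line.index("|") + 1     # 0-based index just past the separator
  pos = line.index(text, start)   # first match at or after that index
  pos ? pos + 1 : start           # 1-based column, or the same fallback as the diff
end

puts sublemma_column("left|left-handed,>(sp,pp)leave", "left-handed") # prints 6
puts sublemma_column("left|left-handed,>(sp,pp)leave", ">")           # prints 18
puts sublemma_column("take|take over,over", "over")                   # prints 11, although the standalone "over" starts at column 16
```

The reported column is therefore best treated as a hint for error messages rather than an exact position.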
@@ -125,21 +177,310 @@ module LexM
125
177
  end
126
178
 
127
179
  if redirection_map.key?(current) && current == start
128
- cycle_path = visited.join(" -> ") + " -> " + current
129
- raise "Circular redirection detected: #{cycle_path}"
180
+ # Format the cycle with locations
181
+ cycle_path = visited.map do |word|
182
+ loc = location_map[word] || "unknown location"
183
+ "#{word} (#{loc})"
184
+ end
185
+
186
+ cycle_path << "#{current} (#{location_map[current]})"
187
+ raise "Circular redirection detected: #{cycle_path.join(' -> ')}"
188
+ end
189
+ end
190
+
191
+ true
192
+ end
193
+
194
+ # Ensures no headword appears more than once in the list
195
+ # This prevents ambiguity and conflicts in the dictionary
196
+ # @return [Boolean] true if no duplicate headwords are found
197
+ # @raise [StandardError] if duplicate headwords are detected
198
+ def validateHeadwords
199
+ # Check for duplicate headwords
200
+ headwords = {}
201
+
202
+ @lemmas.each do |lemma|
203
+ if headwords.key?(lemma.text)
204
+ location1 = source_location_str(headwords[lemma.text])
205
+ location2 = source_location_str(lemma)
206
+ raise "Duplicate headword detected: '#{lemma.text}' at #{location1} and #{location2}"
207
+ end
208
+ headwords[lemma.text] = lemma
209
+ end
210
+
211
+ true
212
+ end
213
+
214
+ # Ensures that words don't appear as both headwords and sublemmas,
215
+ # and that the same sublemma doesn't appear under multiple headwords
216
+ # @return [Boolean] true if no conflicts are found
217
+ # @raise [StandardError] if conflicts are detected
218
+ def validateSublemmaRelationships
219
+ # Build word maps with source tracking
220
+ normal_headwords = {}
221
+ redirection_headwords = {}
222
+ sublemmas_map = {}
223
+
224
+ # First, capture all headwords and their sublemmas
225
+ @lemmas.each do |lemma|
226
+ if lemma.redirected?
227
+ redirection_headwords[lemma.text] = lemma
228
+ else
229
+ normal_headwords[lemma.text] = lemma
230
+
231
+ # Process sublemmas for non-redirecting lemmas
232
+ lemma.sublemmas.each do |sublemma|
233
+ # Skip redirecting sublemmas, we only care about actual sublemmas with text
234
+ next if sublemma.redirected?
235
+
236
+ # Record which headword this sublemma belongs to
237
+ if sublemmas_map.key?(sublemma.text)
238
+ sublemmas_map[sublemma.text] << [lemma, sublemma]
239
+ else
240
+ sublemmas_map[sublemma.text] = [[lemma, sublemma]]
241
+ end
242
+ end
243
+ end
244
+ end
245
+
246
+ # Check for words that are both normal headwords and redirection headwords
247
+ normal_headwords.each do |word, lemma|
248
+ if redirection_headwords.key?(word)
249
+ location1 = source_location_str(lemma)
250
+ location2 = source_location_str(redirection_headwords[word])
251
+ raise "Word '#{word}' is both a normal headword (#{location1}) and a redirection headword (#{location2})"
252
+ end
253
+ end
254
+
255
+ # Check for words that are both headwords and sublemmas
256
+ normal_headwords.each do |word, lemma|
257
+ if sublemmas_map.key?(word)
258
+ location1 = source_location_str(lemma)
259
+ sublemma_info = sublemmas_map[word].map do |l, s|
260
+ "#{l.text} (#{source_location_str(s)})"
261
+ end.join(', ')
262
+ raise "Word '#{word}' is both a headword (#{location1}) and a sublemma of #{sublemma_info}"
263
+ end
264
+ end
265
+
266
+ # Check for words that are both redirection headwords and sublemmas
267
+ redirection_headwords.each do |word, lemma|
268
+ if sublemmas_map.key?(word)
269
+ location1 = source_location_str(lemma)
270
+ sublemma_info = sublemmas_map[word].map do |l, s|
271
+ "#{l.text} (#{source_location_str(s)})"
272
+ end.join(', ')
273
+ raise "Word '#{word}' is both a redirection headword (#{location1}) and a sublemma of #{sublemma_info}"
274
+ end
275
+ end
276
+
277
+ # Check for sublemmas that appear in multiple entries
278
+ sublemmas_map.each do |sublemma, entries|
279
+ if entries.size > 1
280
+ headword_info = entries.map do |l, s|
281
+ "#{l.text} (#{source_location_str(s)})"
282
+ end.join(', ')
283
+ raise "Sublemma '#{sublemma}' appears in multiple entries: #{headword_info}"
284
+ end
285
+ end
286
+
287
+ true
288
+ end
289
+
290
+ # Detects circular dependencies between lemmas and sublemmas
291
+ # A circular dependency would result in infinite recursion when
292
+ # expanding or processing the lemma structure
293
+ # @return [Boolean] true if no circular dependencies are found
294
+ # @raise [StandardError] if circular dependencies are detected
295
+ def validateCircularDependencies
296
+ # Build a graph of dependencies (headword -> sublemmas) with locations
297
+ dependency_graph = {}
298
+ location_map = {}
299
+
300
+ @lemmas.each do |lemma|
301
+ next if lemma.redirected?
302
+
303
+ # Track lemma location
304
+ location_map[lemma.text] = source_location_str(lemma)
305
+
306
+ # Initialize headword in the graph if not present
307
+ dependency_graph[lemma.text] ||= []
308
+
309
+ # Add all non-redirecting sublemmas as dependencies
310
+ lemma.sublemmas.each do |sublemma|
311
+ next if sublemma.redirected?
312
+ dependency_graph[lemma.text] << sublemma.text
313
+ location_map[sublemma.text] ||= source_location_str(sublemma)
314
+ end
315
+ end
316
+
317
+ # For each headword, check for circular dependencies
318
+ dependency_graph.each_key do |start|
319
+ detectCycles(dependency_graph, start, [], [], location_map)
320
+ end
321
+
322
+ true
323
+ end
324
+
325
+ # Helper method for validateCircularDependencies
326
+ # Recursively traverses the dependency graph to find cycles using DFS
327
+ # @param graph [Hash] The dependency graph mapping lemmas to their sublemmas
328
+ # @param start [String] The starting node for cycle detection
329
+ # @param visited [Array] Nodes already visited in any path
330
+ # @param path [Array] Nodes visited in the current path
331
+ # @param location_map [Hash] Map of words to their source locations
332
+ # @return [Boolean] True if no cycles are detected
333
+ # @raise [StandardError] if a cycle is detected
334
+ def detectCycles(graph, start, visited = [], path = [], location_map = {})
335
+ # Mark the current node as visited and add to path
336
+ visited << start
337
+ path << start
338
+
339
+ # Visit all neighbors
340
+ if graph.key?(start)
341
+ graph[start].each do |neighbor|
342
+ # Skip if neighbor is not a headword (not in graph)
343
+ next unless graph.key?(neighbor)
344
+
345
+ if !visited.include?(neighbor)
346
+ detectCycles(graph, neighbor, visited, path, location_map)
347
+ elsif path.include?(neighbor)
348
+ # Cycle detected
349
+ cycle_start_index = path.index(neighbor)
350
+ cycle = path[cycle_start_index..-1] << neighbor
351
+
352
+ # Format the cycle with source locations
353
+ cycle_with_locations = cycle.map do |word|
354
+ loc = location_map[word] || "unknown location"
355
+ "#{word} (#{loc})"
356
+ end
357
+
358
+ raise "Circular dependency detected: #{cycle_with_locations.join(' -> ')}"
359
+ end
130
360
  end
131
361
  end
132
362
 
363
+ # Remove the current node from path
364
+ path.pop
133
365
  true
134
366
  end
135
367
 
136
368
  # Validate the entire lemma list for consistency
137
369
  # Runs all validation checks
138
370
  # @return [Boolean] true if validation passes
139
- # @raise [StandardError] with detailed message if validation fails
140
371
  def validate
141
- validateRedirections
142
- true
372
+ begin
373
+ validateHeadwords
374
+ validateSublemmaRelationships
375
+ validateCircularDependencies
376
+ validateRedirections
377
+ return true
378
+ rescue StandardError => e
379
+ puts "Validation error: #{e.message}"
380
+ return false
381
+ end
382
+ end
383
+
384
+ # Performs all validation checks and returns an array of all errors
385
+ # instead of raising on the first error encountered
386
+ # @return [Array<String>] List of validation errors or empty array if valid
387
+ def validateAll
388
+ errors = []
389
+
390
+ # Create maps for tracking word usage with source locations
391
+ normal_headwords = {}
392
+ redirection_headwords = {}
393
+ sublemmas_map = {}
394
+
395
+ # First, map out all words and their locations
396
+ @lemmas.each do |lemma|
397
+ location = source_location_str(lemma)
398
+
399
+ if lemma.redirected?
400
+ redirection_headwords[lemma.text] = location
401
+ else
402
+ normal_headwords[lemma.text] = location
403
+
404
+ # Process sublemmas for non-redirecting lemmas
405
+ lemma.sublemmas.each do |sublemma|
406
+ next if sublemma.redirected?
407
+
408
+ sub_location = source_location_str(sublemma)
409
+
410
+ # Record which headword this sublemma belongs to with location
411
+ if sublemmas_map.key?(sublemma.text)
412
+ sublemmas_map[sublemma.text] << [lemma.text, sub_location]
413
+ else
414
+ sublemmas_map[sublemma.text] = [[lemma.text, sub_location]]
415
+ end
416
+ end
417
+ end
418
+ end
419
+
420
+ # Check for duplicate headwords with locations
421
+ headword_locations = {}
422
+ @lemmas.each do |lemma|
423
+ location = source_location_str(lemma)
424
+ if headword_locations.key?(lemma.text)
425
+ headword_locations[lemma.text] << location
426
+ else
427
+ headword_locations[lemma.text] = [location]
428
+ end
429
+ end
430
+
431
+ headword_locations.each do |word, locations|
432
+ if locations.size > 1
433
+ errors << "Duplicate headword detected: '#{word}' at #{locations.join(' and ')}"
434
+ end
435
+ end
436
+
437
+ # Check for words that are both normal headwords and redirection headwords
438
+ normal_headwords.each do |word, location|
439
+ if redirection_headwords.key?(word)
440
+ errors << "Word '#{word}' is both a normal headword (#{location}) and a redirection headword (#{redirection_headwords[word]})"
441
+ end
442
+ end
443
+
444
+ # Check for words that are both headwords and sublemmas
445
+ normal_headwords.each do |word, location|
446
+ if sublemmas_map.key?(word)
447
+ sublemma_info = sublemmas_map[word].map { |h, l| "#{h} (#{l})" }.join(', ')
448
+ errors << "Word '#{word}' is both a headword (#{location}) and a sublemma of #{sublemma_info}"
449
+ end
450
+ end
451
+
452
+ # Check for words that are both redirection headwords and sublemmas
453
+ redirection_headwords.each do |word, location|
454
+ if sublemmas_map.key?(word)
455
+ sublemma_info = sublemmas_map[word].map { |h, l| "#{h} (#{l})" }.join(', ')
456
+ errors << "Word '#{word}' is both a redirection headword (#{location}) and a sublemma of #{sublemma_info}"
457
+ end
458
+ end
459
+
460
+ # Check for sublemmas that appear in multiple entries
461
+ sublemmas_map.each do |sublemma, headword_list|
462
+ if headword_list.size > 1
463
+ headword_info = headword_list.map { |h, l| "#{h} (#{l})" }.join(', ')
464
+ errors << "Sublemma '#{sublemma}' appears in multiple entries: #{headword_info}"
465
+ end
466
+ end
467
+
468
+ # Check for circular dependencies and redirections if no errors so far
469
+ if errors.empty?
470
+ begin
471
+ validateCircularDependencies
472
+ rescue StandardError => e
473
+ errors << e.message
474
+ end
475
+
476
+ begin
477
+ validateRedirections
478
+ rescue StandardError => e
479
+ errors << e.message
480
+ end
481
+ end
482
+
483
+ errors
143
484
  end
144
485
 
145
486
  # Find lemmas by lemma text
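
Note on `detectCycles` in the hunk above: it walks the headword-to-sublemma graph depth-first and reports a cycle when a neighbor is already on the current path. A compact standalone sketch of the same idea follows. `find_cycle` and `first_cycle` are hypothetical names, the graph is a plain Hash rather than a `LemmaList`, and the sketch returns the cycle instead of raising, so it illustrates the technique rather than reproducing the gem's code.

```ruby
require "set"

# Depth-first search from `node`; returns the cycle as an array of words
# (first word repeated at the end) or nil if none is reachable.
def find_cycle(graph, node, visited, path)
  visited << node
  path << node

  graph.fetch(node, []).each do |neighbor|
    next unless graph.key?(neighbor) # plain sublemma, not a headword

    if path.include?(neighbor)
      return path[path.index(neighbor)..-1] + [neighbor]
    elsif !visited.include?(neighbor)
      cycle = find_cycle(graph, neighbor, visited, path)
      return cycle if cycle
    end
  end

  path.pop # backtrack: this node is no longer on the current path
  nil
end

# Tries every headword as a starting point, sharing one visited set.
def first_cycle(graph)
  visited = Set.new
  graph.each_key do |start|
    next if visited.include?(start)

    cycle = find_cycle(graph, start, visited, [])
    return cycle if cycle
  end
  nil
end

graph = { "a" => ["b"], "b" => ["c"], "c" => ["a"], "d" => ["x"] }
puts first_cycle(graph).join(" -> ") # prints a -> b -> c -> a
```

Using a `Set` for `visited` keeps membership checks constant-time; the diff uses an Array, which behaves the same on small dictionaries.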
@@ -193,19 +534,51 @@ module LexM
193
534
  end
194
535
  end
195
536
 
196
- # Add a new lemma
197
- # @param lemma [Lemma] lemma to add
198
- # @return [LemmaList] self
199
- def addLemma(lemma)
200
- @lemmas << lemma
537
+ # Adds a lemma to the list
538
+ # If a lemma with the same headword already exists, it will merge the
539
+ # annotations and sublemmas from the new lemma into the existing one
540
+ # @param lemma [Lemma] The lemma to add
541
+ # @param merge [Boolean] Whether to merge with existing lemmas (default: true)
542
+ # @return [LemmaList] self for method chaining
543
+ def addLemma(lemma, merge = true)
544
+ # Find existing lemma with the same headword
545
+ existing = findByText(lemma.text).first
546
+
547
+ if existing && merge
548
+ # Merge annotations
549
+ lemma.annotations.each do |key, value|
550
+ existing.setAnnotation(key, value)
551
+ end
552
+
553
+ # Merge sublemmas
554
+ lemma.sublemmas.each do |sublemma|
555
+ # Check if this sublemma already exists
556
+ sublemma_exists = existing.sublemmas.any? do |existing_sublemma|
557
+ existing_sublemma.text == sublemma.text &&
558
+ (!existing_sublemma.redirected? && !sublemma.redirected?)
559
+ end
560
+
561
+ # Add the sublemma if it doesn't exist
562
+ unless sublemma_exists
563
+ existing.sublemmas << sublemma
564
+ end
565
+ end
566
+ else
567
+ # Add as new lemma
568
+ @lemmas << lemma
569
+ end
570
+
201
571
  self
202
572
  end
203
573
 
204
574
  # Add multiple lemmas at once
205
575
  # @param lemmas [Array<Lemma>] lemmas to add
576
+ # @param merge [Boolean] Whether to merge with existing lemmas (default: true)
206
577
  # @return [LemmaList] self
207
- def addLemmas(lemmas)
208
- @lemmas.concat(lemmas)
578
+ def addLemmas(lemmas, merge = true)
579
+ lemmas.each do |lemma|
580
+ addLemma(lemma, merge)
581
+ end
209
582
  self
210
583
  end
211
584
 
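
Note on the `addLemma` change above: with `merge` left at its default, a second lemma with the same headword is folded into the first instead of being appended. The standalone sketch below models that behavior with a `Struct` so it can run without the gem. `Entry` and `add_entry` are hypothetical names, sublemmas are reduced to plain strings, and `Hash#merge!` stands in for the `setAnnotation` calls on the assumption that a repeated key takes the newer value.

```ruby
# Standalone model of the merge-on-add behavior; not the gem's classes.
Entry = Struct.new(:text, :annotations, :sublemmas)

def add_entry(entries, entry, merge = true)
  existing = entries.find { |e| e.text == entry.text }

  if existing && merge
    # Fold annotations in; assumed: a repeated key takes the newer value.
    existing.annotations.merge!(entry.annotations)
    # Append only sublemmas that are not already present.
    entry.sublemmas.each do |sub|
      existing.sublemmas << sub unless existing.sublemmas.include?(sub)
    end
  else
    entries << entry
  end

  entries
end

entries = []
add_entry(entries, Entry.new("run", { "sp" => "ran" }, ["run away"]))
add_entry(entries, Entry.new("run", { "pp" => "run" }, ["run away", "run up"]))
puts entries.size               # prints 1
puts entries.first.sublemmas.size # prints 2
```

Passing `false` as the third argument appends a second entry with the same headword, which is the situation `validateHeadwords` in this release is written to flag.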
data/lib/lexm/lemma_redirect.rb CHANGED
@@ -1,11 +1,11 @@
1
1
  #############################################################
2
2
  # LexM - Lemma Markup Format
3
3
  #
4
- # A specification for representing, dictionary-ready
4
+ # A specification for representing dictionary-ready,
5
5
  # lexical entries and their relationships
6
6
  #
7
7
  # File: lib/lexm/lemma_redirect.rb
8
- # Author: Yanis Zafirópulos (aka Dr.Kameleon)
8
+ # (c) 2025 Yanis Zafirópulos (aka Dr.Kameleon)
9
9
  #############################################################
10
10
 
11
11
  module LexM
data/lib/lexm/sublemma.rb CHANGED
@@ -1,24 +1,32 @@
1
1
  #############################################################
2
2
  # LexM - Lemma Markup Format
3
3
  #
4
- # A specification for representing, dictionary-ready
4
+ # A specification for representing dictionary-ready,
5
5
  # lexical entries and their relationships
6
6
  #
7
7
  # File: lib/lexm/sublemma.rb
8
- # Author: Yanis Zafirópulos (aka Dr.Kameleon)
8
+ # (c) 2025 Yanis Zafirópulos (aka Dr.Kameleon)
9
9
  #############################################################
10
10
 
11
11
  module LexM
12
12
  # Represents a sublemma, which can be either a textual sublemma or a redirection
13
13
  class Sublemma
14
14
  attr_accessor :text, :redirect
15
+ # Source location information
16
+ attr_accessor :source_file, :source_line, :source_column
15
17
 
16
18
  # Initialize a new sublemma
17
19
  # @param text [String, nil] the text of the sublemma (nil for pure redirections)
18
20
  # @param redirect [LemmaRedirect, nil] redirection information (nil for normal sublemmas)
19
- def initialize(text = nil, redirect = nil)
21
+ # @param source_file [String, nil] source file path
22
+ # @param source_line [Integer, nil] source line number
23
+ # @param source_column [Integer, nil] source column number
24
+ def initialize(text = nil, redirect = nil, source_file = nil, source_line = nil, source_column = nil)
20
25
  @text = text
21
26
  @redirect = redirect
27
+ @source_file = source_file
28
+ @source_line = source_line
29
+ @source_column = source_column
22
30
  end
23
31
 
24
32
  # Is this a pure redirection sublemma?
data/lib/lexm/version.rb CHANGED
@@ -1,13 +1,13 @@
1
1
  #############################################################
2
2
  # LexM - Lemma Markup Format
3
3
  #
4
- # A specification for representing, dictionary-ready
4
+ # A specification for representing dictionary-ready,
5
5
  # lexical entries and their relationships
6
6
  #
7
7
  # File: lib/lexm/version.rb
8
- # Author: Yanis Zafirópulos (aka Dr.Kameleon)
8
+ # (c) 2025 Yanis Zafirópulos (aka Dr.Kameleon)
9
9
  #############################################################
10
10
 
11
11
  module LexM
12
- VERSION = "0.2.0"
12
+ VERSION = "0.3.0"
13
13
  end
data/lib/lexm.rb CHANGED
@@ -1,11 +1,11 @@
1
1
  #############################################################
2
2
  # LexM - Lemma Markup Format
3
3
  #
4
- # A specification for representing, dictionary-ready
4
+ # A specification for representing dictionary-ready,
5
5
  # lexical entries and their relationships
6
6
  #
7
7
  # File: lib/lexm.rb
8
- # Author: Yanis Zafirópulos (aka Dr.Kameleon)
8
+ # (c) 2025 Yanis Zafirópulos (aka Dr.Kameleon)
9
9
  #############################################################
10
10
 
11
11
  require 'lexm/version'
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: lexm
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.0
4
+ version: 0.3.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Yanis Zafirópulos
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2025-03-20 00:00:00.000000000 Z
11
+ date: 2025-03-21 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: rspec