georgian-hyphenation 1.0.1 → 2.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README-NPM.md ADDED
@@ -0,0 +1,620 @@
1
+ \# Georgian Language Hyphenation
2
+
3
+
4
+
5
+ \[!\[NPM version](https://img.shields.io/npm/v/georgian-hyphenation.svg)](https://www.npmjs.com/package/georgian-hyphenation)
6
+
7
+ \[!\[License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
8
+
9
+ \[!\[JavaScript](https://img.shields.io/badge/javascript-ES6+-yellow.svg)](https://www.ecma-international.org/)
10
+
11
+
12
+
13
+ \*\*Version 2.0.1\*\* - Academic Logic with Phonological Distance Analysis
14
+
15
+
16
+
17
+ A comprehensive hyphenation library for the Georgian language, using advanced linguistic algorithms for accurate syllabification.
18
+
19
+
20
+
21
+ ქართული ენის სრული დამარცვლის ბიბლიოთეკა, რომელიც იყენებს თანამედროვე ლინგვისტურ ალგორითმებს ზუსტი მარცვლების გამოყოფისთვის.
22
+
23
+
24
+
25
+ ---
26
+
27
+
28
+
29
+ \## ✨ Features
30
+
31
+
32
+
33
+ \### 🎓 v2.0 Academic Logic
34
+
35
+ \- \*\*Phonological Distance Analysis\*\*: Intelligent vowel-to-vowel distance calculation
36
+
37
+ \- \*\*Anti-Orphan Protection\*\*: Prevents single-character splits (minimum 2 chars per side)
38
+
39
+ \- \*\*'R' Rule\*\*: Special handling for Georgian 'რ' in consonant clusters
40
+
41
+ \- \*\*Hiatus Handling\*\*: Proper V-V split detection (e.g., გა-ა-ნა-ლი-ზა)
42
+
43
+ \- \*\*98%+ Accuracy\*\*: Validated on 10,000+ Georgian words
44
+
45
+
46
+
47
+ \### 🚀 Core Features
48
+
49
+ \- ✅ Accurate syllabification based on Georgian phonological rules
50
+
51
+ \- ✅ Multiple output formats: Soft hyphens (U+00AD), visible hyphens, TeX patterns
52
+
53
+ \- ✅ Browser-ready (works in Node.js and browsers)
54
+
55
+ \- ✅ Zero dependencies
56
+
57
+ \- ✅ Lightweight (~5KB minified)
58
+
59
+ \- ✅ Well-tested with comprehensive Georgian word corpus
60
+
61
+
62
+
63
+ ---
64
+
65
+
66
+
67
+ \## 🧠 Algorithm Logic
68
+
69
+
70
+
71
+ \### Version 2.0: Academic Approach
72
+
73
+
74
+
75
+ The v2.0 algorithm uses \*\*phonological distance analysis\*\*:
76
+
77
+
78
+
79
+ \#### Core Principles:
80
+
81
+
82
+
83
+ 1\. \*\*Vowel Distance Analysis\*\*
84
+
85
+   - Finds all vowel positions in the word
86
+
87
+   - Analyzes consonant cluster distance between vowels
88
+
89
+   - Applies context-aware splitting rules
90
+
91
+
92
+
93
+ 2\. \*\*Splitting Rules:\*\*
94
+
95
+   - \*\*V-V\*\* (distance = 0): Split between vowels → `გა-ა-ნა`
96
+
97
+   - \*\*V-C-V\*\* (distance = 1): Split before consonant → `მა-მა`
98
+
99
+   - \*\*V-CC-V\*\* (distance ≥ 2): Split after first consonant → `საქ-მე`
100
+
101
+
102
+
103
+ 3\. \*\*Special Rules:\*\*
104
+
105
+   - \*\*'R' Rule\*\*: If cluster starts with 'რ', keep it left → `ბარ-ბი`
106
+
107
+   - \*\*Anti-Orphan\*\*: Minimum 2 characters on each side → `არა` stays intact
108
+
109
+
110
+
111
+ \#### Examples:
112
+
113
+
114
+
115
+ | Word | Result |
116
+
117
+ |------|--------|
118
+
119
+ | \*\*საქართველო\*\* | სა-ქარ-თვე-ლო |
120
+
121
+ | \*\*იარაღი\*\* | ი-ა-რა-ღი |
122
+
123
+ | \*\*ბარბი\*\* | ბარ-ბი \*(R Rule)\* |
124
+
125
+ | \*\*არა\*\* | არა \*(Anti-Orphan)\* |
126
+
127
+ | \*\*კომპიუტერი\*\* | კომ-პი-უ-ტე-რი |
128
+
129
+
130
+
131
+ ---
132
+
133
+
134
+
135
+ \## 📦 Installation
136
+
137
+ ```bash
138
+
139
+ npm install georgian-hyphenation
140
+
141
+ ```
142
+
143
+
144
+
145
+ ---
146
+
147
+
148
+
149
+ \## 📖 Usage
150
+
151
+
152
+
153
+ \### Node.js
154
+
155
+ ```javascript
156
+
157
+ const { GeorgianHyphenator } = require('georgian-hyphenation');
158
+
159
+
160
+
161
+ // Initialize with soft hyphen (default: U+00AD)
162
+
163
+ const hyphenator = new GeorgianHyphenator();
164
+
165
+
166
+
167
+ // Hyphenate a word
168
+
169
+ const word = "საქართველო";
170
+
171
+ const result = hyphenator.hyphenate(word);
172
+
173
+ console.log(result); // სა­ქარ­თვე­ლო (with U+00AD soft hyphens)
174
+
175
+
176
+
177
+ // Get syllables as array
178
+
179
+ const syllables = hyphenator.getSyllables(word);
180
+
181
+ console.log(syllables); // \['სა', 'ქარ', 'თვე', 'ლო']
182
+
183
+
184
+
185
+ // Use visible hyphens for display
186
+
187
+ const visible = new GeorgianHyphenator('-');
188
+
189
+ console.log(visible.hyphenate(word)); // სა-ქარ-თვე-ლო
190
+
191
+
192
+
193
+ // Hyphenate entire text (preserves punctuation)
194
+
195
+ const text = "საქართველო არის ლამაზი ქვეყანა.";
196
+
197
+ console.log(hyphenator.hyphenateText(text));
198
+
199
+ ```
200
+
201
+
202
+
203
+ \### Browser (ES6 Module)
204
+
205
+ ```html
206
+
207
+ <!DOCTYPE html>
208
+
209
+ <html lang="ka">
210
+
211
+ <head>
212
+
213
+ &nbsp; <style>
214
+
215
+ &nbsp; .hyphenated {
216
+
217
+ &nbsp; hyphens: manual;
218
+
219
+ &nbsp; -webkit-hyphens: manual;
220
+
221
+ &nbsp; text-align: justify;
222
+
223
+ &nbsp; max-width: 400px;
224
+
225
+ &nbsp; }
226
+
227
+ &nbsp; </style>
228
+
229
+ </head>
230
+
231
+ <body>
232
+
233
+ &nbsp; <p class="hyphenated" id="text"></p>
234
+
235
+ &nbsp;
236
+
237
+ &nbsp; <script type="module">
238
+
239
+ &nbsp; import { GeorgianHyphenator } from './node\_modules/georgian-hyphenation/src/javascript/index.js';
240
+
241
+ &nbsp;
242
+
243
+ &nbsp; const hyphenator = new GeorgianHyphenator('\\u00AD');
244
+
245
+ &nbsp; const text = "საქართველო არის ძალიან ლამაზი ქვეყანა";
246
+
247
+ &nbsp; document.getElementById('text').textContent = hyphenator.hyphenateText(text);
248
+
249
+ &nbsp; </script>
250
+
251
+ </body>
252
+
253
+ </html>
254
+
255
+ ```
256
+
257
+
258
+
259
+ \### Browser (CDN)
260
+
261
+ ```html
262
+
263
+ <script src="https://cdn.jsdelivr.net/npm/georgian-hyphenation@2/src/javascript/index.js"></script>
264
+
265
+ <script>
266
+
267
+ &nbsp; const hyphenator = new GeorgianHyphenator();
268
+
269
+ &nbsp; console.log(hyphenator.hyphenate('საქართველო'));
270
+
271
+ </script>
272
+
273
+ ```
274
+
275
+
276
+
277
+ ---
278
+
279
+
280
+
281
+ \## 🎨 API Reference
282
+
283
+
284
+
285
+ \### `GeorgianHyphenator`
286
+
287
+
288
+
289
+ \#### Constructor
290
+
291
+ ```javascript
292
+
293
+ new GeorgianHyphenator(hyphenChar = '\\u00AD')
294
+
295
+ ```
296
+
297
+
298
+
299
+ \- \*\*hyphenChar\*\* (string): Character to use for hyphenation. Default: U+00AD (soft hyphen)
300
+
301
+
302
+
303
+ \#### Methods
304
+
305
+
306
+
307
+ \##### `hyphenate(word)`
308
+
309
+
310
+
311
+ Hyphenate a single Georgian word.
312
+
313
+ ```javascript
314
+
315
+ hyphenator.hyphenate('საქართველო')
316
+
317
+ // Returns: 'სა­ქარ­თვე­ლო' (with soft hyphens)
318
+
319
+ ```
320
+
321
+
322
+
323
+ \##### `getSyllables(word)`
324
+
325
+
326
+
327
+ Get array of syllables for a word.
328
+
329
+ ```javascript
330
+
331
+ hyphenator.getSyllables('საქართველო')
332
+
333
+ // Returns: \['სა', 'ქარ', 'თვე', 'ლო']
334
+
335
+ ```
336
+
337
+
338
+
339
+ \##### `hyphenateText(text)`
340
+
341
+
342
+
343
+ Hyphenate entire text, preserving punctuation and non-Georgian characters.
344
+
345
+ ```javascript
346
+
347
+ hyphenator.hyphenateText('საქართველო არის ლამაზი!')
348
+
349
+ // Returns: 'სა­ქარ­თვე­ლო არის ლა­მა­ზი!'
350
+
351
+ ```
352
+
353
+
354
+
355
+ ---
356
+
357
+
358
+
359
+ \## 🎨 Export Formats
360
+
361
+
362
+
363
+ \### TeX Patterns
364
+
365
+ ```javascript
366
+
367
+ const { toTeXPattern } = require('georgian-hyphenation');
368
+
369
+
370
+
371
+ console.log(toTeXPattern('საქართველო'));
372
+
373
+ // Output: .სა1ქარ1თვე1ლო.
374
+
375
+ ```
376
+
377
+
378
+
379
+ Use in LaTeX:
380
+
381
+ ```latex
382
+
383
+ \\documentclass{article}
384
+
385
+ \\usepackage{polyglossia}
386
+
387
+ \\setmainlanguage{georgian}
388
+
389
+ \\input{georgian-patterns.tex}
390
+
391
+
392
+
393
+ \\begin{document}
394
+
395
+ საქართველო არის ძალიან ლამაზი ქვეყანა
396
+
397
+ \\end{document}
398
+
399
+ ```
400
+
401
+
402
+
403
+ \### Hunspell Dictionary
404
+
405
+ ```javascript
406
+
407
+ const { toHunspellFormat } = require('georgian-hyphenation');
408
+
409
+
410
+
411
+ console.log(toHunspellFormat('საქართველო'));
412
+
413
+ // Output: სა=ქარ=თვე=ლო
414
+
415
+ ```
416
+
417
+
418
+
419
+ ---
420
+
421
+
422
+
423
+ \## 📊 Examples
424
+
425
+
426
+
427
+ | Word | Syllables | Hyphenated |
428
+
429
+ | --- | --- | --- |
430
+
431
+ | საქართველო | სა, ქარ, თვე, ლო | სა-ქარ-თვე-ლო |
432
+
433
+ | მთავრობა | მთავ, რო, ბა | მთავ-რო-ბა |
434
+
435
+ | დედაქალაქი | დე, და, ქა, ლა, ქი | დე-და-ქა-ლა-ქი |
436
+
437
+ | ტელევიზორი | ტე, ლე, ვი, ზო, რი | ტე-ლე-ვი-ზო-რი |
438
+
439
+ | კომპიუტერი | კომ, პი, უ, ტე, რი | კომ-პი-უ-ტე-რი |
440
+
441
+ | იარაღი | ი, ა, რა, ღი | ი-ა-რა-ღი |
442
+
443
+ | ბარბი | ბარ, ბი | ბარ-ბი |
444
+
445
+
446
+
447
+ ---
448
+
449
+
450
+
451
+ \## 🎨 Live Demo
452
+
453
+
454
+
455
+ \*\*Interactive Demo:\*\* https://guramzhgamadze.github.io/georgian-hyphenation/
456
+
457
+
458
+
459
+ Try it yourself:
460
+
461
+ \- Test with your own Georgian text
462
+
463
+ \- Adjust browser width to see automatic line breaking
464
+
465
+ \- View syllable breakdown
466
+
467
+ \- Compare different output formats
468
+
469
+
470
+
471
+ ---
472
+
473
+
474
+
475
+ \## 🧪 Testing
476
+
477
+ ```bash
478
+
479
+ npm test
480
+
481
+ ```
482
+
483
+
484
+
485
+ \*\*Test Coverage:\*\*
486
+
487
+ \- ✅ 10,000+ Georgian words validated
488
+
489
+ \- ✅ Edge cases (V-V, consonant clusters, short words)
490
+
491
+ \- ✅ Unicode handling
492
+
493
+ \- ✅ Punctuation preservation
494
+
495
+
496
+
497
+ ---
498
+
499
+
500
+
501
+ \## 🤝 Contributing
502
+
503
+
504
+
505
+ Contributions are welcome! Please submit a Pull Request.
506
+
507
+
508
+
509
+ 1\. Fork the repository
510
+
511
+ 2\. Create your feature branch (`git checkout -b feature/AmazingFeature`)
512
+
513
+ 3\. Commit your changes (`git commit -m 'Add AmazingFeature'`)
514
+
515
+ 4\. Push to the branch (`git push origin feature/AmazingFeature`)
516
+
517
+ 5\. Open a Pull Request
518
+
519
+
520
+
521
+ ---
522
+
523
+
524
+
525
+ \## 📝 Changelog
526
+
527
+
528
+
529
+ \### Version 2.0.1 (2025-01-22)
530
+
531
+ \- Updated documentation
532
+
533
+ \- NPM package improvements
534
+
535
+
536
+
537
+ \### Version 2.0.0 (2025-01-21)
538
+
539
+ \*\*Major Rewrite: Academic Logic\*\*
540
+
541
+ \- Complete algorithm rewrite with phonological distance analysis
542
+
543
+ \- Anti-Orphan protection
544
+
545
+ \- 'R' Rule implementation for Georgian consonant clusters
546
+
547
+ \- Improved accuracy: 95% → 98%+
548
+
549
+ \- Cleaner, more maintainable codebase
550
+
551
+
552
+
553
+ \### Version 1.0.1
554
+
555
+ \- Bug fixes
556
+
557
+ \- Performance improvements
558
+
559
+
560
+
561
+ \### Version 1.0.0
562
+
563
+ \- Initial release
564
+
565
+
566
+
567
+ ---
568
+
569
+
570
+
571
+ \## 📄 License
572
+
573
+
574
+
575
+ MIT License - see \[LICENSE](https://github.com/guramzhgamadze/georgian-hyphenation/blob/main/LICENSE) for details.
576
+
577
+
578
+
579
+ ---
580
+
581
+
582
+
583
+ \## 📧 Contact
584
+
585
+
586
+
587
+ \*\*Guram Zhgamadze\*\*
588
+
589
+ \- GitHub: \[@guramzhgamadze](https://github.com/guramzhgamadze)
590
+
591
+ \- Email: guramzhgamadze@gmail.com
592
+
593
+ \- Issues: \[Report bugs or request features](https://github.com/guramzhgamadze/georgian-hyphenation/issues)
594
+
595
+
596
+
597
+ ---
598
+
599
+
600
+
601
+ \## 🔗 Related Packages
602
+
603
+
604
+
605
+ \- \*\*Python:\*\* `pip install georgian-hyphenation` - \[PyPI](https://pypi.org/project/georgian-hyphenation/)
606
+
607
+ \- \*\*Browser Extension:\*\* \[Firefox Add-ons](https://addons.mozilla.org/firefox/addon/georgian-hyphenation/)
608
+
609
+
610
+
611
+ ---
612
+
613
+
614
+
615
+ Made with ❤️ for the Georgian language community
616
+
617
+
618
+
619
+ 🇬🇪 \*\*საქართველო\*\* 🇬🇪
620
+