georgian-hyphenation 2.2.3 → 2.2.4
- package/README.md +277 -163
- package/package.json +2 -2
- package/src/javascript/index.js +6 -6
- package/src/georgian_hyphenation/__init__.py +0 -26
- package/src/georgian_hyphenation/__pycache__/__init__.cpython-313.pyc +0 -0
- package/src/georgian_hyphenation/__pycache__/hyphenator.cpython-313.pyc +0 -0
- package/src/georgian_hyphenation/hyphenator.py +0 -358
- package/src/georgian_hyphenation/hyphenator.py.backup +0 -312
- package/src/georgian_hyphenation.egg-info/PKG-INFO +0 -657
- package/src/georgian_hyphenation.egg-info/SOURCES.txt +0 -14
- package/src/georgian_hyphenation.egg-info/dependency_links.txt +0 -1
- package/src/georgian_hyphenation.egg-info/requires.txt +0 -3
- package/src/georgian_hyphenation.egg-info/top_level.txt +0 -2
|
@@ -1,657 +0,0 @@
|
|
|
1
|
-
Metadata-Version: 2.4
|
|
2
|
-
Name: georgian-hyphenation
|
|
3
|
-
Version: 2.2.2
|
|
4
|
-
Summary: Georgian Language Hyphenation Library v2.2.1 - Modernized & Optimized with Dictionary Support
|
|
5
|
-
Home-page: https://github.com/guramzhgamadze/georgian-hyphenation
|
|
6
|
-
Author: Guram Zhgamadze
|
|
7
|
-
Author-email: Guram Zhgamadze <guramzhgamadze@gmail.com>
|
|
8
|
-
License: MIT
|
|
9
|
-
Project-URL: Homepage, https://github.com/guramzhgamadze/georgian-hyphenation
|
|
10
|
-
Project-URL: Repository, https://github.com/guramzhgamadze/georgian-hyphenation
|
|
11
|
-
Project-URL: Documentation, https://github.com/guramzhgamadze/georgian-hyphenation#readme
|
|
12
|
-
Project-URL: Bug Tracker, https://github.com/guramzhgamadze/georgian-hyphenation/issues
|
|
13
|
-
Keywords: georgian,hyphenation,syllabification,nlp,linguistics,kartuli,dictionary
|
|
14
|
-
Classifier: Development Status :: 5 - Production/Stable
|
|
15
|
-
Classifier: Intended Audience :: Developers
|
|
16
|
-
Classifier: License :: OSI Approved :: MIT License
|
|
17
|
-
Classifier: Programming Language :: Python :: 3
|
|
18
|
-
Classifier: Programming Language :: Python :: 3.7
|
|
19
|
-
Classifier: Programming Language :: Python :: 3.8
|
|
20
|
-
Classifier: Programming Language :: Python :: 3.9
|
|
21
|
-
Classifier: Programming Language :: Python :: 3.10
|
|
22
|
-
Classifier: Programming Language :: Python :: 3.11
|
|
23
|
-
Classifier: Programming Language :: Python :: 3.12
|
|
24
|
-
Classifier: Topic :: Text Processing :: Linguistic
|
|
25
|
-
Classifier: Natural Language :: Georgian
|
|
26
|
-
Requires-Python: >=3.7
|
|
27
|
-
Description-Content-Type: text/markdown
|
|
28
|
-
License-File: LICENSE.txt
|
|
29
|
-
Provides-Extra: dev
|
|
30
|
-
Requires-Dist: pytest>=7.0; extra == "dev"
|
|
31
|
-
Dynamic: author
|
|
32
|
-
Dynamic: home-page
|
|
33
|
-
Dynamic: license-file
|
|
34
|
-
Dynamic: requires-python
|
|
35
|
-
|
|
36
|
-
# 🇬🇪 Georgian Hyphenation - Python Library
|
|
37
|
-
|
|
38
|
-
[](https://pypi.org/project/georgian-hyphenation/)
|
|
39
|
-
[](https://pypi.org/project/georgian-hyphenation/)
|
|
40
|
-
[](https://opensource.org/licenses/MIT)
|
|
41
|
-
|
|
42
|
-
**Georgian Language Hyphenation Library v2.2.1** - ქართული ენის დამარცვლის ბიბლიოთეკა
|
|
43
|
-
|
|
44
|
-
Automatic hyphenation (syllabification) for Georgian text with hybrid engine: **Algorithm + Dictionary**.
|
|
45
|
-
|
|
46
|
-
---
|
|
47
|
-
|
|
48
|
-
## ✨ Features
|
|
49
|
-
|
|
50
|
-
### **v2.2.1 (Latest)**
|
|
51
|
-
- 🎯 **Hybrid Engine**: Algorithm + Dictionary (150+ exception words)
|
|
52
|
-
- ⚡ **Optimized Performance**: Set-based harmonic cluster lookup (O(1))
|
|
53
|
-
- **Strip & Re-hyphenate**: Removes existing hyphens and re-applies hyphenation, correcting earlier mistakes
|
|
54
|
-
- ๐ต **Harmonic Clusters**: Preserves natural Georgian sound clusters (แแ, แแ, แแ , etc.)
|
|
55
|
-
- **Gemination Handling**: Splits double consonants correctly (rare in Georgian)
|
|
56
|
-
- 🛡️ **Anti-Orphan Protection**: Minimum 2 characters on each side
|
|
57
|
-
- **Pure Python**: No external dependencies
|
|
58
|
-
- **Unicode Support**: Full Georgian script support
|
|
59
|
-
|
|
60
|
-
### **Core Algorithm**
|
|
61
|
-
- Phonological distance analysis
|
|
62
|
-
- Vowel-based syllable detection
|
|
63
|
-
- Contextual consonant cluster handling
|
|
64
|
-
- Punctuation preservation
|
|
65
|
-
|
|
66
|
-
---
|
|
67
|
-
|
|
68
|
-
## 📦 Installation
|
|
69
|
-
```bash
|
|
70
|
-
pip install georgian-hyphenation
|
|
71
|
-
```
|
|
72
|
-
|
|
73
|
-
### **Requirements**
|
|
74
|
-
- Python 3.7+
|
|
75
|
-
- No external dependencies (uses only standard library)
|
|
76
|
-
|
|
77
|
-
---
|
|
78
|
-
|
|
79
|
-
## Quick Start
|
|
80
|
-
|
|
81
|
-
### **Basic Usage**
|
|
82
|
-
```python
|
|
83
|
-
from georgian_hyphenation import GeorgianHyphenator
|
|
84
|
-
|
|
85
|
-
# Initialize with visible hyphen
|
|
86
|
-
hyphenator = GeorgianHyphenator('-')
|
|
87
|
-
|
|
88
|
-
# Hyphenate single word
|
|
89
|
-
print(hyphenator.hyphenate('საქართველო'))
|
|
90
|
-
# Output: სა-ქარ-თვე-ლო
|
|
91
|
-
|
|
92
|
-
# Hyphenate text
|
|
93
|
-
text = 'საქართველო არის ლამაზი ქვეყანა'
|
|
94
|
-
print(hyphenator.hyphenate_text(text))
|
|
95
|
-
# Output: სა-ქარ-თვე-ლო არის ლა-მა-ზი ქვე-ყა-ნა
|
|
96
|
-
|
|
97
|
-
# Get syllables as list
|
|
98
|
-
syllables = hyphenator.get_syllables('დედაქალაქი')
|
|
99
|
-
print(syllables)
|
|
100
|
-
# Output: ['დე', 'და', 'ქა', 'ლა', 'ქი']
|
|
101
|
-
```
|
|
102
|
-
|
|
103
|
-
### **Using Dictionary (Recommended)**
|
|
104
|
-
```python
|
|
105
|
-
from georgian_hyphenation import GeorgianHyphenator
|
|
106
|
-
|
|
107
|
-
hyphenator = GeorgianHyphenator('-')
|
|
108
|
-
|
|
109
|
-
# Load default dictionary (150+ exception words)
|
|
110
|
-
hyphenator.load_default_library()
|
|
111
|
-
|
|
112
|
-
# Now hyphenation will use dictionary first, then algorithm
|
|
113
|
-
print(hyphenator.hyphenate('კომპიუტერი'))
|
|
114
|
-
# Output: კომ-პიუ-ტე-რი (from dictionary)
|
|
115
|
-
```
|
|
116
|
-
|
|
117
|
-
### **Convenience Functions**
|
|
118
|
-
```python
|
|
119
|
-
from georgian_hyphenation import hyphenate, get_syllables, hyphenate_text
|
|
120
|
-
|
|
121
|
-
# Quick hyphenation with default settings
|
|
122
|
-
print(hyphenate('საქართველო'))
|
|
123
|
-
# Output: แกแยญแฅแแ ยญแแแยญแแ (with soft hyphens U+00AD)
|
|
124
|
-
|
|
125
|
-
# Get syllables
|
|
126
|
-
print(get_syllables('แแแแแ แแแ'))
|
|
127
|
-
# Output: ['แแแแ', 'แ แ', 'แแ']
|
|
128
|
-
|
|
129
|
-
# Hyphenate entire text
|
|
130
|
-
text = 'แกแแฅแแ แแแแแ แแ แแก แแแแแแ แฅแแแงแแแ'
|
|
131
|
-
print(hyphenate_text(text))
|
|
132
|
-
```
|
|
133
|
-
|
|
134
|
-
---
|
|
135
|
-
|
|
136
|
-
## 🎨 Hyphen Character Options
|
|
137
|
-
|
|
138
|
-
### **Soft Hyphen (Invisible, default)**
|
|
139
|
-
```python
|
|
140
|
-
# Soft hyphen (U+00AD) - invisible, only appears at line breaks
|
|
141
|
-
hyphenator = GeorgianHyphenator('\u00AD')
|
|
142
|
-
print(hyphenator.hyphenate('แกแแฅแแ แแแแแ'))
|
|
143
|
-
# Output: แกแยญแฅแแ ยญแแแยญแแ (hyphens invisible until line wraps)
|
|
144
|
-
```
|
|
145
|
-
|
|
146
|
-
### **Visible Hyphen**
|
|
147
|
-
```python
|
|
148
|
-
# Regular hyphen - always visible
|
|
149
|
-
hyphenator = GeorgianHyphenator('-')
|
|
150
|
-
print(hyphenator.hyphenate('საქართველო'))
|
|
151
|
-
# Output: სა-ქარ-თვე-ლო
|
|
152
|
-
```
|
|
153
|
-
|
|
154
|
-
### **Middle Dot**
|
|
155
|
-
```python
|
|
156
|
-
# Middle dot - useful for visualization
|
|
157
|
-
hyphenator = GeorgianHyphenator('·')
|
|
158
|
-
print(hyphenator.hyphenate('საქართველო'))
|
|
159
|
-
# Output: სა·ქარ·თვე·ლო
|
|
160
|
-
```
|
|
161
|
-
|
|
162
|
-
### **Custom Character**
|
|
163
|
-
```python
|
|
164
|
-
# Any character you want
|
|
165
|
-
hyphenator = GeorgianHyphenator('|')
|
|
166
|
-
print(hyphenator.hyphenate('საქართველო'))
|
|
167
|
-
# Output: სა|ქარ|თვე|ლო
|
|
168
|
-
```
|
|
169
|
-
|
|
170
|
-
---
|
|
171
|
-
|
|
172
|
-
## Advanced Usage
|
|
173
|
-
|
|
174
|
-
### **Custom Dictionary**
|
|
175
|
-
```python
|
|
176
|
-
from georgian_hyphenation import GeorgianHyphenator
|
|
177
|
-
|
|
178
|
-
hyphenator = GeorgianHyphenator('-')
|
|
179
|
-
|
|
180
|
-
# Add your own exception words
|
|
181
|
-
custom_dict = {
|
|
182
|
-
    'კომპიუტერი': 'კომ-პიუ-ტე-რი',
|
|
183
|
-
    'პროგრამა': 'პროგ-რა-მა',
|
|
184
|
-
    'ინტერნეტი': 'ინ-ტერ-ნე-ტი'
|
|
185
|
-
}
|
|
186
|
-
|
|
187
|
-
hyphenator.load_library(custom_dict)
|
|
188
|
-
|
|
189
|
-
# Now these words will use your custom hyphenation
|
|
190
|
-
print(hyphenator.hyphenate('კომპიუტერი'))
|
|
191
|
-
# Output: კომ-პიუ-ტე-რი
|
|
192
|
-
```
|
|
193
|
-
|
|
194
|
-
### **Combining Default + Custom Dictionary**
|
|
195
|
-
```python
|
|
196
|
-
hyphenator = GeorgianHyphenator('-')
|
|
197
|
-
|
|
198
|
-
# Load default dictionary first
|
|
199
|
-
hyphenator.load_default_library()
|
|
200
|
-
|
|
201
|
-
# Add your custom words
|
|
202
|
-
hyphenator.load_library({
|
|
203
|
-
    'სპეციალური': 'სპე-ცი-ა-ლუ-რი'
|
|
204
|
-
})
|
|
205
|
-
|
|
206
|
-
# Now has both default + custom exceptions
|
|
207
|
-
```
|
|
208
|
-
|
|
209
|
-
### **Export Formats**
|
|
210
|
-
```python
|
|
211
|
-
from georgian_hyphenation import to_tex_pattern, to_hunspell_format
|
|
212
|
-
|
|
213
|
-
# TeX hyphenation pattern
|
|
214
|
-
print(to_tex_pattern('საქართველო'))
|
|
215
|
-
# Output: .სა1ქარ1თვე1ლო.
|
|
216
|
-
|
|
217
|
-
# Hunspell format
|
|
218
|
-
print(to_hunspell_format('საქართველო'))
|
|
219
|
-
# Output: სა=ქარ=თვე=ლო
|
|
220
|
-
```
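Judging by the outputs shown above, both export formats look like a plain join of the syllable list with a different separator. The following is a minimal sketch under that assumption; the helper names `tex_pattern_from_syllables` and `hunspell_from_syllables` are hypothetical and are not part of the package.

```python
from typing import List


def tex_pattern_from_syllables(syllables: List[str]) -> str:
    """Join syllables with '1' and wrap the result in dots (TeX-style pattern)."""
    return "." + "1".join(syllables) + "."


def hunspell_from_syllables(syllables: List[str]) -> str:
    """Join syllables with '=' (Hunspell-style hyphenation mark)."""
    return "=".join(syllables)
```

Feeding either helper the list returned by `get_syllables` should reproduce the shape of the outputs above, provided the library builds them the same way.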
|
|
221
|
-
|
|
222
|
-
### **Processing Files**
|
|
223
|
-
```python
|
|
224
|
-
from georgian_hyphenation import GeorgianHyphenator
|
|
225
|
-
|
|
226
|
-
hyphenator = GeorgianHyphenator('\u00AD')
|
|
227
|
-
hyphenator.load_default_library()
|
|
228
|
-
|
|
229
|
-
# Read file
|
|
230
|
-
with open('input.txt', 'r', encoding='utf-8') as f:
|
|
231
|
-
    text = f.read()
|
|
232
|
-
|
|
233
|
-
# Hyphenate
|
|
234
|
-
hyphenated = hyphenator.hyphenate_text(text)
|
|
235
|
-
|
|
236
|
-
# Write output
|
|
237
|
-
with open('output.txt', 'w', encoding='utf-8') as f:
|
|
238
|
-
    f.write(hyphenated)
|
|
239
|
-
```
|
|
240
|
-
|
|
241
|
-
---
|
|
242
|
-
|
|
243
|
-
## 🔬 How It Works
|
|
244
|
-
|
|
245
|
-
### **v2.2.1 Hybrid Engine**
|
|
246
|
-
|
|
247
|
-
1. **Sanitization**: Strip existing hyphens from input
|
|
248
|
-
2. **Dictionary Lookup**: Check exception words first (if loaded)
|
|
249
|
-
3. **Algorithm Fallback**: Apply phonological rules if not in dictionary
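The three steps above can be sketched as a single function. This is an illustration only, not the library's code: the name `hybrid_hyphenate`, the `algorithm` callback, and the choice to strip both the soft hyphen and the configured hyphen character are assumptions made for the example.

```python
from typing import Callable, Dict

SOFT_HYPHEN = "\u00AD"


def hybrid_hyphenate(
    word: str,
    exceptions: Dict[str, str],
    algorithm: Callable[[str], str],
    hyphen_char: str = SOFT_HYPHEN,
) -> str:
    # 1. Sanitization: drop hyphen marks left by an earlier pass (assumed set).
    clean = word.replace(SOFT_HYPHEN, "").replace(hyphen_char, "")
    # 2. Dictionary lookup: exception entries store their breaks as '-'.
    if clean in exceptions:
        return exceptions[clean].replace("-", hyphen_char)
    # 3. Algorithm fallback: delegate to the rule-based hyphenator.
    return algorithm(clean)
```

Passing the rule-based step in as a callback keeps the sketch independent of how the phonological rules themselves are implemented.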
|
|
250
|
-
|
|
251
|
-
### **Algorithm Rules**
|
|
252
|
-
|
|
253
|
-
#### **1. Vowel Detection**
|
|
254
|
-
```
|
|
255
|
-
საქართველო → vowels at positions: [1, 3, 7, 9]
|
|
256
|
-
```
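Vowel detection amounts to scanning the word for the five Georgian vowel letters (ა, ე, ი, ო, უ) and recording their zero-based indices. A minimal sketch follows; the function name `vowel_positions` is hypothetical and the library may organise this step differently.

```python
from typing import FrozenSet, List

# The five vowel letters of modern Georgian.
GEORGIAN_VOWELS: FrozenSet[str] = frozenset("აეიოუ")


def vowel_positions(word: str) -> List[int]:
    """Return the zero-based index of every vowel in the word."""
    return [i for i, ch in enumerate(word) if ch in GEORGIAN_VOWELS]
```

Each pair of neighbouring indices then delimits the consonant cluster that the next step has to analyse.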
|
|
257
|
-
|
|
258
|
-
#### **2. Consonant Cluster Analysis**
|
|
259
|
-
|
|
260
|
-
Between each vowel pair:
|
|
261
|
-
|
|
262
|
-
- **0 consonants (V-V)**: Split between vowels
|
|
263
|
-
```python
|
|
264
|
-
'แแแแแแแ' โ 'แแ-แ-แแ-แแ'
|
|
265
|
-
```
|
|
266
|
-
|
|
267
|
-
- **1 consonant (V-C-V)**: Split after first vowel
|
|
268
|
-
```python
|
|
269
|
-
'แแแแ' โ 'แแ-แแ'
|
|
270
|
-
```
|
|
271
|
-
|
|
272
|
-
- **2+ consonants (V-CC...C-V)**:
|
|
273
|
-
1. Check for **gemination** (double consonants) - rare in Georgian
|
|
274
|
-
```python
|
|
275
|
-
'แกแแแแ' โ 'แกแแ-แแ' # Split between double 'แ' (if exists)
|
|
276
|
-
```
|
|
277
|
-
|
|
278
|
-
2. Check for **harmonic clusters**
|
|
279
|
-
```python
|
|
280
|
-
'แแแแแ' โ 'แแแ-แแ' # Keep 'แแ' together
|
|
281
|
-
```
|
|
282
|
-
|
|
283
|
-
3. Default: Split after first consonant
|
|
284
|
-
```python
|
|
285
|
-
'แแแ แแแ แ' โ 'แแแ -แแ-แ แ'
|
|
286
|
-
```
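The cluster rules above can be read as a function that picks one break index for each pair of neighbouring vowels. The sketch below is one possible reading, not the library's implementation: `split_point` and `syllabify` are hypothetical names, the harmonic set is a small illustrative subset, and breaking immediately before a harmonic pair is an assumption about how "keep together" is applied. Anti-orphan filtering is deliberately left out.

```python
from typing import FrozenSet, List

VOWELS: FrozenSet[str] = frozenset("აეიოუ")

# Illustrative subset only (an assumption); the library ships its own, longer list.
HARMONIC_CLUSTERS: FrozenSet[str] = frozenset({"ბლ", "ბრ", "გრ", "ტრ"})


def split_point(word: str, v1: int, v2: int) -> int:
    """Index at which to break between the vowels at positions v1 and v2."""
    cluster = word[v1 + 1:v2]  # consonants between the two vowels
    if len(cluster) <= 1:
        return v1 + 1  # V-V and V-C-V: break right after the first vowel
    for j in range(len(cluster) - 1):
        if cluster[j] == cluster[j + 1]:
            return v1 + 2 + j  # gemination: break between the doubled letters
    for j in range(len(cluster) - 1):
        if cluster[j:j + 2] in HARMONIC_CLUSTERS:
            return v1 + 1 + j  # harmonic pair: break before it so it stays whole
    return v1 + 2  # default: break after the first consonant


def syllabify(word: str) -> List[str]:
    """Cut the word at one break point per pair of neighbouring vowels."""
    vowels = [i for i, ch in enumerate(word) if ch in VOWELS]
    cuts = [split_point(word, a, b) for a, b in zip(vowels, vowels[1:])]
    bounds = [0] + cuts + [len(word)]
    return [word[a:b] for a, b in zip(bounds, bounds[1:])]
```

Storing the harmonic pairs in a `frozenset` makes each membership test a constant-time lookup, which is the kind of set-based lookup the feature list describes.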
|
|
287
|
-
|
|
288
|
-
#### **3. Harmonic Clusters (62 clusters)**
|
|
289
|
-
|
|
290
|
-
These consonant pairs stay together:
|
|
291
|
-
```
|
|
292
|
-
แแ, แแ , แแฆ, แแ, แแ, แแ, แแ, แแ, แแ, แแ, แแ , แแ , แแ, แแ , แแฆ,
|
|
293
|
-
แแ, แแ, แแ, แแ , แแ, แแข, แแ, แแ , แแฆ, แ แ, แ แ, แ แ, แกแฌ, แกแฎ, แขแ,
|
|
294
|
-
แขแ, แขแ , แคแ, แคแ , แคแฅ, แคแจ, แฅแ, แฅแ, แฅแ, แฅแ , แฆแ, แฆแ , แงแ, แงแ , แจแ,
|
|
295
|
-
แจแ, แฉแฅ, แฉแ , แชแ, แชแ, แชแ , แชแ, แซแ, แซแ, แซแฆ, แฌแ, แฌแ , แฌแ, แฌแ, แญแ,
|
|
296
|
-
แญแ , แญแง, แฎแ, แฎแ, แฎแ, แฎแ, แฏแ
|
|
297
|
-
```
|
|
298
|
-
|
|
299
|
-
#### **4. Anti-Orphan Protection**
|
|
300
|
-
|
|
301
|
-
Minimum 2 characters on each side:
|
|
302
|
-
```python
|
|
303
|
-
'แแ แ' โ 'แแ แ' # Not split (would create 1-letter syllable)
|
|
304
|
-
'แแ แแ' โ 'แ-แ แ-แ' # OK to split
|
|
305
|
-
```
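One possible reading of "minimum 2 characters on each side" is that a break point is discarded when it would leave fewer than two characters before or after it in the word. The sketch below assumes that reading; `filter_break_points` and its `min_side` parameter are hypothetical names, not part of the package.

```python
from typing import List


def filter_break_points(word: str, cuts: List[int], min_side: int = 2) -> List[int]:
    """Keep only break indices with at least min_side characters on both sides."""
    return [c for c in cuts if c >= min_side and len(word) - c >= min_side]
```

Under this reading a three-letter word can never be split, because every candidate break leaves a single character on one side.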
|
|
306
|
-
|
|
307
|
-
---
|
|
308
|
-
|
|
309
|
-
## 🧪 Examples
|
|
310
|
-
|
|
311
|
-
### **Basic Words**
|
|
312
|
-
```python
|
|
313
|
-
hyphenate('แกแแฅแแ แแแแแ') # โ แกแ-แฅแแ -แแแ-แแ
|
|
314
|
-
hyphenate('แแแแแ แแแ') # โ แแแแ-แ แ-แแ
|
|
315
|
-
hyphenate('แแแแแฅแแแแฅแ') # โ แแ-แแ-แฅแ-แแ-แฅแ
|
|
316
|
-
hyphenate('แแแ แแแแแแขแ') # โ แแแ -แแ-แแแ-แขแ
|
|
317
|
-
```
|
|
318
|
-
|
|
319
|
-
### **V-C-V Pattern (Single Consonant)**
|
|
320
|
-
```python
|
|
321
|
-
hyphenate('แแแแกแ') # โ แแแ-แกแ
|
|
322
|
-
hyphenate('แแแกแ') # โ แแ-แกแ
|
|
323
|
-
hyphenate('แแแแ') # โ แแ-แแ
|
|
324
|
-
hyphenate('แแแแ') # โ แแ-แแ
|
|
325
|
-
```
|
|
326
|
-
|
|
327
|
-
### **Harmonic Clusters**
|
|
328
|
-
```python
|
|
329
|
-
hyphenate('แแแแแ') # โ แแแ-แแ (keeps แแ)
|
|
330
|
-
hyphenate('แแ แแแ') # โ แแ แ-แแ (keeps แแ )
|
|
331
|
-
hyphenate('แแแแฎแ') # โ แแแ-แฎแ (keeps แแ)
|
|
332
|
-
hyphenate('แขแ แแแแแ') # โ แขแ แแ-แแ-แ (keeps แขแ )
|
|
333
|
-
hyphenate('แแ แแแ แแแ') # โ แแ แแ-แ แ-แแ (keeps แแ and แแ )
|
|
334
|
-
```
|
|
335
|
-
|
|
336
|
-
### **V-V Split**
|
|
337
|
-
```python
|
|
338
|
-
hyphenate('แแแแแแแ') # โ แแ-แ-แแ-แแ
|
|
339
|
-
hyphenate('แแแแแ แ') # โ แแ-แ-แ-แ แ
|
|
340
|
-
hyphenate('แแแจแแแ') # โ แ-แ-แจแ-แแ
|
|
341
|
-
hyphenate('แแแแแแแแแ') # โ แแ-แ-แแ-แแ-แแ
|
|
342
|
-
```
|
|
343
|
-
|
|
344
|
-
### **Complex Words**
|
|
345
|
-
```python
|
|
346
|
-
hyphenate('แแแแแ แแแ') # โ แแแแ-แ แ-แแ
|
|
347
|
-
hyphenate('แกแแแแแแ แแแ') # โ แกแแ-แแแ-แ แ-แแ
|
|
348
|
-
hyphenate('แแแ แแแ แ') # โ แแแ -แแ-แ แ
|
|
349
|
-
hyphenate('แแกแขแ แแแแแแ') # โ แแก-แขแ แ-แแ-แแ-แ
|
|
350
|
-
```
|
|
351
|
-
|
|
352
|
-
### **Text Processing**
|
|
353
|
-
```python
|
|
354
|
-
text = 'แกแแฅแแ แแแแแ แแ แแก แแแแแแ แฅแแแงแแแ'
|
|
355
|
-
hyphenate_text(text)
|
|
356
|
-
# โ 'แกแยญแฅแแ ยญแแแยญแแ แแ แแก แแยญแแยญแแ แฅแแยญแงแยญแแ'
|
|
357
|
-
|
|
358
|
-
# Preserves punctuation
|
|
359
|
-
text = 'แแแแแ แแแ, แแแ แแแแแแขแ แแ แกแแกแแแแ แแแ.'
|
|
360
|
-
hyphenate_text(text)
|
|
361
|
-
# โ 'แแแแยญแ แยญแแ, แแแ ยญแแยญแแแยญแขแ แแ แกแยญแกแยญแแแ ยญแแแ.'
|
|
362
|
-
|
|
363
|
-
# Preserves numbers and Latin text
|
|
364
|
-
text = 'แกแแฅแแ แแแแแแจแ 2025 แฌแแแก'
|
|
365
|
-
hyphenate_text(text)
|
|
366
|
-
# โ 'แกแยญแฅแแ ยญแแแยญแแยญแจแ 2025 แฌแแแก'
|
|
367
|
-
```
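Leaving punctuation, digits and Latin text untouched can be achieved by transforming only the runs of Georgian letters and copying everything else through unchanged. A minimal sketch, assuming a "Georgian word" is a run of Mkhedruli letters in the range U+10D0 to U+10FA; the name `map_georgian_words` is hypothetical and the library may tokenise text differently.

```python
import re
from typing import Callable

# Runs of Mkhedruli letters (U+10D0..U+10FA); an assumed definition of a word.
GEORGIAN_WORD = re.compile(r"[\u10D0-\u10FA]+")


def map_georgian_words(text: str, transform: Callable[[str], str]) -> str:
    """Apply transform to each Georgian word; leave all other characters as-is."""
    return GEORGIAN_WORD.sub(lambda match: transform(match.group(0)), text)
```

With a hyphenator instance, `map_georgian_words(text, hyphenator.hyphenate)` would then hyphenate only the Georgian words in a mixed-script string.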
|
|
368
|
-
|
|
369
|
-
### **Get Syllables**
|
|
370
|
-
```python
|
|
371
|
-
get_syllables('แกแแฅแแ แแแแแ') # โ ['แกแ', 'แฅแแ ', 'แแแ', 'แแ']
|
|
372
|
-
get_syllables('แแแแแฅแแแแฅแ') # โ ['แแ', 'แแ', 'แฅแ', 'แแ', 'แฅแ']
|
|
373
|
-
get_syllables('แแแแแ แแแ') # โ ['แแแแ', 'แ แ', 'แแ']
|
|
374
|
-
get_syllables('แแแแแ') # โ ['แแแ', 'แแ']
|
|
375
|
-
```
|
|
376
|
-
|
|
377
|
-
---
|
|
378
|
-
|
|
379
|
-
## Dictionary
|
|
380
|
-
|
|
381
|
-
The library includes `data/exceptions.json` with 150+ Georgian words that require special hyphenation:
|
|
382
|
-
```json
|
|
383
|
-
{
|
|
384
|
-
"แแแแแแฃแขแแ แ": "แแแ-แแแฃ-แขแ-แ แ",
|
|
385
|
-
"แแแขแแ แแแขแ": "แแ-แขแแ -แแ-แขแ",
|
|
386
|
-
"แกแแฅแแ แแแแแ": "แกแ-แฅแแ -แแแ-แแ",
|
|
387
|
-
"แแ แแแ แแแ": "แแ แแ-แ แ-แแ",
|
|
388
|
-
"แแแแแ แแแ": "แแแแ-แ แ-แแ"
|
|
389
|
-
}
|
|
390
|
-
```
|
|
391
|
-
|
|
392
|
-
Load it with:
|
|
393
|
-
```python
|
|
394
|
-
hyphenator.load_default_library()
|
|
395
|
-
```
|
|
396
|
-
|
|
397
|
-
---
|
|
398
|
-
|
|
399
|
-
## 🔧 API Reference
|
|
400
|
-
|
|
401
|
-
### **Class: GeorgianHyphenator**
|
|
402
|
-
```python
|
|
403
|
-
class GeorgianHyphenator:
|
|
404
|
-
    def __init__(self, hyphen_char: str = '\u00AD'): ...
|
|
405
|
-
```
|
|
406
|
-
|
|
407
|
-
**Parameters:**
|
|
408
|
-
- `hyphen_char` (str): Character to use for hyphenation. Default: soft hyphen `\u00AD`
|
|
409
|
-
|
|
410
|
-
---
|
|
411
|
-
|
|
412
|
-
### **Methods**
|
|
413
|
-
|
|
414
|
-
#### **hyphenate(word: str) → str**
|
|
415
|
-
Hyphenate a single Georgian word.
|
|
416
|
-
```python
|
|
417
|
-
hyphenator = GeorgianHyphenator('-')
|
|
418
|
-
result = hyphenator.hyphenate('საქართველო')
|
|
419
|
-
# Returns: 'სა-ქარ-თვე-ლო'
|
|
420
|
-
```
|
|
421
|
-
|
|
422
|
-
---
|
|
423
|
-
|
|
424
|
-
#### **hyphenate_text(text: str) → str**
|
|
425
|
-
Hyphenate entire text (preserves punctuation and non-Georgian characters).
|
|
426
|
-
```python
|
|
427
|
-
hyphenator = GeorgianHyphenator('-')
|
|
428
|
-
result = hyphenator.hyphenate_text('แกแแฅแแ แแแแแ แแ แแก แแแแแแ')
|
|
429
|
-
# Returns: 'แกแ-แฅแแ -แแแ-แแ แแ แแก แแ-แแ-แแ'
|
|
430
|
-
```
|
|
431
|
-
|
|
432
|
-
---
|
|
433
|
-
|
|
434
|
-
#### **get_syllables(word: str) → List[str]**
|
|
435
|
-
Get syllables as a list.
|
|
436
|
-
```python
|
|
437
|
-
hyphenator = GeorgianHyphenator('-')
|
|
438
|
-
syllables = hyphenator.get_syllables('საქართველო')
|
|
439
|
-
# Returns: ['სა', 'ქარ', 'თვე', 'ლო']
|
|
440
|
-
```
|
|
441
|
-
|
|
442
|
-
---
|
|
443
|
-
|
|
444
|
-
#### **load_library(data: Dict[str, str]) → None**
|
|
445
|
-
Load custom dictionary.
|
|
446
|
-
```python
|
|
447
|
-
hyphenator.load_library({
|
|
448
|
-
'แกแแขแงแแ': 'แกแ-แขแงแแ',
|
|
449
|
-
'แแแแแแแแ': 'แแ-แแ-แแ-แแ'
|
|
450
|
-
})
|
|
451
|
-
```
|
|
452
|
-
|
|
453
|
-
---
|
|
454
|
-
|
|
455
|
-
#### **load_default_library() → None**
|
|
456
|
-
Load default exception dictionary from `data/exceptions.json`.
|
|
457
|
-
```python
|
|
458
|
-
hyphenator.load_default_library()
|
|
459
|
-
```
|
|
460
|
-
|
|
461
|
-
---
|
|
462
|
-
|
|
463
|
-
### **Convenience Functions**
|
|
464
|
-
|
|
465
|
-
#### **hyphenate(word: str, hyphen_char: str = '\u00AD') → str**
|
|
466
|
-
```python
|
|
467
|
-
from georgian_hyphenation import hyphenate
|
|
468
|
-
result = hyphenate('แกแแฅแแ แแแแแ', '-')
|
|
469
|
-
```
|
|
470
|
-
|
|
471
|
-
#### **get_syllables(word: str) → List[str]**
|
|
472
|
-
```python
|
|
473
|
-
from georgian_hyphenation import get_syllables
|
|
474
|
-
syllables = get_syllables('แกแแฅแแ แแแแแ')
|
|
475
|
-
```
|
|
476
|
-
|
|
477
|
-
#### **hyphenate_text(text: str, hyphen_char: str = '\u00AD') → str**
|
|
478
|
-
```python
|
|
479
|
-
from georgian_hyphenation import hyphenate_text
|
|
480
|
-
result = hyphenate_text('แกแแฅแแ แแแแแ แแ แแก แแแแแแ')
|
|
481
|
-
```
|
|
482
|
-
|
|
483
|
-
#### **to_tex_pattern(word: str) → str**
|
|
484
|
-
```python
|
|
485
|
-
from georgian_hyphenation import to_tex_pattern
|
|
486
|
-
pattern = to_tex_pattern('საქართველო')
|
|
487
|
-
# Returns: '.სა1ქარ1თვე1ლო.'
|
|
488
|
-
```
|
|
489
|
-
|
|
490
|
-
#### **to_hunspell_format(word: str) → str**
|
|
491
|
-
```python
|
|
492
|
-
from georgian_hyphenation import to_hunspell_format
|
|
493
|
-
hunspell = to_hunspell_format('საქართველო')
|
|
494
|
-
# Returns: 'სა=ქარ=თვე=ლო'
|
|
495
|
-
```
|
|
496
|
-
|
|
497
|
-
---
|
|
498
|
-
|
|
499
|
-
## 🧪 Testing
|
|
500
|
-
|
|
501
|
-
Run the test suite:
|
|
502
|
-
```bash
|
|
503
|
-
python test_python.py
|
|
504
|
-
```
|
|
505
|
-
|
|
506
|
-
Expected output:
|
|
507
|
-
```
|
|
508
|
-
🧪 Georgian Hyphenation v2.2.1 - Python Tests
|
|
509
|
-
|
|
510
|
-
๐ Basic Hyphenation Tests:
|
|
511
|
-
✅ Test 1: საქართველო
|
|
512
|
-
   Result: სა-ქარ-თვე-ლო
|
|
513
|
-
...
|
|
514
|
-
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
|
|
515
|
-
๐ Test Results: 13 passed, 0 failed
|
|
516
|
-
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
|
|
517
|
-
๐ All tests passed!
|
|
518
|
-
```
|
|
519
|
-
|
|
520
|
-
---
|
|
521
|
-
|
|
522
|
-
## Project Structure
|
|
523
|
-
```
|
|
524
|
-
georgian-hyphenation/
|
|
525
|
-
├── data/
|
|
526
|
-
│   └── exceptions.json          # Dictionary (150+ words)
|
|
527
|
-
├── src/
|
|
528
|
-
│   └── georgian_hyphenation/
|
|
529
|
-
│       ├── __init__.py          # Package init
|
|
530
|
-
│       └── hyphenator.py        # Main code
|
|
531
|
-
├── test_python.py               # Test suite
|
|
532
|
-
├── pyproject.toml               # Package config
|
|
533
|
-
├── MANIFEST.in                  # Data files manifest
|
|
534
|
-
├── README.md                    # This file
|
|
535
|
-
└── LICENSE.txt                  # MIT License
|
|
536
|
-
```
|
|
537
|
-
|
|
538
|
-
---
|
|
539
|
-
|
|
540
|
-
## Changelog
|
|
541
|
-
|
|
542
|
-
### **v2.2.1 (2025-01-27)**
|
|
543
|
-
- ✨ Optimized: Set-based harmonic cluster lookup (O(1) instead of O(n))
|
|
544
|
-
- โจ Added 12 new harmonic clusters: แแ , แแ , แแ , แแฆ, แแข, แจแ, แฉแ , แฌแ, แญแง
|
|
545
|
-
- Strip & Re-hyphenate: Always removes old hyphens and reapplies correctly
|
|
546
|
-
- 📦 Dictionary: 150+ exception words in `data/exceptions.json`
|
|
547
|
-
- 🎯 Hybrid Engine: Dictionary-first, Algorithm fallback
|
|
548
|
-
- Improved documentation with detailed API reference
|
|
549
|
-
|
|
550
|
-
### **v2.0.0 (2024)**
|
|
551
|
-
- Initial release
|
|
552
|
-
- Phonological algorithm
|
|
553
|
-
- Basic harmonic cluster handling
|
|
554
|
-
- TeX and Hunspell export formats
|
|
555
|
-
|
|
556
|
-
---
|
|
557
|
-
|
|
558
|
-
## 🤝 Contributing
|
|
559
|
-
|
|
560
|
-
Contributions are welcome! To contribute:
|
|
561
|
-
|
|
562
|
-
1. Fork the repository: https://github.com/guramzhgamadze/georgian-hyphenation
|
|
563
|
-
2. Create a feature branch: `git checkout -b feature/new-feature`
|
|
564
|
-
3. Make your changes
|
|
565
|
-
4. Run tests: `python test_python.py`
|
|
566
|
-
5. Commit: `git commit -m 'Add new feature'`
|
|
567
|
-
6. Push: `git push origin feature/new-feature`
|
|
568
|
-
7. Open a Pull Request
|
|
569
|
-
|
|
570
|
-
### **Adding Exception Words**
|
|
571
|
-
|
|
572
|
-
To add words to the dictionary:
|
|
573
|
-
|
|
574
|
-
1. Edit `data/exceptions.json`
|
|
575
|
-
2. Add your word in format: `"სიტყვა": "სი-ტყვა"`
|
|
576
|
-
3. Test: `python test_python.py`
|
|
577
|
-
4. Submit PR
|
|
578
|
-
|
|
579
|
-
---
|
|
580
|
-
|
|
581
|
-
## Bug Reports
|
|
582
|
-
|
|
583
|
-
Found a bug? Please open an issue:
|
|
584
|
-
https://github.com/guramzhgamadze/georgian-hyphenation/issues
|
|
585
|
-
|
|
586
|
-
Include:
|
|
587
|
-
- Python version
|
|
588
|
-
- Code snippet that reproduces the issue
|
|
589
|
-
- Expected vs actual output
|
|
590
|
-
|
|
591
|
-
---
|
|
592
|
-
|
|
593
|
-
## License
|
|
594
|
-
|
|
595
|
-
MIT License
|
|
596
|
-
|
|
597
|
-
Copyright (c) 2025 Guram Zhgamadze
|
|
598
|
-
|
|
599
|
-
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
600
|
-
of this software and associated documentation files (the "Software"), to deal
|
|
601
|
-
in the Software without restriction, including without limitation the rights
|
|
602
|
-
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
603
|
-
copies of the Software, and to permit persons to whom the Software is
|
|
604
|
-
furnished to do so, subject to the following conditions:
|
|
605
|
-
|
|
606
|
-
The above copyright notice and this permission notice shall be included in all
|
|
607
|
-
copies or substantial portions of the Software.
|
|
608
|
-
|
|
609
|
-
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
610
|
-
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
611
|
-
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
612
|
-
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
613
|
-
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
614
|
-
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
615
|
-
SOFTWARE.
|
|
616
|
-
|
|
617
|
-
---
|
|
618
|
-
|
|
619
|
-
## 👨‍💻 Author
|
|
620
|
-
|
|
621
|
-
**Guram Zhgamadze**
|
|
622
|
-
|
|
623
|
-
- GitHub: [@guramzhgamadze](https://github.com/guramzhgamadze)
|
|
624
|
-
- Email: guramzhgamadze@gmail.com
|
|
625
|
-
- PyPI: [georgian-hyphenation](https://pypi.org/project/georgian-hyphenation/)
|
|
626
|
-
|
|
627
|
-
---
|
|
628
|
-
|
|
629
|
-
## Acknowledgments
|
|
630
|
-
|
|
631
|
-
- Georgian linguistic research on syllabification
|
|
632
|
-
- TeX hyphenation algorithm inspiration
|
|
633
|
-
- Python community for excellent packaging tools
|
|
634
|
-
|
|
635
|
-
---
|
|
636
|
-
|
|
637
|
-
## Related Projects
|
|
638
|
-
|
|
639
|
-
- [Hyphen](https://github.com/hunspell/hyphen) - Generic hyphenation library
|
|
640
|
-
- [PyHyphen](https://github.com/dr-leo/PyHyphen) - Python wrapper for Hyphen
|
|
641
|
-
- [TeX hyphenation patterns](http://www.ctan.org/tex-archive/language/hyph-utf8)
|
|
642
|
-
|
|
643
|
-
---
|
|
644
|
-
|
|
645
|
-
## ⭐ Support
|
|
646
|
-
|
|
647
|
-
If you find this library useful, please:
|
|
648
|
-
- ⭐ Star the repository on GitHub
|
|
649
|
-
- 📢 Share with others
|
|
650
|
-
- Report bugs
|
|
651
|
-
- 💡 Suggest improvements
|
|
652
|
-
|
|
653
|
-
---
|
|
654
|
-
|
|
655
|
-
**Made with ❤️ for the Georgian language community**
|
|
656
|
-
|
|
657
|
-
🇬🇪 **ქართული ენის ციფრული განვითარებისთვის**
|
|
@@ -1,14 +0,0 @@
|
|
|
1
|
-
LICENSE.txt
|
|
2
|
-
MANIFEST.in
|
|
3
|
-
README.md
|
|
4
|
-
pyproject.toml
|
|
5
|
-
setup.py
|
|
6
|
-
data/exceptions.json
|
|
7
|
-
src/georgian_hyphenation/__init__.py
|
|
8
|
-
src/georgian_hyphenation/hyphenator.py
|
|
9
|
-
src/georgian_hyphenation.egg-info/PKG-INFO
|
|
10
|
-
src/georgian_hyphenation.egg-info/SOURCES.txt
|
|
11
|
-
src/georgian_hyphenation.egg-info/dependency_links.txt
|
|
12
|
-
src/georgian_hyphenation.egg-info/requires.txt
|
|
13
|
-
src/georgian_hyphenation.egg-info/top_level.txt
|
|
14
|
-
tests/test_basic.py
|
|
@@ -1 +0,0 @@
|
|
|
1
|
-
|