georgian-hyphenation 2.2.3 → 2.2.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,657 +0,0 @@
1
- Metadata-Version: 2.4
2
- Name: georgian-hyphenation
3
- Version: 2.2.2
4
- Summary: Georgian Language Hyphenation Library v2.2.1 - Modernized & Optimized with Dictionary Support
5
- Home-page: https://github.com/guramzhgamadze/georgian-hyphenation
6
- Author: Guram Zhgamadze
7
- Author-email: Guram Zhgamadze <guramzhgamadze@gmail.com>
8
- License: MIT
9
- Project-URL: Homepage, https://github.com/guramzhgamadze/georgian-hyphenation
10
- Project-URL: Repository, https://github.com/guramzhgamadze/georgian-hyphenation
11
- Project-URL: Documentation, https://github.com/guramzhgamadze/georgian-hyphenation#readme
12
- Project-URL: Bug Tracker, https://github.com/guramzhgamadze/georgian-hyphenation/issues
13
- Keywords: georgian,hyphenation,syllabification,nlp,linguistics,kartuli,dictionary
14
- Classifier: Development Status :: 5 - Production/Stable
15
- Classifier: Intended Audience :: Developers
16
- Classifier: License :: OSI Approved :: MIT License
17
- Classifier: Programming Language :: Python :: 3
18
- Classifier: Programming Language :: Python :: 3.7
19
- Classifier: Programming Language :: Python :: 3.8
20
- Classifier: Programming Language :: Python :: 3.9
21
- Classifier: Programming Language :: Python :: 3.10
22
- Classifier: Programming Language :: Python :: 3.11
23
- Classifier: Programming Language :: Python :: 3.12
24
- Classifier: Topic :: Text Processing :: Linguistic
25
- Classifier: Natural Language :: Georgian
26
- Requires-Python: >=3.7
27
- Description-Content-Type: text/markdown
28
- License-File: LICENSE.txt
29
- Provides-Extra: dev
30
- Requires-Dist: pytest>=7.0; extra == "dev"
31
- Dynamic: author
32
- Dynamic: home-page
33
- Dynamic: license-file
34
- Dynamic: requires-python
35
-
36
- # 🇬🇪 Georgian Hyphenation - Python Library
37
-
38
- [![PyPI version](https://badge.fury.io/py/georgian-hyphenation.svg)](https://pypi.org/project/georgian-hyphenation/)
39
- [![Python versions](https://img.shields.io/pypi/pyversions/georgian-hyphenation.svg)](https://pypi.org/project/georgian-hyphenation/)
40
- [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
41
-
42
- **Georgian Language Hyphenation Library v2.2.1** - ქართული ენის დამარცვლის ბიბლიოთეკა
43
-
44
- Automatic hyphenation (syllabification) for Georgian text with hybrid engine: **Algorithm + Dictionary**.
45
-
46
- ---
47
-
48
- ## ✨ Features
49
-
50
- ### **v2.2.1 (Latest)**
51
- - 🎯 **Hybrid Engine**: Algorithm + Dictionary (150+ exception words)
52
- - ⚡ **Optimized Performance**: Set-based harmonic cluster lookup (O(1))
53
- - 🔄 **Strip & Re-hyphenate**: Corrects old incorrect hyphenation
54
- - 🎵 **Harmonic Clusters**: Preserves natural Georgian sound clusters (ბლ, გლ, კრ, etc.)
55
- - 💎 **Gemination Handling**: Splits double consonants correctly (rare in Georgian)
56
- - 🛡️ **Anti-Orphan Protection**: Minimum 2 characters on each side
57
- - 🐍 **Pure Python**: No external dependencies
58
- - 🌍 **Unicode Support**: Full Georgian script support
59
-
60
- ### **Core Algorithm**
61
- - Phonological distance analysis
62
- - Vowel-based syllable detection
63
- - Contextual consonant cluster handling
64
- - Punctuation preservation
65
-
66
- ---
67
-
68
- ## 📦 Installation
69
- ```bash
70
- pip install georgian-hyphenation
71
- ```
72
-
73
- ### **Requirements**
74
- - Python 3.7+
75
- - No external dependencies (uses only standard library)
76
-
77
- ---
78
-
79
- ## 🚀 Quick Start
80
-
81
- ### **Basic Usage**
82
- ```python
83
- from georgian_hyphenation import GeorgianHyphenator
84
-
85
- # Initialize with visible hyphen
86
- hyphenator = GeorgianHyphenator('-')
87
-
88
- # Hyphenate single word
89
- print(hyphenator.hyphenate('საქართველო'))
90
- # Output: სა-ქარ-თვე-ლო
91
-
92
- # Hyphenate text
93
- text = 'საქართველო არის ლამაზი ქვეყანა'
94
- print(hyphenator.hyphenate_text(text))
95
- # Output: სა-ქარ-თვე-ლო არის ლა-მა-ზი ქვე-ყა-ნა
96
-
97
- # Get syllables as list
98
- syllables = hyphenator.get_syllables('დედაქალაქი')
99
- print(syllables)
100
- # Output: ['დე', 'და', 'ქა', 'ლა', 'ქი']
101
- ```
102
-
103
- ### **Using Dictionary (Recommended)**
104
- ```python
105
- from georgian_hyphenation import GeorgianHyphenator
106
-
107
- hyphenator = GeorgianHyphenator('-')
108
-
109
- # Load default dictionary (150+ exception words)
110
- hyphenator.load_default_library()
111
-
112
- # Now hyphenation will use dictionary first, then algorithm
113
- print(hyphenator.hyphenate('კომპიუტერი'))
114
- # Output: კომ-პიუ-ტე-რი (from dictionary)
115
- ```
116
-
117
- ### **Convenience Functions**
118
- ```python
119
- from georgian_hyphenation import hyphenate, get_syllables, hyphenate_text
120
-
121
- # Quick hyphenation with default settings
122
- print(hyphenate('საქართველო'))
123
- # Output: სა­ქარ­თვე­ლო (with soft hyphens U+00AD)
124
-
125
- # Get syllables
126
- print(get_syllables('მთავრობა'))
127
- # Output: ['მთავ', 'რო', 'ბა']
128
-
129
- # Hyphenate entire text
130
- text = 'საქართველო არის ლამაზი ქვეყანა'
131
- print(hyphenate_text(text))
132
- ```
133
-
134
- ---
135
-
136
- ## 🎨 Hyphen Character Options
137
-
138
- ### **Soft Hyphen (Invisible, default)**
139
- ```python
140
- # Soft hyphen (U+00AD) - invisible, only appears at line breaks
141
- hyphenator = GeorgianHyphenator('\u00AD')
142
- print(hyphenator.hyphenate('საქართველო'))
143
- # Output: სა­ქარ­თვე­ლო (hyphens invisible until line wraps)
144
- ```
145
-
146
- ### **Visible Hyphen**
147
- ```python
148
- # Regular hyphen - always visible
149
- hyphenator = GeorgianHyphenator('-')
150
- print(hyphenator.hyphenate('საქართველო'))
151
- # Output: სა-ქარ-თვე-ლო
152
- ```
153
-
154
- ### **Middle Dot**
155
- ```python
156
- # Middle dot - useful for visualization
157
- hyphenator = GeorgianHyphenator('·')
158
- print(hyphenator.hyphenate('საქართველო'))
159
- # Output: სა·ქარ·თვე·ლო
160
- ```
161
-
162
- ### **Custom Character**
163
- ```python
164
- # Any character you want
165
- hyphenator = GeorgianHyphenator('|')
166
- print(hyphenator.hyphenate('საქართველო'))
167
- # Output: სა|ქარ|თვე|ლო
168
- ```
169
-
170
- ---
171
-
172
- ## 📚 Advanced Usage
173
-
174
- ### **Custom Dictionary**
175
- ```python
176
- from georgian_hyphenation import GeorgianHyphenator
177
-
178
- hyphenator = GeorgianHyphenator('-')
179
-
180
- # Add your own exception words
181
- custom_dict = {
182
-     'კომპიუტერი': 'კომ-პიუ-ტე-რი',
183
-     'პროგრამა': 'პროგ-რა-მა',
184
-     'ინტერნეტი': 'ინ-ტერ-ნე-ტი'
185
- }
186
-
187
- hyphenator.load_library(custom_dict)
188
-
189
- # Now these words will use your custom hyphenation
190
- print(hyphenator.hyphenate('კომპიუტერი'))
191
- # Output: კომ-პიუ-ტე-რი
192
- ```
193
-
194
- ### **Combining Default + Custom Dictionary**
195
- ```python
196
- hyphenator = GeorgianHyphenator('-')
197
-
198
- # Load default dictionary first
199
- hyphenator.load_default_library()
200
-
201
- # Add your custom words
202
- hyphenator.load_library({
203
-     'სპეციალური': 'სპე-ცი-ა-ლუ-რი'
204
- })
205
-
206
- # Now has both default + custom exceptions
207
- ```
208
-
209
- ### **Export Formats**
210
- ```python
211
- from georgian_hyphenation import to_tex_pattern, to_hunspell_format
212
-
213
- # TeX hyphenation pattern
214
- print(to_tex_pattern('საქართველო'))
215
- # Output: .სა1ქარ1თვე1ლო.
216
-
217
- # Hunspell format
218
- print(to_hunspell_format('საქართველო'))
219
- # Output: სა=ქარ=თვე=ლო
220
- ```
221
-
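A minimal sketch of how the two export strings in the examples above could be derived from a syllable list. The helper names `syllables_to_tex` and `syllables_to_hunspell` are hypothetical and are not part of the package; only the output shapes are taken from the README examples.

```python
from typing import List


def syllables_to_tex(syllables: List[str]) -> str:
    """Join syllables with '1' (a TeX break marker) and wrap the word in dots."""
    return "." + "1".join(syllables) + "."


def syllables_to_hunspell(syllables: List[str]) -> str:
    """Join syllables with '=' as in the Hunspell example."""
    return "=".join(syllables)


parts = ["სა", "ქარ", "თვე", "ლო"]
print(syllables_to_tex(parts))       # .სა1ქარ1თვე1ლო.
print(syllables_to_hunspell(parts))  # სა=ქარ=თვე=ლო
```

Both formats are plain joins over the same syllable list, so a single syllabification pass could feed either exporter.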
222
- ### **Processing Files**
223
- ```python
224
- from georgian_hyphenation import GeorgianHyphenator
225
-
226
- hyphenator = GeorgianHyphenator('\u00AD')
227
- hyphenator.load_default_library()
228
-
229
- # Read file
230
- with open('input.txt', 'r', encoding='utf-8') as f:
231
-     text = f.read()
232
-
233
- # Hyphenate
234
- hyphenated = hyphenator.hyphenate_text(text)
235
-
236
- # Write output
237
- with open('output.txt', 'w', encoding='utf-8') as f:
238
-     f.write(hyphenated)
239
- ```
240
-
241
- ---
242
-
243
- ## 🔬 How It Works
244
-
245
- ### **v2.2.1 Hybrid Engine**
246
-
247
- 1. **Sanitization**: Strip existing hyphens from input
248
- 2. **Dictionary Lookup**: Check exception words first (if loaded)
249
- 3. **Algorithm Fallback**: Apply phonological rules if not in dictionary
250
-
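The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the package's code: `strip_hyphens` and `hybrid_hyphenate` are hypothetical names, the set of stripped marks ('-' and the soft hyphen) is assumed, and the rule-based algorithm is passed in as a plain callable.

```python
from typing import Callable, Dict

SOFT_HYPHEN = "\u00AD"


def strip_hyphens(word: str) -> str:
    """Step 1: drop hyphenation marks already present in the input.

    Assumption: only '-' and the soft hyphen count as marks.
    """
    return word.replace("-", "").replace(SOFT_HYPHEN, "")


def hybrid_hyphenate(
    word: str,
    exceptions: Dict[str, str],
    algorithm: Callable[[str], str],
    hyphen_char: str = SOFT_HYPHEN,
) -> str:
    clean = strip_hyphens(word)
    # Step 2: dictionary entries win; they are stored with '-' separators.
    if clean in exceptions:
        return exceptions[clean].replace("-", hyphen_char)
    # Step 3: otherwise defer to the rule-based algorithm.
    return algorithm(clean)


exceptions = {"კომპიუტერი": "კომ-პიუ-ტე-რი"}
# The stale hyphen in the input is stripped before the lookup.
print(hybrid_hyphenate("კომ-პიუტერი", exceptions, lambda w: w, "-"))
# კომ-პიუ-ტე-რი
```

Stripping first means a word that arrives with outdated break points still matches its dictionary key.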
251
- ### **Algorithm Rules**
252
-
253
- #### **1. Vowel Detection**
254
- ```
255
- საქართველო → vowels at positions (0-based): [1, 3, 7, 9]
256
- ```
257
-
258
- #### **2. Consonant Cluster Analysis**
259
-
260
- Between each vowel pair:
261
-
262
- - **0 consonants (V-V)**: Split between vowels
263
- ```python
264
- 'გააკეთა' → 'გა-ა-კე-თა'
265
- ```
266
-
267
- - **1 consonant (V-C-V)**: Split after first vowel
268
- ```python
269
- 'მამა' → 'მა-მა'
270
- ```
271
-
272
- - **2+ consonants (V-CC...C-V)**:
273
- 1. Check for **gemination** (double consonants) - rare in Georgian
274
- ```python
275
- 'სამმა' → 'სამ-მა' # Split between double 'მ' (if exists)
276
- ```
277
-
278
- 2. Check for **harmonic clusters**
279
- ```python
280
- 'ბლოკი' → 'ბლო-კი' # Keep 'ბლ' together
281
- ```
282
-
283
- 3. Default: Split after first consonant
284
- ```python
285
- 'ბარბარე' → 'ბარ-ბა-რე'
286
- ```
287
-
288
- #### **3. Harmonic Clusters (67 clusters)**
289
-
290
- These consonant pairs stay together:
291
- ```
292
- ბლ, ბრ, ბღ, ბზ, გდ, გლ, გმ, გნ, გვ, გზ, გრ, დრ, თლ, თრ, თღ,
293
- კლ, კმ, კნ, კრ, კვ, მტ, პლ, პრ, ჟღ, რგ, რლ, რმ, სწ, სხ, ტკ,
294
- ტპ, ტრ, ფლ, ფრ, ფქ, ფშ, ქლ, ქნ, ქვ, ქრ, ღლ, ღრ, ყლ, ყრ, შთ,
295
- შპ, ჩქ, ჩრ, ცლ, ცნ, ცრ, ცვ, ძგ, ძვ, ძღ, წლ, წრ, წნ, წკ, ჭკ,
296
- ჭრ, ჭყ, ხლ, ხმ, ხნ, ხვ, ჯგ
297
- ```
298
-
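A short sketch of the set-based lookup mentioned in the feature list, using the pairs from the list above. Checking the last two consonants of a run is an assumption about how the lookup is applied, and `ends_in_harmonic_cluster` is a hypothetical helper name.

```python
# Pairs transcribed from the README list; membership tests on a frozenset
# are O(1) on average, unlike a scan over a list.
HARMONIC_CLUSTERS = frozenset(
    "ბლ ბრ ბღ ბზ გდ გლ გმ გნ გვ გზ გრ დრ თლ თრ თღ "
    "კლ კმ კნ კრ კვ მტ პლ პრ ჟღ რგ რლ რმ სწ სხ ტკ "
    "ტპ ტრ ფლ ფრ ფქ ფშ ქლ ქნ ქვ ქრ ღლ ღრ ყლ ყრ შთ "
    "შპ ჩქ ჩრ ცლ ცნ ცრ ცვ ძგ ძვ ძღ წლ წრ წნ წკ ჭკ "
    "ჭრ ჭყ ხლ ხმ ხნ ხვ ჯგ".split()
)


def ends_in_harmonic_cluster(consonants: str) -> bool:
    """True if the last two consonants of the run form a listed pair."""
    return consonants[-2:] in HARMONIC_CLUSTERS


print(ends_in_harmonic_cluster("სტრ"))  # True: 'ტრ' is listed
print(ends_in_harmonic_cluster("რბ"))   # False: 'რბ' is not listed
```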
299
- #### **4. Anti-Orphan Protection**
300
-
301
- Minimum 2 characters on each side:
302
- ```python
303
- 'არა' → 'არა' # Not split (would create 1-letter syllable)
304
- 'არაა' → 'ა-რა-ა' # OK to split
305
- ```
306
-
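The four rules above can be combined into a small sketch. It is one reading of the description rather than the package's implementation: the names are hypothetical, `HARMONIC` is only an illustrative subset of the cluster list, and the anti-orphan guard is measured from the word's edges, which may differ from how the library applies it.

```python
from typing import List

VOWELS = set("აეიოუ")
# Illustrative subset of the harmonic clusters (assumption: subset only).
HARMONIC = {"ბლ", "გლ", "კრ", "ტრ", "პრ", "თლ"}


def break_points(word: str) -> List[int]:
    """Return the indices at which a hyphen may be inserted."""
    vowels = [i for i, ch in enumerate(word) if ch in VOWELS]
    points = []
    for v1, v2 in zip(vowels, vowels[1:]):
        between = word[v1 + 1 : v2]
        if len(between) <= 1:
            # V-V or V-C-V: break right after the first vowel.
            pos = v1 + 1
        else:
            doubled = next(
                (j for j in range(len(between) - 1) if between[j] == between[j + 1]),
                None,
            )
            if doubled is not None:
                # Gemination: break between the two identical consonants.
                pos = v1 + 1 + doubled + 1
            elif between[-2:] in HARMONIC:
                # Harmonic cluster: keep the pair with the following vowel.
                pos = v2 - 2
            else:
                # Default: break after the first consonant.
                pos = v1 + 2
        # Anti-orphan guard: at least 2 characters on each side of the break.
        if pos >= 2 and len(word) - pos >= 2:
            points.append(pos)
    return points


def hyphenate_sketch(word: str, hyphen: str = "-") -> str:
    pieces, start = [], 0
    for pos in break_points(word):
        pieces.append(word[start:pos])
        start = pos
    pieces.append(word[start:])
    return hyphen.join(pieces)


for w in ("გააკეთა", "მამა", "სამმა", "ბარბარე", "პრობლემა", "არა"):
    print(w, "->", hyphenate_sketch(w))
# გააკეთა -> გა-ა-კე-თა
# მამა -> მა-მა
# სამმა -> სამ-მა
# ბარბარე -> ბარ-ბა-რე
# პრობლემა -> პრო-ბლე-მა
# არა -> არა
```

Collecting break indices first and slicing afterwards keeps the rule logic separate from the choice of hyphen character.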
307
- ---
308
-
309
- ## 🧪 Examples
310
-
311
- ### **Basic Words**
312
- ```python
313
- hyphenate('საქართველო') # → სა-ქარ-თვე-ლო
314
- hyphenate('მთავრობა') # → მთავ-რო-ბა
315
- hyphenate('დედაქალაქი') # → დე-და-ქა-ლა-ქი
316
- hyphenate('პარლამენტი') # → პარ-ლა-მენ-ტი
317
- ```
318
-
319
- ### **V-C-V Pattern (Single Consonant)**
320
- ```python
321
- hyphenate('კლასი') # → კლა-სი
322
- hyphenate('მასა') # → მა-სა
323
- hyphenate('მამა') # → მა-მა
324
- hyphenate('ბაბა') # → ბა-ბა
325
- ```
326
-
327
- ### **Harmonic Clusters**
328
- ```python
329
- hyphenate('ბლოკი') # → ბლო-კი (keeps ბლ)
330
- hyphenate('კრემი') # → კრე-მი (keeps კრ)
331
- hyphenate('გლეხი') # → გლე-ხი (keeps გლ)
332
- hyphenate('ტრამვაი') # → ტრამ-ვა-ი (keeps ტრ)
333
- hyphenate('პროგრამა') # → პროგ-რა-მა (keeps პრ and გრ)
334
- ```
335
-
336
- ### **V-V Split**
337
- ```python
338
- hyphenate('გააკეთა') # → გა-ა-კე-თა
339
- hyphenate('გაიარა') # → გა-ი-ა-რა
340
- hyphenate('ააშენა') # → ა-ა-შე-ნა
341
- hyphenate('გაანალიზა') # → გა-ა-ნა-ლი-ზა
342
- ```
343
-
344
- ### **Complex Words**
345
- ```python
346
- hyphenate('მთავრობა') # → მთავ-რო-ბა
347
- hyphenate('სამთავრობო') # → სამ-თავ-რო-ბო
348
- hyphenate('ბარბარე') # → ბარ-ბა-რე
349
- hyphenate('ასტრონომია') # → ას-ტრო-ნო-მი-ა
350
- ```
351
-
352
- ### **Text Processing**
353
- ```python
354
- text = 'საქართველო არის ლამაზი ქვეყანა'
355
- hyphenate_text(text)
356
- # → 'სა­ქარ­თვე­ლო არის ლა­მა­ზი ქვე­ყა­ნა'
357
-
358
- # Preserves punctuation
359
- text = 'მთავრობა, პარლამენტი და სასამართლო.'
360
- hyphenate_text(text)
361
- # → 'მთავ­რო­ბა, პარ­ლა­მენ­ტი და სა­სა­მარ­თლო.'
362
-
363
- # Preserves numbers and Latin text
364
- text = 'საქართველოში 2025 წელს'
365
- hyphenate_text(text)
366
- # → 'სა­ქარ­თვე­ლო­ში 2025 წელს'
367
- ```
368
-
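One way to leave punctuation, digits and Latin text untouched, as in the examples above, is to rewrite only runs of Georgian letters. The sketch below assumes a word is a run of characters in U+10D0..U+10FA; `map_georgian_words` is a hypothetical helper, and the per-word hyphenator is passed in as a callable rather than taken from the package.

```python
import re
from typing import Callable

# Assumption: a "word" is a run of Georgian letters in U+10D0..U+10FA.
GEORGIAN_WORD = re.compile(r"[\u10D0-\u10FA]+")


def map_georgian_words(text: str, hyphenate_word: Callable[[str], str]) -> str:
    """Apply hyphenate_word to each Georgian run; leave everything else as-is."""
    return GEORGIAN_WORD.sub(lambda m: hyphenate_word(m.group()), text)


# A stand-in callable that brackets each word makes the matches visible.
print(map_georgian_words("საქართველოში 2025 წელს.", lambda w: f"[{w}]"))
# [საქართველოში] 2025 [წელს].
```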
369
- ### **Get Syllables**
370
- ```python
371
- get_syllables('საქართველო') # → ['სა', 'ქარ', 'თვე', 'ლო']
372
- get_syllables('დედაქალაქი') # → ['დე', 'და', 'ქა', 'ლა', 'ქი']
373
- get_syllables('მთავრობა') # → ['მთავ', 'რო', 'ბა']
374
- get_syllables('ბლოკი') # → ['ბლო', 'კი']
375
- ```
376
-
377
- ---
378
-
379
- ## 📊 Dictionary
380
-
381
- The library includes `data/exceptions.json` with 150+ Georgian words that require special hyphenation:
382
- ```json
383
- {
384
-   "კომპიუტერი": "კომ-პიუ-ტე-რი",
385
-   "ინტერნეტი": "ინ-ტერ-ნე-ტი",
386
-   "საქართველო": "სა-ქარ-თვე-ლო",
387
-   "პროგრამა": "პროგ-რა-მა",
388
-   "მთავრობა": "მთავ-რო-ბა"
389
- }
390
- ```
391
-
392
- Load it with:
393
- ```python
394
- hyphenator.load_default_library()
395
- ```
396
-
397
- ---
398
-
399
- ## 🔧 API Reference
400
-
401
- ### **Class: GeorgianHyphenator**
402
- ```python
403
- class GeorgianHyphenator:
404
-     def __init__(self, hyphen_char: str = '\u00AD')
405
- ```
406
-
407
- **Parameters:**
408
- - `hyphen_char` (str): Character to use for hyphenation. Default: soft hyphen `\u00AD`
409
-
410
- ---
411
-
412
- ### **Methods**
413
-
414
- #### **hyphenate(word: str) → str**
415
- Hyphenate a single Georgian word.
416
- ```python
417
- hyphenator = GeorgianHyphenator('-')
418
- result = hyphenator.hyphenate('საქართველო')
419
- # Returns: 'სა-ქარ-თვე-ლო'
420
- ```
421
-
422
- ---
423
-
424
- #### **hyphenate_text(text: str) → str**
425
- Hyphenate entire text (preserves punctuation and non-Georgian characters).
426
- ```python
427
- hyphenator = GeorgianHyphenator('-')
428
- result = hyphenator.hyphenate_text('საქართველო არის ლამაზი')
429
- # Returns: 'სა-ქარ-თვე-ლო არის ლა-მა-ზი'
430
- ```
431
-
432
- ---
433
-
434
- #### **get_syllables(word: str) → List[str]**
435
- Get syllables as a list.
436
- ```python
437
- hyphenator = GeorgianHyphenator('-')
438
- syllables = hyphenator.get_syllables('საქართველო')
439
- # Returns: ['სა', 'ქარ', 'თვე', 'ლო']
440
- ```
441
-
442
- ---
443
-
444
- #### **load_library(data: Dict[str, str]) → None**
445
- Load custom dictionary.
446
- ```python
447
- hyphenator.load_library({
448
-     'სიტყვა': 'სი-ტყვა',
449
-     'მაგალითი': 'მა-გა-ლი-თი'
450
- })
451
- ```
452
-
453
- ---
454
-
455
- #### **load_default_library() → None**
456
- Load default exception dictionary from `data/exceptions.json`.
457
- ```python
458
- hyphenator.load_default_library()
459
- ```
460
-
461
- ---
462
-
463
- ### **Convenience Functions**
464
-
465
- #### **hyphenate(word: str, hyphen_char: str = '\u00AD') → str**
466
- ```python
467
- from georgian_hyphenation import hyphenate
468
- result = hyphenate('საქართველო', '-')
469
- ```
470
-
471
- #### **get_syllables(word: str) → List[str]**
472
- ```python
473
- from georgian_hyphenation import get_syllables
474
- syllables = get_syllables('საქართველო')
475
- ```
476
-
477
- #### **hyphenate_text(text: str, hyphen_char: str = '\u00AD') → str**
478
- ```python
479
- from georgian_hyphenation import hyphenate_text
480
- result = hyphenate_text('საქართველო არის ლამაზი')
481
- ```
482
-
483
- #### **to_tex_pattern(word: str) → str**
484
- ```python
485
- from georgian_hyphenation import to_tex_pattern
486
- pattern = to_tex_pattern('საქართველო')
487
- # Returns: '.სა1ქარ1თვე1ლო.'
488
- ```
489
-
490
- #### **to_hunspell_format(word: str) → str**
491
- ```python
492
- from georgian_hyphenation import to_hunspell_format
493
- hunspell = to_hunspell_format('საქართველო')
494
- # Returns: 'სა=ქარ=თვე=ლო'
495
- ```
496
-
497
- ---
498
-
499
- ## 🧪 Testing
500
-
501
- Run the test suite:
502
- ```bash
503
- python test_python.py
504
- ```
505
-
506
- Expected output:
507
- ```
508
- 🧪 Georgian Hyphenation v2.2.1 - Python Tests
509
-
510
- 📋 Basic Hyphenation Tests:
511
- ✅ Test 1: საქართველო
512
- Result: სა-ქარ-თვე-ლო
513
- ...
514
- ═══════════════════════════════════════
515
- 📊 Test Results: 13 passed, 0 failed
516
- ═══════════════════════════════════════
517
- 🎉 All tests passed!
518
- ```
519
-
520
- ---
521
-
522
- ## 📁 Project Structure
523
- ```
524
- georgian-hyphenation/
525
- ├── data/
526
- │   └── exceptions.json  # Dictionary (150+ words)
527
- ├── src/
528
- │   └── georgian_hyphenation/
529
- │       ├── __init__.py  # Package init
530
- │       └── hyphenator.py  # Main code
531
- ├── test_python.py  # Test suite
532
- ├── pyproject.toml  # Package config
533
- ├── MANIFEST.in  # Data files manifest
534
- ├── README.md  # This file
535
- └── LICENSE.txt  # MIT License
536
- ```
537
-
538
- ---
539
-
540
- ## 📜 Changelog
541
-
542
- ### **v2.2.1 (2025-01-27)**
543
- - ✨ Optimized: Set-based harmonic cluster lookup (O(1) instead of O(n))
544
- - ✨ Added 12 new harmonic clusters: ბრ, გრ, დრ, თღ, მტ, შპ, ჩრ, წკ, ჭყ
545
- - 🔄 Strip & Re-hyphenate: Always removes old hyphens and reapplies correctly
546
- - 📦 Dictionary: 150+ exception words in `data/exceptions.json`
547
- - 🎯 Hybrid Engine: Dictionary-first, Algorithm fallback
548
- - 📝 Improved documentation with detailed API reference
549
-
550
- ### **v2.0.0 (2024)**
551
- - Initial release
552
- - Phonological algorithm
553
- - Basic harmonic cluster handling
554
- - TeX and Hunspell export formats
555
-
556
- ---
557
-
558
- ## 🤝 Contributing
559
-
560
- Contributions are welcome! To contribute:
561
-
562
- 1. Fork the repository: https://github.com/guramzhgamadze/georgian-hyphenation
563
- 2. Create a feature branch: `git checkout -b feature/new-feature`
564
- 3. Make your changes
565
- 4. Run tests: `python test_python.py`
566
- 5. Commit: `git commit -m 'Add new feature'`
567
- 6. Push: `git push origin feature/new-feature`
568
- 7. Open a Pull Request
569
-
570
- ### **Adding Exception Words**
571
-
572
- To add words to the dictionary:
573
-
574
- 1. Edit `data/exceptions.json`
575
- 2. Add your word in format: `"სიტყვა": "სი-ტყვა"`
576
- 3. Test: `python test_python.py`
577
- 4. Submit PR
578
-
579
- ---
580
-
581
- ## 🐛 Bug Reports
582
-
583
- Found a bug? Please open an issue:
584
- https://github.com/guramzhgamadze/georgian-hyphenation/issues
585
-
586
- Include:
587
- - Python version
588
- - Code snippet that reproduces the issue
589
- - Expected vs actual output
590
-
591
- ---
592
-
593
- ## 📄 License
594
-
595
- MIT License
596
-
597
- Copyright (c) 2025 Guram Zhgamadze
598
-
599
- Permission is hereby granted, free of charge, to any person obtaining a copy
600
- of this software and associated documentation files (the "Software"), to deal
601
- in the Software without restriction, including without limitation the rights
602
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
603
- copies of the Software, and to permit persons to whom the Software is
604
- furnished to do so, subject to the following conditions:
605
-
606
- The above copyright notice and this permission notice shall be included in all
607
- copies or substantial portions of the Software.
608
-
609
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
610
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
611
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
612
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
613
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
614
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
615
- SOFTWARE.
616
-
617
- ---
618
-
619
- ## 👨‍💻 Author
620
-
621
- **Guram Zhgamadze**
622
-
623
- - GitHub: [@guramzhgamadze](https://github.com/guramzhgamadze)
624
- - Email: guramzhgamadze@gmail.com
625
- - PyPI: [georgian-hyphenation](https://pypi.org/project/georgian-hyphenation/)
626
-
627
- ---
628
-
629
- ## 🙏 Acknowledgments
630
-
631
- - Georgian linguistic research on syllabification
632
- - TeX hyphenation algorithm inspiration
633
- - Python community for excellent packaging tools
634
-
635
- ---
636
-
637
- ## 📚 Related Projects
638
-
639
- - [Hyphen](https://github.com/hunspell/hyphen) - Generic hyphenation library
640
- - [PyHyphen](https://github.com/dr-leo/PyHyphen) - Python wrapper for Hyphen
641
- - [TeX hyphenation patterns](http://www.ctan.org/tex-archive/language/hyph-utf8)
642
-
643
- ---
644
-
645
- ## ⭐ Support
646
-
647
- If you find this library useful, please:
648
- - ⭐ Star the repository on GitHub
649
- - 📢 Share with others
650
- - 🐛 Report bugs
651
- - 💡 Suggest improvements
652
-
653
- ---
654
-
655
- **Made with ❤️ for the Georgian language community**
656
-
657
- 🇬🇪 **ქართული ენის ციფრული განვითარებისთვის** (For the digital development of the Georgian language)
@@ -1,14 +0,0 @@
1
- LICENSE.txt
2
- MANIFEST.in
3
- README.md
4
- pyproject.toml
5
- setup.py
6
- data/exceptions.json
7
- src/georgian_hyphenation/__init__.py
8
- src/georgian_hyphenation/hyphenator.py
9
- src/georgian_hyphenation.egg-info/PKG-INFO
10
- src/georgian_hyphenation.egg-info/SOURCES.txt
11
- src/georgian_hyphenation.egg-info/dependency_links.txt
12
- src/georgian_hyphenation.egg-info/requires.txt
13
- src/georgian_hyphenation.egg-info/top_level.txt
14
- tests/test_basic.py
@@ -1,3 +0,0 @@
1
-
2
- [dev]
3
- pytest>=7.0
@@ -1,2 +0,0 @@
1
- georgian_hyphenation
2
- javascript