uzmorph 0.1.0__tar.gz
- uzmorph-0.1.0/MANIFEST.in +2 -0
- uzmorph-0.1.0/PKG-INFO +79 -0
- uzmorph-0.1.0/README.md +65 -0
- uzmorph-0.1.0/cse.csv +1432 -0
- uzmorph-0.1.0/exception_stems.csv +125 -0
- uzmorph-0.1.0/lemma_map.csv +30 -0
- uzmorph-0.1.0/non_affixed_stems.csv +125 -0
- uzmorph-0.1.0/number_stems.csv +25 -0
- uzmorph-0.1.0/pyproject.toml +3 -0
- uzmorph-0.1.0/root.csv +119 -0
- uzmorph-0.1.0/setup.cfg +4 -0
- uzmorph-0.1.0/setup.py +26 -0
- uzmorph-0.1.0/small_stems.csv +20 -0
- uzmorph-0.1.0/uzmorph.egg-info/PKG-INFO +79 -0
- uzmorph-0.1.0/uzmorph.egg-info/SOURCES.txt +23 -0
- uzmorph-0.1.0/uzmorph.egg-info/dependency_links.txt +1 -0
- uzmorph-0.1.0/uzmorph.egg-info/top_level.txt +1 -0
- uzmorph-0.1.0/uzmorph.py +433 -0
uzmorph-0.1.0/PKG-INFO
ADDED
@@ -0,0 +1,79 @@
+Metadata-Version: 2.1
+Name: uzmorph
+Version: 0.1.0
+Summary: A rule-based morphological analyzer for the Uzbek language based on CSE (Complete Set of Endings) and annotated morphological tags
+Home-page: https://github.com/UlugbekSalaev/uzmorph
+Author: Ulugbek Salaev
+Author-email: ulugbek.salaev@urdu.uz
+Classifier: Programming Language :: Python :: 3
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Topic :: Text Processing :: Linguistic
+Requires-Python: >=3.6
+Description-Content-Type: text/markdown
+
+# uzmorph
+
+**uzmorph** is a professional morphological analyzer for the Uzbek language based on **CSE (Complete Set of Endings)** rules and comprehensive morphological tagging. It provides deep linguistic analysis by identifying stems, lemmas, and a wide array of annotated morphological features.
+
+## Key Features
+
+- **CSE-Based Analysis:** Employs advanced suffix stripping rules (**Complete Set of Endings**) for precise morphological segmentation.
+- **Rich Morphological Tagging:** Extracts detailed features including part-of-speech (POS), tense, person, possession, case, and voice.
+- **Flat JSON Output:** Returns analysis results in a developer-friendly, flattened JSON-compatible format.
+- **Professional API:** Designed for easy integration with standard English-named methods and formatted terminal output.
+
+## Installation
+
+```bash
+pip install uzmorph
+```
+
+## Quick Start
+
+```python
+from uzmorph import UzMorph
+
+# Initialize the analyzer
+analyzer = UzMorph()
+
+# Analyze a word
+results = analyzer.analyze("maktabimda")
+
+# Formatted console print
+analyzer.print_result(results)
+```
+
+## JSON Result Sample
+
+Each analysis result is a dictionary containing the following structure:
+
+```json
+[
+  {
+    "word": "maktabimda",
+    "stem": "maktab",
+    "lemma": "maktab",
+    "cse": "imda",
+    "cse_formula": "(i)mda",
+    "pos": "NOUN",
+    "possession": "1",
+    "cases": "Locative",
+    "singular": "1",
+    "syntactical_affixes": "(i)m da",
+    "note": null,
+    "ball": 108
+  }
+]
+```
+
+## API Reference
+
+### `UzMorph` Class
+- `analyze(word, pos_filter=None)`: Performs morphological analysis and returns a list of results.
+- `print_result(results)`: Prints formatted output to the console.
+- `get_pos_list()`: Returns a formatted string of all available POS tags.
+- `get_features_list()`: Returns a list of all possible property keys in the result.
+
+## License
+MIT
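Reviewer note on the flat JSON output described in the PKG-INFO description: the sketch below shows one way a caller might consume a result list shaped like the "JSON Result Sample". It does not import `uzmorph`; it parses the sample verbatim. The helper names `extract_features` and `best_analysis` and the `NON_FEATURE_KEYS` set are hypothetical, and treating `ball` as a score where a larger value is better is an assumption that this diff does not confirm.

```python
import json

# Sample copied from the "JSON Result Sample" section of the package description.
SAMPLE_JSON = """
[
  {
    "word": "maktabimda",
    "stem": "maktab",
    "lemma": "maktab",
    "cse": "imda",
    "cse_formula": "(i)mda",
    "pos": "NOUN",
    "possession": "1",
    "cases": "Locative",
    "singular": "1",
    "syntactical_affixes": "(i)m da",
    "note": null,
    "ball": 108
  }
]
"""

# Keys treated here as bookkeeping rather than morphological features.
# This split is an assumption made for the sketch, not documented behaviour.
NON_FEATURE_KEYS = {
    "word", "stem", "lemma", "cse", "cse_formula",
    "syntactical_affixes", "note", "ball",
}


def extract_features(result):
    """Return the non-null feature entries of one flat analysis dict."""
    return {
        key: value
        for key, value in result.items()
        if key not in NON_FEATURE_KEYS and value is not None
    }


def best_analysis(results):
    """Return the entry with the largest "ball" value, or None if empty.

    Assumes "ball" is a score where larger is better (unconfirmed).
    """
    if not results:
        return None
    return max(results, key=lambda r: r.get("ball") or 0)


results = json.loads(SAMPLE_JSON)
best = best_analysis(results)
print(best["stem"], extract_features(best))
```

Because the result is flat, no nested traversal is needed; a plain dictionary comprehension is enough to separate features from bookkeeping fields.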
uzmorph-0.1.0/README.md
ADDED
@@ -0,0 +1,65 @@
+# uzmorph
+
+**uzmorph** is a professional morphological analyzer for the Uzbek language based on **CSE (Complete Set of Endings)** rules and comprehensive morphological tagging. It provides deep linguistic analysis by identifying stems, lemmas, and a wide array of annotated morphological features.
+
+## Key Features
+
+- **CSE-Based Analysis:** Employs advanced suffix stripping rules (**Complete Set of Endings**) for precise morphological segmentation.
+- **Rich Morphological Tagging:** Extracts detailed features including part-of-speech (POS), tense, person, possession, case, and voice.
+- **Flat JSON Output:** Returns analysis results in a developer-friendly, flattened JSON-compatible format.
+- **Professional API:** Designed for easy integration with standard English-named methods and formatted terminal output.
+
+## Installation
+
+```bash
+pip install uzmorph
+```
+
+## Quick Start
+
+```python
+from uzmorph import UzMorph
+
+# Initialize the analyzer
+analyzer = UzMorph()
+
+# Analyze a word
+results = analyzer.analyze("maktabimda")
+
+# Formatted console print
+analyzer.print_result(results)
+```
+
+## JSON Result Sample
+
+Each analysis result is a dictionary containing the following structure:
+
+```json
+[
+  {
+    "word": "maktabimda",
+    "stem": "maktab",
+    "lemma": "maktab",
+    "cse": "imda",
+    "cse_formula": "(i)mda",
+    "pos": "NOUN",
+    "possession": "1",
+    "cases": "Locative",
+    "singular": "1",
+    "syntactical_affixes": "(i)m da",
+    "note": null,
+    "ball": 108
+  }
+]
+```
+
+## API Reference
+
+### `UzMorph` Class
+- `analyze(word, pos_filter=None)`: Performs morphological analysis and returns a list of results.
+- `print_result(results)`: Prints formatted output to the console.
+- `get_pos_list()`: Returns a formatted string of all available POS tags.
+- `get_features_list()`: Returns a list of all possible property keys in the result.
+
+## License
+MIT
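Reviewer note on the "CSE-Based Analysis" bullet in the README: the idea of stripping a complete ending from a word can be illustrated with a toy sketch. This is not the code in `uzmorph.py`, which this diff does not show. The three-entry `ENDINGS` table, the function `strip_endings`, and the `min_stem` parameter are all hypothetical; the real `cse.csv` has 1432 lines whose columns are not visible here. The feature values only mirror the sample output for "maktabimda".

```python
# Hypothetical miniature endings table. Each key is a whole ending (a "CSE"),
# each value the features that ending would contribute to the analysis.
ENDINGS = {
    "imda": {"possession": "1", "cases": "Locative"},
    "da": {"cases": "Locative"},
    "im": {"possession": "1"},
}


def strip_endings(word, endings=ENDINGS, min_stem=2):
    """Return candidate analyses for `word`, longest matching ending first.

    Every ending that the word ends with yields one candidate, provided
    at least `min_stem` characters remain as the stem. The unsegmented
    word is always included as a final fallback candidate.
    """
    candidates = []
    for ending, features in endings.items():
        stem_length = len(word) - len(ending)
        if word.endswith(ending) and stem_length >= min_stem:
            candidate = {"word": word, "stem": word[:stem_length], "cse": ending}
            candidate.update(features)
            candidates.append(candidate)
    # Fallback: treat the whole word as a bare stem with an empty ending.
    candidates.append({"word": word, "stem": word, "cse": ""})
    return sorted(candidates, key=lambda c: len(c["cse"]), reverse=True)


for candidate in strip_endings("maktabimda"):
    print(candidate)
```

Matching whole endings rather than peeling off one suffix at a time keeps the lookup to a single table scan per word, at the cost of a larger table. A real analyzer would also need a stem lexicon to reject candidates such as the stem "maktabim" produced by the shorter ending "da".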