SinaTools 0.1.35__py2.py3-none-any.whl → 0.1.37__py2.py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {SinaTools-0.1.35.dist-info → SinaTools-0.1.37.dist-info}/METADATA +63 -64
- {SinaTools-0.1.35.dist-info → SinaTools-0.1.37.dist-info}/RECORD +15 -19
- {SinaTools-0.1.35.dist-info → SinaTools-0.1.37.dist-info}/WHEEL +6 -6
- {SinaTools-0.1.35.dist-info → SinaTools-0.1.37.dist-info}/entry_points.txt +0 -1
- sinatools/CLI/DataDownload/download_files.py +9 -8
- sinatools/VERSION +1 -1
- sinatools/ner/trainers/BertNestedTrainer.py +203 -203
- sinatools/ner/trainers/BertTrainer.py +163 -163
- sinatools/ner/trainers/__init__.py +2 -2
- sinatools/utils/similarity.py +62 -27
- sinatools/wsd/disambiguator.py +14 -90
- sinatools/ner/data.py +0 -124
- sinatools/ner/relation_extractor.py +0 -201
- sinatools/utils/implication.py +0 -662
- sinatools/utils/jaccard.py +0 -247
- {SinaTools-0.1.35.data → SinaTools-0.1.37.data}/data/sinatools/environment.yml +0 -0
- {SinaTools-0.1.35.dist-info → SinaTools-0.1.37.dist-info}/AUTHORS.rst +0 -0
- {SinaTools-0.1.35.dist-info → SinaTools-0.1.37.dist-info}/LICENSE +0 -0
- {SinaTools-0.1.35.dist-info → SinaTools-0.1.37.dist-info}/top_level.txt +0 -0
@@ -1,64 +1,63 @@
-Metadata-Version: 2.1
-Name: SinaTools
-Version: 0.1.
-Summary: Open-source Python toolkit for Arabic Natural Understanding, allowing people to integrate it in their system workflow.
-Home-page: https://github.com/SinaLab/sinatools
-License: MIT license
-Keywords: sinatools
-
-
-
-Requires-Dist:
-Requires-Dist:
-Requires-Dist:
-Requires-Dist:
-Requires-Dist:
-Requires-Dist:
-Requires-Dist:
-Requires-Dist:
-Requires-Dist:
-Requires-Dist:
-Requires-Dist:
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+Metadata-Version: 2.1
+Name: SinaTools
+Version: 0.1.37
+Summary: Open-source Python toolkit for Arabic Natural Understanding, allowing people to integrate it in their system workflow.
+Home-page: https://github.com/SinaLab/sinatools
+License: MIT license
+Keywords: sinatools
+Description-Content-Type: text/markdown
+License-File: LICENSE
+License-File: AUTHORS.rst
+Requires-Dist: six
+Requires-Dist: farasapy
+Requires-Dist: tqdm
+Requires-Dist: requests
+Requires-Dist: regex
+Requires-Dist: pathlib
+Requires-Dist: torch ==1.13.0
+Requires-Dist: transformers ==4.24.0
+Requires-Dist: torchtext ==0.14.0
+Requires-Dist: torchvision ==0.14.0
+Requires-Dist: seqeval ==1.2.2
+Requires-Dist: natsort ==7.1.1
+
+SinaTools
+======================
+Open Source Toolkit for Arabic NLP and NLU developed by [SinaLab](http://sina.birzeit.edu/) at Birzeit University. SinaTools is available through Python APIs, command lines, colabs, and online demos.
+
+See the full list of [Available Packages](https://sina.birzeit.edu/sinatools/), which include: (1) [Morphology Tagging](https://sina.birzeit.edu/sinatools/index.html#morph), (2) [Named Entity Recognition (NER)](https://sina.birzeit.edu/sinatools/index.html#ner), (3) [Word Sense Disambiguation (WSD)](https://sina.birzeit.edu/sinatools/index.html#wsd), (4) [Semantic Relatedness](https://sina.birzeit.edu/sinatools/index.html#sr), (5) [Synonymy Extraction and Evaluation](https://sina.birzeit.edu/sinatools/index.html#se), (6) [Relation Extraction](https://sina.birzeit.edu/sinatools/index.html#re), (7) [Utilities](https://sina.birzeit.edu/sinatools/index.html#u) (diacritic-based word matching, Jaccard similarly, parser, tokenizers, corpora processing, transliteration, etc).
+
+See [Demo Pages](https://sina.birzeit.edu/sinatools/).
+
+See the [benchmarking](https://www.jarrar.info/publications/HJK24.pdf), which shows that SinaTools outperformed all related toolkits.
+
+Installation
+--------
+To install SinaTools, ensure you are using Python version 3.10.8, then clone the [GitHub](git://github.com/SinaLab/SinaTools) repository.
+
+Alternatively, you can execute the following command:
+
+```bash
+pip install sinatools
+```
+
+Installing Models and Data Files
+--------
+Some modules in SinaTools require some data files and fine-tuned models to be downloaded. To download these models, please consult the [DataDownload](https://sina.birzeit.edu/sinatools/documentation/cli_tools/DataDownload/DataDownload.html).
+
+Documentation
+--------
+For information, please refer to the [main page](https://sina.birzeit.edu/sinatools) or the [online domuementation](https://sina.birzeit.edu/sinatools/documentation).
+
+Citation
+-------
+Tymaa Hammouda, Mustafa Jarrar, Mohammed Khalilia: [SinaTools: Open Source Toolkit for Arabic Natural Language Understanding](http://www.jarrar.info/publications/HJK24.pdf). In Proceedings of the 2024 AI in Computational Linguistics (ACLing 2024), Procedia Computer Science, Dubai. ELSEVIER.
+
+License
+--------
+SinaTools is available under the MIT License. See the [LICENSE](https://github.com/SinaLab/sinatools/blob/main/LICENSE) file for more information.
+
+Reporting Issues
+--------
+To report any issues or bugs, please contact us at "sina.institute.bzu@gmail.com" or visit [SinaTools Issues](https://github.com/SinaLab/sinatools/issues).
+
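The METADATA file in the hunk above uses RFC 822-style headers followed by the README body, so the standard-library email parser can read it. The sketch below is an illustration, not part of the package. It embeds a shortened copy of the 0.1.37 header block and lists the exactly pinned requirements. The helper name `read_requirements` and the sample constant are assumptions made for this example.

```python
from email.parser import Parser

# Shortened sample copied from the added METADATA lines in this diff.
METADATA_TEXT = """\
Metadata-Version: 2.1
Name: SinaTools
Version: 0.1.37
Requires-Dist: six
Requires-Dist: torch ==1.13.0
Requires-Dist: transformers ==4.24.0

SinaTools
"""


def read_requirements(text):
    """Return (version, list of Requires-Dist values) from METADATA text."""
    msg = Parser().parsestr(text)
    # get_all returns every occurrence of a repeated header, or None if absent.
    return msg["Version"], msg.get_all("Requires-Dist") or []


version, requires = read_requirements(METADATA_TEXT)
# Keep only requirements pinned to an exact version with "==".
pinned = [req for req in requires if "==" in req]
print(version, pinned)
```

Run against the full file, the same filter would show which of the twelve `Requires-Dist` entries pin an exact version.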
@@ -1,10 +1,10 @@
-SinaTools-0.1.
-sinatools/VERSION,sha256=
+SinaTools-0.1.37.data/data/sinatools/environment.yml,sha256=OzilhLjZbo_3nU93EQNUFX-6G5O3newiSWrwxvMH2Os,7231
+sinatools/VERSION,sha256=rds3CaJrvi4kNl0xJMt9fYHplBe78mGMmyBFfi9Zsco,6
 sinatools/__init__.py,sha256=bEosTU1o-FSpyytS6iVP_82BXHF2yHnzpJxPLYRbeII,135
 sinatools/environment.yml,sha256=OzilhLjZbo_3nU93EQNUFX-6G5O3newiSWrwxvMH2Os,7231
 sinatools/install_env.py,sha256=EODeeE0ZzfM_rz33_JSIruX03Nc4ghyVOM5BHVhsZaQ,404
 sinatools/sinatools.py,sha256=vR5AaF0iel21LvsdcqwheoBz0SIj9K9I_Ub8M8oA98Y,20
-sinatools/CLI/DataDownload/download_files.py,sha256=
+sinatools/CLI/DataDownload/download_files.py,sha256=EezvbukR3pZ8s6mGZnzTcjsbo3CBDlC0g6KhJWlYp1w,2686
 sinatools/CLI/morphology/ALMA_multi_word.py,sha256=rmpa72twwIJHme_kpQ1lu3_7y_Jorj70QTvOnQMJRuI,1274
 sinatools/CLI/morphology/morph_analyzer.py,sha256=HPamEKos_JRYCJv_2q6c12N--da58_JXTno9haww5Ao,3497
 sinatools/CLI/ner/corpus_entity_extractor.py,sha256=DdvigsDQzko5nJBjzUXlIDqoBMBTVzktjSo7JfEXTIA,4778
@@ -77,13 +77,11 @@ sinatools/morphology/ALMA_multi_word.py,sha256=hj_-8ojrYYHnfCGk8WKtJdUR8mauzQdma
 sinatools/morphology/__init__.py,sha256=I4wVBh8BhyNl-CySVdiI_nUSn6gj1j-gmLKP300RpE0,1216
 sinatools/morphology/morph_analyzer.py,sha256=JOH2UWKNQWo5UzpWNzP9R1D3B3qLSogIiMp8n0N_56o,7177
 sinatools/ner/__init__.py,sha256=59kLMX6UQhF6JpE10RhaDYC3a2_jiWOIVPuejsoflFE,1050
-sinatools/ner/data.py,sha256=lvOW86dXse8SC75Q0supQaE0rrRffoxNjIA0Qbv5WZY,4354
 sinatools/ner/data_format.py,sha256=7Yt0aOicOn9_YuuyCkM_IYi_rgjGYxR9bCuUaNGM73o,4341
 sinatools/ner/datasets.py,sha256=mG1iwqSm3lXCFHLqE-b4wNi176cpuzNBz8tKaBU6z6M,5059
 sinatools/ner/entity_extractor.py,sha256=O2epRwRFUUcQs3SnFIYHVBI4zVhr8hRcj0XJYeby4ts,3588
 sinatools/ner/helpers.py,sha256=dnOoDY5JMyOLTUWVIZLMt8mBn2IbWlVaqHhQyjs1voo,2343
 sinatools/ner/metrics.py,sha256=Irz6SsIvpOzGIA2lWxrEV86xnTnm0TzKm9SUVT4SXUU,2734
-sinatools/ner/relation_extractor.py,sha256=a85xGX6V72fDpJk0GKmmtlWf8S8ezY-2pm5oGc9_ESY,9750
 sinatools/ner/transforms.py,sha256=vti3mDdi-IRP8i0aTQ37QqpPlP9hdMmJ6_bAMa0uL-s,4871
 sinatools/ner/data/__init__.py,sha256=W0C1ge_XxTfmdEGz0hkclz57aLI5VFS5t6BjByCfkFk,57
 sinatools/ner/data/datasets.py,sha256=lcdDDenFMEKIGYQmfww2dk_9WKWrJO9HtKptaAEsRmY,5064
@@ -93,9 +91,9 @@ sinatools/ner/nn/BertNestedTagger.py,sha256=_fwAn1kiKmXe6m5y16Ipty3kvXIEFEmiUq74
 sinatools/ner/nn/BertSeqTagger.py,sha256=dFcBBiMw2QCWsyy7aQDe_PS3aRuNn4DOxKIHgTblFvc,504
 sinatools/ner/nn/__init__.py,sha256=UgQD_XLNzQGBNSYc_Bw1aRJZjq4PJsnMT1iZwnJemqE,170
 sinatools/ner/trainers/BaseTrainer.py,sha256=Ifz4SeTxJwVn1_uWZ3I9KbcSo2hLPN3ojsIYuoKE9wE,4050
-sinatools/ner/trainers/BertNestedTrainer.py,sha256=
-sinatools/ner/trainers/BertTrainer.py,sha256=
-sinatools/ner/trainers/__init__.py,sha256=
+sinatools/ner/trainers/BertNestedTrainer.py,sha256=iJOah69tXZsAXBimqP0odEsk8SPX4A355riePzW2BFs,8632
+sinatools/ner/trainers/BertTrainer.py,sha256=BtttsrHPolmK3eRDqrgVUuv6lVMuImIeskxhi02Q-44,6596
+sinatools/ner/trainers/__init__.py,sha256=Xnbi_M4KKJRqV7FJe1vklyT0nEW2Q2obxgcWkbR0ZbA,190
 sinatools/relations/__init__.py,sha256=cYjsP2mlTYvAwVIEFtgA6i9gLUSkGVOuDggMs7TvG5k,272
 sinatools/relations/relation_extractor.py,sha256=UuDlaaR0ch9BFv4sBF1tr7P-P9xq8oRZF41tAze6_ok,9751
 sinatools/semantic_relatedness/__init__.py,sha256=S0xrmqtl72L02N56nbNMudPoebnYQgsaIyyX-587DsU,830
@@ -104,24 +102,22 @@ sinatools/synonyms/__init__.py,sha256=yMuphNZrm5XLOR2T0weOHcUysJm-JKHUmVLoLQO839
 sinatools/synonyms/synonyms_generator.py,sha256=jRd0D3_kn-jYBaZzqY-7oOy0SFjSJ-mjM7JhsySzX58,9037
 sinatools/utils/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
 sinatools/utils/charsets.py,sha256=rs82oZJqRqosZdTKXfFAJfJ5t4PxjMM_oAPsiWSWuwU,2817
-sinatools/utils/implication.py,sha256=MsbI6S1LNY-fCxGMxFTuaV639r3QijkkdcfH48rvY7A,27804
-sinatools/utils/jaccard.py,sha256=kLIptPNB2VIqnemVve9auyOL1kXHIsCkKCEwxFM8yP4,10114
 sinatools/utils/parser.py,sha256=qvHdln5R5CAv_0UOJWe0mcp8JCsGqgazoeIIkoALH88,6259
 sinatools/utils/readfile.py,sha256=xE4LEaCqXJIk9v37QUSSmWb-aY3UnCFUNb7uVdx3cpM,133
-sinatools/utils/similarity.py,sha256=
+sinatools/utils/similarity.py,sha256=HAK6OmyVnfjPm0GWL3z9s4ZoUwpZHVKxt3CeSMfqLIQ,11990
 sinatools/utils/text_dublication_detector.py,sha256=FeSkbfWGMQluz23H4CBHXION-walZPgjueX6AL8u_Q0,5660
 sinatools/utils/text_transliteration.py,sha256=F3smhr2AEJtySE6wGQsiXXOslTvSDzLivTYu0btgc10,8769
 sinatools/utils/tokenizer.py,sha256=nyk6lh5-p38wrU62hvh4wg7ni9ammkdqqIgcjbbBxxo,6965
 sinatools/utils/tokenizers_words.py,sha256=efNfOil9qDNVJ9yynk_8sqf65PsL-xtsHG7y2SZCkjQ,656
 sinatools/utils/word_compare.py,sha256=rS2Z74sf7R-7MTXyrFj5miRi2TnSG9OdTDp_qQYuo2Y,28200
 sinatools/wsd/__init__.py,sha256=mwmCUurOV42rsNRpIUP3luG0oEzeTfEx3oeDl93Oif8,306
-sinatools/wsd/disambiguator.py,sha256=
+sinatools/wsd/disambiguator.py,sha256=h-3idc5rPPbMDSE_QVJAsEVkDHwzYY3L2SEPNXIdOcc,20104
 sinatools/wsd/settings.py,sha256=6XflVTFKD8SVySX9Wj7zYQtV26WDTcQ2-uW8-gDNHKE,747
 sinatools/wsd/wsd.py,sha256=gHIBUFXegoY1z3rRnIlK6TduhYq2BTa_dHakOjOlT4k,4434
-SinaTools-0.1.
-SinaTools-0.1.
-SinaTools-0.1.
-SinaTools-0.1.
-SinaTools-0.1.
-SinaTools-0.1.
-SinaTools-0.1.
+SinaTools-0.1.37.dist-info/AUTHORS.rst,sha256=aTWeWlIdfLi56iLJfIUAwIrmqDcgxXKLji75_Fjzjyg,174
+SinaTools-0.1.37.dist-info/LICENSE,sha256=uwsKYG4TayHXNANWdpfMN2lVW4dimxQjA_7vuCVhD70,1088
+SinaTools-0.1.37.dist-info/METADATA,sha256=1OAigouXXSSaZ3MpOAxxAHfh5yPltiXjaOGe656KjTc,3346
+SinaTools-0.1.37.dist-info/WHEEL,sha256=DZajD4pwLWue70CAfc7YaxT1wLUciNBvN_TTcvXpltE,110
+SinaTools-0.1.37.dist-info/entry_points.txt,sha256=_CsRKM_tSCWV5hefBNUsWf9_6DrJnzFlxeAo1wm5XqY,1302
+SinaTools-0.1.37.dist-info/top_level.txt,sha256=8tNdPTeJKw3TQCaua8IJIx6N6WpgZZmVekf1OdBNJpE,10
+SinaTools-0.1.37.dist-info/RECORD,,
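Each RECORD row has the form `path,sha256=<digest>,<size>`, where the digest is the URL-safe base64 encoding of the file's SHA-256 hash with the trailing `=` padding removed. A minimal sketch of checking one row against file contents follows. The helper names (`record_digest`, `parse_record_line`, `row_matches`) are assumptions for illustration and do not come from the package. The example row is the zero-byte `sinatools/utils/__init__.py` entry listed in this diff.

```python
import base64
import hashlib


def record_digest(data: bytes) -> str:
    """SHA-256 of data, URL-safe base64 encoded, '=' padding stripped."""
    digest = hashlib.sha256(data).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def parse_record_line(line: str):
    """Split a RECORD row into (path, algorithm, digest, size or None)."""
    # Split from the right so a comma inside the path does not break parsing.
    path, hash_field, size = line.rsplit(",", 2)
    algorithm, _, digest = hash_field.partition("=")
    return path, algorithm, digest, int(size) if size else None


def row_matches(line: str, data: bytes) -> bool:
    """Check that data has the size and SHA-256 digest recorded in the row."""
    _, algorithm, digest, size = parse_record_line(line)
    return algorithm == "sha256" and size == len(data) and digest == record_digest(data)


# The empty utils/__init__.py row from this RECORD, checked against empty contents.
EMPTY_INIT_ROW = "sinatools/utils/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0"
print(row_matches(EMPTY_INIT_ROW, b""))
```

The final `RECORD,,` row carries no hash or size for itself, so `parse_record_line` returns an empty algorithm and `None` for the size there.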
@@ -1,6 +1,6 @@
-Wheel-Version: 1.0
-Generator: bdist_wheel (0.
-Root-Is-Purelib: true
-Tag: py2-none-any
-Tag: py3-none-any
-
+Wheel-Version: 1.0
+Generator: bdist_wheel (0.43.0)
+Root-Is-Purelib: true
+Tag: py2-none-any
+Tag: py3-none-any
+
@@ -52,16 +52,17 @@ def main():
     for file in args.files:
         print("file: ", file)
         if file == "wsd":
-
-
+            download_file(urls["morph"])
+            download_file(urls["ner"])
             #download_file(urls["wsd_model"])
-            download_folder_from_hf("SinaLab/ArabGlossBERT", "bert-base-arabertv02_22_May_2021_00h_allglosses_unused01")
             #download_file(urls["wsd_tokenizer"])
-
-
-
-
-
+            download_folder_from_hf("SinaLab/ArabGlossBERT", "bert-base-arabertv02_22_May_2021_00h_allglosses_unused01")
+            download_folder_from_hf("SinaLab/ArabGlossBERT", "bert-base-arabertv02")
+            download_file(urls["one_gram"])
+            download_file(urls["five_grams"])
+            download_file(urls["four_grams"])
+            download_file(urls["three_grams"])
+            download_file(urls["two_grams"])
         elif file == "synonyms":
             download_file(urls["graph_l2"])
             download_file(urls["graph_l3"])
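The hunk above shows only the call sites, so the following sketch uses stand-ins to make the resulting download order for `wsd` in 0.1.37 easy to see. The two helper functions here are hypothetical stubs that only record their arguments, and the `urls` values are placeholders. The real helpers and URL mapping are defined elsewhere in `download_files.py` and are not visible in this diff.

```python
# Hypothetical stubs: they record calls instead of downloading anything.
calls = []


def download_file(url):
    calls.append(("file", url))


def download_folder_from_hf(repo_id, folder):
    calls.append(("hf_folder", repo_id, folder))


# Placeholder URLs; only the keys are taken from the diff.
urls = {
    key: f"https://example.invalid/{key}"
    for key in ("morph", "ner", "one_gram", "two_grams",
                "three_grams", "four_grams", "five_grams")
}


def download_wsd_resources():
    """Mirror the order of calls in the 0.1.37 `wsd` branch."""
    download_file(urls["morph"])
    download_file(urls["ner"])
    download_folder_from_hf(
        "SinaLab/ArabGlossBERT",
        "bert-base-arabertv02_22_May_2021_00h_allglosses_unused01",
    )
    download_folder_from_hf("SinaLab/ArabGlossBERT", "bert-base-arabertv02")
    for key in ("one_gram", "five_grams", "four_grams", "three_grams", "two_grams"):
        download_file(urls[key])


download_wsd_resources()
for call in calls:
    print(call)
```

Under these assumptions, requesting `wsd` triggers seven file downloads and two Hugging Face folder downloads.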
sinatools/VERSION
CHANGED
@@ -1 +1 @@
-0.1.
+0.1.37