SinaTools 0.1.27__py2.py3-none-any.whl → 0.1.28__py2.py3-none-any.whl

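The docstring changed in this release describes the return value of `disambiguate` as a list of JSON objects, each carrying a concept id when a sense is found and a lemma otherwise. The sketch below shows one way a caller might consume that shape. It assumes plain Python dicts with the `word`, `lemma`, and `concept_id` keys seen in the docstring examples; the sample entries are copied from the 0.1.27 example. The helpers `label_for` and `exceeds_visible_limit` are hypothetical and not part of SinaTools. The second helper mirrors the `len(sentence) > 500` check that is still visible in the context lines of the diff, even though the updated docstring no longer mentions a length limit.

```python
from typing import Dict, List, Tuple

# Sample entries copied from the 0.1.27 docstring example in this diff.
# The second entry has no "concept_id", so only its lemma is available.
sample_output: List[Dict[str, str]] = [
    {"concept_id": "303019218", "word": "ذهبت", "lemma": "ذَهَبَ۪ 1"},
    {"word": "إلى", "lemma": "إِلَى 1"},
    {"concept_id": "334000099", "word": "جامعة بيرزيت", "lemma": "جامِعَة بيرزَيت"},
]

# Assumption: 500 is taken from the `len(sentence) > 500` check in the diff.
VISIBLE_LIMIT = 500


def label_for(entry: Dict[str, str]) -> Tuple[str, str]:
    """Hypothetical helper: prefer the concept id, fall back to the lemma."""
    if "concept_id" in entry:
        return ("concept_id", entry["concept_id"])
    return ("lemma", entry["lemma"])


def exceeds_visible_limit(sentence: str) -> bool:
    """Hypothetical pre-check mirroring the length test shown in the diff."""
    return len(sentence) > VISIBLE_LIMIT


# Pair each word with whichever label is available for it.
labels = [(entry["word"],) + label_for(entry) for entry in sample_output]

for word, kind, value in labels:
    print(f"{word}: {kind} = {value}")
```

Checking the length before calling the library is optional; it only avoids sending input that the visible check would flag as too long.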
@@ -457,13 +457,13 @@ def WSD(sentence):
 
  def disambiguate(sentence):
  """
- This method disambiguate words within a sentence.
+ This method is a pipeline of five methods. Given a sentence as input, this method tags each word in the sentence with the following: Lemma, single-word sense, multi-word sense, and NER tag. The disambiguation of single/multi-word senses is done using our ArabGlossBERT TSV model. You can try the demo online. For more details read the article.
 
  Args:
- sentence (:obj:`str`): The Arabic text to be disambiguated, it should be limited to less than 500 characters.
+ sentence (:obj:`str`): The Arabic text to be disambiguated.
 
  Returns:
- :obj:`list`: The JSON output includes a list of words, with each word having a gloss if it exists or a lemma if no gloss is found.
+ :obj:`list`: A list of JSON objects, with each word having a concept id if it exists or a lemma if no gloss is found.
 
  **Example:**
 
@@ -475,22 +475,23 @@ def disambiguate(sentence):
  print(result)
 
  #output
- [
- {
- "concept_id": "303019218",
- "word": "ذهبت",
- "lemma": "ذَهَبَ۪ 1"
- },
- {
- "word": "إلى",
- "lemma": "إِلَى 1"
- },
- {
- "word": "جامعة بيرزيت",
- "concept_id": "334000099",
- "lemma": "جامِعَة بيرزَيت"
- }
- ]
+ [{
+ 'concept_id': '303051631',
+ 'word': 'تمشيت',
+ 'lemma': 'تَمَشَّى'
+ },{
+ 'concept_id': '303005470',
+ 'word': 'بين',
+ 'lemma': 'بَيْن'
+ },{
+ 'concept_id': '303007335',
+ 'word': 'الجداول',
+ 'lemma': 'جَدْوَلٌ'
+ },{
+ 'concept_id': '303056588',
+ 'word': 'والأنهار',
+ 'lemma': 'نَهْرٌ'
+ }]
  """
  if len(sentence) > 500:
  content = ["Input is too long"]