PyPI - PyRuSH - Versions diffs - 1.0.11__tar.gz → 1.0.12.dev1__tar.gz - Mend

PyRuSH 1.0.11 (tar.gz) → 1.0.12.dev1 (tar.gz)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (28)

{pyrush-1.0.11 → pyrush-1.0.12.dev1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: PyRuSH
-Version: 1.0.11
+Version: 1.0.12.dev1
 Summary: PyRuSH is the python implementation of RuSH (Rule-based sentence Segmenter using Hashing), which is originally developed using Java. RuSH is an efficient, reliable, and easy adaptable rule-based sentence segmentation solution. It is specifically designed to handle the telegraphic written text in clinical note. It leverages a nested hash table to execute simultaneous rule processing, which reduces the impact of the rule-base growth on execution time and eliminates the effect of rule order on accuracy.
 Home-page: https://github.com/jianlins/PyRuSH
 Author: Jianlin
@@ -77,3 +77,16 @@ Start from version 1.0.3, PyRuSH adds Spacy compatible Sentencizer component: Py
 A Colab Notebook Demo
 ---------------------------
 Feel free to try this runnable `Colab notebook Demo <https://colab.research.google.com/drive/1gX9MzZTQiPw8G3x_vUwZbiSXGtbI0uIX?usp=sharing>`_
+Revision History
+----------------
+**1.0.11 (2025-09-02)**
+- Improved sentence splitting logic: Sentences are now split at the last token before exceeding the max length, ensuring no chunk exceeds the specified limit.
+- Edge case handling: Trailing whitespaces (caused by spacy sentence labeling mechanism) can be optionally split into a separate sentence (merge_gaps=False) to avoid necessarily long sentences.
+**1.0.9 (2024-10-27)**
+- Initial release with spaCy 3.x compatibility and core RuSH logic.
+- Added Spacy-compatible PyRuSHSentencizer component.

{pyrush-1.0.11 → pyrush-1.0.12.dev1}/PyRuSH/__init__.py RENAMED

@@ -30,7 +30,7 @@
 from .PyRuSHSentencizer import PyRuSHSentencizer
 from .RuSH import RuSH, BEGIN, END
-__version__ = '1.0.11'
+__version__ = '1.0.12dev1'
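
The new `__version__` string omits the separator before `dev`, while PKG-INFO records `1.0.12.dev1`. Under PEP 440 both spellings denote the same development release, and the dotted form is the normalized one. The sketch below shows that normalization for this one shape of version string only. The helper name `normalize_dev_version` is hypothetical and not part of PyRuSH, and the regular expression is a simplification, not the full PEP 440 grammar.

```python
import re

# Hypothetical helper: handles only "N(.N)*" optionally followed by a dev segment,
# which is the shape seen in this diff. It is not a full PEP 440 parser.
_DEV_RE = re.compile(
    r"^(?P<release>\d+(?:\.\d+)*)[._-]?dev(?P<n>\d*)$", re.IGNORECASE
)


def normalize_dev_version(version: str) -> str:
    """Return the dotted form of a dev-release version, or the input unchanged."""
    cleaned = version.strip()
    match = _DEV_RE.match(cleaned)
    if match is None:
        return cleaned  # not a dev release of the simple shape handled here
    dev_number = int(match.group("n") or "0")  # an implicit dev number means 0
    return f"{match.group('release')}.dev{dev_number}"


print(normalize_dev_version("1.0.12dev1"))
print(normalize_dev_version("1.0.12.dev1"))
```

Both calls print `1.0.12.dev1`, so the package metadata and the module attribute agree once normalized, even though the raw strings differ.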

{pyrush-1.0.11 → pyrush-1.0.12.dev1}/PyRuSH.egg-info/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: PyRuSH
-Version: 1.0.11
+Version: 1.0.12.dev1
 Summary: PyRuSH is the python implementation of RuSH (Rule-based sentence Segmenter using Hashing), which is originally developed using Java. RuSH is an efficient, reliable, and easy adaptable rule-based sentence segmentation solution. It is specifically designed to handle the telegraphic written text in clinical note. It leverages a nested hash table to execute simultaneous rule processing, which reduces the impact of the rule-base growth on execution time and eliminates the effect of rule order on accuracy.
 Home-page: https://github.com/jianlins/PyRuSH
 Author: Jianlin
@@ -77,3 +77,16 @@ Start from version 1.0.3, PyRuSH adds Spacy compatible Sentencizer component: Py
 A Colab Notebook Demo
 ---------------------------
 Feel free to try this runnable `Colab notebook Demo <https://colab.research.google.com/drive/1gX9MzZTQiPw8G3x_vUwZbiSXGtbI0uIX?usp=sharing>`_
+Revision History
+----------------
+**1.0.11 (2025-09-02)**
+- Improved sentence splitting logic: Sentences are now split at the last token before exceeding the max length, ensuring no chunk exceeds the specified limit.
+- Edge case handling: Trailing whitespaces (caused by spacy sentence labeling mechanism) can be optionally split into a separate sentence (merge_gaps=False) to avoid necessarily long sentences.
+**1.0.9 (2024-10-27)**
+- Initial release with spaCy 3.x compatibility and core RuSH logic.
+- Added Spacy-compatible PyRuSHSentencizer component.

{pyrush-1.0.11 → pyrush-1.0.12.dev1}/PyRuSH.egg-info/SOURCES.txt RENAMED

@@ -21,6 +21,7 @@ PyRuSH/../conf/rush_rules.tsv
 conf/rush_rules.tsv
 tests/test_PyRuSHSentencizer_param.py
 tests/test_PyRushSentencizer.py
+tests/test_PyRushSentencizer2.py
 tests/test_Rush.py
 tests/test_Rush_w_Logger.py
 tests/test_cpredict_split_gaps.py

{pyrush-1.0.11 → pyrush-1.0.12.dev1}/README.rst RENAMED

@@ -52,3 +52,16 @@ Start from version 1.0.3, PyRuSH adds Spacy compatible Sentencizer component: Py
 A Colab Notebook Demo
 ---------------------------
 Feel free to try this runnable `Colab notebook Demo <https://colab.research.google.com/drive/1gX9MzZTQiPw8G3x_vUwZbiSXGtbI0uIX?usp=sharing>`_
+Revision History
+----------------
+**1.0.11 (2025-09-02)**
+- Improved sentence splitting logic: Sentences are now split at the last token before exceeding the max length, ensuring no chunk exceeds the specified limit.
+- Edge case handling: Trailing whitespaces (caused by spacy sentence labeling mechanism) can be optionally split into a separate sentence (merge_gaps=False) to avoid necessarily long sentences.
+**1.0.9 (2024-10-27)**
+- Initial release with spaCy 3.x compatibility and core RuSH logic.
+- Added Spacy-compatible PyRuSHSentencizer component.

{pyrush-1.0.11 → pyrush-1.0.12.dev1}/conf/rush_rules.tsv RENAMED

@@ -20,7 +20,7 @@
 #stbegin is the marker for sentence begin, the span of sentence will start at the begin of the captured group
 #stbegin has two scores 0, 1: 0 for true sentence begin clues, 1 for false sentence begin clues which will overwrite 0-scored rules when they are overlapping.
-#stend is the marker for sentence end, the span of sentence will end at the end of the captured group
+#stend is the marker for sentence begin, the span of sentence will end at the end of the captured group
 #stend also has two scores 2, 3: 2 for true sentence end clues, 3 for false sentence end clues which will overwrite 2-scored rules when they are overlapping
 # \b the begin of an input
@@ -47,12 +47,6 @@
 \b\s+(\C	0	stbegin
 \b\s+(\d	0	stbegin
 \c.\s+(\C)	0	stbegin
-Dr.\s+(\C)	1	stbegin
-Mr.\s+(\C)	1	stbegin
-Ms.\s+(\C)	1	stbegin
-Miss.\s+(\C)	1	stbegin
-Mrs.\s+(\C)	1	stbegin
-dr.\s+(\C)	1	stbegin
  mL.\s+(\C)	0	stbegin
 *)	1	stbegin
 \c\c.\s+(\C)	0	stbegin
@@ -245,7 +239,7 @@ dr.\s+(\C)	1	stbegin
 \n(? \C	0	stbegin
 \n(? \c	0	stbegin
 \n(. \C	0	stbegin
-\n(\+ \C	0	stbegin
+\n(+ \C	0	stbegin
 \n(/ \C	0	stbegin
 \n+\d\d-\d\d\s+(\C	0	stbegin
 \n+\d+-\d\d-\d\d\s+(\C	0	stbegin
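
The rule lines in this hunk are tab-separated triples of pattern, score and marker type, and the file's own comments define the scores: for `stbegin`, 0 marks a true sentence begin and 1 a false begin that overrides overlapping 0-scored rules; for `stend`, 2 and 3 play the same roles for sentence ends. The six removed `Dr.`/`Mr.`/`Ms.`/`Miss.`/`Mrs.`/`dr.` lines were all score-1 `stbegin` rules. The sketch below reads lines of that shape; `Rule`, `SCORE_MEANING` and `parse_rule_line` are hypothetical names for illustration, not PyRuSH's loader, and any other line kinds the real file may contain are simply skipped.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    pattern: str
    score: int
    rule_type: str


# Meanings paraphrased from the comment lines of rush_rules.tsv quoted in the hunk.
SCORE_MEANING = {
    ("stbegin", 0): "true sentence begin",
    ("stbegin", 1): "false sentence begin (overrides overlapping 0-scored rules)",
    ("stend", 2): "true sentence end",
    ("stend", 3): "false sentence end (overrides overlapping 2-scored rules)",
}


def parse_rule_line(line: str) -> Optional[Rule]:
    """Parse one 'pattern<TAB>score<TAB>type' line; return None for anything else."""
    text = line.rstrip("\r\n")
    if not text.strip() or text.startswith("#"):
        return None  # blank line or comment
    parts = text.split("\t")
    if len(parts) < 3 or not parts[1].strip().isdigit():
        return None  # not a rule line of the shape shown in this diff
    # The pattern is kept verbatim because leading spaces can be significant.
    return Rule(parts[0], int(parts[1]), parts[2].strip())


sample_lines = [
    "#stbegin has two scores 0, 1",
    "\\c.\\s+(\\C)\t0\tstbegin",   # kept in 1.0.12.dev1
    "Dr.\\s+(\\C)\t1\tstbegin",    # removed in 1.0.12.dev1
]
for raw in sample_lines:
    rule = parse_rule_line(raw)
    if rule is not None:
        print(rule.pattern, "->", SCORE_MEANING[(rule.rule_type, rule.score)])
```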

{pyrush-1.0.11 → pyrush-1.0.12.dev1}/setup.cfg RENAMED

@@ -1,7 +1,6 @@
 [metadata]
 readme = README.md
 license = MIT
-license_files = LICENSE
 [bdist_wheel]
 python_tag = py3

{pyrush-1.0.11 → pyrush-1.0.12.dev1}/tests/test_PyRuSHSentencizer_param.py RENAMED

@@ -6,7 +6,7 @@ from PyRuSH.PyRuSHSentencizer import PyRuSHSentencizer
 text_short = "Sentence one. Sentence two!"
 text_long = "This is a very long sentence that should be split at whitespace before the max length is reached. " * 5
 text_whitespace = "First sentence.    Second sentence after spaces.\nThird sentence after newline."
-rule_path = os.path.join(os.path.dirname(os.path.dirname(__file__)), "conf", "rush_rules.tsv")
+rule_path = os.path.join(os.path.dirname(__file__), "rush_rules.tsv")
 def make_nlp(merge_gaps, max_sentence_length):
     nlp = English()
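
The `rule_path` change moves the expected location of the rule file from the repository's `conf` directory to the directory that holds the test module. The sketch below evaluates both expressions for a made-up test file location; the `/repo` prefix is an assumption for illustration, and `posixpath` is used in place of `os.path` only so the output is the same on every platform.

```python
import posixpath

# Hypothetical location of the test module; only the relative layout matters.
test_file = "/repo/tests/test_PyRuSHSentencizer_param.py"

# 1.0.11: go up two levels from the file, then into conf/.
old_rule_path = posixpath.join(
    posixpath.dirname(posixpath.dirname(test_file)), "conf", "rush_rules.tsv"
)

# 1.0.12.dev1: look next to the test module itself.
new_rule_path = posixpath.join(posixpath.dirname(test_file), "rush_rules.tsv")

print(old_rule_path)  # /repo/conf/rush_rules.tsv
print(new_rule_path)  # /repo/tests/rush_rules.tsv
```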

{pyrush-1.0.11 → pyrush-1.0.12.dev1}/tests/test_PyRushSentencizer.py RENAMED

@@ -10,7 +10,7 @@ from spacy.lang.en import English
 class TestRuSH(unittest.TestCase):
     def setUp(self):
-        pwd = os.path.dirname(os.path.abspath(__file__))
+        self.pwd = os.path.dirname(os.path.abspath(__file__))
     def test_doc(self):
         nlp = English()
@@ -51,14 +51,14 @@ I will see her in a month to six weeks.  She is to follow up with Dr. X before t
  End Ezoic - MTSam Sample Bottom Matched Content - native_bottom
 '''
         nlp = English()
-        nlp.add_pipe("medspacy_pyrush")
+        nlp.add_pipe("medspacy_pyrush", config={"rules_path": os.path.join(self.pwd, 'rush_rules.tsv')})
         doc = nlp(input_str)
         sents = [s for s in doc.sents]
         for sent in sents:
             print('>' + str(sent) + '<\n\n')
         # New expected count includes whitespace-only sentences
-        assert (len(sents) == 53)
+        assert (len(sents) == 51)
         # For content checks, filter out whitespace-only sentences
         content_sents = [s for s in sents if s.text.strip()]
         assert (content_sents[0].text == 'Ms. ABCD is a 69-year-old lady, who was admitted to the hospital with chest pain and respiratory insufficiency.')
@@ -74,7 +74,7 @@ I will see her in a month to six weeks.  She is to follow up with Dr. X before t
         from loguru import logger
         logger.add(sys.stdout, level="DEBUG")
         nlp = English()
-        nlp.add_pipe("medspacy_pyrush")
+        nlp.add_pipe("medspacy_pyrush", config={"rules_path": os.path.join(self.pwd, 'rush_rules.tsv')})
         doc = nlp(input_str)
         sents = [s for s in doc.sents]
         for sent in sents:
@@ -116,4 +116,5 @@ I will see her in a month to six weeks.  She is to follow up with Dr. X before t
         # SpaCy has no control of sentence end. Thus, it ends up with sloppy ends.
         assert (sents[1].text == 'Ms. ABCD is a 69-year-old lady, who was admitted to the hospital with'
-                                 ' chest pain and respiratory insufficiency.')
+                                 ' chest pain and respiratory insufficiency.')
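
The `setUp` change from `pwd` to `self.pwd` is what lets the updated test methods build `rules_path` from `self.pwd`: a name assigned without `self.` is local to `setUp` and is gone once the method returns. A minimal illustration, using hypothetical class names and a made-up path rather than the real test classes:

```python
class LocalOnly:
    def setUp(self):
        pwd = "/hypothetical/tests"  # local variable, discarded when setUp returns


class StoredOnInstance:
    def setUp(self):
        self.pwd = "/hypothetical/tests"  # instance attribute, visible to other methods


local_only = LocalOnly()
stored = StoredOnInstance()
local_only.setUp()
stored.setUp()

print(hasattr(local_only, "pwd"))  # False
print(hasattr(stored, "pwd"))      # True
```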

pyrush-1.0.12.dev1/tests/test_PyRushSentencizer2.py ADDED

@@ -0,0 +1,45 @@
+import unittest
+import os
+import sys
+sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
+from PyRuSH import PyRuSHSentencizer
+from spacy.lang.en import English
+class TestRuSH(unittest.TestCase):
+    def setUp(self):
+        self.pwd = os.path.dirname(os.path.abspath(__file__))
+    # def test_doc(self):
+    #     nlp = English()
+    #     nlp.add_pipe("medspacy_pyrush")
+    #     doc = nlp("This is a sentence. This is another sentence.")
+    #     print('\n'.join([str(s) for s in doc.sents]))
+    #     print('\nTotal sentences: {}'.format(len([s for s in doc.sents])))
+    #     print('\ndoc is an instance of {}'.format(type(doc)))
+    def test_doc4(self):
+        input_str='''Ms. [**Known patient lastname 2004**] was admitted on [**2573-5-30**]. Ultrasound
+at the time of admission demonstrated pancreatic duct dilitation and
+edematous gallbladder. She was admitted to the ICU.
+Discharge Medications:
+1. Miconazole Nitrate 2 % Powder Sig: One (1) Appl Topical  BID
+(2 times a day) as needed.
+2. Heparin Sodium (Porcine) 5,000 unit/mL Solution Sig: One (1)
+Injection TID (3 times a day).
+3. Acetaminophen 160 mg/5 mL Elixir Sig: One (1)  PO Q4-6H
+(every 4 to 6 hours) as needed.'''
+        nlp = English()
+        nlp.add_pipe("medspacy_pyrush", config={"rules_path": os.path.join(self.pwd, 'rush_rules.tsv')})
+        nlp.initialize()
+        doc = nlp(input_str)
+        sents = [s for s in doc.sents]
+        for sent in sents:
+            print('>' + str(sent) + '<\n\n')
+        assert(sents[-1].text=='''Sig: One (1)  PO Q4-6H
+(every 4 to 6 hours) as needed.''')
+if __name__ == '__main__':
+    unittest.main()

{pyrush-1.0.11 → pyrush-1.0.12.dev1}/tests/test_Rush.py RENAMED

@@ -95,7 +95,7 @@ class TestRuSH(unittest.TestCase):
         sentences = rush.segToSentenceSpans(input_str)
         self.printDetails(sentences, input_str)
-    def test_doc2(self):
+    def test8(self):
         input_str = '''
 9.  Advair b.i.d.
 10.  Xopenex q.i.d. and p.r.n.
@@ -109,12 +109,44 @@ I will see her in a month to six weeks.  She is to follow up with Dr. X before t
         sent = sentences[1]
         assert (input_str[sent.begin:sent.end] == '10.  Xopenex q.i.d. and p.r.n.')
-    def test_doc11(self):
+    def test9(self):
         input_str='  This is a sentence. This is another sentence.'
-        sentences=self.rush.segToSentenceSpans(input_str)
-        for sent in sentences:
-            print('>' + input_str[sent.begin:sent.end] + '<\n')
+        self.rush = RuSH(str(os.path.join(self.pwd, 'rush_rules.tsv')), min_sent_chars=2, enable_logger=True)
+        sentences = self.rush.segToSentenceSpans(input_str)
+        self.printDetails(sentences, input_str)
+    def test10(self):
+        input_str='''Ms. [**Known patient lastname 2004**] was admitted on [**2573-5-30**]. Ultrasound
+at the time of admission demonstrated pancreatic duct dilitation and
+edematous gallbladder. She was admitted to the ICU.
+Discharge Medications:
+1. Miconazole Nitrate 2 % Powder Sig: One (1) Appl Topical  BID
+(2 times a day) as needed.
+2. Heparin Sodium (Porcine) 5,000 unit/mL Solution Sig: One (1)
+Injection TID (3 times a day).
+3. Acetaminophen 160 mg/5 mL Elixir Sig: One (1)  PO Q4-6H
+(every 4 to 6 hours) as needed.'''
+        self.rush = RuSH(str(os.path.join(self.pwd, 'rush_rules.tsv')), min_sent_chars=2, enable_logger=True)
+        sentences = self.rush.segToSentenceSpans(input_str)
+        self.printDetails(sentences, input_str)
+        assert (sentences[0].begin == 0 and sentences[0].end == 173)
+        assert (sentences[1].begin == 174 and sentences[1].end == 202)
+        assert (sentences[2].begin == 203 and sentences[2].end == 225)
+        assert (sentences[3].begin == 226 and sentences[3].end == 258)
+        assert (sentences[4].begin == 259 and sentences[4].end == 316)
+        assert (sentences[5].begin == 317 and sentences[5].end == 367)
+        assert (sentences[6].begin == 368 and sentences[6].end == 411)
+        assert (sentences[7].begin == 412 and sentences[7].end == 447)
+        assert (sentences[8].begin == 448 and sentences[8].end == 502)
+    def test11(self):
+        input_str = '''Patient doesn't have heart disease or high blood pressure, but their dad did have
+diabetes. Pt is a 63M w/ h/o metastatic carcinoid tumor, HTN and hyperlipidemia.'''
+        self.rush = RuSH(str(os.path.join(self.pwd, 'rush_rules.tsv')), min_sent_chars=2, enable_logger=True)
+        sentences = self.rush.segToSentenceSpans(input_str)
+        self.printDetails(sentences, input_str)
+        assert (sentences[0].begin == 0 and sentences[0].end == 91)
+        assert (sentences[1].begin == 92 and sentences[1].end == 162)
 if __name__ == '__main__':
     unittest.main()
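
The new assertions in `test10` and `test11` check character offsets rather than sentence text. As the existing `input_str[sent.begin:sent.end]` slice in this file suggests, `begin` is inclusive and `end` is exclusive. The sketch below applies the two spans asserted in `test11` to its input string without running RuSH at all, simply to show which text those offsets select; the variable names are illustrative.

```python
# Input copied from test11 in the hunk above.
input_str = '''Patient doesn't have heart disease or high blood pressure, but their dad did have
diabetes. Pt is a 63M w/ h/o metastatic carcinoid tumor, HTN and hyperlipidemia.'''

# (begin, end) pairs copied from the assertions in test11.
expected_spans = [(0, 91), (92, 162)]

# Slice the input with each span, treating end as exclusive.
sentences = [input_str[begin:end] for begin, end in expected_spans]
for text in sentences:
    print('>' + text + '<')
```

The first span runs across the line break and stops after `diabetes.`, the single space at index 91 belongs to neither span, and the second span ends at the last character of the input.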

{pyrush-1.0.11 → pyrush-1.0.12.dev1}/LICENSE RENAMED

File without changes

{pyrush-1.0.11 → pyrush-1.0.12.dev1}/MANIFEST.in RENAMED

File without changes

{pyrush-1.0.11 → pyrush-1.0.12.dev1}/PyRuSH/PyRuSHSentencizer.py RENAMED

File without changes

{pyrush-1.0.11 → pyrush-1.0.12.dev1}/PyRuSH/RuSH.py RENAMED

File without changes

{pyrush-1.0.11 → pyrush-1.0.12.dev1}/PyRuSH/StaticSentencizerFun.cpp RENAMED

File without changes

{pyrush-1.0.11 → pyrush-1.0.12.dev1}/PyRuSH/StaticSentencizerFun.pyx RENAMED

File without changes

{pyrush-1.0.11 → pyrush-1.0.12.dev1}/PyRuSH.egg-info/dependency_links.txt RENAMED

File without changes

{pyrush-1.0.11 → pyrush-1.0.12.dev1}/PyRuSH.egg-info/not-zip-safe RENAMED

File without changes

{pyrush-1.0.11 → pyrush-1.0.12.dev1}/PyRuSH.egg-info/requires.txt RENAMED

File without changes

{pyrush-1.0.11 → pyrush-1.0.12.dev1}/PyRuSH.egg-info/top_level.txt RENAMED

File without changes

{pyrush-1.0.11 → pyrush-1.0.12.dev1}/pyproject.toml RENAMED

File without changes

{pyrush-1.0.11 → pyrush-1.0.12.dev1}/requirements.txt RENAMED

File without changes

{pyrush-1.0.11 → pyrush-1.0.12.dev1}/setup.py RENAMED

File without changes

{pyrush-1.0.11 → pyrush-1.0.12.dev1}/tests/test_Rush_w_Logger.py RENAMED

File without changes

{pyrush-1.0.11 → pyrush-1.0.12.dev1}/tests/test_cpredict_split_gaps.py RENAMED

File without changes

{pyrush-1.0.11 → pyrush-1.0.12.dev1}/tests/test_debug.py RENAMED

File without changes

{pyrush-1.0.11 → pyrush-1.0.12.dev1}/tests/test_merge_gaps_max_length.py RENAMED

File without changes
