project-ryland 1.3.8__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (24)

project_ryland-1.3.8/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Justin Vinh
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

project_ryland-1.3.8/PKG-INFO ADDED

@@ -0,0 +1,92 @@
+Metadata-Version: 2.4
+Name: project_ryland
+Version: 1.3.8
+Summary: This project develops standardized tools to use LLMs in research studies for improving patient care.
+Author-email: Justin Vinh <jvinh21@gmail.com>
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+Classifier: Programming Language :: Python :: 3
+License-File: LICENSE
+Requires-Dist: pandas>=2.0
+Requires-Dist: numpy>=1.26
+Requires-Dist: matplotlib>=3.9
+Requires-Dist: scikit-learn>=1.5
+Requires-Dist: lifelines>=0.28
+Requires-Dist: tqdm>=4.66
+Requires-Dist: loguru>=0.7
+Requires-Dist: orjson>=3.10
+Requires-Dist: pyyaml>=6.0
+Requires-Dist: environs>=9.5
+Requires-Dist: openai>=1.43
+Requires-Dist: azure-identity>=1.17
+Requires-Dist: azure-core>=1.30
+Requires-Dist: pydantic>=2.6
+Requires-Dist: python-dateutil>=2.9
+Requires-Dist: requests>=2.31
+# project_ryland
+<a target="_blank" href="https://cookiecutter-data-science.drivendata.org/">
+    <img src="https://img.shields.io/badge/CCDS-Project%20template-328F97?logo=cookiecutter" />
+</a>
+This project develops standardized tools to use LLMs in research studies for improving patient care.
+RYLAND stands for Research sYstem for LLM-based Analytics of Novel Data. Ryland is the protagonist of Justin's favorite book (he'll leave it to you to figure out which one)
+Ignore the file tree - it needs to be updated.
+## Project Organization
+```
+├── LICENSE            <- Open-source license if one is chosen
+├── Makefile           <- Makefile with convenience commands like `make data` or `make train`
+├── README.md          <- The top-level README for developers using this project.
+├── data
+│   ├── external       <- Data from third party sources.
+│   ├── interim        <- Intermediate data that has been transformed.
+│   ├── processed      <- The final, canonical data sets for modeling.
+│   └── raw            <- The original, immutable data dump.
+│
+├── docs               <- A default mkdocs project; see www.mkdocs.org for details
+│
+├── models             <- Trained and serialized models, model predictions, or model summaries
+│
+├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
+│                         the creator's initials, and a short `-` delimited description, e.g.
+│                         `1.0-jqp-initial-data-exploration`.
+│
+├── pyproject.toml     <- Project configuration file with package metadata for
+│                         project_ryland_code and configuration for tools like black
+│
+├── references         <- Data dictionaries, manuals, and all other explanatory materials.
+│
+├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
+│   └── figures        <- Generated graphics and figures to be used in reporting
+│
+├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
+│                         generated with `pip freeze > requirements.txt`
+│
+├── setup.cfg          <- Configuration file for flake8
+│
+└── project_ryland_code   <- Source code for use in this project.
+    │
+    ├── __init__.py             <- Makes project_ryland_code a Python module
+    │
+    ├── config.py               <- Store useful variables and configuration
+    │
+    ├── dataset.py              <- Scripts to download or generate data
+    │
+    ├── features.py             <- Code to create features for modeling
+    │
+    ├── modeling
+    │   ├── __init__.py
+    │   ├── predict.py          <- Code to run model inference with trained models
+    │   └── train.py            <- Code to train models
+    │
+    └── plots.py                <- Code to create visualizations
+```
+--------

project_ryland-1.3.8/README.md ADDED

@@ -0,0 +1,65 @@
+# project_ryland
+<a target="_blank" href="https://cookiecutter-data-science.drivendata.org/">
+    <img src="https://img.shields.io/badge/CCDS-Project%20template-328F97?logo=cookiecutter" />
+</a>
+This project develops standardized tools to use LLMs in research studies for improving patient care.
+RYLAND stands for Research sYstem for LLM-based Analytics of Novel Data. Ryland is the protagonist of Justin's favorite book (he'll leave it to you to figure out which one)
+Ignore the file tree - it needs to be updated.
+## Project Organization
+```
+├── LICENSE            <- Open-source license if one is chosen
+├── Makefile           <- Makefile with convenience commands like `make data` or `make train`
+├── README.md          <- The top-level README for developers using this project.
+├── data
+│   ├── external       <- Data from third party sources.
+│   ├── interim        <- Intermediate data that has been transformed.
+│   ├── processed      <- The final, canonical data sets for modeling.
+│   └── raw            <- The original, immutable data dump.
+│
+├── docs               <- A default mkdocs project; see www.mkdocs.org for details
+│
+├── models             <- Trained and serialized models, model predictions, or model summaries
+│
+├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
+│                         the creator's initials, and a short `-` delimited description, e.g.
+│                         `1.0-jqp-initial-data-exploration`.
+│
+├── pyproject.toml     <- Project configuration file with package metadata for
+│                         project_ryland_code and configuration for tools like black
+│
+├── references         <- Data dictionaries, manuals, and all other explanatory materials.
+│
+├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
+│   └── figures        <- Generated graphics and figures to be used in reporting
+│
+├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
+│                         generated with `pip freeze > requirements.txt`
+│
+├── setup.cfg          <- Configuration file for flake8
+│
+└── project_ryland_code   <- Source code for use in this project.
+    │
+    ├── __init__.py             <- Makes project_ryland_code a Python module
+    │
+    ├── config.py               <- Store useful variables and configuration
+    │
+    ├── dataset.py              <- Scripts to download or generate data
+    │
+    ├── features.py             <- Code to create features for modeling
+    │
+    ├── modeling
+    │   ├── __init__.py
+    │   ├── predict.py          <- Code to run model inference with trained models
+    │   └── train.py            <- Code to train models
+    │
+    └── plots.py                <- Code to create visualizations
+```
+--------

project_ryland-1.3.8/project_ryland/__init__.py ADDED

File without changes

project_ryland-1.3.8/project_ryland/data_utils/__init__.py ADDED

File without changes

project_ryland-1.3.8/project_ryland/data_utils/analysis_utils.py ADDED

@@ -0,0 +1,14 @@
+"""
+------------------------------------------------------------------------------
+Author:         Justin Vinh
+Collaborators:  John Rhee, MD
+Parent Package: Project Ryland
+Creation Date:  2025.10.16
+Last Modified:  2025.10.16
+Purpose:
+This module contains functions designed to aid in statistical analyses of
+data processed by LLMs
+------------------------------------------------------------------------------
+"""

project_ryland-1.3.8/project_ryland/data_utils/io_utils.py ADDED

@@ -0,0 +1,61 @@
+"""
+------------------------------------------------------------------------------
+Author:         Justin Vinh
+Parent Package: Project Ryland
+Creation Date:  2025.09.29
+Last Modified:  2025.09.29
+Purpose:
+Contains functions to import/output data and do basic cleaning
+------------------------------------------------------------------------------
+"""
+from pathlib import Path
+import orjson
+import pandas as pd
+def normalize_newlines(text: str) -> str:
+    """
+    Normalize all line endings of text to use Unix-style \n
+    (helps in using regex downstream)
+    """
+    if not isinstance(text, str):
+        return text
+    else:
+        # Replace \\r\\n in the text to just \n
+        return text.replace('\\r\\n', '\n').replace('\r\n', '\n')
+def load_oncdrs_json_to_df(path_name):
+    """
+    1) Loads the given OncDRS-exported json file and changes it to a
+    dataframe.
+    2) Normalize all line endings of text in the RPT_TEXT and NARRATIVE_TEXT
+    columns to use Unix-style \n (helps in using regex downstream)
+    3) Has error handling for missing file or decoding issues
+    """
+    # Handles incorrect path names, raises an error if there is a bad path
+    path = Path(path_name)
+    if not path.exists():
+        raise FileNotFoundError(f"File {path_name} not found")
+    # Tries opening the json file, raises an error if unable to read it
+    # try:
+    #     with path.open('r') as json_file:
+    #         data = json.load(json_file)
+    try:
+        data = orjson.loads(path.read_bytes())
+    except orjson.JSONDecodeError as e:
+        raise ValueError(f'Invalid JSON format in file: {path}') from e
+    # Creates a dataframe using the key-value pairs under response>docs
+    df = pd.DataFrame(data['response']['docs'])
+    # Normalizes line endings with \n in RPT_TEXT and NARRATIVE_TEXT columns
+    for col in ['RPT_TEXT', 'NARRATIVE_TEXT']:
+        if col in df.columns:
+            df[col] = df[col].map(normalize_newlines)
+    return df
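
Reviewer note on io_utils.py: the sketch below illustrates the input shape that `load_oncdrs_json_to_df` appears to expect and what `normalize_newlines` does to it. It is a rough illustration, not the package's own test. Only the `response` > `docs` nesting and the `RPT_TEXT` / `NARRATIVE_TEXT` column names come from the diff; the sample records and the file name are invented, and the sketch uses the standard-library `json` module and plain dicts in place of `orjson` and `pandas` so that it runs without extra dependencies.

```python
import json
import tempfile
from pathlib import Path


def normalize_newlines(text):
    """Copy of the helper from the diff, reproduced so the sketch runs standalone."""
    if not isinstance(text, str):
        return text
    return text.replace('\\r\\n', '\n').replace('\r\n', '\n')


# Hypothetical payload: the values are invented for illustration.
payload = {
    "response": {
        "docs": [
            # Real carriage return + line feed characters
            {"RPT_TEXT": "IMPRESSION:\r\nNo acute findings.", "NARRATIVE_TEXT": None},
            # Literal backslash sequences, as left behind by double-escaped exports
            {"RPT_TEXT": "EXAM:\\r\\nAlert and oriented.", "NARRATIVE_TEXT": "Stable."},
        ]
    }
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample_export.json"
    path.write_text(json.dumps(payload), encoding="utf-8")
    # Same access path the loader uses: data['response']['docs']
    docs = json.loads(path.read_bytes())["response"]["docs"]

# Apply the helper to every field; non-string values pass through unchanged
cleaned = [
    {key: normalize_newlines(value) for key, value in doc.items()}
    for doc in docs
]
```

Both line-ending variants collapse to a single `\n`, and the `None` value is returned as-is. As written in the diff, the helper leaves a bare `\r` (with no following `\n`) untouched.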

project_ryland-1.3.8/project_ryland/data_utils/keyword_mappings.py ADDED Viewed

@@ -0,0 +1,330 @@
+"""
+------------------------------------------------------------------------------
+Author:         Justin Vinh
+Collaborators:  Zach Tentor
+Parent Package: Project Ryland
+Creation Date:  2025.09.29
+Last Modified:  2025.10.07
+Purpose:
+Contain the keyword mappings for the Project Ryland Utils. This module
+also contains keywords for several scripts that are project-specific.
+------------------------------------------------------------------------------
+"""
+# Symptoms of Interest
+# ----------------------------------------------------------------------------
+gwas_prompt_variables_v1 = {
+    'symptoms': [
+        'headache',
+        'hair loss',
+        'fatigue',
+        'nausea',
+        'anxiety',
+        'difficulty sleeping',
+        'numbness and tingling',
+        'joint pain',
+        'rash',
+        'diarrhea',
+        'constipation',
+        'other'
+    ]
+}
+# Input text keywords
+progress_note_text_filters = ['CENTER FOR NEURO-ONCOLOGY',
+                              'NEURO-ONCOLOGY PROGRESS NOTE',
+                              'Subjective: Patient ID',
+                              'HISTORY OF PRESENT ILLNESS',
+                              'INTERVAL HISTORY'
+]
+neuro_onc_tumor_keywords = ["glioblastoma",
+                            "astrocytoma",
+                            "oligodendroglioma",
+                            "glioma"
+]
+# Note types (process description) of interest
+# (Used to pre-filter in the main script)
+# ----------------------------------------------------------------------------
+pathology_proc_desc_of_interest = [
+    'SURGICAL PATHOLOGY',
+    'ANATOMIC PATHOLOGY',
+    'OTHER PATHOLOGY RESULTS',
+    'OUTSIDE PATHOLOGY REVIEW',
+    'FLOW CYTOMETRY']
+image_proc_desc_of_interest = [
+    'CT CHEST',
+    'CT PET CHEST'
+]
+# Mappings of keywords by process descriptions
+# This dict contains rules for keywords and conditions for each process
+# description in RPT_TEXT. It will be used to extract specific text segments
+# to be fed to the LLM.
+# ----------------------------------------------------------------------------
+image_mappings_by_proc_desc =   {
+    "*": [
+        {
+            "start": r"IMPRESSION:",
+            "end": ["END IMPRESSION",
+                    "This report was electronically signed"],
+            "condition": None
+        }
+    ]
+}
+progress_note_mappings_by_proc_desc = {
+    "Progress Note": [
+        {
+            "start": r"EXAM:",
+            "end": ["DATA:",
+                    "MRI"],
+            "condition": None
+        },
+        {
+            "start": r"EXAMINATION",
+            "end": ["LABORATORY",
+                    "ASSESSMENT AND PLAN"],
+            "condition": None
+        },
+        {
+            "start": r"PHYSICAL EXAM",
+            "end": ["LABORATORY",
+                    "IMAGING",
+                    "IMPRESSION AND PLAN","LABS",
+                    "RADIOGRAPHIC EXAMINATION",
+                    "Radiology",
+                    "PLAN",
+                    "LAB RESULT",
+                    "IMPRESSION",
+                    "ASSESSMENT/PLAN"],
+            "condition": None
+        },
+        {
+            "start": r"Physical Exam",
+            "end": ["TEST",
+                    "Results",
+                    "Labs",
+                    "Blood Draw",
+                    "LABORATORY"],
+            "condition": None
+        },
+        {
+            "start": r"Physical exam",
+            "end": ["Lab Results"],
+            "condition": None
+        },
+        {
+            "start": r"Examination",
+            "end": ["Data Review",
+                    "Prior Work up",
+                    "Assessment and Plan"],
+            "condition": None
+    }
+    ]
+}
+pathology_mappings_by_proc_desc = {
+    "SURGICAL PATHOLOGY": [
+        {
+            "start": r"INTERPRETATION",
+            "end": ["TEST INFORMATION",
+                    "Final Diagnosis",
+                    "Prior Results"],
+            "condition": None
+        },
+        {
+            "start": r"FINAL PATHOLOGIC DIAGNOSIS",
+            "end": ["Electronically Signed Out"],
+            "condition": None
+        },
+        {
+            "start": r"PATHOLOGIC DIAGNOSIS",
+            "end": ["CLINICAL DATA"],
+            "condition": None
+        },
+        {
+            "start": r"FINAL DIAGNOSIS",
+            "end": ["GROSS DESCRIPTION",
+                    "Final Diagnosis"],
+            "condition": None
+        },
+        {
+            "start": r"DIAGNOSIS",
+            "end": ["Gross Description",
+                    "Final Diagnosis"],
+            "condition": None
+        },
+        {
+            "start": r"Diagnosis",
+            "end": ["Gross Description"],
+            "condition": "exclude",
+            "exclude_after": "CLINICAL DATA"
+        },
+        {
+            "start": r"Diagnosis",
+            "end": ["Electronically Signed Out"],
+            "condition": None
+        },
+        {
+            "start": r"Final Diagnosis",
+            "end": ["Electronically Signed Out"],
+            "condition": None
+        }
+    ],
+    "ANATOMIC PATHOLOGY": [
+        {
+            "start": r"PATHOLOGIC DIAGNOSIS",
+            "end": ["CLINICAL DATA"],
+            "condition": None
+        },
+        {
+            "start": r"FINAL PATHOLOGIC DIAGNOSIS",
+            "end": ["Electronically Signed Out"],
+            "condition": None
+        },
+        {
+            "start": r"FINAL DIAGNOSIS",
+            "end": ["Clinical History"],
+            "condition": None
+        },
+        {
+            "start": r"RESULTS",
+            "end": ["REFERENCES"],
+            "condition": None
+        },
+        {
+            "start": r"INTERPRETATION",
+            "end": ["CLINICAL DATA"],
+            "condition": None
+        },
+        {
+            "start": r"Diagnosis",
+            "end": ["Electronically Signed Out"],
+            "condition": None
+        },
+        {
+            "start": r"DIAGNOSIS",
+            "end": ["Gross Description"],
+            "condition": "exclude",
+            "exclude_after": "CLINICAL DATA"
+        },
+        {
+            "start": r"Final Diagnosis",
+            "end": ["Clinical History:"],
+            "condition": None
+        }
+    ],
+    "FLOW CYTOMETRY": [
+        {
+            "start": r"INTERPRETATION",
+            "end": ["By his/her signature",
+                    "These tests were developed"],
+            "condition": None
+        }
+        # Additional mappings can be added if necessary
+    ],
+    "OUTSIDE PATHOLOGY REVIEW": [
+        {
+            "start": r"PATHOLOGIC DIAGNOSIS",
+            "end": ["CLINICAL DATA"],
+            "condition": None
+        },
+        {
+            "start": r"INTEGRATED DIAGNOSIS",
+            "end": ["Electronically Signed"],
+            "condition": None
+        }
+        # Clarification needed on how to handle overlapping diagnostics
+    ],
+    "OTHER PATHOLOGY RESULTS": [
+        {
+            "start": r"DIAGNOSIS",
+            "end": ["CLINICAL DATA"],
+            "condition": None
+        },
+        {
+            "start": r"PATHOLOGIC DIAGNOSIS",
+            "end": ["CLINICAL DATA"],
+            "condition": None
+        },
+        {
+            "start": r"FINAL DIAGNOSIS",
+            "end": ["GROSS DESCRIPTION"],
+            "condition": None
+        },
+        {
+            "start": r"CYTOLOGIC DIAGNOSIS",
+            "end": [],  # No specific end keywords provided
+            "condition": None
+        },
+        {
+            "start": r"Final Diagnosis",
+            "end": ["Clinical History"],
+            "condition": None
+        },
+        {
+            "start": r"INTERPRETATION",
+            "end": ["Final Diagnosis"],
+            "condition": None
+        },
+        {
+            "start": r"RESULT",
+            "end": ["Final Diagnosis"],
+            "condition": None
+        },
+        {
+            "start": r"Result",
+            "end": ["Final Diagnosis"],
+            "condition": None
+        }
+    ],
+    "PROGRESS NOTES": [
+        {
+            "start": r"EXAMINATION",
+            "end": ["LABORATORY",
+                    "ASSESSMENT AND PLAN"],
+            "condition": None
+        },
+        {
+            "start": r"PHYSICAL EXAM",
+            "end": ["LABORATORY",
+                    "IMAGING",
+                    "IMPRESSION AND PLAN","LABS",
+                    "RADIOGRAPHIC EXAMINATION",
+                    "Radiology",
+                    "PLAN",
+                    "LAB RESULT",
+                    "IMPRESSION",
+                    "ASSESSMENT/PLAN"],
+            "condition": None
+        },
+        {
+            "start": r"Physical Exam",
+            "end": ["TEST",
+                    "Results",
+                    "Labs",
+                    "Blood Draw",
+                    "LABORATORY"],
+            "condition": None
+        },
+        {
+            "start": r"Physical exam",
+            "end": ["Lab Results"],
+            "condition": None
+        },
+        {
+            "start": r"Examination",
+            "end": ["Data Review",
+                    "Prior Work up",
+                    "Assessment and Plan"],
+            "condition": None
+    }
+    ]
+}
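
Reviewer note on keyword_mappings.py: the module comment says these dicts hold rules used "to extract specific text segments to be fed to the LLM", but the extraction code itself is not part of this file. The sketch below shows one way such rules could be applied. It is an assumption, not the package's implementation: `extract_segment`, the fallback to the `"*"` key, and the sample report are all hypothetical, and rules with a non-`None` `condition` are skipped because the semantics of `"exclude"` / `"exclude_after"` are not shown in the diff.

```python
import re

# Two rules copied from the mappings in the diff above.
sample_mappings = {
    "*": [
        {
            "start": r"IMPRESSION:",
            "end": ["END IMPRESSION", "This report was electronically signed"],
            "condition": None,
        }
    ],
    "FLOW CYTOMETRY": [
        {
            "start": r"INTERPRETATION",
            "end": ["By his/her signature", "These tests were developed"],
            "condition": None,
        }
    ],
}


def extract_segment(text, proc_desc, mappings):
    """Hypothetical helper: return the text between a rule's start pattern
    and its earliest end keyword, or None if no rule matches."""
    # Assumed lookup: exact process description first, then the "*" wildcard
    rules = mappings.get(proc_desc, mappings.get("*", []))
    for rule in rules:
        if rule.get("condition") is not None:
            continue  # "exclude" semantics are not shown in the diff
        start_match = re.search(rule["start"], text)
        if start_match is None:
            continue
        body = text[start_match.end():]
        # Cut at the earliest end keyword present; keep everything if none is
        cut_points = [body.find(kw) for kw in rule["end"] if kw in body]
        if cut_points:
            body = body[:min(cut_points)]
        return body.strip()
    return None


# Invented report text for illustration
report = (
    "FINDINGS:\nLungs are clear.\n"
    "IMPRESSION:\nNo evidence of metastatic disease.\n"
    "END IMPRESSION\nThis report was electronically signed by a radiologist."
)
segment = extract_segment(report, "CT CHEST", sample_mappings)
```

Rules are tried in list order and the first matching `start` wins, so under this reading the ordering of entries matters: in the diff, `FINAL PATHOLOGIC DIAGNOSIS` is listed before the shorter `DIAGNOSIS` patterns that would also match the same heading.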