llm-ie 0.1.1.tar.gz → 0.1.3.tar.gz

This diff compares the contents of two package versions as publicly released to their registry (here, PyPI). It is provided for informational purposes only and reflects the changes between the versions exactly as published.
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: llm-ie
-Version: 0.1.1
+Version: 0.1.3
 Summary: An LLM-powered tool that transforms everyday language into robust information extraction pipelines.
 License: MIT
 Author: Enshuo (David) Hsu
@@ -9,9 +9,10 @@ Classifier: License :: OSI Approved :: MIT License
 Classifier: Programming Language :: Python :: 3
 Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
+Requires-Dist: nltk (>=3.8,<4.0)
 Description-Content-Type: text/markdown
 
-<div align="center"><img src=asset/readme_img/LLM-IE.png width=500 ></div>
+<div align="center"><img src=doc_asset/readme_img/LLM-IE.png width=500 ></div>
 
 ![Python Version](https://img.shields.io/pypi/pyversions/llm-ie)
 ![PyPI](https://img.shields.io/pypi/v/llm-ie)
@@ -33,10 +34,10 @@ An LLM-powered tool that transforms everyday language into robust information ex
 ## Overview
 LLM-IE is a toolkit that provides robust information extraction utilities for frame-based information extraction. Since prompt design has a significant impact on generative information extraction with LLMs, it also provides a built-in LLM editor to help with prompt writing. The flowchart below demonstrates the workflow starting from a casual language request.
 
-<div align="center"><img src="asset/readme_img/LLM-IE flowchart.png" width=800 ></div>
+<div align="center"><img src="doc_asset/readme_img/LLM-IE flowchart.png" width=800 ></div>
 
 ## Prerequisite
-At least one LLM inference engine is required. There are built-in supports for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), and <img src=asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction). For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See [LLM Inference Engine](#llm-inference-engine) section below.
+At least one LLM inference engine is required. There are built-in supports for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), and <img src=doc_asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction). For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See [LLM Inference Engine](#llm-inference-engine) section below.
 
 ## Installation
 The Python package is available on PyPI.
@@ -82,7 +83,7 @@ llm = HuggingFaceHubInferenceEngine(model="meta-llama/Meta-Llama-3-8B-Instruct")
 </details>
 
 <details>
-<summary><img src=asset/readme_img/openai-logomark.png width=16 /> OpenAI API</summary>
+<summary><img src=doc_asset/readme_img/openai-logomark.png width=16 /> OpenAI API</summary>
 
 Follow the [Best Practices for API Key Safety](https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety) to set up API key.
 ```python
@@ -228,7 +229,7 @@ from llm_ie.engines import HuggingFaceHubInferenceEngine
 hf = HuggingFaceHubInferenceEngine(model="meta-llama/Meta-Llama-3-8B-Instruct")
 ```
 
-#### <img src=asset/readme_img/openai-logomark.png width=16 /> OpenAI API
+#### <img src=doc_asset/readme_img/openai-logomark.png width=16 /> OpenAI API
 In bash, save API key to the environmental variable ```OPENAI_API_KEY```.
 ```
 export OPENAI_API_KEY=<your_API_key>
@@ -1,4 +1,4 @@
-<div align="center"><img src=asset/readme_img/LLM-IE.png width=500 ></div>
+<div align="center"><img src=doc_asset/readme_img/LLM-IE.png width=500 ></div>
 
 ![Python Version](https://img.shields.io/pypi/pyversions/llm-ie)
 ![PyPI](https://img.shields.io/pypi/v/llm-ie)
@@ -20,10 +20,10 @@ An LLM-powered tool that transforms everyday language into robust information ex
 ## Overview
 LLM-IE is a toolkit that provides robust information extraction utilities for frame-based information extraction. Since prompt design has a significant impact on generative information extraction with LLMs, it also provides a built-in LLM editor to help with prompt writing. The flowchart below demonstrates the workflow starting from a casual language request.
 
-<div align="center"><img src="asset/readme_img/LLM-IE flowchart.png" width=800 ></div>
+<div align="center"><img src="doc_asset/readme_img/LLM-IE flowchart.png" width=800 ></div>
 
 ## Prerequisite
-At least one LLM inference engine is required. There are built-in supports for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), and <img src=asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction). For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See [LLM Inference Engine](#llm-inference-engine) section below.
+At least one LLM inference engine is required. There are built-in supports for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), and <img src=doc_asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction). For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See [LLM Inference Engine](#llm-inference-engine) section below.
 
 ## Installation
 The Python package is available on PyPI.
@@ -69,7 +69,7 @@ llm = HuggingFaceHubInferenceEngine(model="meta-llama/Meta-Llama-3-8B-Instruct")
 </details>
 
 <details>
-<summary><img src=asset/readme_img/openai-logomark.png width=16 /> OpenAI API</summary>
+<summary><img src=doc_asset/readme_img/openai-logomark.png width=16 /> OpenAI API</summary>
 
 Follow the [Best Practices for API Key Safety](https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety) to set up API key.
 ```python
@@ -215,7 +215,7 @@ from llm_ie.engines import HuggingFaceHubInferenceEngine
 hf = HuggingFaceHubInferenceEngine(model="meta-llama/Meta-Llama-3-8B-Instruct")
 ```
 
-#### <img src=asset/readme_img/openai-logomark.png width=16 /> OpenAI API
+#### <img src=doc_asset/readme_img/openai-logomark.png width=16 /> OpenAI API
 In bash, save API key to the environmental variable ```OPENAI_API_KEY```.
 ```
 export OPENAI_API_KEY=<your_API_key>
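The README above (shipped twice, once as the PKG-INFO long description and once as README.md) says custom backends can be plugged in through the `InferenceEngine` abstract class. A minimal sketch of such a subclass, assuming the interface only requires the `chat()` signature visible in the engines.py hunks later in this diff; the echo backend is purely illustrative:

```python
from typing import List, Dict

from llm_ie.engines import InferenceEngine


class EchoInferenceEngine(InferenceEngine):
    """Illustrative engine that returns the last user message instead of calling an LLM."""

    def chat(self, messages: List[Dict[str, str]], max_new_tokens: int = 2048,
             temperature: float = 0.0, stream: bool = False, **kwrs) -> str:
        # A real backend would forward `messages` to an inference server here.
        return messages[-1]["content"]
```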
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "llm-ie"
-version = "0.1.1"
+version = "0.1.3"
 description = "An LLM-powered tool that transforms everyday language into robust information extraction pipelines."
 authors = ["Enshuo (David) Hsu"]
 license = "MIT"
@@ -10,8 +10,10 @@ exclude = [
 "test/**"
 ]
 
+
 [tool.poetry.dependencies]
 python = "^3.11"
+nltk = "^3.8"
 
 
 [build-system]
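Note that this hunk and the PKG-INFO hunk earlier record the same dependency bump from two sides: Poetry's caret constraint `nltk = "^3.8"` means `>=3.8,<4.0`, which is exactly the `Requires-Dist: nltk (>=3.8,<4.0)` line generated in the package metadata.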
@@ -0,0 +1,7 @@
+This is a text analysis task. Analyze the draft prompt based on the prompt guideline below.
+
+# Prompt guideline
+{{prompt_guideline}}
+
+# Draft prompt
+{{draft}}
@@ -0,0 +1,7 @@
+This is a prompt rewriting task. Rewrite the draft prompt following the prompt guideline below. DO NOT explain your answer.
+
+# Prompt guideline
+{{prompt_guideline}}
+
+# Draft prompt
+{{draft}}
@@ -0,0 +1,35 @@
+Prompt template design:
+1. Task description
+2. Schema definition
+3. Output format definition
+4. Additional hints
+5. Input placeholder
+
+Example:
+
+# Task description
+The paragraph below is from the Food and Drug Administration (FDA) Clinical Pharmacology Section of Labeling for Human Prescription Drug and Biological Products, Adverse reactions section. Please carefully review it and extract the adverse reactions and percentages. Note that each adverse reaction is nested under a clinical trial and potentially an arm. Your output should take that into consideration.
+
+# Schema definition
+Your output should contain:
+"ClinicalTrial" which is the name of the trial,
+If applicable, "Arm" which is the arm within the clinical trial,
+"AdverseReaction" which is the name of the adverse reaction,
+If applicable, "Percentage" which is the occurance of the adverse reaction within the trial and arm,
+"Evidence" which is the EXACT sentence in the text where you found the AdverseReaction from
+
+# Output format definition
+Your output should follow JSON format, for example:
+[
+{"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"},
+{"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"}
+]
+
+# Additional hints
+Your output should be 100% based on the provided content. DO NOT output fake numbers.
+If there is no specific arm, just omit the "Arm" key. If the percentage is not reported, just omit the "Percentage" key. The "Evidence" should always be provided.
+
+# Input placeholder
+Below is the Adverse reactions section:
+{{input}}
+
@@ -0,0 +1,35 @@
+Prompt template design:
+1. Task description
+2. Schema definition
+3. Output format definition
+4. Additional hints
+5. Input placeholder
+
+Example:
+
+# Task description
+The paragraph below is from the Food and Drug Administration (FDA) Clinical Pharmacology Section of Labeling for Human Prescription Drug and Biological Products, Adverse reactions section. Please carefully review it and extract the adverse reactions and percentages. Note that each adverse reaction is nested under a clinical trial and potentially an arm. Your output should take that into consideration.
+
+# Schema definition
+Your output should contain:
+"ClinicalTrial" which is the name of the trial,
+If applicable, "Arm" which is the arm within the clinical trial,
+"AdverseReaction" which is the name of the adverse reaction,
+If applicable, "Percentage" which is the occurance of the adverse reaction within the trial and arm,
+"Evidence" which is the EXACT sentence in the text where you found the AdverseReaction from
+
+# Output format definition
+Your output should follow JSON format, for example:
+[
+{"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"},
+{"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"}
+]
+
+# Additional hints
+Your output should be 100% based on the provided content. DO NOT output fake numbers.
+If there is no specific arm, just omit the "Arm" key. If the percentage is not reported, just omit the "Percentage" key. The "Evidence" should always be provided.
+
+# Input placeholder
+Below is the Adverse reactions section:
+{{input}}
+
@@ -0,0 +1,40 @@
+Prompt template design:
+1. Task description (mention the task is to extract information from sentences)
+2. Schema definition
+3. Output format definition
+4. Additional hints
+5. Input placeholder (mention user will feed sentence by sentence)
+
+Example:
+
+# Task description
+The paragraph below is from the Food and Drug Administration (FDA) Clinical Pharmacology Section of Labeling for Human Prescription Drug and Biological Products, Adverse Reactions section. Your task is to extract the adverse reactions in a given sentence (provided by user at a time). Note that adverse reactions can be nested under a clinical trial and potentially an arm. Your output should consider that.
+
+# Schema definition
+Your output should contain:
+If applicable, "ClinicalTrial" which is the name of the trial,
+If applicable, "Arm" which is the arm within the clinical trial,
+Must have, "AdverseReaction" which is the name of the adverse reaction spelled exactly as in the source document,
+If applicable, "Percentage" which is the occurrence of the adverse reaction within the trial and arm,
+
+# Output format definition
+Your output should follow JSON format,
+if there are adverse reaction mentions:
+[{"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>"},
+{"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>"}]
+if there is no adverse reaction in the given sentence, just output an empty list:
+[]
+
+# Additional hints
+Your output should be 100% based on the provided content. DO NOT output fake numbers.
+If there is no specific trial or arm, just omit the "ClinicalTrial" or "Arm" key. If the percentage is not reported, just omit the "Percentage" key.
+I am only interested in the content in JSON format. Do NOT generate explanation.
+
+The Adverse reactions section often has a sentence in the first paragraph:
+"The following clinically significant adverse reactions are described elsewhere in the labeling:..." Make sure to extract those adverse reaction mentions.
+The Adverse reactions section often has summary sentences like:
+"The most common adverse reactions were ...". Make sure to extract those adverse reaction mentions.
+
+# Input placeholder
+Below is the entire Adverse reactions section for your reference. I will feed you with sentences from it one by one.
+"{{input}}"
@@ -31,7 +31,6 @@ class InferenceEngine:
 
 
 class LlamaCppInferenceEngine(InferenceEngine):
-    from llama_cpp import Llama
     def __init__(self, repo_id:str, gguf_filename:str, n_ctx:int=4096, n_gpu_layers:int=-1, **kwrs):
         """
         The Llama.cpp inference engine.
@@ -48,13 +47,13 @@ class LlamaCppInferenceEngine(InferenceEngine):
         n_gpu_layers : int, Optional
             number of layers to offload to GPU. Default is all layers (-1).
         """
-        super().__init__()
+        from llama_cpp import Llama
         self.repo_id = repo_id
         self.gguf_filename = gguf_filename
         self.n_ctx = n_ctx
         self.n_gpu_layers = n_gpu_layers
 
-        self.model = self.Llama.from_pretrained(
+        self.model = Llama.from_pretrained(
             repo_id=self.repo_id,
             filename=self.gguf_filename,
             n_gpu_layers=n_gpu_layers,
@@ -106,7 +105,6 @@ class LlamaCppInferenceEngine(InferenceEngine):
 
 
 class OllamaInferenceEngine(InferenceEngine):
-    import ollama
     def __init__(self, model_name:str, num_ctx:int=4096, keep_alive:int=300, **kwrs):
         """
         The Ollama inference engine.
@@ -120,6 +118,8 @@ class OllamaInferenceEngine(InferenceEngine):
         keep_alive : int, Optional
             seconds to hold the LLM after the last API call.
         """
+        import ollama
+        self.ollama = ollama
         self.model_name = model_name
         self.num_ctx = num_ctx
         self.keep_alive = keep_alive
@@ -158,13 +158,13 @@
 
 
 class HuggingFaceHubInferenceEngine(InferenceEngine):
-    from huggingface_hub import InferenceClient
     def __init__(self, **kwrs):
         """
         The Huggingface_hub InferenceClient inference engine.
         For parameters and documentation, refer to https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client
         """
-        self.client = self.InferenceClient(**kwrs)
+        from huggingface_hub import InferenceClient
+        self.client = InferenceClient(**kwrs)
 
     def chat(self, messages:List[Dict[str,str]], max_new_tokens:int=2048, temperature:float=0.0, stream:bool=False, **kwrs) -> str:
         """
@@ -200,7 +200,6 @@ class HuggingFaceHubInferenceEngine(InferenceEngine):
 
 
 class OpenAIInferenceEngine(InferenceEngine):
-    from openai import OpenAI
    def __init__(self, model:str, **kwrs):
         """
         The OpenAI API inference engine.
@@ -211,7 +210,8 @@ class OpenAIInferenceEngine(InferenceEngine):
         model_name : str
             model name as described in https://platform.openai.com/docs/models
         """
-        self.client = self.OpenAI(**kwrs)
+        from openai import OpenAI
+        self.client = OpenAI(**kwrs)
         self.model = model
 
     def chat(self, messages:List[Dict[str,str]], max_new_tokens:int=2048, temperature:float=0.0, stream:bool=False, **kwrs) -> str:
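All four engine changes above apply the same pattern: the backend import moves from the class body into `__init__`. Class bodies execute at module import time, so in 0.1.1 importing llm_ie.engines required llama_cpp, ollama, huggingface_hub, and openai to all be installed; after this change, a backend is imported only when its engine is instantiated. A generic sketch of the pattern (not the library's exact code; `some_backend` is a hypothetical optional package):

```python
class LazyBackendEngine:
    def __init__(self, **kwargs):
        # Deferred import: the backend package is loaded only when this engine
        # is instantiated, so it stays an optional dependency of the library.
        import some_backend  # hypothetical optional package
        self.client = some_backend.Client(**kwargs)  # hypothetical client class
```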
@@ -1,6 +1,7 @@
 import os
 import abc
 import re
+import importlib.resources
 from typing import List, Dict, Tuple, Union
 from llm_ie.data_types import LLMInformationExtractionFrame
 from llm_ie.engines import InferenceEngine
@@ -27,7 +28,8 @@ class FrameExtractor:
 
     @classmethod
     def get_prompt_guide(cls) -> str:
-        with open(os.path.join('/home/daviden1013/David_projects/llm-ie', 'asset', 'prompt_guide', f"{cls.__name__}_prompt_guide.txt"), 'r') as f:
+        file_path = importlib.resources.files('llm_ie.asset.prompt_guide').joinpath(f"{cls.__name__}_prompt_guide.txt")
+        with open(file_path, 'r') as f:
             return f.read()
 
     def _get_user_prompt(self, text_content:Union[str, Dict[str,str]]) -> str:
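This change swaps a hard-coded developer path for importlib.resources, so the prompt-guide files resolve inside the installed package rather than on the author's machine. A standalone sketch of the same lookup (Python 3.9+; the filename below is hypothetical, since real names derive from the extractor class name at runtime):

```python
import importlib.resources

# Locate a data file bundled inside the installed llm_ie package.
guide = importlib.resources.files("llm_ie.asset.prompt_guide") \
    .joinpath("SomeExtractor_prompt_guide.txt")  # hypothetical filename
print(guide.read_text())
```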
@@ -0,0 +1,45 @@
+import os
+import importlib.resources
+from llm_ie.engines import InferenceEngine
+from llm_ie.extractors import FrameExtractor
+
+class PromptEditor:
+    def __init__(self, inference_engine:InferenceEngine, extractor:FrameExtractor):
+        """
+        This class is a LLM agent that rewrite or comment a prompt draft based on the prompt guide of an extractor.
+
+        Parameters
+        ----------
+        inference_engine : InferenceEngine
+            the LLM inferencing engine object. Must implements the chat() method.
+        extractor : FrameExtractor
+            a FrameExtractor.
+        """
+        self.inference_engine = inference_engine
+        self.prompt_guide = extractor.get_prompt_guide()
+
+    def rewrite(self, draft:str) -> str:
+        """
+        This method inputs a prompt draft and rewrites it following the extractor's guideline.
+        """
+        file_path = importlib.resources.files('llm_ie.asset.PromptEditor_prompts').joinpath('rewrite.txt')
+        with open(file_path, 'r') as f:
+            prompt = f.read()
+
+        prompt = prompt.replace("{{draft}}", draft).replace("{{prompt_guideline}}", self.prompt_guide)
+        messages = [{"role": "user", "content": prompt}]
+        res = self.inference_engine.chat(messages, stream=True)
+        return res
+
+    def comment(self, draft:str) -> str:
+        """
+        This method inputs a prompt draft and comment following the extractor's guideline.
+        """
+        file_path = importlib.resources.files('llm_ie.asset.PromptEditor_prompts').joinpath('comment.txt')
+        with open(file_path, 'r') as f:
+            prompt = f.read()
+
+        prompt = prompt.replace("{{draft}}", draft).replace("{{prompt_guideline}}", self.prompt_guide)
+        messages = [{"role": "user", "content": prompt}]
+        res = self.inference_engine.chat(messages, stream=True)
+        return res
@@ -1,26 +0,0 @@
-import os
-from llm_ie.engines import InferenceEngine
-from llm_ie.extractors import FrameExtractor
-
-class PromptEditor:
-    def __init__(self, inference_engine:InferenceEngine, extractor:FrameExtractor):
-        self.inference_engine = inference_engine
-        self.prompt_guide = extractor.get_prompt_guide()
-
-    def rewrite(self, draft:str) -> str:
-        with open(os.path.join('/home/daviden1013/David_projects/llm-ie', 'asset', 'PromptEditor_prompts', 'rewrite.txt'), 'r') as f:
-            prompt = f.read()
-
-        prompt = prompt.replace("{{draft}}", draft).replace("{{prompt_guideline}}", self.prompt_guide)
-        messages = [{"role": "user", "content": prompt}]
-        res = self.inference_engine.chat(messages, stream=True)
-        return res
-
-    def comment(self, draft:str) -> str:
-        with open(os.path.join('/home/daviden1013/David_projects/llm-ie', 'asset', 'PromptEditor_prompts', 'comment.txt'), 'r') as f:
-            prompt = f.read()
-
-        prompt = prompt.replace("{{draft}}", draft).replace("{{prompt_guideline}}", self.prompt_guide)
-        messages = [{"role": "user", "content": prompt}]
-        res = self.inference_engine.chat(messages, stream=True)
-        return res
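The two hunks above are the PromptEditor rewrite: the 0.1.3 version adds docstrings and loads its `rewrite.txt`/`comment.txt` templates via importlib.resources, replacing the 0.1.1 version's hard-coded absolute paths. A hedged usage sketch; the engine and extractor classes come from elsewhere in this diff, but the module path and model name are assumptions:

```python
from llm_ie.engines import OllamaInferenceEngine
from llm_ie.extractors import FrameExtractor  # in practice, a concrete subclass with a shipped prompt guide
from llm_ie.prompt_editor import PromptEditor  # module name assumed from the file added above

llm = OllamaInferenceEngine(model_name="llama3")  # illustrative model name
editor = PromptEditor(llm, FrameExtractor)  # get_prompt_guide() is a classmethod, so the class itself can be passed
improved = editor.rewrite("Extract adverse reactions and their percentages.")
feedback = editor.comment("Extract adverse reactions and their percentages.")
```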