llm-ie 0.1.2__py3-none-any.whl → 0.1.3__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
llm_ie/asset/PromptEditor_prompts/comment.txt ADDED
@@ -0,0 +1,7 @@
+ This is a text analysis task. Analyze the draft prompt based on the prompt guideline below.
+
+ # Prompt guideline
+ {{prompt_guideline}}
+
+ # Draft prompt
+ {{draft}}
llm_ie/asset/PromptEditor_prompts/rewrite.txt ADDED
@@ -0,0 +1,7 @@
+ This is a prompt rewriting task. Rewrite the draft prompt following the prompt guideline below. DO NOT explain your answer.
+
+ # Prompt guideline
+ {{prompt_guideline}}
+
+ # Draft prompt
+ {{draft}}
llm_ie/asset/prompt_guide/BasicFrameExtractor_prompt_guide.txt ADDED
@@ -0,0 +1,35 @@
+ Prompt template design:
+ 1. Task description
+ 2. Schema definition
+ 3. Output format definition
+ 4. Additional hints
+ 5. Input placeholder
+
+ Example:
+
+ # Task description
+ The paragraph below is from the Food and Drug Administration (FDA) Clinical Pharmacology Section of Labeling for Human Prescription Drug and Biological Products, Adverse reactions section. Please carefully review it and extract the adverse reactions and percentages. Note that each adverse reaction is nested under a clinical trial and potentially an arm. Your output should take that into consideration.
+
+ # Schema definition
+ Your output should contain:
+ "ClinicalTrial" which is the name of the trial,
+ If applicable, "Arm" which is the arm within the clinical trial,
+ "AdverseReaction" which is the name of the adverse reaction,
+ If applicable, "Percentage" which is the occurance of the adverse reaction within the trial and arm,
+ "Evidence" which is the EXACT sentence in the text where you found the AdverseReaction from
+
+ # Output format definition
+ Your output should follow JSON format, for example:
+ [
+ {"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"},
+ {"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"}
+ ]
+
+ # Additional hints
+ Your output should be 100% based on the provided content. DO NOT output fake numbers.
+ If there is no specific arm, just omit the "Arm" key. If the percentage is not reported, just omit the "Percentage" key. The "Evidence" should always be provided.
+
+ # Input placeholder
+ Below is the Adverse reactions section:
+ {{input}}
+
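The schema above maps onto a JSON list of frame objects in which "AdverseReaction" and "Evidence" are always present while "Arm" and "Percentage" may be omitted. As an illustrative sketch of consuming such output downstream (the sample frames below are invented for illustration, not taken from the package):

```python
import json

# Hypothetical model output following the schema above (values invented for illustration).
raw = """[
 {"ClinicalTrial": "Study 1", "Arm": "Placebo", "AdverseReaction": "nausea",
  "Percentage": "12%", "Evidence": "Nausea occurred in 12% of placebo patients."},
 {"ClinicalTrial": "Study 1", "AdverseReaction": "headache",
  "Evidence": "Headache was reported in both arms."}
]"""

frames = json.loads(raw)
summaries = []
for frame in frames:
    # "Arm" and "Percentage" may be omitted per the guide, so use .get() with a fallback.
    summaries.append((frame["AdverseReaction"],
                      frame.get("Arm", "<no arm>"),
                      frame.get("Percentage", "<not reported>")))
```

The `.get()` fallbacks mirror the "just omit the key" instructions in the Additional hints section, so absent keys never raise `KeyError`.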
llm_ie/asset/prompt_guide/ReviewFrameExtractor_prompt_guide.txt ADDED
@@ -0,0 +1,35 @@
+ Prompt template design:
+ 1. Task description
+ 2. Schema definition
+ 3. Output format definition
+ 4. Additional hints
+ 5. Input placeholder
+
+ Example:
+
+ # Task description
+ The paragraph below is from the Food and Drug Administration (FDA) Clinical Pharmacology Section of Labeling for Human Prescription Drug and Biological Products, Adverse reactions section. Please carefully review it and extract the adverse reactions and percentages. Note that each adverse reaction is nested under a clinical trial and potentially an arm. Your output should take that into consideration.
+
+ # Schema definition
+ Your output should contain:
+ "ClinicalTrial" which is the name of the trial,
+ If applicable, "Arm" which is the arm within the clinical trial,
+ "AdverseReaction" which is the name of the adverse reaction,
+ If applicable, "Percentage" which is the occurance of the adverse reaction within the trial and arm,
+ "Evidence" which is the EXACT sentence in the text where you found the AdverseReaction from
+
+ # Output format definition
+ Your output should follow JSON format, for example:
+ [
+ {"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"},
+ {"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"}
+ ]
+
+ # Additional hints
+ Your output should be 100% based on the provided content. DO NOT output fake numbers.
+ If there is no specific arm, just omit the "Arm" key. If the percentage is not reported, just omit the "Percentage" key. The "Evidence" should always be provided.
+
+ # Input placeholder
+ Below is the Adverse reactions section:
+ {{input}}
+
llm_ie/asset/prompt_guide/SentenceFrameExtractor_prompt_guide.txt ADDED
@@ -0,0 +1,40 @@
+ Prompt template design:
+ 1. Task description (mention the task is to extract information from sentences)
+ 2. Schema definition
+ 3. Output format definition
+ 4. Additional hints
+ 5. Input placeholder (mention user will feed sentence by sentence)
+
+ Example:
+
+ # Task description
+ The paragraph below is from the Food and Drug Administration (FDA) Clinical Pharmacology Section of Labeling for Human Prescription Drug and Biological Products, Adverse Reactions section. Your task is to extract the adverse reactions in a given sentence (provided by user at a time). Note that adverse reactions can be nested under a clinical trial and potentially an arm. Your output should consider that.
+
+ # Schema definition
+ Your output should contain:
+ If applicable, "ClinicalTrial" which is the name of the trial,
+ If applicable, "Arm" which is the arm within the clinical trial,
+ Must have, "AdverseReaction" which is the name of the adverse reaction spelled exactly as in the source document,
+ If applicable, "Percentage" which is the occurrence of the adverse reaction within the trial and arm,
+
+ # Output format definition
+ Your output should follow JSON format,
+ if there are adverse reaction mentions:
+ [{"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>"},
+ {"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>"}]
+ if there is no adverse reaction in the given sentence, just output an empty list:
+ []
+
+ # Additional hints
+ Your output should be 100% based on the provided content. DO NOT output fake numbers.
+ If there is no specific trial or arm, just omit the "ClinicalTrial" or "Arm" key. If the percentage is not reported, just omit the "Percentage" key.
+ I am only interested in the content in JSON format. Do NOT generate explanation.
+
+ The Adverse reactions section often has a sentence in the first paragraph:
+ "The following clinically significant adverse reactions are described elsewhere in the labeling:..." Make sure to extract those adverse reaction mentions.
+ The Adverse reactions section often has summary sentences like:
+ "The most common adverse reactions were ...". Make sure to extract those adverse reaction mentions.
+
+ # Input placeholder
+ Below is the entire Adverse reactions section for your reference. I will feed you with sentences from it one by one.
+ "{{input}}"
llm_ie/extractors.py CHANGED
@@ -1,6 +1,7 @@
  import os
  import abc
  import re
+ import importlib.resources
  from typing import List, Dict, Tuple, Union
  from llm_ie.data_types import LLMInformationExtractionFrame
  from llm_ie.engines import InferenceEngine
@@ -27,7 +28,8 @@ class FrameExtractor:

      @classmethod
      def get_prompt_guide(cls) -> str:
-         with open(os.path.join('/home/daviden1013/David_projects/llm-ie', 'asset', 'prompt_guide', f"{cls.__name__}_prompt_guide.txt"), 'r') as f:
+         file_path = importlib.resources.files('llm_ie.asset.prompt_guide').joinpath(f"{cls.__name__}_prompt_guide.txt")
+         with open(file_path, 'r') as f:
              return f.read()

      def _get_user_prompt(self, text_content:Union[str, Dict[str,str]]) -> str:
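The change above replaces a hard-coded developer path with `importlib.resources`, which resolves package data regardless of where the wheel is installed. A self-contained sketch of the same lookup pattern, using a throwaway demo package rather than `llm_ie` itself:

```python
import importlib.resources
import os
import sys
import tempfile

# Build a throwaway package with one data file, standing in for llm_ie/asset/prompt_guide.
root = tempfile.mkdtemp()
asset_dir = os.path.join(root, "demo_pkg", "asset")
os.makedirs(asset_dir)
for init in (os.path.join(root, "demo_pkg", "__init__.py"),
             os.path.join(asset_dir, "__init__.py")):
    open(init, "w").close()
with open(os.path.join(asset_dir, "guide.txt"), "w") as f:
    f.write("prompt guide contents")

sys.path.insert(0, root)

# Same pattern as the patched get_prompt_guide(): resolve the file through the package.
file_path = importlib.resources.files("demo_pkg.asset").joinpath("guide.txt")
text = file_path.read_text()
```

Note that `open(file_path, ...)` as used in the patch works because the package is installed unzipped on the filesystem; `Traversable.read_text()` is the more general API for resources inside zipped distributions.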
llm_ie/prompt_editor.py CHANGED
@@ -1,14 +1,29 @@
  import os
+ import importlib.resources
  from llm_ie.engines import InferenceEngine
  from llm_ie.extractors import FrameExtractor

  class PromptEditor:
      def __init__(self, inference_engine:InferenceEngine, extractor:FrameExtractor):
+         """
+         This class is a LLM agent that rewrite or comment a prompt draft based on the prompt guide of an extractor.
+
+         Parameters
+         ----------
+         inference_engine : InferenceEngine
+             the LLM inferencing engine object. Must implements the chat() method.
+         extractor : FrameExtractor
+             a FrameExtractor.
+         """
          self.inference_engine = inference_engine
          self.prompt_guide = extractor.get_prompt_guide()

      def rewrite(self, draft:str) -> str:
-         with open(os.path.join('/home/daviden1013/David_projects/llm-ie', 'asset', 'PromptEditor_prompts', 'rewrite.txt'), 'r') as f:
+         """
+         This method inputs a prompt draft and rewrites it following the extractor's guideline.
+         """
+         file_path = importlib.resources.files('llm_ie.asset.PromptEditor_prompts').joinpath('rewrite.txt')
+         with open(file_path, 'r') as f:
              prompt = f.read()

          prompt = prompt.replace("{{draft}}", draft).replace("{{prompt_guideline}}", self.prompt_guide)
@@ -17,7 +32,11 @@ class PromptEditor:
          return res

      def comment(self, draft:str) -> str:
-         with open(os.path.join('/home/daviden1013/David_projects/llm-ie', 'asset', 'PromptEditor_prompts', 'comment.txt'), 'r') as f:
+         """
+         This method inputs a prompt draft and comment following the extractor's guideline.
+         """
+         file_path = importlib.resources.files('llm_ie.asset.PromptEditor_prompts').joinpath('comment.txt')
+         with open(file_path, 'r') as f:
              prompt = f.read()

          prompt = prompt.replace("{{draft}}", draft).replace("{{prompt_guideline}}", self.prompt_guide)
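Both `rewrite()` and `comment()` share the same templating step: the packaged prompt file carries `{{draft}}` and `{{prompt_guideline}}` placeholders that are substituted with plain `str.replace`. A stand-in sketch of that step, with the template abbreviated from the `rewrite.txt` shown earlier and sample values of my own:

```python
# Abbreviated from llm_ie/asset/PromptEditor_prompts/rewrite.txt shown above.
template = (
    "This is a prompt rewriting task. Rewrite the draft prompt "
    "following the prompt guideline below. DO NOT explain your answer.\n\n"
    "# Prompt guideline\n{{prompt_guideline}}\n\n"
    "# Draft prompt\n{{draft}}"
)

# Same substitution chain as PromptEditor.rewrite()/comment().
prompt = (template.replace("{{draft}}", "Extract adverse reactions as JSON.")
                  .replace("{{prompt_guideline}}", "1. Task description ..."))
```

The filled prompt is then sent to the inference engine's `chat()` method; plain `str.replace` suffices here because the placeholders are fixed literals, avoiding a templating dependency.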
llm_ie-0.1.3.dist-info/METADATA CHANGED
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: llm-ie
- Version: 0.1.2
+ Version: 0.1.3
  Summary: An LLM-powered tool that transforms everyday language into robust information extraction pipelines.
  License: MIT
  Author: Enshuo (David) Hsu
@@ -9,9 +9,10 @@ Classifier: License :: OSI Approved :: MIT License
  Classifier: Programming Language :: Python :: 3
  Classifier: Programming Language :: Python :: 3.11
  Classifier: Programming Language :: Python :: 3.12
+ Requires-Dist: nltk (>=3.8,<4.0)
  Description-Content-Type: text/markdown

- <div align="center"><img src=asset/readme_img/LLM-IE.png width=500 ></div>
+ <div align="center"><img src=doc_asset/readme_img/LLM-IE.png width=500 ></div>

  ![Python Version](https://img.shields.io/pypi/pyversions/llm-ie)
  ![PyPI](https://img.shields.io/pypi/v/llm-ie)
@@ -33,10 +34,10 @@ An LLM-powered tool that transforms everyday language into robust information ex
  ## Overview
  LLM-IE is a toolkit that provides robust information extraction utilities for frame-based information extraction. Since prompt design has a significant impact on generative information extraction with LLMs, it also provides a built-in LLM editor to help with prompt writing. The flowchart below demonstrates the workflow starting from a casual language request.

- <div align="center"><img src="asset/readme_img/LLM-IE flowchart.png" width=800 ></div>
+ <div align="center"><img src="doc_asset/readme_img/LLM-IE flowchart.png" width=800 ></div>

  ## Prerequisite
- At least one LLM inference engine is required. There are built-in supports for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), and <img src=asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction). For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See [LLM Inference Engine](#llm-inference-engine) section below.
+ At least one LLM inference engine is required. There are built-in supports for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), and <img src=doc_asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction). For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See [LLM Inference Engine](#llm-inference-engine) section below.

  ## Installation
  The Python package is available on PyPI.
@@ -82,7 +83,7 @@ llm = HuggingFaceHubInferenceEngine(model="meta-llama/Meta-Llama-3-8B-Instruct")
  </details>

  <details>
- <summary><img src=asset/readme_img/openai-logomark.png width=16 /> OpenAI API</summary>
+ <summary><img src=doc_asset/readme_img/openai-logomark.png width=16 /> OpenAI API</summary>

  Follow the [Best Practices for API Key Safety](https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety) to set up API key.
  ```python
@@ -228,7 +229,7 @@ from llm_ie.engines import HuggingFaceHubInferenceEngine
  hf = HuggingFaceHubInferenceEngine(model="meta-llama/Meta-Llama-3-8B-Instruct")
  ```

- #### <img src=asset/readme_img/openai-logomark.png width=16 /> OpenAI API
+ #### <img src=doc_asset/readme_img/openai-logomark.png width=16 /> OpenAI API
  In bash, save API key to the environmental variable ```OPENAI_API_KEY```.
  ```
  export OPENAI_API_KEY=<your_API_key>
llm_ie-0.1.3.dist-info/RECORD ADDED
@@ -0,0 +1,13 @@
+ llm_ie/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
+ llm_ie/asset/PromptEditor_prompts/comment.txt,sha256=C_lxx-dlOlFJ__jkHKosZ8HsNAeV1aowh2B36nIipBY,159
+ llm_ie/asset/PromptEditor_prompts/rewrite.txt,sha256=bYLOix7DUBlcWv-Q0JZ5kDnZ9OEXBt_AGDN0TydLB8o,191
+ llm_ie/asset/prompt_guide/BasicFrameExtractor_prompt_guide.txt,sha256=XbnU8byLGGUA3A3lT0bb2Hw-ggzhcqD3ZuKzduod2ww,1944
+ llm_ie/asset/prompt_guide/ReviewFrameExtractor_prompt_guide.txt,sha256=XbnU8byLGGUA3A3lT0bb2Hw-ggzhcqD3ZuKzduod2ww,1944
+ llm_ie/asset/prompt_guide/SentenceFrameExtractor_prompt_guide.txt,sha256=8nj9OLPJMtr9Soi5JU3Xk-HC7pKNoI54xA_A4u7I5j4,2620
+ llm_ie/data_types.py,sha256=AxqgfmPkYySDz7VuTWh8yDWofvZgdjgFiW9hihqInHc,6605
+ llm_ie/engines.py,sha256=TuxM56_u6-dsAAuNdfuKSH23nb9UfFbg6T60e-OXEA8,9294
+ llm_ie/extractors.py,sha256=cYfxKzOxs6fsFbcA0KvEGDXHOPBojNpDG7kfrV7Tsbc,22355
+ llm_ie/prompt_editor.py,sha256=dbu7A3O7O7Iw2v-xCgrTFH1-wTLAGf4SHDqdeS-He2Q,1869
+ llm_ie-0.1.3.dist-info/METADATA,sha256=3cn2apV_x-zhyGXyvUnVUhonBhdGGYlrrU10tZvnORQ,28028
+ llm_ie-0.1.3.dist-info/WHEEL,sha256=sP946D7jFCHeNz5Iq4fL4Lu-PrWrFsgfLXbbkciIZwg,88
+ llm_ie-0.1.3.dist-info/RECORD,,
@@ -1,8 +0,0 @@
1
- llm_ie/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
2
- llm_ie/data_types.py,sha256=AxqgfmPkYySDz7VuTWh8yDWofvZgdjgFiW9hihqInHc,6605
3
- llm_ie/engines.py,sha256=TuxM56_u6-dsAAuNdfuKSH23nb9UfFbg6T60e-OXEA8,9294
4
- llm_ie/extractors.py,sha256=94uPhEtpYeingMY4WVLc8F6vw8hnSS8Wt-TMr5B5flg,22315
5
- llm_ie/prompt_editor.py,sha256=doPjy5HFoZvP5Y1x_rcA_-wSQfqHkwKfETQd3uIh0GA,1212
6
- llm_ie-0.1.2.dist-info/METADATA,sha256=74HgMLENRFNbx04Vgk0uLbrKOQkxfAM6ZXOk75gbSO0,27975
7
- llm_ie-0.1.2.dist-info/WHEEL,sha256=sP946D7jFCHeNz5Iq4fL4Lu-PrWrFsgfLXbbkciIZwg,88
8
- llm_ie-0.1.2.dist-info/RECORD,,
File without changes