llm-ie 0.1.2__tar.gz → 0.1.3__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: llm-ie
-Version: 0.1.2
+Version: 0.1.3
 Summary: An LLM-powered tool that transforms everyday language into robust information extraction pipelines.
 License: MIT
 Author: Enshuo (David) Hsu
@@ -9,9 +9,10 @@ Classifier: License :: OSI Approved :: MIT License
 Classifier: Programming Language :: Python :: 3
 Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
+Requires-Dist: nltk (>=3.8,<4.0)
 Description-Content-Type: text/markdown
 
-<div align="center"><img src=asset/readme_img/LLM-IE.png width=500 ></div>
+<div align="center"><img src=doc_asset/readme_img/LLM-IE.png width=500 ></div>
 
 ![Python Version](https://img.shields.io/pypi/pyversions/llm-ie)
 ![PyPI](https://img.shields.io/pypi/v/llm-ie)
@@ -33,10 +34,10 @@ An LLM-powered tool that transforms everyday language into robust information ex
 ## Overview
 LLM-IE is a toolkit that provides robust utilities for frame-based information extraction. Since prompt design has a significant impact on generative information extraction with LLMs, it also provides a built-in LLM editor to help with prompt writing. The flowchart below demonstrates the workflow starting from a casual language request.
 
-<div align="center"><img src="asset/readme_img/LLM-IE flowchart.png" width=800 ></div>
+<div align="center"><img src="doc_asset/readme_img/LLM-IE flowchart.png" width=800 ></div>
 
 ## Prerequisite
-At least one LLM inference engine is required. There are built-in supports for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), and <img src=asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction). For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See [LLM Inference Engine](#llm-inference-engine) section below.
+At least one LLM inference engine is required. There is built-in support for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), and <img src=doc_asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction). For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See the [LLM Inference Engine](#llm-inference-engine) section below.
 
 ## Installation
 The Python package is available on PyPI.
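The Prerequisite note above says custom backends plug in through the InferenceEngine abstract class. A minimal sketch of such a subclass, assuming chat(messages, stream) is the required method (the only one the docstrings in this diff mention); EchoEngine is a toy stand-in for a real client:

```python
from llm_ie.engines import InferenceEngine

class EchoEngine(InferenceEngine):
    """Toy backend; swap the body for calls to a real inference client."""
    def chat(self, messages: list, stream: bool = False) -> str:
        # `messages` follows the [{"role": ..., "content": ...}] convention
        # used by PromptEditor later in this diff.
        return "echo: " + messages[-1]["content"]
```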
@@ -82,7 +83,7 @@ llm = HuggingFaceHubInferenceEngine(model="meta-llama/Meta-Llama-3-8B-Instruct")
 </details>
 
 <details>
-<summary><img src=asset/readme_img/openai-logomark.png width=16 /> OpenAI API</summary>
+<summary><img src=doc_asset/readme_img/openai-logomark.png width=16 /> OpenAI API</summary>
 
 Follow the [Best Practices for API Key Safety](https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety) to set up your API key.
 ```python
@@ -228,7 +229,7 @@ from llm_ie.engines import HuggingFaceHubInferenceEngine
 hf = HuggingFaceHubInferenceEngine(model="meta-llama/Meta-Llama-3-8B-Instruct")
 ```
 
-#### <img src=asset/readme_img/openai-logomark.png width=16 /> OpenAI API
+#### <img src=doc_asset/readme_img/openai-logomark.png width=16 /> OpenAI API
 In bash, save your API key to the environment variable ```OPENAI_API_KEY```.
 ```
 export OPENAI_API_KEY=<your_API_key>
@@ -1,4 +1,4 @@
-<div align="center"><img src=asset/readme_img/LLM-IE.png width=500 ></div>
+<div align="center"><img src=doc_asset/readme_img/LLM-IE.png width=500 ></div>
 
 ![Python Version](https://img.shields.io/pypi/pyversions/llm-ie)
 ![PyPI](https://img.shields.io/pypi/v/llm-ie)
@@ -20,10 +20,10 @@ An LLM-powered tool that transforms everyday language into robust information ex
 ## Overview
 LLM-IE is a toolkit that provides robust utilities for frame-based information extraction. Since prompt design has a significant impact on generative information extraction with LLMs, it also provides a built-in LLM editor to help with prompt writing. The flowchart below demonstrates the workflow starting from a casual language request.
 
-<div align="center"><img src="asset/readme_img/LLM-IE flowchart.png" width=800 ></div>
+<div align="center"><img src="doc_asset/readme_img/LLM-IE flowchart.png" width=800 ></div>
 
 ## Prerequisite
-At least one LLM inference engine is required. There are built-in supports for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), and <img src=asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction). For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See [LLM Inference Engine](#llm-inference-engine) section below.
+At least one LLM inference engine is required. There is built-in support for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), and <img src=doc_asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction). For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See the [LLM Inference Engine](#llm-inference-engine) section below.
 
 ## Installation
 The Python package is available on PyPI.
@@ -69,7 +69,7 @@ llm = HuggingFaceHubInferenceEngine(model="meta-llama/Meta-Llama-3-8B-Instruct")
 </details>
 
 <details>
-<summary><img src=asset/readme_img/openai-logomark.png width=16 /> OpenAI API</summary>
+<summary><img src=doc_asset/readme_img/openai-logomark.png width=16 /> OpenAI API</summary>
 
 Follow the [Best Practices for API Key Safety](https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety) to set up your API key.
 ```python
@@ -215,7 +215,7 @@ from llm_ie.engines import HuggingFaceHubInferenceEngine
 hf = HuggingFaceHubInferenceEngine(model="meta-llama/Meta-Llama-3-8B-Instruct")
 ```
 
-#### <img src=asset/readme_img/openai-logomark.png width=16 /> OpenAI API
+#### <img src=doc_asset/readme_img/openai-logomark.png width=16 /> OpenAI API
 In bash, save your API key to the environment variable ```OPENAI_API_KEY```.
 ```
 export OPENAI_API_KEY=<your_API_key>
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "llm-ie"
-version = "0.1.2"
+version = "0.1.3"
 description = "An LLM-powered tool that transforms everyday language into robust information extraction pipelines."
 authors = ["Enshuo (David) Hsu"]
 license = "MIT"
@@ -10,8 +10,10 @@ exclude = [
     "test/**"
 ]
 
+
 [tool.poetry.dependencies]
 python = "^3.11"
+nltk = "^3.8"
 
 
 [build-system]
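The new nltk dependency above presumably backs the sentence-by-sentence extraction workflow described in the prompt guides added below. A sketch of the kind of sentence splitting involved; the punkt download line is an assumption about setup, not something this diff configures:

```python
import nltk

nltk.download("punkt", quiet=True)  # one-time tokenizer model download
section = "Nausea occurred in 12% of patients. Headache was also reported."
print(nltk.sent_tokenize(section))
# ['Nausea occurred in 12% of patients.', 'Headache was also reported.']
```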
@@ -0,0 +1,7 @@
+This is a text analysis task. Analyze the draft prompt based on the prompt guideline below.
+
+# Prompt guideline
+{{prompt_guideline}}
+
+# Draft prompt
+{{draft}}
@@ -0,0 +1,7 @@
+This is a prompt rewriting task. Rewrite the draft prompt following the prompt guideline below. DO NOT explain your answer.
+
+# Prompt guideline
+{{prompt_guideline}}
+
+# Draft prompt
+{{draft}}
@@ -0,0 +1,35 @@
+Prompt template design:
+1. Task description
+2. Schema definition
+3. Output format definition
+4. Additional hints
+5. Input placeholder
+
+Example:
+
+# Task description
+The paragraph below is from the Food and Drug Administration (FDA) Clinical Pharmacology Section of Labeling for Human Prescription Drug and Biological Products, Adverse reactions section. Please carefully review it and extract the adverse reactions and percentages. Note that each adverse reaction is nested under a clinical trial and potentially an arm. Your output should take that into consideration.
+
+# Schema definition
+Your output should contain:
+"ClinicalTrial" which is the name of the trial,
+If applicable, "Arm" which is the arm within the clinical trial,
+"AdverseReaction" which is the name of the adverse reaction,
+If applicable, "Percentage" which is the occurrence of the adverse reaction within the trial and arm,
+"Evidence" which is the EXACT sentence in the text where you found the AdverseReaction
+
+# Output format definition
+Your output should follow JSON format, for example:
+[
+    {"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"},
+    {"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"}
+]
+
+# Additional hints
+Your output should be 100% based on the provided content. DO NOT output fake numbers.
+If there is no specific arm, just omit the "Arm" key. If the percentage is not reported, just omit the "Percentage" key. The "Evidence" should always be provided.
+
+# Input placeholder
+Below is the Adverse reactions section:
+{{input}}
+
@@ -0,0 +1,35 @@
+Prompt template design:
+1. Task description
+2. Schema definition
+3. Output format definition
+4. Additional hints
+5. Input placeholder
+
+Example:
+
+# Task description
+The paragraph below is from the Food and Drug Administration (FDA) Clinical Pharmacology Section of Labeling for Human Prescription Drug and Biological Products, Adverse reactions section. Please carefully review it and extract the adverse reactions and percentages. Note that each adverse reaction is nested under a clinical trial and potentially an arm. Your output should take that into consideration.
+
+# Schema definition
+Your output should contain:
+"ClinicalTrial" which is the name of the trial,
+If applicable, "Arm" which is the arm within the clinical trial,
+"AdverseReaction" which is the name of the adverse reaction,
+If applicable, "Percentage" which is the occurrence of the adverse reaction within the trial and arm,
+"Evidence" which is the EXACT sentence in the text where you found the AdverseReaction
+
+# Output format definition
+Your output should follow JSON format, for example:
+[
+    {"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"},
+    {"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"}
+]
+
+# Additional hints
+Your output should be 100% based on the provided content. DO NOT output fake numbers.
+If there is no specific arm, just omit the "Arm" key. If the percentage is not reported, just omit the "Percentage" key. The "Evidence" should always be provided.
+
+# Input placeholder
+Below is the Adverse reactions section:
+{{input}}
+
@@ -0,0 +1,40 @@
+Prompt template design:
+1. Task description (mention the task is to extract information from sentences)
+2. Schema definition
+3. Output format definition
+4. Additional hints
+5. Input placeholder (mention the user will feed sentences one by one)
+
+Example:
+
+# Task description
+The paragraph below is from the Food and Drug Administration (FDA) Clinical Pharmacology Section of Labeling for Human Prescription Drug and Biological Products, Adverse Reactions section. Your task is to extract the adverse reactions in a given sentence (provided by the user one at a time). Note that adverse reactions can be nested under a clinical trial and potentially an arm. Your output should take that into account.
+
+# Schema definition
+Your output should contain:
+If applicable, "ClinicalTrial" which is the name of the trial,
+If applicable, "Arm" which is the arm within the clinical trial,
+Must have, "AdverseReaction" which is the name of the adverse reaction spelled exactly as in the source document,
+If applicable, "Percentage" which is the occurrence of the adverse reaction within the trial and arm,
+
+# Output format definition
+Your output should follow JSON format,
+if there are adverse reaction mentions:
+[{"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>"},
+{"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>"}]
+if there is no adverse reaction in the given sentence, just output an empty list:
+[]
+
+# Additional hints
+Your output should be 100% based on the provided content. DO NOT output fake numbers.
+If there is no specific trial or arm, just omit the "ClinicalTrial" or "Arm" key. If the percentage is not reported, just omit the "Percentage" key.
+I am only interested in the content in JSON format. Do NOT generate explanations.
+
+The Adverse reactions section often has a sentence in the first paragraph:
+"The following clinically significant adverse reactions are described elsewhere in the labeling:..." Make sure to extract those adverse reaction mentions.
+The Adverse reactions section often has summary sentences like:
+"The most common adverse reactions were ...". Make sure to extract those adverse reaction mentions.
+
+# Input placeholder
+Below is the entire Adverse reactions section for your reference. I will feed you sentences from it one by one.
+"{{input}}"
@@ -1,6 +1,7 @@
 import os
 import abc
 import re
+import importlib.resources
 from typing import List, Dict, Tuple, Union
 from llm_ie.data_types import LLMInformationExtractionFrame
 from llm_ie.engines import InferenceEngine
@@ -27,7 +28,8 @@ class FrameExtractor:
 
     @classmethod
     def get_prompt_guide(cls) -> str:
-        with open(os.path.join('/home/daviden1013/David_projects/llm-ie', 'asset', 'prompt_guide', f"{cls.__name__}_prompt_guide.txt"), 'r') as f:
+        file_path = importlib.resources.files('llm_ie.asset.prompt_guide').joinpath(f"{cls.__name__}_prompt_guide.txt")
+        with open(file_path, 'r') as f:
             return f.read()
 
     def _get_user_prompt(self, text_content:Union[str, Dict[str,str]]) -> str:
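The change above swaps a hard-coded developer path for importlib.resources, so get_prompt_guide() resolves the guide files inside the installed package rather than a local checkout. A sketch of the lookup; the subclass name in the file name is hypothetical, since only FrameExtractor appears in this hunk:

```python
import importlib.resources

# Resolves relative to the installed llm_ie package, wherever pip put it.
guide = importlib.resources.files("llm_ie.asset.prompt_guide").joinpath(
    "SentenceFrameExtractor_prompt_guide.txt")  # hypothetical class name
print(guide.read_text())
```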
@@ -0,0 +1,45 @@
+import os
+import importlib.resources
+from llm_ie.engines import InferenceEngine
+from llm_ie.extractors import FrameExtractor
+
+class PromptEditor:
+    def __init__(self, inference_engine:InferenceEngine, extractor:FrameExtractor):
+        """
+        This class is an LLM agent that rewrites or comments on a prompt draft based on the prompt guide of an extractor.
+
+        Parameters
+        ----------
+        inference_engine : InferenceEngine
+            the LLM inference engine object. Must implement the chat() method.
+        extractor : FrameExtractor
+            a FrameExtractor.
+        """
+        self.inference_engine = inference_engine
+        self.prompt_guide = extractor.get_prompt_guide()
+
+    def rewrite(self, draft:str) -> str:
+        """
+        This method takes a prompt draft and rewrites it following the extractor's guideline.
+        """
+        file_path = importlib.resources.files('llm_ie.asset.PromptEditor_prompts').joinpath('rewrite.txt')
+        with open(file_path, 'r') as f:
+            prompt = f.read()
+
+        prompt = prompt.replace("{{draft}}", draft).replace("{{prompt_guideline}}", self.prompt_guide)
+        messages = [{"role": "user", "content": prompt}]
+        res = self.inference_engine.chat(messages, stream=True)
+        return res
+
+    def comment(self, draft:str) -> str:
+        """
+        This method takes a prompt draft and comments on it following the extractor's guideline.
+        """
+        file_path = importlib.resources.files('llm_ie.asset.PromptEditor_prompts').joinpath('comment.txt')
+        with open(file_path, 'r') as f:
+            prompt = f.read()
+
+        prompt = prompt.replace("{{draft}}", draft).replace("{{prompt_guideline}}", self.prompt_guide)
+        messages = [{"role": "user", "content": prompt}]
+        res = self.inference_engine.chat(messages, stream=True)
+        return res
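Putting the new class together, a usage sketch: the engine constructor comes from the README snippets earlier in this diff, while the module path and the extractor placeholder are assumptions:

```python
from llm_ie.engines import HuggingFaceHubInferenceEngine
from llm_ie.prompt_editor import PromptEditor  # module path assumed

engine = HuggingFaceHubInferenceEngine(model="meta-llama/Meta-Llama-3-8B-Instruct")
editor = PromptEditor(engine, extractor)  # `extractor`: any FrameExtractor instance
draft = "Extract adverse reactions and their percentages."
feedback = editor.comment(draft)  # critique against the extractor's prompt guide
improved = editor.rewrite(draft)  # rewrite that follows the guide
```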
@@ -1,26 +0,0 @@
-import os
-from llm_ie.engines import InferenceEngine
-from llm_ie.extractors import FrameExtractor
-
-class PromptEditor:
-    def __init__(self, inference_engine:InferenceEngine, extractor:FrameExtractor):
-        self.inference_engine = inference_engine
-        self.prompt_guide = extractor.get_prompt_guide()
-
-    def rewrite(self, draft:str) -> str:
-        with open(os.path.join('/home/daviden1013/David_projects/llm-ie', 'asset', 'PromptEditor_prompts', 'rewrite.txt'), 'r') as f:
-            prompt = f.read()
-
-        prompt = prompt.replace("{{draft}}", draft).replace("{{prompt_guideline}}", self.prompt_guide)
-        messages = [{"role": "user", "content": prompt}]
-        res = self.inference_engine.chat(messages, stream=True)
-        return res
-
-    def comment(self, draft:str) -> str:
-        with open(os.path.join('/home/daviden1013/David_projects/llm-ie', 'asset', 'PromptEditor_prompts', 'comment.txt'), 'r') as f:
-            prompt = f.read()
-
-        prompt = prompt.replace("{{draft}}", draft).replace("{{prompt_guideline}}", self.prompt_guide)
-        messages = [{"role": "user", "content": prompt}]
-        res = self.inference_engine.chat(messages, stream=True)
-        return res