llm-ie 0.1.1.tar.gz → 0.1.3.tar.gz

This diff compares the contents of two package versions as publicly released to their registry (here, PyPI). It is provided for informational purposes only and reflects the changes between the versions exactly as published.
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: llm-ie
-Version: 0.1.1
+Version: 0.1.3
 Summary: An LLM-powered tool that transforms everyday language into robust information extraction pipelines.
 License: MIT
 Author: Enshuo (David) Hsu
@@ -9,9 +9,10 @@ Classifier: License :: OSI Approved :: MIT License
 Classifier: Programming Language :: Python :: 3
 Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
+Requires-Dist: nltk (>=3.8,<4.0)
 Description-Content-Type: text/markdown
 
-<div align="center"><img src=asset/readme_img/LLM-IE.png width=500 ></div>
+<div align="center"><img src=doc_asset/readme_img/LLM-IE.png width=500 ></div>
 
 ![Python Version](https://img.shields.io/pypi/pyversions/llm-ie)
 ![PyPI](https://img.shields.io/pypi/v/llm-ie)
@@ -33,10 +34,10 @@ An LLM-powered tool that transforms everyday language into robust information ex
 ## Overview
 LLM-IE is a toolkit that provides robust information extraction utilities for frame-based information extraction. Since prompt design has a significant impact on generative information extraction with LLMs, it also provides a built-in LLM editor to help with prompt writing. The flowchart below demonstrates the workflow starting from a casual language request.
 
-<div align="center"><img src="asset/readme_img/LLM-IE flowchart.png" width=800 ></div>
+<div align="center"><img src="doc_asset/readme_img/LLM-IE flowchart.png" width=800 ></div>
 
 ## Prerequisite
-At least one LLM inference engine is required. There are built-in supports for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), and <img src=asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction). For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See [LLM Inference Engine](#llm-inference-engine) section below.
+At least one LLM inference engine is required. There are built-in supports for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), and <img src=doc_asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction). For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See [LLM Inference Engine](#llm-inference-engine) section below.
 
 ## Installation
 The Python package is available on PyPI.
@@ -82,7 +83,7 @@ llm = HuggingFaceHubInferenceEngine(model="meta-llama/Meta-Llama-3-8B-Instruct")
 </details>
 
 <details>
-<summary><img src=asset/readme_img/openai-logomark.png width=16 /> OpenAI API</summary>
+<summary><img src=doc_asset/readme_img/openai-logomark.png width=16 /> OpenAI API</summary>
 
 Follow the [Best Practices for API Key Safety](https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety) to set up API key.
 ```python
@@ -228,7 +229,7 @@ from llm_ie.engines import HuggingFaceHubInferenceEngine
 hf = HuggingFaceHubInferenceEngine(model="meta-llama/Meta-Llama-3-8B-Instruct")
 ```
 
-#### <img src=asset/readme_img/openai-logomark.png width=16 /> OpenAI API
+#### <img src=doc_asset/readme_img/openai-logomark.png width=16 /> OpenAI API
 In bash, save API key to the environmental variable ```OPENAI_API_KEY```.
 ```
 export OPENAI_API_KEY=<your_API_key>
@@ -1,4 +1,4 @@
-<div align="center"><img src=asset/readme_img/LLM-IE.png width=500 ></div>
+<div align="center"><img src=doc_asset/readme_img/LLM-IE.png width=500 ></div>
 
 ![Python Version](https://img.shields.io/pypi/pyversions/llm-ie)
 ![PyPI](https://img.shields.io/pypi/v/llm-ie)
@@ -20,10 +20,10 @@ An LLM-powered tool that transforms everyday language into robust information ex
 ## Overview
 LLM-IE is a toolkit that provides robust information extraction utilities for frame-based information extraction. Since prompt design has a significant impact on generative information extraction with LLMs, it also provides a built-in LLM editor to help with prompt writing. The flowchart below demonstrates the workflow starting from a casual language request.
 
-<div align="center"><img src="asset/readme_img/LLM-IE flowchart.png" width=800 ></div>
+<div align="center"><img src="doc_asset/readme_img/LLM-IE flowchart.png" width=800 ></div>
 
 ## Prerequisite
-At least one LLM inference engine is required. There are built-in supports for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), and <img src=asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction). For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See [LLM Inference Engine](#llm-inference-engine) section below.
+At least one LLM inference engine is required. There are built-in supports for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), and <img src=doc_asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction). For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See [LLM Inference Engine](#llm-inference-engine) section below.
 
 ## Installation
 The Python package is available on PyPI.
@@ -69,7 +69,7 @@ llm = HuggingFaceHubInferenceEngine(model="meta-llama/Meta-Llama-3-8B-Instruct")
 </details>
 
 <details>
-<summary><img src=asset/readme_img/openai-logomark.png width=16 /> OpenAI API</summary>
+<summary><img src=doc_asset/readme_img/openai-logomark.png width=16 /> OpenAI API</summary>
 
 Follow the [Best Practices for API Key Safety](https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety) to set up API key.
 ```python
@@ -215,7 +215,7 @@ from llm_ie.engines import HuggingFaceHubInferenceEngine
 hf = HuggingFaceHubInferenceEngine(model="meta-llama/Meta-Llama-3-8B-Instruct")
 ```
 
-#### <img src=asset/readme_img/openai-logomark.png width=16 /> OpenAI API
+#### <img src=doc_asset/readme_img/openai-logomark.png width=16 /> OpenAI API
 In bash, save API key to the environmental variable ```OPENAI_API_KEY```.
 ```
 export OPENAI_API_KEY=<your_API_key>
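The README above (shipped twice, once as the PKG-INFO long description and once as README.md) says custom backends can be plugged in through the `InferenceEngine` abstract class. A minimal sketch of such a subclass, assuming the interface only requires the `chat()` signature visible in the engines.py hunks later in this diff; the echo backend is purely illustrative:

```python
from typing import List, Dict

from llm_ie.engines import InferenceEngine


class EchoInferenceEngine(InferenceEngine):
    """Illustrative engine that returns the last user message instead of calling an LLM."""

    def chat(self, messages: List[Dict[str, str]], max_new_tokens: int = 2048,
             temperature: float = 0.0, stream: bool = False, **kwrs) -> str:
        # A real backend would forward `messages` to an inference server here.
        return messages[-1]["content"]
```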
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "llm-ie"
-version = "0.1.1"
+version = "0.1.3"
 description = "An LLM-powered tool that transforms everyday language into robust information extraction pipelines."
 authors = ["Enshuo (David) Hsu"]
 license = "MIT"
@@ -10,8 +10,10 @@ exclude = [
 "test/**"
 ]
 
+
 [tool.poetry.dependencies]
 python = "^3.11"
+nltk = "^3.8"
 
 
 [build-system]
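Note that this hunk and the PKG-INFO hunk earlier record the same dependency bump from two sides: Poetry's caret constraint `nltk = "^3.8"` means `>=3.8,<4.0`, which is exactly the `Requires-Dist: nltk (>=3.8,<4.0)` line generated in the package metadata.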
@@ -0,0 +1,7 @@
+This is a text analysis task. Analyze the draft prompt based on the prompt guideline below.
+
+# Prompt guideline
+{{prompt_guideline}}
+
+# Draft prompt
+{{draft}}
@@ -0,0 +1,7 @@
+This is a prompt rewriting task. Rewrite the draft prompt following the prompt guideline below. DO NOT explain your answer.
+
+# Prompt guideline
+{{prompt_guideline}}
+
+# Draft prompt
+{{draft}}
@@ -0,0 +1,35 @@
+Prompt template design:
+1. Task description
+2. Schema definition
+3. Output format definition
+4. Additional hints
+5. Input placeholder
+
+Example:
+
+# Task description
+The paragraph below is from the Food and Drug Administration (FDA) Clinical Pharmacology Section of Labeling for Human Prescription Drug and Biological Products, Adverse reactions section. Please carefully review it and extract the adverse reactions and percentages. Note that each adverse reaction is nested under a clinical trial and potentially an arm. Your output should take that into consideration.
+
+# Schema definition
+Your output should contain:
+"ClinicalTrial" which is the name of the trial,
+If applicable, "Arm" which is the arm within the clinical trial,
+"AdverseReaction" which is the name of the adverse reaction,
+If applicable, "Percentage" which is the occurance of the adverse reaction within the trial and arm,
+"Evidence" which is the EXACT sentence in the text where you found the AdverseReaction from
+
+# Output format definition
+Your output should follow JSON format, for example:
+[
+{"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"},
+{"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"}
+]
+
+# Additional hints
+Your output should be 100% based on the provided content. DO NOT output fake numbers.
+If there is no specific arm, just omit the "Arm" key. If the percentage is not reported, just omit the "Percentage" key. The "Evidence" should always be provided.
+
+# Input placeholder
+Below is the Adverse reactions section:
+{{input}}
+
@@ -0,0 +1,35 @@
+Prompt template design:
+1. Task description
+2. Schema definition
+3. Output format definition
+4. Additional hints
+5. Input placeholder
+
+Example:
+
+# Task description
+The paragraph below is from the Food and Drug Administration (FDA) Clinical Pharmacology Section of Labeling for Human Prescription Drug and Biological Products, Adverse reactions section. Please carefully review it and extract the adverse reactions and percentages. Note that each adverse reaction is nested under a clinical trial and potentially an arm. Your output should take that into consideration.
+
+# Schema definition
+Your output should contain:
+"ClinicalTrial" which is the name of the trial,
+If applicable, "Arm" which is the arm within the clinical trial,
+"AdverseReaction" which is the name of the adverse reaction,
+If applicable, "Percentage" which is the occurance of the adverse reaction within the trial and arm,
+"Evidence" which is the EXACT sentence in the text where you found the AdverseReaction from
+
+# Output format definition
+Your output should follow JSON format, for example:
+[
+{"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"},
+{"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"}
+]
+
+# Additional hints
+Your output should be 100% based on the provided content. DO NOT output fake numbers.
+If there is no specific arm, just omit the "Arm" key. If the percentage is not reported, just omit the "Percentage" key. The "Evidence" should always be provided.
+
+# Input placeholder
+Below is the Adverse reactions section:
+{{input}}
+
@@ -0,0 +1,40 @@
+Prompt template design:
+1. Task description (mention the task is to extract information from sentences)
+2. Schema definition
+3. Output format definition
+4. Additional hints
+5. Input placeholder (mention user will feed sentence by sentence)
+
+Example:
+
+# Task description
+The paragraph below is from the Food and Drug Administration (FDA) Clinical Pharmacology Section of Labeling for Human Prescription Drug and Biological Products, Adverse Reactions section. Your task is to extract the adverse reactions in a given sentence (provided by user at a time). Note that adverse reactions can be nested under a clinical trial and potentially an arm. Your output should consider that.
+
+# Schema definition
+Your output should contain:
+If applicable, "ClinicalTrial" which is the name of the trial,
+If applicable, "Arm" which is the arm within the clinical trial,
+Must have, "AdverseReaction" which is the name of the adverse reaction spelled exactly as in the source document,
+If applicable, "Percentage" which is the occurrence of the adverse reaction within the trial and arm,
+
+# Output format definition
+Your output should follow JSON format,
+if there are adverse reaction mentions:
+[{"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>"},
+{"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>"}]
+if there is no adverse reaction in the given sentence, just output an empty list:
+[]
+
+# Additional hints
+Your output should be 100% based on the provided content. DO NOT output fake numbers.
+If there is no specific trial or arm, just omit the "ClinicalTrial" or "Arm" key. If the percentage is not reported, just omit the "Percentage" key.
+I am only interested in the content in JSON format. Do NOT generate explanation.
+
+The Adverse reactions section often has a sentence in the first paragraph:
+"The following clinically significant adverse reactions are described elsewhere in the labeling:..." Make sure to extract those adverse reaction mentions.
+The Adverse reactions section often has summary sentences like:
+"The most common adverse reactions were ...". Make sure to extract those adverse reaction mentions.
+
+# Input placeholder
+Below is the entire Adverse reactions section for your reference. I will feed you with sentences from it one by one.
+"{{input}}"
@@ -31,7 +31,6 @@ class InferenceEngine:
 
 
 class LlamaCppInferenceEngine(InferenceEngine):
-    from llama_cpp import Llama
     def __init__(self, repo_id:str, gguf_filename:str, n_ctx:int=4096, n_gpu_layers:int=-1, **kwrs):
         """
         The Llama.cpp inference engine.
@@ -48,13 +47,13 @@ class LlamaCppInferenceEngine(InferenceEngine):
         n_gpu_layers : int, Optional
             number of layers to offload to GPU. Default is all layers (-1).
         """
-        super().__init__()
+        from llama_cpp import Llama
         self.repo_id = repo_id
         self.gguf_filename = gguf_filename
         self.n_ctx = n_ctx
         self.n_gpu_layers = n_gpu_layers
 
-        self.model = self.Llama.from_pretrained(
+        self.model = Llama.from_pretrained(
             repo_id=self.repo_id,
             filename=self.gguf_filename,
             n_gpu_layers=n_gpu_layers,
@@ -106,7 +105,6 @@ class LlamaCppInferenceEngine(InferenceEngine):
 
 
 class OllamaInferenceEngine(InferenceEngine):
-    import ollama
     def __init__(self, model_name:str, num_ctx:int=4096, keep_alive:int=300, **kwrs):
         """
         The Ollama inference engine.
@@ -120,6 +118,8 @@ class OllamaInferenceEngine(InferenceEngine):
         keep_alive : int, Optional
             seconds to hold the LLM after the last API call.
         """
+        import ollama
+        self.ollama = ollama
         self.model_name = model_name
         self.num_ctx = num_ctx
         self.keep_alive = keep_alive
@@ -158,13 +158,13 @@
 
 
 class HuggingFaceHubInferenceEngine(InferenceEngine):
-    from huggingface_hub import InferenceClient
     def __init__(self, **kwrs):
         """
         The Huggingface_hub InferenceClient inference engine.
         For parameters and documentation, refer to https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client
         """
-        self.client = self.InferenceClient(**kwrs)
+        from huggingface_hub import InferenceClient
+        self.client = InferenceClient(**kwrs)
 
     def chat(self, messages:List[Dict[str,str]], max_new_tokens:int=2048, temperature:float=0.0, stream:bool=False, **kwrs) -> str:
         """
@@ -200,7 +200,6 @@ class HuggingFaceHubInferenceEngine(InferenceEngine):
 
 
 class OpenAIInferenceEngine(InferenceEngine):
-    from openai import OpenAI
    def __init__(self, model:str, **kwrs):
         """
         The OpenAI API inference engine.
@@ -211,7 +210,8 @@ class OpenAIInferenceEngine(InferenceEngine):
         model_name : str
             model name as described in https://platform.openai.com/docs/models
         """
-        self.client = self.OpenAI(**kwrs)
+        from openai import OpenAI
+        self.client = OpenAI(**kwrs)
         self.model = model
 
     def chat(self, messages:List[Dict[str,str]], max_new_tokens:int=2048, temperature:float=0.0, stream:bool=False, **kwrs) -> str:
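All four engine changes above apply the same pattern: the backend import moves from the class body into `__init__`. Class bodies execute at module import time, so in 0.1.1 importing llm_ie.engines required llama_cpp, ollama, huggingface_hub, and openai to all be installed; after this change, a backend is imported only when its engine is instantiated. A generic sketch of the pattern (not the library's exact code; `some_backend` is a hypothetical optional package):

```python
class LazyBackendEngine:
    def __init__(self, **kwargs):
        # Deferred import: the backend package is loaded only when this engine
        # is instantiated, so it stays an optional dependency of the library.
        import some_backend  # hypothetical optional package
        self.client = some_backend.Client(**kwargs)  # hypothetical client class
```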
@@ -1,6 +1,7 @@
 import os
 import abc
 import re
+import importlib.resources
 from typing import List, Dict, Tuple, Union
 from llm_ie.data_types import LLMInformationExtractionFrame
 from llm_ie.engines import InferenceEngine
@@ -27,7 +28,8 @@ class FrameExtractor:
 
     @classmethod
     def get_prompt_guide(cls) -> str:
-        with open(os.path.join('/home/daviden1013/David_projects/llm-ie', 'asset', 'prompt_guide', f"{cls.__name__}_prompt_guide.txt"), 'r') as f:
+        file_path = importlib.resources.files('llm_ie.asset.prompt_guide').joinpath(f"{cls.__name__}_prompt_guide.txt")
+        with open(file_path, 'r') as f:
             return f.read()
 
     def _get_user_prompt(self, text_content:Union[str, Dict[str,str]]) -> str:
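This change swaps a hard-coded developer path for importlib.resources, so the prompt-guide files resolve inside the installed package rather than on the author's machine. A standalone sketch of the same lookup (Python 3.9+; the filename below is hypothetical, since real names derive from the extractor class name at runtime):

```python
import importlib.resources

# Locate a data file bundled inside the installed llm_ie package.
guide = importlib.resources.files("llm_ie.asset.prompt_guide") \
    .joinpath("SomeExtractor_prompt_guide.txt")  # hypothetical filename
print(guide.read_text())
```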
@@ -0,0 +1,45 @@
+import os
+import importlib.resources
+from llm_ie.engines import InferenceEngine
+from llm_ie.extractors import FrameExtractor
+
+class PromptEditor:
+    def __init__(self, inference_engine:InferenceEngine, extractor:FrameExtractor):
+        """
+        This class is a LLM agent that rewrite or comment a prompt draft based on the prompt guide of an extractor.
+
+        Parameters
+        ----------
+        inference_engine : InferenceEngine
+            the LLM inferencing engine object. Must implements the chat() method.
+        extractor : FrameExtractor
+            a FrameExtractor.
+        """
+        self.inference_engine = inference_engine
+        self.prompt_guide = extractor.get_prompt_guide()
+
+    def rewrite(self, draft:str) -> str:
+        """
+        This method inputs a prompt draft and rewrites it following the extractor's guideline.
+        """
+        file_path = importlib.resources.files('llm_ie.asset.PromptEditor_prompts').joinpath('rewrite.txt')
+        with open(file_path, 'r') as f:
+            prompt = f.read()
+
+        prompt = prompt.replace("{{draft}}", draft).replace("{{prompt_guideline}}", self.prompt_guide)
+        messages = [{"role": "user", "content": prompt}]
+        res = self.inference_engine.chat(messages, stream=True)
+        return res
+
+    def comment(self, draft:str) -> str:
+        """
+        This method inputs a prompt draft and comment following the extractor's guideline.
+        """
+        file_path = importlib.resources.files('llm_ie.asset.PromptEditor_prompts').joinpath('comment.txt')
+        with open(file_path, 'r') as f:
+            prompt = f.read()
+
+        prompt = prompt.replace("{{draft}}", draft).replace("{{prompt_guideline}}", self.prompt_guide)
+        messages = [{"role": "user", "content": prompt}]
+        res = self.inference_engine.chat(messages, stream=True)
+        return res
@@ -1,26 +0,0 @@
-import os
-from llm_ie.engines import InferenceEngine
-from llm_ie.extractors import FrameExtractor
-
-class PromptEditor:
-    def __init__(self, inference_engine:InferenceEngine, extractor:FrameExtractor):
-        self.inference_engine = inference_engine
-        self.prompt_guide = extractor.get_prompt_guide()
-
-    def rewrite(self, draft:str) -> str:
-        with open(os.path.join('/home/daviden1013/David_projects/llm-ie', 'asset', 'PromptEditor_prompts', 'rewrite.txt'), 'r') as f:
-            prompt = f.read()
-
-        prompt = prompt.replace("{{draft}}", draft).replace("{{prompt_guideline}}", self.prompt_guide)
-        messages = [{"role": "user", "content": prompt}]
-        res = self.inference_engine.chat(messages, stream=True)
-        return res
-
-    def comment(self, draft:str) -> str:
-        with open(os.path.join('/home/daviden1013/David_projects/llm-ie', 'asset', 'PromptEditor_prompts', 'comment.txt'), 'r') as f:
-            prompt = f.read()
-
-        prompt = prompt.replace("{{draft}}", draft).replace("{{prompt_guideline}}", self.prompt_guide)
-        messages = [{"role": "user", "content": prompt}]
-        res = self.inference_engine.chat(messages, stream=True)
-        return res
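The two hunks above are the PromptEditor rewrite: the 0.1.3 version adds docstrings and loads its `rewrite.txt`/`comment.txt` templates via importlib.resources, replacing the 0.1.1 version's hard-coded absolute paths. A hedged usage sketch; the engine and extractor classes come from elsewhere in this diff, but the module path and model name are assumptions:

```python
from llm_ie.engines import OllamaInferenceEngine
from llm_ie.extractors import FrameExtractor  # in practice, a concrete subclass with a shipped prompt guide
from llm_ie.prompt_editor import PromptEditor  # module name assumed from the file added above

llm = OllamaInferenceEngine(model_name="llama3")  # illustrative model name
editor = PromptEditor(llm, FrameExtractor)  # get_prompt_guide() is a classmethod, so the class itself can be passed
improved = editor.rewrite("Extract adverse reactions and their percentages.")
feedback = editor.comment("Extract adverse reactions and their percentages.")
```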