llm-ie 0.2.2__tar.gz → 0.3.1__tar.gz
This diff shows the changes between publicly released versions of the package, as they appear in their public registry. It is provided for informational purposes only.
- {llm_ie-0.2.2 → llm_ie-0.3.1}/PKG-INFO +89 -44
- {llm_ie-0.2.2 → llm_ie-0.3.1}/README.md +87 -43
- {llm_ie-0.2.2 → llm_ie-0.3.1}/pyproject.toml +2 -1
- llm_ie-0.3.1/src/llm_ie/asset/PromptEditor_prompts/chat.txt +5 -0
- llm_ie-0.3.1/src/llm_ie/asset/PromptEditor_prompts/rewrite.txt +9 -0
- llm_ie-0.3.1/src/llm_ie/asset/PromptEditor_prompts/system.txt +1 -0
- llm_ie-0.3.1/src/llm_ie/asset/default_prompts/ReviewFrameExtractor_addition_review_prompt.txt +3 -0
- llm_ie-0.3.1/src/llm_ie/asset/default_prompts/ReviewFrameExtractor_revision_review_prompt.txt +2 -0
- llm_ie-0.3.1/src/llm_ie/asset/default_prompts/SentenceReviewFrameExtractor_addition_review_prompt.txt +4 -0
- llm_ie-0.3.1/src/llm_ie/asset/default_prompts/SentenceReviewFrameExtractor_revision_review_prompt.txt +3 -0
- llm_ie-0.3.1/src/llm_ie/asset/prompt_guide/BasicFrameExtractor_prompt_guide.txt +145 -0
- {llm_ie-0.2.2 → llm_ie-0.3.1}/src/llm_ie/asset/prompt_guide/BinaryRelationExtractor_prompt_guide.txt +32 -12
- {llm_ie-0.2.2 → llm_ie-0.3.1}/src/llm_ie/asset/prompt_guide/MultiClassRelationExtractor_prompt_guide.txt +35 -12
- llm_ie-0.3.1/src/llm_ie/asset/prompt_guide/ReviewFrameExtractor_prompt_guide.txt +145 -0
- llm_ie-0.3.1/src/llm_ie/asset/prompt_guide/SentenceCoTFrameExtractor_prompt_guide.txt +217 -0
- llm_ie-0.3.1/src/llm_ie/asset/prompt_guide/SentenceFrameExtractor_prompt_guide.txt +145 -0
- llm_ie-0.3.1/src/llm_ie/asset/prompt_guide/SentenceReviewFrameExtractor_prompt_guide.txt +145 -0
- {llm_ie-0.2.2 → llm_ie-0.3.1}/src/llm_ie/engines.py +1 -1
- {llm_ie-0.2.2 → llm_ie-0.3.1}/src/llm_ie/extractors.py +331 -24
- llm_ie-0.3.1/src/llm_ie/prompt_editor.py +187 -0
- llm_ie-0.2.2/src/llm_ie/asset/PromptEditor_prompts/rewrite.txt +0 -7
- llm_ie-0.2.2/src/llm_ie/asset/prompt_guide/BasicFrameExtractor_prompt_guide.txt +0 -35
- llm_ie-0.2.2/src/llm_ie/asset/prompt_guide/ReviewFrameExtractor_prompt_guide.txt +0 -35
- llm_ie-0.2.2/src/llm_ie/asset/prompt_guide/SentenceFrameExtractor_prompt_guide.txt +0 -40
- llm_ie-0.2.2/src/llm_ie/prompt_editor.py +0 -45
- {llm_ie-0.2.2 → llm_ie-0.3.1}/src/llm_ie/__init__.py +0 -0
- {llm_ie-0.2.2 → llm_ie-0.3.1}/src/llm_ie/asset/PromptEditor_prompts/comment.txt +0 -0
- {llm_ie-0.2.2 → llm_ie-0.3.1}/src/llm_ie/data_types.py +0 -0
{llm_ie-0.2.2 → llm_ie-0.3.1}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: llm-ie
-Version: 0.2.2
+Version: 0.3.1
 Summary: An LLM-powered tool that transforms everyday language into robust information extraction pipelines.
 License: MIT
 Author: Enshuo (David) Hsu
@@ -9,6 +9,7 @@ Classifier: License :: OSI Approved :: MIT License
 Classifier: Programming Language :: Python :: 3
 Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
+Requires-Dist: colorama (>=0.4.6,<0.5.0)
 Requires-Dist: nltk (>=3.8,<4.0)
 Description-Content-Type: text/markdown
 
@@ -20,6 +21,13 @@ Description-Content-Type: text/markdown
 
 An LLM-powered tool that transforms everyday language into robust information extraction pipelines.
 
+| Features | Support |
+|----------|----------|
+| **LLM Agent for prompt writing** | :white_check_mark: Interactive chat, Python functions |
+| **Named Entity Recognition (NER)** | :white_check_mark: Document-level, Sentence-level |
+| **Entity Attributes Extraction** | :white_check_mark: Flexible formats |
+| **Relation Extraction (RE)** | :white_check_mark: Binary & Multiclass relations |
+
 ## Table of Contents
 - [Overview](#overview)
 - [Prerequisite](#prerequisite)
@@ -35,12 +43,12 @@ An LLM-powered tool that transforms everyday language into robust information ex
 - [RelationExtractor](#relationextractor)
 
 ## Overview
-LLM-IE is a toolkit that provides robust information extraction utilities for
+LLM-IE is a toolkit that provides robust information extraction utilities for named entity, entity attribute, and entity relation extraction. Since prompt design has a significant impact on generative information extraction with LLMs, it has a built-in LLM agent ("editor") to help with prompt writing. The flowchart below demonstrates the workflow, from a casual language request to output visualization.
 
 <div align="center"><img src="doc_asset/readme_img/LLM-IE flowchart.png" width=800 ></div>
 
 ## Prerequisite
-At least one LLM inference engine is required. There are built-in supports for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), <img src=doc_asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction), and <img src=doc_asset/readme_img/vllm-logo.png width=20 /> vLLM. For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See [LLM Inference Engine](#llm-inference-engine) section below.
+At least one LLM inference engine is required. There is built-in support for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), <img src=doc_asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction), and <img src=doc_asset/readme_img/vllm-logo.png width=20 /> [vLLM](https://github.com/vllm-project/vllm). For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See the [LLM Inference Engine](#llm-inference-engine) section below.
 
 ## Installation
 The Python package is available on PyPI.
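The hunk above points to the `InferenceEngine` abstract class in `src/llm_ie/engines.py` for plugging in other backends. Below is a minimal sketch of such an adapter, assuming an OpenAI-compatible HTTP server; the `chat` method name and signature are illustrative assumptions, not the package's actual abstract interface.

```python
# A minimal sketch of a custom engine adapter, assuming an OpenAI-compatible
# HTTP server. The real abstract methods live in src/llm_ie/engines.py; the
# `chat` name and signature below are illustrative assumptions.
from typing import Dict, List

import requests

from llm_ie.engines import InferenceEngine


class MyHTTPInferenceEngine(InferenceEngine):
    """Forwards chat messages to a custom OpenAI-compatible endpoint."""

    def __init__(self, base_url: str, model_name: str):
        self.base_url = base_url
        self.model_name = model_name

    def chat(self, messages: List[Dict[str, str]], max_new_tokens: int = 2048) -> str:
        # POST the conversation and return the completion text.
        resp = requests.post(
            f"{self.base_url}/v1/chat/completions",
            json={"model": self.model_name,
                  "messages": messages,
                  "max_tokens": max_new_tokens},
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]
```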
@@ -125,21 +133,26 @@ We start with a casual description:
 
 *"Extract diagnosis from the clinical note. Make sure to include diagnosis date and status."*
 
-
-
-
+Define the AI prompt editor:
+```python
+from llm_ie.engines import OllamaInferenceEngine
 from llm_ie.extractors import BasicFrameExtractor
 from llm_ie.prompt_editor import PromptEditor
 
-#
-
-
-# Use LLM editor to generate a formal prompt template with standard extraction schema
+# Define an LLM inference engine
+llm = OllamaInferenceEngine(model_name="llama3.1:8b-instruct-q8_0")
+# Define the LLM prompt editor
 editor = PromptEditor(llm, BasicFrameExtractor)
-
+# Start chat
+editor.chat()
 ```
 
-
+This opens an interactive session:
+<div align="left"><img src=doc_asset/readme_img/terminal_chat.PNG width=1000 ></div>
+
+
+The ```PromptEditor``` drafts a prompt template following the schema required by the ```BasicFrameExtractor```:
+
 ```
 # Task description
 The paragraph below contains a clinical note with diagnoses listed. Please carefully review it and extract the diagnoses, including the diagnosis date and status.
@@ -165,6 +178,8 @@ If there is no specific date or status, just omit those keys.
 Below is the clinical note:
 {{input}}
 ```
+
+
 #### Information extraction pipeline
 Now we apply the prompt template to build an information extraction pipeline.
 
@@ -202,15 +217,33 @@ from llm_ie.data_types import LLMInformationExtractionDocument
 doc = LLMInformationExtractionDocument(doc_id="Synthesized medical note",
                                        text=note_text)
 # Add frames to a document
-
-doc.add_frame(frame, valid_mode="span", create_id=True)
+doc.add_frames(frames, create_id=True)
 
 # Save document to file (.llmie)
 doc.save("<your filename>.llmie")
 ```
 
+To visualize the extracted frames, we use the ```viz_serve()``` method.
+```python
+doc.viz_serve()
+```
+A Flask app starts on port 5000 (default).
+```
+ * Serving Flask app 'ie_viz.utilities'
+ * Debug mode: off
+WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
+ * Running on all addresses (0.0.0.0)
+ * Running on http://127.0.0.1:5000
+Press CTRL+C to quit
+127.0.0.1 - - [03/Oct/2024 23:36:22] "GET / HTTP/1.1" 200 -
+```
+
+<div align="left"><img src="doc_asset/readme_img/llm-ie_demo.PNG" width=1000 ></div>
+
+
 ## Examples
-- [
+- [Interactive chat with LLM prompt editors](demo/prompt_template_writing_via_chat.ipynb)
+- [Write prompt templates with LLM prompt editors](demo/prompt_template_writing.ipynb)
 - [NER + RE for Drug, Strength, Frequency](demo/medication_relation_extraction.ipynb)
 
 ## User Guide
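Putting the pieces of this hunk together, an end-to-end run could look roughly like the sketch below. The `BasicFrameExtractor` constructor arguments and the `extract_frames` call are assumptions inferred from the `frames` variable in the diff, not confirmed signatures.

```python
# A rough end-to-end sketch of the pipeline in the hunk above.
# Assumptions: BasicFrameExtractor takes (engine, prompt template), and
# extraction happens via a hypothetical `extract_frames` call.
from llm_ie.engines import OllamaInferenceEngine
from llm_ie.extractors import BasicFrameExtractor
from llm_ie.data_types import LLMInformationExtractionDocument

note_text = "Type 2 diabetes, diagnosed 2018, currently well controlled."  # toy input
prompt_template = "Extract diagnoses with date and status ... {{input}}"   # template drafted earlier

llm = OllamaInferenceEngine(model_name="llama3.1:8b-instruct-q8_0")
extractor = BasicFrameExtractor(llm, prompt_template)

frames = extractor.extract_frames(text_content=note_text)  # hypothetical call

doc = LLMInformationExtractionDocument(doc_id="Synthesized medical note",
                                       text=note_text)
doc.add_frames(frames, create_id=True)  # as in the diff above
doc.save("note.llmie")                  # writes the .llmie file
doc.viz_serve()                         # Flask viewer on port 5000 by default
```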
@@ -435,7 +468,30 @@ print(BasicFrameExtractor.get_prompt_guide())
 ```
 
 ### Prompt Editor
-The prompt editor is an LLM agent that
+The prompt editor is an LLM agent that helps users write prompt templates following the defined schema and guidelines of each extractor. Chat with the prompt editor:
+
+```python
+from llm_ie.prompt_editor import PromptEditor
+from llm_ie.extractors import BasicFrameExtractor
+from llm_ie.engines import OllamaInferenceEngine
+
+# Define an LLM inference engine
+ollama = OllamaInferenceEngine(model_name="llama3.1:8b-instruct-q8_0")
+
+# Define editor
+editor = PromptEditor(ollama, BasicFrameExtractor)
+
+editor.chat()
+```
+
+In a terminal environment, an interactive chat session will start:
+<div align="left"><img src=doc_asset/readme_img/terminal_chat.PNG width=1000 ></div>
+
+In a Jupyter/IPython environment, an ipywidgets session will start:
+<div align="left"><img src=doc_asset/readme_img/IPython_chat.PNG width=1000 ></div>
+
+
+We can also use the `rewrite()` and `comment()` methods to interact with the prompt editor programmatically:
 1. start with a casual description of the task
 2. have the prompt editor generate a prompt template as the starting point
 3. manually revise the prompt template
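A minimal sketch of that three-step loop using the `rewrite()` and `comment()` methods named in the hunk; the exact signatures (a single string in, text out) are assumptions.

```python
from llm_ie.engines import OllamaInferenceEngine
from llm_ie.extractors import BasicFrameExtractor
from llm_ie.prompt_editor import PromptEditor

ollama = OllamaInferenceEngine(model_name="llama3.1:8b-instruct-q8_0")
editor = PromptEditor(ollama, BasicFrameExtractor)

# 1. Casual description of the task
draft = "Extract diagnosis from the clinical note. Include diagnosis date and status."

# 2. Let the editor expand it into a schema-conformant template
#    (assumed signature: draft string in, rewritten template out)
template = editor.rewrite(draft)

# 3. After revising the template manually, ask the editor to critique it
feedback = editor.comment(template)
print(feedback)
```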
@@ -581,40 +637,29 @@ print(BasicFrameExtractor.get_prompt_guide())
 ```
 
 ```
-Prompt
-1. Task description
-2. Schema definition
-3. Output format definition
-4. Additional hints
-5. Input placeholder
+Prompt Template Design:
 
-
+1. Task Description:
+Provide a detailed description of the task, including the background and the type of task (e.g., named entity recognition).
 
-
-
+2. Schema Definition:
+List the key concepts that should be extracted, and provide clear definitions for each one.
 
-
-
-"ClinicalTrial" which is the name of the trial,
-If applicable, "Arm" which is the arm within the clinical trial,
-"AdverseReaction" which is the name of the adverse reaction,
-If applicable, "Percentage" which is the occurance of the adverse reaction within the trial and arm,
-"Evidence" which is the EXACT sentence in the text where you found the AdverseReaction from
+3. Output Format Definition:
+The output should be a JSON list, where each element is a dictionary representing a frame (an entity along with its attributes). Each dictionary must include a key that holds the entity text. This key can be named "entity_text" or anything else depending on the context. The attributes can either be flat (e.g., {"entity_text": "<entity_text>", "attr1": "<attr1>", "attr2": "<attr2>"}) or nested (e.g., {"entity_text": "<entity_text>", "attributes": {"attr1": "<attr1>", "attr2": "<attr2>"}}).
 
-
-
-
-
-
-
+4. Optional: Hints:
+Provide itemized hints for the information extractors to guide the extraction process.
+
+5. Optional: Examples:
+Include examples in the format:
+Input: ...
+Output: ...
 
-
-
-If there is no specific arm, just omit the "Arm" key. If the percentage is not reported, just omit the "Percentage" key. The "Evidence" should always be provided.
+6. Input Placeholder:
+The template must include a placeholder in the format {{<placeholder_name>}} for the input text. The placeholder name can be customized as needed.
 
-
-Below is the Adverse reactions section:
-{{input}}
+......
 ```
 </details>
 
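The rewritten output-format guideline above permits both flat and nested attribute layouts. The helper below, independent of llm-ie, normalizes either layout into `(entity_text, attributes)` pairs; the key names follow the guideline's own examples.

```python
import json

def normalize_frames(llm_output: str, entity_key: str = "entity_text"):
    """Yield (entity_text, attributes) pairs from a JSON frame list,
    accepting both the flat and the nested layouts described above."""
    for frame in json.loads(llm_output):
        if "attributes" in frame:                          # nested layout
            attributes = dict(frame["attributes"])
        else:                                              # flat layout
            attributes = {k: v for k, v in frame.items() if k != entity_key}
        yield frame[entity_key], attributes

flat = '[{"entity_text": "headache", "status": "active"}]'
nested = '[{"entity_text": "headache", "attributes": {"status": "active"}}]'
assert list(normalize_frames(flat)) == list(normalize_frames(nested))
```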
{llm_ie-0.2.2 → llm_ie-0.3.1}/README.md

@@ -6,6 +6,13 @@
 
 An LLM-powered tool that transforms everyday language into robust information extraction pipelines.
 
+| Features | Support |
+|----------|----------|
+| **LLM Agent for prompt writing** | :white_check_mark: Interactive chat, Python functions |
+| **Named Entity Recognition (NER)** | :white_check_mark: Document-level, Sentence-level |
+| **Entity Attributes Extraction** | :white_check_mark: Flexible formats |
+| **Relation Extraction (RE)** | :white_check_mark: Binary & Multiclass relations |
+
 ## Table of Contents
 - [Overview](#overview)
 - [Prerequisite](#prerequisite)
@@ -21,12 +28,12 @@ An LLM-powered tool that transforms everyday language into robust information ex
 - [RelationExtractor](#relationextractor)
 
 ## Overview
-LLM-IE is a toolkit that provides robust information extraction utilities for
+LLM-IE is a toolkit that provides robust information extraction utilities for named entity, entity attribute, and entity relation extraction. Since prompt design has a significant impact on generative information extraction with LLMs, it has a built-in LLM agent ("editor") to help with prompt writing. The flowchart below demonstrates the workflow, from a casual language request to output visualization.
 
 <div align="center"><img src="doc_asset/readme_img/LLM-IE flowchart.png" width=800 ></div>
 
 ## Prerequisite
-At least one LLM inference engine is required. There are built-in supports for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), <img src=doc_asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction), and <img src=doc_asset/readme_img/vllm-logo.png width=20 /> vLLM. For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See [LLM Inference Engine](#llm-inference-engine) section below.
+At least one LLM inference engine is required. There is built-in support for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), <img src=doc_asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction), and <img src=doc_asset/readme_img/vllm-logo.png width=20 /> [vLLM](https://github.com/vllm-project/vllm). For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See the [LLM Inference Engine](#llm-inference-engine) section below.
 
 ## Installation
 The Python package is available on PyPI.
@@ -111,21 +118,26 @@ We start with a casual description:
 
 *"Extract diagnosis from the clinical note. Make sure to include diagnosis date and status."*
 
-
-
-
+Define the AI prompt editor:
+```python
+from llm_ie.engines import OllamaInferenceEngine
 from llm_ie.extractors import BasicFrameExtractor
 from llm_ie.prompt_editor import PromptEditor
 
-#
-
-
-# Use LLM editor to generate a formal prompt template with standard extraction schema
+# Define an LLM inference engine
+llm = OllamaInferenceEngine(model_name="llama3.1:8b-instruct-q8_0")
+# Define the LLM prompt editor
 editor = PromptEditor(llm, BasicFrameExtractor)
-
+# Start chat
+editor.chat()
 ```
 
-
+This opens an interactive session:
+<div align="left"><img src=doc_asset/readme_img/terminal_chat.PNG width=1000 ></div>
+
+
+The ```PromptEditor``` drafts a prompt template following the schema required by the ```BasicFrameExtractor```:
+
 ```
 # Task description
 The paragraph below contains a clinical note with diagnoses listed. Please carefully review it and extract the diagnoses, including the diagnosis date and status.
@@ -151,6 +163,8 @@ If there is no specific date or status, just omit those keys.
 Below is the clinical note:
 {{input}}
 ```
+
+
 #### Information extraction pipeline
 Now we apply the prompt template to build an information extraction pipeline.
 
@@ -188,15 +202,33 @@ from llm_ie.data_types import LLMInformationExtractionDocument
 doc = LLMInformationExtractionDocument(doc_id="Synthesized medical note",
                                        text=note_text)
 # Add frames to a document
-
-doc.add_frame(frame, valid_mode="span", create_id=True)
+doc.add_frames(frames, create_id=True)
 
 # Save document to file (.llmie)
 doc.save("<your filename>.llmie")
 ```
 
+To visualize the extracted frames, we use the ```viz_serve()``` method.
+```python
+doc.viz_serve()
+```
+A Flask app starts on port 5000 (default).
+```
+ * Serving Flask app 'ie_viz.utilities'
+ * Debug mode: off
+WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
+ * Running on all addresses (0.0.0.0)
+ * Running on http://127.0.0.1:5000
+Press CTRL+C to quit
+127.0.0.1 - - [03/Oct/2024 23:36:22] "GET / HTTP/1.1" 200 -
+```
+
+<div align="left"><img src="doc_asset/readme_img/llm-ie_demo.PNG" width=1000 ></div>
+
+
 ## Examples
-- [
+- [Interactive chat with LLM prompt editors](demo/prompt_template_writing_via_chat.ipynb)
+- [Write prompt templates with LLM prompt editors](demo/prompt_template_writing.ipynb)
 - [NER + RE for Drug, Strength, Frequency](demo/medication_relation_extraction.ipynb)
 
 ## User Guide
@@ -421,7 +453,30 @@ print(BasicFrameExtractor.get_prompt_guide())
 ```
 
 ### Prompt Editor
-The prompt editor is an LLM agent that
+The prompt editor is an LLM agent that helps users write prompt templates following the defined schema and guidelines of each extractor. Chat with the prompt editor:
+
+```python
+from llm_ie.prompt_editor import PromptEditor
+from llm_ie.extractors import BasicFrameExtractor
+from llm_ie.engines import OllamaInferenceEngine
+
+# Define an LLM inference engine
+ollama = OllamaInferenceEngine(model_name="llama3.1:8b-instruct-q8_0")
+
+# Define editor
+editor = PromptEditor(ollama, BasicFrameExtractor)
+
+editor.chat()
+```
+
+In a terminal environment, an interactive chat session will start:
+<div align="left"><img src=doc_asset/readme_img/terminal_chat.PNG width=1000 ></div>
+
+In a Jupyter/IPython environment, an ipywidgets session will start:
+<div align="left"><img src=doc_asset/readme_img/IPython_chat.PNG width=1000 ></div>
+
+
+We can also use the `rewrite()` and `comment()` methods to interact with the prompt editor programmatically:
 1. start with a casual description of the task
 2. have the prompt editor generate a prompt template as the starting point
 3. manually revise the prompt template
@@ -567,40 +622,29 @@ print(BasicFrameExtractor.get_prompt_guide())
 ```
 
 ```
-Prompt
-1. Task description
-2. Schema definition
-3. Output format definition
-4. Additional hints
-5. Input placeholder
+Prompt Template Design:
 
-
+1. Task Description:
+Provide a detailed description of the task, including the background and the type of task (e.g., named entity recognition).
 
-
-
+2. Schema Definition:
+List the key concepts that should be extracted, and provide clear definitions for each one.
 
-
-
-"ClinicalTrial" which is the name of the trial,
-If applicable, "Arm" which is the arm within the clinical trial,
-"AdverseReaction" which is the name of the adverse reaction,
-If applicable, "Percentage" which is the occurance of the adverse reaction within the trial and arm,
-"Evidence" which is the EXACT sentence in the text where you found the AdverseReaction from
+3. Output Format Definition:
+The output should be a JSON list, where each element is a dictionary representing a frame (an entity along with its attributes). Each dictionary must include a key that holds the entity text. This key can be named "entity_text" or anything else depending on the context. The attributes can either be flat (e.g., {"entity_text": "<entity_text>", "attr1": "<attr1>", "attr2": "<attr2>"}) or nested (e.g., {"entity_text": "<entity_text>", "attributes": {"attr1": "<attr1>", "attr2": "<attr2>"}}).
 
-
-
-
-
-
-
+4. Optional: Hints:
+Provide itemized hints for the information extractors to guide the extraction process.
+
+5. Optional: Examples:
+Include examples in the format:
+Input: ...
+Output: ...
 
-
-
-If there is no specific arm, just omit the "Arm" key. If the percentage is not reported, just omit the "Percentage" key. The "Evidence" should always be provided.
+6. Input Placeholder:
+The template must include a placeholder in the format {{<placeholder_name>}} for the input text. The placeholder name can be customized as needed.
 
-
-Below is the Adverse reactions section:
-{{input}}
+......
 ```
 </details>
 
{llm_ie-0.2.2 → llm_ie-0.3.1}/pyproject.toml

@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "llm-ie"
-version = "0.2.2"
+version = "0.3.1"
 description = "An LLM-powered tool that transforms everyday language into robust information extraction pipelines."
 authors = ["Enshuo (David) Hsu"]
 license = "MIT"

@@ -14,6 +14,7 @@ exclude = [
 [tool.poetry.dependencies]
 python = "^3.11"
 nltk = "^3.8"
+colorama = "^0.4.6"
 
 
 [build-system]
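The only dependency change is the new `colorama` pin, which presumably backs the colored terminal chat shown earlier (an assumption; the diff does not state where it is used). For reference, colorama's basic cross-platform pattern:

```python
# Cross-platform colored terminal output with colorama (real API: init,
# Fore, Style). Whether llm-ie uses exactly this pattern is an assumption.
from colorama import Fore, Style, init

init()  # enable ANSI color handling on Windows consoles
print(Fore.GREEN + "editor:" + Style.RESET_ALL + " Here is a draft prompt template...")
```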
llm_ie-0.3.1/src/llm_ie/asset/PromptEditor_prompts/system.txt

@@ -0,0 +1 @@
+You are an AI assistant specializing in prompt writing and improvement. Your role is to help users refine, rewrite, and generate effective prompts based on guidelines provided. You are highly knowledgeable in extracting key information and adhering to structured formats. During interactions, you will engage in clear, insightful, and context-aware conversations, providing thoughtful responses to assist the user. Maintain a polite, professional tone and ensure each response adds value to the conversation, promoting clarity and creativity in the user's prompts. If users ask about irrelevant topics (not related to prompt development), you will politely decline to answer and guide the conversation back to prompt development.
llm_ie-0.3.1/src/llm_ie/asset/default_prompts/SentenceReviewFrameExtractor_addition_review_prompt.txt

@@ -0,0 +1,4 @@
+Review the input sentence and your output carefully. If anything was missed, add it to your output following the defined output formats.
+You should ONLY add new items. Do NOT re-generate the entire answer.
+Your output should be based on the input sentence.
+Your output should strictly adhere to the defined output formats.
llm_ie-0.3.1/src/llm_ie/asset/default_prompts/SentenceReviewFrameExtractor_revision_review_prompt.txt

@@ -0,0 +1,3 @@
+Review the input sentence and your output carefully. If you find any omissions or errors, correct them by generating a revised output following the defined output formats.
+Your output should be based on the input sentence.
+Your output should strictly adhere to the defined output formats.
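These two files supply the second pass of the sentence-level review extractor: after the initial extraction, the model is asked either to add missed items or to revise its answer. A rough sketch of that loop, written against a generic chat callable rather than the package's internal API:

```python
from typing import Callable, Dict, List

# Text of the new addition-review prompt from the diff above.
ADDITION_REVIEW_PROMPT = (
    "Review the input sentence and your output carefully. If anything was "
    "missed, add it to your output following the defined output formats."
)

def review_pass(chat: Callable[[List[Dict[str, str]]], str],
                extraction_prompt: str, sentence: str) -> str:
    """Two-pass extraction sketch: initial answer, then an addition review."""
    messages = [{"role": "user",
                 "content": extraction_prompt.replace("{{input}}", sentence)}]
    first_answer = chat(messages)                       # pass 1: extract
    messages += [{"role": "assistant", "content": first_answer},
                 {"role": "user", "content": ADDITION_REVIEW_PROMPT}]
    return chat(messages)                               # pass 2: review/extend
```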
llm_ie-0.3.1/src/llm_ie/asset/prompt_guide/BasicFrameExtractor_prompt_guide.txt

@@ -0,0 +1,145 @@
+Prompt Template Design:
+
+1. Task Description:
+Provide a detailed description of the task, including the background and the type of task (e.g., named entity recognition).
+
+2. Schema Definition:
+List the key concepts that should be extracted, and provide clear definitions for each one.
+
+3. Output Format Definition:
+The output should be a JSON list, where each element is a dictionary representing a frame (an entity along with its attributes). Each dictionary must include a key that holds the entity text. This key can be named "entity_text" or anything else depending on the context. The attributes can either be flat (e.g., {"entity_text": "<entity_text>", "attr1": "<attr1>", "attr2": "<attr2>"}) or nested (e.g., {"entity_text": "<entity_text>", "attributes": {"attr1": "<attr1>", "attr2": "<attr2>"}}).
+
+4. Optional: Hints:
+Provide itemized hints for the information extractors to guide the extraction process.
+
+5. Optional: Examples:
+Include examples in the format:
+Input: ...
+Output: ...
+
+6. Input Placeholder:
+The template must include a placeholder in the format {{<placeholder_name>}} for the input text. The placeholder name can be customized as needed.
+
+
+Example 1 (single entity type with attributes):
+
+# Task description
+The paragraph below is from the Food and Drug Administration (FDA) Clinical Pharmacology Section of Labeling for Human Prescription Drug and Biological Products, Adverse reactions section. Please carefully review it and extract the adverse reactions and percentages. Note that each adverse reaction is nested under a clinical trial and potentially an arm. Your output should take that into consideration.
+
+# Schema definition
+Your output should contain:
+"ClinicalTrial" which is the name of the trial,
+If applicable, "Arm" which is the arm within the clinical trial,
+"AdverseReaction" which is the name of the adverse reaction,
+If applicable, "Percentage" which is the occurrence of the adverse reaction within the trial and arm,
+"Evidence" which is the EXACT sentence in the text where you found the AdverseReaction from
+
+# Output format definition
+Your output should follow JSON format, for example:
+[
+{"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"},
+{"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"}
+]
+
+# Additional hints
+Your output should be 100% based on the provided content. DO NOT output fake numbers.
+If there is no specific arm, just omit the "Arm" key. If the percentage is not reported, just omit the "Percentage" key. The "Evidence" should always be provided.
+
+# Input placeholder
+Below is the Adverse reactions section:
+{{input}}
+
+
+Example 2 (multiple entity types):
+
+# Task description
+This is a named entity recognition task. Given a medical note, annotate the Drug, Form, Strength, Frequency, Route, Dosage, Reason, ADE, and Duration.
+
+# Schema definition
+Your output should contain:
+"entity_text": the exact wording as mentioned in the note.
+"entity_type": type of the entity. It should be one of the "Drug", "Form", "Strength", "Frequency", "Route", "Dosage", "Reason", "ADE", or "Duration".
+
+# Output format definition
+Your output should follow JSON format,
+if there are any of the entity mentions: Drug, Form, Strength, Frequency, Route, Dosage, Reason, ADE, or Duration:
+[{"entity_text": "<Exact entity mentions as in the note>", "entity_type": "<entity type as listed above>"},
+{"entity_text": "<Exact entity mentions as in the note>", "entity_type": "<entity type as listed above>"}]
+if there is no entity mentioned in the given note, just output an empty list:
+[]
+
+I am only interested in the extracted contents in []. Do NOT explain your answer.
+
+# Examples
+Below are some examples:
+
+Input: Acetaminophen 650 mg PO BID 5.
+Output: [{"entity_text": "Acetaminophen", "entity_type": "Drug"}, {"entity_text": "650 mg", "entity_type": "Strength"}, {"entity_text": "PO", "entity_type": "Route"}, {"entity_text": "BID", "entity_type": "Frequency"}]
+
+Input: Mesalamine DR 1200 mg PO BID 2.
+Output: [{"entity_text": "Mesalamine DR", "entity_type": "Drug"}, {"entity_text": "1200 mg", "entity_type": "Strength"}, {"entity_text": "BID", "entity_type": "Frequency"}, {"entity_text": "PO", "entity_type": "Route"}]
+
+
+# Input placeholder
+Below is the medical note:
+"{{input}}"
+
+
+Example 3 (multiple entity types with corresponding attributes):
+
+# Task description
+This is a named entity recognition task. Given a medical note, annotate the events (EVENT) and time expressions (TIMEX3):
+
+# Schema definition
+Your output should contain:
+"entity_text": the exact wording as mentioned in the note.
+"entity_type": type of the entity. It should be one of the "EVENT" or "TIMEX3".
+if entity_type is "EVENT",
+"type": the event type as one of the "TEST", "PROBLEM", "TREATMENT", "CLINICAL_DEPT", "EVIDENTIAL", or "OCCURRENCE".
+"polarity": whether an EVENT is positive ("POS") or negative ("NEG"). For example, in “the patient reports headache, and denies chills”, the EVENT [headache] is positive in its polarity, and the EVENT [chills] is negative in its polarity.
+"modality": whether an EVENT actually occurred or not. Must be one of the "FACTUAL", "CONDITIONAL", "POSSIBLE", or "PROPOSED".
+
+if entity_type is "TIMEX3",
+"type": the type as one of the "DATE", "TIME", "DURATION", or "FREQUENCY".
+"val": the numeric value 1) DATE: [YYYY]-[MM]-[DD], 2) TIME: [hh]:[mm]:[ss], 3) DURATION: P[n][Y/M/W/D]. So, “for eleven days” will be
+represented as “P11D”, meaning a period of 11 days. 4) R[n][duration], where n denotes the number of repeats. When the n is omitted, the expression denotes an unspecified amount of repeats. For example, “once a day for 3 days” is “R3P1D” (repeat the time interval of 1 day (P1D) for 3 times (R3)), twice every day is “RP12H” (repeat every 12 hours)
+"mod": additional information regarding the temporal value of a time expression. Must be one of the:
+“NA”: the default value, no relevant modifier is present;
+“MORE”, means “more than”, e.g. over 2 days (val = P2D, mod = MORE);
+“LESS”, means “less than”, e.g. almost 2 months (val = P2M, mod=LESS);
+“APPROX”, means “approximate”, e.g. nearly a week (val = P1W, mod=APPROX);
+“START”, describes the beginning of a period of time, e.g. Christmas morning, 2005 (val= 2005-12-25, mod= START).
+“END”, describes the end of a period of time, e.g. late last year, (val = 2010, mod = END)
+“MIDDLE”, describes the middle of a period of time, e.g. mid-September 2001 (val = 2001-09, mod = MIDDLE)
+
+# Output format definition
+Your output should follow JSON format,
+if there are any of the EVENT or TIMEX3 entity mentions:
+[
+{"entity_text": "<Exact entity mentions as in the note>", "entity_type": "EVENT", "type": "<event type>", "polarity": "<event polarity>", "modality": "<event modality>"},
+{"entity_text": "<Exact entity mentions as in the note>", "entity_type": "TIMEX3", "type": "<TIMEX3 type>", "val": "<time value>", "mod": "<additional information>"}
+...
+]
+if there is no entity mentioned in the given note, just output an empty list:
+[]
+
+I am only interested in the extracted contents in []. Do NOT explain your answer.
+
+# Examples
+Below are some examples:
+
+Input: At 9/7/93 , 1:00 a.m. , intravenous fluids rate was decreased to 50 cc's per hour , total fluids given during the first 24 hours were 140 to 150 cc's per kilo per day .
+Output: [{"entity_text": "intravenous fluids", "entity_type": "EVENT", "type": "TREATMENT", "polarity": "POS", "modality": "FACTUAL"},
+{"entity_text": "decreased", "entity_type": "EVENT", "type": "OCCURRENCE", "polarity": "POS", "modality": "FACTUAL"},
+{"entity_text": "total fluids", "entity_type": "EVENT", "type": "TREATMENT", "polarity": "POS", "modality": "FACTUAL"},
+{"entity_text": "9/7/93 , 1:00 a.m.", "entity_type": "TIMEX3", "type": "TIME", "val": "1993-09-07T01:00", "mod": "NA"},
+{"entity_text": "24 hours", "entity_type": "TIMEX3", "type": "DURATION", "val": "PT24H", "mod": "NA"}]
+
+Input: At that time it appeared well adhered to the underlying skin .
+Output: [{"entity_text": "it", "entity_type": "EVENT", "type": "TREATMENT", "polarity": "POS", "modality": "FACTUAL"},
+{"entity_text": "well adhered", "entity_type": "EVENT", "type": "OCCURRENCE", "polarity": "POS", "modality": "FACTUAL"}]
+
+
+# Input placeholder
+Below is the entire medical note:
+"{{input}}"
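Example 2 above fixes both the output shape and the allowed entity types, so the result can be machine-checked. A small validator for that format, independent of llm-ie:

```python
import json

# Entity types fixed by Example 2's schema definition.
ALLOWED_TYPES = {"Drug", "Form", "Strength", "Frequency", "Route",
                 "Dosage", "Reason", "ADE", "Duration"}

def validate_ner_output(llm_output: str) -> list:
    """Parse and sanity-check output in Example 2's format: a JSON list of
    {"entity_text": ..., "entity_type": ...} dictionaries."""
    frames = json.loads(llm_output)
    if not isinstance(frames, list):
        raise ValueError("expected a JSON list")
    for frame in frames:
        if set(frame) != {"entity_text", "entity_type"}:
            raise ValueError(f"unexpected keys: {sorted(frame)}")
        if frame["entity_type"] not in ALLOWED_TYPES:
            raise ValueError(f"unknown entity type: {frame['entity_type']}")
    return frames

sample = ('[{"entity_text": "Acetaminophen", "entity_type": "Drug"}, '
          '{"entity_text": "650 mg", "entity_type": "Strength"}]')
print(validate_ner_output(sample))  # passes: matches the guide's example
```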