llm-ie 0.3.5__tar.gz → 0.4.1__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {llm_ie-0.3.5 → llm_ie-0.4.1}/PKG-INFO +361 -109
- {llm_ie-0.3.5 → llm_ie-0.4.1}/README.md +361 -109
- {llm_ie-0.3.5 → llm_ie-0.4.1}/pyproject.toml +1 -1
- llm_ie-0.4.1/src/llm_ie/__init__.py +9 -0
- {llm_ie-0.3.5 → llm_ie-0.4.1}/src/llm_ie/data_types.py +44 -17
- {llm_ie-0.3.5 → llm_ie-0.4.1}/src/llm_ie/engines.py +151 -9
- {llm_ie-0.3.5 → llm_ie-0.4.1}/src/llm_ie/extractors.py +545 -151
- {llm_ie-0.3.5 → llm_ie-0.4.1}/src/llm_ie/prompt_editor.py +17 -2
- llm_ie-0.3.5/src/llm_ie/__init__.py +0 -0
- {llm_ie-0.3.5 → llm_ie-0.4.1}/src/llm_ie/asset/PromptEditor_prompts/chat.txt +0 -0
- {llm_ie-0.3.5 → llm_ie-0.4.1}/src/llm_ie/asset/PromptEditor_prompts/comment.txt +0 -0
- {llm_ie-0.3.5 → llm_ie-0.4.1}/src/llm_ie/asset/PromptEditor_prompts/rewrite.txt +0 -0
- {llm_ie-0.3.5 → llm_ie-0.4.1}/src/llm_ie/asset/PromptEditor_prompts/system.txt +0 -0
- {llm_ie-0.3.5 → llm_ie-0.4.1}/src/llm_ie/asset/default_prompts/ReviewFrameExtractor_addition_review_prompt.txt +0 -0
- {llm_ie-0.3.5 → llm_ie-0.4.1}/src/llm_ie/asset/default_prompts/ReviewFrameExtractor_revision_review_prompt.txt +0 -0
- {llm_ie-0.3.5 → llm_ie-0.4.1}/src/llm_ie/asset/default_prompts/SentenceReviewFrameExtractor_addition_review_prompt.txt +0 -0
- {llm_ie-0.3.5 → llm_ie-0.4.1}/src/llm_ie/asset/default_prompts/SentenceReviewFrameExtractor_revision_review_prompt.txt +0 -0
- {llm_ie-0.3.5 → llm_ie-0.4.1}/src/llm_ie/asset/prompt_guide/BasicFrameExtractor_prompt_guide.txt +0 -0
- {llm_ie-0.3.5 → llm_ie-0.4.1}/src/llm_ie/asset/prompt_guide/BinaryRelationExtractor_prompt_guide.txt +0 -0
- {llm_ie-0.3.5 → llm_ie-0.4.1}/src/llm_ie/asset/prompt_guide/MultiClassRelationExtractor_prompt_guide.txt +0 -0
- {llm_ie-0.3.5 → llm_ie-0.4.1}/src/llm_ie/asset/prompt_guide/ReviewFrameExtractor_prompt_guide.txt +0 -0
- {llm_ie-0.3.5 → llm_ie-0.4.1}/src/llm_ie/asset/prompt_guide/SentenceCoTFrameExtractor_prompt_guide.txt +0 -0
- {llm_ie-0.3.5 → llm_ie-0.4.1}/src/llm_ie/asset/prompt_guide/SentenceFrameExtractor_prompt_guide.txt +0 -0
- {llm_ie-0.3.5 → llm_ie-0.4.1}/src/llm_ie/asset/prompt_guide/SentenceReviewFrameExtractor_prompt_guide.txt +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: llm-ie
-Version: 0.3.5
+Version: 0.4.1
 Summary: An LLM-powered tool that transforms everyday language into robust information extraction pipelines.
 License: MIT
 Author: Enshuo (David) Hsu
@@ -24,10 +24,21 @@ An LLM-powered tool that transforms everyday language into robust information ex
 
 | Features | Support |
 |----------|----------|
-| **LLM Agent for prompt writing** | :white_check_mark:
+| **LLM Agent for prompt writing** | :white_check_mark: Interactive chat, Python functions |
 | **Named Entity Recognition (NER)** | :white_check_mark: Document-level, Sentence-level |
 | **Entity Attributes Extraction** | :white_check_mark: Flexible formats |
 | **Relation Extraction (RE)** | :white_check_mark: Binary & Multiclass relations |
+| **Visualization** | :white_check_mark: Built-in entity & relation visualization |
+
+## Recent Updates
+- [v0.3.0](https://github.com/daviden1013/llm-ie/releases/tag/v0.3.0) (Oct 17, 2024): Interactive chat to Prompt editor LLM agent.
+- [v0.3.1](https://github.com/daviden1013/llm-ie/releases/tag/v0.3.1) (Oct 26, 2024): Added Sentence Review Frame Extractor and Sentence CoT Frame Extractor
+- [v0.3.4](https://github.com/daviden1013/llm-ie/releases/tag/v0.3.4) (Nov 24, 2024): Added entity fuzzy search.
+- [v0.3.5](https://github.com/daviden1013/llm-ie/releases/tag/v0.3.5) (Nov 27, 2024): Adopted `json_repair` to fix broken JSON from LLM outputs.
+- [v0.4.0](https://github.com/daviden1013/llm-ie/releases/tag/v0.4.0) (Jan 4, 2025):
+  - Concurrent LLM inferencing to speed up frame and relation extraction.
+  - Support for LiteLLM.
+- [v0.4.1](https://github.com/daviden1013/llm-ie/releases/tag/v0.4.1) (Jan 25, 2025): Added filters, table view, and some new features to visualization tool (make sure to update [ie-viz](https://github.com/daviden1013/ie-viz)).
 
 ## Table of Contents
 - [Overview](#overview)
@@ -38,10 +49,13 @@ An LLM-powered tool that transforms everyday language into robust information ex
 - [User Guide](#user-guide)
   - [LLM Inference Engine](#llm-inference-engine)
   - [Prompt Template](#prompt-template)
-  - [Prompt Editor](#prompt-editor)
+  - [Prompt Editor LLM Agent](#prompt-editor-llm-agent)
   - [Extractor](#extractor)
     - [FrameExtractor](#frameextractor)
     - [RelationExtractor](#relationextractor)
+  - [Visualization](#visualization)
+- [Benchmarks](#benchmarks)
+- [Citation](#citation)
 
 ## Overview
 LLM-IE is a toolkit that provides robust information extraction utilities for named entity, entity attributes, and entity relation extraction. Since prompt design has a significant impact on generative information extraction with LLMs, it has a built-in LLM agent ("editor") to help with prompt writing. The flowchart below demonstrates the workflow starting from a casual language request to output visualization.
@@ -49,7 +63,7 @@ LLM-IE is a toolkit that provides robust information extraction utilities for na
 <div align="center"><img src="doc_asset/readme_img/LLM-IE flowchart.png" width=800 ></div>
 
 ## Prerequisite
-At least one LLM inference engine is required. There are built-in supports for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="
+At least one LLM inference engine is required. There are built-in supports for 🚅 [LiteLLM](https://github.com/BerriAI/litellm), 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="doc_asset/readme_img/ollama_icon.png" alt="Icon" width="22"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), <img src=doc_asset/readme_img/openai-logomark_white.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction), and <img src=doc_asset/readme_img/vllm-logo_small.png width=20 /> [vLLM](https://github.com/vllm-project/vllm). For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See [LLM Inference Engine](#llm-inference-engine) section below.
 
 ## Installation
 The Python package is available on PyPI.
@@ -65,22 +79,23 @@ We use a [synthesized medical note](demo/document/synthesized_note.txt) by ChatG
 Choose one of the built-in engines below.
 
 <details>
-<summary
+<summary>🚅 LiteLLM</summary>
 
-```python
-from llm_ie.engines import
+```python
+from llm_ie.engines import LiteLLMInferenceEngine
 
-
+inference_engine = LiteLLMInferenceEngine(model="openai/Llama-3.3-70B-Instruct", base_url="http://localhost:8000/v1", api_key="EMPTY")
 ```
 </details>
+
 <details>
-<summary
+<summary><img src=doc_asset/readme_img/openai-logomark_white.png width=16 /> OpenAI API</summary>
 
+Follow the [Best Practices for API Key Safety](https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety) to set up API key.
 ```python
-from llm_ie.engines import
+from llm_ie.engines import OpenAIInferenceEngine
 
-
-gguf_filename="Meta-Llama-3.1-8B-Instruct-Q8_0.gguf")
+inference_engine = OpenAIInferenceEngine(model="gpt-4o-mini")
 ```
 </details>
 
@@ -90,24 +105,22 @@ llm = LlamaCppInferenceEngine(repo_id="bullerwins/Meta-Llama-3.1-8B-Instruct-GGU
 ```python
 from llm_ie.engines import HuggingFaceHubInferenceEngine
 
-
+inference_engine = HuggingFaceHubInferenceEngine(model="meta-llama/Meta-Llama-3-8B-Instruct")
 ```
 </details>
 
 <details>
-<summary><img src=doc_asset/readme_img/
+<summary><img src="doc_asset/readme_img/ollama_icon.png" alt="Icon" width="22"/> Ollama</summary>
 
-
-
-from llm_ie.engines import OpenAIInferenceEngine
+```python
+from llm_ie.engines import OllamaInferenceEngine
 
-
+inference_engine = OllamaInferenceEngine(model_name="llama3.1:8b-instruct-q8_0")
 ```
-
 </details>
 
 <details>
-<summary><img src=doc_asset/readme_img/vllm-
+<summary><img src=doc_asset/readme_img/vllm-logo_small.png width=20 /> vLLM</summary>
 
 The vLLM support follows the [OpenAI Compatible Server](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html). For more parameters, please refer to the documentation.
 
@@ -118,15 +131,24 @@ vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct
 Define inference engine
 ```python
 from llm_ie.engines import OpenAIInferenceEngine
-
-
-
+inference_engine = OpenAIInferenceEngine(base_url="http://localhost:8000/v1",
+                                         api_key="EMPTY",
+                                         model="meta-llama/Meta-Llama-3.1-8B-Instruct")
 ```
+</details>
 
+<details>
+<summary>🦙 Llama-cpp-python</summary>
 
+```python
+from llm_ie.engines import LlamaCppInferenceEngine
+
+inference_engine = LlamaCppInferenceEngine(repo_id="bullerwins/Meta-Llama-3.1-8B-Instruct-GGUF",
+                                           gguf_filename="Meta-Llama-3.1-8B-Instruct-Q8_0.gguf")
+```
 </details>
 
-In this quick start demo, we use
+In this quick start demo, we use Ollama to run Llama-3.1-8B with int8 quantization.
 The outputs might be slightly different with other inference engines, LLMs, or quantization.
 
 #### Casual language as prompt
@@ -136,14 +158,12 @@ We start with a casual description:
 
 Define the AI prompt editor.
 ```python
-from llm_ie
-from llm_ie.extractors import BasicFrameExtractor
-from llm_ie.prompt_editor import PromptEditor
+from llm_ie import OllamaInferenceEngine, PromptEditor, SentenceFrameExtractor
 
 # Define a LLM inference engine
-
+inference_engine = OllamaInferenceEngine(model_name="llama3.1:8b-instruct-q8_0")
 # Define LLM prompt editor
-editor = PromptEditor(
+editor = PromptEditor(inference_engine, SentenceFrameExtractor)
 # Start chat
 editor.chat()
 ```
@@ -152,7 +172,7 @@ This opens an interactive session:
 <div align="left"><img src=doc_asset/readme_img/terminal_chat.PNG width=1000 ></div>
 
 
-The ```PromptEditor``` drafts a prompt template following the schema required by the ```
+The ```PromptEditor``` drafts a prompt template following the schema required by the ```SentenceFrameExtractor```:
 
 ```
 # Task description
@@ -190,10 +210,13 @@ with open("./demo/document/synthesized_note.txt", 'r') as f:
     note_text = f.read()
 
 # Define extractor
-extractor =
+extractor = SentenceFrameExtractor(inference_engine, prompt_template)
 
 # Extract
-
+# To stream the extraction process, use concurrent=False, stream=True:
+frames = extractor.extract_frames(note_text, entity_key="Diagnosis", concurrent=False, stream=True)
+# For faster extraction, use concurrent=True to enable asynchronous prompting
+frames = extractor.extract_frames(note_text, entity_key="Diagnosis", concurrent=True)
 
 # Check extractions
 for frame in frames:
@@ -202,10 +225,17 @@ for frame in frames:
 The output is a list of frames. Each frame has a ```entity_text```, ```start```, ```end```, and a dictionary of ```attr```.
 
 ```python
-{'frame_id': '0', 'start': 537, 'end': 549, 'entity_text': '
-{'frame_id': '1', 'start': 551, 'end': 565, 'entity_text': '
-{'frame_id': '2', 'start': 571, 'end': 595, 'entity_text': 'Type 2
-{'frame_id': '3', 'start':
+{'frame_id': '0', 'start': 537, 'end': 549, 'entity_text': 'hypertension', 'attr': {'Date': '2010-01-01', 'Status': 'Active'}}
+{'frame_id': '1', 'start': 551, 'end': 565, 'entity_text': 'hyperlipidemia', 'attr': {'Date': '2015-01-01', 'Status': 'Active'}}
+{'frame_id': '2', 'start': 571, 'end': 595, 'entity_text': 'Type 2 diabetes mellitus', 'attr': {'Date': '2018-01-01', 'Status': 'Active'}}
+{'frame_id': '3', 'start': 660, 'end': 670, 'entity_text': 'chest pain', 'attr': {'Date': 'July 18, 2024'}}
+{'frame_id': '4', 'start': 991, 'end': 1003, 'entity_text': 'Hypertension', 'attr': {'Date': '2010-01-01'}}
+{'frame_id': '5', 'start': 1026, 'end': 1040, 'entity_text': 'Hyperlipidemia', 'attr': {'Date': '2015-01-01'}}
+{'frame_id': '6', 'start': 1063, 'end': 1087, 'entity_text': 'Type 2 Diabetes Mellitus', 'attr': {'Date': '2018-01-01'}}
+{'frame_id': '7', 'start': 1926, 'end': 1947, 'entity_text': 'ST-segment depression', 'attr': None}
+{'frame_id': '8', 'start': 2049, 'end': 2066, 'entity_text': 'acute infiltrates', 'attr': None}
+{'frame_id': '9', 'start': 2117, 'end': 2150, 'entity_text': 'Mild left ventricular hypertrophy', 'attr': None}
+{'frame_id': '10', 'start': 2402, 'end': 2425, 'entity_text': 'acute coronary syndrome', 'attr': {'Date': 'July 20, 2024', 'Status': 'Active'}}
 ```
 
 We can save the frames to a document object for better management. The document holds ```text``` and ```frames```. The ```add_frame()``` method performs validation and (if passed) adds a frame to the document.
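The `add_frame()` call itself falls between the hunks shown here. A minimal sketch of this step (assuming the `LLMInformationExtractionDocument` constructor that appears later in the Visualization section):

```python
from llm_ie.data_types import LLMInformationExtractionDocument

# Create a document object that holds the note text
doc = LLMInformationExtractionDocument(doc_id="Medical note", text=note_text)

# add_frame() validates each frame and, if validation passes, adds it to the document
for frame in frames:
    doc.add_frame(frame)
```

The Visualization section further below uses the batch helpers `add_frames()` and `add_relations()` for the same purpose.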
@@ -228,7 +258,7 @@ To visualize the extracted frames, we use the ```viz_serve()``` method.
 ```python
 doc.viz_serve()
 ```
-A Flask
+A Flask App starts at port 5000 (default).
 ```
 * Serving Flask app 'ie_viz.utilities'
 * Debug mode: off
@@ -255,30 +285,43 @@ This package is comprised of some key classes:
 - Extractors
 
 ### LLM Inference Engine
-Provides an interface for different LLM inference engines to work in the information extraction workflow. The built-in engines are
+Provides an interface for different LLM inference engines to work in the information extraction workflow. The built-in engines are `LiteLLMInferenceEngine`, `OpenAIInferenceEngine`, `HuggingFaceHubInferenceEngine`, `OllamaInferenceEngine`, and `LlamaCppInferenceEngine`.
 
-####
-The
+#### 🚅 LiteLLM
+LiteLLM is an adapter project that unifies many proprietary and open-source LLM APIs. Popular inferencing servers, including OpenAI, Huggingface Hub, and Ollama, are supported via its interface. For more details, refer to the [LiteLLM GitHub page](https://github.com/BerriAI/litellm).
 
+To use LiteLLM with LLM-IE, import the `LiteLLMInferenceEngine` and follow the required model naming.
 ```python
-from llm_ie.engines import
+from llm_ie.engines import LiteLLMInferenceEngine
+
+# Huggingface serverless inferencing
+os.environ['HF_TOKEN']
+inference_engine = LiteLLMInferenceEngine(model="huggingface/meta-llama/Meta-Llama-3-8B-Instruct")
 
-
-
-
-
-
-
+# OpenAI GPT models
+os.environ['OPENAI_API_KEY']
+inference_engine = LiteLLMInferenceEngine(model="openai/gpt-4o-mini")
+
+# OpenAI compatible local server
+inference_engine = LiteLLMInferenceEngine(model="openai/Llama-3.1-8B-Instruct", base_url="http://localhost:8000/v1", api_key="EMPTY")
+
+# Ollama
+inference_engine = LiteLLMInferenceEngine(model="ollama/llama3.1:8b-instruct-q8_0")
 ```
-
-
+
+#### <img src=doc_asset/readme_img/openai-logomark_white.png width=16 /> OpenAI API
+In bash, save API key to the environmental variable ```OPENAI_API_KEY```.
+```
+export OPENAI_API_KEY=<your_API_key>
+```
+
+In Python, create the inference engine and specify the model name. For the available models, refer to the [OpenAI webpage](https://platform.openai.com/docs/models).
+For more parameters, see the [OpenAI API reference](https://platform.openai.com/docs/api-reference/introduction).
 
 ```python
-from llm_ie.engines import
+from llm_ie.engines import OpenAIInferenceEngine
 
-
-num_ctx=4096,
-keep_alive=300)
+inference_engine = OpenAIInferenceEngine(model="gpt-4o-mini")
 ```
 
 #### 🤗 huggingface_hub
@@ -287,25 +330,19 @@ The ```model``` can be a model id hosted on the Hugging Face Hub or a URL to a d
 ```python
 from llm_ie.engines import HuggingFaceHubInferenceEngine
 
-
-```
-
-#### <img src=doc_asset/readme_img/openai-logomark.png width=16 /> OpenAI API
-In bash, save API key to the environmental variable ```OPENAI_API_KEY```.
-```
-export OPENAI_API_KEY=<your_API_key>
+inference_engine = HuggingFaceHubInferenceEngine(model="meta-llama/Meta-Llama-3-8B-Instruct")
 ```
 
-
-
+#### <img src="doc_asset/readme_img/ollama_icon.png" alt="Icon" width="22"/> Ollama
+The ```model_name``` must match the names on the [Ollama library](https://ollama.com/library). Use the command line ```ollama ls``` to check your local model list. ```num_ctx``` determines the context length the LLM will consider during text generation. Empirically, longer context length gives better performance, while consuming more memory and increasing computation. ```keep_alive``` regulates the lifespan of the LLM. It indicates the number of seconds to hold the LLM after the last API call. Default is 5 minutes (300 sec).
 
 ```python
-from llm_ie.engines import
+from llm_ie.engines import OllamaInferenceEngine
 
-
+inference_engine = OllamaInferenceEngine(model_name="llama3.1:8b-instruct-q8_0", num_ctx=4096, keep_alive=300)
 ```
 
-#### <img src=doc_asset/readme_img/vllm-
+#### <img src=doc_asset/readme_img/vllm-logo_small.png width=20 /> vLLM
 The vLLM support follows the [OpenAI Compatible Server](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html). For more parameters, please refer to the documentation.
 
 Start the server
@@ -318,20 +355,34 @@ the default port is 8000. ```--port``` sets the port.
 Define inference engine
 ```python
 from llm_ie.engines import OpenAIInferenceEngine
-
+inference_engine = OpenAIInferenceEngine(base_url="http://localhost:8000/v1",
                                          api_key="MY_API_KEY",
                                          model="meta-llama/Meta-Llama-3.1-8B-Instruct")
 ```
 The ```model``` must match the repo name specified in the server.
 
+#### 🦙 Llama-cpp-python
+The ```repo_id``` and ```gguf_filename``` must match the ones on the Huggingface repo to ensure the correct model is loaded. ```n_ctx``` determines the context length the LLM will consider during text generation. Empirically, longer context length gives better performance, while consuming more memory and increasing computation. Note that when ```n_ctx``` is less than the prompt length, Llama.cpp throws exceptions. ```n_gpu_layers``` indicates the number of model layers to offload to GPU. Default is -1 for all layers (entire LLM). Flash attention ```flash_attn``` is supported by Llama.cpp. The ```verbose``` indicates whether model information should be displayed. For more input parameters, see 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python).
+
+```python
+from llm_ie.engines import LlamaCppInferenceEngine
+
+inference_engine = LlamaCppInferenceEngine(repo_id="bullerwins/Meta-Llama-3.1-8B-Instruct-GGUF",
+                                           gguf_filename="Meta-Llama-3.1-8B-Instruct-Q8_0.gguf",
+                                           n_ctx=4096,
+                                           n_gpu_layers=-1,
+                                           flash_attn=True,
+                                           verbose=False)
+```
+
 #### Test inference engine configuration
 To test the inference engine, use the ```chat()``` method.
 
 ```python
 from llm_ie.engines import OllamaInferenceEngine
 
-
-
+inference_engine = OllamaInferenceEngine(model_name="llama3.1:8b-instruct-q8_0")
+inference_engine.chat(messages=[{"role": "user", "content":"Hi"}], stream=True)
 ```
 The output should be something like (might vary by LLMs and versions)
 
@@ -449,8 +500,8 @@ prompt_template = """
 Below is the medical note:
 "{{note}}"
 """
-
-extractor = BasicFrameExtractor(
+inference_engine = OllamaInferenceEngine(model_name="llama3.1:8b-instruct-q8_0")
+extractor = BasicFrameExtractor(inference_engine, prompt_template)
 prompt_text = extractor._get_user_prompt(text_content={"knowledge": "<some text...>",
                                                        "note": "<some text...>"})
 print(prompt_text)
@@ -468,7 +519,7 @@ from llm_ie.extractors import BasicFrameExtractor
 print(BasicFrameExtractor.get_prompt_guide())
 ```
 
-### Prompt Editor
+### Prompt Editor LLM Agent
 The prompt editor is an LLM agent that helps users write prompt templates following the defined schema and guidelines of each extractor. Chat with the prompt editor:
 
 ```python
@@ -477,10 +528,10 @@ from llm_ie.extractors import BasicFrameExtractor
 from llm_ie.engines import OllamaInferenceEngine
 
 # Define an LLM inference engine
-
+inference_engine = OllamaInferenceEngine(model_name="llama3.1:8b-instruct-q8_0")
 
 # Define editor
-editor = PromptEditor(
+editor = PromptEditor(inference_engine, BasicFrameExtractor)
 
 editor.chat()
 ```
@@ -504,10 +555,10 @@ from llm_ie.extractors import BasicFrameExtractor
 from llm_ie.engines import OllamaInferenceEngine
 
 # Define an LLM inference engine
-
+inference_engine = OllamaInferenceEngine(model_name="llama3.1:8b-instruct-q8_0")
 
 # Define editor
-editor = PromptEditor(
+editor = PromptEditor(inference_engine, BasicFrameExtractor)
 
 # Have editor to generate initial prompt template
 initial_version = editor.rewrite("Extract treatment events from the discharge summary.")
@@ -612,10 +663,12 @@ After a few iterations of revision, we will have a high-quality prompt template
 
 ### Extractor
 An extractor implements a prompting method for information extraction. There are two extractor families: ```FrameExtractor``` and ```RelationExtractor```.
-The ```FrameExtractor``` extracts named entities
+The ```FrameExtractor``` extracts named entities with attributes ("frames"). The ```RelationExtractor``` extracts the relations (and relation types) between frames.
 
 #### FrameExtractor
-The ```BasicFrameExtractor``` directly prompts LLM to generate a list of dictionaries. Each dictionary is then post-processed into a frame. The ```ReviewFrameExtractor``` is based on the ```BasicFrameExtractor``` but adds a review step after the initial extraction to boost sensitivity and improve performance. ```SentenceFrameExtractor``` gives LLM the entire document upfront as a reference, then prompts LLM sentence by sentence and collects per-sentence outputs. To learn about an extractor, use the class method ```get_prompt_guide()``` to print out the prompt guide.
+The ```BasicFrameExtractor``` directly prompts LLM to generate a list of dictionaries. Each dictionary is then post-processed into a frame. The ```ReviewFrameExtractor``` is based on the ```BasicFrameExtractor``` but adds a review step after the initial extraction to boost sensitivity and improve performance. ```SentenceFrameExtractor``` gives LLM the entire document upfront as a reference, then prompts LLM sentence by sentence and collects per-sentence outputs. ```SentenceReviewFrameExtractor``` is the combined version of ```ReviewFrameExtractor``` and ```SentenceFrameExtractor```, in which each sentence is extracted and reviewed. The ```SentenceCoTFrameExtractor``` implements chain-of-thought (CoT) prompting. It first analyzes a sentence, then extracts frames based on the CoT. To learn about an extractor, use the class method ```get_prompt_guide()``` to print out the prompt guide.
+
+Since the output entity text from LLMs might not be consistent with the original text due to the limitations of LLMs, we apply fuzzy search in post-processing to find the accurate entity span. In the `FrameExtractor.extract_frames()` method, setting parameter `fuzzy_match=True` applies Jaccard similarity matching.
 
 <details>
 <summary>BasicFrameExtractor</summary>
@@ -625,8 +678,8 @@ The ```BasicFrameExtractor``` directly prompts LLM to generate a list of diction
 ```python
 from llm_ie.extractors import BasicFrameExtractor
 
-extractor = BasicFrameExtractor(
-frames = extractor.extract_frames(text_content=text, entity_key="Diagnosis", stream=True)
+extractor = BasicFrameExtractor(inference_engine, prompt_temp)
+frames = extractor.extract_frames(text_content=text, entity_key="Diagnosis", case_sensitive=False, fuzzy_match=True, stream=True)
 ```
 
 Use the ```get_prompt_guide()``` method to inspect the prompt template guideline for ```BasicFrameExtractor```.
@@ -688,7 +741,7 @@ The ```review_mode``` should be set to ```review_mode="revision"```
 ```python
 review_prompt = "Review the input and your output again. If you find some diagnosis was missed, add them to your output. Regenerate your output."
 
-extractor = ReviewFrameExtractor(
+extractor = ReviewFrameExtractor(inference_engine, prompt_temp, review_prompt, review_mode="revision")
 frames = extractor.extract_frames(text_content=text, entity_key="Diagnosis", stream=True)
 ```
 </details>
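Only the `review_mode="revision"` setting is shown in this hunk. The `ReviewFrameExtractor_addition_review_prompt.txt` asset listed at the top of this diff suggests an "addition" review mode as well; a hedged sketch (the mode value and the review prompt wording are assumptions):

```python
# Sketch only: "addition" mode appends newly found frames to the first pass
# instead of regenerating the whole output. The prompt wording below is assumed.
addition_review_prompt = "Review the input and your output again. List any diagnosis that was missed."

extractor = ReviewFrameExtractor(inference_engine, prompt_temp, addition_review_prompt, review_mode="addition")
frames = extractor.extract_frames(text_content=text, entity_key="Diagnosis", stream=True)
```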
@@ -698,14 +751,95 @@ frames = extractor.extract_frames(text_content=text, entity_key="Diagnosis", str
 
 The ```SentenceFrameExtractor``` instructs the LLM to extract sentence by sentence. The reason is to ensure the accuracy of frame spans. It also prevents LLMs from overlooking sections/sentences. Empirically, this extractor results in better recall than the ```BasicFrameExtractor``` in complex tasks.
 
+For concurrent extraction (recommended), the `async`/`await` feature is used to speed up inferencing. The `concurrent_batch_size` sets the batch size of sentences to be processed concurrently.
+
+```python
+from llm_ie.extractors import SentenceFrameExtractor
+
+extractor = SentenceFrameExtractor(inference_engine, prompt_temp)
+frames = extractor.extract_frames(text_content=text, entity_key="Diagnosis", case_sensitive=False, fuzzy_match=True, concurrent=True, concurrent_batch_size=32)
+```
+
 The ```multi_turn``` parameter specifies multi-turn conversation for prompting. If True, sentences and LLM outputs will be appended to the input message and carry-over. If False, only the current sentence is prompted. For LLM inference engines that support prompt caching (e.g., Llama.Cpp, Ollama), using multi-turn conversation prompting can better utilize the KV cache and results in faster inferencing. But for vLLM with [Automatic Prefix Caching (APC)](https://docs.vllm.ai/en/latest/automatic_prefix_caching/apc.html), multi-turn conversation is not necessary.
 
 ```python
 from llm_ie.extractors import SentenceFrameExtractor
 
-extractor = SentenceFrameExtractor(
-frames = extractor.extract_frames(text_content=text, entity_key="Diagnosis", multi_turn=True, stream=True)
+extractor = SentenceFrameExtractor(inference_engine, prompt_temp)
+frames = extractor.extract_frames(text_content=text, entity_key="Diagnosis", multi_turn=False, case_sensitive=False, fuzzy_match=True, stream=True)
 ```
+
+</details>
+
+<details>
+<summary>SentenceReviewFrameExtractor</summary>
+
+The `SentenceReviewFrameExtractor` performs sentence-level extraction and review.
+
+```python
+from llm_ie.extractors import SentenceReviewFrameExtractor
+
+extractor = SentenceReviewFrameExtractor(inference_engine, prompt_temp, review_mode="revision")
+frames = extractor.extract_frames(text_content=note_text, entity_key="Diagnosis", stream=True)
+```
+
+```
+Sentence:
+#### History of Present Illness
+The patient reported that the chest pain started two days prior to admission.
+
+Initial Output:
+[
+  {"Diagnosis": "chest pain", "Date": "two days prior to admission", "Status": "reported"}
+]
+Review:
+[
+  {"Diagnosis": "admission", "Date": null, "Status": null}
+]
+```
+
+</details>
+
+<details>
+<summary>SentenceCoTFrameExtractor</summary>
+
+The `SentenceCoTFrameExtractor` processes the document sentence-by-sentence. For each sentence, it first generates an analysis paragraph in `<Analysis>... </Analysis>` (chain-of-thought), then outputs extractions in JSON in `<Outputs>... </Outputs>`, similar to `SentenceFrameExtractor`.
+
+```python
+from llm_ie.extractors import SentenceCoTFrameExtractor
+
+extractor = SentenceCoTFrameExtractor(inference_engine, CoT_prompt_temp)
+frames = extractor.extract_frames(text_content=note_text, entity_key="Diagnosis", stream=True)
+```
+
+```
+Sentence:
+#### Discharge Medications
+- Aspirin 81 mg daily
+- Clopidogrel 75 mg daily
+- Atorvastatin 40 mg daily
+- Metoprolol 50 mg twice daily
+- Lisinopril 20 mg daily
+- Metformin 1000 mg twice daily
+
+#### Discharge Instructions
+John Doe was advised to follow a heart-healthy diet, engage in regular physical activity, and monitor his blood glucose levels.
+
+CoT:
+<Analysis>
+The given text does not explicitly mention a diagnosis, but rather lists the discharge medications and instructions for the patient. However, we can infer that the patient has been diagnosed with conditions that require these medications, such as high blood pressure, high cholesterol, and diabetes.
+
+</Analysis>
+
+<Outputs>
+[
+  {"Diagnosis": "hypertension", "Date": null, "Status": "confirmed"},
+  {"Diagnosis": "hyperlipidemia", "Date": null, "Status": "confirmed"},
+  {"Diagnosis": "Type 2 diabetes mellitus", "Date": null, "Status": "confirmed"}
+]
+</Outputs>
+```
+
 </details>
 
 #### RelationExtractor
@@ -725,12 +859,32 @@ print(BinaryRelationExtractor.get_prompt_guide())
 ```
 
 ```
-Prompt
-
-
-
-
-
+Prompt Template Design:
+
+1. Task description:
+    Provide a detailed description of the task, including the background and the type of task (e.g., binary relation extraction). Mention the region of interest (ROI) text.
+2. Schema definition:
+    List the criterion for relation (True) and for no relation (False).
+
+3. Output format definition:
+    The output must be a dictionary with a key "Relation" (i.e., {"Relation": "<True or False>"}).
+
+4. (optional) Hints:
+    Provide itemized hints for the information extractors to guide the extraction process.
+
+5. (optional) Examples:
+    Include examples in the format:
+        Input: ...
+        Output: ...
+
+6. Entity 1 full information:
+    Include a placeholder in the format {{<frame_1>}}
+
+7. Entity 2 full information:
+    Include a placeholder in the format {{<frame_2>}}
+
+8. Input placeholders:
+    The template must include a placeholder "{{roi_text}}" for the ROI text.
 
 
 Example:
@@ -754,15 +908,15 @@ Example:
 3. If the strength or frequency is for another medication, output False.
 4. If the strength or frequency is for the same medication but at a different location (span), output False.
 
-#
-ROI Text with the two entities annotated with <entity_1> and <entity_2>:
-"{{roi_text}}"
-
-Entity 1 full information:
+# Entity 1 full information:
 {{frame_1}}
 
-Entity 2 full information:
+# Entity 2 full information:
 {{frame_2}}
+
+# Input placeholders
+ROI Text with the two entities annotated with <entity_1> and <entity_2>:
+"{{roi_text}}"
 ```
 
 As an example, we define the ```possible_relation_func``` function:
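The body of `possible_relation_func` sits between the hunks and is not shown here. A rough sketch consistent with the drug-strength/frequency example above (the `EntityType` attribute key and the `.attr` access on frames are assumptions; adapt to the actual frame schema):

```python
def possible_relation_func(frame_1, frame_2) -> bool:
    # Hypothetical attribute key "EntityType"; adjust to the attributes your frame extractor produces
    types = {frame_1.attr.get("EntityType"), frame_2.attr.get("EntityType")}
    # Only a Medication paired with a Strength or Frequency can possibly hold a relation;
    # returning False prunes the pair so the LLM is never prompted on it.
    return "Medication" in types and bool(types & {"Strength", "Frequency"})
```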
@@ -797,8 +951,12 @@ In the ```BinaryRelationExtractor``` constructor, we pass in the prompt template
 ```python
 from llm_ie.extractors import BinaryRelationExtractor
 
-extractor = BinaryRelationExtractor(
-relations
+extractor = BinaryRelationExtractor(inference_engine, prompt_template=prompt_template, possible_relation_func=possible_relation_func)
+# Extract binary relations with concurrent mode (faster)
+relations = extractor.extract_relations(doc, concurrent=True)
+
+# To print out the step-by-step, use the `concurrent=False` and `stream=True` options
+relations = extractor.extract_relations(doc, concurrent=False, stream=True)
 ```
 
 </details>
@@ -814,11 +972,34 @@ print(MultiClassRelationExtractor.get_prompt_guide())
 ```
 
 ```
-Prompt
-
-
-
-
+Prompt Template Design:
+
+1. Task description:
+    Provide a detailed description of the task, including the background and the type of task (e.g., binary relation extraction). Mention the region of interest (ROI) text.
+2. Schema definition:
+    List the criterion for relation (True) and for no relation (False).
+
+3. Output format definition:
+    This section must include a placeholder "{{pos_rel_types}}" for the possible relation types.
+    The output must be a dictionary with a key "RelationType" (i.e., {"RelationType": "<relation type or No Relation>"}).
+
+4. (optional) Hints:
+    Provide itemized hints for the information extractors to guide the extraction process.
+
+5. (optional) Examples:
+    Include examples in the format:
+        Input: ...
+        Output: ...
+
+6. Entity 1 full information:
+    Include a placeholder in the format {{<frame_1>}}
+
+7. Entity 2 full information:
+    Include a placeholder in the format {{<frame_2>}}
+
+8. Input placeholders:
+    The template must include a placeholder "{{roi_text}}" for the ROI text.
+
 
 
 Example:
@@ -851,15 +1032,15 @@ Example:
 3. If the strength or frequency is for another medication, output "No Relation".
 4. If the strength or frequency is for the same medication but at a different location (span), output "No Relation".
 
-#
-ROI Text with the two entities annotated with <entity_1> and <entity_2>:
-"{{roi_text}}"
-
-Entity 1 full information:
+# Entity 1 full information:
 {{frame_1}}
 
-Entity 2 full information:
+# Entity 2 full information:
 {{frame_2}}
+
+# Input placeholders
+ROI Text with the two entities annotated with <entity_1> and <entity_2>:
+"{{roi_text}}"
 ```
 
 As an example, we define the ```possible_relation_types_func```:
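The definition itself also lies between the hunks; only its signature, `possible_relation_types_func(frame_1, frame_2) -> List[str]`, is visible in the next hunk header. A hedged sketch along the same lines (the attribute key and the relation-type labels are illustrative assumptions):

```python
from typing import List

def possible_relation_types_func(frame_1, frame_2) -> List[str]:
    # Hypothetical attribute key "EntityType"; adjust to your schema
    types = {frame_1.attr.get("EntityType"), frame_2.attr.get("EntityType")}
    if types == {"Medication", "Strength"}:
        return ["Strength-Drug"]      # illustrative relation-type label
    if types == {"Medication", "Frequency"}:
        return ["Frequency-Drug"]     # illustrative relation-type label
    # An empty list means no relation is possible, so the pair is skipped
    return []
```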
@@ -890,8 +1071,79 @@ def possible_relation_types_func(frame_1, frame_2) -> List[str]:
 ```python
 from llm_ie.extractors import MultiClassRelationExtractor
 
-extractor = MultiClassRelationExtractor(
-
+extractor = MultiClassRelationExtractor(inference_engine, prompt_template=re_prompt_template,
+                                        possible_relation_types_func=possible_relation_types_func)
+
+# Extract multi-class relations with concurrent mode (faster)
+relations = extractor.extract_relations(doc, concurrent=True)
+
+# To print out the step-by-step, use the `concurrent=False` and `stream=True` options
+relations = extractor.extract_relations(doc, concurrent=False, stream=True)
 ```
 
 </details>
+
+### Visualization
+
+<div align="center"><img src="doc_asset/readme_img/visualization.PNG" width=95% ></div>
+
+The `LLMInformationExtractionDocument` class supports named entity, entity attributes, and relation visualization. The implementation is through our plug-in package [ie-viz](https://github.com/daviden1013/ie-viz). Check the example Jupyter Notebook [NER + RE for Drug, Strength, Frequency](demo/medication_relation_extraction.ipynb) for a working demo.
+
+```cmd
+pip install ie-viz
+```
+
+The `viz_serve()` method starts a Flask App on localhost port 5000 by default.
+```python
+from llm_ie.data_types import LLMInformationExtractionDocument
+
+# Define document
+doc = LLMInformationExtractionDocument(doc_id="Medical note",
+                                       text=note_text)
+# Add extracted frames and relations to document
+doc.add_frames(frames)
+doc.add_relations(relations)
+# Visualize the document
+doc.viz_serve()
+```
+
+Alternatively, the `viz_render()` method returns a self-contained (HTML + JS + CSS) string. Save it to file and open with a browser.
+```python
+html = doc.viz_render()
+
+with open("Medical note.html", "w") as f:
+    f.write(html)
+```
+
+To customize colors for different entities, use `color_attr_key` (simple) or `color_map_func` (advanced).
+
+The `color_attr_key` automatically assigns colors based on the specified attribute key. For example, "EntityType".
+```python
+doc.viz_serve(color_attr_key="EntityType")
+```
+
+The `color_map_func` allows users to define a custom entity-color mapping function. For example,
+```python
+def color_map_func(entity) -> str:
+    if entity['attr']['<attribute key>'] == "<a certain value>":
+        return "#7f7f7f"
+    else:
+        return "#03A9F4"
+
+doc.viz_serve(color_map_func=color_map_func)
+```
+
+## Benchmarks
+We benchmarked the frame and relation extractors on biomedical information extraction tasks. The results and experiment code are available on [this page](https://github.com/daviden1013/LLM-IE_Benchmark).
+
+
+## Citation
+For more information and benchmarks, please check our paper:
+```bibtex
+@article{hsu2024llm,
+  title={LLM-IE: A Python Package for Generative Information Extraction with Large Language Models},
+  author={Hsu, Enshuo and Roberts, Kirk},
+  journal={arXiv preprint arXiv:2411.11779},
+  year={2024}
+}
+```