llm-ie 0.2.2__tar.gz → 0.3.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (22)
  1. {llm_ie-0.2.2 → llm_ie-0.3.0}/PKG-INFO +88 -44
  2. {llm_ie-0.2.2 → llm_ie-0.3.0}/README.md +87 -43
  3. {llm_ie-0.2.2 → llm_ie-0.3.0}/pyproject.toml +1 -1
  4. llm_ie-0.3.0/src/llm_ie/asset/PromptEditor_prompts/chat.txt +5 -0
  5. llm_ie-0.3.0/src/llm_ie/asset/PromptEditor_prompts/rewrite.txt +9 -0
  6. llm_ie-0.3.0/src/llm_ie/asset/PromptEditor_prompts/system.txt +1 -0
  7. llm_ie-0.3.0/src/llm_ie/asset/prompt_guide/BasicFrameExtractor_prompt_guide.txt +145 -0
  8. {llm_ie-0.2.2 → llm_ie-0.3.0}/src/llm_ie/asset/prompt_guide/BinaryRelationExtractor_prompt_guide.txt +32 -12
  9. {llm_ie-0.2.2 → llm_ie-0.3.0}/src/llm_ie/asset/prompt_guide/MultiClassRelationExtractor_prompt_guide.txt +35 -12
  10. llm_ie-0.3.0/src/llm_ie/asset/prompt_guide/ReviewFrameExtractor_prompt_guide.txt +145 -0
  11. llm_ie-0.3.0/src/llm_ie/asset/prompt_guide/SentenceFrameExtractor_prompt_guide.txt +145 -0
  12. {llm_ie-0.2.2 → llm_ie-0.3.0}/src/llm_ie/engines.py +1 -1
  13. {llm_ie-0.2.2 → llm_ie-0.3.0}/src/llm_ie/extractors.py +36 -5
  14. llm_ie-0.3.0/src/llm_ie/prompt_editor.py +189 -0
  15. llm_ie-0.2.2/src/llm_ie/asset/PromptEditor_prompts/rewrite.txt +0 -7
  16. llm_ie-0.2.2/src/llm_ie/asset/prompt_guide/BasicFrameExtractor_prompt_guide.txt +0 -35
  17. llm_ie-0.2.2/src/llm_ie/asset/prompt_guide/ReviewFrameExtractor_prompt_guide.txt +0 -35
  18. llm_ie-0.2.2/src/llm_ie/asset/prompt_guide/SentenceFrameExtractor_prompt_guide.txt +0 -40
  19. llm_ie-0.2.2/src/llm_ie/prompt_editor.py +0 -45
  20. {llm_ie-0.2.2 → llm_ie-0.3.0}/src/llm_ie/__init__.py +0 -0
  21. {llm_ie-0.2.2 → llm_ie-0.3.0}/src/llm_ie/asset/PromptEditor_prompts/comment.txt +0 -0
  22. {llm_ie-0.2.2 → llm_ie-0.3.0}/src/llm_ie/data_types.py +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: llm-ie
- Version: 0.2.2
+ Version: 0.3.0
  Summary: An LLM-powered tool that transforms everyday language into robust information extraction pipelines.
  License: MIT
  Author: Enshuo (David) Hsu
@@ -20,6 +20,13 @@ Description-Content-Type: text/markdown
 
  An LLM-powered tool that transforms everyday language into robust information extraction pipelines.
 
+ | Features | Support |
+ |----------|----------|
+ | **LLM Agent for prompt writing** | :white_check_mark: Interactive chat, Python functions |
+ | **Named Entity Recognition (NER)** | :white_check_mark: Document-level, Sentence-level |
+ | **Entity Attributes Extraction** | :white_check_mark: Flexible formats |
+ | **Relation Extraction (RE)** | :white_check_mark: Binary & Multiclass relations |
+
  ## Table of Contents
  - [Overview](#overview)
  - [Prerequisite](#prerequisite)
@@ -35,12 +42,12 @@ An LLM-powered tool that transforms everyday language into robust information ex
  - [RelationExtractor](#relationextractor)
 
  ## Overview
- LLM-IE is a toolkit that provides robust information extraction utilities for frame-based information extraction. Since prompt design has a significant impact on generative information extraction with LLMs, it also provides a built-in LLM editor to help with prompt writing. The flowchart below demonstrates the workflow starting from a casual language request.
+ LLM-IE is a toolkit that provides robust information extraction utilities for named entity, entity attribute, and entity relation extraction. Since prompt design has a significant impact on generative information extraction with LLMs, it also has a built-in LLM agent ("editor") to help with prompt writing. The flowchart below demonstrates the workflow from a casual language request to output visualization.
 
  <div align="center"><img src="doc_asset/readme_img/LLM-IE flowchart.png" width=800 ></div>
 
  ## Prerequisite
- At least one LLM inference engine is required. There are built-in supports for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), <img src=doc_asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction), and <img src=doc_asset/readme_img/vllm-logo.png width=20 /> vLLM. For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See [LLM Inference Engine](#llm-inference-engine) section below.
+ At least one LLM inference engine is required. There is built-in support for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), <img src=doc_asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction), and <img src=doc_asset/readme_img/vllm-logo.png width=20 /> [vLLM](https://github.com/vllm-project/vllm). For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See the [LLM Inference Engine](#llm-inference-engine) section below.
 
  ## Installation
  The Python package is available on PyPI.
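The real `InferenceEngine` abstract class lives in `src/llm_ie/engines.py`, and its exact method names are not shown in this diff. As a hedged, self-contained sketch of the wrapper idea only (class and method names below are illustrative, not llm-ie's API):

```python
from abc import ABC, abstractmethod

# Illustrative sketch: wrap any LLM backend behind one chat interface,
# mirroring the idea of llm-ie's InferenceEngine abstract class.
class InferenceEngine(ABC):
    @abstractmethod
    def chat(self, messages: list) -> str:
        """Take chat messages [{'role': ..., 'content': ...}] and return the completion text."""

class EchoEngine(InferenceEngine):
    """Trivial backend that returns the last user message; a real subclass would call an LLM API."""
    def chat(self, messages: list) -> str:
        return messages[-1]["content"]

engine = EchoEngine()
print(engine.chat([{"role": "user", "content": "hello"}]))  # → hello
```

A custom subclass for an unsupported backend would implement the same interface and be passed wherever the built-in engines are used.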
@@ -125,21 +132,26 @@ We start with a casual description:
 
  *"Extract diagnosis from the clinical note. Make sure to include diagnosis date and status."*
 
- The ```PromptEditor``` rewrites it following the schema required by the ```BasicFrameExtractor```.
-
- ```python
+ Define the AI prompt editor:
+ ```python
+ from llm_ie.engines import OllamaInferenceEngine
  from llm_ie.extractors import BasicFrameExtractor
  from llm_ie.prompt_editor import PromptEditor
 
- # Describe the task in casual language
- prompt_draft = "Extract diagnosis from the clinical note. Make sure to include diagnosis date and status."
-
- # Use LLM editor to generate a formal prompt template with standard extraction schema
+ # Define an LLM inference engine
+ llm = OllamaInferenceEngine(model_name="llama3.1:8b-instruct-q8_0")
+ # Define the LLM prompt editor
  editor = PromptEditor(llm, BasicFrameExtractor)
- prompt_template = editor.rewrite(prompt_draft)
+ # Start chat
+ editor.chat()
  ```
 
- The editor generates a prompt template as below:
+ This opens an interactive session:
+ <div align="left"><img src=doc_asset/readme_img/terminal_chat.PNG width=1000 ></div>
+
+
+ The ```PromptEditor``` drafts a prompt template following the schema required by the ```BasicFrameExtractor```:
+
  ```
  # Task description
  The paragraph below contains a clinical note with diagnoses listed. Please carefully review it and extract the diagnoses, including the diagnosis date and status.
@@ -165,6 +177,8 @@ If there is no specific date or status, just omit those keys.
  Below is the clinical note:
  {{input}}
  ```
+
+
  #### Information extraction pipeline
  Now we apply the prompt template to build an information extraction pipeline.
 
@@ -202,15 +216,33 @@ from llm_ie.data_types import LLMInformationExtractionDocument
  doc = LLMInformationExtractionDocument(doc_id="Synthesized medical note",
  text=note_text)
  # Add frames to a document
- for frame in frames:
- doc.add_frame(frame, valid_mode="span", create_id=True)
+ doc.add_frames(frames, create_id=True)
 
  # Save document to file (.llmie)
  doc.save("<your filename>.llmie")
  ```
 
+ To visualize the extracted frames, we use the ```viz_serve()``` method.
+ ```python
+ doc.viz_serve()
+ ```
+ A Flask app starts on port 5000 (default).
+ ```
+ * Serving Flask app 'ie_viz.utilities'
+ * Debug mode: off
+ WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
+ * Running on all addresses (0.0.0.0)
+ * Running on http://127.0.0.1:5000
+ Press CTRL+C to quit
+ 127.0.0.1 - - [03/Oct/2024 23:36:22] "GET / HTTP/1.1" 200 -
+ ```
+
+ <div align="left"><img src="doc_asset/readme_img/llm-ie_demo.PNG" width=1000 ></div>
+
+
  ## Examples
- - [Write prompt templates with AI editors](demo/prompt_template_writing.ipynb)
+ - [Interactive chat with LLM prompt editors](demo/prompt_template_writing_via_chat.ipynb)
+ - [Write prompt templates with LLM prompt editors](demo/prompt_template_writing.ipynb)
  - [NER + RE for Drug, Strength, Frequency](demo/medication_relation_extraction.ipynb)
 
  ## User Guide
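The removed loop above validated each frame with `valid_mode="span"` before the 0.3.0 `add_frames()` batch method replaced it. What span validation means can be sketched in a few lines (illustrative only, not the library's implementation):

```python
def span_is_valid(doc_text: str, start: int, end: int, entity_text: str) -> bool:
    # A span is valid when it lies inside the document and the slice
    # reproduces the entity text exactly.
    return 0 <= start < end <= len(doc_text) and doc_text[start:end] == entity_text

note = "Patient denies chills."
print(span_is_valid(note, 15, 21, "chills"))  # → True
print(span_is_valid(note, 15, 21, "fever"))   # → False
```

Frames that fail this check would point at the wrong text after saving, so validating before adding them to a document is a reasonable safeguard.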
@@ -435,7 +467,30 @@ print(BasicFrameExtractor.get_prompt_guide())
  ```
 
  ### Prompt Editor
- The prompt editor is an LLM agent that reviews, comments and rewrites a prompt following the defined schema of each extractor. It is recommended to use prompt editor iteratively:
+ The prompt editor is an LLM agent that helps users write prompt templates following the defined schema and guidelines of each extractor. Chat with the prompt editor:
+
+ ```python
+ from llm_ie.prompt_editor import PromptEditor
+ from llm_ie.extractors import BasicFrameExtractor
+ from llm_ie.engines import OllamaInferenceEngine
+
+ # Define an LLM inference engine
+ ollama = OllamaInferenceEngine(model_name="llama3.1:8b-instruct-q8_0")
+
+ # Define the editor
+ editor = PromptEditor(ollama, BasicFrameExtractor)
+
+ editor.chat()
+ ```
+
+ In a terminal environment, an interactive chat session will start:
+ <div align="left"><img src=doc_asset/readme_img/terminal_chat.PNG width=1000 ></div>
+
+ In a Jupyter/IPython environment, an ipywidgets session will start:
+ <div align="left"><img src=doc_asset/readme_img/IPython_chat.PNG width=1000 ></div>
+
+
+ We can also use the `rewrite()` and `comment()` methods to interact with the prompt editor programmatically:
  1. start with a casual description of the task
  2. have the prompt editor generate a prompt template as the starting point
  3. manually revise the prompt template
@@ -581,40 +636,29 @@ print(BasicFrameExtractor.get_prompt_guide())
  ```
 
  ```
- Prompt template design:
- 1. Task description
- 2. Schema definition
- 3. Output format definition
- 4. Additional hints
- 5. Input placeholder
+ Prompt Template Design:
 
- Example:
+ 1. Task Description:
+ Provide a detailed description of the task, including the background and the type of task (e.g., named entity recognition).
 
- # Task description
- The paragraph below is from the Food and Drug Administration (FDA) Clinical Pharmacology Section of Labeling for Human Prescription Drug and Biological Products, Adverse reactions section. Please carefully review it and extract the adverse reactions and percentages. Note that each adverse reaction is nested under a clinical trial and potentially an arm. Your output should take that into consideration.
+ 2. Schema Definition:
+ List the key concepts that should be extracted, and provide clear definitions for each one.
 
- # Schema definition
- Your output should contain:
- "ClinicalTrial" which is the name of the trial,
- If applicable, "Arm" which is the arm within the clinical trial,
- "AdverseReaction" which is the name of the adverse reaction,
- If applicable, "Percentage" which is the occurance of the adverse reaction within the trial and arm,
- "Evidence" which is the EXACT sentence in the text where you found the AdverseReaction from
+ 3. Output Format Definition:
+ The output should be a JSON list, where each element is a dictionary representing a frame (an entity along with its attributes). Each dictionary must include a key that holds the entity text. This key can be named "entity_text" or anything else depending on the context. The attributes can either be flat (e.g., {"entity_text": "<entity_text>", "attr1": "<attr1>", "attr2": "<attr2>"}) or nested (e.g., {"entity_text": "<entity_text>", "attributes": {"attr1": "<attr1>", "attr2": "<attr2>"}}).
 
- # Output format definition
- Your output should follow JSON format, for example:
- [
- {"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"},
- {"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"}
- ]
+ 4. Optional: Hints:
+ Provide itemized hints for the information extractors to guide the extraction process.
+
+ 5. Optional: Examples:
+ Include examples in the format:
+ Input: ...
+ Output: ...
 
- # Additional hints
- Your output should be 100% based on the provided content. DO NOT output fake numbers.
- If there is no specific arm, just omit the "Arm" key. If the percentage is not reported, just omit the "Percentage" key. The "Evidence" should always be provided.
+ 6. Input Placeholder:
+ The template must include a placeholder in the format {{<placeholder_name>}} for the input text. The placeholder name can be customized as needed.
 
- # Input placeholder
- Below is the Adverse reactions section:
- {{input}}
+ ......
  ```
  </details>
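The output format definition above allows two JSON shapes: flat attributes or a nested "attributes" dictionary. A small, self-contained sketch of normalizing both shapes into (entity, attributes) pairs (the helper name is an assumption, not llm-ie code; "entity_text" is the guide's example key):

```python
import json

def normalize_frames(llm_output: str, entity_key: str = "entity_text"):
    # Accept both the flat form {"entity_text": ..., "attr1": ...} and the
    # nested form {"entity_text": ..., "attributes": {...}}.
    frames = []
    for item in json.loads(llm_output):
        entity = item[entity_key]
        if "attributes" in item:          # nested form
            attrs = dict(item["attributes"])
        else:                             # flat form
            attrs = {k: v for k, v in item.items() if k != entity_key}
        frames.append((entity, attrs))
    return frames

flat = '[{"entity_text": "headache", "status": "present"}]'
nested = '[{"entity_text": "chills", "attributes": {"status": "denied"}}]'
print(normalize_frames(flat))    # → [('headache', {'status': 'present'})]
print(normalize_frames(nested))  # → [('chills', {'status': 'denied'})]
```

Normalizing early keeps downstream code independent of which shape a given prompt template asked for.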
 
@@ -6,6 +6,13 @@
 
  An LLM-powered tool that transforms everyday language into robust information extraction pipelines.
 
+ | Features | Support |
+ |----------|----------|
+ | **LLM Agent for prompt writing** | :white_check_mark: Interactive chat, Python functions |
+ | **Named Entity Recognition (NER)** | :white_check_mark: Document-level, Sentence-level |
+ | **Entity Attributes Extraction** | :white_check_mark: Flexible formats |
+ | **Relation Extraction (RE)** | :white_check_mark: Binary & Multiclass relations |
+
  ## Table of Contents
  - [Overview](#overview)
  - [Prerequisite](#prerequisite)
@@ -21,12 +28,12 @@ An LLM-powered tool that transforms everyday language into robust information ex
  - [RelationExtractor](#relationextractor)
 
  ## Overview
- LLM-IE is a toolkit that provides robust information extraction utilities for frame-based information extraction. Since prompt design has a significant impact on generative information extraction with LLMs, it also provides a built-in LLM editor to help with prompt writing. The flowchart below demonstrates the workflow starting from a casual language request.
+ LLM-IE is a toolkit that provides robust information extraction utilities for named entity, entity attribute, and entity relation extraction. Since prompt design has a significant impact on generative information extraction with LLMs, it also has a built-in LLM agent ("editor") to help with prompt writing. The flowchart below demonstrates the workflow from a casual language request to output visualization.
 
  <div align="center"><img src="doc_asset/readme_img/LLM-IE flowchart.png" width=800 ></div>
 
  ## Prerequisite
- At least one LLM inference engine is required. There are built-in supports for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), <img src=doc_asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction), and <img src=doc_asset/readme_img/vllm-logo.png width=20 /> vLLM. For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See [LLM Inference Engine](#llm-inference-engine) section below.
+ At least one LLM inference engine is required. There is built-in support for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), <img src=doc_asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction), and <img src=doc_asset/readme_img/vllm-logo.png width=20 /> [vLLM](https://github.com/vllm-project/vllm). For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See the [LLM Inference Engine](#llm-inference-engine) section below.
 
  ## Installation
  The Python package is available on PyPI.
@@ -111,21 +118,26 @@ We start with a casual description:
 
  *"Extract diagnosis from the clinical note. Make sure to include diagnosis date and status."*
 
- The ```PromptEditor``` rewrites it following the schema required by the ```BasicFrameExtractor```.
-
- ```python
+ Define the AI prompt editor:
+ ```python
+ from llm_ie.engines import OllamaInferenceEngine
  from llm_ie.extractors import BasicFrameExtractor
  from llm_ie.prompt_editor import PromptEditor
 
- # Describe the task in casual language
- prompt_draft = "Extract diagnosis from the clinical note. Make sure to include diagnosis date and status."
-
- # Use LLM editor to generate a formal prompt template with standard extraction schema
+ # Define an LLM inference engine
+ llm = OllamaInferenceEngine(model_name="llama3.1:8b-instruct-q8_0")
+ # Define the LLM prompt editor
  editor = PromptEditor(llm, BasicFrameExtractor)
- prompt_template = editor.rewrite(prompt_draft)
+ # Start chat
+ editor.chat()
  ```
 
- The editor generates a prompt template as below:
+ This opens an interactive session:
+ <div align="left"><img src=doc_asset/readme_img/terminal_chat.PNG width=1000 ></div>
+
+
+ The ```PromptEditor``` drafts a prompt template following the schema required by the ```BasicFrameExtractor```:
+
  ```
  # Task description
  The paragraph below contains a clinical note with diagnoses listed. Please carefully review it and extract the diagnoses, including the diagnosis date and status.
@@ -151,6 +163,8 @@ If there is no specific date or status, just omit those keys.
  Below is the clinical note:
  {{input}}
  ```
+
+
  #### Information extraction pipeline
  Now we apply the prompt template to build an information extraction pipeline.
 
@@ -188,15 +202,33 @@ from llm_ie.data_types import LLMInformationExtractionDocument
  doc = LLMInformationExtractionDocument(doc_id="Synthesized medical note",
  text=note_text)
  # Add frames to a document
- for frame in frames:
- doc.add_frame(frame, valid_mode="span", create_id=True)
+ doc.add_frames(frames, create_id=True)
 
  # Save document to file (.llmie)
  doc.save("<your filename>.llmie")
  ```
 
+ To visualize the extracted frames, we use the ```viz_serve()``` method.
+ ```python
+ doc.viz_serve()
+ ```
+ A Flask app starts on port 5000 (default).
+ ```
+ * Serving Flask app 'ie_viz.utilities'
+ * Debug mode: off
+ WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
+ * Running on all addresses (0.0.0.0)
+ * Running on http://127.0.0.1:5000
+ Press CTRL+C to quit
+ 127.0.0.1 - - [03/Oct/2024 23:36:22] "GET / HTTP/1.1" 200 -
+ ```
+
+ <div align="left"><img src="doc_asset/readme_img/llm-ie_demo.PNG" width=1000 ></div>
+
+
  ## Examples
- - [Write prompt templates with AI editors](demo/prompt_template_writing.ipynb)
+ - [Interactive chat with LLM prompt editors](demo/prompt_template_writing_via_chat.ipynb)
+ - [Write prompt templates with LLM prompt editors](demo/prompt_template_writing.ipynb)
  - [NER + RE for Drug, Strength, Frequency](demo/medication_relation_extraction.ipynb)
 
  ## User Guide
@@ -421,7 +453,30 @@ print(BasicFrameExtractor.get_prompt_guide())
  ```
 
  ### Prompt Editor
- The prompt editor is an LLM agent that reviews, comments and rewrites a prompt following the defined schema of each extractor. It is recommended to use prompt editor iteratively:
+ The prompt editor is an LLM agent that helps users write prompt templates following the defined schema and guidelines of each extractor. Chat with the prompt editor:
+
+ ```python
+ from llm_ie.prompt_editor import PromptEditor
+ from llm_ie.extractors import BasicFrameExtractor
+ from llm_ie.engines import OllamaInferenceEngine
+
+ # Define an LLM inference engine
+ ollama = OllamaInferenceEngine(model_name="llama3.1:8b-instruct-q8_0")
+
+ # Define the editor
+ editor = PromptEditor(ollama, BasicFrameExtractor)
+
+ editor.chat()
+ ```
+
+ In a terminal environment, an interactive chat session will start:
+ <div align="left"><img src=doc_asset/readme_img/terminal_chat.PNG width=1000 ></div>
+
+ In a Jupyter/IPython environment, an ipywidgets session will start:
+ <div align="left"><img src=doc_asset/readme_img/IPython_chat.PNG width=1000 ></div>
+
+
+ We can also use the `rewrite()` and `comment()` methods to interact with the prompt editor programmatically:
  1. start with a casual description of the task
  2. have the prompt editor generate a prompt template as the starting point
  3. manually revise the prompt template
@@ -567,40 +622,29 @@ print(BasicFrameExtractor.get_prompt_guide())
  ```
 
  ```
- Prompt template design:
- 1. Task description
- 2. Schema definition
- 3. Output format definition
- 4. Additional hints
- 5. Input placeholder
+ Prompt Template Design:
 
- Example:
+ 1. Task Description:
+ Provide a detailed description of the task, including the background and the type of task (e.g., named entity recognition).
 
- # Task description
- The paragraph below is from the Food and Drug Administration (FDA) Clinical Pharmacology Section of Labeling for Human Prescription Drug and Biological Products, Adverse reactions section. Please carefully review it and extract the adverse reactions and percentages. Note that each adverse reaction is nested under a clinical trial and potentially an arm. Your output should take that into consideration.
+ 2. Schema Definition:
+ List the key concepts that should be extracted, and provide clear definitions for each one.
 
- # Schema definition
- Your output should contain:
- "ClinicalTrial" which is the name of the trial,
- If applicable, "Arm" which is the arm within the clinical trial,
- "AdverseReaction" which is the name of the adverse reaction,
- If applicable, "Percentage" which is the occurance of the adverse reaction within the trial and arm,
- "Evidence" which is the EXACT sentence in the text where you found the AdverseReaction from
+ 3. Output Format Definition:
+ The output should be a JSON list, where each element is a dictionary representing a frame (an entity along with its attributes). Each dictionary must include a key that holds the entity text. This key can be named "entity_text" or anything else depending on the context. The attributes can either be flat (e.g., {"entity_text": "<entity_text>", "attr1": "<attr1>", "attr2": "<attr2>"}) or nested (e.g., {"entity_text": "<entity_text>", "attributes": {"attr1": "<attr1>", "attr2": "<attr2>"}}).
 
- # Output format definition
- Your output should follow JSON format, for example:
- [
- {"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"},
- {"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"}
- ]
+ 4. Optional: Hints:
+ Provide itemized hints for the information extractors to guide the extraction process.
+
+ 5. Optional: Examples:
+ Include examples in the format:
+ Input: ...
+ Output: ...
 
- # Additional hints
- Your output should be 100% based on the provided content. DO NOT output fake numbers.
- If there is no specific arm, just omit the "Arm" key. If the percentage is not reported, just omit the "Percentage" key. The "Evidence" should always be provided.
+ 6. Input Placeholder:
+ The template must include a placeholder in the format {{<placeholder_name>}} for the input text. The placeholder name can be customized as needed.
 
- # Input placeholder
- Below is the Adverse reactions section:
- {{input}}
+ ......
  ```
  </details>
 
@@ -1,6 +1,6 @@
  [tool.poetry]
  name = "llm-ie"
- version = "0.2.2"
+ version = "0.3.0"
  description = "An LLM-powered tool that transforms everyday language into robust information extraction pipelines."
  authors = ["Enshuo (David) Hsu"]
  license = "MIT"
@@ -0,0 +1,5 @@
+ # Task description
+ Chat with the user following the prompt guideline below.
+
+ # Prompt guideline
+ {{prompt_guideline}}
@@ -0,0 +1,9 @@
+ # Task description
+ Rewrite the draft prompt following the prompt guideline below.
+ DO NOT explain your answer.
+
+ # Prompt guideline
+ {{prompt_guideline}}
+
+ # Draft prompt
+ {{draft}}
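The new PromptEditor prompts above use `{{...}}` placeholders (`{{prompt_guideline}}`, `{{draft}}`), matching the `{{<placeholder_name>}}` convention described in the prompt guides. A minimal, self-contained sketch of filling such a template (regex-based; an assumption about the approach, not llm-ie's implementation):

```python
import re

def fill_template(template: str, **values: str) -> str:
    # Replace each {{name}} with the supplied value; raise on unknown names
    # so typos in placeholder names surface early.
    def sub(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for placeholder {{{{{name}}}}}")
        return values[name]
    return re.sub(r"\{\{(\w+)\}\}", sub, template)

template = "# Prompt guideline\n{{prompt_guideline}}\n\n# Draft prompt\n{{draft}}"
print(fill_template(template, prompt_guideline="...", draft="Extract diagnosis."))
```

Failing fast on a missing placeholder is safer than silently leaving `{{name}}` in the prompt sent to the LLM.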
@@ -0,0 +1 @@
+ You are an AI assistant specializing in prompt writing and improvement. Your role is to help users refine, rewrite, and generate effective prompts based on guidelines provided. You are highly knowledgeable in extracting key information and adhering to structured formats. During interactions, you will engage in clear, insightful, and context-aware conversations, providing thoughtful responses to assist the user. Maintain a polite, professional tone and ensure each response adds value to the conversation, promoting clarity and creativity in the user's prompts. If users ask about irrelevant topics (not related to prompt development), you will politely decline to answer and guide the conversation back to prompt development.
@@ -0,0 +1,145 @@
1
+ Prompt Template Design:
2
+
3
+ 1. Task Description:
4
+ Provide a detailed description of the task, including the background and the type of task (e.g., named entity recognition).
5
+
6
+ 2. Schema Definition:
7
+ List the key concepts that should be extracted, and provide clear definitions for each one.
8
+
9
+ 3. Output Format Definition:
10
+ The output should be a JSON list, where each element is a dictionary representing a frame (an entity along with its attributes). Each dictionary must include a key that holds the entity text. This key can be named "entity_text" or anything else depend on the context. The attributes can either be flat (e.g., {"entity_text": "<entity_text>", "attr1": "<attr1>", "attr2": "<attr2>"}) or nested (e.g., {"entity_text": "<entity_text>", "attributes": {"attr1": "<attr1>", "attr2": "<attr2>"}}).
11
+
12
+ 4. Optional: Hints:
13
+ Provide itemized hints for the information extractors to guide the extraction process.
14
+
15
+ 5. Optional: Examples:
16
+ Include examples in the format:
17
+ Input: ...
18
+ Output: ...
19
+
20
+ 6. Input Placeholder:
21
+ The template must include a placeholder in the format {{<placeholder_name>}} for the input text. The placeholder name can be customized as needed.
22
+
23
+
24
+ Example 1 (single entity type with attributes):
25
+
26
+ # Task description
27
+ The paragraph below is from the Food and Drug Administration (FDA) Clinical Pharmacology Section of Labeling for Human Prescription Drug and Biological Products, Adverse reactions section. Please carefully review it and extract the adverse reactions and percentages. Note that each adverse reaction is nested under a clinical trial and potentially an arm. Your output should take that into consideration.
28
+
29
+ # Schema definition
30
+ Your output should contain:
31
+ "ClinicalTrial" which is the name of the trial,
32
+ If applicable, "Arm" which is the arm within the clinical trial,
33
+ "AdverseReaction" which is the name of the adverse reaction,
34
+ If applicable, "Percentage" which is the occurance of the adverse reaction within the trial and arm,
35
+ "Evidence" which is the EXACT sentence in the text where you found the AdverseReaction from
36
+
37
+ # Output format definition
38
+ Your output should follow JSON format, for example:
39
+ [
40
+ {"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"},
41
+ {"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"}
42
+ ]
43
+
44
+ # Additional hints
45
+ Your output should be 100% based on the provided content. DO NOT output fake numbers.
46
+ If there is no specific arm, just omit the "Arm" key. If the percentage is not reported, just omit the "Percentage" key. The "Evidence" should always be provided.
47
+
48
+ # Input placeholder
49
+ Below is the Adverse reactions section:
50
+ {{input}}
51
+
52
+
53
+ Example 2 (multiple entity types):
+
+ # Task description
+ This is a named entity recognition task. Given a medical note, annotate the Drug, Form, Strength, Frequency, Route, Dosage, Reason, ADE, and Duration entities.
+
+ # Schema definition
+ Your output should contain:
+ "entity_text": the exact wording as mentioned in the note.
+ "entity_type": type of the entity. It should be one of "Drug", "Form", "Strength", "Frequency", "Route", "Dosage", "Reason", "ADE", or "Duration".
+
+ # Output format definition
+ Your output should follow JSON format,
+ if there are any entity mentions of Drug, Form, Strength, Frequency, Route, Dosage, Reason, ADE, or Duration:
+ [{"entity_text": "<Exact entity mention as in the note>", "entity_type": "<entity type as listed above>"},
+ {"entity_text": "<Exact entity mention as in the note>", "entity_type": "<entity type as listed above>"}]
+ if there is no entity mentioned in the given note, just output an empty list:
+ []
+
+ I am only interested in the extracted contents in []. Do NOT explain your answer.
+
+ # Examples
+ Below are some examples:
+
+ Input: Acetaminophen 650 mg PO BID 5.
+ Output: [{"entity_text": "Acetaminophen", "entity_type": "Drug"}, {"entity_text": "650 mg", "entity_type": "Strength"}, {"entity_text": "PO", "entity_type": "Route"}, {"entity_text": "BID", "entity_type": "Frequency"}]
+
+ Input: Mesalamine DR 1200 mg PO BID 2.
+ Output: [{"entity_text": "Mesalamine DR", "entity_type": "Drug"}, {"entity_text": "1200 mg", "entity_type": "Strength"}, {"entity_text": "BID", "entity_type": "Frequency"}, {"entity_text": "PO", "entity_type": "Route"}]
+
+
+ # Input placeholder
+ Below is the medical note:
+ "{{input}}"
+
+
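The schema above requires "entity_text" to be the exact wording from the note. A small sketch (a hypothetical helper, not part of the llm-ie package) that grounds each extracted frame back to character offsets, dropping frames whose text is not found verbatim in the note:

```python
def locate_spans(note: str, frames: list) -> list:
    """Attach character offsets to extracted frames.

    Frames whose entity_text does not appear verbatim in the note are
    dropped, since the schema requires exact wording from the note.
    (Uses the first occurrence only; a fuller version would handle
    repeated mentions.)
    """
    located = []
    for f in frames:
        start = note.find(f["entity_text"])
        if start < 0:
            continue  # schema violation: text not verbatim in the note
        located.append({**f, "start": start, "end": start + len(f["entity_text"])})
    return located

note = "Acetaminophen 650 mg PO BID 5."
frames = [{"entity_text": "Acetaminophen", "entity_type": "Drug"},
          {"entity_text": "650 mg", "entity_type": "Strength"}]
spans = locate_spans(note, frames)
```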
+ Example 3 (multiple entity types with corresponding attributes):
+
+ # Task description
+ This is a named entity recognition task. Given a medical note, annotate the events (EVENT) and time expressions (TIMEX3).
+
+ # Schema definition
+ Your output should contain:
+ "entity_text": the exact wording as mentioned in the note.
+ "entity_type": type of the entity. It should be one of "EVENT" or "TIMEX3".
+ if entity_type is "EVENT",
+ "type": the event type as one of "TEST", "PROBLEM", "TREATMENT", "CLINICAL_DEPT", "EVIDENTIAL", or "OCCURRENCE".
+ "polarity": whether an EVENT is positive ("POS") or negative ("NEG"). For example, in “the patient reports headache, and denies chills”, the EVENT [headache] is positive in its polarity, and the EVENT [chills] is negative in its polarity.
+ "modality": whether an EVENT actually occurred or not. Must be one of "FACTUAL", "CONDITIONAL", "POSSIBLE", or "PROPOSED".
+
+ if entity_type is "TIMEX3",
+ "type": the type as one of "DATE", "TIME", "DURATION", or "FREQUENCY".
+ "val": the numeric value: 1) DATE: [YYYY]-[MM]-[DD]; 2) TIME: [hh]:[mm]:[ss]; 3) DURATION: P[n][Y/M/W/D], so “for eleven days” is represented as “P11D”, meaning a period of 11 days; 4) FREQUENCY: R[n][duration], where n denotes the number of repeats (when n is omitted, the expression denotes an unspecified number of repeats). For example, “once a day for 3 days” is “R3P1D” (repeat the time interval of 1 day (P1D) 3 times (R3)), and “twice every day” is “RPT12H” (repeat every 12 hours).
+ "mod": additional information regarding the temporal value of a time expression. Must be one of:
+ “NA”: the default value, no relevant modifier is present;
+ “MORE”: means “more than”, e.g. over 2 days (val = P2D, mod = MORE);
+ “LESS”: means “less than”, e.g. almost 2 months (val = P2M, mod = LESS);
+ “APPROX”: means “approximate”, e.g. nearly a week (val = P1W, mod = APPROX);
+ “START”: describes the beginning of a period of time, e.g. Christmas morning, 2005 (val = 2005-12-25, mod = START);
+ “END”: describes the end of a period of time, e.g. late last year (val = 2010, mod = END);
+ “MIDDLE”: describes the middle of a period of time, e.g. mid-September 2001 (val = 2001-09, mod = MIDDLE).
+
+ # Output format definition
+ Your output should follow JSON format,
+ if there are any EVENT or TIMEX3 entity mentions:
+ [
+ {"entity_text": "<Exact entity mention as in the note>", "entity_type": "EVENT", "type": "<event type>", "polarity": "<event polarity>", "modality": "<event modality>"},
+ {"entity_text": "<Exact entity mention as in the note>", "entity_type": "TIMEX3", "type": "<TIMEX3 type>", "val": "<time value>", "mod": "<additional information>"}
+ ...
+ ]
+ if there is no entity mentioned in the given note, just output an empty list:
+ []
+
+ I am only interested in the extracted contents in []. Do NOT explain your answer.
+
+ # Examples
+ Below are some examples:
+
+ Input: At 9/7/93 , 1:00 a.m. , intravenous fluids rate was decreased to 50 cc's per hour , total fluids given during the first 24 hours were 140 to 150 cc's per kilo per day .
+ Output: [{"entity_text": "intravenous fluids", "entity_type": "EVENT", "type": "TREATMENT", "polarity": "POS", "modality": "FACTUAL"},
+ {"entity_text": "decreased", "entity_type": "EVENT", "type": "OCCURRENCE", "polarity": "POS", "modality": "FACTUAL"},
+ {"entity_text": "total fluids", "entity_type": "EVENT", "type": "TREATMENT", "polarity": "POS", "modality": "FACTUAL"},
+ {"entity_text": "9/7/93 , 1:00 a.m.", "entity_type": "TIMEX3", "type": "TIME", "val": "1993-09-07T01:00", "mod": "NA"},
+ {"entity_text": "24 hours", "entity_type": "TIMEX3", "type": "DURATION", "val": "PT24H", "mod": "NA"}]
+
+ Input: At that time it appeared well adhered to the underlying skin .
+ Output: [{"entity_text": "it", "entity_type": "EVENT", "type": "TREATMENT", "polarity": "POS", "modality": "FACTUAL"},
+ {"entity_text": "well adhered", "entity_type": "EVENT", "type": "OCCURRENCE", "polarity": "POS", "modality": "FACTUAL"}]
+
+
+ # Input placeholder
+ Below is the entire medical note:
+ "{{input}}"
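Models do not always respect attribute constraints like the EVENT/TIMEX3 type lists above, so downstream code may want to validate parsed frames before use. A minimal sketch under that assumption (the helper and names are illustrative, not part of the llm-ie API):

```python
# Allowed attribute values from the EVENT/TIMEX3 schema above.
ALLOWED_EVENT_TYPES = {"TEST", "PROBLEM", "TREATMENT", "CLINICAL_DEPT",
                       "EVIDENTIAL", "OCCURRENCE"}
ALLOWED_TIMEX3_TYPES = {"DATE", "TIME", "DURATION", "FREQUENCY"}

def filter_valid_frames(frames: list) -> list:
    """Keep only well-formed frames: a non-empty entity_text plus an
    entity_type/type combination permitted by the schema."""
    valid = []
    for f in frames:
        if not f.get("entity_text"):
            continue  # every frame needs a non-empty entity_text
        if f.get("entity_type") == "EVENT" and f.get("type") in ALLOWED_EVENT_TYPES:
            valid.append(f)
        elif f.get("entity_type") == "TIMEX3" and f.get("type") in ALLOWED_TIMEX3_TYPES:
            valid.append(f)
    return valid

kept = filter_valid_frames([
    {"entity_text": "24 hours", "entity_type": "TIMEX3", "type": "DURATION"},
    {"entity_text": "headache", "entity_type": "EVENT", "type": "SYMPTOM"},  # not an allowed type
])
```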