llm-ie 0.2.1__py3-none-any.whl → 0.3.0__py3-none-any.whl

@@ -0,0 +1,5 @@
1
+ # Task description
2
+ Chat with the user following the prompt guideline below.
3
+
4
+ # Prompt guideline
5
+ {{prompt_guideline}}
@@ -1,4 +1,6 @@
1
- This is a prompt rewriting task. Rewrite the draft prompt following the prompt guideline below. DO NOT explain your answer.
1
+ # Task description
2
+ Rewrite the draft prompt following the prompt guideline below.
3
+ DO NOT explain your answer.
2
4
 
3
5
  # Prompt guideline
4
6
  {{prompt_guideline}}
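As a rough illustration of how a `{{prompt_guideline}}` placeholder like the one above might be filled before the prompt is sent to a model, here is a minimal sketch. The `render_template` helper is hypothetical and is not taken from llm-ie; it only assumes that placeholders are written as `{{name}}`.

```python
import re


def render_template(template: str, **values: str) -> str:
    """Replace each {{name}} placeholder with the matching keyword argument.

    Hypothetical helper for illustration; not part of llm-ie.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return values[name]

    # \w+ captures the placeholder name between the double braces.
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


template = (
    "# Task description\n"
    "Rewrite the draft prompt following the prompt guideline below.\n"
    "DO NOT explain your answer.\n\n"
    "# Prompt guideline\n"
    "{{prompt_guideline}}"
)
prompt = render_template(template, prompt_guideline="1. Task description ...")
```

Raising on a missing value, rather than leaving the braces in place, keeps an unfilled placeholder from silently reaching the model.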
@@ -0,0 +1 @@
1
+ You are an AI assistant specializing in prompt writing and improvement. Your role is to help users refine, rewrite, and generate effective prompts based on guidelines provided. You are highly knowledgeable in extracting key information and adhering to structured formats. During interactions, you will engage in clear, insightful, and context-aware conversations, providing thoughtful responses to assist the user. Maintain a polite, professional tone and ensure each response adds value to the conversation, promoting clarity and creativity in the user's prompts. If users ask about irrelevant topics (not related to prompt development), you will politely decline to answer and guide the conversation back to prompt development.
@@ -1,11 +1,27 @@
1
- Prompt template design:
2
- 1. Task description
3
- 2. Schema definition
4
- 3. Output format definition
5
- 4. Additional hints
6
- 5. Input placeholder
1
+ Prompt Template Design:
7
2
 
8
- Example:
3
+ 1. Task Description:
4
+ Provide a detailed description of the task, including the background and the type of task (e.g., named entity recognition).
5
+
6
+ 2. Schema Definition:
7
+ List the key concepts that should be extracted, and provide clear definitions for each one.
8
+
9
+ 3. Output Format Definition:
10
+ The output should be a JSON list, where each element is a dictionary representing a frame (an entity along with its attributes). Each dictionary must include a key that holds the entity text. This key can be named "entity_text" or anything else, depending on the context. The attributes can either be flat (e.g., {"entity_text": "<entity_text>", "attr1": "<attr1>", "attr2": "<attr2>"}) or nested (e.g., {"entity_text": "<entity_text>", "attributes": {"attr1": "<attr1>", "attr2": "<attr2>"}}).
11
+
12
+ 4. Optional: Hints:
13
+ Provide itemized hints for the information extractors to guide the extraction process.
14
+
15
+ 5. Optional: Examples:
16
+ Include examples in the format:
17
+ Input: ...
18
+ Output: ...
19
+
20
+ 6. Input Placeholder:
21
+ The template must include a placeholder in the format {{<placeholder_name>}} for the input text. The placeholder name can be customized as needed.
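The output format definition in item 3 allows attributes to be either flat or nested. The sketch below shows one way downstream code might reduce both layouts to the same (entity text, attributes) pair. The `split_frame` function and its `text_key` parameter are assumptions made for illustration, not llm-ie's own parsing logic.

```python
from typing import Any, Dict, Tuple


def split_frame(frame: Dict[str, Any],
                text_key: str = "entity_text") -> Tuple[str, Dict[str, Any]]:
    """Return (entity text, attributes) for a flat or nested frame dictionary.

    Hypothetical helper for illustration; not part of llm-ie.
    """
    if text_key not in frame:
        raise KeyError(f"frame has no {text_key!r} key")
    rest = {key: value for key, value in frame.items() if key != text_key}
    # Nested layout: the only remaining key is "attributes" and it holds a dict.
    if set(rest) == {"attributes"} and isinstance(rest["attributes"], dict):
        return frame[text_key], dict(rest["attributes"])
    # Flat layout: every remaining key is treated as an attribute.
    return frame[text_key], rest


flat = {"entity_text": "650 mg", "attr1": "a", "attr2": "b"}
nested = {"entity_text": "650 mg", "attributes": {"attr1": "a", "attr2": "b"}}
```

Both `split_frame(flat)` and `split_frame(nested)` yield the same pair, so later steps do not need to know which layout the prompt asked for.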
22
+
23
+
24
+ Example 1 (single entity type with attributes):
9
25
 
10
26
  # Task description
11
27
  The paragraph below is from the Food and Drug Administration (FDA) Clinical Pharmacology Section of Labeling for Human Prescription Drug and Biological Products, Adverse reactions section. Please carefully review it and extract the adverse reactions and percentages. Note that each adverse reaction is nested under a clinical trial and potentially an arm. Your output should take that into consideration.
@@ -33,3 +49,97 @@ Example:
33
49
  Below is the Adverse reactions section:
34
50
  {{input}}
35
51
 
52
+
53
+ Example 2 (multiple entity types):
54
+
55
+ # Task description
56
+ This is a named entity recognition task. Given a medical note, annotate the Drug, Form, Strength, Frequency, Route, Dosage, Reason, ADE, and Duration.
57
+
58
+ # Schema definition
59
+ Your output should contain:
60
+ "entity_text": the exact wording as mentioned in the note.
61
+ "entity_type": type of the entity. It should be one of the "Drug", "Form", "Strength", "Frequency", "Route", "Dosage", "Reason", "ADE", or "Duration".
62
+
63
+ # Output format definition
64
+ Your output should follow JSON format,
65
+ if there is at least one mention of the following entity types: Drug, Form, Strength, Frequency, Route, Dosage, Reason, ADE, or Duration:
66
+ [{"entity_text": "<Exact entity mentions as in the note>", "entity_type": "<entity type as listed above>"},
67
+ {"entity_text": "<Exact entity mentions as in the note>", "entity_type": "<entity type as listed above>"}]
68
+ if there is no entity mentioned in the given note, just output an empty list:
69
+ []
70
+
71
+ I am only interested in the extracted contents in []. Do NOT explain your answer.
72
+
73
+ # Examples
74
+ Below are some examples:
75
+
76
+ Input: Acetaminophen 650 mg PO BID 5.
77
+ Output: [{"entity_text": "Acetaminophen", "entity_type": "Drug"}, {"entity_text": "650 mg", "entity_type": "Strength"}, {"entity_text": "PO", "entity_type": "Route"}, {"entity_text": "BID", "entity_type": "Frequency"}]
78
+
79
+ Input: Mesalamine DR 1200 mg PO BID 2.
80
+ Output: [{"entity_text": "Mesalamine DR", "entity_type": "Drug"}, {"entity_text": "1200 mg", "entity_type": "Strength"}, {"entity_text": "BID", "entity_type": "Frequency"}, {"entity_text": "PO", "entity_type": "Route"}]
81
+
82
+
83
+ # Input placeholder
84
+ Below is the medical note:
85
+ "{{input}}"
86
+
87
+
88
+ Example 3 (multiple entity types with corresponding attributes):
89
+
90
+ # Task description
91
+ This is a named entity recognition task. Given a medical note, annotate the events (EVENT) and time expressions (TIMEX3):
92
+
93
+ # Schema definition
94
+ Your output should contain:
95
+ "entity_text": the exact wording as mentioned in the note.
96
+ "entity_type": type of the entity. It should be one of the "EVENT" or "TIMEX3".
97
+ if entity_type is "EVENT",
98
+ "type": the event type as one of the "TEST", "PROBLEM", "TREATMENT", "CLINICAL_DEPT", "EVIDENTIAL", or "OCCURRENCE".
99
+ "polarity": whether an EVENT is positive ("POS") or negative ("NAG"). For example, in “the patient reports headache, and denies chills”, the EVENT [headache] is positive in its polarity, and the EVENT [chills] is negative in its polarity.
100
+ "modality": whether an EVENT actually occurred or not. Must be one of the "FACTUAL", "CONDITIONAL", "POSSIBLE", or "PROPOSED".
101
+
102
+ if entity_type is "TIMEX3",
103
+ "type": the type as one of the "DATE", "TIME", "DURATION", or "FREQUENCY".
104
+ "val": the numeric value 1) DATE: [YYYY]-[MM]-[DD], 2) TIME: [hh]:[mm]:[ss], 3) DURATION: P[n][Y/M/W/D]. So, “for eleven days” will be
105
+ represented as “P11D”, meaning a period of 11 days. 4) FREQUENCY: R[n][duration], where n denotes the number of repeats. When n is omitted, the expression denotes an unspecified number of repeats. For example, “once a day for 3 days” is “R3P1D” (repeat the time interval of 1 day (P1D) 3 times (R3)), and “twice every day” is “RP12H” (repeat every 12 hours).
106
+ "mod": additional information regarding the temporal value of a time expression. Must be one of the:
107
+ “NA”: the default value, no relevant modifier is present;
108
+ “MORE”, means “more than”, e.g. over 2 days (val = P2D, mod = MORE);
109
+ “LESS”, means “less than”, e.g. almost 2 months (val = P2M, mod=LESS);
110
+ “APPROX”, means “approximate”, e.g. nearly a week (val = P1W, mod=APPROX);
111
+ “START”, describes the beginning of a period of time, e.g. Christmas morning, 2005 (val= 2005-12-25, mod= START).
112
+ “END”, describes the end of a period of time, e.g. late last year, (val = 2010, mod = END)
113
+ “MIDDLE”, describes the middle of a period of time, e.g. mid-September 2001 (val = 2001-09, mod = MIDDLE)
114
+
115
+ # Output format definition
116
+ Your output should follow JSON format,
117
+ if there is at least one EVENT or TIMEX3 entity mention:
118
+ [
119
+ {"entity_text": "<Exact entity mentions as in the note>", "entity_type": "EVENT", "type": "<event type>", "polarity": "<event polarity>", "modality": "<event modality>"},
120
+ {"entity_text": "<Exact entity mentions as in the note>", "entity_type": "TIMEX3", "type": "<TIMEX3 type>", "val": "<time value>", "mod": "<additional information>"}
121
+ ...
122
+ ]
123
+ if there is no entity mentioned in the given note, just output an empty list:
124
+ []
125
+
126
+ I am only interested in the extracted contents in []. Do NOT explain your answer.
127
+
128
+ # Examples
129
+ Below are some examples:
130
+
131
+ Input: At 9/7/93 , 1:00 a.m. , intravenous fluids rate was decreased to 50 cc's per hour , total fluids given during the first 24 hours were 140 to 150 cc's per kilo per day .
132
+ Output: [{"entity_text": "intravenous fluids", "entity_type": "EVENT", "type": "TREATMENT", "polarity": "POS", "modality": "FACTUAL"},
133
+ {"entity_text": "decreased", "entity_type": "EVENT", "type": "OCCURRENCE", "polarity": "POS", "modality": "FACTUAL"},
134
+ {"entity_text": "total fluids", "entity_type": "EVENT", "type": "TREATMENT", "polarity": "POS", "modality": "FACTUAL"},
135
+ {"entity_text": "9/7/93 , 1:00 a.m.", "entity_type": "TIMEX3", "type": "TIME", "val": "1993-09-07T01:00", "mod": "NA"},
136
+ {"entity_text": "24 hours", "entity_type": "TIMEX3", "type": "DURATION", "val": "PT24H", "mod": "NA"}]
137
+
138
+ Input: At that time it appeared well adhered to the underlying skin .
139
+ Output: [{"entity_text": "it", "entity_type": "EVENT", "type": "TREATMENT", "polarity": "POS", "modality": "FACTUAL"},
140
+ {"entity_text": "well adhered", "entity_type": "EVENT", "type": "OCCURRENCE", "polarity": "POS", "modality": "FACTUAL"}]
141
+
142
+
143
+ # Input placeholder
144
+ Below is the entire medical note:
145
+ "{{input}}"
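The Example 3 schema above fixes the allowed values for several keys, which makes the model's output easy to sanity-check. The following is a minimal sketch of such a check; `frame_problems` is a hypothetical name, and the check deliberately covers only the enumerated values and required keys, not the format of "val" or the value of "polarity".

```python
from typing import Any, Dict, List

# Allowed values copied from the Example 3 schema definition.
EVENT_TYPES = {"TEST", "PROBLEM", "TREATMENT", "CLINICAL_DEPT", "EVIDENTIAL", "OCCURRENCE"}
MODALITIES = {"FACTUAL", "CONDITIONAL", "POSSIBLE", "PROPOSED"}
TIMEX3_TYPES = {"DATE", "TIME", "DURATION", "FREQUENCY"}
MODS = {"NA", "MORE", "LESS", "APPROX", "START", "END", "MIDDLE"}


def frame_problems(frame: Dict[str, Any]) -> List[str]:
    """List schema violations for one frame (hypothetical helper, not llm-ie API)."""
    problems = []
    if "entity_text" not in frame:
        problems.append("missing entity_text")
    entity_type = frame.get("entity_type")
    if entity_type == "EVENT":
        if frame.get("type") not in EVENT_TYPES:
            problems.append("unknown EVENT type")
        if "polarity" not in frame:
            problems.append("missing polarity")
        if frame.get("modality") not in MODALITIES:
            problems.append("unknown modality")
    elif entity_type == "TIMEX3":
        if frame.get("type") not in TIMEX3_TYPES:
            problems.append("unknown TIMEX3 type")
        if "val" not in frame:
            problems.append("missing val")
        if frame.get("mod") not in MODS:
            problems.append("unknown mod")
    else:
        problems.append("entity_type must be EVENT or TIMEX3")
    return problems


good_event = {"entity_text": "decreased", "entity_type": "EVENT",
              "type": "OCCURRENCE", "polarity": "POS", "modality": "FACTUAL"}
good_timex = {"entity_text": "24 hours", "entity_type": "TIMEX3",
              "type": "DURATION", "val": "PT24H", "mod": "NA"}
bad_frame = {"entity_text": "it", "entity_type": "EVENT", "type": "DRUG"}
```

Returning a list of problems instead of raising lets a caller decide whether to drop the frame, log it, or re-prompt.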
@@ -1,9 +1,29 @@
1
- Prompt template design:
2
- 1. Task description (mention binary relation extraction and ROI)
3
- 2. Schema definition (defines relation)
4
- 3. Output format definition (must use the key "Relation")
5
- 4. Hints
6
- 5. Input placeholders (must include "roi_text", "frame_1", and "frame_2" placeholders)
1
+ Prompt Template Design:
2
+
3
+ 1. Task description:
4
+ Provide a detailed description of the task, including the background and the type of task (e.g., binary relation extraction). Mention the region of interest (ROI) text.
5
+ 2. Schema definition:
6
+ List the criteria for a relation (True) and for no relation (False).
7
+
8
+ 3. Output format definition:
9
+ The output must be a dictionary with a key "Relation" (i.e., {"Relation": "<True or False>"}).
10
+
11
+ 4. (optional) Hints:
12
+ Provide itemized hints for the information extractors to guide the extraction process.
13
+
14
+ 5. (optional) Examples:
15
+ Include examples in the format:
16
+ Input: ...
17
+ Output: ...
18
+
19
+ 6. Entity 1 full information:
20
+ Include the placeholder "{{frame_1}}".
21
+
22
+ 7. Entity 2 full information:
23
+ Include the placeholder "{{frame_2}}".
24
+
25
+ 8. Input placeholders:
26
+ The template must include a placeholder "{{roi_text}}" for the ROI text.
7
27
 
8
28
 
9
29
  Example:
@@ -27,12 +47,12 @@ Example:
27
47
  3. If the strength or frequency is for another medication, output False.
28
48
  4. If the strength or frequency is for the same medication but at a different location (span), output False.
29
49
 
50
+ # Entity 1 full information:
51
+ {{frame_1}}
52
+
53
+ # Entity 2 full information:
54
+ {{frame_2}}
55
+
30
56
  # Input placeholders
31
57
  ROI Text with the two entities annotated with <entity_1> and <entity_2>:
32
58
  "{{roi_text}}"
33
-
34
- Entity 1 full information:
35
- {{frame_1}}
36
-
37
- Entity 2 full information:
38
- {{frame_2}}
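The binary relation guide above requires the model to answer with a dictionary keyed by "Relation". A minimal sketch of turning that answer into a Python boolean follows. `parse_relation` is a hypothetical helper, and accepting both the string "True" and a JSON `true` is an assumption about what a model may emit, not documented llm-ie behavior.

```python
import json


def parse_relation(llm_output: str) -> bool:
    """Parse {"Relation": "<True or False>"} into a bool (hypothetical helper)."""
    value = json.loads(llm_output)["Relation"]
    # A model may emit a JSON boolean instead of the quoted string.
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"unexpected Relation value: {value!r}")
```

Rejecting anything other than the two expected answers surfaces malformed output early instead of treating it as "no relation".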
@@ -1,8 +1,31 @@
1
- Prompt template design:
2
- 1. Task description (mention multi-class relation extraction and ROI)
3
- 2. Schema definition (defines relation types)
4
- 3. Output format definition (must use the key "RelationType")
5
- 4. Input placeholders (must include "roi_text", "frame_1", and "frame_2" placeholders)
1
+ Prompt Template Design:
2
+
3
+ 1. Task description:
4
+ Provide a detailed description of the task, including the background and the type of task (e.g., multi-class relation extraction). Mention the region of interest (ROI) text.
5
+ 2. Schema definition:
6
+ List the possible relation types and the criteria for each, as well as for no relation.
7
+
8
+ 3. Output format definition:
9
+ This section must include a placeholder "{{pos_rel_types}}" for the possible relation types.
10
+ The output must be a dictionary with a key "RelationType" (i.e., {"RelationType": "<relation type or No Relation>"}).
11
+
12
+ 4. (optional) Hints:
13
+ Provide itemized hints for the information extractors to guide the extraction process.
14
+
15
+ 5. (optional) Examples:
16
+ Include examples in the format:
17
+ Input: ...
18
+ Output: ...
19
+
20
+ 6. Entity 1 full information:
21
+ Include the placeholder "{{frame_1}}".
22
+
23
+ 7. Entity 2 full information:
24
+ Include the placeholder "{{frame_2}}".
25
+
26
+ 8. Input placeholders:
27
+ The template must include a placeholder "{{roi_text}}" for the ROI text.
28
+
6
29
 
7
30
 
8
31
  Example:
@@ -35,12 +58,12 @@ Example:
35
58
  3. If the strength or frequency is for another medication, output "No Relation".
36
59
  4. If the strength or frequency is for the same medication but at a different location (span), output "No Relation".
37
60
 
38
- # Input placeholders
39
- ROI Text with the two entities annotated with <entity_1> and <entity_2>:
40
- "{{roi_text}}"
41
-
42
- Entity 1 full information:
61
+ # Entity 1 full information:
43
62
  {{frame_1}}
44
63
 
45
- Entity 2 full information:
46
- {{frame_2}}
64
+ # Entity 2 full information:
65
+ {{frame_2}}
66
+
67
+ # Input placeholders
68
+ ROI Text with the two entities annotated with <entity_1> and <entity_2>:
69
+ "{{roi_text}}"
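For the multi-class variant above, the answer is keyed by "RelationType" and is either one of the possible relation types or "No Relation". The sketch below shows one way to check the answer against the list that was substituted for `{{pos_rel_types}}`. The function name and the relation type names in the example are hypothetical and chosen only for illustration.

```python
import json
from typing import List, Optional


def parse_relation_type(llm_output: str, pos_rel_types: List[str]) -> Optional[str]:
    """Return the relation type, or None for "No Relation" (hypothetical helper)."""
    value = json.loads(llm_output)["RelationType"]
    if value == "No Relation":
        return None
    # Guard against types the prompt never offered.
    if value not in pos_rel_types:
        raise ValueError(f"relation type {value!r} is not one of {pos_rel_types}")
    return value


# Hypothetical relation type names, for illustration only.
possible = ["Strength-Drug", "Frequency-Drug"]
```

Mapping "No Relation" to `None` keeps the return type simple: a string means a relation was found, `None` means it was not.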
@@ -1,11 +1,27 @@
1
- Prompt template design:
2
- 1. Task description
3
- 2. Schema definition
4
- 3. Output format definition
5
- 4. Additional hints
6
- 5. Input placeholder
1
+ Prompt Template Design:
7
2
 
8
- Example:
3
+ 1. Task Description:
4
+ Provide a detailed description of the task, including the background and the type of task (e.g., named entity recognition).
5
+
6
+ 2. Schema Definition:
7
+ List the key concepts that should be extracted, and provide clear definitions for each one.
8
+
9
+ 3. Output Format Definition:
10
+ The output should be a JSON list, where each element is a dictionary representing a frame (an entity along with its attributes). Each dictionary must include a key that holds the entity text. This key can be named "entity_text" or anything else, depending on the context. The attributes can either be flat (e.g., {"entity_text": "<entity_text>", "attr1": "<attr1>", "attr2": "<attr2>"}) or nested (e.g., {"entity_text": "<entity_text>", "attributes": {"attr1": "<attr1>", "attr2": "<attr2>"}}).
11
+
12
+ 4. Optional: Hints:
13
+ Provide itemized hints for the information extractors to guide the extraction process.
14
+
15
+ 5. Optional: Examples:
16
+ Include examples in the format:
17
+ Input: ...
18
+ Output: ...
19
+
20
+ 6. Input Placeholder:
21
+ The template must include a placeholder in the format {{<placeholder_name>}} for the input text. The placeholder name can be customized as needed.
22
+
23
+
24
+ Example 1 (single entity type with attributes):
9
25
 
10
26
  # Task description
11
27
  The paragraph below is from the Food and Drug Administration (FDA) Clinical Pharmacology Section of Labeling for Human Prescription Drug and Biological Products, Adverse reactions section. Please carefully review it and extract the adverse reactions and percentages. Note that each adverse reaction is nested under a clinical trial and potentially an arm. Your output should take that into consideration.
@@ -33,3 +49,97 @@ Example:
33
49
  Below is the Adverse reactions section:
34
50
  {{input}}
35
51
 
52
+
53
+ Example 2 (multiple entity types):
54
+
55
+ # Task description
56
+ This is a named entity recognition task. Given a medical note, annotate the Drug, Form, Strength, Frequency, Route, Dosage, Reason, ADE, and Duration.
57
+
58
+ # Schema definition
59
+ Your output should contain:
60
+ "entity_text": the exact wording as mentioned in the note.
61
+ "entity_type": type of the entity. It should be one of the "Drug", "Form", "Strength", "Frequency", "Route", "Dosage", "Reason", "ADE", or "Duration".
62
+
63
+ # Output format definition
64
+ Your output should follow JSON format,
65
+ if there is at least one mention of the following entity types: Drug, Form, Strength, Frequency, Route, Dosage, Reason, ADE, or Duration:
66
+ [{"entity_text": "<Exact entity mentions as in the note>", "entity_type": "<entity type as listed above>"},
67
+ {"entity_text": "<Exact entity mentions as in the note>", "entity_type": "<entity type as listed above>"}]
68
+ if there is no entity mentioned in the given note, just output an empty list:
69
+ []
70
+
71
+ I am only interested in the extracted contents in []. Do NOT explain your answer.
72
+
73
+ # Examples
74
+ Below are some examples:
75
+
76
+ Input: Acetaminophen 650 mg PO BID 5.
77
+ Output: [{"entity_text": "Acetaminophen", "entity_type": "Drug"}, {"entity_text": "650 mg", "entity_type": "Strength"}, {"entity_text": "PO", "entity_type": "Route"}, {"entity_text": "BID", "entity_type": "Frequency"}]
78
+
79
+ Input: Mesalamine DR 1200 mg PO BID 2.
80
+ Output: [{"entity_text": "Mesalamine DR", "entity_type": "Drug"}, {"entity_text": "1200 mg", "entity_type": "Strength"}, {"entity_text": "BID", "entity_type": "Frequency"}, {"entity_text": "PO", "entity_type": "Route"}]
81
+
82
+
83
+ # Input placeholder
84
+ Below is the medical note:
85
+ "{{input}}"
86
+
87
+
88
+ Example 3 (multiple entity types with corresponding attributes):
89
+
90
+ # Task description
91
+ This is a named entity recognition task. Given a medical note, annotate the events (EVENT) and time expressions (TIMEX3):
92
+
93
+ # Schema definition
94
+ Your output should contain:
95
+ "entity_text": the exact wording as mentioned in the note.
96
+ "entity_type": type of the entity. It should be one of the "EVENT" or "TIMEX3".
97
+ if entity_type is "EVENT",
98
+ "type": the event type as one of the "TEST", "PROBLEM", "TREATMENT", "CLINICAL_DEPT", "EVIDENTIAL", or "OCCURRENCE".
99
+ "polarity": whether an EVENT is positive ("POS") or negative ("NAG"). For example, in “the patient reports headache, and denies chills”, the EVENT [headache] is positive in its polarity, and the EVENT [chills] is negative in its polarity.
100
+ "modality": whether an EVENT actually occurred or not. Must be one of the "FACTUAL", "CONDITIONAL", "POSSIBLE", or "PROPOSED".
101
+
102
+ if entity_type is "TIMEX3",
103
+ "type": the type as one of the "DATE", "TIME", "DURATION", or "FREQUENCY".
104
+ "val": the numeric value 1) DATE: [YYYY]-[MM]-[DD], 2) TIME: [hh]:[mm]:[ss], 3) DURATION: P[n][Y/M/W/D]. So, “for eleven days” will be
105
+ represented as “P11D”, meaning a period of 11 days. 4) FREQUENCY: R[n][duration], where n denotes the number of repeats. When n is omitted, the expression denotes an unspecified number of repeats. For example, “once a day for 3 days” is “R3P1D” (repeat the time interval of 1 day (P1D) 3 times (R3)), and “twice every day” is “RP12H” (repeat every 12 hours).
106
+ "mod": additional information regarding the temporal value of a time expression. Must be one of the:
107
+ “NA”: the default value, no relevant modifier is present;
108
+ “MORE”, means “more than”, e.g. over 2 days (val = P2D, mod = MORE);
109
+ “LESS”, means “less than”, e.g. almost 2 months (val = P2M, mod=LESS);
110
+ “APPROX”, means “approximate”, e.g. nearly a week (val = P1W, mod=APPROX);
111
+ “START”, describes the beginning of a period of time, e.g. Christmas morning, 2005 (val= 2005-12-25, mod= START).
112
+ “END”, describes the end of a period of time, e.g. late last year, (val = 2010, mod = END)
113
+ “MIDDLE”, describes the middle of a period of time, e.g. mid-September 2001 (val = 2001-09, mod = MIDDLE)
114
+
115
+ # Output format definition
116
+ Your output should follow JSON format,
117
+ if there is at least one EVENT or TIMEX3 entity mention:
118
+ [
119
+ {"entity_text": "<Exact entity mentions as in the note>", "entity_type": "EVENT", "type": "<event type>", "polarity": "<event polarity>", "modality": "<event modality>"},
120
+ {"entity_text": "<Exact entity mentions as in the note>", "entity_type": "TIMEX3", "type": "<TIMEX3 type>", "val": "<time value>", "mod": "<additional information>"}
121
+ ...
122
+ ]
123
+ if there is no entity mentioned in the given note, just output an empty list:
124
+ []
125
+
126
+ I am only interested in the extracted contents in []. Do NOT explain your answer.
127
+
128
+ # Examples
129
+ Below are some examples:
130
+
131
+ Input: At 9/7/93 , 1:00 a.m. , intravenous fluids rate was decreased to 50 cc's per hour , total fluids given during the first 24 hours were 140 to 150 cc's per kilo per day .
132
+ Output: [{"entity_text": "intravenous fluids", "entity_type": "EVENT", "type": "TREATMENT", "polarity": "POS", "modality": "FACTUAL"},
133
+ {"entity_text": "decreased", "entity_type": "EVENT", "type": "OCCURRENCE", "polarity": "POS", "modality": "FACTUAL"},
134
+ {"entity_text": "total fluids", "entity_type": "EVENT", "type": "TREATMENT", "polarity": "POS", "modality": "FACTUAL"},
135
+ {"entity_text": "9/7/93 , 1:00 a.m.", "entity_type": "TIMEX3", "type": "TIME", "val": "1993-09-07T01:00", "mod": "NA"},
136
+ {"entity_text": "24 hours", "entity_type": "TIMEX3", "type": "DURATION", "val": "PT24H", "mod": "NA"}]
137
+
138
+ Input: At that time it appeared well adhered to the underlying skin .
139
+ Output: [{"entity_text": "it", "entity_type": "EVENT", "type": "TREATMENT", "polarity": "POS", "modality": "FACTUAL"},
140
+ {"entity_text": "well adhered", "entity_type": "EVENT", "type": "OCCURRENCE", "polarity": "POS", "modality": "FACTUAL"}]
141
+
142
+
143
+ # Input placeholder
144
+ Below is the entire medical note:
145
+ "{{input}}"
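The example prompts above ask for "the extracted contents in []", but a model may still wrap the list in extra text. A minimal sketch of pulling the JSON list out of a raw response is shown below. `extract_frames` is a hypothetical helper; taking everything from the first "[" to the last "]" is a simplifying assumption, not llm-ie's actual post-processing.

```python
import json
from typing import Any, Dict, List


def extract_frames(llm_output: str) -> List[Dict[str, Any]]:
    """Return the list of frame dictionaries found in a raw model response.

    Hypothetical helper for illustration; returns [] when nothing parses.
    """
    start = llm_output.find("[")
    end = llm_output.rfind("]")
    if start == -1 or end < start:
        return []
    try:
        parsed = json.loads(llm_output[start:end + 1])
    except json.JSONDecodeError:
        return []
    if not isinstance(parsed, list):
        return []
    # Keep only dictionary elements; anything else is not a frame.
    return [item for item in parsed if isinstance(item, dict)]


raw = 'Sure: [{"entity_text": "PO", "entity_type": "Route"}] Hope this helps.'
frames = extract_frames(raw)
```

Falling back to an empty list mirrors the prompts' own convention that "no entity" is represented by `[]`.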
@@ -1,40 +1,145 @@
1
- Prompt template design:
2
- 1. Task description (mention the task is to extract information from sentences)
3
- 2. Schema definition
4
- 3. Output format definition
5
- 4. Additional hints
6
- 5. Input placeholder (mention user will feed sentence by sentence)
1
+ Prompt Template Design:
7
2
 
8
- Example:
3
+ 1. Task Description:
4
+ Provide a detailed description of the task, including the background and the type of task (e.g., named entity recognition).
5
+
6
+ 2. Schema Definition:
7
+ List the key concepts that should be extracted, and provide clear definitions for each one.
8
+
9
+ 3. Output Format Definition:
10
+ The output should be a JSON list, where each element is a dictionary representing a frame (an entity along with its attributes). Each dictionary must include a key that holds the entity text. This key can be named "entity_text" or anything else, depending on the context. The attributes can either be flat (e.g., {"entity_text": "<entity_text>", "attr1": "<attr1>", "attr2": "<attr2>"}) or nested (e.g., {"entity_text": "<entity_text>", "attributes": {"attr1": "<attr1>", "attr2": "<attr2>"}}).
11
+
12
+ 4. Optional: Hints:
13
+ Provide itemized hints for the information extractors to guide the extraction process.
14
+
15
+ 5. Optional: Examples:
16
+ Include examples in the format:
17
+ Input: ...
18
+ Output: ...
19
+
20
+ 6. Input Placeholder:
21
+ The template must include a placeholder in the format {{<placeholder_name>}} for the input text. The placeholder name can be customized as needed.
22
+
23
+
24
+ Example 1 (single entity type with attributes):
9
25
 
10
26
  # Task description
11
- The paragraph below is from the Food and Drug Administration (FDA) Clinical Pharmacology Section of Labeling for Human Prescription Drug and Biological Products, Adverse Reactions section. Your task is to extract the adverse reactions in a given sentence (provided by user at a time). Note that adverse reactions can be nested under a clinical trial and potentially an arm. Your output should consider that.
27
+ The paragraph below is from the Food and Drug Administration (FDA) Clinical Pharmacology Section of Labeling for Human Prescription Drug and Biological Products, Adverse reactions section. Please carefully review it and extract the adverse reactions and percentages. Note that each adverse reaction is nested under a clinical trial and potentially an arm. Your output should take that into consideration.
12
28
 
13
29
  # Schema definition
14
30
  Your output should contain:
15
- If applicable, "ClinicalTrial" which is the name of the trial,
31
+ "ClinicalTrial" which is the name of the trial,
16
32
  If applicable, "Arm" which is the arm within the clinical trial,
17
- Must have, "AdverseReaction" which is the name of the adverse reaction spelled exactly as in the source document,
18
- If applicable, "Percentage" which is the occurrence of the adverse reaction within the trial and arm,
33
+ "AdverseReaction" which is the name of the adverse reaction,
34
+ If applicable, "Percentage" which is the occurrence of the adverse reaction within the trial and arm,
35
+ "Evidence" which is the EXACT sentence in the text where you found the AdverseReaction
19
36
 
20
37
  # Output format definition
21
- Your output should follow JSON format,
22
- if there are adverse reaction mentions:
23
- [{"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>"},
24
- {"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>"}]
25
- if there is no adverse reaction in the given sentence, just output an empty list:
26
- []
38
+ Your output should follow JSON format, for example:
39
+ [
40
+ {"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"},
41
+ {"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"}
42
+ ]
27
43
 
28
44
  # Additional hints
29
45
  Your output should be 100% based on the provided content. DO NOT output fake numbers.
30
- If there is no specific trial or arm, just omit the "ClinicalTrial" or "Arm" key. If the percentage is not reported, just omit the "Percentage" key.
31
- I am only interested in the content in JSON format. Do NOT generate explanation.
46
+ If there is no specific arm, just omit the "Arm" key. If the percentage is not reported, just omit the "Percentage" key. The "Evidence" should always be provided.
47
+
48
+ # Input placeholder
49
+ Below is the Adverse reactions section for your reference. I will feed you with sentences from it one by one.
50
+ {{input}}
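The input placeholder above gives the model the whole section for reference and then feeds it one sentence at a time. The sketch below illustrates that flow under stated assumptions: the sentence splitter is deliberately naive, the chat message layout (a "system" message holding the filled template, a "user" message holding one sentence) is an assumption, and neither function is taken from llm-ie.

```python
import re
from typing import Dict, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def build_sentence_chats(prompt_template: str, document: str) -> List[List[Dict[str, str]]]:
    """Build one chat per sentence (hypothetical helper, not llm-ie API).

    Every chat starts with the template filled with the full document,
    followed by a single sentence to extract from.
    """
    context = prompt_template.replace("{{input}}", document)
    return [
        [
            {"role": "system", "content": context},
            {"role": "user", "content": sentence},
        ]
        for sentence in split_sentences(document)
    ]


template = "Below is the Adverse reactions section for your reference.\n{{input}}"
section = "Nausea was reported in 12% of patients. Headache was also common."
chats = build_sentence_chats(template, section)
```

Keeping the full section in every chat lets the model resolve context such as the trial or arm that a single sentence does not name.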
51
+
52
+
53
+ Example 2 (multiple entity types):
54
+
55
+ # Task description
56
+ This is a named entity recognition task. Given a sentence from a medical note, annotate the Drug, Form, Strength, Frequency, Route, Dosage, Reason, ADE, and Duration.
57
+
58
+ # Schema definition
59
+ Your output should contain:
60
+ "entity_text": the exact wording as mentioned in the note.
61
+ "entity_type": type of the entity. It should be one of the "Drug", "Form", "Strength", "Frequency", "Route", "Dosage", "Reason", "ADE", or "Duration".
62
+
63
+ # Output format definition
64
+ Your output should follow JSON format,
65
+ if there are one of the entity mentions: Drug, Form, Strength, Frequency, Route, Dosage, Reason, ADE, or Duration:
66
+ [{"entity_text": "<Exact entity mentions as in the note>", "entity_type": "<entity type as listed above>"},
67
+ {"entity_text": "<Exact entity mentions as in the note>", "entity_type": "<entity type as listed above>"}]
68
+ if there is no entity mentioned in the given note, just output an empty list:
69
+ []
70
+
71
+ I am only interested in the extracted contents in []. Do NOT explain your answer.
72
+
73
+ # Examples
74
+ Below are some examples:
75
+
76
+ Input: Acetaminophen 650 mg PO BID 5.
77
+ Output: [{"entity_text": "Acetaminophen", "entity_type": "Drug"}, {"entity_text": "650 mg", "entity_type": "Strength"}, {"entity_text": "PO", "entity_type": "Route"}, {"entity_text": "BID", "entity_type": "Frequency"}]
78
+
79
+ Input: Mesalamine DR 1200 mg PO BID 2.
80
+ Output: [{"entity_text": "Mesalamine DR", "entity_type": "Drug"}, {"entity_text": "1200 mg", "entity_type": "Strength"}, {"entity_text": "BID", "entity_type": "Frequency"}, {"entity_text": "PO", "entity_type": "Route"}]
81
+
82
+
83
+ # Input placeholder
84
+ Below is the medical note for your reference. I will feed you with sentences from it one by one.
85
+ "{{input}}"
86
+
87
+
88
+ Example 3 (multiple entity types with corresponding attributes):
89
+
90
+ # Task description
91
+ This is a named entity recognition task. Given a sentence from a medical note, annotate the events (EVENT) and time expressions (TIMEX3):
92
+
93
+ # Schema definition
94
+ Your output should contain:
95
+ "entity_text": the exact wording as mentioned in the note.
96
+ "entity_type": type of the entity. It should be one of the "EVENT" or "TIMEX3".
97
+ if entity_type is "EVENT",
98
+ "type": the event type as one of the "TEST", "PROBLEM", "TREATMENT", "CLINICAL_DEPT", "EVIDENTIAL", or "OCCURRENCE".
99
+ "polarity": whether an EVENT is positive ("POS") or negative ("NAG"). For example, in “the patient reports headache, and denies chills”, the EVENT [headache] is positive in its polarity, and the EVENT [chills] is negative in its polarity.
100
+ "modality": whether an EVENT actually occurred or not. Must be one of the "FACTUAL", "CONDITIONAL", "POSSIBLE", or "PROPOSED".
101
+
102
+ if entity_type is "TIMEX3",
103
+ "type": the type as one of the "DATE", "TIME", "DURATION", or "FREQUENCY".
104
+ "val": the numeric value 1) DATE: [YYYY]-[MM]-[DD], 2) TIME: [hh]:[mm]:[ss], 3) DURATION: P[n][Y/M/W/D]. So, “for eleven days” will be
105
+ represented as “P11D”, meaning a period of 11 days. 4) FREQUENCY: R[n][duration], where n denotes the number of repeats. When n is omitted, the expression denotes an unspecified number of repeats. For example, “once a day for 3 days” is “R3P1D” (repeat the time interval of 1 day (P1D) 3 times (R3)), and “twice every day” is “RP12H” (repeat every 12 hours).
106
+ "mod": additional information regarding the temporal value of a time expression. Must be one of the:
107
+ “NA”: the default value, no relevant modifier is present;
108
+ “MORE”, means “more than”, e.g. over 2 days (val = P2D, mod = MORE);
109
+ “LESS”, means “less than”, e.g. almost 2 months (val = P2M, mod=LESS);
110
+ “APPROX”, means “approximate”, e.g. nearly a week (val = P1W, mod=APPROX);
111
+ “START”, describes the beginning of a period of time, e.g. Christmas morning, 2005 (val= 2005-12-25, mod= START).
112
+ “END”, describes the end of a period of time, e.g. late last year, (val = 2010, mod = END)
113
+ “MIDDLE”, describes the middle of a period of time, e.g. mid-September 2001 (val = 2001-09, mod = MIDDLE)
114
+
115
+ # Output format definition
116
+ Your output should follow JSON format,
117
+ if there are one of the EVENT or TIMEX3 entity mentions:
118
+ [
119
+ {"entity_text": "<Exact entity mentions as in the note>", "entity_type": "EVENT", "type": "<event type>", "polarity": "<event polarity>", "modality": "<event modality>"},
120
+ {"entity_text": "<Exact entity mentions as in the note>", "entity_type": "TIMEX3", "type": "<TIMEX3 type>", "val": "<time value>", "mod": "<additional information>"}
121
+ ...
122
+ ]
123
+ if there is no entity mentioned in the given note, just output an empty list:
124
+ []
125
+
126
+ I am only interested in the extracted contents in []. Do NOT explain your answer.
127
+
128
+ # Examples
129
+ Below are some examples:
130
+
131
+ Input: At 9/7/93 , 1:00 a.m. , intravenous fluids rate was decreased to 50 cc&apos;s per hour , total fluids given during the first 24 hours were 140 to 150 cc&apos;s per kilo per day .
132
+ Output: [{"entity_text": "intravenous fluids", "entity_type": "EVENT", "type": "TREATMENT", "polarity": "POS", "modality": "FACTUAL"},
133
+ {"entity_text": "decreased", "entity_type": "EVENT", "type": "OCCURRENCE", "polarity": "POS", "modality": "FACTUAL"},
134
+ {"entity_text": "total fluids", "entity_type": "EVENT", "type": "TREATMENT", "polarity": "POS", "modality": "FACTUAL"},
135
+ {"entity_text": "9/7/93 , 1:00 a.m.", "entity_type": "TIMEX3", "type": "TIME", "val": "1993-09-07T01:00", "mod": "NA"},
136
+ {"entity_text": "24 hours", "entity_type": "TIMEX3", "type": "DURATION", "val": "PT24H", "mod": "NA"}]
137
+
138
+ Input: At that time it appeared well adhered to the underlying skin .
139
+ Output: [{"entity_text": "it", "entity_type": "EVENT", "type": "TREATMENT", "polarity": "POS", "modality": "FACTUAL"},
140
+ {"entity_text": "well adhered", "entity_type": "EVENT", "type": "OCCURRENCE", "polarity": "POS", "modality": "FACTUAL"}]
32
141
 
33
- The Adverse reactions section often has a sentence in the first paragraph:
34
- "The following clinically significant adverse reactions are described elsewhere in the labeling:..." Make sure to extract those adverse reaction mentions.
35
- The Adverse reactions section often has summary sentences like:
36
- "The most common adverse reactions were ...". Make sure to extract those adverse reaction mentions.
37
142
 
38
143
  # Input placeholder
39
- Below is the entire Adverse reactions section for your reference. I will feed you with sentences from it one by one.
144
+ Below is the entire medical note for your reference. I will feed you with sentences from it one by one.
40
145
  "{{input}}"
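Reviewer note: the FREQUENCY notation in the guide above (R[n][duration], with n optional) can be checked mechanically after extraction. The snippet below is a minimal sketch under that reading of the guide; `split_frequency` is a hypothetical helper for illustration and is not part of llm-ie.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper (not part of llm-ie): splits a FREQUENCY value such as
# "R3P1D" into its repeat count and its duration part.
_FREQUENCY = re.compile(r"^R(\d*)(P.+)$")


def split_frequency(val: str) -> Tuple[Optional[int], str]:
    """Return (repeats, duration); repeats is None when n is omitted."""
    match = _FREQUENCY.match(val)
    if match is None:
        raise ValueError(f"not a FREQUENCY value: {val!r}")
    repeats, duration = match.groups()
    # An omitted n means an unspecified number of repeats.
    return (int(repeats) if repeats else None, duration)


print(split_frequency("R3P1D"))  # once a day for 3 days
print(split_frequency("RP12H"))  # twice every day, repeat count unspecified
```

A check like this could run on the "val" field of TIMEX3 frames whose "type" is FREQUENCY, flagging values the model produced in some other shape.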
llm_ie/data_types.py CHANGED
@@ -1,4 +1,4 @@
1
- from typing import List, Dict, Iterable, Callable
1
+ from typing import List, Dict, Tuple, Iterable, Callable
2
2
  import importlib.util
3
3
  import json
4
4
 
@@ -283,15 +283,12 @@ class LLMInformationExtractionDocument:
283
283
  json_file.flush()
284
284
 
285
285
 
286
- def serve_viz(self, host: str = '0.0.0.0', port: int = 3000, theme:str = "light",
287
- color_attr_key:str=None, color_map_func:Callable=None):
286
+ def _viz_preprocess(self) -> Tuple:
288
287
  """
289
- This method serves a visualization of the document.
288
+ This method preprocesses the entities and relations for visualization.
290
289
  """
291
290
  if importlib.util.find_spec("ie_viz") is None:
292
291
  raise ImportError("ie_viz not found. Please install ie_viz (```pip install ie-viz```).")
293
-
294
- from ie_viz import serve
295
292
 
296
293
  if self.has_frame():
297
294
  entities = [{"entity_id": frame.frame_id, "start": frame.start, "end": frame.end, "attr": frame.attr} for frame in self.frames]
@@ -306,6 +303,31 @@ class LLMInformationExtractionDocument:
306
303
  else:
307
304
  relations = None
308
305
 
306
+ return entities, relations
307
+
308
+
309
+ def viz_serve(self, host: str = '0.0.0.0', port: int = 5000, theme:str = "light",
310
+ color_attr_key:str=None, color_map_func:Callable=None):
311
+ """
312
+ This method serves a visualization App of the document.
313
+
314
+ Parameters:
315
+ -----------
316
+ host : str, Optional
317
+ The host address to run the server on.
318
+ port : int, Optional
319
+ The port number to run the server on.
320
+ theme : str, Optional
321
+ The theme of the visualization. Must be either "light" or "dark".
322
+ color_attr_key : str, Optional
323
+ The attribute key to be used for coloring the entities.
324
+ color_map_func : Callable, Optional
325
+ The function to be used for mapping the entity attributes to colors. When provided, the color_attr_key and
326
+ theme will be overwritten. The function must take an entity dictionary as input and return a color string (hex).
327
+ """
328
+ entities, relations = self._viz_preprocess()
329
+ from ie_viz import serve
330
+
309
331
  serve(text=self.text,
310
332
  entities=entities,
311
333
  relations=relations,
@@ -314,4 +336,28 @@ class LLMInformationExtractionDocument:
314
336
  theme=theme,
315
337
  color_attr_key=color_attr_key,
316
338
  color_map_func=color_map_func)
317
-
339
+
340
+
341
+ def viz_render(self, theme:str = "light", color_attr_key:str=None, color_map_func:Callable=None) -> str:
342
+ """
343
+ This method renders visualization html of the document.
344
+
345
+ Parameters:
346
+ -----------
347
+ theme : str, Optional
348
+ The theme of the visualization. Must be either "light" or "dark".
349
+ color_attr_key : str, Optional
350
+ The attribute key to be used for coloring the entities.
351
+ color_map_func : Callable, Optional
352
+ The function to be used for mapping the entity attributes to colors. When provided, the color_attr_key and
353
+ theme will be overwritten. The function must take an entity dictionary as input and return a color string (hex).
354
+ """
355
+ entities, relations = self._viz_preprocess()
356
+ from ie_viz import render
357
+
358
+ return render(text=self.text,
359
+ entities=entities,
360
+ relations=relations,
361
+ theme=theme,
362
+ color_attr_key=color_attr_key,
363
+ color_map_func=color_map_func)
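Reviewer note: both `viz_serve()` and `viz_render()` accept a `color_map_func` that takes an entity dictionary and returns a hex color string. The sketch below shows one possible shape for such a function, assuming the entity dictionaries carry the keys built in `_viz_preprocess()` ("entity_id", "start", "end", "attr"). The palette and the "entity_type" attribute key are assumptions made for illustration.

```python
from typing import Any, Dict

# Hypothetical palette keyed by an assumed "entity_type" attribute.
PALETTE = {"EVENT": "#1f77b4", "TIMEX3": "#ff7f0e"}
DEFAULT_COLOR = "#7f7f7f"


def color_by_entity_type(entity: Dict[str, Any]) -> str:
    """Map one entity dictionary to a hex color string."""
    attr = entity.get("attr") or {}  # tolerate a missing or None attr
    return PALETTE.get(attr.get("entity_type"), DEFAULT_COLOR)


example = {"entity_id": "0", "start": 3, "end": 12, "attr": {"entity_type": "EVENT"}}
print(color_by_entity_type(example))
```

Such a function would be passed as `color_map_func=color_by_entity_type`; per the docstrings above, `color_attr_key` and `theme` are then overwritten.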
llm_ie/engines.py CHANGED
@@ -243,7 +243,7 @@ class OpenAIInferenceEngine(InferenceEngine):
243
243
  for chunk in response:
244
244
  if chunk.choices[0].delta.content is not None:
245
245
  res += chunk.choices[0].delta.content
246
- print(chunk.choices[0].delta.content, end="")
246
+ print(chunk.choices[0].delta.content, end="", flush=True)
247
247
  return res
248
248
 
249
249
  return response.choices[0].message.content
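Reviewer note: the only change in this hunk is `flush=True`. Streamed chunks are printed with `end=""`, so no newline is written, and a buffered stdout may hold the text back until the buffer fills. The sketch below reproduces the pattern with plain strings standing in for the API chunks; `stream_to_stdout` is a hypothetical name.

```python
from typing import Iterable


def stream_to_stdout(chunks: Iterable[str]) -> str:
    """Print each chunk as it arrives and return the accumulated text."""
    res = ""
    for chunk in chunks:
        res += chunk
        # flush=True pushes the partial line out immediately instead of
        # waiting for a newline or a full buffer.
        print(chunk, end="", flush=True)
    return res


full_text = stream_to_stdout(["Extract", "ing ", "frames", "...\n"])
```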
llm_ie/extractors.py CHANGED
@@ -73,18 +73,49 @@ class Extractor:
73
73
 
74
74
  return prompt
75
75
 
76
+ def _find_dict_strings(self, text: str) -> List[str]:
77
+ """
78
+ Extracts balanced JSON-like dictionaries from a string, even if nested.
79
+
80
+ Parameters:
81
+ -----------
82
+ text : str
83
+ the input text containing JSON-like structures.
84
+
85
+ Returns : List[str]
86
+ A list of valid JSON-like strings representing dictionaries.
87
+ """
88
+ open_brace = 0
89
+ start = -1
90
+ json_objects = []
91
+
92
+ for i, char in enumerate(text):
93
+ if char == '{':
94
+ if open_brace == 0:
95
+ # start of a new JSON object
96
+ start = i
97
+ open_brace += 1
98
+ elif char == '}':
99
+ open_brace -= 1
100
+ if open_brace == 0 and start != -1:
101
+ json_objects.append(text[start:i + 1])
102
+ start = -1
103
+
104
+ return json_objects
105
+
106
+
76
107
  def _extract_json(self, gen_text:str) -> List[Dict[str, str]]:
77
108
  """
78
109
  This method inputs a generated text and output a JSON of information tuples
79
110
  """
80
- pattern = r'\{.*?\}'
81
111
  out = []
82
- for match in re.findall(pattern, gen_text, re.DOTALL):
112
+ dict_str_list = self._find_dict_strings(gen_text)
113
+ for dict_str in dict_str_list:
83
114
  try:
84
- tup_dict = json.loads(match)
85
- out.append(tup_dict)
115
+ dict_obj = json.loads(dict_str)
116
+ out.append(dict_obj)
86
117
  except json.JSONDecodeError:
87
- print(f'Post-processing failed at:\n{match}')
118
+ print(f'Post-processing failed at:\n{dict_str}')
88
119
  return out
89
120
 
90
121
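Reviewer note: the switch from the non-greedy regex to brace counting matters once the output format allows nested attributes. The sketch below compares the two approaches on a nested example. `find_dict_strings` is a simplified stand-in for `_find_dict_strings`; unlike the version in the diff, it ignores stray closing braces, and like it, it does not account for braces inside string values.

```python
import json
import re
from typing import List


def find_dict_strings(text: str) -> List[str]:
    """Collect top-level {...} substrings by counting brace depth."""
    depth, start, found = 0, -1, []
    for i, char in enumerate(text):
        if char == "{":
            if depth == 0:
                start = i  # start of a new top-level object
            depth += 1
        elif char == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                found.append(text[start:i + 1])
    return found


generated = 'Result: [{"entity_text": "aspirin", "attributes": {"dose": "81 mg"}}]'

# Old approach: the match stops at the first "}", cutting the outer object short.
old_matches = re.findall(r"\{.*?\}", generated, re.DOTALL)
# New approach: the outer object is captured whole and parses as JSON.
new_matches = find_dict_strings(generated)

print(old_matches)
print(json.loads(new_matches[0]))
```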
 
llm_ie/prompt_editor.py CHANGED
@@ -1,8 +1,15 @@
1
- import os
1
+ import sys
2
+ from typing import Dict, Union
2
3
  import importlib.resources
3
4
  from llm_ie.engines import InferenceEngine
4
5
  from llm_ie.extractors import FrameExtractor
6
+ import re
7
+ import colorama
8
+ from colorama import Fore, Style
9
+ import ipywidgets as widgets
10
+ from IPython.display import display, HTML
5
11
 
12
+
6
13
  class PromptEditor:
7
14
  def __init__(self, inference_engine:InferenceEngine, extractor:FrameExtractor):
8
15
  """
@@ -18,16 +25,48 @@ class PromptEditor:
18
25
  self.inference_engine = inference_engine
19
26
  self.prompt_guide = extractor.get_prompt_guide()
20
27
 
28
+ file_path = importlib.resources.files('llm_ie.asset.PromptEditor_prompts').joinpath('system.txt')
29
+ with open(file_path, 'r') as f:
30
+ self.system_prompt = f.read()
31
+
32
+
33
+ def _apply_prompt_template(self, text_content:Dict[str,str], prompt_template:str) -> str:
34
+ """
35
+ This method applies text_content to prompt_template and returns a prompt.
36
+
37
+ Parameters
38
+ ----------
39
+ text_content : Dict[str,str]
40
+ the input text content to put in prompt template.
41
+ all the keys must be included in the prompt template placeholder {{<placeholder name>}}.
42
+
43
+ Returns : str
44
+ a prompt.
45
+ """
46
+ pattern = re.compile(r'{{(.*?)}}')
47
+ placeholders = pattern.findall(prompt_template)
48
+ if len(placeholders) != len(text_content):
49
+ raise ValueError(f"Expect text_content ({len(text_content)}) and prompt template placeholder ({len(placeholders)}) to have equal size.")
50
+ if not all([k in placeholders for k, _ in text_content.items()]):
51
+ raise ValueError(f"All keys in text_content ({text_content.keys()}) must match placeholders in prompt template ({placeholders}).")
52
+
53
+ prompt = pattern.sub(lambda match: re.sub(r'\\', r'\\\\', text_content[match.group(1)]), prompt_template)
54
+
55
+ return prompt
56
+
57
+
21
58
  def rewrite(self, draft:str) -> str:
22
59
  """
23
60
  This method inputs a prompt draft and rewrites it following the extractor's guideline.
24
61
  """
25
62
  file_path = importlib.resources.files('llm_ie.asset.PromptEditor_prompts').joinpath('rewrite.txt')
26
63
  with open(file_path, 'r') as f:
27
- prompt = f.read()
64
+ rewrite_prompt_template = f.read()
28
65
 
29
- prompt = prompt.replace("{{draft}}", draft).replace("{{prompt_guideline}}", self.prompt_guide)
30
- messages = [{"role": "user", "content": prompt}]
66
+ prompt = self._apply_prompt_template(text_content={"draft": draft, "prompt_guideline": self.prompt_guide},
67
+ prompt_template=rewrite_prompt_template)
68
+ messages = [{"role": "system", "content": self.system_prompt},
69
+ {"role": "user", "content": prompt}]
31
70
  res = self.inference_engine.chat(messages, stream=True)
32
71
  return res
33
72
 
@@ -37,9 +76,114 @@ class PromptEditor:
37
76
  """
38
77
  file_path = importlib.resources.files('llm_ie.asset.PromptEditor_prompts').joinpath('comment.txt')
39
78
  with open(file_path, 'r') as f:
40
- prompt = f.read()
79
+ comment_prompt_template = f.read()
41
80
 
42
- prompt = prompt.replace("{{draft}}", draft).replace("{{prompt_guideline}}", self.prompt_guide)
43
- messages = [{"role": "user", "content": prompt}]
81
+ prompt = self._apply_prompt_template(text_content={"draft": draft, "prompt_guideline": self.prompt_guide},
82
+ prompt_template=comment_prompt_template)
83
+ messages = [{"role": "system", "content": self.system_prompt},
84
+ {"role": "user", "content": prompt}]
44
85
  res = self.inference_engine.chat(messages, stream=True)
45
- return res
86
+ return res
87
+
88
+
89
+ def _terminal_chat(self):
90
+ """
91
+ This method runs an interactive chat session in the terminal to help users write prompt templates.
92
+ """
93
+ colorama.init(autoreset=True)
94
+ file_path = importlib.resources.files('llm_ie.asset.PromptEditor_prompts').joinpath('chat.txt')
95
+ with open(file_path, 'r') as f:
96
+ chat_prompt_template = f.read()
97
+
98
+ prompt = self._apply_prompt_template(text_content={"prompt_guideline": self.prompt_guide},
99
+ prompt_template=chat_prompt_template)
100
+
101
+ messages = [{"role": "system", "content": self.system_prompt},
102
+ {"role": "user", "content": prompt}]
103
+
104
+ print(f'Welcome to the interactive chat! Type "{Fore.RED}exit{Style.RESET_ALL}" or {Fore.YELLOW}control + C{Style.RESET_ALL} to end the conversation.')
105
+
106
+ while True:
107
+ # Get user input
108
+ user_input = input(f"{Fore.GREEN}\nUser: {Style.RESET_ALL}")
109
+
110
+ # Exit condition
111
+ if user_input.lower() == 'exit':
112
+ print(f"{Fore.YELLOW}Interactive chat ended. Goodbye!{Style.RESET_ALL}")
113
+ break
114
+
115
+ # Chat
116
+ messages.append({"role": "user", "content": user_input})
117
+ print(f"{Fore.BLUE}Assistant: {Style.RESET_ALL}", end="")
118
+ response = self.inference_engine.chat(messages, stream=True)
119
+ messages.append({"role": "assistant", "content": response})
120
+
121
+
122
+ def _IPython_chat(self):
123
+ """
124
+ This method runs an interactive chat session in Jupyter/IPython using ipywidgets to help users write prompt templates.
125
+ """
126
+ # Load the chat prompt template from the resources
127
+ file_path = importlib.resources.files('llm_ie.asset.PromptEditor_prompts').joinpath('chat.txt')
128
+ with open(file_path, 'r') as f:
129
+ chat_prompt_template = f.read()
130
+
131
+ # Prepare the initial system message with the prompt guideline
132
+ prompt = self._apply_prompt_template(text_content={"prompt_guideline": self.prompt_guide},
133
+ prompt_template=chat_prompt_template)
134
+
135
+ # Initialize conversation messages
136
+ messages = [{"role": "system", "content": self.system_prompt},
137
+ {"role": "user", "content": prompt}]
138
+
139
+ # Widgets for user input and chat output
140
+ input_box = widgets.Text(placeholder="Type your message here...")
141
+ output_area = widgets.Output()
142
+
143
+ # Display initial instructions
144
+ with output_area:
145
+ display(HTML('Welcome to the interactive chat! Type "<span style="color: red;">exit</span>" to end the conversation.'))
146
+
147
+ def handle_input(sender):
148
+ user_input = input_box.value
149
+ input_box.value = '' # Clear the input box after submission
150
+
151
+ # Exit condition
152
+ if user_input.strip().lower() == 'exit':
153
+ with output_area:
154
+ display(HTML('<p style="color: orange;">Interactive chat ended. Goodbye!</p>'))
155
+ input_box.disabled = True # Disable the input box after exiting
156
+ return
157
+
158
+ # Append user message to conversation
159
+ messages.append({"role": "user", "content": user_input})
160
+ print(f"User: {user_input}")
161
+
162
+ # Display the user message
163
+ with output_area:
164
+ display(HTML(f'<pre><span style="color: green;">User: </span>{user_input}</pre>'))
165
+
166
+ # Get assistant's response and append it to conversation
167
+ print("Assistant: ", end="")
168
+ response = self.inference_engine.chat(messages, stream=True)
169
+ messages.append({"role": "assistant", "content": response})
170
+
171
+ # Display the assistant's response
172
+ with output_area:
173
+ display(HTML(f'<pre><span style="color: blue;">Assistant: </span>{response}</pre>'))
174
+
175
+ # Bind the user input to the handle_input function
176
+ input_box.on_submit(handle_input)
177
+
178
+ # Display the input box and output area
179
+ display(input_box)
180
+ display(output_area)
181
+
182
+ def chat(self):
183
+ """
184
+ External method that detects the environment and calls the appropriate chat method.
185
+ """
186
+ if 'ipykernel' in sys.modules:
187
+ self._IPython_chat()
188
+ else:
189
+ self._terminal_chat()
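Reviewer note: `_apply_prompt_template` replaces the chained `str.replace` calls with a single regex pass over the `{{...}}` placeholders. The sketch below shows the same idea in isolation; `fill_template` is a hypothetical name, and the validation is simplified to a set comparison. Because `re.sub` inserts the return value of a callable replacement as-is, this sketch passes values through without escaping backslashes.

```python
import re
from typing import Dict

PLACEHOLDER = re.compile(r"{{(.*?)}}")


def fill_template(template: str, values: Dict[str, str]) -> str:
    """Substitute every {{name}} in template with values[name]."""
    names = PLACEHOLDER.findall(template)
    if set(names) != set(values):
        raise ValueError(
            f"placeholders {sorted(set(names))} do not match keys {sorted(values)}"
        )
    # The callable's return value is used literally, so no escaping is needed.
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


template = "# Prompt guideline\n{{prompt_guideline}}\n\n# Draft\n{{draft}}"
print(fill_template(template, {"prompt_guideline": "Be specific.", "draft": r"C:\notes"}))
```

A single pass also avoids a side effect of chained `replace` calls: placeholder-like text inside an already substituted value is not substituted again.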
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.1
2
2
  Name: llm-ie
3
- Version: 0.2.1
3
+ Version: 0.3.0
4
4
  Summary: An LLM-powered tool that transforms everyday language into robust information extraction pipelines.
5
5
  License: MIT
6
6
  Author: Enshuo (David) Hsu
@@ -20,6 +20,13 @@ Description-Content-Type: text/markdown
20
20
 
21
21
  An LLM-powered tool that transforms everyday language into robust information extraction pipelines.
22
22
 
23
+ | Features | Support |
24
+ |----------|----------|
25
+ | **LLM Agent for prompt writing** | :white_check_mark: Interactive chat, Python functions |
26
+ | **Named Entity Recognition (NER)** | :white_check_mark: Document-level, Sentence-level |
27
+ | **Entity Attributes Extraction** | :white_check_mark: Flexible formats |
28
+ | **Relation Extraction (RE)** | :white_check_mark: Binary & Multiclass relations |
29
+
23
30
  ## Table of Contents
24
31
  - [Overview](#overview)
25
32
  - [Prerequisite](#prerequisite)
@@ -35,12 +42,12 @@ An LLM-powered tool that transforms everyday language into robust information ex
35
42
  - [RelationExtractor](#relationextractor)
36
43
 
37
44
  ## Overview
38
- LLM-IE is a toolkit that provides robust information extraction utilities for frame-based information extraction. Since prompt design has a significant impact on generative information extraction with LLMs, it also provides a built-in LLM editor to help with prompt writing. The flowchart below demonstrates the workflow starting from a casual language request.
45
+ LLM-IE is a toolkit that provides robust information extraction utilities for named entity, entity attributes, and entity relation extraction. Since prompt design has a significant impact on generative information extraction with LLMs, it has a built-in LLM agent ("editor") to help with prompt writing. The flowchart below demonstrates the workflow starting from a casual language request to output visualization.
39
46
 
40
47
  <div align="center"><img src="doc_asset/readme_img/LLM-IE flowchart.png" width=800 ></div>
41
48
 
42
49
  ## Prerequisite
43
- At least one LLM inference engine is required. There are built-in supports for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), <img src=doc_asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction), and <img src=doc_asset/readme_img/vllm-logo.png width=20 /> vLLM. For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See [LLM Inference Engine](#llm-inference-engine) section below.
50
+ At least one LLM inference engine is required. There are built-in supports for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), <img src=doc_asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction), and <img src=doc_asset/readme_img/vllm-logo.png width=20 /> [vLLM](https://github.com/vllm-project/vllm). For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See [LLM Inference Engine](#llm-inference-engine) section below.
44
51
 
45
52
  ## Installation
46
53
  The Python package is available on PyPI.
@@ -125,21 +132,26 @@ We start with a casual description:
125
132
 
126
133
  *"Extract diagnosis from the clinical note. Make sure to include diagnosis date and status."*
127
134
 
128
- The ```PromptEditor``` rewrites it following the schema required by the ```BasicFrameExtractor```.
129
-
130
- ```python
135
+ Define the AI prompt editor.
136
+ ```python
137
+ from llm_ie.engines import OllamaInferenceEngine
131
138
  from llm_ie.extractors import BasicFrameExtractor
132
139
  from llm_ie.prompt_editor import PromptEditor
133
140
 
134
- # Describe the task in casual language
135
- prompt_draft = "Extract diagnosis from the clinical note. Make sure to include diagnosis date and status."
136
-
137
- # Use LLM editor to generate a formal prompt template with standard extraction schema
141
+ # Define an LLM inference engine
142
+ llm = OllamaInferenceEngine(model_name="llama3.1:8b-instruct-q8_0")
143
+ # Define LLM prompt editor
138
144
  editor = PromptEditor(llm, BasicFrameExtractor)
139
- prompt_template = editor.rewrite(prompt_draft)
145
+ # Start chat
146
+ editor.chat()
140
147
  ```
141
148
 
142
- The editor generates a prompt template as below:
149
+ This opens an interactive session:
150
+ <div align="left"><img src=doc_asset/readme_img/terminal_chat.PNG width=1000 ></div>
151
+
152
+
153
+ The ```PromptEditor``` drafts a prompt template following the schema required by the ```BasicFrameExtractor```:
154
+
143
155
  ```
144
156
  # Task description
145
157
  The paragraph below contains a clinical note with diagnoses listed. Please carefully review it and extract the diagnoses, including the diagnosis date and status.
@@ -165,6 +177,8 @@ If there is no specific date or status, just omit those keys.
165
177
  Below is the clinical note:
166
178
  {{input}}
167
179
  ```
180
+
181
+
168
182
  #### Information extraction pipeline
169
183
  Now we apply the prompt template to build an information extraction pipeline.
170
184
 
@@ -202,15 +216,33 @@ from llm_ie.data_types import LLMInformationExtractionDocument
202
216
  doc = LLMInformationExtractionDocument(doc_id="Synthesized medical note",
203
217
  text=note_text)
204
218
  # Add frames to a document
205
- for frame in frames:
206
- doc.add_frame(frame, valid_mode="span", create_id=True)
219
+ doc.add_frames(frames, create_id=True)
207
220
 
208
221
  # Save document to file (.llmie)
209
222
  doc.save("<your filename>.llmie")
210
223
  ```
211
224
 
225
+ To visualize the extracted frames, we use the ```viz_serve()``` method.
226
+ ```python
227
+ doc.viz_serve()
228
+ ```
229
+ A Flask app starts on port 5000 (default).
230
+ ```
231
+ * Serving Flask app 'ie_viz.utilities'
232
+ * Debug mode: off
233
+ WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
234
+ * Running on all addresses (0.0.0.0)
235
+ * Running on http://127.0.0.1:5000
236
+ Press CTRL+C to quit
237
+ 127.0.0.1 - - [03/Oct/2024 23:36:22] "GET / HTTP/1.1" 200 -
238
+ ```
239
+
240
+ <div align="left"><img src="doc_asset/readme_img/llm-ie_demo.PNG" width=1000 ></div>
241
+
242
+
212
243
  ## Examples
213
- - [Write prompt templates with AI editors](demo/prompt_template_writing.ipynb)
244
+ - [Interactive chat with LLM prompt editors](demo/prompt_template_writing_via_chat.ipynb)
245
+ - [Write prompt templates with LLM prompt editors](demo/prompt_template_writing.ipynb)
214
246
  - [NER + RE for Drug, Strength, Frequency](demo/medication_relation_extraction.ipynb)
215
247
 
216
248
  ## User Guide
@@ -435,7 +467,30 @@ print(BasicFrameExtractor.get_prompt_guide())
435
467
  ```
436
468
 
437
469
  ### Prompt Editor
438
- The prompt editor is an LLM agent that reviews, comments and rewrites a prompt following the defined schema of each extractor. It is recommended to use prompt editor iteratively:
470
+ The prompt editor is an LLM agent that helps users write prompt templates following the defined schema and guideline of each extractor. Chat with the prompt editor:
471
+
472
+ ```python
473
+ from llm_ie.prompt_editor import PromptEditor
474
+ from llm_ie.extractors import BasicFrameExtractor
475
+ from llm_ie.engines import OllamaInferenceEngine
476
+
477
+ # Define an LLM inference engine
478
+ ollama = OllamaInferenceEngine(model_name="llama3.1:8b-instruct-q8_0")
479
+
480
+ # Define editor
481
+ editor = PromptEditor(ollama, BasicFrameExtractor)
482
+
483
+ editor.chat()
484
+ ```
485
+
486
+ In a terminal environment, an interactive chat session will start:
487
+ <div align="left"><img src=doc_asset/readme_img/terminal_chat.PNG width=1000 ></div>
488
+
489
+ In the Jupyter/IPython environment, an ipywidgets session will start:
490
+ <div align="left"><img src=doc_asset/readme_img/IPython_chat.PNG width=1000 ></div>
491
+
492
+
493
+ We can also use the `rewrite()` and `comment()` methods to interact with the prompt editor programmatically:
439
494
  1. start with a casual description of the task
440
495
  2. have the prompt editor generate a prompt template as the starting point
441
496
  3. manually revise the prompt template
@@ -581,40 +636,29 @@ print(BasicFrameExtractor.get_prompt_guide())
581
636
  ```
582
637
 
583
638
  ```
584
- Prompt template design:
585
- 1. Task description
586
- 2. Schema definition
587
- 3. Output format definition
588
- 4. Additional hints
589
- 5. Input placeholder
639
+ Prompt Template Design:
590
640
 
591
- Example:
641
+ 1. Task Description:
642
+ Provide a detailed description of the task, including the background and the type of task (e.g., named entity recognition).
592
643
 
593
- # Task description
594
- The paragraph below is from the Food and Drug Administration (FDA) Clinical Pharmacology Section of Labeling for Human Prescription Drug and Biological Products, Adverse reactions section. Please carefully review it and extract the adverse reactions and percentages. Note that each adverse reaction is nested under a clinical trial and potentially an arm. Your output should take that into consideration.
644
+ 2. Schema Definition:
645
+ List the key concepts that should be extracted, and provide clear definitions for each one.
595
646
 
596
- # Schema definition
597
- Your output should contain:
598
- "ClinicalTrial" which is the name of the trial,
599
- If applicable, "Arm" which is the arm within the clinical trial,
600
- "AdverseReaction" which is the name of the adverse reaction,
601
- If applicable, "Percentage" which is the occurance of the adverse reaction within the trial and arm,
602
- "Evidence" which is the EXACT sentence in the text where you found the AdverseReaction from
647
+ 3. Output Format Definition:
648
+ The output should be a JSON list, where each element is a dictionary representing a frame (an entity along with its attributes). Each dictionary must include a key that holds the entity text. This key can be named "entity_text" or anything else depending on the context. The attributes can either be flat (e.g., {"entity_text": "<entity_text>", "attr1": "<attr1>", "attr2": "<attr2>"}) or nested (e.g., {"entity_text": "<entity_text>", "attributes": {"attr1": "<attr1>", "attr2": "<attr2>"}}).
603
649
 
604
- # Output format definition
605
- Your output should follow JSON format, for example:
606
- [
607
- {"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"},
608
- {"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"}
609
- ]
650
+ 4. Optional: Hints:
651
+ Provide itemized hints for the information extractors to guide the extraction process.
652
+
653
+ 5. Optional: Examples:
654
+ Include examples in the format:
655
+ Input: ...
656
+ Output: ...
610
657
 
611
- # Additional hints
612
- Your output should be 100% based on the provided content. DO NOT output fake numbers.
613
- If there is no specific arm, just omit the "Arm" key. If the percentage is not reported, just omit the "Percentage" key. The "Evidence" should always be provided.
658
+ 6. Input Placeholder:
659
+ The template must include a placeholder in the format {{<placeholder_name>}} for the input text. The placeholder name can be customized as needed.
614
660
 
615
- # Input placeholder
616
- Below is the Adverse reactions section:
617
- {{input}}
661
+ ......
618
662
  ```
619
663
  </details>
620
664
 
@@ -0,0 +1,17 @@
1
+ llm_ie/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
2
+ llm_ie/asset/PromptEditor_prompts/chat.txt,sha256=Fq62voV0JQ8xBRcxS1Nmdd7DkHs1fGYb-tmNwctZZK0,118
3
+ llm_ie/asset/PromptEditor_prompts/comment.txt,sha256=C_lxx-dlOlFJ__jkHKosZ8HsNAeV1aowh2B36nIipBY,159
4
+ llm_ie/asset/PromptEditor_prompts/rewrite.txt,sha256=JAwY9vm1jSmKf2qcLBYUvrSmME2EJH36bALmkwZDWYQ,178
5
+ llm_ie/asset/PromptEditor_prompts/system.txt,sha256=QwGTIJvp-5u2P8CkGt_rabttlN1puHQwIBNquUm1ZHo,730
6
+ llm_ie/asset/prompt_guide/BasicFrameExtractor_prompt_guide.txt,sha256=m7iX4Qjsf1N2V1mbjE-x4F-qPGZA2qGJbUCdpets394,9293
7
+ llm_ie/asset/prompt_guide/BinaryRelationExtractor_prompt_guide.txt,sha256=Z6Yc2_QRqroWcJ13owNJbo78I0wpS4XXDsOjXFR-aPk,2166
8
+ llm_ie/asset/prompt_guide/MultiClassRelationExtractor_prompt_guide.txt,sha256=EQ9Jmh0CQmlfkWqXx6_apuEZUKK3WIrdpAvfbTX2_No,3011
9
+ llm_ie/asset/prompt_guide/ReviewFrameExtractor_prompt_guide.txt,sha256=m7iX4Qjsf1N2V1mbjE-x4F-qPGZA2qGJbUCdpets394,9293
10
+ llm_ie/asset/prompt_guide/SentenceFrameExtractor_prompt_guide.txt,sha256=oKH_QeDgpw771ZdHk3L7DYz2Jvfm7OolUoTiJyMJI30,9541
11
+ llm_ie/data_types.py,sha256=hPz3WOeAzfn2QKmb0CxHmRdQWZQ4G9zq8U-RJBVFdYk,14329
12
+ llm_ie/engines.py,sha256=PTYs7s_iCPmI-yFUCVCPY_cMGS77ma2VGoz4rdNkODI,9308
13
+ llm_ie/extractors.py,sha256=l0zJEtPSuy-2f_OxPQFPH3RsyLydmN9MRQJQYiRdRbY,45026
14
+ llm_ie/prompt_editor.py,sha256=y8YI-nOdBpxSr2boQBqG_vuGhW572a9rYPG3eb0oWH0,8205
15
+ llm_ie-0.3.0.dist-info/METADATA,sha256=s2lWsb4RvN9_95KHqxY77Eclw4aEooTEay0lDcKD6VM,41225
16
+ llm_ie-0.3.0.dist-info/WHEEL,sha256=sP946D7jFCHeNz5Iq4fL4Lu-PrWrFsgfLXbbkciIZwg,88
17
+ llm_ie-0.3.0.dist-info/RECORD,,
@@ -1,15 +0,0 @@
1
- llm_ie/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
2
- llm_ie/asset/PromptEditor_prompts/comment.txt,sha256=C_lxx-dlOlFJ__jkHKosZ8HsNAeV1aowh2B36nIipBY,159
3
- llm_ie/asset/PromptEditor_prompts/rewrite.txt,sha256=bYLOix7DUBlcWv-Q0JZ5kDnZ9OEXBt_AGDN0TydLB8o,191
4
- llm_ie/asset/prompt_guide/BasicFrameExtractor_prompt_guide.txt,sha256=XbnU8byLGGUA3A3lT0bb2Hw-ggzhcqD3ZuKzduod2ww,1944
5
- llm_ie/asset/prompt_guide/BinaryRelationExtractor_prompt_guide.txt,sha256=z9Xg0fdFbVVwnTYcUTcAUvEIWhF075W8qGxN-Vj7xdo,1548
6
- llm_ie/asset/prompt_guide/MultiClassRelationExtractor_prompt_guide.txt,sha256=D5DphUHw8SUERUVdcIjUynuTmYJa6-PwBlF7FzxNsvQ,2276
7
- llm_ie/asset/prompt_guide/ReviewFrameExtractor_prompt_guide.txt,sha256=XbnU8byLGGUA3A3lT0bb2Hw-ggzhcqD3ZuKzduod2ww,1944
8
- llm_ie/asset/prompt_guide/SentenceFrameExtractor_prompt_guide.txt,sha256=8nj9OLPJMtr9Soi5JU3Xk-HC7pKNoI54xA_A4u7I5j4,2620
9
- llm_ie/data_types.py,sha256=zAUx-n1ePTTg1aFrml0lTZPreJx241YTeWFCprRoYbU,12245
10
- llm_ie/engines.py,sha256=m9ytGUX61jEy9SmVHbb90mrfGMAwC6dV-v7Jke1U7Ho,9296
11
- llm_ie/extractors.py,sha256=EVuHqW1lW0RpGmnRNmX4ih6ppfvy2gYOAOgc8Pngfkw,44103
12
- llm_ie/prompt_editor.py,sha256=dbu7A3O7O7Iw2v-xCgrTFH1-wTLAGf4SHDqdeS-He2Q,1869
13
- llm_ie-0.2.1.dist-info/METADATA,sha256=JOjW83QHlj-rgp09upos2-XvhvvnROBeOoX4U33AZKU,40052
14
- llm_ie-0.2.1.dist-info/WHEEL,sha256=sP946D7jFCHeNz5Iq4fL4Lu-PrWrFsgfLXbbkciIZwg,88
15
- llm_ie-0.2.1.dist-info/RECORD,,
File without changes