llm-ie 0.3.0__py3-none-any.whl → 0.3.1__py3-none-any.whl

@@ -0,0 +1,3 @@
+ Review the input text and your output carefully. If anything was missed, add it to your output following the defined output formats.
+ You should ONLY add new items. Do NOT re-generate the entire answer.
+ Your output should strictly adhere to the defined output formats.
@@ -0,0 +1,2 @@
+ Review the input text and your output carefully. If you find any omissions or errors, correct them by generating a revised output following the defined output formats.
+ Your output should strictly adhere to the defined output formats.
@@ -0,0 +1,4 @@
+ Review the input sentence and your output carefully. If anything was missed, add it to your output following the defined output formats.
+ You should ONLY add new items. Do NOT re-generate the entire answer.
+ Your output should be based on the input sentence.
+ Your output should strictly adhere to the defined output formats.
@@ -0,0 +1,3 @@
+ Review the input sentence and your output carefully. If you find any omissions or errors, correct them by generating a revised output following the defined output formats.
+ Your output should be based on the input sentence.
+ Your output should strictly adhere to the defined output formats.
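Taken together, the four files above are default review prompts (document vs. sentence input, crossed with addition vs. revision mode). A minimal sketch of how an addition-mode review turn could be appended to a chat history — the message structure and helper name here are illustrative assumptions, not the package's actual API:

```python
# Hypothetical sketch: appending a review turn to a chat history.
# The addition-mode prompt text mirrors the default prompt file above.
ADDITION_REVIEW_PROMPT = (
    "Review the input text and your output carefully. "
    "If anything was missed, add it to your output following the defined output formats.\n"
    "You should ONLY add new items. Do NOT re-generate the entire answer.\n"
    "Your output should strictly adhere to the defined output formats."
)

def build_review_messages(user_prompt: str, initial_output: str) -> list:
    """Append the model's initial output and the review instruction as new turns."""
    return [
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": initial_output},
        {"role": "user", "content": ADDITION_REVIEW_PROMPT},
    ]

messages = build_review_messages(
    "Extract drugs: Aspirin 81 mg daily.",
    '[{"entity_text": "Aspirin"}]',
)
```

The second `chat()` call would then be made with this extended history, so the model sees its own first answer before reviewing it.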
@@ -0,0 +1,217 @@
+ Prompt Template Design:
+
+ 1. Task Description:
+ Provide a detailed description of the task, including the background and the type of task (e.g., named entity recognition).
+
+ 2. Schema Definition:
+ List the key concepts that should be extracted, and provide clear definitions for each one.
+
+ 3. Thinking Process:
+ Provide clear step-by-step instructions for analyzing the input text. Typically, this process should begin with an analysis section and proceed to the output generation. Each section should have a specific purpose:
+
+ Optional: Recall Section (<Recall>... </Recall>):
+ Write a brief recall of the task description and schema definition for a better understanding of the task.
+
+ Analysis Section (<Analysis>... </Analysis>):
+ Break down the input text to identify important medical contents and clarify ambiguous concepts.
+
+ Output Section (<Outputs>... </Outputs>):
+ Based on the analysis, generate the required output in the defined format. Ensure that the extracted information adheres to the schema and task description.
+
+ 4. Output Format Definition:
+ The output should be a JSON list, where each element is a dictionary representing a frame (an entity along with its attributes). Each dictionary must include a key that holds the entity text. This key can be named "entity_text" or anything else, depending on the context. The attributes can either be flat (e.g., {"entity_text": "<entity_text>", "attr1": "<attr1>", "attr2": "<attr2>"}) or nested (e.g., {"entity_text": "<entity_text>", "attributes": {"attr1": "<attr1>", "attr2": "<attr2>"}}).
+
+ 5. Optional: Hints:
+ Provide itemized hints for the information extractors to guide the extraction process.
+
+ 6. Optional: Examples:
+ Include examples in the format:
+ Input: ...
+ Output: ...
+
+ 7. Input Placeholder:
+ The template must include a placeholder in the format {{<placeholder_name>}} for the input text. The placeholder name can be customized as needed.
+
+
+ Example 1 (single entity type with attributes):
+
+ # Task description
+ The paragraph below is from the Food and Drug Administration (FDA) Clinical Pharmacology Section of Labeling for Human Prescription Drug and Biological Products, Adverse reactions section. Please carefully review it and extract the adverse reactions and percentages. Note that each adverse reaction is nested under a clinical trial and potentially an arm. Your output should take that into consideration.
+
+ # Schema definition
+ Your output should contain:
+ "ClinicalTrial" which is the name of the trial,
+ If applicable, "Arm" which is the arm within the clinical trial,
+ "AdverseReaction" which is the name of the adverse reaction,
+ If applicable, "Percentage" which is the occurrence of the adverse reaction within the trial and arm,
+ "Evidence" which is the EXACT sentence in the text where you found the AdverseReaction
+
+ # Thinking process
+ Approach this task step by step. Start with a recall section (<Recall>... </Recall>) that briefly summarizes the task description and schema definition for a better understanding of the task. Then write an analysis section (<Analysis>... </Analysis>) to analyze the input sentence. Identify important pharmacology contents and clarify ambiguous concepts. Finally, write the output section (<Outputs>... </Outputs>) that lists your final outputs following the defined format.
+
+ # Output format definition
+ Your output should follow JSON format, for example:
+ [
+ {"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"},
+ {"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"}
+ ]
+
+ # Additional hints
+ Your output should be 100% based on the provided content. DO NOT output fake numbers.
+ If there is no specific arm, just omit the "Arm" key. If the percentage is not reported, just omit the "Percentage" key. The "Evidence" should always be provided.
+
+ # Input placeholder
+ Below is the Adverse reactions section for your reference. I will feed you sentences from it one by one.
+ {{input}}
+
+
+ Example 2 (multiple entity types):
+
+ # Task description
+ This is a named entity recognition task. Given a sentence from a medical note, annotate the Drug, Form, Strength, Frequency, Route, Dosage, Reason, ADE, and Duration.
+
+ # Schema definition
+ Your output should contain:
+ "entity_text": the exact wording as mentioned in the note.
+ "entity_type": type of the entity. It should be one of "Drug", "Form", "Strength", "Frequency", "Route", "Dosage", "Reason", "ADE", or "Duration".
+
+ # Thinking process
+ Approach this task step by step. Start with an analysis section (<Analysis>... </Analysis>) to analyze the input sentence. Identify important medical contents and clarify ambiguous concepts. Then write the output section (<Outputs>... </Outputs>) that lists your final outputs following the defined format.
+
+ # Output format definition
+ Your output should follow JSON format.
+ If there are any entity mentions of Drug, Form, Strength, Frequency, Route, Dosage, Reason, ADE, or Duration:
+ [{"entity_text": "<Exact entity mention as in the note>", "entity_type": "<entity type as listed above>"},
+ {"entity_text": "<Exact entity mention as in the note>", "entity_type": "<entity type as listed above>"}]
+ If there is no entity mentioned in the given note, just output an empty list:
+ []
+
+ # Examples
+ Below are some examples:
+
+ Input: Acetaminophen 650 mg PO BID 5.
+ Output:
+ <Analysis>
+ The sentence "Acetaminophen 650 mg PO BID 5." contains several potential medical entities.
+
+ "Acetaminophen" is a Drug.
+ "650 mg" represents the Strength.
+ "PO" is the Route (meaning by mouth).
+ "BID" stands for a dosing frequency, which represents Frequency (meaning twice a day).
+ </Analysis>
+
+ <Outputs>
+ [{"entity_text": "Acetaminophen", "entity_type": "Drug"}, {"entity_text": "650 mg", "entity_type": "Strength"}, {"entity_text": "PO", "entity_type": "Route"}, {"entity_text": "BID", "entity_type": "Frequency"}]
+ </Outputs>
+
+ Input: Mesalamine DR 1200 mg PO BID 2.
+ Output:
+ <Analysis>
+ The sentence "Mesalamine DR 1200 mg PO BID 2." contains the following medical entities:
+
+ "Mesalamine" is a Drug.
+ "DR" stands for Form (delayed-release).
+ "1200 mg" represents the Strength.
+ "PO" is the Route (by mouth).
+ "BID" is the Frequency (twice a day).
+ </Analysis>
+
+ <Outputs>
+ [{"entity_text": "Mesalamine DR", "entity_type": "Drug"}, {"entity_text": "1200 mg", "entity_type": "Strength"}, {"entity_text": "BID", "entity_type": "Frequency"}, {"entity_text": "PO", "entity_type": "Route"}]
+ </Outputs>
+
+ # Input placeholder
+ Below is the medical note for your reference. I will feed you sentences from it one by one.
+ "{{input}}"
+
+
+ Example 3 (multiple entity types with corresponding attributes):
+
+ # Task description
+ This is a named entity recognition task. Given a sentence from a medical note, annotate the events (EVENT) and time expressions (TIMEX3):
+
+ # Schema definition
+ Your output should contain:
+ "entity_text": the exact wording as mentioned in the note.
+ "entity_type": type of the entity. It should be one of "EVENT" or "TIMEX3".
+ if entity_type is "EVENT",
+ "type": the event type as one of "TEST", "PROBLEM", "TREATMENT", "CLINICAL_DEPT", "EVIDENTIAL", or "OCCURRENCE".
+ "polarity": whether an EVENT is positive ("POS") or negative ("NEG"). For example, in “the patient reports headache, and denies chills”, the EVENT [headache] is positive in its polarity, and the EVENT [chills] is negative in its polarity.
+ "modality": whether an EVENT actually occurred or not. Must be one of "FACTUAL", "CONDITIONAL", "POSSIBLE", or "PROPOSED".
+
+ if entity_type is "TIMEX3",
+ "type": the type as one of "DATE", "TIME", "DURATION", or "FREQUENCY".
+ "val": the numeric value: 1) DATE: [YYYY]-[MM]-[DD]; 2) TIME: [hh]:[mm]:[ss]; 3) DURATION: P[n][Y/M/W/D], so “for eleven days” will be represented as “P11D”, meaning a period of 11 days; 4) FREQUENCY: R[n][duration], where n denotes the number of repeats. When n is omitted, the expression denotes an unspecified number of repeats. For example, “once a day for 3 days” is “R3P1D” (repeat the time interval of 1 day (P1D) 3 times (R3)), and “twice every day” is “RP12H” (repeat every 12 hours).
+ "mod": additional information regarding the temporal value of a time expression. Must be one of:
+ “NA”: the default value, no relevant modifier is present;
+ “MORE”: means “more than”, e.g., over 2 days (val = P2D, mod = MORE);
+ “LESS”: means “less than”, e.g., almost 2 months (val = P2M, mod = LESS);
+ “APPROX”: means “approximate”, e.g., nearly a week (val = P1W, mod = APPROX);
+ “START”: describes the beginning of a period of time, e.g., Christmas morning, 2005 (val = 2005-12-25, mod = START);
+ “END”: describes the end of a period of time, e.g., late last year (val = 2010, mod = END);
+ “MIDDLE”: describes the middle of a period of time, e.g., mid-September 2001 (val = 2001-09, mod = MIDDLE).
+
+ # Thinking process
+ Approach this task step by step. Start with a recall section (<Recall>... </Recall>) that briefly summarizes the task description and schema definition for a better understanding of the task. Follow with an analysis section (<Analysis>... </Analysis>) to analyze the input sentence. Identify important medical contents and clarify ambiguous concepts. Then write the output section (<Outputs>... </Outputs>) that lists your final outputs following the defined format.
+
+ # Output format definition
+ Your output should follow JSON format.
+ If there are any EVENT or TIMEX3 entity mentions:
+ [
+ {"entity_text": "<Exact entity mention as in the note>", "entity_type": "EVENT", "type": "<event type>", "polarity": "<event polarity>", "modality": "<event modality>"},
+ {"entity_text": "<Exact entity mention as in the note>", "entity_type": "TIMEX3", "type": "<TIMEX3 type>", "val": "<time value>", "mod": "<additional information>"}
+ ...
+ ]
+ If there is no entity mentioned in the given note, just output an empty list:
+ []
+
+
+ # Examples
+ Below are some examples:
+
+ Input: At 9/7/93 , 1:00 a.m. , intravenous fluids rate was decreased to 50 cc's per hour , total fluids given during the first 24 hours were 140 to 150 cc's per kilo per day .
+ Output:
+ <Recall>
+ This is a named entity recognition task that focuses on extracting medical events (EVENT) and time expressions (TIMEX3). Events are categorized by their type (TEST, PROBLEM, TREATMENT, etc.), polarity (POS or NEG), and modality (FACTUAL, CONDITIONAL, POSSIBLE, or PROPOSED). Time expressions are identified as either DATE, TIME, DURATION, or FREQUENCY and include specific values or modifiers where applicable.
+ </Recall>
+
+ <Analysis>
+ In this sentence:
+
+ "9/7/93" represents a TIMEX3 entity for the date.
+ "1:00 a.m." is a TIMEX3 entity representing the time.
+ "first 24 hours" refers to a TIMEX3 entity of duration.
+ "intravenous fluids rate was decreased" is an EVENT referring to a TREATMENT event with a negative polarity (as it was "decreased") and a FACTUAL modality (it actually happened).
+ "total fluids given during the first 24 hours" is another EVENT representing a TREATMENT that is FACTUAL in its modality.
+ </Analysis>
+
+ <Outputs>
+ [{"entity_text": "intravenous fluids", "entity_type": "EVENT", "type": "TREATMENT", "polarity": "POS", "modality": "FACTUAL"},
+ {"entity_text": "decreased", "entity_type": "EVENT", "type": "OCCURRENCE", "polarity": "POS", "modality": "FACTUAL"},
+ {"entity_text": "total fluids", "entity_type": "EVENT", "type": "TREATMENT", "polarity": "POS", "modality": "FACTUAL"},
+ {"entity_text": "9/7/93 , 1:00 a.m.", "entity_type": "TIMEX3", "type": "TIME", "val": "1993-09-07T01:00", "mod": "NA"},
+ {"entity_text": "24 hours", "entity_type": "TIMEX3", "type": "DURATION", "val": "PT24H", "mod": "NA"}]
+ </Outputs>
+
+ Input: At that time it appeared well adhered to the underlying skin .
+ Output:
+ <Recall>
+ This is a named entity recognition task focused on extracting medical events (EVENT) and time expressions (TIMEX3). Events are categorized by their type (e.g., TEST, PROBLEM, TREATMENT), polarity (POS or NEG), and modality (FACTUAL, CONDITIONAL, POSSIBLE, or PROPOSED). Time expressions are categorized as DATE, TIME, DURATION, or FREQUENCY, and include values or modifiers where applicable.
+ </Recall>
+
+ <Analysis>
+ In this sentence:
+
+ "At that time" refers to a TIMEX3 entity that is vague, so it can be considered a TIME with an unspecified value.
+ "appeared well adhered to the underlying skin" describes an EVENT that likely indicates a PROBLEM (the condition of the skin) and has a POS polarity (since it is "well adhered") with a FACTUAL modality (it actually occurred).
+ </Analysis>
+
+ <Outputs>
+ [{"entity_text": "it", "entity_type": "EVENT", "type": "TREATMENT", "polarity": "POS", "modality": "FACTUAL"},
+ {"entity_text": "well adhered", "entity_type": "EVENT", "type": "OCCURRENCE", "polarity": "POS", "modality": "FACTUAL"}]
+ </Outputs>
+
+ # Input placeholder
+ Below is the entire medical note for your reference. I will feed you sentences from it one by one.
+ "{{input}}"
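All of the templates above require the model to emit frames as a JSON list, possibly wrapped in `<Outputs>` tags and surrounding prose. A minimal post-processing sketch — a hypothetical helper, not the package's actual `_extract_json` — that pulls parseable dict fragments out of raw model output and skips malformed ones:

```python
import json
import re

def extract_frames_json(gen_text: str) -> list:
    """Pull dict-like frames out of raw LLM output; skip fragments that fail to parse.

    Assumes at most one level of attribute nesting, matching the flat/nested
    frame formats described in the templates above.
    """
    frames = []
    # Match {...} fragments, allowing one nested level of braces inside.
    for fragment in re.findall(r'\{(?:[^{}]|\{[^{}]*\})*\}', gen_text):
        try:
            frames.append(json.loads(fragment))
        except json.JSONDecodeError:
            continue  # tolerate trailing prose or truncated fragments
    return frames

out = extract_frames_json(
    '<Outputs>[{"entity_text": "PO", "entity_type": "Route"}]</Outputs>'
)
```

Tolerating prose around the JSON matters here because the thinking-process sections explicitly instruct the model to produce `<Recall>` and `<Analysis>` text before the `<Outputs>` block.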
@@ -0,0 +1,145 @@
+ Prompt Template Design:
+
+ 1. Task Description:
+ Provide a detailed description of the task, including the background and the type of task (e.g., named entity recognition).
+
+ 2. Schema Definition:
+ List the key concepts that should be extracted, and provide clear definitions for each one.
+
+ 3. Output Format Definition:
+ The output should be a JSON list, where each element is a dictionary representing a frame (an entity along with its attributes). Each dictionary must include a key that holds the entity text. This key can be named "entity_text" or anything else, depending on the context. The attributes can either be flat (e.g., {"entity_text": "<entity_text>", "attr1": "<attr1>", "attr2": "<attr2>"}) or nested (e.g., {"entity_text": "<entity_text>", "attributes": {"attr1": "<attr1>", "attr2": "<attr2>"}}).
+
+ 4. Optional: Hints:
+ Provide itemized hints for the information extractors to guide the extraction process.
+
+ 5. Optional: Examples:
+ Include examples in the format:
+ Input: ...
+ Output: ...
+
+ 6. Input Placeholder:
+ The template must include a placeholder in the format {{<placeholder_name>}} for the input text. The placeholder name can be customized as needed.
+
+
+ Example 1 (single entity type with attributes):
+
+ # Task description
+ The paragraph below is from the Food and Drug Administration (FDA) Clinical Pharmacology Section of Labeling for Human Prescription Drug and Biological Products, Adverse reactions section. Please carefully review it and extract the adverse reactions and percentages. Note that each adverse reaction is nested under a clinical trial and potentially an arm. Your output should take that into consideration.
+
+ # Schema definition
+ Your output should contain:
+ "ClinicalTrial" which is the name of the trial,
+ If applicable, "Arm" which is the arm within the clinical trial,
+ "AdverseReaction" which is the name of the adverse reaction,
+ If applicable, "Percentage" which is the occurrence of the adverse reaction within the trial and arm,
+ "Evidence" which is the EXACT sentence in the text where you found the AdverseReaction
+
+ # Output format definition
+ Your output should follow JSON format, for example:
+ [
+ {"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"},
+ {"ClinicalTrial": "<Clinical trial name or number>", "Arm": "<name of arm>", "AdverseReaction": "<Adverse reaction text>", "Percentage": "<a percent>", "Evidence": "<exact sentence from the text>"}
+ ]
+
+ # Additional hints
+ Your output should be 100% based on the provided content. DO NOT output fake numbers.
+ If there is no specific arm, just omit the "Arm" key. If the percentage is not reported, just omit the "Percentage" key. The "Evidence" should always be provided.
+
+ # Input placeholder
+ Below is the Adverse reactions section for your reference. I will feed you sentences from it one by one.
+ {{input}}
+
+
+ Example 2 (multiple entity types):
+
+ # Task description
+ This is a named entity recognition task. Given a sentence from a medical note, annotate the Drug, Form, Strength, Frequency, Route, Dosage, Reason, ADE, and Duration.
+
+ # Schema definition
+ Your output should contain:
+ "entity_text": the exact wording as mentioned in the note.
+ "entity_type": type of the entity. It should be one of "Drug", "Form", "Strength", "Frequency", "Route", "Dosage", "Reason", "ADE", or "Duration".
+
+ # Output format definition
+ Your output should follow JSON format.
+ If there are any entity mentions of Drug, Form, Strength, Frequency, Route, Dosage, Reason, ADE, or Duration:
+ [{"entity_text": "<Exact entity mention as in the note>", "entity_type": "<entity type as listed above>"},
+ {"entity_text": "<Exact entity mention as in the note>", "entity_type": "<entity type as listed above>"}]
+ If there is no entity mentioned in the given note, just output an empty list:
+ []
+
+ I am only interested in the extracted contents in []. Do NOT explain your answer.
+
+ # Examples
+ Below are some examples:
+
+ Input: Acetaminophen 650 mg PO BID 5.
+ Output: [{"entity_text": "Acetaminophen", "entity_type": "Drug"}, {"entity_text": "650 mg", "entity_type": "Strength"}, {"entity_text": "PO", "entity_type": "Route"}, {"entity_text": "BID", "entity_type": "Frequency"}]
+
+ Input: Mesalamine DR 1200 mg PO BID 2.
+ Output: [{"entity_text": "Mesalamine DR", "entity_type": "Drug"}, {"entity_text": "1200 mg", "entity_type": "Strength"}, {"entity_text": "BID", "entity_type": "Frequency"}, {"entity_text": "PO", "entity_type": "Route"}]
+
+
+ # Input placeholder
+ Below is the medical note for your reference. I will feed you sentences from it one by one.
+ "{{input}}"
+
+
+ Example 3 (multiple entity types with corresponding attributes):
+
+ # Task description
+ This is a named entity recognition task. Given a sentence from a medical note, annotate the events (EVENT) and time expressions (TIMEX3):
+
+ # Schema definition
+ Your output should contain:
+ "entity_text": the exact wording as mentioned in the note.
+ "entity_type": type of the entity. It should be one of "EVENT" or "TIMEX3".
+ if entity_type is "EVENT",
+ "type": the event type as one of "TEST", "PROBLEM", "TREATMENT", "CLINICAL_DEPT", "EVIDENTIAL", or "OCCURRENCE".
+ "polarity": whether an EVENT is positive ("POS") or negative ("NEG"). For example, in “the patient reports headache, and denies chills”, the EVENT [headache] is positive in its polarity, and the EVENT [chills] is negative in its polarity.
+ "modality": whether an EVENT actually occurred or not. Must be one of "FACTUAL", "CONDITIONAL", "POSSIBLE", or "PROPOSED".
+
+ if entity_type is "TIMEX3",
+ "type": the type as one of "DATE", "TIME", "DURATION", or "FREQUENCY".
+ "val": the numeric value: 1) DATE: [YYYY]-[MM]-[DD]; 2) TIME: [hh]:[mm]:[ss]; 3) DURATION: P[n][Y/M/W/D], so “for eleven days” will be represented as “P11D”, meaning a period of 11 days; 4) FREQUENCY: R[n][duration], where n denotes the number of repeats. When n is omitted, the expression denotes an unspecified number of repeats. For example, “once a day for 3 days” is “R3P1D” (repeat the time interval of 1 day (P1D) 3 times (R3)), and “twice every day” is “RP12H” (repeat every 12 hours).
+ "mod": additional information regarding the temporal value of a time expression. Must be one of:
+ “NA”: the default value, no relevant modifier is present;
+ “MORE”: means “more than”, e.g., over 2 days (val = P2D, mod = MORE);
+ “LESS”: means “less than”, e.g., almost 2 months (val = P2M, mod = LESS);
+ “APPROX”: means “approximate”, e.g., nearly a week (val = P1W, mod = APPROX);
+ “START”: describes the beginning of a period of time, e.g., Christmas morning, 2005 (val = 2005-12-25, mod = START);
+ “END”: describes the end of a period of time, e.g., late last year (val = 2010, mod = END);
+ “MIDDLE”: describes the middle of a period of time, e.g., mid-September 2001 (val = 2001-09, mod = MIDDLE).
+
+ # Output format definition
+ Your output should follow JSON format.
+ If there are any EVENT or TIMEX3 entity mentions:
+ [
+ {"entity_text": "<Exact entity mention as in the note>", "entity_type": "EVENT", "type": "<event type>", "polarity": "<event polarity>", "modality": "<event modality>"},
+ {"entity_text": "<Exact entity mention as in the note>", "entity_type": "TIMEX3", "type": "<TIMEX3 type>", "val": "<time value>", "mod": "<additional information>"}
+ ...
+ ]
+ If there is no entity mentioned in the given note, just output an empty list:
+ []
+
+ I am only interested in the extracted contents in []. Do NOT explain your answer.
+
+ # Examples
+ Below are some examples:
+
+ Input: At 9/7/93 , 1:00 a.m. , intravenous fluids rate was decreased to 50 cc's per hour , total fluids given during the first 24 hours were 140 to 150 cc's per kilo per day .
+ Output: [{"entity_text": "intravenous fluids", "entity_type": "EVENT", "type": "TREATMENT", "polarity": "POS", "modality": "FACTUAL"},
+ {"entity_text": "decreased", "entity_type": "EVENT", "type": "OCCURRENCE", "polarity": "POS", "modality": "FACTUAL"},
+ {"entity_text": "total fluids", "entity_type": "EVENT", "type": "TREATMENT", "polarity": "POS", "modality": "FACTUAL"},
+ {"entity_text": "9/7/93 , 1:00 a.m.", "entity_type": "TIMEX3", "type": "TIME", "val": "1993-09-07T01:00", "mod": "NA"},
+ {"entity_text": "24 hours", "entity_type": "TIMEX3", "type": "DURATION", "val": "PT24H", "mod": "NA"}]
+
+ Input: At that time it appeared well adhered to the underlying skin .
+ Output: [{"entity_text": "it", "entity_type": "EVENT", "type": "TREATMENT", "polarity": "POS", "modality": "FACTUAL"},
+ {"entity_text": "well adhered", "entity_type": "EVENT", "type": "OCCURRENCE", "polarity": "POS", "modality": "FACTUAL"}]
+
+
+ # Input placeholder
+ Below is the entire medical note for your reference. I will feed you sentences from it one by one.
+ "{{input}}"
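Both template files require a `{{<placeholder_name>}}` slot for the input text. A minimal sketch of filling such a placeholder — a hypothetical helper, not the package's actual prompt-rendering code:

```python
import re

def fill_template(template: str, **values: str) -> str:
    """Replace {{name}} placeholders with supplied values; raise if one is missing."""
    def _sub(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"No value supplied for placeholder {{{{{name}}}}}")
        return values[name]
    return re.sub(r"\{\{(\w+)\}\}", _sub, template)

prompt = fill_template(
    "Annotate the sentence:\n{{input}}",
    input="Aspirin 81 mg PO daily.",
)
```

Raising on a missing placeholder (rather than silently leaving `{{input}}` in the prompt) catches template/keyword mismatches before any tokens are spent on inference.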
llm_ie/extractors.py CHANGED
@@ -8,6 +8,7 @@ import itertools
 from typing import List, Dict, Tuple, Union, Callable
 from llm_ie.data_types import LLMInformationExtractionFrame, LLMInformationExtractionDocument
 from llm_ie.engines import InferenceEngine
+from colorama import Fore, Style


 class Extractor:
@@ -115,7 +116,7 @@ class Extractor:
                 dict_obj = json.loads(dict_str)
                 out.append(dict_obj)
             except json.JSONDecodeError:
-                print(f'Post-processing failed at:\n{dict_str}')
+                warnings.warn(f'Post-processing failed:\n{dict_str}', RuntimeWarning)
         return out

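The change above swaps a bare `print` for `warnings.warn(..., RuntimeWarning)`. A small self-contained sketch (illustrative, not the package's code) of why that matters: callers can record, filter, or escalate warnings, none of which is possible with `print`:

```python
import json
import warnings

def parse_or_warn(fragment: str):
    """Return the parsed object, or None with a RuntimeWarning instead of a print."""
    try:
        return json.loads(fragment)
    except json.JSONDecodeError:
        warnings.warn(f"Post-processing failed:\n{fragment}", RuntimeWarning)
        return None

# Callers can capture the warning programmatically:
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = parse_or_warn('{"entity_text": ')  # malformed on purpose
```

With `warnings.simplefilter("error", RuntimeWarning)`, the same failure could instead be raised as an exception in strict pipelines.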
@@ -275,7 +276,7 @@ class BasicFrameExtractor(FrameExtractor):


     def extract_frames(self, text_content:Union[str, Dict[str,str]], entity_key:str, max_new_tokens:int=2048,
-                       temperature:float=0.0, document_key:str=None, **kwrs) -> List[LLMInformationExtractionFrame]:
+                       temperature:float=0.0, case_sensitive:bool=False, document_key:str=None, **kwrs) -> List[LLMInformationExtractionFrame]:
         """
         This method inputs a text and outputs a list of LLMInformationExtractionFrame
         It use the extract() method and post-process outputs into frames.
@@ -292,6 +293,8 @@ class BasicFrameExtractor(FrameExtractor):
             the max number of new tokens LLM should generate.
         temperature : float, Optional
             the temperature for token sampling.
+        case_sensitive : bool, Optional
+            if True, entity text matching will be case-sensitive.
         document_key : str, Optional
             specify the key in text_content where document text is.
             If text_content is str, this parameter will be ignored.
@@ -302,7 +305,14 @@ class BasicFrameExtractor(FrameExtractor):
         frame_list = []
         gen_text = self.extract(text_content=text_content,
                                 max_new_tokens=max_new_tokens, temperature=temperature, **kwrs)
-        entity_json = self._extract_json(gen_text=gen_text)
+
+        entity_json = []
+        for entity in self._extract_json(gen_text=gen_text):
+            if entity_key in entity:
+                entity_json.append(entity)
+            else:
+                warnings.warn(f'Extractor output "{entity}" does not have entity_key ("{entity_key}"). This frame will be dropped.', RuntimeWarning)
+
         if isinstance(text_content, str):
             text = text_content
         elif isinstance(text_content, dict):
@@ -310,7 +320,7 @@ class BasicFrameExtractor(FrameExtractor):

         spans = self._find_entity_spans(text=text,
                                         entities=[e[entity_key] for e in entity_json],
-                                        case_sensitive=False)
+                                        case_sensitive=case_sensitive)

         for i, (ent, span) in enumerate(zip(entity_json, spans)):
             if span is not None:
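The hunk above threads the new `case_sensitive` flag into `_find_entity_spans` instead of hard-coding `False`. A simplified sketch of what such a span finder plausibly does — an assumption for illustration, not the package's actual implementation:

```python
from typing import List, Optional, Tuple

def find_entity_spans(text: str, entities: List[str],
                      case_sensitive: bool = False) -> List[Optional[Tuple[int, int]]]:
    """Locate each entity string in text; None when it cannot be found."""
    haystack = text if case_sensitive else text.lower()
    spans = []
    for entity in entities:
        needle = entity if case_sensitive else entity.lower()
        start = haystack.find(needle)
        # Record (start, end) character offsets, or None for unmatched entities.
        spans.append((start, start + len(needle)) if start != -1 else None)
    return spans

spans = find_entity_spans("Aspirin 81 mg PO daily.", ["aspirin", "PO"],
                          case_sensitive=False)
```

Case-insensitive matching is a sensible default for LLM output, which often normalizes capitalization; the flag lets callers tighten matching when exact surface forms matter.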
@@ -325,8 +335,8 @@ class BasicFrameExtractor(FrameExtractor):


 class ReviewFrameExtractor(BasicFrameExtractor):
-    def __init__(self, inference_engine:InferenceEngine, prompt_template:str, review_prompt:str,
-                 review_mode:str, system_prompt:str=None, **kwrs):
+    def __init__(self, inference_engine:InferenceEngine, prompt_template:str,
+                 review_mode:str, review_prompt:str=None, system_prompt:str=None, **kwrs):
         """
         This class add a review step after the BasicFrameExtractor.
         The Review process asks LLM to review its output and:
@@ -340,8 +350,9 @@ class ReviewFrameExtractor(BasicFrameExtractor):
             the LLM inferencing engine object. Must implements the chat() method.
         prompt_template : str
             prompt template with "{{<placeholder name>}}" placeholder.
-        review_prompt : str
-            the prompt text that ask LLM to review. Specify addition or revision in the instruction.
+        review_prompt : str, Optional
+            the prompt text that asks LLM to review. Specify addition or revision in the instruction.
+            if not provided, a default review prompt will be used.
         review_mode : str
             review mode. Must be one of {"addition", "revision"}
             addition mode only ask LLM to add new frames, while revision mode ask LLM to regenerate.
@@ -350,11 +361,20 @@ class ReviewFrameExtractor(BasicFrameExtractor):
         """
         super().__init__(inference_engine=inference_engine, prompt_template=prompt_template,
                          system_prompt=system_prompt, **kwrs)
-        self.review_prompt = review_prompt
         if review_mode not in {"addition", "revision"}:
             raise ValueError('review_mode must be one of {"addition", "revision"}.')
         self.review_mode = review_mode

+        if review_prompt:
+            self.review_prompt = review_prompt
+        else:
+            file_path = importlib.resources.files('llm_ie.asset.default_prompts').\
+                joinpath(f"{self.__class__.__name__}_{self.review_mode}_review_prompt.txt")
+            with open(file_path, 'r') as f:
+                self.review_prompt = f.read()
+
+            warnings.warn(f'Custom review prompt not provided. The default review prompt is used:\n"{self.review_prompt}"', UserWarning)
+

     def extract(self, text_content:Union[str, Dict[str,str]],
                 max_new_tokens:int=4096, temperature:float=0.0, stream:bool=False, **kwrs) -> str:
@@ -377,12 +397,15 @@ class ReviewFrameExtractor(BasicFrameExtractor):
         Return : str
             the output from LLM. Need post-processing.
         """
-        # Pormpt extraction
         messages = []
         if self.system_prompt:
             messages.append({'role': 'system', 'content': self.system_prompt})

         messages.append({'role': 'user', 'content': self._get_user_prompt(text_content)})
+        # Initial output
+        if stream:
+            print(f"{Fore.BLUE}Initial Output:{Style.RESET_ALL}")
+
         initial = self.inference_engine.chat(
             messages=messages,
             max_new_tokens=max_new_tokens,
@@ -395,6 +418,8 @@ class ReviewFrameExtractor(BasicFrameExtractor):
         messages.append({'role': 'assistant', 'content': initial})
         messages.append({'role': 'user', 'content': self.review_prompt})

+        if stream:
+            print(f"\n{Fore.YELLOW}Review:{Style.RESET_ALL}")
         review = self.inference_engine.chat(
             messages=messages,
             max_new_tokens=max_new_tokens,
@@ -459,7 +484,7 @@ class SentenceFrameExtractor(FrameExtractor):


     def extract(self, text_content:Union[str, Dict[str,str]], max_new_tokens:int=512,
-                document_key:str=None, multi_turn:bool=True, temperature:float=0.0, stream:bool=False, **kwrs) -> List[Dict[str,str]]:
+                document_key:str=None, multi_turn:bool=False, temperature:float=0.0, stream:bool=False, **kwrs) -> List[Dict[str,str]]:
         """
         This method inputs a text and outputs a list of outputs per sentence.

@@ -507,8 +532,8 @@ class SentenceFrameExtractor(FrameExtractor):
         for sent in sentences:
             messages.append({'role': 'user', 'content': sent['sentence_text']})
             if stream:
-                print(f"\n\nSentence: \n{sent['sentence_text']}\n")
-                print("Extraction:")
+                print(f"\n\n{Fore.GREEN}Sentence: {Style.RESET_ALL}\n{sent['sentence_text']}\n")
+                print(f"{Fore.BLUE}Extraction:{Style.RESET_ALL}")

             gen_text = self.inference_engine.chat(
                 messages=messages,
@@ -534,7 +559,8 @@ class SentenceFrameExtractor(FrameExtractor):
534
559
 
535
560
 
536
561
  def extract_frames(self, text_content:Union[str, Dict[str,str]], entity_key:str, max_new_tokens:int=512,
537
- document_key:str=None, multi_turn:bool=True, temperature:float=0.0, stream:bool=False, **kwrs) -> List[LLMInformationExtractionFrame]:
562
+ document_key:str=None, multi_turn:bool=False, temperature:float=0.0, case_sensitive:bool=False,
563
+ stream:bool=False, **kwrs) -> List[LLMInformationExtractionFrame]:
538
564
  """
539
565
  This method inputs a text and outputs a list of LLMInformationExtractionFrame
540
566
  It use the extract() method and post-process outputs into frames.
@@ -560,6 +586,8 @@ class SentenceFrameExtractor(FrameExtractor):
560
586
  can better utilize the KV caching.
561
587
  temperature : float, Optional
562
588
  the temperature for token sampling.
589
+ case_sensitive : bool, Optional
590
+ if True, entity text matching will be case-sensitive.
563
591
  stream : bool, Optional
564
592
  if True, LLM generated text will be printed in terminal in real-time.
565
593
 
@@ -575,9 +603,15 @@ class SentenceFrameExtractor(FrameExtractor):
575
603
  **kwrs)
576
604
  frame_list = []
577
605
  for sent in llm_output_sentence:
578
- entity_json = self._extract_json(gen_text=sent['gen_text'])
606
+ entity_json = []
607
+ for entity in self._extract_json(gen_text=sent['gen_text']):
608
+ if entity_key in entity:
609
+ entity_json.append(entity)
610
+ else:
611
+ warnings.warn(f'Extractor output "{entity}" does not have entity_key ("{entity_key}"). This frame will be dropped.', RuntimeWarning)
612
+
579
613
  spans = self._find_entity_spans(text=sent['sentence_text'],
580
- entities=[e[entity_key] for e in entity_json], case_sensitive=False)
614
+ entities=[e[entity_key] for e in entity_json], case_sensitive=case_sensitive)
581
615
  for ent, span in zip(entity_json, spans):
582
616
  if span is not None:
583
617
  start, end = span
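The `extract_frames` changes do two things: model outputs lacking the required `entity_key` are now warned about and dropped instead of crashing span matching, and a new `case_sensitive` flag (default `False`) is threaded into entity-span lookup. A small self-contained sketch of both behaviors, with a simplified span finder standing in for the library's `_find_entity_spans` (function names here are illustrative):

```python
import warnings
from typing import Dict, List, Optional, Tuple

def filter_entities(entities: List[Dict], entity_key: str) -> List[Dict]:
    """Keep only outputs that carry the required key; warn on the rest."""
    kept = []
    for entity in entities:
        if entity_key in entity:
            kept.append(entity)
        else:
            # mirrors the diff: warn and drop instead of raising KeyError later
            warnings.warn(f'Output "{entity}" lacks "{entity_key}"; dropped.', RuntimeWarning)
    return kept

def find_span(text: str, entity: str, case_sensitive: bool = False) -> Optional[Tuple[int, int]]:
    """Locate the entity's character span; optionally ignore case."""
    haystack, needle = (text, entity) if case_sensitive else (text.lower(), entity.lower())
    start = haystack.find(needle)
    return None if start == -1 else (start, start + len(needle))

ents = filter_entities([{"entity_text": "Aspirin"}, {"label": "DRUG"}], "entity_text")
# case-insensitive match succeeds even though the document spells it "aspirin"
span = find_span("Patient takes aspirin daily.", ents[0]["entity_text"])
```

Defaulting to case-insensitive matching is forgiving when the LLM normalizes capitalization, while `case_sensitive=True` is available when exact surface forms matter.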
@@ -592,6 +626,248 @@ class SentenceFrameExtractor(FrameExtractor):
         return frame_list
 
 
+class SentenceReviewFrameExtractor(SentenceFrameExtractor):
+    def __init__(self, inference_engine:InferenceEngine, prompt_template:str,
+                 review_mode:str, review_prompt:str=None, system_prompt:str=None, **kwrs):
+        """
+        This class adds a review step after the SentenceFrameExtractor.
+        For each sentence, the review process asks LLM to review its output and:
+            1. add more frames while keeping current. This is efficient for boosting recall.
+            2. or, regenerate frames (add new and delete existing).
+        Use the review_mode parameter to specify. Note that the review_prompt should instruct LLM accordingly.
+
+        Parameters:
+        ----------
+        inference_engine : InferenceEngine
+            the LLM inferencing engine object. Must implements the chat() method.
+        prompt_template : str
+            prompt template with "{{<placeholder name>}}" placeholder.
+        review_prompt : str: Optional
+            the prompt text that ask LLM to review. Specify addition or revision in the instruction.
+            if not provided, a default review prompt will be used.
+        review_mode : str
+            review mode. Must be one of {"addition", "revision"}
+            addition mode only ask LLM to add new frames, while revision mode ask LLM to regenerate.
+        system_prompt : str, Optional
+            system prompt.
+        """
+        super().__init__(inference_engine=inference_engine, prompt_template=prompt_template,
+                         system_prompt=system_prompt, **kwrs)
+
+        if review_mode not in {"addition", "revision"}:
+            raise ValueError('review_mode must be one of {"addition", "revision"}.')
+        self.review_mode = review_mode
+
+        if review_prompt:
+            self.review_prompt = review_prompt
+        else:
+            file_path = importlib.resources.files('llm_ie.asset.default_prompts').\
+                joinpath(f"{self.__class__.__name__}_{self.review_mode}_review_prompt.txt")
+            with open(file_path, 'r') as f:
+                self.review_prompt = f.read()
+
+            warnings.warn(f'Custom review prompt not provided. The default review prompt is used:\n"{self.review_prompt}"', UserWarning)
+
+
+    def extract(self, text_content:Union[str, Dict[str,str]], max_new_tokens:int=512,
+                document_key:str=None, multi_turn:bool=False, temperature:float=0.0, stream:bool=False, **kwrs) -> List[Dict[str,str]]:
+        """
+        This method inputs a text and outputs a list of outputs per sentence.
+
+        Parameters:
+        ----------
+        text_content : Union[str, Dict[str,str]]
+            the input text content to put in prompt template.
+            If str, the prompt template must has only 1 placeholder {{<placeholder name>}}, regardless of placeholder name.
+            If dict, all the keys must be included in the prompt template placeholder {{<placeholder name>}}.
+        max_new_tokens : str, Optional
+            the max number of new tokens LLM should generate.
+        document_key : str, Optional
+            specify the key in text_content where document text is.
+            If text_content is str, this parameter will be ignored.
+        multi_turn : bool, Optional
+            multi-turn conversation prompting.
+            If True, sentences and LLM outputs will be appended to the input message and carry-over.
+            If False, only the current sentence is prompted.
+            For LLM inference engines that supports prompt cache (e.g., Llama.Cpp, Ollama), use multi-turn conversation prompting
+            can better utilize the KV caching.
+        temperature : float, Optional
+            the temperature for token sampling.
+        stream : bool, Optional
+            if True, LLM generated text will be printed in terminal in real-time.
+
+        Return : str
+            the output from LLM. Need post-processing.
+        """
+        # define output
+        output = []
+        # sentence tokenization
+        if isinstance(text_content, str):
+            sentences = self._get_sentences(text_content)
+        elif isinstance(text_content, dict):
+            sentences = self._get_sentences(text_content[document_key])
+        # construct chat messages
+        messages = []
+        if self.system_prompt:
+            messages.append({'role': 'system', 'content': self.system_prompt})
+
+        messages.append({'role': 'user', 'content': self._get_user_prompt(text_content)})
+        messages.append({'role': 'assistant', 'content': 'Sure, please start with the first sentence.'})
+
+        # generate sentence by sentence
+        for sent in sentences:
+            messages.append({'role': 'user', 'content': sent['sentence_text']})
+            if stream:
+                print(f"\n\n{Fore.GREEN}Sentence: {Style.RESET_ALL}\n{sent['sentence_text']}\n")
+                print(f"{Fore.BLUE}Initial Output:{Style.RESET_ALL}")
+
+            initial = self.inference_engine.chat(
+                messages=messages,
+                max_new_tokens=max_new_tokens,
+                temperature=temperature,
+                stream=stream,
+                **kwrs
+            )
+
+            # Review
+            if stream:
+                print(f"\n{Fore.YELLOW}Review:{Style.RESET_ALL}")
+            messages.append({'role': 'assistant', 'content': initial})
+            messages.append({'role': 'user', 'content': self.review_prompt})
+
+            review = self.inference_engine.chat(
+                messages=messages,
+                max_new_tokens=max_new_tokens,
+                temperature=temperature,
+                stream=stream,
+                **kwrs
+            )
+
+            # Output
+            if self.review_mode == "revision":
+                gen_text = review
+            elif self.review_mode == "addition":
+                gen_text = initial + '\n' + review
+
+            if multi_turn:
+                # update chat messages with LLM outputs
+                messages.append({'role': 'assistant', 'content': review})
+            else:
+                # delete sentence and review so that message is reset
+                del messages[-3:]
+
+            # add to output
+            output.append({'sentence_start': sent['start'],
+                           'sentence_end': sent['end'],
+                           'sentence_text': sent['sentence_text'],
+                           'gen_text': gen_text})
+        return output
+
+
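In the new `SentenceReviewFrameExtractor`, each sentence gets an initial pass and a review pass: `"revision"` mode keeps only the review output, `"addition"` mode concatenates both, and in single-turn mode the last three messages (sentence, initial output, review prompt) are deleted to reset the context window. A compact sketch of that combine-and-reset logic, decoupled from the engine (helper names here are illustrative):

```python
from typing import Dict, List

def combine(initial: str, review: str, review_mode: str) -> str:
    """Mirror the diff's output rule for the two review modes."""
    if review_mode == "revision":
        return review                    # review replaces the initial output
    if review_mode == "addition":
        return initial + '\n' + review   # review appends new frames to the initial output
    raise ValueError('review_mode must be one of {"addition", "revision"}.')

def reset_context(messages: List[Dict[str, str]]) -> None:
    """Single-turn mode: drop the sentence, initial output, and review prompt."""
    del messages[-3:]

msgs = [{'role': 'user', 'content': 'instructions'},
        {'role': 'user', 'content': 'sentence 1'},
        {'role': 'assistant', 'content': 'initial'},
        {'role': 'user', 'content': 'review prompt'}]
reset_context(msgs)  # only the task instructions remain for the next sentence
```

Addition mode is the cheaper recall booster, since downstream JSON parsing simply sees extra lines; revision mode trusts the model to re-emit a complete, corrected answer.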
+class SentenceCoTFrameExtractor(SentenceFrameExtractor):
+    from nltk.tokenize.punkt import PunktSentenceTokenizer
+    def __init__(self, inference_engine:InferenceEngine, prompt_template:str, system_prompt:str=None, **kwrs):
+        """
+        This class performs sentence-based Chain-of-thoughts (CoT) information extraction.
+        A simulated chat follows this process:
+            1. system prompt (optional)
+            2. user instructions (schema, background, full text, few-shot example...)
+            3. user input first sentence
+            4. assistant analyze the sentence
+            5. assistant extract outputs
+            6. repeat #3, #4, #5
+
+        Input system prompt (optional), prompt template (with user instructions),
+        and specify a LLM.
+
+        Parameters
+        ----------
+        inference_engine : InferenceEngine
+            the LLM inferencing engine object. Must implements the chat() method.
+        prompt_template : str
+            prompt template with "{{<placeholder name>}}" placeholder.
+        system_prompt : str, Optional
+            system prompt.
+        """
+        super().__init__(inference_engine=inference_engine, prompt_template=prompt_template,
+                         system_prompt=system_prompt, **kwrs)
+
+
+    def extract(self, text_content:Union[str, Dict[str,str]], max_new_tokens:int=512,
+                document_key:str=None, multi_turn:bool=False, temperature:float=0.0, stream:bool=False, **kwrs) -> List[Dict[str,str]]:
+        """
+        This method inputs a text and outputs a list of outputs per sentence.
+
+        Parameters:
+        ----------
+        text_content : Union[str, Dict[str,str]]
+            the input text content to put in prompt template.
+            If str, the prompt template must has only 1 placeholder {{<placeholder name>}}, regardless of placeholder name.
+            If dict, all the keys must be included in the prompt template placeholder {{<placeholder name>}}.
+        max_new_tokens : str, Optional
+            the max number of new tokens LLM should generate.
+        document_key : str, Optional
+            specify the key in text_content where document text is.
+            If text_content is str, this parameter will be ignored.
+        multi_turn : bool, Optional
+            multi-turn conversation prompting.
+            If True, sentences and LLM outputs will be appended to the input message and carry-over.
+            If False, only the current sentence is prompted.
+            For LLM inference engines that supports prompt cache (e.g., Llama.Cpp, Ollama), use multi-turn conversation prompting
+            can better utilize the KV caching.
+        temperature : float, Optional
+            the temperature for token sampling.
+        stream : bool, Optional
+            if True, LLM generated text will be printed in terminal in real-time.
+
+        Return : str
+            the output from LLM. Need post-processing.
+        """
+        # define output
+        output = []
+        # sentence tokenization
+        if isinstance(text_content, str):
+            sentences = self._get_sentences(text_content)
+        elif isinstance(text_content, dict):
+            sentences = self._get_sentences(text_content[document_key])
+        # construct chat messages
+        messages = []
+        if self.system_prompt:
+            messages.append({'role': 'system', 'content': self.system_prompt})
+
+        messages.append({'role': 'user', 'content': self._get_user_prompt(text_content)})
+        messages.append({'role': 'assistant', 'content': 'Sure, please start with the first sentence.'})
+
+        # generate sentence by sentence
+        for sent in sentences:
+            messages.append({'role': 'user', 'content': sent['sentence_text']})
+            if stream:
+                print(f"\n\n{Fore.GREEN}Sentence: {Style.RESET_ALL}\n{sent['sentence_text']}\n")
+                print(f"{Fore.BLUE}CoT:{Style.RESET_ALL}")
+
+            gen_text = self.inference_engine.chat(
+                messages=messages,
+                max_new_tokens=max_new_tokens,
+                temperature=temperature,
+                stream=stream,
+                **kwrs
+            )
+
+            if multi_turn:
+                # update chat messages with LLM outputs
+                messages.append({'role': 'assistant', 'content': gen_text})
+            else:
+                # delete sentence so that message is reset
+                del messages[-1]
+
+            # add to output
+            output.append({'sentence_start': sent['start'],
+                           'sentence_end': sent['end'],
+                           'sentence_text': sent['sentence_text'],
+                           'gen_text': gen_text})
+        return output
+
+
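All of the sentence-level extractors consume sentences as dicts carrying character offsets into the original document (`start`, `end`, `sentence_text`), which is what lets frames anchor back to exact spans. A rough stand-in for the library's `_get_sentences`, using a naive regex split rather than the NLTK Punkt tokenizer the package actually imports (the function name and splitting rule are illustrative assumptions):

```python
import re
from typing import Dict, List

def get_sentences(text: str) -> List[Dict]:
    """Naive sentence spans: each '.'-terminated run, with character offsets."""
    sentences = []
    for match in re.finditer(r'[^.]+\.', text):
        sentences.append({'start': match.start(),
                          'end': match.end(),
                          'sentence_text': match.group()})
    return sentences

doc = "First sentence. Second sentence."
for sent in get_sentences(doc):
    # the offsets must index back into the original document verbatim
    assert doc[sent['start']:sent['end']] == sent['sentence_text']
```

The invariant checked in the loop is the important part: whatever tokenizer is used, slicing the source text by the recorded offsets must reproduce the sentence exactly, or downstream frame spans drift.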
 class RelationExtractor(Extractor):
     def __init__(self, inference_engine:InferenceEngine, prompt_template:str, system_prompt:str=None, **kwrs):
         """
@@ -752,8 +1028,8 @@ class BinaryRelationExtractor(RelationExtractor):
         """
         roi_text = self._get_ROI(frame_1, frame_2, text, buffer_size=buffer_size)
         if stream:
-            print(f"\n\nROI text: \n{roi_text}\n")
-            print("Extraction:")
+            print(f"\n\n{Fore.GREEN}ROI text:{Style.RESET_ALL} \n{roi_text}\n")
+            print(f"{Fore.BLUE}Extraction:{Style.RESET_ALL}")
 
         messages = []
         if self.system_prompt:
@@ -904,8 +1180,8 @@ class MultiClassRelationExtractor(RelationExtractor):
         """
         roi_text = self._get_ROI(frame_1, frame_2, text, buffer_size=buffer_size)
         if stream:
-            print(f"\n\nROI text: \n{roi_text}\n")
-            print("Extraction:")
+            print(f"\n\n{Fore.GREEN}ROI text:{Style.RESET_ALL} \n{roi_text}\n")
+            print(f"{Fore.BLUE}Extraction:{Style.RESET_ALL}")
 
         messages = []
         if self.system_prompt:
llm_ie/prompt_editor.py CHANGED
@@ -1,10 +1,9 @@
 import sys
-from typing import Dict, Union
+from typing import Dict
 import importlib.resources
 from llm_ie.engines import InferenceEngine
 from llm_ie.extractors import FrameExtractor
 import re
-import colorama
 from colorama import Fore, Style
 import ipywidgets as widgets
 from IPython.display import display, HTML
@@ -90,7 +89,6 @@ class PromptEditor:
         """
         This method runs an interactive chat session in the terminal to help users write prompt templates.
         """
-        colorama.init(autoreset=True)
         file_path = importlib.resources.files('llm_ie.asset.PromptEditor_prompts').joinpath('chat.txt')
         with open(file_path, 'r') as f:
             chat_prompt_template = f.read()
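The prompt_editor change drops the per-session `colorama.init(autoreset=True)` call and instead resets styling explicitly with `Style.RESET_ALL` inside each f-string, matching the extractor code above (colorama is also promoted to a declared dependency in METADATA below). The explicit-reset style can be illustrated with the raw ANSI escape codes that colorama's `Fore.GREEN` and `Style.RESET_ALL` expand to on a VT-capable terminal (the helper name is illustrative):

```python
# ANSI escape codes equivalent to colorama's Fore.GREEN and Style.RESET_ALL
GREEN = "\x1b[32m"
RESET = "\x1b[0m"

def label(tag: str, text: str) -> str:
    """Colorize a tag and reset explicitly, as the diff's f-strings do."""
    return f"{GREEN}{tag}:{RESET} {text}"

print(label("Sentence", "Patient takes aspirin daily."))
```

Explicit resets keep each print statement self-describing and avoid depending on process-global `init()` state, at the cost of remembering the reset in every colored string.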
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: llm-ie
-Version: 0.3.0
+Version: 0.3.1
 Summary: An LLM-powered tool that transforms everyday language into robust information extraction pipelines.
 License: MIT
 Author: Enshuo (David) Hsu
@@ -9,6 +9,7 @@ Classifier: License :: OSI Approved :: MIT License
 Classifier: Programming Language :: Python :: 3
 Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
+Requires-Dist: colorama (>=0.4.6,<0.5.0)
 Requires-Dist: nltk (>=3.8,<4.0)
 Description-Content-Type: text/markdown
 
@@ -3,15 +3,21 @@ llm_ie/asset/PromptEditor_prompts/chat.txt,sha256=Fq62voV0JQ8xBRcxS1Nmdd7DkHs1fG
 llm_ie/asset/PromptEditor_prompts/comment.txt,sha256=C_lxx-dlOlFJ__jkHKosZ8HsNAeV1aowh2B36nIipBY,159
 llm_ie/asset/PromptEditor_prompts/rewrite.txt,sha256=JAwY9vm1jSmKf2qcLBYUvrSmME2EJH36bALmkwZDWYQ,178
 llm_ie/asset/PromptEditor_prompts/system.txt,sha256=QwGTIJvp-5u2P8CkGt_rabttlN1puHQwIBNquUm1ZHo,730
+llm_ie/asset/default_prompts/ReviewFrameExtractor_addition_review_prompt.txt,sha256=pKes8BOAoJJgmo_IQh2ISKiMh_rDPl_rDUU_VgDQ4o4,273
+llm_ie/asset/default_prompts/ReviewFrameExtractor_revision_review_prompt.txt,sha256=9Nwkr2U_3ZSk01xDtgiFJVABi6FkC8Izdq7zrzFfLRg,235
+llm_ie/asset/default_prompts/SentenceReviewFrameExtractor_addition_review_prompt.txt,sha256=Of11LFuXLB249oekFelzlIeoAB0cATReqWgFTvhNz_8,329
+llm_ie/asset/default_prompts/SentenceReviewFrameExtractor_revision_review_prompt.txt,sha256=kNJQK7NdoCx13TXGY8HYGrW_v4SEaErK8j9qIzd70CM,291
 llm_ie/asset/prompt_guide/BasicFrameExtractor_prompt_guide.txt,sha256=m7iX4Qjsf1N2V1mbjE-x4F-qPGZA2qGJbUCdpets394,9293
 llm_ie/asset/prompt_guide/BinaryRelationExtractor_prompt_guide.txt,sha256=Z6Yc2_QRqroWcJ13owNJbo78I0wpS4XXDsOjXFR-aPk,2166
 llm_ie/asset/prompt_guide/MultiClassRelationExtractor_prompt_guide.txt,sha256=EQ9Jmh0CQmlfkWqXx6_apuEZUKK3WIrdpAvfbTX2_No,3011
 llm_ie/asset/prompt_guide/ReviewFrameExtractor_prompt_guide.txt,sha256=m7iX4Qjsf1N2V1mbjE-x4F-qPGZA2qGJbUCdpets394,9293
+llm_ie/asset/prompt_guide/SentenceCoTFrameExtractor_prompt_guide.txt,sha256=T4NsO33s3KSJml-klzXAJiYox0kiuxGo-ou2a2Ig2SY,14225
 llm_ie/asset/prompt_guide/SentenceFrameExtractor_prompt_guide.txt,sha256=oKH_QeDgpw771ZdHk3L7DYz2Jvfm7OolUoTiJyMJI30,9541
+llm_ie/asset/prompt_guide/SentenceReviewFrameExtractor_prompt_guide.txt,sha256=oKH_QeDgpw771ZdHk3L7DYz2Jvfm7OolUoTiJyMJI30,9541
 llm_ie/data_types.py,sha256=hPz3WOeAzfn2QKmb0CxHmRdQWZQ4G9zq8U-RJBVFdYk,14329
 llm_ie/engines.py,sha256=PTYs7s_iCPmI-yFUCVCPY_cMGS77ma2VGoz4rdNkODI,9308
-llm_ie/extractors.py,sha256=l0zJEtPSuy-2f_OxPQFPH3RsyLydmN9MRQJQYiRdRbY,45026
-llm_ie/prompt_editor.py,sha256=y8YI-nOdBpxSr2boQBqG_vuGhW572a9rYPG3eb0oWH0,8205
-llm_ie-0.3.0.dist-info/METADATA,sha256=s2lWsb4RvN9_95KHqxY77Eclw4aEooTEay0lDcKD6VM,41225
-llm_ie-0.3.0.dist-info/WHEEL,sha256=sP946D7jFCHeNz5Iq4fL4Lu-PrWrFsgfLXbbkciIZwg,88
-llm_ie-0.3.0.dist-info/RECORD,,
+llm_ie/extractors.py,sha256=xgkicRzBPRaQPiKWmQJ5b_aiNv9VEc85jzBA7cQXic8,58331
+llm_ie/prompt_editor.py,sha256=3h_2yIe7OV4auv4Vb9Zdx2q26UhC0xp9c4tt_yDr78I,8144
+llm_ie-0.3.1.dist-info/METADATA,sha256=eJCzg7G_ivz0CcP9KycSeHo986se6tqA8cKLtQyTtw4,41266
+llm_ie-0.3.1.dist-info/WHEEL,sha256=sP946D7jFCHeNz5Iq4fL4Lu-PrWrFsgfLXbbkciIZwg,88
+llm_ie-0.3.1.dist-info/RECORD,,