llm-ie 0.4.3__tar.gz → 0.4.5__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- llm_ie-0.4.3/README.md → llm_ie-0.4.5/PKG-INFO +56 -5
- llm_ie-0.4.3/PKG-INFO → llm_ie-0.4.5/README.md +39 -22
- {llm_ie-0.4.3 → llm_ie-0.4.5}/pyproject.toml +2 -2
- {llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/prompt_guide/SentenceCoTFrameExtractor_prompt_guide.txt +4 -4
- {llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/prompt_guide/SentenceFrameExtractor_prompt_guide.txt +4 -4
- {llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/prompt_guide/SentenceReviewFrameExtractor_prompt_guide.txt +4 -4
- {llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/engines.py +119 -34
- {llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/extractors.py +177 -119
- {llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/__init__.py +0 -0
- {llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/PromptEditor_prompts/chat.txt +0 -0
- {llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/PromptEditor_prompts/comment.txt +0 -0
- {llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/PromptEditor_prompts/rewrite.txt +0 -0
- {llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/PromptEditor_prompts/system.txt +0 -0
- {llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/default_prompts/ReviewFrameExtractor_addition_review_prompt.txt +0 -0
- {llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/default_prompts/ReviewFrameExtractor_revision_review_prompt.txt +0 -0
- {llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/default_prompts/SentenceReviewFrameExtractor_addition_review_prompt.txt +0 -0
- {llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/default_prompts/SentenceReviewFrameExtractor_revision_review_prompt.txt +0 -0
- {llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/prompt_guide/BasicFrameExtractor_prompt_guide.txt +0 -0
- {llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/prompt_guide/BinaryRelationExtractor_prompt_guide.txt +0 -0
- {llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/prompt_guide/MultiClassRelationExtractor_prompt_guide.txt +0 -0
- {llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/prompt_guide/ReviewFrameExtractor_prompt_guide.txt +0 -0
- {llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/data_types.py +0 -0
- {llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/prompt_editor.py +0 -0
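The two user-facing additions in 0.4.5 are the `reasoning_model` flag on the OpenAI/Azure inference engines and the `context_sentences` option on the sentence-based extractors. A minimal sketch combining both, based on the README and source changes shown below (`prompt_temp` and `text` are assumed to be defined as in the README examples):

```python
from llm_ie.engines import OpenAIInferenceEngine
from llm_ie.extractors import SentenceFrameExtractor

# "o"-series reasoning model: max_completion_tokens is used internally, temperature is ignored
inference_engine = OpenAIInferenceEngine(model="o1-mini", reasoning_model=True)

# include 2 sentences before and after each sentence of interest as context
extractor = SentenceFrameExtractor(inference_engine=inference_engine,
                                   prompt_template=prompt_temp,
                                   context_sentences=2)
frames = extractor.extract_frames(text_content=text, entity_key="Diagnosis",
                                  concurrent=True, concurrent_batch_size=32)
```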
llm_ie-0.4.3/README.md → llm_ie-0.4.5/PKG-INFO

@@ -1,3 +1,20 @@
+Metadata-Version: 2.1
+Name: llm-ie
+Version: 0.4.5
+Summary: An LLM-powered tool that transforms everyday language into robust information extraction pipelines.
+License: MIT
+Author: Enshuo (David) Hsu
+Requires-Python: >=3.11,<4.0
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Requires-Dist: colorama (>=0.4.6,<0.5.0)
+Requires-Dist: json_repair (>=0.30,<0.31)
+Requires-Dist: nest_asyncio (>=1.6.0,<2.0.0)
+Requires-Dist: nltk (>=3.8,<4.0)
+Description-Content-Type: text/markdown
+
 <div align="center"><img src=doc_asset/readme_img/LLM-IE.png width=500 ></div>


@@ -24,6 +41,10 @@ An LLM-powered tool that transforms everyday language into robust information ex
 - Support for LiteLLM.
 - [v0.4.1](https://github.com/daviden1013/llm-ie/releases/tag/v0.4.1) (Jan 25, 2025): Added filters, table view, and some new features to visualization tool (make sure to update [ie-viz](https://github.com/daviden1013/ie-viz)).
 - [v0.4.3](https://github.com/daviden1013/llm-ie/releases/tag/v0.4.3) (Feb 7, 2025): Added Azure OpenAI support.
+- [v0.4.5](https://github.com/daviden1013/llm-ie/releases/tag/v0.4.5) (Feb 16, 2025):
+    - Added option to adjust number of context sentences in sentence-based extractors.
+    - Added support for OpenAI reasoning models ("o" series).
+

 ## Table of Contents
 - [Overview](#overview)
@@ -323,6 +344,14 @@ from llm_ie.engines import OpenAIInferenceEngine
 inference_engine = OpenAIInferenceEngine(model="gpt-4o-mini")
 ```

+For reasoning models ("o" series), use the `reasoning_model=True` flag. The `max_completion_tokens` will be used instead of the `max_tokens`. `temperature` will be ignored.
+
+```python
+from llm_ie.engines import OpenAIInferenceEngine
+
+inference_engine = OpenAIInferenceEngine(model="o1-mini", reasoning_model=True)
+```
+
 #### <img src=doc_asset/readme_img/Azure_icon.png width=32 /> Azure OpenAI API
 In bash, save the endpoint name and API key to environmental variables `AZURE_OPENAI_ENDPOINT` and `AZURE_OPENAI_API_KEY`.
 ```
@@ -339,6 +368,14 @@ from llm_ie.engines import AzureOpenAIInferenceEngine
 inference_engine = AzureOpenAIInferenceEngine(model="gpt-4o-mini")
 ```

+For reasoning models ("o" series), use the `reasoning_model=True` flag. The `max_completion_tokens` will be used instead of the `max_tokens`. `temperature` will be ignored.
+
+```python
+from llm_ie.engines import AzureOpenAIInferenceEngine
+
+inference_engine = AzureOpenAIInferenceEngine(model="o1-mini", reasoning_model=True)
+```
+
 #### 🤗 huggingface_hub
 The ```model``` can be a model id hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. Refer to the [Inference Client](https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client) documentation for more details.

@@ -766,7 +803,7 @@ frames = extractor.extract_frames(text_content=text, entity_key="Diagnosis", str

 The ```SentenceFrameExtractor``` instructs the LLM to extract sentence by sentence. The reason is to ensure the accuracy of frame spans. It also prevents LLMs from overseeing sections/ sentences. Empirically, this extractor results in better recall than the ```BasicFrameExtractor``` in complex tasks.

-For concurrent extraction (recommended), the `async/
+For concurrent extraction (recommended), the `async/await` feature is used to speed up inferencing. The `concurrent_batch_size` sets the batch size of sentences to be processed in cocurrent.

 ```python
 from llm_ie.extractors import SentenceFrameExtractor
@@ -775,15 +812,29 @@ extractor = SentenceFrameExtractor(inference_engine, prompt_temp)
 frames = extractor.extract_frames(text_content=text, entity_key="Diagnosis", case_sensitive=False, fuzzy_match=True, concurrent=True, concurrent_batch_size=32)
 ```

-The 
+The `context_sentences` sets number of sentences before and after the sentence of interest to provide additional context. When `context_sentences=2`, 2 sentences before and 2 sentences after are included in the user prompt as context. When `context_sentences="all"`, the entire document is included as context. When `context_sentences=0`, no context is provided and LLM will only extract based on the current sentence of interest.

 ```python
 from llm_ie.extractors import SentenceFrameExtractor

-extractor = SentenceFrameExtractor(inference_engine, 
-
+extractor = SentenceFrameExtractor(inference_engine=inference_engine, 
+                                   prompt_template=prompt_temp,
+                                   context_sentences=2)
+frames = extractor.extract_frames(text_content=text, entity_key="Diagnosis", case_sensitive=False, fuzzy_match=True, stream=True)
 ```

+For the sentence:
+
+*The patient has a history of hypertension, hyperlipidemia, and Type 2 diabetes mellitus.*
+
+The context is "previous sentence 2" "previous sentence 1" "the sentence of interest" "proceeding sentence 1" "proceeding sentence 2":
+
+*Emily Brown, MD (Cardiology), Dr. Michael Green, MD (Pulmonology)
+
+*#### Reason for Admission*
+*John Doe, a 49-year-old male, was admitted to the hospital with complaints of chest pain, shortness of breath, and dizziness. The patient has a history of hypertension, hyperlipidemia, and Type 2 diabetes mellitus. #### History of Present Illness*
+*The patient reported that the chest pain started two days prior to admission. The pain was described as a pressure-like sensation in the central chest, radiating to the left arm and jaw.*
+
 </details>

 <details>
@@ -1161,4 +1212,4 @@ For more information and benchmarks, please check our paper:
 journal={arXiv preprint arXiv:2411.11779},
 year={2024}
 }
-```
+```
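A short sketch of how the new `reasoning_model` flag behaves at call time, based on the engines.py changes further below (assuming `OPENAI_API_KEY` is set in the environment): for a reasoning model, `max_new_tokens` is forwarded as `max_completion_tokens` and is shared by the hidden reasoning tokens and the output tokens, while any non-zero `temperature` only triggers a warning.

```python
from llm_ie.engines import OpenAIInferenceEngine

engine = OpenAIInferenceEngine(model="o1-mini", reasoning_model=True)

# Budget generously: the limit covers both reasoning tokens and output tokens.
answer = engine.chat(
    messages=[{"role": "user", "content": "Extract the diagnoses: chest pain, hypertension."}],
    max_new_tokens=4096,
)
print(answer)
```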
llm_ie-0.4.3/PKG-INFO → llm_ie-0.4.5/README.md

@@ -1,20 +1,3 @@
-Metadata-Version: 2.1
-Name: llm-ie
-Version: 0.4.3
-Summary: An LLM-powered tool that transforms everyday language into robust information extraction pipelines.
-License: MIT
-Author: Enshuo (David) Hsu
-Requires-Python: >=3.11,<4.0
-Classifier: License :: OSI Approved :: MIT License
-Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.11
-Classifier: Programming Language :: Python :: 3.12
-Requires-Dist: colorama (>=0.4.6,<0.5.0)
-Requires-Dist: json_repair (>=0.30,<0.31)
-Requires-Dist: nest_asyncio (>=0.1.6,<0.2.0)
-Requires-Dist: nltk (>=3.8,<4.0)
-Description-Content-Type: text/markdown
-
 <div align="center"><img src=doc_asset/readme_img/LLM-IE.png width=500 ></div>


The remaining hunks of this diff (@@ -41,6 +24,10 @@, @@ -340,6 +327,14 @@, @@ -356,6 +351,14 @@, @@ -783,7 +786,7 @@, @@ -792,15 +795,29 @@, and @@ -1178,4 +1195,4 @@) carry the same README content changes already shown in the README.md → PKG-INFO diff above.
{llm_ie-0.4.3 → llm_ie-0.4.5}/pyproject.toml

@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "llm-ie"
-version = "0.4.3"
+version = "0.4.5"
 description = "An LLM-powered tool that transforms everyday language into robust information extraction pipelines."
 authors = ["Enshuo (David) Hsu"]
 license = "MIT"
@@ -16,7 +16,7 @@ python = "^3.11"
 nltk = "^3.8"
 colorama = "^0.4.6"
 json_repair = "^0.30"
-nest_asyncio = "^0.1.6"
+nest_asyncio = "^1.6.0"


 [build-system]
{llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/prompt_guide/SentenceCoTFrameExtractor_prompt_guide.txt RENAMED

@@ -61,8 +61,8 @@ Example 1 (single entity type with attributes):
 If there is no specific arm, just omit the "Arm" key. If the percentage is not reported, just omit the "Percentage" key. The "Evidence" should always be provided.

 # Input placeholder
-Below is the Adverse reactions section
-{{input}}
+Below is the Adverse reactions section:
+"{{input}}"


 Example 2 (multiple entity types):
@@ -121,7 +121,7 @@ Example 2 (multiple entity types):
 </Outputs>

 # Input placeholder
-Below is the medical note
+Below is the medical note:
 "{{input}}"


@@ -213,5 +213,5 @@ Example 3 (multiple entity types with corresponding attributes):
 </Outputs>

 # Input placeholder
-Below is the 
+Below is the medical note:
 "{{input}}"

{llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/prompt_guide/SentenceFrameExtractor_prompt_guide.txt RENAMED

@@ -46,8 +46,8 @@ Example 1 (single entity type with attributes):
 If there is no specific arm, just omit the "Arm" key. If the percentage is not reported, just omit the "Percentage" key. The "Evidence" should always be provided.

 # Input placeholder
-Below is the Adverse reactions section
-{{input}}
+Below is the Adverse reactions section:
+"{{input}}"


 Example 2 (multiple entity types):
@@ -81,7 +81,7 @@ Example 2 (multiple entity types):


 # Input placeholder
-Below is the medical note
+Below is the medical note:
 "{{input}}"


@@ -141,5 +141,5 @@ Example 3 (multiple entity types with corresponding attributes):


 # Input placeholder
-Below is the 
+Below is the medical note:
 "{{input}}"

{llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/prompt_guide/SentenceReviewFrameExtractor_prompt_guide.txt RENAMED

This file carries the same three hunks (@@ -46,8 +46,8 @@, @@ -81,7 +81,7 @@, @@ -141,5 +141,5 @@) as the SentenceFrameExtractor prompt guide above.
{llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/engines.py

@@ -1,4 +1,5 @@
 import abc
+import warnings
 import importlib
 from typing import List, Dict, Union

@@ -242,7 +243,7 @@ class HuggingFaceHubInferenceEngine(InferenceEngine):


 class OpenAIInferenceEngine(InferenceEngine):
-    def __init__(self, model:str, **kwrs):
+    def __init__(self, model:str, reasoning_model:bool=False, **kwrs):
         """
         The OpenAI API inference engine. Supports OpenAI models and OpenAI compatible servers:
         - vLLM OpenAI compatible server (https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html)
@@ -254,6 +255,8 @@ class OpenAIInferenceEngine(InferenceEngine):
         ----------
         model_name : str
             model name as described in https://platform.openai.com/docs/models
+        reasoning_model : bool, Optional
+            indicator for OpenAI reasoning models ("o" series).
         """
         if importlib.util.find_spec("openai") is None:
             raise ImportError("OpenAI Python API library not found. Please install OpanAI (```pip install openai```).")
@@ -262,6 +265,7 @@ class OpenAIInferenceEngine(InferenceEngine):
         self.client = OpenAI(**kwrs)
         self.async_client = AsyncOpenAI(**kwrs)
         self.model = model
+        self.reasoning_model = reasoning_model

     def chat(self, messages:List[Dict[str,str]], max_new_tokens:int=2048, temperature:float=0.0, stream:bool=False, **kwrs) -> str:
         """
@@ -278,14 +282,27 @@ class OpenAIInferenceEngine(InferenceEngine):
         stream : bool, Optional
             if True, LLM generated text will be printed in terminal in real-time.
         """
-
-
-
-
-
-
-
-
+        if self.reasoning_model:
+            if temperature != 0.0:
+                warnings.warn("Reasoning models do not support temperature parameter. Will be ignored.", UserWarning)
+
+            response = self.client.chat.completions.create(
+                model=self.model,
+                messages=messages,
+                max_completion_tokens=max_new_tokens,
+                stream=stream,
+                **kwrs
+            )
+
+        else:
+            response = self.client.chat.completions.create(
+                model=self.model,
+                messages=messages,
+                max_tokens=max_new_tokens,
+                temperature=temperature,
+                stream=stream,
+                **kwrs
+            )

         if stream:
             res = ''
@@ -294,8 +311,17 @@ class OpenAIInferenceEngine(InferenceEngine):
                 if chunk.choices[0].delta.content is not None:
                     res += chunk.choices[0].delta.content
                     print(chunk.choices[0].delta.content, end="", flush=True)
+                if chunk.choices[0].finish_reason == "length":
+                    warnings.warn("Model stopped generating due to context length limit.", RuntimeWarning)
+                    if self.reasoning_model:
+                        warnings.warn("max_new_tokens includes reasoning tokens and output tokens.", UserWarning)
             return res

+        if response.choices[0].finish_reason == "length":
+            warnings.warn("Model stopped generating due to context length limit.", RuntimeWarning)
+            if self.reasoning_model:
+                warnings.warn("max_new_tokens includes reasoning tokens and output tokens.", UserWarning)
+
         return response.choices[0].message.content

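The `_get_context_sentences` helper added in extractors.py below computes the context window as a plain slice around the sentence index, clipped at the document boundaries. A standalone illustration of the same arithmetic with a hypothetical sentence list and `context_sentences=2`:

```python
sentences = [{"sentence_text": f"Sentence {k}."} for k in range(6)]
context_sentences = 2
i = 1  # index of the sentence of interest

start = max(0, i - context_sentences)                 # 0 (clipped at the document start)
end = min(i + 1 + context_sentences, len(sentences))  # 4
context = " ".join(s["sentence_text"] for s in sentences[start:end])
# -> "Sentence 0. Sentence 1. Sentence 2. Sentence 3."
```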
{llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/extractors.py

@@ -1,6 +1,5 @@
 import abc
 import re
-import copy
 import json
 import json_repair
 import inspect
@@ -13,7 +12,6 @@ from typing import Set, List, Dict, Tuple, Union, Callable
 from llm_ie.data_types import LLMInformationExtractionFrame, LLMInformationExtractionDocument
 from llm_ie.engines import InferenceEngine
 from colorama import Fore, Style
-from nltk.tokenize import RegexpTokenizer


 class Extractor:
@@ -139,6 +137,7 @@ class Extractor:


 class FrameExtractor(Extractor):
+    from nltk.tokenize import RegexpTokenizer
     def __init__(self, inference_engine:InferenceEngine, prompt_template:str, system_prompt:str=None, **kwrs):
         """
         This is the abstract class for frame extraction.
@@ -157,7 +156,8 @@ class FrameExtractor(Extractor):
                          prompt_template=prompt_template,
                          system_prompt=system_prompt,
                          **kwrs)
-
+
+        self.tokenizer = self.RegexpTokenizer(r'\w+|[^\w\s]')


     def _jaccard_score(self, s1:Set[str], s2:Set[str]) -> float:
@@ -569,7 +569,8 @@ class ReviewFrameExtractor(BasicFrameExtractor):

 class SentenceFrameExtractor(FrameExtractor):
     from nltk.tokenize.punkt import PunktSentenceTokenizer
-    def __init__(self, inference_engine:InferenceEngine, prompt_template:str, system_prompt:str=None, 
+    def __init__(self, inference_engine:InferenceEngine, prompt_template:str, system_prompt:str=None, 
+                 context_sentences:Union[str, int]="all", **kwrs):
         """
         This class performs sentence-by-sentence information extraction.
         The process is as follows:
@@ -590,10 +591,26 @@ class SentenceFrameExtractor(FrameExtractor):
             prompt template with "{{<placeholder name>}}" placeholder.
         system_prompt : str, Optional
             system prompt.
+        context_sentences : Union[str, int], Optional
+            number of sentences before and after the given sentence to provide additional context.
+            if "all", the full text will be provided in the prompt as context.
+            if 0, no additional context will be provided.
+                This is good for tasks that does not require context beyond the given sentence.
+            if > 0, the number of sentences before and after the given sentence to provide as context.
+                This is good for tasks that require context beyond the given sentence.
         """
         super().__init__(inference_engine=inference_engine, prompt_template=prompt_template,
                          system_prompt=system_prompt, **kwrs)

+        if not isinstance(context_sentences, int) and context_sentences != "all":
+            raise ValueError('context_sentences must be an integer (>= 0) or "all".')
+
+        if isinstance(context_sentences, int) and context_sentences < 0:
+            raise ValueError("context_sentences must be a positive integer.")
+
+        self.context_sentences =context_sentences
+
+
     def _get_sentences(self, text:str) -> List[Dict[str,str]]:
         """
         This method sentence tokenize the input text into a list of sentences
@@ -614,9 +631,24 @@ class SentenceFrameExtractor(FrameExtractor):
                               "end": end})
         return sentences

+
+    def _get_context_sentences(self, text_content, i:int, sentences:List[Dict[str, str]], document_key:str=None) -> str:
+        """
+        This function returns the context sentences for the current sentence of interest (i).
+        """
+        if self.context_sentences == "all":
+            context = text_content if isinstance(text_content, str) else text_content[document_key]
+        elif self.context_sentences == 0:
+            context = ""
+        else:
+            start = max(0, i - self.context_sentences)
+            end = min(i + 1 + self.context_sentences, len(sentences))
+            context = " ".join([s['sentence_text'] for s in sentences[start:end]])
+        return context
+

     def extract(self, text_content:Union[str, Dict[str,str]], max_new_tokens:int=512, 
-                document_key:str=None, 
+                document_key:str=None, temperature:float=0.0, stream:bool=False, **kwrs) -> List[Dict[str,str]]:
         """
         This method inputs a text and outputs a list of outputs per sentence.

@@ -631,12 +663,6 @@ class SentenceFrameExtractor(FrameExtractor):
         document_key : str, Optional
             specify the key in text_content where document text is.
             If text_content is str, this parameter will be ignored.
-        multi_turn : bool, Optional
-            multi-turn conversation prompting.
-            If True, sentences and LLM outputs will be appended to the input message and carry-over.
-            If False, only the current sentence is prompted.
-            For LLM inference engines that supports prompt cache (e.g., Llama.Cpp, Ollama), use multi-turn conversation prompting
-            can better utilize the KV caching.
         temperature : float, Optional
             the temperature for token sampling.
         stream : bool, Optional
@@ -654,19 +680,32 @@ class SentenceFrameExtractor(FrameExtractor):
         if document_key is None:
             raise ValueError("document_key must be provided when text_content is dict.")
         sentences = self._get_sentences(text_content[document_key])
-        # construct chat messages
-        messages = []
-        if self.system_prompt:
-            messages.append({'role': 'system', 'content': self.system_prompt})
-
-        messages.append({'role': 'user', 'content': self._get_user_prompt(text_content)})
-        messages.append({'role': 'assistant', 'content': 'Sure, please start with the first sentence.'})

         # generate sentence by sentence
-        for sent in sentences:
-
+        for i, sent in enumerate(sentences):
+            # construct chat messages
+            messages = []
+            if self.system_prompt:
+                messages.append({'role': 'system', 'content': self.system_prompt})
+
+            context = self._get_context_sentences(text_content, i, sentences, document_key)
+
+            if self.context_sentences == 0:
+                # no context, just place sentence of interest
+                messages.append({'role': 'user', 'content': self._get_user_prompt(sent['sentence_text'])})
+            else:
+                # insert context
+                messages.append({'role': 'user', 'content': self._get_user_prompt(context)})
+                # simulate conversation
+                messages.append({'role': 'assistant', 'content': 'Sure, please provide the sentence of interest.'})
+                # place sentence of interest
+                messages.append({'role': 'user', 'content': sent['sentence_text']})
+
             if stream:
-                print(f"\n\n{Fore.GREEN}Sentence
+                print(f"\n\n{Fore.GREEN}Sentence {i}:{Style.RESET_ALL}\n{sent['sentence_text']}\n")
+                if isinstance(self.context_sentences, int) and self.context_sentences > 0:
+                    print(f"{Fore.YELLOW}Context:{Style.RESET_ALL}\n{context}\n")
+
                 print(f"{Fore.BLUE}Extraction:{Style.RESET_ALL}")

             gen_text = self.inference_engine.chat(
@@ -676,19 +715,13 @@ class SentenceFrameExtractor(FrameExtractor):
                 stream=stream,
                 **kwrs
             )
-
-            if multi_turn:
-                # update chat messages with LLM outputs
-                messages.append({'role': 'assistant', 'content': gen_text})
-            else:
-                # delete sentence so that message is reset
-                del messages[-1]

             # add to output
             output.append({'sentence_start': sent['start'],
                            'sentence_end': sent['end'],
                            'sentence_text': sent['sentence_text'],
                            'gen_text': gen_text})
+
         return output

@@ -726,21 +759,31 @@ class SentenceFrameExtractor(FrameExtractor):
         if document_key is None:
             raise ValueError("document_key must be provided when text_content is dict.")
         sentences = self._get_sentences(text_content[document_key])
-        # construct chat messages
-        base_messages = []
-        if self.system_prompt:
-            base_messages.append({'role': 'system', 'content': self.system_prompt})
-
-        base_messages.append({'role': 'user', 'content': self._get_user_prompt(text_content)})
-        base_messages.append({'role': 'assistant', 'content': 'Sure, please start with the first sentence.'})

         # generate sentence by sentence
         tasks = []
         for i in range(0, len(sentences), concurrent_batch_size):
             batch = sentences[i:i + concurrent_batch_size]
-            for sent in batch:
-
-                messages
+            for j, sent in enumerate(batch):
+                # construct chat messages
+                messages = []
+                if self.system_prompt:
+                    messages.append({'role': 'system', 'content': self.system_prompt})
+
+                context = self._get_context_sentences(text_content, i + j, sentences, document_key)
+
+                if self.context_sentences == 0:
+                    # no context, just place sentence of interest
+                    messages.append({'role': 'user', 'content': self._get_user_prompt(sent['sentence_text'])})
+                else:
+                    # insert context
+                    messages.append({'role': 'user', 'content': self._get_user_prompt(context)})
+                    # simulate conversation
+                    messages.append({'role': 'assistant', 'content': 'Sure, please provide the sentence of interest.'})
+                    # place sentence of interest
+                    messages.append({'role': 'user', 'content': sent['sentence_text']})
+
+                # add to tasks
                 task = asyncio.create_task(
                     self.inference_engine.chat_async(
                         messages=messages,
@@ -764,10 +807,10 @@ class SentenceFrameExtractor(FrameExtractor):


     def extract_frames(self, text_content:Union[str, Dict[str,str]], entity_key:str, max_new_tokens:int=512,
-
-
-
-
+                       document_key:str=None, temperature:float=0.0, stream:bool=False,
+                       concurrent:bool=False, concurrent_batch_size:int=32,
+                       case_sensitive:bool=False, fuzzy_match:bool=True, fuzzy_buffer_size:float=0.2, fuzzy_score_cutoff:float=0.8,
+                       **kwrs) -> List[LLMInformationExtractionFrame]:
         """
         This method inputs a text and outputs a list of LLMInformationExtractionFrame
         It use the extract() method and post-process outputs into frames.
@@ -785,12 +828,6 @@ class SentenceFrameExtractor(FrameExtractor):
         document_key : str, Optional
             specify the key in text_content where document text is.
             If text_content is str, this parameter will be ignored.
-        multi_turn : bool, Optional
-            multi-turn conversation prompting.
-            If True, sentences and LLM outputs will be appended to the input message and carry-over.
-            If False, only the current sentence is prompted.
-            For LLM inference engines that supports prompt cache (e.g., Llama.Cpp, Ollama), use multi-turn conversation prompting
-            can better utilize the KV caching.
         temperature : float, Optional
             the temperature for token sampling.
         stream : bool, Optional
@@ -815,8 +852,6 @@ class SentenceFrameExtractor(FrameExtractor):
         if concurrent:
             if stream:
                 warnings.warn("stream=True is not supported in concurrent mode.", RuntimeWarning)
-            if multi_turn:
-                warnings.warn("multi_turn=True is not supported in concurrent mode.", RuntimeWarning)

             nest_asyncio.apply() # For Jupyter notebook. Terminal does not need this.
             llm_output_sentences = asyncio.run(self.extract_async(text_content=text_content,
@@ -830,7 +865,6 @@ class SentenceFrameExtractor(FrameExtractor):
             llm_output_sentences = self.extract(text_content=text_content,
                                                 max_new_tokens=max_new_tokens,
                                                 document_key=document_key,
-                                                multi_turn=multi_turn,
                                                 temperature=temperature,
                                                 stream=stream,
                                                 **kwrs)
@@ -866,7 +900,8 @@ class SentenceFrameExtractor(FrameExtractor):

 class SentenceReviewFrameExtractor(SentenceFrameExtractor):
     def __init__(self, inference_engine:InferenceEngine, prompt_template:str, 
-                 review_mode:str, review_prompt:str=None, system_prompt:str=None, 
+                 review_mode:str, review_prompt:str=None, system_prompt:str=None, 
+                 context_sentences:Union[str, int]="all", **kwrs):
         """
         This class adds a review step after the SentenceFrameExtractor.
         For each sentence, the review process asks LLM to review its output and:
@@ -888,9 +923,16 @@ class SentenceReviewFrameExtractor(SentenceFrameExtractor):
             addition mode only ask LLM to add new frames, while revision mode ask LLM to regenerate.
         system_prompt : str, Optional
             system prompt.
+        context_sentences : Union[str, int], Optional
+            number of sentences before and after the given sentence to provide additional context.
+            if "all", the full text will be provided in the prompt as context.
+            if 0, no additional context will be provided.
+                This is good for tasks that does not require context beyond the given sentence.
+            if > 0, the number of sentences before and after the given sentence to provide as context.
+                This is good for tasks that require context beyond the given sentence.
         """
         super().__init__(inference_engine=inference_engine, prompt_template=prompt_template,
-                         system_prompt=system_prompt, **kwrs)
+                         system_prompt=system_prompt, context_sentences=context_sentences, **kwrs)

         if review_mode not in {"addition", "revision"}:
             raise ValueError('review_mode must be one of {"addition", "revision"}.')
@@ -908,7 +950,7 @@ class SentenceReviewFrameExtractor(SentenceFrameExtractor):


     def extract(self, text_content:Union[str, Dict[str,str]], max_new_tokens:int=512, 
-                document_key:str=None, 
+                document_key:str=None, temperature:float=0.0, stream:bool=False, **kwrs) -> List[Dict[str,str]]:
         """
         This method inputs a text and outputs a list of outputs per sentence.

@@ -923,12 +965,6 @@ class SentenceReviewFrameExtractor(SentenceFrameExtractor):
         document_key : str, Optional
             specify the key in text_content where document text is.
             If text_content is str, this parameter will be ignored.
-        multi_turn : bool, Optional
-            multi-turn conversation prompting.
-            If True, sentences and LLM outputs will be appended to the input message and carry-over.
-            If False, only the current sentence is prompted.
-            For LLM inference engines that supports prompt cache (e.g., Llama.Cpp, Ollama), use multi-turn conversation prompting
-            can better utilize the KV caching.
         temperature : float, Optional
             the temperature for token sampling.
         stream : bool, Optional
@@ -946,19 +982,31 @@ class SentenceReviewFrameExtractor(SentenceFrameExtractor):
         if document_key is None:
             raise ValueError("document_key must be provided when text_content is dict.")
         sentences = self._get_sentences(text_content[document_key])
-
-
-
-
+
+        # generate sentence by sentence
+        for i, sent in enumerate(sentences):
+            # construct chat messages
+            messages = []
+            if self.system_prompt:
+                messages.append({'role': 'system', 'content': self.system_prompt})

-
-
+            context = self._get_context_sentences(text_content, i, sentences, document_key)
+
+            if self.context_sentences == 0:
+                # no context, just place sentence of interest
+                messages.append({'role': 'user', 'content': self._get_user_prompt(sent['sentence_text'])})
+            else:
+                # insert context
+                messages.append({'role': 'user', 'content': self._get_user_prompt(context)})
+                # simulate conversation
+                messages.append({'role': 'assistant', 'content': 'Sure, please provide the sentence of interest.'})
+                # place sentence of interest
+                messages.append({'role': 'user', 'content': sent['sentence_text']})

-        # generate sentence by sentence
-        for sent in sentences:
-            messages.append({'role': 'user', 'content': sent['sentence_text']})
             if stream:
-                print(f"\n\n{Fore.GREEN}Sentence: {Style.RESET_ALL}\n{sent['sentence_text']}\n")
+                print(f"\n\n{Fore.GREEN}Sentence {i}: {Style.RESET_ALL}\n{sent['sentence_text']}\n")
+                if isinstance(self.context_sentences, int) and self.context_sentences > 0:
+                    print(f"{Fore.YELLOW}Context:{Style.RESET_ALL}\n{context}\n")
             print(f"{Fore.BLUE}Initial Output:{Style.RESET_ALL}")

             initial = self.inference_engine.chat(
@@ -988,13 +1036,6 @@ class SentenceReviewFrameExtractor(SentenceFrameExtractor):
                 gen_text = review
             elif self.review_mode == "addition":
                 gen_text = initial + '\n' + review
-
-            if multi_turn:
-                # update chat messages with LLM outputs
-                messages.append({'role': 'assistant', 'content': review})
-            else:
-                # delete sentence and review so that message is reset
-                del messages[-3:]

             # add to output
             output.append({'sentence_start': sent['start'],
@@ -1040,24 +1081,33 @@ class SentenceReviewFrameExtractor(SentenceFrameExtractor):
         if document_key is None:
             raise ValueError("document_key must be provided when text_content is dict.")
         sentences = self._get_sentences(text_content[document_key])
-        # construct chat messages
-        base_messages = []
-        if self.system_prompt:
-            base_messages.append({'role': 'system', 'content': self.system_prompt})
-
-        base_messages.append({'role': 'user', 'content': self._get_user_prompt(text_content)})
-        base_messages.append({'role': 'assistant', 'content': 'Sure, please start with the first sentence.'})

         # generate initial outputs sentence by sentence
-        initials = []
         tasks = []
-
+        messages_list = []
         for i in range(0, len(sentences), concurrent_batch_size):
             batch = sentences[i:i + concurrent_batch_size]
-            for sent in batch:
-
-                messages
-
+            for j, sent in enumerate(batch):
+                # construct chat messages
+                messages = []
+                if self.system_prompt:
+                    messages.append({'role': 'system', 'content': self.system_prompt})
+
+                context = self._get_context_sentences(text_content, i + j, sentences, document_key)
+
+                if self.context_sentences == 0:
+                    # no context, just place sentence of interest
+                    messages.append({'role': 'user', 'content': self._get_user_prompt(sent['sentence_text'])})
+                else:
+                    # insert context
+                    messages.append({'role': 'user', 'content': self._get_user_prompt(context)})
+                    # simulate conversation
+                    messages.append({'role': 'assistant', 'content': 'Sure, please provide the sentence of interest.'})
+                    # place sentence of interest
+                    messages.append({'role': 'user', 'content': sent['sentence_text']})
+
+                messages_list.append(messages)
+
                 task = asyncio.create_task(
                     self.inference_engine.chat_async(
                         messages=messages,
@@ -1071,15 +1121,15 @@ class SentenceReviewFrameExtractor(SentenceFrameExtractor):
             # Wait until the batch is done, collect results and move on to next batch
             responses = await asyncio.gather(*tasks)
         # Collect initials
-
+        initials = []
+        for gen_text, sent, messages in zip(responses, sentences, messages_list):
             initials.append({'sentence_start': sent['start'],
                              'sentence_end': sent['end'],
                              'sentence_text': sent['sentence_text'],
                              'gen_text': gen_text,
-                             'messages':
-
+                             'messages': messages})
+
         # Review
-        reviews = []
         tasks = []
         for i in range(0, len(initials), concurrent_batch_size):
             batch = initials[i:i + concurrent_batch_size]
@@ -1101,6 +1151,7 @@ class SentenceReviewFrameExtractor(SentenceFrameExtractor):
         responses = await asyncio.gather(*tasks)

         # Collect reviews
+        reviews = []
         for gen_text, sent in zip(responses, sentences):
             reviews.append({'sentence_start': sent['start'],
                             'sentence_end': sent['end'],
@@ -1123,7 +1174,8 @@ class SentenceReviewFrameExtractor(SentenceFrameExtractor):

 class SentenceCoTFrameExtractor(SentenceFrameExtractor):
     from nltk.tokenize.punkt import PunktSentenceTokenizer
-    def __init__(self, inference_engine:InferenceEngine, prompt_template:str, system_prompt:str=None, 
+    def __init__(self, inference_engine:InferenceEngine, prompt_template:str, system_prompt:str=None, 
+                 context_sentences:Union[str, int]="all", **kwrs):
         """
         This class performs sentence-based Chain-of-thoughts (CoT) information extraction.
         A simulated chat follows this process:
@@ -1145,13 +1197,20 @@ class SentenceCoTFrameExtractor(SentenceFrameExtractor):
             prompt template with "{{<placeholder name>}}" placeholder.
         system_prompt : str, Optional
             system prompt.
+        context_sentences : Union[str, int], Optional
+            number of sentences before and after the given sentence to provide additional context.
+            if "all", the full text will be provided in the prompt as context.
+            if 0, no additional context will be provided.
+                This is good for tasks that does not require context beyond the given sentence.
+            if > 0, the number of sentences before and after the given sentence to provide as context.
+                This is good for tasks that require context beyond the given sentence.
         """
         super().__init__(inference_engine=inference_engine, prompt_template=prompt_template,
-                         system_prompt=system_prompt, **kwrs)
+                         system_prompt=system_prompt, context_sentences=context_sentences, **kwrs)


     def extract(self, text_content:Union[str, Dict[str,str]], max_new_tokens:int=512, 
-                document_key:str=None, 
+                document_key:str=None, temperature:float=0.0, stream:bool=False, **kwrs) -> List[Dict[str,str]]:
         """
         This method inputs a text and outputs a list of outputs per sentence.

@@ -1166,12 +1225,6 @@ class SentenceCoTFrameExtractor(SentenceFrameExtractor):
         document_key : str, Optional
             specify the key in text_content where document text is.
             If text_content is str, this parameter will be ignored.
-        multi_turn : bool, Optional
-            multi-turn conversation prompting.
-            If True, sentences and LLM outputs will be appended to the input message and carry-over.
-            If False, only the current sentence is prompted.
-            For LLM inference engines that supports prompt cache (e.g., Llama.Cpp, Ollama), use multi-turn conversation prompting
-            can better utilize the KV caching.
         temperature : float, Optional
             the temperature for token sampling.
         stream : bool, Optional
@@ -1187,19 +1240,31 @@ class SentenceCoTFrameExtractor(SentenceFrameExtractor):
             sentences = self._get_sentences(text_content)
         elif isinstance(text_content, dict):
             sentences = self._get_sentences(text_content[document_key])
-        # construct chat messages
-        messages = []
-        if self.system_prompt:
-            messages.append({'role': 'system', 'content': self.system_prompt})
-
-        messages.append({'role': 'user', 'content': self._get_user_prompt(text_content)})
-        messages.append({'role': 'assistant', 'content': 'Sure, please start with the first sentence.'})

         # generate sentence by sentence
-        for sent in sentences:
-
+        for i, sent in enumerate(sentences):
+            # construct chat messages
+            messages = []
+            if self.system_prompt:
+                messages.append({'role': 'system', 'content': self.system_prompt})
+
+            context = self._get_context_sentences(text_content, i, sentences, document_key)
+
+            if self.context_sentences == 0:
+                # no context, just place sentence of interest
+                messages.append({'role': 'user', 'content': self._get_user_prompt(sent['sentence_text'])})
+            else:
+                # insert context
+                messages.append({'role': 'user', 'content': self._get_user_prompt(context)})
+                # simulate conversation
+                messages.append({'role': 'assistant', 'content': 'Sure, please provide the sentence of interest.'})
+                # place sentence of interest
+                messages.append({'role': 'user', 'content': sent['sentence_text']})
+
             if stream:
                 print(f"\n\n{Fore.GREEN}Sentence: {Style.RESET_ALL}\n{sent['sentence_text']}\n")
+                if isinstance(self.context_sentences, int) and self.context_sentences > 0:
+                    print(f"{Fore.YELLOW}Context:{Style.RESET_ALL}\n{context}\n")
             print(f"{Fore.BLUE}CoT:{Style.RESET_ALL}")

             gen_text = self.inference_engine.chat(
@@ -1209,13 +1274,6 @@ class SentenceCoTFrameExtractor(SentenceFrameExtractor):
                 stream=stream,
                 **kwrs
             )
-
-            if multi_turn:
-                # update chat messages with LLM outputs
-                messages.append({'role': 'assistant', 'content': gen_text})
-            else:
-                # delete sentence so that message is reset
-                del messages[-1]

             # add to output
             output.append({'sentence_start': sent['start'],

{llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/__init__.py RENAMED: File without changes
{llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/PromptEditor_prompts/chat.txt RENAMED: File without changes
{llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/PromptEditor_prompts/comment.txt RENAMED: File without changes
{llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/PromptEditor_prompts/rewrite.txt RENAMED: File without changes
{llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/PromptEditor_prompts/system.txt RENAMED: File without changes
{llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/default_prompts/ReviewFrameExtractor_addition_review_prompt.txt RENAMED: File without changes
{llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/default_prompts/ReviewFrameExtractor_revision_review_prompt.txt RENAMED: File without changes
{llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/default_prompts/SentenceReviewFrameExtractor_addition_review_prompt.txt RENAMED: File without changes
{llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/default_prompts/SentenceReviewFrameExtractor_revision_review_prompt.txt RENAMED: File without changes
{llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/prompt_guide/BasicFrameExtractor_prompt_guide.txt RENAMED: File without changes
{llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/prompt_guide/BinaryRelationExtractor_prompt_guide.txt RENAMED: File without changes
{llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/prompt_guide/MultiClassRelationExtractor_prompt_guide.txt RENAMED: File without changes
{llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/asset/prompt_guide/ReviewFrameExtractor_prompt_guide.txt RENAMED: File without changes
{llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/data_types.py RENAMED: File without changes
{llm_ie-0.4.3 → llm_ie-0.4.5}/src/llm_ie/prompt_editor.py RENAMED: File without changes