llm-ie 0.4.1__tar.gz → 0.4.3__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- llm_ie-0.4.1/README.md → llm_ie-0.4.3/PKG-INFO +49 -1
- llm_ie-0.4.1/PKG-INFO → llm_ie-0.4.3/README.md +32 -17
- {llm_ie-0.4.1 → llm_ie-0.4.3}/pyproject.toml +2 -1
- {llm_ie-0.4.1 → llm_ie-0.4.3}/src/llm_ie/__init__.py +2 -2
- {llm_ie-0.4.1 → llm_ie-0.4.3}/src/llm_ie/data_types.py +1 -1
- {llm_ie-0.4.1 → llm_ie-0.4.3}/src/llm_ie/engines.py +83 -3
- {llm_ie-0.4.1 → llm_ie-0.4.3}/src/llm_ie/extractors.py +20 -6
- {llm_ie-0.4.1 → llm_ie-0.4.3}/src/llm_ie/prompt_editor.py +11 -11
- {llm_ie-0.4.1 → llm_ie-0.4.3}/src/llm_ie/asset/PromptEditor_prompts/chat.txt +0 -0
- {llm_ie-0.4.1 → llm_ie-0.4.3}/src/llm_ie/asset/PromptEditor_prompts/comment.txt +0 -0
- {llm_ie-0.4.1 → llm_ie-0.4.3}/src/llm_ie/asset/PromptEditor_prompts/rewrite.txt +0 -0
- {llm_ie-0.4.1 → llm_ie-0.4.3}/src/llm_ie/asset/PromptEditor_prompts/system.txt +0 -0
- {llm_ie-0.4.1 → llm_ie-0.4.3}/src/llm_ie/asset/default_prompts/ReviewFrameExtractor_addition_review_prompt.txt +0 -0
- {llm_ie-0.4.1 → llm_ie-0.4.3}/src/llm_ie/asset/default_prompts/ReviewFrameExtractor_revision_review_prompt.txt +0 -0
- {llm_ie-0.4.1 → llm_ie-0.4.3}/src/llm_ie/asset/default_prompts/SentenceReviewFrameExtractor_addition_review_prompt.txt +0 -0
- {llm_ie-0.4.1 → llm_ie-0.4.3}/src/llm_ie/asset/default_prompts/SentenceReviewFrameExtractor_revision_review_prompt.txt +0 -0
- {llm_ie-0.4.1 → llm_ie-0.4.3}/src/llm_ie/asset/prompt_guide/BasicFrameExtractor_prompt_guide.txt +0 -0
- {llm_ie-0.4.1 → llm_ie-0.4.3}/src/llm_ie/asset/prompt_guide/BinaryRelationExtractor_prompt_guide.txt +0 -0
- {llm_ie-0.4.1 → llm_ie-0.4.3}/src/llm_ie/asset/prompt_guide/MultiClassRelationExtractor_prompt_guide.txt +0 -0
- {llm_ie-0.4.1 → llm_ie-0.4.3}/src/llm_ie/asset/prompt_guide/ReviewFrameExtractor_prompt_guide.txt +0 -0
- {llm_ie-0.4.1 → llm_ie-0.4.3}/src/llm_ie/asset/prompt_guide/SentenceCoTFrameExtractor_prompt_guide.txt +0 -0
- {llm_ie-0.4.1 → llm_ie-0.4.3}/src/llm_ie/asset/prompt_guide/SentenceFrameExtractor_prompt_guide.txt +0 -0
- {llm_ie-0.4.1 → llm_ie-0.4.3}/src/llm_ie/asset/prompt_guide/SentenceReviewFrameExtractor_prompt_guide.txt +0 -0
--- llm_ie-0.4.1/README.md
+++ llm_ie-0.4.3/PKG-INFO
@@ -1,3 +1,20 @@
+Metadata-Version: 2.1
+Name: llm-ie
+Version: 0.4.3
+Summary: An LLM-powered tool that transforms everyday language into robust information extraction pipelines.
+License: MIT
+Author: Enshuo (David) Hsu
+Requires-Python: >=3.11,<4.0
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Requires-Dist: colorama (>=0.4.6,<0.5.0)
+Requires-Dist: json_repair (>=0.30,<0.31)
+Requires-Dist: nest_asyncio (>=0.1.6,<0.2.0)
+Requires-Dist: nltk (>=3.8,<4.0)
+Description-Content-Type: text/markdown
+
 <div align="center"><img src=doc_asset/readme_img/LLM-IE.png width=500 ></div>
 
 ![Python Version](https://img.shields.io/badge/python-3.11%2B-blue?style=flat-square)
@@ -23,6 +40,7 @@ An LLM-powered tool that transforms everyday language into robust information extraction pipelines.
 - Concurrent LLM inferencing to speed up frame and relation extraction.
 - Support for LiteLLM.
 - [v0.4.1](https://github.com/daviden1013/llm-ie/releases/tag/v0.4.1) (Jan 25, 2025): Added filters, table view, and some new features to visualization tool (make sure to update [ie-viz](https://github.com/daviden1013/ie-viz)).
+- [v0.4.3](https://github.com/daviden1013/llm-ie/releases/tag/v0.4.3) (Feb 7, 2025): Added Azure OpenAI support.
 
 ## Table of Contents
 - [Overview](#overview)
@@ -83,6 +101,20 @@ inference_engine = OpenAIInferenceEngine(model="gpt-4o-mini")
 ```
 </details>
 
+<details>
+<summary><img src=doc_asset/readme_img/Azure_icon.png width=32 /> Azure OpenAI API</summary>
+
+Follow the [Azure AI Services Quickstart](https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart?tabs=command-line%2Ckeyless%2Ctypescript-keyless%2Cpython-new&pivots=programming-language-python) to set up the endpoint and API key.
+
+```python
+from llm_ie.engines import AzureOpenAIInferenceEngine
+
+inference_engine = AzureOpenAIInferenceEngine(model="gpt-4o-mini",
+                                              api_version="<your api version>")
+```
+
+</details>
+
 <details>
 <summary>🤗 Huggingface_hub</summary>
 
@@ -308,6 +340,22 @@ from llm_ie.engines import OpenAIInferenceEngine
 inference_engine = OpenAIInferenceEngine(model="gpt-4o-mini")
 ```
 
+#### <img src=doc_asset/readme_img/Azure_icon.png width=32 /> Azure OpenAI API
+In bash, save the endpoint name and API key to the environment variables `AZURE_OPENAI_ENDPOINT` and `AZURE_OPENAI_API_KEY`.
+```
+export AZURE_OPENAI_API_KEY="<your_API_key>"
+export AZURE_OPENAI_ENDPOINT="<your_endpoint>"
+```
+
+In Python, create an inference engine and specify the model name. For the available models, refer to the [OpenAI webpage](https://platform.openai.com/docs/models).
+For more parameters, see the [Azure OpenAI reference](https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart).
+
+```python
+from llm_ie.engines import AzureOpenAIInferenceEngine
+
+inference_engine = AzureOpenAIInferenceEngine(model="gpt-4o-mini")
+```
+
 #### 🤗 huggingface_hub
 The ```model``` can be a model id hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. Refer to the [Inference Client](https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client) documentation for more details.
 
@@ -1130,4 +1178,4 @@ For more information and benchmarks, please check our paper:
   journal={arXiv preprint arXiv:2411.11779},
   year={2024}
 }
-```
\ No newline at end of file
+```
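Taken together, the two new README sections amount to the following end-to-end setup. This is a minimal sketch, assuming the environment variables above are set and that "2024-02-01" stands in for whatever API version your Azure resource uses; the `chat` signature follows the `engines.py` diff below.

```python
import os

from llm_ie.engines import AzureOpenAIInferenceEngine

# The Azure client reads these if they are not passed explicitly
# (see the export lines in the README diff above).
os.environ.setdefault("AZURE_OPENAI_ENDPOINT", "<your_endpoint>")
os.environ.setdefault("AZURE_OPENAI_API_KEY", "<your_API_key>")

# api_version is a required argument of AzureOpenAIInferenceEngine
inference_engine = AzureOpenAIInferenceEngine(model="gpt-4o-mini",
                                              api_version="2024-02-01")

# chat() takes OpenAI-style role/content messages and returns the generated text
reply = inference_engine.chat(
    messages=[{"role": "user", "content": "Say hello."}],
    max_new_tokens=64,
    temperature=0.0,
)
print(reply)
```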
--- llm_ie-0.4.1/PKG-INFO
+++ llm_ie-0.4.3/README.md
@@ -1,19 +1,3 @@
-Metadata-Version: 2.1
-Name: llm-ie
-Version: 0.4.1
-Summary: An LLM-powered tool that transforms everyday language into robust information extraction pipelines.
-License: MIT
-Author: Enshuo (David) Hsu
-Requires-Python: >=3.11,<4.0
-Classifier: License :: OSI Approved :: MIT License
-Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.11
-Classifier: Programming Language :: Python :: 3.12
-Requires-Dist: colorama (>=0.4.6,<0.5.0)
-Requires-Dist: json_repair (>=0.30,<0.31)
-Requires-Dist: nltk (>=3.8,<4.0)
-Description-Content-Type: text/markdown
-
 <div align="center"><img src=doc_asset/readme_img/LLM-IE.png width=500 ></div>
 
 ![Python Version](https://img.shields.io/badge/python-3.11%2B-blue?style=flat-square)
@@ -39,6 +23,7 @@ An LLM-powered tool that transforms everyday language into robust information extraction pipelines.
 - Concurrent LLM inferencing to speed up frame and relation extraction.
 - Support for LiteLLM.
 - [v0.4.1](https://github.com/daviden1013/llm-ie/releases/tag/v0.4.1) (Jan 25, 2025): Added filters, table view, and some new features to visualization tool (make sure to update [ie-viz](https://github.com/daviden1013/ie-viz)).
+- [v0.4.3](https://github.com/daviden1013/llm-ie/releases/tag/v0.4.3) (Feb 7, 2025): Added Azure OpenAI support.
 
 ## Table of Contents
 - [Overview](#overview)
@@ -99,6 +84,20 @@ inference_engine = OpenAIInferenceEngine(model="gpt-4o-mini")
 ```
 </details>
 
+<details>
+<summary><img src=doc_asset/readme_img/Azure_icon.png width=32 /> Azure OpenAI API</summary>
+
+Follow the [Azure AI Services Quickstart](https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart?tabs=command-line%2Ckeyless%2Ctypescript-keyless%2Cpython-new&pivots=programming-language-python) to set up the endpoint and API key.
+
+```python
+from llm_ie.engines import AzureOpenAIInferenceEngine
+
+inference_engine = AzureOpenAIInferenceEngine(model="gpt-4o-mini",
+                                              api_version="<your api version>")
+```
+
+</details>
+
 <details>
 <summary>🤗 Huggingface_hub</summary>
 
@@ -324,6 +323,22 @@ from llm_ie.engines import OpenAIInferenceEngine
 inference_engine = OpenAIInferenceEngine(model="gpt-4o-mini")
 ```
 
+#### <img src=doc_asset/readme_img/Azure_icon.png width=32 /> Azure OpenAI API
+In bash, save the endpoint name and API key to the environment variables `AZURE_OPENAI_ENDPOINT` and `AZURE_OPENAI_API_KEY`.
+```
+export AZURE_OPENAI_API_KEY="<your_API_key>"
+export AZURE_OPENAI_ENDPOINT="<your_endpoint>"
+```
+
+In Python, create an inference engine and specify the model name. For the available models, refer to the [OpenAI webpage](https://platform.openai.com/docs/models).
+For more parameters, see the [Azure OpenAI reference](https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart).
+
+```python
+from llm_ie.engines import AzureOpenAIInferenceEngine
+
+inference_engine = AzureOpenAIInferenceEngine(model="gpt-4o-mini")
+```
+
 #### 🤗 huggingface_hub
 The ```model``` can be a model id hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. Refer to the [Inference Client](https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client) documentation for more details.
 
@@ -1146,4 +1161,4 @@ For more information and benchmarks, please check our paper:
   journal={arXiv preprint arXiv:2411.11779},
   year={2024}
 }
-```
\ No newline at end of file
+```
--- llm_ie-0.4.1/pyproject.toml
+++ llm_ie-0.4.3/pyproject.toml
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "llm-ie"
-version = "0.4.1"
+version = "0.4.3"
 description = "An LLM-powered tool that transforms everyday language into robust information extraction pipelines."
 authors = ["Enshuo (David) Hsu"]
 license = "MIT"
@@ -16,6 +16,7 @@ python = "^3.11"
 nltk = "^3.8"
 colorama = "^0.4.6"
 json_repair = "^0.30"
+nest_asyncio = "^0.1.6"
 
 
 [build-system]
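The new `nest_asyncio` dependency backs the concurrent-inferencing feature mentioned in the README: a standard asyncio event loop refuses to be nested, so `asyncio.run(...)` fails inside environments such as Jupyter that already run a loop. A minimal sketch of the usual pattern (not taken from the package source):

```python
import asyncio
import nest_asyncio

# Patch the current event loop so asyncio.run() works even when a loop
# is already running (e.g., inside Jupyter).
nest_asyncio.apply()

async def fake_llm_call(i: int) -> str:
    await asyncio.sleep(0.1)  # stand-in for a network round trip
    return f"response {i}"

async def main() -> list:
    # Fan several "LLM calls" out concurrently and collect the results.
    return await asyncio.gather(*(fake_llm_call(i) for i in range(3)))

print(asyncio.run(main()))  # ['response 0', 'response 1', 'response 2']
```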
--- llm_ie-0.4.1/src/llm_ie/__init__.py
+++ llm_ie-0.4.3/src/llm_ie/__init__.py
@@ -1,9 +1,9 @@
 from .data_types import LLMInformationExtractionFrame, LLMInformationExtractionDocument
-from .engines import LlamaCppInferenceEngine, OllamaInferenceEngine, HuggingFaceHubInferenceEngine, OpenAIInferenceEngine, LiteLLMInferenceEngine
+from .engines import LlamaCppInferenceEngine, OllamaInferenceEngine, HuggingFaceHubInferenceEngine, OpenAIInferenceEngine, AzureOpenAIInferenceEngine, LiteLLMInferenceEngine
 from .extractors import BasicFrameExtractor, ReviewFrameExtractor, SentenceFrameExtractor, SentenceReviewFrameExtractor, SentenceCoTFrameExtractor, BinaryRelationExtractor, MultiClassRelationExtractor
 from .prompt_editor import PromptEditor
 
 __all__ = ["LLMInformationExtractionFrame", "LLMInformationExtractionDocument",
-           "LlamaCppInferenceEngine", "OllamaInferenceEngine", "HuggingFaceHubInferenceEngine", "OpenAIInferenceEngine", "LiteLLMInferenceEngine",
+           "LlamaCppInferenceEngine", "OllamaInferenceEngine", "HuggingFaceHubInferenceEngine", "OpenAIInferenceEngine", "AzureOpenAIInferenceEngine", "LiteLLMInferenceEngine",
            "BasicFrameExtractor", "ReviewFrameExtractor", "SentenceFrameExtractor", "SentenceReviewFrameExtractor", "SentenceCoTFrameExtractor", "BinaryRelationExtractor", "MultiClassRelationExtractor",
            "PromptEditor"]
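With this change the new engine is importable from the package root alongside the existing exports:

```python
from llm_ie import AzureOpenAIInferenceEngine, BasicFrameExtractor, PromptEditor
```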
--- llm_ie-0.4.1/src/llm_ie/data_types.py
+++ llm_ie-0.4.3/src/llm_ie/data_types.py
@@ -204,7 +204,7 @@ class LLMInformationExtractionDocument:
         # Add frame
         frame_clone = frame.copy()
         if create_id:
-            frame_clone.
+            frame_clone.frame_id = str(len(self.frames))
 
         self.frames.append(frame_clone)
         return True
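The replacement line makes the auto-generated `frame_id` the zero-based position of the frame within the document, rendered as a string. A stripped-down sketch of just that behavior (illustrative only; the real class carries more fields and validation):

```python
class DocSketch:
    """Mirrors the create_id logic in LLMInformationExtractionDocument.add_frame."""
    def __init__(self):
        self.frames = []

    def add_frame(self, frame: dict, create_id: bool = False) -> bool:
        frame_clone = dict(frame)  # stand-in for frame.copy()
        if create_id:
            # ids are "0", "1", "2", ... in insertion order
            frame_clone["frame_id"] = str(len(self.frames))
        self.frames.append(frame_clone)
        return True

doc = DocSketch()
doc.add_frame({"entity_text": "aspirin"}, create_id=True)
doc.add_frame({"entity_text": "ibuprofen"}, create_id=True)
print([f["frame_id"] for f in doc.frames])  # ['0', '1']
```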
--- llm_ie-0.4.1/src/llm_ie/engines.py
+++ llm_ie-0.4.3/src/llm_ie/engines.py
@@ -290,9 +290,88 @@ class OpenAIInferenceEngine(InferenceEngine):
         if stream:
             res = ''
             for chunk in response:
-                if chunk.choices[0].delta.content is not None:
-                    res += chunk.choices[0].delta.content
-                    print(chunk.choices[0].delta.content, end="", flush=True)
+                if len(chunk.choices) > 0:
+                    if chunk.choices[0].delta.content is not None:
+                        res += chunk.choices[0].delta.content
+                        print(chunk.choices[0].delta.content, end="", flush=True)
+            return res
+
+        return response.choices[0].message.content
+
+
+    async def chat_async(self, messages:List[Dict[str,str]], max_new_tokens:int=2048, temperature:float=0.0, **kwrs) -> str:
+        """
+        Async version of chat method. Streaming is not supported.
+        """
+        response = await self.async_client.chat.completions.create(
+            model=self.model,
+            messages=messages,
+            max_tokens=max_new_tokens,
+            temperature=temperature,
+            stream=False,
+            **kwrs
+        )
+
+        return response.choices[0].message.content
+
+
+class AzureOpenAIInferenceEngine(InferenceEngine):
+    def __init__(self, model:str, api_version:str, **kwrs):
+        """
+        The Azure OpenAI API inference engine.
+        For parameters and documentation, refer to
+        - https://azure.microsoft.com/en-us/products/ai-services/openai-service
+        - https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart
+
+        Parameters:
+        ----------
+        model : str
+            model name as described in https://platform.openai.com/docs/models
+        api_version : str
+            the Azure OpenAI API version
+        """
+        if importlib.util.find_spec("openai") is None:
+            raise ImportError("OpenAI Python API library not found. Please install OpenAI (```pip install openai```).")
+
+        from openai import AzureOpenAI, AsyncAzureOpenAI
+        self.model = model
+        self.api_version = api_version
+        self.client = AzureOpenAI(api_version=self.api_version,
+                                  **kwrs)
+        self.async_client = AsyncAzureOpenAI(api_version=self.api_version,
+                                             **kwrs)
+
+    def chat(self, messages:List[Dict[str,str]], max_new_tokens:int=2048, temperature:float=0.0, stream:bool=False, **kwrs) -> str:
+        """
+        This method inputs chat messages and outputs LLM generated text.
+
+        Parameters:
+        ----------
+        messages : List[Dict[str,str]]
+            a list of dict with role and content. role must be one of {"system", "user", "assistant"}
+        max_new_tokens : int, Optional
+            the max number of new tokens LLM can generate.
+        temperature : float, Optional
+            the temperature for token sampling.
+        stream : bool, Optional
+            if True, LLM generated text will be printed in terminal in real-time.
+        """
+        response = self.client.chat.completions.create(
+            model=self.model,
+            messages=messages,
+            max_tokens=max_new_tokens,
+            temperature=temperature,
+            stream=stream,
+            **kwrs
+        )
+
+        if stream:
+            res = ''
+            for chunk in response:
+                if len(chunk.choices) > 0:
+                    if chunk.choices[0].delta.content is not None:
+                        res += chunk.choices[0].delta.content
+                        print(chunk.choices[0].delta.content, end="", flush=True)
             return res
 
         return response.choices[0].message.content
@@ -312,6 +391,7 @@ class OpenAIInferenceEngine(InferenceEngine):
         )
 
         return response.choices[0].message.content
+
 
 class LiteLLMInferenceEngine(InferenceEngine):
     def __init__(self, model:str=None, base_url:str=None, api_key:str=None):
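`chat_async` is what the concurrent extraction builds on: many prompts can be awaited at once through the `AsyncOpenAI`/`AsyncAzureOpenAI` clients. A minimal fan-out sketch using only the signatures shown in this diff (the engine construction mirrors the README example above):

```python
import asyncio

from llm_ie.engines import AzureOpenAIInferenceEngine

inference_engine = AzureOpenAIInferenceEngine(model="gpt-4o-mini",
                                              api_version="<your api version>")

async def chat_all(prompts):
    # chat_async mirrors chat() but never streams, so responses can be
    # gathered concurrently instead of printed token by token.
    tasks = [
        inference_engine.chat_async(
            messages=[{"role": "user", "content": p}],
            max_new_tokens=512,
            temperature=0.0,
        )
        for p in prompts
    ]
    return await asyncio.gather(*tasks)

results = asyncio.run(chat_all(["prompt one", "prompt two"]))
```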
--- llm_ie-0.4.1/src/llm_ie/extractors.py
+++ llm_ie-0.4.3/src/llm_ie/extractors.py
@@ -59,7 +59,7 @@ class Extractor:
         text_content : Union[str, Dict[str,str]]
             the input text content to put in prompt template.
             If str, the prompt template must have only 1 placeholder {{<placeholder name>}}, regardless of placeholder name.
-            If dict, all the keys must be included in the prompt template placeholder {{<placeholder name>}}.
+            If dict, all the keys must be included in the prompt template placeholder {{<placeholder name>}}. All values must be str.
 
         Returns : str
             a user prompt.
@@ -73,6 +73,10 @@ class Extractor:
             prompt = pattern.sub(text, self.prompt_template)
 
         elif isinstance(text_content, dict):
+            # Check if all values are str
+            if not all([isinstance(v, str) for v in text_content.values()]):
+                raise ValueError("All values in text_content must be str.")
+            # Check if all keys are in the prompt template
             placeholders = pattern.findall(self.prompt_template)
             if len(placeholders) != len(text_content):
                 raise ValueError(f"Expect text_content ({len(text_content)}) and prompt template placeholder ({len(placeholders)}) to have equal size.")
@@ -422,6 +426,13 @@ class BasicFrameExtractor(FrameExtractor):
         Return : str
             a list of frames.
         """
+        if isinstance(text_content, str):
+            text = text_content
+        elif isinstance(text_content, dict):
+            if document_key is None:
+                raise ValueError("document_key must be provided when text_content is dict.")
+            text = text_content[document_key]
+
         frame_list = []
         gen_text = self.extract(text_content=text_content,
                                 max_new_tokens=max_new_tokens,
@@ -435,11 +446,6 @@ class BasicFrameExtractor(FrameExtractor):
                 entity_json.append(entity)
             else:
                 warnings.warn(f'Extractor output "{entity}" does not have entity_key ("{entity_key}"). This frame will be dropped.', RuntimeWarning)
-
-        if isinstance(text_content, str):
-            text = text_content
-        elif isinstance(text_content, dict):
-            text = text_content[document_key]
 
         spans = self._find_entity_spans(text=text,
                                         entities=[e[entity_key] for e in entity_json],
@@ -645,6 +651,8 @@ class SentenceFrameExtractor(FrameExtractor):
         if isinstance(text_content, str):
             sentences = self._get_sentences(text_content)
         elif isinstance(text_content, dict):
+            if document_key is None:
+                raise ValueError("document_key must be provided when text_content is dict.")
             sentences = self._get_sentences(text_content[document_key])
         # construct chat messages
         messages = []
@@ -715,6 +723,8 @@ class SentenceFrameExtractor(FrameExtractor):
         if isinstance(text_content, str):
             sentences = self._get_sentences(text_content)
         elif isinstance(text_content, dict):
+            if document_key is None:
+                raise ValueError("document_key must be provided when text_content is dict.")
             sentences = self._get_sentences(text_content[document_key])
         # construct chat messages
         base_messages = []
@@ -933,6 +943,8 @@ class SentenceReviewFrameExtractor(SentenceFrameExtractor):
         if isinstance(text_content, str):
             sentences = self._get_sentences(text_content)
         elif isinstance(text_content, dict):
+            if document_key is None:
+                raise ValueError("document_key must be provided when text_content is dict.")
             sentences = self._get_sentences(text_content[document_key])
         # construct chat messages
         messages = []
@@ -1025,6 +1037,8 @@ class SentenceReviewFrameExtractor(SentenceFrameExtractor):
         if isinstance(text_content, str):
             sentences = self._get_sentences(text_content)
         elif isinstance(text_content, dict):
+            if document_key is None:
+                raise ValueError("document_key must be provided when text_content is dict.")
             sentences = self._get_sentences(text_content[document_key])
         # construct chat messages
         base_messages = []
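The new checks tighten how `Extractor` fills its prompt template. A standalone sketch of the same validation logic, using the `{{placeholder}}` syntax the docstring describes (illustrative; `fill_prompt` is a hypothetical helper, not the package's method):

```python
import re

def fill_prompt(prompt_template: str, text_content: dict) -> str:
    pattern = re.compile(r"{{(.*?)}}")
    # New in 0.4.3: every value must be a string
    if not all(isinstance(v, str) for v in text_content.values()):
        raise ValueError("All values in text_content must be str.")
    # Placeholder count must match the number of keys supplied
    placeholders = pattern.findall(prompt_template)
    if len(placeholders) != len(text_content):
        raise ValueError(f"Expect text_content ({len(text_content)}) and prompt "
                         f"template placeholder ({len(placeholders)}) to have equal size.")
    prompt = prompt_template
    for key in placeholders:
        if key not in text_content:
            raise ValueError(f"Placeholder {key} has no matching key in text_content.")
        prompt = prompt.replace("{{" + key + "}}", text_content[key])
    return prompt

print(fill_prompt("Extract diagnoses from: {{document}}", {"document": "Pt has DM2."}))
```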
--- llm_ie-0.4.1/src/llm_ie/prompt_editor.py
+++ llm_ie-0.4.3/src/llm_ie/prompt_editor.py
@@ -67,7 +67,7 @@ class PromptEditor:
         return prompt
 
 
-    def rewrite(self, draft:str) -> str:
+    def rewrite(self, draft:str, **kwrs) -> str:
         """
         This method inputs a prompt draft and rewrites it following the extractor's guideline.
         """
@@ -79,10 +79,10 @@ class PromptEditor:
                                        prompt_template=rewrite_prompt_template)
         messages = [{"role": "system", "content": self.system_prompt},
                     {"role": "user", "content": prompt}]
-        res = self.inference_engine.chat(messages, stream=True)
+        res = self.inference_engine.chat(messages, stream=True, **kwrs)
         return res
 
-    def comment(self, draft:str) -> str:
+    def comment(self, draft:str, **kwrs) -> str:
         """
         This method inputs a prompt draft and comments on it following the extractor's guideline.
         """
@@ -94,11 +94,11 @@ class PromptEditor:
                                        prompt_template=comment_prompt_template)
         messages = [{"role": "system", "content": self.system_prompt},
                     {"role": "user", "content": prompt}]
-        res = self.inference_engine.chat(messages, stream=True)
+        res = self.inference_engine.chat(messages, stream=True, **kwrs)
         return res
 
 
-    def _terminal_chat(self):
+    def _terminal_chat(self, **kwrs):
         """
         This method runs an interactive chat session in the terminal to help users write prompt templates.
         """
@@ -126,11 +126,11 @@ class PromptEditor:
             # Chat
             messages.append({"role": "user", "content": user_input})
             print(f"{Fore.BLUE}Assistant: {Style.RESET_ALL}", end="")
-            response = self.inference_engine.chat(messages, stream=True)
+            response = self.inference_engine.chat(messages, stream=True, **kwrs)
             messages.append({"role": "assistant", "content": response})
 
 
-    def _IPython_chat(self):
+    def _IPython_chat(self, **kwrs):
         """
         This method runs an interactive chat session in Jupyter/IPython using ipywidgets to help users write prompt templates.
         """
@@ -186,7 +186,7 @@ class PromptEditor:
 
             # Get assistant's response and append it to conversation
             print("Assistant: ", end="")
-            response = self.inference_engine.chat(messages, stream=True)
+            response = self.inference_engine.chat(messages, stream=True, **kwrs)
             messages.append({"role": "assistant", "content": response})
 
             # Display the assistant's response
@@ -200,11 +200,11 @@ class PromptEditor:
         display(input_box)
         display(output_area)
 
-    def chat(self):
+    def chat(self, **kwrs):
         """
         External method that detects the environment and calls the appropriate chat method.
         """
         if 'ipykernel' in sys.modules:
-            self._IPython_chat()
+            self._IPython_chat(**kwrs)
         else:
-            self._terminal_chat()
+            self._terminal_chat(**kwrs)
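With `**kwrs` threaded through every entry point, generation settings now reach the underlying `inference_engine.chat` call. A usage sketch, assuming the two-argument `PromptEditor(engine, extractor_class)` constructor from the package README (not shown in this diff):

```python
from llm_ie import OpenAIInferenceEngine, BasicFrameExtractor, PromptEditor

engine = OpenAIInferenceEngine(model="gpt-4o-mini")
editor = PromptEditor(engine, BasicFrameExtractor)  # constructor assumed per package docs

draft = "Extract medication names and dosages from the clinical note."
# New in 0.4.3: keyword arguments are forwarded to inference_engine.chat()
improved = editor.rewrite(draft, max_new_tokens=4096, temperature=0.2)
feedback = editor.comment(draft, max_new_tokens=1024)
```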
The remaining files (the prompt assets and prompt guides under src/llm_ie/asset/, listed in the summary above) were renamed from llm_ie-0.4.1 to llm_ie-0.4.3 without content changes.