llm-ie 0.1.5__tar.gz → 0.1.7__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: llm-ie
- Version: 0.1.5
+ Version: 0.1.7
  Summary: An LLM-powered tool that transforms everyday language into robust information extraction pipelines.
  License: MIT
  Author: Enshuo (David) Hsu
@@ -37,7 +37,7 @@ LLM-IE is a toolkit that provides robust information extraction utilities for fr
  <div align="center"><img src="doc_asset/readme_img/LLM-IE flowchart.png" width=800 ></div>

  ## Prerequisite
- At least one LLM inference engine is required. There are built-in supports for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), and <img src=doc_asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction). For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See [LLM Inference Engine](#llm-inference-engine) section below.
+ At least one LLM inference engine is required. There is built-in support for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), <img src=doc_asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction), and <img src=doc_asset/readme_img/vllm-logo.png width=20 /> [vLLM](https://github.com/vllm-project/vllm). For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See the [LLM Inference Engine](#llm-inference-engine) section below.

  ## Installation
  The Python package is available on PyPI.
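The hunk context ends before the install command, but since the package is published on PyPI under the project name, installation is presumably the standard one-liner:
```cmd
pip install llm-ie
```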
@@ -92,6 +92,26 @@ from llm_ie.engines import OpenAIInferenceEngine
  llm = OpenAIInferenceEngine(model="gpt-4o-mini")
  ```
 
+ </details>
+ 
+ <details>
+ <summary><img src=doc_asset/readme_img/vllm-logo.png width=20 /> vLLM</summary>
+ 
+ The vLLM support follows the [OpenAI Compatible Server](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html). For more server parameters, please refer to the vLLM documentation.
+ 
+ Start the server
+ ```cmd
+ vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct
+ ```
+ Define inference engine
+ ```python
+ from llm_ie.engines import OpenAIInferenceEngine
+ engine = OpenAIInferenceEngine(base_url="http://localhost:8000/v1",
+                                api_key="EMPTY",
+                                model="meta-llama/Meta-Llama-3.1-8B-Instruct")
+ ```
+ 
  </details>
 
  In this quick start demo, we use Llama-cpp-python to run Llama-3.1-8B with int8 quantization ([bullerwins/Meta-Llama-3.1-8B-Instruct-GGUF](https://huggingface.co/bullerwins/Meta-Llama-3.1-8B-Instruct-GGUF)).
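If you serve the model with vLLM as above, a quick way to confirm the endpoint is reachable before wiring it into an extractor is to query it with the official ```openai``` client directly (a minimal sketch, independent of llm-ie):

```python
# Connectivity check against the local vLLM OpenAI-compatible server.
# Uses the official openai client, not llm-ie code.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# The served model name should appear in this list
print([m.id for m in client.models.list()])
```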
@@ -244,6 +264,24 @@ from llm_ie.engines import OpenAIInferenceEngine
  openai_engine = OpenAIInferenceEngine(model="gpt-4o-mini")
  ```
 
+ #### <img src=doc_asset/readme_img/vllm-logo.png width=20 /> vLLM
+ The vLLM support follows the [OpenAI Compatible Server](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html). For more server parameters, please refer to the vLLM documentation.
+ 
+ Start the server
+ ```cmd
+ CUDA_VISIBLE_DEVICES=<GPU#> vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --api-key MY_API_KEY --tensor-parallel-size <# of GPUs to use>
+ ```
+ Use ```CUDA_VISIBLE_DEVICES``` to specify which GPUs to use and set ```--tensor-parallel-size``` accordingly. The ```--api-key``` flag is optional.
+ The default port is 8000; use ```--port``` to set a different one.
+ 
+ Define inference engine
+ ```python
+ from llm_ie.engines import OpenAIInferenceEngine
+ engine = OpenAIInferenceEngine(base_url="http://localhost:8000/v1",
+                                api_key="MY_API_KEY",
+                                model="meta-llama/Meta-Llama-3.1-8B-Instruct")
+ ```
+ The ```model``` must match the model name the server was started with.
 
  #### Test inference engine configuration
  To test the inference engine, use the ```chat()``` method.
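For the vLLM-backed engine defined above, a minimal smoke test might look like the sketch below. The keyword names (```messages```, ```max_new_tokens```, ```temperature```, ```stream```) are assumptions inferred from how the extractors call the engine in this release; confirm them against ```src/llm_ie/engines.py```.

```python
# Hedged sketch of a chat() smoke test; keyword names are assumptions, not
# confirmed by this diff -- check the chat() signature in src/llm_ie/engines.py.
from llm_ie.engines import OpenAIInferenceEngine

engine = OpenAIInferenceEngine(base_url="http://localhost:8000/v1",
                               api_key="MY_API_KEY",
                               model="meta-llama/Meta-Llama-3.1-8B-Instruct")

response = engine.chat(messages=[{"role": "user", "content": "Hi, who are you?"}],
                       max_new_tokens=64,
                       temperature=0.0,
                       stream=False)
print(response)
```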
@@ -23,7 +23,7 @@ LLM-IE is a toolkit that provides robust information extraction utilities for fr
  <div align="center"><img src="doc_asset/readme_img/LLM-IE flowchart.png" width=800 ></div>

  ## Prerequisite
- At least one LLM inference engine is required. There are built-in supports for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), and <img src=doc_asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction). For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See [LLM Inference Engine](#llm-inference-engine) section below.
+ At least one LLM inference engine is required. There is built-in support for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), <img src=doc_asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction), and <img src=doc_asset/readme_img/vllm-logo.png width=20 /> [vLLM](https://github.com/vllm-project/vllm). For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See the [LLM Inference Engine](#llm-inference-engine) section below.

  ## Installation
  The Python package is available on PyPI.
@@ -78,6 +78,26 @@ from llm_ie.engines import OpenAIInferenceEngine
  llm = OpenAIInferenceEngine(model="gpt-4o-mini")
  ```
 
+ </details>
+ 
+ <details>
+ <summary><img src=doc_asset/readme_img/vllm-logo.png width=20 /> vLLM</summary>
+ 
+ The vLLM support follows the [OpenAI Compatible Server](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html). For more server parameters, please refer to the vLLM documentation.
+ 
+ Start the server
+ ```cmd
+ vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct
+ ```
+ Define inference engine
+ ```python
+ from llm_ie.engines import OpenAIInferenceEngine
+ engine = OpenAIInferenceEngine(base_url="http://localhost:8000/v1",
+                                api_key="EMPTY",
+                                model="meta-llama/Meta-Llama-3.1-8B-Instruct")
+ ```
+ 
  </details>
 
  In this quick start demo, we use Llama-cpp-python to run Llama-3.1-8B with int8 quantization ([bullerwins/Meta-Llama-3.1-8B-Instruct-GGUF](https://huggingface.co/bullerwins/Meta-Llama-3.1-8B-Instruct-GGUF)).
@@ -230,6 +250,24 @@ from llm_ie.engines import OpenAIInferenceEngine
  openai_engine = OpenAIInferenceEngine(model="gpt-4o-mini")
  ```
 
+ #### <img src=doc_asset/readme_img/vllm-logo.png width=20 /> vLLM
+ The vLLM support follows the [OpenAI Compatible Server](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html). For more server parameters, please refer to the vLLM documentation.
+ 
+ Start the server
+ ```cmd
+ CUDA_VISIBLE_DEVICES=<GPU#> vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --api-key MY_API_KEY --tensor-parallel-size <# of GPUs to use>
+ ```
+ Use ```CUDA_VISIBLE_DEVICES``` to specify which GPUs to use and set ```--tensor-parallel-size``` accordingly. The ```--api-key``` flag is optional.
+ The default port is 8000; use ```--port``` to set a different one.
+ 
+ Define inference engine
+ ```python
+ from llm_ie.engines import OpenAIInferenceEngine
+ engine = OpenAIInferenceEngine(base_url="http://localhost:8000/v1",
+                                api_key="MY_API_KEY",
+                                model="meta-llama/Meta-Llama-3.1-8B-Instruct")
+ ```
+ The ```model``` must match the model name the server was started with.
 
  #### Test inference engine configuration
  To test the inference engine, use the ```chat()``` method.
@@ -1,6 +1,6 @@
  [tool.poetry]
  name = "llm-ie"
- version = "0.1.5"
+ version = "0.1.7"
  description = "An LLM-powered tool that transforms everyday language into robust information extraction pipelines."
  authors = ["Enshuo (David) Hsu"]
  license = "MIT"
@@ -246,4 +246,4 @@ class OpenAIInferenceEngine(InferenceEngine):
  print(chunk.choices[0].delta.content, end="")
  return res
 
- return response.choices[0].delta.content
+ return response.choices[0].message.content
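For context on this one-line fix: in the OpenAI chat-completions API, streamed chunks expose partial text via ```choices[0].delta.content```, while non-streaming responses expose the full reply via ```choices[0].message.content```. A minimal sketch against an OpenAI-compatible server (using the official openai client, not llm-ie code) illustrates the difference:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
messages = [{"role": "user", "content": "Say hello."}]
model = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # whichever model the server hosts

# Non-streaming: the full reply lives in choices[0].message.content
resp = client.chat.completions.create(model=model, messages=messages)
print(resp.choices[0].message.content)

# Streaming: each chunk carries a partial reply in choices[0].delta.content
stream = client.chat.completions.create(model=model, messages=messages, stream=True)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```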
@@ -397,7 +397,7 @@ class SentenceFrameExtractor(FrameExtractor):
 
 
  def extract(self, text_content:Union[str, Dict[str,str]], max_new_tokens:int=512,
- document_key:str=None, temperature:float=0.0, stream:bool=False, **kwrs) -> List[Dict[str,str]]:
+ document_key:str=None, multi_turn:bool=True, temperature:float=0.0, stream:bool=False, **kwrs) -> List[Dict[str,str]]:
  """
  This method inputs a text and outputs a list of outputs per sentence.
 
@@ -412,6 +412,12 @@ class SentenceFrameExtractor(FrameExtractor):
  document_key : str, Optional
  specify the key in text_content where document text is.
  If text_content is str, this parameter will be ignored.
+ multi_turn : bool, Optional
+ multi-turn conversation prompting.
+ If True, sentences and LLM outputs will be appended to the input messages and carried over.
+ If False, only the current sentence is prompted.
+ For LLM inference engines that support prompt caching (e.g., Llama.Cpp, Ollama), multi-turn conversation prompting
+ can better utilize the KV cache.
  temperature : float, Optional
  the temperature for token sampling.
  stream : bool, Optional
@@ -449,9 +455,14 @@ class SentenceFrameExtractor(FrameExtractor):
  stream=stream,
  **kwrs
  )
- 
- # update chat messages
- messages.append({'role': 'assistant', 'content': gen_text})
+ 
+ if multi_turn:
+     # update chat messages with LLM outputs
+     messages.append({'role': 'assistant', 'content': gen_text})
+ else:
+     # remove the current sentence so the message history is reset
+     del messages[-1]
+ 
  # add to output
  output.append({'sentence_start': sent['start'],
  'sentence_end': sent['end'],
@@ -460,8 +471,8 @@ class SentenceFrameExtractor(FrameExtractor):
  return output
 
 
- def extract_frames(self, text_content:Union[str, Dict[str,str]], entity_key:str,
- max_new_tokens:int=512, document_key:str=None, **kwrs) -> List[LLMInformationExtractionFrame]:
+ def extract_frames(self, text_content:Union[str, Dict[str,str]], entity_key:str, max_new_tokens:int=512,
+ document_key:str=None, multi_turn:bool=True, temperature:float=0.0, stream:bool=False, **kwrs) -> List[LLMInformationExtractionFrame]:
  """
  This method inputs a text and outputs a list of LLMInformationExtractionFrame
  It use the extract() method and post-process outputs into frames.
@@ -479,12 +490,27 @@ class SentenceFrameExtractor(FrameExtractor):
  document_key : str, Optional
  specify the key in text_content where document text is.
  If text_content is str, this parameter will be ignored.
+ multi_turn : bool, Optional
+ multi-turn conversation prompting.
+ If True, sentences and LLM outputs will be appended to the input messages and carried over.
+ If False, only the current sentence is prompted.
+ For LLM inference engines that support prompt caching (e.g., Llama.Cpp, Ollama), multi-turn conversation prompting
+ can better utilize the KV cache.
+ temperature : float, Optional
+ the temperature for token sampling.
+ stream : bool, Optional
+ if True, LLM-generated text will be printed to the terminal in real time.
 
  Return : str
  a list of frames.
  """
  llm_output_sentence = self.extract(text_content=text_content,
- max_new_tokens=max_new_tokens, document_key=document_key, **kwrs)
+ max_new_tokens=max_new_tokens,
+ document_key=document_key,
+ multi_turn=multi_turn,
+ temperature=temperature,
+ stream=stream,
+ **kwrs)
  frame_list = []
  for sent in llm_output_sentence:
  entity_json = self._extract_json(gen_text=sent['gen_text'])
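A hedged sketch of how the new ```multi_turn``` flag could be used from the caller's side. Only the ```extract_frames()``` keyword arguments come from the signature added in this diff; the ```SentenceFrameExtractor``` constructor arguments, the module path, and the ```entity_key``` value are illustrative assumptions and should be checked against the package source.

```python
# Illustrative only: constructor arguments, module path, and entity_key are assumptions;
# the extract_frames() keywords (max_new_tokens, document_key, multi_turn, temperature,
# stream) are taken from the signature in this release.
from llm_ie.engines import OpenAIInferenceEngine
from llm_ie.extractors import SentenceFrameExtractor  # module path assumed

engine = OpenAIInferenceEngine(base_url="http://localhost:8000/v1",
                               api_key="EMPTY",
                               model="meta-llama/Meta-Llama-3.1-8B-Instruct")

prompt_template = "..."  # placeholder; use a prompt template as described in the README
extractor = SentenceFrameExtractor(engine, prompt_template)  # constructor args assumed

note_text = "Patient is a 65-year-old man with hypertension and type 2 diabetes."

# multi_turn=True (default) carries prior sentences and LLM outputs forward, which
# helps engines with prompt caching (e.g., Llama.Cpp, Ollama) reuse the KV cache;
# multi_turn=False prompts each sentence independently.
frames = extractor.extract_frames(text_content=note_text,
                                  entity_key="entity_text",  # assumed key name
                                  max_new_tokens=512,
                                  multi_turn=False,
                                  temperature=0.0,
                                  stream=False)
print(len(frames), "frames extracted")
```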