llm-ie 0.1.0.tar.gz → 0.1.2.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
--- PKG-INFO (0.1.0)
+++ PKG-INFO (0.1.2)
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: llm-ie
- Version: 0.1.0
+ Version: 0.1.2
  Summary: An LLM-powered tool that transforms everyday language into robust information extraction pipelines.
  License: MIT
  Author: Enshuo (David) Hsu
@@ -11,7 +11,11 @@ Classifier: Programming Language :: Python :: 3.11
  Classifier: Programming Language :: Python :: 3.12
  Description-Content-Type: text/markdown

- <div align="center"><img src=asset/LLM-IE.png width=500 ></div>
+ <div align="center"><img src=asset/readme_img/LLM-IE.png width=500 ></div>
+
+ ![Python Version](https://img.shields.io/pypi/pyversions/llm-ie)
+ ![PyPI](https://img.shields.io/pypi/v/llm-ie)
+

  An LLM-powered tool that transforms everyday language into robust information extraction pipelines.

@@ -29,10 +33,10 @@ An LLM-powered tool that transforms everyday language into robust information ex
  ## Overview
  LLM-IE is a toolkit that provides robust information extraction utilities for frame-based information extraction. Since prompt design has a significant impact on generative information extraction with LLMs, it also provides a built-in LLM editor to help with prompt writing. The flowchart below demonstrates the workflow starting from a casual language request.

- <div align="center"><img src="asset/LLM-IE flowchart.png" width=800 ></div>
+ <div align="center"><img src="asset/readme_img/LLM-IE flowchart.png" width=800 ></div>

  ## Prerequisite
- At least one LLM inference engine is required. We provide built-in support for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python) and <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama). For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See [LLM Inference Engine](#llm-inference-engine) section below.
+ At least one LLM inference engine is required. There are built-in supports for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), and <img src=asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction). For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See [LLM Inference Engine](#llm-inference-engine) section below.

  ## Installation
  The Python package is available on PyPI.
@@ -45,7 +49,7 @@ Note that this package does not check LLM inference engine installation nor inst
  We use a [synthesized medical note](demo/document/synthesized_note.txt) by ChatGPT to demo the information extraction process. Our task is to extract diagnosis names, spans, and corresponding attributes (i.e., diagnosis datetime, status).

  #### Choose an LLM inference engine
- We use one of the built-in engines.
+ Choose one of the built-in engines below.

  <details>
  <summary><img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> Ollama</summary>
@@ -62,11 +66,35 @@ llm = OllamaInferenceEngine(model_name="llama3.1:8b-instruct-q8_0")
  ```python
  from llm_ie.engines import LlamaCppInferenceEngine

- llama_cpp = LlamaCppInferenceEngine(repo_id="bullerwins/Meta-Llama-3.1-8B-Instruct-GGUF",
+ llm = LlamaCppInferenceEngine(repo_id="bullerwins/Meta-Llama-3.1-8B-Instruct-GGUF",
                                gguf_filename="Meta-Llama-3.1-8B-Instruct-Q8_0.gguf")
  ```
  </details>

+ <details>
+ <summary>🤗 Huggingface_hub</summary>
+
+ ```python
+ from llm_ie.engines import HuggingFaceHubInferenceEngine
+
+ llm = HuggingFaceHubInferenceEngine(model="meta-llama/Meta-Llama-3-8B-Instruct")
+ ```
+ </details>
+
+ <details>
+ <summary><img src=asset/readme_img/openai-logomark.png width=16 /> OpenAI API</summary>
+
+ Follow the [Best Practices for API Key Safety](https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety) to set up API key.
+ ```python
+ from llm_ie.engines import OpenAIInferenceEngine
+
+ llm = OpenAIInferenceEngine(model="gpt-4o-mini")
+ ```
+
+ </details>
+
+ In this quick start demo, we use Llama-cpp-python to run Llama-3.1-8B with int8 quantization ([bullerwins/Meta-Llama-3.1-8B-Instruct-GGUF](https://huggingface.co/bullerwins/Meta-Llama-3.1-8B-Instruct-GGUF)).
+ The outputs might be slightly different with other inference engines, LLMs, or quantization.

  #### Casual language as prompt
  We start with a casual description:
@@ -165,7 +193,7 @@ This package is comprised of some key classes:
  - Extractors

  ### LLM Inference Engine
- Provides an interface for different LLM inference engines to work in the information extraction workflow. The built-in engines are ```LlamaCppInferenceEngine``` and ```OllamaInferenceEngine```.
+ Provides an interface for different LLM inference engines to work in the information extraction workflow. The built-in engines are ```LlamaCppInferenceEngine```, ```OllamaInferenceEngine```, and ```HuggingFaceHubInferenceEngine```.

  #### 🦙 Llama-cpp-python
  The ```repo_id``` and ```gguf_filename``` must match the ones on the Huggingface repo to ensure the correct model is loaded. ```n_ctx``` determines the context length LLM will consider during text generation. Empirically, longer context length gives better performance, while consuming more memory and increases computation. Note that when ```n_ctx``` is less than the prompt length, Llama.cpp throws exceptions. ```n_gpu_layers``` indicates a number of model layers to offload to GPU. Default is -1 for all layers (entire LLM). Flash attention ```flash_attn``` is supported by Llama.cpp. The ```verbose``` indicates whether model information should be displayed. For more input parameters, see 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python).
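For reference, the parameters described in the paragraph above map onto the ```LlamaCppInferenceEngine``` constructor shown later in the ```src/llm_ie/engines.py``` part of this diff. A minimal sketch, assuming ```flash_attn``` and ```verbose``` are simply forwarded to llama-cpp-python through ```**kwrs```:

```python
# Sketch only: combines the parameters described above.
# flash_attn and verbose are assumed to be passed through **kwrs to llama-cpp-python.
from llm_ie.engines import LlamaCppInferenceEngine

llm = LlamaCppInferenceEngine(repo_id="bullerwins/Meta-Llama-3.1-8B-Instruct-GGUF",
                              gguf_filename="Meta-Llama-3.1-8B-Instruct-Q8_0.gguf",
                              n_ctx=8192,        # context window; must cover the full prompt length
                              n_gpu_layers=-1,   # offload all layers to GPU
                              flash_attn=True,   # forwarded to Llama.cpp
                              verbose=False)     # hide model loading output
```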
@@ -191,6 +219,31 @@ ollama = OllamaInferenceEngine(model_name="llama3.1:8b-instruct-q8_0",
                                keep_alive=300)
  ```

+ #### 🤗 huggingface_hub
+ The ```model``` can be a model id hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. Refer to the [Inference Client](https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client) documentation for more details.
+
+ ```python
+ from llm_ie.engines import HuggingFaceHubInferenceEngine
+
+ hf = HuggingFaceHubInferenceEngine(model="meta-llama/Meta-Llama-3-8B-Instruct")
+ ```
+
+ #### <img src=asset/readme_img/openai-logomark.png width=16 /> OpenAI API
+ In bash, save API key to the environmental variable ```OPENAI_API_KEY```.
+ ```
+ export OPENAI_API_KEY=<your_API_key>
+ ```
+
+ In Python, create inference engine and specify model name. For the available models, refer to [OpenAI webpage](https://platform.openai.com/docs/models).
+ For more parameters, see [OpenAI API reference](https://platform.openai.com/docs/api-reference/introduction).
+
+ ```python
+ from llm_ie.engines import OpenAIInferenceEngine
+
+ openai_engine = OpenAIInferenceEngine(model="gpt-4o-mini")
+ ```
+
+
  #### Test inference engine configuration
  To test the inference engine, use the ```chat()``` method.

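The README's own test example sits in unchanged lines that are not part of this diff, but the ```chat()``` signature is visible in the ```src/llm_ie/engines.py``` hunks below, so a smoke test would presumably look like this sketch:

```python
# Sketch of a chat() smoke test, based on the chat() signature added in engines.py below.
from llm_ie.engines import HuggingFaceHubInferenceEngine

llm = HuggingFaceHubInferenceEngine(model="meta-llama/Meta-Llama-3-8B-Instruct")
reply = llm.chat(messages=[{"role": "user", "content": "Hello, what can you do?"}],
                 max_new_tokens=64,
                 temperature=0.0)
print(reply)
```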
--- README.md (0.1.0)
+++ README.md (0.1.2)
@@ -1,4 +1,8 @@
- <div align="center"><img src=asset/LLM-IE.png width=500 ></div>
+ <div align="center"><img src=asset/readme_img/LLM-IE.png width=500 ></div>
+
+ ![Python Version](https://img.shields.io/pypi/pyversions/llm-ie)
+ ![PyPI](https://img.shields.io/pypi/v/llm-ie)
+

  An LLM-powered tool that transforms everyday language into robust information extraction pipelines.

@@ -16,10 +20,10 @@ An LLM-powered tool that transforms everyday language into robust information ex
  ## Overview
  LLM-IE is a toolkit that provides robust information extraction utilities for frame-based information extraction. Since prompt design has a significant impact on generative information extraction with LLMs, it also provides a built-in LLM editor to help with prompt writing. The flowchart below demonstrates the workflow starting from a casual language request.

- <div align="center"><img src="asset/LLM-IE flowchart.png" width=800 ></div>
+ <div align="center"><img src="asset/readme_img/LLM-IE flowchart.png" width=800 ></div>

  ## Prerequisite
- At least one LLM inference engine is required. We provide built-in support for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python) and <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama). For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See [LLM Inference Engine](#llm-inference-engine) section below.
+ At least one LLM inference engine is required. There are built-in supports for 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python), <img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> [Ollama](https://github.com/ollama/ollama), 🤗 [Huggingface_hub](https://github.com/huggingface/huggingface_hub), and <img src=asset/readme_img/openai-logomark.png width=16 /> [OpenAI API](https://platform.openai.com/docs/api-reference/introduction). For installation guides, please refer to those projects. Other inference engines can be configured through the [InferenceEngine](src/llm_ie/engines.py) abstract class. See [LLM Inference Engine](#llm-inference-engine) section below.

  ## Installation
  The Python package is available on PyPI.
@@ -32,7 +36,7 @@ Note that this package does not check LLM inference engine installation nor inst
  We use a [synthesized medical note](demo/document/synthesized_note.txt) by ChatGPT to demo the information extraction process. Our task is to extract diagnosis names, spans, and corresponding attributes (i.e., diagnosis datetime, status).

  #### Choose an LLM inference engine
- We use one of the built-in engines.
+ Choose one of the built-in engines below.

  <details>
  <summary><img src="https://avatars.githubusercontent.com/u/151674099?s=48&v=4" alt="Icon" width="20"/> Ollama</summary>
@@ -49,11 +53,35 @@ llm = OllamaInferenceEngine(model_name="llama3.1:8b-instruct-q8_0")
  ```python
  from llm_ie.engines import LlamaCppInferenceEngine

- llama_cpp = LlamaCppInferenceEngine(repo_id="bullerwins/Meta-Llama-3.1-8B-Instruct-GGUF",
+ llm = LlamaCppInferenceEngine(repo_id="bullerwins/Meta-Llama-3.1-8B-Instruct-GGUF",
                                gguf_filename="Meta-Llama-3.1-8B-Instruct-Q8_0.gguf")
  ```
  </details>

+ <details>
+ <summary>🤗 Huggingface_hub</summary>
+
+ ```python
+ from llm_ie.engines import HuggingFaceHubInferenceEngine
+
+ llm = HuggingFaceHubInferenceEngine(model="meta-llama/Meta-Llama-3-8B-Instruct")
+ ```
+ </details>
+
+ <details>
+ <summary><img src=asset/readme_img/openai-logomark.png width=16 /> OpenAI API</summary>
+
+ Follow the [Best Practices for API Key Safety](https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety) to set up API key.
+ ```python
+ from llm_ie.engines import OpenAIInferenceEngine
+
+ llm = OpenAIInferenceEngine(model="gpt-4o-mini")
+ ```
+
+ </details>
+
+ In this quick start demo, we use Llama-cpp-python to run Llama-3.1-8B with int8 quantization ([bullerwins/Meta-Llama-3.1-8B-Instruct-GGUF](https://huggingface.co/bullerwins/Meta-Llama-3.1-8B-Instruct-GGUF)).
+ The outputs might be slightly different with other inference engines, LLMs, or quantization.

  #### Casual language as prompt
  We start with a casual description:
@@ -152,7 +180,7 @@ This package is comprised of some key classes:
  - Extractors

  ### LLM Inference Engine
- Provides an interface for different LLM inference engines to work in the information extraction workflow. The built-in engines are ```LlamaCppInferenceEngine``` and ```OllamaInferenceEngine```.
+ Provides an interface for different LLM inference engines to work in the information extraction workflow. The built-in engines are ```LlamaCppInferenceEngine```, ```OllamaInferenceEngine```, and ```HuggingFaceHubInferenceEngine```.

  #### 🦙 Llama-cpp-python
  The ```repo_id``` and ```gguf_filename``` must match the ones on the Huggingface repo to ensure the correct model is loaded. ```n_ctx``` determines the context length LLM will consider during text generation. Empirically, longer context length gives better performance, while consuming more memory and increases computation. Note that when ```n_ctx``` is less than the prompt length, Llama.cpp throws exceptions. ```n_gpu_layers``` indicates a number of model layers to offload to GPU. Default is -1 for all layers (entire LLM). Flash attention ```flash_attn``` is supported by Llama.cpp. The ```verbose``` indicates whether model information should be displayed. For more input parameters, see 🦙 [Llama-cpp-python](https://github.com/abetlen/llama-cpp-python).
@@ -178,6 +206,31 @@ ollama = OllamaInferenceEngine(model_name="llama3.1:8b-instruct-q8_0",
                                keep_alive=300)
  ```

+ #### 🤗 huggingface_hub
+ The ```model``` can be a model id hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. Refer to the [Inference Client](https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client) documentation for more details.
+
+ ```python
+ from llm_ie.engines import HuggingFaceHubInferenceEngine
+
+ hf = HuggingFaceHubInferenceEngine(model="meta-llama/Meta-Llama-3-8B-Instruct")
+ ```
+
+ #### <img src=asset/readme_img/openai-logomark.png width=16 /> OpenAI API
+ In bash, save API key to the environmental variable ```OPENAI_API_KEY```.
+ ```
+ export OPENAI_API_KEY=<your_API_key>
+ ```
+
+ In Python, create inference engine and specify model name. For the available models, refer to [OpenAI webpage](https://platform.openai.com/docs/models).
+ For more parameters, see [OpenAI API reference](https://platform.openai.com/docs/api-reference/introduction).
+
+ ```python
+ from llm_ie.engines import OpenAIInferenceEngine
+
+ openai_engine = OpenAIInferenceEngine(model="gpt-4o-mini")
+ ```
+
+
  #### Test inference engine configuration
  To test the inference engine, use the ```chat()``` method.

--- pyproject.toml (0.1.0)
+++ pyproject.toml (0.1.2)
@@ -1,14 +1,13 @@
  [tool.poetry]
  name = "llm-ie"
- version = "0.1.0"
+ version = "0.1.2"
  description = "An LLM-powered tool that transforms everyday language into robust information extraction pipelines."
  authors = ["Enshuo (David) Hsu"]
  license = "MIT"
  readme = "README.md"

  exclude = [
-     "test/**",
-     "*.png"
+     "test/**"
  ]

  [tool.poetry.dependencies]
--- src/llm_ie/engines.py (0.1.0)
+++ src/llm_ie/engines.py (0.1.2)
@@ -31,7 +31,6 @@ class InferenceEngine:


  class LlamaCppInferenceEngine(InferenceEngine):
-     from llama_cpp import Llama
      def __init__(self, repo_id:str, gguf_filename:str, n_ctx:int=4096, n_gpu_layers:int=-1, **kwrs):
          """
          The Llama.cpp inference engine.
@@ -48,13 +47,13 @@ class LlamaCppInferenceEngine(InferenceEngine):
          n_gpu_layers : int, Optional
              number of layers to offload to GPU. Default is all layers (-1).
          """
-         super().__init__()
+         from llama_cpp import Llama
          self.repo_id = repo_id
          self.gguf_filename = gguf_filename
          self.n_ctx = n_ctx
          self.n_gpu_layers = n_gpu_layers

-         self.model = self.Llama.from_pretrained(
+         self.model = Llama.from_pretrained(
              repo_id=self.repo_id,
              filename=self.gguf_filename,
              n_gpu_layers=n_gpu_layers,
@@ -106,7 +105,6 @@ class LlamaCppInferenceEngine(InferenceEngine):


  class OllamaInferenceEngine(InferenceEngine):
-     import ollama
      def __init__(self, model_name:str, num_ctx:int=4096, keep_alive:int=300, **kwrs):
          """
          The Ollama inference engine.
@@ -120,6 +118,8 @@ class OllamaInferenceEngine(InferenceEngine):
          keep_alive : int, Optional
              seconds to hold the LLM after the last API call.
          """
+         import ollama
+         self.ollama = ollama
          self.model_name = model_name
          self.num_ctx = num_ctx
          self.keep_alive = keep_alive
@@ -155,12 +155,95 @@ class OllamaInferenceEngine(InferenceEngine):
              return res

          return response['message']['content']
-
-     def release_model_memory(self):
+
+
+ class HuggingFaceHubInferenceEngine(InferenceEngine):
+     def __init__(self, **kwrs):
          """
-         Call API again with keep_alive=0 to release memory for the model
+         The Huggingface_hub InferenceClient inference engine.
+         For parameters and documentation, refer to https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client
+         """
+         from huggingface_hub import InferenceClient
+         self.client = InferenceClient(**kwrs)
+
+     def chat(self, messages:List[Dict[str,str]], max_new_tokens:int=2048, temperature:float=0.0, stream:bool=False, **kwrs) -> str:
          """
-         self.ollama.chat(model=self.model_name,
-                          messages=[{'role': 'user', 'content': ''}],
-                          options={'num_predict': 0},
-                          keep_alive=0)
+         This method inputs chat messages and outputs LLM generated text.
+
+         Parameters:
+         ----------
+         messages : List[Dict[str,str]]
+             a list of dict with role and content. role must be one of {"system", "user", "assistant"}
+         max_new_tokens : str, Optional
+             the max number of new tokens LLM can generate.
+         temperature : float, Optional
+             the temperature for token sampling.
+         stream : bool, Optional
+             if True, LLM generated text will be printed in terminal in real-time.
+         """
+         response = self.client.chat.completions.create(
+             messages=messages,
+             max_tokens=max_new_tokens,
+             temperature=temperature,
+             stream=stream,
+             **kwrs
+         )
+
+         if stream:
+             res = ''
+             for chunk in response:
+                 res += chunk.choices[0].delta.content
+                 print(chunk.choices[0].delta.content, end='', flush=True)
+             return res
+
+         return response.choices[0].message.content
+
+
+ class OpenAIInferenceEngine(InferenceEngine):
+     def __init__(self, model:str, **kwrs):
+         """
+         The OpenAI API inference engine.
+         For parameters and documentation, refer to https://platform.openai.com/docs/api-reference/introduction
+
+         Parameters:
+         ----------
+         model_name : str
+             model name as described in https://platform.openai.com/docs/models
+         """
+         from openai import OpenAI
+         self.client = OpenAI(**kwrs)
+         self.model = model
+
+     def chat(self, messages:List[Dict[str,str]], max_new_tokens:int=2048, temperature:float=0.0, stream:bool=False, **kwrs) -> str:
+         """
+         This method inputs chat messages and outputs LLM generated text.
+
+         Parameters:
+         ----------
+         messages : List[Dict[str,str]]
+             a list of dict with role and content. role must be one of {"system", "user", "assistant"}
+         max_new_tokens : str, Optional
+             the max number of new tokens LLM can generate.
+         temperature : float, Optional
+             the temperature for token sampling.
+         stream : bool, Optional
+             if True, LLM generated text will be printed in terminal in real-time.
+         """
+         response = self.client.chat.completions.create(
+             model=self.model,
+             messages=messages,
+             max_tokens=max_new_tokens,
+             temperature=temperature,
+             stream=stream,
+             **kwrs
+         )
+
+         if stream:
+             res = ''
+             for chunk in response:
+                 if chunk.choices[0].delta.content is not None:
+                     res += chunk.choices[0].delta.content
+                     print(chunk.choices[0].delta.content, end="")
+             return res
+
+         return response.choices[0].delta.content
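The README text above notes that other inference engines can be configured through the ```InferenceEngine``` abstract class. That class itself is outside the changed lines, so the following is only a sketch of a custom engine that mirrors the pattern the new classes follow (store a client in ```__init__```, implement ```chat()``` with the same signature); the class name and the OpenAI-compatible endpoint are hypothetical.

```python
# Hypothetical custom engine, mirroring the pattern of the engines added above.
# Assumes InferenceEngine only requires a chat() method with this signature;
# the abstract class itself is not part of this diff.
from typing import List, Dict
import requests

from llm_ie.engines import InferenceEngine


class LocalServerInferenceEngine(InferenceEngine):
    def __init__(self, base_url: str, model: str):
        # Store the connection details, as the built-in engines store their clients.
        self.base_url = base_url
        self.model = model

    def chat(self, messages: List[Dict[str, str]], max_new_tokens: int = 2048,
             temperature: float = 0.0, stream: bool = False, **kwrs) -> str:
        # Post the chat messages to a hypothetical OpenAI-compatible server
        # and return the generated text (streaming is not implemented in this sketch).
        response = requests.post(f"{self.base_url}/v1/chat/completions",
                                 json={"model": self.model,
                                       "messages": messages,
                                       "max_tokens": max_new_tokens,
                                       "temperature": temperature})
        response.raise_for_status()
        return response.json()["choices"][0]["message"]["content"]
```

Whether the extractors call anything beyond ```chat()``` cannot be confirmed from this diff alone, so a real subclass may need more.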
3 files without changes