PyPI - lexoid - Versions diffs - 0.1.13__tar.gz → 0.1.14__tar.gz - Mend

lexoid 0.1.13tar.gz → 0.1.14tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9) hide show

{lexoid-0.1.13 → lexoid-0.1.14}/PKG-INFO RENAMED Viewed

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: lexoid
-Version: 0.1.13
+Version: 0.1.14
 Summary:
 Requires-Python: >=3.10,<4.0
 Classifier: Programming Language :: Python :: 3
@@ -49,7 +49,8 @@ Description-Content-Type: text/markdown
 </div>
 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/oidlabs-com/Lexoid/blob/main/examples/example_notebook_colab.ipynb)
-[![GitHub license](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://github.com/oidlabs-com/Lexoid/blob/main/LICENSE)
+[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-yellow)](https://huggingface.co/spaces/oidlabs/Lexoid)
+[![GitHub license](https://img.shields.io/badge/License-Apache_2.0-turquoise.svg)](https://github.com/oidlabs-com/Lexoid/blob/main/LICENSE)
 [![PyPI](https://img.shields.io/pypi/v/lexoid)](https://pypi.org/project/lexoid/)
 [![Docs](https://github.com/oidlabs-com/Lexoid/actions/workflows/deploy_docs.yml/badge.svg)](https://oidlabs-com.github.io/Lexoid/)
@@ -144,6 +145,7 @@ print(parsed_md)
 * Hugging Face
 * Together AI
 * OpenRouter
+* Fireworks
 ## Benchmark
@@ -151,22 +153,25 @@ Results aggregated across 5 iterations each for 5 documents.
 _Note:_ Benchmarks are currently done in the zero-shot setting.
-| Rank | Model                                                 | Mean Similarity | Std. Dev. | Time (s) | Cost($)  |
-| ---- | ----------------------------------------------------- | --------------- | --------- | -------- | -------- |
-| 1    | gemini-2.0-flash                                      | 0.829           | 0.102     | 7.41     | 0.000480 |
-| 2    | gemini-2.0-flash-001                                  | 0.814           | 0.176     | 6.85     | 0.000421 |
-| 3    | gemini-1.5-flash                                      | 0.797           | 0.143     | 9.54     | 0.000238 |
-| 4    | gemini-2.0-pro-exp                                    | 0.764           | 0.227     | 11.95    |   TBA    |
-| 5    | gemini-2.0-flash-thinking-exp                         | 0.746           | 0.266     | 10.46    |   TBA    |
-| 6    | gemini-1.5-pro                                        | 0.732           | 0.265     | 11.44    | 0.003332 |
-| 7    | gpt-4o                                                | 0.687           | 0.247     | 10.16    | 0.004736 |
-| 8    | gpt-4o-mini                                           | 0.642           | 0.213     | 9.71     | 0.000275 |
-| 9    | gemma-3-27b-it (via OpenRouter)                       | 0.628           | 0.299     | 18.79    | 0.000096 |
-| 10   | gemini-1.5-flash-8b                                   | 0.551           | 0.223     | 3.91     | 0.000055 |
-| 11   | Llama-Vision-Free (via Together AI)                   | 0.531           | 0.198     | 6.93     | 0        |
-| 12   | Llama-3.2-11B-Vision-Instruct-Turbo (via Together AI) | 0.524           | 0.192     | 3.68     | 0.000060 |
-| 13   | qwen/qwen-2.5-vl-7b-instruct (via OpenRouter)         | 0.482           | 0.209     | 11.53    | 0.000052 |
-| 14   | Llama-3.2-90B-Vision-Instruct-Turbo (via Together AI) | 0.461           | 0.306     | 19.26    | 0.000426 |
-| 15   | Llama-3.2-11B-Vision-Instruct (via Hugging Face)      | 0.451           | 0.257     | 4.54     |   0      |
-| 16   | microsoft/phi-4-multimodal-instruct (via OpenRouter)  | 0.366           | 0.287     | 10.80    | 0.000019 |
+| Rank | Model | Mean Similarity | Std. Dev. | Time (s) | Cost ($) |
+| --- | --- | --- | --- | --- | --- |
+| 1 | gemini-2.0-flash | 0.829 | 0.102 | 7.41 | 0.00048 |
+| 2 | gemini-2.0-flash-001 | 0.814 | 0.176 | 6.85 | 0.000421 |
+| 3 | gemini-1.5-flash | 0.797 | 0.143 | 9.54 | 0.000238 |
+| 4 | gemini-2.0-pro-exp | 0.764 | 0.227 | 11.95 | TBA |
+| 5 | AUTO | 0.76 | 0.184 | 5.14 | 0.000217 |
+| 6 | gemini-2.0-flash-thinking-exp | 0.746 | 0.266 | 10.46 | TBA |
+| 7 | gemini-1.5-pro | 0.732 | 0.265 | 11.44 | 0.003332 |
+| 8 | accounts/fireworks/models/llama4-maverick-instruct-basic (via Fireworks) | 0.687 | 0.221 | 8.07 | 0.000419 |
+| 9 | gpt-4o | 0.687 | 0.247 | 10.16 | 0.004736 |
+| 10 | accounts/fireworks/models/llama4-scout-instruct-basic (via Fireworks) | 0.675 | 0.184 | 5.98 | 0.000226 |
+| 11 | gpt-4o-mini | 0.642 | 0.213 | 9.71 | 0.000275 |
+| 12 | gemma-3-27b-it (via OpenRouter) | 0.628 | 0.299 | 18.79 | 0.000096 |
+| 13 | gemini-1.5-flash-8b | 0.551 | 0.223 | 3.91 | 0.000055 |
+| 14 | Llama-Vision-Free (via Together AI) | 0.531 | 0.198 | 6.93 | 0 |
+| 15 | Llama-3.2-11B-Vision-Instruct-Turbo (via Together AI) | 0.524 | 0.192 | 3.68 | 0.00006 |
+| 16 | qwen/qwen-2.5-vl-7b-instruct (via OpenRouter) | 0.482 | 0.209 | 11.53 | 0.000052 |
+| 17 | Llama-3.2-90B-Vision-Instruct-Turbo (via Together AI) | 0.461 | 0.306 | 19.26 | 0.000426 |
+| 18 | Llama-3.2-11B-Vision-Instruct (via Hugging Face) | 0.451 | 0.257 | 4.54 | 0 |
+| 19 | microsoft/phi-4-multimodal-instruct (via OpenRouter) | 0.366 | 0.287 | 10.8 | 0.000019 |

{lexoid-0.1.13 → lexoid-0.1.14}/README.md RENAMED Viewed

@@ -14,7 +14,8 @@
 </div>
 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/oidlabs-com/Lexoid/blob/main/examples/example_notebook_colab.ipynb)
-[![GitHub license](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://github.com/oidlabs-com/Lexoid/blob/main/LICENSE)
+[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-yellow)](https://huggingface.co/spaces/oidlabs/Lexoid)
+[![GitHub license](https://img.shields.io/badge/License-Apache_2.0-turquoise.svg)](https://github.com/oidlabs-com/Lexoid/blob/main/LICENSE)
 [![PyPI](https://img.shields.io/pypi/v/lexoid)](https://pypi.org/project/lexoid/)
 [![Docs](https://github.com/oidlabs-com/Lexoid/actions/workflows/deploy_docs.yml/badge.svg)](https://oidlabs-com.github.io/Lexoid/)
@@ -109,6 +110,7 @@ print(parsed_md)
 * Hugging Face
 * Together AI
 * OpenRouter
+* Fireworks
 ## Benchmark
@@ -116,21 +118,24 @@ Results aggregated across 5 iterations each for 5 documents.
 _Note:_ Benchmarks are currently done in the zero-shot setting.
-| Rank | Model                                                 | Mean Similarity | Std. Dev. | Time (s) | Cost($)  |
-| ---- | ----------------------------------------------------- | --------------- | --------- | -------- | -------- |
-| 1    | gemini-2.0-flash                                      | 0.829           | 0.102     | 7.41     | 0.000480 |
-| 2    | gemini-2.0-flash-001                                  | 0.814           | 0.176     | 6.85     | 0.000421 |
-| 3    | gemini-1.5-flash                                      | 0.797           | 0.143     | 9.54     | 0.000238 |
-| 4    | gemini-2.0-pro-exp                                    | 0.764           | 0.227     | 11.95    |   TBA    |
-| 5    | gemini-2.0-flash-thinking-exp                         | 0.746           | 0.266     | 10.46    |   TBA    |
-| 6    | gemini-1.5-pro                                        | 0.732           | 0.265     | 11.44    | 0.003332 |
-| 7    | gpt-4o                                                | 0.687           | 0.247     | 10.16    | 0.004736 |
-| 8    | gpt-4o-mini                                           | 0.642           | 0.213     | 9.71     | 0.000275 |
-| 9    | gemma-3-27b-it (via OpenRouter)                       | 0.628           | 0.299     | 18.79    | 0.000096 |
-| 10   | gemini-1.5-flash-8b                                   | 0.551           | 0.223     | 3.91     | 0.000055 |
-| 11   | Llama-Vision-Free (via Together AI)                   | 0.531           | 0.198     | 6.93     | 0        |
-| 12   | Llama-3.2-11B-Vision-Instruct-Turbo (via Together AI) | 0.524           | 0.192     | 3.68     | 0.000060 |
-| 13   | qwen/qwen-2.5-vl-7b-instruct (via OpenRouter)         | 0.482           | 0.209     | 11.53    | 0.000052 |
-| 14   | Llama-3.2-90B-Vision-Instruct-Turbo (via Together AI) | 0.461           | 0.306     | 19.26    | 0.000426 |
-| 15   | Llama-3.2-11B-Vision-Instruct (via Hugging Face)      | 0.451           | 0.257     | 4.54     |   0      |
-| 16   | microsoft/phi-4-multimodal-instruct (via OpenRouter)  | 0.366           | 0.287     | 10.80    | 0.000019 |
+| Rank | Model | Mean Similarity | Std. Dev. | Time (s) | Cost ($) |
+| --- | --- | --- | --- | --- | --- |
+| 1 | gemini-2.0-flash | 0.829 | 0.102 | 7.41 | 0.00048 |
+| 2 | gemini-2.0-flash-001 | 0.814 | 0.176 | 6.85 | 0.000421 |
+| 3 | gemini-1.5-flash | 0.797 | 0.143 | 9.54 | 0.000238 |
+| 4 | gemini-2.0-pro-exp | 0.764 | 0.227 | 11.95 | TBA |
+| 5 | AUTO | 0.76 | 0.184 | 5.14 | 0.000217 |
+| 6 | gemini-2.0-flash-thinking-exp | 0.746 | 0.266 | 10.46 | TBA |
+| 7 | gemini-1.5-pro | 0.732 | 0.265 | 11.44 | 0.003332 |
+| 8 | accounts/fireworks/models/llama4-maverick-instruct-basic (via Fireworks) | 0.687 | 0.221 | 8.07 | 0.000419 |
+| 9 | gpt-4o | 0.687 | 0.247 | 10.16 | 0.004736 |
+| 10 | accounts/fireworks/models/llama4-scout-instruct-basic (via Fireworks) | 0.675 | 0.184 | 5.98 | 0.000226 |
+| 11 | gpt-4o-mini | 0.642 | 0.213 | 9.71 | 0.000275 |
+| 12 | gemma-3-27b-it (via OpenRouter) | 0.628 | 0.299 | 18.79 | 0.000096 |
+| 13 | gemini-1.5-flash-8b | 0.551 | 0.223 | 3.91 | 0.000055 |
+| 14 | Llama-Vision-Free (via Together AI) | 0.531 | 0.198 | 6.93 | 0 |
+| 15 | Llama-3.2-11B-Vision-Instruct-Turbo (via Together AI) | 0.524 | 0.192 | 3.68 | 0.00006 |
+| 16 | qwen/qwen-2.5-vl-7b-instruct (via OpenRouter) | 0.482 | 0.209 | 11.53 | 0.000052 |
+| 17 | Llama-3.2-90B-Vision-Instruct-Turbo (via Together AI) | 0.461 | 0.306 | 19.26 | 0.000426 |
+| 18 | Llama-3.2-11B-Vision-Instruct (via Hugging Face) | 0.451 | 0.257 | 4.54 | 0 |
+| 19 | microsoft/phi-4-multimodal-instruct (via OpenRouter) | 0.366 | 0.287 | 10.8 | 0.000019 |

{lexoid-0.1.13 → lexoid-0.1.14}/lexoid/api.py RENAMED Viewed

@@ -10,7 +10,11 @@ from typing import Union, Dict, List
 from loguru import logger
-from lexoid.core.parse_type.llm_parser import parse_llm_doc
+from lexoid.core.parse_type.llm_parser import (
+    parse_llm_doc,
+    create_response,
+    convert_doc_to_base64_images,
+)
 from lexoid.core.parse_type.static_parser import parse_static_doc
 from lexoid.core.utils import (
     convert_to_pdf,
@@ -49,6 +53,7 @@ def parse_chunk(path: str, parser_type: ParserType, **kwargs) -> Dict:
             - parent_title: Title of parent doc if recursively parsed
             - recursive_docs: List of dictionaries for recursively parsed documents
             - token_usage: Dictionary containing token usage statistics
+            - parser_used: Which parser was actually used
     """
     if parser_type == ParserType.AUTO:
         router_priority = kwargs.get("router_priority", "speed")
@@ -60,10 +65,13 @@ def parse_chunk(path: str, parser_type: ParserType, **kwargs) -> Dict:
     )
     if parser_type == ParserType.STATIC_PARSE:
         logger.debug("Using static parser")
-        return parse_static_doc(path, **kwargs)
+        result = parse_static_doc(path, **kwargs)
     else:
         logger.debug("Using LLM parser")
-        return parse_llm_doc(path, **kwargs)
+        result = parse_llm_doc(path, **kwargs)
+    result["parser_used"] = parser_type
+    return result
 def parse_chunk_list(
@@ -82,15 +90,18 @@ def parse_chunk_list(
     """
     combined_segments = []
     raw_texts = []
-    token_usage = {"input": 0, "output": 0, "image_count": 0}
+    token_usage = {"input": 0, "output": 0, "llm_page_count": 0}
     for file_path in file_paths:
         result = parse_chunk(file_path, parser_type, **kwargs)
         combined_segments.extend(result["segments"])
         raw_texts.append(result["raw"])
-        if "token_usage" in result:
+        if (
+            result.get("parser_used") == ParserType.LLM_PARSE
+            and "token_usage" in result
+        ):
             token_usage["input"] += result["token_usage"]["input"]
             token_usage["output"] += result["token_usage"]["output"]
-            token_usage["image_count"] += len(result["segments"])
+            token_usage["llm_page_count"] += len(result["segments"])
     token_usage["total"] = token_usage["input"] + token_usage["output"]
     return {
@@ -136,7 +147,7 @@ def parse(
     as_pdf = kwargs.get("as_pdf", False)
     depth = kwargs.get("depth", 1)
-    if type(parser_type) == str:
+    if type(parser_type) is str:
         parser_type = ParserType[parser_type]
     if (
         path.lower().endswith((".doc", ".docx"))
@@ -184,7 +195,7 @@ def parse(
         if not path.lower().endswith(".pdf") or parser_type == ParserType.STATIC_PARSE:
             kwargs["split"] = False
-            result = parse_chunk(path, parser_type, **kwargs)
+            result = parse_chunk_list([path], parser_type, kwargs)
         else:
             kwargs["split"] = True
             split_dir = os.path.join(temp_dir, "splits/")
@@ -219,42 +230,43 @@ def parse(
                 "token_usage": {
                     "input": sum(r["token_usage"]["input"] for r in chunk_results),
                     "output": sum(r["token_usage"]["output"] for r in chunk_results),
-                    "image_count": sum(
-                        r["token_usage"]["image_count"] for r in chunk_results
+                    "llm_page_count": sum(
+                        r["token_usage"]["llm_page_count"] for r in chunk_results
                     ),
                     "total": sum(r["token_usage"]["total"] for r in chunk_results),
                 },
             }
-            if "api_cost_mapping" in kwargs:
-                api_cost_mapping = kwargs["api_cost_mapping"]
-                if isinstance(api_cost_mapping, dict):
-                    api_cost_mapping = api_cost_mapping
-                elif isinstance(api_cost_mapping, str) and os.path.exists(
-                    api_cost_mapping
-                ):
-                    with open(api_cost_mapping, "r") as f:
-                        api_cost_mapping = json.load(f)
-                else:
-                    raise ValueError(f"Unsupported API cost value: {api_cost_mapping}.")
-                api_cost = api_cost_mapping.get(
-                    kwargs.get("model", "gemini-2.0-flash"), None
+        if "api_cost_mapping" in kwargs and "token_usage" in result:
+            api_cost_mapping = kwargs["api_cost_mapping"]
+            if isinstance(api_cost_mapping, dict):
+                api_cost_mapping = api_cost_mapping
+            elif isinstance(api_cost_mapping, str) and os.path.exists(api_cost_mapping):
+                with open(api_cost_mapping, "r") as f:
+                    api_cost_mapping = json.load(f)
+            else:
+                raise ValueError(f"Unsupported API cost value: {api_cost_mapping}.")
+            api_cost = api_cost_mapping.get(
+                kwargs.get("model", "gemini-2.0-flash"), None
+            )
+            if api_cost:
+                token_usage = result["token_usage"]
+                token_cost = {
+                    "input": token_usage["input"] * api_cost["input"] / 1_000_000,
+                    "input-image": api_cost.get("input-image", 0)
+                    * token_usage.get("llm_page_count", 0),
+                    "output": token_usage["output"] * api_cost["output"] / 1_000_000,
+                }
+                token_cost["total"] = (
+                    token_cost["input"]
+                    + token_cost["input-image"]
+                    + token_cost["output"]
                 )
-                if api_cost:
-                    token_usage = result["token_usage"]
-                    token_cost = {
-                        "input": token_usage["input"] * api_cost["input"] / 1_000_000
-                        + api_cost.get("input-image", 0) * token_usage["image_count"],
-                        "output": token_usage["output"]
-                        * api_cost["output"]
-                        / 1_000_000,
-                    }
-                    token_cost["total"] = token_cost["input"] + token_cost["output"]
-                    result["token_cost"] = token_cost
-            if as_pdf:
-                result["pdf_path"] = path
+                result["token_cost"] = token_cost
+        if as_pdf:
+            result["pdf_path"] = path
     if depth > 1:
         recursive_docs = []
@@ -285,3 +297,63 @@ def parse(
         result["recursive_docs"] = recursive_docs
     return result
+def parse_with_schema(
+    path: str, schema: Dict, api: str = "openai", model: str = "gpt-4o-mini", **kwargs
+) -> List[List[Dict]]:
+    """
+    Parses a PDF using an LLM to generate structured output conforming to a given JSON schema.
+    Args:
+        path (str): Path to the PDF file.
+        schema (Dict): JSON schema to which the parsed output should conform.
+        api (str, optional): LLM API provider (One of "openai", "huggingface", "together", "openrouter", and "fireworks").
+        model (str, optional): LLM model name.
+        **kwargs: Additional arguments for the parser (e.g.: temperature, max_tokens).
+    Returns:
+        List[List[Dict]]: List of dictionaries for each page, each conforming to the provided schema.
+    """
+    system_prompt = f"""
+        The output should be formatted as a JSON instance that conforms to the JSON schema below.
+        As an example, for the schema {{
+        "properties": {{
+            "foo": {{
+            "title": "Foo",
+            "description": "a list of strings",
+            "type": "array",
+            "items": {{"type": "string"}}
+            }}
+        }},
+        "required": ["foo"]
+        }}, the object {{"foo": ["bar", "baz"]}} is valid. The object {{"properties": {{"foo": ["bar", "baz"]}}}} is not.
+        Here is the output schema:
+        {json.dumps(schema, indent=2)}
+        """
+    user_prompt = "You are an AI agent that parses documents and returns them in the specified JSON format. Please parse the document and return it in the required format."
+    responses = []
+    images = convert_doc_to_base64_images(path)
+    for i, (page_num, image) in enumerate(images):
+        resp_dict = create_response(
+            api=api,
+            model=model,
+            user_prompt=user_prompt,
+            system_prompt=system_prompt,
+            image_url=image,
+            temperature=kwargs.get("temperature", 0.0),
+            max_tokens=kwargs.get("max_tokens", 1024),
+        )
+        response = resp_dict.get("response", "")
+        response = response.split("```json")[-1].split("```")[0].strip()
+        logger.debug(f"Processing page {page_num + 1} with response: {response}")
+        new_dict = json.loads(response)
+        responses.append(new_dict)
+    return responses

{lexoid-0.1.13 → lexoid-0.1.14}/lexoid/core/parse_type/llm_parser.py RENAMED Viewed

@@ -3,23 +3,24 @@ import io
 import mimetypes
 import os
 import time
+from functools import wraps
+from typing import Dict, List, Optional, Tuple
 import pypdfium2 as pdfium
 import requests
-from functools import wraps
+from huggingface_hub import InferenceClient
+from loguru import logger
+from openai import OpenAI
 from requests.exceptions import HTTPError
-from typing import Dict, List
+from together import Together
 from lexoid.core.prompt_templates import (
     INSTRUCTIONS_ADD_PG_BREAK,
+    LLAMA_PARSER_PROMPT,
     OPENAI_USER_PROMPT,
     PARSER_PROMPT,
-    LLAMA_PARSER_PROMPT,
 )
 from lexoid.core.utils import convert_image_to_pdf
-from loguru import logger
-from openai import OpenAI
-from together import Together
-from huggingface_hub import InferenceClient
 def retry_on_http_error(func):
@@ -65,10 +66,13 @@ def parse_llm_doc(path: str, **kwargs) -> List[Dict] | str:
         return parse_with_api(path, api="huggingface", **kwargs)
     if any(model.startswith(prefix) for prefix in ["microsoft", "google", "qwen"]):
         return parse_with_api(path, api="openrouter", **kwargs)
+    if model.startswith("accounts/fireworks"):
+        return parse_with_api(path, api="fireworks", **kwargs)
     raise ValueError(f"Unsupported model: {model}")
 def parse_with_gemini(path: str, **kwargs) -> List[Dict] | str:
+    logger.debug(f"Parsing with Gemini API and model {kwargs['model']}")
     api_key = os.environ.get("GOOGLE_API_KEY")
     if not api_key:
         raise ValueError("GOOGLE_API_KEY environment variable is not set")
@@ -105,7 +109,7 @@ def parse_with_gemini(path: str, **kwargs) -> List[Dict] | str:
             }
         ],
         "generationConfig": {
-            "temperature": kwargs.get("temperature", 0.7),
+            "temperature": kwargs.get("temperature", 0.2),
         },
     }
@@ -127,7 +131,7 @@ def parse_with_gemini(path: str, **kwargs) -> List[Dict] | str:
     combined_text = ""
     if "<output>" in raw_text:
-        combined_text = raw_text.split("<output>")[1].strip()
+        combined_text = raw_text.split("<output>")[-1].strip()
     if "</output>" in result:
         combined_text = result.split("</output>")[0].strip()
@@ -169,18 +173,54 @@ def convert_pdf_page_to_base64(
     return base64.b64encode(img_byte_arr.getvalue()).decode("utf-8")
-def parse_with_api(path: str, api: str, **kwargs) -> List[Dict] | str:
-    """
-    Parse documents (PDFs or images) using various vision model APIs.
+def get_messages(
+    system_prompt: Optional[str], user_prompt: Optional[str], image_url: Optional[str]
+) -> List[Dict]:
+    messages = []
+    if system_prompt:
+        messages.append(
+            {
+                "role": "system",
+                "content": system_prompt,
+            }
+        )
+    base_message = (
+        [
+            {"type": "text", "text": user_prompt},
+        ]
+        if user_prompt
+        else []
+    )
+    image_message = (
+        [
+            {
+                "type": "image_url",
+                "image_url": {"url": image_url},
+            }
+        ]
+        if image_url
+        else []
+    )
-    Args:
-        path (str): Path to the document to parse
-        api (str): Which API to use ("openai", "huggingface", or "together")
-        **kwargs: Additional arguments including model, temperature, title, etc.
+    messages.append(
+        {
+            "role": "user",
+            "content": base_message + image_message,
+        }
+    )
-    Returns:
-        Dict: Dictionary containing parsed document data
-    """
+    return messages
+def create_response(
+    api: str,
+    model: str,
+    system_prompt: Optional[str] = None,
+    user_prompt: Optional[str] = None,
+    image_url: Optional[str] = None,
+    temperature: float = 0.2,
+    max_tokens: int = 1024,
+) -> Dict:
     # Initialize appropriate client
     clients = {
         "openai": lambda: OpenAI(),
@@ -192,11 +232,52 @@ def parse_with_api(path: str, api: str, **kwargs) -> List[Dict] | str:
             base_url="https://openrouter.ai/api/v1",
             api_key=os.environ["OPENROUTER_API_KEY"],
         ),
+        "fireworks": lambda: OpenAI(
+            base_url="https://api.fireworks.ai/inference/v1",
+            api_key=os.environ["FIREWORKS_API_KEY"],
+        ),
     }
     assert api in clients, f"Unsupported API: {api}"
-    logger.debug(f"Parsing with {api} API and model {kwargs['model']}")
     client = clients[api]()
+    # Prepare messages for the API call
+    messages = get_messages(system_prompt, user_prompt, image_url)
+    # Common completion parameters
+    completion_params = {
+        "model": model,
+        "messages": messages,
+        "max_tokens": max_tokens,
+        "temperature": temperature,
+    }
+    # Get completion from selected API
+    response = client.chat.completions.create(**completion_params)
+    token_usage = response.usage
+    # Extract the response text
+    page_text = response.choices[0].message.content
+    return {
+        "response": page_text,
+        "usage": token_usage,
+    }
+def parse_with_api(path: str, api: str, **kwargs) -> List[Dict] | str:
+    """
+    Parse documents (PDFs or images) using various vision model APIs.
+    Args:
+        path (str): Path to the document to parse
+        api (str): Which API to use ("openai", "huggingface", or "together")
+        **kwargs: Additional arguments including model, temperature, title, etc.
+    Returns:
+        Dict: Dictionary containing parsed document data
+    """
+    logger.debug(f"Parsing with {api} API and model {kwargs['model']}")
     # Handle different input types
     mime_type, _ = mimetypes.guess_type(path)
     if mime_type and mime_type.startswith("image"):
@@ -215,67 +296,39 @@ def parse_with_api(path: str, api: str, **kwargs) -> List[Dict] | str:
             for page_num in range(len(pdf_document))
         ]
-    # API-specific message formatting
-    def get_messages(page_num: int, image_url: str) -> List[Dict]:
-        image_message = {
-            "type": "image_url",
-            "image_url": {"url": image_url},
-        }
+    # Process each page/image
+    all_results = []
+    for page_num, image_url in images:
         if api == "openai":
             system_prompt = kwargs.get(
                 "system_prompt", PARSER_PROMPT.format(custom_instructions="")
             )
             user_prompt = kwargs.get("user_prompt", OPENAI_USER_PROMPT)
-            return [
-                {
-                    "role": "system",
-                    "content": system_prompt,
-                },
-                {
-                    "role": "user",
-                    "content": [
-                        {"type": "text", "text": user_prompt},
-                        image_message,
-                    ],
-                },
-            ]
         else:
-            prompt = kwargs.get("system_prompt", LLAMA_PARSER_PROMPT)
-            base_message = {"type": "text", "text": prompt}
-            return [
-                {
-                    "role": "user",
-                    "content": [base_message, image_message],
-                }
-            ]
-    # Process each page/image
-    all_results = []
-    for page_num, image_url in images:
-        messages = get_messages(page_num, image_url)
-        # Common completion parameters
-        completion_params = {
-            "model": kwargs["model"],
-            "messages": messages,
-            "max_tokens": kwargs.get("max_tokens", 1024),
-            "temperature": kwargs.get("temperature", 0.7),
-        }
+            system_prompt = kwargs.get("system_prompt", None)
+            user_prompt = kwargs.get("user_prompt", LLAMA_PARSER_PROMPT)
+        response = create_response(
+            api=api,
+            model=kwargs["model"],
+            system_prompt=system_prompt,
+            user_prompt=user_prompt,
+            image_url=image_url,
+            temperature=kwargs.get("temperature", 0.2),
+            max_tokens=kwargs.get("max_tokens", 1024),
+        )
         # Get completion from selected API
-        response = client.chat.completions.create(**completion_params)
-        token_usage = response.usage
+        page_text = response["response"]
+        token_usage = response["usage"]
-        # Extract the response text
-        page_text = response.choices[0].message.content
         if kwargs.get("verbose", None):
             logger.debug(f"Page {page_num + 1} response: {page_text}")
         # Extract content between output tags if present
         result = page_text
         if "<output>" in page_text:
-            result = page_text.split("<output>")[1].strip()
+            result = page_text.split("<output>")[-1].strip()
         if "</output>" in result:
             result = result.split("</output>")[0].strip()
         all_results.append(
@@ -319,3 +372,28 @@ def parse_with_api(path: str, api: str, **kwargs) -> List[Dict] | str:
             "total": sum(total_tokens for _, _, _, _, total_tokens in all_results),
         },
     }
+def convert_doc_to_base64_images(path: str) -> List[Tuple[int, str]]:
+    """
+    Converts a document (PDF or image) to a base64 encoded string.
+    Args:
+        path (str): Path to the PDF file.
+    Returns:
+        str: Base64 encoded string of the PDF content.
+    """
+    if path.endswith(".pdf"):
+        pdf_document = pdfium.PdfDocument(path)
+        return [
+            (
+                page_num,
+                f"data:image/png;base64,{convert_pdf_page_to_base64(pdf_document, page_num)}",
+            )
+            for page_num in range(len(pdf_document))
+        ]
+    elif mimetypes.guess_type(path)[0].startswith("image"):
+        with open(path, "rb") as img_file:
+            image_base64 = base64.b64encode(img_file.read()).decode("utf-8")
+            return [(0, f"data:image/png;base64,{image_base64}")]

{lexoid-0.1.13 → lexoid-0.1.14}/lexoid/core/prompt_templates.py RENAMED Viewed

@@ -41,7 +41,8 @@ Think step-by-step.
     '0' is typically more oval than 'O'
     '8' has a more angular top than 'B'
 {custom_instructions}
-- Return only the correct markdown without additional text or explanations. Do not any additional text (such as "```html" or "```markdown") in the output.
+- Return only the correct markdown without additional text or explanations.
+- DO NOT use code blocks such as "```html" or "```markdown" in the output unless there is a code block in the content.
 - Think before generating the output in <thinking></thinking> tags.
 Remember, your primary objective is to create an output that, when rendered, structurally replicates the original document's content as closely as possible without losing any textual details.

{lexoid-0.1.13 → lexoid-0.1.14}/lexoid/core/utils.py RENAMED Viewed

@@ -345,7 +345,7 @@ def get_webpage_soup(url: str) -> BeautifulSoup:
                 # Additional wait for any dynamic content
                 try:
                     await page.wait_for_selector("body", timeout=30000)
-                except:
+                except Exception:
                     pass
                 html = await page.content()
@@ -561,24 +561,32 @@ def router(path: str, priority: str = "speed") -> str:
         priority (str): The priority for routing: "accuracy" (preference to LLM_PARSE) or "speed" (preference to STATIC_PARSE).
     """
     file_type = get_file_type(path)
-    if file_type.startswith("text/") or "spreadsheet" in file_type or "presentation" in file_type:
+    if (
+        file_type.startswith("text/")
+        or "spreadsheet" in file_type
+        or "presentation" in file_type
+    ):
         return "STATIC_PARSE"
     if priority == "accuracy":
         # If the file is a PDF without images but has hyperlinks, use STATIC_PARSE
         # Otherwise, use LLM_PARSE
-        if (
-            file_type == "application/pdf"
-            and not has_image_in_pdf(path)
-            and has_hyperlink_in_pdf(path)
-        ):
+        has_image = has_image_in_pdf(path)
+        has_hyperlink = has_hyperlink_in_pdf(path)
+        if file_type == "application/pdf" and not has_image and has_hyperlink:
+            logger.debug("Using STATIC_PARSE for PDF with hyperlinks and no images.")
             return "STATIC_PARSE"
+        logger.debug(
+            f"Using LLM_PARSE because PDF has image ({has_image}) or has no hyperlink ({has_hyperlink})."
+        )
         return "LLM_PARSE"
     else:
         # If the file is a PDF without images, use STATIC_PARSE
         # Otherwise, use LLM_PARSE
         if file_type == "application/pdf" and not has_image_in_pdf(path):
+            logger.debug("Using STATIC_PARSE for PDF without images.")
             return "STATIC_PARSE"
+        logger.debug("Using LLM_PARSE because PDF has images")
         return "LLM_PARSE"

{lexoid-0.1.13 → lexoid-0.1.14}/pyproject.toml RENAMED Viewed

@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "lexoid"
-version = "0.1.13"
+version = "0.1.14"
 description = ""
 authors = []
 readme = "README.md"

{lexoid-0.1.13 → lexoid-0.1.14}/LICENSE RENAMED Viewed

File without changes

{lexoid-0.1.13 → lexoid-0.1.14}/lexoid/core/parse_type/static_parser.py RENAMED Viewed

File without changes

lexoid 0.1.13__tar.gz → 0.1.14__tar.gz

lexoid 0.1.13tar.gz → 0.1.14tar.gz