langchain-ocr-lib (PyPI) 0.1.0.tar.gz → 0.2.0.tar.gz

Files changed (37)

langchain_ocr_lib-0.2.0/PKG-INFO ADDED

@@ -0,0 +1,188 @@
+Metadata-Version: 2.1
+Name: langchain-ocr-lib
+Version: 0.2.0
+Summary:
+License: MIT
+Author: Andreas Klos
+Author-email: aklos@outlook.de
+Requires-Python: >=3.11,<4.0
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Requires-Dist: deprecated (>=1.2.14,<2.0.0)
+Requires-Dist: inject (>=5.2.1,<6.0.0)
+Requires-Dist: langchain-community (>=0.3.19,<0.4.0)
+Requires-Dist: langchain-ollama (>=0.2.0,<0.3.0)
+Requires-Dist: langchain-openai (>=0.3.8,<0.4.0)
+Requires-Dist: langfuse (>=2.59.7,<3.0.0)
+Requires-Dist: openai (>=1.42.0,<2.0.0)
+Requires-Dist: pdf2image (>=1.17.0,<2.0.0)
+Requires-Dist: pillow (>=11.0.0,<12.0.0)
+Requires-Dist: pycountry (>=24.6.1,<25.0.0)
+Requires-Dist: pytest-asyncio (>=0.25.0,<0.26.0)
+Requires-Dist: pyyaml (>=6.0.2,<7.0.0)
+Description-Content-Type: text/markdown
+# langchain_ocr_lib
+**langchain_ocr_lib** is the OCR processing engine behind LangChain-OCR. It provides a modular, vision-LLM-powered Chain to convert image and PDF documents into clean Markdown. Designed for direct CLI usage or integration into larger applications.
+## Table of Contents
+1. [Overview](#1-overview)
+2. [Features](#2-features)
+3. [Installation](#3-installation)
+   1. [Prerequisites](#31-prerequisites)
+   2. [Environment Setup](#32-environment-setup)
+4. [Usage](#4-usage)
+   1. [CLI](#41-cli)
+   2. [Python Module](#42-python-module)
+   3. [Docker](#43-docker)
+5. [Architecture](#5-architecture)
+6. [Testing](#6-testing)
+7. [License](#7-license)
+---
+## 1. Overview
+This package offers the core functionality to extract text from documents using vision LLMs and convert it into Markdown. It is highly configurable through environment variables, and its design is based on dependency injection, which allows you to easily swap out components. The package is designed to be used as a library, but it also provides a command-line interface (CLI) for easy local execution.
+---
+## 2. Features
+- **Vision-Language OCR:** Supports Ollama. Other LLM providers will be added soon.
+- **CLI Interface:** Simple local execution via command line or container
+- **Highly Configurable:** Use environment variables to configure the OCR
+- **Dependency Injection:** Easily swap out components for custom implementations
+- **LangChain:** Integrates with LangChain
+- **Markdown Output:** Outputs well-formatted Markdown text
+---
+## 3. Installation
+### 3.1 Prerequisites
+- **Python:** 3.11+
+- **Poetry:** [Install Poetry](https://python-poetry.org/docs/)
+- **Docker:** For containerized CLI usage (optional)
+- **Ollama:** Follow instructions [here](https://ollama.com)
+- **Langfuse:** Different options for self hosting, see [here](https://langfuse.com/self-hosting) (optional, for observability)
+### 3.2 Environment Setup
+The package is published on PyPI, so you can install it directly with pip:
+```bash
+pip install langchain-ocr-lib
+```
+However, if you want to run the latest version or contribute to the project, you can clone the repository and install it locally.
+```bash
+git clone https://github.com/a-klos/langchain-ocr.git
+cd langchain-ocr/langchain_ocr_lib
+poetry install --with dev
+```
+You can configure the package by setting environment variables. Configuration options are shown in the [`.env.template`](../.env.template) file.
+---
+## 4. Usage
+Remember that you need to pull the configured LLM model first. With Ollama, you can do this with:
+```bash
+ollama pull <model_name>
+```
+For example, to pull the `gemma3:4b-it-q4_K_M` model, run:
+```bash
+ollama pull gemma3:4b-it-q4_K_M
+```
+### 4.1 CLI
+Run OCR locally from the terminal:
+```bash
+langchain-ocr <input_file>
+```
+Supports:
+- `.jpg`, `.jpeg`, `.png`, and `.pdf` inputs
+### 4.2 Python Module
+Use the library programmatically:
+```python
+import inject
+from langchain_ocr_lib.di_config import configure_di
+from langchain_ocr_lib.di_binding_keys.binding_keys import PdfConverterKey
+from langchain_ocr_lib.impl.converter.pdf_converter import Pdf2MarkdownConverter
+configure_di() #This sets up the dependency injection
+class Converter:
+    _converter: Pdf2MarkdownConverter = inject.attr(PdfConverterKey)
+    def convert(self, filename: str) -> str:
+        return self._converter.convert2markdown(filename=filename)
+converter = Converter()
+markdown = converter.convert("../docs/invoice.pdf") # Adjust the file path as needed
+print(markdown)
+```
+The `configure_di()` function sets up the dependency injection for the library. The dependencies can be easily swapped out or appended with new dependencies. See [../api/src/langchain_ocr/di_config.py](../api/src/langchain_ocr/di_config.py) for more details on how to add new dependencies.
+Swapping out the dependencies can be done as follows:
+```python
+import inject
+from inject import Binder
+from langchain_ocr_lib.di_config import lib_di_config, PdfConverterKey
+from langchain_ocr_lib.impl.converter.pdf_converter import Pdf2MarkdownConverter
+class MyPdfConverter(Pdf2MarkdownConverter):
+    def convert(self, filename: str) -> None:
+        markdown = self.convert2markdown(filename=filename)
+        print(markdown)
+def _api_specific_config(binder: Binder):
+    binder.install(lib_di_config)  # Install all default bindings
+    binder.bind(PdfConverterKey, MyPdfConverter())  # Then override PdfConverter
+def configure():
+    """Configure the dependency injection container."""
+    inject.configure(_api_specific_config, allow_override=True, clear=True)
+configure()
+class Converter:
+    _converter: MyPdfConverter = inject.attr(PdfConverterKey)
+    def convert(self, filename: str) -> None:
+        self._converter.convert(filename=filename)
+converter = Converter()
+converter.convert("../docs/invoice.pdf") # Adjust the file path as needed
+```
+### 4.3 Docker
+Run OCR via Docker without local Python setup:
+```bash
+docker build -t ocr -f langchain_ocr_lib/Dockerfile .
+docker run --net=host -it --rm -v ./docs:/app/docs:ro ocr docs/invoice.png
+```

langchain_ocr_lib-0.2.0/README.md ADDED

@@ -0,0 +1,160 @@
+# langchain_ocr_lib
+**langchain_ocr_lib** is the OCR processing engine behind LangChain-OCR. It provides a modular, vision-LLM-powered Chain to convert image and PDF documents into clean Markdown. Designed for direct CLI usage or integration into larger applications.
+## Table of Contents
+1. [Overview](#1-overview)
+2. [Features](#2-features)
+3. [Installation](#3-installation)
+   1. [Prerequisites](#31-prerequisites)
+   2. [Environment Setup](#32-environment-setup)
+4. [Usage](#4-usage)
+   1. [CLI](#41-cli)
+   2. [Python Module](#42-python-module)
+   3. [Docker](#43-docker)
+5. [Architecture](#5-architecture)
+6. [Testing](#6-testing)
+7. [License](#7-license)
+---
+## 1. Overview
+This package offers the core functionality to extract text from documents using vision LLMs and convert it into Markdown. It is highly configurable through environment variables, and its design is based on dependency injection, which allows you to easily swap out components. The package is designed to be used as a library, but it also provides a command-line interface (CLI) for easy local execution.
+---
+## 2. Features
+- **Vision-Language OCR:** Supports Ollama. Other LLM providers will be added soon.
+- **CLI Interface:** Simple local execution via command line or container
+- **Highly Configurable:** Use environment variables to configure the OCR
+- **Dependency Injection:** Easily swap out components for custom implementations
+- **LangChain:** Integrates with LangChain
+- **Markdown Output:** Outputs well-formatted Markdown text
+---
+## 3. Installation
+### 3.1 Prerequisites
+- **Python:** 3.11+
+- **Poetry:** [Install Poetry](https://python-poetry.org/docs/)
+- **Docker:** For containerized CLI usage (optional)
+- **Ollama:** Follow instructions [here](https://ollama.com)
+- **Langfuse:** Different options for self hosting, see [here](https://langfuse.com/self-hosting) (optional, for observability)
+### 3.2 Environment Setup
+The package is published on PyPI, so you can install it directly with pip:
+```bash
+pip install langchain-ocr-lib
+```
+However, if you want to run the latest version or contribute to the project, you can clone the repository and install it locally.
+```bash
+git clone https://github.com/a-klos/langchain-ocr.git
+cd langchain-ocr/langchain_ocr_lib
+poetry install --with dev
+```
+You can configure the package by setting environment variables. Configuration options are shown in the [`.env.template`](../.env.template) file.
+---
+## 4. Usage
+Remember that you need to pull the configured LLM model first. With Ollama, you can do this with:
+```bash
+ollama pull <model_name>
+```
+For example, to pull the `gemma3:4b-it-q4_K_M` model, run:
+```bash
+ollama pull gemma3:4b-it-q4_K_M
+```
+### 4.1 CLI
+Run OCR locally from the terminal:
+```bash
+langchain-ocr <input_file>
+```
+Supports:
+- `.jpg`, `.jpeg`, `.png`, and `.pdf` inputs
+### 4.2 Python Module
+Use the library programmatically:
+```python
+import inject
+from langchain_ocr_lib.di_config import configure_di
+from langchain_ocr_lib.di_binding_keys.binding_keys import PdfConverterKey
+from langchain_ocr_lib.impl.converter.pdf_converter import Pdf2MarkdownConverter
+configure_di() #This sets up the dependency injection
+class Converter:
+    _converter: Pdf2MarkdownConverter = inject.attr(PdfConverterKey)
+    def convert(self, filename: str) -> str:
+        return self._converter.convert2markdown(filename=filename)
+converter = Converter()
+markdown = converter.convert("../docs/invoice.pdf") # Adjust the file path as needed
+print(markdown)
+```
+The `configure_di()` function sets up the dependency injection for the library. The dependencies can be easily swapped out or appended with new dependencies. See [../api/src/langchain_ocr/di_config.py](../api/src/langchain_ocr/di_config.py) for more details on how to add new dependencies.
+Swapping out the dependencies can be done as follows:
+```python
+import inject
+from inject import Binder
+from langchain_ocr_lib.di_config import lib_di_config, PdfConverterKey
+from langchain_ocr_lib.impl.converter.pdf_converter import Pdf2MarkdownConverter
+class MyPdfConverter(Pdf2MarkdownConverter):
+    def convert(self, filename: str) -> None:
+        markdown = self.convert2markdown(filename=filename)
+        print(markdown)
+def _api_specific_config(binder: Binder):
+    binder.install(lib_di_config)  # Install all default bindings
+    binder.bind(PdfConverterKey, MyPdfConverter())  # Then override PdfConverter
+def configure():
+    """Configure the dependency injection container."""
+    inject.configure(_api_specific_config, allow_override=True, clear=True)
+configure()
+class Converter:
+    _converter: MyPdfConverter = inject.attr(PdfConverterKey)
+    def convert(self, filename: str) -> None:
+        self._converter.convert(filename=filename)
+converter = Converter()
+converter.convert("../docs/invoice.pdf") # Adjust the file path as needed
+```
+### 4.3 Docker
+Run OCR via Docker without local Python setup:
+```bash
+docker build -t ocr -f langchain_ocr_lib/Dockerfile .
+docker run --net=host -it --rm -v ./docs:/app/docs:ro ocr docs/invoice.png
+```
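The CLI section of the README above lists `.jpg`, `.jpeg`, `.png`, and `.pdf` as supported inputs, and the library binds separate PDF and image converters. The following is a minimal sketch of how a caller might choose between the two by file extension. `converter_kind` is a hypothetical helper and not part of the package; how the package's own CLI routes files is not shown in this diff, and the case-insensitive suffix match is an assumption.

```python
from pathlib import Path

# Extensions taken from the README's "Supports" list.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}
PDF_SUFFIXES = {".pdf"}


def converter_kind(filename: str) -> str:
    """Return "pdf" or "image" for a supported file, else raise ValueError."""
    # Assumption: extensions are compared case-insensitively.
    suffix = Path(filename).suffix.lower()
    if suffix in PDF_SUFFIXES:
        return "pdf"
    if suffix in IMAGE_SUFFIXES:
        return "image"
    raise ValueError(f"Unsupported input type: {suffix or '(no extension)'}")


print(converter_kind("docs/invoice.pdf"))  # pdf
print(converter_kind("docs/invoice.PNG"))  # image
```

A caller could map the returned string to `PdfConverterKey` or `ImageConverterKey` when resolving the converter from the container.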

{langchain_ocr_lib-0.1.0 → langchain_ocr_lib-0.2.0}/pyproject.toml RENAMED

@@ -1,5 +1,5 @@
 [build-system]
-requires = ["poetry-core"]
+requires = ["poetry-core==1.8.5"]
 build-backend = "poetry.core.masonry.api"
 [tool.poetry.scripts]
@@ -7,7 +7,7 @@ langchain-ocr = "langchain_ocr_lib.main:main"
 [tool.poetry]
 name = "langchain-ocr-lib"
-version = "0.1.0"
+version = "0.2.0"
 description = ""
 authors = ["Andreas Klos <aklos@outlook.de>"]
 readme = "README.md"
@@ -72,9 +72,8 @@ docstring-quotes = '"""'
 multiline-quotes = '"""'
 dictionaries = ["en_US", "python", "technical", "pandas"]
 ban-relative-imports = true
-per-file-ignores = """
-"""
+per-file-ignores = """src/langchain_ocr_lib/di_binding_keys/binding_keys.py: D101"""
 [tool.black]
 line-length = 120

{langchain_ocr_lib-0.1.0 → langchain_ocr_lib-0.2.0}/src/langchain_ocr_lib/converter/converter.py RENAMED

@@ -3,11 +3,13 @@
 from abc import ABC, abstractmethod
 import inject
+from langchain_ocr_lib.di_binding_keys.binding_keys import LangfuseTracedChainKey
 class File2MarkdownConverter(ABC):
     """Abstract base class for the File2MarkdownConverter class."""
-    _chain = inject.attr("LangfuseTracedChain")
+    _chain = inject.attr(LangfuseTracedChainKey)
     @abstractmethod
     async def aconvert2markdown(self, file: bytes) -> str:

langchain_ocr_lib-0.2.0/src/langchain_ocr_lib/di_binding_keys/binding_keys.py ADDED

@@ -0,0 +1,29 @@
+"""Define key classes for dependency bindings. More reliable than using strings."""
+class LargeLanguageModelKey:
+    pass
+class LangfuseClientKey:
+    pass
+class LangfuseManagerKey:
+    pass
+class OcrChainKey:
+    pass
+class LangfuseTracedChainKey:
+    pass
+class PdfConverterKey:
+    pass
+class ImageConverterKey:
+    pass
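
The module docstring above states that class keys are more reliable than strings. A minimal stdlib-only sketch of that reasoning follows. `Registry`, `bind`, and `get` are hypothetical stand-ins for a DI container and are not the `inject` API. The sketch illustrates one plausible motivation: a misspelled string key is still a valid expression and only fails at lookup, whereas a misspelled class name is an undefined name and fails as soon as it is evaluated.

```python
"""Stdlib-only illustration of class keys versus string keys (hypothetical container)."""
from typing import Any, Hashable


class Registry:
    """Tiny stand-in for a DI container; NOT the `inject` library."""

    def __init__(self) -> None:
        self._bindings: dict[Hashable, Any] = {}

    def bind(self, key: Hashable, instance: Any) -> None:
        self._bindings[key] = instance

    def get(self, key: Hashable) -> Any:
        return self._bindings[key]


class PdfConverterKey:  # same shape as the key classes in binding_keys.py
    pass


registry = Registry()
registry.bind("PdfConverter", "string-bound converter")  # 0.1.0 style
registry.bind(PdfConverterKey, "class-bound converter")  # 0.2.0 style

# A typo in a string key is a valid expression; it only fails when looked up.
try:
    registry.get("PdfConvertor")
    string_typo_detected_at = "never"
except KeyError:
    string_typo_detected_at = "lookup"

# A typo in a class key is an undefined name; evaluating it raises NameError
# before registry.get is even called.
try:
    registry.get(PdfConvertorKey)  # noqa: F821 (deliberate typo)
    class_typo_error = None
except NameError as exc:
    class_typo_error = type(exc).__name__

print(string_typo_detected_at, class_typo_error)
```

When the key is used in a class body, as in `inject.attr(PdfConverterKey)`, the same mistake would surface when the module is imported, and static linters can flag it without running any code.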

{langchain_ocr_lib-0.1.0 → langchain_ocr_lib-0.2.0}/src/langchain_ocr_lib/di_config.py RENAMED

@@ -2,6 +2,15 @@
 from inject import Binder
 import inject
+from langchain_ocr_lib.di_binding_keys.binding_keys import (
+    ImageConverterKey,
+    LangfuseClientKey,
+    LangfuseManagerKey,
+    LangfuseTracedChainKey,
+    LargeLanguageModelKey,
+    OcrChainKey,
+    PdfConverterKey,
+)
 from langchain_ollama import ChatOllama
 from langchain_openai import ChatOpenAI
 from langfuse import Langfuse
@@ -46,12 +55,12 @@ def lib_di_config(binder: Binder):
         llm_instance = llm_provider(settings, ChatOpenAI)
     else:
         raise NotImplementedError("Configured LLM is not implemented")
-    binder.bind("LargeLanguageModel", llm_instance)
+    binder.bind(LargeLanguageModelKey, llm_instance)
     prompt = ocr_prompt_template_builder(language=language_settings.language, model_name=settings.model)
     binder.bind(
-        "LangfuseClient",
+        LangfuseClientKey,
         Langfuse(
             public_key=langfuse_settings.public_key,
             secret_key=langfuse_settings.secret_key,
@@ -60,7 +69,7 @@ def lib_di_config(binder: Binder):
     )
     binder.bind(
-        "LangfuseManager",
+        LangfuseManagerKey,
         LangfuseManager(
             managed_prompts={
                 OcrChain.__name__: prompt,
@@ -68,17 +77,17 @@ def lib_di_config(binder: Binder):
         ),
     )
-    binder.bind("OcrChain", OcrChain())
+    binder.bind(OcrChainKey, OcrChain())
     binder.bind(
-        "LangfuseTracedChain",
+        LangfuseTracedChainKey,
         LangfuseTracedChain(
             settings=langfuse_settings,
         ),
     )
-    binder.bind("PdfConverter", Pdf2MarkdownConverter())
-    binder.bind("ImageConverter", Image2MarkdownConverter())
+    binder.bind(PdfConverterKey, Pdf2MarkdownConverter())
+    binder.bind(ImageConverterKey, Image2MarkdownConverter())
 def configure_di():

{langchain_ocr_lib-0.1.0 → langchain_ocr_lib-0.2.0}/src/langchain_ocr_lib/impl/chains/ocr_chain.py RENAMED

@@ -7,6 +7,7 @@ from langchain_core.runnables.utils import Input
 import inject
 from langchain_ocr_lib.chains.chain import Chain
+from langchain_ocr_lib.di_binding_keys.binding_keys import LangfuseManagerKey
 RunnableInput = Input  # TODO: adjust properly
 RunnableOutput = str
@@ -15,7 +16,7 @@ RunnableOutput = str
 class OcrChain(Chain[RunnableInput, RunnableOutput]):
     """Base class for LLM answer generation chain."""
-    _langfuse_manager = inject.attr("LangfuseManager")
+    _langfuse_manager = inject.attr(LangfuseManagerKey)
     def __init__(self):
         """Initialize the AnswerGenerationChain.

{langchain_ocr_lib-0.1.0 → langchain_ocr_lib-0.2.0}/src/langchain_ocr_lib/impl/langfuse_manager/langfuse_manager.py RENAMED

@@ -10,6 +10,9 @@ from langchain_core.language_models.llms import LLM
 from langfuse.api.resources.commons.errors.not_found_error import NotFoundError
 from langfuse.model import ChatPromptClient
+from langchain_ocr_lib.di_binding_keys.binding_keys import LangfuseClientKey, LargeLanguageModelKey
 logger = logging.getLogger(__name__)
@@ -23,8 +26,8 @@ class LangfuseManager:
     """
     API_KEY_FILTER: str = "api_key"
-    _llm = inject.attr("LargeLanguageModel")
-    _langfuse = inject.attr("LangfuseClient")
+    _llm = inject.attr(LargeLanguageModelKey)
+    _langfuse = inject.attr(LangfuseClientKey)
     def __init__(
         self,
@@ -136,12 +139,16 @@ class LangfuseManager:
             fallback = self._managed_prompts[name]
             if isinstance(fallback, ChatPromptTemplate):
                 return fallback
-            if isinstance(fallback, list) and len(fallback) > 0 and isinstance(fallback[0], dict) and "content" in fallback[0]:
+            if (
+                isinstance(fallback, list)
+                and len(fallback) > 0
+                and isinstance(fallback[0], dict)
+                and "content" in fallback[0]
+            ):
                 image_payload = [{"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,{image_data}"}}]
                 return ChatPromptTemplate.from_messages([("system", fallback[0]["content"]), ("user", image_payload)])
-            else:
-                logger.error("Unexpected structure for fallback prompt.")
-                raise ValueError("Unexpected structure for fallback prompt.")
+            logger.error("Unexpected structure for fallback prompt.")
+            raise ValueError("Unexpected structure for fallback prompt.")
         langchain_prompt = langfuse_prompt.get_langchain_prompt()
         langchain_prompt[-1] = ("user", json.loads(langchain_prompt[-1][1]))
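
The last hunk above reflows the fallback check across several lines and drops the `else` in favour of a guard-clause shape: return on the valid structure, otherwise fall through to the error. A stdlib-only sketch of that shape follows. `build_messages` is a hypothetical helper that returns plain `(role, content)` tuples in place of `ChatPromptTemplate.from_messages`, it omits the `ChatPromptTemplate` branch, and the prompt text is a placeholder.

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)


def build_messages(fallback: Any) -> list[tuple[str, Any]]:
    """Return (role, content) pairs for a valid fallback prompt, else raise ValueError."""
    if (
        isinstance(fallback, list)
        and len(fallback) > 0
        and isinstance(fallback[0], dict)
        and "content" in fallback[0]
    ):
        # Same payload shape as in the hunk; {image_data} stays a template placeholder.
        image_payload = [{"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,{image_data}"}}]
        return [("system", fallback[0]["content"]), ("user", image_payload)]
    # No `else` needed: the branch above always returns, so reaching this
    # point means the structure was not the expected one.
    logger.error("Unexpected structure for fallback prompt.")
    raise ValueError("Unexpected structure for fallback prompt.")


# Placeholder prompt text, for illustration only.
messages = build_messages([{"content": "Convert the image to Markdown."}])
print(messages[0])
```

Behaviour is unchanged relative to the `if`/`else` form; the flatter layout only makes the error path the default outcome.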

{langchain_ocr_lib-0.1.0 → langchain_ocr_lib-0.2.0}/src/langchain_ocr_lib/impl/tracers/langfuse_traced_chain.py RENAMED

@@ -8,6 +8,7 @@ from langfuse.callback import CallbackHandler
 from langchain_ocr_lib.impl.settings.langfuse_settings import LangfuseSettings
 from langchain_ocr_lib.tracers.traced_chain import TracedChain
+from langchain_ocr_lib.di_config import OcrChainKey
 class LangfuseTracedChain(TracedChain):
@@ -23,7 +24,7 @@ class LangfuseTracedChain(TracedChain):
     """
     CONFIG_CALLBACK_KEY = "callbacks"
-    _inner_chain = inject.attr("OcrChain")
+    _inner_chain = inject.attr(OcrChainKey)
     def __init__(self, settings: LangfuseSettings):
         super().__init__()

langchain_ocr_lib-0.1.0/PKG-INFO DELETED

@@ -1,28 +0,0 @@
-Metadata-Version: 2.3
-Name: langchain-ocr-lib
-Version: 0.1.0
-Summary:
-License: MIT
-Author: Andreas Klos
-Author-email: aklos@outlook.de
-Requires-Python: >=3.11,<4.0
-Classifier: License :: OSI Approved :: MIT License
-Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.11
-Classifier: Programming Language :: Python :: 3.12
-Classifier: Programming Language :: Python :: 3.13
-Requires-Dist: deprecated (>=1.2.14,<2.0.0)
-Requires-Dist: inject (>=5.2.1,<6.0.0)
-Requires-Dist: langchain-community (>=0.3.19,<0.4.0)
-Requires-Dist: langchain-ollama (>=0.2.0,<0.3.0)
-Requires-Dist: langchain-openai (>=0.3.8,<0.4.0)
-Requires-Dist: langfuse (>=2.59.7,<3.0.0)
-Requires-Dist: openai (>=1.42.0,<2.0.0)
-Requires-Dist: pdf2image (>=1.17.0,<2.0.0)
-Requires-Dist: pillow (>=11.0.0,<12.0.0)
-Requires-Dist: pycountry (>=24.6.1,<25.0.0)
-Requires-Dist: pytest-asyncio (>=0.25.0,<0.26.0)
-Requires-Dist: pyyaml (>=6.0.2,<7.0.0)
-Description-Content-Type: text/markdown