onnxtr 0.1.2__tar.gz → 0.3.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (81)
  1. {onnxtr-0.1.2 → onnxtr-0.3.0}/PKG-INFO +81 -16
  2. {onnxtr-0.1.2 → onnxtr-0.3.0}/README.md +79 -14
  3. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/io/elements.py +17 -4
  4. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/io/pdf.py +6 -3
  5. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/__init__.py +1 -0
  6. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/_utils.py +57 -20
  7. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/builder.py +24 -9
  8. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/classification/models/mobilenet.py +25 -7
  9. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/classification/predictor/base.py +1 -0
  10. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/classification/zoo.py +22 -7
  11. onnxtr-0.3.0/onnxtr/models/detection/_utils/__init__.py +1 -0
  12. onnxtr-0.3.0/onnxtr/models/detection/_utils/base.py +66 -0
  13. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/detection/models/differentiable_binarization.py +41 -11
  14. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/detection/models/fast.py +37 -9
  15. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/detection/models/linknet.py +39 -9
  16. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/detection/postprocessor/base.py +4 -3
  17. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/detection/predictor/base.py +15 -1
  18. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/detection/zoo.py +16 -3
  19. onnxtr-0.3.0/onnxtr/models/engine.py +116 -0
  20. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/predictor/base.py +69 -42
  21. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/predictor/predictor.py +22 -15
  22. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/recognition/models/crnn.py +39 -9
  23. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/recognition/models/master.py +19 -5
  24. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/recognition/models/parseq.py +20 -5
  25. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/recognition/models/sar.py +19 -5
  26. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/recognition/models/vitstr.py +31 -9
  27. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/recognition/zoo.py +12 -6
  28. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/zoo.py +22 -0
  29. onnxtr-0.3.0/onnxtr/py.typed +0 -0
  30. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/utils/geometry.py +33 -12
  31. onnxtr-0.3.0/onnxtr/version.py +1 -0
  32. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr.egg-info/PKG-INFO +81 -16
  33. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr.egg-info/SOURCES.txt +3 -0
  34. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr.egg-info/requires.txt +1 -1
  35. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr.egg-info/top_level.txt +0 -1
  36. {onnxtr-0.1.2 → onnxtr-0.3.0}/pyproject.toml +6 -2
  37. {onnxtr-0.1.2 → onnxtr-0.3.0}/setup.py +1 -1
  38. onnxtr-0.1.2/onnxtr/models/engine.py +0 -50
  39. onnxtr-0.1.2/onnxtr/version.py +0 -1
  40. {onnxtr-0.1.2 → onnxtr-0.3.0}/LICENSE +0 -0
  41. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/__init__.py +0 -0
  42. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/contrib/__init__.py +0 -0
  43. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/contrib/artefacts.py +0 -0
  44. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/contrib/base.py +0 -0
  45. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/file_utils.py +0 -0
  46. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/io/__init__.py +0 -0
  47. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/io/html.py +0 -0
  48. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/io/image.py +0 -0
  49. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/io/reader.py +0 -0
  50. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/classification/__init__.py +0 -0
  51. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/classification/models/__init__.py +0 -0
  52. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/classification/predictor/__init__.py +0 -0
  53. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/detection/__init__.py +0 -0
  54. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/detection/core.py +0 -0
  55. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/detection/models/__init__.py +0 -0
  56. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/detection/postprocessor/__init__.py +0 -0
  57. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/detection/predictor/__init__.py +0 -0
  58. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/predictor/__init__.py +0 -0
  59. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/preprocessor/__init__.py +0 -0
  60. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/preprocessor/base.py +0 -0
  61. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/recognition/__init__.py +0 -0
  62. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/recognition/core.py +0 -0
  63. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/recognition/models/__init__.py +0 -0
  64. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/recognition/predictor/__init__.py +0 -0
  65. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/recognition/predictor/_utils.py +0 -0
  66. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/recognition/predictor/base.py +0 -0
  67. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/recognition/utils.py +0 -0
  68. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/transforms/__init__.py +0 -0
  69. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/transforms/base.py +0 -0
  70. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/utils/__init__.py +0 -0
  71. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/utils/common_types.py +0 -0
  72. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/utils/data.py +0 -0
  73. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/utils/fonts.py +0 -0
  74. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/utils/multithreading.py +0 -0
  75. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/utils/reconstitution.py +0 -0
  76. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/utils/repr.py +0 -0
  77. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/utils/visualization.py +0 -0
  78. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/utils/vocabs.py +0 -0
  79. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr.egg-info/dependency_links.txt +0 -0
  80. {onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr.egg-info/zip-safe +0 -0
  81. {onnxtr-0.1.2 → onnxtr-0.3.0}/setup.cfg +0 -0
{onnxtr-0.1.2 → onnxtr-0.3.0}/PKG-INFO

@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: onnxtr
- Version: 0.1.2
+ Version: 0.3.0
  Summary: Onnx Text Recognition (OnnxTR): docTR Onnx-Wrapper for high-performance OCR on documents.
  Author-email: Felix Dittrich <felixdittrich92@gmail.com>
  Maintainer: Felix Dittrich
@@ -228,7 +228,7 @@ License-File: LICENSE
  Requires-Dist: numpy<2.0.0,>=1.16.0
  Requires-Dist: scipy<2.0.0,>=1.4.0
  Requires-Dist: opencv-python<5.0.0,>=4.5.0
- Requires-Dist: pypdfium2<5.0.0,>=4.0.0
+ Requires-Dist: pypdfium2<5.0.0,>=4.11.0
  Requires-Dist: pyclipper<2.0.0,>=1.2.0
  Requires-Dist: shapely<3.0.0,>=1.6.0
  Requires-Dist: rapidfuzz<4.0.0,>=3.0.0
@@ -275,7 +275,7 @@ Requires-Dist: pre-commit>=2.17.0; extra == "dev"
  [![codecov](https://codecov.io/gh/felixdittrich92/OnnxTR/graph/badge.svg?token=WVFRCQBOLI)](https://codecov.io/gh/felixdittrich92/OnnxTR)
  [![Codacy Badge](https://app.codacy.com/project/badge/Grade/4fff4d764bb14fb8b4f4afeb9587231b)](https://app.codacy.com/gh/felixdittrich92/OnnxTR/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade)
  [![CodeFactor](https://www.codefactor.io/repository/github/felixdittrich92/onnxtr/badge)](https://www.codefactor.io/repository/github/felixdittrich92/onnxtr)
- [![Pypi](https://img.shields.io/badge/pypi-v0.1.1-blue.svg)](https://pypi.org/project/OnnxTR/)
+ [![Pypi](https://img.shields.io/badge/pypi-v0.3.0-blue.svg)](https://pypi.org/project/OnnxTR/)

  > :warning: Please note that this is a wrapper around the [doctr](https://github.com/mindee/doctr) library to provide a Onnx pipeline for docTR. For feature requests, which are not directly related to the Onnx pipeline, please refer to the base project.

@@ -284,8 +284,9 @@ Requires-Dist: pre-commit>=2.17.0; extra == "dev"
  What you can expect from this repository:

  - efficient ways to parse textual information (localize and identify each word) from your documents
- - a Onnx pipeline for docTR, a wrapper around the [doctr](https://github.com/mindee/doctr) library
+ - a Onnx pipeline for docTR, a wrapper around the [doctr](https://github.com/mindee/doctr) library - no PyTorch or TensorFlow dependencies
  - more lightweight package with faster inference latency and less required resources
+ - 8-Bit quantized models for faster inference on CPU

  ![OCR_example](https://github.com/felixdittrich92/OnnxTR/raw/main/docs/images/ocr.png)

@@ -335,11 +336,11 @@ multi_img_doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jp

  ### Putting it together

- Let's use the default pretrained model for an example:
+ Let's use the default `ocr_predictor` model for an example:

  ```python
  from onnxtr.io import DocumentFile
- from onnxtr.models import ocr_predictor
+ from onnxtr.models import ocr_predictor, EngineConfig

  model = ocr_predictor(
  det_arch='fast_base', # detection architecture
@@ -356,8 +357,15 @@ model = ocr_predictor(
  detect_language=False, # set to `True` if the language of the pages should be detected (default: False)
  # DocumentBuilder specific parameters
  resolve_lines=True, # whether words should be automatically grouped into lines (default: True)
- resolve_blocks=True, # whether lines should be automatically grouped into blocks (default: True)
+ resolve_blocks=False, # whether lines should be automatically grouped into blocks (default: False)
  paragraph_break=0.035, # relative length of the minimum space separating paragraphs (default: 0.035)
+ # OnnxTR specific parameters
+ # NOTE: 8-Bit quantized models are not available for FAST detection models and can in general lead to poorer accuracy
+ load_in_8_bit=False, # set to `True` to load 8-bit quantized models instead of the full precision onces (default: False)
+ # Advanced engine configuration options
+ det_engine_cfg=EngineConfig(), # detection model engine configuration (default: internal predefined configuration)
+ reco_engine_cfg=EngineConfig(), # recognition model engine configuration (default: internal predefined configuration)
+ clf_engine_cfg=EngineConfig(), # classification (orientation) model engine configuration (default: internal predefined configuration)
  )
  # PDF
  doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
@@ -395,6 +403,39 @@ for output in xml_output:

  ```

+ <details>
+ <summary>Advanced engine configuration options</summary>
+
+ You can also define advanced engine configurations for the models / predictors:
+
+ ```python
+ from onnxruntime import SessionOptions
+
+ from onnxtr.models import ocr_predictor, EngineConfig
+
+ general_options = SessionOptions() # For configuartion options see: https://onnxruntime.ai/docs/api/python/api_summary.html#sessionoptions
+ general_options.enable_cpu_mem_arena = False
+
+ # NOTE: The following would force to run only on the GPU if no GPU is available it will raise an error
+ # List of strings e.g. ["CUDAExecutionProvider", "CPUExecutionProvider"] or a list of tuples with the provider and its options e.g.
+ # [("CUDAExecutionProvider", {"device_id": 0}), ("CPUExecutionProvider", {"arena_extend_strategy": "kSameAsRequested"})]
+ providers = [("CUDAExecutionProvider", {"device_id": 0})] # For available providers see: https://onnxruntime.ai/docs/execution-providers/
+
+ engine_config = EngineConfig(
+ session_options=general_options,
+ providers=providers
+ )
+ # We use the default predictor with the custom engine configuration
+ # NOTE: You can define differnt engine configurations for detection, recognition and classification depending on your needs
+ predictor = ocr_predictor(
+ det_engine_cfg=engine_config,
+ reco_engine_cfg=engine_config,
+ clf_engine_cfg=engine_config
+ )
+ ```
+
+ </details>
+
  ## Loading custom exported models

  You can also load docTR custom exported models:
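The `EngineConfig` example added in the hunk above pins the predictor to CUDA and fails if no GPU is present. A sketch of deriving the provider list from what the installed ONNX Runtime build actually reports instead (assuming `EngineConfig` also accepts `providers` without an explicit `session_options`):

```python
import onnxruntime as ort

from onnxtr.models import EngineConfig, ocr_predictor

# get_available_providers() lists the execution providers compiled into this onnxruntime build
available = ort.get_available_providers()
providers = (
    ["CUDAExecutionProvider", "CPUExecutionProvider"]
    if "CUDAExecutionProvider" in available
    else ["CPUExecutionProvider"]
)

engine_config = EngineConfig(providers=providers)  # session options left at the library default
predictor = ocr_predictor(
    det_engine_cfg=engine_config,
    reco_engine_cfg=engine_config,
    clf_engine_cfg=engine_config,
)
```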
@@ -438,9 +479,9 @@ predictor.list_archs()
  'linknet_resnet18',
  'linknet_resnet34',
  'linknet_resnet50',
- 'fast_tiny',
- 'fast_small',
- 'fast_base'
+ 'fast_tiny', # No 8-bit support
+ 'fast_small', # No 8-bit support
+ 'fast_base' # No 8-bit support
  ],
  'recognition archs':
  [
@@ -469,14 +510,38 @@ NOTE:

  ### Benchmarks

- The benchmarks was measured on a `i7-14700K Intel CPU`.
+ The CPU benchmarks was measured on a `i7-14700K Intel CPU`.
+
+ The GPU benchmarks was measured on a `RTX 4080 Nvidia GPU`.
+
+ Benchmarking performed on the FUNSD dataset and CORD dataset.
+
+ docTR / OnnxTR models used for the benchmarks are `fast_base` (full precision) | `db_resnet50` (8-bit variant) for detection and `crnn_vgg16_bn` for recognition.
+
+ The smallest combination in OnnxTR (docTR) of `db_mobilenet_v3_large` and `crnn_mobilenet_v3_small` takes as comparison `~0.17s / Page` on the FUNSD dataset and `~0.12s / Page` on the CORD dataset in **full precision**.
+
+ - CPU benchmarks:
+
+ |Library |FUNSD (199 pages) |CORD (900 pages) |
+ |---------------------------------|-------------------------------|-------------------------------|
+ |docTR (CPU) - v0.8.1 | ~1.29s / Page | ~0.60s / Page |
+ |**OnnxTR (CPU)** - v0.1.2 | ~0.57s / Page | **~0.25s / Page** |
+ |**OnnxTR (CPU) 8-bit** - v0.1.2 | **~0.38s / Page** | **~0.14s / Page** |
+ |EasyOCR (CPU) - v1.7.1 | ~1.96s / Page | ~1.75s / Page |
+ |**PyTesseract (CPU)** - v0.3.10 | **~0.50s / Page** | ~0.52s / Page |
+ |Surya (line) (CPU) - v0.4.4 | ~48.76s / Page | ~35.49s / Page |
+ |PaddleOCR (CPU) - no cls - v2.7.3| ~1.27s / Page | ~0.38s / Page |

- MORE BENCHMARKS COMING SOON
+ - GPU benchmarks:

- |Dataset |docTR (CPU) - v0.8.1 |OnnxTR (CPU) - v0.1.1 |
- |--------------------------------|-------------------------------|-------------------------------|
- |FUNSD (199 pages) | ~1.29s / Page | ~0.57s / Page |
- |CORD (900 pages) | ~0.60s / Page | ~0.25s / Page |
+ |Library |FUNSD (199 pages) |CORD (900 pages) |
+ |-------------------------------------|-------------------------------|-------------------------------|
+ |docTR (GPU) - v0.8.1 | ~0.07s / Page | ~0.05s / Page |
+ |**docTR (GPU) float16** - v0.8.1 | **~0.06s / Page** | **~0.03s / Page** |
+ |OnnxTR (GPU) - v0.1.2 | **~0.06s / Page** | ~0.04s / Page |
+ |EasyOCR (GPU) - v1.7.1 | ~0.31s / Page | ~0.19s / Page |
+ |Surya (GPU) float16 - v0.4.4 | ~3.70s / Page | ~2.81s / Page |
+ |**PaddleOCR (GPU) - no cls - v2.7.3**| ~0.08s / Page | **~0.03s / Page** |

  ## Citation

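The 8-bit rows in the CPU table correspond to the `load_in_8_bit` flag introduced above; quantized weights exist for the DB detection and CRNN recognition models used in the benchmark, but not for the FAST detectors. A minimal sketch (assuming the `det_arch`/`reco_arch` keywords mirror docTR and that calling the predictor on a `DocumentFile` returns a `Document` as usual):

```python
from onnxtr.io import DocumentFile
from onnxtr.models import ocr_predictor

# 8-bit variants are published for e.g. db_resnet50 / crnn_vgg16_bn, not for the FAST detectors
model = ocr_predictor(det_arch="db_resnet50", reco_arch="crnn_vgg16_bn", load_in_8_bit=True)

doc = DocumentFile.from_images(["path/to/page1.jpg"])
result = model(doc)
print(result.render())  # plain-text rendering of the detected words
```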
{onnxtr-0.1.2 → onnxtr-0.3.0}/README.md

@@ -7,7 +7,7 @@
  [![codecov](https://codecov.io/gh/felixdittrich92/OnnxTR/graph/badge.svg?token=WVFRCQBOLI)](https://codecov.io/gh/felixdittrich92/OnnxTR)
  [![Codacy Badge](https://app.codacy.com/project/badge/Grade/4fff4d764bb14fb8b4f4afeb9587231b)](https://app.codacy.com/gh/felixdittrich92/OnnxTR/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade)
  [![CodeFactor](https://www.codefactor.io/repository/github/felixdittrich92/onnxtr/badge)](https://www.codefactor.io/repository/github/felixdittrich92/onnxtr)
- [![Pypi](https://img.shields.io/badge/pypi-v0.1.1-blue.svg)](https://pypi.org/project/OnnxTR/)
+ [![Pypi](https://img.shields.io/badge/pypi-v0.3.0-blue.svg)](https://pypi.org/project/OnnxTR/)

  > :warning: Please note that this is a wrapper around the [doctr](https://github.com/mindee/doctr) library to provide a Onnx pipeline for docTR. For feature requests, which are not directly related to the Onnx pipeline, please refer to the base project.

@@ -16,8 +16,9 @@
  What you can expect from this repository:

  - efficient ways to parse textual information (localize and identify each word) from your documents
- - a Onnx pipeline for docTR, a wrapper around the [doctr](https://github.com/mindee/doctr) library
+ - a Onnx pipeline for docTR, a wrapper around the [doctr](https://github.com/mindee/doctr) library - no PyTorch or TensorFlow dependencies
  - more lightweight package with faster inference latency and less required resources
+ - 8-Bit quantized models for faster inference on CPU

  ![OCR_example](https://github.com/felixdittrich92/OnnxTR/raw/main/docs/images/ocr.png)

@@ -67,11 +68,11 @@ multi_img_doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jp

  ### Putting it together

- Let's use the default pretrained model for an example:
+ Let's use the default `ocr_predictor` model for an example:

  ```python
  from onnxtr.io import DocumentFile
- from onnxtr.models import ocr_predictor
+ from onnxtr.models import ocr_predictor, EngineConfig

  model = ocr_predictor(
  det_arch='fast_base', # detection architecture
@@ -88,8 +89,15 @@ model = ocr_predictor(
  detect_language=False, # set to `True` if the language of the pages should be detected (default: False)
  # DocumentBuilder specific parameters
  resolve_lines=True, # whether words should be automatically grouped into lines (default: True)
- resolve_blocks=True, # whether lines should be automatically grouped into blocks (default: True)
+ resolve_blocks=False, # whether lines should be automatically grouped into blocks (default: False)
  paragraph_break=0.035, # relative length of the minimum space separating paragraphs (default: 0.035)
+ # OnnxTR specific parameters
+ # NOTE: 8-Bit quantized models are not available for FAST detection models and can in general lead to poorer accuracy
+ load_in_8_bit=False, # set to `True` to load 8-bit quantized models instead of the full precision onces (default: False)
+ # Advanced engine configuration options
+ det_engine_cfg=EngineConfig(), # detection model engine configuration (default: internal predefined configuration)
+ reco_engine_cfg=EngineConfig(), # recognition model engine configuration (default: internal predefined configuration)
+ clf_engine_cfg=EngineConfig(), # classification (orientation) model engine configuration (default: internal predefined configuration)
  )
  # PDF
  doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
@@ -127,6 +135,39 @@ for output in xml_output:

  ```

+ <details>
+ <summary>Advanced engine configuration options</summary>
+
+ You can also define advanced engine configurations for the models / predictors:
+
+ ```python
+ from onnxruntime import SessionOptions
+
+ from onnxtr.models import ocr_predictor, EngineConfig
+
+ general_options = SessionOptions() # For configuartion options see: https://onnxruntime.ai/docs/api/python/api_summary.html#sessionoptions
+ general_options.enable_cpu_mem_arena = False
+
+ # NOTE: The following would force to run only on the GPU if no GPU is available it will raise an error
+ # List of strings e.g. ["CUDAExecutionProvider", "CPUExecutionProvider"] or a list of tuples with the provider and its options e.g.
+ # [("CUDAExecutionProvider", {"device_id": 0}), ("CPUExecutionProvider", {"arena_extend_strategy": "kSameAsRequested"})]
+ providers = [("CUDAExecutionProvider", {"device_id": 0})] # For available providers see: https://onnxruntime.ai/docs/execution-providers/
+
+ engine_config = EngineConfig(
+ session_options=general_options,
+ providers=providers
+ )
+ # We use the default predictor with the custom engine configuration
+ # NOTE: You can define differnt engine configurations for detection, recognition and classification depending on your needs
+ predictor = ocr_predictor(
+ det_engine_cfg=engine_config,
+ reco_engine_cfg=engine_config,
+ clf_engine_cfg=engine_config
+ )
+ ```
+
+ </details>
+
  ## Loading custom exported models

  You can also load docTR custom exported models:
@@ -170,9 +211,9 @@ predictor.list_archs()
  'linknet_resnet18',
  'linknet_resnet34',
  'linknet_resnet50',
- 'fast_tiny',
- 'fast_small',
- 'fast_base'
+ 'fast_tiny', # No 8-bit support
+ 'fast_small', # No 8-bit support
+ 'fast_base' # No 8-bit support
  ],
  'recognition archs':
  [
@@ -201,14 +242,38 @@ NOTE:

  ### Benchmarks

- The benchmarks was measured on a `i7-14700K Intel CPU`.
+ The CPU benchmarks was measured on a `i7-14700K Intel CPU`.
+
+ The GPU benchmarks was measured on a `RTX 4080 Nvidia GPU`.
+
+ Benchmarking performed on the FUNSD dataset and CORD dataset.
+
+ docTR / OnnxTR models used for the benchmarks are `fast_base` (full precision) | `db_resnet50` (8-bit variant) for detection and `crnn_vgg16_bn` for recognition.
+
+ The smallest combination in OnnxTR (docTR) of `db_mobilenet_v3_large` and `crnn_mobilenet_v3_small` takes as comparison `~0.17s / Page` on the FUNSD dataset and `~0.12s / Page` on the CORD dataset in **full precision**.
+
+ - CPU benchmarks:
+
+ |Library |FUNSD (199 pages) |CORD (900 pages) |
+ |---------------------------------|-------------------------------|-------------------------------|
+ |docTR (CPU) - v0.8.1 | ~1.29s / Page | ~0.60s / Page |
+ |**OnnxTR (CPU)** - v0.1.2 | ~0.57s / Page | **~0.25s / Page** |
+ |**OnnxTR (CPU) 8-bit** - v0.1.2 | **~0.38s / Page** | **~0.14s / Page** |
+ |EasyOCR (CPU) - v1.7.1 | ~1.96s / Page | ~1.75s / Page |
+ |**PyTesseract (CPU)** - v0.3.10 | **~0.50s / Page** | ~0.52s / Page |
+ |Surya (line) (CPU) - v0.4.4 | ~48.76s / Page | ~35.49s / Page |
+ |PaddleOCR (CPU) - no cls - v2.7.3| ~1.27s / Page | ~0.38s / Page |

- MORE BENCHMARKS COMING SOON
+ - GPU benchmarks:

- |Dataset |docTR (CPU) - v0.8.1 |OnnxTR (CPU) - v0.1.1 |
- |--------------------------------|-------------------------------|-------------------------------|
- |FUNSD (199 pages) | ~1.29s / Page | ~0.57s / Page |
- |CORD (900 pages) | ~0.60s / Page | ~0.25s / Page |
+ |Library |FUNSD (199 pages) |CORD (900 pages) |
+ |-------------------------------------|-------------------------------|-------------------------------|
+ |docTR (GPU) - v0.8.1 | ~0.07s / Page | ~0.05s / Page |
+ |**docTR (GPU) float16** - v0.8.1 | **~0.06s / Page** | **~0.03s / Page** |
+ |OnnxTR (GPU) - v0.1.2 | **~0.06s / Page** | ~0.04s / Page |
+ |EasyOCR (GPU) - v1.7.1 | ~0.31s / Page | ~0.19s / Page |
+ |Surya (GPU) float16 - v0.4.4 | ~3.70s / Page | ~2.81s / Page |
+ |**PaddleOCR (GPU) - no cls - v2.7.3**| ~0.08s / Page | **~0.03s / Page** |

  ## Citation

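Both copies of the README reference an `xml_output` loop in the hunk context above. A sketch of the surrounding export calls (assuming OnnxTR keeps docTR's `export()` / `export_as_xml()` result API, where the XML export yields one `(bytes, ElementTree)` pair per page):

```python
from onnxtr.io import DocumentFile
from onnxtr.models import ocr_predictor

model = ocr_predictor()
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
result = model(doc)

json_output = result.export()        # nested dict: pages -> blocks -> lines -> words
xml_output = result.export_as_xml()  # assumed: list of (xml bytes, ElementTree) tuples, one per page
for xml_bytes, xml_tree in xml_output:
    xml_tree.write("page_hocr.xml")  # hOCR-style XML, see the Page XML export in elements.py below
```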
{onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/io/elements.py

@@ -67,10 +67,11 @@ class Word(Element):
  confidence: the confidence associated with the text prediction
  geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
  the page's size
+ objectness_score: the objectness score of the detection
  crop_orientation: the general orientation of the crop in degrees and its confidence
  """

- _exported_keys: List[str] = ["value", "confidence", "geometry", "crop_orientation"]
+ _exported_keys: List[str] = ["value", "confidence", "geometry", "objectness_score", "crop_orientation"]
  _children_names: List[str] = []

  def __init__(
@@ -78,12 +79,14 @@ class Word(Element):
  value: str,
  confidence: float,
  geometry: Union[BoundingBox, np.ndarray],
+ objectness_score: float,
  crop_orientation: Dict[str, Any],
  ) -> None:
  super().__init__()
  self.value = value
  self.confidence = confidence
  self.geometry = geometry
+ self.objectness_score = objectness_score
  self.crop_orientation = crop_orientation

  def render(self) -> str:
@@ -143,7 +146,7 @@ class Line(Element):
  all words in it.
  """

- _exported_keys: List[str] = ["geometry"]
+ _exported_keys: List[str] = ["geometry", "objectness_score"]
  _children_names: List[str] = ["words"]
  words: List[Word] = []

@@ -151,7 +154,11 @@
  self,
  words: List[Word],
  geometry: Optional[Union[BoundingBox, np.ndarray]] = None,
+ objectness_score: Optional[float] = None,
  ) -> None:
+ # Compute the objectness score of the line
+ if objectness_score is None:
+ objectness_score = float(np.mean([w.objectness_score for w in words]))
  # Resolve the geometry using the smallest enclosing bounding box
  if geometry is None:
  # Check whether this is a rotated or straight box
@@ -160,6 +167,7 @@

  super().__init__(words=words)
  self.geometry = geometry
+ self.objectness_score = objectness_score

  def render(self) -> str:
  """Renders the full text of the element"""
@@ -186,7 +194,7 @@ class Block(Element):
  all lines and artefacts in it.
  """

- _exported_keys: List[str] = ["geometry"]
+ _exported_keys: List[str] = ["geometry", "objectness_score"]
  _children_names: List[str] = ["lines", "artefacts"]
  lines: List[Line] = []
  artefacts: List[Artefact] = []
@@ -196,7 +204,11 @@
  lines: List[Line] = [],
  artefacts: List[Artefact] = [],
  geometry: Optional[Union[BoundingBox, np.ndarray]] = None,
+ objectness_score: Optional[float] = None,
  ) -> None:
+ # Compute the objectness score of the line
+ if objectness_score is None:
+ objectness_score = float(np.mean([w.objectness_score for line in lines for w in line.words]))
  # Resolve the geometry using the smallest enclosing bounding box
  if geometry is None:
  line_boxes = [word.geometry for line in lines for word in line.words]
@@ -208,6 +220,7 @@

  super().__init__(lines=lines, artefacts=artefacts)
  self.geometry = geometry
+ self.objectness_score = objectness_score

  def render(self, line_break: str = "\n") -> str:
  """Renders the full text of the element"""
@@ -314,7 +327,7 @@ class Page(Element):
  SubElement(
  head,
  "meta",
- attrib={"name": "ocr-system", "content": f" {onnxtr.__version__}"}, # type: ignore[attr-defined]
+ attrib={"name": "ocr-system", "content": f"onnxtr {onnxtr.__version__}"}, # type: ignore[attr-defined]
  )
  SubElement(
  head,
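Because `objectness_score` is added to `_exported_keys` for `Word`, `Line` and `Block`, it also surfaces in the JSON export. A sketch of reading it back (export layout assumed to follow the usual pages → blocks → lines → words nesting):

```python
from onnxtr.io import DocumentFile
from onnxtr.models import ocr_predictor

model = ocr_predictor()
doc = DocumentFile.from_images(["path/to/page1.jpg"])
result = model(doc)

page = result.export()["pages"][0]
for block in page["blocks"]:
    for line in block["lines"]:
        # Line/Block scores default to the mean of their words' scores (see the hunks above)
        print(line["objectness_score"], [word["objectness_score"] for word in line["words"]])
```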
{onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/io/pdf.py

@@ -15,7 +15,7 @@ __all__ = ["read_pdf"]

  def read_pdf(
  file: AbstractFile,
- scale: float = 2,
+ scale: int = 2,
  rgb_mode: bool = True,
  password: Optional[str] = None,
  **kwargs: Any,
@@ -38,5 +38,8 @@ def read_pdf(
  the list of pages decoded as numpy ndarray of shape H x W x C
  """
  # Rasterise pages to numpy ndarrays with pypdfium2
- pdf = pdfium.PdfDocument(file, password=password, autoclose=True)
- return [page.render(scale=scale, rev_byteorder=rgb_mode, **kwargs).to_numpy() for page in pdf]
+ pdf = pdfium.PdfDocument(file, password=password)
+ try:
+ return [page.render(scale=scale, rev_byteorder=rgb_mode, **kwargs).to_numpy() for page in pdf]
+ finally:
+ pdf.close()
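For reference, a minimal sketch of calling the updated `read_pdf` directly (module path as in this diff; `scale=2` renders at roughly 144 DPI, since pdfium's base scale of 1 corresponds to 72 DPI):

```python
from onnxtr.io.pdf import read_pdf

# Returns a list of H x W x C uint8 numpy arrays; the pypdfium2 document handle
# is now closed explicitly via the try/finally added above
pages = read_pdf("path/to/your/doc.pdf", scale=2, rgb_mode=True)
print(len(pages), pages[0].shape)
```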
{onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/__init__.py

@@ -1,3 +1,4 @@
+ from .engine import EngineConfig
  from .classification import *
  from .detection import *
  from .recognition import *
{onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/_utils.py

@@ -11,6 +11,8 @@ import cv2
  import numpy as np
  from langdetect import LangDetectException, detect_langs

+ from onnxtr.utils.geometry import rotate_image
+
  __all__ = ["estimate_orientation", "get_language"]


@@ -29,56 +31,91 @@ def get_max_width_length_ratio(contour: np.ndarray) -> float:
  return max(w / h, h / w)


- def estimate_orientation(img: np.ndarray, n_ct: int = 50, ratio_threshold_for_lines: float = 5) -> int:
+ def estimate_orientation(
+ img: np.ndarray,
+ general_page_orientation: Optional[Tuple[int, float]] = None,
+ n_ct: int = 70,
+ ratio_threshold_for_lines: float = 3,
+ min_confidence: float = 0.2,
+ lower_area: int = 100,
+ ) -> int:
  """Estimate the angle of the general document orientation based on the
  lines of the document and the assumption that they should be horizontal.

  Args:
  ----
  img: the img or bitmap to analyze (H, W, C)
+ general_page_orientation: the general orientation of the page (angle [0, 90, 180, 270 (-90)], confidence)
+ estimated by a model
  n_ct: the number of contours used for the orientation estimation
  ratio_threshold_for_lines: this is the ratio w/h used to discriminates lines
+ min_confidence: the minimum confidence to consider the general_page_orientation
+ lower_area: the minimum area of a contour to be considered

  Returns:
  -------
- the angle of the general document orientation
+ the estimated angle of the page (clockwise, negative for left side rotation, positive for right side rotation)
  """
  assert len(img.shape) == 3 and img.shape[-1] in [1, 3], f"Image shape {img.shape} not supported"
- max_value = np.max(img)
- min_value = np.min(img)
- if max_value <= 1 and min_value >= 0 or (max_value <= 255 and min_value >= 0 and img.shape[-1] == 1):
- thresh = img.astype(np.uint8)
- if max_value <= 255 and min_value >= 0 and img.shape[-1] == 3:
+ thresh = None
+ # Convert image to grayscale if necessary
+ if img.shape[-1] == 3:
  gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
  gray_img = cv2.medianBlur(gray_img, 5)
- thresh = cv2.threshold(gray_img, thresh=0, maxval=255, type=cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1] # type: ignore[assignment]
-
- # try to merge words in lines
- (h, w) = img.shape[:2]
- k_x = max(1, (floor(w / 100)))
- k_y = max(1, (floor(h / 100)))
- kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (k_x, k_y))
- thresh = cv2.dilate(thresh, kernel, iterations=1) # type: ignore[assignment]
+ thresh = cv2.threshold(gray_img, thresh=0, maxval=255, type=cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
+ else:
+ thresh = img.astype(np.uint8) # type: ignore[assignment]
+
+ page_orientation, orientation_confidence = general_page_orientation or (None, 0.0)
+ if page_orientation and orientation_confidence >= min_confidence:
+ # We rotate the image to the general orientation which improves the detection
+ # No expand needed bitmap is already padded
+ thresh = rotate_image(thresh, -page_orientation) # type: ignore
+ else: # That's only required if we do not work on the detection models bin map
+ # try to merge words in lines
+ (h, w) = img.shape[:2]
+ k_x = max(1, (floor(w / 100)))
+ k_y = max(1, (floor(h / 100)))
+ kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (k_x, k_y))
+ thresh = cv2.dilate(thresh, kernel, iterations=1)

  # extract contours
  contours, _ = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

- # Sort contours
- contours = sorted(contours, key=get_max_width_length_ratio, reverse=True)
+ # Filter & Sort contours
+ contours = sorted(
+ [contour for contour in contours if cv2.contourArea(contour) > lower_area],
+ key=get_max_width_length_ratio,
+ reverse=True,
+ )

  angles = []
  for contour in contours[:n_ct]:
- _, (w, h), angle = cv2.minAreaRect(contour)
+ _, (w, h), angle = cv2.minAreaRect(contour) # type: ignore[assignment]
  if w / h > ratio_threshold_for_lines: # select only contours with ratio like lines
  angles.append(angle)
  elif w / h < 1 / ratio_threshold_for_lines: # if lines are vertical, substract 90 degree
  angles.append(angle - 90)

  if len(angles) == 0:
- return 0 # in case no angles is found
+ estimated_angle = 0 # in case no angles is found
  else:
  median = -median_low(angles)
- return round(median) if abs(median) != 0 else 0
+ estimated_angle = -round(median) if abs(median) != 0 else 0
+
+ # combine with the general orientation and the estimated angle
+ if page_orientation and orientation_confidence >= min_confidence:
+ # special case where the estimated angle is mostly wrong:
+ # case 1: - and + swapped
+ # case 2: estimated angle is completely wrong
+ # so in this case we prefer the general page orientation
+ if abs(estimated_angle) == abs(page_orientation):
+ return page_orientation
+ estimated_angle = estimated_angle if page_orientation == 0 else page_orientation + estimated_angle
+ if estimated_angle > 180:
+ estimated_angle -= 360
+
+ return estimated_angle # return the clockwise angle (negative - left side rotation, positive - right side rotation)


  def rectify_crops(
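A sketch of calling the reworked `estimate_orientation`, with and without a model-predicted page orientation (the blank page array is only a placeholder):

```python
import numpy as np

from onnxtr.models._utils import estimate_orientation

page = np.zeros((1024, 768, 3), dtype=np.uint8)  # placeholder page image (H, W, C)

# Pure contour-based estimate, using the new n_ct / ratio_threshold_for_lines / lower_area defaults
angle = estimate_orientation(page)

# With a prior from an orientation classifier: (angle in [0, 90, 180, 270 (-90)], confidence);
# priors below min_confidence are ignored, otherwise they are combined with the contour estimate
angle = estimate_orientation(page, general_page_orientation=(90, 0.85), min_confidence=0.2)
print(angle)  # clockwise angle: negative = left-side rotation, positive = right-side rotation
```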
{onnxtr-0.1.2 → onnxtr-0.3.0}/onnxtr/models/builder.py

@@ -31,7 +31,7 @@ class DocumentBuilder(NestedObject):
  def __init__(
  self,
  resolve_lines: bool = True,
- resolve_blocks: bool = True,
+ resolve_blocks: bool = False,
  paragraph_break: float = 0.035,
  export_as_straight_boxes: bool = False,
  ) -> None:
@@ -223,6 +223,7 @@ class DocumentBuilder(NestedObject):
  def _build_blocks(
  self,
  boxes: np.ndarray,
+ objectness_scores: np.ndarray,
  word_preds: List[Tuple[str, float]],
  crop_orientations: List[Dict[str, Any]],
  ) -> List[Block]:
@@ -230,7 +231,8 @@

  Args:
  ----
- boxes: bounding boxes of all detected words of the page, of shape (N, 5) or (N, 4, 2)
+ boxes: bounding boxes of all detected words of the page, of shape (N, 4) or (N, 4, 2)
+ objectness_scores: objectness scores of all detected words of the page, of shape N
  word_preds: list of all detected words of the page, of shape N
  crop_orientations: list of dictoinaries containing
  the general orientation (orientations + confidences) of the crops
@@ -265,12 +267,14 @@
  Word(
  *word_preds[idx],
  tuple([tuple(pt) for pt in boxes[idx].tolist()]), # type: ignore[arg-type]
+ float(objectness_scores[idx]),
  crop_orientations[idx],
  )
  if boxes.ndim == 3
  else Word(
  *word_preds[idx],
  ((boxes[idx, 0], boxes[idx, 1]), (boxes[idx, 2], boxes[idx, 3])),
+ float(objectness_scores[idx]),
  crop_orientations[idx],
  )
  for idx in line
@@ -293,6 +297,7 @@
  self,
  pages: List[np.ndarray],
  boxes: List[np.ndarray],
+ objectness_scores: List[np.ndarray],
  text_preds: List[List[Tuple[str, float]]],
  page_shapes: List[Tuple[int, int]],
  crop_orientations: List[Dict[str, Any]],
@@ -304,8 +309,9 @@
  Args:
  ----
  pages: list of N elements, where each element represents the page image
- boxes: list of N elements, where each element represents the localization predictions, of shape (*, 5)
- or (*, 6) for all words for a given page
+ boxes: list of N elements, where each element represents the localization predictions, of shape (*, 4)
+ or (*, 4, 2) for all words for a given page
+ objectness_scores: list of N elements, where each element represents the objectness scores
  text_preds: list of N elements, where each element is the list of all word prediction (text + confidence)
  page_shapes: shape of each page, of size N
  crop_orientations: list of N elements, where each element is
@@ -319,9 +325,9 @@
  -------
  document object
  """
- if len(boxes) != len(text_preds) != len(crop_orientations) or len(boxes) != len(page_shapes) != len(
- crop_orientations
- ):
+ if len(boxes) != len(text_preds) != len(crop_orientations) != len(objectness_scores) or len(boxes) != len(
+ page_shapes
+ ) != len(crop_orientations) != len(objectness_scores):
  raise ValueError("All arguments are expected to be lists of the same size")

  _orientations = (
@@ -339,6 +345,7 @@
  page,
  self._build_blocks(
  page_boxes,
+ loc_scores,
  word_preds,
  word_crop_orientations,
  ),
@@ -347,8 +354,16 @@
  orientation,
  language,
  )
- for page, _idx, shape, page_boxes, word_preds, word_crop_orientations, orientation, language in zip(
- pages, range(len(boxes)), page_shapes, boxes, text_preds, crop_orientations, _orientations, _languages
+ for page, _idx, shape, page_boxes, loc_scores, word_preds, word_crop_orientations, orientation, language in zip( # noqa: E501
+ pages,
+ range(len(boxes)),
+ page_shapes,
+ boxes,
+ objectness_scores,
+ text_preds,
+ crop_orientations,
+ _orientations,
+ _languages,
  )
  ]
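To show how the new `objectness_scores` argument propagates into the elements, a sketch constructing them by hand in the same positional order `_build_blocks` uses, i.e. `Word(value, confidence, geometry, objectness_score, crop_orientation)`; the geometry values and the crop-orientation dict keys are illustrative:

```python
from onnxtr.io.elements import Line, Word

words = [
    Word("Hello", 0.99, ((0.10, 0.10), (0.25, 0.14)), 0.90, {"value": 0, "confidence": 1.0}),
    Word("world", 0.97, ((0.27, 0.10), (0.40, 0.14)), 0.80, {"value": 0, "confidence": 1.0}),
]

line = Line(words)  # objectness_score defaults to the mean of the word scores (0.90 and 0.80)
print(line.objectness_score, line.render())
```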