PyPI - docling - Versions diffs - 2.51.0__tar.gz → 2.53.0__tar.gz - Mend


This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.


Files changed (144)

{docling-2.51.0 → docling-2.53.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: docling
-Version: 2.51.0
+Version: 2.53.0
 Summary: SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for powering downstream workflows such as gen AI applications.
 Author-email: Christoph Auer <cau@zurich.ibm.com>, Michele Dolfi <dol@zurich.ibm.com>, Maxim Lysak <mly@zurich.ibm.com>, Nikos Livathinos <nli@zurich.ibm.com>, Ahmed Nassar <ahn@zurich.ibm.com>, Panos Vagenas <pva@zurich.ibm.com>, Peter Staar <taa@zurich.ibm.com>
 License-Expression: MIT
@@ -26,7 +26,7 @@ Requires-Python: <4.0,>=3.9
 Description-Content-Type: text/markdown
 License-File: LICENSE
 Requires-Dist: pydantic<3.0.0,>=2.0.0
-Requires-Dist: docling-core[chunking]<3.0.0,>=2.42.0
+Requires-Dist: docling-core[chunking]<3.0.0,>=2.48.0
 Requires-Dist: docling-parse<5.0.0,>=4.4.0
 Requires-Dist: docling-ibm-models<4,>=3.9.1
 Requires-Dist: filetype<2.0.0,>=1.2.0
@@ -108,18 +108,22 @@ Docling simplifies document processing, parsing diverse formats — including ad
 * 🔒 Local execution capabilities for sensitive data and air-gapped environments
 * 🤖 Plug-and-play [integrations][integrations] incl. LangChain, LlamaIndex, Crew AI & Haystack for agentic AI
 * 🔍 Extensive OCR support for scanned PDFs and images
-* 👓 Support of several Visual Language Models ([SmolDocling](https://huggingface.co/ds4sd/SmolDocling-256M-preview))
+* 👓 Support of several Visual Language Models ([GraniteDocling](https://huggingface.co/ibm-granite/granite-docling-258M))
 * 🎙️ Audio support with Automatic Speech Recognition (ASR) models
+* 🔌 Connect to any agent using the [MCP server](https://docling-project.github.io/docling/usage/mcp/)
 * 💻 Simple and convenient CLI
 ### What's new
 * 📤 Structured [information extraction][extraction] \[🧪 beta\]
+* 📑 New layout model (**Heron**) by default, for faster PDF parsing
+* 🔌 [MCP server](https://docling-project.github.io/docling/usage/mcp/) for agentic applications
 ### Coming soon
 * 📝 Metadata extraction, including title, authors, references & language
 * 📝 Chart understanding (Barchart, Piechart, LinePlot, etc)
 * 📝 Complex chemistry understanding (Molecular structures)
+* 📝 Parsing of Web Video Text Tracks (WebVTT) files
 ## Installation
@@ -145,7 +149,7 @@ result = converter.convert(source)
 print(result.document.export_to_markdown())  # output: "## Docling Technical Report[...]"
 ```
-More [advanced usage options](https://docling-project.github.io/docling/usage/) are available in
+More [advanced usage options](https://docling-project.github.io/docling/usage/advanced_options/) are available in
 the docs.
 ## CLI
@@ -156,9 +160,9 @@ Docling has a built-in CLI to run conversions.
 docling https://arxiv.org/pdf/2206.01062
 ```
-You can also use 🥚[SmolDocling](https://huggingface.co/ds4sd/SmolDocling-256M-preview) and other VLMs via Docling CLI:
+You can also use 🥚[GraniteDocling](https://huggingface.co/ibm-granite/granite-docling-258M) and other VLMs via Docling CLI:
 ```bash
-docling --pipeline vlm --vlm-model smoldocling https://arxiv.org/pdf/2206.01062
+docling --pipeline vlm --vlm-model granite_docling https://arxiv.org/pdf/2206.01062
 ```
 This will use MLX acceleration on supported Apple Silicon hardware.

{docling-2.51.0 → docling-2.53.0}/README.md RENAMED

@@ -36,18 +36,22 @@ Docling simplifies document processing, parsing diverse formats — including ad
 * 🔒 Local execution capabilities for sensitive data and air-gapped environments
 * 🤖 Plug-and-play [integrations][integrations] incl. LangChain, LlamaIndex, Crew AI & Haystack for agentic AI
 * 🔍 Extensive OCR support for scanned PDFs and images
-* 👓 Support of several Visual Language Models ([SmolDocling](https://huggingface.co/ds4sd/SmolDocling-256M-preview))
+* 👓 Support of several Visual Language Models ([GraniteDocling](https://huggingface.co/ibm-granite/granite-docling-258M))
 * 🎙️ Audio support with Automatic Speech Recognition (ASR) models
+* 🔌 Connect to any agent using the [MCP server](https://docling-project.github.io/docling/usage/mcp/)
 * 💻 Simple and convenient CLI
 ### What's new
 * 📤 Structured [information extraction][extraction] \[🧪 beta\]
+* 📑 New layout model (**Heron**) by default, for faster PDF parsing
+* 🔌 [MCP server](https://docling-project.github.io/docling/usage/mcp/) for agentic applications
 ### Coming soon
 * 📝 Metadata extraction, including title, authors, references & language
 * 📝 Chart understanding (Barchart, Piechart, LinePlot, etc)
 * 📝 Complex chemistry understanding (Molecular structures)
+* 📝 Parsing of Web Video Text Tracks (WebVTT) files
 ## Installation
@@ -73,7 +77,7 @@ result = converter.convert(source)
 print(result.document.export_to_markdown())  # output: "## Docling Technical Report[...]"
 ```
-More [advanced usage options](https://docling-project.github.io/docling/usage/) are available in
+More [advanced usage options](https://docling-project.github.io/docling/usage/advanced_options/) are available in
 the docs.
 ## CLI
@@ -84,9 +88,9 @@ Docling has a built-in CLI to run conversions.
 docling https://arxiv.org/pdf/2206.01062
 ```
-You can also use 🥚[SmolDocling](https://huggingface.co/ds4sd/SmolDocling-256M-preview) and other VLMs via Docling CLI:
+You can also use 🥚[GraniteDocling](https://huggingface.co/ibm-granite/granite-docling-258M) and other VLMs via Docling CLI:
 ```bash
-docling --pipeline vlm --vlm-model smoldocling https://arxiv.org/pdf/2206.01062
+docling --pipeline vlm --vlm-model granite_docling https://arxiv.org/pdf/2206.01062
 ```
 This will use MLX acceleration on supported Apple Silicon hardware.
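The CLI invocation above can also be driven from a Python batch script. The following is a minimal sketch, assuming the `docling` executable is on `PATH`. The helper names `build_vlm_command` and `run_conversion` are hypothetical, and the sketch uses only the `--pipeline vlm` and `--vlm-model granite_docling` flags shown above:

```python
import shutil
import subprocess
from typing import List


def build_vlm_command(source: str, vlm_model: str = "granite_docling") -> List[str]:
    """Build the argument list for a VLM-pipeline conversion (hypothetical helper)."""
    # Flags mirror the CLI example above; vlm_model must be a valid --vlm-model
    # value, e.g. "granite_docling" (new default) or "smoldocling".
    return ["docling", "--pipeline", "vlm", "--vlm-model", vlm_model, source]


def run_conversion(source: str) -> int:
    """Run the conversion if the docling CLI is installed; return its exit code."""
    if shutil.which("docling") is None:
        raise FileNotFoundError("the docling CLI is not on PATH")
    # Passing a list (not a shell string) avoids shell-quoting problems with URLs.
    return subprocess.run(build_vlm_command(source), check=False).returncode


if __name__ == "__main__":
    print(" ".join(build_vlm_command("https://arxiv.org/pdf/2206.01062")))
```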

{docling-2.51.0 → docling-2.53.0}/docling/cli/main.py RENAMED

@@ -48,6 +48,7 @@ from docling.datamodel.base_models import (
 from docling.datamodel.document import ConversionResult
 from docling.datamodel.pipeline_options import (
     AsrPipelineOptions,
+    ConvertPipelineOptions,
     EasyOcrOptions,
     OcrOptions,
     PaginatedPipelineOptions,
@@ -63,6 +64,8 @@ from docling.datamodel.vlm_model_specs import (
     GOT2_TRANSFORMERS,
     GRANITE_VISION_OLLAMA,
     GRANITE_VISION_TRANSFORMERS,
+    GRANITEDOCLING_MLX,
+    GRANITEDOCLING_TRANSFORMERS,
     SMOLDOCLING_MLX,
     SMOLDOCLING_TRANSFORMERS,
     SMOLDOCLING_VLLM,
@@ -71,8 +74,13 @@ from docling.datamodel.vlm_model_specs import (
 from docling.document_converter import (
     AudioFormatOption,
     DocumentConverter,
+    ExcelFormatOption,
     FormatOption,
+    HTMLFormatOption,
+    MarkdownFormatOption,
     PdfFormatOption,
+    PowerpointFormatOption,
+    WordFormatOption,
 )
 from docling.models.factories import get_ocr_factory
 from docling.pipeline.asr_pipeline import AsrPipeline
@@ -328,7 +336,7 @@ def convert(  # noqa: C901
     vlm_model: Annotated[
         VlmModelType,
         typer.Option(..., help="Choose the VLM model to use with PDF or image files."),
-    ] = VlmModelType.SMOLDOCLING,
+    ] = VlmModelType.GRANITEDOCLING,
     asr_model: Annotated[
         AsrModelType,
         typer.Option(..., help="Choose the ASR model to use with audio/video files."),
@@ -626,10 +634,33 @@ def convert(  # noqa: C901
                 backend=MetsGbsDocumentBackend,
             )
+            # SimplePipeline options
+            simple_format_option = ConvertPipelineOptions(
+                do_picture_description=enrich_picture_description,
+                do_picture_classification=enrich_picture_classes,
+            )
+            if artifacts_path is not None:
+                simple_format_option.artifacts_path = artifacts_path
             format_options = {
                 InputFormat.PDF: pdf_format_option,
                 InputFormat.IMAGE: pdf_format_option,
                 InputFormat.METS_GBS: mets_gbs_format_option,
+                InputFormat.DOCX: WordFormatOption(
+                    pipeline_options=simple_format_option
+                ),
+                InputFormat.PPTX: PowerpointFormatOption(
+                    pipeline_options=simple_format_option
+                ),
+                InputFormat.XLSX: ExcelFormatOption(
+                    pipeline_options=simple_format_option
+                ),
+                InputFormat.HTML: HTMLFormatOption(
+                    pipeline_options=simple_format_option
+                ),
+                InputFormat.MD: MarkdownFormatOption(
+                    pipeline_options=simple_format_option
+                ),
             }
         elif pipeline == ProcessingPipeline.VLM:
@@ -655,6 +686,18 @@ def convert(  # noqa: C901
                             "To run SmolDocling faster, please install mlx-vlm:\n"
                             "pip install mlx-vlm"
                         )
+            elif vlm_model == VlmModelType.GRANITEDOCLING:
+                pipeline_options.vlm_options = GRANITEDOCLING_TRANSFORMERS
+                if sys.platform == "darwin":
+                    try:
+                        import mlx_vlm
+                        pipeline_options.vlm_options = GRANITEDOCLING_MLX
+                    except ImportError:
+                        _log.warning(
+                            "To run GraniteDocling faster, please install mlx-vlm:\n"
+                            "pip install mlx-vlm"
+                        )
             elif vlm_model == VlmModelType.SMOLDOCLING_VLLM:
                 pipeline_options.vlm_options = SMOLDOCLING_VLLM
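The new `GRANITEDOCLING` branch follows the same pattern as the SmolDocling branch. It defaults to the Transformers spec and switches to the MLX spec only on macOS when `mlx_vlm` is available. Below is a stdlib-only sketch of that decision. Spec names are returned as plain strings, and `pick_granitedocling_spec` is a hypothetical name. Unlike the diff, the sketch does not import the package; it probes for it with `importlib.util.find_spec`:

```python
import importlib.util
import logging
import sys
from typing import Optional

_log = logging.getLogger(__name__)


def pick_granitedocling_spec(
    platform: Optional[str] = None, mlx_available: Optional[bool] = None
) -> str:
    """Return the name of the GraniteDocling spec to use (hypothetical helper)."""
    platform = sys.platform if platform is None else platform
    # Transformers is the portable default (CPU/CUDA).
    spec = "GRANITEDOCLING_TRANSFORMERS"
    if platform == "darwin":
        if mlx_available is None:
            # find_spec reports whether mlx_vlm is installed without importing it.
            mlx_available = importlib.util.find_spec("mlx_vlm") is not None
        if mlx_available:
            spec = "GRANITEDOCLING_MLX"
        else:
            _log.warning("To run GraniteDocling faster, please install mlx-vlm")
    return spec
```

Passing `platform` and `mlx_available` explicitly makes the branch testable on any machine.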

{docling-2.51.0 → docling-2.53.0}/docling/cli/models.py RENAMED

@@ -33,6 +33,8 @@ class _AvailableModels(str, Enum):
     CODE_FORMULA = "code_formula"
     PICTURE_CLASSIFIER = "picture_classifier"
     SMOLVLM = "smolvlm"
+    GRANITEDOCLING = "granitedocling"
+    GRANITEDOCLING_MLX = "granitedocling_mlx"
     SMOLDOCLING = "smoldocling"
     SMOLDOCLING_MLX = "smoldocling_mlx"
     GRANITE_VISION = "granite_vision"
@@ -108,6 +110,8 @@ def download(
         with_code_formula=_AvailableModels.CODE_FORMULA in to_download,
         with_picture_classifier=_AvailableModels.PICTURE_CLASSIFIER in to_download,
         with_smolvlm=_AvailableModels.SMOLVLM in to_download,
+        with_granitedocling=_AvailableModels.GRANITEDOCLING in to_download,
+        with_granitedocling_mlx=_AvailableModels.GRANITEDOCLING_MLX in to_download,
         with_smoldocling=_AvailableModels.SMOLDOCLING in to_download,
         with_smoldocling_mlx=_AvailableModels.SMOLDOCLING_MLX in to_download,
         with_granite_vision=_AvailableModels.GRANITE_VISION in to_download,
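`download` converts the set of requested model names into one boolean keyword per model. Because `_AvailableModels` subclasses `str`, plain strings can be turned into enum members directly. Below is a reduced sketch of that mapping. It assumes a trimmed enum and uses a hypothetical `download_kwargs` helper in place of the real downloader call:

```python
from enum import Enum
from typing import Dict, Iterable


class AvailableModels(str, Enum):
    # Trimmed copy of the enum, for illustration only.
    SMOLDOCLING = "smoldocling"
    SMOLDOCLING_MLX = "smoldocling_mlx"
    GRANITEDOCLING = "granitedocling"
    GRANITEDOCLING_MLX = "granitedocling_mlx"


def download_kwargs(requested: Iterable[str]) -> Dict[str, bool]:
    """Map requested model names to with_<name>=True/False flags (hypothetical)."""
    # AvailableModels("...") raises ValueError for an unknown name.
    to_download = {AvailableModels(name) for name in requested}
    return {f"with_{m.value}": m in to_download for m in AvailableModels}


if __name__ == "__main__":
    print(download_kwargs(["granitedocling"]))
```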

{docling-2.51.0 → docling-2.53.0}/docling/datamodel/pipeline_options.py RENAMED

@@ -12,7 +12,7 @@ from pydantic import (
 )
 from typing_extensions import deprecated
-from docling.datamodel import asr_model_specs
+from docling.datamodel import asr_model_specs, vlm_model_specs
 # Import the following for backwards compatibility
 from docling.datamodel.accelerator_options import AcceleratorDevice, AcceleratorOptions
@@ -114,7 +114,11 @@ class RapidOcrOptions(OcrOptions):
     cls_model_path: Optional[str] = None  # same default as rapidocr
     rec_model_path: Optional[str] = None  # same default as rapidocr
     rec_keys_path: Optional[str] = None  # same default as rapidocr
-    rec_font_path: Optional[str] = None  # same default as rapidocr
+    rec_font_path: Optional[str] = None  # Deprecated, please use font_path instead
+    font_path: Optional[str] = None  # same default as rapidocr
+    # Dictionary to overwrite or pass-through additional parameters
+    rapidocr_params: Dict[str, Any] = Field(default_factory=dict)
     model_config = ConfigDict(
         extra="forbid",
@@ -135,6 +139,8 @@ class EasyOcrOptions(OcrOptions):
     recog_network: Optional[str] = "standard"
     download_enabled: bool = True
+    suppress_mps_warnings: bool = True
     model_config = ConfigDict(
         extra="forbid",
         protected_namespaces=(),
@@ -257,11 +263,21 @@ class PipelineOptions(BaseOptions):
     accelerator_options: AcceleratorOptions = AcceleratorOptions()
     enable_remote_services: bool = False
     allow_external_plugins: bool = False
+    artifacts_path: Optional[Union[Path, str]] = None
-class PaginatedPipelineOptions(PipelineOptions):
-    artifacts_path: Optional[Union[Path, str]] = None
+class ConvertPipelineOptions(PipelineOptions):
+    """Base convert pipeline options."""
+    do_picture_classification: bool = False  # True: classify pictures in documents
+    do_picture_description: bool = False  # True: run describe pictures in documents
+    picture_description_options: PictureDescriptionBaseOptions = (
+        smolvlm_picture_description
+    )
+class PaginatedPipelineOptions(ConvertPipelineOptions):
     images_scale: float = 1.0
     generate_page_images: bool = False
     generate_picture_images: bool = False
@@ -274,7 +290,7 @@ class VlmPipelineOptions(PaginatedPipelineOptions):
     )
     # If True, text from backend will be used instead of generated text
     vlm_options: Union[InlineVlmOptions, ApiVlmOptions] = (
-        smoldocling_vlm_conversion_options
+        vlm_model_specs.GRANITEDOCLING_TRANSFORMERS
     )
@@ -293,13 +309,11 @@ class LayoutOptions(BaseModel):
 class AsrPipelineOptions(PipelineOptions):
     asr_options: Union[InlineAsrOptions] = asr_model_specs.WHISPER_TINY
-    artifacts_path: Optional[Union[Path, str]] = None
 class VlmExtractionPipelineOptions(PipelineOptions):
     """Options for extraction pipeline."""
-    artifacts_path: Optional[Union[Path, str]] = None
     vlm_options: Union[InlineVlmOptions] = NU_EXTRACT_2B_TRANSFORMERS
@@ -310,8 +324,6 @@ class PdfPipelineOptions(PaginatedPipelineOptions):
     do_ocr: bool = True  # True: perform OCR, replace programmatic PDF text
     do_code_enrichment: bool = False  # True: perform code OCR
     do_formula_enrichment: bool = False  # True: perform formula OCR, return Latex code
-    do_picture_classification: bool = False  # True: classify pictures in documents
-    do_picture_description: bool = False  # True: run describe pictures in documents
     force_backend_text: bool = (
         False  # (To be used with vlms, or other generative models)
     )
@@ -319,9 +331,6 @@ class PdfPipelineOptions(PaginatedPipelineOptions):
     table_structure_options: TableStructureOptions = TableStructureOptions()
     ocr_options: OcrOptions = EasyOcrOptions()
-    picture_description_options: PictureDescriptionBaseOptions = (
-        smolvlm_picture_description
-    )
     layout_options: LayoutOptions = LayoutOptions()
     images_scale: float = 1.0
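This refactor moves `artifacts_path` up into `PipelineOptions`. It also moves the picture-enrichment flags out of `PdfPipelineOptions` into the new `ConvertPipelineOptions`. As a result, non-paginated formats can request picture classification and description too. The resulting hierarchy can be sketched with plain dataclasses. The sketch mirrors only a few fields, uses strings as placeholders for the real option objects (such as `smolvlm_picture_description`), and does not reproduce the pydantic models themselves:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Union


@dataclass
class PipelineOptions:
    # Now declared once here instead of in each subclass.
    artifacts_path: Optional[Union[Path, str]] = None


@dataclass
class ConvertPipelineOptions(PipelineOptions):
    do_picture_classification: bool = False
    do_picture_description: bool = False
    picture_description_options: str = "smolvlm_picture_description"  # placeholder


@dataclass
class PaginatedPipelineOptions(ConvertPipelineOptions):
    images_scale: float = 1.0


@dataclass
class PdfPipelineOptions(PaginatedPipelineOptions):
    do_ocr: bool = True


@dataclass
class AsrPipelineOptions(PipelineOptions):
    asr_options: str = "WHISPER_TINY"  # placeholder for the real spec
```

The CLI hunk relies on this layout. A single `ConvertPipelineOptions` instance is passed to the Word, PowerPoint, Excel, HTML, and Markdown format options, while PDF input continues to use the paginated subclass.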

{docling-2.51.0 → docling-2.53.0}/docling/datamodel/vlm_model_specs.py RENAMED

@@ -18,6 +18,35 @@ from docling.datamodel.pipeline_options_vlm_model import (
 _log = logging.getLogger(__name__)
+# Granite-Docling
+GRANITEDOCLING_TRANSFORMERS = InlineVlmOptions(
+    repo_id="ibm-granite/granite-docling-258M",
+    prompt="Convert this page to docling.",
+    response_format=ResponseFormat.DOCTAGS,
+    inference_framework=InferenceFramework.TRANSFORMERS,
+    transformers_model_type=TransformersModelType.AUTOMODEL_IMAGETEXTTOTEXT,
+    supported_devices=[
+        AcceleratorDevice.CPU,
+        AcceleratorDevice.CUDA,
+    ],
+    scale=2.0,
+    temperature=0.0,
+    max_new_tokens=8192,
+    stop_strings=["</doctag>", "<|end_of_text|>"],
+)
+GRANITEDOCLING_MLX = InlineVlmOptions(
+    repo_id="ibm-granite/granite-docling-258M-mlx",
+    prompt="Convert this page to docling.",
+    response_format=ResponseFormat.DOCTAGS,
+    inference_framework=InferenceFramework.MLX,
+    supported_devices=[AcceleratorDevice.MPS],
+    scale=2.0,
+    temperature=0.0,
+    max_new_tokens=8192,
+    stop_strings=["</doctag>", "<|end_of_text|>"],
+)
 # SmolDocling
 SMOLDOCLING_MLX = InlineVlmOptions(
     repo_id="ds4sd/SmolDocling-256M-preview-mlx-bf16",
@@ -272,3 +301,4 @@ class VlmModelType(str, Enum):
     GRANITE_VISION_VLLM = "granite_vision_vllm"
     GRANITE_VISION_OLLAMA = "granite_vision_ollama"
     GOT_OCR_2 = "got_ocr_2"
+    GRANITEDOCLING = "granite_docling"
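The two GraniteDocling specs differ only in checkpoint, inference framework, and supported devices. `VlmModelType.GRANITEDOCLING` carries the string `granite_docling` that `--vlm-model` accepts. Below is a sketch of that structure. A frozen dataclass stands in for `InlineVlmOptions` with a subset of its fields, and plain device strings replace `AcceleratorDevice`. `VlmSpec` and `specs_for_device` are hypothetical names:

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class VlmModelType(str, Enum):
    # Trimmed: the real enum has more members.
    SMOLDOCLING = "smoldocling"
    GRANITEDOCLING = "granite_docling"


@dataclass(frozen=True)
class VlmSpec:
    # Stand-in for InlineVlmOptions with a subset of its fields.
    repo_id: str
    framework: str
    supported_devices: Tuple[str, ...]
    prompt: str = "Convert this page to docling."
    scale: float = 2.0
    max_new_tokens: int = 8192


GRANITEDOCLING_TRANSFORMERS = VlmSpec(
    "ibm-granite/granite-docling-258M", "transformers", ("cpu", "cuda")
)
GRANITEDOCLING_MLX = VlmSpec("ibm-granite/granite-docling-258M-mlx", "mlx", ("mps",))


def specs_for_device(device: str) -> List[VlmSpec]:
    """Return the GraniteDocling specs that list `device` as supported."""
    candidates = [GRANITEDOCLING_TRANSFORMERS, GRANITEDOCLING_MLX]
    return [s for s in candidates if device in s.supported_devices]
```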

{docling-2.51.0 → docling-2.53.0}/docling/models/base_model.py RENAMED

@@ -4,7 +4,13 @@ from collections.abc import Iterable
 from typing import Any, Generic, Optional, Protocol, Type, Union
 import numpy as np
-from docling_core.types.doc import BoundingBox, DocItem, DoclingDocument, NodeItem
+from docling_core.types.doc import (
+    BoundingBox,
+    DocItem,
+    DoclingDocument,
+    NodeItem,
+    PictureItem,
+)
 from PIL.Image import Image
 from typing_extensions import TypeVar
@@ -164,8 +170,17 @@ class BaseItemAndImageEnrichmentModel(
             return None
         assert isinstance(element, DocItem)
-        element_prov = element.prov[0]
+        # Allow the case of documents without page images but embedded images (e.g. Word and HTML docs)
+        if len(element.prov) == 0 and isinstance(element, PictureItem):
+            embedded_im = element.get_image(conv_res.document)
+            if embedded_im is not None:
+                return ItemAndImageEnrichmentElement(item=element, image=embedded_im)
+            else:
+                return None
+        # Crop the image form the page
+        element_prov = element.prov[0]
         bbox = element_prov.bbox
         width = bbox.r - bbox.l
         height = bbox.t - bbox.b
@@ -183,4 +198,14 @@ class BaseItemAndImageEnrichmentModel(
         cropped_image = conv_res.pages[page_ix].get_image(
             scale=self.images_scale, cropbox=expanded_bbox
         )
+        # Allow for images being embedded without the page backend or page images
+        if cropped_image is None and isinstance(element, PictureItem):
+            embedded_im = element.get_image(conv_res.document)
+            if embedded_im is not None:
+                return ItemAndImageEnrichmentElement(item=element, image=embedded_im)
+            else:
+                return None
+        # Return the proper cropped image
         return ItemAndImageEnrichmentElement(item=element, image=cropped_image)
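With this change, a `PictureItem` falls back to the image embedded in the document in two cases: when it has no page provenance (as with Word or HTML input), or when its page crop comes back empty. The control flow is sketched below. `Item`, the `crop` callable, and the returned tuples are simplified, hypothetical stand-ins, not docling types. The sketch makes two further simplifications. It omits the bounding-box expansion. It also returns `None` for a non-picture item without provenance, whereas the diff would still index `prov[0]` in that case:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Image = str  # stand-in for PIL.Image.Image in this sketch
BBox = Tuple[float, float, float, float]  # (l, t, r, b)


@dataclass
class Item:
    is_picture: bool
    prov: List[Tuple[int, BBox]] = field(default_factory=list)  # (page index, bbox)
    embedded_image: Optional[Image] = None


def prepare_element(
    item: Item, crop: Callable[[int, BBox], Optional[Image]]
) -> Optional[Tuple[Item, Optional[Image]]]:
    # No provenance (e.g. Word/HTML input): use the embedded picture, if any.
    if not item.prov:
        if item.is_picture and item.embedded_image is not None:
            return (item, item.embedded_image)
        return None
    # Otherwise crop the region from the page image.
    page_ix, bbox = item.prov[0]
    cropped = crop(page_ix, bbox)
    # No page image available: fall back to the embedded picture, if any.
    if cropped is None and item.is_picture:
        if item.embedded_image is not None:
            return (item, item.embedded_image)
        return None
    return (item, cropped)
```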

{docling-2.51.0 → docling-2.53.0}/docling/models/easyocr_model.py RENAMED

@@ -78,14 +78,17 @@ class EasyOcrModel(BaseOcrModel):
                 download_enabled = False
                 model_storage_directory = str(artifacts_path / self._model_repo_folder)
-            self.reader = easyocr.Reader(
-                lang_list=self.options.lang,
-                gpu=use_gpu,
-                model_storage_directory=model_storage_directory,
-                recog_network=self.options.recog_network,
-                download_enabled=download_enabled,
-                verbose=False,
-            )
+            with warnings.catch_warnings():
+                if self.options.suppress_mps_warnings:
+                    warnings.filterwarnings("ignore", message=".*pin_memory.*MPS.*")
+                self.reader = easyocr.Reader(
+                    lang_list=self.options.lang,
+                    gpu=use_gpu,
+                    model_storage_directory=model_storage_directory,
+                    recog_network=self.options.recog_network,
+                    download_enabled=download_enabled,
+                    verbose=False,
+                )
     @staticmethod
     def download_models(
@@ -147,7 +150,14 @@ class EasyOcrModel(BaseOcrModel):
                             scale=self.scale, cropbox=ocr_rect
                         )
                         im = numpy.array(high_res_image)
-                        result = self.reader.readtext(im)
+                        with warnings.catch_warnings():
+                            if self.options.suppress_mps_warnings:
+                                warnings.filterwarnings(
+                                    "ignore", message=".*pin_memory.*MPS.*"
+                                )
+                            result = self.reader.readtext(im)
                         del high_res_image
                         del im
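`suppress_mps_warnings` scopes a message filter to the EasyOCR calls using `warnings.catch_warnings()`, which restores the process-wide filter list on exit. The same pattern in isolation is shown below. `noisy_call` is a hypothetical stand-in for `easyocr.Reader(...)` or `readtext(...)`, and its warning text is only an approximation written to match the regex:

```python
import warnings


def noisy_call() -> str:
    # Hypothetical stand-in that emits a pin_memory/MPS-style warning and another one.
    warnings.warn("'pin_memory' argument is set as true but not supported on MPS now")
    warnings.warn("an unrelated warning")
    return "ok"


def call_quietly(suppress_mps_warnings: bool = True) -> str:
    with warnings.catch_warnings():
        if suppress_mps_warnings:
            # `message` is a regex matched case-insensitively against the
            # start of the warning text, hence the leading ".*".
            warnings.filterwarnings("ignore", message=".*pin_memory.*MPS.*")
        return noisy_call()
```

Only the matching warning is silenced. Unrelated warnings still propagate, and the filter does not leak outside the `with` block.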

{docling-2.51.0 → docling-2.53.0}/docling/models/picture_description_vlm_model.py RENAMED

@@ -67,7 +67,7 @@ class PictureDescriptionVlmModel(
                 self.model = AutoModelForImageTextToText.from_pretrained(
                     artifacts_path,
                     device_map=self.device,
-                    torch_dtype=torch.bfloat16,
+                    dtype=torch.bfloat16,
                     _attn_implementation=(
                         "flash_attention_2"
                         if self.device.startswith("cuda")

{docling-2.51.0 → docling-2.53.0}/docling/models/rapid_ocr_model.py RENAMED

@@ -62,32 +62,44 @@ class RapidOcrModel(BaseOcrModel):
             }
             backend_enum = _ALIASES.get(self.options.backend, EngineType.ONNXRUNTIME)
+            params = {
+                # Global settings (these are still correct)
+                "Global.text_score": self.options.text_score,
+                "Global.font_path": self.options.font_path,
+                # "Global.verbose": self.options.print_verbose,
+                # Detection model settings
+                "Det.model_path": self.options.det_model_path,
+                "Det.use_cuda": use_cuda,
+                "Det.use_dml": use_dml,
+                "Det.intra_op_num_threads": intra_op_num_threads,
+                # Classification model settings
+                "Cls.model_path": self.options.cls_model_path,
+                "Cls.use_cuda": use_cuda,
+                "Cls.use_dml": use_dml,
+                "Cls.intra_op_num_threads": intra_op_num_threads,
+                # Recognition model settings
+                "Rec.model_path": self.options.rec_model_path,
+                "Rec.font_path": self.options.rec_font_path,
+                "Rec.keys_path": self.options.rec_keys_path,
+                "Rec.use_cuda": use_cuda,
+                "Rec.use_dml": use_dml,
+                "Rec.intra_op_num_threads": intra_op_num_threads,
+                "Det.engine_type": backend_enum,
+                "Cls.engine_type": backend_enum,
+                "Rec.engine_type": backend_enum,
+            }
+            if self.options.rec_font_path is not None:
+                _log.warning(
+                    "The 'rec_font_path' option for RapidOCR is deprecated. Please use 'font_path' instead."
+                )
+            user_params = self.options.rapidocr_params
+            if user_params:
+                _log.debug("Overwriting RapidOCR params with user-provided values.")
+                params.update(user_params)
             self.reader = RapidOCR(
-                params={
-                    # Global settings (these are still correct)
-                    "Global.text_score": self.options.text_score,
-                    # "Global.verbose": self.options.print_verbose,
-                    # Detection model settings
-                    "Det.model_path": self.options.det_model_path,
-                    "Det.use_cuda": use_cuda,
-                    "Det.use_dml": use_dml,
-                    "Det.intra_op_num_threads": intra_op_num_threads,
-                    # Classification model settings
-                    "Cls.model_path": self.options.cls_model_path,
-                    "Cls.use_cuda": use_cuda,
-                    "Cls.use_dml": use_dml,
-                    "Cls.intra_op_num_threads": intra_op_num_threads,
-                    # Recognition model settings
-                    "Rec.model_path": self.options.rec_model_path,
-                    "Rec.font_path": self.options.rec_font_path,
-                    "Rec.keys_path": self.options.rec_keys_path,
-                    "Rec.use_cuda": use_cuda,
-                    "Rec.use_dml": use_dml,
-                    "Rec.intra_op_num_threads": intra_op_num_threads,
-                    "Det.engine_type": backend_enum,
-                    "Cls.engine_type": backend_enum,
-                    "Rec.engine_type": backend_enum,
-                }
+                params=params,
             )
     def __call__(
@@ -120,6 +132,9 @@ class RapidOcrModel(BaseOcrModel):
                             use_cls=self.options.use_cls,
                             use_rec=self.options.use_rec,
                         )
+                        if result is None or result.boxes is None:
+                            _log.warning("RapidOCR returned empty result!")
+                            continue
                         result = list(
                             zip(result.boxes.tolist(), result.txts, result.scores)
                         )
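RapidOCR now makes three changes here. It builds its `params` dictionary up front, warns when the deprecated `rec_font_path` is set, and lets `rapidocr_params` override any generated key via `dict.update`. The `__call__` hunk also skips any tile for which RapidOCR returns no boxes. Below is a stdlib-only sketch of both behaviours. `build_params`, `collect_cells`, and the reduced key set are hypothetical. Each `result` is any object with `boxes`, `txts`, and `scores` attributes, held here as plain lists rather than NumPy arrays:

```python
import logging
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List, Optional

_log = logging.getLogger(__name__)


def build_params(
    text_score: float,
    font_path: Optional[str] = None,
    rec_font_path: Optional[str] = None,
    user_params: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    params: Dict[str, Any] = {
        "Global.text_score": text_score,
        "Global.font_path": font_path,
        "Rec.font_path": rec_font_path,  # still forwarded, but deprecated
    }
    if rec_font_path is not None:
        _log.warning("'rec_font_path' is deprecated; use 'font_path' instead.")
    if user_params:
        # User-supplied keys win over the generated defaults.
        params.update(user_params)
    return params


def collect_cells(results: Iterable[Any]) -> List[tuple]:
    cells: List[tuple] = []
    for result in results:
        if result is None or result.boxes is None:
            _log.warning("RapidOCR returned empty result!")
            continue  # skip this tile instead of failing on zip()
        cells.extend(zip(result.boxes, result.txts, result.scores))
    return cells


if __name__ == "__main__":
    tiles = [None, SimpleNamespace(boxes=[[0, 0, 1, 1]], txts=["Hi"], scores=[0.9])]
    print(collect_cells(tiles))
```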

{docling-2.51.0 → docling-2.53.0}/docling/models/vlm_models_inline/hf_transformers_model.py RENAMED

@@ -112,7 +112,7 @@ class HuggingFaceTransformersVlmModel(BaseVlmPageModel, HuggingFaceModelDownload
             self.vlm_model = model_cls.from_pretrained(
                 artifacts_path,
                 device_map=self.device,
-                torch_dtype=self.vlm_options.torch_dtype,
+                dtype=self.vlm_options.torch_dtype,
                 _attn_implementation=(
                     "flash_attention_2"
                     if self.device.startswith("cuda")

{docling-2.51.0 → docling-2.53.0}/docling/models/vlm_models_inline/nuextract_transformers_model.py RENAMED

@@ -144,7 +144,7 @@ class NuExtractTransformersModel(BaseVlmModel, HuggingFaceModelDownloadMixin):
             self.vlm_model = AutoModelForImageTextToText.from_pretrained(
                 artifacts_path,
                 device_map=self.device,
-                torch_dtype=self.vlm_options.torch_dtype,
+                dtype=self.vlm_options.torch_dtype,
                 _attn_implementation=(
                     "flash_attention_2"
                     if self.device.startswith("cuda")

{docling-2.51.0 → docling-2.53.0}/docling/pipeline/asr_pipeline.py RENAMED

@@ -208,25 +208,13 @@ class AsrPipeline(BasePipeline):
         self.pipeline_options: AsrPipelineOptions = pipeline_options
-        artifacts_path: Optional[Path] = None
-        if pipeline_options.artifacts_path is not None:
-            artifacts_path = Path(pipeline_options.artifacts_path).expanduser()
-        elif settings.artifacts_path is not None:
-            artifacts_path = Path(settings.artifacts_path).expanduser()
-        if artifacts_path is not None and not artifacts_path.is_dir():
-            raise RuntimeError(
-                f"The value of {artifacts_path=} is not valid. "
-                "When defined, it must point to a folder containing all models required by the pipeline."
-            )
         if isinstance(self.pipeline_options.asr_options, InlineAsrNativeWhisperOptions):
             asr_options: InlineAsrNativeWhisperOptions = (
                 self.pipeline_options.asr_options
             )
             self._model = _NativeWhisperModel(
                 enabled=True,  # must be always enabled for this pipeline to make sense.
-                artifacts_path=artifacts_path,
+                artifacts_path=self.artifacts_path,
                 accelerator_options=pipeline_options.accelerator_options,
                 asr_options=asr_options,
             )

{docling-2.51.0 → docling-2.53.0}/docling/pipeline/base_extraction_pipeline.py RENAMED

@@ -1,19 +1,33 @@
 import logging
 from abc import ABC, abstractmethod
+from pathlib import Path
 from typing import Optional
 from docling.datamodel.base_models import ConversionStatus, ErrorItem
 from docling.datamodel.document import InputDocument
 from docling.datamodel.extraction import ExtractionResult, ExtractionTemplateType
-from docling.datamodel.pipeline_options import BaseOptions
+from docling.datamodel.pipeline_options import BaseOptions, PipelineOptions
+from docling.datamodel.settings import settings
 _log = logging.getLogger(__name__)
 class BaseExtractionPipeline(ABC):
-    def __init__(self, pipeline_options: BaseOptions):
+    def __init__(self, pipeline_options: PipelineOptions):
         self.pipeline_options = pipeline_options
+        self.artifacts_path: Optional[Path] = None
+        if pipeline_options.artifacts_path is not None:
+            self.artifacts_path = Path(pipeline_options.artifacts_path).expanduser()
+        elif settings.artifacts_path is not None:
+            self.artifacts_path = Path(settings.artifacts_path).expanduser()
+        if self.artifacts_path is not None and not self.artifacts_path.is_dir():
+            raise RuntimeError(
+                f"The value of {self.artifacts_path=} is not valid. "
+                "When defined, it must point to a folder containing all models required by the pipeline."
+            )
     def execute(
         self,
         in_doc: InputDocument,
@@ -54,5 +68,5 @@ class BaseExtractionPipeline(ABC):
     @classmethod
     @abstractmethod
-    def get_default_options(cls) -> BaseOptions:
+    def get_default_options(cls) -> PipelineOptions:
         pass
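`BaseExtractionPipeline.__init__` now resolves `artifacts_path` once. The pipeline option takes precedence over the global `settings.artifacts_path`, `~` is expanded, and any value that is not a directory is rejected. `AsrPipeline` likewise drops its local copy of this logic and reads `self.artifacts_path`. The ASR hunk suggests this attribute is now set by its base class, in a part of the diff not shown here. The resolution order is shown below as a standalone function. `resolve_artifacts_path` is a hypothetical name, and the global setting is passed in as an argument rather than read from `docling.datamodel.settings`:

```python
from pathlib import Path
from typing import Optional, Union

PathLike = Union[Path, str]


def resolve_artifacts_path(
    option_value: Optional[PathLike], settings_value: Optional[PathLike]
) -> Optional[Path]:
    """Pipeline option first, then global setting; must be an existing directory."""
    path: Optional[Path] = None
    if option_value is not None:
        path = Path(option_value).expanduser()
    elif settings_value is not None:
        path = Path(settings_value).expanduser()
    if path is not None and not path.is_dir():
        raise RuntimeError(
            f"The value of {path=} is not valid. "
            "When defined, it must point to a folder containing all models required by the pipeline."
        )
    return path
```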
