PyPI: rc-docparser 0.2.0 (tar.gz) → 0.2.2 (tar.gz)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (39)

{rc_docparser-0.2.0 → rc_docparser-0.2.2}/CHANGELOG.md RENAMED

@@ -6,6 +6,19 @@ follows [Semantic Versioning](https://semver.org/).
 ## [Unreleased]
+## [0.2.2] - 2026-06-19
+### Changed
+- README/PyPI page now leads with the `rc-docparser` distribution name and
+  clarifies the `docparser` import name (docs-only release).
+## [0.2.1] - 2026-06-19
+### Changed
+- Packaging aligned with Research Commons conventions: PEP 639 SPDX
+  `license = "MIT"` (with `LICENSE` shipped via `license-files`), single-source
+  version in `src/docparser/_version.py`, `authors = ["rc-docparser contributors"]`,
+  a `Typing :: Typed` classifier, and a `Research Commons` project URL.
+- Tooling: stricter ruff rule set (added `SIM`) and a coverage report config.
 ## [0.2.0] - 2026-06-16
 ### Added
 - PPTX parser (extra `[pptx]`): walks slides in order; emits per-slide
@@ -36,6 +49,9 @@ follows [Semantic Versioning](https://semver.org/).
 ### Changed
 - `parse_path` and `run_all` accept the new PDF and captioning options;
   non-PDF parsers ignore PDF-only keyword arguments.
+- Published on PyPI as `rc-docparser` (the bare `docparser` name is blocked by
+  PyPI's project-name similarity guard). The Python import name is unchanged:
+  `import docparser`.
 ## [0.1.0] - 2026-06-12
 ### Added
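
The changelog entries above mean the distribution name (`rc-docparser`) and the import name (`docparser`) now differ, so installed-package metadata has to be looked up under the distribution name. A minimal sketch, assuming only the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of the package:

```python
from __future__ import annotations

from importlib import metadata


def installed_version(dist_name: str) -> str | None:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Metadata lookups use the PyPI distribution name, not the import name.
print(installed_version("rc-docparser"))
```

The call prints a version string when the distribution is installed and `None` otherwise.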

{rc_docparser-0.2.0 → rc_docparser-0.2.2}/PKG-INFO RENAMED

@@ -1,39 +1,19 @@
 Metadata-Version: 2.4
 Name: rc-docparser
-Version: 0.2.0
+Version: 0.2.2
 Summary: Convert research literature (.docx, .xlsx, .pdf, .html, .pptx, .epub, .txt, .md, .csv) into structured Markdown + JSON corpora, with optional VLM image semantic captioning.
 Project-URL: Homepage, https://github.com/Research-Commons/docparser
 Project-URL: Repository, https://github.com/Research-Commons/docparser
 Project-URL: Issues, https://github.com/Research-Commons/docparser/issues
 Project-URL: Changelog, https://github.com/Research-Commons/docparser/blob/main/CHANGELOG.md
-Author-email: Research Commons <shubhankitsingh@researchcommons.ai>
-License: MIT License
-        Copyright (c) 2026 Research Commons
-        Permission is hereby granted, free of charge, to any person obtaining a copy
-        of this software and associated documentation files (the "Software"), to deal
-        in the Software without restriction, including without limitation the rights
-        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-        copies of the Software, and to permit persons to whom the Software is
-        furnished to do so, subject to the following conditions:
-        The above copyright notice and this permission notice shall be included in all
-        copies or substantial portions of the Software.
-        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-        SOFTWARE.
+Project-URL: Research Commons, https://lab.researchcommons.ai/
+Author: rc-docparser contributors
+License-Expression: MIT
 License-File: LICENSE
 Keywords: corpus,csv,docx,epub,html,literature,markdown,ocr,parser,pdf,pptx,rag,vlm,xlsx
 Classifier: Development Status :: 4 - Beta
 Classifier: Intended Audience :: Developers
 Classifier: Intended Audience :: Science/Research
-Classifier: License :: OSI Approved :: MIT License
 Classifier: Operating System :: OS Independent
 Classifier: Programming Language :: Python :: 3
 Classifier: Programming Language :: Python :: 3.10
@@ -41,6 +21,7 @@ Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
 Classifier: Topic :: Scientific/Engineering
 Classifier: Topic :: Text Processing :: Markup
+Classifier: Typing :: Typed
 Requires-Python: >=3.10
 Requires-Dist: lxml>=5.3.0
 Requires-Dist: openpyxl>=3.1.5
@@ -99,7 +80,9 @@ Provides-Extra: vlm
 Requires-Dist: requests>=2.32.3; extra == 'vlm'
 Description-Content-Type: text/markdown
-# docparser
+# rc-docparser
+_Published on PyPI as **`rc-docparser`**; import it as `docparser` (`import docparser`)._
 Convert research literature (`.docx`, `.xlsx`, `.pdf`, `.html`, `.pptx`,
 `.epub`, `.txt`, `.md`, `.csv`) into a clean, reproducible **Markdown + JSON**
@@ -121,29 +104,32 @@ via OpenRouter, OpenAI, Gemini, a local server, or a fully-local model.
 ## Install
+The package is published on PyPI as **`rc-docparser`**; the Python import name
+is `docparser` (i.e. `import docparser`).
 ```bash
-pip install docparser              # core: docx + xlsx + txt/md + csv/tsv
-pip install 'docparser[pdf]'       # + PyMuPDF for PDFs
-pip install 'docparser[html]'      # + trafilatura + bs4 for HTML
-pip install 'docparser[pptx]'      # + python-pptx for PowerPoint
-pip install 'docparser[epub]'      # + EbookLib + bs4 for EPUB
-pip install 'docparser[vlm]'       # + requests for API VLM captions
-pip install 'docparser[all]'       # everything above (recommended)
+pip install rc-docparser              # core: docx + xlsx + txt/md + csv/tsv
+pip install 'rc-docparser[pdf]'       # + PyMuPDF for PDFs
+pip install 'rc-docparser[html]'      # + trafilatura + bs4 for HTML
+pip install 'rc-docparser[pptx]'      # + python-pptx for PowerPoint
+pip install 'rc-docparser[epub]'      # + EbookLib + bs4 for EPUB
+pip install 'rc-docparser[vlm]'       # + requests for API VLM captions
+pip install 'rc-docparser[all]'       # everything above (recommended)
 ```
 Higher-fidelity / heavier features are separate opt-in extras (so the core
 install stays small and MIT):
 ```bash
-pip install 'docparser[tables]'       # + pdfplumber for PDF table extraction
-pip install 'docparser[ocr]'          # + rapidocr-onnxruntime for scanned PDFs
-pip install 'docparser[pymupdf4llm]'  # PyMuPDF4LLM PDF backend (AGPL/commercial)
-pip install 'docparser[docling]'      # IBM Docling PDF backend (MIT)
-pip install 'docparser[marker]'       # Datalab Marker PDF backend (GPL-3.0)
-pip install 'docparser[localvlm]'     # transformers/torch local captioning
+pip install 'rc-docparser[tables]'       # + pdfplumber for PDF table extraction
+pip install 'rc-docparser[ocr]'          # + rapidocr-onnxruntime for scanned PDFs
+pip install 'rc-docparser[pymupdf4llm]'  # PyMuPDF4LLM PDF backend (AGPL/commercial)
+pip install 'rc-docparser[docling]'      # IBM Docling PDF backend (MIT)
+pip install 'rc-docparser[marker]'       # Datalab Marker PDF backend (GPL-3.0)
+pip install 'rc-docparser[localvlm]'     # transformers/torch local captioning
 ```
-`docparser` requires Python 3.10+.
+`rc-docparser` requires Python 3.10+.
 ## Quick start (library)

{rc_docparser-0.2.0 → rc_docparser-0.2.2}/README.md RENAMED

@@ -1,4 +1,6 @@
-# docparser
+# rc-docparser
+_Published on PyPI as **`rc-docparser`**; import it as `docparser` (`import docparser`)._
 Convert research literature (`.docx`, `.xlsx`, `.pdf`, `.html`, `.pptx`,
 `.epub`, `.txt`, `.md`, `.csv`) into a clean, reproducible **Markdown + JSON**
@@ -20,29 +22,32 @@ via OpenRouter, OpenAI, Gemini, a local server, or a fully-local model.
 ## Install
+The package is published on PyPI as **`rc-docparser`**; the Python import name
+is `docparser` (i.e. `import docparser`).
 ```bash
-pip install docparser              # core: docx + xlsx + txt/md + csv/tsv
-pip install 'docparser[pdf]'       # + PyMuPDF for PDFs
-pip install 'docparser[html]'      # + trafilatura + bs4 for HTML
-pip install 'docparser[pptx]'      # + python-pptx for PowerPoint
-pip install 'docparser[epub]'      # + EbookLib + bs4 for EPUB
-pip install 'docparser[vlm]'       # + requests for API VLM captions
-pip install 'docparser[all]'       # everything above (recommended)
+pip install rc-docparser              # core: docx + xlsx + txt/md + csv/tsv
+pip install 'rc-docparser[pdf]'       # + PyMuPDF for PDFs
+pip install 'rc-docparser[html]'      # + trafilatura + bs4 for HTML
+pip install 'rc-docparser[pptx]'      # + python-pptx for PowerPoint
+pip install 'rc-docparser[epub]'      # + EbookLib + bs4 for EPUB
+pip install 'rc-docparser[vlm]'       # + requests for API VLM captions
+pip install 'rc-docparser[all]'       # everything above (recommended)
 ```
 Higher-fidelity / heavier features are separate opt-in extras (so the core
 install stays small and MIT):
 ```bash
-pip install 'docparser[tables]'       # + pdfplumber for PDF table extraction
-pip install 'docparser[ocr]'          # + rapidocr-onnxruntime for scanned PDFs
-pip install 'docparser[pymupdf4llm]'  # PyMuPDF4LLM PDF backend (AGPL/commercial)
-pip install 'docparser[docling]'      # IBM Docling PDF backend (MIT)
-pip install 'docparser[marker]'       # Datalab Marker PDF backend (GPL-3.0)
-pip install 'docparser[localvlm]'     # transformers/torch local captioning
+pip install 'rc-docparser[tables]'       # + pdfplumber for PDF table extraction
+pip install 'rc-docparser[ocr]'          # + rapidocr-onnxruntime for scanned PDFs
+pip install 'rc-docparser[pymupdf4llm]'  # PyMuPDF4LLM PDF backend (AGPL/commercial)
+pip install 'rc-docparser[docling]'      # IBM Docling PDF backend (MIT)
+pip install 'rc-docparser[marker]'       # Datalab Marker PDF backend (GPL-3.0)
+pip install 'rc-docparser[localvlm]'     # transformers/torch local captioning
 ```
-`docparser` requires Python 3.10+.
+`rc-docparser` requires Python 3.10+.
 ## Quick start (library)

{rc_docparser-0.2.0 → rc_docparser-0.2.2}/pyproject.toml RENAMED

@@ -7,11 +7,10 @@ name = "rc-docparser"
 dynamic = ["version"]
 description = "Convert research literature (.docx, .xlsx, .pdf, .html, .pptx, .epub, .txt, .md, .csv) into structured Markdown + JSON corpora, with optional VLM image semantic captioning."
 readme = "README.md"
-license = { file = "LICENSE" }
+license = "MIT"
+license-files = ["LICENSE"]
 requires-python = ">=3.10"
-authors = [
-    { name = "Research Commons", email = "shubhankitsingh@researchcommons.ai" },
-]
+authors = [{ name = "rc-docparser contributors" }]
 keywords = [
     "docx",
     "xlsx",
@@ -32,7 +31,6 @@ classifiers = [
     "Development Status :: 4 - Beta",
     "Intended Audience :: Developers",
     "Intended Audience :: Science/Research",
-    "License :: OSI Approved :: MIT License",
     "Operating System :: OS Independent",
     "Programming Language :: Python :: 3",
     "Programming Language :: Python :: 3.10",
@@ -40,6 +38,7 @@ classifiers = [
     "Programming Language :: Python :: 3.12",
     "Topic :: Scientific/Engineering",
     "Topic :: Text Processing :: Markup",
+    "Typing :: Typed",
 ]
 dependencies = [
     "python-docx>=1.1.2",
@@ -97,9 +96,10 @@ Homepage = "https://github.com/Research-Commons/docparser"
 Repository = "https://github.com/Research-Commons/docparser"
 Issues = "https://github.com/Research-Commons/docparser/issues"
 Changelog = "https://github.com/Research-Commons/docparser/blob/main/CHANGELOG.md"
+"Research Commons" = "https://lab.researchcommons.ai/"
 [tool.hatch.version]
-path = "src/docparser/__init__.py"
+path = "src/docparser/_version.py"
 [tool.hatch.build.targets.wheel]
 packages = ["src/docparser"]
@@ -121,7 +121,7 @@ line-length = 100
 target-version = "py310"
 [tool.ruff.lint]
-select = ["E", "F", "W", "I", "B", "UP", "RUF"]
+select = ["E", "F", "W", "I", "B", "UP", "SIM", "RUF"]
 ignore = ["E501"]
 [tool.pytest.ini_options]
@@ -135,6 +135,16 @@ markers = [
 source = ["src/docparser"]
 branch = true
+[tool.coverage.report]
+show_missing = true
+exclude_lines = [
+    "pragma: no cover",
+    "if TYPE_CHECKING:",
+    "@overload",
+    "raise NotImplementedError",
+    "\\.\\.\\.",
+]
 [tool.mypy]
 python_version = "3.10"
 files = ["src/docparser"]
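
The new `[tool.coverage.report]` table lists `exclude_lines` entries, which coverage.py treats as regular expressions searched against source lines. The sketch below is a simplified model of that matching, not coverage.py's implementation; `is_excluded` is a hypothetical helper, and the real tool also excludes the whole block that a matching line introduces:

```python
import re

# Patterns copied from the exclude_lines list in this diff. The TOML string
# "\\.\\.\\." decodes to the regex \.\.\. which matches a literal ellipsis.
EXCLUDE_PATTERNS = [
    r"pragma: no cover",
    r"if TYPE_CHECKING:",
    r"@overload",
    r"raise NotImplementedError",
    r"\.\.\.",
]


def is_excluded(line: str) -> bool:
    """Simplified model: a line is excluded if any pattern is found in it."""
    return any(re.search(pattern, line) for pattern in EXCLUDE_PATTERNS)


print(is_excluded("    ..."))                      # True
print(is_excluded("layout = WorkspaceLayout()"))   # False
```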

{rc_docparser-0.2.0 → rc_docparser-0.2.2}/src/docparser/__init__.py RENAMED

@@ -19,8 +19,7 @@ Public API
 """
 from __future__ import annotations
-__version__ = "0.2.0"
+from ._version import __version__
 from .common import (
     WorkspaceLayout,
     bytes_sha1,

rc_docparser-0.2.2/src/docparser/_version.py ADDED

@@ -0,0 +1 @@
+__version__ = "0.2.2"
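
Keeping the version in a one-line `_version.py` lets a build tool read it as text without importing the package. The sketch below illustrates that idea; the pattern `VERSION_RE` and the function `read_version` are hypothetical and are not the build backend's own logic:

```python
import re

# Hypothetical pattern for illustration; not the build backend's own regex.
VERSION_RE = re.compile(r'^__version__\s*=\s*"([^"]+)"', re.MULTILINE)


def read_version(source: str) -> str:
    """Extract the version string from the text of a _version.py-style file."""
    match = VERSION_RE.search(source)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


print(read_version('__version__ = "0.2.2"\n'))  # 0.2.2
```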

{rc_docparser-0.2.0 → rc_docparser-0.2.2}/src/docparser/cli.py RENAMED

@@ -21,10 +21,7 @@ from .orchestrator import SUPPORTED_EXTENSIONS, parse_path, run_all
 def _layout_from_args(args: argparse.Namespace) -> WorkspaceLayout:
-    if args.workspace:
-        layout = WorkspaceLayout.under(args.workspace)
-    else:
-        layout = WorkspaceLayout()
+    layout = WorkspaceLayout.under(args.workspace) if args.workspace else WorkspaceLayout()
     if getattr(args, "raw_dir", None):
         layout.raw_dir = Path(args.raw_dir)
     if getattr(args, "parsed_dir", None):

{rc_docparser-0.2.0 → rc_docparser-0.2.2}/src/docparser/docx.py RENAMED

@@ -140,9 +140,7 @@ def _is_caption(p: Paragraph) -> bool:
     if style in CAPTION_STYLE_NAMES:
         return True
     text = (p.text or "").strip()
-    if CAPTION_RE.match(text):
-        return True
-    return False
+    return bool(CAPTION_RE.match(text))
 @dataclass
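
The `_is_caption` change collapses an `if`/`return True`/`return False` tail into `bool(...)`, which should behave identically because `re.Pattern.match` returns either a match object (truthy) or `None`. A small sketch of the equivalence; the pattern below is a hypothetical stand-in, since the real `CAPTION_RE` is not shown in this diff:

```python
import re

# Hypothetical stand-in: the real CAPTION_RE is not shown in this diff.
CAPTION_RE = re.compile(r"^(Figure|Table)\s+\d+[:.]", re.IGNORECASE)


def is_caption_verbose(text: str) -> bool:
    # Shape of the code before the change.
    if CAPTION_RE.match(text):
        return True
    return False


def is_caption_compact(text: str) -> bool:
    # Shape of the code after the change.
    return bool(CAPTION_RE.match(text))


for sample in ("Figure 1: a red square example.", "Conclusion paragraph."):
    print(sample, is_caption_verbose(sample) == is_caption_compact(sample))
```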

{rc_docparser-0.2.0 → rc_docparser-0.2.2}/src/docparser/epub.py RENAMED

@@ -5,7 +5,7 @@ over each chapter (headings / paragraphs / lists / tables / images), and
 extracts embedded images to the asset directory. Embedded images can be
 captioned via a ``captioner`` callable.
-Requires the ``[epub]`` extra: ``pip install 'docparser[epub]'`` (which also
+Requires the ``[epub]`` extra: ``pip install 'rc-docparser[epub]'`` (which also
 pulls in BeautifulSoup from the ``[html]`` extra).
 """
 from __future__ import annotations
@@ -36,7 +36,7 @@ def _import_deps():
     except ImportError as exc:  # pragma: no cover - optional dep guard
         raise ImportError(
             "docparser.epub.parse_epub requires the [epub] extra. "
-            "Install with: pip install 'docparser[epub]'"
+            "Install with: pip install 'rc-docparser[epub]'"
         ) from exc
     return ebooklib, epub, BeautifulSoup
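
The parsers in this diff share one pattern: import an optional dependency lazily and, on failure, re-raise `ImportError` with an install hint that names the distribution. A generic sketch of that pattern follows; `require_extra` is a hypothetical helper written for illustration, not a function in the package:

```python
import importlib
from types import ModuleType


def require_extra(module_name: str, extra: str, dist: str = "rc-docparser") -> ModuleType:
    """Import an optional dependency or fail with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Chain the original error so the root cause stays visible.
        raise ImportError(
            f"{module_name} requires the [{extra}] extra. "
            f"Install with: pip install '{dist}[{extra}]'"
        ) from exc


try:
    require_extra("module_that_does_not_exist_xyz", "epub")
except ImportError as err:
    print(err)
```

Because the hint is built from the distribution name, a rename like the one in this release would only need to touch one default argument in such a helper.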

{rc_docparser-0.2.0 → rc_docparser-0.2.2}/src/docparser/html.py RENAMED

@@ -8,7 +8,7 @@ Two-tier strategy:
    typed blocks (heading/paragraph/list/table/image) when trafilatura returns
    nothing useful or when the caller wants the full structure.
-Requires the ``[html]`` extra: ``pip install 'docparser[html]'``.
+Requires the ``[html]`` extra: ``pip install 'rc-docparser[html]'``.
 """
 from __future__ import annotations
@@ -37,7 +37,7 @@ def _import_deps():
     except ImportError as exc:  # pragma: no cover
         raise ImportError(
             "docparser.html.parse_html requires the [html] extra. "
-            "Install with: pip install 'docparser[html]'"
+            "Install with: pip install 'rc-docparser[html]'"
         ) from exc
     return trafilatura, BeautifulSoup

{rc_docparser-0.2.0 → rc_docparser-0.2.2}/src/docparser/image.py RENAMED

@@ -234,7 +234,7 @@ def caption_image(
     if requests is None:  # pragma: no cover - optional dep guard
         raise ImportError(
             "docparser.image.caption_image requires the [vlm] extra. "
-            "Install with: pip install 'docparser[vlm]'"
+            "Install with: pip install 'rc-docparser[vlm]'"
         )
     provider_name, preset = _resolve_provider(provider)

{rc_docparser-0.2.0 → rc_docparser-0.2.2}/src/docparser/localvlm.py RENAMED

@@ -8,7 +8,7 @@ image-to-text model such as BLIP.
 It honors the same ``VLMResult`` shape and the same ``SHA1(image) x model``
 on-disk cache as the API captioner, so output is interchangeable.
-Requires the ``[localvlm]`` extra: ``pip install 'docparser[localvlm]'``.
+Requires the ``[localvlm]`` extra: ``pip install 'rc-docparser[localvlm]'``.
 """
 from __future__ import annotations
@@ -32,7 +32,7 @@ def _load_pipeline(model: str):
     except ImportError as exc:  # pragma: no cover - optional dep
         raise ImportError(
             "docparser.localvlm requires the [localvlm] extra. "
-            "Install with: pip install 'docparser[localvlm]'"
+            "Install with: pip install 'rc-docparser[localvlm]'"
         ) from exc
     return pipeline("image-to-text", model=model)
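
The module docstring mentions an on-disk cache keyed by `SHA1(image) x model`. One way such a key could be turned into a file path is sketched below; the file naming scheme and the `cache_path` helper are assumptions made for illustration, and the package's actual layout may differ:

```python
import hashlib
from pathlib import Path


def cache_path(cache_dir: Path, image_bytes: bytes, model: str) -> Path:
    """Build a cache file path from the image content hash and the model name."""
    image_sha1 = hashlib.sha1(image_bytes).hexdigest()
    # Hash the model name as well so the file name is always filesystem-safe
    # (model identifiers often contain slashes). Hypothetical naming scheme.
    model_tag = hashlib.sha1(model.encode("utf-8")).hexdigest()[:12]
    return cache_dir / f"{image_sha1}_{model_tag}.json"


print(cache_path(Path("cache"), b"fake image bytes", "example/model-name"))
```

Keying on the image bytes rather than the file name means the same picture embedded in two documents would share one cached caption per model.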

{rc_docparser-0.2.0 → rc_docparser-0.2.2}/src/docparser/ocr.py RENAMED

@@ -3,7 +3,7 @@
 Uses ``rapidocr-onnxruntime`` by default: a pure-pip ONNX OCR engine that needs
 no system binaries (unlike Tesseract). The engine is created once and reused.
-Requires the ``[ocr]`` extra: ``pip install 'docparser[ocr]'``.
+Requires the ``[ocr]`` extra: ``pip install 'rc-docparser[ocr]'``.
 """
 from __future__ import annotations
@@ -12,7 +12,7 @@ from functools import lru_cache
 from typing import Any
 _NO_OCR_MSG = (
-    "OCR requires the [ocr] extra. Install with: pip install 'docparser[ocr]'"
+    "OCR requires the [ocr] extra. Install with: pip install 'rc-docparser[ocr]'"
 )
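
The docstring says the OCR engine is created once and reused, and the module imports `lru_cache`. A minimal sketch of how `lru_cache` can provide that behavior; `FakeEngine` and `get_engine` are hypothetical stand-ins, not the module's real names:

```python
from functools import lru_cache


class FakeEngine:
    """Hypothetical stand-in for an expensive-to-build OCR engine."""


@lru_cache(maxsize=1)
def get_engine() -> FakeEngine:
    # The body runs on the first call only; later calls return the cached instance.
    return FakeEngine()


print(get_engine() is get_engine())  # True
```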

{rc_docparser-0.2.0 → rc_docparser-0.2.2}/src/docparser/pdf.py RENAMED

@@ -12,7 +12,7 @@ heading classifier based on font sizing. On top of that it offers:
 - ``extract_tables`` use ``pdfplumber`` (the ``[tables]`` extra) to emit real
   table blocks instead of flattened text.
-Requires the ``[pdf]`` extra: ``pip install 'docparser[pdf]'``.
+Requires the ``[pdf]`` extra: ``pip install 'rc-docparser[pdf]'``.
 """
 from __future__ import annotations
@@ -104,7 +104,7 @@ def parse_pdf(
     except ImportError as exc:  # pragma: no cover
         raise ImportError(
             "docparser.pdf.parse_pdf requires the [pdf] extra. "
-            "Install with: pip install 'docparser[pdf]'"
+            "Install with: pip install 'rc-docparser[pdf]'"
         ) from exc
     if ocr not in {"off", "auto", "force"}:
@@ -218,7 +218,7 @@ def parse_pdf(
         except ImportError as exc:  # pragma: no cover - optional dep
             raise ImportError(
                 "extract_tables=True requires the [tables] extra. "
-                "Install with: pip install 'docparser[tables]'"
+                "Install with: pip install 'rc-docparser[tables]'"
             ) from exc
         out: list[list[list[str]]] = []
         with pdfplumber.open(str(real_source)) as pdf:

{rc_docparser-0.2.0 → rc_docparser-0.2.2}/src/docparser/pdf_backends.py RENAMED

@@ -8,9 +8,9 @@ schema via :func:`docparser.text._blocks_from_markdown`.
 All backends are optional extras and lazily imported:
-- ``pymupdf4llm`` -> ``pip install 'docparser[pymupdf4llm]'`` (note: AGPL/commercial)
-- ``docling``     -> ``pip install 'docparser[docling]'`` (MIT)
-- ``marker``      -> ``pip install 'docparser[marker]'`` (GPL-3.0)
+- ``pymupdf4llm`` -> ``pip install 'rc-docparser[pymupdf4llm]'`` (note: AGPL/commercial)
+- ``docling``     -> ``pip install 'rc-docparser[docling]'`` (MIT)
+- ``marker``      -> ``pip install 'rc-docparser[marker]'`` (GPL-3.0)
 """
 from __future__ import annotations
@@ -28,7 +28,7 @@ def _markdown_pymupdf4llm(path: Path) -> str:
     except ImportError as exc:  # pragma: no cover - optional dep
         raise ImportError(
             "backend='pymupdf4llm' requires the [pymupdf4llm] extra. "
-            "Install with: pip install 'docparser[pymupdf4llm]'"
+            "Install with: pip install 'rc-docparser[pymupdf4llm]'"
         ) from exc
     return pymupdf4llm.to_markdown(str(path))
@@ -39,7 +39,7 @@ def _markdown_docling(path: Path) -> str:
     except ImportError as exc:  # pragma: no cover - optional dep
         raise ImportError(
             "backend='docling' requires the [docling] extra. "
-            "Install with: pip install 'docparser[docling]'"
+            "Install with: pip install 'rc-docparser[docling]'"
         ) from exc
     converter = DocumentConverter()
     result = converter.convert(str(path))
@@ -55,7 +55,7 @@ def _markdown_marker(path: Path) -> str:
     except ImportError as exc:  # pragma: no cover - optional dep
         raise ImportError(
             "backend='marker' requires the [marker] extra. "
-            "Install with: pip install 'docparser[marker]'"
+            "Install with: pip install 'rc-docparser[marker]'"
         ) from exc
     config_parser = ConfigParser({"output_format": "markdown"})
     converter = PdfConverter(

{rc_docparser-0.2.0 → rc_docparser-0.2.2}/src/docparser/pptx.py RENAMED

@@ -6,7 +6,7 @@ pictures are emitted in shape order. Speaker notes are captured per slide.
 Embedded pictures are written to ``layout.assets_dir_for(source)`` and may be
 captioned via a ``captioner`` callable (same contract as the other parsers).
-Requires the ``[pptx]`` extra: ``pip install 'docparser[pptx]'``.
+Requires the ``[pptx]`` extra: ``pip install 'rc-docparser[pptx]'``.
 """
 from __future__ import annotations
@@ -33,7 +33,7 @@ def _import_pptx():
     except ImportError as exc:  # pragma: no cover - optional dep guard
         raise ImportError(
             "docparser.pptx.parse_pptx requires the [pptx] extra. "
-            "Install with: pip install 'docparser[pptx]'"
+            "Install with: pip install 'rc-docparser[pptx]'"
         ) from exc
     return Presentation, MSO_SHAPE_TYPE

{rc_docparser-0.2.0 → rc_docparser-0.2.2}/src/docparser/xlsx.py RENAMED

@@ -6,6 +6,7 @@ embedded images.
 """
 from __future__ import annotations
+import contextlib
 import datetime as _dt
 import io
 from collections.abc import Callable
@@ -66,10 +67,8 @@ def _cell_record(cell: Cell, formulas_ws: Worksheet | None) -> dict[str, Any]:
         except Exception:
             pass
     if cell.comment is not None:
-        try:
+        with contextlib.suppress(Exception):
             rec["comment"] = str(cell.comment.text)
-        except Exception:
-            pass
     return rec
@@ -84,10 +83,8 @@ def _extract_images(
             data: bytes | None = None
             if hasattr(ref, "read"):
                 data = ref.read()
-                try:
+                with contextlib.suppress(Exception):
                     ref.seek(0)
-                except Exception:
-                    pass
             elif isinstance(ref, (bytes, bytearray)):
                 data = bytes(ref)
             else:
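
The `try`/`except Exception: pass` blocks in this file are replaced with `contextlib.suppress(Exception)`, which is the rewrite that ruff's newly enabled `SIM` rules suggest. The sketch below checks that the two forms behave the same when the guarded statement raises; `Flaky`, `read_verbose`, and `read_compact` are hypothetical names used only for illustration:

```python
import contextlib


class Flaky:
    """Hypothetical object whose attribute access always fails."""

    @property
    def text(self) -> str:
        raise RuntimeError("unreadable comment")


def read_verbose(obj) -> dict:
    rec = {}
    try:
        rec["comment"] = str(obj.text)
    except Exception:
        pass
    return rec


def read_compact(obj) -> dict:
    rec = {}
    # Same effect as the try/except above: the error is swallowed.
    with contextlib.suppress(Exception):
        rec["comment"] = str(obj.text)
    return rec


print(read_verbose(Flaky()) == read_compact(Flaky()) == {})  # True
```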

{rc_docparser-0.2.0 → rc_docparser-0.2.2}/tests/conftest.py RENAMED

@@ -1,6 +1,7 @@
 """Shared pytest fixtures: synthetic docx, xlsx, pdf, html files."""
 from __future__ import annotations
+import contextlib
 import io
 from pathlib import Path
@@ -51,10 +52,8 @@ def sample_docx(tmp_path: Path) -> Path:
     img_path.write_bytes(_png_bytes())
     doc.add_picture(str(img_path))
     cap_para = doc.add_paragraph("Figure 1: a red square example.")
-    try:
+    with contextlib.suppress(KeyError):
         cap_para.style = doc.styles["Caption"]
-    except KeyError:
-        pass
     doc.add_heading("Section B", level=2)
     doc.add_paragraph("Conclusion paragraph.")