PyPI - sigdetect - Versions diffs - 0.3.0__tar.gz → 0.4.0__tar.gz - Mend

sigdetect 0.3.0.tar.gz → 0.4.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (36)

{sigdetect-0.3.0 → sigdetect-0.4.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: sigdetect
-Version: 0.3.0
+Version: 0.4.0
 Summary: Signature detection and role attribution for PDFs
 Author-email: BT Asmamaw <basmamaw@angeiongroup.com>
 License: MIT
@@ -95,14 +95,14 @@ sigdetect detect \
 ### Notes
 - The config file controls `pdf_root`, `out_dir`, `engine`, `pseudo_signatures`, `recurse_xobjects`, etc.
-- `--engine` supports **pypdf2** (default); a **pymupdf** engine placeholder exists and may be included in a future build.
+- `--engine` accepts **auto** (default; prefers PyMuPDF when installed, falls back to PyPDF2), **pypdf2**, or **pymupdf**.
 - `--pseudo-signatures` enables a vendor/Acro-only pseudo-signature when no actual `/Widget` is present (useful for DocuSign / Acrobat Sign receipts).
 - `--recurse-xobjects` allows scanning Form XObjects for vendor markers and labels embedded in page resources.
 - `--profile` selects tuned role logic:
   - `hipaa` → patient / representative / attorney
   - `retainer` → client / firm (prefers detecting two signatures)
 - `--recursive/--no-recursive` toggles whether `sigdetect detect` descends into subdirectories when hunting for PDFs (recursive by default).
-- `--crop-signatures` enables PNG crops for each detected widget (requires installing the optional `pymupdf` dependency). Use `--crop-dir` to override the destination and `--crop-dpi` to choose rendering quality.
+- Cropping (`--crop-signatures`) and wet detection (`--detect-wet`) are enabled by default for single-pass runs; disable them if you want a light, e-sign-only pass. PyMuPDF is required for crops; PyMuPDF + Tesseract are required for wet detection.
 - If the executable is not on `PATH`, you can always fall back to `python -m sigdetect.cli ...`.
 ### EDA (quick aggregate stats)
@@ -136,15 +136,13 @@ result = detector.Detect(Path("/path/to/pdfs/example.pdf"))
 print(result.to_dict())
 ~~~
-`Detect(Path)` returns a **FileResult** dataclass; call `.to_dict()` for the JSON-friendly representation (see [Result schema](#result-schema)). Each signature entry now exposes `bounding_box` coordinates (PDF points, origin bottom-left). When PNG cropping is enabled, `crop_path` points at the generated image.
+`Detect(Path)` returns a **FileResult** dataclass; call `.to_dict()` for the JSON-friendly representation (see [Result schema](#result-schema)). Each signature entry now exposes `bounding_box` coordinates (PDF points, origin bottom-left). When PNG cropping is enabled, `crop_path` points at the generated image. Use `Engine="auto"` if you want the single-pass defaults that prefer PyMuPDF (for geometry) when available.
 ---
 ## Library API (embed in another script)
-Minimal, plug-and-play API
-Import from `sigdetect.api` and get plain dicts out (JSON-ready),
-with no I/O side effects by default:
+Minimal, plug-and-play API that returns plain dicts (JSON-ready) without side effects unless you opt into cropping:
 ~~~python
 from pathlib import Path
@@ -192,21 +190,14 @@ for res in ScanDirectory(
 # 3) Crop PNG snippets for FileResult objects (requires PyMuPDF)
 detector = get_detector(pdfRoot="/path/to/pdfs", profileName="hipaa")
 file_result = detector.Detect(Path("/path/to/pdfs/example.pdf"))
-crops = CropSignatureImages(
+CropSignatureImages(
     "/path/to/pdfs/example.pdf",
     file_result,
     outputDirectory="./signature_crops",
     dpi=200,
-    returnBytes=True,  # also returns in-memory PNG bytes for each crop
 )
-first_crop = crops[0]
-print(first_crop.path, len(first_crop.image_bytes))
 ~~~
-When ``returnBytes=True`` the helper returns ``SignatureCrop`` objects containing the saved path,
-PNG bytes, and the originating signature metadata.
 ## Result schema
@@ -245,7 +236,7 @@ High-level summary (per file):
       "scores": { "page_label": 4, "general": 2 },
       "evidence": ["page_label:representative(parent/guardian)", "pseudo:true"],
       "hint": "VendorOrAcroOnly",
-      "render_type": "unknown",
+      "render_type": "typed",
       "bounding_box": null,
       "crop_path": null
     }
@@ -290,6 +281,10 @@ profile: retainer    # or: hipaa
 crop_signatures: false   # enable to write PNG crops (requires pymupdf)
 # crop_output_dir: ./signature_crops
 crop_image_dpi: 200
+detect_wet_signatures: false   # opt-in OCR wet detection (PyMuPDF + Tesseract)
+wet_ocr_dpi: 200
+wet_ocr_languages: eng
+wet_precision_threshold: 0.82
 ~~~
 YAML files can be customized or load at runtime (see CLI `--config`, if available, or import and pass patterns into engine).
@@ -304,6 +299,7 @@ YAML files can be customized or load at runtime (see CLI `--config`, if availabl
   - Looks for client and firm labels/tokens; boosts pages with law-firm markers (LLP/LLC/PA/PC) and “By:” blocks.
   - Applies an anti-front-matter rule to reduce page-1 false positives (e.g., letterheads, firm mastheads).
   - When only vendor/Acro clues exist (no widgets), it will emit two pseudo signatures targeting likely pages.
+- **Wet detection (opt-in):** With `detect_wet_signatures: true`, the CLI runs an OCR-backed pass (PyMuPDF + pytesseract/Tesseract) after e-sign detection. It emits `RenderType="wet"` signatures for high-confidence label/stroke pairs in the lower page region. Missing OCR dependencies add a `ManualReview:*` hint instead of failing.
 ---

{sigdetect-0.3.0 → sigdetect-0.4.0}/README.md RENAMED

@@ -79,14 +79,14 @@ sigdetect detect \
 ### Notes
 - The config file controls `pdf_root`, `out_dir`, `engine`, `pseudo_signatures`, `recurse_xobjects`, etc.
-- `--engine` supports **pypdf2** (default); a **pymupdf** engine placeholder exists and may be included in a future build.
+- `--engine` accepts **auto** (default; prefers PyMuPDF when installed, falls back to PyPDF2), **pypdf2**, or **pymupdf**.
 - `--pseudo-signatures` enables a vendor/Acro-only pseudo-signature when no actual `/Widget` is present (useful for DocuSign / Acrobat Sign receipts).
 - `--recurse-xobjects` allows scanning Form XObjects for vendor markers and labels embedded in page resources.
 - `--profile` selects tuned role logic:
   - `hipaa` → patient / representative / attorney
   - `retainer` → client / firm (prefers detecting two signatures)
 - `--recursive/--no-recursive` toggles whether `sigdetect detect` descends into subdirectories when hunting for PDFs (recursive by default).
-- `--crop-signatures` enables PNG crops for each detected widget (requires installing the optional `pymupdf` dependency). Use `--crop-dir` to override the destination and `--crop-dpi` to choose rendering quality.
+- Cropping (`--crop-signatures`) and wet detection (`--detect-wet`) are enabled by default for single-pass runs; disable them if you want a light, e-sign-only pass. PyMuPDF is required for crops; PyMuPDF + Tesseract are required for wet detection.
 - If the executable is not on `PATH`, you can always fall back to `python -m sigdetect.cli ...`.
 ### EDA (quick aggregate stats)
@@ -120,15 +120,13 @@ result = detector.Detect(Path("/path/to/pdfs/example.pdf"))
 print(result.to_dict())
 ~~~
-`Detect(Path)` returns a **FileResult** dataclass; call `.to_dict()` for the JSON-friendly representation (see [Result schema](#result-schema)). Each signature entry now exposes `bounding_box` coordinates (PDF points, origin bottom-left). When PNG cropping is enabled, `crop_path` points at the generated image.
+`Detect(Path)` returns a **FileResult** dataclass; call `.to_dict()` for the JSON-friendly representation (see [Result schema](#result-schema)). Each signature entry now exposes `bounding_box` coordinates (PDF points, origin bottom-left). When PNG cropping is enabled, `crop_path` points at the generated image. Use `Engine="auto"` if you want the single-pass defaults that prefer PyMuPDF (for geometry) when available.
 ---
 ## Library API (embed in another script)
-Minimal, plug-and-play API
-Import from `sigdetect.api` and get plain dicts out (JSON-ready),
-with no I/O side effects by default:
+Minimal, plug-and-play API that returns plain dicts (JSON-ready) without side effects unless you opt into cropping:
 ~~~python
 from pathlib import Path
@@ -176,21 +174,14 @@ for res in ScanDirectory(
 # 3) Crop PNG snippets for FileResult objects (requires PyMuPDF)
 detector = get_detector(pdfRoot="/path/to/pdfs", profileName="hipaa")
 file_result = detector.Detect(Path("/path/to/pdfs/example.pdf"))
-crops = CropSignatureImages(
+CropSignatureImages(
     "/path/to/pdfs/example.pdf",
     file_result,
     outputDirectory="./signature_crops",
     dpi=200,
-    returnBytes=True,  # also returns in-memory PNG bytes for each crop
 )
-first_crop = crops[0]
-print(first_crop.path, len(first_crop.image_bytes))
 ~~~
-When ``returnBytes=True`` the helper returns ``SignatureCrop`` objects containing the saved path,
-PNG bytes, and the originating signature metadata.
 ## Result schema
@@ -229,7 +220,7 @@ High-level summary (per file):
       "scores": { "page_label": 4, "general": 2 },
       "evidence": ["page_label:representative(parent/guardian)", "pseudo:true"],
       "hint": "VendorOrAcroOnly",
-      "render_type": "unknown",
+      "render_type": "typed",
       "bounding_box": null,
       "crop_path": null
     }
@@ -274,6 +265,10 @@ profile: retainer    # or: hipaa
 crop_signatures: false   # enable to write PNG crops (requires pymupdf)
 # crop_output_dir: ./signature_crops
 crop_image_dpi: 200
+detect_wet_signatures: false   # opt-in OCR wet detection (PyMuPDF + Tesseract)
+wet_ocr_dpi: 200
+wet_ocr_languages: eng
+wet_precision_threshold: 0.82
 ~~~
 YAML files can be customized or load at runtime (see CLI `--config`, if available, or import and pass patterns into engine).
@@ -288,6 +283,7 @@ YAML files can be customized or load at runtime (see CLI `--config`, if availabl
   - Looks for client and firm labels/tokens; boosts pages with law-firm markers (LLP/LLC/PA/PC) and “By:” blocks.
   - Applies an anti-front-matter rule to reduce page-1 false positives (e.g., letterheads, firm mastheads).
   - When only vendor/Acro clues exist (no widgets), it will emit two pseudo signatures targeting likely pages.
+- **Wet detection (opt-in):** With `detect_wet_signatures: true`, the CLI runs an OCR-backed pass (PyMuPDF + pytesseract/Tesseract) after e-sign detection. It emits `RenderType="wet"` signatures for high-confidence label/stroke pairs in the lower page region. Missing OCR dependencies add a `ManualReview:*` hint instead of failing.
 ---

{sigdetect-0.3.0 → sigdetect-0.4.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "sigdetect"
-version = "0.3.0"
+version = "0.4.0"
 description = "Signature detection and role attribution for PDFs"
 readme = "README.md"
 authors = [{ name = "BT Asmamaw", email = "basmamaw@angeiongroup.com" }]

{sigdetect-0.3.0 → sigdetect-0.4.0}/src/sigdetect/__init__.py RENAMED

@@ -21,4 +21,4 @@ try:
 except PackageNotFoundError:  # pragma: no cover
     __version__ = "0.0.0"
-DEFAULT_ENGINE = "pypdf2"
+DEFAULT_ENGINE = "auto"

{sigdetect-0.3.0 → sigdetect-0.4.0}/src/sigdetect/api.py RENAMED

@@ -10,7 +10,7 @@ from sigdetect.config import DetectConfiguration
 from sigdetect.cropping import SignatureCrop
 from sigdetect.detector import BuildDetector, Detector, FileResult, Signature
-EngineName = Literal["pypdf2", "pypdf", "pymupdf"]
+EngineName = Literal["pypdf2", "pypdf", "pymupdf", "auto"]
 ProfileName = Literal["hipaa", "retainer"]
@@ -18,7 +18,7 @@ def DetectPdf(
     pdfPath: str | Path,
     *,
     profileName: ProfileName = "hipaa",
-    engineName: EngineName = "pypdf2",
+    engineName: EngineName = "auto",
     includePseudoSignatures: bool = True,
     recurseXObjects: bool = True,
     detector: Detector | None = None,
@@ -43,7 +43,7 @@ def get_detector(
     *,
     pdfRoot: str | Path | None = None,
     profileName: ProfileName = "hipaa",
-    engineName: EngineName = "pypdf2",
+    engineName: EngineName = "auto",
     includePseudoSignatures: bool = True,
     recurseXObjects: bool = True,
     outputDirectory: str | Path | None = None,
@@ -200,7 +200,9 @@ def CropSignatureImages(
     outputDirectory: str | Path,
     dpi: int = 200,
     returnBytes: Literal[False] = False,
-) -> list[Path]: ...
+    saveToDisk: bool = True,
+) -> list[Path]:
+    ...
 @overload
@@ -211,7 +213,9 @@ def CropSignatureImages(
     outputDirectory: str | Path,
     dpi: int,
     returnBytes: Literal[True],
-) -> list[SignatureCrop]: ...
+    saveToDisk: bool,
+) -> list[SignatureCrop]:
+    ...
 def CropSignatureImages(
@@ -221,12 +225,14 @@ def CropSignatureImages(
     outputDirectory: str | Path,
     dpi: int = 200,
     returnBytes: bool = False,
+    saveToDisk: bool = True,
 ) -> list[Path] | list[SignatureCrop]:
     """Crop detected signature regions to PNG files.
     Accepts either a :class:`FileResult` instance or the ``dict`` returned by
     :func:`DetectPdf`. Requires the optional ``pymupdf`` dependency.
-    Set ``returnBytes=True`` to also receive in-memory PNG bytes for each crop.
+    Set ``returnBytes=True`` to also receive in-memory PNG bytes for each crop. Set
+    ``saveToDisk=False`` to skip writing PNG files while still returning in-memory data.
     """
     from sigdetect.cropping import crop_signatures
@@ -238,6 +244,7 @@ def CropSignatureImages(
         output_dir=Path(outputDirectory),
         dpi=dpi,
         return_bytes=returnBytes,
+        save_files=saveToDisk,
     )
     if original_dict is not None:
         original_dict.clear()

{sigdetect-0.3.0 → sigdetect-0.4.0}/src/sigdetect/cli.py RENAMED

@@ -15,6 +15,7 @@ from .cropping import SignatureCroppingUnavailable, crop_signatures
 from .detector import BuildDetector, FileResult
 from .eda import RunExploratoryAnalysis
 from .logging_setup import ConfigureLogging
+from .wet_detection import apply_wet_detection
 Logger = ConfigureLogging()
@@ -72,6 +73,33 @@ def Detect(
         help="Rendering DPI for signature crops",
         show_default=False,
     ),
+    detectWetSignatures: bool | None = typer.Option(
+        None,
+        "--detect-wet/--no-detect-wet",
+        help="Run OCR-backed wet signature detection (requires PyMuPDF + Tesseract)",
+        show_default=False,
+    ),
+    wetOcrDpi: int | None = typer.Option(
+        None,
+        "--wet-ocr-dpi",
+        min=72,
+        max=600,
+        help="Rendering DPI for OCR pages (wet detection)",
+        show_default=False,
+    ),
+    wetOcrLanguages: str | None = typer.Option(
+        None,
+        "--wet-ocr-languages",
+        help="Tesseract language packs for OCR (e.g., 'eng' or 'eng+spa')",
+    ),
+    wetPrecisionThreshold: float | None = typer.Option(
+        None,
+        "--wet-precision-threshold",
+        min=0.0,
+        max=1.0,
+        help="Minimum wet-signature confidence (0-1) to accept a candidate",
+        show_default=False,
+    ),
 ) -> None:
     """Run detection for the configured directory and emit ``results.json``."""
@@ -89,6 +117,14 @@ def Detect(
         overrides["CropOutputDirectory"] = cropDirectory
     if cropDpi is not None:
         overrides["CropImageDpi"] = cropDpi
+    if detectWetSignatures is not None:
+        overrides["DetectWetSignatures"] = detectWetSignatures
+    if wetOcrDpi is not None:
+        overrides["WetOcrDpi"] = wetOcrDpi
+    if wetOcrLanguages is not None:
+        overrides["WetOcrLanguages"] = wetOcrLanguages
+    if wetPrecisionThreshold is not None:
+        overrides["WetPrecisionThreshold"] = wetPrecisionThreshold
     if overrides:
         configuration = configuration.model_copy(update=overrides)
         configuration = FinalizeConfiguration(configuration)
@@ -182,6 +218,7 @@ def Detect(
     def _process(pdf_path: Path) -> None:
         file_result = detector.Detect(pdf_path)
+        apply_wet_detection(pdf_path, configuration, file_result, logger=Logger)
         _append_result(file_result, pdf_path)
     try:
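
Review note: the `overrides` block in `Detect` only forwards options that were actually passed on the command line (value is not `None`), so config-file and environment values survive unless they are explicitly overridden. Note that `False` is a real value and is kept, which is what makes `--no-detect-wet` work. A minimal standalone sketch of that pattern follows, using a plain dict in place of the pydantic model; `collect_overrides` and `apply_overrides` are hypothetical helper names for illustration, not part of sigdetect.

```python
from typing import Any


def collect_overrides(**cli_options: Any) -> dict[str, Any]:
    """Keep only the options the user actually supplied (value is not None)."""
    return {key: value for key, value in cli_options.items() if value is not None}


def apply_overrides(configuration: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the configuration with the overrides layered on top."""
    if not overrides:
        return configuration
    return {**configuration, **overrides}


# Assumed stand-in for values loaded from YAML and environment variables.
base = {"DetectWetSignatures": True, "WetOcrDpi": 200, "WetOcrLanguages": "eng"}

# Only --no-detect-wet was passed; the other options stayed at None.
overrides = collect_overrides(
    DetectWetSignatures=False,
    WetOcrDpi=None,
    WetOcrLanguages=None,
)
merged = apply_overrides(base, overrides)
print(merged)
```

Using `None` as the "not supplied" sentinel is why the Typer options in the hunk above default to `None` rather than to the configuration defaults.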

{sigdetect-0.3.0 → sigdetect-0.4.0}/src/sigdetect/config.py RENAMED

@@ -10,7 +10,7 @@ from typing import Literal
 import yaml
 from pydantic import BaseModel, ConfigDict, Field, field_validator
-EngineName = Literal["pypdf2", "pypdf", "pymupdf"]
+EngineName = Literal["pypdf2", "pypdf", "pymupdf", "auto"]
 ProfileName = Literal["hipaa", "retainer"]
@@ -25,13 +25,19 @@ class DetectConfiguration(BaseModel):
     PdfRoot: Path = Field(default=Path("hipaa_results"), alias="pdf_root")
     OutputDirectory: Path | None = Field(default=Path("out"), alias="out_dir")
-    Engine: EngineName = Field(default="pypdf2", alias="engine")
+    Engine: EngineName = Field(default="auto", alias="engine")
     Profile: ProfileName = Field(default="hipaa", alias="profile")
     PseudoSignatures: bool = Field(default=True, alias="pseudo_signatures")
     RecurseXObjects: bool = Field(default=True, alias="recurse_xobjects")
-    CropSignatures: bool = Field(default=False, alias="crop_signatures")
+    CropSignatures: bool = Field(default=True, alias="crop_signatures")
     CropOutputDirectory: Path | None = Field(default=None, alias="crop_output_dir")
     CropImageDpi: int = Field(default=200, alias="crop_image_dpi", ge=72, le=600)
+    DetectWetSignatures: bool = Field(default=True, alias="detect_wet_signatures")
+    WetOcrDpi: int = Field(default=200, alias="wet_ocr_dpi", ge=72, le=600)
+    WetOcrLanguages: str = Field(default="eng", alias="wet_ocr_languages")
+    WetPrecisionThreshold: float = Field(
+        default=0.82, alias="wet_precision_threshold", ge=0.0, le=1.0
+    )
     @field_validator("PdfRoot", "OutputDirectory", "CropOutputDirectory", mode="before")
     @classmethod
@@ -85,6 +91,22 @@ class DetectConfiguration(BaseModel):
     def crop_image_dpi(self) -> int:  # pragma: no cover - simple passthrough
         return self.CropImageDpi
+    @property
+    def detect_wet_signatures(self) -> bool:  # pragma: no cover - simple passthrough
+        return self.DetectWetSignatures
+    @property
+    def wet_ocr_dpi(self) -> int:  # pragma: no cover - simple passthrough
+        return self.WetOcrDpi
+    @property
+    def wet_ocr_languages(self) -> str:  # pragma: no cover - simple passthrough
+        return self.WetOcrLanguages
+    @property
+    def wet_precision_threshold(self) -> float:  # pragma: no cover - simple passthrough
+        return self.WetPrecisionThreshold
 def LoadConfiguration(path: Path | None) -> DetectConfiguration:
     """Load configuration from ``path`` while applying environment overrides.
@@ -108,6 +130,10 @@ def LoadConfiguration(path: Path | None) -> DetectConfiguration:
     env_crop = os.getenv("SIGDETECT_CROP_SIGNATURES")
     env_crop_dir = os.getenv("SIGDETECT_CROP_DIR")
     env_crop_dpi = os.getenv("SIGDETECT_CROP_DPI")
+    env_detect_wet = os.getenv("SIGDETECT_DETECT_WET")
+    env_wet_dpi = os.getenv("SIGDETECT_WET_OCR_DPI")
+    env_wet_lang = os.getenv("SIGDETECT_WET_LANGUAGES")
+    env_wet_precision = os.getenv("SIGDETECT_WET_PRECISION")
     raw_data: dict[str, object] = {}
     if path and Path(path).exists():
@@ -133,6 +159,20 @@ def LoadConfiguration(path: Path | None) -> DetectConfiguration:
     if env_crop_dpi:
         with suppress(ValueError):
             raw_data["crop_image_dpi"] = int(env_crop_dpi)
+    if env_detect_wet is not None:
+        lowered = env_detect_wet.lower()
+        if lowered in {"1", "true", "yes", "on"}:
+            raw_data["detect_wet_signatures"] = True
+        elif lowered in {"0", "false", "no", "off"}:
+            raw_data["detect_wet_signatures"] = False
+    if env_wet_dpi:
+        with suppress(ValueError):
+            raw_data["wet_ocr_dpi"] = int(env_wet_dpi)
+    if env_wet_lang:
+        raw_data["wet_ocr_languages"] = env_wet_lang
+    if env_wet_precision:
+        with suppress(ValueError):
+            raw_data["wet_precision_threshold"] = float(env_wet_precision)
     configuration = DetectConfiguration(**raw_data)
     return FinalizeConfiguration(configuration)
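
Review note: the new environment handling is tri-state for `SIGDETECT_DETECT_WET` (recognized truthy token, recognized falsy token, or ignored) and silently drops numeric values that fail to parse. The sketch below restates that behavior as two standalone helpers so it can be tried in isolation; `parse_bool_env` and `parse_float_env` are hypothetical names, not functions in sigdetect.

```python
from contextlib import suppress
from typing import Optional

TRUTHY = {"1", "true", "yes", "on"}
FALSY = {"0", "false", "no", "off"}


def parse_bool_env(raw: Optional[str]) -> Optional[bool]:
    """Map an environment string to True/False, or None when it should be ignored."""
    if raw is None:
        return None
    lowered = raw.lower()
    if lowered in TRUTHY:
        return True
    if lowered in FALSY:
        return False
    # Unrecognized tokens (including the empty string) leave the setting untouched.
    return None


def parse_float_env(raw: Optional[str]) -> Optional[float]:
    """Parse a float, returning None for unset, empty, or malformed values."""
    if not raw:
        return None
    with suppress(ValueError):
        return float(raw)
    return None


print(parse_bool_env("ON"), parse_bool_env("maybe"), parse_float_env("0.9"))
```

One consequence worth keeping in mind: a typo such as `SIGDETECT_DETECT_WET=flase` is ignored rather than rejected, so the default (`True` in 0.4.0) stays in effect.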

{sigdetect-0.3.0 → sigdetect-0.4.0}/src/sigdetect/cropping.py RENAMED

@@ -28,6 +28,7 @@ class SignatureCrop:
     path: Path
     image_bytes: bytes
     signature: Signature
+    saved_to_disk: bool = True
 @overload
@@ -39,7 +40,9 @@ def crop_signatures(
     dpi: int = 200,
     logger: logging.Logger | None = None,
     return_bytes: Literal[False] = False,
-) -> list[Path]: ...
+    save_files: bool = True,
+) -> list[Path]:
+    ...
 @overload
@@ -50,8 +53,10 @@ def crop_signatures(
     output_dir: Path,
     dpi: int = 200,
     logger: logging.Logger | None = None,
-    return_bytes: Literal[True] = True,
-) -> list[SignatureCrop]: ...
+    return_bytes: Literal[True],
+    save_files: bool = True,
+) -> list[SignatureCrop]:
+    ...
 def crop_signatures(
@@ -62,27 +67,32 @@ def crop_signatures(
     dpi: int = 200,
     logger: logging.Logger | None = None,
     return_bytes: bool = False,
+    save_files: bool = True,
 ) -> list[Path] | list[SignatureCrop]:
     """Render each signature bounding box to a PNG image using PyMuPDF.
     Set ``return_bytes=True`` to collect in-memory PNG bytes for each crop while also writing
-    the files to ``output_dir``.
+    the files to ``output_dir``. Set ``save_files=False`` to skip writing PNGs to disk.
     """
     if fitz is None:  # pragma: no cover - exercised when dependency absent
         raise SignatureCroppingUnavailable(
             "PyMuPDF is required for PNG crops. Install 'pymupdf' or 'sigdetect[pymupdf]'."
         )
+    if not save_files and not return_bytes:
+        raise ValueError("At least one of save_files or return_bytes must be True")
     pdf_path = Path(pdf_path)
     output_dir = Path(output_dir)
-    output_dir.mkdir(parents=True, exist_ok=True)
+    if save_files:
+        output_dir.mkdir(parents=True, exist_ok=True)
     generated_paths: list[Path] = []
     generated_crops: list[SignatureCrop] = []
     with fitz.open(pdf_path) as document:  # type: ignore[attr-defined]
         per_document_dir = output_dir / pdf_path.stem
-        per_document_dir.mkdir(parents=True, exist_ok=True)
+        if save_files:
+            per_document_dir.mkdir(parents=True, exist_ok=True)
         scale = dpi / 72.0
         matrix = fitz.Matrix(scale, scale)
@@ -113,7 +123,8 @@ def crop_signatures(
             try:
                 image_bytes: bytes | None = None
                 pixmap = page.get_pixmap(matrix=matrix, clip=clip, alpha=False)
-                pixmap.save(destination)
+                if save_files:
+                    pixmap.save(destination)
                 if return_bytes:
                     image_bytes = pixmap.tobytes("png")
             except Exception as exc:  # pragma: no cover - defensive
@@ -129,8 +140,9 @@ def crop_signatures(
                     )
                 continue
-            signature.CropPath = str(destination)
-            generated_paths.append(destination)
+            if save_files:
+                signature.CropPath = str(destination)
+                generated_paths.append(destination)
             if return_bytes:
                 if image_bytes is None:  # pragma: no cover - defensive
                     continue
@@ -139,6 +151,7 @@ def crop_signatures(
                         path=destination,
                         image_bytes=image_bytes,
                         signature=signature,
+                        saved_to_disk=save_files,
                     )
                 )
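
Review note: with `save_files` added, `crop_signatures` now has four flag combinations, one of which is rejected. The sketch below summarizes what each combination appears to do according to the hunks above (directory creation and `CropPath` are tied to `save_files`; the `SignatureCrop` return type is tied to `return_bytes` per the overloads). `describe_crop_outputs` is a hypothetical helper written for this note and does not render anything.

```python
def describe_crop_outputs(save_files: bool = True, return_bytes: bool = False) -> dict[str, bool]:
    """Summarize the side effects and return shape for a flag combination."""
    if not save_files and not return_bytes:
        # Mirrors the guard added in 0.4.0: the call would otherwise produce nothing.
        raise ValueError("At least one of save_files or return_bytes must be True")
    return {
        "writes_png_files": save_files,
        "sets_signature_crop_path": save_files,
        "returns_signature_crop_objects": return_bytes,
    }


for flags in [(True, False), (True, True), (False, True)]:
    print(flags, describe_crop_outputs(*flags))
```

In the in-memory-only case (`save_files=False`, `return_bytes=True`) each `SignatureCrop` still carries a `path`, but `saved_to_disk` is `False`, so callers should check that field before trying to open the file.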

{sigdetect-0.3.0 → sigdetect-0.4.0}/src/sigdetect/detector/__init__.py RENAMED

@@ -2,6 +2,7 @@
 from __future__ import annotations
+import warnings
 from typing import TYPE_CHECKING, Type
 from .base_detector import Detector
@@ -37,7 +38,23 @@ def BuildDetector(configuration: DetectConfiguration) -> Detector:
         or getattr(configuration, "engine", None)
         or PyPDF2Detector.Name
     )
-    normalized = engine_name.lower()
+    normalized = str(engine_name).lower()
+    if normalized == "auto":
+        detector_cls: Type[Detector] | None = None
+        if PyMuPDFDetector is not None:
+            detector_cls = ENGINE_REGISTRY.get(getattr(PyMuPDFDetector, "Name", "")) or PyMuPDFDetector
+        if detector_cls is None:
+            detector_cls = ENGINE_REGISTRY.get(PyPDF2Detector.Name) or ENGINE_REGISTRY.get("pypdf")
+            warnings.warn(
+                "Engine 'auto' falling back to 'pypdf2' because PyMuPDF is unavailable",
+                RuntimeWarning,
+                stacklevel=2,
+            )
+        if detector_cls is None:
+            available = ", ".join(sorted(ENGINE_REGISTRY)) or "<none>"
+            raise ValueError(f"No available detector engines. Available engines: {available}")
+        return detector_cls(configuration)
     detector_cls = ENGINE_REGISTRY.get(normalized)
     if detector_cls is None:
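
Review note: the `auto` branch prefers the PyMuPDF detector and falls back to PyPDF2 with a `RuntimeWarning`. A simplified standalone sketch of the same selection order is shown below. It assumes a plain dict registry and two placeholder classes; `resolve_engine`, `FastEngine`, and `BasicEngine` are hypothetical names. Unlike the hunk above, it only warns once a fallback has actually been found.

```python
import warnings


class FastEngine:
    """Placeholder for the preferred (optional-dependency) engine."""


class BasicEngine:
    """Placeholder for the always-available fallback engine."""


def resolve_engine(
    requested: str,
    registry: dict[str, type],
    preferred: str = "pymupdf",
    fallback: str = "pypdf2",
) -> type:
    """Pick an engine class, treating 'auto' as preferred-then-fallback."""
    normalized = str(requested).lower()
    available = ", ".join(sorted(registry)) or "<none>"
    if normalized != "auto":
        if normalized not in registry:
            raise ValueError(f"Unknown engine {requested!r}. Available engines: {available}")
        return registry[normalized]
    if preferred in registry:
        return registry[preferred]
    if fallback in registry:
        warnings.warn(
            f"Engine 'auto' falling back to {fallback!r} because {preferred!r} is unavailable",
            RuntimeWarning,
            stacklevel=2,
        )
        return registry[fallback]
    raise ValueError(f"No available detector engines. Available engines: {available}")


print(resolve_engine("auto", {"pymupdf": FastEngine, "pypdf2": BasicEngine}).__name__)
```

Callers that treat warnings as errors (for example `python -W error`) would see the fallback surface as an exception, which may be worth noting for CI environments without PyMuPDF.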

{sigdetect-0.3.0 → sigdetect-0.4.0}/src/sigdetect/detector/pymupdf_engine.py RENAMED

@@ -111,6 +111,7 @@ class PyMuPDFDetector(PyPDF2Detector):
                     rect, exclusion, mode = rect_info
                     padded = self._PadRect(rect, page.rect, signature.Role, exclusion, mode)
                     signature.BoundingBox = self._RectToPdfTuple(padded, page.rect.height)
+                    signature.RenderType = "drawn"
                     if signature.Page is None:
                         signature.Page = page_index + 1
                     break

{sigdetect-0.3.0 → sigdetect-0.4.0}/src/sigdetect/detector/pypdf2_engine.py RENAMED

@@ -348,7 +348,7 @@ class PyPDF2Detector(Detector):
         return normalized.lower().startswith("im")
     def _ClassifyAppearance(self, widget: generic.DictionaryObject, page) -> str:
-        """Classify the widget's appearance as drawn/typed/hybrid/unknown."""
+        """Classify the widget's appearance as drawn or typed."""
         ap_dict = AsDictionary(widget.get("/AP"))
         if not isinstance(ap_dict, generic.DictionaryObject):
@@ -356,7 +356,7 @@ class PyPDF2Detector(Detector):
         normal = ap_dict.get("/N")
         streams = self._ExtractAppearanceStreams(normal)
         if not streams:
-            return "unknown"
+            return "typed"
         has_text = False
         has_vector = False
@@ -384,13 +384,11 @@ class PyPDF2Detector(Detector):
                         has_image = True
                         break
-        if has_image and (has_text or has_vector):
-            return "hybrid"
         if has_image:
             return "drawn"
         if has_text or has_vector:
             return "typed"
-        return "unknown"
+        return "typed"
     # ---- file-wide stream scan (compressed or not)
     def _ScanFileStreamsForVendors(self, file_bytes: bytes) -> tuple[set[str], str]:
@@ -863,6 +861,7 @@ class PyPDF2Detector(Detector):
                                 Scores={r: sc},
                                 Evidence=ev + ["pseudo:true"],
                                 Hint="VendorOrAcroOnly",
+                                RenderType="typed",
                             )
                         )
@@ -903,6 +902,7 @@ class PyPDF2Detector(Detector):
                                 Scores={role: score} if score > 0 else {},
                                 Evidence=ev + ["pseudo:true"],
                                 Hint="VendorOrAcroOnly",
+                                RenderType="typed",
                             )
                         )
@@ -1055,6 +1055,7 @@ class PyPDF2Detector(Detector):
                         Scores=scores,
                         Evidence=evidence,
                         Hint=f"AcroSig:{fname}" if fname else "AcroSig",
+                        RenderType="typed",
                     )
                 )
@@ -1120,6 +1121,7 @@ class PyPDF2Detector(Detector):
                         Scores=dict(scores),
                         Evidence=evidence + ["pseudo:true"],
                         Hint="VendorOrAcroOnly",
+                        RenderType="typed",
                     )
                 )
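
Review note: the `_ClassifyAppearance` change collapses four render types into two. The sketch below places the 0.3.0 and 0.4.0 decision rules side by side as pure functions over the boolean flags visible in the hunks (`has_text`, `has_vector`, `has_image`, plus whether any appearance streams were found). The function names are hypothetical, and the earlier `/AP` dictionary check is left out because its return value is not shown in the diff.

```python
def classify_v030(has_streams: bool, has_image: bool, has_text: bool, has_vector: bool) -> str:
    """Decision rule as it read before the change (four possible labels)."""
    if not has_streams:
        return "unknown"
    if has_image and (has_text or has_vector):
        return "hybrid"
    if has_image:
        return "drawn"
    if has_text or has_vector:
        return "typed"
    return "unknown"


def classify_v040(has_streams: bool, has_image: bool, has_text: bool, has_vector: bool) -> str:
    """Decision rule after the change: any image means 'drawn', everything else 'typed'."""
    if not has_streams:
        return "typed"
    if has_image:
        return "drawn"
    # has_text / has_vector no longer change the outcome.
    return "typed"


cases = [
    (False, False, False, False),  # no appearance streams
    (True, True, True, False),     # image plus text
    (True, True, False, False),    # image only
    (True, False, False, False),   # streams but nothing recognized
]
for case in cases:
    print(case, classify_v030(*case), "->", classify_v040(*case))
```

Downstream consumers that filtered on `"unknown"` or `"hybrid"` would no longer match anything coming from this classifier, so any such filters may need revisiting when upgrading.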

{sigdetect-0.3.0 → sigdetect-0.4.0}/src/sigdetect/detector/signature_model.py RENAMED

@@ -17,7 +17,7 @@ class Signature:
     Scores: dict[str, int]
     Evidence: list[str]
     Hint: str
-    RenderType: str = "unknown"
+    RenderType: str = "typed"
     BoundingBox: tuple[float, float, float, float] | None = None
     CropPath: str | None = None
