PyPI - ocrcontext - Versions diffs - 0.1.3__tar.gz → 0.1.5__tar.gz - Mend

ocrcontext 0.1.3tar.gz → 0.1.5tar.gz


Files changed (54)

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/CHANGELOG.md RENAMED Viewed

@@ -7,7 +7,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
-## [0.1.3] - 2026-06-27
+## [0.1.5] - 2026-06-27
+### Fixed
+- CLI now shows a clear error message when an LLM provider API key is missing
+  instead of a raw traceback (e.g. `OPENAI_API_KEY` not set).
+- CLI prints a first-run warning before the OCR step when PaddleOCR models
+  have not been downloaded yet, so users know the ~90 MB download is expected.
+## [0.1.4] - 2026-06-27
 ### Added
 - **GPU acceleration** — `Analyzer(use_gpu=True)` routes PaddleOCR inference to a
@@ -95,7 +103,9 @@ into a standalone, LLM-agnostic library.
 - **Packaging** — optional extras `[paddle]`, `[trocr]`, `[vision]`, `[all]`;
   PEP 561 typed (`py.typed`); examples and a GPU/network-free test suite.
-[Unreleased]: https://github.com/bahadirkarsli/ocrcontext/compare/v0.1.3...HEAD
+[Unreleased]: https://github.com/bahadirkarsli/ocrcontext/compare/v0.1.5...HEAD
+[0.1.5]: https://github.com/bahadirkarsli/ocrcontext/compare/v0.1.4...v0.1.5
+[0.1.4]: https://github.com/bahadirkarsli/ocrcontext/compare/v0.1.3...v0.1.4
 [0.1.3]: https://github.com/bahadirkarsli/ocrcontext/compare/v0.1.2...v0.1.3
 [0.1.2]: https://github.com/bahadirkarsli/ocrcontext/compare/v0.1.1...v0.1.2
 [0.1.1]: https://github.com/bahadirkarsli/ocrcontext/compare/v0.1.0...v0.1.1
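Note on the first-run warning in the 0.1.5 entry: the `cli.py` hunk in this diff decides whether to warn by looking for an `official_models` directory under `PADDLE_PDX_CACHE_HOME` (falling back to `~/.paddlex`). A minimal, standalone sketch of that check follows. The helper names `paddlex_cache_dir` and `is_first_run` are hypothetical and are not part of the package.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def paddlex_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the cache root: PADDLE_PDX_CACHE_HOME if set, else ~/.paddlex."""
    env = os.environ if env is None else env
    return Path(env.get("PADDLE_PDX_CACHE_HOME", Path.home() / ".paddlex"))


def is_first_run(cache_dir: Path) -> bool:
    """True when no 'official_models' directory exists under the cache root."""
    return not (cache_dir / "official_models").exists()


# Demonstrate against a throwaway directory instead of the real cache.
with tempfile.TemporaryDirectory() as tmp:
    cache = paddlex_cache_dir({"PADDLE_PDX_CACHE_HOME": tmp})
    before = is_first_run(cache)         # empty cache root
    (cache / "official_models").mkdir()
    after = is_first_run(cache)          # models directory now present
```

The check is a heuristic: an existing `official_models` directory does not guarantee that the specific model needed for the requested language is already there, so a download can still happen without the warning.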

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/PKG-INFO RENAMED Viewed

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: ocrcontext
-Version: 0.1.3
+Version: 0.1.5
 Summary: Decoupled, LLM-agnostic document OCR + structured extraction. Vision and LLM parsing in 3 lines of code.
 Project-URL: Homepage, https://github.com/BahadirKarsli/OCRContext
 Project-URL: Repository, https://github.com/BahadirKarsli/OCRContext
@@ -90,8 +90,19 @@ print(result.text)
 `ocrcontext` is the extraction core of a production document-analysis platform, lifted out of its FastAPI/Next.js stack into a pure, pip-installable library. It handles OCR engine routing, fidelity-first LLM cleanup, and schema-based structured extraction — and gets out of your way.
+## Demo
+**Structured invoice extraction from an image:**
+<img width="100%" alt="Invoice extraction demo" src="https://github.com/user-attachments/assets/8e77ab83-fff3-4929-9a54-7f4a75acc16f" />
+**Digital PDF text extraction:**
+<img width="100%" alt="PDF extraction demo" src="https://github.com/user-attachments/assets/84437bd0-9d24-4a2e-8e0c-0014c9e85820" />
 ## Contents
+- [Demo](#demo)
 - [Install](#install)
 - [CLI](#cli)
 - [Quick start (Python API)](#quick-start-python-api)

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/README.md RENAMED Viewed

@@ -35,8 +35,19 @@ print(result.text)
 `ocrcontext` is the extraction core of a production document-analysis platform, lifted out of its FastAPI/Next.js stack into a pure, pip-installable library. It handles OCR engine routing, fidelity-first LLM cleanup, and schema-based structured extraction — and gets out of your way.
+## Demo
+**Structured invoice extraction from an image:**
+<img width="100%" alt="Invoice extraction demo" src="https://github.com/user-attachments/assets/8e77ab83-fff3-4929-9a54-7f4a75acc16f" />
+**Digital PDF text extraction:**
+<img width="100%" alt="PDF extraction demo" src="https://github.com/user-attachments/assets/84437bd0-9d24-4a2e-8e0c-0014c9e85820" />
 ## Contents
+- [Demo](#demo)
 - [Install](#install)
 - [CLI](#cli)
 - [Quick start (Python API)](#quick-start-python-api)

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/pyproject.toml RENAMED Viewed

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "ocrcontext"
-version = "0.1.3"
+version = "0.1.5"
 description = "Decoupled, LLM-agnostic document OCR + structured extraction. Vision and LLM parsing in 3 lines of code."
 readme = "README.md"
 license = { text = "MIT" }

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/cli.py RENAMED Viewed

@@ -11,6 +11,7 @@ Then run:
 from __future__ import annotations
+import os
 import sys
 from pathlib import Path
 from typing import Optional
@@ -25,6 +26,7 @@ except ImportError:  # pragma: no cover
 from .analyzer import Analyzer
 from .config import AnalyzerConfig
+from .types import OcrResult
 from .schemas import (
     Contract,
     IdCard,
@@ -33,6 +35,61 @@ from .schemas import (
     Receipt,
 )
+def _suppress_paddle_noise() -> None:
+    import logging
+    import warnings
+    # Set env vars BEFORE any paddle/paddlex import so they see the right paths.
+    # _ensure_ascii_model_cache() in paddle.py does the same but only when the
+    # engine lazy-loads; calling it here guarantees it runs first.
+    from .engines.paddle import _ensure_ascii_model_cache, _ensure_paddle_runtime_flags
+    _ensure_ascii_model_cache()
+    _ensure_paddle_runtime_flags()
+    os.environ.setdefault("GLOG_minloglevel", "3")
+    # Silence Python-level loggers (no paddlex import — that would defeat the purpose).
+    null = logging.NullHandler()
+    for name in ("ppocr", "paddlex", "paddle", "paddle.utils", "paddle.fluid"):
+        lg = logging.getLogger(name)
+        lg.setLevel(logging.ERROR)
+        lg.handlers = [null]
+        lg.propagate = False
+    # Root-level filter catches sub-loggers that bypass the above (e.g. paddlex.utils.*).
+    class _NoiseFilter(logging.Filter):
+        _NOISE = ("Could not find files", "ccache", "oneDNN", "mkldnn")
+        def filter(self, record: logging.LogRecord) -> bool:
+            return not any(t in record.getMessage() for t in self._NOISE)
+    logging.getLogger().addFilter(_NoiseFilter())
+    warnings.filterwarnings("ignore", category=UserWarning, module="paddle")
+def _route_label(result: OcrResult, file_path: Path) -> str:
+    src = result.text_source
+    if src == "pdf_text_layer":
+        return "DIGITAL PDF -> text layer"
+    if src == "ocr":
+        return "SCANNED PDF -> rasterize + PaddleOCR" if file_path.suffix.lower() == ".pdf" else "IMAGE -> PaddleOCR"
+    if src == "vision_handwriting":
+        return "HANDWRITING -> Google Vision"
+    if src == "handwriting_ocr":
+        return "HANDWRITING -> PaddleOCR"
+    return src
+def _info(msg: str) -> None:
+    typer.echo(f"[i] {msg}", err=True)
+def _ok(msg: str) -> None:
+    typer.echo(f"[OK] {msg}", err=True)
 app = typer.Typer(
     name="ocrcontext",
     help="OCR a document and optionally extract structured data.",
@@ -59,6 +116,13 @@ _SCHEMA_NAMES = list(_SCHEMAS)
 def _build_llm(provider: str, model: str):
     """Dynamically import the right LangChain provider class."""
+    _API_KEY_HINTS = {
+        "openai":    ("OPENAI_API_KEY",    "platform.openai.com/api-keys"),
+        "anthropic": ("ANTHROPIC_API_KEY", "console.anthropic.com/settings/keys"),
+        "google":    ("GOOGLE_API_KEY",    "aistudio.google.com/apikey"),
+        "ollama":    (None, None),
+    }
     try:
         if provider == "openai":
             from langchain_openai import ChatOpenAI  # type: ignore[import-untyped]
@@ -79,6 +143,19 @@ def _build_llm(provider: str, model: str):
             err=True,
         )
         raise typer.Exit(code=1)
+    except Exception as exc:
+        msg = str(exc)
+        if "api_key" in msg.lower() or "credentials" in msg.lower() or "auth" in msg.lower():
+            env_var, url = _API_KEY_HINTS.get(provider, (None, None))
+            hint = f"Set it with:  $env:{env_var} = \"...\"" if env_var else ""
+            url_hint = f"\nGet a key at: {url}" if url else ""
+            typer.echo(
+                f"[ERROR] No API key found for '{provider}'.\n{hint}{url_hint}",
+                err=True,
+            )
+        else:
+            typer.echo(f"[ERROR] Failed to initialize '{provider}': {exc}", err=True)
+        raise typer.Exit(code=1)
     typer.echo(
         f"[ERROR] Unknown provider '{provider}'. "
@@ -129,12 +206,13 @@ def extract(
 ) -> None:
     """OCR a document and optionally extract structured data."""
+    _suppress_paddle_noise()
     file_path = Path(file)
     if not file_path.exists():
         typer.echo(f"[ERROR] File not found: {file}", err=True)
         raise typer.Exit(code=1)
-    # Validate --schema value early for a clear error message.
     if schema is not None and schema not in _SCHEMAS:
         typer.echo(
             f"[ERROR] Unknown schema '{schema}'. "
@@ -148,36 +226,47 @@ def extract(
         raise typer.Exit(code=1)
     refine_flag = _parse_refine(refine)
-    # Build LLM only when needed.
     needs_llm = schema is not None or refine_flag is True
     llm = _build_llm(provider, model) if needs_llm else None
-    analyzer = Analyzer(
-        llm=llm,
-        config=AnalyzerConfig(lang=lang),
-    )
+    analyzer = Analyzer(llm=llm, config=AnalyzerConfig(lang=lang))
     try:
-        if schema is not None:
-            schema_cls = _SCHEMAS[schema]
-            result = analyzer.extract(
-                file_path,
-                schema=schema_cls,
-                handwriting=handwriting,
-                refine=refine_flag or False,
-            )
-            typer.echo(result.model_dump_json(indent=2))
-        else:
-            result = analyzer.analyze(
+        _info(f"file: {file_path.name}")
+        paddlex_cache = Path(os.environ.get("PADDLE_PDX_CACHE_HOME", Path.home() / ".paddlex"))
+        if not (paddlex_cache / "official_models").exists():
+            _info("first run: downloading OCR model (~90 MB), this may take a minute...")
+        _info("OCR...")
+        ocr_result = analyzer.analyze(
                 file_path,
                 handwriting=handwriting,
                 refine=refine_flag,
             )
+        conf = f"confidence: {ocr_result.confidence:.0%}" if ocr_result.confidence < 1.0 else "exact"
+        _ok(f"route: {_route_label(ocr_result, file_path)}  ({conf})")
+        if ocr_result.refined:
+            _ok("refine: APPLIED")
+        if schema is not None:
+            schema_cls = _SCHEMAS[schema]
+            _info(f"extract: {schema} schema...")
+            structured = analyzer.extract_text(
+                ocr_result.text,
+                schema_cls,
+                language=lang,
+            )
+            _ok(f"extract: {schema} [OK]")
+            typer.echo(structured.model_dump_json(indent=2))
+        else:
             if output == "json":
-                typer.echo(result.model_dump_json(indent=2))
+                typer.echo(ocr_result.model_dump_json(indent=2))
             else:
-                typer.echo(result.text)
+                typer.echo(ocr_result.text)
     except Exception as exc:  # noqa: BLE001
         typer.echo(f"[ERROR] {exc}", err=True)
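Note on the new `except Exception` branch in `_build_llm`: it classifies a provider failure as a missing-key problem by substring matching on the exception message. The sketch below isolates that idea so it can be exercised without LangChain or Typer. `looks_like_auth_error` and `format_init_error` are hypothetical names, and the hint wording is shell-neutral, whereas the diff prints a PowerShell-style `$env:` hint.

```python
from typing import Dict, Optional, Tuple

# Same provider table as _API_KEY_HINTS in the diff.
API_KEY_HINTS: Dict[str, Tuple[Optional[str], Optional[str]]] = {
    "openai": ("OPENAI_API_KEY", "platform.openai.com/api-keys"),
    "anthropic": ("ANTHROPIC_API_KEY", "console.anthropic.com/settings/keys"),
    "google": ("GOOGLE_API_KEY", "aistudio.google.com/apikey"),
    "ollama": (None, None),
}


def looks_like_auth_error(message: str) -> bool:
    """Keyword heuristic: same three substrings the diff checks for."""
    lowered = message.lower()
    return any(token in lowered for token in ("api_key", "credentials", "auth"))


def format_init_error(provider: str, exc: Exception) -> str:
    """Build the user-facing message for a failed provider initialization."""
    if not looks_like_auth_error(str(exc)):
        return f"[ERROR] Failed to initialize '{provider}': {exc}"
    env_var, url = API_KEY_HINTS.get(provider, (None, None))
    lines = [f"[ERROR] No API key found for '{provider}'."]
    if env_var:
        lines.append(f"Set the {env_var} environment variable.")
    if url:
        lines.append(f"Get a key at: {url}")
    return "\n".join(lines)


missing_key = format_init_error("openai", ValueError("The api_key option must be set"))
other_error = format_init_error("ollama", RuntimeError("connection refused"))
```

Because `"auth"` is matched as a plain substring, unrelated messages containing words such as "author" would also be reported as a missing key; a stricter check on the exception type would avoid that, at the cost of depending on each provider's exception classes.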

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/engines/paddle.py RENAMED Viewed

@@ -104,6 +104,7 @@ class PaddleEngine(OcrEngine):
         import logging
         logging.getLogger("ppocr").setLevel(logging.ERROR)
+        logging.getLogger("paddlex").setLevel(logging.ERROR)
         requested = paddle_lang
         ocr, errors = self._try_init(PaddleOCR, paddle_lang, use_gpu=self._use_gpu)
         if ocr is None and paddle_lang != "en":
@@ -129,9 +130,12 @@ class PaddleEngine(OcrEngine):
         # Shared 3.x flags: disable sub-models unneeded for plain OCR.
         # enable_mkldnn is forced False on CPU to avoid PaddlePaddle 3.x PIR bug;
         # on GPU it's irrelevant (MKLDNN is CPU-only) but harmless to keep False.
+        # use_gpu is only injected when True — some PaddleOCR 3.x builds reject it
+        # as an unknown argument even when set to False.
+        gpu_kwargs = {"use_gpu": True} if use_gpu else {}
         base_3x = {
             "lang": lang,
-            "use_gpu": use_gpu,
+            **gpu_kwargs,
             "use_doc_orientation_classify": False,
             "use_doc_unwarping": False,
             "use_textline_orientation": False,
@@ -147,10 +151,10 @@ class PaddleEngine(OcrEngine):
             # 3.x default — version determined by installed package, no pin
             base_3x,
             # Minimal 3.x (for builds that reject the sub-model flags)
-            {"lang": lang, "use_gpu": use_gpu, "enable_mkldnn": False},
-            {"lang": lang, "use_gpu": use_gpu},
+            {"lang": lang, **gpu_kwargs, "enable_mkldnn": False},
+            {"lang": lang, **gpu_kwargs},
             # Legacy 2.x (use_angle_cls; use_doc_* / show_log don't exist in 2.x)
-            {"use_angle_cls": True, "lang": lang, "use_gpu": use_gpu},
+            {"use_angle_cls": True, "lang": lang, **gpu_kwargs},
         ]
         errors: list[str] = []
         for kwargs in profiles:
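Note on the `gpu_kwargs` change: the constructor profiles are tried in order, and `use_gpu` is now passed only when it is `True`, so builds that reject the keyword can still be initialized on CPU. The sketch below reproduces the pattern with a stand-in constructor. `build_profiles`, `try_init`, and `strict_factory` are hypothetical, and the first profile is abbreviated because the diff does not show the full `base_3x` dictionary.

```python
from __future__ import annotations

from typing import Any, Callable


def build_profiles(lang: str, use_gpu: bool) -> list[dict[str, Any]]:
    """Ordered keyword profiles; use_gpu is injected only when True."""
    gpu_kwargs = {"use_gpu": True} if use_gpu else {}
    return [
        # Abbreviated stand-in for base_3x.
        {"lang": lang, **gpu_kwargs, "use_doc_unwarping": False, "enable_mkldnn": False},
        {"lang": lang, **gpu_kwargs, "enable_mkldnn": False},
        {"lang": lang, **gpu_kwargs},
        {"use_angle_cls": True, "lang": lang, **gpu_kwargs},
    ]


def try_init(factory: Callable[..., Any], profiles: list[dict[str, Any]]):
    """Return (engine, errors); engine is None if every profile fails."""
    errors: list[str] = []
    for kwargs in profiles:
        try:
            return factory(**kwargs), errors
        except Exception as exc:  # a real constructor may raise many types
            errors.append(f"{sorted(kwargs)}: {exc}")
    return None, errors


def strict_factory(*, lang: str) -> str:
    """Stand-in constructor that rejects every keyword except 'lang'."""
    return f"engine[{lang}]"


cpu_engine, cpu_errors = try_init(strict_factory, build_profiles("en", use_gpu=False))
gpu_engine, gpu_errors = try_init(strict_factory, build_profiles("en", use_gpu=True))
```

With `use_gpu=False` the third profile collapses to `{"lang": "en"}` and succeeds against the strict stand-in; with `use_gpu=True` every profile carries the rejected keyword, so initialization fails and the collected errors explain why.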

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/llm/schemas.py RENAMED Viewed

@@ -22,7 +22,7 @@ class LineItem(BaseModel):
             "Default 1 only if neither is available."
         ),
     )
-    unit: Optional[str] = Field(None, description="Unit type (Adet, Kg, Saat, etc.).")
+    unit: Optional[str] = Field(None, description="Unit of measure as written on the document (e.g. pcs, kg, hrs). Null if not present.")
     unit_price: Optional[float] = Field(None, description="Price per unit.")
     tax_rate: Optional[str] = Field(
         None, description="Tax percentage (e.g., 20, 10, 0) or pattern."
@@ -34,9 +34,9 @@ class Invoice(BaseModel):
     supplier_name: Optional[str] = Field(None, description="Name of the vendor/supplier.")
     invoice_date: Optional[str] = Field(None, description="Format YYYY-MM-DD.")
     invoice_number: Optional[str] = Field(None, description="The invoice ID/number.")
-    tax_id: Optional[str] = Field(None, description="Tax ID / VKN / TCKN.")
+    tax_id: Optional[str] = Field(None, description="Tax ID / VAT registration number.")
     tax_rate: Optional[str] = Field(
-        None, description="e.g. 'KDV %20' when KDV is 20%."
+        None, description="Tax/VAT rate as written on the document (e.g. 'VAT 20%', 'GST 10%')."
     )
     currency: Optional[str] = Field(None, description="Currency code (TRY, USD, EUR, etc.).")
     total_amount: Optional[float] = Field(None, description="Final total amount (numeric).")
@@ -63,37 +63,38 @@ class Invoice(BaseModel):
         return self
-# Verbatim system prompt from app/api/invoices/process/route.ts.
 INVOICE_EXTRACTION_PROMPT = """You are an expert invoice data extraction assistant.
 CRITICAL RULES:
-1. **LANGUAGE REPAIR**:
-    - The text may come from OCR and may have missing characters.
-    - If language is 'tr' (Turkish), intelligently fix missing Turkish characters.
+1. **OCR REPAIR**: The text may come from OCR and may have missing or garbled characters.
+   Use context to infer the correct value — do not invent values that are not on the document.
 2. **NUMBER PARSING**:
-    - Be extremely careful with comma (,) and dot (.).
-    - In Turkish/European invoices, '1.200,50' means One Thousand Two Hundred and 50 cents.
-    - NEVER confuse a quantity (e.g., 500) with a price (e.g. 5,00).
+    - Be careful with comma (,) and dot (.) as thousand separators vs decimal points.
+    - European format: '1.200,50' = 1200.50. US/UK format: '1,200.50' = 1200.50.
+    - NEVER confuse a quantity (e.g., 2) with a unit price (e.g., 45.00).
 3. **CURRENCY DETECTION**:
-    - Look for symbols: ₺, TL, TRY, USD, $, EUR, €.
-    - Prioritize 'TRY' / 'TL' unless explicitly stated otherwise.
+    - Look for symbols or codes on the document: $, USD, €, EUR, £, GBP, ₺, TRY, etc.
+    - Use ONLY what is explicitly stated. Do not default to any currency.
-Extract the following fields if it exists:
+4. **UNITS**: Copy the unit exactly as written on the document (pcs, kg, hrs, m², etc.).
+   If no unit is shown, use null — never invent one.
+Extract the following fields if present:
 - 'supplier_name': Name of the vendor/supplier.
 - 'invoice_date': Format YYYY-MM-DD.
 - 'invoice_number': The invoice ID/number.
-- 'tax_id': Tax ID / VKN / TCKN.
-- 'tax_rate': It can be like 'KDV' and for example if it is 'KDV' and it is %20, write it as 'KDV %20' in excel.
-- 'currency': Currency code (TRY, USD, EUR, etc.).
+- 'tax_id': Tax ID or VAT registration number.
+- 'tax_rate': Tax/VAT rate as written (e.g. 'VAT 20%', 'GST 10%').
+- 'currency': ISO currency code (USD, EUR, GBP, TRY, etc.).
 - 'total_amount': Final total amount (numeric).
 - 'line_items': An array of items/services. Each item should have:
   - 'description': Product/Service name.
   - 'quantity': Numeric quantity. If missing, calculate it as total / unit_price. Default 1 only if neither is available.
-  - 'unit': Unit type (Adet, Kg, Saat, etc.).
+  - 'unit': Unit of measure exactly as written on the document. Null if not present.
   - 'unit_price': Price per unit.
-  - 'tax_rate': Tax percentage (e.g., 20, 10, 0) or pattern.
+  - 'tax_rate': Tax percentage (e.g., 20, 10, 0) or null.
   - 'total': Total price for this line.
 Return ONLY a valid JSON object. If a field is not found, use null."""
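Note on the NUMBER PARSING rule: the package delegates separator handling to the LLM through this prompt. For readers who want a deterministic cross-check of extracted amounts, the rule can be approximated in code. This is a sketch under stated assumptions, not something the package ships: `parse_amount` is a hypothetical helper, and the treatment of a lone separator followed by exactly three digits as a thousands separator is an assumption that is genuinely ambiguous in practice.

```python
from decimal import Decimal


def parse_amount(raw: str) -> Decimal:
    """Parse '1.200,50' (European) or '1,200.50' (US/UK) into a Decimal."""
    text = raw.strip().replace(" ", "")
    last_dot, last_comma = text.rfind("."), text.rfind(",")
    if last_dot != -1 and last_comma != -1:
        # Both present: the rightmost separator is the decimal point.
        decimal_sep = "." if last_dot > last_comma else ","
        thousands_sep = "," if decimal_sep == "." else "."
        text = text.replace(thousands_sep, "").replace(decimal_sep, ".")
    elif last_dot != -1 or last_comma != -1:
        sep = "." if last_dot != -1 else ","
        head, _, tail = text.rpartition(sep)
        if text.count(sep) > 1 or len(tail) == 3:
            # Assumption: repeated separators or a 3-digit tail mean thousands.
            text = text.replace(sep, "")
        else:
            text = head + "." + tail
    return Decimal(text)


european = parse_amount("1.200,50")
us_uk = parse_amount("1,200.50")
price = parse_amount("5,00")
quantity = parse_amount("500")
```

The last two calls mirror the prompt's warning about confusing a quantity with a price: `"5,00"` and `"500"` parse to different values only because the separator is kept and interpreted rather than dropped.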

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/tests/test_cli.py RENAMED Viewed

@@ -48,6 +48,9 @@ def _patch_analyzer(monkeypatch, ocr_text: str = "hello world", structured: dict
         def extract(self, *args, schema=None, **kwargs):
             return schema(**(structured or {}))
+        def extract_text(self, text, schema, **kwargs):
+            return schema(**(structured or {}))
     monkeypatch.setattr(cli_mod, "Analyzer", _FakeAnalyzer)
     monkeypatch.setattr(cli_mod, "_build_llm", lambda provider, model: None)
@@ -99,7 +102,7 @@ def test_extract_json_output(ascii_tmp, monkeypatch):
     result = runner.invoke(app, ["extract", str(png), "--output", "json"])
     assert result.exit_code == 0
     import json
-    data = json.loads(result.output)
+    data = json.loads(result.output[result.output.index("{"):])
     assert data["text"] == "some text"
     assert data["text_source"] == "ocr"
@@ -119,7 +122,7 @@ def test_extract_invoice_schema(ascii_tmp, monkeypatch):
     result = runner.invoke(app, ["extract", str(png), "--schema", "invoice"])
     assert result.exit_code == 0
     import json
-    data = json.loads(result.output)
+    data = json.loads(result.output[result.output.index("{"):])
     assert data["supplier_name"] == "ACME"
     assert data["total_amount"] == 250.0
@@ -131,7 +134,7 @@ def test_extract_receipt_schema(ascii_tmp, monkeypatch):
     result = runner.invoke(app, ["extract", str(png), "--schema", "receipt"])
     assert result.exit_code == 0
     import json
-    data = json.loads(result.output)
+    data = json.loads(result.output[result.output.index("{"):])
     assert data["store_name"] == "Migros"
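Note on the `result.output.index("{")` change: the CLI now echoes `[i]` and `[OK]` status lines before the JSON payload, and the tests slice from the first `{`, presumably because the captured output contains those status lines as well as the payload. The sketch below shows the slicing on a hand-built string; `json_payload` is a hypothetical helper and the sample status lines are illustrative.

```python
import json


def json_payload(output: str) -> dict:
    """Drop any leading status lines and parse the JSON object that follows."""
    start = output.index("{")  # raises ValueError if no object is present
    return json.loads(output[start:])


captured = (
    "[i] file: scan.png\n"
    "[i] OCR...\n"
    "[OK] route: IMAGE -> PaddleOCR  (confidence: 97%)\n"
    + json.dumps({"text": "some text", "text_source": "ocr"}, indent=2)
    + "\n"
)

data = json_payload(captured)
```

The slice assumes no status line contains a `{` character; a file name with a brace in it would shift the start index and make `json.loads` fail.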

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/.gitignore RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/LICENSE RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/examples/01_quickstart.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/examples/02_refine_openai.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/examples/03_structured_invoice.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/examples/04_local_ollama.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/examples/image_smoke_test.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/examples/pdf_smoke_test.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/examples/structured_smoke_test.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/__init__.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/analyzer.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/config.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/engines/__init__.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/engines/base.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/engines/handwriting.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/engines/pdf_text.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/engines/registry.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/engines/trocr.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/engines/vision.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/exceptions.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/llm/__init__.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/llm/drift.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/llm/extractor.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/llm/formatting.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/llm/literal_preserve.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/llm/prompts.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/llm/refiner.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/loaders.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/pipeline.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/preprocessing/__init__.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/preprocessing/image.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/py.typed RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/quality.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/schemas.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/types.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/utils/__init__.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/utils/files.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/src/ocrcontext/utils/lang.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/tests/__init__.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/tests/conftest.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/tests/test_langchain_loader.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/tests/test_literal_preserve.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/tests/test_llm.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/tests/test_pipeline_analyzer.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/tests/test_schemas.py RENAMED Viewed

File without changes

{ocrcontext-0.1.3 → ocrcontext-0.1.5}/tests/test_text_helpers.py RENAMED Viewed

File without changes
