
docintel-platform 1.1.0.tar.gz → 1.2.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (126)

{docintel_platform-1.1.0/src/docintel_platform.egg-info → docintel_platform-1.2.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: docintel-platform
-Version: 1.1.0
+Version: 1.2.0
 Summary: Enterprise document intelligence API for PDF compliance, multi-format extraction, structuring, and summarization.
 Author: Babandeep Singh
 License-Expression: MIT
@@ -90,7 +90,7 @@ Requires-Dist: openpyxl>=3.1.5; extra == "all"
 Enterprise document intelligence API: PDF compliance (OCR, PII, redaction), LLM structuring, and multi-format text workflows (Word, Excel, CSV, plain text).
-**Version:** 1.1.0 | **PyPI:** [docintel-platform](https://pypi.org/project/docintel-platform/)
+**Version:** 1.2.0 | **PyPI:** [docintel-platform](https://pypi.org/project/docintel-platform/)
 ---
@@ -112,6 +112,8 @@ make docker-up
 | Gradio UI | http://127.0.0.1:7860 |
 | Health | http://127.0.0.1:5000/health |
+Gradio includes a **Document process** tab (unified pipeline). It needs the API plus a Redis worker (`worker` service in compose, or `make run-worker` locally).
 **pip install:**
 ```bash
@@ -142,7 +144,7 @@ report = client.process_document("policy.docx", include_pii=True)
 | Documents | `POST /v1/documents/*` | Identify, extract, classify, summarize, PII, compare, **process** |
 | Text | `POST /v1/text/summarize` | TextRank extractive summary |
 | Batch | `POST /v1/batch` | Async summarize, classify, detect_pii, process |
-| Jobs | `GET /v1/jobs/{id}` | Poll async work (`?async=true` on PDF and process routes) |
+| Jobs | `GET /v1/jobs/{id}` | Poll async work (`?async=true`; default in Docker when Redis is up) |
 | Ops | `GET /health`, `GET /metrics` | Health and Prometheus-friendly metrics |
 **Supported uploads (text workflows):** PDF, DOCX, XLSX, CSV, JSON, TXT, MD.
@@ -182,6 +184,7 @@ make run                # API on :5000
 make run-worker         # RQ worker (separate terminal, needs Redis)
 make run-ui             # Gradio on :7860
 make test
+make eval               # offline quality report (summary, classify, process, PII)
 ```
 Copy `.env.example` to `.env` for `DOCINTEL_LLM_API_KEY`, auth keys, Redis, and S3. See comments in that file for all variables.
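
The Jobs row above says async work is polled through `GET /v1/jobs/{id}`. A minimal sketch of such a polling loop follows. The status names (`queued`, `finished`, `failed`), the helper name `poll_until_done`, and the `fetch_job` callable are assumptions for illustration; the diff does not show the job payload schema or the client's own `poll_job` implementation.

```python
import time
from typing import Any, Callable

# Assumed terminal states; the real API may use different status names.
TERMINAL_STATUSES = {"finished", "failed"}


def poll_until_done(
    fetch_job: Callable[[str], dict[str, Any]],
    job_id: str,
    *,
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call fetch_job(job_id) until the job reports a terminal status."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout}s")
        sleep(interval)


# Stand-in for GET /v1/jobs/{id}: reports "queued" twice, then "finished".
_responses = iter(
    [
        {"status": "queued"},
        {"status": "queued"},
        {"status": "finished", "result": {"summary": "demo"}},
    ]
)
job = poll_until_done(lambda _id: next(_responses), "job-123", sleep=lambda _s: None)
print(job["status"])  # finished
```

Passing `sleep` as a parameter keeps the loop testable without real delays; in real use `fetch_job` would wrap an HTTP GET against the jobs route.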

{docintel_platform-1.1.0 → docintel_platform-1.2.0}/README.md RENAMED

@@ -7,7 +7,7 @@
 Enterprise document intelligence API: PDF compliance (OCR, PII, redaction), LLM structuring, and multi-format text workflows (Word, Excel, CSV, plain text).
-**Version:** 1.1.0 | **PyPI:** [docintel-platform](https://pypi.org/project/docintel-platform/)
+**Version:** 1.2.0 | **PyPI:** [docintel-platform](https://pypi.org/project/docintel-platform/)
 ---
@@ -29,6 +29,8 @@ make docker-up
 | Gradio UI | http://127.0.0.1:7860 |
 | Health | http://127.0.0.1:5000/health |
+Gradio includes a **Document process** tab (unified pipeline). It needs the API plus a Redis worker (`worker` service in compose, or `make run-worker` locally).
 **pip install:**
 ```bash
@@ -59,7 +61,7 @@ report = client.process_document("policy.docx", include_pii=True)
 | Documents | `POST /v1/documents/*` | Identify, extract, classify, summarize, PII, compare, **process** |
 | Text | `POST /v1/text/summarize` | TextRank extractive summary |
 | Batch | `POST /v1/batch` | Async summarize, classify, detect_pii, process |
-| Jobs | `GET /v1/jobs/{id}` | Poll async work (`?async=true` on PDF and process routes) |
+| Jobs | `GET /v1/jobs/{id}` | Poll async work (`?async=true`; default in Docker when Redis is up) |
 | Ops | `GET /health`, `GET /metrics` | Health and Prometheus-friendly metrics |
 **Supported uploads (text workflows):** PDF, DOCX, XLSX, CSV, JSON, TXT, MD.
@@ -99,6 +101,7 @@ make run                # API on :5000
 make run-worker         # RQ worker (separate terminal, needs Redis)
 make run-ui             # Gradio on :7860
 make test
+make eval               # offline quality report (summary, classify, process, PII)
 ```
 Copy `.env.example` to `.env` for `DOCINTEL_LLM_API_KEY`, auth keys, Redis, and S3. See comments in that file for all variables.

{docintel_platform-1.1.0 → docintel_platform-1.2.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "docintel-platform"
-version = "1.1.0"
+version = "1.2.0"
 description = "Enterprise document intelligence API for PDF compliance, multi-format extraction, structuring, and summarization."
 readme = "README.md"
 license = "MIT"

{docintel_platform-1.1.0 → docintel_platform-1.2.0}/src/docintel/__init__.py RENAMED

@@ -2,5 +2,5 @@
 from docintel.client import DocintelClient, DocintelError
-__version__ = "1.1.0"
+__version__ = "1.2.0"
 __all__ = ["DocintelClient", "DocintelError", "__version__"]

{docintel_platform-1.1.0 → docintel_platform-1.2.0}/src/docintel/client.py RENAMED

@@ -164,15 +164,65 @@ class DocintelClient:
             return response.content
         return response.json()
-    def summarize(self, text: str, *, sentences: int = 3) -> dict[str, Any]:
+    def _post_async_json(
+        self,
+        path: str,
+        *,
+        json_body: dict[str, Any] | None = None,
+        params: dict[str, str] | None = None,
+        poll: bool = True,
+    ) -> dict[str, Any]:
         response = self._session.post(
-            self._url("/v1/text/summarize"),
-            json={"text": text, "sentences": sentences},
+            self._url(path),
+            json=json_body,
+            params=params,
             timeout=self.timeout,
         )
+        if response.status_code == 202:
+            payload = response.json()
+            if not poll:
+                return payload
+            completed = self.poll_job(payload["job_id"])
+            result = completed.get("result") or {}
+            return {"status": "ok", **result}
         self._raise_for_status(response)
         return response.json()
+    def _post_async_multipart(
+        self,
+        path: str,
+        *,
+        files: dict,
+        data: dict[str, str] | None = None,
+        params: dict[str, str] | None = None,
+        poll: bool = True,
+    ) -> dict[str, Any]:
+        response = self._session.post(
+            self._url(path),
+            params=params,
+            files=files,
+            data=data or {},
+            timeout=self.timeout,
+        )
+        if response.status_code == 202:
+            payload = response.json()
+            if not poll:
+                return payload
+            completed = self.poll_job(payload["job_id"])
+            result = completed.get("result") or {}
+            return {"status": "ok", **result}
+        self._raise_for_status(response)
+        return response.json()
+    def summarize(self, text: str, *, sentences: int = 3, async_job: bool = False, poll: bool = True) -> dict[str, Any]:
+        params = {"async": "true"} if async_job else {}
+        return self._post_async_json(
+            "/v1/text/summarize",
+            json_body={"text": text, "sentences": sentences},
+            params=params,
+            poll=poll,
+        )
     def list_document_types(self) -> dict[str, Any]:
         response = self._session.get(self._url("/v1/documents/types"), timeout=self.timeout)
         self._raise_for_status(response)
@@ -189,16 +239,47 @@ class DocintelClient:
         self._raise_for_status(response)
         return response.json()
-    def extract_document_text(self, path: str | Path) -> dict[str, Any]:
+    def extract_document_text(
+        self,
+        path: str | Path,
+        *,
+        async_job: bool = False,
+        poll: bool = True,
+    ) -> dict[str, Any]:
         file_path = Path(path)
+        params = {"async": "true"} if async_job else {}
         with file_path.open("rb") as handle:
-            response = self._session.post(
-                self._url("/v1/documents/extract-text"),
+            return self._post_async_multipart(
+                "/v1/documents/extract-text",
+                params=params,
                 files={"file": (file_path.name, handle, "application/octet-stream")},
-                timeout=self.timeout,
+                poll=poll,
             )
-        self._raise_for_status(response)
-        return response.json()
+    def classify_document(
+        self,
+        path: str | Path | None = None,
+        *,
+        text: str | None = None,
+        async_job: bool = False,
+        poll: bool = True,
+    ) -> dict[str, Any]:
+        params = {"async": "true"} if async_job else {}
+        if path is not None:
+            file_path = Path(path)
+            with file_path.open("rb") as handle:
+                return self._post_async_multipart(
+                    "/v1/documents/classify",
+                    params=params,
+                    files={"file": (file_path.name, handle, "application/octet-stream")},
+                    poll=poll,
+                )
+        return self._post_async_json(
+            "/v1/documents/classify",
+            json_body={"text": text or ""},
+            params=params,
+            poll=poll,
+        )
     def summarize_document(
         self,
@@ -206,24 +287,26 @@ class DocintelClient:
         *,
         text: str | None = None,
         sentences: int = 3,
+        async_job: bool = False,
+        poll: bool = True,
     ) -> dict[str, Any]:
+        params = {"async": "true"} if async_job else {}
         if path is not None:
             file_path = Path(path)
             with file_path.open("rb") as handle:
-                response = self._session.post(
-                    self._url("/v1/documents/summarize"),
+                return self._post_async_multipart(
+                    "/v1/documents/summarize",
+                    params=params,
                     files={"file": (file_path.name, handle, "application/octet-stream")},
                     data={"sentences": str(sentences)},
-                    timeout=self.timeout,
+                    poll=poll,
                 )
-        else:
-            response = self._session.post(
-                self._url("/v1/documents/summarize"),
-                json={"text": text or "", "sentences": sentences},
-                timeout=self.timeout,
-            )
-        self._raise_for_status(response)
-        return response.json()
+        return self._post_async_json(
+            "/v1/documents/summarize",
+            json_body={"text": text or "", "sentences": sentences},
+            params=params,
+            poll=poll,
+        )
     def detect_pii_document(
         self,
@@ -233,34 +316,36 @@ class DocintelClient:
         entities: str | None = None,
         vertical: str | None = None,
         min_score: float = 0.35,
+        async_job: bool = False,
+        poll: bool = True,
     ) -> dict[str, Any]:
-        data = {"min_score": str(min_score)}
-        if entities:
-            data["entities"] = entities
-        if vertical:
-            data["vertical"] = vertical
+        params = {"async": "true"} if async_job else {}
         if path is not None:
             file_path = Path(path)
+            data = {"min_score": str(min_score)}
+            if entities:
+                data["entities"] = entities
+            if vertical:
+                data["vertical"] = vertical
             with file_path.open("rb") as handle:
-                response = self._session.post(
-                    self._url("/v1/documents/detect-pii"),
+                return self._post_async_multipart(
+                    "/v1/documents/detect-pii",
+                    params=params,
                     files={"file": (file_path.name, handle, "application/octet-stream")},
                     data=data,
-                    timeout=self.timeout,
+                    poll=poll,
                 )
-        else:
-            payload = {"text": text or "", "min_score": min_score}
-            if entities:
-                payload["entities"] = entities
-            if vertical:
-                payload["vertical"] = vertical
-            response = self._session.post(
-                self._url("/v1/documents/detect-pii"),
-                json=payload,
-                timeout=self.timeout,
-            )
-        self._raise_for_status(response)
-        return response.json()
+        payload: dict[str, Any] = {"text": text or "", "min_score": min_score}
+        if entities:
+            payload["entities"] = entities
+        if vertical:
+            payload["vertical"] = vertical
+        return self._post_async_json(
+            "/v1/documents/detect-pii",
+            json_body=payload,
+            params=params,
+            poll=poll,
+        )
     def compare_documents(
         self,
@@ -269,27 +354,29 @@ class DocintelClient:
         text_b: str | None = None,
         path_a: str | Path | None = None,
         path_b: str | Path | None = None,
+        async_job: bool = False,
+        poll: bool = True,
     ) -> dict[str, Any]:
+        params = {"async": "true"} if async_job else {}
         if path_a is not None and path_b is not None:
             file_a = Path(path_a)
             file_b = Path(path_b)
             with file_a.open("rb") as handle_a, file_b.open("rb") as handle_b:
-                response = self._session.post(
-                    self._url("/v1/documents/compare"),
+                return self._post_async_multipart(
+                    "/v1/documents/compare",
+                    params=params,
                     files={
                         "file_a": (file_a.name, handle_a, "application/octet-stream"),
                         "file_b": (file_b.name, handle_b, "application/octet-stream"),
                     },
-                    timeout=self.timeout,
+                    poll=poll,
                 )
-        else:
-            response = self._session.post(
-                self._url("/v1/documents/compare"),
-                json={"text_a": text_a or "", "text_b": text_b or ""},
-                timeout=self.timeout,
-            )
-        self._raise_for_status(response)
-        return response.json()
+        return self._post_async_json(
+            "/v1/documents/compare",
+            json_body={"text_a": text_a or "", "text_b": text_b or ""},
+            params=params,
+            poll=poll,
+        )
     def process_document(
         self,
@@ -302,8 +389,12 @@ class DocintelClient:
         entities: str | None = None,
         vertical: str | None = None,
         min_score: float = 0.35,
+        async_job: bool = False,
+        callback_url: str | None = None,
+        poll: bool = True,
     ) -> dict[str, Any]:
         file_path = Path(path)
+        params = {"async": "true"} if async_job else {}
         data = {
             "sentences": str(sentences),
             "include_summarize": str(include_summarize).lower(),
@@ -315,12 +406,13 @@ class DocintelClient:
             data["entities"] = entities
         if vertical:
             data["vertical"] = vertical
+        if callback_url:
+            data["callback_url"] = callback_url
         with file_path.open("rb") as handle:
-            response = self._session.post(
-                self._url("/v1/documents/process"),
+            return self._post_async_multipart(
+                "/v1/documents/process",
+                params=params,
                 files={"file": (file_path.name, handle, "application/octet-stream")},
                 data=data,
-                timeout=self.timeout,
+                poll=poll,
             )
-        self._raise_for_status(response)
-        return response.json()
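
Both new helpers in this file, `_post_async_json` and `_post_async_multipart`, share the same branch on HTTP 202. The sketch below isolates that branch so it can be run without a server. The function name `resolve_async_response` and the fake poller are hypothetical; only the 202 check, the `job_id` and `result` keys, and the `{"status": "ok", **result}` merge are taken from the diff.

```python
from typing import Any, Callable


def resolve_async_response(
    status_code: int,
    payload: dict[str, Any],
    poll_job: Callable[[str], dict[str, Any]],
    *,
    poll: bool = True,
) -> dict[str, Any]:
    """Mirror the 202 handling shown in the diff (hypothetical helper)."""
    if status_code == 202:
        if not poll:
            # Caller wants the raw "accepted" payload and will poll later.
            return payload
        completed = poll_job(payload["job_id"])
        result = completed.get("result") or {}
        # Keys in result come last, so a "status" key in result would win.
        return {"status": "ok", **result}
    # Synchronous response: the body is already the final result.
    return payload


def fake_poll_job(job_id: str) -> dict[str, Any]:
    # Stand-in for DocintelClient.poll_job; the payload shape is assumed.
    return {"job_id": job_id, "status": "finished", "result": {"summary": "demo"}}


print(resolve_async_response(202, {"job_id": "abc"}, fake_poll_job))
# {'status': 'ok', 'summary': 'demo'}
print(resolve_async_response(202, {"job_id": "abc"}, fake_poll_job, poll=False))
# {'job_id': 'abc'}
```

With `poll=True`, a queued request returns the same flat shape as a synchronous one, so existing callers do not need to know whether the server queued the work.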

{docintel_platform-1.1.0 → docintel_platform-1.2.0}/src/docintel/jobs/models.py RENAMED

@@ -30,6 +30,11 @@ class JobType(str, Enum):
     TEXT_CLASSIFY = "text_classify"
     TEXT_DETECT_PII = "text_detect_pii"
     DOCUMENT_PROCESS = "document_process"
+    DOCUMENT_CLASSIFY = "document_classify"
+    DOCUMENT_SUMMARIZE = "document_summarize"
+    DOCUMENT_DETECT_PII = "document_detect_pii"
+    DOCUMENT_EXTRACT_TEXT = "document_extract_text"
+    DOCUMENT_COMPARE = "document_compare"
     BATCH = "batch"
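
`JobType` subclasses both `str` and `Enum`, which affects how the five new members behave when stored or serialized. The sketch below redefines a small subset of the enum to show this; it is illustrative only and does not reproduce the package's full class.

```python
import json
from enum import Enum


class JobType(str, Enum):
    # Subset of the members shown in the diff; values copied verbatim.
    DOCUMENT_PROCESS = "document_process"
    DOCUMENT_CLASSIFY = "document_classify"
    DOCUMENT_COMPARE = "document_compare"


# Lookup by value, e.g. when reading a job record back from storage.
assert JobType("document_compare") is JobType.DOCUMENT_COMPARE

# Members compare equal to their plain string values.
assert JobType.DOCUMENT_CLASSIFY == "document_classify"

# Because members are str instances, json.dumps emits the raw value.
print(json.dumps({"type": JobType.DOCUMENT_CLASSIFY}))  # {"type": "document_classify"}
```

A value with no matching member, such as `JobType("unknown")`, raises `ValueError`, so job records written under 1.2.0 with the new types would not load through a 1.1.0 copy of the enum.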

{docintel_platform-1.1.0 → docintel_platform-1.2.0}/src/docintel/jobs/queue.py RENAMED

@@ -187,6 +187,119 @@ def enqueue_document_process_text_job(
     )
+def enqueue_classify_document_job(
+    job_id: str,
+    input_path: str,
+    filename: str,
+    content_type: str | None,
+) -> None:
+    queue = get_queue()
+    queue.enqueue(
+        "docintel.jobs.tasks.run_classify_document_job",
+        job_id=job_id,
+        input_path=input_path,
+        filename=filename,
+        content_type=content_type,
+        job_timeout=600,
+        result_ttl=DEFAULT_RESULT_TTL,
+        failure_ttl=DEFAULT_FAILURE_TTL,
+    )
+def enqueue_summarize_document_job(
+    job_id: str,
+    input_path: str,
+    filename: str,
+    content_type: str | None,
+    sentences: int,
+) -> None:
+    queue = get_queue()
+    queue.enqueue(
+        "docintel.jobs.tasks.run_summarize_document_job",
+        job_id=job_id,
+        input_path=input_path,
+        filename=filename,
+        content_type=content_type,
+        sentences=sentences,
+        job_timeout=600,
+        result_ttl=DEFAULT_RESULT_TTL,
+        failure_ttl=DEFAULT_FAILURE_TTL,
+    )
+def enqueue_detect_pii_document_job(
+    job_id: str,
+    input_path: str,
+    filename: str,
+    content_type: str | None,
+    *,
+    entities: list[str] | None = None,
+    min_score: float = 0.35,
+) -> None:
+    queue = get_queue()
+    queue.enqueue(
+        "docintel.jobs.tasks.run_detect_pii_document_job",
+        job_id=job_id,
+        input_path=input_path,
+        filename=filename,
+        content_type=content_type,
+        entities=entities,
+        min_score=min_score,
+        job_timeout=600,
+        result_ttl=DEFAULT_RESULT_TTL,
+        failure_ttl=DEFAULT_FAILURE_TTL,
+    )
+def enqueue_extract_text_job(
+    job_id: str,
+    input_path: str,
+    filename: str,
+    content_type: str | None,
+) -> None:
+    queue = get_queue()
+    queue.enqueue(
+        "docintel.jobs.tasks.run_extract_text_job",
+        job_id=job_id,
+        input_path=input_path,
+        filename=filename,
+        content_type=content_type,
+        job_timeout=600,
+        result_ttl=DEFAULT_RESULT_TTL,
+        failure_ttl=DEFAULT_FAILURE_TTL,
+    )
+def enqueue_compare_job(
+    job_id: str,
+    *,
+    text_a: str | None = None,
+    text_b: str | None = None,
+    path_a: str | None = None,
+    path_b: str | None = None,
+    filename_a: str | None = None,
+    filename_b: str | None = None,
+    content_type_a: str | None = None,
+    content_type_b: str | None = None,
+) -> None:
+    queue = get_queue()
+    queue.enqueue(
+        "docintel.jobs.tasks.run_compare_job",
+        job_id=job_id,
+        text_a=text_a,
+        text_b=text_b,
+        path_a=path_a,
+        path_b=path_b,
+        filename_a=filename_a,
+        filename_b=filename_b,
+        content_type_a=content_type_a,
+        content_type_b=content_type_b,
+        job_timeout=600,
+        result_ttl=DEFAULT_RESULT_TTL,
+        failure_ttl=DEFAULT_FAILURE_TTL,
+    )
 def queue_depth() -> int | None:
     """Return RQ queue length when Redis is reachable."""
     try:
