PyPI - cat-stack - Versions diffs - 1.5.0__tar.gz → 1.6.1__tar.gz - Mend

cat-stack 1.5.0.tar.gz → 1.6.1.tar.gz


Files changed (44)

{cat_stack-1.5.0 → cat_stack-1.6.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: cat-stack
-Version: 1.5.0
+Version: 1.6.1
 Summary: Domain-agnostic text, image, PDF, and DOCX classification engine powered by LLMs
 Project-URL: Documentation, https://github.com/chrissoria/cat-stack#readme
 Project-URL: Issues, https://github.com/chrissoria/cat-stack/issues
@@ -177,10 +177,65 @@ All providers use the same `(model_name, provider, api_key)` tuple format. Provi
 - **Multi-model ensemble** with consensus voting and agreement scores
 - **Batch API support** for OpenAI, Anthropic, Google, Mistral, and xAI
 - **Prompt strategies**: Chain-of-Thought, Chain-of-Verification, step-back prompting, few-shot examples
-- **Text, image, and PDF** input auto-detection
+- **Text, image, and PDF** input auto-detection (PDF inputs are
+  validated against the `%PDF-` magic-byte header before reaching
+  PyMuPDF, so a webpage saved with `.pdf` extension surfaces a clear
+  `ValueError` instead of silently classifying a blank rendered page
+  as `success`)
 - **Embedding similarity** tiebreaker for ensemble consensus ties
 - **Pilot test** — validate classifications on a small sample before committing to the full run
+## Future work / contributions welcome
+The following items are tracked but not yet implemented. PRs welcome —
+each entry includes the scope I'd suggest if someone wants to pick it up.
+- **Standalone SambaNova provider.** Currently SambaNova-hosted models
+  are reachable through the HuggingFace router suffix
+  (`meta-llama/...:sambanova`), but there's no direct
+  `provider="sambanova"` path that talks to SambaNova's own
+  OpenAI-compatible endpoint. Wiring it up means a new
+  `PROVIDER_CONFIG` entry, the right base URL
+  (`https://api.sambanova.ai/v1`), token-detection rules in
+  `detect_provider`, and a smoke test against one of their cheap
+  models (e.g. `Meta-Llama-3.1-8B-Instruct`).
+- **Consolidate HuggingFace-suffix dispatch.** The strings
+  `"huggingface"` and `"huggingface-together"` are currently
+  hardcoded in ~30 dispatch sites across
+  `pdf_functions.py` / `image_functions.py` /
+  `text_functions_ensemble.py` / `_chunked.py`. Adding a new router
+  suffix (e.g. `huggingface-fireworks`) means updating every one of
+  them. The cleaner refactor is a single
+  `_is_openai_compatible(model_source)` helper that matches anything
+  starting with `huggingface` plus the static list
+  (openai/perplexity/xai). Same shape as our existing
+  `_sanitize_google_schema` helper. Touches a lot of sites but each
+  edit is mechanical.
+- **Meta-LLM "Senate VP" tiebreaker + batch_mode support for
+  `embedding_tiebreaker`.** The existing `embedding_tiebreaker=True`
+  resolves true 50/50 ties via centroid similarity, but only in
+  synchronous ensemble mode. Two related extensions: (a) a meta-LLM
+  tie-breaker that invokes a separate model on tied rows
+  (`tie_break="meta_model"` with a configurable model); (b) extend
+  the existing centroid tiebreaker to work inside `batch_mode=True`
+  by running it after the batch results come back, before
+  `build_output_dataframes`. The infrastructure for both is already
+  in `_tiebreaker.py`; the meta-LLM variant would be a new resolver
+  function called from `resolve_ties_with_centroids`.
+- **Schema-permafail retry short-circuit.** When a model's
+  classification permanently fails schema validation across all
+  available retry budgets, the framework keeps spending API calls.
+  A short-circuit that detects "this model + this input is producing
+  the same invalid output N times in a row" and bails out early
+  would save quota. Scope was narrowed earlier (after the
+  HF-SMALL-MODEL fix reduced the wasted-retries surface area), so
+  there's a real risk this stays low-value; recommend writing the
+  detection metric first, instrumenting an actual run, and only
+  building the short-circuit if the metric says it would have helped.
 ## License
 GPL-3.0-or-later
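
The feature list above mentions validating PDF inputs against the `%PDF-` magic-byte header before handing them to PyMuPDF. A minimal sketch of that kind of check follows. The function name `ensure_pdf` and the strict offset-0 comparison are assumptions made for illustration, not the package's actual implementation.

```python
from pathlib import Path

# The five bytes every PDF is expected to start with.
PDF_MAGIC = b"%PDF-"


def ensure_pdf(path):
    """Return the path as a Path if the file starts with %PDF-.

    Hypothetical helper: raises ValueError for anything else, for
    example an HTML page that was saved with a .pdf extension.
    """
    with open(path, "rb") as fh:
        head = fh.read(len(PDF_MAGIC))
    if head != PDF_MAGIC:
        raise ValueError(
            f"{path} does not look like a PDF: expected it to start with "
            f"{PDF_MAGIC!r}, found {head!r}"
        )
    return Path(path)
```

Checking the header first means a mislabeled file fails loudly at input time instead of being rendered as a blank page and classified.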

{cat_stack-1.5.0 → cat_stack-1.6.1}/README.md RENAMED

@@ -141,10 +141,65 @@ All providers use the same `(model_name, provider, api_key)` tuple format. Provi
 - **Multi-model ensemble** with consensus voting and agreement scores
 - **Batch API support** for OpenAI, Anthropic, Google, Mistral, and xAI
 - **Prompt strategies**: Chain-of-Thought, Chain-of-Verification, step-back prompting, few-shot examples
-- **Text, image, and PDF** input auto-detection
+- **Text, image, and PDF** input auto-detection (PDF inputs are
+  validated against the `%PDF-` magic-byte header before reaching
+  PyMuPDF, so a webpage saved with `.pdf` extension surfaces a clear
+  `ValueError` instead of silently classifying a blank rendered page
+  as `success`)
 - **Embedding similarity** tiebreaker for ensemble consensus ties
 - **Pilot test** — validate classifications on a small sample before committing to the full run
+## Future work / contributions welcome
+The following items are tracked but not yet implemented. PRs welcome —
+each entry includes the scope I'd suggest if someone wants to pick it up.
+- **Standalone SambaNova provider.** Currently SambaNova-hosted models
+  are reachable through the HuggingFace router suffix
+  (`meta-llama/...:sambanova`), but there's no direct
+  `provider="sambanova"` path that talks to SambaNova's own
+  OpenAI-compatible endpoint. Wiring it up means a new
+  `PROVIDER_CONFIG` entry, the right base URL
+  (`https://api.sambanova.ai/v1`), token-detection rules in
+  `detect_provider`, and a smoke test against one of their cheap
+  models (e.g. `Meta-Llama-3.1-8B-Instruct`).
+- **Consolidate HuggingFace-suffix dispatch.** The strings
+  `"huggingface"` and `"huggingface-together"` are currently
+  hardcoded in ~30 dispatch sites across
+  `pdf_functions.py` / `image_functions.py` /
+  `text_functions_ensemble.py` / `_chunked.py`. Adding a new router
+  suffix (e.g. `huggingface-fireworks`) means updating every one of
+  them. The cleaner refactor is a single
+  `_is_openai_compatible(model_source)` helper that matches anything
+  starting with `huggingface` plus the static list
+  (openai/perplexity/xai). Same shape as our existing
+  `_sanitize_google_schema` helper. Touches a lot of sites but each
+  edit is mechanical.
+- **Meta-LLM "Senate VP" tiebreaker + batch_mode support for
+  `embedding_tiebreaker`.** The existing `embedding_tiebreaker=True`
+  resolves true 50/50 ties via centroid similarity, but only in
+  synchronous ensemble mode. Two related extensions: (a) a meta-LLM
+  tie-breaker that invokes a separate model on tied rows
+  (`tie_break="meta_model"` with a configurable model); (b) extend
+  the existing centroid tiebreaker to work inside `batch_mode=True`
+  by running it after the batch results come back, before
+  `build_output_dataframes`. The infrastructure for both is already
+  in `_tiebreaker.py`; the meta-LLM variant would be a new resolver
+  function called from `resolve_ties_with_centroids`.
+- **Schema-permafail retry short-circuit.** When a model's
+  classification permanently fails schema validation across all
+  available retry budgets, the framework keeps spending API calls.
+  A short-circuit that detects "this model + this input is producing
+  the same invalid output N times in a row" and bails out early
+  would save quota. Scope was narrowed earlier (after the
+  HF-SMALL-MODEL fix reduced the wasted-retries surface area), so
+  there's a real risk this stays low-value; recommend writing the
+  detection metric first, instrumenting an actual run, and only
+  building the short-circuit if the metric says it would have helped.
 ## License
 GPL-3.0-or-later

{cat_stack-1.5.0 → cat_stack-1.6.1}/src/catstack/__about__.py RENAMED

@@ -1,7 +1,7 @@
 # SPDX-FileCopyrightText: 2025-present Christopher Soria <chrissoria@berkeley.edu>
 #
 # SPDX-License-Identifier: GPL-3.0-or-later
-__version__ = "1.5.0"
+__version__ = "1.6.1"
 __author__ = "Chris Soria"
 __email__ = "chrissoria@berkeley.edu"
 __title__ = "cat-stack"

{cat_stack-1.5.0 → cat_stack-1.6.1}/src/catstack/_batch.py RENAMED

@@ -102,6 +102,58 @@ class BatchJobFailedError(RuntimeError):
     pass
+def _inspect_anthropic_terminal_state(status_data: dict, job_id: str) -> None:
+    """Inspect an Anthropic batch in `processing_status == "ended"`.
+    Anthropic uses a single terminal state ("ended") for all outcomes —
+    fully succeeded, fully errored, fully canceled, fully expired, or
+    any mix. The polling code treats "ended" as success and returns
+    status_data; per-request errors get surfaced at parse time. That
+    works for the mixed case but is misleading when 0/N requests
+    succeeded: the caller silently gets a DataFrame of all-error rows
+    with no clear signal that the whole batch was dead.
+    This helper raises the appropriate exception when the batch is
+    *uniformly* failed/canceled/expired, and prints a warning for the
+    partial-failure case. Returns silently when the batch has any
+    successes (combined with per-row errors from parse layer).
+    """
+    counts = status_data.get("request_counts", {})
+    succeeded = counts.get("succeeded", 0)
+    errored = counts.get("errored", 0)
+    canceled = counts.get("canceled", 0)
+    expired = counts.get("expired", 0)
+    total = succeeded + errored + canceled + expired
+    if total == 0:
+        return
+    if succeeded == 0:
+        if canceled == total:
+            raise BatchJobExpiredError(
+                f"Anthropic batch '{job_id}' was canceled (canceled={canceled}). "
+                f"Job ID saved above — check provider dashboard for details."
+            )
+        if expired == total:
+            raise BatchJobExpiredError(
+                f"Anthropic batch '{job_id}' expired before any requests succeeded "
+                f"(expired={expired})."
+            )
+        raise BatchJobFailedError(
+            f"Anthropic batch '{job_id}' ended with 0/{total} requests succeeded "
+            f"(errored={errored}, canceled={canceled}, expired={expired}). "
+            f"Check the provider dashboard for the error details."
+        )
+    if errored or canceled or expired:
+        print(
+            f"  [batch] Anthropic batch '{job_id}' partial: "
+            f"succeeded={succeeded}, errored={errored}, "
+            f"canceled={canceled}, expired={expired}. "
+            f"Errored rows will appear as failures in the output DataFrame."
+        )
 # =============================================================================
 # Auth headers
 # =============================================================================
@@ -451,6 +503,8 @@ def _poll_batch_job(
                     f"Batch job '{job_id}' failed (state: {state}). "
                     f"Check the provider dashboard for details."
                 )
+            if provider == "anthropic":
+                _inspect_anthropic_terminal_state(status_data, job_id)
             return status_data
         time.sleep(interval)
@@ -718,6 +772,7 @@ def _run_one_batch_job(
             stepback_insights=stepback_insights,
             model_name=model,
             multi_label=prompt_params.get("multi_label", True),
+            system_prompt=prompt_params.get("system_prompt", ""),
         )
         payload = client._build_payload(
@@ -816,6 +871,7 @@ def _run_one_sync_model(
             stepback_insights=prompt_params.get("stepback_insights", {}),
             model_name=model,
             multi_label=prompt_params.get("multi_label", True),
+            system_prompt=prompt_params.get("system_prompt", ""),
         )
         try:
             raw, err = client.complete(
@@ -998,7 +1054,20 @@ def run_batch_ensemble_classify(
     with ThreadPoolExecutor(max_workers=len(model_configs)) as executor:
         futures = {executor.submit(_run_cfg, cfg): cfg for cfg in model_configs}
         for future in as_completed(futures):
-            model_key, result = future.result()
+            cfg = futures[future]
+            model_key = cfg["sanitized_name"]
+            try:
+                _, result = future.result()
+            except Exception as e:
+                print(
+                    f"\n[batch ensemble] Model '{cfg['model']}' ({cfg['provider']}) "
+                    f"failed: {type(e).__name__}: {e}"
+                )
+                print(
+                    f"  Other models will still complete; "
+                    f"this model's column will be empty."
+                )
+                result = {}
             all_model_results[model_key] = result
     all_results = []
@@ -1313,7 +1382,20 @@ def run_batch_ensemble_summarize(
     with ThreadPoolExecutor(max_workers=len(model_configs)) as executor:
         futures = {executor.submit(_run_cfg, cfg): cfg for cfg in model_configs}
         for future in as_completed(futures):
-            model_key, result = future.result()
+            cfg = futures[future]
+            model_key = cfg["sanitized_name"]
+            try:
+                _, result = future.result()
+            except Exception as e:
+                print(
+                    f"\n[batch ensemble] Model '{cfg['model']}' ({cfg['provider']}) "
+                    f"failed: {type(e).__name__}: {e}"
+                )
+                print(
+                    f"  Other models will still complete; "
+                    f"this model's column will be empty."
+                )
+                result = {}
             all_model_results[model_key] = result
     model_names = [cfg["sanitized_name"] for cfg in model_configs]
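
The two ensemble hunks above change `future.result()` so that one failing model no longer aborts the whole run. The pattern can be tried in isolation with the sketch below. `run_all` and its `run_one` callback are hypothetical stand-ins for the package's functions, and the `max(1, ...)` guard is an addition of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_all(model_configs, run_one):
    """Run run_one(cfg) for every config and collect results per model.

    A model whose worker raises gets an empty result dict instead of
    propagating the exception and discarding the other models' work.
    """
    results = {}
    # max(1, ...) avoids the ValueError ThreadPoolExecutor raises for 0 workers.
    workers = max(1, len(model_configs))
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = {executor.submit(run_one, cfg): cfg for cfg in model_configs}
        for future in as_completed(futures):
            cfg = futures[future]
            # Take the key from the config, since a failed future returns nothing.
            key = cfg["sanitized_name"]
            try:
                _, result = future.result()
            except Exception as exc:
                print(f"Model '{cfg['model']}' failed: {type(exc).__name__}: {exc}")
                result = {}
            results[key] = result
    return results
```

Reading the key from `futures[future]` rather than from the worker's return value is what makes the fallback possible: the mapping is known even when the worker never returns.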

{cat_stack-1.5.0 → cat_stack-1.6.1}/src/catstack/_formatter.py RENAMED

@@ -42,31 +42,26 @@ def _check_dependencies():
         )
-def _ensure_dependencies(verbose: bool = True) -> bool:
-    """Ensure formatter Python dependencies are installed.
-    Tries to import torch/transformers/accelerate. If any are missing,
-    auto-installs them via pip after printing a clear warning about the
-    download size (~1.5 GB total). Returns True on success, False on
-    install failure.
-    """
+def _check_dependencies_installed() -> bool:
+    """Pure check — returns True if all formatter deps import successfully.
+    No side effects, no install attempts."""
     try:
         import torch  # noqa: F401
         import transformers  # noqa: F401
         import accelerate  # noqa: F401
         return True
     except ImportError:
-        pass
+        return False
-    if verbose:
-        print(
-            "\n[CatLLM] JSON formatter dependencies (transformers, torch, "
-            "accelerate)\n"
-            "  are not installed in this Python environment. Installing now\n"
-            "  (~1.5 GB download; one-time). To skip this and disable the\n"
-            "  formatter, pass json_formatter=False."
-        )
+def _install_dependencies(verbose: bool = True) -> bool:
+    """Run `pip install` for the formatter deps. Caller is responsible for
+    obtaining user consent before calling this — it does not prompt.
+    Returns True if deps are importable after install, False otherwise.
+    """
+    if verbose:
+        print("[CatLLM] Installing formatter dependencies (~1.5 GB)…")
     import subprocess
     try:
         subprocess.check_call(
@@ -77,19 +72,100 @@ def _ensure_dependencies(verbose: bool = True) -> bool:
         if verbose:
             print(
                 f"[CatLLM] Failed to install formatter dependencies ({e}).\n"
-                "  Install manually: pip install 'cat-llm[formatter]'"
+                "  Install manually: pip install 'cat-stack[formatter]'"
             )
         return False
+    return _check_dependencies_installed()
+def _prompt_formatter_consent(model_label: str = "the current model") -> str:
+    """Interactive consent prompt for the JSON formatter fallback.
+    Two paths depending on whether the formatter dependencies are already
+    installed:
+      - Deps installed: asks whether to load the ~1 GB formatter model.
+      - Deps missing:   asks whether to download deps (~1.5 GB) AND load.
+    Non-TTY contexts (CI, batch scripts, headless notebooks): prints a
+    one-time suggestion and returns "declined" without blocking on input.
+    Returns "approved" or "declined". On approval with deps missing,
+    also installs the deps before returning.
+    """
+    deps_installed = _check_dependencies_installed()
+    if not sys.stdin.isatty():
+        if deps_installed:
+            print(
+                f"\n[CatLLM] Malformed JSON from {model_label}. The JSON "
+                "formatter could recover this — pass json_formatter=True "
+                "to enable, or json_formatter=False to silence this suggestion."
+            )
+        else:
+            print(
+                f"\n[CatLLM] Malformed JSON from {model_label}. The JSON "
+                "formatter could recover, but its deps (~1.5 GB) aren't "
+                "installed. Run `pip install cat-stack[formatter]` and pass "
+                "json_formatter=True to enable, or json_formatter=False to "
+                "silence this suggestion."
+            )
+        return "declined"
+    if deps_installed:
+        prompt = (
+            f"\n[CatLLM] {model_label} produced malformed JSON on the first row.\n"
+            "  The JSON formatter can re-format the model's prose output\n"
+            "  into valid catstack JSON for this and subsequent rows.\n"
+            "    Cost: ~1 GB RAM (one-time load).\n"
+            "  Use the formatter for this run? (Y/n): "
+        )
+    else:
+        prompt = (
+            f"\n[CatLLM] {model_label} produced malformed JSON on the first row.\n"
+            "  The JSON formatter can re-format the model's prose output\n"
+            "  into valid catstack JSON for this and subsequent rows.\n"
+            "    Cost: ~1.5 GB download (transformers + torch + accelerate)\n"
+            "         + ~1 GB RAM (one-time load).\n"
+            "  Download deps and use the formatter? (Y/n): "
+        )
-    # Verify import works now
     try:
-        import torch  # noqa: F401
-        import transformers  # noqa: F401
+        answer = input(prompt).strip().lower()
+    except (EOFError, KeyboardInterrupt):
+        print("\n[CatLLM] No input received — skipping formatter.")
+        return "declined"
+    if answer in ("", "y", "yes"):
+        if not deps_installed:
+            if not _install_dependencies(verbose=True):
+                print("[CatLLM] Continuing without formatter.")
+                return "declined"
+        return "approved"
+    print("[CatLLM] Continuing without formatter.")
+    return "declined"
+def _ensure_dependencies(verbose: bool = True) -> bool:
+    """Back-compat: ensure deps are installed, auto-installing if missing.
+    Still used by the explicit `json_formatter=True` path where the user
+    has already implicitly consented by passing True. The new
+    `json_formatter=None` ("auto") path uses `_prompt_formatter_consent`
+    plus `_install_dependencies` directly so the install requires an
+    explicit yes.
+    """
+    if _check_dependencies_installed():
         return True
-    except ImportError as e:
-        if verbose:
-            print(f"[CatLLM] Formatter deps installed but import failed: {e}")
-        return False
+    if verbose:
+        print(
+            "\n[CatLLM] JSON formatter dependencies (transformers, torch, "
+            "accelerate)\n"
+            "  are not installed. Installing now (~1.5 GB download; one-time).\n"
+            "  To skip this and disable the formatter, pass json_formatter=False."
+        )
+    return _install_dependencies(verbose=verbose)
 def _is_model_cached() -> bool:
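
The consent flow added in this file declines immediately outside a TTY, declines on `EOFError`/`KeyboardInterrupt`, and treats an empty answer as yes. A reduced sketch of that decision logic, with the install step left out, is shown below. The name `ask_consent` and the injectable `stdin`/`read_line` parameters are assumptions added so the behaviour can be exercised without a terminal.

```python
import sys


def ask_consent(prompt, stdin=None, read_line=input):
    """Return "approved" or "declined" for a Y/n style prompt.

    Hypothetical helper: never blocks in non-interactive contexts
    (CI, batch scripts), where it declines without prompting.
    """
    stdin = sys.stdin if stdin is None else stdin
    if not stdin.isatty():
        return "declined"
    try:
        answer = read_line(prompt).strip().lower()
    except (EOFError, KeyboardInterrupt):
        # Closed stream or Ctrl-C counts as "no".
        return "declined"
    # Pressing Enter accepts the capitalised default in "(Y/n)".
    return "approved" if answer in ("", "y", "yes") else "declined"
```

Defaulting to "declined" whenever no answer can be obtained keeps a large download from starting without an explicit yes.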
