cat-stack 0.4.0__tar.gz → 1.0.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (42) hide show
  1. {cat_stack-0.4.0 → cat_stack-1.0.0}/PKG-INFO +38 -1
  2. {cat_stack-0.4.0 → cat_stack-1.0.0}/README.md +37 -0
  3. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/__about__.py +1 -1
  4. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/summarize.py +79 -1
  5. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/text_functions_ensemble.py +8 -2
  6. {cat_stack-0.4.0 → cat_stack-1.0.0}/.gitignore +0 -0
  7. {cat_stack-0.4.0 → cat_stack-1.0.0}/LICENSE +0 -0
  8. {cat_stack-0.4.0 → cat_stack-1.0.0}/pyproject.toml +0 -0
  9. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/__init__.py +0 -0
  10. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/_batch.py +0 -0
  11. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/_category_analysis.py +0 -0
  12. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/_chunked.py +0 -0
  13. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/_embeddings.py +0 -0
  14. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/_formatter.py +0 -0
  15. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/_pilot_test.py +0 -0
  16. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/_providers.py +0 -0
  17. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/_review_ui.py +0 -0
  18. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/_tiebreaker.py +0 -0
  19. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/_utils.py +0 -0
  20. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/_web_fetch.py +0 -0
  21. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/calls/CoVe.py +0 -0
  22. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/calls/__init__.py +0 -0
  23. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/calls/all_calls.py +0 -0
  24. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/calls/image_CoVe.py +0 -0
  25. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/calls/image_stepback.py +0 -0
  26. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/calls/pdf_CoVe.py +0 -0
  27. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/calls/pdf_stepback.py +0 -0
  28. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/calls/stepback.py +0 -0
  29. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/calls/top_n.py +0 -0
  30. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/classify.py +0 -0
  31. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/explore.py +0 -0
  32. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/extract.py +0 -0
  33. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/image_functions.py +0 -0
  34. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/images/circle.png +0 -0
  35. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/images/cube.png +0 -0
  36. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/images/diamond.png +0 -0
  37. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/images/overlapping_pentagons.png +0 -0
  38. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/images/rectangles.png +0 -0
  39. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/model_reference_list.py +0 -0
  40. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/pdf_functions.py +0 -0
  41. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/prompt_tune.py +0 -0
  42. {cat_stack-0.4.0 → cat_stack-1.0.0}/src/cat_stack/text_functions.py +0 -0
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: cat-stack
3
- Version: 0.4.0
3
+ Version: 1.0.0
4
4
  Summary: Domain-agnostic text, image, PDF, and DOCX classification engine powered by LLMs
5
5
  Project-URL: Documentation, https://github.com/chrissoria/cat-stack#readme
6
6
  Project-URL: Issues, https://github.com/chrissoria/cat-stack/issues
@@ -100,6 +100,41 @@ cat.classify(
100
100
  )
101
101
  ```
102
102
 
103
+ #### Inline prompt tuning
104
+
105
+ Add `prompt_tune=True` (or an integer sample size, e.g. `prompt_tune=15`) to automatically optimize the classification prompt before the full run. A browser UI opens for you to correct a small sample, then the optimized prompt is used for all remaining items.
106
+
107
+ ```python
108
+ cat.classify(
109
+ input_data=df["text"],
110
+ categories=["Cat A", "Cat B", "Cat C"],
111
+ models=[("gpt-4o", "openai", key)],
112
+ prompt_tune=15, # tune on 15 random items, then classify all
113
+ tune_iterations=3, # max attempts per category (default 3)
114
+ )
115
+ ```
116
+
117
+ ### `prompt_tune()`
118
+ Standalone automatic prompt optimization. Iteratively refines classification prompts using user feedback — classify a sample, correct mistakes in the browser, and let the LLM generate targeted per-category instructions.
119
+
120
+ ```python
121
+ result = cat.prompt_tune(
122
+ input_data=df["text"],
123
+ categories=["Cat A", "Cat B", "Cat C"],
124
+ api_key="your-key",
125
+ sample_size=15,
126
+ max_iterations=3,
127
+ )
128
+
129
+ # Use the optimized prompt for classification
130
+ cat.classify(
131
+ input_data=df["text"],
132
+ categories=["Cat A", "Cat B", "Cat C"],
133
+ api_key="your-key",
134
+ system_prompt=result["system_prompt"],
135
+ )
136
+ ```
137
+
103
138
  ### `extract()`
104
139
  Discover categories from a corpus using LLM-driven exploration.
105
140
 
@@ -141,11 +176,13 @@ All providers use the same `(model_name, provider, api_key)` tuple format. Provi
141
176
 
142
177
  ## Features
143
178
 
179
+ - **Automatic prompt optimization** (`prompt_tune`) — correct a small sample in a browser UI, and the system generates per-category instructions that improve accuracy
144
180
  - **Multi-model ensemble** with consensus voting and agreement scores
145
181
  - **Batch API support** for OpenAI, Anthropic, Google, Mistral, and xAI
146
182
  - **Prompt strategies**: Chain-of-Thought, Chain-of-Verification, step-back prompting, few-shot examples
147
183
  - **Text, image, and PDF** input auto-detection
148
184
  - **Embedding similarity** tiebreaker for ensemble consensus ties
185
+ - **Pilot test** — validate classifications on a small sample before committing to the full run
149
186
 
150
187
  ## License
151
188
 
@@ -61,6 +61,41 @@ cat.classify(
61
61
  )
62
62
  ```
63
63
 
64
+ #### Inline prompt tuning
65
+
66
+ Add `prompt_tune=True` (or an integer sample size, e.g. `prompt_tune=15`) to automatically optimize the classification prompt before the full run. A browser UI opens for you to correct a small sample, then the optimized prompt is used for all remaining items.
67
+
68
+ ```python
69
+ cat.classify(
70
+ input_data=df["text"],
71
+ categories=["Cat A", "Cat B", "Cat C"],
72
+ models=[("gpt-4o", "openai", key)],
73
+ prompt_tune=15, # tune on 15 random items, then classify all
74
+ tune_iterations=3, # max attempts per category (default 3)
75
+ )
76
+ ```
77
+
78
+ ### `prompt_tune()`
79
+ Standalone automatic prompt optimization. Iteratively refines classification prompts using user feedback — classify a sample, correct mistakes in the browser, and let the LLM generate targeted per-category instructions.
80
+
81
+ ```python
82
+ result = cat.prompt_tune(
83
+ input_data=df["text"],
84
+ categories=["Cat A", "Cat B", "Cat C"],
85
+ api_key="your-key",
86
+ sample_size=15,
87
+ max_iterations=3,
88
+ )
89
+
90
+ # Use the optimized prompt for classification
91
+ cat.classify(
92
+ input_data=df["text"],
93
+ categories=["Cat A", "Cat B", "Cat C"],
94
+ api_key="your-key",
95
+ system_prompt=result["system_prompt"],
96
+ )
97
+ ```
98
+
64
99
  ### `extract()`
65
100
  Discover categories from a corpus using LLM-driven exploration.
66
101
 
@@ -102,11 +137,13 @@ All providers use the same `(model_name, provider, api_key)` tuple format. Provi
102
137
 
103
138
  ## Features
104
139
 
140
+ - **Automatic prompt optimization** (`prompt_tune`) — correct a small sample in a browser UI, and the system generates per-category instructions that improve accuracy
105
141
  - **Multi-model ensemble** with consensus voting and agreement scores
106
142
  - **Batch API support** for OpenAI, Anthropic, Google, Mistral, and xAI
107
143
  - **Prompt strategies**: Chain-of-Thought, Chain-of-Verification, step-back prompting, few-shot examples
108
144
  - **Text, image, and PDF** input auto-detection
109
145
  - **Embedding similarity** tiebreaker for ensemble consensus ties
146
+ - **Pilot test** — validate classifications on a small sample before committing to the full run
110
147
 
111
148
  ## License
112
149
 
@@ -1,7 +1,7 @@
1
1
  # SPDX-FileCopyrightText: 2025-present Christopher Soria <chrissoria@berkeley.edu>
2
2
  #
3
3
  # SPDX-License-Identifier: GPL-3.0-or-later
4
- __version__ = "0.4.0"
4
+ __version__ = "1.0.0"
5
5
  __author__ = "Chris Soria"
6
6
  __email__ = "chrissoria@berkeley.edu"
7
7
  __title__ = "cat-stack"
@@ -31,6 +31,7 @@ def summarize(
31
31
  api_key: str = None,
32
32
  description: str = "",
33
33
  instructions: str = "",
34
+ format: str = "paragraph",
34
35
  max_length: int = None,
35
36
  focus: str = None,
36
37
  user_model: str = "gpt-4o",
@@ -76,7 +77,15 @@ def summarize(
76
77
  - PDF: directory path, single PDF path, or list of PDF paths
77
78
  api_key (str): API key for the model provider (single-model mode)
78
79
  description (str): Description of what the content contains (provides context)
79
- instructions (str): Specific summarization instructions (e.g., "bullet points")
80
+ instructions (str): Specific summarization instructions. When used with
81
+ format, these are appended as additional instructions. Default "".
82
+ format (str): Output format for the summary. Default "paragraph".
83
+ - "paragraph": Flowing prose summary (default)
84
+ - "bullets": Bullet-point list of key points
85
+ - "one-liner": Single-sentence summary
86
+ - "structured": Labeled sections (What, Who, Why, Impact)
87
+ - "report": Comprehensive full-page report with Overview, Background,
88
+ Key Provisions, Stakeholders/Impact, and Implementation sections
80
89
  max_length (int): Maximum summary length in words
81
90
  focus (str): What to focus on (e.g., "main arguments", "emotional content")
82
91
  user_model (str): Model to use (default "gpt-4o")
@@ -179,6 +188,75 @@ def summarize(
179
188
  ... ],
180
189
  ... )
181
190
  """
191
+ # =========================================================================
192
+ # Resolve format → instructions + max_length defaults
193
+ # =========================================================================
194
+ _FORMAT_PRESETS = {
195
+ "paragraph": {
196
+ "instructions": "Write a concise summary in paragraph form.",
197
+ "max_length": None,
198
+ },
199
+ "bullets": {
200
+ "instructions": (
201
+ "Summarize as a bullet-point list. Each bullet should capture "
202
+ "one key point. Use '- ' prefix for each bullet."
203
+ ),
204
+ "max_length": None,
205
+ },
206
+ "one-liner": {
207
+ "instructions": "Summarize in a single sentence.",
208
+ "max_length": 40,
209
+ },
210
+ "structured": {
211
+ "instructions": (
212
+ "Summarize using these labeled sections:\n"
213
+ "- What: What does this do or say?\n"
214
+ "- Who: Who is affected or involved?\n"
215
+ "- Why: What is the motivation or purpose?\n"
216
+ "- Impact: What are the key consequences or effects?"
217
+ ),
218
+ "max_length": None,
219
+ },
220
+ "report": {
221
+ "instructions": (
222
+ "Write a comprehensive full-page report covering the following sections. "
223
+ "Use clear headings and be thorough.\n\n"
224
+ "## Overview\n"
225
+ "A brief executive summary (2-3 sentences).\n\n"
226
+ "## Background and Context\n"
227
+ "What is the background? What problem or situation prompted this? "
228
+ "Include relevant history and prior actions.\n\n"
229
+ "## Key Provisions\n"
230
+ "Detail the main provisions, requirements, or arguments. "
231
+ "Be specific about numbers, dates, names, and conditions.\n\n"
232
+ "## Stakeholders and Impact\n"
233
+ "Who is affected? What are the expected consequences? "
234
+ "Include both intended effects and potential concerns.\n\n"
235
+ "## Implementation\n"
236
+ "How will this be implemented? What is the timeline? "
237
+ "Are there enforcement mechanisms or milestones?"
238
+ ),
239
+ "max_length": 800,
240
+ },
241
+ }
242
+
243
+ format_lower = format.lower() if format else "paragraph"
244
+ if format_lower not in _FORMAT_PRESETS:
245
+ valid = ", ".join(f'"{k}"' for k in _FORMAT_PRESETS)
246
+ raise ValueError(f"format must be one of {valid}, got '{format}'")
247
+
248
+ preset = _FORMAT_PRESETS[format_lower]
249
+
250
+ # Format instructions are prepended to any user-provided instructions
251
+ if not instructions:
252
+ instructions = preset["instructions"]
253
+ else:
254
+ instructions = f"{preset['instructions']}\n\nAdditional instructions: {instructions}"
255
+
256
+ # Use format's max_length as default only if user didn't specify one
257
+ if max_length is None and preset["max_length"] is not None:
258
+ max_length = preset["max_length"]
259
+
182
260
  # Map mode to pdf_mode
183
261
  pdf_mode = mode if mode in ("image", "text", "both") else "image"
184
262
 
@@ -3756,7 +3756,7 @@ def summarize_ensemble(
3756
3756
  max_retries=max_retries,
3757
3757
  )
3758
3758
  else:
3759
- response, _err = client.complete(
3759
+ response, error = client.complete(
3760
3760
  messages=messages,
3761
3761
  json_schema=json_schema,
3762
3762
  creativity=creativity,
@@ -3764,6 +3764,9 @@ def summarize_ensemble(
3764
3764
  max_retries=max_retries,
3765
3765
  )
3766
3766
 
3767
+ if error:
3768
+ return (model_name, '{"summary": ""}', error)
3769
+
3767
3770
  # Extract JSON from response
3768
3771
  json_str = extract_json(response)
3769
3772
 
@@ -3806,7 +3809,7 @@ def summarize_ensemble(
3806
3809
  # Resolve thinking_budget for this provider
3807
3810
  effective_thinking = thinking_budget if cfg["provider"] in ("google", "openai", "anthropic", "huggingface", "huggingface-together") else None
3808
3811
 
3809
- response, _err = client.complete(
3812
+ response, error = client.complete(
3810
3813
  messages=messages,
3811
3814
  json_schema=json_schema,
3812
3815
  creativity=creativity,
@@ -3814,6 +3817,9 @@ def summarize_ensemble(
3814
3817
  max_retries=max_retries,
3815
3818
  )
3816
3819
 
3820
+ if error:
3821
+ return (model_name, '{"summary": ""}', error)
3822
+
3817
3823
  # Extract JSON from response
3818
3824
  json_str = extract_json(response)
3819
3825
 
File without changes
File without changes
File without changes