cat-llm 0.0.72__tar.gz → 0.0.74__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {cat_llm-0.0.72 → cat_llm-0.0.74}/PKG-INFO +12 -5
- {cat_llm-0.0.72 → cat_llm-0.0.74}/README.md +11 -4
- {cat_llm-0.0.72 → cat_llm-0.0.74}/src/catllm/__about__.py +1 -1
- {cat_llm-0.0.72 → cat_llm-0.0.74}/src/catllm/text_functions.py +21 -6
- {cat_llm-0.0.72 → cat_llm-0.0.74}/.gitignore +0 -0
- {cat_llm-0.0.72 → cat_llm-0.0.74}/LICENSE +0 -0
- {cat_llm-0.0.72 → cat_llm-0.0.74}/pyproject.toml +0 -0
- {cat_llm-0.0.72 → cat_llm-0.0.74}/src/catllm/CERAD_functions.py +0 -0
- {cat_llm-0.0.72 → cat_llm-0.0.74}/src/catllm/__init__.py +0 -0
- {cat_llm-0.0.72 → cat_llm-0.0.74}/src/catllm/build_web_research.py +0 -0
- {cat_llm-0.0.72 → cat_llm-0.0.74}/src/catllm/calls/CoVe.py +0 -0
- {cat_llm-0.0.72 → cat_llm-0.0.74}/src/catllm/calls/__init__.py +0 -0
- {cat_llm-0.0.72 → cat_llm-0.0.74}/src/catllm/calls/all_calls.py +0 -0
- {cat_llm-0.0.72 → cat_llm-0.0.74}/src/catllm/image_functions.py +0 -0
- {cat_llm-0.0.72 → cat_llm-0.0.74}/src/catllm/images/circle.png +0 -0
- {cat_llm-0.0.72 → cat_llm-0.0.74}/src/catllm/images/cube.png +0 -0
- {cat_llm-0.0.72 → cat_llm-0.0.74}/src/catllm/images/diamond.png +0 -0
- {cat_llm-0.0.72 → cat_llm-0.0.74}/src/catllm/images/overlapping_pentagons.png +0 -0
- {cat_llm-0.0.72 → cat_llm-0.0.74}/src/catllm/images/rectangles.png +0 -0
- {cat_llm-0.0.72 → cat_llm-0.0.74}/src/catllm/model_reference_list.py +0 -0
PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: cat-llm
-Version: 0.0.72
+Version: 0.0.74
 Summary: A tool for categorizing text data and images using LLMs and vision models
 Project-URL: Documentation, https://github.com/chrissoria/cat-llm#readme
 Project-URL: Issues, https://github.com/chrissoria/cat-llm/issues
@@ -189,16 +189,23 @@ Performs multi-label classification of text responses into user-defined categories
 Processes each text response individually, assigning one or more categories from the provided list. Supports flexible output formatting and optional saving of results to CSV for easy integration with data analysis workflows.

 **Parameters:**
-- `survey_question` (str): The survey question being analyzed
 - `survey_input` (list): List of text responses to classify
 - `categories` (list): List of predefined categories for classification
 - `api_key` (str): API key for the LLM service
-- `user_model` (str, default="gpt-
-- `
+- `user_model` (str, default="gpt-5"): Specific model to use
+- `user_prompt` (str, optional): Custom prompt template to override default prompting
+- `survey_question` (str, default=""): The survey question being analyzed
+- `example1` through `example6` (dict, optional): Few-shot learning examples (format: {"response": "...", "categories": [...]})
+- `creativity` (float, optional): Temperature/randomness setting (0.0-1.0, varies by model)
 - `safety` (bool, default=False): Enable safety checks on responses and saves to CSV at each API call step
+- `to_csv` (bool, default=False): Whether to save results to CSV
+- `chain_of_verification` (bool, default=False): Enable Chain-of-Verification prompting technique for improved accuracy
+- `chain_of_thought` (bool, default=False): Enable Chain-of-Thought prompting technique for improved accuracy
+- `step_back_prompt` (bool, default=False): Enable step-back prompting to analyze higher-level context before classification
+- `context_prompt` (bool, default=False): Add expert role and behavioral guidelines to the prompt
 - `filename` (str, default="categorized_data.csv"): Filename for CSV output
 - `save_directory` (str, optional): Directory path to save the CSV file
-- `model_source` (str, default="
+- `model_source` (str, default="auto"): Model provider ("auto", "OpenAI", "Anthropic", "Google", "Mistral", "Perplexity", "Huggingface")

 **Returns:**
 - `pandas.DataFrame`: DataFrame with classification results, columns formatted as specified
README.md

@@ -160,16 +160,23 @@ Performs multi-label classification of text responses into user-defined categories
 Processes each text response individually, assigning one or more categories from the provided list. Supports flexible output formatting and optional saving of results to CSV for easy integration with data analysis workflows.

 **Parameters:**
-- `survey_question` (str): The survey question being analyzed
 - `survey_input` (list): List of text responses to classify
 - `categories` (list): List of predefined categories for classification
 - `api_key` (str): API key for the LLM service
-- `user_model` (str, default="gpt-
-- `
+- `user_model` (str, default="gpt-5"): Specific model to use
+- `user_prompt` (str, optional): Custom prompt template to override default prompting
+- `survey_question` (str, default=""): The survey question being analyzed
+- `example1` through `example6` (dict, optional): Few-shot learning examples (format: {"response": "...", "categories": [...]})
+- `creativity` (float, optional): Temperature/randomness setting (0.0-1.0, varies by model)
 - `safety` (bool, default=False): Enable safety checks on responses and saves to CSV at each API call step
+- `to_csv` (bool, default=False): Whether to save results to CSV
+- `chain_of_verification` (bool, default=False): Enable Chain-of-Verification prompting technique for improved accuracy
+- `chain_of_thought` (bool, default=False): Enable Chain-of-Thought prompting technique for improved accuracy
+- `step_back_prompt` (bool, default=False): Enable step-back prompting to analyze higher-level context before classification
+- `context_prompt` (bool, default=False): Add expert role and behavioral guidelines to the prompt
 - `filename` (str, default="categorized_data.csv"): Filename for CSV output
 - `save_directory` (str, optional): Directory path to save the CSV file
-- `model_source` (str, default="
+- `model_source` (str, default="auto"): Model provider ("auto", "OpenAI", "Anthropic", "Google", "Mistral", "Perplexity", "Huggingface")

 **Returns:**
 - `pandas.DataFrame`: DataFrame with classification results, columns formatted as specified
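The parameter list above maps directly onto keyword arguments of `multi_class`. A minimal usage sketch based on the documented names and defaults follows; the package-level import path, the API key placeholder, and the example data are assumptions for illustration, not taken from the package itself.

```python
# Hypothetical usage sketch of the documented multi_class signature.
# Only the keyword names and defaults come from the README above; the
# import path and sample data are assumed.
from catllm import multi_class  # assumed to be re-exported in __init__.py

responses = [
    "I walk to the park every morning and sometimes swim.",
    "Mostly I just watch TV.",
]
categories = ["physical exercise", "outdoor activity", "sedentary activity"]

df = multi_class(
    survey_input=responses,
    categories=categories,
    api_key="YOUR_API_KEY",          # placeholder
    user_model="gpt-5",              # documented default
    survey_question="What do you do in your free time?",
    chain_of_thought=True,           # new option in 0.0.74
    model_source="auto",             # documented default; or a named provider
    to_csv=True,
    filename="categorized_data.csv",
)
print(df.head())  # one 0/1 column per category, per the documented return value
```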
src/catllm/text_functions.py

@@ -260,6 +260,7 @@ def multi_class(
     safety = False,
     to_csv = False,
     chain_of_verification = False,
+    chain_of_thought = True,
     step_back_prompt = False,
     context_prompt = False,
     filename = "categorized_data.csv",
@@ -397,12 +398,27 @@ def multi_class(
         extracted_jsons.append(default_json)
         #print(f"Skipped NaN input.")
     else:
+        if chain_of_thought:
+            prompt = f"""{survey_question_context}

-        prompt = f"""{survey_question_context} \
-        Categorize this survey response "{response}" into the following categories that apply: \
-        {categories_str}
-        {examples_text}
-        Provide your work in JSON format where the number belonging to each category is the key and a 1 if the category is present and a 0 if it is not present as key values."""
+            Categorize this survey response "{response}" into the following categories that apply:
+            {categories_str}
+
+            Let's think step by step:
+            1. First, identify the main themes mentioned in the response
+            2. Then, match each theme to the relevant categories
+            3. Finally, assign 1 to matching categories and 0 to non-matching categories
+
+            {examples_text}
+
+            Provide your reasoning for each category, then provide your final answer in JSON format where the number belonging to each category is the key and a 1 if the category is present and a 0 if it is not present as key values."""
+        else:
+
+            prompt = f"""{survey_question_context} \
+            Categorize this survey response "{response}" into the following categories that apply: \
+            {categories_str}
+            {examples_text}
+            Provide your work in JSON format where the number belonging to each category is the key and a 1 if the category is present and a 0 if it is not present as key values."""

         if context_prompt:
             context = """You are an expert researcher in survey data categorization.
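For illustration, here is a sketch of what the new chain-of-thought branch would render for a single response. The variable values are invented placeholders for `survey_question_context`, `response`, `categories_str`, and `examples_text`, which the real function builds elsewhere; the numbered-category format is assumed because the prompt refers to each category by number.

```python
# Hypothetical rendering of the chain-of-thought prompt added above.
# All values below are made-up stand-ins; only the prompt template text
# comes from the diff.
survey_question_context = 'The survey asked: "What do you do in your free time?"'
response = "I walk to the park every morning and sometimes swim."
categories_str = "1. physical exercise\n2. outdoor activity\n3. sedentary activity"
examples_text = ""  # no few-shot examples supplied in this sketch

prompt = f"""{survey_question_context}

Categorize this survey response "{response}" into the following categories that apply:
{categories_str}

Let's think step by step:
1. First, identify the main themes mentioned in the response
2. Then, match each theme to the relevant categories
3. Finally, assign 1 to matching categories and 0 to non-matching categories

{examples_text}

Provide your reasoning for each category, then provide your final answer in JSON format where the number belonging to each category is the key and a 1 if the category is present and a 0 if it is not present as key values."""

print(prompt)
```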
@@ -410,7 +426,6 @@ def multi_class(
             When uncertain, prioritize precision over recall."""

             prompt = context + prompt
-            print(prompt)

         if chain_of_verification:
             step2_prompt = f"""You provided this initial categorization:
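Both prompt variants ask the model to answer with a JSON object keyed by category number with 0/1 values, and the function accumulates those answers in `extracted_jsons` before building the returned DataFrame. A rough post-processing sketch follows; the extraction regex, function name, and column naming are assumptions, not the package's actual implementation.

```python
import json
import re

import pandas as pd

# Hypothetical post-processing sketch. cat-llm's real extraction logic
# (which feeds extracted_jsons) is not shown in this diff; this only
# illustrates the JSON answer format the prompts request.
def parse_reply(reply: str, n_categories: int) -> dict:
    """Pull the last {...} block out of a model reply and map it to 0/1 flags."""
    matches = re.findall(r"\{[^{}]*\}", reply)
    parsed = json.loads(matches[-1]) if matches else {}
    # Default every category to 0, then overwrite with whatever the model returned.
    return {str(i): int(parsed.get(str(i), 0)) for i in range(1, n_categories + 1)}

replies = [
    'Reasoning... Final answer: {"1": 1, "2": 1, "3": 0}',
    '{"1": 0, "2": 0, "3": 1}',
]
rows = [parse_reply(r, n_categories=3) for r in replies]
df = pd.DataFrame(rows)  # one 0/1 column per category number
print(df)
```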