cat-llm 0.0.54__tar.gz → 0.0.56__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: cat-llm
-Version: 0.0.54
+Version: 0.0.56
 Summary: A tool for categorizing text data and images using LLMs and vision models
 Project-URL: Documentation, https://github.com/chrissoria/cat-llm#readme
 Project-URL: Issues, https://github.com/chrissoria/cat-llm/issues
@@ -21,6 +21,7 @@ Classifier: Programming Language :: Python :: Implementation :: PyPy
 Requires-Python: >=3.8
 Requires-Dist: openai
 Requires-Dist: pandas
+Requires-Dist: random
 Requires-Dist: requests
 Requires-Dist: tqdm
 Description-Content-Type: text/markdown
@@ -59,11 +60,15 @@ pip install cat-llm
 
 ## Quick Start
 
-CatLLM helps social scientists and demographers automatically categorize open-ended survey responses using AI models like GPT-4 and Claude.
+CatLLM helps social scientists and researchers automatically categorize open-ended survey responses, images, and web-scraped data using AI models like GPT-4 and Claude.
 
-Simply provide your survey responses and category list - the package handles the rest and outputs clean data ready for statistical analysis. It works with single or multiple categories per response and automatically skips missing data to save API costs.
+Text Analysis: Simply provide your survey responses and category list - the package handles the rest and outputs clean data ready for statistical analysis. It works with single or multiple categories per response and automatically skips missing data to save API costs.
 
-Perfect for turning messy text data into structured categories you can actually analyze.
+Image Categorization: Uses the same intelligent categorization method to analyze images, extracting specific features, counting objects, identifying colors, or determining the presence of elements based on your research questions.
+
+Web Data Collection: Builds comprehensive datasets by scraping web data and using Large Language Models to extract exactly the information you need. The function searches across multiple sources, processes the findings through AI models, and structures everything into clean dataframe format ready for export to CSV.
+
+Whether you're working with messy text responses, analyzing visual content, or gathering information from across the web, CatLLM consistently transforms unstructured data into structured categories and datasets you can actually analyze. All outputs are formatted for immediate statistical analysis and can be exported directly to CSV for further research workflows.
 
 ## Configuration
 
@@ -32,11 +32,15 @@ pip install cat-llm
 
 ## Quick Start
 
-CatLLM helps social scientists and demographers automatically categorize open-ended survey responses using AI models like GPT-4 and Claude.
+CatLLM helps social scientists and researchers automatically categorize open-ended survey responses, images, and web-scraped data using AI models like GPT-4 and Claude.
 
-Simply provide your survey responses and category list - the package handles the rest and outputs clean data ready for statistical analysis. It works with single or multiple categories per response and automatically skips missing data to save API costs.
+Text Analysis: Simply provide your survey responses and category list - the package handles the rest and outputs clean data ready for statistical analysis. It works with single or multiple categories per response and automatically skips missing data to save API costs.
 
-Perfect for turning messy text data into structured categories you can actually analyze.
+Image Categorization: Uses the same intelligent categorization method to analyze images, extracting specific features, counting objects, identifying colors, or determining the presence of elements based on your research questions.
+
+Web Data Collection: Builds comprehensive datasets by scraping web data and using Large Language Models to extract exactly the information you need. The function searches across multiple sources, processes the findings through AI models, and structures everything into clean dataframe format ready for export to CSV.
+
+Whether you're working with messy text responses, analyzing visual content, or gathering information from across the web, CatLLM consistently transforms unstructured data into structured categories and datasets you can actually analyze. All outputs are formatted for immediate statistical analysis and can be exported directly to CSV for further research workflows.
 
 ## Configuration
 
@@ -28,7 +28,8 @@ dependencies = [
   "pandas",
   "tqdm",
   "requests",
-  "openai"
+  "openai",
+  "random"
 ]
 
 [project.urls]
@@ -378,7 +378,7 @@ def cerad_drawn_score(
             image_files.reset_index(drop=True) if isinstance(image_files, (pd.DataFrame, pd.Series))
             else pd.Series(image_files)
         ),
-        'link1': pd.Series(link1).reset_index(drop=True),
+        'model_response': pd.Series(link1).reset_index(drop=True),
         'json': pd.Series(extracted_jsons).reset_index(drop=True)
     })
     categorized_data = pd.concat([categorized_data, normalized_data], axis=1)
@@ -1,7 +1,7 @@
 # SPDX-FileCopyrightText: 2025-present Christopher Soria <chrissoria@berkeley.edu>
 #
 # SPDX-License-Identifier: MIT
-__version__ = "0.0.54"
+__version__ = "0.0.56"
 __author__ = "Chris Soria"
 __email__ = "chrissoria@berkeley.edu"
 __title__ = "cat-llm"
@@ -225,7 +225,7 @@ def image_multi_class(
     # Save progress so far
     temp_df = pd.DataFrame({
         'image_input': image_files[:i+1],
-        'link1': link1,
+        'model_response': link1,
         'json': extracted_jsons
     })
     # Normalize processed jsons so far
@@ -522,7 +522,7 @@ def image_score_drawing(
     # Save progress so far
     temp_df = pd.DataFrame({
         'image_input': image_files[:i+1],
-        'link1': link1,
+        'model_response': link1,
         'json': extracted_jsons
     })
     # Normalize processed jsons so far
@@ -844,7 +844,7 @@ def image_features(
             image_files.reset_index(drop=True) if isinstance(image_files, (pd.DataFrame, pd.Series))
             else pd.Series(image_files)
         ),
-        'link1': pd.Series(link1).reset_index(drop=True),
+        'model_response': pd.Series(link1).reset_index(drop=True),
         'json': pd.Series(extracted_jsons).reset_index(drop=True)
     })
     categorized_data = pd.concat([categorized_data, normalized_data], axis=1)
@@ -227,6 +227,7 @@ def multi_class(
     user_model="gpt-4o",
     creativity=0,
     safety=False,
+    to_csv=False,
     filename="categorized_data.csv",
     save_directory=None,
     model_source="OpenAI"
@@ -390,7 +391,7 @@ Provide your work in JSON format where the number belonging to each category is
     normalized_data_list.append(pd.DataFrame({"1": ["e"]}))
     normalized_data = pd.concat(normalized_data_list, ignore_index=True)
     temp_df = pd.concat([temp_df, normalized_data], axis=1)
-    # Save to CSV
+    # save to CSV
     if save_directory is None:
         save_directory = os.getcwd()
     temp_df.to_csv(os.path.join(save_directory, filename), index=False)
@@ -405,13 +406,18 @@ Provide your work in JSON format where the number belonging to each category is
     normalized_data_list.append(pd.DataFrame({"1": ["e"]}))
     normalized_data = pd.concat(normalized_data_list, ignore_index=True)
     categorized_data = pd.DataFrame({
-        'image_input': (
+        'survey_input': (
             survey_input.reset_index(drop=True) if isinstance(survey_input, (pd.DataFrame, pd.Series))
             else pd.Series(survey_input)
         ),
-        'link1': pd.Series(link1).reset_index(drop=True),
+        'model_response': pd.Series(link1).reset_index(drop=True),
         'json': pd.Series(extracted_jsons).reset_index(drop=True)
     })
     categorized_data = pd.concat([categorized_data, normalized_data], axis=1)
+
+    if to_csv:
+        if save_directory is None:
+            save_directory = os.getcwd()
+        categorized_data.to_csv(os.path.join(save_directory, filename), index=False)
 
     return categorized_data
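
For orientation, a minimal sketch of how the new to_csv flag might be used; the catllm import name and the positional responses/categories arguments are assumptions, while the keyword arguments and output column names are taken from the hunks above.

    import catllm  # import name assumed from the cat-llm package

    responses = ["walks every day", "reads novels"]  # hypothetical survey answers
    categories = ["exercise", "reading"]             # hypothetical category list

    categorized = catllm.multi_class(
        responses,
        categories,                    # positional order is an assumption
        user_model="gpt-4o",
        creativity=0,
        safety=False,
        to_csv=True,                   # new in 0.0.56: also write the result to disk
        filename="categorized_data.csv",
        save_directory=None,           # falls back to os.getcwd() per the added branch
    )
    print(categorized[["survey_input", "model_response", "json"]].head())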