hamtaa-texttools 1.0.5.tar.gz → 1.0.6.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.


This version of hamtaa-texttools might be problematic.

Files changed (36):
  1. {hamtaa_texttools-1.0.5/hamtaa_texttools.egg-info → hamtaa_texttools-1.0.6}/PKG-INFO +15 -15
  2. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/README.md +14 -14
  3. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6/hamtaa_texttools.egg-info}/PKG-INFO +15 -15
  4. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/hamtaa_texttools.egg-info/SOURCES.txt +4 -4
  5. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/pyproject.toml +2 -2
  6. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/texttools/batch/batch_manager.py +7 -18
  7. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/texttools/batch/batch_runner.py +96 -45
  8. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/texttools/prompts/README.md +4 -0
  9. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/texttools/prompts/keyword_extractor.yaml +6 -6
  10. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/texttools/prompts/question_merger.yaml +5 -5
  11. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/texttools/tools/async_the_tool.py +6 -6
  12. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/texttools/tools/internals/async_operator.py +21 -10
  13. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/texttools/tools/internals/operator.py +2 -2
  14. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/texttools/tools/internals/prompt_loader.py +12 -22
  15. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/texttools/tools/the_tool.py +12 -12
  16. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/LICENSE +0 -0
  17. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/MANIFEST.in +0 -0
  18. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/hamtaa_texttools.egg-info/dependency_links.txt +0 -0
  19. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/hamtaa_texttools.egg-info/requires.txt +0 -0
  20. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/hamtaa_texttools.egg-info/top_level.txt +0 -0
  21. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/setup.cfg +0 -0
  22. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/texttools/__init__.py +0 -0
  23. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/texttools/batch/__init__.py +0 -0
  24. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/texttools/formatters/base_formatter.py +0 -0
  25. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/texttools/formatters/user_merge_formatter.py +0 -0
  26. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/texttools/prompts/categorizer.yaml +0 -0
  27. /hamtaa_texttools-1.0.5/texttools/prompts/question_detector.yaml → /hamtaa_texttools-1.0.6/texttools/prompts/is_question.yaml +0 -0
  28. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/texttools/prompts/ner_extractor.yaml +0 -0
  29. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/texttools/prompts/rewriter.yaml +0 -0
  30. /hamtaa_texttools-1.0.5/texttools/prompts/custom_tool.yaml → /hamtaa_texttools-1.0.6/texttools/prompts/run_custom.yaml +0 -0
  31. /hamtaa_texttools-1.0.5/texttools/prompts/subject_question_generator.yaml → /hamtaa_texttools-1.0.6/texttools/prompts/subject_to_question.yaml +0 -0
  32. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/texttools/prompts/summarizer.yaml +0 -0
  33. /hamtaa_texttools-1.0.5/texttools/prompts/question_generator.yaml → /hamtaa_texttools-1.0.6/texttools/prompts/text_to_question.yaml +0 -0
  34. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/texttools/prompts/translator.yaml +0 -0
  35. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/texttools/tools/__init__.py +0 -0
  36. {hamtaa_texttools-1.0.5 → hamtaa_texttools-1.0.6}/texttools/tools/internals/output_models.py +0 -0
PKG-INFO
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: hamtaa-texttools
- Version: 1.0.5
+ Version: 1.0.6
  Summary: TextTools is a high-level NLP toolkit built on top of modern LLMs.
  Author-email: Tohidi <the.mohammad.tohidi@gmail.com>, Montazer <montazerh82@gmail.com>, Givechi <mohamad.m.givechi@gmail.com>, MoosaviNejad <erfanmoosavi84@gmail.com>
  License: MIT License
@@ -51,17 +51,17 @@ It provides ready-to-use utilities for **translation, question detection, keywor
  TextTools provides a rich collection of high-level NLP utilities built on top of LLMs.
  Each tool is designed to work out-of-the-box with structured outputs (JSON / Pydantic).

- - **Categorizer** → Zero-finetuning text categorization for fast, scalable classification.
- - **Keyword Extractor** Identify the most important keywords in a text.
- - **Question Merger** Merge the provided questions, preserving all the main points
- - **NER (Named Entity Recognition) Extractor** → Extract people, places, organizations, and other entities.
- - **Question Detector** Determine whether a text is a question or not.
- - **Question Generator From Text** → Generate high-quality, context-relevant questions from provided text.
- - **Question Generator From Subject** → Generate high-quality, context-relevant questions from a subject.
- - **Rewriter** Rewrite text while preserving meaning or without it.
- - **Summarizer** Condense long passages into clear, structured summaries.
- - **Translator** Translate text across multiple languages, with support for custom rules.
- - **Custom Tool** Allows users to define a custom tool with arbitrary BaseModel.
+ - **`categorize()`** - Classifies text into Islamic studies categories
+ - **`is_question()`** - Binary detection of whether input is a question
+ - **`extract_keywords()`** - Extracts keywords from text
+ - **`extract_entities()`** - Named Entity Recognition (NER) system
+ - **`summarize()`** - Text summarization
+ - **`text_to_question()`** - Generates questions from text
+ - **`merge_questions()`** - Merges multiple questions with different modes
+ - **`rewrite()`** - Rewrites text with different wording/meaning
+ - **`subject_to_question()`** - Generates questions about a specific subject
+ - **`translate()`** - Text translation between languages
+ - **`run_custom()`** - Allows users to define a custom tool with arbitrary BaseModel

  ---

@@ -87,7 +87,7 @@ All these flags can be used individually or together to tailor the behavior of a
  Install the latest release via PyPI:

  ```bash
- pip install -U hamta-texttools
+ pip install -U hamtaa-texttools
  ```

  ---
@@ -118,7 +118,7 @@ model = "gpt-4o-mini"
  the_tool = TheTool(client=client, model=model, with_analysis=True, output_lang="English")

  # Example: Question Detection
- detection = the_tool.detect_question("Is this project open source?", logpobs=True, top_logprobs=2)
+ detection = the_tool.is_question("Is this project open source?", logprobs=True, top_logprobs=2)
  print(detection["result"])
  print(detection["logprobs"])
  # Output: True
@@ -135,7 +135,7 @@ class Custom(BaseModel):
  result: list[list[dict[str, int]]]

  custom_prompt = "Something"
- custom_result = the_tool.custom_tool(custom_prompt, Custom)
+ custom_result = the_tool.run_custom(custom_prompt, Custom)
  print(custom_result)
  ```

README.md
@@ -17,17 +17,17 @@ It provides ready-to-use utilities for **translation, question detection, keywor
  TextTools provides a rich collection of high-level NLP utilities built on top of LLMs.
  Each tool is designed to work out-of-the-box with structured outputs (JSON / Pydantic).

- - **Categorizer** → Zero-finetuning text categorization for fast, scalable classification.
- - **Keyword Extractor** Identify the most important keywords in a text.
- - **Question Merger** Merge the provided questions, preserving all the main points
- - **NER (Named Entity Recognition) Extractor** → Extract people, places, organizations, and other entities.
- - **Question Detector** Determine whether a text is a question or not.
- - **Question Generator From Text** → Generate high-quality, context-relevant questions from provided text.
- - **Question Generator From Subject** → Generate high-quality, context-relevant questions from a subject.
- - **Rewriter** Rewrite text while preserving meaning or without it.
- - **Summarizer** Condense long passages into clear, structured summaries.
- - **Translator** Translate text across multiple languages, with support for custom rules.
- - **Custom Tool** Allows users to define a custom tool with arbitrary BaseModel.
+ - **`categorize()`** - Classifies text into Islamic studies categories
+ - **`is_question()`** - Binary detection of whether input is a question
+ - **`extract_keywords()`** - Extracts keywords from text
+ - **`extract_entities()`** - Named Entity Recognition (NER) system
+ - **`summarize()`** - Text summarization
+ - **`text_to_question()`** - Generates questions from text
+ - **`merge_questions()`** - Merges multiple questions with different modes
+ - **`rewrite()`** - Rewrites text with different wording/meaning
+ - **`subject_to_question()`** - Generates questions about a specific subject
+ - **`translate()`** - Text translation between languages
+ - **`run_custom()`** - Allows users to define a custom tool with arbitrary BaseModel

  ---

@@ -53,7 +53,7 @@ All these flags can be used individually or together to tailor the behavior of a
  Install the latest release via PyPI:

  ```bash
- pip install -U hamta-texttools
+ pip install -U hamtaa-texttools
  ```

  ---
@@ -84,7 +84,7 @@ model = "gpt-4o-mini"
  the_tool = TheTool(client=client, model=model, with_analysis=True, output_lang="English")

  # Example: Question Detection
- detection = the_tool.detect_question("Is this project open source?", logpobs=True, top_logprobs=2)
+ detection = the_tool.is_question("Is this project open source?", logprobs=True, top_logprobs=2)
  print(detection["result"])
  print(detection["logprobs"])
  # Output: True
@@ -101,7 +101,7 @@ class Custom(BaseModel):
  result: list[list[dict[str, int]]]

  custom_prompt = "Something"
- custom_result = the_tool.custom_tool(custom_prompt, Custom)
+ custom_result = the_tool.run_custom(custom_prompt, Custom)
  print(custom_result)
  ```

hamtaa_texttools.egg-info/PKG-INFO
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: hamtaa-texttools
- Version: 1.0.5
+ Version: 1.0.6
  Summary: TextTools is a high-level NLP toolkit built on top of modern LLMs.
  Author-email: Tohidi <the.mohammad.tohidi@gmail.com>, Montazer <montazerh82@gmail.com>, Givechi <mohamad.m.givechi@gmail.com>, MoosaviNejad <erfanmoosavi84@gmail.com>
  License: MIT License
@@ -51,17 +51,17 @@ It provides ready-to-use utilities for **translation, question detection, keywor
  TextTools provides a rich collection of high-level NLP utilities built on top of LLMs.
  Each tool is designed to work out-of-the-box with structured outputs (JSON / Pydantic).

- - **Categorizer** → Zero-finetuning text categorization for fast, scalable classification.
- - **Keyword Extractor** Identify the most important keywords in a text.
- - **Question Merger** Merge the provided questions, preserving all the main points
- - **NER (Named Entity Recognition) Extractor** → Extract people, places, organizations, and other entities.
- - **Question Detector** Determine whether a text is a question or not.
- - **Question Generator From Text** → Generate high-quality, context-relevant questions from provided text.
- - **Question Generator From Subject** → Generate high-quality, context-relevant questions from a subject.
- - **Rewriter** Rewrite text while preserving meaning or without it.
- - **Summarizer** Condense long passages into clear, structured summaries.
- - **Translator** Translate text across multiple languages, with support for custom rules.
- - **Custom Tool** Allows users to define a custom tool with arbitrary BaseModel.
+ - **`categorize()`** - Classifies text into Islamic studies categories
+ - **`is_question()`** - Binary detection of whether input is a question
+ - **`extract_keywords()`** - Extracts keywords from text
+ - **`extract_entities()`** - Named Entity Recognition (NER) system
+ - **`summarize()`** - Text summarization
+ - **`text_to_question()`** - Generates questions from text
+ - **`merge_questions()`** - Merges multiple questions with different modes
+ - **`rewrite()`** - Rewrites text with different wording/meaning
+ - **`subject_to_question()`** - Generates questions about a specific subject
+ - **`translate()`** - Text translation between languages
+ - **`run_custom()`** - Allows users to define a custom tool with arbitrary BaseModel

  ---

@@ -87,7 +87,7 @@ All these flags can be used individually or together to tailor the behavior of a
  Install the latest release via PyPI:

  ```bash
- pip install -U hamta-texttools
+ pip install -U hamtaa-texttools
  ```

  ---
@@ -118,7 +118,7 @@ model = "gpt-4o-mini"
  the_tool = TheTool(client=client, model=model, with_analysis=True, output_lang="English")

  # Example: Question Detection
- detection = the_tool.detect_question("Is this project open source?", logpobs=True, top_logprobs=2)
+ detection = the_tool.is_question("Is this project open source?", logprobs=True, top_logprobs=2)
  print(detection["result"])
  print(detection["logprobs"])
  # Output: True
@@ -135,7 +135,7 @@ class Custom(BaseModel):
  result: list[list[dict[str, int]]]

  custom_prompt = "Something"
- custom_result = the_tool.custom_tool(custom_prompt, Custom)
+ custom_result = the_tool.run_custom(custom_prompt, Custom)
  print(custom_result)
  ```

hamtaa_texttools.egg-info/SOURCES.txt
@@ -15,15 +15,15 @@ texttools/formatters/base_formatter.py
  texttools/formatters/user_merge_formatter.py
  texttools/prompts/README.md
  texttools/prompts/categorizer.yaml
- texttools/prompts/custom_tool.yaml
+ texttools/prompts/is_question.yaml
  texttools/prompts/keyword_extractor.yaml
  texttools/prompts/ner_extractor.yaml
- texttools/prompts/question_detector.yaml
- texttools/prompts/question_generator.yaml
  texttools/prompts/question_merger.yaml
  texttools/prompts/rewriter.yaml
- texttools/prompts/subject_question_generator.yaml
+ texttools/prompts/run_custom.yaml
+ texttools/prompts/subject_to_question.yaml
  texttools/prompts/summarizer.yaml
+ texttools/prompts/text_to_question.yaml
  texttools/prompts/translator.yaml
  texttools/tools/__init__.py
  texttools/tools/async_the_tool.py
pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

  [project]
  name = "hamtaa-texttools"
- version = "1.0.5"
+ version = "1.0.6"
  authors = [
  { name = "Tohidi", email = "the.mohammad.tohidi@gmail.com" },
  { name = "Montazer", email = "montazerh82@gmail.com" },
@@ -17,7 +17,7 @@ license = {file = "LICENSE"}
  requires-python = ">=3.8"
  dependencies = [
  "openai==1.97.1",
- "PyYAML>=6.0"
+ "PyYAML>=6.0",
  ]
  keywords = ["nlp", "llm", "text-processing", "openai"]

texttools/batch/batch_manager.py
@@ -2,11 +2,16 @@ import json
  import uuid
  from pathlib import Path
  from typing import Any, Type
+ import logging

  from pydantic import BaseModel
  from openai import OpenAI
  from openai.lib._pydantic import to_strict_json_schema

+ # Configure logger
+ logger = logging.getLogger("batch_runner")
+ logger.setLevel(logging.INFO)
+

  class SimpleBatchManager:
  """
@@ -159,25 +164,9 @@ class SimpleBatchManager:
  info = self.client.batches.retrieve(job["id"])
  job = info.to_dict()
  self._save_state(job_name, [job])
- print("HERE is the job", job)
+ logger.info("Batch job status: %s", job)
  return job["status"]

- def _parsed(self, result: dict) -> list:
- """
- Parses the result dictionary, extracting the desired output or error for each item.
- Returns a list of dictionaries with 'id' and 'output' keys.
- """
- modified_result = []
-
- for key, d in result.items():
- if "desired_output" in d:
- new_dict = {"id": key, "output": d["desired_output"]}
- modified_result.append(new_dict)
- else:
- new_dict = {"id": key, "output": d["error"]}
- modified_result.append(new_dict)
- return modified_result
-
  def fetch_results(
  self, job_name: str, remove_cache: bool = True
  ) -> tuple[dict[str, str], list]:
@@ -198,7 +187,7 @@
  err_content = (
  self.client.files.content(error_file_id).read().decode("utf-8")
  )
- print("Error file content:", err_content)
+ logger.info("Error file content:", err_content)
  return {}

  content = self.client.files.content(out_file_id).read().decode("utf-8")
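The batch modules in 1.0.6 route their progress messages through `logging.getLogger("batch_runner")` instead of `print()`. The package only creates the logger and sets its level; it does not attach a handler, so a consuming script has to configure logging itself to see the output. A minimal sketch, assuming application-side `basicConfig` setup (not part of the package):

```python
import logging

# Assumed application-side setup: without a configured handler, the package's
# logger records fall through to the root logger's default handling.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
)

logger = logging.getLogger("batch_runner")
logger.info("Batch job status: %s", {"id": "batch_123", "status": "completed"})
```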
texttools/batch/batch_runner.py
@@ -4,23 +4,27 @@ import time
  from dataclasses import dataclass
  from pathlib import Path
  from typing import Any, Callable
+ import logging

+ from dotenv import load_dotenv
  from openai import OpenAI
  from pydantic import BaseModel

- from texttools.batch.batch_manager import SimpleBatchManager
+ from texttools.batch import SimpleBatchManager

+ # Configure logger
+ logger = logging.getLogger("batch_runner")
+ logger.setLevel(logging.INFO)

- class Output(BaseModel):
- output: str
+
+ class OutputModel(BaseModel):
+ desired_output: str


  def export_data(data):
  """
  Produces a structure of the following form from an initial data structure:
- [
- {"id": str, "content": str},...
- ]
+ [{"id": str, "text": str},...]
  """
  return data

@@ -50,19 +54,17 @@ class BatchConfig:
  BASE_OUTPUT_DIR: str = "Data/batch_entity_result"
  import_function: Callable = import_data
  export_function: Callable = export_data
+ poll_interval_seconds: int = 30
+ max_retries: int = 3


  class BatchJobRunner:
  """
- Orchestrates the execution of batched LLM processing jobs.
-
- Handles data loading, partitioning, job execution via SimpleBatchManager,
- and result saving. Manages the complete workflow from input data to processed outputs,
- including retries and progress tracking across multiple batch parts.
+ Handles running batch jobs using a batch manager and configuration.
  """

  def __init__(
- self, config: BatchConfig = BatchConfig(), output_model: type = Output
+ self, config: BatchConfig = BatchConfig(), output_model: type = OutputModel
  ):
  self.config = config
  self.system_prompt = config.system_prompt
@@ -76,8 +78,13 @@
  self.parts: list[list[dict[str, Any]]] = []
  self._partition_data()
  Path(self.config.BASE_OUTPUT_DIR).mkdir(parents=True, exist_ok=True)
+ # Map part index to job name
+ self.part_idx_to_job_name: dict[int, str] = {}
+ # Track retry attempts per part
+ self.part_attempts: dict[int, int] = {}

  def _init_manager(self) -> SimpleBatchManager:
+ load_dotenv()
  api_key = os.getenv("OPENAI_API_KEY")
  client = OpenAI(api_key=api_key)
  return SimpleBatchManager(
@@ -111,7 +118,7 @@
  prompt_length = len(self.system_prompt)
  total = total_length + (prompt_length * len(self.data))
  calculation = total / self.config.CHARS_PER_TOKEN
- print(
+ logger.info(
  f"Total chars: {total_length}, Prompt chars: {prompt_length}, Total: {total}, Tokens: {calculation}"
  )
  if calculation < self.config.MAX_TOTAL_TOKENS:
@@ -122,55 +129,99 @@
  self.data[i : i + self.config.MAX_BATCH_SIZE]
  for i in range(0, len(self.data), self.config.MAX_BATCH_SIZE)
  ]
- print(f"Data split into {len(self.parts)} part(s)")
+ logger.info(f"Data split into {len(self.parts)} part(s)")

- def run(self):
+ def _submit_all_jobs(self) -> None:
  for idx, part in enumerate(self.parts):
  if self._result_exists(idx):
- print(f"Skipping part {idx + 1}: result already exists.")
+ logger.info(f"Skipping part {idx + 1}: result already exists.")
  continue
  part_job_name = (
  f"{self.job_name}_part_{idx + 1}"
  if len(self.parts) > 1
  else self.job_name
  )
- print(
- f"\n--- Processing part {idx + 1}/{len(self.parts)}: {part_job_name} ---"
+ # If a job with this name already exists, register and skip submitting
+ existing_job = self.manager._load_state(part_job_name)
+ if existing_job:
+ logger.info(
+ f"Skipping part {idx + 1}: job already exists ({part_job_name})."
+ )
+ self.part_idx_to_job_name[idx] = part_job_name
+ self.part_attempts.setdefault(idx, 0)
+ continue
+
+ payload = part
+ logger.info(
+ f"Submitting job for part {idx + 1}/{len(self.parts)}: {part_job_name}"
  )
- self._process_part(part, part_job_name, idx)
+ self.manager.start(payload, job_name=part_job_name)
+ self.part_idx_to_job_name[idx] = part_job_name
+ self.part_attempts.setdefault(idx, 0)
+ # This is added for letting file get uploaded, before starting the next part.
+ logger.info("Uploading...")
+ time.sleep(30)

- def _process_part(
- self, part: list[dict[str, Any]], part_job_name: str, part_idx: int
- ):
- while True:
- print(f"Starting job for part: {part_job_name}")
- self.manager.start(part, job_name=part_job_name)
- print("Started batch job. Checking status...")
- while True:
- status = self.manager.check_status(job_name=part_job_name)
- print(f"Status: {status}")
+ def run(self):
+ # Submit all jobs up-front for concurrent execution
+ self._submit_all_jobs()
+ pending_parts: set[int] = set(self.part_idx_to_job_name.keys())
+ logger.info(f"Pending parts: {sorted(pending_parts)}")
+ # Polling loop
+ while pending_parts:
+ finished_this_round: list[int] = []
+ for part_idx in list(pending_parts):
+ job_name = self.part_idx_to_job_name[part_idx]
+ status = self.manager.check_status(job_name=job_name)
+ logger.info(f"Status for {job_name}: {status}")
  if status == "completed":
- print("Job completed. Fetching results...")
+ logger.info(
+ f"Job completed. Fetching results for part {part_idx + 1}..."
+ )
  output_data, log = self.manager.fetch_results(
- job_name=part_job_name, remove_cache=False
+ job_name=job_name, remove_cache=False
  )
  output_data = self.config.import_function(output_data)
  self._save_results(output_data, log, part_idx)
- print("Fetched and saved results for this part.")
- return
+ logger.info(f"Fetched and saved results for part {part_idx + 1}.")
+ finished_this_round.append(part_idx)
  elif status == "failed":
- print("Job failed. Clearing state, waiting, and retrying...")
- self.manager._clear_state(part_job_name)
- # Wait before retrying
- time.sleep(10)
- # Break inner loop to restart the job
- break
+ attempt = self.part_attempts.get(part_idx, 0) + 1
+ self.part_attempts[part_idx] = attempt
+ if attempt <= self.config.max_retries:
+ logger.info(
+ f"Job {job_name} failed (attempt {attempt}). Retrying after short backoff..."
+ )
+ self.manager._clear_state(job_name)
+ time.sleep(10)
+ payload = self._to_manager_payload(self.parts[part_idx])
+ new_job_name = (
+ f"{self.job_name}_part_{part_idx + 1}_retry_{attempt}"
+ )
+ self.manager.start(payload, job_name=new_job_name)
+ self.part_idx_to_job_name[part_idx] = new_job_name
+ else:
+ logger.info(
+ f"Job {job_name} failed after {attempt - 1} retries. Marking as failed."
+ )
+ finished_this_round.append(part_idx)
  else:
- # Wait before checking again
- time.sleep(5)
+ # Still running or queued
+ continue
+ # Remove finished parts
+ for part_idx in finished_this_round:
+ pending_parts.discard(part_idx)
+ if pending_parts:
+ logger.info(
+ f"Waiting {self.config.poll_interval_seconds}s before next status check for parts: {sorted(pending_parts)}"
+ )
+ time.sleep(self.config.poll_interval_seconds)

  def _save_results(
- self, output_data: list[dict[str, Any]], log: list[Any], part_idx: int
+ self,
+ output_data: list[dict[str, Any]] | dict[str, Any],
+ log: list[Any],
+ part_idx: int,
  ):
  part_suffix = f"_part_{part_idx + 1}" if len(self.parts) > 1 else ""
  result_path = (
@@ -178,7 +229,7 @@ class BatchJobRunner:
  / f"{Path(self.output_data_filename).stem}{part_suffix}.json"
  )
  if not output_data:
- print("No output data to save. Skipping this part.")
+ logger.info("No output data to save. Skipping this part.")
  return
  else:
  with open(result_path, "w", encoding="utf-8") as f:
@@ -195,13 +246,13 @@
  part_suffix = f"_part_{part_idx + 1}" if len(self.parts) > 1 else ""
  result_path = (
  Path(self.config.BASE_OUTPUT_DIR)
- / f"{Path(self.output_data_path).stem}{part_suffix}.json"
+ / f"{Path(self.output_data_filename).stem}{part_suffix}.json"
  )
  return result_path.exists()


  if __name__ == "__main__":
- print("=== Batch Job Runner ===")
+ logger.info("=== Batch Job Runner ===")
  config = BatchConfig(
  system_prompt="",
  job_name="job_name",
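With 1.0.6 the runner submits every part up front and then polls them together, governed by the new `poll_interval_seconds` and `max_retries` fields on `BatchConfig`. A rough usage sketch; the prompt text and job name are placeholders, any `BatchConfig` fields not visible in this diff are assumed to keep their defaults, and `OPENAI_API_KEY` is expected in the environment (or a `.env` file, since the runner now calls `load_dotenv()`):

```python
from texttools.batch.batch_runner import BatchConfig, BatchJobRunner

# Illustrative configuration; system_prompt and job_name are placeholders.
config = BatchConfig(
    system_prompt="Extract the named entities from the given text.",
    job_name="entity_extraction",
    poll_interval_seconds=60,  # new in 1.0.6: seconds between status sweeps
    max_retries=3,             # new in 1.0.6: retry budget per failed part
)

runner = BatchJobRunner(config=config)
# Submits all parts, then loops: completed parts are fetched and saved,
# failed parts are resubmitted until max_retries is exhausted.
runner.run()
```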
texttools/prompts/README.md
@@ -3,6 +3,8 @@
  ## Overview
  This folder contains YAML files for all prompts used in the project. Each file represents a separate prompt template, which can be loaded by tools or scripts that require structured prompts for AI models.

+ ---
+
  ## Structure
  - **prompt_file.yaml**: Each YAML file represents a single prompt template.
  - **main_template**: The main instruction template for the model.
@@ -24,6 +26,8 @@ analyze_template:
  Optional detailed analysis template.
  ```

+ ---
+
  ## Guidelines
  1. **Naming**: Use descriptive names for each YAML file corresponding to the tool or task it serves.
  2. **Placeholders**: Use `{input}` or other relevant placeholders to dynamically inject data.
texttools/prompts/keyword_extractor.yaml
@@ -2,12 +2,12 @@ main_template: |
  You are an expert keyword extractor.
  Extract the most relevant keywords from the given text.
  Guidelines:
- 1. Keywords must represent the main concepts of the text.
- 2. If two words have overlapping meanings, choose only one.
- 3. Do not include generic or unrelated words.
- 4. Keywords must be single, self-contained words (no phrases).
- 5. Output between 3 and 7 keywords based on the input length.
- 6. Respond only in JSON format:
+ - Keywords must represent the main concepts of the text.
+ - If two words have overlapping meanings, choose only one.
+ - Do not include generic or unrelated words.
+ - Keywords must be single, self-contained words (no phrases).
+ - Output between 3 and 7 keywords based on the input length.
+ - Respond only in JSON format:
  {{"result": ["keyword1", "keyword2", etc.]}}
  Here is the text:
  {input}
texttools/prompts/question_merger.yaml
@@ -5,11 +5,11 @@ main_template:
  I will give you a list of questions that are semantically similar.
  Your task is to merge them into one unified question.
  Guidelines:
- 1. Preserves all the information and intent from the original questions.
- 2. Sounds natural, fluent, and concise.
- 3. Avoids redundancy or unnecessary repetition.
- 4. Does not omit any unique idea from the originals.
- 5. Respond only in JSON format:
+ - Preserves all the information and intent from the original questions.
+ - Sounds natural, fluent, and concise.
+ - Avoids redundancy or unnecessary repetition.
+ - Does not omit any unique idea from the originals.
+ - Respond only in JSON format:
  {{"result": "string"}}
  Here is the questions:
  {input}
texttools/tools/async_the_tool.py
@@ -99,7 +99,7 @@ class AsyncTheTool:
  )
  return results

- async def detect_question(
+ async def is_question(
  self,
  question: str,
  output_lang: str | None = None,
@@ -111,7 +111,7 @@
  ) -> dict[str, bool]:
  results = await self.operator.run(
  question,
- prompt_file="question_detector.yaml",
+ prompt_file="is_question.yaml",
  output_model=OutputModels.BoolOutput,
  with_analysis=with_analysis,
  resp_format="parse",
@@ -123,7 +123,7 @@
  )
  return results

- async def generate_question_from_text(
+ async def text_to_question(
  self,
  text: str,
  output_lang: str | None = None,
@@ -135,7 +135,7 @@
  ) -> dict[str, str]:
  results = await self.operator.run(
  text,
- prompt_file="question_generator.yaml",
+ prompt_file="text_to_question.yaml",
  output_model=OutputModels.StrOutput,
  with_analysis=with_analysis,
  resp_format="parse",
@@ -202,7 +202,7 @@
  )
  return results

- async def generate_questions_from_subject(
+ async def subject_to_question(
  self,
  subject: str,
  number_of_questions: int,
@@ -215,7 +215,7 @@
  ) -> dict[str, list[str]]:
  results = await self.operator.run(
  subject,
- prompt_file="subject_question_generator.yaml",
+ prompt_file="subject_to_question.yaml",
  output_model=OutputModels.ReasonListStrOutput,
  with_analysis=with_analysis,
  resp_format="parse",
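The async surface mirrors the renames: `detect_question`, `generate_question_from_text`, and `generate_questions_from_subject` become `is_question`, `text_to_question`, and `subject_to_question`. A hedged sketch of calling the new names; the import path and constructor arguments are assumptions modeled on the synchronous `TheTool` example in the README:

```python
import asyncio

from openai import AsyncOpenAI
from texttools import AsyncTheTool  # import path assumed to mirror TheTool


async def main() -> None:
    client = AsyncOpenAI(api_key="...")
    # Constructor arguments assumed to match the synchronous TheTool example.
    tool = AsyncTheTool(client=client, model="gpt-4o-mini")

    # 1.0.6 method names
    detection = await tool.is_question("Is this project open source?")
    question = await tool.text_to_question("TextTools is built on top of modern LLMs.")
    print(detection["result"], question["result"])


asyncio.run(main())
```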
texttools/tools/internals/async_operator.py
@@ -3,7 +3,8 @@ from __future__ import annotations
  import json
  import math
  import re
- from typing import Any, Literal, Optional, TypeVar
+ from typing import Any, Literal, TypeVar
+ import logging

  from openai import AsyncOpenAI
  from pydantic import BaseModel
@@ -16,6 +17,10 @@ from texttools.tools.internals.prompt_loader import PromptLoader
  # Base Model type for output models
  T = TypeVar("T", bound=BaseModel)

+ # Configure logger
+ logger = logging.getLogger("async_operator")
+ logger.setLevel(logging.INFO)
+

  class AsyncOperator:
  """
@@ -190,6 +195,7 @@

  for choice in completion.choices:
  if not getattr(choice, "logprobs", None):
+ logger.info("No logprobs found.")
  continue

  for logprob_item in choice.logprobs.content:
@@ -237,11 +243,10 @@
  try:
  cleaned_text = input_text.strip()

- # FIXED: Correct parameter order for load
  prompt_configs = prompt_loader.load(
- prompt_file=prompt_file, # prompt_file
- text=cleaned_text, # text
- mode=mode if use_modes else "", # mode
+ prompt_file=prompt_file,
+ text=cleaned_text,
+ mode=mode if use_modes else "",
  **extra_kwargs,
  )

@@ -269,7 +274,7 @@
  output_model,
  logprobs,
  top_logprobs,
- max_tokens, # Pass max_tokens
+ max_tokens,
  )
  elif resp_format == "parse":
  parsed, completion = await self._parse_completion(
@@ -277,10 +282,16 @@
  output_model,
  logprobs,
  top_logprobs,
- max_tokens, # Pass max_tokens
+ max_tokens,
  )
  else:
- raise ValueError(f"Unknown resp_format: {resp_format}")
+ logger.error(f"Unknown resp_format: {resp_format}")
+
+ # Ensure output_model has a `result` field
+ if not hasattr(parsed, "result"):
+ logger.error(
+ "The provided output_model must define a field named 'result'"
+ )

  results = {"result": parsed.result}

@@ -293,5 +304,5 @@
  return results

  except Exception as e:
- print(f"[ERROR] Async operation failed: {e}")
- raise
+ logger.error(f"Async TheTool failed: {e}")
+ return {"Error": str(e), "result": ""}
texttools/tools/internals/operator.py
@@ -2,7 +2,7 @@ from __future__ import annotations

  import math
  import re
- from typing import Any, TypeVar, Type, Literal, Optional
+ from typing import Any, TypeVar, Type, Literal
  import json
  import logging

@@ -291,5 +291,5 @@
  return results

  except Exception as e:
- logger.error(f"Operation failed: {e}")
+ logger.error(f"TheTool failed: {e}")
  return {"Error": str(e), "result": ""}
texttools/tools/internals/prompt_loader.py
@@ -1,4 +1,4 @@
- from typing import Optional
+ from functools import lru_cache
  from pathlib import Path
  import yaml

@@ -7,10 +7,6 @@ class PromptLoader:
  """
  Utility for loading and formatting YAML prompt templates.

- Each YAML file under `prompts/` must define at least a `main_template`,
- and optionally an `analyze_template`. These can either be a single string
- or a dictionary keyed by mode names (if `use_modes=True`).
-
  Responsibilities:
  - Load and parse YAML prompt definitions.
  - Select the right template (by mode, if applicable).
@@ -22,31 +18,30 @@
  }
  """

+ def __init__(self):
+ self.base_dir = Path(__file__).parent.parent.parent / Path("prompts")
+
  MAIN_TEMPLATE: str = "main_template"
  ANALYZE_TEMPLATE: str = "analyze_template"

- def _load_templates(
- self,
- prompts_dir: str,
- prompt_file: str,
- mode: str | None,
- ) -> dict[str, str]:
- prompt_path = Path(__file__).parent.parent.parent / prompts_dir / prompt_file
+ # Use lru_cache to load each file once
+ @lru_cache(maxsize=32)
+ def _load_templates(self, prompt_file: str, mode: str | None) -> dict[str, str]:
+ prompt_path = self.base_dir / prompt_file

  if not prompt_path.exists():
  raise FileNotFoundError(f"Prompt file not found: {prompt_path}")

  try:
- # Load the data
  data = yaml.safe_load(prompt_path.read_text(encoding="utf-8"))
  except yaml.YAMLError as e:
  raise ValueError(f"Invalid YAML in {prompt_path}: {e}")

  return {
- "main_template": data[self.MAIN_TEMPLATE][mode]
+ self.MAIN_TEMPLATE: data[self.MAIN_TEMPLATE][mode]
  if mode
  else data[self.MAIN_TEMPLATE],
- "analyze_template": data.get(self.ANALYZE_TEMPLATE)[mode]
+ self.ANALYZE_TEMPLATE: data.get(self.ANALYZE_TEMPLATE)[mode]
  if mode
  else data.get(self.ANALYZE_TEMPLATE),
  }
@@ -59,14 +54,9 @@
  return format_args

  def load(
- self,
- prompt_file: str,
- text: str,
- mode: str,
- prompts_dir: str = "prompts",
- **extra_kwargs,
+ self, prompt_file: str, text: str, mode: str, **extra_kwargs
  ) -> dict[str, str]:
- template_configs = self._load_templates(prompts_dir, prompt_file, mode)
+ template_configs = self._load_templates(prompt_file, mode)
  format_args = self._build_format_args(text, **extra_kwargs)

  # Inject variables inside each template
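The loader now caches parsed YAML with `functools.lru_cache` keyed on the method arguments. A standalone sketch of the same pattern (a toy class, not the library's `PromptLoader`), mainly to illustrate the caching behaviour and its usual caveats:

```python
from __future__ import annotations

from functools import lru_cache
from pathlib import Path

import yaml


class TinyLoader:
    """Toy illustration of the caching approach PromptLoader adopts in 1.0.6."""

    def __init__(self, base_dir: str) -> None:
        self.base_dir = Path(base_dir)

    @lru_cache(maxsize=32)
    def _load_templates(self, prompt_file: str, mode: str | None) -> dict:
        # lru_cache on a method includes `self` in the cache key, so each
        # instance gets its own entries and stays referenced by the cache;
        # all arguments must be hashable for the lookup to work.
        return yaml.safe_load((self.base_dir / prompt_file).read_text(encoding="utf-8"))
```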
texttools/tools/the_tool.py
@@ -17,11 +17,11 @@ class TheTool:
  - categorize: assign a text to one of several Islamic categories.
  - extract_keywords: produce a keyword list from text.
  - extract_entities: simple NER (name/type pairs).
- - detect_question: binary check whether input is a question.
- - generate_question_from_text: produce a new question from a text.
+ - is_question: binary check whether input is a question.
+ - text_to_question: produce a new question from a text.
  - merge_questions: combine multiple questions (default/reason modes).
  - rewrite: rephrase questions (same meaning/different wording, or vice versa).
- - generate_questions_from_subject: generate multiple questions given a subject.
+ - subject_to_question: generate multiple questions given a subject.
  - summarize: produce a concise summary of a subject.
  - translate: translate text between languages.

@@ -174,7 +174,7 @@
  top_logprobs=self.top_logprobs if top_logprobs is None else top_logprobs,
  )

- def detect_question(
+ def is_question(
  self,
  text: str,
  model: str | None = None,
@@ -196,7 +196,7 @@
  """
  return self.operator.run(
  # Internal parameters
- prompt_file="question_detector.yaml",
+ prompt_file="is_question.yaml",
  output_model=OutputModels.BoolOutput,
  resp_format="parse",
  output_lang=False,
@@ -212,7 +212,7 @@
  top_logprobs=self.top_logprobs if top_logprobs is None else top_logprobs,
  )

- def generate_question_from_text(
+ def text_to_question(
  self,
  text: str,
  model: str | None = None,
@@ -235,7 +235,7 @@
  """
  return self.operator.run(
  # Internal parameters
- prompt_file="question_generator.yaml",
+ prompt_file="text_to_question.yaml",
  output_model=OutputModels.StrOutput,
  resp_format="parse",
  # User parameters
@@ -340,7 +340,7 @@
  top_logprobs=self.top_logprobs if top_logprobs is None else top_logprobs,
  )

- def generate_questions_from_subject(
+ def subject_to_question(
  self,
  text: str,
  number_of_questions: int,
@@ -366,7 +366,7 @@
  """
  return self.operator.run(
  # Internal parameters
- prompt_file="subject_question_generator.yaml",
+ prompt_file="subject_to_question.yaml",
  output_model=OutputModels.ReasonListStrOutput,
  resp_format="parse",
  # User parameters
@@ -463,14 +463,14 @@
  top_logprobs=self.top_logprobs if top_logprobs is None else top_logprobs,
  )

- def custom_tool(
+ def run_custom(
  self,
  prompt: str,
  output_model: Any,
  model: str | None = None,
  output_lang: str | None = None,
  temperature: float | None = None,
- logprobs: float | None = None,
+ logprobs: bool | None = None,
  top_logprobs: int | None = None,
  ) -> dict[str, Any]:
  """
@@ -485,7 +485,7 @@
  """
  return self.operator.run(
  # Internal parameters
- prompt_file="custom_tool.yaml",
+ prompt_file="run_custom.yaml",
  resp_format="parse",
  user_prompt=False,
  with_analysis=False,
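Taken together, the public surface in 1.0.6 is the renamed methods plus the corrected `logprobs: bool | None` annotation on `run_custom()`. A short end-to-end sketch using the new names; the import path and constructor follow the README example, while the prompt text, model name, and output model are placeholders:

```python
from openai import OpenAI
from pydantic import BaseModel

from texttools import TheTool  # import path assumed from the README example


class Pairs(BaseModel):
    result: list[dict[str, int]]  # output models are expected to expose a `result` field


client = OpenAI(api_key="...")
the_tool = TheTool(client=client, model="gpt-4o-mini")

# Renamed in 1.0.6 (was detect_question); logprobs is a boolean flag.
detection = the_tool.is_question("Is this project open source?", logprobs=True, top_logprobs=2)
print(detection["result"], detection.get("logprobs"))

# Renamed in 1.0.6 (was custom_tool).
custom = the_tool.run_custom("Count the words in: 'open source tools'.", Pairs)
print(custom["result"])
```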