PyPI - ml_approach_suggestion_agent - Versions diffs - 0.1.0__py3-none-any.whl → 0.1.3__py3-none-any.whl - Mend

ml_approach_suggestion_agent 0.1.0__py3-none-any.whl → 0.1.3__py3-none-any.whl

Potentially problematic release.

This version of ml_approach_suggestion_agent might be problematic.

Files changed (9)

ml_approach_suggestion_agent/agent.py CHANGED

@@ -17,7 +17,7 @@ class MLApproachDecisionAgent:
         self.config = config or MethodologyConfig()
         self.ai_handler = SFNAIHandler()
-    def suggest_approach(self, domain_name, domain_description, use_case, column_descriptions, column_insights, target_column_name, target_column_insights, max_try=1) -> Tuple[MethodologyRecommendation, Dict[str, Any]]:
+    def suggest_approach(self, domain_name, domain_description, use_case, column_insights, max_try=1) -> Tuple[MethodologyRecommendation, Dict[str, Any]]:
         """
         Suggests a machine learning approach based on the provided domain, use case, and column descriptions.
         Args:
@@ -36,7 +36,7 @@ class MLApproachDecisionAgent:
         """
-        system_prompt, user_prompt = format_approach_prompt(domain_name=domain_name, domain_description=domain_description, use_case=use_case, column_descriptions=column_descriptions, column_insights=column_insights, target_column_name=target_column_name, target_column_insights=target_column_insights)
+        system_prompt, user_prompt = format_approach_prompt(domain_name=domain_name, domain_description=domain_description, use_case=use_case, column_insights=column_insights)
         for _ in range(max_try):
             try:
                 response, cost_summary = self.ai_handler.route_to(
@@ -66,14 +66,11 @@ class MLApproachDecisionAgent:
     def execute_task(self, task_data: Dict[str, Any]) -> Dict[str, Any]:
         self.logger.info("Executing data quality assessment task.")
-        domain_name, domain_description, use_case, column_descriptions, column_insights, target_column_name, target_column_insights = (
+        domain_name, domain_description, use_case, column_insights = (
             task_data["domain_name"],
             task_data["domain_description"],
             task_data["use_case"],
-            task_data["column_descriptions"],
             task_data["column_insights"],
-            task_data["target_column_name"],
-            task_data["target_column_insights"],
         )
         # Suggest an approach
@@ -81,10 +78,7 @@ class MLApproachDecisionAgent:
             domain_name=domain_name,
             domain_description=domain_description,
             use_case=use_case,
-            column_descriptions=column_descriptions,
             column_insights=column_insights,
-            target_column_name=target_column_name,
-            target_column_insights=target_column_insights
         )
         if not result:
             return {
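
For callers upgrading from 0.1.0, the hunks above mean that `execute_task` now reads only four keys from `task_data`, and `suggest_approach` no longer accepts `column_descriptions`, `target_column_name`, or `target_column_insights`. The following is a minimal sketch of trimming an old-style payload, assuming the caller builds `task_data` as a plain dict. The helper name `to_v013_task_data` and the sample values are hypothetical and not part of the package.

```python
from typing import Any, Dict

# Keys that execute_task reads in 0.1.3, per the diff above.
REQUIRED_KEYS = ("domain_name", "domain_description", "use_case", "column_insights")

# Keys that 0.1.0 read and 0.1.3 no longer does.
REMOVED_KEYS = ("column_descriptions", "target_column_name", "target_column_insights")


def to_v013_task_data(old: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: keep only the keys the 0.1.3 agent reads."""
    missing = [key for key in REQUIRED_KEYS if key not in old]
    if missing:
        raise KeyError(f"task_data is missing required keys: {missing}")
    return {key: old[key] for key in REQUIRED_KEYS}


# Illustrative 0.1.0-style payload (values are made up).
old_task_data = {
    "domain_name": "Manufacturing",
    "domain_description": "Sensor readings from production machines",
    "use_case": "Predict whether a machine will fail",
    "column_descriptions": "machine_id, temperature, failed",
    "column_insights": "3 columns, 5000 rows",
    "target_column_name": "failed",
    "target_column_insights": "binary, 2 unique values",
}

new_task_data = to_v013_task_data(old_task_data)
print(sorted(new_task_data))
```

Extra keys are harmless to `execute_task`, which indexes `task_data` by name, but given the signature shown in the diff, passing the removed names as keyword arguments to `suggest_approach` would raise a `TypeError`.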

ml_approach_suggestion_agent/config.py CHANGED

@@ -13,7 +13,7 @@ class MethodologyConfig(BaseSettings):
     )
     methodology_ai_provider: str = Field(default="openai", description="AI provider to use")
-    methodology_ai_model: str = Field(default="gpt-4o-mini", description="AI model to use")
+    methodology_ai_model: str = Field(default="gpt-5-mini", description="AI model to use")
     methodology_temperature: float = Field(default=0.3, ge=0.0, le=0.5, description="AI model temperature")
     methodology_max_tokens: int = Field(default=4000, ge=100, le=8000, description="Maximum tokens for AI response")
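
The only change in this file is the default model name, so code that constructs the config without arguments will now request a different model. The sketch below uses a stdlib dataclass as a stand-in to show the new default, an explicit pin to the previous value, and the `ge`/`le` bounds that the unchanged fields declare. `MethodologyConfigSketch` is a hypothetical name; the real `MethodologyConfig` is a pydantic `BaseSettings` class whose remaining settings are not visible in this diff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodologyConfigSketch:
    """Stand-in for illustration only; not the package's class."""

    methodology_ai_provider: str = "openai"
    methodology_ai_model: str = "gpt-5-mini"  # default in 0.1.3 (was "gpt-4o-mini")
    methodology_temperature: float = 0.3
    methodology_max_tokens: int = 4000

    def __post_init__(self) -> None:
        # Mirror the ge/le bounds declared on the Field definitions above.
        if not 0.0 <= self.methodology_temperature <= 0.5:
            raise ValueError("methodology_temperature must be between 0.0 and 0.5")
        if not 100 <= self.methodology_max_tokens <= 8000:
            raise ValueError("methodology_max_tokens must be between 100 and 8000")


default_cfg = MethodologyConfigSketch()
# Pin the previous default explicitly instead of relying on the new one.
pinned_cfg = MethodologyConfigSketch(methodology_ai_model="gpt-4o-mini")
print(default_cfg.methodology_ai_model, pinned_cfg.methodology_ai_model)
```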

ml_approach_suggestion_agent/constants.py CHANGED

@@ -1,102 +1,84 @@
-METHODOLOGY_SELECTION_SYSTEM_PROMPT = """
-You are an ML methodology advisor. Your task is to analyze the user's problem and recommend the single most appropriate approach.
+METHODOLOGY_SELECTION_SYSTEM_PROMPT = """You are an ML methodology advisor. Analyze the problem and select ONE methodology: binary_classification, time_series_forecasting, or not_applicable.
-**Decision Framework:**
+**Simple Decision Rules:**
-1. **Understand the Business Goal:** Start with the `use_case_description` to grasp what the user is trying to achieve.
+1. **Binary Classification** - Choose when:
+   - Use case asks "predict whether", "will X happen", "classify if"
+   - Answer is YES/NO, TRUE/FALSE, or 1/0
+   - Example: "predict if machine fails", "detect fraud", "identify churn"
-2. **Examine the Target Variable:** Use `target_column_insights` to understand the nature of what needs to be predicted:
-   - Check `unique_count` to determine if it's binary (2 values), multiclass (>2 values), or continuous
-   - Check `data_type` to see if it's numerical (int/float) or categorical (str)
-   - Review `sample_values` to understand the actual values
+2. **Time Series Forecasting** - Choose when:
+   - Use case asks to "forecast", "predict future value", "estimate next"
+   - Answer is a NUMERICAL value in the FUTURE
+   - Example: "forecast next month sales", "predict tomorrow's temperature"
-3. **Check for Temporal Dependencies:** Look for these indicators in `column_insights`:
-   - Presence of timestamp/date columns with high unique_count (near row_count)
-   - Column descriptions mentioning "time", "date", "timestamp", "sequential"
-   - Use case description mentioning "over time", "forecast", "predict future", "trend", "sequential patterns"
-   - If temporal column exists AND the prediction depends on historical patterns, consider time series methods
-4. **Critical Time Series Distinction:**
-   - **Time Series Forecasting**: Target is NUMERICAL and goal is predicting FUTURE VALUES (e.g., "predict next month's sales", "forecast temperature")
-   - **Time Series Classification**: Target is CATEGORICAL (even if binary 0/1) and data is SEQUENTIAL (e.g., "classify failure from sensor patterns", "detect activity type from accelerometer sequence")
-   - **Binary/Multiclass Classification**: Target is categorical BUT data points are INDEPENDENT (no temporal ordering matters)
-5. **Assess Data Structure:** Review `column_insights`:
-   - If no target specified or target_column_name is "Not specified" → likely `not_applicable`
-   - If use case is purely computational/rule-based → `not_applicable`
-6. **Select ONE Methodology:**
-   - `binary_classification`: Target has exactly 2 unique values (categorical/binary) AND samples are independent (no temporal dependency)
-   - `multiclass_classification`: Target has >2 unique categories AND samples are independent
-   - `time_series_forecasting`: Target is NUMERICAL AND prediction involves FUTURE time periods based on historical patterns
-   - `time_series_classification`: Target is CATEGORICAL (binary or multiclass) AND data has TEMPORAL ORDERING where sequential patterns are critical for prediction
-   - `not_applicable`: No clear ML objective, purely rule-based problem, or insufficient information
-7. **Key Rules to Avoid Mistakes:**
-   - Binary target (0/1) with timestamp does NOT automatically mean binary_classification - check if SEQUENCE matters
-   - If use case mentions "based on sensor readings over time", "sequential patterns", "time-series data" → it's time_series_classification
-   - If use case is just "predict X" with no temporal context and independent samples → it's binary/multiclass_classification
-   - Presence of timestamp column alone doesn't mean time series - check if the PREDICTION depends on temporal patterns
-   - If target is numerical but goal is "classify into categories" → still classification, not regression
-8. **Justify Your Choice:**
-   - State the business goal clearly
-   - Identify the target variable type (binary/multiclass/numerical)
-   - Explain whether temporal dependencies exist and matter for prediction
-   - Connect these factors to show why the chosen methodology fits
+3. **Not Applicable** - Choose when:
+   - No prediction needed
+   - Just data analysis, reporting, or calculations
+   - Not enough information
+**Required Output:**
+1. Select the single best ML methodology from: binary_classification, time_series_forecasting, or not_applicable
+2. Provide a clear justification explaining:
+   - What you understand the business goal to be
+   - What type of prediction is needed (binary outcome, numerical forecast, or none)
+   - Whether temporal patterns are critical for this prediction
+   - Why the selected methodology is the best fit
+**Important:**
+- Having timestamps doesn't mean it's time series forecasting
+- Check WHAT is being predicted: binary outcome OR future number
+- The dataset may contain 1-4 tables - analyze all provided tables together"""
-"""
-METHODOLOGY_SELECTION_USER_PROMPT = """
-**Business Context:**
+METHODOLOGY_SELECTION_USER_PROMPT = """**Business Context:**
 Domain: {domain_name}
 {domain_description}
 **Use Case:**
 {use_case_description}
-**Data Overview:**
-Columns:
-{column_descriptions}
 Dataset Characteristics:
 {column_insights}
-**Target Information:**
-Target Column: {target_column_name}
-Target Details: {target_column_insights}
-**Required Output:**
-1. Select the single best ML methodology
-2. Provide a brief justification explaining why this methodology fits the problem
 """
-def format_approach_prompt(domain_name, domain_description, use_case, column_descriptions, column_insights, target_column_name, target_column_insights):
+def format_approach_prompt(
+    domain_name: str,
+    domain_description: str,
+    use_case: str,
+    column_insights: str
+) -> tuple[str, str]:
     """
+    Format the methodology selection prompts for the LLM.
     Args:
-    domain (str): The domain of the data.
-    use_case (str): The use case of the data.
-    column_descriptions (List[str]): A list of column descriptions.
-    column_insights (List[str]): A list of column insights.
+        domain_name: The domain of the data (e.g., "Healthcare", "Finance")
+        domain_description: Detailed description of the domain context
+        use_case: Description of what the user wants to achieve
+        column_descriptions: Description of the columns in the dataset
+        column_insights: Statistical insights about the columns (data types,
+                        unique counts, distributions, etc.)
     Returns:
-    Tuple[str, str]: The formatted system prompt and user prompt.
-    TODO:
-        - Change prompt write new prompt and involve the supported approaches in new prompt.
+        tuple[str, str]: The formatted system prompt and user prompt
+    Example:
+        system_prompt, user_prompt = format_approach_prompt(
+            domain_name="E-commerce",
+            domain_description="Online retail platform with customer transactions",
+            use_case="Predict if a customer will make a purchase",
+            column_descriptions="user_id, page_views, cart_additions, timestamp",
+            column_insights="4 columns, 10000 rows, mixed types"
+        )
     """
     user_prompt = METHODOLOGY_SELECTION_USER_PROMPT.format(
         domain_name=domain_name,
         domain_description=domain_description,
         use_case_description=use_case,
-        column_descriptions=column_descriptions,
-        column_insights=column_insights,
-        target_column_name=target_column_name,
-        target_column_insights=target_column_insights,
+        column_insights=column_insights
     )
-    return METHODOLOGY_SELECTION_SYSTEM_PROMPT, user_prompt
+    return METHODOLOGY_SELECTION_SYSTEM_PROMPT, user_prompt
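
Note that the new docstring still documents `column_descriptions` and passes it in its `Example`, although the 0.1.3 signature no longer has that parameter. The sketch below reproduces the new function with shortened stand-ins for the two prompt constants (the names `SYSTEM_PROMPT` and `USER_PROMPT` are abbreviations for illustration) and shows that the four-argument call works while the call from the docstring fails.

```python
# Shortened stand-ins for the two prompt constants in the diff above.
SYSTEM_PROMPT = "You are an ML methodology advisor."
USER_PROMPT = (
    "**Business Context:**\n"
    "Domain: {domain_name}\n"
    "{domain_description}\n"
    "**Use Case:**\n"
    "{use_case_description}\n"
    "Dataset Characteristics:\n"
    "{column_insights}\n"
)


def format_approach_prompt(
    domain_name: str,
    domain_description: str,
    use_case: str,
    column_insights: str,
) -> tuple[str, str]:
    """Same parameter list as the 0.1.3 function shown in the diff."""
    user_prompt = USER_PROMPT.format(
        domain_name=domain_name,
        domain_description=domain_description,
        use_case_description=use_case,
        column_insights=column_insights,
    )
    return SYSTEM_PROMPT, user_prompt


# The call that matches the 0.1.3 signature.
system_prompt, user_prompt = format_approach_prompt(
    domain_name="E-commerce",
    domain_description="Online retail platform with customer transactions",
    use_case="Predict if a customer will make a purchase",
    column_insights="4 columns, 10000 rows, mixed types",
)

# The call as written in the new docstring, which still passes column_descriptions.
docstring_call_error = None
try:
    format_approach_prompt(
        domain_name="E-commerce",
        domain_description="Online retail platform with customer transactions",
        use_case="Predict if a customer will make a purchase",
        column_descriptions="user_id, page_views, cart_additions, timestamp",
        column_insights="4 columns, 10000 rows, mixed types",
    )
except TypeError as exc:
    docstring_call_error = str(exc)

print(docstring_call_error)
```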

ml_approach_suggestion_agent/models.py CHANGED

@@ -2,6 +2,6 @@ from pydantic import BaseModel, Field
 from typing import Literal
 class MethodologyRecommendation(BaseModel):
-    selected_methodology: Literal["binary_classification", "multiclass_classification", "time_series_forecasting", "time_series_classification", "not_applicable" ] = Field(..., description="The most appropriate ML approach for this problem")
+    selected_methodology: Literal[ "binary_classification", "time_series_forecasting", "not_applicable"] = Field(..., description="The most appropriate ML approach for this problem")
-    justification: str = Field(..., description="Clear explanation connecting the business goal and target variable to the chosen methodology")
+    justification: str = Field( ..., description="Clear explanation connecting the business goal and data characteristics to the chosen methodology")
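
The narrowed `Literal` means that `multiclass_classification` and `time_series_classification` are no longer valid values of `selected_methodology`, so downstream code that branches on them will never see those branches taken. A minimal stdlib sketch of checking a value against the 0.1.3 set follows. The alias `Methodology`, the set `REMOVED_IN_013`, and the helper `is_supported` are hypothetical names introduced for illustration.

```python
from typing import Literal, get_args

# The allowed values in 0.1.3, copied from the diff above.
Methodology = Literal["binary_classification", "time_series_forecasting", "not_applicable"]

# Values accepted by 0.1.0 that 0.1.3 drops.
REMOVED_IN_013 = {"multiclass_classification", "time_series_classification"}


def is_supported(value: str) -> bool:
    """Return True if value is one of the 0.1.3 methodology names."""
    return value in get_args(Methodology)


for name in ("binary_classification", "multiclass_classification"):
    print(name, is_supported(name))
```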

{ml_approach_suggestion_agent-0.1.0.dist-info → ml_approach_suggestion_agent-0.1.3.dist-info}/METADATA RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: ml_approach_suggestion_agent
-Version: 0.1.0
+Version: 0.1.3
 Summary: Add your description here
 License-Expression: MIT
 Classifier: Programming Language :: Python :: 3
@@ -12,7 +12,7 @@ Classifier: Operating System :: OS Independent
 Requires-Python: >=3.11
 Description-Content-Type: text/markdown
 Requires-Dist: pydantic-settings
-Requires-Dist: sfn-blueprint>=0.6.15
+Requires-Dist: sfn-blueprint>=0.6.16
 Provides-Extra: dev
 Requires-Dist: pytest; extra == "dev"
 Requires-Dist: pytest-mock; extra == "dev"

ml_approach_suggestion_agent-0.1.3.dist-info/RECORD ADDED

@@ -0,0 +1,9 @@
+ml_approach_suggestion_agent/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
+ml_approach_suggestion_agent/agent.py,sha256=HdISZ15vU5K7pzHXJpncrCWpgPvEGJLn7_1jTt7Zeuw,5010
+ml_approach_suggestion_agent/config.py,sha256=19zO13zQ2Fmf_0wKqOfc2KfM-WD6EM4YPJLjRrG_jTY,703
+ml_approach_suggestion_agent/constants.py,sha256=EDLzMHXM8lwxb-lVLjkVElCJYChPntXW06WfCQHglJ8,3139
+ml_approach_suggestion_agent/models.py,sha256=cfZbMZPMAcNxeWTuIb_TKmNSr0C99Z4ZMicYPxlScyA,448
+ml_approach_suggestion_agent-0.1.3.dist-info/METADATA,sha256=tdnDGRvv6_VEkgkvLKTe6XcV_qfLiKNB2IE5LOdqD4c,8451
+ml_approach_suggestion_agent-0.1.3.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+ml_approach_suggestion_agent-0.1.3.dist-info/top_level.txt,sha256=3-KHls6umFXtNFJoP7OFCLvb4zd12AWH71PVKNd5Aok,29
+ml_approach_suggestion_agent-0.1.3.dist-info/RECORD,,
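
Each RECORD line has the form `path,sha256=<digest>,<size in bytes>`, where the digest is the URL-safe base64 encoding of the SHA-256 hash with trailing `=` padding removed. The sketch below rebuilds such a line from file contents so that an unpacked wheel can be checked against the entries above. The function name `record_entry` is hypothetical.

```python
import base64
import hashlib


def record_entry(path: str, data: bytes) -> str:
    """Build a RECORD-style line for a file's path and raw contents."""
    digest = hashlib.sha256(data).digest()
    # RECORD uses URL-safe base64 with the trailing '=' padding stripped.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(data)}"


# __init__.py is listed with size 0, so its contents are the empty byte string.
empty_init = record_entry("ml_approach_suggestion_agent/__init__.py", b"")
print(empty_init)
```

For the empty `__init__.py`, this reproduces the first RECORD entry listed above, which is why that line is identical in 0.1.0 and 0.1.3.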

ml_approach_suggestion_agent-0.1.0.dist-info/RECORD DELETED

@@ -1,9 +0,0 @@
-ml_approach_suggestion_agent/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
-ml_approach_suggestion_agent/agent.py,sha256=4FgDbiFXY1BRQN3hToif2Y-uaceHPdPrwyG3GEVRWMU,5569
-ml_approach_suggestion_agent/config.py,sha256=YOAXgdzgY72yB0mNEt9J2w1bdavx-VSCANBJ_4YR4Bo,704
-ml_approach_suggestion_agent/constants.py,sha256=znbt7x6xAxJL4eAZXcr8njerD84c8QuKK2ZlsFhsloo,5011
-ml_approach_suggestion_agent/models.py,sha256=i4vxGvA9vB3JOAh8IySMfcyAEQ7CWQe9ArgG7Rqo_cU,501
-ml_approach_suggestion_agent-0.1.0.dist-info/METADATA,sha256=qha4KlPfYbnoPlkCSJM_1L0COCFeh0CxBC_sr2WLYU0,8451
-ml_approach_suggestion_agent-0.1.0.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
-ml_approach_suggestion_agent-0.1.0.dist-info/top_level.txt,sha256=3-KHls6umFXtNFJoP7OFCLvb4zd12AWH71PVKNd5Aok,29
-ml_approach_suggestion_agent-0.1.0.dist-info/RECORD,,

{ml_approach_suggestion_agent-0.1.0.dist-info → ml_approach_suggestion_agent-0.1.3.dist-info}/WHEEL RENAMED

File without changes

{ml_approach_suggestion_agent-0.1.0.dist-info → ml_approach_suggestion_agent-0.1.3.dist-info}/top_level.txt RENAMED

File without changes
