ai-data-science-team 0.0.0.9011__tar.gz → 0.0.0.9013__tar.gz
Sign up to get free protection for your applications and to get access to all the features.
- {ai_data_science_team-0.0.0.9011/ai_data_science_team.egg-info → ai_data_science_team-0.0.0.9013}/PKG-INFO +24 -6
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/README.md +16 -5
- ai_data_science_team-0.0.0.9013/ai_data_science_team/_version.py +1 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team/agents/data_loader_tools_agent.py +11 -0
- ai_data_science_team-0.0.0.9013/ai_data_science_team/ds_agents/__init__.py +1 -0
- ai_data_science_team-0.0.0.9013/ai_data_science_team/ds_agents/eda_tools_agent.py +258 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team/ml_agents/mlflow_tools_agent.py +10 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team/tools/dataframe.py +6 -1
- ai_data_science_team-0.0.0.9013/ai_data_science_team/tools/eda.py +352 -0
- ai_data_science_team-0.0.0.9013/ai_data_science_team/utils/__init__.py +0 -0
- ai_data_science_team-0.0.0.9013/ai_data_science_team/utils/html.py +27 -0
- ai_data_science_team-0.0.0.9013/ai_data_science_team/utils/matplotlib.py +46 -0
- ai_data_science_team-0.0.0.9013/ai_data_science_team/utils/messages.py +27 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013/ai_data_science_team.egg-info}/PKG-INFO +24 -6
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team.egg-info/SOURCES.txt +7 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team.egg-info/requires.txt +8 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/setup.py +2 -1
- ai_data_science_team-0.0.0.9011/ai_data_science_team/_version.py +0 -1
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/LICENSE +0 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/MANIFEST.in +0 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team/__init__.py +0 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team/agents/__init__.py +0 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team/agents/data_cleaning_agent.py +0 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team/agents/data_visualization_agent.py +0 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team/agents/data_wrangling_agent.py +0 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team/agents/feature_engineering_agent.py +0 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team/agents/sql_database_agent.py +0 -0
- /ai_data_science_team-0.0.0.9011/ai_data_science_team/ml_agents/h2o_ml_tools_agent.py → /ai_data_science_team-0.0.0.9013/ai_data_science_team/ds_agents/modeling_tools_agent.py +0 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team/ml_agents/__init__.py +0 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team/ml_agents/h2o_ml_agent.py +0 -0
- /ai_data_science_team-0.0.0.9011/ai_data_science_team/parsers/__init__.py → /ai_data_science_team-0.0.0.9013/ai_data_science_team/ml_agents/h2o_ml_tools_agent.py +0 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team/multiagents/__init__.py +0 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team/multiagents/sql_data_analyst.py +0 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team/multiagents/supervised_data_analyst.py +0 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team/orchestration.py +0 -0
- {ai_data_science_team-0.0.0.9011/ai_data_science_team/tools → ai_data_science_team-0.0.0.9013/ai_data_science_team/parsers}/__init__.py +0 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team/parsers/parsers.py +0 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team/templates/__init__.py +0 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team/templates/agent_templates.py +0 -0
- {ai_data_science_team-0.0.0.9011/ai_data_science_team/utils → ai_data_science_team-0.0.0.9013/ai_data_science_team/tools}/__init__.py +0 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team/tools/data_loader.py +0 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team/tools/h2o.py +0 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team/tools/mlflow.py +0 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team/tools/sql.py +0 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team/utils/logging.py +0 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team/utils/plotly.py +0 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team/utils/regex.py +0 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team.egg-info/dependency_links.txt +0 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team.egg-info/top_level.txt +0 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/requirements.txt +0 -0
- {ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/setup.cfg +0 -0
@@ -1,6 +1,6 @@
|
|
1
1
|
Metadata-Version: 2.2
|
2
2
|
Name: ai-data-science-team
|
3
|
-
Version: 0.0.0.
|
3
|
+
Version: 0.0.0.9013
|
4
4
|
Summary: Build and run an AI-powered data science team.
|
5
5
|
Home-page: https://github.com/business-science/ai-data-science-team
|
6
6
|
Author: Matt Dancho
|
@@ -31,9 +31,16 @@ Requires-Dist: psutil
|
|
31
31
|
Provides-Extra: machine-learning
|
32
32
|
Requires-Dist: h2o; extra == "machine-learning"
|
33
33
|
Requires-Dist: mlflow; extra == "machine-learning"
|
34
|
+
Provides-Extra: data-science
|
35
|
+
Requires-Dist: pytimetk; extra == "data-science"
|
36
|
+
Requires-Dist: missingno; extra == "data-science"
|
37
|
+
Requires-Dist: sweetviz; extra == "data-science"
|
34
38
|
Provides-Extra: all
|
35
39
|
Requires-Dist: h2o; extra == "all"
|
36
40
|
Requires-Dist: mlflow; extra == "all"
|
41
|
+
Requires-Dist: pytimetk; extra == "all"
|
42
|
+
Requires-Dist: missingno; extra == "all"
|
43
|
+
Requires-Dist: sweetviz; extra == "all"
|
37
44
|
Dynamic: author
|
38
45
|
Dynamic: author-email
|
39
46
|
Dynamic: classifier
|
@@ -59,6 +66,8 @@ Dynamic: summary
|
|
59
66
|
<a href="https://pypi.python.org/pypi/ai-data-science-team"><img src="https://img.shields.io/pypi/v/ai-data-science-team.svg?style=for-the-badge" alt="PyPI"></a>
|
60
67
|
<a href="https://github.com/business-science/ai-data-science-team"><img src="https://img.shields.io/pypi/pyversions/ai-data-science-team.svg?style=for-the-badge" alt="versions"></a>
|
61
68
|
<a href="https://github.com/business-science/ai-data-science-team/blob/main/LICENSE"><img src="https://img.shields.io/github/license/business-science/ai-data-science-team.svg?style=for-the-badge" alt="license"></a>
|
69
|
+
<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/business-science/ai-data-science-team?style=for-the-badge">
|
70
|
+
|
62
71
|
</div>
|
63
72
|
|
64
73
|
|
@@ -93,8 +102,9 @@ The AI Data Science Team of Copilots includes Agents that specialize data cleani
|
|
93
102
|
- [Apps Available Now](#apps-available-now)
|
94
103
|
- [🔥 Agentic Applications](#-agentic-applications)
|
95
104
|
- [Agents Available Now](#agents-available-now)
|
96
|
-
- [Agents](#agents)
|
105
|
+
- [Standard Agents](#standard-agents)
|
97
106
|
- [🔥🔥 NEW! Machine Learning Agents](#-new-machine-learning-agents)
|
107
|
+
- [🔥 NEW! Data Science Agents](#-new-data-science-agents)
|
98
108
|
- [Multi-Agents](#multi-agents)
|
99
109
|
- [Agents Coming Soon](#agents-coming-soon)
|
100
110
|
- [Disclaimer](#disclaimer)
|
@@ -122,7 +132,7 @@ If you're an aspiring data scientist who wants to learn how to build AI Agents a
|
|
122
132
|
|
123
133
|
This project is a work in progress. New data science agents will be released soon.
|
124
134
|
|
125
|
-

|
126
136
|
|
127
137
|
### NEW: Multi-Agents
|
128
138
|
|
@@ -142,18 +152,22 @@ This is a top secret project I'm working on. It's a multi-agent data science app
|
|
142
152
|
|
143
153
|
#### 🔥 Agentic Applications
|
144
154
|
|
145
|
-
1. **
|
155
|
+
1. **NEW Exploratory Data Copilot**: An AI-powered data science app that performs automated exploratory data analysis (EDA) with EDA Reporting, Missing Data Analysis, Correlation Analysis, and more. [See Application](/apps/exploratory-copilot-app/)
|
156
|
+
|
157
|
+

|
158
|
+
|
159
|
+
2. **SQL Database Agent App:** Connects any SQL Database, generates SQL queries from natural language, and returns data as a downloadable table. [See Application](/apps/sql-database-agent-app/)
|
146
160
|
|
147
161
|
### Agents Available Now
|
148
162
|
|
149
|
-
#### Agents
|
163
|
+
#### Standard Agents
|
150
164
|
|
151
165
|
1. **Data Wrangling Agent:** Merges, Joins, Preps and Wrangles data into a format that is ready for data analysis. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_wrangling_agent.ipynb)
|
152
166
|
2. **Data Visualization Agent:** Creates visualizations to help you understand your data. Returns JSON serializable plotly visualizations. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_visualization_agent.ipynb)
|
153
167
|
3. **🔥 Data Cleaning Agent:** Performs Data Preparation steps including handling missing values, outliers, and data type conversions. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_cleaning_agent.ipynb)
|
154
168
|
4. **Feature Engineering Agent:** Converts the prepared data into ML-ready data. Adds features to increase predictive accuracy of ML models. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/feature_engineering_agent.ipynb)
|
155
169
|
5. **🔥 SQL Database Agent:** Connects to SQL databases to pull data into the data science environment. Creates pipelines to automate data extraction. Performs Joins, Aggregations, and other SQL Query operations. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/sql_database_agent.ipynb)
|
156
|
-
6.
|
170
|
+
6. **🔥 Data Loader Tools Agent:** Loads data from various sources including CSV, Excel, Parquet, and Pickle files. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_loader_tools_agent.ipynb)
|
157
171
|
|
158
172
|
|
159
173
|
#### 🔥🔥 NEW! Machine Learning Agents
|
@@ -161,6 +175,10 @@ This is a top secret project I'm working on. It's a multi-agent data science app
|
|
161
175
|
1. **🔥 H2O Machine Learning Agent:** Builds and logs 100's of high-performance machine learning models. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/ml_agents/h2o_machine_learning_agent.ipynb)
|
162
176
|
2. **🔥 MLflow Tools Agent (MLOps):** This agent has 11+ tools for managing models, ML projects, and making production ML predictions with MLflow. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/ml_agents/mlflow_tools_agent.ipynb)
|
163
177
|
|
178
|
+
#### 🔥 NEW! Data Science Agents
|
179
|
+
|
180
|
+
1. **🔥🔥 EDA Tools Agent:** Performs automated exploratory data analysis (EDA) with EDA Reporting, Missing Data Analysis, Correlation Analysis, and more. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/ds_agents/eda_tools_agent.ipynb)
|
181
|
+
|
164
182
|
|
165
183
|
#### Multi-Agents
|
166
184
|
|
@@ -12,6 +12,8 @@
|
|
12
12
|
<a href="https://pypi.python.org/pypi/ai-data-science-team"><img src="https://img.shields.io/pypi/v/ai-data-science-team.svg?style=for-the-badge" alt="PyPI"></a>
|
13
13
|
<a href="https://github.com/business-science/ai-data-science-team"><img src="https://img.shields.io/pypi/pyversions/ai-data-science-team.svg?style=for-the-badge" alt="versions"></a>
|
14
14
|
<a href="https://github.com/business-science/ai-data-science-team/blob/main/LICENSE"><img src="https://img.shields.io/github/license/business-science/ai-data-science-team.svg?style=for-the-badge" alt="license"></a>
|
15
|
+
<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/business-science/ai-data-science-team?style=for-the-badge">
|
16
|
+
|
15
17
|
</div>
|
16
18
|
|
17
19
|
|
@@ -46,8 +48,9 @@ The AI Data Science Team of Copilots includes Agents that specialize data cleani
|
|
46
48
|
- [Apps Available Now](#apps-available-now)
|
47
49
|
- [🔥 Agentic Applications](#-agentic-applications)
|
48
50
|
- [Agents Available Now](#agents-available-now)
|
49
|
-
- [Agents](#agents)
|
51
|
+
- [Standard Agents](#standard-agents)
|
50
52
|
- [🔥🔥 NEW! Machine Learning Agents](#-new-machine-learning-agents)
|
53
|
+
- [🔥 NEW! Data Science Agents](#-new-data-science-agents)
|
51
54
|
- [Multi-Agents](#multi-agents)
|
52
55
|
- [Agents Coming Soon](#agents-coming-soon)
|
53
56
|
- [Disclaimer](#disclaimer)
|
@@ -75,7 +78,7 @@ If you're an aspiring data scientist who wants to learn how to build AI Agents a
|
|
75
78
|
|
76
79
|
This project is a work in progress. New data science agents will be released soon.
|
77
80
|
|
78
|
-

|
79
82
|
|
80
83
|
### NEW: Multi-Agents
|
81
84
|
|
@@ -95,18 +98,22 @@ This is a top secret project I'm working on. It's a multi-agent data science app
|
|
95
98
|
|
96
99
|
#### 🔥 Agentic Applications
|
97
100
|
|
98
|
-
1. **
|
101
|
+
1. **NEW Exploratory Data Copilot**: An AI-powered data science app that performs automated exploratory data analysis (EDA) with EDA Reporting, Missing Data Analysis, Correlation Analysis, and more. [See Application](/apps/exploratory-copilot-app/)
|
102
|
+
|
103
|
+

|
104
|
+
|
105
|
+
2. **SQL Database Agent App:** Connects any SQL Database, generates SQL queries from natural language, and returns data as a downloadable table. [See Application](/apps/sql-database-agent-app/)
|
99
106
|
|
100
107
|
### Agents Available Now
|
101
108
|
|
102
|
-
#### Agents
|
109
|
+
#### Standard Agents
|
103
110
|
|
104
111
|
1. **Data Wrangling Agent:** Merges, Joins, Preps and Wrangles data into a format that is ready for data analysis. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_wrangling_agent.ipynb)
|
105
112
|
2. **Data Visualization Agent:** Creates visualizations to help you understand your data. Returns JSON serializable plotly visualizations. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_visualization_agent.ipynb)
|
106
113
|
3. **🔥 Data Cleaning Agent:** Performs Data Preparation steps including handling missing values, outliers, and data type conversions. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_cleaning_agent.ipynb)
|
107
114
|
4. **Feature Engineering Agent:** Converts the prepared data into ML-ready data. Adds features to increase predictive accuracy of ML models. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/feature_engineering_agent.ipynb)
|
108
115
|
5. **🔥 SQL Database Agent:** Connects to SQL databases to pull data into the data science environment. Creates pipelines to automate data extraction. Performs Joins, Aggregations, and other SQL Query operations. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/sql_database_agent.ipynb)
|
109
|
-
6.
|
116
|
+
6. **🔥 Data Loader Tools Agent:** Loads data from various sources including CSV, Excel, Parquet, and Pickle files. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_loader_tools_agent.ipynb)
|
110
117
|
|
111
118
|
|
112
119
|
#### 🔥🔥 NEW! Machine Learning Agents
|
@@ -114,6 +121,10 @@ This is a top secret project I'm working on. It's a multi-agent data science app
|
|
114
121
|
1. **🔥 H2O Machine Learning Agent:** Builds and logs 100's of high-performance machine learning models. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/ml_agents/h2o_machine_learning_agent.ipynb)
|
115
122
|
2. **🔥 MLflow Tools Agent (MLOps):** This agent has 11+ tools for managing models, ML projects, and making production ML predictions with MLflow. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/ml_agents/mlflow_tools_agent.ipynb)
|
116
123
|
|
124
|
+
#### 🔥 NEW! Data Science Agents
|
125
|
+
|
126
|
+
1. **🔥🔥 EDA Tools Agent:** Performs automated exploratory data analysis (EDA) with EDA Reporting, Missing Data Analysis, Correlation Analysis, and more. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/ds_agents/eda_tools_agent.ipynb)
|
127
|
+
|
117
128
|
|
118
129
|
#### Multi-Agents
|
119
130
|
|
@@ -0,0 +1 @@
|
|
1
|
+
__version__ = "0.0.0.9013"
|
@@ -25,6 +25,7 @@ from ai_data_science_team.tools.data_loader import (
|
|
25
25
|
get_file_info,
|
26
26
|
search_files_by_pattern,
|
27
27
|
)
|
28
|
+
from ai_data_science_team.utils.messages import get_tool_call_names
|
28
29
|
|
29
30
|
AGENT_NAME = "data_loader_tools_agent"
|
30
31
|
|
@@ -174,6 +175,12 @@ class DataLoaderToolsAgent(BaseAgent):
|
|
174
175
|
return Markdown(self.response["messages"][0].content)
|
175
176
|
else:
|
176
177
|
return self.response["messages"][0].content
|
178
|
+
|
179
|
+
def get_tool_calls(self):
|
180
|
+
"""
|
181
|
+
Returns the tool calls made by the agent.
|
182
|
+
"""
|
183
|
+
return self.response["tool_calls"]
|
177
184
|
|
178
185
|
|
179
186
|
|
@@ -204,6 +211,7 @@ def make_data_loader_tools_agent(
|
|
204
211
|
internal_messages: Annotated[Sequence[BaseMessage], operator.add]
|
205
212
|
user_instructions: str
|
206
213
|
data_loader_artifacts: dict
|
214
|
+
tool_calls: List[str]
|
207
215
|
|
208
216
|
def data_loader_agent(state):
|
209
217
|
|
@@ -253,10 +261,13 @@ def make_data_loader_tools_agent(
|
|
253
261
|
elif isinstance(last_message, dict) and "artifact" in last_message:
|
254
262
|
last_tool_artifact = last_message["artifact"]
|
255
263
|
|
264
|
+
tool_calls = get_tool_call_names(internal_messages)
|
265
|
+
|
256
266
|
return {
|
257
267
|
"messages": [last_ai_message],
|
258
268
|
"internal_messages": internal_messages,
|
259
269
|
"data_loader_artifacts": last_tool_artifact,
|
270
|
+
"tool_calls": tool_calls,
|
260
271
|
}
|
261
272
|
|
262
273
|
workflow = StateGraph(GraphState)
|
@@ -0,0 +1 @@
|
|
1
|
+
from ai_data_science_team.ds_agents.eda_tools_agent import EDAToolsAgent, make_eda_tools_agent
|
@@ -0,0 +1,258 @@
|
|
1
|
+
|
2
|
+
|
3
|
+
from typing import Any, Optional, Annotated, Sequence, List, Dict, Tuple
|
4
|
+
import operator
|
5
|
+
import pandas as pd
|
6
|
+
import os
|
7
|
+
from io import StringIO, BytesIO
|
8
|
+
import base64
|
9
|
+
import matplotlib.pyplot as plt
|
10
|
+
|
11
|
+
from IPython.display import Markdown
|
12
|
+
|
13
|
+
from langchain_core.messages import BaseMessage, AIMessage
|
14
|
+
from langgraph.prebuilt import create_react_agent, ToolNode
|
15
|
+
from langgraph.prebuilt.chat_agent_executor import AgentState
|
16
|
+
from langgraph.graph import START, END, StateGraph
|
17
|
+
|
18
|
+
from ai_data_science_team.templates import BaseAgent
|
19
|
+
from ai_data_science_team.utils.regex import format_agent_name
|
20
|
+
|
21
|
+
from ai_data_science_team.tools.eda import (
|
22
|
+
explain_data,
|
23
|
+
describe_dataset,
|
24
|
+
visualize_missing,
|
25
|
+
correlation_funnel,
|
26
|
+
generate_sweetviz_report,
|
27
|
+
)
|
28
|
+
from ai_data_science_team.utils.messages import get_tool_call_names
|
29
|
+
|
30
|
+
|
31
|
+
AGENT_NAME = "exploratory_data_analyst_agent"
|
32
|
+
|
33
|
+
# Updated tool list for EDA
|
34
|
+
EDA_TOOLS = [
|
35
|
+
explain_data,
|
36
|
+
describe_dataset,
|
37
|
+
visualize_missing,
|
38
|
+
correlation_funnel,
|
39
|
+
generate_sweetviz_report,
|
40
|
+
]
|
41
|
+
|
42
|
+
class EDAToolsAgent(BaseAgent):
|
43
|
+
"""
|
44
|
+
An Exploratory Data Analysis Tools Agent that interacts with EDA tools to generate summary statistics,
|
45
|
+
missing data visualizations, correlation funnels, EDA reports, etc.
|
46
|
+
|
47
|
+
Parameters:
|
48
|
+
----------
|
49
|
+
model : langchain.llms.base.LLM
|
50
|
+
The language model for generating the tool-calling agent.
|
51
|
+
create_react_agent_kwargs : dict
|
52
|
+
Additional kwargs for create_react_agent.
|
53
|
+
invoke_react_agent_kwargs : dict
|
54
|
+
Additional kwargs for agent invocation.
|
55
|
+
"""
|
56
|
+
|
57
|
+
def __init__(
|
58
|
+
self,
|
59
|
+
model: Any,
|
60
|
+
create_react_agent_kwargs: Optional[Dict] = {},
|
61
|
+
invoke_react_agent_kwargs: Optional[Dict] = {},
|
62
|
+
):
|
63
|
+
self._params = {
|
64
|
+
"model": model,
|
65
|
+
"create_react_agent_kwargs": create_react_agent_kwargs,
|
66
|
+
"invoke_react_agent_kwargs": invoke_react_agent_kwargs,
|
67
|
+
}
|
68
|
+
self._compiled_graph = self._make_compiled_graph()
|
69
|
+
self.response = None
|
70
|
+
|
71
|
+
def _make_compiled_graph(self):
|
72
|
+
"""
|
73
|
+
Creates the compiled state graph for the EDA agent.
|
74
|
+
"""
|
75
|
+
self.response = None
|
76
|
+
return make_eda_tools_agent(**self._params)
|
77
|
+
|
78
|
+
def update_params(self, **kwargs):
|
79
|
+
"""
|
80
|
+
Updates the agent's parameters and rebuilds the compiled graph.
|
81
|
+
"""
|
82
|
+
for k, v in kwargs.items():
|
83
|
+
self._params[k] = v
|
84
|
+
self._compiled_graph = self._make_compiled_graph()
|
85
|
+
|
86
|
+
async def ainvoke_agent(
|
87
|
+
self,
|
88
|
+
user_instructions: str = None,
|
89
|
+
data_raw: pd.DataFrame = None,
|
90
|
+
**kwargs
|
91
|
+
):
|
92
|
+
"""
|
93
|
+
Asynchronously runs the agent with user instructions and data.
|
94
|
+
|
95
|
+
Parameters:
|
96
|
+
----------
|
97
|
+
user_instructions : str, optional
|
98
|
+
The instructions for the agent.
|
99
|
+
data_raw : pd.DataFrame, optional
|
100
|
+
The input data as a DataFrame.
|
101
|
+
"""
|
102
|
+
response = await self._compiled_graph.ainvoke(
|
103
|
+
{
|
104
|
+
"user_instructions": user_instructions,
|
105
|
+
"data_raw": data_raw.to_dict() if data_raw is not None else None,
|
106
|
+
},
|
107
|
+
**kwargs
|
108
|
+
)
|
109
|
+
self.response = response
|
110
|
+
return None
|
111
|
+
|
112
|
+
def invoke_agent(
|
113
|
+
self,
|
114
|
+
user_instructions: str = None,
|
115
|
+
data_raw: pd.DataFrame = None,
|
116
|
+
**kwargs
|
117
|
+
):
|
118
|
+
"""
|
119
|
+
Synchronously runs the agent with user instructions and data.
|
120
|
+
|
121
|
+
Parameters:
|
122
|
+
----------
|
123
|
+
user_instructions : str, optional
|
124
|
+
The instructions for the agent.
|
125
|
+
data_raw : pd.DataFrame, optional
|
126
|
+
The input data as a DataFrame.
|
127
|
+
"""
|
128
|
+
response = self._compiled_graph.invoke(
|
129
|
+
{
|
130
|
+
"user_instructions": user_instructions,
|
131
|
+
"data_raw": data_raw.to_dict() if data_raw is not None else None,
|
132
|
+
},
|
133
|
+
**kwargs
|
134
|
+
)
|
135
|
+
self.response = response
|
136
|
+
return None
|
137
|
+
|
138
|
+
def get_internal_messages(self, markdown: bool = False):
|
139
|
+
"""
|
140
|
+
Returns internal messages from the agent response.
|
141
|
+
"""
|
142
|
+
pretty_print = "\n\n".join(
|
143
|
+
[f"### {msg.type.upper()}\n\nID: {msg.id}\n\nContent:\n\n{msg.content}"
|
144
|
+
for msg in self.response["internal_messages"]]
|
145
|
+
)
|
146
|
+
if markdown:
|
147
|
+
return Markdown(pretty_print)
|
148
|
+
else:
|
149
|
+
return self.response["internal_messages"]
|
150
|
+
|
151
|
+
def get_artifacts(self, as_dataframe: bool = False):
|
152
|
+
"""
|
153
|
+
Returns the EDA artifacts from the agent response.
|
154
|
+
"""
|
155
|
+
if as_dataframe:
|
156
|
+
return pd.DataFrame(self.response["eda_artifacts"])
|
157
|
+
else:
|
158
|
+
return self.response["eda_artifacts"]
|
159
|
+
|
160
|
+
def get_ai_message(self, markdown: bool = False):
|
161
|
+
"""
|
162
|
+
Returns the AI message from the agent response.
|
163
|
+
"""
|
164
|
+
if markdown:
|
165
|
+
return Markdown(self.response["messages"][0].content)
|
166
|
+
else:
|
167
|
+
return self.response["messages"][0].content
|
168
|
+
|
169
|
+
def get_tool_calls(self):
|
170
|
+
"""
|
171
|
+
Returns the tool calls made by the agent.
|
172
|
+
"""
|
173
|
+
return self.response["tool_calls"]
|
174
|
+
|
175
|
+
def make_eda_tools_agent(
|
176
|
+
model: Any,
|
177
|
+
create_react_agent_kwargs: Optional[Dict] = {},
|
178
|
+
invoke_react_agent_kwargs: Optional[Dict] = {},
|
179
|
+
):
|
180
|
+
"""
|
181
|
+
Creates an Exploratory Data Analyst Agent that can interact with EDA tools.
|
182
|
+
|
183
|
+
Parameters:
|
184
|
+
----------
|
185
|
+
model : Any
|
186
|
+
The language model used for tool-calling.
|
187
|
+
create_react_agent_kwargs : dict
|
188
|
+
Additional kwargs for create_react_agent.
|
189
|
+
invoke_react_agent_kwargs : dict
|
190
|
+
Additional kwargs for agent invocation.
|
191
|
+
|
192
|
+
Returns:
|
193
|
+
-------
|
194
|
+
app : langgraph.graph.CompiledStateGraph
|
195
|
+
The compiled state graph for the EDA agent.
|
196
|
+
"""
|
197
|
+
|
198
|
+
class GraphState(AgentState):
|
199
|
+
internal_messages: Annotated[Sequence[BaseMessage], operator.add]
|
200
|
+
user_instructions: str
|
201
|
+
data_raw: dict
|
202
|
+
eda_artifacts: dict
|
203
|
+
tool_calls: list
|
204
|
+
|
205
|
+
def exploratory_agent(state):
|
206
|
+
print(format_agent_name(AGENT_NAME))
|
207
|
+
print(" * RUN REACT TOOL-CALLING AGENT FOR EDA")
|
208
|
+
|
209
|
+
tool_node = ToolNode(
|
210
|
+
tools=EDA_TOOLS
|
211
|
+
)
|
212
|
+
|
213
|
+
eda_agent = create_react_agent(
|
214
|
+
model,
|
215
|
+
tools=tool_node,
|
216
|
+
state_schema=GraphState,
|
217
|
+
**create_react_agent_kwargs,
|
218
|
+
)
|
219
|
+
|
220
|
+
response = eda_agent.invoke(
|
221
|
+
{
|
222
|
+
"messages": [("user", state["user_instructions"])],
|
223
|
+
"data_raw": state["data_raw"],
|
224
|
+
},
|
225
|
+
invoke_react_agent_kwargs,
|
226
|
+
)
|
227
|
+
|
228
|
+
print(" * POST-PROCESSING EDA RESULTS")
|
229
|
+
|
230
|
+
internal_messages = response['messages']
|
231
|
+
if not internal_messages:
|
232
|
+
return {"internal_messages": [], "eda_artifacts": None}
|
233
|
+
|
234
|
+
last_ai_message = AIMessage(internal_messages[-1].content, role=AGENT_NAME)
|
235
|
+
last_tool_artifact = None
|
236
|
+
if len(internal_messages) > 1:
|
237
|
+
last_message = internal_messages[-2]
|
238
|
+
if hasattr(last_message, "artifact"):
|
239
|
+
last_tool_artifact = last_message.artifact
|
240
|
+
elif isinstance(last_message, dict) and "artifact" in last_message:
|
241
|
+
last_tool_artifact = last_message["artifact"]
|
242
|
+
|
243
|
+
tool_calls = get_tool_call_names(internal_messages)
|
244
|
+
|
245
|
+
return {
|
246
|
+
"messages": [last_ai_message],
|
247
|
+
"internal_messages": internal_messages,
|
248
|
+
"eda_artifacts": last_tool_artifact,
|
249
|
+
"tool_calls": tool_calls,
|
250
|
+
}
|
251
|
+
|
252
|
+
workflow = StateGraph(GraphState)
|
253
|
+
workflow.add_node("exploratory_agent", exploratory_agent)
|
254
|
+
workflow.add_edge(START, "exploratory_agent")
|
255
|
+
workflow.add_edge("exploratory_agent", END)
|
256
|
+
|
257
|
+
app = workflow.compile()
|
258
|
+
return app
|
@@ -27,6 +27,7 @@ from ai_data_science_team.tools.mlflow import (
|
|
27
27
|
mlflow_search_registered_models,
|
28
28
|
mlflow_get_model_version_details,
|
29
29
|
)
|
30
|
+
from ai_data_science_team.utils.messages import get_tool_call_names
|
30
31
|
|
31
32
|
AGENT_NAME = "mlflow_tools_agent"
|
32
33
|
|
@@ -228,6 +229,12 @@ class MLflowToolsAgent(BaseAgent):
|
|
228
229
|
return Markdown(self.response["messages"][0].content)
|
229
230
|
else:
|
230
231
|
return self.response["messages"][0].content
|
232
|
+
|
233
|
+
def get_tool_calls(self):
|
234
|
+
"""
|
235
|
+
Returns the tool calls made by the agent.
|
236
|
+
"""
|
237
|
+
return self.response["tool_calls"]
|
231
238
|
|
232
239
|
|
233
240
|
|
@@ -330,10 +337,13 @@ def make_mlflow_tools_agent(
|
|
330
337
|
elif isinstance(last_message, dict) and "artifact" in last_message:
|
331
338
|
last_tool_artifact = last_message["artifact"]
|
332
339
|
|
340
|
+
tool_calls = get_tool_call_names(internal_messages)
|
341
|
+
|
333
342
|
return {
|
334
343
|
"messages": [last_ai_message],
|
335
344
|
"internal_messages": internal_messages,
|
336
345
|
"mlflow_artifacts": last_tool_artifact,
|
346
|
+
"tool_calls": tool_calls,
|
337
347
|
}
|
338
348
|
|
339
349
|
|
@@ -74,7 +74,12 @@ def get_dataframe_summary(
|
|
74
74
|
return summaries
|
75
75
|
|
76
76
|
|
77
|
-
def _summarize_dataframe(
|
77
|
+
def _summarize_dataframe(
|
78
|
+
df: pd.DataFrame,
|
79
|
+
dataset_name: str,
|
80
|
+
n_sample=30,
|
81
|
+
skip_stats=False
|
82
|
+
) -> str:
|
78
83
|
"""Generate a summary string for a single DataFrame."""
|
79
84
|
# 1. Convert dictionary-type cells to strings
|
80
85
|
# This prevents unhashable dict errors during df.nunique().
|
@@ -0,0 +1,352 @@
|
|
1
|
+
|
2
|
+
from typing import Annotated, Dict, Tuple, Union
|
3
|
+
|
4
|
+
import os
|
5
|
+
import tempfile
|
6
|
+
|
7
|
+
from langchain.tools import tool
|
8
|
+
|
9
|
+
from langgraph.prebuilt import InjectedState
|
10
|
+
|
11
|
+
from ai_data_science_team.tools.dataframe import get_dataframe_summary
|
12
|
+
|
13
|
+
|
14
|
+
@tool(response_format='content')
|
15
|
+
def explain_data(
|
16
|
+
data_raw: Annotated[dict, InjectedState("data_raw")],
|
17
|
+
n_sample: int = 30,
|
18
|
+
skip_stats: bool = False,
|
19
|
+
):
|
20
|
+
"""
|
21
|
+
Tool: explain_data
|
22
|
+
Description:
|
23
|
+
Provides an extensive, narrative summary of a DataFrame including its shape, column types,
|
24
|
+
missing value percentages, unique counts, sample rows, and (if not skipped) descriptive stats/info.
|
25
|
+
|
26
|
+
Parameters:
|
27
|
+
data_raw (dict): Raw data.
|
28
|
+
n_sample (int, default=30): Number of rows to display.
|
29
|
+
skip_stats (bool, default=False): If True, omit descriptive stats/info.
|
30
|
+
|
31
|
+
LLM Guidance:
|
32
|
+
Use when a detailed, human-readable explanation is needed—i.e., a full overview is preferred over a concise numerical summary.
|
33
|
+
|
34
|
+
Returns:
|
35
|
+
str: Detailed DataFrame summary.
|
36
|
+
"""
|
37
|
+
print(" * Tool: explain_data")
|
38
|
+
import pandas as pd
|
39
|
+
|
40
|
+
result = get_dataframe_summary(pd.DataFrame(data_raw), n_sample=n_sample, skip_stats=skip_stats)
|
41
|
+
|
42
|
+
return result
|
43
|
+
|
44
|
+
@tool(response_format='content_and_artifact')
def describe_dataset(
    data_raw: Annotated[dict, InjectedState("data_raw")]
) -> Tuple[str, Dict]:
    """
    Tool: describe_dataset
    Description:
        Compute and return summary statistics for the dataset using pandas' describe() method.
        The tool provides both a textual summary and a structured artifact (a dictionary) for further processing.

    Parameters:
    -----------
    data_raw : dict
        The raw data in dictionary format.

    LLM Selection Guidance:
    ------------------------
    Use this tool when:
    - The request emphasizes numerical descriptive statistics (e.g., count, mean, std, min, quartiles, max).
    - The user needs a concise statistical snapshot rather than a detailed narrative.
    - Both a brief text explanation and a structured data artifact (for downstream tasks) are required.

    Returns:
    -------
    Tuple[str, Dict]:
        - content: A textual summary indicating that summary statistics have been computed.
        - artifact: A dictionary (derived from DataFrame.describe()) containing detailed statistical measures.
    """
    print(" * Tool: describe_dataset")
    import pandas as pd

    # include='all' covers both numeric and non-numeric columns.
    stats_frame = pd.DataFrame(data_raw).describe(include='all')

    return (
        "Summary statistics computed using pandas describe().",
        {'describe_df': stats_frame.to_dict()},
    )
|
79
|
+
|
80
|
+
|
81
|
+
@tool(response_format='content_and_artifact')
def visualize_missing(
    data_raw: Annotated[dict, InjectedState("data_raw")],
    n_sample: int = None
) -> Tuple[str, Dict]:
    """
    Tool: visualize_missing
    Description:
        Missing value analysis using the missingno library. Generates a matrix plot, bar plot, and heatmap plot.

    Parameters:
    -----------
    data_raw : dict
        The raw data in dictionary format.
    n_sample : int, optional (default: None)
        The number of rows to sample from the dataset if it is large.
        Requests larger than the dataset are clamped to the full dataset size.

    Returns:
    -------
    Tuple[str, Dict]:
        content: A message describing the generated plots.
        artifact: A dict with keys 'matrix_plot', 'bar_plot', and 'heatmap_plot' each containing the
        corresponding base64 encoded PNG image.
    """
    print(" * Tool: visualize_missing")

    try:
        import missingno as msno  # Ensure missingno is installed
    except ImportError:
        raise ImportError("Please install the 'missingno' package to use this tool. pip install missingno")

    import pandas as pd
    import base64
    from io import BytesIO
    import matplotlib.pyplot as plt

    # Create the DataFrame and sample if n_sample is provided.
    df = pd.DataFrame(data_raw)
    if n_sample is not None:
        # Clamp to the number of available rows: pandas raises ValueError
        # when asked to sample more rows than the frame contains.
        df = df.sample(n=min(n_sample, len(df)), random_state=42)

    # Helper: render one missingno plot and return it base64-encoded as PNG.
    def _render_encoded(plot_func):
        plt.figure(figsize=(8, 6))
        plot_func(df)
        plt.tight_layout()
        buf = BytesIO()
        plt.savefig(buf, format="png")
        plt.close()
        buf.seek(0)
        return base64.b64encode(buf.getvalue()).decode("utf-8")

    # Dictionary of the base64 encoded images for each plot.
    encoded_plots = {
        "matrix_plot": _render_encoded(msno.matrix),
        "bar_plot": _render_encoded(msno.bar),
        "heatmap_plot": _render_encoded(msno.heatmap),
    }

    content = "Missing data visualizations (matrix, bar, and heatmap) have been generated."
    artifact = encoded_plots
    return content, artifact
|
149
|
+
|
150
|
+
|
151
|
+
|
152
|
+
@tool(response_format='content_and_artifact')
def correlation_funnel(
    data_raw: Annotated[dict, InjectedState("data_raw")],
    target: str,
    target_bin_index: Union[int, str] = -1,
    corr_method: str = "pearson",
    n_bins: int = 4,
    thresh_infreq: float = 0.01,
    name_infreq: str = "-OTHER",
) -> Tuple[str, Dict]:
    """
    Tool: correlation_funnel
    Description:
        Correlation analysis using the correlation funnel method. The tool binarizes the data and computes correlation versus a target column.

    Parameters:
    ----------
    data_raw : dict
        The raw data in dictionary format (injected from state).
    target : str
        The base target column name (e.g., 'Member_Status'). The tool will look for columns that begin
        with this string followed by '__' (e.g., 'Member_Status__Gold', 'Member_Status__Platinum').
    target_bin_index : int or str, default -1
        If an integer, selects the target level by position from the matching columns.
        If a string (e.g., "Yes"), attempts to match to the suffix of a column name
        (i.e., 'target__Yes').
    corr_method : str
        The correlation method ('pearson', 'kendall', or 'spearman'). Default is 'pearson'.
    n_bins : int
        The number of bins to use for binarization. Default is 4.
    thresh_infreq : float
        The threshold for infrequent levels. Default is 0.01.
    name_infreq : str
        The name to use for infrequent levels. Default is '-OTHER'.

    Returns:
    -------
    Tuple[str, Dict]:
        content: A message describing the computed correlation funnel.
        artifact: A dict with keys 'correlation_data' (correlations as a dict of lists),
        'plot_image' (base64-encoded PNG, or an error dict if plotting failed), and
        'plotly_figure' (a Plotly figure dict, or an error dict if plotting failed).
    """
    print(" * Tool: correlation_funnel")
    try:
        import pytimetk as tk
    except ImportError:
        raise ImportError("Please install the 'pytimetk' package to use this tool. pip install pytimetk")
    import pandas as pd
    import base64
    from io import BytesIO
    import matplotlib.pyplot as plt
    import json
    import plotly.io as pio

    # Convert the raw injected state into a DataFrame.
    df = pd.DataFrame(data_raw)

    # Apply the binarization method (pytimetk DataFrame accessor).
    df_binarized = df.binarize(
        n_bins=n_bins,
        thresh_infreq=thresh_infreq,
        name_infreq=name_infreq,
        one_hot=True
    )

    # Determine the full target column name.
    # Look for all columns that start with "<target>__".
    matching_columns = [col for col in df_binarized.columns if col.startswith(f"{target}__")]
    if not matching_columns:
        # No binarized level columns found; fall back to the provided target as-is.
        full_target = target
    else:
        # Determine the full target based on target_bin_index.
        if isinstance(target_bin_index, str):
            # Match the level by suffix, e.g. '<target>__Yes'.
            candidate = f"{target}__{target_bin_index}"
            if candidate in matching_columns:
                full_target = candidate
            else:
                # If no matching candidate is found, default to the last matching column.
                full_target = matching_columns[-1]
        else:
            # target_bin_index is an integer position; out-of-range falls back
            # to the last matching column.
            try:
                full_target = matching_columns[target_bin_index]
            except IndexError:
                full_target = matching_columns[-1]

    # Compute correlation funnel using the full target column name.
    df_correlated = df_binarized.correlate(target=full_target, method=corr_method)

    # Attempt to generate a static (plotnine) plot, base64-encoded as PNG.
    try:
        fig = df_correlated.plot_correlation_funnel(engine='plotnine', height=600)
        buf = BytesIO()
        fig.save(buf, format="png")
        plt.close()
        buf.seek(0)
        encoded = base64.b64encode(buf.getvalue()).decode("utf-8")
    except Exception as e:
        encoded = {"error": str(e)}

    # Attempt to generate an interactive Plotly plot as a JSON-serializable dict.
    try:
        fig = df_correlated.plot_correlation_funnel(engine='plotly')
        fig_dict = json.loads(pio.to_json(fig))
    except Exception as e:
        fig_dict = {"error": str(e)}

    content = (f"Correlation funnel computed using method '{corr_method}' for target level '{full_target}'. "
               f"Base target was '{target}' with target_bin_index '{target_bin_index}'.")
    artifact = {
        "correlation_data": df_correlated.to_dict(orient="list"),
        "plot_image": encoded,
        "plotly_figure": fig_dict,
    }
    return content, artifact
|
266
|
+
|
267
|
+
|
268
|
+
|
269
|
+
@tool(response_format='content_and_artifact')
def generate_sweetviz_report(
    data_raw: Annotated[dict, InjectedState("data_raw")],
    target: str = None,
    report_name: str = "sweetviz_report.html",
    report_directory: str = None,
    open_browser: bool = False,
) -> Tuple[str, Dict]:
    """
    Tool: generate_sweetviz_report
    Description:
        Make an Exploratory Data Analysis (EDA) report using the Sweetviz library.

    Parameters:
    -----------
    data_raw : dict
        The raw data injected as a dictionary (converted from a DataFrame).
    target : str, optional
        The target feature to analyze. Default is None.
    report_name : str, optional
        The file name to save the Sweetviz HTML report. Default is "sweetviz_report.html".
    report_directory : str, optional
        The directory where the report should be saved.
        If None, a temporary directory is created and used.
    open_browser : bool, optional
        Whether to open the report in a web browser. Default is False.

    Returns:
    --------
    Tuple[str, Dict]:
        content: A summary message describing the generated report.
        artifact: A dictionary with the report file path and optionally the report's HTML content.
    """
    print(" * Tool: generate_sweetviz_report")

    # Import sweetviz
    try:
        import sweetviz as sv
    except ImportError:
        raise ImportError("Please install the 'sweetviz' package to use this tool. Run: pip install sweetviz")

    import pandas as pd

    # Convert injected raw data to a DataFrame.
    df = pd.DataFrame(data_raw)

    # If no directory is specified, use a temporary directory.
    # Track this with an explicit flag instead of inspecting the path later:
    # substring checks like "'tmp' in path" misreport user paths containing
    # "tmp" and temp paths that lack it (e.g. /var/folders on macOS).
    used_temp_directory = not report_directory
    if used_temp_directory:
        report_directory = tempfile.mkdtemp()
        print(f" * Using temporary directory: {report_directory}")
    else:
        # Ensure user-specified directory exists (exist_ok avoids a
        # check-then-create race).
        os.makedirs(report_directory, exist_ok=True)

    # Create the Sweetviz report.
    report = sv.analyze(df, target_feat=target)

    # Determine the full path for the report.
    full_report_path = os.path.join(report_directory, report_name)

    # Save the report to the specified HTML file.
    report.show_html(
        filepath=full_report_path,
        open_browser=open_browser,
    )

    # Optionally, read the HTML content (if desired to pass along in the artifact).
    try:
        with open(full_report_path, "r", encoding="utf-8") as f:
            html_content = f.read()
    except Exception:
        html_content = None

    content = (
        f"Sweetviz EDA report generated and saved as '{os.path.abspath(full_report_path)}'. "
        f"{'This was saved in a temporary directory.' if used_temp_directory else ''}"
    )
    artifact = {
        "report_file": os.path.abspath(full_report_path),
        "report_html": html_content,
    }
    return content, artifact
|
352
|
+
|
File without changes
|
@@ -0,0 +1,27 @@
|
|
1
|
+
|
2
|
+
|
3
|
+
import webbrowser
|
4
|
+
import os
|
5
|
+
|
6
|
+
def open_html_file_in_browser(file_path: str):
    """
    Opens an HTML file in the default web browser.

    Parameters:
    -----------
    file_path : str
        The file path or URL of the HTML file to open.

    Returns:
    --------
    None
    """
    # Local files get converted to a file:// URL; anything that is not an
    # existing local file is assumed to already be a URL.
    is_local_file = os.path.isfile(file_path)
    target_url = 'file://' + os.path.abspath(file_path) if is_local_file else file_path
    webbrowser.open(target_url)
|
@@ -0,0 +1,46 @@
|
|
1
|
+
import base64
|
2
|
+
from io import BytesIO
|
3
|
+
import matplotlib.pyplot as plt
|
4
|
+
from PIL import Image
|
5
|
+
|
6
|
+
def matplotlib_from_base64(encoded: str, title: str = None, figsize: tuple = (8, 6)):
    """
    Convert a base64-encoded image to a matplotlib plot and display it.

    Parameters:
    -----------
    encoded : str
        The base64-encoded image string.
    title : str, optional
        A title for the plot. Default is None.
    figsize : tuple, optional
        Figure size (width, height) for the plot. Default is (8, 6).

    Returns:
    --------
    fig, ax : tuple
        The matplotlib figure and axes objects.
    """
    # Decode the base64 payload and load it as a Pillow image.
    image = Image.open(BytesIO(base64.b64decode(encoded)))

    # Render the image on a fresh axes with the axis frame hidden.
    fig, ax = plt.subplots(figsize=figsize)
    ax.imshow(image)
    ax.axis('off')  # Hide the axis

    if title:
        ax.set_title(title)

    # Display the figure to the user.
    plt.show()

    return fig, ax
|
@@ -0,0 +1,27 @@
|
|
1
|
+
|
2
|
+
|
3
|
+
|
4
|
+
def get_tool_call_names(messages):
    """
    Extract the tool call names from a list of LangChain messages.

    A message counts as a tool call when converting it to a dict exposes a
    'tool_call_id' key (the shape LangChain ToolMessages have); its `.name`
    attribute is collected. Messages that cannot be converted to a dict or
    lack a name are silently skipped.

    Parameters:
    ----------
    messages : list
        A list of LangChain messages.

    Returns:
    -------
    tool_calls : list
        A list of tool call names.
    """
    tool_calls = []
    for message in messages:
        # Narrow exception handling: only swallow the failures that a
        # non-tool message can plausibly produce (bare `except:` would also
        # hide KeyboardInterrupt/SystemExit and real bugs).
        try:
            if "tool_call_id" in dict(message):
                tool_calls.append(message.name)
        except (TypeError, ValueError, AttributeError):
            pass
    return tool_calls
|
27
|
+
|
@@ -1,6 +1,6 @@
|
|
1
1
|
Metadata-Version: 2.2
|
2
2
|
Name: ai-data-science-team
|
3
|
-
Version: 0.0.0.
|
3
|
+
Version: 0.0.0.9013
|
4
4
|
Summary: Build and run an AI-powered data science team.
|
5
5
|
Home-page: https://github.com/business-science/ai-data-science-team
|
6
6
|
Author: Matt Dancho
|
@@ -31,9 +31,16 @@ Requires-Dist: psutil
|
|
31
31
|
Provides-Extra: machine-learning
|
32
32
|
Requires-Dist: h2o; extra == "machine-learning"
|
33
33
|
Requires-Dist: mlflow; extra == "machine-learning"
|
34
|
+
Provides-Extra: data-science
|
35
|
+
Requires-Dist: pytimetk; extra == "data-science"
|
36
|
+
Requires-Dist: missingno; extra == "data-science"
|
37
|
+
Requires-Dist: sweetviz; extra == "data-science"
|
34
38
|
Provides-Extra: all
|
35
39
|
Requires-Dist: h2o; extra == "all"
|
36
40
|
Requires-Dist: mlflow; extra == "all"
|
41
|
+
Requires-Dist: pytimetk; extra == "all"
|
42
|
+
Requires-Dist: missingno; extra == "all"
|
43
|
+
Requires-Dist: sweetviz; extra == "all"
|
37
44
|
Dynamic: author
|
38
45
|
Dynamic: author-email
|
39
46
|
Dynamic: classifier
|
@@ -59,6 +66,8 @@ Dynamic: summary
|
|
59
66
|
<a href="https://pypi.python.org/pypi/ai-data-science-team"><img src="https://img.shields.io/pypi/v/ai-data-science-team.svg?style=for-the-badge" alt="PyPI"></a>
|
60
67
|
<a href="https://github.com/business-science/ai-data-science-team"><img src="https://img.shields.io/pypi/pyversions/ai-data-science-team.svg?style=for-the-badge" alt="versions"></a>
|
61
68
|
<a href="https://github.com/business-science/ai-data-science-team/blob/main/LICENSE"><img src="https://img.shields.io/github/license/business-science/ai-data-science-team.svg?style=for-the-badge" alt="license"></a>
|
69
|
+
<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/business-science/ai-data-science-team?style=for-the-badge">
|
70
|
+
|
62
71
|
</div>
|
63
72
|
|
64
73
|
|
@@ -93,8 +102,9 @@ The AI Data Science Team of Copilots includes Agents that specialize data cleani
|
|
93
102
|
- [Apps Available Now](#apps-available-now)
|
94
103
|
- [🔥 Agentic Applications](#-agentic-applications)
|
95
104
|
- [Agents Available Now](#agents-available-now)
|
96
|
-
- [Agents](#agents)
|
105
|
+
- [Standard Agents](#standard-agents)
|
97
106
|
- [🔥🔥 NEW! Machine Learning Agents](#-new-machine-learning-agents)
|
107
|
+
- [🔥 NEW! Data Science Agents](#-new-data-science-agents)
|
98
108
|
- [Multi-Agents](#multi-agents)
|
99
109
|
- [Agents Coming Soon](#agents-coming-soon)
|
100
110
|
- [Disclaimer](#disclaimer)
|
@@ -122,7 +132,7 @@ If you're an aspiring data scientist who wants to learn how to build AI Agents a
|
|
122
132
|
|
123
133
|
This project is a work in progress. New data science agents will be released soon.
|
124
134
|
|
125
|
-

|
126
136
|
|
127
137
|
### NEW: Multi-Agents
|
128
138
|
|
@@ -142,18 +152,22 @@ This is a top secret project I'm working on. It's a multi-agent data science app
|
|
142
152
|
|
143
153
|
#### 🔥 Agentic Applications
|
144
154
|
|
145
|
-
1. **
|
155
|
+
1. **NEW Exploratory Data Copilot**: An AI-powered data science app that performs automated exploratory data analysis (EDA) with EDA Reporting, Missing Data Analysis, Correlation Analysis, and more. [See Application](/apps/exploratory-copilot-app/)
|
156
|
+
|
157
|
+

|
158
|
+
|
159
|
+
2. **SQL Database Agent App:** Connects any SQL Database, generates SQL queries from natural language, and returns data as a downloadable table. [See Application](/apps/sql-database-agent-app/)
|
146
160
|
|
147
161
|
### Agents Available Now
|
148
162
|
|
149
|
-
#### Agents
|
163
|
+
#### Standard Agents
|
150
164
|
|
151
165
|
1. **Data Wrangling Agent:** Merges, Joins, Preps and Wrangles data into a format that is ready for data analysis. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_wrangling_agent.ipynb)
|
152
166
|
2. **Data Visualization Agent:** Creates visualizations to help you understand your data. Returns JSON serializable plotly visualizations. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_visualization_agent.ipynb)
|
153
167
|
3. **🔥 Data Cleaning Agent:** Performs Data Preparation steps including handling missing values, outliers, and data type conversions. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_cleaning_agent.ipynb)
|
154
168
|
4. **Feature Engineering Agent:** Converts the prepared data into ML-ready data. Adds features to increase predictive accuracy of ML models. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/feature_engineering_agent.ipynb)
|
155
169
|
5. **🔥 SQL Database Agent:** Connects to SQL databases to pull data into the data science environment. Creates pipelines to automate data extraction. Performs Joins, Aggregations, and other SQL Query operations. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/sql_database_agent.ipynb)
|
156
|
-
6.
|
170
|
+
6. **🔥 Data Loader Tools Agent:** Loads data from various sources including CSV, Excel, Parquet, and Pickle files. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_loader_tools_agent.ipynb)
|
157
171
|
|
158
172
|
|
159
173
|
#### 🔥🔥 NEW! Machine Learning Agents
|
@@ -161,6 +175,10 @@ This is a top secret project I'm working on. It's a multi-agent data science app
|
|
161
175
|
1. **🔥 H2O Machine Learning Agent:** Builds and logs 100's of high-performance machine learning models. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/ml_agents/h2o_machine_learning_agent.ipynb)
|
162
176
|
2. **🔥 MLflow Tools Agent (MLOps):** This agent has 11+ tools for managing models, ML projects, and making production ML predictions with MLflow. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/ml_agents/mlflow_tools_agent.ipynb)
|
163
177
|
|
178
|
+
#### 🔥 NEW! Data Science Agents
|
179
|
+
|
180
|
+
1. **🔥🔥 EDA Tools Agent:** Performs automated exploratory data analysis (EDA) with EDA Reporting, Missing Data Analysis, Correlation Analysis, and more. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/ds_agents/eda_tools_agent.ipynb)
|
181
|
+
|
164
182
|
|
165
183
|
#### Multi-Agents
|
166
184
|
|
@@ -18,6 +18,9 @@ ai_data_science_team/agents/data_visualization_agent.py
|
|
18
18
|
ai_data_science_team/agents/data_wrangling_agent.py
|
19
19
|
ai_data_science_team/agents/feature_engineering_agent.py
|
20
20
|
ai_data_science_team/agents/sql_database_agent.py
|
21
|
+
ai_data_science_team/ds_agents/__init__.py
|
22
|
+
ai_data_science_team/ds_agents/eda_tools_agent.py
|
23
|
+
ai_data_science_team/ds_agents/modeling_tools_agent.py
|
21
24
|
ai_data_science_team/ml_agents/__init__.py
|
22
25
|
ai_data_science_team/ml_agents/h2o_ml_agent.py
|
23
26
|
ai_data_science_team/ml_agents/h2o_ml_tools_agent.py
|
@@ -32,10 +35,14 @@ ai_data_science_team/templates/agent_templates.py
|
|
32
35
|
ai_data_science_team/tools/__init__.py
|
33
36
|
ai_data_science_team/tools/data_loader.py
|
34
37
|
ai_data_science_team/tools/dataframe.py
|
38
|
+
ai_data_science_team/tools/eda.py
|
35
39
|
ai_data_science_team/tools/h2o.py
|
36
40
|
ai_data_science_team/tools/mlflow.py
|
37
41
|
ai_data_science_team/tools/sql.py
|
38
42
|
ai_data_science_team/utils/__init__.py
|
43
|
+
ai_data_science_team/utils/html.py
|
39
44
|
ai_data_science_team/utils/logging.py
|
45
|
+
ai_data_science_team/utils/matplotlib.py
|
46
|
+
ai_data_science_team/utils/messages.py
|
40
47
|
ai_data_science_team/utils/plotly.py
|
41
48
|
ai_data_science_team/utils/regex.py
|
@@ -27,7 +27,8 @@ setup(
|
|
27
27
|
install_requires=parse_requirements("requirements.txt"),
|
28
28
|
extras_require={
|
29
29
|
"machine_learning": ["h2o", "mlflow"],
|
30
|
-
"
|
30
|
+
"data_science": ["pytimetk", "missingno", "sweetviz"],
|
31
|
+
"all": ["h2o", "mlflow", "pytimetk", "missingno","sweetviz"],
|
31
32
|
},
|
32
33
|
python_requires=">=3.9",
|
33
34
|
classifiers=[
|
@@ -1 +0,0 @@
|
|
1
|
-
__version__ = "0.0.0.9011"
|
File without changes
|
File without changes
|
{ai_data_science_team-0.0.0.9011 → ai_data_science_team-0.0.0.9013}/ai_data_science_team/__init__.py
RENAMED
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|