PyPI - ai-data-science-team - Versions diffs - 0.0.0.9010__tar.gz → 0.0.0.9011__tar.gz - Mend

ai-data-science-team 0.0.0.9010tar.gz → 0.0.0.9011tar.gz

Files changed (45)

{ai_data_science_team-0.0.0.9010/ai_data_science_team.egg-info → ai_data_science_team-0.0.0.9011}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: ai-data-science-team
-Version: 0.0.0.9010
+Version: 0.0.0.9011
 Summary: Build and run an AI-powered data science team.
 Home-page: https://github.com/business-science/ai-data-science-team
 Author: Matt Dancho
@@ -93,8 +93,8 @@ The AI Data Science Team of Copilots includes Agents that specialize data cleani
     - [Apps Available Now](#apps-available-now)
       - [🔥 Agentic Applications](#-agentic-applications)
     - [Agents Available Now](#agents-available-now)
+      - [Agents](#agents)
       - [🔥🔥 NEW! Machine Learning Agents](#-new-machine-learning-agents)
-      - [Data Science Agents](#data-science-agents-1)
       - [Multi-Agents](#multi-agents)
     - [Agents Coming Soon](#agents-coming-soon)
   - [Disclaimer](#disclaimer)
@@ -122,7 +122,7 @@ If you're an aspiring data scientist who wants to learn how to build AI Agents a
 This project is a work in progress. New data science agents will be released soon.
-![Data Science Team](/img/ai_data_science_team.jpg)
+![AI Data Science Team](/img/ai_data_science_team_.jpg)
 ### NEW: Multi-Agents
@@ -146,18 +146,21 @@ This is a top secret project I'm working on. It's a multi-agent data science app
 ### Agents Available Now
+#### Agents
+1. **Data Wrangling Agent:** Merges, Joins, Preps and Wrangles data into a format that is ready for data analysis. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_wrangling_agent.ipynb)
+2. **Data Visualization Agent:** Creates visualizations to help you understand your data. Returns JSON serializable plotly visualizations. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_visualization_agent.ipynb)
+3. **🔥 Data Cleaning Agent:** Performs Data Preparation steps including handling missing values, outliers, and data type conversions. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_cleaning_agent.ipynb)
+4. **Feature Engineering Agent:** Converts the prepared data into ML-ready data. Adds features to increase predictive accuracy of ML models. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/feature_engineering_agent.ipynb)
+5. **🔥 SQL Database Agent:** Connects to SQL databases to pull data into the data science environment. Creates pipelines to automate data extraction. Performs Joins, Aggregations, and other SQL Query operations. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/sql_database_agent.ipynb)
+6. **Data Loader Tools Agent:** Loads data from various sources including CSV, Excel, Parquet, and Pickle files. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_loader_tools_agent.ipynb)
 #### 🔥🔥 NEW! Machine Learning Agents
 1. **🔥 H2O Machine Learning Agent:** Builds and logs 100's of high-performance machine learning models. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/ml_agents/h2o_machine_learning_agent.ipynb)
 2. **🔥 MLflow Tools Agent (MLOps):** This agent has 11+ tools for managing models, ML projects, and making production ML predictions with MLflow. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/ml_agents/mlflow_tools_agent.ipynb)
-#### Data Science Agents
-1. **Data Wrangling Agent:** Merges, Joins, Preps and Wrangles data into a format that is ready for data analysis. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_wrangling_agent.ipynb)
-2. **Data Visualization Agent:** Creates visualizations to help you understand your data. Returns JSON serializable plotly visualizations. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_visualization_agent.ipynb)
-3. **Data Cleaning Agent:** Performs Data Preparation steps including handling missing values, outliers, and data type conversions. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_cleaning_agent.ipynb)
-4. **Feature Engineering Agent:** Converts the prepared data into ML-ready data. Adds features to increase predictive accuracy of ML models. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/feature_engineering_agent.ipynb)
-5. **SQL Database Agent:** Connects to SQL databases to pull data into the data science environment. Creates pipelines to automate data extraction. Performs Joins, Aggregations, and other SQL Query operations. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/sql_database_agent.ipynb)
 #### Multi-Agents

{ai_data_science_team-0.0.0.9010 → ai_data_science_team-0.0.0.9011}/README.md RENAMED

@@ -46,8 +46,8 @@ The AI Data Science Team of Copilots includes Agents that specialize data cleani
     - [Apps Available Now](#apps-available-now)
       - [🔥 Agentic Applications](#-agentic-applications)
     - [Agents Available Now](#agents-available-now)
+      - [Agents](#agents)
       - [🔥🔥 NEW! Machine Learning Agents](#-new-machine-learning-agents)
-      - [Data Science Agents](#data-science-agents-1)
       - [Multi-Agents](#multi-agents)
     - [Agents Coming Soon](#agents-coming-soon)
   - [Disclaimer](#disclaimer)
@@ -75,7 +75,7 @@ If you're an aspiring data scientist who wants to learn how to build AI Agents a
 This project is a work in progress. New data science agents will be released soon.
-![Data Science Team](/img/ai_data_science_team.jpg)
+![AI Data Science Team](/img/ai_data_science_team_.jpg)
 ### NEW: Multi-Agents
@@ -99,18 +99,21 @@ This is a top secret project I'm working on. It's a multi-agent data science app
 ### Agents Available Now
+#### Agents
+1. **Data Wrangling Agent:** Merges, Joins, Preps and Wrangles data into a format that is ready for data analysis. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_wrangling_agent.ipynb)
+2. **Data Visualization Agent:** Creates visualizations to help you understand your data. Returns JSON serializable plotly visualizations. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_visualization_agent.ipynb)
+3. **🔥 Data Cleaning Agent:** Performs Data Preparation steps including handling missing values, outliers, and data type conversions. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_cleaning_agent.ipynb)
+4. **Feature Engineering Agent:** Converts the prepared data into ML-ready data. Adds features to increase predictive accuracy of ML models. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/feature_engineering_agent.ipynb)
+5. **🔥 SQL Database Agent:** Connects to SQL databases to pull data into the data science environment. Creates pipelines to automate data extraction. Performs Joins, Aggregations, and other SQL Query operations. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/sql_database_agent.ipynb)
+6. **Data Loader Tools Agent:** Loads data from various sources including CSV, Excel, Parquet, and Pickle files. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_loader_tools_agent.ipynb)
 #### 🔥🔥 NEW! Machine Learning Agents
 1. **🔥 H2O Machine Learning Agent:** Builds and logs 100's of high-performance machine learning models. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/ml_agents/h2o_machine_learning_agent.ipynb)
 2. **🔥 MLflow Tools Agent (MLOps):** This agent has 11+ tools for managing models, ML projects, and making production ML predictions with MLflow. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/ml_agents/mlflow_tools_agent.ipynb)
-#### Data Science Agents
-1. **Data Wrangling Agent:** Merges, Joins, Preps and Wrangles data into a format that is ready for data analysis. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_wrangling_agent.ipynb)
-2. **Data Visualization Agent:** Creates visualizations to help you understand your data. Returns JSON serializable plotly visualizations. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_visualization_agent.ipynb)
-3. **Data Cleaning Agent:** Performs Data Preparation steps including handling missing values, outliers, and data type conversions. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_cleaning_agent.ipynb)
-4. **Feature Engineering Agent:** Converts the prepared data into ML-ready data. Adds features to increase predictive accuracy of ML models. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/feature_engineering_agent.ipynb)
-5. **SQL Database Agent:** Connects to SQL databases to pull data into the data science environment. Creates pipelines to automate data extraction. Performs Joins, Aggregations, and other SQL Query operations. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/sql_database_agent.ipynb)
 #### Multi-Agents

ai_data_science_team-0.0.0.9011/ai_data_science_team/_version.py ADDED

@@ -0,0 +1 @@
+__version__ = "0.0.0.9011"

{ai_data_science_team-0.0.0.9010 → ai_data_science_team-0.0.0.9011}/ai_data_science_team/agents/__init__.py RENAMED

@@ -3,3 +3,4 @@ from ai_data_science_team.agents.feature_engineering_agent import make_feature_e
 from ai_data_science_team.agents.data_wrangling_agent import make_data_wrangling_agent, DataWranglingAgent
 from ai_data_science_team.agents.sql_database_agent import make_sql_database_agent, SQLDatabaseAgent
 from ai_data_science_team.agents.data_visualization_agent import make_data_visualization_agent, DataVisualizationAgent
+from ai_data_science_team.agents.data_loader_tools_agent import make_data_loader_tools_agent, DataLoaderToolsAgent

ai_data_science_team-0.0.0.9011/ai_data_science_team/agents/data_loader_tools_agent.py ADDED

@@ -0,0 +1,272 @@
+from typing import Any, Optional, Annotated, Sequence, List, Dict
+import operator
+import pandas as pd
+import os
+from IPython.display import Markdown
+from langchain_core.messages import BaseMessage, AIMessage
+from langgraph.prebuilt import create_react_agent, ToolNode
+from langgraph.prebuilt.chat_agent_executor import AgentState
+from langgraph.graph import START, END, StateGraph
+from ai_data_science_team.templates import BaseAgent
+from ai_data_science_team.utils.regex import format_agent_name
+from ai_data_science_team.tools.data_loader import (
+    load_directory,
+    load_file,
+    list_directory_contents,
+    list_directory_recursive,
+    get_file_info,
+    search_files_by_pattern,
+)
+AGENT_NAME = "data_loader_tools_agent"
+tools = [
+    load_directory,
+    load_file,
+    list_directory_contents,
+    list_directory_recursive,
+    get_file_info,
+    search_files_by_pattern,
+]
+class DataLoaderToolsAgent(BaseAgent):
+    """
+    A Data Loader Agent that can interact with data loading tools and search for files in your file system.
+    Parameters:
+    ----------
+    model : langchain.llms.base.LLM
+        The language model used to generate the tool calling agent.
+    create_react_agent_kwargs : dict
+        Additional keyword arguments to pass to the create_react_agent function.
+    invoke_react_agent_kwargs : dict
+        Additional keyword arguments to pass to the invoke method of the react agent.
+    Methods:
+    --------
+    update_params(**kwargs)
+        Updates the agent's parameters and rebuilds the compiled graph.
+    ainvoke_agent(user_instructions: str=None, **kwargs)
+        Runs the agent with the given user instructions asynchronously.
+    invoke_agent(user_instructions: str=None, **kwargs)
+        Runs the agent with the given user instructions.
+    get_internal_messages(markdown: bool=False)
+        Returns the internal messages from the agent's response.
+    get_artifacts(as_dataframe: bool=False)
+        Returns the data loader artifacts from the agent's response.
+    get_ai_message(markdown: bool=False)
+        Returns the AI message from the agent's response.
+    """
+    def __init__(
+        self,
+        model: Any,
+        create_react_agent_kwargs: Optional[Dict]={},
+        invoke_react_agent_kwargs: Optional[Dict]={},
+    ):
+        self._params = {
+            "model": model,
+            "create_react_agent_kwargs": create_react_agent_kwargs,
+            "invoke_react_agent_kwargs": invoke_react_agent_kwargs,
+        }
+        self._compiled_graph = self._make_compiled_graph()
+        self.response = None
+    def _make_compiled_graph(self):
+        """
+        Creates the compiled graph for the agent.
+        """
+        self.response = None
+        return make_data_loader_tools_agent(**self._params)
+    def update_params(self, **kwargs):
+        """
+        Updates the agent's parameters and rebuilds the compiled graph.
+        """
+        for k, v in kwargs.items():
+            self._params[k] = v
+        self._compiled_graph = self._make_compiled_graph()
+    async def ainvoke_agent(
+        self,
+        user_instructions: str=None,
+        **kwargs
+    ):
+        """
+        Runs the agent asynchronously with the given user instructions.
+        Parameters:
+        ----------
+        user_instructions : str, optional
+            The user instructions to pass to the agent.
+        kwargs : dict, optional
+            Additional keyword arguments to pass to the agent's ainvoke method.
+        """
+        response = await self._compiled_graph.ainvoke(
+            {
+                "user_instructions": user_instructions,
+            },
+            **kwargs
+        )
+        self.response = response
+        return None
+    def invoke_agent(
+        self,
+        user_instructions: str=None,
+        **kwargs
+    ):
+        """
+        Runs the agent with the given user instructions.
+        Parameters:
+        ----------
+        user_instructions : str, optional
+            The user instructions to pass to the agent.
+        kwargs : dict, optional
+            Additional keyword arguments to pass to the agent's invoke method.
+        """
+        response = self._compiled_graph.invoke(
+            {
+                "user_instructions": user_instructions,
+            },
+            **kwargs
+        )
+        self.response = response
+        return None
+    def get_internal_messages(self, markdown: bool=False):
+        """
+        Returns the internal messages from the agent's response.
+        """
+        pretty_print = "\n\n".join([f"### {msg.type.upper()}\n\nID: {msg.id}\n\nContent:\n\n{msg.content}" for msg in self.response["internal_messages"]])
+        if markdown:
+            return Markdown(pretty_print)
+        else:
+            return self.response["internal_messages"]
+    def get_artifacts(self, as_dataframe: bool=False):
+        """
+        Returns the data loader artifacts from the agent's response.
+        """
+        if as_dataframe:
+            return pd.DataFrame(self.response["data_loader_artifacts"])
+        else:
+            return self.response["data_loader_artifacts"]
+    def get_ai_message(self, markdown: bool=False):
+        """
+        Returns the AI message from the agent's response.
+        """
+        if markdown:
+            return Markdown(self.response["messages"][0].content)
+        else:
+            return self.response["messages"][0].content
+def make_data_loader_tools_agent(
+    model: Any,
+    create_react_agent_kwargs: Optional[Dict]={},
+    invoke_react_agent_kwargs: Optional[Dict]={},
+):
+    """
+    Creates a Data Loader Agent that can interact with data loading tools.
+    Parameters:
+    ----------
+    model : langchain.llms.base.LLM
+        The language model used to generate the tool calling agent.
+    create_react_agent_kwargs : dict
+        Additional keyword arguments to pass to the create_react_agent function.
+    invoke_react_agent_kwargs : dict
+        Additional keyword arguments to pass to the invoke method of the react agent.
+    Returns:
+    --------
+    app : langchain.graphs.CompiledStateGraph
+        An agent that can interact with data loading tools.
+    """
+    class GraphState(AgentState):
+        internal_messages: Annotated[Sequence[BaseMessage], operator.add]
+        user_instructions: str
+        data_loader_artifacts: dict
+    def data_loader_agent(state):
+        print(format_agent_name(AGENT_NAME))
+        print("    ")
+        print("    * RUN REACT TOOL-CALLING AGENT")
+        tool_node = ToolNode(
+            tools=tools
+        )
+        data_loader_agent = create_react_agent(
+            model,
+            tools=tool_node,
+            state_schema=GraphState,
+            **create_react_agent_kwargs,
+        )
+        response = data_loader_agent.invoke(
+            {
+                "messages": [("user", state["user_instructions"])],
+            },
+            invoke_react_agent_kwargs,
+        )
+        print("    * POST-PROCESS RESULTS")
+        internal_messages = response['messages']
+        # Ensure there is at least one AI message
+        if not internal_messages:
+            return {
+                "internal_messages": [],
+                "data_loader_artifacts": None,
+            }
+        # Get the last AI message
+        last_ai_message = AIMessage(internal_messages[-1].content, role = AGENT_NAME)
+        # Get the last tool artifact safely
+        last_tool_artifact = None
+        if len(internal_messages) > 1:
+            last_message = internal_messages[-2]  # Get second-to-last message
+            if hasattr(last_message, "artifact"):  # Check if it has an "artifact"
+                last_tool_artifact = last_message.artifact
+            elif isinstance(last_message, dict) and "artifact" in last_message:
+                last_tool_artifact = last_message["artifact"]
+        return {
+            "messages": [last_ai_message],
+            "internal_messages": internal_messages,
+            "data_loader_artifacts": last_tool_artifact,
+        }
+    workflow = StateGraph(GraphState)
+    workflow.add_node("data_loader_agent", data_loader_agent)
+    workflow.add_edge(START, "data_loader_agent")
+    workflow.add_edge("data_loader_agent", END)
+    app = workflow.compile()
+    return app
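
The post-processing step in `data_loader_agent` above can be illustrated without LangGraph or a language model. The following is a minimal sketch, assuming stand-in message objects: `FakeMessage` and `extract_last_artifact` are hypothetical names used only for illustration and are not part of the package. It treats the final message as the AI reply and looks for an `artifact` on the second-to-last message, which is where a tool result would normally sit.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class FakeMessage:
    """Hypothetical stand-in for a LangChain message (not the real class)."""
    content: str
    artifact: Any = None


def extract_last_artifact(messages: List[Any]) -> Tuple[Optional[str], Any]:
    """Return (final message content, artifact of the second-to-last message)."""
    if not messages:
        # No messages at all: nothing to report and no artifact.
        return None, None
    final_content = messages[-1].content
    artifact = None
    if len(messages) > 1:
        candidate = messages[-2]
        # Accept either a dict-style message or an object with an attribute.
        if isinstance(candidate, dict):
            artifact = candidate.get("artifact")
        else:
            artifact = getattr(candidate, "artifact", None)
    return final_content, artifact


history = [
    FakeMessage("List the files in ./data"),
    FakeMessage("tool output", artifact=[{"filename": "sales.csv", "type": "file"}]),
    FakeMessage("I found one file: sales.csv"),
]
reply, artifact = extract_last_artifact(history)
print(reply)
print(artifact)
```

One consequence of this design is that only the artifact from the last tool call survives; if the agent calls several tools in a row, earlier artifacts are not returned by this step.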

{ai_data_science_team-0.0.0.9010 → ai_data_science_team-0.0.0.9011}/ai_data_science_team/ml_agents/h2o_ml_agent.py RENAMED

@@ -506,6 +506,7 @@ def make_h2o_ml_agent(
             while remaining flexible to user instructions.
             - Return a dict with keys: leaderboard, best_model_id, model_path, and model_results.
             - If enable_mlflow is True, log the top metrics and save the model as an artifact. (See example function)
+            - IMPORTANT: if enable_mlflow is True, make sure to set enable_mlflow to True in the function definition.
             Initial User Instructions (Disregard any instructions that are unrelated to modeling):
                 {user_instructions}
@@ -533,7 +534,7 @@ def make_h2o_ml_agent(
                 sort_metric: str ,
                 model_directory: Optional[str] = None,
                 log_path: Optional[str] = None,
-                enable_mlflow: bool,
+                enable_mlflow: bool, # If the user has specified to enable MLflow, make sure to make this True
                 mlflow_tracking_uri: Optional[str],
                 mlflow_experiment_name: str,
                 mlflow_run_name: str,

{ai_data_science_team-0.0.0.9010 → ai_data_science_team-0.0.0.9011}/ai_data_science_team/ml_agents/mlflow_tools_agent.py RENAMED

@@ -1,5 +1,5 @@
-from typing import Any, Optional, Annotated, Sequence
+from typing import Any, Optional, Annotated, Sequence, Dict
 import operator
 import pandas as pd
@@ -63,8 +63,10 @@ class MLflowToolsAgent(BaseAgent):
         The tracking URI for MLflow. Defaults to None.
     mlflow_registry_uri : str, optional
         The registry URI for MLflow. Defaults to None.
-    **react_agent_kwargs : dict, optional
-        Additional keyword arguments to pass to the agent's react agent.
+    create_react_agent_kwargs : dict
+        Additional keyword arguments to pass to the create_react_agent function.
+    invoke_react_agent_kwargs : dict
+        Additional keyword arguments to pass to the invoke method of the react agent.
     Methods:
     --------
@@ -114,13 +116,15 @@ class MLflowToolsAgent(BaseAgent):
         model: Any,
         mlflow_tracking_uri: Optional[str]=None,
         mlflow_registry_uri: Optional[str]=None,
-        **react_agent_kwargs,
+        create_react_agent_kwargs: Optional[Dict]={},
+        invoke_react_agent_kwargs: Optional[Dict]={},
     ):
         self._params = {
             "model": model,
             "mlflow_tracking_uri": mlflow_tracking_uri,
             "mlflow_registry_uri": mlflow_registry_uri,
-            **react_agent_kwargs,
+            "create_react_agent_kwargs": create_react_agent_kwargs,
+            "invoke_react_agent_kwargs": invoke_react_agent_kwargs,
         }
         self._compiled_graph = self._make_compiled_graph()
         self.response = None
@@ -185,8 +189,6 @@ class MLflowToolsAgent(BaseAgent):
             The user instructions to pass to the agent.
         data_raw : pd.DataFrame, optional
             The raw data to pass to the agent. Used for prediction and tool calls where data is required.
-        kwargs : dict, optional
-            Additional keyword arguments to pass to the agents invoke method.
         """
         response = self._compiled_graph.invoke(
@@ -234,10 +236,30 @@ def make_mlflow_tools_agent(
     model: Any,
     mlflow_tracking_uri: str=None,
     mlflow_registry_uri: str=None,
-    **react_agent_kwargs,
+    create_react_agent_kwargs: Optional[Dict]={},
+    invoke_react_agent_kwargs: Optional[Dict]={},
 ):
     """
     MLflow Tool Calling Agent
+    Parameters:
+    ----------
+    model : Any
+        The language model used to generate the agent.
+    mlflow_tracking_uri : str, optional
+        The tracking URI for MLflow. Defaults to None.
+    mlflow_registry_uri : str, optional
+        The registry URI for MLflow. Defaults to None.
+    create_react_agent_kwargs : dict, optional
+        Additional keyword arguments to pass to the agent's create_react_agent method.
+    invoke_react_agent_kwargs : dict, optional
+        Additional keyword arguments to pass to the agent's invoke method.
+    Returns
+    -------
+    app : langchain.graphs.CompiledStateGraph
+        A compiled state graph for the MLflow Tool Calling Agent.
     """
     try:
@@ -274,7 +296,7 @@ def make_mlflow_tools_agent(
             model,
             tools=tool_node,
             state_schema=GraphState,
-            **react_agent_kwargs,
+            **create_react_agent_kwargs,
         )
         response = mlflow_agent.invoke(
@@ -282,6 +304,7 @@ def make_mlflow_tools_agent(
                 "messages": [("user", state["user_instructions"])],
                 "data_raw": state["data_raw"],
             },
+            invoke_react_agent_kwargs,
         )
         print("    * POST-PROCESS RESULTS")
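
The signature change in this file replaces a single `**react_agent_kwargs` catch-all with two explicit dictionaries, one unpacked when the react agent is created and one passed positionally as the second argument to `invoke`. A minimal sketch of that pattern follows, assuming stand-ins: `FakeAgent` and `make_agent` are hypothetical and do not call LangGraph. The sketch also uses `None` defaults rather than the `{}` defaults shown in the diff, which is a choice made here to avoid sharing one dictionary across calls.

```python
from typing import Any, Callable, Dict, Optional


class FakeAgent:
    """Hypothetical stand-in for a compiled react agent."""

    def __init__(self, model: str, **create_kwargs: Any) -> None:
        self.model = model
        self.create_kwargs = create_kwargs

    def invoke(
        self,
        inputs: Dict[str, Any],
        config: Optional[Dict[str, Any]] = None,
    ) -> Dict[str, Any]:
        # Echo back what was received so the routing is visible.
        return {
            "inputs": inputs,
            "config": config or {},
            "created_with": self.create_kwargs,
        }


def make_agent(
    model: str,
    create_react_agent_kwargs: Optional[Dict[str, Any]] = None,
    invoke_react_agent_kwargs: Optional[Dict[str, Any]] = None,
) -> Callable[[str], Dict[str, Any]]:
    # Copy the dictionaries so later mutation by the caller has no effect.
    create_kwargs = dict(create_react_agent_kwargs or {})
    invoke_kwargs = dict(invoke_react_agent_kwargs or {})

    def run(user_instructions: str) -> Dict[str, Any]:
        agent = FakeAgent(model, **create_kwargs)  # creation-time options
        return agent.invoke(
            {"messages": [("user", user_instructions)]},
            invoke_kwargs,  # call-time options, passed positionally
        )

    return run


run = make_agent(
    "fake-model",
    create_react_agent_kwargs={"debug": True},
    invoke_react_agent_kwargs={"recursion_limit": 10},
)
result = run("What runs are logged?")
print(result["created_with"])
print(result["config"])
```

Separating the two dictionaries makes it explicit which options affect how the agent is built and which affect a single call, something the earlier catch-all could not express.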

{ai_data_science_team-0.0.0.9010 → ai_data_science_team-0.0.0.9011}/ai_data_science_team/tools/data_loader.py RENAMED

@@ -1,41 +1,77 @@
 from langchain.tools import tool
+from langgraph.prebuilt import InjectedState
 import pandas as pd
+import os
-from typing import Tuple, List, Dict
+from typing import Tuple, List, Dict, Optional, Annotated
 @tool(response_format='content_and_artifact')
-def load_directory(dir_path: str) -> Tuple[str, Dict]:
+def load_directory(
+    directory_path: str = os.getcwd(),
+    file_type: Optional[str] = None
+) -> Tuple[str, Dict]:
     """
     Tool: load_directory
-    Description: Loads all recognized tabular files in a directory.
+    Description: Loads all recognized tabular files in a directory.
+                 If file_type is specified (e.g., 'csv'), only files
+                 with that extension are loaded.
     Parameters:
     ----------
-    dir_path : str
-        The path to the directory to load.
+    directory_path : str
+        The path to the directory to load. Defaults to the current working directory.
+    file_type : str, optional
+        The extension of the file type you want to load exclusively
+        (e.g., 'csv', 'xlsx', 'parquet'). If None or not provided,
+        attempts to load all recognized tabular files.
     Returns:
     -------
     Tuple[str, Dict]
         A tuple containing a message and a dictionary of data frames.
     """
-    print("    * Tool: load_directory")
+    print(f"    * Tool: load_directory | {directory_path}")
     import os
     import pandas as pd
+    if directory_path is None:
+        return "No directory path provided.", {}
+    if not os.path.isdir(directory_path):
+        return f"Directory not found: {directory_path}", {}
     data_frames = {}
-    for filename in os.listdir(dir_path):
-        file_path = os.path.join(dir_path, filename)
+    for filename in os.listdir(directory_path):
+        file_path = os.path.join(directory_path, filename)
         # Skip directories
         if os.path.isdir(file_path):
             continue
+        # If file_type is specified, only process files that match.
+        if file_type:
+            # Make sure extension check is case-insensitive
+            if not filename.lower().endswith(f".{file_type.lower()}"):
+                continue
         try:
+            # Attempt to auto-detect and load the file
             data_frames[filename] = auto_load_file(file_path).to_dict()
         except Exception as e:
+            # If loading fails, record the error message
             data_frames[filename] = f"Error loading file: {e}"
-    return f"Returned the following data frames: {list(data_frames.keys())}", data_frames
+    return (
+        f"Returned the following data frames: {list(data_frames.keys())}",
+        data_frames
+    )
 @tool(response_format='content_and_artifact')
 def load_file(file_path: str) -> Tuple[str, Dict]:
@@ -52,12 +88,15 @@ def load_file(file_path: str) -> Tuple[str, Dict]:
     Tuple[str, Dict]
         A tuple containing a message and a dictionary of the data frame.
     """
-    print("    * Tool: load_file")
+    print(f"    * Tool: load_file | {file_path}")
     return f"Returned the following data frame from this file: {file_path}", auto_load_file(file_path).to_dict()
 @tool(response_format='content_and_artifact')
-def list_directory_contents(directory_path: str, show_hidden: bool = False) -> Tuple[List[str], List[Dict]]:
+def list_directory_contents(
+    directory_path: str = os.getcwd(),
+    show_hidden: bool = False
+) -> Tuple[List[str], List[Dict]]:
     """
     Tool: list_directory_contents
     Description: Lists all files and folders in the specified directory.
@@ -67,30 +106,51 @@ def list_directory_contents(directory_path: str, show_hidden: bool = False) -> T
     Returns:
         tuple:
             - content (list[str]): A list of filenames/folders (suitable for display)
-            - artifact (list[dict]): A list of dictionaries where each dict has keys like {"filename": <name>}.
-                                     This structure can be easily converted to a pandas DataFrame.
+            - artifact (list[dict]): A list of dictionaries where each dict includes
+              the keys {"filename": <name>, "type": <'file' or 'directory'>}.
+              This structure can be easily converted to a pandas DataFrame.
     """
-    print("    * Tool: list_directory_contents")
+    print(f"    * Tool: list_directory_contents | {directory_path}")
     import os
+    if directory_path is None:
+        return "No directory path provided.", []
+    if not os.path.isdir(directory_path):
+        return f"Directory not found: {directory_path}", []
     items = []
     for item in os.listdir(directory_path):
         # If show_hidden is False, skip items starting with '.'
         if not show_hidden and item.startswith('.'):
             continue
         items.append(item)
+    items.reverse()
-    # content: just the raw list of filenames
-    content = items
-    # artifact: list of dicts (each row is {"filename": ...}), easily turned into a DataFrame
-    artifact = [{"filename": item} for item in items]
+    # content: just the raw list of item names (files/folders).
+    content = items.copy()
+    content.append(f"Total items: {len(items)}")
+    content.append(f"Directory: {directory_path}")
+    # artifact: list of dicts with both "filename" and "type" keys.
+    artifact = []
+    for item in items:
+        item_path = os.path.join(directory_path, item)
+        artifact.append({
+            "filename": item,
+            "type": "directory" if os.path.isdir(item_path) else "file"
+        })
     return content, artifact
 @tool(response_format='content_and_artifact')
-def list_directory_recursive(directory_path: str, show_hidden: bool = False) -> Tuple[str, List[Dict]]:
+def list_directory_recursive(
+    directory_path: str = os.getcwd(),
+    show_hidden: bool = False
+) -> Tuple[str, List[Dict]]:
     """
     Tool: list_directory_recursive
     Description:
@@ -111,13 +171,19 @@ def list_directory_recursive(directory_path: str, show_hidden: bool = False) ->
     Example:
         content, artifact = list_directory_recursive("/path/to/folder", show_hidden=False)
     """
-    print("    * Tool: list_directory_recursive")
+    print(f"    * Tool: list_directory_recursive | {directory_path}")
     # We'll store two things as we recurse:
     # 1) lines for building the "tree" string
     # 2) records in a list of dicts for easy DataFrame creation
     import os
+    if directory_path is None:
+        return "No directory path provided.", []
+    if not os.path.isdir(directory_path):
+        return f"Directory not found: {directory_path}", []
     lines = []
     records = []
@@ -210,7 +276,7 @@ def get_file_info(file_path: str) -> Tuple[str, List[Dict]]:
     Example:
         content, artifact = get_file_info("/path/to/mydata.csv")
     """
-    print("    * Tool: get_file_info")
+    print(f"    * Tool: get_file_info | {file_path}")
     # Ensure the file exists
     import os
@@ -244,7 +310,11 @@ def get_file_info(file_path: str) -> Tuple[str, List[Dict]]:
 @tool(response_format='content_and_artifact')
-def search_files_by_pattern(directory_path: str, pattern: str = "*.csv", recursive: bool = False) -> Tuple[str, List[Dict]]:
+def search_files_by_pattern(
+    directory_path: str = os.getcwd(),
+    pattern: str = "*.csv",
+    recursive: bool = False
+) -> Tuple[str, List[Dict]]:
     """
     Tool: search_files_by_pattern
     Description:
@@ -266,7 +336,7 @@ def search_files_by_pattern(directory_path: str, pattern: str = "*.csv", recursi
     Example:
         content, artifact = search_files_by_pattern("/path/to/folder", "*.csv", recursive=True)
     """
-    print("    * Tool: search_files_by_pattern")
+    print(f"    * Tool: search_files_by_pattern | {directory_path}")
     import os
     import fnmatch
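
Two behaviours added in this file, the case-insensitive `file_type` filter in `load_directory` and the `"type"` key in the `list_directory_contents` artifact, can be tried out with the standard library alone. The following is a minimal sketch, assuming a hypothetical helper `list_with_types` that combines both ideas; it is not a function from the package, and it sorts names instead of reversing them as the diff does.

```python
import os
import tempfile
from typing import Dict, List, Optional


def list_with_types(
    directory_path: str,
    file_type: Optional[str] = None,
    show_hidden: bool = False,
) -> List[Dict[str, str]]:
    """Hypothetical helper: list entries as {"filename", "type"} records."""
    if not os.path.isdir(directory_path):
        return []
    records = []
    for name in sorted(os.listdir(directory_path)):
        # Skip dotfiles unless explicitly requested.
        if not show_hidden and name.startswith("."):
            continue
        # Case-insensitive extension check, as in the load_directory change.
        if file_type and not name.lower().endswith(f".{file_type.lower()}"):
            continue
        path = os.path.join(directory_path, name)
        records.append(
            {"filename": name, "type": "directory" if os.path.isdir(path) else "file"}
        )
    return records


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "raw"))
    for name in ("a.CSV", "b.xlsx", ".hidden.csv"):
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write("x\n")
    listing = list_with_types(tmp)
    csv_only = list_with_types(tmp, file_type="csv")

print(listing)
print(csv_only)
```

Records shaped like this convert directly to a table with `pandas.DataFrame(records)`, which matches the intent stated in the `list_directory_contents` docstring.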

ai_data_science_team-0.0.0.9011/ai_data_science_team/utils/__init__.py ADDED

File without changes

{ai_data_science_team-0.0.0.9010 → ai_data_science_team-0.0.0.9011/ai_data_science_team.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: ai-data-science-team
-Version: 0.0.0.9010
+Version: 0.0.0.9011
 Summary: Build and run an AI-powered data science team.
 Home-page: https://github.com/business-science/ai-data-science-team
 Author: Matt Dancho
@@ -93,8 +93,8 @@ The AI Data Science Team of Copilots includes Agents that specialize data cleani
     - [Apps Available Now](#apps-available-now)
       - [🔥 Agentic Applications](#-agentic-applications)
     - [Agents Available Now](#agents-available-now)
+      - [Agents](#agents)
       - [🔥🔥 NEW! Machine Learning Agents](#-new-machine-learning-agents)
-      - [Data Science Agents](#data-science-agents-1)
       - [Multi-Agents](#multi-agents)
     - [Agents Coming Soon](#agents-coming-soon)
   - [Disclaimer](#disclaimer)
@@ -122,7 +122,7 @@ If you're an aspiring data scientist who wants to learn how to build AI Agents a
 This project is a work in progress. New data science agents will be released soon.
-![Data Science Team](/img/ai_data_science_team.jpg)
+![AI Data Science Team](/img/ai_data_science_team_.jpg)
 ### NEW: Multi-Agents
@@ -146,18 +146,21 @@ This is a top secret project I'm working on. It's a multi-agent data science app
 ### Agents Available Now
+#### Agents
+1. **Data Wrangling Agent:** Merges, Joins, Preps and Wrangles data into a format that is ready for data analysis. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_wrangling_agent.ipynb)
+2. **Data Visualization Agent:** Creates visualizations to help you understand your data. Returns JSON serializable plotly visualizations. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_visualization_agent.ipynb)
+3. **🔥 Data Cleaning Agent:** Performs Data Preparation steps including handling missing values, outliers, and data type conversions. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_cleaning_agent.ipynb)
+4. **Feature Engineering Agent:** Converts the prepared data into ML-ready data. Adds features to increase predictive accuracy of ML models. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/feature_engineering_agent.ipynb)
+5. **🔥 SQL Database Agent:** Connects to SQL databases to pull data into the data science environment. Creates pipelines to automate data extraction. Performs Joins, Aggregations, and other SQL Query operations. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/sql_database_agent.ipynb)
+6. **Data Loader Tools Agent:** Loads data from various sources including CSV, Excel, Parquet, and Pickle files. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_loader_tools_agent.ipynb)
 #### 🔥🔥 NEW! Machine Learning Agents
 1. **🔥 H2O Machine Learning Agent:** Builds and logs 100's of high-performance machine learning models. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/ml_agents/h2o_machine_learning_agent.ipynb)
 2. **🔥 MLflow Tools Agent (MLOps):** This agent has 11+ tools for managing models, ML projects, and making production ML predictions with MLflow. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/ml_agents/mlflow_tools_agent.ipynb)
-#### Data Science Agents
-1. **Data Wrangling Agent:** Merges, Joins, Preps and Wrangles data into a format that is ready for data analysis. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_wrangling_agent.ipynb)
-2. **Data Visualization Agent:** Creates visualizations to help you understand your data. Returns JSON serializable plotly visualizations. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_visualization_agent.ipynb)
-3. **Data Cleaning Agent:** Performs Data Preparation steps including handling missing values, outliers, and data type conversions. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_cleaning_agent.ipynb)
-4. **Feature Engineering Agent:** Converts the prepared data into ML-ready data. Adds features to increase predictive accuracy of ML models. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/feature_engineering_agent.ipynb)
-5. **SQL Database Agent:** Connects to SQL databases to pull data into the data science environment. Creates pipelines to automate data extraction. Performs Joins, Aggregations, and other SQL Query operations. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/sql_database_agent.ipynb)
 #### Multi-Agents

{ai_data_science_team-0.0.0.9010 → ai_data_science_team-0.0.0.9011}/ai_data_science_team.egg-info/SOURCES.txt RENAMED

@@ -20,6 +20,7 @@ ai_data_science_team/agents/feature_engineering_agent.py
 ai_data_science_team/agents/sql_database_agent.py
 ai_data_science_team/ml_agents/__init__.py
 ai_data_science_team/ml_agents/h2o_ml_agent.py
+ai_data_science_team/ml_agents/h2o_ml_tools_agent.py
 ai_data_science_team/ml_agents/mlflow_tools_agent.py
 ai_data_science_team/multiagents/__init__.py
 ai_data_science_team/multiagents/sql_data_analyst.py

ai_data_science_team-0.0.0.9010/ai_data_science_team/_version.py DELETED

@@ -1 +0,0 @@
-__version__ = "0.0.0.9010"

ai_data_science_team-0.0.0.9010/ai_data_science_team/agents/data_loader_tools_agent.py DELETED

@@ -1,69 +0,0 @@
-from typing import Any, Optional, Annotated, Sequence, List, Dict
-import operator
-import pandas as pd
-import os
-from IPython.display import Markdown
-from langchain_core.messages import BaseMessage, AIMessage
-from langgraph.prebuilt import create_react_agent, ToolNode
-from langgraph.prebuilt.chat_agent_executor import AgentState
-from langgraph.graph import START, END, StateGraph
-from ai_data_science_team.templates import BaseAgent
-from ai_data_science_team.utils.regex import format_agent_name
-from ai_data_science_team.tools.data_loader import (
-    load_directory,
-    load_file,
-    list_directory_contents,
-    list_directory_recursive,
-    get_file_info,
-    search_files_by_pattern,
-)
-AGENT_NAME = "data_loader_tools_agent"
-tools = [
-    load_directory,
-    load_file,
-    list_directory_contents,
-    list_directory_recursive,
-    get_file_info,
-    search_files_by_pattern,
-]
-def make_data_loader_tools_agent(
-    model: Any,
-    directory: Optional[str] = os.getcwd(),
-):
-    """
-    Creates a Data Loader Agent that can interact with data loading tools.
-    Parameters:
-    ----------
-    model : langchain.llms.base.LLM
-        The language model used to generate the tool calling agent.
-    directory : str, optional
-        The directory to search for files. Defaults to the current working directory.
-    Returns:
-    --------
-    Data Loader Agent
-        An agent that can interact with data loading tools.
-    """
-    class GraphState(AgentState):
-        internal_messages: Annotated[Sequence[BaseMessage], operator.add]
-        directory: str
-        user_instructions: str
-        data_artifacts: dict
-    pass