PyPI - ai-data-science-team - Versions diffs - 0.0.0.9015__tar.gz → 0.0.0.9016__tar.gz - Mend

ai-data-science-team 0.0.0.9015tar.gz → 0.0.0.9016tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (52)

{ai_data_science_team-0.0.0.9015/ai_data_science_team.egg-info → ai_data_science_team-0.0.0.9016}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
-Metadata-Version: 2.2
+Metadata-Version: 2.4
 Name: ai-data-science-team
-Version: 0.0.0.9015
+Version: 0.0.0.9016
 Summary: Build and run an AI-powered data science team.
 Home-page: https://github.com/business-science/ai-data-science-team
 Author: Matt Dancho
@@ -47,6 +47,7 @@ Dynamic: classifier
 Dynamic: description
 Dynamic: description-content-type
 Dynamic: home-page
+Dynamic: license-file
 Dynamic: provides-extra
 Dynamic: requires-dist
 Dynamic: requires-python
@@ -97,9 +98,8 @@ The AI Data Science Team of Copilots includes Agents that specialize data cleani
   - [Companies That Want A Custom AI Data Science Team (And AI Apps)](#companies-that-want-a-custom-ai-data-science-team-and-ai-apps)
   - [Generative AI for Data Scientists Workshop](#generative-ai-for-data-scientists-workshop)
   - [Data Science Agents](#data-science-agents)
+    - [🔥 NEW: Data Science Apps](#-new-data-science-apps)
     - [NEW: Multi-Agents](#new-multi-agents)
-    - [Data Science Apps](#data-science-apps)
-    - [Apps Available Now](#apps-available-now)
       - [🔥 Agentic Applications](#-agentic-applications)
     - [Agents Available Now](#agents-available-now)
       - [Standard Agents](#standard-agents)
@@ -110,11 +110,11 @@ The AI Data Science Team of Copilots includes Agents that specialize data cleani
   - [Disclaimer](#disclaimer)
   - [Installation](#installation)
   - [Usage](#usage)
-    - [Example 1: Feature Engineering with the Feature Engineering Agent](#example-1-feature-engineering-with-the-feature-engineering-agent)
-    - [Example 2: Cleaning Data with the Data Cleaning Agent](#example-2-cleaning-data-with-the-data-cleaning-agent)
+    - [Example: H2O Machine Learning Agent](#example-h2o-machine-learning-agent)
   - [Contributing](#contributing)
   - [License](#license)
 - [Want To Become A Full-Stack Generative AI Data Scientist?](#want-to-become-a-full-stack-generative-ai-data-scientist)
+- [⭐️ Star History](#️-star-history)
 ## Companies That Want A Custom AI Data Science Team (And AI Apps)
@@ -134,21 +134,24 @@ This project is a work in progress. New data science agents will be released soo
 ![AI Data Science Team](/img/ai_data_science_team.jpg)
-### NEW: Multi-Agents
+### 🔥 NEW: Data Science Apps
-**🔥 Pandas Data Analyst Agent:** Combines the ability to wrangle, transform, and analyze data with an optional data visualization agent that can create interactive plots.
+**🔥 Open Pandas AI Data Analyst:** Load an Excel or CSV file and ask it questions. Get data and charts back.
+![Pandas Data Analyst App](/img/apps/ai_pandas_data_analyst_app.jpg)
+**🔥 SQL Database Agent:** Connects any SQL Database, generates SQL queries from natural language, and returns data as a downloadable table.
-![Business Intelligence SQL Agent](/img/multi_agent_pandas_data_analyst.jpg)
+**🔥 Exploratory Data Copilot:** An AI-powered data science app that performs automated exploratory data analysis (EDA) with EDA Reporting, Missing Data Analysis, Correlation Analysis, and more.
-### Data Science Apps
+[See all available apps here](/apps)
-This is a top secret project I'm working on. It's a multi-agent data science app that performs time series forecasting.
+### NEW: Multi-Agents
-![Multi-Agent Data Science App](/img/ai_powered_apps.jpg)
+**🔥 Pandas Data Analyst Agent:** Combines the ability to wrangle, transform, and analyze data with an optional data visualization agent that can create interactive plots.
-### Apps Available Now
+![Pandas Data Analyst Agent](/img/multi_agent_pandas_data_analyst.jpg)
-[See all available apps here](/apps)
 #### 🔥 Agentic Applications
@@ -205,6 +208,14 @@ By using this software, you agree to use it solely for learning purposes.
 ## Installation
+You can install via PyPI (note that this is a beta version and breaking changes may occur until 0.1.0):
+``` bash
+pip install ai-data-science-team
+```
+Or, if you want the latest version from GitHub:
 ``` bash
 pip install git+https://github.com/business-science/ai-data-science-team.git --upgrade
 ```
@@ -213,55 +224,46 @@ pip install git+https://github.com/business-science/ai-data-science-team.git --u
 [See all examples here.](/examples)
-### Example 1: Feature Engineering with the Feature Engineering Agent
+### Example: H2O Machine Learning Agent
-[See the full example here.](/examples/feature_engineering_agent.ipynb)
+[See the full example here.](https://github.com/business-science/ai-data-science-team/blob/master/examples/ml_agents/h2o_machine_learning_agent.ipynb)
 ``` python
-feature_engineering_agent = FeatureEngineeringAgent(model = llm)
-feature_engineering_agent.invoke_agent(
-    data_raw = df,
-    user_instructions = "Make sure to scale and center numeric features",
-    target_variable = "Churn",
-    max_retries = 3,
+# Import libraries
+from langchain_openai import ChatOpenAI
+import pandas as pd
+import h2o
+import os
+from ai_data_science_team.ml_agents import H2OMLAgent
+# Load the data
+df = pd.read_csv("data/churn_data.csv")
+df
+# Initialize the language model
+os.environ['OPENAI_API_KEY'] = "YOUR_OPENAI_API_KEY"
+llm = ChatOpenAI(model=MODEL)
+llm
+# Initialize the H2O ML Agent
+ml_agent = H2OMLAgent(
+    model=llm,
+    log=True,
+    log_path="logs/",
+    model_directory="h2o_models/",
+    enable_mlflow=True, # Use this if you wish to log models to MLflow
 )
-```
-``` bash
----FEATURE ENGINEERING AGENT----
-    * CREATE FEATURE ENGINEER CODE
-    * EXECUTING AGENT CODE
-    * EXPLAIN AGENT CODE
-```
-``` python
-feature_engineering_agent.get_data_engineered()
-```
-### Example 2: Cleaning Data with the Data Cleaning Agent
-[See the full example here.](/examples/data_cleaning_agent.ipynb)
-``` python
-data_cleaning_agent = DataCleaningAgent(model = llm)
+ml_agent
-response = data_cleaning_agent.invoke_agent(
-    data_raw = df,
-    user_instructions = "Don't remove outliers when cleaning the data.",
-    max_retries = 3,
+# Run the agent
+ml_agent.invoke_agent(
+    data_raw=df.drop(columns=["customerID"]),
+    user_instructions="Please do classification on 'Churn'. Use a max runtime of 30 seconds.",
+    target_variable="Churn"
 )
-```
-``` bash
----DATA CLEANING AGENT----
-    * CREATE DATA CLEANER CODE
-    * EXECUTING AGENT CODE
-    * EXPLAIN AGENT CODE
-```
-``` python
-data_cleaning_agent.get_data_cleaned()
+# Retrieve and display the leaderboard of models
+ml_agent.get_leaderboard()
 ```
 ## Contributing
@@ -282,4 +284,8 @@ This project is licensed under the MIT License. See LICENSE file for details.
 I teach Generative AI Data Science to help you build AI-powered data science apps. [**Register for my next Generative AI for Data Scientists workshop here.**](https://learn.business-science.io/ai-register)
+# ⭐️ Star History
+[![Star History Chart](https://api.star-history.com/svg?repos=business-science/ai-data-science-team&type=Date)](https://star-history.com/#)
+[**Please ⭐ us on GitHub (it takes 2 seconds and means a lot).**](https://github.com/business-science/ai-data-science-team)
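
The header changes in this PKG-INFO diff (`Metadata-Version` 2.2 → 2.4, the new `Dynamic: license-file` entry) can be inspected programmatically, because core metadata uses an RFC 822-style header layout. The following is a minimal sketch using only the standard library. The excerpt string is copied from the fields visible in the diff and is not the full file, and `read_core_metadata` is a hypothetical helper name, not part of the package.

```python
from email.parser import HeaderParser

# Excerpt of the 0.0.0.9016 header fields visible in the diff above;
# the real PKG-INFO contains many more fields.
PKG_INFO_EXCERPT = """\
Metadata-Version: 2.4
Name: ai-data-science-team
Version: 0.0.0.9016
Summary: Build and run an AI-powered data science team.
Dynamic: home-page
Dynamic: license-file
Dynamic: provides-extra
"""


def read_core_metadata(text: str) -> dict:
    """Parse RFC 822-style core metadata headers into a plain dict."""
    msg = HeaderParser().parsestr(text)
    return {
        "metadata_version": msg["Metadata-Version"],
        "name": msg["Name"],
        "version": msg["Version"],
        # Fields that may appear several times (such as Dynamic) need get_all().
        "dynamic": msg.get_all("Dynamic") or [],
    }


metadata = read_core_metadata(PKG_INFO_EXCERPT)
print(metadata["version"], metadata["dynamic"])
```

For an installed distribution, `importlib.metadata.metadata("ai-data-science-team")` should expose the same fields without manual parsing.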

{ai_data_science_team-0.0.0.9015 → ai_data_science_team-0.0.0.9016}/README.md RENAMED

@@ -43,9 +43,8 @@ The AI Data Science Team of Copilots includes Agents that specialize data cleani
   - [Companies That Want A Custom AI Data Science Team (And AI Apps)](#companies-that-want-a-custom-ai-data-science-team-and-ai-apps)
   - [Generative AI for Data Scientists Workshop](#generative-ai-for-data-scientists-workshop)
   - [Data Science Agents](#data-science-agents)
+    - [🔥 NEW: Data Science Apps](#-new-data-science-apps)
     - [NEW: Multi-Agents](#new-multi-agents)
-    - [Data Science Apps](#data-science-apps)
-    - [Apps Available Now](#apps-available-now)
       - [🔥 Agentic Applications](#-agentic-applications)
     - [Agents Available Now](#agents-available-now)
       - [Standard Agents](#standard-agents)
@@ -56,11 +55,11 @@ The AI Data Science Team of Copilots includes Agents that specialize data cleani
   - [Disclaimer](#disclaimer)
   - [Installation](#installation)
   - [Usage](#usage)
-    - [Example 1: Feature Engineering with the Feature Engineering Agent](#example-1-feature-engineering-with-the-feature-engineering-agent)
-    - [Example 2: Cleaning Data with the Data Cleaning Agent](#example-2-cleaning-data-with-the-data-cleaning-agent)
+    - [Example: H2O Machine Learning Agent](#example-h2o-machine-learning-agent)
   - [Contributing](#contributing)
   - [License](#license)
 - [Want To Become A Full-Stack Generative AI Data Scientist?](#want-to-become-a-full-stack-generative-ai-data-scientist)
+- [⭐️ Star History](#️-star-history)
 ## Companies That Want A Custom AI Data Science Team (And AI Apps)
@@ -80,21 +79,24 @@ This project is a work in progress. New data science agents will be released soo
 ![AI Data Science Team](/img/ai_data_science_team.jpg)
-### NEW: Multi-Agents
+### 🔥 NEW: Data Science Apps
-**🔥 Pandas Data Analyst Agent:** Combines the ability to wrangle, transform, and analyze data with an optional data visualization agent that can create interactive plots.
+**🔥 Open Pandas AI Data Analyst:** Load an Excel or CSV file and ask it questions. Get data and charts back.
+![Pandas Data Analyst App](/img/apps/ai_pandas_data_analyst_app.jpg)
+**🔥 SQL Database Agent:** Connects any SQL Database, generates SQL queries from natural language, and returns data as a downloadable table.
-![Business Intelligence SQL Agent](/img/multi_agent_pandas_data_analyst.jpg)
+**🔥 Exploratory Data Copilot:** An AI-powered data science app that performs automated exploratory data analysis (EDA) with EDA Reporting, Missing Data Analysis, Correlation Analysis, and more.
-### Data Science Apps
+[See all available apps here](/apps)
-This is a top secret project I'm working on. It's a multi-agent data science app that performs time series forecasting.
+### NEW: Multi-Agents
-![Multi-Agent Data Science App](/img/ai_powered_apps.jpg)
+**🔥 Pandas Data Analyst Agent:** Combines the ability to wrangle, transform, and analyze data with an optional data visualization agent that can create interactive plots.
-### Apps Available Now
+![Pandas Data Analyst Agent](/img/multi_agent_pandas_data_analyst.jpg)
-[See all available apps here](/apps)
 #### 🔥 Agentic Applications
@@ -151,6 +153,14 @@ By using this software, you agree to use it solely for learning purposes.
 ## Installation
+You can install via PyPI (note that this is a beta version and breaking changes may occur until 0.1.0):
+``` bash
+pip install ai-data-science-team
+```
+Or, if you want the latest version from GitHub:
 ``` bash
 pip install git+https://github.com/business-science/ai-data-science-team.git --upgrade
 ```
@@ -159,55 +169,46 @@ pip install git+https://github.com/business-science/ai-data-science-team.git --u
 [See all examples here.](/examples)
-### Example 1: Feature Engineering with the Feature Engineering Agent
+### Example: H2O Machine Learning Agent
-[See the full example here.](/examples/feature_engineering_agent.ipynb)
+[See the full example here.](https://github.com/business-science/ai-data-science-team/blob/master/examples/ml_agents/h2o_machine_learning_agent.ipynb)
 ``` python
-feature_engineering_agent = FeatureEngineeringAgent(model = llm)
-feature_engineering_agent.invoke_agent(
-    data_raw = df,
-    user_instructions = "Make sure to scale and center numeric features",
-    target_variable = "Churn",
-    max_retries = 3,
+# Import libraries
+from langchain_openai import ChatOpenAI
+import pandas as pd
+import h2o
+import os
+from ai_data_science_team.ml_agents import H2OMLAgent
+# Load the data
+df = pd.read_csv("data/churn_data.csv")
+df
+# Initialize the language model
+os.environ['OPENAI_API_KEY'] = "YOUR_OPENAI_API_KEY"
+llm = ChatOpenAI(model=MODEL)
+llm
+# Initialize the H2O ML Agent
+ml_agent = H2OMLAgent(
+    model=llm,
+    log=True,
+    log_path="logs/",
+    model_directory="h2o_models/",
+    enable_mlflow=True, # Use this if you wish to log models to MLflow
 )
-```
-``` bash
----FEATURE ENGINEERING AGENT----
-    * CREATE FEATURE ENGINEER CODE
-    * EXECUTING AGENT CODE
-    * EXPLAIN AGENT CODE
-```
-``` python
-feature_engineering_agent.get_data_engineered()
-```
-### Example 2: Cleaning Data with the Data Cleaning Agent
-[See the full example here.](/examples/data_cleaning_agent.ipynb)
-``` python
-data_cleaning_agent = DataCleaningAgent(model = llm)
+ml_agent
-response = data_cleaning_agent.invoke_agent(
-    data_raw = df,
-    user_instructions = "Don't remove outliers when cleaning the data.",
-    max_retries = 3,
+# Run the agent
+ml_agent.invoke_agent(
+    data_raw=df.drop(columns=["customerID"]),
+    user_instructions="Please do classification on 'Churn'. Use a max runtime of 30 seconds.",
+    target_variable="Churn"
 )
-```
-``` bash
----DATA CLEANING AGENT----
-    * CREATE DATA CLEANER CODE
-    * EXECUTING AGENT CODE
-    * EXPLAIN AGENT CODE
-```
-``` python
-data_cleaning_agent.get_data_cleaned()
+# Retrieve and display the leaderboard of models
+ml_agent.get_leaderboard()
 ```
 ## Contributing
@@ -228,4 +229,8 @@ This project is licensed under the MIT License. See LICENSE file for details.
 I teach Generative AI Data Science to help you build AI-powered data science apps. [**Register for my next Generative AI for Data Scientists workshop here.**](https://learn.business-science.io/ai-register)
+# ⭐️ Star History
+[![Star History Chart](https://api.star-history.com/svg?repos=business-science/ai-data-science-team&type=Date)](https://star-history.com/#)
+[**Please ⭐ us on GitHub (it takes 2 seconds and means a lot).**](https://github.com/business-science/ai-data-science-team)

ai_data_science_team-0.0.0.9016/ai_data_science_team/_version.py ADDED

@@ -0,0 +1 @@
+__version__ = "0.0.0.9016"
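
The new `_version.py` stores the release as a plain dotted string. As a rough illustration of how such strings order, the sketch below compares the two versions in this diff as integer tuples and checks them against the `0.1.0` threshold mentioned in the README's installation note. `parse_version` is a hypothetical helper that only handles purely numeric versions. For general use, `packaging.version.Version` is the more robust choice.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Split a purely numeric dotted version string into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


old = parse_version("0.0.0.9015")
new = parse_version("0.0.0.9016")
stable_threshold = parse_version("0.1.0")

# Tuples compare element by element, so 9016 > 9015 decides the first check
# and the second component (0 < 1) decides the second.
is_upgrade = new > old
is_pre_stable = new < stable_threshold
print(is_upgrade, is_pre_stable)
```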

{ai_data_science_team-0.0.0.9015 → ai_data_science_team-0.0.0.9016}/ai_data_science_team/ds_agents/eda_tools_agent.py RENAMED

@@ -1,5 +1,3 @@
 from typing import Any, Optional, Annotated, Sequence, Dict
 import operator
 import pandas as pd
@@ -17,10 +15,11 @@ from ai_data_science_team.utils.regex import format_agent_name
 from ai_data_science_team.tools.eda import (
     explain_data,
-    describe_dataset,
-    visualize_missing,
-    correlation_funnel,
+    describe_dataset,
+    visualize_missing,
+    generate_correlation_funnel,
     generate_sweetviz_report,
+    generate_dtale_report,
 )
 from ai_data_science_team.utils.messages import get_tool_call_names
@@ -32,15 +31,17 @@ EDA_TOOLS = [
     explain_data,
     describe_dataset,
     visualize_missing,
-    correlation_funnel,
+    generate_correlation_funnel,
     generate_sweetviz_report,
+    generate_dtale_report,
 ]
 class EDAToolsAgent(BaseAgent):
     """
     An Exploratory Data Analysis Tools Agent that interacts with EDA tools to generate summary statistics,
     missing data visualizations, correlation funnels, EDA reports, etc.
     Parameters:
     ----------
     model : langchain.llms.base.LLM
@@ -52,9 +53,9 @@ class EDAToolsAgent(BaseAgent):
     checkpointer : Checkpointer, optional
         The checkpointer for the agent.
     """
     def __init__(
-        self,
+        self,
         model: Any,
         create_react_agent_kwargs: Optional[Dict] = {},
         invoke_react_agent_kwargs: Optional[Dict] = {},
@@ -64,18 +65,18 @@ class EDAToolsAgent(BaseAgent):
             "model": model,
             "create_react_agent_kwargs": create_react_agent_kwargs,
             "invoke_react_agent_kwargs": invoke_react_agent_kwargs,
-            "checkpointer": checkpointer
+            "checkpointer": checkpointer,
         }
         self._compiled_graph = self._make_compiled_graph()
         self.response = None
     def _make_compiled_graph(self):
         """
         Creates the compiled state graph for the EDA agent.
         """
         self.response = None
         return make_eda_tools_agent(**self._params)
     def update_params(self, **kwargs):
         """
         Updates the agent's parameters and rebuilds the compiled graph.
@@ -83,16 +84,13 @@ class EDAToolsAgent(BaseAgent):
         for k, v in kwargs.items():
             self._params[k] = v
         self._compiled_graph = self._make_compiled_graph()
     async def ainvoke_agent(
-        self,
-        user_instructions: str = None,
-        data_raw: pd.DataFrame = None,
-        **kwargs
+        self, user_instructions: str = None, data_raw: pd.DataFrame = None, **kwargs
     ):
         """
         Asynchronously runs the agent with user instructions and data.
         Parameters:
         ----------
         user_instructions : str, optional
@@ -105,20 +103,17 @@ class EDAToolsAgent(BaseAgent):
                 "user_instructions": user_instructions,
                 "data_raw": data_raw.to_dict() if data_raw is not None else None,
             },
-            **kwargs
+            **kwargs,
         )
         self.response = response
         return None
     def invoke_agent(
-        self,
-        user_instructions: str = None,
-        data_raw: pd.DataFrame = None,
-        **kwargs
+        self, user_instructions: str = None, data_raw: pd.DataFrame = None, **kwargs
     ):
         """
         Synchronously runs the agent with user instructions and data.
         Parameters:
         ----------
         user_instructions : str, optional
@@ -131,24 +126,26 @@ class EDAToolsAgent(BaseAgent):
                 "user_instructions": user_instructions,
                 "data_raw": data_raw.to_dict() if data_raw is not None else None,
             },
-            **kwargs
+            **kwargs,
         )
         self.response = response
         return None
     def get_internal_messages(self, markdown: bool = False):
         """
         Returns internal messages from the agent response.
         """
         pretty_print = "\n\n".join(
-            [f"### {msg.type.upper()}\n\nID: {msg.id}\n\nContent:\n\n{msg.content}"
-             for msg in self.response["internal_messages"]]
+            [
+                f"### {msg.type.upper()}\n\nID: {msg.id}\n\nContent:\n\n{msg.content}"
+                for msg in self.response["internal_messages"]
+            ]
         )
         if markdown:
             return Markdown(pretty_print)
         else:
             return self.response["internal_messages"]
     def get_artifacts(self, as_dataframe: bool = False):
         """
         Returns the EDA artifacts from the agent response.
@@ -157,7 +154,7 @@ class EDAToolsAgent(BaseAgent):
             return pd.DataFrame(self.response["eda_artifacts"])
         else:
             return self.response["eda_artifacts"]
     def get_ai_message(self, markdown: bool = False):
         """
         Returns the AI message from the agent response.
@@ -166,13 +163,14 @@ class EDAToolsAgent(BaseAgent):
             return Markdown(self.response["messages"][0].content)
         else:
             return self.response["messages"][0].content
     def get_tool_calls(self):
         """
         Returns the tool calls made by the agent.
         """
         return self.response["tool_calls"]
 def make_eda_tools_agent(
     model: Any,
     create_react_agent_kwargs: Optional[Dict] = {},
@@ -181,7 +179,7 @@ def make_eda_tools_agent(
 ):
     """
     Creates an Exploratory Data Analyst Agent that can interact with EDA tools.
     Parameters:
     ----------
     model : Any
@@ -192,13 +190,13 @@ def make_eda_tools_agent(
         Additional kwargs for agent invocation.
     checkpointer : Checkpointer, optional
         The checkpointer for the agent.
     Returns:
     -------
     app : langgraph.graph.CompiledStateGraph
         The compiled state graph for the EDA agent.
     """
     class GraphState(AgentState):
         internal_messages: Annotated[Sequence[BaseMessage], operator.add]
         user_instructions: str
@@ -209,11 +207,9 @@ def make_eda_tools_agent(
     def exploratory_agent(state):
         print(format_agent_name(AGENT_NAME))
         print("    * RUN REACT TOOL-CALLING AGENT FOR EDA")
-        tool_node = ToolNode(
-            tools=EDA_TOOLS
-        )
+        tool_node = ToolNode(tools=EDA_TOOLS)
         eda_agent = create_react_agent(
             model,
             tools=tool_node,
@@ -221,7 +217,7 @@ def make_eda_tools_agent(
             **create_react_agent_kwargs,
             checkpointer=checkpointer,
         )
         response = eda_agent.invoke(
             {
                 "messages": [("user", state["user_instructions"])],
@@ -229,13 +225,13 @@ def make_eda_tools_agent(
             },
             invoke_react_agent_kwargs,
         )
         print("    * POST-PROCESSING EDA RESULTS")
-        internal_messages = response['messages']
+        internal_messages = response["messages"]
         if not internal_messages:
             return {"internal_messages": [], "eda_artifacts": None}
         last_ai_message = AIMessage(internal_messages[-1].content, role=AGENT_NAME)
         last_tool_artifact = None
         if len(internal_messages) > 1:
@@ -244,24 +240,24 @@ def make_eda_tools_agent(
                 last_tool_artifact = last_message.artifact
             elif isinstance(last_message, dict) and "artifact" in last_message:
                 last_tool_artifact = last_message["artifact"]
         tool_calls = get_tool_call_names(internal_messages)
         return {
             "messages": [last_ai_message],
             "internal_messages": internal_messages,
             "eda_artifacts": last_tool_artifact,
             "tool_calls": tool_calls,
         }
     workflow = StateGraph(GraphState)
     workflow.add_node("exploratory_agent", exploratory_agent)
     workflow.add_edge(START, "exploratory_agent")
     workflow.add_edge("exploratory_agent", END)
     app = workflow.compile(
         checkpointer=checkpointer,
         name=AGENT_NAME,
     )
     return app
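
The reformatted list comprehension in `get_internal_messages` builds one Markdown section per message. The sketch below reproduces that formatting outside the agent so the output shape is easy to see. `StubMessage` and `format_internal_messages` are hypothetical stand-ins introduced here for illustration. The real method reads LangChain message objects from `self.response["internal_messages"]`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StubMessage:
    """Hypothetical stand-in for a message object (type, id, content only)."""

    type: str
    id: str
    content: str


def format_internal_messages(messages: List[StubMessage]) -> str:
    """Join messages using the same f-string layout as the diff above."""
    return "\n\n".join(
        [
            f"### {msg.type.upper()}\n\nID: {msg.id}\n\nContent:\n\n{msg.content}"
            for msg in messages
        ]
    )


# Made-up sample messages, used only to exercise the formatter.
messages = [
    StubMessage(type="human", id="msg-1", content="Describe the dataset."),
    StubMessage(type="ai", id="msg-2", content="The dataset has 3 columns."),
]
report = format_internal_messages(messages)
print(report)
```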
