holmesgpt 0.12.3a1__py3-none-any.whl → 0.12.4__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release: this version of holmesgpt might be problematic.
Files changed (52)
  1. holmes/__init__.py +1 -1
  2. holmes/config.py +75 -33
  3. holmes/core/config.py +5 -0
  4. holmes/core/conversations.py +17 -2
  5. holmes/core/investigation.py +1 -0
  6. holmes/core/llm.py +1 -2
  7. holmes/core/prompt.py +29 -4
  8. holmes/core/supabase_dal.py +49 -13
  9. holmes/core/tool_calling_llm.py +26 -1
  10. holmes/core/tools.py +2 -1
  11. holmes/core/tools_utils/tool_executor.py +1 -0
  12. holmes/core/toolset_manager.py +10 -3
  13. holmes/core/tracing.py +77 -10
  14. holmes/interactive.py +110 -20
  15. holmes/main.py +13 -18
  16. holmes/plugins/destinations/slack/plugin.py +19 -9
  17. holmes/plugins/prompts/_fetch_logs.jinja2 +11 -1
  18. holmes/plugins/prompts/_general_instructions.jinja2 +6 -37
  19. holmes/plugins/prompts/_permission_errors.jinja2 +6 -0
  20. holmes/plugins/prompts/_runbook_instructions.jinja2 +13 -5
  21. holmes/plugins/prompts/_toolsets_instructions.jinja2 +22 -14
  22. holmes/plugins/prompts/generic_ask.jinja2 +6 -0
  23. holmes/plugins/prompts/generic_ask_conversation.jinja2 +1 -0
  24. holmes/plugins/prompts/generic_ask_for_issue_conversation.jinja2 +1 -0
  25. holmes/plugins/prompts/generic_investigation.jinja2 +1 -0
  26. holmes/plugins/prompts/kubernetes_workload_ask.jinja2 +0 -2
  27. holmes/plugins/runbooks/__init__.py +20 -4
  28. holmes/plugins/toolsets/__init__.py +7 -9
  29. holmes/plugins/toolsets/aks-node-health.yaml +0 -8
  30. holmes/plugins/toolsets/argocd.yaml +4 -1
  31. holmes/plugins/toolsets/azure_sql/apis/azure_sql_api.py +1 -1
  32. holmes/plugins/toolsets/azure_sql/apis/connection_failure_api.py +2 -0
  33. holmes/plugins/toolsets/confluence.yaml +1 -1
  34. holmes/plugins/toolsets/datadog/datadog_metrics_instructions.jinja2 +54 -4
  35. holmes/plugins/toolsets/datadog/toolset_datadog_metrics.py +150 -6
  36. holmes/plugins/toolsets/kubernetes.yaml +6 -0
  37. holmes/plugins/toolsets/prometheus/prometheus.py +2 -6
  38. holmes/plugins/toolsets/prometheus/prometheus_instructions.jinja2 +2 -2
  39. holmes/plugins/toolsets/runbook/runbook_fetcher.py +65 -6
  40. holmes/plugins/toolsets/service_discovery.py +1 -1
  41. holmes/plugins/toolsets/slab.yaml +1 -1
  42. holmes/utils/colors.py +7 -0
  43. holmes/utils/console/consts.py +5 -0
  44. holmes/utils/console/result.py +2 -1
  45. holmes/utils/keygen_utils.py +6 -0
  46. holmes/version.py +2 -2
  47. holmesgpt-0.12.4.dist-info/METADATA +258 -0
  48. {holmesgpt-0.12.3a1.dist-info → holmesgpt-0.12.4.dist-info}/RECORD +51 -47
  49. holmesgpt-0.12.3a1.dist-info/METADATA +0 -400
  50. {holmesgpt-0.12.3a1.dist-info → holmesgpt-0.12.4.dist-info}/LICENSE.txt +0 -0
  51. {holmesgpt-0.12.3a1.dist-info → holmesgpt-0.12.4.dist-info}/WHEEL +0 -0
  52. {holmesgpt-0.12.3a1.dist-info → holmesgpt-0.12.4.dist-info}/entry_points.txt +0 -0
@@ -1,5 +1,8 @@
  # In general

+ {% if cluster_name -%}
+ * You are running on cluster {{ cluster_name }}.
+ {%- endif %}
  * when it can provide extra information, first run as many tools as you need to gather more information, then respond.
  * if possible, do so repeatedly with different tool calls each time to gather more information.
  * do not stop investigating until you are at the final root cause you are able to find.
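The new `cluster_name` block above is a plain Jinja2 conditional, so the cluster line only appears when the variable is supplied at render time. A minimal stand-alone sketch of that behavior (the variable value is illustrative, not from the package):

```python
# Minimal sketch: how the cluster_name conditional renders, assuming jinja2 is installed.
from jinja2 import Template

fragment = Template(
    "{% if cluster_name -%}\n"
    "* You are running on cluster {{ cluster_name }}.\n"
    "{%- endif %}"
)

print(fragment.render(cluster_name="prod-eu-1"))  # illustrative name; the line is emitted
print(repr(fragment.render(cluster_name=None)))   # falsy value; renders to an empty string
```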
@@ -9,7 +12,8 @@
  * in this case, try to find substrings or search for the correct spellings
  * always provide detailed information like exact resource names, versions, labels, etc
  * even if you found the root cause, keep investigating to find other possible root causes and to gather data for the answer like exact names
- * if a runbook url is present as well as tool that can fetch it, you MUST fetch the runbook before beginning your investigation.
+ * if a runbook url is present you MUST fetch the runbook before beginning your investigation
+ * when the user mentions any operational issue (high CPU, memory issues, database down, application errors, etc.), ALWAYS check if there's a matching runbook in the catalog first
  * if you don't know, say that the analysis was inconclusive.
  * if there are multiple possible causes list them in a numbered list.
  * there will often be errors in the data that are not relevant or that do not have an impact - ignore them in your conclusion if you were not able to tie them to an actual error.
@@ -32,42 +36,7 @@

  {% include '_toolsets_instructions.jinja2' %}

- {% include '_fetch_logs.jinja2' %}
-
- # Handling Permission Errors
-
- If during the investigation you encounter a permissions error (e.g., `Error from server (Forbidden):`), **ALWAYS** follow these steps to ensure a thorough resolution:
- 1.**Analyze the Error Message**
- - Identify the missing resource, API group, and verbs from the error details.
- - Never stop at reporting the error
- - Proceed with an in-depth investigation.
- 2.**Locate the Relevant Helm Release**
- Check if Helm tools are available, if they are available always use Helm commands to help user find the release associated with the Holmes pod:
- - Run `helm list -A | grep holmes` to identify the release name.
- - Run `helm get values <RELEASE_NAME> -n <NAMESPACE>` to retrieve details such as `customClusterRoleRules` and `clusterName`.
- If Helm tools are unavailable, skip this step.
- 3. **Check for Missing Permissions**
- - Check for a cluster role with <RELEASE_NAME>-holmes-cluster-role in its name and a service account with <RELEASE_NAME>-holmes-service-account in its name to troubleshoot missing permissions where release name is the name you found earlier if helm tools are available (If the exact cluster role or service account isn't found, search for similar or related names, including variations or prefixes/suffixes that might be used in the cluster.)
- - Focus on identifying absent permissions that align with the error message.
- 4. **Update the Configuration**
- If necessary permissions are absent both in customClusterRoleRules and the cluster role mentioned previously, ALWAYS advise the user to update their configuration by modifying the `generated_values.yaml` file as follows:
- ```
- holmes:
- customClusterRoleRules:
- - apiGroups: ["<API_GROUP>"]
- resources: ["<RESOURCE_1>", "<RESOURCE_2>"]
- verbs: ["<VERB_1>", "<VERB_2>", "<VERB_3>"]
- ```
- After that instruct them to apply the changes with::
- ```
- helm upgrade <RELEASE_NAME> robusta/robusta --values=generated_values.yaml --set clusterName=<YOUR_CLUSTER_NAME>
- ```
- 5. **Fallback Guidelines**
- - If you cannot determine the release or cluster name, use placeholders `<RELEASE_NAME>` and `<YOUR_CLUSTER_NAME>`.
- - While you should attempt to retrieve details using Helm commands, do **not** direct the user to execute these commands themselves.
- Reminder:
- * Always adhere to this process, even if Helm tools are unavailable.
- * Strive for thoroughness and precision, ensuring the issue is fully addressed.
+ {% include '_permission_errors.jinja2' %}

  # Special cases and how to reply

@@ -0,0 +1,6 @@
+ # Handling Permission Errors
+
+ If during the investigation you encounter a permissions error (e.g., `Error from server (Forbidden):`), **ALWAYS** follow these steps to ensure a thorough resolution:
+ 1. Analyze the Error Message: Identify the missing resource, API group, and verbs from the error details.
+ 2. Check which user/service account you're running with and what permissions it has
+ 3. Report this to the user and refer them to https://robusta-dev.github.io/holmesgpt/data-sources/permissions/
@@ -1,13 +1,21 @@
  {% if runbooks and runbooks.catalog|length > 0 %}
  # Runbook Selection

- ## Available Runbooks
+ You (HolmesGPT) have access to a set of runbooks that provide step-by-step troubleshooting instructions for various known issues.
+ If one of the following runbooks relates to the user's issue, you MUST fetch it with the fetch_runbook tool.
+
+ ## Available Runbooks for fetch_runbook tool
  {% for runbook in runbooks.catalog %}
  ### description: {{ runbook.description }}
  link: {{ runbook.link }}
  {% endfor %}
- ALWAYS try to find the runbooks that can provide troubleshooting instructions when the user describes an operational issue, debugging scenario, or asks for step‑by‑step troubleshooting.
- To get the runbook details, use `fetch_runbook` tool by comparing the runbook description with the user prompt.
- ALWAYS follow the steps described in the runbook.
- If you decided not to follow one or more steps, ALWAYS explain why.
+
+ If there is a runbook that MIGHT match the user's issue, you MUST:
+ 1. Fetch the runbook with the `fetch_runbook` tool.
+ 2. Decide based on the runbook's contents if it is relevant or not.
+ 3. If it seems relevant, inform the user that you accesses a runbook and will use it to troubleshoot the issue.
+ 4. To the maximum extent possible, follow the runbook instructions step-by-step.
+ 5. Provide a detailed report of the steps you performed, including any findings or errors encountered.
+ 6. If a runbook step requires tools or integrations you don't have access to tell the user that you cannot perform that step due to missing tools.
+
  {%- endif -%}
@@ -1,3 +1,5 @@
+ # Toolset Setup and Configuration Instructions
+
  {%- set enabled_toolsets_with_instructions = [] -%}
  {%- set disabled_toolsets = [] -%}

@@ -9,8 +11,10 @@
  {%- endif -%}
  {%- endfor -%}

- {% if enabled_toolsets_with_instructions|list -%}
  # Available Toolsets
+ {% include '_fetch_logs.jinja2' %}
+
+ {% if enabled_toolsets_with_instructions|list %}
  {%- for toolset in enabled_toolsets_with_instructions -%}
  {% if toolset.llm_instructions %}

@@ -19,13 +23,13 @@
  {%- endif -%}
  {%- endfor -%}
  {%- endif -%}
- {% if disabled_toolsets %}
- # Disabled & failed Toolsets

+ # Disabled & failed Toolsets
+ {% if disabled_toolsets %}
  The following toolsets are either disabled or failed to initialize:
  {% for toolset in disabled_toolsets %}
  * toolset "{{ toolset.name }}": {{ toolset.description }}
- {%- if toolset.status == "failed" %}
+ {%- if toolset.status.value == "failed" %}
  * status: The toolset is enabled but misconfigured and failed to initialize.
  {%- if toolset.error %}
  * error: {{ toolset.error }}
@@ -37,20 +41,24 @@ The following toolsets are either disabled or failed to initialize:
  * setup instructions: {{ toolset.docs_url }}
  {%- endif -%}
  {%- endfor %}
+ {% else %}
+ <no toolsets are disabled or failed>
+ {% endif %}

  If you need a toolset to access a system that you don't otherwise have access to:
  - Check the list of toolsets above and see if any loosely match the needs
  - If the toolset has `status: failed`: Tell the user and copy the error in your response for the user to see
- - If the toolset has `status: disabled`: Ask the user to configure the it.
+ - If the toolset has `status: disabled`: Ask the user to configure it.
  - Share the setup instructions URL with the user
- - Invoke the tool fetch_webpage on the toolset URL and summarize setup steps
- - If there are no relevant toolsets in the list below, tell the user that you are missing an integration to access XYZ:
- you should give an answer similar to "I don't have access to <system>. Please add a Holmes integration for <system> so
- that I can investigate this."
- {% else %}
+ - If there are no relevant toolsets in the list above, tell the user that you are missing an integration to access XYZ:
+ You should give an answer similar to "I don't have access to <system>. To add a HolmesGPT integration for <system> you can [connect an MCP server](https://robusta-dev.github.io/holmesgpt/data-sources/remote-mcp-servers/) or add a [custom toolset](https://robusta-dev.github.io/holmesgpt/data-sources/custom-toolsets/)."

- # Disabled & failed Toolsets
+ Likewise, if users ask about setting up or configuring integrations (e.g., "How can I give you access to ArgoCD applications?"):
+ ALWAYS check if there's a disabled or failed toolset that matches what the user is asking about. If you find one:
+ 1. If the toolset has a specific documentation URL (toolset.docs_url), ALWAYS direct them to that URL first
+ 2. If no specific documentation exists, then direct them to the general Holmes documentation:
+ - For all toolset configurations: https://robusta-dev.github.io/holmesgpt/data-sources/
+ - For custom toolsets: https://robusta-dev.github.io/holmesgpt/data-sources/custom-toolsets/
+ - For remote MCP servers: https://robusta-dev.github.io/holmesgpt/data-sources/remote-mcp-servers/

- If you need a toolset to access a system that you don't otherwise have access to, tell the user that you are missing an integration to access XYZ.
- You should give an answer similar to "I don't have access to <system>. Please add a Holmes integration for <system> so that I can investigate this."
- {%- endif -%}
+ When providing configuration guidance, always prefer the specific toolset documentation URL when available.
@@ -1,8 +1,10 @@
  You are a tool-calling AI assist provided with common devops and IT tools that you can use to troubleshoot problems or answer questions.
  Whenever possible you MUST first use tools to investigate then answer the question.
+ Ask for multiple tool calls at the same time as it saves time for the user.
  Do not say 'based on the tool output' or explicitly refer to tools at all.
  If you output an answer and then realize you need to call more tools or there are possible next steps, you may do so by calling tools at that point in time.
  If you have a good and concrete suggestion for how the user can fix something, tell them even if not asked explicitly
+ {% include '_current_date_time.jinja2' %}

  Use conversation history to maintain continuity when appropriate, ensuring efficiency in your responses.

@@ -34,3 +36,7 @@ Relevant logs:
  ```

  Validation error led to unhandled Java exception causing a crash.
+
+ {% if system_prompt_additions %}
+ {{ system_prompt_additions }}
+ {% endif %}
@@ -1,5 +1,6 @@
  You are a tool-calling AI assist provided with common devops and IT tools that you can use to troubleshoot problems or answer questions.
  Whenever possible you MUST first use tools to investigate then answer the question.
+ Ask for multiple tool calls at the same time as it saves time for the user.
  Do not say 'based on the tool output' or explicitly refer to tools at all.
  If you output an answer and then realize you need to call more tools or there are possible next steps, you may do so by calling tools at that point in time.
  If you have a good and concrete suggestion for how the user can fix something, tell them even if not asked explicitly
@@ -1,5 +1,6 @@
  You are a tool-calling AI assist provided with common devops and IT tools that you can use to troubleshoot problems or answer questions.
  Whenever possible you MUST first use tools to investigate then answer the question.
+ Ask for multiple tool calls at the same time as it saves time for the user.
  Do not say 'based on the tool output' or explicitly refer to tools at all.
  If you output an answer and then realize you need to call more tools or there are possible next steps, you may do so by calling tools at that point in time.
  {% include '_current_date_time.jinja2' %}
@@ -1,5 +1,6 @@
  You are a tool-calling AI assist provided with common devops and IT tools that you can use to troubleshoot problems or answer questions.
  Whenever possible you MUST first use tools to investigate then answer the question.
+ Ask for multiple tool calls at the same time as it saves time for the user.
  Do not say 'based on the tool output'

  Provide an terse analysis of the following {{ issue.source_type }} alert/issue and why it is firing.
@@ -43,8 +43,6 @@ In general:

  {% include '_toolsets_instructions.jinja2' %}

- {% include '_fetch_logs.jinja2' %}
-
  Style guide:
  * Be painfully concise.
  * Leave out "the" and filler words when possible.
@@ -11,6 +11,7 @@ from pydantic import BaseModel, PrivateAttr
  from holmes.utils.pydantic_utils import RobustaBaseConfig, load_model_from_file

  THIS_DIR = os.path.abspath(os.path.dirname(__file__))
+ DEFAULT_RUNBOOK_SEARCH_PATH = THIS_DIR

  CATALOG_FILE = "catalog.json"

@@ -94,7 +95,22 @@ def load_runbook_catalog() -> Optional[RunbookCatalog]:
  return None


- def get_runbook_by_path(runbook_relative_path: str) -> str:
- runbook_folder = os.path.dirname(os.path.realpath(__file__))
- runbook_path = os.path.join(runbook_folder, runbook_relative_path)
- return runbook_path
+ def get_runbook_by_path(
+ runbook_relative_path: str, search_paths: List[str]
+ ) -> Optional[str]:
+ """
+ Find a runbook by searching through provided paths.
+
+ Args:
+ runbook_relative_path: The relative path to the runbook
+ search_paths: Optional list of directories to search. If None, uses default runbook folder.
+
+ Returns:
+ Full path to the runbook if found, None otherwise
+ """
+ for search_path in search_paths:
+ runbook_path = os.path.join(search_path, runbook_relative_path)
+ if os.path.exists(runbook_path):
+ return runbook_path
+
+ return None
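The reworked `get_runbook_by_path` now takes an explicit list of search directories and returns `None` on a miss, so callers have to handle that case. A hypothetical call site under those assumptions (only `DEFAULT_RUNBOOK_SEARCH_PATH` and `get_runbook_by_path` come from the diff; the runbook filename and extra directory are illustrative):

```python
# Hypothetical caller of the new signature; the file name and extra directory are made up.
from holmes.plugins.runbooks import DEFAULT_RUNBOOK_SEARCH_PATH, get_runbook_by_path

extra_dirs = ["/etc/holmes/runbooks"]  # e.g. a user-provided runbook directory
path = get_runbook_by_path("upgrade/rollback.md", [DEFAULT_RUNBOOK_SEARCH_PATH, *extra_dirs])

if path is None:
    print("Runbook not found in any configured search path")
else:
    with open(path) as f:
        print(f.read())
```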
@@ -3,14 +3,16 @@ import os
  import os.path
  from typing import Any, List, Optional, Union

- from holmes.common.env_vars import USE_LEGACY_KUBERNETES_LOGS
  import yaml # type: ignore
  from pydantic import ValidationError

- from holmes.plugins.toolsets.azure_sql.azure_sql_toolset import AzureSQLToolset
  import holmes.utils.env as env_utils
+ from holmes.common.env_vars import USE_LEGACY_KUBERNETES_LOGS
  from holmes.core.supabase_dal import SupabaseDal
  from holmes.core.tools import Toolset, ToolsetType, ToolsetYamlFromConfig, YAMLToolset
+ from holmes.plugins.toolsets.atlas_mongodb.mongodb_atlas import MongoDBAtlasToolset
+ from holmes.plugins.toolsets.azure_sql.azure_sql_toolset import AzureSQLToolset
+ from holmes.plugins.toolsets.bash.bash_toolset import BashExecutorToolset
  from holmes.plugins.toolsets.coralogix.toolset_coralogix_logs import (
  CoralogixLogsToolset,
  )
@@ -18,18 +20,15 @@ from holmes.plugins.toolsets.datadog.toolset_datadog_logs import DatadogLogsTool
  from holmes.plugins.toolsets.datadog.toolset_datadog_metrics import (
  DatadogMetricsToolset,
  )
- from holmes.plugins.toolsets.datadog.toolset_datadog_traces import (
- DatadogTracesToolset,
- )
- from holmes.plugins.toolsets.kubernetes_logs import KubernetesLogsToolset
+ from holmes.plugins.toolsets.datadog.toolset_datadog_traces import DatadogTracesToolset
  from holmes.plugins.toolsets.git import GitToolset
  from holmes.plugins.toolsets.grafana.toolset_grafana import GrafanaToolset
- from holmes.plugins.toolsets.bash.bash_toolset import BashExecutorToolset
  from holmes.plugins.toolsets.grafana.toolset_grafana_loki import GrafanaLokiToolset
  from holmes.plugins.toolsets.grafana.toolset_grafana_tempo import GrafanaTempoToolset
  from holmes.plugins.toolsets.internet.internet import InternetToolset
  from holmes.plugins.toolsets.internet.notion import NotionToolset
  from holmes.plugins.toolsets.kafka import KafkaToolset
+ from holmes.plugins.toolsets.kubernetes_logs import KubernetesLogsToolset
  from holmes.plugins.toolsets.mcp.toolset_mcp import RemoteMCPToolset
  from holmes.plugins.toolsets.newrelic import NewRelicToolset
  from holmes.plugins.toolsets.opensearch.opensearch import OpenSearchToolset
@@ -38,7 +37,6 @@ from holmes.plugins.toolsets.opensearch.opensearch_traces import OpenSearchTrace
  from holmes.plugins.toolsets.prometheus.prometheus import PrometheusToolset
  from holmes.plugins.toolsets.rabbitmq.toolset_rabbitmq import RabbitMQToolset
  from holmes.plugins.toolsets.robusta.robusta import RobustaToolset
- from holmes.plugins.toolsets.atlas_mongodb.mongodb_atlas import MongoDBAtlasToolset
  from holmes.plugins.toolsets.runbook.runbook_fetcher import RunbookToolset
  from holmes.plugins.toolsets.servicenow.servicenow import ServiceNowToolset

@@ -156,7 +154,7 @@ def load_toolsets_from_config(
  toolset_type = config.get("type", ToolsetType.BUILTIN.value)
  # MCP server is not a built-in toolset, so we need to set the type explicitly
  validated_toolset: Optional[Toolset] = None
- if toolset_type is ToolsetType.MCP:
+ if toolset_type == ToolsetType.MCP.value:
  validated_toolset = RemoteMCPToolset(**config, name=name)
  elif strict_check:
  validated_toolset = YAMLToolset(**config, name=name) # type: ignore
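The small change above matters because `config.get("type", ...)` yields the raw string from YAML, and an identity check (`is`) against an enum member never matches a string. A stand-alone sketch of the distinction, using a stand-in enum rather than the real `ToolsetType` (member values are illustrative):

```python
from enum import Enum


class ToolsetKind(str, Enum):  # stand-in for holmes.core.tools.ToolsetType; values are made up
    BUILTIN = "built_in"
    MCP = "mcp_server"


config = {"type": "mcp_server"}  # the value arrives from YAML as a plain string
toolset_type = config.get("type", ToolsetKind.BUILTIN.value)

print(toolset_type is ToolsetKind.MCP)        # False: a str is never identical to the enum member
print(toolset_type == ToolsetKind.MCP.value)  # True: compares the underlying string value
```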
@@ -55,11 +55,3 @@ toolsets:
  user_description: "lists all VMSS names in {{ NODE_RESOURCE_GROUP }}"
  command: |
  az vmss list -g {{ NODE_RESOURCE_GROUP }} --query '[*].name' -o tsv --only-show-errors
- - name: "vmss_run_command"
- description: |
- Execute a shell command on a specific VMSS VM instance using az vmss run-command.
- VM_ID is the instance ID of the VMSS, which can be derived from node names.
- Prerequisites: get_node_resource_group, list_vmss_names
- user_description: "run command {{ SHELL_COMMAND }} on VM #{{ VM_ID }} of VMSS {{ VMSS_NAME }}"
- command: |
- az vmss run-command invoke --resource-group {{ NODE_RESOURCE_GROUP }} --name {{ VMSS_NAME }} --instance-id {{ VM_ID }} --command-id RunShellScript --scripts {{ SHELL_COMMAND }}
@@ -6,13 +6,16 @@ toolsets:
  llm_instructions: |
  You have access to a set of ArgoCD tools for debugging Kubernetes application deployments.
  If an application's name does not exist in kubernetes, it may exist in argocd: call the tool `argocd_app_list` to find it.
+ IMPORTANT: If you are asked about health issues, ALWAYS check if the argo cd apps are in a healthy state.
+ If some resource is out of sync, ALWAYS show the diff, using the argocd_app_diff tool, between the desired state and the current state.
  These tools help you investigate issues with GitOps-managed applications in your Kubernetes clusters.
- ALWAYS follow these steps:
+ In addition to the general investigation steps, ALWAYS follow these steps as well:
  1. List the applications
  2. Retrieve the application status and its config
  3. Retrieve the application's manifests for issues
  4. Compare the ArgoCD config with the kubernetes status using kubectl tools
  5. Check for resources mismatch between argocd and kubernetes
+ 6. If an application is OutOfSync, pull the diff using the argocd_app_diff tool
  {% if tool_names|list|length > 0 %}
  The following commands are available to introspect into ArgoCD: {{ ", ".join(tool_names) }}
  {% endif %}
@@ -179,7 +179,7 @@ class AzureSQLAPIClient:
  server_name=server_name,
  database_name=database_name,
  )
- return tuning.as_dict()
+ return dict(tuning.as_dict())

  def get_top_cpu_queries(
  self,
@@ -134,6 +134,8 @@ class ConnectionFailureAPI:
  for metric in metrics.value:
  if metric.timeseries:
  for timeseries in metric.timeseries:
+ if timeseries.data is None:
+ continue
  for data_point in timeseries.data:
  if data_point.time_stamp:
  metric_values.append(
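The new `timeseries.data is None` guard keeps the inner loop from iterating a missing series. A self-contained sketch of the failure mode it avoids, with `SimpleNamespace` standing in for the Azure SDK metric objects:

```python
from types import SimpleNamespace

# Stand-ins for Azure Monitor SDK objects: one series carries data points, one carries None.
series_with_data = SimpleNamespace(data=[SimpleNamespace(time_stamp="2024-01-01T00:00:00Z")])
series_without_data = SimpleNamespace(data=None)

metric_values = []
for timeseries in (series_with_data, series_without_data):
    if timeseries.data is None:  # without this guard: TypeError: 'NoneType' object is not iterable
        continue
    for data_point in timeseries.data:
        if data_point.time_stamp:
            metric_values.append(data_point.time_stamp)

print(metric_values)  # ['2024-01-01T00:00:00Z']
```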
@@ -14,6 +14,6 @@ toolsets:

  tools:
  - name: "fetch_confluence_url"
- description: "Fetch a page in confluence. Use this to fetch confluence runbooks if they are present before starting your investigation."
+ description: "Fetch a page in confluence."
  user_description: "fetch confluence page {{ confluence_page_id }}"
  command: "curl -u ${CONFLUENCE_USER}:${CONFLUENCE_API_KEY} -X GET -H 'Content-Type: application/json' ${CONFLUENCE_BASE_URL}/wiki/rest/api/content/{{ confluence_page_id }}?expand=body.storage"
@@ -15,12 +15,62 @@ When investigating metrics-related issues:
  - Provides metric type (gauge/count/rate), unit, and description
  - Accepts comma-separated list for batch queries

+ 4. **Use `list_datadog_metric_tags`** to understand which tags are available for a given metric
+ - Provides a set of tags and aggregations
+ - Can help to build the correct `tag_filter`, to find which metrics are available for a given resource
+
+ ### General guideline
+ - This toolset is used to generate visualizations and graphs.
+ - Assume the resource should have metrics. If metrics not found, try to adjust tag filters
+ - IMPORTANT: This toolset DOES NOT support promql queries.
+
+ ### CRITICAL: Pod Name Resolution Workflow
+ When users ask for metrics about a deployment, service, or workload (e.g., "my-workload", "nginx-deployment"):
+
+ **ALWAYS follow this two-step process:**
+ 1. **First**: Use `kubectl_find_resource` to find the actual pod names
+ - Example: `kubectl_find_resource` with "my-workload" → finds pods like "my-workload-8f8cdfxyz-c7zdr"
+ 2. **Then**: Use those specific pod names in Datadog queries
+ - Correct: `container.cpu.usage{pod_name:my-workload-8f8cdfxyz-c7zdr}`
+ - WRONG: `container.cpu.usage{pod_name:my-workload}` ← This will return no data!
+
+ **Why this matters:**
+ - Pod names in Datadog are the actual Kubernetes pod names (with random suffixes)
+ - Deployment/service names are NOT pod names
+ - Using deployment names as pod_name filters will always return empty results
+
  ### Time Parameters
  - Use RFC3339 format: `2023-03-01T10:30:00Z`
  - Or relative seconds: `-3600` for 1 hour ago
  - Defaults to 1 hour window if not specified

- ### Common Patterns
- - CPU investigation: First list metrics with `tag_filter:kube_node_name:nodename`, then query specific metrics
- - Memory issues: Look for `system.mem.*` or `kubernetes.memory.*` metrics
- - Container metrics: Filter by pod/container tags
+ ### Common Investigation Patterns
+
+ **For Pod/Container Metrics (MOST COMMON):**
+ 1. User asks: "Show CPU for my-workload"
+ 2. Use `kubectl_find_resource` → find pod "my-workload-abc123-xyz"
+ 3. Query Datadog: `container.cpu.usage{pod_name:my-workload-abc123-xyz}`
+
+ **For Node-level Metrics:**
+ 1. Use `tag_filter:kube_node_name:nodename` to filter by node
+ 2. Query system-level metrics like `system.cpu.user{kube_node_name:worker-1}`
+
+ **For Service-level Metrics:**
+ 1. First resolve service to pods using `kubectl_find_resource`
+ 2. Query metrics for all pods belonging to that service
+ 3. Use namespace filtering: `tag_filter:kube_namespace:default`
+
+
+ # Handling queries results
+ * ALWAYS embed the execution results into your answer
+ * You only need to embed the partial result in your response. Include the "tool_name" and "random_key". For example: << {"type": "datadogql", "tool_name": "query_datadog_metrics", "random_key": "92jf2hf"} >>
+ * Post processing will parse your response, re-run the query from the tool output and create a chart visible to the user
+ * You MUST ensure that the query is successful.
+ * ALWAYS embed a DataDog graph in the response. The graph should visualize data related to the incident.
+ * Embed at most 2 graphs
+ * When embedding multiple graphs, always add line spacing between them
+ For example:
+
+ <<{"type": "datadogql", "tool_name": "query_datadog_metrics", "random_key": "lBaA"}>>
+
+ <<{"type": "datadogql", "tool_name": "query_datadog_metrics", "random_key": "IKtq"}>>