@sentry/junior-datadog 0.25.0

package/LICENSE ADDED
@@ -0,0 +1,201 @@
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright [yyyy] [name of copyright owner]
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
package/README.md ADDED
@@ -0,0 +1,44 @@
+ # @sentry/junior-datadog
+
+ `@sentry/junior-datadog` adds read-only Datadog telemetry workflows to Junior through Datadog's hosted MCP server.
+
+ Install it alongside `@sentry/junior`:
+
+ ```bash
+ pnpm add @sentry/junior @sentry/junior-datadog
+ ```
+
+ Then register the plugin package in `juniorNitro(...)`:
+
+ ```ts title="nitro.config.ts"
+ juniorNitro({
+ pluginPackages: ["@sentry/junior-datadog"],
+ });
+ ```
+
+ This package does not use `DD_API_KEY`, `DD_APP_KEY`, or a shared workspace integration. Each user connects their own Datadog account the first time Junior calls a Datadog MCP tool. Junior sends the OAuth link privately and resumes the thread automatically after the user authorizes.
+
+ Junior intentionally keeps this package read-only by limiting the MCP tool surface to search, fetch, and log analytics tools. The plugin does not expose notebook writes, monitor edits, or other mutating Datadog tools.
+
+ ## Datadog site
+
+ The packaged manifest defaults to the US1 endpoint (`mcp.datadoghq.com`) and enables the `core`, `apm`, and `error-tracking` toolsets. Teams on other Datadog sites (US3, US5, EU, AP1, AP2, GovCloud) set `DATADOG_SITE` in their Junior deployment env to their site host (e.g. `us5.datadoghq.com`, `datadoghq.eu`, `ddog-gov.com`). No code changes or plugin copy needed. See the [Datadog plugin docs](https://junior.sentry.dev/extend/datadog-plugin/) for the full site table.
+
+ ## Optional channel defaults
+
+ If a Slack channel usually investigates the same Datadog environment or service, store that as a conversation-scoped default:
+
+ ```bash
+ jr-rpc config set datadog.env prod
+ jr-rpc config set datadog.service checkout
+ ```
+
+ These defaults are optional fallbacks. If a user names a different env or service in a request, Junior should follow the explicit request instead.
+
+ ## Auth model
+
+ - Datadog MCP requires user-based OAuth (OAuth 2.1 + PKCE) and does not accept shared bearer tokens here.
+ - This package is not suitable for fully headless or unattended automation.
+ - Users can disconnect from Junior App Home with `Unlink`, or by asking Junior to disconnect Datadog.
+
+ Full setup guide: https://junior.sentry.dev/extend/datadog-plugin/
package/package.json ADDED
@@ -0,0 +1,13 @@
+ {
+ "name": "@sentry/junior-datadog",
+ "version": "0.25.0",
+ "private": false,
+ "publishConfig": {
+ "access": "public"
+ },
+ "type": "module",
+ "files": [
+ "plugin.yaml",
+ "skills"
+ ]
+ }
package/plugin.yaml ADDED
@@ -0,0 +1,36 @@
+ name: datadog
+ description: Query Datadog telemetry (logs, metrics, traces, monitors, incidents, dashboards) via Datadog's hosted MCP server
+
+ config-keys:
+ - env
+ - service
+
+ # Datadog orgs are region-pinned. The MCP hostname must match the customer's
+ # Datadog site. Non-US1 operators set DATADOG_SITE to their site host (e.g.
+ # `us5.datadoghq.com`, `datadoghq.eu`, `ap1.datadoghq.com`, `ddog-gov.com`).
+ # US1 operators can leave DATADOG_SITE unset and the default applies.
+ env-vars:
+ DATADOG_SITE:
+ default: datadoghq.com
+
+ mcp:
+ url: https://mcp.${DATADOG_SITE}/api/unstable/mcp-server/mcp?toolsets=core,apm,error-tracking
+ allowed-tools:
+ - analyze_datadog_logs
+ - get_datadog_incident
+ - get_datadog_metric
+ - get_datadog_metric_context
+ - get_datadog_notebook
+ - get_datadog_trace
+ - search_datadog_dashboards
+ - search_datadog_events
+ - search_datadog_hosts
+ - search_datadog_incidents
+ - search_datadog_logs
+ - search_datadog_metrics
+ - search_datadog_monitors
+ - search_datadog_notebooks
+ - search_datadog_rum_events
+ - search_datadog_service_dependencies
+ - search_datadog_services
+ - search_datadog_spans
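Reviewer note, not part of the published package: the `${DATADOG_SITE}` substitution in the manifest URL can be illustrated with a minimal TypeScript sketch. The helper name `resolveMcpUrl` is hypothetical, and treating an empty string the same as an unset variable is an assumption; how Junior actually expands the manifest is not shown in this diff.

```typescript
// Default site taken from the `env-vars.DATADOG_SITE.default` entry in plugin.yaml.
const DEFAULT_DATADOG_SITE = "datadoghq.com";

// Hypothetical helper mirroring the `${DATADOG_SITE}` placeholder in the manifest URL.
// Assumption: a blank value falls back to the default, like an unset variable.
function resolveMcpUrl(site?: string): string {
  const trimmed = site === undefined ? "" : site.trim();
  const host = trimmed !== "" ? trimmed : DEFAULT_DATADOG_SITE;
  return `https://mcp.${host}/api/unstable/mcp-server/mcp?toolsets=core,apm,error-tracking`;
}
```

With no argument this yields the US1 endpoint named in the README (`mcp.datadoghq.com`); passing `us5.datadoghq.com` or `datadoghq.eu` changes only the host portion.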
@@ -0,0 +1,69 @@
+ ---
+ name: datadog
+ description: Query live Datadog telemetry (logs, metrics, traces, spans, monitors, incidents, dashboards, services, hosts) through Datadog's hosted MCP server. Use when users ask to investigate production behavior in Datadog — searching logs, checking monitor status, inspecting traces or spans, looking up incidents, finding services, or correlating metrics. Do not use it for Sentry issues, repository/source-code work, or ticketing.
+ uses-config: datadog.env datadog.service
+ ---
+
+ # Datadog Operations
+
+ Use this skill for Datadog observability investigations in the harness.
+
+ ## Reference loading
+
+ Load references conditionally based on the request:
+
+ | Need | Read |
+ | -------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------- |
+ | Any Datadog operation | [references/api-surface.md](references/api-surface.md) |
+ | Log search, metric query, trace lookup, incidents | [references/common-use-cases.md](references/common-use-cases.md), [references/query-syntax.md](references/query-syntax.md) |
+ | Auth failures, permission errors, or tool failures | [references/troubleshooting-workarounds.md](references/troubleshooting-workarounds.md) |
+
+ ## Workflow
+
+ 1. Resolve the operation and target:
+
+ - Determine whether the request is a log search, metric query, trace/span inspection, monitor lookup, incident lookup, dashboard/notebook lookup, service/host listing, or service-dependency map.
+ - Prefer explicit env, service, host, monitor/incident IDs, trace IDs, or Datadog URLs when the user provides them.
+ - When the user did not specify a scope, treat `datadog.env` and `datadog.service` conversation config as optional defaults. Explicit user input always wins over config.
+ - Only set or change `datadog.env` and `datadog.service` when the user explicitly asks to store a default for this conversation or channel.
+ - If the request refers to an earlier telemetry item indirectly (an incident, trace, or monitor already mentioned in the thread), inspect the current thread for the existing ID or URL before asking the user to restate it.
+ - Ask one concise follow-up only when a search is genuinely under-specified, for example when the user asks about "errors" with no env, service, or time window hint and the thread has no prior context.
+
+ 2. Use the active Datadog MCP tools:
+
+ - `loadSkill` returns `available_tools` for this skill, including the exact `tool_name` values and input schemas exposed in this turn.
+ - Call those exact tool names directly. Use `searchTools` only if you need to rediscover or filter the active Datadog tools later in the same turn.
+ - Start narrow: pick the single most direct tool for the request before reaching for broader search.
+ - Known incident ID → `get_datadog_incident`
+ - Known trace ID → `get_datadog_trace`
+ - Known notebook ID → `get_datadog_notebook`
+ - Known metric name → `get_datadog_metric` (and `get_datadog_metric_context` when the user wants available tags or dimensions)
+ - For exploratory questions, prefer one `search_datadog_*` call with a tight query, then one follow-up fetch if needed.
+ - For "what is the current error rate / log volume / top offenders" style questions, prefer `analyze_datadog_logs` (SQL-style aggregation) over pulling raw log pages back through `search_datadog_logs`.
+ - For service-topology questions ("what calls checkout?", "what does the payment API depend on?"), prefer `search_datadog_service_dependencies` over manually stitching spans together.
+ - Use `search_datadog_monitors` for "is this alerting?" or "what is monitor X doing?"; use `search_datadog_incidents` / `get_datadog_incident` for incident context.
+ - Use `search_datadog_rum_events` only when the user asks about real-user / browser telemetry, not for backend issues.
+
+ 3. Bound every query:
+
+ - Always constrain time windows. Default to the last 15 minutes for "right now" questions and the last 24 hours for retrospective questions; otherwise use the window the user named.
+ - Always include `env:` when `datadog.env` is set or the user named an env.
+ - Always include `service:` when the user named a service or `datadog.service` is set and the tool is service-scoped.
+ - Cap result size. Prefer the default or small page sizes; do not page through thousands of logs when an aggregate tool answers the question.
+
+ 4. Report the result:
+
+ - Return the concrete answer first (counts, status, incident severity, trace timing, top offenders), then a short evidence block.
+ - Include Datadog deep links (e.g. `https://app.datadoghq.com/logs?query=...`, `https://app.datadoghq.com/apm/trace/<id>`, `https://app.datadoghq.com/incidents/<id>`) so Slack users can click through.
+ - Preserve interesting spans, log lines, or metric values inline only when they are the evidence for the answer. Do not dump raw tool output.
+ - Keep routine tool chatter silent. Do not narrate each MCP search or fetch step.
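Reviewer note, not part of the published package: the time-window rule in step 3 (15 minutes for "right now", 24 hours for retrospective questions, a user-named window otherwise) can be sketched as below. The names `QuestionKind`, `TimeWindow`, and `resolveTimeWindow` are hypothetical, and expressing the window as epoch milliseconds is an assumption; the actual MCP tool input schemas come from `loadSkill` and are not shown in this diff.

```typescript
type QuestionKind = "right-now" | "retrospective";

interface TimeWindow {
  fromMs: number;
  toMs: number;
}

const MINUTE_MS = 60 * 1000;

// Hypothetical helper: a window the user named (in minutes) always wins;
// otherwise fall back to 15 minutes ("right now") or 24 hours (retrospective).
function resolveTimeWindow(kind: QuestionKind, nowMs: number, userMinutes?: number): TimeWindow {
  const defaultMinutes = kind === "right-now" ? 15 : 24 * 60;
  const minutes = userMinutes !== undefined && userMinutes > 0 ? userMinutes : defaultMinutes;
  return { fromMs: nowMs - minutes * MINUTE_MS, toMs: nowMs };
}
```

Passing `nowMs` in explicitly keeps the helper deterministic, so the same inputs always produce the same bounds.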
+
+ ## Guardrails
+
+ - Read-only only in this skill. Do not create, edit, mute, or resolve monitors, incidents, notebooks, dashboards, SLOs, or feature flags — the plugin intentionally does not expose those tools.
+ - Log, RUM, APM, and incident payloads can contain PII or sensitive customer data. Quote only the minimum needed to answer the question. Do not paste full raw log bodies or span payloads when a summary plus a deep link is enough.
+ - If Datadog authorization is required, let the MCP OAuth flow pause and resume the thread automatically instead of asking the user to handle credentials manually.
+ - If a Datadog tool returns a generic `403`, `permission denied`, or similar, stop and tell the user the current Datadog connection could not access the requested resource. Do not guess at missing RBAC scopes.
+ - If Datadog responds with `429 Too Many Requests`, wait briefly and retry the same query once. If it still fails, report the throttle and stop.
+ - For large traces that the server marks as truncated, report that fact; do not pretend the shown spans are complete.
+ - Do not use this skill for Sentry issues, Linear/GitHub ticketing, or source-code investigation. Hand those off to the matching skill.
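Reviewer note, not part of the published package: the `403` and `429` guardrails above amount to a small decision rule, sketched here. `NextStep` and `nextStepForFailure` are hypothetical names, and the catch-all `"report-error"` branch is an assumption, since the skill text covers only the 403 and 429 cases.

```typescript
type NextStep = "retry-once" | "report-throttle" | "report-access-denied" | "report-error";

// Hypothetical helper. `attempt` is 1 for the first call and 2 for the single allowed retry.
function nextStepForFailure(httpStatus: number, attempt: number): NextStep {
  // 403: stop and report; do not guess at missing RBAC scopes.
  if (httpStatus === 403) return "report-access-denied";
  // 429: retry the same query once, then report the throttle and stop.
  if (httpStatus === 429) return attempt < 2 ? "retry-once" : "report-throttle";
  // Assumption: any other failure is surfaced to the user as-is.
  return "report-error";
}
```

The brief wait before the retry is left to the caller.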
@@ -0,0 +1,95 @@
+ # API Surface
+
+ Use this reference for any Datadog operation.
+
+ ## Runtime contract
+
+ - `loadSkill` returns `available_tools` for this skill, including the exact Datadog MCP `tool_name` values exposed in the current turn.
+ - Call those exact `tool_name` values directly.
+ - Use `searchTools` only when you need to rediscover or filter the active Datadog tools later in the same turn.
+ - Do not hardcode raw Datadog MCP tool names in advance. Tool discovery is part of the workflow.
+ - Return concrete findings plus Datadog deep links for navigation.
+
+ ## Provider surface
+
+ The packaged plugin points at Datadog's hosted remote MCP server and enables the `core`, `apm`, and `error-tracking` toolsets. Tool exposure is intentionally limited to the read-oriented surface below.
+
+ ### Tools exposed in this skill
+
+ | Tool | Intent |
+ | ------------------------------------- | ----------------------------------------------------------------------------------- |
+ | `search_datadog_logs` | Search raw log events by filter (service, host, env, status, query, time window). |
+ | `analyze_datadog_logs` | SQL-style aggregation over logs for counts, group-bys, top-N, and numeric analysis. |
+ | `search_datadog_events` | Datadog Events API: deployments, infra changes, alerts, status events. |
+ | `search_datadog_metrics` | List available metrics by name pattern, tag, or service. |
+ | `get_datadog_metric` | Query a specific metric time series over a time window. |
+ | `get_datadog_metric_context` | Fetch metadata and available tag dimensions for a metric. |
+ | `search_datadog_spans` | Search APM spans by service, operation, tags, time, error state. |
+ | `get_datadog_trace` | Fetch a full trace by trace ID. |
+ | `search_datadog_services` | List services from the Software Catalog with ownership and tag metadata. |
+ | `search_datadog_service_dependencies` | Upstream/downstream service map for a service, or services owned by a team. |
+ | `search_datadog_hosts` | List monitored hosts with tags and health state. |
+ | `search_datadog_monitors` | List monitors, their statuses, and alert conditions. |
+ | `search_datadog_incidents` | List incidents with severity, state, and metadata. |
+ | `get_datadog_incident` | Retrieve a specific incident by ID (timeline detail may be absent). |
+ | `search_datadog_dashboards` | List available dashboards. |
+ | `search_datadog_notebooks` | List Datadog notebooks by author, tag, or content. |
+ | `get_datadog_notebook` | Fetch a notebook by ID. |
+ | `search_datadog_rum_events` | Search Datadog RUM (Real User Monitoring) events for browser / frontend issues. |
+
+ ### Tools intentionally not exposed
+
+ - Notebook mutations (`create_datadog_notebook`, `edit_datadog_notebook`).
+ - Monitor, SLO, or incident mutations.
+ - Feature-flag, DBM, and security toolsets (the packaged URL does not request them).
+
+ If a user asks for a mutation, stop and explain that this skill is read-only.
+
+ ## Operation patterns
+
+ | Intent | Minimum tool pattern |
+ | ------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------- |
+ | "Why is service X failing right now?" | `search_datadog_monitors` + `analyze_datadog_logs` (top error counts by status or message) + optionally `get_datadog_trace` for one failing trace. |
+ | "Show me errors for service X in the last hour." | `analyze_datadog_logs` for counts/top-N first; only fall back to `search_datadog_logs` if the user asked for specific log lines. |
+ | "What is the status of monitor X?" | `search_datadog_monitors` with the monitor name/tag, then cite state + last transition time. |
+ | "Tell me about incident INC-123." | `get_datadog_incident` directly. Only fall back to `search_datadog_incidents` if no ID is known. |
+ | "What depends on the checkout service?" | `search_datadog_service_dependencies` scoped to that service. |
+ | "How did this trace spend its time?" | `get_datadog_trace` by ID; cite the slowest spans. |
+ | "What tag values are valid for this metric?" | `get_datadog_metric_context` before `get_datadog_metric`. |
+ | "Which hosts are unhealthy?" | `search_datadog_hosts` filtered by health/tags. |
+ | "Find slow page loads." | `search_datadog_rum_events` with a page/speed filter. |
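Reviewer note, not part of the published package: the "known ID goes straight to the matching `get_*` tool" rows above, combined with the runtime contract's rule to call only tools returned in `available_tools`, can be sketched as follows. `KnownTargets` and `pickDirectTool` are hypothetical names, and the priority order when several IDs are present is an assumption.

```typescript
interface KnownTargets {
  incidentId?: string;
  traceId?: string;
  notebookId?: string;
  metricName?: string;
}

// Hypothetical helper: returns the direct fetch tool for a known identifier, but only
// if that tool name was actually discovered in this turn (the `availableTools` list).
// Returns undefined when no direct tool applies, signalling a fallback to one tight
// `search_datadog_*` call.
function pickDirectTool(targets: KnownTargets, availableTools: string[]): string | undefined {
  const candidates: Array<[string | undefined, string]> = [
    [targets.incidentId, "get_datadog_incident"],
    [targets.traceId, "get_datadog_trace"],
    [targets.notebookId, "get_datadog_notebook"],
    [targets.metricName, "get_datadog_metric"],
  ];
  for (const [id, tool] of candidates) {
    if (id && availableTools.indexOf(tool) !== -1) return tool;
  }
  return undefined;
}
```

Checking against `availableTools` keeps the sketch consistent with the rule that tool discovery is part of the workflow: a tool name is used only after it has been seen in the current turn.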
+
+ ## Config helpers
+
+ Use these commands only when the user explicitly asks to inspect or store Datadog defaults for the current conversation/channel.
+
+ Resolve env default:
+
+ ```bash
+ jr-rpc config get datadog.env
+ ```
+
+ Set env default:
+
+ ```bash
+ jr-rpc config set datadog.env prod
+ ```
+
+ Resolve service default:
+
+ ```bash
+ jr-rpc config get datadog.service
+ ```
+
+ Set service default:
+
+ ```bash
+ jr-rpc config set datadog.service checkout
+ ```
+
+ ## Content expectations
+
+ - Translate Slack-thread wording into stable observability language (env, service, status, span, monitor, incident, host).
+ - Preserve material URLs present in the conversation (Sentry, GitHub, dashboards, prior Datadog links) when they add evidence.
+ - Include Datadog deep links (`https://app.datadoghq.com/...`) with the answer so users can click through.
+ - Label assumptions clearly when the thread leaves important details uncertain (chosen env, chosen time window, chosen service).
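Reviewer note, not part of the published package: a logs deep link in the `https://app.datadoghq.com/logs?query=...` shape named in the skill can be built by percent-encoding the query string. The helper `logsDeepLink` and its `appHost` parameter are hypothetical; the parameter exists only because orgs on other Datadog sites would presumably need a different app host, which this diff does not list.

```typescript
// Hypothetical helper: builds a logs deep link in the shape the skill text shows.
// `appHost` defaults to the US1 host used in the skill's examples.
function logsDeepLink(query: string, appHost: string = "app.datadoghq.com"): string {
  // encodeURIComponent escapes ":" and spaces so the query survives as one parameter.
  return `https://${appHost}/logs?query=${encodeURIComponent(query)}`;
}
```

For the query `service:checkout env:prod status:error`, the encoded parameter becomes `service%3Acheckout%20env%3Aprod%20status%3Aerror`.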
@@ -0,0 +1,84 @@
1
+ # Common Use Cases
2
+
3
+ Use these patterns to shape concrete Datadog requests.
4
+
5
+ ## 1. Triage "service X is failing right now"
6
+
7
+ - Default the time window to the last 15 minutes unless the user gave a different one.
8
+ - Constrain by `service:` and `env:` (explicit user input wins; fall back to `datadog.service` / `datadog.env`).
9
+ - `search_datadog_monitors` for `service:<x>` first — a firing monitor usually names the failure mode.
10
+ - Then `analyze_datadog_logs` to aggregate by status/level/message to find the top error shape.
11
+ - If the user asks "why", fetch one representative failing trace with `get_datadog_trace` or `search_datadog_spans` filtered to `service:<x> status:error`.
12
+ - Report monitor state, top error, and one failing trace link — not a dump.
13
+
14
+ ## 2. "Is this monitor alerting?"
15
+
16
+ - Use `search_datadog_monitors` with the monitor name, tag, or ID.
17
+ - Report state (`OK`, `Warn`, `Alert`, `No Data`), last transition, and the monitor link.
18
+ - If the monitor is in `No Data`, note that explicitly — it is not the same as healthy.
19
+
20
+ ## 3. "Tell me about incident INC-123" or "What is the status of the Redis incident?"
21
+
22
+ - If the user named the incident ID, go straight to `get_datadog_incident`.
23
+ - If only a topic was named, use `search_datadog_incidents` filtered by active/severity and scan for a match in the thread's time window.
24
+ - Report severity, state, owner, and link to the incident.
25
+ - Note that incident timeline detail may be absent from the MCP response; do not fabricate timeline entries.
26
+
27
+ ## 4. Log search with a specific query
28
+
29
+ - Default to `search_datadog_logs` only when the user explicitly wants raw log lines.
30
+ - Constrain with `service:`, `env:`, `status:`, `host:`, or `@<faceted_field>:` as appropriate (see `query-syntax.md`).
31
+ - Cap page size and time window to avoid huge responses.
32
+ - Report a short summary plus a Datadog logs deep link. Quote only the minimum log content.
33
+
34
+ ## 5. "What are the top errors for service X right now?"
35
+
36
+ - Prefer `analyze_datadog_logs` with a SQL-style `GROUP BY status` or `GROUP BY @http.status_code` / `GROUP BY @error.kind`.
37
+ - Report the top 3-5 buckets with counts, not an exhaustive table.
38
+ - Include the aggregated query link so the user can open the same view in Datadog.
39
+
40
+ ## 6. Trace inspection by ID
41
+
42
+ - Use `get_datadog_trace` with the trace ID.
43
+ - Cite the top 3 slowest or error-tagged spans (service, operation, duration, error state).
44
+ - If the server marks the trace as truncated, say so — some spans are not present.
45
+
46
+ ## 7. Span search for a known error pattern
47
+
48
+ - Use `search_datadog_spans` with explicit filters like `service:<x> status:error resource_name:"..."` and a bounded time window.
49
+ - Report span counts plus the most illustrative span's trace link.
50
+
51
+ ## 8. Service topology lookup
52
+
53
+ - Use `search_datadog_service_dependencies` to answer "what calls X?" or "what does X depend on?" or "what does team Y own?".
54
+ - Return the dependency list with service names and link back to the Service Catalog page.
55
+
56
+ ## 9. Metric lookup
57
+
58
+ - Use `search_datadog_metrics` when the user is unsure of the metric name.
59
+ - Once the metric name is known, use `get_datadog_metric` with the time window and tag filters.
60
+ - Use `get_datadog_metric_context` before querying if the user wants to know which tags (`env`, `service`, `host`, ...) are usable.
61
+ - Report headline numbers (current, peak, delta) plus a metric explorer link.
62
+
63
+ ## 10. Host health
64
+
65
+ - Use `search_datadog_hosts` filtered by tag, role, or `down:true`.
66
+ - Return counts, the list of unhealthy hosts (names + tags), and a host map link.
67
+
68
+ ## 11. RUM / frontend slowness
69
+
70
+ - Use `search_datadog_rum_events` only when the user asked about end-user / browser experience.
71
+ - Constrain to `@type:error`, slow page loads, or specific views; bound the time window.
72
+ - Do not use RUM for backend errors — those live in logs/APM.
73
+
74
+ ## 12. Dashboards and notebooks
75
+
76
+ - `search_datadog_dashboards` to list dashboards by topic, team, or tag — useful for "do we already have a dashboard for X?".
77
+ - `search_datadog_notebooks` + `get_datadog_notebook` for reading existing investigation notebooks.
78
+ - This skill does not create or edit dashboards or notebooks. If the user asks, stop and say so.
79
+
80
+ ## 13. Storing channel defaults
81
+
82
+ - Use `jr-rpc config set datadog.env <env>` only when the user explicitly asks to store an env default for the conversation/channel.
83
+ - Use `jr-rpc config set datadog.service <service>` only when the user explicitly asks to store a service default for the conversation/channel.
84
+ - Treat both defaults as optional fallbacks. Explicit user input wins whenever a request names a different env or service.
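The precedence rule for stored defaults can be sketched as follows. The `defaults` dictionary stands in for whatever `jr-rpc config` has stored; how the runtime actually exposes those values is not shown here, so treat the lookup as an assumption.

```python
from typing import Optional


def resolve_scope(explicit: Optional[str], stored_default: Optional[str]) -> Optional[str]:
    """Return the value to query with: explicit input first, stored default second."""
    # An env or service named in the request always wins.
    if explicit:
        return explicit
    # Otherwise fall back to the stored default, which may be unset (None).
    return stored_default


# Hypothetical stored defaults, as if set with `jr-rpc config set datadog.env prod`.
defaults = {"datadog.env": "prod", "datadog.service": "checkout"}

env = resolve_scope("staging", defaults.get("datadog.env"))
service = resolve_scope(None, defaults.get("datadog.service"))
print(env, service)
```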
@@ -0,0 +1,77 @@
1
+ # Query Syntax
2
+
3
+ Use this reference when forming Datadog log queries, span queries, and log analytics (`analyze_datadog_logs`) SQL.
4
+
5
+ ## Log search query syntax
6
+
7
+ Datadog log search queries are tag-and-facet based. Core building blocks:
8
+
9
+ | Form | Meaning |
10
+ | ------------------ | -------------------------------------------------------------------- |
11
+ | `service:<name>` | Reserved attribute — service emitting the log. |
12
+ | `env:<name>` | Reserved attribute — deployment environment tag. |
13
+ | `host:<name>` | Reserved attribute — emitting host. |
14
+ | `status:<level>` | Log level: `error`, `warn`, `info`, `debug`, etc. |
15
+ | `source:<name>` | Log source integration (e.g. `nginx`, `python`). |
16
+ | `@<field>:<value>` | Faceted attribute (custom JSON field), e.g. `@http.status_code:500`. |
17
+ | `"some phrase"` | Free-text phrase search. |
18
+ | `AND`, `OR`, `-` | Boolean ops; `-` negates. Default operator between terms is `AND`. |
19
+ | `(a OR b) AND c` | Parenthesized boolean expression. |
20
+
21
+ Common examples:
22
+
23
+ - `service:checkout env:prod status:error`
24
+ - `service:api env:prod @http.status_code:(500 OR 502 OR 503)`
25
+ - `service:worker -status:info "timeout"`
26
+ - `@error.kind:DatabaseError env:prod`
27
+
28
+ Tips:
29
+
30
+ - Prefer `@<field>:` form over free-text search when the field exists. Facet matches are cheaper and more precise.
31
+ - `status` and `@http.status_code` are different. `status` is the log level; `@http.status_code` is the HTTP response code.
32
+ - Reserved attributes (`service`, `env`, `host`, `status`, `source`) do not take the `@` prefix. Custom fields do.
33
+
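The prefix rule above (reserved attributes bare, custom fields with `@`) can be sketched as a small query builder. `build_log_query` is a hypothetical helper for illustration and not part of the plugin; it does not escape quotes inside the phrase.

```python
from typing import Dict, Optional

# Reserved attributes listed above; everything else is treated as a custom field.
RESERVED = {"service", "env", "host", "status", "source"}


def build_log_query(filters: Dict[str, str], phrase: Optional[str] = None) -> str:
    """Join filters into one query string; adjacent terms are ANDed by default."""
    terms = []
    for key, value in filters.items():
        # Reserved attributes take no prefix; custom fields get exactly one "@".
        name = key if key in RESERVED else "@" + key.lstrip("@")
        terms.append(f"{name}:{value}")
    if phrase:
        # Free-text phrases are wrapped in double quotes.
        terms.append(f'"{phrase}"')
    return " ".join(terms)


query = build_log_query(
    {"service": "api", "env": "prod", "http.status_code": "(500 OR 502 OR 503)"}
)
print(query)
```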
34
+ ## Span / APM search
35
+
36
+ APM span search shares the same query language, plus a few APM-specific attributes:
37
+
38
+ | Attribute | Meaning |
39
+ | ------------------ | ------------------------------------------ |
40
+ | `service:<name>` | Service emitting the span. |
41
+ | `env:<name>` | Deployment environment tag. |
42
+ | `operation_name:X` | Span operation name (e.g. `http.request`). |
43
+ | `resource_name:X` | Endpoint or handler. |
44
+ | `status:error` | Span is marked as an error. |
45
+ | `duration:>500ms` | Range filter on span duration. |
46
+
47
+ ## `analyze_datadog_logs` SQL
48
+
49
+ `analyze_datadog_logs` takes SQL-like aggregations over the same log data. Prefer it for counts, top-N, group-bys, and time-bucketed analytics instead of paging raw logs.
50
+
51
+ Conventions:
52
+
53
+ - Wrap log query filters in a `WHERE` clause using the same log-search query syntax (quoted as a string).
54
+ - Use `COUNT(*)` for volume, `COUNT(DISTINCT <field>)` for unique cardinality.
55
+ - `GROUP BY` faceted fields, written without the `@` prefix in the SQL form. The tool's input schema specifies how to reference them; follow it exactly.
56
+ - Cap with `ORDER BY ... DESC LIMIT N` — top 5-10 is usually enough.
57
+
58
+ Example intents (shape — not a literal string; call the tool with the input schema it advertises):
59
+
60
+ - Top 10 services by error count in the last hour.
61
+ - HTTP 5xx count in the last 15 minutes, grouped by `@http.status_code`.
62
+ - Log volume by `host` over the last hour to spot a noisy emitter.
63
+
64
+ ## Time windows
65
+
66
+ - For "right now" questions, default to the last 15 minutes.
67
+ - For "what happened earlier today" questions, default to the last 24 hours.
68
+ - For incident-linked questions, prefer a window that brackets the incident `created` time.
69
+ - Always include a time window — unbounded queries are slow and easy to misinterpret.
70
+
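The defaults above can be sketched as a helper that returns a bounded window in epoch milliseconds. The `kind` labels and the 30-minute padding around the incident `created` time are assumptions made for this example; the guidance only says the window should bracket that time.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def default_window(
    kind: str,
    now: datetime,
    incident_created: Optional[datetime] = None,
    padding: timedelta = timedelta(minutes=30),
) -> Tuple[int, int]:
    """Return (from_ms, to_ms) for a question type; never unbounded."""
    if kind == "incident" and incident_created is not None:
        # Bracket the incident creation time, without reaching past "now".
        start = incident_created - padding
        end = min(incident_created + padding, now)
    elif kind == "earlier_today":
        start, end = now - timedelta(hours=24), now
    else:
        # "Right now" questions, and anything unrecognized, get the last 15 minutes.
        start, end = now - timedelta(minutes=15), now
    return int(start.timestamp() * 1000), int(end.timestamp() * 1000)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
from_ms, to_ms = default_window("right_now", now)
print(from_ms, to_ms)
```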
71
+ ## What to cite back
72
+
73
+ - The exact query string used (`service:checkout env:prod status:error`) — users often want to click through.
74
+ - A Datadog deep link that encodes the same filter:
75
+   - `https://app.datadoghq.com/logs?query=<url-encoded-query>&from_ts=<ms>&to_ts=<ms>`
76
+   - `https://app.datadoghq.com/apm/traces?query=<url-encoded-query>`
77
+ - The time window you used.
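A minimal sketch of building those deep links from the templates above. The helper names are hypothetical, and the `app.datadoghq.com` host is taken as-is from the templates; deployments on another Datadog site would presumably need a different host.

```python
from urllib.parse import quote


def logs_deep_link(query: str, from_ms: int, to_ms: int) -> str:
    """Fill the logs template with a URL-encoded query and a millisecond window."""
    # safe="" also encodes ":" and "/" so the query survives as one parameter.
    encoded = quote(query, safe="")
    return (
        "https://app.datadoghq.com/logs"
        f"?query={encoded}&from_ts={from_ms}&to_ts={to_ms}"
    )


def traces_deep_link(query: str) -> str:
    """Fill the APM traces template with a URL-encoded query."""
    return f"https://app.datadoghq.com/apm/traces?query={quote(query, safe='')}"


link = logs_deep_link(
    "service:checkout env:prod status:error", 1704109500000, 1704110400000
)
print(link)
```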
@@ -0,0 +1,47 @@
1
+ # Troubleshooting and Workarounds
2
+
3
+ Use this reference when Datadog MCP calls fail or return unexpected results.
4
+
5
+ ## Authentication and connection
6
+
7
+ | Symptom | Likely cause | What to do |
8
+ | ------------------------------------------------------------------ | ----------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------ |
9
+ | Tool call returns an authorization-required signal before running. | User has not yet completed the Datadog OAuth flow in Slack. | Let the runtime DM the user the authorization link and pause the turn. Do not prompt for credentials manually. |
10
+ | Tool call returned `401` mid-session. | OAuth token expired or was revoked. | Expect Junior's MCP layer to resurface the authorization flow. Retry once the user has re-authorized; do not loop before that. |
11
+ | OAuth callback did not resume the thread. | User closed the browser before the redirect completed. | Ask the user to retry the request — the OAuth flow will restart and complete if they finish it this time. |
12
+
13
+ ## Permission and scope errors
14
+
15
+ - A Datadog API returning `403 Forbidden` or `permission denied` means the user's Datadog role cannot read that resource (metrics, APM, incidents, RUM, etc.).
16
+ - Stop and tell the user the current Datadog connection could not access the requested data. Suggest they verify their Datadog role/team.
17
+ - Do not guess specific missing permission names unless Datadog explicitly named one in the error.
18
+ - Do not loop retrying a 403.
19
+
20
+ ## Rate limits
21
+
22
+ - Datadog throttles the unstable MCP endpoint. A `429 Too Many Requests` response is expected under load.
23
+ - Retry the same query once after a short wait.
24
+ - If it fails again, report the throttle and stop. Do not fall back to larger scans that will throttle harder.
25
+
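The retry-once rule can be sketched as a small wrapper. `RateLimited` is a hypothetical exception standing in for however the MCP layer surfaces a `429`, and the 2-second default wait is an assumption; the guidance only says "a short wait".

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised when a call comes back with 429."""


def call_with_one_retry(call: Callable[[], T], wait_seconds: float = 2.0) -> T:
    """Run `call`; on a throttle, wait briefly and retry the same call once."""
    try:
        return call()
    except RateLimited:
        # Short pause before the single retry of the same query.
        time.sleep(wait_seconds)
    # A second RateLimited propagates so the caller can report the throttle and stop.
    return call()
```

The wrapper deliberately retries the identical call rather than a broader one, matching the advice not to fall back to larger scans.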
26
+ ## Query returned no results
27
+
28
+ - Double-check that `env:` and `service:` match real values. Datadog tag values are case-sensitive.
29
+ - Widen the time window before widening the filter. Many "no results" cases are just too narrow a window.
30
+ - If searching logs with `@<field>:value`, confirm the field exists as a facet; custom log attributes must be facetized in Datadog to be searchable.
31
+ - If an expected monitor or incident is missing, the user's account may not have access to that workspace or team.
32
+
33
+ ## Too many results / large payloads
34
+
35
+ - Prefer `analyze_datadog_logs` with `GROUP BY` + `LIMIT` over paging raw logs.
36
+ - For traces marked truncated by the server, say so in the reply. Do not pretend the shown spans are complete.
37
+ - Quote only the minimum log / span / metric content needed as evidence. Link to Datadog for the rest.
38
+
39
+ ## Multiple Datadog sites
40
+
41
+ - The packaged plugin defaults to the US1 endpoint (`mcp.datadoghq.com`). The manifest declares `DATADOG_SITE` in its `env-vars` block with a default of `datadoghq.com` and references it from `mcp.url` as `${DATADOG_SITE}`. Operators on other sites (US3, US5, EU, AP1, AP2, GovCloud) set `DATADOG_SITE` in their Junior deployment env to their site host (e.g. `us5.datadoghq.com`, `datadoghq.eu`, `ddog-gov.com`). If users hit auth failures against the wrong regional endpoint, have the operator confirm that `DATADOG_SITE` is set correctly.
42
+ - If the user's Datadog account lives on a different site than the deployment is configured for, advise the operator to update the `DATADOG_SITE` environment variable. Do not try to work around this silently inside a turn.
43
+
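How the site setting might resolve to an endpoint host is sketched below. The manifest's actual `mcp.url` template is not shown here, so the `mcp.` prefix for non-US1 sites is an assumption extrapolated from the US1 host above, and `mcp_host` is a hypothetical helper.

```python
import os
from typing import Mapping

DEFAULT_SITE = "datadoghq.com"  # Default stated for DATADOG_SITE in the manifest.


def mcp_host(env: Mapping[str, str] = os.environ) -> str:
    """Resolve the MCP host from DATADOG_SITE, falling back to the US1 default."""
    site = env.get("DATADOG_SITE", "").strip() or DEFAULT_SITE
    # Assumption: every site uses the same "mcp." prefix as the US1 endpoint.
    return f"mcp.{site}"


print(mcp_host({}))
print(mcp_host({"DATADOG_SITE": "datadoghq.eu"}))
```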
44
+ ## Read-only scope
45
+
46
+ - This skill intentionally exposes only read-oriented Datadog tools.
47
+ - If the user asks to create a notebook, edit a monitor, mute an alert, or resolve an incident, stop and tell them those actions are not in scope. Do not attempt to approximate the mutation from read tools.