awslabs.cloudwatch-applicationsignals-mcp-server 0.1.21__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (36)
  1. awslabs/__init__.py +17 -0
  2. awslabs/cloudwatch_applicationsignals_mcp_server/__init__.py +17 -0
  3. awslabs/cloudwatch_applicationsignals_mcp_server/audit_presentation_utils.py +288 -0
  4. awslabs/cloudwatch_applicationsignals_mcp_server/audit_utils.py +912 -0
  5. awslabs/cloudwatch_applicationsignals_mcp_server/aws_clients.py +120 -0
  6. awslabs/cloudwatch_applicationsignals_mcp_server/canary_utils.py +910 -0
  7. awslabs/cloudwatch_applicationsignals_mcp_server/enablement_guides/templates/ec2/ec2-dotnet-enablement.md +435 -0
  8. awslabs/cloudwatch_applicationsignals_mcp_server/enablement_guides/templates/ec2/ec2-java-enablement.md +321 -0
  9. awslabs/cloudwatch_applicationsignals_mcp_server/enablement_guides/templates/ec2/ec2-nodejs-enablement.md +420 -0
  10. awslabs/cloudwatch_applicationsignals_mcp_server/enablement_guides/templates/ec2/ec2-python-enablement.md +598 -0
  11. awslabs/cloudwatch_applicationsignals_mcp_server/enablement_guides/templates/ecs/ecs-dotnet-enablement.md +264 -0
  12. awslabs/cloudwatch_applicationsignals_mcp_server/enablement_guides/templates/ecs/ecs-java-enablement.md +193 -0
  13. awslabs/cloudwatch_applicationsignals_mcp_server/enablement_guides/templates/ecs/ecs-nodejs-enablement.md +198 -0
  14. awslabs/cloudwatch_applicationsignals_mcp_server/enablement_guides/templates/ecs/ecs-python-enablement.md +236 -0
  15. awslabs/cloudwatch_applicationsignals_mcp_server/enablement_guides/templates/eks/eks-dotnet-enablement.md +166 -0
  16. awslabs/cloudwatch_applicationsignals_mcp_server/enablement_guides/templates/eks/eks-java-enablement.md +166 -0
  17. awslabs/cloudwatch_applicationsignals_mcp_server/enablement_guides/templates/eks/eks-nodejs-enablement.md +166 -0
  18. awslabs/cloudwatch_applicationsignals_mcp_server/enablement_guides/templates/eks/eks-python-enablement.md +169 -0
  19. awslabs/cloudwatch_applicationsignals_mcp_server/enablement_guides/templates/lambda/lambda-dotnet-enablement.md +336 -0
  20. awslabs/cloudwatch_applicationsignals_mcp_server/enablement_guides/templates/lambda/lambda-java-enablement.md +336 -0
  21. awslabs/cloudwatch_applicationsignals_mcp_server/enablement_guides/templates/lambda/lambda-nodejs-enablement.md +336 -0
  22. awslabs/cloudwatch_applicationsignals_mcp_server/enablement_guides/templates/lambda/lambda-python-enablement.md +336 -0
  23. awslabs/cloudwatch_applicationsignals_mcp_server/enablement_tools.py +147 -0
  24. awslabs/cloudwatch_applicationsignals_mcp_server/server.py +1505 -0
  25. awslabs/cloudwatch_applicationsignals_mcp_server/service_audit_utils.py +231 -0
  26. awslabs/cloudwatch_applicationsignals_mcp_server/service_tools.py +659 -0
  27. awslabs/cloudwatch_applicationsignals_mcp_server/sli_report_client.py +333 -0
  28. awslabs/cloudwatch_applicationsignals_mcp_server/slo_tools.py +386 -0
  29. awslabs/cloudwatch_applicationsignals_mcp_server/trace_tools.py +784 -0
  30. awslabs/cloudwatch_applicationsignals_mcp_server/utils.py +172 -0
  31. awslabs_cloudwatch_applicationsignals_mcp_server-0.1.21.dist-info/METADATA +808 -0
  32. awslabs_cloudwatch_applicationsignals_mcp_server-0.1.21.dist-info/RECORD +36 -0
  33. awslabs_cloudwatch_applicationsignals_mcp_server-0.1.21.dist-info/WHEEL +4 -0
  34. awslabs_cloudwatch_applicationsignals_mcp_server-0.1.21.dist-info/entry_points.txt +2 -0
  35. awslabs_cloudwatch_applicationsignals_mcp_server-0.1.21.dist-info/licenses/LICENSE +174 -0
  36. awslabs_cloudwatch_applicationsignals_mcp_server-0.1.21.dist-info/licenses/NOTICE +2 -0
@@ -0,0 +1,808 @@
1
+ Metadata-Version: 2.4
2
+ Name: awslabs.cloudwatch-applicationsignals-mcp-server
3
+ Version: 0.1.21
4
+ Summary: An AWS Labs Model Context Protocol (MCP) server for AWS Application Signals
5
+ Project-URL: Homepage, https://awslabs.github.io/mcp/
6
+ Project-URL: Documentation, https://awslabs.github.io/mcp/servers/cloudwatch-applicationsignals-mcp-server/
7
+ Project-URL: Source, https://github.com/awslabs/mcp.git
8
+ Project-URL: Bug Tracker, https://github.com/awslabs/mcp/issues
9
+ Project-URL: Changelog, https://github.com/awslabs/mcp/blob/main/src/cloudwatch-applicationsignals-mcp-server/CHANGELOG.md
10
+ Author: Amazon Web Services
11
+ Author-email: AWSLabs MCP <203918161+awslabs-mcp@users.noreply.github.com>
12
+ License: Apache-2.0
13
+ License-File: LICENSE
14
+ License-File: NOTICE
15
+ Classifier: License :: OSI Approved :: Apache Software License
16
+ Classifier: Operating System :: OS Independent
17
+ Classifier: Programming Language :: Python
18
+ Classifier: Programming Language :: Python :: 3
19
+ Classifier: Programming Language :: Python :: 3.10
20
+ Classifier: Programming Language :: Python :: 3.11
21
+ Classifier: Programming Language :: Python :: 3.12
22
+ Classifier: Programming Language :: Python :: 3.13
23
+ Requires-Python: >=3.10
24
+ Requires-Dist: boto3>=1.40.41
25
+ Requires-Dist: httpx>=0.24.0
26
+ Requires-Dist: loguru>=0.7.3
27
+ Requires-Dist: mcp[cli]>=1.23.0
28
+ Requires-Dist: pydantic>=2.11.1
29
+ Description-Content-Type: text/markdown
30
+
31
+ # CloudWatch Application Signals MCP Server
32
+
33
+ An MCP (Model Context Protocol) server that provides comprehensive tools for monitoring and analyzing AWS services using [AWS Application Signals](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Application-Signals.html).
34
+
35
+ This server enables AI assistants like Claude, GitHub Copilot, and Amazon Q to help you monitor service health, analyze performance metrics, track SLO compliance, and investigate issues using distributed tracing with advanced audit capabilities and root cause analysis.
36
+
37
+ ## Key Features
38
+
39
+ 1. **Comprehensive Service Auditing** - Monitor overall service health, diagnose root causes, and recommend actionable fixes with built-in APM expertise
40
+ 2. **Advanced SLO Compliance Monitoring** - Track Service Level Objectives with breach detection and root cause analysis
41
+ 3. **Operation-Level Performance Analysis** - Deep dive into specific API endpoints and operations
42
+ 4. **100% Trace Visibility** - Query OpenTelemetry spans data via Transaction Search for complete observability
43
+ 5. **Multi-Service Analysis** - Audit multiple services simultaneously with automatic batching
44
+ 6. **Natural Language Insights** - Generate business insights from telemetry data through natural language queries
45
+
46
+ ## Prerequisites
47
+
48
+ 1. [Sign-Up for an AWS account](https://aws.amazon.com/free/?trk=78b916d7-7c94-4cab-98d9-0ce5e648dd5f&sc_channel=ps&ef_id=Cj0KCQjwxJvBBhDuARIsAGUgNfjOZq8r2bH2OfcYfYTht5v5I1Bn0lBKiI2Ii71A8Gk39ZU5cwMLPkcaAo_CEALw_wcB:G:s&s_kwcid=AL!4422!3!432339156162!e!!g!!aws%20sign%20up!9572385111!102212379327&gad_campaignid=9572385111&gbraid=0AAAAADjHtp99c5A9DUyUaUQVhVEoi8of3&gclid=Cj0KCQjwxJvBBhDuARIsAGUgNfjOZq8r2bH2OfcYfYTht5v5I1Bn0lBKiI2Ii71A8Gk39ZU5cwMLPkcaAo_CEALw_wcB)
49
+ 2. [Enable Application Signals](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Application-Monitoring-Sections.html) for your applications
50
+ 3. Install `uv` from [Astral](https://docs.astral.sh/uv/getting-started/installation/) or the [GitHub README](https://github.com/astral-sh/uv#installation)
51
+ 4. Install Python using `uv python install 3.10`
52
+
53
+ ## Available Tools
54
+
55
+ ### Enablement & Setup Tools
56
+
57
+ #### 1. **`get_enablement_guide`** - Application Signals Enablement Assistant
58
+ **Enable observability through AI-guided autonomous code modifications**
59
+
60
+ Use this tool to enable AWS Application Signals through agentic enablement. The tool returns a curated guide that the AI agent follows to autonomously make necessary code changes to your IaC, Dockerfiles, and dependency files. The guide is customized for your service platform (EC2, ECS, Lambda, EKS) and programming language (Python, Node.js, Java).
61
+
62
+ **Prerequisites:**
63
+ - **Enable Start Discovery** in your AWS account and region before using this tool
64
+ - This is a one-time setup that creates the **AWSServiceRoleForCloudWatchApplicationSignals** service-linked role
65
+ - Navigate to CloudWatch console → Services → "Start discovering your Services" → Enable Application Signals
66
+ - See the [enablement guide](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Application-Signals-Enable.html) for detailed steps
67
+
68
+ **How it works:**
69
+ - Returns a curated enablement guide as a prompt for the AI agent
70
+ - The AI agent follows the guide to autonomously modify your code
71
+ - The guide also serves as knowledge you can ask follow-up questions about
72
+ - Supports interactive Q&A throughout the enablement process
73
+
74
+ **When to use this tool:**
75
+ - Enable observability, monitoring, or Application Signals for your AWS service
76
+ - Set up automatic instrumentation for your application on AWS
77
+ - Instrument your service running on EC2, ECS, Lambda, or EKS
78
+ - Add tracing, metrics, or telemetry to your AWS application
79
+
80
+ **Requirements:**
81
+ - Write permissions to IaC files, Dockerfiles, and dependency files
82
+ - Platform must be one of: `ec2`, `ecs`, `lambda`, `eks`
83
+ - Language must be one of: `python`, `nodejs`, `java`
84
+
85
+ **Recommendations:**
86
+ - Use absolute paths for both IaC and application directories (less ambiguous for AI agents)
87
+ - Provide both directory paths in your initial prompt for faster enablement
88
+
89
+ **Best Practice Prompts:**
90
+
91
+ Good prompts (specific and complete):
92
+ ```
93
+ "Enable Application Signals for my Python service running on ECS.
94
+ My app code is in /home/user/myapp and IaC is in /home/user/myapp/infrastructure"
95
+
96
+ "I want to add observability to my Node.js Lambda function.
97
+ The Lambda code is at /Users/dev/checkout-service and
98
+ the CDK infrastructure is at /Users/dev/checkout-service/cdk"
99
+
100
+ "Help me instrument my Java application on EC2 with Application Signals.
101
+ Application directory: /opt/apps/payment-api
102
+ Terraform code: /opt/apps/payment-api/terraform"
103
+ ```
104
+
105
+ Less effective prompts:
106
+ ```
107
+ "Enable monitoring for my app"
108
+ → Missing: platform, language, paths
109
+
110
+ "Enable Application Signals. My code is in ./src and IaC is in ./infrastructure"
111
+ → Problem: Relative paths instead of absolute paths
112
+
113
+ "Enable Application Signals for my ECS service at /home/user/myapp"
114
+ → Missing: programming language
115
+ ```
116
+
117
+ Quick template:
118
+ ```
119
+ "Enable Application Signals for my [LANGUAGE] service on [PLATFORM].
120
+ App code: [ABSOLUTE_PATH_TO_APP]
121
+ IaC code: [ABSOLUTE_PATH_TO_IAC]"
122
+ ```
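The quick template can also be filled in programmatically. The following is a minimal sketch and not part of the package: `build_enablement_prompt` is a hypothetical helper, the allowed values are copied from the Requirements list above, and the absolute-path check assumes POSIX-style paths.

```python
from pathlib import PurePosixPath

# Values copied from the Requirements list above.
PLATFORMS = {"ec2", "ecs", "lambda", "eks"}
LANGUAGES = {"python", "nodejs", "java"}


def build_enablement_prompt(language: str, platform: str, app_dir: str, iac_dir: str) -> str:
    """Fill in the quick template, rejecting incomplete or ambiguous input."""
    if platform not in PLATFORMS:
        raise ValueError(f"unsupported platform: {platform!r}")
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    for path in (app_dir, iac_dir):
        # Assumption: POSIX-style paths; relative paths are rejected.
        if not PurePosixPath(path).is_absolute():
            raise ValueError(f"path must be absolute: {path!r}")
    return (
        f"Enable Application Signals for my {language} service on {platform}.\n"
        f"App code: {app_dir}\n"
        f"IaC code: {iac_dir}"
    )


prompt = build_enablement_prompt(
    "python", "ecs", "/home/user/myapp", "/home/user/myapp/infrastructure"
)
print(prompt)
```

Rejecting relative paths up front mirrors the "less effective prompts" listed above, where missing or ambiguous details slow down enablement.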
123
+
124
+ ### 🥇 Primary Audit Tools (Use These First)
125
+
126
+ #### 1. **`audit_services`** ⭐ **PRIMARY SERVICE AUDIT TOOL**
127
+ **The #1 tool for comprehensive AWS service health auditing and monitoring**
128
+
129
+ - **USE THIS FIRST** for all service-level auditing tasks
130
+ - Comprehensive health assessment with actionable insights and recommendations
131
+ - Multi-service analysis with automatic batching (audit 1-100+ services simultaneously)
132
+ - SLO compliance monitoring with automatic breach detection
133
+ - Root cause analysis with traces, logs, and metrics correlation
134
+ - Issue prioritization by severity (critical, warning, info findings)
135
+ - **Wildcard Pattern Support**: Use `*payment*` for automatic service discovery
136
+ - Performance optimized for fast execution across multiple targets
137
+
138
+ **Key Use Cases:**
139
+ - `audit_services(service_targets='[{"Type":"service","Data":{"Service":{"Type":"Service","Name":"*"}}}]')` - Audit all services
140
+ - `audit_services(service_targets='[{"Type":"service","Data":{"Service":{"Type":"Service","Name":"*payment*"}}}]')` - Audit payment services
141
+ - `audit_services(..., auditors="all")` - Comprehensive root cause analysis with all auditors
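Because `service_targets` is passed as a JSON string, it can be less error-prone to build it from Python data structures than to write it by hand. This is a minimal sketch: `service_target` and `to_targets_json` are hypothetical helpers, and the JSON shape is taken only from the use cases above.

```python
import json


def service_target(name_pattern: str) -> dict:
    """One target entry, mirroring the JSON shape shown in the use cases above."""
    return {
        "Type": "service",
        "Data": {"Service": {"Type": "Service", "Name": name_pattern}},
    }


def to_targets_json(targets: list[dict]) -> str:
    # Compact separators reproduce the whitespace-free form used in this README.
    return json.dumps(targets, separators=(",", ":"))


service_targets = to_targets_json([service_target("*payment*")])
print(service_targets)
```

The resulting string can then be supplied as the `service_targets` argument.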
142
+
143
+ #### 2. **`audit_slos`** ⭐ **PRIMARY SLO AUDIT TOOL**
144
+ **The #1 tool for comprehensive SLO compliance monitoring and breach analysis**
145
+
146
+ - **PREFERRED TOOL** for SLO root cause analysis after using `get_slo()`
147
+ - Much more comprehensive than individual trace tools - provides integrated analysis
148
+ - Combines traces, logs, metrics, and dependencies in a single audit
149
+ - Automatic SLO breach detection with prioritized findings
150
+ - **Wildcard Pattern Support**: Use `*payment*` for automatic SLO discovery
151
+ - Actionable recommendations based on multi-dimensional analysis
152
+
153
+ **Key Use Cases:**
154
+ - `audit_slos(slo_targets='[{"Type":"slo","Data":{"Slo":{"SloName":"*"}}}]')` - Audit all SLOs
155
+ - `audit_slos(..., auditors="all")` - Comprehensive root cause analysis for SLO breaches
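To build intuition for what a pattern such as `*payment*` selects, the sketch below applies glob-style matching to a list of names. It is an illustration only: the SLO names are hypothetical, and the server's actual matching rules (for example, case sensitivity) are not specified here and may differ from Python's `fnmatch`.

```python
from fnmatch import fnmatchcase

# Hypothetical SLO names, for illustration only.
slo_names = [
    "checkout-service-latency-slo",
    "payment-api-availability-slo",
    "payment-processor-latency-slo",
]


def match_names(pattern: str, names: list[str]) -> list[str]:
    """Return the names selected by a glob-style pattern such as '*payment*'."""
    return [name for name in names if fnmatchcase(name, pattern)]


payment_slos = match_names("*payment*", slo_names)
all_slos = match_names("*", slo_names)
print(payment_slos)
```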
156
+
157
+ #### 3. **`audit_service_operations`** 🥇 **PRIMARY OPERATION AUDIT TOOL**
158
+ **The #1 RECOMMENDED tool for operation-specific analysis and performance investigation**
159
+
160
+ - **PREFERRED OVER audit_services()** for operation-level auditing
161
+ - Precision targeting of exact operation behavior vs. service-wide averages
162
+ - Actionable insights with specific error traces and dependency failures
163
+ - Code-level detail with exact stack traces and timeout locations
164
+ - **Wildcard Pattern Support**: Use `*GET*` for specific operation types
165
+ - Focused analysis that eliminates noise from other operations
166
+
167
+ **Key Use Cases:**
168
+ - `audit_service_operations(operation_targets='[{"Type":"service_operation","Data":{"ServiceOperation":{"Service":{"Type":"Service","Name":"*payment*"},"Operation":"*GET*","MetricType":"Latency"}}}]')` - Audit GET operations in payment services
169
+ - `audit_service_operations(..., auditors="all")` - Root cause analysis for specific operations
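Operation targets nest more deeply than service targets, so generating them from code can help avoid bracket mistakes. The following sketch assumes only the JSON shape shown above; `operation_target` is a hypothetical helper, and `"Latency"` is the one `MetricType` value that appears in this README.

```python
import json


def operation_target(service_pattern: str, operation_pattern: str, metric_type: str) -> dict:
    """One operation target, mirroring the JSON shape shown in the use cases above."""
    return {
        "Type": "service_operation",
        "Data": {
            "ServiceOperation": {
                "Service": {"Type": "Service", "Name": service_pattern},
                "Operation": operation_pattern,
                "MetricType": metric_type,
            }
        },
    }


# One target per operation pattern; the patterns here are examples.
targets = [operation_target("*payment*", pattern, "Latency") for pattern in ("*GET*", "*POST*")]
operation_targets = json.dumps(targets, separators=(",", ":"))
print(operation_targets)
```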
170
+
171
+ ### 📊 Service Discovery & Information Tools
172
+
173
+ #### 4. **`list_monitored_services`** - Service Discovery Tool
174
+ **OPTIONAL TOOL** - `audit_services()` can automatically discover services using wildcard patterns
175
+
176
+ - Get detailed overview of all monitored services in your environment
177
+ - Discover specific service names and environments for manual audit target construction
178
+ - **RECOMMENDED**: Use `audit_services()` with wildcard patterns instead for comprehensive discovery AND analysis
179
+
180
+ #### 5. **`get_service_detail`** - Service Metadata Tool
181
+ **For basic service metadata and configuration details**
182
+
183
+ - Service metadata and configuration (platform information, key attributes)
184
+ - Service-level metrics (Latency, Error, Fault aggregates)
185
+ - Log groups associated with the service
186
+ - **IMPORTANT**: This tool does NOT provide operation names - use `audit_services()` for operation discovery
187
+
188
+ #### 6. **`list_service_operations`** - Operation Discovery Tool
189
+ **CRITICAL LIMITATION**: Only discovers operations that have been ACTIVELY INVOKED in the specified time window
190
+
191
+ - Basic operation inventory for RECENTLY ACTIVE operations only (max 24 hours)
192
+ - Empty results ≠ no operations exist, just no recent invocations
193
+ - **RECOMMENDED**: Use `audit_services()` FIRST for comprehensive operation discovery and analysis
194
+
195
+ ### 🎯 SLO Management Tools
196
+
197
+ #### 7. **`get_slo`** - SLO Configuration Details
198
+ **Essential for understanding SLO configuration before deep investigation**
199
+
200
+ - Comprehensive SLO configuration details (metrics, thresholds, goals)
201
+ - Operation names and key attributes for further investigation
202
+ - Metric type (LATENCY or AVAILABILITY) and comparison operators
203
+ - **NEXT STEP**: Use `audit_slos()` with `auditors="all"` for root cause analysis
204
+
205
+ #### 8. **`list_slos`** - SLO Discovery
206
+ **List all Service Level Objectives in Application Signals**
207
+
208
+ - Complete list of all SLOs in your account with names and ARNs
209
+ - Filter SLOs by service attributes
210
+ - Basic SLO information including creation time and operation names
211
+ - Useful for SLO discovery and finding SLO names for use with other tools
212
+
213
+ ### 📈 Metrics & Performance Tools
214
+
215
+ #### 9. **`query_service_metrics`** - CloudWatch Metrics Analysis
216
+ **Get CloudWatch metrics for specific Application Signals services**
217
+
218
+ - Analyze service performance (latency, throughput, error rates)
219
+ - View trends over time with both standard statistics and percentiles
220
+ - Automatic granularity adjustment based on time range
221
+ - Summary statistics with recent data points and timestamps
222
+
223
+ ### 🔍 Advanced Trace & Log Analysis Tools
224
+
225
+ #### 10. **`search_transaction_spans`** - 100% Trace Visibility
226
+ **Query OpenTelemetry Spans data via Transaction Search (100% sampled data)**
227
+
228
+ - **100% sampled data** vs X-Ray's 5% sampling for more accurate results
229
+ - Query "aws/spans" log group with CloudWatch Logs Insights
230
+ - Generate business performance insights and summaries
231
+ - **IMPORTANT**: Always include a limit in queries to prevent overwhelming context
232
+
233
+ **Example Query:**
234
+ ```
235
+ FILTER attributes.aws.local.service = "payment-service" and attributes.aws.local.environment = "eks:production"
236
+ | STATS avg(duration) as avg_latency by attributes.aws.local.operation
237
+ | LIMIT 50
238
+ ```
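Since every query should carry a limit, a small guard can append one when it is missing. This is a sketch under stated assumptions: `ensure_limit` is a hypothetical helper, the default of 50 is taken from the example query above, and the check is a simple textual match rather than a real query parser.

```python
import re


def ensure_limit(query: str, default_limit: int = 50) -> str:
    """Append '| LIMIT n' unless the query already contains a LIMIT command."""
    # Simple textual check: looks for a pipe followed by 'limit <number>'.
    if re.search(r"\|\s*limit\s+\d+", query, flags=re.IGNORECASE):
        return query
    return f"{query.rstrip()}\n| LIMIT {default_limit}"


query = ensure_limit(
    'FILTER attributes.aws.local.service = "payment-service"\n'
    "| STATS avg(duration) as avg_latency by attributes.aws.local.operation"
)
print(query)
```

Calling `ensure_limit` on a query that already has a limit returns it unchanged, so the guard is safe to apply to every query.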
239
+
240
+ #### 11. **`query_sampled_traces`** - X-Ray Trace Analysis (Secondary Tool)
241
+ **Query AWS X-Ray traces (5% sampled data) for trace investigation**
242
+
243
+ - **⚠️ IMPORTANT**: Consider using `audit_slos()` with `auditors="all"` instead for comprehensive root cause analysis
244
+ - Uses X-Ray's 5% sampled trace data - may miss critical errors
245
+ - Limited context compared to comprehensive audit tools
246
+ - **RECOMMENDATION**: Use `audit_services()` for operation discovery and `audit_slos()` for root cause analysis
247
+
248
+ **Common Filter Expressions:**
249
+ - `service("service-name"){fault = true}` - Find traces with faults (5xx errors)
250
+ - `duration > 5` - Find slow requests (over 5 seconds)
251
+ - `annotation[aws.local.operation]="GET /api/orders"` - Filter by specific operation
252
+
253
+ #### 12. **`analyze_canary_failures`** - Comprehensive Canary Failure Analysis
254
+ **Deep dive into CloudWatch Synthetics canary failures with root cause identification**
255
+
256
+ - Comprehensive canary failure analysis with deep dive into issues
257
+ - Analyze historical patterns and specific incident details
258
+ - Get comprehensive artifact analysis including logs, screenshots, and HAR files
259
+ - Receive actionable recommendations based on AWS debugging methodology
260
+ - Correlate canary failures with Application Signals telemetry data
261
+ - Identify performance degradation and availability issues across service dependencies
262
+
263
+ **Key Features:**
264
+ - **Failure Pattern Analysis**: Identifies recurring failure modes and temporal patterns
265
+ - **Artifact Deep Dive**: Analyzes canary logs, screenshots, and network traces for root causes
266
+ - **Service Correlation**: Links canary failures to upstream/downstream service issues using Application Signals
267
+ - **Performance Insights**: Detects latency spikes, fault rates, and connection issues
268
+ - **Actionable Remediation**: Provides specific steps based on AWS operational best practices
269
+ - **IAM Analysis**: Validates IAM roles and permissions for common canary access issues
270
+ - **Backend Service Integration**: Correlates canary failures with backend service errors and exceptions
271
+
272
+ **Common Use Cases:**
273
+ - Incident Response: Rapid diagnosis of canary failures during outages
274
+ - Performance Investigation: Understanding latency and availability degradation
275
+ - Dependency Analysis: Identifying which services are causing canary failures
276
+ - Historical Trending: Analyzing failure patterns over time for proactive improvements
277
+ - Root Cause Analysis: Deep dive into specific failure scenarios with full context
278
+ - Infrastructure Issues: Diagnose S3 access, VPC connectivity, and browser target problems
279
+ - Backend Service Debugging: Identify application code issues affecting canary success
280
+
281
+ #### 13. **`list_slis`** - Legacy SLI Status Report (Specialized Tool)
282
+ **Use `audit_services()` as the PRIMARY tool for service auditing**
283
+
284
+ - Basic report showing summary counts (total, healthy, breached, insufficient data)
285
+ - Simple list of breached services with SLO names
286
+ - **IMPORTANT**: `audit_services()` is the PRIMARY and PREFERRED tool for all service auditing tasks
287
+ - Only use this tool for legacy SLI status report format specifically
288
+
289
+ ## Installation
290
+
291
+ ### One-Click Installation
292
+
293
+ | Cursor | VS Code |
294
+ |:------:|:-------:|
295
+ | [![Install MCP Server](https://cursor.com/en/install-mcp?name=applicationsignals&config=eyJhdXRvQXBwcm92ZSI6W10sImRpc2FibGVkIjpmYWxzZSwidGltZW91dCI6NjAsImNvbW1hbmQiOiJ1dnggYXdzbGFicy5jbG91ZHdhdGNoLWFwcGxpY2F0aW9uc2lnbmFscy1tY3Atc2VydmVyQGxhdGVzdCIsImVudiI6eyJBV1NfUFJPRklMRSI6IltUaGUgQVdTIFByb2ZpbGUgTmFtZSB0byB1c2UgZm9yIEFXUyBhY2Nlc3NdIiwiQVdTX1JFR0lPTiI6IltUaGUgQVdTIHJlZ2lvbiB0byBydW4gaW5dIiwiRkFTVE1DUF9MT0dfTEVWRUwiOiJFUlJPUiJ9LCJ0cmFuc3BvcnRUeXBlIjoic3RkaW8ifQ) | [![Install on VS Code](https://insiders.vscode.dev/redirect/mcp/install?name=applicationsignals&config=%7B%22autoApprove%22%3A%5B%5D%2C%22disabled%22%3Afalse%2C%22timeout%22%3A60%2C%22command%22%3A%22uvx%22%2C%22args%22%3A%5B%22awslabs.cloudwatch-applicationsignals-mcp-server%40latest%22%5D%2C%22env%22%3A%7B%22AWS_PROFILE%22%3A%22%5BThe%20AWS%20Profile%20Name%20to%20use%20for%20AWS%20access%5D%22%2C%22AWS_REGION%22%3A%22%5BThe%20AWS%20region%20to%20run%20in%5D%22%2C%22FASTMCP_LOG_LEVEL%22%3A%22ERROR%22%7D%2C%22transportType%22%3A%22stdio%22%7D) |
296
+
297
+ ### Installing via `uv`
298
+
299
+ When using [`uv`](https://docs.astral.sh/uv/) no specific installation is needed. We will
300
+ use [`uvx`](https://docs.astral.sh/uv/guides/tools/) to directly run *awslabs.cloudwatch-applicationsignals-mcp-server*.
301
+
302
+ ### Installing via Claude Desktop
303
+
304
+ On macOS: `~/Library/Application\ Support/Claude/claude_desktop_config.json`
305
+ On Windows: `%APPDATA%/Claude/claude_desktop_config.json`
306
+
307
+ <details>
308
+ <summary>Development/Unpublished Servers Configuration</summary>
309
+ When installing a development or unpublished server, use the `--from` flag to point at the local source directory:
310
+
311
+ ```json
312
+ {
313
+ "mcpServers": {
314
+ "applicationsignals": {
315
+ "command": "uvx",
316
+ "args": ["--from", "/absolute/path/to/cloudwatch-applicationsignals-mcp-server", "awslabs.cloudwatch-applicationsignals-mcp-server"],
317
+ "env": {
318
+ "AWS_PROFILE": "[The AWS Profile Name to use for AWS access]",
319
+ "AWS_REGION": "[AWS Region]",
320
+ "FASTMCP_LOG_LEVEL": "ERROR"
321
+ }
322
+ }
323
+ }
324
+ }
325
+ ```
326
+ </details>
327
+
328
+ <details>
329
+ <summary>Published Servers Configuration</summary>
330
+
331
+ ```json
332
+ {
333
+ "mcpServers": {
334
+ "applicationsignals": {
335
+ "command": "uvx",
336
+ "args": ["awslabs.cloudwatch-applicationsignals-mcp-server@latest"],
337
+ "env": {
338
+ "AWS_PROFILE": "[The AWS Profile Name to use for AWS access]",
339
+ "AWS_REGION": "[AWS Region]",
340
+ "FASTMCP_LOG_LEVEL": "ERROR"
341
+ }
342
+ }
343
+ }
344
+ }
345
+ ```
346
+ </details>
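If the client's config file already lists other MCP servers, the entry above should be merged in rather than pasted over them. The sketch below shows one way to do that on an in-memory dictionary. `add_server` is a hypothetical helper, and the profile name, region, and `other-server` entry are placeholder values.

```python
import json

SERVER_NAME = "applicationsignals"


def add_server(config: dict, profile: str, region: str) -> dict:
    """Return a copy of config with the published-server entry added or replaced."""
    servers = dict(config.get("mcpServers", {}))
    servers[SERVER_NAME] = {
        "command": "uvx",
        "args": ["awslabs.cloudwatch-applicationsignals-mcp-server@latest"],
        "env": {
            "AWS_PROFILE": profile,
            "AWS_REGION": region,
            "FASTMCP_LOG_LEVEL": "ERROR",
        },
    }
    return {**config, "mcpServers": servers}


# Placeholder config with one pre-existing (hypothetical) server entry.
existing = {"mcpServers": {"other-server": {"command": "example"}}}
updated = add_server(existing, profile="default", region="us-east-1")
print(json.dumps(updated, indent=2))
```

Returning a new dictionary instead of mutating the input makes it easy to compare the old and new configurations before writing anything to disk.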
347
+
348
+ ### Installing for Kiro
349
+
350
+ - Add the following configuration to your Kiro MCP settings file at `~/.kiro/settings/mcp.json`:
351
+
352
+ ```json
353
+ {
354
+ "mcpServers": {
355
+ "applicationsignals": {
356
+ "command": "uvx",
357
+ "args": [
358
+ "awslabs.cloudwatch-applicationsignals-mcp-server@latest"
359
+ ],
360
+ "env": {
361
+ "AWS_PROFILE": "[The AWS Profile Name to use for AWS access]",
362
+ "AWS_REGION": "[AWS Region]",
363
+ "FASTMCP_LOG_LEVEL": "ERROR"
364
+ },
365
+ "disabled": false,
366
+ "autoApprove": []
367
+ }
368
+ }
369
+ }
370
+ ```
371
+
372
+ - After updating your AWS credentials, restart Kiro so that the MCP server loads them
373
+
374
+ ### Windows Installation
375
+
376
+ For Windows users, the MCP server configuration format is slightly different:
377
+
378
+ ```json
379
+ {
380
+ "mcpServers": {
381
+ "applicationsignals": {
382
+ "disabled": false,
383
+ "timeout": 60,
384
+ "type": "stdio",
385
+ "command": "uv",
386
+ "args": [
387
+ "tool",
388
+ "run",
389
+ "--from",
390
+ "awslabs.cloudwatch-applicationsignals-mcp-server@latest",
391
+ "awslabs.cloudwatch-applicationsignals-mcp-server.exe"
392
+ ],
393
+ "env": {
394
+ "AWS_PROFILE": "[The AWS Profile Name to use for AWS access]",
395
+ "AWS_REGION": "[AWS Region]",
396
+ "FASTMCP_LOG_LEVEL": "ERROR"
397
+ }
398
+ }
399
+ }
400
+ }
401
+ ```
402
+
403
+ ### Build and install the Docker image locally on the same host as your LLM client
404
+
405
+ 1. `git clone https://github.com/awslabs/mcp.git`
406
+ 2. Go to the subdirectory `src/cloudwatch-applicationsignals-mcp-server/`
407
+ 3. Run `docker build -t awslabs/cloudwatch-applicationsignals-mcp-server:latest .`
408
+
409
+ ### Add or update your LLM client's config with the following:
410
+ ```json
411
+ {
412
+ "mcpServers": {
413
+ "applicationsignals": {
414
+ "command": "docker",
415
+ "args": [
416
+ "run",
417
+ "-i",
418
+ "--rm",
419
+ "-v", "${HOME}/.aws:/root/.aws:ro",
420
+ "-e", "AWS_PROFILE=[The AWS Profile Name to use for AWS access]",
421
+ "-e", "AWS_REGION=[AWS Region]",
422
+ "awslabs/cloudwatch-applicationsignals-mcp-server:latest"
423
+ ]
424
+ }
425
+ }
426
+ }
427
+ ```
428
+
429
+ ### Debugging
430
+
431
+ You can use the MCP inspector to debug the server. For `uvx` installations:
432
+
433
+ ```bash
434
+ npx @modelcontextprotocol/inspector uvx awslabs.cloudwatch-applicationsignals-mcp-server
435
+ ```
436
+
437
+ Or if you've installed the package in a specific directory:
438
+
439
+ ```bash
440
+ cd /path/to/cloudwatch-applicationsignals-mcp-server
441
+ npx @modelcontextprotocol/inspector uv run awslabs.cloudwatch-applicationsignals-mcp-server
442
+ ```
443
+
444
+ ## Usage Examples
445
+
446
+ ### Example 1: Comprehensive Service Health Audit (Recommended Workflow)
447
+ ```
448
+ User: "Audit the health of all my payment services"
449
+ Assistant: I'll perform a comprehensive health audit of all your payment services using the primary audit tool.
450
+
451
+ [Uses audit_services with wildcard pattern]
452
+
453
+ audit_services(
454
+ service_targets='[{"Type":"service","Data":{"Service":{"Type":"Service","Name":"*payment*"}}}]',
455
+ auditors="slo,operation_metric"
456
+ )
457
+
458
+ Results: Found 4 payment services with the following findings:
459
+
460
+ 🔴 CRITICAL FINDINGS:
461
+ • payment-api-service: SLO breach detected - Latency SLO violated (p99: 2.1s > 1.5s threshold)
462
+ • payment-processor: High error rate detected (8.2% > 5% threshold)
463
+
464
+ ⚠️ WARNING FINDINGS:
465
+ • payment-gateway: Dependency timeout issues with external payment provider
466
+
467
+ ✅ HEALTHY:
468
+ • payment-validator: All metrics within normal ranges
469
+
470
+ Recommendations:
471
+ 1. Investigate payment-api-service latency issues - check database connection pool
472
+ 2. Review payment-processor error logs for root cause analysis
473
+ 3. Consider circuit breaker pattern for payment-gateway external dependencies
474
+ ```
475
+
476
+ ### Example 2: SLO Breach Root Cause Analysis
477
+ ```
478
+ User: "I have an SLO breach for my checkout service. Can you help me investigate?"
479
+ Assistant: I'll help you investigate the SLO breach. Let me first get the SLO configuration and then perform a comprehensive root cause analysis.
480
+
481
+ [Uses get_slo to understand configuration]
482
+ [Then uses audit_slos with auditors="all" for comprehensive analysis]
483
+
484
+ audit_slos(
485
+ slo_targets='[{"Type":"slo","Data":{"Slo":{"SloName":"checkout-service-latency-slo"}}}]',
486
+ auditors="all"
487
+ )
488
+
489
+ Root Cause Analysis Results:
490
+
491
+ 🔴 SLO BREACH CONFIRMED:
492
+ • checkout-service-latency-slo: 87.3% attainment (< 95% goal)
493
+ • Breach duration: 2.5 hours
494
+ • Impact: 1,247 requests affected
495
+
496
+ 🔍 ROOT CAUSE IDENTIFIED:
497
+ • Database connection pool exhaustion detected
498
+ • Trace analysis shows 95% of slow requests waiting for DB connections
499
+ • Dependency on payment-service also showing elevated latency
500
+
501
+ 📊 SUPPORTING EVIDENCE:
502
+ • P99 latency: 3.2s (threshold: 2.0s)
503
+ • Error rate spike: 12% during breach period
504
+ • 15 error traces with "connection timeout" messages
505
+
506
+ 🛠️ RECOMMENDED ACTIONS:
507
+ 1. Increase database connection pool size from 10 to 25
508
+ 2. Implement connection pool monitoring alerts
509
+ 3. Review payment-service dependency performance
510
+ 4. Consider implementing request queuing for high load periods
511
+ ```
512
+
513
+ ### Example 3: Operation-Level Performance Investigation
514
+ ```
515
+ User: "The GET operations in my payment services are slow. Can you investigate?"
516
+ Assistant: I'll investigate the performance of GET operations specifically in your payment services using operation-level analysis.
517
+
518
+ [Uses audit_service_operations for precise operation targeting]
519
+
520
+ audit_service_operations(
521
+ operation_targets='[{"Type":"service_operation","Data":{"ServiceOperation":{"Service":{"Type":"Service","Name":"*payment*"},"Operation":"*GET*","MetricType":"Latency"}}}]',
522
+ auditors="operation_metric,trace"
523
+ )
524
+
525
+ Operation Performance Analysis Results:
526
+
527
+ 🎯 GET OPERATIONS ANALYZED:
528
+ • payment-api: GET /api/payments/{id} - P99: 1.8s
529
+ • payment-api: GET /api/payments/history - P99: 3.2s ⚠️
530
+ • payment-processor: GET /api/process/status - P99: 0.8s ✅
531
+
532
+ 🔴 PERFORMANCE ISSUES IDENTIFIED:
533
+ • GET /api/payments/history: Consistently slow (P99: 3.2s)
534
+ - Root cause: Full table scan on payments table
535
+ - 89% of requests show database query time > 2.5s
536
+ - Missing index on user_id + created_date columns
537
+
538
+ 📈 PERFORMANCE RECOMMENDATIONS:
539
+ 1. Add composite index: CREATE INDEX idx_payments_user_date ON payments(user_id, created_date)
540
+ 2. Implement pagination for history endpoint (limit 50 records)
541
+ 3. Consider caching for frequently accessed payment histories
542
+ 4. Add query timeout protection (max 5s)
543
+
544
+ 🚀 EXPECTED IMPROVEMENT:
545
+ • Estimated latency reduction: 3.2s → 0.4s (87% improvement)
546
+ • Database load reduction: ~75%
547
+ ```
548
+
549
+ ### Example 4: Transaction Search for 100% Trace Visibility
550
+ ```
551
+ User: "I want to analyze all the errors in my checkout service from the last hour"
552
+ Assistant: I'll use Transaction Search to analyze all errors with 100% trace visibility (no sampling).
553
+
554
+ [Uses search_transaction_spans for complete trace data]
555
+
556
+ search_transaction_spans(
557
+ log_group_name="aws/spans",
558
+ query_string='FILTER attributes.aws.local.service = "checkout-service" and attributes.http.status_code >= 400 | STATS count() as error_count by attributes.http.status_code, attributes.aws.local.operation | SORT error_count DESC | LIMIT 20',
559
+ start_time="2024-01-15T10:00:00Z",
560
+ end_time="2024-01-15T11:00:00Z"
561
+ )
562
+
563
+ ✅ Transaction Search Results (100% sampled data):
564
+
565
+ ERROR BREAKDOWN (Last Hour):
566
+ • HTTP 500 errors: 47 occurrences
567
+ - POST /api/checkout/complete: 31 errors
568
+ - POST /api/checkout/validate: 16 errors
569
+
570
+ • HTTP 404 errors: 23 occurrences
571
+ - GET /api/checkout/{id}: 23 errors
572
+
573
+ • HTTP 503 errors: 12 occurrences
574
+ - POST /api/checkout/payment: 12 errors
575
+
576
+ 🔍 KEY INSIGHTS:
577
+ • 82 total errors out of 1,247 requests (6.6% error rate)
578
+ • Most HTTP 500 errors (31 of 47, 66%) occur in the checkout completion flow
579
+ • Service unavailable errors correlate with payment service dependency
580
+
581
+ 🛠️ IMMEDIATE ACTIONS NEEDED:
582
+ 1. Investigate checkout completion logic for 500 errors
583
+ 2. Add validation for checkout ID existence (404s)
584
+ 3. Implement circuit breaker for payment service calls (503s)
585
+ ```
586
+
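The `search_transaction_spans` call in Example 4 takes an ISO-8601 time window and a query string. The sketch below shows one way to build both for "the last hour". The helper names `last_hour_window` and `error_breakdown_query` are hypothetical and not part of this server; only the query text and the timestamp format are taken from the example above.

```python
from datetime import datetime, timedelta, timezone


def last_hour_window(now=None):
    """Return (start_time, end_time) as ISO-8601 UTC strings covering the past hour."""
    end = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    end = end.replace(microsecond=0)
    start = end - timedelta(hours=1)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return start.strftime(fmt), end.strftime(fmt)


def error_breakdown_query(service, min_status=400, limit=20):
    """Compose the error-count query from Example 4 for a given service name."""
    return (
        f'FILTER attributes.aws.local.service = "{service}" '
        f"and attributes.http.status_code >= {min_status} "
        "| STATS count() as error_count by attributes.http.status_code, "
        "attributes.aws.local.operation "
        f"| SORT error_count DESC | LIMIT {limit}"
    )
```

The two return values map onto the `start_time`, `end_time`, and `query_string` arguments shown in the example; the service name is interpolated as-is, so it should come from a trusted source.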
587
+ ### Example 5: Service Discovery and Overview
588
+ ```
589
+ User: "What services are being monitored and what's their current health status?"
590
+ Assistant: I'll get an overview of all monitored services and their health status.
591
+
592
+ [Uses list_monitored_services for discovery, then audit_services for health analysis]
593
+
594
+ Found 8 services being monitored:
595
+
596
+ 📊 SERVICE INVENTORY:
597
+ • api-gateway-service (AWS::ApiGateway::RestApi) - Production
598
+ • checkout-service (AWS::ECS::Service) - Production
599
+ • payment-api-service (AWS::ECS::Service) - Production
600
+ • payment-processor (AWS::Lambda::Function) - Production
601
+ • user-service (AWS::EKS::Service) - Production
602
+ • inventory-service (AWS::EKS::Service) - Production
603
+ • notification-lambda (AWS::Lambda::Function) - Production
604
+ • analytics-service (AWS::ECS::Service) - Production
605
+
606
+ [Then performs health audit]
607
+
608
+ 🏥 HEALTH STATUS SUMMARY:
609
+ ✅ HEALTHY (5 services):
610
+ • api-gateway-service: All SLOs met, low latency
611
+ • user-service: Excellent performance metrics
612
+ • inventory-service: All operations within thresholds
613
+ • notification-lambda: Fast execution, no errors
614
+ • analytics-service: Stable performance
615
+
616
+ ⚠️ NEEDS ATTENTION (2 services):
617
+ • checkout-service: Elevated latency on completion operations
618
+ • payment-api-service: Intermittent timeout issues
619
+
620
+ 🔴 CRITICAL (1 service):
621
+ • payment-processor: SLO breach - error rate 8.2% (threshold: 5%)
622
+
623
+ 💡 RECOMMENDATIONS:
624
+ 1. Focus immediate attention on payment-processor error investigation
625
+ 2. Monitor checkout-service latency trends
626
+ 3. Review payment-api-service timeout configurations
627
+ ```
628
+
629
+ ### Example 6: Canary Failure Analysis and Root Cause Investigation
630
+ ```
631
+ User: "My Pet Clinic canaries are failing. Can you help me diagnose the issues?"
632
+ Assistant: I'll analyze your Pet Clinic canaries to identify the root causes of failures.
633
+
634
+ [Uses analyze_canary_failures for comprehensive canary analysis]
635
+
636
+ analyze_canary_failures(canary_name="pc-visit-vet")
637
+ analyze_canary_failures(canary_name="pc-add-visit")
638
+ analyze_canary_failures(canary_name="webapp-erorrpagecanary")
639
+
640
+ 🔍 CANARY FAILURE ANALYSIS RESULTS:
641
+
642
+ 🔴 CRITICAL ISSUES IDENTIFIED:
643
+
644
+ **pc-visit-vet canary:**
645
+ • Root Cause: S3 bucket access issue
646
+ • Error Pattern: Exit status 127, "No such file or directory"
647
+ • Failure Count: 5 consecutive failures
648
+ • IAM Analysis: ✅ Role exists but S3 bucket ARN patterns incorrect in policies
649
+
650
+ **pc-add-visit canary:**
651
+ • Root Cause: Selector timeout + backend service errors
652
+ • Error Pattern: 30000ms timeout waiting for UI element + MissingFormatArgumentException
653
+ • Backend Issue: Format specifier '% o' error in BedrockRuntimeV1Service.invokeTitanModel()
654
+ • Performance: 34 second average response time, 0% success rate
655
+
656
+ **webapp-erorrpagecanary:**
657
+ • Root Cause: Browser target closed during selector wait
658
+ • Error Pattern: "Target closed" waiting for `#jsError` selector
659
+ • Failure Count: 5 consecutive failures with 60000ms connection timeouts
660
+
661
+ 🔍 BACKEND SERVICE CORRELATION:
662
+ • MissingFormatArgumentException detected in Pet Clinic backend
663
+ • Location: org.springframework.samples.petclinic.customers.aws.BedrockRuntimeV1Service.invokeTitanModel (line 75)
664
+ • Impact: Affects multiple canaries testing Pet Clinic functionality
665
+ • 20% fault rate on GET /api/customer/diagnose/owners/{ownerId}/pets/{petId}
666
+
667
+ 🛠️ RECOMMENDED ACTIONS:
668
+
669
+ **Immediate (Critical):**
670
+ 1. Fix S3 bucket ARN patterns in pc-visit-vet IAM policy
671
+ 2. Fix format string bug in BedrockRuntimeV1Service: change '% o' to '%s' or correct format
672
+ 3. Add VPC permissions to canary IAM roles if Lambda runs in VPC
673
+
674
+ **Infrastructure (High Priority):**
675
+ 4. Investigate browser target stability issues (webapp-erorrpagecanary)
676
+ 5. Review canary timeout configurations - consider increasing from 30s to 60s
677
+ 6. Implement circuit breaker pattern for external service dependencies
678
+
679
+ **Monitoring (Medium Priority):**
680
+ 7. Add Application Signals monitoring for canary success rates
681
+ 8. Set up alerts for consecutive canary failures (>3 failures)
682
+ 9. Implement canary health dashboard with real-time status
683
+
684
+ 🎯 EXPECTED OUTCOMES:
685
+ • S3 access fix: Immediate resolution of pc-visit-vet failures
686
+ • Backend service fix: 80%+ improvement in Pet Clinic canary success rates
687
+ • Infrastructure improvements: Reduced browser target close errors
688
+ • Enhanced monitoring: Proactive failure detection and faster resolution
689
+ ```
690
+
691
+ ## Recommended Workflows
692
+
693
+ ### 🎯 Primary Audit Workflow (Most Common)
694
+ 1. **Start with `audit_services()`** - Use wildcard patterns for automatic service discovery
695
+ 2. **Review findings summary** - Let user choose which issues to investigate further
696
+ 3. **Deep dive with `auditors="all"`** - For selected services needing root cause analysis
697
+
698
+ ### 🔍 SLO Investigation Workflow
699
+ 1. **Use `get_slo()`** - Understand SLO configuration and thresholds
700
+ 2. **Use `audit_slos()` with `auditors="all"`** - Comprehensive root cause analysis
701
+ 3. **Follow actionable recommendations** - Implement suggested fixes
702
+
703
+ ### ⚡ Operation Performance Workflow
704
+ 1. **Use `audit_service_operations()`** - Target specific operations with precision
705
+ 2. **Apply wildcard patterns** - e.g., `*GET*` for all GET operations
706
+ 3. **Root cause analysis** - Use `auditors="all"` for detailed investigation
707
+
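The wildcard targets used in this workflow are passed as a JSON string. As a minimal sketch, assuming the target shape shown in the operation performance example earlier in this README, the string can be generated rather than hand-written. The helper names `operation_target` and `operation_targets_json` are hypothetical and not part of the server.

```python
import json


def operation_target(service_pattern, operation_pattern, metric_type="Latency"):
    """Build one service_operation target in the shape used by the examples."""
    return {
        "Type": "service_operation",
        "Data": {
            "ServiceOperation": {
                "Service": {"Type": "Service", "Name": service_pattern},
                "Operation": operation_pattern,
                "MetricType": metric_type,
            }
        },
    }


def operation_targets_json(*targets):
    """Serialize targets to the compact JSON string passed as operation_targets."""
    return json.dumps(list(targets), separators=(",", ":"))
```

For instance, `operation_targets_json(operation_target("*payment*", "*GET*"))` reproduces the `operation_targets` value from the payment example. Building the string with `json.dumps` avoids quoting mistakes in nested JSON.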
708
+ ### 📊 Complete Observability Workflow
709
+ 1. **Service Discovery** - `audit_services()` with wildcard patterns
710
+ 2. **SLO Compliance** - `audit_slos()` for breach detection
711
+ 3. **Operation Analysis** - `audit_service_operations()` for endpoint-specific issues
712
+ 4. **Trace Investigation** - `search_transaction_spans()` for 100% trace visibility
713
+
714
+ ## Configuration
715
+
716
+ ### Required AWS Permissions
717
+
718
+ The server requires the following AWS IAM permissions:
719
+
720
+ ```json
721
+ {
722
+ "Version": "2012-10-17",
723
+ "Statement": [
724
+ {
725
+ "Effect": "Allow",
726
+ "Action": [
727
+ "application-signals:ListServices",
728
+ "application-signals:GetService",
729
+ "application-signals:ListServiceOperations",
730
+ "application-signals:ListServiceLevelObjectives",
731
+ "application-signals:GetServiceLevelObjective",
732
+ "application-signals:BatchGetServiceLevelObjectiveBudgetReport",
733
+ "application-signals:ListAuditFindings",
734
+ "cloudwatch:GetMetricData",
735
+ "cloudwatch:GetMetricStatistics",
736
+ "logs:GetQueryResults",
737
+ "logs:StartQuery",
738
+ "logs:StopQuery",
739
+ "logs:FilterLogEvents",
740
+ "xray:GetTraceSummaries",
741
+ "xray:BatchGetTraces",
742
+ "xray:GetTraceSegmentDestination",
743
+ "synthetics:GetCanary",
744
+ "synthetics:GetCanaryRuns",
745
+ "s3:GetObject",
746
+ "s3:ListBucket",
747
+ "iam:GetRole",
748
+ "iam:ListAttachedRolePolicies",
749
+ "iam:GetPolicy",
750
+ "iam:GetPolicyVersion"
751
+ ],
752
+ "Resource": "*"
753
+ }
754
+ ]
755
+ }
756
+ ```
757
+
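When adapting the policy above (for example, narrowing it with wildcards), it can help to check that every required action is still granted. The following is a rough sketch under simplifying assumptions: it only looks at `Allow` statements and ignores `Deny`, `Resource`, `NotAction`, and `Condition`, so it is not a substitute for the IAM policy simulator. The function name `missing_actions` is hypothetical.

```python
from fnmatch import fnmatchcase


def missing_actions(policy_document, required_actions):
    """Return the required actions that no Allow statement in the policy matches."""
    statements = policy_document.get("Statement", [])
    if isinstance(statements, dict):  # IAM allows a single statement object
        statements = [statements]

    allowed = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        allowed.extend([actions] if isinstance(actions, str) else actions)

    # IAM action names are case-insensitive; '*' and '?' act as wildcards.
    patterns = [action.lower() for action in allowed]
    return [
        required
        for required in required_actions
        if not any(fnmatchcase(required.lower(), pattern) for pattern in patterns)
    ]
```

An empty result means every listed action is matched by at least one `Allow` pattern in the document.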
758
+ ### Environment Variables
759
+
760
+ - `AWS_PROFILE` - AWS profile name to use for authentication (defaults to `default` profile)
761
+ - `AWS_REGION` - AWS region (defaults to us-east-1)
762
+ - `MCP_CLOUDWATCH_APPLICATION_SIGNALS_LOG_LEVEL` - Logging level (defaults to INFO)
763
+ - `AUDITOR_LOG_PATH` - Path for audit log files (defaults to /tmp)
764
+
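As an illustration of how these variables and their documented defaults fit together, here is a minimal sketch of reading them in Python. It is not the server's actual implementation; the function name `load_settings` and the dictionary keys are assumptions made for this example.

```python
import os


def load_settings(environ=None):
    """Read the documented environment variables, applying the documented defaults."""
    env = os.environ if environ is None else environ
    return {
        # None means "no explicit profile": the default credential chain applies.
        "profile": env.get("AWS_PROFILE"),
        "region": env.get("AWS_REGION", "us-east-1"),
        "log_level": env.get("MCP_CLOUDWATCH_APPLICATION_SIGNALS_LOG_LEVEL", "INFO").upper(),
        "auditor_log_path": env.get("AUDITOR_LOG_PATH", "/tmp"),
    }
```

Passing a plain dictionary as `environ` makes the defaults easy to verify without touching the real process environment.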
765
+ ### AWS Credentials
766
+
767
+ This server uses AWS profiles for authentication. Set the `AWS_PROFILE` environment variable to use a specific profile from your `~/.aws/credentials` or `~/.aws/config` file.
768
+
769
+ The server will use the standard AWS credential chain via boto3, which includes:
770
+ - AWS Profile specified by `AWS_PROFILE` environment variable
771
+ - Default profile from AWS credentials file
772
+ - IAM roles when running on EC2, ECS, Lambda, etc.
773
+
774
+ ### Transaction Search Configuration
775
+
776
+ For 100% trace visibility, enable AWS X-Ray Transaction Search:
777
+ 1. Configure X-Ray to send traces to CloudWatch Logs
778
+ 2. Set the trace segment destination to `CloudWatchLogs` and confirm that its status is reported as `ACTIVE`
779
+ 3. Once active, the `search_transaction_spans()` tool can query complete, unsampled span data
780
+
781
+ Without Transaction Search, you'll only have access to 5% sampled trace data through X-Ray.
782
+
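Whether Transaction Search is active can be checked programmatically. The sketch below assumes a boto3 X-Ray client and its `get_trace_segment_destination` call, which is covered by the `xray:GetTraceSegmentDestination` permission listed earlier. The function name `transaction_search_enabled` is hypothetical, and the client is passed in so the logic can be exercised without AWS access.

```python
def transaction_search_enabled(xray_client):
    """Return True when spans are delivered to CloudWatch Logs and the setting is active."""
    response = xray_client.get_trace_segment_destination()
    return (
        response.get("Destination") == "CloudWatchLogs"
        and response.get("Status") == "ACTIVE"
    )
```

With credentials configured, this would typically be called as `transaction_search_enabled(boto3.client("xray"))`. A `False` result suggests that trace queries will fall back to sampled X-Ray data.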
783
+ ## Development
784
+
785
+ This server is part of the AWS Labs MCP collection. For development and contribution guidelines, please see the main repository documentation.
786
+
787
+ ### Running Tests
788
+
789
+ To run the comprehensive test suite that validates all use case examples and tool functionality:
790
+
791
+ ```bash
792
+ cd src/cloudwatch-applicationsignals-mcp-server
793
+ python -m pytest tests/test_use_case_examples.py -v
794
+ ```
795
+
796
+ This test file verifies that all use case examples in the tool documentation call the correct tools with the right parameters and target formats. It includes tests for:
797
+
798
+ - All documented use cases for `audit_services()`, `audit_slos()`, and `audit_service_operations()`
799
+ - Target format validation (service, SLO, and operation targets)
800
+ - Wildcard pattern expansion functionality
801
+ - Auditor selection for different scenarios
802
+ - JSON format validation for all documentation examples
803
+
804
+ The tests use mocked AWS clients to prevent real API calls while validating the tool logic and parameter handling.
805
+
806
+ ## License
807
+
808
+ This project is licensed under the Apache License, Version 2.0. See the LICENSE file for details.