@uniswap/ai-toolkit-nx-claude 0.5.29 → 0.5.30-next.0
- package/dist/cli-generator.cjs +28 -59
- package/dist/packages/ai-toolkit-nx-claude/src/cli-generator.d.ts +8 -10
- package/dist/packages/ai-toolkit-nx-claude/src/cli-generator.d.ts.map +1 -1
- package/dist/packages/ai-toolkit-nx-claude/src/index.d.ts +0 -1
- package/dist/packages/ai-toolkit-nx-claude/src/index.d.ts.map +1 -1
- package/generators.json +0 -15
- package/package.json +4 -35
- package/dist/content/agents/agnostic/CLAUDE.md +0 -282
- package/dist/content/agents/agnostic/agent-capability-analyst.md +0 -575
- package/dist/content/agents/agnostic/agent-optimizer.md +0 -396
- package/dist/content/agents/agnostic/agent-orchestrator.md +0 -475
- package/dist/content/agents/agnostic/cicd-agent.md +0 -301
- package/dist/content/agents/agnostic/claude-agent-discovery.md +0 -304
- package/dist/content/agents/agnostic/claude-docs-fact-checker.md +0 -435
- package/dist/content/agents/agnostic/claude-docs-initializer.md +0 -782
- package/dist/content/agents/agnostic/claude-docs-manager.md +0 -595
- package/dist/content/agents/agnostic/code-explainer.md +0 -269
- package/dist/content/agents/agnostic/code-generator.md +0 -785
- package/dist/content/agents/agnostic/commit-message-generator.md +0 -101
- package/dist/content/agents/agnostic/context-loader.md +0 -432
- package/dist/content/agents/agnostic/debug-assistant.md +0 -321
- package/dist/content/agents/agnostic/doc-writer.md +0 -536
- package/dist/content/agents/agnostic/feedback-collector.md +0 -165
- package/dist/content/agents/agnostic/infrastructure-agent.md +0 -406
- package/dist/content/agents/agnostic/migration-assistant.md +0 -489
- package/dist/content/agents/agnostic/pattern-learner.md +0 -481
- package/dist/content/agents/agnostic/performance-analyzer.md +0 -528
- package/dist/content/agents/agnostic/plan-reviewer.md +0 -173
- package/dist/content/agents/agnostic/planner.md +0 -235
- package/dist/content/agents/agnostic/pr-creator.md +0 -498
- package/dist/content/agents/agnostic/pr-reviewer.md +0 -142
- package/dist/content/agents/agnostic/prompt-engineer.md +0 -541
- package/dist/content/agents/agnostic/refactorer.md +0 -311
- package/dist/content/agents/agnostic/researcher.md +0 -349
- package/dist/content/agents/agnostic/security-analyzer.md +0 -1087
- package/dist/content/agents/agnostic/stack-splitter.md +0 -642
- package/dist/content/agents/agnostic/style-enforcer.md +0 -568
- package/dist/content/agents/agnostic/test-runner.md +0 -481
- package/dist/content/agents/agnostic/test-writer.md +0 -292
- package/dist/content/commands/agnostic/CLAUDE.md +0 -207
- package/dist/content/commands/agnostic/address-pr-issues.md +0 -205
- package/dist/content/commands/agnostic/auto-spec.md +0 -386
- package/dist/content/commands/agnostic/claude-docs.md +0 -409
- package/dist/content/commands/agnostic/claude-init-plus.md +0 -439
- package/dist/content/commands/agnostic/create-pr.md +0 -79
- package/dist/content/commands/agnostic/daily-standup.md +0 -185
- package/dist/content/commands/agnostic/deploy.md +0 -441
- package/dist/content/commands/agnostic/execute-plan.md +0 -167
- package/dist/content/commands/agnostic/explain-file.md +0 -303
- package/dist/content/commands/agnostic/explore.md +0 -82
- package/dist/content/commands/agnostic/fix-bug.md +0 -273
- package/dist/content/commands/agnostic/gen-tests.md +0 -185
- package/dist/content/commands/agnostic/generate-commit-message.md +0 -92
- package/dist/content/commands/agnostic/git-worktree-orchestrator.md +0 -647
- package/dist/content/commands/agnostic/implement-spec.md +0 -270
- package/dist/content/commands/agnostic/monitor.md +0 -581
- package/dist/content/commands/agnostic/perf-analyze.md +0 -214
- package/dist/content/commands/agnostic/plan.md +0 -453
- package/dist/content/commands/agnostic/refactor.md +0 -315
- package/dist/content/commands/agnostic/refine-linear-task.md +0 -575
- package/dist/content/commands/agnostic/research.md +0 -49
- package/dist/content/commands/agnostic/review-code.md +0 -321
- package/dist/content/commands/agnostic/review-plan.md +0 -109
- package/dist/content/commands/agnostic/review-pr.md +0 -393
- package/dist/content/commands/agnostic/split-stack.md +0 -705
- package/dist/content/commands/agnostic/update-claude-md.md +0 -401
- package/dist/content/commands/agnostic/work-through-pr-comments.md +0 -873
- package/dist/generators/add-agent/CLAUDE.md +0 -130
- package/dist/generators/add-agent/files/__name__.md.template +0 -37
- package/dist/generators/add-agent/generator.cjs +0 -640
- package/dist/generators/add-agent/schema.json +0 -59
- package/dist/generators/add-command/CLAUDE.md +0 -131
- package/dist/generators/add-command/files/__name__.md.template +0 -46
- package/dist/generators/add-command/generator.cjs +0 -643
- package/dist/generators/add-command/schema.json +0 -50
- package/dist/generators/files/src/index.ts.template +0 -1
- package/dist/generators/init/CLAUDE.md +0 -520
- package/dist/generators/init/generator.cjs +0 -3304
- package/dist/generators/init/schema.json +0 -180
- package/dist/packages/ai-toolkit-nx-claude/src/generators/add-agent/generator.d.ts +0 -5
- package/dist/packages/ai-toolkit-nx-claude/src/generators/add-agent/generator.d.ts.map +0 -1
- package/dist/packages/ai-toolkit-nx-claude/src/generators/add-command/generator.d.ts +0 -5
- package/dist/packages/ai-toolkit-nx-claude/src/generators/add-command/generator.d.ts.map +0 -1
- package/dist/packages/ai-toolkit-nx-claude/src/generators/init/generator.d.ts +0 -5
- package/dist/packages/ai-toolkit-nx-claude/src/generators/init/generator.d.ts.map +0 -1
- package/dist/packages/ai-toolkit-nx-claude/src/utils/auto-update-utils.d.ts +0 -30
- package/dist/packages/ai-toolkit-nx-claude/src/utils/auto-update-utils.d.ts.map +0 -1
@@ -1,581 +0,0 @@

---
description: Set up comprehensive monitoring for applications with automated metrics identification, alerting, and dashboard configuration
argument-hint: monitor [application-type] [monitoring-platform] [options]
allowed-tools: Bash(*), Read(*), Write(*.yml), Write(*.json), Write(*.md), Edit(*), Grep(*), Glob(*), LS(*)
---

# Monitor Command

## Overview

The monitor command provides automated monitoring setup for applications, including metrics identification, alert configuration, and dashboard creation across multiple monitoring platforms.

## Inputs

Parse command arguments to determine monitoring configuration:

```yaml
application_type:
  - microservices
  - monolith
  - serverless
  - spa (single page application)
  - mobile-api
  - data-pipeline
  - auto-detect

monitoring_platform:
  - prometheus-grafana
  - datadog
  - newrelic
  - cloudwatch
  - azure-insights
  - elastic-apm
  - auto-select

options:
  environment: [development, staging, production]
  severity_levels: [critical, warning, info]
  notification_channels: [email, slack, pagerduty, webhook]
  retention_period: [7d, 30d, 90d, 365d]
  sampling_rate: [0.1, 1, 5, 10] # percentage
  custom_metrics: [true, false]
  business_metrics: [true, false]
  sli_slo_setup: [true, false]
```

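The argument handling described above could be sketched roughly as follows. This is an illustration only, not the toolkit's own parser: `parse_monitor_args` and `MonitorConfig` are hypothetical names, the defaults (`auto-detect`, `auto-select`) are an assumption, and the allowed values are copied from the listing.

```python
from dataclasses import dataclass, field

# Allowed values copied from the Inputs listing; everything else is illustrative.
APPLICATION_TYPES = {"microservices", "monolith", "serverless", "spa",
                     "mobile-api", "data-pipeline", "auto-detect"}
MONITORING_PLATFORMS = {"prometheus-grafana", "datadog", "newrelic", "cloudwatch",
                        "azure-insights", "elastic-apm", "auto-select"}


@dataclass
class MonitorConfig:
    """Hypothetical container for the parsed command arguments."""
    application_type: str = "auto-detect"      # assumed default
    monitoring_platform: str = "auto-select"   # assumed default
    options: dict = field(default_factory=dict)


def parse_monitor_args(argv):
    """Split positional arguments from --key=value options and validate them."""
    positional = [a for a in argv if not a.startswith("--")]
    config = MonitorConfig()
    if len(positional) > 0:
        config.application_type = positional[0]
    if len(positional) > 1:
        config.monitoring_platform = positional[1]
    if config.application_type not in APPLICATION_TYPES:
        raise ValueError(f"unknown application type: {config.application_type}")
    if config.monitoring_platform not in MONITORING_PLATFORMS:
        raise ValueError(f"unknown monitoring platform: {config.monitoring_platform}")
    for arg in argv:
        if arg.startswith("--"):
            key, _, value = arg[2:].partition("=")
            # Comma-separated values become lists, e.g. slack,pagerduty.
            parsed = value.split(",") if "," in value else value
            config.options[key.replace("-", "_")] = parsed
    return config
```

Rejecting unknown values up front keeps later steps from silently configuring the wrong platform.
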
## Task

### 1. Application Analysis

Analyze the codebase to identify monitoring requirements:

#### Code Pattern Analysis

- **Framework Detection**: Identify web frameworks (Express, FastAPI, Spring Boot, etc.)
- **Database Usage**: Detect database connections and query patterns
- **External Dependencies**: Find API calls, message queues, cache usage
- **Error Handling**: Locate error handling patterns and logging statements
- **Performance Bottlenecks**: Identify potentially slow operations

#### Architecture Assessment

- **Service Boundaries**: Map service interactions for microservices
- **Data Flow**: Trace request/response patterns
- **Resource Usage**: Identify CPU/memory intensive operations
- **Scalability Points**: Find auto-scaling triggers and limits

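As one possible reading of the framework detection step, the sketch below looks for known dependency names in project manifests. The marker table, the `detect_frameworks` helper, and the choice of manifest files are all assumptions made for illustration; a real analysis would likely need many more signals.

```python
import json
import re

# Hypothetical mapping from dependency name to framework label.
FRAMEWORK_MARKERS = {
    "express": "Express",
    "fastapi": "FastAPI",
    "spring-boot-starter-web": "Spring Boot",
}


def detect_frameworks(manifests):
    """Return framework labels found in a {filename: file_text} mapping."""
    found = set()
    for name, text in manifests.items():
        if name == "package.json":
            candidates = set(json.loads(text).get("dependencies", {}))
        elif name == "requirements.txt":
            # Keep the package name, drop version specifiers and comments.
            candidates = set()
            for line in text.splitlines():
                line = line.split("#", 1)[0].strip()
                if line:
                    pkg = re.split(r"[<>=!~\[; ]", line, maxsplit=1)[0]
                    candidates.add(pkg.lower())
        else:
            # Fallback: plain substring search, e.g. for pom.xml or build.gradle.
            candidates = {m for m in FRAMEWORK_MARKERS if m in text}
        found.update(FRAMEWORK_MARKERS[c] for c in candidates if c in FRAMEWORK_MARKERS)
    return sorted(found)
```
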
### 2. Metrics Identification

#### Application Metrics

```yaml
response_time_metrics:
  - http_request_duration_seconds
  - database_query_duration_seconds
  - external_api_call_duration_seconds
  - business_process_duration_seconds

throughput_metrics:
  - http_requests_per_second
  - transactions_per_second
  - messages_processed_per_second
  - concurrent_users

error_metrics:
  - http_error_rate_4xx
  - http_error_rate_5xx
  - database_connection_errors
  - external_service_errors
  - application_exceptions

availability_metrics:
  - service_uptime_percentage
  - health_check_success_rate
  - dependency_availability
```

#### Infrastructure Metrics

```yaml
system_metrics:
  - cpu_usage_percentage
  - memory_usage_percentage
  - disk_usage_percentage
  - network_io_bytes
  - file_descriptor_usage

container_metrics:
  - container_cpu_usage
  - container_memory_usage
  - container_restart_count
  - pod_status (kubernetes)

database_metrics:
  - connection_pool_usage
  - query_performance
  - deadlock_count
  - replication_lag
```

#### Business Metrics

```yaml
user_metrics:
  - active_users_count
  - user_session_duration
  - user_retention_rate
  - feature_usage_count

transaction_metrics:
  - revenue_per_minute
  - conversion_rate
  - cart_abandonment_rate
  - payment_success_rate

content_metrics:
  - page_views
  - bounce_rate
  - search_queries
  - download_count
```

### 3. Alert Configuration

#### Threshold-Based Alerts

```yaml
critical_alerts:
  - name: 'High Error Rate'
    condition: 'error_rate > 5%'
    duration: '2m'
    severity: 'critical'
    channels: ['pagerduty', 'slack']

  - name: 'Service Down'
    condition: 'up == 0'
    duration: '30s'
    severity: 'critical'
    channels: ['pagerduty', 'email']

  - name: 'High Response Time'
    condition: 'response_time_p95 > 2s'
    duration: '5m'
    severity: 'critical'
    channels: ['slack', 'email']

warning_alerts:
  - name: 'High CPU Usage'
    condition: 'cpu_usage > 80%'
    duration: '5m'
    severity: 'warning'
    channels: ['slack']

  - name: 'High Memory Usage'
    condition: 'memory_usage > 85%'
    duration: '3m'
    severity: 'warning'
    channels: ['slack']

  - name: 'Increased Response Time'
    condition: 'response_time_p95 > 1s'
    duration: '10m'
    severity: 'warning'
    channels: ['email']
```

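Each threshold alert pairs a `condition` with a `duration`. A plausible interpretation, similar to the `for` clause in Prometheus alerting rules, is that the condition must hold continuously for the whole duration before the alert fires. The sketch below assumes that interpretation; `alert_fires` is a hypothetical helper, not part of any monitoring platform.

```python
def alert_fires(samples, threshold, duration_s):
    """Return True once value > threshold has held for at least duration_s seconds.

    samples: iterable of (timestamp_seconds, value) pairs in ascending time order.
    """
    breach_start = None
    for ts, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = ts  # first sample of the current breach
            if ts - breach_start >= duration_s:
                return True
        else:
            breach_start = None  # condition cleared, restart the clock
    return False
```

For the 'High Error Rate' alert this would be called with `threshold=5` (percent) and `duration_s=120`; a single dip below the threshold resets the timer, which suppresses notifications for short spikes.
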
#### Anomaly Detection Alerts

```yaml
anomaly_alerts:
  - name: 'Unusual Traffic Pattern'
    metric: 'requests_per_second'
    algorithm: 'statistical_deviation'
    sensitivity: 'medium'
    learning_period: '7d'

  - name: 'Memory Leak Detection'
    metric: 'memory_usage'
    algorithm: 'trend_analysis'
    sensitivity: 'high'
    learning_period: '24h'

  - name: 'Response Time Anomaly'
    metric: 'response_time_p95'
    algorithm: 'statistical_deviation'
    sensitivity: 'medium'
    learning_period: '14d'
```

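The `statistical_deviation` algorithm is not defined in the listing. One common way to realize it is a z-score test against the values seen during the learning period, sketched below. The mapping from `sensitivity` to a z-score cutoff is an assumption chosen for illustration, as is the `is_anomalous` helper.

```python
from statistics import mean, stdev

# Assumed mapping: higher sensitivity means a smaller deviation triggers the alert.
SENSITIVITY_Z = {"low": 4.0, "medium": 3.0, "high": 2.0}


def is_anomalous(history, current, sensitivity="medium"):
    """Flag `current` if it lies too many standard deviations from the history mean."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current != mu  # flat history: any change is a deviation
    return abs(current - mu) / sigma > SENSITIVITY_Z[sensitivity]
```

A z-score ignores daily or weekly cycles, so traffic with strong seasonality would likely need a different approach.
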
### 4. Dashboard Configuration

#### Service Health Dashboard

```yaml
service_health_dashboard:
  panels:
    - title: 'Service Status Overview'
      type: 'stat'
      metrics: ['up', 'health_check_status']

    - title: 'Request Rate'
      type: 'graph'
      metrics: ['http_requests_per_second']
      time_range: '1h'

    - title: 'Response Time Distribution'
      type: 'histogram'
      metrics: ['http_request_duration_seconds']

    - title: 'Error Rate'
      type: 'graph'
      metrics: ['http_error_rate_4xx', 'http_error_rate_5xx']

    - title: 'Top Errors'
      type: 'table'
      metrics: ['error_messages', 'error_count']
      limit: 10
```

#### Performance Dashboard

```yaml
performance_dashboard:
  panels:
    - title: 'Response Time Percentiles'
      type: 'graph'
      metrics: ['response_time_p50', 'response_time_p95', 'response_time_p99']

    - title: 'Throughput'
      type: 'graph'
      metrics: ['requests_per_second', 'transactions_per_second']

    - title: 'Resource Utilization'
      type: 'graph'
      metrics: ['cpu_usage', 'memory_usage', 'disk_usage']

    - title: 'Database Performance'
      type: 'graph'
      metrics: ['db_query_duration', 'db_connections_active']

    - title: 'Cache Hit Rate'
      type: 'stat'
      metrics: ['cache_hit_rate']
```

#### Business KPI Dashboard

```yaml
business_kpi_dashboard:
  panels:
    - title: 'Active Users'
      type: 'stat'
      metrics: ['active_users_current', 'active_users_24h']

    - title: 'Revenue Metrics'
      type: 'graph'
      metrics: ['revenue_per_hour', 'conversion_rate']

    - title: 'User Journey Funnel'
      type: 'funnel'
      metrics: ['page_visits', 'signups', 'conversions']

    - title: 'Feature Usage'
      type: 'heatmap'
      metrics: ['feature_usage_by_user', 'feature_adoption_rate']
```

### 5. Platform-Specific Implementation

#### Prometheus/Grafana Setup

```yaml
prometheus_config:
  scrape_configs:
    - job_name: 'application'
      static_configs:
        - targets: ['localhost:8080']
      metrics_path: '/metrics'
      scrape_interval: '30s'

  alerting:
    alertmanagers:
      - static_configs:
          - targets: ['alertmanager:9093']

  rule_files:
    - 'alert_rules.yml'
    - 'recording_rules.yml'

grafana_dashboards:
  - service_health.json
  - performance_metrics.json
  - business_kpis.json
  - infrastructure.json
```

#### DataDog Configuration

```yaml
datadog_config:
  api_key: '${DATADOG_API_KEY}'
  app_key: '${DATADOG_APP_KEY}'

  dashboards:
    - title: 'Application Overview'
      widgets:
        - title: 'Request Rate'
          definition:
            type: 'timeseries'
            requests:
              - q: 'sum:http.requests{*}.as_rate()'

  monitors:
    - name: 'High Error Rate'
      type: 'metric alert'
      query: 'avg(last_5m):avg:http.errors{*}.as_rate() > 0.05'
      message: 'Error rate is above 5%'
      tags: ['team:backend', 'severity:critical']
```

#### New Relic Configuration

```yaml
newrelic_config:
  license_key: '${NEW_RELIC_LICENSE_KEY}'
  app_name: 'MyApplication'

  custom_insights:
    - event_type: 'BusinessMetrics'
      attributes:
        - user_id
        - transaction_amount
        - feature_used

  alerts:
    - name: 'Apdex Score'
      type: 'apdex'
      condition: 'below'
      threshold: 0.8
      duration: 300
```

### 6. SLI/SLO Definition

#### Service Level Indicators

```yaml
slis:
  availability:
    description: 'Percentage of successful requests'
    calculation: "sum(http_requests{status!~'5..'})/sum(http_requests)"

  latency:
    description: '95th percentile response time'
    calculation: 'histogram_quantile(0.95, http_request_duration_seconds)'

  error_rate:
    description: 'Percentage of failed requests'
    calculation: "sum(http_requests{status=~'5..'})/sum(http_requests)"
```

#### Service Level Objectives

```yaml
slos:
  availability_slo:
    target: 99.9%
    time_window: '30d'
    error_budget: 0.1%

  latency_slo:
    target: '< 200ms'
    percentile: 95
    time_window: '30d'

  error_rate_slo:
    target: '< 1%'
    time_window: '30d'
```

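The relationship between the availability target and its error budget is simple arithmetic: a 99.9% target leaves 0.1% of the window, which over 30 days is about 43.2 minutes of full downtime. The sketch below shows that calculation along with a request-based view of how much budget remains. Both function names are hypothetical.

```python
def error_budget_minutes(slo_target_percent, window_days):
    """Minutes of total unavailability the SLO tolerates over the window."""
    budget_fraction = 1 - slo_target_percent / 100
    return budget_fraction * window_days * 24 * 60


def budget_remaining(slo_target_percent, total_requests, failed_requests):
    """Fraction of the error budget still unspent (negative once overspent)."""
    allowed_failures = (1 - slo_target_percent / 100) * total_requests
    if allowed_failures == 0:
        return 0.0  # a 100% target or zero traffic leaves nothing to spend
    return 1 - failed_requests / allowed_failures
```

With a 99.9% target, 1,000,000 requests allow roughly 1,000 failures, so 250 failed requests would leave about 75% of the budget.
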
## Delegation

### Agent Coordination

1. **Infrastructure Agent**: Configure system-level monitoring (CPU, memory, disk, network)
2. **Application Agent**: Set up application-specific metrics and traces
3. **Database Agent**: Configure database monitoring and performance metrics
4. **Security Agent**: Implement security monitoring and compliance checks
5. **DevOps Agent**: Configure CI/CD pipeline monitoring and deployment tracking

### Workflow Orchestration

```yaml
monitoring_setup_workflow:
  phases:
    - name: 'discovery'
      agent: 'application_agent'
      tasks: ['analyze_codebase', 'identify_frameworks', 'map_dependencies']

    - name: 'metrics_design'
      agent: 'infrastructure_agent'
      tasks: ['define_slis', 'create_dashboards', 'setup_alerts']
      dependencies: ['discovery']

    - name: 'implementation'
      agent: 'devops_agent'
      tasks: ['deploy_monitoring', 'configure_agents', 'test_alerts']
      dependencies: ['metrics_design']

    - name: 'validation'
      agent: 'application_agent'
      tasks: ['verify_metrics', 'test_dashboards', 'validate_alerts']
      dependencies: ['implementation']
```

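The `dependencies` fields in the workflow define a directed acyclic graph, so a valid execution order can be derived with a topological sort. The sketch below uses the standard library's `graphlib` (Python 3.9+). The `PHASES` dictionary restates the workflow listing, and `execution_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Phase name -> phases it depends on, restated from the workflow listing.
PHASES = {
    "discovery": [],
    "metrics_design": ["discovery"],
    "implementation": ["metrics_design"],
    "validation": ["implementation"],
}


def execution_order(phases):
    """Return phase names ordered so every phase follows its dependencies.

    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    return list(TopologicalSorter(phases).static_order())
```

Because this particular workflow is a linear chain, only one order is possible; with branching dependencies, several orders could be valid.
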
## Output

### Monitoring Configuration Report

```yaml
monitoring_report:
  summary:
    application_type: 'microservices'
    monitoring_platform: 'prometheus-grafana'
    metrics_configured: 45
    alerts_configured: 12
    dashboards_created: 4

  metrics_inventory:
    application_metrics: 20
    infrastructure_metrics: 15
    business_metrics: 10

  alert_coverage:
    critical_alerts: 6
    warning_alerts: 4
    info_alerts: 2

  dashboard_summary:
    - name: 'Service Health'
      panels: 8
      refresh_rate: '30s'

    - name: 'Performance Metrics'
      panels: 12
      refresh_rate: '1m'

  sli_slo_definition:
    availability_slo: '99.9%'
    latency_slo: '< 200ms (p95)'
    error_rate_slo: '< 1%'

  files_created:
    - config/prometheus.yml
    - config/alert_rules.yml
    - dashboards/service_health.json
    - dashboards/performance.json
    - docker-compose.monitoring.yml

  next_steps:
    - 'Deploy monitoring stack'
    - 'Configure notification channels'
    - 'Set up log aggregation'
    - 'Create runbooks for alerts'
    - 'Schedule SLO reviews'
```

### Implementation Examples

#### Microservices Monitoring

```bash
# Example: Set up comprehensive monitoring for microservices
monitor microservices prometheus-grafana \
  --environment=production \
  --severity-levels=critical,warning \
  --notification-channels=slack,pagerduty \
  --sli-slo-setup=true \
  --business-metrics=true
```

#### Monolith Application Monitoring

```bash
# Example: Monitor a monolithic application with DataDog
monitor monolith datadog \
  --environment=staging \
  --custom-metrics=true \
  --retention-period=90d \
  --sampling-rate=5
```

#### Serverless Function Monitoring

```bash
# Example: Set up monitoring for serverless functions
monitor serverless cloudwatch \
  --environment=production \
  --notification-channels=email,webhook \
  --business-metrics=false
```

### Troubleshooting Guide

#### Common Issues and Solutions

1. **Metrics Not Appearing**

   - Check metric endpoint accessibility
   - Verify scrape configuration
   - Confirm metric naming conventions
   - Validate service discovery

2. **High Cardinality Issues**

   - Implement metric filtering
   - Use recording rules for aggregation
   - Limit label values
   - Set retention policies

3. **Alert Fatigue**

   - Tune alert thresholds
   - Implement alert grouping
   - Add alert dependencies
   - Create escalation policies

4. **Dashboard Performance**
   - Optimize query performance
   - Use appropriate time ranges
   - Implement result caching
   - Reduce panel refresh rates

### Best Practices

1. **Metric Design**

   - Use consistent naming conventions
   - Implement proper labeling strategy
   - Avoid high cardinality metrics
   - Document metric meanings

2. **Alert Management**

   - Start with conservative thresholds
   - Implement alert dependencies
   - Use appropriate notification channels
   - Create clear runbooks

3. **Dashboard Organization**

   - Group related metrics
   - Use consistent color schemes
   - Implement role-based access
   - Regular dashboard reviews

4. **Performance Optimization**
   - Monitor monitoring system itself
   - Implement proper retention policies
   - Use efficient query patterns
   - Regular cleanup and maintenance