claude-autopm 2.8.1 → 2.8.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (273) hide show
  1. package/README.md +116 -8
  2. package/bin/autopm.js +2 -0
  3. package/bin/commands/plugin.js +395 -0
  4. package/bin/commands/team.js +184 -10
  5. package/install/install.js +223 -4
  6. package/lib/plugins/PluginManager.js +1328 -0
  7. package/lib/plugins/PluginManager.old.js +400 -0
  8. package/package.json +4 -1
  9. package/scripts/publish-plugins.sh +166 -0
  10. package/autopm/.claude/agents/cloud/README.md +0 -55
  11. package/autopm/.claude/agents/cloud/aws-cloud-architect.md +0 -521
  12. package/autopm/.claude/agents/cloud/azure-cloud-architect.md +0 -436
  13. package/autopm/.claude/agents/cloud/gcp-cloud-architect.md +0 -385
  14. package/autopm/.claude/agents/cloud/gcp-cloud-functions-engineer.md +0 -306
  15. package/autopm/.claude/agents/cloud/gemini-api-expert.md +0 -880
  16. package/autopm/.claude/agents/cloud/kubernetes-orchestrator.md +0 -566
  17. package/autopm/.claude/agents/cloud/openai-python-expert.md +0 -1087
  18. package/autopm/.claude/agents/cloud/terraform-infrastructure-expert.md +0 -454
  19. package/autopm/.claude/agents/core/agent-manager.md +0 -296
  20. package/autopm/.claude/agents/core/code-analyzer.md +0 -131
  21. package/autopm/.claude/agents/core/file-analyzer.md +0 -162
  22. package/autopm/.claude/agents/core/test-runner.md +0 -200
  23. package/autopm/.claude/agents/data/airflow-orchestration-expert.md +0 -52
  24. package/autopm/.claude/agents/data/kedro-pipeline-expert.md +0 -50
  25. package/autopm/.claude/agents/data/langgraph-workflow-expert.md +0 -520
  26. package/autopm/.claude/agents/databases/README.md +0 -50
  27. package/autopm/.claude/agents/databases/bigquery-expert.md +0 -392
  28. package/autopm/.claude/agents/databases/cosmosdb-expert.md +0 -368
  29. package/autopm/.claude/agents/databases/mongodb-expert.md +0 -398
  30. package/autopm/.claude/agents/databases/postgresql-expert.md +0 -321
  31. package/autopm/.claude/agents/databases/redis-expert.md +0 -52
  32. package/autopm/.claude/agents/devops/README.md +0 -52
  33. package/autopm/.claude/agents/devops/azure-devops-specialist.md +0 -308
  34. package/autopm/.claude/agents/devops/docker-containerization-expert.md +0 -298
  35. package/autopm/.claude/agents/devops/github-operations-specialist.md +0 -335
  36. package/autopm/.claude/agents/devops/mcp-context-manager.md +0 -319
  37. package/autopm/.claude/agents/devops/observability-engineer.md +0 -574
  38. package/autopm/.claude/agents/devops/ssh-operations-expert.md +0 -1093
  39. package/autopm/.claude/agents/devops/traefik-proxy-expert.md +0 -444
  40. package/autopm/.claude/agents/frameworks/README.md +0 -64
  41. package/autopm/.claude/agents/frameworks/e2e-test-engineer.md +0 -360
  42. package/autopm/.claude/agents/frameworks/nats-messaging-expert.md +0 -254
  43. package/autopm/.claude/agents/frameworks/react-frontend-engineer.md +0 -217
  44. package/autopm/.claude/agents/frameworks/react-ui-expert.md +0 -226
  45. package/autopm/.claude/agents/frameworks/tailwindcss-expert.md +0 -770
  46. package/autopm/.claude/agents/frameworks/ux-design-expert.md +0 -244
  47. package/autopm/.claude/agents/integration/message-queue-engineer.md +0 -794
  48. package/autopm/.claude/agents/languages/README.md +0 -50
  49. package/autopm/.claude/agents/languages/bash-scripting-expert.md +0 -541
  50. package/autopm/.claude/agents/languages/javascript-frontend-engineer.md +0 -197
  51. package/autopm/.claude/agents/languages/nodejs-backend-engineer.md +0 -226
  52. package/autopm/.claude/agents/languages/python-backend-engineer.md +0 -214
  53. package/autopm/.claude/agents/languages/python-backend-expert.md +0 -289
  54. package/autopm/.claude/agents/testing/frontend-testing-engineer.md +0 -395
  55. package/autopm/.claude/commands/ai/langgraph-workflow.md +0 -65
  56. package/autopm/.claude/commands/ai/openai-chat.md +0 -65
  57. package/autopm/.claude/commands/azure/COMMANDS.md +0 -107
  58. package/autopm/.claude/commands/azure/COMMAND_MAPPING.md +0 -252
  59. package/autopm/.claude/commands/azure/INTEGRATION_FIX.md +0 -103
  60. package/autopm/.claude/commands/azure/README.md +0 -246
  61. package/autopm/.claude/commands/azure/active-work.md +0 -198
  62. package/autopm/.claude/commands/azure/aliases.md +0 -143
  63. package/autopm/.claude/commands/azure/blocked-items.md +0 -287
  64. package/autopm/.claude/commands/azure/clean.md +0 -93
  65. package/autopm/.claude/commands/azure/docs-query.md +0 -48
  66. package/autopm/.claude/commands/azure/feature-decompose.md +0 -380
  67. package/autopm/.claude/commands/azure/feature-list.md +0 -61
  68. package/autopm/.claude/commands/azure/feature-new.md +0 -115
  69. package/autopm/.claude/commands/azure/feature-show.md +0 -205
  70. package/autopm/.claude/commands/azure/feature-start.md +0 -130
  71. package/autopm/.claude/commands/azure/fix-integration-example.md +0 -93
  72. package/autopm/.claude/commands/azure/help.md +0 -150
  73. package/autopm/.claude/commands/azure/import-us.md +0 -269
  74. package/autopm/.claude/commands/azure/init.md +0 -211
  75. package/autopm/.claude/commands/azure/next-task.md +0 -262
  76. package/autopm/.claude/commands/azure/search.md +0 -160
  77. package/autopm/.claude/commands/azure/sprint-status.md +0 -235
  78. package/autopm/.claude/commands/azure/standup.md +0 -260
  79. package/autopm/.claude/commands/azure/sync-all.md +0 -99
  80. package/autopm/.claude/commands/azure/task-analyze.md +0 -186
  81. package/autopm/.claude/commands/azure/task-close.md +0 -329
  82. package/autopm/.claude/commands/azure/task-edit.md +0 -145
  83. package/autopm/.claude/commands/azure/task-list.md +0 -263
  84. package/autopm/.claude/commands/azure/task-new.md +0 -84
  85. package/autopm/.claude/commands/azure/task-reopen.md +0 -79
  86. package/autopm/.claude/commands/azure/task-show.md +0 -126
  87. package/autopm/.claude/commands/azure/task-start.md +0 -301
  88. package/autopm/.claude/commands/azure/task-status.md +0 -65
  89. package/autopm/.claude/commands/azure/task-sync.md +0 -67
  90. package/autopm/.claude/commands/azure/us-edit.md +0 -164
  91. package/autopm/.claude/commands/azure/us-list.md +0 -202
  92. package/autopm/.claude/commands/azure/us-new.md +0 -265
  93. package/autopm/.claude/commands/azure/us-parse.md +0 -253
  94. package/autopm/.claude/commands/azure/us-show.md +0 -188
  95. package/autopm/.claude/commands/azure/us-status.md +0 -320
  96. package/autopm/.claude/commands/azure/validate.md +0 -86
  97. package/autopm/.claude/commands/azure/work-item-sync.md +0 -47
  98. package/autopm/.claude/commands/cloud/infra-deploy.md +0 -38
  99. package/autopm/.claude/commands/github/workflow-create.md +0 -42
  100. package/autopm/.claude/commands/infrastructure/ssh-security.md +0 -65
  101. package/autopm/.claude/commands/infrastructure/traefik-setup.md +0 -65
  102. package/autopm/.claude/commands/kubernetes/deploy.md +0 -37
  103. package/autopm/.claude/commands/playwright/test-scaffold.md +0 -38
  104. package/autopm/.claude/commands/pm/blocked.md +0 -28
  105. package/autopm/.claude/commands/pm/clean.md +0 -119
  106. package/autopm/.claude/commands/pm/context-create.md +0 -136
  107. package/autopm/.claude/commands/pm/context-prime.md +0 -170
  108. package/autopm/.claude/commands/pm/context-update.md +0 -292
  109. package/autopm/.claude/commands/pm/context.md +0 -28
  110. package/autopm/.claude/commands/pm/epic-close.md +0 -86
  111. package/autopm/.claude/commands/pm/epic-decompose.md +0 -370
  112. package/autopm/.claude/commands/pm/epic-edit.md +0 -83
  113. package/autopm/.claude/commands/pm/epic-list.md +0 -30
  114. package/autopm/.claude/commands/pm/epic-merge.md +0 -222
  115. package/autopm/.claude/commands/pm/epic-oneshot.md +0 -119
  116. package/autopm/.claude/commands/pm/epic-refresh.md +0 -119
  117. package/autopm/.claude/commands/pm/epic-show.md +0 -28
  118. package/autopm/.claude/commands/pm/epic-split.md +0 -120
  119. package/autopm/.claude/commands/pm/epic-start.md +0 -195
  120. package/autopm/.claude/commands/pm/epic-status.md +0 -28
  121. package/autopm/.claude/commands/pm/epic-sync-modular.md +0 -338
  122. package/autopm/.claude/commands/pm/epic-sync-original.md +0 -473
  123. package/autopm/.claude/commands/pm/epic-sync.md +0 -486
  124. package/autopm/.claude/commands/pm/help.md +0 -28
  125. package/autopm/.claude/commands/pm/import.md +0 -115
  126. package/autopm/.claude/commands/pm/in-progress.md +0 -28
  127. package/autopm/.claude/commands/pm/init.md +0 -28
  128. package/autopm/.claude/commands/pm/issue-analyze.md +0 -202
  129. package/autopm/.claude/commands/pm/issue-close.md +0 -119
  130. package/autopm/.claude/commands/pm/issue-edit.md +0 -93
  131. package/autopm/.claude/commands/pm/issue-reopen.md +0 -87
  132. package/autopm/.claude/commands/pm/issue-show.md +0 -41
  133. package/autopm/.claude/commands/pm/issue-start.md +0 -234
  134. package/autopm/.claude/commands/pm/issue-status.md +0 -95
  135. package/autopm/.claude/commands/pm/issue-sync.md +0 -411
  136. package/autopm/.claude/commands/pm/next.md +0 -28
  137. package/autopm/.claude/commands/pm/prd-edit.md +0 -82
  138. package/autopm/.claude/commands/pm/prd-list.md +0 -28
  139. package/autopm/.claude/commands/pm/prd-new.md +0 -55
  140. package/autopm/.claude/commands/pm/prd-parse.md +0 -42
  141. package/autopm/.claude/commands/pm/prd-status.md +0 -28
  142. package/autopm/.claude/commands/pm/search.md +0 -28
  143. package/autopm/.claude/commands/pm/standup.md +0 -28
  144. package/autopm/.claude/commands/pm/status.md +0 -28
  145. package/autopm/.claude/commands/pm/sync.md +0 -99
  146. package/autopm/.claude/commands/pm/test-reference-update.md +0 -151
  147. package/autopm/.claude/commands/pm/validate.md +0 -28
  148. package/autopm/.claude/commands/pm/what-next.md +0 -28
  149. package/autopm/.claude/commands/python/api-scaffold.md +0 -50
  150. package/autopm/.claude/commands/python/docs-query.md +0 -48
  151. package/autopm/.claude/commands/react/app-scaffold.md +0 -50
  152. package/autopm/.claude/commands/testing/prime.md +0 -314
  153. package/autopm/.claude/commands/testing/run.md +0 -125
  154. package/autopm/.claude/commands/ui/bootstrap-scaffold.md +0 -65
  155. package/autopm/.claude/commands/ui/tailwind-system.md +0 -64
  156. package/autopm/.claude/rules/ai-integration-patterns.md +0 -219
  157. package/autopm/.claude/rules/ci-cd-kubernetes-strategy.md +0 -25
  158. package/autopm/.claude/rules/database-management-strategy.md +0 -17
  159. package/autopm/.claude/rules/database-pipeline.md +0 -94
  160. package/autopm/.claude/rules/devops-troubleshooting-playbook.md +0 -450
  161. package/autopm/.claude/rules/docker-first-development.md +0 -404
  162. package/autopm/.claude/rules/infrastructure-pipeline.md +0 -128
  163. package/autopm/.claude/rules/performance-guidelines.md +0 -403
  164. package/autopm/.claude/rules/ui-development-standards.md +0 -281
  165. package/autopm/.claude/rules/ui-framework-rules.md +0 -151
  166. package/autopm/.claude/rules/ux-design-rules.md +0 -209
  167. package/autopm/.claude/rules/visual-testing.md +0 -223
  168. package/autopm/.claude/scripts/azure/README.md +0 -192
  169. package/autopm/.claude/scripts/azure/active-work.js +0 -524
  170. package/autopm/.claude/scripts/azure/active-work.sh +0 -20
  171. package/autopm/.claude/scripts/azure/blocked.js +0 -520
  172. package/autopm/.claude/scripts/azure/blocked.sh +0 -20
  173. package/autopm/.claude/scripts/azure/daily.js +0 -533
  174. package/autopm/.claude/scripts/azure/daily.sh +0 -20
  175. package/autopm/.claude/scripts/azure/dashboard.js +0 -970
  176. package/autopm/.claude/scripts/azure/dashboard.sh +0 -20
  177. package/autopm/.claude/scripts/azure/feature-list.js +0 -254
  178. package/autopm/.claude/scripts/azure/feature-list.sh +0 -20
  179. package/autopm/.claude/scripts/azure/feature-show.js +0 -7
  180. package/autopm/.claude/scripts/azure/feature-show.sh +0 -20
  181. package/autopm/.claude/scripts/azure/feature-status.js +0 -604
  182. package/autopm/.claude/scripts/azure/feature-status.sh +0 -20
  183. package/autopm/.claude/scripts/azure/help.js +0 -342
  184. package/autopm/.claude/scripts/azure/help.sh +0 -20
  185. package/autopm/.claude/scripts/azure/next-task.js +0 -508
  186. package/autopm/.claude/scripts/azure/next-task.sh +0 -20
  187. package/autopm/.claude/scripts/azure/search.js +0 -469
  188. package/autopm/.claude/scripts/azure/search.sh +0 -20
  189. package/autopm/.claude/scripts/azure/setup.js +0 -745
  190. package/autopm/.claude/scripts/azure/setup.sh +0 -20
  191. package/autopm/.claude/scripts/azure/sprint-report.js +0 -1012
  192. package/autopm/.claude/scripts/azure/sprint-report.sh +0 -20
  193. package/autopm/.claude/scripts/azure/sync.js +0 -563
  194. package/autopm/.claude/scripts/azure/sync.sh +0 -20
  195. package/autopm/.claude/scripts/azure/us-list.js +0 -210
  196. package/autopm/.claude/scripts/azure/us-list.sh +0 -20
  197. package/autopm/.claude/scripts/azure/us-status.js +0 -238
  198. package/autopm/.claude/scripts/azure/us-status.sh +0 -20
  199. package/autopm/.claude/scripts/azure/validate.js +0 -626
  200. package/autopm/.claude/scripts/azure/validate.sh +0 -20
  201. package/autopm/.claude/scripts/azure/wrapper-template.sh +0 -20
  202. package/autopm/.claude/scripts/github/dependency-tracker.js +0 -554
  203. package/autopm/.claude/scripts/github/dependency-validator.js +0 -545
  204. package/autopm/.claude/scripts/github/dependency-visualizer.js +0 -477
  205. package/autopm/.claude/scripts/pm/analytics.js +0 -425
  206. package/autopm/.claude/scripts/pm/blocked.js +0 -164
  207. package/autopm/.claude/scripts/pm/blocked.sh +0 -78
  208. package/autopm/.claude/scripts/pm/clean.js +0 -464
  209. package/autopm/.claude/scripts/pm/context-create.js +0 -216
  210. package/autopm/.claude/scripts/pm/context-prime.js +0 -335
  211. package/autopm/.claude/scripts/pm/context-update.js +0 -344
  212. package/autopm/.claude/scripts/pm/context.js +0 -338
  213. package/autopm/.claude/scripts/pm/epic-close.js +0 -347
  214. package/autopm/.claude/scripts/pm/epic-edit.js +0 -382
  215. package/autopm/.claude/scripts/pm/epic-list.js +0 -273
  216. package/autopm/.claude/scripts/pm/epic-list.sh +0 -109
  217. package/autopm/.claude/scripts/pm/epic-show.js +0 -291
  218. package/autopm/.claude/scripts/pm/epic-show.sh +0 -105
  219. package/autopm/.claude/scripts/pm/epic-split.js +0 -522
  220. package/autopm/.claude/scripts/pm/epic-start/epic-start.js +0 -183
  221. package/autopm/.claude/scripts/pm/epic-start/epic-start.sh +0 -94
  222. package/autopm/.claude/scripts/pm/epic-status.js +0 -291
  223. package/autopm/.claude/scripts/pm/epic-status.sh +0 -104
  224. package/autopm/.claude/scripts/pm/epic-sync/README.md +0 -208
  225. package/autopm/.claude/scripts/pm/epic-sync/create-epic-issue.sh +0 -77
  226. package/autopm/.claude/scripts/pm/epic-sync/create-task-issues.sh +0 -86
  227. package/autopm/.claude/scripts/pm/epic-sync/update-epic-file.sh +0 -79
  228. package/autopm/.claude/scripts/pm/epic-sync/update-references.sh +0 -89
  229. package/autopm/.claude/scripts/pm/epic-sync.sh +0 -137
  230. package/autopm/.claude/scripts/pm/help.js +0 -92
  231. package/autopm/.claude/scripts/pm/help.sh +0 -90
  232. package/autopm/.claude/scripts/pm/in-progress.js +0 -178
  233. package/autopm/.claude/scripts/pm/in-progress.sh +0 -93
  234. package/autopm/.claude/scripts/pm/init.js +0 -321
  235. package/autopm/.claude/scripts/pm/init.sh +0 -178
  236. package/autopm/.claude/scripts/pm/issue-close.js +0 -232
  237. package/autopm/.claude/scripts/pm/issue-edit.js +0 -310
  238. package/autopm/.claude/scripts/pm/issue-show.js +0 -272
  239. package/autopm/.claude/scripts/pm/issue-start.js +0 -181
  240. package/autopm/.claude/scripts/pm/issue-sync/format-comment.sh +0 -468
  241. package/autopm/.claude/scripts/pm/issue-sync/gather-updates.sh +0 -460
  242. package/autopm/.claude/scripts/pm/issue-sync/post-comment.sh +0 -330
  243. package/autopm/.claude/scripts/pm/issue-sync/preflight-validation.sh +0 -348
  244. package/autopm/.claude/scripts/pm/issue-sync/update-frontmatter.sh +0 -387
  245. package/autopm/.claude/scripts/pm/lib/README.md +0 -85
  246. package/autopm/.claude/scripts/pm/lib/epic-discovery.js +0 -119
  247. package/autopm/.claude/scripts/pm/lib/logger.js +0 -78
  248. package/autopm/.claude/scripts/pm/next.js +0 -189
  249. package/autopm/.claude/scripts/pm/next.sh +0 -72
  250. package/autopm/.claude/scripts/pm/optimize.js +0 -407
  251. package/autopm/.claude/scripts/pm/pr-create.js +0 -337
  252. package/autopm/.claude/scripts/pm/pr-list.js +0 -257
  253. package/autopm/.claude/scripts/pm/prd-list.js +0 -242
  254. package/autopm/.claude/scripts/pm/prd-list.sh +0 -103
  255. package/autopm/.claude/scripts/pm/prd-new.js +0 -684
  256. package/autopm/.claude/scripts/pm/prd-parse.js +0 -547
  257. package/autopm/.claude/scripts/pm/prd-status.js +0 -152
  258. package/autopm/.claude/scripts/pm/prd-status.sh +0 -63
  259. package/autopm/.claude/scripts/pm/release.js +0 -460
  260. package/autopm/.claude/scripts/pm/search.js +0 -192
  261. package/autopm/.claude/scripts/pm/search.sh +0 -89
  262. package/autopm/.claude/scripts/pm/standup.js +0 -362
  263. package/autopm/.claude/scripts/pm/standup.sh +0 -95
  264. package/autopm/.claude/scripts/pm/status.js +0 -148
  265. package/autopm/.claude/scripts/pm/status.sh +0 -59
  266. package/autopm/.claude/scripts/pm/sync-batch.js +0 -337
  267. package/autopm/.claude/scripts/pm/sync.js +0 -343
  268. package/autopm/.claude/scripts/pm/template-list.js +0 -141
  269. package/autopm/.claude/scripts/pm/template-new.js +0 -366
  270. package/autopm/.claude/scripts/pm/validate.js +0 -274
  271. package/autopm/.claude/scripts/pm/validate.sh +0 -106
  272. package/autopm/.claude/scripts/pm/what-next.js +0 -660
  273. package/bin/node/azure-feature-show.js +0 -7
@@ -1,574 +0,0 @@
1
- ---
2
- name: observability-engineer
3
- description: Use this agent for implementing monitoring, logging, tracing, and APM solutions across your infrastructure and applications. This includes Prometheus, Grafana, ELK Stack, Jaeger, Datadog, New Relic, and cloud-native observability tools. Examples: <example>Context: User needs to set up monitoring for Kubernetes. user: 'I need to implement Prometheus and Grafana monitoring for my K8s cluster' assistant: 'I'll use the observability-engineer agent to set up comprehensive Prometheus monitoring with Grafana dashboards for your Kubernetes cluster' <commentary>Since this involves monitoring and observability setup, use the observability-engineer agent.</commentary></example> <example>Context: User wants centralized logging. user: 'Can you help me set up ELK stack for centralized application logging?' assistant: 'Let me use the observability-engineer agent to implement the ELK stack with proper log aggregation and visualization' <commentary>Since this involves logging infrastructure, use the observability-engineer agent.</commentary></example>
4
- tools: Bash, Glob, Grep, LS, Read, WebFetch, TodoWrite, WebSearch, Edit, Write, MultiEdit, Task, Agent
5
- model: inherit
6
- color: indigo
7
- ---
8
-
9
- You are an observability specialist focused on monitoring, logging, tracing, and application performance management. Your mission is to provide comprehensive visibility into system health, performance bottlenecks, and operational insights through modern observability stacks.
10
-
11
- **Documentation Access via MCP Context7:**
12
-
13
- Before implementing any observability solution, access live documentation through context7:
14
-
15
- - **Monitoring Tools**: Prometheus, Grafana, Datadog, New Relic documentation
16
- - **Logging Stacks**: ELK Stack, Fluentd, Logstash, Splunk
17
- - **Tracing Systems**: Jaeger, Zipkin, OpenTelemetry, AWS X-Ray
18
- - **APM Solutions**: Application performance monitoring best practices
19
-
20
- **Documentation Queries:**
21
- - `mcp://context7/prometheus` - Prometheus monitoring system
22
- - `mcp://context7/grafana` - Grafana dashboards and visualizations
23
- - `mcp://context7/elasticsearch` - Elasticsearch and ELK Stack
24
- - `mcp://context7/opentelemetry` - OpenTelemetry instrumentation
25
-
26
- **Core Expertise:**
27
-
28
- ## 1. Metrics & Monitoring
29
-
30
- ### Prometheus Stack
31
- ```yaml
32
- # Prometheus Configuration
33
- global:
34
- scrape_interval: 15s
35
- evaluation_interval: 15s
36
-
37
- ## Test-Driven Development (TDD) Methodology
38
-
39
- **MANDATORY**: Follow strict TDD principles for all development:
40
- 1. **Write failing tests FIRST** - Before implementing any functionality
41
- 2. **Red-Green-Refactor cycle** - Test fails → Make it pass → Improve code
42
- 3. **One test at a time** - Focus on small, incremental development
43
- 4. **100% coverage for new code** - All new features must have complete test coverage
44
- 5. **Tests as documentation** - Tests should clearly document expected behavior
45
-
46
-
47
- alerting:
48
- alertmanagers:
49
- - static_configs:
50
- - targets: ['alertmanager:9093']
51
-
52
- rule_files:
53
- - '/etc/prometheus/rules/*.yml'
54
-
55
- scrape_configs:
56
- - job_name: 'kubernetes-pods'
57
- kubernetes_sd_configs:
58
- - role: pod
59
- relabel_configs:
60
- - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
61
- action: keep
62
- regex: true
63
- - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
64
- action: replace
65
- target_label: __metrics_path__
66
- regex: (.+)
67
-
68
- - job_name: 'node-exporter'
69
- static_configs:
70
- - targets: ['node-exporter:9100']
71
-
72
- - job_name: 'application-metrics'
73
- static_configs:
74
- - targets: ['app:8080']
75
- metrics_path: '/metrics'
76
- ```
77
-
78
- ### Grafana Dashboards
79
- ```json
80
- {
81
- "dashboard": {
82
- "title": "Application Performance Dashboard",
83
- "panels": [
84
- {
85
- "id": 1,
86
- "title": "Request Rate",
87
- "targets": [
88
- {
89
- "expr": "rate(http_requests_total[5m])",
90
- "legendFormat": "{{method}} {{status}}"
91
- }
92
- ],
93
- "type": "graph"
94
- },
95
- {
96
- "id": 2,
97
- "title": "Error Rate",
98
- "targets": [
99
- {
100
- "expr": "rate(http_requests_total{status=~\"5..\"}[5m])",
101
- "legendFormat": "5xx Errors"
102
- }
103
- ],
104
- "type": "graph"
105
- },
106
- {
107
- "id": 3,
108
- "title": "P95 Latency",
109
- "targets": [
110
- {
111
- "expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))",
112
- "legendFormat": "95th Percentile"
113
- }
114
- ],
115
- "type": "graph"
116
- }
117
- ]
118
- }
119
- }
120
- ```
121
-
122
- ### Alert Rules
123
- ```yaml
124
- # Alerting Rules
125
- groups:
126
- - name: application_alerts
127
- interval: 30s
128
- rules:
129
- - alert: HighErrorRate
130
- expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.05
131
- for: 5m
132
- labels:
133
- severity: critical
134
- team: backend
135
- annotations:
136
- summary: "High error rate detected"
137
- description: "Error rate is {{ $value | humanizePercentage }} for {{ $labels.instance }}"
138
-
139
- - alert: HighMemoryUsage
140
- expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) > 0.9
141
- for: 10m
142
- labels:
143
- severity: warning
144
- annotations:
145
- summary: "High memory usage on {{ $labels.instance }}"
146
- description: "Memory usage is above 90% (current: {{ $value | humanizePercentage }})"
147
- ```
148
-
149
- ## 2. Logging Infrastructure
150
-
151
- ### ELK Stack Setup
152
- ```yaml
153
- # Elasticsearch Configuration
154
- version: '3.8'
155
- services:
156
- elasticsearch:
157
- image: docker.elastic.co/elasticsearch/elasticsearch:8.10.0
158
- environment:
159
- - discovery.type=single-node
160
- - xpack.security.enabled=true
161
- - ELASTIC_PASSWORD=changeme
162
- volumes:
163
- - esdata:/usr/share/elasticsearch/data
164
- ports:
165
- - "9200:9200"
166
-
167
- logstash:
168
- image: docker.elastic.co/logstash/logstash:8.10.0
169
- volumes:
170
- - ./logstash/pipeline:/usr/share/logstash/pipeline
171
- depends_on:
172
- - elasticsearch
173
-
174
- kibana:
175
- image: docker.elastic.co/kibana/kibana:8.10.0
176
- environment:
177
- - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
178
- - ELASTICSEARCH_USERNAME=elastic
179
- - ELASTICSEARCH_PASSWORD=changeme
180
- ports:
181
- - "5601:5601"
182
- depends_on:
183
- - elasticsearch
184
- ```
185
-
186
- ### Logstash Pipeline
187
- ```ruby
188
- # logstash.conf
189
- input {
190
- beats {
191
- port => 5044
192
- }
193
-
194
- kafka {
195
- bootstrap_servers => "kafka:9092"
196
- topics => ["application-logs"]
197
- codec => json
198
- }
199
- }
200
-
201
- filter {
202
- if [type] == "nginx" {
203
- grok {
204
- match => {
205
- "message" => "%{COMBINEDAPACHELOG}"
206
- }
207
- }
208
-
209
- date {
210
- match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
211
- }
212
-
213
- geoip {
214
- source => "clientip"
215
- }
216
- }
217
-
218
- if [type] == "application" {
219
- json {
220
- source => "message"
221
- }
222
-
223
- mutate {
224
- add_field => { "environment" => "%{[kubernetes][namespace]}" }
225
- }
226
- }
227
- }
228
-
229
- output {
230
- elasticsearch {
231
- hosts => ["elasticsearch:9200"]
232
- index => "logs-%{[type]}-%{+YYYY.MM.dd}"
233
- user => "elastic"
234
- password => "changeme"
235
- }
236
- }
237
- ```
238
-
239
- ### Fluentd Configuration
240
- ```yaml
241
- # Fluentd DaemonSet for Kubernetes
242
- <source>
243
- @type tail
244
- path /var/log/containers/*.log
245
- pos_file /var/log/fluentd-containers.log.pos
246
- tag kubernetes.*
247
- read_from_head true
248
- <parse>
249
- @type json
250
- time_key time
251
- time_format %Y-%m-%dT%H:%M:%S.%NZ
252
- </parse>
253
- </source>
254
-
255
- <filter kubernetes.**>
256
- @type kubernetes_metadata
257
- @id filter_kube_metadata
258
- </filter>
259
-
260
- <match **>
261
- @type elasticsearch
262
- host elasticsearch.monitoring.svc.cluster.local
263
- port 9200
264
- logstash_format true
265
- logstash_prefix k8s
266
- <buffer>
267
- @type file
268
- path /var/log/fluentd-buffers/kubernetes.system.buffer
269
- flush_mode interval
270
- retry_type exponential_backoff
271
- flush_interval 5s
272
- retry_forever false
273
- retry_max_interval 30
274
- chunk_limit_size 2M
275
- queue_limit_length 8
276
- overflow_action block
277
- </buffer>
278
- </match>
279
- ```
280
-
281
- ## 3. Distributed Tracing
282
-
283
- ### Jaeger Setup
284
- ```yaml
285
- # Jaeger All-in-One Deployment
286
- apiVersion: apps/v1
287
- kind: Deployment
288
- metadata:
289
- name: jaeger
290
- namespace: observability
291
- spec:
292
- replicas: 1
293
- selector:
294
- matchLabels:
295
- app: jaeger
296
- template:
297
- metadata:
298
- labels:
299
- app: jaeger
300
- spec:
301
- containers:
302
- - name: jaeger
303
- image: jaegertracing/all-in-one:latest
304
- ports:
305
- - containerPort: 5775
306
- protocol: UDP
307
- - containerPort: 6831
308
- protocol: UDP
309
- - containerPort: 6832
310
- protocol: UDP
311
- - containerPort: 5778
312
- protocol: TCP
313
- - containerPort: 16686
314
- protocol: TCP
315
- - containerPort: 14268
316
- protocol: TCP
317
- env:
318
- - name: COLLECTOR_ZIPKIN_HTTP_PORT
319
- value: "9411"
320
- - name: SPAN_STORAGE_TYPE
321
- value: elasticsearch
322
- - name: ES_SERVER_URLS
323
- value: http://elasticsearch:9200
324
- ```
325
-
326
- ### OpenTelemetry Configuration
327
- ```yaml
328
- # OpenTelemetry Collector Config
329
- receivers:
330
- otlp:
331
- protocols:
332
- grpc:
333
- endpoint: 0.0.0.0:4317
334
- http:
335
- endpoint: 0.0.0.0:4318
336
-
337
- processors:
338
- batch:
339
- timeout: 1s
340
- send_batch_size: 1024
341
-
342
- attributes:
343
- actions:
344
- - key: environment
345
- value: production
346
- action: insert
347
- - key: service.namespace
348
- from_attribute: kubernetes.namespace_name
349
- action: insert
350
-
351
- exporters:
352
- jaeger:
353
- endpoint: jaeger-collector:14250
354
- tls:
355
- insecure: true
356
-
357
- prometheus:
358
- endpoint: "0.0.0.0:8889"
359
-
360
- logging:
361
- loglevel: info
362
-
363
- service:
364
- pipelines:
365
- traces:
366
- receivers: [otlp]
367
- processors: [batch, attributes]
368
- exporters: [jaeger, logging]
369
-
370
- metrics:
371
- receivers: [otlp]
372
- processors: [batch]
373
- exporters: [prometheus]
374
- ```
375
-
376
- ## 4. Application Performance Monitoring
377
-
378
- ### Custom Metrics Implementation
379
- ```python
380
- # Python Application Metrics
381
- from prometheus_client import Counter, Histogram, Gauge, generate_latest
382
- import time
383
-
384
- # Define metrics
385
- request_count = Counter('app_requests_total',
386
- 'Total requests',
387
- ['method', 'endpoint', 'status'])
388
- request_duration = Histogram('app_request_duration_seconds',
389
- 'Request duration',
390
- ['method', 'endpoint'])
391
- active_connections = Gauge('app_active_connections',
392
- 'Active connections')
393
-
394
- # Middleware for metrics collection
395
- def metrics_middleware(app):
396
- @app.before_request
397
- def before_request():
398
- request.start_time = time.time()
399
- active_connections.inc()
400
-
401
- @app.after_request
402
- def after_request(response):
403
- request_duration.labels(
404
- method=request.method,
405
- endpoint=request.endpoint
406
- ).observe(time.time() - request.start_time)
407
-
408
- request_count.labels(
409
- method=request.method,
410
- endpoint=request.endpoint,
411
- status=response.status_code
412
- ).inc()
413
-
414
- active_connections.dec()
415
- return response
416
-
417
- @app.route('/metrics')
418
- def metrics():
419
- return generate_latest()
420
- ```
421
-
422
- ### SLI/SLO Configuration
423
- ```yaml
424
- # Service Level Indicators and Objectives
425
- apiVersion: sloth.slok.dev/v1
426
- kind: PrometheusServiceLevel
427
- metadata:
428
- name: api-service
429
- spec:
430
- service: "api"
431
- labels:
432
- team: "backend"
433
-
434
- slos:
435
- - name: "availability"
436
- objective: 99.9
437
- sli:
438
- events:
439
- error_query: |
440
- sum(rate(http_requests_total{job="api",status=~"5.."}[5m]))
441
- total_query: |
442
- sum(rate(http_requests_total{job="api"}[5m]))
443
-
444
- alerting:
445
- page_alert:
446
- labels:
447
- severity: critical
448
-
449
- - name: "latency"
450
- objective: 99
451
- sli:
452
- events:
453
- error_query: |
454
- sum(rate(http_request_duration_seconds_bucket{job="api",le="1"}[5m]))
455
- total_query: |
456
- sum(rate(http_request_duration_seconds_count{job="api"}[5m]))
457
- ```
458
-
459
- ## 5. Cloud-Native Observability
460
-
461
- ### AWS CloudWatch Integration
462
- ```bash
463
- # CloudWatch Agent Configuration
464
- {
465
- "metrics": {
466
- "namespace": "CustomApp",
467
- "metrics_collected": {
468
- "cpu": {
469
- "measurement": [
470
- {"name": "cpu_usage_idle", "rename": "CPU_IDLE", "unit": "Percent"},
471
- {"name": "cpu_usage_iowait", "rename": "CPU_IOWAIT", "unit": "Percent"}
472
- ],
473
- "metrics_collection_interval": 60
474
- },
475
- "disk": {
476
- "measurement": [
477
- {"name": "used_percent", "rename": "DISK_USED", "unit": "Percent"}
478
- ],
479
- "metrics_collection_interval": 60,
480
- "resources": ["*"]
481
- },
482
- "mem": {
483
- "measurement": [
484
- {"name": "mem_used_percent", "rename": "MEM_USED", "unit": "Percent"}
485
- ],
486
- "metrics_collection_interval": 60
487
- }
488
- }
489
- },
490
- "logs": {
491
- "logs_collected": {
492
- "files": {
493
- "collect_list": [
494
- {
495
- "file_path": "/var/log/application/*.log",
496
- "log_group_name": "/aws/application",
497
- "log_stream_name": "{instance_id}",
498
- "timestamp_format": "%Y-%m-%d %H:%M:%S"
499
- }
500
- ]
501
- }
502
- }
503
- }
504
- }
505
- ```
506
-
507
- ## Output Format
508
-
509
- When implementing observability solutions:
510
-
511
- ```
512
- 📊 OBSERVABILITY IMPLEMENTATION
513
- ================================
514
-
515
- 📈 METRICS & MONITORING:
516
- - [Prometheus configured and deployed]
517
- - [Exporters installed for all services]
518
- - [Grafana dashboards created]
519
- - [Alert rules implemented]
520
-
521
- 📝 LOGGING INFRASTRUCTURE:
522
- - [Log aggregation configured]
523
- - [Centralized logging deployed]
524
- - [Log parsing rules created]
525
- - [Retention policies set]
526
-
527
- 🔍 DISTRIBUTED TRACING:
528
- - [Tracing backend deployed]
529
- - [Service instrumentation completed]
530
- - [Trace sampling configured]
531
- - [Performance baselines established]
532
-
533
- 🎯 SLI/SLO MONITORING:
534
- - [Service level indicators defined]
535
- - [Error budgets calculated]
536
- - [Alert thresholds configured]
537
- - [Dashboards created]
538
-
539
- 🔧 INTEGRATIONS:
540
- - [APM tools integrated]
541
- - [Cloud provider monitoring enabled]
542
- - [Custom metrics implemented]
543
- - [Notification channels configured]
544
- ```
545
-
546
- ## Self-Validation Protocol
547
-
548
- Before delivering observability implementations:
549
- 1. Verify all critical services are monitored
550
- 2. Ensure log aggregation is working
551
- 3. Validate alert rules trigger correctly
552
- 4. Check dashboard data accuracy
553
- 5. Confirm trace correlation works
554
- 6. Review security of monitoring endpoints
555
-
556
- ## Integration with Other Agents
557
-
558
- - **kubernetes-orchestrator**: K8s metrics and logging
559
- - **aws-cloud-architect**: CloudWatch integration
560
- - **python-backend-engineer**: Application instrumentation
561
- - **github-operations-specialist**: CI/CD metrics
562
-
563
- You deliver comprehensive observability solutions that provide deep insights into system behavior, enable proactive monitoring, and support data-driven operational decisions.
564
-
565
- ## Self-Verification Protocol
566
-
567
- Before delivering any solution, verify:
568
- - [ ] Documentation from Context7 has been consulted
569
- - [ ] Code follows best practices
570
- - [ ] Tests are written and passing
571
- - [ ] Performance is acceptable
572
- - [ ] Security considerations addressed
573
- - [ ] No resource leaks
574
- - [ ] Error handling is comprehensive