@gonzih/skills-devops 1.1.0 → 1.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json (changed)
@@ -1 +1 @@
-{"name":"@gonzih/skills-devops","version":"1.
+{"name":"@gonzih/skills-devops","version":"1.2.0","description":"DevOps skills for Claude Code","type":"module","scripts":{"postinstall":"node install.js"},"files":["skills/","install.js","README.md"],"keywords":["claude","mcp","skills","devops"],"license":"MIT"}
@@ -20,6 +20,12 @@ Analyze a set of incoming monitoring alerts, determine severity and urgency, ide
 4. Identify the root cause alert and downstream effects
 5. Output a triage summary: what to act on now, what to monitor, what to silence
 
+## Live Data Sources
+- **Prometheus alerting rule examples**: `https://github.com/samber/awesome-prometheus-alerts` — community-maintained library of Prometheus alerting rules by category (infra, databases, Kubernetes, etc.)
+- **AWS status page API**: `https://health.aws.amazon.com/health/status` — current AWS service health; programmatic access via AWS Health API (`aws health describe-events`)
+- **GCP status page API**: `https://status.cloud.google.com/incidents.json` — machine-readable GCP incident feed
+- **Azure status API**: `https://azure.status.microsoft/api/v2/status.json` — Azure service health JSON feed
+
 ## Example
 User: "I have 47 alerts firing: disk full on db-01, high latency on api-gateway, 5xx spike on checkout"
 → Identifies disk-full on db-01 as root cause driving the cascade, recommends clearing disk space first, provides ordered action plan.
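Step 4 of the alert-triage skill ("Identify the root cause alert and downstream effects") and the db-01 example can be sketched as a walk over a service dependency map. This is a minimal illustration, not the skill's method: the dependency map `DEPENDS_ON` and the helper names are hypothetical, since the skill does not say where topology data comes from. An alert counts as a downstream symptom when anything it depends on is also alerting; otherwise it is a root-cause candidate.

```python
from collections import deque


def upstream(component, depends_on):
    """Return every component that `component` depends on, directly or transitively."""
    seen = set()
    queue = deque(depends_on.get(component, ()))
    while queue:
        dep = queue.popleft()
        if dep not in seen:
            seen.add(dep)
            queue.extend(depends_on.get(dep, ()))
    seen.discard(component)  # a dependency cycle must not make a component its own cause
    return seen


def triage(alerts, depends_on):
    """Split firing alerts into root-cause candidates and downstream symptoms.

    `alerts` maps component -> alert text. An alert is a symptom when any
    component it depends on is also alerting; otherwise it is a root-cause
    candidate to act on first.
    """
    firing = set(alerts)
    roots, symptoms = {}, {}
    for component, text in alerts.items():
        causes = upstream(component, depends_on) & firing
        if causes:
            symptoms[component] = (text, sorted(causes))
        else:
            roots[component] = text
    return roots, symptoms


# Hypothetical topology: checkout calls api-gateway, which reads from db-01.
DEPENDS_ON = {"checkout": ["api-gateway"], "api-gateway": ["db-01"]}
ALERTS = {"db-01": "disk full", "api-gateway": "high latency", "checkout": "5xx spike"}

roots, symptoms = triage(ALERTS, DEPENDS_ON)
print(roots)  # {'db-01': 'disk full'}
```

If alerting components form a dependency cycle, every member is classified as a symptom, so a fuller version would need a tie-breaker such as earliest alert start time.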
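The GCP incident feed listed in the alert-triage hunk can be polled and narrowed to incidents that are still open. A minimal sketch using only the standard library; the field names `end` and `affected_products[].title` are assumptions about the feed's schema and should be checked against a live response before relying on them.

```python
import json
import urllib.request

GCP_INCIDENTS_URL = "https://status.cloud.google.com/incidents.json"


def fetch_gcp_incidents(url=GCP_INCIDENTS_URL, timeout=10):
    """Download the GCP incident feed and parse it (expected to be a JSON array)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def ongoing(incidents, products=None):
    """Return incidents without an end time, optionally limited to product titles.

    Assumed fields: `end` (missing or empty while ongoing) and
    `affected_products`, a list of objects with a `title` key.
    """
    wanted = {p.lower() for p in products} if products else None
    result = []
    for incident in incidents:
        if incident.get("end"):
            continue  # already resolved
        titles = {p.get("title", "").lower() for p in incident.get("affected_products", [])}
        if wanted is None or titles & wanted:
            result.append(incident)
    return result


# Usage (network access required):
#   for incident in ongoing(fetch_gcp_incidents(), products=["Cloud SQL"]):
#       print(incident.get("severity"), incident.get("external_desc"))
```

Filtering by product title lets the triage step correlate a cloud-side incident with the specific alerts that are firing, rather than surfacing every open GCP incident.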
@@ -29,6 +29,11 @@ When the pipeline deploys infrastructure with Pulumi, check for:
 - **Config missing**: stack config not committed or secret not set — run `pulumi config set <key>`
 - **Passphrase prompt**: `PULUMI_CONFIG_PASSPHRASE` env var missing for self-managed backends
 
+## Live Data Sources
+- **GitHub Actions API**: `https://api.github.com/repos/{owner}/{repo}/actions/runs` — fetch recent workflow run status, logs, and failure reasons
+- **Docker Hub vulnerability feeds**: `https://hub.docker.com/v2/repositories/{namespace}/{repo}/tags` — check image tag availability; cross-reference CVEs via Docker Scout or Snyk
+- **npm audit patterns**: Run `npm audit --json` for dependency vulnerability data; common failure patterns include `EAUDITNOPJSON`, peer conflict errors, and lock file drift
+
 ## Example
 User: "My GitHub Actions deploy job is failing with 'Error: Unable to locate executable file: docker'"
 → Diagnose missing Docker setup step, suggest `docker/setup-buildx-action`, apply to workflow YAML.
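The GitHub Actions runs endpoint from the CI/CD hunk can be queried for recent failures. Note that this endpoint returns run metadata (status, conclusion, branch, URLs); the logs themselves are a separate zip download at `/repos/{owner}/{repo}/actions/runs/{run_id}/logs`. A minimal standard-library sketch; the `GITHUB_TOKEN` environment variable is an assumption, and a token is only required for private repositories or higher rate limits.

```python
import json
import os
import urllib.parse
import urllib.request


def list_failed_runs(owner, repo, token=None, per_page=20):
    """Fetch recent failed workflow runs via GET /repos/{owner}/{repo}/actions/runs."""
    token = token or os.environ.get("GITHUB_TOKEN")  # optional; needed for private repos
    query = urllib.parse.urlencode({"status": "failure", "per_page": per_page})
    url = f"https://api.github.com/repos/{owner}/{repo}/actions/runs?{query}"
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)


def summarize_runs(payload):
    """Reduce the response to (workflow name, branch, conclusion, run URL) tuples."""
    return [
        (run.get("name"), run.get("head_branch"), run.get("conclusion"), run.get("html_url"))
        for run in payload.get("workflow_runs", [])
    ]


# Usage (network access required; owner/repo are placeholders):
#   for row in summarize_runs(list_failed_runs("my-org", "my-repo")):
#       print(*row)
```

Keeping the fetch and the summary separate means the summarizer can be exercised on a saved response without network access.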
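The `npm audit --json` pattern in the CI/CD hunk is often used as a build gate. A minimal sketch, assuming the npm 7+ report shape where `metadata.vulnerabilities` holds per-severity counts; the `error` object handling for codes such as `EAUDITNOPJSON` is an assumption about how npm reports failures under `--json` and may differ between npm versions. The threshold policy is illustrative.

```python
import json
import subprocess

SEVERITIES = ["info", "low", "moderate", "high", "critical"]


def run_npm_audit(cwd="."):
    """Run `npm audit --json` in `cwd` and return the parsed report.

    npm audit exits non-zero when it finds vulnerabilities, so the exit code
    alone is not treated as a failure; empty or non-JSON output is.
    """
    proc = subprocess.run(["npm", "audit", "--json"], cwd=cwd, capture_output=True, text=True)
    if not proc.stdout.strip():
        raise RuntimeError(f"npm audit produced no JSON output: {proc.stderr.strip()}")
    return json.loads(proc.stdout)


def gate(report, threshold="high"):
    """Return (passed, counts); fail when any finding is at or above `threshold`."""
    if "error" in report:
        # Assumed shape for failures such as EAUDITNOPJSON when --json is set.
        raise RuntimeError(f"npm audit error: {report['error'].get('code')}")
    counts = report.get("metadata", {}).get("vulnerabilities", {})
    blocking = SEVERITIES[SEVERITIES.index(threshold):]
    passed = all(counts.get(level, 0) == 0 for level in blocking)
    return passed, counts
```

Because the exit code is ignored, a pipeline step using this sketch would decide pass or fail from `gate(...)[0]` rather than from npm's own status.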
@@ -27,6 +27,11 @@ When the incident involves infrastructure managed by Pulumi:
 2. Re-deploy the previous version: check out the corresponding commit and run `pulumi up`, or use `pulumi up --target-replace <urn>` for surgical replacement of a single resource
 3. Use `pulumi refresh` after rollback to confirm state matches real infrastructure
 
+## Live Data Sources
+- **AWS Service Health Dashboard API**: `https://health.aws.amazon.com/health/status` — real-time AWS service events; use `aws health describe-events --filter eventTypeCategories=issue` for programmatic querying
+- **GCP status API**: `https://status.cloud.google.com/incidents.json` — current and historical GCP incidents in JSON format
+- **PagerDuty webhook patterns**: Inbound webhooks deliver `incident.trigger`, `incident.acknowledge`, `incident.resolve` payloads — use these to auto-populate incident timelines and sync state
+
 ## Example
 User: "Production database is down, started 14:32 UTC, ~10k users affected"
 → Declares SEV1, drafts initial stakeholder update, starts timeline, prompts for on-call contacts, and guides through mitigation steps to resolution and postmortem.
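The PagerDuty webhook pattern in the incident-response hunk can feed an incident timeline. The event names listed there (`incident.trigger`, `incident.acknowledge`, `incident.resolve`) match PagerDuty's older V2 webhooks; the current V3 webhooks use past-tense types such as `incident.triggered`, `incident.acknowledged`, and `incident.resolved`, and sign each request with an HMAC-SHA256 `X-PagerDuty-Signature` header. This is a sketch of V3 handling only; the payload fields read here (`event.event_type`, `event.occurred_at`, `event.data.id`, `event.data.title`) are assumptions to confirm against PagerDuty's webhook documentation.

```python
import hashlib
import hmac
import json

# V3 event types mapped to timeline states. V2 webhooks used the present-tense
# names and a different payload shape, which this sketch does not parse.
STATES = {
    "incident.triggered": "triggered",
    "incident.acknowledged": "acknowledged",
    "incident.resolved": "resolved",
}


def verify_signature(raw_body, signature_header, secret):
    """Check X-PagerDuty-Signature, which may list several comma-separated v1= signatures."""
    expected = "v1=" + hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    return any(
        hmac.compare_digest(expected.encode(), sig.strip().encode())
        for sig in signature_header.split(",")
    )


def timeline_entry(payload):
    """Turn one V3 webhook payload into a timeline row, or None for other event types."""
    event = payload.get("event", {})
    state = STATES.get(event.get("event_type"))
    if state is None:
        return None
    data = event.get("data", {})
    return {
        "at": event.get("occurred_at"),
        "incident": data.get("id"),
        "title": data.get("title"),
        "state": state,
    }


def build_timeline(raw_bodies):
    """Parse raw webhook bodies and return the recognised entries sorted by time."""
    entries = [timeline_entry(json.loads(body)) for body in raw_bodies]
    # ISO 8601 UTC timestamps in the same format sort correctly as strings.
    return sorted((e for e in entries if e), key=lambda e: e["at"] or "")
```

Verifying the signature against the raw request bytes, before any JSON parsing, matters because re-serialized JSON will not reproduce the signed body byte for byte.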
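The `aws health describe-events --filter eventTypeCategories=issue` call in the incident-response hunk has a boto3 counterpart. A minimal sketch, assuming AWS credentials are configured; the AWS Health API is only available on Business, Enterprise On-Ramp, or Enterprise Support plans, and its global endpoint is in `us-east-1`. Unlike the CLI line in the hunk, this version also restricts results to events whose status is `open`.

```python
def open_aws_issues(region="us-east-1"):
    """List open AWS Health events of category 'issue' (boto3 counterpart of the CLI filter)."""
    import boto3  # imported here so `by_service` can be used without boto3 installed

    client = boto3.client("health", region_name=region)
    paginator = client.get_paginator("describe_events")
    events = []
    for page in paginator.paginate(
        filter={"eventTypeCategories": ["issue"], "eventStatusCodes": ["open"]}
    ):
        events.extend(page.get("events", []))
    return events


def by_service(events):
    """Group events as {service: [(region, eventTypeCode), ...]} for a quick summary."""
    grouped = {}
    for event in events:
        grouped.setdefault(event.get("service"), []).append(
            (event.get("region"), event.get("eventTypeCode"))
        )
    return grouped
```

Grouping by service gives the incident commander a quick answer to "is this us or AWS?" before digging into individual events.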
@@ -26,6 +26,11 @@ When the runbook covers an infrastructure change managed by Pulumi, include:
 - Link to the stack in Pulumi Cloud (e.g. `https://app.pulumi.com/<org>/<project>/<stack>`)
 - Rollback step using `pulumi stack history` to identify the previous deployment and re-run it
 
+## Live Data Sources
+- **AWS Runbook templates**: `https://docs.aws.amazon.com/systems-manager/latest/userguide/automation-documents.html` — official AWS Systems Manager Automation runbook library
+- **PagerDuty runbook community**: `https://community.pagerduty.com` — real-world runbook examples shared by practitioners
+- **SRE Workbook public patterns**: `https://sre.google/workbook/table-of-contents/` — Google SRE Workbook chapters on on-call and operational procedures
+
 ## Example
 User: "Write a runbook for restarting the payment-service in Kubernetes"
 → Produces a runbook covering health checks, drain, rolling restart, verification, and rollback with `kubectl` commands.
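The payment-service runbook example in the last hunk can be sketched as an ordered sequence of `kubectl` commands with a rollback branch. This assumes `payment-service` is a Deployment; the namespace `payments` and the helper names are hypothetical. The drain step is left out because it depends on how traffic reaches the service. The sketch defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess


def run(cmd, dry_run):
    """Print a command and, unless dry_run is set, execute it; True on success."""
    print("$", shlex.join(cmd))
    if dry_run:
        return True
    return subprocess.run(cmd).returncode == 0


def rolling_restart(deployment, namespace, timeout="5m", dry_run=True):
    """Health check, rolling restart, verification, and rollback on failure."""
    target = f"deployment/{deployment}"
    ns = ["-n", namespace]
    # Health check: confirm the Deployment exists and is reachable before touching it.
    if not run(["kubectl", "get", target, *ns], dry_run):
        return "aborted: pre-check failed"
    # Rolling restart: pods are replaced following the Deployment's rollout strategy.
    if not run(["kubectl", "rollout", "restart", target, *ns], dry_run):
        return "aborted: restart failed"
    # Verification: block until the rollout completes or the timeout expires.
    if run(["kubectl", "rollout", "status", target, *ns, f"--timeout={timeout}"], dry_run):
        return "ok"
    # Rollback: revert to the previous revision of the Deployment.
    run(["kubectl", "rollout", "undo", target, *ns], dry_run)
    return "rolled back"


# Dry run with a hypothetical namespace; prints the commands without running them.
rolling_restart("payment-service", "payments")
```

Rolling back only when verification fails, not when the pre-check fails, avoids undoing a revision that was never changed.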