@gonzih/skills-devops 1.0.0 → 1.2.0

package/package.json CHANGED
@@ -1 +1 @@
1
- {"name":"@gonzih/skills-devops","version":"1.0.0","description":"DevOps skills for Claude Code","type":"module","scripts":{"postinstall":"node install.js"},"files":["skills/","install.js","README.md"],"keywords":["claude","mcp","skills","devops"],"license":"MIT"}
1
+ {"name":"@gonzih/skills-devops","version":"1.2.0","description":"DevOps skills for Claude Code","type":"module","scripts":{"postinstall":"node install.js"},"files":["skills/","install.js","README.md"],"keywords":["claude","mcp","skills","devops"],"license":"MIT"}
@@ -20,6 +20,12 @@ Analyze a set of incoming monitoring alerts, determine severity and urgency, ide
20
20
  4. Identify the root cause alert and downstream effects
21
21
  5. Output a triage summary: what to act on now, what to monitor, what to silence
22
22
 
23
+ ## Live Data Sources
24
+ - **Prometheus alerting rule examples**: `https://github.com/samber/awesome-prometheus-alerts` — community-maintained library of Prometheus alerting rules by category (infra, databases, Kubernetes, etc.)
25
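+ - **AWS Health Dashboard**: `https://health.aws.amazon.com/health/status` (current AWS service health, human-readable); programmatic access goes through the AWS Health API (`aws health describe-events`), which requires a Business, Enterprise On-Ramp, or Enterprise Support plan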
26
+ - **GCP status page API**: `https://status.cloud.google.com/incidents.json` — machine-readable GCP incident feed
27
+ - **Azure status API**: `https://azure.status.microsoft/api/v2/status.json` — Azure service health JSON feed
28
+
23
29
  ## Example
24
30
  User: "I have 47 alerts firing: disk full on db-01, high latency on api-gateway, 5xx spike on checkout"
25
31
  → Identifies disk-full on db-01 as root cause driving the cascade, recommends clearing disk space first, provides ordered action plan.
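
The root-cause step in this skill can be sketched as a small dependency check. This is a minimal illustration and not part of the package: the `FiringAlert` shape, the `dependsOn` field, and the dependency edges in the sample data are all assumptions made for the example.

```typescript
// Hypothetical alert shape for illustration; real payloads (e.g. from Alertmanager) differ.
interface FiringAlert {
  name: string;
  target: string;
  // Targets that this alert's target depends on (assumed to be known in advance).
  dependsOn: string[];
}

interface TriageSummary {
  actNow: FiringAlert[]; // likely root causes
  monitor: FiringAlert[]; // likely downstream effects
}

// An alert counts as a likely root cause when none of its dependencies is also alerting.
function triage(alerts: FiringAlert[]): TriageSummary {
  const alertingTargets = new Set(alerts.map((a) => a.target));
  const summary: TriageSummary = { actNow: [], monitor: [] };
  for (const alert of alerts) {
    const downstream = alert.dependsOn.some((dep) => alertingTargets.has(dep));
    if (downstream) {
      summary.monitor.push(alert);
    } else {
      summary.actNow.push(alert);
    }
  }
  return summary;
}

// Sample data modeled on the example above; the dependency edges are assumed.
const sampleAlerts: FiringAlert[] = [
  { name: "disk full", target: "db-01", dependsOn: [] },
  { name: "high latency", target: "api-gateway", dependsOn: ["db-01"] },
  { name: "5xx spike", target: "checkout", dependsOn: ["api-gateway"] },
];
const triageResult = triage(sampleAlerts);
```

With these assumed edges, only the db-01 alert lands in `actNow`; the other two are classified as downstream effects to monitor.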
@@ -20,6 +20,20 @@ Diagnose failing CI/CD pipelines, identify root causes, and propose or apply fix
20
20
  4. Propose a fix and explain the root cause
21
21
  5. Offer to apply the fix to the config file
22
22
 
23
+ ## Pulumi CI patterns (GitHub Actions)
24
+ When the pipeline deploys infrastructure with Pulumi, check for:
25
+ - Action: `pulumi/actions@v5`, the usual way to run `pulumi preview` and `pulumi up` steps (installing the Pulumi CLI in a manual step also works)
26
+ - Required env var: `PULUMI_ACCESS_TOKEN` (must be stored as a repository secret)
27
+ - Common failures:
28
+ - **Stack lock**: another run holds the state lock; cancel the other workflow run, or run `pulumi cancel` to cancel the stack's in-progress update
29
+ - **Config missing**: stack config not committed or secret not set; run `pulumi config set <key> <value>` (add `--secret` for secrets) and commit the resulting `Pulumi.<stack>.yaml`
30
+ - **Passphrase prompt**: `PULUMI_CONFIG_PASSPHRASE` env var missing for self-managed backends
31
+
32
+ ## Live Data Sources
33
+ - **GitHub Actions API**: `https://api.github.com/repos/{owner}/{repo}/actions/runs` returns recent workflow runs with their `status` and `conclusion`; logs for a single run are a separate download at `/actions/runs/{run_id}/logs`
34
+ - **Docker Hub tags API**: `https://hub.docker.com/v2/repositories/{namespace}/{repo}/tags` lists image tags, which is useful for checking tag availability; it does not report vulnerabilities, so cross-reference CVEs via Docker Scout or Snyk
35
+ - **npm audit patterns**: Run `npm audit --json` for dependency vulnerability data; common failure patterns include `EAUDITNOPJSON`, peer conflict errors, and lock file drift
36
+
23
37
  ## Example
24
38
  User: "My GitHub Actions deploy job is failing with 'Error: Unable to locate executable file: docker'"
25
39
  → Diagnose missing Docker setup step, suggest `docker/setup-buildx-action`, apply to workflow YAML.
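
As a rough sketch of how the workflow-runs endpoint listed under Live Data Sources might be consumed, the function below filters failed runs out of an already-parsed response. Only a few fields of the real response are modeled, the sample values are invented, and the HTTP call itself (with authentication) is left out.

```typescript
// Subset of the fields returned by GET /repos/{owner}/{repo}/actions/runs.
interface WorkflowRun {
  id: number;
  name: string | null;
  status: string | null; // e.g. "queued", "in_progress", "completed"
  conclusion: string | null; // e.g. "success", "failure", "cancelled"; null while running
  html_url: string;
}

interface WorkflowRunsResponse {
  total_count: number;
  workflow_runs: WorkflowRun[];
}

// Keep only runs that finished and failed, preserving the order of the response.
function failedRuns(response: WorkflowRunsResponse): WorkflowRun[] {
  return response.workflow_runs.filter(
    (run) => run.status === "completed" && run.conclusion === "failure",
  );
}

// Invented sample payload; IDs and URLs are placeholders, not real runs.
const sampleResponse: WorkflowRunsResponse = {
  total_count: 3,
  workflow_runs: [
    { id: 103, name: "deploy", status: "in_progress", conclusion: null, html_url: "https://example.invalid/runs/103" },
    { id: 102, name: "deploy", status: "completed", conclusion: "failure", html_url: "https://example.invalid/runs/102" },
    { id: 101, name: "test", status: "completed", conclusion: "success", html_url: "https://example.invalid/runs/101" },
  ],
};
const failures = failedRuns(sampleResponse);
```

Each failed run's `id` can then be used to download that run's logs for diagnosis.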
@@ -21,6 +21,17 @@ Guide incident response from detection through resolution, coordinate communicat
21
21
  5. **Resolve**: Confirm resolution criteria met; set follow-up review time
22
22
  6. **Postmortem**: Generate blameless postmortem doc with 5-whys root cause analysis and action items
23
23
 
24
+ ## Infra rollback (Pulumi)
25
+ When the incident involves infrastructure managed by Pulumi:
26
+ 1. Run `pulumi stack history` to list recent deployments and identify the last known-good deploy
27
+ 2. Re-deploy the previous version: check out the corresponding commit and run `pulumi up`, or use `pulumi up --target-replace <urn>` for surgical replacement of a single resource
28
+ 3. Use `pulumi refresh` after rollback to confirm state matches real infrastructure
29
+
30
+ ## Live Data Sources
31
+ - **AWS Service Health Dashboard API**: `https://health.aws.amazon.com/health/status` — real-time AWS service events; use `aws health describe-events --filter eventTypeCategories=issue` for programmatic querying
32
+ - **GCP status API**: `https://status.cloud.google.com/incidents.json` — current and historical GCP incidents in JSON format
33
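+ - **PagerDuty webhook patterns**: PagerDuty webhooks deliver `incident.triggered`, `incident.acknowledged`, and `incident.resolved` events (v3 names; the older v2 webhooks used `incident.trigger`, `incident.acknowledge`, `incident.resolve`); use these to auto-populate incident timelines and sync state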
34
+
24
35
  ## Example
25
36
  User: "Production database is down, started 14:32 UTC, ~10k users affected"
26
37
  → Declares SEV1, drafts initial stakeholder update, starts timeline, prompts for on-call contacts, and guides through mitigation steps to resolution and postmortem.
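
Picking the rollback target in the Pulumi rollback steps can be sketched as a filter over deployment history. The `DeploymentRecord` shape below is a simplified, hypothetical stand-in: the real `pulumi stack history` output has more fields and may name them differently. The `healthyAt` cutoff (the last time the service was confirmed healthy) is supplied by the caller, and the sample versions, times, and commits are invented.

```typescript
// Simplified, hypothetical record; not the exact output format of `pulumi stack history`.
interface DeploymentRecord {
  version: number;
  result: "succeeded" | "failed";
  endTime: string; // ISO 8601 timestamp
  commit: string; // commit the deployment was built from
}

// Most recent successful deployment that finished before the service was last known healthy.
function lastKnownGood(
  history: DeploymentRecord[],
  healthyAt: string,
): DeploymentRecord | undefined {
  const cutoff = Date.parse(healthyAt);
  return history
    .filter((d) => d.result === "succeeded" && Date.parse(d.endTime) < cutoff)
    .sort((a, b) => Date.parse(b.endTime) - Date.parse(a.endTime))[0];
}

// Invented sample history for illustration.
const sampleHistory: DeploymentRecord[] = [
  { version: 43, result: "failed", endTime: "2024-01-15T14:40:00Z", commit: "ccc3333" },
  { version: 42, result: "succeeded", endTime: "2024-01-15T14:20:00Z", commit: "bbb2222" },
  { version: 41, result: "succeeded", endTime: "2024-01-15T09:00:00Z", commit: "aaa1111" },
];
const rollbackTarget = lastKnownGood(sampleHistory, "2024-01-15T12:00:00Z");
```

The returned record's `commit` is what would be checked out before running `pulumi up`. Note that a deployment that succeeded shortly before the incident (version 42 here) is deliberately excluded by the cutoff, since it may be the cause.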
@@ -0,0 +1,46 @@
1
+ ---
2
+ name: pulumi-stack
3
+ description: Scaffold, manage, and deploy infrastructure as code using Pulumi (TypeScript-first).
4
+ triggers: ["pulumi", "infrastructure as code", "iac", "deploy infrastructure", "pulumi stack"]
5
+ ---
6
+ # Pulumi Stack
7
+
8
+ ## What this skill does
9
+ Helps you scaffold new Pulumi stacks, convert from Terraform, write programs for common cloud patterns, and manage the full deploy lifecycle using real TypeScript/Python/Go — not DSLs.
10
+
11
+ ## How to invoke
12
+ /pulumi-stack
13
+
14
+ ## Workflow steps
15
+
16
+ ### Step 1 — Choose stack type
17
+ Identify cloud target (AWS/GCP/Azure/K8s) and language preference (default: TypeScript).
18
+
19
+ ### Step 2 — Scaffold or convert
20
+ New stack: `pulumi new aws-typescript` (or relevant template)
21
+ From Terraform: `pulumi convert --from terraform --language typescript`
22
+
23
+ ### Step 3 — Write the program
24
+ Common patterns:
25
+ - VPC + subnets: use the `awsx.ec2.Vpc` component from `@pulumi/awsx`
26
+ - ECS/EKS cluster: `aws.ecs.Cluster` (`awsx.ecs.Cluster` exists only in the legacy awsx 0.x / `awsx.classic` API) or `aws.eks.Cluster`
27
+ - RDS: `aws.rds.Instance` with `skipFinalSnapshot: false` and a `finalSnapshotIdentifier`, so a snapshot is taken before the instance is deleted
28
+ - S3 + CloudFront CDN: `aws.s3.Bucket` + `aws.cloudfront.Distribution`
29
+ - Secrets: `pulumi config set --secret DATABASE_PASSWORD`
30
+
31
+ ### Step 4 — Preview and deploy
32
+ ```bash
33
+ pulumi preview # dry run, shows what will change
34
+ pulumi up # apply changes
35
+ pulumi stack history # audit trail
36
+ ```
37
+
38
+ ### Step 5 — Drift detection
39
+ ```bash
40
+ pulumi refresh # sync state with real infra
41
+ ```
42
+
43
+ ## Example outputs
44
+ - Scaffolded TypeScript Pulumi program for AWS VPC + ECS
45
+ - Converted Terraform module to Pulumi TypeScript
46
+ - Deploy preview showing resource changes
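
The workflow steps in this skill can be summarized as an ordered command plan. The sketch below only builds strings and runs nothing. It assumes template names follow the `<cloud>-<language>` pattern of the `aws-typescript` example in Step 2; the `Cloud` and `Language` types and the `planCommands` helper are hypothetical, so check the available template list before relying on a generated name.

```typescript
type Cloud = "aws" | "gcp" | "azure" | "kubernetes";
type Language = "typescript" | "python" | "go";

// Ordered CLI commands for Steps 2, 4 and 5; TypeScript is the default language (Step 1).
function planCommands(cloud: Cloud, language: Language = "typescript"): string[] {
  return [
    `pulumi new ${cloud}-${language}`, // Step 2: scaffold (assumed template naming)
    "pulumi preview", // Step 4: dry run
    "pulumi up", // Step 4: apply changes
    "pulumi refresh", // Step 5: sync state with real infrastructure
  ];
}

const awsPlan = planCommands("aws");
const gcpPythonPlan = planCommands("gcp", "python");
```

Keeping the plan as data makes it easy to show the commands to the user for confirmation before anything is executed.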
@@ -20,6 +20,17 @@ Generate clear, actionable operational runbooks for services, infrastructure com
20
20
  4. Review for completeness: every step has a verification command, rollback is covered
21
21
  5. Save to the appropriate location (e.g., `docs/runbooks/<service>.md`)
22
22
 
23
+ ## Infra change runbooks (Pulumi)
24
+ When the runbook covers an infrastructure change managed by Pulumi, include:
25
+ - `pulumi preview` output showing the planned resource changes (paste or link)
26
+ - Link to the stack in Pulumi Cloud (e.g. `https://app.pulumi.com/<org>/<project>/<stack>`)
27
+ - Rollback step: use `pulumi stack history` to identify the previous deployment, then check out the corresponding commit and re-run `pulumi up`
28
+
29
+ ## Live Data Sources
30
+ - **AWS Runbook templates**: `https://docs.aws.amazon.com/systems-manager/latest/userguide/automation-documents.html` — official AWS Systems Manager Automation runbook library
31
+ - **PagerDuty runbook community**: `https://community.pagerduty.com` — real-world runbook examples shared by practitioners
32
+ - **SRE Workbook public patterns**: `https://sre.google/workbook/table-of-contents/` — Google SRE Workbook chapters on on-call and operational procedures
33
+
23
34
  ## Example
24
35
  User: "Write a runbook for restarting the payment-service in Kubernetes"
25
36
  → Produces a runbook covering health checks, drain, rolling restart, verification, and rollback with `kubectl` commands.
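
The completeness review in step 4 of this skill (every step has a verification command, rollback is covered) can be sketched as a simple check. The `Runbook` and `RunbookStep` shapes are hypothetical and chosen only for this illustration; the skill itself produces markdown, not structured data.

```typescript
// Hypothetical structured form of a runbook, for illustration only.
interface RunbookStep {
  title: string;
  command: string;
  verify?: string; // command that confirms the step worked
}

interface Runbook {
  service: string;
  steps: RunbookStep[];
  rollback: RunbookStep[];
}

// Returns a list of human-readable problems; an empty list means the review passed.
function reviewRunbook(runbook: Runbook): string[] {
  const problems: string[] = [];
  for (const step of runbook.steps) {
    if (!step.verify) {
      problems.push(`step "${step.title}" has no verification command`);
    }
  }
  if (runbook.rollback.length === 0) {
    problems.push("no rollback steps");
  }
  return problems;
}

// Sample modeled on the payment-service example above.
const restartRunbook: Runbook = {
  service: "payment-service",
  steps: [
    {
      title: "Rolling restart",
      command: "kubectl rollout restart deployment/payment-service",
      verify: "kubectl rollout status deployment/payment-service",
    },
    { title: "Check pods", command: "kubectl get pods" },
  ],
  rollback: [
    { title: "Undo rollout", command: "kubectl rollout undo deployment/payment-service" },
  ],
};
const reviewProblems = reviewRunbook(restartRunbook);
```

Here the review flags the "Check pods" step, which has no verification command, while the rollback requirement is satisfied.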