npm - @gonzih/skills-devops - Versions diffs - 1.0.0 → 1.2.0 - Mend

@gonzih/skills-devops 1.0.0 → 1.2.0

This diff shows the content of publicly released package versions as they appear in their public registries. It is provided for informational purposes only.

Files changed (6)

package/package.json +1 -1
package/skills/alert-triage/SKILL.md +6 -0
package/skills/ci-cd-troubleshooter/SKILL.md +14 -0
package/skills/incident-commander/SKILL.md +11 -0
package/skills/pulumi-stack/SKILL.md +46 -0
package/skills/runbook-writer/SKILL.md +11 -0

package/package.json CHANGED

@@ -1 +1 @@
-{"name":"@gonzih/skills-devops","version":"1.0.0","description":"DevOps skills for Claude Code","type":"module","scripts":{"postinstall":"node install.js"},"files":["skills/","install.js","README.md"],"keywords":["claude","mcp","skills","devops"],"license":"MIT"}
+{"name":"@gonzih/skills-devops","version":"1.2.0","description":"DevOps skills for Claude Code","type":"module","scripts":{"postinstall":"node install.js"},"files":["skills/","install.js","README.md"],"keywords":["claude","mcp","skills","devops"],"license":"MIT"}

package/skills/alert-triage/SKILL.md CHANGED

@@ -20,6 +20,12 @@ Analyze a set of incoming monitoring alerts, determine severity and urgency, ide
 4. Identify the root cause alert and downstream effects
 5. Output a triage summary: what to act on now, what to monitor, what to silence
+## Live Data Sources
+- **Prometheus alerting rule examples**: `https://github.com/samber/awesome-prometheus-alerts` — community-maintained library of Prometheus alerting rules by category (infra, databases, Kubernetes, etc.)
+- **AWS status page API**: `https://health.aws.amazon.com/health/status` — current AWS service health; programmatic access via AWS Health API (`aws health describe-events`)
+- **GCP status page API**: `https://status.cloud.google.com/incidents.json` — machine-readable GCP incident feed
+- **Azure status API**: `https://azure.status.microsoft/api/v2/status.json` — Azure service health JSON feed
 ## Example
 User: "I have 47 alerts firing: disk full on db-01, high latency on api-gateway, 5xx spike on checkout"
 → Identifies disk-full on db-01 as root cause driving the cascade, recommends clearing disk space first, provides ordered action plan.
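
Editor's sketch for the hunk above: the "identify the root cause alert and downstream effects" step could look roughly like this, assuming a service dependency map is available. The `Alert` type, the `DEPENDS_ON` map, and the function names are hypothetical and are not part of the package; the alerts mirror the example in the skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    service: str
    summary: str


# Hypothetical dependency map: service -> services it depends on directly.
DEPENDS_ON = {
    "checkout": ["api-gateway"],
    "api-gateway": ["db-01"],
    "db-01": [],
}


def upstream_of(service, depends_on):
    """Return every service reachable by following dependencies from `service`."""
    seen = set()
    stack = list(depends_on.get(service, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(depends_on.get(dep, []))
    return seen


def find_root_causes(alerts, depends_on):
    """Keep only alerts whose upstream services are not themselves alerting."""
    firing = {alert.service for alert in alerts}
    return [a for a in alerts if not (upstream_of(a.service, depends_on) & firing)]


alerts = [
    Alert("db-01", "disk full"),
    Alert("api-gateway", "high latency"),
    Alert("checkout", "5xx spike"),
]
print([a.service for a in find_root_causes(alerts, DEPENDS_ON)])  # ['db-01']
```

Alerts that are filtered out here are the downstream effects; a triage summary would list them under "monitor" rather than "act on now".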

package/skills/ci-cd-troubleshooter/SKILL.md CHANGED

@@ -20,6 +20,20 @@ Diagnose failing CI/CD pipelines, identify root causes, and propose or apply fix
 4. Propose a fix and explain the root cause
 5. Offer to apply the fix to the config file
+## Pulumi CI patterns (GitHub Actions)
+When the pipeline deploys infrastructure with Pulumi, check for:
+- Action: `pulumi/actions@v5` — required for `pulumi preview` and `pulumi up` steps
+- Required env var: `PULUMI_ACCESS_TOKEN` (must be stored as a repository secret)
+- Common failures:
+  - **Stack lock**: another run holds the state lock — cancel or use `pulumi cancel`
+  - **Config missing**: stack config not committed or secret not set — run `pulumi config set <key>`
+  - **Passphrase prompt**: `PULUMI_CONFIG_PASSPHRASE` env var missing for self-managed backends
+## Live Data Sources
+- **GitHub Actions API**: `https://api.github.com/repos/{owner}/{repo}/actions/runs` — fetch recent workflow run status, logs, and failure reasons
+- **Docker Hub vulnerability feeds**: `https://hub.docker.com/v2/repositories/{namespace}/{repo}/tags` — check image tag availability; cross-reference CVEs via Docker Scout or Snyk
+- **npm audit patterns**: Run `npm audit --json` for dependency vulnerability data; common failure patterns include `EAUDITNOPJSON`, peer conflict errors, and lock file drift
 ## Example
 User: "My GitHub Actions deploy job is failing with 'Error: Unable to locate executable file: docker'"
 → Diagnose missing Docker setup step, suggest `docker/setup-buildx-action`, apply to workflow YAML.
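
Editor's sketch for the hunk above: one way to query the GitHub Actions runs endpoint listed under Live Data Sources and keep only failed runs. It relies on the `status` and `per_page` query parameters and the `workflow_runs`, `conclusion`, and `html_url` response fields of the GitHub REST API. The helper names and the way the token is passed in are assumptions, not something the package defines.

```python
import json
import urllib.request
from urllib.parse import urlencode

API = "https://api.github.com/repos/{owner}/{repo}/actions/runs"


def runs_url(owner, repo, status="failure", per_page=10):
    """Build the list-workflow-runs URL, filtered by run status."""
    query = urlencode({"status": status, "per_page": per_page})
    return API.format(owner=owner, repo=repo) + "?" + query


def failed_runs(payload):
    """Extract id, name, and URL of failed runs from a parsed API response."""
    return [
        {"id": run["id"], "name": run.get("name"), "url": run.get("html_url")}
        for run in payload.get("workflow_runs", [])
        if run.get("conclusion") == "failure"
    ]


def fetch_failed_runs(owner, repo, token):
    """Call the API (network access required) and return the failed runs."""
    request = urllib.request.Request(
        runs_url(owner, repo),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(request) as response:
        return failed_runs(json.load(response))
```

Keeping `failed_runs` separate from the network call makes it possible to exercise the filtering against a saved response without a token.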

package/skills/incident-commander/SKILL.md CHANGED

@@ -21,6 +21,17 @@ Guide incident response from detection through resolution, coordinate communicat
 5. **Resolve**: Confirm resolution criteria met; set follow-up review time
 6. **Postmortem**: Generate blameless postmortem doc with 5-whys root cause analysis and action items
+## Infra rollback (Pulumi)
+When the incident involves infrastructure managed by Pulumi:
+1. Run `pulumi stack history` to list recent deployments and identify the last known-good deploy
+2. Re-deploy the previous version: check out the corresponding commit and run `pulumi up`, or use `pulumi up --target-replace <urn>` for surgical replacement of a single resource
+3. Use `pulumi refresh` after rollback to confirm state matches real infrastructure
+## Live Data Sources
+- **AWS Service Health Dashboard API**: `https://health.aws.amazon.com/health/status` — real-time AWS service events; use `aws health describe-events --filter eventTypeCategories=issue` for programmatic querying
+- **GCP status API**: `https://status.cloud.google.com/incidents.json` — current and historical GCP incidents in JSON format
+- **PagerDuty webhook patterns**: Inbound webhooks deliver `incident.trigger`, `incident.acknowledge`, `incident.resolve` payloads — use these to auto-populate incident timelines and sync state
 ## Example
 User: "Production database is down, started 14:32 UTC, ~10k users affected"
 → Declares SEV1, drafts initial stakeholder update, starts timeline, prompts for on-call contacts, and guides through mitigation steps to resolution and postmortem.
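
Editor's sketch for the hunk above: the "auto-populate incident timelines" idea could be approximated as follows. The three event names come from the skill text. The payload keys (`event`, `created_on`, `title`) are assumptions chosen for illustration and should be checked against the actual webhook payload before use.

```python
from datetime import datetime, timezone

# Event names as listed in the skill; labels are illustrative.
EVENT_LABELS = {
    "incident.trigger": "Triggered",
    "incident.acknowledge": "Acknowledged",
    "incident.resolve": "Resolved",
}


def timeline_entry(event):
    """Format one webhook event as a timeline line, or None if it is not tracked."""
    label = EVENT_LABELS.get(event["event"])
    if label is None:
        return None
    when = datetime.fromisoformat(event["created_on"]).astimezone(timezone.utc)
    return f"{when:%H:%M} UTC {label}: {event['title']}"


def build_timeline(events):
    """Sort events chronologically and drop the ones without a known label."""
    ordered = sorted(events, key=lambda e: datetime.fromisoformat(e["created_on"]))
    return [line for line in map(timeline_entry, ordered) if line is not None]
```

Timestamps are expected in ISO 8601 form with an explicit offset (for example `2024-01-01T14:32:00+00:00`) so that events from different sources sort correctly.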

package/skills/pulumi-stack/SKILL.md ADDED

@@ -0,0 +1,46 @@
+---
+name: pulumi-stack
+description: Scaffold, manage, and deploy infrastructure as code using Pulumi (TypeScript-first).
+triggers: ["pulumi", "infrastructure as code", "iac", "deploy infrastructure", "pulumi stack"]
+---
+# Pulumi Stack
+## What this skill does
+Helps you scaffold new Pulumi stacks, convert from Terraform, write programs for common cloud patterns, and manage the full deploy lifecycle using real TypeScript/Python/Go — not DSLs.
+## How to invoke
+/pulumi-stack
+## Workflow steps
+### Step 1 — Choose stack type
+Identify cloud target (AWS/GCP/Azure/K8s) and language preference (default: TypeScript).
+### Step 2 — Scaffold or convert
+New stack: `pulumi new aws-typescript` (or relevant template)
+From Terraform: `pulumi convert --from terraform --language typescript`
+### Step 3 — Write the program
+Common patterns:
+- VPC + subnets: use `@pulumi/awsx` NetworkX components
+- ECS/EKS cluster: `awsx.ecs.Cluster` or `aws.eks.Cluster`
+- RDS: `aws.rds.Instance` with `skipFinalSnapshot: false`
+- S3 + CloudFront CDN: `aws.s3.Bucket` + `aws.cloudfront.Distribution`
+- Secrets: `pulumi config set --secret DATABASE_PASSWORD`
+### Step 4 — Preview and deploy
+```bash
+pulumi preview   # dry run, shows what will change
+pulumi up        # apply changes
+pulumi stack history  # audit trail
+```
+### Step 5 — Drift detection
+```bash
+pulumi refresh   # sync state with real infra
+```
+## Example outputs
+- Scaffolded TypeScript Pulumi program for AWS VPC + ECS
+- Converted Terraform module to Pulumi TypeScript
+- Deploy preview showing resource changes
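
Editor's sketch for the file above: Steps 4 and 5 (preview, deploy, audit, refresh) could be scripted as below for non-interactive use. The `--stack` and `--yes` flags are standard Pulumi CLI flags. The function names and the injectable `run` parameter are assumptions made so the sequence can be exercised without a Pulumi installation.

```python
import subprocess


def lifecycle_commands(stack):
    """Return the CLI invocations for Steps 4 and 5, in order."""
    return [
        ["pulumi", "preview", "--stack", stack],            # dry run
        ["pulumi", "up", "--stack", stack, "--yes"],        # apply without prompting
        ["pulumi", "stack", "history", "--stack", stack],   # audit trail
        ["pulumi", "refresh", "--stack", stack, "--yes"],   # sync state with real infra
    ]


def run_lifecycle(stack, run=subprocess.run):
    """Run each command in order; with the default runner, stop at the first failure."""
    for argv in lifecycle_commands(stack):
        run(argv, check=True)
```

Because `check=True` raises on a non-zero exit code, a failed `pulumi preview` prevents `pulumi up` from running.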

package/skills/runbook-writer/SKILL.md CHANGED

@@ -20,6 +20,17 @@ Generate clear, actionable operational runbooks for services, infrastructure com
 4. Review for completeness: every step has a verification command, rollback is covered
 5. Save to the appropriate location (e.g., `docs/runbooks/<service>.md`)
+## Infra change runbooks (Pulumi)
+When the runbook covers an infrastructure change managed by Pulumi, include:
+- `pulumi preview` output showing the planned resource changes (paste or link)
+- Link to the stack in Pulumi Cloud (e.g. `https://app.pulumi.com/<org>/<project>/<stack>`)
+- Rollback step using `pulumi stack history` to identify the previous deployment and re-run it
+## Live Data Sources
+- **AWS Runbook templates**: `https://docs.aws.amazon.com/systems-manager/latest/userguide/automation-documents.html` — official AWS Systems Manager Automation runbook library
+- **PagerDuty runbook community**: `https://community.pagerduty.com` — real-world runbook examples shared by practitioners
+- **SRE Workbook public patterns**: `https://sre.google/workbook/table-of-contents/` — Google SRE Workbook chapters on on-call and operational procedures
 ## Example
 User: "Write a runbook for restarting the payment-service in Kubernetes"
 → Produces a runbook covering health checks, drain, rolling restart, verification, and rollback with `kubectl` commands.
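
Editor's sketch for the hunk above: a minimal generator for the Pulumi section of an infra change runbook, covering the three items the skill asks for (preview output, stack link, rollback step). The function names and the section headings are hypothetical. The URL pattern is the one given in the skill text.

```python
def stack_url(org, project, stack):
    """Build the Pulumi Cloud link using the pattern from the skill text."""
    return f"https://app.pulumi.com/{org}/{project}/{stack}"


def render_infra_runbook(title, org, project, stack, preview_output):
    """Render a Markdown runbook skeleton for a Pulumi-managed change."""
    lines = [
        f"# Runbook: {title}",
        "",
        "## Planned changes",
        "Output of `pulumi preview`:",
        "",
        # Indent by four spaces so Markdown renders the output as a code block.
        *("    " + line for line in preview_output.splitlines()),
        "",
        "## Stack",
        stack_url(org, project, stack),
        "",
        "## Rollback",
        "1. Run `pulumi stack history` and identify the previous deployment.",
        "2. Check out the corresponding commit and re-run `pulumi up`.",
        "3. Run `pulumi refresh` to confirm state matches real infrastructure.",
    ]
    return "\n".join(lines)
```

The rollback steps reuse the procedure described in the incident-commander hunk, so both skills point at the same recovery path.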