@gonzih/skills-devops 1.0.0

package/README.md ADDED
@@ -0,0 +1,18 @@
+ # @gonzih/skills-devops
+
+ DevOps skills for Claude Code.
+
+ ## Install
+
+ ```bash
+ npm install -g @gonzih/skills-devops
+ ```
+
+ Skills are automatically copied to `~/.claude/skills/` on install.
+
+ ## Included Skills
+
+ - **ci-cd-troubleshooter** — Diagnose and fix CI/CD pipeline failures
+ - **runbook-writer** — Write operational runbooks for systems and services
+ - **alert-triage** — Analyze and prioritize incoming alerts
+ - **incident-commander** — Lead incident response and write postmortems
package/install.js ADDED
@@ -0,0 +1,13 @@
+ import { copyFileSync, mkdirSync, readdirSync } from 'fs';
+ import { join } from 'path';
+ import { homedir } from 'os';
+
+ const skillsDir = new URL('./skills/', import.meta.url).pathname;
+ const targetDir = join(homedir(), '.claude', 'skills');
+
+ for (const name of readdirSync(skillsDir)) {
+   const dest = join(targetDir, name);
+   mkdirSync(dest, { recursive: true });
+   copyFileSync(join(skillsDir, name, 'SKILL.md'), join(dest, 'SKILL.md'));
+   console.log(`Installed skill: ${name}`);
+ }
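
One thing worth checking in the script above is how `skillsDir` is derived. The sketch below is an illustration only, not part of the published package. It uses a hypothetical module URL to show that `URL.pathname` keeps percent-escapes, so an install path containing a space would not match the real directory name until it is decoded.

```javascript
// Hypothetical module URL: the package lives under a directory named "my tools".
const moduleUrl = 'file:///home/dev/my%20tools/pkg/install.js';

// The same expression install.js uses: the raw URL pathname, still percent-encoded.
const rawPath = new URL('./skills/', moduleUrl).pathname;

// Decoding the escapes recovers the directory name as it exists on a POSIX disk.
const decodedPath = decodeURIComponent(rawPath);

console.log(rawPath);     // /home/dev/my%20tools/pkg/skills/
console.log(decodedPath); // /home/dev/my tools/pkg/skills/
```

Node's `fileURLToPath` from the `url` module performs this conversion and also handles Windows drive letters, so it may be a more portable choice than `.pathname` if the script needs to run outside simple POSIX paths.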
package/package.json ADDED
@@ -0,0 +1 @@
+ {"name":"@gonzih/skills-devops","version":"1.0.0","description":"DevOps skills for Claude Code","type":"module","scripts":{"postinstall":"node install.js"},"files":["skills/","install.js","README.md"],"keywords":["claude","mcp","skills","devops"],"license":"MIT"}
@@ -0,0 +1,25 @@
+ ---
+ name: alert-triage
+ description: Analyze and prioritize incoming alerts
+ ---
+
+ ## What
+ Analyze a set of incoming monitoring alerts, determine severity and urgency, identify correlations, and produce a prioritized action plan.
+
+ ## How
+ - Parse alert payloads (PagerDuty, Alertmanager, Datadog, CloudWatch, etc.)
+ - Group related alerts by service, host, or time window to avoid alert storms
+ - Assess business impact: user-facing vs. internal, revenue-critical vs. background
+ - Identify the probable root alert vs. downstream noise
+ - Recommend immediate actions vs. watch-and-wait
+
+ ## Workflow
+ 1. Paste or describe the active alerts
+ 2. Group and deduplicate correlated alerts
+ 3. Rank by impact: P1 (user-facing outage) → P4 (informational)
+ 4. Identify the root cause alert and downstream effects
+ 5. Output a triage summary: what to act on now, what to monitor, what to silence
+
+ ## Example
+ User: "I have 47 alerts firing: disk full on db-01, high latency on api-gateway, 5xx spike on checkout"
+ → Identifies disk-full on db-01 as root cause driving the cascade, recommends clearing disk space first, provides ordered action plan.
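
The grouping and ranking steps in this skill could be sketched roughly as follows. This is a minimal illustration, not something shipped in the package. The alert shape (`service`, `severity`, `userFacing`, `firedAt`), the 5-minute window, and the P2/P3 rules are all assumptions, since the skill only defines P1 and P4. It does not attempt root-cause identification.

```javascript
// Assumed correlation window; the skill text does not specify one.
const WINDOW_MS = 5 * 60 * 1000;

// Group alerts that share a service and fall into the same fixed time bucket.
// Fixed buckets can split alerts that straddle a bucket boundary.
function groupAlerts(alerts, windowMs = WINDOW_MS) {
  const groups = new Map();
  for (const alert of alerts) {
    const bucket = Math.floor(alert.firedAt / windowMs);
    const key = `${alert.service}|${bucket}`;
    if (!groups.has(key)) groups.set(key, { service: alert.service, alerts: [] });
    groups.get(key).alerts.push(alert);
  }
  return [...groups.values()];
}

// Hypothetical mapping onto P1..P4; only the P1 and P4 ends come from the skill text.
function priorityOf(group) {
  const critical = group.alerts.some((a) => a.severity === 'critical');
  const userFacing = group.alerts.some((a) => a.userFacing);
  if (critical && userFacing) return 1;
  if (critical) return 2;
  if (group.alerts.some((a) => a.severity === 'warning')) return 3;
  return 4;
}

// Return alert groups ordered from most to least urgent.
function triage(alerts) {
  return groupAlerts(alerts)
    .map((group) => ({ ...group, priority: priorityOf(group) }))
    .sort((a, b) => a.priority - b.priority);
}
```

Calling `triage` with a flat list of alerts returns one entry per service and time bucket, so a burst of related alerts collapses into a single ranked item.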
@@ -0,0 +1,25 @@
+ ---
+ name: ci-cd-troubleshooter
+ description: Diagnose and fix CI/CD pipeline failures
+ ---
+
+ ## What
+ Diagnose failing CI/CD pipelines, identify root causes, and propose or apply fixes across GitHub Actions, GitLab CI, Jenkins, CircleCI, and similar systems.
+
+ ## How
+ - Read pipeline config files and recent failure logs
+ - Identify the failing step, error type, and likely cause
+ - Check for common issues: missing secrets, dependency failures, flaky tests, resource limits, misconfigured caches
+ - Propose targeted fixes with minimal blast radius
+ - Verify the fix doesn't break other pipeline stages
+
+ ## Workflow
+ 1. Ask for the pipeline URL or paste the failure log
+ 2. Identify the failing job and step
+ 3. Classify the failure (config error, flaky test, infra issue, code bug)
+ 4. Propose a fix and explain the root cause
+ 5. Offer to apply the fix to the config file
+
+ ## Example
+ User: "My GitHub Actions deploy job is failing with 'Error: Unable to locate executable file: docker'"
+ → Diagnose missing Docker setup step, suggest `docker/setup-buildx-action`, apply to workflow YAML.
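
The classification step of this workflow could be approximated with a small pattern table. This is a rough sketch under stated assumptions, not the skill's actual logic. The four category names come from the workflow above, while the regular expressions are hypothetical examples and real CI logs vary widely between systems.

```javascript
// Hypothetical patterns per failure category; extend or replace for a real pipeline.
const FAILURE_PATTERNS = [
  { category: 'config error', pattern: /unable to locate executable file|invalid workflow/i },
  { category: 'infra issue', pattern: /no space left on device|out of memory|ECONNRESET/i },
  { category: 'flaky test', pattern: /flaky|retry \d+ of \d+/i },
  { category: 'code bug', pattern: /AssertionError|TypeError|compilation failed/i },
];

// Scan the log line by line and report the first line that matches a known pattern.
function classifyFailure(log) {
  for (const line of log.split('\n')) {
    for (const { category, pattern } of FAILURE_PATTERNS) {
      if (pattern.test(line)) return { category, line: line.trim() };
    }
  }
  return { category: 'unknown', line: null };
}

const sample = 'Run docker build .\nError: Unable to locate executable file: docker';
console.log(classifyFailure(sample).category); // config error
```

Returning the matching line alongside the category keeps the evidence next to the verdict, which helps when explaining the root cause in the next workflow step.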
@@ -0,0 +1,26 @@
+ ---
+ name: incident-commander
+ description: Lead incident response and write postmortems
+ ---
+
+ ## What
+ Guide incident response from detection through resolution, coordinate communication, track timeline, and produce blameless postmortems.
+
+ ## How
+ - Establish incident severity (SEV1–SEV4) and assign roles: IC, comms lead, tech lead
+ - Maintain a live incident timeline with timestamped actions
+ - Draft stakeholder updates at the right cadence and detail level
+ - Track mitigation steps and confirm resolution criteria
+ - After resolution: write a blameless postmortem with timeline, impact, root cause, and action items
+
+ ## Workflow
+ 1. **Declare**: Capture incident start time, description, and initial severity
+ 2. **Triage**: Identify affected systems, blast radius, and on-call owners
+ 3. **Coordinate**: Draft stakeholder update templates; track timeline entries
+ 4. **Mitigate**: Document each remediation step and its outcome
+ 5. **Resolve**: Confirm resolution criteria met; set follow-up review time
+ 6. **Postmortem**: Generate blameless postmortem doc with 5-whys root cause analysis and action items
+
+ ## Example
+ User: "Production database is down, started 14:32 UTC, ~10k users affected"
+ → Declares SEV1, drafts initial stakeholder update, starts timeline, prompts for on-call contacts, and guides through mitigation steps to resolution and postmortem.
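
The timestamped timeline this skill maintains could be modeled along these lines. This is a minimal sketch, not part of the package. The function names, the record shape, the calendar date, and the two follow-up entries are hypothetical; only the 14:32 UTC start, the description, and the SEV1 level echo the example above.

```javascript
// Declare an incident: capture start time, description, and initial severity.
function createIncident({ description, startedAt, severity }) {
  return {
    description,
    severity,
    startedAt,
    timeline: [{ at: startedAt, note: `Declared ${severity}: ${description}` }],
  };
}

// Add a timeline entry and keep entries in chronological order.
// Timestamps are assumed to be ISO 8601 UTC strings in one format, so string order is time order.
function logEntry(incident, at, note) {
  incident.timeline.push({ at, note });
  incident.timeline.sort((a, b) => (a.at < b.at ? -1 : a.at > b.at ? 1 : 0));
  return incident;
}

// Render the timeline as plain text, one entry per line, ready for a postmortem draft.
function renderTimeline(incident) {
  return incident.timeline.map((entry) => `${entry.at}  ${entry.note}`).join('\n');
}

const incident = createIncident({
  description: 'Production database is down',
  startedAt: '2024-01-01T14:32:00Z', // hypothetical date
  severity: 'SEV1',
});
logEntry(incident, '2024-01-01T14:40:00Z', 'Failover to replica started');
logEntry(incident, '2024-01-01T14:35:00Z', 'Initial stakeholder update sent');
console.log(renderTimeline(incident));
```

Sorting on every insert lets responders add entries after the fact without breaking the chronological order the postmortem depends on.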
@@ -0,0 +1,25 @@
+ ---
+ name: runbook-writer
+ description: Write operational runbooks for systems and services
+ ---
+
+ ## What
+ Generate clear, actionable operational runbooks for services, infrastructure components, and recurring operational tasks.
+
+ ## How
+ - Gather context: service name, tech stack, deployment platform, on-call audience
+ - Structure runbooks with consistent sections: Overview, Prerequisites, Procedures, Rollback, Escalation
+ - Write steps at the right level of detail for the target audience (SRE, developer, NOC)
+ - Include verification commands after each major step
+ - Add decision trees for common failure modes
+
+ ## Workflow
+ 1. Ask for the service/system name and what the runbook should cover
+ 2. Identify the target audience and their assumed knowledge level
+ 3. Draft the runbook with Overview → Prerequisites → Step-by-step Procedures → Verification → Rollback → Escalation
+ 4. Review for completeness: every step has a verification command, rollback is covered
+ 5. Save to the appropriate location (e.g., `docs/runbooks/<service>.md`)
+
+ ## Example
+ User: "Write a runbook for restarting the payment-service in Kubernetes"
+ → Produces a runbook covering health checks, drain, rolling restart, verification, and rollback with `kubectl` commands.
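
The section structure and the completeness review described above could be sketched as follows. This is an illustration under stated assumptions, not code from the package. The function names and the step shape (`action`, `verify`) are hypothetical, and the section order follows the workflow's draft step.

```javascript
// Section order taken from the workflow's draft step.
const SECTIONS = ['Overview', 'Prerequisites', 'Procedures', 'Verification', 'Rollback', 'Escalation'];

// Render a Markdown skeleton; any section without content is marked TODO.
function renderRunbook({ service, sections }) {
  const lines = [`# Runbook: ${service}`];
  for (const name of SECTIONS) {
    lines.push('', `## ${name}`, sections[name] ?? 'TODO');
  }
  return lines.join('\n');
}

// Completeness check: list the actions that have no verification command.
function stepsMissingVerification(steps) {
  return steps.filter((step) => !step.verify).map((step) => step.action);
}

const steps = [
  {
    action: 'kubectl rollout restart deployment/payment-service',
    verify: 'kubectl rollout status deployment/payment-service',
  },
  { action: 'Notify the on-call channel' }, // no verification yet
];

console.log(renderRunbook({ service: 'payment-service', sections: { Overview: 'Restart procedure.' } }));
console.log(stepsMissingVerification(steps)); // [ 'Notify the on-call channel' ]
```

Marking empty sections as TODO instead of omitting them makes gaps such as a missing rollback visible during review.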