@gonzih/skills-devops 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +18 -0
- package/install.js +13 -0
- package/package.json +1 -0
- package/skills/alert-triage/SKILL.md +25 -0
- package/skills/ci-cd-troubleshooter/SKILL.md +25 -0
- package/skills/incident-commander/SKILL.md +26 -0
- package/skills/runbook-writer/SKILL.md +25 -0
package/README.md
ADDED
```diff
@@ -0,0 +1,18 @@
+# @gonzih/skills-devops
+
+DevOps skills for Claude Code.
+
+## Install
+
+```bash
+npm install -g @gonzih/skills-devops
+```
+
+Skills are automatically copied to `~/.claude/skills/` on install.
+
+## Included Skills
+
+- **ci-cd-troubleshooter** — Diagnose and fix CI/CD pipeline failures
+- **runbook-writer** — Write operational runbooks for systems and services
+- **alert-triage** — Analyze and prioritize incoming alerts
+- **incident-commander** — Lead incident response and write postmortems
```
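The README states that installation copies each skill into `~/.claude/skills/`. Here is a minimal sketch for checking that after a global install. It assumes each skill lives at `~/.claude/skills/<name>/SKILL.md`, which is the layout `install.js` writes, and that the front matter is flat `key: value` pairs like the SKILL.md files in this package. `checkInstalledSkills` and `readFrontMatter` are hypothetical helpers, and the front-matter reader is deliberately not a full YAML parser.

```javascript
import { existsSync, readFileSync } from 'node:fs';
import { join } from 'node:path';
import { homedir } from 'node:os';

// Skill names listed under "Included Skills" in the README.
const EXPECTED_SKILLS = ['ci-cd-troubleshooter', 'runbook-writer', 'alert-triage', 'incident-commander'];

// Minimal front-matter reader: flat `key: value` lines between the opening
// `---` line and the next `---` line. Not a YAML parser.
function readFrontMatter(text) {
  const fields = {};
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(text);
  if (!match) return fields;
  for (const line of match[1].split(/\r?\n/)) {
    const colon = line.indexOf(':');
    if (colon > 0) fields[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  return fields;
}

// Returns one message per expected skill that is missing or whose front-matter
// `name` does not match its directory; an empty array means all were found.
function checkInstalledSkills(targetDir = join(homedir(), '.claude', 'skills'), expected = EXPECTED_SKILLS) {
  const problems = [];
  for (const skill of expected) {
    const file = join(targetDir, skill, 'SKILL.md');
    if (!existsSync(file)) {
      problems.push(`${skill}: missing SKILL.md`);
      continue;
    }
    const { name } = readFrontMatter(readFileSync(file, 'utf8'));
    if (name !== skill) problems.push(`${skill}: front matter name is ${name}`);
  }
  return problems;
}
```

Calling `console.log(checkInstalledSkills())` after `npm install -g` prints the list of problems; an empty array means all four skills were found under the expected names.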
package/install.js
ADDED
```diff
@@ -0,0 +1,13 @@
+import { copyFileSync, mkdirSync, readdirSync } from 'fs';
+import { join } from 'path';
+import { homedir } from 'os';
+
+const skillsDir = new URL('./skills/', import.meta.url).pathname;
+const targetDir = join(homedir(), '.claude', 'skills');
+
+for (const name of readdirSync(skillsDir)) {
+  const dest = join(targetDir, name);
+  mkdirSync(dest, { recursive: true });
+  copyFileSync(join(skillsDir, name, 'SKILL.md'), join(dest, 'SKILL.md'));
+  console.log(`Installed skill: ${name}`);
+}
```
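The published `install.js` builds `skillsDir` from `new URL(...).pathname`. That string keeps percent-encoding, so a space in the install path becomes `%20`. On Windows it also takes the form `/C:/...`. In either case `readdirSync` can fail with ENOENT. The script also assumes every entry under `skills/` is a directory containing `SKILL.md`.

Here is a sketch of a more defensive variant. It is not the package's actual code. `fileURLToPath` from `node:url` converts the file URL to a native path. Entries that are not directories, or that lack `SKILL.md`, are skipped. The logic is wrapped in a hypothetical `installSkills(skillsDir, targetDir)` function so it can be exercised against temporary directories.

```javascript
import { copyFileSync, existsSync, mkdirSync, readdirSync } from 'node:fs';
import { join } from 'node:path';
import { homedir } from 'node:os';
import { fileURLToPath } from 'node:url';

// fileURLToPath decodes %-escapes and yields a native path on every platform,
// which URL.prototype.pathname does not.
const defaultSkillsDir = fileURLToPath(new URL('./skills/', import.meta.url));
const defaultTargetDir = join(homedir(), '.claude', 'skills');

// Copies <skillsDir>/<name>/SKILL.md to <targetDir>/<name>/SKILL.md for each
// subdirectory that has one, and returns the names that were installed.
function installSkills(skillsDir = defaultSkillsDir, targetDir = defaultTargetDir) {
  const installed = [];
  for (const entry of readdirSync(skillsDir, { withFileTypes: true })) {
    if (!entry.isDirectory()) continue; // skip stray files such as .DS_Store
    const source = join(skillsDir, entry.name, 'SKILL.md');
    if (!existsSync(source)) continue; // not a skill directory
    const dest = join(targetDir, entry.name);
    mkdirSync(dest, { recursive: true });
    copyFileSync(source, join(dest, 'SKILL.md'));
    installed.push(entry.name);
  }
  return installed;
}
```

To serve as the `postinstall` script, the file would end by calling `installSkills()` and logging each returned name, as the published script does.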
package/package.json
ADDED
```diff
@@ -0,0 +1 @@
+{"name":"@gonzih/skills-devops","version":"1.0.0","description":"DevOps skills for Claude Code","type":"module","scripts":{"postinstall":"node install.js"},"files":["skills/","install.js","README.md"],"keywords":["claude","mcp","skills","devops"],"license":"MIT"}
```
package/skills/alert-triage/SKILL.md
ADDED
```diff
@@ -0,0 +1,25 @@
+---
+name: alert-triage
+description: Analyze and prioritize incoming alerts
+---
+
+## What
+
+Analyze a set of incoming monitoring alerts, determine severity and urgency, identify correlations, and produce a prioritized action plan.
+
+## How
+
+- Parse alert payloads (PagerDuty, Alertmanager, Datadog, CloudWatch, etc.)
+- Group related alerts by service, host, or time window to avoid alert storms
+- Assess business impact: user-facing vs. internal, revenue-critical vs. background
+- Identify the probable root alert vs. downstream noise
+- Recommend immediate actions vs. watch-and-wait
+
+## Workflow
+
+1. Paste or describe the active alerts
+2. Group and deduplicate correlated alerts
+3. Rank by impact: P1 (user-facing outage) → P4 (informational)
+4. Identify the root cause alert and downstream effects
+5. Output a triage summary: what to act on now, what to monitor, what to silence
+
+## Example
+
+User: "I have 47 alerts firing: disk full on db-01, high latency on api-gateway, 5xx spike on checkout"
+→ Identifies disk-full on db-01 as root cause driving the cascade, recommends clearing disk space first, provides ordered action plan.
```
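The skill describes grouping and root-cause identification in prose only; Claude performs them from the pasted alerts. To illustrate the reasoning in workflow steps 2 to 4, here is a minimal sketch. It deduplicates alerts, then separates probable root alerts from downstream noise using a service dependency map.

The alert shape (`component`, `summary`, and `priority` 1 to 4 for P1 to P4), the `dependsOn` map, and the `triage` function are all hypothetical. Real payloads from PagerDuty, Alertmanager, Datadog, or CloudWatch would first have to be mapped into this shape.

```javascript
// Hypothetical alert shape: { component, summary, priority } with priority 1 (P1) .. 4 (P4).
// dependsOn maps a component to the components it calls, e.g. { checkout: ['api-gateway'] }.
function triage(alerts, dependsOn) {
  // Step 2: deduplicate on component + summary, keeping the most severe copy.
  const byKey = new Map();
  for (const alert of alerts) {
    const key = `${alert.component}|${alert.summary}`;
    const seen = byKey.get(key);
    if (!seen || alert.priority < seen.priority) byKey.set(key, alert);
  }
  const unique = [...byKey.values()];
  const alerting = new Set(unique.map((a) => a.component));

  // An alert is downstream noise if anything it (transitively) depends on is also alerting.
  function hasAlertingDependency(component, visited = new Set()) {
    for (const dep of dependsOn[component] ?? []) {
      if (visited.has(dep)) continue; // guards against dependency cycles
      visited.add(dep);
      if (alerting.has(dep) || hasAlertingDependency(dep, visited)) return true;
    }
    return false;
  }

  // Steps 3 and 4: split into probable roots and downstream effects, most severe first.
  const bySeverity = (a, b) => a.priority - b.priority;
  const roots = unique.filter((a) => !hasAlertingDependency(a.component)).sort(bySeverity);
  const downstream = unique.filter((a) => hasAlertingDependency(a.component)).sort(bySeverity);
  return { roots, downstream };
}
```

Take the skill's example scenario, with `checkout` depending on `api-gateway` and `api-gateway` on `db-01`. There, only the `db-01` disk alert is returned in `roots`.

If every component in a dependency cycle is alerting, none of them is classified as a root. A fuller tool would therefore need a tie-breaker, such as earliest start time.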
package/skills/ci-cd-troubleshooter/SKILL.md
ADDED
```diff
@@ -0,0 +1,25 @@
+---
+name: ci-cd-troubleshooter
+description: Diagnose and fix CI/CD pipeline failures
+---
+
+## What
+
+Diagnose failing CI/CD pipelines, identify root causes, and propose or apply fixes across GitHub Actions, GitLab CI, Jenkins, CircleCI, and similar systems.
+
+## How
+
+- Read pipeline config files and recent failure logs
+- Identify the failing step, error type, and likely cause
+- Check for common issues: missing secrets, dependency failures, flaky tests, resource limits, misconfigured caches
+- Propose targeted fixes with minimal blast radius
+- Verify the fix doesn't break other pipeline stages
+
+## Workflow
+
+1. Ask for the pipeline URL or paste the failure log
+2. Identify the failing job and step
+3. Classify the failure (config error, flaky test, infra issue, code bug)
+4. Propose a fix and explain the root cause
+5. Offer to apply the fix to the config file
+
+## Example
+
+User: "My GitHub Actions deploy job is failing with 'Error: Unable to locate executable file: docker'"
+→ Diagnose missing Docker setup step, suggest `docker/setup-buildx-action`, apply to workflow YAML.
```
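Step 3 of the workflow classifies a failure as a config error, flaky test, infra issue, or code bug. Below is a minimal rule-based sketch of that classification over a raw log. The regular expressions are illustrative assumptions, not an exhaustive or authoritative list. `classifyFailure` and its `passedOnRetry` option are hypothetical.

A single log rarely proves flakiness. For that reason, the sketch labels a failure flaky only when the caller already knows that a rerun of the same commit passed.

```javascript
// Illustrative patterns only; the first matching rule wins.
const RULES = [
  // Missing tools, secrets, or invalid workflow syntax usually point at the pipeline config.
  { category: 'config error', pattern: /Unable to locate executable file|command not found|Invalid workflow file|secret .* (is )?not (set|found)/i },
  // Resource exhaustion and network failures point at the runner or infrastructure
  // (exit code 137 is SIGKILL, often from the OOM killer).
  { category: 'infra issue', pattern: /No space left on device|ENOSPC|ETIMEDOUT|ECONNRESET|OOMKilled|exit code 137/i },
  // Assertion or compiler errors point at the code under test.
  { category: 'code bug', pattern: /AssertionError|expected .+ to (equal|be)|error TS\d+|SyntaxError/i },
];

// Returns 'flaky test', one of the rule categories above, or 'unknown'.
function classifyFailure(log, { passedOnRetry = false } = {}) {
  if (passedOnRetry) return 'flaky test'; // the same commit passed on a rerun
  for (const { category, pattern } of RULES) {
    if (pattern.test(log)) return category;
  }
  return 'unknown';
}
```

With these rules, the log line from the skill's example (`Error: Unable to locate executable file: docker`) falls under `config error`. That matches the skill's diagnosis of a missing setup step.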
package/skills/incident-commander/SKILL.md
ADDED
```diff
@@ -0,0 +1,26 @@
+---
+name: incident-commander
+description: Lead incident response and write postmortems
+---
+
+## What
+
+Guide incident response from detection through resolution, coordinate communication, track timeline, and produce blameless postmortems.
+
+## How
+
+- Establish incident severity (SEV1–SEV4) and assign roles: IC, comms lead, tech lead
+- Maintain a live incident timeline with timestamped actions
+- Draft stakeholder updates at the right cadence and detail level
+- Track mitigation steps and confirm resolution criteria
+- After resolution: write a blameless postmortem with timeline, impact, root cause, and action items
+
+## Workflow
+
+1. **Declare**: Capture incident start time, description, and initial severity
+2. **Triage**: Identify affected systems, blast radius, and on-call owners
+3. **Coordinate**: Draft stakeholder update templates; track timeline entries
+4. **Mitigate**: Document each remediation step and its outcome
+5. **Resolve**: Confirm resolution criteria met; set follow-up review time
+6. **Postmortem**: Generate blameless postmortem doc with 5-whys root cause analysis and action items
+
+## Example
+
+User: "Production database is down, started 14:32 UTC, ~10k users affected"
+→ Declares SEV1, drafts initial stakeholder update, starts timeline, prompts for on-call contacts, and guides through mitigation steps to resolution and postmortem.
```
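The skill maintains a live, timestamped incident timeline and later turns it into a postmortem. Here is a minimal sketch of such a timeline as a data structure. It assumes ISO-8601 UTC timestamps and free-form entry kinds. The `IncidentTimeline` class, its method names, and kinds such as `'mitigated'` and `'resolved'` are hypothetical.

The class does three things:

- Keeps entries sorted even when they are logged late.
- Computes the minutes from declaration to a milestone, which is useful for the impact section of a postmortem.
- Renders the timeline as a Markdown table.

```javascript
class IncidentTimeline {
  constructor(title, severity, startedAt) {
    this.title = title;
    this.severity = severity; // e.g. 'SEV1' .. 'SEV4'
    this.startedAt = new Date(startedAt);
    this.entries = [{ at: this.startedAt, kind: 'declared', note: title }];
  }

  // Records an action; entries stay sorted by time even if logged out of order.
  add(at, kind, note) {
    this.entries.push({ at: new Date(at), kind, note });
    this.entries.sort((a, b) => a.at - b.at);
    return this;
  }

  // Minutes from incident start to the first entry of `kind`, or null if there is none yet.
  minutesUntil(kind) {
    const entry = this.entries.find((e) => e.kind === kind);
    return entry ? (entry.at - this.startedAt) / 60000 : null;
  }

  // Renders the timeline as a Markdown table with HH:MM UTC times.
  toMarkdown() {
    const rows = this.entries.map(
      (e) => `| ${e.at.toISOString().slice(11, 16)} | ${e.kind} | ${e.note} |`,
    );
    return [
      `## ${this.severity}: ${this.title}`,
      '',
      '| Time (UTC) | Event | Note |',
      '| --- | --- | --- |',
      ...rows,
    ].join('\n');
  }
}
```

Re-sorting on every `add` is O(n log n) per entry. That is acceptable for the few dozen entries a typical incident produces.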
package/skills/runbook-writer/SKILL.md
ADDED
```diff
@@ -0,0 +1,25 @@
+---
+name: runbook-writer
+description: Write operational runbooks for systems and services
+---
+
+## What
+
+Generate clear, actionable operational runbooks for services, infrastructure components, and recurring operational tasks.
+
+## How
+
+- Gather context: service name, tech stack, deployment platform, on-call audience
+- Structure runbooks with consistent sections: Overview, Prerequisites, Procedures, Rollback, Escalation
+- Write steps at the right level of detail for the target audience (SRE, developer, NOC)
+- Include verification commands after each major step
+- Add decision trees for common failure modes
+
+## Workflow
+
+1. Ask for the service/system name and what the runbook should cover
+2. Identify the target audience and their assumed knowledge level
+3. Draft the runbook with Overview → Prerequisites → Step-by-step Procedures → Verification → Rollback → Escalation
+4. Review for completeness: every step has a verification command, rollback is covered
+5. Save to the appropriate location (e.g., `docs/runbooks/<service>.md`)
+
+## Example
+
+User: "Write a runbook for restarting the payment-service in Kubernetes"
+→ Produces a runbook covering health checks, drain, rolling restart, verification, and rollback with `kubectl` commands.
```
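Step 4 of the workflow reviews a draft for completeness, and part of that review can be mechanical. Here is a minimal sketch that checks whether a Markdown runbook contains the sections from step 3, in order. It assumes each section is a `##` heading whose text contains the section name, so "Step-by-step Procedures" counts as Procedures.

`lintRunbook` is a hypothetical helper. It does not check the other half of step 4, that every step has a verification command.

```javascript
// Section order from step 3 of the runbook-writer workflow.
const RUNBOOK_SECTIONS = ['Overview', 'Prerequisites', 'Procedures', 'Verification', 'Rollback', 'Escalation'];

// Returns human-readable problems; an empty array means every section is present and in order.
function lintRunbook(markdown, sections = RUNBOOK_SECTIONS) {
  const headings = markdown
    .split(/\r?\n/)
    .filter((line) => /^##\s/.test(line)) // level-2 headings only ('### x' does not match)
    .map((line) => line.slice(2).trim().toLowerCase());
  const problems = [];
  let furthest = -1; // heading index of the latest section matched so far
  for (const section of sections) {
    const index = headings.findIndex((h) => h.includes(section.toLowerCase()));
    if (index === -1) {
      problems.push(`missing section: ${section}`);
    } else if (index < furthest) {
      problems.push(`out of order: ${section}`);
    } else {
      furthest = index;
    }
  }
  return problems;
}
```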