@h1dr0n/skill-pool 0.1.0
- package/LICENSE +21 -0
- package/README.md +296 -0
- package/bin/cli.js +157 -0
- package/package.json +41 -0
- package/skills/api/agents/backend-specialist.md +69 -0
- package/skills/api/agents/database-optimizer.md +176 -0
- package/skills/api/manifest.yaml +20 -0
- package/skills/api/rules/auth-security.md +45 -0
- package/skills/api/skills/api-patterns/SKILL.md +81 -0
- package/skills/api/skills/api-patterns/api-style.md +42 -0
- package/skills/api/skills/api-patterns/auth.md +24 -0
- package/skills/api/skills/api-patterns/documentation.md +26 -0
- package/skills/api/skills/api-patterns/graphql.md +41 -0
- package/skills/api/skills/api-patterns/rate-limiting.md +31 -0
- package/skills/api/skills/api-patterns/response.md +37 -0
- package/skills/api/skills/api-patterns/rest.md +40 -0
- package/skills/api/skills/api-patterns/scripts/api_validator.py +211 -0
- package/skills/api/skills/api-patterns/security-testing.md +122 -0
- package/skills/api/skills/api-patterns/trpc.md +41 -0
- package/skills/api/skills/api-patterns/versioning.md +22 -0
- package/skills/api/skills/database-patterns.md +126 -0
- package/skills/api/skills/deployment-patterns.md +105 -0
- package/skills/api/skills/docker-patterns.md +135 -0
- package/skills/common/agents/code-reviewer.md +78 -0
- package/skills/common/agents/planner.md +80 -0
- package/skills/common/agents/security-reviewer.md +82 -0
- package/skills/common/agents/software-architect.md +81 -0
- package/skills/common/manifest.yaml +25 -0
- package/skills/common/rules/coding-style.md +39 -0
- package/skills/common/rules/git-workflow.md +33 -0
- package/skills/common/rules/security.md +25 -0
- package/skills/common/skills/architecture/SKILL.md +55 -0
- package/skills/common/skills/architecture/context-discovery.md +43 -0
- package/skills/common/skills/architecture/examples.md +94 -0
- package/skills/common/skills/architecture/pattern-selection.md +68 -0
- package/skills/common/skills/architecture/patterns-reference.md +50 -0
- package/skills/common/skills/architecture/trade-off-analysis.md +77 -0
- package/skills/common/skills/brainstorming/SKILL.md +163 -0
- package/skills/common/skills/brainstorming/dynamic-questioning.md +350 -0
- package/skills/common/skills/clean-code.md +99 -0
- package/skills/common/skills/code-review-checklist.md +86 -0
- package/skills/common/skills/plan-writing/SKILL.md +152 -0
- package/skills/common/skills/skill-feedback.md +94 -0
- package/skills/common/skills/tdd-workflow.md +130 -0
- package/skills/common/skills/verification-loop.md +112 -0
- package/skills/cpp/agents/cpp-build-resolver.md +90 -0
- package/skills/cpp/agents/cpp-reviewer.md +72 -0
- package/skills/cpp/manifest.yaml +15 -0
- package/skills/cpp/skills/cpp-coding-standards.md +722 -0
- package/skills/cpp/skills/cpp-testing.md +323 -0
- package/skills/devops/agents/devops-automator.md +376 -0
- package/skills/devops/agents/sre.md +90 -0
- package/skills/devops/manifest.yaml +20 -0
- package/skills/devops/skills/deployment-patterns.md +427 -0
- package/skills/devops/skills/deployment-procedures/SKILL.md +241 -0
- package/skills/devops/skills/docker-patterns.md +364 -0
- package/skills/devops/skills/e2e-testing.md +326 -0
- package/skills/devops/skills/github-ops.md +144 -0
- package/skills/django/manifest.yaml +16 -0
- package/skills/django/skills/django-patterns.md +734 -0
- package/skills/django/skills/django-security.md +593 -0
- package/skills/django/skills/django-tdd.md +729 -0
- package/skills/django/skills/django-verification.md +469 -0
- package/skills/dotnet/agents/csharp-reviewer.md +101 -0
- package/skills/dotnet/manifest.yaml +14 -0
- package/skills/dotnet/skills/csharp-testing.md +321 -0
- package/skills/dotnet/skills/dotnet-patterns.md +321 -0
- package/skills/go/agents/code-reviewer.md +76 -0
- package/skills/go/agents/go-build-resolver.md +94 -0
- package/skills/go/agents/go-reviewer.md +76 -0
- package/skills/go/manifest.yaml +17 -0
- package/skills/go/rules/go-style.md +55 -0
- package/skills/go/skills/golang-patterns.md +674 -0
- package/skills/go/skills/golang-testing.md +720 -0
- package/skills/java/agents/java-build-resolver.md +153 -0
- package/skills/java/agents/java-reviewer.md +92 -0
- package/skills/java/manifest.yaml +18 -0
- package/skills/java/skills/java-coding-standards.md +147 -0
- package/skills/java/skills/jpa-patterns.md +151 -0
- package/skills/java/skills/springboot-patterns.md +314 -0
- package/skills/java/skills/springboot-security.md +272 -0
- package/skills/kotlin/agents/kotlin-build-resolver.md +118 -0
- package/skills/kotlin/agents/kotlin-reviewer.md +159 -0
- package/skills/kotlin/manifest.yaml +17 -0
- package/skills/kotlin/skills/kotlin-coroutines-flows.md +284 -0
- package/skills/kotlin/skills/kotlin-patterns.md +711 -0
- package/skills/kotlin/skills/kotlin-testing.md +824 -0
- package/skills/laravel/manifest.yaml +15 -0
- package/skills/laravel/skills/laravel-patterns.md +409 -0
- package/skills/laravel/skills/laravel-security.md +279 -0
- package/skills/laravel/skills/laravel-tdd.md +277 -0
- package/skills/laravel/skills/laravel-verification.md +173 -0
- package/skills/mobile/agents/dart-build-resolver.md +201 -0
- package/skills/mobile/agents/flutter-reviewer.md +243 -0
- package/skills/mobile/manifest.yaml +19 -0
- package/skills/mobile/skills/android-clean-architecture.md +339 -0
- package/skills/mobile/skills/dart-flutter-patterns.md +563 -0
- package/skills/mobile/skills/swiftui-patterns.md +259 -0
- package/skills/nestjs/manifest.yaml +13 -0
- package/skills/nestjs/skills/nestjs-patterns.md +230 -0
- package/skills/perl/manifest.yaml +13 -0
- package/skills/perl/skills/perl-patterns.md +504 -0
- package/skills/perl/skills/perl-security.md +503 -0
- package/skills/perl/skills/perl-testing.md +475 -0
- package/skills/python/agents/python-reviewer.md +98 -0
- package/skills/python/manifest.yaml +18 -0
- package/skills/python/rules/python-style.md +69 -0
- package/skills/python/skills/python-patterns/SKILL.md +441 -0
- package/skills/python/skills/python-patterns.md +90 -0
- package/skills/python/skills/python-testing.md +81 -0
- package/skills/rust/agents/rust-build-resolver.md +148 -0
- package/skills/rust/agents/rust-reviewer.md +94 -0
- package/skills/rust/manifest.yaml +16 -0
- package/skills/rust/rules/rust-style.md +107 -0
- package/skills/rust/skills/rust-patterns.md +499 -0
- package/skills/rust/skills/rust-testing.md +500 -0
- package/skills/security/agents/accessibility-auditor.md +316 -0
- package/skills/security/agents/security-reviewer.md +108 -0
- package/skills/security/manifest.yaml +19 -0
- package/skills/security/skills/red-team-tactics/SKILL.md +199 -0
- package/skills/security/skills/security-bounty-hunter.md +99 -0
- package/skills/security/skills/security-review.md +495 -0
- package/skills/security/skills/security-scan.md +165 -0
- package/skills/security/skills/vulnerability-scanner/SKILL.md +276 -0
- package/skills/security/skills/vulnerability-scanner/checklists.md +121 -0
- package/skills/security/skills/vulnerability-scanner/scripts/security_scan.py +458 -0
- package/skills/swift/manifest.yaml +16 -0
- package/skills/swift/skills/swift-actor-persistence.md +142 -0
- package/skills/swift/skills/swift-concurrency.md +216 -0
- package/skills/swift/skills/swift-protocol-di-testing.md +190 -0
- package/skills/swift/skills/swiftui-patterns.md +259 -0
- package/skills/unity/agents/game-designer.md +167 -0
- package/skills/unity/agents/unity-architect.md +52 -0
- package/skills/unity/agents/unity-editor-tool-developer.md +310 -0
- package/skills/unity/agents/unity-multiplayer-engineer.md +321 -0
- package/skills/unity/agents/unity-shader-graph-artist.md +269 -0
- package/skills/unity/manifest.yaml +21 -0
- package/skills/unity/rules/csharp-patterns.md +48 -0
- package/skills/unity/rules/unity-specific.md +53 -0
- package/skills/unity/skills/systematic-debugging.md +92 -0
- package/skills/unity/skills/unity-architecture.md +173 -0
- package/skills/unreal/agents/level-designer.md +208 -0
- package/skills/unreal/agents/technical-artist.md +229 -0
- package/skills/unreal/agents/unreal-multiplayer-architect.md +313 -0
- package/skills/unreal/agents/unreal-systems-engineer.md +310 -0
- package/skills/unreal/agents/unreal-technical-artist.md +256 -0
- package/skills/unreal/agents/unreal-world-builder.md +273 -0
- package/skills/unreal/manifest.yaml +21 -0
- package/skills/unreal/skills/unreal-patterns.md +183 -0
- package/skills/web/agents/frontend-specialist.md +71 -0
- package/skills/web/agents/ui-designer.md +383 -0
- package/skills/web/agents/ux-architect.md +469 -0
- package/skills/web/manifest.yaml +22 -0
- package/skills/web/rules/accessibility.md +54 -0
- package/skills/web/rules/css-performance.md +52 -0
- package/skills/web/skills/e2e-testing.md +132 -0
- package/skills/web/skills/frontend-design/SKILL.md +452 -0
- package/skills/web/skills/frontend-design/animation-guide.md +331 -0
- package/skills/web/skills/frontend-design/color-system.md +311 -0
- package/skills/web/skills/frontend-design/decision-trees.md +418 -0
- package/skills/web/skills/frontend-design/motion-graphics.md +306 -0
- package/skills/web/skills/frontend-design/scripts/accessibility_checker.py +183 -0
- package/skills/web/skills/frontend-design/scripts/ux_audit.py +722 -0
- package/skills/web/skills/frontend-design/typography-system.md +345 -0
- package/skills/web/skills/frontend-design/ux-psychology.md +1116 -0
- package/skills/web/skills/frontend-design/visual-effects.md +383 -0
- package/skills/web/skills/react-nextjs.md +135 -0
- package/skills/web/skills/tailwind-patterns/SKILL.md +269 -0
- package/src/adapters/antigravity.js +164 -0
- package/src/adapters/claude.js +188 -0
- package/src/adapters/cursor.js +161 -0
- package/src/adapters/index.js +67 -0
- package/src/adapters/windsurf.js +158 -0
- package/src/commands/add.js +266 -0
- package/src/commands/create.js +127 -0
- package/src/commands/diff.js +78 -0
- package/src/commands/info.js +88 -0
- package/src/commands/init.js +224 -0
- package/src/commands/install.js +90 -0
- package/src/commands/list.js +54 -0
- package/src/commands/remove.js +101 -0
- package/src/commands/targets.js +32 -0
- package/src/commands/update.js +57 -0
- package/src/core/manifest.js +57 -0
- package/src/core/plugins.js +86 -0
- package/src/core/resolver.js +84 -0
- package/src/core/tracker.js +49 -0
- package/src/utils/fs.js +80 -0
- package/src/utils/git.js +52 -0
@@ -0,0 +1,90 @@
---
name: SRE (Site Reliability Engineer)
description: Expert site reliability engineer specializing in SLOs, error budgets, observability, chaos engineering, and toil reduction for production systems at scale.
color: "#e63946"
emoji: 🛡️
vibe: Reliability is a feature. Error budgets fund velocity; spend them wisely.
---

# SRE (Site Reliability Engineer) Agent

You are **SRE**, a site reliability engineer who treats reliability as a feature with a measurable budget. You define SLOs that reflect user experience, build observability that answers questions you haven't asked yet, and automate toil so engineers can focus on what matters.

## 🧠 Your Identity & Memory
- **Role**: Site reliability engineering and production systems specialist
- **Personality**: Data-driven, proactive, automation-obsessed, pragmatic about risk
- **Memory**: You remember failure patterns, SLO burn rates, and which automation saved the most toil
- **Experience**: You've managed systems from 99.9% to 99.99% and know that each additional nine costs roughly 10x more

## 🎯 Your Core Mission

Build and maintain reliable production systems through engineering, not heroics:

1. **SLOs & error budgets**: Define what "reliable enough" means, measure it, act on it
2. **Observability**: Logs, metrics, traces that answer "why is this broken?" in minutes
3. **Toil reduction**: Automate repetitive operational work systematically
4. **Chaos engineering**: Proactively find weaknesses before users do
5. **Capacity planning**: Right-size resources based on data, not guesses

## 🔧 Critical Rules

1. **SLOs drive decisions**: If there's error budget remaining, ship features. If not, fix reliability.
2. **Measure before optimizing**: No reliability work without data showing the problem
3. **Automate toil instead of relying on heroics**: If you did it twice, automate it
4. **Blameless culture**: Systems fail, not people. Fix the system.
5. **Progressive rollouts**: Canary → percentage → full. Never big-bang deploys.

## 📋 SLO Framework

```yaml
# SLO Definition
service: payment-api
slos:
  - name: Availability
    description: Successful responses to valid requests
    sli: count(status < 500) / count(total)
    target: 99.95%
    window: 30d
    burn_rate_alerts:
      - severity: critical
        short_window: 5m
        long_window: 1h
        factor: 14.4
      - severity: warning
        short_window: 30m
        long_window: 6h
        factor: 6

  - name: Latency
    description: Request duration at p99
    sli: count(duration < 300ms) / count(total)
    target: 99%
    window: 30d
```
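
To make the `target`, `window`, and `factor` fields above concrete, the following is a minimal sketch of how an error budget and a multi-window burn-rate alert could be evaluated. The type and function names (`BurnRateAlert`, `shouldAlert`, and so on) are hypothetical and not part of this package, and the rule that both windows must exceed the factor is an assumption about how the two windows are meant to combine.

```typescript
// Hypothetical types: field names loosely mirror the YAML above.
interface BurnRateAlert {
  severity: "critical" | "warning";
  factor: number; // burn-rate multiple that triggers the alert
}

interface WindowErrorRates {
  shortWindow: number; // fraction of failed requests in the short window (0..1)
  longWindow: number; // fraction of failed requests in the long window (0..1)
}

// Error budget as a fraction of requests, e.g. a 99.95% target leaves 0.05%.
function errorBudget(targetPercent: number): number {
  return 1 - targetPercent / 100;
}

// Burn rate 1.0 means the budget runs out exactly at the end of the SLO window.
function burnRate(errorRate: number, targetPercent: number): number {
  return errorRate / errorBudget(targetPercent);
}

// Assumption: alert only when BOTH windows burn faster than the factor,
// so a brief spike that has already recovered does not page anyone.
function shouldAlert(
  alert: BurnRateAlert,
  rates: WindowErrorRates,
  targetPercent: number
): boolean {
  return (
    burnRate(rates.shortWindow, targetPercent) >= alert.factor &&
    burnRate(rates.longWindow, targetPercent) >= alert.factor
  );
}

// Budget expressed as minutes of full outage over the window.
function budgetMinutes(targetPercent: number, windowDays: number): number {
  return errorBudget(targetPercent) * windowDays * 24 * 60;
}
```

Under these assumptions, a 99.95% target over 30 days allows about 21.6 minutes of full outage, and a sustained 1% error rate corresponds to a burn rate of roughly 20, which exceeds the critical factor of 14.4.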

## 🔭 Observability Stack

### The Three Pillars
| Pillar | Purpose | Key Questions |
|--------|---------|---------------|
| **Metrics** | Trends, alerting, SLO tracking | Is the system healthy? Is the error budget burning? |
| **Logs** | Event details, debugging | What happened at 14:32:07? |
| **Traces** | Request flow across services | Where is the latency? Which service failed? |

### Golden Signals
- **Latency**: Duration of requests (distinguish success vs error latency)
- **Traffic**: Requests per second, concurrent users
- **Errors**: Error rate by type (5xx, timeout, business logic)
- **Saturation**: CPU, memory, queue depth, connection pool usage

## 🔥 Incident Response Integration
- Severity based on SLO impact, not gut feeling
- Automated runbooks for known failure modes
- Post-incident reviews focused on systemic fixes
- Track MTTR, not just MTBF

## 💬 Communication Style
- Lead with data: "Error budget is 43% consumed with 60% of the window remaining"
- Frame reliability as investment: "This automation saves 4 hours/week of toil"
- Use risk language: "This deployment has a 15% chance of exceeding our latency SLO"
- Be direct about trade-offs: "We can ship this feature, but we'll need to defer the migration"
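
The "lead with data" statement above can be derived from request counts. This is a rough sketch only: `budgetStatus` is a hypothetical helper, and it assumes traffic is roughly uniform so that the expected request total for the whole window can be extrapolated from the requests seen so far.

```typescript
// Hypothetical helper: turns raw counts into the kind of status line quoted above.
// Assumption: traffic is uniform, so total requests for the full window can be
// extrapolated as (requests so far) / (fraction of window elapsed).
function budgetStatus(
  failedRequests: number,
  totalRequests: number,
  targetPercent: number,
  windowElapsedFraction: number
): string {
  const expectedWindowTotal = totalRequests / windowElapsedFraction;
  const allowedFailures = expectedWindowTotal * (1 - targetPercent / 100);
  const consumedPercent = Math.round((failedRequests / allowedFailures) * 100);
  const remainingPercent = Math.round((1 - windowElapsedFraction) * 100);
  return `Error budget is ${consumedPercent}% consumed with ${remainingPercent}% of the window remaining`;
}

// Example: 40% of a 30-day window elapsed, 4,000,000 requests so far, 2,150 failed.
const statusLine = budgetStatus(2150, 4_000_000, 99.95, 0.4);
```

Consuming 43% of the budget in 40% of the window means the budget is burning slightly faster than sustainable, which is the kind of signal the communication style above asks for.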
@@ -0,0 +1,20 @@
name: devops
version: 0.1.0
description: DevOps workflows - deployment patterns, Docker, E2E testing, GitHub operations, CI/CD pipelines
depends:
  - common
tags:
  - devops
  - docker
  - ci-cd
  - deployment
rules: []
skills:
  - skills/deployment-patterns.md
  - skills/docker-patterns.md
  - skills/e2e-testing.md
  - skills/github-ops.md
  - skills/deployment-procedures
agents:
  - agents/devops-automator.md
  - agents/sre.md
@@ -0,0 +1,427 @@
---
name: deployment-patterns
description: Deployment workflows, CI/CD pipeline patterns, Docker containerization, health checks, rollback strategies, and production readiness checklists for web applications.
origin: ECC
---

# Deployment Patterns

Production deployment workflows and CI/CD best practices.

## When to Activate

- Setting up CI/CD pipelines
- Dockerizing an application
- Planning deployment strategy (blue-green, canary, rolling)
- Implementing health checks and readiness probes
- Preparing for a production release
- Configuring environment-specific settings

## Deployment Strategies

### Rolling Deployment (Default)

Replace instances gradually; old and new versions run simultaneously during rollout.

```
Instance 1: v1 → v2  (update first)
Instance 2: v1        (still running v1)
Instance 3: v1        (still running v1)

Instance 1: v2
Instance 2: v1 → v2  (update second)
Instance 3: v1

Instance 1: v2
Instance 2: v2
Instance 3: v1 → v2  (update last)
```

**Pros:** Zero downtime, gradual rollout
**Cons:** Two versions run simultaneously, so changes must be backward-compatible
**Use when:** Standard deployments, backward-compatible changes

### Blue-Green Deployment

Run two identical environments. Switch traffic atomically.

```
Blue  (v1) ← traffic
Green (v2)   idle, running new version

# After verification:
Blue  (v1)   idle (becomes standby)
Green (v2) ← traffic
```

**Pros:** Instant rollback (switch back to blue), clean cutover
**Cons:** Requires 2x infrastructure during deployment
**Use when:** Critical services, zero-tolerance for issues

### Canary Deployment

Route a small percentage of traffic to the new version first.

```
v1: 95% of traffic
v2:  5% of traffic  (canary)

# If metrics look good:
v1: 50% of traffic
v2: 50% of traffic

# Final:
v2: 100% of traffic
```

**Pros:** Catches issues with real traffic before full rollout
**Cons:** Requires traffic splitting infrastructure, monitoring
**Use when:** High-traffic services, risky changes, feature flags
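
The "if metrics look good" step in the canary flow can be automated. The sketch below is one possible shape for that decision, not something this skill prescribes: the thresholds (`minRequests`, `maxRelativeIncrease`, `absoluteFloor`) and all names are hypothetical, and only error rate is compared, whereas a real gate would likely also look at latency.

```typescript
type CanaryDecision = "promote" | "hold" | "rollback";

interface VersionMetrics {
  requests: number;
  errors: number;
}

// Traffic percentages for v2, taken from the diagram above.
const TRAFFIC_STEPS = [5, 50, 100];

function errorRate(m: VersionMetrics): number {
  return m.requests === 0 ? 0 : m.errors / m.requests;
}

// Assumed policy: wait for enough canary traffic, then roll back if the canary's
// error rate exceeds the baseline by more than the allowed margin.
function evaluateCanary(
  baseline: VersionMetrics,
  canary: VersionMetrics,
  minRequests = 1000,
  maxRelativeIncrease = 1.5,
  absoluteFloor = 0.001
): CanaryDecision {
  if (canary.requests < minRequests) return "hold";
  const allowed = Math.max(errorRate(baseline) * maxRelativeIncrease, absoluteFloor);
  return errorRate(canary) <= allowed ? "promote" : "rollback";
}

// Next traffic percentage for v2 after a "promote" decision.
function nextTrafficStep(currentPercent: number): number {
  const next = TRAFFIC_STEPS.find((step) => step > currentPercent);
  return next ?? 100;
}
```

The absolute floor keeps a near-zero baseline error rate from turning a single canary error into a rollback.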

## Docker

### Multi-Stage Dockerfile (Node.js)

```dockerfile
# Stage 1: Install dependencies (including dev dependencies needed for the build)
FROM node:22-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --include=dev

# Stage 2: Build
FROM node:22-alpine AS builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN npm run build
# Drop dev dependencies so only runtime packages are copied to the final image
RUN npm prune --omit=dev

# Stage 3: Production image
FROM node:22-alpine AS runner
WORKDIR /app

# Create a non-root user that belongs to appgroup
RUN addgroup -g 1001 -S appgroup && adduser -S -u 1001 -G appgroup appuser
USER appuser

COPY --from=builder --chown=appuser:appgroup /app/node_modules ./node_modules
COPY --from=builder --chown=appuser:appgroup /app/dist ./dist
COPY --from=builder --chown=appuser:appgroup /app/package.json ./

ENV NODE_ENV=production
EXPOSE 3000

HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

CMD ["node", "dist/server.js"]
```

### Multi-Stage Dockerfile (Go)

```dockerfile
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /server ./cmd/server

FROM alpine:3.19 AS runner
RUN apk --no-cache add ca-certificates
RUN adduser -D -u 1001 appuser
USER appuser

COPY --from=builder /server /server

EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=3s CMD wget -qO- http://localhost:8080/health || exit 1
CMD ["/server"]
```

### Multi-Stage Dockerfile (Python/Django)

```dockerfile
FROM python:3.12-slim AS builder
WORKDIR /app
RUN pip install --no-cache-dir uv
COPY requirements.txt .
RUN uv pip install --system --no-cache -r requirements.txt

FROM python:3.12-slim AS runner
WORKDIR /app

RUN useradd -r -u 1001 appuser
USER appuser

COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin
COPY . .

ENV PYTHONUNBUFFERED=1
EXPOSE 8000

HEALTHCHECK --interval=30s --timeout=3s CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health/')" || exit 1
CMD ["gunicorn", "config.wsgi:application", "--bind", "0.0.0.0:8000", "--workers", "4"]
```

### Docker Best Practices

```
# GOOD practices
- Use specific version tags (node:22-alpine, not node:latest)
- Multi-stage builds to minimize image size
- Run as non-root user
- Copy dependency files first (layer caching)
- Use .dockerignore to exclude node_modules, .git, tests
- Add HEALTHCHECK instruction
- Set resource limits in docker-compose or k8s

# BAD practices
- Running as root
- Using :latest tags
- Copying entire repo in one COPY layer
- Installing dev dependencies in production image
- Storing secrets in image (use env vars or secrets manager)
```

## CI/CD Pipeline

### GitHub Actions (Standard Pipeline)

```yaml
name: CI/CD

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
      - run: npm ci
      - run: npm run lint
      - run: npm run typecheck
      - run: npm test -- --coverage
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: coverage
          path: coverage/

  build:
    needs: test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    permissions:
      contents: read
      packages: write  # lets GITHUB_TOKEN push the image to ghcr.io
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          push: true
          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  deploy:
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    environment: production
    steps:
      - name: Deploy to production
        run: |
          # Platform-specific deployment command
          # Railway: railway up
          # Vercel: vercel --prod
          # K8s: kubectl set image deployment/app app=ghcr.io/${{ github.repository }}:${{ github.sha }}
          echo "Deploying ${{ github.sha }}"
```

### Pipeline Stages

```
PR opened:
  lint → typecheck → unit tests → integration tests → preview deploy

Merged to main:
  lint → typecheck → unit tests → integration tests → build image → deploy staging → smoke tests → deploy production
```

## Health Checks

### Health Check Endpoint

```typescript
// Shape returned by each dependency check
interface HealthCheck {
  status: "ok" | "error";
  latency_ms?: number;
  message?: string;
}

// Simple health check
app.get("/health", (req, res) => {
  res.status(200).json({ status: "ok" });
});

// Detailed health check (for internal monitoring)
app.get("/health/detailed", async (req, res) => {
  const checks = {
    database: await checkDatabase(),
    redis: await checkRedis(),
    externalApi: await checkExternalApi(),
  };

  const allHealthy = Object.values(checks).every(c => c.status === "ok");

  res.status(allHealthy ? 200 : 503).json({
    status: allHealthy ? "ok" : "degraded",
    timestamp: new Date().toISOString(),
    version: process.env.APP_VERSION || "unknown",
    uptime: process.uptime(),
    checks,
  });
});

async function checkDatabase(): Promise<HealthCheck> {
  const start = Date.now();
  try {
    await db.query("SELECT 1");
    // Report the measured round-trip time rather than a fixed value
    return { status: "ok", latency_ms: Date.now() - start };
  } catch (err) {
    return { status: "error", message: "Database unreachable" };
  }
}
```
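
The detailed endpoint above awaits each dependency in turn, so one slow dependency delays the whole response. As a sketch of an alternative, the checks can run in parallel with a per-check timeout. `timedCheck` and `runChecks` are hypothetical helpers, the 2000 ms default is an arbitrary assumption, and the probe functions stand in for calls such as `db.query("SELECT 1")`.

```typescript
interface HealthCheck {
  status: "ok" | "error";
  latency_ms?: number;
  message?: string;
}

type Probe = () => Promise<unknown>;

// Runs one probe, measuring latency and failing it if it exceeds timeoutMs.
async function timedCheck(probe: Probe, timeoutMs = 2000): Promise<HealthCheck> {
  const start = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${timeoutMs}ms`)), timeoutMs);
  });
  try {
    await Promise.race([probe(), timeout]);
    return { status: "ok", latency_ms: Date.now() - start };
  } catch (err) {
    return { status: "error", message: err instanceof Error ? err.message : String(err) };
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Runs all probes concurrently and reports overall health.
async function runChecks(
  probes: Record<string, Probe>
): Promise<{ healthy: boolean; checks: Record<string, HealthCheck> }> {
  const pairs = await Promise.all(
    Object.entries(probes).map(async ([name, probe]) => ({
      name,
      result: await timedCheck(probe),
    }))
  );
  const checks: Record<string, HealthCheck> = {};
  for (const { name, result } of pairs) checks[name] = result;
  return { healthy: pairs.every((p) => p.result.status === "ok"), checks };
}
```

A route handler could then call `runChecks({ database: () => db.query("SELECT 1"), ... })` and map `healthy` to a 200 or 503 status, as the endpoint above does.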

### Kubernetes Probes

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 3000
  initialDelaySeconds: 10
  periodSeconds: 30
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /health
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 2

startupProbe:
  httpGet:
    path: /health
    port: 3000
  initialDelaySeconds: 0
  periodSeconds: 5
  failureThreshold: 30    # 30 * 5s = 150s max startup time
```
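
The `30 * 5s = 150s` comment generalizes to all three probes. The helper below is a back-of-the-envelope sketch of that arithmetic; `maxFailureWindowSeconds` is a hypothetical name, and the result is only an approximate upper bound because it ignores `timeoutSeconds` and kubelet scheduling jitter.

```typescript
interface ProbeConfig {
  initialDelaySeconds: number;
  periodSeconds: number;
  failureThreshold: number;
}

// Approximate worst-case seconds before Kubernetes acts on a probe that never succeeds.
function maxFailureWindowSeconds(probe: ProbeConfig): number {
  return probe.initialDelaySeconds + probe.periodSeconds * probe.failureThreshold;
}

// Values copied from the probe definitions above.
const startupBudget = maxFailureWindowSeconds({ initialDelaySeconds: 0, periodSeconds: 5, failureThreshold: 30 });
const livenessBudget = maxFailureWindowSeconds({ initialDelaySeconds: 10, periodSeconds: 30, failureThreshold: 3 });
const readinessBudget = maxFailureWindowSeconds({ initialDelaySeconds: 5, periodSeconds: 10, failureThreshold: 2 });
```

With these settings a container gets about 150 seconds to start, while a failing readiness probe removes a pod from service far sooner than a failing liveness probe restarts it.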

## Environment Configuration

### Twelve-Factor App Pattern

```bash
# All config via environment variables, never in code
DATABASE_URL=postgres://user:pass@host:5432/db
REDIS_URL=redis://host:6379/0
API_KEY=${API_KEY}              # injected by secrets manager
LOG_LEVEL=info
PORT=3000

# Environment-specific behavior
NODE_ENV=production             # or staging, development
APP_ENV=production              # explicit app environment
```

### Configuration Validation

```typescript
import { z } from "zod";

const envSchema = z.object({
  NODE_ENV: z.enum(["development", "staging", "production"]),
  PORT: z.coerce.number().default(3000),
  DATABASE_URL: z.string().url(),
  REDIS_URL: z.string().url(),
  JWT_SECRET: z.string().min(32),
  LOG_LEVEL: z.enum(["debug", "info", "warn", "error"]).default("info"),
});

// Validate at startup: fail fast if config is wrong
export const env = envSchema.parse(process.env);
```

## Rollback Strategy

### Instant Rollback

```bash
# Docker/Kubernetes: point to previous image
kubectl rollout undo deployment/app

# Vercel: promote previous deployment
vercel rollback

# Railway: redeploy previous commit
railway up --commit <previous-sha>

# Database (Prisma): mark a failed migration as rolled back in the migration history.
# This does not revert schema changes; undo those with a separate down migration.
npx prisma migrate resolve --rolled-back <migration-name>
```

### Rollback Checklist

- [ ] Previous image/artifact is available and tagged
- [ ] Database migrations are backward-compatible (no destructive changes)
- [ ] Feature flags can disable new features without deploy
- [ ] Monitoring alerts configured for error rate spikes
- [ ] Rollback tested in staging before production release

## Production Readiness Checklist

Before any production deployment:

### Application
- [ ] All tests pass (unit, integration, E2E)
- [ ] No hardcoded secrets in code or config files
- [ ] Error handling covers all edge cases
- [ ] Logging is structured (JSON) and does not contain PII
- [ ] Health check endpoint returns meaningful status
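
For the structured-logging item above, a minimal sketch of JSON log formatting with field redaction might look like the following. The key list in `PII_KEYS` and the function names are hypothetical, and only top-level fields are redacted; nested objects would need a recursive walk.

```typescript
// Hypothetical list of field names treated as PII or secrets.
const PII_KEYS = new Set(["email", "password", "ssn", "phone", "token"]);

type LogLevel = "debug" | "info" | "warn" | "error";
type LogFields = Record<string, unknown>;

// Replaces the value of any top-level field whose name is in PII_KEYS.
function redact(fields: LogFields): LogFields {
  const out: LogFields = {};
  for (const key of Object.keys(fields)) {
    out[key] = PII_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : fields[key];
  }
  return out;
}

// Produces one JSON object per log line so log aggregators can index the fields.
function formatLog(
  level: LogLevel,
  message: string,
  fields: LogFields = {},
  now: Date = new Date()
): string {
  return JSON.stringify({ timestamp: now.toISOString(), level, message, ...redact(fields) });
}
```

Calling `formatLog("info", "user login", { userId: 7, email: "a@example.com" })` would keep `userId` and replace the email value with `[REDACTED]`.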

### Infrastructure
- [ ] Docker image builds reproducibly (pinned versions)
- [ ] Environment variables documented and validated at startup
- [ ] Resource limits set (CPU, memory)
- [ ] Horizontal scaling configured (min/max instances)
- [ ] SSL/TLS enabled on all endpoints

### Monitoring
- [ ] Application metrics exported (request rate, latency, errors)
- [ ] Alerts configured for error rate > threshold
- [ ] Log aggregation set up (structured logs, searchable)
- [ ] Uptime monitoring on health endpoint

### Security
- [ ] Dependencies scanned for CVEs
- [ ] CORS configured for allowed origins only
- [ ] Rate limiting enabled on public endpoints
- [ ] Authentication and authorization verified
- [ ] Security headers set (CSP, HSTS, X-Frame-Options)
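
As an illustration of the security-headers item, the three headers named above could be produced by a small helper like this one. The policy values are deliberately strict placeholders and an assumption on my part; a real application usually needs a more permissive CSP, and `securityHeaders` is a hypothetical name.

```typescript
// Returns the three headers named in the checklist with conservative example values.
function securityHeaders(hstsMaxAgeSeconds = 31536000): Record<string, string> {
  return {
    // CSP: only load resources from the site's own origin
    "Content-Security-Policy": "default-src 'self'",
    // HSTS: tell browsers to use HTTPS for the next year, including subdomains
    "Strict-Transport-Security": `max-age=${hstsMaxAgeSeconds}; includeSubDomains`,
    // Disallow rendering the site inside frames on any origin
    "X-Frame-Options": "DENY",
  };
}
```

In a Node.js HTTP handler, each entry could be applied with `res.setHeader(name, value)` before the response is sent.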

### Operations
- [ ] Rollback plan documented and tested
- [ ] Database migration tested against production-sized data
- [ ] Runbook for common failure scenarios
- [ ] On-call rotation and escalation path defined