tech-hub-skills 1.0.0
- package/LICENSE +21 -0
- package/README.md +250 -0
- package/bin/cli.js +241 -0
- package/bin/copilot.js +182 -0
- package/bin/postinstall.js +42 -0
- package/package.json +46 -0
- package/tech_hub_skills/roles/ai-engineer/skills/01-prompt-engineering/README.md +252 -0
- package/tech_hub_skills/roles/ai-engineer/skills/02-rag-pipeline/README.md +448 -0
- package/tech_hub_skills/roles/ai-engineer/skills/03-agent-orchestration/README.md +599 -0
- package/tech_hub_skills/roles/ai-engineer/skills/04-llm-guardrails/README.md +735 -0
- package/tech_hub_skills/roles/ai-engineer/skills/05-vector-embeddings/README.md +711 -0
- package/tech_hub_skills/roles/ai-engineer/skills/06-llm-evaluation/README.md +777 -0
- package/tech_hub_skills/roles/azure/skills/01-infrastructure-fundamentals/README.md +264 -0
- package/tech_hub_skills/roles/azure/skills/02-data-factory/README.md +264 -0
- package/tech_hub_skills/roles/azure/skills/03-synapse-analytics/README.md +264 -0
- package/tech_hub_skills/roles/azure/skills/04-databricks/README.md +264 -0
- package/tech_hub_skills/roles/azure/skills/05-functions/README.md +264 -0
- package/tech_hub_skills/roles/azure/skills/06-kubernetes-service/README.md +264 -0
- package/tech_hub_skills/roles/azure/skills/07-openai-service/README.md +264 -0
- package/tech_hub_skills/roles/azure/skills/08-machine-learning/README.md +264 -0
- package/tech_hub_skills/roles/azure/skills/09-storage-adls/README.md +264 -0
- package/tech_hub_skills/roles/azure/skills/10-networking/README.md +264 -0
- package/tech_hub_skills/roles/azure/skills/11-sql-cosmos/README.md +264 -0
- package/tech_hub_skills/roles/azure/skills/12-event-hubs/README.md +264 -0
- package/tech_hub_skills/roles/code-review/skills/01-automated-code-review/README.md +394 -0
- package/tech_hub_skills/roles/code-review/skills/02-pr-review-workflow/README.md +427 -0
- package/tech_hub_skills/roles/code-review/skills/03-code-quality-gates/README.md +518 -0
- package/tech_hub_skills/roles/code-review/skills/04-reviewer-assignment/README.md +504 -0
- package/tech_hub_skills/roles/code-review/skills/05-review-analytics/README.md +540 -0
- package/tech_hub_skills/roles/data-engineer/skills/01-lakehouse-architecture/README.md +550 -0
- package/tech_hub_skills/roles/data-engineer/skills/02-etl-pipeline/README.md +580 -0
- package/tech_hub_skills/roles/data-engineer/skills/03-data-quality/README.md +579 -0
- package/tech_hub_skills/roles/data-engineer/skills/04-streaming-pipelines/README.md +608 -0
- package/tech_hub_skills/roles/data-engineer/skills/05-performance-optimization/README.md +547 -0
- package/tech_hub_skills/roles/data-governance/skills/01-data-catalog/README.md +112 -0
- package/tech_hub_skills/roles/data-governance/skills/02-data-lineage/README.md +129 -0
- package/tech_hub_skills/roles/data-governance/skills/03-data-quality-framework/README.md +182 -0
- package/tech_hub_skills/roles/data-governance/skills/04-access-control/README.md +39 -0
- package/tech_hub_skills/roles/data-governance/skills/05-master-data-management/README.md +40 -0
- package/tech_hub_skills/roles/data-governance/skills/06-compliance-privacy/README.md +46 -0
- package/tech_hub_skills/roles/data-scientist/skills/01-eda-automation/README.md +230 -0
- package/tech_hub_skills/roles/data-scientist/skills/02-statistical-modeling/README.md +264 -0
- package/tech_hub_skills/roles/data-scientist/skills/03-feature-engineering/README.md +264 -0
- package/tech_hub_skills/roles/data-scientist/skills/04-predictive-modeling/README.md +264 -0
- package/tech_hub_skills/roles/data-scientist/skills/05-customer-analytics/README.md +264 -0
- package/tech_hub_skills/roles/data-scientist/skills/06-campaign-analysis/README.md +264 -0
- package/tech_hub_skills/roles/data-scientist/skills/07-experimentation/README.md +264 -0
- package/tech_hub_skills/roles/data-scientist/skills/08-data-visualization/README.md +264 -0
- package/tech_hub_skills/roles/devops/skills/01-cicd-pipeline/README.md +264 -0
- package/tech_hub_skills/roles/devops/skills/02-container-orchestration/README.md +264 -0
- package/tech_hub_skills/roles/devops/skills/03-infrastructure-as-code/README.md +264 -0
- package/tech_hub_skills/roles/devops/skills/04-gitops/README.md +264 -0
- package/tech_hub_skills/roles/devops/skills/05-environment-management/README.md +264 -0
- package/tech_hub_skills/roles/devops/skills/06-automated-testing/README.md +264 -0
- package/tech_hub_skills/roles/devops/skills/07-release-management/README.md +264 -0
- package/tech_hub_skills/roles/devops/skills/08-monitoring-alerting/README.md +264 -0
- package/tech_hub_skills/roles/devops/skills/09-devsecops/README.md +265 -0
- package/tech_hub_skills/roles/finops/skills/01-cost-visibility/README.md +264 -0
- package/tech_hub_skills/roles/finops/skills/02-resource-tagging/README.md +264 -0
- package/tech_hub_skills/roles/finops/skills/03-budget-management/README.md +264 -0
- package/tech_hub_skills/roles/finops/skills/04-reserved-instances/README.md +264 -0
- package/tech_hub_skills/roles/finops/skills/05-spot-optimization/README.md +264 -0
- package/tech_hub_skills/roles/finops/skills/06-storage-tiering/README.md +264 -0
- package/tech_hub_skills/roles/finops/skills/07-compute-rightsizing/README.md +264 -0
- package/tech_hub_skills/roles/finops/skills/08-chargeback/README.md +264 -0
- package/tech_hub_skills/roles/ml-engineer/skills/01-mlops-pipeline/README.md +566 -0
- package/tech_hub_skills/roles/ml-engineer/skills/02-feature-engineering/README.md +655 -0
- package/tech_hub_skills/roles/ml-engineer/skills/03-model-training/README.md +704 -0
- package/tech_hub_skills/roles/ml-engineer/skills/04-model-serving/README.md +845 -0
- package/tech_hub_skills/roles/ml-engineer/skills/05-model-monitoring/README.md +874 -0
- package/tech_hub_skills/roles/mlops/skills/01-ml-pipeline-orchestration/README.md +264 -0
- package/tech_hub_skills/roles/mlops/skills/02-experiment-tracking/README.md +264 -0
- package/tech_hub_skills/roles/mlops/skills/03-model-registry/README.md +264 -0
- package/tech_hub_skills/roles/mlops/skills/04-feature-store/README.md +264 -0
- package/tech_hub_skills/roles/mlops/skills/05-model-deployment/README.md +264 -0
- package/tech_hub_skills/roles/mlops/skills/06-model-observability/README.md +264 -0
- package/tech_hub_skills/roles/mlops/skills/07-data-versioning/README.md +264 -0
- package/tech_hub_skills/roles/mlops/skills/08-ab-testing/README.md +264 -0
- package/tech_hub_skills/roles/mlops/skills/09-automated-retraining/README.md +264 -0
- package/tech_hub_skills/roles/platform-engineer/skills/01-internal-developer-platform/README.md +153 -0
- package/tech_hub_skills/roles/platform-engineer/skills/02-self-service-infrastructure/README.md +57 -0
- package/tech_hub_skills/roles/platform-engineer/skills/03-slo-sli-management/README.md +59 -0
- package/tech_hub_skills/roles/platform-engineer/skills/04-developer-experience/README.md +57 -0
- package/tech_hub_skills/roles/platform-engineer/skills/05-incident-management/README.md +73 -0
- package/tech_hub_skills/roles/platform-engineer/skills/06-capacity-management/README.md +59 -0
- package/tech_hub_skills/roles/product-designer/skills/01-requirements-discovery/README.md +407 -0
- package/tech_hub_skills/roles/product-designer/skills/02-user-research/README.md +382 -0
- package/tech_hub_skills/roles/product-designer/skills/03-brainstorming-ideation/README.md +437 -0
- package/tech_hub_skills/roles/product-designer/skills/04-ux-design/README.md +496 -0
- package/tech_hub_skills/roles/product-designer/skills/05-product-market-fit/README.md +376 -0
- package/tech_hub_skills/roles/product-designer/skills/06-stakeholder-management/README.md +412 -0
- package/tech_hub_skills/roles/security-architect/skills/01-pii-detection/README.md +319 -0
- package/tech_hub_skills/roles/security-architect/skills/02-threat-modeling/README.md +264 -0
- package/tech_hub_skills/roles/security-architect/skills/03-infrastructure-security/README.md +264 -0
- package/tech_hub_skills/roles/security-architect/skills/04-iam/README.md +264 -0
- package/tech_hub_skills/roles/security-architect/skills/05-application-security/README.md +264 -0
- package/tech_hub_skills/roles/security-architect/skills/06-secrets-management/README.md +264 -0
- package/tech_hub_skills/roles/security-architect/skills/07-security-monitoring/README.md +264 -0
- package/tech_hub_skills/roles/system-design/skills/01-architecture-patterns/README.md +337 -0
- package/tech_hub_skills/roles/system-design/skills/02-requirements-engineering/README.md +264 -0
- package/tech_hub_skills/roles/system-design/skills/03-scalability/README.md +264 -0
- package/tech_hub_skills/roles/system-design/skills/04-high-availability/README.md +264 -0
- package/tech_hub_skills/roles/system-design/skills/05-cost-optimization-design/README.md +264 -0
- package/tech_hub_skills/roles/system-design/skills/06-api-design/README.md +264 -0
- package/tech_hub_skills/roles/system-design/skills/07-observability-architecture/README.md +264 -0
- package/tech_hub_skills/roles/system-design/skills/08-process-automation/PROCESS_TEMPLATE.md +336 -0
- package/tech_hub_skills/roles/system-design/skills/08-process-automation/README.md +521 -0
- package/tech_hub_skills/skills/README.md +336 -0
- package/tech_hub_skills/skills/ai-engineer.md +104 -0
- package/tech_hub_skills/skills/azure.md +149 -0
- package/tech_hub_skills/skills/code-review.md +399 -0
- package/tech_hub_skills/skills/compliance-automation.md +747 -0
- package/tech_hub_skills/skills/data-engineer.md +113 -0
- package/tech_hub_skills/skills/data-governance.md +102 -0
- package/tech_hub_skills/skills/data-scientist.md +123 -0
- package/tech_hub_skills/skills/devops.md +160 -0
- package/tech_hub_skills/skills/docker.md +160 -0
- package/tech_hub_skills/skills/enterprise-dashboard.md +613 -0
- package/tech_hub_skills/skills/finops.md +184 -0
- package/tech_hub_skills/skills/ml-engineer.md +115 -0
- package/tech_hub_skills/skills/mlops.md +187 -0
- package/tech_hub_skills/skills/optimization-advisor.md +329 -0
- package/tech_hub_skills/skills/orchestrator.md +497 -0
- package/tech_hub_skills/skills/platform-engineer.md +102 -0
- package/tech_hub_skills/skills/process-automation.md +226 -0
- package/tech_hub_skills/skills/process-changelog.md +184 -0
- package/tech_hub_skills/skills/process-documentation.md +484 -0
- package/tech_hub_skills/skills/process-kanban.md +324 -0
- package/tech_hub_skills/skills/process-versioning.md +214 -0
- package/tech_hub_skills/skills/product-designer.md +104 -0
- package/tech_hub_skills/skills/project-starter.md +443 -0
- package/tech_hub_skills/skills/security-architect.md +135 -0
- package/tech_hub_skills/skills/system-design.md +126 -0
package/tech_hub_skills/roles/mlops/skills/09-automated-retraining/README.md
ADDED
|
@@ -0,0 +1,264 @@
|
|
|
1
|
+
# Skill 09: Automated Retraining Pipelines
|
|
2
|
+
|
|
3
|
+
## 🎯 Overview
|
|
4
|
+
Trigger-based model retraining, with validation gates that a retrained model must pass before it replaces the current one.
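
As a minimal sketch of the two ideas named in the overview, the following Python shows a retraining trigger and a validation gate. The drift score, the thresholds, and every function name here are assumptions made for illustration; none of them come from this package.

```python
from dataclasses import dataclass


@dataclass
class RetrainingDecision:
    should_retrain: bool
    reason: str


def check_triggers(drift_score: float, days_since_training: int,
                   drift_threshold: float = 0.2,
                   max_age_days: int = 30) -> RetrainingDecision:
    """Decide whether to retrain, based on drift or model age (hypothetical thresholds)."""
    if drift_score > drift_threshold:
        return RetrainingDecision(True, "drift")
    if days_since_training > max_age_days:
        return RetrainingDecision(True, "schedule")
    return RetrainingDecision(False, "none")


def passes_validation_gate(candidate_score: float, production_score: float,
                           min_improvement: float = 0.0) -> bool:
    """Allow promotion only if the candidate is at least as good as production."""
    return candidate_score >= production_score + min_improvement


decision = check_triggers(drift_score=0.35, days_since_training=12)
promote = False
if decision.should_retrain:
    # In a real pipeline the candidate score would come from evaluating the retrained model.
    promote = passes_validation_gate(candidate_score=0.91, production_score=0.89)
print(decision.reason, promote)
```

Keeping the trigger and the gate as separate functions means a trigger can fire often without risking a worse model in production, because the gate still has to pass.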
|
|
5
|
+
|
|
6
|
+
## 🔗 Connections
|
|
7
|
+
- **Data Engineer**: Data foundation and pipelines (de-01, de-02, de-03)
|
|
8
|
+
- **Security Architect**: Compliance, PII detection, access control (sa-01, sa-02)
|
|
9
|
+
- **ML Engineer**: Model lifecycle and serving (ml-01, ml-04)
|
|
10
|
+
- **AI Engineer**: LLM integration and automation (ai-01, ai-02, ai-07)
|
|
11
|
+
- **MLOps**: Experiment tracking and monitoring (mo-01, mo-03, mo-06)
|
|
12
|
+
- **FinOps**: Cost optimization and tracking (fo-01, fo-07)
|
|
13
|
+
- **DevOps**: CI/CD, containerization, monitoring (do-01, do-03, do-08)
|
|
14
|
+
- **System Design**: Architecture patterns (sd-01)
|
|
15
|
+
- **Dependencies**: mo-06
|
|
16
|
+
|
|
17
|
+
## 🛠️ Tools Included
|
|
18
|
+
|
|
19
|
+
### 1. Primary Implementation Script
|
|
20
|
+
Core implementation for automated retraining pipelines.
|
|
21
|
+
|
|
22
|
+
### 2. Configuration Manager
|
|
23
|
+
Manage configuration and settings for automated retraining pipelines.
|
|
24
|
+
|
|
25
|
+
### 3. Integration Connector
|
|
26
|
+
Connect with other Tech Hub skills and external services.
|
|
27
|
+
|
|
28
|
+
### 4. Monitoring & Metrics
|
|
29
|
+
Track performance, costs, and quality metrics.
|
|
30
|
+
|
|
31
|
+
### 5. Automation Scripts
|
|
32
|
+
Automate common workflows and tasks.
|
|
33
|
+
|
|
34
|
+
## 📊 Key Metrics
|
|
35
|
+
- Implementation quality score
|
|
36
|
+
- Performance benchmarks
|
|
37
|
+
- Cost efficiency
|
|
38
|
+
- Security compliance rate
|
|
39
|
+
- Integration test coverage
|
|
40
|
+
|
|
41
|
+
## 🚀 Quick Start
|
|
42
|
+
|
|
43
|
+
```python
# Example implementation for Automated Retraining Pipelines
# Module and class names are illustrative; Python identifiers cannot
# start with a digit, so the skill number is left out of them.
from mlops.automated_retraining import AutomatedRetrainingService

# Initialize
service = AutomatedRetrainingService()

# Execute
result = service.execute(
    config={
        "environment": "production",
        "enable_monitoring": True
    }
)

print(f"Status: {result.status}")
print(f"Metrics: {result.metrics}")
```
|
|
61
|
+
|
|
62
|
+
## 📚 Best Practices
|
|
63
|
+
|
|
64
|
+
### Cost Optimization (FinOps Integration)
|
|
65
|
+
|
|
66
|
+
1. **Monitor Resource Costs**
|
|
67
|
+
- Track costs per execution
|
|
68
|
+
- Set budget alerts
|
|
69
|
+
- Optimize resource utilization
|
|
70
|
+
- Reference: FinOps fo-01 (Cost Monitoring)
|
|
71
|
+
|
|
72
|
+
2. **Right-size Resources**
|
|
73
|
+
- Use appropriate compute sizes
|
|
74
|
+
- Implement auto-scaling
|
|
75
|
+
- Leverage spot/reserved instances where applicable
|
|
76
|
+
- Reference: FinOps fo-04 (Reserved Instances), fo-05 (Spot Optimization), fo-07 (Compute Rightsizing)
|
|
77
|
+
|
|
78
|
+
### Security & Privacy (Security Architect Integration)
|
|
79
|
+
|
|
80
|
+
3. **Implement Access Control**
|
|
81
|
+
- Use least privilege principle
|
|
82
|
+
- Enable Azure AD authentication
|
|
83
|
+
- Audit access logs
|
|
84
|
+
- Reference: Security Architect sa-04 (IAM), sa-02 (Threat Modeling)
|
|
85
|
+
|
|
86
|
+
4. **Data Protection**
|
|
87
|
+
- Encrypt data at rest and in transit
|
|
88
|
+
- Scan for PII before processing
|
|
89
|
+
- Implement data retention policies
|
|
90
|
+
- Reference: Security Architect sa-01 (PII Detection)
|
|
91
|
+
|
|
92
|
+
### Quality & Governance (Data Engineer Integration)
|
|
93
|
+
|
|
94
|
+
5. **Ensure Data Quality**
|
|
95
|
+
- Validate inputs and outputs
|
|
96
|
+
- Implement quality gates
|
|
97
|
+
- Monitor data freshness
|
|
98
|
+
- Reference: Data Engineer de-03 (Data Quality)
|
|
99
|
+
|
|
100
|
+
### Lifecycle Management (MLOps Integration)
|
|
101
|
+
|
|
102
|
+
6. **Version Control**
|
|
103
|
+
- Version all configurations
|
|
104
|
+
- Track changes over time
|
|
105
|
+
- Enable rollback capability
|
|
106
|
+
- Reference: MLOps mo-03 (Model Registry)
|
|
107
|
+
|
|
108
|
+
7. **Continuous Monitoring**
|
|
109
|
+
- Track performance metrics
|
|
110
|
+
- Set up alerting
|
|
111
|
+
- Monitor for drift
|
|
112
|
+
- Reference: MLOps mo-06 (Monitoring)
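
Item 7 above mentions monitoring for drift. One common way to quantify drift in a numeric feature is the Population Stability Index (PSI). The sketch below is an illustration in plain Python and is not part of this package; the bin count and any alerting threshold are assumptions to tune per feature.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """PSI between a reference sample and a recent sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all reference values are equal

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range values into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


reference = [i / 10 for i in range(100)]
recent = [v + 3.0 for v in reference]
print(f"PSI unchanged: {population_stability_index(reference, reference):.3f}")
print(f"PSI shifted:   {population_stability_index(reference, recent):.3f}")
```

A score like this could feed a retraining trigger: identical distributions give 0, and larger values indicate a larger shift.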
|
|
113
|
+
|
|
114
|
+
### Deployment & Operations (DevOps Integration)
|
|
115
|
+
|
|
116
|
+
8. **Automate Deployment**
|
|
117
|
+
- Implement CI/CD pipelines
|
|
118
|
+
- Use infrastructure as code
|
|
119
|
+
- Enable blue-green deployments
|
|
120
|
+
- Reference: DevOps do-01 (CI/CD), do-03 (IaC)
|
|
121
|
+
|
|
122
|
+
9. **Observability**
|
|
123
|
+
- Implement distributed tracing
|
|
124
|
+
- Set up dashboards
|
|
125
|
+
- Enable logging and metrics
|
|
126
|
+
- Reference: DevOps do-08 (Monitoring)
|
|
127
|
+
|
|
128
|
+
### Azure-Specific Best Practices
|
|
129
|
+
|
|
130
|
+
10. **Leverage Azure Services**
|
|
131
|
+
- Use managed services where possible
|
|
132
|
+
- Implement Azure Policy for governance
|
|
133
|
+
- Enable Azure Monitor integration
|
|
134
|
+
- Use managed identities for authentication
|
|
135
|
+
|
|
136
|
+
## 💰 Cost Optimization Examples
|
|
137
|
+
|
|
138
|
+
### Cost Tracking
|
|
139
|
+
```python
# finops_tracker and execute_operation are not defined in this README;
# treat them as placeholders for your own cost-tracking client and workload.
from finops_tracker import CostTracker

tracker = CostTracker()

@tracker.track_costs
def run_operation(params):
    # Your operation here
    result = execute_operation(params)
    return result

# Monthly report
report = tracker.monthly_report()
print(f"Total cost: ${report.total_cost:.2f}")
print(f"Cost per operation: ${report.avg_cost:.4f}")
```
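
To make the decorator pattern above concrete, here is a self-contained sketch of what a cost-tracking decorator could look like. `SimpleCostTracker` and its hourly-rate model are hypothetical stand-ins, not the `CostTracker` API; the cost is only estimated from wall-clock time.

```python
import functools
import time


class SimpleCostTracker:
    """Toy tracker: estimates cost from wall-clock time and an assumed hourly rate."""

    def __init__(self, hourly_rate_usd: float):
        self.hourly_rate_usd = hourly_rate_usd
        self.costs = []  # one estimated cost per tracked call

    def track_costs(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the cost even when the call raises.
                elapsed_hours = (time.perf_counter() - start) / 3600
                self.costs.append(elapsed_hours * self.hourly_rate_usd)
        return wrapper

    def total_cost(self) -> float:
        return sum(self.costs)

    def avg_cost(self) -> float:
        return self.total_cost() / len(self.costs) if self.costs else 0.0


tracker = SimpleCostTracker(hourly_rate_usd=3.0)


@tracker.track_costs
def run_operation(n):
    return sum(range(n))


run_operation(1000)
run_operation(2000)
print(f"Calls tracked: {len(tracker.costs)}, total cost: ${tracker.total_cost():.6f}")
```

Recording the cost in a `finally` block means failed runs are still counted, which matters when failed retraining jobs consume compute too.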
|
|
155
|
+
|
|
156
|
+
## 🔒 Security Best Practices Examples
|
|
157
|
+
|
|
158
|
+
### Access Control Implementation
|
|
159
|
+
```python
from azure.identity import DefaultAzureCredential
# security_manager and process_data are not defined in this README;
# treat them as placeholders for your own access-control layer and workload.
from security_manager import AccessControl

credential = DefaultAzureCredential()
access_control = AccessControl(credential)

# Validate access before operation
@access_control.require_role("operator")
def sensitive_operation(data):
    # Operation logic
    return process_data(data)
```
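
The role check above can be sketched without any cloud dependency. The class below is a hypothetical in-memory stand-in for `AccessControl`: it takes the caller's roles directly rather than resolving them from a credential, which is an assumption made to keep the example runnable.

```python
import functools


class AccessDenied(Exception):
    """Raised when the caller lacks the required role."""


class SimpleAccessControl:
    def __init__(self, current_roles):
        self.current_roles = set(current_roles)

    def require_role(self, role):
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                # Check the role before the wrapped function runs at all.
                if role not in self.current_roles:
                    raise AccessDenied(f"role '{role}' is required")
                return func(*args, **kwargs)
            return wrapper
        return decorator


access_control = SimpleAccessControl(current_roles=["operator"])


@access_control.require_role("operator")
def sensitive_operation(data):
    return data.upper()


@access_control.require_role("admin")
def admin_operation():
    return "done"


print(sensitive_operation("retrain"))
try:
    admin_operation()
except AccessDenied as exc:
    print(f"denied: {exc}")
```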
|
|
172
|
+
|
|
173
|
+
## 📊 Enhanced Metrics & Monitoring
|
|
174
|
+
|
|
175
|
+
| Metric Category | Metric | Target | Tool |
|-----------------|--------|--------|------|
| **Performance** | Execution time (p95) | <5s | Azure Monitor |
| | Success rate | >99% | Custom metrics |
| **Cost** | Cost per operation | <$0.05 | FinOps dashboard |
| | Resource utilization | >75% | Azure Monitor |
| **Quality** | Error rate | <1% | App Insights |
| | Data quality score | >95% | Quality tracker |
| **Security** | Access violations | 0 | Security logs |
| | Compliance score | 100% | Audit system |
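
The targets in the table can be checked mechanically. The sketch below computes a nearest-rank p95 and compares observed values with the table's targets; the metric key names and the sample data are assumptions for illustration only.

```python
def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list."""
    if not samples:
        raise ValueError("p95 requires at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) using integers
    return ordered[rank - 1]


# Targets copied from the table above; key names are hypothetical.
TARGETS = {
    "execution_time_p95_s": lambda v: v < 5,
    "success_rate_pct": lambda v: v > 99,
    "cost_per_operation_usd": lambda v: v < 0.05,
    "resource_utilization_pct": lambda v: v > 75,
    "error_rate_pct": lambda v: v < 1,
    "data_quality_score_pct": lambda v: v > 95,
    "access_violations": lambda v: v == 0,
    "compliance_score_pct": lambda v: v == 100,
}


def failed_targets(observed):
    """Return the names of reported metrics that miss their target."""
    return [name for name, ok in TARGETS.items()
            if name in observed and not ok(observed[name])]


durations = [0.8, 1.2, 0.9, 6.5, 1.1, 1.0, 0.7, 1.3, 0.95, 1.05]
observed = {
    "execution_time_p95_s": p95(durations),
    "success_rate_pct": 99.5,
    "error_rate_pct": 0.5,
}
print(failed_targets(observed))
```

With only ten samples, a single slow run determines the p95, so this check is more meaningful over a larger window.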
|
|
185
|
+
|
|
186
|
+
## 🚀 Deployment Pipeline
|
|
187
|
+
|
|
188
|
+
### CI/CD Example
|
|
189
|
+
```yaml
# .github/workflows/deploy-09-automated-retraining.yml
name: Deploy Automated Retraining Pipelines

on:
  push:
    paths:
      - 'mlops/skills/09-automated-retraining/**'
    branches:
      - main

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # Assumes pytest and the project's dependencies are installed on the runner
      - name: Run tests
        run: pytest tests/ -v
      - name: Security scan
        run: python scripts/security_scan.py
      - name: Cost validation
        run: python scripts/validate_costs.py

  deploy:
    needs: test
    runs-on: ubuntu-latest
    steps:
      # Each job starts on a fresh runner, so check out the repository
      # again before referencing infra/ and scripts/.
      - uses: actions/checkout@v3
      # Requires an Azure login step before this one (for example the
      # azure/login action); credential setup is omitted here.
      - name: Deploy to Azure
        run: |
          az deployment group create \
            --resource-group rg-mlops \
            --template-file infra/main.bicep
      - name: Monitor deployment
        run: python scripts/monitor_health.py --duration 10m
```
|
|
224
|
+
|
|
225
|
+
## 🔄 Integration Workflow
|
|
226
|
+
|
|
227
|
+
### End-to-End Process
|
|
228
|
+
```
1. Input Validation
   ↓
2. Security Checks (sa-01, sa-02)
   ↓
3. Main Processing
   ↓
4. Quality Validation (de-03)
   ↓
5. Cost Tracking (fo-01)
   ↓
6. Monitoring & Logging (do-08)
   ↓
7. Output Delivery
```
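
The flow above maps naturally onto an ordered list of stage functions that pass a context dictionary along. This is a minimal sketch with hypothetical stage bodies; only the first four steps are shown, and the remaining ones would follow the same pattern.

```python
from typing import Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[Dict], Dict]]


def validate_input(ctx):
    if "records" not in ctx:
        raise ValueError("missing 'records'")
    return ctx


def security_checks(ctx):
    # Placeholder for PII and access checks (sa-01, sa-02).
    return {**ctx, "security_ok": True}


def main_processing(ctx):
    return {**ctx, "output": [r * 2 for r in ctx["records"]]}


def quality_validation(ctx):
    # Placeholder quality gate (de-03): refuse empty output.
    if not ctx["output"]:
        raise ValueError("quality gate failed: empty output")
    return ctx


STAGES: List[Stage] = [
    ("input_validation", validate_input),
    ("security_checks", security_checks),
    ("main_processing", main_processing),
    ("quality_validation", quality_validation),
]


def run_pipeline(ctx, stages=STAGES):
    """Run stages in order; any exception stops the pipeline at that stage."""
    completed = []
    for name, stage in stages:
        ctx = stage(ctx)
        completed.append(name)
    return {**ctx, "completed_stages": completed}


result = run_pipeline({"records": [1, 2, 3]})
print(result["output"], result["completed_stages"])
```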
|
|
243
|
+
|
|
244
|
+
## 🎯 Quick Wins
|
|
245
|
+
|
|
246
|
+
1. **Enable cost tracking** - Monitor spending from day one
|
|
247
|
+
2. **Implement security scanning** - Catch vulnerabilities early
|
|
248
|
+
3. **Set up monitoring** - Full visibility into operations
|
|
249
|
+
4. **Automate deployment** - Faster, safer releases
|
|
250
|
+
5. **Add quality gates** - Prevent bad data from propagating
|
|
251
|
+
6. **Enable caching** - Reduce redundant operations
|
|
252
|
+
7. **Implement retries** - Improve reliability
|
|
253
|
+
8. **Set up alerting** - Know about issues immediately
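
For the retries item in the list above, a minimal sketch of retrying with exponential backoff could look like this. The helper name, defaults, and the injectable `sleep` parameter are assumptions chosen to keep the example small and testable.

```python
import time


def retry(func, attempts=3, base_delay=0.1, retry_on=(Exception,), sleep=time.sleep):
    """Call func(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


calls = {"count": 0}


def flaky():
    """Fails twice with a transient error, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


# Pass a no-op sleep so the example runs instantly.
print(retry(flaky, sleep=lambda seconds: None))
```

Restricting `retry_on` to transient error types avoids retrying failures that will never succeed, such as validation errors.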
|
|
254
|
+
|
|
255
|
+
## 🔗 Related Skills
|
|
256
|
+
- mo-06
|
|
257
|
+
|
|
258
|
+
---
|
|
259
|
+
|
|
260
|
+
**Skill ID**: `09-automated-retraining`
|
|
261
|
+
**Complexity**: Expert
|
|
262
|
+
**Dependencies**: mo-06
|
|
263
|
+
**Business Value**: High
|
|
264
|
+
**Estimated Implementation Time**: 1-2 weeks
|
package/tech_hub_skills/roles/platform-engineer/skills/01-internal-developer-platform/README.md
ADDED
|
@@ -0,0 +1,153 @@
|
|
|
1
|
+
# pe-01: Internal Developer Platform (IDP)
|
|
2
|
+
|
|
3
|
+
## Overview
|
|
4
|
+
|
|
5
|
+
Build developer portals using Backstage for service catalog, golden path templates, self-service provisioning, and platform documentation.
|
|
6
|
+
|
|
7
|
+
## Key Capabilities
|
|
8
|
+
|
|
9
|
+
- **Developer Portal**: Centralized platform UI (Backstage)
|
|
10
|
+
- **Service Catalog**: All services, APIs, documentation
|
|
11
|
+
- **Golden Path Templates**: Scaffolding for new services
|
|
12
|
+
- **Self-Service Provisioning**: One-click infrastructure
|
|
13
|
+
- **Platform Documentation**: Unified docs portal
|
|
14
|
+
|
|
15
|
+
## Tools & Technologies
|
|
16
|
+
|
|
17
|
+
- **Backstage**: Open-source developer portal
|
|
18
|
+
- **Port**: Developer portal platform
|
|
19
|
+
- **Humanitec**: Platform orchestrator
|
|
20
|
+
- **Kratix**: Platform-as-a-product framework
|
|
21
|
+
|
|
22
|
+
## Implementation
|
|
23
|
+
|
|
24
|
+
### 1. Backstage Setup
|
|
25
|
+
|
|
26
|
+
```yaml
# app-config.yaml
app:
  title: Tech Hub Platform
  baseUrl: http://localhost:3000

organization:
  name: Tech Innovation Hub

backend:
  baseUrl: http://localhost:7007
  listen:
    port: 7007
  database:
    client: pg
    connection:
      host: ${POSTGRES_HOST}
      port: ${POSTGRES_PORT}

catalog:
  providers:
    github:
      organization: 'your-org'
      catalogPath: '/catalog-info.yaml'
```
|
|
51
|
+
|
|
52
|
+
### 2. Service Catalog
|
|
53
|
+
|
|
54
|
+
```yaml
# catalog-info.yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: customer-api
  description: Customer management API
  annotations:
    github.com/project-slug: your-org/customer-api
    sonarqube.org/project-key: customer-api
spec:
  type: service
  lifecycle: production
  owner: team-platform
  system: customer-domain
  providesApis:
    - customer-api
  consumesApis:
    - auth-api
```
|
|
74
|
+
|
|
75
|
+
### 3. Golden Path Template
|
|
76
|
+
|
|
77
|
+
```yaml
# template.yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: python-fastapi-service
  title: Python FastAPI Service
  description: Create a new Python FastAPI microservice
spec:
  owner: platform-team
  type: service

  parameters:
    - title: Service Information
      required:
        - name
        - owner
      properties:
        name:
          title: Service Name
          type: string
          description: Unique name for the service
        owner:
          title: Owner
          type: string
          description: Team that owns this service

  steps:
    - id: fetch-template
      name: Fetch Template
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          owner: ${{ parameters.owner }}

    - id: publish
      name: Publish to GitHub
      action: publish:github
      input:
        repoUrl: github.com?owner=your-org&repo=${{ parameters.name }}

    - id: register
      name: Register Component
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps.publish.output.repoContentsUrl }}
        catalogInfoPath: '/catalog-info.yaml'
```
|
|
127
|
+
|
|
128
|
+
## Best Practices
|
|
129
|
+
|
|
130
|
+
1. **Start Small**: Begin with service catalog, add features iteratively
|
|
131
|
+
2. **Golden Paths**: Create templates for 80% of use cases
|
|
132
|
+
3. **Self-Service**: Minimize manual ticket workflows
|
|
133
|
+
4. **Measure Adoption**: Track active users and template usage
|
|
134
|
+
5. **Documentation**: Keep docs updated and searchable
|
|
135
|
+
|
|
136
|
+
## Cost Optimization
|
|
137
|
+
|
|
138
|
+
- Host Backstage on Kubernetes spot instances
|
|
139
|
+
- Use a managed PostgreSQL service (often cheaper to operate than self-hosting once maintenance effort is included)
|
|
140
|
+
- Cache plugin data to reduce API calls
|
|
141
|
+
- Right-size backend resources
|
|
142
|
+
|
|
143
|
+
## Integration
|
|
144
|
+
|
|
145
|
+
**Connects with:**
|
|
146
|
+
- do-01 (CI/CD): Link to deployment pipelines
|
|
147
|
+
- do-02 (Kubernetes): Service deployment info
|
|
148
|
+
- pe-03 (SLO): Display SLO status
|
|
149
|
+
- dg-01 (Catalog): Link to data catalog
|
|
150
|
+
|
|
151
|
+
## Quick Win
|
|
152
|
+
|
|
153
|
+
Deploy Backstage with GitHub integration, import five services into the catalog, and show the team the unified view of their services.
|
package/tech_hub_skills/roles/platform-engineer/skills/02-self-service-infrastructure/README.md
ADDED
|
@@ -0,0 +1,57 @@
|
|
|
1
|
+
# pe-02: Self-Service Infrastructure
|
|
2
|
+
|
|
3
|
+
## Overview
|
|
4
|
+
|
|
5
|
+
Enable developers to provision namespaces, databases, secrets, and environments through self-service automation.
|
|
6
|
+
|
|
7
|
+
## Key Capabilities
|
|
8
|
+
|
|
9
|
+
- **Namespace Provisioning**: Auto-create K8s namespaces
|
|
10
|
+
- **Database Provisioning**: Self-service DB creation
|
|
11
|
+
- **Secret Management**: Automated secret injection
|
|
12
|
+
- **Resource Quotas**: Automatic quota management
|
|
13
|
+
- **Environment Management**: Dev/staging/prod provisioning
|
|
14
|
+
|
|
15
|
+
## Implementation
|
|
16
|
+
|
|
17
|
+
```python
# Self-service namespace provisioning
from kubernetes import client, config

def provision_namespace(team_name, environment):
    """Create a namespace with a default resource quota (RBAC bindings are not created here)."""
    config.load_kube_config()
    v1 = client.CoreV1Api()

    # Create namespace
    namespace = client.V1Namespace(
        metadata=client.V1ObjectMeta(
            name=f"{team_name}-{environment}",
            labels={
                "team": team_name,
                "environment": environment
            }
        )
    )
    v1.create_namespace(namespace)

    # Apply resource quota
    quota = client.V1ResourceQuota(
        metadata=client.V1ObjectMeta(name="default-quota"),
        spec=client.V1ResourceQuotaSpec(
            hard={
                "requests.cpu": "10",
                "requests.memory": "20Gi",
                "pods": "50"
            }
        )
    )
    v1.create_namespaced_resource_quota(
        namespace=namespace.metadata.name,
        body=quota
    )
```
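
Because the namespace name above is built from user-supplied values, a self-service flow may want to validate it before calling the API. Kubernetes namespace names must be RFC 1123 DNS labels. The helper below is a hypothetical addition, and lowercasing the input is a design assumption rather than something the original code does.

```python
import re

# RFC 1123 DNS label: lowercase alphanumerics or '-', starting and ending
# with an alphanumeric character, at most 63 characters long.
_DNS_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def namespace_name(team_name: str, environment: str) -> str:
    """Build and validate the namespace name used for a team and environment."""
    name = f"{team_name}-{environment}".lower()
    if len(name) > 63 or not _DNS_LABEL.match(name):
        raise ValueError(f"invalid namespace name: {name!r}")
    return name


print(namespace_name("Payments", "dev"))
try:
    namespace_name("team_a", "prod")
except ValueError as exc:
    print(exc)
```

Rejecting bad names up front gives the requesting developer a clear error instead of an API failure after part of the provisioning has run.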
|
|
54
|
+
|
|
55
|
+
## Integration
|
|
56
|
+
|
|
57
|
+
**Connects with:** do-02 (Kubernetes), sa-06 (Secrets), pe-01 (IDP)
|
package/tech_hub_skills/roles/platform-engineer/skills/03-slo-sli-management/README.md
ADDED
|
@@ -0,0 +1,59 @@
|
|
|
1
|
+
# pe-03: SLO/SLI Management
|
|
2
|
+
|
|
3
|
+
## Overview
|
|
4
|
+
|
|
5
|
+
Define and track Service Level Objectives (SLOs), manage error budgets, instrument SLIs, and create SLO-based alerting.
|
|
6
|
+
|
|
7
|
+
## Key Capabilities
|
|
8
|
+
|
|
9
|
+
- **SLO Definition**: Availability, latency, error rate targets
|
|
10
|
+
- **Error Budget Management**: Track remaining error budget
|
|
11
|
+
- **SLI Instrumentation**: Collect service-level indicators
|
|
12
|
+
- **SLO-Based Alerting**: Alert on error budget burn
|
|
13
|
+
- **SLO Dashboards**: Visualize SLO compliance
|
|
14
|
+
|
|
15
|
+
## Implementation
|
|
16
|
+
|
|
17
|
+
```yaml
# SLO definition (Sloth)
version: prometheus/v1
service: customer-api
slos:
  - name: requests-availability
    objective: 99.9
    description: 99.9% of requests successful
    sli:
      events:
        # Sloth substitutes {{.window}} for each time window it evaluates
        error_query: |
          sum(rate(http_requests_total{job="customer-api",code=~"5.."}[{{.window}}]))
        total_query: |
          sum(rate(http_requests_total{job="customer-api"}[{{.window}}]))
    alerting:
      name: CustomerAPIHighErrorRate
      labels:
        severity: page
      annotations:
        summary: High error rate on customer API
```
|
|
38
|
+
|
|
39
|
+
```python
# Error budget calculation
def calculate_error_budget(slo_target, time_window_days=30):
    """Calculate remaining error budget in minutes (slo_target is a percentage, e.g. 99.9)"""
    total_minutes = time_window_days * 24 * 60
    allowed_downtime = total_minutes * (1 - slo_target/100)

    # get_actual_downtime is a placeholder: it should return observed downtime in minutes
    actual_downtime = get_actual_downtime(time_window_days)
    remaining_budget = allowed_downtime - actual_downtime

    # A 100% target leaves no budget, so the consumed percentage is undefined
    if allowed_downtime > 0:
        consumed_percent = (actual_downtime / allowed_downtime) * 100
    else:
        consumed_percent = None

    return {
        'allowed_downtime_minutes': allowed_downtime,
        'actual_downtime_minutes': actual_downtime,
        'remaining_budget_minutes': remaining_budget,
        'budget_consumed_percent': consumed_percent
    }
```
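
As a worked example of the arithmetic above: a 99.9% target over 30 days allows 43,200 × 0.001 = 43.2 minutes of downtime. The sketch below reproduces that number and adds a burn-rate helper, which is a common companion metric; both function names are illustrative and not part of this package.

```python
def allowed_downtime_minutes(slo_target_percent, window_days=30):
    """Minutes of downtime permitted by the SLO over the window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo_target_percent / 100)


def burn_rate(observed_error_ratio, slo_target_percent):
    """How fast the budget is being spent: 1.0 means exactly on budget."""
    budget_ratio = 1 - slo_target_percent / 100
    if budget_ratio <= 0:
        raise ValueError("a 100% target leaves no error budget")
    return observed_error_ratio / budget_ratio


print(round(allowed_downtime_minutes(99.9), 1))  # 43.2
print(round(burn_rate(0.005, 99.9), 1))          # 5.0
```

A burn rate of 5 means the budget would be exhausted in one fifth of the window if the current error ratio persisted.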
|
|
56
|
+
|
|
57
|
+
## Integration
|
|
58
|
+
|
|
59
|
+
**Connects with:** do-08 (Monitoring), pe-01 (IDP), pe-05 (Incident Management)
|
package/tech_hub_skills/roles/platform-engineer/skills/04-developer-experience/README.md
ADDED
|
@@ -0,0 +1,57 @@
|
|
|
1
|
+
# pe-04: Developer Experience
|
|
2
|
+
|
|
3
|
+
## Overview
|
|
4
|
+
|
|
5
|
+
Improve developer velocity through automated onboarding, documentation-as-code, CLI tools, and DORA metrics tracking.
|
|
6
|
+
|
|
7
|
+
## Key Capabilities
|
|
8
|
+
|
|
9
|
+
- **Automated Onboarding**: Zero-to-commit in < 1 hour
|
|
10
|
+
- **Documentation-as-Code**: Docs in git, versioned
|
|
11
|
+
- **Developer CLI**: Unified command-line tools
|
|
12
|
+
- **DORA Metrics**: Deployment frequency, lead time, MTTR, change failure rate
|
|
13
|
+
- **Feedback Collection**: Regular developer surveys
|
|
14
|
+
|
|
15
|
+
## Implementation
|
|
16
|
+
|
|
17
|
+
```bash
#!/bin/bash
# Developer CLI
# platform-cli

case "$1" in
  create-service)
    # Illustrative command; substitute your scaffolding tool's actual invocation
    backstage scaffold "$2" --template python-fastapi
    ;;
  deploy)
    kubectl apply -f k8s/ --namespace="$2"
    ;;
  logs)
    kubectl logs -f "deployment/$2" -n "$3"
    ;;
  metrics)
    # `open` is macOS-specific; use xdg-open on Linux
    open "https://grafana.company.com/d/dora-metrics"
    ;;
  *)
    echo "Usage: $0 {create-service <name>|deploy <namespace>|logs <deployment> <namespace>|metrics}" >&2
    exit 1
    ;;
esac
```
|
|
37
|
+
|
|
38
|
+
```python
# DORA metrics collection
def calculate_dora_metrics(team_name, days=30):
    """Calculate DORA metrics"""
    # get_deployments and get_incidents are placeholders for your own data sources
    deployments = get_deployments(team_name, days)
    incidents = get_incidents(team_name, days)

    # Averages fall back to 0 when there is nothing to average, avoiding division by zero
    metrics = {
        'deployment_frequency': len(deployments) / days,
        'lead_time_hours': sum(d.lead_time for d in deployments) / len(deployments) if deployments else 0,
        'mttr_hours': sum(i.resolution_time for i in incidents) / len(incidents) if incidents else 0,
        'change_fail_rate': len([d for d in deployments if d.failed]) / len(deployments) if deployments else 0
    }

    return metrics
```
|
|
54
|
+
|
|
55
|
+
## Integration
|
|
56
|
+
|
|
57
|
+
**Connects with:** pe-01 (IDP), do-01 (CI/CD), pe-05 (Incident Management)
|
package/tech_hub_skills/roles/platform-engineer/skills/05-incident-management/README.md
ADDED
|
@@ -0,0 +1,73 @@
|
|
|
1
|
+
# pe-05: Incident Management
|
|
2
|
+
|
|
3
|
+
## Overview
|
|
4
|
+
|
|
5
|
+
Manage on-call rotations, incident response procedures, postmortem templates, runbook automation, and alert routing.
|
|
6
|
+
|
|
7
|
+
## Key Capabilities
|
|
8
|
+
|
|
9
|
+
- **On-Call Management**: Rotation schedules
|
|
10
|
+
- **Incident Response**: Clear escalation procedures
|
|
11
|
+
- **Postmortem Templates**: Blameless retrospectives
|
|
12
|
+
- **Runbook Automation**: Auto-remediation
|
|
13
|
+
- **Alert Routing**: Intelligent alert distribution
|
|
14
|
+
|
|
15
|
+
## Implementation
|
|
16
|
+
|
|
17
|
+
```yaml
# PagerDuty incident response
services:
  - name: customer-api
    escalation_policy: platform-team
    alert_grouping: intelligent
    incident_urgency_rule:
      type: constant
      urgency: high

# Runbook automation
runbooks:
  - name: high-memory-usage
    trigger: memory_usage > 90%
    actions:
      - restart_pod
      - scale_replicas: 2
      - notify_oncall
```
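
The runbook entry above implies an engine that evaluates the trigger and then runs the actions in order. The sketch below shows one way that could work; the trigger grammar, the handler registry, and all function names are assumptions, since the source does not specify how runbooks are executed.

```python
import operator
import re

_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}
# Matches simple triggers such as "memory_usage > 90%".
_TRIGGER = re.compile(r"^\s*(\w+)\s*(>=|<=|>|<)\s*(\d+(?:\.\d+)?)%?\s*$")


def trigger_fires(trigger: str, metrics: dict) -> bool:
    match = _TRIGGER.match(trigger)
    if not match:
        raise ValueError(f"unsupported trigger: {trigger!r}")
    name, op, threshold = match.groups()
    return _OPS[op](metrics[name], float(threshold))


def run_runbook(runbook: dict, metrics: dict, handlers: dict) -> list:
    """Run the runbook's actions in order if its trigger fires; return the actions run."""
    if not trigger_fires(runbook["trigger"], metrics):
        return []
    executed = []
    for action in runbook["actions"]:
        # An action is a bare name or a single-key mapping such as {"scale_replicas": 2}.
        if isinstance(action, dict):
            (name, arg), = action.items()
            handlers[name](arg)
        else:
            name = action
            handlers[name]()
        executed.append(name)
    return executed


log = []
handlers = {
    "restart_pod": lambda: log.append("restart"),
    "scale_replicas": lambda n: log.append(f"scale:{n}"),
    "notify_oncall": lambda: log.append("notify"),
}
runbook = {
    "name": "high-memory-usage",
    "trigger": "memory_usage > 90%",
    "actions": ["restart_pod", {"scale_replicas": 2}, "notify_oncall"],
}
print(run_runbook(runbook, {"memory_usage": 95}, handlers), log)
```

Parsing a small fixed grammar, rather than evaluating the trigger string as code, keeps runbook definitions from executing arbitrary expressions.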
|
|
36
|
+
|
|
37
|
+
```python
# Postmortem template generator
def create_postmortem(incident_id):
    """Generate postmortem document"""
    # get_incident is a placeholder for your incident-tracker lookup
    incident = get_incident(incident_id)

    template = f"""
# Incident Postmortem: {incident.title}

## Incident Summary
- **Date**: {incident.date}
- **Duration**: {incident.duration}
- **Severity**: {incident.severity}
- **Responders**: {incident.responders}

## Timeline
{incident.timeline}

## Root Cause
[To be filled]

## Resolution
{incident.resolution}

## Action Items
- [ ] TODO 1
- [ ] TODO 2

## Lessons Learned
[To be filled]
"""
    return template
```
|
|
70
|
+
|
|
71
|
+
## Integration
|
|
72
|
+
|
|
73
|
+
**Connects with:** pe-03 (SLO), do-08 (Monitoring), pe-01 (IDP)
|