tech-hub-skills 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (133)
  1. package/LICENSE +21 -0
  2. package/README.md +250 -0
  3. package/bin/cli.js +241 -0
  4. package/bin/copilot.js +182 -0
  5. package/bin/postinstall.js +42 -0
  6. package/package.json +46 -0
  7. package/tech_hub_skills/roles/ai-engineer/skills/01-prompt-engineering/README.md +252 -0
  8. package/tech_hub_skills/roles/ai-engineer/skills/02-rag-pipeline/README.md +448 -0
  9. package/tech_hub_skills/roles/ai-engineer/skills/03-agent-orchestration/README.md +599 -0
  10. package/tech_hub_skills/roles/ai-engineer/skills/04-llm-guardrails/README.md +735 -0
  11. package/tech_hub_skills/roles/ai-engineer/skills/05-vector-embeddings/README.md +711 -0
  12. package/tech_hub_skills/roles/ai-engineer/skills/06-llm-evaluation/README.md +777 -0
  13. package/tech_hub_skills/roles/azure/skills/01-infrastructure-fundamentals/README.md +264 -0
  14. package/tech_hub_skills/roles/azure/skills/02-data-factory/README.md +264 -0
  15. package/tech_hub_skills/roles/azure/skills/03-synapse-analytics/README.md +264 -0
  16. package/tech_hub_skills/roles/azure/skills/04-databricks/README.md +264 -0
  17. package/tech_hub_skills/roles/azure/skills/05-functions/README.md +264 -0
  18. package/tech_hub_skills/roles/azure/skills/06-kubernetes-service/README.md +264 -0
  19. package/tech_hub_skills/roles/azure/skills/07-openai-service/README.md +264 -0
  20. package/tech_hub_skills/roles/azure/skills/08-machine-learning/README.md +264 -0
  21. package/tech_hub_skills/roles/azure/skills/09-storage-adls/README.md +264 -0
  22. package/tech_hub_skills/roles/azure/skills/10-networking/README.md +264 -0
  23. package/tech_hub_skills/roles/azure/skills/11-sql-cosmos/README.md +264 -0
  24. package/tech_hub_skills/roles/azure/skills/12-event-hubs/README.md +264 -0
  25. package/tech_hub_skills/roles/code-review/skills/01-automated-code-review/README.md +394 -0
  26. package/tech_hub_skills/roles/code-review/skills/02-pr-review-workflow/README.md +427 -0
  27. package/tech_hub_skills/roles/code-review/skills/03-code-quality-gates/README.md +518 -0
  28. package/tech_hub_skills/roles/code-review/skills/04-reviewer-assignment/README.md +504 -0
  29. package/tech_hub_skills/roles/code-review/skills/05-review-analytics/README.md +540 -0
  30. package/tech_hub_skills/roles/data-engineer/skills/01-lakehouse-architecture/README.md +550 -0
  31. package/tech_hub_skills/roles/data-engineer/skills/02-etl-pipeline/README.md +580 -0
  32. package/tech_hub_skills/roles/data-engineer/skills/03-data-quality/README.md +579 -0
  33. package/tech_hub_skills/roles/data-engineer/skills/04-streaming-pipelines/README.md +608 -0
  34. package/tech_hub_skills/roles/data-engineer/skills/05-performance-optimization/README.md +547 -0
  35. package/tech_hub_skills/roles/data-governance/skills/01-data-catalog/README.md +112 -0
  36. package/tech_hub_skills/roles/data-governance/skills/02-data-lineage/README.md +129 -0
  37. package/tech_hub_skills/roles/data-governance/skills/03-data-quality-framework/README.md +182 -0
  38. package/tech_hub_skills/roles/data-governance/skills/04-access-control/README.md +39 -0
  39. package/tech_hub_skills/roles/data-governance/skills/05-master-data-management/README.md +40 -0
  40. package/tech_hub_skills/roles/data-governance/skills/06-compliance-privacy/README.md +46 -0
  41. package/tech_hub_skills/roles/data-scientist/skills/01-eda-automation/README.md +230 -0
  42. package/tech_hub_skills/roles/data-scientist/skills/02-statistical-modeling/README.md +264 -0
  43. package/tech_hub_skills/roles/data-scientist/skills/03-feature-engineering/README.md +264 -0
  44. package/tech_hub_skills/roles/data-scientist/skills/04-predictive-modeling/README.md +264 -0
  45. package/tech_hub_skills/roles/data-scientist/skills/05-customer-analytics/README.md +264 -0
  46. package/tech_hub_skills/roles/data-scientist/skills/06-campaign-analysis/README.md +264 -0
  47. package/tech_hub_skills/roles/data-scientist/skills/07-experimentation/README.md +264 -0
  48. package/tech_hub_skills/roles/data-scientist/skills/08-data-visualization/README.md +264 -0
  49. package/tech_hub_skills/roles/devops/skills/01-cicd-pipeline/README.md +264 -0
  50. package/tech_hub_skills/roles/devops/skills/02-container-orchestration/README.md +264 -0
  51. package/tech_hub_skills/roles/devops/skills/03-infrastructure-as-code/README.md +264 -0
  52. package/tech_hub_skills/roles/devops/skills/04-gitops/README.md +264 -0
  53. package/tech_hub_skills/roles/devops/skills/05-environment-management/README.md +264 -0
  54. package/tech_hub_skills/roles/devops/skills/06-automated-testing/README.md +264 -0
  55. package/tech_hub_skills/roles/devops/skills/07-release-management/README.md +264 -0
  56. package/tech_hub_skills/roles/devops/skills/08-monitoring-alerting/README.md +264 -0
  57. package/tech_hub_skills/roles/devops/skills/09-devsecops/README.md +265 -0
  58. package/tech_hub_skills/roles/finops/skills/01-cost-visibility/README.md +264 -0
  59. package/tech_hub_skills/roles/finops/skills/02-resource-tagging/README.md +264 -0
  60. package/tech_hub_skills/roles/finops/skills/03-budget-management/README.md +264 -0
  61. package/tech_hub_skills/roles/finops/skills/04-reserved-instances/README.md +264 -0
  62. package/tech_hub_skills/roles/finops/skills/05-spot-optimization/README.md +264 -0
  63. package/tech_hub_skills/roles/finops/skills/06-storage-tiering/README.md +264 -0
  64. package/tech_hub_skills/roles/finops/skills/07-compute-rightsizing/README.md +264 -0
  65. package/tech_hub_skills/roles/finops/skills/08-chargeback/README.md +264 -0
  66. package/tech_hub_skills/roles/ml-engineer/skills/01-mlops-pipeline/README.md +566 -0
  67. package/tech_hub_skills/roles/ml-engineer/skills/02-feature-engineering/README.md +655 -0
  68. package/tech_hub_skills/roles/ml-engineer/skills/03-model-training/README.md +704 -0
  69. package/tech_hub_skills/roles/ml-engineer/skills/04-model-serving/README.md +845 -0
  70. package/tech_hub_skills/roles/ml-engineer/skills/05-model-monitoring/README.md +874 -0
  71. package/tech_hub_skills/roles/mlops/skills/01-ml-pipeline-orchestration/README.md +264 -0
  72. package/tech_hub_skills/roles/mlops/skills/02-experiment-tracking/README.md +264 -0
  73. package/tech_hub_skills/roles/mlops/skills/03-model-registry/README.md +264 -0
  74. package/tech_hub_skills/roles/mlops/skills/04-feature-store/README.md +264 -0
  75. package/tech_hub_skills/roles/mlops/skills/05-model-deployment/README.md +264 -0
  76. package/tech_hub_skills/roles/mlops/skills/06-model-observability/README.md +264 -0
  77. package/tech_hub_skills/roles/mlops/skills/07-data-versioning/README.md +264 -0
  78. package/tech_hub_skills/roles/mlops/skills/08-ab-testing/README.md +264 -0
  79. package/tech_hub_skills/roles/mlops/skills/09-automated-retraining/README.md +264 -0
  80. package/tech_hub_skills/roles/platform-engineer/skills/01-internal-developer-platform/README.md +153 -0
  81. package/tech_hub_skills/roles/platform-engineer/skills/02-self-service-infrastructure/README.md +57 -0
  82. package/tech_hub_skills/roles/platform-engineer/skills/03-slo-sli-management/README.md +59 -0
  83. package/tech_hub_skills/roles/platform-engineer/skills/04-developer-experience/README.md +57 -0
  84. package/tech_hub_skills/roles/platform-engineer/skills/05-incident-management/README.md +73 -0
  85. package/tech_hub_skills/roles/platform-engineer/skills/06-capacity-management/README.md +59 -0
  86. package/tech_hub_skills/roles/product-designer/skills/01-requirements-discovery/README.md +407 -0
  87. package/tech_hub_skills/roles/product-designer/skills/02-user-research/README.md +382 -0
  88. package/tech_hub_skills/roles/product-designer/skills/03-brainstorming-ideation/README.md +437 -0
  89. package/tech_hub_skills/roles/product-designer/skills/04-ux-design/README.md +496 -0
  90. package/tech_hub_skills/roles/product-designer/skills/05-product-market-fit/README.md +376 -0
  91. package/tech_hub_skills/roles/product-designer/skills/06-stakeholder-management/README.md +412 -0
  92. package/tech_hub_skills/roles/security-architect/skills/01-pii-detection/README.md +319 -0
  93. package/tech_hub_skills/roles/security-architect/skills/02-threat-modeling/README.md +264 -0
  94. package/tech_hub_skills/roles/security-architect/skills/03-infrastructure-security/README.md +264 -0
  95. package/tech_hub_skills/roles/security-architect/skills/04-iam/README.md +264 -0
  96. package/tech_hub_skills/roles/security-architect/skills/05-application-security/README.md +264 -0
  97. package/tech_hub_skills/roles/security-architect/skills/06-secrets-management/README.md +264 -0
  98. package/tech_hub_skills/roles/security-architect/skills/07-security-monitoring/README.md +264 -0
  99. package/tech_hub_skills/roles/system-design/skills/01-architecture-patterns/README.md +337 -0
  100. package/tech_hub_skills/roles/system-design/skills/02-requirements-engineering/README.md +264 -0
  101. package/tech_hub_skills/roles/system-design/skills/03-scalability/README.md +264 -0
  102. package/tech_hub_skills/roles/system-design/skills/04-high-availability/README.md +264 -0
  103. package/tech_hub_skills/roles/system-design/skills/05-cost-optimization-design/README.md +264 -0
  104. package/tech_hub_skills/roles/system-design/skills/06-api-design/README.md +264 -0
  105. package/tech_hub_skills/roles/system-design/skills/07-observability-architecture/README.md +264 -0
  106. package/tech_hub_skills/roles/system-design/skills/08-process-automation/PROCESS_TEMPLATE.md +336 -0
  107. package/tech_hub_skills/roles/system-design/skills/08-process-automation/README.md +521 -0
  108. package/tech_hub_skills/skills/README.md +336 -0
  109. package/tech_hub_skills/skills/ai-engineer.md +104 -0
  110. package/tech_hub_skills/skills/azure.md +149 -0
  111. package/tech_hub_skills/skills/code-review.md +399 -0
  112. package/tech_hub_skills/skills/compliance-automation.md +747 -0
  113. package/tech_hub_skills/skills/data-engineer.md +113 -0
  114. package/tech_hub_skills/skills/data-governance.md +102 -0
  115. package/tech_hub_skills/skills/data-scientist.md +123 -0
  116. package/tech_hub_skills/skills/devops.md +160 -0
  117. package/tech_hub_skills/skills/docker.md +160 -0
  118. package/tech_hub_skills/skills/enterprise-dashboard.md +613 -0
  119. package/tech_hub_skills/skills/finops.md +184 -0
  120. package/tech_hub_skills/skills/ml-engineer.md +115 -0
  121. package/tech_hub_skills/skills/mlops.md +187 -0
  122. package/tech_hub_skills/skills/optimization-advisor.md +329 -0
  123. package/tech_hub_skills/skills/orchestrator.md +497 -0
  124. package/tech_hub_skills/skills/platform-engineer.md +102 -0
  125. package/tech_hub_skills/skills/process-automation.md +226 -0
  126. package/tech_hub_skills/skills/process-changelog.md +184 -0
  127. package/tech_hub_skills/skills/process-documentation.md +484 -0
  128. package/tech_hub_skills/skills/process-kanban.md +324 -0
  129. package/tech_hub_skills/skills/process-versioning.md +214 -0
  130. package/tech_hub_skills/skills/product-designer.md +104 -0
  131. package/tech_hub_skills/skills/project-starter.md +443 -0
  132. package/tech_hub_skills/skills/security-architect.md +135 -0
  133. package/tech_hub_skills/skills/system-design.md +126 -0
@@ -0,0 +1,264 @@
1
+ # Skill 09: Automated Retraining Pipelines
2
+
3
+ ## 🎯 Overview
4
+ Retrain models automatically in response to triggers, and promote them only after they pass validation gates.
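A minimal sketch of what trigger-based retraining with a validation gate could look like, assuming monitoring (mo-06) reports a drift score and that a single evaluation metric is used where higher is better. The names `RetrainingPolicy`, `should_retrain`, and `passes_validation_gate` and all threshold values are hypothetical and are not part of this package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrainingPolicy:
    """Hypothetical thresholds; tune them for your own models."""
    drift_threshold: float = 0.2    # retrain when drift exceeds this
    max_model_age_days: int = 30    # retrain when the model is older than this
    min_metric_gain: float = 0.0    # candidate must beat the baseline by at least this


def should_retrain(drift_score: float, model_age_days: int,
                   policy: RetrainingPolicy) -> bool:
    """Trigger: fire on excessive drift or on a stale model."""
    return (drift_score > policy.drift_threshold
            or model_age_days > policy.max_model_age_days)


def passes_validation_gate(candidate_metric: float, baseline_metric: float,
                           policy: RetrainingPolicy) -> bool:
    """Gate: promote only if the candidate is at least as good as the baseline."""
    return candidate_metric - baseline_metric >= policy.min_metric_gain


policy = RetrainingPolicy()
if should_retrain(drift_score=0.35, model_age_days=12, policy=policy):
    promoted = passes_validation_gate(0.91, 0.89, policy)
    print(f"Retraining triggered, candidate promoted: {promoted}")
```

Keeping the trigger and the gate as separate functions lets a scheduler call the first cheaply and often, while the second runs only after a candidate model has been trained.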
5
+
6
+ ## 🔗 Connections
7
+ - **Data Engineer**: Data foundation and pipelines (de-01, de-02, de-03)
8
+ - **Security Architect**: Compliance, PII detection, access control (sa-01, sa-02)
9
+ - **ML Engineer**: Model lifecycle and serving (ml-01, ml-04)
10
+ - **AI Engineer**: LLM integration and automation (ai-01, ai-02, ai-07)
11
+ - **MLOps**: Experiment tracking and monitoring (mo-01, mo-03, mo-06)
12
+ - **FinOps**: Cost optimization and tracking (fo-01, fo-07)
13
+ - **DevOps**: CI/CD, containerization, monitoring (do-01, do-03, do-08)
14
+ - **System Design**: Architecture patterns (sd-01)
15
+ - **Dependencies**: mo-06
16
+
17
+ ## 🛠️ Tools Included
18
+
19
+ ### 1. Primary Implementation Script
20
+ Core implementation for automated retraining pipelines.
21
+
22
+ ### 2. Configuration Manager
23
+ Manage configuration and settings for automated retraining pipelines.
24
+
25
+ ### 3. Integration Connector
26
+ Connect with other Tech Hub skills and external services.
27
+
28
+ ### 4. Monitoring & Metrics
29
+ Track performance, costs, and quality metrics.
30
+
31
+ ### 5. Automation Scripts
32
+ Automate common workflows and tasks.
33
+
34
+ ## 📊 Key Metrics
35
+ - Implementation quality score
36
+ - Performance benchmarks
37
+ - Cost efficiency
38
+ - Security compliance rate
39
+ - Integration test coverage
40
+
41
+ ## 🚀 Quick Start
42
+
+ ```python
+ # Example implementation for Automated Retraining Pipelines
+ import importlib
+
+ # "09_automated_retraining" starts with a digit, so it is not a valid Python
+ # identifier and cannot be used in a plain `import` statement.
+ retraining = importlib.import_module("mlops.09_automated_retraining")
+
+ # Initialize (class name assumed; an identifier cannot begin with "09")
+ service = retraining.AutomatedRetrainingService()
+
+ # Execute
+ result = service.execute(
+     config={
+         "environment": "production",
+         "enable_monitoring": True
+     }
+ )
+
+ print(f"Status: {result.status}")
+ print(f"Metrics: {result.metrics}")
+ ```
61
+
62
+ ## 📚 Best Practices
63
+
64
+ ### Cost Optimization (FinOps Integration)
65
+
66
+ 1. **Monitor Resource Costs**
67
+ - Track costs per execution
68
+ - Set budget alerts
69
+ - Optimize resource utilization
70
+ - Reference: FinOps fo-01 (Cost Monitoring)
71
+
72
+ 2. **Right-size Resources**
73
+ - Use appropriate compute sizes
74
+ - Implement auto-scaling
75
+ - Leverage spot/reserved instances where applicable
76
+ - Reference: FinOps fo-06, fo-07
77
+
78
+ ### Security & Privacy (Security Architect Integration)
79
+
80
+ 3. **Implement Access Control**
81
+ - Use least privilege principle
82
+ - Enable Azure AD authentication
83
+ - Audit access logs
84
+ - Reference: Security Architect sa-04 (IAM), sa-02 (Threat Modeling)
85
+
86
+ 4. **Data Protection**
87
+ - Encrypt data at rest and in transit
88
+ - Scan for PII before processing
89
+ - Implement data retention policies
90
+ - Reference: Security Architect sa-01 (PII Detection)
91
+
92
+ ### Quality & Governance (Data Engineer Integration)
93
+
94
+ 5. **Ensure Data Quality**
95
+ - Validate inputs and outputs
96
+ - Implement quality gates
97
+ - Monitor data freshness
98
+ - Reference: Data Engineer de-03 (Data Quality)
99
+
100
+ ### Lifecycle Management (MLOps Integration)
101
+
102
+ 6. **Version Control**
103
+ - Version all configurations
104
+ - Track changes over time
105
+ - Enable rollback capability
106
+ - Reference: MLOps mo-03 (Model Registry), mo-07 (Data Versioning)
107
+
108
+ 7. **Continuous Monitoring**
109
+ - Track performance metrics
110
+ - Set up alerting
111
+ - Monitor for drift
112
+ - Reference: MLOps mo-06 (Monitoring)
113
+
114
+ ### Deployment & Operations (DevOps Integration)
115
+
116
+ 8. **Automate Deployment**
117
+ - Implement CI/CD pipelines
118
+ - Use infrastructure as code
119
+ - Enable blue-green deployments
120
+ - Reference: DevOps do-01 (CI/CD), do-03 (IaC)
121
+
122
+ 9. **Observability**
123
+ - Implement distributed tracing
124
+ - Set up dashboards
125
+ - Enable logging and metrics
126
+ - Reference: DevOps do-08 (Monitoring)
127
+
128
+ ### Azure-Specific Best Practices
129
+
130
+ 10. **Leverage Azure Services**
131
+ - Use managed services where possible
132
+ - Implement Azure Policy for governance
133
+ - Enable Azure Monitor integration
134
+ - Use managed identities for authentication
135
+
136
+ ## 💰 Cost Optimization Examples
137
+
138
+ ### Cost Tracking
+ ```python
+ # finops_tracker and execute_operation are placeholders; they are assumed
+ # to come from your own cost-tracking module and workload code.
+ from finops_tracker import CostTracker
+
+ tracker = CostTracker()
+
+ @tracker.track_costs
+ def run_operation(params):
+     # Your operation here
+     result = execute_operation(params)
+     return result
+
+ # Monthly report
+ report = tracker.monthly_report()
+ print(f"Total cost: ${report.total_cost:.2f}")
+ print(f"Cost per operation: ${report.avg_cost:.4f}")
+ ```
155
+
156
+ ## 🔒 Security Best Practices Examples
157
+
158
+ ### Access Control Implementation
+ ```python
+ from azure.identity import DefaultAzureCredential
+ # security_manager and process_data are placeholders for your own code
+ from security_manager import AccessControl
+
+ credential = DefaultAzureCredential()
+ access_control = AccessControl(credential)
+
+ # Validate access before operation
+ @access_control.require_role("operator")
+ def sensitive_operation(data):
+     # Operation logic
+     return process_data(data)
+ ```
172
+
173
+ ## 📊 Enhanced Metrics & Monitoring
174
+
175
+ | Metric Category | Metric | Target | Tool |
176
+ |-----------------|--------|--------|------|
177
+ | **Performance** | Execution time (p95) | <5s | Azure Monitor |
178
+ | | Success rate | >99% | Custom metrics |
179
+ | **Cost** | Cost per operation | <$0.05 | FinOps dashboard |
180
+ | | Resource utilization | >75% | Azure Monitor |
181
+ | **Quality** | Error rate | <1% | App Insights |
182
+ | | Data quality score | >95% | Quality tracker |
183
+ | **Security** | Access violations | 0 | Security logs |
184
+ | | Compliance score | 100% | Audit system |
185
+
186
+ ## 🚀 Deployment Pipeline
187
+
188
+ ### CI/CD Example
+ ```yaml
+ # .github/workflows/deploy-09-automated-retraining.yml
+ name: Deploy Automated Retraining Pipelines
+
+ on:
+   push:
+     paths:
+       - 'mlops/skills/09-automated-retraining/**'
+     branches:
+       - main
+
+ jobs:
+   test:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v3
+       - name: Run tests
+         run: pytest tests/ -v
+       - name: Security scan
+         run: python scripts/security_scan.py
+       - name: Cost validation
+         run: python scripts/validate_costs.py
+
+   deploy:
+     needs: test
+     runs-on: ubuntu-latest
+     steps:
+       # Each job runs on a fresh runner, so check out the repository again
+       - uses: actions/checkout@v3
+       # Assumes the runner has already been authenticated to Azure
+       - name: Deploy to Azure
+         run: |
+           az deployment group create \
+             --resource-group rg-mlops \
+             --template-file infra/main.bicep
+       - name: Monitor deployment
+         run: python scripts/monitor_health.py --duration 10m
+ ```
224
+
225
+ ## 🔄 Integration Workflow
226
+
227
+ ### End-to-End Process
228
+ ```
229
+ 1. Input Validation
230
+
231
+ 2. Security Checks (sa-01, sa-02)
232
+
233
+ 3. Main Processing
234
+
235
+ 4. Quality Validation (de-03)
236
+
237
+ 5. Cost Tracking (fo-01)
238
+
239
+ 6. Monitoring & Logging (do-08)
240
+
241
+ 7. Output Delivery
242
+ ```
243
+
244
+ ## 🎯 Quick Wins
245
+
246
+ 1. **Enable cost tracking** - Monitor spending from day one
247
+ 2. **Implement security scanning** - Catch vulnerabilities early
248
+ 3. **Set up monitoring** - Full visibility into operations
249
+ 4. **Automate deployment** - Faster, safer releases
250
+ 5. **Add quality gates** - Prevent bad data from propagating
251
+ 6. **Enable caching** - Reduce redundant operations
252
+ 7. **Implement retries** - Improve reliability
253
+ 8. **Set up alerting** - Know about issues immediately
254
+
255
+ ## 🔗 Related Skills
256
+ - mo-06
257
+
258
+ ---
259
+
260
+ **Skill ID**: `09-automated-retraining`
261
+ **Complexity**: Expert
262
+ **Dependencies**: mo-06
263
+ **Business Value**: High
264
+ **Estimated Implementation Time**: 1-2 weeks
@@ -0,0 +1,153 @@
1
+ # pe-01: Internal Developer Platform (IDP)
2
+
3
+ ## Overview
4
+
5
+ Build developer portals using Backstage for service catalog, golden path templates, self-service provisioning, and platform documentation.
6
+
7
+ ## Key Capabilities
8
+
9
+ - **Developer Portal**: Centralized platform UI (Backstage)
10
+ - **Service Catalog**: All services, APIs, documentation
11
+ - **Golden Path Templates**: Scaffolding for new services
12
+ - **Self-Service Provisioning**: One-click infrastructure
13
+ - **Platform Documentation**: Unified docs portal
14
+
15
+ ## Tools & Technologies
16
+
17
+ - **Backstage**: Open-source developer portal
18
+ - **Port**: Developer portal platform
19
+ - **Humanitec**: Platform orchestrator
20
+ - **Kratix**: Platform-as-a-product framework
21
+
22
+ ## Implementation
23
+
24
+ ### 1. Backstage Setup
25
+
+ ```yaml
+ # app-config.yaml
+ app:
+   title: Tech Hub Platform
+   baseUrl: http://localhost:3000
+
+ organization:
+   name: Tech Innovation Hub
+
+ backend:
+   baseUrl: http://localhost:7007
+   listen:
+     port: 7007
+   database:
+     client: pg
+     connection:
+       host: ${POSTGRES_HOST}
+       port: ${POSTGRES_PORT}
+
+ catalog:
+   providers:
+     github:
+       # Provider ID key: the name is arbitrary, but the GitHub provider expects one
+       yourProviderId:
+         organization: 'your-org'
+         catalogPath: '/catalog-info.yaml'
+ ```
51
+
52
+ ### 2. Service Catalog
53
+
+ ```yaml
+ # catalog-info.yaml
+ apiVersion: backstage.io/v1alpha1
+ kind: Component
+ metadata:
+   name: customer-api
+   description: Customer management API
+   annotations:
+     github.com/project-slug: your-org/customer-api
+     sonarqube.org/project-key: customer-api
+ spec:
+   type: service
+   lifecycle: production
+   owner: team-platform
+   system: customer-domain
+   providesApis:
+     - customer-api
+   consumesApis:
+     - auth-api
+ ```
74
+
75
+ ### 3. Golden Path Template
76
+
+ ```yaml
+ # template.yaml
+ apiVersion: scaffolder.backstage.io/v1beta3
+ kind: Template
+ metadata:
+   name: python-fastapi-service
+   title: Python FastAPI Service
+   description: Create a new Python FastAPI microservice
+ spec:
+   owner: platform-team
+   type: service
+
+   parameters:
+     - title: Service Information
+       required:
+         - name
+         - owner
+       properties:
+         name:
+           title: Service Name
+           type: string
+           description: Unique name for the service
+         owner:
+           title: Owner
+           type: string
+           description: Team that owns this service
+
+   steps:
+     - id: fetch-template
+       name: Fetch Template
+       action: fetch:template
+       input:
+         url: ./skeleton
+         values:
+           name: ${{ parameters.name }}
+           owner: ${{ parameters.owner }}
+
+     - id: publish
+       name: Publish to GitHub
+       action: publish:github
+       input:
+         repoUrl: github.com?owner=your-org&repo=${{ parameters.name }}
+
+     - id: register
+       name: Register Component
+       action: catalog:register
+       input:
+         repoContentsUrl: ${{ steps.publish.output.repoContentsUrl }}
+         catalogInfoPath: '/catalog-info.yaml'
+ ```
127
+
128
+ ## Best Practices
129
+
130
+ 1. **Start Small**: Begin with service catalog, add features iteratively
131
+ 2. **Golden Paths**: Create templates for 80% of use cases
132
+ 3. **Self-Service**: Minimize manual ticket workflows
133
+ 4. **Measure Adoption**: Track active users and template usage
134
+ 5. **Documentation**: Keep docs updated and searchable
135
+
136
+ ## Cost Optimization
137
+
138
+ - Host Backstage on Kubernetes spot instances
139
+ - Use a managed PostgreSQL service (often cheaper than self-hosting once maintenance effort is counted)
140
+ - Cache plugin data to reduce API calls
141
+ - Right-size backend resources
142
+
143
+ ## Integration
144
+
145
+ **Connects with:**
146
+ - do-01 (CI/CD): Link to deployment pipelines
147
+ - do-02 (Kubernetes): Service deployment info
148
+ - pe-03 (SLO): Display SLO status
149
+ - dg-01 (Catalog): Link to data catalog
150
+
151
+ ## Quick Win
152
+
153
+ Deploy Backstage with GitHub integration, import 5 services to catalog, show team the unified view of their services.
@@ -0,0 +1,57 @@
1
+ # pe-02: Self-Service Infrastructure
2
+
3
+ ## Overview
4
+
5
+ Enable developers to provision namespaces, databases, secrets, and environments through self-service automation.
6
+
7
+ ## Key Capabilities
8
+
9
+ - **Namespace Provisioning**: Auto-create K8s namespaces
10
+ - **Database Provisioning**: Self-service DB creation
11
+ - **Secret Management**: Automated secret injection
12
+ - **Resource Quotas**: Automatic quota management
13
+ - **Environment Management**: Dev/staging/prod provisioning
14
+
15
+ ## Implementation
16
+
+ ```python
+ # Self-service namespace provisioning
+ from kubernetes import client, config
+
+ def provision_namespace(team_name, environment):
+     """Create a namespace with labels and a default resource quota (RBAC bindings are not shown)"""
+     config.load_kube_config()
+     v1 = client.CoreV1Api()
+
+     # Create namespace
+     namespace = client.V1Namespace(
+         metadata=client.V1ObjectMeta(
+             name=f"{team_name}-{environment}",
+             labels={
+                 "team": team_name,
+                 "environment": environment
+             }
+         )
+     )
+     v1.create_namespace(namespace)
+
+     # Apply resource quota
+     quota = client.V1ResourceQuota(
+         metadata=client.V1ObjectMeta(name="default-quota"),
+         spec=client.V1ResourceQuotaSpec(
+             hard={
+                 "requests.cpu": "10",
+                 "requests.memory": "20Gi",
+                 "pods": "50"
+             }
+         )
+     )
+     v1.create_namespaced_resource_quota(
+         namespace=namespace.metadata.name,
+         body=quota
+     )
+ ```
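Because the namespace name above is built from user-supplied values, a self-service flow would likely want to validate it before calling the API. Kubernetes namespace names must be valid RFC 1123 DNS labels: at most 63 characters, lowercase alphanumerics or `-`, starting and ending with an alphanumeric. The sketch below assumes the same `<team>-<environment>` convention; `build_namespace_name` is a hypothetical helper, and lowercasing the input is a design choice rather than a requirement.

```python
import re

# RFC 1123 DNS label: lowercase alphanumerics and "-", alphanumeric at both ends.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
_MAX_LABEL_LENGTH = 63


def build_namespace_name(team_name: str, environment: str) -> str:
    """Return "<team>-<environment>" in lowercase, or raise ValueError if invalid."""
    name = f"{team_name}-{environment}".lower()
    if len(name) > _MAX_LABEL_LENGTH or not _DNS_LABEL.fullmatch(name):
        raise ValueError(f"invalid namespace name: {name!r}")
    return name


print(build_namespace_name("Payments", "dev"))
```

Rejecting bad names up front gives the developer a clear error message instead of an API failure partway through provisioning.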
54
+
55
+ ## Integration
56
+
57
+ **Connects with:** do-02 (Kubernetes), sa-06 (Secrets), pe-01 (IDP)
@@ -0,0 +1,59 @@
1
+ # pe-03: SLO/SLI Management
2
+
3
+ ## Overview
4
+
5
+ Define and track Service Level Objectives (SLOs), manage error budgets, instrument SLIs, and create SLO-based alerting.
6
+
7
+ ## Key Capabilities
8
+
9
+ - **SLO Definition**: Availability, latency, error rate targets
10
+ - **Error Budget Management**: Track remaining error budget
11
+ - **SLI Instrumentation**: Collect service-level indicators
12
+ - **SLO-Based Alerting**: Alert on error budget burn
13
+ - **SLO Dashboards**: Visualize SLO compliance
14
+
15
+ ## Implementation
16
+
+ ```yaml
+ # SLO definition (Sloth)
+ version: prometheus/v1
+ service: customer-api
+ slos:
+   - name: requests-availability
+     objective: 99.9
+     description: 99.9% of requests successful
+     sli:
+       events:
+         # Sloth substitutes {{.window}} for each evaluation window
+         error_query: |
+           sum(rate(http_requests_total{job="customer-api",code=~"5.."}[{{.window}}]))
+         total_query: |
+           sum(rate(http_requests_total{job="customer-api"}[{{.window}}]))
+     alerting:
+       name: CustomerAPIHighErrorRate
+       labels:
+         severity: page
+       annotations:
+         summary: High error rate on customer API
+ ```
38
+
+ ```python
+ # Error budget calculation
+ def calculate_error_budget(slo_target, time_window_days=30):
+     """Calculate remaining error budget"""
+     total_minutes = time_window_days * 24 * 60
+     allowed_downtime = total_minutes * (1 - slo_target/100)
+
+     # get_actual_downtime is a placeholder for a query against your monitoring system
+     actual_downtime = get_actual_downtime(time_window_days)
+     remaining_budget = allowed_downtime - actual_downtime
+
+     # A 100% target leaves no budget, so the consumed percentage is undefined
+     consumed = (actual_downtime / allowed_downtime) * 100 if allowed_downtime > 0 else None
+
+     return {
+         'allowed_downtime_minutes': allowed_downtime,
+         'actual_downtime_minutes': actual_downtime,
+         'remaining_budget_minutes': remaining_budget,
+         'budget_consumed_percent': consumed
+     }
+ ```
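For the "alert on error budget burn" capability, a common way to reason about it is the burn rate: the observed error ratio divided by the ratio the SLO allows. At a burn rate of 1 the budget lasts exactly the SLO window; at 5 it is gone in a fifth of that time. The sketch below is illustrative only, and the function names are hypothetical.

```python
def allowed_downtime_minutes(slo_target: float, window_days: int = 30) -> float:
    """Total error budget for the window, expressed in minutes."""
    return window_days * 24 * 60 * (1 - slo_target / 100)


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than allowed the budget is being consumed."""
    budget_ratio = 1 - slo_target / 100
    if budget_ratio <= 0:
        raise ValueError("an SLO of 100% leaves no error budget to burn")
    return error_ratio / budget_ratio


def hours_until_exhausted(rate: float, window_days: int = 30) -> float:
    """Time until the budget is spent if the current burn rate holds."""
    return float("inf") if rate <= 0 else window_days * 24 / rate


# A 99.9% target over 30 days allows about 43.2 minutes of downtime.
print(f"{allowed_downtime_minutes(99.9):.1f} minutes of budget")

# 0.5% of requests failing against a 99.9% target burns the budget about 5x too fast.
rate = burn_rate(error_ratio=0.005, slo_target=99.9)
print(f"burn rate {rate:.1f}, budget gone in {hours_until_exhausted(rate):.0f} hours")
```

Alerting on burn rate rather than on raw error rate ties the page to how quickly the objective is at risk, which is the intent of SLO-based alerting.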
56
+
57
+ ## Integration
58
+
59
+ **Connects with:** do-08 (Monitoring), pe-01 (IDP), pe-05 (Incident Management)
@@ -0,0 +1,57 @@
1
+ # pe-04: Developer Experience
2
+
3
+ ## Overview
4
+
5
+ Improve developer velocity through automated onboarding, documentation-as-code, CLI tools, and DORA metrics tracking.
6
+
7
+ ## Key Capabilities
8
+
9
+ - **Automated Onboarding**: Zero-to-commit in < 1 hour
10
+ - **Documentation-as-Code**: Docs in git, versioned
11
+ - **Developer CLI**: Unified command-line tools
12
+ - **DORA Metrics**: Deployment frequency, lead time, MTTR, change fail rate
13
+ - **Feedback Collection**: Regular developer surveys
14
+
15
+ ## Implementation
16
+
+ ```bash
+ #!/bin/bash
+ # Developer CLI
+ # platform-cli
+
+ case "$1" in
+   create-service)
+     # `backstage scaffold` is assumed to be a wrapper available in your environment
+     backstage scaffold "$2" --template python-fastapi
+     ;;
+   deploy)
+     kubectl apply -f k8s/ --namespace="$2"
+     ;;
+   logs)
+     kubectl logs -f "deployment/$2" -n "$3"
+     ;;
+   metrics)
+     # `open` is the macOS launcher; use xdg-open on Linux
+     open "https://grafana.company.com/d/dora-metrics"
+     ;;
+   *)
+     echo "Usage: $0 {create-service|deploy|logs|metrics} [args]" >&2
+     exit 1
+     ;;
+ esac
+ ```
37
+
+ ```python
+ # DORA metrics collection
+ def calculate_dora_metrics(team_name, days=30):
+     """Calculate DORA metrics"""
+     # get_deployments and get_incidents are placeholders for your own data sources
+     deployments = get_deployments(team_name, days)
+     incidents = get_incidents(team_name, days)
+     n_deploys = len(deployments)
+
+     metrics = {
+         'deployment_frequency': n_deploys / days,
+         # Guard against periods with no deployments or no incidents
+         'lead_time_hours': sum(d.lead_time for d in deployments) / n_deploys if n_deploys else 0,
+         'mttr_hours': sum(i.resolution_time for i in incidents) / len(incidents) if incidents else 0,
+         'change_fail_rate': len([d for d in deployments if d.failed]) / n_deploys if n_deploys else 0
+     }
+
+     return metrics
+ ```
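To make the four metrics concrete, here is a self-contained sketch that computes them from in-memory records. The `Deployment` and `Incident` classes, the `summarize` function, and the sample numbers are all hypothetical; they only mirror the attribute names used above (`lead_time`, `failed`, `resolution_time`).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Deployment:
    lead_time: float   # hours from commit to production
    failed: bool       # True if the change caused a failure


@dataclass
class Incident:
    resolution_time: float  # hours to restore service


def summarize(deployments: List[Deployment], incidents: List[Incident],
              days: int = 30) -> Dict[str, float]:
    """Compute the four DORA metrics, returning 0.0 where there is no data."""
    n = len(deployments)
    return {
        "deployment_frequency": n / days,
        "lead_time_hours": sum(d.lead_time for d in deployments) / n if n else 0.0,
        "mttr_hours": (sum(i.resolution_time for i in incidents) / len(incidents)
                       if incidents else 0.0),
        "change_fail_rate": sum(d.failed for d in deployments) / n if n else 0.0,
    }


sample_deployments = [Deployment(2.0, False), Deployment(4.0, True), Deployment(6.0, False)]
sample_incidents = [Incident(1.5), Incident(2.5)]
for key, value in summarize(sample_deployments, sample_incidents).items():
    print(f"{key}: {value:.3f}")
```

Separating the calculation from data retrieval keeps it easy to test and lets the same function serve any source of deployment and incident records.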
54
+
55
+ ## Integration
56
+
57
+ **Connects with:** pe-01 (IDP), do-01 (CI/CD), pe-05 (Incident Management)
@@ -0,0 +1,73 @@
1
+ # pe-05: Incident Management
2
+
3
+ ## Overview
4
+
5
+ Manage on-call rotations, incident response procedures, postmortem templates, runbook automation, and alert routing.
6
+
7
+ ## Key Capabilities
8
+
9
+ - **On-Call Management**: Rotation schedules
10
+ - **Incident Response**: Clear escalation procedures
11
+ - **Postmortem Templates**: Blameless retrospectives
12
+ - **Runbook Automation**: Auto-remediation
13
+ - **Alert Routing**: Intelligent alert distribution
14
+
15
+ ## Implementation
16
+
+ ```yaml
+ # PagerDuty incident response
+ services:
+   - name: customer-api
+     escalation_policy: platform-team
+     alert_grouping: intelligent
+     incident_urgency_rule:
+       type: constant
+       urgency: high
+
+ # Runbook automation
+ runbooks:
+   - name: high-memory-usage
+     trigger: memory_usage > 90%
+     actions:
+       - restart_pod
+       - scale_replicas: 2
+       - notify_oncall
+ ```
36
+
+ ```python
+ # Postmortem template generator
+ def create_postmortem(incident_id):
+     """Generate postmortem document"""
+     incident = get_incident(incident_id)
+
+     template = f"""
+ # Incident Postmortem: {incident.title}
+
+ ## Incident Summary
+ - **Date**: {incident.date}
+ - **Duration**: {incident.duration}
+ - **Severity**: {incident.severity}
+ - **Responders**: {incident.responders}
+
+ ## Timeline
+ {incident.timeline}
+
+ ## Root Cause
+ [To be filled]
+
+ ## Resolution
+ {incident.resolution}
+
+ ## Action Items
+ - [ ] TODO 1
+ - [ ] TODO 2
+
+ ## Lessons Learned
+ [To be filled]
+ """
+     return template
+ ```
70
+
71
+ ## Integration
72
+
73
+ **Connects with:** pe-03 (SLO), do-08 (Monitoring), pe-01 (IDP)