@forwardimpact/schema 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (65)
  1. package/bin/fit-schema.js +260 -0
  2. package/examples/behaviours/_index.yaml +8 -0
  3. package/examples/behaviours/outcome_ownership.yaml +43 -0
  4. package/examples/behaviours/polymathic_knowledge.yaml +41 -0
  5. package/examples/behaviours/precise_communication.yaml +39 -0
  6. package/examples/behaviours/relentless_curiosity.yaml +37 -0
  7. package/examples/behaviours/systems_thinking.yaml +40 -0
  8. package/examples/capabilities/_index.yaml +8 -0
  9. package/examples/capabilities/business.yaml +189 -0
  10. package/examples/capabilities/delivery.yaml +305 -0
  11. package/examples/capabilities/people.yaml +68 -0
  12. package/examples/capabilities/reliability.yaml +414 -0
  13. package/examples/capabilities/scale.yaml +378 -0
  14. package/examples/copilot-setup-steps.yaml +25 -0
  15. package/examples/devcontainer.yaml +21 -0
  16. package/examples/disciplines/_index.yaml +6 -0
  17. package/examples/disciplines/data_engineering.yaml +78 -0
  18. package/examples/disciplines/engineering_management.yaml +63 -0
  19. package/examples/disciplines/software_engineering.yaml +78 -0
  20. package/examples/drivers.yaml +202 -0
  21. package/examples/framework.yaml +69 -0
  22. package/examples/grades.yaml +115 -0
  23. package/examples/questions/behaviours/outcome_ownership.yaml +51 -0
  24. package/examples/questions/behaviours/polymathic_knowledge.yaml +47 -0
  25. package/examples/questions/behaviours/precise_communication.yaml +54 -0
  26. package/examples/questions/behaviours/relentless_curiosity.yaml +50 -0
  27. package/examples/questions/behaviours/systems_thinking.yaml +52 -0
  28. package/examples/questions/skills/architecture_design.yaml +53 -0
  29. package/examples/questions/skills/cloud_platforms.yaml +47 -0
  30. package/examples/questions/skills/code_quality.yaml +48 -0
  31. package/examples/questions/skills/data_modeling.yaml +45 -0
  32. package/examples/questions/skills/devops.yaml +46 -0
  33. package/examples/questions/skills/full_stack_development.yaml +47 -0
  34. package/examples/questions/skills/sre_practices.yaml +43 -0
  35. package/examples/questions/skills/stakeholder_management.yaml +48 -0
  36. package/examples/questions/skills/team_collaboration.yaml +42 -0
  37. package/examples/questions/skills/technical_writing.yaml +42 -0
  38. package/examples/self-assessments.yaml +64 -0
  39. package/examples/stages.yaml +139 -0
  40. package/examples/tracks/_index.yaml +5 -0
  41. package/examples/tracks/platform.yaml +49 -0
  42. package/examples/tracks/sre.yaml +48 -0
  43. package/examples/vscode-settings.yaml +21 -0
  44. package/lib/index-generator.js +65 -0
  45. package/lib/index.js +44 -0
  46. package/lib/levels.js +601 -0
  47. package/lib/loader.js +599 -0
  48. package/lib/modifiers.js +23 -0
  49. package/lib/schema-validation.js +438 -0
  50. package/lib/validation.js +2130 -0
  51. package/package.json +49 -0
  52. package/schema/json/behaviour-questions.schema.json +68 -0
  53. package/schema/json/behaviour.schema.json +73 -0
  54. package/schema/json/capability.schema.json +220 -0
  55. package/schema/json/defs.schema.json +132 -0
  56. package/schema/json/discipline.schema.json +132 -0
  57. package/schema/json/drivers.schema.json +48 -0
  58. package/schema/json/framework.schema.json +55 -0
  59. package/schema/json/grades.schema.json +121 -0
  60. package/schema/json/index.schema.json +18 -0
  61. package/schema/json/self-assessments.schema.json +52 -0
  62. package/schema/json/skill-questions.schema.json +68 -0
  63. package/schema/json/stages.schema.json +84 -0
  64. package/schema/json/track.schema.json +100 -0
  65. package/schema/rdf/pathway.ttl +2362 -0
@@ -0,0 +1,414 @@
1
+ # yaml-language-server: $schema=https://schema.forwardimpact.team/json/capability.schema.json
2
+
3
+ name: Reliability
4
+ emojiIcon: 🛡️
5
+ displayOrder: 5
6
+ description: |
7
+ Ensuring systems are dependable, secure, and observable.
8
+ Includes DevOps practices, security, monitoring, incident response,
9
+ and infrastructure management.
10
+ professionalResponsibilities:
11
+ awareness:
12
+ Follow security and operational guidelines, escalate issues appropriately,
13
+ and participate in on-call rotations with guidance
14
+ foundational:
15
+ Implement reliability practices in your code, create basic monitoring, and
16
+ contribute effectively to incident response
17
+ working:
18
+ Design for reliability, implement comprehensive monitoring and alerting,
19
+ lead incident response, and drive post-incident improvements
20
+ practitioner:
21
+ Establish SLOs/SLIs across teams, build resilient systems, lead reliability
22
+ initiatives for your area, mentor engineers on reliability practices, and
23
+ drive reliability culture
24
+ expert:
25
+ Shape reliability strategy across the business unit, lead critical incident
26
+ management, pioneer new reliability practices, and be the authority on
27
+ system resilience
28
+ managementResponsibilities:
29
+ awareness:
30
+ Understand reliability requirements and support incident escalation
31
+ processes
32
+ foundational:
33
+ Ensure team follows reliability practices, manage on-call schedules, and
34
+ facilitate incident retrospectives
35
+ working:
36
+ Own team reliability outcomes, manage incident response rotations, staff
37
+ reliability initiatives, and champion operational excellence
38
+ practitioner:
39
+ Drive reliability culture across teams, establish SLOs and incident
40
+ management processes for your area, and own cross-team reliability outcomes
41
+ expert:
42
+ Shape reliability strategy across the business unit, lead critical incident
43
+ management at executive level, and own enterprise reliability outcomes
44
+ skills:
45
+ - id: devops
46
+ name: DevOps & CI/CD
47
+ human:
48
+ description:
49
+ Building and maintaining deployment pipelines, infrastructure, and
50
+ operational practices
51
+ levelDescriptions:
52
+ awareness:
53
+ You understand CI/CD concepts (build, test, deploy) and can trigger
54
+ and monitor pipelines others have built. You follow deployment
55
+ procedures.
56
+ foundational:
57
+ You configure basic CI/CD pipelines, understand containerization
58
+ (Docker), and can troubleshoot common build and deployment failures.
59
+ working:
60
+ You build complete CI/CD pipelines end-to-end, manage infrastructure
61
+ as code (Terraform, CloudFormation), implement monitoring, and design
62
+ deployment strategies for your services.
63
+ practitioner:
64
+ You design deployment strategies for complex multi-service systems
65
+ across teams, optimize pipeline performance and reliability, define
66
+ DevOps practices for your area, and mentor engineers on
67
+ infrastructure.
68
+ expert:
69
+ You shape DevOps culture and practices across the business unit. You
70
+ introduce innovative approaches to deployment and infrastructure,
71
+ solve large-scale DevOps challenges, and are recognized externally.
72
+ agent:
73
+ name: devops-cicd
74
+ description: |
75
+ Guide for building CI/CD pipelines, managing infrastructure as code,
76
+ and implementing deployment best practices.
77
+ useWhen: |
78
+ Setting up pipelines, containerizing applications, or configuring
79
+ infrastructure.
80
+ stages:
81
+ specify:
82
+ focus: |
83
+ Define CI/CD and infrastructure requirements.
84
+ Clarify deployment strategy and operational needs.
85
+ activities:
86
+ - Document deployment frequency requirements
87
+ - Identify rollback and recovery requirements
88
+ - Specify monitoring and alerting needs
89
+ - Define security and compliance constraints
90
+ - Mark ambiguities with [NEEDS CLARIFICATION]
91
+ ready:
92
+ - Deployment requirements are documented
93
+ - Recovery requirements are specified
94
+ - Monitoring needs are identified
95
+ - Compliance constraints are clear
96
+ plan:
97
+ focus: |
98
+ Plan CI/CD pipeline architecture and infrastructure requirements.
99
+ Consider deployment strategies and monitoring needs.
100
+ activities:
101
+ - Define pipeline stages (build, test, deploy)
102
+ - Identify infrastructure requirements
103
+ - Plan deployment strategy (rolling, blue-green, canary)
104
+ - Consider monitoring and alerting needs
105
+ - Plan secret management approach
106
+ ready:
107
+ - Pipeline architecture is documented
108
+ - Deployment strategy is chosen and justified
109
+ - Infrastructure requirements are identified
110
+ - Monitoring approach is defined
111
+ code:
112
+ focus: |
113
+ Implement CI/CD pipelines and infrastructure as code. Follow
114
+ best practices for containerization and deployment automation.
115
+ activities:
116
+ - Configure CI/CD pipeline stages
117
+ - Implement infrastructure as code (Terraform, CloudFormation)
118
+ - Create Dockerfiles with security best practices
119
+ - Set up monitoring and alerting
120
+ - Configure secret management
121
+ - Implement deployment automation
122
+ ready:
123
+ - Pipeline runs on every commit
124
+ - Tests run before deployment
125
+ - Deployments are automated
126
+ - Infrastructure is version controlled
127
+ - Secrets are managed securely
128
+ - Monitoring is in place
129
+ review:
130
+ focus: |
131
+ Verify pipeline reliability, security, and operational readiness.
132
+ Ensure rollback procedures work and documentation is complete.
133
+ activities:
134
+ - Verify pipeline runs successfully end-to-end
135
+ - Test rollback procedures
136
+ - Review security configurations
137
+ - Validate monitoring and alerts
138
+ - Check documentation completeness
139
+ ready:
140
+ - Pipeline is tested and reliable
141
+ - Rollback procedure is documented and tested
142
+ - Alerts are configured and tested
143
+ - Runbooks exist for common issues
144
+ deploy:
145
+ focus: |
146
+ Deploy pipeline and infrastructure changes to production.
147
+ Verify operational readiness.
148
+ activities:
149
+ - Deploy pipeline configuration to production
150
+ - Verify deployment workflows work correctly
151
+ - Confirm monitoring and alerting are operational
152
+ - Run deployment through the new pipeline
153
+ ready:
154
+ - Pipeline is deployed and operational
155
+ - Workflows are tested in production
156
+ - Monitoring confirms healthy operation
157
+ - First deployment through pipeline succeeded
158
+ toolReferences:
159
+ - name: Terraform
160
+ url: https://developer.hashicorp.com/terraform/docs
161
+ simpleIcon: terraform
162
+ description: Infrastructure as code tool
163
+ useWhen: Provisioning and managing cloud infrastructure
164
+ - name: Docker
165
+ url: https://docs.docker.com/
166
+ simpleIcon: docker
167
+ description: Container platform
168
+ useWhen: Containerizing applications or managing container environments
169
+ implementationReference: |
170
+ ## CI/CD Pipeline Stages
171
+
172
+ ### Build
173
+ - Install dependencies
174
+ - Compile/transpile code
175
+ - Generate artifacts
176
+ - Cache dependencies for speed
177
+
178
+ ### Test
179
+ - Run unit tests
180
+ - Run integration tests
181
+ - Static analysis and linting
182
+ - Security scanning
183
+
184
+ ### Deploy
185
+ - Deploy to staging environment
186
+ - Run smoke tests
187
+ - Deploy to production
188
+ - Verify deployment health
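The Build, Test, and Deploy stages above run in order, and a failure in one stage should stop the ones after it. A minimal Python sketch of that fail-fast ordering follows. The `run_pipeline` function and the stage callables are hypothetical names used only for illustration; they are not part of this package or of any CI system.

```python
def run_pipeline(stages):
    """Run (name, step) pairs in order and stop at the first failing step.

    Each step is a zero-argument callable returning True on success.
    Returns (completed_stage_names, failed_stage_name_or_None).
    """
    completed = []
    for name, step in stages:
        if not step():
            # Fail fast: later stages (for example deploy) never run.
            return completed, name
        completed.append(name)
    return completed, None


if __name__ == "__main__":
    # Hypothetical stages: the test stage fails, so deploy is skipped.
    result = run_pipeline([
        ("build", lambda: True),
        ("test", lambda: False),
        ("deploy", lambda: True),
    ])
    print(result)
```

In a real pipeline the ordering is normally declared in the CI system's own configuration; the sketch only shows the control flow.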
189
+
190
+ ## Infrastructure as Code
191
+
192
+ ### Terraform
193
+ ```hcl
194
+ # Define resources declaratively
195
+ resource "aws_instance" "example" {
196
+ ami = "ami-0c55b159cbfafe1f0" # placeholder AMI ID; AMI IDs are region-specific
197
+ instance_type = "t2.micro"
198
+ }
199
+ ```
200
+
201
+ ### Docker
202
+ ```dockerfile
203
+ FROM node:18-alpine
204
+ WORKDIR /app
205
+ COPY package*.json ./
206
+ RUN npm ci --omit=dev
207
+ COPY . .
208
+ CMD ["node", "server.js"]
209
+ ```
210
+
211
+ ## Deployment Strategies
212
+
213
+ ### Rolling Deployment
214
+ - Gradual replacement of instances
215
+ - Zero downtime
216
+ - Easy rollback
217
+
218
+ ### Blue-Green Deployment
219
+ - Two identical environments
220
+ - Switch traffic atomically
221
+ - Fast rollback
222
+
223
+ ### Canary Deployment
224
+ - Route small percentage to new version
225
+ - Monitor for issues
226
+ - Gradually increase traffic
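The canary steps above (route a small share, monitor, increase) can be sketched as a loop that either promotes the new version or rolls back. This is an illustrative Python sketch under stated assumptions: `run_canary`, the traffic steps, and the 1% error threshold are hypothetical, and `error_rate_at` stands in for whatever monitoring query a real rollout would use.

```python
def run_canary(error_rate_at, steps=(5, 25, 50, 100), max_error_rate=0.01):
    """Ramp canary traffic through `steps` (percent of traffic).

    `error_rate_at(percent)` returns the observed error rate (0.0 to 1.0)
    while the canary receives that share of traffic.
    Returns (final_percent, promoted).
    """
    current = 0
    for percent in steps:
        if error_rate_at(percent) > max_error_rate:
            # Unhealthy: send all traffic back to the stable version.
            return 0, False
        current = percent
    return current, True


if __name__ == "__main__":
    # Hypothetical monitor: errors spike once the canary gets 25% of traffic.
    print(run_canary(lambda percent: 0.05 if percent >= 25 else 0.0))
```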
227
+ - id: sre_practices
228
+ name: Site Reliability Engineering
229
+ human:
230
+ description:
231
+ Ensuring system reliability through observability, incident response,
232
+ and capacity planning
233
+ levelDescriptions:
234
+ awareness:
235
+ You understand SLIs, SLOs, and error budgets conceptually. You can use
236
+ monitoring dashboards and escalate issues appropriately.
237
+ foundational:
238
+ You create basic alerts and dashboards. You participate in on-call
239
+ rotations and contribute to incident response under guidance.
240
+ working:
241
+ You design observability strategies for your services, lead incident
242
+ response, implement resilience testing, and conduct blameless
243
+ post-mortems. You balance reliability investment with feature
244
+ velocity.
245
+ practitioner:
246
+ You define reliability standards across teams in your area, drive
247
+ post-incident improvements systematically, design capacity planning
248
+ processes, and mentor engineers on SRE practices.
249
+ expert:
250
+ You shape reliability culture and standards across the business unit.
251
+ You pioneer new reliability practices, solve large-scale reliability
252
+ challenges, and are recognized as an authority on system resilience.
253
+ agent:
254
+ name: sre-practices
255
+ description: |
256
+ Guide for ensuring system reliability through observability, incident
257
+ response, and capacity planning.
258
+ useWhen: |
259
+ Designing monitoring, handling incidents, setting SLOs, or improving
260
+ system resilience.
261
+ stages:
262
+ specify:
263
+ focus: |
264
+ Define reliability requirements and SLO targets.
265
+ Identify critical user journeys that need protection.
266
+ activities:
267
+ - Identify critical user journeys and business impact
268
+ - Document reliability requirements (availability, latency)
269
+ - Define SLO targets with stakeholder agreement
270
+ - Specify acceptable error budgets
271
+ - Mark ambiguities with [NEEDS CLARIFICATION]
272
+ ready:
273
+ - Critical user journeys are identified
274
+ - Reliability requirements are documented
275
+ - SLO targets are defined
276
+ - Error budgets are agreed
277
+ plan:
278
+ focus: |
279
+ Define reliability requirements, SLIs/SLOs, and observability
280
+ strategy. Plan for resilience and capacity needs.
281
+ activities:
282
+ - Define SLIs for key user journeys
283
+ - Set SLOs with stakeholder agreement
284
+ - Plan observability strategy (metrics, logs, traces)
285
+ - Identify failure modes and resilience patterns
286
+ - Define alerting thresholds
287
+ ready:
288
+ - SLIs are defined for key user journeys
289
+ - SLOs are set with stakeholder agreement
290
+ - Monitoring strategy is planned
291
+ - Failure modes are identified
292
+ - Alerting thresholds are defined
293
+ code:
294
+ focus: |
295
+ Implement observability, resilience patterns, and operational
296
+ tooling. Build systems that fail gracefully and recover quickly.
297
+ activities:
298
+ - Implement metrics, logging, and tracing
299
+ - Configure alerts based on SLOs
300
+ - Implement resilience patterns (timeouts, retries, circuit
301
+ breakers)
302
+ - Create runbooks for common issues
303
+ - Set up error budget tracking
304
+ ready:
305
+ - Comprehensive monitoring is in place
306
+ - Alerts are actionable and low-noise
307
+ - Resilience patterns are implemented
308
+ - Runbooks exist for common issues
309
+ - Error budget tracking is in place
310
+ review:
311
+ focus: |
312
+ Verify reliability implementation meets SLOs and operational
313
+ readiness. Ensure incident response procedures are in place.
314
+ activities:
315
+ - Validate SLOs are measurable
316
+ - Test failure scenarios
317
+ - Review runbook completeness
318
+ - Verify incident response procedures
319
+ - Check alert quality and coverage
320
+ ready:
321
+ - SLOs are measurable and validated
322
+ - Failure scenarios are tested
323
+ - Incident response process is documented
323
+ - Post-mortem culture is established
325
+ - Disaster recovery approach is tested
326
+ deploy:
327
+ focus: |
328
+ Deploy reliability infrastructure and verify production
329
+ monitoring. Ensure on-call readiness.
330
+ activities:
331
+ - Deploy monitoring and alerting to production
332
+ - Verify dashboards and alerts work correctly
333
+ - Confirm on-call rotation is ready
334
+ - Run production readiness review
335
+ ready:
336
+ - Monitoring is live in production
337
+ - Alerts fire correctly for SLO breaches
338
+ - On-call team is trained and ready
339
+ - Production readiness review is complete
340
+ implementationReference: |
341
+ ## Service Level Concepts
342
+
343
+ ### SLI (Service Level Indicator)
344
+ Quantitative measure of service behavior:
345
+ - Request latency (p50, p95, p99)
346
+ - Error rate (% of failed requests)
347
+ - Availability (% of successful requests)
348
+ - Throughput (requests per second)
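As a rough illustration of how the SLIs listed above might be computed from raw request data, here is a small Python sketch. It assumes that "failed" means an HTTP status of 500 or higher and uses the nearest-rank method for percentiles; both are assumptions, and the function names are hypothetical.

```python
import math


def availability(statuses):
    """Percentage of requests that did not fail (assumption: status < 500)."""
    ok = sum(1 for status in statuses if status < 500)
    return 100.0 * ok / len(statuses)


def percentile(latencies_ms, p):
    """Nearest-rank percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    print(availability([200, 200, 503, 200]))
    print(percentile([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], 95))
```

Monitoring systems usually compute these from histograms rather than raw samples, so treat this as a definition aid, not a production approach.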
349
+
350
+ ### SLO (Service Level Objective)
351
+ Target value for an SLI:
352
+ - "99.9% of requests complete in < 200ms"
353
+ - "Error rate < 0.1% over 30 days"
354
+ - "99.95% availability monthly"
355
+
356
+ ### Error Budget
357
+ Allowed unreliability: 100% - SLO
358
+ - 99.9% SLO = 0.1% error budget
359
+ - ~43 minutes of downtime per 30-day month
360
+ - Spend it on feature releases; once it is exhausted, prioritize reliability work
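The error budget arithmetic above can be checked with a few lines of Python. This sketch assumes a 30-day window and treats the budget as pure downtime; `error_budget_minutes` is a hypothetical helper name.

```python
def error_budget_minutes(slo_percent, window_days=30):
    """Allowed downtime in minutes for an availability SLO over a window."""
    budget_fraction = (100.0 - slo_percent) / 100.0
    return budget_fraction * window_days * 24 * 60


if __name__ == "__main__":
    # 0.1% of a 30-day window (43,200 minutes) is about 43.2 minutes.
    print(round(error_budget_minutes(99.9), 1))
    print(round(error_budget_minutes(99.95), 1))
```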
361
+
362
+ ## Observability
363
+
364
+ ### Three Pillars
365
+ - **Metrics**: Aggregated numeric data (counters, gauges, histograms)
366
+ - **Logs**: Discrete event records with context
367
+ - **Traces**: Request flow across services
368
+
369
+ ### Alerting Principles
370
+ - Alert on symptoms, not causes
371
+ - Every alert should be actionable
372
+ - Reduce noise ruthlessly
373
+ - Page only for user-impacting issues
374
+ - Use severity levels appropriately
375
+
376
+ ## Incident Response
377
+
378
+ ### Incident Lifecycle
379
+ 1. **Detection**: Automated alerts or user reports
380
+ 2. **Triage**: Assess severity and impact
381
+ 3. **Mitigation**: Stop the bleeding first
382
+ 4. **Resolution**: Fix the underlying issue
383
+ 5. **Post-mortem**: Learn and improve
384
+
385
+ ### During an Incident
386
+ - Communicate early and often
387
+ - Focus on mitigation before root cause
388
+ - Document actions in real-time
389
+ - Escalate when needed
390
+ - Update stakeholders regularly
391
+
392
+ ## Post-Mortem Process
393
+
394
+ ### Blameless Culture
395
+ - Focus on systems, not individuals
396
+ - Assume good intentions
397
+ - Ask "how did the system allow this?"
398
+ - Share findings openly
399
+
400
+ ### Post-Mortem Template
401
+ 1. Incident summary
402
+ 2. Timeline of events
403
+ 3. Root cause analysis
404
+ 4. What went well
405
+ 5. What could be improved
406
+ 6. Action items with owners
407
+
408
+ ## Resilience Patterns
409
+
410
+ - **Timeouts**: Don't wait forever
411
+ - **Retries**: With exponential backoff
412
+ - **Circuit breakers**: Fail fast when downstream is unhealthy
413
+ - **Bulkheads**: Isolate failures
414
+ - **Graceful degradation**: Partial functionality over total failure
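Two of the patterns above, retries with exponential backoff and circuit breakers, can be sketched in Python as follows. This is a minimal illustration, not a production implementation: the function and class names, thresholds, and delays are assumptions, and real systems usually add jitter to the backoff and use a maintained library.

```python
import time


def retry_with_backoff(call, attempts=4, base_delay=0.1, sleep=time.sleep):
    """Call `call()` up to `attempts` times, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


class CircuitBreaker:
    """Fail fast after repeated failures; allow a trial call after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: downstream marked unhealthy")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

The `sleep` and `clock` parameters exist only so the behaviour can be exercised without real waiting; callers can leave them at their defaults.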