@forwardimpact/pathway 0.1.0

Files changed (227)
  1. package/LICENSE +201 -0
  2. package/README.md +104 -0
  3. package/app/commands/agent.js +430 -0
  4. package/app/commands/behaviour.js +61 -0
  5. package/app/commands/command-factory.js +211 -0
  6. package/app/commands/discipline.js +58 -0
  7. package/app/commands/driver.js +94 -0
  8. package/app/commands/grade.js +60 -0
  9. package/app/commands/index.js +20 -0
  10. package/app/commands/init.js +67 -0
  11. package/app/commands/interview.js +68 -0
  12. package/app/commands/job.js +157 -0
  13. package/app/commands/progress.js +77 -0
  14. package/app/commands/questions.js +179 -0
  15. package/app/commands/serve.js +143 -0
  16. package/app/commands/site.js +121 -0
  17. package/app/commands/skill.js +76 -0
  18. package/app/commands/stage.js +129 -0
  19. package/app/commands/track.js +70 -0
  20. package/app/components/action-buttons.js +66 -0
  21. package/app/components/behaviour-profile.js +53 -0
  22. package/app/components/builder.js +341 -0
  23. package/app/components/card.js +98 -0
  24. package/app/components/checklist.js +145 -0
  25. package/app/components/comparison-radar.js +237 -0
  26. package/app/components/detail.js +230 -0
  27. package/app/components/error-page.js +72 -0
  28. package/app/components/grid.js +109 -0
  29. package/app/components/list.js +120 -0
  30. package/app/components/modifier-table.js +142 -0
  31. package/app/components/nav.js +64 -0
  32. package/app/components/progression-table.js +320 -0
  33. package/app/components/radar-chart.js +102 -0
  34. package/app/components/skill-matrix.js +97 -0
  35. package/app/css/base.css +56 -0
  36. package/app/css/bundles/app.css +40 -0
  37. package/app/css/bundles/handout.css +43 -0
  38. package/app/css/bundles/slides.css +40 -0
  39. package/app/css/components/badges.css +215 -0
  40. package/app/css/components/buttons.css +101 -0
  41. package/app/css/components/forms.css +105 -0
  42. package/app/css/components/layout.css +209 -0
  43. package/app/css/components/nav.css +166 -0
  44. package/app/css/components/progress.css +166 -0
  45. package/app/css/components/states.css +82 -0
  46. package/app/css/components/surfaces.css +243 -0
  47. package/app/css/components/tables.css +362 -0
  48. package/app/css/components/typography.css +122 -0
  49. package/app/css/components/utilities.css +41 -0
  50. package/app/css/pages/agent-builder.css +391 -0
  51. package/app/css/pages/assessment-results.css +453 -0
  52. package/app/css/pages/detail.css +59 -0
  53. package/app/css/pages/interview-builder.css +148 -0
  54. package/app/css/pages/job-builder.css +134 -0
  55. package/app/css/pages/landing.css +92 -0
  56. package/app/css/pages/lifecycle.css +118 -0
  57. package/app/css/pages/progress-builder.css +274 -0
  58. package/app/css/pages/self-assessment.css +502 -0
  59. package/app/css/reset.css +50 -0
  60. package/app/css/tokens.css +153 -0
  61. package/app/css/views/handout.css +30 -0
  62. package/app/css/views/print.css +608 -0
  63. package/app/css/views/slide-animations.css +113 -0
  64. package/app/css/views/slide-base.css +330 -0
  65. package/app/css/views/slide-sections.css +597 -0
  66. package/app/css/views/slide-tables.css +275 -0
  67. package/app/formatters/agent/dom.js +540 -0
  68. package/app/formatters/agent/profile.js +133 -0
  69. package/app/formatters/agent/skill.js +58 -0
  70. package/app/formatters/behaviour/dom.js +91 -0
  71. package/app/formatters/behaviour/markdown.js +54 -0
  72. package/app/formatters/behaviour/shared.js +64 -0
  73. package/app/formatters/discipline/dom.js +187 -0
  74. package/app/formatters/discipline/markdown.js +87 -0
  75. package/app/formatters/discipline/shared.js +131 -0
  76. package/app/formatters/driver/dom.js +103 -0
  77. package/app/formatters/driver/shared.js +92 -0
  78. package/app/formatters/grade/dom.js +208 -0
  79. package/app/formatters/grade/markdown.js +94 -0
  80. package/app/formatters/grade/shared.js +86 -0
  81. package/app/formatters/index.js +50 -0
  82. package/app/formatters/interview/dom.js +97 -0
  83. package/app/formatters/interview/markdown.js +66 -0
  84. package/app/formatters/interview/shared.js +332 -0
  85. package/app/formatters/job/description.js +176 -0
  86. package/app/formatters/job/dom.js +411 -0
  87. package/app/formatters/job/markdown.js +102 -0
  88. package/app/formatters/progress/dom.js +135 -0
  89. package/app/formatters/progress/markdown.js +86 -0
  90. package/app/formatters/progress/shared.js +339 -0
  91. package/app/formatters/questions/json.js +43 -0
  92. package/app/formatters/questions/markdown.js +303 -0
  93. package/app/formatters/questions/shared.js +274 -0
  94. package/app/formatters/questions/yaml.js +76 -0
  95. package/app/formatters/shared.js +71 -0
  96. package/app/formatters/skill/dom.js +168 -0
  97. package/app/formatters/skill/markdown.js +109 -0
  98. package/app/formatters/skill/shared.js +125 -0
  99. package/app/formatters/stage/dom.js +135 -0
  100. package/app/formatters/stage/index.js +12 -0
  101. package/app/formatters/stage/shared.js +111 -0
  102. package/app/formatters/track/dom.js +128 -0
  103. package/app/formatters/track/markdown.js +105 -0
  104. package/app/formatters/track/shared.js +181 -0
  105. package/app/handout-main.js +421 -0
  106. package/app/handout.html +21 -0
  107. package/app/index.html +59 -0
  108. package/app/lib/card-mappers.js +173 -0
  109. package/app/lib/cli-output.js +270 -0
  110. package/app/lib/error-boundary.js +70 -0
  111. package/app/lib/errors.js +49 -0
  112. package/app/lib/form-controls.js +47 -0
  113. package/app/lib/job-cache.js +86 -0
  114. package/app/lib/markdown.js +114 -0
  115. package/app/lib/radar.js +866 -0
  116. package/app/lib/reactive.js +77 -0
  117. package/app/lib/render.js +212 -0
  118. package/app/lib/router-core.js +160 -0
  119. package/app/lib/router-pages.js +16 -0
  120. package/app/lib/router-slides.js +202 -0
  121. package/app/lib/state.js +148 -0
  122. package/app/lib/utils.js +14 -0
  123. package/app/lib/yaml-loader.js +327 -0
  124. package/app/main.js +213 -0
  125. package/app/model/agent.js +702 -0
  126. package/app/model/checklist.js +137 -0
  127. package/app/model/derivation.js +699 -0
  128. package/app/model/index-generator.js +71 -0
  129. package/app/model/interview.js +539 -0
  130. package/app/model/job.js +222 -0
  131. package/app/model/levels.js +591 -0
  132. package/app/model/loader.js +564 -0
  133. package/app/model/matching.js +858 -0
  134. package/app/model/modifiers.js +158 -0
  135. package/app/model/profile.js +266 -0
  136. package/app/model/progression.js +507 -0
  137. package/app/model/validation.js +1385 -0
  138. package/app/pages/agent-builder.js +823 -0
  139. package/app/pages/assessment-results.js +507 -0
  140. package/app/pages/behaviour.js +70 -0
  141. package/app/pages/discipline.js +71 -0
  142. package/app/pages/driver.js +106 -0
  143. package/app/pages/grade.js +117 -0
  144. package/app/pages/interview-builder.js +50 -0
  145. package/app/pages/interview.js +304 -0
  146. package/app/pages/job-builder.js +50 -0
  147. package/app/pages/job.js +58 -0
  148. package/app/pages/landing.js +305 -0
  149. package/app/pages/progress-builder.js +58 -0
  150. package/app/pages/progress.js +495 -0
  151. package/app/pages/self-assessment.js +729 -0
  152. package/app/pages/skill.js +113 -0
  153. package/app/pages/stage.js +231 -0
  154. package/app/pages/track.js +69 -0
  155. package/app/slide-main.js +360 -0
  156. package/app/slides/behaviour.js +38 -0
  157. package/app/slides/chapter.js +82 -0
  158. package/app/slides/discipline.js +40 -0
  159. package/app/slides/driver.js +39 -0
  160. package/app/slides/grade.js +32 -0
  161. package/app/slides/index.js +198 -0
  162. package/app/slides/interview.js +58 -0
  163. package/app/slides/job.js +55 -0
  164. package/app/slides/overview.js +126 -0
  165. package/app/slides/progress.js +83 -0
  166. package/app/slides/skill.js +40 -0
  167. package/app/slides/track.js +39 -0
  168. package/app/slides.html +56 -0
  169. package/app/types.js +147 -0
  170. package/bin/pathway.js +489 -0
  171. package/examples/agents/.claude/skills/architecture-design/SKILL.md +88 -0
  172. package/examples/agents/.claude/skills/cloud-platforms/SKILL.md +90 -0
  173. package/examples/agents/.claude/skills/code-quality-review/SKILL.md +67 -0
  174. package/examples/agents/.claude/skills/data-modeling/SKILL.md +99 -0
  175. package/examples/agents/.claude/skills/developer-experience/SKILL.md +99 -0
  176. package/examples/agents/.claude/skills/devops-cicd/SKILL.md +96 -0
  177. package/examples/agents/.claude/skills/full-stack-development/SKILL.md +90 -0
  178. package/examples/agents/.claude/skills/knowledge-management/SKILL.md +100 -0
  179. package/examples/agents/.claude/skills/pattern-generalization/SKILL.md +102 -0
  180. package/examples/agents/.claude/skills/sre-practices/SKILL.md +117 -0
  181. package/examples/agents/.claude/skills/technical-debt-management/SKILL.md +123 -0
  182. package/examples/agents/.claude/skills/technical-writing/SKILL.md +129 -0
  183. package/examples/agents/.github/agents/se-platform-code.agent.md +181 -0
  184. package/examples/agents/.github/agents/se-platform-plan.agent.md +178 -0
  185. package/examples/agents/.github/agents/se-platform-review.agent.md +113 -0
  186. package/examples/agents/.vscode/settings.json +8 -0
  187. package/examples/behaviours/_index.yaml +8 -0
  188. package/examples/behaviours/outcome_ownership.yaml +44 -0
  189. package/examples/behaviours/polymathic_knowledge.yaml +42 -0
  190. package/examples/behaviours/precise_communication.yaml +40 -0
  191. package/examples/behaviours/relentless_curiosity.yaml +38 -0
  192. package/examples/behaviours/systems_thinking.yaml +41 -0
  193. package/examples/capabilities/_index.yaml +8 -0
  194. package/examples/capabilities/business.yaml +251 -0
  195. package/examples/capabilities/delivery.yaml +352 -0
  196. package/examples/capabilities/people.yaml +100 -0
  197. package/examples/capabilities/reliability.yaml +318 -0
  198. package/examples/capabilities/scale.yaml +394 -0
  199. package/examples/disciplines/_index.yaml +5 -0
  200. package/examples/disciplines/data_engineering.yaml +76 -0
  201. package/examples/disciplines/software_engineering.yaml +76 -0
  202. package/examples/drivers.yaml +205 -0
  203. package/examples/framework.yaml +58 -0
  204. package/examples/grades.yaml +118 -0
  205. package/examples/questions/behaviours/outcome_ownership.yaml +52 -0
  206. package/examples/questions/behaviours/polymathic_knowledge.yaml +48 -0
  207. package/examples/questions/behaviours/precise_communication.yaml +55 -0
  208. package/examples/questions/behaviours/relentless_curiosity.yaml +51 -0
  209. package/examples/questions/behaviours/systems_thinking.yaml +53 -0
  210. package/examples/questions/skills/architecture_design.yaml +54 -0
  211. package/examples/questions/skills/cloud_platforms.yaml +48 -0
  212. package/examples/questions/skills/code_quality.yaml +49 -0
  213. package/examples/questions/skills/data_modeling.yaml +46 -0
  214. package/examples/questions/skills/devops.yaml +47 -0
  215. package/examples/questions/skills/full_stack_development.yaml +48 -0
  216. package/examples/questions/skills/sre_practices.yaml +44 -0
  217. package/examples/questions/skills/stakeholder_management.yaml +49 -0
  218. package/examples/questions/skills/team_collaboration.yaml +43 -0
  219. package/examples/questions/skills/technical_writing.yaml +43 -0
  220. package/examples/self-assessments.yaml +66 -0
  221. package/examples/stages.yaml +76 -0
  222. package/examples/tracks/_index.yaml +6 -0
  223. package/examples/tracks/manager.yaml +53 -0
  224. package/examples/tracks/platform.yaml +54 -0
  225. package/examples/tracks/sre.yaml +58 -0
  226. package/examples/vscode-settings.yaml +22 -0
  227. package/package.json +68 -0
package/examples/capabilities/people.yaml
@@ -0,0 +1,100 @@
+ name: People
+ emoji: 👥
+ displayOrder: 6
+ description: |
+   Growing individuals and building effective teams.
+   Includes mentoring, coaching, hiring, performance management,
+   and creating inclusive environments.
+ transitionChecklists:
+   plan_to_code:
+     foundational:
+       - Pair programming opportunities are identified
+       - Knowledge sharing approach is considered
+     working:
+       - Mentoring opportunities are planned
+       - Team learning needs are identified
+       - Collaboration approach is documented
+     practitioner:
+       - Cross-team knowledge sharing is coordinated
+       - Coaching objectives are defined
+       - Hiring needs are assessed
+     expert:
+       - Talent development strategy is aligned
+       - Leadership pipeline opportunities are identified
+       - Organizational learning goals are addressed
+   code_to_review:
+     foundational:
+       - Code review is constructive and educational
+       - Knowledge is shared during review
+       - Team standards are reinforced
+     working:
+       - Junior engineers can learn from the code
+       - Technical decisions are explained
+       - Review feedback develops others
+     practitioner:
+       - Cross-team learning is enabled
+       - Technical mentoring happens through reviews
+       - Best practices are reinforced
+     expert:
+       - Code sets example for organization
+       - Leadership behaviors are modeled
+       - Institutional knowledge is preserved
+ professionalResponsibilities:
+   awareness:
+     Contribute positively to team dynamics, be open to feedback, and learn
+     actively from colleagues
+   foundational:
+     Support teammates through pair programming, knowledge sharing, and
+     constructive code reviews
+   working:
+     Mentor junior engineers on technical topics, contribute to hiring through
+     interviews, and actively build team knowledge
+   practitioner:
+     Coach multiple engineers on career growth, lead hiring for technical roles
+     across your area, and shape team technical culture
+   expert:
+     Develop technical leaders, shape engineering talent strategy across the
+     business unit, and build high-performing engineering teams
+ managementResponsibilities:
+   awareness:
+     Build positive relationships with team members and seek feedback on your
+     leadership
+   foundational:
+     Conduct effective 1:1s, provide regular feedback, support individual
+     development, and recognize contributions
+   working:
+     Manage team performance, own hiring decisions, create inclusive team
+     environments, and handle difficult conversations
+   practitioner:
+     Develop and retain talent across teams, build leadership pipelines for your
+     area, make promotion decisions, and shape team culture
+   expert:
+     Develop senior leaders, shape talent strategy across the business unit,
+     build high-performing teams, and own succession planning
+ skills:
+   - id: team_collaboration
+     name: Team Collaboration
+     isHumanOnly: true
+     human:
+       description: Working effectively with others to achieve shared goals
+       levelDescriptions:
+         awareness:
+           You participate constructively in team activities, communicate
+           clearly, and ask for help when stuck. You are reliable and follow
+           through on commitments.
+         foundational:
+           You collaborate effectively on shared work, support teammates
+           proactively, share knowledge freely, and give and receive feedback
+           constructively.
+         working:
+           You facilitate collaboration across the team, resolve minor conflicts
+           before they escalate, enable others to succeed, and contribute
+           positively to team dynamics and morale.
+         practitioner:
+           You build high-performing teams across your area, navigate complex
+           interpersonal dynamics, foster psychological safety, and create
+           environments where diverse perspectives are valued and heard.
+         expert:
+           You create collaborative culture across the business unit. You
+           transform dysfunctional team dynamics, are recognized for building
+           exceptional teams, and mentor others on collaboration excellence.
package/examples/capabilities/reliability.yaml
@@ -0,0 +1,318 @@
+ name: Reliability
+ emoji: 🛡️
+ displayOrder: 5
+ description: |
+   Ensuring systems are dependable, secure, and observable.
+   Includes DevOps practices, security, monitoring, incident response,
+   and infrastructure management.
+ transitionChecklists:
+   plan_to_code:
+     foundational:
+       - Security requirements are understood
+       - Operational guidelines are followed
+     working:
+       - Monitoring strategy is planned
+       - Failure modes are identified
+       - Alerting thresholds are defined
+     practitioner:
+       - SLOs/SLIs are defined for the system
+       - Incident response procedures are documented
+       - Cross-team reliability dependencies are mapped
+     expert:
+       - Reliability strategy aligns with enterprise standards
+       - Disaster recovery approach is defined
+       - Resilience patterns are specified
+   code_to_review:
+     foundational:
+       - Security guidelines are followed
+       - Basic monitoring is implemented
+       - Error logging exists
+     working:
+       - Comprehensive monitoring and alerting are in place
+       - Failure scenarios are tested
+       - Runbooks are created or updated
+     practitioner:
+       - SLOs are measurable and validated
+       - Resilience is tested (chaos engineering)
+       - Security review is completed
+     expert:
+       - Solution meets enterprise reliability standards
+       - Incident management integration is verified
+       - Disaster recovery is tested
+ professionalResponsibilities:
+   awareness:
+     Follow security and operational guidelines, escalate issues appropriately,
+     and participate in on-call rotations with guidance
+   foundational:
+     Implement reliability practices in your code, create basic monitoring, and
+     contribute effectively to incident response
+   working:
+     Design for reliability, implement comprehensive monitoring and alerting,
+     lead incident response, and drive post-incident improvements
+   practitioner:
+     Establish SLOs/SLIs across teams, build resilient systems, lead reliability
+     initiatives for your area, mentor engineers on reliability practices, and
+     drive reliability culture
+   expert:
+     Shape reliability strategy across the business unit, lead critical incident
+     management, pioneer new reliability practices, and be the authority on
+     system resilience
+ managementResponsibilities:
+   awareness:
+     Understand reliability requirements and support incident escalation
+     processes
+   foundational:
+     Ensure team follows reliability practices, manage on-call schedules, and
+     facilitate incident retrospectives
+   working:
+     Own team reliability outcomes, manage incident response rotations, staff
+     reliability initiatives, and champion operational excellence
+   practitioner:
+     Drive reliability culture across teams, establish SLOs and incident
+     management processes for your area, and own cross-team reliability outcomes
+   expert:
+     Shape reliability strategy across the business unit, lead critical incident
+     management at executive level, and own enterprise reliability outcomes
+ skills:
+   - id: devops
+     name: DevOps & CI/CD
+     human:
+       description:
+         Building and maintaining deployment pipelines, infrastructure, and
+         operational practices
+       levelDescriptions:
+         awareness:
+           You understand CI/CD concepts (build, test, deploy) and can trigger
+           and monitor pipelines others have built. You follow deployment
+           procedures.
+         foundational:
+           You configure basic CI/CD pipelines, understand containerization
+           (Docker), and can troubleshoot common build and deployment failures.
+         working:
+           You build complete CI/CD pipelines end-to-end, manage infrastructure
+           as code (Terraform, CloudFormation), implement monitoring, and design
+           deployment strategies for your services.
+         practitioner:
+           You design deployment strategies for complex multi-service systems
+           across teams, optimize pipeline performance and reliability, define
+           DevOps practices for your area, and mentor engineers on
+           infrastructure.
+         expert:
+           You shape DevOps culture and practices across the business unit. You
+           introduce innovative approaches to deployment and infrastructure,
+           solve large-scale DevOps challenges, and are recognized externally.
+     agent:
+       name: devops-cicd
+       description: |
+         Guide for building CI/CD pipelines, managing infrastructure as code, and
+         implementing deployment best practices. Use when setting up pipelines,
+         containerizing applications, or configuring infrastructure.
+       body: |
+         # DevOps & CI/CD
+
+         ## When to use this skill
+
+         Use this skill when:
+         - Setting up or modifying CI/CD pipelines
+         - Containerizing applications with Docker
+         - Managing infrastructure as code
+         - Troubleshooting deployment failures
+         - Implementing monitoring and alerting
+
+         ## CI/CD Pipeline Stages
+
+         ### Build
+         - Install dependencies
+         - Compile/transpile code
+         - Generate artifacts
+         - Cache dependencies for speed
+
+         ### Test
+         - Run unit tests
+         - Run integration tests
+         - Static analysis and linting
+         - Security scanning
+
+         ### Deploy
+         - Deploy to staging environment
+         - Run smoke tests
+         - Deploy to production
+         - Verify deployment health
+
+         ## Infrastructure as Code
+
+         ### Terraform
+         ```hcl
+         # Define resources declaratively
+         resource "aws_instance" "example" {
+           ami = "ami-0c55b159cbfafe1f0"
+           instance_type = "t2.micro"
+         }
+         ```
+
+         ### Docker
+         ```dockerfile
+         FROM node:18-alpine
+         WORKDIR /app
+         COPY package*.json ./
+         RUN npm ci --only=production
+         COPY . .
+         CMD ["node", "server.js"]
+         ```
+
+         ## Deployment Strategies
+
+         ### Rolling Deployment
+         - Gradual replacement of instances
+         - Zero downtime
+         - Easy rollback
+
+         ### Blue-Green Deployment
+         - Two identical environments
+         - Switch traffic atomically
+         - Fast rollback
+
+         ### Canary Deployment
+         - Route small percentage to new version
+         - Monitor for issues
+         - Gradually increase traffic
+
+         ## DevOps Checklist
+
+         - [ ] Pipeline runs on every commit
+         - [ ] Tests run before deployment
+         - [ ] Deployments are automated
+         - [ ] Rollback procedure is documented
+         - [ ] Infrastructure is version controlled
+         - [ ] Secrets are managed securely
+         - [ ] Monitoring is in place
+         - [ ] Alerts are configured
+   - id: sre_practices
+     name: Site Reliability Engineering
+     human:
+       description:
+         Ensuring system reliability through observability, incident response,
+         and capacity planning
+       levelDescriptions:
+         awareness:
+           You understand SLIs, SLOs, and error budgets conceptually. You can use
+           monitoring dashboards and escalate issues appropriately.
+         foundational:
+           You create basic alerts and dashboards. You participate in on-call
+           rotations and contribute to incident response under guidance.
+         working:
+           You design observability strategies for your services, lead incident
+           response, implement resilience testing, and conduct blameless
+           post-mortems. You balance reliability investment with feature
+           velocity.
+         practitioner:
+           You define reliability standards across teams in your area, drive
+           post-incident improvements systematically, design capacity planning
+           processes, and mentor engineers on SRE practices.
+         expert:
+           You shape reliability culture and standards across the business unit.
+           You pioneer new reliability practices, solve large-scale reliability
+           challenges, and are recognized as an authority on system resilience.
+     agent:
+       name: sre-practices
+       description: |
+         Guide for ensuring system reliability through observability, incident
+         response, and capacity planning. Use when designing monitoring, handling
+         incidents, setting SLOs, or improving system resilience.
+       body: |
+         # Site Reliability Engineering
+
+         ## When to use this skill
+
+         Use this skill when:
+         - Designing monitoring and alerting
+         - Defining SLIs, SLOs, and error budgets
+         - Handling or preparing for incidents
+         - Conducting post-mortems
+         - Planning for capacity and resilience
+
+         ## Service Level Concepts
+
+         ### SLI (Service Level Indicator)
+         Quantitative measure of service behavior:
+         - Request latency (p50, p95, p99)
+         - Error rate (% of failed requests)
+         - Availability (% of successful requests)
+         - Throughput (requests per second)
+
+         ### SLO (Service Level Objective)
+         Target value for an SLI:
+         - "99.9% of requests complete in < 200ms"
+         - "Error rate < 0.1% over 30 days"
+         - "99.95% availability monthly"
+
+         ### Error Budget
+         Allowed unreliability: 100% - SLO
+         - 99.9% SLO = 0.1% error budget
+         - ~43 minutes downtime per month
+         - Spend on features or reliability
+
+         ## Observability
+
+         ### Three Pillars
+         - **Metrics**: Aggregated numeric data (counters, gauges, histograms)
+         - **Logs**: Discrete event records with context
+         - **Traces**: Request flow across services
+
+         ### Alerting Principles
+         - Alert on symptoms, not causes
+         - Every alert should be actionable
+         - Reduce noise ruthlessly
+         - Page only for user-impacting issues
+         - Use severity levels appropriately
+
+         ## Incident Response
+
+         ### Incident Lifecycle
+         1. **Detection**: Automated alerts or user reports
+         2. **Triage**: Assess severity and impact
+         3. **Mitigation**: Stop the bleeding first
+         4. **Resolution**: Fix the underlying issue
+         5. **Post-mortem**: Learn and improve
+
+         ### During an Incident
+         - Communicate early and often
+         - Focus on mitigation before root cause
+         - Document actions in real-time
+         - Escalate when needed
+         - Update stakeholders regularly
+
+         ## Post-Mortem Process
+
+         ### Blameless Culture
+         - Focus on systems, not individuals
+         - Assume good intentions
+         - Ask "how did the system allow this?"
+         - Share findings openly
+
+         ### Post-Mortem Template
+         1. Incident summary
+         2. Timeline of events
+         3. Root cause analysis
+         4. What went well
+         5. What could be improved
+         6. Action items with owners
+
+         ## Resilience Patterns
+
+         - **Timeouts**: Don't wait forever
+         - **Retries**: With exponential backoff
+         - **Circuit breakers**: Fail fast when downstream is unhealthy
+         - **Bulkheads**: Isolate failures
+         - **Graceful degradation**: Partial functionality over total failure
+
+         ## SRE Checklist
+
+         - [ ] SLIs defined for key user journeys
+         - [ ] SLOs set with stakeholder agreement
+         - [ ] Error budget tracking in place
+         - [ ] Alerts are actionable and low-noise
+         - [ ] Runbooks exist for common issues
+         - [ ] Incident response process documented
+         - [ ] Post-mortem culture established
+         - [ ] Resilience patterns implemented
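
Note on the Error Budget figures in the `sre-practices` skill body: the relation "100% - SLO" and the "~43 minutes downtime per month" figure can be checked with a short calculation. The following is a minimal sketch, not code from the package. It assumes a 30-day month and a purely time-based availability budget, and the function name `error_budget_minutes` is hypothetical.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Return the allowed downtime, in minutes, for an availability SLO.

    Assumption: the budget is (100% - SLO) of the whole window, and the
    window defaults to 30 days. Neither choice comes from the package.
    """
    if not 0 <= slo_percent <= 100:
        raise ValueError("slo_percent must be between 0 and 100")
    window_minutes = window_days * 24 * 60  # 30 days -> 43,200 minutes
    return window_minutes * (100 - slo_percent) / 100


if __name__ == "__main__":
    # The two availability targets quoted in the skill body.
    for slo in (99.9, 99.95):
        budget = error_budget_minutes(slo)
        print(f"{slo}% SLO -> {budget:.1f} minutes per 30 days")
```

Under these assumptions a 99.9% SLO gives 43.2 minutes per 30 days, which agrees with the "~43 minutes" in the diff, and the 99.95% target gives 21.6 minutes.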