agentic-team-templates 0.3.0

Files changed (103)
  1. package/README.md +280 -0
  2. package/bin/cli.js +5 -0
  3. package/package.json +47 -0
  4. package/src/index.js +521 -0
  5. package/templates/_shared/code-quality.md +162 -0
  6. package/templates/_shared/communication.md +114 -0
  7. package/templates/_shared/core-principles.md +62 -0
  8. package/templates/_shared/git-workflow.md +165 -0
  9. package/templates/_shared/security-fundamentals.md +173 -0
  10. package/templates/blockchain/.cursorrules/defi-patterns.md +520 -0
  11. package/templates/blockchain/.cursorrules/gas-optimization.md +339 -0
  12. package/templates/blockchain/.cursorrules/overview.md +130 -0
  13. package/templates/blockchain/.cursorrules/security.md +318 -0
  14. package/templates/blockchain/.cursorrules/smart-contracts.md +364 -0
  15. package/templates/blockchain/.cursorrules/testing.md +415 -0
  16. package/templates/blockchain/.cursorrules/web3-integration.md +538 -0
  17. package/templates/blockchain/CLAUDE.md +389 -0
  18. package/templates/cli-tools/.cursorrules/architecture.md +412 -0
  19. package/templates/cli-tools/.cursorrules/arguments.md +406 -0
  20. package/templates/cli-tools/.cursorrules/distribution.md +546 -0
  21. package/templates/cli-tools/.cursorrules/error-handling.md +455 -0
  22. package/templates/cli-tools/.cursorrules/overview.md +136 -0
  23. package/templates/cli-tools/.cursorrules/testing.md +537 -0
  24. package/templates/cli-tools/.cursorrules/user-experience.md +545 -0
  25. package/templates/cli-tools/CLAUDE.md +356 -0
  26. package/templates/data-engineering/.cursorrules/data-modeling.md +367 -0
  27. package/templates/data-engineering/.cursorrules/data-quality.md +455 -0
  28. package/templates/data-engineering/.cursorrules/overview.md +85 -0
  29. package/templates/data-engineering/.cursorrules/performance.md +339 -0
  30. package/templates/data-engineering/.cursorrules/pipeline-design.md +280 -0
  31. package/templates/data-engineering/.cursorrules/security.md +460 -0
  32. package/templates/data-engineering/.cursorrules/testing.md +452 -0
  33. package/templates/data-engineering/CLAUDE.md +974 -0
  34. package/templates/devops-sre/.cursorrules/capacity-planning.md +653 -0
  35. package/templates/devops-sre/.cursorrules/change-management.md +584 -0
  36. package/templates/devops-sre/.cursorrules/chaos-engineering.md +651 -0
  37. package/templates/devops-sre/.cursorrules/disaster-recovery.md +641 -0
  38. package/templates/devops-sre/.cursorrules/incident-management.md +565 -0
  39. package/templates/devops-sre/.cursorrules/observability.md +714 -0
  40. package/templates/devops-sre/.cursorrules/overview.md +230 -0
  41. package/templates/devops-sre/.cursorrules/postmortems.md +588 -0
  42. package/templates/devops-sre/.cursorrules/runbooks.md +760 -0
  43. package/templates/devops-sre/.cursorrules/slo-sli.md +617 -0
  44. package/templates/devops-sre/.cursorrules/toil-reduction.md +567 -0
  45. package/templates/devops-sre/CLAUDE.md +1007 -0
  46. package/templates/documentation/.cursorrules/adr.md +277 -0
  47. package/templates/documentation/.cursorrules/api-documentation.md +411 -0
  48. package/templates/documentation/.cursorrules/code-comments.md +253 -0
  49. package/templates/documentation/.cursorrules/maintenance.md +260 -0
  50. package/templates/documentation/.cursorrules/overview.md +82 -0
  51. package/templates/documentation/.cursorrules/readme-standards.md +306 -0
  52. package/templates/documentation/CLAUDE.md +120 -0
  53. package/templates/fullstack/.cursorrules/api-contracts.md +331 -0
  54. package/templates/fullstack/.cursorrules/architecture.md +298 -0
  55. package/templates/fullstack/.cursorrules/overview.md +109 -0
  56. package/templates/fullstack/.cursorrules/shared-types.md +348 -0
  57. package/templates/fullstack/.cursorrules/testing.md +386 -0
  58. package/templates/fullstack/CLAUDE.md +349 -0
  59. package/templates/ml-ai/.cursorrules/data-engineering.md +483 -0
  60. package/templates/ml-ai/.cursorrules/deployment.md +601 -0
  61. package/templates/ml-ai/.cursorrules/model-development.md +538 -0
  62. package/templates/ml-ai/.cursorrules/monitoring.md +658 -0
  63. package/templates/ml-ai/.cursorrules/overview.md +131 -0
  64. package/templates/ml-ai/.cursorrules/security.md +637 -0
  65. package/templates/ml-ai/.cursorrules/testing.md +678 -0
  66. package/templates/ml-ai/CLAUDE.md +1136 -0
  67. package/templates/mobile/.cursorrules/navigation.md +246 -0
  68. package/templates/mobile/.cursorrules/offline-first.md +302 -0
  69. package/templates/mobile/.cursorrules/overview.md +71 -0
  70. package/templates/mobile/.cursorrules/performance.md +345 -0
  71. package/templates/mobile/.cursorrules/testing.md +339 -0
  72. package/templates/mobile/CLAUDE.md +233 -0
  73. package/templates/platform-engineering/.cursorrules/ci-cd.md +778 -0
  74. package/templates/platform-engineering/.cursorrules/developer-experience.md +632 -0
  75. package/templates/platform-engineering/.cursorrules/infrastructure-as-code.md +600 -0
  76. package/templates/platform-engineering/.cursorrules/kubernetes.md +710 -0
  77. package/templates/platform-engineering/.cursorrules/observability.md +747 -0
  78. package/templates/platform-engineering/.cursorrules/overview.md +215 -0
  79. package/templates/platform-engineering/.cursorrules/security.md +855 -0
  80. package/templates/platform-engineering/.cursorrules/testing.md +878 -0
  81. package/templates/platform-engineering/CLAUDE.md +850 -0
  82. package/templates/utility-agent/.cursorrules/action-control.md +284 -0
  83. package/templates/utility-agent/.cursorrules/context-management.md +186 -0
  84. package/templates/utility-agent/.cursorrules/hallucination-prevention.md +253 -0
  85. package/templates/utility-agent/.cursorrules/overview.md +78 -0
  86. package/templates/utility-agent/.cursorrules/token-optimization.md +369 -0
  87. package/templates/utility-agent/CLAUDE.md +513 -0
  88. package/templates/web-backend/.cursorrules/api-design.md +255 -0
  89. package/templates/web-backend/.cursorrules/authentication.md +309 -0
  90. package/templates/web-backend/.cursorrules/database-patterns.md +298 -0
  91. package/templates/web-backend/.cursorrules/error-handling.md +366 -0
  92. package/templates/web-backend/.cursorrules/overview.md +69 -0
  93. package/templates/web-backend/.cursorrules/security.md +358 -0
  94. package/templates/web-backend/.cursorrules/testing.md +395 -0
  95. package/templates/web-backend/CLAUDE.md +366 -0
  96. package/templates/web-frontend/.cursorrules/accessibility.md +296 -0
  97. package/templates/web-frontend/.cursorrules/component-patterns.md +204 -0
  98. package/templates/web-frontend/.cursorrules/overview.md +72 -0
  99. package/templates/web-frontend/.cursorrules/performance.md +325 -0
  100. package/templates/web-frontend/.cursorrules/state-management.md +227 -0
  101. package/templates/web-frontend/.cursorrules/styling.md +271 -0
  102. package/templates/web-frontend/.cursorrules/testing.md +311 -0
  103. package/templates/web-frontend/CLAUDE.md +399 -0
@@ -0,0 +1,230 @@
1
+ # DevOps/SRE Overview
2
+
3
+ Staff-level guidelines for Site Reliability Engineering and operational excellence.
4
+
5
+ ## Scope
6
+
7
+ This template applies to:
8
+
9
+ - Site Reliability Engineering (SRE) practices and culture
10
+ - Production operations and 24/7 system reliability
11
+ - Incident management, response, and postmortems
12
+ - Monitoring, alerting, and observability strategies
13
+ - SLO/SLI definition and error budget management
14
+ - Capacity planning and performance engineering
15
+ - Disaster recovery and business continuity
16
+ - Toil reduction and operational automation
17
+ - Change management and safe deployments
18
+ - Chaos engineering and resilience testing
19
+
20
+ ## Core Principles
21
+
22
+ ### 1. Reliability is a Feature
23
+
24
+ Users don't distinguish between "the app is slow" and "the app is broken." Reliability directly impacts user experience, trust, and business outcomes.
25
+
26
+ - Treat reliability work as product work
27
+ - Measure reliability from the user's perspective
28
+ - Invest in reliability proportional to business impact
29
+ - Make reliability visible to stakeholders
30
+
31
+ ### 2. Error Budgets Over Perfection
32
+
33
+ 100% reliability is the wrong target. A system that is never allowed to fail is never allowed to change, so chasing perfect reliability stalls innovation.
34
+
35
+ - Define explicit reliability targets (SLOs)
36
+ - Use error budgets to balance reliability and velocity
37
+ - When budget is healthy, take risks and move fast
38
+ - When budget is low, prioritize reliability work
39
+ - Error budget is a currency, not a constraint
40
+
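The error-budget bullets above can be made concrete with a little arithmetic. The following is a minimal sketch, not part of the template: the `ErrorBudget` class, the `release_posture` helper, and the 25% `low_water_mark` threshold are all hypothetical names and values chosen for illustration, and the budget is assumed to be request-based.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Request-based error budget for one SLO window (illustrative only)."""

    slo_target: float     # e.g. 0.999 means 99.9% of requests must succeed
    total_requests: int   # requests served during the SLO window
    failed_requests: int  # requests that failed during the same window

    def __post_init__(self) -> None:
        if not 0.0 < self.slo_target < 1.0:
            raise ValueError("slo_target must be strictly between 0 and 1")

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining_fraction(self) -> float:
        # 1.0 = untouched budget, 0.0 = exhausted, negative = SLO missed.
        if self.allowed_failures == 0:
            return 1.0  # no traffic yet, so nothing has been spent
        return 1.0 - self.failed_requests / self.allowed_failures


def release_posture(budget: ErrorBudget, low_water_mark: float = 0.25) -> str:
    """Map the remaining budget to a posture; the threshold is an assumption."""
    if budget.remaining_fraction > low_water_mark:
        return "ship"
    return "prioritize-reliability"


healthy = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
low = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=900)
print(release_posture(healthy), release_posture(low))
```

With a 99.9% target over one million requests, roughly 1,000 failures are allowed. Spending 400 of them leaves about 60% of the budget, while spending 900 leaves about 10%, which falls under the assumed low-water mark.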
41
+ ### 3. Automate Toil Away
42
+
43
+ Toil is the kind of work tied to running a service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as the service grows.
44
+
45
+ - If you're doing it manually more than twice, automate it
46
+ - Measure and track toil as a metric
47
+ - Allocate engineering time specifically for toil reduction
48
+ - Target < 50% of SRE time spent on toil
49
+
50
+ ### 4. Observability First
51
+
52
+ You can't fix what you can't measure. You can't improve what you can't observe.
53
+
54
+ - Instrument everything from day one
55
+ - Logs, metrics, and traces are not optional
56
+ - Design systems to be debuggable
57
+ - Correlate signals across the stack
58
+
59
+ ### 5. Blameless Culture
60
+
61
+ Incidents are learning opportunities, not blame games. Human error is a symptom of system problems.
62
+
63
+ - Focus on systems, not individuals
64
+ - Ask "how did the system allow this?" not "who caused this?"
65
+ - Share postmortems widely
66
+ - Celebrate learning from failures
67
+
68
+ ## Project Structure
69
+
70
+ ```
71
+ sre/
72
+ ├── monitoring/ # Observability configuration
73
+ │ ├── prometheus/ # Prometheus rules and alerts
74
+ │ │ ├── rules/
75
+ │ │ └── alerts/
76
+ │ ├── grafana/ # Dashboard definitions
77
+ │ │ └── dashboards/
78
+ │ ├── loki/ # Log aggregation config
79
+ │ └── alertmanager/ # Alert routing
80
+
81
+ ├── runbooks/ # Operational runbooks
82
+ │ ├── services/ # Per-service runbooks
83
+ │ │ ├── api-server.md
84
+ │ │ └── database.md
85
+ │ ├── alerts/ # Per-alert runbooks
86
+ │ └── procedures/ # General procedures
87
+ │ ├── incident-response.md
88
+ │ └── on-call-handoff.md
89
+
90
+ ├── slos/ # SLO definitions
91
+ │ ├── api.yaml
92
+ │ ├── frontend.yaml
93
+ │ └── error-budget-policy.yaml
94
+
95
+ ├── incident-response/ # Incident management
96
+ │ ├── templates/
97
+ │ │ ├── incident-doc.md
98
+ │ │ └── postmortem.md
99
+ │ ├── severity-definitions.yaml
100
+ │ └── escalation-policy.yaml
101
+
102
+ ├── chaos/ # Chaos engineering
103
+ │ ├── experiments/
104
+ │ └── game-days/
105
+
106
+ ├── load-testing/ # Performance testing
107
+ │ ├── k6/
108
+ │ └── scenarios/
109
+
110
+ ├── disaster-recovery/ # DR documentation
111
+ │ ├── runbooks/
112
+ │ ├── backup-policies/
113
+ │ └── failover-procedures/
114
+
115
+ └── docs/ # SRE documentation
116
+ ├── on-call-guide.md
117
+ ├── escalation-policy.md
118
+ └── service-catalog.md
119
+ ```
120
+
121
+ ## Technology Stack
122
+
123
+ | Layer | Primary | Alternatives |
124
+ |-------|---------|--------------|
125
+ | Metrics Collection | Prometheus | Datadog, InfluxDB, VictoriaMetrics |
126
+ | Metrics Visualization | Grafana | Datadog, Kibana, Chronograf |
127
+ | Log Aggregation | Loki | Elasticsearch, Splunk, Datadog Logs |
128
+ | Distributed Tracing | Jaeger, Tempo | Zipkin, AWS X-Ray, Honeycomb |
129
+ | Alerting | Alertmanager | PagerDuty, Datadog, Opsgenie |
130
+ | Incident Management | PagerDuty, incident.io | Opsgenie, Squadcast, Rootly |
131
+ | Status Pages | Statuspage.io | Instatus, Cachet, Better Uptime |
132
+ | On-Call Management | PagerDuty | Opsgenie, Splunk On-Call (formerly VictorOps), Squadcast |
133
+ | Chaos Engineering | Chaos Mesh, Litmus | Gremlin, AWS FIS, Pumba |
134
+ | Load Testing | k6 | Locust, Gatling, JMeter |
135
+ | Synthetic Monitoring | Grafana Synthetic Monitoring | Datadog Synthetics, Pingdom |
136
+
137
+ ## Staff Engineer Responsibilities
138
+
139
+ ### Technical Leadership
140
+
141
+ - Define and evolve organization-wide reliability standards
142
+ - Establish SLO frameworks and error budget policies
143
+ - Design observability architectures that scale
144
+ - Make build vs. buy decisions for SRE tooling
145
+ - Drive cultural shift toward reliability ownership
146
+
147
+ ### Cross-Team Enablement
148
+
149
+ - Create reusable monitoring and alerting patterns
150
+ - Build self-service observability for development teams
151
+ - Establish incident response procedures and training
152
+ - Design runbook templates and standards
153
+ - Lead chaos engineering and game day initiatives
154
+
155
+ ### Operational Excellence
156
+
157
+ - Own and improve the incident management process
158
+ - Drive postmortem quality and action item follow-through
159
+ - Reduce mean time to detection (MTTD) and recovery (MTTR)
160
+ - Eliminate recurring incidents through systemic fixes
161
+ - Balance on-call health with operational coverage
162
+
163
+ ### Strategic Thinking
164
+
165
+ - Align reliability investments with business priorities
166
+ - Plan capacity for multi-year growth projections
167
+ - Design disaster recovery strategies
168
+ - Evaluate emerging SRE tools and practices
169
+ - Manage technical debt in operational systems
170
+
171
+ ## Key Metrics
172
+
173
+ ### Reliability Metrics
174
+
175
+ - **Availability**: Percentage of successful requests
176
+ - **Latency**: Response time at various percentiles (p50, p95, p99)
177
+ - **Error Rate**: Percentage of failed requests
178
+ - **Throughput**: Requests processed per unit time
179
+
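As a rough illustration of how the four reliability metrics relate, the sketch below derives them from a batch of request records. The `Request` type, the `summarize` function, and the nearest-rank percentile method are assumptions made for this example; a production setup would normally compute these in the metrics backend rather than in application code.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Request:
    latency_ms: float  # observed response time
    ok: bool           # True if the request succeeded from the user's view


def percentile(sorted_values: Sequence[float], p: int) -> float:
    """Nearest-rank percentile; p is an integer in 1..100."""
    n = len(sorted_values)
    rank = max(1, -(-p * n // 100))  # integer ceiling of p * n / 100
    return sorted_values[rank - 1]


def summarize(requests: List[Request], window_seconds: float) -> Dict[str, float]:
    if not requests:
        raise ValueError("need at least one request to summarize")
    total = len(requests)
    failed = sum(1 for r in requests if not r.ok)
    latencies = sorted(r.latency_ms for r in requests)
    return {
        "availability": (total - failed) / total,
        "error_rate": failed / total,
        "throughput_rps": total / window_seconds,
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
    }


# Synthetic sample: latencies 1..100 ms, the two slowest requests fail.
sample = [Request(latency_ms=float(i), ok=i <= 98) for i in range(1, 101)]
summary = summarize(sample, window_seconds=50.0)
print(summary)
```

Availability and error rate are complements of each other here because every request is classified as either successful or failed.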
180
+ ### Operational Metrics
181
+
182
+ - **MTTD**: Mean Time to Detect incidents
183
+ - **MTTR**: Mean Time to Recovery from incidents
184
+ - **MTBF**: Mean Time Between Failures
185
+ - **Change Failure Rate**: Percentage of changes causing incidents
186
+
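The operational metrics above can be derived from incident timestamps. This is a sketch under stated assumptions: the `Incident` record and its three fields are hypothetical, MTTR is measured from failure start to resolution, and MTBF is measured from one incident's resolution to the next incident's start. Teams define these intervals differently, so treat the boundaries as a choice rather than a standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import List


@dataclass(frozen=True)
class Incident:
    started_at: datetime   # when the failure began
    detected_at: datetime  # when an alert or a person noticed it
    resolved_at: datetime  # when service was restored


def _mean_delta(seconds: List[float]) -> timedelta:
    return timedelta(seconds=mean(seconds))


def mttd(incidents: List[Incident]) -> timedelta:
    return _mean_delta([(i.detected_at - i.started_at).total_seconds() for i in incidents])


def mttr(incidents: List[Incident]) -> timedelta:
    # Assumption: measured from failure start, not from detection.
    return _mean_delta([(i.resolved_at - i.started_at).total_seconds() for i in incidents])


def mtbf(incidents: List[Incident]) -> timedelta:
    if len(incidents) < 2:
        raise ValueError("MTBF needs at least two incidents")
    ordered = sorted(incidents, key=lambda i: i.started_at)
    gaps = [
        (later.started_at - earlier.resolved_at).total_seconds()
        for earlier, later in zip(ordered, ordered[1:])
    ]
    return _mean_delta(gaps)


def change_failure_rate(failed_changes: int, total_changes: int) -> float:
    return failed_changes / total_changes


incidents = [
    Incident(datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 0, 10), datetime(2024, 1, 1, 1, 0)),
    Incident(datetime(2024, 1, 3, 0, 0), datetime(2024, 1, 3, 0, 20), datetime(2024, 1, 3, 2, 0)),
]
print(mttd(incidents), mttr(incidents), mtbf(incidents), change_failure_rate(3, 60))
```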
187
+ ### On-Call Health Metrics
188
+
189
+ - **Pages per shift**: Target < 10
190
+ - **Pages per night**: Target < 2
191
+ - **False positive rate**: Target < 10%
192
+ - **Alert noise ratio**: Non-actionable alerts as a share of total alerts
193
+
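The targets listed above lend themselves to an automated check at the end of each shift. The sketch below is illustrative only: `ShiftStats` and `on_call_health` are hypothetical names, the shift is assumed to cover a single night, and the thresholds are copied from the list above.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class ShiftStats:
    pages: int              # pages received during the shift
    night_pages: int        # pages received during the night hours
    false_positives: int    # pages that did not reflect a real problem
    actionable_alerts: int  # alerts that required someone to act
    total_alerts: int       # all alerts fired, paging or not


def on_call_health(stats: ShiftStats) -> Dict[str, Union[bool, float]]:
    false_positive_rate = stats.false_positives / stats.pages if stats.pages else 0.0
    actionable_share = (
        stats.actionable_alerts / stats.total_alerts if stats.total_alerts else 1.0
    )
    return {
        "pages_per_shift_ok": stats.pages < 10,
        "pages_per_night_ok": stats.night_pages < 2,
        "false_positive_rate": false_positive_rate,
        "false_positive_rate_ok": false_positive_rate < 0.10,
        "actionable_share": actionable_share,
        "noise_share": 1.0 - actionable_share,
    }


report = on_call_health(
    ShiftStats(pages=8, night_pages=1, false_positives=2, actionable_alerts=30, total_alerts=40)
)
print(report)
```

In this sample the page counts are within target, but a quarter of the pages were false positives, so the shift would still be flagged for alert tuning.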
194
+ ### Toil Metrics
195
+
196
+ - **Toil percentage**: Time spent on toil vs engineering work
197
+ - **Manual intervention rate**: Human touchpoints per deployment
198
+ - **Automation coverage**: Percentage of runbooks with automation
199
+
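The three toil metrics reduce to simple ratios once the inputs are tracked. The helper names below are hypothetical, and the 50% ceiling in `within_toil_target` comes from the "Automate Toil Away" principle earlier in this file; how hours and manual steps are recorded is left as an assumption.

```python
def toil_percentage(toil_hours: float, engineering_hours: float) -> float:
    """Share of tracked time spent on toil, as a percentage."""
    total = toil_hours + engineering_hours
    if total <= 0:
        raise ValueError("no tracked hours")
    return 100.0 * toil_hours / total


def within_toil_target(percentage: float, ceiling: float = 50.0) -> bool:
    # The template targets less than 50% of SRE time on toil.
    return percentage < ceiling


def manual_intervention_rate(manual_steps: int, deployments: int) -> float:
    """Average number of human touchpoints per deployment."""
    if deployments <= 0:
        raise ValueError("no deployments recorded")
    return manual_steps / deployments


def automation_coverage(automated_runbooks: int, total_runbooks: int) -> float:
    """Percentage of runbooks that have automation attached."""
    if total_runbooks <= 0:
        raise ValueError("no runbooks recorded")
    return 100.0 * automated_runbooks / total_runbooks


toil = toil_percentage(toil_hours=12, engineering_hours=28)
print(toil, within_toil_target(toil))
print(manual_intervention_rate(manual_steps=15, deployments=10))
print(automation_coverage(automated_runbooks=9, total_runbooks=36))
```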
200
+ ## Anti-Patterns to Avoid
201
+
202
+ ### Alert on Everything
203
+
204
+ ❌ **Wrong**: Create alerts for every possible metric "just in case"
205
+
206
+ ✅ **Right**: Every alert must be actionable, urgent, and relevant. If the on-call engineer doesn't need to act immediately, the alert shouldn't page.
207
+
208
+ ### SLO as Ceiling
209
+
210
+ ❌ **Wrong**: Treat the SLO as a bare minimum to exceed as far as possible, leaving the error budget unspent
211
+
212
+ ✅ **Right**: SLOs define the target; error budget is the tool for balancing reliability with velocity
213
+
214
+ ### Postmortem Graveyard
215
+
216
+ ❌ **Wrong**: Write postmortems that go into a folder and are never read
217
+
218
+ ✅ **Right**: Track action items, measure recurring incidents, share learnings organization-wide
219
+
220
+ ### Hero Culture
221
+
222
+ ❌ **Wrong**: Rely on specific engineers who "know the system" to fix everything
223
+
224
+ ✅ **Right**: Document everything, share knowledge, and ensure any on-call engineer can handle any incident
225
+
226
+ ### Manual Everything
227
+
228
+ ❌ **Wrong**: Keep critical procedures as tribal knowledge
229
+
230
+ ✅ **Right**: Automate recovery, create runbooks, build self-healing systems