dojo.md 0.2.0 → 0.2.2

Files changed (225)
  1. package/courses/GENERATION_LOG.md +45 -0
  2. package/courses/aws-lambda-debugging/course.yaml +11 -0
  3. package/courses/aws-lambda-debugging/scenarios/level-1/api-gateway-integration.yaml +71 -0
  4. package/courses/aws-lambda-debugging/scenarios/level-1/cloudwatch-logs-basics.yaml +64 -0
  5. package/courses/aws-lambda-debugging/scenarios/level-1/cold-start-basics.yaml +70 -0
  6. package/courses/aws-lambda-debugging/scenarios/level-1/environment-variable-issues.yaml +72 -0
  7. package/courses/aws-lambda-debugging/scenarios/level-1/first-debugging-shift.yaml +73 -0
  8. package/courses/aws-lambda-debugging/scenarios/level-1/handler-import-errors.yaml +71 -0
  9. package/courses/aws-lambda-debugging/scenarios/level-1/iam-permission-errors.yaml +68 -0
  10. package/courses/aws-lambda-debugging/scenarios/level-1/invocation-errors.yaml +72 -0
  11. package/courses/aws-lambda-debugging/scenarios/level-1/lambda-timeout-errors.yaml +65 -0
  12. package/courses/aws-lambda-debugging/scenarios/level-1/memory-and-oom.yaml +70 -0
  13. package/courses/aws-lambda-debugging/scenarios/level-2/async-invocation-failures.yaml +72 -0
  14. package/courses/aws-lambda-debugging/scenarios/level-2/cold-start-optimization.yaml +76 -0
  15. package/courses/aws-lambda-debugging/scenarios/level-2/dynamodb-streams-debugging.yaml +70 -0
  16. package/courses/aws-lambda-debugging/scenarios/level-2/intermediate-debugging-shift.yaml +71 -0
  17. package/courses/aws-lambda-debugging/scenarios/level-2/lambda-concurrency-management.yaml +70 -0
  18. package/courses/aws-lambda-debugging/scenarios/level-2/lambda-layers-debugging.yaml +76 -0
  19. package/courses/aws-lambda-debugging/scenarios/level-2/sam-local-debugging.yaml +74 -0
  20. package/courses/aws-lambda-debugging/scenarios/level-2/sqs-event-source.yaml +72 -0
  21. package/courses/aws-lambda-debugging/scenarios/level-2/vpc-networking-issues.yaml +71 -0
  22. package/courses/aws-lambda-debugging/scenarios/level-2/xray-tracing.yaml +62 -0
  23. package/courses/aws-lambda-debugging/scenarios/level-3/advanced-debugging-shift.yaml +72 -0
  24. package/courses/aws-lambda-debugging/scenarios/level-3/container-image-lambda.yaml +79 -0
  25. package/courses/aws-lambda-debugging/scenarios/level-3/cross-account-invocation.yaml +72 -0
  26. package/courses/aws-lambda-debugging/scenarios/level-3/eventbridge-patterns.yaml +79 -0
  27. package/courses/aws-lambda-debugging/scenarios/level-3/iac-deployment-debugging.yaml +68 -0
  28. package/courses/aws-lambda-debugging/scenarios/level-3/kinesis-stream-processing.yaml +64 -0
  29. package/courses/aws-lambda-debugging/scenarios/level-3/lambda-at-edge.yaml +64 -0
  30. package/courses/aws-lambda-debugging/scenarios/level-3/lambda-extensions-debugging.yaml +67 -0
  31. package/courses/aws-lambda-debugging/scenarios/level-3/powertools-observability.yaml +79 -0
  32. package/courses/aws-lambda-debugging/scenarios/level-3/step-functions-debugging.yaml +80 -0
  33. package/courses/aws-lambda-debugging/scenarios/level-4/cost-optimization-strategy.yaml +67 -0
  34. package/courses/aws-lambda-debugging/scenarios/level-4/expert-debugging-shift.yaml +62 -0
  35. package/courses/aws-lambda-debugging/scenarios/level-4/incident-management-serverless.yaml +61 -0
  36. package/courses/aws-lambda-debugging/scenarios/level-4/multi-region-serverless.yaml +67 -0
  37. package/courses/aws-lambda-debugging/scenarios/level-4/observability-platform-design.yaml +71 -0
  38. package/courses/aws-lambda-debugging/scenarios/level-4/serverless-architecture-design.yaml +64 -0
  39. package/courses/aws-lambda-debugging/scenarios/level-4/serverless-data-architecture.yaml +66 -0
  40. package/courses/aws-lambda-debugging/scenarios/level-4/serverless-migration-strategy.yaml +65 -0
  41. package/courses/aws-lambda-debugging/scenarios/level-4/serverless-security-design.yaml +60 -0
  42. package/courses/aws-lambda-debugging/scenarios/level-4/serverless-testing-strategy.yaml +62 -0
  43. package/courses/aws-lambda-debugging/scenarios/level-5/board-serverless-strategy.yaml +63 -0
  44. package/courses/aws-lambda-debugging/scenarios/level-5/consulting-serverless-adoption.yaml +57 -0
  45. package/courses/aws-lambda-debugging/scenarios/level-5/industry-serverless-patterns.yaml +62 -0
  46. package/courses/aws-lambda-debugging/scenarios/level-5/ma-serverless-integration.yaml +75 -0
  47. package/courses/aws-lambda-debugging/scenarios/level-5/master-debugging-shift.yaml +61 -0
  48. package/courses/aws-lambda-debugging/scenarios/level-5/organizational-serverless-transformation.yaml +65 -0
  49. package/courses/aws-lambda-debugging/scenarios/level-5/regulatory-serverless.yaml +61 -0
  50. package/courses/aws-lambda-debugging/scenarios/level-5/serverless-economics.yaml +65 -0
  51. package/courses/aws-lambda-debugging/scenarios/level-5/serverless-future-technology.yaml +66 -0
  52. package/courses/aws-lambda-debugging/scenarios/level-5/serverless-platform-design.yaml +71 -0
  53. package/courses/docker-container-debugging/course.yaml +11 -0
  54. package/courses/docker-container-debugging/scenarios/level-1/container-exit-codes.yaml +59 -0
  55. package/courses/docker-container-debugging/scenarios/level-1/container-networking-basics.yaml +69 -0
  56. package/courses/docker-container-debugging/scenarios/level-1/docker-logs-debugging.yaml +67 -0
  57. package/courses/docker-container-debugging/scenarios/level-1/dockerfile-build-failures.yaml +71 -0
  58. package/courses/docker-container-debugging/scenarios/level-1/environment-variable-issues.yaml +74 -0
  59. package/courses/docker-container-debugging/scenarios/level-1/first-debugging-shift.yaml +70 -0
  60. package/courses/docker-container-debugging/scenarios/level-1/image-pull-failures.yaml +68 -0
  61. package/courses/docker-container-debugging/scenarios/level-1/port-mapping-issues.yaml +67 -0
  62. package/courses/docker-container-debugging/scenarios/level-1/resource-limits-oom.yaml +70 -0
  63. package/courses/docker-container-debugging/scenarios/level-1/volume-mount-problems.yaml +66 -0
  64. package/courses/docker-container-debugging/scenarios/level-2/container-health-checks.yaml +73 -0
  65. package/courses/docker-container-debugging/scenarios/level-2/docker-compose-debugging.yaml +66 -0
  66. package/courses/docker-container-debugging/scenarios/level-2/docker-exec-debugging.yaml +71 -0
  67. package/courses/docker-container-debugging/scenarios/level-2/image-layer-optimization.yaml +81 -0
  68. package/courses/docker-container-debugging/scenarios/level-2/intermediate-debugging-shift.yaml +73 -0
  69. package/courses/docker-container-debugging/scenarios/level-2/logging-and-log-rotation.yaml +76 -0
  70. package/courses/docker-container-debugging/scenarios/level-2/multi-stage-build-debugging.yaml +76 -0
  71. package/courses/docker-container-debugging/scenarios/level-2/network-debugging-tools.yaml +67 -0
  72. package/courses/docker-container-debugging/scenarios/level-2/pid1-signal-handling.yaml +71 -0
  73. package/courses/docker-container-debugging/scenarios/level-2/security-scanning-basics.yaml +67 -0
  74. package/courses/docker-container-debugging/scenarios/level-3/advanced-debugging-shift.yaml +77 -0
  75. package/courses/docker-container-debugging/scenarios/level-3/buildkit-optimization.yaml +67 -0
  76. package/courses/docker-container-debugging/scenarios/level-3/container-filesystem-debugging.yaml +70 -0
  77. package/courses/docker-container-debugging/scenarios/level-3/container-security-hardening.yaml +74 -0
  78. package/courses/docker-container-debugging/scenarios/level-3/disk-space-management.yaml +74 -0
  79. package/courses/docker-container-debugging/scenarios/level-3/docker-api-automation.yaml +72 -0
  80. package/courses/docker-container-debugging/scenarios/level-3/docker-daemon-issues.yaml +73 -0
  81. package/courses/docker-container-debugging/scenarios/level-3/docker-in-docker-ci.yaml +69 -0
  82. package/courses/docker-container-debugging/scenarios/level-3/overlay-network-debugging.yaml +70 -0
  83. package/courses/docker-container-debugging/scenarios/level-3/production-container-ops.yaml +71 -0
  84. package/courses/docker-container-debugging/scenarios/level-4/cicd-pipeline-design.yaml +66 -0
  85. package/courses/docker-container-debugging/scenarios/level-4/container-monitoring-observability.yaml +63 -0
  86. package/courses/docker-container-debugging/scenarios/level-4/container-orchestration-strategy.yaml +62 -0
  87. package/courses/docker-container-debugging/scenarios/level-4/container-performance-engineering.yaml +64 -0
  88. package/courses/docker-container-debugging/scenarios/level-4/container-security-architecture.yaml +66 -0
  89. package/courses/docker-container-debugging/scenarios/level-4/enterprise-image-management.yaml +58 -0
  90. package/courses/docker-container-debugging/scenarios/level-4/expert-debugging-shift.yaml +63 -0
  91. package/courses/docker-container-debugging/scenarios/level-4/incident-response-containers.yaml +70 -0
  92. package/courses/docker-container-debugging/scenarios/level-4/multi-environment-management.yaml +65 -0
  93. package/courses/docker-container-debugging/scenarios/level-4/stateful-service-containers.yaml +65 -0
  94. package/courses/docker-container-debugging/scenarios/level-5/board-infrastructure-strategy.yaml +58 -0
  95. package/courses/docker-container-debugging/scenarios/level-5/consulting-container-strategy.yaml +61 -0
  96. package/courses/docker-container-debugging/scenarios/level-5/container-platform-architecture.yaml +67 -0
  97. package/courses/docker-container-debugging/scenarios/level-5/container-platform-economics.yaml +67 -0
  98. package/courses/docker-container-debugging/scenarios/level-5/container-technology-evolution.yaml +67 -0
  99. package/courses/docker-container-debugging/scenarios/level-5/disaster-recovery-containers.yaml +66 -0
  100. package/courses/docker-container-debugging/scenarios/level-5/industry-container-patterns.yaml +71 -0
  101. package/courses/docker-container-debugging/scenarios/level-5/master-debugging-shift.yaml +62 -0
  102. package/courses/docker-container-debugging/scenarios/level-5/organizational-transformation.yaml +67 -0
  103. package/courses/docker-container-debugging/scenarios/level-5/regulatory-compliance-containers.yaml +61 -0
  104. package/courses/kubernetes-deployment-troubleshooting/course.yaml +12 -0
  105. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/configmap-secret-issues.yaml +69 -0
  106. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/crashloopbackoff.yaml +68 -0
  107. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/deployment-rollout.yaml +56 -0
  108. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/first-troubleshooting-shift.yaml +65 -0
  109. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/health-probe-failures.yaml +70 -0
  110. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/imagepullbackoff.yaml +57 -0
  111. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/kubectl-debugging-basics.yaml +56 -0
  112. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/oomkilled.yaml +70 -0
  113. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/pending-pods.yaml +68 -0
  114. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/service-not-reachable.yaml +66 -0
  115. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/dns-resolution-failures.yaml +63 -0
  116. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/helm-deployment-failures.yaml +63 -0
  117. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/hpa-scaling-issues.yaml +62 -0
  118. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/ingress-routing-issues.yaml +63 -0
  119. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/init-container-failures.yaml +63 -0
  120. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/intermediate-troubleshooting-shift.yaml +66 -0
  121. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/network-policy-blocking.yaml +67 -0
  122. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/persistent-volume-issues.yaml +69 -0
  123. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/rbac-permission-denied.yaml +57 -0
  124. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/resource-quota-limits.yaml +64 -0
  125. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/advanced-troubleshooting-shift.yaml +69 -0
  126. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/cluster-upgrade-failures.yaml +71 -0
  127. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/gitops-drift-detection.yaml +62 -0
  128. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/job-cronjob-failures.yaml +67 -0
  129. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/monitoring-alerting-gaps.yaml +64 -0
  130. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/multi-container-debugging.yaml +68 -0
  131. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/node-pressure-evictions.yaml +70 -0
  132. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/pod-disruption-budgets.yaml +59 -0
  133. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/service-mesh-debugging.yaml +64 -0
  134. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/statefulset-troubleshooting.yaml +69 -0
  135. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/capacity-planning.yaml +65 -0
  136. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/cost-optimization.yaml +57 -0
  137. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/disaster-recovery-design.yaml +56 -0
  138. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/executive-communication.yaml +62 -0
  139. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/expert-troubleshooting-shift.yaml +65 -0
  140. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/incident-management-process.yaml +59 -0
  141. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/multi-cluster-operations.yaml +62 -0
  142. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/multi-tenancy-design.yaml +55 -0
  143. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/platform-engineering.yaml +59 -0
  144. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/security-hardening.yaml +58 -0
  145. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/behavioral-science.yaml +62 -0
  146. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/board-strategy.yaml +61 -0
  147. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/cloud-native-future.yaml +65 -0
  148. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/comprehensive-platform.yaml +57 -0
  149. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/consulting-engagement.yaml +62 -0
  150. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/industry-benchmarks.yaml +58 -0
  151. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/ma-integration.yaml +62 -0
  152. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/master-troubleshooting-shift.yaml +73 -0
  153. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/product-development.yaml +65 -0
  154. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/regulatory-compliance.yaml +76 -0
  155. package/courses/mysql-query-optimization/course.yaml +11 -0
  156. package/courses/mysql-query-optimization/scenarios/level-1/buffer-pool-basics.yaml +65 -0
  157. package/courses/mysql-query-optimization/scenarios/level-1/explain-basics.yaml +66 -0
  158. package/courses/mysql-query-optimization/scenarios/level-1/first-optimization-shift.yaml +78 -0
  159. package/courses/mysql-query-optimization/scenarios/level-1/innodb-index-fundamentals.yaml +68 -0
  160. package/courses/mysql-query-optimization/scenarios/level-1/join-basics.yaml +66 -0
  161. package/courses/mysql-query-optimization/scenarios/level-1/n-plus-one-queries.yaml +67 -0
  162. package/courses/mysql-query-optimization/scenarios/level-1/query-rewriting-basics.yaml +66 -0
  163. package/courses/mysql-query-optimization/scenarios/level-1/select-star-problems.yaml +68 -0
  164. package/courses/mysql-query-optimization/scenarios/level-1/slow-query-diagnosis.yaml +65 -0
  165. package/courses/mysql-query-optimization/scenarios/level-1/where-clause-optimization.yaml +65 -0
  166. package/courses/mysql-query-optimization/scenarios/level-2/buffer-pool-tuning.yaml +64 -0
  167. package/courses/mysql-query-optimization/scenarios/level-2/composite-index-design.yaml +71 -0
  168. package/courses/mysql-query-optimization/scenarios/level-2/covering-and-invisible-indexes.yaml +69 -0
  169. package/courses/mysql-query-optimization/scenarios/level-2/cte-and-window-functions.yaml +78 -0
  170. package/courses/mysql-query-optimization/scenarios/level-2/intermediate-optimization-shift.yaml +68 -0
  171. package/courses/mysql-query-optimization/scenarios/level-2/join-optimization.yaml +67 -0
  172. package/courses/mysql-query-optimization/scenarios/level-2/performance-schema-analysis.yaml +69 -0
  173. package/courses/mysql-query-optimization/scenarios/level-2/query-optimizer-hints.yaml +74 -0
  174. package/courses/mysql-query-optimization/scenarios/level-2/subquery-optimization.yaml +70 -0
  175. package/courses/mysql-query-optimization/scenarios/level-2/write-optimization.yaml +63 -0
  176. package/courses/mysql-query-optimization/scenarios/level-3/advanced-optimization-shift.yaml +71 -0
  177. package/courses/mysql-query-optimization/scenarios/level-3/connection-management.yaml +67 -0
  178. package/courses/mysql-query-optimization/scenarios/level-3/full-text-search.yaml +77 -0
  179. package/courses/mysql-query-optimization/scenarios/level-3/json-optimization.yaml +87 -0
  180. package/courses/mysql-query-optimization/scenarios/level-3/lock-contention-analysis.yaml +68 -0
  181. package/courses/mysql-query-optimization/scenarios/level-3/monitoring-alerting.yaml +63 -0
  182. package/courses/mysql-query-optimization/scenarios/level-3/online-schema-changes.yaml +79 -0
  183. package/courses/mysql-query-optimization/scenarios/level-3/partitioning-strategies.yaml +83 -0
  184. package/courses/mysql-query-optimization/scenarios/level-3/query-profiling-deep-dive.yaml +84 -0
  185. package/courses/mysql-query-optimization/scenarios/level-3/replication-optimization.yaml +66 -0
  186. package/courses/mysql-query-optimization/scenarios/level-4/aurora-vs-rds-evaluation.yaml +61 -0
  187. package/courses/mysql-query-optimization/scenarios/level-4/data-architecture.yaml +62 -0
  188. package/courses/mysql-query-optimization/scenarios/level-4/database-migration-planning.yaml +59 -0
  189. package/courses/mysql-query-optimization/scenarios/level-4/enterprise-governance.yaml +50 -0
  190. package/courses/mysql-query-optimization/scenarios/level-4/executive-communication.yaml +54 -0
  191. package/courses/mysql-query-optimization/scenarios/level-4/expert-optimization-shift.yaml +67 -0
  192. package/courses/mysql-query-optimization/scenarios/level-4/high-availability-architecture.yaml +60 -0
  193. package/courses/mysql-query-optimization/scenarios/level-4/optimizer-internals.yaml +62 -0
  194. package/courses/mysql-query-optimization/scenarios/level-4/performance-sla-design.yaml +52 -0
  195. package/courses/mysql-query-optimization/scenarios/level-4/read-replica-scaling.yaml +51 -0
  196. package/courses/mysql-query-optimization/scenarios/level-5/ai-database-future.yaml +45 -0
  197. package/courses/mysql-query-optimization/scenarios/level-5/behavioral-science.yaml +44 -0
  198. package/courses/mysql-query-optimization/scenarios/level-5/benchmark-design.yaml +47 -0
  199. package/courses/mysql-query-optimization/scenarios/level-5/board-strategy.yaml +48 -0
  200. package/courses/mysql-query-optimization/scenarios/level-5/comprehensive-platform.yaml +49 -0
  201. package/courses/mysql-query-optimization/scenarios/level-5/consulting-engagement.yaml +52 -0
  202. package/courses/mysql-query-optimization/scenarios/level-5/ma-database-integration.yaml +47 -0
  203. package/courses/mysql-query-optimization/scenarios/level-5/master-optimization-shift.yaml +56 -0
  204. package/courses/mysql-query-optimization/scenarios/level-5/product-development.yaml +48 -0
  205. package/courses/mysql-query-optimization/scenarios/level-5/regulatory-compliance.yaml +48 -0
  206. package/courses/postgresql-query-optimization/scenarios/level-5/comprehensive-database-system.yaml +70 -0
  207. package/courses/postgresql-query-optimization/scenarios/level-5/database-ai-future.yaml +81 -0
  208. package/courses/postgresql-query-optimization/scenarios/level-5/database-behavioral-science.yaml +63 -0
  209. package/courses/postgresql-query-optimization/scenarios/level-5/database-board-strategy.yaml +77 -0
  210. package/courses/postgresql-query-optimization/scenarios/level-5/database-consulting-engagement.yaml +61 -0
  211. package/courses/postgresql-query-optimization/scenarios/level-5/database-industry-benchmarks.yaml +64 -0
  212. package/courses/postgresql-query-optimization/scenarios/level-5/database-ma-integration.yaml +71 -0
  213. package/courses/postgresql-query-optimization/scenarios/level-5/database-product-development.yaml +72 -0
  214. package/courses/postgresql-query-optimization/scenarios/level-5/database-regulatory-landscape.yaml +76 -0
  215. package/courses/postgresql-query-optimization/scenarios/level-5/master-optimization-shift.yaml +66 -0
  216. package/courses/terraform-infrastructure-setup/course.yaml +11 -0
  217. package/courses/terraform-infrastructure-setup/scenarios/level-1/hcl-syntax-errors.yaml +65 -0
  218. package/courses/terraform-infrastructure-setup/scenarios/level-1/provider-configuration.yaml +62 -0
  219. package/courses/terraform-infrastructure-setup/scenarios/level-1/terraform-init-errors.yaml +72 -0
  220. package/courses/terraform-infrastructure-setup/scenarios/level-1/variable-and-output-errors.yaml +78 -0
  221. package/dist/mcp/session-manager.d.ts +7 -4
  222. package/dist/mcp/session-manager.d.ts.map +1 -1
  223. package/dist/mcp/session-manager.js +23 -8
  224. package/dist/mcp/session-manager.js.map +1 -1
  225. package/package.json +3 -2
@@ -0,0 +1,62 @@
+ meta:
+   id: master-debugging-shift
+   level: 5
+   course: docker-container-debugging
+   type: output
+   description: "Combined master debugging shift — serve as fractional CTO advising on a complete container platform strategy encompassing technology, people, process, and business alignment"
+   tags: [Docker, troubleshooting, combined, shift-simulation, CTO, master]
+
+ state: {}
+
+ trigger: |
+   You're engaged as a fractional CTO for a Series B startup ($30M
+   raised, 80 engineers, growing to 200). They've been running on
+   Docker Compose across 5 servers for 2 years. Growing pains are
+   severe:
+
+   Technical challenges:
+   - 40 microservices on Docker Compose, manual deployment
+   - Last week: wrong image tag deployed, 3-hour outage ($150K impact)
+   - No security scanning — audit found 12 CRITICAL CVEs in production
+   - Database containers without proper backup — near-miss data loss
+   - Developers wait 30+ minutes for CI builds (no caching)
+   - No centralized logging — debugging requires SSH to 5 servers
+
+   People challenges:
+   - 2-person DevOps team overwhelmed (they're firefighting 80% of time)
+   - Developers have no container training, copy-paste Dockerfiles
+   - No on-call rotation — same 2 DevOps engineers handle everything
+   - Engineering velocity declining as team grows
+
+   Business context:
+   - Series C fundraising in 9 months — need to demonstrate scalability
+   - Enterprise customer prospects require SOC2 compliance
+   - Planning international expansion (EU data residency requirements)
+   - Board expects 99.95% availability SLA for enterprise tier
+
+   Budget: $500K for platform investment over next 12 months
+   Hiring: Can add 3-4 people
+
+   The CEO asks: "Give me a 12-month roadmap that gets us to Series C
+   ready. We need to stop firefighting and start scaling."
+
+   Task: Design the comprehensive 12-month roadmap. Write: the phased
+   approach (stabilize → automate → scale → optimize), technology
+   decisions (stay on Compose? Move to K8s? Use managed services?),
+   hiring plan (who to hire first and why), SOC2 compliance path,
+   cost breakdown and ROI, risk register with mitigations, and the
+   key milestones that demonstrate investor readiness.
+
+ assertions:
+   - type: llm_judge
+     criteria: "Phased roadmap is realistic — Month 1-3 (Stabilize): fix critical security CVEs, implement image scanning in CI, set up proper database backups, add health checks, configure log rotation, basic monitoring (Prometheus + Grafana). Month 4-6 (Automate): CI/CD pipeline with automated build/scan/deploy, move to managed Kubernetes (EKS/GKE), centralized logging, on-call rotation. Month 7-9 (Scale): SOC2 controls implementation, multi-region readiness, developer self-service platform. Month 10-12 (Optimize): cost optimization, performance tuning, DR testing, compliance audit. Each phase builds on the previous"
+     weight: 0.35
+     description: "Phased roadmap"
+   - type: llm_judge
+     criteria: "Technology and hiring decisions are justified — recommend managed Kubernetes over Compose for 40+ services (Compose doesn't scale operationally). Managed K8s (EKS/GKE) over self-hosted (team too small to manage control plane). Hiring priority: (1) senior platform engineer (lead the migration), (2) security engineer (SOC2 + scanning), (3) SRE (on-call, monitoring, incident response), (4) DevOps engineer (CI/CD, automation). This relieves the existing 2-person team and adds specialization. Budget allocation: $200K tooling, $300K hiring (partial year)"
+     weight: 0.35
+     description: "Technology and hiring"
+   - type: llm_judge
+     criteria: "Investor readiness and compliance are addressed — SOC2 Type I achievable in 6-9 months (show controls exist), Type II requires 6+ months of evidence (start collecting immediately). Key investor metrics: 99.95% availability (track from month 3), deployment frequency (daily by month 6), MTTR < 30 minutes, security posture (0 CRITICAL CVEs). EU expansion: GDPR compliance, data residency (EU region deployment). Present to board quarterly: progress against milestones, risk reduction, cost efficiency. Risk register: migration timeline slippage (mitigate: phased approach), hiring delays (mitigate: start immediately), scope creep (mitigate: strict prioritization)"
+     weight: 0.30
+     description: "Investor readiness"
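Note on the 99.95% availability target in the scenario above: the figure implies a small downtime budget, and the arithmetic is easy to check. The sketch below is illustrative only and is not part of the package. It assumes a 30-day month and a 365-day year, and the function name `downtime_budget_minutes` is hypothetical.

```python
def downtime_budget_minutes(availability_pct: float, period_days: int) -> float:
    """Return the minutes of downtime allowed in a period at a given availability."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# Assumed periods: a 30-day month and a 365-day year.
monthly_budget = downtime_budget_minutes(99.95, 30)
yearly_budget = downtime_budget_minutes(99.95, 365)

# The scenario's 3-hour outage, expressed as a share of the yearly budget.
outage_minutes = 3 * 60
share_of_year = outage_minutes / yearly_budget

print(f"monthly budget: {monthly_budget:.1f} min")
print(f"yearly budget:  {yearly_budget:.1f} min")
print(f"3-hour outage uses {share_of_year:.0%} of the yearly budget")
```

Under these assumptions the budget is about 21.6 minutes per month and 262.8 minutes per year, so the single 3-hour outage described in the trigger would consume roughly two thirds of a full year's allowance.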
@@ -0,0 +1,67 @@
+ meta:
+   id: organizational-transformation
+   level: 5
+   course: docker-container-debugging
+   type: output
+   description: "Lead organizational transformation through containerization — manage cultural change, team restructuring, and DevOps transformation alongside container adoption"
+   tags: [Docker, organizational-change, DevOps, culture, transformation, leadership, master]
+
+ state: {}
+
+ trigger: |
+   You're leading container adoption at a 5,000-person enterprise.
+   The technology is ready but the organization isn't. Three months
+   into the initiative, adoption is stalling:
+
+   Resistance patterns observed:
+
+   Operations team (50 people):
+   "Containers are a fad. We've managed VMs for 15 years. Why change?"
+   Fear: automation will eliminate their jobs.
+   Reality: need to transform from VM operators to platform engineers.
+
+   Development teams (200 people across 30 teams):
+   Only 4 teams have adopted containers. Others: "We're too busy
+   delivering features to learn new deployment tools."
+   Fear: containers add complexity to their already-complex workflow.
+   Reality: containers simplify deployment once learned.
+
+   Security team (10 people):
+   "Containers increase our attack surface. We can't audit them."
+   Fear: loss of visibility and control.
+   Reality: containers can improve security posture with proper tooling.
+
+   Management:
+   CTO sponsors the initiative but middle managers are neutral.
+   Project managers: "Container migration isn't in our roadmap."
+   No incentives aligned with container adoption.
+
+   Change management strategy needed:
+
+   1. Create urgency — show real costs of current approach
+   2. Build a coalition — identify champions in each group
+   3. Quick wins — solve real pain points first
+   4. Enablement — training, documentation, support
+   5. Incentive alignment — tie container adoption to OKRs
+   6. Celebrate success — publicize wins, recognize adopters
+   7. Sustain — embed in hiring, onboarding, promotion criteria
+
+   Task: Design the organizational transformation strategy. Write:
+   the change management framework, addressing each stakeholder
+   group's concerns, training and enablement program, metrics for
+   transformation progress, common transformation failure modes,
+   and the role of leadership in driving technology adoption.
+
+ assertions:
+   - type: llm_judge
+     criteria: "Change management framework is structured — use Kotter's 8-step model or similar: create urgency (show competitor advantage, calculate cost of status quo), build coalition (executive sponsor + tech leads from willing teams + operations champion), quick wins (solve a visible pain point in 30 days), scale (expand from pilot teams to adjacent teams). Transformation timeline: 12-18 months for meaningful adoption, 3-5 years for full organizational shift. Don't announce a 'container mandate' — enable and incentivize instead"
+     weight: 0.35
+     description: "Change framework"
+   - type: llm_judge
+     criteria: "Stakeholder-specific strategies are empathetic — operations: retrain as platform engineers (higher-value role, not elimination), pair with developers, give ownership of the container platform. Developers: provide golden paths (make containers easier than current approach), don't ask teams to stop feature work — integrate container adoption into existing projects. Security: give better tools (runtime monitoring, automated scanning gives more visibility than VMs), involve in platform design. Management: show metrics (deployment speed, incident reduction), align with business OKRs"
+     weight: 0.35
+     description: "Stakeholder strategies"
+   - type: llm_judge
+     criteria: "Failure modes and measurement are realistic — common failures: mandating adoption without enablement, moving too fast (teams overwhelmed), moving too slow (initiative loses momentum), not investing in platform team (adoption stalls without support), not celebrating wins (no positive reinforcement). Measure: adoption rate (% services containerized), developer satisfaction (NPS), deployment frequency per team, time to onboard new service, support ticket volume for container issues. Transformation is a people problem, not a technology problem — treat it accordingly"
+     weight: 0.30
+     description: "Failures and measurement"
@@ -0,0 +1,61 @@
1
+ meta:
2
+ id: regulatory-compliance-containers
3
+ level: 5
4
+ course: docker-container-debugging
5
+ type: output
6
+ description: "Design regulatory compliance for containers — implement controls for SOC2, PCI-DSS, HIPAA, and FedRAMP in containerized environments"
7
+ tags: [Docker, compliance, SOC2, PCI-DSS, HIPAA, FedRAMP, regulation, master]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your company is pursuing SOC2 Type II certification and PCI-DSS
13
+ compliance. The auditor has questions about your container platform:
14
+
15
+ Auditor question 1: "How do you ensure only authorized images
16
+ run in production?"
17
+ Current answer: "We trust our developers." (Not acceptable)
18
+ Required: Image signing, admission control, approved registry list.
19
+
20
+ Auditor question 2: "How do you track who deployed what and when?"
21
+ Current answer: "We can check git history." (Insufficient)
22
+ Required: Immutable audit logs of all deployment actions with
23
+ identity, timestamp, image digest, and approval chain.
24
+
25
+ Auditor question 3: "How do you ensure containers don't contain
26
+ known vulnerabilities?"
27
+ Current answer: "We scan periodically." (When? How? What's the SLA?)
28
+ Required: Automated scanning in CI/CD with defined severity thresholds,
29
+ documented exception process, SLA for patching (CRITICAL: 24h,
30
+ HIGH: 7d, MEDIUM: 30d).
31
+
32
+ Auditor question 4: "How do you isolate cardholder data environments?"
33
+ Current answer: "Different Docker network." (Insufficient for PCI)
34
+ Required: Network segmentation with documented firewall rules,
35
+ encrypted communication (mTLS), access logging, separate
36
+ infrastructure for CDE.
37
+
38
+ Auditor question 5: "How do you handle secrets and encryption keys?"
39
+ Current answer: "Environment variables in Docker Compose."
40
+ Required: Dedicated secrets manager (Vault), encryption at rest,
41
+ rotation policy, access auditing.
42
+
43
+ Task: Design compliance controls for containerized environments.
44
+ Write: control mappings for SOC2 and PCI-DSS, image governance
45
+ (signing, scanning, admission), audit logging architecture,
46
+ network segmentation for compliance, secrets management, and the
47
+ continuous compliance monitoring approach.
48
+
49
+ assertions:
50
+ - type: llm_judge
51
+ criteria: "Control mappings are specific — SOC2: CC6.1 (logical access) → RBAC, namespace isolation, registry access control. CC7.1 (monitoring) → container runtime monitoring, audit logs. CC8.1 (change management) → GitOps, immutable images, deployment approvals. PCI-DSS: Requirement 2 (secure configuration) → hardened container images, CIS benchmarks. Requirement 6 (secure development) → image scanning in CI. Requirement 10 (logging) → centralized audit logs with tamper protection. Requirement 11 (testing) → regular vulnerability scanning"
52
+ weight: 0.35
53
+ description: "Control mappings"
54
+ - type: llm_judge
55
+ criteria: "Image governance and audit are covered — image lifecycle: build → scan → sign → approve → deploy. Admission controller (OPA/Kyverno) rejects unsigned or unscanned images. Approved registry allowlist prevents pulling from public registries. Audit logging: every docker/kubectl command logged with identity (who), action (what), resource (which container/image), timestamp (when), result (success/fail). Logs must be immutable (append-only, shipped to SIEM). Retention: at least 1 year for PCI-DSS, with the most recent 3 months immediately available; SOC2 prescribes no fixed period, so follow the documented retention policy"
56
+ weight: 0.35
57
+ description: "Governance and audit"
58
+ - type: llm_judge
59
+ criteria: "Network and secrets compliance are practical — PCI CDE isolation: separate cluster or namespace with strict network policies. mTLS between all services in CDE (service mesh). No direct internet access from CDE containers. Secrets: HashiCorp Vault or cloud KMS, automatic rotation, access auditing, encryption at rest. Never in environment variables, Docker Compose files, or image layers. Continuous compliance: automated scanning against CIS Docker Benchmark, regular penetration testing, compliance dashboards for auditors, automated evidence collection"
60
+ weight: 0.30
61
+ description: "Network and secrets"
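As an illustration of the "strict network policies" criterion above, here is a minimal sketch of a default-deny posture for a CDE namespace. The namespace `cde`, the labels `payment-api` and `payment-gateway`, and port 8443 are hypothetical and do not come from the scenario. It also assumes the cluster's CNI plugin enforces NetworkPolicy.

```yaml
# Default-deny: selects every pod in the namespace and allows no traffic.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: cde                 # hypothetical CDE namespace
spec:
  podSelector: {}                # empty selector = all pods in the namespace
  policyTypes:
    - Ingress
    - Egress
---
# Explicit allow: only the gateway pods may reach the payment API, on one port.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-gateway-to-payment-api
  namespace: cde
spec:
  podSelector:
    matchLabels:
      app: payment-api           # hypothetical label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: payment-gateway   # hypothetical label
      ports:
        - protocol: TCP
          port: 8443             # hypothetical port
```

Denying all egress also blocks DNS lookups, so a real policy set would normally add a narrow egress rule for the cluster DNS service. Each allow rule then doubles as the documented firewall rule an auditor would ask for.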
@@ -0,0 +1,12 @@
1
+ id: kubernetes-deployment-troubleshooting
2
+ name: "Kubernetes Deployment Troubleshooting"
3
+ description: >
4
+ Master Kubernetes deployment troubleshooting from pod debugging basics
5
+ to enterprise platform operations. Learn to diagnose CrashLoopBackOff,
6
+ ImagePullBackOff, OOMKilled pods, fix networking and service issues,
7
+ manage storage and RBAC, optimize resources with HPA/VPA, implement
8
+ GitOps pipelines, and design multi-cluster disaster recovery
9
+ strategies for large-scale Kubernetes deployments.
10
+ levels: 5
11
+ scenarios_per_level: 10
12
+ tags: [development, Kubernetes, DevOps, troubleshooting, containers, deployment, cloud-native]
@@ -0,0 +1,69 @@
1
+ meta:
2
+ id: configmap-secret-issues
3
+ level: 1
4
+ course: kubernetes-deployment-troubleshooting
5
+ type: output
6
+ description: "Debug ConfigMap and Secret issues — diagnose missing environment variables, mount failures, and configuration mismatches"
7
+ tags: [Kubernetes, ConfigMap, Secret, environment-variables, configuration, beginner]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your application deployed successfully but is returning 500 errors.
13
+ The logs show it can't find its configuration:
14
+
15
+ $ kubectl get pods
16
+ NAME READY STATUS RESTARTS AGE
17
+ auth-svc-2b3c4d5e-fg67 1/1 Running 0 2m
18
+
19
+ $ kubectl logs auth-svc-2b3c4d5e-fg67
20
+ Error: Missing required environment variable: JWT_SECRET
21
+ Error: Missing required environment variable: REDIS_URL
22
+ Error: Configuration validation failed, shutting down...
23
+ Process exited with code 1
24
+
25
+ Wait — the pod status shows Running, not CrashLoopBackOff? That's
26
+ because the container has a startup delay and the crash happened after
27
+ the liveness probe window.
28
+
29
+ $ kubectl describe pod auth-svc-2b3c4d5e-fg67
30
+ Environment:
31
+ DATABASE_URL: <set to the key 'DATABASE_URL' in secret 'auth-secrets'>
32
+ JWT_SECRET: <set to the key 'JWT_SECRET' in secret 'auth-secrets'>
33
+ REDIS_URL: <set to the key 'REDIS_URL' in configmap 'auth-config'>
34
+
35
+ $ kubectl get secret auth-secrets -o jsonpath='{.data}' | jq
36
+ {
37
+ "DATABASE_URL": "cG9zdGdyZXM6Ly8uLi4="
38
+ }
39
+
40
+ The secret exists but only has DATABASE_URL — JWT_SECRET is missing
41
+ from the secret! And the ConfigMap:
42
+
43
+ $ kubectl get configmap auth-config
44
+ Error from server (NotFound): configmaps "auth-config" not found
45
+
46
+ The ConfigMap doesn't exist at all. Two issues:
47
+ 1. Secret auth-secrets is missing the JWT_SECRET key
48
+ 2. ConfigMap auth-config was never created
49
+
50
+ Task: Explain how ConfigMaps and Secrets work in Kubernetes. Write:
51
+ how pods consume configuration (env vars vs volume mounts), what
52
+ happens when a referenced ConfigMap/Secret is missing (pod may or may
53
+ not start depending on optional flag), how to debug missing config,
54
+ the difference between ConfigMaps and Secrets, and best practices for
55
+ configuration management.
56
+
57
+ assertions:
58
+ - type: llm_judge
59
+ criteria: "ConfigMap and Secret consumption is explained — two methods: environment variables (envFrom or env with valueFrom) and volume mounts. envFrom loads all keys, env with valueFrom loads specific keys. Volume mounts project keys as files. If a referenced ConfigMap/Secret doesn't exist, the pod fails to start unless the reference is marked as optional"
60
+ weight: 0.35
61
+ description: "ConfigMap/Secret consumption"
62
+ - type: llm_judge
63
+ criteria: "Debugging steps are systematic — check pod events (kubectl describe pod), verify ConfigMap/Secret exists (kubectl get cm/secret), inspect keys (kubectl get secret -o jsonpath or -o yaml), verify key names match what the deployment references, check if the reference is in the correct namespace. Shows how base64 encoding works for secrets"
64
+ weight: 0.35
65
+ description: "Debugging steps"
66
+ - type: llm_judge
67
+ criteria: "Differences and best practices are covered — ConfigMaps for non-sensitive config, Secrets for sensitive data (base64 encoded, can be encrypted at rest in etcd). Best practices: use optional references where appropriate, validate config in CI/CD, use sealed-secrets or external secret managers for production, consider volume mounts for auto-updates vs env vars requiring restart"
68
+ weight: 0.30
69
+ description: "Differences and best practices"
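The two fixes in this scenario could look roughly like the manifests below. This is a sketch under stated assumptions: the Redis URL and the image name are hypothetical, and the pod is a standalone example, not the actual auth-svc Deployment. The missing `JWT_SECRET` key would still need to be added to `auth-secrets` separately.

```yaml
# Fix 2: create the ConfigMap that the pod references.
apiVersion: v1
kind: ConfigMap
metadata:
  name: auth-config
data:
  REDIS_URL: "redis://redis.default.svc.cluster.local:6379"   # hypothetical value
---
# How the pod consumes individual keys, and where `optional` goes.
apiVersion: v1
kind: Pod
metadata:
  name: auth-svc-example
spec:
  containers:
    - name: auth-svc
      image: example.com/auth-svc:1.0.0   # hypothetical image
      env:
        - name: JWT_SECRET
          valueFrom:
            secretKeyRef:
              name: auth-secrets
              key: JWT_SECRET             # required: container is not created if the key is missing
        - name: REDIS_URL
          valueFrom:
            configMapKeyRef:
              name: auth-config
              key: REDIS_URL
              optional: true              # container starts even if the ConfigMap or key is absent
```

Marking a reference `optional: true` only makes sense when the application has a usable default. For a required value such as `JWT_SECRET`, failing at container creation is usually the safer behavior.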
@@ -0,0 +1,68 @@
1
+ meta:
2
+ id: crashloopbackoff
3
+ level: 1
4
+ course: kubernetes-deployment-troubleshooting
5
+ type: output
6
+ description: "Debug CrashLoopBackOff — diagnose why a pod keeps crashing and restarting, using kubectl logs, describe, and events"
7
+ tags: [Kubernetes, CrashLoopBackOff, debugging, pod-lifecycle, beginner]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ You deployed a new version of your API service and kubectl shows:
13
+
14
+ $ kubectl get pods
15
+ NAME READY STATUS RESTARTS AGE
16
+ api-server-7b8f9c4d-x2k9l 0/1 CrashLoopBackOff 5 3m
17
+ api-server-7b8f9c4d-m4p7q 0/1 CrashLoopBackOff 5 3m
18
+ api-server-7b8f9c4d-r9n1j 0/1 CrashLoopBackOff 5 3m
19
+
20
+ All 3 replicas are in CrashLoopBackOff. The previous version was
21
+ running fine. The team is panicking because the API is down.
22
+
23
+ kubectl describe pod api-server-7b8f9c4d-x2k9l shows:
24
+ State: Waiting
25
+ Reason: CrashLoopBackOff
26
+ Last State: Terminated
27
+ Reason: Error
28
+ Exit Code: 1
29
+ Started: 2025-12-01T10:00:05Z
30
+ Finished: 2025-12-01T10:00:06Z
31
+
32
+ Events:
33
+ Warning BackOff 3m kubelet Back-off restarting failed container
34
+
35
+ kubectl logs api-server-7b8f9c4d-x2k9l shows:
36
+ Error: connect ECONNREFUSED 10.100.50.3:5432
37
+ Error: Unable to connect to database
38
+ Process exited with code 1
39
+
40
+ The application requires a PostgreSQL database connection. The
41
+ database is running in the same cluster as a StatefulSet. Nothing
42
+ changed about the database — only the API image was updated.
43
+
44
+ Investigation reveals:
45
+ - The new image version changed the env var name from DATABASE_URL
46
+ to DB_CONNECTION_STRING
47
+ - The ConfigMap still has DATABASE_URL
48
+ - The container starts, fails to connect (wrong env var), and exits
49
+
50
+ Task: Explain how to debug CrashLoopBackOff. Write: what
51
+ CrashLoopBackOff means (the restart backoff mechanism), the debugging
52
+ workflow (kubectl logs → describe → events), common causes (app crash,
53
+ missing env vars, missing dependencies, OOM, wrong command), the fix
54
+ for this specific case, and how to prevent this in the future.
55
+
56
+ assertions:
57
+ - type: llm_judge
58
+ criteria: "CrashLoopBackOff is explained — the container starts, crashes, Kubernetes restarts it with exponential backoff (10s, 20s, 40s... up to 5 minutes). Exit code 1 indicates application error (not OOMKilled=137, not SIGTERM=143). The backoff means Kubernetes is waiting longer between each restart attempt"
59
+ weight: 0.35
60
+ description: "CrashLoopBackOff explained"
61
+ - type: llm_judge
62
+ criteria: "Debugging workflow is systematic — (1) kubectl logs <pod> to see application error, (2) kubectl logs <pod> --previous if container already restarted, (3) kubectl describe pod to see events, exit codes, and container state, (4) check env vars with kubectl exec (if pod stays up) or kubectl get cm/secret. Identifies the env var mismatch as root cause"
63
+ weight: 0.35
64
+ description: "Systematic debugging workflow"
65
+ - type: llm_judge
66
+ criteria: "Fix and prevention are practical — immediate fix: update ConfigMap to include DB_CONNECTION_STRING, or update deployment to map the old var name. Prevention: validate env vars in CI/CD, use health checks (readiness probe on DB connection), and consider using a shared config schema between app versions"
67
+ weight: 0.30
68
+ description: "Fix and prevention"
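One way to apply the second fix named above (map the old variable name in the Deployment) is to expose the existing `DATABASE_URL` key under the new variable name. This is a sketch only: the ConfigMap name `api-config`, the `app: api-server` label, and the image reference are hypothetical, since the scenario does not name them.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server                     # hypothetical label
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
        - name: api-server
          image: example.com/api-server:2.0.0   # hypothetical image
          env:
            # The new image reads DB_CONNECTION_STRING; the ConfigMap
            # still stores the value under the old key DATABASE_URL.
            - name: DB_CONNECTION_STRING
              valueFrom:
                configMapKeyRef:
                  name: api-config        # hypothetical ConfigMap name
                  key: DATABASE_URL
```

This leaves the ConfigMap untouched, so rolling back to the previous image (which still reads `DATABASE_URL` through its own mapping) needs no config change.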
@@ -0,0 +1,56 @@
1
+ meta:
2
+ id: deployment-rollout
3
+ level: 1
4
+ course: kubernetes-deployment-troubleshooting
5
+ type: output
6
+ description: "Debug deployment rollout issues — understand rolling updates, rollback, and why a deployment might get stuck"
7
+ tags: [Kubernetes, Deployment, rollout, rolling-update, rollback, beginner]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ You updated your deployment to a new image version but the rollout
13
+ is stuck. Old pods are still running alongside partially updated ones:
14
+
15
+ $ kubectl rollout status deployment/user-svc
16
+ Waiting for deployment "user-svc" spec update to be observed...
17
+ Waiting for rollout to finish: 1 out of 3 new replicas have been updated...
18
+
19
+ $ kubectl get pods
20
+ NAME READY STATUS RESTARTS AGE
21
+ user-svc-7a8b9c0d-old1 1/1 Running 0 2h
22
+ user-svc-7a8b9c0d-old2 1/1 Running 0 2h
23
+ user-svc-7a8b9c0d-old3 1/1 Running 0 2h
24
+ user-svc-3e4f5g6h-new1 0/1 CrashLoopBackOff 4 5m
25
+
26
+ The deployment strategy is RollingUpdate with maxSurge=1 and
27
+ maxUnavailable=0. The new pod keeps crashing, so the rollout can't
28
+ proceed (it needs at least 1 new pod Ready before terminating old ones).
29
+
30
+ $ kubectl logs user-svc-3e4f5g6h-new1
31
+ Error: Cannot connect to database — migration v15 not applied
32
+ The new version requires a database migration that wasn't run.
33
+
34
+ The deployment is stuck: new pods crash, old pods keep serving traffic,
35
+ but the deployment never completes. After 10 minutes the
36
+ progressDeadlineSeconds (default 600s) will mark it as failed (ProgressDeadlineExceeded); it is not rolled back automatically.
37
+
38
+ Task: Explain how Kubernetes deployment rollouts work. Write: the
39
+ RollingUpdate strategy (maxSurge, maxUnavailable), what happens when
40
+ a rollout gets stuck, how to check rollout status and history, how
41
+ to rollback (kubectl rollout undo), progressDeadlineSeconds, and the
42
+ Recreate strategy as an alternative.
43
+
44
+ assertions:
45
+ - type: llm_judge
46
+ criteria: "Rolling update mechanics are explained — maxSurge controls how many extra pods can be created during update, maxUnavailable controls how many pods can be down. With maxSurge=1 maxUnavailable=0, Kubernetes creates 1 new pod, waits for it to be Ready, then terminates 1 old pod. If the new pod never becomes Ready, the rollout stalls"
47
+ weight: 0.35
48
+ description: "Rolling update mechanics"
49
+ - type: llm_judge
50
+ criteria: "Stuck rollout debugging is explained — kubectl rollout status shows progress, kubectl rollout history shows revision history, progressDeadlineSeconds (default 600s) marks deployment as Failed if no progress within deadline. A stuck rollout means old pods continue serving traffic (safe but incomplete update)"
51
+ weight: 0.35
52
+ description: "Stuck rollout debugging"
53
+ - type: llm_judge
54
+ criteria: "Rollback and alternatives are covered — kubectl rollout undo deployment/<name> reverts to previous revision, kubectl rollout undo --to-revision=N reverts to specific version. Recreate strategy: terminates all old pods before creating new ones (causes downtime but avoids version mixing). kubectl rollout pause/resume for controlled rollouts"
55
+ weight: 0.30
56
+ description: "Rollback and alternatives"
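The rollout settings discussed in this scenario map onto the Deployment spec roughly as follows. This is a sketch: the `app: user-svc` label, the image reference, and the readiness probe path and port are assumptions, not taken from the scenario.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-svc
spec:
  replicas: 3
  progressDeadlineSeconds: 600    # the default; the rollout is reported as failed after this
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                 # at most 1 pod above the desired replica count
      maxUnavailable: 0           # never drop below 3 Ready pods during the update
  selector:
    matchLabels:
      app: user-svc               # hypothetical label
  template:
    metadata:
      labels:
        app: user-svc
    spec:
      containers:
        - name: user-svc
          image: example.com/user-svc:2.0.0   # hypothetical image
          readinessProbe:                     # the rollout only advances once this passes
            httpGet:
              path: /ready                    # hypothetical endpoint
              port: 8080                      # hypothetical port
```

With `maxUnavailable: 0`, a crashing new pod can never displace a healthy old one, which is why traffic keeps flowing while the rollout stalls. Replacing the `strategy` block with `type: Recreate` (and no `rollingUpdate` section) gives the alternative described in the task.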
@@ -0,0 +1,65 @@
1
+ meta:
2
+ id: first-troubleshooting-shift
3
+ level: 1
4
+ course: kubernetes-deployment-troubleshooting
5
+ type: output
6
+ description: "Combined troubleshooting shift — diagnose multiple pod failures across a namespace using the full beginner debugging toolkit"
7
+ tags: [Kubernetes, troubleshooting, debugging, combined, shift-simulation, beginner]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ You're on-call and get paged: "Multiple services down in staging
13
+ namespace." You run:
14
+
15
+ $ kubectl get pods -n staging
16
+ NAME READY STATUS RESTARTS AGE
17
+ api-gw-5a6b7c8d-e9f0 0/1 CrashLoopBackOff 6 15m
18
+ user-svc-1g2h3i4j-k5l6 0/1 ImagePullBackOff 0 15m
19
+ order-svc-7m8n9o0p-q1r2 0/1 Pending 0 15m
20
+ payment-svc-3s4t5u6v-w7x8 1/1 Running 0 15m
21
+ notification-svc-9a0b-c1d2 0/1 Running 0 15m
22
+
23
+ Five services, five different problems:
24
+
25
+ 1. api-gw — CrashLoopBackOff, exit code 1
26
+ Logs: "Error: REDIS_HOST environment variable not set"
27
+ The ConfigMap was deleted during a cleanup.
28
+
29
+ 2. user-svc — ImagePullBackOff
30
+ Events: "Failed to pull image 'registry.company.com/user-svc:v3.2.1':
31
+ unauthorized"
32
+ The registry token expired overnight.
33
+
34
+ 3. order-svc — Pending
35
+ Events: "0/3 nodes are available: 3 Insufficient memory"
36
+ The pod requests 4Gi of memory, but no node has more than 2Gi available.
37
+
38
+ 4. payment-svc — Running and 1/1 Ready, but the
39
+ Service has no endpoints. Label mismatch: Service selector
40
+ app=payment, pod label app=payment-svc.
41
+
42
+ 5. notification-svc — Running, 0/1 Ready
43
+ Readiness probe failing: HTTP 503 on /ready
44
+ The downstream email service is unreachable.
45
+
46
+ Task: Walk through diagnosing all five issues. Write: the triage
47
+ approach (start with kubectl get pods, identify status patterns), the
48
+ debugging steps for each issue type (CrashLoopBackOff → logs,
49
+ ImagePullBackOff → events, Pending → describe, Service issues →
50
+ endpoints, Readiness → probe config), and how to prioritize which to
51
+ fix first in a real incident.
52
+
53
+ assertions:
54
+ - type: llm_judge
55
+ criteria: "All five issues are correctly diagnosed — (1) CrashLoopBackOff from missing ConfigMap causing missing env var, (2) ImagePullBackOff from expired registry credentials, (3) Pending from insufficient memory on nodes, (4) Service-pod label mismatch causing empty endpoints despite pod running, (5) Readiness probe failing due to downstream dependency. Each diagnosis maps to the correct kubectl command"
56
+ weight: 0.35
57
+ description: "All issues diagnosed"
58
+ - type: llm_judge
59
+ criteria: "Triage methodology is systematic — start with kubectl get pods for overview, group by status type, check most impactful services first. Use kubectl describe for events, kubectl logs for application errors, kubectl get endpoints for service connectivity. Prioritize: fix what unblocks the most services first (e.g., ConfigMap might affect multiple services)"
60
+ weight: 0.35
61
+ description: "Triage methodology"
62
+ - type: llm_judge
63
+ criteria: "Fixes and prioritization are practical — prioritize payment-svc (revenue-critical, quick label fix), then api-gw (gateway affects all downstream, recreate ConfigMap), then user-svc (refresh registry token), then order-svc (scale down request or add capacity), then notification-svc (investigate downstream dependency). Shows the actual kubectl fix commands"
64
+ weight: 0.30
65
+ description: "Fixes and prioritization"
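For issue 4, the label mismatch can be fixed on either side. The sketch below changes the Service selector to match the pod label `app=payment-svc` from the scenario. The Service name and the port numbers are assumptions.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: payment-svc        # hypothetical Service name
  namespace: staging
spec:
  selector:
    app: payment-svc       # was app=payment, which matched no pods
  ports:
    - protocol: TCP
      port: 80             # hypothetical Service port
      targetPort: 8080     # hypothetical container port
```

After applying it, `kubectl get endpoints -n staging` should list the pod IP for this Service. Changing the selector rather than the pod label avoids restarting a pod that is otherwise healthy.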
@@ -0,0 +1,70 @@
1
+ meta:
2
+ id: health-probe-failures
3
+ level: 1
4
+ course: kubernetes-deployment-troubleshooting
5
+ type: output
6
+ description: "Debug health probe failures — understand liveness, readiness, and startup probes and why misconfigured probes cause restarts or traffic loss"
7
+ tags: [Kubernetes, health-probes, liveness, readiness, startup, beginner]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your application is running but keeps getting restarted, and users
13
+ report intermittent 503 errors:
14
+
15
+ $ kubectl get pods
16
+ NAME READY STATUS RESTARTS AGE
17
+ web-app-4a5b6c7d-mn01 1/1 Running 12 1h
18
+ web-app-4a5b6c7d-op23 0/1 Running 0 30s
19
+ web-app-4a5b6c7d-qr45 1/1 Running 8 1h
20
+
21
+ $ kubectl describe pod web-app-4a5b6c7d-mn01
22
+ Containers:
23
+ web-app:
24
+ Liveness: http-get http://:8080/healthz delay=5s timeout=1s
25
+ period=10s #success=1 #failure=3
26
+ Readiness: http-get http://:8080/ready delay=5s timeout=1s
27
+ period=10s #success=1 #failure=3
28
+
29
+ Events:
30
+ Warning Unhealthy 2m kubelet Liveness probe failed: HTTP probe
31
+ failed with statuscode: 503
32
+ Normal Killing 2m kubelet Container web-app failed liveness
33
+ check, will be restarted
34
+ Warning Unhealthy 30s kubelet Readiness probe failed: HTTP probe
35
+ failed with statuscode: 503
36
+
37
+ The application takes 45 seconds to fully start up (loading caches,
38
+ warming connections). But the liveness probe starts checking at 5
39
+ seconds with only 3 failures allowed (5s + 3*10s = 35s), so it kills
40
+ the container before it's ready.
41
+
42
+ Additionally, the readiness probe uses the same aggressive timing,
43
+ so during startup the pod is removed from the Service endpoints,
44
+ causing 503s for users.
45
+
46
+ Problems:
47
+ 1. Liveness probe starts too early — kills the container during startup
48
+ 2. No startup probe — would protect the container during initialization
49
+ 3. Readiness probe timing causes premature endpoint removal
50
+
51
+ Task: Explain Kubernetes health probes. Write: the difference between
52
+ liveness, readiness, and startup probes, what each one controls (restart
53
+ vs traffic vs startup protection), how to configure timing parameters
54
+ (initialDelaySeconds, periodSeconds, failureThreshold, timeoutSeconds),
55
+ common misconfiguration patterns, and best practices for slow-starting
56
+ applications.
57
+
58
+ assertions:
59
+ - type: llm_judge
60
+ criteria: "All three probe types are explained — liveness: kills and restarts the container if it fails (detects deadlocks/hangs), readiness: removes pod from Service endpoints if it fails (controls traffic routing), startup: disables liveness/readiness until it succeeds (protects slow-starting containers). Each probe serves a different purpose and should not use the same endpoint"
61
+ weight: 0.35
62
+ description: "Probe types explained"
63
+ - type: llm_judge
64
+ criteria: "Timing parameters are explained with the math — initialDelaySeconds (wait before first check), periodSeconds (interval between checks), failureThreshold (consecutive failures before action), timeoutSeconds (per-check timeout). Total startup tolerance = initialDelaySeconds + (failureThreshold * periodSeconds). In this case: 5 + (3*10) = 35s < 45s startup time, so the container gets killed"
65
+ weight: 0.35
66
+ description: "Timing parameters explained"
67
+ - type: llm_judge
68
+ criteria: "Fix and best practices are practical — add a startup probe with generous timeout (e.g., failureThreshold=30, periodSeconds=5 = 150s window), keep liveness probe for runtime health only (not startup), make readiness check actual dependency availability, never use the same endpoint for all three probes if they serve different purposes"
69
+ weight: 0.30
70
+ description: "Fix and best practices"
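The startup probe fix described in the criteria above could be sketched like this. The values 30 and 5 come from the criteria. The image name is hypothetical, and reusing `/healthz` for the startup probe is an assumption, since the scenario does not say which endpoint to use.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-app-example
spec:
  containers:
    - name: web-app
      image: example.com/web-app:1.0.0   # hypothetical image
      ports:
        - containerPort: 8080
      startupProbe:
        httpGet:
          path: /healthz                 # assumed endpoint
          port: 8080
        periodSeconds: 5
        failureThreshold: 30             # 30 x 5s = up to 150s to start (needs 45s)
      livenessProbe:                     # only begins after the startup probe succeeds
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
        timeoutSeconds: 1
        failureThreshold: 3
      readinessProbe:                    # controls Service endpoint membership only
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 10
        timeoutSeconds: 1
        failureThreshold: 3
```

Because the startup probe holds back the other probes, the liveness probe no longer needs a large `initialDelaySeconds` and can stay tight enough to catch runtime hangs quickly.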
@@ -0,0 +1,57 @@
1
+ meta:
2
+ id: imagepullbackoff
3
+ level: 1
4
+ course: kubernetes-deployment-troubleshooting
5
+ type: output
6
+ description: "Debug ImagePullBackOff — diagnose why Kubernetes can't pull a container image from the registry"
7
+ tags: [Kubernetes, ImagePullBackOff, container-registry, authentication, beginner]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ You deployed a new service but pods are stuck in ImagePullBackOff:
13
+
14
+ $ kubectl get pods
15
+ NAME READY STATUS RESTARTS AGE
16
+ payment-svc-5c6d7e8f-abc12 0/1 ImagePullBackOff 0 5m
17
+ payment-svc-5c6d7e8f-def34 0/1 ImagePullBackOff 0 5m
18
+
19
+ $ kubectl describe pod payment-svc-5c6d7e8f-abc12
20
+ Events:
21
+ Normal Scheduled 5m default-scheduler Successfully assigned...
22
+ Normal Pulling 5m kubelet Pulling image "gcr.io/mycompany/payment:v2.1.0"
23
+ Warning Failed 5m kubelet Failed to pull image "gcr.io/mycompany/payment:v2.1.0":
24
+ rpc error: code = Unknown desc = Error response from daemon:
25
+ unauthorized: You don't have the needed permissions to perform
26
+ this operation, and you may have invalid credentials.
27
+ Warning Failed 5m kubelet Error: ImagePullBackOff
28
+
29
+ The v2.0.0 tag works fine. v2.1.0 was just pushed to GCR.
30
+
31
+ Possible causes to investigate:
32
+ 1. Image doesn't exist (typo in tag or repository)
33
+ 2. Registry authentication expired or misconfigured
34
+ 3. Image was pushed to a different registry/project
35
+ 4. imagePullSecrets not configured on the ServiceAccount
36
+ 5. Network issue between cluster and registry
37
+ 6. Image pull policy: IfNotPresent with a mutable tag
38
+
39
+ Task: Explain how to debug ImagePullBackOff. Write: the common
40
+ causes (authentication, typo, network, pull policy), the debugging
41
+ steps (verify image exists, check secrets, test pull manually), how
42
+ imagePullSecrets work, image pull policies (Always, IfNotPresent,
43
+ Never), and best practices for image management.
44
+
45
+ assertions:
46
+ - type: llm_judge
47
+ criteria: "Common causes are listed — authentication failure (expired token, missing imagePullSecrets), image tag doesn't exist (typo, not pushed yet), registry network issue, wrong image pull policy. The error message 'unauthorized' points to authentication as the likely cause in this scenario"
48
+ weight: 0.35
49
+ description: "Common causes listed"
50
+ - type: llm_judge
51
+ criteria: "Debugging steps are actionable — verify image exists (docker pull or crane/skopeo), check imagePullSecrets (kubectl get sa default -o yaml, kubectl get secret), verify secret has correct credentials (kubectl get secret -o jsonpath), test from node directly. Shows the kubectl commands for each step"
52
+ weight: 0.35
53
+ description: "Actionable debugging steps"
54
+ - type: llm_judge
55
+ criteria: "Pull policies and best practices are explained — Always (re-pull every time, good for mutable tags like :latest), IfNotPresent (use cached if available, good for immutable tags), Never (only use locally loaded). Best practices: use immutable tags (SHA or semver, not :latest), rotate registry credentials, set imagePullSecrets on ServiceAccount rather than per-pod"
56
+ weight: 0.30
57
+ description: "Pull policies and best practices"
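Attaching the pull secret to the ServiceAccount, as the best practices above suggest, might look like the following sketch. The secret name `gcr-pull-secret` and the `default` namespace are hypothetical. The secret itself is assumed to exist already with type `kubernetes.io/dockerconfigjson`.

```yaml
# Pods that use this ServiceAccount inherit its imagePullSecrets.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: default               # hypothetical namespace
imagePullSecrets:
  - name: gcr-pull-secret          # hypothetical secret name
---
apiVersion: v1
kind: Pod
metadata:
  name: payment-svc-example
  namespace: default
spec:
  serviceAccountName: default
  containers:
    - name: payment
      image: gcr.io/mycompany/payment:v2.1.0
      imagePullPolicy: IfNotPresent   # reasonable for an immutable version tag
```

The secret is injected when a pod is created, so existing pods must be recreated to pick up a newly attached secret.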