dojo.md 0.2.0 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (222)
  1. package/courses/GENERATION_LOG.md +45 -0
  2. package/courses/aws-lambda-debugging/course.yaml +11 -0
  3. package/courses/aws-lambda-debugging/scenarios/level-1/api-gateway-integration.yaml +71 -0
  4. package/courses/aws-lambda-debugging/scenarios/level-1/cloudwatch-logs-basics.yaml +64 -0
  5. package/courses/aws-lambda-debugging/scenarios/level-1/cold-start-basics.yaml +70 -0
  6. package/courses/aws-lambda-debugging/scenarios/level-1/environment-variable-issues.yaml +72 -0
  7. package/courses/aws-lambda-debugging/scenarios/level-1/first-debugging-shift.yaml +73 -0
  8. package/courses/aws-lambda-debugging/scenarios/level-1/handler-import-errors.yaml +71 -0
  9. package/courses/aws-lambda-debugging/scenarios/level-1/iam-permission-errors.yaml +68 -0
  10. package/courses/aws-lambda-debugging/scenarios/level-1/invocation-errors.yaml +72 -0
  11. package/courses/aws-lambda-debugging/scenarios/level-1/lambda-timeout-errors.yaml +65 -0
  12. package/courses/aws-lambda-debugging/scenarios/level-1/memory-and-oom.yaml +70 -0
  13. package/courses/aws-lambda-debugging/scenarios/level-2/async-invocation-failures.yaml +72 -0
  14. package/courses/aws-lambda-debugging/scenarios/level-2/cold-start-optimization.yaml +76 -0
  15. package/courses/aws-lambda-debugging/scenarios/level-2/dynamodb-streams-debugging.yaml +70 -0
  16. package/courses/aws-lambda-debugging/scenarios/level-2/intermediate-debugging-shift.yaml +71 -0
  17. package/courses/aws-lambda-debugging/scenarios/level-2/lambda-concurrency-management.yaml +70 -0
  18. package/courses/aws-lambda-debugging/scenarios/level-2/lambda-layers-debugging.yaml +76 -0
  19. package/courses/aws-lambda-debugging/scenarios/level-2/sam-local-debugging.yaml +74 -0
  20. package/courses/aws-lambda-debugging/scenarios/level-2/sqs-event-source.yaml +72 -0
  21. package/courses/aws-lambda-debugging/scenarios/level-2/vpc-networking-issues.yaml +71 -0
  22. package/courses/aws-lambda-debugging/scenarios/level-2/xray-tracing.yaml +62 -0
  23. package/courses/aws-lambda-debugging/scenarios/level-3/advanced-debugging-shift.yaml +72 -0
  24. package/courses/aws-lambda-debugging/scenarios/level-3/container-image-lambda.yaml +79 -0
  25. package/courses/aws-lambda-debugging/scenarios/level-3/cross-account-invocation.yaml +72 -0
  26. package/courses/aws-lambda-debugging/scenarios/level-3/eventbridge-patterns.yaml +79 -0
  27. package/courses/aws-lambda-debugging/scenarios/level-3/iac-deployment-debugging.yaml +68 -0
  28. package/courses/aws-lambda-debugging/scenarios/level-3/kinesis-stream-processing.yaml +64 -0
  29. package/courses/aws-lambda-debugging/scenarios/level-3/lambda-at-edge.yaml +64 -0
  30. package/courses/aws-lambda-debugging/scenarios/level-3/lambda-extensions-debugging.yaml +67 -0
  31. package/courses/aws-lambda-debugging/scenarios/level-3/powertools-observability.yaml +79 -0
  32. package/courses/aws-lambda-debugging/scenarios/level-3/step-functions-debugging.yaml +80 -0
  33. package/courses/aws-lambda-debugging/scenarios/level-4/cost-optimization-strategy.yaml +67 -0
  34. package/courses/aws-lambda-debugging/scenarios/level-4/expert-debugging-shift.yaml +62 -0
  35. package/courses/aws-lambda-debugging/scenarios/level-4/incident-management-serverless.yaml +61 -0
  36. package/courses/aws-lambda-debugging/scenarios/level-4/multi-region-serverless.yaml +67 -0
  37. package/courses/aws-lambda-debugging/scenarios/level-4/observability-platform-design.yaml +71 -0
  38. package/courses/aws-lambda-debugging/scenarios/level-4/serverless-architecture-design.yaml +64 -0
  39. package/courses/aws-lambda-debugging/scenarios/level-4/serverless-data-architecture.yaml +66 -0
  40. package/courses/aws-lambda-debugging/scenarios/level-4/serverless-migration-strategy.yaml +65 -0
  41. package/courses/aws-lambda-debugging/scenarios/level-4/serverless-security-design.yaml +60 -0
  42. package/courses/aws-lambda-debugging/scenarios/level-4/serverless-testing-strategy.yaml +62 -0
  43. package/courses/aws-lambda-debugging/scenarios/level-5/board-serverless-strategy.yaml +63 -0
  44. package/courses/aws-lambda-debugging/scenarios/level-5/consulting-serverless-adoption.yaml +57 -0
  45. package/courses/aws-lambda-debugging/scenarios/level-5/industry-serverless-patterns.yaml +62 -0
  46. package/courses/aws-lambda-debugging/scenarios/level-5/ma-serverless-integration.yaml +75 -0
  47. package/courses/aws-lambda-debugging/scenarios/level-5/master-debugging-shift.yaml +61 -0
  48. package/courses/aws-lambda-debugging/scenarios/level-5/organizational-serverless-transformation.yaml +65 -0
  49. package/courses/aws-lambda-debugging/scenarios/level-5/regulatory-serverless.yaml +61 -0
  50. package/courses/aws-lambda-debugging/scenarios/level-5/serverless-economics.yaml +65 -0
  51. package/courses/aws-lambda-debugging/scenarios/level-5/serverless-future-technology.yaml +66 -0
  52. package/courses/aws-lambda-debugging/scenarios/level-5/serverless-platform-design.yaml +71 -0
  53. package/courses/docker-container-debugging/course.yaml +11 -0
  54. package/courses/docker-container-debugging/scenarios/level-1/container-exit-codes.yaml +59 -0
  55. package/courses/docker-container-debugging/scenarios/level-1/container-networking-basics.yaml +69 -0
  56. package/courses/docker-container-debugging/scenarios/level-1/docker-logs-debugging.yaml +67 -0
  57. package/courses/docker-container-debugging/scenarios/level-1/dockerfile-build-failures.yaml +71 -0
  58. package/courses/docker-container-debugging/scenarios/level-1/environment-variable-issues.yaml +74 -0
  59. package/courses/docker-container-debugging/scenarios/level-1/first-debugging-shift.yaml +70 -0
  60. package/courses/docker-container-debugging/scenarios/level-1/image-pull-failures.yaml +68 -0
  61. package/courses/docker-container-debugging/scenarios/level-1/port-mapping-issues.yaml +67 -0
  62. package/courses/docker-container-debugging/scenarios/level-1/resource-limits-oom.yaml +70 -0
  63. package/courses/docker-container-debugging/scenarios/level-1/volume-mount-problems.yaml +66 -0
  64. package/courses/docker-container-debugging/scenarios/level-2/container-health-checks.yaml +73 -0
  65. package/courses/docker-container-debugging/scenarios/level-2/docker-compose-debugging.yaml +66 -0
  66. package/courses/docker-container-debugging/scenarios/level-2/docker-exec-debugging.yaml +71 -0
  67. package/courses/docker-container-debugging/scenarios/level-2/image-layer-optimization.yaml +81 -0
  68. package/courses/docker-container-debugging/scenarios/level-2/intermediate-debugging-shift.yaml +73 -0
  69. package/courses/docker-container-debugging/scenarios/level-2/logging-and-log-rotation.yaml +76 -0
  70. package/courses/docker-container-debugging/scenarios/level-2/multi-stage-build-debugging.yaml +76 -0
  71. package/courses/docker-container-debugging/scenarios/level-2/network-debugging-tools.yaml +67 -0
  72. package/courses/docker-container-debugging/scenarios/level-2/pid1-signal-handling.yaml +71 -0
  73. package/courses/docker-container-debugging/scenarios/level-2/security-scanning-basics.yaml +67 -0
  74. package/courses/docker-container-debugging/scenarios/level-3/advanced-debugging-shift.yaml +77 -0
  75. package/courses/docker-container-debugging/scenarios/level-3/buildkit-optimization.yaml +67 -0
  76. package/courses/docker-container-debugging/scenarios/level-3/container-filesystem-debugging.yaml +70 -0
  77. package/courses/docker-container-debugging/scenarios/level-3/container-security-hardening.yaml +74 -0
  78. package/courses/docker-container-debugging/scenarios/level-3/disk-space-management.yaml +74 -0
  79. package/courses/docker-container-debugging/scenarios/level-3/docker-api-automation.yaml +72 -0
  80. package/courses/docker-container-debugging/scenarios/level-3/docker-daemon-issues.yaml +73 -0
  81. package/courses/docker-container-debugging/scenarios/level-3/docker-in-docker-ci.yaml +69 -0
  82. package/courses/docker-container-debugging/scenarios/level-3/overlay-network-debugging.yaml +70 -0
  83. package/courses/docker-container-debugging/scenarios/level-3/production-container-ops.yaml +71 -0
  84. package/courses/docker-container-debugging/scenarios/level-4/cicd-pipeline-design.yaml +66 -0
  85. package/courses/docker-container-debugging/scenarios/level-4/container-monitoring-observability.yaml +63 -0
  86. package/courses/docker-container-debugging/scenarios/level-4/container-orchestration-strategy.yaml +62 -0
  87. package/courses/docker-container-debugging/scenarios/level-4/container-performance-engineering.yaml +64 -0
  88. package/courses/docker-container-debugging/scenarios/level-4/container-security-architecture.yaml +66 -0
  89. package/courses/docker-container-debugging/scenarios/level-4/enterprise-image-management.yaml +58 -0
  90. package/courses/docker-container-debugging/scenarios/level-4/expert-debugging-shift.yaml +63 -0
  91. package/courses/docker-container-debugging/scenarios/level-4/incident-response-containers.yaml +70 -0
  92. package/courses/docker-container-debugging/scenarios/level-4/multi-environment-management.yaml +65 -0
  93. package/courses/docker-container-debugging/scenarios/level-4/stateful-service-containers.yaml +65 -0
  94. package/courses/docker-container-debugging/scenarios/level-5/board-infrastructure-strategy.yaml +58 -0
  95. package/courses/docker-container-debugging/scenarios/level-5/consulting-container-strategy.yaml +61 -0
  96. package/courses/docker-container-debugging/scenarios/level-5/container-platform-architecture.yaml +67 -0
  97. package/courses/docker-container-debugging/scenarios/level-5/container-platform-economics.yaml +67 -0
  98. package/courses/docker-container-debugging/scenarios/level-5/container-technology-evolution.yaml +67 -0
  99. package/courses/docker-container-debugging/scenarios/level-5/disaster-recovery-containers.yaml +66 -0
  100. package/courses/docker-container-debugging/scenarios/level-5/industry-container-patterns.yaml +71 -0
  101. package/courses/docker-container-debugging/scenarios/level-5/master-debugging-shift.yaml +62 -0
  102. package/courses/docker-container-debugging/scenarios/level-5/organizational-transformation.yaml +67 -0
  103. package/courses/docker-container-debugging/scenarios/level-5/regulatory-compliance-containers.yaml +61 -0
  104. package/courses/kubernetes-deployment-troubleshooting/course.yaml +12 -0
  105. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/configmap-secret-issues.yaml +69 -0
  106. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/crashloopbackoff.yaml +68 -0
  107. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/deployment-rollout.yaml +56 -0
  108. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/first-troubleshooting-shift.yaml +65 -0
  109. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/health-probe-failures.yaml +70 -0
  110. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/imagepullbackoff.yaml +57 -0
  111. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/kubectl-debugging-basics.yaml +56 -0
  112. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/oomkilled.yaml +70 -0
  113. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/pending-pods.yaml +68 -0
  114. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/service-not-reachable.yaml +66 -0
  115. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/dns-resolution-failures.yaml +63 -0
  116. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/helm-deployment-failures.yaml +63 -0
  117. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/hpa-scaling-issues.yaml +62 -0
  118. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/ingress-routing-issues.yaml +63 -0
  119. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/init-container-failures.yaml +63 -0
  120. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/intermediate-troubleshooting-shift.yaml +66 -0
  121. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/network-policy-blocking.yaml +67 -0
  122. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/persistent-volume-issues.yaml +69 -0
  123. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/rbac-permission-denied.yaml +57 -0
  124. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/resource-quota-limits.yaml +64 -0
  125. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/advanced-troubleshooting-shift.yaml +69 -0
  126. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/cluster-upgrade-failures.yaml +71 -0
  127. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/gitops-drift-detection.yaml +62 -0
  128. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/job-cronjob-failures.yaml +67 -0
  129. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/monitoring-alerting-gaps.yaml +64 -0
  130. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/multi-container-debugging.yaml +68 -0
  131. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/node-pressure-evictions.yaml +70 -0
  132. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/pod-disruption-budgets.yaml +59 -0
  133. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/service-mesh-debugging.yaml +64 -0
  134. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/statefulset-troubleshooting.yaml +69 -0
  135. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/capacity-planning.yaml +65 -0
  136. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/cost-optimization.yaml +57 -0
  137. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/disaster-recovery-design.yaml +56 -0
  138. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/executive-communication.yaml +62 -0
  139. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/expert-troubleshooting-shift.yaml +65 -0
  140. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/incident-management-process.yaml +59 -0
  141. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/multi-cluster-operations.yaml +62 -0
  142. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/multi-tenancy-design.yaml +55 -0
  143. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/platform-engineering.yaml +59 -0
  144. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/security-hardening.yaml +58 -0
  145. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/behavioral-science.yaml +62 -0
  146. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/board-strategy.yaml +61 -0
  147. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/cloud-native-future.yaml +65 -0
  148. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/comprehensive-platform.yaml +57 -0
  149. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/consulting-engagement.yaml +62 -0
  150. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/industry-benchmarks.yaml +58 -0
  151. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/ma-integration.yaml +62 -0
  152. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/master-troubleshooting-shift.yaml +73 -0
  153. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/product-development.yaml +65 -0
  154. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/regulatory-compliance.yaml +76 -0
  155. package/courses/mysql-query-optimization/course.yaml +11 -0
  156. package/courses/mysql-query-optimization/scenarios/level-1/buffer-pool-basics.yaml +65 -0
  157. package/courses/mysql-query-optimization/scenarios/level-1/explain-basics.yaml +66 -0
  158. package/courses/mysql-query-optimization/scenarios/level-1/first-optimization-shift.yaml +78 -0
  159. package/courses/mysql-query-optimization/scenarios/level-1/innodb-index-fundamentals.yaml +68 -0
  160. package/courses/mysql-query-optimization/scenarios/level-1/join-basics.yaml +66 -0
  161. package/courses/mysql-query-optimization/scenarios/level-1/n-plus-one-queries.yaml +67 -0
  162. package/courses/mysql-query-optimization/scenarios/level-1/query-rewriting-basics.yaml +66 -0
  163. package/courses/mysql-query-optimization/scenarios/level-1/select-star-problems.yaml +68 -0
  164. package/courses/mysql-query-optimization/scenarios/level-1/slow-query-diagnosis.yaml +65 -0
  165. package/courses/mysql-query-optimization/scenarios/level-1/where-clause-optimization.yaml +65 -0
  166. package/courses/mysql-query-optimization/scenarios/level-2/buffer-pool-tuning.yaml +64 -0
  167. package/courses/mysql-query-optimization/scenarios/level-2/composite-index-design.yaml +71 -0
  168. package/courses/mysql-query-optimization/scenarios/level-2/covering-and-invisible-indexes.yaml +69 -0
  169. package/courses/mysql-query-optimization/scenarios/level-2/cte-and-window-functions.yaml +78 -0
  170. package/courses/mysql-query-optimization/scenarios/level-2/intermediate-optimization-shift.yaml +68 -0
  171. package/courses/mysql-query-optimization/scenarios/level-2/join-optimization.yaml +67 -0
  172. package/courses/mysql-query-optimization/scenarios/level-2/performance-schema-analysis.yaml +69 -0
  173. package/courses/mysql-query-optimization/scenarios/level-2/query-optimizer-hints.yaml +74 -0
  174. package/courses/mysql-query-optimization/scenarios/level-2/subquery-optimization.yaml +70 -0
  175. package/courses/mysql-query-optimization/scenarios/level-2/write-optimization.yaml +63 -0
  176. package/courses/mysql-query-optimization/scenarios/level-3/advanced-optimization-shift.yaml +71 -0
  177. package/courses/mysql-query-optimization/scenarios/level-3/connection-management.yaml +67 -0
  178. package/courses/mysql-query-optimization/scenarios/level-3/full-text-search.yaml +77 -0
  179. package/courses/mysql-query-optimization/scenarios/level-3/json-optimization.yaml +87 -0
  180. package/courses/mysql-query-optimization/scenarios/level-3/lock-contention-analysis.yaml +68 -0
  181. package/courses/mysql-query-optimization/scenarios/level-3/monitoring-alerting.yaml +63 -0
  182. package/courses/mysql-query-optimization/scenarios/level-3/online-schema-changes.yaml +79 -0
  183. package/courses/mysql-query-optimization/scenarios/level-3/partitioning-strategies.yaml +83 -0
  184. package/courses/mysql-query-optimization/scenarios/level-3/query-profiling-deep-dive.yaml +84 -0
  185. package/courses/mysql-query-optimization/scenarios/level-3/replication-optimization.yaml +66 -0
  186. package/courses/mysql-query-optimization/scenarios/level-4/aurora-vs-rds-evaluation.yaml +61 -0
  187. package/courses/mysql-query-optimization/scenarios/level-4/data-architecture.yaml +62 -0
  188. package/courses/mysql-query-optimization/scenarios/level-4/database-migration-planning.yaml +59 -0
  189. package/courses/mysql-query-optimization/scenarios/level-4/enterprise-governance.yaml +50 -0
  190. package/courses/mysql-query-optimization/scenarios/level-4/executive-communication.yaml +54 -0
  191. package/courses/mysql-query-optimization/scenarios/level-4/expert-optimization-shift.yaml +67 -0
  192. package/courses/mysql-query-optimization/scenarios/level-4/high-availability-architecture.yaml +60 -0
  193. package/courses/mysql-query-optimization/scenarios/level-4/optimizer-internals.yaml +62 -0
  194. package/courses/mysql-query-optimization/scenarios/level-4/performance-sla-design.yaml +52 -0
  195. package/courses/mysql-query-optimization/scenarios/level-4/read-replica-scaling.yaml +51 -0
  196. package/courses/mysql-query-optimization/scenarios/level-5/ai-database-future.yaml +45 -0
  197. package/courses/mysql-query-optimization/scenarios/level-5/behavioral-science.yaml +44 -0
  198. package/courses/mysql-query-optimization/scenarios/level-5/benchmark-design.yaml +47 -0
  199. package/courses/mysql-query-optimization/scenarios/level-5/board-strategy.yaml +48 -0
  200. package/courses/mysql-query-optimization/scenarios/level-5/comprehensive-platform.yaml +49 -0
  201. package/courses/mysql-query-optimization/scenarios/level-5/consulting-engagement.yaml +52 -0
  202. package/courses/mysql-query-optimization/scenarios/level-5/ma-database-integration.yaml +47 -0
  203. package/courses/mysql-query-optimization/scenarios/level-5/master-optimization-shift.yaml +56 -0
  204. package/courses/mysql-query-optimization/scenarios/level-5/product-development.yaml +48 -0
  205. package/courses/mysql-query-optimization/scenarios/level-5/regulatory-compliance.yaml +48 -0
  206. package/courses/postgresql-query-optimization/scenarios/level-5/comprehensive-database-system.yaml +70 -0
  207. package/courses/postgresql-query-optimization/scenarios/level-5/database-ai-future.yaml +81 -0
  208. package/courses/postgresql-query-optimization/scenarios/level-5/database-behavioral-science.yaml +63 -0
  209. package/courses/postgresql-query-optimization/scenarios/level-5/database-board-strategy.yaml +77 -0
  210. package/courses/postgresql-query-optimization/scenarios/level-5/database-consulting-engagement.yaml +61 -0
  211. package/courses/postgresql-query-optimization/scenarios/level-5/database-industry-benchmarks.yaml +64 -0
  212. package/courses/postgresql-query-optimization/scenarios/level-5/database-ma-integration.yaml +71 -0
  213. package/courses/postgresql-query-optimization/scenarios/level-5/database-product-development.yaml +72 -0
  214. package/courses/postgresql-query-optimization/scenarios/level-5/database-regulatory-landscape.yaml +76 -0
  215. package/courses/postgresql-query-optimization/scenarios/level-5/master-optimization-shift.yaml +66 -0
  216. package/courses/terraform-infrastructure-setup/course.yaml +11 -0
  217. package/courses/terraform-infrastructure-setup/scenarios/level-1/terraform-init-errors.yaml +72 -0
  218. package/dist/mcp/session-manager.d.ts +7 -4
  219. package/dist/mcp/session-manager.d.ts.map +1 -1
  220. package/dist/mcp/session-manager.js +23 -8
  221. package/dist/mcp/session-manager.js.map +1 -1
  222. package/package.json +1 -1
@@ -0,0 +1,56 @@
+ meta:
+   id: kubectl-debugging-basics
+   level: 1
+   course: kubernetes-deployment-troubleshooting
+   type: output
+   description: "Learn essential kubectl debugging commands — logs, describe, exec, port-forward, and events for systematic troubleshooting"
+   tags: [Kubernetes, kubectl, debugging, logs, describe, exec, beginner]
+
+ state: {}
+
+ trigger: |
+   A teammate asks for help debugging a misbehaving pod. You need to
+   walk them through the essential kubectl debugging toolkit.
+
+   Here's what they see:
+
+   $ kubectl get pods -n production
+   NAME                        READY   STATUS    RESTARTS   AGE
+   inventory-svc-6f7g8h9i-j0   1/1     Running   3          45m
+   inventory-svc-6f7g8h9i-k1   1/1     Running   0          45m
+
+   One pod has 3 restarts, the other has 0. Both show Running now.
+
+   They need to figure out:
+   1. Why did the first pod restart 3 times?
+   2. What's different about the two pods?
+   3. Is the application actually healthy?
+
+   Key commands to teach:
+   - kubectl describe pod: events, conditions, container state, restart reason
+   - kubectl logs: current and previous container output
+   - kubectl logs --previous: see logs from the crashed container
+   - kubectl exec: run commands inside the container
+   - kubectl port-forward: test the application locally
+   - kubectl get events: cluster-wide event timeline
+   - kubectl top pod: resource usage (requires metrics-server)
+
+   Task: Explain the essential kubectl debugging workflow. Write: the
+   purpose and usage of each debugging command (describe, logs, logs
+   --previous, exec, port-forward, events, top), the order to use them
+   for systematic debugging, what information each command reveals, and
+   how to debug multi-container pods (using -c flag).
+
+ assertions:
+   - type: llm_judge
+     criteria: "Core debugging commands are explained with purpose — kubectl describe pod shows events/conditions/state (first command to run), kubectl logs shows stdout/stderr output, kubectl logs --previous shows logs from the crashed container instance, kubectl exec -it allows interactive debugging inside the container, kubectl port-forward tunnels to test the app locally, kubectl get events shows cluster-wide timeline"
+     weight: 0.35
+     description: "Core debugging commands"
+   - type: llm_judge
+     criteria: "Systematic debugging order is presented — recommended flow: (1) kubectl get pods to see status/restarts, (2) kubectl describe pod for events and exit codes, (3) kubectl logs / logs --previous for application errors, (4) kubectl exec to inspect container state (env vars, files, connectivity), (5) kubectl top pod for resource usage. For multi-container pods: use -c <container-name> flag"
+     weight: 0.35
+     description: "Systematic debugging order"
+   - type: llm_judge
+     criteria: "Practical usage patterns are shown — how to stream logs (kubectl logs -f), follow multiple pods (kubectl logs -l app=inventory-svc), filter events by type (kubectl get events --field-selector type=Warning), test connectivity from inside a pod (kubectl exec -- curl, nslookup), and use ephemeral debug containers (kubectl debug) for distroless images that lack shell access"
+     weight: 0.30
+     description: "Practical usage patterns"
@@ -0,0 +1,70 @@
+ meta:
+   id: oomkilled
+   level: 1
+   course: kubernetes-deployment-troubleshooting
+   type: output
+   description: "Debug OOMKilled pods — understand memory limits, diagnose memory leaks, and configure appropriate resource limits"
+   tags: [Kubernetes, OOMKilled, memory, resource-limits, debugging, beginner]
+
+ state: {}
+
+ trigger: |
+   Your Java application pod keeps getting OOMKilled every few hours:
+
+   $ kubectl get pods
+   NAME                     READY   STATUS    RESTARTS   AGE
+   order-svc-8a9b0c1d-xyz   1/1     Running   4          6h
+
+   $ kubectl describe pod order-svc-8a9b0c1d-xyz
+   Last State:  Terminated
+     Reason:    OOMKilled
+     Exit Code: 137
+     Started:   2025-12-01T10:00:00Z
+     Finished:  2025-12-01T11:30:00Z
+
+   Containers:
+     order-svc:
+       Limits:
+         memory: 512Mi
+       Requests:
+         memory: 256Mi
+
+   $ kubectl top pod order-svc-8a9b0c1d-xyz
+   NAME                     CPU(cores)   MEMORY(bytes)
+   order-svc-8a9b0c1d-xyz   150m         498Mi
+
+   The pod is using 498Mi of its 512Mi limit — about to be killed again.
+
+   The application is a Java Spring Boot service with:
+   - JVM heap: not explicitly configured (defaults to 25% of container RAM)
+   - No -XX:MaxRAMPercentage set
+   - Container limit: 512Mi
+   - JVM sees 512Mi and sets max heap to ~128Mi
+   - But JVM total memory (heap + metaspace + threads + native) exceeds 512Mi
+
+   Questions:
+   1. What does OOMKilled mean and why is exit code 137?
+   2. Why does the JVM exceed the container memory limit?
+   3. How should you configure JVM memory in containers?
+   4. How do Kubernetes memory requests vs limits work?
+   5. How to monitor memory to prevent OOMKilled?
+
+   Task: Explain OOMKilled and how to fix it. Write: what OOMKilled
+   means (kernel OOM killer, exit code 137), why JVM apps often get
+   OOMKilled in containers, the correct JVM memory configuration for
+   containers, how requests and limits work (QoS classes), and memory
+   monitoring approach.
+
+ assertions:
+   - type: llm_judge
+     criteria: "OOMKilled is correctly explained — the Linux kernel's OOM killer terminates the process when it exceeds cgroup memory limit. Exit code 137 = 128 + 9 (SIGKILL). This is different from the application running out of heap space (which throws OutOfMemoryError). Kubernetes sets the cgroup limit based on the pod's memory limit"
+     weight: 0.35
+     description: "OOMKilled explained"
+   - type: llm_judge
+     criteria: "JVM container issue is addressed — JVM total memory = heap + metaspace + thread stacks + direct buffers + native memory. Even if heap is within limit, total can exceed. Fix: set -XX:MaxRAMPercentage=75 (leave 25% for non-heap), or explicitly set -Xmx to 75% of container limit. Modern JVMs (11+) are container-aware but still need tuning"
+     weight: 0.35
+     description: "JVM container memory"
+   - type: llm_judge
+     criteria: "Requests, limits, and QoS are explained — requests: minimum guaranteed memory (used for scheduling), limits: maximum allowed (enforced by cgroup). QoS classes: Guaranteed (requests=limits), Burstable (requests<limits), BestEffort (no requests/limits). Guaranteed pods are last to be evicted. Monitoring: kubectl top, Prometheus metrics, set up alerts before hitting limits"
+     weight: 0.30
+     description: "Requests, limits, QoS"
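The arithmetic behind this scenario is worth checking explicitly: the exit code, the JVM's default heap sizing, and the headroom the 75% rule leaves for non-heap memory. The numbers below come from the scenario itself; the 75% split is the rule of thumb the assertions cite, not a hard JVM requirement.

```python
# Back-of-the-envelope check of the OOMKilled scenario's numbers.

SIGKILL = 9
oom_exit_code = 128 + SIGKILL          # exit code 137 reported by kubectl

container_limit_mib = 512              # the pod's memory limit

# Default: JVM takes 25% of visible RAM as max heap.
default_heap = container_limit_mib * 0.25          # ~128Mi heap

# Tuned: -XX:MaxRAMPercentage=75 gives the heap 75% of the limit,
# leaving the rest for metaspace, thread stacks, and native memory.
tuned_heap = container_limit_mib * 0.75            # 384Mi heap
non_heap_headroom = container_limit_mib - tuned_heap  # 128Mi headroom

print(oom_exit_code)                   # 137
print(default_heap)                    # 128.0
print(tuned_heap, non_heap_headroom)   # 384.0 128.0
```

The point the calculation makes: even a tiny 128Mi heap can coexist with an OOMKilled container, because the kernel enforces the limit on the whole process, not just the heap.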
@@ -0,0 +1,68 @@
1
+ meta:
2
+ id: pending-pods
3
+ level: 1
4
+ course: kubernetes-deployment-troubleshooting
5
+ type: output
6
+ description: "Debug Pending pods — diagnose why pods won't schedule, from insufficient resources to node affinity mismatches"
7
+ tags: [Kubernetes, Pending, scheduling, resources, node-affinity, beginner]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ You deployed a new service but the pods are stuck in Pending state
13
+ for over 10 minutes:
14
+
15
+ $ kubectl get pods
16
+ NAME READY STATUS RESTARTS AGE
17
+ analytics-svc-9d8e7f6g-a1 0/1 Pending 0 10m
18
+ analytics-svc-9d8e7f6g-b2 0/1 Pending 0 10m
19
+ analytics-svc-9d8e7f6g-c3 0/1 Pending 0 10m
20
+
21
+ $ kubectl describe pod analytics-svc-9d8e7f6g-a1
22
+ Events:
23
+ Warning FailedScheduling 10m default-scheduler
24
+ 0/5 nodes are available: 2 Insufficient cpu, 3 node(s) had
25
+ untolerated taint {dedicated: gpu-workload}
26
+
27
+ The deployment requests:
28
+ resources:
29
+ requests:
30
+ cpu: "4"
31
+ memory: "8Gi"
32
+
33
+ Cluster nodes:
34
+ - node-1: 8 CPU, 32GB — 6.5 CPU already allocated
35
+ - node-2: 8 CPU, 32GB — 7 CPU already allocated
36
+ - node-3: 16 CPU, 64GB — taint: dedicated=gpu-workload:NoSchedule
37
+ - node-4: 16 CPU, 64GB — taint: dedicated=gpu-workload:NoSchedule
38
+ - node-5: 16 CPU, 64GB — taint: dedicated=gpu-workload:NoSchedule
39
+
40
+ So: nodes 1-2 don't have enough CPU available, nodes 3-5 have taints
41
+ that the pod doesn't tolerate.
42
+
43
+ Common Pending causes:
44
+ 1. Insufficient CPU or memory on available nodes
45
+ 2. Node taints without matching tolerations
46
+ 3. Node affinity/anti-affinity rules not satisfied
47
+ 4. PVC not bound (waiting for volume)
48
+ 5. ResourceQuota exceeded for namespace
49
+ 6. Too many pods (max pods per node reached)
50
+
51
+ Task: Explain how to debug Pending pods. Write: what Pending means
52
+ (scheduler can't place the pod), how to read the FailedScheduling
53
+ event message, common causes and their fixes, how taints and
54
+ tolerations work, and how to check cluster capacity.
55
+
56
+ assertions:
57
+ - type: llm_judge
58
+ criteria: "Pending state is explained — the pod is in the scheduling queue but no node satisfies all constraints (resource requests, taints, affinity rules, PVC availability). The FailedScheduling event message tells you exactly why — it lists how many nodes failed each check"
59
+ weight: 0.35
60
+ description: "Pending state explained"
61
+ - type: llm_judge
62
+ criteria: "Taints and tolerations are explained — taints on nodes repel pods unless the pod has a matching toleration. NoSchedule prevents scheduling, PreferNoSchedule is soft, NoExecute evicts existing pods. In this case, nodes 3-5 have dedicated=gpu-workload:NoSchedule, so non-GPU pods can't schedule there"
63
+ weight: 0.35
64
+ description: "Taints and tolerations"
65
+ - type: llm_judge
66
+ criteria: "Fixes are practical — options: reduce resource requests, add capacity (new nodes), add toleration to the pod spec (if appropriate), check if allocated resources can be reclaimed (pods using less than requested), or use the cluster autoscaler. Shows kubectl commands to check node capacity (kubectl describe node, kubectl top node)"
67
+ weight: 0.30
68
+ description: "Practical fixes"
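For reference, the two fixes the scenario points at can be sketched as a Deployment fragment. A minimal sketch, not part of the chart — the image tag is a placeholder, and the toleration is appropriate only if this workload actually belongs on the GPU nodes:

```yaml
# Option A: lower the CPU request so nodes 1-2 (with ~1.5 and ~1 CPU
# free) can fit the pod. Option B: tolerate the taint on nodes 3-5.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: analytics-svc
spec:
  template:
    spec:
      containers:
        - name: analytics-svc
          image: analytics-svc:latest   # placeholder image
          resources:
            requests:
              cpu: "1"        # was "4" -- now fits the free CPU on node-1
              memory: "8Gi"
      # Option B (only if appropriate): match the nodes' taint
      # dedicated=gpu-workload:NoSchedule so the scheduler may use them.
      tolerations:
        - key: dedicated
          operator: Equal
          value: gpu-workload
          effect: NoSchedule
```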
@@ -0,0 +1,66 @@
1
+ meta:
2
+ id: service-not-reachable
3
+ level: 1
4
+ course: kubernetes-deployment-troubleshooting
5
+ type: output
6
+ description: "Debug Service connectivity — diagnose why a Kubernetes Service can't reach its backend pods"
7
+ tags: [Kubernetes, Service, networking, endpoints, DNS, beginner]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your frontend app can't connect to the backend API service. The
13
+ frontend logs show:
14
+
15
+ Error: connect ECONNREFUSED api-service:8080
16
+ Error: getaddrinfo ENOTFOUND api-service
17
+
18
+ Both the frontend and backend are running in the same namespace.
19
+
20
+ $ kubectl get pods -n app
21
+ NAME READY STATUS RESTARTS AGE
22
+ frontend-6a7b8c9d-p1 1/1 Running 0 10m
23
+ backend-api-3e4f5g6h-q2 1/1 Running 0 10m
24
+
25
+ $ kubectl get svc -n app
26
+ NAME TYPE CLUSTER-IP PORT(S) AGE
27
+ api-service ClusterIP 10.96.45.12 80/TCP 10m
28
+
29
+ $ kubectl get endpoints api-service -n app
30
+ NAME ENDPOINTS AGE
31
+ api-service <none> 10m
32
+
33
+ The endpoints are empty! The Service exists and the pod is Running,
34
+ but they're not connected.
35
+
36
+ Investigation:
37
+ - Service selector: app=backend-api
38
+ - Pod labels: app=api-backend (label mismatch!)
39
+ - The Service uses port 80 but the container listens on 8080
40
+ (targetPort should be 8080)
41
+
42
+ Two problems:
43
+ 1. Label selector mismatch: Service looks for app=backend-api but
44
+ pods have app=api-backend
45
+ 2. Port mismatch: Service port 80 → targetPort not set (defaults
46
+ to 80, but container listens on 8080)
47
+
48
+ Task: Explain how Kubernetes Services connect to pods. Write: how
49
+ selectors match pods to Services (label-based), how to verify
50
+ endpoints, the port vs targetPort distinction, common Service
51
+ connectivity issues, and how to test connectivity from inside the
52
+ cluster.
53
+
54
+ assertions:
55
+ - type: llm_judge
56
+ criteria: "Service-to-pod connection is explained — Services use label selectors to find matching pods and create endpoints. If no pods match the selector, endpoints list is empty and the Service has nothing to route to. The selector must exactly match the pod labels"
57
+ weight: 0.35
58
+ description: "Service-pod connection"
59
+ - type: llm_judge
60
+ criteria: "Debugging steps identify both issues — check endpoints (kubectl get endpoints), compare Service selector with pod labels (kubectl get svc -o yaml vs kubectl get pod --show-labels), verify port configuration (Service port vs container port vs targetPort). Shows how to test with kubectl exec and curl/wget from within the cluster"
61
+ weight: 0.35
62
+ description: "Debugging identifies issues"
63
+ - type: llm_judge
64
+ criteria: "Port configuration is clearly explained — Service port: what clients connect to, targetPort: the container port to forward to (defaults to port if not specified), containerPort: informational only (doesn't actually restrict access). DNS resolution: <service-name>.<namespace>.svc.cluster.local"
65
+ weight: 0.30
66
+ description: "Port configuration explained"
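The two fixes described above can be sketched as a corrected Service manifest. A sketch under the scenario's assumptions (namespace `app`, pod label `app=api-backend`, container listening on 8080):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api-service
  namespace: app
spec:
  selector:
    app: api-backend      # must exactly match the pod labels
  ports:
    - port: 80            # what clients connect to: api-service:80
      targetPort: 8080    # the container's actual listening port
```

With the selector matching, `kubectl get endpoints api-service -n app` should list the pod IP instead of `<none>`.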
@@ -0,0 +1,63 @@
1
+ meta:
2
+ id: dns-resolution-failures
3
+ level: 2
4
+ course: kubernetes-deployment-troubleshooting
5
+ type: output
6
+ description: "Debug DNS resolution failures — diagnose CoreDNS issues, service discovery problems, and cross-namespace resolution"
7
+ tags: [Kubernetes, DNS, CoreDNS, service-discovery, networking, intermediate]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your microservices can't discover each other. Applications report DNS
13
+ resolution failures:
14
+
15
+ $ kubectl exec -it frontend-pod -- nslookup api-service
16
+ ;; connection timed out; no servers could be reached
17
+
18
+ $ kubectl exec -it frontend-pod -- cat /etc/resolv.conf
19
+ nameserver 10.96.0.10
20
+ search default.svc.cluster.local svc.cluster.local cluster.local
21
+ options ndots:5
22
+
23
+ $ kubectl get pods -n kube-system -l k8s-app=kube-dns
24
+ NAME READY STATUS RESTARTS AGE
25
+ coredns-5d78c9869d-abc12 0/1 CrashLoopBackOff 8 1h
26
+ coredns-5d78c9869d-def34 0/1 CrashLoopBackOff 8 1h
27
+
28
+ Both CoreDNS pods are crashing! Without DNS, no service can resolve
29
+ any other service by name.
30
+
31
+ $ kubectl logs coredns-5d78c9869d-abc12 -n kube-system
32
+ [FATAL] plugin/loop: Loop detected for zone ".", forwarding to
33
+ "10.96.0.10", aborting.
34
+
35
+ CoreDNS detected a DNS loop — it's forwarding queries to itself.
36
+ The node's /etc/resolv.conf points to the cluster DNS IP (10.96.0.10),
37
+ so CoreDNS tries to forward to itself, creating an infinite loop.
38
+
39
+ Additionally, even after fixing CoreDNS, a service in the "backend"
40
+ namespace can't be reached from the "frontend" namespace using just
41
+ the service name — cross-namespace resolution requires the full DNS
42
+ name: <service>.<namespace>.svc.cluster.local
43
+
44
+ Task: Explain Kubernetes DNS and how to debug resolution failures.
45
+ Write: how CoreDNS provides service discovery, the DNS name format
46
+ (<svc>.<ns>.svc.cluster.local), the search domains in resolv.conf
47
+ and ndots setting, common CoreDNS failures (loop detection, resource
48
+ exhaustion, network policy blocking), cross-namespace resolution,
49
+ and how to test DNS from inside pods.
50
+
51
+ assertions:
52
+ - type: llm_judge
53
+ criteria: "Kubernetes DNS architecture is explained — CoreDNS runs as a Deployment in kube-system, provides DNS for all services. DNS format: <service>.<namespace>.svc.cluster.local. The search domains in /etc/resolv.conf allow short names within the same namespace. ndots:5 means names with fewer than 5 dots get the search domains appended first. Cross-namespace requires at least <service>.<namespace>"
54
+ weight: 0.35
55
+ description: "DNS architecture explained"
56
+ - type: llm_judge
57
+ criteria: "CoreDNS failure modes are covered — loop detection (forwarding to itself, common when node resolv.conf points to cluster DNS), OOMKilled (too many DNS queries), CrashLoopBackOff (configuration errors), NetworkPolicy blocking DNS traffic on port 53. The loop detection fix: edit CoreDNS ConfigMap to forward to upstream DNS (e.g., 8.8.8.8) instead of /etc/resolv.conf"
58
+ weight: 0.35
59
+ description: "CoreDNS failures covered"
60
+ - type: llm_judge
61
+ criteria: "Debugging and testing are practical — test with kubectl exec nslookup/dig, check CoreDNS pods and logs, inspect CoreDNS ConfigMap (kubectl get cm coredns -n kube-system), verify the kube-dns Service exists (kubectl get svc kube-dns -n kube-system), use a debug pod with networking tools (nicolaka/netshoot). Headless services (clusterIP: None) return pod IPs directly instead of cluster IP"
62
+ weight: 0.30
63
+ description: "Debugging and testing"
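The loop-detection fix described above lands in the CoreDNS ConfigMap. A minimal sketch of a standard Corefile with the `forward` directive pointed at explicit upstream resolvers instead of `/etc/resolv.conf` (which, on these nodes, points back at the cluster DNS IP); the upstream addresses are examples:

```yaml
# Edit with: kubectl -n kube-system edit configmap coredns
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        forward . 8.8.8.8 1.1.1.1   # was: forward . /etc/resolv.conf
        cache 30
        loop
        reload
    }
```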
@@ -0,0 +1,63 @@
1
+ meta:
2
+ id: helm-deployment-failures
3
+ level: 2
4
+ course: kubernetes-deployment-troubleshooting
5
+ type: output
6
+ description: "Debug Helm deployment failures — diagnose template rendering errors, stuck releases, and rollback issues"
7
+ tags: [Kubernetes, Helm, deployment, chart, rollback, intermediate]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your Helm upgrade failed and now the release is stuck in a bad state:
13
+
14
+ $ helm upgrade myapp ./chart --values prod-values.yaml
15
+ Error: UPGRADE FAILED: template: chart/templates/deployment.yaml:42:
16
+ function "toJson" not defined
17
+
18
+ The chart uses a custom template function that doesn't exist. After
19
+ fixing the template, you try again:
20
+
21
+ $ helm upgrade myapp ./chart --values prod-values.yaml
22
+ Error: UPGRADE FAILED: another operation (install/upgrade/rollback)
23
+ is in progress
24
+
25
+ The release is stuck in "pending-upgrade" status from the failed attempt:
26
+
27
+ $ helm list
28
+ NAME NAMESPACE REVISION STATUS CHART
29
+ myapp default 5 pending-upgrade myapp-2.3.0
30
+
31
+ $ helm history myapp
32
+ REVISION STATUS CHART DESCRIPTION
33
+ 3 superseded myapp-2.1.0 Upgrade complete
34
+ 4 superseded myapp-2.2.0 Upgrade complete
35
+ 5 pending-upgrade myapp-2.3.0 Preparing upgrade
36
+
37
+ The release is stuck because Helm's operation tracking thinks an
38
+ upgrade is still in progress. You need to rollback first:
39
+
40
+ $ helm rollback myapp 4
41
+ Rollback was a success! Happy Helming!
42
+
43
+ Now the upgrade can proceed with the fixed template.
44
+
45
+ Task: Explain Helm troubleshooting. Write: how Helm manages releases
46
+ (revision history, status tracking), common template errors and how
47
+ to debug them (helm template, helm lint, --dry-run), stuck release
48
+ states and how to fix them, helm rollback workflow, helm get commands
49
+ for inspecting releases, and best practices for safe Helm deployments.
50
+
51
+ assertions:
52
+ - type: llm_judge
53
+ criteria: "Helm release management is explained — Helm stores release history as Secrets in the namespace, each upgrade creates a new revision. Release statuses: deployed, pending-upgrade, pending-install, pending-rollback, failed, superseded. A stuck pending state means the previous operation didn't complete cleanly. Fix with helm rollback to last good revision"
54
+ weight: 0.35
55
+ description: "Release management"
56
+ - type: llm_judge
57
+ criteria: "Template debugging is covered — helm template renders templates locally without installing (catches syntax errors), helm lint validates chart structure and values, --dry-run --debug shows rendered manifests before applying. helm get manifest shows what was actually deployed, helm get values shows values used. Common errors: undefined functions, missing values, YAML indentation"
58
+ weight: 0.35
59
+ description: "Template debugging"
60
+ - type: llm_judge
61
+ criteria: "Best practices are practical — always use --dry-run before production upgrades, set --timeout and --wait for reliable status tracking, use --atomic flag (auto-rollback on failure), keep revision history (--history-max), version pin chart dependencies, use helm diff plugin to preview changes before applying"
62
+ weight: 0.30
63
+ description: "Best practices"
@@ -0,0 +1,62 @@
1
+ meta:
2
+ id: hpa-scaling-issues
3
+ level: 2
4
+ course: kubernetes-deployment-troubleshooting
5
+ type: output
6
+ description: "Debug HPA scaling issues — diagnose why autoscaling isn't working, from missing metrics to thrashing"
7
+ tags: [Kubernetes, HPA, autoscaling, metrics-server, scaling, intermediate]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your HPA isn't scaling despite high CPU usage on your pods:
13
+
14
+ $ kubectl get hpa
15
+ NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
16
+ web-app Deployment/web-app <unknown>/70% 2 10 2 30m
17
+
18
+ The TARGETS column shows <unknown> — the HPA can't read metrics.
19
+
20
+ $ kubectl describe hpa web-app
21
+ Conditions:
22
+ Type Status Reason
23
+ AbleToScale True SucceededGetScale
24
+ ScalingActive False FailedGetResourceMetric
25
+ Events:
26
+ Warning FailedComputeMetricsReplicas 1m horizontal-pod-autoscaler
27
+ invalid metrics (1 invalid out of 1), first error is: failed to get
28
+ cpu utilization: missing request for cpu in container "web-app"
29
+
30
+ Two issues found:
31
+ 1. The pods don't have CPU requests set — HPA calculates utilization
32
+ as a percentage of the request, so without requests it can't compute utilization
33
+ 2. After fixing requests, metrics-server isn't installed:
34
+
35
+ $ kubectl top pods
36
+ error: Metrics API not available
37
+
38
+ $ kubectl get deployment metrics-server -n kube-system
39
+ Error from server (NotFound): deployments.apps "metrics-server" not found
40
+
41
+ After installing metrics-server and setting CPU requests, the HPA
42
+ works but thrashes — scaling up and down rapidly every minute.
43
+
44
+ Task: Explain HPA and how to debug scaling issues. Write: how HPA
45
+ works (metrics → desired replicas calculation), why CPU requests are
46
+ required, metrics-server role, the stabilization window to prevent
47
+ thrashing, custom metrics with Prometheus adapter, and how to tune
48
+ HPA behavior.
49
+
50
+ assertions:
51
+ - type: llm_judge
52
+ criteria: "HPA mechanics are explained — HPA queries metrics API every 15s (default), calculates desired replicas as ceil(currentReplicas * (currentMetric/targetMetric)). CPU utilization is percentage of CPU request, so requests MUST be set. metrics-server collects resource metrics from kubelets and exposes them via the Metrics API. Without metrics-server or without requests, HPA shows <unknown>"
53
+ weight: 0.35
54
+ description: "HPA mechanics"
55
+ - type: llm_judge
56
+ criteria: "Debugging steps are systematic — check TARGETS column (unknown = no metrics), kubectl describe hpa for conditions and events, verify metrics-server running, verify pods have resource requests, check if ScalingActive condition is True. Use kubectl top pod to verify metrics are available. Check HPA events for error messages"
57
+ weight: 0.35
58
+ description: "Debugging steps"
59
+ - type: llm_judge
60
+ criteria: "Thrashing prevention and tuning are covered — stabilization window (scaleDown stabilizationWindowSeconds defaults to 300s, prevents rapid scale-down), scaling policies (pods-per-minute or percent-per-minute limits), behavior field in HPA spec for fine-grained control. Custom metrics via Prometheus adapter for application-specific scaling (requests per second, queue depth)"
61
+ weight: 0.30
62
+ description: "Thrashing and tuning"
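The tuning described above uses the `behavior` field of the `autoscaling/v2` HPA. A sketch assuming the web-app Deployment now sets CPU requests:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the CPU *request*
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5m of low load before shrinking
      policies:
        - type: Pods
          value: 1            # remove at most 1 pod...
          periodSeconds: 60   # ...per minute
```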
@@ -0,0 +1,63 @@
1
+ meta:
2
+ id: ingress-routing-issues
3
+ level: 2
4
+ course: kubernetes-deployment-troubleshooting
5
+ type: output
6
+ description: "Debug Ingress routing issues — diagnose 404s, TLS errors, and misconfigured path routing in Ingress resources"
7
+ tags: [Kubernetes, Ingress, routing, TLS, nginx-ingress, intermediate]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your application is returning 404 for all API routes through the
13
+ Ingress, but the app works fine when accessed via port-forward:
14
+
15
+ $ curl https://app.example.com/api/users
16
+ <html><body><h1>404 Not Found</h1></body></html>
17
+
18
+ $ kubectl port-forward svc/api-service 8080:80
19
+ $ curl http://localhost:8080/api/users
20
+ {"users": [...]} # Works!
21
+
22
+ $ kubectl get ingress
23
+ NAME CLASS HOSTS ADDRESS PORTS AGE
24
+ app-ing nginx app.example.com 10.0.50.100 80, 443 30m
25
+
26
+ $ kubectl describe ingress app-ing
27
+ Rules:
28
+ Host Path Backends
29
+ app.example.com
30
+ /api api-service:80
31
+ Annotations:
32
+ nginx.ingress.kubernetes.io/rewrite-target: /
33
+
34
+ The problem: the rewrite-target annotation rewrites /api/users to
35
+ just / — the backend receives / instead of /api/users. The app needs
36
+ the full path.
37
+
38
+ Additionally, HTTPS isn't working:
39
+ $ curl https://app.example.com/
40
+ curl: (60) SSL: certificate subject name does not match target host name
41
+
42
+ The TLS secret has a certificate for "*.internal.com", not
43
+ "app.example.com".
44
+
45
+ Task: Explain how Kubernetes Ingress works and how to debug routing
46
+ issues. Write: how Ingress controllers route traffic (host-based and
47
+ path-based), path types (Exact, Prefix, ImplementationSpecific),
48
+ the rewrite-target annotation and when to use it, TLS configuration
49
+ and certificate management, and debugging techniques for 404/502/503.
50
+
51
+ assertions:
52
+ - type: llm_judge
53
+ criteria: "Ingress routing is explained — Ingress resource defines rules mapping hostnames and paths to backend Services. Ingress controller (nginx, traefik, etc.) implements the routing. Path types: Exact (exact match only), Prefix (prefix match, / matches everything), ImplementationSpecific (controller decides). The rewrite-target annotation modifies the URL path before forwarding to the backend"
54
+ weight: 0.35
55
+ description: "Ingress routing explained"
56
+ - type: llm_judge
57
+ criteria: "TLS and common errors are covered — TLS configured via tls section referencing a Kubernetes Secret containing tls.crt and tls.key. Certificate must match the hostname in the Ingress rules. Common errors: 404 (path mismatch, wrong backend, rewrite issues), 502 (backend pod not running or port wrong), 503 (no endpoints, readiness probe failing). cert-manager for automatic certificate management"
58
+ weight: 0.35
59
+ description: "TLS and errors"
60
+ - type: llm_judge
61
+ criteria: "Debugging workflow is practical — check Ingress controller logs (kubectl logs -n ingress-nginx <controller-pod>), verify backend Service exists and has endpoints, test directly via port-forward to isolate Ingress vs app issue, check Ingress controller config (kubectl exec into controller and inspect nginx.conf), verify DNS resolves correctly, check annotations match the controller type"
62
+ weight: 0.30
63
+ description: "Debugging workflow"
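Both fixes can be sketched as a corrected Ingress: the `rewrite-target` annotation is dropped so the backend receives the full `/api/users` path, and the `tls` section references a Secret whose certificate actually covers the host (the secret name here is an assumption):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ing
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - app.example.com
      secretName: app-example-com-tls   # must contain a cert for app.example.com
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /api
            pathType: Prefix        # matches /api and everything under it
            backend:
              service:
                name: api-service
                port:
                  number: 80
```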
@@ -0,0 +1,63 @@
1
+ meta:
2
+ id: init-container-failures
3
+ level: 2
4
+ course: kubernetes-deployment-troubleshooting
5
+ type: output
6
+ description: "Debug init container failures — diagnose why pods are stuck in Init state and how init containers affect pod startup"
7
+ tags: [Kubernetes, init-containers, pod-startup, dependencies, intermediate]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your application pod is stuck in Init:0/2 state and never starts:
13
+
14
+ $ kubectl get pods
15
+ NAME READY STATUS RESTARTS AGE
16
+ webapp-4a5b6c7d-e8f9 0/1 Init:0/2 0 10m
17
+
18
+ $ kubectl describe pod webapp-4a5b6c7d-e8f9
19
+ Init Containers:
20
+ wait-for-db:
21
+ Image: busybox
22
+ Command: ["sh", "-c", "until nc -z postgres-svc 5432; do echo
23
+ waiting for postgres; sleep 2; done"]
24
+ State: Running
25
+ Started: 2025-12-01T10:00:00Z
26
+ run-migrations:
27
+ Image: webapp:v2.0
28
+ Command: ["./migrate", "--up"]
29
+ State: Waiting
30
+ Reason: PodInitializing
31
+
32
+ Events:
33
+ Normal Pulled 10m kubelet Container image "busybox" pulled
34
+ Normal Created 10m kubelet Created container wait-for-db
35
+ Normal Started 10m kubelet Started container wait-for-db
36
+
37
+ The first init container (wait-for-db) is running forever because
38
+ postgres-svc doesn't exist in this namespace — the database is in the
39
+ "data" namespace and should be referenced as postgres-svc.data.
40
+
41
+ The second init container (run-migrations) can't start because init
42
+ containers run sequentially — it waits for wait-for-db to complete.
43
+
44
+ Task: Explain init containers and how to debug them. Write: what init
45
+ containers are (run-to-completion before main container starts), how
46
+ they run sequentially, how to read Init:X/Y status, how to view init
47
+ container logs (kubectl logs <pod> -c <init-container>), common init
48
+ container patterns (wait for dependency, run migrations, download
49
+ config), and how to fix stuck init containers.
50
+
51
+ assertions:
52
+ - type: llm_judge
53
+ criteria: "Init containers are explained — specialized containers that run to completion before the main container starts. They run sequentially (init-1 must succeed before init-2 starts). Init:0/2 means 0 of 2 init containers have completed. If an init container fails, the pod restarts (applying the restartPolicy). Init containers share volumes with the main container but have their own image and command"
54
+ weight: 0.35
55
+ description: "Init containers explained"
56
+ - type: llm_judge
57
+ criteria: "Debugging workflow is clear — read Init status (Init:X/Y shows progress), kubectl describe pod shows each init container state and events, kubectl logs <pod> -c <init-container-name> shows init container output (critical for debugging), identify if init container is stuck running (infinite wait loop) vs failing (exit code). In this case, the DNS name is wrong — should use cross-namespace DNS"
58
+ weight: 0.35
59
+ description: "Init container debugging"
60
+ - type: llm_judge
61
+ criteria: "Common patterns and fixes covered — patterns: wait-for-dependency (check service availability), run-migrations (database schema changes), download-config (fetch from external source), set-permissions (chown/chmod on shared volumes). Fixes: add timeouts to wait loops, use correct DNS names for cross-namespace services (<svc>.<namespace>.svc.cluster.local), add resource limits to init containers"
62
+ weight: 0.30
63
+ description: "Patterns and fixes"
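The fixes above can be sketched as the pod template fragment: the wait loop uses the cross-namespace DNS name (the database lives in the "data" namespace) and gives up after ~2 minutes instead of spinning forever:

```yaml
spec:
  initContainers:
    - name: wait-for-db
      image: busybox
      command:
        - sh
        - -c
        - |
          i=0
          until nc -z postgres-svc.data.svc.cluster.local 5432; do
            i=$((i+1))
            [ "$i" -ge 60 ] && echo "giving up on postgres" && exit 1
            echo "waiting for postgres"; sleep 2
          done
    - name: run-migrations        # starts only after wait-for-db succeeds
      image: webapp:v2.0
      command: ["./migrate", "--up"]
  containers:
    - name: webapp
      image: webapp:v2.0
```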
@@ -0,0 +1,66 @@
1
+ meta:
2
+ id: intermediate-troubleshooting-shift
3
+ level: 2
4
+ course: kubernetes-deployment-troubleshooting
5
+ type: output
6
+ description: "Combined intermediate troubleshooting shift — diagnose interconnected failures involving storage, networking, RBAC, and scaling"
7
+ tags: [Kubernetes, troubleshooting, combined, shift-simulation, intermediate]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ You're investigating a complex outage in the "ecommerce" namespace.
13
+ A Helm upgrade was deployed 30 minutes ago and multiple things broke:
14
+
15
+ $ kubectl get pods -n ecommerce
16
+ NAME READY STATUS RESTARTS AGE
17
+ catalog-svc-new-7a8b-c9d0 0/1 Pending 0 30m
18
+ catalog-svc-old-1e2f-g3h4 1/1 Running 0 2d
19
+ cart-svc-5i6j7k8l-m9n0 0/1 CrashLoopBackOff 8 30m
20
+ search-svc-1o2p3q4r-s5t6 1/1 Running 0 30m
21
+ payment-svc-7u8v9w0x-y1z2 1/1 Running 0 2d
22
+
23
+ $ kubectl get hpa -n ecommerce
24
+ NAME TARGETS MINPODS MAXPODS REPLICAS
25
+ catalog-svc <unknown>/80% 2 20 1
26
+
27
+ $ kubectl get ingress -n ecommerce
28
+ NAME HOSTS ADDRESS PORTS AGE
29
+ ecommerce shop.example.com 80, 443 30m
30
+
31
+ Investigation reveals four interconnected issues:
32
+
33
+ 1. catalog-svc — new pod Pending, PVC can't bind. The Helm upgrade
34
+ changed StorageClass from "standard" to "premium-ssd" which doesn't
35
+ exist. Old pod still running because Deployment strategy is
36
+ RollingUpdate with maxUnavailable=0.
37
+
38
+ 2. cart-svc — CrashLoopBackOff, logs show "FORBIDDEN: cannot list
39
+ endpoints in namespace ecommerce." The new version uses Kubernetes
40
+ API for service discovery but the ServiceAccount lacks RBAC.
41
+
42
+ 3. HPA — showing <unknown> targets because the Helm upgrade removed
43
+ resource requests from the catalog-svc Deployment spec.
44
+
45
+ 4. Ingress — no ADDRESS assigned. The Ingress resource was recreated
46
+ with an invalid ingressClassName that doesn't match any controller.
47
+
48
+ Task: Walk through diagnosing and fixing all four issues. Write:
49
+ the triage approach for a Helm-triggered multi-failure incident,
50
+ how to identify the Helm upgrade as the common cause, the fix for
51
+ each issue, whether to rollback the entire Helm release or fix
52
+ forward, and the post-incident review process.
53
+
54
+ assertions:
55
+ - type: llm_judge
56
+ criteria: "All four issues are diagnosed — (1) PVC binding failure from non-existent StorageClass, (2) RBAC Forbidden error for cart-svc ServiceAccount, (3) HPA <unknown> from missing CPU requests, (4) Ingress no address from wrong ingressClassName. The root cause is identified as the Helm upgrade introducing multiple configuration errors"
57
+ weight: 0.35
58
+ description: "All issues diagnosed"
59
+ - type: llm_judge
60
+ criteria: "Rollback vs fix-forward decision is analyzed — helm rollback is faster and safer when multiple issues exist (reverts all changes at once), but may lose legitimate improvements in the new version. Fix-forward makes sense for single, well-understood issues. In this case with 4 issues, rollback to last good revision is recommended, then fix the chart and re-deploy"
61
+ weight: 0.35
62
+ description: "Rollback decision"
63
+ - type: llm_judge
64
+ criteria: "Post-incident process is covered — review the Helm chart diff (helm diff), add validation checks (helm lint, --dry-run, OPA policies), implement staging environment that mirrors production, use helm test for post-deploy verification, add monitoring alerts for HPA issues and Ingress health, consider GitOps with ArgoCD for change review"
65
+ weight: 0.30
66
+ description: "Post-incident process"
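As a reference for issue 2, the missing RBAC can be sketched as a namespaced Role plus RoleBinding. The ServiceAccount name `cart-svc` is an assumption — check the pod spec for the actual account:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: endpoint-reader
  namespace: ecommerce
rules:
  - apiGroups: [""]               # core API group
    resources: ["endpoints"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cart-svc-endpoint-reader
  namespace: ecommerce
subjects:
  - kind: ServiceAccount
    name: cart-svc                # hypothetical -- verify against the pod spec
    namespace: ecommerce
roleRef:
  kind: Role
  name: endpoint-reader
  apiGroup: rbac.authorization.k8s.io
```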