dojo.md 0.2.0 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (222)
  1. package/courses/GENERATION_LOG.md +45 -0
  2. package/courses/aws-lambda-debugging/course.yaml +11 -0
  3. package/courses/aws-lambda-debugging/scenarios/level-1/api-gateway-integration.yaml +71 -0
  4. package/courses/aws-lambda-debugging/scenarios/level-1/cloudwatch-logs-basics.yaml +64 -0
  5. package/courses/aws-lambda-debugging/scenarios/level-1/cold-start-basics.yaml +70 -0
  6. package/courses/aws-lambda-debugging/scenarios/level-1/environment-variable-issues.yaml +72 -0
  7. package/courses/aws-lambda-debugging/scenarios/level-1/first-debugging-shift.yaml +73 -0
  8. package/courses/aws-lambda-debugging/scenarios/level-1/handler-import-errors.yaml +71 -0
  9. package/courses/aws-lambda-debugging/scenarios/level-1/iam-permission-errors.yaml +68 -0
  10. package/courses/aws-lambda-debugging/scenarios/level-1/invocation-errors.yaml +72 -0
  11. package/courses/aws-lambda-debugging/scenarios/level-1/lambda-timeout-errors.yaml +65 -0
  12. package/courses/aws-lambda-debugging/scenarios/level-1/memory-and-oom.yaml +70 -0
  13. package/courses/aws-lambda-debugging/scenarios/level-2/async-invocation-failures.yaml +72 -0
  14. package/courses/aws-lambda-debugging/scenarios/level-2/cold-start-optimization.yaml +76 -0
  15. package/courses/aws-lambda-debugging/scenarios/level-2/dynamodb-streams-debugging.yaml +70 -0
  16. package/courses/aws-lambda-debugging/scenarios/level-2/intermediate-debugging-shift.yaml +71 -0
  17. package/courses/aws-lambda-debugging/scenarios/level-2/lambda-concurrency-management.yaml +70 -0
  18. package/courses/aws-lambda-debugging/scenarios/level-2/lambda-layers-debugging.yaml +76 -0
  19. package/courses/aws-lambda-debugging/scenarios/level-2/sam-local-debugging.yaml +74 -0
  20. package/courses/aws-lambda-debugging/scenarios/level-2/sqs-event-source.yaml +72 -0
  21. package/courses/aws-lambda-debugging/scenarios/level-2/vpc-networking-issues.yaml +71 -0
  22. package/courses/aws-lambda-debugging/scenarios/level-2/xray-tracing.yaml +62 -0
  23. package/courses/aws-lambda-debugging/scenarios/level-3/advanced-debugging-shift.yaml +72 -0
  24. package/courses/aws-lambda-debugging/scenarios/level-3/container-image-lambda.yaml +79 -0
  25. package/courses/aws-lambda-debugging/scenarios/level-3/cross-account-invocation.yaml +72 -0
  26. package/courses/aws-lambda-debugging/scenarios/level-3/eventbridge-patterns.yaml +79 -0
  27. package/courses/aws-lambda-debugging/scenarios/level-3/iac-deployment-debugging.yaml +68 -0
  28. package/courses/aws-lambda-debugging/scenarios/level-3/kinesis-stream-processing.yaml +64 -0
  29. package/courses/aws-lambda-debugging/scenarios/level-3/lambda-at-edge.yaml +64 -0
  30. package/courses/aws-lambda-debugging/scenarios/level-3/lambda-extensions-debugging.yaml +67 -0
  31. package/courses/aws-lambda-debugging/scenarios/level-3/powertools-observability.yaml +79 -0
  32. package/courses/aws-lambda-debugging/scenarios/level-3/step-functions-debugging.yaml +80 -0
  33. package/courses/aws-lambda-debugging/scenarios/level-4/cost-optimization-strategy.yaml +67 -0
  34. package/courses/aws-lambda-debugging/scenarios/level-4/expert-debugging-shift.yaml +62 -0
  35. package/courses/aws-lambda-debugging/scenarios/level-4/incident-management-serverless.yaml +61 -0
  36. package/courses/aws-lambda-debugging/scenarios/level-4/multi-region-serverless.yaml +67 -0
  37. package/courses/aws-lambda-debugging/scenarios/level-4/observability-platform-design.yaml +71 -0
  38. package/courses/aws-lambda-debugging/scenarios/level-4/serverless-architecture-design.yaml +64 -0
  39. package/courses/aws-lambda-debugging/scenarios/level-4/serverless-data-architecture.yaml +66 -0
  40. package/courses/aws-lambda-debugging/scenarios/level-4/serverless-migration-strategy.yaml +65 -0
  41. package/courses/aws-lambda-debugging/scenarios/level-4/serverless-security-design.yaml +60 -0
  42. package/courses/aws-lambda-debugging/scenarios/level-4/serverless-testing-strategy.yaml +62 -0
  43. package/courses/aws-lambda-debugging/scenarios/level-5/board-serverless-strategy.yaml +63 -0
  44. package/courses/aws-lambda-debugging/scenarios/level-5/consulting-serverless-adoption.yaml +57 -0
  45. package/courses/aws-lambda-debugging/scenarios/level-5/industry-serverless-patterns.yaml +62 -0
  46. package/courses/aws-lambda-debugging/scenarios/level-5/ma-serverless-integration.yaml +75 -0
  47. package/courses/aws-lambda-debugging/scenarios/level-5/master-debugging-shift.yaml +61 -0
  48. package/courses/aws-lambda-debugging/scenarios/level-5/organizational-serverless-transformation.yaml +65 -0
  49. package/courses/aws-lambda-debugging/scenarios/level-5/regulatory-serverless.yaml +61 -0
  50. package/courses/aws-lambda-debugging/scenarios/level-5/serverless-economics.yaml +65 -0
  51. package/courses/aws-lambda-debugging/scenarios/level-5/serverless-future-technology.yaml +66 -0
  52. package/courses/aws-lambda-debugging/scenarios/level-5/serverless-platform-design.yaml +71 -0
  53. package/courses/docker-container-debugging/course.yaml +11 -0
  54. package/courses/docker-container-debugging/scenarios/level-1/container-exit-codes.yaml +59 -0
  55. package/courses/docker-container-debugging/scenarios/level-1/container-networking-basics.yaml +69 -0
  56. package/courses/docker-container-debugging/scenarios/level-1/docker-logs-debugging.yaml +67 -0
  57. package/courses/docker-container-debugging/scenarios/level-1/dockerfile-build-failures.yaml +71 -0
  58. package/courses/docker-container-debugging/scenarios/level-1/environment-variable-issues.yaml +74 -0
  59. package/courses/docker-container-debugging/scenarios/level-1/first-debugging-shift.yaml +70 -0
  60. package/courses/docker-container-debugging/scenarios/level-1/image-pull-failures.yaml +68 -0
  61. package/courses/docker-container-debugging/scenarios/level-1/port-mapping-issues.yaml +67 -0
  62. package/courses/docker-container-debugging/scenarios/level-1/resource-limits-oom.yaml +70 -0
  63. package/courses/docker-container-debugging/scenarios/level-1/volume-mount-problems.yaml +66 -0
  64. package/courses/docker-container-debugging/scenarios/level-2/container-health-checks.yaml +73 -0
  65. package/courses/docker-container-debugging/scenarios/level-2/docker-compose-debugging.yaml +66 -0
  66. package/courses/docker-container-debugging/scenarios/level-2/docker-exec-debugging.yaml +71 -0
  67. package/courses/docker-container-debugging/scenarios/level-2/image-layer-optimization.yaml +81 -0
  68. package/courses/docker-container-debugging/scenarios/level-2/intermediate-debugging-shift.yaml +73 -0
  69. package/courses/docker-container-debugging/scenarios/level-2/logging-and-log-rotation.yaml +76 -0
  70. package/courses/docker-container-debugging/scenarios/level-2/multi-stage-build-debugging.yaml +76 -0
  71. package/courses/docker-container-debugging/scenarios/level-2/network-debugging-tools.yaml +67 -0
  72. package/courses/docker-container-debugging/scenarios/level-2/pid1-signal-handling.yaml +71 -0
  73. package/courses/docker-container-debugging/scenarios/level-2/security-scanning-basics.yaml +67 -0
  74. package/courses/docker-container-debugging/scenarios/level-3/advanced-debugging-shift.yaml +77 -0
  75. package/courses/docker-container-debugging/scenarios/level-3/buildkit-optimization.yaml +67 -0
  76. package/courses/docker-container-debugging/scenarios/level-3/container-filesystem-debugging.yaml +70 -0
  77. package/courses/docker-container-debugging/scenarios/level-3/container-security-hardening.yaml +74 -0
  78. package/courses/docker-container-debugging/scenarios/level-3/disk-space-management.yaml +74 -0
  79. package/courses/docker-container-debugging/scenarios/level-3/docker-api-automation.yaml +72 -0
  80. package/courses/docker-container-debugging/scenarios/level-3/docker-daemon-issues.yaml +73 -0
  81. package/courses/docker-container-debugging/scenarios/level-3/docker-in-docker-ci.yaml +69 -0
  82. package/courses/docker-container-debugging/scenarios/level-3/overlay-network-debugging.yaml +70 -0
  83. package/courses/docker-container-debugging/scenarios/level-3/production-container-ops.yaml +71 -0
  84. package/courses/docker-container-debugging/scenarios/level-4/cicd-pipeline-design.yaml +66 -0
  85. package/courses/docker-container-debugging/scenarios/level-4/container-monitoring-observability.yaml +63 -0
  86. package/courses/docker-container-debugging/scenarios/level-4/container-orchestration-strategy.yaml +62 -0
  87. package/courses/docker-container-debugging/scenarios/level-4/container-performance-engineering.yaml +64 -0
  88. package/courses/docker-container-debugging/scenarios/level-4/container-security-architecture.yaml +66 -0
  89. package/courses/docker-container-debugging/scenarios/level-4/enterprise-image-management.yaml +58 -0
  90. package/courses/docker-container-debugging/scenarios/level-4/expert-debugging-shift.yaml +63 -0
  91. package/courses/docker-container-debugging/scenarios/level-4/incident-response-containers.yaml +70 -0
  92. package/courses/docker-container-debugging/scenarios/level-4/multi-environment-management.yaml +65 -0
  93. package/courses/docker-container-debugging/scenarios/level-4/stateful-service-containers.yaml +65 -0
  94. package/courses/docker-container-debugging/scenarios/level-5/board-infrastructure-strategy.yaml +58 -0
  95. package/courses/docker-container-debugging/scenarios/level-5/consulting-container-strategy.yaml +61 -0
  96. package/courses/docker-container-debugging/scenarios/level-5/container-platform-architecture.yaml +67 -0
  97. package/courses/docker-container-debugging/scenarios/level-5/container-platform-economics.yaml +67 -0
  98. package/courses/docker-container-debugging/scenarios/level-5/container-technology-evolution.yaml +67 -0
  99. package/courses/docker-container-debugging/scenarios/level-5/disaster-recovery-containers.yaml +66 -0
  100. package/courses/docker-container-debugging/scenarios/level-5/industry-container-patterns.yaml +71 -0
  101. package/courses/docker-container-debugging/scenarios/level-5/master-debugging-shift.yaml +62 -0
  102. package/courses/docker-container-debugging/scenarios/level-5/organizational-transformation.yaml +67 -0
  103. package/courses/docker-container-debugging/scenarios/level-5/regulatory-compliance-containers.yaml +61 -0
  104. package/courses/kubernetes-deployment-troubleshooting/course.yaml +12 -0
  105. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/configmap-secret-issues.yaml +69 -0
  106. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/crashloopbackoff.yaml +68 -0
  107. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/deployment-rollout.yaml +56 -0
  108. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/first-troubleshooting-shift.yaml +65 -0
  109. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/health-probe-failures.yaml +70 -0
  110. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/imagepullbackoff.yaml +57 -0
  111. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/kubectl-debugging-basics.yaml +56 -0
  112. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/oomkilled.yaml +70 -0
  113. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/pending-pods.yaml +68 -0
  114. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/service-not-reachable.yaml +66 -0
  115. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/dns-resolution-failures.yaml +63 -0
  116. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/helm-deployment-failures.yaml +63 -0
  117. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/hpa-scaling-issues.yaml +62 -0
  118. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/ingress-routing-issues.yaml +63 -0
  119. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/init-container-failures.yaml +63 -0
  120. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/intermediate-troubleshooting-shift.yaml +66 -0
  121. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/network-policy-blocking.yaml +67 -0
  122. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/persistent-volume-issues.yaml +69 -0
  123. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/rbac-permission-denied.yaml +57 -0
  124. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/resource-quota-limits.yaml +64 -0
  125. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/advanced-troubleshooting-shift.yaml +69 -0
  126. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/cluster-upgrade-failures.yaml +71 -0
  127. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/gitops-drift-detection.yaml +62 -0
  128. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/job-cronjob-failures.yaml +67 -0
  129. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/monitoring-alerting-gaps.yaml +64 -0
  130. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/multi-container-debugging.yaml +68 -0
  131. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/node-pressure-evictions.yaml +70 -0
  132. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/pod-disruption-budgets.yaml +59 -0
  133. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/service-mesh-debugging.yaml +64 -0
  134. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/statefulset-troubleshooting.yaml +69 -0
  135. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/capacity-planning.yaml +65 -0
  136. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/cost-optimization.yaml +57 -0
  137. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/disaster-recovery-design.yaml +56 -0
  138. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/executive-communication.yaml +62 -0
  139. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/expert-troubleshooting-shift.yaml +65 -0
  140. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/incident-management-process.yaml +59 -0
  141. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/multi-cluster-operations.yaml +62 -0
  142. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/multi-tenancy-design.yaml +55 -0
  143. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/platform-engineering.yaml +59 -0
  144. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/security-hardening.yaml +58 -0
  145. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/behavioral-science.yaml +62 -0
  146. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/board-strategy.yaml +61 -0
  147. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/cloud-native-future.yaml +65 -0
  148. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/comprehensive-platform.yaml +57 -0
  149. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/consulting-engagement.yaml +62 -0
  150. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/industry-benchmarks.yaml +58 -0
  151. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/ma-integration.yaml +62 -0
  152. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/master-troubleshooting-shift.yaml +73 -0
  153. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/product-development.yaml +65 -0
  154. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/regulatory-compliance.yaml +76 -0
  155. package/courses/mysql-query-optimization/course.yaml +11 -0
  156. package/courses/mysql-query-optimization/scenarios/level-1/buffer-pool-basics.yaml +65 -0
  157. package/courses/mysql-query-optimization/scenarios/level-1/explain-basics.yaml +66 -0
  158. package/courses/mysql-query-optimization/scenarios/level-1/first-optimization-shift.yaml +78 -0
  159. package/courses/mysql-query-optimization/scenarios/level-1/innodb-index-fundamentals.yaml +68 -0
  160. package/courses/mysql-query-optimization/scenarios/level-1/join-basics.yaml +66 -0
  161. package/courses/mysql-query-optimization/scenarios/level-1/n-plus-one-queries.yaml +67 -0
  162. package/courses/mysql-query-optimization/scenarios/level-1/query-rewriting-basics.yaml +66 -0
  163. package/courses/mysql-query-optimization/scenarios/level-1/select-star-problems.yaml +68 -0
  164. package/courses/mysql-query-optimization/scenarios/level-1/slow-query-diagnosis.yaml +65 -0
  165. package/courses/mysql-query-optimization/scenarios/level-1/where-clause-optimization.yaml +65 -0
  166. package/courses/mysql-query-optimization/scenarios/level-2/buffer-pool-tuning.yaml +64 -0
  167. package/courses/mysql-query-optimization/scenarios/level-2/composite-index-design.yaml +71 -0
  168. package/courses/mysql-query-optimization/scenarios/level-2/covering-and-invisible-indexes.yaml +69 -0
  169. package/courses/mysql-query-optimization/scenarios/level-2/cte-and-window-functions.yaml +78 -0
  170. package/courses/mysql-query-optimization/scenarios/level-2/intermediate-optimization-shift.yaml +68 -0
  171. package/courses/mysql-query-optimization/scenarios/level-2/join-optimization.yaml +67 -0
  172. package/courses/mysql-query-optimization/scenarios/level-2/performance-schema-analysis.yaml +69 -0
  173. package/courses/mysql-query-optimization/scenarios/level-2/query-optimizer-hints.yaml +74 -0
  174. package/courses/mysql-query-optimization/scenarios/level-2/subquery-optimization.yaml +70 -0
  175. package/courses/mysql-query-optimization/scenarios/level-2/write-optimization.yaml +63 -0
  176. package/courses/mysql-query-optimization/scenarios/level-3/advanced-optimization-shift.yaml +71 -0
  177. package/courses/mysql-query-optimization/scenarios/level-3/connection-management.yaml +67 -0
  178. package/courses/mysql-query-optimization/scenarios/level-3/full-text-search.yaml +77 -0
  179. package/courses/mysql-query-optimization/scenarios/level-3/json-optimization.yaml +87 -0
  180. package/courses/mysql-query-optimization/scenarios/level-3/lock-contention-analysis.yaml +68 -0
  181. package/courses/mysql-query-optimization/scenarios/level-3/monitoring-alerting.yaml +63 -0
  182. package/courses/mysql-query-optimization/scenarios/level-3/online-schema-changes.yaml +79 -0
  183. package/courses/mysql-query-optimization/scenarios/level-3/partitioning-strategies.yaml +83 -0
  184. package/courses/mysql-query-optimization/scenarios/level-3/query-profiling-deep-dive.yaml +84 -0
  185. package/courses/mysql-query-optimization/scenarios/level-3/replication-optimization.yaml +66 -0
  186. package/courses/mysql-query-optimization/scenarios/level-4/aurora-vs-rds-evaluation.yaml +61 -0
  187. package/courses/mysql-query-optimization/scenarios/level-4/data-architecture.yaml +62 -0
  188. package/courses/mysql-query-optimization/scenarios/level-4/database-migration-planning.yaml +59 -0
  189. package/courses/mysql-query-optimization/scenarios/level-4/enterprise-governance.yaml +50 -0
  190. package/courses/mysql-query-optimization/scenarios/level-4/executive-communication.yaml +54 -0
  191. package/courses/mysql-query-optimization/scenarios/level-4/expert-optimization-shift.yaml +67 -0
  192. package/courses/mysql-query-optimization/scenarios/level-4/high-availability-architecture.yaml +60 -0
  193. package/courses/mysql-query-optimization/scenarios/level-4/optimizer-internals.yaml +62 -0
  194. package/courses/mysql-query-optimization/scenarios/level-4/performance-sla-design.yaml +52 -0
  195. package/courses/mysql-query-optimization/scenarios/level-4/read-replica-scaling.yaml +51 -0
  196. package/courses/mysql-query-optimization/scenarios/level-5/ai-database-future.yaml +45 -0
  197. package/courses/mysql-query-optimization/scenarios/level-5/behavioral-science.yaml +44 -0
  198. package/courses/mysql-query-optimization/scenarios/level-5/benchmark-design.yaml +47 -0
  199. package/courses/mysql-query-optimization/scenarios/level-5/board-strategy.yaml +48 -0
  200. package/courses/mysql-query-optimization/scenarios/level-5/comprehensive-platform.yaml +49 -0
  201. package/courses/mysql-query-optimization/scenarios/level-5/consulting-engagement.yaml +52 -0
  202. package/courses/mysql-query-optimization/scenarios/level-5/ma-database-integration.yaml +47 -0
  203. package/courses/mysql-query-optimization/scenarios/level-5/master-optimization-shift.yaml +56 -0
  204. package/courses/mysql-query-optimization/scenarios/level-5/product-development.yaml +48 -0
  205. package/courses/mysql-query-optimization/scenarios/level-5/regulatory-compliance.yaml +48 -0
  206. package/courses/postgresql-query-optimization/scenarios/level-5/comprehensive-database-system.yaml +70 -0
  207. package/courses/postgresql-query-optimization/scenarios/level-5/database-ai-future.yaml +81 -0
  208. package/courses/postgresql-query-optimization/scenarios/level-5/database-behavioral-science.yaml +63 -0
  209. package/courses/postgresql-query-optimization/scenarios/level-5/database-board-strategy.yaml +77 -0
  210. package/courses/postgresql-query-optimization/scenarios/level-5/database-consulting-engagement.yaml +61 -0
  211. package/courses/postgresql-query-optimization/scenarios/level-5/database-industry-benchmarks.yaml +64 -0
  212. package/courses/postgresql-query-optimization/scenarios/level-5/database-ma-integration.yaml +71 -0
  213. package/courses/postgresql-query-optimization/scenarios/level-5/database-product-development.yaml +72 -0
  214. package/courses/postgresql-query-optimization/scenarios/level-5/database-regulatory-landscape.yaml +76 -0
  215. package/courses/postgresql-query-optimization/scenarios/level-5/master-optimization-shift.yaml +66 -0
  216. package/courses/terraform-infrastructure-setup/course.yaml +11 -0
  217. package/courses/terraform-infrastructure-setup/scenarios/level-1/terraform-init-errors.yaml +72 -0
  218. package/dist/mcp/session-manager.d.ts +7 -4
  219. package/dist/mcp/session-manager.d.ts.map +1 -1
  220. package/dist/mcp/session-manager.js +23 -8
  221. package/dist/mcp/session-manager.js.map +1 -1
  222. package/package.json +1 -1
package/courses/docker-container-debugging/scenarios/level-3/production-container-ops.yaml
@@ -0,0 +1,71 @@
+meta:
+  id: production-container-ops
+  level: 3
+  course: docker-container-debugging
+  type: output
+  description: "Debug production container operations — diagnose zero-downtime deployment failures, container update strategies, and production runtime issues"
+  tags: [Docker, production, zero-downtime, rolling-update, deployment, advanced]
+
+state: {}
+
+trigger: |
+  Your production Docker deployment experiences issues during updates:
+
+  Deployment strategy uses Docker Compose with rolling updates:
+  $ docker compose up -d --no-deps --build api
+
+  Problem 1 — Downtime during update:
+  The old container is killed before the new one is ready. For 5-10
+  seconds, requests to the API return "connection refused":
+
+  $ docker compose up -d api
+  Recreating api ... done
+
+  The default behavior: stop old → start new. No overlap period.
+  Health check is configured but Compose doesn't wait for it by
+  default during updates.
+
+  Problem 2 — Container doesn't stop gracefully:
+  $ docker stop api
+  (waits 10 seconds, then kills with SIGKILL)
+
+  The application doesn't handle SIGTERM. Running connections are
+  severed. In-progress requests return errors. The app needs a
+  graceful shutdown handler that:
+  - Stops accepting new connections
+  - Finishes processing current requests
+  - Closes database connections
+  - Exits cleanly
+
+  Problem 3 — Rollback needed but old image was overwritten:
+  $ docker compose up -d   # deploys broken version
+  # Need to roll back but :latest was overwritten
+  # No way to get the previous version!
+
+  Problem 4 — Resource leak over time:
+  $ docker stats api
+  CONTAINER  CPU%  MEM USAGE / LIMIT  MEM%  NET I/O
+  api        145%  1.8GiB / 2GiB      90%   45GB / 12GB
+
+  Memory at 90% and climbing. The container has a memory leak and
+  will eventually OOM. Restart is a band-aid, not a fix.
+
+  Task: Explain production container operations. Write: zero-downtime
+  deployment strategies (blue-green, rolling), graceful shutdown
+  (signal handling, stop_grace_period), image tagging strategies (never
+  use :latest in production), resource monitoring with docker stats,
+  handling memory leaks, and production readiness checklist.
+
+assertions:
+  - type: llm_judge
+    criteria: "Zero-downtime deployment is explained — blue-green: run new version alongside old, switch traffic after health check passes. Rolling update: gradually replace instances. With Compose: use health checks + depends_on condition: service_healthy. deploy.update_config in Swarm: parallelism, delay, order (start-first vs stop-first). start-first ensures new container is healthy before stopping old one. Without orchestration: use a reverse proxy (nginx, traefik) to manage traffic switching"
+    weight: 0.35
+    description: "Zero-downtime deployment"
+  - type: llm_judge
+    criteria: "Graceful shutdown is covered — Docker sends SIGTERM, waits stop_grace_period (default 10s), then SIGKILL. Application must handle SIGTERM: stop accepting new connections, drain existing connections, close resources, exit 0. Node.js: process.on('SIGTERM', ...). Python: signal.signal(SIGTERM, ...). stop_grace_period in Compose should match the app's drain time. PID 1 problem: shell form CMD doesn't forward signals — use exec form or tini"
+    weight: 0.35
+    description: "Graceful shutdown"
+  - type: llm_judge
+    criteria: "Production best practices are practical — never use :latest in production (not reproducible). Tag with git SHA or semantic version. Keep previous image tags for rollback. Monitor with docker stats (CPU, memory, I/O, network). Set resource limits to prevent runaway containers from affecting host. Memory leaks: monitor trend over time, set restart policy (--restart unless-stopped) as safety net, fix root cause. Production checklist: health checks, log rotation, resource limits, graceful shutdown, backup strategy, monitoring alerts"
+    weight: 0.30
+    description: "Production practices"
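The graceful-shutdown behavior this scenario asks learners to explain can be sketched in Python. This is a minimal illustration, not part of the package: it assumes a plain `http.server` app, and the port number is arbitrary. On SIGTERM the handler stops the accept loop while letting any in-flight request complete, matching the drain sequence in the trigger text; in a real deployment, Compose's `stop_grace_period` must be at least as long as the drain takes, and the container must run the process with an exec-form CMD so PID 1 actually receives the signal.

```python
import signal
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # suppress per-request logging noise

def make_server(port=0):
    """Build an HTTP server whose SIGTERM handler drains instead of dying."""
    server = HTTPServer(("127.0.0.1", port), HealthHandler)

    def handle_sigterm(signum, frame):
        # shutdown() stops the accept loop but lets in-flight requests
        # finish. It must run off-thread: calling it from the thread
        # running serve_forever() would deadlock.
        threading.Thread(target=server.shutdown, daemon=True).start()

    signal.signal(signal.SIGTERM, handle_sigterm)
    return server

if __name__ == "__main__":
    srv = make_server(8080)
    srv.serve_forever()  # returns once SIGTERM triggers shutdown()
    srv.server_close()   # release the listening socket, then exit 0
```

Database connections or other resources would be closed after `serve_forever()` returns, just before exiting.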
package/courses/docker-container-debugging/scenarios/level-4/cicd-pipeline-design.yaml
@@ -0,0 +1,66 @@
+meta:
+  id: cicd-pipeline-design
+  level: 4
+  course: docker-container-debugging
+  type: output
+  description: "Design container CI/CD pipelines — implement build, test, scan, sign, and deploy workflows for containerized applications at scale"
+  tags: [Docker, CI/CD, pipeline, deployment, automation, GitOps, expert]
+
+state: {}
+
+trigger: |
+  Your organization is standardizing its container CI/CD pipeline.
+  Currently, each team has ad-hoc scripts:
+
+  Team A: docker build && docker push && ssh prod docker pull && restart
+  Team B: Jenkins pipeline with docker-compose up on build agent
+  Team C: GitHub Actions with manual approval for production
+
+  None have: security scanning, image signing, staged rollouts,
+  automated rollback, or audit trails.
+
+  Design a standard pipeline:
+
+  Stage 1 — Build:
+  - BuildKit with layer caching (registry-based cache)
+  - Multi-architecture builds (amd64 + arm64) via docker buildx
+  - Deterministic builds: pinned base image digests, lock files
+  - Build metadata: git SHA, build timestamp, CI job URL as labels
+
+  Stage 2 — Test:
+  - Unit tests in build stage (fail fast)
+  - Integration tests with docker compose (spin up dependencies)
+  - Smoke tests against built image (start container, hit health endpoint)
+  - Test containers clean up after themselves (--rm, compose down -v)
+
+  Stage 3 — Security:
+  - Trivy scan: block CRITICAL, warn HIGH
+  - SBOM generation (syft) attached as attestation
+  - Image signing (cosign) with CI identity
+  - License compliance check
+
+  Stage 4 — Deploy:
+  - Push to registry with git SHA tag + branch tag
+  - Staging: automatic deploy, run integration tests
+  - Production: manual approval gate, canary deployment (10% → 50% → 100%)
+  - Automated rollback if error rate exceeds threshold
+
+  Task: Design the container CI/CD pipeline. Write: each stage in
+  detail, caching strategies for fast builds, testing containers in
+  CI (compose-based integration tests), security gates, deployment
+  strategies (canary, blue-green), rollback automation, and how to
+  standardize across teams without being overly rigid.
+
+assertions:
+  - type: llm_judge
+    criteria: "Build and test stages are thorough — BuildKit cache: --cache-from type=registry for cross-CI cache sharing. Multi-arch: docker buildx build --platform linux/amd64,linux/arm64. Pin base images by digest for reproducibility. Test in CI: docker compose up -d to start dependencies, run tests, compose down -v to clean up. Smoke test: start the built image, wait for health check, hit /health endpoint. All test containers must clean up (no orphaned resources in CI)"
+    weight: 0.35
+    description: "Build and test"
+  - type: llm_judge
+    criteria: "Security and deployment gates are covered — scan before push (shift left). Trivy with --exit-code 1 --severity CRITICAL fails the pipeline. SBOM with syft for supply chain transparency. Image signing with cosign (keyless via OIDC in CI). Deploy: push to registry with immutable tags (git SHA, never :latest). Canary: deploy to subset, monitor error rate, auto-promote or rollback. Blue-green: run both versions, switch traffic at load balancer. Rollback: keep previous N images, automate based on health metrics"
+    weight: 0.35
+    description: "Security and deployment"
+  - type: llm_judge
+    criteria: "Standardization approach is practical — provide a shared pipeline template/library that teams extend. Core stages (build, scan, sign) are mandatory and team-managed. Testing and deployment stages are customizable per team. Platform team maintains the template, teams consume via CI includes. Don't enforce identical pipelines — allow team-specific test suites and deployment strategies. Measure: build times, deployment frequency, change failure rate, mean time to recovery (DORA metrics)"
+    weight: 0.30
+    description: "Standardization"
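The automated canary promotion the Stage 4 text calls for comes down to a small decision rule. Here is a hedged sketch of that gate: the stage fractions mirror the scenario's 10% → 50% → 100% progression, while the 1% error-rate threshold, the function names, and the shape of the monitoring input are all illustrative assumptions, not part of the package.

```python
# Traffic fractions for each canary stage (illustrative, from the
# scenario's 10% -> 50% -> 100% progression).
CANARY_STAGES = (0.10, 0.50, 1.00)

def canary_gate(error_rate: float, threshold: float = 0.01) -> str:
    """Decide the next action after observing one canary stage.

    threshold is a hypothetical 1% error-rate budget; a real pipeline
    would pull this from the service's SLO.
    """
    return "rollback" if error_rate > threshold else "promote"

def run_canary(observed_rates: dict) -> tuple:
    """Walk the stages; stop and roll back on the first bad reading.

    observed_rates maps each traffic fraction to the error rate the
    monitoring system reported while that fraction was live.
    """
    for stage in CANARY_STAGES:
        if canary_gate(observed_rates[stage]) == "rollback":
            return ("rollback", stage)
    return ("deployed", 1.00)
```

The rollback branch would redeploy the previous immutable tag (a git-SHA tag, per the scenario's advice to never rely on `:latest`).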
@@ -0,0 +1,63 @@
+ meta:
+ id: container-monitoring-observability
+ level: 4
+ course: docker-container-debugging
+ type: output
+ description: "Design container monitoring and observability — implement metrics, logging, and tracing for containerized applications at enterprise scale"
+ tags: [Docker, monitoring, observability, Prometheus, logging, tracing, expert]
+
+ state: {}
+
+ trigger: |
+ Your containerized platform has 100+ services. The current monitoring
+ is "docker stats on the host and hope for the best." Last month,
+ three production incidents were discovered by customers, not by your
+ team. You need to design a comprehensive observability strategy.
+
+ Current state:
+ - No centralized metrics — each team checks docker stats manually
+ - Logs go to docker logs (json-file driver) — no aggregation
+ - No distributed tracing — debugging request flows across services
+ requires correlating timestamps across multiple docker logs outputs
+ - Alert on "server down" only — no application-level alerts
+ - Post-incident: "we don't know what happened because logs rotated"
+
+ Target state (three pillars of observability):
+
+ 1. Metrics — Prometheus + cAdvisor + Grafana:
+ - cAdvisor exports container metrics (CPU, memory, disk, network)
+ - Application metrics via /metrics endpoints (RED method)
+ - Grafana dashboards per service and per host
+ - Alerts: container restart > 3/hour, memory > 80%, error rate > 1%
+
+ 2. Logging — Fluentd/Fluent Bit + Elasticsearch + Kibana:
+ - Structured JSON logging from applications
+ - Docker logging driver: fluentd (forward to aggregator)
+ - Centralized search, retention policies, log correlation
+ - Keep 30 days hot, 90 days warm, 1 year cold storage
+
+ 3. Tracing — OpenTelemetry + Jaeger:
+ - Distributed trace propagation across services
+ - Trace ID in logs for correlation
+ - Latency analysis, dependency mapping
+ - Sampling strategy: 100% for errors, 10% for normal traffic
+
+ Task: Design the container observability strategy. Write: the three
+ pillars approach (metrics, logs, traces), tooling selection and
+ architecture, Docker-specific considerations (logging drivers, cAdvisor,
+ container labels for discovery), alert design (what to alert on),
+ and cost/complexity trade-offs for different organization sizes.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Three pillars architecture is explained — Metrics: Prometheus scrapes cAdvisor (container metrics) and application /metrics endpoints. Service discovery via Docker labels or DNS. Grafana for visualization. Logging: structured JSON from apps, collected via Docker logging driver (fluentd) or sidecar, shipped to Elasticsearch/Loki. Tracing: OpenTelemetry SDK in applications, export to Jaeger/Tempo. Correlation: trace ID embedded in logs and metrics labels connects all three"
+ weight: 0.35
+ description: "Three pillars"
+ - type: llm_judge
+ criteria: "Docker-specific monitoring is covered — cAdvisor runs as privileged container, mounts /var/lib/docker and /sys for metrics. Docker logging drivers determine log pipeline: json-file (local only), fluentd (forward), journald (systemd). Dual logging (Docker 20.10+) allows docker logs to work alongside remote driver. Container labels for Prometheus service discovery. Health check status as a metric. docker events stream for container lifecycle tracking. Monitor the Docker daemon itself (memory, goroutines)"
+ weight: 0.35
+ description: "Docker-specific monitoring"
+ - type: llm_judge
+ criteria: "Alerting and scaling are practical — alert on symptoms not causes: error rate, latency percentile (p99), saturation (CPU/memory %). Avoid alert fatigue: page only for customer-impacting issues. Container-specific alerts: restart loops, OOM kills, health check failures, disk pressure. Scaling considerations: Prometheus needs storage planning, Elasticsearch is resource-intensive. For smaller orgs: Grafana Loki (lighter than Elasticsearch), Grafana Tempo (lighter than Jaeger). Cost grows with retention and cardinality"
+ weight: 0.30
+ description: "Alerting and scaling"
@@ -0,0 +1,62 @@
+ meta:
+ id: container-orchestration-strategy
+ level: 4
+ course: docker-container-debugging
+ type: output
+ description: "Design container orchestration strategy — evaluate Docker Swarm vs Kubernetes, define deployment architectures, and plan migration paths for enterprise containerized applications"
+ tags: [Docker, orchestration, Swarm, Kubernetes, architecture, strategy, expert]
+
+ state: {}
+
+ trigger: |
+ Your company runs 50+ microservices on Docker Compose across 8
+ bare-metal servers. The setup works but pain points are growing:
+
+ - Manual deployment: SSH into each server, docker compose pull && up
+ - No automatic failover: if a server dies, its services are down
+ - Scaling: manually adding containers and updating nginx upstream
+ - Secret management: .env files on each server, rotated manually
+ - No resource governance: containers compete for CPU/memory
+
+ Management wants "orchestration" and asks you to evaluate options:
+
+ Option A — Docker Swarm:
+ Pros: Built into Docker Engine, minimal learning curve, uses existing
+ docker-compose.yaml (with minor changes to deploy section), simple
+ setup (docker swarm init, docker swarm join).
+ Cons: Smaller community, fewer features, no auto-scaling, limited
+ ecosystem, uncertain future (Docker Inc. focus shifted to Desktop).
+
+ Option B — Kubernetes:
+ Pros: Industry standard, massive ecosystem, advanced scheduling,
+ auto-scaling (HPA/VPA), extensive networking (CNI plugins), strong
+ secret management, RBAC, namespace isolation.
+ Cons: Steep learning curve, complex operations, requires dedicated
+ team, YAML complexity, overkill for small deployments.
+
+ Option C — Managed Kubernetes (EKS/GKE/AKS):
+ Pros: Control plane managed by cloud provider, integrated with cloud
+ services, automatic upgrades, SLA-backed.
+ Cons: Cloud vendor dependency, cost, networking complexity with
+ existing on-prem services, data sovereignty concerns.
+
+ The decision affects 3+ years of infrastructure investment.
+
+ Task: Evaluate container orchestration options. Write: comparison
+ matrix (Swarm vs K8s vs managed K8s), migration path from Compose
+ to orchestration, decision criteria (team size, scale, budget,
+ compliance), rollout strategy, and risk mitigation plan.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Orchestration comparison is thorough — Docker Swarm: simple, uses Compose files, good for < 20 services and small teams. Kubernetes: complex but standard, good for > 20 services, multi-team, auto-scaling needs. Managed K8s: removes operational burden of K8s control plane. Decision factors: team expertise, number of services, scaling requirements, compliance needs, budget (managed K8s has cloud costs), existing infrastructure (on-prem favors Swarm or self-hosted K8s)"
+ weight: 0.35
+ description: "Orchestration comparison"
+ - type: llm_judge
+ criteria: "Migration strategy is covered — phased approach: (1) containerize properly first (health checks, graceful shutdown, 12-factor), (2) start with non-critical services, (3) migrate stateless before stateful, (4) run hybrid during transition. Compose to Swarm: add deploy section, docker stack deploy. Compose to K8s: use kompose for initial conversion, then refine. Keep Compose for local development regardless of production orchestration"
+ weight: 0.35
+ description: "Migration strategy"
+ - type: llm_judge
+ criteria: "Risk mitigation is practical — run parallel environments during migration, blue-green at infrastructure level. Keep rollback path to Compose for 6+ months. Staff training before migration (K8s has steep curve). Start with a platform team of 2-3 dedicated engineers for K8s. Budget for monitoring/observability platform. Document runbooks for new platform. Consider: is the complexity justified? Many companies successfully run on Compose/Swarm at significant scale"
+ weight: 0.30
+ description: "Risk mitigation"
@@ -0,0 +1,64 @@
+ meta:
+ id: container-performance-engineering
+ level: 4
+ course: docker-container-debugging
+ type: output
+ description: "Design container performance engineering — implement performance testing, profiling, resource optimization, and capacity planning for containerized applications"
+ tags: [Docker, performance, profiling, resource-optimization, capacity-planning, expert]
+
+ state: {}
+
+ trigger: |
+ Your containerized API handles 10,000 requests/second in production.
+ After a deployment, latency increased from p99 of 50ms to 800ms.
+ docker stats shows:
+
+ $ docker stats --no-stream
+ CONTAINER CPU% MEM USAGE/LIMIT MEM% NET I/O BLOCK I/O
+ api-1 195% 1.8GiB/2GiB 90% 2.1GB/800MB 500MB/2.3GB
+ api-2 190% 1.7GiB/2GiB 85% 2.0GB/790MB 480MB/2.1GB
+ api-3 45% 900MiB/2GiB 44% 200MB/80MB 50MB/100MB
+
+ Observations:
+ - api-1 and api-2 are CPU-saturated and near memory limit
+ - api-3 has low utilization — load balancing is uneven
+ - Block I/O is suspiciously high for an API service
+
+ Investigation:
+
+ 1. CPU: container has --cpus=2 but the Go runtime sets
+ GOMAXPROCS=runtime.NumCPU() = 32 (host cores). Goroutines
+ compete for 2 CPU cores, causing excessive context switching.
+ Fix: set GOMAXPROCS to match container CPU limit.
+
+ 2. Memory: the application uses an in-memory cache that grows
+ unbounded. Near the 2GiB limit, the Go GC runs constantly
+ (GC pressure), consuming CPU cycles for garbage collection.
+
+ 3. Block I/O: the application writes temporary files to the
+ container's writable layer (overlay2). This goes through the
+ storage driver with copy-on-write overhead. Should use tmpfs.
+
+ 4. Load balancing: Docker's internal DNS round-robin sends requests
+ to all instances equally, but api-3 was just restarted and its
+ JIT/cache is cold. Need weighted or least-connections balancing.
+
+ Task: Design container performance engineering. Write: how containers
+ affect application performance (cgroups, namespace overhead, storage
+ drivers), profiling containers (docker stats, nsenter, perf), common
+ performance pitfalls (CPU throttling, memory pressure, I/O through
+ overlay2), resource right-sizing, and load testing containerized apps.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Container performance impact is explained — cgroups enforce CPU and memory limits but applications may not be aware of them. CPU: --cpus=2 means CFS quota, not dedicated cores. Applications should read cgroup limits, not /proc/cpuinfo (Java: -XX:+UseContainerSupport, Go: GOMAXPROCS=container limit, Node: --max-old-space-size). Memory: OOM killer triggers at cgroup limit. GC-heavy languages suffer near the limit. Storage driver adds I/O overhead for writable layer. Network: NAT overhead for published ports"
+ weight: 0.35
+ description: "Performance impact"
+ - type: llm_judge
+ criteria: "Profiling techniques are covered — docker stats: real-time CPU, memory, I/O, network per container. nsenter: enter container namespaces for host-level tools (perf, strace, tcpdump). docker exec with profiling tools (if available in image). cAdvisor: detailed container metrics over time. Application-level: pprof (Go), async-profiler (Java), py-spy (Python). Flame graphs for CPU analysis. Memory profiling to identify leaks. Compare: container metrics vs application metrics to identify if bottleneck is container-level or application-level"
+ weight: 0.35
+ description: "Profiling techniques"
+ - type: llm_judge
+ criteria: "Resource right-sizing is practical — start with generous limits, monitor actual usage, tighten. CPU: observe throttling (nr_throttled in cgroup stats). Memory: set limit 20-30% above normal usage to handle spikes and GC. Use tmpfs for temporary files (avoid overlay2 write overhead). Load testing: use tools like k6/wrk against containerized app with production-like resource limits. Capacity planning: requests per container × containers = total capacity. Account for startup latency (cold JIT, cache warming) in scaling calculations"
+ weight: 0.30
+ description: "Resource right-sizing"
@@ -0,0 +1,66 @@
+ meta:
+ id: container-security-architecture
+ level: 4
+ course: docker-container-debugging
+ type: output
+ description: "Design container security architecture — implement defense-in-depth for containerized applications including runtime security, network policies, and incident response"
+ tags: [Docker, security, architecture, defense-in-depth, runtime, compliance, expert]
+
+ state: {}
+
+ trigger: |
+ After a security breach where an attacker gained access to a
+ container and attempted lateral movement, your CISO requests a
+ comprehensive container security architecture. The current state:
+
+ - All containers run as root
+ - No network segmentation — all containers on default bridge network
+ - Images pulled from public Docker Hub without scanning
+ - Docker socket mounted into several containers "for monitoring"
+ - No runtime threat detection
+ - Secrets in environment variables visible via docker inspect
+
+ Attack path reconstruction:
+ 1. Attacker exploited an application vulnerability (RCE)
+ 2. Gained shell inside container (as root)
+ 3. Found Docker socket mounted → created a privileged container
+ 4. Mounted host filesystem from privileged container
+ 5. Accessed other containers' secrets via docker inspect
+ 6. Exfiltrated data from database container
+
+ Defense-in-depth layers needed:
+
+ Layer 1 — Build time: scan images, use minimal base images, no
+ secrets in images, sign images
+
+ Layer 2 — Configuration: non-root, drop capabilities, read-only
+ filesystem, resource limits, no privileged containers
+
+ Layer 3 — Network: network segmentation, firewall rules between
+ service tiers (frontend can't reach database directly)
+
+ Layer 4 — Runtime: anomaly detection (unexpected processes,
+ network connections, file modifications), audit logging
+
+ Layer 5 — Secrets: Docker secrets or external vault, rotated
+ automatically, never in environment variables or images
+
+ Task: Design the container security architecture. Write: each
+ defense layer with specific controls, the Docker socket security
+ problem and alternatives, network segmentation for containers,
+ secrets management, runtime security monitoring, and compliance
+ requirements (SOC2, PCI-DSS) for containerized environments.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Defense-in-depth layers are explained — Build: scan with Trivy/Scout, use distroless/alpine bases, multi-stage builds, sign images. Config: USER directive, --cap-drop ALL --cap-add <specific>, --read-only --tmpfs /tmp, no --privileged ever. Network: custom bridge networks per tier, frontend ↔ api ↔ database (no frontend → database). Runtime: Falco/Sysdig for anomaly detection (unexpected exec, network, file access). Secrets: Docker secrets, HashiCorp Vault, never env vars for sensitive data"
+ weight: 0.35
+ description: "Defense layers"
+ - type: llm_judge
+ criteria: "Docker socket and lateral movement prevention are covered — Docker socket = root access to host. NEVER mount into application containers. Alternatives: Docker API proxy with authorization (Tecnativa docker-socket-proxy), rootless Docker, monitoring via cAdvisor (read-only metrics without socket). Lateral movement prevention: network segmentation (containers can only reach needed services), no shared volumes between security tiers, separate Docker networks per service group"
+ weight: 0.35
+ description: "Socket and lateral movement"
+ - type: llm_judge
+ criteria: "Compliance and incident response are practical — SOC2/PCI-DSS requirements: audit logging of all container operations, access control (who can deploy, exec into containers), encryption at rest and in transit, vulnerability management program. Incident response for containers: preserve container (don't delete), capture filesystem (docker export), collect logs, analyze with forensic tools. Runtime monitoring: alert on docker exec in production, unexpected outbound connections, privilege escalation attempts"
+ weight: 0.30
+ description: "Compliance and response"
@@ -0,0 +1,58 @@
+ meta:
+ id: enterprise-image-management
+ level: 4
+ course: docker-container-debugging
+ type: output
+ description: "Design enterprise container image management — implement private registries, image signing, vulnerability policies, and golden image pipelines"
+ tags: [Docker, registry, image-signing, vulnerability, enterprise, governance, expert]
+
+ state: {}
+
+ trigger: |
+ Your organization has 200+ developers building Docker images with
+ no governance. A security audit reveals:
+
+ 1. Developers pull random base images from Docker Hub — some contain
+ known vulnerabilities, some are abandoned/unmaintained.
+
+ 2. No image provenance — can't verify who built an image or if it
+ was tampered with between build and deployment.
+
+ 3. Production runs images with CRITICAL vulnerabilities because
+ there's no gate between build and deploy.
+
+ 4. Multiple teams independently build similar base images with
+ different security configurations.
+
+ 5. Docker Hub rate limits (100 pulls/6hrs for anonymous, 200 for
+ free accounts) cause CI failures during peak hours.
+
+ Proposed enterprise image management architecture:
+
+ - Private registry (Harbor) as pull-through cache and primary store
+ - Golden base images maintained by the platform team, pre-hardened
+ and scanned, rebuilt weekly with latest patches
+ - Image signing with Docker Content Trust / cosign
+ - Admission controller that rejects unsigned/unscanned images
+ - Vulnerability policy: block CRITICAL, alert HIGH, log MEDIUM
+ - Image retention policy: keep 10 latest tags, delete untagged after 7 days
+
+ Task: Design enterprise image management. Write: private registry
+ architecture (Harbor, ECR, GCR), golden base image strategy, image
+ signing and verification (cosign, Notary), vulnerability gating in
+ CI/CD, image retention and garbage collection, and developer
+ experience considerations (don't slow down development).
+
+ assertions:
+ - type: llm_judge
+ criteria: "Registry architecture is explained — private registry serves as: pull-through cache (reduces Docker Hub dependency, avoids rate limits), primary store for internal images, security scan integration point. Harbor: open-source, includes scanning (Trivy), signing (cosign/Notary), replication, RBAC, audit logging. Cloud registries (ECR, GCR, ACR): managed, integrated with cloud IAM. Registry should be highly available and backed up"
+ weight: 0.35
+ description: "Registry architecture"
+ - type: llm_judge
+ criteria: "Golden images and signing are covered — golden base images: maintained by platform team, pre-hardened (non-root, minimal packages, security config), scanned and signed, rebuilt on schedule (weekly or on CVE). Image signing: cosign for keyless signing (works with Sigstore), Docker Content Trust (DCT) for Docker-native. Admission control: reject unsigned images in production. Supply chain: SBOM generation (syft), provenance attestation (SLSA)"
+ weight: 0.35
+ description: "Golden images and signing"
+ - type: llm_judge
+ criteria: "Developer experience is balanced with security — developers should be able to build and test locally without friction. CI pipeline handles scanning/signing automatically. Clear documentation on approved base images. Fast feedback: scan in CI before merge, not just at deploy. Escape valve for urgent deployments (with audit trail). Self-service: developers can request new base images through a defined process. Metrics: track mean time from vulnerability disclosure to patched image deployment"
+ weight: 0.30
+ description: "Developer experience"
@@ -0,0 +1,63 @@
+ meta:
+ id: expert-debugging-shift
+ level: 4
+ course: docker-container-debugging
+ type: output
+ description: "Combined expert debugging shift — diagnose and design solutions for a production container platform with orchestration, security, performance, and operational challenges"
+ tags: [Docker, troubleshooting, combined, shift-simulation, expert]
+
+ state: {}
+
+ trigger: |
+ You're the newly hired container platform lead. On day one, you
+ discover the platform has accumulated significant technical debt.
+ The CTO wants a 90-day improvement plan.
+
+ Current state assessment:
+
+ Infrastructure: 20 Docker hosts, 150+ containers, Docker Compose
+ on each host, no orchestration. Manual deployments via SSH scripts.
+
+ Security audit findings:
+ - 40% of containers run as root with --privileged
+ - Docker socket mounted in 12 containers
+ - No image scanning — 3 containers have CRITICAL CVEs from 2023
+ - Secrets in docker-compose.yml files committed to git
+ - No network segmentation — all containers on default bridge
+
+ Performance issues:
+ - 5 containers with memory leaks, restarted nightly via cron
+ - No resource limits on 80% of containers
+ - Disk fills up monthly — manual cleanup each time
+ - No monitoring beyond Nagios ping checks
+
+ Operational gaps:
+ - No centralized logging — engineers SSH to each host for logs
+ - Deployments take 2 hours (manual process on 20 hosts)
+ - Rollback requires re-deploying the previous version manually
+ - Last week's incident: deployed wrong image tag to production,
+ took 4 hours to detect because no health checks
+
+ Team: 3 DevOps engineers, 30 developers, 0 security engineers
+
+ Budget: Can hire 2 more people, $50K/year for tooling
+
+ Task: Design the 90-day improvement plan. Write: the priority
+ ranking (what to fix first and why), quick wins vs long-term
+ improvements, orchestration decision, security remediation plan,
+ observability implementation, deployment automation strategy, team
+ structure and hiring priorities, and success metrics.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Priority ranking is justified — Phase 1 (days 1-30): security remediation (remove --privileged, remove socket mounts, rotate exposed secrets, scan images) and quick wins (health checks, resource limits, log rotation, automated disk cleanup). Phase 2 (days 31-60): deployment automation (CI/CD pipeline, image registry, automated rollouts) and observability (centralized logging, container monitoring). Phase 3 (days 61-90): orchestration evaluation (Swarm vs K8s based on team and scale), network segmentation, performance optimization"
+ weight: 0.35
+ description: "Priority ranking"
+ - type: llm_judge
+ criteria: "Resource and team strategy is realistic — $50K budget allocation: private registry (Harbor, free), monitoring (Prometheus + Grafana, free), CI/CD (GitLab CI or GitHub Actions, existing). Hiring: 1 security-focused DevOps + 1 platform engineer. Team structure: platform team (3 existing + 2 new) owns infrastructure, provides self-service to 30 developers. Developer enablement: standardized Dockerfiles, CI templates, documentation. Don't try to do everything at once — incremental improvements with measurable outcomes"
+ weight: 0.35
+ description: "Resource strategy"
+ - type: llm_judge
+ criteria: "Success metrics are measurable — security: 0 CRITICAL CVEs, 0 --privileged containers, 0 exposed secrets. Performance: all containers with resource limits, automated disk management, no manual restarts. Operations: deployment time < 15 minutes (from 2 hours), rollback < 5 minutes, MTTR < 30 minutes. Observability: centralized logs for all containers, alerting on container health/resources, dashboard coverage. Track DORA metrics: deployment frequency, lead time, change failure rate, MTTR"
+ weight: 0.30
+ description: "Success metrics"
@@ -0,0 +1,70 @@
+ meta:
+ id: incident-response-containers
+ level: 4
+ course: docker-container-debugging
+ type: output
+ description: "Design container incident response — implement forensic procedures, evidence preservation, root cause analysis, and post-incident improvements for container security events"
+ tags: [Docker, incident-response, forensics, security, post-incident, expert]
+
+ state: {}
+
+ trigger: |
+ Your security team detects anomalous behavior from a production
+ container: unexpected outbound connections to an unknown IP,
+ unusual processes running, and a spike in CPU usage.
+
+ Alert timeline:
+ 10:15 — Falco alert: "Shell spawned in container api-prod-3"
+ 10:16 — Network alert: Outbound connection to 198.51.100.42:4444
+ 10:17 — Falco alert: "Sensitive file opened: /etc/shadow"
+ 10:18 — CPU spike to 400% in api-prod-3
+
+ You are the incident commander. What do you do?
+
+ WRONG approach:
+ $ docker stop api-prod-3
+ $ docker rm api-prod-3
+ # Evidence destroyed! Can't determine what happened.
+
+ CORRECT approach:
+
+ Step 1 — Isolate without destroying:
+ $ docker network disconnect production-net api-prod-3
+ # Container still running but network-isolated
+
+ Step 2 — Preserve evidence:
+ $ docker export api-prod-3 > api-prod-3-filesystem.tar
+ $ docker logs api-prod-3 > api-prod-3-logs.txt
+ $ docker inspect api-prod-3 > api-prod-3-inspect.json
+ $ docker diff api-prod-3 > api-prod-3-diff.txt
+ $ docker top api-prod-3 > api-prod-3-processes.txt
+
+ Step 3 — Analyze:
+ $ docker exec api-prod-3 cat /proc/net/tcp # active connections
+ $ docker exec api-prod-3 find / -newer /app/server.js # recently modified files
+ $ docker exec api-prod-3 cat /proc/1/environ # check for injected env vars
+
+ Step 4 — Determine blast radius:
+ - What other containers could this container reach?
+ - Were any secrets or tokens accessible?
+ - What data was in the container's network segment?
+
+ Task: Design container incident response procedures. Write: the
+ isolation strategy (network disconnect vs stop), evidence collection
+ (export, logs, inspect, diff), forensic analysis techniques, blast
+ radius assessment, communication plan (who to notify), and
+ post-incident improvements to prevent recurrence.
+
+ assertions:
+ - type: llm_judge
+ criteria: "Isolation and evidence preservation are explained — ISOLATE FIRST: docker network disconnect removes network access while keeping container running for investigation. Do NOT docker rm — this destroys the writable layer and all evidence. Preserve: docker export (full filesystem tar), docker logs (stdout/stderr), docker inspect (full container config including env vars, mounts, network), docker diff (filesystem changes), docker top (running processes). Chain of custody: hash all evidence files, timestamp collection"
+ weight: 0.35
+ description: "Isolation and evidence"
+ - type: llm_judge
+ criteria: "Forensic analysis is covered — analyze filesystem changes: docker diff shows what was modified/added. Look for: new binaries, modified configs, dropped tools, cryptocurrency miners. Check /proc: /proc/net/tcp for connections, /proc/*/cmdline for processes, /proc/*/environ for environment. Network forensics: captured packets if tcpdump was running. Image comparison: diff the running container against the original image to find all attacker modifications. Timeline reconstruction from logs and filesystem timestamps"
+ weight: 0.35
+ description: "Forensic analysis"
+ - type: llm_judge
+ criteria: "Blast radius and post-incident are practical — blast radius: check what networks the container was on (docker network inspect), what secrets it had access to (docker inspect environment), what volumes were shared, what other services it could reach. Rotate all credentials the container had access to. Post-incident: add runtime monitoring (Falco), implement network segmentation, remove unnecessary capabilities, add read-only filesystem, review image for vulnerabilities that enabled the initial compromise. Write an incident report with timeline, impact, root cause, and remediation actions"
+ weight: 0.30
+ description: "Blast radius and post-incident"