dojo.md 0.2.0 → 0.2.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/courses/GENERATION_LOG.md +45 -0
- package/courses/aws-lambda-debugging/course.yaml +11 -0
- package/courses/aws-lambda-debugging/scenarios/level-1/api-gateway-integration.yaml +71 -0
- package/courses/aws-lambda-debugging/scenarios/level-1/cloudwatch-logs-basics.yaml +64 -0
- package/courses/aws-lambda-debugging/scenarios/level-1/cold-start-basics.yaml +70 -0
- package/courses/aws-lambda-debugging/scenarios/level-1/environment-variable-issues.yaml +72 -0
- package/courses/aws-lambda-debugging/scenarios/level-1/first-debugging-shift.yaml +73 -0
- package/courses/aws-lambda-debugging/scenarios/level-1/handler-import-errors.yaml +71 -0
- package/courses/aws-lambda-debugging/scenarios/level-1/iam-permission-errors.yaml +68 -0
- package/courses/aws-lambda-debugging/scenarios/level-1/invocation-errors.yaml +72 -0
- package/courses/aws-lambda-debugging/scenarios/level-1/lambda-timeout-errors.yaml +65 -0
- package/courses/aws-lambda-debugging/scenarios/level-1/memory-and-oom.yaml +70 -0
- package/courses/aws-lambda-debugging/scenarios/level-2/async-invocation-failures.yaml +72 -0
- package/courses/aws-lambda-debugging/scenarios/level-2/cold-start-optimization.yaml +76 -0
- package/courses/aws-lambda-debugging/scenarios/level-2/dynamodb-streams-debugging.yaml +70 -0
- package/courses/aws-lambda-debugging/scenarios/level-2/intermediate-debugging-shift.yaml +71 -0
- package/courses/aws-lambda-debugging/scenarios/level-2/lambda-concurrency-management.yaml +70 -0
- package/courses/aws-lambda-debugging/scenarios/level-2/lambda-layers-debugging.yaml +76 -0
- package/courses/aws-lambda-debugging/scenarios/level-2/sam-local-debugging.yaml +74 -0
- package/courses/aws-lambda-debugging/scenarios/level-2/sqs-event-source.yaml +72 -0
- package/courses/aws-lambda-debugging/scenarios/level-2/vpc-networking-issues.yaml +71 -0
- package/courses/aws-lambda-debugging/scenarios/level-2/xray-tracing.yaml +62 -0
- package/courses/aws-lambda-debugging/scenarios/level-3/advanced-debugging-shift.yaml +72 -0
- package/courses/aws-lambda-debugging/scenarios/level-3/container-image-lambda.yaml +79 -0
- package/courses/aws-lambda-debugging/scenarios/level-3/cross-account-invocation.yaml +72 -0
- package/courses/aws-lambda-debugging/scenarios/level-3/eventbridge-patterns.yaml +79 -0
- package/courses/aws-lambda-debugging/scenarios/level-3/iac-deployment-debugging.yaml +68 -0
- package/courses/aws-lambda-debugging/scenarios/level-3/kinesis-stream-processing.yaml +64 -0
- package/courses/aws-lambda-debugging/scenarios/level-3/lambda-at-edge.yaml +64 -0
- package/courses/aws-lambda-debugging/scenarios/level-3/lambda-extensions-debugging.yaml +67 -0
- package/courses/aws-lambda-debugging/scenarios/level-3/powertools-observability.yaml +79 -0
- package/courses/aws-lambda-debugging/scenarios/level-3/step-functions-debugging.yaml +80 -0
- package/courses/aws-lambda-debugging/scenarios/level-4/cost-optimization-strategy.yaml +67 -0
- package/courses/aws-lambda-debugging/scenarios/level-4/expert-debugging-shift.yaml +62 -0
- package/courses/aws-lambda-debugging/scenarios/level-4/incident-management-serverless.yaml +61 -0
- package/courses/aws-lambda-debugging/scenarios/level-4/multi-region-serverless.yaml +67 -0
- package/courses/aws-lambda-debugging/scenarios/level-4/observability-platform-design.yaml +71 -0
- package/courses/aws-lambda-debugging/scenarios/level-4/serverless-architecture-design.yaml +64 -0
- package/courses/aws-lambda-debugging/scenarios/level-4/serverless-data-architecture.yaml +66 -0
- package/courses/aws-lambda-debugging/scenarios/level-4/serverless-migration-strategy.yaml +65 -0
- package/courses/aws-lambda-debugging/scenarios/level-4/serverless-security-design.yaml +60 -0
- package/courses/aws-lambda-debugging/scenarios/level-4/serverless-testing-strategy.yaml +62 -0
- package/courses/aws-lambda-debugging/scenarios/level-5/board-serverless-strategy.yaml +63 -0
- package/courses/aws-lambda-debugging/scenarios/level-5/consulting-serverless-adoption.yaml +57 -0
- package/courses/aws-lambda-debugging/scenarios/level-5/industry-serverless-patterns.yaml +62 -0
- package/courses/aws-lambda-debugging/scenarios/level-5/ma-serverless-integration.yaml +75 -0
- package/courses/aws-lambda-debugging/scenarios/level-5/master-debugging-shift.yaml +61 -0
- package/courses/aws-lambda-debugging/scenarios/level-5/organizational-serverless-transformation.yaml +65 -0
- package/courses/aws-lambda-debugging/scenarios/level-5/regulatory-serverless.yaml +61 -0
- package/courses/aws-lambda-debugging/scenarios/level-5/serverless-economics.yaml +65 -0
- package/courses/aws-lambda-debugging/scenarios/level-5/serverless-future-technology.yaml +66 -0
- package/courses/aws-lambda-debugging/scenarios/level-5/serverless-platform-design.yaml +71 -0
- package/courses/docker-container-debugging/course.yaml +11 -0
- package/courses/docker-container-debugging/scenarios/level-1/container-exit-codes.yaml +59 -0
- package/courses/docker-container-debugging/scenarios/level-1/container-networking-basics.yaml +69 -0
- package/courses/docker-container-debugging/scenarios/level-1/docker-logs-debugging.yaml +67 -0
- package/courses/docker-container-debugging/scenarios/level-1/dockerfile-build-failures.yaml +71 -0
- package/courses/docker-container-debugging/scenarios/level-1/environment-variable-issues.yaml +74 -0
- package/courses/docker-container-debugging/scenarios/level-1/first-debugging-shift.yaml +70 -0
- package/courses/docker-container-debugging/scenarios/level-1/image-pull-failures.yaml +68 -0
- package/courses/docker-container-debugging/scenarios/level-1/port-mapping-issues.yaml +67 -0
- package/courses/docker-container-debugging/scenarios/level-1/resource-limits-oom.yaml +70 -0
- package/courses/docker-container-debugging/scenarios/level-1/volume-mount-problems.yaml +66 -0
- package/courses/docker-container-debugging/scenarios/level-2/container-health-checks.yaml +73 -0
- package/courses/docker-container-debugging/scenarios/level-2/docker-compose-debugging.yaml +66 -0
- package/courses/docker-container-debugging/scenarios/level-2/docker-exec-debugging.yaml +71 -0
- package/courses/docker-container-debugging/scenarios/level-2/image-layer-optimization.yaml +81 -0
- package/courses/docker-container-debugging/scenarios/level-2/intermediate-debugging-shift.yaml +73 -0
- package/courses/docker-container-debugging/scenarios/level-2/logging-and-log-rotation.yaml +76 -0
- package/courses/docker-container-debugging/scenarios/level-2/multi-stage-build-debugging.yaml +76 -0
- package/courses/docker-container-debugging/scenarios/level-2/network-debugging-tools.yaml +67 -0
- package/courses/docker-container-debugging/scenarios/level-2/pid1-signal-handling.yaml +71 -0
- package/courses/docker-container-debugging/scenarios/level-2/security-scanning-basics.yaml +67 -0
- package/courses/docker-container-debugging/scenarios/level-3/advanced-debugging-shift.yaml +77 -0
- package/courses/docker-container-debugging/scenarios/level-3/buildkit-optimization.yaml +67 -0
- package/courses/docker-container-debugging/scenarios/level-3/container-filesystem-debugging.yaml +70 -0
- package/courses/docker-container-debugging/scenarios/level-3/container-security-hardening.yaml +74 -0
- package/courses/docker-container-debugging/scenarios/level-3/disk-space-management.yaml +74 -0
- package/courses/docker-container-debugging/scenarios/level-3/docker-api-automation.yaml +72 -0
- package/courses/docker-container-debugging/scenarios/level-3/docker-daemon-issues.yaml +73 -0
- package/courses/docker-container-debugging/scenarios/level-3/docker-in-docker-ci.yaml +69 -0
- package/courses/docker-container-debugging/scenarios/level-3/overlay-network-debugging.yaml +70 -0
- package/courses/docker-container-debugging/scenarios/level-3/production-container-ops.yaml +71 -0
- package/courses/docker-container-debugging/scenarios/level-4/cicd-pipeline-design.yaml +66 -0
- package/courses/docker-container-debugging/scenarios/level-4/container-monitoring-observability.yaml +63 -0
- package/courses/docker-container-debugging/scenarios/level-4/container-orchestration-strategy.yaml +62 -0
- package/courses/docker-container-debugging/scenarios/level-4/container-performance-engineering.yaml +64 -0
- package/courses/docker-container-debugging/scenarios/level-4/container-security-architecture.yaml +66 -0
- package/courses/docker-container-debugging/scenarios/level-4/enterprise-image-management.yaml +58 -0
- package/courses/docker-container-debugging/scenarios/level-4/expert-debugging-shift.yaml +63 -0
- package/courses/docker-container-debugging/scenarios/level-4/incident-response-containers.yaml +70 -0
- package/courses/docker-container-debugging/scenarios/level-4/multi-environment-management.yaml +65 -0
- package/courses/docker-container-debugging/scenarios/level-4/stateful-service-containers.yaml +65 -0
- package/courses/docker-container-debugging/scenarios/level-5/board-infrastructure-strategy.yaml +58 -0
- package/courses/docker-container-debugging/scenarios/level-5/consulting-container-strategy.yaml +61 -0
- package/courses/docker-container-debugging/scenarios/level-5/container-platform-architecture.yaml +67 -0
- package/courses/docker-container-debugging/scenarios/level-5/container-platform-economics.yaml +67 -0
- package/courses/docker-container-debugging/scenarios/level-5/container-technology-evolution.yaml +67 -0
- package/courses/docker-container-debugging/scenarios/level-5/disaster-recovery-containers.yaml +66 -0
- package/courses/docker-container-debugging/scenarios/level-5/industry-container-patterns.yaml +71 -0
- package/courses/docker-container-debugging/scenarios/level-5/master-debugging-shift.yaml +62 -0
- package/courses/docker-container-debugging/scenarios/level-5/organizational-transformation.yaml +67 -0
- package/courses/docker-container-debugging/scenarios/level-5/regulatory-compliance-containers.yaml +61 -0
- package/courses/kubernetes-deployment-troubleshooting/course.yaml +12 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/configmap-secret-issues.yaml +69 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/crashloopbackoff.yaml +68 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/deployment-rollout.yaml +56 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/first-troubleshooting-shift.yaml +65 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/health-probe-failures.yaml +70 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/imagepullbackoff.yaml +57 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/kubectl-debugging-basics.yaml +56 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/oomkilled.yaml +70 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/pending-pods.yaml +68 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/service-not-reachable.yaml +66 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/dns-resolution-failures.yaml +63 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/helm-deployment-failures.yaml +63 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/hpa-scaling-issues.yaml +62 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/ingress-routing-issues.yaml +63 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/init-container-failures.yaml +63 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/intermediate-troubleshooting-shift.yaml +66 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/network-policy-blocking.yaml +67 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/persistent-volume-issues.yaml +69 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/rbac-permission-denied.yaml +57 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/resource-quota-limits.yaml +64 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/advanced-troubleshooting-shift.yaml +69 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/cluster-upgrade-failures.yaml +71 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/gitops-drift-detection.yaml +62 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/job-cronjob-failures.yaml +67 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/monitoring-alerting-gaps.yaml +64 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/multi-container-debugging.yaml +68 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/node-pressure-evictions.yaml +70 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/pod-disruption-budgets.yaml +59 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/service-mesh-debugging.yaml +64 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/statefulset-troubleshooting.yaml +69 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/capacity-planning.yaml +65 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/cost-optimization.yaml +57 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/disaster-recovery-design.yaml +56 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/executive-communication.yaml +62 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/expert-troubleshooting-shift.yaml +65 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/incident-management-process.yaml +59 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/multi-cluster-operations.yaml +62 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/multi-tenancy-design.yaml +55 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/platform-engineering.yaml +59 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/security-hardening.yaml +58 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/behavioral-science.yaml +62 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/board-strategy.yaml +61 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/cloud-native-future.yaml +65 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/comprehensive-platform.yaml +57 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/consulting-engagement.yaml +62 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/industry-benchmarks.yaml +58 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/ma-integration.yaml +62 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/master-troubleshooting-shift.yaml +73 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/product-development.yaml +65 -0
- package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/regulatory-compliance.yaml +76 -0
- package/courses/mysql-query-optimization/course.yaml +11 -0
- package/courses/mysql-query-optimization/scenarios/level-1/buffer-pool-basics.yaml +65 -0
- package/courses/mysql-query-optimization/scenarios/level-1/explain-basics.yaml +66 -0
- package/courses/mysql-query-optimization/scenarios/level-1/first-optimization-shift.yaml +78 -0
- package/courses/mysql-query-optimization/scenarios/level-1/innodb-index-fundamentals.yaml +68 -0
- package/courses/mysql-query-optimization/scenarios/level-1/join-basics.yaml +66 -0
- package/courses/mysql-query-optimization/scenarios/level-1/n-plus-one-queries.yaml +67 -0
- package/courses/mysql-query-optimization/scenarios/level-1/query-rewriting-basics.yaml +66 -0
- package/courses/mysql-query-optimization/scenarios/level-1/select-star-problems.yaml +68 -0
- package/courses/mysql-query-optimization/scenarios/level-1/slow-query-diagnosis.yaml +65 -0
- package/courses/mysql-query-optimization/scenarios/level-1/where-clause-optimization.yaml +65 -0
- package/courses/mysql-query-optimization/scenarios/level-2/buffer-pool-tuning.yaml +64 -0
- package/courses/mysql-query-optimization/scenarios/level-2/composite-index-design.yaml +71 -0
- package/courses/mysql-query-optimization/scenarios/level-2/covering-and-invisible-indexes.yaml +69 -0
- package/courses/mysql-query-optimization/scenarios/level-2/cte-and-window-functions.yaml +78 -0
- package/courses/mysql-query-optimization/scenarios/level-2/intermediate-optimization-shift.yaml +68 -0
- package/courses/mysql-query-optimization/scenarios/level-2/join-optimization.yaml +67 -0
- package/courses/mysql-query-optimization/scenarios/level-2/performance-schema-analysis.yaml +69 -0
- package/courses/mysql-query-optimization/scenarios/level-2/query-optimizer-hints.yaml +74 -0
- package/courses/mysql-query-optimization/scenarios/level-2/subquery-optimization.yaml +70 -0
- package/courses/mysql-query-optimization/scenarios/level-2/write-optimization.yaml +63 -0
- package/courses/mysql-query-optimization/scenarios/level-3/advanced-optimization-shift.yaml +71 -0
- package/courses/mysql-query-optimization/scenarios/level-3/connection-management.yaml +67 -0
- package/courses/mysql-query-optimization/scenarios/level-3/full-text-search.yaml +77 -0
- package/courses/mysql-query-optimization/scenarios/level-3/json-optimization.yaml +87 -0
- package/courses/mysql-query-optimization/scenarios/level-3/lock-contention-analysis.yaml +68 -0
- package/courses/mysql-query-optimization/scenarios/level-3/monitoring-alerting.yaml +63 -0
- package/courses/mysql-query-optimization/scenarios/level-3/online-schema-changes.yaml +79 -0
- package/courses/mysql-query-optimization/scenarios/level-3/partitioning-strategies.yaml +83 -0
- package/courses/mysql-query-optimization/scenarios/level-3/query-profiling-deep-dive.yaml +84 -0
- package/courses/mysql-query-optimization/scenarios/level-3/replication-optimization.yaml +66 -0
- package/courses/mysql-query-optimization/scenarios/level-4/aurora-vs-rds-evaluation.yaml +61 -0
- package/courses/mysql-query-optimization/scenarios/level-4/data-architecture.yaml +62 -0
- package/courses/mysql-query-optimization/scenarios/level-4/database-migration-planning.yaml +59 -0
- package/courses/mysql-query-optimization/scenarios/level-4/enterprise-governance.yaml +50 -0
- package/courses/mysql-query-optimization/scenarios/level-4/executive-communication.yaml +54 -0
- package/courses/mysql-query-optimization/scenarios/level-4/expert-optimization-shift.yaml +67 -0
- package/courses/mysql-query-optimization/scenarios/level-4/high-availability-architecture.yaml +60 -0
- package/courses/mysql-query-optimization/scenarios/level-4/optimizer-internals.yaml +62 -0
- package/courses/mysql-query-optimization/scenarios/level-4/performance-sla-design.yaml +52 -0
- package/courses/mysql-query-optimization/scenarios/level-4/read-replica-scaling.yaml +51 -0
- package/courses/mysql-query-optimization/scenarios/level-5/ai-database-future.yaml +45 -0
- package/courses/mysql-query-optimization/scenarios/level-5/behavioral-science.yaml +44 -0
- package/courses/mysql-query-optimization/scenarios/level-5/benchmark-design.yaml +47 -0
- package/courses/mysql-query-optimization/scenarios/level-5/board-strategy.yaml +48 -0
- package/courses/mysql-query-optimization/scenarios/level-5/comprehensive-platform.yaml +49 -0
- package/courses/mysql-query-optimization/scenarios/level-5/consulting-engagement.yaml +52 -0
- package/courses/mysql-query-optimization/scenarios/level-5/ma-database-integration.yaml +47 -0
- package/courses/mysql-query-optimization/scenarios/level-5/master-optimization-shift.yaml +56 -0
- package/courses/mysql-query-optimization/scenarios/level-5/product-development.yaml +48 -0
- package/courses/mysql-query-optimization/scenarios/level-5/regulatory-compliance.yaml +48 -0
- package/courses/postgresql-query-optimization/scenarios/level-5/comprehensive-database-system.yaml +70 -0
- package/courses/postgresql-query-optimization/scenarios/level-5/database-ai-future.yaml +81 -0
- package/courses/postgresql-query-optimization/scenarios/level-5/database-behavioral-science.yaml +63 -0
- package/courses/postgresql-query-optimization/scenarios/level-5/database-board-strategy.yaml +77 -0
- package/courses/postgresql-query-optimization/scenarios/level-5/database-consulting-engagement.yaml +61 -0
- package/courses/postgresql-query-optimization/scenarios/level-5/database-industry-benchmarks.yaml +64 -0
- package/courses/postgresql-query-optimization/scenarios/level-5/database-ma-integration.yaml +71 -0
- package/courses/postgresql-query-optimization/scenarios/level-5/database-product-development.yaml +72 -0
- package/courses/postgresql-query-optimization/scenarios/level-5/database-regulatory-landscape.yaml +76 -0
- package/courses/postgresql-query-optimization/scenarios/level-5/master-optimization-shift.yaml +66 -0
- package/courses/terraform-infrastructure-setup/course.yaml +11 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-1/hcl-syntax-errors.yaml +65 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-1/provider-configuration.yaml +62 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-1/terraform-init-errors.yaml +72 -0
- package/courses/terraform-infrastructure-setup/scenarios/level-1/variable-and-output-errors.yaml +78 -0
- package/dist/mcp/session-manager.d.ts +7 -4
- package/dist/mcp/session-manager.d.ts.map +1 -1
- package/dist/mcp/session-manager.js +23 -8
- package/dist/mcp/session-manager.js.map +1 -1
- package/package.json +3 -2
package/courses/docker-container-debugging/scenarios/level-3/advanced-debugging-shift.yaml
ADDED

@@ -0,0 +1,77 @@
+meta:
+  id: advanced-debugging-shift
+  level: 3
+  course: docker-container-debugging
+  type: output
+  description: "Combined advanced debugging shift — diagnose a production Docker environment with daemon, security, networking, storage, and deployment issues simultaneously"
+  tags: [Docker, troubleshooting, combined, shift-simulation, advanced]
+
+state: {}
+
+trigger: |
+  You're called in for an urgent production issue. The Docker host
+  runs 15 services and "everything is degraded":
+
+  $ docker system df
+  TYPE            TOTAL   ACTIVE   SIZE    RECLAIMABLE
+  Images          200+    15       85GB    62GB (72%)
+  Containers      45      15       25GB    18GB
+  Local Volumes   80      20       120GB   85GB (70%)
+  Build Cache     -       -        45GB    45GB
+
+  275GB consumed. Disk is at 94%.
+
+  $ docker stats --no-stream
+  CONTAINER   CPU%   MEM USAGE/LIMIT   MEM%
+  api-1       180%   3.8GiB/4GiB       95%
+  api-2       175%   3.7GiB/4GiB       92%
+  worker-1    85%    1.2GiB/2GiB       60%
+  postgres    25%    2.1GiB/4GiB       52%
+  redis       5%     512MiB/1GiB       50%
+  nginx       2%     128MiB/256MiB     50%
+
+  Investigation reveals 6 interconnected issues:
+
+  1. API containers at 95% memory — memory leak in a recent deployment.
+     The previous image was tagged :latest and overwritten. No rollback
+     available. Must debug and fix forward.
+
+  2. Disk at 94% — 85GB of orphaned volumes from old database migrations
+     that were never cleaned up. 45GB of build cache. Containers can't
+     write logs.
+
+  3. Worker containers failing intermittently — overlay network between
+     worker and Redis has MTU issues. Large payloads fail, small ones
+     succeed. tcpdump shows fragmented packets.
+
+  4. Security audit alert — 3 containers running as root with
+     --privileged flag. One container has a writable root filesystem
+     with a suspicious file: /tmp/.backdoor.sh
+
+  5. Nginx proxy returns 502 for 10% of requests — during rolling
+     updates, traffic is routed to containers that haven't finished
+     starting. No health check-based routing.
+
+  6. Docker daemon itself is consuming 8GB of memory due to a known
+     bug with container event logging. Needs daemon restart but
+     live-restore isn't configured.
+
+  Task: Walk through the complete triage and resolution. Write: the
+  priority order (security incident first!), immediate stabilization
+  steps, disk cleanup (safe vs dangerous commands), network debugging,
+  deployment fix, daemon maintenance, and the post-incident action
+  items for long-term fixes.
+
+assertions:
+  - type: llm_judge
+    criteria: "Priority triage is correct — (1) SECURITY FIRST: investigate the suspicious /tmp/.backdoor.sh file, check for container escape indicators, audit --privileged containers, preserve evidence before cleanup. (2) STABILITY: address 94% disk to prevent cascading failures — safe prune of build cache and dangling images first. (3) MEMORY: restart leaking API containers as immediate mitigation. (4) Fix networking MTU issue. (5) Fix deployment process for zero-downtime. (6) Plan daemon restart"
+    weight: 0.35
+    description: "Priority triage"
+  - type: llm_judge
+    criteria: "Immediate fixes and long-term actions are covered — immediate: quarantine compromised containers (don't delete — preserve for forensics), docker builder prune and docker image prune for safe disk recovery, restart API containers with memory limits, fix MTU on overlay network. Long-term: remove --privileged, implement non-root containers, tag images with versions (not :latest), configure live-restore in daemon.json, add health check-based routing, set up automated prune cron, implement monitoring and alerting"
+    weight: 0.35
+    description: "Fixes and actions"
+  - type: llm_judge
+    criteria: "Post-incident improvements are systematic — deploy monitoring: Prometheus + cAdvisor for container metrics, alert on memory/disk thresholds. Security: scan images in CI, enforce non-root, drop capabilities, read-only filesystems. Operations: automated cleanup policies, image retention limits, volume backup before prune. Deployment: rolling updates with health checks, image versioning, rollback capability. Documentation: runbook for disk emergencies, security incident response, update procedures"
+    weight: 0.30
+    description: "Post-incident improvements"
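Issue 6 in the scenario above hinges on `live-restore` not being configured, which would let the daemon restart without killing the running containers. A minimal `/etc/docker/daemon.json` sketch closing that gap (the log-rotation values are illustrative; `live-restore` and the `json-file` log options are standard daemon settings):

```json
{
  "live-restore": true,
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
```

With `live-restore` enabled, containers keep running while dockerd is restarted for the memory-leak fix; the log-rotation options also cap per-container log growth, which matters when the disk is already at 94%.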
package/courses/docker-container-debugging/scenarios/level-3/buildkit-optimization.yaml
ADDED

@@ -0,0 +1,67 @@
+meta:
+  id: buildkit-optimization
+  level: 3
+  course: docker-container-debugging
+  type: output
+  description: "Debug BuildKit build performance — diagnose slow builds, cache misses, and optimize multi-stage builds with BuildKit features"
+  tags: [Docker, BuildKit, build-cache, multi-stage, optimization, advanced]
+
+state: {}
+
+trigger: |
+  Your Docker image builds take 15+ minutes in CI. You're using BuildKit
+  but not seeing the expected speedup:
+
+  $ DOCKER_BUILDKIT=1 docker build -t myapp .
+  [+] Building 923.4s (18/18)
+   => [internal] load build context                          45.2s
+   => [stage-1 1/5] FROM node:20                              0.0s
+   => [stage-1 2/5] COPY package*.json ./                     0.3s
+   => [stage-1 3/5] RUN npm ci                              412.8s
+   => [stage-1 4/5] COPY . .                                 12.1s
+   => [stage-1 5/5] RUN npm run build                       180.5s
+   => [stage-2 1/3] FROM node:20-slim                         0.0s
+   => [stage-2 2/3] COPY --from=stage-1 /app/dist ./dist      2.1s
+   => [stage-2 3/3] COPY --from=stage-1 /app/node_modules     8.4s
+
+  Issues found:
+
+  1. Build context is 2.3GB — .dockerignore missing node_modules, .git,
+     and test fixtures. The 45-second context load is pure waste.
+
+  2. npm ci runs every build because COPY . . invalidates the cache.
+     Layer ordering: dependencies should be installed before copying
+     source code.
+
+  3. node_modules is 800MB because it includes devDependencies.
+     The production image only needs production deps.
+
+  4. No BuildKit cache mounts — npm ci downloads packages fresh each
+     time even when package.json hasn't changed:
+     RUN --mount=type=cache,target=/root/.npm npm ci
+
+  5. The COPY --from copies all of node_modules (800MB) into the final
+     image. Should run npm ci --production in the final stage or use
+     a separate dependency stage.
+
+  After optimization, build drops from 15 minutes to 2 minutes.
+
+  Task: Explain BuildKit optimization. Write: build context management
+  (.dockerignore), layer cache optimization (order matters), BuildKit
+  cache mounts (--mount=type=cache), multi-stage build patterns for
+  minimal images, parallel stage execution in BuildKit, and common
+  Dockerfile anti-patterns that kill build performance.
+
+assertions:
+  - type: llm_judge
+    criteria: "Layer caching strategy is explained — Docker caches each layer; if a layer changes, all subsequent layers are invalidated. Key pattern: copy dependency files first (package.json, requirements.txt), install dependencies, THEN copy source code. This way source code changes don't invalidate the dependency install layer. COPY . . before RUN npm install is an anti-pattern because any source change triggers a full reinstall"
+    weight: 0.35
+    description: "Layer caching"
+  - type: llm_judge
+    criteria: "BuildKit features are covered — cache mounts: RUN --mount=type=cache,target=/root/.npm npm ci (persists npm cache between builds). Secret mounts: RUN --mount=type=secret,id=npmrc (inject secrets without baking into layers). Parallel stage execution: BuildKit builds independent stages concurrently. Build context: .dockerignore should exclude .git, node_modules, test files, docs. Large context = slow builds even if files aren't used"
+    weight: 0.35
+    description: "BuildKit features"
+  - type: llm_judge
+    criteria: "Multi-stage optimization is practical — separate stages for: (1) dependency installation (with dev deps), (2) build/compile, (3) production image (minimal base, production deps only). Use specific base image tags (node:20.11-slim not node:latest). Final image should only contain runtime artifacts. Measure with docker images and docker history. Target: production images should be 10-50x smaller than build images"
+    weight: 0.30
+    description: "Multi-stage optimization"
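The five fixes listed in the scenario above compose into a single multi-stage Dockerfile. A sketch under assumptions (stage names, `/app` paths, and the `dist/index.js` entrypoint are illustrative; `--mount=type=cache` and `npm ci --omit=dev` are real BuildKit/npm features):

```dockerfile
# syntax=docker/dockerfile:1
# Stage 1: install all deps (incl. devDependencies) with a persistent npm cache
FROM node:20-slim AS deps
WORKDIR /app
COPY package*.json ./
RUN --mount=type=cache,target=/root/.npm npm ci

# Stage 2: build using the full dependency tree
FROM deps AS build
COPY . .
RUN npm run build

# Stage 3: minimal runtime image with production deps only
FROM node:20-slim AS runtime
WORKDIR /app
ENV NODE_ENV=production
COPY package*.json ./
RUN --mount=type=cache,target=/root/.npm npm ci --omit=dev
COPY --from=build /app/dist ./dist
CMD ["node", "dist/index.js"]
```

Source changes now only invalidate stages 2 onward, the npm cache mount survives across builds, and the runtime image never sees devDependencies. Pair this with a `.dockerignore` excluding `.git`, `node_modules`, and test fixtures so the context load stops costing 45 seconds.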
package/courses/docker-container-debugging/scenarios/level-3/container-filesystem-debugging.yaml
ADDED
@@ -0,0 +1,70 @@
+meta:
+  id: container-filesystem-debugging
+  level: 3
+  course: docker-container-debugging
+  type: output
+  description: "Debug container filesystem issues — diagnose storage driver problems, layer corruption, copy-on-write behavior, and container diff analysis"
+  tags: [Docker, filesystem, storage-driver, overlay2, copy-on-write, layers, advanced]
+
+state: {}
+
+trigger: |
+  Your application container is behaving strangely — configuration
+  files appear to have been modified, but nobody changed the image:
+
+    $ docker diff webapp-1
+    C /etc/nginx/nginx.conf
+    A /tmp/session_abc123
+    C /var/log/nginx/access.log
+    A /var/log/nginx/error.log
+    C /app/config/settings.json
+
+  The C (changed) entries show files modified in the container's
+  writable layer. Someone exec'd into the container and made changes
+  directly — these changes aren't in the image and will be lost on
+  container restart.
+
+  Deeper investigation reveals storage driver issues:
+
+    $ docker info | grep "Storage Driver"
+    Storage Driver: overlay2
+
+    $ ls /var/lib/docker/overlay2/
+    (hundreds of directories — one per layer)
+
+    $ du -sh /var/lib/docker/overlay2/
+    89GB
+
+  Problem 1: A container's writable layer grew to 15GB because the
+  application writes temporary files that are never cleaned up. Even
+  though the files are "deleted" inside the container, the overlay2
+  whiteout files still consume space.
+
+  Problem 2: docker commit was used to save a modified container as
+  a new image. This created a massive single layer with all the
+  runtime changes, making the image 3x larger than necessary.
+
+  Problem 3: After an unclean Docker shutdown, some overlay2 layers
+  are corrupted:
+    $ docker inspect broken-container
+    Error: layer not found
+
+  Task: Explain container filesystem debugging. Write: how overlay2
+  works (lower/upper/merged/work directories), copy-on-write behavior,
+  docker diff for change detection, whiteout files, why docker commit
+  is an anti-pattern, layer corruption recovery, and best practices
+  for managing container writable layers.
+
+assertions:
+  - type: llm_judge
+    criteria: "Overlay2 mechanics are explained — overlay2 uses lower directories (read-only image layers), upper directory (container's writable layer), merged directory (unified view), work directory (atomic operations). Copy-on-write: first write to a file copies it from lower to upper, then modifies. Deletes create whiteout files in upper layer (file still exists in lower). docker diff shows C (changed), A (added), D (deleted) in the writable layer"
+    weight: 0.35
+    description: "Overlay2 mechanics"
+  - type: llm_judge
+    criteria: "Layer management issues are covered — container writable layer grows with every write (even temporary files). Deleting files creates whiteout entries, not actual space recovery. docker commit captures the entire writable layer as a new image layer — includes temp files, logs, secrets, and is not reproducible. Always use Dockerfile for image creation, not docker commit. Large writable layers indicate the app should use volumes for data/logs/temp files"
+    weight: 0.35
+    description: "Layer management"
+  - type: llm_judge
+    criteria: "Recovery and best practices are practical — layer corruption: docker system prune can remove orphaned layers. For specific container: docker rm (removes writable layer). For images: docker rmi and re-pull. /var/lib/docker/overlay2 should not be manually modified. Best practices: use volumes for persistent data, tmpfs for temporary data, --read-only filesystem where possible, monitor writable layer size with docker system df -v, set storage-opts for layer size limits"
+    weight: 0.30
+    description: "Recovery and practices"
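The lower/upper/whiteout lookup order described in the first assertion can be sketched as a toy Python model. This is an illustrative sketch, not part of the course files: the class, sentinel, and sample paths are invented to mirror the `docker diff` output in the scenario.

```python
# Toy model of overlay2 lookup semantics: an upper (writable) dict over a
# stack of lower (read-only) dicts. A whiteout marker in the upper layer
# hides a file that still exists in a lower layer.
WHITEOUT = object()

class OverlayFS:
    def __init__(self, lowers):
        self.lowers = lowers      # read-only image layers, topmost first
        self.upper = {}           # container's writable layer

    def read(self, path):
        if path in self.upper:
            data = self.upper[path]
            if data is WHITEOUT:
                raise FileNotFoundError(path)   # deleted via whiteout
            return data
        for layer in self.lowers:
            if path in layer:
                return layer[path]              # served from image layer
        raise FileNotFoundError(path)

    def write(self, path, data):
        self.upper[path] = data                 # copy-up: writes land in upper

    def delete(self, path):
        # Deleting a file from an image layer never frees its space; it
        # only records a whiteout entry in the upper layer.
        self.upper[path] = WHITEOUT

    def diff(self):
        # Mimics `docker diff`: C = changed, A = added, D = deleted.
        out = []
        for path, data in sorted(self.upper.items()):
            in_lower = any(path in layer for layer in self.lowers)
            if data is WHITEOUT:
                out.append(("D", path))
            elif in_lower:
                out.append(("C", path))
            else:
                out.append(("A", path))
        return out

image = [{"/etc/nginx/nginx.conf": "orig", "/app/config/settings.json": "{}"}]
fs = OverlayFS(image)
fs.write("/etc/nginx/nginx.conf", "edited")   # copy-on-write: C
fs.write("/tmp/session_abc123", "data")       # new file: A
fs.delete("/app/config/settings.json")        # whiteout: D
print(fs.diff())
```

The real overlay2 driver does this per-file at the VFS layer, but the lookup order (upper first, then lowers) and the C/A/D classification are the same.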
package/courses/docker-container-debugging/scenarios/level-3/container-security-hardening.yaml
ADDED
@@ -0,0 +1,74 @@
+meta:
+  id: container-security-hardening
+  level: 3
+  course: docker-container-debugging
+  type: output
+  description: "Debug container security issues — diagnose privilege escalation, capability problems, seccomp profiles, and rootless container configuration"
+  tags: [Docker, security, rootless, seccomp, capabilities, hardening, advanced]
+
+state: {}
+
+trigger: |
+  A security audit of your Docker deployment flags several issues:
+
+  Finding 1 — Containers running as root:
+    $ docker exec webapp whoami
+    root
+    $ docker exec webapp id
+    uid=0(root) gid=0(root) groups=0(root)
+
+  If the application is compromised, the attacker has root inside the
+  container. With certain misconfigurations, this can lead to host
+  escape.
+
+  Finding 2 — Excessive capabilities:
+    $ docker exec webapp capsh --print
+    Current: cap_chown,cap_dac_override,cap_fsetid,cap_fowner,
+    cap_mknod,cap_net_raw,cap_setgid,cap_setuid,cap_setfcap,
+    cap_setpcap,cap_net_bind_service,cap_sys_chroot,cap_kill,cap_audit_write
+
+  Most apps don't need cap_net_raw, cap_sys_chroot, or cap_mknod.
+
+  Finding 3 — --privileged flag used "because it works":
+    docker run --privileged myapp
+  This gives the container ALL capabilities, access to all devices,
+  and disables seccomp/AppArmor. Used as a debugging shortcut but
+  left in production.
+
+  Finding 4 — Writable root filesystem:
+    $ docker exec webapp touch /etc/malicious-config
+    (succeeds — attacker can modify container filesystem)
+
+  Hardened configuration:
+    docker run \
+      --read-only \
+      --tmpfs /tmp \
+      --cap-drop ALL \
+      --cap-add NET_BIND_SERVICE \
+      --security-opt no-new-privileges:true \
+      --security-opt seccomp=custom-profile.json \
+      -u 1000:1000 \
+      myapp
+
+  But this breaks the app — it tries to write to /var/cache and
+  needs to bind to port 80 (requires NET_BIND_SERVICE or port > 1024).
+
+  Task: Explain container security hardening. Write: running as
+  non-root (USER directive, --user flag), Linux capabilities (drop
+  all, add specific), read-only filesystem (with tmpfs exceptions),
+  seccomp profiles, no-new-privileges, why --privileged is dangerous,
+  and the debugging process when hardening breaks applications.
+
+assertions:
+  - type: llm_judge
+    criteria: "Non-root and capabilities are explained — Dockerfile USER directive or --user flag. Running as root inside container is dangerous even with namespace isolation. Drop ALL capabilities, add only what's needed (NET_BIND_SERVICE for port < 1024, CHOWN for file ownership). --privileged gives ALL capabilities plus device access plus disables security modules — never use in production. To find required capabilities: run with all dropped, check error messages, add back one at a time"
+    weight: 0.35
+    description: "Non-root and capabilities"
+  - type: llm_judge
+    criteria: "Filesystem and seccomp are covered — --read-only makes root filesystem immutable (prevents attackers from modifying binaries/config). Use --tmpfs for directories that need writes (/tmp, /var/cache, /var/run). no-new-privileges prevents privilege escalation via setuid binaries. Seccomp profiles restrict syscalls — Docker's default profile blocks ~44 dangerous syscalls. Custom profiles can restrict further. AppArmor/SELinux add mandatory access control"
+    weight: 0.35
+    description: "Filesystem and seccomp"
+  - type: llm_judge
+    criteria: "Debugging hardened containers is practical — common breakages when hardening: app writes to filesystem (add tmpfs), needs specific capability (add one at a time), needs specific syscall blocked by seccomp (check audit log, adjust profile). Use strace to identify required syscalls. Use docker diff to see filesystem changes. Test hardening in staging before production. Security scanning (Trivy, Docker Scout) complements runtime hardening"
+    weight: 0.30
+    description: "Debugging hardened containers"
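The "drop ALL, add back one at a time" pattern from the scenario's hardened configuration can be sketched as a small flag builder. This is a hypothetical helper for illustration, not part of the course files; the function name and default arguments are invented.

```python
# Sketch: assemble hardened `docker run` arguments following the scenario's
# pattern. Drop ALL capabilities, re-add only the required ones, use a
# read-only root filesystem with tmpfs exceptions, and run as non-root.
def hardened_run_args(image, caps=(), tmpfs=("/tmp",), user="1000:1000"):
    args = ["docker", "run", "--read-only",
            "--security-opt", "no-new-privileges:true",
            "--cap-drop", "ALL"]
    for cap in caps:
        args += ["--cap-add", cap.upper()]   # re-add one capability at a time
    for path in tmpfs:
        args += ["--tmpfs", path]            # writable scratch dirs only
    args += ["-u", user, image]
    return args

args = hardened_run_args("myapp", caps=["net_bind_service"],
                         tmpfs=["/tmp", "/var/cache"])
print(" ".join(args))
```

Adding `/var/cache` as a tmpfs and re-adding only `NET_BIND_SERVICE` is exactly the debugging loop the third assertion describes: harden fully, then relax one restriction per observed failure.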
@@ -0,0 +1,74 @@
+meta:
+  id: disk-space-management
+  level: 3
+  course: docker-container-debugging
+  type: output
+  description: "Debug Docker disk space exhaustion — diagnose storage consumption from images, containers, volumes, and build cache, then implement cleanup strategies"
+  tags: [Docker, disk-space, prune, storage, build-cache, cleanup, advanced]
+
+state: {}
+
+trigger: |
+  Production Docker host is critically low on disk space. Containers
+  are failing to start:
+
+    $ docker run myapp
+    Error: no space left on device
+
+    $ df -h /var/lib/docker
+    Filesystem   Size  Used  Avail  Use%
+    /dev/sda1    200G  195G  5G     98%
+
+    $ docker system df
+    TYPE            TOTAL  ACTIVE  SIZE    RECLAIMABLE
+    Images          127    8       45.2GB  38.1GB (84%)
+    Containers      43     8       12.3GB  8.7GB (70%)
+    Local Volumes   56     12      78.5GB  52.3GB (66%)
+    Build Cache     -      -       34.8GB  34.8GB
+
+  170GB used by Docker! Breakdown:
+
+  1. Images (45GB, 84% reclaimable): 127 images, only 8 in use.
+     Old CI builds never cleaned up. Dangling images from failed
+     builds. Multiple versions of the same image.
+       $ docker images --filter "dangling=true" -q | wc -l
+       45
+
+  2. Containers (12GB): Stopped containers with large log files and
+     writable layer changes. docker rm only removes stopped containers.
+
+  3. Volumes (78GB, 66% reclaimable): Orphaned volumes from removed
+     containers. Database test volumes never cleaned up. Named volumes
+     for services that no longer exist.
+       $ docker volume ls --filter "dangling=true" -q | wc -l
+       44
+
+  4. Build cache (34GB): BuildKit cache from months of builds.
+
+  Cleanup strategy (careful — don't delete production data!):
+    $ docker system prune -a --volumes
+    WARNING! This will remove all stopped containers, all networks not
+    used, all images without containers, and all volumes not used.
+    Are you sure? [y/N]
+
+  But this is too aggressive for production! Need selective cleanup.
+
+  Task: Explain Docker disk space management. Write: understanding
+  docker system df output, selective cleanup (prune with filters),
+  dangerous vs safe prune commands, volume backup before cleanup,
+  automated cleanup strategies (cron, CI cleanup), monitoring disk
+  usage, and prevention (log rotation, image retention policies).
+
+assertions:
+  - type: llm_judge
+    criteria: "Disk usage diagnosis is explained — docker system df shows breakdown by type (images, containers, volumes, build cache). RECLAIMABLE shows what can be safely removed. docker system df -v shows detailed per-item usage. Disk consumers: dangling images (untagged), stopped containers (writable layer + logs), orphaned volumes (no container reference), build cache. Check with du -sh /var/lib/docker/* for filesystem-level breakdown"
+    weight: 0.35
+    description: "Disk usage diagnosis"
+  - type: llm_judge
+    criteria: "Selective cleanup is covered — safe: docker image prune (remove dangling only), docker container prune (remove stopped), docker builder prune (clear build cache). Filtered: docker image prune -a --filter 'until=720h' (remove images older than 30 days). DANGEROUS: docker system prune -a --volumes (removes everything unused including volumes with data). Always: backup named volumes before pruning, never prune volumes on database hosts without verification. Use docker volume inspect to check last mount time"
+    weight: 0.35
+    description: "Selective cleanup"
+  - type: llm_judge
+    criteria: "Prevention and automation are practical — automated cleanup: cron job with docker system prune --filter 'until=168h' (weekly, items older than 7 days). CI/CD: clean up after pipeline runs, limit image retention. Log rotation in daemon.json prevents log bloat. Image retention policy: keep only N latest tags. Monitor: alert when /var/lib/docker exceeds 80%. Use docker system events to track resource creation. Separate volume for /var/lib/docker to isolate Docker disk usage from system"
+    weight: 0.30
+    description: "Prevention and automation"
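The diagnosis step above (read `docker system df`, attack the largest RECLAIMABLE pool first) can be sketched as a small parser. The table literal mirrors the scenario output; the function names and parsing approach are illustrative assumptions, not course material.

```python
import re

# Sketch: rank `docker system df` rows by reclaimable bytes to decide
# which resource type to prune first. Mirrors the scenario's output.
DF = """\
Images          127    8       45.2GB  38.1GB (84%)
Containers      43     8       12.3GB  8.7GB (70%)
Local Volumes   56     12      78.5GB  52.3GB (66%)
Build Cache     -      -       34.8GB  34.8GB
"""

UNITS = {"KB": 1e3, "MB": 1e6, "GB": 1e9, "TB": 1e12}

def to_bytes(size):
    m = re.fullmatch(r"([\d.]+)([KMGT]B)", size)
    return float(m.group(1)) * UNITS[m.group(2)]

def rank_reclaimable(table):
    ranked = []
    for line in table.strip().splitlines():
        cols = re.split(r"\s{2,}", line.strip())   # columns are 2+ spaces apart
        rtype = cols[0]
        reclaim = cols[-1].split()[0]              # drop trailing "(84%)"
        ranked.append((rtype, to_bytes(reclaim)))
    return [rtype for rtype, _ in sorted(ranked, key=lambda r: -r[1])]

print(rank_reclaimable(DF))
```

Here volumes top the list, which is exactly why the scenario warns against the blanket `docker system prune -a --volumes`: the biggest win is also the riskiest to automate.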
@@ -0,0 +1,72 @@
+meta:
+  id: docker-api-automation
+  level: 3
+  course: docker-container-debugging
+  type: output
+  description: "Debug Docker API and automation issues — diagnose Docker SDK errors, API versioning problems, and container orchestration script failures"
+  tags: [Docker, API, SDK, automation, scripting, REST, advanced]
+
+state: {}
+
+trigger: |
+  Your container management script suddenly breaks after a Docker
+  Engine upgrade:
+
+    import docker
+    client = docker.from_env()
+    containers = client.containers.list(filters={"health": "unhealthy"})
+    for c in containers:
+        c.restart()
+        print(f"Restarted {c.name}")
+
+  Error:
+    docker.errors.APIError: 400 Client Error: Bad Request
+    ("invalid filter 'health'")
+
+  Investigation reveals several API issues:
+
+  1. API version mismatch — the script was written for Docker API 1.44
+     but the server now runs 1.45 which changed the filter syntax:
+       $ docker version --format '{{.Server.APIVersion}}'
+       1.45
+
+     Fix: pin API version in client:
+       client = docker.DockerClient(base_url='unix:///var/run/docker.sock',
+                                    version='1.44')
+
+  2. The script uses client.containers.list() to find containers,
+     but hits a race condition — between listing and restarting,
+     containers may have been removed by another process:
+       docker.errors.NotFound: 404 Client Error: Container not found
+
+  3. Direct API calls via curl for debugging:
+       $ curl --unix-socket /var/run/docker.sock \
+         http://localhost/v1.44/containers/json?filters={"health":["unhealthy"]}
+       $ curl --unix-socket /var/run/docker.sock \
+         http://localhost/v1.44/events?since=1700000000
+
+  4. The Docker socket permissions prevent the script from running
+     without root:
+       $ ls -la /var/run/docker.sock
+       srw-rw---- 1 root docker 0 Dec 01 10:00 /var/run/docker.sock
+     User not in docker group.
+
+  Task: Explain Docker API and automation debugging. Write: Docker
+  Engine API basics (REST over Unix socket), API versioning and
+  negotiation, Docker SDKs (Python, Go), common SDK errors and fixes,
+  race conditions in container management, socket permissions and
+  security, and using curl for raw API debugging.
+
+assertions:
+  - type: llm_judge
+    criteria: "Docker API fundamentals are explained — Docker Engine exposes a REST API over Unix socket (/var/run/docker.sock) or TCP. All Docker CLI commands are API calls under the hood. API versioning: client and server negotiate version. Pin version to avoid breaking changes. curl --unix-socket for raw API debugging. Key endpoints: /containers/json (list), /containers/{id}/start, /containers/{id}/logs, /events (stream), /images/json"
+    weight: 0.35
+    description: "API fundamentals"
+  - type: llm_judge
+    criteria: "SDK debugging is covered — Python SDK (docker-py): docker.from_env() reads DOCKER_HOST and DOCKER_TLS_VERIFY. Common errors: APIError (bad request/version mismatch), NotFound (container removed), ConnectionError (daemon not running). Always handle NotFound for race conditions. Use version parameter to pin API version. Go SDK (moby/moby/client): similar patterns. Check API compatibility: docker version shows client and server API versions"
+    weight: 0.35
+    description: "SDK debugging"
+  - type: llm_judge
+    criteria: "Security and best practices are practical — socket access = root-equivalent access (can mount host filesystem via API). Never expose Docker socket to untrusted containers. Socket permissions: add user to docker group (convenience) or use rootless Docker (security). For automation: handle race conditions (container may disappear), use docker events for reactive management instead of polling, implement retry logic with backoff, log all management actions for audit trail"
+    weight: 0.30
+    description: "Security and practices"
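The race-condition handling the second assertion asks for (tolerate `NotFound` between list and restart, retry transient errors with backoff) can be sketched without a running daemon. The `Fake*` classes below stand in for docker-py objects so the sketch is self-contained; the exception classes mimic `docker.errors` but are local assumptions.

```python
import time

class NotFound(Exception): pass    # stand-in for docker.errors.NotFound
class APIError(Exception): pass    # stand-in for docker.errors.APIError

class FakeContainer:
    """Minimal docker-py Container stand-in for the sketch."""
    def __init__(self, name, gone=False):
        self.name, self.gone = name, gone
        self.restarted = False
    def restart(self):
        if self.gone:
            raise NotFound(self.name)  # removed by another process mid-loop
        self.restarted = True

def restart_unhealthy(containers, retries=3, backoff=0.01):
    restarted, skipped = [], []
    for c in containers:
        for attempt in range(retries):
            try:
                c.restart()
                restarted.append(c.name)
                break
            except NotFound:
                skipped.append(c.name)  # race, not an error: container vanished
                break
            except APIError:
                time.sleep(backoff * 2 ** attempt)  # transient: back off, retry
    return restarted, skipped

web, db = FakeContainer("web"), FakeContainer("db", gone=True)
print(restart_unhealthy([web, db]))
```

With the real SDK the structure is identical: catch `docker.errors.NotFound` as a benign race outcome, and reserve retries for `APIError`/connection failures.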
@@ -0,0 +1,73 @@
+meta:
+  id: docker-daemon-issues
+  level: 3
+  course: docker-container-debugging
+  type: output
+  description: "Debug Docker daemon issues — diagnose daemon startup failures, storage driver problems, and daemon configuration errors"
+  tags: [Docker, daemon, storage-driver, daemon.json, configuration, advanced]
+
+state: {}
+
+trigger: |
+  The Docker daemon won't start after a configuration change:
+
+    $ sudo systemctl start docker
+    Job for docker.service failed because the control process exited
+    with error code.
+
+    $ sudo journalctl -u docker -e
+    dockerd: unable to configure the Docker daemon with file
+    /etc/docker/daemon.json: invalid character '}' after object key
+
+    $ cat /etc/docker/daemon.json
+    {
+      "storage-driver": "overlay2",
+      "log-driver": "json-file",
+      "log-opts": {
+        "max-size": "10m"
+        "max-file": "3"
+      }
+    }
+
+  Missing comma after "max-size": "10m"! JSON syntax error.
+
+  After fixing the JSON, a different error:
+    $ sudo journalctl -u docker -e
+    dockerd: failed to start daemon: error initializing graphdriver:
+    driver not supported
+
+  The server has a btrfs filesystem but overlay2 was configured as
+  the storage driver. overlay2 requires ext4 or xfs.
+
+  After changing to the correct storage driver:
+    $ sudo systemctl start docker
+    (success, but all containers and images are gone!)
+
+  Changing storage drivers creates a new storage directory. The old
+  images and containers using the previous driver are inaccessible.
+
+  Additional daemon issues to cover:
+  - Disk space exhaustion preventing container creation
+  - iptables conflicts breaking container networking
+  - DNS configuration in daemon.json
+  - Live restore feature for daemon upgrades
+
+  Task: Explain Docker daemon troubleshooting. Write: how to read
+  daemon logs (journalctl, Docker Desktop logs), common daemon.json
+  errors (JSON syntax, invalid options), storage driver compatibility
+  and migration, disk space management (docker system df, prune), and
+  daemon configuration best practices.
+
+assertions:
+  - type: llm_judge
+    criteria: "Daemon log analysis is explained — Linux: journalctl -u docker for systemd logs, Docker Desktop: ~/Library/Logs/Docker/ (Mac). Common startup failures: invalid daemon.json (JSON syntax error), storage driver incompatibility, disk space exhaustion, port conflicts, permission issues on /var/run/docker.sock. Always validate daemon.json with jq before restarting: jq empty /etc/docker/daemon.json"
+    weight: 0.35
+    description: "Daemon log analysis"
+  - type: llm_judge
+    criteria: "Storage driver issues are covered — overlay2 requires ext4/xfs filesystem, btrfs driver requires btrfs filesystem. Changing storage drivers means losing access to existing images/containers (stored in /var/lib/docker). Check current driver: docker info | grep 'Storage Driver'. Check filesystem: df -T /var/lib/docker. Use data-root in daemon.json to change Docker's storage location"
+    weight: 0.35
+    description: "Storage driver issues"
+  - type: llm_judge
+    criteria: "Configuration best practices are practical — keep a backup of working daemon.json, validate JSON before applying, use live-restore: true to keep containers running during daemon restart, configure log rotation globally, set DNS servers if needed, monitor disk space with docker system df. Disk cleanup: docker system prune -a removes unused images, containers, and networks (add --volumes to also remove unused volumes)"
+    weight: 0.30
+    description: "Configuration practices"
@@ -0,0 +1,69 @@
+meta:
+  id: docker-in-docker-ci
+  level: 3
+  course: docker-container-debugging
+  type: output
+  description: "Debug Docker-in-Docker and CI/CD pipeline issues — diagnose DinD socket mounting, build context problems, and CI-specific container failures"
+  tags: [Docker, DinD, CI/CD, socket, build-context, advanced]
+
+state: {}
+
+trigger: |
+  Your CI/CD pipeline (GitLab CI) suddenly fails at the Docker build step:
+
+  .gitlab-ci.yml:
+    build:
+      image: docker:24-dind
+      services:
+        - docker:24-dind
+      variables:
+        DOCKER_HOST: tcp://docker:2376
+        DOCKER_TLS_CERTDIR: "/certs"
+      script:
+        - docker build -t myapp:$CI_COMMIT_SHA .
+        - docker push registry.example.com/myapp:$CI_COMMIT_SHA
+
+  Error:
+    Cannot connect to the Docker daemon at tcp://docker:2376.
+    Is the docker daemon running?
+
+  Investigation:
+  1. The DinD service container starts but TLS handshake fails. The
+     DOCKER_TLS_CERTDIR variable must match between the client and
+     service containers. Also needs DOCKER_CERT_PATH and
+     DOCKER_TLS_VERIFY=1.
+
+  2. After fixing TLS, builds work but are extremely slow — every
+     build downloads all base images fresh because the DinD container
+     has no persistent storage.
+
+  3. Alternative approach — mounting the host Docker socket:
+       volumes:
+         - /var/run/docker.sock:/var/run/docker.sock
+     This is faster (shared image cache) but creates security risks:
+     the CI job has full access to the host Docker daemon.
+
+  4. The socket mount approach causes a different bug:
+     docker build -t myapp . uses the CI container's filesystem as
+     build context, but .dockerignore is missing, sending 2GB of
+     node_modules and .git to the daemon.
+
+  Task: Explain Docker-in-Docker and CI/CD debugging. Write: DinD vs
+  socket mounting (trade-offs), TLS configuration for DinD, build
+  caching strategies in CI (BuildKit cache mounts, registry cache),
+  build context optimization, and security considerations for Docker
+  in CI environments.
+
+assertions:
+  - type: llm_judge
+    criteria: "DinD vs socket mounting trade-offs are explained — DinD (docker:dind service): fully isolated, clean state each build, slower (no cache), more secure. Socket mounting (/var/run/docker.sock): uses host daemon, shared image cache (faster), but CI jobs get host-level Docker access (security risk). Socket mount also means containers created by CI are siblings, not children. DinD TLS setup requires matching DOCKER_TLS_CERTDIR, DOCKER_CERT_PATH, DOCKER_HOST with tcp:// and port 2376"
+    weight: 0.35
+    description: "DinD vs socket mounting"
+  - type: llm_judge
+    criteria: "Build caching in CI is covered — BuildKit inline cache: DOCKER_BUILDKIT=1 docker build --build-arg BUILDKIT_INLINE_CACHE=1. Registry-based cache: --cache-from registry.example.com/myapp:latest --cache-to type=registry. CI-specific: mount cache volumes for DinD, use --cache-from with previously pushed images. Build context: always use .dockerignore to exclude .git, node_modules, test files. Small context = fast builds"
+    weight: 0.35
+    description: "Build caching in CI"
+  - type: llm_judge
+    criteria: "Security considerations are practical — socket mounting gives full Docker access (can access other containers, mount host filesystem). DinD with --privileged also has elevated privileges. Alternatives: Kaniko (build without Docker daemon), Buildah (daemonless builds), Docker BuildKit with rootless mode. Credential management: use CI variables for registry auth, never hardcode in Dockerfile. Image scanning in CI pipeline before push"
+    weight: 0.30
+    description: "Security considerations"
@@ -0,0 +1,70 @@
+meta:
+  id: overlay-network-debugging
+  level: 3
+  course: docker-container-debugging
+  type: output
+  description: "Debug Docker overlay and multi-host networking — diagnose Swarm overlay networks, VXLAN issues, and cross-host container communication"
+  tags: [Docker, overlay, Swarm, VXLAN, multi-host, networking, advanced]
+
+state: {}
+
+trigger: |
+  Your Docker Swarm cluster has 3 nodes. Services on different nodes
+  can't communicate even though they're on the same overlay network:
+
+    $ docker network create -d overlay --attachable mynet
+    $ docker service create --name api --network mynet api-image
+    $ docker service create --name db --network mynet postgres:15
+
+  api and db are scheduled on different nodes:
+    $ docker service ps api
+    ID      NAME   NODE    CURRENT STATE
+    abc123  api.1  node-1  Running 5 minutes ago
+
+    $ docker service ps db
+    ID      NAME   NODE    CURRENT STATE
+    def456  db.1   node-2  Running 3 minutes ago
+
+  From api container:
+    $ docker exec api.1 ping db
+    PING db (10.0.1.5): 56 data bytes
+    (no response — packets lost)
+
+  Investigation:
+
+  1. The overlay network uses VXLAN (UDP port 4789) for encapsulation.
+     Node-2's firewall blocks UDP 4789:
+       $ sudo iptables -L -n | grep 4789
+       (no rule — VXLAN traffic is being dropped)
+
+  2. After opening port 4789, pings work but TCP connections to
+     postgres:5432 intermittently fail. MTU issue — VXLAN adds 50
+     bytes of overhead. The default MTU of 1500 causes fragmentation:
+       $ docker exec api.1 ping -s 1450 db
+       (works)
+       $ docker exec api.1 ping -s 1472 db
+       (fails — packet too large with VXLAN overhead)
+
+  3. DNS resolution between services uses Docker's embedded DNS
+     (127.0.0.11). In some cases, the VIP (virtual IP) load balancing
+     sends traffic to a container that hasn't finished starting.
+
+  Task: Explain Docker overlay network debugging. Write: how overlay
+  networks work (VXLAN encapsulation, control plane vs data plane),
+  required ports for Swarm (2377, 7946, 4789), MTU considerations,
+  DNS service discovery in Swarm (VIP vs DNSRR), cross-host debugging
+  tools and techniques, and common overlay network failure modes.
+
+assertions:
+  - type: llm_judge
+    criteria: "Overlay network mechanics are explained — overlay networks use VXLAN to encapsulate Layer 2 frames in UDP packets (port 4789). Docker Swarm requires: TCP 2377 (cluster management), TCP/UDP 7946 (node communication/gossip), UDP 4789 (VXLAN data). If any port is blocked, overlay networking fails. The ingress network handles published port routing (routing mesh). Each service gets a VIP on the overlay network"
+    weight: 0.35
+    description: "Overlay mechanics"
+  - type: llm_judge
+    criteria: "MTU and DNS issues are covered — VXLAN adds 50 bytes overhead, so effective MTU is 1450 (not 1500). Symptoms of MTU issues: small packets work, large transfers fail or hang. Fix: set MTU on overlay network creation (--opt com.docker.network.driver.mtu=1450) or in daemon.json. DNS: services resolve via embedded DNS (127.0.0.11). VIP mode (default) provides single virtual IP with load balancing. DNSRR mode returns all container IPs. VIP can route to starting containers — use health checks"
+    weight: 0.35
+    description: "MTU and DNS"
+  - type: llm_judge
+    criteria: "Debugging techniques are practical — check network connectivity: docker network inspect (shows connected containers and IPs). Use netshoot container on overlay network for debugging. Check VXLAN: tcpdump on host for UDP 4789. Verify port rules: iptables -L -n. Check Swarm gossip: docker node ls (all nodes must be Ready). Common failures: firewall blocking Swarm ports, MTU mismatch, stale network state (docker network disconnect/connect to reset)"
+    weight: 0.30
+    description: "Debugging techniques"
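The 50-byte VXLAN overhead cited in the scenario breaks down into fixed header sizes, which makes the 1450 effective MTU a simple subtraction. A sketch of that arithmetic (function names are illustrative, assuming IPv4 outer headers):

```python
# VXLAN encapsulation overhead: outer IPv4 (20) + outer UDP (8) +
# VXLAN header (8) + inner Ethernet frame header (14) = 50 bytes.
VXLAN_OVERHEAD = 20 + 8 + 8 + 14

def overlay_mtu(physical_mtu=1500):
    """Largest inner packet that survives encapsulation unfragmented."""
    return physical_mtu - VXLAN_OVERHEAD

def fits(inner_packet_size, physical_mtu=1500):
    """True if the encapsulated frame fits in one physical packet."""
    return inner_packet_size + VXLAN_OVERHEAD <= physical_mtu

print(overlay_mtu())          # effective overlay MTU on a 1500-byte link
print(fits(1450), fits(1500))
```

This is why the recommended fix sets `--opt com.docker.network.driver.mtu=1450`: a full-size 1500-byte inner packet encapsulates to 1550 bytes and cannot traverse a standard Ethernet link.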