npm - dojo.md - Versions diffs - 0.2.0 → 0.2.1 - Mend

dojo.md 0.2.0 → 0.2.1


Files changed (222)

package/courses/docker-container-debugging/scenarios/level-4/multi-environment-management.yaml ADDED

@@ -0,0 +1,65 @@
+meta:
+  id: multi-environment-management
+  level: 4
+  course: docker-container-debugging
+  type: output
+  description: "Design multi-environment container management — implement dev/staging/prod parity, environment-specific configuration, and promotion workflows"
+  tags: [Docker, environments, dev-prod-parity, configuration, promotion, expert]
+state: {}
+trigger: |
+  "It works in dev but not in production" — your team's most common
+  incident cause. Investigation reveals massive environment drift:
+  Development:
+  - docker compose up (all services on laptop)
+  - Uses :latest tags, builds locally
+  - Mounts source code as volumes (hot reload)
+  - Single instance of everything
+  - SQLite instead of PostgreSQL
+  - No resource limits, no health checks
+  Staging:
+  - docker compose -f docker-compose.staging.yml up
+  - Some services use :latest, some use specific tags
+  - Different Compose file with staging overrides
+  - PostgreSQL but different version than production
+  - Health checks defined but not enforced
+  Production:
+  - docker compose -f docker-compose.prod.yml up on 3 servers
+  - Manually tagged images (sometimes wrong tag)
+  - nginx reverse proxy manually configured
+  - PostgreSQL 15 with replication
+  - Resource limits, health checks, log rotation
+  Three separate Compose files that have diverged significantly.
+  Config values hardcoded differently in each environment.
+  Proposed solution architecture:
+  - Single docker-compose.yml with environment-specific overrides
+  - Identical images across all environments (config via env vars)
+  - Environment parity: same database engine, same service versions
+  - Promotion: same image SHA moves dev → staging → production
+  - Configuration management via .env files per environment
+  Task: Design multi-environment container management. Write: the
+  12-factor app approach to configuration, Compose override files
+  (docker-compose.override.yml), environment parity strategies,
+  image promotion workflows (same SHA across environments), secrets
+  per environment, and debugging environment-specific issues.
+assertions:
+  - type: llm_judge
+    criteria: "Environment parity is explained — the 12-factor app: same image in all environments, configuration via environment variables only. Docker Compose overrides: base docker-compose.yml + docker-compose.override.yml (dev defaults) + docker-compose.prod.yml (production overrides). Use docker compose -f base.yml -f prod.yml up. Keep dev close to prod: same database engine (not SQLite vs PostgreSQL), same service versions, same network topology. Dev additions: volume mounts for hot reload, debug ports"
+    weight: 0.35
+    description: "Environment parity"
+  - type: llm_judge
+    criteria: "Image promotion workflow is covered — build once, deploy everywhere: CI builds image with git SHA tag, same image moves through environments. Never rebuild for different environments — configuration changes, not the image. Promotion: dev (auto-deploy on merge) → staging (auto-deploy, run integration tests) → production (manual approval, same SHA). Image tags: git SHA for traceability, semantic version for releases, never :latest in staging/production. Registry as the source of truth for promoted images"
+    weight: 0.35
+    description: "Promotion workflow"
+  - type: llm_judge
+    criteria: "Configuration and debugging are practical — configuration hierarchy: defaults in image → .env file per environment → runtime environment variables. Secrets: Docker secrets for production, .env file for development (gitignored). Debugging environment-specific issues: compare docker inspect output between environments, diff effective configuration, check resource limits (may only manifest under production load), verify network topology matches. Common gotchas: DNS resolution differs, filesystem case sensitivity (Mac vs Linux), time zones, locale settings"
+    weight: 0.30
+    description: "Configuration and debugging"
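
The configuration hierarchy named in the last assertion above (defaults in the image, then a per-environment .env file, then runtime environment variables) can be sketched in Python. This is an illustrative sketch and not part of the package: `parse_env_file` and `resolve_config` are hypothetical helpers, the sample values are invented, and the parser assumes simple KEY=VALUE lines only.

```python
def parse_env_file(text):
    """Parse simple KEY=VALUE lines; skip blank lines and # comments (assumed format)."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve_config(image_defaults, env_file_text, runtime_env):
    """Later layers override earlier ones: image defaults < .env file < runtime env."""
    config = dict(image_defaults)
    config.update(parse_env_file(env_file_text))
    config.update(runtime_env)
    return config


# Hypothetical values, for illustration only.
defaults = {"DB_ENGINE": "postgres", "LOG_LEVEL": "info", "DB_HOST": "localhost"}
staging_env_file = "# staging overrides\nDB_HOST=db.staging.internal\nLOG_LEVEL=debug\n"
runtime = {"LOG_LEVEL": "warning"}

effective = resolve_config(defaults, staging_env_file, runtime)
```

Diffing the `effective` dictionaries of two environments is one way to approach the "diff effective configuration" debugging step, since the image itself is meant to be identical everywhere.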

package/courses/docker-container-debugging/scenarios/level-4/stateful-service-containers.yaml ADDED

@@ -0,0 +1,65 @@
+meta:
+  id: stateful-service-containers
+  level: 4
+  course: docker-container-debugging
+  type: output
+  description: "Design stateful service container strategies — implement database containers, persistent storage, backup/restore, and data migration for containerized databases"
+  tags: [Docker, stateful, databases, volumes, backup, persistence, expert]
+state: {}
+trigger: |
+  Your team wants to containerize production databases (PostgreSQL,
+  Redis, Elasticsearch). The DBA team objects: "Databases don't belong
+  in containers — we'll lose data." You need to design a bulletproof
+  strategy.
+  Current concerns:
+  1. Data persistence — container removal must NOT delete data:
+     $ docker rm postgres-prod
+     # Without a named volume, data sits in an anonymous volume that is easily orphaned or pruned!
+     Solution: named volumes, optionally with an explicit volume driver:
+     docker run -v pgdata:/var/lib/postgresql/data postgres:15
+  2. Backup strategy — can't just copy files while database is running:
+     $ docker exec postgres pg_dump -U app mydb > backup.sql
+     # Logical backup — portable, but slow for large databases
+     $ docker run --volumes-from postgres -v /backups:/backup \
+       ubuntu tar czf /backup/pgdata.tar.gz /var/lib/postgresql/data
+     # Physical backup — fast, but must stop writes first
+  3. Performance — default volume driver (local) may have overhead:
+     $ docker run -v pgdata:/var/lib/postgresql/data --mount type=tmpfs,target=/tmp \
+       --shm-size=256m postgres:15
+     # SHM size affects PostgreSQL shared memory for sorting/hashing
+  4. High availability — single container is single point of failure:
+     - PostgreSQL: streaming replication with pg_basebackup
+     - Redis: Sentinel for automatic failover
+     - Elasticsearch: cluster across multiple containers
+  5. Version upgrades — PostgreSQL major version requires data migration:
+     $ docker run -v pgdata-old:/old -v pgdata-new:/new \
+       tianon/postgres-upgrade pg_upgrade --old-datadir /old ...
+  Task: Design stateful container strategy. Write: volume management
+  for databases, backup/restore procedures, performance tuning for
+  containerized databases, high availability patterns, version upgrade
+  strategies, and when NOT to containerize databases.
+assertions:
+  - type: llm_judge
+    criteria: "Volume management is thorough — always use named volumes (not bind mounts) for database data. Volume drivers: local (default, good for single host), NFS (shared across hosts), cloud-specific (EBS, Azure Disk). Never use anonymous volumes for important data. docker volume inspect shows mount point. Backup the volume directory on host. Consider: volume permissions (database runs as specific UID), volume labels for organization, volume cleanup policies (never auto-prune database volumes)"
+    weight: 0.35
+    description: "Volume management"
+  - type: llm_judge
+    criteria: "Backup and HA are covered — logical backups: pg_dump (portable, slow), mysqldump, redis-cli BGSAVE. Physical backups: volume snapshot (requires write quiescing), filesystem-level backup. Schedule: daily full + continuous WAL archiving for point-in-time recovery. HA patterns: primary-replica with volume per instance, automated failover (PgBouncer + Patroni for PostgreSQL, Redis Sentinel, Elasticsearch cluster). Test restore regularly — backup without tested restore is not a backup"
+    weight: 0.35
+    description: "Backup and HA"
+  - type: llm_judge
+    criteria: "Trade-offs and guidance are practical — when to containerize databases: development/testing (always), stateless caches (Redis without persistence), small-medium production databases with proper volume management. When NOT to: very large databases (multi-TB), extreme I/O requirements, when managed database services are available (RDS, Cloud SQL). Performance tuning: --shm-size for PostgreSQL, dedicated CPU/memory limits, SSD-backed volumes, tune database config for container environment (not bare-metal defaults)"
+    weight: 0.30
+    description: "Trade-offs"
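
The logical backup step from the trigger above (`docker exec postgres pg_dump -U app mydb > backup.sql`) could be wrapped for scheduling roughly as follows. This is a minimal sketch under stated assumptions: `pg_dump_command` and `run_logical_backup` are hypothetical helpers, the container, user, and database names are the ones used in the scenario, and `dry_run` defaults to True so nothing is executed unless requested.

```python
import subprocess


def pg_dump_command(container, user, database):
    """Build the `docker exec ... pg_dump` argument list for a logical backup."""
    return ["docker", "exec", container, "pg_dump", "-U", user, database]


def run_logical_backup(container, user, database, output_path, dry_run=True):
    """Write pg_dump output to output_path; with dry_run, only return the command."""
    command = pg_dump_command(container, user, database)
    if dry_run:
        return command
    with open(output_path, "w") as backup_file:
        # check=True raises CalledProcessError if pg_dump exits non-zero.
        subprocess.run(command, stdout=backup_file, check=True)
    return command


command = run_logical_backup("postgres", "app", "mydb", "backup.sql")
```

A failed run can leave a partial file at `output_path`, so a real job would likely write to a temporary name first and then restore-test the result, in line with the "test restore regularly" criterion.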

package/courses/docker-container-debugging/scenarios/level-5/board-infrastructure-strategy.yaml ADDED

@@ -0,0 +1,58 @@
+meta:
+  id: board-infrastructure-strategy
+  level: 5
+  course: docker-container-debugging
+  type: output
+  description: "Board-level infrastructure strategy — present container technology investment to the board of directors with business outcomes, risk analysis, and competitive positioning"
+  tags: [Docker, board, strategy, business-outcomes, investment, executive, master]
+state: {}
+trigger: |
+  The board of directors wants to understand the company's container
+  platform investment. The CEO asks you (VP Engineering) to present
+  at the next board meeting. Board members are non-technical but
+  business-savvy.
+  Key metrics to present:
+  Business outcomes (last 12 months since container adoption):
+  - Deployment frequency: 4/month → 50/day (roughly 300x increase)
+  - Time to market for new features: 6 weeks → 1 week (6x faster)
+  - System availability: 99.7% → 99.95% ($1.2M saved in downtime)
+  - Infrastructure costs: $3.8M → $2.75M (28% reduction)
+  - Developer onboarding: 2 weeks → 2 days
+  - Incident recovery time: 4 hours → 25 minutes
+  Investment to date:
+  - Platform team (5 engineers): $900K/year
+  - Tooling and cloud: $1.825M/year
+  - Training: $25K/year
+  - Total: $2.75M/year
+  Questions the board will ask:
+  1. "What's the 3-year ROI?" (Cumulative savings + revenue acceleration)
+  2. "What are the risks?" (Vendor lock-in, skill dependency, security)
+  3. "How does this compare to competitors?"
+  4. "What happens if we stop investing?"
+  5. "What's the next phase and what will it cost?"
+  Task: Prepare the board presentation. Write: the executive narrative
+  (why containers matter for business outcomes), metrics that resonate
+  with board members, risk analysis with mitigations, competitive
+  positioning, investment roadmap for next 3 years, and the business
+  case for continued investment.
+assertions:
+  - type: llm_judge
+    criteria: "Executive narrative is non-technical — frame containers as 'standardized, portable application packaging that automates deployment.' Avoid jargon (no 'Docker,' 'Kubernetes,' 'orchestration'). Focus on outcomes: faster time to market enables revenue growth, higher availability protects revenue, lower infrastructure costs improve margins, faster developer onboarding supports hiring growth. Use analogies: containers are like shipping containers — standardized boxes that work everywhere"
+    weight: 0.35
+    description: "Executive narrative"
+  - type: llm_judge
+    criteria: "Metrics and ROI are compelling — 3-year ROI calculation: investment ($8.25M over 3 years) vs savings (infrastructure: $3.15M, downtime prevention: $3.6M, developer productivity: $5M+). Net ROI: positive within 18 months. Present metrics the board cares about: revenue impact (features shipped faster = revenue sooner), risk reduction (availability, security posture), competitive advantage (faster than competitors). Benchmarks: compare deployment frequency and MTTR to industry standards (DORA report)"
+    weight: 0.35
+    description: "Metrics and ROI"
+  - type: llm_judge
+    criteria: "Risk and forward strategy are balanced — risks: key person dependency (mitigate with documentation and cross-training), cloud vendor risk (mitigate with multi-cloud strategy), security (mitigate with defense-in-depth). What happens if we stop investing: technical debt accumulates, competitors gain advantage, talent retention suffers (engineers want modern tools). Next phase: AI/ML workloads on container platform, edge computing, developer experience improvements. Investment ask: maintain current spending + 15% growth for new capabilities. Present as continued competitive advantage, not just cost savings"
+    weight: 0.30
+    description: "Risk and strategy"
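
The 3-year ROI arithmetic in the "Metrics and ROI" assertion above can be checked with a short Python sketch. It follows the assertion's own framing (total platform spend as the investment, the three listed items as savings); the variable names are illustrative, and the developer productivity figure uses the lower bound of the "$5M+" estimate.

```python
# Figures in thousands of USD, taken from the scenario above.
ANNUAL_INVESTMENT = 2_750
YEARS = 3
SAVINGS_3Y = {
    "infrastructure": 3_150,          # (3,800 - 2,750) per year x 3
    "downtime_prevention": 3_600,     # 1,200 per year x 3
    "developer_productivity": 5_000,  # lower bound of the "$5M+" estimate
}

investment_3y = ANNUAL_INVESTMENT * YEARS
total_savings_3y = sum(SAVINGS_3Y.values())
net_benefit_3y = total_savings_3y - investment_3y
roi_percent = round(100 * net_benefit_3y / investment_3y, 1)
```

Under these assumptions the investment is $8.25M, the savings total $11.75M, and the net benefit is $3.5M, or about 42% over three years.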

package/courses/docker-container-debugging/scenarios/level-5/consulting-container-strategy.yaml ADDED

@@ -0,0 +1,61 @@
+meta:
+  id: consulting-container-strategy
+  level: 5
+  course: docker-container-debugging
+  type: output
+  description: "Consulting engagement — design a container adoption strategy for a traditional enterprise migrating from VMs to containers"
+  tags: [Docker, consulting, strategy, migration, enterprise, VM-to-container, master]
+state: {}
+trigger: |
+  A Fortune 500 manufacturing company with 500+ applications running
+  on VMware wants to containerize. Their CTO has heard "containers
+  are the future" but the organization has zero container experience.
+  Current state:
+  - 2,000+ VMs across 3 data centers
+  - 500+ applications (Java, .NET, Python, legacy C++)
+  - 50-person IT operations team skilled in VMware/Linux admin
+  - Compliance: SOX, ITAR (defense contracts), HIPAA (employee health)
+  - Budget: $2M over 2 years for modernization
+  - Recent incidents: 3 outages from VM sprawl (overprovisioned hosts)
+  Stakeholder concerns:
+  - CTO: "How fast can we containerize everything?"
+  - VP Engineering: "My teams can barely keep up with current work"
+  - CISO: "Containers are less secure than VMs" (misconception to address)
+  - CFO: "What's the ROI? VMs work fine."
+  - Operations: "We'll lose our jobs if we automate everything"
+  You need to deliver:
+  1. Application portfolio assessment (what to containerize first)
+  2. Platform architecture recommendation
+  3. Team transformation plan
+  4. Risk-adjusted timeline
+  5. Business case with ROI projections
+  Not all 500 applications should be containerized:
+  - Good candidates: stateless web apps, microservices, batch jobs
+  - Poor candidates: legacy mainframe connectors, GUI applications,
+    apps with hardware dependencies, some databases
+  - Never: real-time control systems, kernel-dependent applications
+  Task: Design the consulting engagement deliverable. Write: the
+  assessment framework (containerization readiness scoring), platform
+  recommendation, migration strategy (strangler fig for legacy),
+  organizational change management, ROI model, and risk mitigation.
+assertions:
+  - type: llm_judge
+    criteria: "Assessment framework is structured — score applications on: architecture (stateless vs stateful), dependencies (external hardware, specific OS), compliance requirements, team capability, business criticality. Categorize: containerize now (stateless web apps, APIs), containerize with refactoring (monoliths needing decomposition), containerize later (stateful with complex data), don't containerize (legacy, hardware-dependent). Start with low-risk, high-value candidates for quick wins"
+    weight: 0.35
+    description: "Assessment framework"
+  - type: llm_judge
+    criteria: "Platform and migration strategy are covered — platform: managed Kubernetes (EKS/AKS/GKE) unless air-gapped (then Rancher/OpenShift on-prem). Container runtime: Docker/containerd. Registry: Harbor for on-prem, ECR/GCR for cloud. CI/CD: GitLab or GitHub Actions with container pipeline. Migration: strangler fig pattern — new features as containers, gradually extract from monolith. Lift-and-shift first (containerize as-is), then optimize (refactor for cloud-native). Timeline: 6-month pilot, 18-month primary migration, ongoing optimization"
+    weight: 0.35
+    description: "Platform and migration"
+  - type: llm_judge
+    criteria: "Change management and ROI are practical — team transformation: upskill operations team (they become platform engineers, not eliminated). Training plan: Docker basics → Kubernetes → CI/CD. Hire 3-5 container specialists to lead and mentor. Address CISO: containers with proper hardening have smaller attack surface than VMs. ROI model: infrastructure cost reduction (higher density than VMs: 3-5x), operational efficiency (automated deployments, reduced MTTR), developer velocity (consistent environments, faster CI). Risks: timeline slippage, skill gaps, legacy app compatibility. Mitigation: phased approach, pilot first, keep VM fallback"
+    weight: 0.30
+    description: "Change management and ROI"
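
The containerization readiness scoring described in the "Assessment framework" assertion above might look like the following Python sketch. The criteria and category names come from the scenario; the 0-5 scale, the weights, the thresholds, and the sample applications are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Hypothetical weights; the scenario names the criteria but not their weighting.
WEIGHTS = {
    "architecture": 0.30,
    "dependencies": 0.25,
    "compliance": 0.15,
    "team_capability": 0.15,
    "business_criticality": 0.15,
}


@dataclass
class AppScores:
    """Each criterion is scored 0-5, where 5 is most favorable (assumed scale)."""
    name: str
    architecture: int
    dependencies: int
    compliance: int
    team_capability: int
    business_criticality: int
    hardware_dependent: bool = False


def readiness(app):
    """Weighted average of the criterion scores, on the same 0-5 scale."""
    return round(sum(getattr(app, key) * weight for key, weight in WEIGHTS.items()), 2)


def categorize(app):
    """Map a score to the scenario's four categories (thresholds are assumptions)."""
    if app.hardware_dependent:
        return "don't containerize"
    score = readiness(app)
    if score >= 4.0:
        return "containerize now"
    if score >= 3.0:
        return "containerize with refactoring"
    return "containerize later"


web_api = AppScores("orders-api", 5, 5, 4, 3, 4)
monolith = AppScores("erp-monolith", 3, 3, 3, 3, 4)
plc_bridge = AppScores("plc-bridge", 1, 0, 2, 2, 5, hardware_dependent=True)
```

Treating hardware dependence as a hard exclusion rather than a low score keeps such applications out of the migration backlog regardless of how well they score elsewhere.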

package/courses/docker-container-debugging/scenarios/level-5/container-platform-architecture.yaml ADDED

@@ -0,0 +1,67 @@
+meta:
+  id: container-platform-architecture
+  level: 5
+  course: docker-container-debugging
+  type: output
+  description: "Design enterprise container platform architecture — build an internal developer platform with self-service container deployment, governance, and operational excellence"
+  tags: [Docker, platform, architecture, IDP, self-service, governance, master]
+state: {}
+trigger: |
+  Your company (2,000 engineers, 300+ services) needs an Internal
+  Developer Platform (IDP) built on containers. The goal: developers
+  deploy services without understanding infrastructure details.
+  Requirements from stakeholders:
+  Engineering VP: "Developers should deploy in under 5 minutes with
+  zero infrastructure knowledge. Current process takes 2 weeks
+  involving 3 teams."
+  CISO: "Every deployment must be scanned, signed, and compliant.
+  We need audit trails for SOC2."
+  SRE Director: "We need 99.95% platform availability. Current
+  container infrastructure has 99.7% (26 hours downtime/year)."
+  CFO: "Platform team costs $3M/year. Justify every dollar."
+  Platform architecture layers:
+  Layer 1 — Infrastructure:
+  Multi-region Kubernetes clusters (managed), container runtime,
+  CNI networking, storage classes, ingress controllers.
+  Layer 2 — Platform Services:
+  Service mesh (Istio/Linkerd), centralized logging (ELK/Loki),
+  monitoring (Prometheus/Grafana), secrets management (Vault),
+  CI/CD (ArgoCD/Flux), registry (Harbor).
+  Layer 3 — Developer Interface:
+  Self-service portal (Backstage), templated service creation,
+  one-click deployments, environment management, cost dashboards.
+  Layer 4 — Governance:
+  Policy engine (OPA/Kyverno), security scanning gates,
+  resource quotas, network policies, compliance reporting.
+  Task: Design the platform architecture. Write: each layer with
+  component selection rationale, the developer experience journey
+  (from code to production), governance automation, platform team
+  structure and responsibilities, SLO/SLA framework, and how to
+  measure platform success (adoption, developer satisfaction, MTTR).
+assertions:
+  - type: llm_judge
+    criteria: "Platform layers are well-designed — Infrastructure: managed K8s for reduced operational burden, multi-region for HA, storage classes for different workload needs. Platform services: service mesh for traffic management and mTLS, centralized observability, GitOps (ArgoCD/Flux) for declarative deployments. Developer interface: Backstage or similar portal for service catalog, templated creation (golden paths), self-service environments. Each layer has clear ownership and SLOs"
+    weight: 0.35
+    description: "Platform layers"
+  - type: llm_judge
+    criteria: "Developer experience is frictionless — golden path: developer creates service from template (Backstage), CI automatically builds/scans/signs, CD deploys to staging, promotion to production with one click. Time from commit to production: < 30 minutes. Self-service: developers manage their own services without tickets. Guard rails, not gates: policies enforced automatically, not by manual review. Documentation: runbooks auto-generated, dashboards pre-configured per service template"
+    weight: 0.35
+    description: "Developer experience"
+  - type: llm_judge
+    criteria: "Success metrics are measurable — platform adoption: % of services on platform (target: 80% in year 1). Developer satisfaction: quarterly survey (NPS > 50). MTTR: < 15 minutes for platform issues. Deployment frequency: daily (from weekly). Change failure rate: < 5%. Cost per service: decreasing trend. Platform team efficiency: services per platform engineer ratio (target: 50:1). SLA: 99.95% platform availability with error budget policy. Track and report these metrics monthly to stakeholders"
+    weight: 0.30
+    description: "Success metrics"
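
The availability figures quoted in this scenario (99.7% described as 26 hours of downtime per year, and a 99.95% target) translate into a yearly downtime budget as sketched below in Python. The helper name is illustrative, and the calculation assumes a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def allowed_downtime_hours(availability_percent):
    """Yearly downtime permitted by an availability target, in hours."""
    return round(HOURS_PER_YEAR * (100 - availability_percent) / 100, 2)


current_budget = allowed_downtime_hours(99.7)
target_budget = allowed_downtime_hours(99.95)
```

The current level allows about 26.28 hours per year and the target about 4.38 hours, so the error budget shrinks to roughly one sixth of its present size.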

package/courses/docker-container-debugging/scenarios/level-5/container-platform-economics.yaml ADDED

@@ -0,0 +1,67 @@
+meta:
+  id: container-platform-economics
+  level: 5
+  course: docker-container-debugging
+  type: output
+  description: "Analyze container platform economics — evaluate total cost of ownership, resource optimization, and financial models for containerized infrastructure"
+  tags: [Docker, economics, TCO, FinOps, cost-optimization, ROI, master]
+state: {}
+trigger: |
+  The CFO wants a comprehensive financial analysis of your container
+  platform before approving the next year's infrastructure budget.
+  Current container platform costs (annual):
+  - Cloud compute (EC2/GCE instances): $1.2M
+  - Container registry (ECR): $15K
+  - Load balancers (ALB/NLB): $85K
+  - Storage (EBS volumes, S3 for backups): $180K
+  - Monitoring (Datadog): $240K
+  - Security tools (Snyk, Falco): $60K
+  - CI/CD (GitHub Actions): $48K
+  - Personnel: 5 platform engineers × $180K = $900K
+  - Training and certification: $25K
+  Total: ~$2.75M
+  Compared to the previous VM-based infrastructure: $3.8M
+  Container platform saves $1.05M/year (28% reduction).
+  But the CFO challenges:
+  1. "Monitoring costs tripled — why is Datadog $240K?"
+     Container environments generate 10x more metrics/logs.
+  2. "We have 40% average CPU utilization — we're over-provisioned"
+     Containers are sized for peak, not average. Right-sizing
+     could save $300-400K.
+  3. "Can we use spot/preemptible instances for containers?"
+     Stateless containers are perfect for spot instances at
+     60-90% discount, but need graceful shutdown handling.
+  4. "What about reserved instances vs on-demand?"
+     Base load on reserved (1-3 year commitment for 30-60% savings),
+     burst on spot, safety net on on-demand.
+  5. "Alternative: what would self-hosted cost?"
+     Hardware amortization, colo, hiring more ops staff.
+  Task: Analyze container platform economics. Write: TCO model
+  (compute, storage, networking, tooling, personnel), optimization
+  strategies (right-sizing, spot instances, reserved capacity),
+  monitoring cost management, build vs buy decisions, and how to
+  present infrastructure ROI to finance stakeholders.
+assertions:
+  - type: llm_judge
+    criteria: "TCO model is comprehensive — include all cost categories: compute (instances/servers), storage (volumes, snapshots, backups), networking (load balancers, data transfer, VPN), tooling (registry, CI/CD, monitoring, security), personnel (platform team, on-call, training), and hidden costs (incident remediation, technical debt, compliance audits). Compare: cloud vs on-prem vs hybrid. Account for opportunity cost — what could engineers build instead of managing infrastructure?"
+    weight: 0.35
+    description: "TCO model"
+  - type: llm_judge
+    criteria: "Optimization strategies are actionable — right-sizing: analyze CPU/memory usage percentiles (p95, not average) to set container limits. Use Vertical Pod Autoscaler recommendations. Spot instances: 60-90% savings for stateless, fault-tolerant workloads. Reserved instances: 1-year or 3-year commitments for baseline capacity. Monitoring costs: reduce metric cardinality, sample logs instead of collecting all, use open-source alternatives (Prometheus + Grafana instead of Datadog for $200K savings). Multi-cloud or serverless for specific workloads"
+    weight: 0.35
+    description: "Optimization strategies"
+  - type: llm_judge
+    criteria: "Financial communication is stakeholder-appropriate — present to CFO in business terms: ROI percentage, payback period, cost per transaction/user, comparison with industry benchmarks. Show trends: cost per container decreasing as platform matures. Avoid technical jargon — translate 'right-sizing containers' to 'eliminating waste in compute resources.' Include risk-adjusted scenarios: optimistic (40% savings), realistic (25%), conservative (15%). Demonstrate continuous improvement: quarterly cost reviews, automated rightsizing recommendations"
+    weight: 0.30
+    description: "Financial communication"
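
The right-sizing advice in the "Optimization strategies" assertion above (size from p95 usage, not the average) can be illustrated with a small Python sketch. The nearest-rank percentile method, the 20% headroom, and the sample data are assumptions chosen for the example; real recommendations would come from monitoring data or a tool such as the Vertical Pod Autoscaler.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def recommend_cpu_limit(samples_millicores, headroom_percent=20):
    """p95 usage plus headroom, rounded up to a whole millicore (assumed policy)."""
    p95 = percentile(samples_millicores, 95)
    return math.ceil(p95 * (100 + headroom_percent) / 100)


# Hypothetical CPU usage samples in millicores: mostly idle with two spikes.
usage = [100] * 18 + [400, 900]

average = sum(usage) / len(usage)
p95 = percentile(usage, 95)
limit = recommend_cpu_limit(usage)
```

Here the average is 155 millicores while p95 is 400, which shows why sizing from the average would throttle the workload during normal spikes.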

package/courses/docker-container-debugging/scenarios/level-5/container-technology-evolution.yaml ADDED

@@ -0,0 +1,67 @@
+meta:
+  id: container-technology-evolution
+  level: 5
+  course: docker-container-debugging
+  type: output
+  description: "Analyze container technology evolution — evaluate emerging technologies like WebAssembly, eBPF, unikernels, and serverless containers and their impact on Docker"
+  tags: [Docker, WebAssembly, eBPF, serverless, evolution, future, master]
+state: {}
+trigger: |
+  Your company's technology radar review asks: "Is Docker still the
+  right bet? What's coming next in containerization?"
+  You need to evaluate emerging technologies and their relationship
+  to Docker:
+  1. WebAssembly (Wasm) Containers:
+  Docker co-founder Solomon Hykes: "If WASM+WASI existed in 2008,
+  we wouldn't have needed to create Docker."
+  Wasm advantages: ~1ms cold start (vs seconds for containers),
+  near-native performance, language-agnostic, smaller binaries (KBs
+  vs MBs), sandboxed by design. Docker Desktop can already run Wasm
+  workloads via containerd's Wasm shims (spin, wasmtime).
+  But: limited system access, no filesystem by default, ecosystem
+  still maturing, not suitable for all workloads (databases, legacy apps).
+  2. eBPF for Container Networking/Security:
+  Cilium replaces iptables with eBPF for container networking —
+  10x better performance for service mesh. Tetragon provides
+  kernel-level runtime security without sidecar containers.
+  3. Serverless Containers (AWS Fargate, Google Cloud Run):
+  Run containers without managing servers. Pay per second of
+  compute. No cluster management, no node scaling.
+  Trade-off: less control, cold starts, vendor lock-in, higher
+  per-unit cost at scale.
+  4. Unikernels and MicroVMs (Firecracker):
+  VM-level isolation with container-like density. Firecracker
+  (powers Lambda/Fargate): boots a microVM in 125ms, 5MB memory
+  overhead. Kata Containers: run each container in its own microVM.
+  5. OCI Runtime Alternatives:
+  containerd replacing Docker daemon. crun (C-based, faster than
+  runc). youki (Rust-based). gVisor (user-space kernel sandboxing).
+  Task: Evaluate container technology evolution. Write: assessment
+  of each technology (maturity, use cases, limitations), impact on
+  Docker's role, migration considerations, timeline predictions,
+  and strategic recommendations for technology adoption.
+assertions:
+  - type: llm_judge
+    criteria: "Technology assessments are balanced — Wasm: production-ready for edge/serverless workloads, not ready to replace general-purpose containers. Excellent for plugins, edge computing, serverless functions. eBPF: production-ready (Cilium, Tetragon), revolutionizing container networking and security. Serverless containers: production-ready, ideal for variable workloads and small teams. MicroVMs: production-ready (Firecracker), used by major cloud providers, good for multi-tenant isolation. Each technology complements rather than replaces Docker"
+    weight: 0.35
+    description: "Technology assessments"
+  - type: llm_judge
+    criteria: "Docker's evolving role is explained — Docker is no longer just a runtime — it's a developer experience layer. containerd is the actual container runtime in most Kubernetes clusters. Docker's value: developer workflow (build, test, local development), Dockerfile as standard build format, Docker Desktop as development environment, Docker Hub as distribution platform. Docker won't disappear but its role shifts from 'the container platform' to 'the developer tool for containers.' OCI standards ensure interoperability"
+    weight: 0.35
+    description: "Docker's role"
+  - type: llm_judge
+    criteria: "Strategic recommendations are practical — don't chase every new technology. Adopt based on: maturity (production-ready?), team capability (learning curve), ROI (real problems it solves), ecosystem (tooling, community). Timeline: eBPF/Cilium — adopt now for networking. Serverless containers — use for appropriate workloads now. Wasm — pilot for edge/plugin use cases, watch for general workloads. MicroVMs — let cloud providers manage. Key principle: containers (OCI) remain the packaging format; the runtime and orchestration evolve underneath"
+    weight: 0.30
+    description: "Strategic recommendations"

package/courses/docker-container-debugging/scenarios/level-5/disaster-recovery-containers.yaml ADDED

@@ -0,0 +1,66 @@
+meta:
+  id: disaster-recovery-containers
+  level: 5
+  course: docker-container-debugging
+  type: output
+  description: "Design container disaster recovery — implement backup strategies, multi-region failover, RPO/RTO targets, and chaos engineering for containerized applications"
+  tags: [Docker, disaster-recovery, multi-region, RPO-RTO, chaos-engineering, master]
+state: {}
+trigger: |
+  Your container platform experienced a catastrophic failure: a cloud
+  provider region outage took down all services for 6 hours. Revenue
+  impact: $2.4M. The board demands a disaster recovery plan.
+  Current state (single region):
+  - All containers in us-east-1
+  - Database backups: daily SQL dumps to S3 (same region)
+  - No automated failover
+  - Recovery process: manual, undocumented, took 6 hours
+  - RPO: 24 hours (last backup could be 24h old)
+  - RTO: 6 hours (how long it took to restore)
+  Business requirements:
+  - RPO target: < 1 hour (max 1 hour of data loss)
+  - RTO target: < 15 minutes (max 15 minutes of downtime)
+  - Cost constraint: DR infrastructure < 30% of primary cost
+  Proposed DR architecture:
+  Tier 1 — Critical services (payment, auth, core API):
+  Active-active across 2 regions. Database: multi-region replication.
+  RPO: ~0 (synchronous replication). RTO: ~0 (automatic failover).
+  Cost: 100% duplication.
+  Tier 2 — Important services (user dashboard, notifications):
+  Active-passive with warm standby. Database: async replication.
+  RPO: < 5 minutes. RTO: < 15 minutes.
+  Cost: 30-50% of primary.
+  Tier 3 — Non-critical (analytics, reporting, batch jobs):
+  Backup and restore. Database: daily backups to DR region.
+  RPO: < 24 hours. RTO: < 4 hours.
+  Cost: 10% of primary (storage only).
+  Validation: quarterly DR drills + continuous chaos engineering.
+  Task: Design the container DR strategy. Write: the tiered approach
+  with RPO/RTO definitions, multi-region container deployment,
+  database replication for containerized databases, automated
+  failover mechanisms, DR testing (chaos engineering, game days),
+  and cost optimization for DR infrastructure.
+assertions:
+  - type: llm_judge
+    criteria: "Tiered DR architecture is justified — not all services need the same DR level. Tier by business impact: revenue-generating services get active-active, supporting services get warm standby, internal tools get backup/restore. RPO (Recovery Point Objective): maximum acceptable data loss. RTO (Recovery Time Objective): maximum acceptable downtime. Cost scales with RPO/RTO: near-zero requires full duplication, longer RPO/RTO uses cheaper strategies. Service dependency mapping determines which services must fail over together"
+    weight: 0.35
+    description: "Tiered DR"
+  - type: llm_judge
+    criteria: "Container-specific DR is covered — containers are inherently portable: same image runs anywhere. Stateless containers: deploy to DR region from registry (no data migration needed). Stateful containers: need data replication strategy (database replication, volume snapshots, S3 cross-region replication). Multi-region deployment: global load balancer (Route53/CloudFront), container images replicated to DR region registry, infrastructure-as-code (Terraform) for consistent DR environment. DNS failover: health check based, automatic redirect to DR region"
+    weight: 0.35
+    description: "Container DR"
+  - type: llm_judge
+    criteria: "Testing and chaos engineering are practical — DR plan is worthless without testing. Quarterly DR drills: simulate region failure, measure actual RTO/RPO. Chaos engineering: Netflix Chaos Monkey approach — randomly terminate containers in production to verify resilience. Tools: Chaos Monkey, Litmus, Pumba (Docker-specific chaos). Game days: scheduled full-team DR exercises with stakeholder communication practice. Document and improve after each drill. Automated chaos tests in CI/CD pipeline for critical services. Track: actual RTO vs target RTO across drills"
+    weight: 0.30
+    description: "Testing and chaos"
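
The tier table in the scenario above can be checked mechanically against the stated business targets. What follows is a minimal Python sketch and is not part of the package. The `Tier` class, the conversion of each bound to minutes, the midpoint used for the Tier 2 cost, and the per-tier spend shares are all illustrative assumptions. Only the RPO/RTO bounds, the targets, and the cost percentages come from the scenario text.

```python
from dataclasses import dataclass

# Business targets from the scenario, in minutes.
RPO_TARGET_MIN = 60
RTO_TARGET_MIN = 15
COST_CAP = 0.30  # DR spend as a fraction of primary cost


@dataclass(frozen=True)
class Tier:
    name: str
    rpo_min: float     # upper bound on data loss, minutes
    rto_min: float     # upper bound on downtime, minutes
    cost_ratio: float  # DR cost as a fraction of that tier's primary cost


# Bounds taken from the proposed architecture. The Tier 2 cost uses the
# midpoint of the quoted 30-50% range (an assumption).
TIERS = [
    Tier("tier1", 0, 0, 1.00),
    Tier("tier2", 5, 15, 0.40),
    Tier("tier3", 24 * 60, 4 * 60, 0.10),
]


def meets_targets(tier):
    """True when the tier's worst-case RPO and RTO stay within the targets.

    The scenario quotes strict bounds ("< 15 minutes") on both sides, so a
    tier bound equal to the target bound is treated as acceptable here.
    """
    return tier.rpo_min <= RPO_TARGET_MIN and tier.rto_min <= RTO_TARGET_MIN


def blended_cost(spend_share):
    """Weighted DR cost ratio, given each tier's share of primary spend."""
    return sum(t.cost_ratio * spend_share[t.name] for t in TIERS)


if __name__ == "__main__":
    # Hypothetical split of primary spend across tiers; should sum to 1.
    share = {"tier1": 0.2, "tier2": 0.3, "tier3": 0.5}
    for t in TIERS:
        print(t.name, "meets targets:", meets_targets(t))
    cost = blended_cost(share)
    print(f"blended DR cost: {cost:.2f}, within cap: {cost <= COST_CAP}")
```

Under the hypothetical spend split, Tier 3 misses both targets and the blended cost ratio lands above the 30% cap. A different split would give a different result, so the numbers only illustrate how the tier definitions and the cost constraint interact.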

package/courses/docker-container-debugging/scenarios/level-5/industry-container-patterns.yaml ADDED

@@ -0,0 +1,71 @@
+meta:
+  id: industry-container-patterns
+  level: 5
+  course: docker-container-debugging
+  type: output
+  description: "Analyze industry container patterns — evaluate container adoption patterns, anti-patterns, and maturity models across different industries and organization sizes"
+  tags: [Docker, industry, patterns, maturity-model, adoption, benchmarks, master]
+state: {}
+trigger: |
+  You're presenting at a container conference on "Container Adoption:
+  What We've Learned From 1,000 Deployments." Your research across
+  companies of all sizes reveals common patterns:
+  Maturity Model — Container Adoption Stages:
+  Stage 1 — Exploration (6-12 months):
+  - Docker on developer laptops for consistent environments
+  - One team pilots containerization for a single service
+  - Pain point: "it works on my machine" solved immediately
+  - Risk: containers treated as "lightweight VMs" (anti-pattern)
+  Stage 2 — Foundation (6-12 months):
+  - CI/CD pipeline builds container images
+  - Private registry, basic security scanning
+  - Docker Compose for staging environments
+  - Pain point: deployment consistency, environment parity
+  - Risk: lack of orchestration, manual scaling
+  Stage 3 — Scaling (12-18 months):
+  - Container orchestration (Kubernetes/Swarm) in production
+  - Centralized logging and monitoring
+  - Multiple teams containerizing services
+  - Pain point: operational complexity, skill gaps
+  - Risk: over-engineering, premature microservice decomposition
+  Stage 4 — Optimization (ongoing):
+  - Platform team provides self-service container platform
+  - GitOps, automated security, cost optimization
+  - Containers are default deployment target
+  - Pain point: legacy applications, cultural resistance
+  - Risk: platform team becomes bottleneck
+  Common anti-patterns observed:
+  - "Docker as a package manager" — complex Dockerfiles that replicate
+    system configuration instead of building cloud-native apps
+  - "Pet containers" — treating containers like servers (ssh in, fix,
+    don't restart)
+  - "Premature Kubernetes" — choosing K8s when 10 containers on
+    Compose would suffice
+  Task: Analyze industry container patterns. Write: the maturity model
+  stages with indicators, common anti-patterns and how to avoid them,
+  industry-specific considerations (healthcare, finance, government),
+  organization size impact on container strategy, and predictions for
+  container technology evolution.
+assertions:
+  - type: llm_judge
+    criteria: "Maturity model is actionable — each stage should have: indicators (how to know you're at this stage), goals (what to achieve before moving on), risks (what goes wrong at this stage), timeline expectations. Key insight: companies that skip stages fail — you can't adopt Kubernetes effectively without first mastering Docker and CI/CD. Maturity assessment should drive investment decisions. Most organizations are at Stage 2-3, even those who think they're at Stage 4"
+    weight: 0.35
+    description: "Maturity model"
+  - type: llm_judge
+    criteria: "Anti-patterns are explained with solutions — pet containers: treat containers as cattle, not pets (immutable infrastructure). Docker as VM: refactor applications for container-native patterns (12-factor, health checks, graceful shutdown). Premature Kubernetes: match orchestration complexity to actual needs. Monolith-in-a-container: valid first step, but plan decomposition. YAML engineering: spending more time on deployment config than application code. Tool sprawl: standardize on a curated set of tools"
+    weight: 0.35
+    description: "Anti-patterns"
+  - type: llm_judge
+    criteria: "Industry and evolution perspectives are insightful — healthcare: HIPAA requires encryption at rest/in transit, audit logging, access controls. Finance: PCI-DSS, SOX compliance, air-gapped environments. Government: FedRAMP, data sovereignty, FIPS-validated cryptography. Small companies (< 50 engineers): Docker Compose is often sufficient, don't over-engineer. Enterprise (500+ engineers): platform team is essential. Evolution: WebAssembly (Wasm) containers, serverless containers (Fargate, Cloud Run), eBPF for container networking/security, Unikernels for specialized workloads"
+    weight: 0.30
+    description: "Industry and evolution"
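
The maturity model above treats stages as cumulative, and the first assertion notes that organizations skipping stages tend to fail. One way to make that rule concrete is sketched below in Python. It is not part of the package: the indicator names are hypothetical labels for items listed in the scenario, and the two functions are assumptions about how an assessment might be scored.

```python
# Indicator names are hypothetical labels for items listed in the scenario.
STAGE_INDICATORS = {
    1: {"docker_on_laptops", "single_service_pilot"},
    2: {"ci_builds_images", "private_registry"},
    3: {"orchestration_in_production", "centralized_logging"},
    4: {"self_service_platform", "gitops"},
}


def assess_stage(observed):
    """Return the highest stage whose indicators, and all earlier stages'
    indicators, are present in the set `observed`; 0 if Stage 1 is incomplete."""
    stage = 0
    for number in sorted(STAGE_INDICATORS):
        if not STAGE_INDICATORS[number] <= observed:
            break
        stage = number
    return stage


def skipped_stages(observed):
    """Stages with missing indicators that sit below the highest stage
    showing any indicator, i.e. stages that appear to have been skipped."""
    touched = [n for n, ind in STAGE_INDICATORS.items() if ind & observed]
    if not touched:
        return []
    top = max(touched)
    return [n for n in range(1, top) if not STAGE_INDICATORS[n] <= observed]
```

For example, an organization running Kubernetes in production with centralized logging but no CI-built images or private registry would be assessed at Stage 1 by this sketch, with Stage 2 reported as skipped.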