dojo.md 0.2.1 → 0.2.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (152)
  1. package/courses/GENERATION_LOG.md +20 -0
  2. package/courses/api-documentation-writing/course.yaml +12 -0
  3. package/courses/api-documentation-writing/scenarios/level-1/authentication-basics.yaml +46 -0
  4. package/courses/api-documentation-writing/scenarios/level-1/data-types-formats.yaml +45 -0
  5. package/courses/api-documentation-writing/scenarios/level-1/endpoint-description.yaml +45 -0
  6. package/courses/api-documentation-writing/scenarios/level-1/error-documentation.yaml +45 -0
  7. package/courses/api-documentation-writing/scenarios/level-1/first-documentation-shift.yaml +47 -0
  8. package/courses/api-documentation-writing/scenarios/level-1/getting-started-guide.yaml +42 -0
  9. package/courses/api-documentation-writing/scenarios/level-1/pagination-docs.yaml +51 -0
  10. package/courses/api-documentation-writing/scenarios/level-1/request-parameters.yaml +46 -0
  11. package/courses/api-documentation-writing/scenarios/level-1/request-response-examples.yaml +48 -0
  12. package/courses/api-documentation-writing/scenarios/level-1/status-codes.yaml +45 -0
  13. package/courses/api-documentation-writing/scenarios/level-2/error-patterns.yaml +48 -0
  14. package/courses/api-documentation-writing/scenarios/level-2/intermediate-documentation-shift.yaml +48 -0
  15. package/courses/api-documentation-writing/scenarios/level-2/oauth-documentation.yaml +47 -0
  16. package/courses/api-documentation-writing/scenarios/level-2/openapi-specification.yaml +46 -0
  17. package/courses/api-documentation-writing/scenarios/level-2/rate-limiting-docs.yaml +45 -0
  18. package/courses/api-documentation-writing/scenarios/level-2/request-body-schemas.yaml +46 -0
  19. package/courses/api-documentation-writing/scenarios/level-2/schema-definitions.yaml +41 -0
  20. package/courses/api-documentation-writing/scenarios/level-2/swagger-redoc-rendering.yaml +43 -0
  21. package/courses/api-documentation-writing/scenarios/level-2/validation-documentation.yaml +47 -0
  22. package/courses/api-documentation-writing/scenarios/level-2/versioning-changelog.yaml +42 -0
  23. package/courses/api-documentation-writing/scenarios/level-3/advanced-documentation-shift.yaml +43 -0
  24. package/courses/api-documentation-writing/scenarios/level-3/api-style-guide.yaml +40 -0
  25. package/courses/api-documentation-writing/scenarios/level-3/code-samples-multilang.yaml +40 -0
  26. package/courses/api-documentation-writing/scenarios/level-3/content-architecture.yaml +47 -0
  27. package/courses/api-documentation-writing/scenarios/level-3/deprecation-communication.yaml +44 -0
  28. package/courses/api-documentation-writing/scenarios/level-3/interactive-api-explorer.yaml +42 -0
  29. package/courses/api-documentation-writing/scenarios/level-3/migration-guides.yaml +42 -0
  30. package/courses/api-documentation-writing/scenarios/level-3/sdk-documentation.yaml +40 -0
  31. package/courses/api-documentation-writing/scenarios/level-3/webhook-documentation.yaml +48 -0
  32. package/courses/api-documentation-writing/scenarios/level-3/websocket-sse-docs.yaml +47 -0
  33. package/courses/api-documentation-writing/scenarios/level-4/api-changelog-management.yaml +44 -0
  34. package/courses/api-documentation-writing/scenarios/level-4/api-governance-standards.yaml +41 -0
  35. package/courses/api-documentation-writing/scenarios/level-4/api-product-strategy.yaml +41 -0
  36. package/courses/api-documentation-writing/scenarios/level-4/developer-portal-design.yaml +48 -0
  37. package/courses/api-documentation-writing/scenarios/level-4/docs-as-code.yaml +41 -0
  38. package/courses/api-documentation-writing/scenarios/level-4/documentation-localization.yaml +46 -0
  39. package/courses/api-documentation-writing/scenarios/level-4/documentation-metrics.yaml +45 -0
  40. package/courses/api-documentation-writing/scenarios/level-4/documentation-testing.yaml +41 -0
  41. package/courses/api-documentation-writing/scenarios/level-4/expert-documentation-shift.yaml +45 -0
  42. package/courses/api-documentation-writing/scenarios/level-4/multi-audience-docs.yaml +46 -0
  43. package/courses/api-documentation-writing/scenarios/level-5/ai-powered-documentation.yaml +44 -0
  44. package/courses/api-documentation-writing/scenarios/level-5/api-first-documentation.yaml +45 -0
  45. package/courses/api-documentation-writing/scenarios/level-5/api-marketplace-docs.yaml +42 -0
  46. package/courses/api-documentation-writing/scenarios/level-5/board-api-strategy.yaml +48 -0
  47. package/courses/api-documentation-writing/scenarios/level-5/documentation-program-strategy.yaml +42 -0
  48. package/courses/api-documentation-writing/scenarios/level-5/documentation-team-structure.yaml +47 -0
  49. package/courses/api-documentation-writing/scenarios/level-5/dx-competitive-advantage.yaml +46 -0
  50. package/courses/api-documentation-writing/scenarios/level-5/ecosystem-documentation.yaml +45 -0
  51. package/courses/api-documentation-writing/scenarios/level-5/industry-documentation-patterns.yaml +46 -0
  52. package/courses/api-documentation-writing/scenarios/level-5/master-documentation-shift.yaml +46 -0
  53. package/courses/code-review-feedback-writing/course.yaml +12 -0
  54. package/courses/code-review-feedback-writing/scenarios/level-1/approve-vs-request-changes.yaml +48 -0
  55. package/courses/code-review-feedback-writing/scenarios/level-1/asking-questions.yaml +50 -0
  56. package/courses/code-review-feedback-writing/scenarios/level-1/clear-comment-writing.yaml +45 -0
  57. package/courses/code-review-feedback-writing/scenarios/level-1/constructive-tone.yaml +43 -0
  58. package/courses/code-review-feedback-writing/scenarios/level-1/first-review-shift.yaml +46 -0
  59. package/courses/code-review-feedback-writing/scenarios/level-1/giving-praise.yaml +44 -0
  60. package/courses/code-review-feedback-writing/scenarios/level-1/nitpick-etiquette.yaml +44 -0
  61. package/courses/code-review-feedback-writing/scenarios/level-1/providing-context.yaml +46 -0
  62. package/courses/code-review-feedback-writing/scenarios/level-1/reviewing-small-prs.yaml +43 -0
  63. package/courses/code-review-feedback-writing/scenarios/level-1/style-vs-logic.yaml +48 -0
  64. package/courses/code-review-feedback-writing/scenarios/level-2/architectural-feedback.yaml +52 -0
  65. package/courses/code-review-feedback-writing/scenarios/level-2/intermediate-review-shift.yaml +46 -0
  66. package/courses/code-review-feedback-writing/scenarios/level-2/performance-feedback.yaml +50 -0
  67. package/courses/code-review-feedback-writing/scenarios/level-2/reviewing-breaking-changes.yaml +44 -0
  68. package/courses/code-review-feedback-writing/scenarios/level-2/reviewing-complex-prs.yaml +43 -0
  69. package/courses/code-review-feedback-writing/scenarios/level-2/reviewing-documentation.yaml +47 -0
  70. package/courses/code-review-feedback-writing/scenarios/level-2/reviewing-error-handling.yaml +50 -0
  71. package/courses/code-review-feedback-writing/scenarios/level-2/reviewing-tests.yaml +53 -0
  72. package/courses/code-review-feedback-writing/scenarios/level-2/security-review-comments.yaml +50 -0
  73. package/courses/code-review-feedback-writing/scenarios/level-2/suggesting-alternatives.yaml +42 -0
  74. package/courses/code-review-feedback-writing/scenarios/level-3/cross-team-review.yaml +45 -0
  75. package/courses/code-review-feedback-writing/scenarios/level-3/mentoring-through-review.yaml +46 -0
  76. package/courses/code-review-feedback-writing/scenarios/level-3/reviewing-unfamiliar-code.yaml +43 -0
  77. package/courses/terraform-infrastructure-setup/scenarios/level-1/first-debugging-shift.yaml +66 -0
  78. package/courses/terraform-infrastructure-setup/scenarios/level-1/hcl-syntax-errors.yaml +65 -0
  79. package/courses/terraform-infrastructure-setup/scenarios/level-1/plan-output-reading.yaml +71 -0
  80. package/courses/terraform-infrastructure-setup/scenarios/level-1/provider-configuration.yaml +62 -0
  81. package/courses/terraform-infrastructure-setup/scenarios/level-1/resource-creation-failures.yaml +54 -0
  82. package/courses/terraform-infrastructure-setup/scenarios/level-1/resource-references.yaml +70 -0
  83. package/courses/terraform-infrastructure-setup/scenarios/level-1/state-file-basics.yaml +73 -0
  84. package/courses/terraform-infrastructure-setup/scenarios/level-1/terraform-fmt-validate.yaml +58 -0
  85. package/courses/terraform-infrastructure-setup/scenarios/level-1/variable-and-output-errors.yaml +78 -0
  86. package/courses/terraform-infrastructure-setup/scenarios/level-2/count-vs-for-each.yaml +58 -0
  87. package/courses/terraform-infrastructure-setup/scenarios/level-2/dependency-management.yaml +80 -0
  88. package/courses/terraform-infrastructure-setup/scenarios/level-2/intermediate-debugging-shift.yaml +66 -0
  89. package/courses/terraform-infrastructure-setup/scenarios/level-2/lifecycle-rules.yaml +51 -0
  90. package/courses/terraform-infrastructure-setup/scenarios/level-2/locals-and-expressions.yaml +58 -0
  91. package/courses/terraform-infrastructure-setup/scenarios/level-2/module-structure.yaml +75 -0
  92. package/courses/terraform-infrastructure-setup/scenarios/level-2/provisioner-pitfalls.yaml +64 -0
  93. package/courses/terraform-infrastructure-setup/scenarios/level-2/remote-state-backend.yaml +55 -0
  94. package/courses/terraform-infrastructure-setup/scenarios/level-2/terraform-import.yaml +55 -0
  95. package/courses/terraform-infrastructure-setup/scenarios/level-2/workspace-management.yaml +51 -0
  96. package/courses/terraform-infrastructure-setup/scenarios/level-3/advanced-debugging-shift.yaml +63 -0
  97. package/courses/terraform-infrastructure-setup/scenarios/level-3/api-rate-limiting.yaml +50 -0
  98. package/courses/terraform-infrastructure-setup/scenarios/level-3/conditional-resources.yaml +66 -0
  99. package/courses/terraform-infrastructure-setup/scenarios/level-3/drift-detection.yaml +66 -0
  100. package/courses/terraform-infrastructure-setup/scenarios/level-3/dynamic-blocks.yaml +71 -0
  101. package/courses/terraform-infrastructure-setup/scenarios/level-3/large-scale-refactoring.yaml +59 -0
  102. package/courses/terraform-infrastructure-setup/scenarios/level-3/multi-provider-config.yaml +69 -0
  103. package/courses/terraform-infrastructure-setup/scenarios/level-3/state-surgery.yaml +57 -0
  104. package/courses/terraform-infrastructure-setup/scenarios/level-3/terraform-cloud-enterprise.yaml +59 -0
  105. package/courses/terraform-infrastructure-setup/scenarios/level-3/terraform-debugging.yaml +51 -0
  106. package/courses/terraform-infrastructure-setup/scenarios/level-4/blast-radius-management.yaml +51 -0
  107. package/courses/terraform-infrastructure-setup/scenarios/level-4/cicd-pipeline-design.yaml +50 -0
  108. package/courses/terraform-infrastructure-setup/scenarios/level-4/compliance-as-code.yaml +46 -0
  109. package/courses/terraform-infrastructure-setup/scenarios/level-4/cost-estimation-governance.yaml +42 -0
  110. package/courses/terraform-infrastructure-setup/scenarios/level-4/expert-debugging-shift.yaml +51 -0
  111. package/courses/terraform-infrastructure-setup/scenarios/level-4/iac-organization-strategy.yaml +45 -0
  112. package/courses/terraform-infrastructure-setup/scenarios/level-4/incident-response-iac.yaml +47 -0
  113. package/courses/terraform-infrastructure-setup/scenarios/level-4/infrastructure-testing.yaml +41 -0
  114. package/courses/terraform-infrastructure-setup/scenarios/level-4/module-registry-design.yaml +45 -0
  115. package/courses/terraform-infrastructure-setup/scenarios/level-4/multi-account-strategy.yaml +57 -0
  116. package/courses/terraform-infrastructure-setup/scenarios/level-5/board-infrastructure-investment.yaml +53 -0
  117. package/courses/terraform-infrastructure-setup/scenarios/level-5/disaster-recovery-iac.yaml +47 -0
  118. package/courses/terraform-infrastructure-setup/scenarios/level-5/enterprise-iac-transformation.yaml +48 -0
  119. package/courses/terraform-infrastructure-setup/scenarios/level-5/iac-technology-evolution.yaml +49 -0
  120. package/courses/terraform-infrastructure-setup/scenarios/level-5/ma-infrastructure-consolidation.yaml +54 -0
  121. package/courses/terraform-infrastructure-setup/scenarios/level-5/master-debugging-shift.yaml +53 -0
  122. package/courses/terraform-infrastructure-setup/scenarios/level-5/multi-cloud-strategy.yaml +49 -0
  123. package/courses/terraform-infrastructure-setup/scenarios/level-5/platform-engineering.yaml +47 -0
  124. package/courses/terraform-infrastructure-setup/scenarios/level-5/regulatory-compliance-automation.yaml +47 -0
  125. package/courses/terraform-infrastructure-setup/scenarios/level-5/terraform-vs-alternatives.yaml +46 -0
  126. package/dist/cli/commands/generate.d.ts.map +1 -1
  127. package/dist/cli/commands/generate.js +2 -1
  128. package/dist/cli/commands/generate.js.map +1 -1
  129. package/dist/cli/commands/train.d.ts.map +1 -1
  130. package/dist/cli/commands/train.js +6 -3
  131. package/dist/cli/commands/train.js.map +1 -1
  132. package/dist/cli/index.js +9 -6
  133. package/dist/cli/index.js.map +1 -1
  134. package/dist/cli/run-demo.js +3 -2
  135. package/dist/cli/run-demo.js.map +1 -1
  136. package/dist/engine/model-utils.d.ts +6 -0
  137. package/dist/engine/model-utils.d.ts.map +1 -1
  138. package/dist/engine/model-utils.js +28 -1
  139. package/dist/engine/model-utils.js.map +1 -1
  140. package/dist/engine/training.d.ts.map +1 -1
  141. package/dist/engine/training.js +4 -3
  142. package/dist/engine/training.js.map +1 -1
  143. package/dist/generator/course-generator.d.ts.map +1 -1
  144. package/dist/generator/course-generator.js +4 -3
  145. package/dist/generator/course-generator.js.map +1 -1
  146. package/dist/mcp/server.d.ts.map +1 -1
  147. package/dist/mcp/server.js +7 -3
  148. package/dist/mcp/server.js.map +1 -1
  149. package/dist/mcp/session-manager.d.ts.map +1 -1
  150. package/dist/mcp/session-manager.js +3 -2
  151. package/dist/mcp/session-manager.js.map +1 -1
  152. package/package.json +3 -2
@@ -0,0 +1,50 @@
+ meta:
+   id: cicd-pipeline-design
+   level: 4
+   course: terraform-infrastructure-setup
+   type: output
+   description: "Design CI/CD pipelines for Terraform — implement GitOps workflows with Atlantis, GitHub Actions, or Terraform Cloud for safe infrastructure deployment"
+   tags: [Terraform, CI/CD, GitOps, Atlantis, GitHub-Actions, expert]
+
+ state: {}
+
+ trigger: |
+   Your team deploys Terraform from individual laptops. Last month:
+   - An engineer applied to production instead of staging (wrong workspace)
+   - Two engineers ran apply simultaneously, causing state corruption
+   - An apply failed halfway but no one noticed for 3 hours
+   - No record of who deployed what or when
+
+   You need to design a CI/CD pipeline for Terraform that prevents
+   all of these issues. Options on the table:
+
+   1. GitHub Actions with custom workflow
+   2. Atlantis (pull request automation)
+   3. Terraform Cloud/Enterprise
+   4. Spacelift
+
+   Requirements:
+   - Plan on every PR
+   - Apply only after approval and merge
+   - Environment protection (can't accidentally apply to prod)
+   - Cost estimation before apply
+   - Security scanning (tfsec/checkov)
+   - Slack notifications for plan/apply results
+
+   Task: Design the CI/CD pipeline for Terraform, compare the tool
+   options, show a complete workflow from code change to production
+   deployment, and address security considerations.
+
+ assertions:
+   - type: llm_judge
+     criteria: "Complete pipeline workflow is designed — code change → PR opened → automated pipeline: (1) terraform fmt -check (formatting), (2) terraform validate (syntax), (3) tfsec/checkov scan (security), (4) terraform plan (preview changes), (5) Infracost estimate (cost), (6) post results as PR comment. On merge to main: (7) terraform plan again (detect drift since PR), (8) approval gate (manual for prod), (9) terraform apply, (10) post-apply verification, (11) Slack notification. Environment promotion: dev auto-apply, staging auto-apply, prod manual approval"
+     weight: 0.35
+     description: "Pipeline workflow"
+   - type: llm_judge
+     criteria: "Tool comparison is practical — Atlantis: open-source, PR automation, self-hosted, lightweight. Best for: teams wanting simple PR-based workflow. GitHub Actions: flexible, native GitHub integration, custom workflows. Best for: teams already on GitHub wanting full control. Terraform Cloud: managed service, built-in Sentinel, cost estimation, team management. Best for: organizations wanting managed solution. Spacelift: multi-tool support, advanced policies, drift detection. Best for: enterprises with complex requirements. Recommendation depends on: team size, budget, compliance needs, multi-tool requirements"
+     weight: 0.35
+     description: "Tool comparison"
+   - type: llm_judge
+     criteria: "Security considerations are covered — credentials: use OIDC for cloud authentication (no static keys in CI). GitHub Actions: aws-actions/configure-aws-credentials with OIDC. State access: CI role has minimal permissions (plan role vs apply role). Secrets: never echo credentials, use GitHub encrypted secrets or Terraform Cloud variables. Branch protection: require PR reviews, no direct pushes to main. Environment protection: GitHub environments with required reviewers for prod. Audit: log all plan/apply with outputs. Network: CI runner in private network if accessing private resources"
+     weight: 0.30
+     description: "Security"
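The promotion rule in the first assertion above (dev and staging auto-apply, prod waits for a manual approval after merge) can be sketched as a small gate function. This is only an illustration under stated assumptions: the stage identifiers and the `may_apply` helper are hypothetical names, not part of the package or of any CI tool.

```python
# Stage names mirror the workflow in the "Pipeline workflow" assertion.
# The identifiers themselves are illustrative, not taken from any tool.
PR_STAGES = ["fmt-check", "validate", "security-scan", "plan", "cost-estimate", "pr-comment"]
MERGE_STAGES = ["plan", "approval-gate", "apply", "post-apply-verify", "slack-notify"]

# Assumed promotion policy from the scenario: only prod needs a human approval.
MANUAL_APPROVAL_ENVS = {"prod"}


def merge_stages_for(env: str) -> list[str]:
    """Return the post-merge stages for an environment, dropping the gate where it is not required."""
    if env in MANUAL_APPROVAL_ENVS:
        return list(MERGE_STAGES)
    return [stage for stage in MERGE_STAGES if stage != "approval-gate"]


def may_apply(env: str, merged: bool, approved: bool) -> bool:
    """Apply is allowed only after merge, and for gated environments only after approval."""
    if not merged:
        return False
    return approved or env not in MANUAL_APPROVAL_ENVS
```

For example, `may_apply("prod", merged=True, approved=False)` is false, while the same call for `"staging"` is true.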
@@ -0,0 +1,46 @@
+ meta:
+   id: compliance-as-code
+   level: 4
+   course: terraform-infrastructure-setup
+   type: output
+   description: "Implement compliance as code — enforce SOC2, HIPAA, and PCI-DSS requirements through Terraform policies, scanning, and automated remediation"
+   tags: [Terraform, compliance, SOC2, HIPAA, PCI-DSS, policy-as-code, expert]
+
+ state: {}
+
+ trigger: |
+   Your healthtech company needs SOC2 Type II and HIPAA compliance.
+   The compliance auditor found these Terraform-managed infrastructure
+   gaps:
+
+   1. S3 buckets without encryption at rest (5 of 30 buckets)
+   2. Security groups allowing 0.0.0.0/0 ingress on non-HTTP ports
+   3. RDS instances without encryption or automated backups
+   4. CloudTrail not enabled in all regions
+   5. No log retention policy (CloudWatch logs kept indefinitely)
+   6. IAM users with programmatic access keys older than 90 days
+   7. EBS volumes not encrypted by default
+
+   The auditor needs evidence that:
+   - These controls are enforced automatically (not just documented)
+   - Non-compliant resources cannot be deployed
+   - Continuous monitoring detects and alerts on compliance drift
+
+   Task: Design the compliance-as-code strategy covering: policy
+   enforcement (prevent non-compliant deployments), automated scanning,
+   remediation patterns, audit evidence generation, and continuous
+   compliance monitoring.
+
+ assertions:
+   - type: llm_judge
+     criteria: "Policy enforcement prevents non-compliant deployments — pre-deployment: checkov/tfsec in CI catches violations before apply. Sentinel policies (Terraform Cloud): hard-mandatory rules that block non-compliant applies. Example policies: all S3 buckets must have server_side_encryption_configuration, all RDS instances must have storage_encrypted = true and backup_retention_period >= 7, security groups cannot have cidr_blocks = ['0.0.0.0/0'] except on ports 80 and 443. Module library: compliant-by-default modules that enforce encryption, logging, and access controls"
+     weight: 0.35
+     description: "Policy enforcement"
+   - type: llm_judge
+     criteria: "Remediation and audit evidence are covered — remediation: (1) update Terraform modules to include compliance requirements by default (encryption, backups, logging), (2) apply changes across all environments using shared module updates, (3) for existing non-compliant resources: plan and apply to add encryption/backups. Audit evidence: (1) Terraform Cloud audit logs showing who approved what, (2) Git history showing code reviews for all infrastructure changes, (3) checkov reports stored as CI artifacts, (4) AWS Config compliance dashboard. Compliance as code = evidence generated automatically from deployment pipeline"
+     weight: 0.35
+     description: "Remediation and audit"
+   - type: llm_judge
+     criteria: "Continuous monitoring is practical — AWS Config rules: detect non-compliant resources in real-time (encrypted-volumes, s3-bucket-server-side-encryption-enabled, rds-storage-encrypted). Config remediation: auto-remediate with SSM Automation (e.g., enable EBS encryption by default so that new volumes are created encrypted). Terraform Cloud drift detection: scheduled plans detect unauthorized changes. Alert pipeline: Config finding → SNS → Lambda → Slack/PagerDuty. Quarterly compliance review: run full checkov scan, compare against SOC2/HIPAA control matrix, generate compliance report. Map each Terraform policy to specific compliance control (SOC2 CC6.1, HIPAA §164.312(a)(2)(iv))"
+     weight: 0.30
+     description: "Continuous monitoring"
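Two of the example policies in the "Policy enforcement" assertion (RDS encryption and backup retention, and no 0.0.0.0/0 ingress outside ports 80 and 443) can be illustrated as a check over the JSON form of a plan. The sketch below assumes the `resource_changes[].change.after` layout produced by `terraform show -json`. The `find_violations` helper and the sample data are hypothetical, and this is not a substitute for checkov, tfsec, or Sentinel.

```python
# Ports that may be open to the internet, per the example policy.
ALLOWED_OPEN_PORTS = {80, 443}


def find_violations(plan: dict) -> list[str]:
    """Scan a parsed plan (terraform show -json) for two of the example policy violations."""
    violations = []
    for rc in plan.get("resource_changes", []):
        after = (rc.get("change") or {}).get("after")
        if after is None:  # resource is being destroyed, so there is nothing to check
            continue
        address = rc.get("address", "<unknown>")
        if rc.get("type") == "aws_db_instance":
            if not after.get("storage_encrypted"):
                violations.append(f"{address}: storage_encrypted must be true")
            if (after.get("backup_retention_period") or 0) < 7:
                violations.append(f"{address}: backup_retention_period must be >= 7")
        elif rc.get("type") == "aws_security_group":
            for rule in after.get("ingress") or []:
                open_to_world = "0.0.0.0/0" in (rule.get("cidr_blocks") or [])
                lo, hi = rule.get("from_port"), rule.get("to_port")
                single_allowed_port = lo == hi and lo in ALLOWED_OPEN_PORTS
                if open_to_world and not single_allowed_port:
                    violations.append(f"{address}: 0.0.0.0/0 ingress on ports {lo}-{hi}")
    return violations


# Hypothetical plan fragment, for illustration only.
sample_plan = {
    "resource_changes": [
        {"address": "aws_db_instance.main", "type": "aws_db_instance",
         "change": {"after": {"storage_encrypted": False, "backup_retention_period": 1}}},
        {"address": "aws_security_group.web", "type": "aws_security_group",
         "change": {"after": {"ingress": [
             {"from_port": 443, "to_port": 443, "cidr_blocks": ["0.0.0.0/0"]},
             {"from_port": 22, "to_port": 22, "cidr_blocks": ["0.0.0.0/0"]},
         ]}}},
    ]
}
```

In CI the dictionary would come from `json.load` on the plan JSON file, and a non-empty result would fail the job.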
@@ -0,0 +1,42 @@
+ meta:
+   id: cost-estimation-governance
+   level: 4
+   course: terraform-infrastructure-setup
+   type: output
+   description: "Implement cost governance with Terraform — integrate Infracost for pre-deployment estimation, set budget alerts, and enforce cost policies"
+   tags: [Terraform, cost, Infracost, FinOps, governance, budgets, expert]
+
+ state: {}
+
+ trigger: |
+   Your monthly AWS bill is $150K and growing 15% month-over-month.
+   Nobody knows the cost impact of Terraform changes until the bill
+   arrives. Recent surprises:
+
+   - Engineer created a NAT Gateway in 4 AZs ($576/month) when 1 was
+   sufficient ($144/month)
+   - A for_each over 50 items created 50 CloudWatch dashboards at
+   $3/each ($150/month) — nobody realized
+   - An RDS upgrade from db.r5.large to db.r5.4xlarge increased cost
+   from $260/month to $2,080/month
+   - Dev environment running same instance types as production ($8K/month
+   wasted)
+
+   Task: Design the cost governance strategy covering: Infracost
+   integration in CI/CD, policy-based cost controls, environment-specific
+   sizing, tagging for cost allocation, and ongoing cost optimization
+   practices.
+
+ assertions:
+   - type: llm_judge
+     criteria: "Infracost integration is designed — Infracost estimates cost changes in PRs before deployment. CI integration: infracost breakdown --path . shows total monthly cost, infracost diff shows cost change from PR. PR comment: shows cost increase/decrease per resource. Setup: install Infracost in CI, generate plan JSON (terraform plan -out=plan.tfplan && terraform show -json plan.tfplan > plan.json), run infracost diff --path plan.json. Thresholds: alert if monthly cost increase > $100, block if > $500 (configurable). Free tier available for open source and small teams"
+     weight: 0.35
+     description: "Infracost"
+   - type: llm_judge
+     criteria: "Policy-based cost controls are implemented — Sentinel/OPA policies: restrict expensive instance types (no db.r5.4xlarge in non-prod), limit resource counts (max 3 NAT Gateways), require cost justification for changes over threshold. Variable validation: variable 'instance_type' { validation { condition = !contains(['r5.4xlarge','r5.8xlarge'], var.instance_type) || var.environment == 'prod' } } (referencing var.environment from another variable's validation requires Terraform 1.9 or later). Environment sizing: locals { env_sizing = { dev = 't3.small', staging = 't3.medium', prod = 't3.large' } }. Enforce via policy: dev instances must be t3.small or smaller"
+     weight: 0.35
+     description: "Cost policies"
+   - type: llm_judge
+     criteria: "Tagging and optimization are practical — mandatory tags for cost allocation: Team, Environment, CostCenter, Service. AWS Cost Explorer uses tags for breakdown. Enforce tags via Sentinel policy or AWS SCP (deny resource creation without required tags). Cost optimization: (1) right-size instances (use AWS Compute Optimizer data), (2) Reserved Instances or Savings Plans for steady-state, (3) spot instances for non-critical workloads, (4) auto-shutdown dev environments off-hours (Lambda + EventBridge, formerly CloudWatch Events). Monthly cost review: compare actual vs Infracost estimates, identify optimization opportunities"
+     weight: 0.30
+     description: "Tagging and optimization"
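The thresholds in the "Infracost" assertion (alert above a $100 monthly increase, block above $500) can be applied to the three surprises listed in the trigger. This is a minimal sketch: the `classify_cost_increase` helper is a hypothetical name, and in practice the delta would be read from the output of `infracost diff` rather than hard-coded.

```python
# Thresholds quoted in the scenario; both are described there as configurable.
ALERT_THRESHOLD_USD = 100.0
BLOCK_THRESHOLD_USD = 500.0


def classify_cost_increase(monthly_delta_usd: float) -> str:
    """Map a monthly cost increase to the pipeline action named in the scenario."""
    if monthly_delta_usd > BLOCK_THRESHOLD_USD:
        return "block"
    if monthly_delta_usd > ALERT_THRESHOLD_USD:
        return "alert"
    return "ok"


# Monthly deltas taken from the scenario's "recent surprises".
nat_delta = 576 - 144        # four NAT Gateways instead of one
dashboards_delta = 50 * 3    # 50 CloudWatch dashboards at $3 each
rds_delta = 2080 - 260       # db.r5.large to db.r5.4xlarge

for name, delta in [("NAT", nat_delta), ("dashboards", dashboards_delta), ("RDS", rds_delta)]:
    print(f"{name}: +${delta}/month -> {classify_cost_increase(delta)}")
```

Under these thresholds the NAT Gateway and dashboard changes would raise an alert and the RDS upgrade would be blocked pending review.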
@@ -0,0 +1,51 @@
+ meta:
+   id: expert-debugging-shift
+   level: 4
+   course: terraform-infrastructure-setup
+   type: output
+   description: "Combined expert shift — advise on organizational IaC strategy while handling CI/CD pipeline failures and compliance audit findings"
+   tags: [Terraform, troubleshooting, combined, shift-simulation, expert]
+
+ state: {}
+
+ trigger: |
+   As infrastructure lead, you face three organizational challenges:
+
+   Challenge 1 — CI/CD pipeline reliability:
+   Your Atlantis-based pipeline has been flaky:
+   - 20% of plans timeout (state lock contention with 8 teams)
+   - Plans show different results on retry (eventual consistency)
+   - Apply fails intermittently (API rate limiting)
+   Teams are losing trust and starting to apply from laptops again.
+
+   Challenge 2 — Compliance audit preparation:
+   SOC2 auditor arrives in 6 weeks. They need to see:
+   - Evidence that all infrastructure changes go through code review
+   - Proof that production access is restricted
+   - Encryption enforcement across all resources
+   - Automated security scanning results
+
+   Challenge 3 — Cost optimization mandate:
+   CFO mandates 25% cost reduction ($37.5K/month savings from $150K).
+   Current waste identified:
+   - Dev environments running 24/7 ($20K/month)
+   - Oversized RDS instances ($15K/month excess)
+   - Unused EBS volumes and snapshots ($8K/month)
+   - NAT Gateway in all AZs for non-prod ($5K/month excess)
+
+   Task: Address all three challenges with actionable plans and
+   timelines.
+
+ assertions:
+   - type: llm_judge
+     criteria: "CI/CD pipeline reliability is addressed — state lock contention: split monolith states into per-team states (each team's plan/apply doesn't block others). Timeout: increase lock timeout (-lock-timeout=5m), investigate which team's applies are long-running. Eventual consistency: add terraform plan -refresh-only before plan to ensure consistent state. Rate limiting: reduce parallelism (-parallelism=5), stagger team deployments. Trust recovery: show teams metrics (success rate improvement), ensure fast feedback loops. Consider: Terraform Cloud for managed execution (handles locking, retries, queueing)"
+     weight: 0.35
+     description: "Pipeline reliability"
+   - type: llm_judge
+     criteria: "Compliance preparation has a timeline — weeks 1-2: implement checkov/tfsec in all CI pipelines, generate baseline compliance reports. Weeks 2-3: remediate findings (add encryption to all S3 buckets, RDS, EBS; restrict security groups; configure CloudTrail). Weeks 3-4: implement Sentinel policies to prevent future violations. Weeks 4-5: generate audit evidence (Git logs showing all changes reviewed, CI scan reports, Terraform Cloud audit logs). Week 6: dry-run audit with compliance team. Evidence portfolio: PR review logs, automated scan reports, policy enforcement logs, access control documentation (IAM policy)"
+     weight: 0.35
+     description: "Compliance"
+   - type: llm_judge
+     criteria: "Cost optimization targets specific savings — dev environments 24/7 → schedule off-hours (Lambda + EventBridge, save $15K): terraform manages the schedule. RDS right-sizing: use Performance Insights data, downsize dev/staging instances (save $10K). EBS cleanup: terraform state list to find managed volumes, delete unattached ones, manage snapshot lifecycle (save $5K). NAT Gateway: single NAT Gateway per non-prod VPC instead of per-AZ (save $4K). Total: ~$34K savings (23%, close to 25% target). Implementation: Infracost in all PRs to prevent future waste, monthly cost review meeting, per-team cost dashboards using tags"
+     weight: 0.30
+     description: "Cost optimization"
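The savings figures in the "Cost optimization" assertion can be checked with a few lines of arithmetic. The variable names below are illustrative; every number comes from the scenario text above.

```python
monthly_bill = 150_000
target_savings = 0.25 * monthly_bill  # CFO mandate: 25% of $150K

# Waste identified in the trigger (per month).
identified_waste = {"dev 24/7": 20_000, "oversized RDS": 15_000,
                    "unused EBS": 8_000, "non-prod NAT": 5_000}

# Savings the assertion expects each action to realise (per month).
planned_savings = {"dev off-hours schedule": 15_000, "RDS right-sizing": 10_000,
                   "EBS cleanup": 5_000, "single NAT per non-prod VPC": 4_000}

total_waste = sum(identified_waste.values())
total_savings = sum(planned_savings.values())
savings_pct = 100 * total_savings / monthly_bill
shortfall = target_savings - total_savings

print(f"target ${target_savings:,.0f}, planned ${total_savings:,}, "
      f"{savings_pct:.1f}% of bill, shortfall ${shortfall:,.0f}")
```

The planned actions recover $34K of the $48K identified waste, about 22.7% of the bill, which leaves $3.5K per month still to find against the $37.5K target.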
@@ -0,0 +1,45 @@
1
+ meta:
2
+ id: iac-organization-strategy
3
+ level: 4
4
+ course: terraform-infrastructure-setup
5
+ type: output
6
+ description: "Design IaC organization strategy — choose between mono-repo and multi-repo, design state architecture, and establish team ownership boundaries"
7
+ tags: [Terraform, organization, mono-repo, multi-repo, strategy, expert]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ You're the infrastructure architect for a company with 80 engineers
13
+ across 8 teams. Current state:
14
+ - 3 separate Terraform repositories with inconsistent patterns
15
+ - 15 state files with no naming convention
16
+ - No shared modules — each team copy-pastes configurations
17
+ - Teams frequently conflict when deploying overlapping resources
18
+ - No visibility into who owns what infrastructure
19
+
20
+ You need to design the Terraform organization strategy for the
21
+ company. Leadership wants:
22
+ - Clear team ownership boundaries
23
+ - Reusable modules (stop copy-paste)
24
+ - Safe deployment workflows
25
+ - Audit trail for all changes
26
+ - Cost visibility per team
27
+
28
+ Task: Design the IaC organization strategy covering: repository
29
+ structure (mono-repo vs multi-repo trade-offs), state architecture
30
+ (how to partition state files), module library design, team
31
+ ownership model, and governance policies.
32
+
33
+ assertions:
34
+ - type: llm_judge
35
+ criteria: "Repository and state architecture are designed — mono-repo: single repo with directories per team/service. Benefits: unified modules, single PR workflow, easy cross-team visibility. Challenges: large repo, team coupling, complex CI/CD. Multi-repo: separate repos per team or service domain. Benefits: team autonomy, independent versioning, isolated CI/CD. Challenges: module sharing harder, cross-repo coordination. Recommended for 8 teams: hybrid — shared modules repo + per-team repos. State architecture: partition by (1) environment (dev/staging/prod), (2) team/service domain, (3) blast radius. Naming: s3://state/<team>/<env>/<service>.tfstate"
36
+ weight: 0.35
37
+ description: "Repo and state"
38
+ - type: llm_judge
39
+ criteria: "Module library and team ownership are designed — internal module registry: centralized repo with versioned, tested modules (VPC, EKS, RDS, S3). Module standards: README, input/output documentation, examples, tests (terraform test or Terratest). Publishing: git tags for versioning, semantic versioning (major.minor.patch). Team ownership: CODEOWNERS file mapping directories to teams. Platform team owns shared modules and foundation infrastructure. Service teams own their application infrastructure. Tagging strategy: mandatory tags for team, cost-center, environment on all resources"
40
+ weight: 0.35
41
+ description: "Modules and ownership"
42
+ - type: llm_judge
43
+ criteria: "Governance and deployment are practical — deployment workflow: feature branch → PR → automated plan → code review → merge → automated apply. Policy enforcement: pre-commit hooks (fmt, validate), CI checks (tflint, tfsec, checkov), Sentinel/OPA policies in Terraform Cloud. Cost visibility: Infracost in PR comments, AWS Cost Explorer tags. Audit: Terraform Cloud audit logs or CloudTrail for API calls. Change management: production changes require 2 approvals, blast radius classification (high-risk changes need additional review). Onboarding: documentation, module catalog, self-service templates"
44
+ weight: 0.30
45
+ description: "Governance"
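The state naming convention in the criteria above (`s3://state/<team>/<env>/<service>.tfstate`) can be sketched as a small helper. This is a minimal illustration only: the function name `state_key`, the default bucket name `state`, and the validation rules are assumptions drawn from the example path, not part of the course files.

```python
ALLOWED_ENVS = {"dev", "staging", "prod"}  # environments named in the criteria


def state_key(team: str, env: str, service: str, bucket: str = "state") -> str:
    """Build an S3 state path partitioned by team, environment, and service."""
    if env not in ALLOWED_ENVS:
        raise ValueError(f"unknown environment: {env!r}")
    for part in (team, service):
        # Reject empty names and embedded slashes so each state file
        # stays inside its own team/env prefix (hypothetical rule).
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return f"s3://{bucket}/{team}/{env}/{service}.tfstate"


print(state_key("payments", "prod", "checkout-api"))
```

Keeping one key per team, environment, and service is what limits blast radius: a bad apply or a corrupted state file touches one service in one environment.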
@@ -0,0 +1,47 @@
1
+ meta:
2
+ id: incident-response-iac
3
+ level: 4
4
+ course: terraform-infrastructure-setup
5
+ type: output
6
+ description: "Handle infrastructure incidents with Terraform — implement emergency change procedures, rollback strategies, and post-incident IaC reconciliation"
7
+ tags: [Terraform, incident-response, rollback, emergency, recovery, expert]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Production is down. The sequence of events:
13
+
14
+ 10:00 — Engineer deploys Terraform changes (new ECS task definition)
15
+ 10:05 — Health checks start failing on 3 of 8 ECS services
16
+ 10:10 — ALB marks targets unhealthy, 502 errors spike
17
+ 10:15 — Pager fires, incident declared
18
+ 10:20 — Investigation: new task definition has wrong environment
19
+ variable pointing to staging database
20
+ 10:25 — Need to rollback immediately
21
+
22
+ Options debated during the incident:
23
+ 1. Revert the git commit and re-apply
24
+ 2. terraform apply -target to fix just the task definition
25
+ 3. Manually update ECS in console
26
+ 4. terraform state replace-provider (someone suggested this randomly)
27
+
28
+ After the immediate fix, terraform plan shows drift because someone
29
+ made emergency changes in the console during the incident.
30
+
31
+ Task: Design the incident response procedure for Terraform-managed
32
+ infrastructure covering: rollback strategies, emergency change
33
+ procedures, post-incident reconciliation, and runbook development.
34
+
35
+ assertions:
36
+ - type: llm_judge
37
+ criteria: "Rollback strategies are evaluated — Option 1 (git revert + apply): safest, maintains IaC integrity, but slow (5-10 minutes for plan+apply). Option 2 (targeted apply): faster, fixes specific resource, but skips normal review process. Option 3 (console change): fastest (about 30 seconds), but creates drift. Option 4 (terraform state replace-provider): not a rollback; it only rewrites provider source addresses in state and changes no resources. Recommendation: (1) for critical outages: fix in console first to restore service, then reconcile Terraform. (2) for moderate issues: git revert + targeted apply. (3) for minor issues: normal git revert + full apply. Speed of recovery matters more than IaC purity during incidents"
38
+ weight: 0.35
39
+ description: "Rollback strategies"
40
+ - type: llm_judge
41
+ criteria: "Emergency change procedure is defined — emergency procedure: (1) declare incident, (2) fix immediately using fastest safe method (console if needed), (3) document all manual changes made, (4) after incident resolved: create PR with Terraform changes matching manual fixes, (5) run terraform plan to verify no drift, (6) apply to reconcile state. Emergency access: pre-configured 'break glass' IAM role with broad permissions, used only during incidents, logged via CloudTrail. Never run terraform destroy during an incident. Incident commander approves all infrastructure changes"
42
+ weight: 0.35
43
+ description: "Emergency procedure"
44
+ - type: llm_judge
45
+ criteria: "Post-incident reconciliation is practical — after incident: (1) terraform plan -refresh-only to see all drift, (2) review each drift: accept intentional changes (update .tf), revert accidental changes (apply). (3) Update IaC to prevent recurrence (add validation, pre-deploy checks). Runbook: document common failure scenarios with exact rollback commands. Example runbooks: bad ECS deployment (terraform apply -target=aws_ecs_task_definition.app -var='image_tag=v1.2.3'), database connection issue (terraform apply -target=aws_security_group.db). Store runbooks alongside Terraform code. Post-mortem: identify what IaC improvements would have prevented the incident"
46
+ weight: 0.30
47
+ description: "Reconciliation"
@@ -0,0 +1,41 @@
1
+ meta:
2
+ id: infrastructure-testing
3
+ level: 4
4
+ course: terraform-infrastructure-setup
5
+ type: output
6
+ description: "Test Terraform infrastructure — implement unit tests with terraform test, integration tests with Terratest, and policy testing with checkov and tfsec"
7
+ tags: [Terraform, testing, Terratest, checkov, tfsec, terraform-test, expert]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your Terraform modules have no tests. Last month, three incidents
13
+ were caused by module changes that broke consumers:
14
+ - VPC module changed output name (vpc_id → id), breaking 5 services
15
+ - RDS module removed a variable, breaking all callers
16
+ - Security group module allowed 0.0.0.0/0 ingress by default
17
+
18
+ Your testing strategy needs to cover:
19
+ 1. Module contract validation (inputs/outputs don't break)
20
+ 2. Security compliance (no open security groups, encryption enabled)
21
+ 3. Integration testing (resources actually work in AWS)
22
+ 4. Cost validation (changes don't blow budget)
23
+
24
+ Task: Design the infrastructure testing strategy covering: terraform
25
+ test (native, Terraform 1.6+), Terratest (Go-based integration),
26
+ policy scanning (checkov, tfsec), testing pyramid for infrastructure,
27
+ and CI integration for automated testing.
28
+
29
+ assertions:
30
+ - type: llm_judge
31
+ criteria: "terraform test (native testing) is explained — terraform test runs .tftest.hcl files. command = plan: validates without creating resources (fast, free). command = apply: creates real resources (slow, costs money, thorough). Assert blocks inside a run block pair a condition with an error_message, e.g. condition = output.vpc_id != \"\" with error_message = \"VPC ID must not be empty\". A variables block supplies test inputs. Run blocks execute in order and can chain: create VPC → verify VPC → create subnet using VPC. Module contract testing: verify required outputs exist and have correct types. Run with: terraform test. Best for: unit testing modules without external dependencies"
32
+ weight: 0.35
33
+ description: "terraform test"
34
+ - type: llm_judge
35
+ criteria: "Terratest and policy scanning are covered — Terratest (Go): creates real infrastructure, validates properties, destroys after test. Pattern: InitAndApply → verify outputs → verify cloud resources (SDK calls) → Destroy. Example: deploy VPC, verify CIDR block matches, verify subnets are in correct AZs, destroy. Best for: integration testing that verifies real cloud behavior. Policy scanning: checkov scans for security misconfigurations (CIS benchmarks, HIPAA, PCI-DSS). tfsec: Terraform-specific security scanner. OPA/Conftest: custom policy validation. Run in CI before plan to catch issues early"
36
+ weight: 0.35
37
+ description: "Terratest and policy"
38
+ - type: llm_judge
39
+ criteria: "Testing pyramid and CI integration are practical — testing pyramid for infrastructure: base = static analysis (fmt, validate, tfsec, checkov — fast, run on every PR), middle = plan-based tests (terraform test with command = plan — moderate speed, no cost), top = integration tests (Terratest/terraform test with command = apply — slow, costly, run nightly or on release). CI integration: every PR gets static analysis + plan tests. Nightly: full integration tests against ephemeral AWS account. Release: full integration suite. Cost management: use smallest instance types in tests, set up auto-cleanup for failed test runs, dedicated test AWS account with budget alerts"
40
+ weight: 0.30
41
+ description: "Pyramid and CI"
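The module contract failures listed in the trigger (a renamed output, a removed variable) can be caught by comparing a module's interface before and after a change. The sketch below is a hypothetical pre-merge check, not part of `terraform test` or Terratest; the dictionary shape and the name `contract_breaks` are assumptions, and how the interface is extracted from the module source is left open.

```python
def contract_breaks(old: dict, new: dict) -> list[str]:
    """Compare two module interfaces shaped as
    {"variables": {name: is_required}, "outputs": {names}}
    and list the changes that would break existing callers."""
    problems = []
    for name in sorted(old["outputs"] - new["outputs"]):
        problems.append(f"output removed or renamed: {name}")
    for name in sorted(set(old["variables"]) - set(new["variables"])):
        problems.append(f"variable removed: {name}")
    for name, required in sorted(new["variables"].items()):
        # A new variable without a default forces every caller to change.
        if required and name not in old["variables"]:
            problems.append(f"new required variable: {name}")
    return problems


old_vpc = {"variables": {"cidr": True}, "outputs": {"vpc_id", "subnet_ids"}}
new_vpc = {"variables": {"cidr": True}, "outputs": {"id", "subnet_ids"}}
print(contract_breaks(old_vpc, new_vpc))
```

A check like this belongs at the base of the pyramid: it needs no cloud credentials and can run on every PR alongside fmt and validate.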
@@ -0,0 +1,45 @@
1
+ meta:
2
+ id: module-registry-design
3
+ level: 4
4
+ course: terraform-infrastructure-setup
5
+ type: output
6
+ description: "Design a private module registry — create versioned, tested, documented modules with governance for enterprise consumption"
7
+ tags: [Terraform, modules, registry, versioning, governance, enterprise, expert]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your organization has 200+ Terraform modules scattered across 15
13
+ repositories with no versioning, testing, or documentation standards.
14
+ Teams duplicate effort building similar modules. There's no way to
15
+ know which modules are safe, maintained, or compliant.
16
+
17
+ A recent incident: a team used an outdated VPC module that created
18
+ security groups without logging — violating SOC2 controls. The module
19
+ had been "fixed" months ago but the team was using an old copy.
20
+
21
+ You need to design a private module registry that:
22
+ - Provides a catalog of approved, tested modules
23
+ - Enforces versioning and deprecation
24
+ - Includes compliance-checked modules
25
+ - Has clear ownership and support model
26
+ - Prevents use of unapproved or outdated modules
27
+
28
+ Task: Design the private module registry covering: registry platform
29
+ choice, module lifecycle (create, review, publish, deprecate),
30
+ versioning strategy, testing requirements, documentation standards,
31
+ and consumption governance.
32
+
33
+ assertions:
34
+ - type: llm_judge
35
+ criteria: "Registry platform and module lifecycle are designed — platform options: Terraform Cloud private registry (built-in, easy), self-hosted registry (any service implementing the module registry protocol), Git-based with tags (simple, no separate infrastructure). Module lifecycle: (1) proposal: RFC for new module, (2) development: follow template structure, (3) review: platform team reviews for compliance and quality, (4) testing: automated tests must pass (terraform test, checkov), (5) publish: tagged release with changelog, (6) maintenance: active, deprecated, archived states. Module maturity levels: experimental, supported, certified"
36
+ weight: 0.35
37
+ description: "Registry and lifecycle"
38
+ - type: llm_judge
39
+ criteria: "Versioning and testing are enforced — semantic versioning: MAJOR (breaking changes), MINOR (new features, backward compatible), PATCH (bug fixes). Version constraints: consumers use ~> 2.0 (allows 2.x, not 3.0). Breaking change policy: major version bump, migration guide, deprecation notice 2 releases before removal. Testing requirements before publish: (1) terraform fmt -check passes, (2) terraform validate passes, (3) checkov scan clean, (4) terraform test plan-level tests pass, (5) integration tests pass (for certified modules). CI pipeline: on tag creation, run all tests, publish to registry if passing"
40
+ weight: 0.35
41
+ description: "Versioning and testing"
42
+ - type: llm_judge
43
+ criteria: "Documentation and consumption governance are practical — documentation requirements: README with description, usage examples, inputs table, outputs table, requirements (provider versions). Generated with terraform-docs. Examples directory with working configurations. Consumption governance: Sentinel policy requiring modules from approved registry sources only. Module pinning: all consumers must use version constraints (not latest). Upgrade process: platform team announces new versions, teams have 90 days to upgrade deprecated versions. Metrics: module adoption rate, version currency (% on latest), support ticket volume per module"
44
+ weight: 0.30
45
+ description: "Docs and governance"
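The `~> 2.0` constraint in the versioning criteria allows only the rightmost listed component to increment. A minimal sketch of that rule follows, assuming plain numeric versions; pre-release suffixes are not handled, and the function names are illustrative rather than any Terraform API.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn '2.5.1' into (2, 5, 1)."""
    return tuple(int(p) for p in version.split("."))


def satisfies_pessimistic(version: str, constraint: str) -> bool:
    """Return True if `version` satisfies a '~> X.Y' or '~> X.Y.Z' constraint."""
    base = parse(constraint.removeprefix("~>").strip())
    if len(base) < 2:
        raise ValueError("this sketch expects at least MAJOR.MINOR")
    # Upper bound: drop the last component and bump the one before it,
    # so '~> 2.0' is capped below 3 and '~> 2.0.1' below 2.1.
    upper = base[:-2] + (base[-2] + 1,)
    v = parse(version)
    width = max(len(v), len(base))

    def pad(t: tuple[int, ...]) -> tuple[int, ...]:
        return t + (0,) * (width - len(t))

    return pad(base) <= pad(v) < pad(upper)


print(satisfies_pessimistic("2.5.1", "~> 2.0"))
print(satisfies_pessimistic("3.0.0", "~> 2.0"))
```

Under this rule a consumer pinned to `~> 2.0` picks up MINOR and PATCH releases automatically, while a MAJOR bump requires an explicit change to the constraint.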
@@ -0,0 +1,57 @@
1
+ meta:
2
+ id: multi-account-strategy
3
+ level: 4
4
+ course: terraform-infrastructure-setup
5
+ type: output
6
+ description: "Design multi-account Terraform strategy — implement AWS Organizations landing zone, cross-account roles, and account vending with Terraform"
7
+ tags: [Terraform, multi-account, AWS-Organizations, landing-zone, cross-account, expert]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your company is moving from a single AWS account (everything in one
13
+ account) to a multi-account strategy using AWS Organizations. Plan:
14
+
15
+ ```
16
+ Management Account (root)
17
+ ├── Security OU
18
+ │ ├── Security Account (GuardDuty, SecurityHub)
19
+ │ └── Log Archive Account (CloudTrail, Config)
20
+ ├── Infrastructure OU
21
+ │ ├── Shared Services (DNS, VPN, Transit Gateway)
22
+ │ └── Network Hub (centralized networking)
23
+ ├── Workloads OU
24
+ │ ├── Production OU
25
+ │ │ ├── App1-Prod
26
+ │ │ └── App2-Prod
27
+ │ └── Non-Production OU
28
+ │ ├── App1-Dev
29
+ │ └── App2-Staging
30
+ └── Sandbox OU
31
+ └── Developer Sandboxes
32
+ ```
33
+
34
+ Terraform needs to:
35
+ 1. Create and manage the Organization structure
36
+ 2. Provision new accounts automatically (account vending)
37
+ 3. Apply baseline security controls to every account
38
+ 4. Manage cross-account networking (Transit Gateway)
39
+
40
+ Task: Design the multi-account Terraform strategy covering:
41
+ Organization management, account vending machine, baseline
42
+ security controls (SCPs, GuardDuty, Config), cross-account
43
+ IAM, and state management across accounts.
44
+
45
+ assertions:
46
+ - type: llm_judge
47
+ criteria: "Organization management with Terraform is designed — aws_organizations_organization for the org, aws_organizations_organizational_unit for OUs, aws_organizations_account for member accounts, aws_organizations_policy (SCP) for guardrails. Account vending: module that creates account, configures baseline (IAM roles, logging, security), outputs account ID. SCPs: restrict allowed services and regions, deny root user actions, enforce encryption. Terraform runs from the management account and assumes OrganizationAccountAccessRole in each member account to configure it"
48
+ weight: 0.35
49
+ description: "Organization management"
50
+ - type: llm_judge
51
+ criteria: "Baseline security and cross-account are covered — baseline module per account: (1) CloudTrail → Log Archive bucket, (2) AWS Config → centralized rules, (3) GuardDuty member enrollment, (4) IAM password policy, (5) EBS default encryption, (6) S3 Block Public Access account-level setting. Cross-account IAM: create TerraformRole in each account with trust to management/CI account. Cross-account networking: Transit Gateway in network hub, RAM sharing to workload accounts. VPC peering or Transit Gateway attachments managed by network team's Terraform"
52
+ weight: 0.35
53
+ description: "Baseline and cross-account"
54
+ - type: llm_judge
55
+ criteria: "State management across accounts is practical — one state file per account per domain (not one giant state). State bucket: centralized in management or shared services account with cross-account access policies. State architecture: management-account/org.tfstate, security-account/baseline.tfstate, each-workload-account/baseline.tfstate + app.tfstate. CI/CD: pipeline assumes different roles per account. terraform_remote_state: network team's outputs consumed by workload teams. DynamoDB locking table: centralized with per-state granularity. Least privilege: each team's CI role can only access their account's state"
56
+ weight: 0.30
57
+ description: "State management"
@@ -0,0 +1,53 @@
1
+ meta:
2
+ id: board-infrastructure-investment
3
+ level: 5
4
+ course: terraform-infrastructure-setup
5
+ type: output
6
+ description: "Present infrastructure investment to the board — justify IaC platform spend, demonstrate ROI, and align technology strategy with business outcomes"
7
+ tags: [Terraform, board, ROI, investment, business-case, strategy, master]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ You're preparing a board presentation for a mid-market SaaS company
13
+ ($50M ARR, 150 engineers, Series C). The board is questioning the
14
+ proposed $2M infrastructure platform investment over 2 years:
15
+
16
+ Investment breakdown:
17
+ - Terraform Cloud Enterprise: $200K/year
18
+ - Platform team (4 new hires): $800K/year
19
+ - Module library development: $300K (one-time)
20
+ - Training program: $100K/year
21
+ - Infrastructure testing: $100K/year
22
+ Total: $800K Year 1, $1.2M Year 2
23
+
24
+ Board concerns:
25
+ - "We're not an infrastructure company — why spend $2M on plumbing?"
26
+ - "Our competitors seem to manage without this investment"
27
+ - "Can't we just hire more DevOps engineers instead?"
28
+ - "What's the ROI timeline?"
29
+
30
+ Current pain:
31
+ - 3 production outages/quarter (average $50K revenue impact each)
32
+ - 2-week deployment cycle (competitors deploy daily)
33
+ - 30% of engineering time on infrastructure ops
34
+ - Failed enterprise deals due to security/compliance gaps
35
+ - $1.8M/year in AWS costs, growing 20% YoY with no optimization
36
+
37
+ Task: Build the board-level business case for IaC investment
38
+ covering: ROI analysis, competitive positioning, risk mitigation,
39
+ and the executive narrative.
40
+
41
+ assertions:
42
+ - type: llm_judge
43
+ criteria: "ROI analysis is quantified — costs: $2M over 2 years. Benefits Year 1: reduced outages (from 12 to 3/year at $50K each = $450K saved), deployment acceleration (engineering productivity: 30% ops → 15% = 22 engineer-months freed at $15K/month = $330K), AWS cost optimization (20% reduction = $360K/year from current $1.8M). Benefits Year 2: additional enterprise deals enabled by compliance ($2M+ ARR pipeline), further ops reduction (10%), hiring efficiency (fewer DevOps needed). Total 2-year benefit: $3-5M. Return: 1.5-2.5x the $2M cost over 2 years (net ROI 50-150%). Payback period: 12-15 months. Frame as: infrastructure investment SAVES money, it doesn't just cost money"
44
+ weight: 0.35
45
+ description: "ROI analysis"
46
+ - type: llm_judge
47
+ criteria: "Competitive positioning addresses board concerns — 'not an infrastructure company': infrastructure is a competitive moat (faster deployments = faster features = win customers). Competitors DO invest (they just don't talk about it publicly). 'hire more DevOps': doesn't scale — each new DevOps engineer adds linear capacity, while a platform multiplies it (150 engineers benefit from 4 platform engineers). 'ROI timeline': infrastructure investment front-loads cost but compounds returns. Year 1: breakeven. Year 2+: net positive and accelerating. Enterprise readiness: SOC2/HIPAA compliance unlocks enterprise market ($10M+ ARR opportunity), impossible without proper IaC governance"
48
+ weight: 0.35
49
+ description: "Competitive positioning"
50
+ - type: llm_judge
51
+ criteria: "Risk and narrative are compelling — risk of NOT investing: continued outages erode customer trust, enterprise deals lost to competitors, increasing AWS bill without optimization, engineering velocity gap widens. Risk mitigation of investment: phased approach (spend $200K first 3 months, validate before full commitment), measurable milestones (if no improvement by month 6, adjust strategy). Executive narrative: 'We're investing in engineering velocity. Every dollar spent on infrastructure automation generates $3 in engineering productivity and $5 in enterprise revenue opportunity. This isn't plumbing — it's the engine that powers our product velocity and enterprise readiness.' Board metrics to track: deployment frequency, outage count, AWS cost trend, enterprise deal closure rate"
52
+ weight: 0.30
53
+ description: "Risk and narrative"
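The arithmetic behind the ROI criteria can be checked directly. The sketch below uses only figures quoted in the scenario; the variable and function names are illustrative. Note that a $3-5M benefit on a $2M cost is a 1.5-2.5x benefit-to-cost ratio, or 50-150% when ROI is computed net of cost.

```python
COST_TOTAL = 2_000_000  # two-year investment quoted to the board

# Year 1 benefits as itemized in the ROI criteria
outage_savings = (12 - 3) * 50_000      # 9 fewer outages per year
productivity = 22 * 15_000              # engineer-months freed x monthly cost
aws_savings = 1_800_000 * 20 // 100     # 20% of the current AWS bill
year1_benefit = outage_savings + productivity + aws_savings


def net_roi(benefit: float, cost: float) -> float:
    """Net return on investment as a fraction of cost."""
    return (benefit - cost) / cost


print(year1_benefit)  # 1140000
print(net_roi(3_000_000, COST_TOTAL), net_roi(5_000_000, COST_TOTAL))  # 0.5 1.5
```

The itemized Year 1 benefits cover a little over half the two-year cost on their own, so the case for a short payback leans on the Year 2 enterprise pipeline.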
@@ -0,0 +1,47 @@
1
+ meta:
2
+ id: disaster-recovery-iac
3
+ level: 5
4
+ course: terraform-infrastructure-setup
5
+ type: output
6
+ description: "Design disaster recovery with Terraform — implement multi-region failover, state backup, infrastructure rebuilding, and DR testing strategies"
7
+ tags: [Terraform, disaster-recovery, multi-region, failover, backup, master]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your client's us-east-1 region experienced a 4-hour outage affecting
13
+ their production environment. Post-mortem revealed:
14
+ - No multi-region deployment (single region)
15
+ - Terraform state stored only in us-east-1 S3 bucket (inaccessible
16
+ during outage)
17
+ - No runbook for rebuilding infrastructure from scratch
18
+ - RTO target: 1 hour. Actual recovery: 4 hours
19
+ - RPO target: 15 minutes. Actual data loss: 2 hours
20
+
21
+ The CEO demands a DR strategy that meets:
22
+ - RTO: < 30 minutes for critical services
23
+ - RPO: < 5 minutes for transactional data
24
+ - Annual DR testing: full failover drill
25
+ - Cost: additional spend < 30% of current infrastructure cost
26
+
27
+ Current infrastructure: $200K/month
28
+ DR budget: up to $60K/month additional
29
+
30
+ Task: Design the DR strategy using Terraform covering: multi-region
31
+ architecture, state backup strategy, infrastructure-as-code for
32
+ rapid rebuilding, DR testing automation, and cost optimization
33
+ for standby infrastructure.
34
+
35
+ assertions:
36
+ - type: llm_judge
37
+ criteria: "Multi-region architecture meets RTO/RPO — active-passive architecture: primary (us-east-1) handles all traffic, secondary (us-west-2) has warm standby. Critical tier (RTO < 30 min): multi-AZ RDS with cross-region read replica (RPO: seconds), S3 cross-region replication, Route53 health checks with automated failover. Important tier (RTO < 2 hours): AMI/container image replication, Terraform can provision compute in 15 minutes. Non-critical tier (RTO < 8 hours): rebuild from Terraform on demand, no standby resources. Terraform manages all regions: provider aliases for us-east-1 and us-west-2, shared modules deployed to both"
38
+ weight: 0.35
39
+ description: "Multi-region architecture"
40
+ - type: llm_judge
41
+ criteria: "State backup and rebuilding strategy are robust — state backup: S3 bucket with versioning + cross-region replication to us-west-2 bucket. DynamoDB global table for lock table (accessible in both regions). If primary S3 inaccessible: terraform init with backend pointing to replicated bucket. Infrastructure rebuilding: Terraform code can recreate all infrastructure from scratch. Test this quarterly — run terraform plan in a clean account to verify. Runbook: step-by-step DR activation (1) activate Route53 failover, (2) promote RDS read replica, (3) terraform apply in DR region for compute, (4) verify application health. Automated: Lambda triggered by CloudWatch alarm runs DR activation script"
42
+ weight: 0.35
43
+ description: "State and rebuilding"
44
+ - type: llm_judge
45
+ criteria: "DR testing and cost optimization are practical — DR testing: quarterly full failover drill using Terraform. Automation: (1) terraform workspace for DR drill, (2) apply creates full DR infrastructure, (3) automated tests verify failover works, (4) terraform destroy after drill. GameDay approach: simulate failures (kill primary, verify automatic failover). Document: recovery time achieved, data loss measured, issues found. Cost optimization: warm standby only for critical tier (RDS replica: ~$3K/month, S3 replication: minimal, Route53 health checks: minimal). Compute on-demand in DR region (no standby instances — provision with Terraform during failover, 10-15 min). Estimated DR cost: $15-20K/month (7.5-10% of the $200K/month infrastructure spend), well within $60K budget"
46
+ weight: 0.30
47
+ description: "Testing and cost"
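The cost claim in the last criterion can be verified against the budget in the trigger. This is a small worked check using the scenario's own figures ($200K/month baseline, 30% cap, $15-20K/month estimate); the names are illustrative.

```python
BASELINE = 200_000                  # current monthly infrastructure spend
BUDGET_CAP = BASELINE * 30 // 100   # CEO's limit: under 30% additional


def dr_share(monthly_dr_cost: int) -> float:
    """DR spend as a percentage of baseline infrastructure cost."""
    return 100 * monthly_dr_cost / BASELINE


for cost in (15_000, 20_000):
    print(cost, dr_share(cost), cost < BUDGET_CAP)
```

Both ends of the estimate sit at 7.5-10% of baseline, which leaves most of the $60K/month allowance unspent for drills or for extra warm capacity should the 30-minute RTO prove tight.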
@@ -0,0 +1,48 @@
1
+ meta:
2
+ id: enterprise-iac-transformation
3
+ level: 5
4
+ course: terraform-infrastructure-setup
5
+ type: output
6
+ description: "Lead enterprise IaC transformation — design the organizational change management strategy for adopting Terraform across a 500-person engineering organization"
7
+ tags: [Terraform, enterprise, transformation, change-management, adoption, master]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ You're a consulting CTO advising a Fortune 500 financial services
13
+ company (500 engineers, $5M/month AWS bill) on IaC transformation.
14
+ Current state:
15
+ - 80% of infrastructure provisioned via console or manual scripts
16
+ - 5 teams use Terraform (inconsistent patterns, no governance)
17
+ - 20 teams use no IaC at all
18
+ - Manual change management process (tickets, approvals, 3-day SLA)
19
+ - 3 compliance frameworks (SOC2, PCI-DSS, GDPR)
20
+ - 2 failed IaC adoption attempts in the past 3 years
21
+
22
+ Why previous attempts failed:
23
+ - Attempt 1: mandated Terraform for everyone, no training → engineers
24
+ created broken configurations, lost trust
25
+ - Attempt 2: platform team built perfect modules → too rigid, teams
26
+ couldn't customize, abandoned within 6 months
27
+
28
+ CEO: "We need infrastructure as code. Our competitors deploy daily,
29
+ we deploy monthly. But it has to actually stick this time."
30
+
31
+ Task: Design the enterprise IaC transformation strategy that
32
+ addresses why previous attempts failed. Cover: phased adoption,
33
+ team enablement, governance model, success metrics, executive
34
+ communication, and risk mitigation.
35
+
36
+ assertions:
37
+ - type: llm_judge
38
+ criteria: "Phased adoption addresses previous failures — why mandates fail: forced adoption without enablement creates resistance. Why rigid platforms fail: one-size-fits-all ignores team autonomy. Better approach: Phase 1 (months 1-3): select 3 willing champion teams, co-develop patterns with them (not for them). Phase 2 (months 4-6): expand to 8-10 teams using champion-developed patterns, champions become mentors. Phase 3 (months 7-12): organization-wide with self-service platform, remaining teams onboard with support. Phase 4 (months 13-18): optimize, advanced features, measure ROI. Key: teams choose when to adopt (within a deadline), not forced on day 1"
39
+ weight: 0.35
40
+ description: "Phased adoption"
41
+ - type: llm_judge
42
+ criteria: "Enablement and governance balance autonomy with guardrails — enablement: 2-week IaC bootcamp per team (not just Terraform syntax — include workflow, patterns, debugging). Pair programming: platform engineers embed in teams for first 2 months. Self-service catalog: modules teams can use immediately (VPC, ECS, RDS) with sensible defaults but configurable. Governance: guardrails not gates. Pre-commit hooks (fmt, validate, scan) — fast feedback. CI pipeline (plan, security scan) — catch issues before merge. Policy enforcement (Sentinel/OPA) — prevent non-compliant deployments. Allow teams to write custom modules within compliance boundaries"
43
+ weight: 0.35
44
+ description: "Enablement and governance"
45
+ - type: llm_judge
46
+ criteria: "Metrics and executive communication are practical — success metrics: (1) adoption rate (% of infrastructure managed by IaC, target: 50% in 6 months, 80% in 12 months), (2) deployment frequency (monthly → weekly → daily), (3) change failure rate (failed deployments ÷ total deployments), (4) MTTR (time to recover from failures), (5) compliance score (automated scan pass rate), (6) team satisfaction (survey). Executive dashboard: IaC coverage, deployment velocity, cost savings, compliance posture. Board narrative: IaC is competitive advantage — competitors deploy 10x faster, compliance is automated not manual, risk is reduced through automation. ROI: reduced manual work ($500K/year), faster deployments ($2M opportunity cost), compliance automation ($300K/year audit savings)"
47
+ weight: 0.30
48
+ description: "Metrics and communication"