dojo.md 0.2.2 → 0.2.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (149)
  1. package/courses/GENERATION_LOG.md +20 -0
  2. package/courses/api-documentation-writing/course.yaml +12 -0
  3. package/courses/api-documentation-writing/scenarios/level-1/authentication-basics.yaml +46 -0
  4. package/courses/api-documentation-writing/scenarios/level-1/data-types-formats.yaml +45 -0
  5. package/courses/api-documentation-writing/scenarios/level-1/endpoint-description.yaml +45 -0
  6. package/courses/api-documentation-writing/scenarios/level-1/error-documentation.yaml +45 -0
  7. package/courses/api-documentation-writing/scenarios/level-1/first-documentation-shift.yaml +47 -0
  8. package/courses/api-documentation-writing/scenarios/level-1/getting-started-guide.yaml +42 -0
  9. package/courses/api-documentation-writing/scenarios/level-1/pagination-docs.yaml +51 -0
  10. package/courses/api-documentation-writing/scenarios/level-1/request-parameters.yaml +46 -0
  11. package/courses/api-documentation-writing/scenarios/level-1/request-response-examples.yaml +48 -0
  12. package/courses/api-documentation-writing/scenarios/level-1/status-codes.yaml +45 -0
  13. package/courses/api-documentation-writing/scenarios/level-2/error-patterns.yaml +48 -0
  14. package/courses/api-documentation-writing/scenarios/level-2/intermediate-documentation-shift.yaml +48 -0
  15. package/courses/api-documentation-writing/scenarios/level-2/oauth-documentation.yaml +47 -0
  16. package/courses/api-documentation-writing/scenarios/level-2/openapi-specification.yaml +46 -0
  17. package/courses/api-documentation-writing/scenarios/level-2/rate-limiting-docs.yaml +45 -0
  18. package/courses/api-documentation-writing/scenarios/level-2/request-body-schemas.yaml +46 -0
  19. package/courses/api-documentation-writing/scenarios/level-2/schema-definitions.yaml +41 -0
  20. package/courses/api-documentation-writing/scenarios/level-2/swagger-redoc-rendering.yaml +43 -0
  21. package/courses/api-documentation-writing/scenarios/level-2/validation-documentation.yaml +47 -0
  22. package/courses/api-documentation-writing/scenarios/level-2/versioning-changelog.yaml +42 -0
  23. package/courses/api-documentation-writing/scenarios/level-3/advanced-documentation-shift.yaml +43 -0
  24. package/courses/api-documentation-writing/scenarios/level-3/api-style-guide.yaml +40 -0
  25. package/courses/api-documentation-writing/scenarios/level-3/code-samples-multilang.yaml +40 -0
  26. package/courses/api-documentation-writing/scenarios/level-3/content-architecture.yaml +47 -0
  27. package/courses/api-documentation-writing/scenarios/level-3/deprecation-communication.yaml +44 -0
  28. package/courses/api-documentation-writing/scenarios/level-3/interactive-api-explorer.yaml +42 -0
  29. package/courses/api-documentation-writing/scenarios/level-3/migration-guides.yaml +42 -0
  30. package/courses/api-documentation-writing/scenarios/level-3/sdk-documentation.yaml +40 -0
  31. package/courses/api-documentation-writing/scenarios/level-3/webhook-documentation.yaml +48 -0
  32. package/courses/api-documentation-writing/scenarios/level-3/websocket-sse-docs.yaml +47 -0
  33. package/courses/api-documentation-writing/scenarios/level-4/api-changelog-management.yaml +44 -0
  34. package/courses/api-documentation-writing/scenarios/level-4/api-governance-standards.yaml +41 -0
  35. package/courses/api-documentation-writing/scenarios/level-4/api-product-strategy.yaml +41 -0
  36. package/courses/api-documentation-writing/scenarios/level-4/developer-portal-design.yaml +48 -0
  37. package/courses/api-documentation-writing/scenarios/level-4/docs-as-code.yaml +41 -0
  38. package/courses/api-documentation-writing/scenarios/level-4/documentation-localization.yaml +46 -0
  39. package/courses/api-documentation-writing/scenarios/level-4/documentation-metrics.yaml +45 -0
  40. package/courses/api-documentation-writing/scenarios/level-4/documentation-testing.yaml +41 -0
  41. package/courses/api-documentation-writing/scenarios/level-4/expert-documentation-shift.yaml +45 -0
  42. package/courses/api-documentation-writing/scenarios/level-4/multi-audience-docs.yaml +46 -0
  43. package/courses/api-documentation-writing/scenarios/level-5/ai-powered-documentation.yaml +44 -0
  44. package/courses/api-documentation-writing/scenarios/level-5/api-first-documentation.yaml +45 -0
  45. package/courses/api-documentation-writing/scenarios/level-5/api-marketplace-docs.yaml +42 -0
  46. package/courses/api-documentation-writing/scenarios/level-5/board-api-strategy.yaml +48 -0
  47. package/courses/api-documentation-writing/scenarios/level-5/documentation-program-strategy.yaml +42 -0
  48. package/courses/api-documentation-writing/scenarios/level-5/documentation-team-structure.yaml +47 -0
  49. package/courses/api-documentation-writing/scenarios/level-5/dx-competitive-advantage.yaml +46 -0
  50. package/courses/api-documentation-writing/scenarios/level-5/ecosystem-documentation.yaml +45 -0
  51. package/courses/api-documentation-writing/scenarios/level-5/industry-documentation-patterns.yaml +46 -0
  52. package/courses/api-documentation-writing/scenarios/level-5/master-documentation-shift.yaml +46 -0
  53. package/courses/code-review-feedback-writing/course.yaml +12 -0
  54. package/courses/code-review-feedback-writing/scenarios/level-1/approve-vs-request-changes.yaml +48 -0
  55. package/courses/code-review-feedback-writing/scenarios/level-1/asking-questions.yaml +50 -0
  56. package/courses/code-review-feedback-writing/scenarios/level-1/clear-comment-writing.yaml +45 -0
  57. package/courses/code-review-feedback-writing/scenarios/level-1/constructive-tone.yaml +43 -0
  58. package/courses/code-review-feedback-writing/scenarios/level-1/first-review-shift.yaml +46 -0
  59. package/courses/code-review-feedback-writing/scenarios/level-1/giving-praise.yaml +44 -0
  60. package/courses/code-review-feedback-writing/scenarios/level-1/nitpick-etiquette.yaml +44 -0
  61. package/courses/code-review-feedback-writing/scenarios/level-1/providing-context.yaml +46 -0
  62. package/courses/code-review-feedback-writing/scenarios/level-1/reviewing-small-prs.yaml +43 -0
  63. package/courses/code-review-feedback-writing/scenarios/level-1/style-vs-logic.yaml +48 -0
  64. package/courses/code-review-feedback-writing/scenarios/level-2/architectural-feedback.yaml +52 -0
  65. package/courses/code-review-feedback-writing/scenarios/level-2/intermediate-review-shift.yaml +46 -0
  66. package/courses/code-review-feedback-writing/scenarios/level-2/performance-feedback.yaml +50 -0
  67. package/courses/code-review-feedback-writing/scenarios/level-2/reviewing-breaking-changes.yaml +44 -0
  68. package/courses/code-review-feedback-writing/scenarios/level-2/reviewing-complex-prs.yaml +43 -0
  69. package/courses/code-review-feedback-writing/scenarios/level-2/reviewing-documentation.yaml +47 -0
  70. package/courses/code-review-feedback-writing/scenarios/level-2/reviewing-error-handling.yaml +50 -0
  71. package/courses/code-review-feedback-writing/scenarios/level-2/reviewing-tests.yaml +53 -0
  72. package/courses/code-review-feedback-writing/scenarios/level-2/security-review-comments.yaml +50 -0
  73. package/courses/code-review-feedback-writing/scenarios/level-2/suggesting-alternatives.yaml +42 -0
  74. package/courses/code-review-feedback-writing/scenarios/level-3/cross-team-review.yaml +45 -0
  75. package/courses/code-review-feedback-writing/scenarios/level-3/mentoring-through-review.yaml +46 -0
  76. package/courses/code-review-feedback-writing/scenarios/level-3/reviewing-unfamiliar-code.yaml +43 -0
  77. package/courses/terraform-infrastructure-setup/scenarios/level-1/first-debugging-shift.yaml +66 -0
  78. package/courses/terraform-infrastructure-setup/scenarios/level-1/plan-output-reading.yaml +71 -0
  79. package/courses/terraform-infrastructure-setup/scenarios/level-1/resource-creation-failures.yaml +54 -0
  80. package/courses/terraform-infrastructure-setup/scenarios/level-1/resource-references.yaml +70 -0
  81. package/courses/terraform-infrastructure-setup/scenarios/level-1/state-file-basics.yaml +73 -0
  82. package/courses/terraform-infrastructure-setup/scenarios/level-1/terraform-fmt-validate.yaml +58 -0
  83. package/courses/terraform-infrastructure-setup/scenarios/level-2/count-vs-for-each.yaml +58 -0
  84. package/courses/terraform-infrastructure-setup/scenarios/level-2/dependency-management.yaml +80 -0
  85. package/courses/terraform-infrastructure-setup/scenarios/level-2/intermediate-debugging-shift.yaml +66 -0
  86. package/courses/terraform-infrastructure-setup/scenarios/level-2/lifecycle-rules.yaml +51 -0
  87. package/courses/terraform-infrastructure-setup/scenarios/level-2/locals-and-expressions.yaml +58 -0
  88. package/courses/terraform-infrastructure-setup/scenarios/level-2/module-structure.yaml +75 -0
  89. package/courses/terraform-infrastructure-setup/scenarios/level-2/provisioner-pitfalls.yaml +64 -0
  90. package/courses/terraform-infrastructure-setup/scenarios/level-2/remote-state-backend.yaml +55 -0
  91. package/courses/terraform-infrastructure-setup/scenarios/level-2/terraform-import.yaml +55 -0
  92. package/courses/terraform-infrastructure-setup/scenarios/level-2/workspace-management.yaml +51 -0
  93. package/courses/terraform-infrastructure-setup/scenarios/level-3/advanced-debugging-shift.yaml +63 -0
  94. package/courses/terraform-infrastructure-setup/scenarios/level-3/api-rate-limiting.yaml +50 -0
  95. package/courses/terraform-infrastructure-setup/scenarios/level-3/conditional-resources.yaml +66 -0
  96. package/courses/terraform-infrastructure-setup/scenarios/level-3/drift-detection.yaml +66 -0
  97. package/courses/terraform-infrastructure-setup/scenarios/level-3/dynamic-blocks.yaml +71 -0
  98. package/courses/terraform-infrastructure-setup/scenarios/level-3/large-scale-refactoring.yaml +59 -0
  99. package/courses/terraform-infrastructure-setup/scenarios/level-3/multi-provider-config.yaml +69 -0
  100. package/courses/terraform-infrastructure-setup/scenarios/level-3/state-surgery.yaml +57 -0
  101. package/courses/terraform-infrastructure-setup/scenarios/level-3/terraform-cloud-enterprise.yaml +59 -0
  102. package/courses/terraform-infrastructure-setup/scenarios/level-3/terraform-debugging.yaml +51 -0
  103. package/courses/terraform-infrastructure-setup/scenarios/level-4/blast-radius-management.yaml +51 -0
  104. package/courses/terraform-infrastructure-setup/scenarios/level-4/cicd-pipeline-design.yaml +50 -0
  105. package/courses/terraform-infrastructure-setup/scenarios/level-4/compliance-as-code.yaml +46 -0
  106. package/courses/terraform-infrastructure-setup/scenarios/level-4/cost-estimation-governance.yaml +42 -0
  107. package/courses/terraform-infrastructure-setup/scenarios/level-4/expert-debugging-shift.yaml +51 -0
  108. package/courses/terraform-infrastructure-setup/scenarios/level-4/iac-organization-strategy.yaml +45 -0
  109. package/courses/terraform-infrastructure-setup/scenarios/level-4/incident-response-iac.yaml +47 -0
  110. package/courses/terraform-infrastructure-setup/scenarios/level-4/infrastructure-testing.yaml +41 -0
  111. package/courses/terraform-infrastructure-setup/scenarios/level-4/module-registry-design.yaml +45 -0
  112. package/courses/terraform-infrastructure-setup/scenarios/level-4/multi-account-strategy.yaml +57 -0
  113. package/courses/terraform-infrastructure-setup/scenarios/level-5/board-infrastructure-investment.yaml +53 -0
  114. package/courses/terraform-infrastructure-setup/scenarios/level-5/disaster-recovery-iac.yaml +47 -0
  115. package/courses/terraform-infrastructure-setup/scenarios/level-5/enterprise-iac-transformation.yaml +48 -0
  116. package/courses/terraform-infrastructure-setup/scenarios/level-5/iac-technology-evolution.yaml +49 -0
  117. package/courses/terraform-infrastructure-setup/scenarios/level-5/ma-infrastructure-consolidation.yaml +54 -0
  118. package/courses/terraform-infrastructure-setup/scenarios/level-5/master-debugging-shift.yaml +53 -0
  119. package/courses/terraform-infrastructure-setup/scenarios/level-5/multi-cloud-strategy.yaml +49 -0
  120. package/courses/terraform-infrastructure-setup/scenarios/level-5/platform-engineering.yaml +47 -0
  121. package/courses/terraform-infrastructure-setup/scenarios/level-5/regulatory-compliance-automation.yaml +47 -0
  122. package/courses/terraform-infrastructure-setup/scenarios/level-5/terraform-vs-alternatives.yaml +46 -0
  123. package/dist/cli/commands/generate.d.ts.map +1 -1
  124. package/dist/cli/commands/generate.js +2 -1
  125. package/dist/cli/commands/generate.js.map +1 -1
  126. package/dist/cli/commands/train.d.ts.map +1 -1
  127. package/dist/cli/commands/train.js +6 -3
  128. package/dist/cli/commands/train.js.map +1 -1
  129. package/dist/cli/index.js +9 -6
  130. package/dist/cli/index.js.map +1 -1
  131. package/dist/cli/run-demo.js +3 -2
  132. package/dist/cli/run-demo.js.map +1 -1
  133. package/dist/engine/model-utils.d.ts +6 -0
  134. package/dist/engine/model-utils.d.ts.map +1 -1
  135. package/dist/engine/model-utils.js +28 -1
  136. package/dist/engine/model-utils.js.map +1 -1
  137. package/dist/engine/training.d.ts.map +1 -1
  138. package/dist/engine/training.js +4 -3
  139. package/dist/engine/training.js.map +1 -1
  140. package/dist/generator/course-generator.d.ts.map +1 -1
  141. package/dist/generator/course-generator.js +4 -3
  142. package/dist/generator/course-generator.js.map +1 -1
  143. package/dist/mcp/server.d.ts.map +1 -1
  144. package/dist/mcp/server.js +7 -3
  145. package/dist/mcp/server.js.map +1 -1
  146. package/dist/mcp/session-manager.d.ts.map +1 -1
  147. package/dist/mcp/session-manager.js +3 -2
  148. package/dist/mcp/session-manager.js.map +1 -1
  149. package/package.json +1 -1
@@ -0,0 +1,59 @@
1
+ meta:
2
+ id: large-scale-refactoring
3
+ level: 3
4
+ course: terraform-infrastructure-setup
5
+ type: output
6
+ description: "Refactor large Terraform codebases — split monoliths into modules, migrate between state files, and use moved blocks for safe resource reorganization"
7
+ tags: [Terraform, refactoring, modules, moved-blocks, migration, advanced]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your organization's Terraform codebase has grown organically over
13
+ 3 years into a monolith:
14
+
15
+ ```
16
+ infrastructure/
17
+ ├── main.tf (3500 lines, 180 resources)
18
+ ├── variables.tf (800 lines, 95 variables)
19
+ ├── outputs.tf (200 lines)
20
+ └── terraform.tfstate (25MB, all resources in one state)
21
+ ```
22
+
23
+ Problems:
24
+ - terraform plan takes 8 minutes (refreshes all 180 resources)
25
+ - Any change risks all resources (blast radius = everything)
26
+ - 5 teams touch the same files, causing merge conflicts
27
+ - Lock contention: only one person can run terraform at a time
28
+
29
+ Target architecture:
30
+ ```
31
+ infrastructure/
32
+ ├── foundation/ (VPC, DNS, IAM — Platform team)
33
+ │ └── terraform.tfstate
34
+ ├── database/ (RDS, ElastiCache — Database team)
35
+ │ └── terraform.tfstate
36
+ ├── compute/ (ECS, ALB — App team)
37
+ │ └── terraform.tfstate
38
+ ├── monitoring/ (CloudWatch, Alarms — SRE team)
39
+ │ └── terraform.tfstate
40
+ └── modules/ (Shared modules)
41
+ ```
42
+
43
+ Task: Design the migration strategy from monolith to modular
44
+ Terraform, covering state splitting, moved blocks, cross-state
45
+ references, testing the migration, and rollback planning.
46
+
47
+ assertions:
48
+ - type: llm_judge
49
+ criteria: "Migration strategy is phased — Phase 1: catalog all resources by team/domain. Phase 2: create module structure and write configurations for each domain. Phase 3: use moved blocks within the monolith to reorganize into modules (no state split yet). Phase 4: split state files using state mv or state rm + import. Phase 5: establish cross-state references using terraform_remote_state data sources. Each phase is independently verifiable: plan should show no changes after each phase. Never do everything at once — incremental migration with verification"
50
+ weight: 0.35
51
+ description: "Migration strategy"
52
+ - type: llm_judge
53
+ criteria: "State splitting mechanics are covered — approach 1 (state mv): (1) backup state, (2) create new backend configs, (3) terraform state mv resources to new state files. Approach 2 (state rm + import): (1) remove resources from monolith state, (2) import into new domain state files. Approach 3 (manual): (1) state pull, (2) edit JSON to split resources, (3) state push to new backends. Cross-state references: foundation outputs VPC ID, compute reads it via terraform_remote_state. IAM and dependency order: foundation first (VPC, IAM), then database (needs VPC), then compute (needs both)"
54
+ weight: 0.35
55
+ description: "State splitting"
56
+ - type: llm_judge
57
+ criteria: "Testing and rollback are practical — testing: after each migration step, terraform plan must show zero changes in all state files. If plan shows changes, something was migrated incorrectly — fix before proceeding. Rollback: keep the original monolith state backup throughout migration. If anything goes wrong, restore from backup and restart the phase. Timeline: for 180 resources, plan 2-4 weeks. Risk mitigation: migrate non-production first, then production during maintenance window. Communication: notify all teams of the plan, freeze non-essential changes during migration"
58
+ weight: 0.30
59
+ description: "Testing and rollback"
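The "plan must show zero changes" check in the testing criteria above could be automated roughly as follows. This is a minimal sketch, not part of the package: the directory names are taken from the scenario's target architecture, the helper names `plan_is_clean` and `dirty_domains` are hypothetical, and it assumes each directory is already initialized. It relies on `terraform plan -detailed-exitcode`, which exits with 0 for an empty diff, 1 for an error, and 2 when changes are pending.

```python
import subprocess

# Assumed layout: one directory per domain, as in the target architecture.
DOMAINS = ["foundation", "database", "compute", "monitoring"]


def plan_is_clean(directory, run=subprocess.run):
    """Return True when `terraform plan` reports no pending changes."""
    result = run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=directory,
    )
    if result.returncode == 1:
        # Exit code 1 means the plan itself failed, not that it found changes.
        raise RuntimeError(f"terraform plan failed in {directory}")
    # Exit code 0 is an empty diff; exit code 2 means changes are pending.
    return result.returncode == 0


def dirty_domains(domains, run=subprocess.run):
    """List the domains whose plan still shows changes."""
    return [d for d in domains if not plan_is_clean(d, run=run)]


if __name__ == "__main__":
    pending = dirty_domains(DOMAINS)
    print("pending changes in:", pending if pending else "none")
```

The `run` parameter exists only so the check can be exercised without a real Terraform installation; an empty list from `dirty_domains` would be the signal to move on to the next phase.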
@@ -0,0 +1,69 @@
1
+ meta:
2
+ id: multi-provider-config
3
+ level: 3
4
+ course: terraform-infrastructure-setup
5
+ type: output
6
+ description: "Configure multi-provider setups — manage multi-region, multi-account, and multi-cloud deployments with provider aliases and assume_role"
7
+ tags: [Terraform, providers, multi-region, multi-account, cross-account, advanced]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your organization needs infrastructure across multiple AWS accounts
13
+ and regions:
14
+
15
+ ```
16
+ Production Account (111111111111) - us-east-1
17
+ Staging Account (222222222222) - us-east-1
18
+ DR Account (111111111111) - us-west-2
19
+ Shared Services (333333333333) - us-east-1
20
+ ```
21
+
22
+ Your Terraform configuration:
23
+
24
+ ```hcl
25
+ provider "aws" {
26
+ region = "us-east-1"
27
+ }
28
+
29
+ provider "aws" {
30
+ alias = "dr"
31
+ region = "us-west-2"
32
+ }
33
+
34
+ provider "aws" {
35
+ alias = "staging"
36
+ region = "us-east-1"
37
+ assume_role {
38
+ role_arn = "arn:aws:iam::222222222222:role/TerraformRole"
39
+ }
40
+ }
41
+ ```
42
+
43
+ Error when deploying to staging:
44
+ ```
45
+ Error: error configuring Terraform AWS Provider: IAM Role
46
+ (arn:aws:iam::222222222222:role/TerraformRole) cannot be assumed.
47
+
48
+ There are a number of possible causes:
49
+ - The credentials used do not have permission to assume the role
50
+ - The role's trust policy does not allow the current identity
51
+ ```
52
+
53
+ Task: Explain multi-provider configuration, assume_role for
54
+ cross-account access, passing providers to modules, provider
55
+ configuration best practices, and debugging cross-account issues.
56
+
57
+ assertions:
58
+ - type: llm_judge
59
+ criteria: "Multi-provider setup is explained — provider aliases allow multiple configurations of the same provider. Default provider (no alias) used when provider isn't specified on a resource. Aliased providers: specify with provider = aws.dr on each resource. assume_role: Terraform assumes an IAM role in another account. Requirements: (1) trust policy on target role must allow the source account/role, (2) source must have sts:AssumeRole permission, (3) external_id for additional security. The error: trust policy or permissions issue — check both sides"
60
+ weight: 0.35
61
+ description: "Multi-provider setup"
62
+ - type: llm_judge
63
+ criteria: "Provider passing to modules is covered — modules don't inherit provider aliases automatically. Pass explicitly: module 'dr_vpc' { source = './modules/vpc', providers = { aws = aws.dr } }. Module must declare required providers: terraform { required_providers { aws = { source = 'hashicorp/aws' } } }. For modules needing multiple providers: providers = { aws = aws, aws.secondary = aws.dr }. Anti-pattern: configuring providers inside modules — always configure in root and pass down"
64
+ weight: 0.35
65
+ description: "Module providers"
66
+ - type: llm_judge
67
+ criteria: "Cross-account debugging is practical — debugging assume_role: (1) verify trust policy on target role allows the source identity, (2) verify source has sts:AssumeRole permission, (3) check for external_id requirement, (4) test manually: aws sts assume-role --role-arn ... (5) enable TF_LOG=DEBUG to see the exact API call. IAM role trust policy must include the specific ARN (account, user, or role). Session duration: default 1 hour, can increase with duration_seconds. MFA: if required, must be handled outside Terraform. Best practice: use separate state files per account for blast radius isolation"
68
+ weight: 0.30
69
+ description: "Cross-account debugging"
@@ -0,0 +1,57 @@
1
+ meta:
2
+ id: state-surgery
3
+ level: 3
4
+ course: terraform-infrastructure-setup
5
+ type: output
6
+ description: "Perform state surgery — use state mv, rm, pull, push for complex migrations, module extraction, and resource address changes"
7
+ tags: [Terraform, state, migration, state-mv, state-rm, advanced]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your monolithic Terraform configuration with 200 resources needs
13
+ to be split into separate modules. Current flat structure:
14
+
15
+ ```hcl
16
+ # main.tf (2000 lines)
17
+ resource "aws_vpc" "main" { ... }
18
+ resource "aws_subnet" "public" { ... }
19
+ resource "aws_instance" "web" { ... }
20
+ resource "aws_db_instance" "db" { ... }
21
+ ```
22
+
23
+ Target: split into modules/networking, modules/compute, modules/database.
24
+
25
+ Attempt 1 — Just move code into modules:
26
+ ```
27
+ $ terraform plan
28
+ # aws_vpc.main will be destroyed
29
+ # module.networking.aws_vpc.main will be created
30
+ # aws_instance.web will be destroyed
31
+ # module.compute.aws_instance.web will be created
32
+ # aws_db_instance.db will be destroyed (!!!)
33
+ # module.database.aws_db_instance.db will be created
34
+ Plan: 6 to add, 0 to change, 6 to destroy.
35
+ ```
36
+
37
+ All resources will be destroyed and recreated — unacceptable for
38
+ production! The database would be lost.
39
+
40
+ Task: Explain state surgery operations (mv, rm, pull, push),
41
+ how to migrate resources between modules without recreation,
42
+ moved blocks (Terraform 1.1+), state backup best practices,
43
+ and complex migration strategies.
44
+
45
+ assertions:
46
+ - type: llm_judge
47
+ criteria: "State mv migration is explained — terraform state mv moves a resource from one address to another in state without modifying infrastructure. To migrate to modules: terraform state mv aws_vpc.main module.networking.aws_vpc.main, terraform state mv aws_instance.web module.compute.aws_instance.web, etc. After all moves: terraform plan should show no changes. Always backup state first: terraform state pull > backup.tfstate. State mv is atomic per resource — if interrupted, some resources moved, others not. Plan carefully and script the moves"
48
+ weight: 0.35
49
+ description: "State mv"
50
+ - type: llm_judge
51
+ criteria: "Moved blocks are covered as the modern alternative — moved { from = aws_vpc.main, to = module.networking.aws_vpc.main }. Benefits over state mv: (1) declarative and code-reviewable, (2) handled during plan/apply, (3) no manual state manipulation, (4) works across plan/apply workflow. Multiple moved blocks can coexist. Moved blocks can be removed after a successful apply, once no other state still depends on them. Supports: resource address changes, module refactoring, count to for_each migration. Terraform 1.1+ required. Preferred over state mv for most migrations"
52
+ weight: 0.35
53
+ description: "Moved blocks"
54
+ - type: llm_judge
55
+ criteria: "State rm and complex operations are practical — terraform state rm: removes resource from state without destroying it. Use when: (1) resource should no longer be managed by Terraform, (2) moving resource to different state file, (3) removing accidentally imported resource. terraform state pull/push: download/upload entire state file. Use for: manual state repair, migrating between backends, debugging. Complex migration: for splitting state files, (1) state pull, (2) manipulate JSON, (3) state push to new backend. Always: backup before surgery, verify with plan after, use -dry-run where available"
56
+ weight: 0.30
57
+ description: "Complex operations"
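The advice above to "plan carefully and script the moves" might look like the sketch below. It is an illustration only: the `MOVES` mapping and the `state_mv_commands` helper are hypothetical, and the addresses are the ones shown in the scenario's plan output. The script only prints commands, starting with the state backup, so the list can be reviewed before anything touches state; `-dry-run` is the preview flag of `terraform state mv`.

```python
import shlex

# Hypothetical mapping from flat resource addresses to target modules.
MOVES = {
    "aws_vpc.main": "networking",
    "aws_subnet.public": "networking",
    "aws_instance.web": "compute",
}


def state_mv_commands(moves, dry_run=True):
    """Build one `terraform state mv` command line per resource."""
    commands = []
    for source, module in moves.items():
        # The resource keeps its type and name; only the module prefix changes.
        target = f"module.{module}.{source}"
        args = ["terraform", "state", "mv"]
        if dry_run:
            args.append("-dry-run")
        args += [source, target]
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    # Back up first, as the criteria require, then list the moves.
    print("terraform state pull > backup.tfstate")
    for line in state_mv_commands(MOVES):
        print(line)
```

Generating the commands from a single mapping keeps the move list reviewable in one place, and rerunning with `dry_run=False` produces the real commands once the preview looks right.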
@@ -0,0 +1,59 @@
1
+ meta:
2
+ id: terraform-cloud-enterprise
3
+ level: 3
4
+ course: terraform-infrastructure-setup
5
+ type: output
6
+ description: "Use Terraform Cloud/Enterprise — configure remote execution, VCS integration, workspace management, and Sentinel policies"
7
+ tags: [Terraform, Terraform-Cloud, Enterprise, remote-execution, Sentinel, advanced]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your team is migrating from local Terraform execution to Terraform
13
+ Cloud. Current pain points:
14
+ - Engineers run terraform from laptops with different provider versions
15
+ - No audit trail of who applied what
16
+ - State files stored in S3 with overly permissive access
17
+ - No policy enforcement (anyone can create m5.24xlarge instances)
18
+
19
+ Migration configuration:
20
+
21
+ ```hcl
22
+ terraform {
23
+ cloud {
24
+ organization = "acme-corp"
25
+ workspaces {
26
+ name = "production"
27
+ }
28
+ }
29
+ }
30
+ ```
31
+
32
+ After migration:
33
+ ```
34
+ $ terraform plan
35
+
36
+ Running plan in Terraform Cloud. Output will stream here.
37
+
38
+ Error: Terraform Cloud returned an unexpected error
39
+ UNAUTHORIZED: You are not authorized to perform this action.
40
+ ```
41
+
42
+ Task: Explain Terraform Cloud features (remote execution, VCS
43
+ integration, workspace management), Sentinel policies for
44
+ governance, migration from local/S3 to Terraform Cloud, and
45
+ when to use Cloud vs Enterprise vs self-hosted.
46
+
47
+ assertions:
48
+ - type: llm_judge
49
+ criteria: "Terraform Cloud features are explained — remote execution: plan and apply run on Terraform Cloud's infrastructure (consistent environment, no laptop dependencies). VCS integration: connect to GitHub/GitLab, automatic plans on PRs, apply on merge. Workspace management: each workspace has its own state, variables, and permissions. Variable sets: share variables across workspaces. Run triggers: chain workspaces (VPC workspace triggers EKS workspace). The auth error: need to run terraform login first, or set TF_TOKEN_app_terraform_io environment variable. Team permissions control who can plan vs apply"
50
+ weight: 0.35
51
+ description: "Cloud features"
52
+ - type: llm_judge
53
+ criteria: "Sentinel policies are covered — Sentinel: policy-as-code framework for governance. Policy sets: attach to workspaces. Enforcement levels: advisory (warn), soft-mandatory (override with approval), hard-mandatory (no override). Example policies: restrict instance types (no m5.24xlarge), require tags on all resources, enforce encryption, restrict regions. Policy workflow: plan → cost estimation → Sentinel check → apply. Policies written in Sentinel language (not HCL). OPA (Open Policy Agent) also supported as alternative"
54
+ weight: 0.35
55
+ description: "Sentinel policies"
56
+ - type: llm_judge
57
+ criteria: "Migration and comparison are practical — migration from S3: (1) add cloud block to config, (2) terraform login, (3) terraform init to migrate state. Cloud vs Enterprise vs self-hosted: Cloud (SaaS, free tier available, easiest setup), Enterprise (self-hosted, air-gapped support, custom agents), self-hosted agents with Cloud (hybrid — control plane in Cloud, execution on your infrastructure). When Cloud: most teams. When Enterprise: regulatory requirements for air-gapped, very large scale, custom integrations. Cost: Cloud free for small teams, Enterprise starts at $70K+/year"
58
+ weight: 0.30
59
+ description: "Migration and comparison"
@@ -0,0 +1,51 @@
1
+ meta:
2
+ id: terraform-debugging
3
+ level: 3
4
+ course: terraform-infrastructure-setup
5
+ type: output
6
+ description: "Debug Terraform with TF_LOG — use log levels, provider-specific debugging, crash logs, and systematic troubleshooting for complex failures"
7
+ tags: [Terraform, debugging, TF_LOG, crash-logs, troubleshooting, advanced]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ A terraform apply fails with a cryptic error that gives no useful
13
+ information:
14
+
15
+ ```
16
+ Error: error creating ECS Service (my-service): InvalidParameterException:
17
+ Unable to assume the provided role.
18
+
19
+ with aws_ecs_service.web,
20
+ on ecs.tf line 15, in resource "aws_ecs_service" "web":
21
+ 15: resource "aws_ecs_service" "web" {
22
+ ```
23
+
24
+ The IAM role exists and looks correct. You need to dig deeper.
25
+
26
+ You also encounter a Terraform crash:
27
+ ```
28
+ !!!!!!!!!!!!!!!!!!!!!!!!!!! TERRAFORM CRASH !!!!!!!!!!!!!!!!!!!!!!!!!
29
+
30
+ Terraform crashed! This is always indicative of a bug within
31
+ Terraform or a provider. Crash log saved to: crash.log
32
+ ```
33
+
34
+ Task: Explain Terraform debugging techniques, TF_LOG levels and
35
+ environment variables, provider-specific debugging, crash log
36
+ analysis, and systematic troubleshooting methodology for complex
37
+ infrastructure failures.
38
+
39
+ assertions:
40
+ - type: llm_judge
41
+ criteria: "TF_LOG debugging is explained — levels (most to least verbose): TRACE, DEBUG, INFO, WARN, ERROR. Set: TF_LOG=DEBUG terraform apply. Save to file: TF_LOG_PATH=./debug.log. Component-specific: TF_LOG_CORE=WARN TF_LOG_PROVIDER=DEBUG (provider operations verbose, core quiet). The ECS error: TF_LOG=DEBUG reveals the actual API request/response — likely IAM role trust policy doesn't include ecs.amazonaws.com, or there's an IAM propagation delay. DEBUG shows: HTTP requests, API responses, retry attempts, timing. TRACE shows everything including internal state operations"
42
+ weight: 0.35
43
+ description: "TF_LOG debugging"
44
+ - type: llm_judge
45
+ criteria: "Crash logs and provider debugging are covered — crash log: contains Go stack trace, panic message, provider version. Report to: provider GitHub issues if provider crash, Terraform core GitHub if core crash. Include: Terraform version, provider versions, sanitized config, crash.log. Provider debugging: check provider changelog for known bugs, try upgrading/downgrading provider version, reproduce with minimal configuration. AWS-specific: decode authorization failure messages with aws sts decode-authorization-message. Eventual consistency: IAM changes can take seconds to propagate — add depends_on or retry"
46
+ weight: 0.35
47
+ description: "Crash and provider"
48
+ - type: llm_judge
49
+ criteria: "Systematic troubleshooting is practical — methodology: (1) read the error message carefully (resource, file, line), (2) check provider documentation for the resource, (3) enable TF_LOG=DEBUG and search for the actual API error, (4) reproduce with minimal configuration (isolate the issue), (5) check for known issues on provider GitHub, (6) verify cloud-side (correct permissions, quotas, resource limits). Common hidden causes: IAM propagation delay, API rate limiting (429 errors hidden in retries), eventual consistency, stale provider cache (terraform init -upgrade). terraform plan -refresh-only to verify state matches reality"
50
+ weight: 0.30
51
+ description: "Troubleshooting method"
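The logging variables named in the TF_LOG criteria above can be set for a single run without changing the shell environment. The sketch below is one way to do that, with assumptions stated: `terraform_debug_env` is a hypothetical helper, and the default values simply mirror the ones quoted in the criteria (`TF_LOG_CORE=WARN`, `TF_LOG_PROVIDER=DEBUG`, `TF_LOG_PATH=./debug.log`).

```python
import os
import subprocess

# Log levels as listed in the criteria, most to least verbose.
LEVELS = ("TRACE", "DEBUG", "INFO", "WARN", "ERROR")


def terraform_debug_env(core="WARN", provider="DEBUG",
                        log_path="./debug.log", base=None):
    """Return a copy of the environment with Terraform logging variables set."""
    for level in (core, provider):
        if level not in LEVELS:
            raise ValueError(f"unknown log level: {level}")
    env = dict(os.environ if base is None else base)
    env["TF_LOG_CORE"] = core          # keep Terraform core quiet
    env["TF_LOG_PROVIDER"] = provider  # make provider API calls verbose
    env["TF_LOG_PATH"] = log_path      # write logs to a file, not the terminal
    return env


if __name__ == "__main__":
    # Runs the failing apply with the logging variables set for this process only.
    subprocess.run(["terraform", "apply"], env=terraform_debug_env())
```

After the run, searching the log file for the failing resource name is usually the quickest way to find the underlying API request and response.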
@@ -0,0 +1,51 @@
1
+ meta:
2
+ id: blast-radius-management
3
+ level: 4
4
+ course: terraform-infrastructure-setup
5
+ type: output
6
+ description: "Manage Terraform blast radius — design state boundaries, implement approval workflows, and prevent large-scale outages from single changes"
7
+ tags: [Terraform, blast-radius, state-separation, approvals, risk, expert]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ A single terraform apply destroyed your production database, two
13
+ load balancers, and a VPN connection. Root cause: all 300 resources
14
+ were in one state file. The engineer intended to modify a CloudWatch
15
+ alarm but a provider upgrade changed the behavior of unrelated
16
+ resources.
17
+
18
+ Impact:
19
+ - 4 hours of downtime
20
+ - Database restored from backup (30 minutes of data loss)
21
+ - Post-mortem found: blast radius = 300 resources per apply
22
+ - Board asked: "How do we prevent this from happening again?"
23
+
24
+ Current state architecture:
25
+ ```
26
+ Single state: 300 resources
27
+ - VPC, subnets, NAT gateways
28
+ - RDS, ElastiCache
29
+ - ECS services, ALBs
30
+ - CloudWatch, SNS, SQS
31
+ - IAM roles, policies
32
+ - S3 buckets, CloudFront
33
+ ```
34
+
35
+ Task: Design the blast radius management strategy covering: state
36
+ file boundaries, change classification (risk levels), approval
37
+ workflows, provider upgrade safety, and recovery procedures.
38
+
39
+ assertions:
40
+ - type: llm_judge
41
+ criteria: "State boundaries reduce blast radius — split 300 resources into isolated state files: foundation (VPC, subnets, NAT — rarely changes, ~20 resources), database (RDS, ElastiCache — critical, ~10 resources), compute (ECS, ALB — frequently changes, ~50 resources), messaging (SQS, SNS — moderate, ~30 resources), monitoring (CloudWatch, alarms — frequent, ~40 resources), IAM (roles, policies — sensitive, ~30 resources), CDN (CloudFront, S3 — moderate, ~20 resources). Each state file limits the blast radius. Maximum 50-80 resources per state. Cross-state references via terraform_remote_state"
42
+ weight: 0.35
43
+ description: "State boundaries"
44
+ - type: llm_judge
45
+ criteria: "Change classification and approvals are defined — risk levels: Low (monitoring, tags, non-destructive updates — auto-approve in CI), Medium (security group changes, scaling modifications — 1 approval), High (database changes, network topology, IAM — 2 approvals + change window), Critical (provider upgrades, state operations, foundation changes — team lead + SRE approval). Implement via: Terraform Cloud workspace-level permissions, GitHub environment protection rules, or Atlantis apply requirements. Provider upgrades: pin exact versions, upgrade in dev first, review changelog for breaking changes, upgrade one state file at a time"
46
+ weight: 0.35
47
+ description: "Classification and approvals"
48
+ - type: llm_judge
49
+ criteria: "Recovery procedures are practical — immediate response: (1) don't run terraform apply again, (2) assess damage scope from state and CloudTrail, (3) restore from backups (RDS snapshots, S3 versioning). Recovery: (1) if resources destroyed but state intact: terraform apply recreates them, (2) if state corrupted: restore from S3 versioned state backup. Prevention: prevent_destroy on databases and critical resources, separate state files limit collateral damage, terraform plan -detailed-exitcode in CI flags any pending change (exit code 2) and a check of the plan JSON for delete actions catches unexpected destroys, plan output review required before apply. Provider upgrades: test in isolated environment first, upgrade one service domain at a time, maintain rollback plan (pin to previous version)"
50
+ weight: 0.30
51
+ description: "Recovery"
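A minimal sketch of the unexpected-destroy check named in the recovery criteria, assuming the plan has already been exported with terraform show -json. It relies on the resource_changes[].change.actions field of Terraform's JSON plan representation; the function name planned_destroys and the sample addresses are hypothetical and only illustrate the idea.

```python
def planned_destroys(plan: dict) -> list[str]:
    """Return addresses of resources whose planned actions include a delete.

    Replacements show up as ["delete", "create"] or ["create", "delete"],
    so they are reported as well.
    """
    destroyed = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if "delete" in actions:
            destroyed.append(rc["address"])
    return destroyed


# Hypothetical, trimmed-down plan document (not real tool output).
sample_plan = {
    "resource_changes": [
        {"address": "aws_cloudwatch_metric_alarm.cpu", "change": {"actions": ["update"]}},
        {"address": "aws_db_instance.main", "change": {"actions": ["delete", "create"]}},
        {"address": "aws_lb.public", "change": {"actions": ["delete"]}},
        {"address": "aws_sns_topic.alerts", "change": {"actions": ["no-op"]}},
    ]
}

for address in planned_destroys(sample_plan):
    print(f"DESTROY planned: {address}")
```

In CI, a non-empty result could fail the job or require an extra approval; which of the two is a policy choice this sketch does not make.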
@@ -0,0 +1,50 @@
1
+ meta:
2
+ id: cicd-pipeline-design
3
+ level: 4
4
+ course: terraform-infrastructure-setup
5
+ type: output
6
+ description: "Design CI/CD pipelines for Terraform — implement GitOps workflows with Atlantis, GitHub Actions, or Terraform Cloud for safe infrastructure deployment"
7
+ tags: [Terraform, CI/CD, GitOps, Atlantis, GitHub-Actions, expert]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your team deploys Terraform from individual laptops. Last month:
13
+ - An engineer applied to production instead of staging (wrong workspace)
14
+ - Two engineers ran apply simultaneously, causing state corruption
15
+ - An apply failed halfway but no one noticed for 3 hours
16
+ - No record of who deployed what or when
17
+
18
+ You need to design a CI/CD pipeline for Terraform that prevents
19
+ all of these issues. Options on the table:
20
+
21
+ 1. GitHub Actions with custom workflow
22
+ 2. Atlantis (pull request automation)
23
+ 3. Terraform Cloud/Enterprise
24
+ 4. Spacelift
25
+
26
+ Requirements:
27
+ - Plan on every PR
28
+ - Apply only after approval and merge
29
+ - Environment protection (can't accidentally apply to prod)
30
+ - Cost estimation before apply
31
+ - Security scanning (tfsec/checkov)
32
+ - Slack notifications for plan/apply results
33
+
34
+ Task: Design the CI/CD pipeline for Terraform, compare the tool
35
+ options, show a complete workflow from code change to production
36
+ deployment, and address security considerations.
37
+
38
+ assertions:
39
+ - type: llm_judge
40
+ criteria: "Complete pipeline workflow is designed — code change → PR opened → automated pipeline: (1) terraform fmt -check (formatting), (2) terraform validate (syntax), (3) tfsec/checkov scan (security), (4) terraform plan (preview changes), (5) Infracost estimate (cost), (6) post results as PR comment. On merge to main: (7) terraform plan again (detect drift since PR), (8) approval gate (manual for prod), (9) terraform apply, (10) post-apply verification, (11) Slack notification. Environment promotion: dev auto-apply, staging auto-apply, prod manual approval"
41
+ weight: 0.35
42
+ description: "Pipeline workflow"
43
+ - type: llm_judge
44
+ criteria: "Tool comparison is practical — Atlantis: open-source, PR automation, self-hosted, lightweight. Best for: teams wanting simple PR-based workflow. GitHub Actions: flexible, native GitHub integration, custom workflows. Best for: teams already on GitHub wanting full control. Terraform Cloud: managed service, built-in Sentinel, cost estimation, team management. Best for: organizations wanting managed solution. Spacelift: multi-tool support, advanced policies, drift detection. Best for: enterprises with complex requirements. Recommendation depends on: team size, budget, compliance needs, multi-tool requirements"
45
+ weight: 0.35
46
+ description: "Tool comparison"
47
+ - type: llm_judge
48
+ criteria: "Security considerations are covered — credentials: use OIDC for cloud authentication (no static keys in CI). GitHub Actions: aws-actions/configure-aws-credentials with OIDC. State access: CI role has minimal permissions (plan role vs apply role). Secrets: never echo credentials, use GitHub encrypted secrets or Terraform Cloud variables. Branch protection: require PR reviews, no direct pushes to main. Environment protection: GitHub environments with required reviewers for prod. Audit: log all plan/apply with outputs. Network: CI runner in private network if accessing private resources"
49
+ weight: 0.30
50
+ description: "Security"
@@ -0,0 +1,46 @@
1
+ meta:
2
+ id: compliance-as-code
3
+ level: 4
4
+ course: terraform-infrastructure-setup
5
+ type: output
6
+ description: "Implement compliance as code — enforce SOC2, HIPAA, and PCI-DSS requirements through Terraform policies, scanning, and automated remediation"
7
+ tags: [Terraform, compliance, SOC2, HIPAA, PCI-DSS, policy-as-code, expert]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your healthtech company needs SOC2 Type II and HIPAA compliance.
13
+ The compliance auditor found these Terraform-managed infrastructure
14
+ gaps:
15
+
16
+ 1. S3 buckets without encryption at rest (5 of 30 buckets)
17
+ 2. Security groups allowing 0.0.0.0/0 ingress on non-HTTP ports
18
+ 3. RDS instances without encryption or automated backups
19
+ 4. CloudTrail not enabled in all regions
20
+ 5. No log retention policy (CloudWatch logs kept indefinitely)
21
+ 6. IAM users with programmatic access keys older than 90 days
22
+ 7. EBS volumes not encrypted by default
23
+
24
+ The auditor needs evidence that:
25
+ - These controls are enforced automatically (not just documented)
26
+ - Non-compliant resources cannot be deployed
27
+ - Continuous monitoring detects and alerts on compliance drift
28
+
29
+ Task: Design the compliance-as-code strategy covering: policy
30
+ enforcement (prevent non-compliant deployments), automated scanning,
31
+ remediation patterns, audit evidence generation, and continuous
32
+ compliance monitoring.
33
+
34
+ assertions:
35
+ - type: llm_judge
36
+ criteria: "Policy enforcement prevents non-compliant deployments — pre-deployment: checkov/tfsec in CI catches violations before apply. Sentinel policies (Terraform Cloud): hard-mandatory rules that block non-compliant applies. Example policies: all S3 buckets must have server_side_encryption_configuration, all RDS instances must have storage_encrypted = true and backup_retention_period >= 7, security groups cannot have cidr_blocks = ['0.0.0.0/0'] except on ports 80 and 443. Module library: compliant-by-default modules that enforce encryption, logging, and access controls"
37
+ weight: 0.35
38
+ description: "Policy enforcement"
39
+ - type: llm_judge
40
+ criteria: "Remediation and audit evidence are covered — remediation: (1) update Terraform modules to include compliance requirements by default (encryption, backups, logging), (2) apply changes across all environments using shared module updates, (3) for existing non-compliant resources: plan and apply to add encryption/backups. Audit evidence: (1) Terraform Cloud audit logs showing who approved what, (2) Git history showing code reviews for all infrastructure changes, (3) checkov reports stored as CI artifacts, (4) AWS Config compliance dashboard. Compliance as code = evidence generated automatically from deployment pipeline"
41
+ weight: 0.35
42
+ description: "Remediation and audit"
43
+ - type: llm_judge
44
+ criteria: "Continuous monitoring is practical — AWS Config rules: detect non-compliant resources in near real time as configurations change (encrypted-volumes, s3-bucket-server-side-encryption-enabled, rds-storage-encrypted). Config remediation: auto-remediate with SSM Automation (e.g., enable encryption on new unencrypted volumes). Terraform Cloud drift detection: scheduled plans detect unauthorized changes. Alert pipeline: Config finding → SNS → Lambda → Slack/PagerDuty. Quarterly compliance review: run full checkov scan, compare against SOC2/HIPAA control matrix, generate compliance report. Map each Terraform policy to specific compliance control (SOC2 CC6.1, HIPAA §164.312(a)(2)(iv))"
45
+ weight: 0.30
46
+ description: "Continuous monitoring"
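The RDS rule in the policy-enforcement criteria (storage_encrypted = true and backup_retention_period >= 7) can be sketched as a pre-apply check over the JSON plan. This is an illustrative stand-in, not how checkov or Sentinel implement it; the function name rds_violations and the sample resources are hypothetical.

```python
def rds_violations(plan: dict, min_backup_days: int = 7) -> list[str]:
    """List rule violations for aws_db_instance resources in a JSON plan."""
    findings = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_db_instance":
            continue
        after = rc.get("change", {}).get("after")
        if after is None:
            # The resource is being deleted, so there is nothing to enforce.
            continue
        address = rc["address"]
        if after.get("storage_encrypted") is not True:
            findings.append(f"{address}: storage_encrypted must be true")
        retention = after.get("backup_retention_period") or 0
        if retention < min_backup_days:
            findings.append(
                f"{address}: backup_retention_period {retention} < {min_backup_days}"
            )
    return findings


# Hypothetical plan excerpt for illustration.
sample_plan = {
    "resource_changes": [
        {
            "address": "aws_db_instance.patients",
            "type": "aws_db_instance",
            "change": {"after": {"storage_encrypted": False, "backup_retention_period": 0}},
        },
        {
            "address": "aws_db_instance.billing",
            "type": "aws_db_instance",
            "change": {"after": {"storage_encrypted": True, "backup_retention_period": 14}},
        },
        {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket", "change": {"after": {}}},
    ]
}

for finding in rds_violations(sample_plan):
    print(finding)
```

Values that are only known after apply are omitted from the plan's after object, so a check like this treats them as missing and errs on the side of reporting.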
@@ -0,0 +1,42 @@
1
+ meta:
2
+ id: cost-estimation-governance
3
+ level: 4
4
+ course: terraform-infrastructure-setup
5
+ type: output
6
+ description: "Implement cost governance with Terraform — integrate Infracost for pre-deployment estimation, set budget alerts, and enforce cost policies"
7
+ tags: [Terraform, cost, Infracost, FinOps, governance, budgets, expert]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your monthly AWS bill is $150K and growing 15% month-over-month.
13
+ Nobody knows the cost impact of Terraform changes until the bill
14
+ arrives. Recent surprises:
15
+
16
+ - Engineer created a NAT Gateway in 4 AZs ($576/month) when 1 was
17
+ sufficient ($144/month)
18
+ - A for_each over 50 items created 50 CloudWatch dashboards at
19
+ $3/each ($150/month) — nobody realized
20
+ - An RDS upgrade from db.r5.large to db.r5.4xlarge increased cost
21
+ from $260/month to $2,080/month
22
+ - Dev environment running same instance types as production ($8K/month
23
+ wasted)
24
+
25
+ Task: Design the cost governance strategy covering: Infracost
26
+ integration in CI/CD, policy-based cost controls, environment-specific
27
+ sizing, tagging for cost allocation, and ongoing cost optimization
28
+ practices.
29
+
30
+ assertions:
31
+ - type: llm_judge
32
+ criteria: "Infracost integration is designed — Infracost estimates cost changes in PRs before deployment. CI integration: infracost breakdown --path . shows total monthly cost, infracost diff shows cost change from PR. PR comment: shows cost increase/decrease per resource. Setup: install Infracost in CI, generate plan JSON (terraform plan -out=plan.tfplan && terraform show -json plan.tfplan > plan.json), run infracost diff --path plan.json. Thresholds: alert if monthly cost increase > $100, block if > $500 (configurable). Free tier available for open source and small teams"
33
+ weight: 0.35
34
+ description: "Infracost"
35
+ - type: llm_judge
36
+ criteria: "Policy-based cost controls are implemented — Sentinel/OPA policies: restrict expensive instance types (no db.r5.4xlarge in non-prod), limit resource counts (max 3 NAT Gateways), require cost justification for changes over threshold. Variable validation: variable 'instance_type' { validation { condition = !contains(['r5.4xlarge','r5.8xlarge'], var.instance_type) || var.environment == 'prod' } } (cross-variable references in validation require Terraform 1.9+; a real block also needs double-quoted strings and an error_message). Environment sizing: locals { env_sizing = { dev = 't3.small', staging = 't3.medium', prod = 't3.large' } }. Enforce via policy: dev instances must be t3.small or smaller"
37
+ weight: 0.35
38
+ description: "Cost policies"
39
+ - type: llm_judge
40
+ criteria: "Tagging and optimization are practical — mandatory tags for cost allocation: Team, Environment, CostCenter, Service. AWS Cost Explorer uses tags for breakdown. Enforce tags via Sentinel policy or AWS SCP (deny resource creation without required tags). Cost optimization: (1) right-size instances (use AWS Compute Optimizer data), (2) Reserved Instances or Savings Plans for steady-state, (3) spot instances for non-critical workloads, (4) auto-shutdown dev environments off-hours (Lambda + EventBridge, formerly CloudWatch Events). Monthly cost review: compare actual vs Infracost estimates, identify optimization opportunities"
41
+ weight: 0.30
42
+ description: "Tagging and optimization"
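The alert/block thresholds from the Infracost criteria ($100 and $500 per month) can be sketched as a small gate function. This assumes the monthly cost delta has already been extracted from the Infracost output; the name cost_gate and its return labels are hypothetical.

```python
def cost_gate(monthly_increase_usd: float,
              alert_over: float = 100.0,
              block_over: float = 500.0) -> str:
    """Classify a monthly cost increase as "pass", "alert", or "block"."""
    if monthly_increase_usd > block_over:
        return "block"
    if monthly_increase_usd > alert_over:
        return "alert"
    return "pass"


# Deltas taken from the surprises listed in the scenario trigger.
examples = {
    "NAT Gateways, 4 AZs instead of 1": 576 - 144,
    "50 CloudWatch dashboards": 150,
    "RDS db.r5.large to db.r5.4xlarge": 2080 - 260,
}

for change, delta in examples.items():
    print(f"{change}: +${delta}/month -> {cost_gate(delta)}")
```

With these defaults the RDS upgrade would be blocked, while the NAT Gateway and dashboard changes would only raise an alert, which suggests that the thresholds may need tuning per team.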
@@ -0,0 +1,51 @@
1
+ meta:
2
+ id: expert-debugging-shift
3
+ level: 4
4
+ course: terraform-infrastructure-setup
5
+ type: output
6
+ description: "Combined expert shift — advise on organizational IaC strategy while handling CI/CD pipeline failures and compliance audit findings"
7
+ tags: [Terraform, troubleshooting, combined, shift-simulation, expert]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ As infrastructure lead, you face three organizational challenges:
13
+
14
+ Challenge 1 — CI/CD pipeline reliability:
15
+ Your Atlantis-based pipeline has been flaky:
16
+ - 20% of plans time out (state lock contention with 8 teams)
17
+ - Plans show different results on retry (eventual consistency)
18
+ - Apply fails intermittently (API rate limiting)
19
+ Teams are losing trust and starting to apply from laptops again.
20
+
21
+ Challenge 2 — Compliance audit preparation:
22
+ SOC2 auditor arrives in 6 weeks. They need to see:
23
+ - Evidence that all infrastructure changes go through code review
24
+ - Proof that production access is restricted
25
+ - Encryption enforcement across all resources
26
+ - Automated security scanning results
27
+
28
+ Challenge 3 — Cost optimization mandate:
29
+ CFO mandates 25% cost reduction ($37.5K/month savings from $150K).
30
+ Current waste identified:
31
+ - Dev environments running 24/7 ($20K/month)
32
+ - Oversized RDS instances ($15K/month excess)
33
+ - Unused EBS volumes and snapshots ($8K/month)
34
+ - NAT Gateway in all AZs for non-prod ($5K/month excess)
35
+
36
+ Task: Address all three challenges with actionable plans and
37
+ timelines.
38
+
39
+ assertions:
40
+ - type: llm_judge
41
+ criteria: "CI/CD pipeline reliability is addressed — state lock contention: split monolith states into per-team states (each team's plan/apply doesn't block others). Timeout: increase lock timeout (-lock-timeout=5m), investigate which team's applies are long-running. Eventual consistency: run terraform plan -refresh-only before the real plan to surface drift and confirm that state matches reality. Rate limiting: reduce parallelism (-parallelism=5), stagger team deployments. Trust recovery: show teams metrics (success rate improvement), ensure fast feedback loops. Consider: Terraform Cloud for managed execution (handles locking, retries, queueing)"
42
+ weight: 0.35
43
+ description: "Pipeline reliability"
44
+ - type: llm_judge
45
+ criteria: "Compliance preparation has a timeline — weeks 1-2: implement checkov/tfsec in all CI pipelines, generate baseline compliance reports. Weeks 2-3: remediate findings (add encryption to all S3 buckets, RDS, EBS; restrict security groups; configure CloudTrail). Weeks 3-4: implement Sentinel policies to prevent future violations. Weeks 4-5: generate audit evidence (Git logs showing all changes reviewed, CI scan reports, Terraform Cloud audit logs). Week 6: dry-run audit with compliance team. Evidence portfolio: PR review logs, automated scan reports, policy enforcement logs, access control documentation (IAM policy)"
46
+ weight: 0.35
47
+ description: "Compliance"
48
+ - type: llm_judge
49
+ criteria: "Cost optimization targets specific savings — dev environments 24/7 → schedule off-hours (Lambda + EventBridge, save $15K): terraform manages the schedule. RDS right-sizing: use Performance Insights data, downsize dev/staging instances (save $10K). EBS cleanup: terraform state list to find managed volumes, delete unattached ones, manage snapshot lifecycle (save $5K). NAT Gateway: single NAT Gateway per non-prod VPC instead of per-AZ (save $4K). Total: ~$34K savings (23%, close to 25% target). Implementation: Infracost in all PRs to prevent future waste, monthly cost review meeting, per-team cost dashboards using tags"
50
+ weight: 0.30
51
+ description: "Cost optimization"
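The savings figures in the cost-optimization criteria can be checked with a few lines of arithmetic. The numbers below are taken from the scenario itself; only the variable names are made up.

```python
monthly_bill = 150_000
target = 0.25 * monthly_bill  # CFO mandate: 25% reduction

# Savings per initiative, as stated in the criteria.
savings = {
    "dev environment scheduling": 15_000,
    "RDS right-sizing": 10_000,
    "EBS volume and snapshot cleanup": 5_000,
    "single NAT Gateway in non-prod": 4_000,
}

total = sum(savings.values())
percent = total / monthly_bill * 100
shortfall = target - total

print(f"total savings: ${total:,}/month ({percent:.1f}% of bill)")
print(f"shortfall against 25% target: ${shortfall:,.0f}/month")
```

The plan lands roughly $3.5K per month short of the mandate, so an answer that claims to meet the 25% target outright would need an additional source of savings.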
@@ -0,0 +1,45 @@
1
+ meta:
2
+ id: iac-organization-strategy
3
+ level: 4
4
+ course: terraform-infrastructure-setup
5
+ type: output
6
+ description: "Design IaC organization strategy — choose between mono-repo and multi-repo, design state architecture, and establish team ownership boundaries"
7
+ tags: [Terraform, organization, mono-repo, multi-repo, strategy, expert]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ You're the infrastructure architect for a company with 80 engineers
13
+ across 8 teams. Current state:
14
+ - 3 separate Terraform repositories with inconsistent patterns
15
+ - 15 state files with no naming convention
16
+ - No shared modules — each team copy-pastes configurations
17
+ - Teams frequently conflict when deploying overlapping resources
18
+ - No visibility into who owns what infrastructure
19
+
20
+ You need to design the Terraform organization strategy for the
21
+ company. Leadership wants:
22
+ - Clear team ownership boundaries
23
+ - Reusable modules (stop copy-paste)
24
+ - Safe deployment workflows
25
+ - Audit trail for all changes
26
+ - Cost visibility per team
27
+
28
+ Task: Design the IaC organization strategy covering: repository
29
+ structure (mono-repo vs multi-repo trade-offs), state architecture
30
+ (how to partition state files), module library design, team
31
+ ownership model, and governance policies.
32
+
33
+ assertions:
34
+ - type: llm_judge
35
+ criteria: "Repository and state architecture are designed — mono-repo: single repo with directories per team/service. Benefits: unified modules, single PR workflow, easy cross-team visibility. Challenges: large repo, team coupling, complex CI/CD. Multi-repo: separate repos per team or service domain. Benefits: team autonomy, independent versioning, isolated CI/CD. Challenges: module sharing harder, cross-repo coordination. Recommended for 8 teams: hybrid — shared modules repo + per-team repos. State architecture: partition by (1) environment (dev/staging/prod), (2) team/service domain, (3) blast radius. Naming: s3://state/<team>/<env>/<service>.tfstate"
36
+ weight: 0.35
37
+ description: "Repo and state"
38
+ - type: llm_judge
39
+ criteria: "Module library and team ownership are designed — internal module registry: centralized repo with versioned, tested modules (VPC, EKS, RDS, S3). Module standards: README, input/output documentation, examples, tests (terraform test or Terratest). Publishing: git tags for versioning, semantic versioning (major.minor.patch). Team ownership: CODEOWNERS file mapping directories to teams. Platform team owns shared modules and foundation infrastructure. Service teams own their application infrastructure. Tagging strategy: mandatory tags for team, cost-center, environment on all resources"
40
+ weight: 0.35
41
+ description: "Modules and ownership"
42
+ - type: llm_judge
43
+ criteria: "Governance and deployment are practical — deployment workflow: feature branch → PR → automated plan → code review → merge → automated apply. Policy enforcement: pre-commit hooks (fmt, validate), CI checks (tflint, tfsec, checkov), Sentinel/OPA policies in Terraform Cloud. Cost visibility: Infracost in PR comments, AWS Cost Explorer tags. Audit: Terraform Cloud audit logs or CloudTrail for API calls. Change management: production changes require 2 approvals, blast radius classification (high-risk changes need additional review). Onboarding: documentation, module catalog, self-service templates"
44
+ weight: 0.30
45
+ description: "Governance"
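The state naming convention in the criteria (<team>/<env>/<service>.tfstate) is easier to enforce when keys are generated rather than typed. A minimal sketch, assuming lowercase hyphenated segment names and the three environments named in the criteria; the helper state_key and its validation rules are hypothetical.

```python
import re

ENVIRONMENTS = ("dev", "staging", "prod")
_SEGMENT = re.compile(r"[a-z0-9][a-z0-9-]*")


def state_key(team: str, env: str, service: str) -> str:
    """Build a backend key of the form <team>/<env>/<service>.tfstate."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    for label, value in (("team", team), ("service", service)):
        if not _SEGMENT.fullmatch(value):
            raise ValueError(f"invalid {label} name: {value!r}")
    return f"{team}/{env}/{service}.tfstate"


print(state_key("payments", "prod", "api"))
```

Rejecting free-form names up front keeps the 15 existing state files from multiplying into new ad hoc patterns once teams start creating their own.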