npm - dojo.md - Versions diffs - 0.2.2 → 0.2.4 - Mend

dojo.md 0.2.2 → 0.2.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (196)

package/courses/technical-rfc-writing/scenarios/level-2/technical-design-detail.yaml ADDED

@@ -0,0 +1,42 @@
+meta:
+  id: technical-design-detail
+  level: 2
+  course: technical-rfc-writing
+  type: output
+  description: "Write detailed technical designs — create architecture diagrams, data models, and API specifications within an RFC at the right level of detail"
+  tags: [RFC, technical-design, architecture, data-model, API, intermediate]
+state: {}
+trigger: |
+  You're writing the technical design section of an RFC proposing a
+  notification system. The system needs to:
+  - Send notifications via email, SMS, and push notifications
+  - Support user notification preferences (opt-in/out per channel)
+  - Handle templating for different notification types
+  - Queue notifications for reliability and rate limiting
+  - Track delivery status (sent, delivered, failed, read)
+  Your challenge: engineers want detailed technical design, but
+  managers keep asking "do I need to read all this?" You need to
+  provide enough detail for engineers to evaluate feasibility and
+  identify issues, without making the RFC impenetrable.
+  Task: Write the technical design section including: system
+  architecture (with component descriptions), data model, API
+  design, and sequence diagrams (in text/mermaid format). Show how
+  to provide technical depth while keeping the section navigable.
+assertions:
+  - type: llm_judge
+    criteria: "Architecture is clearly described with components and interactions — components: (1) Notification Service API (receives notification requests, validates, queues). (2) Message Queue (decouples sending from requesting, enables retry). (3) Channel Dispatchers (email via SendGrid/SES, SMS via Twilio, push via FCM/APNS — each as separate workers). (4) Preference Service (reads user notification preferences, filters channels). (5) Template Engine (renders notification content from templates + variables). (6) Delivery Tracker (records status updates, provides read receipts). Data flow described: 'Service A calls Notification API with (user_id, notification_type, payload) → API checks preferences → renders templates → enqueues per channel → dispatchers process and report delivery status.' Component interactions are explicit — which component calls which, synchronous vs asynchronous boundaries clearly marked"
+    weight: 0.35
+    description: "Architecture clarity"
+  - type: llm_judge
+    criteria: "Data model and API design are specific enough to evaluate — data model includes key tables/schemas: notifications (id, user_id, type, status, created_at, channels), notification_preferences (user_id, channel, notification_type, enabled), notification_templates (type, channel, subject_template, body_template), delivery_log (notification_id, channel, status, provider_response, attempted_at). API endpoints defined: POST /notifications (send), GET /notifications/:id (status), PUT /preferences (update user prefs), with request/response shapes shown. Sequence diagram (text or mermaid) for the happy path: caller → API → preference check → template render → queue → dispatcher → provider → delivery callback. The design should be specific enough that an engineer could identify gaps ('what about batch notifications?') but not so detailed that it's pseudocode"
+    weight: 0.35
+    description: "Data model and API"
+  - type: llm_judge
+    criteria: "Technical depth is balanced — navigable structure: overview paragraph (3 sentences summarizing the architecture for skimmers), then detailed subsections with clear headings (Architecture Overview, Data Model, API Design, Sequence Flows). Each subsection starts with a one-sentence summary. Diagrams complement text rather than replacing it. Design decisions are called out: 'We use a message queue between the API and dispatchers because: synchronous sending would block the caller, retries are handled by the queue infrastructure, and we can add rate limiting per channel at the queue level.' Trade-offs acknowledged: 'Separate dispatcher workers per channel add operational complexity but allow independent scaling and deployment — email spikes (marketing campaigns) don't affect SMS delivery.' Non-obvious choices explained: why this queue technology, why these API patterns, why this data model structure. The section answers 'is this feasible?' and 'are there design flaws?' without being an implementation guide"
+    weight: 0.30
+    description: "Balanced depth"
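
Illustrative note (not part of the package diff): the data flow named in the first assertion (preference check, template render, enqueue per channel) can be sketched in a few lines. This is a minimal sketch under stated assumptions: the in-memory `PREFERENCES` and `TEMPLATES` tables, the `order_shipped` type, and the `enqueue_notification` function are hypothetical stand-ins for the Preference Service, Template Engine, and Message Queue, not names taken from the scenario.

```python
from collections import deque
from string import Template

# Hypothetical in-memory stand-ins for the Preference Service and Template Engine.
PREFERENCES = {
    ("u1", "email"): True,
    ("u1", "sms"): False,  # user opted out of SMS
    ("u1", "push"): True,
}
TEMPLATES = {
    ("order_shipped", "email"): Template("Hi $name, order $order_id has shipped."),
    ("order_shipped", "sms"): Template("Order $order_id shipped."),
    ("order_shipped", "push"): Template("Order $order_id is on its way."),
}
CHANNELS = ("email", "sms", "push")


def enqueue_notification(queue, user_id, notification_type, payload):
    """Check preferences, render templates, and enqueue one message per channel."""
    enqueued = []
    for channel in CHANNELS:
        if not PREFERENCES.get((user_id, channel), False):
            continue  # channel disabled, or no preference recorded
        template = TEMPLATES.get((notification_type, channel))
        if template is None:
            continue  # no template for this type/channel pair
        queue.append({
            "user_id": user_id,
            "channel": channel,
            "body": template.substitute(payload),
            "status": "queued",  # dispatchers would update this asynchronously
        })
        enqueued.append(channel)
    return enqueued


queue = deque()
print(enqueue_notification(queue, "u1", "order_shipped", {"name": "Ada", "order_id": "42"}))
# ['email', 'push']
```

Everything up to `queue.append` is the synchronous part of the request. Whatever consumes the queue sits on the asynchronous side of the boundary the assertion asks the RFC to mark.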

package/courses/technical-rfc-writing/scenarios/level-2/trade-off-analysis.yaml ADDED

@@ -0,0 +1,43 @@
+meta:
+  id: trade-off-analysis
+  level: 2
+  course: technical-rfc-writing
+  type: output
+  description: "Analyze trade-offs systematically — evaluate competing solutions using weighted criteria and make the reasoning transparent"
+  tags: [RFC, trade-offs, decision-matrix, evaluation, criteria, intermediate]
+state: {}
+trigger: |
+  You're writing an RFC to choose a state management solution for your
+  growing React application. The app currently uses prop drilling and
+  local useState, but as it's grown to 50+ components across 8 feature
+  areas, state management has become a major pain point.
+  Options being debated:
+  1. Redux Toolkit — mature, well-documented, large ecosystem
+  2. Zustand — lightweight, minimal boilerplate, growing community
+  3. React Context + useReducer — built-in, no dependencies, familiar
+  4. Jotai — atomic state model, fine-grained reactivity, modern
+  Different team members advocate for different solutions based on
+  personal preference. The discussions are going in circles.
+  Task: Write a trade-off analysis section that evaluates these options
+  systematically using weighted criteria. The analysis should make the
+  decision process transparent and help the team converge on a choice
+  regardless of individual preferences.
+assertions:
+  - type: llm_judge
+    criteria: "Evaluation criteria are defined and weighted before comparing options — criteria identified from project needs, not abstract quality: (1) Learning curve (weight: 25%) — team has 8 engineers with varying React experience, onboarding new hires quarterly. (2) Bundle size impact (weight: 15%) — mobile-first app, performance budget of 200KB JS. (3) Boilerplate/developer experience (weight: 20%) — team velocity matters, less ceremony preferred. (4) Scalability to 100+ components (weight: 20%) — app is growing, solution must handle complexity. (5) Community and ecosystem (weight: 20%) — long-term viability, available middleware, debugging tools. Weights are justified: 'Learning curve is weighted highest because our team composition means 3 engineers have never used external state management. An unfamiliar tool slows the entire team, not just adoption.' The criteria should emerge from the specific project context, not be a generic comparison"
+    weight: 0.35
+    description: "Weighted criteria"
+  - type: llm_judge
+    criteria: "Each option is scored fairly against criteria with evidence — scoring is specific, not 'good/bad': Redux Toolkit — learning curve: 3/5 (extensive docs but steep initial concepts: slices, thunks, selectors), bundle: 3/5 (11KB gzipped — within budget), DX: 3/5 (RTK reduces boilerplate significantly vs classic Redux but still more ceremony than alternatives), scalability: 5/5 (proven at massive scale, clear patterns), ecosystem: 5/5 (largest ecosystem, Redux DevTools, middleware). Zustand — learning curve: 5/5 (minimal API surface, hooks-based, familiar patterns), bundle: 5/5 (1KB gzipped), DX: 5/5 (minimal boilerplate, no providers), scalability: 4/5 (works well at scale but fewer established patterns for very large apps), ecosystem: 3/5 (growing but smaller than Redux). Similar depth for Context and Jotai. Scores have brief justifications. A weighted total is calculated. Sensitivity analysis: 'If we weight scalability higher (30%), Redux wins. With current weights, Zustand scores highest.' This shows the decision isn't arbitrary"
+    weight: 0.35
+    description: "Fair scoring with evidence"
+  - type: llm_judge
+    criteria: "Analysis leads to a clear recommendation with reversibility assessment — recommendation states: 'Based on weighted scoring, we recommend Zustand (weighted score: X) over Redux Toolkit (score: Y).' Why the scores matter: 'The 2-point advantage in learning curve and DX translates to faster adoption and higher team velocity in the first 3 months.' Acknowledges what's lost: 'We trade Redux's larger ecosystem and proven patterns at extreme scale for simplicity and developer experience. Given our current size (50 components) and growth trajectory, this trade-off favors Zustand.' Reversibility: 'Both Zustand and Redux use hooks-based APIs. Migration from Zustand to Redux, if needed, involves refactoring store definitions — estimated 2-3 weeks for current codebase. This makes Zustand a low-risk starting choice.' Conditions for revisiting: 'If the app exceeds 200 components or we need server-side state synchronization, re-evaluate Redux Toolkit or TanStack Query.' The analysis should feel objective enough that someone favoring Redux could accept the Zustand recommendation"
+    weight: 0.30
+    description: "Clear recommendation"
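
Illustrative note (not part of the package diff): the weighted total the second assertion asks for is a short calculation. The sketch below uses the weights and the Redux Toolkit and Zustand scores quoted in the criteria above. The dictionary keys and the `weighted_total` and `reweight` helpers are hypothetical names, and rescaling the remaining weights proportionally is only one possible way to run a sensitivity check.

```python
# Weights and 1-5 scores as quoted in the assertions; key names are illustrative.
WEIGHTS = {
    "learning_curve": 0.25,
    "bundle_size": 0.15,
    "dx": 0.20,
    "scalability": 0.20,
    "ecosystem": 0.20,
}
SCORES = {
    "Redux Toolkit": {"learning_curve": 3, "bundle_size": 3, "dx": 3, "scalability": 5, "ecosystem": 5},
    "Zustand": {"learning_curve": 5, "bundle_size": 5, "dx": 5, "scalability": 4, "ecosystem": 3},
}


def weighted_total(scores, weights):
    """Sum score * weight over all criteria; weights must add up to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return round(sum(scores[c] * w for c, w in weights.items()), 2)


def reweight(weights, criterion, new_weight):
    """Set one weight and rescale the others proportionally so the total stays 1."""
    scale = (1.0 - new_weight) / (1.0 - weights[criterion])
    return {c: (new_weight if c == criterion else w * scale) for c, w in weights.items()}


for option, scores in SCORES.items():
    print(option, weighted_total(scores, WEIGHTS))
# Redux Toolkit 3.8
# Zustand 4.4
```

Calling `weighted_total(scores, reweight(WEIGHTS, "scalability", 0.30))` for each option is one way to check a sensitivity claim against the actual scores before it goes into the RFC.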

package/courses/terraform-infrastructure-setup/scenarios/level-1/first-debugging-shift.yaml ADDED

@@ -0,0 +1,66 @@
+meta:
+  id: first-debugging-shift
+  level: 1
+  course: terraform-infrastructure-setup
+  type: output
+  description: "Combined beginner shift — handle multiple Terraform issues during your first on-call including init failures, plan errors, and state problems"
+  tags: [Terraform, troubleshooting, combined, shift-simulation, beginner]
+state: {}
+trigger: |
+  It's your first week managing Terraform infrastructure. You face
+  three issues in one day:
+  Issue 1 — New engineer can't get started:
+  ```
+  $ terraform init
+  Error: Failed to get existing workspaces: S3 bucket
+  "company-tf-state" does not exist.
+  ```
+  The engineer cloned the repo but nobody told them about the
+  backend S3 bucket setup.
+  Issue 2 — Staging plan shows unexpected destroy:
+  ```
+  $ terraform plan
+  # aws_instance.app[0] will be destroyed
+  # aws_instance.app[1] will be destroyed
+  # aws_instance.app[2] will be destroyed
+  # aws_instance.app["web-1"] will be created
+  # aws_instance.app["web-2"] will be created
+  # aws_instance.app["web-3"] will be created
+  Plan: 3 to add, 0 to change, 3 to destroy.
+  ```
+  Someone refactored from count to for_each. The 3 instances will be
+  destroyed and recreated, causing downtime.
+  Issue 3 — State lock stuck:
+  ```
+  Error: Error acquiring the state lock
+  Lock Info:
+    ID:        abc-123
+    Who:       jenkins@ci-server
+    Created:   2024-01-15 08:00:00 UTC
+  ```
+  The CI pipeline crashed mid-apply 6 hours ago and the lock is stale.
+  Task: Diagnose and fix all three issues. Explain the root causes
+  and preventive measures for each.
+assertions:
+  - type: llm_judge
+    criteria: "Issue 1 (init failure) is resolved — the S3 backend bucket must exist before terraform init. Fix: create the bucket manually (or with a separate bootstrap Terraform config), or use terraform init -backend-config=path/to/backend.hcl for flexible backend configuration. Document the bootstrap process in README. Consider: use a Makefile or script that checks prerequisites before init. For new engineers: provide onboarding docs with exact setup steps, or use Terraform Cloud to avoid S3 backend setup entirely"
+    weight: 0.35
+    description: "Init failure"
+  - type: llm_judge
+    criteria: "Issue 2 (count to for_each migration) is resolved — switching from count to for_each changes resource addresses (aws_instance.app[0] → aws_instance.app['web-1']), causing Terraform to see them as different resources. Fix: use terraform state mv to rename resources in state: terraform state mv 'aws_instance.app[0]' 'aws_instance.app[\"web-1\"]' for each instance. This avoids recreation. Alternative: use moved blocks in Terraform 1.1+: moved { from = aws_instance.app[0], to = aws_instance.app[\"web-1\"] }. Always check plan after refactoring"
+    weight: 0.35
+    description: "Count to for_each"
+  - type: llm_judge
+    criteria: "Issue 3 (stuck lock) is resolved — the CI pipeline crashed and didn't release the DynamoDB lock. Verify the lock is stale (6 hours old, CI pipeline is dead). Fix: terraform force-unlock abc-123. This is safe because the original process is dead. Prevention: CI/CD pipeline should have timeout protection, use terraform apply with -lock-timeout=5m to wait for locks, ensure CI cleanup steps release locks on failure. Warning: never force-unlock if the original process might still be running"
+    weight: 0.30
+    description: "Stuck lock"
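
Illustrative note (not part of the package diff): the `terraform state mv` fix for Issue 2 needs one command per instance, which is easy to get wrong by hand. A minimal sketch that generates them, assuming index `i` maps to the i-th key in the order shown in the plan output. The helper name `state_mv_commands` is hypothetical, and the commands are only printed, never executed.

```python
def state_mv_commands(resource, keys):
    """Build one `terraform state mv` command per instance, mapping index i to keys[i]."""
    commands = []
    for index, key in enumerate(keys):
        src = f"{resource}[{index}]"    # count-style address
        dst = f'{resource}["{key}"]'    # for_each-style address
        commands.append(f"terraform state mv '{src}' '{dst}'")
    return commands


for command in state_mv_commands("aws_instance.app", ["web-1", "web-2", "web-3"]):
    print(command)
# terraform state mv 'aws_instance.app[0]' 'aws_instance.app["web-1"]'
# terraform state mv 'aws_instance.app[1]' 'aws_instance.app["web-2"]'
# terraform state mv 'aws_instance.app[2]' 'aws_instance.app["web-3"]'
```

The index-to-key mapping has to match which real instance carries which name. Running `terraform plan` afterwards should then report no instances to add or destroy.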

package/courses/terraform-infrastructure-setup/scenarios/level-1/plan-output-reading.yaml ADDED

@@ -0,0 +1,71 @@
+meta:
+  id: plan-output-reading
+  level: 1
+  course: terraform-infrastructure-setup
+  type: output
+  description: "Read terraform plan output — interpret change symbols, understand resource actions, identify destructive changes before apply"
+  tags: [Terraform, plan, output, changes, destroy, beginner]
+state: {}
+trigger: |
+  Your terraform plan shows this output and you need to understand
+  what will happen before approving:
+  ```
+  Terraform will perform the following actions:
+    # aws_instance.web will be destroyed and re-created
+    # (because ami has changed)
+  -/+ resource "aws_instance" "web" {
+      ~ ami                    = "ami-old123" -> "ami-new456" # forces replacement
+      ~ arn                    = "arn:aws:ec2:..." -> (known after apply)
+      ~ id                     = "i-abc123" -> (known after apply)
+        instance_type          = "t3.micro"
+      ~ public_ip              = "54.1.2.3" -> (known after apply)
+      + secondary_private_ips  = (known after apply)
+      - tags                   = {} -> null
+        # (15 unchanged attributes hidden)
+      }
+    # aws_s3_bucket.data will be updated in-place
+    ~ resource "aws_s3_bucket" "data" {
+        id     = "my-data-bucket"
+      ~ tags   = {
+          + "Environment" = "prod"
+          }
+      }
+    # aws_security_group.old will be destroyed
+    - resource "aws_security_group" "old" {
+      - id   = "sg-old789" -> null
+      - name = "old-sg" -> null
+      }
+    # aws_security_group.new will be created
+    + resource "aws_security_group" "new" {
+      + id   = (known after apply)
+      + name = "new-sg"
+      }
+  Plan: 2 to add, 1 to change, 2 to destroy.
+  ```
+  Task: Explain how to read terraform plan output, what each symbol
+  means (+, -, ~, -/+), what "forces replacement" means and why it's
+  dangerous, "known after apply" values, and best practices for
+  reviewing plans before apply.
+assertions:
+  - type: llm_judge
+    criteria: "Plan symbols are explained — + (create): new resource will be created. - (destroy): existing resource will be deleted. ~ (update in-place): resource will be modified without recreation. -/+ (destroy and recreate): resource must be destroyed and recreated (replacement). +/- (create before destroy): new resource created first, then old destroyed (lifecycle create_before_destroy). Within attributes: + (added), - (removed), ~ (changed). 'forces replacement' means changing that attribute requires destroying and recreating the resource (e.g., changing AMI on EC2 instance). '(known after apply)' means the value will be determined by the cloud provider during creation"
+    weight: 0.35
+    description: "Plan symbols"
+  - type: llm_judge
+    criteria: "Dangerous changes are identified — destroy and recreate (-/+) is dangerous: causes downtime, new IP address, new resource ID. The example: changing AMI forces EC2 instance replacement — means the server goes down, gets a new public IP, loses ephemeral storage. Review all -/+ changes carefully. Mitigation: lifecycle { create_before_destroy = true } creates the new resource before destroying the old one (reduces downtime). Pure destroy (-) is also dangerous: verify you actually want to remove the resource. Summary line 'Plan: 2 to add, 1 to change, 2 to destroy' is the quick check"
+    weight: 0.35
+    description: "Dangerous changes"
+  - type: llm_judge
+    criteria: "Plan review best practices are practical — always run plan before apply. Save plan: terraform plan -out=plan.tfplan, then terraform apply plan.tfplan (prevents drift between plan and apply). In CI/CD: plan on PR, apply on merge. Review checklist: (1) any destroys? expected? (2) any replacements? understand why? (3) count of changes matches expectations? (4) no sensitive data exposed in plan output? Use terraform plan -target=resource to plan specific resources. terraform plan -detailed-exitcode: exit 0 (no changes), exit 1 (error), exit 2 (changes present) — useful in scripts"
+    weight: 0.30
+    description: "Review practices"
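
Illustrative note (not part of the package diff): the "quick check" on the summary line can be automated in a review script. A minimal sketch, assuming the human-readable summary format shown in the scenario. The function names are hypothetical, and the JSON form of a saved plan (`terraform show -json`) is likely a sturdier input than text scraping.

```python
import re

# Matches the counts in a line such as "Plan: 2 to add, 1 to change, 2 to destroy."
SUMMARY = re.compile(r"(\d+) to add, (\d+) to change, (\d+) to destroy")


def parse_plan_summary(plan_text):
    """Return the add/change/destroy counts, or None if no summary line is found."""
    match = SUMMARY.search(plan_text)
    if match is None:
        return None
    add, change, destroy = (int(n) for n in match.groups())
    return {"add": add, "change": change, "destroy": destroy}


def needs_careful_review(summary):
    """Flag any plan that destroys something, including replacements (-/+)."""
    return summary["destroy"] > 0


summary = parse_plan_summary("Plan: 2 to add, 1 to change, 2 to destroy.")
print(summary, needs_careful_review(summary))
# {'add': 2, 'change': 1, 'destroy': 2} True
```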

package/courses/terraform-infrastructure-setup/scenarios/level-1/resource-creation-failures.yaml ADDED

@@ -0,0 +1,54 @@
+meta:
+  id: resource-creation-failures
+  level: 1
+  course: terraform-infrastructure-setup
+  type: output
+  description: "Debug resource creation failures — diagnose API errors, dependency issues, naming conflicts, and partial apply states"
+  tags: [Terraform, resources, apply, errors, dependencies, beginner]
+state: {}
+trigger: |
+  You run `terraform apply` and it partially succeeds:
+  ```
+  aws_vpc.main: Creating...
+  aws_vpc.main: Creation complete after 3s [id=vpc-0abc123def456]
+  aws_subnet.public: Creating...
+  aws_subnet.public: Creation complete after 1s [id=subnet-0abc789]
+  aws_security_group.web: Creating...
+  Error: creating Security Group (web-sg): InvalidGroup.Duplicate:
+  The security group 'web-sg' already exists for VPC 'vpc-0abc123def456'
+  aws_instance.web: Creating...
+  Error: creating EC2 Instance: UnauthorizedOperation: You are not
+  authorized to perform this operation. Encoded authorization failure
+  message: ...
+  ```
+  State after partial apply:
+  - VPC: created ✓
+  - Subnet: created ✓
+  - Security Group: FAILED ✗
+  - EC2 Instance: FAILED ✗
+  Task: Explain how terraform apply works (dependency graph, parallel
+  creation, partial state), what happens when apply partially fails,
+  how to recover from partial failures, common resource creation errors
+  (duplicates, permissions, quotas), and terraform plan vs apply workflow.
+assertions:
+  - type: llm_judge
+    criteria: "Apply process is explained — terraform builds a dependency graph (DAG) and creates resources in parallel where possible. Resources with dependencies wait for their dependencies. Partial failure: successfully created resources are saved to state. Failed resources are not in state. On next apply, Terraform will try to create the failed resources again (it won't recreate already-created ones). The state file tracks what exists. Recovery: fix the errors and run apply again. terraform plan shows what will happen without making changes"
+    weight: 0.35
+    description: "Apply process"
+  - type: llm_judge
+    criteria: "Common creation errors are diagnosed — (1) Duplicate resource: security group already exists outside Terraform. Fix: import it (terraform import) or use a unique name. (2) UnauthorizedOperation: IAM permissions missing. Decode the message: aws sts decode-authorization-message. Fix: add required IAM permissions. (3) Quota exceeded: AWS service limits. Fix: request limit increase. (4) Invalid parameter: wrong AMI for region, invalid CIDR block. (5) Timeout: resource takes too long to create (increase timeouts in resource block)"
+    weight: 0.35
+    description: "Common errors"
+  - type: llm_judge
+    criteria: "Plan vs apply workflow is practical — terraform plan: preview changes without modifying infrastructure. Shows: resources to add (+), change (~), destroy (-). Save plan: terraform plan -out=tfplan, then terraform apply tfplan (ensures exactly the planned changes are applied). Always review plan before apply in production. terraform apply -auto-approve: skips confirmation (only for CI/CD, not manual runs). terraform destroy: removes all managed resources. Partial state: use terraform state list to see what's managed, terraform show for current state"
+    weight: 0.30
+    description: "Plan vs apply"
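
Illustrative note (not part of the package diff): the dependency-graph behaviour in the first assertion can be modelled with Python's standard `graphlib`. This is a simplified sketch, not how Terraform is implemented. The `DEPENDS_ON` map is an assumption inferred from the apply log above (the instance is attempted even though the security group failed, so it is modelled as depending only on the subnet), and real Terraform walks the graph concurrently rather than in strict batches.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: resource -> resources it references.
DEPENDS_ON = {
    "aws_vpc.main": set(),
    "aws_subnet.public": {"aws_vpc.main"},
    "aws_security_group.web": {"aws_vpc.main"},
    "aws_instance.web": {"aws_subnet.public"},
}


def creation_batches(depends_on):
    """Group resources into batches; everything in a batch could be created in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all nodes whose dependencies are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


for batch in creation_batches(DEPENDS_ON):
    print(batch)
# ['aws_vpc.main']
# ['aws_security_group.web', 'aws_subnet.public']
# ['aws_instance.web']
```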

package/courses/terraform-infrastructure-setup/scenarios/level-1/resource-references.yaml ADDED

@@ -0,0 +1,70 @@
+meta:
+  id: resource-references
+  level: 1
+  course: terraform-infrastructure-setup
+  type: output
+  description: "Fix resource reference errors — debug circular dependencies, missing attributes, implicit vs explicit dependencies, and data source lookups"
+  tags: [Terraform, references, dependencies, data-sources, expressions, beginner]
+state: {}
+trigger: |
+  Your Terraform configuration has reference errors:
+  ```
+  Error: Cycle: aws_security_group.web, aws_security_group.db
+  Error: Reference to undeclared resource
+    on main.tf line 25:
+    25:   vpc_id = aws_vpc.main.vpc_id
+  A managed resource "aws_vpc" "main" has not been declared in the
+  root module.
+  ```
+  Your configuration:
+  ```hcl
+  resource "aws_security_group" "web" {
+    ingress {
+      from_port       = 443
+      to_port         = 443
+      security_groups = [aws_security_group.db.id]
+    }
+  }
+  resource "aws_security_group" "db" {
+    ingress {
+      from_port       = 5432
+      to_port         = 5432
+      security_groups = [aws_security_group.web.id]
+    }
+  }
+  resource "aws_subnet" "public" {
+    vpc_id = aws_vpc.main.vpc_id
+  }
+  ```
+  The VPC already exists in AWS (created by another team) — you need
+  to reference it, not create it.
+  Task: Explain Terraform resource references, implicit vs explicit
+  dependencies, circular dependency detection and resolution, data
+  sources for reading existing infrastructure, and expression syntax
+  for accessing resource attributes.
+assertions:
+  - type: llm_judge
+    criteria: "Resource references and dependencies are explained. Implicit dependency: when resource A references resource B's attribute (aws_security_group.web.id), Terraform automatically creates A after B. Explicit dependency: depends_on = [aws_security_group.web] when there's no attribute reference but order matters. Circular dependency: A references B and B references A, so Terraform can't determine creation order. Fix: break the cycle by declaring the rules as separate aws_security_group_rule resources instead of inline ingress blocks, so both groups can be created before the rules that reference them. depends_on cannot break a cycle; it only adds ordering constraints"
+    weight: 0.35
+    description: "Dependencies"
+  - type: llm_judge
+    criteria: "Data sources are explained — data sources read existing infrastructure without managing it. data 'aws_vpc' 'main' { filter { name = 'tag:Name', values = ['production'] } } reads the VPC. Reference: data.aws_vpc.main.id. Use data sources when: resource exists outside your Terraform config, managed by another team/state, or pre-existing infrastructure. Common data sources: aws_ami (find latest AMI), aws_vpc, aws_subnet, aws_caller_identity (current AWS account), aws_region. Data sources are refreshed on every plan"
+    weight: 0.35
+    description: "Data sources"
+  - type: llm_judge
+    criteria: "Expression syntax is covered — resource attributes: aws_instance.web.id, aws_instance.web.public_ip. With count: aws_instance.web[0].id or aws_instance.web[*].id (splat). With for_each: aws_instance.web['key'].id. Module outputs: module.vpc.vpc_id. Data sources: data.aws_vpc.main.id. Local values: local.common_tags. Built-in functions: lookup(), element(), concat(), join(). Conditional: condition ? true_val : false_val. For expressions: [for s in var.list : upper(s)]"
+    weight: 0.30
+    description: "Expression syntax"
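
Illustrative note (not part of the package diff): the cycle between the two security groups, and why moving the rules into separate resources removes it, can be shown with a small graph check. A minimal sketch using Python's `graphlib`. The rule names `web_from_db` and `db_from_web` are hypothetical, and the graphs are hand-written models of the configuration, not output from Terraform.

```python
from graphlib import CycleError, TopologicalSorter


def has_cycle(depends_on):
    """Return True if the dependency map contains a cycle."""
    try:
        TopologicalSorter(depends_on).prepare()
    except CycleError:
        return True
    return False


# Inline ingress blocks: each group references the other's id.
inline_rules = {
    "aws_security_group.web": {"aws_security_group.db"},
    "aws_security_group.db": {"aws_security_group.web"},
}

# Separate rule resources: the groups depend on nothing, the rules depend on both groups.
separate_rules = {
    "aws_security_group.web": set(),
    "aws_security_group.db": set(),
    "aws_security_group_rule.web_from_db": {"aws_security_group.web", "aws_security_group.db"},
    "aws_security_group_rule.db_from_web": {"aws_security_group.web", "aws_security_group.db"},
}

print(has_cycle(inline_rules), has_cycle(separate_rules))
# True False
```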

package/courses/terraform-infrastructure-setup/scenarios/level-1/state-file-basics.yaml ADDED

@@ -0,0 +1,73 @@
+meta:
+  id: state-file-basics
+  level: 1
+  course: terraform-infrastructure-setup
+  type: output
+  description: "Understand Terraform state — diagnose state file corruption, lock conflicts, state drift, and local vs remote state tradeoffs"
+  tags: [Terraform, state, tfstate, locking, drift, beginner]
+state: {}
+trigger: |
+  Your colleague runs `terraform plan` and gets unexpected results:
+  ```
+  $ terraform plan
+  Note: Objects have changed outside of Terraform
+  Terraform detected the following changes made outside of Terraform
+  since the last "terraform apply":
+    # aws_instance.web has been changed
+    ~ resource "aws_instance" "web" {
+        ~ instance_type = "t3.micro" -> "t3.large"
+          # (10 unchanged attributes hidden)
+      }
+  Terraform will perform the following actions:
+    # aws_instance.web will be updated in-place
+    ~ resource "aws_instance" "web" {
+        ~ instance_type = "t3.large" -> "t3.micro"
+      }
+  Plan: 0 to add, 1 to change, 0 to destroy.
+  ```
+  Someone changed the instance type in the AWS console from t3.micro
+  to t3.large. Terraform wants to revert it back to match the code.
+  Also, when two people run terraform at the same time:
+  ```
+  Error: Error acquiring the state lock
+  Error message: ConditionalCheckFailedException: The conditional
+  request failed
+  Lock Info:
+    ID:        12345-abcde
+    Path:      my-bucket/prod/terraform.tfstate
+    Operation: OperationTypeApply
+    Who:       colleague@laptop
+    Version:   1.7.0
+    Created:   2024-01-15 10:30:00 UTC
+  ```
+  Task: Explain Terraform state fundamentals, what the state file
+  contains and why it exists, state drift detection and resolution,
+  state locking (why it matters, how it works with DynamoDB), and
+  local vs remote state.
+assertions:
+  - type: llm_judge
+    criteria: "State fundamentals are explained: terraform.tfstate is a JSON file mapping configuration to real infrastructure IDs. It tracks: resource IDs, attribute values, dependencies, metadata. Why state exists: (1) map config to real resources (Terraform needs to know that aws_instance.web is i-abc123), (2) track metadata (dependencies), (3) performance (cache attribute values instead of querying APIs every time). State is the source of truth for what Terraform manages. Drift: when real infrastructure differs from state, detected during plan/apply refresh"
+    weight: 0.35
+    description: "State fundamentals"
+  - type: llm_judge
+    criteria: "Drift detection and resolution are covered — Terraform refreshes state before plan (compares real infrastructure to state). Drift detected: plan shows changes to revert. Options: (1) apply to revert to code (desired state wins), (2) update code to match reality (if the manual change was intentional), (3) terraform apply -refresh-only to update state without changing infrastructure. Best practice: all changes through Terraform, never manual. If manual changes are needed: update the .tf files to match, then plan to confirm no changes"
+    weight: 0.35
+    description: "Drift resolution"
+  - type: llm_judge
+    criteria: "Locking and remote state are practical — local state: terraform.tfstate in working directory. Problem: not shared, no locking, easy to lose. Remote state: store in S3, GCS, Azure Blob, Terraform Cloud. S3 backend with DynamoDB: S3 stores state file, DynamoDB provides locking (prevents concurrent modifications). Lock error: someone else is running terraform. Fix: wait for them to finish, or terraform force-unlock <ID> (dangerous, only if lock is stale). Never commit terraform.tfstate to git (contains secrets). Add to .gitignore"
+    weight: 0.30
+    description: "Locking and remote"
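
Illustrative note (not part of the package diff): the two halves of the plan output in this scenario, "changed outside of Terraform" and "will be updated in-place", correspond to two separate comparisons. A minimal sketch with plain dictionaries. The function names and the `ami-0abc` value are hypothetical, and real Terraform compares full provider schemas rather than flat dicts.

```python
def detect_drift(state, actual):
    """Attributes whose real value differs from what the state file recorded."""
    return {
        key: (state[key], actual[key])
        for key in state
        if key in actual and state[key] != actual[key]
    }


def planned_changes(config, actual):
    """Updates needed to bring real infrastructure back in line with the configuration."""
    return {
        key: (actual.get(key), wanted)
        for key, wanted in config.items()
        if actual.get(key) != wanted
    }


config = {"instance_type": "t3.micro"}                       # what the .tf files say
state = {"instance_type": "t3.micro", "ami": "ami-0abc"}     # what the last apply recorded
actual = {"instance_type": "t3.large", "ami": "ami-0abc"}    # what the console change left behind

print(detect_drift(state, actual))      # {'instance_type': ('t3.micro', 't3.large')}
print(planned_changes(config, actual))  # {'instance_type': ('t3.large', 't3.micro')}
```

Accepting the manual change means editing `config` to `t3.large`, after which `planned_changes` returns an empty dict. That corresponds to option (2) in the drift-resolution assertion.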

package/courses/terraform-infrastructure-setup/scenarios/level-1/terraform-fmt-validate.yaml ADDED

@@ -0,0 +1,58 @@
+meta:
+  id: terraform-fmt-validate
+  level: 1
+  course: terraform-infrastructure-setup
+  type: output
+  description: "Use terraform fmt and validate — enforce consistent formatting, catch configuration errors before plan, and set up pre-commit hooks"
+  tags: [Terraform, fmt, validate, formatting, pre-commit, beginner]
+state: {}
+trigger: |
+  Your team's Terraform codebase is a formatting mess. Every developer
+  uses different indentation, alignment, and spacing. Code reviews are
+  full of style nitpicks instead of substance:
+  ```hcl
+  resource "aws_instance" "web" {
+      ami = "ami-0c55b159cbfafe1f0"
+    instance_type="t3.micro"
+        tags={
+      Name ="web-server"
+    Environment= "prod"
+        }
+  }
+  variable "region" {
+  type=string
+    default ="us-east-1"
+  }
+  ```
+  Additionally, a developer pushed this broken config that wasn't caught
+  until CI ran terraform plan (wasting 10 minutes):
+  ```hcl
+  resource "aws_s3_bucket" "data" {
+    bucket = var.bucket_name
+    acl    = "private"  # acl argument removed in AWS provider v4+
+  }
+  ```
+  Task: Explain terraform fmt (auto-formatting), terraform validate
+  (configuration checking), how to enforce both in CI/CD and pre-commit
+  hooks, and the difference between validate and plan for error catching.
+assertions:
+  - type: llm_judge
+    criteria: "terraform fmt is explained — terraform fmt rewrites .tf files to canonical format (2-space indent, aligned equals signs, consistent spacing). terraform fmt -check: returns non-zero exit code if files need formatting (for CI). terraform fmt -recursive: formats all .tf files in subdirectories. terraform fmt -diff: shows the formatting changes. The example code would be auto-fixed to consistent 2-space indentation with aligned = signs. Best practice: run fmt before every commit"
+    weight: 0.35
+    description: "Formatting"
+  - type: llm_judge
+    criteria: "terraform validate is explained with its limitations. validate checks: syntax errors, invalid argument names, type mismatches, missing required arguments, invalid resource references. validate does NOT check: cloud API validity (wrong AMI ID), permissions, resource existence, provider-specific logic. The acl argument error: whether validate catches this depends on the installed provider version's schema, which validate checks after terraform init. Plan catches more because it also evaluates variables, refreshes state, and calls provider APIs. Workflow: fmt → validate → plan → apply. validate is fast (no API calls), plan is thorough (makes API calls)"
+    weight: 0.35
+    description: "Validation"
+  - type: llm_judge
+    criteria: "CI/CD and pre-commit integration are practical — pre-commit hook: use pre-commit framework with terraform_fmt and terraform_validate hooks. CI pipeline: (1) terraform fmt -check -recursive (fail if unformatted), (2) terraform init -backend=false (for validate only), (3) terraform validate, (4) terraform plan. This catches formatting issues before PR, validation errors without cloud access, and full errors with plan. Tools: pre-commit-terraform hooks, GitHub Actions, GitLab CI. TFLint for additional linting beyond validate (naming conventions, deprecated syntax, best practices)"
+    weight: 0.30
+    description: "CI integration"
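
Illustrative note (not part of the package diff): the first three CI steps in the last assertion can be wired into a small fail-fast runner. A minimal sketch, assuming `terraform` is on the PATH when the default runner is used. The `first_failure` helper and the injectable `run` parameter are hypothetical conveniences so the ordering logic can be exercised without Terraform installed.

```python
import subprocess
from typing import Callable, Optional, Sequence

# Cheap checks first: formatting, then a backend-less init, then validate.
CHECKS = [
    ["terraform", "fmt", "-check", "-recursive"],
    ["terraform", "init", "-backend=false"],
    ["terraform", "validate"],
]


def run_command(command: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(command).returncode


def first_failure(run: Callable[[Sequence[str]], int] = run_command) -> Optional[Sequence[str]]:
    """Run the checks in order and return the first failing command, or None if all pass."""
    for command in CHECKS:
        if run(command) != 0:
            return command
    return None
```

A CI job or pre-commit hook could call `first_failure()` and exit non-zero when it returns a command, leaving `terraform plan` for a later stage that has cloud credentials.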

package/courses/terraform-infrastructure-setup/scenarios/level-2/count-vs-for-each.yaml ADDED

@@ -0,0 +1,58 @@
+meta:
+  id: count-vs-for-each
+  level: 2
+  course: terraform-infrastructure-setup
+  type: output
+  description: "Choose between count and for_each — understand index-based vs key-based resources, migration pitfalls, and when to use each"
+  tags: [Terraform, count, for_each, meta-arguments, iteration, intermediate]
+state: {}
+trigger: |
+  Your infrastructure uses count for EC2 instances:
+  ```hcl
+  variable "instances" {
+    default = ["web-1", "web-2", "web-3"]
+  }
+  resource "aws_instance" "web" {
+    count         = length(var.instances)
+    ami           = "ami-0c55b159cbfafe1f0"
+    instance_type = "t3.micro"
+    tags = {
+      Name = var.instances[count.index]
+    }
+  }
+  ```
+  A developer removes "web-2" from the list. Terraform plan shows:
+  ```
+  # aws_instance.web[1] will be updated in-place (tags: web-2 → web-3)
+  # aws_instance.web[2] will be destroyed
+  Plan: 0 to add, 1 to change, 1 to destroy.
+  ```
+  The instance that was web-2 gets renamed to web-3, and the real web-3
+  instance gets destroyed! This is the wrong behavior: you wanted to
+  remove web-2, not web-3.
+  Task: Explain count vs for_each, why count causes index shift problems,
+  how for_each solves this with stable keys, how to migrate from count
+  to for_each safely, and when to use each approach.
+assertions:
+  - type: llm_judge
+    criteria: "Count problems are explained — count uses numeric indices: web[0], web[1], web[2]. Removing a middle element shifts all subsequent indices. Removing 'web-2' (index 1) makes 'web-3' become index 1 — Terraform sees index 1 changed and index 2 disappeared. Result: wrong instance modified, wrong instance destroyed. This is a fundamental limitation of count with lists that can have elements removed. Count is safe only when: (1) all elements are identical, (2) you only add/remove from the end, (3) the count is a simple number not derived from a list"
+    weight: 0.35
+    description: "Count problems"
+  - type: llm_judge
+    criteria: "for_each solution is explained — for_each uses stable map keys: web['web-1'], web['web-2'], web['web-3']. Removing 'web-2' only affects web['web-2'] — other instances untouched. Implementation: resource 'aws_instance' 'web' { for_each = toset(var.instances), tags = { Name = each.key } } or with a map: for_each = var.instances_map, using each.key and each.value. for_each accepts: set(string) or map. Not list — use toset() to convert. Access instances: aws_instance.web['web-1'].id"
+    weight: 0.35
+    description: "for_each solution"
+  - type: llm_judge
+    criteria: "Migration strategy is covered — migrating count to for_each changes resource addresses (web[0] → web['web-1']). Without state management, Terraform destroys and recreates all instances. Safe migration: (1) use moved blocks (Terraform 1.1+): moved { from = aws_instance.web[0], to = aws_instance.web['web-1'] }. (2) Or use terraform state mv: terraform state mv 'aws_instance.web[0]' 'aws_instance.web[\"web-1\"]'. Verify with plan — should show no changes. When to use count: simple numeric repetition (create N identical resources). When to use for_each: named resources, resources that may be individually added/removed"
+    weight: 0.30
+    description: "Migration"
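
The for_each rewrite and the moved-block migration described in the criteria above could look roughly like the sketch below. It assumes the same three-element var.instances list and the same AMI as the trigger. Note that real HCL requires double-quoted keys such as aws_instance.web["web-1"]; the single-quoted forms in the criteria are shorthand.

```hcl
variable "instances" {
  type    = list(string)
  default = ["web-1", "web-2", "web-3"]
}

resource "aws_instance" "web" {
  # toset() converts the list to a set(string); each element becomes a stable key.
  for_each      = toset(var.instances)
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"
  tags = {
    Name = each.key
  }
}

# Map each old index-based address to its new key-based address so the
# existing instances are kept rather than recreated (Terraform 1.1+).
moved {
  from = aws_instance.web[0]
  to   = aws_instance.web["web-1"]
}
moved {
  from = aws_instance.web[1]
  to   = aws_instance.web["web-2"]
}
moved {
  from = aws_instance.web[2]
  to   = aws_instance.web["web-3"]
}
```

With these blocks in place, terraform plan should report only moves, with nothing to add or destroy. Removing "web-2" from the list afterwards would then affect only aws_instance.web["web-2"].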

package/courses/terraform-infrastructure-setup/scenarios/level-2/dependency-management.yaml ADDED

@@ -0,0 +1,80 @@
+meta:
+  id: dependency-management
+  level: 2
+  course: terraform-infrastructure-setup
+  type: output
+  description: "Manage resource dependencies — debug dependency graphs, resolve circular references, use depends_on correctly, and understand the DAG"
+  tags: [Terraform, dependencies, DAG, circular, depends-on, intermediate]
+state: {}
+trigger: |
+  Your infrastructure has a complex dependency chain that's causing
+  issues during apply:
+  ```hcl
+  resource "aws_iam_role" "lambda" {
+    name = "lambda-role"
+    assume_role_policy = jsonencode({...})
+  }
+  resource "aws_iam_role_policy" "lambda" {
+    role   = aws_iam_role.lambda.name
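+    policy = jsonencode({
+      Version = "2012-10-17"
+      Statement = [{
+        Effect   = "Allow"
+        Action   = "s3:GetObject"
+        # s3:GetObject applies to objects, so the ARN needs the /* suffix
+        Resource = "${aws_s3_bucket.data.arn}/*"
+      }]
+    })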
+  }
+  resource "aws_lambda_function" "processor" {
+    function_name = "processor"
+    role          = aws_iam_role.lambda.arn
+    handler       = "index.handler"
+    runtime       = "nodejs18.x"
+    filename      = "lambda.zip"
+  }
+  resource "aws_s3_bucket_notification" "trigger" {
+    bucket = aws_s3_bucket.data.id
+    lambda_function {
+      lambda_function_arn = aws_lambda_function.processor.arn
+      events              = ["s3:ObjectCreated:*"]
+    }
+  }
+  resource "aws_lambda_permission" "s3" {
+    action        = "lambda:InvokeFunction"
+    function_name = aws_lambda_function.processor.function_name
+    principal     = "s3.amazonaws.com"
+    source_arn    = aws_s3_bucket.data.arn
+  }
+  ```
+  Error:
+  ```
+  Error: error creating S3 Bucket Notification: Unable to validate
+  the following destination configurations: Lambda function ARN
+  The Lambda function doesn't have permission to be invoked by S3 yet
+  (the permission resource hasn't been created).
+  ```
+  Task: Explain the Terraform dependency graph (DAG), implicit vs
+  explicit dependencies, how to debug dependency ordering issues,
+  terraform graph command, and depends_on best practices.
+assertions:
+  - type: llm_judge
+    criteria: "DAG and implicit dependencies are explained — Terraform builds a directed acyclic graph (DAG) from resource references. Implicit: when resource A uses resource B's attribute, A depends on B. The chain: role → role_policy (references role.name), role → lambda (references role.arn), bucket → notification (references bucket.id), lambda → notification (references lambda.arn). The error: s3_bucket_notification depends on lambda (implicit) but NOT on lambda_permission (no attribute reference). S3 notification tries to verify the Lambda ARN but permission doesn't exist yet"
+    weight: 0.35
+    description: "DAG explained"
+  - type: llm_judge
+    criteria: "The fix uses depends_on correctly — add depends_on = [aws_lambda_permission.s3] to the aws_s3_bucket_notification resource. This creates an explicit dependency where no implicit one exists. depends_on is needed because the notification resource doesn't reference any attribute of the permission resource, but the permission must exist for the notification to succeed. terraform graph: visualize dependencies with terraform graph | dot -Tsvg > graph.svg. Look for missing edges that represent real-world dependencies"
+    weight: 0.35
+    description: "Fix with depends_on"
+  - type: llm_judge
+    criteria: "depends_on best practices are covered — use depends_on sparingly: prefer implicit dependencies (reference attributes). depends_on forces sequential creation (reduces parallelism). Common scenarios needing depends_on: IAM permissions before resources that need them, DNS records before health checks, network resources before resources placed in them (when ID isn't directly referenced). Anti-pattern: depends_on everywhere 'just in case' — slows down apply. Use terraform graph to verify dependency order before adding depends_on. Module-level depends_on: depends_on on module blocks waits for entire module to complete"
+    weight: 0.30
+    description: "Best practices"
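
The depends_on fix named in the criteria above can be sketched as follows. It is a fragment that relies on the aws_s3_bucket.data, aws_lambda_function.processor, and aws_lambda_permission.s3 resources from the trigger, and is not a complete configuration on its own.

```hcl
resource "aws_s3_bucket_notification" "trigger" {
  bucket = aws_s3_bucket.data.id

  lambda_function {
    lambda_function_arn = aws_lambda_function.processor.arn
    events              = ["s3:ObjectCreated:*"]
  }

  # No attribute of the permission is referenced above, so Terraform cannot
  # infer the ordering; declare it explicitly.
  depends_on = [aws_lambda_permission.s3]
}
```

The explicit edge is justified here because S3 checks the Lambda's invoke permission when the notification is created, and nothing in the notification block refers to the permission resource.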