dojo.md 0.2.0 → 0.2.1

Files changed (222)
  1. package/courses/GENERATION_LOG.md +45 -0
  2. package/courses/aws-lambda-debugging/course.yaml +11 -0
  3. package/courses/aws-lambda-debugging/scenarios/level-1/api-gateway-integration.yaml +71 -0
  4. package/courses/aws-lambda-debugging/scenarios/level-1/cloudwatch-logs-basics.yaml +64 -0
  5. package/courses/aws-lambda-debugging/scenarios/level-1/cold-start-basics.yaml +70 -0
  6. package/courses/aws-lambda-debugging/scenarios/level-1/environment-variable-issues.yaml +72 -0
  7. package/courses/aws-lambda-debugging/scenarios/level-1/first-debugging-shift.yaml +73 -0
  8. package/courses/aws-lambda-debugging/scenarios/level-1/handler-import-errors.yaml +71 -0
  9. package/courses/aws-lambda-debugging/scenarios/level-1/iam-permission-errors.yaml +68 -0
  10. package/courses/aws-lambda-debugging/scenarios/level-1/invocation-errors.yaml +72 -0
  11. package/courses/aws-lambda-debugging/scenarios/level-1/lambda-timeout-errors.yaml +65 -0
  12. package/courses/aws-lambda-debugging/scenarios/level-1/memory-and-oom.yaml +70 -0
  13. package/courses/aws-lambda-debugging/scenarios/level-2/async-invocation-failures.yaml +72 -0
  14. package/courses/aws-lambda-debugging/scenarios/level-2/cold-start-optimization.yaml +76 -0
  15. package/courses/aws-lambda-debugging/scenarios/level-2/dynamodb-streams-debugging.yaml +70 -0
  16. package/courses/aws-lambda-debugging/scenarios/level-2/intermediate-debugging-shift.yaml +71 -0
  17. package/courses/aws-lambda-debugging/scenarios/level-2/lambda-concurrency-management.yaml +70 -0
  18. package/courses/aws-lambda-debugging/scenarios/level-2/lambda-layers-debugging.yaml +76 -0
  19. package/courses/aws-lambda-debugging/scenarios/level-2/sam-local-debugging.yaml +74 -0
  20. package/courses/aws-lambda-debugging/scenarios/level-2/sqs-event-source.yaml +72 -0
  21. package/courses/aws-lambda-debugging/scenarios/level-2/vpc-networking-issues.yaml +71 -0
  22. package/courses/aws-lambda-debugging/scenarios/level-2/xray-tracing.yaml +62 -0
  23. package/courses/aws-lambda-debugging/scenarios/level-3/advanced-debugging-shift.yaml +72 -0
  24. package/courses/aws-lambda-debugging/scenarios/level-3/container-image-lambda.yaml +79 -0
  25. package/courses/aws-lambda-debugging/scenarios/level-3/cross-account-invocation.yaml +72 -0
  26. package/courses/aws-lambda-debugging/scenarios/level-3/eventbridge-patterns.yaml +79 -0
  27. package/courses/aws-lambda-debugging/scenarios/level-3/iac-deployment-debugging.yaml +68 -0
  28. package/courses/aws-lambda-debugging/scenarios/level-3/kinesis-stream-processing.yaml +64 -0
  29. package/courses/aws-lambda-debugging/scenarios/level-3/lambda-at-edge.yaml +64 -0
  30. package/courses/aws-lambda-debugging/scenarios/level-3/lambda-extensions-debugging.yaml +67 -0
  31. package/courses/aws-lambda-debugging/scenarios/level-3/powertools-observability.yaml +79 -0
  32. package/courses/aws-lambda-debugging/scenarios/level-3/step-functions-debugging.yaml +80 -0
  33. package/courses/aws-lambda-debugging/scenarios/level-4/cost-optimization-strategy.yaml +67 -0
  34. package/courses/aws-lambda-debugging/scenarios/level-4/expert-debugging-shift.yaml +62 -0
  35. package/courses/aws-lambda-debugging/scenarios/level-4/incident-management-serverless.yaml +61 -0
  36. package/courses/aws-lambda-debugging/scenarios/level-4/multi-region-serverless.yaml +67 -0
  37. package/courses/aws-lambda-debugging/scenarios/level-4/observability-platform-design.yaml +71 -0
  38. package/courses/aws-lambda-debugging/scenarios/level-4/serverless-architecture-design.yaml +64 -0
  39. package/courses/aws-lambda-debugging/scenarios/level-4/serverless-data-architecture.yaml +66 -0
  40. package/courses/aws-lambda-debugging/scenarios/level-4/serverless-migration-strategy.yaml +65 -0
  41. package/courses/aws-lambda-debugging/scenarios/level-4/serverless-security-design.yaml +60 -0
  42. package/courses/aws-lambda-debugging/scenarios/level-4/serverless-testing-strategy.yaml +62 -0
  43. package/courses/aws-lambda-debugging/scenarios/level-5/board-serverless-strategy.yaml +63 -0
  44. package/courses/aws-lambda-debugging/scenarios/level-5/consulting-serverless-adoption.yaml +57 -0
  45. package/courses/aws-lambda-debugging/scenarios/level-5/industry-serverless-patterns.yaml +62 -0
  46. package/courses/aws-lambda-debugging/scenarios/level-5/ma-serverless-integration.yaml +75 -0
  47. package/courses/aws-lambda-debugging/scenarios/level-5/master-debugging-shift.yaml +61 -0
  48. package/courses/aws-lambda-debugging/scenarios/level-5/organizational-serverless-transformation.yaml +65 -0
  49. package/courses/aws-lambda-debugging/scenarios/level-5/regulatory-serverless.yaml +61 -0
  50. package/courses/aws-lambda-debugging/scenarios/level-5/serverless-economics.yaml +65 -0
  51. package/courses/aws-lambda-debugging/scenarios/level-5/serverless-future-technology.yaml +66 -0
  52. package/courses/aws-lambda-debugging/scenarios/level-5/serverless-platform-design.yaml +71 -0
  53. package/courses/docker-container-debugging/course.yaml +11 -0
  54. package/courses/docker-container-debugging/scenarios/level-1/container-exit-codes.yaml +59 -0
  55. package/courses/docker-container-debugging/scenarios/level-1/container-networking-basics.yaml +69 -0
  56. package/courses/docker-container-debugging/scenarios/level-1/docker-logs-debugging.yaml +67 -0
  57. package/courses/docker-container-debugging/scenarios/level-1/dockerfile-build-failures.yaml +71 -0
  58. package/courses/docker-container-debugging/scenarios/level-1/environment-variable-issues.yaml +74 -0
  59. package/courses/docker-container-debugging/scenarios/level-1/first-debugging-shift.yaml +70 -0
  60. package/courses/docker-container-debugging/scenarios/level-1/image-pull-failures.yaml +68 -0
  61. package/courses/docker-container-debugging/scenarios/level-1/port-mapping-issues.yaml +67 -0
  62. package/courses/docker-container-debugging/scenarios/level-1/resource-limits-oom.yaml +70 -0
  63. package/courses/docker-container-debugging/scenarios/level-1/volume-mount-problems.yaml +66 -0
  64. package/courses/docker-container-debugging/scenarios/level-2/container-health-checks.yaml +73 -0
  65. package/courses/docker-container-debugging/scenarios/level-2/docker-compose-debugging.yaml +66 -0
  66. package/courses/docker-container-debugging/scenarios/level-2/docker-exec-debugging.yaml +71 -0
  67. package/courses/docker-container-debugging/scenarios/level-2/image-layer-optimization.yaml +81 -0
  68. package/courses/docker-container-debugging/scenarios/level-2/intermediate-debugging-shift.yaml +73 -0
  69. package/courses/docker-container-debugging/scenarios/level-2/logging-and-log-rotation.yaml +76 -0
  70. package/courses/docker-container-debugging/scenarios/level-2/multi-stage-build-debugging.yaml +76 -0
  71. package/courses/docker-container-debugging/scenarios/level-2/network-debugging-tools.yaml +67 -0
  72. package/courses/docker-container-debugging/scenarios/level-2/pid1-signal-handling.yaml +71 -0
  73. package/courses/docker-container-debugging/scenarios/level-2/security-scanning-basics.yaml +67 -0
  74. package/courses/docker-container-debugging/scenarios/level-3/advanced-debugging-shift.yaml +77 -0
  75. package/courses/docker-container-debugging/scenarios/level-3/buildkit-optimization.yaml +67 -0
  76. package/courses/docker-container-debugging/scenarios/level-3/container-filesystem-debugging.yaml +70 -0
  77. package/courses/docker-container-debugging/scenarios/level-3/container-security-hardening.yaml +74 -0
  78. package/courses/docker-container-debugging/scenarios/level-3/disk-space-management.yaml +74 -0
  79. package/courses/docker-container-debugging/scenarios/level-3/docker-api-automation.yaml +72 -0
  80. package/courses/docker-container-debugging/scenarios/level-3/docker-daemon-issues.yaml +73 -0
  81. package/courses/docker-container-debugging/scenarios/level-3/docker-in-docker-ci.yaml +69 -0
  82. package/courses/docker-container-debugging/scenarios/level-3/overlay-network-debugging.yaml +70 -0
  83. package/courses/docker-container-debugging/scenarios/level-3/production-container-ops.yaml +71 -0
  84. package/courses/docker-container-debugging/scenarios/level-4/cicd-pipeline-design.yaml +66 -0
  85. package/courses/docker-container-debugging/scenarios/level-4/container-monitoring-observability.yaml +63 -0
  86. package/courses/docker-container-debugging/scenarios/level-4/container-orchestration-strategy.yaml +62 -0
  87. package/courses/docker-container-debugging/scenarios/level-4/container-performance-engineering.yaml +64 -0
  88. package/courses/docker-container-debugging/scenarios/level-4/container-security-architecture.yaml +66 -0
  89. package/courses/docker-container-debugging/scenarios/level-4/enterprise-image-management.yaml +58 -0
  90. package/courses/docker-container-debugging/scenarios/level-4/expert-debugging-shift.yaml +63 -0
  91. package/courses/docker-container-debugging/scenarios/level-4/incident-response-containers.yaml +70 -0
  92. package/courses/docker-container-debugging/scenarios/level-4/multi-environment-management.yaml +65 -0
  93. package/courses/docker-container-debugging/scenarios/level-4/stateful-service-containers.yaml +65 -0
  94. package/courses/docker-container-debugging/scenarios/level-5/board-infrastructure-strategy.yaml +58 -0
  95. package/courses/docker-container-debugging/scenarios/level-5/consulting-container-strategy.yaml +61 -0
  96. package/courses/docker-container-debugging/scenarios/level-5/container-platform-architecture.yaml +67 -0
  97. package/courses/docker-container-debugging/scenarios/level-5/container-platform-economics.yaml +67 -0
  98. package/courses/docker-container-debugging/scenarios/level-5/container-technology-evolution.yaml +67 -0
  99. package/courses/docker-container-debugging/scenarios/level-5/disaster-recovery-containers.yaml +66 -0
  100. package/courses/docker-container-debugging/scenarios/level-5/industry-container-patterns.yaml +71 -0
  101. package/courses/docker-container-debugging/scenarios/level-5/master-debugging-shift.yaml +62 -0
  102. package/courses/docker-container-debugging/scenarios/level-5/organizational-transformation.yaml +67 -0
  103. package/courses/docker-container-debugging/scenarios/level-5/regulatory-compliance-containers.yaml +61 -0
  104. package/courses/kubernetes-deployment-troubleshooting/course.yaml +12 -0
  105. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/configmap-secret-issues.yaml +69 -0
  106. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/crashloopbackoff.yaml +68 -0
  107. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/deployment-rollout.yaml +56 -0
  108. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/first-troubleshooting-shift.yaml +65 -0
  109. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/health-probe-failures.yaml +70 -0
  110. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/imagepullbackoff.yaml +57 -0
  111. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/kubectl-debugging-basics.yaml +56 -0
  112. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/oomkilled.yaml +70 -0
  113. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/pending-pods.yaml +68 -0
  114. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-1/service-not-reachable.yaml +66 -0
  115. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/dns-resolution-failures.yaml +63 -0
  116. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/helm-deployment-failures.yaml +63 -0
  117. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/hpa-scaling-issues.yaml +62 -0
  118. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/ingress-routing-issues.yaml +63 -0
  119. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/init-container-failures.yaml +63 -0
  120. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/intermediate-troubleshooting-shift.yaml +66 -0
  121. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/network-policy-blocking.yaml +67 -0
  122. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/persistent-volume-issues.yaml +69 -0
  123. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/rbac-permission-denied.yaml +57 -0
  124. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-2/resource-quota-limits.yaml +64 -0
  125. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/advanced-troubleshooting-shift.yaml +69 -0
  126. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/cluster-upgrade-failures.yaml +71 -0
  127. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/gitops-drift-detection.yaml +62 -0
  128. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/job-cronjob-failures.yaml +67 -0
  129. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/monitoring-alerting-gaps.yaml +64 -0
  130. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/multi-container-debugging.yaml +68 -0
  131. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/node-pressure-evictions.yaml +70 -0
  132. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/pod-disruption-budgets.yaml +59 -0
  133. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/service-mesh-debugging.yaml +64 -0
  134. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-3/statefulset-troubleshooting.yaml +69 -0
  135. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/capacity-planning.yaml +65 -0
  136. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/cost-optimization.yaml +57 -0
  137. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/disaster-recovery-design.yaml +56 -0
  138. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/executive-communication.yaml +62 -0
  139. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/expert-troubleshooting-shift.yaml +65 -0
  140. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/incident-management-process.yaml +59 -0
  141. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/multi-cluster-operations.yaml +62 -0
  142. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/multi-tenancy-design.yaml +55 -0
  143. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/platform-engineering.yaml +59 -0
  144. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-4/security-hardening.yaml +58 -0
  145. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/behavioral-science.yaml +62 -0
  146. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/board-strategy.yaml +61 -0
  147. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/cloud-native-future.yaml +65 -0
  148. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/comprehensive-platform.yaml +57 -0
  149. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/consulting-engagement.yaml +62 -0
  150. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/industry-benchmarks.yaml +58 -0
  151. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/ma-integration.yaml +62 -0
  152. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/master-troubleshooting-shift.yaml +73 -0
  153. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/product-development.yaml +65 -0
  154. package/courses/kubernetes-deployment-troubleshooting/scenarios/level-5/regulatory-compliance.yaml +76 -0
  155. package/courses/mysql-query-optimization/course.yaml +11 -0
  156. package/courses/mysql-query-optimization/scenarios/level-1/buffer-pool-basics.yaml +65 -0
  157. package/courses/mysql-query-optimization/scenarios/level-1/explain-basics.yaml +66 -0
  158. package/courses/mysql-query-optimization/scenarios/level-1/first-optimization-shift.yaml +78 -0
  159. package/courses/mysql-query-optimization/scenarios/level-1/innodb-index-fundamentals.yaml +68 -0
  160. package/courses/mysql-query-optimization/scenarios/level-1/join-basics.yaml +66 -0
  161. package/courses/mysql-query-optimization/scenarios/level-1/n-plus-one-queries.yaml +67 -0
  162. package/courses/mysql-query-optimization/scenarios/level-1/query-rewriting-basics.yaml +66 -0
  163. package/courses/mysql-query-optimization/scenarios/level-1/select-star-problems.yaml +68 -0
  164. package/courses/mysql-query-optimization/scenarios/level-1/slow-query-diagnosis.yaml +65 -0
  165. package/courses/mysql-query-optimization/scenarios/level-1/where-clause-optimization.yaml +65 -0
  166. package/courses/mysql-query-optimization/scenarios/level-2/buffer-pool-tuning.yaml +64 -0
  167. package/courses/mysql-query-optimization/scenarios/level-2/composite-index-design.yaml +71 -0
  168. package/courses/mysql-query-optimization/scenarios/level-2/covering-and-invisible-indexes.yaml +69 -0
  169. package/courses/mysql-query-optimization/scenarios/level-2/cte-and-window-functions.yaml +78 -0
  170. package/courses/mysql-query-optimization/scenarios/level-2/intermediate-optimization-shift.yaml +68 -0
  171. package/courses/mysql-query-optimization/scenarios/level-2/join-optimization.yaml +67 -0
  172. package/courses/mysql-query-optimization/scenarios/level-2/performance-schema-analysis.yaml +69 -0
  173. package/courses/mysql-query-optimization/scenarios/level-2/query-optimizer-hints.yaml +74 -0
  174. package/courses/mysql-query-optimization/scenarios/level-2/subquery-optimization.yaml +70 -0
  175. package/courses/mysql-query-optimization/scenarios/level-2/write-optimization.yaml +63 -0
  176. package/courses/mysql-query-optimization/scenarios/level-3/advanced-optimization-shift.yaml +71 -0
  177. package/courses/mysql-query-optimization/scenarios/level-3/connection-management.yaml +67 -0
  178. package/courses/mysql-query-optimization/scenarios/level-3/full-text-search.yaml +77 -0
  179. package/courses/mysql-query-optimization/scenarios/level-3/json-optimization.yaml +87 -0
  180. package/courses/mysql-query-optimization/scenarios/level-3/lock-contention-analysis.yaml +68 -0
  181. package/courses/mysql-query-optimization/scenarios/level-3/monitoring-alerting.yaml +63 -0
  182. package/courses/mysql-query-optimization/scenarios/level-3/online-schema-changes.yaml +79 -0
  183. package/courses/mysql-query-optimization/scenarios/level-3/partitioning-strategies.yaml +83 -0
  184. package/courses/mysql-query-optimization/scenarios/level-3/query-profiling-deep-dive.yaml +84 -0
  185. package/courses/mysql-query-optimization/scenarios/level-3/replication-optimization.yaml +66 -0
  186. package/courses/mysql-query-optimization/scenarios/level-4/aurora-vs-rds-evaluation.yaml +61 -0
  187. package/courses/mysql-query-optimization/scenarios/level-4/data-architecture.yaml +62 -0
  188. package/courses/mysql-query-optimization/scenarios/level-4/database-migration-planning.yaml +59 -0
  189. package/courses/mysql-query-optimization/scenarios/level-4/enterprise-governance.yaml +50 -0
  190. package/courses/mysql-query-optimization/scenarios/level-4/executive-communication.yaml +54 -0
  191. package/courses/mysql-query-optimization/scenarios/level-4/expert-optimization-shift.yaml +67 -0
  192. package/courses/mysql-query-optimization/scenarios/level-4/high-availability-architecture.yaml +60 -0
  193. package/courses/mysql-query-optimization/scenarios/level-4/optimizer-internals.yaml +62 -0
  194. package/courses/mysql-query-optimization/scenarios/level-4/performance-sla-design.yaml +52 -0
  195. package/courses/mysql-query-optimization/scenarios/level-4/read-replica-scaling.yaml +51 -0
  196. package/courses/mysql-query-optimization/scenarios/level-5/ai-database-future.yaml +45 -0
  197. package/courses/mysql-query-optimization/scenarios/level-5/behavioral-science.yaml +44 -0
  198. package/courses/mysql-query-optimization/scenarios/level-5/benchmark-design.yaml +47 -0
  199. package/courses/mysql-query-optimization/scenarios/level-5/board-strategy.yaml +48 -0
  200. package/courses/mysql-query-optimization/scenarios/level-5/comprehensive-platform.yaml +49 -0
  201. package/courses/mysql-query-optimization/scenarios/level-5/consulting-engagement.yaml +52 -0
  202. package/courses/mysql-query-optimization/scenarios/level-5/ma-database-integration.yaml +47 -0
  203. package/courses/mysql-query-optimization/scenarios/level-5/master-optimization-shift.yaml +56 -0
  204. package/courses/mysql-query-optimization/scenarios/level-5/product-development.yaml +48 -0
  205. package/courses/mysql-query-optimization/scenarios/level-5/regulatory-compliance.yaml +48 -0
  206. package/courses/postgresql-query-optimization/scenarios/level-5/comprehensive-database-system.yaml +70 -0
  207. package/courses/postgresql-query-optimization/scenarios/level-5/database-ai-future.yaml +81 -0
  208. package/courses/postgresql-query-optimization/scenarios/level-5/database-behavioral-science.yaml +63 -0
  209. package/courses/postgresql-query-optimization/scenarios/level-5/database-board-strategy.yaml +77 -0
  210. package/courses/postgresql-query-optimization/scenarios/level-5/database-consulting-engagement.yaml +61 -0
  211. package/courses/postgresql-query-optimization/scenarios/level-5/database-industry-benchmarks.yaml +64 -0
  212. package/courses/postgresql-query-optimization/scenarios/level-5/database-ma-integration.yaml +71 -0
  213. package/courses/postgresql-query-optimization/scenarios/level-5/database-product-development.yaml +72 -0
  214. package/courses/postgresql-query-optimization/scenarios/level-5/database-regulatory-landscape.yaml +76 -0
  215. package/courses/postgresql-query-optimization/scenarios/level-5/master-optimization-shift.yaml +66 -0
  216. package/courses/terraform-infrastructure-setup/course.yaml +11 -0
  217. package/courses/terraform-infrastructure-setup/scenarios/level-1/terraform-init-errors.yaml +72 -0
  218. package/dist/mcp/session-manager.d.ts +7 -4
  219. package/dist/mcp/session-manager.d.ts.map +1 -1
  220. package/dist/mcp/session-manager.js +23 -8
  221. package/dist/mcp/session-manager.js.map +1 -1
  222. package/package.json +1 -1
@@ -0,0 +1,64 @@
+ meta:
+   id: kinesis-stream-processing
+   level: 3
+   course: aws-lambda-debugging
+   type: output
+   description: "Debug Lambda Kinesis stream processing — diagnose shard iterator issues, throughput limits, and enhanced fan-out for high-volume stream consumers"
+   tags: [AWS, Lambda, Kinesis, streams, shards, fan-out, advanced]
+
+ state: {}
+
+ trigger: |
+   Your Lambda function processes a Kinesis data stream for real-time
+   analytics. Suddenly, processing lag increases from seconds to hours:
+
+   CloudWatch Metric — IteratorAge:
+     10:00 - 500ms (normal)
+     11:00 - 30,000ms (30 seconds behind)
+     12:00 - 3,600,000ms (1 hour behind!)
+
+   Investigation:
+
+   1. The stream has 4 shards but traffic doubled. Each shard supports
+      1MB/s or 1,000 records/s input. At 2,000 records/s, shards are
+      at capacity. Records back up.
+      Fix: increase shard count (UpdateShardCount) or use on-demand mode.
+
+   2. Lambda processes one batch per shard concurrently. With 4 shards,
+      maximum 4 concurrent Lambda invocations. Each takes 2 seconds.
+      Throughput: 4 batches × 100 records / 2 seconds = 200 records/s.
+      But 2,000 records/s are arriving!
+      Fix: increase parallelization factor (up to 10 per shard):
+      $ aws lambda update-event-source-mapping --uuid abc-123 \
+          --parallelization-factor 10
+      Now: 4 shards × 10 parallel = 40 concurrent Lambda invocations.
+
+   3. After fixing throughput, one shard has a "poison pill" record
+      that causes the Lambda to crash. The shard is blocked — the
+      same bad record retries indefinitely.
+      Fix: configure BisectBatchOnFunctionError, MaximumRetryAttempts,
+      and OnFailure destination (same as DynamoDB Streams).
+
+   4. Multiple consumers on the same stream compete for read throughput
+      (2MB/s per shard shared). Enhanced fan-out gives each consumer
+      a dedicated 2MB/s pipe.
+
+   Task: Explain Kinesis + Lambda debugging. Write: how Kinesis event
+   source mapping works (shards, iterators, checkpointing), throughput
+   optimization (parallelization factor, shard splitting), error
+   handling for stream records, enhanced fan-out for multiple consumers,
+   and monitoring stream processing health.
+
+ assertions:
+   - type: llm_judge
+     criteria: "Kinesis processing mechanics are explained — Lambda polls each shard independently. One batch per shard at a time by default. Parallelization factor: run up to 10 Lambda invocations per shard concurrently (records within a partition key remain ordered). Shard throughput: 1MB/s write, 2MB/s read per shard. IteratorAge: time between record written and processed — critical metric for lag detection. Checkpointing: Lambda checkpoints after successful batch processing. Failed batches retry from the last successful checkpoint"
+     weight: 0.35
+     description: "Processing mechanics"
+   - type: llm_judge
+     criteria: "Error handling and throughput are covered — poison pill: one bad record blocks the entire shard. Fix: BisectBatchOnFunctionError (split batch to isolate bad record), MaximumRetryAttempts (stop retrying after N attempts), MaximumRecordAgeInSeconds (skip old records), OnFailure destination (send details of the failed batch to SQS or SNS). Throughput: increase parallelization factor for more concurrency, split shards for more capacity, use on-demand mode for auto-scaling shards. Enhanced fan-out: dedicated 2MB/s per shard for each consumer (avoids shared throughput limits)"
+     weight: 0.35
+     description: "Errors and throughput"
+   - type: llm_judge
+     criteria: "Monitoring is practical — key metrics: IteratorAge (processing lag, most important), GetRecords.IteratorAgeMilliseconds, IncomingBytes/IncomingRecords (input rate), ReadProvisionedThroughputExceeded (consumer hitting read limits). Alert on: IteratorAge > threshold (minutes, not hours), ReadProvisionedThroughputExceeded > 0 (need enhanced fan-out or more shards). Lambda metrics: Errors, Duration, ConcurrentExecutions per stream function. Capacity planning: records/second × average record size must be less than shard capacity × number of shards"
+     weight: 0.30
+     description: "Monitoring"
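The throughput arithmetic in step 2 of the Kinesis scenario can be sketched in a few lines of Python. This is a minimal sketch and not part of the package: the function names `consumer_throughput` and `required_parallelization_factor` are hypothetical, and it assumes every batch is full and takes a constant time to process. The inputs are the scenario's own figures (4 shards, 100-record batches, 2 seconds per batch, 2,000 records/s arriving).

```python
import math

# Per-shard ceiling for the parallelization factor, as stated in the scenario.
MAX_PARALLELIZATION_FACTOR = 10


def consumer_throughput(shards, batch_size, batch_seconds, parallelization_factor=1):
    """Records per second the Lambda consumer can drain.

    Assumes full batches and a constant processing time per batch.
    """
    concurrent_batches = shards * parallelization_factor
    return concurrent_batches * batch_size / batch_seconds


def required_parallelization_factor(arrival_rate, shards, batch_size, batch_seconds):
    """Smallest factor whose throughput is at least the arrival rate.

    The result may exceed MAX_PARALLELIZATION_FACTOR; the caller decides
    whether to add shards instead.
    """
    per_slot = batch_size / batch_seconds  # records/s for one concurrent batch
    return math.ceil(arrival_rate / (shards * per_slot))


if __name__ == "__main__":
    baseline = consumer_throughput(shards=4, batch_size=100, batch_seconds=2)
    tuned = consumer_throughput(4, 100, 2, parallelization_factor=10)
    needed = required_parallelization_factor(2000, 4, 100, 2)
    print(f"baseline: {baseline} records/s")
    print(f"factor 10: {tuned} records/s")
    print(f"needed factor: {needed} (limit {MAX_PARALLELIZATION_FACTOR})")
```

Under these assumptions a factor of 10 only matches the 2,000 records/s arrival rate, so the existing backlog would not shrink. Shorter batch durations or more shards would be needed for IteratorAge to recover.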
@@ -0,0 +1,64 @@
+ meta:
+   id: lambda-at-edge
+   level: 3
+   course: aws-lambda-debugging
+   type: output
+   description: "Debug Lambda@Edge and CloudFront Functions — diagnose edge function failures, replication delays, and CloudFront integration issues"
+   tags: [AWS, Lambda, Lambda@Edge, CloudFront, edge, CDN, advanced]
+
+ state: {}
+
+ trigger: |
+   Your Lambda@Edge function for URL rewriting intermittently fails.
+   CloudFront returns 502 errors to some users but not others:
+
+   CloudFront error response:
+     502 ERROR — The request could not be satisfied.
+     Lambda@Edge function execution error.
+
+   Debugging Lambda@Edge is harder than regular Lambda:
+   1. Logs are in the REGION where the function executed, not where
+      it was deployed. A user in Tokyo triggers logs in ap-northeast-1,
+      a user in London triggers logs in eu-west-2.
+
+   2. You deployed the function in us-east-1 (required for Lambda@Edge)
+      but the error happens in eu-west-1. You must check CloudWatch
+      Logs in eu-west-1.
+
+   3. The function exceeds Lambda@Edge limits:
+      - Viewer request/response: 5 seconds timeout, 128MB memory,
+        40KB response size
+      - Origin request/response: 30 seconds timeout, memory up to the
+        regular Lambda maximum, 1MB response size
+
+      Your function takes 6 seconds on some requests (cold start +
+      external API call) — exceeding the 5-second viewer request limit.
+
+   4. After a code fix, replication takes 5-15 minutes. You can't
+      test immediately — the old version runs until replication
+      completes across all edge locations.
+
+   Alternative: CloudFront Functions (simpler, faster, cheaper):
+   - Sub-millisecond execution
+   - Runs at edge locations (not regional)
+   - Limited to simple request/response manipulation
+   - JavaScript only, no network access
+
+   Task: Explain Lambda@Edge debugging. Write: Lambda@Edge vs
+   CloudFront Functions (when to use which), deployment and
+   replication process, finding logs across regions, edge-specific
+   limits and constraints, common failures, and testing strategies.
+
+ assertions:
+   - type: llm_judge
+     criteria: "Lambda@Edge vs CloudFront Functions are compared — Lambda@Edge: Node.js/Python, up to 5s (viewer) or 30s (origin), network access, no VPC access, deployed in us-east-1 and replicated. CloudFront Functions: JavaScript only, sub-millisecond, no network access, runs at edge locations, cheaper (about 1/6th the per-request cost). Use CloudFront Functions for: simple header manipulation, URL rewrites, redirects. Use Lambda@Edge for: complex logic, network calls, authentication, dynamic content generation"
+     weight: 0.35
+     description: "Edge comparison"
+   - type: llm_judge
+     criteria: "Debugging challenges are covered — logs are in the execution region (not us-east-1). To find logs: check CloudFront access logs for x-edge-location, then check CloudWatch Logs in that region. Replication delay: 5-15 minutes after publishing — can't test immediately. Must use published versions (not $LATEST). Limits are stricter than regular Lambda, especially for viewer events: 128MB memory max, 5s timeout, 40KB response size. Test with CloudFront's test event feature before deploying. Use CloudWatch Logs Insights cross-region query to find errors"
+     weight: 0.35
+     description: "Debugging challenges"
+   - type: llm_judge
+     criteria: "Testing and best practices are practical — test locally: sam local invoke with CloudFront event samples. Test in staging: create a staging CloudFront distribution. Monitor: CloudFront 5xx error rate metric (catches Lambda@Edge failures), Lambda@Edge metrics in us-east-1 (invocations, errors, duration). Keep functions fast: minimize cold starts (small packages), avoid external network calls in viewer events. Cache responses when possible. Use CloudFront Functions for simple tasks — they're faster, cheaper, and easier to debug. Always have a rollback plan (revert to previous version/alias)"
+     weight: 0.30
+     description: "Testing and practices"
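The timeout and response-size limits in step 3 of the Lambda@Edge scenario lend themselves to a quick pre-deployment check. The sketch below is illustrative only and not part of the package: `EdgeLimits`, `LIMITS`, and `limit_violations` are hypothetical names, the figures are copied from the scenario text, and it assumes 1KB = 1024 bytes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeLimits:
    timeout_seconds: float
    max_response_bytes: int


# Figures taken from the scenario; assumes 1KB = 1024 bytes.
LIMITS = {
    "viewer": EdgeLimits(timeout_seconds=5, max_response_bytes=40 * 1024),
    "origin": EdgeLimits(timeout_seconds=30, max_response_bytes=1024 * 1024),
}


def limit_violations(event_type, duration_seconds, response_bytes):
    """Return a list of human-readable limit violations (empty if none)."""
    limits = LIMITS[event_type]
    problems = []
    if duration_seconds > limits.timeout_seconds:
        problems.append(
            f"duration {duration_seconds}s exceeds the "
            f"{limits.timeout_seconds}s {event_type} timeout"
        )
    if response_bytes > limits.max_response_bytes:
        problems.append(
            f"response of {response_bytes} bytes exceeds the "
            f"{limits.max_response_bytes}-byte {event_type} limit"
        )
    return problems


if __name__ == "__main__":
    # The scenario's 6-second request, checked against both event types.
    for event_type in ("viewer", "origin"):
        found = limit_violations(event_type, duration_seconds=6, response_bytes=2048)
        print(event_type, found or "within limits")
```

The same 6-second request that breaks the viewer-request limit fits comfortably as an origin-request trigger, which is one reason slow logic such as external API calls is often moved to origin events.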
@@ -0,0 +1,67 @@
1
+ meta:
2
+ id: lambda-extensions-debugging
3
+ level: 3
4
+ course: aws-lambda-debugging
5
+ type: output
6
+ description: "Debug Lambda Extensions — diagnose extension initialization failures, performance impact, and third-party extension integration issues"
7
+ tags: [AWS, Lambda, extensions, monitoring, third-party, performance, advanced]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ After adding a Datadog monitoring extension to your Lambda function,
13
+ cold starts doubled from 800ms to 2.2 seconds, and some invocations
14
+ timeout:
15
+
16
+ CloudWatch REPORT (before extension):
17
+ REPORT Duration: 150.00 ms Init Duration: 800.00 ms
18
+
19
+ CloudWatch REPORT (after extension):
20
+ REPORT Duration: 350.00 ms Init Duration: 2200.00 ms
21
+ EXTENSION Name: datadog-agent State: Ready
22
+
23
+ The Datadog extension adds 1,400ms to cold start and 200ms to
24
+ each invocation. For a function with a 3-second timeout, this
25
+ erodes the headroom, so slower invocations can exceed the limit.
26
+
27
+ Investigation:
28
+
29
+ 1. Extension lifecycle: Extensions run in the INIT, INVOKE, and
30
+ SHUTDOWN phases. They add overhead to each phase.
31
+ INIT: extension initializes (downloads config, opens connections)
32
+ INVOKE: extension runs alongside the function (collects metrics)
33
+ SHUTDOWN: extension flushes data (sends metrics/logs to backend)
34
+
35
+ 2. Extensions share the function's memory and timeout:
36
+ Function memory: 256MB, extension uses 50MB → only 206MB for
37
+ your code. Extension INIT counts toward the 10-second init limit.
38
+
39
+ 3. The extension makes outbound HTTPS calls during INVOKE to send
40
+ telemetry. In a VPC Lambda without internet access, these calls
41
+ time out, adding the full HTTP timeout to each invocation.
42
+
43
+ 4. Multiple extensions stack:
44
+ Layer 1: Datadog monitoring (+1.4s cold start)
45
+ Layer 2: Secrets Manager cache (+300ms cold start)
46
+ Layer 3: Custom logging (+200ms cold start)
47
+ Total: +1.9s cold start from extensions alone!
48
+
49
+ Task: Explain Lambda Extensions debugging. Write: how extensions
50
+ work (lifecycle phases, resource sharing), performance impact
51
+ (cold start, duration, memory), common extension issues (timeout,
52
+ VPC, memory pressure), popular extensions (Datadog, New Relic,
53
+ Secrets Manager), and when the overhead is worth it.
54
+
55
+ assertions:
56
+ - type: llm_judge
57
+ criteria: "Extension lifecycle is explained — extensions participate in three phases: INIT (initialize alongside runtime, counts toward 10s init timeout), INVOKE (runs concurrently with function handler, shares CPU/memory), SHUTDOWN (cleanup, up to 2 seconds). Internal extensions: run in the same process as the function. External extensions: run as separate processes (deployed as Lambda layers). Extensions share the function's configured memory and timeout. An extension that fails during INIT or crashes fails the invocation (Extension.InitError, Extension.Crash), and a slow extension delays the end of each invoke"
58
+ weight: 0.35
59
+ description: "Extension lifecycle"
60
+ - type: llm_judge
61
+ criteria: "Performance impact and debugging are covered — cold start: each extension adds initialization time (100ms to 2+ seconds). Duration: extensions running during INVOKE add latency. Memory: extensions share configured memory — reduce available memory for function code. Debug: check Init Duration increase after adding extension, monitor Duration increase on warm invocations, check Max Memory Used for memory pressure. VPC Lambda: extensions making outbound calls need internet access or VPC endpoints. Consider: is monitoring worth 200ms+ per invocation? For user-facing APIs, maybe not"
62
+ weight: 0.35
63
+ description: "Performance impact"
64
+ - type: llm_judge
65
+ criteria: "Trade-offs and recommendations are practical — worth it: production functions where observability is critical (revenue-impacting APIs, compliance requirements). Not worth it: high-frequency, low-latency functions where every millisecond matters. Alternatives: CloudWatch native metrics (zero extension overhead), X-Ray (minimal overhead, no extension needed), Powertools (in-process, no extension). If using extensions: increase memory to compensate, increase timeout, test performance before and after. Use CloudWatch Lambda Insights (lighter than third-party alternatives). Minimize number of extensions — each one adds overhead"
66
+ weight: 0.30
67
+ description: "Trade-offs"
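The overhead arithmetic in the extensions scenario above can be sketched as follows. This is an illustration only: `ExtensionCost` and `summarize` are hypothetical names, and the memory figures for the second and third extensions are assumed to be 0 because the scenario does not state them. The cold start figures and the 256MB/50MB split come from the scenario.

```typescript
// Hypothetical shape for illustration; not part of any AWS SDK.
interface ExtensionCost {
  name: string;
  coldStartMs: number; // extra Init Duration attributed to this extension
  memoryMb: number; // resident memory the extension consumes
}

// Sum the cold start overhead and subtract extension memory from the
// configured function memory.
function summarize(configuredMemoryMb: number, extensions: ExtensionCost[]) {
  const addedColdStartMs = extensions.reduce((sum, e) => sum + e.coldStartMs, 0);
  const extensionMemoryMb = extensions.reduce((sum, e) => sum + e.memoryMb, 0);
  return {
    addedColdStartMs,
    memoryLeftForCodeMb: configuredMemoryMb - extensionMemoryMb,
  };
}

// Cold start figures from the scenario. Memory for the last two entries is
// an ASSUMED 0 because the scenario gives no figure for them.
const stack: ExtensionCost[] = [
  { name: "Datadog monitoring", coldStartMs: 1400, memoryMb: 50 },
  { name: "Secrets Manager cache", coldStartMs: 300, memoryMb: 0 },
  { name: "Custom logging", coldStartMs: 200, memoryMb: 0 },
];

const summary = summarize(256, stack);
console.log(summary);
```

With the scenario's numbers this reproduces the +1.9s cold start total and the 206MB left for function code.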
@@ -0,0 +1,79 @@
1
+ meta:
2
+ id: powertools-observability
3
+ level: 3
4
+ course: aws-lambda-debugging
5
+ type: output
6
+ description: "Implement Lambda observability with Powertools — use structured logging, custom metrics, and tracing for production-grade monitoring"
7
+ tags: [AWS, Lambda, Powertools, observability, structured-logging, metrics, advanced]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your team has 50+ Lambda functions in production. Debugging is
13
+ painful: logs are unstructured text, no custom metrics, X-Ray
14
+ tracing is inconsistent. When an incident occurs, you spend
15
+ 30+ minutes correlating logs across functions.
16
+
17
+ Current logging:
18
+ console.log("Processing order " + orderId);
19
+ console.log("Error: " + err.message);
20
+
21
+ Searching CloudWatch Logs:
22
+ fields @timestamp, @message
23
+ | filter @message like /Error/
24
+ | sort @timestamp desc
25
+ (returns hundreds of unrelated errors, no context)
26
+
27
+ Implementing AWS Lambda Powertools (TypeScript):
28
+
29
+ import { Logger, Metrics, Tracer } from '@aws-lambda-powertools/...'
30
+
31
+ const logger = new Logger({ serviceName: 'order-api' });
32
+ const metrics = new Metrics({ namespace: 'OrderService' });
33
+ const tracer = new Tracer({ serviceName: 'order-api' });
34
+
35
+ export const handler = async (event, context) => {
36
+ logger.addContext(context); // Adds requestId, coldStart, etc. from the Lambda context
37
+ logger.info('Processing order', { orderId, customerId });
38
+
39
+ metrics.addMetric('OrderProcessed', MetricUnit.Count, 1);
40
+ metrics.addMetric('OrderAmount', MetricUnit.NoUnit, amount);
41
+
42
+ const segment = tracer.getSegment();
43
+ const subsegment = segment.addNewSubsegment('processPayment');
44
+ // ... business logic
45
+ subsegment.close();
46
+ };
47
+
48
+ Now logs are structured JSON with correlation IDs:
49
+ {
50
+ "level": "INFO",
51
+ "message": "Processing order",
52
+ "service": "order-api",
53
+ "timestamp": "2024-12-01T10:00:00.000Z",
54
+ "xray_trace_id": "1-abc-def",
55
+ "cold_start": false,
56
+ "function_name": "order-api",
57
+ "orderId": "ORD-123",
58
+ "customerId": "CUST-456"
59
+ }
60
+
61
+ Task: Explain Lambda Powertools observability. Write: structured
62
+ logging (Logger), custom metrics (Metrics with EMF), distributed
63
+ tracing (Tracer), how to correlate across functions, CloudWatch
64
+ Logs Insights queries for structured logs, custom dashboards, and
65
+ alerting on business metrics.
66
+
67
+ assertions:
68
+ - type: llm_judge
69
+ criteria: "Powertools components are explained — Logger: structured JSON logging with automatic Lambda context (requestId, functionName, coldStart, xrayTraceId). Supports log levels, child loggers, sensitive data masking. Metrics: publishes CloudWatch metrics via EMF (Embedded Metric Format) — no API calls, written as structured log lines. Supports dimensions for filtering. Tracer: wraps X-Ray SDK, automatic capture of AWS SDK calls, support for custom subsegments. Available for Python, TypeScript, Java, .NET"
70
+ weight: 0.35
71
+ description: "Powertools components"
72
+ - type: llm_judge
73
+ criteria: "Correlation and querying are covered — correlation: trace ID connects logs across functions (X-Ray trace propagated through Lambda invocations). Include business context (orderId, customerId) in all log entries for business-level tracing. CloudWatch Logs Insights: query structured JSON fields directly. Example: fields orderId, @timestamp, level | filter level = 'ERROR' | filter service = 'order-api'. Custom dashboards: visualize custom metrics (order count, error rate, processing time) alongside Lambda system metrics"
74
+ weight: 0.35
75
+ description: "Correlation and querying"
76
+ - type: llm_judge
77
+ criteria: "Business metrics and alerting are practical — EMF metrics: define custom metrics like OrderProcessed, PaymentFailed, OrderAmount. Add dimensions: customer tier, product category. Alert on business metrics: OrderErrors > 5 in 5 minutes, PaymentFailureRate > 2%. Create CloudWatch dashboards per service showing: invocations, errors, duration, custom business metrics. Combine with X-Ray service map for full observability. Cost: Powertools itself is free. CloudWatch metrics: $0.30/metric/month. Logs: $0.50/GB ingested. Budget: set log retention to limit costs"
78
+ weight: 0.30
79
+ description: "Business metrics"
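To make the "no API calls, written as structured log lines" point concrete, here is a minimal sketch of an Embedded Metric Format (EMF) log line built by hand. It does not use Powertools; `buildEmfLine` and `MetricDatum` are hypothetical helpers, and the 49.99 amount is made up. The namespace, service name, and metric names are taken from the scenario above.

```typescript
type Unit = "Count" | "Milliseconds" | "None";

interface MetricDatum {
  name: string;
  unit: Unit;
  value: number;
}

// Build one EMF log line: the _aws block declares which top-level keys are
// metrics and which are dimensions; the values live at the top level.
function buildEmfLine(
  namespace: string,
  dimensions: Record<string, string>,
  metrics: MetricDatum[],
  timestampMs: number,
): string {
  const payload: Record<string, unknown> = {
    _aws: {
      Timestamp: timestampMs,
      CloudWatchMetrics: [
        {
          Namespace: namespace,
          Dimensions: [Object.keys(dimensions)],
          Metrics: metrics.map((m) => ({ Name: m.name, Unit: m.unit })),
        },
      ],
    },
    ...dimensions,
  };
  for (const m of metrics) {
    payload[m.name] = m.value;
  }
  return JSON.stringify(payload);
}

const line = buildEmfLine(
  "OrderService",
  { service: "order-api" },
  [
    { name: "OrderProcessed", unit: "Count", value: 1 },
    { name: "OrderAmount", unit: "None", value: 49.99 }, // made-up amount
  ],
  Date.UTC(2024, 11, 1, 10, 0, 0),
);
console.log(line);
```

Printing a line like this to stdout from a Lambda function is what lets CloudWatch extract the metrics from the log stream; in practice the Powertools Metrics utility produces the payload for you.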
@@ -0,0 +1,80 @@
1
+ meta:
2
+ id: step-functions-debugging
3
+ level: 3
4
+ course: aws-lambda-debugging
5
+ type: output
6
+ description: "Debug Step Functions workflows — diagnose state machine failures, Lambda integration errors, retry policies, and error handling in complex orchestrations"
7
+ tags: [AWS, Lambda, Step-Functions, orchestration, state-machine, workflow, advanced]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your order processing workflow uses Step Functions to orchestrate
13
+ 5 Lambda functions. The workflow fails intermittently:
14
+
15
+ State machine definition:
16
+ StartAt: ValidateOrder
17
+ States:
18
+ ValidateOrder:
19
+ Type: Task
20
+ Resource: arn:aws:lambda:...:validate-order
21
+ Next: CheckInventory
22
+ CheckInventory:
23
+ Type: Task
24
+ Resource: arn:aws:lambda:...:check-inventory
25
+ Next: ProcessPayment
26
+ Catch:
27
+ - ErrorEquals: [States.TaskFailed]
28
+ Next: NotifyOutOfStock
29
+ ProcessPayment:
30
+ Type: Task
31
+ Resource: arn:aws:lambda:...:process-payment
32
+ Retry:
33
+ - ErrorEquals: [PaymentGatewayError]
34
+ IntervalSeconds: 5
35
+ MaxAttempts: 3
36
+ BackoffRate: 2.0
37
+ Next: FulfillOrder
38
+ FulfillOrder:
39
+ Type: Task
40
+ Resource: arn:aws:lambda:...:fulfill-order
41
+ End: true
42
+ NotifyOutOfStock:
43
+ Type: Task
44
+ Resource: arn:aws:lambda:...:notify-customer
45
+ End: true
46
+
47
+ Issue 1: ProcessPayment retries 3 times then fails with:
48
+ "States.TaskFailed" — but there's no Catch for this state!
49
+ The entire workflow fails. Missing error handler after retries
50
+ are exhausted.
51
+
52
+ Issue 2: CheckInventory Lambda returns:
53
+ {"error": "OutOfStock", "item": "SKU-123"}
54
+ But Step Functions expects the Lambda to THROW an error to
55
+ trigger the Catch. Returning an error object is not an error —
56
+ it's a successful invocation with data.
57
+
58
+ Issue 3: Execution history shows the workflow took 45 minutes.
59
+ ProcessPayment waited 5s, then 10s, then 20s between retries
60
+ (exponential backoff, 35s in total). Meanwhile, the customer waited.
61
+
62
+ Task: Explain Step Functions debugging. Write: how Task states
63
+ invoke Lambda (integration patterns), error handling (Catch, Retry),
64
+ the difference between Lambda errors and successful returns, reading
65
+ execution history, timeout and heartbeat configuration, and common
66
+ Step Functions anti-patterns.
67
+
68
+ assertions:
69
+ - type: llm_judge
70
+ criteria: "Lambda integration is explained — Task states can invoke Lambda in two ways: the function ARN directly as Resource, or the optimized integration arn:aws:states:::lambda:invoke (Request Response by default, synchronous). Lambda must THROW an error (not return an error object) to trigger Catch/Retry. Common mistake: returning {error: message} is a successful invocation. Use callback pattern for long-running tasks (arn:aws:states:::lambda:invoke.waitForTaskToken passes a task token to Lambda, and the worker calls SendTaskSuccess/SendTaskFailure when done). Execution input/output: use InputPath, OutputPath, ResultPath to control data flow between states"
71
+ weight: 0.35
72
+ description: "Lambda integration"
73
+ - type: llm_judge
74
+ criteria: "Error handling is covered — Retry: ErrorEquals matches error type, IntervalSeconds for delay, MaxAttempts for retry count, BackoffRate for exponential backoff. Catch: ErrorEquals matches error, Next routes to error handling state, ResultPath stores error details. Always add a catch-all Catch after Retry (handles errors after retries are exhausted). Error types: States.ALL (catch everything), States.TaskFailed (Lambda failure), States.Timeout (task timeout), custom errors thrown by Lambda. Without Catch, unhandled errors fail the entire execution"
75
+ weight: 0.35
76
+ description: "Error handling"
77
+ - type: llm_judge
78
+ criteria: "Debugging and anti-patterns are practical — read execution history: shows each state transition with input/output and timing. Visual workflow view highlights which state failed (red). Anti-patterns: no timeout on Task states (can run indefinitely — set TimeoutSeconds), no Catch after Retry (errors after retries kill the workflow), returning errors instead of throwing them, overly long retry delays for user-facing workflows. Use Step Functions local for development testing. Monitor: ExecutionsFailed, ExecutionThrottled, ExecutionsTimedOut metrics"
79
+ weight: 0.30
80
+ description: "Debugging and anti-patterns"
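Two points from the Step Functions scenario above lend themselves to a short sketch: the retry schedule implied by IntervalSeconds, MaxAttempts, and BackoffRate, and throwing a named error instead of returning an error object. The helper names (`retryDelays`, `OutOfStockError`, `checkInventory`) and the in-memory stock table are hypothetical. The sketch assumes a Node.js Lambda, where the error's `name` is what Step Functions matches against ErrorEquals.

```typescript
// Delay before retry n (0-based): IntervalSeconds * BackoffRate^n.
function retryDelays(intervalSeconds: number, maxAttempts: number, backoffRate: number): number[] {
  return Array.from({ length: maxAttempts }, (_, i) => intervalSeconds * Math.pow(backoffRate, i));
}

// A named error class. Setting `name` explicitly gives the state machine a
// stable string to list under ErrorEquals.
class OutOfStockError extends Error {
  readonly item: string;

  constructor(item: string) {
    super(`Item ${item} is out of stock`);
    this.name = "OutOfStockError";
    this.item = item;
  }
}

// Hypothetical inventory check: it throws rather than returning
// {"error": "OutOfStock"}, so the Task state fails and Catch can route it.
function checkInventory(stock: Record<string, number>, item: string): { item: string; available: number } {
  const available = stock[item] ?? 0;
  if (available <= 0) {
    throw new OutOfStockError(item);
  }
  return { item, available };
}

const delays = retryDelays(5, 3, 2.0);
const totalWait = delays.reduce((a, b) => a + b, 0);
console.log(delays, totalWait);
```

With the ProcessPayment settings from the scenario this gives waits of 5s, 10s, and 20s, or 35 seconds of backoff in total.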
@@ -0,0 +1,67 @@
1
+ meta:
2
+ id: cost-optimization-strategy
3
+ level: 4
4
+ course: aws-lambda-debugging
5
+ type: output
6
+ description: "Design Lambda cost optimization — implement right-sizing, architecture optimizations, and FinOps practices for serverless applications at scale"
7
+ tags: [AWS, Lambda, cost, optimization, FinOps, right-sizing, expert]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your serverless application's AWS bill has grown from $5K to $45K
13
+ per month. The CFO wants answers. Cost Explorer breakdown:
14
+
15
+ Lambda: $18,000 (40%)
16
+ DynamoDB: $9,000 (20%)
17
+ API Gateway: $6,000 (13%)
18
+ CloudWatch: $5,000 (11%)
19
+ S3/Data Transfer: $4,000 (9%)
20
+ Other: $3,000 (7%)
21
+
22
+ Lambda cost analysis:
23
+ - 50 functions, 120M invocations/month
24
+ - Average duration: 800ms
25
+ - Average memory: 512MB (many over-provisioned)
26
+ - 5 functions with provisioned concurrency: $3,200/month
27
+ - ARM (Graviton2): not used (could save 20%)
28
+
29
+ Investigation reveals cost drivers:
30
+
31
+ 1. Over-provisioned memory: 30 functions at 512MB-1024MB but
32
+ Max Memory Used is 50-100MB. They were set high "just in case."
33
+ Right-sizing to actual usage + 20% buffer saves $4,500/month.
34
+
35
+ 2. Unnecessary invocations: a "poller" Lambda runs every 1 second
36
+ checking for SQS messages. SQS event source mapping would
37
+ eliminate 2.5M unnecessary invocations/month ($800).
38
+
39
+ 3. CloudWatch Logs: $5,000/month! Functions log everything at
40
+ DEBUG level in production. Switching to INFO saves 80% of
41
+ log volume ($4,000/month).
42
+
43
+ 4. Provisioned concurrency: 5 functions with 100 provisioned
44
+ concurrent executions each, but traffic only needs 100 total.
45
+ Reduce to 20 per function with auto-scaling.
46
+
47
+ 5. Not using ARM: all functions on x86_64. Graviton2 is 20%
48
+ cheaper and often faster. Simple switch for most functions.
49
+
50
+ Task: Design Lambda cost optimization strategy. Write: memory
51
+ right-sizing methodology, ARM migration, invocation reduction
52
+ patterns, CloudWatch cost management, provisioned concurrency
53
+ optimization, and the FinOps process for ongoing cost governance.
54
+
55
+ assertions:
56
+ - type: llm_judge
57
+ criteria: "Right-sizing methodology is explained — use Lambda Power Tuning to test functions at different memory levels. Analyze Max Memory Used from CloudWatch Logs Insights (filter @type = 'REPORT' | stats avg(@maxMemoryUsed), pct(@maxMemoryUsed, 99) by @logStream; values are reported in bytes). Set memory to p99 + 20% buffer. Remember: memory also controls CPU — some functions need more memory for CPU, not for memory. Monitor after changes: track duration and cost. ARM (Graviton2): 20% cheaper, often faster. Migration: change architecture in function config, rebuild native dependencies for arm64"
58
+ weight: 0.35
59
+ description: "Right-sizing"
60
+ - type: llm_judge
61
+ criteria: "Invocation and infrastructure optimization are covered — reduce invocations: use event source mappings instead of polling, batch processing (increase SQS batch size), run EventBridge scheduled rules (formerly CloudWatch Events) at the longest interval the workload tolerates. CloudWatch costs: set log level to INFO/WARN in production, set log retention (7-30 days), use sampling for high-frequency functions, consider structured logging with selective field logging. API Gateway: use HTTP API (cheaper than REST API) where possible. DynamoDB: on-demand vs provisioned (on-demand more expensive per request but no over-provisioning)"
62
+ weight: 0.35
63
+ description: "Infrastructure optimization"
64
+ - type: llm_judge
65
+ criteria: "FinOps process is practical — tag all Lambda functions (team, environment, service) for cost attribution. Use AWS Cost Explorer with tags for per-service cost breakdown. Set up budgets with alerts ($X threshold per service). Monthly cost review: identify top 10 cost drivers, track cost per transaction/user. Compute Savings Plans: up to 17% additional savings for committed usage. Reserved capacity: use for stable base load, on-demand for burst. Track: cost per invocation, cost per request, cost per user as key metrics. Automate: Lambda Power Tuning on schedule to catch optimization opportunities"
66
+ weight: 0.30
67
+ description: "FinOps process"
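The "p99 + 20% buffer" rule from the cost scenario above can be sketched as a small calculation. The function names are hypothetical, and the savings figure assumes duration stays the same after the change. That assumption often fails because memory also controls CPU, so treat the result as a starting point for Lambda Power Tuning, not a final answer.

```typescript
const MIN_LAMBDA_MEMORY_MB = 128; // Lambda's minimum configurable memory

// Recommend p99 Max Memory Used plus a buffer, rounded up to a whole MB
// and never below the Lambda minimum.
function recommendMemoryMb(p99MaxMemoryUsedMb: number, bufferFraction = 0.2): number {
  const withBuffer = Math.ceil(p99MaxMemoryUsedMb * (1 + bufferFraction));
  return Math.max(MIN_LAMBDA_MEMORY_MB, withBuffer);
}

// Fraction of the duration charge saved, ASSUMING duration is unchanged.
// The duration charge scales linearly with configured memory.
function durationChargeSavings(currentMb: number, recommendedMb: number): number {
  return 1 - recommendedMb / currentMb;
}

// Scenario figures: configured at 512MB, Max Memory Used around 100MB.
const recommended = recommendMemoryMb(100);
const savings = durationChargeSavings(512, recommended);
console.log(recommended, savings);
```

For the scenario's over-provisioned functions the rule lands on the 128MB floor, which would cut the duration charge by 75% if duration did not change.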
@@ -0,0 +1,62 @@
1
+ meta:
2
+ id: expert-debugging-shift
3
+ level: 4
4
+ course: aws-lambda-debugging
5
+ type: output
6
+ description: "Combined expert debugging shift — diagnose and design solutions for an enterprise serverless platform with architecture, security, cost, and operational challenges"
7
+ tags: [AWS, Lambda, troubleshooting, combined, shift-simulation, expert]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ You're hired as the serverless platform architect for a fintech
13
+ company. The existing Lambda-based platform has grown organically
14
+ over 3 years and has significant technical debt.
15
+
16
+ Assessment findings:
17
+
18
+ Architecture:
19
+ - 200+ Lambda functions across 3 AWS accounts (dev, staging, prod)
20
+ - No naming convention — functions named randomly
21
+ - 40 functions with inline code (console editor, not IaC!)
22
+ - Step Functions workflows with 20+ states and no error handling
23
+ - Mixed deployment methods: SAM, Serverless Framework, CDK, console
24
+
25
+ Security:
26
+ - 30% of functions share a single IAM role with AdministratorAccess
27
+ - API Gateway endpoints with no authentication
28
+ - Secrets stored as plain-text environment variables
29
+ - No dependency scanning — last audit found 85 CRITICAL CVEs
30
+ - Docker socket pattern found in 5 CI/CD Lambda functions
31
+
32
+ Operations:
33
+ - No structured logging — console.log everywhere
34
+ - CloudWatch Logs retention: unlimited (costing $3K/month alone)
35
+ - No X-Ray tracing enabled
36
+ - On-call rotation: 2 people who "know how it works"
37
+ - MTTR: 4+ hours average
38
+
39
+ Cost:
40
+ - Monthly bill: $65K (Lambda $28K, CloudWatch $8K, API GW $12K,
41
+ DynamoDB $10K, other $7K)
42
+ - 50% of functions over-provisioned (1GB memory, using 100MB)
43
+ - 15 functions with provisioned concurrency never utilizing it
44
+
45
+ Task: Design the 6-month platform improvement plan. Write: the
46
+ priority ranking (security first!), IaC standardization approach,
47
+ security remediation, observability implementation, cost optimization
48
+ targets, and the team structure needed to sustain the platform.
49
+
50
+ assertions:
51
+ - type: llm_judge
52
+ criteria: "Priority ranking is justified — Month 1: Security remediation (replace AdministratorAccess roles, add API authentication, move secrets to Secrets Manager, scan dependencies). Month 2: IaC migration (move inline functions to SAM/CDK, standardize on one IaC tool). Month 3: Observability (Powertools Logger + X-Ray + custom metrics, set log retention). Month 4: Cost optimization (right-size memory, remove unused provisioned concurrency, ARM migration). Month 5-6: Architecture improvements (error handling in Step Functions, standardize deployment, documentation)"
53
+ weight: 0.35
54
+ description: "Priority ranking"
55
+ - type: llm_judge
56
+ criteria: "Security and IaC remediation are covered — security: per-function IAM roles with least privilege (automated with IAM Access Analyzer), Cognito or Lambda authorizers on all APIs, Secrets Manager with automatic rotation, dependency scanning in CI (block on CRITICAL). IaC: standardize on CDK or SAM (pick one), migrate inline functions to IaC (start with critical functions), use infrastructure review in PR process. Naming convention: {service}-{function}-{env}. Tag all resources: team, service, environment, cost-center"
57
+ weight: 0.35
58
+ description: "Security and IaC"
59
+ - type: llm_judge
60
+ criteria: "Cost and team structure are practical — cost optimization: right-size memory (target: $10K Lambda reduction), set log retention to 30 days ($6K CloudWatch savings), remove unused provisioned concurrency ($2K savings), switch to HTTP API where possible ($4K API GW savings). Total target: $22K/month savings (34%). Team: minimum 3-person platform team (1 security-focused, 1 observability, 1 developer experience). On-call: expand rotation to 5+ engineers with runbooks. Success metrics: MTTR < 30 min, security findings < 5, IaC coverage > 95%, cost trend decreasing quarter over quarter"
61
+ weight: 0.30
62
+ description: "Cost and team"
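The `{service}-{function}-{env}` naming convention from the assessment above can be enforced with a small check, for example in CI. This is a sketch under stated assumptions: the allowed character set (lowercase letters, digits, underscores within a segment) is a choice made here, not something the scenario specifies, and `parseFunctionName` is a hypothetical helper. The three environments come from the scenario.

```typescript
// ASSUMPTION: each segment is lowercase alphanumeric, with underscores
// allowed inside a segment, so hyphens only ever separate the three parts.
// Environments are the three accounts named in the assessment.
const NAME_PATTERN = /^([a-z0-9_]+)-([a-z0-9_]+)-(dev|staging|prod)$/;

interface ParsedName {
  service: string;
  fn: string;
  env: string;
}

// Return the parsed parts, or null when the name breaks the convention.
function parseFunctionName(name: string): ParsedName | null {
  const match = NAME_PATTERN.exec(name);
  if (match === null) {
    return null;
  }
  return { service: match[1], fn: match[2], env: match[3] };
}

console.log(parseFunctionName("orders-validate-prod"));
console.log(parseFunctionName("myFunc2"));
```

A check like this can run against the deployed function list to track how many of the 200+ functions still need renaming.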
@@ -0,0 +1,61 @@
1
+ meta:
2
+ id: incident-management-serverless
3
+ level: 4
4
+ course: aws-lambda-debugging
5
+ type: output
6
+ description: "Design serverless incident management — implement on-call processes, runbooks, and incident response procedures for Lambda-based applications"
7
+ tags: [AWS, Lambda, incident-management, on-call, runbooks, SRE, expert]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your team has had 15 production incidents in the past quarter.
13
+ Average MTTR (Mean Time To Resolve): 2.5 hours. The engineering
14
+ VP wants MTTR under 30 minutes.
15
+
16
+ Analysis of past incidents:
17
+
18
+ Incident 1 (3.5 hours):
19
+ - Alert: "Lambda Errors > 100/minute"
20
+ - 45 minutes spent finding which function
21
+ - 30 minutes finding the root cause (DynamoDB throttling)
22
+ - 1 hour trying different fixes
23
+ - 1 hour implementing and deploying the fix
24
+ - 15 minutes verifying
25
+
26
+ Incident 2 (4 hours):
27
+ - No alert! Customer reported "orders not processing"
28
+ - 1 hour investigating (SQS DLQ had 10K messages)
29
+ - 1 hour finding root cause (Lambda VPC ENI limit reached)
30
+ - 1 hour requesting ENI limit increase
31
+ - 1 hour waiting for AWS support + reprocessing DLQ
32
+
33
+ Incident 3 (1 hour):
34
+ - Alert: Step Functions ExecutionsFailed
35
+ - On-call engineer knew the system, diagnosed in 10 minutes
36
+ - Fix: increase Lambda timeout from 30s to 60s
37
+ - Deploy and verify in 50 minutes
38
+
39
+ Patterns: incidents are slow when engineers don't know the system,
40
+ when alerts don't provide enough context, and when there's no
41
+ runbook.
42
+
43
+ Task: Design the serverless incident management process. Write:
44
+ alert design (actionable alerts with context), runbook structure
45
+ for serverless (per-function and per-workflow), on-call rotation
46
+ and escalation, automated remediation (Lambda-based auto-fixing),
47
+ post-incident review process, and SLO/SLI framework for Lambda.
48
+
49
+ assertions:
50
+ - type: llm_judge
51
+ criteria: "Alert design is actionable — alerts must include: what is broken (function/workflow name), impact (customer-facing? internal?), severity (P1-P3), link to dashboard, link to runbook. Serverless-specific alerts: Lambda Errors by function (not account-wide), Step Functions ExecutionsFailed, SQS ApproximateAgeOfOldestMessage (processing lag), DLQ message count > 0 (something is failing silently), EventBridge FailedInvocations. Reduce noise: use anomaly detection instead of static thresholds, composite alarms for related metrics"
52
+ weight: 0.35
53
+ description: "Alert design"
54
+ - type: llm_judge
55
+ criteria: "Runbooks and automation are covered — runbook per function: what it does, dependencies, common failure modes, diagnostic commands, fix procedures. Automated runbook steps: link to CloudWatch Logs Insights query (pre-built), link to X-Ray trace search, link to function configuration. Automated remediation: Lambda function triggered by CloudWatch alarm that can: increase concurrency limits, restart event source mappings, reprocess DLQ messages, scale DynamoDB capacity. Guard rails: automated remediation must be safe (idempotent, bounded, logged)"
56
+ weight: 0.35
57
+ description: "Runbooks and automation"
58
+ - type: llm_judge
59
+ criteria: "SLOs and process are practical — define SLOs: order completion rate > 99.5%, API p99 latency < 2 seconds, payment success rate > 99.9%. SLIs: measure from CloudWatch custom metrics. Error budget: 0.5% error budget for order completion — when budget is consumed, focus on reliability instead of features. On-call rotation: weekly rotation, 2 tiers (primary responds in 5 min, secondary backup). Post-incident: blameless review within 48 hours, identify: timeline, root cause, detection gap, prevention measures. Track: MTTD (detect), MTTR (resolve), incident count, error budget consumption"
60
+ weight: 0.30
61
+ description: "SLOs and process"
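The error budget idea in the incident management scenario above reduces to simple arithmetic. The sketch below uses the 99.5% order completion SLO from the scenario; the order counts are made-up inputs, and `errorBudget` is a hypothetical helper, not part of any AWS tooling.

```typescript
interface BudgetStatus {
  allowedFailures: number; // failures the SLO tolerates in the window
  budgetConsumed: number; // fraction of the budget used; 1 or more means spent
  exhausted: boolean;
}

// sloPercent is the target success rate, e.g. 99.5 for 99.5%.
function errorBudget(sloPercent: number, totalEvents: number, failedEvents: number): BudgetStatus {
  const allowedFailures = (totalEvents * (100 - sloPercent)) / 100;
  const budgetConsumed =
    allowedFailures === 0 ? (failedEvents > 0 ? Infinity : 0) : failedEvents / allowedFailures;
  return { allowedFailures, budgetConsumed, exhausted: budgetConsumed >= 1 };
}

// Made-up window: 100,000 orders, 200 of which failed to complete.
const status = errorBudget(99.5, 100000, 200);
console.log(status);
```

In this example the SLO tolerates 500 failed orders, so 200 failures consume 40% of the budget and feature work can continue.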
@@ -0,0 +1,67 @@
1
+ meta:
2
+ id: multi-region-serverless
3
+ level: 4
4
+ course: aws-lambda-debugging
5
+ type: output
6
+ description: "Design multi-region serverless architecture — implement active-active deployments, data replication, and failover for globally distributed Lambda applications"
7
+ tags: [AWS, Lambda, multi-region, global, failover, disaster-recovery, expert]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your serverless application serves users globally. A us-east-1
13
+ outage took the entire service down for 4 hours. The board demands
14
+ multi-region resilience. Current architecture (single region):
15
+
16
+ Route53 → API Gateway (us-east-1) → Lambda → DynamoDB
17
+
18
+ Target: active-active across us-east-1 and eu-west-1 with
19
+ automatic failover. Users should be routed to the nearest region
20
+ with < 200ms latency.
21
+
22
+ Design challenges:
23
+
24
+ 1. Data replication — DynamoDB Global Tables:
25
+ Automatically replicates data across regions with ~1 second
26
+ latency. Last-writer-wins conflict resolution. But: Global
27
+ Tables cost 2x (write replicated to all regions).
28
+
29
+ 2. API deployment — same code, multiple regions:
30
+ Deploy same Lambda functions and API Gateway to both regions.
31
+ Use CloudFormation StackSets or CDK Pipelines for multi-region deployment.
32
+ Route53 health checks route traffic to healthy region.
33
+
34
+ 3. Event processing — how to handle events in both regions:
35
+ SQS queues are regional. S3 events trigger in the bucket's
36
+ region. EventBridge buses are regional but can forward events cross-region.
37
+ Challenge: prevent duplicate processing when both regions
38
+ process the same event.
39
+
40
+ 4. State management — avoid split-brain:
41
+ If both regions write to the same DynamoDB item simultaneously,
42
+ last-writer-wins may lose data. Design for conflict resolution
43
+ or use conditional writes.
44
+
45
+ 5. Deployment coordination — how to deploy across regions safely:
46
+ Deploy to secondary first, test, then deploy to primary.
47
+ Canary deployment per region before full rollout.
48
+
49
+ Task: Design multi-region serverless architecture. Write: the
50
+ active-active vs active-passive trade-off, data replication
51
+ strategies (DynamoDB Global Tables, S3 Cross-Region Replication),
52
+ routing and failover (Route53, CloudFront), event processing in
53
+ multi-region, deployment strategy, and cost analysis.
54
+
55
+ assertions:
56
+ - type: llm_judge
57
+ criteria: "Active-active architecture is explained — active-active: both regions serve traffic simultaneously, Route53 latency-based routing sends users to nearest region. RPO: ~0 (replicated data), RTO: ~0 (automatic failover). Active-passive: secondary region on standby, Route53 failover routing switches on health check failure. RPO: replication lag, RTO: DNS propagation (60-300 seconds). Cost: active-active costs ~2x for compute and data replication. Decision: active-active for global users or strict availability requirements, active-passive for regional with DR needs"
58
+ weight: 0.35
59
+ description: "Active-active design"
60
+ - type: llm_judge
61
+ criteria: "Data and event replication are covered — DynamoDB Global Tables: automatic multi-master replication, ~1s lag, last-writer-wins. Design for eventual consistency. S3: Cross-Region Replication for object storage. SQS: regional only — deploy separate queues per region, route events to appropriate region. EventBridge: buses are regional, but rules can target a bus in another region (or use global endpoints) for cross-region events. Idempotency: critical in multi-region — same event may be processed in both regions. Use idempotency keys and conditional DynamoDB writes. Conflict resolution: design data model to avoid conflicts (partition by region, use timestamps)"
62
+ weight: 0.35
63
+ description: "Data and events"
64
+ - type: llm_judge
65
+ criteria: "Deployment and operations are practical — multi-region deployment: CDK Pipelines or CloudFormation StackSets deploy to all regions. Deploy secondary first, run integration tests, then deploy primary (blue-green across regions). Route53 health checks: check API Gateway endpoint in each region, failover if unhealthy. CloudFront: distribute static assets globally, origin failover group for API. Monitoring: centralized dashboard showing both regions (CloudWatch cross-account/cross-region). Cost: estimate additional cost of multi-region (2x DynamoDB, 2x Lambda, Route53 health checks). Justify: compare cost of multi-region to cost of downtime"
66
+ weight: 0.30
67
+ description: "Deployment and operations"
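The idempotency-key pattern named in the multi-region scenario above can be sketched with an in-memory store. `IdempotencyStore` and `processOnce` are hypothetical; the `Map` stands in for a table written with a conditional put (for DynamoDB, a condition such as `attribute_not_exists` on the key). Note the simplification: this models a single shared store. With Global Tables a conditional write is evaluated against the local replica only, so two regions can both succeed within the replication lag, and the data model still has to tolerate that.

```typescript
// In-memory stand-in for a table that supports conditional writes.
class IdempotencyStore {
  private readonly seen = new Map<string, string>();

  // Mimics a conditional put: succeeds only if the key does not exist yet.
  putIfAbsent(key: string, region: string): boolean {
    if (this.seen.has(key)) {
      return false;
    }
    this.seen.set(key, region);
    return true;
  }
}

// Run the side effect only for the first delivery of a given event ID.
function processOnce(
  store: IdempotencyStore,
  eventId: string,
  region: string,
  sideEffect: () => void,
): boolean {
  if (!store.putIfAbsent(eventId, region)) {
    return false; // duplicate delivery, skip the side effect
  }
  sideEffect();
  return true;
}

const store = new IdempotencyStore();
let charges = 0;
const first = processOnce(store, "evt-123", "us-east-1", () => {
  charges += 1;
});
const second = processOnce(store, "evt-123", "eu-west-1", () => {
  charges += 1;
});
console.log(first, second, charges);
```

Here the same event arrives in both regions but the side effect runs once; the second delivery is recognised as a duplicate and skipped.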