dojo.md 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (243)
  1. package/courses/GENERATION_LOG.md +27 -0
  2. package/courses/api-error-handling/course.yaml +16 -0
  3. package/courses/api-error-handling/scenarios/level-1/error-response-format.yaml +131 -0
  4. package/courses/api-error-handling/scenarios/level-1/http-status-codes-basics.yaml +90 -0
  5. package/courses/api-error-handling/scenarios/level-1/rate-limiting-basics.yaml +135 -0
  6. package/courses/api-error-handling/scenarios/level-1/request-validation-errors.yaml +208 -0
  7. package/courses/api-error-handling/scenarios/level-2/circuit-breaker-pattern.yaml +189 -0
  8. package/courses/api-error-handling/scenarios/level-2/idempotency-retry-logic.yaml +159 -0
  9. package/courses/api-error-handling/scenarios/level-2/rfc-7807-problem-details.yaml +178 -0
  10. package/courses/api-error-handling/scenarios/level-2/webhook-error-handling.yaml +211 -0
  11. package/courses/api-error-handling/scenarios/level-3/distributed-tracing-errors.yaml +275 -0
  12. package/courses/github-actions-cicd/course.yaml +10 -0
  13. package/courses/github-actions-cicd/scenarios/level-1/actions-and-runners.yaml +58 -0
  14. package/courses/github-actions-cicd/scenarios/level-1/basic-workflow-syntax.yaml +52 -0
  15. package/courses/github-actions-cicd/scenarios/level-1/branch-protection-checks.yaml +63 -0
  16. package/courses/github-actions-cicd/scenarios/level-1/environment-variables-secrets.yaml +65 -0
  17. package/courses/github-actions-cicd/scenarios/level-1/first-cicd-shift.yaml +62 -0
  18. package/courses/github-actions-cicd/scenarios/level-1/job-dependencies-outputs.yaml +62 -0
  19. package/courses/github-actions-cicd/scenarios/level-1/simple-ci-pipeline.yaml +57 -0
  20. package/courses/github-actions-cicd/scenarios/level-1/workflow-debugging.yaml +90 -0
  21. package/courses/github-actions-cicd/scenarios/level-1/workflow-status-notifications.yaml +59 -0
  22. package/courses/github-actions-cicd/scenarios/level-1/workflow-triggers.yaml +56 -0
  23. package/courses/github-actions-cicd/scenarios/level-2/concurrency-control.yaml +58 -0
  24. package/courses/github-actions-cicd/scenarios/level-2/conditional-execution.yaml +60 -0
  25. package/courses/github-actions-cicd/scenarios/level-2/custom-actions-development.yaml +55 -0
  26. package/courses/github-actions-cicd/scenarios/level-2/dependency-caching.yaml +58 -0
  27. package/courses/github-actions-cicd/scenarios/level-2/deployment-workflows.yaml +61 -0
  28. package/courses/github-actions-cicd/scenarios/level-2/github-packages-publishing.yaml +59 -0
  29. package/courses/github-actions-cicd/scenarios/level-2/intermediate-cicd-shift.yaml +68 -0
  30. package/courses/github-actions-cicd/scenarios/level-2/matrix-builds.yaml +59 -0
  31. package/courses/github-actions-cicd/scenarios/level-2/reusable-workflows.yaml +61 -0
  32. package/courses/github-actions-cicd/scenarios/level-2/workflow-cost-optimization.yaml +61 -0
  33. package/courses/github-actions-cicd/scenarios/level-3/advanced-cicd-shift.yaml +64 -0
  34. package/courses/github-actions-cicd/scenarios/level-3/compliance-automation.yaml +68 -0
  35. package/courses/github-actions-cicd/scenarios/level-3/docker-action-development.yaml +65 -0
  36. package/courses/github-actions-cicd/scenarios/level-3/github-environments.yaml +65 -0
  37. package/courses/github-actions-cicd/scenarios/level-3/monorepo-ci.yaml +68 -0
  38. package/courses/github-actions-cicd/scenarios/level-3/oidc-cloud-deployments.yaml +55 -0
  39. package/courses/github-actions-cicd/scenarios/level-3/release-automation.yaml +61 -0
  40. package/courses/github-actions-cicd/scenarios/level-3/security-hardening.yaml +63 -0
  41. package/courses/github-actions-cicd/scenarios/level-3/self-hosted-runners.yaml +60 -0
  42. package/courses/github-actions-cicd/scenarios/level-3/workflow-optimization.yaml +59 -0
  43. package/courses/github-actions-cicd/scenarios/level-4/cicd-data-architecture.yaml +63 -0
  44. package/courses/github-actions-cicd/scenarios/level-4/cicd-economics-roi.yaml +63 -0
  45. package/courses/github-actions-cicd/scenarios/level-4/cicd-executive-communication.yaml +58 -0
  46. package/courses/github-actions-cicd/scenarios/level-4/cicd-incident-response.yaml +60 -0
  47. package/courses/github-actions-cicd/scenarios/level-4/cicd-org-design.yaml +59 -0
  48. package/courses/github-actions-cicd/scenarios/level-4/cicd-platform-architecture.yaml +63 -0
  49. package/courses/github-actions-cicd/scenarios/level-4/cicd-training-program.yaml +65 -0
  50. package/courses/github-actions-cicd/scenarios/level-4/cicd-vendor-evaluation.yaml +59 -0
  51. package/courses/github-actions-cicd/scenarios/level-4/enterprise-cicd-governance.yaml +55 -0
  52. package/courses/github-actions-cicd/scenarios/level-4/expert-cicd-shift.yaml +60 -0
  53. package/courses/github-actions-cicd/scenarios/level-5/cicd-ai-future.yaml +63 -0
  54. package/courses/github-actions-cicd/scenarios/level-5/cicd-behavioral-science.yaml +70 -0
  55. package/courses/github-actions-cicd/scenarios/level-5/cicd-board-strategy.yaml +56 -0
  56. package/courses/github-actions-cicd/scenarios/level-5/cicd-consulting-engagement.yaml +61 -0
  57. package/courses/github-actions-cicd/scenarios/level-5/cicd-industry-benchmarks.yaml +63 -0
  58. package/courses/github-actions-cicd/scenarios/level-5/cicd-ma-integration.yaml +73 -0
  59. package/courses/github-actions-cicd/scenarios/level-5/cicd-product-development.yaml +68 -0
  60. package/courses/github-actions-cicd/scenarios/level-5/cicd-regulatory-landscape.yaml +72 -0
  61. package/courses/github-actions-cicd/scenarios/level-5/comprehensive-cicd-system.yaml +66 -0
  62. package/courses/github-actions-cicd/scenarios/level-5/master-cicd-shift.yaml +76 -0
  63. package/courses/github-pr-review/scenarios/level-2/api-change-review.yaml +82 -0
  64. package/courses/github-pr-review/scenarios/level-2/automated-review-tooling.yaml +53 -0
  65. package/courses/github-pr-review/scenarios/level-2/cross-team-review.yaml +61 -0
  66. package/courses/github-pr-review/scenarios/level-2/intermediate-review-shift.yaml +66 -0
  67. package/courses/github-pr-review/scenarios/level-2/performance-review-patterns.yaml +99 -0
  68. package/courses/github-pr-review/scenarios/level-2/review-disagreement-resolution.yaml +64 -0
  69. package/courses/github-pr-review/scenarios/level-2/review-metrics-analysis.yaml +63 -0
  70. package/courses/github-pr-review/scenarios/level-2/review-turnaround-sla.yaml +54 -0
  71. package/courses/github-pr-review/scenarios/level-2/stacked-pr-review.yaml +65 -0
  72. package/courses/github-pr-review/scenarios/level-3/advanced-review-shift.yaml +65 -0
  73. package/courses/github-pr-review/scenarios/level-3/ai-powered-review.yaml +58 -0
  74. package/courses/github-pr-review/scenarios/level-3/compliance-review-process.yaml +64 -0
  75. package/courses/github-pr-review/scenarios/level-3/cross-functional-review.yaml +60 -0
  76. package/courses/github-pr-review/scenarios/level-3/incident-driven-review.yaml +63 -0
  77. package/courses/github-pr-review/scenarios/level-3/large-scale-review-operations.yaml +55 -0
  78. package/courses/github-pr-review/scenarios/level-3/monorepo-review-process.yaml +68 -0
  79. package/courses/github-pr-review/scenarios/level-3/review-automation-platform.yaml +61 -0
  80. package/courses/github-pr-review/scenarios/level-3/review-culture-design.yaml +62 -0
  81. package/courses/github-pr-review/scenarios/level-3/review-data-pipeline.yaml +62 -0
  82. package/courses/github-pr-review/scenarios/level-4/enterprise-review-operations.yaml +61 -0
  83. package/courses/github-pr-review/scenarios/level-4/expert-review-shift.yaml +62 -0
  84. package/courses/github-pr-review/scenarios/level-4/review-data-architecture.yaml +69 -0
  85. package/courses/github-pr-review/scenarios/level-4/review-economics-roi.yaml +63 -0
  86. package/courses/github-pr-review/scenarios/level-4/review-executive-communication.yaml +61 -0
  87. package/courses/github-pr-review/scenarios/level-4/review-incident-postmortem.yaml +69 -0
  88. package/courses/github-pr-review/scenarios/level-4/review-org-design.yaml +62 -0
  89. package/courses/github-pr-review/scenarios/level-4/review-platform-architecture.yaml +64 -0
  90. package/courses/github-pr-review/scenarios/level-4/review-training-program.yaml +66 -0
  91. package/courses/github-pr-review/scenarios/level-4/review-vendor-evaluation.yaml +76 -0
  92. package/courses/github-pr-review/scenarios/level-5/comprehensive-review-system.yaml +68 -0
  93. package/courses/github-pr-review/scenarios/level-5/master-review-shift.yaml +73 -0
  94. package/courses/github-pr-review/scenarios/level-5/review-ai-future.yaml +69 -0
  95. package/courses/github-pr-review/scenarios/level-5/review-behavioral-science.yaml +66 -0
  96. package/courses/github-pr-review/scenarios/level-5/review-board-strategy.yaml +62 -0
  97. package/courses/github-pr-review/scenarios/level-5/review-consulting-engagement.yaml +62 -0
  98. package/courses/github-pr-review/scenarios/level-5/review-devtools-product.yaml +71 -0
  99. package/courses/github-pr-review/scenarios/level-5/review-industry-benchmarks.yaml +64 -0
  100. package/courses/github-pr-review/scenarios/level-5/review-ma-integration.yaml +76 -0
  101. package/courses/github-pr-review/scenarios/level-5/review-regulatory-landscape.yaml +78 -0
  102. package/courses/postgresql-query-optimization/course.yaml +11 -0
  103. package/courses/postgresql-query-optimization/scenarios/level-1/explain-analyze-basics.yaml +80 -0
  104. package/courses/postgresql-query-optimization/scenarios/level-1/first-optimization-shift.yaml +77 -0
  105. package/courses/postgresql-query-optimization/scenarios/level-1/index-fundamentals.yaml +76 -0
  106. package/courses/postgresql-query-optimization/scenarios/level-1/join-basics.yaml +73 -0
  107. package/courses/postgresql-query-optimization/scenarios/level-1/n-plus-one-queries.yaml +62 -0
  108. package/courses/postgresql-query-optimization/scenarios/level-1/query-rewriting-basics.yaml +69 -0
  109. package/courses/postgresql-query-optimization/scenarios/level-1/select-star-problems.yaml +69 -0
  110. package/courses/postgresql-query-optimization/scenarios/level-1/slow-query-diagnosis.yaml +63 -0
  111. package/courses/postgresql-query-optimization/scenarios/level-1/vacuum-and-statistics.yaml +62 -0
  112. package/courses/postgresql-query-optimization/scenarios/level-1/where-clause-optimization.yaml +74 -0
  113. package/courses/postgresql-query-optimization/scenarios/level-2/autovacuum-tuning.yaml +76 -0
  114. package/courses/postgresql-query-optimization/scenarios/level-2/composite-index-design.yaml +81 -0
  115. package/courses/postgresql-query-optimization/scenarios/level-2/covering-indexes.yaml +74 -0
  116. package/courses/postgresql-query-optimization/scenarios/level-2/cte-optimization.yaml +83 -0
  117. package/courses/postgresql-query-optimization/scenarios/level-2/intermediate-optimization-shift.yaml +66 -0
  118. package/courses/postgresql-query-optimization/scenarios/level-2/join-optimization.yaml +72 -0
  119. package/courses/postgresql-query-optimization/scenarios/level-2/partial-and-expression-indexes.yaml +75 -0
  120. package/courses/postgresql-query-optimization/scenarios/level-2/query-planner-settings.yaml +62 -0
  121. package/courses/postgresql-query-optimization/scenarios/level-2/subquery-optimization.yaml +67 -0
  122. package/courses/postgresql-query-optimization/scenarios/level-2/window-function-optimization.yaml +63 -0
  123. package/courses/postgresql-query-optimization/scenarios/level-3/advanced-optimization-shift.yaml +71 -0
  124. package/courses/postgresql-query-optimization/scenarios/level-3/connection-pooling.yaml +60 -0
  125. package/courses/postgresql-query-optimization/scenarios/level-3/full-text-search-optimization.yaml +66 -0
  126. package/courses/postgresql-query-optimization/scenarios/level-3/jsonb-optimization.yaml +88 -0
  127. package/courses/postgresql-query-optimization/scenarios/level-3/lock-contention-analysis.yaml +80 -0
  128. package/courses/postgresql-query-optimization/scenarios/level-3/materialized-view-optimization.yaml +73 -0
  129. package/courses/postgresql-query-optimization/scenarios/level-3/parallel-query-execution.yaml +74 -0
  130. package/courses/postgresql-query-optimization/scenarios/level-3/partitioning-strategies.yaml +71 -0
  131. package/courses/postgresql-query-optimization/scenarios/level-3/specialized-index-types.yaml +67 -0
  132. package/courses/postgresql-query-optimization/scenarios/level-3/write-optimization.yaml +65 -0
  133. package/courses/postgresql-query-optimization/scenarios/level-4/data-architecture-analytics.yaml +64 -0
  134. package/courses/postgresql-query-optimization/scenarios/level-4/database-executive-communication.yaml +64 -0
  135. package/courses/postgresql-query-optimization/scenarios/level-4/database-migration-planning.yaml +57 -0
  136. package/courses/postgresql-query-optimization/scenarios/level-4/enterprise-database-governance.yaml +52 -0
  137. package/courses/postgresql-query-optimization/scenarios/level-4/expert-optimization-shift.yaml +73 -0
  138. package/courses/postgresql-query-optimization/scenarios/level-4/high-availability-architecture.yaml +62 -0
  139. package/courses/postgresql-query-optimization/scenarios/level-4/optimizer-internals.yaml +69 -0
  140. package/courses/postgresql-query-optimization/scenarios/level-4/performance-sla-design.yaml +58 -0
  141. package/courses/postgresql-query-optimization/scenarios/level-4/read-replica-optimization.yaml +62 -0
  142. package/courses/postgresql-query-optimization/scenarios/level-4/vendor-evaluation.yaml +73 -0
  143. package/courses/rest-api-error-handling/course.yaml +11 -0
  144. package/courses/rest-api-error-handling/scenarios/level-1/authentication-errors.yaml +71 -0
  145. package/courses/rest-api-error-handling/scenarios/level-1/content-negotiation-errors.yaml +63 -0
  146. package/courses/rest-api-error-handling/scenarios/level-1/error-logging-basics.yaml +63 -0
  147. package/courses/rest-api-error-handling/scenarios/level-1/error-response-format.yaml +58 -0
  148. package/courses/rest-api-error-handling/scenarios/level-1/first-error-handling-shift.yaml +67 -0
  149. package/courses/rest-api-error-handling/scenarios/level-1/http-status-codes.yaml +46 -0
  150. package/courses/rest-api-error-handling/scenarios/level-1/not-found-errors.yaml +52 -0
  151. package/courses/rest-api-error-handling/scenarios/level-1/rate-limiting-errors.yaml +56 -0
  152. package/courses/rest-api-error-handling/scenarios/level-1/request-validation-errors.yaml +59 -0
  153. package/courses/rest-api-error-handling/scenarios/level-1/server-error-handling.yaml +55 -0
  154. package/courses/rest-api-error-handling/scenarios/level-2/api-versioning-errors.yaml +66 -0
  155. package/courses/rest-api-error-handling/scenarios/level-2/batch-request-errors.yaml +61 -0
  156. package/courses/rest-api-error-handling/scenarios/level-2/circuit-breaker-pattern.yaml +52 -0
  157. package/courses/rest-api-error-handling/scenarios/level-2/error-code-taxonomy.yaml +62 -0
  158. package/courses/rest-api-error-handling/scenarios/level-2/error-monitoring-alerting.yaml +53 -0
  159. package/courses/rest-api-error-handling/scenarios/level-2/intermediate-error-shift.yaml +69 -0
  160. package/courses/rest-api-error-handling/scenarios/level-2/pagination-errors.yaml +66 -0
  161. package/courses/rest-api-error-handling/scenarios/level-2/retry-and-idempotency.yaml +60 -0
  162. package/courses/rest-api-error-handling/scenarios/level-2/rfc7807-problem-details.yaml +60 -0
  163. package/courses/rest-api-error-handling/scenarios/level-2/webhook-error-handling.yaml +55 -0
  164. package/courses/rest-api-error-handling/scenarios/level-3/advanced-error-shift.yaml +72 -0
  165. package/courses/rest-api-error-handling/scenarios/level-3/api-gateway-errors.yaml +71 -0
  166. package/courses/rest-api-error-handling/scenarios/level-3/async-api-errors.yaml +67 -0
  167. package/courses/rest-api-error-handling/scenarios/level-3/caching-error-scenarios.yaml +65 -0
  168. package/courses/rest-api-error-handling/scenarios/level-3/chaos-engineering-apis.yaml +62 -0
  169. package/courses/rest-api-error-handling/scenarios/level-3/database-error-handling.yaml +79 -0
  170. package/courses/rest-api-error-handling/scenarios/level-3/distributed-error-propagation.yaml +63 -0
  171. package/courses/rest-api-error-handling/scenarios/level-3/error-budgets-sre.yaml +61 -0
  172. package/courses/rest-api-error-handling/scenarios/level-3/error-correlation.yaml +58 -0
  173. package/courses/rest-api-error-handling/scenarios/level-3/graphql-vs-rest-errors.yaml +73 -0
  174. package/courses/rest-api-error-handling/scenarios/level-4/compliance-error-handling.yaml +65 -0
  175. package/courses/rest-api-error-handling/scenarios/level-4/enterprise-error-governance.yaml +62 -0
  176. package/courses/rest-api-error-handling/scenarios/level-4/error-analytics-platform.yaml +65 -0
  177. package/courses/rest-api-error-handling/scenarios/level-4/error-cost-optimization.yaml +63 -0
  178. package/courses/rest-api-error-handling/scenarios/level-4/error-executive-communication.yaml +60 -0
  179. package/courses/rest-api-error-handling/scenarios/level-4/error-handling-architecture.yaml +67 -0
  180. package/courses/rest-api-error-handling/scenarios/level-4/error-org-design.yaml +68 -0
  181. package/courses/rest-api-error-handling/scenarios/level-4/error-sla-design.yaml +65 -0
  182. package/courses/rest-api-error-handling/scenarios/level-4/error-training-program.yaml +61 -0
  183. package/courses/rest-api-error-handling/scenarios/level-4/expert-error-shift.yaml +63 -0
  184. package/courses/rest-api-error-handling/scenarios/level-5/comprehensive-error-system.yaml +68 -0
  185. package/courses/rest-api-error-handling/scenarios/level-5/error-ai-future.yaml +75 -0
  186. package/courses/rest-api-error-handling/scenarios/level-5/error-behavioral-science.yaml +73 -0
  187. package/courses/rest-api-error-handling/scenarios/level-5/error-board-strategy.yaml +60 -0
  188. package/courses/rest-api-error-handling/scenarios/level-5/error-consulting-engagement.yaml +58 -0
  189. package/courses/rest-api-error-handling/scenarios/level-5/error-industry-benchmarks.yaml +72 -0
  190. package/courses/rest-api-error-handling/scenarios/level-5/error-ma-integration.yaml +68 -0
  191. package/courses/rest-api-error-handling/scenarios/level-5/error-product-development.yaml +66 -0
  192. package/courses/rest-api-error-handling/scenarios/level-5/error-regulatory-landscape.yaml +80 -0
  193. package/courses/rest-api-error-handling/scenarios/level-5/master-error-shift.yaml +73 -0
  194. package/dist/cli/commands/add.d.ts.map +1 -1
  195. package/dist/cli/commands/add.js +6 -5
  196. package/dist/cli/commands/add.js.map +1 -1
  197. package/dist/cli/commands/generate.d.ts.map +1 -1
  198. package/dist/cli/commands/generate.js +4 -0
  199. package/dist/cli/commands/generate.js.map +1 -1
  200. package/dist/cli/commands/list.d.ts.map +1 -1
  201. package/dist/cli/commands/list.js +6 -18
  202. package/dist/cli/commands/list.js.map +1 -1
  203. package/dist/cli/commands/train.d.ts.map +1 -1
  204. package/dist/cli/commands/train.js +18 -18
  205. package/dist/cli/commands/train.js.map +1 -1
  206. package/dist/cli/index.js +93 -55
  207. package/dist/cli/index.js.map +1 -1
  208. package/dist/cli/run-demo.js +2 -1
  209. package/dist/cli/run-demo.js.map +1 -1
  210. package/dist/cli/setup.d.ts +18 -0
  211. package/dist/cli/setup.d.ts.map +1 -0
  212. package/dist/cli/setup.js +154 -0
  213. package/dist/cli/setup.js.map +1 -0
  214. package/dist/engine/agent-bridge.d.ts +5 -2
  215. package/dist/engine/agent-bridge.d.ts.map +1 -1
  216. package/dist/engine/agent-bridge.js +36 -9
  217. package/dist/engine/agent-bridge.js.map +1 -1
  218. package/dist/engine/loader.d.ts +21 -0
  219. package/dist/engine/loader.d.ts.map +1 -1
  220. package/dist/engine/loader.js +54 -1
  221. package/dist/engine/loader.js.map +1 -1
  222. package/dist/engine/training-loop.d.ts.map +1 -1
  223. package/dist/engine/training-loop.js +1 -0
  224. package/dist/engine/training-loop.js.map +1 -1
  225. package/dist/engine/training.d.ts.map +1 -1
  226. package/dist/engine/training.js +1 -0
  227. package/dist/engine/training.js.map +1 -1
  228. package/dist/generator/skill-generator.d.ts +1 -1
  229. package/dist/generator/skill-generator.d.ts.map +1 -1
  230. package/dist/generator/skill-generator.js +21 -2
  231. package/dist/generator/skill-generator.js.map +1 -1
  232. package/dist/mcp/server.d.ts.map +1 -1
  233. package/dist/mcp/server.js +11 -26
  234. package/dist/mcp/server.js.map +1 -1
  235. package/dist/mcp/session-manager.d.ts +3 -1
  236. package/dist/mcp/session-manager.d.ts.map +1 -1
  237. package/dist/mcp/session-manager.js +44 -22
  238. package/dist/mcp/session-manager.js.map +1 -1
  239. package/dist/types/schemas.d.ts +38 -13
  240. package/dist/types/schemas.d.ts.map +1 -1
  241. package/dist/types/schemas.js +9 -5
  242. package/dist/types/schemas.js.map +1 -1
  243. package/package.json +1 -1
@@ -0,0 +1,77 @@
+ meta:
+   id: first-optimization-shift
+   level: 1
+   course: postgresql-query-optimization
+   type: output
+   description: "First optimization shift — triage and fix multiple slow queries during a production performance incident"
+   tags: [PostgreSQL, optimization, shift-simulation, triage, beginner]
+
+ state: {}
+
+ trigger: |
+   You're on-call for a SaaS application when alerts fire at 9:00 AM
+   Monday (peak traffic). Database CPU is at 95% and climbing.
+   Response times have tripled. Users are complaining on Twitter.
+
+   Your monitoring dashboard shows:
+
+     Active connections: 180/200 (near limit)
+     Query queue: 45 queries waiting
+     Longest running query: 180 seconds (and counting)
+     Database CPU: 95%
+     Disk I/O: 98% utilization
+
+   Top 5 queries by current CPU consumption:
+
+   1. (45% CPU) Running for 180 seconds:
+      SELECT * FROM audit_logs WHERE created_at > '2020-01-01'
+      ORDER BY created_at;
+      (audit_logs has 200M rows, no LIMIT, someone ran it from an
+      admin panel)
+
+   2. (20% CPU) Running 500 concurrent copies:
+      SELECT u.*, p.* FROM users u
+      JOIN profiles p ON p.user_id = u.id
+      WHERE u.email = $1;
+      (Seq Scan on users — missing index on email)
+
+   3. (15% CPU) Running 200 concurrent copies:
+      SELECT COUNT(*) FROM products
+      WHERE category_id = $1 AND active = true;
+      (Seq Scan on products — missing index)
+
+   4. (10% CPU) Running every 5 seconds (background job):
+      UPDATE notifications SET checked_at = NOW()
+      WHERE user_id = $1 AND checked_at IS NULL;
+      (Seq Scan + row lock contention)
+
+   5. (5% CPU) Running 50 concurrent copies:
+      SELECT * FROM orders WHERE customer_id = $1
+      ORDER BY created_at DESC LIMIT 10;
+      (Using an index but fetching too many columns)
+
+   Available actions:
+   - Kill specific queries (pg_terminate_backend)
+   - Create indexes (takes time on large tables)
+   - Modify application code (requires deploy)
+   - Add connection pooling (PgBouncer not yet set up)
+   - Scale up database instance
+
+   Task: Triage and fix this incident. Write: the immediate actions
+   (first 5 minutes), the short-term fixes (next hour), the medium-
+   term improvements (this week), and the monitoring setup to prevent
+   this from recurring.
+
+ assertions:
+   - type: llm_judge
+     criteria: "Immediate triage is correct — kills the 200M-row audit_logs query first (45% CPU), considers killing or limiting long-running queries, and doesn't create indexes under 95% CPU load (would make things worse). Prioritizes by impact"
+     weight: 0.35
+     description: "Correct immediate triage"
+   - type: llm_judge
+     criteria: "Short-term fixes address root causes — index on users(email) for the most frequent query, index on products(category_id, active) for the product count, application-level statement_timeout to prevent unbounded queries, and connection pooling to handle the 180/200 connection pressure"
+     weight: 0.35
+     description: "Root cause fixes"
+   - type: llm_judge
+     criteria: "Prevention measures are practical — sets statement_timeout for different query types, implements pg_stat_statements monitoring, adds connection pooling, puts row limits on admin panel queries, and sets up alerting on slow queries and connection count before hitting limits"
+     weight: 0.30
+     description: "Practical prevention measures"
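
The "prioritize by impact" reasoning in the first assertion above can be sketched in a few lines of Python. This is only an illustration, not part of the package: the `Offender` class, the `ad_hoc` flag, and the `triage` helper are hypothetical names, and the CPU figures are copied from the scenario's "Top 5 queries" list. The assumption is that a one-off query can be killed immediately, while application traffic needs an index or a deploy.

```python
from dataclasses import dataclass


@dataclass
class Offender:
    name: str
    cpu_pct: int   # CPU share reported by the scenario's dashboard
    ad_hoc: bool   # assumption: True for a one-off query, False for app traffic


# Figures copied from the scenario's "Top 5 queries" list.
offenders = [
    Offender("audit_logs full scan", 45, ad_hoc=True),
    Offender("users lookup by email", 20, ad_hoc=False),
    Offender("products count by category", 15, ad_hoc=False),
    Offender("notifications update", 10, ad_hoc=False),
    Offender("orders by customer", 5, ad_hoc=False),
]


def triage(items):
    """Split offenders into (kill_now, fix_later), each ranked by CPU share."""
    ranked = sorted(items, key=lambda o: o.cpu_pct, reverse=True)
    kill_now = [o for o in ranked if o.ad_hoc]
    fix_later = [o for o in ranked if not o.ad_hoc]
    return kill_now, fix_later


kill_now, fix_later = triage(offenders)
cpu_freed = sum(o.cpu_pct for o in kill_now)

print("Kill first:", [o.name for o in kill_now], f"(~{cpu_freed}% CPU)")
print("Fix next:", [o.name for o in fix_later])
```

The five shares add up to the 95% CPU on the dashboard, so killing the single ad hoc query is the largest relief available without touching application traffic.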
@@ -0,0 +1,76 @@
+ meta:
+   id: index-fundamentals
+   level: 1
+   course: postgresql-query-optimization
+   type: output
+   description: "Understand index fundamentals — learn when to create B-tree indexes and how they affect query performance"
+   tags: [PostgreSQL, indexes, B-tree, performance, beginner]
+
+ state: {}
+
+ trigger: |
+   You're working on a customer support ticketing system. The database
+   has a tickets table with 5 million rows:
+
+   CREATE TABLE tickets (
+     id SERIAL PRIMARY KEY,
+     customer_id INTEGER NOT NULL,
+     agent_id INTEGER,
+     status VARCHAR(20) NOT NULL DEFAULT 'open',
+     priority VARCHAR(10) NOT NULL DEFAULT 'medium',
+     subject TEXT NOT NULL,
+     created_at TIMESTAMP NOT NULL DEFAULT NOW(),
+     updated_at TIMESTAMP NOT NULL DEFAULT NOW(),
+     resolved_at TIMESTAMP,
+     category VARCHAR(50),
+     tags TEXT[]
+   );
+
+   Currently the only index is on id (primary key). These queries are
+   all doing sequential scans on 5M rows:
+
+   Query 1 — Agent dashboard (runs 500 times/minute):
+     SELECT * FROM tickets
+     WHERE agent_id = 42 AND status = 'open'
+     ORDER BY priority, created_at;
+
+   Query 2 — Customer history (runs 200 times/minute):
+     SELECT * FROM tickets
+     WHERE customer_id = 12345
+     ORDER BY created_at DESC;
+
+   Query 3 — Reporting (runs hourly):
+     SELECT status, COUNT(*) FROM tickets
+     WHERE created_at >= '2026-02-01'
+     GROUP BY status;
+
+   Query 4 — Search (runs 100 times/minute):
+     SELECT * FROM tickets
+     WHERE category = 'billing' AND status IN ('open', 'pending')
+     AND created_at >= NOW() - INTERVAL '7 days';
+
+   Query 5 — Duplicate detection (runs on every new ticket):
+     SELECT * FROM tickets
+     WHERE customer_id = 12345
+     AND subject = 'Refund request'
+     AND created_at >= NOW() - INTERVAL '24 hours';
+
+   Task: For each of the 5 queries, recommend the right index. Explain
+   why you chose each index (column order matters!), what type of scan
+   the query will use after indexing, and the trade-offs of adding too
+   many indexes. Then prioritize which indexes to create first based on
+   query frequency and impact.
+
+ assertions:
+   - type: llm_judge
+     criteria: "Index recommendations are correct — multi-column indexes have the right column order (equality columns first, then range/sort columns), each query's access pattern is analyzed (point lookup vs range scan vs sort), and the recommended indexes would eliminate the sequential scans"
+     weight: 0.35
+     description: "Correct index recommendations"
+   - type: llm_judge
+     criteria: "Trade-offs are explained — discusses write overhead (indexes slow down INSERTs/UPDATEs), storage cost, index maintenance (bloat, REINDEX), and why not to create an index for every possible query. Addresses whether some queries can share indexes"
+     weight: 0.35
+     description: "Trade-offs explained"
+   - type: llm_judge
+     criteria: "Prioritization is well-reasoned — considers query frequency (agent dashboard at 500/min is highest priority), business impact, and whether a single index can serve multiple queries. Creates the minimum set of indexes for maximum impact"
+     weight: 0.30
+     description: "Well-reasoned prioritization"
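
The "equality columns first, then range/sort columns" rule from the first assertion above can be tried out locally. The sketch below is a stand-in, not the scenario's setup: it uses Python's built-in `sqlite3` module and an in-memory SQLite database instead of PostgreSQL, trims the table to the columns Query 1 needs, and the index name is hypothetical. SQLite's planner and plan text differ from PostgreSQL's `EXPLAIN`, so treat it as a demonstration of the ordering idea only.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; column types are simplified.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tickets (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        agent_id INTEGER,
        status TEXT NOT NULL DEFAULT 'open',
        priority TEXT NOT NULL DEFAULT 'medium',
        subject TEXT NOT NULL,
        created_at TEXT NOT NULL
    )
""")

# Query 1 from the scenario (agent dashboard).
QUERY_1 = """
    SELECT * FROM tickets
    WHERE agent_id = 42 AND status = 'open'
    ORDER BY priority, created_at
"""


def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]


before = plan(QUERY_1)

# Equality columns (agent_id, status) first, then the ORDER BY columns.
conn.execute("""
    CREATE INDEX idx_tickets_agent_status_priority_created
    ON tickets (agent_id, status, priority, created_at)
""")
after = plan(QUERY_1)

print("before:", before)
print("after: ", after)
```

With the equality columns leading, the planner can seek directly to the matching rows and read them already ordered by `priority, created_at`, so the separate sort step disappears from the plan.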
@@ -0,0 +1,73 @@
+ meta:
+   id: join-basics
+   level: 1
+   course: postgresql-query-optimization
+   type: output
+   description: "Understand JOIN performance — learn how PostgreSQL executes different JOIN types and when each is appropriate"
+   tags: [PostgreSQL, JOIN, nested-loop, hash-join, merge-join, beginner]
+
+ state: {}
+
+ trigger: |
+   You're building a reporting dashboard for a school management
+   system. The queries are slow and you need to understand how JOINs
+   work in PostgreSQL to fix them.
+
+   Tables:
+   - students (10,000 rows): id, name, grade_level, enrollment_date
+   - courses (500 rows): id, name, department, credits
+   - enrollments (100,000 rows): id, student_id, course_id, semester,
+     grade (references students and courses)
+   - teachers (200 rows): id, name, department
+   - course_assignments (600 rows): course_id, teacher_id, semester
+
+   Slow queries:
+
+   Query 1 — Student transcript (Nested Loop is fine here):
+     SELECT c.name, e.grade, e.semester
+     FROM enrollments e
+     JOIN courses c ON c.id = e.course_id
+     WHERE e.student_id = 42;
+     (Returns ~40 rows, but takes 2 seconds)
+
+   Query 2 — Department report (Hash Join expected):
+     SELECT s.name, c.name, e.grade
+     FROM students s
+     JOIN enrollments e ON e.student_id = s.id
+     JOIN courses c ON c.id = e.course_id
+     WHERE c.department = 'Mathematics';
+     (Returns ~5,000 rows, takes 8 seconds)
+
+   Query 3 — Full grade export (Merge Join might help):
+     SELECT s.name, c.name, e.grade
+     FROM students s
+     JOIN enrollments e ON e.student_id = s.id
+     JOIN courses c ON c.id = e.course_id
+     ORDER BY s.name, c.name;
+     (Returns 100,000 rows, takes 30 seconds)
+
+   Query 4 — Students without enrollments:
+     SELECT s.* FROM students s
+     LEFT JOIN enrollments e ON e.student_id = s.id
+     WHERE e.id IS NULL;
+     (Checking for unenrolled students)
+
+   Task: For each query, explain: which JOIN algorithm PostgreSQL
+   should choose and why, what indexes are needed, the expected
+   execution plan after optimization, and any query rewrites that
+   would help. Then write a general guide for when each JOIN
+   algorithm (Nested Loop, Hash Join, Merge Join) is appropriate.
+
+ assertions:
+   - type: llm_judge
+     criteria: "JOIN algorithm selection is correct — Nested Loop is good for small outer sets (Query 1: student_id = 42 returns few rows), Hash Join for medium sets with equality conditions (Query 2), Merge Join for large pre-sorted datasets (Query 3 with ORDER BY). Each selection is justified by data sizes and access patterns"
+     weight: 0.35
+     description: "Correct JOIN algorithm selection"
+   - type: llm_judge
+     criteria: "Index recommendations are specific — enrollments needs index on student_id (for Query 1 and 3), courses may need index on department (Query 2), and the indexes support the JOIN algorithms (index for Nested Loop inner table, or pre-sorting for Merge Join)"
+     weight: 0.35
+     description: "Specific index recommendations"
+   - type: llm_judge
+     criteria: "General JOIN guide is clear — explains when each algorithm is best (Nested Loop: small outer × indexed inner, Hash Join: medium tables with equality, Merge Join: large pre-sorted tables), how to read the execution plan to see which was chosen, and how work_mem affects Hash Join behavior"
+     weight: 0.30
+     description: "Clear general JOIN guide"
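
The three algorithms named in the scenario above differ in how they pair rows, not in what they return. As a rough illustration in plain Python (not PostgreSQL's executor), the toy implementations below join the same two lists three ways. The sample rows and function names are made up for the example, and the tuples are small stand-ins for `students (id, name)` and `enrollments (student_id, grade)`.

```python
from collections import defaultdict

# Hypothetical sample rows: (id, name) and (student_id, grade).
students = [(1, "Ana"), (2, "Ben"), (3, "Cy")]
enrollments = [(1, "A"), (1, "B"), (3, "C")]


def nested_loop_join(outer, inner):
    """For every outer row, scan the whole inner side for matches."""
    return [(name, grade)
            for sid, name in outer
            for student_id, grade in inner
            if student_id == sid]


def hash_join(build, probe):
    """Build a hash table on one input, then probe it with the other."""
    table = defaultdict(list)
    for sid, name in build:
        table[sid].append(name)
    return [(name, grade)
            for student_id, grade in probe
            for name in table.get(student_id, [])]


def merge_join(left, right):
    """Sort both inputs on the join key, then walk them in lockstep."""
    left = sorted(left, key=lambda r: r[0])
    right = sorted(right, key=lambda r: r[0])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            key = left[i][0]
            # Find the run of right-side rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == key:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == key:
                out.extend((left[i][1], right[k][1]) for k in range(j, j_end))
                i += 1
            j = j_end
    return out


for join in (nested_loop_join, hash_join, merge_join):
    print(join.__name__, sorted(join(students, enrollments)))
```

The nested loop does work proportional to the product of the two input sizes unless the inner side can be looked up by index. The hash join needs memory for its build table. The merge join needs both inputs ordered on the join key, which is why it pays off mainly when that order already exists.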
@@ -0,0 +1,62 @@
1
+ meta:
2
+ id: n-plus-one-queries
3
+ level: 1
4
+ course: postgresql-query-optimization
5
+ type: output
6
+ description: "Fix N+1 query problems — identify and resolve the most common ORM performance anti-pattern"
7
+ tags: [PostgreSQL, N+1, ORM, performance, anti-pattern, beginner]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your blog application's homepage takes 12 seconds to load. The page
13
+ shows the 20 most recent posts with their authors and comment
14
+ counts. Your monitoring shows the page makes 61 database queries
15
+ per request.
16
+
17
+ The ORM code (pseudocode):
18
+ posts = Post.findAll({ limit: 20, order: 'created_at DESC' })
19
+ for each post in posts:
20
+ author = User.findById(post.author_id) // N queries
21
+ commentCount = Comment.count(post_id: post.id) // N queries
22
+ categories = post.getCategories() // N queries
23
+ post.display(author, commentCount, categories)
24
+
25
+ Database query log (abbreviated):
26
+ Query 1: SELECT * FROM posts ORDER BY created_at DESC LIMIT 20
27
+ Query 2: SELECT * FROM users WHERE id = 101
28
+ Query 3: SELECT COUNT(*) FROM comments WHERE post_id = 1
29
+ Query 4: SELECT c.* FROM categories c JOIN post_categories pc
30
+ ON c.id = pc.category_id WHERE pc.post_id = 1
31
+ Query 5: SELECT * FROM users WHERE id = 102
32
+ Query 6: SELECT COUNT(*) FROM comments WHERE post_id = 2
33
+ Query 7: SELECT c.* FROM categories c JOIN post_categories pc
34
+ ON c.id = pc.category_id WHERE pc.post_id = 2
35
+ ... (repeat for all 20 posts = 1 + 20*3 = 61 queries)
36
+
37
+ Table sizes:
38
+ - posts: 50,000 rows
39
+ - users: 10,000 rows
40
+ - comments: 500,000 rows
41
+ - categories: 50 rows
42
+ - post_categories: 100,000 rows
43
+
44
+ Task: Rewrite this as optimized SQL. Show: (1) the N+1 problem
45
+ explained visually (why 61 queries), (2) the optimized version
46
+ using JOINs (1-2 queries), (3) an alternative using subqueries,
47
+ (4) how to fix this at the ORM level (eager loading), and (5) the
48
+ expected performance improvement with EXPLAIN ANALYZE comparison.
49
+
50
+ assertions:
51
+ - type: llm_judge
52
+ criteria: "N+1 problem is clearly explained: shows why the loop generates 1 + N*3 queries, explains the network round-trip overhead of 61 separate queries, and demonstrates that the database does redundant work (same author fetched multiple times)"
53
+ weight: 0.35
54
+ description: "Clear N+1 explanation"
55
+ - type: llm_judge
56
+ criteria: "Optimized query is correct: uses JOINs to fetch posts with authors, comment counts (via LEFT JOIN + GROUP BY or subquery), and categories in 1-2 queries instead of 61. The SQL is syntactically correct and handles edge cases (posts with no comments, no categories)"
57
+ weight: 0.35
58
+ description: "Correct optimized query"
59
+ - type: llm_judge
60
+ criteria: "Multiple solutions are provided: shows the pure SQL approach (JOINs), the ORM approach (eager loading / includes / preload), and discusses when each is appropriate. Includes the expected performance improvement (61 queries → 1-2 queries, 12s → <200ms)"
61
+ weight: 0.30
62
+ description: "Multiple solution approaches"
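Reviewer sketch of the N+1 pattern in this scenario, using Python's built-in sqlite3 as a stand-in for PostgreSQL. The `CountingDB` wrapper, the tiny data set, and the integer `created_at` column are assumptions for illustration. The table and column names follow the query log above. The point is the query count, not the timing.

```python
import sqlite3


class CountingDB:
    """Thin wrapper that counts every statement sent through query()."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, created_at INTEGER);
        CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER);
        CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE post_categories (post_id INTEGER, category_id INTEGER);
    """)
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(u, f"user{u}") for u in range(1, 6)])
    conn.executemany("INSERT INTO categories VALUES (?, ?)", [(c, f"cat{c}") for c in range(1, 4)])
    for p in range(1, 31):
        conn.execute("INSERT INTO posts VALUES (?, ?, ?)", (p, p % 5 + 1, p))
        # Some posts get no comments and no category, to exercise the edge cases.
        conn.executemany("INSERT INTO comments (post_id) VALUES (?)", [(p,)] * (p % 4))
        if p % 3:
            conn.execute("INSERT INTO post_categories VALUES (?, ?)", (p, p % 3))
    return conn


def load_n_plus_one(db):
    posts = db.query("SELECT id, author_id FROM posts ORDER BY created_at DESC LIMIT 20")
    page = []
    for post_id, author_id in posts:
        author = db.query("SELECT name FROM users WHERE id = ?", (author_id,))[0][0]
        count = db.query("SELECT COUNT(*) FROM comments WHERE post_id = ?", (post_id,))[0][0]
        cats = db.query(
            "SELECT c.name FROM categories c JOIN post_categories pc "
            "ON c.id = pc.category_id WHERE pc.post_id = ? ORDER BY c.name", (post_id,))
        page.append((post_id, author, count, [c[0] for c in cats]))
    return page


def load_batched(db):
    # Query 1: posts, authors, and comment counts in one statement.
    rows = db.query("""
        SELECT p.id, u.name, COUNT(c.id)
        FROM (SELECT * FROM posts ORDER BY created_at DESC LIMIT 20) p
        JOIN users u ON u.id = p.author_id
        LEFT JOIN comments c ON c.post_id = p.id
        GROUP BY p.id, p.created_at, u.name
        ORDER BY p.created_at DESC
    """)
    # Query 2: categories for all 20 posts at once.
    ids = [r[0] for r in rows]
    marks = ",".join("?" * len(ids))
    cat_rows = db.query(
        f"SELECT pc.post_id, c.name FROM post_categories pc "
        f"JOIN categories c ON c.id = pc.category_id "
        f"WHERE pc.post_id IN ({marks}) ORDER BY c.name", ids)
    cats = {}
    for post_id, name in cat_rows:
        cats.setdefault(post_id, []).append(name)
    return [(pid, author, n, cats.get(pid, [])) for pid, author, n in rows]
```

Running both loaders against the same connection yields identical pages. The loop issues 1 + 20*3 statements, while the batched version issues two. Fetching categories in a second query avoids multiplying comment counts by the number of categories per post, which a single three-way JOIN would do.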
@@ -0,0 +1,69 @@
1
+ meta:
2
+ id: query-rewriting-basics
3
+ level: 1
4
+ course: postgresql-query-optimization
5
+ type: output
6
+ description: "Rewrite queries for performance — transform common slow query patterns into efficient alternatives"
7
+ tags: [PostgreSQL, query-rewriting, optimization, patterns, beginner]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ You've been asked to review and optimize a set of queries from a
13
+ legacy application. Each query works correctly but performs poorly.
14
+ The invoices table has 5 million rows.
15
+
16
+ Query 1 — Correlated subquery (runs per row):
17
+ SELECT i.*, (SELECT name FROM customers c
18
+ WHERE c.id = i.customer_id) as customer_name
19
+ FROM invoices i
20
+ WHERE i.amount > 1000;
21
+ (Executes the subquery for each of the 200,000 matching rows)
22
+
23
+ Query 2 — DISTINCT instead of proper JOIN:
24
+ SELECT DISTINCT i.*
25
+ FROM invoices i
26
+ JOIN line_items li ON li.invoice_id = i.id
27
+ WHERE li.product_id = 42;
28
+ (DISTINCT removes duplicates caused by the JOIN, very expensive
29
+ on wide rows)
30
+
31
+ Query 3 — Unnecessary sorting:
32
+ SELECT * FROM invoices
33
+ WHERE status = 'overdue'
34
+ ORDER BY id
35
+ LIMIT 100;
36
+ (The ORDER BY id is meaningless if you just want any 100 rows)
37
+
38
+ Query 4 — Counting all to check existence:
39
+ IF (SELECT COUNT(*) FROM invoices
40
+ WHERE customer_id = 42 AND status = 'unpaid') > 0 THEN ...
41
+ (Counts all matching rows just to check if any exist)
42
+
43
+ Query 5 — Multiple queries that could be one:
44
+ total = SELECT COUNT(*) FROM invoices WHERE status = 'paid';
45
+ total_amount = SELECT SUM(amount) FROM invoices WHERE status = 'paid';
46
+ avg_amount = SELECT AVG(amount) FROM invoices WHERE status = 'paid';
47
+ (Three full table scans for the same filter)
48
+
49
+ Query 6 — Inefficient pagination:
50
+ SELECT * FROM invoices ORDER BY created_at OFFSET 999980 LIMIT 20;
51
+ (OFFSET 999980 scans and discards almost 1M rows)
52
+
53
+ Task: Rewrite each query with the optimized version. For each,
54
+ explain: why the original is slow, what the optimized version does
55
+ differently, and the expected performance improvement.
56
+
57
+ assertions:
58
+ - type: llm_judge
59
+ criteria: "All 6 queries are correctly rewritten — correlated subquery becomes a JOIN, DISTINCT becomes EXISTS or proper deduplication, unnecessary ORDER BY is removed, COUNT for existence becomes EXISTS, three queries become one with multiple aggregates, and OFFSET pagination becomes keyset/cursor pagination"
60
+ weight: 0.35
61
+ description: "All queries correctly rewritten"
62
+ - type: llm_judge
63
+ criteria: "Explanations show why originals are slow — correlated subquery executes N times, DISTINCT sorts/hashes wide rows, COUNT(*) scans all matching rows vs EXISTS stops at first, OFFSET scans and discards rows. Each explanation identifies the specific performance anti-pattern"
64
+ weight: 0.35
65
+ description: "Clear explanations of slowness"
66
+ - type: llm_judge
67
+ criteria: "Performance improvements are quantified — estimates the improvement for each rewrite (e.g., EXISTS vs COUNT: O(1) vs O(N), keyset pagination: constant time vs linear time with offset). The improvements are realistic for the 5M row table"
68
+ weight: 0.30
69
+ description: "Quantified performance improvements"
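Reviewer sketch of the keyset (cursor) pagination rewrite expected for Query 6, again with sqlite3 standing in for PostgreSQL. The `id` tie-breaker, the integer `created_at` values, and the helper names are assumptions. The original query orders by `created_at` alone, which is not a stable order when timestamps repeat.

```python
import sqlite3


def build_invoices(n=200):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices (id INTEGER PRIMARY KEY, created_at INTEGER, amount REAL)")
    # created_at repeats (i // 3) on purpose, so id is needed as a tie-breaker.
    conn.executemany(
        "INSERT INTO invoices VALUES (?, ?, ?)",
        [(i, i // 3, float(i)) for i in range(1, n + 1)])
    conn.execute("CREATE INDEX idx_invoices_created ON invoices (created_at, id)")
    return conn


def page_by_offset(conn, offset, limit=20):
    """OFFSET pagination: the database still walks past `offset` rows."""
    return conn.execute(
        "SELECT id, created_at FROM invoices ORDER BY created_at, id LIMIT ? OFFSET ?",
        (limit, offset)).fetchall()


def page_by_keyset(conn, last_created_at, last_id, limit=20):
    """Keyset pagination: seek directly past the last row of the previous page."""
    return conn.execute(
        "SELECT id, created_at FROM invoices "
        "WHERE created_at > ? OR (created_at = ? AND id > ?) "
        "ORDER BY created_at, id LIMIT ?",
        (last_created_at, last_created_at, last_id, limit)).fetchall()
```

The caller keeps the `(created_at, id)` of the last row it displayed and passes it back for the next page. The trade-off is that keyset pagination cannot jump to an arbitrary page number.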
@@ -0,0 +1,69 @@
1
+ meta:
2
+ id: select-star-problems
3
+ level: 1
4
+ course: postgresql-query-optimization
5
+ type: output
6
+ description: "Eliminate SELECT * problems — understand why selecting all columns hurts performance and how to fix it"
7
+ tags: [PostgreSQL, SELECT, columns, performance, anti-pattern, beginner]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your API has a /users endpoint that returns user profiles. The
13
+ endpoint query is:
14
+
15
+ SELECT * FROM users WHERE active = true ORDER BY name LIMIT 50;
16
+
17
+ The users table has 25 columns including:
18
+ - id, name, email (used by the API response)
19
+ - password_hash, salt (NEVER should be sent to clients)
20
+ - profile_photo BYTEA (avg 500KB per row)
21
+ - resume_pdf BYTEA (avg 2MB per row)
22
+ - login_history JSONB (avg 100KB per row)
23
+ - preferences JSONB (avg 10KB per row)
24
+ - 15 other metadata columns
25
+
26
+ Performance problems caused by SELECT *:
27
+
28
+ 1. Data transfer: Each row is ~2.6MB instead of ~200 bytes (name +
29
+ email). For 50 rows, that's 130MB instead of 10KB.
30
+
31
+ 2. Memory usage: The application ORM hydrates all 25 columns into
32
+ objects, consuming 130MB of heap memory per request.
33
+
34
+ 3. Security leak: password_hash and salt are fetched (even if the
35
+ API doesn't serialize them, they're in memory).
36
+
37
+ 4. Index-only scan prevented: With SELECT *, PostgreSQL must visit
38
+ the heap even if an index covers the needed columns. An index on
39
+ (active, name) INCLUDE (email) could serve SELECT name, email with
40
+ an index-only scan, but SELECT * forces a heap fetch.
41
+
42
+ 5. Schema evolution risk: When a new column is added (like a 5MB
43
+ attachment_data), all SELECT * queries automatically get slower
44
+ without any code change.
45
+
46
+ Other scenarios to address:
47
+ - COUNT(*) vs COUNT(1) — is there a difference?
48
+ - SELECT * in subqueries — does the optimizer help?
49
+ - SELECT * in EXISTS — does it matter?
50
+ - When is SELECT * actually OK? (ad-hoc queries, CTEs?)
51
+
52
+ Task: Rewrite the query with explicit columns, explain each of the
53
+ 5 problems in detail, address the 4 additional scenarios, and
54
+ provide a coding guideline for when to use SELECT * vs explicit
55
+ columns.
56
+
57
+ assertions:
58
+ - type: llm_judge
59
+ criteria: "All 5 problems are clearly explained — data transfer bloat (130MB vs 10KB), memory waste, security risk (password in memory), prevented index-only scans, and schema evolution risk. Each explanation connects the technical issue to real impact (latency, memory, security)"
60
+ weight: 0.35
61
+ description: "All problems clearly explained"
62
+ - type: llm_judge
63
+ criteria: "Additional scenarios are correctly answered — COUNT(*) vs COUNT(1) are identical in PostgreSQL, SELECT * in subqueries may or may not be optimized by the planner, SELECT * in EXISTS is fine because only existence is checked, and SELECT * in development/debugging contexts is acceptable"
64
+ weight: 0.35
65
+ description: "Additional scenarios correctly answered"
66
+ - type: llm_judge
67
+ criteria: "Practical guidelines are provided — when to always use explicit columns (production code, APIs), when SELECT * is acceptable (ad-hoc queries, EXISTS), how to enforce this in code review or linting, and the covering index opportunity when selecting specific columns"
68
+ weight: 0.30
69
+ description: "Practical guidelines"
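Reviewer sketch of the covering-index point in problem 4, using sqlite3 because it runs without a server. SQLite reports `COVERING INDEX` in `EXPLAIN QUERY PLAN` when an index alone can answer the query, which is roughly analogous to a PostgreSQL index-only scan. The trimmed-down table and the index name are assumptions. In PostgreSQL the equivalent index would typically be written with `INCLUDE (email)`.

```python
import sqlite3

EXPLICIT = "SELECT name, email FROM users WHERE active = 1 ORDER BY name LIMIT 50"
SELECT_STAR = "SELECT * FROM users WHERE active = 1 ORDER BY name LIMIT 50"


def plan(query, index_columns):
    """Return SQLite's query-plan text for `query` with one index on users."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, "
        "active INTEGER, profile_photo BLOB)")
    conn.execute(f"CREATE INDEX idx_users_demo ON users ({index_columns})")
    rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    # The human-readable plan step is the fourth column of each row.
    return " | ".join(row[3] for row in rows)


def is_covered(query, index_columns):
    return "COVERING INDEX" in plan(query, index_columns)
```

An index on `active, name, email` covers the explicit-column query. An index on `active, name` does not, because `email` still has to come from the table. No index of reasonable size covers `SELECT *`.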
@@ -0,0 +1,63 @@
1
+ meta:
2
+ id: slow-query-diagnosis
3
+ level: 1
4
+ course: postgresql-query-optimization
5
+ type: output
6
+ description: "Diagnose slow queries — use pg_stat_statements and query logs to find and prioritize the worst-performing queries"
7
+ tags: [PostgreSQL, pg_stat_statements, slow-queries, diagnosis, beginner]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your application's database response times have been creeping up
13
+ over the last month. Users are complaining about slowness but
14
+ nobody knows which queries are the problem. Your senior DBA is
15
+ on vacation and left you a note: "Enable pg_stat_statements and
16
+ check the slow query log."
17
+
18
+ You enable pg_stat_statements and, after 24 hours, here are the
19
+ 10 queries with the highest total_exec_time (rows are not sorted):
20
+
21
+ | calls | mean_ms | total_ms | query (truncated) |
22
+ |----------|---------|------------|----------------------------------|
23
+ | 2,400,000| 0.5 | 1,200,000 | SELECT id FROM sessions WHERE... |
24
+ | 50,000 | 200 | 10,000,000 | SELECT * FROM orders JOIN... |
25
+ | 500,000 | 15 | 7,500,000 | UPDATE products SET view_count...|
26
+ | 10,000 | 500 | 5,000,000 | SELECT * FROM analytics WHERE...|
27
+ | 1,000 | 3,000 | 3,000,000 | SELECT * FROM reports WHERE... |
28
+ | 100,000 | 25 | 2,500,000 | INSERT INTO events (...)... |
29
+ | 5,000 | 400 | 2,000,000 | DELETE FROM sessions WHERE... |
30
+ | 800,000 | 2 | 1,600,000 | SELECT 1 FROM users WHERE id=...|
31
+ | 200,000 | 5 | 1,000,000 | SELECT name FROM categories... |
32
+ | 50 | 15,000 | 750,000 | SELECT * FROM users CROSS JOIN...|
33
+
34
+ Questions to answer:
35
+ 1. Which query should you optimize first? (It's not the one with
36
+ the longest single execution time)
37
+ 2. The session check query (row 1) is fast per call but runs 2.4M
38
+ times — is this a problem?
39
+ 3. The reports query (row 5) runs only 1,000 times but takes 3
40
+ seconds each — how do you approach this differently than the
41
+ orders query?
42
+ 4. The CROSS JOIN query (row 10) takes 15 seconds — what's likely
43
+ wrong?
44
+
45
+ Task: Analyze the pg_stat_statements data and create an optimization
46
+ priority list. For each of the top 10 queries, explain: whether it
47
+ needs optimization, what the likely issue is, how to investigate
48
+ further, and the expected impact of fixing it. Then explain how to
49
+ set up ongoing slow query monitoring.
50
+
51
+ assertions:
52
+ - type: llm_judge
53
+ criteria: "Prioritization is correct — ranks by total_exec_time (not mean or calls alone), identifies the orders query (50K calls × 200ms = 10M ms total) as the highest priority, explains why a fast-but-frequent query can consume more time than a slow-but-rare one"
54
+ weight: 0.35
55
+ description: "Correct query prioritization"
56
+ - type: llm_judge
57
+ criteria: "Analysis of each query is insightful — identifies likely issues (missing indexes for orders/analytics, CROSS JOIN likely missing a WHERE clause, session checks might be unnecessary with caching, view_count UPDATE might need batching). Investigation steps are practical (EXPLAIN ANALYZE, check indexes, check table sizes)"
58
+ weight: 0.35
59
+ description: "Insightful per-query analysis"
60
+ - type: llm_judge
61
+ criteria: "Monitoring setup is practical — explains how to configure pg_stat_statements (shared_preload_libraries), log_min_duration_statement for slow query logging, and how to set up ongoing monitoring (periodic snapshots of pg_stat_statements, alerting on regression)"
62
+ weight: 0.30
63
+ description: "Practical monitoring setup"
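Reviewer sketch of the prioritization arithmetic, assuming the calls and mean_ms figures from the table above. The labels are shortened forms of the truncated query text and the function name is illustrative. It recomputes total time as calls × mean and ranks by share of the overall database time.

```python
# (label, calls, mean_ms) copied from the pg_stat_statements table above.
STATS = [
    ("SELECT id FROM sessions", 2_400_000, 0.5),
    ("SELECT * FROM orders JOIN", 50_000, 200),
    ("UPDATE products SET view_count", 500_000, 15),
    ("SELECT * FROM analytics", 10_000, 500),
    ("SELECT * FROM reports", 1_000, 3_000),
    ("INSERT INTO events", 100_000, 25),
    ("DELETE FROM sessions", 5_000, 400),
    ("SELECT 1 FROM users", 800_000, 2),
    ("SELECT name FROM categories", 200_000, 5),
    ("SELECT * FROM users CROSS JOIN", 50, 15_000),
]


def rank_by_total_time(stats):
    """Return (label, total_ms, share_of_total) rows, largest total first."""
    totals = [(label, calls * mean_ms) for label, calls, mean_ms in stats]
    grand_total = sum(total for _, total in totals)
    ranked = sorted(totals, key=lambda row: row[1], reverse=True)
    return [(label, total, total / grand_total) for label, total in ranked]


if __name__ == "__main__":
    for label, total_ms, share in rank_by_total_time(STATS):
        print(f"{share:6.1%}  {total_ms:>12,.0f} ms  {label}")
```

Ranked this way, the orders query comes first and the 15-second CROSS JOIN comes last, which is the contrast question 1 is probing.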
@@ -0,0 +1,62 @@
1
+ meta:
2
+ id: vacuum-and-statistics
3
+ level: 1
4
+ course: postgresql-query-optimization
5
+ type: output
6
+ description: "Understand VACUUM and statistics — learn why PostgreSQL needs maintenance and how outdated statistics cause bad query plans"
7
+ tags: [PostgreSQL, VACUUM, autovacuum, statistics, ANALYZE, beginner]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your orders table has 10 million rows. Yesterday it was fast.
13
+ Today, a simple query that used to take 50ms now takes 15 seconds:
14
+
15
+ SELECT * FROM orders WHERE status = 'pending'
16
+ AND created_at >= '2026-02-26';
17
+
18
+ EXPLAIN shows: Seq Scan on orders (estimated rows: 5,000,000)
19
+ But the actual result is only 150 rows.
20
+
21
+ Investigation reveals:
22
+ 1. The table statistics say 50% of rows have status = 'pending'
23
+ (true 6 months ago, now only 0.01% are pending)
24
+ 2. A bulk UPDATE last night changed 8M rows from 'pending' to
25
+ 'completed' but ANALYZE hasn't run since
26
+ 3. The table has 8M dead tuples (from the bulk UPDATE) that
27
+ VACUUM hasn't cleaned up yet
28
+ 4. Table bloat: the table occupies 4GB on disk but only 2GB of
29
+ that is live data
30
+
31
+ Your DBA explains: "PostgreSQL's query planner uses statistics to
32
+ estimate how many rows a query will return. If the statistics are
33
+ wrong, the planner picks the wrong execution plan."
34
+
35
+ Additional scenarios:
36
+ - Why did autovacuum not run? (It's been blocked by a long-running
37
+ transaction from a reporting query)
38
+ - What happens when dead tuples accumulate? (Table bloat, slower
39
+ sequential scans, index bloat)
40
+ - What's the difference between VACUUM and VACUUM FULL?
41
+ - What does ANALYZE do and when should you run it?
42
+
43
+ Task: Explain what happened, fix the immediate problem, and set up
44
+ ongoing maintenance. Write: why the statistics were wrong and how
45
+ they led to a bad plan, the immediate fix (ANALYZE + VACUUM), the
46
+ difference between VACUUM, VACUUM FULL, and VACUUM ANALYZE, the
47
+ autovacuum configuration for high-update tables, and how to prevent
48
+ long-running transactions from blocking autovacuum.
49
+
50
+ assertions:
51
+ - type: llm_judge
52
+ criteria: "Root cause is clearly explained — outdated statistics caused the planner to estimate 5M rows when only 150 existed, leading it to choose Seq Scan instead of Index Scan. ANALYZE updates statistics, fixing the plan. Dead tuples from the bulk UPDATE caused bloat"
53
+ weight: 0.35
54
+ description: "Clear root cause explanation"
55
+ - type: llm_judge
56
+ criteria: "Maintenance operations are correctly distinguished: VACUUM marks dead-tuple space as reusable without blocking reads or writes (but usually doesn't shrink the file), VACUUM FULL rewrites the table under an ACCESS EXCLUSIVE lock, ANALYZE updates statistics only, VACUUM ANALYZE does both. Explains when each is appropriate"
57
+ weight: 0.35
58
+ description: "Correct maintenance distinctions"
59
+ - type: llm_judge
60
+ criteria: "Autovacuum configuration is practical — explains key parameters (autovacuum_vacuum_threshold, autovacuum_vacuum_scale_factor) for high-update tables, how to set per-table autovacuum settings, and how to prevent long-running transactions from blocking autovacuum (statement_timeout, idle_in_transaction_session_timeout)"
61
+ weight: 0.30
62
+ description: "Practical autovacuum configuration"
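Reviewer sketch of the autovacuum trigger arithmetic behind the parameters named above. PostgreSQL considers a table for autovacuum once dead tuples exceed `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples`. The defaults used here are 50 and 0.2. The per-table values (1000 and 0.01) and the helper names are illustrative assumptions, not recommendations from the package.

```python
def autovacuum_trigger(reltuples, threshold=50, scale_factor=0.2):
    """Dead-tuple count above which autovacuum considers vacuuming a table."""
    return threshold + scale_factor * reltuples


def per_table_settings_sql(table, threshold, scale_factor):
    """Build the ALTER TABLE statement that overrides autovacuum settings for one table."""
    return (
        f"ALTER TABLE {table} SET ("
        f"autovacuum_vacuum_threshold = {threshold}, "
        f"autovacuum_vacuum_scale_factor = {scale_factor});"
    )


if __name__ == "__main__":
    rows = 10_000_000  # size of the orders table in the scenario
    print("default trigger:", autovacuum_trigger(rows))
    print("tuned trigger:  ", autovacuum_trigger(rows, threshold=1000, scale_factor=0.01))
    print(per_table_settings_sql("orders", 1000, 0.01))
```

With defaults, a 10-million-row table waits for roughly 2 million dead tuples before autovacuum starts. A smaller per-table scale factor makes it start much earlier. Neither setting helps while a long-running transaction prevents dead tuples from being removed.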
@@ -0,0 +1,74 @@
1
+ meta:
2
+ id: where-clause-optimization
3
+ level: 1
4
+ course: postgresql-query-optimization
5
+ type: output
6
+ description: "Optimize WHERE clauses — rewrite query filters to use indexes effectively and avoid common pitfalls"
7
+ tags: [PostgreSQL, WHERE, filters, optimization, SARGable, beginner]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your team has a users table (2 million rows) with indexes on email,
13
+ created_at, and status. But several queries are still doing full
14
+ table scans despite having indexes on the filtered columns.
15
+
16
+ Index definitions:
17
+ CREATE INDEX idx_users_email ON users(email);
18
+ CREATE INDEX idx_users_created ON users(created_at);
19
+ CREATE INDEX idx_users_status ON users(status);
20
+
21
+ Queries that aren't using indexes (and why):
22
+
23
+ Query 1 — Function on indexed column:
24
+ SELECT * FROM users WHERE LOWER(email) = 'john@example.com';
25
+ (Index on email exists but LOWER() prevents its use)
26
+
27
+ Query 2 — Implicit type cast:
28
+ SELECT * FROM users WHERE id = '12345';
29
+ (id is INTEGER; an untyped '12345' is coerced to integer, but a NUMERIC-typed value or id::text cast skips the index)
30
+
31
+ Query 3 — Leading wildcard:
32
+ SELECT * FROM users WHERE email LIKE '%@gmail.com';
33
+ (Leading wildcard can't use B-tree index)
34
+
35
+ Query 4 — OR with different columns:
36
+ SELECT * FROM users WHERE email = 'john@example.com'
37
+ OR status = 'suspended';
38
+ (OR across columns prevents single index use)
39
+
40
+ Query 5 — Math on indexed column:
41
+ SELECT * FROM users WHERE created_at + INTERVAL '30 days' > NOW();
42
+ (Math on the column prevents index use)
43
+
44
+ Query 6 — NOT IN with subquery:
45
+ SELECT * FROM users WHERE id NOT IN
46
+ (SELECT user_id FROM deleted_users);
47
+ (A single NULL in user_id makes NOT IN return no rows, and it is rarely planned as an anti-join)
48
+
49
+ Query 7 — Coalesce on indexed column:
50
+ SELECT * FROM users WHERE COALESCE(status, 'unknown') = 'active';
51
+ (Function wrapping prevents index use)
52
+
53
+ Query 8 — Comparing to column instead of constant:
54
+ SELECT * FROM users WHERE created_at > updated_at;
55
+ (Two columns compared — no single index helps)
56
+
57
+ Task: Rewrite each of the 8 queries to use indexes effectively.
58
+ Explain the SARGable concept (Search ARGument ABLE) and why each
59
+ original query violates it. Show the EXPLAIN output before and
60
+ after each rewrite.
61
+
62
+ assertions:
63
+ - type: llm_judge
64
+ criteria: "All 8 queries are correctly rewritten — LOWER(email) uses a functional index or citext, type cast is fixed by matching types, leading wildcard uses pg_trgm or reverse index, OR uses UNION, math is moved to the constant side, NOT IN is replaced with NOT EXISTS or LEFT JOIN, COALESCE is eliminated, and column comparison is addressed"
65
+ weight: 0.35
66
+ description: "All queries correctly rewritten"
67
+ - type: llm_judge
68
+ criteria: "SARGable concept is clearly explained — defines what makes a WHERE clause SARGable (index-usable), the general rule (don't transform the indexed column), and provides the mental model for writing index-friendly filters"
69
+ weight: 0.35
70
+ description: "Clear SARGable explanation"
71
+ - type: llm_judge
72
+ criteria: "Before/after comparison shows the improvement — demonstrates that the rewritten queries use index scans instead of sequential scans, with approximate row counts and timing to illustrate the performance difference on 2M rows"
73
+ weight: 0.30
74
+ description: "Improvement demonstration"
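Reviewer sketch of the SARGable rule (do not transform the indexed column), using sqlite3 as a stand-in so it runs without a PostgreSQL server. The integer `created_at` column, the literal values, and the helper names are assumptions. Plan text differs between engines, so only the presence or absence of an index in the plan is compared.

```python
import sqlite3


def make_users():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, "
        "created_at INTEGER, status TEXT)")
    conn.execute("CREATE INDEX idx_users_email ON users(email)")
    conn.execute("CREATE INDEX idx_users_created ON users(created_at)")
    return conn


def plan(conn, sql):
    """Return SQLite's query-plan text (fourth column of each plan row)."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


def compare_plans():
    conn = make_users()
    plans = {
        # Math on the column: the index on created_at cannot be used.
        "math_on_column": plan(conn, "SELECT * FROM users WHERE created_at + 30 > 1000"),
        # Same filter with the math moved to the constant side.
        "math_on_constant": plan(conn, "SELECT * FROM users WHERE created_at > 1000 - 30"),
        # Function on the column: the plain index on email does not match.
        "lower_plain_index": plan(
            conn, "SELECT * FROM users WHERE lower(email) = 'john@example.com'"),
    }
    # An expression index on lower(email) makes the same predicate index-usable.
    conn.execute("CREATE INDEX idx_users_email_lower ON users(lower(email))")
    plans["lower_expression_index"] = plan(
        conn, "SELECT * FROM users WHERE lower(email) = 'john@example.com'")
    return plans
```

The same two fixes carry over to PostgreSQL: move arithmetic to the constant side of the comparison, and create an expression index (or use citext) when the function on the column is unavoidable.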