dojo.md 0.1.0 → 0.2.0

Files changed (243)
  1. package/courses/GENERATION_LOG.md +27 -0
  2. package/courses/api-error-handling/course.yaml +16 -0
  3. package/courses/api-error-handling/scenarios/level-1/error-response-format.yaml +131 -0
  4. package/courses/api-error-handling/scenarios/level-1/http-status-codes-basics.yaml +90 -0
  5. package/courses/api-error-handling/scenarios/level-1/rate-limiting-basics.yaml +135 -0
  6. package/courses/api-error-handling/scenarios/level-1/request-validation-errors.yaml +208 -0
  7. package/courses/api-error-handling/scenarios/level-2/circuit-breaker-pattern.yaml +189 -0
  8. package/courses/api-error-handling/scenarios/level-2/idempotency-retry-logic.yaml +159 -0
  9. package/courses/api-error-handling/scenarios/level-2/rfc-7807-problem-details.yaml +178 -0
  10. package/courses/api-error-handling/scenarios/level-2/webhook-error-handling.yaml +211 -0
  11. package/courses/api-error-handling/scenarios/level-3/distributed-tracing-errors.yaml +275 -0
  12. package/courses/github-actions-cicd/course.yaml +10 -0
  13. package/courses/github-actions-cicd/scenarios/level-1/actions-and-runners.yaml +58 -0
  14. package/courses/github-actions-cicd/scenarios/level-1/basic-workflow-syntax.yaml +52 -0
  15. package/courses/github-actions-cicd/scenarios/level-1/branch-protection-checks.yaml +63 -0
  16. package/courses/github-actions-cicd/scenarios/level-1/environment-variables-secrets.yaml +65 -0
  17. package/courses/github-actions-cicd/scenarios/level-1/first-cicd-shift.yaml +62 -0
  18. package/courses/github-actions-cicd/scenarios/level-1/job-dependencies-outputs.yaml +62 -0
  19. package/courses/github-actions-cicd/scenarios/level-1/simple-ci-pipeline.yaml +57 -0
  20. package/courses/github-actions-cicd/scenarios/level-1/workflow-debugging.yaml +90 -0
  21. package/courses/github-actions-cicd/scenarios/level-1/workflow-status-notifications.yaml +59 -0
  22. package/courses/github-actions-cicd/scenarios/level-1/workflow-triggers.yaml +56 -0
  23. package/courses/github-actions-cicd/scenarios/level-2/concurrency-control.yaml +58 -0
  24. package/courses/github-actions-cicd/scenarios/level-2/conditional-execution.yaml +60 -0
  25. package/courses/github-actions-cicd/scenarios/level-2/custom-actions-development.yaml +55 -0
  26. package/courses/github-actions-cicd/scenarios/level-2/dependency-caching.yaml +58 -0
  27. package/courses/github-actions-cicd/scenarios/level-2/deployment-workflows.yaml +61 -0
  28. package/courses/github-actions-cicd/scenarios/level-2/github-packages-publishing.yaml +59 -0
  29. package/courses/github-actions-cicd/scenarios/level-2/intermediate-cicd-shift.yaml +68 -0
  30. package/courses/github-actions-cicd/scenarios/level-2/matrix-builds.yaml +59 -0
  31. package/courses/github-actions-cicd/scenarios/level-2/reusable-workflows.yaml +61 -0
  32. package/courses/github-actions-cicd/scenarios/level-2/workflow-cost-optimization.yaml +61 -0
  33. package/courses/github-actions-cicd/scenarios/level-3/advanced-cicd-shift.yaml +64 -0
  34. package/courses/github-actions-cicd/scenarios/level-3/compliance-automation.yaml +68 -0
  35. package/courses/github-actions-cicd/scenarios/level-3/docker-action-development.yaml +65 -0
  36. package/courses/github-actions-cicd/scenarios/level-3/github-environments.yaml +65 -0
  37. package/courses/github-actions-cicd/scenarios/level-3/monorepo-ci.yaml +68 -0
  38. package/courses/github-actions-cicd/scenarios/level-3/oidc-cloud-deployments.yaml +55 -0
  39. package/courses/github-actions-cicd/scenarios/level-3/release-automation.yaml +61 -0
  40. package/courses/github-actions-cicd/scenarios/level-3/security-hardening.yaml +63 -0
  41. package/courses/github-actions-cicd/scenarios/level-3/self-hosted-runners.yaml +60 -0
  42. package/courses/github-actions-cicd/scenarios/level-3/workflow-optimization.yaml +59 -0
  43. package/courses/github-actions-cicd/scenarios/level-4/cicd-data-architecture.yaml +63 -0
  44. package/courses/github-actions-cicd/scenarios/level-4/cicd-economics-roi.yaml +63 -0
  45. package/courses/github-actions-cicd/scenarios/level-4/cicd-executive-communication.yaml +58 -0
  46. package/courses/github-actions-cicd/scenarios/level-4/cicd-incident-response.yaml +60 -0
  47. package/courses/github-actions-cicd/scenarios/level-4/cicd-org-design.yaml +59 -0
  48. package/courses/github-actions-cicd/scenarios/level-4/cicd-platform-architecture.yaml +63 -0
  49. package/courses/github-actions-cicd/scenarios/level-4/cicd-training-program.yaml +65 -0
  50. package/courses/github-actions-cicd/scenarios/level-4/cicd-vendor-evaluation.yaml +59 -0
  51. package/courses/github-actions-cicd/scenarios/level-4/enterprise-cicd-governance.yaml +55 -0
  52. package/courses/github-actions-cicd/scenarios/level-4/expert-cicd-shift.yaml +60 -0
  53. package/courses/github-actions-cicd/scenarios/level-5/cicd-ai-future.yaml +63 -0
  54. package/courses/github-actions-cicd/scenarios/level-5/cicd-behavioral-science.yaml +70 -0
  55. package/courses/github-actions-cicd/scenarios/level-5/cicd-board-strategy.yaml +56 -0
  56. package/courses/github-actions-cicd/scenarios/level-5/cicd-consulting-engagement.yaml +61 -0
  57. package/courses/github-actions-cicd/scenarios/level-5/cicd-industry-benchmarks.yaml +63 -0
  58. package/courses/github-actions-cicd/scenarios/level-5/cicd-ma-integration.yaml +73 -0
  59. package/courses/github-actions-cicd/scenarios/level-5/cicd-product-development.yaml +68 -0
  60. package/courses/github-actions-cicd/scenarios/level-5/cicd-regulatory-landscape.yaml +72 -0
  61. package/courses/github-actions-cicd/scenarios/level-5/comprehensive-cicd-system.yaml +66 -0
  62. package/courses/github-actions-cicd/scenarios/level-5/master-cicd-shift.yaml +76 -0
  63. package/courses/github-pr-review/scenarios/level-2/api-change-review.yaml +82 -0
  64. package/courses/github-pr-review/scenarios/level-2/automated-review-tooling.yaml +53 -0
  65. package/courses/github-pr-review/scenarios/level-2/cross-team-review.yaml +61 -0
  66. package/courses/github-pr-review/scenarios/level-2/intermediate-review-shift.yaml +66 -0
  67. package/courses/github-pr-review/scenarios/level-2/performance-review-patterns.yaml +99 -0
  68. package/courses/github-pr-review/scenarios/level-2/review-disagreement-resolution.yaml +64 -0
  69. package/courses/github-pr-review/scenarios/level-2/review-metrics-analysis.yaml +63 -0
  70. package/courses/github-pr-review/scenarios/level-2/review-turnaround-sla.yaml +54 -0
  71. package/courses/github-pr-review/scenarios/level-2/stacked-pr-review.yaml +65 -0
  72. package/courses/github-pr-review/scenarios/level-3/advanced-review-shift.yaml +65 -0
  73. package/courses/github-pr-review/scenarios/level-3/ai-powered-review.yaml +58 -0
  74. package/courses/github-pr-review/scenarios/level-3/compliance-review-process.yaml +64 -0
  75. package/courses/github-pr-review/scenarios/level-3/cross-functional-review.yaml +60 -0
  76. package/courses/github-pr-review/scenarios/level-3/incident-driven-review.yaml +63 -0
  77. package/courses/github-pr-review/scenarios/level-3/large-scale-review-operations.yaml +55 -0
  78. package/courses/github-pr-review/scenarios/level-3/monorepo-review-process.yaml +68 -0
  79. package/courses/github-pr-review/scenarios/level-3/review-automation-platform.yaml +61 -0
  80. package/courses/github-pr-review/scenarios/level-3/review-culture-design.yaml +62 -0
  81. package/courses/github-pr-review/scenarios/level-3/review-data-pipeline.yaml +62 -0
  82. package/courses/github-pr-review/scenarios/level-4/enterprise-review-operations.yaml +61 -0
  83. package/courses/github-pr-review/scenarios/level-4/expert-review-shift.yaml +62 -0
  84. package/courses/github-pr-review/scenarios/level-4/review-data-architecture.yaml +69 -0
  85. package/courses/github-pr-review/scenarios/level-4/review-economics-roi.yaml +63 -0
  86. package/courses/github-pr-review/scenarios/level-4/review-executive-communication.yaml +61 -0
  87. package/courses/github-pr-review/scenarios/level-4/review-incident-postmortem.yaml +69 -0
  88. package/courses/github-pr-review/scenarios/level-4/review-org-design.yaml +62 -0
  89. package/courses/github-pr-review/scenarios/level-4/review-platform-architecture.yaml +64 -0
  90. package/courses/github-pr-review/scenarios/level-4/review-training-program.yaml +66 -0
  91. package/courses/github-pr-review/scenarios/level-4/review-vendor-evaluation.yaml +76 -0
  92. package/courses/github-pr-review/scenarios/level-5/comprehensive-review-system.yaml +68 -0
  93. package/courses/github-pr-review/scenarios/level-5/master-review-shift.yaml +73 -0
  94. package/courses/github-pr-review/scenarios/level-5/review-ai-future.yaml +69 -0
  95. package/courses/github-pr-review/scenarios/level-5/review-behavioral-science.yaml +66 -0
  96. package/courses/github-pr-review/scenarios/level-5/review-board-strategy.yaml +62 -0
  97. package/courses/github-pr-review/scenarios/level-5/review-consulting-engagement.yaml +62 -0
  98. package/courses/github-pr-review/scenarios/level-5/review-devtools-product.yaml +71 -0
  99. package/courses/github-pr-review/scenarios/level-5/review-industry-benchmarks.yaml +64 -0
  100. package/courses/github-pr-review/scenarios/level-5/review-ma-integration.yaml +76 -0
  101. package/courses/github-pr-review/scenarios/level-5/review-regulatory-landscape.yaml +78 -0
  102. package/courses/postgresql-query-optimization/course.yaml +11 -0
  103. package/courses/postgresql-query-optimization/scenarios/level-1/explain-analyze-basics.yaml +80 -0
  104. package/courses/postgresql-query-optimization/scenarios/level-1/first-optimization-shift.yaml +77 -0
  105. package/courses/postgresql-query-optimization/scenarios/level-1/index-fundamentals.yaml +76 -0
  106. package/courses/postgresql-query-optimization/scenarios/level-1/join-basics.yaml +73 -0
  107. package/courses/postgresql-query-optimization/scenarios/level-1/n-plus-one-queries.yaml +62 -0
  108. package/courses/postgresql-query-optimization/scenarios/level-1/query-rewriting-basics.yaml +69 -0
  109. package/courses/postgresql-query-optimization/scenarios/level-1/select-star-problems.yaml +69 -0
  110. package/courses/postgresql-query-optimization/scenarios/level-1/slow-query-diagnosis.yaml +63 -0
  111. package/courses/postgresql-query-optimization/scenarios/level-1/vacuum-and-statistics.yaml +62 -0
  112. package/courses/postgresql-query-optimization/scenarios/level-1/where-clause-optimization.yaml +74 -0
  113. package/courses/postgresql-query-optimization/scenarios/level-2/autovacuum-tuning.yaml +76 -0
  114. package/courses/postgresql-query-optimization/scenarios/level-2/composite-index-design.yaml +81 -0
  115. package/courses/postgresql-query-optimization/scenarios/level-2/covering-indexes.yaml +74 -0
  116. package/courses/postgresql-query-optimization/scenarios/level-2/cte-optimization.yaml +83 -0
  117. package/courses/postgresql-query-optimization/scenarios/level-2/intermediate-optimization-shift.yaml +66 -0
  118. package/courses/postgresql-query-optimization/scenarios/level-2/join-optimization.yaml +72 -0
  119. package/courses/postgresql-query-optimization/scenarios/level-2/partial-and-expression-indexes.yaml +75 -0
  120. package/courses/postgresql-query-optimization/scenarios/level-2/query-planner-settings.yaml +62 -0
  121. package/courses/postgresql-query-optimization/scenarios/level-2/subquery-optimization.yaml +67 -0
  122. package/courses/postgresql-query-optimization/scenarios/level-2/window-function-optimization.yaml +63 -0
  123. package/courses/postgresql-query-optimization/scenarios/level-3/advanced-optimization-shift.yaml +71 -0
  124. package/courses/postgresql-query-optimization/scenarios/level-3/connection-pooling.yaml +60 -0
  125. package/courses/postgresql-query-optimization/scenarios/level-3/full-text-search-optimization.yaml +66 -0
  126. package/courses/postgresql-query-optimization/scenarios/level-3/jsonb-optimization.yaml +88 -0
  127. package/courses/postgresql-query-optimization/scenarios/level-3/lock-contention-analysis.yaml +80 -0
  128. package/courses/postgresql-query-optimization/scenarios/level-3/materialized-view-optimization.yaml +73 -0
  129. package/courses/postgresql-query-optimization/scenarios/level-3/parallel-query-execution.yaml +74 -0
  130. package/courses/postgresql-query-optimization/scenarios/level-3/partitioning-strategies.yaml +71 -0
  131. package/courses/postgresql-query-optimization/scenarios/level-3/specialized-index-types.yaml +67 -0
  132. package/courses/postgresql-query-optimization/scenarios/level-3/write-optimization.yaml +65 -0
  133. package/courses/postgresql-query-optimization/scenarios/level-4/data-architecture-analytics.yaml +64 -0
  134. package/courses/postgresql-query-optimization/scenarios/level-4/database-executive-communication.yaml +64 -0
  135. package/courses/postgresql-query-optimization/scenarios/level-4/database-migration-planning.yaml +57 -0
  136. package/courses/postgresql-query-optimization/scenarios/level-4/enterprise-database-governance.yaml +52 -0
  137. package/courses/postgresql-query-optimization/scenarios/level-4/expert-optimization-shift.yaml +73 -0
  138. package/courses/postgresql-query-optimization/scenarios/level-4/high-availability-architecture.yaml +62 -0
  139. package/courses/postgresql-query-optimization/scenarios/level-4/optimizer-internals.yaml +69 -0
  140. package/courses/postgresql-query-optimization/scenarios/level-4/performance-sla-design.yaml +58 -0
  141. package/courses/postgresql-query-optimization/scenarios/level-4/read-replica-optimization.yaml +62 -0
  142. package/courses/postgresql-query-optimization/scenarios/level-4/vendor-evaluation.yaml +73 -0
  143. package/courses/rest-api-error-handling/course.yaml +11 -0
  144. package/courses/rest-api-error-handling/scenarios/level-1/authentication-errors.yaml +71 -0
  145. package/courses/rest-api-error-handling/scenarios/level-1/content-negotiation-errors.yaml +63 -0
  146. package/courses/rest-api-error-handling/scenarios/level-1/error-logging-basics.yaml +63 -0
  147. package/courses/rest-api-error-handling/scenarios/level-1/error-response-format.yaml +58 -0
  148. package/courses/rest-api-error-handling/scenarios/level-1/first-error-handling-shift.yaml +67 -0
  149. package/courses/rest-api-error-handling/scenarios/level-1/http-status-codes.yaml +46 -0
  150. package/courses/rest-api-error-handling/scenarios/level-1/not-found-errors.yaml +52 -0
  151. package/courses/rest-api-error-handling/scenarios/level-1/rate-limiting-errors.yaml +56 -0
  152. package/courses/rest-api-error-handling/scenarios/level-1/request-validation-errors.yaml +59 -0
  153. package/courses/rest-api-error-handling/scenarios/level-1/server-error-handling.yaml +55 -0
  154. package/courses/rest-api-error-handling/scenarios/level-2/api-versioning-errors.yaml +66 -0
  155. package/courses/rest-api-error-handling/scenarios/level-2/batch-request-errors.yaml +61 -0
  156. package/courses/rest-api-error-handling/scenarios/level-2/circuit-breaker-pattern.yaml +52 -0
  157. package/courses/rest-api-error-handling/scenarios/level-2/error-code-taxonomy.yaml +62 -0
  158. package/courses/rest-api-error-handling/scenarios/level-2/error-monitoring-alerting.yaml +53 -0
  159. package/courses/rest-api-error-handling/scenarios/level-2/intermediate-error-shift.yaml +69 -0
  160. package/courses/rest-api-error-handling/scenarios/level-2/pagination-errors.yaml +66 -0
  161. package/courses/rest-api-error-handling/scenarios/level-2/retry-and-idempotency.yaml +60 -0
  162. package/courses/rest-api-error-handling/scenarios/level-2/rfc7807-problem-details.yaml +60 -0
  163. package/courses/rest-api-error-handling/scenarios/level-2/webhook-error-handling.yaml +55 -0
  164. package/courses/rest-api-error-handling/scenarios/level-3/advanced-error-shift.yaml +72 -0
  165. package/courses/rest-api-error-handling/scenarios/level-3/api-gateway-errors.yaml +71 -0
  166. package/courses/rest-api-error-handling/scenarios/level-3/async-api-errors.yaml +67 -0
  167. package/courses/rest-api-error-handling/scenarios/level-3/caching-error-scenarios.yaml +65 -0
  168. package/courses/rest-api-error-handling/scenarios/level-3/chaos-engineering-apis.yaml +62 -0
  169. package/courses/rest-api-error-handling/scenarios/level-3/database-error-handling.yaml +79 -0
  170. package/courses/rest-api-error-handling/scenarios/level-3/distributed-error-propagation.yaml +63 -0
  171. package/courses/rest-api-error-handling/scenarios/level-3/error-budgets-sre.yaml +61 -0
  172. package/courses/rest-api-error-handling/scenarios/level-3/error-correlation.yaml +58 -0
  173. package/courses/rest-api-error-handling/scenarios/level-3/graphql-vs-rest-errors.yaml +73 -0
  174. package/courses/rest-api-error-handling/scenarios/level-4/compliance-error-handling.yaml +65 -0
  175. package/courses/rest-api-error-handling/scenarios/level-4/enterprise-error-governance.yaml +62 -0
  176. package/courses/rest-api-error-handling/scenarios/level-4/error-analytics-platform.yaml +65 -0
  177. package/courses/rest-api-error-handling/scenarios/level-4/error-cost-optimization.yaml +63 -0
  178. package/courses/rest-api-error-handling/scenarios/level-4/error-executive-communication.yaml +60 -0
  179. package/courses/rest-api-error-handling/scenarios/level-4/error-handling-architecture.yaml +67 -0
  180. package/courses/rest-api-error-handling/scenarios/level-4/error-org-design.yaml +68 -0
  181. package/courses/rest-api-error-handling/scenarios/level-4/error-sla-design.yaml +65 -0
  182. package/courses/rest-api-error-handling/scenarios/level-4/error-training-program.yaml +61 -0
  183. package/courses/rest-api-error-handling/scenarios/level-4/expert-error-shift.yaml +63 -0
  184. package/courses/rest-api-error-handling/scenarios/level-5/comprehensive-error-system.yaml +68 -0
  185. package/courses/rest-api-error-handling/scenarios/level-5/error-ai-future.yaml +75 -0
  186. package/courses/rest-api-error-handling/scenarios/level-5/error-behavioral-science.yaml +73 -0
  187. package/courses/rest-api-error-handling/scenarios/level-5/error-board-strategy.yaml +60 -0
  188. package/courses/rest-api-error-handling/scenarios/level-5/error-consulting-engagement.yaml +58 -0
  189. package/courses/rest-api-error-handling/scenarios/level-5/error-industry-benchmarks.yaml +72 -0
  190. package/courses/rest-api-error-handling/scenarios/level-5/error-ma-integration.yaml +68 -0
  191. package/courses/rest-api-error-handling/scenarios/level-5/error-product-development.yaml +66 -0
  192. package/courses/rest-api-error-handling/scenarios/level-5/error-regulatory-landscape.yaml +80 -0
  193. package/courses/rest-api-error-handling/scenarios/level-5/master-error-shift.yaml +73 -0
  194. package/dist/cli/commands/add.d.ts.map +1 -1
  195. package/dist/cli/commands/add.js +6 -5
  196. package/dist/cli/commands/add.js.map +1 -1
  197. package/dist/cli/commands/generate.d.ts.map +1 -1
  198. package/dist/cli/commands/generate.js +4 -0
  199. package/dist/cli/commands/generate.js.map +1 -1
  200. package/dist/cli/commands/list.d.ts.map +1 -1
  201. package/dist/cli/commands/list.js +6 -18
  202. package/dist/cli/commands/list.js.map +1 -1
  203. package/dist/cli/commands/train.d.ts.map +1 -1
  204. package/dist/cli/commands/train.js +18 -18
  205. package/dist/cli/commands/train.js.map +1 -1
  206. package/dist/cli/index.js +93 -55
  207. package/dist/cli/index.js.map +1 -1
  208. package/dist/cli/run-demo.js +2 -1
  209. package/dist/cli/run-demo.js.map +1 -1
  210. package/dist/cli/setup.d.ts +18 -0
  211. package/dist/cli/setup.d.ts.map +1 -0
  212. package/dist/cli/setup.js +154 -0
  213. package/dist/cli/setup.js.map +1 -0
  214. package/dist/engine/agent-bridge.d.ts +5 -2
  215. package/dist/engine/agent-bridge.d.ts.map +1 -1
  216. package/dist/engine/agent-bridge.js +36 -9
  217. package/dist/engine/agent-bridge.js.map +1 -1
  218. package/dist/engine/loader.d.ts +21 -0
  219. package/dist/engine/loader.d.ts.map +1 -1
  220. package/dist/engine/loader.js +54 -1
  221. package/dist/engine/loader.js.map +1 -1
  222. package/dist/engine/training-loop.d.ts.map +1 -1
  223. package/dist/engine/training-loop.js +1 -0
  224. package/dist/engine/training-loop.js.map +1 -1
  225. package/dist/engine/training.d.ts.map +1 -1
  226. package/dist/engine/training.js +1 -0
  227. package/dist/engine/training.js.map +1 -1
  228. package/dist/generator/skill-generator.d.ts +1 -1
  229. package/dist/generator/skill-generator.d.ts.map +1 -1
  230. package/dist/generator/skill-generator.js +21 -2
  231. package/dist/generator/skill-generator.js.map +1 -1
  232. package/dist/mcp/server.d.ts.map +1 -1
  233. package/dist/mcp/server.js +11 -26
  234. package/dist/mcp/server.js.map +1 -1
  235. package/dist/mcp/session-manager.d.ts +3 -1
  236. package/dist/mcp/session-manager.d.ts.map +1 -1
  237. package/dist/mcp/session-manager.js +44 -22
  238. package/dist/mcp/session-manager.js.map +1 -1
  239. package/dist/types/schemas.d.ts +38 -13
  240. package/dist/types/schemas.d.ts.map +1 -1
  241. package/dist/types/schemas.js +9 -5
  242. package/dist/types/schemas.js.map +1 -1
  243. package/package.json +1 -1
@@ -0,0 +1,76 @@
1
+ meta:
2
+ id: autovacuum-tuning
3
+ level: 2
4
+ course: postgresql-query-optimization
5
+ type: output
6
+ description: "Tune autovacuum — configure per-table autovacuum settings for high-write tables and prevent table bloat"
7
+ tags: [PostgreSQL, autovacuum, VACUUM, table-bloat, maintenance, intermediate]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your database has a mix of tables with very different write patterns,
13
+ and the default autovacuum settings aren't working for all of them.
14
+
15
+ Table profiles:
16
+
17
+ Table 1 — events (high-write, append-only):
18
+ - 500M rows, 200GB
19
+ - 1M INSERTs/day, 0 UPDATEs, 0 DELETEs
20
+ - Dead tuples: 0 (no updates = no dead tuples)
21
+ - Issue: Autovacuum runs daily but there's nothing to vacuum.
22
+ Wastes I/O.
23
+
24
+ Table 2 — sessions (high-churn):
25
+ - 5M rows, 2GB (but should be 500MB)
26
+ - 2M INSERTs/day, 2M DELETEs/day (session created, then deleted)
27
+ - Dead tuples: 1.5M (always behind)
28
+ - Issue: Table is 4x bloated. Autovacuum can't keep up with the
29
+ DELETE rate. Index bloat growing too.
30
+
31
+ Table 3 — user_preferences (bulk-update):
32
+ - 2M rows, 400MB
33
+ - Weekly batch job updates ALL 2M rows at once
34
+ - Dead tuples: 2M after batch job (100% of table)
35
+ - Issue: After batch job, table is about 2x its normal size until autovacuum
36
+ runs. Queries are 3x slower for hours.
37
+
38
+ Table 4 — orders (mixed, hot table):
39
+ - 50M rows, 20GB
40
+ - 100K INSERTs/day + 500K UPDATEs/day (status changes)
41
+ - Dead tuples: varies, 50K-500K
42
+ - Issue: Autovacuum contends with production queries. During peak
43
+ hours, autovacuum slows down the API.
44
+
45
+ Table 5 — audit_logs (write-once, read-rarely):
46
+ - 1B rows, 500GB
47
+ - 5M INSERTs/day, never updated, never deleted
48
+ - Issue: Autovacuum runs for hours on this table, consuming I/O
49
+ that the other tables need.
50
+
51
+ Current settings (all default):
52
+ autovacuum_vacuum_threshold = 50
53
+ autovacuum_vacuum_scale_factor = 0.2
54
+ autovacuum_analyze_threshold = 50
55
+ autovacuum_analyze_scale_factor = 0.1
56
+ autovacuum_vacuum_cost_delay = 2ms
57
+ autovacuum_vacuum_cost_limit = 200
58
+
59
+ Task: Design per-table autovacuum settings for each of the 5 tables.
60
+ For each, calculate: when autovacuum triggers (threshold + scale
61
+ factor × rows), the optimal settings, and any additional strategies
62
+ (partitioning, pg_repack for bloat).
63
+
64
+ assertions:
65
+ - type: llm_judge
66
+ criteria: "Per-table settings are correctly calculated — events table should have autovacuum disabled or very high thresholds, sessions table needs aggressive settings (low scale_factor, high cost_limit), user_preferences needs immediate vacuum after batch job, orders table needs throttled vacuum during peak hours, audit_logs should use partitioning to avoid vacuuming the entire table"
67
+ weight: 0.35
68
+ description: "Correct per-table settings"
69
+ - type: llm_judge
70
+ criteria: "Bloat remediation strategies are practical — recommends pg_repack for the sessions table (online table rebuild), partitioning for audit_logs (vacuum per partition), and scheduling manual VACUUM ANALYZE after the user_preferences batch job. Explains why VACUUM FULL is rarely the right answer (exclusive lock)"
71
+ weight: 0.35
72
+ description: "Practical bloat remediation"
73
+ - type: llm_judge
74
+ criteria: "Autovacuum monitoring is included — shows how to monitor autovacuum effectiveness using pg_stat_user_tables (n_dead_tup, last_autovacuum, autovacuum_count), how to detect tables where autovacuum is falling behind, and how to set up alerting on table bloat"
75
+ weight: 0.30
76
+ description: "Autovacuum monitoring"
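The task in this scenario asks for the point at which autovacuum triggers, given as threshold + scale factor × rows. The sketch below works through that arithmetic for the five table profiles using the default settings quoted in the trigger text. The function name `vacuum_trigger` and the `TABLE_ROWS` mapping are illustrative and not part of the package. PostgreSQL applies the formula to its own row estimate for each table, so real trigger points will differ somewhat from these round numbers.

```python
# Default settings quoted in the scenario.
VACUUM_THRESHOLD = 50
VACUUM_SCALE_FACTOR = 0.2

# Row counts copied from the five table profiles in the scenario.
TABLE_ROWS = {
    "events": 500_000_000,
    "sessions": 5_000_000,
    "user_preferences": 2_000_000,
    "orders": 50_000_000,
    "audit_logs": 1_000_000_000,
}


def vacuum_trigger(rows: int,
                   threshold: int = VACUUM_THRESHOLD,
                   scale_factor: float = VACUUM_SCALE_FACTOR) -> int:
    """Dead-tuple count at which autovacuum starts: threshold + scale_factor * rows."""
    return threshold + round(scale_factor * rows)


if __name__ == "__main__":
    for name, rows in TABLE_ROWS.items():
        print(f"{name:<18} rows={rows:>13,}  trigger at {vacuum_trigger(rows):>11,} dead tuples")
```

At the defaults, the sessions table needs about 1,000,050 dead tuples before a vacuum starts. Calling `vacuum_trigger(5_000_000, scale_factor=0.01)` gives 50,050; the value 0.01 is only an illustrative input, not a setting taken from the package.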
@@ -0,0 +1,81 @@
1
+ meta:
2
+ id: composite-index-design
3
+ level: 2
4
+ course: postgresql-query-optimization
5
+ type: output
6
+ description: "Design composite indexes — optimize multi-column indexes with correct column ordering for complex query patterns"
7
+ tags: [PostgreSQL, composite-index, column-ordering, intermediate]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your SaaS application has an orders table (50M rows) with these
13
+ common query patterns. You need to design the minimum set of
14
+ composite indexes that covers all of them.
15
+
16
+ Table:
17
+ CREATE TABLE orders (
18
+ id BIGSERIAL PRIMARY KEY,
19
+ tenant_id INTEGER NOT NULL,
20
+ customer_id INTEGER NOT NULL,
21
+ status VARCHAR(20) NOT NULL,
22
+ total DECIMAL(10,2) NOT NULL,
23
+ created_at TIMESTAMP NOT NULL,
24
+ shipped_at TIMESTAMP,
25
+ region VARCHAR(10) NOT NULL
26
+ );
27
+
28
+ Query patterns (ordered by frequency):
29
+
30
+ Q1 (10,000/min): Tenant dashboard — recent orders
31
+ WHERE tenant_id = ? AND status = ?
32
+ ORDER BY created_at DESC LIMIT 50
33
+
34
+ Q2 (5,000/min): Customer order history
35
+ WHERE tenant_id = ? AND customer_id = ?
36
+ ORDER BY created_at DESC
37
+
38
+ Q3 (1,000/min): Revenue reporting
39
+ WHERE tenant_id = ? AND created_at BETWEEN ? AND ?
40
+ AND status = 'completed'
41
+
42
+ Q4 (500/min): Shipping queue
43
+ WHERE tenant_id = ? AND status = 'ready_to_ship'
44
+ AND region = ?
45
+ ORDER BY created_at ASC
46
+
47
+ Q5 (100/min): Large order alerts
48
+ WHERE tenant_id = ? AND total > 10000
49
+ AND created_at >= NOW() - INTERVAL '24 hours'
50
+
51
+ Q6 (50/min): Analytics — status distribution
52
+ SELECT status, COUNT(*) FROM orders
53
+ WHERE tenant_id = ? AND created_at >= ?
54
+ GROUP BY status
55
+
56
+ Current state: Only the PRIMARY KEY index exists. All queries
57
+ do sequential scans.
58
+
59
+ Constraints:
60
+ - Maximum 5 indexes total (write performance budget)
61
+ - Every query must use tenant_id (multi-tenant isolation)
62
+ - Minimize total index storage (currently 50M rows × 5 indexes)
63
+
64
+ Task: Design up to 5 composite indexes. For each, explain: the column
65
+ order and why (equality → range → sort), which queries it serves,
66
+ and the expected scan type after indexing. Then explain which queries
67
+ share indexes and any trade-offs in your design.
68
+
69
+ assertions:
70
+ - type: llm_judge
71
+ criteria: "Index column ordering follows the equality-range-sort principle — equality columns (tenant_id, status) come first, followed by range columns (created_at BETWEEN), then sort columns (ORDER BY created_at). The reasoning for each column's position is explicit"
72
+ weight: 0.35
73
+ description: "Correct column ordering principle"
74
+ - type: llm_judge
75
+ criteria: "Index set is minimal and covers all queries — 5 or fewer indexes serve all 6 query patterns, some indexes are shared across multiple queries (e.g., tenant_id + status + created_at serves Q1 and Q3), and the design prioritizes high-frequency queries (Q1 at 10K/min gets the best index)"
76
+ weight: 0.35
77
+ description: "Minimal covering index set"
78
+ - type: llm_judge
79
+ criteria: "Trade-offs are acknowledged — explains write overhead of 5 indexes on a 50M-row table, discusses which queries get optimal vs acceptable performance, and considers partial indexes or covering indexes (INCLUDE) to further optimize"
80
+ weight: 0.30
81
+ description: "Trade-offs acknowledged"
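The equality → range → sort ordering named in this scenario can be applied mechanically to the six query patterns to see which ones end up with the same key columns. The sketch below is a rough illustration of that rule, not a substitute for the planner or for a real design review. The helper `order_index_columns` and the `QUERIES` mapping are hypothetical names, and the classification of each predicate is read off the WHERE and ORDER BY clauses in the trigger text.

```python
from collections import defaultdict


def order_index_columns(equality, range_cols, sort_cols):
    """Return key columns ordered equality -> range -> sort, dropping repeats."""
    ordered = []
    for col in [*equality, *range_cols, *sort_cols]:
        if col not in ordered:
            ordered.append(col)
    return ordered


# Predicates per query, classified from the scenario's query patterns.
QUERIES = {
    "Q1": dict(equality=["tenant_id", "status"], range_cols=[], sort_cols=["created_at"]),
    "Q2": dict(equality=["tenant_id", "customer_id"], range_cols=[], sort_cols=["created_at"]),
    "Q3": dict(equality=["tenant_id", "status"], range_cols=["created_at"], sort_cols=[]),
    "Q4": dict(equality=["tenant_id", "status", "region"], range_cols=[], sort_cols=["created_at"]),
    # Q5 has two range predicates; their relative order here is a judgment
    # call made for this sketch, not something the source specifies.
    "Q5": dict(equality=["tenant_id"], range_cols=["created_at", "total"], sort_cols=[]),
    "Q6": dict(equality=["tenant_id"], range_cols=["created_at"], sort_cols=[]),
}


def group_by_key(queries):
    """Map each resulting key-column tuple to the queries that produce it."""
    groups = defaultdict(list)
    for name, parts in queries.items():
        groups[tuple(order_index_columns(**parts))].append(name)
    return dict(groups)


if __name__ == "__main__":
    for key, names in group_by_key(QUERIES).items():
        print(f"({', '.join(key)})  <-  {', '.join(names)}")
```

Under this classification Q1 and Q3 map to the same key list, (tenant_id, status, created_at), which is consistent with the sharing example given in the scenario's second assertion.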
@@ -0,0 +1,74 @@
1
+ meta:
2
+ id: covering-indexes
3
+ level: 2
4
+ course: postgresql-query-optimization
5
+ type: output
6
+ description: "Implement covering indexes — use INCLUDE columns and index-only scans to eliminate heap fetches"
7
+ tags: [PostgreSQL, covering-index, INCLUDE, index-only-scan, intermediate]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your analytics API has endpoints that run the same queries millions
13
+ of times per day. Even with indexes, EXPLAIN ANALYZE shows "Heap
14
+ Fetches: 50000" — the query uses the index to find rows but then
15
+ must read the actual table (heap) to get the columns not in the
16
+ index.
17
+
18
+ Table: events (500M rows, 200GB)
19
+ Most queries select only 3-5 columns out of 25.
20
+
21
+ Query 1 (2M calls/day):
22
+ SELECT user_id, event_type, created_at
23
+ FROM events
24
+ WHERE user_id = $1 AND created_at >= $2
25
+ ORDER BY created_at DESC LIMIT 100;
26
+
27
+ Current index: (user_id, created_at)
28
+ Problem: Index finds the rows fast, but must heap-fetch each row
29
+ to read event_type. With 100 rows, that's 100 random I/O reads.
30
+
31
+ Query 2 (500K calls/day):
32
+ SELECT product_id, SUM(quantity), COUNT(*)
33
+ FROM events
34
+ WHERE event_type = 'purchase' AND created_at >= $1
35
+ GROUP BY product_id;
36
+
37
+ Current index: (event_type, created_at)
38
+ Problem: Must heap-fetch every matching row to read product_id
39
+ and quantity. Thousands of heap fetches.
40
+
41
+ Query 3 (1M calls/day):
42
+ SELECT COUNT(*) FROM events
43
+ WHERE user_id = $1 AND event_type = 'page_view';
44
+
45
+ Current index: (user_id, event_type)
46
+ Problem: COUNT(*) should be index-only but EXPLAIN shows heap
47
+ fetches. Why?
48
+
49
+ Query 4 (100K calls/day):
50
+ SELECT DISTINCT category FROM events
51
+ WHERE tenant_id = $1;
52
+
53
+ Current index: (tenant_id)
54
+ Problem: Must heap-fetch every row to read category, then
55
+ deduplicate. Extremely slow for tenants with millions of events.
56
+
57
+ Task: Design covering indexes for each query. Explain: the INCLUDE
58
+ clause syntax, why index-only scans are faster, why Query 3 still
59
+ shows heap fetches (visibility map), and the trade-offs of wider
60
+ indexes (storage, write overhead, maintenance).
61
+
62
+ assertions:
63
+ - type: llm_judge
64
+ criteria: "Covering indexes are correctly designed — uses INCLUDE to add non-key columns (event_type in Q1, product_id + quantity in Q2, category in Q4), enabling index-only scans that eliminate heap fetches. The INCLUDE columns are in the right place (not as index keys)"
65
+ weight: 0.35
66
+ description: "Correct covering index design"
67
+ - type: llm_judge
68
+ criteria: "Visibility map issue is explained — Query 3 shows heap fetches because recently updated/inserted pages aren't marked as all-visible in the visibility map. VACUUM updates the visibility map. Explains the connection between VACUUM frequency and index-only scan effectiveness"
69
+ weight: 0.35
70
+ description: "Visibility map explanation"
71
+ - type: llm_judge
72
+ criteria: "Trade-offs are quantified — estimates the index size increase from INCLUDE columns, discusses write amplification (wider indexes = more data per write), and provides guidelines for when covering indexes are worth the cost (high-frequency read queries on large tables)"
73
+ weight: 0.30
74
+ description: "Quantified trade-offs"
@@ -0,0 +1,83 @@
1
+ meta:
2
+ id: cte-optimization
3
+ level: 2
4
+ course: postgresql-query-optimization
5
+ type: output
6
+ description: "Optimize CTEs and subqueries — understand materialization control and when CTEs help or hurt performance"
7
+ tags: [PostgreSQL, CTE, materialization, subquery, optimization, intermediate]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your application uses CTEs (Common Table Expressions) extensively
13
+ for readability. After upgrading from PostgreSQL 11 to 16, some
14
+ queries got faster and some got slower. The team is confused about
15
+ CTE behavior.
16
+
17
+ Query 1 — CTE that got faster (auto-inlined in PG 12+):
18
+ WITH active_users AS (
19
+ SELECT * FROM users WHERE active = true
20
+ )
21
+ SELECT * FROM active_users WHERE created_at > '2026-01-01';
22
+
23
+ PG 11: Materialized the CTE (scanned all active users, then
24
+ filtered by date). Slow because active_users = 900K rows.
25
+ PG 16: Inlined the CTE (pushed date filter into the scan).
26
+ Fast because it reads only users who are active AND were created after Jan 1.
27
+
28
+ Query 2 — CTE that got slower (inlined when it shouldn't be):
29
+ WITH expensive_calc AS (
30
+ SELECT customer_id, SUM(amount) as total
31
+ FROM orders GROUP BY customer_id
32
+ )
33
+ SELECT u.name, e.total FROM users u
34
+ JOIN expensive_calc e ON e.customer_id = u.id
35
+ WHERE u.tier = 'premium';
36
+
37
+ PG 11: Materialized (computed once, joined). Fine.
38
+ PG 16: Inlined (merged into main query). Now the aggregation
39
+ runs after the JOIN filter, but the planner pushes the filter
40
+ into the wrong place, making it slower.
41
+
42
+ Query 3 — CTE referenced multiple times:
43
+ WITH monthly_stats AS (
44
+ SELECT date_trunc('month', created_at) as month,
45
+ COUNT(*) as order_count, SUM(amount) as revenue
46
+ FROM orders GROUP BY 1
47
+ )
48
+ SELECT a.month, a.order_count, a.revenue,
49
+ a.revenue - b.revenue as growth
50
+ FROM monthly_stats a
51
+ JOIN monthly_stats b ON b.month = a.month - INTERVAL '1 month';
52
+
53
+ This CTE is referenced twice. Should it be materialized?
54
+
55
+ Query 4 — Recursive CTE (always materialized):
56
+ WITH RECURSIVE org_chart AS (
57
+ SELECT id, name, manager_id, 1 as depth
58
+ FROM employees WHERE manager_id IS NULL
59
+ UNION ALL
60
+ SELECT e.id, e.name, e.manager_id, oc.depth + 1
61
+ FROM employees e JOIN org_chart oc ON e.manager_id = oc.id
62
+ )
63
+ SELECT * FROM org_chart WHERE depth <= 5;
64
+
65
+ This is slow for a 100K employee table. Can it be optimized?
66
+
67
+ Task: For each query, explain the CTE behavior (materialized vs
68
+ inlined), when to force MATERIALIZED or NOT MATERIALIZED, and the
69
+ optimization strategy. Then write guidelines for CTE usage.
70
+
71
+ assertions:
72
+ - type: llm_judge
73
+ criteria: "CTE materialization behavior is correctly explained — PG 12+ defaults to inlining CTEs referenced once, materializing when referenced multiple times. Query 1 benefits from inlining (filter pushdown), Query 2 needs explicit MATERIALIZED (compute-once benefit), Query 3 should be materialized (referenced twice), Query 4 is always materialized (recursive)"
74
+ weight: 0.35
75
+ description: "Correct materialization behavior"
76
+ - type: llm_judge
77
+ criteria: "MATERIALIZED/NOT MATERIALIZED hints are correctly applied — shows exact syntax, explains when each is needed, and addresses Query 2's regression (force MATERIALIZED to restore PG 11 behavior) and Query 4's optimization (add depth limit to recursive term, not just WHERE clause)"
78
+ weight: 0.35
79
+ description: "Correct hint application"
80
+ - type: llm_judge
81
+ criteria: "Guidelines are practical — provides decision tree for CTE materialization (referenced once → inline unless expensive, referenced multiple times → materialize, recursive → always materialized), and discusses alternatives to CTEs (subqueries, temporary tables, materialized views)"
82
+ weight: 0.30
83
+ description: "Practical CTE guidelines"
@@ -0,0 +1,66 @@
1
+ meta:
2
+ id: intermediate-optimization-shift
3
+ level: 2
4
+ course: postgresql-query-optimization
5
+ type: output
6
+ description: "Intermediate optimization shift — optimize a database migration that's blocked by slow queries and table bloat"
7
+ tags: [PostgreSQL, optimization, shift-simulation, migration, intermediate]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your company is migrating from a monolith to microservices. The
13
+ first step is splitting the monolithic 2TB PostgreSQL database into
14
+ domain-specific databases. But the migration is blocked by
15
+ performance issues.
16
+
17
+ Current database: 2TB, 150 tables, 300 indexes, 100M queries/day
18
+
19
+ Blockers:
20
+
21
+ Blocker 1 — The migration query is too slow:
22
+ INSERT INTO orders_new SELECT * FROM orders
23
+ WHERE created_at >= '2024-01-01';
24
+ This copies 80M rows (800GB) and has been running for 18 hours
25
+ with no end in sight. It's also generating massive WAL traffic
26
+ that's causing replication lag on your read replica.
27
+
28
+ Blocker 2 — Table bloat prevents accurate size estimation:
29
+ The orders table is 800GB on disk but pgstattuple shows only
30
+ 400GB of live data. 50% bloat is throwing off migration planning.
31
+ Should you fix bloat first or after migration?
32
+
33
+ Blocker 3 — Foreign key constraints slow the migration:
34
+ order_items has a FK to orders. Migrating orders first breaks the
35
+ FK. Migrating both together in a transaction locks both tables.
36
+ Current approach: disable FK → migrate → re-enable FK → validate.
37
+ But FK validation on 200M rows takes 4 hours.
38
+
39
+ Blocker 4 — Index rebuilds after migration:
40
+ After copying data, you need to recreate 15 indexes on the new
41
+ orders_new table. Each index takes 30-90 minutes to build. Total:
42
+ 10-15 hours. Can you parallelize?
43
+
44
+ Blocker 5 — Applications still querying during migration:
45
+ The old database must remain operational during migration.
46
+ Long-running migration queries compete with production queries for
47
+ resources (CPU, I/O, connections).
48
+
49
+ Task: Unblock the migration. For each blocker, write: the root
50
+ cause, the solution (with specific commands), the expected time
51
+ savings, and the risk mitigation. Then create the optimized
52
+ migration plan that addresses all 5 blockers.
53
+
54
+ assertions:
55
+ - type: llm_judge
56
+ criteria: "All 5 blockers have specific solutions: batch the large copy (1M rows at a time vs 80M at once), fix bloat after migration (pg_repack or VACUUM FULL during maintenance), handle FKs with NOT VALID then VALIDATE CONSTRAINT, build indexes in parallel (several CREATE INDEX sessions at once, with max_parallel_maintenance_workers raised; CREATE INDEX CONCURRENTLY avoids blocking writes but does not parallelize the build), and throttle migration queries to coexist with production"
57
+ weight: 0.35
58
+ description: "All blockers solved"
59
+ - type: llm_judge
60
+ criteria: "Solutions use PostgreSQL-specific features: COPY instead of INSERT...SELECT for bulk data movement, CREATE INDEX CONCURRENTLY for non-blocking index builds, ALTER TABLE ... VALIDATE CONSTRAINT for background FK validation, and logical replication as a zero-downtime migration alternative"
61
+ weight: 0.35
62
+ description: "PostgreSQL-specific solutions"
63
+ - type: llm_judge
64
+ criteria: "Migration plan is sequenced correctly — data migration before index builds, FK validation after both tables are migrated, resource throttling throughout, and a rollback plan in case any step fails. Time estimates are realistic"
65
+ weight: 0.30
66
+ description: "Correctly sequenced plan"
@@ -0,0 +1,72 @@
1
+ meta:
2
+ id: join-optimization
3
+ level: 2
4
+ course: postgresql-query-optimization
5
+ type: output
6
+ description: "Optimize JOIN strategies — tune PostgreSQL to choose the right JOIN algorithm for each query pattern"
7
+ tags: [PostgreSQL, JOIN, hash-join, merge-join, nested-loop, intermediate]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your reporting system has three problematic JOIN queries. Each uses
13
+ the wrong JOIN algorithm, causing severe performance issues.
14
+
15
+ Tables:
16
+ - customers (1M rows): id, name, email, tier, created_at
17
+ - orders (50M rows): id, customer_id, amount, status, created_at
18
+ - order_items (200M rows): id, order_id, product_id, quantity, price
19
+ - products (100K rows): id, name, category, price
20
+
21
+ Problem 1 — Wrong algorithm: Nested Loop on large tables
22
+ SELECT c.name, SUM(o.amount)
23
+ FROM customers c
24
+ JOIN orders o ON o.customer_id = c.id
25
+ WHERE c.tier = 'enterprise'
26
+ GROUP BY c.name;
27
+
28
+ EXPLAIN shows: Nested Loop Join (estimated: 30 seconds)
29
+ - Enterprise customers: 500 rows
30
+ - Orders per enterprise customer: ~100,000
31
+ - Total: 50M random index lookups
32
+ Better choice: Hash Join (build hash on 500 customers, probe orders)
33
+
34
+ Problem 2 — Hash Join spilling to disk
35
+ SELECT o.id, oi.product_id, oi.quantity
36
+ FROM orders o
37
+ JOIN order_items oi ON oi.order_id = o.id
38
+ WHERE o.created_at >= '2026-01-01';
39
+
40
+ EXPLAIN shows: Hash Join (estimated: 120 seconds)
41
+ - Matching orders: 5M rows
42
+ - Hash table too large for work_mem → spills to disk (Batches: 16)
43
+ - Disk I/O dominates execution time
44
+
45
+ Problem 3 — Missing opportunity for Merge Join
46
+ SELECT c.name, o.amount, o.created_at
47
+ FROM customers c
48
+ JOIN orders o ON o.customer_id = c.id
49
+ ORDER BY c.id, o.created_at;
50
+
51
+ EXPLAIN shows: Hash Join + Sort (estimated: 90 seconds)
52
+ - Both tables could be pre-sorted by the JOIN key
53
+ - Merge Join would eliminate the separate Sort step
54
+
55
+ Task: Fix each problem. For each, explain: why PostgreSQL chose
56
+ the wrong algorithm, what parameter or index change would fix it,
57
+ the expected EXPLAIN output after the fix, and the general rule
58
+ for when each JOIN algorithm is optimal.
59
+
60
+ assertions:
61
+ - type: llm_judge
62
+ criteria: "All 3 problems are correctly diagnosed and fixed — Problem 1 needs statistics update or adjusted cost parameters to favor Hash Join over Nested Loop, Problem 2 needs increased work_mem to prevent hash table spill, Problem 3 needs indexes on JOIN keys to enable Merge Join. Each fix is specific"
63
+ weight: 0.35
64
+ description: "All problems correctly fixed"
65
+ - type: llm_judge
66
+ criteria: "JOIN algorithm selection rules are clearly stated — Nested Loop: best for small outer × indexed inner, Hash Join: best for equality JOINs with sufficient memory, Merge Join: best when both inputs are pre-sorted or will be sorted anyway. Includes data size thresholds and work_mem considerations"
67
+ weight: 0.35
68
+ description: "Clear algorithm selection rules"
69
+ - type: llm_judge
70
+ criteria: "work_mem tuning is addressed — explains how work_mem affects Hash Join (too low → disk spill, too high × many connections → OOM), how to set it per-query vs globally, and the relationship between work_mem, max_connections, and available RAM"
71
+ weight: 0.30
72
+ description: "work_mem tuning addressed"
@@ -0,0 +1,75 @@
1
+ meta:
2
+ id: partial-and-expression-indexes
3
+ level: 2
4
+ course: postgresql-query-optimization
5
+ type: output
6
+ description: "Use partial and expression indexes — create targeted indexes for specific query patterns with reduced storage and maintenance cost"
7
+ tags: [PostgreSQL, partial-index, expression-index, optimization, intermediate]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your e-commerce platform has a products table (10M rows) with
13
+ skewed data distribution. Full indexes are too large and slow
14
+ to maintain.
15
+
16
+ Table statistics:
17
+ - 10M total products
18
+ - 9.5M have status = 'archived' (95%)
19
+ - 300K have status = 'active' (3%)
20
+ - 200K have status = 'draft' (2%)
21
+ - Only active products appear in search results
22
+ - Only draft products appear in the admin editor
23
+
24
+ Current indexes:
25
+ CREATE INDEX idx_products_status ON products(status);
26
+ -- This index is 400MB but 95% of it covers 'archived' rows
27
+ -- that are almost never queried
28
+
29
+ Problematic queries:
30
+
31
+ Q1: Case-insensitive email search:
32
+ SELECT * FROM users WHERE LOWER(email) = LOWER($1);
33
+ -- Index on email doesn't help (function applied)
34
+
35
+ Q2: Active product search:
36
+ SELECT * FROM products WHERE status = 'active' AND category = $1
37
+ ORDER BY price;
38
+ -- The full status index holds 10M entries, but only 300K of them are active
39
+
40
+ Q3: Soft-deleted records cleanup:
41
+ SELECT * FROM orders WHERE deleted_at IS NOT NULL
42
+ AND deleted_at < NOW() - INTERVAL '90 days';
43
+ -- Only 0.1% of orders are soft-deleted, but full index is huge
44
+
45
+ Q4: JSON field query:
46
+ SELECT * FROM products WHERE (metadata->>'brand') = 'Nike';
47
+ -- No index on JSON fields
48
+
49
+ Q5: Computed date query:
50
+ SELECT * FROM subscriptions
51
+ WHERE DATE(expires_at) = CURRENT_DATE;
52
+ -- Function on column prevents index use
53
+
54
+ Q6: Trigram search:
55
+ SELECT * FROM products WHERE name ILIKE '%wireless%mouse%';
56
+ -- Leading wildcard with case-insensitive match
57
+
58
+ Task: Design the optimal index for each query using partial indexes,
59
+ expression indexes, or specialized index types. For each, show:
60
+ the CREATE INDEX statement, the size compared to a full index, and
61
+ the EXPLAIN output showing the index is used.
62
+
63
+ assertions:
64
+ - type: llm_judge
65
+ criteria: "Partial indexes are used correctly — index on products WHERE status = 'active' covers only 3% of rows (300K vs 10M), index on orders WHERE deleted_at IS NOT NULL covers only 0.1%. Each partial index is dramatically smaller than the full equivalent"
66
+ weight: 0.35
67
+ description: "Correct partial index usage"
68
+ - type: llm_judge
69
+ criteria: "Expression indexes handle function-on-column: LOWER(email) expression index enables case-insensitive search, JSON expression index on (metadata->>'brand') enables JSON field queries, DATE(expires_at) expression index enables computed date queries (or an equivalent range predicate on expires_at when the column is timestamptz, where the cast to date is not immutable and cannot be indexed). Each eliminates the need for full table scans"
70
+ weight: 0.35
71
+ description: "Expression indexes for functions"
72
+ - type: llm_judge
73
+ criteria: "Specialized indexes are recommended where appropriate — pg_trgm GIN index for ILIKE wildcard search, citext extension as alternative to LOWER() expression index, and the trade-offs of each approach (storage, maintenance, query compatibility)"
74
+ weight: 0.30
75
+ description: "Appropriate specialized indexes"
@@ -0,0 +1,62 @@
1
+ meta:
2
+ id: query-planner-settings
3
+ level: 2
4
+ course: postgresql-query-optimization
5
+ type: output
6
+ description: "Tune query planner settings — configure work_mem, effective_cache_size, and cost parameters for your workload"
7
+ tags: [PostgreSQL, planner, work_mem, effective_cache_size, tuning, intermediate]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ You just migrated to a new database server with NVMe SSDs and
13
+ 128GB RAM, but query performance hasn't improved. The planner
14
+ still prefers sequential scans over index scans and hash joins
15
+ spill to disk.
16
+
17
+ Server specs:
18
+ - CPU: 32 cores
19
+ - RAM: 128GB
20
+ - Storage: NVMe SSD (500K IOPS, 3GB/s throughput)
21
+ - PostgreSQL 16
22
+ - max_connections: 200
23
+
24
+ Current settings (still at defaults):
25
+ - shared_buffers = 128MB (should be much higher)
26
+ - work_mem = 4MB (sorts spill to disk)
27
+ - effective_cache_size = 4GB (planner underestimates cache)
28
+ - random_page_cost = 4.0 (assumes spinning disk, not SSD)
29
+ - seq_page_cost = 1.0
30
+ - maintenance_work_mem = 64MB (VACUUM and index builds are slow)
31
+ - effective_io_concurrency = 1 (not using SSD parallelism)
32
+
33
+ Symptoms:
34
+ 1. Planner chooses Seq Scan over Index Scan because
35
+ random_page_cost = 4.0 makes random I/O look 4x as expensive as
36
+ sequential I/O (true for HDD, not for SSD where it's ~1.1x)
37
+ 2. Hash Joins spill to disk because work_mem = 4MB can't hold
38
+ the hash table in memory
39
+ 3. Sorts spill to disk for the same reason
40
+ 4. VACUUM takes hours because maintenance_work_mem = 64MB
41
+ 5. effective_cache_size = 4GB makes the planner think only 4GB
42
+ of data is cached (actual: ~100GB between shared_buffers and
43
+ OS cache)
44
+
45
+ Task: Calculate the optimal value for each setting. For each,
46
+ explain: the formula or reasoning, the impact on query plans,
47
+ and the risks of over-tuning. Then show before/after EXPLAIN
48
+ output for a query that changes plan due to the settings.
49
+
50
+ assertions:
51
+ - type: llm_judge
52
+ criteria: "Setting calculations are correct for the hardware — shared_buffers ~25% of RAM (32GB), work_mem calculated from RAM / (max_connections * expected_operations), effective_cache_size ~75% of RAM (96GB), random_page_cost 1.1-1.5 for NVMe SSDs, maintenance_work_mem 1-2GB, effective_io_concurrency 200+ for NVMe"
53
+ weight: 0.35
54
+ description: "Correct setting calculations"
55
+ - type: llm_judge
56
+ criteria: "Impact on query plans is demonstrated — shows how reducing random_page_cost makes the planner prefer Index Scan over Seq Scan, how increasing work_mem eliminates disk spill in Hash Joins, and how effective_cache_size changes the planner's cost estimates. Before/after EXPLAIN shows the plan change"
57
+ weight: 0.35
58
+ description: "Demonstrated plan impact"
59
+ - type: llm_judge
60
+ criteria: "Risks of over-tuning are explained — too-high work_mem × 200 connections can OOM the server, too-low random_page_cost can force inefficient index scans, shared_buffers > 40% of RAM can cause double-buffering with OS cache. Includes the formula for safe work_mem calculation"
61
+ weight: 0.30
62
+ description: "Over-tuning risks explained"
@@ -0,0 +1,67 @@
1
+ meta:
2
+ id: subquery-optimization
3
+ level: 2
4
+ course: postgresql-query-optimization
5
+ type: output
6
+ description: "Optimize subqueries — convert correlated subqueries to JOINs, use EXISTS vs IN, and leverage lateral joins for dependent subqueries"
7
+ tags: [PostgreSQL, subqueries, correlated-subquery, EXISTS, IN, lateral-join, intermediate]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your team's order summary endpoint is slow. The query uses multiple
13
+ subqueries, and EXPLAIN ANALYZE shows repeated execution of correlated
14
+ subqueries — one for each row in the outer query.
15
+
16
+ The current query (takes 12 seconds):
17
+ SELECT c.id, c.name, c.email,
18
+ (SELECT COUNT(*) FROM orders WHERE customer_id = c.id) AS order_count,
19
+ (SELECT SUM(total) FROM orders WHERE customer_id = c.id) AS total_spent,
20
+ (SELECT MAX(created_at) FROM orders WHERE customer_id = c.id) AS last_order
21
+ FROM customers c
22
+ WHERE c.status = 'active'
23
+ AND (SELECT COUNT(*) FROM orders WHERE customer_id = c.id) > 5;
24
+
25
+ EXPLAIN ANALYZE shows:
26
+ Seq Scan on customers (rows=50,000)
27
+ → SubPlan 1: Index Scan on orders (executed 50,000 times)
28
+ → SubPlan 2: Index Scan on orders (executed 50,000 times)
29
+ → SubPlan 3: Index Scan on orders (executed 50,000 times)
30
+ → SubPlan 4: Index Scan on orders (executed 50,000 times)
31
+ Total: 12.4 seconds
32
+
33
+ That's 200,000 subplan executions for 50,000 customers.
34
+
35
+ Additional subquery patterns your team uses that need review:
36
+ 1. WHERE id IN (SELECT ...) vs WHERE EXISTS (SELECT ...)
37
+ - When does the planner treat these differently?
38
+ 2. Scalar subquery in SELECT vs JOIN
39
+ - The query above uses scalar subqueries — how does a JOIN compare?
40
+ 3. LATERAL JOIN for "top-N per group" queries:
41
+ SELECT d.name, recent.*
42
+ FROM departments d,
43
+ LATERAL (SELECT * FROM employees WHERE dept_id = d.id
44
+ ORDER BY hire_date DESC LIMIT 3) recent;
45
+ 4. Subquery in FROM (derived table) vs CTE:
46
+ SELECT * FROM (SELECT ...) sub WHERE sub.x > 10;
47
+ vs WITH sub AS (SELECT ...) SELECT * FROM sub WHERE sub.x > 10;
48
+
49
+ Task: Rewrite the slow query to eliminate the repeated subplan
50
+ executions. Explain: the rewritten query using JOINs, why correlated
51
+ subqueries are expensive (execution model), when EXISTS outperforms IN,
52
+ how LATERAL joins work and when to use them, and the difference between
53
+ derived tables and CTEs in PostgreSQL's optimizer.
54
+
55
+ assertions:
56
+ - type: llm_judge
57
+ criteria: "The correlated subquery is correctly rewritten — converts the 4 scalar subqueries into a single LEFT JOIN with GROUP BY on orders (aggregating COUNT, SUM, MAX), reducing 200K subplan executions to a single hash aggregate. The rewritten query should use HAVING or a subquery/CTE for the >5 filter"
58
+ weight: 0.35
59
+ description: "Correct subquery rewrite"
60
+ - type: llm_judge
61
+ criteria: "EXISTS vs IN is explained: EXISTS short-circuits (stops at first match), IN may materialize or hash the full subquery result. EXISTS is better for large subquery results with selective outer queries. The planner may convert IN to a semi-join (equivalent to EXISTS) but not always, and NOT IN cannot become an anti-join when NULLs are possible"
62
+ weight: 0.35
63
+ description: "EXISTS vs IN explained"
64
+ - type: llm_judge
65
+ criteria: "LATERAL joins and derived tables vs CTEs are explained: LATERAL allows dependent subqueries in FROM clause (each row can reference prior tables), useful for top-N per group. Derived tables can be folded into the outer query by the optimizer, while CTEs in PG 12+ are materialized only when marked MATERIALIZED, referenced multiple times, recursive, or not side-effect-free"
66
+ weight: 0.30
67
+ description: "LATERAL and CTE behavior explained"