dojo.md 0.1.0 → 0.2.0

Files changed (243)
  1. package/courses/GENERATION_LOG.md +27 -0
  2. package/courses/api-error-handling/course.yaml +16 -0
  3. package/courses/api-error-handling/scenarios/level-1/error-response-format.yaml +131 -0
  4. package/courses/api-error-handling/scenarios/level-1/http-status-codes-basics.yaml +90 -0
  5. package/courses/api-error-handling/scenarios/level-1/rate-limiting-basics.yaml +135 -0
  6. package/courses/api-error-handling/scenarios/level-1/request-validation-errors.yaml +208 -0
  7. package/courses/api-error-handling/scenarios/level-2/circuit-breaker-pattern.yaml +189 -0
  8. package/courses/api-error-handling/scenarios/level-2/idempotency-retry-logic.yaml +159 -0
  9. package/courses/api-error-handling/scenarios/level-2/rfc-7807-problem-details.yaml +178 -0
  10. package/courses/api-error-handling/scenarios/level-2/webhook-error-handling.yaml +211 -0
  11. package/courses/api-error-handling/scenarios/level-3/distributed-tracing-errors.yaml +275 -0
  12. package/courses/github-actions-cicd/course.yaml +10 -0
  13. package/courses/github-actions-cicd/scenarios/level-1/actions-and-runners.yaml +58 -0
  14. package/courses/github-actions-cicd/scenarios/level-1/basic-workflow-syntax.yaml +52 -0
  15. package/courses/github-actions-cicd/scenarios/level-1/branch-protection-checks.yaml +63 -0
  16. package/courses/github-actions-cicd/scenarios/level-1/environment-variables-secrets.yaml +65 -0
  17. package/courses/github-actions-cicd/scenarios/level-1/first-cicd-shift.yaml +62 -0
  18. package/courses/github-actions-cicd/scenarios/level-1/job-dependencies-outputs.yaml +62 -0
  19. package/courses/github-actions-cicd/scenarios/level-1/simple-ci-pipeline.yaml +57 -0
  20. package/courses/github-actions-cicd/scenarios/level-1/workflow-debugging.yaml +90 -0
  21. package/courses/github-actions-cicd/scenarios/level-1/workflow-status-notifications.yaml +59 -0
  22. package/courses/github-actions-cicd/scenarios/level-1/workflow-triggers.yaml +56 -0
  23. package/courses/github-actions-cicd/scenarios/level-2/concurrency-control.yaml +58 -0
  24. package/courses/github-actions-cicd/scenarios/level-2/conditional-execution.yaml +60 -0
  25. package/courses/github-actions-cicd/scenarios/level-2/custom-actions-development.yaml +55 -0
  26. package/courses/github-actions-cicd/scenarios/level-2/dependency-caching.yaml +58 -0
  27. package/courses/github-actions-cicd/scenarios/level-2/deployment-workflows.yaml +61 -0
  28. package/courses/github-actions-cicd/scenarios/level-2/github-packages-publishing.yaml +59 -0
  29. package/courses/github-actions-cicd/scenarios/level-2/intermediate-cicd-shift.yaml +68 -0
  30. package/courses/github-actions-cicd/scenarios/level-2/matrix-builds.yaml +59 -0
  31. package/courses/github-actions-cicd/scenarios/level-2/reusable-workflows.yaml +61 -0
  32. package/courses/github-actions-cicd/scenarios/level-2/workflow-cost-optimization.yaml +61 -0
  33. package/courses/github-actions-cicd/scenarios/level-3/advanced-cicd-shift.yaml +64 -0
  34. package/courses/github-actions-cicd/scenarios/level-3/compliance-automation.yaml +68 -0
  35. package/courses/github-actions-cicd/scenarios/level-3/docker-action-development.yaml +65 -0
  36. package/courses/github-actions-cicd/scenarios/level-3/github-environments.yaml +65 -0
  37. package/courses/github-actions-cicd/scenarios/level-3/monorepo-ci.yaml +68 -0
  38. package/courses/github-actions-cicd/scenarios/level-3/oidc-cloud-deployments.yaml +55 -0
  39. package/courses/github-actions-cicd/scenarios/level-3/release-automation.yaml +61 -0
  40. package/courses/github-actions-cicd/scenarios/level-3/security-hardening.yaml +63 -0
  41. package/courses/github-actions-cicd/scenarios/level-3/self-hosted-runners.yaml +60 -0
  42. package/courses/github-actions-cicd/scenarios/level-3/workflow-optimization.yaml +59 -0
  43. package/courses/github-actions-cicd/scenarios/level-4/cicd-data-architecture.yaml +63 -0
  44. package/courses/github-actions-cicd/scenarios/level-4/cicd-economics-roi.yaml +63 -0
  45. package/courses/github-actions-cicd/scenarios/level-4/cicd-executive-communication.yaml +58 -0
  46. package/courses/github-actions-cicd/scenarios/level-4/cicd-incident-response.yaml +60 -0
  47. package/courses/github-actions-cicd/scenarios/level-4/cicd-org-design.yaml +59 -0
  48. package/courses/github-actions-cicd/scenarios/level-4/cicd-platform-architecture.yaml +63 -0
  49. package/courses/github-actions-cicd/scenarios/level-4/cicd-training-program.yaml +65 -0
  50. package/courses/github-actions-cicd/scenarios/level-4/cicd-vendor-evaluation.yaml +59 -0
  51. package/courses/github-actions-cicd/scenarios/level-4/enterprise-cicd-governance.yaml +55 -0
  52. package/courses/github-actions-cicd/scenarios/level-4/expert-cicd-shift.yaml +60 -0
  53. package/courses/github-actions-cicd/scenarios/level-5/cicd-ai-future.yaml +63 -0
  54. package/courses/github-actions-cicd/scenarios/level-5/cicd-behavioral-science.yaml +70 -0
  55. package/courses/github-actions-cicd/scenarios/level-5/cicd-board-strategy.yaml +56 -0
  56. package/courses/github-actions-cicd/scenarios/level-5/cicd-consulting-engagement.yaml +61 -0
  57. package/courses/github-actions-cicd/scenarios/level-5/cicd-industry-benchmarks.yaml +63 -0
  58. package/courses/github-actions-cicd/scenarios/level-5/cicd-ma-integration.yaml +73 -0
  59. package/courses/github-actions-cicd/scenarios/level-5/cicd-product-development.yaml +68 -0
  60. package/courses/github-actions-cicd/scenarios/level-5/cicd-regulatory-landscape.yaml +72 -0
  61. package/courses/github-actions-cicd/scenarios/level-5/comprehensive-cicd-system.yaml +66 -0
  62. package/courses/github-actions-cicd/scenarios/level-5/master-cicd-shift.yaml +76 -0
  63. package/courses/github-pr-review/scenarios/level-2/api-change-review.yaml +82 -0
  64. package/courses/github-pr-review/scenarios/level-2/automated-review-tooling.yaml +53 -0
  65. package/courses/github-pr-review/scenarios/level-2/cross-team-review.yaml +61 -0
  66. package/courses/github-pr-review/scenarios/level-2/intermediate-review-shift.yaml +66 -0
  67. package/courses/github-pr-review/scenarios/level-2/performance-review-patterns.yaml +99 -0
  68. package/courses/github-pr-review/scenarios/level-2/review-disagreement-resolution.yaml +64 -0
  69. package/courses/github-pr-review/scenarios/level-2/review-metrics-analysis.yaml +63 -0
  70. package/courses/github-pr-review/scenarios/level-2/review-turnaround-sla.yaml +54 -0
  71. package/courses/github-pr-review/scenarios/level-2/stacked-pr-review.yaml +65 -0
  72. package/courses/github-pr-review/scenarios/level-3/advanced-review-shift.yaml +65 -0
  73. package/courses/github-pr-review/scenarios/level-3/ai-powered-review.yaml +58 -0
  74. package/courses/github-pr-review/scenarios/level-3/compliance-review-process.yaml +64 -0
  75. package/courses/github-pr-review/scenarios/level-3/cross-functional-review.yaml +60 -0
  76. package/courses/github-pr-review/scenarios/level-3/incident-driven-review.yaml +63 -0
  77. package/courses/github-pr-review/scenarios/level-3/large-scale-review-operations.yaml +55 -0
  78. package/courses/github-pr-review/scenarios/level-3/monorepo-review-process.yaml +68 -0
  79. package/courses/github-pr-review/scenarios/level-3/review-automation-platform.yaml +61 -0
  80. package/courses/github-pr-review/scenarios/level-3/review-culture-design.yaml +62 -0
  81. package/courses/github-pr-review/scenarios/level-3/review-data-pipeline.yaml +62 -0
  82. package/courses/github-pr-review/scenarios/level-4/enterprise-review-operations.yaml +61 -0
  83. package/courses/github-pr-review/scenarios/level-4/expert-review-shift.yaml +62 -0
  84. package/courses/github-pr-review/scenarios/level-4/review-data-architecture.yaml +69 -0
  85. package/courses/github-pr-review/scenarios/level-4/review-economics-roi.yaml +63 -0
  86. package/courses/github-pr-review/scenarios/level-4/review-executive-communication.yaml +61 -0
  87. package/courses/github-pr-review/scenarios/level-4/review-incident-postmortem.yaml +69 -0
  88. package/courses/github-pr-review/scenarios/level-4/review-org-design.yaml +62 -0
  89. package/courses/github-pr-review/scenarios/level-4/review-platform-architecture.yaml +64 -0
  90. package/courses/github-pr-review/scenarios/level-4/review-training-program.yaml +66 -0
  91. package/courses/github-pr-review/scenarios/level-4/review-vendor-evaluation.yaml +76 -0
  92. package/courses/github-pr-review/scenarios/level-5/comprehensive-review-system.yaml +68 -0
  93. package/courses/github-pr-review/scenarios/level-5/master-review-shift.yaml +73 -0
  94. package/courses/github-pr-review/scenarios/level-5/review-ai-future.yaml +69 -0
  95. package/courses/github-pr-review/scenarios/level-5/review-behavioral-science.yaml +66 -0
  96. package/courses/github-pr-review/scenarios/level-5/review-board-strategy.yaml +62 -0
  97. package/courses/github-pr-review/scenarios/level-5/review-consulting-engagement.yaml +62 -0
  98. package/courses/github-pr-review/scenarios/level-5/review-devtools-product.yaml +71 -0
  99. package/courses/github-pr-review/scenarios/level-5/review-industry-benchmarks.yaml +64 -0
  100. package/courses/github-pr-review/scenarios/level-5/review-ma-integration.yaml +76 -0
  101. package/courses/github-pr-review/scenarios/level-5/review-regulatory-landscape.yaml +78 -0
  102. package/courses/postgresql-query-optimization/course.yaml +11 -0
  103. package/courses/postgresql-query-optimization/scenarios/level-1/explain-analyze-basics.yaml +80 -0
  104. package/courses/postgresql-query-optimization/scenarios/level-1/first-optimization-shift.yaml +77 -0
  105. package/courses/postgresql-query-optimization/scenarios/level-1/index-fundamentals.yaml +76 -0
  106. package/courses/postgresql-query-optimization/scenarios/level-1/join-basics.yaml +73 -0
  107. package/courses/postgresql-query-optimization/scenarios/level-1/n-plus-one-queries.yaml +62 -0
  108. package/courses/postgresql-query-optimization/scenarios/level-1/query-rewriting-basics.yaml +69 -0
  109. package/courses/postgresql-query-optimization/scenarios/level-1/select-star-problems.yaml +69 -0
  110. package/courses/postgresql-query-optimization/scenarios/level-1/slow-query-diagnosis.yaml +63 -0
  111. package/courses/postgresql-query-optimization/scenarios/level-1/vacuum-and-statistics.yaml +62 -0
  112. package/courses/postgresql-query-optimization/scenarios/level-1/where-clause-optimization.yaml +74 -0
  113. package/courses/postgresql-query-optimization/scenarios/level-2/autovacuum-tuning.yaml +76 -0
  114. package/courses/postgresql-query-optimization/scenarios/level-2/composite-index-design.yaml +81 -0
  115. package/courses/postgresql-query-optimization/scenarios/level-2/covering-indexes.yaml +74 -0
  116. package/courses/postgresql-query-optimization/scenarios/level-2/cte-optimization.yaml +83 -0
  117. package/courses/postgresql-query-optimization/scenarios/level-2/intermediate-optimization-shift.yaml +66 -0
  118. package/courses/postgresql-query-optimization/scenarios/level-2/join-optimization.yaml +72 -0
  119. package/courses/postgresql-query-optimization/scenarios/level-2/partial-and-expression-indexes.yaml +75 -0
  120. package/courses/postgresql-query-optimization/scenarios/level-2/query-planner-settings.yaml +62 -0
  121. package/courses/postgresql-query-optimization/scenarios/level-2/subquery-optimization.yaml +67 -0
  122. package/courses/postgresql-query-optimization/scenarios/level-2/window-function-optimization.yaml +63 -0
  123. package/courses/postgresql-query-optimization/scenarios/level-3/advanced-optimization-shift.yaml +71 -0
  124. package/courses/postgresql-query-optimization/scenarios/level-3/connection-pooling.yaml +60 -0
  125. package/courses/postgresql-query-optimization/scenarios/level-3/full-text-search-optimization.yaml +66 -0
  126. package/courses/postgresql-query-optimization/scenarios/level-3/jsonb-optimization.yaml +88 -0
  127. package/courses/postgresql-query-optimization/scenarios/level-3/lock-contention-analysis.yaml +80 -0
  128. package/courses/postgresql-query-optimization/scenarios/level-3/materialized-view-optimization.yaml +73 -0
  129. package/courses/postgresql-query-optimization/scenarios/level-3/parallel-query-execution.yaml +74 -0
  130. package/courses/postgresql-query-optimization/scenarios/level-3/partitioning-strategies.yaml +71 -0
  131. package/courses/postgresql-query-optimization/scenarios/level-3/specialized-index-types.yaml +67 -0
  132. package/courses/postgresql-query-optimization/scenarios/level-3/write-optimization.yaml +65 -0
  133. package/courses/postgresql-query-optimization/scenarios/level-4/data-architecture-analytics.yaml +64 -0
  134. package/courses/postgresql-query-optimization/scenarios/level-4/database-executive-communication.yaml +64 -0
  135. package/courses/postgresql-query-optimization/scenarios/level-4/database-migration-planning.yaml +57 -0
  136. package/courses/postgresql-query-optimization/scenarios/level-4/enterprise-database-governance.yaml +52 -0
  137. package/courses/postgresql-query-optimization/scenarios/level-4/expert-optimization-shift.yaml +73 -0
  138. package/courses/postgresql-query-optimization/scenarios/level-4/high-availability-architecture.yaml +62 -0
  139. package/courses/postgresql-query-optimization/scenarios/level-4/optimizer-internals.yaml +69 -0
  140. package/courses/postgresql-query-optimization/scenarios/level-4/performance-sla-design.yaml +58 -0
  141. package/courses/postgresql-query-optimization/scenarios/level-4/read-replica-optimization.yaml +62 -0
  142. package/courses/postgresql-query-optimization/scenarios/level-4/vendor-evaluation.yaml +73 -0
  143. package/courses/rest-api-error-handling/course.yaml +11 -0
  144. package/courses/rest-api-error-handling/scenarios/level-1/authentication-errors.yaml +71 -0
  145. package/courses/rest-api-error-handling/scenarios/level-1/content-negotiation-errors.yaml +63 -0
  146. package/courses/rest-api-error-handling/scenarios/level-1/error-logging-basics.yaml +63 -0
  147. package/courses/rest-api-error-handling/scenarios/level-1/error-response-format.yaml +58 -0
  148. package/courses/rest-api-error-handling/scenarios/level-1/first-error-handling-shift.yaml +67 -0
  149. package/courses/rest-api-error-handling/scenarios/level-1/http-status-codes.yaml +46 -0
  150. package/courses/rest-api-error-handling/scenarios/level-1/not-found-errors.yaml +52 -0
  151. package/courses/rest-api-error-handling/scenarios/level-1/rate-limiting-errors.yaml +56 -0
  152. package/courses/rest-api-error-handling/scenarios/level-1/request-validation-errors.yaml +59 -0
  153. package/courses/rest-api-error-handling/scenarios/level-1/server-error-handling.yaml +55 -0
  154. package/courses/rest-api-error-handling/scenarios/level-2/api-versioning-errors.yaml +66 -0
  155. package/courses/rest-api-error-handling/scenarios/level-2/batch-request-errors.yaml +61 -0
  156. package/courses/rest-api-error-handling/scenarios/level-2/circuit-breaker-pattern.yaml +52 -0
  157. package/courses/rest-api-error-handling/scenarios/level-2/error-code-taxonomy.yaml +62 -0
  158. package/courses/rest-api-error-handling/scenarios/level-2/error-monitoring-alerting.yaml +53 -0
  159. package/courses/rest-api-error-handling/scenarios/level-2/intermediate-error-shift.yaml +69 -0
  160. package/courses/rest-api-error-handling/scenarios/level-2/pagination-errors.yaml +66 -0
  161. package/courses/rest-api-error-handling/scenarios/level-2/retry-and-idempotency.yaml +60 -0
  162. package/courses/rest-api-error-handling/scenarios/level-2/rfc7807-problem-details.yaml +60 -0
  163. package/courses/rest-api-error-handling/scenarios/level-2/webhook-error-handling.yaml +55 -0
  164. package/courses/rest-api-error-handling/scenarios/level-3/advanced-error-shift.yaml +72 -0
  165. package/courses/rest-api-error-handling/scenarios/level-3/api-gateway-errors.yaml +71 -0
  166. package/courses/rest-api-error-handling/scenarios/level-3/async-api-errors.yaml +67 -0
  167. package/courses/rest-api-error-handling/scenarios/level-3/caching-error-scenarios.yaml +65 -0
  168. package/courses/rest-api-error-handling/scenarios/level-3/chaos-engineering-apis.yaml +62 -0
  169. package/courses/rest-api-error-handling/scenarios/level-3/database-error-handling.yaml +79 -0
  170. package/courses/rest-api-error-handling/scenarios/level-3/distributed-error-propagation.yaml +63 -0
  171. package/courses/rest-api-error-handling/scenarios/level-3/error-budgets-sre.yaml +61 -0
  172. package/courses/rest-api-error-handling/scenarios/level-3/error-correlation.yaml +58 -0
  173. package/courses/rest-api-error-handling/scenarios/level-3/graphql-vs-rest-errors.yaml +73 -0
  174. package/courses/rest-api-error-handling/scenarios/level-4/compliance-error-handling.yaml +65 -0
  175. package/courses/rest-api-error-handling/scenarios/level-4/enterprise-error-governance.yaml +62 -0
  176. package/courses/rest-api-error-handling/scenarios/level-4/error-analytics-platform.yaml +65 -0
  177. package/courses/rest-api-error-handling/scenarios/level-4/error-cost-optimization.yaml +63 -0
  178. package/courses/rest-api-error-handling/scenarios/level-4/error-executive-communication.yaml +60 -0
  179. package/courses/rest-api-error-handling/scenarios/level-4/error-handling-architecture.yaml +67 -0
  180. package/courses/rest-api-error-handling/scenarios/level-4/error-org-design.yaml +68 -0
  181. package/courses/rest-api-error-handling/scenarios/level-4/error-sla-design.yaml +65 -0
  182. package/courses/rest-api-error-handling/scenarios/level-4/error-training-program.yaml +61 -0
  183. package/courses/rest-api-error-handling/scenarios/level-4/expert-error-shift.yaml +63 -0
  184. package/courses/rest-api-error-handling/scenarios/level-5/comprehensive-error-system.yaml +68 -0
  185. package/courses/rest-api-error-handling/scenarios/level-5/error-ai-future.yaml +75 -0
  186. package/courses/rest-api-error-handling/scenarios/level-5/error-behavioral-science.yaml +73 -0
  187. package/courses/rest-api-error-handling/scenarios/level-5/error-board-strategy.yaml +60 -0
  188. package/courses/rest-api-error-handling/scenarios/level-5/error-consulting-engagement.yaml +58 -0
  189. package/courses/rest-api-error-handling/scenarios/level-5/error-industry-benchmarks.yaml +72 -0
  190. package/courses/rest-api-error-handling/scenarios/level-5/error-ma-integration.yaml +68 -0
  191. package/courses/rest-api-error-handling/scenarios/level-5/error-product-development.yaml +66 -0
  192. package/courses/rest-api-error-handling/scenarios/level-5/error-regulatory-landscape.yaml +80 -0
  193. package/courses/rest-api-error-handling/scenarios/level-5/master-error-shift.yaml +73 -0
  194. package/dist/cli/commands/add.d.ts.map +1 -1
  195. package/dist/cli/commands/add.js +6 -5
  196. package/dist/cli/commands/add.js.map +1 -1
  197. package/dist/cli/commands/generate.d.ts.map +1 -1
  198. package/dist/cli/commands/generate.js +4 -0
  199. package/dist/cli/commands/generate.js.map +1 -1
  200. package/dist/cli/commands/list.d.ts.map +1 -1
  201. package/dist/cli/commands/list.js +6 -18
  202. package/dist/cli/commands/list.js.map +1 -1
  203. package/dist/cli/commands/train.d.ts.map +1 -1
  204. package/dist/cli/commands/train.js +18 -18
  205. package/dist/cli/commands/train.js.map +1 -1
  206. package/dist/cli/index.js +93 -55
  207. package/dist/cli/index.js.map +1 -1
  208. package/dist/cli/run-demo.js +2 -1
  209. package/dist/cli/run-demo.js.map +1 -1
  210. package/dist/cli/setup.d.ts +18 -0
  211. package/dist/cli/setup.d.ts.map +1 -0
  212. package/dist/cli/setup.js +154 -0
  213. package/dist/cli/setup.js.map +1 -0
  214. package/dist/engine/agent-bridge.d.ts +5 -2
  215. package/dist/engine/agent-bridge.d.ts.map +1 -1
  216. package/dist/engine/agent-bridge.js +36 -9
  217. package/dist/engine/agent-bridge.js.map +1 -1
  218. package/dist/engine/loader.d.ts +21 -0
  219. package/dist/engine/loader.d.ts.map +1 -1
  220. package/dist/engine/loader.js +54 -1
  221. package/dist/engine/loader.js.map +1 -1
  222. package/dist/engine/training-loop.d.ts.map +1 -1
  223. package/dist/engine/training-loop.js +1 -0
  224. package/dist/engine/training-loop.js.map +1 -1
  225. package/dist/engine/training.d.ts.map +1 -1
  226. package/dist/engine/training.js +1 -0
  227. package/dist/engine/training.js.map +1 -1
  228. package/dist/generator/skill-generator.d.ts +1 -1
  229. package/dist/generator/skill-generator.d.ts.map +1 -1
  230. package/dist/generator/skill-generator.js +21 -2
  231. package/dist/generator/skill-generator.js.map +1 -1
  232. package/dist/mcp/server.d.ts.map +1 -1
  233. package/dist/mcp/server.js +11 -26
  234. package/dist/mcp/server.js.map +1 -1
  235. package/dist/mcp/session-manager.d.ts +3 -1
  236. package/dist/mcp/session-manager.d.ts.map +1 -1
  237. package/dist/mcp/session-manager.js +44 -22
  238. package/dist/mcp/session-manager.js.map +1 -1
  239. package/dist/types/schemas.d.ts +38 -13
  240. package/dist/types/schemas.d.ts.map +1 -1
  241. package/dist/types/schemas.js +9 -5
  242. package/dist/types/schemas.js.map +1 -1
  243. package/package.json +1 -1
@@ -0,0 +1,67 @@
+ meta:
+   id: specialized-index-types
+   level: 3
+   course: postgresql-query-optimization
+   type: output
+   description: "Use specialized index types — choose between GIN, GiST, BRIN, and SP-GiST for non-standard data types and query patterns"
+   tags: [PostgreSQL, GIN, GiST, BRIN, SP-GiST, specialized-indexes, advanced]
+
+ state: {}
+
+ trigger: |
+   Your platform has diverse data types that B-tree indexes can't
+   handle efficiently. Each needs a specialized index type.
+
+   Scenario 1 — Full-text search (articles table, 20M rows):
+   SELECT * FROM articles
+   WHERE to_tsvector('english', title || ' ' || body) @@
+   to_tsquery('english', 'postgresql & optimization');
+   Currently: Seq Scan, 15 seconds per query.
+   Needs: GIN index on tsvector.
+
+   Scenario 2 — JSONB queries (products table, 5M rows):
+   SELECT * FROM products WHERE metadata @> '{"color": "red"}';
+   SELECT * FROM products WHERE metadata ? 'warranty';
+   Currently: Seq Scan, 8 seconds per query.
+   Needs: GIN index on JSONB column.
+
+   Scenario 3 — Geospatial (locations table, 2M rows):
+   SELECT * FROM locations
+   WHERE ST_DWithin(coordinates, ST_MakePoint(-73.98, 40.75), 1000);
+   Currently: Seq Scan with distance calculation on every row.
+   Needs: GiST index on geometry column.
+
+   Scenario 4 — Time-series data (metrics table, 10B rows):
+   SELECT * FROM metrics
+   WHERE recorded_at BETWEEN '2026-02-26' AND '2026-02-27';
+   Data is inserted in time order (naturally ordered on disk).
+   B-tree index would be 200GB. Needs: BRIN index (tiny, ~10MB).
+
+   Scenario 5 — Array contains (tags table, 3M rows):
+   SELECT * FROM posts WHERE tags @> ARRAY['postgresql', 'tutorial'];
+   Currently: Seq Scan.
+   Needs: GIN index on array column.
+
+   Scenario 6 — IP address range (access_logs table, 100M rows):
+   SELECT * FROM access_logs
+   WHERE ip_address <<= '192.168.0.0/16';
+   Needs: SP-GiST or GiST index on inet type.
+
+   Task: For each scenario, write: the CREATE INDEX statement, the
+   expected query performance improvement, the index size compared to
+   B-tree, and the trade-offs (build time, write overhead, maintenance).
+   Then create a decision matrix for choosing the right index type.
+
+ assertions:
+   - type: llm_judge
+     criteria: "Index type selection is correct — GIN for full-text search, JSONB, and arrays (inverted index for containment), GiST for geospatial (nearest-neighbor, distance), BRIN for naturally-ordered time-series (block range summary), SP-GiST for IP ranges and hierarchical data"
+     weight: 0.35
+     description: "Correct index type selection"
+   - type: llm_judge
+     criteria: "Performance improvements are realistic — GIN on JSONB: 100-1000x improvement, GiST on geometry: spatial index eliminates full scan, BRIN on time-series: tiny index (10MB vs 200GB B-tree) with good pruning on ordered data. Each improvement is justified by the data characteristics"
+     weight: 0.35
+     description: "Realistic performance estimates"
+   - type: llm_judge
+     criteria: "Decision matrix is comprehensive — covers when to use each type (data type, query pattern, data ordering, write frequency), includes trade-offs (GIN: fast queries but slow updates, BRIN: tiny but requires physical ordering, GiST: versatile but larger than BRIN), and handles hybrid approaches"
+     weight: 0.30
+     description: "Comprehensive decision matrix"
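Reviewer note on the `specialized-index-types` scenario: the six index choices it names can be written down as a small lookup that emits one `CREATE INDEX` statement per scenario. The following is a minimal sketch, not part of the package. The index names and the `IndexChoice` helper are hypothetical, the table for scenario 5 follows the query (which reads from `posts`), and the `inet_ops` operator class is spelled out explicitly as an assumption about how the `inet` column would be indexed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexChoice:
    """One row of the decision matrix (hypothetical helper, not from the package)."""
    table: str
    method: str      # PostgreSQL access method: gin, gist, brin, spgist
    expression: str  # column or expression to index
    reason: str

    def create_statement(self) -> str:
        # Index names are invented here as idx_<table>_<method>.
        name = f"idx_{self.table}_{self.method}"
        return (
            f"CREATE INDEX {name} ON {self.table} "
            f"USING {self.method} ({self.expression});"
        )


# Scenario key -> index choice, following the "Needs:" lines in the trigger.
CHOICES = {
    "full_text": IndexChoice(
        "articles", "gin",
        "to_tsvector('english', title || ' ' || body)",
        "inverted index over lexemes",
    ),
    "jsonb": IndexChoice(
        "products", "gin", "metadata",
        "containment (@>) and key existence (?)",
    ),
    "geospatial": IndexChoice(
        "locations", "gist", "coordinates",
        "bounding-box search for distance queries",
    ),
    "time_series": IndexChoice(
        "metrics", "brin", "recorded_at",
        "block range summaries on physically ordered data",
    ),
    "array_contains": IndexChoice(
        "posts", "gin", "tags",
        "array containment (@>)",
    ),
    "ip_range": IndexChoice(
        "access_logs", "spgist", "ip_address inet_ops",
        "network containment (<<=)",
    ),
}

if __name__ == "__main__":
    for scenario, choice in CHOICES.items():
        print(f"-- {scenario}: {choice.reason}")
        print(choice.create_statement())
```

The expression in the full-text entry has to match the expression in the query's `WHERE` clause for the planner to consider the index, which is why it is copied verbatim from scenario 1.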
@@ -0,0 +1,65 @@
+ meta:
+   id: write-optimization
+   level: 3
+   course: postgresql-query-optimization
+   type: output
+   description: "Optimize write performance — accelerate bulk inserts, COPY operations, and upserts for data pipeline workloads"
+   tags: [PostgreSQL, write-optimization, COPY, bulk-insert, upsert, advanced]
+
+ state: {}
+
+ trigger: |
+   Your data pipeline ingests 100M rows per day into PostgreSQL. The
+   current pipeline takes 16 hours to load each day's data, meaning
+   you're falling behind. By next month, you'll have a 2-day backlog.
+
+   Current pipeline:
+   1. Read CSV files from S3 (100 files × 1M rows each)
+   2. For each file: for each row: INSERT INTO events (...) VALUES (...)
+   3. After all inserts: UPDATE statistics tables
+   4. After all updates: run VACUUM ANALYZE
+
+   Table: events (currently 5B rows)
+   - 15 columns, average row size 500 bytes
+   - 8 indexes (some composite)
+   - 3 triggers (audit logging, updated_at, notification)
+   - Foreign keys to 4 other tables
+
+   Performance profile:
+   - Single INSERT: 0.5ms per row
+   - 100M rows × 0.5ms = 50,000 seconds = 13.8 hours
+   - Overhead: FK checks (1 hour), trigger execution (1 hour),
+   WAL generation (0.2 hours)
+   - Total: ~16 hours
+
+   Target: Load 100M rows in < 2 hours
+
+   Optimization options to evaluate:
+   1. COPY instead of INSERT (binary or CSV format)
+   2. Batch INSERTs (100-1000 rows per statement)
+   3. Drop indexes → load → recreate indexes
+   4. Disable triggers during load
+   5. Disable FK checks during load
+   6. UNLOGGED table for staging
+   7. Increase wal_buffers and checkpoint_timeout
+   8. Parallel loading (multiple COPY workers)
+
+   Task: Design the optimized pipeline. Evaluate each option with
+   expected speedup. Write: the pipeline architecture, the COPY
+   configuration, the index management strategy, the WAL tuning,
+   and the safety measures (how to validate data integrity after
+   disabling constraints).
+
+ assertions:
+   - type: llm_judge
+     criteria: "Pipeline uses COPY for bulk loading — explains that COPY is 10-100x faster than individual INSERTs, shows the exact COPY command with optimal settings, and uses parallel COPY workers (one per file) with appropriate staging table approach"
+     weight: 0.35
+     description: "COPY-based pipeline"
+   - type: llm_judge
+     criteria: "Index and constraint management is safe — drops non-essential indexes before load, recreates with CREATE INDEX CONCURRENTLY after, disables triggers safely with ALTER TABLE DISABLE TRIGGER, and validates FK constraints after re-enabling with ALTER TABLE VALIDATE CONSTRAINT"
+     weight: 0.35
+     description: "Safe index/constraint management"
+   - type: llm_judge
+     criteria: "WAL optimization is included — increases checkpoint_timeout and max_wal_size for large loads, uses wal_level=minimal for initial load (if possible), and explains the tradeoff between WAL reduction and crash recovery risk. The total estimated time is under 2 hours"
+     weight: 0.30
+     description: "WAL optimization included"
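Reviewer note on the `write-optimization` scenario: the performance profile is plain arithmetic, so it can be recomputed to see how large a speedup the 2-hour target implies. This is a minimal sketch using only the figures stated in the trigger. The constant and function names are made up for illustration, and the result says nothing about which optimization option delivers the speedup.

```python
# Figures taken from the scenario's trigger text.
ROWS_PER_DAY = 100_000_000
INSERT_MS_PER_ROW = 0.5
OVERHEAD_HOURS = {"fk_checks": 1.0, "triggers": 1.0, "wal_generation": 0.2}
TARGET_HOURS = 2.0


def insert_hours(rows: int, ms_per_row: float) -> float:
    """Hours spent on row-by-row INSERTs at a fixed per-row latency."""
    return rows * ms_per_row / 1000 / 3600


# Row-by-row insert time plus the listed overheads.
baseline_hours = insert_hours(ROWS_PER_DAY, INSERT_MS_PER_ROW) + sum(
    OVERHEAD_HOURS.values()
)

# Sustained throughput needed to finish inside the target window.
required_rows_per_second = ROWS_PER_DAY / (TARGET_HOURS * 3600)

# Overall speedup the redesigned pipeline has to reach.
required_speedup = baseline_hours / TARGET_HOURS

if __name__ == "__main__":
    print(f"baseline: {baseline_hours:.1f} h")
    print(f"required throughput: {required_rows_per_second:,.0f} rows/s")
    print(f"required speedup: {required_speedup:.1f}x")
```

Under these numbers the insert step alone is about 13.9 hours (the trigger rounds it down to 13.8), and the target needs roughly an 8x overall speedup, or about 13,900 rows per second sustained across all workers.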
@@ -0,0 +1,64 @@
+ meta:
+   id: data-architecture-analytics
+   level: 4
+   course: postgresql-query-optimization
+   type: output
+   description: "Design data architecture for analytics — separate OLTP from OLAP, implement data warehousing patterns, and optimize PostgreSQL for analytical workloads"
+   tags: [PostgreSQL, data-architecture, OLTP, OLAP, analytics, warehousing, expert]
+
+ state: {}
+
+ trigger: |
+   Your company runs everything on PostgreSQL — OLTP transactions AND
+   analytics queries on the same database. The analytics team's queries
+   are killing production performance. The CTO wants a data architecture
+   that separates concerns without introducing too much complexity.
+
+   Current state:
+   - Single PostgreSQL cluster handling both OLTP and analytics
+   - OLTP: 50K transactions/second, needs <10ms latency
+   - Analytics: 200 queries/day, each scanning 100M+ rows, taking 5-30 min
+   - Analytics queries cause: table locks, buffer cache pollution, CPU spikes
+   - Data team wants: real-time dashboards, ad-hoc queries, ML feature extraction
+   - Current data volume: 2TB, growing 50GB/month
+
+   Recent incidents caused by analytics on production:
+   1. Analytics query consumed all work_mem (256MB × 50 parallel workers),
+   causing OOM and PostgreSQL restart
+   2. Sequential scan on 500M-row table evicted hot OLTP data from
+   shared_buffers, causing 10x latency spike for 30 minutes
+   3. Long-running analytics transaction prevented VACUUM, causing table
+   bloat that filled the disk
+
+   Architecture options being considered:
+   A. Read replica dedicated to analytics (simplest)
+   B. PostgreSQL + columnar extension (Citus columnar, pg_analytics)
+   C. PostgreSQL OLTP + dedicated analytics database (ClickHouse, DuckDB)
+   D. PostgreSQL OLTP + data warehouse (Snowflake, BigQuery, Redshift)
+   E. PostgreSQL OLTP + data lake (S3 + Iceberg + query engine)
+
+   Constraints:
+   - Engineering team knows PostgreSQL, limited experience with other DBs
+   - Budget: $50K/month for analytics infrastructure
+   - Data freshness: dashboards need data within 5 minutes
+   - Compliance: PII must be masked in analytics layer
+   - 3 data engineers available for implementation
+
+   Task: Design the data architecture. Write: the recommended approach
+   with justification, the data pipeline design (how data flows from OLTP
+   to analytics), the query optimization for analytical workloads, the
+   PII handling strategy, and the migration plan from current state.
+
+ assertions:
+   - type: llm_judge
+     criteria: "Architecture separates OLTP and OLAP effectively — recommends an approach that eliminates analytics impact on production, justifies the choice against all 5 options considering the team's PostgreSQL expertise and constraints, and explains the data flow from OLTP to analytics layer (CDC, logical replication, or ETL)"
+     weight: 0.35
+     description: "Sound OLTP/OLAP separation"
+   - type: llm_judge
+     criteria: "Data pipeline achieves freshness requirements — designs a pipeline that delivers data within 5 minutes to the analytics layer, handles schema evolution, includes PII masking (column-level encryption, tokenization, or view-based masking), and addresses the 3 past incidents (memory, cache pollution, vacuum blocking)"
+     weight: 0.35
+     description: "Production-ready data pipeline"
+   - type: llm_judge
+     criteria: "Migration plan is realistic — phases the migration (start with read replica, then add dedicated analytics), estimates timeline considering 3 data engineers, identifies risks (data consistency during migration, team learning curve), and provides a cost projection within the $50K/month budget"
+     weight: 0.30
+     description: "Realistic migration plan"
@@ -0,0 +1,64 @@
1
+ meta:
2
+ id: database-executive-communication
3
+ level: 4
4
+ course: postgresql-query-optimization
5
+ type: output
6
+ description: "Communicate database performance to executives — translate query optimization into business impact, ROI, and strategic decisions"
7
+ tags: [PostgreSQL, executive-communication, ROI, business-impact, strategy, expert]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ You're the Director of Database Engineering presenting to the C-suite.
13
+ Your team has spent 6 months optimizing the PostgreSQL infrastructure,
14
+ and the CFO wants to know the ROI. The CTO wants to understand the
15
+ technical debt. The CEO wants to know if the database can support 10x
16
+ growth.
17
+
18
+ What your team accomplished (6-month optimization program):
19
+ 1. Query optimization: Identified and fixed 200 slow queries
20
+ - Average response time: 250ms → 45ms (82% improvement)
21
+ - P99 latency: 5 seconds → 200ms (96% improvement)
22
+ 2. Index optimization: Added 50 indexes, removed 30 unused ones
23
+ - Storage for indexes: reduced from 800GB to 500GB
24
+ - Write performance: improved 15% (30 unused indexes no longer maintained on write-heavy tables)
25
+ 3. Connection pooling: Deployed PgBouncer across all services
26
+ - Max connections reduced from 5,000 to 500 (to PostgreSQL)
27
+ - Connection-related timeouts: eliminated
28
+ 4. Partitioning: Partitioned 5 largest tables
29
+ - Query performance on partitioned tables: 10x faster
30
+ - VACUUM time: 8 hours → 45 minutes per table
31
+ 5. Infrastructure: Migrated 3 clusters to larger instances
32
+ - Cost increase: $15K/month ($180K/year)
33
+
34
+ Financial context:
35
+ - Database infrastructure cost: $1.2M/year
36
+ - Engineering team (6 DBAs + 4 SREs): $2M/year
37
+ - Revenue impact of downtime: $50K/minute
38
+ - Last year's downtime: 18 hours ($54M potential impact)
39
+ - This year's downtime (so far): 52 minutes ($2.6M potential impact)
40
+
41
+ The executives ask:
42
+ - CFO: "What's the ROI of this optimization program?"
43
+ - CTO: "What's our remaining technical debt in the database layer?"
44
+ - CEO: "Can we support 10x user growth without 10x database cost?"
45
+
46
+ Task: Write the executive presentation. Include: the ROI calculation
47
+ (quantified business impact), the technical debt assessment (in
48
+ business terms), the scaling roadmap (cost projections for 10x),
49
+ the risk register (what could still go wrong), and the recommended
50
+ next investments.
51
+
52
+ assertions:
53
+ - type: llm_judge
54
+ criteria: "ROI is quantified in business terms — calculates the value of reduced downtime ($54M→$2.6M potential impact = $51.4M risk reduction), the value of faster response times (conversion rate impact, user experience), and compares against total investment ($180K infrastructure + team costs). Presents clear ROI ratio"
55
+ weight: 0.35
56
+ description: "Quantified business ROI"
57
+ - type: llm_judge
58
+ criteria: "Technical debt is translated for executives — expresses database technical debt in terms of risk, cost, and timeline rather than technical jargon. Identifies remaining debt items (e.g., tables not yet partitioned, queries not yet optimized, missing monitoring) and estimates the cost/effort to address each"
59
+ weight: 0.35
60
+ description: "Executive-friendly technical debt"
61
+ - type: llm_judge
62
+ criteria: "Scaling roadmap addresses 10x growth — projects database costs at 2x, 5x, 10x user growth (not linear — explains economies of scale and breaking points), identifies when architectural changes are needed (e.g., sharding at 50TB), and provides a timeline with investment milestones"
63
+ weight: 0.30
64
+ description: "10x scaling roadmap"
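The ROI criterion above rests on a few lines of arithmetic, sketched here for reference. The dollar figures and durations come from the scenario's financial context. The `team_cost_share` parameter is a hypothetical knob, not something the scenario specifies, for how much of the $2M/year team cost is attributed to the program. Note that this year's downtime is a partial-year figure ("so far"), so the risk reduction is an upper bound.

```python
# Downtime and ROI arithmetic for the executive-communication scenario.
DOWNTIME_COST_PER_MINUTE = 50_000        # USD, from the scenario
LAST_YEAR_DOWNTIME_MIN = 18 * 60         # 18 hours
THIS_YEAR_DOWNTIME_MIN = 52              # year to date
INFRA_INCREASE_PER_YEAR = 15_000 * 12    # $15K/month for larger instances
TEAM_COST_PER_YEAR = 2_000_000           # 6 DBAs + 4 SREs


def downtime_impact(minutes: int) -> int:
    """Potential revenue impact of a given number of downtime minutes."""
    return minutes * DOWNTIME_COST_PER_MINUTE


def roi_ratio(benefit: float, team_cost_share: float = 0.0) -> float:
    """Benefit divided by program cost.

    ASSUMPTION: team_cost_share (0.0 to 1.0) is a hypothetical parameter for
    the fraction of annual team cost charged to the optimization program.
    """
    cost = INFRA_INCREASE_PER_YEAR + team_cost_share * TEAM_COST_PER_YEAR
    return benefit / cost


last_year = downtime_impact(LAST_YEAR_DOWNTIME_MIN)
this_year = downtime_impact(THIS_YEAR_DOWNTIME_MIN)
risk_reduction = last_year - this_year

print(f"Last year: ${last_year:,}")
print(f"This year: ${this_year:,}")
print(f"Risk reduction: ${risk_reduction:,}")
print(f"ROI ratio, half of team cost attributed: {roi_ratio(risk_reduction, 0.5):.1f}x")
```

Making the attribution explicit matters because the ratio changes by an order of magnitude depending on whether team cost is counted at all.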
@@ -0,0 +1,57 @@
1
+ meta:
2
+ id: database-migration-planning
3
+ level: 4
4
+ course: postgresql-query-optimization
5
+ type: output
6
+ description: "Plan a terabyte-scale database migration — migrate PostgreSQL with near-zero downtime using logical replication"
7
+ tags: [PostgreSQL, migration, logical-replication, zero-downtime, expert]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your company needs to migrate a 5TB PostgreSQL 14 database to
13
+ PostgreSQL 16 on new hardware. The database processes $50M in
14
+ transactions daily and has an SLA of 99.99% availability. Maximum
15
+ acceptable downtime: 5 minutes.
16
+
17
+ Source: PostgreSQL 14 on bare metal, 5TB, 200 tables
18
+ Target: PostgreSQL 16 on AWS RDS, same schema
19
+
20
+ Challenges:
21
+ 1. 5TB of data takes ~10 hours to pg_dump/restore
22
+ 2. Logical replication needs to catch up on changes during transfer
23
+ 3. 15 custom extensions on the source (3 not available on RDS)
24
+ 4. Sequences must be synchronized at cutover
25
+ 5. Application uses prepared statements tied to backend connections
26
+ 6. The orders table has 2B rows and takes 3 hours to COPY alone
27
+ 7. Read replicas serving analytics must also be migrated
28
+ 8. Rollback plan: if the new database fails, how to go back?
29
+
30
+ Migration options evaluated:
31
+ A. pg_dump + pg_restore (simple, requires hours of downtime)
32
+ B. Logical replication (near-zero downtime, complex setup)
33
+ C. AWS DMS (managed, but performance limitations)
34
+ D. pglogical extension (more features than native logical rep)
35
+
36
+ The VP of Engineering asked: "Can we do this with zero downtime
37
+ and zero data loss? What's the actual risk?"
38
+
39
+ Task: Design the migration plan. Write: the chosen approach with
40
+ justification, the step-by-step migration procedure, the cutover
41
+ choreography (the 5-minute window), the validation checklist
42
+ (how to verify the migration succeeded), the rollback plan, and
43
+ the risk assessment.
44
+
45
+ assertions:
46
+ - type: llm_judge
47
+ criteria: "Migration approach handles the constraints — addresses the 5TB volume (parallel COPY or initial pg_dump + logical replication for incremental), the 3 incompatible extensions (workarounds or replacements), and sequence synchronization at cutover. The approach achieves <5 minutes downtime"
48
+ weight: 0.35
49
+ description: "Constraint-handling approach"
50
+ - type: llm_judge
51
+ criteria: "Cutover choreography is precise — defines the exact steps during the 5-minute window (stop writes, ensure replication caught up, sync sequences, switch application connection strings, verify, resume writes), with timing estimates for each step"
52
+ weight: 0.35
53
+ description: "Precise cutover choreography"
54
+ - type: llm_judge
55
+ criteria: "Validation and rollback are thorough — row count comparison per table, checksum verification for critical tables, application smoke tests, and the rollback plan keeps the original database in sync during and after cutover (for example, via reverse logical replication) so it can take over immediately if needed"
56
+ weight: 0.30
57
+ description: "Thorough validation and rollback"
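The row-count comparison named in the validation criterion can be sketched as a pure function over per-table counts. This is a minimal illustration that assumes the counts were already collected from source and target (for example with `SELECT count(*)` per table). The function name and the sample table names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (table name, source count or None, target count or None)
Mismatch = Tuple[str, Optional[int], Optional[int]]


def compare_row_counts(source: Dict[str, int], target: Dict[str, int]) -> List[Mismatch]:
    """Return every table whose row count differs or that exists on one side only."""
    mismatches: List[Mismatch] = []
    for table in sorted(set(source) | set(target)):
        src = source.get(table)
        tgt = target.get(table)
        if src != tgt:
            mismatches.append((table, src, tgt))
    return mismatches


# Hypothetical counts gathered after replication has caught up and writes are stopped.
source_counts = {"orders": 2_000_000_000, "customers": 5_000_000, "audit_log": 10}
target_counts = {"orders": 2_000_000_000, "customers": 4_999_990}

for table, src, tgt in compare_row_counts(source_counts, target_counts):
    print(f"MISMATCH {table}: source={src} target={tgt}")
```

Counts only prove that rows arrived, not that their contents match, which is why the criterion pairs them with checksums on critical tables.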
@@ -0,0 +1,52 @@
1
+ meta:
2
+ id: enterprise-database-governance
3
+ level: 4
4
+ course: postgresql-query-optimization
5
+ type: output
6
+ description: "Design enterprise database governance — standardize PostgreSQL operations across a large organization with multiple teams and databases"
7
+ tags: [PostgreSQL, governance, enterprise, standardization, expert]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ You're the VP of Data Infrastructure at a company with 1,500
13
+ engineers and 200 PostgreSQL databases. An audit revealed systemic
14
+ problems:
15
+
16
+ - 200 databases, 40 teams, no consistent standards
17
+ - 30% of databases have never been vacuumed manually
18
+ - 50 databases have no monitoring
19
+ - 15 databases have default settings (shared_buffers=128MB on
20
+ 256GB RAM servers)
21
+ - Query performance varies 100x between teams for similar workloads
22
+ - 8 production incidents last year caused by unoptimized queries
23
+ - No query review process before deployment
24
+ - Connection pooling used by only 5 teams
25
+ - Backup tested by 0 teams (backups exist but never restored)
26
+
27
+ The CTO wants a governance framework that:
28
+ - Standardizes PostgreSQL configuration across all 200 databases
29
+ - Prevents slow queries from reaching production
30
+ - Ensures backup and recovery readiness
31
+ - Provides visibility into database health organization-wide
32
+ - Doesn't create a bottleneck (teams must remain autonomous)
33
+
34
+ Task: Design the governance framework. Write: the configuration
35
+ standards (tiered by database size/criticality), the query review
36
+ process (automated CI/CD checks), the monitoring and alerting
37
+ standards, the backup verification program, and the organizational
38
+ structure (DBA team vs platform team vs embedded).
39
+
40
+ assertions:
41
+ - type: llm_judge
42
+ criteria: "Configuration standards are tiered and practical — defines standard settings for small/medium/large databases, provides a base configuration template that teams can customize, and addresses the 15 databases with default settings as an immediate remediation"
43
+ weight: 0.35
44
+ description: "Tiered configuration standards"
45
+ - type: llm_judge
46
+ criteria: "Query review is automated — CI/CD pipeline includes EXPLAIN analysis for new queries, automated detection of missing indexes, N+1 patterns, and sequential scans on large tables. Blocks deployment if query exceeds performance thresholds. Teams maintain autonomy through self-service tooling"
47
+ weight: 0.35
48
+ description: "Automated query review"
49
+ - type: llm_judge
50
+ criteria: "Monitoring and backup programs are comprehensive — unified monitoring dashboard across 200 databases, alerting on key metrics (replication lag, table bloat, connection count, query latency), and backup recovery is tested quarterly with documented RTO/RPO per database"
51
+ weight: 0.30
52
+ description: "Comprehensive monitoring and backup"
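The "default settings" finding in this scenario can be checked mechanically. The sketch below uses the PostgreSQL documentation's starting point of roughly 25% of RAM for `shared_buffers` on a dedicated server. The 10% "far below guideline" cutoff and the database names are assumptions made for illustration.

```python
DEFAULT_SHARED_BUFFERS_MB = 128   # PostgreSQL's shipped default
GUIDELINE_FRACTION = 0.25         # documented starting point for dedicated servers


def suggested_shared_buffers_mb(ram_gb: int) -> int:
    """Starting value for shared_buffers in MB, to be tuned per workload."""
    return int(ram_gb * 1024 * GUIDELINE_FRACTION)


def needs_remediation(ram_gb: int, shared_buffers_mb: int) -> bool:
    """Flag a database whose setting is far below the guideline.

    ASSUMPTION: "far below" means under 10% of the suggested value.
    """
    return shared_buffers_mb < 0.10 * suggested_shared_buffers_mb(ram_gb)


# Hypothetical inventory rows: (name, RAM in GB, current shared_buffers in MB)
inventory = [
    ("billing-primary", 256, DEFAULT_SHARED_BUFFERS_MB),
    ("search-primary", 64, 16 * 1024),
]

for name, ram_gb, current_mb in inventory:
    if needs_remediation(ram_gb, current_mb):
        target = suggested_shared_buffers_mb(ram_gb)
        print(f"{name}: shared_buffers={current_mb}MB, suggest about {target}MB")
```

For the 256GB servers in the scenario this suggests about 64GB, 512 times the 128MB default.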
@@ -0,0 +1,73 @@
1
+ meta:
2
+ id: expert-optimization-shift
3
+ level: 4
4
+ course: postgresql-query-optimization
5
+ type: output
6
+ description: "Expert optimization shift — handle a complex production crisis combining replication failure, query regression, and capacity emergency during a traffic spike"
7
+ tags: [PostgreSQL, shift-simulation, crisis, replication, capacity, expert]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ You're the on-call database lead during Black Friday. Multiple
13
+ cascading failures hit your PostgreSQL infrastructure simultaneously.
14
+ The CEO is in the war room. Revenue is dropping $100K/minute.
15
+
16
+ Timeline of events:
17
+
18
+ 10:00 AM — Traffic starts at 3x normal. All systems green.
19
+
20
+ 10:15 AM — Alert: Primary CPU at 95%. Query latency P99 jumps from
21
+ 200ms to 2 seconds. PgBouncer queue depth growing.
22
+
23
+ 10:22 AM — Alert: Replication lag on Replica 2 exceeds 60 seconds.
24
+ Replica removed from read pool automatically. Remaining 2 replicas
25
+ now handle all read traffic — they start lagging too.
26
+
27
+ 10:25 AM — Alert: Connection pool exhausted. Application returning
28
+ 503 errors. 40% of checkout requests failing. Revenue impact begins.
29
+
30
+ 10:30 AM — Investigation reveals: A deployment at 10:05 AM introduced
31
+ a new query in the checkout flow:
32
+ SELECT * FROM inventory i
33
+ JOIN products p ON p.id = i.product_id
34
+ JOIN warehouses w ON w.id = i.warehouse_id
35
+ WHERE p.category_id IN (SELECT id FROM categories WHERE parent_id = $1)
36
+ AND i.quantity > 0
37
+ ORDER BY w.distance_from(point($2, $3))
38
+ LIMIT 20;
39
+ This query does a sequential scan on inventory (200M rows) because
40
+ the planner doesn't have statistics for the distance_from function.
41
+
42
+ 10:35 AM — Alert: Disk space on primary dropping fast. Autovacuum
43
+ is blocked by a 2-hour-old analytics transaction on Replica 1 that's
44
+ preventing cleanup of dead tuples. WAL files accumulating.
45
+
46
+ 10:40 AM — Disk at 90%. If it hits 100%, PostgreSQL will crash.
47
+
48
+ Simultaneous problems:
49
+ 1. New query causing sequential scan on 200M rows (10:30 AM finding)
50
+ 2. Replication cascade failure (replicas can't keep up)
51
+ 3. Connection pool exhausted (503 errors)
52
+ 4. Disk filling due to WAL accumulation + vacuum blocked
53
+ 5. Revenue loss of $100K/minute
54
+
55
+ Task: Write the incident response plan. Include: the immediate
56
+ triage (first 5 minutes — what do you do RIGHT NOW), the
57
+ stabilization plan (next 30 minutes), the root cause analysis,
58
+ the post-incident improvements, and the communication updates to
59
+ the war room at each stage.
60
+
61
+ assertions:
62
+ - type: llm_judge
63
+ criteria: "Immediate triage prioritizes correctly — addresses the disk space emergency first (terminate the blocking analytics transaction, manually trigger VACUUM, or move WAL to another volume), then mitigates the bad query (kill active instances, roll back the deployment, or add an index), and increases connection pool limits or redirects traffic. Timing for each step is estimated"
64
+ weight: 0.35
65
+ description: "Correct triage prioritization"
66
+ - type: llm_judge
67
+ criteria: "Stabilization addresses all 5 problems — fixes the query (add appropriate index or rewrite), recovers replication (restart WAL replay, potentially rebuild lagging replicas), restores connection capacity (increase pool, reduce per-query connection time), reclaims disk space (vacuum, archive WAL), and provides war-room updates with ETA for recovery"
68
+ weight: 0.35
69
+ description: "Comprehensive stabilization"
70
+ - type: llm_judge
71
+ criteria: "Post-incident improvements prevent recurrence — recommends query review in CI/CD (EXPLAIN analysis before deployment), query timeout configuration, disk space alerting with more headroom, replication lag circuit breakers, and a deployment freeze policy during high-traffic events. Root cause traces back to missing query review process"
72
+ weight: 0.30
73
+ description: "Preventive post-incident improvements"
@@ -0,0 +1,62 @@
1
+ meta:
2
+ id: high-availability-architecture
3
+ level: 4
4
+ course: postgresql-query-optimization
5
+ type: output
6
+ description: "Design PostgreSQL high availability — architect streaming replication, automatic failover, and multi-region deployments"
7
+ tags: [PostgreSQL, high-availability, replication, failover, Patroni, expert]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your company's SLA requires 99.99% database availability (52
13
+ minutes of downtime per year). Currently you have a single
14
+ PostgreSQL instance with no replication. Last year's downtime:
15
+ 18 hours across 4 incidents.
16
+
17
+ Incidents that caused downtime:
18
+ 1. Hardware failure (4 hours): Server disk died, restored from
19
+ backup. RPO: 1 hour of data lost.
20
+ 2. Failed upgrade (6 hours): PG 15 → 16 upgrade failed, had to
21
+ roll back. No secondary to fail to.
22
+ 3. DDL lock (3 hours): ALTER TABLE on 500M-row table took
23
+ AccessExclusive lock, blocking all queries.
24
+ 4. Connection storm (5 hours): Application bug opened 10K
25
+ connections, server ran out of memory and crashed.
26
+
27
+ Requirements:
28
+ - 99.99% availability (< 52 minutes downtime/year)
29
+ - RPO < 5 minutes (maximum data loss in disaster)
30
+ - RTO < 2 minutes (time to recover from failure)
31
+ - Zero-downtime deployments (schema changes, PG upgrades)
32
+ - Multi-region for disaster recovery
33
+ - Read scaling for analytics workload
34
+
35
+ Architecture options:
36
+ A. Patroni + streaming replication (self-managed)
37
+ B. AWS RDS Multi-AZ
38
+ C. AWS Aurora PostgreSQL
39
+ D. Citus for horizontal scaling
40
+ E. Hybrid: Aurora primary + self-managed read replicas
41
+
42
+ Current infrastructure: AWS, 3 availability zones, 2 regions
43
+
44
+ Task: Design the HA architecture. Write: the architecture
45
+ recommendation (with justification against all 5 options), the
46
+ failover mechanism (automatic detection and promotion), the
47
+ replication topology (sync vs async, cascade), the zero-downtime
48
+ deployment strategy, and the DR plan (cross-region recovery).
49
+
50
+ assertions:
51
+ - type: llm_judge
52
+ criteria: "Architecture choice is well-justified — evaluates all 5 options against the requirements (availability, RPO, RTO, zero-downtime, multi-region, read scaling), selects the best fit, and explains why the others don't meet requirements or are less optimal"
53
+ weight: 0.35
54
+ description: "Well-justified architecture choice"
55
+ - type: llm_judge
56
+ criteria: "Failover mechanism achieves RTO < 2 minutes — automatic failure detection (health checks, WAL lag monitoring), automatic promotion (Patroni consensus or Aurora automatic failover), application connection routing (DNS failover, connection string discovery), and the 4 past incidents would all be handled"
57
+ weight: 0.35
58
+ description: "RTO-achieving failover"
59
+ - type: llm_judge
60
+ criteria: "Zero-downtime deployments and DR are addressed — schema changes use online DDL techniques (CREATE INDEX CONCURRENTLY, pg_repack), PG upgrades use logical replication for live migration, and cross-region DR with async replication achieves RPO < 5 minutes"
61
+ weight: 0.30
62
+ description: "Zero-downtime and DR"
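The "52 minutes" figure in this scenario follows directly from the availability target. A short sketch of the arithmetic, ignoring leap years:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_budget_minutes(availability_pct: float) -> float:
    """Allowed downtime per year, in minutes, for an availability percentage."""
    return MINUTES_PER_YEAR * (100.0 - availability_pct) / 100.0


budget = downtime_budget_minutes(99.99)
last_year_downtime = 18 * 60  # 18 hours across 4 incidents, from the scenario

print(f"99.99% budget: {budget:.2f} minutes/year")
print(f"Last year's downtime was {last_year_downtime / budget:.1f}x the budget")
```

Last year's 18 hours is about 20 times the annual budget, so any single incident from the list would have breached the SLA on its own.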
@@ -0,0 +1,69 @@
1
+ meta:
2
+ id: optimizer-internals
3
+ level: 4
4
+ course: postgresql-query-optimization
5
+ type: output
6
+ description: "Understand the cost-based optimizer — learn how PostgreSQL estimates query costs and how to influence plan choices"
7
+ tags: [PostgreSQL, optimizer, cost-model, statistics, planner, expert]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your team has a critical query that occasionally switches from a
13
+ fast plan (Index Scan, 5ms) to a slow plan (Seq Scan, 15 seconds).
14
+ It seems random. The team wants to "force" the fast plan, but you
15
+ need to understand why the planner sometimes picks the slow one.
16
+
17
+ The query:
18
+ SELECT o.*, c.name
19
+ FROM orders o
20
+ JOIN customers c ON c.id = o.customer_id
21
+ WHERE o.status = $1 AND o.created_at >= $2;
22
+
23
+ Fast plan (when status = 'pending', 500 rows estimated):
24
+ Nested Loop Join
25
+ → Index Scan on orders using idx_orders_status_date
26
+ → Index Scan on customers (pk)
27
+ Cost: 1500, Time: 5ms
28
+
29
+ Slow plan (when status = 'completed', 5M rows estimated):
30
+ Hash Join
31
+ → Seq Scan on orders (filter: status = 'completed')
32
+ → Hash on customers
33
+ Cost: 45000, Time: 15 seconds
34
+
35
+ The problem: With prepared statements, PostgreSQL eventually
36
+ switches to a generic plan that must work for ALL parameter values.
37
+ If 'completed' is the most common value, the generic plan chooses
38
+ Seq Scan (because it's better for 5M rows), even when the actual
39
+ parameter is 'pending' (500 rows).
40
+
41
+ Additional optimizer mysteries:
42
+ 1. Why does the planner sometimes choose Hash Join over Merge Join
43
+ even when data is sorted?
44
+ 2. Why does increasing work_mem from 4MB to 256MB make some queries
45
+ slower? (Hash Join replaces Index Scan)
46
+ 3. The planner estimates 100 rows but actual is 50,000 — why are
47
+ statistics wrong?
48
+ 4. Two identical queries with different literal values get different
49
+ plans — what's happening?
50
+
51
+ Task: Explain the optimizer's decision-making process. Write: how
52
+ cost estimation works (the formula), how statistics influence
53
+ estimates, why prepared statement plans differ from ad-hoc plans,
54
+ how to fix the plan instability for the critical query, and when
55
+ it's appropriate to influence the planner.
56
+
57
+ assertions:
58
+ - type: llm_judge
59
+ criteria: "Cost model is correctly explained — explains how costs are reported (startup cost and total cost, where total cost already includes startup cost) and computed (page fetches and per-row CPU work weighted by the planner cost parameters), how seq_page_cost and random_page_cost affect scan choice, how row estimates come from statistics (histograms, MCVs, distinct values), and why the planner's choice is rational given its estimates"
60
+ weight: 0.35
61
+ description: "Correct cost model explanation"
62
+ - type: llm_judge
63
+ criteria: "Prepared statement plan instability is addressed — explains custom vs generic plan switching (threshold after ~5 executions), how plan_cache_mode can force custom plans, and alternative solutions (partition by status, partial indexes, or restructuring the query to avoid the problem)"
64
+ weight: 0.35
65
+ description: "Plan instability addressed"
66
+ - type: llm_judge
67
+ criteria: "All 4 mysteries are explained — Hash vs Merge Join cost comparison, work_mem increase changing plan choice (hash now fits in memory so Hash Join replaces Index Scan), statistics accuracy (insufficient samples, correlated columns), and different literal values matching different MCV or histogram entries, which yields different selectivity estimates and therefore different plans"
68
+ weight: 0.30
69
+ description: "All mysteries explained"
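For the "how cost estimation works" part of this scenario, the sequential-scan estimate described in the PostgreSQL documentation is the simplest concrete case: pages read times `seq_page_cost`, plus rows scanned times `cpu_tuple_cost`, plus rows times `cpu_operator_cost` for each filter condition. The sketch below uses the default parameter values. The table size is hypothetical, and index-scan costing is deliberately left out because it is considerably more involved.

```python
# Default planner cost parameters.
SEQ_PAGE_COST = 1.0
CPU_TUPLE_COST = 0.01
CPU_OPERATOR_COST = 0.0025


def seq_scan_total_cost(pages: int, rows: int, filter_quals: int = 0) -> float:
    """Estimated total cost of a sequential scan, in planner cost units.

    pages and rows correspond to pg_class.relpages and pg_class.reltuples.
    filter_quals is the number of simple WHERE conditions checked per row.
    """
    io_cost = pages * SEQ_PAGE_COST
    cpu_cost = rows * (CPU_TUPLE_COST + filter_quals * CPU_OPERATOR_COST)
    return io_cost + cpu_cost


# Hypothetical table: 358 pages, 10,000 rows.
print(seq_scan_total_cost(358, 10_000))                  # no WHERE clause
print(seq_scan_total_cost(358, 10_000, filter_quals=1))  # one filter condition
```

Cost units are arbitrary and only meaningful relative to each other, which is why the planner can rationally pick a Seq Scan whenever its row estimate makes the index path look more expensive.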
@@ -0,0 +1,58 @@
1
+ meta:
2
+ id: performance-sla-design
3
+ level: 4
4
+ course: postgresql-query-optimization
5
+ type: output
6
+ description: "Design database performance SLAs — define latency budgets, throughput targets, and capacity planning for PostgreSQL at scale"
7
+ tags: [PostgreSQL, SLA, performance, latency, throughput, capacity-planning, expert]
8
+
9
+ state: {}
10
+
11
+ trigger: |
12
+ Your company is formalizing database performance SLAs for the first
13
+ time. The platform serves 50M daily active users across 12 product
14
+ teams. Each team has different performance needs, but there are no
15
+ documented standards. Last quarter, 3 incidents were caused by teams
16
+ with conflicting expectations about what "fast" means.
17
+
18
+ Current state:
19
+ - 15 PostgreSQL clusters (mix of RDS and self-managed)
20
+ - No formal latency targets — teams say "it should be fast"
21
+ - P50 latency: 5ms, P95: 50ms, P99: 500ms, P99.9: 5 seconds
22
+ - The P99.9 tail is causing user-facing timeouts
23
+ - No query budget per team — one team's expensive query impacts others
24
+ - No capacity planning — teams discover limits during traffic spikes
25
+ - Monitoring exists but no alerting thresholds defined
26
+
27
+ Product teams with different needs:
28
+ 1. Checkout (critical path): Needs <10ms P99 for payment queries
29
+ 2. Search: Needs <100ms P95 for search results
30
+ 3. Analytics: Runs 30-minute aggregation queries, needs throughput
31
+ 4. Notification: Burst writes of 50K/sec during campaigns
32
+ 5. User Profile: Read-heavy, tolerates 50ms P99 but needs 99.99% availability
33
+
34
+ The VP of Engineering asks:
35
+ - "What should our database SLAs be?"
36
+ - "How do we prevent one team from degrading another's performance?"
37
+ - "When do we need to add capacity, and how much lead time do we need?"
38
+ - "What's the cost of improving P99 from 500ms to 50ms?"
39
+
40
+ Task: Design the performance SLA framework. Write: tiered SLA
41
+ definitions (by criticality), the latency budget allocation per team,
42
+ the capacity planning model (when to scale), the isolation strategy
43
+ (preventing noisy neighbors), and the cost-performance trade-off
44
+ analysis.
45
+
46
+ assertions:
47
+ - type: llm_judge
48
+ criteria: "SLA tiers are well-defined — creates at least 3 tiers (e.g., critical/standard/batch) with specific latency targets (P50, P95, P99), throughput limits, and availability requirements per tier. Maps the 5 product teams to appropriate tiers with justification"
49
+ weight: 0.35
50
+ description: "Well-defined SLA tiers"
51
+ - type: llm_judge
52
+ criteria: "Isolation and capacity planning are addressed — explains workload isolation strategies (separate clusters, connection pool limits per team, per-role resource settings, query timeouts), capacity planning model (utilization thresholds, growth projections, lead time for provisioning), and how to detect approaching limits before they cause incidents"
53
+ weight: 0.35
54
+ description: "Isolation and capacity planning"
55
+ - type: llm_judge
56
+ criteria: "Cost-performance trade-off is analyzed — explains the diminishing returns of tail latency improvement (P99 500ms→50ms might require 3x infrastructure), quantifies the cost of each SLA tier, and provides a framework for teams to choose their tier based on business impact vs cost"
57
+ weight: 0.30
58
+ description: "Cost-performance analysis"
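The "when do we need to add capacity" question in this scenario reduces to a compound-growth projection against a utilization threshold. This is a minimal sketch. The 40% current utilization, 70% threshold, 5% monthly growth, and 3-month provisioning lead time are all hypothetical inputs, not values from the scenario.

```python
import math


def months_until_threshold(current_util: float, threshold_util: float,
                           monthly_growth: float) -> float:
    """Months until utilization reaches the threshold under compound growth."""
    if current_util >= threshold_util:
        return 0.0
    if monthly_growth <= 0:
        return math.inf
    return math.log(threshold_util / current_util) / math.log(1.0 + monthly_growth)


def should_start_provisioning(months_left: float, lead_time_months: float) -> bool:
    """True once the remaining runway is no longer than the provisioning lead time."""
    return months_left <= lead_time_months


# Hypothetical cluster: 40% CPU utilization, scale at 70%, 5% growth per month.
runway = months_until_threshold(0.40, 0.70, 0.05)
print(f"Runway: {runway:.1f} months")
print("Start provisioning now:", should_start_provisioning(runway, lead_time_months=3))
```

Running the projection per cluster on a schedule turns "teams discover limits during traffic spikes" into an alert that fires one lead time before the threshold.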