dojo.md 0.1.0 → 0.2.0

Files changed (243)

package/courses/postgresql-query-optimization/scenarios/level-2/autovacuum-tuning.yaml ADDED

@@ -0,0 +1,76 @@
+meta:
+  id: autovacuum-tuning
+  level: 2
+  course: postgresql-query-optimization
+  type: output
+  description: "Tune autovacuum — configure per-table autovacuum settings for high-write tables and prevent table bloat"
+  tags: [PostgreSQL, autovacuum, VACUUM, table-bloat, maintenance, intermediate]
+state: {}
+trigger: |
+  Your database has a mix of tables with very different write patterns,
+  and the default autovacuum settings aren't working for all of them.
+  Table profiles:
+  Table 1 — events (high-write, append-only):
+  - 500M rows, 200GB
+  - 1M INSERTs/day, 0 UPDATEs, 0 DELETEs
+  - Dead tuples: 0 (no updates = no dead tuples)
+  - Issue: Autovacuum runs daily but there's nothing to vacuum.
+    Wastes I/O.
+  Table 2 — sessions (high-churn):
+  - 5M rows, 2GB (but should be 500MB)
+  - 2M INSERTs/day, 2M DELETEs/day (session created, then deleted)
+  - Dead tuples: 1.5M (always behind)
+  - Issue: Table is 4x bloated. Autovacuum can't keep up with the
+    DELETE rate. Index bloat growing too.
+  Table 3 — user_preferences (bulk-update):
+  - 2M rows, 400MB
+  - Weekly batch job updates ALL 2M rows at once
+  - Dead tuples: 2M after batch job (100% of table)
+  - Issue: After batch job, table is about 2x its live size (100%
+    bloat) until autovacuum runs. Queries are 3x slower for hours.
+  Table 4 — orders (mixed, hot table):
+  - 50M rows, 20GB
+  - 100K INSERTs/day + 500K UPDATEs/day (status changes)
+  - Dead tuples: varies, 50K-500K
+  - Issue: Autovacuum contends with production queries. During peak
+    hours, autovacuum slows down the API.
+  Table 5 — audit_logs (write-once, read-rarely):
+  - 1B rows, 500GB
+  - 5M INSERTs/day, never updated, never deleted
+  - Issue: Autovacuum runs for hours on this table, consuming I/O
+    that the other tables need.
+  Current settings (all default):
+  autovacuum_vacuum_threshold = 50
+  autovacuum_vacuum_scale_factor = 0.2
+  autovacuum_analyze_threshold = 50
+  autovacuum_analyze_scale_factor = 0.1
+  autovacuum_vacuum_cost_delay = 2ms
+  autovacuum_vacuum_cost_limit = 200
+  Task: Design per-table autovacuum settings for each of the 5 tables.
+  For each, calculate: when autovacuum triggers (threshold + scale
+  factor × rows), the optimal settings, and any additional strategies
+  (partitioning, pg_repack for bloat).
+assertions:
+  - type: llm_judge
+    criteria: "Per-table settings are correctly calculated — events table should have autovacuum disabled or very high thresholds, sessions table needs aggressive settings (low scale_factor, high cost_limit), user_preferences needs immediate vacuum after batch job, orders table needs throttled vacuum during peak hours, audit_logs should use partitioning to avoid vacuuming the entire table"
+    weight: 0.35
+    description: "Correct per-table settings"
+  - type: llm_judge
+    criteria: "Bloat remediation strategies are practical — recommends pg_repack for the sessions table (online table rebuild), partitioning for audit_logs (vacuum per partition), and scheduling manual VACUUM ANALYZE after the user_preferences batch job. Explains why VACUUM FULL is rarely the right answer (exclusive lock)"
+    weight: 0.35
+    description: "Practical bloat remediation"
+  - type: llm_judge
+    criteria: "Autovacuum monitoring is included — shows how to monitor autovacuum effectiveness using pg_stat_user_tables (n_dead_tup, last_autovacuum, autovacuum_count), how to detect tables where autovacuum is falling behind, and how to set up alerting on table bloat"
+    weight: 0.30
+    description: "Autovacuum monitoring"
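
The trigger formula named in the task (threshold + scale factor × rows) can be worked through for the five table profiles. The sketch below is only an illustration and is not part of the package; `vacuum_trigger` is a hypothetical helper name, the row counts come from the scenario text, and the 0.01 scale factor is an assumed example value rather than a recommendation.

```python
# Illustrative sketch: dead-tuple count at which autovacuum triggers a VACUUM,
# using the formula quoted in the scenario: threshold + scale_factor * rows.
# `vacuum_trigger` is a hypothetical helper, not a PostgreSQL API.

def vacuum_trigger(rows: int, threshold: int = 50, scale_factor: float = 0.2) -> int:
    """Return the number of dead tuples that triggers autovacuum for a table."""
    return round(threshold + scale_factor * rows)

# Row counts taken from the five table profiles in the scenario.
TABLE_ROWS = {
    "events": 500_000_000,
    "sessions": 5_000_000,
    "user_preferences": 2_000_000,
    "orders": 50_000_000,
    "audit_logs": 1_000_000_000,
}

# Trigger points under the default settings (threshold 50, scale factor 0.2).
DEFAULT_TRIGGERS = {name: vacuum_trigger(rows) for name, rows in TABLE_ROWS.items()}

# Trigger point for sessions with an assumed lower per-table scale factor.
SESSIONS_TUNED = vacuum_trigger(TABLE_ROWS["sessions"], scale_factor=0.01)

for name, trigger in DEFAULT_TRIGGERS.items():
    print(f"{name}: triggers at {trigger:,} dead tuples")
print(f"sessions with scale_factor=0.01: {SESSIONS_TUNED:,}")
```

Under the defaults, sessions triggers at 1,000,050 dead tuples, which at 2M DELETEs/day is reached roughly every 12 hours; the assumed 0.01 scale factor lowers that to 50,050.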

package/courses/postgresql-query-optimization/scenarios/level-2/composite-index-design.yaml ADDED

@@ -0,0 +1,81 @@
+meta:
+  id: composite-index-design
+  level: 2
+  course: postgresql-query-optimization
+  type: output
+  description: "Design composite indexes — optimize multi-column indexes with correct column ordering for complex query patterns"
+  tags: [PostgreSQL, composite-index, column-ordering, intermediate]
+state: {}
+trigger: |
+  Your SaaS application has an orders table (50M rows) with these
+  common query patterns. You need to design the minimum set of
+  composite indexes that covers all of them.
+  Table:
+  CREATE TABLE orders (
+    id BIGSERIAL PRIMARY KEY,
+    tenant_id INTEGER NOT NULL,
+    customer_id INTEGER NOT NULL,
+    status VARCHAR(20) NOT NULL,
+    total DECIMAL(10,2) NOT NULL,
+    created_at TIMESTAMP NOT NULL,
+    shipped_at TIMESTAMP,
+    region VARCHAR(10) NOT NULL
+  );
+  Query patterns (ordered by frequency):
+  Q1 (10,000/min): Tenant dashboard — recent orders
+  WHERE tenant_id = ? AND status = ?
+  ORDER BY created_at DESC LIMIT 50
+  Q2 (5,000/min): Customer order history
+  WHERE tenant_id = ? AND customer_id = ?
+  ORDER BY created_at DESC
+  Q3 (1,000/min): Revenue reporting
+  WHERE tenant_id = ? AND created_at BETWEEN ? AND ?
+  AND status = 'completed'
+  Q4 (500/min): Shipping queue
+  WHERE tenant_id = ? AND status = 'ready_to_ship'
+  AND region = ?
+  ORDER BY created_at ASC
+  Q5 (100/min): Large order alerts
+  WHERE tenant_id = ? AND total > 10000
+  AND created_at >= NOW() - INTERVAL '24 hours'
+  Q6 (50/min): Analytics — status distribution
+  SELECT status, COUNT(*) FROM orders
+  WHERE tenant_id = ? AND created_at >= ?
+  GROUP BY status
+  Current state: Only the PRIMARY KEY index exists. All queries
+  do sequential scans.
+  Constraints:
+  - Maximum 5 indexes total (write performance budget)
+  - Every query must use tenant_id (multi-tenant isolation)
+  - Minimize total index storage (currently 50M rows × 5 indexes)
+  Task: Design up to 5 composite indexes. For each, explain: the column
+  order and why (equality → range → sort), which queries it serves,
+  the expected scan type after indexing. Then explain which queries
+  share indexes and any trade-offs in your design.
+assertions:
+  - type: llm_judge
+    criteria: "Index column ordering follows the equality-range-sort principle — equality columns (tenant_id, status) come first, followed by range columns (created_at BETWEEN), then sort columns (ORDER BY created_at). The reasoning for each column's position is explicit"
+    weight: 0.35
+    description: "Correct column ordering principle"
+  - type: llm_judge
+    criteria: "Index set is minimal and covers all queries — 5 or fewer indexes serve all 6 query patterns, some indexes are shared across multiple queries (e.g., tenant_id + status + created_at serves Q1 and Q3), and the design prioritizes high-frequency queries (Q1 at 10K/min gets the best index)"
+    weight: 0.35
+    description: "Minimal covering index set"
+  - type: llm_judge
+    criteria: "Trade-offs are acknowledged — explains write overhead of 5 indexes on a 50M-row table, discusses which queries get optimal vs acceptable performance, and considers partial indexes or covering indexes (INCLUDE) to further optimize"
+    weight: 0.30
+    description: "Trade-offs acknowledged"
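
The equality → range → sort rule from the task can be sketched as a small column-ordering helper. This is an illustration under assumptions, not the package's reference answer: `index_columns` is a hypothetical name, and the order among the equality columns is simply the order they are passed in.

```python
# Hypothetical helper: order index columns by the equality -> range -> sort rule.

def index_columns(equality, range_cols=(), sort_cols=()):
    """Return index columns: equality first, then range, then sort columns."""
    ordered = []
    for col in (*equality, *range_cols, *sort_cols):
        if col not in ordered:  # a column is listed once even if it plays two roles
            ordered.append(col)
    return tuple(ordered)

# Q1: WHERE tenant_id = ? AND status = ? ORDER BY created_at DESC
q1 = index_columns(["tenant_id", "status"], sort_cols=["created_at"])
# Q3: WHERE tenant_id = ? AND status = 'completed' AND created_at BETWEEN ? AND ?
q3 = index_columns(["tenant_id", "status"], range_cols=["created_at"])
# Q2: WHERE tenant_id = ? AND customer_id = ? ORDER BY created_at DESC
q2 = index_columns(["tenant_id", "customer_id"], sort_cols=["created_at"])
# Q4: WHERE tenant_id = ? AND status = ? AND region = ? ORDER BY created_at ASC
q4 = index_columns(["tenant_id", "status", "region"], sort_cols=["created_at"])

print(q1, q3, q2, q4, sep="\n")
```

Q1 and Q3 resolve to the same column list, which is the sharing the second assertion refers to.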

package/courses/postgresql-query-optimization/scenarios/level-2/covering-indexes.yaml ADDED

@@ -0,0 +1,74 @@
+meta:
+  id: covering-indexes
+  level: 2
+  course: postgresql-query-optimization
+  type: output
+  description: "Implement covering indexes — use INCLUDE columns and index-only scans to eliminate heap fetches"
+  tags: [PostgreSQL, covering-index, INCLUDE, index-only-scan, intermediate]
+state: {}
+trigger: |
+  Your analytics API has endpoints that run the same queries millions
+  of times per day. Even with indexes, EXPLAIN ANALYZE shows "Heap
+  Fetches: 50000" — the query uses the index to find rows but then
+  must read the actual table (heap) to get the columns not in the
+  index.
+  Table: events (500M rows, 200GB)
+  Most queries select only 3-5 columns out of 25.
+  Query 1 (2M calls/day):
+  SELECT user_id, event_type, created_at
+  FROM events
+  WHERE user_id = $1 AND created_at >= $2
+  ORDER BY created_at DESC LIMIT 100;
+  Current index: (user_id, created_at)
+  Problem: Index finds the rows fast, but must heap-fetch each row
+  to read event_type. With 100 rows, that's 100 random I/O reads.
+  Query 2 (500K calls/day):
+  SELECT product_id, SUM(quantity), COUNT(*)
+  FROM events
+  WHERE event_type = 'purchase' AND created_at >= $1
+  GROUP BY product_id;
+  Current index: (event_type, created_at)
+  Problem: Must heap-fetch every matching row to read product_id
+  and quantity. Thousands of heap fetches.
+  Query 3 (1M calls/day):
+  SELECT COUNT(*) FROM events
+  WHERE user_id = $1 AND event_type = 'page_view';
+  Current index: (user_id, event_type)
+  Problem: COUNT(*) should be index-only but EXPLAIN shows heap
+  fetches. Why?
+  Query 4 (100K calls/day):
+  SELECT DISTINCT category FROM events
+  WHERE tenant_id = $1;
+  Current index: (tenant_id)
+  Problem: Must heap-fetch every row to read category, then
+  deduplicate. Extremely slow for tenants with millions of events.
+  Task: Design covering indexes for each query. Explain: the INCLUDE
+  clause syntax, why index-only scans are faster, why Query 3 still
+  shows heap fetches (visibility map), and the trade-offs of wider
+  indexes (storage, write overhead, maintenance).
+assertions:
+  - type: llm_judge
+    criteria: "Covering indexes are correctly designed — uses INCLUDE to add non-key columns (event_type in Q1, product_id + quantity in Q2, category in Q4), enabling index-only scans that eliminate heap fetches. The INCLUDE columns are in the right place (not as index keys)"
+    weight: 0.35
+    description: "Correct covering index design"
+  - type: llm_judge
+    criteria: "Visibility map issue is explained — Query 3 shows heap fetches because recently updated/inserted pages aren't marked as all-visible in the visibility map. VACUUM updates the visibility map. Explains the connection between VACUUM frequency and index-only scan effectiveness"
+    weight: 0.35
+    description: "Visibility map explanation"
+  - type: llm_judge
+    criteria: "Trade-offs are quantified — estimates the index size increase from INCLUDE columns, discusses write amplification (wider indexes = more data per write), and provides guidelines for when covering indexes are worth the cost (high-frequency read queries on large tables)"
+    weight: 0.30
+    description: "Quantified trade-offs"
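
To make the INCLUDE clause concrete, the sketch below builds the `CREATE INDEX ... INCLUDE (...)` statements that the first assertion describes for Q1, Q2, and Q4. The helper `covering_index_sql` and the index names are hypothetical; only the key and INCLUDE column choices come from the scenario.

```python
# Hypothetical helper that renders a PostgreSQL covering-index statement.
# Key columns are searchable; INCLUDE columns are stored as payload only.

def covering_index_sql(name, table, key_cols, include_cols=()):
    """Build a CREATE INDEX statement with an optional INCLUDE clause."""
    sql = f"CREATE INDEX {name} ON {table} ({', '.join(key_cols)})"
    if include_cols:
        sql += f" INCLUDE ({', '.join(include_cols)})"
    return sql + ";"

# Index names below are made up for illustration.
Q1_SQL = covering_index_sql(
    "idx_events_user_created", "events", ["user_id", "created_at"], ["event_type"]
)
Q2_SQL = covering_index_sql(
    "idx_events_type_created", "events",
    ["event_type", "created_at"], ["product_id", "quantity"],
)
Q4_SQL = covering_index_sql("idx_events_tenant", "events", ["tenant_id"], ["category"])

print(Q1_SQL, Q2_SQL, Q4_SQL, sep="\n")
```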

package/courses/postgresql-query-optimization/scenarios/level-2/cte-optimization.yaml ADDED

@@ -0,0 +1,83 @@
+meta:
+  id: cte-optimization
+  level: 2
+  course: postgresql-query-optimization
+  type: output
+  description: "Optimize CTEs and subqueries — understand materialization control and when CTEs help or hurt performance"
+  tags: [PostgreSQL, CTE, materialization, subquery, optimization, intermediate]
+state: {}
+trigger: |
+  Your application uses CTEs (Common Table Expressions) extensively
+  for readability. After upgrading from PostgreSQL 11 to 16, some
+  queries got faster and some got slower. The team is confused about
+  CTE behavior.
+  Query 1 — CTE that got faster (auto-inlined in PG 12+):
+  WITH active_users AS (
+    SELECT * FROM users WHERE active = true
+  )
+  SELECT * FROM active_users WHERE created_at > '2026-01-01';
+  PG 11: Materialized the CTE (scanned all active users, then
+  filtered by date). Slow because active_users = 900K rows.
+  PG 16: Inlined the CTE (pushed date filter into the scan).
+  Fast because it reads only users active AND created after Jan 1.
+  Query 2 — CTE that got slower (inlined when it shouldn't be):
+  WITH expensive_calc AS (
+    SELECT customer_id, SUM(amount) as total
+    FROM orders GROUP BY customer_id
+  )
+  SELECT u.name, e.total FROM users u
+  JOIN expensive_calc e ON e.customer_id = u.id
+  WHERE u.tier = 'premium';
+  PG 11: Materialized (computed once, joined). Fine.
+  PG 16: Inlined (merged into main query). Now the aggregation
+  runs after the JOIN filter, but the planner pushes the filter
+  into the wrong place, making it slower.
+  Query 3 — CTE referenced multiple times:
+  WITH monthly_stats AS (
+    SELECT date_trunc('month', created_at) as month,
+           COUNT(*) as order_count, SUM(amount) as revenue
+    FROM orders GROUP BY 1
+  )
+  SELECT a.month, a.order_count, a.revenue,
+         a.revenue - b.revenue as growth
+  FROM monthly_stats a
+  JOIN monthly_stats b ON b.month = a.month - INTERVAL '1 month';
+  This CTE is referenced twice. Should it be materialized?
+  Query 4 — Recursive CTE (always materialized):
+  WITH RECURSIVE org_chart AS (
+    SELECT id, name, manager_id, 1 as depth
+    FROM employees WHERE manager_id IS NULL
+    UNION ALL
+    SELECT e.id, e.name, e.manager_id, oc.depth + 1
+    FROM employees e JOIN org_chart oc ON e.manager_id = oc.id
+  )
+  SELECT * FROM org_chart WHERE depth <= 5;
+  This is slow for a 100K employee table. Can it be optimized?
+  Task: For each query, explain the CTE behavior (materialized vs
+  inlined), when to force MATERIALIZED or NOT MATERIALIZED, and the
+  optimization strategy. Then write guidelines for CTE usage.
+assertions:
+  - type: llm_judge
+    criteria: "CTE materialization behavior is correctly explained — PG 12+ defaults to inlining CTEs referenced once, materializing when referenced multiple times. Query 1 benefits from inlining (filter pushdown), Query 2 needs explicit MATERIALIZED (compute-once benefit), Query 3 should be materialized (referenced twice), Query 4 is always materialized (recursive)"
+    weight: 0.35
+    description: "Correct materialization behavior"
+  - type: llm_judge
+    criteria: "MATERIALIZED/NOT MATERIALIZED hints are correctly applied — shows exact syntax, explains when each is needed, and addresses Query 2's regression (force MATERIALIZED to restore PG 11 behavior) and Query 4's optimization (add depth limit to recursive term, not just WHERE clause)"
+    weight: 0.35
+    description: "Correct hint application"
+  - type: llm_judge
+    criteria: "Guidelines are practical — provides decision tree for CTE materialization (referenced once → inline unless expensive, referenced multiple times → materialize, recursive → always materialized), and discusses alternatives to CTEs (subqueries, temporary tables, materialized views)"
+    weight: 0.30
+    description: "Practical CTE guidelines"
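
The Query 4 fix mentioned in the second assertion (put the depth limit in the recursive term, not only in the outer WHERE) can be tried out in a small sandbox. The sketch below uses Python's built-in `sqlite3` as a stand-in, which is an assumption for runnability: SQLite is not PostgreSQL and its planner differs, so this only shows that both forms return the same rows, not how either performs. The 50-row single-chain `employees` data is made up.

```python
import sqlite3

# In-memory SQLite stand-in; data is a made-up chain where each employee
# reports to the previous one, so the hierarchy is 50 levels deep.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(i, f"emp{i}", i - 1 if i > 1 else None) for i in range(1, 51)],
)

# Original shape: recursion walks the whole hierarchy, then the result is filtered.
FILTER_OUTSIDE = """
WITH RECURSIVE org_chart AS (
  SELECT id, name, manager_id, 1 AS depth
  FROM employees WHERE manager_id IS NULL
  UNION ALL
  SELECT e.id, e.name, e.manager_id, oc.depth + 1
  FROM employees e JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT id, depth FROM org_chart WHERE depth <= 5 ORDER BY id
"""

# Rewritten shape: the recursive term stops expanding once depth 5 is reached.
LIMIT_INSIDE = """
WITH RECURSIVE org_chart AS (
  SELECT id, name, manager_id, 1 AS depth
  FROM employees WHERE manager_id IS NULL
  UNION ALL
  SELECT e.id, e.name, e.manager_id, oc.depth + 1
  FROM employees e JOIN org_chart oc ON e.manager_id = oc.id
  WHERE oc.depth < 5
)
SELECT id, depth FROM org_chart ORDER BY id
"""

outside_rows = conn.execute(FILTER_OUTSIDE).fetchall()
inside_rows = conn.execute(LIMIT_INSIDE).fetchall()
print(outside_rows == inside_rows, inside_rows)
```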

package/courses/postgresql-query-optimization/scenarios/level-2/intermediate-optimization-shift.yaml ADDED

@@ -0,0 +1,66 @@
+meta:
+  id: intermediate-optimization-shift
+  level: 2
+  course: postgresql-query-optimization
+  type: output
+  description: "Intermediate optimization shift — optimize a database migration that's blocked by slow queries and table bloat"
+  tags: [PostgreSQL, optimization, shift-simulation, migration, intermediate]
+state: {}
+trigger: |
+  Your company is migrating from a monolith to microservices. The
+  first step is splitting the monolithic 2TB PostgreSQL database into
+  domain-specific databases. But the migration is blocked by
+  performance issues.
+  Current database: 2TB, 150 tables, 300 indexes, 100M queries/day
+  Blockers:
+  Blocker 1 — The migration query is too slow:
+  INSERT INTO orders_new SELECT * FROM orders
+  WHERE created_at >= '2024-01-01';
+  This copies 80M rows (800GB) and has been running for 18 hours
+  with no end in sight. It's also generating massive WAL traffic
+  that's causing replication lag on your read replica.
+  Blocker 2 — Table bloat prevents accurate size estimation:
+  The orders table is 800GB on disk but pgstattuple shows only
+  400GB of live data. 50% bloat is throwing off migration planning.
+  Should you fix bloat first or after migration?
+  Blocker 3 — Foreign key constraints slow the migration:
+  order_items has a FK to orders. Migrating orders first breaks the
+  FK. Migrating both together in a transaction locks both tables.
+  Current approach: disable FK → migrate → re-enable FK → validate.
+  But FK validation on 200M rows takes 4 hours.
+  Blocker 4 — Index rebuilds after migration:
+  After copying data, you need to recreate 15 indexes on the new
+  orders_new table. Each index takes 30-90 minutes to build. Total:
+  10-15 hours. Can you parallelize?
+  Blocker 5 — Applications still querying during migration:
+  The old database must remain operational during migration. Long-
+  running migration queries compete with production queries for
+  resources (CPU, I/O, connections).
+  Task: Unblock the migration. For each blocker, write: the root
+  cause, the solution (with specific commands), the expected time
+  savings, and the risk mitigation. Then create the optimized
+  migration plan that addresses all 5 blockers.
+assertions:
+  - type: llm_judge
+    criteria: "All 5 blockers have specific solutions: batch the large copy (1M rows at a time vs 80M at once), fix bloat after migration (pg_repack or VACUUM FULL during maintenance), handle FKs with NOT VALID then VALIDATE CONSTRAINT, build the indexes in parallel (separate sessions and max_parallel_maintenance_workers), using CREATE INDEX CONCURRENTLY only where writes must not be blocked, and throttle migration queries to coexist with production"
+    weight: 0.35
+    description: "All blockers solved"
+  - type: llm_judge
+    criteria: "Solutions use PostgreSQL-specific features — COPY instead of INSERT...SELECT for bulk data movement, CREATE INDEX CONCURRENTLY for non-blocking index builds, ALTER TABLE ... VALIDATE CONSTRAINT for background FK validation, and logical replication for zero-downtime migration alternative"
+    weight: 0.35
+    description: "PostgreSQL-specific solutions"
+  - type: llm_judge
+    criteria: "Migration plan is sequenced correctly — data migration before index builds, FK validation after both tables are migrated, resource throttling throughout, and a rollback plan in case any step fails. Time estimates are realistic"
+    weight: 0.30
+    description: "Correctly sequenced plan"
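
The batching idea in the first assertion (1M rows at a time instead of 80M at once) can be sketched as a range generator. This is a minimal sketch under assumptions: it presumes `orders` has a dense integer primary key named `id`, which the scenario does not state, and `batch_ranges` and the SQL template are hypothetical.

```python
# Hypothetical sketch: split a large copy into fixed-size primary-key ranges.
# Assumes `orders` has an integer key `id` (not stated in the scenario).

def batch_ranges(min_id, max_id, batch_size):
    """Yield inclusive (start, end) id ranges covering min_id..max_id."""
    start = min_id
    while start <= max_id:
        end = min(start + batch_size - 1, max_id)
        yield start, end
        start = end + 1

# Template for one batch; each batch would run in its own short transaction.
BATCH_SQL = (
    "INSERT INTO orders_new SELECT * FROM orders "
    "WHERE id BETWEEN {start} AND {end} AND created_at >= '2024-01-01';"
)

small_example = list(batch_ranges(1, 2_500_000, 1_000_000))
batch_count = sum(1 for _ in batch_ranges(1, 80_000_000, 1_000_000))

for start, end in small_example:
    print(BATCH_SQL.format(start=start, end=end))
print(f"80M ids in 1M-row batches: {batch_count} batches")
```

Short batches keep each transaction small, which leaves room to pause between batches when replication lag or production load rises.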

package/courses/postgresql-query-optimization/scenarios/level-2/join-optimization.yaml ADDED

@@ -0,0 +1,72 @@
+meta:
+  id: join-optimization
+  level: 2
+  course: postgresql-query-optimization
+  type: output
+  description: "Optimize JOIN strategies — tune PostgreSQL to choose the right JOIN algorithm for each query pattern"
+  tags: [PostgreSQL, JOIN, hash-join, merge-join, nested-loop, intermediate]
+state: {}
+trigger: |
+  Your reporting system has three problematic JOIN queries. Each uses
+  the wrong JOIN algorithm, causing severe performance issues.
+  Tables:
+  - customers (1M rows): id, name, email, tier, created_at
+  - orders (50M rows): id, customer_id, amount, status, created_at
+  - order_items (200M rows): id, order_id, product_id, quantity, price
+  - products (100K rows): id, name, category, price
+  Problem 1 — Wrong algorithm: Nested Loop on large tables
+  SELECT c.name, SUM(o.amount)
+  FROM customers c
+  JOIN orders o ON o.customer_id = c.id
+  WHERE c.tier = 'enterprise'
+  GROUP BY c.name;
+  EXPLAIN shows: Nested Loop Join (estimated: 30 seconds)
+  - Enterprise customers: 500 rows
+  - Orders per enterprise customer: ~100,000
+  - Total: 50M random index lookups
+  Better choice: Hash Join (build hash on 500 customers, probe orders)
+  Problem 2 — Hash Join spilling to disk
+  SELECT o.id, oi.product_id, oi.quantity
+  FROM orders o
+  JOIN order_items oi ON oi.order_id = o.id
+  WHERE o.created_at >= '2026-01-01';
+  EXPLAIN shows: Hash Join (estimated: 120 seconds)
+  - Matching orders: 5M rows
+  - Hash table too large for work_mem → spills to disk (Batches: 16)
+  - Disk I/O dominates execution time
+  Problem 3 — Missing opportunity for Merge Join
+  SELECT c.name, o.amount, o.created_at
+  FROM customers c
+  JOIN orders o ON o.customer_id = c.id
+  ORDER BY c.id, o.created_at;
+  EXPLAIN shows: Hash Join + Sort (estimated: 90 seconds)
+  - Both tables could be pre-sorted by the JOIN key
+  - Merge Join would eliminate the separate Sort step
+  Task: Fix each problem. For each, explain: why PostgreSQL chose
+  the wrong algorithm, what parameter or index change would fix it,
+  the expected EXPLAIN output after the fix, and the general rule
+  for when each JOIN algorithm is optimal.
+assertions:
+  - type: llm_judge
+    criteria: "All 3 problems are correctly diagnosed and fixed — Problem 1 needs statistics update or adjusted cost parameters to favor Hash Join over Nested Loop, Problem 2 needs increased work_mem to prevent hash table spill, Problem 3 needs indexes on JOIN keys to enable Merge Join. Each fix is specific"
+    weight: 0.35
+    description: "All problems correctly fixed"
+  - type: llm_judge
+    criteria: "JOIN algorithm selection rules are clearly stated — Nested Loop: best for small outer × indexed inner, Hash Join: best for equality JOINs with sufficient memory, Merge Join: best when both inputs are pre-sorted or will be sorted anyway. Includes data size thresholds and work_mem considerations"
+    weight: 0.35
+    description: "Clear algorithm selection rules"
+  - type: llm_judge
+    criteria: "work_mem tuning is addressed — explains how work_mem affects Hash Join (too low → disk spill, too high × many connections → OOM), how to set it per-query vs globally, and the relationship between work_mem, max_connections, and available RAM"
+    weight: 0.30
+    description: "work_mem tuning addressed"
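
For readers unfamiliar with the algorithm behind Problem 1's "build hash on 500 customers, probe orders", here is a toy in-memory hash join. It is a teaching sketch only, not how PostgreSQL implements the operator; the sample rows and the `hash_join` helper are made up.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Toy hash join: hash the small input once, then stream the large input."""
    table = defaultdict(list)
    for row in build_rows:            # build phase: one pass over the small side
        table[row[build_key]].append(row)
    for row in probe_rows:            # probe phase: one lookup per large-side row
        for match in table.get(row[probe_key], ()):
            yield match, row

# Made-up sample data shaped like the scenario's customers and orders tables.
customers = [
    {"id": 1, "name": "Acme", "tier": "enterprise"},
    {"id": 2, "name": "Bob", "tier": "free"},
]
orders = [
    {"id": 10, "customer_id": 1, "amount": 100},
    {"id": 11, "customer_id": 1, "amount": 50},
    {"id": 12, "customer_id": 2, "amount": 7},
]

# Filter first (tier = 'enterprise'), build on the small side, probe orders.
enterprise = [c for c in customers if c["tier"] == "enterprise"]
totals = defaultdict(float)
for customer, order in hash_join(enterprise, orders, "id", "customer_id"):
    totals[customer["name"]] += order["amount"]

print(dict(totals))
```

The build side lives entirely in memory here; in PostgreSQL the analogous limit is work_mem, which is why Problem 2's larger build input spills to disk.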

package/courses/postgresql-query-optimization/scenarios/level-2/partial-and-expression-indexes.yaml ADDED

@@ -0,0 +1,75 @@
+meta:
+  id: partial-and-expression-indexes
+  level: 2
+  course: postgresql-query-optimization
+  type: output
+  description: "Use partial and expression indexes — create targeted indexes for specific query patterns with reduced storage and maintenance cost"
+  tags: [PostgreSQL, partial-index, expression-index, optimization, intermediate]
+state: {}
+trigger: |
+  Your e-commerce platform has a products table (10M rows) with
+  skewed data distribution. Full indexes are too large and slow
+  to maintain.
+  Table statistics:
+  - 10M total products
+  - 9.5M have status = 'archived' (95%)
+  - 300K have status = 'active' (3%)
+  - 200K have status = 'draft' (2%)
+  - Only active products appear in search results
+  - Only draft products appear in the admin editor
+  Current indexes:
+  CREATE INDEX idx_products_status ON products(status);
+  -- This index is 400MB but 95% of it covers 'archived' rows
+  -- that are almost never queried
+  Problematic queries:
+  Q1: Case-insensitive email search:
+  SELECT * FROM users WHERE LOWER(email) = LOWER($1);
+  -- Index on email doesn't help (function applied)
+  Q2: Active product search:
+  SELECT * FROM products WHERE status = 'active' AND category = $1
+  ORDER BY price;
+  -- Full index scans 10M entries to find 300K active ones
+  Q3: Soft-deleted records cleanup:
+  SELECT * FROM orders WHERE deleted_at IS NOT NULL
+  AND deleted_at < NOW() - INTERVAL '90 days';
+  -- Only 0.1% of orders are soft-deleted, but full index is huge
+  Q4: JSON field query:
+  SELECT * FROM products WHERE (metadata->>'brand') = 'Nike';
+  -- No index on JSON fields
+  Q5: Computed date query:
+  SELECT * FROM subscriptions
+  WHERE DATE(expires_at) = CURRENT_DATE;
+  -- Function on column prevents index use
+  Q6: Trigram search:
+  SELECT * FROM products WHERE name ILIKE '%wireless%mouse%';
+  -- Leading wildcard with case-insensitive match
+  Task: Design the optimal index for each query using partial indexes,
+  expression indexes, or specialized index types. For each, show:
+  the CREATE INDEX statement, the size compared to a full index, and
+  the EXPLAIN output showing the index is used.
+assertions:
+  - type: llm_judge
+    criteria: "Partial indexes are used correctly — index on products WHERE status = 'active' covers only 3% of rows (300K vs 10M), index on orders WHERE deleted_at IS NOT NULL covers only 0.1%. Each partial index is dramatically smaller than the full equivalent"
+    weight: 0.35
+    description: "Correct partial index usage"
+  - type: llm_judge
+    criteria: "Expression indexes handle function-on-column — LOWER(email) expression index enables case-insensitive search, JSON expression index on (metadata->>'brand') enables JSON field queries, DATE(expires_at) expression index enables computed date queries. Each eliminates the need for full table scans"
+    weight: 0.35
+    description: "Expression indexes for functions"
+  - type: llm_judge
+    criteria: "Specialized indexes are recommended where appropriate — pg_trgm GIN index for ILIKE wildcard search, citext extension as alternative to LOWER() expression index, and the trade-offs of each approach (storage, maintenance, query compatibility)"
+    weight: 0.30
+    description: "Appropriate specialized indexes"
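
The "size compared to a full index" part of the task can be approximated with simple proportions. The sketch below assumes index size scales linearly with the number of indexed rows, which ignores page overhead and key width, so treat the results as rough estimates; `partial_index_estimate_mb` is a hypothetical helper.

```python
# Rough estimate only: assumes index size is proportional to indexed row count.

def partial_index_estimate_mb(full_index_mb, matching_rows, total_rows):
    """Estimate the size of a partial index from the size of the full index."""
    return full_index_mb * matching_rows / total_rows

FULL_STATUS_INDEX_MB = 400   # size of idx_products_status from the scenario
TOTAL_PRODUCTS = 10_000_000

# Partial index WHERE status = 'active' (300K rows, 3% of the table).
active_mb = partial_index_estimate_mb(FULL_STATUS_INDEX_MB, 300_000, TOTAL_PRODUCTS)
# Partial index WHERE status = 'draft' (200K rows, 2% of the table).
draft_mb = partial_index_estimate_mb(FULL_STATUS_INDEX_MB, 200_000, TOTAL_PRODUCTS)

print(f"active: ~{active_mb} MB, draft: ~{draft_mb} MB vs {FULL_STATUS_INDEX_MB} MB full")
```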

package/courses/postgresql-query-optimization/scenarios/level-2/query-planner-settings.yaml ADDED

@@ -0,0 +1,62 @@
+meta:
+  id: query-planner-settings
+  level: 2
+  course: postgresql-query-optimization
+  type: output
+  description: "Tune query planner settings — configure work_mem, effective_cache_size, and cost parameters for your workload"
+  tags: [PostgreSQL, planner, work_mem, effective_cache_size, tuning, intermediate]
+state: {}
+trigger: |
+  You just migrated to a new database server with NVMe SSDs and
+  128GB RAM, but query performance hasn't improved. The planner
+  still prefers sequential scans over index scans and hash joins
+  spill to disk.
+  Server specs:
+  - CPU: 32 cores
+  - RAM: 128GB
+  - Storage: NVMe SSD (500K IOPS, 3GB/s throughput)
+  - PostgreSQL 16
+  - max_connections: 200
+  Current settings (still at defaults):
+  - shared_buffers = 128MB (should be much higher)
+  - work_mem = 4MB (sorts spill to disk)
+  - effective_cache_size = 4GB (planner underestimates cache)
+  - random_page_cost = 4.0 (assumes spinning disk, not SSD)
+  - seq_page_cost = 1.0
+  - maintenance_work_mem = 64MB (VACUUM and index builds are slow)
+  - effective_io_concurrency = 1 (not using SSD parallelism)
+  Symptoms:
+  1. Planner chooses Seq Scan over Index Scan because
+     random_page_cost = 4.0 makes random I/O look 4x as expensive as
+     sequential I/O (true for HDD, not for SSD where it's ~1.1x)
+  2. Hash Joins spill to disk because work_mem = 4MB can't hold
+     the hash table in memory
+  3. Sorts spill to disk for the same reason
+  4. VACUUM takes hours because maintenance_work_mem = 64MB
+  5. effective_cache_size = 4GB makes the planner think only 4GB
+     of data is cached (actual: ~100GB between shared_buffers and
+     OS cache)
+  Task: Calculate the optimal value for each setting. For each,
+  explain: the formula or reasoning, the impact on query plans,
+  and the risks of over-tuning. Then show before/after EXPLAIN
+  output for a query that changes plan due to the settings.
+assertions:
+  - type: llm_judge
+    criteria: "Setting calculations are correct for the hardware — shared_buffers ~25% of RAM (32GB), work_mem calculated from RAM / (max_connections * expected_operations), effective_cache_size ~75% of RAM (96GB), random_page_cost 1.1-1.5 for NVMe SSDs, maintenance_work_mem 1-2GB, effective_io_concurrency 200+ for NVMe"
+    weight: 0.35
+    description: "Correct setting calculations"
+  - type: llm_judge
+    criteria: "Impact on query plans is demonstrated — shows how reducing random_page_cost makes the planner prefer Index Scan over Seq Scan, how increasing work_mem eliminates disk spill in Hash Joins, and how effective_cache_size changes the planner's cost estimates. Before/after EXPLAIN shows the plan change"
+    weight: 0.35
+    description: "Demonstrated plan impact"
+  - type: llm_judge
+    criteria: "Risks of over-tuning are explained — too-high work_mem × 200 connections can OOM the server, too-low random_page_cost can force inefficient index scans, shared_buffers > 40% of RAM can cause double-buffering with OS cache. Includes the formula for safe work_mem calculation"
+    weight: 0.30
+    description: "Over-tuning risks explained"
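
The rules of thumb in the first assertion can be turned into a small calculator for the 128GB, 200-connection server. This is a sketch under assumptions: `suggest_settings` is a hypothetical helper, and the two operations per query and the 25% of RAM reserved for work_mem are assumed example values, not figures from the scenario.

```python
# Hypothetical calculator for the rules of thumb quoted in the assertions.
# ops_per_query and work_mem_ram_fraction are assumed example values.

def suggest_settings(ram_gb, max_connections, ops_per_query=2,
                     work_mem_ram_fraction=0.25):
    """Return starting-point settings derived from RAM and connection count."""
    ram_mb = ram_gb * 1024
    work_mem_budget_mb = int(ram_mb * work_mem_ram_fraction)
    return {
        "shared_buffers_gb": ram_gb // 4,              # ~25% of RAM
        "effective_cache_size_gb": ram_gb * 3 // 4,    # ~75% of RAM
        # Worst case: every connection runs ops_per_query sorts/hashes at once.
        "work_mem_mb": work_mem_budget_mb // (max_connections * ops_per_query),
    }

settings = suggest_settings(ram_gb=128, max_connections=200)
worst_case_mb = settings["work_mem_mb"] * 200 * 2

print(settings)
print(f"worst-case work_mem use: {worst_case_mb} MB")
```

Dividing the budget by connections × operations is what keeps the worst case inside the reserved share of RAM; a higher value can still be set per session for known heavy queries.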

package/courses/postgresql-query-optimization/scenarios/level-2/subquery-optimization.yaml ADDED

@@ -0,0 +1,67 @@
+meta:
+  id: subquery-optimization
+  level: 2
+  course: postgresql-query-optimization
+  type: output
+  description: "Optimize subqueries — convert correlated subqueries to JOINs, use EXISTS vs IN, and leverage lateral joins for dependent subqueries"
+  tags: [PostgreSQL, subqueries, correlated-subquery, EXISTS, IN, lateral-join, intermediate]
+state: {}
+trigger: |
+  Your team's order summary endpoint is slow. The query uses multiple
+  subqueries, and EXPLAIN ANALYZE shows repeated execution of correlated
+  subqueries — one for each row in the outer query.
+  The current query (takes 12 seconds):
+  SELECT c.id, c.name, c.email,
+    (SELECT COUNT(*) FROM orders WHERE customer_id = c.id) AS order_count,
+    (SELECT SUM(total) FROM orders WHERE customer_id = c.id) AS total_spent,
+    (SELECT MAX(created_at) FROM orders WHERE customer_id = c.id) AS last_order
+  FROM customers c
+  WHERE c.status = 'active'
+    AND (SELECT COUNT(*) FROM orders WHERE customer_id = c.id) > 5;
+  EXPLAIN ANALYZE shows:
+  Seq Scan on customers (rows=50,000)
+    → SubPlan 1: Index Scan on orders (executed 50,000 times)
+    → SubPlan 2: Index Scan on orders (executed 50,000 times)
+    → SubPlan 3: Index Scan on orders (executed 50,000 times)
+    → SubPlan 4: Index Scan on orders (executed 50,000 times)
+  Total: 12.4 seconds
+  That's 200,000 subplan executions for 50,000 customers.
+  Additional subquery patterns your team uses that need review:
+  1. WHERE id IN (SELECT ...) vs WHERE EXISTS (SELECT ...)
+     - When does the planner treat these differently?
+  2. Scalar subquery in SELECT vs JOIN
+     - The query above uses scalar subqueries — how does a JOIN compare?
+  3. LATERAL JOIN for "top-N per group" queries:
+     SELECT d.name, recent.*
+     FROM departments d,
+     LATERAL (SELECT * FROM employees WHERE dept_id = d.id
+              ORDER BY hire_date DESC LIMIT 3) recent;
+  4. Subquery in FROM (derived table) vs CTE:
+     SELECT * FROM (SELECT ...) sub WHERE sub.x > 10;
+     vs WITH sub AS (SELECT ...) SELECT * FROM sub WHERE sub.x > 10;
+  Task: Rewrite the slow query to eliminate the repeated subplan
+  executions. Explain: the rewritten query using JOINs, why correlated
+  subqueries are expensive (execution model), when EXISTS outperforms IN,
+  how LATERAL joins work and when to use them, and the difference between
+  derived tables and CTEs in PostgreSQL's optimizer.
+assertions:
+  - type: llm_judge
+    criteria: "The correlated subquery is correctly rewritten — converts the 4 scalar subqueries into a single LEFT JOIN with GROUP BY on orders (aggregating COUNT, SUM, MAX), reducing 200K subplan executions to a single hash aggregate. The rewritten query should use HAVING or a subquery/CTE for the >5 filter"
+    weight: 0.35
+    description: "Correct subquery rewrite"
+  - type: llm_judge
+    criteria: "EXISTS vs IN is explained — EXISTS short-circuits (stops at first match), IN materializes the full subquery result. EXISTS is better for large subquery results with selective outer queries. The planner may convert IN to a semi-join (equivalent to EXISTS) but not always, especially with NULLs"
+    weight: 0.35
+    description: "EXISTS vs IN explained"
+  - type: llm_judge
+    criteria: "LATERAL joins and derived tables vs CTEs are explained — LATERAL allows dependent subqueries in FROM clause (each row can reference prior tables), useful for top-N per group. Derived tables can be folded into the outer query by the optimizer, while CTEs in PG 12+ are only materialized when specified or referenced multiple times"
+    weight: 0.30
+    description: "LATERAL and CTE behavior explained"
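
The rewrite described in the first assertion (four scalar subqueries collapsed into one join with GROUP BY and HAVING) can be checked for equivalence on toy data. The sketch uses Python's built-in `sqlite3` as a stand-in, which is an assumption for runnability: it confirms both queries return the same rows but says nothing about PostgreSQL plans or timings. The sample customers and orders are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT, status TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                     total REAL, created_at TEXT);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", [
    (1, "Ada", "ada@example.com", "active"),
    (2, "Bo", "bo@example.com", "active"),
    (3, "Cy", "cy@example.com", "inactive"),
])
# Made-up orders: Ada has 6, Bo has 2, Cy has 7.
order_rows, order_id = [], 0
for customer_id, count in [(1, 6), (2, 2), (3, 7)]:
    for k in range(count):
        order_id += 1
        order_rows.append((order_id, customer_id, 10.0 * (k + 1), f"2026-01-{k + 1:02d}"))
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", order_rows)

# Original shape: one correlated subquery per column plus one in WHERE.
CORRELATED = """
SELECT c.id, c.name, c.email,
  (SELECT COUNT(*) FROM orders WHERE customer_id = c.id) AS order_count,
  (SELECT SUM(total) FROM orders WHERE customer_id = c.id) AS total_spent,
  (SELECT MAX(created_at) FROM orders WHERE customer_id = c.id) AS last_order
FROM customers c
WHERE c.status = 'active'
  AND (SELECT COUNT(*) FROM orders WHERE customer_id = c.id) > 5
ORDER BY c.id
"""

# Rewritten shape: a single join, aggregated once per customer.
REWRITTEN = """
SELECT c.id, c.name, c.email,
       COUNT(o.id) AS order_count,
       SUM(o.total) AS total_spent,
       MAX(o.created_at) AS last_order
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE c.status = 'active'
GROUP BY c.id, c.name, c.email
HAVING COUNT(o.id) > 5
ORDER BY c.id
"""

correlated_rows = conn.execute(CORRELATED).fetchall()
rewritten_rows = conn.execute(REWRITTEN).fetchall()
print(correlated_rows == rewritten_rows, rewritten_rows)
```

Because the HAVING clause keeps only customers with more than 5 orders, the LEFT JOIN behaves like an inner join here; it is kept to match the wording of the assertion.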