npm - dojo.md - Versions diffs - 0.2.0 → 0.2.1 - Mend

dojo.md 0.2.0 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (222)

package/courses/mysql-query-optimization/scenarios/level-1/join-basics.yaml ADDED

@@ -0,0 +1,66 @@
+meta:
+  id: join-basics
+  level: 1
+  course: mysql-query-optimization
+  type: output
+  description: "Understand MySQL JOIN execution — learn how MySQL processes JOINs, the nested-loop algorithm, and how to optimize JOIN queries with indexes"
+  tags: [MySQL, JOIN, nested-loop, hash-join, index, beginner]
+state: {}
+trigger: |
+  You're building a report that shows order details with customer and
+  product information. Your query joins 4 tables and takes 30 seconds:
+  SELECT o.id, o.total, c.name, c.email,
+         oi.quantity, oi.unit_price, p.name AS product_name
+  FROM orders o
+  JOIN customers c ON c.id = o.customer_id
+  JOIN order_items oi ON oi.order_id = o.id
+  JOIN products p ON p.id = oi.product_id
+  WHERE o.created_at >= '2025-01-01'
+    AND o.status = 'completed';
+  EXPLAIN output:
+  +----+-------+--------+--------------+-----------+----------+-------------+
+  | id | table | type   | key          | rows      | filtered | Extra       |
+  +----+-------+--------+--------------+-----------+----------+-------------+
+  |  1 | o     | ALL    | NULL         | 5,000,000 |     5.00 | Using where |
+  |  1 | c     | eq_ref | PRIMARY      |         1 |   100.00 | NULL        |
+  |  1 | oi    | ref    | idx_order_id |         3 |   100.00 | NULL        |
+  |  1 | p     | eq_ref | PRIMARY      |         1 |   100.00 | NULL        |
+  +----+-------+--------+--------------+-----------+----------+-------------+
+  The issue: the orders table has type=ALL (a full table scan of 5M rows).
+  MySQL reads every order row and tests it against the WHERE clause
+  ("Using where"); for each of the ~250K rows that pass (filtered = 5%),
+  it runs the nested loop: it looks up the customer (eq_ref, fast), finds
+  the order items (ref, ~3 per order), and looks up each product (eq_ref, fast).
+  The bottleneck is the first step: it examines all 5M orders when only
+  ~250K match the WHERE clause (status='completed' AND created_at >= '2025-01-01').
+  Available indexes:
+  - orders: PRIMARY(id), idx_customer_id(customer_id)
+  - customers: PRIMARY(id)
+  - order_items: PRIMARY(id), idx_order_id(order_id)
+  - products: PRIMARY(id)
+  Task: Explain how MySQL executes JOINs. Write: the nested-loop join
+  algorithm (how MySQL processes joins step by step), why join order
+  matters (driving table selection), why the orders table needs a better
+  index, how eq_ref vs ref vs ALL access types affect join performance,
+  and how MySQL 8.0's hash join changes things.
+assertions:
+  - type: llm_judge
+    criteria: "Nested-loop join is explained — MySQL traditionally uses nested-loop join: for each row in the outer (driving) table, it looks up matching rows in the inner table using the join condition. The optimizer chooses the join order to minimize total rows examined. Currently the orders full table scan drives 5M iterations"
+    weight: 0.35
+    description: "Nested-loop algorithm explained"
+  - type: llm_judge
+    criteria: "Index fix is recommended — a composite index on orders(status, created_at) or orders(created_at, status) would allow MySQL to use a range scan instead of a full table scan on the driving table, reducing rows from 5M to ~250K. Explains that the join itself (eq_ref for customers and products) is already optimal"
+    weight: 0.35
+    description: "Index fix recommended"
+  - type: llm_judge
+    criteria: "MySQL 8.0 hash join is mentioned — MySQL 8.0.18+ introduced hash join for equi-joins without indexes (builds hash table on smaller table, probes with larger table). Hash join is used when there's no usable index on the join column. It's an alternative to block nested-loop join, not a replacement for indexed nested-loop"
+    weight: 0.30
+    description: "Hash join mentioned"
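
As a rough illustration of the join scenario above, the following Python sketch models an index nested-loop join and a hash join over in-memory rows. It is a toy model, not MySQL's implementation: the table contents, the column names `order_id` and `cust_id`, and the dictionary standing in for a B-tree lookup are all hypothetical. It shows that a full scan reads every driving row, while the number of inner-table probes equals the number of driving rows that survive the WHERE filter.

```python
from collections import defaultdict


def index_nested_loop_join(outer_rows, outer_key, inner_index):
    """Index nested-loop join (toy model).

    For each row of the outer (driving) table, probe an index on the inner
    table. inner_index maps a join-key value to its matching inner rows and
    stands in for a B-tree lookup (eq_ref if at most one match, ref otherwise).
    Returns (joined_rows, probe_count).
    """
    joined, probes = [], 0
    for outer in outer_rows:
        probes += 1  # one index lookup per driving row
        for inner in inner_index.get(outer[outer_key], []):
            joined.append({**outer, **inner})
    return joined, probes


def hash_join(build_rows, build_key, probe_rows, probe_key):
    """Equi-join without an index: hash the build input once, then probe it."""
    buckets = defaultdict(list)
    for row in build_rows:
        buckets[row[build_key]].append(row)
    return [{**p, **b} for p in probe_rows for b in buckets.get(p[probe_key], [])]


# Hypothetical toy data: 1,000 orders (5% completed) and 10 customers.
orders = [
    {"order_id": n, "customer_id": n % 10,
     "status": "completed" if n % 20 == 0 else "pending"}
    for n in range(1000)
]
customers = [{"cust_id": c, "name": f"customer-{c}"} for c in range(10)]
customer_pk = {c["cust_id"]: [c] for c in customers}  # plays the role of PRIMARY(id)

# type=ALL on the driving table: every row is read and tested against WHERE.
rows_examined = len(orders)
driving_rows = [o for o in orders if o["status"] == "completed"]

joined, probes = index_nested_loop_join(driving_rows, "customer_id", customer_pk)
hashed = hash_join(customers, "cust_id", driving_rows, "customer_id")
print(rows_examined, len(driving_rows), probes, len(joined), len(hashed))
# 1000 50 50 50 50
```

In this model, an index on orders(status, created_at) would turn the driving access into a range scan, so `rows_examined` would shrink to roughly `len(driving_rows)`; the probe count is unchanged because it already depends only on the rows that pass the filter.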

package/courses/mysql-query-optimization/scenarios/level-1/n-plus-one-queries.yaml ADDED

@@ -0,0 +1,67 @@
+meta:
+  id: n-plus-one-queries
+  level: 1
+  course: mysql-query-optimization
+  type: output
+  description: "Detect and fix N+1 queries — identify the pattern in ORM-generated SQL and resolve with eager loading or query restructuring"
+  tags: [MySQL, N+1, ORM, eager-loading, performance, beginner]
+state: {}
+trigger: |
+  Your API endpoint /api/orders takes 12 seconds to return 100 orders.
+  You enable the MySQL slow query log and discover that the endpoint
+  executes 301 queries:
+  -- Query 1: Get orders
+  SELECT * FROM orders WHERE user_id = 42 ORDER BY created_at DESC LIMIT 100;
+  -- 0.5ms
+  -- Query 2-101: Get customer for each order (N times)
+  SELECT * FROM customers WHERE id = 1001;  -- 0.3ms
+  SELECT * FROM customers WHERE id = 1002;  -- 0.3ms
+  SELECT * FROM customers WHERE id = 1003;  -- 0.3ms
+  ... (98 more)
+  -- Query 102-201: Get order items for each order (N times)
+  SELECT * FROM order_items WHERE order_id = 5001;  -- 0.4ms
+  SELECT * FROM order_items WHERE order_id = 5002;  -- 0.4ms
+  ... (98 more)
+  -- Query 202-301: Get product for each order item (N times)
+  SELECT * FROM products WHERE id = 8001;  -- 0.3ms
+  SELECT * FROM products WHERE id = 8002;  -- 0.3ms
+  ... (98 more)
+  Each individual query is fast (0.3-0.5ms), and the total query time is:
+  1 + 100 + 100 + 100 = 301 queries × ~0.35ms = ~105ms
+  With network round-trip overhead, however: 301 × ~40ms ≈ 12 seconds
+  The ORM code that generates this:
+  orders = Order.find_by_user(user_id=42, limit=100)
+  for order in orders:
+      order.customer = Customer.find(order.customer_id)
+      order.items = OrderItem.find_by_order(order.id)
+      for item in order.items:
+          item.product = Product.find(item.product_id)
+  Task: Explain the N+1 query problem and how to fix it. Write: why
+  this pattern is slow (network round trips, not query execution), how
+  to detect it (query count monitoring, slow query log patterns), the
+  JOIN-based solution (single query with JOINs), the IN-based solution
+  (batch loading with WHERE id IN (...)), and which ORM features help
+  (eager loading, batch size configuration).
+assertions:
+  - type: llm_judge
+    criteria: "N+1 problem is correctly identified — 1 query to get the list + N queries per related entity = N+1 pattern. The bottleneck is network round-trip latency (not query execution), which is why each query is fast individually but the total is slow. 301 round trips at 40ms each dominates the 12-second response time"
+    weight: 0.35
+    description: "N+1 problem identified"
+  - type: llm_judge
+    criteria: "Solutions are provided — JOIN solution (single query joining orders, customers, order_items, products), IN-based batch loading (SELECT * FROM customers WHERE id IN (1001, 1002, ...)), and ORM eager loading patterns. Explains trade-offs: JOINs may return duplicate data for one-to-many, while IN-based loading keeps queries separate but reduces to 4 queries total"
+    weight: 0.35
+    description: "Multiple solutions provided"
+  - type: llm_judge
+    criteria: "Detection methods are explained — monitoring query count per request, enabling slow query log with low threshold, using MySQL Performance Schema (events_statements_summary_by_digest) to find high-frequency simple queries, and ORM-level query logging to catch the pattern during development"
+    weight: 0.30
+    description: "Detection methods explained"
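
The round-trip arithmetic in this scenario can be reproduced with a small Python simulation. `CountingDB`, `make_tables`, and the two loader functions are hypothetical stand-ins, not any real ORM's API: each call to `select_where_in` counts as one round trip, and the 40ms per-trip cost is the scenario's own figure. The N+1 loader issues 301 queries; the IN-based loader returns the same nested data with 4.

```python
from collections import defaultdict

ROUND_TRIP_MS = 40  # per-query network round trip, taken from the scenario


class CountingDB:
    """Hypothetical in-memory stand-in for a database that counts round trips."""

    def __init__(self, tables):
        self.tables = tables
        self.queries = 0

    def select_where_in(self, table, column, values):
        """Simulates SELECT * FROM <table> WHERE <column> IN (<values>)."""
        self.queries += 1
        wanted = set(values)
        return [dict(row) for row in self.tables[table] if row[column] in wanted]


def make_tables():
    """Hypothetical data: 100 orders for user 42, one item per order."""
    return {
        "orders": [{"id": 5001 + i, "user_id": 42, "customer_id": 1001 + i} for i in range(100)],
        "customers": [{"id": 1001 + i, "name": f"customer-{i}"} for i in range(100)],
        "order_items": [{"id": i, "order_id": 5001 + i, "product_id": 8001 + i} for i in range(100)],
        "products": [{"id": 8001 + i, "name": f"product-{i}"} for i in range(100)],
    }


def load_naive(db):
    """The N+1 pattern: one query per order, per order's items, per item's product."""
    orders = db.select_where_in("orders", "user_id", [42])
    for order in orders:
        order["customer"] = db.select_where_in("customers", "id", [order["customer_id"]])[0]
        order["items"] = db.select_where_in("order_items", "order_id", [order["id"]])
        for item in order["items"]:
            item["product"] = db.select_where_in("products", "id", [item["product_id"]])[0]
    return orders


def load_batched(db):
    """IN-based batch loading: one query per table, stitched together in memory."""
    orders = db.select_where_in("orders", "user_id", [42])
    customers = {c["id"]: c for c in db.select_where_in(
        "customers", "id", {o["customer_id"] for o in orders})}
    items = db.select_where_in("order_items", "order_id", [o["id"] for o in orders])
    products = {p["id"]: p for p in db.select_where_in(
        "products", "id", {i["product_id"] for i in items})}
    items_by_order = defaultdict(list)
    for item in items:
        item["product"] = products[item["product_id"]]
        items_by_order[item["order_id"]].append(item)
    for order in orders:
        order["customer"] = customers[order["customer_id"]]
        order["items"] = items_by_order[order["id"]]
    return orders


naive_db, batched_db = CountingDB(make_tables()), CountingDB(make_tables())
naive, batched = load_naive(naive_db), load_batched(batched_db)
print(naive_db.queries, naive_db.queries * ROUND_TRIP_MS)      # 301 12040
print(batched_db.queries, batched_db.queries * ROUND_TRIP_MS)  # 4 160
```

The batched loader's in-memory stitching is roughly what ORM eager-loading features do on your behalf; the option names and batching rules differ from one ORM to another.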

package/courses/mysql-query-optimization/scenarios/level-1/query-rewriting-basics.yaml ADDED

@@ -0,0 +1,66 @@
+meta:
+  id: query-rewriting-basics
+  level: 1
+  course: mysql-query-optimization
+  type: output
+  description: "Rewrite inefficient MySQL queries — transform common slow patterns into optimized equivalents using indexes, limits, and restructured logic"
+  tags: [MySQL, query-rewriting, optimization, patterns, beginner]
+state: {}
+trigger: |
+  Your team lead asks you to review 6 queries flagged by the DBA as
+  "query optimization opportunities." Each query works correctly but
+  can be rewritten to be significantly faster.
+  Query 1 — Counting with full scan:
+  SELECT COUNT(*) FROM orders WHERE status = 'pending';
+  -- Takes 3 seconds on 10M rows, even with an index on status,
+  -- because COUNT(*) must visit every matching row
+  Query 2 — Existence check with COUNT:
+  SELECT COUNT(*) FROM users WHERE email = 'john@example.com';
+  -- Application checks: if (count > 0) { user_exists = true }
+  -- Scans all matching rows just to check existence
+  Query 3 — DISTINCT with large result set:
+  SELECT DISTINCT customer_id FROM orders
+  WHERE created_at >= '2025-01-01';
+  -- Returns 500K distinct customer IDs from 5M orders
+  -- Creates a temporary table and sorts
+  Query 4 — Sorting large result sets:
+  SELECT * FROM products ORDER BY popularity DESC;
+  -- Returns all 1M products sorted, but UI shows only top 20
+  -- Sorts 1M rows in memory or on disk
+  Query 5 — Correlated subquery:
+  SELECT * FROM customers c
+  WHERE (SELECT MAX(created_at) FROM orders
+         WHERE customer_id = c.id) >= '2025-01-01';
+  -- Executes the subquery once per customer (500K times)
+  Query 6 — UNION when UNION ALL works:
+  SELECT name FROM products WHERE category = 'electronics'
+  UNION
+  SELECT name FROM products WHERE category = 'computers';
+  -- UNION removes duplicates (requires sort + dedup)
+  -- Categories are mutually exclusive, so no duplicates possible
+  Task: Rewrite each query for better performance. Write: the optimized
+  version of each query, why the original is slow, what technique makes
+  it faster, and the general principle behind each optimization.
+assertions:
+  - type: llm_judge
+    criteria: "All 6 rewrites are correct — (1) maintain a counter table or use approximate count, or accept the cost if exact count is needed, (2) SELECT EXISTS or SELECT 1 ... LIMIT 1, (3) consider GROUP BY or index-only scan, (4) add LIMIT 20, (5) convert to a semi-join (WHERE EXISTS (SELECT 1 FROM orders WHERE orders.customer_id = c.id AND orders.created_at >= ...)) or a JOIN with DISTINCT, since a plain JOIN returns a customer once per matching order, (6) UNION ALL since categories are mutually exclusive"
+    weight: 0.35
+    description: "All rewrites correct"
+  - type: llm_judge
+    criteria: "Performance impact is explained — each rewrite explains why it's faster (fewer rows examined, less sorting, avoiding temporary tables, reducing subquery executions from N to 1, removing unnecessary deduplication)"
+    weight: 0.35
+    description: "Performance impact explained"
+  - type: llm_judge
+    criteria: "General principles are extracted — always add LIMIT when you only need a subset, use EXISTS instead of COUNT for existence checks, use UNION ALL when duplicates aren't possible, convert correlated subqueries to JOINs, avoid sorting more data than needed"
+    weight: 0.30
+    description: "General principles extracted"
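
Whether a rewrite returns the same rows as the original is easy to check on a tiny data set. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, so it demonstrates result equivalence only, not MySQL performance; the tables and rows are hypothetical. It compares Query 5 with an EXISTS semi-join and a plain JOIN, and runs the EXISTS form of the existence-check pattern from Query 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, created_at TEXT);
    INSERT INTO customers VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO orders VALUES
        (10, 1, '2025-02-01'), (11, 1, '2025-03-01'),  -- ann: two orders in 2025
        (12, 2, '2024-12-31'),                          -- bob: only a 2024 order
        (13, 3, '2025-01-01');                          -- cy: one order in 2025
""")


def ids(sql):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]


# Query 5 as written: correlated subquery, evaluated per customer.
correlated = ids("""
    SELECT c.id FROM customers c
    WHERE (SELECT MAX(created_at) FROM orders WHERE customer_id = c.id) >= '2025-01-01'
    ORDER BY c.id""")

# Semi-join rewrite: EXISTS only asks whether one qualifying order exists.
semi_join = ids("""
    SELECT c.id FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o
                  WHERE o.customer_id = c.id AND o.created_at >= '2025-01-01')
    ORDER BY c.id""")

# A plain JOIN returns one row per qualifying order, so it repeats customers.
plain_join = ids("""
    SELECT c.id FROM customers c JOIN orders o ON o.customer_id = c.id
    WHERE o.created_at >= '2025-01-01'
    ORDER BY c.id""")

# Existence check: EXISTS answers the yes/no question directly.
user_exists = conn.execute(
    "SELECT EXISTS(SELECT 1 FROM customers WHERE name = ?)", ("ann",)).fetchone()[0]

print(correlated, semi_join, plain_join, user_exists)
# [1, 3] [1, 3] [1, 1, 3] 1
```

The duplicated `1` from the plain JOIN is why a JOIN-based rewrite of Query 5 needs `DISTINCT` (or `GROUP BY c.id`), whereas `EXISTS` expresses the semi-join directly.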

package/courses/mysql-query-optimization/scenarios/level-1/select-star-problems.yaml ADDED

@@ -0,0 +1,68 @@
+meta:
+  id: select-star-problems
+  level: 1
+  course: mysql-query-optimization
+  type: output
+  description: "Understand why SELECT * hurts performance — learn about unnecessary I/O, buffer pool waste, and covering index defeat in MySQL"
+  tags: [MySQL, SELECT-star, InnoDB, buffer-pool, covering-index, beginner]
+state: {}
+trigger: |
+  Your team's API is consuming excessive memory and the MySQL server's
+  buffer pool hit ratio has dropped from 99% to 85%. Investigation
+  reveals that almost every query uses SELECT *. A senior DBA says
+  "SELECT * is killing our buffer pool." You need to understand why.
+  Example queries and their impact:
+  1. User list endpoint:
+     SELECT * FROM users WHERE status = 'active' LIMIT 50;
+     -- users table has 45 columns including a TEXT bio and BLOB avatar
+     -- The API only needs: id, name, email, avatar_url (4 columns)
+     -- Result: 50 rows × 45 columns instead of 50 × 4
+  2. Order count:
+     SELECT * FROM orders WHERE customer_id = 123;
+     -- Then the application does: orders.length
+     -- The application only needs COUNT(*), not the actual rows
+     -- Result: transfers 500 rows (with JSON payload column) just to count
+  3. Existence check:
+     SELECT * FROM users WHERE email = 'john@example.com';
+     -- Then: if (result) { ... }
+     -- Only needs SELECT 1 or SELECT EXISTS
+     -- Result: loads entire row with all 45 columns to check existence
+  4. Covering index defeat:
+     -- Index exists: idx_status_email (status, email)
+     SELECT * FROM users WHERE status = 'active';
+     -- EXPLAIN: type=ref, Extra=NULL (reads full rows from clustered index)
+     SELECT status, email FROM users WHERE status = 'active';
+     -- EXPLAIN: type=ref, Extra=Using index (index-only scan!)
+  InnoDB buffer pool stats:
+  - Buffer pool size: 8GB
+  - Pages read from disk: 50K/hour (was 5K/hour before traffic growth)
+  - Buffer pool hit ratio: 85% (target: 99%+)
+  - Average row size: 2.4KB (would be 200 bytes with explicit columns)
+  Task: Explain why SELECT * is a performance problem. Write: how it
+  wastes I/O and memory (InnoDB buffer pool impact), how it defeats
+  covering indexes, when it prevents query optimization, what the
+  correct practice is (explicit column lists), and exceptions where
+  SELECT * is acceptable.
+assertions:
+  - type: llm_judge
+    criteria: "I/O and buffer pool impact is explained — SELECT * reads entire rows from the clustered index including large columns (TEXT, BLOB, JSON), filling the buffer pool with unnecessary data and evicting hot pages. With explicit columns, InnoDB may satisfy the query from a smaller secondary index (covering index), keeping the buffer pool efficient"
+    weight: 0.35
+    description: "Buffer pool impact explained"
+  - type: llm_judge
+    criteria: "Covering index defeat is explained — when SELECT * is used, MySQL must read the full row from the clustered index even if a secondary index matches the WHERE clause. With explicit columns that are all in the index, MySQL can use an index-only scan (Using index in EXPLAIN), avoiding the clustered index entirely"
+    weight: 0.35
+    description: "Covering index defeat explained"
+  - type: llm_judge
+    criteria: "Practical guidance is provided — always specify needed columns, use COUNT(*) for counts, use SELECT EXISTS or SELECT 1 ... LIMIT 1 for existence checks, and notes exceptions (ad-hoc debugging queries, tables with few small columns, ORM code that needs all fields for object hydration)"
+    weight: 0.30
+    description: "Practical guidance"
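
The covering-index rule in the fourth example reduces to a set check. The Python sketch below is a simplified model, assuming InnoDB's behaviour that secondary index entries also store the primary key columns; the column names beyond those in the scenario (`col_0` through `col_37`) are hypothetical padding to reach 45 columns, and 2.4KB is taken as 2,400 bytes.

```python
# Hypothetical 45-column users table, mirroring the scenario's description.
USERS_COLUMNS = (["id", "name", "email", "status", "bio", "avatar", "avatar_url"]
                 + [f"col_{i}" for i in range(38)])
PRIMARY_KEY = ("id",)


def is_covering(index_columns, referenced_columns, primary_key=PRIMARY_KEY):
    """True if every referenced column is available from the secondary index alone.

    InnoDB secondary index entries also carry the primary key columns, so those
    count as covered too. A covering query shows "Using index" in EXPLAIN.
    """
    available = set(index_columns) | set(primary_key)
    return set(referenced_columns) <= available


idx_status_email = ("status", "email")
print(is_covering(idx_status_email, ["status", "email"]))        # True
print(is_covering(idx_status_email, ["id", "status", "email"]))  # True (PK is implicit)
print(is_covering(idx_status_email, USERS_COLUMNS))              # False: SELECT *

# Transfer volume for the 50-row user list, using the scenario's average row sizes.
rows, full_row_bytes, narrow_row_bytes = 50, 2400, 200
print(rows * full_row_bytes, rows * narrow_row_bytes)  # 120000 10000
```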

package/courses/mysql-query-optimization/scenarios/level-1/slow-query-diagnosis.yaml ADDED

@@ -0,0 +1,65 @@
+meta:
+  id: slow-query-diagnosis
+  level: 1
+  course: mysql-query-optimization
+  type: output
+  description: "Diagnose slow MySQL queries — use the slow query log, Performance Schema, and SHOW PROCESSLIST to find and analyze problematic queries"
+  tags: [MySQL, slow-query-log, Performance-Schema, diagnosis, beginner]
+state: {}
+trigger: |
+  Users are reporting that "the app is slow." You need to find which
+  queries are causing the problem, but you don't know where to start.
+  You have access to a MySQL 8.0 production server.
+  Current configuration:
+  - slow_query_log = OFF (never been enabled)
+  - long_query_time = 10 (default — only logs queries >10 seconds)
+  - performance_schema = ON (default in MySQL 8.0)
+  When you run SHOW PROCESSLIST:
+  +----+------+-------+------+---------+------+-----------+----------------------+
+  | Id | User | Host  | db   | Command | Time | State     | Info                 |
+  +----+------+-------+------+---------+------+-----------+----------------------+
+  | 45 | app  | ...   | shop | Query   | 120  | Sending   | SELECT * FROM ord... |
+  | 46 | app  | ...   | shop | Query   | 85   | Sorting   | SELECT p.*, COUNT... |
+  | 47 | app  | ...   | shop | Query   | 0    | executing | SELECT id FROM u...  |
+  | 48 | app  | ...   | shop | Sleep   | 300  | NULL      | NULL                 |
+  | 49 | app  | ...   | shop | Query   | 45   | Locked    | UPDATE inventory ... |
+  +----+------+-------+------+---------+------+-----------+----------------------+
+  (plus 195 more connections, mostly in Sleep state)
+  You also check Performance Schema:
+  SELECT DIGEST_TEXT, COUNT_STAR, AVG_TIMER_WAIT/1000000000 AS avg_ms,
+         SUM_ROWS_EXAMINED, SUM_ROWS_SENT
+  FROM performance_schema.events_statements_summary_by_digest
+  ORDER BY SUM_TIMER_WAIT DESC LIMIT 5;
+  Top 5 queries by total time:
+  1. SELECT * FROM orders WHERE... — 50K executions, avg 200ms, 10M rows examined
+  2. SELECT p.*, COUNT(r.*)... — 5K executions, avg 3 seconds, 50M rows examined
+  3. UPDATE inventory SET... — 20K executions, avg 150ms (but high lock waits)
+  4. SELECT * FROM products... — 100K executions, avg 50ms, 500K rows examined
+  5. INSERT INTO audit_log... — 200K executions, avg 5ms (low individual but high total)
+  Task: Explain how to diagnose slow queries in MySQL. Write: how to
+  enable and configure the slow query log (what settings to change),
+  how to read SHOW PROCESSLIST output (what each column and state means),
+  how to use Performance Schema for query analysis, how to identify the
+  highest-impact query to optimize first, and the initial investigation
+  steps for each of the 5 top queries.
+assertions:
+  - type: llm_judge
+    criteria: "Slow query log configuration is correct — SET GLOBAL slow_query_log = ON, lower long_query_time to 1 second (or lower), enable log_queries_not_using_indexes, and explains where to find the log file. Mentions pt-query-digest or mysqldumpslow for analyzing the log"
+    weight: 0.35
+    description: "Slow query log setup"
+  - type: llm_judge
+    criteria: "PROCESSLIST analysis identifies issues — the 120-second query is a major problem (likely full table scan), the Sorting state indicates filesort on large result, the Locked state indicates lock contention, the 300-second Sleep may indicate a connection leak (or an idle pooled connection). Each state is explained and the severity is assessed"
+    weight: 0.35
+    description: "PROCESSLIST analysis"
+  - type: llm_judge
+    criteria: "Performance Schema analysis prioritizes correctly — identifies query #2 (3-second avg, 50M rows) as highest impact for optimization, explains why total time matters more than individual execution time for query #5, and recommends investigating the rows_examined vs rows_sent ratio as a key efficiency metric"
+    weight: 0.30
+    description: "Performance Schema prioritization"
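
Since SUM_TIMER_WAIT is roughly COUNT_STAR multiplied by the average latency, the five digests can be re-ranked from the figures given in this scenario. This Python sketch uses only the scenario's numbers (latency in milliseconds) and the fact that Performance Schema timer columns count picoseconds; the short digest labels are hypothetical.

```python
def ps_timer_to_ms(picoseconds):
    """Performance Schema timer columns are in picoseconds; 1 ms = 10**9 ps."""
    return picoseconds / 1_000_000_000


def total_seconds(count_star, avg_ms):
    """Approximates SUM_TIMER_WAIT as COUNT_STAR x average latency, in seconds."""
    return count_star * avg_ms / 1000


# (label, COUNT_STAR, average latency in ms), copied from the scenario.
top_digests = [
    ("#1 orders SELECT", 50_000, 200.0),
    ("#2 products + COUNT", 5_000, 3_000.0),
    ("#3 inventory UPDATE", 20_000, 150.0),
    ("#4 products SELECT", 100_000, 50.0),
    ("#5 audit_log INSERT", 200_000, 5.0),
]

ranked = sorted(top_digests, key=lambda d: total_seconds(d[1], d[2]), reverse=True)
for label, count, avg_ms in ranked:
    print(f"{label:<22} {total_seconds(count, avg_ms):>8.0f} s")
```

By this estimate, query #2 accounts for about 15,000 seconds versus about 10,000 for query #1, and query #4 (about 5,000) exceeds query #3 (about 3,000).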

package/courses/mysql-query-optimization/scenarios/level-1/where-clause-optimization.yaml ADDED

@@ -0,0 +1,65 @@
+meta:
+  id: where-clause-optimization
+  level: 1
+  course: mysql-query-optimization
+  type: output
+  description: "Optimize WHERE clauses in MySQL — write SARGable conditions, avoid function wrapping, and understand how MySQL uses indexes for filtering"
+  tags: [MySQL, WHERE, SARGable, index-usage, filtering, beginner]
+state: {}
+trigger: |
+  You're reviewing slow queries found in the slow query log. Several
+  queries have indexes available but MySQL isn't using them. The
+  EXPLAIN output shows type=ALL (full table scan) despite relevant
+  indexes existing.
+  Problematic queries (all have indexes on the filtered columns):
+  1. Date range query (index on created_at):
+     SELECT * FROM orders
+     WHERE YEAR(created_at) = 2025 AND MONTH(created_at) = 12;
+     -- EXPLAIN: type=ALL, rows=5M (ignores index)
+  2. Case-insensitive search (index on email):
+     SELECT * FROM users WHERE LOWER(email) = 'john@example.com';
+     -- EXPLAIN: type=ALL, rows=2M (ignores index)
+  3. String pattern search (index on name):
+     SELECT * FROM products WHERE name LIKE '%widget%';
+     -- EXPLAIN: type=ALL, rows=1M (ignores index)
+  4. Math on column (index on price):
+     SELECT * FROM products WHERE price * 1.1 > 50;
+     -- EXPLAIN: type=ALL, rows=1M (ignores index)
+  5. Implicit type conversion (index on phone VARCHAR):
+     SELECT * FROM customers WHERE phone = 5551234567;
+     -- EXPLAIN: type=ALL, rows=500K (ignores index)
+  6. OR with different columns (indexes on status and priority):
+     SELECT * FROM tickets
+     WHERE status = 'open' OR priority = 'high';
+     -- EXPLAIN: type=ALL, rows=2M (ignores both indexes)
+  Each query has an appropriate index, but the WHERE clause is written
+  in a way that prevents MySQL from using it.
+  Task: Explain why each query can't use its index and how to rewrite
+  it. Write: what "SARGable" means, why functions on indexed columns
+  prevent index usage, the correct rewrite for each of the 6 queries,
+  and general rules for writing index-friendly WHERE clauses.
+assertions:
+  - type: llm_judge
+    criteria: "SARGable is explained — Search ARGument ABLE means the condition can use an index. Functions, math, or type conversions on the indexed column prevent index usage because MySQL can't use the B-tree to look up transformed values. The condition must isolate the column on one side of the comparison"
+    weight: 0.35
+    description: "SARGable concept explained"
+  - type: llm_judge
+    criteria: "All 6 rewrites are correct — (1) created_at >= '2025-12-01' AND created_at < '2026-01-01', (2) email = 'john@example.com' (MySQL default collation is case-insensitive) or use a generated column with expression index, (3) full-text index or application-level search (leading wildcard can't use B-tree), (4) price > 50/1.1 (move math to the constant side), (5) phone = '5551234567' (match the column type), (6) UNION of two queries or index_merge optimization"
+    weight: 0.35
+    description: "Correct rewrites for all 6"
+  - type: llm_judge
+    criteria: "General rules are provided — never apply functions to indexed columns, keep the indexed column alone on one side, match data types to avoid implicit conversion, leading wildcards can't use B-tree indexes, OR across different columns may prevent index usage (consider UNION or index_merge)"
+    weight: 0.30
+    description: "General SARGable rules"
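
Rewrite (1) generalises to any year and month: replace the function calls with a half-open range on the bare column. The Python sketch below builds that predicate and uses a sorted list with `bisect` as a hypothetical stand-in for a B-tree index, to contrast a range scan (two binary searches) with evaluating YEAR()/MONTH() on every entry.

```python
import bisect
from datetime import date, timedelta


def month_range(year, month):
    """Half-open [start, end) bounds equivalent to YEAR(col) = y AND MONTH(col) = m."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end


def sargable_month_predicate(column, year, month):
    """SQL text that keeps the indexed column alone on one side of each comparison."""
    start, end = month_range(year, month)
    return f"{column} >= '{start.isoformat()}' AND {column} < '{end.isoformat()}'"


# Hypothetical index on created_at: one entry per day for 2025-2026, kept sorted
# the way a B-tree's leaf level keeps its keys.
created_at_index = [date(2025, 1, 1) + timedelta(days=d) for d in range(730)]

# SARGable: binary-search both bounds, then read only the entries in between.
start, end = month_range(2025, 12)
lo = bisect.bisect_left(created_at_index, start)
hi = bisect.bisect_left(created_at_index, end)
range_scan = created_at_index[lo:hi]

# Non-SARGable: the functions must be evaluated on every entry (a full scan).
full_scan = [d for d in created_at_index if d.year == 2025 and d.month == 12]

print(sargable_month_predicate("created_at", 2025, 12))
# created_at >= '2025-12-01' AND created_at < '2026-01-01'
print(len(range_scan), len(full_scan), len(created_at_index))  # 31 31 730
```

Both approaches find the same 31 rows, but the range scan touches only those entries, while the function-based filter has to examine all 730.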

package/courses/mysql-query-optimization/scenarios/level-2/buffer-pool-tuning.yaml ADDED

@@ -0,0 +1,64 @@
+meta:
+  id: buffer-pool-tuning
+  level: 2
+  course: mysql-query-optimization
+  type: output
+  description: "Tune InnoDB buffer pool — configure pool size, instances, and flushing for optimal read/write performance"
+  tags: [MySQL, InnoDB, buffer-pool, tuning, memory, intermediate]
+state: {}
+trigger: |
+  You've been asked to tune a MySQL 8.0 server that's performing poorly.
+  The server has 128GB RAM but was set up with default settings. You
+  need to configure the buffer pool and related settings for a mixed
+  OLTP/reporting workload.
+  Server specs:
+  - RAM: 128GB
+  - CPU: 32 cores
+  - Storage: NVMe SSD (500K IOPS)
+  - MySQL data size: 200GB (data + indexes)
+  - Workload: 70% reads, 30% writes, 50K QPS peak
+  Current settings (defaults):
+  innodb_buffer_pool_size = 128M  (0.1% of RAM!)
+  innodb_buffer_pool_instances = 1
+  innodb_buffer_pool_chunk_size = 128M
+  innodb_log_file_size = 48M
+  innodb_io_capacity = 200
+  innodb_io_capacity_max = 2000
+  innodb_flush_log_at_trx_commit = 1
+  innodb_flush_method = fsync
+  Current symptoms:
+  - Buffer pool hit ratio: 65% (terrible)
+  - Innodb_buffer_pool_reads: 500K/hour (disk reads)
+  - Innodb_buffer_pool_pages_dirty: fluctuates wildly (200 to 8000)
+  - Innodb_log_waits: 50K/hour (log writes stalling)
+  - CPU usage: 20% (mostly waiting on I/O)
+  - Disk read latency: 0.5ms (NVMe is fast, but too many reads)
+  The data team runs nightly reports (full table scans on 50GB tables)
+  that flush the buffer pool, causing morning performance degradation.
+  Task: Design the buffer pool tuning configuration. Write: the
+  recommended innodb_buffer_pool_size (with calculation), the instance
+  and chunk size configuration, the flushing strategy (io_capacity,
+  flush_method, log_file_size), how to handle the nightly report
+  buffer pool pollution, and the monitoring queries to validate
+  improvements.
+assertions:
+  - type: llm_judge
+    criteria: "Buffer pool sizing is correct — recommends 70-80% of RAM (90-100GB) for this dedicated MySQL server, explains why it can't cache the full 200GB dataset but should cache the hot working set, configures innodb_buffer_pool_instances to 8-16 for multi-core concurrency, and sets chunk_size appropriately (the pool size must be a multiple of chunk_size × instances)"
+    weight: 0.35
+    description: "Buffer pool sizing"
+  - type: llm_judge
+    criteria: "Flushing and logging are tuned — increases innodb_log_file_size to 1-4GB (reduces checkpoint frequency, eliminates log_waits), sets innodb_io_capacity to match NVMe capability (10K-50K), uses O_DIRECT flush method (avoids OS double-buffering), and explains the trade-off of innodb_flush_log_at_trx_commit=2 for the reporting workload"
+    weight: 0.35
+    description: "Flushing and logging tuned"
+  - type: llm_judge
+    criteria: "Buffer pool pollution from reports is addressed — explains InnoDB's midpoint insertion strategy (innodb_old_blocks_pct, innodb_old_blocks_time), how full table scans pollute the buffer pool by evicting hot pages, and solutions: tuning old_blocks_time to prevent scan pages from moving to young list, or running reports on a read replica"
+    weight: 0.30
+    description: "Buffer pool pollution addressed"
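
The sizing constraint in the first criterion can be checked with a little arithmetic. This Python sketch assumes the behaviour described in the MySQL reference manual, where a requested innodb_buffer_pool_size that is not a multiple of innodb_buffer_pool_chunk_size × innodb_buffer_pool_instances is adjusted upward to the next multiple. The 75% target and 16 instances are example choices within the ranges the criteria mention, not the only valid ones.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3


def adjusted_buffer_pool_size(requested, chunk_size, instances):
    """Round requested up to a multiple of chunk_size * instances."""
    unit = chunk_size * instances
    return -(-requested // unit) * unit  # ceiling division


ram = 128 * GiB
requested = ram * 3 // 4  # 75% of RAM, inside the 70-80% range for a dedicated server
chunk_size, instances = 128 * MiB, 16  # 128 MiB x 16 = 2 GiB per rounding unit

size = adjusted_buffer_pool_size(requested, chunk_size, instances)
print(size // GiB)  # 96: already a multiple of 2 GiB, so no adjustment
print(adjusted_buffer_pool_size(9 * GiB, chunk_size, instances) // GiB)  # 10
```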

package/courses/mysql-query-optimization/scenarios/level-2/composite-index-design.yaml ADDED

@@ -0,0 +1,71 @@
+meta:
+  id: composite-index-design
+  level: 2
+  course: mysql-query-optimization
+  type: output
+  description: "Design composite indexes — understand column ordering, leftmost prefix rule, and how to cover multiple query patterns with fewer indexes"
+  tags: [MySQL, composite-index, column-ordering, leftmost-prefix, intermediate]
+state: {}
+trigger: |
+  Your e-commerce application has 8 different query patterns on the
+  orders table (50M rows), and the current 12 single-column indexes
+  are consuming 15GB of storage while not covering the most common
+  query patterns effectively.
+  The 8 query patterns (ordered by frequency):
+  1. (50K/day) Order lookup by customer:
+     WHERE customer_id = ? ORDER BY created_at DESC LIMIT 20
+  2. (30K/day) Recent orders by status:
+     WHERE status = ? AND created_at >= ? ORDER BY created_at DESC
+  3. (20K/day) Customer orders by status:
+     WHERE customer_id = ? AND status = ?
+  4. (10K/day) Date range with status:
+     WHERE created_at BETWEEN ? AND ? AND status = ?
+  5. (5K/day) Admin search:
+     WHERE status = ? AND total >= ? ORDER BY total DESC
+  6. (2K/day) Customer analytics:
+     SELECT customer_id, COUNT(*), SUM(total)
+     FROM orders WHERE created_at >= ? GROUP BY customer_id
+  7. (1K/day) Fulfillment queue:
+     WHERE status = 'pending' AND warehouse_id = ?
+     ORDER BY priority DESC, created_at ASC
+  8. (500/day) Revenue report:
+     SELECT DATE(created_at), SUM(total)
+     FROM orders WHERE created_at BETWEEN ? AND ?
+     GROUP BY DATE(created_at)
+  Current single-column indexes:
+  idx_customer_id, idx_status, idx_created_at, idx_total,
+  idx_warehouse_id, idx_priority (plus 6 more on other columns)
+  Goal: Replace the 12 indexes with 4-5 well-designed composite indexes
+  that cover all 8 query patterns.
+  Task: Design the composite index strategy. Write: the indexes you
+  would create (with column order), which query patterns each index
+  serves, the leftmost prefix rule (why column order matters), how to
+  handle conflicting sort orders, and the storage savings estimate.
+assertions:
+  - type: llm_judge
+    criteria: "Composite indexes are well-designed — proposes 4-5 indexes that cover all 8 query patterns using the leftmost prefix rule. For example: (customer_id, status, created_at), (status, created_at), (status, warehouse_id, priority, created_at). Column ordering follows equality-first, then range/sort"
+    weight: 0.35
+    description: "Well-designed composite indexes"
+  - type: llm_judge
+    criteria: "Leftmost prefix rule is explained — MySQL can use a composite index for queries that filter on the leftmost columns of the index. An index on (A, B, C) can serve queries on (A), (A, B), and (A, B, C) but NOT (B, C) or (C). This is why column order matters and why one composite index can serve multiple query patterns"
+    weight: 0.35
+    description: "Leftmost prefix rule"
+  - type: llm_judge
+    criteria: "Trade-offs are discussed — explains how equality columns should come before range columns in the index, why conflicting sort orders may need separate indexes, the write overhead of composite indexes (each insert/update must update all indexes), and the estimated storage reduction from 12 single-column to 4-5 composite indexes"
+    weight: 0.30
+    description: "Trade-offs discussed"
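
The leftmost prefix rule and the equality-first ordering can be expressed as a small, deliberately simplified Python model. The functions `usable_prefix` and `order_from_index` are hypothetical helpers, not MySQL APIs. They ignore sort direction, index merge, and MySQL 8.0's skip scan range access, but they show why one composite index serves several patterns and why column order decides whether ORDER BY needs a filesort.

```python
def usable_prefix(index, equality_cols, range_col=None):
    """Leading index columns a lookup can use (simplified leftmost-prefix rule).

    Equality-matched columns extend the prefix; a range column can be used
    but ends it; any other column stops the prefix immediately.
    """
    used = []
    for col in index:
        if col in equality_cols:
            used.append(col)
        elif col == range_col:
            used.append(col)
            break
        else:
            break
    return used


def order_from_index(index, equality_cols, order_by):
    """True if, after the equality-matched leading columns, the index is already
    in ORDER BY order, so no filesort is needed (sort directions ignored)."""
    rest = list(index)
    while rest and rest[0] in equality_cols:
        rest.pop(0)
    return rest[:len(order_by)] == list(order_by)


# Leftmost prefix on a generic (a, b, c) index.
print(usable_prefix(("a", "b", "c"), {"a", "b"}))  # ['a', 'b']
print(usable_prefix(("a", "b", "c"), {"b", "c"}))  # []: leading column not filtered

# Pattern 2: status = ? AND created_at >= ? ORDER BY created_at DESC
print(usable_prefix(("status", "created_at"), {"status"}, "created_at"))  # ['status', 'created_at']
print(order_from_index(("status", "created_at"), {"status"}, ["created_at"]))  # True

# Pattern 1: customer_id = ? ORDER BY created_at DESC
print(order_from_index(("customer_id", "created_at"), {"customer_id"}, ["created_at"]))            # True
print(order_from_index(("customer_id", "status", "created_at"), {"customer_id"}, ["created_at"]))  # False
```

The last line illustrates a design choice: in this model, (customer_id, status, created_at) fully serves pattern 3, but for pattern 1 its rows come out ordered by status first, so pattern 1 is better served by (customer_id, created_at).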

package/courses/mysql-query-optimization/scenarios/level-2/covering-and-invisible-indexes.yaml ADDED

@@ -0,0 +1,69 @@
+meta:
+  id: covering-and-invisible-indexes
+  level: 2
+  course: mysql-query-optimization
+  type: output
+  description: "Master covering indexes and invisible indexes — use INCLUDE-like patterns and safely test index removal in MySQL 8.0+"
+  tags: [MySQL, covering-index, invisible-index, index-only-scan, intermediate]
+state: {}
+trigger: |
+  Your application's most frequent query runs 100K times per day and
+  takes 15ms on average. EXPLAIN shows it uses an index but still
+  reads the full row from the clustered index (type=ref, no "Using
+  index" in Extra). You want to make it an index-only scan.
+  The query:
+  SELECT customer_id, email, status
+  FROM users
+  WHERE status = 'active' AND country = 'US';
+  Current index: idx_status_country (status, country)
+  EXPLAIN:
+  type=ref, key=idx_status_country, rows=50000, Extra=NULL
+  The index matches the WHERE clause, so MySQL uses it to find the
+  50K matching row pointers. But then it must do 50K lookups into
+  the clustered index to retrieve customer_id and email — those
+  columns aren't in the secondary index.
+  Target: Make this a covering index (index-only scan) so MySQL
+  reads everything from the index without touching the table data.
+  Meanwhile, you also have 15 indexes on the users table and suspect
+  5 of them are unused. But dropping an index in production is scary
+  — what if an important query uses it at month-end?
+  MySQL 8.0 introduced invisible indexes:
+  ALTER TABLE users ALTER INDEX idx_old_column INVISIBLE;
+  -- The optimizer ignores this index, but it's still maintained
+  -- If something breaks, make it visible again instantly
+  Questions to address:
+  1. How to create a covering index for the query above?
+  2. Why doesn't MySQL have INCLUDE columns like PostgreSQL?
+  3. How do invisible indexes help safely remove unused indexes?
+  4. What's the trade-off of wider indexes for covering vs write perf?
+  5. How do you identify which indexes are unused?
+  Task: Design the covering index and the index cleanup strategy.
+  Write: the covering index for the query, how index-only scans work
+  in InnoDB, the invisible index workflow for safe index removal, how
+  to identify unused indexes (Performance Schema), and the trade-offs
+  of wider covering indexes.
+assertions:
+  - type: llm_judge
+    criteria: "Covering index is correct — creates idx_status_country_custid_email(status, country, customer_id, email) or similar. Explains that 'Using index' in EXPLAIN Extra means all needed columns are in the index (index-only scan). MySQL doesn't have INCLUDE like PostgreSQL, so all columns must be part of the index key"
+    weight: 0.35
+    description: "Correct covering index"
+  - type: llm_judge
+    criteria: "Invisible index workflow is explained — make suspect index invisible, monitor for query regressions over a business cycle (1 week or 1 month), if no impact then drop it, if queries slow down make it visible again instantly. Uses performance_schema.table_io_waits_summary_by_index_usage or sys.schema_unused_indexes to find candidates"
+    weight: 0.35
+    description: "Invisible index workflow"
+  - type: llm_judge
+    criteria: "Trade-offs are analyzed — wider covering indexes use more storage and slow writes (more data to update per insert/update), but eliminate clustered index lookups for reads. Quantifies the trade-off: 50K lookups avoided per query execution vs additional index size and write overhead. Decision depends on read/write ratio"
+    weight: 0.30
+    description: "Trade-offs analyzed"
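
The first step of the invisible-index workflow, turning usage counters into candidate statements, can be scripted. This Python sketch is a hypothetical helper: it assumes you have already fetched (OBJECT_SCHEMA, OBJECT_NAME, INDEX_NAME, COUNT_STAR) rows from performance_schema.table_io_waits_summary_by_index_usage, and it applies roughly the filter used by the sys.schema_unused_indexes view (no recorded I/O, not PRIMARY). The schema, table, and counter values in the example are made up. Because these counters reset when the server restarts, a zero only means the index was unused since then.

```python
def index_visibility_statements(usage_rows, visibility="INVISIBLE"):
    """Build ALTER TABLE ... ALTER INDEX ... INVISIBLE/VISIBLE statements.

    usage_rows holds (OBJECT_SCHEMA, OBJECT_NAME, INDEX_NAME, COUNT_STAR) tuples.
    Rows with INDEX_NAME = NULL (table access without an index) and PRIMARY
    (a primary key cannot be made invisible) are skipped, as is any index
    with recorded I/O.
    """
    if visibility not in ("INVISIBLE", "VISIBLE"):
        raise ValueError("visibility must be INVISIBLE or VISIBLE")
    statements = []
    for schema, table, index, count_star in usage_rows:
        if index is None or index == "PRIMARY" or count_star > 0:
            continue
        statements.append(
            f"ALTER TABLE `{schema}`.`{table}` ALTER INDEX `{index}` {visibility};")
    return statements


# Hypothetical counters for the users table.
usage = [
    ("shop", "users", "PRIMARY", 1_250_000),
    ("shop", "users", "idx_status_country", 98_000),
    ("shop", "users", "idx_old_column", 0),
    ("shop", "users", None, 4_200),
]
for stmt in index_visibility_statements(usage):
    print(stmt)
# ALTER TABLE `shop`.`users` ALTER INDEX `idx_old_column` INVISIBLE;
```

Calling the same helper with `"VISIBLE"` produces the matching rollback statements, which keeps the "make it visible again instantly" step ready before the observation period starts.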