npm - dojo.md - Versions diffs - 0.2.0 → 0.2.1 - Mend

dojo.md 0.2.0 → 0.2.1

Files changed (222)

package/courses/mysql-query-optimization/scenarios/level-2/cte-and-window-functions.yaml ADDED

@@ -0,0 +1,78 @@
+meta:
+  id: cte-and-window-functions
+  level: 2
+  course: mysql-query-optimization
+  type: output
+  description: "Optimize CTEs and window functions — use MySQL 8.0 features effectively while understanding their performance implications"
+  tags: [MySQL, CTE, window-functions, MySQL-8, optimization, intermediate]
+state: {}
+trigger: |
+  Your team is migrating queries from subqueries to CTEs and window
+  functions (MySQL 8.0 upgrade). Some rewritten queries are faster,
+  but others are unexpectedly slower. You need to understand why.
+  Query 1 — CTE that got slower:
+  -- Old (subquery, 200ms):
+  SELECT * FROM (SELECT customer_id, SUM(total) AS total_spent
+    FROM orders GROUP BY customer_id) AS sub
+  WHERE total_spent > 10000;
+  -- New (CTE, 500ms):
+  WITH high_spenders AS (
+    SELECT customer_id, SUM(total) AS total_spent
+    FROM orders GROUP BY customer_id
+  )
+  SELECT * FROM high_spenders WHERE total_spent > 10000;
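+  Why is the CTE slower? The team suspects MySQL 8.0 materialized the
+  CTE into a temporary table while the derived table (subquery) was
+  merged into the outer query. Verify with EXPLAIN: a derived table or
+  CTE that contains GROUP BY cannot be merged, so both forms here are
+  normally materialized.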
+  Query 2 — Window function vs correlated subquery (self-join pattern):
+  -- Old (correlated subquery on the same table, 5 seconds):
+  SELECT o1.*, (SELECT COUNT(*) FROM orders o2
+    WHERE o2.customer_id = o1.customer_id
+    AND o2.created_at <= o1.created_at) AS order_number
+  FROM orders o1 WHERE o1.customer_id = 42;
+  -- New (window function, 50ms):
+  SELECT *, ROW_NUMBER() OVER (
+    PARTITION BY customer_id ORDER BY created_at
+  ) AS order_number
+  FROM orders WHERE customer_id = 42;
+  Query 3 — Running total with window function:
+  SELECT id, amount,
+    SUM(amount) OVER (ORDER BY created_at
+      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,
+    AVG(amount) OVER (ORDER BY created_at
+      ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS moving_avg_7
+  FROM transactions WHERE account_id = 1;
+  Questions:
+  1. When do CTEs get materialized vs merged?
+  2. How do window functions handle the OVER clause efficiently?
+  3. Can window functions use indexes?
+  4. What's the performance impact of multiple OVER clauses?
+  Task: Explain CTE and window function optimization in MySQL 8.0.
+  Write: when CTEs are materialized vs merged (and how to control it),
+  how window functions are executed (sort-based processing), when window
+  functions outperform self-joins, how to optimize queries with multiple
+  window functions, and common pitfalls.
+assertions:
+  - type: llm_judge
+    criteria: "CTE materialization is explained — MySQL 8.0 may materialize CTEs into temporary tables (adding overhead), while derived tables can be merged into the outer query. A CTE is materialized if: referenced more than once, is recursive, contains constructs that prevent merging (aggregates/GROUP BY, DISTINCT, HAVING, LIMIT, UNION), or the optimizer chooses materialization. Use EXPLAIN to check. Can influence with optimizer_switch derived_merge or the MERGE/NO_MERGE optimizer hints"
+    weight: 0.35
+    description: "CTE materialization explained"
+  - type: llm_judge
+    criteria: "Window function execution is explained — MySQL sorts data by the PARTITION BY and ORDER BY columns, then processes the window frame for each row. Multiple OVER clauses with different orderings may require multiple sorts. Same OVER clause shared across functions avoids re-sorting. Indexes on (partition_col, order_col) help avoid explicit sorts"
+    weight: 0.35
+    description: "Window function execution"
+  - type: llm_judge
+    criteria: "Practical optimization advice is provided — use named WINDOW clauses to share sort operations, prefer ROW_NUMBER over self-joins for ranking, be aware that window functions process all rows before LIMIT (so add WHERE filters to reduce the window), and compare CTE vs derived table performance with EXPLAIN"
+    weight: 0.30
+    description: "Practical optimization advice"
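The sort-based execution model in this scenario can be illustrated outside MySQL. The Python sketch below is a simplified model, not MySQL's implementation: it sorts once by the PARTITION BY/ORDER BY keys and then walks the rows, which is why several functions sharing one OVER clause (or one named WINDOW) need only one sort. The function names are hypothetical. One behavioral difference worth knowing: ROW_NUMBER() numbers tied created_at values arbitrarily, while the old correlated COUNT(*) with <= gives tied rows the same number.

```python
from collections import defaultdict, deque


def row_number(rows, partition_key, order_key):
    """Emulate ROW_NUMBER() OVER (PARTITION BY partition_key ORDER BY order_key)."""
    # One sort by (partition, order) is the work a sort-based window operator does.
    ordered = sorted(rows, key=lambda r: (r[partition_key], r[order_key]))
    counters = defaultdict(int)
    numbered = []
    for r in ordered:
        counters[r[partition_key]] += 1  # restarts at 1 for each new partition
        numbered.append({**r, "order_number": counters[r[partition_key]]})
    return numbered


def running_total_and_moving_avg(rows, order_key, value_key, window=7):
    """Emulate two window functions that share the same ORDER BY:
    SUM(v) OVER (ORDER BY k ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    AVG(v) OVER (ORDER BY k ROWS BETWEEN window-1 PRECEDING AND CURRENT ROW)
    """
    ordered = sorted(rows, key=lambda r: r[order_key])  # one sort serves both frames
    running = 0
    frame = deque()  # the last `window` values: the sliding ROWS frame
    frame_sum = 0
    out = []
    for r in ordered:
        value = r[value_key]
        running += value
        frame.append(value)
        frame_sum += value
        if len(frame) > window:
            frame_sum -= frame.popleft()  # drop the row that just left the frame
        out.append((r["id"], running, frame_sum / len(frame)))
    return out
```

Both helpers do a single sort and a single pass; a second OVER clause with a different ORDER BY would need its own sort, which is the cost the scenario asks about.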

package/courses/mysql-query-optimization/scenarios/level-2/intermediate-optimization-shift.yaml ADDED

@@ -0,0 +1,68 @@
+meta:
+  id: intermediate-optimization-shift
+  level: 2
+  course: mysql-query-optimization
+  type: output
+  description: "Intermediate optimization shift — tune a MySQL server for a growing SaaS application combining indexing, buffer pool, and query optimization"
+  tags: [MySQL, shift-simulation, SaaS, tuning, intermediate]
+state: {}
+trigger: |
+  You're the sole DBA for a SaaS application that's grown from 100
+  to 10,000 customers in 6 months. MySQL was set up once and never
+  tuned. Performance has degraded steadily, and the CEO says "the
+  app feels slow — fix it by Friday."
+  Server: MySQL 8.0 on RDS (db.r6g.2xlarge, 8 vCPU, 64GB RAM)
+  Performance Schema top issues:
+  1. 45 queries with full table scans (type=ALL in EXPLAIN)
+  2. Buffer pool hit ratio: 82% (innodb_buffer_pool_size=8GB)
+  3. 38 unused indexes consuming 5GB
+  4. Top query by total time: multi-tenant dashboard query
+     SELECT * FROM activities
+     WHERE tenant_id = ? AND created_at >= ?
+     ORDER BY created_at DESC LIMIT 50;
+     — No composite index on (tenant_id, created_at)
+     — Executes 200K times/day, avg 500ms each
+  5. Second highest: user search
+     SELECT * FROM users WHERE LOWER(email) LIKE '%@gmail.com';
+     — Full table scan (function on column + leading wildcard)
+     — Executes 50K times/day, avg 200ms each
+  6. Lock contention on the subscriptions table:
+     UPDATE subscriptions SET status = 'expired'
+     WHERE end_date < NOW() AND status = 'active';
+     — Runs every minute via cron, locks 5,000 rows for 3 seconds
+     — Blocks checkout flow (which also updates subscriptions)
+  7. Slow nightly report:
+     SELECT tenant_id, COUNT(*), SUM(amount)
+     FROM invoices GROUP BY tenant_id;
+     — 50M rows, takes 8 minutes, pollutes buffer pool
+  8. Connection issues:
+     max_connections = 151 (default)
+     Current connections: 145 (at capacity during peak)
+     Many connections in Sleep state (idle 300+ seconds)
+  Task: Write the optimization plan. Include: the immediate fixes
+  (what to do today), the index changes (add and remove), the
+  configuration tuning (buffer pool, connections), the query rewrites,
+  and the priority order with expected impact.
+assertions:
+  - type: llm_judge
+    criteria: "Immediate fixes are high-impact — adds composite index on activities(tenant_id, created_at) as top priority (200K queries × 500ms → <10ms = saves 27 hours/day of query time), increases buffer pool to 48GB (75% of 64GB), and increases max_connections or adds connection timeout to handle the Sleep connections"
+    weight: 0.35
+    description: "High-impact immediate fixes"
+  - type: llm_judge
+    criteria: "All 8 issues are addressed — full table scans (add indexes), buffer pool (resize), unused indexes (make invisible then drop), dashboard query (composite index), user search (drop the redundant LOWER under a case-insensitive collation and replace the leading-wildcard match, e.g. an indexed generated column holding the email domain, or consider full-text search), lock contention (index (status, end_date) so only matching rows are locked, and batch the UPDATE into smaller chunks with WHERE + LIMIT), nightly report (run on read replica), connection management (reduce wait_timeout, consider ProxySQL)"
+    weight: 0.35
+    description: "All issues addressed"
+  - type: llm_judge
+    criteria: "Priority order maximizes impact — ranks fixes by total time saved or risk reduction, estimates improvement for each fix, and groups into 'do today' (index + buffer pool), 'this week' (query rewrites, connection tuning), and 'next sprint' (lock contention redesign, read replica for reports)"
+    weight: 0.30
+    description: "Prioritized action plan"
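The "rank by total time saved" idea in the assertions is plain arithmetic. This sketch uses the scenario's execution counts and latencies; the 10 ms post-index latency for the dashboard query is an assumed target, and 75% of RAM is a common starting point for innodb_buffer_pool_size on a dedicated server rather than a fixed rule. Function names are illustrative.

```python
# Execution counts per day and average latency (ms), taken from the scenario.
WORKLOAD = {
    "dashboard: activities(tenant_id, created_at)": (200_000, 500),
    "user search: users email suffix": (50_000, 200),
}


def daily_hours(executions_per_day: int, avg_ms: float) -> float:
    """Total query time per day, in hours."""
    return executions_per_day * avg_ms / 1000 / 3600


def hours_saved(executions_per_day: int, before_ms: float, after_ms: float) -> float:
    """Daily hours recovered if average latency drops from before_ms to after_ms."""
    return daily_hours(executions_per_day, before_ms - after_ms)


def buffer_pool_gb(ram_gb: float, fraction: float = 0.75) -> float:
    """Starting point for a dedicated database server: a fraction of RAM."""
    return ram_gb * fraction


# Highest total daily cost first: this is the order in which to attack queries.
ranking = sorted(WORKLOAD, key=lambda name: daily_hours(*WORKLOAD[name]), reverse=True)
for name in ranking:
    print(f"{name}: {daily_hours(*WORKLOAD[name]):.1f} h/day")

# Assumed post-index latency of 10 ms for the dashboard query.
dashboard_savings = hours_saved(200_000, 500, 10)
```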

package/courses/mysql-query-optimization/scenarios/level-2/join-optimization.yaml ADDED

@@ -0,0 +1,67 @@
+meta:
+  id: join-optimization
+  level: 2
+  course: mysql-query-optimization
+  type: output
+  description: "Optimize MySQL JOIN execution — understand join buffer, block nested loop, hash join, and how to choose the right join strategy"
+  tags: [MySQL, JOIN, hash-join, block-nested-loop, join-buffer, intermediate]
+state: {}
+trigger: |
+  Your analytics report query joins 5 tables and takes 45 seconds.
+  The EXPLAIN output reveals a mix of efficient and inefficient join
+  methods:
+  SELECT o.id, c.name, p.name, cat.name, s.tracking_number
+  FROM orders o
+  JOIN customers c ON c.id = o.customer_id
+  JOIN order_items oi ON oi.order_id = o.id
+  JOIN products p ON p.id = oi.product_id
+  JOIN categories cat ON cat.code = p.category_code
+  JOIN shipments s ON s.order_id = o.id AND s.status = 'shipped'
+  WHERE o.created_at >= '2025-01-01'
+    AND o.created_at < '2025-02-01';
+  EXPLAIN:
+  | table | type    | key               | rows   | Extra                       |
+  |-------|---------|-------------------|--------|-----------------------------|
+  | o     | range   | idx_created_at    | 500000 | Using index condition       |
+  | c     | eq_ref  | PRIMARY           | 1      |                             |
+  | oi    | ref     | idx_order_id      | 3      |                             |
+  | p     | eq_ref  | PRIMARY           | 1      |                             |
+  | cat   | ALL     | NULL              | 500    | Using where; Using join buf |
+  | s     | ref     | idx_order_id      | 1      | Using where                 |
+  Problems identified:
+  1. categories table uses ALL (full table scan per join iteration) even
+     though it's only 500 rows — no index on categories.code
+  2. The join buffer (block nested-loop) is being used for categories
+  3. With 500K orders × 3 items × 500 categories = ~750M row comparisons
+  4. join_buffer_size is default 256KB
+  MySQL 8.0 changes:
+  - MySQL 8.0.18+ uses hash join instead of block nested-loop for
+    equi-joins without indexes
+  - MySQL 8.0.20+ removed block nested-loop entirely
+  - Hash join performance depends on join_buffer_size
+  Task: Optimize this join query. Write: why the categories join is
+  slow (missing index + join strategy), the difference between nested-
+  loop, block nested-loop, and hash join, how join_buffer_size affects
+  performance, the optimal indexes for this query, and how to read the
+  join-related EXPLAIN output.
+assertions:
+  - type: llm_judge
+    criteria: "Categories join fix is correct — ADD UNIQUE INDEX idx_code(code) on categories (assuming codes are unique and NOT NULL) eliminates the full table scan, changing type from ALL to eq_ref (a non-unique index gives ref). Explains that even though categories is small (500 rows), without an index it must be scanned against each of the ~1.5M rows coming from the products join (join buffering only reduces the number of passes)"
+    weight: 0.35
+    description: "Categories join fix"
+  - type: llm_judge
+    criteria: "Join algorithms are explained — nested-loop (for each outer row, scan inner table using index), block nested-loop (buffer outer rows, scan inner table once per buffer-full, pre-8.0.20), hash join (build hash table on smaller table, probe with larger table, 8.0.18+). Hash join is used when no index available on join column. join_buffer_size controls how much memory for hash/block-NL"
+    weight: 0.35
+    description: "Join algorithms explained"
+  - type: llm_judge
+    criteria: "EXPLAIN join-related output is decoded — 'Using join buffer (Block Nested Loop)' or 'Using join buffer (hash join)' in Extra, how the rows column estimates are multiplied across joins (500K × 3 × 1 × 1 × 500 × 1 for this query), and how eq_ref vs ref vs ALL determines the join strategy per table"
+    weight: 0.30
+    description: "EXPLAIN join output decoded"
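To see why the categories join hurts and what a hash join changes, here is a toy Python model. It counts row comparisons for a nested loop that rescans the inner table per outer row versus a hash join that builds a table on the 500-row input once and probes it once per product. It ignores join buffering, spilling to disk when the build side exceeds join_buffer_size, and index lookups, so treat the numbers as illustrative only.

```python
from collections import defaultdict


def nested_loop_join(outer, inner, outer_key, inner_key):
    """Full scan of `inner` for every outer row (no index, no join buffer)."""
    comparisons = 0
    matches = []
    for o in outer:
        for i in inner:
            comparisons += 1
            if o[outer_key] == i[inner_key]:
                matches.append((o, i))
    return matches, comparisons


def hash_join(build, probe, build_key, probe_key):
    """Build a hash table on the smaller input once, then probe it per row."""
    table = defaultdict(list)
    for b in build:  # build phase: one pass over the small table
        table[b[build_key]].append(b)
    matches = []
    probes = 0
    for p in probe:  # probe phase: one hash lookup per row of the large input
        probes += 1
        for b in table.get(p[probe_key], ()):
            matches.append((p, b))
    return matches, probes


categories = [{"code": f"C{n}", "name": f"Category {n}"} for n in range(500)]
products = [{"id": n, "category_code": f"C{n % 500}"} for n in range(1_000)]

nl_rows, nl_work = nested_loop_join(products, categories, "category_code", "code")
hj_rows, hj_work = hash_join(categories, products, "code", "category_code")

# EXPLAIN-style estimate for the whole query: multiply the rows column down the plan.
estimated_combinations = 500_000 * 3 * 1 * 1 * 500 * 1
```

Both joins return the same pairs; the nested loop does 1,000 × 500 comparisons while the hash join does 1,000 lookups. An index on categories.code replaces both with a direct lookup per product.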

package/courses/mysql-query-optimization/scenarios/level-2/performance-schema-analysis.yaml ADDED

@@ -0,0 +1,69 @@
+meta:
+  id: performance-schema-analysis
+  level: 2
+  course: mysql-query-optimization
+  type: output
+  description: "Analyze MySQL with Performance Schema — use events_statements, sys schema views, and digest tables to identify optimization targets"
+  tags: [MySQL, Performance-Schema, sys-schema, monitoring, analysis, intermediate]
+state: {}
+trigger: |
+  Your team needs to identify the top optimization opportunities across
+  your MySQL database. The slow query log captures individual slow
+  queries, but you want a complete picture of all query performance.
+  Performance Schema is enabled but nobody on the team knows how to
+  use it effectively.
+  Available data sources:
+  1. performance_schema.events_statements_summary_by_digest
+     — Aggregated query statistics (the most valuable table)
+  2. performance_schema.events_waits_summary_by_instance
+     — Where time is spent waiting (I/O, locks, mutexes)
+  3. sys.schema_unused_indexes
+     — Indexes that haven't been used since last server restart
+  4. sys.statements_with_full_table_scans
+     — Queries performing full table scans
+  5. sys.innodb_lock_waits
+     — Current lock wait chains
+  Your initial analysis reveals:
+  Top 5 queries by total time (events_statements_summary_by_digest):
+  | digest_text (truncated)        | count  | avg_ms | total_min | rows_exam | rows_sent |
+  |-------------------------------|--------|--------|-----------|-----------|-----------|
+  | SELECT * FROM orders WHERE... | 500K   | 50     | 416       | 2.5B      | 500K      |
+  | SELECT p.*, COUNT(r...)...    | 50K    | 200    | 166       | 5B        | 50K       |
+  | INSERT INTO audit_log...      | 2M     | 5      | 166       | 0         | 0         |
+  | SELECT * FROM users WHERE...  | 1M     | 10     | 166       | 100M      | 1M        |
+  | UPDATE inventory SET...       | 100K   | 100    | 166       | 100M      | 0         |
+  Unused indexes: 45 indexes consuming 12GB of storage
+  Full table scan queries: 23 queries, top one examining 2B rows/day
+  Questions to answer:
+  1. How to interpret the rows_examined vs rows_sent ratio?
+  2. Which query should we optimize first?
+  3. How to use sys schema for quick wins?
+  4. How to track query performance after optimization?
+  Task: Write the Performance Schema analysis guide. Include: how to
+  query the key Performance Schema tables, how to interpret the results
+  (which metrics matter most), the prioritization framework (which
+  queries to optimize first), the sys schema quick wins, and how to
+  set up ongoing monitoring.
+assertions:
+  - type: llm_judge
+    criteria: "Key queries for Performance Schema are provided — shows SQL for top queries by total time, by rows examined, by frequency, and the rows_examined/rows_sent ratio as an efficiency metric. Explains that a ratio of 5000:1 (query #1) means 5000 rows are read for every 1 returned — a clear index problem"
+    weight: 0.35
+    description: "Performance Schema queries"
+  - type: llm_judge
+    criteria: "Prioritization framework is practical — optimizes by total time impact (not just per-query time), considers query #1 first (500K executions × 50ms = 416 minutes/day, high volume), explains that the INSERT query (#3) has high total time but low per-query time (may not be optimizable), and the UPDATE (#5) has high per-query time suggesting a missing index or lock contention"
+    weight: 0.35
+    description: "Practical prioritization"
+  - type: llm_judge
+    criteria: "Sys schema quick wins are identified — sys.schema_unused_indexes (drop 45 unused indexes to save 12GB and speed writes), sys.statements_with_full_table_scans (add indexes for top full-scan queries), sys.innodb_lock_waits (identify blocking transactions), and explains how to reset Performance Schema counters (TRUNCATE TABLE) to measure improvement after changes"
+    weight: 0.30
+    description: "Sys schema quick wins"
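A small Python sketch of the prioritization math, using the five digest rows from the scenario. The SQL string shows a comparable query against events_statements_summary_by_digest; its timer columns are in picoseconds, hence the division by 1e12. For writes, rows_sent is 0, so examined-per-sent is undefined and SUM_ROWS_AFFECTED is the more useful denominator. The helper names are illustrative.

```python
# Digest rows from the scenario: (digest_text, count, avg_ms, rows_examined, rows_sent)
DIGESTS = [
    ("SELECT * FROM orders WHERE...", 500_000, 50, 2_500_000_000, 500_000),
    ("SELECT p.*, COUNT(r...)...", 50_000, 200, 5_000_000_000, 50_000),
    ("INSERT INTO audit_log...", 2_000_000, 5, 0, 0),
    ("SELECT * FROM users WHERE...", 1_000_000, 10, 100_000_000, 1_000_000),
    ("UPDATE inventory SET...", 100_000, 100, 100_000_000, 0),
]

# Comparable query against the live digest table (timer columns are in picoseconds).
TOP_BY_TOTAL_TIME_SQL = """
SELECT DIGEST_TEXT,
       COUNT_STAR,
       SUM_TIMER_WAIT / 1e12 / 60 AS total_min,
       SUM_ROWS_EXAMINED / NULLIF(SUM_ROWS_SENT, 0) AS examined_per_sent
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 5
"""


def total_minutes(count: int, avg_ms: float) -> float:
    """Total time in minutes: executions x average latency."""
    return count * avg_ms / 1000 / 60


def examined_per_sent(rows_examined: int, rows_sent: int):
    """Rows read per row returned; None when nothing is returned (INSERT/UPDATE)."""
    return None if rows_sent == 0 else rows_examined / rows_sent


# Rank by total time, the impact-based ordering the assertions ask for.
report = sorted(
    ((text, total_minutes(count, ms), examined_per_sent(ex, sent))
     for text, count, ms, ex, sent in DIGESTS),
    key=lambda row: row[1],
    reverse=True,
)
```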

package/courses/mysql-query-optimization/scenarios/level-2/query-optimizer-hints.yaml ADDED

@@ -0,0 +1,74 @@
+meta:
+  id: query-optimizer-hints
+  level: 2
+  course: mysql-query-optimization
+  type: output
+  description: "Use MySQL optimizer hints — influence query plans with index hints, optimizer_switch, and MySQL 8.0 hint syntax"
+  tags: [MySQL, optimizer-hints, USE-INDEX, FORCE-INDEX, optimizer-switch, intermediate]
+state: {}
+trigger: |
+  Your application has a query that usually runs in 5ms but
+  occasionally switches to a slow plan (15 seconds). The DBA says
+  the optimizer's statistics are sometimes stale, causing it to choose
+  a full table scan instead of an index scan. You need to understand
+  how to influence the optimizer's choices.
+  The query:
+  SELECT * FROM orders
+  WHERE customer_id = 42 AND status = 'pending'
+  ORDER BY created_at DESC LIMIT 10;
+  Fast plan (normal): Uses idx_customer_status (customer_id, status)
+  Slow plan (occasional): Full table scan when optimizer estimates
+  the index will return too many rows (stale statistics)
+  MySQL hint mechanisms:
+  1. Traditional index hints (SQL-level):
+     SELECT * FROM orders USE INDEX (idx_customer_status) WHERE ...
+     SELECT * FROM orders FORCE INDEX (idx_customer_status) WHERE ...
+     SELECT * FROM orders IGNORE INDEX (idx_status) WHERE ...
+  2. MySQL 8.0+ optimizer hints (comment-based):
+     SELECT /*+ INDEX(orders idx_customer_status) */ * FROM orders ...
+     SELECT /*+ NO_INDEX(orders idx_status) */ * FROM orders ...
+     SELECT /*+ JOIN_ORDER(orders, customers) */ ...
+     SELECT /*+ SET_VAR(join_buffer_size=16M) */ ...
+     SELECT /*+ MAX_EXECUTION_TIME(5000) */ ...  -- timeout in ms (SELECT only)
+  3. optimizer_switch (session/global):
+     SET optimizer_switch = 'index_merge=off';
+     SET optimizer_switch = 'block_nested_loop=off';
+     SET optimizer_switch = 'derived_merge=off';
+  4. ANALYZE TABLE (update statistics):
+     ANALYZE TABLE orders;  -- Recalculate index statistics
+  Common scenarios where hints are needed:
+  A. Plan instability (same query, different plans)
+  B. Optimizer choosing wrong join order
+  C. Index merge being slower than single index
+  D. Derived table merging causing performance regression
+  E. Setting per-query memory limits
+  Task: Explain when and how to use optimizer hints. Write: the
+  difference between USE INDEX, FORCE INDEX, and IGNORE INDEX, the
+  MySQL 8.0 hint syntax and available hints, when to use ANALYZE TABLE
+  vs hints (root cause vs workaround), common optimizer_switch settings,
+  and the risks of using hints in production.
+assertions:
+  - type: llm_judge
+    criteria: "Index hints are correctly differentiated — USE INDEX suggests the optimizer consider only these indexes (but it may still choose a table scan), FORCE INDEX tells the optimizer to use the index unless impossible (treats table scan as very expensive), IGNORE INDEX excludes specific indexes. FORCE INDEX is stronger than USE INDEX"
+    weight: 0.35
+    description: "Index hints differentiated"
+  - type: llm_judge
+    criteria: "MySQL 8.0 hints are explained — /*+ ... */ syntax is preferred over traditional hints because they don't change SQL semantics, support more features (JOIN_ORDER, SET_VAR, MAX_EXECUTION_TIME), and can be added without modifying query structure. Shows examples of common hints and their use cases"
+    weight: 0.35
+    description: "MySQL 8.0 hints explained"
+  - type: llm_judge
+    criteria: "Risks and best practices are discussed — hints are workarounds, not fixes (ANALYZE TABLE and index design are better long-term solutions), hints can become counter-productive when data distribution changes, they should be documented and reviewed periodically, and per-query SET_VAR/MAX_EXECUTION_TIME are safer than global optimizer_switch changes"
+    weight: 0.30
+    description: "Risks and best practices"
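MySQL 8.0 optimizer hints must directly follow the leading SELECT, INSERT, REPLACE, UPDATE, or DELETE keyword, and adding them per query at the call site avoids global optimizer_switch changes. The hypothetical helper below inserts a hint comment in that position. It is a sketch: it does not merge with an existing hint comment and rejects statements that start with anything else (for example WITH). Note that MAX_EXECUTION_TIME is honored only for SELECT statements.

```python
import re

# Optimizer hints must immediately follow the statement's leading keyword.
_LEADING_KEYWORD = re.compile(r"^\s*(SELECT|INSERT|REPLACE|UPDATE|DELETE)\b", re.IGNORECASE)


def add_optimizer_hint(sql: str, hint: str) -> str:
    """Return sql with /*+ hint */ placed right after its leading keyword."""
    match = _LEADING_KEYWORD.match(sql)
    if match is None:
        raise ValueError("statement must start with SELECT, INSERT, REPLACE, UPDATE or DELETE")
    end = match.end()
    return f"{sql[:end]} /*+ {hint} */{sql[end:]}"


# Several hints can share one comment, separated by spaces.
query = ("SELECT * FROM orders WHERE customer_id = 42 AND status = 'pending' "
         "ORDER BY created_at DESC LIMIT 10")
hinted = add_optimizer_hint(query, "INDEX(orders idx_customer_status) MAX_EXECUTION_TIME(5000)")
```

Keeping hints in one place like this also makes them easy to find when they need the periodic review the assertions recommend.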

package/courses/mysql-query-optimization/scenarios/level-2/subquery-optimization.yaml ADDED

@@ -0,0 +1,70 @@
+meta:
+  id: subquery-optimization
+  level: 2
+  course: mysql-query-optimization
+  type: output
+  description: "Optimize MySQL subqueries — understand semi-join, materialization, and how the optimizer transforms subqueries internally"
+  tags: [MySQL, subqueries, semi-join, materialization, optimizer, intermediate]
+state: {}
+trigger: |
+  Your application has several queries using subqueries, and some are
+  fast while others are unexpectedly slow. You need to understand how
+  MySQL's optimizer handles subqueries differently from JOINs.
+  Query 1 — Fast (uses semi-join optimization):
+  SELECT * FROM customers
+  WHERE id IN (SELECT customer_id FROM orders WHERE total > 1000);
+  -- EXPLAIN: Shows semi-join optimization (FirstMatch or Materialization)
+  -- Time: 50ms
+  Query 2 — Slow (correlated subquery, no optimization):
+  SELECT * FROM customers c
+  WHERE (SELECT SUM(total) FROM orders WHERE customer_id = c.id) > 10000;
+  -- EXPLAIN: Shows DEPENDENT SUBQUERY executed 500K times
+  -- Time: 30 seconds
+  Query 3 — Tricky (NOT IN with NULLs):
+  SELECT * FROM customers
+  WHERE id NOT IN (SELECT customer_id FROM orders);
+  -- If orders.customer_id can be NULL, MySQL can't use anti-join
+  -- Falls back to full scan with subquery-per-row
+  -- Time: 45 seconds
+  Query 4 — Derived table (materialized once):
+  SELECT c.*, order_stats.total_spent
+  FROM customers c
+  JOIN (SELECT customer_id, SUM(total) AS total_spent
+        FROM orders GROUP BY customer_id) AS order_stats
+    ON order_stats.customer_id = c.id
+  WHERE order_stats.total_spent > 10000;
+  -- Derived table materialized once, then joined
+  -- Time: 2 seconds (better than Query 2)
+  MySQL 8.0 subquery optimizations:
+  - Semi-join strategies: FirstMatch, LooseScan, Materialization,
+    DuplicateWeedout, TablePullout
+  - Derived table merging: optimizer may merge derived tables into
+    outer query instead of materializing
+  - optimizer_switch settings control which strategies are enabled
+  Task: Explain MySQL's subquery optimization. Write: how semi-join
+  transforms IN subqueries (the 5 strategies), why correlated
+  subqueries are expensive, the NOT IN NULL trap and how to avoid it,
+  derived table materialization vs merging, and when to manually
+  rewrite subqueries as JOINs.
+assertions:
+  - type: llm_judge
+    criteria: "Semi-join strategies are explained — FirstMatch (stop at first match per outer row), LooseScan (scan inner table index, skip duplicates), Materialization (materialize subquery result into temp table), DuplicateWeedout (use temp table to remove duplicates), TablePullout (convert subquery to join when inner table is unique). Explains that these apply to IN/EXISTS subqueries"
+    weight: 0.35
+    description: "Semi-join strategies explained"
+  - type: llm_judge
+    criteria: "Correlated subquery and NOT IN issues are addressed — correlated subqueries with aggregate functions (Query 2) can't be semi-joined and execute once per outer row (500K times). NOT IN with nullable columns prevents anti-join optimization because once the subquery returns a NULL, id NOT IN (...) evaluates to UNKNOWN rather than TRUE for every unmatched id, so the query returns no rows. Fix: use NOT EXISTS, or add WHERE customer_id IS NOT NULL in the subquery"
+    weight: 0.35
+    description: "Correlated and NOT IN issues"
+  - type: llm_judge
+    criteria: "Practical rewrite guidance is provided — shows how to rewrite Query 2 as a JOIN (like Query 4), when the optimizer's automatic transformations are sufficient vs when manual rewriting helps, and how to check what the optimizer did (EXPLAIN FORMAT=JSON shows 'attached_subqueries' vs 'nested_loop' for semi-join)"
+    weight: 0.30
+    description: "Practical rewrite guidance"
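The NOT IN NULL trap follows directly from SQL's three-valued logic. This Python model (not MySQL code; names are illustrative) treats None as NULL/UNKNOWN and keeps a row only when the WHERE predicate is TRUE. It reproduces the three behaviors: NOT IN returns nothing once the subquery yields a NULL, while NOT EXISTS and NOT IN with an IS NOT NULL filter both return the unmatched customers.

```python
from typing import Iterable, Optional

# SQL three-valued logic: True, False, or UNKNOWN (modeled here as None).
Truth = Optional[bool]


def sql_eq(a, b) -> Truth:
    """a = b in SQL: any comparison involving NULL (None) is UNKNOWN."""
    if a is None or b is None:
        return None
    return a == b


def sql_not_in(value, subquery_values: Iterable) -> Truth:
    """value NOT IN (subquery): FALSE on a match, UNKNOWN if a NULL was compared, else TRUE."""
    saw_unknown = False
    for v in subquery_values:
        eq = sql_eq(value, v)
        if eq is True:
            return False
        if eq is None:
            saw_unknown = True
    return None if saw_unknown else True


def sql_not_exists(value, subquery_values: Iterable) -> bool:
    """NOT EXISTS (... WHERE customer_id = value): UNKNOWN comparisons never match."""
    return not any(sql_eq(value, v) is True for v in subquery_values)


customers = [1, 2, 3]
order_customer_ids = [1, None]  # one order has a NULL customer_id

# WHERE keeps a row only when the predicate is TRUE (not FALSE, not UNKNOWN).
not_in_result = [c for c in customers if sql_not_in(c, order_customer_ids) is True]
not_exists_result = [c for c in customers if sql_not_exists(c, order_customer_ids)]
filtered_not_in = [c for c in customers
                   if sql_not_in(c, [v for v in order_customer_ids if v is not None]) is True]
```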

package/courses/mysql-query-optimization/scenarios/level-2/write-optimization.yaml ADDED

@@ -0,0 +1,63 @@
+meta:
+  id: write-optimization
+  level: 2
+  course: mysql-query-optimization
+  type: output
+  description: "Optimize MySQL writes — use bulk INSERT, LOAD DATA INFILE, transaction batching, and InnoDB flush settings for high-throughput writes"
+  tags: [MySQL, InnoDB, write-optimization, bulk-insert, LOAD-DATA, intermediate]
+state: {}
+trigger: |
+  Your data pipeline needs to insert 10 million rows per hour into MySQL.
+  Currently it takes 4 hours — you need to make it fit within 1 hour.
+  Current approach (slow):
+  for row in data:
+      cursor.execute("INSERT INTO events VALUES (%s, %s, %s, ...)", row)
+      connection.commit()
+  Time: 4 hours for 10M rows (700 inserts/second)
+  Settings: All defaults (innodb_flush_log_at_trx_commit=1, autocommit=1)
+  Table: events (20 columns, 3 secondary indexes)
+  - PRIMARY KEY (id) — auto_increment BIGINT
+  - INDEX idx_user (user_id)
+  - INDEX idx_type_date (event_type, created_at)
+  - INDEX idx_session (session_id)
+  Benchmarks you've run:
+  A. Single-row INSERT with autocommit: 700/sec (current)
+  B. Multi-row INSERT (1000 per statement): 15,000/sec
+  C. Multi-row INSERT + batch commit (10K per txn): 50,000/sec
+  D. LOAD DATA INFILE: 200,000/sec
+  E. LOAD DATA INFILE + disabled indexes: 500,000/sec
+  The DBA warns about trade-offs:
+  1. innodb_flush_log_at_trx_commit=2 speeds up 30% but risks 1 second
+     of data loss on OS crash
+  2. Disabling unique_checks and foreign_key_checks speeds up 20% but
+     risks data integrity if data is bad
+  3. Dropping secondary indexes before load and rebuilding after is
+     faster but table is unusable during rebuild
+  Task: Design the write optimization strategy for the data pipeline.
+  Write: why single-row inserts are slow (per-row commit overhead),
+  the multi-row INSERT technique and optimal batch size, LOAD DATA
+  INFILE usage and best practices, the InnoDB flush settings trade-off,
+  and the index management strategy during bulk loads.
+assertions:
+  - type: llm_judge
+    criteria: "Root cause of slow writes is explained — single-row inserts with autocommit=1 flush the redo log to disk on every INSERT (fsync per row). With 10M rows, that's 10M fsyncs. Batching reduces fsyncs: 1000-row batches = 10K fsyncs, transaction batching (10K rows per commit) = 1K fsyncs. LOAD DATA INFILE is faster still because it avoids per-statement parsing and client round trips and commits far less often (InnoDB still writes redo log for it)"
+    weight: 0.35
+    description: "Write bottleneck explained"
+  - type: llm_judge
+    criteria: "Optimization techniques are ranked correctly — LOAD DATA INFILE is fastest (200K/sec), followed by multi-row INSERT with batch commits (50K/sec), then multi-row INSERT with autocommit (15K/sec). Explains optimal batch size (1000-10000 rows per INSERT statement, 10K-100K rows per transaction), and how innodb_flush_log_at_trx_commit=2 provides a balanced speed/safety trade-off"
+    weight: 0.35
+    description: "Techniques ranked correctly"
+  - type: llm_judge
+    criteria: "Index management strategy is practical — for large bulk loads, consider: (1) disable secondary indexes (ALTER TABLE DISABLE KEYS for MyISAM, or drop/recreate for InnoDB), (2) load data, (3) rebuild indexes. For ongoing ingestion, keep indexes but use INSERT IGNORE or ON DUPLICATE KEY UPDATE. Quantifies the trade-off between index maintenance overhead and query availability"
+    weight: 0.30
+    description: "Index management strategy"
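The batching technique can be sketched with any Python DB-API driver. This version builds multi-row INSERT statements and commits once per rows_per_commit rows, so with innodb_flush_log_at_trx_commit=1 the number of redo-log flushes drops from one per row to one per transaction. It assumes autocommit is off on the connection; the function names are hypothetical, and the demo uses SQLite only as a runnable stand-in (it takes ? placeholders, while PyMySQL and mysql-connector-python take %s). In MySQL each statement must also fit within max_allowed_packet, which caps the practical rows per INSERT.

```python
import sqlite3
from typing import Iterator, Sequence


def multirow_insert_sql(table: str, columns: Sequence[str], n_rows: int,
                        placeholder: str = "%s") -> str:
    """Build INSERT INTO t (a, b) VALUES (%s, %s), (%s, %s), ... for n_rows rows."""
    row = "(" + ", ".join([placeholder] * len(columns)) + ")"
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES " + ", ".join([row] * n_rows)


def chunks(rows: Sequence[tuple], size: int) -> Iterator[Sequence[tuple]]:
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def bulk_insert(conn, table, columns, rows, rows_per_insert=1_000,
                rows_per_commit=10_000, placeholder="%s") -> int:
    """Multi-row INSERTs with one COMMIT per rows_per_commit rows; returns the commit count."""
    cur = conn.cursor()
    commits = pending = 0
    for batch in chunks(rows, rows_per_insert):
        sql = multirow_insert_sql(table, columns, len(batch), placeholder)
        cur.execute(sql, [value for row in batch for value in row])  # flattened params
        pending += len(batch)
        if pending >= rows_per_commit:
            conn.commit()
            commits += 1
            pending = 0
    if pending:  # commit the final partial transaction
        conn.commit()
        commits += 1
    return commits


def commits_needed(total_rows: int, rows_per_commit: int) -> int:
    return -(-total_rows // rows_per_commit)  # ceiling division


# Runnable demo with SQLite as a stand-in for a MySQL connection.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE events (id INTEGER, user_id INTEGER)")
demo_commits = bulk_insert(demo, "events", ["id", "user_id"],
                           [(i, i % 7) for i in range(2_500)],
                           rows_per_insert=100, rows_per_commit=1_000, placeholder="?")
demo_rows = demo.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

For the scenario's 10M rows, commits_needed(10_000_000, 10_000) gives 1,000 commits instead of 10,000,000.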

package/courses/mysql-query-optimization/scenarios/level-3/advanced-optimization-shift.yaml ADDED

@@ -0,0 +1,71 @@
+meta:
+  id: advanced-optimization-shift
+  level: 3
+  course: mysql-query-optimization
+  type: output
+  description: "Advanced optimization shift — handle a complex MySQL performance crisis combining replication failure, lock contention, and slow query regression"
+  tags: [MySQL, shift-simulation, crisis, replication, lock-contention, advanced]
+state: {}
+trigger: |
+  You're on-call when multiple MySQL alerts fire simultaneously during
+  a product launch event. The application is partially down and the
+  engineering VP is asking for status updates every 5 minutes.
+  Alert timeline:
+  14:00 — Traffic starts at 4x normal for product launch.
+  14:10 — Alert: Primary CPU 92%, query latency P99 jumps to 5 seconds.
+  Performance Schema shows a new query (from the launch feature):
+    SELECT p.*, r.*, u.name
+    FROM products p
+    JOIN reviews r ON r.product_id = p.id
+    JOIN users u ON u.id = r.user_id
+    WHERE p.id = ?
+    ORDER BY r.created_at DESC;
+  Missing index on reviews.product_id — full table scan on 50M reviews.
+  14:15 — Alert: Replication lag on Replica 1 at 120 seconds.
+  Analytics query running for 45 minutes holding open transaction,
+  preventing relay log application on the replica.
+  14:18 — Alert: "Lock wait timeout exceeded" errors at 100/minute.
+  The launch feature's "add to cart" UPDATE:
+    UPDATE inventory SET reserved = reserved + 1
+    WHERE product_id = ? AND warehouse_id = ?;
+  200 concurrent users trying to reserve the same hot product.
+  InnoDB row lock wait queue backing up.
+  14:22 — Alert: Connection pool exhausted. ProxySQL reports 0 available
+  backend connections. Queries queuing in ProxySQL.
+  Many connections stuck waiting for locks from 14:18 issue.
+  14:25 — Replica 2 also lagging 90 seconds. Read traffic can't be
+  served from replicas. All traffic hitting primary.
+  14:28 — Disk usage alert: 88%. Binlog accumulation from high write
+  volume + relay logs backing up on replicas.
+  Current state: Application returning 50% error rate.
+  Revenue impact: $20K/minute lost.
+  Task: Write the incident response. Include: the immediate triage
+  (first 5 minutes — what to fix first and why), the stabilization
+  steps (next 30 minutes), the root cause for each issue, the
+  post-incident action items, and the war-room communication template.
+assertions:
+  - type: llm_judge
+    criteria: "Triage prioritization is correct — fixes the most impactful issue first: kill the blocking analytics query on the replica (unblocking replication), add the missing index on reviews.product_id (fixing the slow query driving CPU), and increase ProxySQL backend connection limits or kill stuck connections. Lock contention on hot product addressed with application-level serialization or optimistic locking"
+    weight: 0.35
+    description: "Correct triage prioritization"
+  - type: llm_judge
+    criteria: "All issues are resolved with specific actions — missing index (online DDL such as ALTER TABLE reviews ADD INDEX (product_id), ALGORITHM=INPLACE, LOCK=NONE, or gh-ost/pt-online-schema-change), replication lag (kill blocking query, enable parallel replication for future), lock contention (change to optimistic locking or queue-based reservation), connection exhaustion (increase pool, reduce per-query lock hold time), disk (purge old binlogs, extend disk)"
+    weight: 0.35
+    description: "All issues resolved"
+  - type: llm_judge
+    criteria: "Post-incident improvements prevent recurrence — query review in CI/CD (catch missing indexes before deploy), load testing before launch events, replica query timeout (prevent long-running analytics from blocking replication), hot-product inventory pattern (Redis cache or queue-based reservation), and ProxySQL connection limits tuned for peak traffic"
+    weight: 0.30
+    description: "Prevention measures"
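One way to implement the queue-based reservation named in the assertions is a single writer per hot product. In this Python sketch (a hypothetical design, not a prescribed one), concurrent callers enqueue requests and one worker thread owns the stock counter, so callers wait on a queue instead of queuing on an InnoDB row lock. A production version would have the worker apply changes to MySQL in short transactions, or sit behind a durable queue.

```python
import queue
import threading
from typing import List, Optional, Tuple


class HotProductReserver:
    """Serializes reservations for one hot product through a single worker thread."""

    def __init__(self, stock: int) -> None:
        self._stock = stock
        # Each request carries its own reply queue; None is the shutdown sentinel.
        self._requests: "queue.Queue[Optional[queue.Queue]]" = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self) -> None:
        while True:
            reply = self._requests.get()
            if reply is None:
                break
            # Only this thread touches _stock, so no lock is needed here.
            if self._stock > 0:
                self._stock -= 1
                reply.put(True)
            else:
                reply.put(False)

    def reserve(self) -> bool:
        reply: "queue.Queue[bool]" = queue.Queue(maxsize=1)
        self._requests.put(reply)
        return reply.get()  # block until the worker answers

    def close(self) -> None:
        self._requests.put(None)
        self._worker.join()


def simulate(stock: int, users: int) -> Tuple[int, int]:
    """Run `users` concurrent reservation attempts; return (succeeded, rejected)."""
    reserver = HotProductReserver(stock)
    results: List[bool] = []
    results_lock = threading.Lock()

    def attempt() -> None:
        ok = reserver.reserve()
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=attempt) for _ in range(users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    reserver.close()
    return results.count(True), results.count(False)
```

With the scenario's 200 concurrent users and, say, 150 units in stock, simulate(150, 200) reserves exactly 150 and rejects 50 without any caller blocking on a database lock.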

package/courses/mysql-query-optimization/scenarios/level-3/connection-management.yaml ADDED

@@ -0,0 +1,67 @@
+meta:
+  id: connection-management
+  level: 3
+  course: mysql-query-optimization
+  type: output
+  description: "Optimize MySQL connections — deploy ProxySQL for connection pooling, read/write splitting, and query routing"
+  tags: [MySQL, ProxySQL, connection-pooling, read-write-split, routing, advanced]
+state: {}
+trigger: |
+  Your application is experiencing "Too many connections" errors during
+  peak hours. The MySQL server has max_connections=500 but 20
+  microservices are each opening 50 connections, totaling 1,000
+  potential connections. Most connections sit idle in Sleep state.
+  Current state:
+  - MySQL max_connections: 500
+  - Application connections needed: 1,000 (20 services × 50)
+  - Active queries at peak: only 80 concurrent
+  - Sleep connections: 420 (84% idle)
+  - Memory per connection: ~10MB (thread stack + buffers)
+  - Total connection memory: 500 × 10MB = 5GB, of which 420 × 10MB = 4.2GB sits in idle connections
+  SHOW PROCESSLIST during peak:
+  - 80 connections in Query state (active)
+  - 420 connections in Sleep state (idle, wasting resources)
+  - Error: "Too many connections" — 500 rejected in last hour
+  Architecture problem:
+  - Each microservice maintains its own connection pool
+  - No connection sharing between services
+  - Read queries (70%) go to primary — replicas underutilized
+  - No query routing — all services connect directly to MySQL
+  ProxySQL deployment plan:
+  - Insert ProxySQL between applications and MySQL
+  - Multiplex: 1,000 frontend connections → 200 backend connections
+  - Read/write split: writes → primary, reads → replicas
+  - Query routing rules: route by query pattern, user, schema
+  - Connection pooling: reuse backend connections across services
+  Questions:
+  1. How does ProxySQL multiplex connections?
+  2. How to configure read/write splitting?
+  3. How to handle prepared statements through a proxy?
+  4. What about MySQL Router (official) vs ProxySQL?
+  5. How to monitor connection pool health?
+  Task: Design the connection management architecture. Write: the
+  ProxySQL deployment plan, the read/write splitting configuration,
+  the query routing rules, the connection pool sizing (frontend and
+  backend), and the comparison with MySQL Router.
+assertions:
+  - type: llm_judge
+    criteria: "ProxySQL multiplexing is explained — frontend connections from applications are mapped to a smaller pool of backend connections to MySQL. ProxySQL maintains the session state and switches backend connections as needed. This reduces MySQL from 500+ connections to ~200 backend connections, saving memory and reducing context switching"
+    weight: 0.35
+    description: "ProxySQL multiplexing explained"
+  - type: llm_judge
+    criteria: "Read/write split is configured — ProxySQL query rules route SELECT to read replicas (hostgroup for readers) and INSERT/UPDATE/DELETE to primary (hostgroup for writers). Handles edge cases: SELECT ... FOR UPDATE goes to primary, queries within transactions go to primary, and freshly-written data reads may need primary (session stickiness or replication-lag-aware routing)"
+    weight: 0.35
+    description: "Read/write split configured"
+  - type: llm_judge
+    criteria: "Pool sizing and comparison are provided — backend pool size calculation (based on active query concurrency, not application pool size), ProxySQL vs MySQL Router comparison (ProxySQL: more features, query routing, caching; MySQL Router: simpler, better InnoDB Cluster integration, no connection pooling), and monitoring queries for pool health (ProxySQL stats tables)"
+    weight: 0.30
+    description: "Pool sizing and comparison"
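ProxySQL's routing works as an ordered list of query rules, with the first matching rule (when apply is set) deciding the destination hostgroup. The Python model below mimics that for the split described in the scenario and adds the pool-sizing arithmetic; the hostgroup numbers 10 and 20, the 2.5× headroom factor, and the function names are assumptions. In ProxySQL itself these rules live in the mysql_query_rules table (match_digest, destination_hostgroup, apply), and keeping a whole transaction on the writer is handled by transaction_persistent in mysql_users, which the in_transaction flag stands in for here.

```python
import math
import re
from dataclasses import dataclass

WRITER_HG = 10  # assumed hostgroup id for the primary
READER_HG = 20  # assumed hostgroup id for the replicas


@dataclass(frozen=True)
class QueryRule:
    rule_id: int
    pattern: re.Pattern
    destination_hostgroup: int


# Evaluated in rule_id order; the first match wins (like apply=1 in ProxySQL).
RULES = sorted(
    [
        QueryRule(1, re.compile(r"^\s*SELECT\b.*\bFOR\s+UPDATE\b", re.I | re.S), WRITER_HG),
        QueryRule(2, re.compile(r"^\s*SELECT\b", re.I), READER_HG),
    ],
    key=lambda rule: rule.rule_id,
)


def route(sql: str, in_transaction: bool = False) -> int:
    """Return the hostgroup a statement should be sent to."""
    if in_transaction:
        return WRITER_HG  # keep every statement of an open transaction on the primary
    for rule in RULES:
        if rule.pattern.search(sql):
            return rule.destination_hostgroup
    return WRITER_HG  # default hostgroup: INSERT/UPDATE/DELETE and anything unmatched


def size_backend_pool(peak_active_queries: int, headroom: float) -> int:
    """Size the backend pool from concurrent active queries, not app pool sizes."""
    return math.ceil(peak_active_queries * headroom)


def connection_memory_gb(connections: int, mb_per_connection: float = 10) -> float:
    # Decimal GB, matching the scenario's 500 × 10MB = 5GB.
    return connections * mb_per_connection / 1000
```

With 80 active queries at peak, a 2.5× headroom gives the plan's 200 backend connections, about 2GB of connection memory instead of 5GB.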