npm - dojo.md - Versions diffs - 0.2.0 → 0.2.1 - Mend

dojo.md 0.2.0 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (222)

package/courses/mysql-query-optimization/scenarios/level-3/full-text-search.yaml ADDED

@@ -0,0 +1,77 @@
+meta:
+  id: full-text-search
+  level: 3
+  course: mysql-query-optimization
+  type: output
+  description: "Implement MySQL full-text search — use FULLTEXT indexes, natural language and boolean modes, and relevance ranking for search features"
+  tags: [MySQL, full-text-search, FULLTEXT, MATCH-AGAINST, relevance, advanced]
+state: {}
+trigger: |
+  Your e-commerce site's product search is using LIKE '%keyword%'
+  queries, which cause full table scans on 5 million products. The
+  search page takes 8 seconds. You need to implement proper full-text
+  search using MySQL's built-in FULLTEXT indexes.
+  Current query (slow):
+  SELECT * FROM products
+  WHERE name LIKE '%wireless headphones%'
+     OR description LIKE '%wireless headphones%'
+  ORDER BY created_at DESC
+  LIMIT 20;
+  -- EXPLAIN: type=ALL, rows=5M, Using filesort
+  Products table:
+  CREATE TABLE products (
+    id BIGINT AUTO_INCREMENT PRIMARY KEY,
+    name VARCHAR(200) NOT NULL,
+    description TEXT,
+    category_id INT NOT NULL,
+    price DECIMAL(10,2) NOT NULL,
+    rating DECIMAL(3,2),
+    created_at DATETIME,
+    INDEX idx_category (category_id)
+  ) ENGINE=InnoDB;
+  Requirements:
+  1. Search by keyword across name and description
+  2. Results ranked by relevance (not just date)
+  3. Support for boolean operators (+must -exclude "exact phrase")
+  4. Search within a specific category
+  5. Boost exact name matches over description matches
+  6. Handle misspellings and partial matches
+  Search use cases to support:
+  A. Simple search: "wireless headphones"
+  B. Boolean search: "+wireless +headphones -bluetooth"
+  C. Phrase search: '"noise cancelling"'
+  D. Category-filtered search: wireless headphones in Electronics
+  E. Weighted relevance: name matches ranked higher than description
+  Questions:
+  1. How does FULLTEXT indexing work in InnoDB?
+  2. Natural language mode vs boolean mode — when to use each?
+  3. How to combine FULLTEXT with other WHERE conditions?
+  4. How to weight name matches higher than description matches?
+  5. When to use MySQL FTS vs Elasticsearch?
+  Task: Implement the full-text search. Write: the FULLTEXT index
+  creation, the MATCH...AGAINST query for each use case, the relevance
+  ranking approach (weighting name vs description), the combined
+  filtering (FULLTEXT + category), and the comparison with
+  Elasticsearch for when MySQL FTS isn't enough.
+assertions:
+  - type: llm_judge
+    criteria: "FULLTEXT implementation is correct — creates FULLTEXT INDEX on (name, description), uses MATCH(name, description) AGAINST('wireless headphones' IN NATURAL LANGUAGE MODE) for simple search and IN BOOLEAN MODE for operator search. Shows how to combine with WHERE category_id = ? using AND"
+    weight: 0.35
+    description: "Correct FULLTEXT implementation"
+  - type: llm_judge
+    criteria: "Relevance weighting is addressed — uses separate MATCH expressions: (MATCH(name) AGAINST(...) * 2 + MATCH(description) AGAINST(...)) AS relevance to weight name matches 2x higher. Requires a FULLTEXT index on name alone and a FULLTEXT index on description alone for the weighted expressions, in addition to the FULLTEXT index on (name, description) used by the combined filter. Explains that in InnoDB each MATCH() column list must exactly match the column list of one FULLTEXT index"
+    weight: 0.35
+    description: "Relevance weighting"
+  - type: llm_judge
+    criteria: "Limitations and Elasticsearch comparison are discussed — MySQL FTS limitations: no fuzzy matching (typo tolerance), no synonyms, no faceted search, minimum word length (innodb_ft_min_token_size for InnoDB; ft_min_word_len applies only to MyISAM), stopword issues. Elasticsearch is better for: complex search UX, faceted navigation, typo tolerance, multilingual search. MySQL FTS is sufficient for: basic keyword search, small-to-medium datasets, when simplicity matters"
+    weight: 0.30
+    description: "Limitations and Elasticsearch comparison"

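The FULLTEXT scenario above can be illustrated with a small query builder that covers use cases A through E. This is a minimal sketch, not part of the package: it assumes three FULLTEXT indexes (the names ft_name_desc, ft_name and ft_desc are hypothetical), a DB-API driver that uses %s placeholders (such as PyMySQL), and a 2x name weight. It only builds SQL text and parameters; it does not connect to a database.

```python
from typing import List, Optional, Tuple

# Hypothetical index names; the column lists are what matter, because in
# InnoDB each MATCH() column list must match one FULLTEXT index exactly.
FULLTEXT_DDL = [
    "ALTER TABLE products ADD FULLTEXT INDEX ft_name_desc (name, description)",
    "ALTER TABLE products ADD FULLTEXT INDEX ft_name (name)",
    "ALTER TABLE products ADD FULLTEXT INDEX ft_desc (description)",
]


def build_search(terms: str,
                 boolean: bool = False,
                 category_id: Optional[int] = None,
                 name_weight: float = 2.0) -> Tuple[str, List[object]]:
    """Return (sql, params) for a relevance-ranked product search.

    boolean=False -> natural language mode (use case A).
    boolean=True  -> boolean mode, so +must, -exclude and "exact phrase"
                     operators inside `terms` are honored (use cases B, C).
    """
    mode = "IN BOOLEAN MODE" if boolean else "IN NATURAL LANGUAGE MODE"
    # Weighted relevance (use case E): a name match counts name_weight times.
    relevance = (f"MATCH(name) AGAINST(%s {mode}) * {float(name_weight)} + "
                 f"MATCH(description) AGAINST(%s {mode})")
    sql = (f"SELECT id, name, price, {relevance} AS relevance "
           f"FROM products "
           f"WHERE MATCH(name, description) AGAINST(%s {mode})")
    # The user's terms are bound three times, once per MATCH() expression.
    params: List[object] = [terms, terms, terms]
    if category_id is not None:
        # Category filter (use case D) is a plain AND next to MATCH().
        sql += " AND category_id = %s"
        params.append(category_id)
    sql += " ORDER BY relevance DESC LIMIT 20"
    return sql, params
```

For example, `build_search('"noise cancelling"', boolean=True)` produces the phrase search, and `build_search("wireless headphones", category_id=3)` a category-filtered one (the id 3 for Electronics is made up). Binding the terms as parameters keeps user input out of the SQL text; the boolean-mode operators are interpreted by the full-text parser, not by SQL.
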
package/courses/mysql-query-optimization/scenarios/level-3/json-optimization.yaml ADDED

@@ -0,0 +1,87 @@
+meta:
+  id: json-optimization
+  level: 3
+  course: mysql-query-optimization
+  type: output
+  description: "Optimize MySQL JSON queries — use generated columns, functional indexes, multi-valued indexes, and JSON_TABLE for efficient JSON processing"
+  tags: [MySQL, JSON, generated-columns, multi-valued-index, JSON-TABLE, advanced]
+state: {}
+trigger: |
+  Your application stores product metadata as JSON in MySQL 8.0. As
+  the product catalog grows to 5M rows, JSON queries are becoming slow
+  because they can't use indexes effectively.
+  Table schema:
+  CREATE TABLE products (
+    id BIGINT AUTO_INCREMENT PRIMARY KEY,
+    name VARCHAR(200) NOT NULL,
+    category_id INT NOT NULL,
+    metadata JSON NOT NULL,
+    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
+    INDEX idx_category (category_id)
+  ) ENGINE=InnoDB;
+  Sample JSON in metadata column:
+  {
+    "brand": "AcmeCorp",
+    "price": 29.99,
+    "currency": "USD",
+    "tags": ["electronics", "sale", "new-arrival"],
+    "dimensions": {"width": 10, "height": 5, "depth": 3},
+    "ratings": {"average": 4.5, "count": 230},
+    "variants": [
+      {"color": "red", "sku": "ACM-001-R", "stock": 50},
+      {"color": "blue", "sku": "ACM-001-B", "stock": 0}
+    ]
+  }
+  Slow queries:
+  1. Filter by brand (50K queries/day, 2 seconds each):
+     SELECT * FROM products
+     WHERE JSON_EXTRACT(metadata, '$.brand') = 'AcmeCorp';
+     -- Full table scan — no index on JSON field
+  2. Filter by tag (20K queries/day, 3 seconds each):
+     SELECT * FROM products
+     WHERE JSON_CONTAINS(metadata->'$.tags', '"sale"');
+     -- Full table scan — can't index arrays normally
+  3. Price range (30K queries/day, 1.5 seconds each):
+     SELECT * FROM products
+     WHERE CAST(JSON_EXTRACT(metadata, '$.price') AS DECIMAL(10,2))
+           BETWEEN 20 AND 50;
+     -- Full table scan — CAST prevents index usage
+  4. Join JSON array with another table:
+     SELECT p.name, t.tag_name
+     FROM products p, JSON_TABLE(
+       p.metadata, '$.tags[*]' COLUMNS (tag_name VARCHAR(50) PATH '$')
+     ) t
+     WHERE t.tag_name IN ('electronics', 'sale');
+  Optimization options:
+  A. Generated columns + regular indexes
+  B. Functional indexes (MySQL 8.0.13+)
+  C. Multi-valued indexes for arrays (MySQL 8.0.17+)
+  D. Denormalize into regular columns
+  Task: Optimize each slow query. Write: the generated column approach
+  (extract JSON to indexed column), functional indexes (direct index on
+  JSON expression), multi-valued indexes (for JSON arrays), JSON_TABLE
+  usage and performance, and when to denormalize vs keep JSON.
+assertions:
+  - type: llm_judge
+    criteria: "Each query has a correct optimization — Query 1: functional index on CAST(metadata->>'$.brand' AS CHAR(100)) COLLATE utf8mb4_bin (a JSON value cannot be indexed directly; the filter is rewritten as metadata->>'$.brand' = 'AcmeCorp' so the optimizer can match it to the index) or generated column + index, Query 2: multi-valued index on CAST(metadata->'$.tags' AS CHAR(50) ARRAY) for JSON_CONTAINS, Query 3: generated column price DECIMAL(10,2) AS (metadata->>'$.price') STORED (or VIRTUAL) + regular index, Query 4: JSON_TABLE is for ad-hoc use, consider denormalizing for frequent joins"
+    weight: 0.35
+    description: "Correct optimizations"
+  - type: llm_judge
+    criteria: "Multi-valued indexes are explained — MySQL 8.0.17+ allows indexing JSON arrays, enabling efficient MEMBER OF(), JSON_CONTAINS(), and JSON_OVERLAPS() queries. Shows the CREATE INDEX syntax with CAST(... AS ... ARRAY), explains that it creates one index entry per array element, and notes limitations (only works with specific JSON functions)"
+    weight: 0.35
+    description: "Multi-valued indexes explained"
+  - type: llm_judge
+    criteria: "Denormalization trade-off is discussed — JSON is good for flexible, rarely-queried attributes; generated columns + indexes for frequently-queried fields; full denormalization for high-performance needs. Discusses storage trade-off (STORED generated columns duplicate data in the row, while an indexed VIRTUAL column stores the value only in the secondary index) and when to keep JSON vs extract to regular columns"
+    weight: 0.30
+    description: "Denormalization trade-off"

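To make "one index entry per array element" concrete, here is a toy in-memory model of what a multi-valued index on metadata->'$.tags' holds and how MEMBER OF() and JSON_OVERLAPS() lookups use it. It is an illustration under simplifying assumptions (a dict of sets instead of an InnoDB B-tree, three made-up rows, duplicates collapsed per row), not how MySQL implements the index; the DDL string shows the shape of the real statement, with a hypothetical index name.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Set

# Shape of the real DDL (the index name idx_tags is hypothetical).
MULTI_VALUED_DDL = ("CREATE INDEX idx_tags ON products "
                    "((CAST(metadata->'$.tags' AS CHAR(50) ARRAY)))")

# Three made-up rows: id -> metadata JSON text.
ROWS: Dict[int, str] = {
    1: '{"brand": "AcmeCorp", "tags": ["electronics", "sale", "new-arrival"]}',
    2: '{"brand": "Globex", "tags": ["electronics"]}',
    3: '{"brand": "AcmeCorp", "tags": ["sale", "sale"]}',
}


def build_multi_valued_index(rows: Dict[int, str], key: str) -> Dict[str, Set[int]]:
    """Map each array element to the rows containing it (toy: dedupes per row)."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for row_id, doc in rows.items():
        for element in set(json.loads(doc).get(key, [])):
            index[element].add(row_id)
    return index


def member_of(index: Dict[str, Set[int]], value: str) -> Set[int]:
    """Rows where value is an element of the array, like value MEMBER OF(...)."""
    return set(index.get(value, set()))


def overlaps(index: Dict[str, Set[int]], values: Iterable[str]) -> Set[int]:
    """Rows sharing at least one element with values, like JSON_OVERLAPS()."""
    result: Set[int] = set()
    for value in values:
        result |= index.get(value, set())
    return result
```

The lookup cost depends on how many rows contain the searched tag rather than on the table size, which is the point of Query 2's fix. With the real index, the query still has to use MEMBER OF(), JSON_CONTAINS() or JSON_OVERLAPS() on the indexed path for the optimizer to consider it.
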
package/courses/mysql-query-optimization/scenarios/level-3/lock-contention-analysis.yaml ADDED

@@ -0,0 +1,68 @@
+meta:
+  id: lock-contention-analysis
+  level: 3
+  course: mysql-query-optimization
+  type: output
+  description: "Analyze and resolve MySQL lock contention — understand InnoDB row locks, gap locks, deadlocks, and design lock-free patterns"
+  tags: [MySQL, InnoDB, locks, deadlock, gap-locks, contention, advanced]
+state: {}
+trigger: |
+  Your e-commerce platform is experiencing intermittent deadlocks and
+  lock wait timeouts during checkout. The error rate spikes during flash
+  sales when multiple customers try to purchase the same products
+  simultaneously.
+  Error logs show:
+  - Deadlock detected (InnoDB): 50/hour during flash sales
+  - Lock wait timeout exceeded: 200/hour during flash sales
+  - innodb_lock_wait_timeout = 50 (default)
+  The problematic flow (checkout):
+  Transaction 1 (Customer A):
+    BEGIN;
+    SELECT * FROM inventory WHERE product_id = 101 FOR UPDATE; -- X lock
+    UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 101;
+    INSERT INTO order_items (order_id, product_id, quantity) VALUES (1, 101, 1);
+    UPDATE orders SET total = total + 29.99 WHERE id = 1; -- Deadlock here!
+    COMMIT;
+  Transaction 2 (Customer B, concurrent):
+    BEGIN;
+    SELECT * FROM inventory WHERE product_id = 101 FOR UPDATE; -- Waits for T1
+    -- or takes a different code path that locks orders first, then inventory
+  SHOW ENGINE INNODB STATUS reveals:
+  - Gap locks on inventory (product_id range) blocking other inserts
+  - Deadlock cycle: T1 holds X on inventory.101, wants X on orders.1
+    T2 holds X on orders.1, wants X on inventory.101
+  - Lock modes: X (exclusive), S (shared), X,GAP, X,REC_NOT_GAP
+  Additional lock issues:
+  1. ALTER TABLE on orders (500M rows) takes an exclusive metadata lock (MDL),
+     blocking all queries for 20 minutes during metadata operations
+  2. Long-running analytics transaction (REPEATABLE READ) prevents
+     InnoDB purge of old row versions, causing undo log growth
+  3. Gap locks in REPEATABLE READ preventing concurrent inserts in
+     ranges even when no actual conflict exists
+  Task: Analyze and resolve the lock contention. Write: the deadlock
+  root cause analysis (why the circular dependency forms), the lock-free
+  checkout redesign, the gap lock explanation and mitigation, the
+  isolation level trade-offs (REPEATABLE READ vs READ COMMITTED), and
+  the monitoring strategy for lock issues.
+assertions:
+  - type: llm_judge
+    criteria: "Deadlock analysis is correct — identifies the circular lock dependency (T1 locks inventory then orders, T2 locks orders then inventory), explains that consistent lock ordering would prevent the deadlock (always lock inventory first, then orders), and shows how to read SHOW ENGINE INNODB STATUS deadlock output"
+    weight: 0.35
+    description: "Deadlock analysis"
+  - type: llm_judge
+    criteria: "Lock-free redesign is practical — uses optimistic locking (version column), or UPDATE ... WHERE quantity >= 1 without SELECT FOR UPDATE, or queuing (serialize inventory updates through a queue), or advisory locking. Explains gap locks (prevent phantom rows in REPEATABLE READ by locking gaps between index records) and how READ COMMITTED disables gap locking for searches and index scans (it is still used for foreign-key and duplicate-key checks)"
+    weight: 0.35
+    description: "Lock-free redesign"
+  - type: llm_judge
+    criteria: "Isolation level trade-offs and monitoring are addressed — REPEATABLE READ (default): gap locks prevent phantoms but reduce concurrency; READ COMMITTED: gap locks only for foreign-key and duplicate-key checks, higher concurrency, but allows phantom reads. Monitoring: performance_schema.data_locks, data_lock_waits, the Innodb_row_lock_waits status variable, and alerting on deadlock frequency"
+    weight: 0.30
+    description: "Isolation and monitoring"

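The circular-wait argument and the lock-ordering fix can be checked with a tiny simulation. This is a simplified model, not InnoDB's deadlock detector: it assumes exclusive row locks only, locks held until commit, and a round-robin scheduler. The table ranking (inventory before orders) follows the ordering fix described in the assertions.

```python
from typing import Dict, List, Optional

# Global lock order: always lock inventory rows before orders rows.
TABLE_RANK = {"inventory": 0, "orders": 1}


def in_global_order(plan: List[str]) -> List[str]:
    """Sort "table:key" lock names by table rank, then by name."""
    return sorted(plan, key=lambda name: (TABLE_RANK[name.split(":")[0]], name))


def simulate(plans: Dict[str, List[str]]) -> Optional[List[str]]:
    """Interleave exclusive-lock requests round-robin.

    Returns the sorted list of stuck transactions when none can progress
    (a deadlock), or None when every transaction commits.
    """
    owner: Dict[str, str] = {}          # lock name -> holding transaction
    pos = {txn: 0 for txn in plans}     # index of the next lock each txn wants
    while True:
        progressed = False
        for txn, plan in plans.items():
            if pos[txn] == len(plan):
                continue                # already committed
            wanted = plan[pos[txn]]
            if owner.get(wanted, txn) != txn:
                continue                # blocked: another txn holds the lock
            owner[wanted] = txn
            pos[txn] += 1
            progressed = True
            if pos[txn] == len(plan):   # COMMIT releases every lock it holds
                for name in plan:
                    if owner.get(name) == txn:
                        del owner[name]
        if all(pos[t] == len(p) for t, p in plans.items()):
            return None
        if not progressed:
            return sorted(t for t, p in plans.items() if pos[t] < len(p))
```

With T1 locking `["inventory:101", "orders:1"]` and T2 locking `["orders:1", "inventory:101"]`, the simulation reports both as stuck; passing both plans through `in_global_order` lets them finish one after the other. InnoDB breaks a real cycle by rolling back a victim; consistent ordering, or a single conditional UPDATE instead of SELECT ... FOR UPDATE, removes the cycle instead of relying on that rollback.
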
package/courses/mysql-query-optimization/scenarios/level-3/monitoring-alerting.yaml ADDED

@@ -0,0 +1,63 @@
+meta:
+  id: monitoring-alerting
+  level: 3
+  course: mysql-query-optimization
+  type: output
+  description: "Design MySQL monitoring and alerting — implement Percona PMM, define alert thresholds, and build performance dashboards"
+  tags: [MySQL, monitoring, PMM, alerting, dashboards, advanced]
+state: {}
+trigger: |
+  Your company has 15 MySQL servers and no unified monitoring. Last
+  month, a disk-full incident on the primary database caused 2 hours
+  of downtime — nobody noticed until customers reported errors. The
+  CTO mandates comprehensive monitoring by end of quarter.
+  Current state:
+  - 15 MySQL 8.0 instances (5 primaries, 10 replicas)
+  - No centralized monitoring (each team checks their own server)
+  - Basic CloudWatch metrics (CPU, disk, connections)
+  - No query-level visibility
+  - No alerting — issues discovered by customer complaints
+  Recent incidents that monitoring would have prevented:
+  1. Disk full: binary logs accumulated, 100% disk, crash
+  2. Replication lag: Replica fell 2 hours behind, stale data served
+  3. Slow query regression: New deployment added query scanning 10M rows
+  4. Connection exhaustion: max_connections hit during traffic spike
+  5. Table bloat: 500GB table with 300GB of free space left by deletes (never rebuilt with OPTIMIZE TABLE)
+  Monitoring stack options:
+  A. Percona PMM (open-source, MySQL-focused)
+  B. Datadog Database Monitoring (commercial, broad integration)
+  C. Prometheus + Grafana + mysqld_exporter (DIY)
+  D. CloudWatch Enhanced Monitoring (AWS-native)
+  Key metrics to monitor:
+  - Query performance: QPS, latency percentiles, slow query count
+  - Replication: lag, binlog position, I/O and SQL thread status
+  - InnoDB: buffer pool hit ratio, dirty pages, row lock waits
+  - Resources: CPU, memory, disk I/O, disk space, connections
+  - Schema: table sizes, index sizes, unused indexes, fragmentation
+  Task: Design the monitoring and alerting system. Write: the
+  recommended monitoring stack (with justification), the metric
+  categories and specific metrics to track, the alerting thresholds
+  (warning and critical for each metric), the dashboard design (what
+  each audience needs), and the incident detection that would have
+  caught the 5 past incidents.
+assertions:
+  - type: llm_judge
+    criteria: "Monitoring stack is justified — recommends PMM or Prometheus+Grafana with clear reasoning (PMM: MySQL-specific dashboards out of box, query analytics, advisor checks; Prometheus+Grafana: more customizable, broader ecosystem; Datadog: best integration but expensive). Covers installation and configuration overview"
+    weight: 0.35
+    description: "Monitoring stack justified"
+  - type: llm_judge
+    criteria: "Alerting thresholds are specific — defines warning and critical thresholds for: disk usage (warning 80%, critical 90%), replication lag (warning 30s, critical 300s), buffer pool hit ratio (warning <95%, critical <90%), connections (warning 80% of max, critical 95%), slow queries (warning 2x baseline, critical 5x). Each threshold has a response action"
+    weight: 0.35
+    description: "Specific alerting thresholds"
+  - type: llm_judge
+    criteria: "Past incidents would be caught — maps each of the 5 incidents to specific metrics and thresholds: disk full (disk usage alert at 80%), replication lag (lag alert at 30s), slow query regression (query latency change detection), connection exhaustion (connections approaching max), table bloat (table size growth rate or InnoDB free space). Demonstrates the monitoring prevents future incidents"
+    weight: 0.30
+    description: "Past incidents prevented"

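The threshold scheme in the assertions can be encoded as data plus one classifier, which keeps warning and critical levels reviewable in one place. A minimal sketch, assuming the example thresholds from the rubric above and MySQL's Innodb_buffer_pool_read_requests and Innodb_buffer_pool_reads status counters for the hit ratio; the metric keys are hypothetical names, and in practice these rules would live in PMM or Prometheus alerting rather than in application code.

```python
from typing import Dict, Tuple

# metric -> (warning, critical, higher_is_worse); values from the rubric above.
THRESHOLDS: Dict[str, Tuple[float, float, bool]] = {
    "disk_used_pct": (80.0, 90.0, True),
    "replica_lag_s": (30.0, 300.0, True),
    "buffer_pool_hit_pct": (95.0, 90.0, False),
    "connections_pct_of_max": (80.0, 95.0, True),
    "slow_queries_x_baseline": (2.0, 5.0, True),
}


def buffer_pool_hit_pct(status: Dict[str, int]) -> float:
    """Share of logical page reads served from memory instead of disk."""
    requests = status["Innodb_buffer_pool_read_requests"]
    disk_reads = status["Innodb_buffer_pool_reads"]
    return 100.0 if requests == 0 else 100.0 * (1 - disk_reads / requests)


def connections_pct(threads_connected: int, max_connections: int) -> float:
    return 100.0 * threads_connected / max_connections


def classify(metric: str, value: float) -> str:
    """Return "ok", "warning" or "critical" for one metric sample."""
    warning, critical, higher_is_worse = THRESHOLDS[metric]
    if higher_is_worse:
        if value >= critical:
            return "critical"
        if value >= warning:
            return "warning"
    else:
        if value < critical:
            return "critical"
        if value < warning:
            return "warning"
    return "ok"
```

The status counters are cumulative since server start, so the hit ratio is more meaningful when computed from the difference between two samples. Incident 2 (a 2-hour lag) would have crossed the 30 s warning long before reaching critical, and incident 4 maps to `connections_pct` approaching 100.
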
package/courses/mysql-query-optimization/scenarios/level-3/online-schema-changes.yaml ADDED

@@ -0,0 +1,79 @@
+meta:
+  id: online-schema-changes
+  level: 3
+  course: mysql-query-optimization
+  type: output
+  description: "Perform online schema changes — use pt-online-schema-change, gh-ost, and MySQL Online DDL to modify large tables without downtime"
+  tags: [MySQL, online-DDL, pt-osc, gh-ost, schema-change, advanced]
+state: {}
+trigger: |
+  You need to add a column and an index to a 500 million row table in
+  production without causing downtime. The last time someone ran ALTER
+  TABLE on this table, it blocked all queries for 45 minutes.
+  The change needed:
+  ALTER TABLE orders
+    ADD COLUMN shipping_method VARCHAR(50) DEFAULT 'standard',
+    ADD INDEX idx_shipping (shipping_method, created_at);
+  Table: orders (500M rows, 200GB, InnoDB)
+  - 50K queries/second during peak
+  - Used by checkout (cannot tolerate blocking)
+  MySQL Online DDL capabilities (8.0):
+  - ALGORITHM=INPLACE: avoids a server-level table copy (many operations still rebuild the table inside InnoDB)
+  - ALGORITHM=INSTANT: metadata-only change (fastest, very limited)
+  - LOCK=NONE: allows concurrent DML during DDL
+  What each DDL supports:
+  | Operation          | INSTANT | INPLACE | COPY  | Lock during |
+  |--------------------|---------|---------|-------|-------------|
+  | ADD COLUMN (last)  | Yes     | Yes     | Yes   | None        |
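+  | ADD COLUMN (middle)| No*     | Yes     | Yes   | None        |
+  | ADD INDEX          | No      | Yes     | Yes   | None        |
+  | DROP COLUMN        | No*     | Yes     | Yes   | None        |
+  | CHANGE column type | No      | No      | Yes   | Blocks DML  |
+  | ADD FOREIGN KEY    | No      | Yes*    | Yes   | None*       |
+  * With caveats: INSTANT ADD COLUMN at any position and INSTANT DROP
+    COLUMN require 8.0.29+; INPLACE ADD FOREIGN KEY requires
+    foreign_key_checks=0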
+  But even with ALGORITHM=INPLACE, the operation can:
+  1. Take hours (rebuilding the table and building the index over 500M rows)
+  2. Consume massive disk space (temporary files)
+  3. Cause replication lag (the DDL starts on the replica only after it
+     finishes on the source, then blocks the applier for its full duration)
+  4. Briefly acquire metadata lock at start and end (blocking if long
+     transactions are running)
+  External tools:
+  A. pt-online-schema-change (Percona Toolkit):
+     - Creates shadow table, copies rows in chunks while triggers
+       capture concurrent writes, then does an atomic rename
+     - Throttles based on replica lag and server load
+  B. gh-ost (GitHub):
+     - Uses binlog streaming instead of triggers
+     - No trigger overhead, more controllable
+     - Can pause/resume migration
+  C. MySQL Online DDL (native):
+     - Built-in, no external tool
+     - More limited control, no throttling
+  Task: Design the schema change strategy. Write: which approach to
+  use for this change (and why), the execution plan (step by step), the
+  risk mitigation (what if it fails mid-way), the replication impact
+  and how to manage it, and the decision framework for choosing between
+  pt-osc, gh-ost, and native Online DDL.
+assertions:
+  - type: llm_judge
+    criteria: "Tool recommendation is justified — recommends gh-ost or pt-osc for this change (despite native Online DDL support) because of controllability: throttling based on lag, ability to pause/resume, predictable disk usage. Explains that native INPLACE DDL works but can't be throttled and metadata lock acquisition can block queries if long transactions exist"
+    weight: 0.35
+    description: "Tool recommendation justified"
+  - type: llm_judge
+    criteria: "Execution plan is detailed — includes pre-checks (disk space for shadow table, no running long transactions, replica lag baseline), the migration command with throttling parameters, monitoring during migration (lag, disk, query latency), and the atomic cutover (rename tables). Estimates duration based on table size"
+    weight: 0.35
+    description: "Detailed execution plan"
+  - type: llm_judge
+    criteria: "Decision framework covers common cases — native Online DDL for: INSTANT-supported changes (add column at end), simple index additions on small tables. pt-osc for: tables with foreign keys (gh-ost does not support them), trigger-compatible workloads. gh-ost for: large tables requiring throttling, trigger-sensitive environments, when pause/resume needed. All three compared on: speed, safety, controllability, replication impact"
+    weight: 0.30
+    description: "Decision framework"

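The pre-checks and the duration estimate in the execution plan reduce to simple arithmetic that can be scripted before launching gh-ost or pt-osc. This is a sketch under stated assumptions: the 1.5x disk headroom, the 60 s long-transaction limit, the 5 s lag limit and the copy throughput are hypothetical values, to be replaced with measurements from a rehearsal on a replica or staging copy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterState:
    table_gb: float          # current size of orders (data + indexes)
    free_disk_gb: float      # free space on the source volume
    longest_txn_s: float     # age of the oldest open transaction
    replica_lag_s: float     # current replica lag


def precheck_problems(s: ClusterState,
                      disk_headroom: float = 1.5,   # hypothetical safety factor
                      max_txn_s: float = 60.0,      # hypothetical limit
                      max_lag_s: float = 5.0) -> List[str]:
    """Return blocking problems; an empty list means the migration may start."""
    problems: List[str] = []
    # The shadow table is a full copy, plus the new index and binlog growth.
    if s.free_disk_gb < s.table_gb * disk_headroom:
        problems.append("not enough free disk for the shadow table")
    # Long transactions delay the metadata lock needed at cut-over.
    if s.longest_txn_s > max_txn_s:
        problems.append("long-running transaction would block cut-over")
    if s.replica_lag_s > max_lag_s:
        problems.append("replicas already lagging; fix before adding load")
    return problems


def estimate_copy_hours(rows: int, rows_per_s: float,
                        active_fraction: float = 1.0) -> float:
    """Row-copy time; active_fraction < 1 models time lost to throttling."""
    return rows / (rows_per_s * active_fraction) / 3600.0
```

With a hypothetical 20K rows/s and throttling active half the time, `estimate_copy_hours(500_000_000, 20_000, 0.5)` gives about 13.9 hours for the copy alone, which is why pause/resume and lag-based throttling matter more here than raw speed.
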
package/courses/mysql-query-optimization/scenarios/level-3/partitioning-strategies.yaml ADDED

@@ -0,0 +1,83 @@
+meta:
+  id: partitioning-strategies
+  level: 3
+  course: mysql-query-optimization
+  type: output
+  description: "Design MySQL partitioning — implement RANGE, LIST, HASH, and KEY partitioning with partition pruning for large table performance"
+  tags: [MySQL, partitioning, RANGE, LIST, HASH, partition-pruning, advanced]
+state: {}
+trigger: |
+  Your events table has grown to 2 billion rows (800GB) and queries
+  are slowing down despite proper indexing. The table receives 50K
+  inserts per second and most queries filter by date range. The DBA
+  team proposes partitioning but the engineers have concerns about
+  complexity.
+  Current table:
+  CREATE TABLE events (
+    id BIGINT AUTO_INCREMENT,
+    tenant_id INT NOT NULL,
+    event_type VARCHAR(50) NOT NULL,
+    payload JSON,
+    created_at DATETIME NOT NULL,
+    PRIMARY KEY (id),
+    INDEX idx_tenant_date (tenant_id, created_at),
+    INDEX idx_type_date (event_type, created_at)
+  ) ENGINE=InnoDB;
+  Query patterns:
+  1. (80%) Recent events by tenant:
+     WHERE tenant_id = ? AND created_at >= NOW() - INTERVAL 7 DAY
+  2. (15%) Events by type and date:
+     WHERE event_type = ? AND created_at BETWEEN ? AND ?
+  3. (5%) Data cleanup:
+     DELETE FROM events WHERE created_at < NOW() - INTERVAL 90 DAY
+  Partitioning options considered:
+  A. RANGE by created_at (monthly partitions):
+     PARTITION BY RANGE (TO_DAYS(created_at)) (
+       PARTITION p202501 VALUES LESS THAN (TO_DAYS('2025-02-01')),
+       PARTITION p202502 VALUES LESS THAN (TO_DAYS('2025-03-01')),
+       ...
+     )
+  B. LIST by tenant_id:
+     PARTITION BY LIST (tenant_id) (
+       PARTITION p_small VALUES IN (1,2,3,...100),
+       PARTITION p_medium VALUES IN (101,102,...200),
+       ...
+     )
+  C. HASH by tenant_id:
+     PARTITION BY HASH (tenant_id) PARTITIONS 16;
+  D. Composite: RANGE by date, subpartitioned by HASH(tenant_id)
+     (MySQL supports subpartitions only by HASH or KEY under RANGE/LIST)
+  Constraints:
+  - PRIMARY KEY must include the partition key in MySQL
+  - This means changing PK from (id) to (id, created_at) for RANGE
+  - UNIQUE indexes must include the partition key
+  - Foreign keys not supported on partitioned tables
+  Task: Design the partitioning strategy. Write: the recommended
+  partitioning scheme with justification, the table DDL changes
+  (including PK modification), the partition pruning explanation (how
+  queries benefit), the maintenance plan (adding/dropping partitions),
+  and the data cleanup improvement (DROP PARTITION vs DELETE).
+assertions:
+  - type: llm_judge
+    criteria: "RANGE partitioning by date is recommended — matches the dominant query pattern (80% filter by date range), enables partition pruning (queries touch only relevant monthly partitions instead of scanning 2B rows), and the 90-day cleanup becomes DROP PARTITION (instant) instead of DELETE (hours of I/O). Justifies why LIST/HASH are less optimal for this workload"
+    weight: 0.35
+    description: "RANGE partitioning justified"
+  - type: llm_judge
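+    criteria: "PK modification is addressed — explains that MySQL requires every unique key, including the PK, to contain the partition key, so the PK changes from (id) to (id, created_at). Discusses the impact: AUTO_INCREMENT still generates table-wide unique ids, but uniqueness of id alone is no longer enforced by a constraint; other UNIQUE constraints must include created_at; and foreign keys referencing this table must be removed because InnoDB partitioned tables cannot have or be referenced by foreign keys. Shows the DDL"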
+    weight: 0.35
+    description: "PK modification addressed"
+  - type: llm_judge
+    criteria: "Maintenance plan is practical — includes automated partition creation (add new monthly partitions ahead of time via event scheduler or cron), partition pruning verification (the partitions column of EXPLAIN shows which partitions are accessed; the EXPLAIN PARTITIONS keyword was removed in MySQL 8.0), and data lifecycle management (DROP PARTITION for old data is instant vs DELETE which generates massive undo logs)"
+    weight: 0.30
+    description: "Practical maintenance plan"

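The maintenance plan (create next month's partition ahead of time, drop partitions past retention) can be driven by a small generator run from cron. A minimal sketch, assuming the monthly RANGE (TO_DAYS(created_at)) scheme and pYYYYMM naming from option A, the 90-day retention from query pattern 3, and no MAXVALUE catch-all partition (with one, new months would have to be split off it with REORGANIZE PARTITION instead of ADD PARTITION).

```python
from datetime import date, timedelta
from typing import List


def add_months(month: date, n: int) -> date:
    """First day of the month n months after `month`."""
    years, index = divmod(month.month - 1 + n, 12)
    return date(month.year + years, index + 1, 1)


def partition_name(month: date) -> str:
    return f"p{month:%Y%m}"


def add_partition_sql(table: str, month: date) -> str:
    """ADD PARTITION for `month`, bounded by the first day of the next month."""
    upper = add_months(month, 1)
    return (f"ALTER TABLE {table} ADD PARTITION (PARTITION {partition_name(month)} "
            f"VALUES LESS THAN (TO_DAYS('{upper.isoformat()}')))")


def partitions_to_drop(months: List[date], today: date,
                       retention_days: int = 90) -> List[str]:
    """Partitions whose every possible row is older than the retention cutoff."""
    cutoff = today - timedelta(days=retention_days)
    return [partition_name(m) for m in months if add_months(m, 1) <= cutoff]


def drop_partitions_sql(table: str, names: List[str]) -> str:
    return f"ALTER TABLE {table} DROP PARTITION {', '.join(names)}"
```

Because each partition is checked against its upper bound, retention is month-granular: a partition is dropped only once its newest possible row is past the cutoff, so up to roughly 90 days plus one month of data stays online.
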
package/courses/mysql-query-optimization/scenarios/level-3/query-profiling-deep-dive.yaml ADDED

@@ -0,0 +1,84 @@
+meta:
+  id: query-profiling-deep-dive
+  level: 3
+  course: mysql-query-optimization
+  type: output
+  description: "Deep-dive query profiling — use EXPLAIN ANALYZE, optimizer trace, and handler counters to understand exactly where query time is spent"
+  tags: [MySQL, profiling, EXPLAIN-ANALYZE, optimizer-trace, handler-counters, advanced]
+state: {}
+trigger: |
+  A critical API query takes 2 seconds but EXPLAIN estimates show it
+  should be fast (few rows, good indexes). You need to understand
+  exactly where the 2 seconds are spent — is it I/O, CPU, lock waits,
+  or something else entirely?
+  The query:
+  SELECT o.id, o.total, o.status,
+         c.name, c.email,
+         GROUP_CONCAT(oi.product_name ORDER BY oi.price DESC) AS products
+  FROM orders o
+  JOIN customers c ON c.id = o.customer_id
+  JOIN order_items oi ON oi.order_id = o.id
+  WHERE o.customer_id = 42
+    AND o.created_at >= '2025-01-01'
+  GROUP BY o.id
+  ORDER BY o.created_at DESC
+  LIMIT 10;
+  EXPLAIN shows good plans (all using indexes, few rows).
+  But actual execution: 2 seconds.
+  Profiling tools available:
+  1. EXPLAIN ANALYZE (MySQL 8.0.18+):
+     Shows actual time spent at each step, not just estimates.
+  2. Optimizer Trace:
+     SET optimizer_trace = 'enabled=on';
+     SELECT ...; -- your query
+     SELECT * FROM information_schema.optimizer_trace;
+     Shows every decision the optimizer made.
+  3. Handler counters:
+     FLUSH STATUS;
+     SELECT ...; -- your query
+     SHOW STATUS LIKE 'Handler%';
+     Shows exact row-level operations (reads, writes, sorts).
+  4. Performance Schema events:
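+     SELECT * FROM performance_schema.events_stages_history
+     WHERE THREAD_ID = PS_CURRENT_THREAD_ID();
+     Shows time spent in each execution stage (stage instruments and the
+     events_stages_history consumer must be enabled first).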
+  5. Status counters:
+     SHOW STATUS LIKE 'Sort%';
+     SHOW STATUS LIKE 'Created_tmp%';
+     Shows temporary table and sort operations.
+  Your investigation reveals:
+  - Handler_read_next: 50,000 (reading 50K order_items)
+  - Created_tmp_disk_tables: 1 (GROUP_CONCAT overflow to disk)
+  - Sort_merge_passes: 5 (sorting on disk, not in memory)
+  - GROUP_CONCAT result for one order has 500+ items
+  Task: Write the query profiling guide. Include: how to use EXPLAIN
+  ANALYZE to find time bottlenecks, how to read optimizer trace output,
+  what handler counters reveal about row access patterns, how to
+  identify temporary table and sort issues, and the fix for this
+  specific query.
+assertions:
+  - type: llm_judge
+    criteria: "EXPLAIN ANALYZE interpretation is explained — shows how to read actual time=A..B (A = milliseconds to the first row, B = milliseconds to all rows, both averaged per loop, so multiply by loops), how actual rows vs estimated rows reveals bad estimates, and identifies which iterator in the tree is consuming the most time. This is the primary tool for finding where 2 seconds is spent"
+    weight: 0.35
+    description: "EXPLAIN ANALYZE interpretation"
+  - type: llm_judge
+    criteria: "Handler counters and temporary tables are decoded — Handler_read_next=50K means 50K sequential index reads (scanning through order_items), Created_tmp_disk_tables=1 means the internal temporary table overflowed to disk (it exceeded the in-memory limit set by tmp_table_size / temptable_max_ram; exceeding group_concat_max_len truncates the result with a warning rather than spilling to disk), and Sort_merge_passes=5 indicates the sort didn't fit in sort_buffer_size"
+    weight: 0.35
+    description: "Counters decoded"
+  - type: llm_judge
+    criteria: "Fix for the specific query is provided — the GROUP_CONCAT on 500+ items per order is the bottleneck (overflows to disk temp table). Solutions: cap the items per order (MySQL's GROUP_CONCAT has no LIMIT clause, so rank items in a subquery with ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY price DESC) and filter), increase group_concat_max_len (to avoid truncation) and tmp_table_size, or restructure to return order items separately instead of concatenating"
+    weight: 0.30
+    description: "Specific query fix"

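Finding which iterator consumes the most time in EXPLAIN ANALYZE output can be scripted: multiply each node's last-row time by its loops to get an inclusive total, then subtract the children's totals to approximate self time. This is a heuristic sketch, not a MySQL tool; the sample plan, its numbers and the index names idx_customer_date and idx_order are invented for illustration and only loosely mirror the query above.

```python
import re
from dataclasses import dataclass, field
from typing import List

# "(actual time=FIRST..LAST rows=R loops=L)"; times are per-loop averages in ms.
ACTUAL = re.compile(
    r"actual time=(\d+(?:\.\d+)?)\.\.(\d+(?:\.\d+)?) "
    r"rows=(\d+(?:\.\d+)?) loops=(\d+)")

# Invented sample output, indented 4 spaces per level like EXPLAIN ANALYZE.
SAMPLE_PLAN = """
-> Limit: 10 row(s)  (actual time=1950.2..1950.3 rows=10 loops=1)
    -> Sort: o.created_at DESC  (actual time=1950.1..1950.2 rows=10 loops=1)
        -> Stream results  (actual time=12.4..1948.7 rows=37 loops=1)
            -> Group aggregate: group_concat(oi.product_name)  (actual time=12.3..1948.1 rows=37 loops=1)
                -> Nested loop inner join  (actual time=0.4..1890.5 rows=50000 loops=1)
                    -> Index range scan on o using idx_customer_date  (actual time=0.2..0.9 rows=37 loops=1)
                    -> Index lookup on oi using idx_order (order_id=o.id)  (actual time=0.1..51.0 rows=1351 loops=37)
"""


@dataclass
class Node:
    label: str
    depth: int
    inclusive_ms: float                  # last-row time * loops
    children: List["Node"] = field(default_factory=list)

    @property
    def self_ms(self) -> float:
        """Approximate time spent in this iterator excluding its children."""
        return max(0.0, self.inclusive_ms - sum(c.inclusive_ms for c in self.children))


def parse(plan: str) -> List[Node]:
    nodes: List[Node] = []
    stack: List[Node] = []
    for line in plan.splitlines():
        match = ACTUAL.search(line)
        if "->" not in line or match is None:
            continue
        depth = len(line) - len(line.lstrip())
        label = line.strip()[2:].split("  (")[0].strip()
        node = Node(label, depth, float(match.group(2)) * int(match.group(4)))
        while stack and stack[-1].depth >= depth:
            stack.pop()                  # climb back up to this node's parent
        if stack:
            stack[-1].children.append(node)
        stack.append(node)
        nodes.append(node)
    return nodes


def hottest(plan: str) -> Node:
    return max(parse(plan), key=lambda n: n.self_ms)
```

In this invented plan the lookup into order_items dominates (37 loops of about 51 ms each), the same shape as the Handler_read_next=50,000 symptom: the remedy is to read fewer order_items rows per order rather than to tune the join itself.
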
package/courses/mysql-query-optimization/scenarios/level-3/replication-optimization.yaml ADDED

@@ -0,0 +1,66 @@
+meta:
+  id: replication-optimization
+  level: 3
+  course: mysql-query-optimization
+  type: output
+  description: "Optimize MySQL replication — tune source-replica lag, implement parallel replication, and design read scaling topology"
+  tags: [MySQL, replication, source-replica, parallel-replication, lag, advanced]
+state: {}
+trigger: |
+  Your MySQL replication is falling behind. The replica consistently
+  lags 30-60 seconds behind the source during peak hours, causing stale
+  reads in your read-heavy application. The analytics team's queries on
+  the replica further increase the lag.
+  Current topology:
+  - Source: MySQL 8.0, db.r6g.4xlarge (16 vCPU, 128GB)
+  - Replica 1: db.r6g.2xlarge (8 vCPU, 64GB) — application reads
+  - Replica 2: db.r6g.2xlarge (8 vCPU, 64GB) — analytics
+  - Binlog format: ROW
+  - replica_parallel_workers = 0 (single-threaded replay!)
+  Replication lag investigation:
+  SHOW REPLICA STATUS:
+  - Seconds_Behind_Source: 45
+  - Relay_Log_Space: 2GB (large backlog)
+  - Replica_SQL_Running_State: "Applying batch of row changes"
+  Source write patterns:
+  - 10K writes/second peak
+  - Large transactions: nightly ETL process writes 5M rows in one txn
+  - DDL operations: weekly ALTER TABLE on 100M-row table
+  - Binlog output: 500MB/hour
+  Issues:
+  1. Single-threaded SQL applier can't keep up with multi-threaded source
+  2. Large transactions block all other replication until complete
+  3. Analytics queries on Replica 2 compete for CPU with SQL applier
+  4. No parallel replication configured
+  5. ROW format binlog is large (with the default binlog_row_image=FULL,
+     every column of every changed row)
+  MySQL 8.0 parallel replication options:
+  A. LOGICAL_CLOCK: parallel replay based on commit timestamps
+  B. DATABASE: parallel replay per database (limited if single database)
+  C. WRITESET (set on the source via binlog_transaction_dependency_tracking):
+     tracks row-level dependencies, giving LOGICAL_CLOCK replicas the most parallelism
+  Task: Design the replication optimization. Write: the parallel
+  replication configuration (which mode, how many workers), the large
+  transaction handling strategy, the replica sizing (match source
+  capacity), the binlog format trade-offs (ROW vs MIXED vs STATEMENT),
+  and the lag monitoring and alerting approach.
+assertions:
+  - type: llm_judge
+    criteria: "Parallel replication is correctly configured — enables replica_parallel_workers=16 (matching source CPU), sets replica_parallel_type=LOGICAL_CLOCK with binlog_transaction_dependency_tracking=WRITESET on the source, explains why DATABASE mode doesn't help with a single database, and notes that WRITESET provides the most parallelism by tracking row-level dependencies"
+    weight: 0.35
+    description: "Parallel replication configured"
+  - type: llm_judge
+    criteria: "Large transaction and sizing issues are addressed — recommends breaking large ETL transactions into smaller batches (50K rows per commit), explains that a single large transaction serializes all replication (blocks other workers), and recommends matching replica hardware to source (8 vCPU replica can't keep up with 16 vCPU source)"
+    weight: 0.35
+    description: "Large transactions and sizing"
+  - type: llm_judge
+    criteria: "Binlog format and monitoring are explained — ROW format: largest binlog but safest (deterministic), STATEMENT: smallest but non-deterministic risk, MIXED: auto-selects. Monitoring: Seconds_Behind_Source, Relay_Log_Space, performance_schema.replication_applier_status_by_worker for per-worker lag, and alerting when lag exceeds application tolerance (e.g., 10 seconds)"
+    weight: 0.30
+    description: "Binlog format and monitoring"
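
Two of the recommendations reduce to arithmetic that is easy to sanity-check: a replica only catches up when its apply rate exceeds the source's write rate, and a 5M-row ETL transaction can be split into fixed-size commits. A sketch under simplifying assumptions: steady rates, a backlog approximated as lag times source write rate, and hypothetical apply rates; the 50K-row batch size comes from the rubric above, and the per-batch statement shape is a hypothetical ETL design.

```python
import math
from typing import Iterator, Tuple


def catch_up_seconds(lag_s: float, source_rate: float, apply_rate: float) -> float:
    """Time for a replica to drain its backlog under steady rates.

    The backlog is approximated as lag_s * source_rate (row changes). The
    replica drains it only at the surplus apply_rate - source_rate; with no
    surplus, lag never shrinks.
    """
    if apply_rate <= source_rate:
        return math.inf
    return lag_s * source_rate / (apply_rate - source_rate)


def etl_batches(first_id: int, last_id: int,
                batch_rows: int = 50_000) -> Iterator[Tuple[int, int]]:
    """Inclusive id ranges; e.g. one INSERT ... SELECT ... WHERE id BETWEEN
    start AND end, then COMMIT, per range (hypothetical ETL shape)."""
    start = first_id
    while start <= last_id:
        end = min(start + batch_rows - 1, last_id)
        yield start, end
        start = end + 1
```

For instance, with a hypothetical 10K writes/s on the source, a single-threaded applier managing 8K/s never recovers from 45 s of lag, while a parallel applier at 16K/s recovers in 75 s. Splitting 5M rows into 100 commits lets other transactions interleave on the replica, but the ETL job loses its all-or-nothing guarantee and must be restartable from the last committed range.
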