@pageai/ralph-loop 1.9.0 → 1.10.0
- package/.agents/skills/mysql/SKILL.md +81 -0
- package/.agents/skills/mysql/references/character-sets.md +66 -0
- package/.agents/skills/mysql/references/composite-indexes.md +59 -0
- package/.agents/skills/mysql/references/connection-management.md +70 -0
- package/.agents/skills/mysql/references/covering-indexes.md +47 -0
- package/.agents/skills/mysql/references/data-types.md +69 -0
- package/.agents/skills/mysql/references/deadlocks.md +72 -0
- package/.agents/skills/mysql/references/explain-analysis.md +66 -0
- package/.agents/skills/mysql/references/fulltext-indexes.md +28 -0
- package/.agents/skills/mysql/references/index-maintenance.md +110 -0
- package/.agents/skills/mysql/references/isolation-levels.md +49 -0
- package/.agents/skills/mysql/references/json-column-patterns.md +77 -0
- package/.agents/skills/mysql/references/n-plus-one.md +77 -0
- package/.agents/skills/mysql/references/online-ddl.md +53 -0
- package/.agents/skills/mysql/references/partitioning.md +92 -0
- package/.agents/skills/mysql/references/primary-keys.md +70 -0
- package/.agents/skills/mysql/references/query-optimization-pitfalls.md +117 -0
- package/.agents/skills/mysql/references/replication-lag.md +46 -0
- package/.agents/skills/mysql/references/row-locking-gotchas.md +63 -0
- package/.agents/skills/postgres/SKILL.md +46 -0
- package/.agents/skills/postgres/references/backup-recovery.md +41 -0
- package/.agents/skills/postgres/references/index-optimization.md +69 -0
- package/.agents/skills/postgres/references/indexing.md +61 -0
- package/.agents/skills/postgres/references/memory-management-ops.md +39 -0
- package/.agents/skills/postgres/references/monitoring.md +59 -0
- package/.agents/skills/postgres/references/mvcc-transactions.md +38 -0
- package/.agents/skills/postgres/references/mvcc-vacuum.md +41 -0
- package/.agents/skills/postgres/references/optimization-checklist.md +19 -0
- package/.agents/skills/postgres/references/partitioning.md +79 -0
- package/.agents/skills/postgres/references/process-architecture.md +46 -0
- package/.agents/skills/postgres/references/ps-cli-api-insights.md +53 -0
- package/.agents/skills/postgres/references/ps-cli-commands.md +72 -0
- package/.agents/skills/postgres/references/ps-connection-pooling.md +72 -0
- package/.agents/skills/postgres/references/ps-connections.md +37 -0
- package/.agents/skills/postgres/references/ps-extensions.md +27 -0
- package/.agents/skills/postgres/references/ps-insights.md +62 -0
- package/.agents/skills/postgres/references/query-patterns.md +80 -0
- package/.agents/skills/postgres/references/replication.md +49 -0
- package/.agents/skills/postgres/references/schema-design.md +66 -0
- package/.agents/skills/postgres/references/storage-layout.md +41 -0
- package/.agents/skills/postgres/references/wal-operations.md +42 -0
- package/README.md +1 -1
- package/bin/cli.js +2 -0
- package/bin/lib/shadcn.js +1 -1
- package/package.json +1 -1
@@ -0,0 +1,110 @@
---
title: Index Maintenance and Cleanup
description: Index maintenance
tags: mysql, indexes, maintenance, unused-indexes, performance
---

# Index Maintenance

## Find Unused Indexes

```sql
-- Requires performance_schema enabled (default in MySQL 5.7+)
-- "Unused" here means no reads/writes since last restart.
SELECT object_schema, object_name, index_name, COUNT_READ, COUNT_WRITE
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE object_schema = 'mydb'
  AND index_name IS NOT NULL AND index_name != 'PRIMARY'
  AND COUNT_READ = 0 AND COUNT_WRITE = 0
ORDER BY object_name, index_name;
```

Sometimes you'll also see indexes with **writes but no reads** (overhead without query benefit). Review these carefully: some are required for constraints (UNIQUE/PK) even if not used in query plans.

```sql
SELECT object_schema, object_name, index_name, COUNT_READ, COUNT_WRITE
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE object_schema = 'mydb'
  AND index_name IS NOT NULL AND index_name != 'PRIMARY'
  AND COUNT_READ = 0 AND COUNT_WRITE > 0
ORDER BY COUNT_WRITE DESC;
```

Counters reset on restart; make sure the server has been up for at least one full business cycle before dropping anything.
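
Before trusting zero-read counters, it helps to confirm how long they have been accumulating. The following is a minimal sketch, assuming the `sys` schema is installed and reusing the placeholder schema name `mydb` from the queries above:

```sql
-- Seconds since the server started; the usage counters are at most this old.
SHOW GLOBAL STATUS LIKE 'Uptime';

-- sys view built on the same performance_schema data:
-- lists indexes with no recorded usage since the counters were last reset.
SELECT object_schema, object_name, index_name
FROM sys.schema_unused_indexes
WHERE object_schema = 'mydb';
```

If the uptime is shorter than your longest periodic job (month-end reports, for example), treat the "unused" list as provisional.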

## Find Redundant Indexes

Index on `(a)` is redundant if `(a, b)` exists (leftmost prefix covers it). Pairs sharing only the first column (e.g. `(a,b)` vs `(a,c)`) need manual review; neither is redundant.

```sql
-- Prefer sys schema view (MySQL 5.7.7+)
SELECT table_schema, table_name,
       redundant_index_name, redundant_index_columns,
       dominant_index_name, dominant_index_columns
FROM sys.schema_redundant_indexes
WHERE table_schema = 'mydb';
```

## Check Index Sizes

```sql
SELECT database_name, table_name, index_name,
       ROUND(stat_value * @@innodb_page_size / 1024 / 1024, 2) AS size_mb
FROM mysql.innodb_index_stats
WHERE stat_name = 'size' AND database_name = 'mydb'
ORDER BY stat_value DESC;
-- stat_value is in pages; multiply by innodb_page_size for bytes
```

## Index Write Overhead
Each index must be updated on INSERT, UPDATE, and DELETE operations. More indexes = slower writes.

- **INSERT**: each secondary index adds a write
- **UPDATE**: changing indexed columns updates all affected indexes
- **DELETE**: removes entries from all indexes

InnoDB can defer some secondary index updates via the change buffer, but excessive indexing still reduces write throughput.

## Update Statistics (ANALYZE TABLE)
The optimizer relies on index cardinality and distribution statistics. After large data changes, refresh statistics:

```sql
ANALYZE TABLE orders;
```

This updates statistics (does not rebuild the table).

## Rebuild / Reclaim Space (OPTIMIZE TABLE)
`OPTIMIZE TABLE` can reclaim space and rebuild indexes:

```sql
OPTIMIZE TABLE orders;
```

For InnoDB this effectively rebuilds the table and indexes and can be slow on large tables.
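
Because a rebuild is expensive, it can be worth estimating how much space is reclaimable first. This is a rough sketch, assuming the placeholder schema `mydb`; the figures in `information_schema.TABLES` are estimates and may be cached in MySQL 8.0:

```sql
-- Tables with the most free (reclaimable) space inside their tablespace.
SELECT table_name,
       ROUND(data_length  / 1024 / 1024, 2) AS data_mb,
       ROUND(index_length / 1024 / 1024, 2) AS index_mb,
       ROUND(data_free    / 1024 / 1024, 2) AS free_mb
FROM information_schema.TABLES
WHERE table_schema = 'mydb'
  AND engine = 'InnoDB'
ORDER BY data_free DESC
LIMIT 20;
```

A table whose `free_mb` is small relative to `data_mb + index_mb` is unlikely to benefit much from `OPTIMIZE TABLE`.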

## Invisible Indexes (MySQL 8.0+)
Test removing an index without dropping it:

```sql
ALTER TABLE orders ALTER INDEX idx_status INVISIBLE;
ALTER TABLE orders ALTER INDEX idx_status VISIBLE;
```

Invisible indexes are still maintained on writes (overhead remains), but the optimizer won't consider them.
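
To confirm the current state while testing, visibility can be read from `information_schema`. A small sketch, assuming the placeholder schema `mydb` and the `orders` / `idx_status` names used above:

```sql
-- One row per index; is_visible is 'YES' or 'NO' (column available in MySQL 8.0).
SELECT DISTINCT index_name, is_visible
FROM information_schema.STATISTICS
WHERE table_schema = 'mydb' AND table_name = 'orders';

-- Only after an observation period with no regressions (and human review):
ALTER TABLE orders DROP INDEX idx_status;
```

Making the index visible again is a metadata change, so rolling back a bad guess is much cheaper than re-creating a dropped index.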

## Index Maintenance Tools

### Online DDL (Built-in)
Most add/drop index operations run online (concurrent DML is allowed) but still take brief metadata locks:

```sql
ALTER TABLE orders ADD INDEX idx_status (status), ALGORITHM=INPLACE, LOCK=NONE;
```

### pt-online-schema-change / gh-ost
For very large tables or high-write workloads, online schema change tools can reduce blocking by using a shadow table and a controlled cutover (tradeoffs: operational complexity, privileges, triggers/binlog requirements).

## Guidelines
- 1–5 indexes per table is normal. 6+: audit for redundancy.
- About once a month, review `performance_schema` usage data together with `EXPLAIN` output for your most frequent queries.
@@ -0,0 +1,49 @@
---
title: InnoDB Transaction Isolation Levels
description: Best practices for choosing and using isolation levels
tags: mysql, transactions, isolation, innodb, locking, concurrency
---

# Isolation Levels (InnoDB Best Practices)

**Default to REPEATABLE READ.** It is the InnoDB default, most tested, and prevents phantom reads. Only change per-session with a measured reason.

```sql
SELECT @@transaction_isolation; -- named tx_isolation before MySQL 5.7.20
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED; -- per-session only
```

## Autocommit Interaction
- Default: `autocommit=1` (each statement is its own transaction).
- With `autocommit=0`, transactions span multiple statements until `COMMIT`/`ROLLBACK`.
- Isolation level applies per transaction. SERIALIZABLE behavior differs based on autocommit setting (see SERIALIZABLE section).

## Locking vs Non-Locking Reads
- **Non-locking reads**: plain `SELECT` statements use consistent reads (MVCC snapshots). They don't acquire locks and don't block writers.
- **Locking reads**: `SELECT ... FOR UPDATE` (exclusive) or `SELECT ... FOR SHARE` (shared; `LOCK IN SHARE MODE` before MySQL 8.0) acquire locks and can block concurrent modifications.
- `UPDATE` and `DELETE` statements are implicitly locking reads.

## REPEATABLE READ (Default, Prefer This)
- Consistent reads: snapshot established at first read; all plain SELECTs within the transaction read from that same snapshot (MVCC). Plain SELECTs are non-locking and don't block writers.
- Locking reads/writes use **next-key locks** (row + gap), which prevents phantoms. Exception: a unique index with a unique search condition locks only the index record, not the gap.
- **Use for**: OLTP, check-then-insert, financial logic, reports needing consistent snapshots.
- **Avoid mixing** locking statements (`SELECT ... FOR UPDATE`, `UPDATE`, `DELETE`) with non-locking `SELECT` statements in the same transaction; they can observe different states (current vs snapshot) and lead to surprises.
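
The check-then-insert case mentioned above can be sketched as follows. This is an illustration only: the `reservations` table, its columns, and an index on `(room_id, day)` are hypothetical and not part of the original guidance.

```sql
START TRANSACTION;

-- Locking read: sees the latest committed rows and, under REPEATABLE READ,
-- locks the searched range so other sessions cannot insert into it.
SELECT id
FROM reservations
WHERE room_id = 42 AND day = '2025-06-01'
FOR UPDATE;

-- Application logic: run the INSERT only if the SELECT returned no rows.
INSERT INTO reservations (room_id, day, guest_id)
VALUES (42, '2025-06-01', 7);

COMMIT;
```

Two sessions that both find no row can still deadlock at the `INSERT`, because gap locks do not conflict with each other; the application should be prepared to retry on a deadlock error.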

## READ COMMITTED (Per-Session Only, When Needed)
- Fresh snapshot per SELECT; **record locks only** (gap locks disabled for searches/index scans, but still used for foreign-key and duplicate-key checks). This gives more concurrency, but phantoms are possible.
- **Switch only when**: gap-lock deadlocks confirmed via `SHOW ENGINE INNODB STATUS`, bulk imports with contention, or high-write concurrency on overlapping ranges.
- **Never switch globally.** Check-then-insert patterns break; use `INSERT ... ON DUPLICATE KEY UPDATE` or `FOR UPDATE` instead.
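
A per-session switch combined with an atomic upsert might look like the sketch below. The `user_settings` table and its UNIQUE or PRIMARY key on `user_id` are assumptions made for illustration.

```sql
-- Affects only this connection; the global default stays REPEATABLE READ.
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Atomic upsert: relies on the unique key on user_id instead of a prior SELECT,
-- so it does not depend on gap locks to stay correct.
INSERT INTO user_settings (user_id, theme)
VALUES (7, 'dark')
ON DUPLICATE KEY UPDATE theme = 'dark';
```

Because the uniqueness check happens inside the `INSERT` itself, the statement behaves the same under READ COMMITTED and REPEATABLE READ.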

## SERIALIZABLE: Avoid
Converts all plain SELECTs to `SELECT ... FOR SHARE` **if autocommit is disabled**. If autocommit is enabled, SELECTs are consistent (non-locking) reads. SERIALIZABLE can cause massive contention when autocommit is disabled. Prefer explicit `SELECT ... FOR UPDATE` at REPEATABLE READ instead: comparable safety for the rows you actually need to protect, with far less lock scope.

## READ UNCOMMITTED: Never Use
Dirty reads with no valid production use case.

## Decision Guide
| Scenario | Recommendation |
|---|---|
| General OLTP / check-then-insert / reports | **REPEATABLE READ** (default) |
| Bulk import or gap-lock deadlocks | **READ COMMITTED** (per-session), benchmark first |
| Need serializability | Explicit `FOR UPDATE` at REPEATABLE READ; SERIALIZABLE only as last resort |
@@ -0,0 +1,77 @@
---
title: JSON Column Best Practices
description: When and how to use JSON columns safely
tags: mysql, json, generated-columns, indexes, data-modeling
---

# JSON Column Patterns

MySQL 5.7+ supports native JSON columns. Useful, but with important caveats.

## When JSON Is Appropriate
- Truly schema-less data (user preferences, metadata bags, webhook payloads).
- Rarely filtered/joined: if you query a JSON path frequently, extract it to a real column.

## Indexing JSON: Use Generated Columns
You **cannot** index a JSON column directly. Create a virtual generated column and index that:
```sql
ALTER TABLE events
  ADD COLUMN event_type VARCHAR(50) GENERATED ALWAYS AS (data->>'$.type') VIRTUAL,
  ADD INDEX idx_event_type (event_type);
```
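
To check that the index is actually picked up, filter on the generated column and inspect the plan. A minimal sketch, assuming the `events` table above also has an `id` column and that `'signup'` is one of its event types (both hypothetical):

```sql
-- Filter on the generated column so the optimizer can consider idx_event_type.
EXPLAIN
SELECT id
FROM events
WHERE event_type = 'signup';
-- In the output, look for idx_event_type in the `key` column.
```

If `key` is `NULL`, the optimizer chose not to use the index; compare the column's type and collation with the literal being searched for.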

## Extraction Operators
| Syntax | Returns | Use for |
|---|---|---|
| `JSON_EXTRACT(col, '$.key')` | JSON type value (e.g., `"foo"` for strings) | When you need JSON type semantics |
| `col->'$.key'` | Same as `JSON_EXTRACT(col, '$.key')` | Shorthand |
| `col->>'$.key'` | Unquoted scalar (equivalent to `JSON_UNQUOTE(JSON_EXTRACT(col, '$.key'))`) | WHERE comparisons, display |

Use `->>` (unquote) in WHERE clauses so that you compare plain strings rather than JSON string values such as `"foo"` (with quotes).

Tip: `data->>'$.type'` in the generated column example above is shorthand for `JSON_UNQUOTE(JSON_EXTRACT(data, '$.type'))`; both forms define the same column.

## Multi-Valued Indexes (MySQL 8.0.17+)
If you store arrays in JSON (e.g., `tags: ["electronics","sale"]`), MySQL 8.0.17+ supports multi-valued indexes to index array elements:

```sql
ALTER TABLE products
  ADD INDEX idx_tags ((CAST(tags AS CHAR(50) ARRAY)));
```

This can accelerate membership queries such as:

```sql
SELECT * FROM products WHERE 'electronics' MEMBER OF (tags);
```

## Collation and Type Casting Pitfalls
- **JSON type comparisons**: `JSON_EXTRACT` returns JSON type. Comparing directly to strings can be wrong for numbers/dates.

```sql
-- WRONG: lexicographic string comparison
WHERE data->>'$.price' <= '1200'

-- CORRECT: cast to numeric
WHERE CAST(data->>'$.price' AS UNSIGNED) <= 1200
```

- **Collation**: values extracted with `->>` behave like strings and use a collation. Use `COLLATE` when you need a specific comparison behavior.

```sql
WHERE data->>'$.status' COLLATE utf8mb4_0900_as_cs = 'Active'
```
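
The string-versus-number pitfall above is easy to reproduce with literals, without any table. The sketch below also uses `DECIMAL` rather than `UNSIGNED`, which is an assumption about the data (prices with a fractional part):

```sql
-- String comparison goes character by character: '9' sorts after '1'.
SELECT '900' <= '1200' AS string_compare;    -- 0 (false)

-- Numeric comparison behaves as expected.
SELECT 900 <= 1200 AS numeric_compare;       -- 1 (true)

-- For prices with decimals, DECIMAL keeps the fractional part
-- that a cast to UNSIGNED would discard.
SELECT CAST('19.99' AS DECIMAL(10,2)) <= 20 AS decimal_compare;  -- 1 (true)
```

Pick the cast target to match the values actually stored in the JSON path; an integer cast is only safe when the values are whole numbers.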

## Common Pitfalls
- **Heavy update cost**: `JSON_SET`/`JSON_REPLACE` can touch large portions of a JSON document and generate significant redo/undo work on large blobs.
- **No partial indexes**: You can only index extracted scalar paths via generated columns (or array elements via multi-valued indexes in MySQL 8.0.17+).
- **Large documents hurt**: JSON is stored like a BLOB. With the default 16 KB page size, documents larger than roughly 8 KB spill to overflow pages, hurting read performance.
- **Type mismatches**: `JSON_EXTRACT` returns a JSON type. Comparing with `= 'foo'` may not match; use `->>` or `JSON_UNQUOTE`.
- **VIRTUAL vs STORED generated columns**: VIRTUAL columns compute on read (less storage, more CPU). STORED columns materialize on write (more storage, faster reads if selected often). Both can be indexed; for indexed paths, the index stores the computed value either way.
@@ -0,0 +1,77 @@
---
title: N+1 Query Detection and Fixes
description: N+1 query solutions
tags: mysql, n-plus-one, orm, query-optimization, performance
---

# N+1 Query Detection

## What Is N+1?
The N+1 pattern occurs when you fetch N parent records, then execute N additional queries (one per parent) to fetch related data.

Example: 1 query for users + N queries for posts.

## ORM Fixes (Quick Reference)

- **SQLAlchemy 1.x**: `session.query(User).options(joinedload(User.posts))`
- **SQLAlchemy 2.0**: `select(User).options(joinedload(User.posts))`
- **Django**: `select_related('fk_field')` for FK/O2O, `prefetch_related('m2m_field')` for M2M/reverse FK
- **ActiveRecord**: `User.includes(:orders)`
- **Prisma**: `findMany({ include: { orders: true } })`
- **Drizzle**: use `.leftJoin()` instead of loop queries

```typescript
// Drizzle example: avoid N+1 with a join
// Assumes `db`, `users`, and `posts` come from your own Drizzle setup and schema.
import { eq } from "drizzle-orm";

const rows = await db
  .select()
  .from(users)
  .leftJoin(posts, eq(users.id, posts.userId));
```

## Detecting in MySQL Production

```sql
-- High-frequency simple queries often indicate N+1
-- Requires performance_schema enabled (default in MySQL 5.7+)
SELECT digest_text, count_star, avg_timer_wait
FROM performance_schema.events_statements_summary_by_digest
ORDER BY count_star DESC LIMIT 20;
```

Also check the slow query log sorted by count (e.g. `mysqldumpslow -s c`) for frequently repeated simple SELECTs. Only statements slower than `long_query_time` are logged, so fast N+1 queries may not show up there.

## Batch Consolidation
Replace sequential queries with `WHERE id IN (...)`.
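
As a before/after illustration of that consolidation, assuming a `posts` table with a `user_id` column (names reused from the examples in this file, ids chosen arbitrarily):

```sql
-- N+1 shape: one round trip per parent row.
SELECT * FROM posts WHERE user_id = 1;
SELECT * FROM posts WHERE user_id = 2;
SELECT * FROM posts WHERE user_id = 3;

-- Consolidated: one round trip; group the rows by user_id in the application.
SELECT *
FROM posts
WHERE user_id IN (1, 2, 3)
ORDER BY user_id;
```

Ordering by `user_id` is optional, but it makes grouping the result set per parent a single linear pass on the application side.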

Practical limits:
- Total statement size is capped by `max_allowed_packet` (server default: 4 MB in MySQL 5.7, 64 MB in MySQL 8.0).
- Very large IN lists increase parsing/planning overhead and can hurt performance.

Strategies:
- Up to ~1000–5000 ids: `IN (...)` is usually fine.
- Larger: chunk the list (e.g. batches of 500–1000) or use a temporary table and join.

```sql
-- Temporary table approach for large batches
CREATE TEMPORARY TABLE temp_user_ids (id BIGINT PRIMARY KEY);
INSERT INTO temp_user_ids VALUES (1), (2), (3);

SELECT p.*
FROM posts p
JOIN temp_user_ids t ON p.user_id = t.id;
```

## Joins vs Separate Queries
- Prefer **JOINs** when you need related data for most/all parent rows and the result set stays reasonable.
- Prefer **separate queries** (batched) when JOINs would explode rows (one-to-many) or over-fetch too much data.

## Eager Loading Caveats
- **Over-fetching**: eager loading pulls *all* related rows unless you filter it.
- **Memory**: loading large collections can blow up memory.
- **Row multiplication**: JOIN-based eager loading can create huge result sets; in some ORMs, a "select-in" strategy is safer.

## Prepared Statements
Prepared statements reduce repeated parse/optimize overhead for repeated parameterized queries, but they do **not** eliminate N+1: you still execute N queries. Use batching/eager loading to reduce query count.

## Pagination Pitfalls
N+1 often reappears per page. Ensure eager loading or batching is applied to the paginated query, not inside the per-row loop.
@@ -0,0 +1,53 @@
---
title: Online DDL and Schema Migrations
description: Lock-safe ALTER TABLE guidance
tags: mysql, ddl, schema-migration, alter-table, innodb
---

# Online DDL

Not all `ALTER TABLE` operations are equal: some block writes for the entire duration.

## Algorithm Spectrum

| Algorithm | What Happens | DML During? |
|---|---|---|
| `INSTANT` | Metadata-only change | Yes |
| `INPLACE` | Rebuilds in background | Usually yes |
| `COPY` | Full table copy to tmp table | **Writes blocked** (reads allowed) |

MySQL picks the fastest available. Specify the algorithm explicitly so the statement fails instead of falling back:
```sql
ALTER TABLE orders ADD COLUMN note VARCHAR(255) DEFAULT NULL, ALGORITHM=INSTANT;
-- Fails loudly if INSTANT isn't possible, rather than silently falling back to INPLACE or COPY.
```

## What Supports INSTANT (MySQL 8.0.12+)
- Adding a column (at any position as of 8.0.29; only at end before 8.0.29)
- Dropping a column (8.0.29+)
- Renaming a column (8.0.28+)

**Not INSTANT**: adding indexes (uses INPLACE), dropping indexes (uses INPLACE; typically metadata-only), changing column type, extending VARCHAR (uses INPLACE), adding columns when INSTANT isn't supported for the table/operation.

## Lock Levels
`LOCK=NONE` (concurrent DML), `LOCK=SHARED` (reads only), `LOCK=EXCLUSIVE` (full block), `LOCK=DEFAULT` (server chooses maximum concurrency; default).

Always request `LOCK=NONE` (and an explicit `ALGORITHM`) to surface conflicts early instead of silently falling back to a more blocking method.

## Large Tables (millions+ rows)
Even `INPLACE` operations typically hold brief metadata locks at start/end. The commit phase requires an exclusive metadata lock and will wait for concurrent transactions to finish; long-running transactions can block DDL from completing.
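
One way to reduce the risk described above is to look for long-running transactions before starting the DDL and to bound how long the DDL may wait. A sketch under stated assumptions: the 5-second timeout is an arbitrary example value, and the `ALTER TABLE` statement is borrowed from the index examples elsewhere in this package.

```sql
-- Oldest open InnoDB transactions first; long-lived ones can stall the DDL commit phase.
SELECT trx_mysql_thread_id AS thread_id,
       trx_started,
       TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS age_seconds
FROM information_schema.INNODB_TRX
ORDER BY trx_started
LIMIT 10;

-- Give up after 5 seconds of waiting for a metadata lock instead of queueing behind it.
SET SESSION lock_wait_timeout = 5;
ALTER TABLE orders ADD INDEX idx_status (status), ALGORITHM=INPLACE, LOCK=NONE;
```

A waiting DDL statement also blocks new queries on the same table behind it, so failing fast and retrying is usually less disruptive than waiting indefinitely.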

On huge tables, consider external tools:
- **pt-online-schema-change**: creates shadow table, syncs via triggers.
- **gh-ost**: triggerless, uses binlog stream. Preferred for high-write tables.

## Replication Considerations
- DDL replicates to replicas and executes there, potentially causing lag (especially COPY-like rebuilds).
- INSTANT operations minimize replication impact because they complete quickly.
- INPLACE operations can still cause lag and metadata lock waits on replicas during apply.

## PlanetScale Users
On PlanetScale, use **deploy requests** instead of manual DDL tools. Vitess handles non-blocking migrations automatically. Use this whenever possible because it offers much safer schema migrations.

## Key Rule
Never run `ALTER TABLE` on production without checking the algorithm. A surprise `COPY` on a 100M-row table can lock writes for hours.
@@ -0,0 +1,92 @@
---
title: MySQL Partitioning
description: Partition types and management operations
tags: mysql, partitioning, range, list, hash, maintenance, data-retention
---

# Partitioning

All columns used in the partitioning expression must be part of every UNIQUE/PRIMARY KEY.

## Partition Pruning
The optimizer can eliminate partitions that cannot contain matching rows based on the WHERE clause ("partition pruning"). Partitioning helps most when queries frequently filter by the partition key/expression:
- Equality: `WHERE partition_key = ?` (HASH/KEY)
- Ranges: `WHERE partition_key BETWEEN ? AND ?` (RANGE)
- IN lists: `WHERE partition_key IN (...)` (LIST)
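
Whether pruning actually happens for a given query can be checked with `EXPLAIN`. A minimal sketch, assuming a hypothetical `events` table that is RANGE-partitioned on `created_at`:

```sql
EXPLAIN
SELECT COUNT(*)
FROM events
WHERE created_at >= '2025-01-01' AND created_at < '2025-04-01';
-- The `partitions` column of the output lists only the partitions that will be read.
```

If every partition appears in that column, the predicate is not being matched against the partitioning expression and the query scans the whole table.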

## Types

| Need | Type |
|---|---|
| Time-ordered / data retention | RANGE |
| Discrete categories | LIST |
| Even distribution | HASH / KEY |
| Two access patterns | RANGE + HASH sub |

```sql
-- RANGE COLUMNS (direct date comparisons; avoids function wrapper)
PARTITION BY RANGE COLUMNS (created_at) (
  PARTITION p2025_q1 VALUES LESS THAN ('2025-04-01'),
  PARTITION p_future VALUES LESS THAN (MAXVALUE)
);

-- RANGE with function (use when you must partition by an expression)
PARTITION BY RANGE (TO_DAYS(created_at)) (
  PARTITION p2025_q1 VALUES LESS THAN (TO_DAYS('2025-04-01')),
  PARTITION p_future VALUES LESS THAN MAXVALUE
);
-- LIST (discrete categories; unlisted values cause errors, ensure full coverage)
PARTITION BY LIST COLUMNS (region) (
  PARTITION p_americas VALUES IN ('us', 'ca', 'br'),
  PARTITION p_europe VALUES IN ('uk', 'de', 'fr')
);
-- HASH/KEY (even distribution, equality pruning only)
PARTITION BY HASH (user_id) PARTITIONS 8;
```

## Foreign Key Restrictions (InnoDB)
Partitioned InnoDB tables do not support foreign keys:
- A partitioned table cannot define foreign key constraints to other tables.
- Other tables cannot reference a partitioned table with a foreign key.

If you need foreign keys, partitioning may not be an option.

## When Partitioning Helps vs Hurts
**Helps:**
- Very large tables (millions+ rows) with time-ordered access patterns
- Data retention workflows (drop old partitions vs DELETE)
- Queries that filter by the partition key/expression (enables pruning)
- Maintenance on subsets of data (operate on partitions vs whole table)

**Hurts:**
- Small tables (overhead without benefit)
- Queries that don't filter by the partition key (no pruning)
- Workloads that require foreign keys
- Complex UNIQUE key requirements (partition key columns must be included everywhere)

## Management Operations

```sql
-- Add: split catch-all MAXVALUE partition
ALTER TABLE events REORGANIZE PARTITION p_future INTO (
  PARTITION p2026_01 VALUES LESS THAN (TO_DAYS('2026-02-01')),
  PARTITION p_future VALUES LESS THAN MAXVALUE
);
-- Drop aged-out data (orders of magnitude faster than DELETE)
ALTER TABLE events DROP PARTITION p2025_q1;
-- Merge partitions
ALTER TABLE events REORGANIZE PARTITION p2025_01, p2025_02, p2025_03 INTO (
  PARTITION p2025_q1 VALUES LESS THAN (TO_DAYS('2025-04-01'))
);
-- Archive via exchange (LIKE copies the partitioning too, so remove it; both tables must match in structure)
CREATE TABLE events_archive LIKE events;
ALTER TABLE events_archive REMOVE PARTITIONING;
ALTER TABLE events EXCHANGE PARTITION p2025_q1 WITH TABLE events_archive;
```

Notes:
- `REORGANIZE PARTITION` rebuilds the affected partition(s).
- `EXCHANGE PARTITION` requires an exact structure match (including indexes) and the target table must not be partitioned.
- `DROP PARTITION` is DDL (fast) vs `DELETE` (DML; slow on large datasets).
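
Before running any of these operations, it is useful to list the current partitions and their boundaries. A small sketch, assuming the placeholder schema `mydb` and the `events` table from the examples above:

```sql
SELECT partition_name,
       partition_method,
       partition_description,   -- the VALUES LESS THAN / VALUES IN boundary
       table_rows               -- an estimate for InnoDB, not an exact count
FROM information_schema.PARTITIONS
WHERE table_schema = 'mydb' AND table_name = 'events'
ORDER BY partition_ordinal_position;
```

Checking `partition_description` against the partition name helps catch a mislabeled partition before it is dropped or exchanged.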

Always ask for human approval before dropping, deleting, or archiving data.
@@ -0,0 +1,70 @@
---
title: Primary Key Design
description: Primary key patterns
tags: mysql, primary-keys, auto-increment, uuid, innodb
---

# Primary Keys

InnoDB stores rows in primary key order (clustered index). This means:
- **Sequential keys = optimal inserts**: new rows append, minimizing page splits and fragmentation.
- **Random keys = fragmentation**: random inserts cause page splits to maintain PK order, wasting space and slowing inserts.
- **Secondary index lookups**: secondary indexes store the PK value and use it to fetch the full row from the clustered index.

## INT vs BIGINT for Primary Keys
- **INT UNSIGNED**: 4 bytes, max ~4.3B rows.
- **BIGINT UNSIGNED**: 8 bytes, max ~18.4 quintillion rows.

Guideline: default to **BIGINT UNSIGNED** unless you're certain the table will never approach the INT limit. The extra 4 bytes is usually cheaper than the risk of exhausting INT.
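
For existing tables that already use `INT UNSIGNED`, the remaining headroom can be estimated from the next auto-increment value. A rough sketch, assuming the placeholder schema `mydb`; in MySQL 8.0 the `auto_increment` figure in `information_schema.TABLES` may be cached, so treat it as approximate:

```sql
-- 4294967295 is the INT UNSIGNED maximum (2^32 - 1).
SELECT table_name,
       auto_increment AS next_id,
       ROUND(auto_increment / 4294967295 * 100, 2) AS pct_of_int_unsigned
FROM information_schema.TABLES
WHERE table_schema = 'mydb'
  AND auto_increment IS NOT NULL
ORDER BY auto_increment DESC;
```

The percentage is only meaningful for tables whose key column really is `INT UNSIGNED`; for signed `INT` the limit is about half that value.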

## Avoid Random UUID as Clustered PK
- UUID PK stored as `BINARY(16)`: 16 bytes (vs 8 for BIGINT). Random inserts cause page splits, and every secondary index entry carries the PK.
- UUID stored as `CHAR(36)`/`VARCHAR(36)`: 36 bytes (+ overhead) and is generally worse for storage and index size.
- If external identifiers are required, store UUID as `BINARY(16)` in a secondary unique column:

```sql
CREATE TABLE users (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  public_id BINARY(16) NOT NULL,
  UNIQUE KEY idx_public_id (public_id)
);
-- UUID_TO_BIN(uuid, 1) reorders UUIDv1 bytes to be roughly time-sorted (reduces fragmentation).
-- MySQL's UUID() returns UUIDv1 (time-based), which is what the swap flag is designed for.
-- The flag gives no ordering benefit for random UUIDv4. UUIDv7/ULID are already time-ordered:
-- store those with UUID_TO_BIN(?) and no swap flag.
INSERT INTO users (public_id) VALUES (UUID_TO_BIN(?, 1)); -- app provides a UUIDv1 string
```
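
Reading the value back uses the inverse function with the same swap flag. In this sketch the UUID literal is just a sample version 1 value, and the `users` table is the one defined above:

```sql
-- Look up by external id; the swap flag must match the one used on insert.
SELECT id, BIN_TO_UUID(public_id, 1) AS public_id
FROM users
WHERE public_id = UUID_TO_BIN('6ccd780c-baba-1026-9564-5b8c656024db', 1);
```

Mixing flags (inserting with `1`, reading with `0`) returns a different string than the one the application supplied, so it is worth wrapping both calls in one helper.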

If UUIDs are required, prefer time-ordered variants such as UUIDv7 (app-generated) to reduce index fragmentation.

## Secondary Indexes Include the Primary Key
InnoDB secondary indexes store the primary key value with each index entry. Implications:
- **Larger secondary indexes**: a secondary index entry includes (indexed columns + PK bytes).
- **Covering reads**: `SELECT id FROM users WHERE email = ?` can often be satisfied from `INDEX(email)` because `id` (PK) is already present in the index entry.
- **UUID penalty**: a `BINARY(16)` PK makes every secondary index entry 8 bytes larger than a BIGINT PK.

## Auto-Increment Considerations
- **Hot spot**: inserts target the end of the clustered index (usually fine; can bottleneck at extreme insert rates).
- **Gaps are normal**: rollbacks or failed inserts can leave gaps.
- **Locking**: auto-increment allocation can introduce contention under very high concurrency.

## Alternative Ordered IDs (Snowflake / ULID / UUIDv7)
If you need globally unique IDs generated outside the database:
- **Snowflake-style**: 64-bit integers (fits in BIGINT), time-ordered, compact.
- **ULID / UUIDv7**: 128-bit (store as `BINARY(16)`), time-ordered, better insert locality than random UUIDv4.

Recommendation: prefer `BIGINT AUTO_INCREMENT` unless you need distributed ID generation or externally meaningful identifiers.

## Replication Considerations
- Random-key insert patterns (UUIDv4) can amplify page splits and I/O on replicas too, increasing lag.
- Time-ordered IDs reduce fragmentation and tend to replicate more smoothly under heavy insert workloads.

## Composite Primary Keys

Use for join/many-to-many tables. Most-queried column first:

```sql
CREATE TABLE user_roles (
  user_id BIGINT UNSIGNED NOT NULL,
  role_id BIGINT UNSIGNED NOT NULL,
  PRIMARY KEY (user_id, role_id)
);
```
@@ -0,0 +1,117 @@
|
|
|
1
|
+
---
|
|
2
|
+
title: Query Optimization Pitfalls
|
|
3
|
+
description: Common anti-patterns that silently kill performance
|
|
4
|
+
tags: mysql, query-optimization, anti-patterns, performance, indexes
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# Query Optimization Pitfalls
|
|
8
|
+
|
|
9
|
+
These patterns look correct but bypass indexes or cause full scans.
|
|
10
|
+
|
|
11
|
+
## Non-Sargable Predicates
|
|
12
|
+
A **sargable** predicate can use an index. Common non-sargable patterns:
|
|
13
|
+
- functions/arithmetic on indexed columns
|
|
14
|
+
- implicit type conversions
|
|
15
|
+
- leading wildcards (`LIKE '%x'`)
|
|
16
|
+
- some negations (`!=`, `NOT IN`, `NOT LIKE`) depending on shape/data
|
|
17
|
+
|
|
18
|
+
## Functions on Indexed Columns
|
|
19
|
+
```sql
|
|
20
|
+
-- BAD: function prevents index use on created_at
|
|
21
|
+
WHERE YEAR(created_at) = 2024
|
|
22
|
+
|
|
23
|
+
-- GOOD: sargable range
|
|
24
|
+
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01'
|
|
25
|
+
```
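
The same rewrite applies to other date functions. A sketch for a single-day filter, assuming `created_at` is an indexed `DATETIME` or `TIMESTAMP` column:

```sql
-- BAD: DATE() is evaluated for every row
WHERE DATE(created_at) = '2024-03-15'

-- GOOD: half-open range covering the same day
WHERE created_at >= '2024-03-15' AND created_at < '2024-03-16'
```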
|
|
26
|
+
|
|
27
|
+
MySQL 8.0.13+ can use expression (functional) indexes for some cases:
|
|
28
|
+
|
|
29
|
+
```sql
|
|
30
|
+
CREATE INDEX idx_users_upper_name ON users ((UPPER(name)));
|
|
31
|
+
-- Now this can use idx_users_upper_name:
|
|
32
|
+
WHERE UPPER(name) = 'SMITH'
|
|
33
|
+
```
|
|
34
|
+
|
|
35
|
+
## Implicit Type Conversions
|
|
36
|
+
Implicit casts can make indexes unusable:
|
|
37
|
+
|
|
38
|
+
```sql
|
|
39
|
+
-- If phone is VARCHAR, both sides are compared as numbers: every phone value is converted, so the index on phone is not used
|
|
40
|
+
WHERE phone = 1234567890
|
|
41
|
+
|
|
42
|
+
-- Better: match the column type
|
|
43
|
+
WHERE phone = '1234567890'
|
|
44
|
+
```
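
The conversion only hurts when it has to be applied to the column. A sketch of how to check both directions with `EXPLAIN`, assuming a hypothetical `orders` table with an indexed integer `id` and a `users` table with an indexed `phone` column:

```sql
-- id is an integer column: '42' is converted to a number once,
-- so the index on id still applies
EXPLAIN SELECT * FROM orders WHERE id = '42';

-- phone is VARCHAR: compare the two plans to see whether
-- the numeric literal costs you the index
EXPLAIN SELECT * FROM users WHERE phone = 1234567890;
EXPLAIN SELECT * FROM users WHERE phone = '1234567890';
```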
|
|
45
|
+
|
|
46
|
+
## LIKE Patterns
|
|
47
|
+
```sql
|
|
48
|
+
-- BAD: leading wildcard cannot use a B-Tree index
|
|
49
|
+
WHERE name LIKE '%smith'
|
|
50
|
+
WHERE name LIKE '%smith%'
|
|
51
|
+
|
|
52
|
+
-- GOOD: prefix match can use an index
|
|
53
|
+
WHERE name LIKE 'smith%'
|
|
54
|
+
```
|
|
55
|
+
|
|
56
|
+
For suffix search, consider storing a reversed generated column + prefix search:
|
|
57
|
+
|
|
58
|
+
```sql
|
|
59
|
+
ALTER TABLE users
|
|
60
|
+
ADD COLUMN name_reversed VARCHAR(255) AS (REVERSE(name)) STORED,
|
|
61
|
+
ADD INDEX idx_users_name_reversed (name_reversed);
|
|
62
|
+
|
|
63
|
+
WHERE name_reversed LIKE CONCAT(REVERSE('smith'), '%');
|
|
64
|
+
```
|
|
65
|
+
|
|
66
|
+
For infix search at scale, use `FULLTEXT` (when appropriate) or a dedicated search engine.
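
A minimal `FULLTEXT` sketch on the same `users.name` column (the index name is illustrative). Note that with the default parser, `FULLTEXT` matches whole words rather than arbitrary substrings, so it is not a drop-in replacement for `LIKE '%smith%'`:

```sql
ALTER TABLE users ADD FULLTEXT INDEX ft_users_name (name);

-- Matches rows where "smith" appears as a word in name
SELECT id, name
FROM users
WHERE MATCH(name) AGAINST('smith' IN NATURAL LANGUAGE MODE);
```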
|
|
67
|
+
|
|
68
|
+
## `OR` Across Different Columns
|
|
69
|
+
`OR` across different columns often prevents efficient index use.
|
|
70
|
+
|
|
71
|
+
```sql
|
|
72
|
+
-- Often suboptimal
|
|
73
|
+
WHERE status = 'active' OR region = 'us-east'
|
|
74
|
+
|
|
75
|
+
-- Often better: two indexed queries
|
|
76
|
+
SELECT * FROM orders WHERE status = 'active'
|
|
77
|
+
UNION -- not UNION ALL: a row matching both predicates must appear only once
|
|
78
|
+
SELECT * FROM orders WHERE region = 'us-east';
|
|
79
|
+
```
|
|
80
|
+
|
|
81
|
+
MySQL can sometimes use `index_merge`, but it's frequently slower than a purpose-built composite index or a UNION rewrite.
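
When rewriting `OR` as a union, keep in mind that a row matching both predicates is produced by both branches, so a plain `UNION ALL` would return it twice. One way to keep `UNION ALL` (and skip the deduplication pass that `UNION` performs) is to exclude the first branch's rows from the second. A sketch, assuming separate indexes exist on `status` and `region`:

```sql
SELECT * FROM orders WHERE status = 'active'
UNION ALL
-- <=> is the NULL-safe equality operator, so rows with a NULL status are kept
SELECT * FROM orders WHERE region = 'us-east' AND NOT (status <=> 'active');
```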
|
|
82
|
+
|
|
83
|
+
## ORDER BY + LIMIT Without an Index
|
|
84
|
+
`LIMIT` does not automatically make sorting cheap. If no index supports the order, MySQL may sort many rows (`Using filesort`) and then apply LIMIT.
|
|
85
|
+
|
|
86
|
+
```sql
|
|
87
|
+
-- Needs an index on created_at (or it will filesort)
|
|
88
|
+
SELECT * FROM orders ORDER BY created_at DESC LIMIT 10;
|
|
89
|
+
|
|
90
|
+
-- For WHERE + ORDER BY, you usually need a composite index:
|
|
91
|
+
-- (status, created_at DESC)
|
|
92
|
+
SELECT * FROM orders
|
|
93
|
+
WHERE status = 'pending'
|
|
94
|
+
ORDER BY created_at DESC
|
|
95
|
+
LIMIT 10;
|
|
96
|
+
```
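
A sketch of the composite index described in the comment above, followed by a plan check (the index name is illustrative):

```sql
CREATE INDEX idx_orders_status_created_at ON orders (status, created_at DESC);

EXPLAIN
SELECT * FROM orders
WHERE status = 'pending'
ORDER BY created_at DESC
LIMIT 10;
-- Look for the new index in the `key` column and no "Using filesort" in `Extra`.
```

Descending key parts are honored in MySQL 8.0+. An ascending `(status, created_at)` index can usually serve this query as well, since MySQL can scan an index backward.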
|
|
97
|
+
|
|
98
|
+
## DISTINCT / GROUP BY
|
|
99
|
+
`DISTINCT` and `GROUP BY` can trigger temp tables and sorts (`Using temporary`, `Using filesort`) when indexes don't match.
|
|
100
|
+
|
|
101
|
+
```sql
|
|
102
|
+
-- Often improved by an index on (status)
|
|
103
|
+
SELECT DISTINCT status FROM orders;
|
|
104
|
+
|
|
105
|
+
-- Often improved by an index on (status)
|
|
106
|
+
SELECT status, COUNT(*) FROM orders GROUP BY status;
|
|
107
|
+
```
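
To confirm the effect, one approach is to add the index and compare the plans before and after (the index name is illustrative):

```sql
CREATE INDEX idx_orders_status ON orders (status);

EXPLAIN SELECT DISTINCT status FROM orders;
EXPLAIN SELECT status, COUNT(*) FROM orders GROUP BY status;
-- With a matching index, `Extra` should typically no longer show "Using temporary".
```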
|
|
108
|
+
|
|
109
|
+
## Derived Tables / CTE Materialization
|
|
110
|
+
Derived tables and CTEs may be materialized into temporary tables, which can be slower than a flattened query. If performance is surprising, check `EXPLAIN` and consider rewriting the query or adding supporting indexes.
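
A sketch of the kind of rewrite this refers to, assuming a hypothetical `orders` table with an indexed `user_id` column. Newer MySQL 8.0 releases may push the outer condition into the derived table on their own, so treat `EXPLAIN` as the source of truth:

```sql
-- Derived table: may aggregate every user's orders before the outer filter applies
SELECT t.user_id, t.order_count
FROM (
  SELECT user_id, COUNT(*) AS order_count
  FROM orders
  GROUP BY user_id
) AS t
WHERE t.user_id = 42;

-- Flattened: the filter is applied first, so an index on orders (user_id) can be used
SELECT user_id, COUNT(*) AS order_count
FROM orders
WHERE user_id = 42
GROUP BY user_id;
```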
|
|
111
|
+
|
|
112
|
+
## Other Quick Rules
|
|
113
|
+
- **`OFFSET` pagination**: `OFFSET N` scans and discards N rows, so deep pages get progressively slower. Use cursor-based (keyset) pagination instead.
|
|
114
|
+
- **`SELECT *`** defeats covering indexes because every column must be read from the full row. Select only needed columns.
|
|
115
|
+
- **`NOT IN` with NULLs**: `NOT IN (subquery)` returns no rows if the subquery returns any NULL. Use `NOT EXISTS`.
|
|
116
|
+
- **`COUNT(*)` vs `COUNT(col)`**: `COUNT(*)` counts all rows; `COUNT(col)` skips NULLs.
|
|
117
|
+
- **Arithmetic on indexed columns**: `WHERE price * 1.1 > 100` prevents index use. Rewrite to keep the column bare: `WHERE price > 100 / 1.1`.
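
Two of the rules above, sketched against hypothetical `orders` and `users` tables (column names and the literal `1000` are illustrative):

```sql
-- Keyset pagination: pass the last id of the previous page instead of an OFFSET
SELECT id, created_at, status
FROM orders
WHERE id > 1000
ORDER BY id
LIMIT 20;

-- NULL-safe anti-join: users with no orders, even if orders.user_id contains NULLs
SELECT u.id
FROM users u
WHERE NOT EXISTS (
  SELECT 1 FROM orders o WHERE o.user_id = u.id
);
```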
|