clearctx 3.0.0 → 3.2.0

@@ -0,0 +1,315 @@
1
+ ---
2
+ name: postgresql
3
+ description: Production-grade PostgreSQL schema design, query optimization, migrations, and operational patterns
4
+ domain: database
5
+ keywords: [postgresql, postgres, sql, schema, migrations, indexes, queries, transactions, pgbouncer]
6
+ version: 1.0.0
7
+ ---
8
+
9
+ # PostgreSQL — Expertise Guide
10
+
11
+ ## Worker Context
12
+
13
+ You are a PostgreSQL specialist implementing production database code. Follow these rules exactly.
14
+
15
+ ### Naming Conventions
16
+
17
+ - Tables: `snake_case`, singular (`user_account`, NOT `users` or `UserAccounts`)
18
+ - Columns: `snake_case` (`created_at`, NOT `createdAt`)
19
+ - Primary keys: `id` (always)
20
+ - Foreign keys: `{referenced_table}_id` (`organization_id`, NOT `org` or `orgId`)
21
+ - Indexes: `idx_{table}_{columns}` (`idx_order_customer_id`)
22
+ - Constraints: `{type}_{table}_{columns}` — `pk_order`, `uq_user_account_email`, `fk_order_customer_id`, `ck_order_status`
23
+ - Enums: lowercase strings matching application enums exactly (`'pending'`, `'in_progress'`, NOT `'Pending'`)
24
+
25
+ ### Data Type Selection
26
+
27
+ | Use Case | Use This | NEVER This | Why |
28
+ |----------|----------|------------|-----|
29
+ | Primary keys | `uuid` via `gen_random_uuid()` | `serial`/`bigserial` | Non-guessable, safe for distributed systems, no sequence contention |
30
+ | Timestamps | `timestamptz` | `timestamp` | `timestamp` silently drops timezone — data corruption across TZs |
31
+ | Money/currency | `integer` (cents) or `numeric(19,4)` | `float`/`double precision`/`real` | Floating point loses precision: `0.1 + 0.2 != 0.3` |
32
+ | Structured JSON | `jsonb` | `json` | `jsonb` is binary, indexable, deduplicates keys; `json` is raw text |
33
+ | Boolean flags | `boolean` with `NOT NULL DEFAULT false` | `integer` 0/1 | Semantic clarity, type safety |
34
+ | Variable text | `text` | `varchar(n)` | PostgreSQL stores both identically; `varchar(n)` adds check overhead with no storage benefit |
35
+ | IP addresses | `inet` | `text` | Native validation, comparison operators, indexing |
36
+ | Intervals/durations | `interval` | `integer` (seconds) | Native arithmetic: `now() + interval '30 days'` |
37
+
38
+ CRITICAL: Every table MUST have `id uuid PRIMARY KEY DEFAULT gen_random_uuid()`, `created_at timestamptz NOT NULL DEFAULT now()`, and `updated_at timestamptz NOT NULL DEFAULT now()`.
39
+
40
+ ### Schema Design
41
+
42
+ **Normalize by default.** Denormalize only when you can prove a read query requires 3+ JOINs on tables exceeding 1M rows AND the query runs >100ms after index optimization.
43
+
44
+ When denormalizing:
45
+ - Add a comment explaining WHY (`-- denormalized from order_item for dashboard query perf`)
46
+ - Create a trigger or application hook to keep denormalized data in sync
47
+ - NEVER denormalize without a sync mechanism — stale data is worse than slow reads
48
+
49
+ **Foreign keys MUST have indexes.** Every `_id` column that references another table gets an index. No exceptions. Missing FK indexes cause full table scans on JOINs and CASCADE deletes.
50
+
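A minimal sketch of the foreign-key rule above, assuming PostgreSQL 13+ (where `gen_random_uuid()` is built in). The `customer` and `"order"` columns are hypothetical and chosen only for illustration. Note that `order` is a reserved word in PostgreSQL, so it must be double-quoted when used as a table name:

```sql
-- Parent table (hypothetical columns, for illustration only)
CREATE TABLE customer (
    id         uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    name       text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    updated_at timestamptz NOT NULL DEFAULT now()
);

-- Child table with a named foreign-key constraint
CREATE TABLE "order" (
    id          uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    customer_id uuid NOT NULL,
    total_cents integer NOT NULL,
    created_at  timestamptz NOT NULL DEFAULT now(),
    updated_at  timestamptz NOT NULL DEFAULT now(),
    CONSTRAINT fk_order_customer_id
        FOREIGN KEY (customer_id) REFERENCES customer (id) ON DELETE CASCADE
);

-- PostgreSQL indexes the referenced side (the primary key) automatically,
-- but not the referencing column, so that index is created explicitly.
CREATE INDEX idx_order_customer_id ON "order" (customer_id);
```

Without the last statement, deleting a `customer` row would have to scan all of `"order"` to find the rows to cascade to.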
51
+ ### Index Strategy
52
+
53
+ ```
54
+ Default decision tree:
55
+ Equality lookup (WHERE x = ?) → B-tree (default)
56
+ Range scan (WHERE x > ? AND x < ?) → B-tree
57
+ Full-text search → GIN on tsvector column
58
+ JSONB containment (@>, ?) → GIN
59
+ Array containment (@>, &&) → GIN
60
+ Geometric/spatial → GiST
61
+ Pattern matching (LIKE 'prefix%') → B-tree (with text_pattern_ops)
62
+ Pattern matching (LIKE '%infix%') → GIN with pg_trgm
63
+ ```
64
+
65
+ **Composite indexes:** Column order matters. Put equality columns first, range columns last. `(status, created_at)` for `WHERE status = 'active' AND created_at > ?` — NOT `(created_at, status)`.
66
+
67
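+ **Partial indexes:** Use when queries filter on a constant. `CREATE INDEX idx_order_active ON "order" (created_at) WHERE status = 'pending'` indexes only pending orders, so it is dramatically smaller. (`order` is a reserved word in PostgreSQL and must be double-quoted wherever it is used as a table name.)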
68
+
69
+ **When NOT to index:** Tables under 10K rows (a seq scan is usually faster). Low-selectivity predicates that match most of the table (e.g., boolean `is_active` where 95% are true); the planner will usually choose a seq scan anyway. Write-heavy columns that rarely appear in WHERE clauses.
70
+
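The decision tree and composite-index rule above might translate into DDL along these lines. This is a sketch that assumes the `pg_trgm` extension can be installed and that a `customer.name` text column exists; the index names are illustrative, not prescribed by this guide:

```sql
-- Trigram support for infix LIKE searches (ships with PostgreSQL contrib)
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Composite B-tree: equality column first, range column last
CREATE INDEX idx_order_status_created_at ON "order" (status, created_at);

-- B-tree usable for LIKE 'prefix%' even when the database locale is not C
CREATE INDEX idx_customer_name_prefix ON customer (name text_pattern_ops);

-- GIN trigram index for LIKE '%infix%'
CREATE INDEX idx_customer_name_trgm ON customer USING gin (name gin_trgm_ops);
```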
71
+ ### Migrations
72
+
73
+ CRITICAL: Every migration MUST have a corresponding rollback (down migration). Test rollbacks before deploying.
74
+
75
+ **Zero-downtime checklist:**
76
+ 1. NEVER `ALTER TABLE ... ALTER COLUMN ... TYPE` on large tables (rewrites entire table, locks exclusively)
77
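+ 2. NEVER add a `NOT NULL` column without a default to a populated table (the statement fails because existing rows would hold NULL). Adding a column with a default rewrites the table on PG <11; on PG 11+ a constant default is metadata-only, but a volatile default such as `gen_random_uuid()` still rewrites the table.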
78
+ 3. NEVER drop columns directly — first stop reading, deploy, then drop in a follow-up migration
79
+ 4. Add new columns as nullable first, backfill, then add NOT NULL constraint separately
80
+ 5. Rename columns using dual-write: add new column → copy data → update app → drop old column
81
+
82
+ **Safe migration template:**
83
+ ```sql
84
+ -- UP
85
+ BEGIN;
86
+ ALTER TABLE "order" ADD COLUMN IF NOT EXISTS discount_cents integer;
87
+ COMMIT;
88
+
89
+ -- DOWN
90
+ BEGIN;
91
+ ALTER TABLE "order" DROP COLUMN IF EXISTS discount_cents;
92
+ COMMIT;
93
+ ```
94
+
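Checklist item 4 (nullable first, backfill, then NOT NULL) could continue from the `discount_cents` column in the template roughly as follows. This is a sketch: the batch size of 10000, the backfill value `0`, and the constraint name are assumptions, and step 4 relies on PostgreSQL 12+ behavior:

```sql
-- Step 1: backfill in small batches; rerun until it reports UPDATE 0
UPDATE "order"
SET discount_cents = 0
WHERE id IN (
    SELECT id
    FROM "order"
    WHERE discount_cents IS NULL
    LIMIT 10000
);

-- Step 2: add the check without scanning existing rows
ALTER TABLE "order"
    ADD CONSTRAINT ck_order_discount_cents
    CHECK (discount_cents IS NOT NULL) NOT VALID;

-- Step 3: validate; this scans the table but does not block normal writes
ALTER TABLE "order" VALIDATE CONSTRAINT ck_order_discount_cents;

-- Step 4 (PG 12+): SET NOT NULL can rely on the validated check
-- instead of scanning the table again, after which the check is redundant
ALTER TABLE "order" ALTER COLUMN discount_cents SET NOT NULL;
ALTER TABLE "order" DROP CONSTRAINT ck_order_discount_cents;
```

Each step runs as its own short transaction, so no single statement holds a heavy lock for the duration of the backfill.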
95
+ ### Query Optimization
96
+
97
+ **Read EXPLAIN ANALYZE output — focus on these red flags:**
98
+ - `Seq Scan` on tables >10K rows → missing index
99
+ - `Nested Loop` with high `rows` on inner side → consider `Hash Join` (check work_mem)
100
+ - `Sort` with `external merge Disk` → increase `work_mem` for session
101
+ - Estimated rows wildly off from actual rows → run `ANALYZE` on the table
102
+
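One way to act on the red flags above, sketched against the `"order"` table used elsewhere in this guide. The `64MB` value is an arbitrary illustration, not a recommendation. Keep in mind that `EXPLAIN ANALYZE` actually executes the statement:

```sql
-- Inspect the real plan, row counts, and buffer usage
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, o.total_cents
FROM "order" o
WHERE o.status = 'pending'
ORDER BY o.created_at DESC
LIMIT 50;

-- Estimated rows far from actual rows: refresh planner statistics
ANALYZE "order";

-- Sort spilling to disk: raise work_mem for a single transaction only
BEGIN;
SET LOCAL work_mem = '64MB';
-- run the sorting query here
COMMIT;
```

`SET LOCAL` reverts at the end of the transaction, so the larger setting does not leak to other queries on a pooled connection.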
103
+ **N+1 prevention:** NEVER issue queries inside loops. Use JOINs, subqueries, or `WHERE id = ANY($1::uuid[])` with an array parameter.
104
+
105
+ ```sql
106
+ -- GOOD: Single query with JOIN
107
+ SELECT o.id, o.total_cents, c.name
108
+ FROM "order" o
109
+ JOIN customer c ON c.id = o.customer_id
110
+ WHERE o.status = 'pending';
111
+
112
+ -- BAD: N+1 — one query per order
113
+ SELECT * FROM "order" WHERE status = 'pending';
114
+ -- then for EACH order:
115
+ SELECT * FROM customer WHERE id = $1;
116
+ ```
117
+
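The array-parameter form mentioned above can be sketched with a server-side prepared statement. The statement name and the UUID values are placeholders for illustration; in application code the driver would normally bind the array as `$1`:

```sql
-- One round trip for many ids instead of one query per id
PREPARE customers_by_ids (uuid[]) AS
SELECT id, name
FROM customer
WHERE id = ANY($1);

EXECUTE customers_by_ids(ARRAY[
    '11111111-1111-1111-1111-111111111111',
    '22222222-2222-2222-2222-222222222222'
]::uuid[]);

DEALLOCATE customers_by_ids;
```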
118
+ **Pagination:** NEVER use `OFFSET` for deep pagination (scans and discards rows). Use keyset/cursor pagination:
119
+ ```sql
120
+ -- GOOD: Keyset pagination (constant performance)
121
+ SELECT * FROM "order"
122
+ WHERE (created_at, id) < ($cursor_created_at, $cursor_id) -- id breaks ties between equal timestamps
123
+ ORDER BY created_at DESC, id DESC
124
+ LIMIT 20;
125
+
126
+ -- BAD: Offset pagination (degrades linearly)
127
+ SELECT * FROM "order" ORDER BY created_at DESC LIMIT 20 OFFSET 10000;
128
+ ```
129
+
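Keyset pagination only stays fast when an index matches the sort order. The sketch below assumes a cursor made of the last row's `created_at` and `id` (the `id` acts as a tie-breaker when two rows share a timestamp); the index and statement names are illustrative:

```sql
-- Index that matches the ORDER BY of the paginated query
CREATE INDEX idx_order_created_at_id ON "order" (created_at DESC, id DESC);

-- $1 and $2 are the created_at and id of the last row on the previous page
PREPARE next_order_page (timestamptz, uuid) AS
SELECT id, total_cents, created_at
FROM "order"
WHERE (created_at, id) < ($1, $2)
ORDER BY created_at DESC, id DESC
LIMIT 20;
```

The row comparison `(created_at, id) < ($1, $2)` lets the planner start the index scan directly at the cursor position instead of reading and discarding earlier rows.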
130
+ ### Transactions
131
+
132
+ | Isolation Level | Use When | Cost |
133
+ |----------------|----------|------|
134
+ | `READ COMMITTED` (default) | Standard CRUD operations. Fine for most workloads. | Lowest |
135
+ | `REPEATABLE READ` | Reports, analytics that need consistent snapshot. Retrying on serialization failure is acceptable. | Medium |
136
+ | `SERIALIZABLE` | Financial transactions, inventory decrements where race conditions cause data loss. MUST implement retry logic for serialization failures. | Highest |
137
+
138
+ Keep transactions short. NEVER hold transactions open during external HTTP calls or user input waits.
139
+
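A short `SERIALIZABLE` transaction for an inventory decrement might look like this. The `inventory_item` table, its `quantity` column, and the UUID are hypothetical names used only for this sketch:

```sql
BEGIN ISOLATION LEVEL SERIALIZABLE;

-- Decrement only if stock remains; zero rows updated means out of stock
UPDATE inventory_item
SET quantity = quantity - 1,
    updated_at = now()
WHERE id = '00000000-0000-0000-0000-000000000001'
  AND quantity > 0;

COMMIT;

-- If any statement or the COMMIT fails with SQLSTATE 40001
-- (serialization_failure), issue ROLLBACK and rerun the whole
-- transaction from BEGIN; the retry loop lives in application code.
```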
140
+ ### Connection Pooling
141
+
142
+ - Application → PgBouncer → PostgreSQL. NEVER connect directly from app to PG in production.
143
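+ - Use `transaction` pooling mode. PgBouncer's own default is `session`, so set `pool_mode = transaction` explicitly. Use `session` mode only if you rely on session state such as SQL-level prepared statements, LISTEN/NOTIFY, or session-level advisory locks.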
144
+ - Size the number of active server connections starting from cores * 2 + effective_spindle_count, and keep PG `max_connections` modest (typically 100-200). Let PgBouncer handle thousands of app connections.
145
+ - Connection limit per pool: match your app's max concurrent DB operations, NOT total HTTP connections.
146
+
147
+ ---
148
+
149
+ ## Conventions
150
+
151
+ ### SQL Formatting
152
+ ```sql
153
+ -- Keywords: UPPERCASE
154
+ -- Identifiers: lowercase snake_case
155
+ -- One clause per line
156
+ -- Indent JOIN conditions and subqueries
157
+
158
+ SELECT
159
+ o.id,
160
+ o.total_cents,
161
+ c.name AS customer_name
162
+ FROM "order" o
163
+ JOIN customer c ON c.id = o.customer_id
164
+ WHERE o.status = 'pending'
165
+ AND o.created_at > now() - interval '30 days'
166
+ ORDER BY o.created_at DESC
167
+ LIMIT 50;
168
+ ```
169
+
170
+ ### Migration File Naming
171
+ ```
172
+ {timestamp}_{description}.sql
173
+ 20260215143000_create_order_table.sql
174
+ 20260215144500_add_discount_to_order.sql
175
+ ```
176
+
177
+ ### Default Table Template
178
+ ```sql
179
+ CREATE TABLE {table_name} (
180
+ id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
181
+ -- domain columns here
182
+ created_at timestamptz NOT NULL DEFAULT now(),
183
+ updated_at timestamptz NOT NULL DEFAULT now()
184
+ );
185
+
186
+ -- Reusable trigger function (create once per database, before any trigger that uses it)
187
+ CREATE OR REPLACE FUNCTION set_updated_at()
188
+ RETURNS TRIGGER AS $$
189
+ BEGIN
190
+ NEW.updated_at = now();
191
+ RETURN NEW;
192
+ END;
193
+ $$ LANGUAGE plpgsql;
194
+
195
+ -- updated_at trigger (one per table)
196
+ CREATE TRIGGER set_{table_name}_updated_at
197
+ BEFORE UPDATE ON {table_name}
198
+ FOR EACH ROW
199
+ EXECUTE FUNCTION set_updated_at();
200
+ ```
201
+
202
+ ---
203
+
204
+ ## Common Patterns
205
+
206
+ ### 1. Soft Delete
207
+ ```sql
208
+ ALTER TABLE user_account ADD COLUMN deleted_at timestamptz;
209
+ CREATE INDEX idx_user_account_active ON user_account (id) WHERE deleted_at IS NULL;
210
+
211
+ -- All queries filter: WHERE deleted_at IS NULL
212
+ -- Partial index keeps it fast
213
+ ```
214
+
215
+ ### 2. Enum-as-Text with Check Constraint
216
+ ```sql
217
+ -- Prefer check constraint over PostgreSQL ENUM type
218
+ -- (ENUM types are painful to modify in production)
219
+ ALTER TABLE "order" ADD COLUMN status text NOT NULL DEFAULT 'pending'
220
+ CONSTRAINT ck_order_status CHECK (status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled'));
221
+ ```
222
+
223
+ ### 3. JSONB for Flexible Metadata
224
+ ```sql
225
+ ALTER TABLE "order" ADD COLUMN metadata jsonb NOT NULL DEFAULT '{}';
226
+ CREATE INDEX idx_order_metadata ON "order" USING gin (metadata);
227
+
228
+ -- Query: find orders with specific tag
229
+ SELECT * FROM "order" WHERE metadata @> '{"priority": "high"}';
230
+ ```
231
+
232
+ ### 4. Advisory Locks for Job Processing
233
+ ```sql
234
+ -- Claim a job without row-level lock contention
235
+ SELECT * FROM job_queue
236
+ WHERE status = 'pending'
237
+ AND pg_try_advisory_xact_lock(hashtextextended(id::text, 0)) -- uuid cannot be cast to bigint, so hash it to a bigint lock key
238
+ ORDER BY created_at
239
+ LIMIT 1
240
+ FOR UPDATE SKIP LOCKED;
241
+ ```
242
+
243
+ ### 5. Upsert (INSERT ON CONFLICT)
244
+ ```sql
245
+ INSERT INTO user_preference (user_account_id, key, value)
246
+ VALUES ($1, $2, $3)
247
+ ON CONFLICT (user_account_id, key)
248
+ DO UPDATE SET value = EXCLUDED.value, updated_at = now();
249
+ ```
250
+
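`ON CONFLICT (user_account_id, key)` only works if a unique index or constraint covers exactly those columns. A sketch of the constraint this upsert assumes, named according to the conventions earlier in this guide:

```sql
-- Required arbiter for ON CONFLICT (user_account_id, key)
ALTER TABLE user_preference
    ADD CONSTRAINT uq_user_preference_user_account_id_key
    UNIQUE (user_account_id, key);
```

Without it, PostgreSQL rejects the `INSERT ... ON CONFLICT` statement because it cannot find a matching unique index.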
251
+ ---
252
+
253
+ ## Anti-Patterns
254
+
255
+ ### 1. SELECT *
256
+ NEVER use `SELECT *` in application code. It breaks when columns are added/removed, fetches unnecessary data, and prevents covering index optimization. Always list columns explicitly.
257
+
258
+ ### 2. Storing Money as Float
259
+ `float` and `double precision` cause rounding errors. `19.99 + 0.01` might not equal `20.00`. Use `integer` (cents) or `numeric(19,4)`. See Data Type Selection above.
260
+
261
+ ### 3. Missing Indexes on Foreign Keys
262
+ Every `_id` column referencing another table needs an index. Without it, JOINs and CASCADE DELETE perform full table scans. PostgreSQL does NOT auto-create indexes on foreign keys.
263
+
264
+ ### 4. Naive OFFSET Pagination
265
+ `OFFSET 100000` scans and discards 100K rows on every query. Performance degrades linearly with page depth. Use keyset pagination — see Query Optimization section.
266
+
267
+ ### 5. Long-Running Transactions
268
+ Transactions holding locks for >5 seconds block other writers. NEVER hold a transaction open during HTTP calls, queue operations, or user-facing waits. Fetch external data first, then open a short transaction to write.
269
+
270
+ ### 6. Using PostgreSQL ENUM Types
271
+ ```sql
272
+ -- BAD: ENUM values cannot be removed or reordered without recreating the type
273
+ CREATE TYPE order_status AS ENUM ('pending', 'shipped');
274
+ -- Adding a value: ALTER TYPE order_status ADD VALUE 'cancelled'; -- cannot run inside a transaction block before PG 12
275
+
276
+ -- GOOD: text + check constraint — easy to modify
277
+ ALTER TABLE "order" ADD CONSTRAINT ck_order_status
278
+ CHECK (status IN ('pending', 'shipped', 'cancelled'));
279
+ ```
280
+
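Changing the allowed values of a text-plus-check column later could be done as sketched below. The extra `'refunded'` value is hypothetical, and the sketch assumes the constraint `ck_order_status` already exists on `"order"`:

```sql
-- Swap the constraint atomically; NOT VALID skips the scan of existing rows
BEGIN;
ALTER TABLE "order" DROP CONSTRAINT ck_order_status;
ALTER TABLE "order" ADD CONSTRAINT ck_order_status
    CHECK (status IN ('pending', 'shipped', 'cancelled', 'refunded')) NOT VALID;
COMMIT;

-- Validate separately; this scans the table without blocking normal writes
ALTER TABLE "order" VALIDATE CONSTRAINT ck_order_status;
```

New and updated rows are checked as soon as the first transaction commits; validation only confirms the rows that were already there.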
281
+ ---
282
+
283
+ ## Integration Notes
284
+
285
+ ### Multi-Session Orchestration
286
+ When working as a database worker in a clearctx team:
287
+
288
+ 1. **Check inbox first** — `team_check_inbox` before any work
289
+ 2. **Read shared-conventions artifact** — Match response format, error format, enum values, and column names EXACTLY as defined
290
+ 3. **Publish schema artifact** — After creating tables, publish an artifact with table names, column names, and types so backend workers can generate correct queries:
291
+ ```json
292
+ {
293
+ "artifactId": "db-schema",
294
+ "type": "schema-change",
295
+ "data": {
296
+ "tables": ["order", "customer"],
297
+ "filesCreated": ["migrations/20260215143000_create_order_table.sql"],
298
+ "columnMap": {
299
+ "order": ["id", "customer_id", "total_cents", "status", "created_at", "updated_at"]
300
+ }
301
+ }
302
+ }
303
+ ```
304
+ 4. **Broadcast completion** — Let teammates know the schema is ready
305
+ 5. **Relative paths only**: `migrations/20260215143000_create_order_table.sql`, NEVER absolute paths
306
+
307
+ ### Coordination with Backend Workers
308
+ - Backend workers depend on your schema artifact for query generation
309
+ - Agree on exact column names via shared-conventions BEFORE writing any SQL
310
+ - If a backend worker asks about column names or types via `team_ask`, reference your published artifact
311
+ - JSONB columns: document the expected shape in your artifact so backend workers serialize correctly
312
+
313
+ ### Coordination with Testing Workers
314
+ - Provide seed data SQL or a migration that inserts test fixtures
315
+ - Document any database-level constraints that affect test setup (unique constraints, check constraints, foreign keys)
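
A seed fixture for testing workers might look like the sketch below. It assumes the `customer` and `"order"` columns listed in the schema artifact example; the fixed UUIDs and values are made up so tests can reference them deterministically:

```sql
-- Parent row first, so the foreign key on "order" is satisfied
INSERT INTO customer (id, name)
VALUES ('00000000-0000-0000-0000-0000000000c1', 'Test Customer')
ON CONFLICT (id) DO NOTHING;

INSERT INTO "order" (id, customer_id, total_cents, status)
VALUES (
    '00000000-0000-0000-0000-0000000000a1',
    '00000000-0000-0000-0000-0000000000c1',
    1999,
    'pending'
)
ON CONFLICT (id) DO NOTHING;
```

`ON CONFLICT (id) DO NOTHING` makes the seed script safe to rerun against a database that already contains the fixtures.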