@cubis/foundry 0.3.10 → 0.3.11

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (56) hide show
  1. package/Ai Agent Workflow/powers/database-skills/POWER.md +15 -2
  2. package/Ai Agent Workflow/powers/database-skills/SKILL.md +26 -2
  3. package/Ai Agent Workflow/powers/database-skills/engines/mongodb/POWER.md +10 -0
  4. package/Ai Agent Workflow/powers/database-skills/engines/mysql/POWER.md +10 -0
  5. package/Ai Agent Workflow/powers/database-skills/engines/neki/POWER.md +10 -0
  6. package/Ai Agent Workflow/powers/database-skills/engines/postgres/POWER.md +10 -0
  7. package/Ai Agent Workflow/powers/database-skills/engines/redis/POWER.md +10 -0
  8. package/Ai Agent Workflow/powers/database-skills/engines/sqlite/POWER.md +10 -0
  9. package/Ai Agent Workflow/powers/database-skills/engines/supabase/POWER.md +10 -0
  10. package/Ai Agent Workflow/powers/database-skills/engines/vitess/POWER.md +10 -0
  11. package/Ai Agent Workflow/powers/database-skills/steering/readme.md +18 -6
  12. package/Ai Agent Workflow/skills/database-skills/LATEST_VERSIONS.md +36 -0
  13. package/Ai Agent Workflow/skills/database-skills/README.md +11 -2
  14. package/Ai Agent Workflow/skills/database-skills/SKILL.md +85 -20
  15. package/Ai Agent Workflow/skills/database-skills/skills/mongodb/SKILL.md +29 -7
  16. package/Ai Agent Workflow/skills/database-skills/skills/mongodb/references/aggregation.md +153 -0
  17. package/Ai Agent Workflow/skills/database-skills/skills/mongodb/references/modeling.md +95 -4
  18. package/Ai Agent Workflow/skills/database-skills/skills/mongodb/references/mongoose-nestjs.md +133 -4
  19. package/Ai Agent Workflow/skills/database-skills/skills/mysql/SKILL.md +33 -7
  20. package/Ai Agent Workflow/skills/database-skills/skills/mysql/references/locking-ddl.md +103 -4
  21. package/Ai Agent Workflow/skills/database-skills/skills/mysql/references/query-indexing.md +103 -4
  22. package/Ai Agent Workflow/skills/database-skills/skills/mysql/references/replication.md +142 -0
  23. package/Ai Agent Workflow/skills/database-skills/skills/neki/SKILL.md +18 -7
  24. package/Ai Agent Workflow/skills/database-skills/skills/neki/references/architecture.md +135 -4
  25. package/Ai Agent Workflow/skills/database-skills/skills/neki/references/operations.md +76 -4
  26. package/Ai Agent Workflow/skills/database-skills/skills/postgres/SKILL.md +31 -7
  27. package/Ai Agent Workflow/skills/database-skills/skills/postgres/references/connection-pooling.md +142 -0
  28. package/Ai Agent Workflow/skills/database-skills/skills/postgres/references/migrations.md +126 -0
  29. package/Ai Agent Workflow/skills/database-skills/skills/postgres/references/performance-ops.md +116 -4
  30. package/Ai Agent Workflow/skills/database-skills/skills/postgres/references/schema-indexing.md +78 -4
  31. package/Ai Agent Workflow/skills/database-skills/skills/redis/SKILL.md +28 -7
  32. package/Ai Agent Workflow/skills/database-skills/skills/redis/references/cache-patterns.md +153 -4
  33. package/Ai Agent Workflow/skills/database-skills/skills/redis/references/data-modeling.md +152 -0
  34. package/Ai Agent Workflow/skills/database-skills/skills/redis/references/operations.md +143 -4
  35. package/Ai Agent Workflow/skills/database-skills/skills/sqlite/SKILL.md +28 -7
  36. package/Ai Agent Workflow/skills/database-skills/skills/sqlite/references/local-first.md +94 -4
  37. package/Ai Agent Workflow/skills/database-skills/skills/sqlite/references/performance.md +104 -4
  38. package/Ai Agent Workflow/skills/database-skills/skills/supabase/SKILL.md +27 -7
  39. package/Ai Agent Workflow/skills/database-skills/skills/supabase/references/performance-operations.md +94 -4
  40. package/Ai Agent Workflow/skills/database-skills/skills/supabase/references/rls-auth.md +105 -4
  41. package/Ai Agent Workflow/skills/database-skills/skills/vitess/SKILL.md +27 -7
  42. package/Ai Agent Workflow/skills/database-skills/skills/vitess/references/operational-safety.md +104 -4
  43. package/Ai Agent Workflow/skills/database-skills/skills/vitess/references/sharding-routing.md +124 -4
  44. package/Ai Agent Workflow/workflows/agent-environment-setup/platforms/antigravity/agents/backend-specialist.md +1 -1
  45. package/Ai Agent Workflow/workflows/agent-environment-setup/platforms/antigravity/agents/database-architect.md +8 -1
  46. package/Ai Agent Workflow/workflows/agent-environment-setup/platforms/antigravity/agents/performance-optimizer.md +2 -0
  47. package/Ai Agent Workflow/workflows/agent-environment-setup/platforms/antigravity/workflows/database.md +11 -6
  48. package/Ai Agent Workflow/workflows/agent-environment-setup/platforms/codex/agents/backend-specialist.md +1 -1
  49. package/Ai Agent Workflow/workflows/agent-environment-setup/platforms/codex/agents/database-architect.md +8 -1
  50. package/Ai Agent Workflow/workflows/agent-environment-setup/platforms/codex/agents/performance-optimizer.md +2 -0
  51. package/Ai Agent Workflow/workflows/agent-environment-setup/platforms/codex/workflows/database.md +11 -6
  52. package/Ai Agent Workflow/workflows/agent-environment-setup/platforms/copilot/agents/backend-specialist.md +1 -1
  53. package/Ai Agent Workflow/workflows/agent-environment-setup/platforms/copilot/agents/database-architect.md +8 -1
  54. package/Ai Agent Workflow/workflows/agent-environment-setup/platforms/copilot/agents/performance-optimizer.md +2 -0
  55. package/Ai Agent Workflow/workflows/agent-environment-setup/platforms/copilot/workflows/database.md +11 -6
  56. package/package.json +1 -1
@@ -1,5 +1,136 @@
1
- # Neki Architecture
1
+ # Neki Architecture and Pre-Sharding Design
2
2
 
3
- - Define shard key and data domain boundaries first.
4
- - Reduce cross-shard dependencies in transactional paths.
5
- - Map query ownership to shard topology.
3
+ ## What is Neki
4
+
5
+ Neki is **sharded Postgres** built by PlanetScale — the company behind Vitess (the MySQL sharding system used at YouTube scale). Neki brings the same horizontal scaling approach to the Postgres ecosystem.
6
+
7
+ > **Status as of early 2026**: Neki is not yet GA. Treat all behavioral assumptions as provisional until official docs stabilize. Re-verify after each preview update.
8
+
9
+ ## What Neki provides
10
+
11
+ - **Horizontal sharding**: Distributes data across multiple Postgres nodes. Applications scale beyond single-node limits without sharding logic in application code.
12
+ - **Managed by PlanetScale**: Operational experience from running Vitess at scale applied to Postgres.
13
+ - **High availability**: PlanetScale-grade uptime with automatic failover.
14
+ - **Postgres protocol compatibility**: Applications connect using standard Postgres drivers.
15
+
16
+ ## Architecture model (conceptual)
17
+
18
+ Neki is architecturally adjacent to Vitess outcomes, but is **not a Vitess fork**. It is built from first principles for Postgres:
19
+
20
+ - A **routing layer** (analogous to VTGate) intercepts queries and routes them to the correct shard.
21
+ - A **shard key** is chosen at schema design time and determines which node stores each row.
22
+ - Data is partitioned horizontally — each shard holds a subset of rows for every sharded table.
23
+ - **Reference tables** (small lookup data) are replicated to all shards.
24
+
25
+ ## Pre-sharding design checklist
26
+
27
+ Designing for Neki compatibility now means you won't need a painful migration later.
28
+
29
+ ### 1. Identify your shard key early
30
+
31
+ Choose based on real query patterns — not theoretical. The shard key should appear in every high-QPS `WHERE` clause.
32
+
33
+ Common choices: `tenant_id`, `org_id`, `user_id`, `account_id`.
34
+
35
+ The shard key must be:
36
+ - Present on every tenant-scoped table
37
+ - High cardinality (even distribution across shards)
38
+ - Immutable after insert (changing it requires data migration)
39
+
40
+ ### 2. Primary key design
41
+
42
+ ```sql
43
+ -- Good: single-column PK that is the shard key
44
+ CREATE TABLE users (user_id BIGINT PRIMARY KEY, ...);
45
+
46
+ -- Good: composite PK with shard key leading on child tables
47
+ CREATE TABLE orders (
48
+ user_id BIGINT NOT NULL,
49
+ id BIGINT GENERATED ALWAYS AS IDENTITY,
50
+ PRIMARY KEY (user_id, id)
51
+ );
52
+
53
+ -- Bad: shard key not leading
54
+ PRIMARY KEY (id, user_id)
55
+ ```
56
+
57
+ Use UUIDs (prefer UUIDv7 for sortability) or app-generated monotonic IDs — global sequences across shards are a coordination bottleneck.
58
+
59
+ ### 3. Co-locate joined tables
60
+
61
+ Tables frequently joined must share the same shard key and be co-located. Always include the shard key in join conditions.
62
+
63
+ ```sql
64
+ -- Correct: shard-local join
65
+ SELECT o.id, oi.product_id FROM orders o
66
+ JOIN order_items oi ON oi.user_id = o.user_id AND oi.order_id = o.id
67
+ WHERE o.user_id = $1;
68
+ ```
69
+
70
+ ### 4. Index design
71
+
72
+ Lead all indexes with the shard key. Scope unique constraints to include it.
73
+
74
+ ```sql
75
+ -- Correct
76
+ CREATE INDEX idx_orders_user_status ON orders (user_id, status, created_at);
77
+ ALTER TABLE orders ADD CONSTRAINT uq_order_number UNIQUE (user_id, order_number);
78
+
79
+ -- Incorrect: missing shard key in leading position
80
+ CREATE INDEX idx_orders_status ON orders (status, created_at);
81
+ ```
82
+
83
+ ### 5. Foreign keys
84
+
85
+ - FKs within the same shard key (co-located data) may be supported.
86
+ - Cross-shard-key FKs must become application-level enforcement before sharding.
87
+ - Audit all FKs before planning a Neki migration.
88
+
89
+ ### 6. Query patterns
90
+
91
+ Every query on sharded tables must include the shard key:
92
+
93
+ ```sql
94
+ -- Correct: routes to single shard
95
+ SELECT * FROM orders WHERE user_id = $1 AND status = 'pending';
96
+
97
+ -- Incorrect: scatter — hits all shards
98
+ SELECT * FROM orders WHERE status = 'pending';
99
+ ```
100
+
101
+ For lookups by a non-shard column, maintain a mapping table and harden it with backfill + miss-rate monitoring.
102
+
103
+ ### 7. Transactions
104
+
105
+ Keep transactions within a single shard key value. Cross-shard transactions require coordination and are significantly slower.
106
+
107
+ ### 8. Global aggregations
108
+
109
+ `COUNT(*)`, `SUM()` across all shards are expensive. Scope to shard key, or maintain pre-computed rollup tables for global stats.
110
+
111
+ ### 9. Reference tables
112
+
113
+ Small, rarely-changing lookup data (countries, currencies, feature flags ≲100K rows, rarely written, no tenant scoping) don't need a shard key — they get replicated to all shards.
114
+
115
+ ## Shard-readiness checklist
116
+
117
+ - [ ] Shard key identified and present on every tenant-scoped table
118
+ - [ ] Composite PKs with shard key leading; shard-safe IDs (UUIDv7 or app-generated)
119
+ - [ ] Shard key in all queries, indexes (leading position), and join conditions
120
+ - [ ] Unique constraints scoped to include shard key
121
+ - [ ] Cross-shard FKs audited; plan for app-level enforcement
122
+ - [ ] Transactions scoped to single shard-key value
123
+ - [ ] Global aggregations identified; rollup/async plan in place
124
+ - [ ] Migrations use online/revertible patterns — avoid long locks
125
+
126
+ ## When to evaluate Neki
127
+
128
+ - Single Postgres node is hitting CPU or storage limits under real load.
129
+ - Multi-tenant SaaS with tenant isolation requirements.
130
+ - Write volume exceeds what vertical scaling can address.
131
+
132
+ Always benchmark on production-like data volume before committing. Keep migration plans reversible and testable.
133
+
134
+ ## Sources
135
+ - Neki product page: https://www.neki.dev/
136
+ - PlanetScale announcement: https://planetscale.com/blog/announcing-neki
@@ -1,5 +1,77 @@
1
- # Neki Operations
1
+ # Neki — Operational Guidance
2
2
 
3
- - Monitor shard-level health and imbalance trends.
4
- - Plan shard migration/resharding as explicit projects.
5
- - Keep rollback criteria defined for every topology change.
3
+ > **Status as of early 2026**: Neki is pre-GA. All operational assumptions are provisional. Re-verify behavior after each preview or doc update from PlanetScale.
4
+
5
+ ## Migration planning principles
6
+
7
+ ### Keep it reversible
8
+
9
+ - Schema migrations should use online, non-blocking patterns (e.g., additive changes first, then backfill, then constraint).
10
+ - Never cut data into Neki from a single-node Postgres setup without a proven rollback path to the original.
11
+ - Stage the migration: dev → staging with production-like data → production with canary traffic.
12
+
13
+ ### Test with production-like data volume
14
+
15
+ Behavior under 100K rows can be dramatically different at 100M rows. Before committing:
16
+ 1. Restore a production snapshot to a staging Neki environment.
17
+ 2. Run your full query workload against it.
18
+ 3. Measure cross-shard scatter rate, latency p99, and aggregation performance.
19
+ 4. Validate that all queries include the shard key.
20
+
21
+ ### Platform lock-in decision criteria
22
+
23
+ Only commit to Neki when:
24
+ - Benchmark results on Neki staging match or exceed your current single-node Postgres.
25
+ - All high-QPS query paths are shard-key-scoped (no unresolved scatter queries).
26
+ - Application code connects via standard Postgres driver with no sharding logic — confirm no changes needed.
27
+ - A rollback path is documented and tested.
28
+
29
+ ## Connection setup
30
+
31
+ Neki exposes a standard Postgres protocol endpoint. Connect the same way as any managed Postgres service:
32
+
33
+ ```
34
+ host: <your-neki-host>
35
+ port: 5432
36
+ sslmode: require (always use TLS in production)
37
+ ```
38
+
39
+ No special driver needed. Use standard `pg`, `psycopg2`, `pgx`, etc.
40
+
41
+ ## Schema change workflow in Neki
42
+
43
+ > Full DDL behavior docs pending GA. Apply conservative practices:
44
+
45
+ 1. **Additive changes first**: add nullable columns without defaults before backfill-and-constrain.
46
+ 2. **Online migrations**: use tools like `pg-osc` patterns — shadow table, backfill, atomic cutover.
47
+ 3. **Test on staging** with a representative data subset before production.
48
+ 4. **Monitor replication lag** during migrations — pause if lag grows unexpectedly.
49
+
50
+ ## Monitoring
51
+
52
+ While Neki-specific observability tooling is not yet documented, apply standard Postgres monitoring:
53
+
54
+ ```sql
55
+ -- Active connections and query state
56
+ SELECT state, count(*) FROM pg_stat_activity GROUP BY state;
57
+
58
+ -- Slow queries (requires pg_stat_statements extension)
59
+ SELECT query, calls, total_exec_time, mean_exec_time
60
+ FROM pg_stat_statements
61
+ ORDER BY total_exec_time DESC
62
+ LIMIT 10;
63
+ ```
64
+
65
+ Additionally at the Neki platform level:
66
+ - Monitor per-shard query distribution — uneven distribution suggests a poor shard key choice.
67
+ - Track cross-shard query rate — high scatter rate is a signal to revisit schema or query design.
68
+
69
+ ## Guardrails
70
+
71
+ - **Never run destructive operations** (`DROP TABLE`, `TRUNCATE`, mass `DELETE`) without explicit user confirmation and a verified backup.
72
+ - **Avoid long-running transactions** — they block vacuum/maintenance on affected shards.
73
+ - **Validate before lock-in**: run a full workload benchmark on production-like data before treating Neki as the primary datastore.
74
+
75
+ ## Sources
76
+ - Neki product page: https://www.neki.dev/
77
+ - PlanetScale announcement: https://planetscale.com/blog/announcing-neki
@@ -1,15 +1,39 @@
1
1
  ---
2
2
  name: postgres
3
- description: PostgreSQL schema, indexing, query optimization, migrations, and operations.
3
+ description: PostgreSQL schema, indexing, pagination, query optimization, migrations, and operations.
4
4
  ---
5
5
 
6
6
  # Postgres
7
7
 
8
- Load references as needed:
8
+ ## Optimization workflow
9
+
10
+ 1. Baseline query with `EXPLAIN (ANALYZE, BUFFERS)`.
11
+ 2. Align index design to `WHERE + JOIN + ORDER BY` shape.
12
+ 3. Prefer keyset pagination for deep lists.
13
+ 4. Re-check planner stats (`ANALYZE`) and maintenance health (`VACUUM`, autovacuum behavior).
14
+ 5. Validate with production-like data skew.
15
+
16
+ ## Indexing techniques
17
+
18
+ - Multicolumn indexes for common combined predicates.
19
+ - Partial indexes for hot filtered subsets.
20
+ - `INCLUDE` columns for index-only scans.
21
+ - GIN for JSONB/search-like containment queries.
22
+ - BRIN for append-mostly time-series style tables.
23
+
24
+ ## Pagination techniques
25
+
26
+ - Prefer seek pagination: `WHERE (sort_col, id) > (...) ORDER BY sort_col, id LIMIT n`.
27
+ - Keep deterministic ordering with unique tie-breakers.
28
+ - Use offset only for shallow pages.
29
+
30
+ ## Performance guardrails
31
+
32
+ - Avoid unused indexes; they increase write and vacuum cost.
33
+ - Keep transactions short to reduce lock and bloat pressure.
34
+ - Validate any planner-sensitive change on realistic row counts.
35
+
36
+ ## References
37
+
9
38
  - `references/schema-indexing.md`
10
39
  - `references/performance-ops.md`
11
-
12
- Key rules:
13
- - Start with `EXPLAIN (ANALYZE, BUFFERS)`.
14
- - Design indexes from real query patterns.
15
- - Keep transactions short; monitor vacuum and bloat.
@@ -0,0 +1,142 @@
1
+ # Postgres — Connection Pooling
2
+
3
+ ## Why connection pooling is necessary
4
+
5
+ Postgres spawns one process per connection, each consuming ~5–10 MB of RAM. At 200+ direct connections:
6
+ - Memory pressure becomes significant.
7
+ - Context-switching overhead increases.
8
+ - Connection setup latency adds up (especially from serverless functions).
9
+
10
+ **Rule**: almost every Postgres deployment in production needs a connection pooler in front.
11
+
12
+ ## PgBouncer — the standard choice
13
+
14
+ PgBouncer is a lightweight, battle-tested TCP proxy for Postgres.
15
+
16
+ ### Pooling modes
17
+
18
+ | Mode | How it works | Use for |
19
+ | --- | --- | --- |
20
+ | **Transaction** (recommended) | A server connection is held only for the duration of a transaction | Stateless apps, serverless, most OLTP |
21
+ | **Session** | One server connection per client session until it disconnects | Apps that use session-level features (`SET`, `LISTEN`, advisory locks) |
22
+ | **Statement** | Returns connection after each statement | Only for apps that don't use multi-statement transactions — rare |
23
+
24
+ **Transaction mode caveat**: prepared statements are session-level in Postgres and break under transaction pooling. Set `max_prepared_statements` to a value > 0 (PgBouncer 1.21+, which tracks prepared statements across pooled server connections) or disable prepared statements in your client.
25
+
26
+ ### Typical PgBouncer configuration (`pgbouncer.ini`)
27
+
28
+ ```ini
29
+ [databases]
30
+ mydb = host=127.0.0.1 port=5432 dbname=mydb
31
+
32
+ [pgbouncer]
33
+ listen_addr = 0.0.0.0
34
+ listen_port = 6432
35
+ auth_type = scram-sha-256
36
+ auth_file = /etc/pgbouncer/userlist.txt
37
+ pool_mode = transaction
38
+ max_client_conn = 1000 ; max connections from apps to PgBouncer
39
+ default_pool_size = 20 ; max actual Postgres connections per database/user pair
40
+ reserve_pool_size = 5 ; extra connections for spikes
41
+ reserve_pool_timeout = 3
42
+ server_idle_timeout = 600
43
+ log_connections = 0 ; disable in production (log noise)
44
+ ```
45
+
46
+ ### Pool sizing formula
47
+
48
+ ```
49
+ default_pool_size ≈ (num_postgres_cpu_cores × 2) + num_spindle_disks
50
+ ```
51
+
52
+ For a 4-core managed Postgres: target ~10–15 server connections. App instances × client threads → PgBouncer → bounded Postgres connections.
53
+
54
+ ## Application-level connection pools
55
+
56
+ Even with PgBouncer, application clients should pool connections to PgBouncer (not open/close on each request).
57
+
58
+ ### Node.js — `pg` / `node-postgres`
59
+
60
+ ```ts
61
+ import { Pool } from 'pg';
62
+ const pool = new Pool({
63
+ connectionString: process.env.DATABASE_URL,
64
+ max: 10, // max connections from this app instance to PgBouncer
65
+ idleTimeoutMillis: 30000,
66
+ connectionTimeoutMillis: 5000,
67
+ });
68
+ // Use pool.query() directly or pool.connect() for transactions
69
+ ```
70
+
71
+ ### Python — SQLAlchemy
72
+
73
+ ```python
74
+ engine = create_engine(
75
+ DATABASE_URL,
76
+ pool_size=10,
77
+ max_overflow=5,
78
+ pool_pre_ping=True, # test connection health before use
79
+ pool_recycle=3600, # recycle connections every hour
80
+ )
81
+ ```
82
+
83
+ ### Prisma (Node.js)
84
+
85
+ ```
86
+ # .env
87
+ DATABASE_URL="postgresql://user:pass@pgbouncer-host:6432/mydb?pgbouncer=true"
88
+ ```
89
+
90
+ The `?pgbouncer=true` flag disables prepared statements, which is required for transaction pooling.
91
+
92
+ ## Serverless / edge environments
93
+
94
+ Serverless functions open and close connections per invocation — disastrous for direct Postgres connections.
95
+
96
+ Options:
97
+ 1. **PgBouncer in transaction mode** — each function call uses a pool connection only during its transaction.
98
+ 2. **Supabase Transaction Pooler** — managed connection pooler (Supavisor) built into Supabase (port 6543).
99
+ 3. **Neon serverless driver** — uses HTTP instead of TCP; no persistent connection overhead.
100
+
101
+ ```ts
102
+ // Neon serverless (HTTP-based, no connection overhead)
103
+ import { neon } from '@neondatabase/serverless';
104
+ const sql = neon(process.env.DATABASE_URL);
105
+ const orders = await sql`SELECT * FROM orders WHERE user_id = ${userId}`;
106
+ ```
107
+
108
+ ## Monitoring connections
109
+
110
+ ```sql
111
+ -- Current connection breakdown
112
+ SELECT state, count(*) FROM pg_stat_activity GROUP BY state ORDER BY count DESC;
113
+
114
+ -- Waiting connections (lock or connection wait)
115
+ SELECT pid, state, wait_event_type, wait_event, query
116
+ FROM pg_stat_activity
117
+ WHERE wait_event IS NOT NULL;
118
+
119
+ -- Max connections setting
120
+ SHOW max_connections;
121
+
122
+ -- Current utilization rate
123
+ SELECT count(*) * 100.0 / current_setting('max_connections')::int AS pct_used
124
+ FROM pg_stat_activity;
125
+ ```
126
+
127
+ Alert when `pct_used > 80%` — before you hit the limit.
128
+
129
+ ## Common mistakes
130
+
131
+ | Mistake | Fix |
132
+ | --- | --- |
133
+ | No pooler in serverless | Add PgBouncer or use HTTP driver |
134
+ | `max_connections` set too high on Postgres | Lower it and pool instead |
135
+ | Prepared statements in transaction pool mode | Disable at driver level |
136
+ | App pool size > PgBouncer pool size | App waits; PgBouncer has no server connections left |
137
+ | No `pool_pre_ping` / health check | Stale connections fail silently |
138
+
139
+ ## Sources
140
+ - PgBouncer documentation: https://www.pgbouncer.org/config.html
141
+ - PostgreSQL max_connections: https://www.postgresql.org/docs/current/runtime-config-connection.html
142
+ - Prisma PgBouncer guide: https://www.prisma.io/docs/orm/prisma-client/setup-and-configuration/databases/postgresql#pgbouncer
@@ -0,0 +1,126 @@
1
+ # Postgres — Database Migrations
2
+
3
+ ## Core principles
4
+
5
+ - Migrations must be **idempotent** where possible — safe to run more than once.
6
+ - Migrations must be **reversible** — always write a down migration.
7
+ - Every migration runs inside a transaction (unless it contains commands that can't be transactional, like `CREATE INDEX CONCURRENTLY`).
8
+ - Test on a staging environment with a recent production data dump before production.
9
+
10
+ ## Migration table (simple self-managed setup)
11
+
12
+ ```sql
13
+ CREATE TABLE IF NOT EXISTS _migrations (
14
+ id SERIAL PRIMARY KEY,
15
+ name TEXT NOT NULL UNIQUE,
16
+ applied_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
17
+ );
18
+ ```
19
+
20
+ ## Zero-downtime migration pattern (additive-first)
21
+
22
+ Never make breaking schema changes in the same deploy as the application code that depends on them. Expand, then contract.
23
+
24
+ ### Phase 1: Expand (add without breaking)
25
+ ```sql
26
+ -- Add new nullable column — safe, no lock, app can read NULL for old rows
27
+ ALTER TABLE orders ADD COLUMN notes TEXT;
28
+
29
+ -- Add new index concurrently — no write lock
30
+ CREATE INDEX CONCURRENTLY idx_orders_notes ON orders (notes) WHERE notes IS NOT NULL;
31
+ ```
32
+
33
+ ### Phase 2: Backfill (populate data)
34
+ ```sql
35
+ -- Backfill in batches — never in one big UPDATE that locks the table
36
+ -- (UPDATE ... RETURNING ... INTO errors on multi-row results; batch by an id-set instead)
37
+ DO $$
38
+ DECLARE batch_size INT := 1000;
39
+ BEGIN
40
+ LOOP
41
+ UPDATE orders SET notes = ''
42
+ WHERE id IN (SELECT id FROM orders WHERE notes IS NULL LIMIT batch_size);
43
+ EXIT WHEN NOT FOUND;
44
+ PERFORM pg_sleep(0.01); -- brief pause between batches
45
+ END LOOP;
46
+ END $$;
47
+ ```
48
+
49
+ ### Phase 3: Constrain (enforce, after app deploys)
50
+ ```sql
51
+ -- Add NOT NULL only after backfill is complete and app always sets notes
52
+ -- Use VALIDATE CONSTRAINT to avoid a long table lock
53
+ ALTER TABLE orders ADD CONSTRAINT orders_notes_not_null CHECK (notes IS NOT NULL) NOT VALID;
54
+ ALTER TABLE orders VALIDATE CONSTRAINT orders_notes_not_null;
55
+ -- Later, replace with actual NOT NULL — no table rewrite; Postgres 12+ uses the validated CHECK constraint to skip the full-table scan
56
+ ALTER TABLE orders ALTER COLUMN notes SET NOT NULL;
57
+ ```
58
+
59
+ ## Safe vs unsafe DDL
60
+
61
+ | Operation | Safe online? | Notes |
62
+ | --- | --- | --- |
63
+ | Add nullable column | ✅ | Catalog-only change (no default) — instant in all supported versions |
64
+ | Add NOT NULL column with DEFAULT | ✅ Postgres 11+ | Older versions rewrite table |
65
+ | Drop column | ✅ | Marks column invisible; no immediate rewrite |
66
+ | Add index | ❌ (blocks writes) | Use `CREATE INDEX CONCURRENTLY` |
67
+ | Add UNIQUE constraint | ❌ | Create unique index concurrently first, then `ADD CONSTRAINT ... USING INDEX` |
68
+ | Rename column | ⚠️ | Breaking change — requires multi-phase deploy |
69
+ | Change column type | ❌ | Usually needs table rewrite; use additive approach |
70
+ | Drop table | ❌ | Irreversible without backup; ensure app no longer references it |
71
+
72
+ ## CREATE INDEX CONCURRENTLY
73
+
74
+ The only way to add an index without blocking writes:
75
+
76
+ ```sql
77
+ -- Run outside a transaction block (psql \c or separate connection)
78
+ CREATE INDEX CONCURRENTLY idx_orders_user_id ON orders (user_id);
79
+ ```
80
+
81
+ Caveats:
82
+ - Cannot run inside a transaction.
83
+ - Takes longer than regular `CREATE INDEX`.
84
+ - If it fails, leaves an `INVALID` index — drop it and retry:
85
+ ```sql
86
+ DROP INDEX CONCURRENTLY idx_orders_user_id;
87
+ ```
88
+
89
+ ## Renaming — multi-phase deploy
90
+
91
+ Never rename a column in a single deploy. The app will break.
92
+
93
+ ```
94
+ Phase 1: Add new column, dual-write in app to both old and new.
95
+ Phase 2: Backfill new column from old.
96
+ Phase 3: Deploy app to read from new column only.
97
+ Phase 4: Remove old column.
98
+ ```
99
+
100
+ ## Migration tools
101
+
102
+ | Tool | Language | Notes |
103
+ | --- | --- | --- |
104
+ | **Flyway** | Java / CLI | SQL-based, version numbered, popular in enterprise |
105
+ | **Liquibase** | Java / CLI | XML/YAML/SQL, rollback built in |
106
+ | **golang-migrate** | Go / CLI | Simple, SQL-based, widely used in Go projects |
107
+ | **Alembic** | Python | SQLAlchemy-integrated, autogenerate support |
108
+ | **Prisma Migrate** | Node.js | Generates SQL from schema diff, dev-friendly |
109
+ | **Drizzle** | Node.js | TypeScript-first, explicit SQL migrations |
110
+
111
+ For any tool: always store migration files in version control and review them in PRs.
112
+
113
+ ## Production checklist before running a migration
114
+
115
+ - [ ] Tested on staging with production data size.
116
+ - [ ] Estimated lock duration checked (`EXPLAIN` or timing on staging).
117
+ - [ ] `CREATE INDEX CONCURRENTLY` used for any new indexes.
118
+ - [ ] Down migration written and tested.
119
+ - [ ] Monitoring dashboard open during migration.
120
+ - [ ] Rollback plan documented.
121
+ - [ ] Maintenance window scheduled if migration is non-online.
122
+
123
+ ## Sources
124
+ - ALTER TABLE: https://www.postgresql.org/docs/current/sql-altertable.html
125
+ - CREATE INDEX CONCURRENTLY: https://www.postgresql.org/docs/current/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY
126
+ - Zero-downtime schema changes: https://www.postgresql.org/docs/current/ddl-alter.html
@@ -1,5 +1,117 @@
1
- # Postgres Performance and Operations
1
+ # Postgres Performance and Operations
2
2
 
3
- - Use pg_stat_statements to find high total-cost queries.
4
- - Keep autovacuum healthy and monitor long transactions.
5
- - Use connection pooling for serverless/runtime bursts.
3
+ ## EXPLAIN workflow
4
+
5
+ Always baseline with `EXPLAIN (ANALYZE, BUFFERS)` before and after any change.
6
+
7
+ ```sql
8
+ EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
9
+ SELECT * FROM orders WHERE user_id = 42 ORDER BY created_at DESC LIMIT 20;
10
+ ```
11
+
12
+ Key things to read:
13
+ - **Actual vs estimated rows**: large mismatch → run `ANALYZE` on the table.
14
+ - **`Buffers: shared hit / read`**: high `read` → data not cached, I/O bound.
15
+ - **`Seq Scan`** on a large table with a filter → likely missing index.
16
+ - **`Hash Join` vs `Nested Loop`**: nested loop is fast with small inner set; hash join is better for large sets.
17
+ - **`Sort` + `Limit`**: if sorting before limiting, consider an index with matching sort order.
18
+
19
+ ## pg_stat_statements
20
+
21
+ Tracks cumulative stats for every query shape. Use to find top queries by total time.
22
+
23
+ ```sql
24
+ -- Enable once per cluster
25
+ CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
26
+
27
+ -- Top 10 queries by total execution time
28
+ SELECT query, calls, total_exec_time::int, mean_exec_time::int, rows
29
+ FROM pg_stat_statements
30
+ ORDER BY total_exec_time DESC
31
+ LIMIT 10;
32
+
33
+ -- Reset stats after tuning
34
+ SELECT pg_stat_statements_reset();
35
+ ```
36
+
37
+ ## ANALYZE — keeping planner stats fresh
38
+
39
+ Postgres uses per-column statistics (histograms, MCVs) to estimate row counts. Stale stats = bad plans.
40
+
41
+ ```sql
42
+ ANALYZE orders; -- single table
43
+ ANALYZE VERBOSE orders; -- with output
44
+ ANALYZE; -- whole database
45
+ ```
46
+
47
+ - `autovacuum` runs `ANALYZE` automatically when ~10% of rows change. For bulk loads, run manually.
48
+ - Increase `default_statistics_target` (default 100) for columns with skewed distribution:
49
+ ```sql
50
+ ALTER TABLE orders ALTER COLUMN status SET STATISTICS 500;
51
+ ANALYZE orders;
52
+ ```
53
+
54
+ ## VACUUM and autovacuum
55
+
56
+ Postgres uses MVCC — dead tuples accumulate from UPDATEs and DELETEs. VACUUM reclaims them.
57
+
58
+ ```sql
59
+ VACUUM orders; -- reclaim dead tuples (non-blocking)
60
+ VACUUM ANALYZE orders; -- reclaim + refresh stats
61
+ VACUUM FULL orders; -- rewrite table, reclaims disk — needs exclusive lock, use cautiously
62
+ ```
63
+
64
+ Signs of autovacuum not keeping up:
65
+ ```sql
66
+ -- Tables with high dead tuple counts
67
+ SELECT relname, n_dead_tup, n_live_tup, last_autovacuum
68
+ FROM pg_stat_user_tables
69
+ ORDER BY n_dead_tup DESC;
70
+ ```
71
+
72
+ Tuning autovacuum for hot tables:
73
+ ```sql
74
+ ALTER TABLE orders SET (
75
+ autovacuum_vacuum_scale_factor = 0.01, -- default 0.2 — trigger earlier
76
+ autovacuum_analyze_scale_factor = 0.005
77
+ );
78
+ ```
79
+
80
+ ## Connection pooling
81
+
82
+ Postgres spawns one process per connection. At ~200+ connections, overhead is significant.
83
+ - Use **PgBouncer** (transaction pooling) to multiplex app connections.
84
+ - Size pool to available CPU cores × 2–4. Monitor `pg_stat_activity`.
85
+
86
+ ```sql
87
+ -- Active connections breakdown
88
+ SELECT state, count(*) FROM pg_stat_activity GROUP BY state;
89
+
90
+ -- Long-running queries
91
+ SELECT pid, now() - query_start AS duration, query, state
92
+ FROM pg_stat_activity
93
+ WHERE state != 'idle' AND query_start < now() - interval '30 seconds';
94
+ ```
95
+
96
+ ## Lock monitoring
97
+
98
+ ```sql
99
+ -- Blocked queries and what is blocking them
100
+ SELECT blocked.pid, blocked.query, blocking.pid AS blocking_pid, blocking.query AS blocking_query
101
+ FROM pg_stat_activity blocked
102
+ JOIN pg_stat_activity blocking ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
103
+ WHERE cardinality(pg_blocking_pids(blocked.pid)) > 0;
104
+ ```
105
+
106
+ ## Key production guardrails
107
+
108
+ - Never run `VACUUM FULL` on a busy production table — takes an exclusive lock.
109
+ - Use `CREATE INDEX CONCURRENTLY` to avoid write blocks.
110
+ - Set `statement_timeout` and `lock_timeout` to prevent runaway queries from starving the system.
111
+ - Avoid long-open transactions — they block autovacuum and cause bloat.
112
+
113
+ ## Sources
114
+ - EXPLAIN: https://www.postgresql.org/docs/current/using-explain.html
115
+ - pg_stat_statements: https://www.postgresql.org/docs/current/pgstatstatements.html
116
+ - ANALYZE: https://www.postgresql.org/docs/current/sql-analyze.html
117
+ - VACUUM / autovacuum: https://www.postgresql.org/docs/current/routine-vacuuming.html