npm - @coralai/sps-cli - Versions diffs - 0.42.0 → 0.43.0 - Mend

@coralai/sps-cli 0.42.0 → 0.43.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (109)

package/README.md +34 -3
package/dist/commands/projectInit.d.ts.map +1 -1
package/dist/commands/projectInit.js +40 -53
package/dist/commands/projectInit.js.map +1 -1
package/dist/commands/skillCommand.d.ts +2 -0
package/dist/commands/skillCommand.d.ts.map +1 -0
package/dist/commands/skillCommand.js +235 -0
package/dist/commands/skillCommand.js.map +1 -0
package/dist/core/skillStore.d.ts +46 -0
package/dist/core/skillStore.d.ts.map +1 -0
package/dist/core/skillStore.js +197 -0
package/dist/core/skillStore.js.map +1 -0
package/dist/core/skillStore.test.d.ts +2 -0
package/dist/core/skillStore.test.d.ts.map +1 -0
package/dist/core/skillStore.test.js +190 -0
package/dist/core/skillStore.test.js.map +1 -0
package/dist/main.js +19 -17
package/dist/main.js.map +1 -1
package/package.json +1 -1
package/skills/architecture-decision-records/SKILL.md +207 -0
package/skills/backend/SKILL.md +62 -0
package/skills/backend/references/api-design.md +168 -0
package/skills/backend/references/caching.md +181 -0
package/skills/backend/references/data-access.md +173 -0
package/skills/backend/references/layering.md +181 -0
package/skills/backend/references/observability.md +190 -0
package/skills/backend/references/resilience.md +201 -0
package/skills/backend/references/security.md +186 -0
package/skills/backend-architect/SKILL.md +119 -0
package/skills/code-reviewer/SKILL.md +143 -0
package/skills/coding-standards/SKILL.md +60 -0
package/skills/coding-standards/references/clean-code.md +258 -0
package/skills/coding-standards/references/code-review.md +192 -0
package/skills/coding-standards/references/commits-and-prs.md +226 -0
package/skills/coding-standards/references/error-strategy.md +193 -0
package/skills/coding-standards/references/naming.md +185 -0
package/skills/coding-standards/references/tdd.md +171 -0
package/skills/database/SKILL.md +53 -0
package/skills/database/references/indexing.md +190 -0
package/skills/database/references/migrations.md +199 -0
package/skills/database/references/nosql.md +185 -0
package/skills/database/references/queries.md +295 -0
package/skills/database/references/scaling.md +203 -0
package/skills/database/references/schema.md +191 -0
package/skills/database-optimizer/SKILL.md +168 -0
package/skills/debugging-workflow/SKILL.md +244 -0
package/skills/devops/SKILL.md +55 -0
package/skills/devops/references/ci-cd.md +204 -0
package/skills/devops/references/containers.md +272 -0
package/skills/devops/references/deploy.md +201 -0
package/skills/devops/references/iac.md +252 -0
package/skills/devops/references/observability.md +228 -0
package/skills/devops/references/secrets.md +178 -0
package/skills/devops-automator/SKILL.md +164 -0
package/skills/frontend/SKILL.md +52 -0
package/skills/frontend/references/accessibility.md +222 -0
package/skills/frontend/references/components.md +206 -0
package/skills/frontend/references/performance.md +219 -0
package/skills/frontend/references/routing.md +209 -0
package/skills/frontend/references/state.md +190 -0
package/skills/frontend/references/testing.md +216 -0
package/skills/frontend-developer/SKILL.md +115 -0
package/skills/git-workflow/SKILL.md +355 -0
package/skills/golang/SKILL.md +49 -0
package/skills/golang/references/concurrency.md +284 -0
package/skills/golang/references/errors.md +241 -0
package/skills/golang/references/idioms.md +285 -0
package/skills/golang/references/testing.md +238 -0
package/skills/java/SKILL.md +50 -0
package/skills/java/references/concurrency.md +194 -0
package/skills/java/references/idioms.md +283 -0
package/skills/java/references/testing.md +228 -0
package/skills/kotlin/SKILL.md +47 -0
package/skills/kotlin/references/coroutines.md +240 -0
package/skills/kotlin/references/idioms.md +268 -0
package/skills/kotlin/references/testing.md +219 -0
package/skills/mobile/SKILL.md +50 -0
package/skills/mobile/references/architecture.md +204 -0
package/skills/mobile/references/navigation.md +158 -0
package/skills/mobile/references/performance.md +152 -0
package/skills/mobile/references/platform.md +166 -0
package/skills/mobile/references/state-and-data.md +174 -0
package/skills/python/SKILL.md +51 -0
package/skills/python/THIRD_PARTY.md +14 -0
package/skills/python/references/async.md +218 -0
package/skills/python/references/error-handling.md +254 -0
package/skills/python/references/idioms.md +279 -0
package/skills/python/references/packaging.md +233 -0
package/skills/python/references/testing.md +269 -0
package/skills/python/references/typing.md +292 -0
package/skills/qa-tester/SKILL.md +186 -0
package/skills/rust/SKILL.md +50 -0
package/skills/rust/references/async.md +224 -0
package/skills/rust/references/errors.md +240 -0
package/skills/rust/references/ownership.md +263 -0
package/skills/rust/references/testing.md +274 -0
package/skills/rust/references/traits.md +250 -0
package/skills/security-engineer/SKILL.md +157 -0
package/skills/swift/SKILL.md +48 -0
package/skills/swift/references/concurrency.md +280 -0
package/skills/swift/references/idioms.md +334 -0
package/skills/swift/references/testing.md +229 -0
package/skills/typescript/SKILL.md +51 -0
package/skills/typescript/references/async.md +241 -0
package/skills/typescript/references/errors.md +208 -0
package/skills/typescript/references/idioms.md +246 -0
package/skills/typescript/references/testing.md +225 -0
package/skills/typescript/references/tooling.md +208 -0
package/skills/typescript/references/types.md +259 -0

package/skills/database/references/nosql.md ADDED

@@ -0,0 +1,185 @@
+# NoSQL
+Document, key-value, time-series, graph — when each fits.
+## Picking the right engine
+```
+Pick by access pattern, not by popularity.
+```
+| Engine | Shape | Good for |
+|---|---|---|
+| **Postgres / MySQL** | Relational + JSON | Almost everything; strong default |
+| **MongoDB / Couchbase** | Document | Shape varies per record; embed-heavy reads |
+| **DynamoDB / Cassandra** | Key-value / wide column | Massive scale; predictable access patterns |
+| **Redis** | Key-value (in-memory) | Cache, leaderboard, session, pub-sub |
+| **ClickHouse / BigQuery / Redshift** | Columnar / OLAP | Analytics; big scans; aggregations |
+| **TimescaleDB / InfluxDB** | Time-series | Metrics, events, sensor data |
+| **Neo4j / DGraph** | Graph | Traversals (relationships matter more than rows) |
+| **Elasticsearch / OpenSearch** | Inverted index | Full text, log search |
+Rule: start with Postgres. Add a specialized store **when the access pattern justifies operational cost**. A lonely Elasticsearch cluster is a bug farm.
+## Document stores (MongoDB, Couchbase)
+Store JSON-ish documents. No joins; embed or reference.
+```json
+{
+  "_id": "u_01H...",
+  "email": "a@x.com",
+  "orders": [
+    { "id": "ord_01H...", "items": [...], "total_cents": 2599 }
+  ]
+}
+```
+### Embed vs. reference
+- **Embed** what you always read together and that doesn't grow unboundedly.
+- **Reference** what's read separately or grows.
+Rules:
+- One-to-few (user ↔ addresses) → embed.
+- One-to-many (user ↔ orders, bounded) → embed with a soft limit.
+- One-to-many unbounded (user ↔ events, could be millions) → reference.
+- Many-to-many → reference on both sides.
+### Indexes
+Every MongoDB query benefits from an index; without one, the query scans the whole collection. Index the field you query, and use a compound index on `(filter, sort)`.
+### Schema is still a thing
+Schemaless doesn't mean unschematic. Use a library (Zod, Joi, Pydantic, Mongoose) to define the shape in code and validate at write time. Otherwise two weeks later you have five variants of the same "user" shape.
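A minimal sketch of write-time validation, using only the Python standard library rather than any of the libraries named above. The field set (`_id`, `email`, `orders`) follows the sample document earlier in this section; `validate_user` and `USER_FIELDS` are hypothetical names chosen for illustration.

```python
from typing import Any

# The one agreed "user" shape: field name -> required Python type.
USER_FIELDS = {"_id": str, "email": str, "orders": list}


def validate_user(doc: dict[str, Any]) -> dict[str, Any]:
    """Reject documents that drift from the agreed shape before they are written."""
    unknown = set(doc) - set(USER_FIELDS)
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    for name, expected_type in USER_FIELDS.items():
        if name not in doc:
            raise ValueError(f"missing field: {name}")
        if not isinstance(doc[name], expected_type):
            raise ValueError(f"{name} must be {expected_type.__name__}")
    if "@" not in doc["email"]:
        raise ValueError("email must contain '@'")
    return doc
```

Calling `validate_user` in the single code path that writes users is what keeps variants such as `mail` vs. `email` from accumulating.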
+### Transactions
+MongoDB 4.0+ supports multi-document transactions (4.2+ on sharded clusters). Use them when you need them; don't hand-roll two-phase commits in app code.
+## Key-value (DynamoDB, Cassandra)
+Scale: massive. Flexibility: low.
+Access patterns **must** be known at design time. You design the key to match the queries, not the other way around.
+### Single-table design (DynamoDB)
+```
+pk          sk              attrs
+USER#u1     PROFILE         { name, email }
+USER#u1     ORDER#o1        { total, status }
+USER#u1     ORDER#o2        { total, status }
+ORDER#o1    META            { user_id, items }
+```
+One table, careful composite keys. Supports:
+- Get user profile: PK=USER#u1, SK=PROFILE
+- List user orders: PK=USER#u1, SK begins_with ORDER#
+- Get order: PK=ORDER#o1
+Queries that don't fit the key structure need a global secondary index (expensive) or an adapter (client-side fan-out).
+If access patterns aren't stable, use Postgres.
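To make the three supported access patterns concrete, here is a small in-memory model of the table above. It is a sketch, not the DynamoDB API: `TABLE` and `query` are hypothetical names, and the attribute values are made-up sample data. Items are kept sorted by `(pk, sk)`, which is why a sort-key prefix query is a contiguous range.

```python
from bisect import bisect_left

# Items sorted by (pk, sk), mimicking how a sort key orders rows in a partition.
TABLE = sorted(
    [
        ("USER#u1", "PROFILE", {"name": "Ada", "email": "a@x.com"}),
        ("USER#u1", "ORDER#o1", {"total": 2599, "status": "paid"}),
        ("USER#u1", "ORDER#o2", {"total": 500, "status": "pending"}),
        ("ORDER#o1", "META", {"user_id": "u1", "items": 2}),
    ],
    key=lambda item: (item[0], item[1]),
)
KEYS = [(pk, sk) for pk, sk, _ in TABLE]


def query(pk: str, sk_prefix: str = "") -> list[tuple[str, dict]]:
    """Return (sk, attrs) pairs for one partition key, filtered by sort-key prefix."""
    start = bisect_left(KEYS, (pk, sk_prefix))  # seek to the first candidate
    results = []
    for item_pk, item_sk, attrs in TABLE[start:]:
        if item_pk != pk or not item_sk.startswith(sk_prefix):
            break  # left the contiguous range that matches
        results.append((item_sk, attrs))
    return results


profile = query("USER#u1", "PROFILE")   # get user profile
orders = query("USER#u1", "ORDER#")     # list user orders
order_meta = query("ORDER#o1")          # get order
```

A question such as "all pending orders across users" has no matching key range here, which is the situation where a secondary index or client-side fan-out comes in.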
+### Eventual consistency
+Most KV stores offer strong consistency on single-key ops and eventual on cross-key. Design around it:
+- Read-your-writes: query the primary (DynamoDB: `ConsistentRead=true`).
+- Listing results may lag recent writes briefly.
+## Redis
+Primarily in-memory; data structures (strings, lists, sets, sorted sets, hashes, streams, geo).
+### Patterns
+- **Cache-aside** — most common.
+- **Session store** — cookie → Redis key → session blob.
+- **Rate limiter** — `INCR` with TTL on window keys.
+- **Leaderboard** — sorted set (`ZADD` / `ZREVRANGE`).
+- **Job queue** — lists (`LPUSH` / `BRPOP`) or Streams (better for ack-and-retry).
+- **Pub/sub** — fan out events to subscribers.
+- **Distributed lock** — Redlock or `SET NX PX`.
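The rate-limiter pattern from the list above can be sketched without a Redis server. The class below is an in-memory stand-in for `INCR` on a per-window key plus a TTL; it is not a Redis client, and `FixedWindowLimiter` and the `rl:` key prefix are hypothetical names.

```python
import time
from typing import Callable


class FixedWindowLimiter:
    """Fixed-window limiter: one counter per (client, window) key."""

    def __init__(self, limit: int, window_s: int,
                 clock: Callable[[], float] = time.time):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._counts: dict[str, int] = {}

    def allow(self, client_id: str) -> bool:
        window = int(self.clock() // self.window_s)
        key = f"rl:{client_id}:{window}"
        # Drop counters from earlier windows; in Redis the key TTL does this.
        self._counts = {k: v for k, v in self._counts.items()
                        if k.endswith(f":{window}")}
        self._counts[key] = self._counts.get(key, 0) + 1  # the INCR step
        return self._counts[key] <= self.limit
```

A fixed window admits a burst of up to twice the limit around a window boundary; whether that matters depends on what the limiter protects.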
+### Persistence
+- **RDB** snapshots: point-in-time, fast recovery, up to minutes of data loss.
+- **AOF** append-only log: near-zero loss, slower restart.
+- **AOF + RDB** together: recommended.
+Even "in-memory" Redis should have persistence. Otherwise a restart vacuums state.
+### Memory limits
+Set `maxmemory` + eviction policy (`allkeys-lru`, `volatile-ttl`). Otherwise Redis OOMs and crashes.
+## Time-series (TimescaleDB, InfluxDB)
+Purpose-built for write-heavy append-mostly workloads with time-range reads.
+- High ingest rate (100K+ points / sec).
+- Downsampling / rollups built in.
+- Retention policies: drop data older than N.
+- Optimized compression.
+If your workload is "mostly append, rarely update, queried by time", this is the right tool. Forcing a generic DB to do it (insert rate, index bloat, storage cost) is misery.
+TimescaleDB sits on top of Postgres — you get SQL, joins to normal tables, familiar ops.
+## Graph databases (Neo4j, DGraph)
+When the data is relationships and queries are traversals: "friends of friends with a common interest", "shortest path", "who did this user influence".
+In a relational DB, that's a multi-step recursive CTE. In a graph DB, it's a one-liner in Cypher / DQL (formerly GraphQL+-) / Gremlin.
+Cost: operational, learning curve, ecosystem. Use only when traversals dominate.
+## Search (Elasticsearch / OpenSearch / Typesense / MeiliSearch)
+Full-text with relevance, filters, aggregations. Postgres tsvector handles basic needs; search engines handle:
+- Fuzzy matching, typo tolerance.
+- Multi-language stemming.
+- Custom relevance tuning (boosts per field).
+- Faceted filtering with aggregates.
+- Log / event search at scale.
+**Never** use a search engine as the source of truth. Index from the authoritative DB; reindex is a feature, not a crisis.
+## Analytics stores
+OLAP vs. OLTP is the single biggest architectural fork.
+- **OLTP** (Postgres, MySQL): many small reads/writes, low latency, strong consistency.
+- **OLAP** (ClickHouse, BigQuery, Snowflake, Redshift, DuckDB): few big scans, column-oriented, optimized for aggregations.
+Running analytics on the prod DB:
+- OK for small teams, small data.
+- Hits limits fast: long scans hold locks on the primary, slow the site, and add bursty load.
+Standard path:
+1. Start with OLTP + read replica for reports.
+2. Add a warehouse (BigQuery / Snowflake) when replica reads aren't enough.
+3. Build an ELT pipeline (Fivetran, Airbyte, or home-grown) to move OLTP → warehouse.
+## Polyglot persistence
+Using 5 different stores because each is "best at one thing" sounds great, but the ops cost adds up. Rule: every new store doubles what your oncall rotation needs to know.
+Add stores sparingly. Prefer a generalist (Postgres) doing a specialist's job badly over two specialists that must be kept in sync.
+## Anti-patterns
+| Anti-pattern | Fix |
+|---|---|
+| MongoDB for a heavily relational domain | Postgres |
+| DynamoDB single-table design with unknown future queries | Postgres until you know what you're querying |
+| Redis as the source of truth | Cache only; durable store behind it |
+| Elasticsearch as primary data store | Secondary index; reindex from the primary |
+| Graph DB because "everything is a graph" | Only if traversals dominate |
+| Analytics on the OLTP primary | Replica → warehouse |
+| One schemaless collection per "microservice" | Validate shape; otherwise chaos in six months |
+| Eventual-consistency reads presented to the user as "final" | Surface "just a moment" UX; or read from primary |
+| No persistence on Redis in production | Always configure AOF + RDB |

package/skills/database/references/queries.md ADDED

@@ -0,0 +1,295 @@
+# Queries
+JOINs, subqueries, CTEs, window functions, EXPLAIN.
+## Read the plan
+`EXPLAIN (ANALYZE, BUFFERS)` shows what the DB actually did.
+```sql
+EXPLAIN (ANALYZE, BUFFERS)
+SELECT u.email, count(*) AS n
+FROM users u JOIN orders o ON o.user_id = u.id
+WHERE u.active AND o.created_at > now() - interval '30 days'
+GROUP BY u.email
+HAVING count(*) > 5;
+```
+Red flags:
+- `Seq Scan` on large tables where a filter expects few rows.
+- Actual rows hugely diverge from the estimate (stats are stale → `ANALYZE`).
+- `Hash Join` spilling to disk (`Disk: ...`).
+- Nested Loop where Hash/Merge would be cheaper (low cardinality estimate misled the planner).
+## Join types
+| Join | Meaning |
+|---|---|
+| `INNER JOIN` | Rows where both sides match |
+| `LEFT JOIN` | All left rows; NULLs for unmatched right |
+| `RIGHT JOIN` | Rare; usually rewrite as LEFT with swapped sides |
+| `FULL OUTER JOIN` | Both sides; NULLs where unmatched |
+| `CROSS JOIN` | Cartesian product — use deliberately |
+| `LATERAL` | Right side can reference left — per-row subquery |
+## `EXISTS` vs. `IN` vs. `JOIN`
+For "users who have at least one paid order":
+```sql
+-- ✅ Standard, optimizer handles well in modern DBs
+SELECT u.* FROM users u
+WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id AND o.status = 'paid');
+-- Same result; a plain JOIN yields one row per paid order, so DISTINCT is required
+SELECT DISTINCT u.* FROM users u JOIN orders o ON o.user_id = u.id
+WHERE o.status = 'paid';
+-- Also returns each user once; the real trap is NOT IN when the subquery can return NULL
+SELECT u.* FROM users u
+WHERE u.id IN (SELECT o.user_id FROM orders o WHERE o.status = 'paid');
+```
+`EXISTS` is usually the clearest for "does any matching row exist". A plain `JOIN` repeats a user once per matching order, so it needs `DISTINCT` (an extra sort or hash step) to return the same rows.
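A small runnable check of how many rows each form returns, using Python's built-in `sqlite3` as a stand-in for Postgres (an assumption; the row-count behaviour shown is standard SQL). The table contents are made-up sample data: user 1 has two paid orders, user 2 has none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT);
    INSERT INTO users  VALUES (1, 'a@x.com'), (2, 'b@x.com');
    INSERT INTO orders VALUES (10, 1, 'paid'), (11, 1, 'paid'), (12, 2, 'pending');
""")


def emails(sql: str) -> list[str]:
    return [row[0] for row in conn.execute(sql)]


exists_rows = emails("""
    SELECT u.email FROM users u
    WHERE EXISTS (SELECT 1 FROM orders o
                  WHERE o.user_id = u.id AND o.status = 'paid')
""")
in_rows = emails("""
    SELECT u.email FROM users u
    WHERE u.id IN (SELECT o.user_id FROM orders o WHERE o.status = 'paid')
""")
join_rows = emails("""
    SELECT u.email FROM users u JOIN orders o ON o.user_id = u.id
    WHERE o.status = 'paid'
""")

print(exists_rows)  # ['a@x.com']
print(in_rows)      # ['a@x.com']
print(join_rows)    # ['a@x.com', 'a@x.com'] (one row per paid order)
```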
+## CTEs (`WITH`)
+Named subqueries for readability. In Postgres < 12, CTEs were optimization fences; 12+ inlines a CTE that is non-recursive, has no side effects, and is referenced only once, unless `MATERIALIZED` is specified.
+```sql
+WITH active_users AS (
+    SELECT id FROM users WHERE active
+),
+recent_orders AS (
+    SELECT user_id, sum(total_cents) AS total
+    FROM orders
+    WHERE created_at > now() - interval '30 days'
+    GROUP BY user_id
+)
+SELECT u.id, r.total
+FROM active_users u
+JOIN recent_orders r ON r.user_id = u.id;
+```
+Recursive CTEs for trees:
+```sql
+WITH RECURSIVE subordinates AS (
+    SELECT id, manager_id, 0 AS depth FROM employees WHERE id = ?
+    UNION ALL
+    SELECT e.id, e.manager_id, s.depth + 1
+    FROM employees e JOIN subordinates s ON e.manager_id = s.id
+)
+SELECT * FROM subordinates;
+```
+## Window functions
+Computations over rows without collapsing them.
+```sql
+-- Rank orders by total within each user
+SELECT user_id, id, total_cents,
+       row_number() OVER (PARTITION BY user_id ORDER BY total_cents DESC) AS rn
+FROM orders;
+-- Running total
+SELECT created_at, total_cents,
+       sum(total_cents) OVER (ORDER BY created_at) AS running_total
+FROM orders;
+-- Day-over-day
+SELECT day, revenue,
+       revenue - lag(revenue) OVER (ORDER BY day) AS delta
+FROM daily_revenue;
+```
+Often replaces a self-join or a complex subquery.
+## Aggregations
+```sql
+-- Count
+SELECT count(*) FROM orders;            -- counts rows
+SELECT count(email) FROM users;         -- counts non-NULL emails
+SELECT count(DISTINCT user_id) FROM orders;
+-- Conditional aggregates
+SELECT
+    count(*) FILTER (WHERE status = 'paid') AS paid,
+    count(*) FILTER (WHERE status = 'pending') AS pending
+FROM orders;
+-- Array / string aggregation
+SELECT user_id, string_agg(tag, ',' ORDER BY tag) FROM user_tags GROUP BY user_id;
+SELECT user_id, array_agg(tag) FROM user_tags GROUP BY user_id;
+```
+`FILTER` is cleaner than `CASE WHEN ... ELSE NULL END` in an aggregate.
+## Upsert
+Insert-or-update in one statement, atomically.
+```sql
+-- Postgres
+INSERT INTO users (id, email, name)
+VALUES (?, ?, ?)
+ON CONFLICT (email) DO UPDATE
+SET name = EXCLUDED.name, updated_at = now();
+-- MySQL
+INSERT INTO users (id, email, name)
+VALUES (?, ?, ?)
+ON DUPLICATE KEY UPDATE
+    name = VALUES(name), updated_at = now();
+```
+Don't SELECT-then-INSERT-or-UPDATE in app code. Race condition → duplicate rows.
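The same idea can be exercised locally with Python's built-in `sqlite3`, which accepts an `ON CONFLICT ... DO UPDATE` clause close to the Postgres form (assuming the bundled SQLite is 3.24 or newer). `upsert_user` is a hypothetical helper and the names are sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE, name TEXT)"
)

UPSERT = """
    INSERT INTO users (email, name) VALUES (?, ?)
    ON CONFLICT (email) DO UPDATE SET name = excluded.name
"""


def upsert_user(email: str, name: str) -> None:
    # One statement: the database decides insert vs. update, so two
    # concurrent callers cannot both take the "insert" branch.
    with conn:
        conn.execute(UPSERT, (email, name))


upsert_user("a@x.com", "Ada")
upsert_user("a@x.com", "Ada L.")  # same email: updates instead of duplicating
rows = conn.execute("SELECT email, name FROM users").fetchall()
print(rows)  # [('a@x.com', 'Ada L.')]
```

The unique constraint on `email` is what makes the conflict target valid; without it the statement is rejected.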
+## Batch operations
+Process many rows in one statement.
+```sql
+-- Bulk insert with VALUES
+INSERT INTO events (id, payload) VALUES (?, ?), (?, ?), (?, ?), ...;
+-- Bulk delete with IN
+DELETE FROM stale_events WHERE id IN (?, ?, ?);
+-- Delete with CTE + USING (Postgres)
+WITH bad_ids AS (SELECT id FROM events WHERE created_at < now() - interval '1 year')
+DELETE FROM events USING bad_ids WHERE events.id = bad_ids.id;
+```
+## Pagination — keyset over offset
+```sql
+-- ❌ Slow on page 1000
+SELECT * FROM events ORDER BY id LIMIT 50 OFFSET 50000;
+-- ✅ Keyset pagination
+SELECT * FROM events WHERE id > ? ORDER BY id LIMIT 50;
+```
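+Offset pagination's cost grows with the offset because the DB reads and discards every skipped row. Keyset pagination seeks straight to the cursor through an index on the sort key, so each page costs roughly O(log n) regardless of depth.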
+For reverse / filtered queries, the cursor is a tuple: `(created_at, id)`.
+```sql
+SELECT * FROM events
+WHERE (created_at, id) < (?, ?)
+ORDER BY created_at DESC, id DESC
+LIMIT 50;
+```
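A sketch of walking every page with a `(created_at, id)` cursor, run against Python's built-in `sqlite3` instead of Postgres (an assumption; it relies on SQLite 3.15+ row-value comparison, which behaves like the tuple comparison above). `page` is a hypothetical helper and the rows are sample data in which several ids share a timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i, f"2026-04-{1 + i // 3:02d}") for i in range(1, 8)],
)
conn.execute("CREATE INDEX events_created_id ON events (created_at, id)")


def page(cursor=None, size=3):
    """Return one page, newest first, plus the cursor for the next page."""
    sql, args = "SELECT created_at, id FROM events", ()
    if cursor is not None:
        sql += " WHERE (created_at, id) < (?, ?)"
        args = cursor
    sql += " ORDER BY created_at DESC, id DESC LIMIT ?"
    rows = conn.execute(sql, (*args, size)).fetchall()
    # A short page means there is nothing left to fetch.
    next_cursor = rows[-1] if len(rows) == size else None
    return rows, next_cursor


seen, cursor = [], None
while True:
    rows, cursor = page(cursor)
    seen.extend(row_id for _, row_id in rows)
    if cursor is None:
        break
print(seen)  # [7, 6, 5, 4, 3, 2, 1]
```

Including `id` in the cursor is what keeps rows with identical `created_at` values from being skipped or repeated at page boundaries.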
+## `LIMIT` with `ORDER BY`
+Always use `ORDER BY` when using `LIMIT`. Without it, the order is undefined — results can change between runs even on the same data.
+## Avoid SELECT *
+```sql
+-- ❌ Sends every column over the wire; brittle when schema evolves
+SELECT * FROM users WHERE id = ?;
+-- ✅
+SELECT id, email, active FROM users WHERE id = ?;
+```
+Explicit column lists:
+- Ship less bandwidth.
+- Enable covering indexes.
+- Don't silently change shape when a column is added, and fail loudly when one is renamed or dropped.
+## Locking
+Explicit locks for concurrency control:
+```sql
+-- Read the row, block concurrent writers
+BEGIN;
+SELECT * FROM accounts WHERE id = 1 FOR UPDATE;
+-- decide, modify
+UPDATE accounts SET balance = balance - 100 WHERE id = 1;
+COMMIT;
+```
+`FOR UPDATE SKIP LOCKED` — for job-queue workers to take different rows:
+```sql
+SELECT * FROM jobs WHERE status = 'ready'
+ORDER BY created_at
+FOR UPDATE SKIP LOCKED
+LIMIT 1;
+```
+Advisory locks for non-row locking (Postgres):
+```sql
+SELECT pg_try_advisory_xact_lock(hashtext('reindex-job'));
+```
+Great for cross-process leader election / run-once jobs.
+## Avoid N+1 at SQL level
+```sql
+-- ❌ In app code (pseudocode): one query per order
+--   for order in orders:
+--       user = SELECT * FROM users WHERE id = order.user_id
+-- N+1 queries
+-- ✅ Single JOIN
+SELECT o.*, u.email FROM orders o JOIN users u ON u.id = o.user_id;
+-- ✅ Or a single IN:
+SELECT * FROM users WHERE id IN (?, ?, ?, ...);
+```
+## NULL semantics — three-valued logic
+`NULL = NULL` is `UNKNOWN`, not `TRUE`. Catches people out:
+```sql
+SELECT * FROM users WHERE email = 'a@x.com' OR email != 'a@x.com';
+-- Rows where email IS NULL are NOT returned (NULL doesn't match either)
+-- To include NULL:
+WHERE email = 'a@x.com' OR email IS NULL
+```
+Use `COALESCE(x, default)` (or, in Postgres, `IS DISTINCT FROM`) when comparing nullable columns.
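The three-valued behaviour is easy to confirm with Python's built-in `sqlite3` standing in for Postgres (an assumption; this part of NULL handling is standard SQL). The three users are sample data, one of them with a NULL email; `ids` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "a@x.com"), (2, "b@x.com"), (3, None)],
)


def ids(where: str) -> list[int]:
    sql = f"SELECT id FROM users WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


either = ids("email = 'a@x.com' OR email != 'a@x.com'")
with_null = ids("email = 'a@x.com' OR email != 'a@x.com' OR email IS NULL")
coalesced = ids("COALESCE(email, '') != 'a@x.com'")

print(either)     # [1, 2]     the NULL row matches neither branch
print(with_null)  # [1, 2, 3]
print(coalesced)  # [2, 3]     NULL is compared as '' and so counts as "different"
```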
+## String / date functions — DB-specific
+Every engine has its own. Postgres examples:
+```sql
+date_trunc('day', created_at)               -- 2026-04-20 00:00:00
+extract(epoch from (end_at - start_at))     -- seconds between
+now() - interval '1 day'                     -- 24h ago
+age(now(), created_at)                       -- human-friendly interval
+lower(email), upper(email), length(text), substring(text, 1, 10)
+split_part('a,b,c', ',', 2)                  -- 'b'
+regexp_replace(s, '[0-9]+', '#', 'g')
+```
+Check the manual before inventing portable helpers.
+## Anti-patterns
+| Anti-pattern | Fix |
+|---|---|
+| `SELECT *` in app code | Name the columns |
+| Queries built by string-concatenating user input | Parameterize |
+| Offset pagination on large tables | Keyset |
+| `WHERE function(col) = ?` without expression index | Rewrite or index |
+| N+1 queries in app code | JOIN / IN / batch |
+| Multiple round-trips where one query suffices | CTE / subquery |
+| Random sampling with `ORDER BY RANDOM()` on big tables | Pre-shuffle, reservoir sample, or `TABLESAMPLE` |
+| SELECT-then-INSERT for upsert | Native `ON CONFLICT` / `MERGE` |
+| Long transactions reading large data | Split, or use a cursor / pagination |
+| `LIKE '%x%'` on a big table | Trigram index (`pg_trgm` in Postgres) for substring match; full-text index for word search |

package/skills/database/references/scaling.md ADDED

@@ -0,0 +1,203 @@
+# Scaling
+Replication, read replicas, sharding, partitioning, pooling.
+## Scale vertical before horizontal
+A single Postgres / MySQL instance on modern hardware handles a LOT:
+- Postgres on a large machine: tens of thousands of QPS, terabytes of data.
+- Most teams that "need" sharding actually have a missing index.
+Scaling checklist before scaling out:
+1. Are queries using indexes? (EXPLAIN)
+2. Is the instance CPU- or I/O-bound? (top / iostat)
+3. Connection pool configured? (see below)
+4. Any long-held locks / long transactions?
+5. Table bloat? (Postgres VACUUM health)
+## Read replicas
+Streaming replication (Postgres, MySQL) sends WAL / binlog to followers. Route reads to replicas, writes to primary.
+```
+app ──writes──▶ primary
+  ──reads──▶ replica 1, replica 2, ...
+```
+Benefits:
+- Horizontal read scaling.
+- Failover target.
+- Analytics offload.
+Gotchas:
+- **Replication lag**. Milliseconds usually, seconds under load. Reads immediately after a write may see stale data.
+- **Read-your-writes**: route the current user's reads to primary for N seconds after they write, or always read from primary for critical paths.
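The read-your-writes rule above can be sketched as a small routing decision. This is an illustration only: `ReadRouter`, the 5-second default, and the `"primary"` / `"replica"` labels are hypothetical, and a real deployment would keep the last-write timestamp somewhere shared (session, cookie, or cache) rather than in process memory.

```python
import time
from typing import Callable


class ReadRouter:
    """Choose a read target, sticking to the primary right after a user's write."""

    def __init__(self, sticky_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self.sticky_s = sticky_s
        self.clock = clock
        self._last_write: dict[str, float] = {}

    def record_write(self, user_id: str) -> None:
        self._last_write[user_id] = self.clock()

    def target_for_read(self, user_id: str, critical: bool = False) -> str:
        if critical:
            return "primary"  # critical paths never risk stale data
        wrote_at = self._last_write.get(user_id)
        if wrote_at is not None and self.clock() - wrote_at < self.sticky_s:
            return "primary"  # inside the stickiness window
        return "replica"
```

The window should comfortably exceed the replication lag you actually observe; a fixed guess is only as good as the lag monitoring behind it.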
+## Connection pooling
+Every real backend uses a pool. DBs limit max connections (Postgres default ~100); without pooling, a traffic spike hits the ceiling.
+Pool parameters:
+| Parameter | Starting value |
+|---|---|
+| min idle | 2–5 |
+| max size | (DB max ÷ app replicas) − safety margin |
+| connect timeout | 2–5 s |
+| idle timeout | 30 s – 5 min |
+| max lifetime | 30 min |
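As a worked example of the max-size row, the helper below applies the formula with integer division. `max_pool_size` and the default margin of 2 are hypothetical choices for illustration, not recommended values.

```python
def max_pool_size(db_max_connections: int, app_replicas: int,
                  safety_margin: int = 2) -> int:
    """(DB max / app replicas) minus a safety margin, per app replica."""
    if app_replicas < 1:
        raise ValueError("app_replicas must be at least 1")
    size = db_max_connections // app_replicas - safety_margin
    if size < 1:
        raise ValueError("too many replicas for this connection limit; use a pooler")
    return size


# 100 connections shared by 8 replicas: 100 // 8 = 12, minus 2 = 10 per replica,
# so the app uses at most 80 connections and leaves 20 for admin and migrations.
print(max_pool_size(100, 8))  # 10
```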
+Serverless + traditional DB: put a pooler (PgBouncer, RDS Proxy, Supabase Pooler, Hyperdrive) in front of the database. Every cold function instance opens its own connections, so in-process pools cannot cap the total.
+PgBouncer modes:
+- **Session pooling** — safest, but holds one backend connection per client session.
+- **Transaction pooling** — best efficiency; some features (session-level variables, prepared statements in certain drivers) don't work.
+- **Statement pooling** — rare; most strict restrictions.
+## Partitioning
+Split a large table into pieces based on a key (time, tenant, region). Each partition is its own physical table.
+```sql
+CREATE TABLE events (
+    id UUID, tenant_id UUID, created_at TIMESTAMPTZ, payload JSONB
+) PARTITION BY RANGE (created_at);
+CREATE TABLE events_2026_04 PARTITION OF events
+    FOR VALUES FROM ('2026-04-01') TO ('2026-05-01');
+```
+Benefits:
+- Queries with `WHERE created_at BETWEEN ...` scan only the relevant partitions.
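+- Drop old data by dropping the whole partition (Postgres: `DROP TABLE` on the partition, optionally after `DETACH PARTITION`; MySQL: `ALTER TABLE ... DROP PARTITION`). No delete scan, no VACUUM.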
+- Smaller indexes per partition.
+Costs:
+- Schema evolution is more work.
+- Queries that don't include the partition key scan all partitions.
+- Some ORMs / tools handle partitioning poorly.
+Partition when:
+- A single table is approaching 100M+ rows and queries typically touch a subset.
+- You need to age out old data regularly.
+- Insert rate is pushing a single table's size limits.
+## Sharding
+Distribute a dataset across multiple databases by a shard key.
+```
+tenant_id % N → which shard holds this tenant's data
+```
+Each shard is a self-contained DB. Cross-shard queries require the app to fan out and merge.
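A minimal sketch of routing by shard key and of the fan-out-and-merge a cross-shard read requires. Each shard is modeled as an in-memory dict rather than a database, `NUM_SHARDS = 4` is a hypothetical cluster size, and string tenant ids are hashed with SHA-256 (an assumption; the modulo line above presumes a numeric id) because Python's built-in `hash()` is randomized per process.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(tenant_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Stable shard index for a tenant; the same id always maps to the same shard."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Each shard: tenant_id -> list of order totals (stand-in for a separate DB).
shards = [defaultdict(list) for _ in range(NUM_SHARDS)]


def record_order(tenant_id: str, total_cents: int) -> None:
    shards[shard_for(tenant_id)][tenant_id].append(total_cents)


def tenant_total(tenant_id: str) -> int:
    # Single-shard read: the shard key says exactly where to look.
    return sum(shards[shard_for(tenant_id)][tenant_id])


def global_total() -> int:
    # Cross-shard read: visit every shard, then merge in the app.
    return sum(sum(totals) for shard in shards for totals in shard.values())
```

Plain modulo routing remaps most tenants when `NUM_SHARDS` changes, which is one concrete reason rebalancing is painful.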
+Before sharding, try:
+- Vertical scale + read replicas.
+- Partitioning.
+- Service-level split (separate the auth DB from the orders DB).
+When sharding is justified:
+- Write throughput > what one instance can take.
+- Data size > one instance's disk.
+- Strong per-tenant isolation required.
+Hard parts:
+- **Cross-shard joins**: don't. Or use a small read replica that aggregates.
+- **Cross-shard transactions**: don't. Use eventual consistency patterns.
+- **Rebalancing**: moving data between shards is painful; pick a shard key that rarely grows imbalanced.
+Distributed databases and sharding layers (Vitess, Citus, Spanner, CockroachDB, Yugabyte) do a lot of this for you. Self-building a sharding layer is a project unto itself.
+## Caching layer
+Offload read traffic from the DB.
+| Layer | Cost | Gain |
+|---|---|---|
+| In-process cache | Free | Limited to pod memory |
+| Redis / Memcached | Ops + infra | Cross-pod shared |
+| CDN / HTTP cache | Free on public GETs | Offloads to edge |
+Cache-aside is the default pattern. See `backend/references/caching.md`.
+Guardrail: a cache layer is another thing to monitor. Measure its hit rate — a low hit rate means you're paying the complexity without the benefit.
+## Write amplification
+Every index is a write tax. A row written to a table with 5 indexes is actually 6 writes. Review indexes periodically:
+```sql
+-- Postgres: indexes never scanned since the last statistics reset
+-- (keep any that back a PRIMARY KEY or UNIQUE constraint)
+SELECT relname, indexrelname, idx_scan
+FROM pg_stat_user_indexes
+WHERE idx_scan = 0 AND indexrelname NOT LIKE 'pg_%'
+ORDER BY pg_relation_size(indexrelid) DESC;
+```
+Drop what isn't used.
+## Hot rows
+A single row updated by many clients (a counter, a shared config, a leaderboard entry) becomes a contention point.
+Mitigations:
+- **Increment via atomic SQL**: `UPDATE stats SET count = count + 1 WHERE id = ?`.
+- **Shard the counter**: N sub-counters, summed when read.
+- **Move to a cache**: increment in Redis, flush periodically.
+- **Denormalize**: pre-aggregate at write time into a stream that downstream consumers read.
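Two of the mitigations above, the atomic increment and the sharded counter, combine naturally. The sketch below uses Python's built-in `sqlite3` in place of Postgres (an assumption); `counter_slots`, `SLOTS = 8`, and the `page_views` counter are hypothetical names and sizes.

```python
import random
import sqlite3

SLOTS = 8  # hypothetical number of sub-counters per logical counter

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE counter_slots (
        name  TEXT NOT NULL,
        slot  INTEGER NOT NULL,
        count INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (name, slot)
    )
""")
conn.executemany(
    "INSERT INTO counter_slots (name, slot) VALUES ('page_views', ?)",
    [(slot,) for slot in range(SLOTS)],
)


def increment(name: str, amount: int = 1) -> None:
    # Writers pick a random slot, so they contend on SLOTS rows instead of one.
    slot = random.randrange(SLOTS)
    with conn:
        conn.execute(
            "UPDATE counter_slots SET count = count + ? WHERE name = ? AND slot = ?",
            (amount, name, slot),
        )


def read(name: str) -> int:
    # Reads pay for the spread: sum every slot.
    row = conn.execute(
        "SELECT COALESCE(sum(count), 0) FROM counter_slots WHERE name = ?",
        (name,),
    ).fetchone()
    return row[0]
```

The trade is cheaper, less contended writes for a slightly more expensive read, which suits counters that are written far more often than they are displayed.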
+## Archival
+Active data and historical data usually have different access patterns. Move old data out of the hot path.
+- **Partition by time** and drop old partitions.
+- **Archive to a data warehouse** (Snowflake, BigQuery, Redshift, ClickHouse) for analytics.
+- **Cold storage** (S3 + Athena / Parquet) for years-old data accessed rarely.
+Keeps the hot DB small, fast, cheap.
+## Replication topologies
+- **Single primary, multiple replicas** — read scaling, simple.
+- **Cascading replication** — replica of a replica, for geographic distribution.
+- **Logical replication** (Postgres) — table-level; supports zero-downtime upgrades and data migration between major versions.
+- **Multi-primary** — rare, complex; use systems that are built for it (Spanner, CockroachDB).
+Be skeptical of "multi-master" in traditional engines. Conflict resolution is user-facing work.
+## Backups
+- **Automated daily backups + point-in-time recovery** (PITR). Not optional.
+- **Test restore** quarterly. A backup you've never restored is a hope, not a backup.
+- **Off-region storage** for disaster recovery.
+- **Encryption at rest**.
+Many incidents are resolved by restoring a table / row from a backup. Make sure that's possible without a day of pain.
+## Monitoring — the must-haves
+- **Connection count** vs. max.
+- **Replication lag** per replica.
+- **Query p95 / p99 latency**.
+- **Slow query log** (threshold: 500 ms+).
+- **Lock waits and deadlocks**.
+- **Cache hit ratio** (Postgres `pg_statio_*`, MySQL InnoDB buffer pool).
+- **Disk space** and growth rate.
+- **Autovacuum / maintenance activity** (Postgres).
+Dashboards + alerts on all of the above.
+## Anti-patterns
+| Anti-pattern | Fix |
+|---|---|
+| Sharding before trying indexes / replicas / partitioning | Exhaust simpler options first |
+| One connection per request without a pool | Use a pool; use a pooler for serverless |
+| Read from a replica for "write-then-read" flows | Read from primary during a stickiness window after the write |
+| No automated backups | Set up PITR yesterday |
+| 50 indexes on a write-heavy table | Prune; serve reports from a replica / warehouse |
+| Ignoring replication lag in the app | Observe and degrade |
+| Cross-shard queries in the app | Architect around the shard key |
+| Running the DB on the same host as the app | Separate hosts; otherwise one app process's CPU spike starves the DB |
+| Unrestricted `pg_dump` over the wire during business hours | Replicate to a snapshot host and dump there |