@comfanion/workflow 4.36.39 → 4.36.41

---
name: database-design
description: Use when designing database schema, choosing storage strategy, planning migrations, or optimizing queries for a module or service
license: MIT
compatibility: opencode
metadata:
  domain: software-architecture
  patterns: normalization, indexing, partitioning, migrations
  artifacts: docs/architecture/*/data-model.md
---

# Database Design Skill

## When to Use

Use this skill when you need to:
- Design database schema for new module/service
- Choose database type (relational, document, graph, time-series)
- Plan table structure and relationships
- Design indexes for query optimization
- Plan data migrations (forward/backward compatible)
- Implement partitioning strategy
- Design for scalability and performance

## Reference

Always check project standards: `@CLAUDE.md`

## Templates

- Main: `@.opencode/skills/database-design/template.md`
- Migration: `@.opencode/skills/database-design/template-migration.md`

---

## Database Selection

**No default — choose based on project requirements.**

Analyze requirements → evaluate options → document decision in ADR.

### Selection Criteria

| Criteria | Questions to Answer |
|----------|---------------------|
| **Data model** | Relational? Documents? Key-value? Graph? Time-series? |
| **Consistency** | ACID required? Eventual OK? |
| **Scale** | GB? TB? PB? Read/write ratio? |
| **Query patterns** | Complex JOINs? Full-text? Aggregations? |
| **Deployment** | Cloud managed? Self-hosted? Embedded? Serverless? |
| **Team expertise** | What does team know? Learning curve acceptable? |
| **Cost** | License? Infrastructure? Operations? |

### Relational Databases

| Database | Pros | Cons | Best For |
|----------|------|------|----------|
| **PostgreSQL** | Feature-rich, extensions (JSONB, vectors, FTS), ACID, open-source | Heavier than alternatives, scales mainly vertically | Complex queries, mixed workloads, extensibility needed |
| **MySQL/MariaDB** | Fast reads, mature, wide hosting support | Fewer features than PostgreSQL, replication complexity | Read-heavy web apps, legacy compatibility |
| **SQLite** | Zero config, embedded, single file, serverless | Single writer, no network access | Mobile, desktop apps, dev/test, edge, small projects |
| **CockroachDB** | Distributed SQL, horizontal scaling, ACID | Operational complexity, higher latency than single-node | Global distribution, high availability |
| **TiDB** | MySQL compatible, HTAP, horizontal scaling | Operational complexity | MySQL migration with scale needs |

### Document Databases

| Database | Pros | Cons | Best For |
|----------|------|------|----------|
| **MongoDB** | Flexible schema, horizontal scaling, rich queries | Limited JOINs (`$lookup`), no foreign-key constraints | Rapid prototyping, variable schemas, content management |
| **CouchDB** | Multi-master replication, offline-first | Limited queries, slower | Sync-heavy apps, offline-first |
| **FerretDB** | MongoDB protocol, PostgreSQL backend | Newer, subset of features | MongoDB API with PostgreSQL storage |

### Key-Value & Cache

| Database | Pros | Cons | Best For |
|----------|------|------|----------|
| **Redis** | Sub-ms latency, data structures, pub/sub | Memory-bound, persistence tradeoffs | Cache, sessions, real-time, queues |
| **Memcached** | Simple, fast, multi-threaded | No persistence, no data structures (opaque values only) | Pure caching |
| **DragonflyDB** | Redis compatible, better memory efficiency | Newer | Redis replacement at scale |
| **KeyDB** | Redis fork, multi-threaded | Smaller community | High-throughput Redis workloads |

### Search Engines

| Database | Pros | Cons | Best For |
|----------|------|------|----------|
| **Elasticsearch** | Powerful search, analytics, ecosystem | Resource heavy, operational complexity | Full-text search, logging, analytics |
| **OpenSearch** | Elasticsearch fork, open-source | Feature lag | Elasticsearch alternative (AWS) |
| **Meilisearch** | Simple, fast, typo-tolerant | Fewer features, smaller scale | Simple search, instant search UX |
| **Typesense** | Easy setup, typo-tolerant | Smaller community | Developer-friendly search |

### Vector Databases (AI/ML)

| Database | Pros | Cons | Best For |
|----------|------|------|----------|
| **pgvector** | PostgreSQL extension, familiar | Scale limits compared with specialized engines | Vector search + relational in one DB |
| **Pinecone** | Managed, scalable, simple API | Vendor lock-in, cost | Production AI apps, managed solution |
| **Qdrant** | Open-source, filtering, fast | Self-hosting complexity | Self-hosted vector search |
| **Weaviate** | GraphQL, modules, hybrid search | Heavier | Semantic search, multi-modal |
| **Chroma** | Simple, embedded option | Early stage | Prototyping, small projects |
| **Milvus** | Scalable, open-source | Complex deployment | Large-scale similarity search |

### Time-Series

| Database | Pros | Cons | Best For |
|----------|------|------|----------|
| **TimescaleDB** | PostgreSQL extension, SQL | Single-node limits | Time-series + relational |
| **ClickHouse** | Blazing fast analytics, compression | Column-oriented learning curve | OLAP, analytics, logs at scale |
| **InfluxDB** | Purpose-built, ecosystem | Query language has changed across major versions (InfluxQL, Flux, SQL) | Metrics, IoT, monitoring |
| **QuestDB** | Fast ingestion, SQL | Smaller community | High-frequency time-series |

### Graph Databases

| Database | Pros | Cons | Best For |
|----------|------|------|----------|
| **Neo4j** | Mature, Cypher query language | License cost, scaling | Complex relationships, graph algorithms |
| **Amazon Neptune** | Managed, multiple models | AWS only | Cloud-native graph |
| **ArangoDB** | Multi-model (doc, graph, KV) | Jack of all trades: less depth than specialized engines | Flexible data models |

### Message Queues & Streaming

| System | Pros | Cons | Best For |
|--------|------|------|----------|
| **Kafka** | High throughput, durability, ecosystem | Operational complexity | Event streaming, high-volume |
| **RabbitMQ** | Flexible routing, protocols | Lower throughput than Kafka | Task queues, complex routing |
| **NATS** | Simple, fast, lightweight | Fewer features | Microservices, IoT |
| **Redis Streams** | Built into Redis | Fewer features than Kafka | Simple streaming with Redis |

### Multi-Storage Architecture

When one database isn't enough:

```
┌─────────────────────────────────────────────────────────┐
│                       Application                       │
└───────┬─────────────┬─────────────┬─────────────┬───────┘
        │             │             │             │
        ▼             ▼             ▼             ▼
   ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐
   │ Primary │   │  Cache  │   │ Search  │   │ Events  │
   │   DB    │   │         │   │         │   │         │
   └─────────┘   └─────────┘   └─────────┘   └─────────┘
```

**Considerations:**
- Sync complexity (eventual consistency)
- Operational overhead (more systems to manage)
- Cost (licenses, infrastructure, people)
- Data integrity across systems

### Decision Checklist

Before choosing:

- [ ] Requirements documented? (consistency, scale, queries)
- [ ] Multiple options evaluated with pros/cons?
- [ ] Team expertise considered?
- [ ] Operational complexity acceptable?
- [ ] Cost analyzed? (license + infra + ops)
- [ ] Migration path exists if wrong choice?
- [ ] Decision documented in ADR?

---

## Decision Scenarios

### By Project Type

| Project Type | Recommended Approach | Why |
|--------------|---------------------|-----|
| **MVP / Prototype** | SQLite or single PostgreSQL/MySQL | Simplicity, fast iteration, easy to change later |
| **SaaS B2B** | PostgreSQL + Redis cache | Complex queries, multi-tenancy, transactions |
| **Mobile app** | SQLite (local) + sync to cloud DB | Offline-first, embedded |
| **E-commerce** | PostgreSQL/MySQL + Elasticsearch + Redis | Transactions + search + cache |
| **Analytics platform** | ClickHouse / BigQuery + PostgreSQL (metadata) | OLAP for data, OLTP for config |
| **Real-time chat** | Redis/KeyDB + PostgreSQL (history) | Speed for live, durability for archive |
| **IoT / Telemetry** | TimescaleDB / InfluxDB / QuestDB | Time-series optimized |
| **AI/ML application** | PostgreSQL + pgvector OR dedicated vector DB | Embeddings + relational data |
| **Content platform** | MongoDB or PostgreSQL + S3 | Flexible content, binary storage |
| **Social network** | PostgreSQL + Neo4j (graph) + Redis | Relations + graph traversals + cache |

### By Scale

| Scale | Data Volume | Approach |
|-------|-------------|----------|
| **Small** | < 100K rows, 1 server | SQLite, or a single instance of any DB |
| **Medium** | 100K-10M rows | Single PostgreSQL/MySQL with replicas |
| **Large** | 10M-1B rows | Partitioning, read replicas, caching layer; shard only on measured need |
| **Massive** | > 1B rows | Distributed DB (CockroachDB, TiDB), specialized stores |

### By Team Size

| Team | Recommendation |
|------|----------------|
| **Solo / 1-2 devs** | SQLite → PostgreSQL. Minimize ops burden |
| **Small team (3-5)** | One primary DB + cache. Avoid polyglot until needed |
| **Medium team (6-15)** | Can handle 2-3 specialized stores if justified |
| **Large team (16+)** | Dedicated DBAs, can manage complex setups |

---

## Selection Rules

### Rule 1: Start Simple

```
IF project is new AND requirements unclear
THEN choose simplest option (SQLite for embedded, PostgreSQL for server)
AND plan to migrate if needed
```

### Rule 2: Match Data Model

```
IF data is highly relational (JOINs, transactions)
THEN relational DB (PostgreSQL, MySQL)

IF data is documents with variable schema
THEN document DB (MongoDB) OR relational with JSONB

IF data is key-value with TTL
THEN Redis / Memcached

IF data is time-ordered metrics
THEN time-series DB (TimescaleDB, InfluxDB)

IF data is embeddings for similarity
THEN vector DB OR PostgreSQL + pgvector
```

### Rule 3: Consider Consistency Requirements

```
IF financial transactions OR inventory
THEN ACID required → relational DB with transactions

IF user preferences OR analytics
THEN eventual consistency OK → more options available

IF distributed system with strong consistency
THEN CockroachDB, Spanner, YugabyteDB
```
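
Rules 2 and 3 can be read as a first-match decision procedure. The Python sketch below encodes them that way. The `Workload` flags, the `suggest_stores` name, the rule ordering (consistency first) and the Rule 1 fallback are all assumptions for illustration; the result is a shortlist to evaluate, not a decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical requirement flags; names are illustrative, not a standard."""
    relational: bool = False           # JOINs, multi-row transactions
    variable_schema: bool = False      # document shapes differ per record
    key_value_ttl: bool = False        # lookups by key, entries expire
    time_ordered_metrics: bool = False
    embeddings: bool = False           # similarity search over vectors
    needs_acid: bool = False           # e.g. financial transactions, inventory
    distributed_strong: bool = False   # multi-region with strong consistency


def suggest_stores(w: Workload) -> list[str]:
    """Shortlist from Rules 2-3; the first matching rule wins (assumed order)."""
    if w.distributed_strong:
        return ["CockroachDB", "Spanner", "YugabyteDB"]
    if w.needs_acid or w.relational:
        return ["PostgreSQL", "MySQL"]
    if w.variable_schema:
        return ["MongoDB", "PostgreSQL (JSONB)"]
    if w.key_value_ttl:
        return ["Redis", "Memcached"]
    if w.time_ordered_metrics:
        return ["TimescaleDB", "InfluxDB"]
    if w.embeddings:
        return ["PostgreSQL + pgvector", "dedicated vector DB"]
    return ["PostgreSQL"]  # Rule 1: start simple (SQLite if embedded)


print(suggest_stores(Workload(embeddings=True)))
# ['PostgreSQL + pgvector', 'dedicated vector DB']
```

Checking consistency before data shape reflects Rule 3: ACID needs outweigh schema convenience. Reorder the checks if your priorities differ.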

### Rule 4: Operational Reality

```
IF team has no DBA AND no DevOps
THEN prefer managed services (RDS, PlanetScale, Supabase, Neon)

IF compliance requires data locality
THEN self-hosted OR specific cloud regions

IF budget is limited
THEN open-source + self-hosted OR SQLite
```

### Rule 5: Don't Over-Engineer

```
IF single PostgreSQL can handle the load
THEN don't add Redis "just in case"

IF you're adding a DB "for future scale"
THEN stop — add when actually needed

IF polyglot persistence adds sync complexity
THEN evaluate if benefits outweigh costs
```

---

## Anti-Patterns

### ❌ Don't Do This

| Anti-Pattern | Problem | Better Approach |
|--------------|---------|-----------------|
| **MongoDB for everything** | Loses relational benefits, JOINs in app code | Use relational for relational data |
| **Microservice = own DB type** | Operational nightmare, 10 different DBs | Standardize on 1-2 types across services |
| **Redis as primary store** | Persistence is tricky, data loss risk | Redis for cache, real DB for persistence |
| **Premature sharding** | Complexity before it's needed | Scale vertically first; shard only on measured need |
| **Elasticsearch as primary** | Not designed for ACID, sync issues | ES for search, source of truth elsewhere |
| **Ignoring team expertise** | Steep learning curve, bugs, slow delivery | Factor in what team knows |
| **Choosing by hype** | New ≠ better, production readiness matters | Evaluate maturity, community, support |
| **No migration plan** | Stuck with wrong choice forever | Always consider "what if we need to change" |

### ⚠️ Warning Signs

| Sign | What It Means |
|------|---------------|
| "Let's use X, I've been wanting to try it" | Technology-driven, not requirement-driven |
| "We might need scale someday" | Premature optimization |
| "Everyone uses MongoDB now" | Hype-driven, not analysis-driven |
| "PostgreSQL is boring" | Boring = proven, stable, predictable |
| "We need real-time so Redis for everything" | Misunderstanding use cases |
| Adding a 4th database to the architecture | Complexity explosion, reconsider |

---

## Example Architectures

### Example 1: SaaS Task Management

```
Requirements:
- Multi-tenant (100s of companies)
- Real-time updates
- Full-text search in tasks
- Audit trail

Architecture:
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ PostgreSQL  │────►│    Redis    │────►│   Client    │
│  (primary)  │     │  (pub/sub,  │     │             │
│             │     │   cache)    │     │             │
└──────┬──────┘     └─────────────┘     └─────────────┘
       │
       │ CDC/trigger
       ▼
┌─────────────┐
│ Meilisearch │ (search)
└─────────────┘

Why:
- PostgreSQL: ACID for tasks, RLS for multi-tenancy
- Redis: pub/sub for real-time, session cache
- Meilisearch: simple search, typo-tolerant
```

### Example 2: E-commerce Platform

```
Requirements:
- Product catalog (1M+ products)
- Order transactions
- Search with facets
- Recommendations

Architecture:
┌─────────────┐     ┌───────────────┐
│    MySQL    │     │ Elasticsearch │
│  (orders,   │────►│   (product    │
│ inventory)  │     │    search)    │
└──────┬──────┘     └───────────────┘
       │
       │            ┌─────────────┐
       └───────────►│    Redis    │
                    │(cart, cache)│
                    └─────────────┘

┌─────────────┐
│  pgvector/  │ (recommendations)
│   Qdrant    │
└─────────────┘

Why:
- MySQL: proven for e-commerce, transactions
- Elasticsearch: faceted search, filters
- Redis: cart sessions, product cache
- Vector DB: "similar products" recommendations
```

### Example 3: Mobile App with Offline

```
Requirements:
- Works offline
- Syncs when online
- Simple data model

Architecture:
┌─────────────────────────────────────┐
│             Mobile App              │
│   ┌─────────────┐                   │
│   │   SQLite    │ (local)           │
│   └──────┬──────┘                   │
└──────────┼──────────────────────────┘
           │ sync
           ▼
┌─────────────────────────────────────┐
│             Backend API             │
│   ┌─────────────┐                   │
│   │ PostgreSQL  │ (cloud)           │
│   └─────────────┘                   │
└─────────────────────────────────────┘

Why:
- SQLite: embedded, zero-config, offline
- PostgreSQL: server-side, handles conflicts
- Sync: custom or use Supabase/Firebase
```

### Example 4: Analytics Dashboard

```
Requirements:
- Ingest 1M events/day
- Fast aggregations
- Historical queries

Architecture:
┌─────────────┐     ┌─────────────┐
│    Kafka    │────►│ ClickHouse  │
│  (ingest)   │     │ (analytics) │
└─────────────┘     └─────────────┘

┌─────────────┐
│ PostgreSQL  │
│ (metadata,  │
│   users)    │
└─────────────┘

Why:
- Kafka: handles high-volume ingest
- ClickHouse: columnar, fast aggregations
- PostgreSQL: user accounts, dashboard configs
```

---

## Schema Design Process

### Step 1: Identify Entities

From requirements/PRD, extract:
- **Core entities** — Main business objects
- **Supporting entities** — Lookup tables, configs
- **Junction tables** — M:N relationships

### Step 2: Define Relationships

| Relationship | Implementation | Example |
|--------------|----------------|---------|
| 1:1 | FK with a UNIQUE constraint in either table, OR same table | user ↔ profile |
| 1:N | FK in "many" side | user → posts |
| M:N | Junction table | users ↔ roles |
| Hierarchy | Self-referencing FK OR nested set | categories |
| Polymorphic | Type column + nullable FKs OR separate tables | comments on posts/tasks |
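
A minimal sketch of the M:N row, using Python's built-in `sqlite3` so it runs without a server. The `users`/`roles`/`user_roles` schema and the `roles_for` helper are hypothetical. In PostgreSQL the DDL is the same apart from column types, and foreign keys are always enforced there (SQLite enforces them only after `PRAGMA foreign_keys = ON`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE);
CREATE TABLE roles (id INTEGER PRIMARY KEY, code  TEXT NOT NULL UNIQUE);

-- Junction table: one row per (user, role) pair.
-- The composite PK prevents duplicate pairs and serves lookups by user_id.
CREATE TABLE user_roles (
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    role_id INTEGER NOT NULL REFERENCES roles(id) ON DELETE CASCADE,
    PRIMARY KEY (user_id, role_id)
);
-- Second index for the reverse direction ("which users have role X?").
CREATE INDEX idx_user_roles_role_id ON user_roles(role_id);
""")

conn.executemany("INSERT INTO users (id, email) VALUES (?, ?)",
                 [(1, "ana@example.com"), (2, "bo@example.com")])
conn.executemany("INSERT INTO roles (id, code) VALUES (?, ?)",
                 [(1, "admin"), (2, "editor")])
conn.executemany("INSERT INTO user_roles (user_id, role_id) VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 2)])


def roles_for(user_id: int) -> list[str]:
    """Resolve the M:N relationship through the junction table."""
    rows = conn.execute(
        """SELECT r.code
           FROM roles r
           JOIN user_roles ur ON ur.role_id = r.id
           WHERE ur.user_id = ?
           ORDER BY r.code""",
        (user_id,),
    )
    return [code for (code,) in rows]


print(roles_for(1))  # ['admin', 'editor']
```

`ON DELETE CASCADE` removes a user's junction rows with the user; drop it if orphaned rows should block the delete instead.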

### Step 3: Normalize (then Denormalize)

#### Normalization Levels

| Form | Rule | Check |
|------|------|-------|
| **1NF** | Atomic values, no arrays in columns | Each cell = one value |
| **2NF** | No partial dependencies | All non-key columns depend on the FULL PK (relevant for composite keys) |
| **3NF** | No transitive dependencies | Non-key columns don't depend on other non-key columns |
| **BCNF** | Every determinant is a candidate key | Stricter than 3NF; violations are rare in practice |

#### When to Denormalize

| Situation | Denormalization |
|-----------|-----------------|
| Read-heavy, rarely changes | Duplicate for read speed |
| Reporting/analytics | Materialized views |
| Cross-module data | Cache foreign data locally |
| Computed values | Store calculated fields |

**Rule:** Normalize first; denormalize only for a measured reason.
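
For the "Computed values" row, one common way to keep a stored aggregate correct is to update it from triggers, inside the same transaction as the write. This sketch uses SQLite through Python's `sqlite3`; PostgreSQL would use `CREATE TRIGGER ... EXECUTE FUNCTION` with a PL/pgSQL function instead. The tables, trigger names and `drift()` helper are hypothetical. `drift()` recomputes the count from the normalized data, which doubles as a periodic consistency check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (
    id         INTEGER PRIMARY KEY,
    name       TEXT    NOT NULL,
    task_count INTEGER NOT NULL DEFAULT 0   -- denormalized: derivable from tasks
);
CREATE TABLE tasks (
    id         INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id)
);

-- Keep the stored count in step with every write path.
CREATE TRIGGER tasks_count_ins AFTER INSERT ON tasks BEGIN
    UPDATE projects SET task_count = task_count + 1 WHERE id = NEW.project_id;
END;
CREATE TRIGGER tasks_count_del AFTER DELETE ON tasks BEGIN
    UPDATE projects SET task_count = task_count - 1 WHERE id = OLD.project_id;
END;
CREATE TRIGGER tasks_count_move AFTER UPDATE OF project_id ON tasks BEGIN
    UPDATE projects SET task_count = task_count - 1 WHERE id = OLD.project_id;
    UPDATE projects SET task_count = task_count + 1 WHERE id = NEW.project_id;
END;
""")

with conn:
    conn.executemany("INSERT INTO projects (id, name) VALUES (?, ?)",
                     [(1, "alpha"), (2, "beta")])
    conn.executemany("INSERT INTO tasks (id, project_id) VALUES (?, ?)",
                     [(1, 1), (2, 1), (3, 2)])
    conn.execute("UPDATE tasks SET project_id = 2 WHERE id = 1")  # move a task
    conn.execute("DELETE FROM tasks WHERE id = 3")


def drift() -> list[tuple[int]]:
    """Projects whose stored count disagrees with COUNT(*) over tasks."""
    return conn.execute("""
        SELECT p.id
        FROM projects p
        LEFT JOIN tasks t ON t.project_id = p.id
        GROUP BY p.id, p.task_count
        HAVING p.task_count <> COUNT(t.id)""").fetchall()


print(conn.execute("SELECT id, task_count FROM projects ORDER BY id").fetchall())
# [(1, 1), (2, 1)]
print(drift())  # []
```

Every task write also updates one `projects` row, so heavy concurrent writes to a single project serialize on that row; that cost is part of the "measured reason" to weigh.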

### Step 4: Design Indexes

#### Index Types

| Type | PostgreSQL | Use For |
|------|------------|---------|
| B-tree | `CREATE INDEX` (default) | Equality, range, sorting |
| Hash | `USING hash` | Equality only; rarely better than B-tree |
| GiST | `USING gist` | Geometric, ranges, full-text |
| GIN | `USING gin` | Arrays, JSONB, full-text |
| BRIN | `USING brin` | Very large tables whose values follow physical row order (e.g., append-only timestamps) |

#### Index Strategy

```sql
-- Primary key: its unique index is created automatically
PRIMARY KEY (id)

-- Foreign keys: PostgreSQL does NOT index them automatically (CREATE MANUALLY!)
CREATE INDEX idx_tasks_user_id ON tasks(user_id);

-- Partial index for a common filter; used only when the query's WHERE implies the predicate
CREATE INDEX idx_tasks_status ON tasks(status) WHERE status != 'archived';

-- Composite (order matters!): also serves user_id-only filters,
-- which makes idx_tasks_user_id above redundant
CREATE INDEX idx_tasks_user_status ON tasks(user_id, status);

-- Covering index: INCLUDE (PostgreSQL 11+) enables index-only scans for listed columns
CREATE INDEX idx_tasks_list ON tasks(user_id) INCLUDE (title, status);
```

#### Index Anti-Patterns

| Anti-Pattern | Problem |
|--------------|---------|
| Index every column | Write overhead, storage |
| Missing FK indexes | Slow JOINs, slow cascading deletes, lock contention |
| Wrong column order in composite | Often unusable by queries that skip the leading column |
| Over-indexing low-cardinality columns | B-tree adds little when there are few distinct values |
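
The column-order rule can be checked with a query planner. This sketch uses SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3` as a stand-in for PostgreSQL's `EXPLAIN`; the `tasks` schema mirrors the examples above and the `plan` helper is hypothetical. Plan wording varies by SQLite version, so only keywords are meaningful, and with `ANALYZE` statistics SQLite can sometimes skip-scan a composite index, so treat the result as typical rather than guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tasks (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    status  TEXT    NOT NULL,
    title   TEXT    NOT NULL
);
CREATE INDEX idx_tasks_user_status ON tasks(user_id, status);
""")


def plan(sql: str) -> str:
    """Join the 'detail' column of SQLite's EXPLAIN QUERY PLAN rows."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Leading column filtered: the planner can SEARCH the composite index.
by_user = plan("SELECT title FROM tasks WHERE user_id = 1 AND status = 'active'")
# Only the second column filtered: no usable prefix, so it SCANs the table.
by_status = plan("SELECT title FROM tasks WHERE status = 'active'")

print(by_user)    # e.g. SEARCH tasks USING INDEX idx_tasks_user_status (user_id=? AND status=?)
print(by_status)  # e.g. SCAN tasks
```

If status-only queries matter, they need their own index (or the partial `idx_tasks_status` above), not a reordered composite that would then penalize user-scoped queries.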

---

## Table Conventions

### Naming

```sql
-- Tables: plural, snake_case
users, task_comments, user_roles

-- Columns: snake_case
created_at, user_id, is_active

-- Primary keys: id (uuid or bigint)
id UUID PRIMARY KEY DEFAULT gen_random_uuid()

-- Foreign keys: {table_singular}_id
user_id UUID REFERENCES users(id)

-- Timestamps: _at suffix
created_at, updated_at, deleted_at
```

### Standard Columns

```sql
-- Every table should have
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),  -- built in since PostgreSQL 13 (pgcrypto before)
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()   -- DEFAULT applies on INSERT only; refresh on UPDATE via trigger or app code

-- Soft delete (if needed)
deleted_at TIMESTAMPTZ

-- Multi-tenant
tenant_id UUID NOT NULL REFERENCES tenants(id)
```

### Constraints

```sql
-- NOT NULL by default; allow NULL only deliberately
email VARCHAR(255) NOT NULL,
bio TEXT  -- nullable: explicitly documented

-- CHECK constraints for business rules
CHECK (status IN ('draft', 'active', 'archived'))
CHECK (price >= 0)
CHECK (start_date < end_date)  -- passes when either side is NULL: CHECK rejects only FALSE

-- UNIQUE constraints
UNIQUE (tenant_id, email)  -- scoped uniqueness
```
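
A runnable sketch of these constraints, using SQLite via Python's `sqlite3` (both SQLite and PostgreSQL evaluate CHECK and UNIQUE the same way for these cases). The `members` table and `try_insert` helper are hypothetical; dates are stored as ISO-8601 text, which compares correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE members (
    id         INTEGER PRIMARY KEY,
    tenant_id  INTEGER NOT NULL,
    email      TEXT    NOT NULL,
    status     TEXT    NOT NULL CHECK (status IN ('draft', 'active', 'archived')),
    start_date TEXT    NOT NULL,   -- ISO-8601 dates compare correctly as text
    end_date   TEXT,               -- nullable on purpose: open-ended membership
    CHECK (start_date < end_date), -- NULL end_date passes: CHECK rejects only FALSE
    UNIQUE (tenant_id, email)      -- unique per tenant, not globally
)""")


def try_insert(tenant_id, email, status, start_date, end_date=None) -> bool:
    """Insert one row; return False if a constraint rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO members (tenant_id, email, status, start_date, end_date) "
                "VALUES (?, ?, ?, ?, ?)",
                (tenant_id, email, status, start_date, end_date),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(1, "ana@example.com", "active", "2026-01-01"),               # ok
    try_insert(2, "ana@example.com", "active", "2026-01-01"),               # ok: other tenant
    try_insert(1, "ana@example.com", "draft", "2026-02-01"),                # duplicate in tenant 1
    try_insert(1, "bo@example.com", "deleted", "2026-01-01"),               # status not allowed
    try_insert(1, "cy@example.com", "active", "2026-03-01", "2026-02-01"),  # dates reversed
]
print(results)  # [True, True, False, False, False]
```

Letting the database reject bad rows keeps every writer (app, scripts, migrations) honest; application-side validation is still useful for friendlier error messages.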

---

## Common Patterns

### Soft Delete

```sql
-- Column
deleted_at TIMESTAMPTZ

-- Index for active records
CREATE INDEX idx_users_active ON users(id) WHERE deleted_at IS NULL;

-- Application: always filter
SELECT * FROM users WHERE deleted_at IS NULL;
```
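
A related soft-delete pattern is a partial *unique* index, so a deleted row does not block re-using its email. Sketched in SQLite (which also supports partial indexes) through Python's `sqlite3`; the schema and the `soft_delete`/`active_emails` helpers are hypothetical. The same `CREATE UNIQUE INDEX ... WHERE deleted_at IS NULL` statement works in PostgreSQL.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id         INTEGER PRIMARY KEY,
    email      TEXT NOT NULL,
    deleted_at TEXT               -- NULL = active
);
-- Unique among active rows only: a soft-deleted row does not block re-signup.
CREATE UNIQUE INDEX uq_users_email_active ON users(email) WHERE deleted_at IS NULL;
""")


def soft_delete(user_id: int) -> None:
    """Mark a row deleted instead of removing it."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:
        conn.execute(
            "UPDATE users SET deleted_at = ? WHERE id = ? AND deleted_at IS NULL",
            (now, user_id),
        )


def active_emails() -> list[str]:
    """Every read path filters on deleted_at IS NULL."""
    rows = conn.execute("SELECT email FROM users WHERE deleted_at IS NULL ORDER BY id")
    return [email for (email,) in rows]


with conn:
    conn.execute("INSERT INTO users (email) VALUES ('ana@example.com')")
soft_delete(1)
with conn:
    # Accepted: the first row left the partial index when it was soft-deleted.
    conn.execute("INSERT INTO users (email) VALUES ('ana@example.com')")

print(active_emails())                                           # ['ana@example.com']
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 2 (history kept)
```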

### Audit Trail (History)

```sql
-- Option 1: History table
CREATE TABLE tasks_history (
    history_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    task_id    UUID NOT NULL,
    changed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    changed_by UUID,
    operation  VARCHAR(10) NOT NULL CHECK (operation IN ('INSERT', 'UPDATE', 'DELETE')),
    old_data   JSONB,  -- NULL for INSERT
    new_data   JSONB   -- NULL for DELETE
);

-- Option 2: SCD Type 2 (Slowly Changing Dimension): one row per version
CREATE TABLE users (
    id         UUID,
    email      VARCHAR(255),
    name       VARCHAR(255),
    valid_from TIMESTAMPTZ NOT NULL,
    valid_to   TIMESTAMPTZ,  -- NULL = current
    is_current BOOLEAN GENERATED ALWAYS AS (valid_to IS NULL) STORED,  -- PostgreSQL 12+
    PRIMARY KEY (id, valid_from)
);
```
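
Writing to an SCD Type 2 table means closing the current version and opening a new one in a single transaction. A sketch in SQLite via Python's `sqlite3`, assuming ISO-8601 UTC text timestamps (which sort correctly as strings) and half-open validity intervals `[valid_from, valid_to)`. The `users_scd` table and the `set_email`/`email_as_of` helpers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE users_scd (
    id         INTEGER NOT NULL,   -- business key, repeated across versions
    email      TEXT    NOT NULL,
    valid_from TEXT    NOT NULL,
    valid_to   TEXT,               -- NULL = current version
    PRIMARY KEY (id, valid_from)
)""")


def set_email(user_id: int, email: str, at: str) -> None:
    """Close the current version (if any) and open a new one, atomically."""
    with conn:
        conn.execute(
            "UPDATE users_scd SET valid_to = ? WHERE id = ? AND valid_to IS NULL",
            (at, user_id),
        )
        conn.execute(
            "INSERT INTO users_scd (id, email, valid_from) VALUES (?, ?, ?)",
            (user_id, email, at),
        )


def email_as_of(user_id: int, at: str):
    """Point-in-time lookup: the version whose [valid_from, valid_to) contains `at`."""
    row = conn.execute(
        """SELECT email FROM users_scd
           WHERE id = ? AND valid_from <= ? AND (valid_to IS NULL OR ? < valid_to)""",
        (user_id, at, at),
    ).fetchone()
    return row[0] if row else None


set_email(1, "old@example.com", "2026-01-01T00:00:00Z")
set_email(1, "new@example.com", "2026-02-01T00:00:00Z")
print(email_as_of(1, "2026-01-15T00:00:00Z"))  # old@example.com
print(email_as_of(1, "2026-03-01T00:00:00Z"))  # new@example.com
```

Half-open intervals make version boundaries unambiguous: at exactly `2026-02-01T00:00:00Z` only the new version matches.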

### Multi-Tenancy

```sql
-- Option 1: Tenant column (recommended for most cases)
CREATE TABLE tasks (
    id UUID PRIMARY KEY,
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    -- ...
);

-- Row-level security
ALTER TABLE tasks ENABLE ROW LEVEL SECURITY;
ALTER TABLE tasks FORCE ROW LEVEL SECURITY;  -- apply policies to the table owner too
CREATE POLICY tenant_isolation ON tasks
    -- missing_ok = true: yields NULL (no rows match) if app.tenant_id was never set
    USING (tenant_id = current_setting('app.tenant_id', true)::uuid);

-- Option 2: Schema per tenant (for strong isolation)
CREATE SCHEMA tenant_abc;
CREATE TABLE tenant_abc.tasks (...);
```

### Enum vs Lookup Table

```sql
-- PostgreSQL ENUM (simple, fixed values)
CREATE TYPE task_status AS ENUM ('todo', 'in_progress', 'done');
ALTER TABLE tasks ADD COLUMN status task_status;
-- Values can be added later (ALTER TYPE task_status ADD VALUE ...) but not removed

-- Lookup table (configurable, metadata)
CREATE TABLE task_statuses (
    id SERIAL PRIMARY KEY,
    code VARCHAR(50) UNIQUE NOT NULL,
    name VARCHAR(100),
    display_order INT,
    is_terminal BOOLEAN DEFAULT FALSE
);

-- Use lookup when:
-- - Values change at runtime
-- - Need metadata (color, order, permissions)
-- - Multi-tenant with different values
```

### JSONB Usage

```sql
-- Good: Flexible attributes, rare queries
metadata JSONB NOT NULL DEFAULT '{}'

-- GIN index (default jsonb_ops) supports containment (@>) and key-existence queries
CREATE INDEX idx_tasks_metadata ON tasks USING gin(metadata);

-- Query
SELECT * FROM tasks WHERE metadata @> '{"priority": "high"}';

-- Bad: Structured data that needs JOINs, constraints
-- Don't store user_id in JSONB if you need FK constraint
```

---

## Partitioning

### When to Partition

| Signal | Consider Partitioning |
|--------|----------------------|
| Table > 100GB | Yes |
| Time-series data | Yes (by time) |
| Queries always filter by X | Yes (by X) |
| Deleting old data regularly | Yes (DROP partition vs DELETE) |
| < 10M rows, simple queries | No |

### Partition Strategies

```sql
-- Range partitioning (time-series)
-- A PRIMARY KEY here must include the partition key, e.g. PRIMARY KEY (id, event_time)
CREATE TABLE events (
    id UUID,
    event_time TIMESTAMPTZ NOT NULL,
    data JSONB
) PARTITION BY RANGE (event_time);

-- Bounds are half-open: FROM is inclusive, TO is exclusive
CREATE TABLE events_2026_01 PARTITION OF events
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

-- List partitioning (by category)
CREATE TABLE orders (
    id UUID,
    region VARCHAR(10) NOT NULL,
    data JSONB
) PARTITION BY LIST (region);

CREATE TABLE orders_eu PARTITION OF orders FOR VALUES IN ('eu');
CREATE TABLE orders_us PARTITION OF orders FOR VALUES IN ('us');

-- Hash partitioning (even distribution)
CREATE TABLE tenant_events (
    tenant_id UUID NOT NULL,
    data JSONB
) PARTITION BY HASH (tenant_id);

CREATE TABLE tenant_events_p0 PARTITION OF tenant_events
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
-- repeat for REMAINDER 1, 2 and 3
```
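
Range partitions have to exist before rows arrive, so they are usually created ahead of time by a scheduled job or an extension such as pg_partman. A minimal Python sketch of such a job's DDL generation, assuming monthly partitions named `{parent}_YYYY_MM` as in the `events_2026_01` example above. `monthly_partition_ddl` and `add_months` are hypothetical helpers that only build strings; executing them is left to your driver or migration tool.

```python
from datetime import date


def add_months(d: date, n: int) -> date:
    """First day of the month that is n months after d's month."""
    index = d.year * 12 + (d.month - 1) + n
    return date(index // 12, index % 12 + 1, 1)


def monthly_partition_ddl(parent: str, start: date, months: int) -> list[str]:
    """Hypothetical helper: PostgreSQL DDL for `months` monthly range partitions.

    `parent` must be a trusted identifier; it is interpolated without quoting.
    Bounds are [FROM, TO), so consecutive partitions neither overlap nor leave gaps.
    """
    statements = []
    for i in range(months):
        lo, hi = add_months(start, i), add_months(start, i + 1)
        statements.append(
            f"CREATE TABLE IF NOT EXISTS {parent}_{lo:%Y_%m} PARTITION OF {parent} "
            f"FOR VALUES FROM ('{lo.isoformat()}') TO ('{hi.isoformat()}');"
        )
    return statements


for stmt in monthly_partition_ddl("events", date(2025, 12, 15), 3):
    print(stmt)
```

`IF NOT EXISTS` makes the job safe to re-run, so it can create a few months ahead on every run.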

---

## Migrations

### Migration Rules

1. **Forward compatible** — New code works with old schema
2. **Backward compatible** — Old code works with new schema
3. **Small, incremental** — One change per migration
4. **Reversible** — Include DOWN migration
5. **Tested** — Run on copy of production data

### Safe Migration Patterns

| Change | Safe Way |
|--------|----------|
| Add column | `ADD COLUMN ... DEFAULT NULL` (no table rewrite; only a brief exclusive lock) |
| Add NOT NULL column | Add nullable → backfill → add constraint |
| Remove column | Stop using → deploy → remove |
| Rename column | Add new → copy → remove old |
| Add index | `CREATE INDEX CONCURRENTLY` (cannot run inside a transaction) |
| Change type | Add new column → migrate → remove old |

### Dangerous Operations

```sql
-- ❌ RISKY: before PostgreSQL 11 this rewrites the whole table under an exclusive lock
-- (11+ stores a constant default without a rewrite, but still takes the lock)
ALTER TABLE tasks ADD COLUMN status VARCHAR NOT NULL DEFAULT 'todo';

-- ✅ SAFER: expand, backfill, then constrain
ALTER TABLE tasks ADD COLUMN status VARCHAR;
ALTER TABLE tasks ALTER COLUMN status SET DEFAULT 'todo';  -- new rows get a value from now on
UPDATE tasks SET status = 'todo' WHERE status IS NULL;     -- run in batches
-- SET NOT NULL scans the table under an exclusive lock; on PostgreSQL 12+
-- a validated CHECK constraint lets it skip that scan
ALTER TABLE tasks ADD CONSTRAINT tasks_status_not_null CHECK (status IS NOT NULL) NOT VALID;
ALTER TABLE tasks VALIDATE CONSTRAINT tasks_status_not_null;  -- weaker lock, writes continue
ALTER TABLE tasks ALTER COLUMN status SET NOT NULL;
ALTER TABLE tasks DROP CONSTRAINT tasks_status_not_null;
```
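
The "run in batches" backfill step can be driven from application code. A sketch using SQLite via Python's `sqlite3` so it runs anywhere; in PostgreSQL the same loop works with the `UPDATE ... WHERE id IN (SELECT ... LIMIT n)` form. The table contents, batch size and `backfill_status` helper are assumptions, and the `SET NOT NULL` step is omitted because SQLite cannot alter column constraints in place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.executemany("INSERT INTO tasks (title) VALUES (?)",
                 [(f"task {i}",) for i in range(2500)])
conn.commit()

# Expand: add the column as nullable; existing rows read as NULL.
conn.execute("ALTER TABLE tasks ADD COLUMN status TEXT")


def backfill_status(batch_size: int = 1000) -> int:
    """Fill NULLs in small transactions so each one holds locks only briefly.

    Returns the number of batches that updated rows. Filtering on
    `status IS NULL` makes the loop restartable and guarantees it ends.
    """
    batches = 0
    while True:
        with conn:  # one short transaction per batch
            cur = conn.execute(
                """UPDATE tasks SET status = 'todo'
                   WHERE id IN (SELECT id FROM tasks WHERE status IS NULL LIMIT ?)""",
                (batch_size,),
            )
        if cur.rowcount == 0:
            return batches
        batches += 1


batches = backfill_status()
print(batches)  # 3 (1000 + 1000 + 500 rows)
```

In production, a short pause between batches gives replicas and vacuum time to keep up.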

### Migration File Format

```sql
-- migrations/20260124_001_add_status_to_tasks.sql

-- CONCURRENTLY cannot run inside a transaction, so both directions opt out of the
-- per-migration transaction (`notransaction`, as supported by sql-migrate's markers).
-- Without a transaction the statements are not atomic: re-check state after failures.
-- +migrate Up notransaction
ALTER TABLE tasks ADD COLUMN status VARCHAR(50);
CREATE INDEX CONCURRENTLY idx_tasks_status ON tasks(status);

-- +migrate Down notransaction
DROP INDEX CONCURRENTLY IF EXISTS idx_tasks_status;
ALTER TABLE tasks DROP COLUMN IF EXISTS status;
```

---

## Query Optimization

### EXPLAIN Checklist

```sql
-- ANALYZE executes the query; wrap data-modifying statements in BEGIN ... ROLLBACK
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM tasks WHERE user_id = '...' AND status = 'active';
```

| Look For | Problem | Fix |
|----------|---------|-----|
| `Seq Scan` on large table | Missing index | Add index |
| Estimated `rows` far from actual rows | Stale statistics | `ANALYZE <table>` |
| `Sort` with high cost | No index matches ORDER BY | Add index on the ORDER BY columns |
| `Nested Loop` over large row sets | Inefficient JOIN | Add index, rewrite query |

### Common Optimizations

```sql
-- Pagination: cursor-based (keyset), not OFFSET
-- ❌ Slow for large offsets: the skipped rows are still read
SELECT * FROM tasks ORDER BY created_at DESC, id DESC LIMIT 20 OFFSET 10000;

-- ✅ Fast cursor-based: pass the (created_at, id) of the last row seen;
-- id breaks ties, so rows sharing a timestamp are neither skipped nor repeated
SELECT * FROM tasks
WHERE (created_at, id) < ('2026-01-20T10:00:00Z', '...')
ORDER BY created_at DESC, id DESC LIMIT 20;

-- Batch operations
-- ❌ One huge statement: long-held locks, large WAL, long rollback on failure
UPDATE tasks SET status = 'archived' WHERE ...;

-- ✅ Batched: repeat until 0 rows are updated (exclude rows already done)
UPDATE tasks SET status = 'archived'
WHERE id IN (SELECT id FROM tasks WHERE ... AND status <> 'archived' LIMIT 1000);

-- Count estimation (for UI "~1000 results"); estimate refreshed by VACUUM/ANALYZE
SELECT reltuples::bigint FROM pg_class WHERE relname = 'tasks';
```
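
A runnable sketch of keyset pagination with a tie-breaker, using SQLite via Python's `sqlite3` (row-value comparison needs SQLite 3.15+; PostgreSQL supports the same syntax). Timestamps are deliberately duplicated to show that the `(created_at, id)` cursor returns every row exactly once. The `page` helper, the page size of 4 and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL)")
# Index matching the sort order, so each page is a short index range read.
conn.execute("CREATE INDEX idx_tasks_created_id ON tasks(created_at, id)")
# Several rows share each timestamp: a cursor on created_at alone would skip rows.
conn.executemany(
    "INSERT INTO tasks (id, created_at) VALUES (?, ?)",
    [(i, f"2026-01-20T10:00:{i // 3:02d}Z") for i in range(1, 11)],
)


def page(after=None, limit=4):
    """Return (ids, next_cursor), newest first.

    `after` is the (created_at, id) of the last row on the previous page;
    next_cursor is None once a short page shows there is nothing more.
    """
    if after is None:
        rows = conn.execute(
            "SELECT id, created_at FROM tasks ORDER BY created_at DESC, id DESC LIMIT ?",
            (limit,),
        ).fetchall()
    else:
        rows = conn.execute(
            """SELECT id, created_at FROM tasks
               WHERE (created_at, id) < (?, ?)
               ORDER BY created_at DESC, id DESC LIMIT ?""",
            (after[0], after[1], limit),
        ).fetchall()
    next_cursor = (rows[-1][1], rows[-1][0]) if len(rows) == limit else None
    return [r[0] for r in rows], next_cursor


seen, cursor = [], None
while True:
    ids, cursor = page(cursor)
    seen.extend(ids)
    if cursor is None:
        break
print(seen)  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
```

The cursor is opaque to clients in practice (often base64-encoded); exposing raw values works but couples the API to the sort columns.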

---

## Validation Checklist

Before completing database design:

- [ ] Entity relationships documented (ERD)
- [ ] All tables have PK, created_at, updated_at
- [ ] Foreign keys have indexes
- [ ] Naming follows conventions
- [ ] Constraints enforce business rules
- [ ] Query patterns have supporting indexes
- [ ] Soft delete strategy defined (if needed)
- [ ] Multi-tenancy strategy defined (if needed)
- [ ] Migration strategy for schema changes
- [ ] Data retention/archival plan
- [ ] Backup strategy documented

---

## Output

- Schema: `docs/architecture/{module}/data-model.md`
- Migrations: `migrations/YYYYMMDD_NNN_description.sql`
- ERD: `docs/diagrams/data/{module}-erd.md`

## Related Skills

- `architecture-design` — Database is part of architecture
- `unit-writing` — Includes data-model.md
- `adr-writing` — Document database decisions
- `diagram-creation` — ER diagrams