@comfanion/workflow 4.36.39 → 4.36.41
- package/package.json +1 -1
- package/src/build-info.json +2 -2
- package/src/opencode/agents/dev.md +1 -1
- package/src/opencode/skills/adr-writing/SKILL.md +1 -1
- package/src/opencode/skills/api-design/SKILL.md +527 -0
- package/src/opencode/skills/architecture-design/SKILL.md +313 -10
- package/src/opencode/skills/architecture-validation/SKILL.md +4 -4
- package/src/opencode/skills/database-design/SKILL.md +820 -0
- package/src/opencode/skills/diagram-creation/SKILL.md +384 -10
- package/src/opencode/skills/unit-writing/SKILL.md +1 -1
|
@@ -0,0 +1,820 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: database-design
|
|
3
|
+
description: Use when designing database schema, choosing storage strategy, planning migrations, or optimizing queries for a module or service
|
|
4
|
+
license: MIT
|
|
5
|
+
compatibility: opencode
|
|
6
|
+
metadata:
|
|
7
|
+
domain: software-architecture
|
|
8
|
+
patterns: normalization, indexing, partitioning, migrations
|
|
9
|
+
artifacts: docs/architecture/*/data-model.md
|
|
10
|
+
---
|
|
11
|
+
|
|
12
|
+
# Database Design Skill
|
|
13
|
+
|
|
14
|
+
## When to Use
|
|
15
|
+
|
|
16
|
+
Use this skill when you need to:
|
|
17
|
+
- Design database schema for new module/service
|
|
18
|
+
- Choose database type (relational, document, graph, time-series)
|
|
19
|
+
- Plan table structure and relationships
|
|
20
|
+
- Design indexes for query optimization
|
|
21
|
+
- Plan data migrations (forward/backward compatible)
|
|
22
|
+
- Implement partitioning strategy
|
|
23
|
+
- Design for scalability and performance
|
|
24
|
+
|
|
25
|
+
## Reference
|
|
26
|
+
|
|
27
|
+
Always check project standards: `@CLAUDE.md`
|
|
28
|
+
|
|
29
|
+
## Templates
|
|
30
|
+
|
|
31
|
+
- Main: `@.opencode/skills/database-design/template.md`
|
|
32
|
+
- Migration: `@.opencode/skills/database-design/template-migration.md`
|
|
33
|
+
|
|
34
|
+
---
|
|
35
|
+
|
|
36
|
+
## Database Selection
|
|
37
|
+
|
|
38
|
+
**No default — choose based on project requirements.**
|
|
39
|
+
|
|
40
|
+
Analyze requirements → evaluate options → document decision in ADR.
|
|
41
|
+
|
|
42
|
+
### Selection Criteria
|
|
43
|
+
|
|
44
|
+
| Criteria | Questions to Answer |
|
|
45
|
+
|----------|---------------------|
|
|
46
|
+
| **Data model** | Relational? Documents? Key-value? Graph? Time-series? |
|
|
47
|
+
| **Consistency** | ACID required? Eventual OK? |
|
|
48
|
+
| **Scale** | GB? TB? PB? Read/write ratio? |
|
|
49
|
+
| **Query patterns** | Complex JOINs? Full-text? Aggregations? |
|
|
50
|
+
| **Deployment** | Cloud managed? Self-hosted? Embedded? Serverless? |
|
|
51
|
+
| **Team expertise** | What does team know? Learning curve acceptable? |
|
|
52
|
+
| **Cost** | License? Infrastructure? Operations? |
|
|
53
|
+
|
|
54
|
+
### Relational Databases
|
|
55
|
+
|
|
56
|
+
| Database | Pros | Cons | Best For |
|
|
57
|
+
|----------|------|------|----------|
|
|
58
|
+
| **PostgreSQL** | Feature-rich, extensions (JSONB, vectors, FTS), ACID, open-source | Heavier than alternatives, vertical scaling | Complex queries, mixed workloads, extensibility needed |
|
|
59
|
+
| **MySQL/MariaDB** | Fast reads, mature, wide hosting support | Fewer features than PostgreSQL, replication complexity | Read-heavy web apps, legacy compatibility |
|
|
60
|
+
| **SQLite** | Zero config, embedded, single file, serverless | Single writer, no network access | Mobile, desktop apps, dev/test, edge, small projects |
|
|
61
|
+
| **CockroachDB** | Distributed SQL, horizontal scaling, ACID | Complexity, latency vs single-node | Global distribution, high availability |
|
|
62
|
+
| **TiDB** | MySQL compatible, HTAP, horizontal scaling | Operational complexity | MySQL migration with scale needs |
|
|
63
|
+
|
|
64
|
+
### Document Databases
|
|
65
|
+
|
|
66
|
+
| Database | Pros | Cons | Best For |
|
|
67
|
+
|----------|------|------|----------|
|
|
68
|
+
| **MongoDB** | Flexible schema, horizontal scaling, rich queries | Limited JOINs (`$lookup` only), stale reads possible from secondaries | Rapid prototyping, variable schemas, content management |
|
|
69
|
+
| **CouchDB** | Multi-master replication, offline-first | Limited queries, slower | Sync-heavy apps, offline-first |
|
|
70
|
+
| **FerretDB** | MongoDB protocol, PostgreSQL backend | Newer, subset of features | MongoDB API with PostgreSQL storage |
|
|
71
|
+
|
|
72
|
+
### Key-Value & Cache
|
|
73
|
+
|
|
74
|
+
| Database | Pros | Cons | Best For |
|
|
75
|
+
|----------|------|------|----------|
|
|
76
|
+
| **Redis** | Sub-ms latency, data structures, pub/sub | Memory-bound, persistence tradeoffs | Cache, sessions, real-time, queues |
|
|
77
|
+
| **Memcached** | Simple, fast, multi-threaded | No persistence, only strings | Pure caching |
|
|
78
|
+
| **DragonflyDB** | Redis compatible, better memory efficiency | Newer | Redis replacement at scale |
|
|
79
|
+
| **KeyDB** | Redis fork, multi-threaded | Community smaller | High-throughput Redis workloads |
|
|
80
|
+
|
|
81
|
+
### Search Engines
|
|
82
|
+
|
|
83
|
+
| Database | Pros | Cons | Best For |
|
|
84
|
+
|----------|------|------|----------|
|
|
85
|
+
| **Elasticsearch** | Powerful search, analytics, ecosystem | Resource heavy, operational complexity | Full-text search, logging, analytics |
|
|
86
|
+
| **OpenSearch** | Elasticsearch fork, open-source | Feature lag | Elasticsearch alternative (AWS) |
|
|
87
|
+
| **Meilisearch** | Simple, fast, typo-tolerant | Fewer features, smaller scale | Simple search, instant search UX |
|
|
88
|
+
| **Typesense** | Easy setup, typo-tolerant | Smaller community | Developer-friendly search |
|
|
89
|
+
|
|
90
|
+
### Vector Databases (AI/ML)
|
|
91
|
+
|
|
92
|
+
| Database | Pros | Cons | Best For |
|
|
93
|
+
|----------|------|------|----------|
|
|
94
|
+
| **pgvector** | PostgreSQL extension, familiar | Scale limits vs specialized | Vector search + relational in one DB |
|
|
95
|
+
| **Pinecone** | Managed, scalable, simple API | Vendor lock-in, cost | Production AI apps, managed solution |
|
|
96
|
+
| **Qdrant** | Open-source, filtering, fast | Self-hosting complexity | Self-hosted vector search |
|
|
97
|
+
| **Weaviate** | GraphQL, modules, hybrid search | Heavier | Semantic search, multi-modal |
|
|
98
|
+
| **Chroma** | Simple, embedded option | Early stage | Prototyping, small projects |
|
|
99
|
+
| **Milvus** | Scalable, open-source | Complex deployment | Large-scale similarity search |
|
|
100
|
+
|
|
101
|
+
### Time-Series
|
|
102
|
+
|
|
103
|
+
| Database | Pros | Cons | Best For |
|
|
104
|
+
|----------|------|------|----------|
|
|
105
|
+
| **TimescaleDB** | PostgreSQL extension, SQL | Single-node limits | Time-series + relational |
|
|
106
|
+
| **ClickHouse** | Blazing fast analytics, compression | Column-oriented learning curve | OLAP, analytics, logs at scale |
|
|
107
|
+
| **InfluxDB** | Purpose-built, ecosystem | Query language (Flux) | Metrics, IoT, monitoring |
|
|
108
|
+
| **QuestDB** | Fast ingestion, SQL | Smaller community | High-frequency time-series |
|
|
109
|
+
|
|
110
|
+
### Graph Databases
|
|
111
|
+
|
|
112
|
+
| Database | Pros | Cons | Best For |
|
|
113
|
+
|----------|------|------|----------|
|
|
114
|
+
| **Neo4j** | Mature, Cypher query language | License cost, scaling | Complex relationships, graph algorithms |
|
|
115
|
+
| **Amazon Neptune** | Managed, multiple models | AWS only | Cloud-native graph |
|
|
116
|
+
| **ArangoDB** | Multi-model (doc, graph, KV) | Jack of all trades | Flexible data models |
|
|
117
|
+
|
|
118
|
+
### Message Queues & Streaming
|
|
119
|
+
|
|
120
|
+
| System | Pros | Cons | Best For |
|
|
121
|
+
|--------|------|------|----------|
|
|
122
|
+
| **Kafka** | High throughput, durability, ecosystem | Operational complexity | Event streaming, high-volume |
|
|
123
|
+
| **RabbitMQ** | Flexible routing, protocols | Lower throughput than Kafka | Task queues, complex routing |
|
|
124
|
+
| **NATS** | Simple, fast, lightweight | Fewer features | Microservices, IoT |
|
|
125
|
+
| **Redis Streams** | Built into Redis | Fewer features than Kafka | Simple streaming with Redis |
|
|
126
|
+
|
|
127
|
+
### Multi-Storage Architecture
|
|
128
|
+
|
|
129
|
+
When one database isn't enough:
|
|
130
|
+
|
|
131
|
+
```
|
|
132
|
+
┌─────────────────────────────────────────────────────────┐
│                       Application                       │
└───────┬─────────────┬─────────────┬─────────────┬───────┘
        │             │             │             │
        ▼             ▼             ▼             ▼
   ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐
   │ Primary │   │  Cache  │   │ Search  │   │ Events  │
   │   DB    │   │         │   │         │   │         │
   └─────────┘   └─────────┘   └─────────┘   └─────────┘
|
|
141
|
+
```
|
|
142
|
+
|
|
143
|
+
**Considerations:**
|
|
144
|
+
- Sync complexity (eventual consistency)
|
|
145
|
+
- Operational overhead (more systems to manage)
|
|
146
|
+
- Cost (licenses, infrastructure, people)
|
|
147
|
+
- Data integrity across systems
|
|
148
|
+
|
|
149
|
+
### Decision Checklist
|
|
150
|
+
|
|
151
|
+
Before choosing:
|
|
152
|
+
|
|
153
|
+
- [ ] Requirements documented? (consistency, scale, queries)
|
|
154
|
+
- [ ] Multiple options evaluated with pros/cons?
|
|
155
|
+
- [ ] Team expertise considered?
|
|
156
|
+
- [ ] Operational complexity acceptable?
|
|
157
|
+
- [ ] Cost analyzed? (license + infra + ops)
|
|
158
|
+
- [ ] Migration path exists if wrong choice?
|
|
159
|
+
- [ ] Decision documented in ADR?
|
|
160
|
+
|
|
161
|
+
---
|
|
162
|
+
|
|
163
|
+
## Decision Scenarios
|
|
164
|
+
|
|
165
|
+
### By Project Type
|
|
166
|
+
|
|
167
|
+
| Project Type | Recommended Approach | Why |
|
|
168
|
+
|--------------|---------------------|-----|
|
|
169
|
+
| **MVP / Prototype** | SQLite or single PostgreSQL/MySQL | Simplicity, fast iteration, easy to change later |
|
|
170
|
+
| **SaaS B2B** | PostgreSQL + Redis cache | Complex queries, multi-tenancy, transactions |
|
|
171
|
+
| **Mobile app** | SQLite (local) + sync to cloud DB | Offline-first, embedded |
|
|
172
|
+
| **E-commerce** | PostgreSQL/MySQL + Elasticsearch + Redis | Transactions + search + cache |
|
|
173
|
+
| **Analytics platform** | ClickHouse / BigQuery + PostgreSQL (metadata) | OLAP for data, OLTP for config |
|
|
174
|
+
| **Real-time chat** | Redis/KeyDB + PostgreSQL (history) | Speed for live, durability for archive |
|
|
175
|
+
| **IoT / Telemetry** | TimescaleDB / InfluxDB / QuestDB | Time-series optimized |
|
|
176
|
+
| **AI/ML application** | PostgreSQL + pgvector OR dedicated vector DB | Embeddings + relational data |
|
|
177
|
+
| **Content platform** | MongoDB or PostgreSQL + S3 | Flexible content, binary storage |
|
|
178
|
+
| **Social network** | PostgreSQL + Neo4j (graph) + Redis | Relations + graph traversals + cache |
|
|
179
|
+
|
|
180
|
+
### By Scale
|
|
181
|
+
|
|
182
|
+
| Scale | Data Volume | Approach |
|
|
183
|
+
|-------|-------------|----------|
|
|
184
|
+
| **Small** | < 100K rows, 1 server | SQLite, or a single instance of any DB |
|
|
185
|
+
| **Medium** | 100K-10M rows | Single PostgreSQL/MySQL with replicas |
|
|
186
|
+
| **Large** | 10M-1B rows | Sharding, read replicas, caching layer |
|
|
187
|
+
| **Massive** | > 1B rows | Distributed DB (CockroachDB, TiDB), specialized stores |
|
|
188
|
+
|
|
189
|
+
### By Team Size
|
|
190
|
+
|
|
191
|
+
| Team | Recommendation |
|
|
192
|
+
|------|----------------|
|
|
193
|
+
| **Solo / 1-2 devs** | SQLite → PostgreSQL. Minimize ops burden |
|
|
194
|
+
| **Small team (3-5)** | One primary DB + cache. Avoid polyglot until needed |
|
|
195
|
+
| **Medium team (6-15)** | Can handle 2-3 specialized stores if justified |
|
|
196
|
+
| **Large team (16+)** | Dedicated DBAs, can manage complex setups |
|
|
197
|
+
|
|
198
|
+
---
|
|
199
|
+
|
|
200
|
+
## Selection Rules
|
|
201
|
+
|
|
202
|
+
### Rule 1: Start Simple
|
|
203
|
+
|
|
204
|
+
```
|
|
205
|
+
IF project is new AND requirements unclear
|
|
206
|
+
THEN choose simplest option (SQLite for embedded, PostgreSQL for server)
|
|
207
|
+
AND plan to migrate if needed
|
|
208
|
+
```
|
|
209
|
+
|
|
210
|
+
### Rule 2: Match Data Model
|
|
211
|
+
|
|
212
|
+
```
|
|
213
|
+
IF data is highly relational (JOINs, transactions)
|
|
214
|
+
THEN relational DB (PostgreSQL, MySQL)
|
|
215
|
+
|
|
216
|
+
IF data is documents with variable schema
|
|
217
|
+
THEN document DB (MongoDB) OR relational with JSONB
|
|
218
|
+
|
|
219
|
+
IF data is key-value with TTL
|
|
220
|
+
THEN Redis / Memcached
|
|
221
|
+
|
|
222
|
+
IF data is time-ordered metrics
|
|
223
|
+
THEN time-series DB (TimescaleDB, InfluxDB)
|
|
224
|
+
|
|
225
|
+
IF data is embeddings for similarity
|
|
226
|
+
THEN vector DB OR PostgreSQL + pgvector
|
|
227
|
+
```
|
|
228
|
+
|
|
229
|
+
### Rule 3: Consider Consistency Requirements
|
|
230
|
+
|
|
231
|
+
```
|
|
232
|
+
IF financial transactions OR inventory
|
|
233
|
+
THEN ACID required → relational DB with transactions
|
|
234
|
+
|
|
235
|
+
IF user preferences OR analytics
|
|
236
|
+
THEN eventual consistency OK → more options available
|
|
237
|
+
|
|
238
|
+
IF distributed system with strong consistency
|
|
239
|
+
THEN CockroachDB, Spanner, YugabyteDB
|
|
240
|
+
```
|
|
241
|
+
|
|
242
|
+
### Rule 4: Operational Reality
|
|
243
|
+
|
|
244
|
+
```
|
|
245
|
+
IF team has no DBA AND no DevOps
|
|
246
|
+
THEN prefer managed services (RDS, PlanetScale, Supabase, Neon)
|
|
247
|
+
|
|
248
|
+
IF compliance requires data locality
|
|
249
|
+
THEN self-hosted OR specific cloud regions
|
|
250
|
+
|
|
251
|
+
IF budget is limited
|
|
252
|
+
THEN open-source + self-hosted OR SQLite
|
|
253
|
+
```
|
|
254
|
+
|
|
255
|
+
### Rule 5: Don't Over-Engineer
|
|
256
|
+
|
|
257
|
+
```
|
|
258
|
+
IF single PostgreSQL can handle the load
|
|
259
|
+
THEN don't add Redis "just in case"
|
|
260
|
+
|
|
261
|
+
IF you're adding a DB "for future scale"
|
|
262
|
+
THEN stop — add when actually needed
|
|
263
|
+
|
|
264
|
+
IF polyglot persistence adds sync complexity
|
|
265
|
+
THEN evaluate if benefits outweigh costs
|
|
266
|
+
```
|
|
267
|
+
|
|
268
|
+
---
|
|
269
|
+
|
|
270
|
+
## Anti-Patterns
|
|
271
|
+
|
|
272
|
+
### ❌ Don't Do This
|
|
273
|
+
|
|
274
|
+
| Anti-Pattern | Problem | Better Approach |
|
|
275
|
+
|--------------|---------|-----------------|
|
|
276
|
+
| **MongoDB for everything** | Loses relational benefits, JOINs in app code | Use relational for relational data |
|
|
277
|
+
| **Microservice = own DB type** | Operational nightmare, 10 different DBs | Standardize on 1-2 types across services |
|
|
278
|
+
| **Redis as primary store** | Persistence is tricky, data loss risk | Redis for cache, real DB for persistence |
|
|
279
|
+
| **Premature sharding** | Complexity before it's needed | Scale vertically first; shard once measurements show the need |
|
|
280
|
+
| **Elasticsearch as primary** | Not designed for ACID, sync issues | ES for search, source of truth elsewhere |
|
|
281
|
+
| **Ignoring team expertise** | Steep learning curve, bugs, slow delivery | Factor in what team knows |
|
|
282
|
+
| **Choosing by hype** | New ≠ better, production readiness matters | Evaluate maturity, community, support |
|
|
283
|
+
| **No migration plan** | Stuck with wrong choice forever | Always consider "what if we need to change" |
|
|
284
|
+
|
|
285
|
+
### ⚠️ Warning Signs
|
|
286
|
+
|
|
287
|
+
| Sign | What It Means |
|
|
288
|
+
|------|---------------|
|
|
289
|
+
| "Let's use X, I've been wanting to try it" | Technology-driven, not requirement-driven |
|
|
290
|
+
| "We might need scale someday" | Premature optimization |
|
|
291
|
+
| "Everyone uses MongoDB now" | Hype-driven, not analysis-driven |
|
|
292
|
+
| "PostgreSQL is boring" | Boring = proven, stable, predictable |
|
|
293
|
+
| "We need real-time so Redis for everything" | Misunderstanding use cases |
|
|
294
|
+
| Adding 4th database to architecture | Complexity explosion, reconsider |
|
|
295
|
+
|
|
296
|
+
---
|
|
297
|
+
|
|
298
|
+
## Example Architectures
|
|
299
|
+
|
|
300
|
+
### Example 1: SaaS Task Management
|
|
301
|
+
|
|
302
|
+
```
|
|
303
|
+
Requirements:
|
|
304
|
+
- Multi-tenant (100s of companies)
|
|
305
|
+
- Real-time updates
|
|
306
|
+
- Full-text search in tasks
|
|
307
|
+
- Audit trail
|
|
308
|
+
|
|
309
|
+
Architecture:
|
|
310
|
+
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ PostgreSQL  │────►│    Redis    │────►│   Client    │
│  (primary)  │     │  (pub/sub,  │     │             │
│             │     │   cache)    │     │             │
└─────────────┘     └─────────────┘     └─────────────┘
       │
       │ CDC/trigger
       ▼
┌─────────────┐
│ Meilisearch │ (search)
└─────────────┘
|
|
321
|
+
|
|
322
|
+
Why:
|
|
323
|
+
- PostgreSQL: ACID for tasks, RLS for multi-tenancy
|
|
324
|
+
- Redis: pub/sub for real-time, session cache
|
|
325
|
+
- Meilisearch: simple search, typo-tolerant
|
|
326
|
+
```
|
|
327
|
+
|
|
328
|
+
### Example 2: E-commerce Platform
|
|
329
|
+
|
|
330
|
+
```
|
|
331
|
+
Requirements:
|
|
332
|
+
- Product catalog (1M+ products)
|
|
333
|
+
- Order transactions
|
|
334
|
+
- Search with facets
|
|
335
|
+
- Recommendations
|
|
336
|
+
|
|
337
|
+
Architecture:
|
|
338
|
+
┌─────────────┐     ┌─────────────┐
│    MySQL    │     │ Elasticsearch│
│  (orders,   │────►│  (product   │
│  inventory) │     │   search)   │
└─────────────┘     └─────────────┘
       │
       │            ┌─────────────┐
       └───────────►│    Redis    │
                    │ (cart,cache)│
                    └─────────────┘

┌─────────────┐
│  pgvector/  │ (recommendations)
│   Qdrant    │
└─────────────┘
|
|
353
|
+
|
|
354
|
+
Why:
|
|
355
|
+
- MySQL: proven for e-commerce, transactions
|
|
356
|
+
- Elasticsearch: faceted search, filters
|
|
357
|
+
- Redis: cart sessions, product cache
|
|
358
|
+
- Vector DB: "similar products" recommendations
|
|
359
|
+
```
|
|
360
|
+
|
|
361
|
+
### Example 3: Mobile App with Offline
|
|
362
|
+
|
|
363
|
+
```
|
|
364
|
+
Requirements:
|
|
365
|
+
- Works offline
|
|
366
|
+
- Syncs when online
|
|
367
|
+
- Simple data model
|
|
368
|
+
|
|
369
|
+
Architecture:
|
|
370
|
+
┌─────────────────────────────────────┐
│             Mobile App              │
│  ┌─────────────┐                    │
│  │   SQLite    │  (local)           │
│  └──────┬──────┘                    │
└─────────┼───────────────────────────┘
          │ sync
          ▼
┌─────────────────────────────────────┐
│             Backend API             │
│  ┌─────────────┐                    │
│  │ PostgreSQL  │  (cloud)           │
│  └─────────────┘                    │
└─────────────────────────────────────┘
|
|
384
|
+
|
|
385
|
+
Why:
|
|
386
|
+
- SQLite: embedded, zero-config, offline
|
|
387
|
+
- PostgreSQL: server-side, handles conflicts
|
|
388
|
+
- Sync: custom or use Supabase/Firebase
|
|
389
|
+
```
|
|
390
|
+
|
|
391
|
+
### Example 4: Analytics Dashboard
|
|
392
|
+
|
|
393
|
+
```
|
|
394
|
+
Requirements:
|
|
395
|
+
- Ingest 1M events/day
|
|
396
|
+
- Fast aggregations
|
|
397
|
+
- Historical queries
|
|
398
|
+
|
|
399
|
+
Architecture:
|
|
400
|
+
┌─────────────┐     ┌─────────────┐
│    Kafka    │────►│ ClickHouse  │
│  (ingest)   │     │ (analytics) │
└─────────────┘     └─────────────┘
       │
       ▼
┌─────────────┐
│ PostgreSQL  │
│ (metadata,  │
│   users)    │
└─────────────┘
|
|
411
|
+
|
|
412
|
+
Why:
|
|
413
|
+
- Kafka: handles high-volume ingest
|
|
414
|
+
- ClickHouse: columnar, fast aggregations
|
|
415
|
+
- PostgreSQL: user accounts, dashboard configs
|
|
416
|
+
```
|
|
417
|
+
|
|
418
|
+
---
|
|
419
|
+
|
|
420
|
+
## Schema Design Process
|
|
421
|
+
|
|
422
|
+
### Step 1: Identify Entities
|
|
423
|
+
|
|
424
|
+
From requirements/PRD, extract:
|
|
425
|
+
- **Core entities** — Main business objects
|
|
426
|
+
- **Supporting entities** — Lookup tables, configs
|
|
427
|
+
- **Junction tables** — M:N relationships
|
|
428
|
+
|
|
429
|
+
### Step 2: Define Relationships
|
|
430
|
+
|
|
431
|
+
| Relationship | Implementation | Example |
|
|
432
|
+
|--------------|----------------|---------|
|
|
433
|
+
| 1:1 | FK in either table OR same table | user ↔ profile |
|
|
434
|
+
| 1:N | FK in "many" side | user → posts |
|
|
435
|
+
| M:N | Junction table | users ↔ roles |
|
|
436
|
+
| Hierarchy | Self-referencing FK OR nested set | categories |
|
|
437
|
+
| Polymorphic | Type column + nullable FKs OR separate tables | comments on posts/tasks |
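
The first four rows of the table above could look roughly like this in PostgreSQL. This is a sketch, not a prescribed schema: the `users`, `profiles`, `posts`, `roles`, `user_roles`, and `categories` tables are hypothetical, and `gen_random_uuid()` is assumed to be available (built in from PostgreSQL 13, otherwise provided by the `pgcrypto` extension).

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid()
);

-- 1:1: the FK doubles as the primary key, so a user has at most one profile
CREATE TABLE profiles (
    user_id UUID PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
    bio     TEXT
);

-- 1:N: the FK lives on the "many" side and gets its own index
CREATE TABLE posts (
    id      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id)
);
CREATE INDEX idx_posts_user_id ON posts(user_id);

-- M:N: junction table with a composite primary key
CREATE TABLE roles (
    id   UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(50) NOT NULL UNIQUE
);
CREATE TABLE user_roles (
    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    role_id UUID NOT NULL REFERENCES roles(id) ON DELETE CASCADE,
    PRIMARY KEY (user_id, role_id)
);
-- The composite PK serves lookups by user_id; role_id needs a separate index
CREATE INDEX idx_user_roles_role_id ON user_roles(role_id);

-- Hierarchy: self-referencing FK, a NULL parent marks a root node
CREATE TABLE categories (
    id        UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    parent_id UUID REFERENCES categories(id),
    name      VARCHAR(100) NOT NULL
);
CREATE INDEX idx_categories_parent_id ON categories(parent_id);
```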
|
|
438
|
+
|
|
439
|
+
### Step 3: Normalize (then Denormalize)
|
|
440
|
+
|
|
441
|
+
#### Normalization Levels
|
|
442
|
+
|
|
443
|
+
| Form | Rule | Check |
|
|
444
|
+
|------|------|-------|
|
|
445
|
+
| **1NF** | Atomic values, no arrays in columns | Each cell = one value |
|
|
446
|
+
| **2NF** | No partial dependencies | All non-key columns depend on FULL PK |
|
|
447
|
+
| **3NF** | No transitive dependencies | Non-key columns don't depend on other non-key |
|
|
448
|
+
| **BCNF** | Every determinant is a candidate key | Advanced, rare |
|
|
449
|
+
|
|
450
|
+
#### When to Denormalize
|
|
451
|
+
|
|
452
|
+
| Situation | Denormalization |
|
|
453
|
+
|-----------|-----------------|
|
|
454
|
+
| Read-heavy, rarely changes | Duplicate for read speed |
|
|
455
|
+
| Reporting/analytics | Materialized views |
|
|
456
|
+
| Cross-module data | Cache foreign data locally |
|
|
457
|
+
| Computed values | Store calculated fields |
|
|
458
|
+
|
|
459
|
+
**Rule:** Normalize first, denormalize with measured reason.
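
As one possible instance of the "Reporting/analytics" row, a materialized view keeps the normalized tables as the source of truth and stores the aggregate separately. The sketch assumes a `tasks` table with non-null `user_id` and `status` columns; the view and index names are hypothetical.

```sql
-- Hypothetical reporting view over the tasks table.
CREATE MATERIALIZED VIEW task_counts_by_user AS
SELECT user_id, status, COUNT(*) AS task_count
FROM tasks
GROUP BY user_id, status;

-- REFRESH ... CONCURRENTLY requires a unique index on the materialized view
CREATE UNIQUE INDEX idx_task_counts_by_user
    ON task_counts_by_user (user_id, status);

-- Rebuild without blocking readers of the view (schedule as needed)
REFRESH MATERIALIZED VIEW CONCURRENTLY task_counts_by_user;
```

The view is only as fresh as its last refresh, so this fits reports that tolerate some staleness.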
|
|
460
|
+
|
|
461
|
+
### Step 4: Design Indexes
|
|
462
|
+
|
|
463
|
+
#### Index Types
|
|
464
|
+
|
|
465
|
+
| Type | PostgreSQL | Use For |
|
|
466
|
+
|------|------------|---------|
|
|
467
|
+
| B-tree | `CREATE INDEX` (default) | Equality, range, sorting |
|
|
468
|
+
| Hash | `USING hash` | Equality (`=`) only; benchmark against B-tree before adopting |
|
|
469
|
+
| GiST | `USING gist` | Geometric, full-text |
|
|
470
|
+
| GIN | `USING gin` | Arrays, JSONB, full-text |
|
|
471
|
+
| BRIN | `USING brin` | Large tables, sorted data |
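
Two of the less familiar rows, sketched under assumptions: a `tasks` table with a text `title` column, and an append-mostly `events` table whose `event_time` grows with insertion order. Index names are hypothetical.

```sql
-- GIN over a tsvector expression for full-text search
CREATE INDEX idx_tasks_title_fts
    ON tasks USING gin (to_tsvector('english', title));

-- The query has to repeat the same expression for the index to be usable
SELECT id, title
FROM tasks
WHERE to_tsvector('english', title) @@ plainto_tsquery('english', 'deploy');

-- BRIN: very small index, effective when values correlate with physical row order
CREATE INDEX idx_events_time_brin ON events USING brin (event_time);
```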
|
|
472
|
+
|
|
473
|
+
#### Index Strategy
|
|
474
|
+
|
|
475
|
+
```sql
|
|
476
|
+
-- Primary key (automatic)
|
|
477
|
+
PRIMARY KEY (id)
|
|
478
|
+
|
|
479
|
+
-- Foreign keys (PostgreSQL does not index the referencing column automatically)
|
|
480
|
+
CREATE INDEX idx_tasks_user_id ON tasks(user_id);
|
|
481
|
+
|
|
482
|
+
-- Common queries
|
|
483
|
+
CREATE INDEX idx_tasks_status ON tasks(status) WHERE status != 'archived';
|
|
484
|
+
|
|
485
|
+
-- Composite (order matters!)
|
|
486
|
+
CREATE INDEX idx_tasks_user_status ON tasks(user_id, status);
|
|
487
|
+
|
|
488
|
+
-- Covering index (includes columns)
|
|
489
|
+
CREATE INDEX idx_tasks_list ON tasks(user_id) INCLUDE (title, status);
|
|
490
|
+
```
|
|
491
|
+
|
|
492
|
+
#### Index Anti-Patterns
|
|
493
|
+
|
|
494
|
+
| Anti-Pattern | Problem |
|
|
495
|
+
|--------------|---------|
|
|
496
|
+
| Index every column | Write overhead, storage |
|
|
497
|
+
| Missing FK indexes | Slow JOINs, lock contention |
|
|
498
|
+
| Wrong column order in composite | Queries that skip the leading column cannot use the index efficiently |
|
|
499
|
+
| Over-indexing low-cardinality | B-tree inefficient for few values |
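
To make the column-order and over-indexing rows concrete, here is a sketch that assumes the `idx_tasks_user_status` index on `tasks(user_id, status)` from the Index Strategy section; the UUID literal is a placeholder.

```sql
-- Can use idx_tasks_user_status: both queries filter on the leading column
SELECT id FROM tasks
WHERE user_id = '00000000-0000-0000-0000-000000000001';

SELECT id FROM tasks
WHERE user_id = '00000000-0000-0000-0000-000000000001'
  AND status = 'active';

-- Usually cannot use it efficiently: the leading column is missing
SELECT id FROM tasks WHERE status = 'active';

-- Candidates for removal: indexes never scanned since statistics were last reset
SELECT relname AS table_name, indexrelname AS index_name, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY relname, indexrelname;
```

Before dropping anything from the last query, check that the statistics cover a representative period and that the index does not back a constraint.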
|
|
500
|
+
|
|
501
|
+
---
|
|
502
|
+
|
|
503
|
+
## Table Conventions
|
|
504
|
+
|
|
505
|
+
### Naming
|
|
506
|
+
|
|
507
|
+
```sql
|
|
508
|
+
-- Tables: plural, snake_case
|
|
509
|
+
users, task_comments, user_roles
|
|
510
|
+
|
|
511
|
+
-- Columns: snake_case
|
|
512
|
+
created_at, user_id, is_active
|
|
513
|
+
|
|
514
|
+
-- Primary keys: id (uuid or bigint)
|
|
515
|
+
id UUID PRIMARY KEY DEFAULT gen_random_uuid()
|
|
516
|
+
|
|
517
|
+
-- Foreign keys: {table_singular}_id
|
|
518
|
+
user_id UUID REFERENCES users(id)
|
|
519
|
+
|
|
520
|
+
-- Timestamps: _at suffix
|
|
521
|
+
created_at, updated_at, deleted_at
|
|
522
|
+
```
|
|
523
|
+
|
|
524
|
+
### Standard Columns
|
|
525
|
+
|
|
526
|
+
```sql
|
|
527
|
+
-- Every table should have
|
|
528
|
+
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
|
529
|
+
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
|
530
|
+
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
|
|
531
|
+
|
|
532
|
+
-- Soft delete (if needed)
|
|
533
|
+
deleted_at TIMESTAMPTZ
|
|
534
|
+
|
|
535
|
+
-- Multi-tenant
|
|
536
|
+
tenant_id UUID NOT NULL REFERENCES tenants(id)
|
|
537
|
+
```
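
`DEFAULT NOW()` only fills `updated_at` on insert; later updates leave it unchanged unless the application or a trigger sets it. A minimal trigger sketch, assuming a `tasks` table with an `updated_at` column (the function and trigger names are hypothetical):

```sql
-- Hypothetical helper: stamps updated_at on every row update
CREATE OR REPLACE FUNCTION set_updated_at()
RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
    NEW.updated_at := NOW();
    RETURN NEW;
END;
$$;

CREATE TRIGGER trg_tasks_set_updated_at
    BEFORE UPDATE ON tasks
    FOR EACH ROW
    EXECUTE FUNCTION set_updated_at();  -- PostgreSQL 11+; older versions use EXECUTE PROCEDURE
```

The same function can be attached to every table that carries the standard columns.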
|
|
538
|
+
|
|
539
|
+
### Constraints
|
|
540
|
+
|
|
541
|
+
```sql
|
|
542
|
+
-- Convention: declare NOT NULL unless a missing value is meaningful; document nullable columns
|
|
543
|
+
email VARCHAR(255) NOT NULL,
|
|
544
|
+
bio TEXT -- nullable explicitly documented
|
|
545
|
+
|
|
546
|
+
-- CHECK constraints for business rules
|
|
547
|
+
CHECK (status IN ('draft', 'active', 'archived'))
|
|
548
|
+
CHECK (price >= 0)
|
|
549
|
+
CHECK (start_date < end_date)
|
|
550
|
+
|
|
551
|
+
-- UNIQUE constraints
|
|
552
|
+
UNIQUE (tenant_id, email) -- scoped uniqueness
|
|
553
|
+
```
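
The fragments above are not runnable on their own. Assembled into one statement they might look like the sketch below, which uses a hypothetical `subscriptions` table plus a minimal `tenants` table and assumes `gen_random_uuid()` is available.

```sql
-- Hypothetical tables combining the naming, standard-column, and constraint conventions.
CREATE TABLE tenants (
    id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name       VARCHAR(255) NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE subscriptions (
    id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id  UUID NOT NULL REFERENCES tenants(id),
    email      VARCHAR(255) NOT NULL,
    status     VARCHAR(20) NOT NULL DEFAULT 'draft',
    price      NUMERIC(12, 2) NOT NULL,
    start_date DATE NOT NULL,
    end_date   DATE NOT NULL,
    notes      TEXT,                          -- nullable: optional free text
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    deleted_at TIMESTAMPTZ,                   -- nullable: NULL means not deleted
    CONSTRAINT subscriptions_status_valid  CHECK (status IN ('draft', 'active', 'archived')),
    CONSTRAINT subscriptions_price_valid   CHECK (price >= 0),
    CONSTRAINT subscriptions_dates_ordered CHECK (start_date < end_date),
    -- Scoped uniqueness; its index also serves lookups by tenant_id alone
    CONSTRAINT subscriptions_tenant_email_unique UNIQUE (tenant_id, email)
);
```

Naming the constraints makes violations easier to map to a business rule in error messages and simplifies later `ALTER TABLE ... DROP CONSTRAINT` migrations.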
|
|
554
|
+
|
|
555
|
+
---
|
|
556
|
+
|
|
557
|
+
## Common Patterns
|
|
558
|
+
|
|
559
|
+
### Soft Delete
|
|
560
|
+
|
|
561
|
+
```sql
|
|
562
|
+
-- Column
|
|
563
|
+
deleted_at TIMESTAMPTZ
|
|
564
|
+
|
|
565
|
+
-- Index for active records
|
|
566
|
+
CREATE INDEX idx_users_active ON users(id) WHERE deleted_at IS NULL;
|
|
567
|
+
|
|
568
|
+
-- Application: always filter
|
|
569
|
+
SELECT * FROM users WHERE deleted_at IS NULL;
|
|
570
|
+
```
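
Soft delete interacts with uniqueness: a plain `UNIQUE (email)` would block re-registration after a soft delete. One way to handle this, assuming `users` has an `email` column, is a partial unique index; the index and view names are hypothetical.

```sql
-- Uniqueness only among live rows, so a deleted user's email can be reused
CREATE UNIQUE INDEX idx_users_email_active
    ON users (email)
    WHERE deleted_at IS NULL;

-- Hypothetical view so callers cannot forget the filter
CREATE VIEW active_users AS
SELECT * FROM users WHERE deleted_at IS NULL;

-- Soft delete, then restore
UPDATE users SET deleted_at = NOW()
WHERE id = '00000000-0000-0000-0000-000000000001';

UPDATE users SET deleted_at = NULL
WHERE id = '00000000-0000-0000-0000-000000000001';
```

The restore fails with a unique violation if another live row has taken the same email in the meantime, which is usually the desired behavior.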
|
|
571
|
+
|
|
572
|
+
### Audit Trail (History)
|
|
573
|
+
|
|
574
|
+
```sql
|
|
575
|
+
-- Option 1: History table
|
|
576
|
+
CREATE TABLE tasks_history (
|
|
577
|
+
history_id UUID PRIMARY KEY,
|
|
578
|
+
task_id UUID NOT NULL,
|
|
579
|
+
changed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
|
580
|
+
changed_by UUID,
|
|
581
|
+
operation VARCHAR(10), -- INSERT, UPDATE, DELETE
|
|
582
|
+
old_data JSONB,
|
|
583
|
+
new_data JSONB
|
|
584
|
+
);
|
|
585
|
+
|
|
586
|
+
-- Option 2: SCD Type 2 (Slowly Changing Dimension)
|
|
587
|
+
CREATE TABLE users (
|
|
588
|
+
id UUID,
|
|
589
|
+
email VARCHAR(255),
|
|
590
|
+
name VARCHAR(255),
|
|
591
|
+
valid_from TIMESTAMPTZ NOT NULL,
|
|
592
|
+
valid_to TIMESTAMPTZ, -- NULL = current
|
|
593
|
+
is_current BOOLEAN GENERATED ALWAYS AS (valid_to IS NULL) STORED,
|
|
594
|
+
PRIMARY KEY (id, valid_from)
|
|
595
|
+
);
|
|
596
|
+
```
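
Option 1 needs something to write the history rows. A trigger is one common choice. The sketch below assumes the `tasks_history` table above, a `tasks` table with an `id` column, and a hypothetical `app.user_id` session setting that the application sets per request. The function and trigger names are also assumptions.

```sql
CREATE OR REPLACE FUNCTION log_task_change()
RETURNS trigger
LANGUAGE plpgsql
AS $$
DECLARE
    -- NULL when the application did not set app.user_id (hypothetical setting)
    v_actor UUID := NULLIF(current_setting('app.user_id', true), '')::uuid;
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO tasks_history (history_id, task_id, changed_by, operation, new_data)
        VALUES (gen_random_uuid(), NEW.id, v_actor, TG_OP, to_jsonb(NEW));
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO tasks_history (history_id, task_id, changed_by, operation, old_data, new_data)
        VALUES (gen_random_uuid(), NEW.id, v_actor, TG_OP, to_jsonb(OLD), to_jsonb(NEW));
    ELSE  -- DELETE
        INSERT INTO tasks_history (history_id, task_id, changed_by, operation, old_data)
        VALUES (gen_random_uuid(), OLD.id, v_actor, TG_OP, to_jsonb(OLD));
    END IF;
    RETURN NULL;  -- the return value is ignored for AFTER row triggers
END;
$$;

CREATE TRIGGER trg_tasks_history
    AFTER INSERT OR UPDATE OR DELETE ON tasks
    FOR EACH ROW
    EXECUTE FUNCTION log_task_change();
```

Storing full row images is simple but grows quickly on wide, frequently updated tables; a retention policy or partitioning of the history table may be needed.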
|
|
597
|
+
|
|
598
|
+
### Multi-Tenancy
|
|
599
|
+
|
|
600
|
+
```sql
|
|
601
|
+
-- Option 1: Tenant column (recommended for most cases)
|
|
602
|
+
CREATE TABLE tasks (
|
|
603
|
+
id UUID PRIMARY KEY,
|
|
604
|
+
tenant_id UUID NOT NULL REFERENCES tenants(id),
|
|
605
|
+
-- ...
|
|
606
|
+
);
|
|
607
|
+
|
|
608
|
+
-- Row-level security
|
|
609
|
+
ALTER TABLE tasks ENABLE ROW LEVEL SECURITY;
|
|
610
|
+
CREATE POLICY tenant_isolation ON tasks
|
|
611
|
+
USING (tenant_id = current_setting('app.tenant_id')::uuid);
|
|
612
|
+
|
|
613
|
+
-- Option 2: Schema per tenant (for strong isolation)
|
|
614
|
+
CREATE SCHEMA tenant_abc;
|
|
615
|
+
CREATE TABLE tenant_abc.tasks (...);
|
|
616
|
+
```
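
The policy in Option 1 reads `app.tenant_id`, so each request has to set it before querying. A sketch of that per-transaction setup, with a placeholder tenant UUID:

```sql
-- By default the table owner is exempt from policies; FORCE applies them to the owner too
ALTER TABLE tasks FORCE ROW LEVEL SECURITY;

BEGIN;
-- Third argument true = local to this transaction (same effect as SET LOCAL)
SELECT set_config('app.tenant_id', '00000000-0000-0000-0000-0000000000aa', true);

-- Only rows belonging to that tenant are visible here
SELECT id FROM tasks;
COMMIT;
```

Because the policy calls `current_setting('app.tenant_id')` without the `missing_ok` argument, a query that runs before the setting exists raises an error rather than returning every tenant's rows. Superusers and roles with `BYPASSRLS` still bypass the policy, so the application should connect as an ordinary role.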
|
|
617
|
+
|
|
618
|
+
### Enum vs Lookup Table
|
|
619
|
+
|
|
620
|
+
```sql
|
|
621
|
+
-- PostgreSQL ENUM (simple, fixed values)
|
|
622
|
+
CREATE TYPE task_status AS ENUM ('todo', 'in_progress', 'done');
|
|
623
|
+
ALTER TABLE tasks ADD COLUMN status task_status;
|
|
624
|
+
|
|
625
|
+
-- Lookup table (configurable, metadata)
|
|
626
|
+
CREATE TABLE task_statuses (
|
|
627
|
+
id SERIAL PRIMARY KEY,
|
|
628
|
+
code VARCHAR(50) UNIQUE NOT NULL,
|
|
629
|
+
name VARCHAR(100),
|
|
630
|
+
display_order INT,
|
|
631
|
+
is_terminal BOOLEAN DEFAULT FALSE
|
|
632
|
+
);
|
|
633
|
+
|
|
634
|
+
-- Use lookup when:
|
|
635
|
+
-- - Values change at runtime
|
|
636
|
+
-- - Need metadata (color, order, permissions)
|
|
637
|
+
-- - Multi-tenant with different values
|
|
638
|
+
```
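
How each option evolves over time is often the deciding factor. The statements below are a sketch that assumes the `task_status` type and `task_statuses` table defined above; the `status_id` column and the seed rows are hypothetical.

```sql
-- ENUM: values can be added, but removing one means recreating the type
ALTER TYPE task_status ADD VALUE IF NOT EXISTS 'blocked' AFTER 'in_progress';

-- Lookup table: reference it with a foreign key and index that column
ALTER TABLE tasks
    ADD COLUMN status_id INT REFERENCES task_statuses(id);
CREATE INDEX idx_tasks_status_id ON tasks(status_id);

-- New statuses are ordinary rows, so they can be managed at runtime
INSERT INTO task_statuses (code, name, display_order, is_terminal) VALUES
    ('todo',        'To do',       1, FALSE),
    ('in_progress', 'In progress', 2, FALSE),
    ('done',        'Done',        3, TRUE);
```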
|
|
639
|
+
|
|
640
|
+
### JSONB Usage
|
|
641
|
+
|
|
642
|
+
```sql
|
|
643
|
+
-- Good: Flexible attributes, rare queries
|
|
644
|
+
metadata JSONB DEFAULT '{}'
|
|
645
|
+
|
|
646
|
+
-- Index for queries
|
|
647
|
+
CREATE INDEX idx_tasks_metadata ON tasks USING gin(metadata);
|
|
648
|
+
|
|
649
|
+
-- Query
|
|
650
|
+
SELECT * FROM tasks WHERE metadata @> '{"priority": "high"}';
|
|
651
|
+
|
|
652
|
+
-- Bad: Structured data that needs JOINs, constraints
|
|
653
|
+
-- Don't store user_id in JSONB if you need FK constraint
|
|
654
|
+
```
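
The default GIN operator class is general but large. When the query patterns are known, a narrower index may serve better. A sketch assuming the `tasks.metadata` column above, with hypothetical index and constraint names:

```sql
-- Smaller GIN index: supports containment (@>) but not the key-existence operators (?, ?|, ?&)
CREATE INDEX idx_tasks_metadata_path
    ON tasks USING gin (metadata jsonb_path_ops);

-- B-tree on one extracted key, for equality or sorting on that key only
CREATE INDEX idx_tasks_priority
    ON tasks ((metadata->>'priority'));

SELECT id FROM tasks WHERE metadata->>'priority' = 'high';

-- Optional guard: keep the column a JSON object rather than an array or scalar
ALTER TABLE tasks
    ADD CONSTRAINT tasks_metadata_is_object
    CHECK (jsonb_typeof(metadata) = 'object');
```

If a key ends up in most `WHERE` clauses, that is usually a sign it should be promoted to a regular column.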
|
|
655
|
+
|
|
656
|
+
---
|
|
657
|
+
|
|
658
|
+
## Partitioning
|
|
659
|
+
|
|
660
|
+
### When to Partition
|
|
661
|
+
|
|
662
|
+
| Signal | Consider Partitioning |
|
|
663
|
+
|--------|----------------------|
|
|
664
|
+
| Table > 100GB | Yes |
|
|
665
|
+
| Time-series data | Yes (by time) |
|
|
666
|
+
| Queries always filter by X | Yes (by X) |
|
|
667
|
+
| Deleting old data regularly | Yes (DROP partition vs DELETE) |
|
|
668
|
+
| < 10M rows, simple queries | No |
|
|
669
|
+
|
|
670
|
+
### Partition Strategies
|
|
671
|
+
|
|
672
|
+
```sql
|
|
673
|
+
-- Range partitioning (time-series)
|
|
674
|
+
CREATE TABLE events (
|
|
675
|
+
id UUID,
|
|
676
|
+
event_time TIMESTAMPTZ NOT NULL,
|
|
677
|
+
data JSONB
|
|
678
|
+
) PARTITION BY RANGE (event_time);
|
|
679
|
+
|
|
680
|
+
CREATE TABLE events_2026_01 PARTITION OF events
|
|
681
|
+
FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
|
|
682
|
+
|
|
683
|
+
-- List partitioning (by category)
|
|
684
|
+
CREATE TABLE orders (
|
|
685
|
+
id UUID,
|
|
686
|
+
region VARCHAR(10) NOT NULL,
|
|
687
|
+
data JSONB
|
|
688
|
+
) PARTITION BY LIST (region);
|
|
689
|
+
|
|
690
|
+
CREATE TABLE orders_eu PARTITION OF orders FOR VALUES IN ('eu');
|
|
691
|
+
CREATE TABLE orders_us PARTITION OF orders FOR VALUES IN ('us');
|
|
692
|
+
|
|
693
|
+
-- Hash partitioning (even distribution)
|
|
694
|
+
PARTITION BY HASH (tenant_id);
|
|
695
|
+
```
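
The hash example above is only a clause. A complete sketch follows, together with the "DROP partition vs DELETE" retention pattern mentioned earlier. The `tenant_events` table is hypothetical; `events` and `events_2026_01` are the range-partitioned tables from the block above.

```sql
-- Hash partitioning: every remainder of the modulus needs its own partition
CREATE TABLE tenant_events (
    id        UUID NOT NULL,
    tenant_id UUID NOT NULL,
    data      JSONB,
    PRIMARY KEY (tenant_id, id)  -- a PK on a partitioned table must include the partition key
) PARTITION BY HASH (tenant_id);

CREATE TABLE tenant_events_p0 PARTITION OF tenant_events
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE tenant_events_p1 PARTITION OF tenant_events
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE tenant_events_p2 PARTITION OF tenant_events
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE tenant_events_p3 PARTITION OF tenant_events
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);

-- Retention on the range-partitioned table: remove a whole month without a row-by-row DELETE
ALTER TABLE events DETACH PARTITION events_2026_01;
DROP TABLE events_2026_01;
```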
|
|
696
|
+
|
|
697
|
+
---
|
|
698
|
+
|
|
699
|
+
## Migrations
|
|
700
|
+
|
|
701
|
+
### Migration Rules
|
|
702
|
+
|
|
703
|
+
1. **Forward compatible** — New code works with old schema
|
|
704
|
+
2. **Backward compatible** — Old code works with new schema
|
|
705
|
+
3. **Small, incremental** — One change per migration
|
|
706
|
+
4. **Reversible** — Include DOWN migration
|
|
707
|
+
5. **Tested** — Run on copy of production data
|
|
708
|
+
|
|
709
|
+
### Safe Migration Patterns
|
|
710
|
+
|
|
711
|
+
| Change | Safe Way |
|
|
712
|
+
|--------|----------|
|
|
713
|
+
| Add column | `ADD COLUMN ... DEFAULT NULL` (metadata-only, no table rewrite; lock is brief) |
|
|
714
|
+
| Add NOT NULL column | Add nullable → backfill → add constraint |
|
|
715
|
+
| Remove column | Stop using → deploy → remove |
|
|
716
|
+
| Rename column | Add new → copy → remove old |
|
|
717
|
+
| Add index | `CREATE INDEX CONCURRENTLY` |
|
|
718
|
+
| Change type | Add new column → migrate → remove old |
|
|
719
|
+
|
|
720
|
+
### Dangerous Operations
|
|
721
|
+
|
|
722
|
+
```sql
|
|
723
|
+
-- ❌ Rewrites the whole table under an exclusive lock on PostgreSQL 10 and older (metadata-only since 11)
|
|
724
|
+
ALTER TABLE tasks ADD COLUMN status VARCHAR NOT NULL DEFAULT 'todo';
|
|
725
|
+
|
|
726
|
+
-- ✅ SAFE
|
|
727
|
+
ALTER TABLE tasks ADD COLUMN status VARCHAR;
|
|
728
|
+
UPDATE tasks SET status = 'todo' WHERE status IS NULL; -- batched
|
|
729
|
+
ALTER TABLE tasks ALTER COLUMN status SET NOT NULL; -- scans the table while holding an exclusive lock
|
|
730
|
+
ALTER TABLE tasks ALTER COLUMN status SET DEFAULT 'todo';
|
|
731
|
+
```
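
On large tables the `SET NOT NULL` step can itself be a problem, because it scans every row while holding an exclusive lock. A commonly used workaround on PostgreSQL 12 and later goes through a temporary check constraint. This is a sketch and the constraint name is hypothetical.

```sql
-- 1. Add the check without scanning existing rows
ALTER TABLE tasks
    ADD CONSTRAINT tasks_status_not_null CHECK (status IS NOT NULL) NOT VALID;

-- 2. Validate: scans the table but does not block normal reads and writes
ALTER TABLE tasks VALIDATE CONSTRAINT tasks_status_not_null;

-- 3. PostgreSQL 12+ uses the validated check to skip the full-table scan
ALTER TABLE tasks ALTER COLUMN status SET NOT NULL;

-- 4. The check is now redundant
ALTER TABLE tasks DROP CONSTRAINT tasks_status_not_null;
```

Each statement still takes a short lock, so running them with a low `lock_timeout` and retrying is a reasonable precaution.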
|
|
732
|
+
|
|
733
|
+
### Migration File Format
|
|
734
|
+
|
|
735
|
+
```sql
|
|
736
|
+
-- migrations/20260124_001_add_status_to_tasks.sql
|
|
737
|
+
|
|
738
|
+
-- +migrate Up
|
|
739
|
+
ALTER TABLE tasks ADD COLUMN status VARCHAR(50);
|
|
740
|
+
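-- CONCURRENTLY cannot run inside a transaction block; run this migration without a wrapping transaction
CREATE INDEX CONCURRENTLY idx_tasks_status ON tasks(status);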
|
|
741
|
+
|
|
742
|
+
-- +migrate Down
|
|
743
|
+
DROP INDEX CONCURRENTLY IF EXISTS idx_tasks_status;
|
|
744
|
+
ALTER TABLE tasks DROP COLUMN IF EXISTS status;
|
|
745
|
+
```
|
|
746
|
+
|
|
747
|
+
---
|
|
748
|
+
|
|
749
|
+
## Query Optimization
|
|
750
|
+
|
|
751
|
+
### EXPLAIN Checklist
|
|
752
|
+
|
|
753
|
+
```sql
|
|
754
|
+
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
|
|
755
|
+
SELECT * FROM tasks WHERE user_id = '...' AND status = 'active';
|
|
756
|
+
```
|
|
757
|
+
|
|
758
|
+
| Look For | Problem | Fix |
|
|
759
|
+
|----------|---------|-----|
|
|
760
|
+
| `Seq Scan` on large table | Missing index | Add index |
|
|
761
|
+
| Estimated `rows` far from actual `rows` | Stale statistics | `ANALYZE table` |
|
|
762
|
+
| `Sort` with high cost | Missing index for ORDER BY | Add index matching the ORDER BY columns |
|
|
763
|
+
| Nested Loop on large sets | Inefficient JOIN | Add index, rewrite query |
|
|
764
|
+
|
|
765
|
+
### Common Optimizations
|
|
766
|
+
|
|
767
|
+
```sql
|
|
768
|
+
-- Pagination: cursor-based (not OFFSET)
|
|
769
|
+
-- ❌ Slow for large offsets
|
|
770
|
+
SELECT * FROM tasks ORDER BY created_at LIMIT 20 OFFSET 10000;
|
|
771
|
+
|
|
772
|
+
-- ✅ Fast cursor-based
|
|
773
|
+
SELECT * FROM tasks
|
|
774
|
+
WHERE created_at < '2026-01-20T10:00:00Z'
|
|
775
|
+
ORDER BY created_at DESC LIMIT 20;
|
|
776
|
+
|
|
777
|
+
-- Batch operations
|
|
778
|
+
-- ❌ One by one
|
|
779
|
+
UPDATE tasks SET status = 'archived' WHERE ...;
|
|
780
|
+
|
|
781
|
+
-- ✅ Batched
|
|
782
|
+
UPDATE tasks SET status = 'archived'
|
|
783
|
+
WHERE id IN (SELECT id FROM tasks WHERE ... LIMIT 1000);
|
|
784
|
+
|
|
785
|
+
-- Count estimation (for UI "~1000 results")
|
|
786
|
+
SELECT reltuples::bigint FROM pg_class WHERE relname = 'tasks';
|
|
787
|
+
```
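
Two refinements to the patterns above, sketched under assumptions. Cursor pagination on `created_at` alone can skip or repeat rows when timestamps collide, so the first example adds `id` as a tie-breaker. The second shows one batch of an update that the caller repeats until it affects zero rows. The `title` and `updated_at` columns, the 90-day cutoff, the index name, and the literal values are all placeholders.

```sql
-- Index matching the sort order, with id as a tie-breaker
CREATE INDEX idx_tasks_created_id ON tasks (created_at DESC, id DESC);

-- Next page: pass created_at and id of the last row from the previous page
SELECT id, title, created_at
FROM tasks
WHERE (created_at, id) < ('2026-01-20T10:00:00Z'::timestamptz,
                          '00000000-0000-0000-0000-000000000001'::uuid)
ORDER BY created_at DESC, id DESC
LIMIT 20;

-- One batch of a larger update; rerun until it reports UPDATE 0
WITH batch AS (
    SELECT id
    FROM tasks
    WHERE status = 'done'
      AND updated_at < NOW() - INTERVAL '90 days'
    ORDER BY id
    LIMIT 1000
    FOR UPDATE SKIP LOCKED   -- do not wait on rows another session is changing
)
UPDATE tasks t
SET status = 'archived'
FROM batch
WHERE t.id = batch.id;
```

Committing between batches keeps each transaction short, which limits lock duration and lets autovacuum keep up.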
|
|
788
|
+
|
|
789
|
+
---
|
|
790
|
+
|
|
791
|
+
## Validation Checklist
|
|
792
|
+
|
|
793
|
+
Before completing database design:
|
|
794
|
+
|
|
795
|
+
- [ ] Entity relationships documented (ERD)
|
|
796
|
+
- [ ] All tables have PK, created_at, updated_at
|
|
797
|
+
- [ ] Foreign keys have indexes
|
|
798
|
+
- [ ] Naming follows conventions
|
|
799
|
+
- [ ] Constraints enforce business rules
|
|
800
|
+
- [ ] Query patterns have supporting indexes
|
|
801
|
+
- [ ] Soft delete strategy defined (if needed)
|
|
802
|
+
- [ ] Multi-tenancy strategy defined (if needed)
|
|
803
|
+
- [ ] Migration strategy for schema changes
|
|
804
|
+
- [ ] Data retention/archival plan
|
|
805
|
+
- [ ] Backup strategy documented
|
|
806
|
+
|
|
807
|
+
---
|
|
808
|
+
|
|
809
|
+
## Output
|
|
810
|
+
|
|
811
|
+
- Schema: `docs/architecture/{module}/data-model.md`
|
|
812
|
+
- Migrations: `migrations/YYYYMMDD_NNN_description.sql`
|
|
813
|
+
- ERD: `docs/diagrams/data/{module}-erd.md`
|
|
814
|
+
|
|
815
|
+
## Related Skills
|
|
816
|
+
|
|
817
|
+
- `architecture-design` — Database is part of architecture
|
|
818
|
+
- `unit-writing` — Includes data-model.md
|
|
819
|
+
- `adr-writing` — Document database decisions
|
|
820
|
+
- `diagram-creation` — ER diagrams
|