@booklib/skills 1.0.0
- package/LICENSE +21 -0
- package/README.md +105 -0
- package/animation-at-work/SKILL.md +246 -0
- package/animation-at-work/assets/example_asset.txt +1 -0
- package/animation-at-work/references/api_reference.md +369 -0
- package/animation-at-work/references/review-checklist.md +79 -0
- package/animation-at-work/scripts/example.py +1 -0
- package/bin/skills.js +85 -0
- package/clean-code-reviewer/SKILL.md +292 -0
- package/clean-code-reviewer/evals/evals.json +67 -0
- package/data-intensive-patterns/SKILL.md +204 -0
- package/data-intensive-patterns/assets/example_asset.txt +1 -0
- package/data-intensive-patterns/references/api_reference.md +34 -0
- package/data-intensive-patterns/references/patterns-catalog.md +551 -0
- package/data-intensive-patterns/references/review-checklist.md +193 -0
- package/data-intensive-patterns/scripts/example.py +1 -0
- package/data-pipelines/SKILL.md +252 -0
- package/data-pipelines/assets/example_asset.txt +1 -0
- package/data-pipelines/references/api_reference.md +301 -0
- package/data-pipelines/references/review-checklist.md +181 -0
- package/data-pipelines/scripts/example.py +1 -0
- package/design-patterns/SKILL.md +245 -0
- package/design-patterns/assets/example_asset.txt +1 -0
- package/design-patterns/references/api_reference.md +1 -0
- package/design-patterns/references/patterns-catalog.md +726 -0
- package/design-patterns/references/review-checklist.md +173 -0
- package/design-patterns/scripts/example.py +1 -0
- package/domain-driven-design/SKILL.md +221 -0
- package/domain-driven-design/assets/example_asset.txt +1 -0
- package/domain-driven-design/references/api_reference.md +1 -0
- package/domain-driven-design/references/patterns-catalog.md +545 -0
- package/domain-driven-design/references/review-checklist.md +158 -0
- package/domain-driven-design/scripts/example.py +1 -0
- package/effective-java/SKILL.md +195 -0
- package/effective-java/assets/example_asset.txt +1 -0
- package/effective-java/references/api_reference.md +1 -0
- package/effective-java/references/items-catalog.md +955 -0
- package/effective-java/references/review-checklist.md +216 -0
- package/effective-java/scripts/example.py +1 -0
- package/effective-kotlin/SKILL.md +225 -0
- package/effective-kotlin/assets/example_asset.txt +1 -0
- package/effective-kotlin/references/api_reference.md +1 -0
- package/effective-kotlin/references/practices-catalog.md +1228 -0
- package/effective-kotlin/references/review-checklist.md +126 -0
- package/effective-kotlin/scripts/example.py +1 -0
- package/kotlin-in-action/SKILL.md +251 -0
- package/kotlin-in-action/assets/example_asset.txt +1 -0
- package/kotlin-in-action/references/api_reference.md +1 -0
- package/kotlin-in-action/references/practices-catalog.md +436 -0
- package/kotlin-in-action/references/review-checklist.md +204 -0
- package/kotlin-in-action/scripts/example.py +1 -0
- package/lean-startup/SKILL.md +250 -0
- package/lean-startup/assets/example_asset.txt +1 -0
- package/lean-startup/references/api_reference.md +319 -0
- package/lean-startup/references/review-checklist.md +137 -0
- package/lean-startup/scripts/example.py +1 -0
- package/microservices-patterns/SKILL.md +179 -0
- package/microservices-patterns/references/patterns-catalog.md +391 -0
- package/microservices-patterns/references/review-checklist.md +169 -0
- package/package.json +17 -0
- package/refactoring-ui/SKILL.md +236 -0
- package/refactoring-ui/assets/example_asset.txt +1 -0
- package/refactoring-ui/references/api_reference.md +355 -0
- package/refactoring-ui/references/review-checklist.md +114 -0
- package/refactoring-ui/scripts/example.py +1 -0
- package/storytelling-with-data/SKILL.md +238 -0
- package/storytelling-with-data/assets/example_asset.txt +1 -0
- package/storytelling-with-data/references/api_reference.md +379 -0
- package/storytelling-with-data/references/review-checklist.md +111 -0
- package/storytelling-with-data/scripts/example.py +1 -0
- package/system-design-interview/SKILL.md +213 -0
- package/system-design-interview/assets/example_asset.txt +1 -0
- package/system-design-interview/references/api_reference.md +582 -0
- package/system-design-interview/references/review-checklist.md +201 -0
- package/system-design-interview/scripts/example.py +1 -0
- package/using-asyncio-python/SKILL.md +242 -0
- package/using-asyncio-python/assets/example_asset.txt +1 -0
- package/using-asyncio-python/references/api_reference.md +267 -0
- package/using-asyncio-python/references/review-checklist.md +149 -0
- package/using-asyncio-python/scripts/example.py +1 -0
- package/web-scraping-python/SKILL.md +259 -0
- package/web-scraping-python/assets/example_asset.txt +1 -0
- package/web-scraping-python/references/api_reference.md +393 -0
- package/web-scraping-python/references/review-checklist.md +163 -0
- package/web-scraping-python/scripts/example.py +1 -0
@@ -0,0 +1,551 @@
# Data-Intensive Application Patterns Catalog

Comprehensive reference of patterns from Martin Kleppmann's *Designing Data-Intensive Applications*.
Organized by the book's three-part structure. Read the section relevant to the code you're generating.

---

## Table of Contents

1. [Data Models and Query Languages](#data-models-and-query-languages)
2. [Storage Engines and Indexing](#storage-engines-and-indexing)
3. [Encoding and Schema Evolution](#encoding-and-schema-evolution)
4. [Replication](#replication)
5. [Partitioning](#partitioning)
6. [Transactions](#transactions)
7. [Distributed Systems Fundamentals](#distributed-systems-fundamentals)
8. [Consistency and Consensus](#consistency-and-consensus)
9. [Batch Processing](#batch-processing)
10. [Stream Processing](#stream-processing)
11. [Derived Data and Integration](#derived-data-and-integration)

---

## Data Models and Query Languages

### Relational Model
- Tables with rows and columns, enforced schema
- Best for: many-to-many relationships, complex joins, data with strong integrity requirements
- Normalized: reduce redundancy, enforce consistency via foreign keys
- Query language: SQL (declarative)

### Document Model
- Self-contained JSON/BSON documents, flexible schema (schema-on-read)
- Best for: one-to-many relationships, self-contained records, heterogeneous data
- Denormalized: data locality — everything for one entity in one document
- Limitations: poor support for many-to-many; joins are weak or manual
- Examples: MongoDB, CouchDB, RethinkDB

### Graph Model
- Vertices (nodes) and edges (relationships), flexible schema
- Best for: highly interconnected data, variable relationship types, traversal-heavy queries
- Two flavors:
  - **Property graph** (Neo4j): nodes/edges have properties, query with Cypher
  - **Triple store** (RDF): subject-predicate-object triples, query with SPARQL
- Good for social networks, recommendation engines, knowledge graphs, fraud detection

### Choosing a Data Model

| Access Pattern | Best Model |
|----------------|------------|
| Complex joins, aggregations, reporting | Relational |
| Self-contained documents, flexible schema | Document |
| Highly connected data, graph traversals | Graph |
| Append-only event log | Event store |
| Key-value lookups with high throughput | Key-value (Redis, DynamoDB) |
| Time-series with range scans | Time-series (TimescaleDB, InfluxDB) |

---

## Storage Engines and Indexing

### Log-Structured Storage (LSM-Trees)

How it works:
1. Writes go to an in-memory balanced tree (memtable)
2. When the memtable exceeds a threshold, flush it to disk as a sorted SSTable (Sorted String Table)
3. Background compaction merges SSTables, removing duplicates and deleted entries
4. Reads check the memtable first, then SSTables (newest to oldest), aided by Bloom filters

Characteristics:
- **Write-optimized**: sequential writes to disk, no random I/O on the write path
- **Compaction strategies**: size-tiered (good for write-heavy workloads) vs. leveled (lower read amplification)
- **Bloom filters**: probabilistic data structure to quickly check whether a key might exist in an SSTable
- **Trade-off**: higher write throughput, but reads may touch multiple SSTables

Implementations: LevelDB, RocksDB, Cassandra, HBase, ScyllaDB
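
The write path above can be sketched in a few lines of Python. This is a toy model for illustration only (`TinyLSM` and `memtable_limit` are invented names); real engines add a WAL, Bloom filters, and background compaction.

```python
import bisect

class TinyLSM:
    """Toy LSM-tree: a mutable memtable flushed to immutable sorted runs."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}        # in-memory writes
        self.sstables = []        # newest first; each is a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Flush the memtable to an immutable sorted run (an "SSTable").
        self.sstables.insert(0, sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # Check the memtable first, then runs newest-to-oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in self.sstables:
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```

Note how a newer value in the memtable shadows an older one in an SSTable without that SSTable ever being modified; compaction would eventually discard the stale entry.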

### Page-Oriented Storage (B-Trees)

How it works:
1. Data organized in fixed-size pages (typically 4KB)
2. Tree structure with a branching factor of several hundred
3. Updates modify pages in place
4. Write-ahead log (WAL) ensures crash recovery

Characteristics:
- **Read-optimized**: O(log n) lookups with predictable performance
- **In-place updates**: overwrites pages on disk
- **WAL**: append-only log written before modifying pages, for crash recovery
- **Latches**: lightweight locks for concurrent access to tree pages
- **Copy-on-write**: some implementations (LMDB) write new pages instead of overwriting

Implementations: PostgreSQL, MySQL InnoDB, SQL Server, Oracle

### Choosing a Storage Engine

| Workload | Recommended Engine | Why |
|----------|--------------------|-----|
| Write-heavy (logging, IoT, events) | LSM-Tree | Sequential writes, high throughput |
| Read-heavy with point lookups | B-Tree | Predictable read latency |
| Mixed OLTP | B-Tree (usually) | Good all-around for transactions |
| Analytical (OLAP) | Column-oriented | Compression, vectorized processing |
| Time-series | LSM-Tree or specialized | Append-heavy, range-scan friendly |

### Column-Oriented Storage (OLAP)

- Store values from each column together instead of each row together
- Enables aggressive compression (run-length encoding, bitmap encoding)
- Vectorized processing: operate on columns of compressed data in CPU cache
- **Star schema**: central fact table with dimension tables (snowflake if dimensions are further normalized)
- **Materialized views / data cubes**: pre-computed aggregations for dashboard queries
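
Run-length encoding, one of the compression schemes mentioned above, shows why column stores compress so well and why they can aggregate without decompressing. A minimal sketch (`rle_encode` and `rle_sum` are invented helpers):

```python
def rle_encode(column):
    """Run-length encode a column of values as [(value, run_length), ...].
    Sorted or low-cardinality columns (country codes, status flags)
    collapse into very few runs."""
    runs = []
    for v in column:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs

def rle_sum(runs):
    # Aggregate directly over the compressed form: no decompression needed.
    return sum(v * n for v, n in runs)
```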

---

## Encoding and Schema Evolution

### Encoding Formats Comparison

| Format | Schema? | Binary? | Forward Compatible | Backward Compatible | Notes |
|--------|---------|---------|--------------------|---------------------|-------|
| JSON | Optional (JSON Schema) | No | Partial | Partial | Human-readable; number precision issues |
| XML | Optional (XSD) | No | Partial | Partial | Verbose; human-readable |
| Protocol Buffers | Required (.proto) | Yes | Yes (new fields get new tags) | Yes (new fields optional) | Field tags enable evolution |
| Thrift | Required (.thrift) | Yes | Yes | Yes | Similar to Protobuf; two binary formats |
| Avro | Required (.avsc) | Yes | Yes (writer's schema + reader's schema) | Yes | Schema resolution; great for Hadoop/Kafka |

### Schema Evolution Rules

- **Forward compatibility**: old code can read data written by new code
  - Old code skips new fields it doesn't recognize (unknown tag numbers)
  - Never remove a required field
- **Backward compatibility**: new code can read data written by old code
  - New fields must be optional or have defaults
  - Never reuse a deleted field's tag number
- **Full compatibility** (both): needed when readers and writers are updated at different times
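
Both compatibility directions can be demonstrated with a hypothetical dict-based schema (field name to default value); this is a sketch of the idea, not the Avro or Protobuf API. Missing fields take their defaults (backward compatibility); unknown fields are silently ignored (forward compatibility):

```python
# Hypothetical reader schema, v2: the "tier" field was added after v1.
READER_SCHEMA_V2 = {"id": None, "name": "", "tier": "standard"}

def decode(record, schema=READER_SCHEMA_V2):
    """Resolve a record against the reader's schema: defaults fill
    fields missing from old data, and fields the reader doesn't know
    about are dropped."""
    return {field: record.get(field, default) for field, default in schema.items()}
```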

### Dataflow Patterns

How data flows between processes determines which compatibility direction matters:

- **Through databases**: writer encodes, reader decodes (potentially much later) — need both directions
- **Through services (REST/RPC)**: request and response each need backward + forward compatibility
  - REST: use content negotiation, versioned URLs, or header-based versioning
  - RPC: treat as a cross-service API, version carefully
- **Through async messaging**: producer encodes, consumer decodes — similar to databases
  - Use a schema registry (e.g., Confluent Schema Registry for Kafka)
  - Avro is ideal: the writer's schema travels with each message, and the reader resolves it against its own schema

---

## Replication

### Single-Leader Replication

One node (the leader) accepts writes; followers replicate the leader's write-ahead log.

- **Synchronous followers**: guaranteed up-to-date, but write latency increases
- **Asynchronous followers**: the leader doesn't wait; risk of data loss if the leader fails before replication
- **Semi-synchronous**: one follower is synchronous (guaranteed durability), the rest are async

Failover concerns:
- Split-brain: two nodes think they're the leader (use fencing tokens, epoch numbers)
- Lost writes: an async follower promoted to leader may be missing recent writes
- Replication lag: stale reads from followers

Handling replication lag:
- **Read-after-write consistency**: after a write, read from the leader (or wait for the follower to catch up)
- **Monotonic reads**: always read from the same replica (session stickiness)
- **Consistent prefix reads**: preserve causal ordering of writes
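
Monotonic reads via session stickiness can be as simple as hashing the session to a fixed replica, so a user's successive reads never jump to a more-lagged replica. A sketch (the replica names are invented):

```python
import hashlib

REPLICAS = ["replica-0", "replica-1", "replica-2"]  # hypothetical replica ids

def replica_for_session(session_id, replicas=REPLICAS):
    """Pin each session to one replica so its reads never move backwards
    in time. Trade-off: skewed sessions can produce uneven load, and a
    replica failure forces repinning (which may break monotonicity once)."""
    digest = hashlib.sha256(session_id.encode()).hexdigest()
    return replicas[int(digest, 16) % len(replicas)]
```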

### Multi-Leader Replication

Multiple nodes accept writes. Each leader replicates to all others.

- **Use cases**: multi-datacenter (one leader per datacenter), offline-capable clients, collaborative editing
- **Conflict resolution**:
  - Last-write-wins (LWW): discard concurrent writes arbitrarily — data loss risk
  - Merge values: application-specific merge logic
  - Custom conflict handlers: on-write or on-read resolution
- **Topologies**: all-to-all (most robust), star, circular (the latter two have single points of failure)

### Leaderless Replication (Dynamo-style)

The client sends writes to multiple replicas. Reads query multiple replicas.

- **Quorum**: w + r > n (write quorum + read quorum > total replicas) guarantees overlap
  - Common config: n=3, w=2, r=2
  - Tunable: w=1, r=3 for fast writes; w=3, r=1 for fast reads
- **Read repair**: the client detects a stale value during a read and writes the newer value back to the stale replica
- **Anti-entropy**: a background process compares replicas and fixes differences
- **Sloppy quorum + hinted handoff**: during a network partition, write to reachable nodes and hand off later
- **Version vectors**: track causal history to distinguish concurrent writes from sequential ones
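
The quorum condition and the read-repair decision can both be stated in a couple of lines. A sketch with invented helper names; real systems compare version vectors rather than a single version counter, so they can also detect *concurrent* writes:

```python
def quorum_overlaps(n, w, r):
    """True when w + r > n: every read quorum then intersects every
    write quorum, so any read contacts at least one replica holding
    the latest acknowledged write (barring sloppy quorums)."""
    return w + r > n

def latest(responses):
    """Read repair: among replica responses, pick the value with the
    highest version; the caller writes it back to any replica that
    returned something older."""
    return max(responses, key=lambda resp: resp["version"])
```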

---

## Partitioning

### Partitioning Strategies

**By key range:**
- Keys sorted, each partition owns a contiguous range
- Enables efficient range scans
- Risk: hot spots if writes cluster on one range (e.g., time-based keys → today's partition is hot)
- Mitigation: compound keys (e.g., sensor_id + timestamp)

**By hash of key:**
- A hash function distributes keys uniformly across partitions
- Destroys sort order — range scans require querying all partitions
- More uniform load distribution
- Consistent hashing: minimizes data movement when adding/removing nodes

**Compound/composite partitioning:**
- The first part of the key determines the partition (by hash); the rest preserves sort order within the partition
- Cassandra's approach: partition key (hashed) + clustering columns (sorted within the partition)

### Secondary Indexes with Partitioning

**Document-partitioned (local) index:**
- Each partition maintains its own secondary index covering only its data
- Write: update one partition's index
- Read: scatter-gather across all partitions (fan-out) — can be slow

**Term-partitioned (global) index:**
- The secondary index is itself partitioned by the indexed term
- Write: may need to update multiple partitions' indexes (distributed transaction or async update)
- Read: query only the relevant index partition — faster reads

### Rebalancing Strategies

- **Fixed number of partitions**: create many more partitions than nodes; reassign whole partitions when nodes join/leave (Riak, Elasticsearch, Couchbase)
- **Dynamic partitioning**: split large partitions, merge small ones (HBase, RethinkDB)
- **Proportional to nodes**: fixed number of partitions per node (Cassandra)
- **Avoid**: hash mod N (reassigns almost every key when N changes)
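
The "avoid hash mod N" advice is easy to demonstrate: growing from 10 to 11 nodes moves roughly 10/11 of all keys, not the ~1/11 that consistent hashing or fixed partitions would move. A sketch using CRC32 as a stand-in hash:

```python
import zlib

def partition(key, n):
    # Deterministic stand-in hash; any uniform hash shows the same effect.
    return zlib.crc32(key.encode()) % n

def moved_fraction(num_keys, n_before, n_after):
    """Fraction of keys whose partition changes when the cluster grows
    from n_before to n_after nodes under hash-mod-N placement."""
    moved = sum(1 for k in range(num_keys)
                if partition(f"key{k}", n_before) != partition(f"key{k}", n_after))
    return moved / num_keys
```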

### Request Routing

- **Client-side**: the client learns the partition assignment (via ZooKeeper/gossip) and connects directly
- **Routing tier**: a proxy/load balancer that knows the partition assignment
- **Coordinator node**: any node accepts the request and forwards it to the correct partition owner (Cassandra gossip)

---

## Transactions

### Isolation Levels

**Read Committed:**
- Guarantees: no dirty reads (only see committed data), no dirty writes (only overwrite committed data)
- Implementation: row-level locks for writes; snapshots for reads (return the old committed value during a write)
- Default in PostgreSQL, SQL Server

**Snapshot Isolation (Repeatable Read / MVCC):**
- Each transaction sees a consistent snapshot of the database from the start of the transaction
- Implementation: MVCC (Multi-Version Concurrency Control) — each write creates a new version; reads see only versions committed before the transaction started
- Prevents: read skew (non-repeatable reads)
- Does NOT prevent: write skew (two transactions read, then both write based on stale reads)

**Serializable:**
- Strongest guarantee — the result is as if transactions ran one at a time

Three implementations:
1. **Actual serial execution**: literally run one transaction at a time on a single CPU
   - Viable when transactions are short and the dataset fits in memory
   - Use stored procedures to avoid network round-trips
   - Partitioning can enable per-partition serial execution
2. **Two-phase locking (2PL)**: readers block writers, writers block readers
   - Shared locks for reads, exclusive locks for writes
   - Predicate locks or index-range locks prevent phantoms
   - Performance: significant contention, potential deadlocks
3. **Serializable Snapshot Isolation (SSI)**: optimistic — detect conflicts at commit time
   - Based on snapshot isolation plus tracking of reads and writes
   - Detects writes that affect prior reads (stale MVCC reads)
   - Aborts conflicting transactions at commit
   - Better performance than 2PL for low-contention workloads

### Preventing Write Skew and Phantoms

Write skew: two transactions both read a condition, both decide to act, and both write — violating a constraint that should hold across both.

Example: two doctors both check "≥2 doctors on call" → both remove themselves → 0 on call.

Solutions:
- Serializable isolation
- Explicit locking: `SELECT ... FOR UPDATE` to materialize the conflict
- Application-level constraints with saga patterns
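
The doctors-on-call example can be simulated in a few lines: each "transaction" validates the constraint against its own consistent snapshot, neither write conflicts with the other, so both commit and the invariant breaks. This is a toy model of the anomaly, not any database's API:

```python
def take_off_call(snapshot, doctor):
    # Each transaction checks the constraint against its own snapshot,
    # exactly as under snapshot isolation.
    currently_on_call = sum(1 for on in snapshot.values() if on)
    if currently_on_call >= 2:
        return doctor          # decision: this doctor may go off call
    return None

on_call = {"alice": True, "bob": True}
# Both transactions start from the same snapshot ...
decisions = [take_off_call(dict(on_call), "alice"),
             take_off_call(dict(on_call), "bob")]
# ... and both commit, because they wrote to different rows.
for doctor in decisions:
    if doctor is not None:
        on_call[doctor] = False
```

Under serializable isolation (or with both transactions locking the on-call rows via `SELECT ... FOR UPDATE`), one of the two would block or abort instead.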

Phantoms: a write in one transaction changes the result of a search query in another.

Solutions:
- Predicate locks (lock the search condition itself)
- Index-range locks (a practical approximation of predicate locks)
- Materializing conflicts: create a lock table that represents the condition

---

## Distributed Systems Fundamentals

### Unreliable Networks

- Networks are **asynchronous**: no upper bound on message delay
- Packet loss, reordering, and duplication are normal
- **Timeout selection**: too short → false positives (unnecessary failover); too long → slow failure detection
  - Adaptive timeouts based on observed round-trip times
- **Network partitions**: some nodes can't communicate with others

### Unreliable Clocks

- **Time-of-day clocks**: wall clock, can jump (NTP sync, leap seconds) — do NOT use for durations or ordering
- **Monotonic clocks**: always move forward; use for measuring elapsed time within a single node
- **Logical clocks**: Lamport timestamps, vector clocks — capture causal ordering without relying on physical time
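
A Lamport clock, the simplest logical clock, fits in a few lines (a sketch for illustration): increment on every local event, and on receiving a message, jump past the sender's timestamp so causally later events always carry larger stamps.

```python
class LamportClock:
    """Logical clock: captures causal order without physical time."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event (including sending a message).
        self.time += 1
        return self.time

    def send(self):
        return self.tick()       # stamp attached to the outgoing message

    def receive(self, msg_time):
        # Jump past the sender's stamp, then count the receive event.
        self.time = max(self.time, msg_time) + 1
        return self.time
```

Note the limitation: Lamport timestamps give a total order consistent with causality, but equal-looking orderings of *concurrent* events cannot be distinguished; that requires vector clocks or version vectors.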

Clock issues in distributed systems:
- LWW conflict resolution using timestamps is fundamentally unsafe
- Lease expiration: a process might think its lease is valid when its clock has drifted
- Solution: **fencing tokens** — monotonically increasing tokens; storage rejects stale tokens

### Process Pauses

- GC pauses, VM suspension, and disk I/O stalls can freeze a process for seconds
- A process cannot know it was paused — it might act on stale state after resuming
- Solution: lease-based protocols with fencing tokens; always validate at the point of write

### Fencing Tokens

When using distributed locks or leases:
1. The lock service issues a fencing token (a monotonically increasing number) with each lock grant
2. The client includes the fencing token with every write to the storage service
3. The storage service rejects writes with a token lower than the highest it has seen
4. This guarantees mutual exclusion even if a client holds a stale lock
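
Steps 1-4 can be sketched from the storage service's side (a toy model; `FencedStore` is an invented name). The key point is that the *storage*, not the paused client, enforces the ordering:

```python
class FencedStore:
    """Storage that rejects writes carrying a token lower than the
    highest it has seen, so a client acting on an expired lease can
    no longer corrupt data."""

    def __init__(self):
        self.highest_token = -1
        self.data = {}

    def write(self, token, key, value):
        if token < self.highest_token:
            return False          # stale token: the client's lease expired
        self.highest_token = token
        self.data[key] = value
        return True
```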

---

## Consistency and Consensus

### Linearizability

The strongest consistency model: operations appear to take effect atomically at some point between invocation and completion, as if there were a single copy of the data.

- **Use cases**: leader election, uniqueness constraints, distributed locks
- **Not needed for**: most application reads, analytics, caching
- **Cost**: slower (requires coordination), reduced availability during network partitions

CAP theorem (stated precisely): when a network partition occurs, you must choose between consistency (linearizability) and availability.

### Causal Consistency

Weaker than linearizability but preserves causally related ordering:
- If operation A happened before B, everyone sees A before B
- Concurrent operations can be seen in different orders by different nodes
- Implemented via version vectors or Lamport timestamps

### Total Order Broadcast

A protocol to deliver messages to all nodes in the same order:
- All nodes deliver the same messages in the same sequence
- Equivalent to consensus (and to a linearizable compare-and-swap register)
- Implementations: ZooKeeper's Zab, Raft's log replication

### Distributed Consensus

Agreement: all nodes decide on the same value. Properties:
- **Uniform agreement**: no two nodes decide differently
- **Integrity**: a node decides at most once
- **Validity**: if a node decides value v, then v was proposed by some node
- **Termination**: every non-crashed node eventually decides

Algorithms: Paxos, Raft, Zab (ZooKeeper), Viewstamped Replication

Practical usage: don't implement consensus yourself. Use:
- **ZooKeeper/etcd**: coordination services providing a linearizable key-value store, leader election, distributed locks, group membership, and service discovery
- **Raft-based systems**: etcd (Kubernetes), CockroachDB, TiKV

### Two-Phase Commit (2PC) — and Why to Avoid It

A coordinator-based distributed transaction protocol:
1. Prepare phase: the coordinator asks all participants to prepare (vote yes/no)
2. Commit phase: if all vote yes, the coordinator commits; if any vote no, it aborts

Problems:
- **Blocking**: if the coordinator crashes after prepare, participants are stuck holding locks
- **Single point of failure**: the coordinator must be highly available
- **Performance**: high latency and reduced throughput due to lock holding
- **Heterogeneous systems**: XA transactions across different databases are especially fragile

Prefer: sagas with compensating transactions, or single-database transactions with the outbox pattern.

---

## Batch Processing

### Unix Philosophy Applied to Data

- Small, focused tools composed via pipes
- Immutable inputs, explicit outputs
- Separate the logic from the wiring

### MapReduce

How it works:
1. **Map**: extract key-value pairs from each input record
2. **Shuffle**: group all values for the same key together (sorted)
3. **Reduce**: process all values for each key, produce output
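
The three phases map directly onto a few lines of Python, shown here with word count, the canonical example. This is a single-process sketch of the dataflow, not a distributed implementation:

```python
from collections import defaultdict

def map_phase(records, mapper):
    # Map: emit key-value pairs from each input record.
    return [pair for record in records for pair in mapper(record)]

def shuffle(pairs):
    # Shuffle: group all values for the same key, in sorted key order.
    groups = defaultdict(list)
    for key, value in sorted(pairs):
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    # Reduce: collapse each key's values into one output.
    return {key: reducer(key, values) for key, values in groups.items()}

def mapper(line):
    return [(word, 1) for word in line.split()]

def reducer(word, counts):
    return sum(counts)
```

In a real cluster the shuffle is the expensive step: each mapper partitions its output by key, and each reducer pulls and merges its partition from every mapper.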

Patterns:
- **Sort-merge join (reduce-side)**: both datasets emit the join key; the reducer sees all matching records
- **Broadcast hash join (map-side)**: the small dataset is loaded into memory in each mapper; no shuffle needed
- **Partitioned hash join (map-side)**: both datasets partitioned the same way; each mapper joins its own partition pair

Limitations of MapReduce:
- Materializes intermediate state to disk between stages (slow)
- Overhead of repeated job startup

### Dataflow Engines (Spark, Flink, Tez)

Improvements over MapReduce:
- Model the entire workflow as a directed acyclic graph (DAG) of operators
- No mandatory materialization of intermediate results (can pipeline through memory)
- Operators are generalized (not limited to map and reduce)
- Better fault tolerance: recompute from an upstream operator or a checkpoint

### Graph Processing (Pregel / BSP)

The Bulk Synchronous Parallel model for iterative graph algorithms:
- Each vertex processes messages from its neighbors
- Sends messages to neighbors for the next iteration
- Iterations continue until convergence (no more messages)
- Use for: PageRank, shortest paths, connected components

---

## Stream Processing

### Message Brokers vs. Log-Based Systems

**Traditional message broker** (RabbitMQ, ActiveMQ):
- Messages deleted after acknowledgment
- No long-term history
- Multiple consumers: load balancing (competing consumers) or fan-out (pub/sub)
- Good for: task queues, work distribution

**Log-based message broker** (Kafka, Amazon Kinesis):
- Append-only log; messages retained for a configurable period
- Consumers track their position (offset) in the log
- Multiple consumer groups read independently at their own pace
- Replay: reset the offset to re-process past messages
- Ordering guaranteed within a partition

### Change Data Capture (CDC)

Capture every write to a database and publish it as an event stream:
- **Implementation**: read the database's replication log (WAL) and convert it to events
- **Tools**: Debezium (Kafka Connect), Maxwell, AWS DMS
- **Initial snapshot**: bootstrap a new consumer with a full table dump, then switch to streaming
- **Log compaction**: retain only the latest event for each key — bounded storage, full state rebuild

Use CDC to keep derived systems (search indexes, caches, data warehouses) in sync with the primary database.

### Event Sourcing

Store every state change as an immutable event in an append-only log:
- Current state = replay all events from the beginning (or from a snapshot)
- Events are facts — never deleted or modified
- Commands (requests) are validated and may be rejected; events (facts) are always recorded
- **Snapshots**: periodically save materialized state to avoid replaying the entire history
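
The replay-from-log idea is a fold over the event stream. A minimal sketch assuming hypothetical `OrderPlaced`/`PaymentReceived`/`OrderShipped` events (the event shapes are invented for illustration):

```python
def apply(state, event):
    # State is rebuilt by folding events; stored state is never mutated.
    kind = event["type"]
    if kind == "OrderPlaced":
        return {"status": "placed", "paid": 0}
    if kind == "PaymentReceived":
        return {**state, "paid": state["paid"] + event["amount"]}
    if kind == "OrderShipped":
        return {**state, "status": "shipped"}
    return state                  # unknown events are ignored

def replay(events, snapshot=None):
    """Rebuild current state from the log, optionally starting from a
    snapshot so only the tail of the log needs replaying."""
    state = snapshot if snapshot is not None else {}
    for event in events:
        state = apply(state, event)
    return state
```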

Differences from CDC:
- CDC captures low-level database changes (row inserts/updates/deletes)
- Event sourcing captures high-level domain events (OrderPlaced, PaymentReceived)
- Event sourcing is designed into the application; CDC is applied to existing databases

### Stream Processing Patterns

**Complex Event Processing (CEP):**
- Define patterns over event streams (e.g., "three failed logins within 5 minutes")
- The query is stored; events flow through the query
- Tools: Esper, Apache Flink CEP

**Stream analytics:**
- Continuous aggregation over time windows
- Window types:
  - **Tumbling**: fixed-size, non-overlapping (e.g., every 1 minute)
  - **Hopping**: fixed-size, overlapping (e.g., a 5-minute window every 1 minute)
  - **Sliding**: all events within a fixed duration of each other
  - **Session**: group events by activity, with an inactivity gap closing the window
|
|
485
|
+
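As an illustration of the simplest window type, a tumbling-window count can be computed by truncating each event timestamp down to its window start (timestamps in milliseconds; the event data is made up):

```python
from collections import defaultdict

def tumbling_counts(timestamps, window_ms):
    """Count events per fixed-size, non-overlapping window.

    Each event belongs to exactly one window, keyed by the window's
    start time: (ts // window_ms) * window_ms.
    """
    counts = defaultdict(int)
    for ts in timestamps:
        window_start = (ts // window_ms) * window_ms
        counts[window_start] += 1
    return dict(counts)

events = [0, 100, 59_999, 60_000, 61_000, 125_000]
tumbling_counts(events, 60_000)
# → {0: 3, 60000: 2, 120000: 1}
```

A hopping window would assign each event to several overlapping windows instead of exactly one, and a session window would key on gaps between consecutive events rather than on fixed boundaries.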
**Stream joins:**

- **Stream-stream join (window join)**: join two streams within a time window; buffer events from both sides
- **Stream-table join (enrichment)**: enrich stream events with data from a table (maintained locally via CDC or a changelog)
- **Table-table join (materialized view)**: both inputs are changelogs; the output is a continuously updated materialized view
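The enrichment variant can be sketched as a local table kept current by a changelog subscription and consulted for each stream event; the field names here are illustrative:

```python
# Local copy of the users table, kept up to date by consuming its
# changelog (e.g. a CDC feed). In a real processor this would be a
# persistent state store, not a plain dict.
table = {}

def on_table_change(row):
    """Changelog handler: upsert the latest version of a table row."""
    table[row["user_id"]] = row

def enrich(event):
    """Stream handler: join each event against the current table state."""
    user = table.get(event["user_id"], {})
    return {**event, "user_name": user.get("name")}

on_table_change({"user_id": 7, "name": "ada"})
enriched = enrich({"user_id": 7, "action": "click"})
# → {"user_id": 7, "action": "click", "user_name": "ada"}
```

Note the ordering subtlety this hides: if the stream event arrives before the corresponding table change, the lookup misses, which is why real systems version their state stores or buffer briefly.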
### Stream Fault Tolerance

- **Microbatching** (Spark Streaming): process the stream in small batches; replay a batch on failure
- **Checkpointing** (Flink): periodic snapshots of operator state; restore from the last checkpoint
- **Idempotent writes**: make sink operations idempotent so replayed messages don't cause duplicates
- **Exactly-once semantics**: achieved via atomic commit of state + output + offsets (Kafka transactions), or idempotency keys at the sink
- **Rebuilding state**: if a local state store is lost, rebuild it from the changelog or re-process the input stream
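A toy model of checkpoint-and-restore: the operator folds events into state and periodically records `(state, offset)` pairs. This assumes checkpoints are written atomically (here they are just an in-memory list), which is the property real systems like Flink provide durably:

```python
checkpoints = []  # in a real system: durable storage, written atomically

def run(events, state=0, offset=0, every=2, crash_at=None):
    """Fold events into state, checkpointing (state, next_offset) periodically."""
    for i in range(offset, len(events)):
        if i == crash_at:
            raise RuntimeError("worker crashed")
        state += events[i]
        if (i + 1) % every == 0:
            checkpoints.append((state, i + 1))
    return state

events = [1, 2, 3, 4, 5]
try:
    run(events, crash_at=3)            # fails after processing events[0..2]
except RuntimeError:
    state, offset = checkpoints[-1]    # restore the last checkpoint
result = run(events, state=state, offset=offset)
assert result == 15                    # same answer as a failure-free run
```

Events between the last checkpoint and the crash are processed twice, which is exactly why the sink must be idempotent (or the output committed atomically with the checkpoint) for end-to-end exactly-once.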
---

## Derived Data and Integration

### The Dataflow Paradigm

Think of data systems as a pipeline:

- **System of record** (source of truth): the authoritative data store
- **Derived data**: caches, search indexes, materialized views, the data warehouse — all derived from the source of truth
- A change to the source of truth triggers updates to all derived views
### Transactional Outbox Pattern

Ensure reliable event publishing alongside database writes:

1. Write the business data AND the event to an OUTBOX table in the same database transaction
2. A separate process (the relay) reads the outbox and publishes events to the message broker
3. After a successful publish, mark the outbox entry as published

Relay strategies:

- **Polling publisher**: periodically query the outbox table for unpublished events
- **Transaction log tailing**: read the database's WAL to detect outbox inserts (lower latency)
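The three steps above can be sketched with SQLite and a polling relay; the table names, payload shape, and `publish` callback are illustrative assumptions:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
db.execute("""CREATE TABLE outbox (
    id INTEGER PRIMARY KEY, topic TEXT, payload TEXT,
    published INTEGER DEFAULT 0)""")

# Step 1: business write and event write in ONE transaction, so they
# either both commit or both roll back.
with db:
    db.execute("INSERT INTO orders (id, total) VALUES (?, ?)", (1, 99.5))
    db.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
               ("orders", '{"event": "OrderPlaced", "order_id": 1}'))

# Steps 2-3: the polling relay reads unpublished rows, publishes them,
# then marks them as published.
def relay(publish):
    rows = db.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0").fetchall()
    for row_id, topic, payload in rows:
        publish(topic, payload)  # broker publish; may be retried on failure
        db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    db.commit()

sent = []
relay(lambda topic, payload: sent.append((topic, payload)))
```

A crash between the publish and the UPDATE causes a re-publish on the next poll, so the pattern gives at-least-once delivery: consumers still need deduplication.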
### CQRS (Command Query Responsibility Segregation)

Separate the write model from the read model:

- **Command side**: handles writes, publishes domain events
- **Query side**: subscribes to events, maintains denormalized read-optimized views

Benefits:

- The read model is optimized for specific query patterns
- Read and write sides can scale independently
- Multiple read models can serve different access patterns

Trade-offs:

- Eventual consistency between write and read sides
- More infrastructure (event bus, separate read databases)
- Complexity of maintaining derived views
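In miniature, the two sides look like a command handler that validates and emits events, and a projection that folds those events into a read-optimized view (the event shape and view are made up; a real system would put an event bus between them):

```python
events = []  # the event log shared by both sides

def place_order(order_id, items):
    """Command side: validate the request, then emit a domain event."""
    if not items:
        raise ValueError("empty order")  # commands may be rejected
    events.append({"type": "OrderPlaced", "order_id": order_id, "items": items})

item_counts = {}  # query side: a denormalized view, order_id -> item count

def project(event):
    """Query side: fold one event into the read model."""
    if event["type"] == "OrderPlaced":
        item_counts[event["order_id"]] = len(event["items"])

place_order(1, ["book", "pen"])
for e in events:      # in production this loop is an async subscription
    project(e)
```

Because projection is asynchronous in a real deployment, the view lags the log slightly: that lag is the eventual consistency listed under trade-offs.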
### Lambda Architecture vs. Kappa Architecture

**Lambda**: maintain both a batch layer (periodically reprocess all data) and a speed layer (process new events in real time); merge the results at query time.

- Problem: maintaining two codepaths (batch and streaming) that must stay in sync

**Kappa**: a single stream processing system handles everything; reprocess by replaying the log from the beginning through a new version of the processor.

- Simpler, but requires long log retention and the ability to replay
### End-to-End Correctness

No single component provides exactly-once semantics across the entire pipeline. Instead:

- Use **idempotency keys** at every boundary (producer → broker → consumer → database)
- **Deduplication**: consumers track processed message IDs
- **End-to-end argument**: push correctness guarantees to the application level rather than relying on infrastructure alone
- **Deterministic processing**: the same input always produces the same output, enabling safe replay
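Consumer-side deduplication with an idempotency key can be sketched as follows; the message shape is illustrative, and the critical assumption (noted in the comment) is that the processed-key record is committed atomically with the state change:

```python
processed = set()      # in real systems: stored durably, committed
balance = {"acct": 0}  # atomically with the state change below

def handle(message):
    """Apply a payment message at most once, keyed by its idempotency key."""
    key = message["idempotency_key"]
    if key in processed:
        return  # duplicate delivery: already applied, skip
    balance[message["account"]] += message["amount"]
    processed.add(key)

msg = {"idempotency_key": "pay-42", "account": "acct", "amount": 10}
handle(msg)
handle(msg)  # broker redelivery is harmless: the effect happens once
```

This turns the broker's at-least-once delivery into effectively-once processing, which is the application-level guarantee the end-to-end argument calls for.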