event-storage 0.8.0 → 0.9.1

package/README.md CHANGED
@@ -1,580 +1,89 @@
1
- [![build](https://github.com/albe/node-event-storage/workflows/build/badge.svg)](https://github.com/albe/node-event-storage/actions)
1
+ ![event-storage](logo/color.png)
2
+
3
+ [![build](https://github.com/albe/node-event-storage/actions/workflows/build.yml/badge.svg?branch=main)](https://github.com/albe/node-event-storage/actions/workflows/build.yml)
2
4
  [![npm version](https://badge.fury.io/js/event-storage.svg)](https://badge.fury.io/js/event-storage)
3
5
  [![Code Climate](https://codeclimate.com/github/albe/node-event-storage/badges/gpa.svg)](https://codeclimate.com/github/albe/node-event-storage)
4
- [![Coverage Status](https://coveralls.io/repos/github/albe/node-event-storage/badge.svg?branch=master)](https://coveralls.io/github/albe/node-event-storage?branch=master)
5
- [![Code documentation](https://inch-ci.org/github/albe/node-event-storage.svg?branch=master)](https://inch-ci.org/github/albe/node-event-storage)
6
+ [![Coverage Status](https://coveralls.io/repos/github/albe/node-event-storage/badge.svg?branch=main)](https://coveralls.io/github/albe/node-event-storage?branch=main)
7
+ [![Code documentation](https://inch-ci.org/github/albe/node-event-storage.svg?branch=main)](https://inch-ci.org/github/albe/node-event-storage)
6
8
 
7
9
  # node-event-storage
8
10
 
9
11
  An optimized embedded event store for modern node.js, written in ES6.
10
12
 
11
- > **Disclaimer:** This is currently under heavy development and not production ready. See [issues/29](https://github.com/albe/node-event-storage/issues/29) for more information.
12
-
13
- # Contents
13
+ 📖 **[Full documentation on readthedocs.io](https://node-event-storage.readthedocs.io/en/latest/)**
14
14
 
15
- - [Why?](#why)
16
- - [Use cases](#use-cases)
17
- - [Design goals](#design-goals)
18
- - [Event storage specifics](#event-storage-and-its-specifics)
19
- - [Installation](#installation)
20
- - [Usage](#usage)
21
- * [Creating additional streams](#creating-additional-streams)
22
- * [Optimistic concurrency](#optimistic-concurrency)
23
- * [Reading streams](#reading-streams)
24
- * [Joining streams](#joining-streams)
25
- * [Event metadata](#event-metadata)
26
- * [Consumers](#consumers)
27
- * [Exactly-once](#exactly-once-semantics)
28
- * [Consumer state](#consumer-state)
29
- * [Consistency guards (a.k.a. "Aggregates")](#consistency-guards-aka-aggregates)
30
- * [Read-Only](#read-only)
31
- - [Implementation details](#implementation-details)
32
- * [ACID](#acid)
33
- * [Global order](#global-order)
34
- * [Event streams](#event-streams)
35
- * [Partitioning](#partitioning)
36
- * [Custom serialization](#custom-serialization)
37
- * [Compression](#compression)
38
- * [Security](#security)
15
+ ---
39
16
 
40
17
  ## Why?
41
18
 
42
- There is currently only a single embedded event store implementation for node/javascript, namely https://github.com/adrai/node-eventstore
43
-
44
- It is a nice project, but has a few drawbacks though:
45
-
46
- - its API is fully based around Event Streams, so in order to commit a new event the full existing Event Stream needs to be
47
- retrieved first. This makes it unfit for client application scenarios that frequently restart the application.
48
- - it has backends for quite a few existing databases (TingoDB, NeDB, MongoDB, ...), but none of them are optimized for event storage needs
49
- - the embeddable storage backends (TingoDB, NeDB) do not persist indexes and hence are very slow on initial load
50
- - it stores event publishing meta information in the events, so it does updates to event data
51
- - events are fixed onto one stream and it's not possible to create multiple streams that partially contain
52
- the same events. This makes creating projections hard and/or slow.
53
-
54
- ## Use cases
55
-
56
- Event sourced client applications running on node.js (electron, node-webkit, etc.).
57
- Small event sourced single-server applications that want to get near-optimal write performance.
58
- Using it as queryable log storage.
59
-
60
- ## Design goals
61
-
62
- - single node scalability
63
- * opening/writing to an existing store with millions of events should be as fast as opening/writing an empty store
64
- * write performance should not be constrained by locking or distributed transaction costs, i.e. single-writer (at least per transaction boundary = stream), so no horizontal write scaling
65
- * read performance should be optimized for sequential read-forward style reads starting at arbitrary position
66
- * reads should be scalable to as many readers as necessary (but typically one reader per projection)
67
- * it should be possible to create high number (thousands) of streams without high resource (memory,cpu) usage
68
- * re-reading (replaying) an arbitrary stream should be optimized for and cost no more than visiting every document in that stream (no full database scan)
69
- - consistency
70
- * writes to a single stream need to be able to guarantee consistency (i.e. every write happens only as of the state immediately before that write)
71
- * reads from a stream need to be consistent every time, i.e. repeatable read isolation (guaranteed order, read-committed for read-only but read-uncommitted/read your own writes for writers)
72
- - simplicity
73
- * the architecture and design should be straight-forward, not more complex than dictated by the goals
74
- * creating new streams (from existing data) should be easily doable with language-level methods
75
-
76
- ### Non-Goals
19
+ There is currently only a single other embedded event store for node/javascript: [node-eventstore](https://github.com/adrai/node-eventstore). It has a few drawbacks:
77
20
 
78
- - distributed storage/distributed transactions
79
- - therefore: no network API
80
- - cross-stream transactions
81
- - arbitrary querying capabilities - only range scans per stream
21
+ - Its API requires loading a full Event Stream before committing, making it unfit for frequently-restarting client applications.
22
+ - Its embeddable backends (TingoDB, NeDB) do not persist indexes and are slow on initial load.
23
+ - Events are fixed to one stream — creating overlapping projection streams is not possible.
82
24
 
83
- ## Event-Storage and it's specifics
25
+ **node-event-storage** is built from first principles for append-only workloads, giving you near-optimal write speed with no unnecessary overhead.
84
26
 
85
- The thing that makes event storages stand out (and makes them simpler and more performant), is that they
86
- have no concept of overwriting or deleting data. They are purely append-only storages, and the only querying is
87
- sequential (range) reading (possibly with some filtering applied):
88
-
89
- This means a couple of things:
90
-
91
- - no write-ahead log or transaction log required - the storage itself is the transaction log!
92
- - therefore writes are as fast as they can get, but you only can have a single writer (without implementing complex distributed log with RAFT or Paxos)
93
- - durability comes for free (in complexity) if write caches are avoided
94
- - reads and writes can happen lock-free, reads don't block writes and are always consistent (natural MVCC)
95
- - indexes are append-only and hence gain the same benefits
96
- - since only sequential reading is needed, indexes are simple file position lists - no fancy B+-Tree/fractal tree required
97
- - indexes are therefore pretty cheap and can be created in high numbers
98
- - creating backups is easily doable with rsync or by creating file copies on the fly
99
-
100
- Using any SQL/NoSQL database for storing events therefore is sub-optimal, as those databases do a lot of work on
101
- top which is simply not needed. Write and read performance suffer.
27
+ ---
102
28
 
103
29
  ## Installation
104
30
 
105
- `npm install event-storage`
106
-
107
- ## Run Tests
108
-
109
- `npm test`
31
+ ```bash
32
+ npm install event-storage
33
+ ```
110
34
 
111
- ## Usage
35
+ ## Quick Start
112
36
 
113
37
  ```javascript
114
38
  const EventStore = require('event-storage');
115
39
 
116
40
  const eventstore = new EventStore('my-event-store', { storageDirectory: './data' });
41
+
117
42
  eventstore.on('ready', () => {
118
- const streamVersion = eventstore.getStreamVersion('my-stream');
119
- //...
120
- eventstore.commit('my-stream', [{ foo: 'bar' }], streamVersion, () => {
121
- //...
43
+ // Write events
44
+ eventstore.commit('my-stream', [{ type: 'SomethingHappened', value: 42 }], 0, () => {
45
+ console.log('Written!');
122
46
  });
123
47
 
124
- let stream = eventstore.getEventStream('my-stream');
125
- for (let event of stream) {
126
- //...
127
- }
128
- });
129
- ```
130
-
131
- The `streamVersion` is needed if you do any async work in between the `getStreamVersion` and `commit`, that
132
- potentially involves other commits to the same stream. See [Optimistic Concurrency](#optimistic-concurrency).
133
-
134
- ### Creating additional streams
135
-
136
- Create additional streams that contain only part of another stream, or even a combination of events of other streams.
137
-
138
- ```javascript
139
- //...
140
- let myProjectionStream = eventstore.createStream('my-projection-stream', (event) => ['FooHappened', 'BarHappened'].includes(event.type));
141
-
142
- for (let event of myProjectionStream) {
143
- //...
144
- }
145
- ```
146
-
147
- ### Optimistic concurrency
148
-
149
- Optimistic concurrency control is required when multiple sources generate events concurrently.
150
-
151
- > Note that having the producer of events behind a HTTP interface automatically implies concurrent operation.
152
-
153
- To handle those cases but still guarantee all those producers can have their own consistent view of the current state,
154
- you need to track the last `streamVersion` the producer was at when he generated the event, then send that as `expectedVersion`
155
- with the commit.
156
-
157
- ```javascript
158
- const model = new MyConsistencyModel();
159
- const stream = eventstore.getEventStream('my-stream');
160
- stream.forEach((event, metadata) => {
161
- model.apply(event);
162
- });
163
- const expectedVersion = stream.version;
164
- // Provide model state and expectedVersion to some state change API or UI that returns a command
165
- //...
166
- // generate new events from the current model, by applying an incoming command
167
- const events = model.handle(command.payload);
168
- try {
169
- // The expectedVersion is supposed to be given back through the command
170
- eventstore.commit('my-stream', events, command.expectedVersion, () => {
171
- //...
172
- });
173
- } catch (e) {
174
- if (e instanceof EventStore.OptimisticConcurrencyError) {
175
- //...
176
- // Reattempt command / resolve conflict
48
+ // Read events
49
+ const stream = eventstore.getEventStream('my-stream');
50
+ for (const event of stream) {
51
+ console.log(event);
177
52
  }
178
- }
179
- ```
180
-
181
- Where `expectedVersion` is either `EventStore.ExpectedVersion.Any` (no optimistic concurrency check, the default),
182
- `EventStore.ExpectedVersion.EmptyStream` or any version number > 0 that the stream is expected to be at.
183
- It will throw an OptimisticConcurrencyError if the given stream version does not match the expected.
184
- In that case you should either signal that back to the upstream source, or replay state and reattempt application
185
- of the command.
186
-
187
- ### Reading streams
188
-
189
- Of course any functional system will not only write to the storage, but also read back the events and do something meaningful with them.
190
- The common case is a projection/read model, or a process manager (which is technically a projection that emits new events), but could also
191
- be for just skimming through the events for migrating/upgrading data or just showing a history table.
192
- For this you can just get a hold of the event stream you want to read, and iterate it. The EventStream is an [Iterable](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Iteration_protocols)!
193
- Apart from that, you can also specify the exact version range you want to iterate at the time of retrieving the stream. With this it is also
194
- possible to iterate the stream in reverse, by specifying a lower `max` than `min` revision.
195
-
196
- ```javascript
197
- const stream0 = eventstore.getEventStream('my-stream', 0, -1); // all events from the start (#0) up to the last (-1 equals the last version)
198
- const stream1 = eventstore.getEventStream('my-stream', 0, 50); // all events from the start (#0) up to event #50, hence 51 events in total
199
- const stream2 = eventstore.getEventStream('my-stream', 10, -11); // the events starting from #10 up to the 10th last event
200
- const stream3 = eventstore.getEventStream('my-stream', -11, -1); // get the last ten events starting from the earliest
201
- const stream4 = eventstore.getEventStream('my-stream', -1, -11); // get the last ten events starting from the last in reverse order
202
-
203
- for (let event of stream{x}) {
204
- //...
205
- }
206
- ```
207
-
208
- **Note**
209
- > If a new event is appended right after the `getEventStream()` call, but before iterating, this event will **not** be included in the iteration.
210
- > This is due to the revision boundary being fixed at the time of getting the stream reference. In some cases this might be unwanted, but those cases are
211
- > probably better covered by [consumers](#consumers).
212
-
213
- #### Joining streams
214
-
215
- Sometimes you might want to iterate over events from multiple streams in the order they were appended to the respective streams. In that case the
216
- `fromStreams(string transientStreamName, array streamNames, [number minRevision, [number maxRevision]])` method will do what you want.
217
- It will return an instance of `EventStream` (`JoinEventStream` actually) that will iterate the events of all streams specified in their global insertion order.
218
- You can also reverse the order by specifying a lower `max` than `min` revision.
219
- The result of this iteration will not be persisted and is not applicable to [consumers](#consumers), so if you intend to more frequently work with the join of
220
- those streams, another approach would be to create a completely new stream that will match all events that belong to the streams you want to join.
221
-
222
- #### Stream categories
223
-
224
- Similar to EventStoreDB (and other), event-storage allows categorizing streams by naming convention.
225
- This is useful when e.g. needing to iterate all events that belong to a single model class, rather than instance.
226
- In this case, you name the streams for the instances as the class name followed by the identity of the instance, e.g. `user-123`, `user-456`, etc.
227
- If you then want to iterate all users' events, you would need to join the streams of all users and for convenience you can do this with
228
- the method `getEventStreamForCategory(categoryName, minRevision, maxRevision)`. This will find all streams whose name starts with the given
229
- `categoryName` followed by a dash and return a [joined stream](#joining-streams) over those. If you already created a dedicated stream for this
230
- category manually, this stream will be returned.
231
-
232
- ```javascript
233
- eventstore.commit('user-' + user.id, [new UserRegistered(user.id, user.email)]);
234
- //...
235
- const allUsersStream = eventstore.getEventStreamForCategory('user');
236
- ```
237
-
238
- #### Event metadata
239
-
240
- In case you also need access to the storage level meta information, the iterable approach will not suffice. For those cases the `forEach((event, metadata, streamName) callback)`
241
- method will give you everything you need.
242
- ```javascript
243
- const stream = eventstore.getEventStream('my-stream');
244
- stream.forEach((event, metadata, streamName) => {
245
- // metadata is an object of the form { commitId, committedAt, commitVersion, streamVersion } combined with any additional metadata you provide in the commit call.
246
- // commitId is a unique Id for the whole commit, committedAt the milliseconds timestamp when the commit happened,
247
- // commitVersion is the sequence number for the event within the commit and streamVersion the version of the event within the stream
248
- eventstore.commit('my-new-stream', [event], metadata);
249
53
  });
250
54
  ```
251
- This is primarily useful for low-level work, like rewriting streams.
252
55
 
253
- ### Consumers
56
+ ## Key Features
254
57
 
255
- Consumers are durable event-driven listeners on event streams. From a nodejs perspective they are `stream.Readable`s. They provide
256
- at-least-once delivery guarantees, meaning they receive each event in the stream at least once. An event may be delivered twice if
257
- the program crashed during the handling of an event, since the current position will only be persisted *afterwards*.
258
- As of version 0.6 the `setState()` method allows opting into [exactly-once](#exactly-once-semantics) processing.
58
+ | Feature | Summary |
59
+ |---------|---------|
60
+ | **Optimistic concurrency** | Pass `expectedVersion` to `commit()` to guarantee conflict-free writes. |
61
+ | **Flexible stream reading** | Range queries, reverse iteration, and a fluent builder API. |
62
+ | **Derived streams** | Filter or combine events into new read-only streams. |
63
+ | **Stream categories** | Name streams `<category>-<id>` and query the whole category at once. |
64
+ | **Durable consumers** | At-least-once (and exactly-once with `setState`) event delivery with automatic position tracking. |
65
+ | **Consistency guards** | Build aggregates that enforce business invariants with built-in snapshotting. |
66
+ | **Read-only mode** | Open the store from a second process to build projections without touching the writer. |
67
+ | **Crash safety** | Torn writes detected and truncated on startup; automatic index repair via `LOCK_RECLAIM`; bounded, predictable data loss validated by a dedicated stress test. |
68
+ | **Custom serialization** | Plug in msgpack, protobuf, or any other codec. |
69
+ | **Compression** | Apply LZ4, zstd, or any other compression via the `serializer` option. |
70
+ | **Access control hooks** | `preCommit` / `preRead` hooks with per-stream metadata for authorization. |
259
71
 
260
- ```javascript
261
- let myConsumer = eventstore.getConsumer('my-stream', 'my-stream-consumer1');
262
- myConsumer.on('data', event => {
263
- // do something with event, but be sure to de-duplicate or have idempotent handling
264
- });
265
- ```
266
-
267
- Since a consumer is always bound to a specific stream, you need to create a stream for the specific consumer first,
268
- if it needs to listen to events from different [write-streams](#event-streams).
269
-
270
- **Note**
271
- > The consuming of events will start as soon as a handler for the `data` event is registered and suspended
272
- > when the last listener is removed.
273
-
274
- As soon as the consumer has caught up the stream, it will emit a `caught-up` event.
275
-
276
- #### Exactly-Once semantics
277
-
278
- Since version 0.6 the consumers can persist their state (a simple JSON object), which allows for achieving
279
- exactly-once processing semantics relatively easy. What this means is, that the state of the consumer will
280
- always reflect the state of having each event processed exactly once, because if persisting the state fails,
281
- the position will also not be updated and vice versa.
282
-
283
- ```javascript
284
- let myConsumer = eventstore.getConsumer('my-stream', 'my-stream-consumer1');
285
- myConsumer.on('data', event => {
286
- const newState = { ...myConsumer.state, projectedValue: myConsumer.state.projectedValue + event.someValue };
287
- myConsumer.setState(newState);
288
- });
289
- ```
290
-
291
- This is very useful for projecting some data out of a stream with exactly-once processing without a lot of effort.
292
- Whenever the state has been persisted, the consumer will also emit a `persisted` event.
293
-
294
- **Note**
295
- > Never mutate the consumers `state` property directly and only use the `setState` method **inside** the `data` handler.
296
- > Since version 0.8 mutating is prevented by freezing the state object.
72
+ ---
297
73
 
298
- The reason why this works is, that conceptually the state update and the position update happens within a single
299
- transaction. So anything you can wrap inside a transaction with storing the position yields exactly-once semantics.
300
- However, for example sending an email exactly once for every event is not achievable with this, because you can't
301
- wrap a transaction around sending an e-mail and persisting the consumer position in a local file easily.
74
+ ## Documentation
302
75
 
303
- #### Consumer state
76
+ The full documentation is hosted at **<https://node-event-storage.readthedocs.io/en/latest/>** and covers:
304
77
 
305
- Since version 0.8 a consumer can set an initial state and update it's state via a function that receives the current state as argument.
306
- That way it becomes much easier to write reusable state calculation functions.
307
-
308
- ```javascript
309
- const myConsumer = eventstore.getConsumer('my-stream', 'my-stream-consumer1', { someValue: 0, someOtherValue: true });
310
- myConsumer.on('data', event => {
311
- myConsumer.setState(state => ({ ...state, someValue: state.someValue + event.someValueDiff }));
312
- });
313
- ```
78
+ - [Getting Started](https://node-event-storage.readthedocs.io/en/latest/getting-started/) — installation, constructor options, basic usage.
79
+ - [Event Streams](https://node-event-storage.readthedocs.io/en/latest/streams/) — writing, reading, optimistic concurrency, fluent API, joining streams, categories, and event metadata.
80
+ - [Consumers](https://node-event-storage.readthedocs.io/en/latest/consumers/) — at-least-once and exactly-once delivery, consumer state, consistency guards, and read-only mode.
81
+ - [Advanced Topics](https://node-event-storage.readthedocs.io/en/latest/advanced/) — ACID properties, reliability and crash-safety guarantees, storage configuration, partitioning, custom serialization, compression, security, and access control hooks.
314
82
 
315
- Also, since that version the consumer can be reset, to force it to reprocess all (or a subset) of the events.
316
-
317
- ```javascript
318
- myConsumer.reset({ someValue: 1 }, 10);
319
- ```
320
- This will restart the consumer with an inital state of `someValue = 1` and reprocess starting from position 10 in the stream.
321
-
322
- #### Consistency guards (a.k.a. "Aggregates")
323
-
324
- Consistency guards, or more famously yet misleadingly called "Aggregates" in event sourcing can be built with the semantics
325
- that a `Consumer` provides.
326
- One example for the code is shown here:
327
-
328
- ```javascript
329
- const myConsistencyGuard = eventstore.getConsumer('my-guard-stream', 'my-guard-uuid');
330
- // The guard's apply event method, which will update the internal state. Since the consumer is running in the same process
331
- // as the writing eventstore, this is effectively synchronous (invoked on next node event loop).
332
- // This should only contain the data necessary to make the decisions in validateCommand()
333
- myConsistencyGuard.apply = function(event) {
334
- this.setState(state => ({ ...state, someValue: calculateNewValue(state.someValue, event) }));
335
- };
336
- // You could also just use a lambda here, but the apply/handle separation is a well known paradigm when building "Aggregates"
337
- myConsistencyGuard.on('data', myConsistencyGuard.apply);
338
- // The command handling method that builds new events (this makes the guard easily testable).
339
- // This contains (only) your business rules fulfilling some (hard) constraints. It only returns the events
340
- // that should be emitted from handling the command.
341
- myConsistencyGuard.handle = function(command) {
342
- // Should throw an Error if the command is rejected based on the current state
343
- validateCommand(command, this.state);
344
- return [new MyDomainEvent(command), ...];
345
- };
346
-
347
- // This is probably a HTTP handler method like express' app.post('my/guard/uri', ...) or invoked from there
348
- function myCommandHandler(command) {
349
- // Notice how the guard just becomes some arbitrary event emitter - in a lot of cases you don't need a guard at all, e.g. if you only do Event = CommandHappened
350
- eventstore.commit(myConsistencyGuard.streamName, myConsistencyGuard.handle(command), command.position || myConsistencyGuard.position);
351
- }
352
- ```
353
-
354
- So how does this work? First, the guard is basically a consumer of its own stream. Since a consumer provides
355
- [exactly-once](#exactly-once-semantics) processing guarantees when using `setState()`, we are always sure that the guard's state exactly reflects
356
- the state after processing all events once. Therefore, the handle method can safely make decisions based on that assumption
357
- and reject commands that do not fit the current state of the guard. If two requests come in in parallel, the optimistic concurrency
358
- check of the commit will prevent the second attempt from persisting those events. For multi-user handling, the command should
359
- already carry the last known version of the guard that the user made a decision on. Otherwise, the guard's own position makes sure
360
- that only events directly following the previous state are committed.
361
-
362
- **Note**
363
- > This implementation of a consistency guard already implements snapshotting automatically, which means that restarting the process
364
- > does not require rebuilding the state from all previous events. If you want to control how often the guard's state is snapshotted,
365
- > you can specify a second argument to the `setState()` method that should be true when a snapshot should be created and false otherwise,
366
- > e.g. `this.position % 20 === 0`. Note that this is only needed for very high frequency guards/streams, in order to reduce IO.
367
-
368
- ### Read-Only
369
-
370
- The `EventStore` can also be opened in a readonly mode since 0.7, by specifying the constructor option `readOnly: true`.
371
- In this mode, any writes to the store will be prevented, while all reads and consumers work as normal. The read-only storage
372
- will watch the files that back it and automatically update internal state on changes, so the reader is asynchronously fully
373
- consistent to the writer state. You can open as many readers as needed, and the main use case is to use it for consumers running
374
- in a different process than the writer. This way, you can have different processes create projections from the events for
375
- different use cases and serve their state out to other systems, e.g. through an HTTP interface or whatever deems useful.
376
-
377
- ```javascript
378
- const EventStore = require('event-storage');
379
-
380
- const eventstore = new EventStore('my-event-store', { storageDirectory: './data', readOnly: true });
381
- eventstore.on('ready', () => {
382
- let myConsumer = eventstore.getConsumer('my-stream', 'my-stream-consumer1');
383
- myConsumer.on('data', event => {
384
- const newState = { ...myConsumer.state, projectedValue: myConsumer.state.projectedValue + event.someValue };
385
- myConsumer.setState(newState);
386
- });
387
- });
388
- ```
389
-
390
- In theory, it would even be possible with this, to scale the storage to multiple machines, if they are all backed by a common
391
- file system. The biggest issue preventing this is, that the nodejs file watcher needs to work on that filesystem.
392
- See https://nodejs.org/api/fs.html#fs_availability for more information.
393
- Also, you could rsync the files that back the storage to another machine and have a read-only instance running on that.
394
- See https://linux.die.net/man/1/rsync and the `--append` option.
395
-
396
- ## Implementation details
397
-
398
- ### ACID
399
-
400
- > Note: All following explanations talk about a single transaction boundary, which is a single write-stream, AKA a storage partition.
401
-
402
- The storage engine is not strictly designed to follow ACID semantics. However, it has following properties:
403
-
404
- #### Atomicity
405
-
406
- A single document write is guaranteed to be atomic. Unless specifically configured, atomicity spreads to all subsequent
407
- writes until the write buffer is flushed, which happens either if the current document doesn't fully fit into the write
408
- buffer or on the next node event loop.
409
- This can be (ab)used to create a reduced form of transactional behaviour: All writes that happen within a single event loop
410
- and still fit into the write buffer will all happen together or not at all.
411
- If strict atomicity for single documents is required, you can configure the option `maxWriteBufferDocuments` to 1, which
412
- leads to every single document being flushed directly.
413
-
414
- #### Consistency
415
-
416
- Since the storage is append-only, consistency is automatically guaranteed for all successful writes. Writes that fail in
417
- the middle, e.g. because the machine crashes before the full write buffer is flushed, will lead to a torn write. This is
418
- a partial invalid write. To recover from such a state, the storage will detect torn writes and truncate them when an existing
419
- lock is reclaimed. This can be done by instantiating the store with the following option:
420
-
421
- ```javascript
422
- const eventstore = new EventStore('my-event-store', { storageConfig: { lock: EventStore.LOCK_RECLAIM } });
423
- ```
424
-
425
- Note that this option will effectively bypass the lock that prevents multiple instances from being created, so you should
426
- not use this carelessly. Having multiple instances write to the same files will lead to inconsistent data that can not be
427
- easily recovered from.
428
-
429
- #### Isolation
430
-
431
- The storage is supposed to only work with a single writer, therefore writes do not influence each other obviously. The single
432
- writer is only guaranteed with a simple lock-directory mechanic, which works on NFS. This is of course not a hard guarantee, just
433
- a helper to prevent accidentally opening two writers.
434
- Reads are guaranteed to be isolated due to the append-only nature and a read only ever seeing writes that have finished
435
- (not necessarily flushed - i.e. Dirty Reads) at the point of the read. In a read-only instance, dirty reads are technically
436
- impossible, because the reader has no access to the unfinished writes. Multiple reads can happen without blocking writes.
437
-
438
- If Dirty Reads are not wanted, they can be disabled with the storage configuration option `dirtyReads` set to false. That
439
- way you will only ever be able to read back documents that where flushed to disk, even on writers. Note though, that this should
440
- only be done with in-memory models that keep their own (uncommitted) state, or else you might suffer from inconsistency.
441
-
442
- There are no lost updates due to the append-only nature. Phantom reads can be prevented by specifying the `maxRevision` for
443
- streams explicitly (MVCC). All reads are repeatable, as long as no manual truncation happens.
444
-
445
- #### Durability
446
-
447
- Durability is not strictly guaranteed because writes are buffered and flushes are not synced to disk by default.
448
- All writes happening within a single node event loop and fitting into the write buffer can be lost on application crash.
449
- Even after flush, the OS and/or disk write buffers can still limit durability guarantees.
450
- This is a trade-off made for increased write performance and can be tuned to your needs.
451
- The write buffer behaviour can be configured with the already mentioned `maxWriteBufferDocuments` and `writeBufferSize`
452
- options. For strict durability, you can set the option `syncOnFlush`, which will sync all flushes to disk before finishing
453
- but comes at a very high performance penalty.
454
-
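As a rough sketch of the trade-off described above, a durability-oriented configuration might look like the following. The option names come from the text; placing them under `storageConfig` mirrors the other examples in this README and is an assumption here, as are the concrete values.

```javascript
// Sketch only: option names are from the README text, values are illustrative.
const durableConfig = {
    storageDirectory: './data',
    storageConfig: {
        // Sync every flush to disk before it is reported as finished (slow, but strict).
        syncOnFlush: true,
        // Assumed to cap how many documents are buffered before a flush is forced.
        maxWriteBufferDocuments: 1,
        // Assumed to be the write buffer size in bytes.
        writeBufferSize: 16 * 1024
    }
};

// Assumed usage, following the constructor calls shown elsewhere in this README:
// const eventstore = new EventStore('my-event-store', durableConfig);
```

Leaving `syncOnFlush` unset keeps the default, faster behaviour in which a flush is not synced to disk.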
455
- Note: If there are any misconceptions on my side about the ACID semantics, let me know.
456
-
457
- ### Global order
458
-
459
- Currently, the `storage` guarantees a consistent global ordering on all events by managing a global primary index. This makes
460
- sure that streams that are made up of multiple write-streams will stay consistent when re-reading all events. This has some
461
- issues though, like not being able to consistently reindex a storage, which is discussed in https://github.com/albe/node-event-storage/issues/24.
462
-
463
- Since version 0.7 the storage also stores a monotonic clock stamp and an external sequence number together with the document.
464
- This way, a consistent global order can also be reconstituted without a global index. In a later version, the global index might
465
- therefore be removed and reindexing a storage become possible, which would allow rebuilding a consistent state after a destructive crash.
466
-
467
- ### Event Streams
468
-
469
- There are two slightly different concepts of Event Streams:
470
-
471
- - A write stream is a single identifier that an event/document is assigned to on write (see Partitioning). It is therefore
472
- a physical separation of the events that happens on write. An event written to a specific write stream can not be removed
473
- from it, it can only be linked to from other additional (read) streams.
474
-
475
- - A read stream is an ordered sequence in which specific events are iterated when reading. Every write stream automatically
476
- creates a read stream that will iterate the events in the order they were written to that stream. Additional read streams
477
- can be created that possibly even sequence events from multiple write streams. Such read streams can be deleted without
478
- problem, since they will not actually delete the events, but just the specific iteration sequence.
479
-
480
- An Event Stream is implemented as an iterator over a storage index. It is therefore limited to iterating the events at
481
- the point the Event Stream was retrieved, but can be limited to a specific range of events, denoted by min/max revision.
482
- It implements the node `ReadableStream` interface.
483
-
484
- ### Partitioning
485
-
486
- By default, the Event Store is partitioned on (write) streams, so every unique stream name is written to a separate file.
487
- This has several consequences:
488
-
489
- - subsequent reads from a single write stream are faster, because the events share more locality
490
- - every write stream has its own write and read buffer, hence interleaved writes/reads will not thrash the buffers
491
- - since writes are buffered, only writes within a single write stream will be flushed together, hence "transactionality" is not spread over streams
492
- - the number of write streams is limited by the number of files the filesystem can handle inside a single folder
493
- - if the hard disk is configured for file-based RAID, this will most likely lead to unbalanced load
494
-
495
- If required, the partitioning behaviour can be configured with the `partitioner` option, which is a method with the following signature:
496
- `(string:document, number:sequenceNumber) -> string:partitionName`
497
- i.e. it maps a document and its sequence number to a partition name. That way you could for example easily distribute all writes
498
- equally among a fixed number of arbitrary partitions by doing `(document, sequenceNumber) => 'partition-' + (sequenceNumber % maxPartitions)`.
499
- This is not recommended in the generic case though, since it contradicts the consistency boundary that a single stream should give.
500
- Many databases partition the data into chunks (striding) of a fixed size, which helps with disk performance especially in RAID setups.
501
- However, as SSDs become the standard, the benefit of chunking data is becoming more limited. It does help with incremental
502
- backup strategies, or for use cases where old data needs to be archived or even deleted. For those cases, the partitioner could look
503
- like `(document, sequenceNumber) => 'partition' + ((sequenceNumber / documentsPerChunk) >> 0)`, which will write documents into an ever
504
- increasing number of partitions. Or you partition by the document timestamp, which for an `EventStore` document could be taken from the `committedAt` field, which is a javascript timestamp. Ideally, you would want to make sure a commit is not spread among partitions though, so those partitioners are not fool-proof.
505
-
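The three partitioning strategies mentioned above could be sketched as plain functions like this. `maxPartitions`, `documentsPerChunk` and the function names are hypothetical, and the time-based variant assumes the document arrives as a JSON string with `committedAt` stored in milliseconds under `metadata`, which may not match your serializer.

```javascript
// Hypothetical values, chosen only for illustration.
const maxPartitions = 4;
const documentsPerChunk = 1000;

// Round-robin: spreads writes evenly over a fixed number of partitions.
const roundRobin = (document, sequenceNumber) =>
    'partition-' + (sequenceNumber % maxPartitions);

// Chunked: starts a new partition every `documentsPerChunk` documents.
// The division is wrapped in Math.floor so the partition name gets an integer suffix.
const chunked = (document, sequenceNumber) =>
    'partition-' + Math.floor(sequenceNumber / documentsPerChunk);

// Time-based: one partition per UTC day.
// Assumes `document` is a JSON string and `metadata.committedAt` is a millisecond timestamp.
const byDay = (document, sequenceNumber) => {
    const { metadata } = JSON.parse(document);
    return 'partition-' + new Date(metadata.committedAt).toISOString().slice(0, 10);
};

console.log(roundRobin('{}', 6));
console.log(chunked('{}', 2500));
```

Any of these functions would then be passed as the `partitioner` option. None of them keeps a multi-event commit inside one partition, which is the caveat noted above.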
506
- ### Custom Serialization
507
-
508
- By default, serialization is done with `JSON.stringify` and `JSON.parse`. Those are plenty fast on recent nodejs
508
- versions, but JSON serialization takes more space than more optimized formats. You could use some other library, like `@msgpack/msgpack`,
509
- to get a performant but space-saving data format. In benchmarks, `@msgpack/msgpack` even turns out faster than `JSON.parse` for
510
- deserialization and pretty much on par with `JSON.stringify` for serialization. The drawback is that the storage files are no longer
511
- human readable.
83
+ ---
513
84
 
85
+ ## Run Tests
514
86
 
515
- ```javascript
516
- const { encode, decode } = require('@msgpack/msgpack');
517
- const eventstore = new EventStore('my-event-store', {
518
-     storageDirectory: './data',
519
-     storageConfig: {
520
-         serializer: {
521
-             serialize: (doc) => {
522
-                 const encoded = encode(doc);
523
-                 return Buffer.from(encoded.buffer, encoded.byteOffset, encoded.byteLength).toString('binary');
524
-             },
525
-             deserialize: (string) => {
526
-                 return decode(Buffer.from(string, 'binary'));
527
-             }
528
-         }
529
-     }
530
- });
87
+ ```bash
88
+ npm test
531
89
  ```
532
-
533
- ### Compression
534
-
535
- To apply compression on the storage level, the `serializer` option of the Storage can be used.
536
-
537
- For example to use LZ4:
538
-
539
- ```javascript
540
- const lz4 = require('lz4');
541
- const eventstore = new EventStore('my-event-store', {
542
-     storageDirectory: './data',
543
-     storageConfig: {
544
-         serializer: {
545
-             serialize: (doc) => {
546
-                 return lz4.encode(Buffer.from(JSON.stringify(doc))).toString('binary');
547
-             },
548
-             deserialize: (string) => {
549
-                 return JSON.parse(lz4.decode(Buffer.from(string, 'binary')));
550
-             }
551
-         }
552
-     }
553
- });
554
- ```
555
-
556
- Since compression works on a per document level, compression efficiency is reduced. This is currently necessary
557
- to allow fully random access to single documents without having to read a large block first.
558
- If available, use a dictionary for the compression library and fill it with common words that describe
559
- your event/document schema and the following terms:
560
-
561
- - "metadata":{"commitId":
562
- - ,"committedAt":
563
- - ,"commitVersion":
564
- - ,"commitSize":
565
- - ,"streamVersion":
566
-
567
- ### Security
568
-
569
- When specifying a matcher function for streams/indexes, those matcher functions will be serialized into the index
570
- file and `eval`'d on later loading, so that you do not have to specify the matcher again when reopening.
571
- To prevent an attacker from executing arbitrary code in your application by altering an index
572
- file, the matcher function gets fingerprinted with an HMAC.
573
- This HMAC is calculated with a secret that you should specify with the `hmacSecret` option of the storage
574
- configuration.
575
-
576
- Currently the `hmacSecret` is an optional parameter defaulting to an empty string, which is insecure, so always
577
- specify your own unique random secret for this in production.
578
-
579
- Alternatively, you should always explicitly specify your matchers when opening an existing index, since that will
580
- check that the specified matcher matches the one in the index file.
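To illustrate the idea behind the fingerprint, the following sketch generates a random secret with Node's built-in `crypto` module and shows how an HMAC detects a modified matcher. The helpers `fingerprint` and `isUntampered` are hypothetical and only demonstrate the principle; they are not the library's internal implementation, and the choice of SHA-256 is an assumption.

```javascript
const crypto = require('crypto');

// Generate the secret once and keep it stable (e.g. in an environment variable):
// a changed secret can no longer verify fingerprints created with the old one.
const hmacSecret = crypto.randomBytes(32).toString('hex');

// Hypothetical helper: HMAC over the matcher's source code.
function fingerprint(matcherSource, secret) {
    return crypto.createHmac('sha256', secret).update(matcherSource).digest('hex');
}

// Hypothetical helper: constant-time comparison against the stored fingerprint.
function isUntampered(matcherSource, storedFingerprint, secret) {
    const expected = Buffer.from(fingerprint(matcherSource, secret), 'hex');
    const actual = Buffer.from(storedFingerprint, 'hex');
    return expected.length === actual.length && crypto.timingSafeEqual(expected, actual);
}

const matcher = (event) => event.stream === 'orders';
const stored = fingerprint(matcher.toString(), hmacSecret);

console.log(isUntampered(matcher.toString(), stored, hmacSecret));   // unchanged matcher
console.log(isUntampered('() => process.exit(1)', stored, hmacSecret)); // altered matcher

// Assumed usage, following the option name given in the text:
// const eventstore = new EventStore('my-event-store', { storageConfig: { hmacSecret } });
```

With an empty secret anyone who can edit the index file can also recompute a valid fingerprint, which is why the default is considered insecure.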