redstream 0.0.1
- checksums.yaml +7 -0
- data/.gitignore +14 -0
- data/.travis.yml +10 -0
- data/Gemfile +5 -0
- data/LICENSE.txt +22 -0
- data/README.md +253 -0
- data/Rakefile +9 -0
- data/docker-compose.yml +6 -0
- data/lib/redstream.rb +134 -0
- data/lib/redstream/consumer.rb +115 -0
- data/lib/redstream/delayer.rb +100 -0
- data/lib/redstream/lock.rb +80 -0
- data/lib/redstream/message.rb +52 -0
- data/lib/redstream/model.rb +57 -0
- data/lib/redstream/producer.rb +145 -0
- data/lib/redstream/trimmer.rb +91 -0
- data/lib/redstream/version.rb +5 -0
- data/redstream.gemspec +38 -0
- data/spec/redstream/consumer_spec.rb +90 -0
- data/spec/redstream/delayer_spec.rb +53 -0
- data/spec/redstream/lock_spec.rb +68 -0
- data/spec/redstream/model_spec.rb +57 -0
- data/spec/redstream/producer_spec.rb +79 -0
- data/spec/redstream/trimmer_spec.rb +32 -0
- data/spec/redstream_spec.rb +117 -0
- data/spec/spec_helper.rb +66 -0
- metadata +289 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
---
SHA256:
  metadata.gz: 0e1ddc2700836c469d1ca61069e3416c21e657e05725b92e75969aa8110768e3
  data.tar.gz: c8565f3754b3fd4f66823d6d7035e814f73e27abdb15936ecd906f5f07dd8643
SHA512:
  metadata.gz: edd496df8d06b98b9318b9796f400e2c0870edfc84c3aa7f9c7946dbe6cf91c5a8c0ab32425d627bc20c585389eab92ed1b290e57e0df856e8995547d8a9b7c6
  data.tar.gz: 4893d2197f427479e4df0821ca29a23ee98a604fa73680f955da8d2c71cbdb192d006c476dc3bd6c03c719da327c3d9b6f207842082a64133f0fe2383771aef5
data/.gitignore
ADDED
data/.travis.yml
ADDED
data/LICENSE.txt
ADDED
@@ -0,0 +1,22 @@
Copyright (c) 2014 Benjamin Vetter

MIT License

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
data/README.md
ADDED
@@ -0,0 +1,253 @@
# Redstream

**Using redis streams to keep your primary database in sync with secondary
datastores (e.g. elasticsearch).**

[![Build Status](https://secure.travis-ci.org/mrkamel/redstream.png?branch=master)](http://travis-ci.org/mrkamel/redstream)

## Installation

First, install redis. Then, add this line to your application's Gemfile:

```ruby
gem 'redstream'
```

And then execute:

    $ bundle

Or install it yourself as:

    $ gem install redstream

## Reference Docs

The reference docs can be found at
[https://www.rubydoc.info/github/mrkamel/redstream/master](https://www.rubydoc.info/github/mrkamel/redstream/master).

## Usage

Include `Redstream::Model` in your model and add a call to
`redstream_callbacks`.

```ruby
class MyModel < ActiveRecord::Base
  include Redstream::Model

  # ...

  redstream_callbacks

  # ...
end
```

`redstream_callbacks` adds `after_save`, `after_touch`, `after_destroy` and,
most importantly, `after_commit` callbacks which write messages containing the
record id to a redis stream. A background worker can then fetch those messages
and update secondary datastores.

In a background process, you need to run a `Redstream::Consumer`, a
`Redstream::Delayer` and a `Redstream::Trimmer`:

```ruby
Redstream::Consumer.new(stream_name: Product.redstream_name, name: "consumer").run do |messages|
  # Update secondary datastore
end

# ...

Redstream::Delayer.new(stream_name: Product.redstream_name, delay: 5.minutes).run

# ...

trimmer = Redstream::Trimmer.new(
  stream_name: Product.redstream_name,
  consumer_names: ["indexer", "cacher"],
  interval: 30
)

trimmer.run
```

As all of them are blocking, you should run each of them in its own thread.
Since none of them needs a graceful shutdown, this can be as simple as:

```ruby
Thread.new do
  Redstream::Consumer.new("...").run do |messages|
    # ...
  end
end
```

More concretely, `after_save`, `after_touch` and `after_destroy` only write
"delay" messages to an additional redis stream. Delay messages are like any
other messages, but they get processed by a `Redstream::Delayer`, and the
`Delayer` will wait for some (configurable) delay/time before processing them.
As the `Delayer` is necessary to fix inconsistencies, the delay must be at
least as long as your maximum database transaction time. In contrast,
`after_commit` writes messages to a redis stream from which the messages can
be fetched immediately to keep the secondary datastores updated in
near-realtime. The reasoning behind all this is simple: if you use only one
way to update secondary datastores, namely `after_save` or `after_commit`
alone, any errors occurring in between `after_save` and `after_commit` result
in inconsistencies between your primary and secondary datastore. By using
these kinds of "delay" messages triggered by `after_save` and fetched after
e.g. 5 minutes, errors occurring in between `after_save` and `after_commit`
can be fixed when the delayed messages get processed.
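The failure window this closes can be illustrated with a toy simulation, where plain arrays stand in for the two redis streams (this is a sketch, not Redstream's API):

```ruby
# Toy illustration of the delay-stream safety net. Plain arrays stand in
# for the two redis streams; this is not Redstream's actual API.
delay_stream = []
main_stream  = []

# after_save, inside the DB transaction: write a delay message.
delay_stream << { "id" => 1 }

# Simulate a crash between after_save and after_commit: the instant
# message is never written to the main stream.
# main_stream << { "id" => 1 }

# After the configured delay, the Delayer re-publishes the delay message
# to the main stream, so consumers still pick up the change.
main_stream.concat(delay_stream)

puts main_stream.first["id"] # prints 1
```

Without the delay stream, the crash above would leave the secondary datastore permanently out of sync; with it, the update merely arrives late.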
Messages are fetched in batches, such that e.g. elasticsearch can be
updated using its bulk API. For instance, depending on which elasticsearch ruby
client you are using, the reindexing code for elasticsearch will look
similar to:

```ruby
Thread.new do
  Redstream::Consumer.new(stream_name: Product.redstream_name, name: "indexer").run do |messages|
    ids = messages.map { |message| message.payload["id"] }

    ProductIndex.import Product.where(id: ids)
  end
end

Thread.new do
  Redstream::Delayer.new(stream_name: Product.redstream_name, delay: 5.minutes).run
end

Thread.new do
  Redstream::Trimmer.new(stream_name: Product.redstream_name, consumer_names: ["indexer"], interval: 30).run
end
```

You should run a consumer per `(stream_name, name)` tuple on multiple hosts for
high availability. They'll use a redis based locking mechanism to ensure that
only one consumer is consuming messages per tuple while the others are
hot-standbys, i.e. they'll take over in case the currently active instance
dies. The same applies to delayers and trimmers.

Please note: if you have multiple kinds of consumers for a single model/topic,
you must use distinct names. Assume you have an indexer, which updates a
search index for a model, and a cacher, which updates a cache store for a
model:

```ruby
Redstream::Consumer.new(stream_name: Product.redstream_name, name: "indexer").run do |messages|
  # ...
end

Redstream::Consumer.new(stream_name: Product.redstream_name, name: "cacher").run do |messages|
  # ...
end
```

## Consumer, Delayer, Trimmer, Producer

A `Consumer` fetches messages that have been added to a redis stream via
`after_commit` or by a `Delayer`, i.e. messages that are available for
immediate retrieval/reindexing/syncing.

```ruby
Redstream::Consumer.new(stream_name: Product.redstream_name, name: "indexer").run do |messages|
  ids = messages.map { |message| message.payload["id"] }

  ProductIndex.import Product.where(id: ids)
end
```

A `Delayer` fetches messages that have been added to a second redis stream via
`after_save`, `after_touch` and `after_destroy`, to be retrieved after a
certain configurable amount of time (usually 5 minutes) to fix
inconsistencies. That amount of time must be at least as long as your maximum
database transaction time.

```ruby
Redstream::Delayer.new(stream_name: Product.redstream_name, delay: 5.minutes).run
```

A `Trimmer` is responsible for finally removing messages from redis streams.
Without a `Trimmer`, messages will fill up your redis server until it crashes
with out-of-memory errors. To be able to trim a stream, you must pass an array
containing all consumer names reading from the respective stream. The
`Trimmer` then continuously checks how far each consumer has already processed
the stream and trims the stream up to the committed minimum. If there is
nothing to trim, the `Trimmer` sleeps for the specified `interval`.

```ruby
Redstream::Trimmer.new(stream_name: Product.redstream_name, consumer_names: ["indexer"], interval: 30).run
```

A `Producer` adds messages to the concrete redis streams, and you can pass a
concrete `Producer` instance via `redstream_callbacks`:

```ruby
class Product < ActiveRecord::Base
  include Redstream::Model

  # ...

  redstream_callbacks producer: Redstream::Producer.new("...")

  # ...
end
```

As you might recognize, `Redstream::Model` can of course only send messages to
redis streams for model lifecycle callbacks. This is not the case for
`#update_all`:

```ruby
Product.where(on_stock: true).update_all(featured: true)
```

To capture those updates as well, you need to change:

```ruby
Product.where(on_stock: true).update_all(featured: true)
```

to

```ruby
RedstreamProducer = Redstream::Producer.new

Product.where(on_stock: true).find_in_batches do |products|
  RedstreamProducer.bulk products do
    Product.where(id: products.map(&:id)).update_all(featured: true)
  end
end
```

The `Producer` will write a message for every matched record into the delay
stream before `update_all` is called, and will write another message for every
record to the main stream after `update_all` is called - just like it is done
within the model lifecycle callbacks.

The `#bulk` method must ensure that the same set of records is used for the
delay messages and the instant messages. Thus, you'd better directly pass an
array of records to `Redstream::Producer#bulk`, as shown above. If you pass an
`ActiveRecord::Relation`, the `#bulk` method will convert it to an array, i.e.
load the whole result set into memory.

## Namespacing

In case you are using a shared redis, where multiple applications read/write
from the same redis server using Redstream, key conflicts could occur.
To avoid that, you want to use namespacing:

```ruby
Redstream.namespace = 'my_app'
```

such that every application will have its own namespaced Redstream keys.
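The resulting key layout can be sketched by mirroring the key helpers from `lib/redstream.rb` (the key formats come from that file; the stream name "products" and consumer name "indexer" are made-up examples):

```ruby
# Mirrors the key naming in lib/redstream.rb; "products" and "indexer"
# are example names, not part of the API.
namespace = "my_app"
base = [namespace, "redstream"].compact.join(":")

stream_key = "#{base}:stream:products"
offset_key = "#{base}:offset:products:indexer"
lock_key   = "#{base}:lock:products"

puts stream_key # prints my_app:redstream:stream:products
puts offset_key # prints my_app:redstream:offset:products:indexer
puts lock_key   # prints my_app:redstream:lock:products
```

With no namespace set, `base` collapses to just `"redstream"`, which is why two apps sharing one redis server need distinct namespaces.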
## Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/mrkamel/redstream

## License

The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
data/Rakefile
ADDED
data/docker-compose.yml
ADDED
data/lib/redstream.rb
ADDED
@@ -0,0 +1,134 @@
require "active_support/inflector"
require "connection_pool"
require "redis"
require "json"
require "thread"
require "set"

require "redstream/version"
require "redstream/lock"
require "redstream/message"
require "redstream/consumer"
require "redstream/producer"
require "redstream/delayer"
require "redstream/model"
require "redstream/trimmer"

module Redstream
  # Redstream uses the connection_pool gem to pool redis connections. In case
  # you have a distributed redis setup (sentinel/cluster) or the default pool
  # size doesn't match your requirements, then you must specify the connection
  # pool. A connection pool is necessary, because redstream is using blocking
  # commands. Please note, redis connections are somewhat cheap, so you better
  # specify the pool size to be large enough instead of running into
  # bottlenecks.
  #
  # @example
  #   Redstream.connection_pool = ConnectionPool.new(size: 50) do
  #     Redis.new("...")
  #   end

  def self.connection_pool=(connection_pool)
    @connection_pool = connection_pool
  end

  # Returns the connection pool instance or sets and creates a new connection
  # pool in case no pool is yet created.
  #
  # @return [ConnectionPool] The connection pool

  def self.connection_pool
    @connection_pool ||= ConnectionPool.new { Redis.new }
  end

  # You can specify a namespace to use for redis keys. This is useful in case
  # you are using a shared redis.
  #
  # @example
  #   Redstream.namespace = 'my_app'

  def self.namespace=(namespace)
    @namespace = namespace
  end

  # Returns the previously set namespace for redis keys to be used by
  # Redstream.

  def self.namespace
    @namespace
  end

  # Returns the max id of the specified stream, i.e. the id of the
  # last/newest message added. Returns nil for empty streams.
  #
  # @param stream_name [String] The stream name
  # @return [String, nil] The id of a stream's newest message, or nil

  def self.max_stream_id(stream_name)
    connection_pool.with do |redis|
      message = redis.xrevrange(stream_key_name(stream_name), "+", "-", count: 1).first

      return unless message

      message[0]
    end
  end

  # Returns the max committed id, i.e. the consumer's offset, for the
  # specified consumer name.
  #
  # @param stream_name [String] the stream name
  # @param consumer_name [String] the consumer name
  #
  # @return [String, nil] The max committed offset, or nil

  def self.max_consumer_id(stream_name:, consumer_name:)
    connection_pool.with do |redis|
      redis.get offset_key_name(stream_name: stream_name, consumer_name: consumer_name)
    end
  end

  # @api private
  #
  # Generates the low level redis stream key name.
  #
  # @param stream_name A high level stream name
  # @return [String] A low level redis stream key name

  def self.stream_key_name(stream_name)
    "#{base_key_name}:stream:#{stream_name}"
  end

  # @api private
  #
  # Generates the redis key name used for storing a consumer's current
  # offset, i.e. the maximum id successfully processed.
  #
  # @param consumer_name A high level consumer name
  # @return [String] A redis key name for storing a stream's current offset

  def self.offset_key_name(stream_name:, consumer_name:)
    "#{base_key_name}:offset:#{stream_name}:#{consumer_name}"
  end

  # @api private
  #
  # Generates the redis key name used for locking.
  #
  # @param name A high level name for the lock
  # @return [String] A redis key name used for locking

  def self.lock_key_name(name)
    "#{base_key_name}:lock:#{name}"
  end

  # @api private
  #
  # Returns the full namespace for redis keys.

  def self.base_key_name
    [namespace, "redstream"].compact.join(":")
  end
end