redstream 0.0.1

@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+   metadata.gz: 0e1ddc2700836c469d1ca61069e3416c21e657e05725b92e75969aa8110768e3
+   data.tar.gz: c8565f3754b3fd4f66823d6d7035e814f73e27abdb15936ecd906f5f07dd8643
+ SHA512:
+   metadata.gz: edd496df8d06b98b9318b9796f400e2c0870edfc84c3aa7f9c7946dbe6cf91c5a8c0ab32425d627bc20c585389eab92ed1b290e57e0df856e8995547d8a9b7c6
+   data.tar.gz: 4893d2197f427479e4df0821ca29a23ee98a604fa73680f955da8d2c71cbdb192d006c476dc3bd6c03c719da327c3d9b6f207842082a64133f0fe2383771aef5
@@ -0,0 +1,14 @@
+ /.bundle/
+ /.yardoc
+ /Gemfile.lock
+ /_yardoc/
+ /coverage/
+ /doc/
+ /pkg/
+ /spec/reports/
+ /tmp/
+ *.bundle
+ *.so
+ *.o
+ *.a
+ mkmf.log
@@ -0,0 +1,10 @@
+ sudo: false
+ language: ruby
+ rvm:
+   - ruby-head
+ before_install:
+   - docker-compose up -d
+   - sleep 10
+ install:
+   - travis_retry bundle install
+ script: rspec
data/Gemfile ADDED
@@ -0,0 +1,5 @@
+
+ source "https://rubygems.org"
+
+ gemspec
+
@@ -0,0 +1,22 @@
+ Copyright (c) 2014 Benjamin Vetter
+
+ MIT License
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,253 @@
+
+ # Redstream
+
+ **Using redis streams to keep your primary database in sync with secondary
+ datastores (e.g. elasticsearch).**
+
+ [![Build Status](https://secure.travis-ci.org/mrkamel/redstream.png?branch=master)](http://travis-ci.org/mrkamel/redstream)
+
+ ## Installation
+
+ First, install redis. Then, add this line to your application's Gemfile:
+
+ ```ruby
+ gem 'redstream'
+ ```
+
+ And then execute:
+
+     $ bundle
+
+ Or install it yourself as:
+
+     $ gem install redstream
+
+ ## Reference Docs
+
+ The reference docs can be found at
+ [https://www.rubydoc.info/github/mrkamel/redstream/master](https://www.rubydoc.info/github/mrkamel/redstream/master).
+
+ ## Usage
+
+ Include `Redstream::Model` in your model and add a call to
+ `redstream_callbacks`.
+
+ ```ruby
+ class MyModel < ActiveRecord::Base
+   include Redstream::Model
+
+   # ...
+
+   redstream_callbacks
+
+   # ...
+ end
+ ```
+
+ `redstream_callbacks` adds `after_save`, `after_touch`, `after_destroy` and,
+ most importantly, `after_commit` callbacks which write messages, containing the
+ record id, to a redis stream. A background worker can then fetch those messages
+ and update secondary datastores.
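+
+ For illustration, each message passed to a consumer block carries the record
+ id in its payload (a minimal sketch; the consumer name "inspector" is just a
+ hypothetical example):
+
+ ```ruby
+ Redstream::Consumer.new(stream_name: Product.redstream_name, name: "inspector").run do |messages|
+   messages.each do |message|
+     message.payload["id"] # => the id of the saved/touched/destroyed record
+   end
+ end
+ ```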
+
+ In a background process, you need to run a `Redstream::Consumer`, a
+ `Redstream::Delayer` and a `Redstream::Trimmer`:
+
+ ```ruby
+ Redstream::Consumer.new(stream_name: Product.redstream_name, name: "consumer").run do |messages|
+   # Update secondary datastore
+ end
+
+ # ...
+
+ Redstream::Delayer.new(stream_name: Product.redstream_name, delay: 5.minutes).run
+
+ # ...
+
+ trimmer = Redstream::Trimmer.new(
+   stream_name: Product.redstream_name,
+   consumer_names: ["indexer", "cacher"],
+   interval: 30
+ )
+
+ trimmer.run
+ ```
+
+ All of them block, so you should run each in a thread of its own. Since none
+ of them requires a graceful shutdown, this can be as simple as:
+
+ ```ruby
+ Thread.new do
+   Redstream::Consumer.new(stream_name: "...", name: "...").run do |messages|
+     # ...
+   end
+ end
+ ```
85
+
86
+ More concretely, `after_save`, `after_touch` and `after_destroy` only write
87
+ "delay" messages to an additional redis stream. Delay message are like any
88
+ other messages, but they get processed by a `Redstream::Delayer` and the
89
+ `Delayer`will wait for some (configurable) delay/time before processing them.
90
+ As the `Delayer` is neccessary to fix inconsistencies, the delay must be at
91
+ least as long as your maxiumum database transaction time. Contrary,
92
+ `after_commit` writes messages to a redis stream from which the messages can
93
+ be fetched immediately to keep the secondary datastores updated in
94
+ near-realtime. The reasoning of all this is simple: usually, i.e. by using only
95
+ one way to update secondary datastores, namely `after_save` or `after_commit`,
96
+ any errors occurring in between `after_save` and `after_commit` result in
97
+ inconsistencies between your primary and secondary datastore. By using these
98
+ kinds of "delay" messages triggered by `after_save` and fetched after e.g. 5
99
+ minutes, errors occurring in between `after_save` and `after_commit` can be
100
+ fixed when the delay message get processed.
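+
+ To make the failure mode concrete, here is the timeline the delay stream
+ protects against (an illustrative sketch of the reasoning above, not actual
+ library output):
+
+ ```ruby
+ # t0: after_save runs     -> delay message written to the delay stream
+ # t1: the process dies before after_commit runs
+ #     -> no message reaches the main stream, so the secondary datastore
+ #        may now be out of sync
+ # t2: t0 + delay elapses  -> the Delayer processes the pending delay message
+ # t3: a consumer re-syncs the record, fixing the inconsistency
+ ```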
101
+
102
+ Any messages are fetched in batches, such that e.g. elasticsearch can be
103
+ updated using its bulk API. For instance, depending on which elasticsearch ruby
104
+ client you are using, the reindexing code regarding elasticsearch will look
105
+ similar to:
106
+
107
+ ```ruby
108
+ Thread.new do
109
+ Redstream::Consumer.new(stream_name: Product.redstream_name, name: "indexer").run do |messages|
110
+ ids = messages.map { |message| message.payload["id"] }
111
+
112
+ ProductIndex.import Product.where(id: ids)
113
+ end
114
+ end
115
+
116
+ Thread.new do
117
+ Redstream::Delayer.new(stream_name: Product.redstream_name, delay: 5.minutes).run
118
+ end
119
+
120
+ Thread.new do
121
+ RedStream::Trimmer.new(stream_name: Product.redstream_name, consumer_names: ["indexer"], interval: 30).run
122
+ end
123
+ ```
124
+
125
+ You should run a consumer per `(stream_name, name)` tuple on multiple hosts for
126
+ high availability. They'll use a redis based locking mechanism to ensure that
127
+ only one consumer is consuming messages per tuple while the others are
128
+ hot-standbys, i.e. they'll take over in case the currently active instance
129
+ dies. The same stands for delayers and trimmers.
130
+
131
+ Please note: if you have multiple kinds of consumers for a single model/topic,
132
+ then you must use distinct names. Assume you have an indexer, which updates a
133
+ search index for a model and a cacher, which updates a cache store for a model:
134
+
135
+ ```ruby
136
+ Redstream::Consumer.new(stream_name: Product.redstream_name, name: "indexer").run do |messages|
137
+ # ...
138
+ end
139
+
140
+ Redstream::Consumer.new(stream_name: Product.redstream_name, name: "cacher").run do |messages|
141
+ # ...
142
+ end
143
+ ```
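+
+ Offsets are also tracked per consumer name. For illustration, you can inspect
+ how far a consumer has committed via the top-level `Redstream.max_consumer_id`
+ helper:
+
+ ```ruby
+ # Returns the id of the last message committed by the "indexer" consumer,
+ # or nil if it hasn't committed anything yet.
+ Redstream.max_consumer_id(stream_name: Product.redstream_name, consumer_name: "indexer")
+ ```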
+
+ ## Consumer, Delayer, Trimmer, Producer
+
+ A `Consumer` fetches messages that have been added to a redis stream via
+ `after_commit` or by a `Delayer`, i.e. messages that are available for
+ immediate retrieval/reindexing/syncing.
+
+ ```ruby
+ Redstream::Consumer.new(stream_name: Product.redstream_name, name: "indexer").run do |messages|
+   ids = messages.map { |message| message.payload["id"] }
+
+   ProductIndex.import Product.where(id: ids)
+ end
+ ```
+
+ A `Delayer` fetches messages that have been added to a second redis stream via
+ `after_save`, `after_touch` and `after_destroy`, to be retrieved after a
+ certain configurable amount of time (usually 5 minutes) to fix
+ inconsistencies. This amount of time must be at least as long as your maximum
+ database transaction time.
+
+ ```ruby
+ Redstream::Delayer.new(stream_name: Product.redstream_name, delay: 5.minutes).run
+ ```
+
+ A `Trimmer` is responsible for eventually removing messages from redis
+ streams. Without a `Trimmer`, messages will fill up your redis server until
+ redis finally crashes due to out-of-memory errors. To be able to trim a
+ stream, you must pass an array containing all consumer names reading from the
+ respective stream. The `Trimmer` then continuously checks how far each
+ consumer has already processed the stream and trims the stream up to the
+ committed minimum. If there is nothing to trim, the `Trimmer` sleeps for the
+ specified `interval`.
+
+ ```ruby
+ Redstream::Trimmer.new(stream_name: Product.redstream_name, consumer_names: ["indexer"], interval: 30).run
+ ```
+
+ A `Producer` is what actually adds messages to the redis streams, and you can
+ pass a concrete `Producer` instance via `redstream_callbacks`:
+
+ ```ruby
+ class Product < ActiveRecord::Base
+   include Redstream::Model
+
+   # ...
+
+   redstream_callbacks producer: Redstream::Producer.new("...")
+
+   # ...
+ end
+ ```
+
+ As you might recognize, `Redstream::Model` is only able to send messages to
+ redis streams for model lifecycle callbacks. This is, however, not the case
+ for `#update_all`:
+
+ ```ruby
+ Product.where(on_stock: true).update_all(featured: true)
+ ```
+
+ To capture those updates as well, you need to change:
+
+ ```ruby
+ Product.where(on_stock: true).update_all(featured: true)
+ ```
+
+ to
+
+ ```ruby
+ RedstreamProducer = Redstream::Producer.new
+
+ Product.where(on_stock: true).find_in_batches do |products|
+   RedstreamProducer.bulk products do
+     Product.where(id: products.map(&:id)).update_all(featured: true)
+   end
+ end
+ ```
+
+ The `Producer` will write a message for every matched record to the delay
+ stream before `update_all` is called, and will write another message for every
+ record to the main stream after `update_all` is called - just like it is done
+ within the model lifecycle callbacks.
+
+ The `#bulk` method must ensure that the same set of records is used for the
+ delay messages and the instant messages. Thus, it is best to directly pass an
+ array of records to `Redstream::Producer#bulk`, as shown above. If you pass
+ an `ActiveRecord::Relation`, the `#bulk` method will convert it to an array,
+ i.e. load the whole result set into memory.
+
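+ For illustration, here is the same update written with a plain array (a
+ minimal sketch; `to_a` simply loads the relation up front):
+
+ ```ruby
+ # Passing a concrete array guarantees that the delay messages and the
+ # instant messages cover exactly the same records.
+ products = Product.where(on_stock: true).to_a
+
+ RedstreamProducer.bulk products do
+   Product.where(id: products.map(&:id)).update_all(featured: true)
+ end
+ ```
+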
+ ## Namespacing
+
+ In case you are using a shared redis, where multiple applications read/write
+ from the same redis server using Redstream, key conflicts could occur. To
+ avoid that, you want to use namespacing:
+
+ ```ruby
+ Redstream.namespace = 'my_app'
+ ```
+
+ such that every application will have its own namespaced Redstream keys.
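+
+ For illustration, with a namespace set, the low level keys generated by the
+ (private) `Redstream.stream_key_name` helper are prefixed accordingly:
+
+ ```ruby
+ Redstream.namespace = 'my_app'
+
+ Redstream.stream_key_name("products") # => "my_app:redstream:stream:products"
+ ```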
+
+ ## Contributing
+
+ Bug reports and pull requests are welcome on GitHub at https://github.com/mrkamel/redstream
+
+ ## License
+
+ The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
+
@@ -0,0 +1,9 @@
+ require "bundler/gem_tasks"
+ require "rake/testtask"
+
+ Rake::TestTask.new(:test) do |t|
+   t.libs << "lib"
+   t.pattern = "test/**/*_test.rb"
+   t.verbose = true
+ end
+
@@ -0,0 +1,6 @@
+ version: '2'
+ services:
+   redis:
+     image: redis:5.0
+     ports:
+       - 127.0.0.1:6379:6379
@@ -0,0 +1,134 @@
+
+ require "active_support/inflector"
+ require "connection_pool"
+ require "redis"
+ require "json"
+ require "thread"
+ require "set"
+
+ require "redstream/version"
+ require "redstream/lock"
+ require "redstream/message"
+ require "redstream/consumer"
+ require "redstream/producer"
+ require "redstream/delayer"
+ require "redstream/model"
+ require "redstream/trimmer"
+
+ module Redstream
+   # Redstream uses the connection_pool gem to pool redis connections. In case
+   # you have a distributed redis setup (sentinel/cluster) or the default pool
+   # size doesn't match your requirements, then you must specify the connection
+   # pool. A connection pool is necessary, because redstream is using blocking
+   # commands. Please note, redis connections are somewhat cheap, so you are
+   # better off making the pool large enough than running into bottlenecks.
+   #
+   # @example
+   #   Redstream.connection_pool = ConnectionPool.new(size: 50) do
+   #     Redis.new("...")
+   #   end
+
+   def self.connection_pool=(connection_pool)
+     @connection_pool = connection_pool
+   end
+
+   # Returns the connection pool instance, creating a default connection pool
+   # in case no pool has been set yet.
+   #
+   # @return [ConnectionPool] The connection pool
+
+   def self.connection_pool
+     @connection_pool ||= ConnectionPool.new { Redis.new }
+   end
+
+   # You can specify a namespace to use for redis keys. This is useful in case
+   # you are using a shared redis.
+   #
+   # @example
+   #   Redstream.namespace = 'my_app'
+
+   def self.namespace=(namespace)
+     @namespace = namespace
+   end
+
+   # Returns the previously set namespace for redis keys to be used by
+   # Redstream.
+
+   def self.namespace
+     @namespace
+   end
+
+   # Returns the max id of the specified stream, i.e. the id of the
+   # last/newest message added. Returns nil for empty streams.
+   #
+   # @param stream_name [String] The stream name
+   # @return [String, nil] The id of a stream's newest message, or nil
+
+   def self.max_stream_id(stream_name)
+     connection_pool.with do |redis|
+       message = redis.xrevrange(stream_key_name(stream_name), "+", "-", count: 1).first
+
+       return unless message
+
+       message[0]
+     end
+   end
+
+   # Returns the max committed id, i.e. the consumer's offset, for the
+   # specified consumer name.
+   #
+   # @param stream_name [String] the stream name
+   # @param consumer_name [String] the consumer name
+   #
+   # @return [String, nil] The max committed offset, or nil
+
+   def self.max_consumer_id(stream_name:, consumer_name:)
+     connection_pool.with do |redis|
+       redis.get offset_key_name(stream_name: stream_name, consumer_name: consumer_name)
+     end
+   end
+
+   # @api private
+   #
+   # Generates the low level redis stream key name.
+   #
+   # @param stream_name A high level stream name
+   # @return [String] A low level redis stream key name
+
+   def self.stream_key_name(stream_name)
+     "#{base_key_name}:stream:#{stream_name}"
+   end
+
+   # @api private
+   #
+   # Generates the redis key name used for storing a consumer's current offset,
+   # i.e. the maximum id successfully processed.
+   #
+   # @param stream_name A high level stream name
+   # @param consumer_name A high level consumer name
+   # @return [String] A redis key name for storing a stream's current offset
+
+   def self.offset_key_name(stream_name:, consumer_name:)
+     "#{base_key_name}:offset:#{stream_name}:#{consumer_name}"
+   end
+
+   # @api private
+   #
+   # Generates the redis key name used for locking.
+   #
+   # @param name A high level name for the lock
+   # @return [String] A redis key name used for locking
+
+   def self.lock_key_name(name)
+     "#{base_key_name}:lock:#{name}"
+   end
+
+   # @api private
+   #
+   # Returns the full namespace prefix for redis keys.
+
+   def self.base_key_name
+     [namespace, "redstream"].compact.join(":")
+   end
+ end
+