redis-bitops 0.2
- checksums.yaml +7 -0
- data/MIT-LICENSE +20 -0
- data/README.md +193 -0
- data/lib/redis/bitops.rb +26 -0
- data/lib/redis/bitops/bitmap.rb +107 -0
- data/lib/redis/bitops/configuration.rb +38 -0
- data/lib/redis/bitops/queries/binary_operator.rb +71 -0
- data/lib/redis/bitops/queries/lazy_evaluation.rb +47 -0
- data/lib/redis/bitops/queries/materialization_helpers.rb +53 -0
- data/lib/redis/bitops/queries/tree_building_helpers.rb +46 -0
- data/lib/redis/bitops/queries/unary_operator.rb +48 -0
- data/lib/redis/bitops/sparse_bitmap.rb +125 -0
- data/spec/redis/bitops/bitmap_spec.rb +9 -0
- data/spec/redis/bitops/queries/binary_operator_spec.rb +24 -0
- data/spec/redis/bitops/queries/unary_operator_spec.rb +27 -0
- data/spec/redis/bitops/sparse_bitmap_spec.rb +99 -0
- data/spec/spec_helper.rb +16 -0
- data/spec/support/bitmap_examples.rb +313 -0
- metadata +173 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
|
|
1
|
+
---
|
2
|
+
SHA1:
|
3
|
+
metadata.gz: a567e3b9af870d0a1875760e5c02ea1562b1204b
|
4
|
+
data.tar.gz: 0d27e8d824da9cc252aa3058c5c0fcb90c1af965
|
5
|
+
SHA512:
|
6
|
+
metadata.gz: 9a127af0eda591bd37ad31378275c52ea1cd8d4280fe836520e67d4ae423947836316583edffb4f2a0a153ba67fe4cc7c75c741b98bea862761ba2b4f92f731c
|
7
|
+
data.tar.gz: 56f170a119fff911cc250597887c4e49b6d5c9a784817315d02b80fbd70db129ca39ceaeb2a3d999bf5bba51066d3df705184a9ec1971a12084d215565ad89c1
|
data/MIT-LICENSE
ADDED
@@ -0,0 +1,20 @@
|
|
1
|
+
Copyright 2014 Martin Bilski
|
2
|
+
|
3
|
+
Permission is hereby granted, free of charge, to any person obtaining
|
4
|
+
a copy of this software and associated documentation files (the
|
5
|
+
"Software"), to deal in the Software without restriction, including
|
6
|
+
without limitation the rights to use, copy, modify, merge, publish,
|
7
|
+
distribute, sublicense, and/or sell copies of the Software, and to
|
8
|
+
permit persons to whom the Software is furnished to do so, subject to
|
9
|
+
the following conditions:
|
10
|
+
|
11
|
+
The above copyright notice and this permission notice shall be
|
12
|
+
included in all copies or substantial portions of the Software.
|
13
|
+
|
14
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
15
|
+
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
16
|
+
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
17
|
+
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
|
18
|
+
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
|
19
|
+
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
|
20
|
+
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
data/README.md
ADDED
@@ -0,0 +1,193 @@
|
|
1
|
+
# Introduction
|
2
|
+
|
3
|
+
This gem makes it easier to do bit-wise operations on large Redis bitsets, usually called bitmaps, with a natural expression syntax. It also supports huge **sparse bitmaps** by storing data in multiple keys, called chunks, per bitmap.
|
4
|
+
|
5
|
+
The typical use is real-time web analytics where each bit in a bitmap/bitset corresponds to a user ([introductory article here](http://blog.getspool.com/2011/11/29/fast-easy-realtime-metrics-using-redis-bitmaps/)). This library isn't an analytics package, though; it's lower-level than that, and you can use it for anything.
|
6
|
+
|
7
|
+
**This library is under development and its interface might change.**
|
8
|
+
|
9
|
+
|
10
|
+
# Quick start/pitch
|
11
|
+
|
12
|
+
Why use the library?
|
13
|
+
|
14
|
+
require 'redis/bitops'
|
15
|
+
redis = Redis.new
|
16
|
+
|
17
|
+
b1 = redis.sparse_bitmap("b1")
|
18
|
+
b1[128_000_000] = true
|
19
|
+
b2 = redis.sparse_bitmap("b2")
|
20
|
+
b2[128_000_000] = true
|
21
|
+
result = b1 & b2
|
22
|
+
|
23
|
+
Memory usage: about 20kb because it uses a sparse bitmap implementation using chunks of data.
|
24
|
+
|
25
|
+
Let's go crazy with super-complicated expressions:
|
26
|
+
|
27
|
+
...
|
28
|
+
result = (b1 & ~b2) | b3 | (b4 & b5 & b6 & ~b7)
|
29
|
+
|
30
|
+
Imagine writing this expression using Redis#bitop!
|
31
|
+
|
32
|
+
|
33
|
+
# Installation
|
34
|
+
|
35
|
+
To install the gem:
|
36
|
+
|
37
|
+
gem install redis-bitops
|
38
|
+
|
39
|
+
To use it in your code:
|
40
|
+
|
41
|
+
require 'redis/bitops'
|
42
|
+
|
43
|
+
# Usage
|
44
|
+
|
45
|
+
Reference: [here](http://rdoc.info/github/bilus/redis-bitops/master/frames)
|
46
|
+
|
47
|
+
## Basic example
|
48
|
+
|
49
|
+
An example is often better than theory so here's one. Let's create a few bitmaps and set their individual bits; we'll use those bitmaps in the examples below:
|
50
|
+
|
51
|
+
redis = Redis.new
|
52
|
+
|
53
|
+
a = redis.bitmap("a")
|
54
|
+
b = redis.bitmap("b")
|
55
|
+
result = redis.bitmap("result")
|
56
|
+
|
57
|
+
b[0] = true; b[2] = true; b[7] = true # 10100001
|
58
|
+
a[0] = true; a[1] = true; a[7] = true # 11000001
|
59
|
+
|
60
|
+
So, now here's a very simple expression:
|
61
|
+
|
62
|
+
c = a & b
|
63
|
+
|
64
|
+
You may be surprised but the above statement does not query Redis at all! The expression is lazy-evaluated when you access the result:
|
65
|
+
|
66
|
+
puts c.bitcount # => 2
|
67
|
+
puts c[0] # => true
|
68
|
+
puts c[1] # => false
|
69
|
+
puts c[2] # => false
|
70
|
+
puts c[7] # => true
|
71
|
+
|
72
|
+
So, in the above example, the call to `c.bitcount` happens to be the first moment when Redis is queried. The result is stored under a temporary unique key.
|
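To make the lazy evaluation described above concrete, here is a minimal sketch in plain Ruby. It is not the gem's code: `LazyAnd` and its `evaluations` counter are hypothetical names, and plain Integers stand in for Redis bitmaps (so bit 0 is the least significant bit here). The point is only that building the expression does no work; the first read triggers one evaluation, which is then cached.

```ruby
# Hypothetical stand-in for a lazily evaluated "a & b" expression.
# Plain Integers play the role of Redis bitmaps in this sketch.
class LazyAnd
  attr_reader :evaluations

  def initialize(lhs, rhs)
    @lhs = lhs
    @rhs = rhs
    @evaluations = 0
  end

  # Counts the set bits, forcing evaluation first.
  def bitcount
    value.to_s(2).count("1")
  end

  # Reads bit 'pos' as a boolean, forcing evaluation first.
  def [](pos)
    value[pos] == 1
  end

  private

  # Computes the result on first use and caches it afterwards.
  def value
    @value ||= begin
      @evaluations += 1
      @lhs & @rhs
    end
  end
end

a = 0b10000011 # bits 0, 1 and 7 set
b = 0b10000101 # bits 0, 2 and 7 set
c = LazyAnd.new(a, b) # nothing has been computed at this point
```

Calling `c.bitcount` or `c[7]` afterwards performs the AND once; later reads reuse the cached value.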
73
|
+
|
74
|
+
puts c.root_key # => "redis:bitops:8eef38u9o09334"
|
75
|
+
|
76
|
+
Let's delete the temporary result:
|
77
|
+
|
78
|
+
c.delete!
|
79
|
+
|
80
|
+
If you want to store the result directly under a specific key:
|
81
|
+
|
82
|
+
result << c
|
83
|
+
|
84
|
+
Or, more adventurously, we can use the following more complex one-liner:
|
85
|
+
|
86
|
+
result << (~c & (a | b))
|
87
|
+
|
88
|
+
**Note:** Expressions are optimized by reducing the number of Redis commands and using as few temporary keys to hold intermediate values as possible. See below for details.
|
89
|
+
|
90
|
+
|
91
|
+
## Sparse bitmaps
|
92
|
+
|
93
|
+
### Usage
|
94
|
+
|
95
|
+
You don't have to do anything special, simply use `Redis#sparse_bitmap` instead of `Redis#bitmap`:
|
96
|
+
|
97
|
+
a = redis.sparse_bitmap("a")
|
98
|
+
b = redis.sparse_bitmap("b")
|
99
|
+
result = redis.sparse_bitmap("result")
|
100
|
+
|
101
|
+
b[0] = true; b[2] = true; b[7] = true # 10100001
|
102
|
+
a[0] = true; a[1] = true; a[7] = true # 11000001
|
103
|
+
|
104
|
+
c = a & b
|
105
|
+
|
106
|
+
result << c
|
107
|
+
|
108
|
+
or just:
|
109
|
+
|
110
|
+
result << (a & b)
|
111
|
+
|
112
|
+
You can specify the chunk size (in bytes).
|
113
|
+
|
114
|
+
Use the size consistently. Note that it cannot be re-adjusted for data already saved to Redis:
|
115
|
+
|
116
|
+
x = redis.sparse_bitmap("x", 1024 * 1024) # 1 MB per chunk.
|
117
|
+
x[0] = true
|
118
|
+
x[1000] = true
|
119
|
+
|
120
|
+
**Important:** Do not mix sparse bitmaps with regular ones and never mix sparse bitmaps with different chunk sizes in the same expressions.
|
121
|
+
|
122
|
+
### Rationale
|
123
|
+
|
124
|
+
If you want to store a lot of huge but sparse bitsets, with not many bits set, regular Redis bitmaps don't work very well: they waste a lot of space. In analytics, being able to store data about several million users is a reasonable requirement. A bitmap for 10 million users weighs over 1MB! Imagine storing hourly statistics and using up memory at a rate of over 720MB per month.
|
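The arithmetic behind those figures can be checked with a few lines of Ruby. This is only a back-of-the-envelope sketch: `dense_bitmap_bytes` is a made-up helper, and a 30-day month is assumed. It counts one bit per user and ignores Redis overhead, which is why it lands slightly above the README's rounded 1MB and 720MB numbers.

```ruby
# Size of a dense (regular) bitmap with one bit per user, in bytes.
def dense_bitmap_bytes(users)
  (users + 7) / 8 # round up to whole bytes
end

HOURS_PER_MONTH = 24 * 30 # assumption: a 30-day month

per_bitmap = dense_bitmap_bytes(10_000_000)
per_month  = per_bitmap * HOURS_PER_MONTH

puts per_bitmap # => 1250000   (about 1.25 MB per hourly bitmap)
puts per_month  # => 900000000 (about 900 MB per month)
```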
125
|
+
|
126
|
+
For, say, 100 million users it becomes outright prohibitive!
|
127
|
+
|
128
|
+
But even with a fairly popular website, I dare say, you don't often have one million users per hour :) This means that the majority of those bits are never set and a lot of space is wasted.
|
129
|
+
|
130
|
+
Enter sparse bitmaps. They divide each bitmap into chunks thus minimizing memory use (chunks' size can be configured, see Configuration below).
|
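One way to picture the chunking is as a mapping from a bit position to a chunk key plus an offset inside that chunk. The sketch below is an assumption about how such a mapping could look, not the gem's implementation: `locate` and the `root:chunk:N` key layout are invented for illustration, and the 32 KB chunk size mirrors the default set in `configuration.rb`.

```ruby
BYTES_PER_CHUNK = 32 * 1024 # assumed chunk size, matching the gem's default
BITS_PER_CHUNK  = BYTES_PER_CHUNK * 8

# Returns [chunk_key, offset_within_chunk] for bit 'pos'.
# The "root:chunk:N" key naming is hypothetical.
def locate(root_key, pos)
  chunk, offset = pos.divmod(BITS_PER_CHUNK)
  ["#{root_key}:chunk:#{chunk}", offset]
end

key, offset = locate("huge_bitmap", 128_000_000)
puts key    # => huge_bitmap:chunk:488
puts offset # => 73728
```

Because only the chunks that actually contain set bits need to exist as keys, a single high bit costs at most one chunk instead of a string spanning every lower position.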
131
|
+
|
132
|
+
Creating and using sparse bitmaps is identical to using regular bitmaps:
|
133
|
+
|
134
|
+
huge = redis.sparse_bitmap("huge_bitmap")
|
135
|
+
huge[128_000_000] = true
|
136
|
+
|
137
|
+
The only difference in the above example is that it will allocate two 32kb chunks as opposed to the roughly 16MB (128,000,000 bits / 8) that a regular bitmap (`Redis#bitmap`) would allocate. In addition, setting the bit is nearly instantaneous.
|
138
|
+
|
139
|
+
Compare:
|
140
|
+
|
141
|
+
puts Benchmark.measure {
|
142
|
+
sparse = redis.sparse_bitmap("huge_sparse_bitmap")
|
143
|
+
sparse[500_000_000] = true
|
144
|
+
}
|
145
|
+
|
146
|
+
which on my machine generates:
|
147
|
+
|
148
|
+
0.000000 0.000000 0.000000 ( 0.000366)
|
149
|
+
|
150
|
+
It uses just 23kb of memory, as opposed to 120MB (megabytes!) to store the bit using a regular Redis bitmap:
|
151
|
+
|
152
|
+
regular = redis.bitmap("huge_regular_bitmap")
|
153
|
+
regular[500_000_000] = true
|
154
|
+
|
155
|
+
## Configuration
|
156
|
+
|
157
|
+
Here's how to configure the gem:
|
158
|
+
|
159
|
+
Redis::Bitops.configure do |config|
|
160
|
+
config.default_bytes_per_chunk = 8192 # Eight kilobytes.
|
161
|
+
config.transaction_level = :bitmap # allowed values: :bitmap or :none.
|
162
|
+
end
|
163
|
+
|
164
|
+
# Implementation & efficiency
|
165
|
+
|
166
|
+
## Optimization phase
|
167
|
+
|
168
|
+
Prior to evaluation, the expression is optimized by combining operators into single BITOP commands and reusing temporary keys (required to store intermediate results) as much as possible.
|
169
|
+
|
170
|
+
This silly example:
|
171
|
+
|
172
|
+
result << (a & b & c | a | b)
|
173
|
+
|
174
|
+
translates into simply:
|
175
|
+
|
176
|
+
BITOP AND result a b c
|
177
|
+
BITOP OR result result a b
|
178
|
+
|
179
|
+
and doesn't create any temporary keys at all!
|
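The translation above can be sketched in plain Ruby without touching Redis. This is a simplified illustration, not the gem's optimizer: `Node`, `flatten`, `emit` and the `tmp:N` key names are hypothetical, only the n-ary operators are handled (no NOT), temporary keys are never deleted, and the case where the destination is also a source operand is ignored. It merges neighboring identical operators, then writes the first nested sub-expression straight into the destination so that a temporary key is needed only from the second one onwards.

```ruby
# An operator node; leaves are plain key names (Strings).
Node = Struct.new(:op, :args)

# Merges children that use the same operator as their parent,
# so ((a & b) & c) becomes a single three-operand AND.
def flatten(node)
  return node unless node.is_a?(Node)
  args = node.args.flat_map do |arg|
    child = flatten(arg)
    child.is_a?(Node) && child.op == node.op ? child.args : [child]
  end
  Node.new(node.op, args)
end

# Emits BITOP commands that store the value of 'node' under 'dest'.
# The first nested sub-expression reuses 'dest' as scratch space;
# later ones get hypothetical "tmp:N" keys.
def emit(node, dest, commands = [], temps = [])
  dest_free = true
  operands = node.args.map do |arg|
    next arg unless arg.is_a?(Node)
    if dest_free
      dest_free = false
      target = dest
    else
      target = "tmp:#{temps.size + 1}"
      temps << target
    end
    emit(arg, target, commands, temps)
    target
  end
  commands << "BITOP #{node.op.to_s.upcase} #{dest} #{operands.join(' ')}"
end

a, b, c = "a", "b", "c"
# Ruby parses (a & b & c | a | b) as ((((a & b) & c) | a) | b).
tree = Node.new(:or, [Node.new(:or, [Node.new(:and, [Node.new(:and, [a, b]), c]), a]), b])
puts emit(flatten(tree), "result")
# BITOP AND result a b c
# BITOP OR result result a b
```

An expression with two nested sub-expressions, such as `(a & b) | (c & d)`, would need one `tmp:1` key in this sketch, since the destination can hold only one intermediate value at a time.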
180
|
+
|
181
|
+
## Materialization phase
|
182
|
+
|
183
|
+
At this point, the calculations are carried out and the result is saved under the destination key. Note that, for sparse bitmaps, multiple keys may be created.
|
184
|
+
|
185
|
+
|
186
|
+
## Transaction levels
|
187
|
+
|
188
|
+
TBD
|
189
|
+
|
190
|
+
|
191
|
+
## Contributing/feedback
|
192
|
+
|
193
|
+
Please send in your suggestions to [gyamtso@gmail.com](mailto:gyamtso@gmail.com). Pull requests, issues, comments are more than welcome.
|
data/lib/redis/bitops.rb
ADDED
@@ -0,0 +1,26 @@
|
|
1
|
+
require 'redis'
|
2
|
+
require 'redis/bitops/queries/materialization_helpers'
|
3
|
+
require 'redis/bitops/queries/tree_building_helpers'
|
4
|
+
require 'redis/bitops/queries/lazy_evaluation'
|
5
|
+
require 'redis/bitops/queries/binary_operator'
|
6
|
+
require 'redis/bitops/queries/unary_operator'
|
7
|
+
require 'redis/bitops/bitmap'
|
8
|
+
require 'redis/bitops/sparse_bitmap'
|
9
|
+
|
10
|
+
require 'redis/bitops/configuration'
|
11
|
+
|
12
|
+
|
13
|
+
class Redis
|
14
|
+
|
15
|
+
# Creates a new bitmap.
|
16
|
+
#
|
17
|
+
def bitmap(key)
|
18
|
+
Bitops::Bitmap.new(key, self)
|
19
|
+
end
|
20
|
+
|
21
|
+
# Creates a new sparse bitmap storing data in n chunks to conserve memory.
|
22
|
+
#
|
23
|
+
def sparse_bitmap(key, bytes_per_chunk = nil)
|
24
|
+
Bitops::SparseBitmap.new(key, self, bytes_per_chunk)
|
25
|
+
end
|
26
|
+
end
|
data/lib/redis/bitops/bitmap.rb
ADDED
@@ -0,0 +1,107 @@
|
|
1
|
+
class Redis
|
2
|
+
module Bitops
|
3
|
+
|
4
|
+
# A regular Redis bitmap storing all its data under a single key.
|
5
|
+
#
|
6
|
+
# Note: When adding new public methods, revise the LazyEvaluation module.
|
7
|
+
#
|
8
|
+
class Bitmap
|
9
|
+
|
10
|
+
include Queries
|
11
|
+
include TreeBuildingHelpers # See for a list of supported operators.
|
12
|
+
|
13
|
+
# Creates a new regular Redis bitmap stored in 'redis' under 'root_key'.
|
14
|
+
#
|
15
|
+
def initialize(root_key, redis)
|
16
|
+
@redis = redis
|
17
|
+
@root_key = root_key
|
18
|
+
end
|
19
|
+
|
20
|
+
# Saves the result of the query in the bitmap.
|
21
|
+
#
|
22
|
+
def << (query)
|
23
|
+
query.evaluate(self)
|
24
|
+
end
|
25
|
+
|
26
|
+
# Reads bit at position 'pos' returning a boolean.
|
27
|
+
#
|
28
|
+
def [] (pos)
|
29
|
+
i2b(@redis.getbit(key(pos), offset(pos)))
|
30
|
+
end
|
31
|
+
|
32
|
+
# Sets bit at position 'pos' to 1 or 0 based on the boolean 'b'.
|
33
|
+
#
|
34
|
+
def []= (pos, b)
|
35
|
+
@redis.setbit(key(pos), offset(pos), b2i(b))
|
36
|
+
end
|
37
|
+
|
38
|
+
# Returns the number of set bits.
|
39
|
+
#
|
40
|
+
def bitcount
|
41
|
+
@redis.bitcount(@root_key)
|
42
|
+
end
|
43
|
+
|
44
|
+
# Deletes the bitmap and all its keys.
|
45
|
+
#
|
46
|
+
def delete!
|
47
|
+
@redis.del(@root_key)
|
48
|
+
end
|
49
|
+
|
50
|
+
# Redis BITOP operator 'op' (one of :and, :or, :xor or :not) on operands
|
51
|
+
# (bitmaps). The result is stored in 'result'.
|
52
|
+
#
|
53
|
+
def bitop(op, *operands, result)
|
54
|
+
@redis.bitop(op, result.root_key, self.root_key, *operands.map(&:root_key))
|
55
|
+
result
|
56
|
+
end
|
57
|
+
|
58
|
+
# The key the bitmap is stored under.
|
59
|
+
#
|
60
|
+
def root_key
|
61
|
+
@root_key
|
62
|
+
end
|
63
|
+
|
64
|
+
# Returns lambda creating Bitmap objects using @redis as the connection.
|
65
|
+
#
|
66
|
+
def bitmap_factory
|
67
|
+
lambda { |key| @redis.bitmap(key) }
|
68
|
+
end
|
69
|
+
|
70
|
+
# Copy this bitmap to 'dest' bitmap.
|
71
|
+
#
|
72
|
+
def copy_to(dest)
|
73
|
+
copy(root_key, dest.root_key)
|
74
|
+
end
|
75
|
+
|
76
|
+
protected
|
77
|
+
|
78
|
+
def key(pos)
|
79
|
+
@root_key
|
80
|
+
end
|
81
|
+
|
82
|
+
def offset(pos)
|
83
|
+
pos
|
84
|
+
end
|
85
|
+
|
86
|
+
def b2i(b)
|
87
|
+
b ? 1 : 0
|
88
|
+
end
|
89
|
+
|
90
|
+
def i2b(i)
|
91
|
+
i.to_i != 0
|
92
|
+
end
|
93
|
+
|
94
|
+
COPY_SCRIPT =
|
95
|
+
<<-EOS
|
96
|
+
redis.call("DEL", KEYS[2])
|
97
|
+
if redis.call("EXISTS", KEYS[1]) == 1 then
|
98
|
+
local val = redis.call("DUMP", KEYS[1])
|
99
|
+
redis.call("RESTORE", KEYS[2], 0, val)
|
100
|
+
end
|
101
|
+
EOS
|
102
|
+
def copy(source_key, dest_key)
|
103
|
+
@redis.eval(COPY_SCRIPT, [source_key, dest_key])
|
104
|
+
end
|
105
|
+
end
|
106
|
+
end
|
107
|
+
end
|
data/lib/redis/bitops/configuration.rb
ADDED
@@ -0,0 +1,38 @@
|
|
1
|
+
class Redis
|
2
|
+
module Bitops
|
3
|
+
|
4
|
+
# Configurable settings.
|
5
|
+
#
|
6
|
+
class Configuration
|
7
|
+
|
8
|
+
# Number of bytes per one sparse bitmap chunk.
|
9
|
+
#
|
10
|
+
attr_accessor :default_bytes_per_chunk
|
11
|
+
|
12
|
+
# Granularity of MULTI transactions. Currently supported values are :bitmap and nil.
|
13
|
+
#
|
14
|
+
attr_accessor :transaction_level
|
15
|
+
|
16
|
+
def initialize
|
17
|
+
reset!
|
18
|
+
end
|
19
|
+
|
20
|
+
def reset!
|
21
|
+
@default_bytes_per_chunk = 32 * 1024
|
22
|
+
@transaction_level = :bitmap
|
23
|
+
end
|
24
|
+
end
|
25
|
+
|
26
|
+
extend self
|
27
|
+
attr_accessor :configuration
|
28
|
+
|
29
|
+
# Call this method to modify defaults in your initializers.
|
30
|
+
#
|
31
|
+
def configure
|
32
|
+
self.configuration ||= Configuration.new
|
33
|
+
yield(configuration)
|
34
|
+
end
|
35
|
+
end
|
36
|
+
|
37
|
+
Bitops.configure {}
|
38
|
+
end
|
data/lib/redis/bitops/queries/binary_operator.rb
ADDED
@@ -0,0 +1,71 @@
|
|
1
|
+
require 'securerandom'
|
2
|
+
|
3
|
+
class Redis
|
4
|
+
module Bitops
|
5
|
+
module Queries
|
6
|
+
|
7
|
+
# Binary bitwise operator.
|
8
|
+
#
|
9
|
+
class BinaryOperator
|
10
|
+
include MaterializationHelpers
|
11
|
+
include TreeBuildingHelpers
|
12
|
+
include LazyEvaluation
|
13
|
+
|
14
|
+
# Creates a bitwise operator 'op' with left-hand operand, 'lhs', and right-hand operand, 'rhs'.
|
15
|
+
#
|
16
|
+
def initialize(op, lhs, rhs)
|
17
|
+
@args = [lhs, rhs]
|
18
|
+
@op = op
|
19
|
+
end
|
20
|
+
|
21
|
+
# Runs the expression tree against the redis database, saving the results
|
22
|
+
# in bitmap 'dest'.
|
23
|
+
#
|
24
|
+
def materialize(dest)
|
25
|
+
# Resolve lhs and rhs operand, using 'dest' to store intermediate result so
|
26
|
+
# a maximum of one temporary Bitmap has to be created.
|
27
|
+
# Then apply the bitwise operator storing the final result in 'dest'.
|
28
|
+
|
29
|
+
intermediate = dest
|
30
|
+
|
31
|
+
lhs, *other_args = @args
|
32
|
+
temp_intermediates = []
|
33
|
+
|
34
|
+
# Side-effects: if a temp intermediate bitmap is created, it's added to 'temp_intermediates'
|
35
|
+
# to be deleted in the "ensure" block. Marked with "<- SE".
|
36
|
+
|
37
|
+
lhs_operand, intermediate = resolve_operand(lhs, intermediate, temp_intermediates) # <- SE
|
38
|
+
other_operands, *_ = other_args.inject([[], intermediate]) do |(operands, intermediate), arg|
|
39
|
+
operand, intermediate = resolve_operand(arg, intermediate, temp_intermediates) # <- SE
|
40
|
+
[operands << operand, intermediate]
|
41
|
+
end
|
42
|
+
|
43
|
+
lhs_operand.bitop(@op, *other_operands, dest)
|
44
|
+
ensure
|
45
|
+
temp_intermediates.each(&:delete!)
|
46
|
+
end
|
47
|
+
|
48
|
+
# Recursively optimizes the expression tree by combining operands for neighboring identical
|
49
|
+
# operators, so for instance a & b & c ultimately becomes BITOP :and dest a b c as opposed
|
50
|
+
# to running two separate BITOP commands.
|
51
|
+
#
|
52
|
+
def optimize!(parent_op = nil)
|
53
|
+
@args.map! { |arg| arg.respond_to?(:optimize!) ? arg.optimize!(@op) : arg }.flatten!
|
54
|
+
if parent_op == @op
|
55
|
+
@args
|
56
|
+
else
|
57
|
+
self
|
58
|
+
end
|
59
|
+
end
|
60
|
+
|
61
|
+
# Finds the first bitmap factory in the expression tree.
|
62
|
+
# Required by LazyEvaluation and MaterializationHelpers.
|
63
|
+
#
|
64
|
+
def bitmap_factory
|
65
|
+
arg = @args.find { |arg| arg.bitmap_factory } or raise "Internal error. Cannot find a bitmap factory."
|
66
|
+
arg.bitmap_factory
|
67
|
+
end
|
68
|
+
end
|
69
|
+
end
|
70
|
+
end
|
71
|
+
end
|