redis-bitops 0.2
- checksums.yaml +7 -0
- data/MIT-LICENSE +20 -0
- data/README.md +193 -0
- data/lib/redis/bitops.rb +26 -0
- data/lib/redis/bitops/bitmap.rb +107 -0
- data/lib/redis/bitops/configuration.rb +38 -0
- data/lib/redis/bitops/queries/binary_operator.rb +71 -0
- data/lib/redis/bitops/queries/lazy_evaluation.rb +47 -0
- data/lib/redis/bitops/queries/materialization_helpers.rb +53 -0
- data/lib/redis/bitops/queries/tree_building_helpers.rb +46 -0
- data/lib/redis/bitops/queries/unary_operator.rb +48 -0
- data/lib/redis/bitops/sparse_bitmap.rb +125 -0
- data/spec/redis/bitops/bitmap_spec.rb +9 -0
- data/spec/redis/bitops/queries/binary_operator_spec.rb +24 -0
- data/spec/redis/bitops/queries/unary_operator_spec.rb +27 -0
- data/spec/redis/bitops/sparse_bitmap_spec.rb +99 -0
- data/spec/spec_helper.rb +16 -0
- data/spec/support/bitmap_examples.rb +313 -0
- metadata +173 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
---
SHA1:
  metadata.gz: a567e3b9af870d0a1875760e5c02ea1562b1204b
  data.tar.gz: 0d27e8d824da9cc252aa3058c5c0fcb90c1af965
SHA512:
  metadata.gz: 9a127af0eda591bd37ad31378275c52ea1cd8d4280fe836520e67d4ae423947836316583edffb4f2a0a153ba67fe4cc7c75c741b98bea862761ba2b4f92f731c
  data.tar.gz: 56f170a119fff911cc250597887c4e49b6d5c9a784817315d02b80fbd70db129ca39ceaeb2a3d999bf5bba51066d3df705184a9ec1971a12084d215565ad89c1
data/MIT-LICENSE
ADDED
@@ -0,0 +1,20 @@
Copyright 2014 Martin Bilski

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,193 @@
# Introduction

This gem makes it easier to do bit-wise operations on large Redis bitsets, usually called bitmaps, with a natural expression syntax. It also supports huge **sparse bitmaps** by storing data in multiple keys, called chunks, per bitmap.

The typical use is real-time web analytics where each bit in a bitmap/bitset corresponds to a user ([introductory article here](http://blog.getspool.com/2011/11/29/fast-easy-realtime-metrics-using-redis-bitmaps/)). This library isn't an analytics package though; it's more low-level than that, and you can use it for anything.

**This library is under development and its interface might change.**


# Quick start/pitch

Why use the library?

    require 'redis/bitops'
    redis = Redis.new

    b1 = redis.sparse_bitmap("b1")
    b1[128_000_000] = true
    b2 = redis.sparse_bitmap("b2")
    b2[128_000_000] = true
    result = b1 & b2

Memory usage: about 20kb, because the sparse bitmap implementation stores data in chunks.

Let's go crazy with super-complicated expressions:

    ...
    result = (b1 & ~b2) | b3 | (b4 & b5 & b6 & ~b7)

Imagine writing this expression using Redis#bitop!
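
To make that comparison concrete, here is a hedged sketch of what the expression above might look like when written by hand as BITOP calls. `CommandLog` is a hypothetical stand-in for a Redis connection that only records commands (so the snippet runs without a server), and the `tmp:1`/`tmp:2` key names are made up. Only the `bitop(operation, destkey, *keys)` call shape follows redis-rb.

```ruby
# Hypothetical stand-in for a Redis connection: it only records the BITOP
# calls it receives, so the sketch runs without a server.
class CommandLog
  attr_reader :commands

  def initialize
    @commands = []
  end

  # Same call shape as redis-rb's Redis#bitop(operation, destkey, *keys).
  def bitop(operation, destkey, *keys)
    @commands << ["BITOP", operation.to_s.upcase, destkey, *keys].join(" ")
  end
end

# Hand-written equivalent of: (b1 & ~b2) | b3 | (b4 & b5 & b6 & ~b7)
# The temporary key names "tmp:1" and "tmp:2" are invented for this example.
def manual_query(redis)
  redis.bitop(:not, "tmp:1", "b2")                       # ~b2
  redis.bitop(:and, "tmp:1", "b1", "tmp:1")              # b1 & ~b2
  redis.bitop(:not, "tmp:2", "b7")                       # ~b7
  redis.bitop(:and, "tmp:2", "b4", "b5", "b6", "tmp:2")  # b4 & b5 & b6 & ~b7
  redis.bitop(:or, "result", "tmp:1", "b3", "tmp:2")     # final OR
end

log = CommandLog.new
manual_query(log)
puts log.commands
```

That is five calls and two temporary keys to clean up afterwards, and the shape of the original expression is no longer visible. This is the bookkeeping the operator syntax is meant to hide.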


# Installation

To install the gem:

    gem install redis-bitops

To use it in your code:

    require 'redis/bitops'

# Usage

Reference: [here](http://rdoc.info/github/bilus/redis-bitops/master/frames)

## Basic example

An example is often better than theory, so here's one. Let's create a few bitmaps and set their individual bits; we'll use those bitmaps in the examples below:

    redis = Redis.new

    a = redis.bitmap("a")
    b = redis.bitmap("b")
    result = redis.bitmap("result")

    b[0] = true; b[2] = true; b[7] = true # 10100001
    a[0] = true; a[1] = true; a[7] = true # 11000001

So, now here's a very simple expression:

    c = a & b

You may be surprised, but the above statement does not query Redis at all! The expression is lazily evaluated when you access the result:

    puts c.bitcount # => 2
    puts c[0] # => true
    puts c[1] # => false
    puts c[2] # => false
    puts c[7] # => true

So, in the above example, the call to `c.bitcount` is the first moment when Redis is queried. The result is stored under a temporary unique key.

    puts c.root_key # => "redis:bitops:8eef38u9o09334"

Let's delete the temporary result:

    c.delete!

If you want to store the result directly under a specific key:

    result << c

Or, more adventurously, we can use the following more complex one-liner:

    result << (~c & (a | b))

**Note:** Expressions are optimized to reduce the number of Redis commands and to use as few temporary keys for intermediate values as possible. See below for details.
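
The lazy evaluation described in this section can be illustrated with a toy model. This is a sketch of the general idea only, not the gem's implementation: bitsets are plain Ruby Integers instead of Redis keys (so bit `i` is `1 << i`, unlike the left-to-right notation in the comments above), and the names `Operators`, `Leaf`, `Lazy`, and `value` are invented for the example.

```ruby
# Toy model of lazy evaluation; NOT the gem's code.
# Operators build a tree; nothing is computed until a result is read.
module Operators
  def &(other)
    Lazy.new(:and, self, other)
  end

  def |(other)
    Lazy.new(:or, self, other)
  end
end

# A stored bitset (stands in for a bitmap that already exists in Redis).
class Leaf
  include Operators
  attr_reader :value

  def initialize(value)
    @value = value
  end
end

# A deferred operation on two operands.
class Lazy
  include Operators

  @evaluations = 0
  class << self
    attr_accessor :evaluations
  end

  def initialize(op, lhs, rhs)
    @op, @lhs, @rhs = op, lhs, rhs
  end

  # Computed on first access and memoized afterwards.
  def value
    @value ||= begin
      Lazy.evaluations += 1
      @op == :and ? (@lhs.value & @rhs.value) : (@lhs.value | @rhs.value)
    end
  end

  def bitcount
    value.to_s(2).count("1")
  end
end

a = Leaf.new(0b10000011) # bits 0, 1 and 7 set
b = Leaf.new(0b10000101) # bits 0, 2 and 7 set

c = a & b                # builds a tree, computes nothing
puts Lazy.evaluations    # => 0
puts c.bitcount          # => 2 (first access triggers the evaluation)
puts Lazy.evaluations    # => 1
```

In the gem, the equivalent of `value` would issue the BITOP commands and store the result under a temporary key; here it is just integer arithmetic.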


## Sparse bitmaps

### Usage

You don't have to do anything special, simply use `Redis#sparse_bitmap` instead of `Redis#bitmap`:

    a = redis.sparse_bitmap("a")
    b = redis.sparse_bitmap("b")
    result = redis.sparse_bitmap("result")

    b[0] = true; b[2] = true; b[7] = true # 10100001
    a[0] = true; a[1] = true; a[7] = true # 11000001

    c = a & b

    result << c

or just:

    result << (a & b)

You can specify the chunk size (in bytes).

Use the size consistently. Note that it cannot be re-adjusted for data already saved to Redis:

    x = redis.sparse_bitmap("x", 1024 * 1024) # 1 MB per chunk.
    x[0] = true
    x[1000] = true

**Important:** Do not mix sparse bitmaps with regular ones and never mix sparse bitmaps with different chunk sizes in the same expressions.

### Rationale

If you want to store a lot of huge but sparse bitsets, with not many bits set, using regular Redis bitmaps doesn't work very well. It wastes a lot of space. In analytics, it's a reasonable requirement to be able to store data about several million users. A bitmap for 10 million users weighs over 1MB! Imagine storing hourly statistics and using up memory at a rate of more than 720MB per month.

For, say, 100 million users it becomes outright prohibitive!
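
The figures above follow from simple arithmetic: a regular bitmap needs one bit per user ID up to the highest ID in use. The snippet below is a small worked check; the helper name `bitmap_bytes` is made up, and 1 MB is taken as 1024 * 1024 bytes. The 720MB figure in the text corresponds to rounding each hourly bitmap down to 1MB.

```ruby
BYTES_PER_MB = 1024 * 1024

# Bytes needed by a regular bitmap with one bit per user.
def bitmap_bytes(users)
  (users / 8.0).ceil
end

per_bitmap = bitmap_bytes(10_000_000)  # 1_250_000 bytes
per_month  = per_bitmap * 24 * 30      # one bitmap per hour for 30 days

puts format("%.2f MB per bitmap", per_bitmap.to_f / BYTES_PER_MB)   # 1.19 MB
puts format("%.0f MB per month", per_month.to_f / BYTES_PER_MB)     # 858 MB
puts format("%.1f MB per bitmap for 100 million users",
            bitmap_bytes(100_000_000).to_f / BYTES_PER_MB)          # 11.9 MB
```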

But even with a fairly popular website, I dare say, you don't often have one million users per hour :) This means that the majority of those bits are never set and a lot of space is wasted.

Enter sparse bitmaps. They divide each bitmap into chunks, thus minimizing memory use (the chunk size can be configured, see Configuration below).

Creating and using sparse bitmaps is identical to using regular bitmaps:

    huge = redis.sparse_bitmap("huge_bitmap")
    huge[128_000_000] = true

The only difference in the above example is that it will allocate two 32kb chunks, as opposed to the roughly 16MB (128,000,000 bits / 8) that would be allocated if we used a regular bitmap (`Redis#bitmap`). In addition, setting the bit is nearly instantaneous.

Compare:

    puts Benchmark.measure {
      sparse = redis.sparse_bitmap("huge_sparse_bitmap")
      sparse[500_000_000] = true
    }

which on my machine generates:

    0.000000 0.000000 0.000000 ( 0.000366)

It uses just 23kb of memory, as opposed to the 120MB (megabytes!) needed to store the bit using a regular Redis bitmap:

    regular = redis.bitmap("huge_regular_bitmap")
    regular[500_000_000] = true
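
The chunking idea from this section can be sketched as a mapping from a bit position to a chunk key plus an offset inside that chunk. This is an illustration under assumptions, not the gem's code: the `locate` helper and the `name:chunk:index` key format are invented here, and only the 32 KB default chunk size is taken from the gem's configuration.

```ruby
# Hypothetical mapping from a bit position to (chunk key, offset in chunk).
# The "root:chunk:index" key format is made up; the gem's real keys may differ.
BYTES_PER_CHUNK = 32 * 1024 # same value as the gem's default_bytes_per_chunk

def locate(root_key, pos, bytes_per_chunk = BYTES_PER_CHUNK)
  bits_per_chunk = bytes_per_chunk * 8
  chunk_index, offset = pos.divmod(bits_per_chunk)
  ["#{root_key}:chunk:#{chunk_index}", offset]
end

key, offset = locate("huge_bitmap", 128_000_000)
puts key     # => huge_bitmap:chunk:488
puts offset  # => 73728

# Only the chunk holding the bit has to exist, so the string Redis allocates
# is bounded by the chunk size rather than by the bit position.
puts offset / 8 + 1 # => 9217 bytes needed inside that chunk
```

A mapping like this also suggests why chunk sizes must not be mixed: the same position lands in a different key and offset as soon as `bytes_per_chunk` changes.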

## Configuration

Here's how to configure the gem:

    Redis::Bitops.configure do |config|
      config.default_bytes_per_chunk = 8192 # Eight kilobytes.
      config.transaction_level = :bitmap # allowed values: :bitmap or :none.
    end

# Implementation & efficiency

## Optimization phase

Prior to evaluation, the expression is optimized by combining operators into single BITOP commands and reusing temporary keys (required to store intermediate results) as much as possible.

This silly example:

    result << (a & b & c | a | b)

translates into simply:

    BITOP AND result a b c
    BITOP OR result result a b

and doesn't create any temporary keys at all!
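
The optimization described above can be sketched on a toy expression tree. This is a simplified model, not the gem's code: `Key`, `Op`, `flatten_same`, and `emit` are names invented for the example, the `tmp:N` keys are made up, and only associative binary operators (`:and`, `:or`, `:xor`) are handled.

```ruby
# Toy expression tree: Key is a stored bitmap, Op is an operator node.
Key = Struct.new(:name)
Op  = Struct.new(:op, :args)

# Merge nested nodes that use the same operator as their parent, so that
# Op(:and, [Op(:and, [a, b]), c]) becomes Op(:and, [a, b, c]).
def flatten_same(node)
  return node if node.is_a?(Key)

  args = node.args.flat_map do |arg|
    arg = flatten_same(arg)
    arg.is_a?(Op) && arg.op == node.op ? arg.args : [arg]
  end
  Op.new(node.op, args)
end

# Emit BITOP commands. The first nested operand is computed directly into
# 'dest'; any further nested operands get a made-up temporary key.
def emit(node, dest, commands = [], temps = [0])
  dest_free = true
  keys = node.args.map do |arg|
    next arg.name if arg.is_a?(Key)

    target = dest_free ? dest : "tmp:#{temps[0] += 1}"
    dest_free = false
    emit(arg, target, commands, temps)
    target
  end
  commands << "BITOP #{node.op.to_s.upcase} #{dest} #{keys.join(' ')}"
end

a, b, c = Key.new("a"), Key.new("b"), Key.new("c")

# a & b & c | a | b, as Ruby's operator precedence would nest it.
tree = Op.new(:or, [Op.new(:or, [Op.new(:and, [Op.new(:and, [a, b]), c]), a]), b])

puts emit(flatten_same(tree), "result")
# BITOP AND result a b c
# BITOP OR result result a b
```

Without the flattening step the same tree would need four BITOP commands, one per operator node, instead of two.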

## Materialization phase

At this point, the calculations are carried out and the result is saved under the destination key. Note that, for sparse bitmaps, multiple keys may be created.


## Transaction levels

TBD


## Contributing/feedback

Please send in your suggestions to [gyamtso@gmail.com](mailto:gyamtso@gmail.com). Pull requests, issues, comments are more than welcome.
data/lib/redis/bitops.rb
ADDED
@@ -0,0 +1,26 @@
require 'redis'
require 'redis/bitops/queries/materialization_helpers'
require 'redis/bitops/queries/tree_building_helpers'
require 'redis/bitops/queries/lazy_evaluation'
require 'redis/bitops/queries/binary_operator'
require 'redis/bitops/queries/unary_operator'
require 'redis/bitops/bitmap'
require 'redis/bitops/sparse_bitmap'

require 'redis/bitops/configuration'


class Redis

  # Creates a new bitmap.
  #
  def bitmap(key)
    Bitops::Bitmap.new(key, self)
  end

  # Creates a new sparse bitmap storing data in n chunks to conserve memory.
  #
  def sparse_bitmap(key, bytes_per_chunk = nil)
    Bitops::SparseBitmap.new(key, self, bytes_per_chunk)
  end
end
data/lib/redis/bitops/bitmap.rb
ADDED
@@ -0,0 +1,107 @@
class Redis
  module Bitops

    # A regular Redis bitmap stored under a single key.
    #
    # Note: When adding new public methods, revise the LazyEvaluation module.
    #
    class Bitmap

      include Queries
      include TreeBuildingHelpers # See this module for the list of supported operators.

      # Creates a new regular Redis bitmap stored in 'redis' under 'root_key'.
      #
      def initialize(root_key, redis)
        @redis = redis
        @root_key = root_key
      end

      # Saves the result of the query in the bitmap.
      #
      def << (query)
        query.evaluate(self)
      end

      # Reads bit at position 'pos' returning a boolean.
      #
      def [] (pos)
        i2b(@redis.getbit(key(pos), offset(pos)))
      end

      # Sets bit at position 'pos' to 1 or 0 based on the boolean 'b'.
      #
      def []= (pos, b)
        @redis.setbit(key(pos), offset(pos), b2i(b))
      end

      # Returns the number of set bits.
      #
      def bitcount
        @redis.bitcount(@root_key)
      end

      # Deletes the bitmap and all its keys.
      #
      def delete!
        @redis.del(@root_key)
      end

      # Redis BITOP operator 'op' (one of :and, :or, :xor or :not) on operands
      # (bitmaps). The result is stored in 'result'.
      #
      def bitop(op, *operands, result)
        @redis.bitop(op, result.root_key, self.root_key, *operands.map(&:root_key))
        result
      end

      # The key the bitmap is stored under.
      #
      def root_key
        @root_key
      end

      # Returns lambda creating Bitmap objects using @redis as the connection.
      #
      def bitmap_factory
        lambda { |key| @redis.bitmap(key) }
      end

      # Copy this bitmap to 'dest' bitmap.
      #
      def copy_to(dest)
        copy(root_key, dest.root_key)
      end

      protected

      def key(pos)
        @root_key
      end

      def offset(pos)
        pos
      end

      def b2i(b)
        b ? 1 : 0
      end

      def i2b(i)
        i.to_i != 0
      end

      COPY_SCRIPT =
        <<-EOS
          redis.call("DEL", KEYS[2])
          if redis.call("EXISTS", KEYS[1]) == 1 then
            local val = redis.call("DUMP", KEYS[1])
            redis.call("RESTORE", KEYS[2], 0, val)
          end
        EOS
      def copy(source_key, dest_key)
        @redis.eval(COPY_SCRIPT, [source_key, dest_key])
      end
    end
  end
end
data/lib/redis/bitops/configuration.rb
ADDED
@@ -0,0 +1,38 @@
class Redis
  module Bitops

    # Configurable settings.
    #
    class Configuration

      # Number of bytes per one sparse bitmap chunk.
      #
      attr_accessor :default_bytes_per_chunk

      # Granularity of MULTI transactions. Currently supported values are :bitmap and nil.
      #
      attr_accessor :transaction_level

      def initialize
        reset!
      end

      def reset!
        @default_bytes_per_chunk = 32 * 1024
        @transaction_level = :bitmap
      end
    end

    extend self
    attr_accessor :configuration

    # Call this method to modify defaults in your initializers.
    #
    def configure
      self.configuration ||= Configuration.new
      yield(configuration)
    end
  end

  Bitops.configure {}
end
data/lib/redis/bitops/queries/binary_operator.rb
ADDED
@@ -0,0 +1,71 @@
require 'securerandom'

class Redis
  module Bitops
    module Queries

      # Binary bitwise operator.
      #
      class BinaryOperator
        include MaterializationHelpers
        include TreeBuildingHelpers
        include LazyEvaluation

        # Creates a bitwise operator 'op' with left-hand operand, 'lhs', and right-hand operand, 'rhs'.
        #
        def initialize(op, lhs, rhs)
          @args = [lhs, rhs]
          @op = op
        end

        # Runs the expression tree against the redis database, saving the results
        # in bitmap 'dest'.
        #
        def materialize(dest)
          # Resolve lhs and rhs operand, using 'dest' to store intermediate result so
          # a maximum of one temporary Bitmap has to be created.
          # Then apply the bitwise operator storing the final result in 'dest'.

          intermediate = dest

          lhs, *other_args = @args
          temp_intermediates = []

          # Side-effects: if a temp intermediate bitmap is created, it's added to 'temp_intermediates'
          # to be deleted in the "ensure" block. Marked with "<- SE".

          lhs_operand, intermediate = resolve_operand(lhs, intermediate, temp_intermediates) # <- SE
          other_operands, *_ = other_args.inject([[], intermediate]) do |(operands, intermediate), arg|
            operand, intermediate = resolve_operand(arg, intermediate, temp_intermediates) # <- SE
            [operands << operand, intermediate]
          end

          lhs_operand.bitop(@op, *other_operands, dest)
        ensure
          temp_intermediates.each(&:delete!)
        end

        # Recursively optimizes the expression tree by combining operands for neighboring identical
        # operators, so for instance a & b & c ultimately becomes BITOP :and dest a b c as opposed
        # to running two separate BITOP commands.
        #
        def optimize!(parent_op = nil)
          @args.map! { |arg| arg.respond_to?(:optimize!) ? arg.optimize!(@op) : arg }.flatten!
          if parent_op == @op
            @args
          else
            self
          end
        end

        # Finds the first bitmap factory in the expression tree.
        # Required by LazyEvaluation and MaterializationHelpers.
        #
        def bitmap_factory
          arg = @args.find { |arg| arg.bitmap_factory } or raise "Internal error. Cannot find a bitmap factory."
          arg.bitmap_factory
        end
      end
    end
  end
end