mneme 0.5.0
- data/.gitignore +4 -0
- data/.rspec +0 -0
- data/Gemfile +6 -0
- data/README.md +62 -0
- data/Rakefile +9 -0
- data/bin/mneme +11 -0
- data/config.rb +8 -0
- data/lib/mneme.rb +97 -0
- data/lib/mneme/helper.rb +11 -0
- data/lib/mneme/sweeper.rb +38 -0
- data/mneme.gemspec +30 -0
- data/spec/mneme_spec.rb +93 -0
- metadata +144 -0
data/.gitignore
ADDED
data/.rspec
ADDED
File without changes
data/Gemfile
ADDED
data/README.md
ADDED
@@ -0,0 +1,62 @@

# Mneme

mneme (n.) mne·me

1. Psychology: the retentive basis or basic principle in a mind or organism accounting for memory; the persisting effect of memory of past events.
2. Mythology: the Muse of memory, one of the original three Muses. Cf. "Aoede, Melete."

Mneme is an HTTP web service for recording and identifying previously seen records, a.k.a. duplicate detection. To achieve this in a scalable, zero-maintenance manner, it is implemented as a collection of automatically rotated Bloom filters. By using a collection of Bloom filters, you can customize your false-positive error rate, as well as how long you want your memory to persist (ex: remember all keys for the last 6 hours).

To minimize the required memory footprint, mneme does not store the actual key names; instead, each key is hashed and mapped onto the Bloom filter. For data storage, we use Redis GETBIT/SETBIT to efficiently store and retrieve bit-level data for each key. Couple this with the Goliath app server, and you have an out-of-the-box, high-performance, customizable duplicate filter.
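
The rotation scheme described above can be sketched in plain Ruby. This is a minimal in-memory sketch, not mneme's code: a Ruby Array stands in for each Redis bitstring, a seeded SHA-256 double-hashing scheme stands in for the bloomfilter-rb hash functions, and the class names `SimpleBloom` and `RotatingFilter` are hypothetical. Writes go to the current epoch's filter, reads check every filter in the window, and a sweep drops filters that have aged out.

```ruby
require 'digest'

# Hypothetical in-memory stand-in for one Redis bitstring filter: a fixed
# bit array plus `hashes` bit positions derived from each key.
class SimpleBloom
  def initialize(bits, hashes, seed)
    @bits, @hashes, @seed = bits, hashes, seed
    @array = Array.new(bits, false)
  end

  def insert(key)
    positions(key).each { |i| @array[i] = true }
  end

  # True if every position is set: may be a false positive, never a false negative.
  def key?(key)
    positions(key).all? { |i| @array[i] }
  end

  private

  # Double hashing (h1 + i * h2) over two 64-bit halves of a seeded SHA-256
  # digest; an assumption, not the scheme bloomfilter-rb uses.
  def positions(key)
    h1, h2 = Digest::SHA256.digest("#{@seed}:#{key}").unpack('Q>Q>')
    (0...@hashes).map { |i| (h1 + i * h2) % @bits }
  end
end

# Hypothetical window of `periods` filters, one per epoch of `length` seconds.
class RotatingFilter
  def initialize(periods:, length:, bits:, hashes:, seed:)
    @periods, @length = periods, length
    @bits, @hashes, @seed = bits, hashes, seed
    @filters = {} # epoch number => SimpleBloom
  end

  # Writes always go to the current epoch's filter.
  def insert(key, now = Time.now.to_i)
    current = epoch(now)
    @filters[current] ||= SimpleBloom.new(@bits, @hashes, @seed)
    @filters[current].insert(key)
  end

  # A key has been seen if any filter inside the window reports it.
  def key?(key, now = Time.now.to_i)
    current = epoch(now)
    (0...@periods).any? { |n| @filters[current - n]&.key?(key) }
  end

  # Deletes filters that fell out of the window; returns their epoch numbers.
  def sweep(now = Time.now.to_i)
    oldest = epoch(now) - @periods + 1
    stale = @filters.keys.select { |e| e < oldest }
    stale.each { |e| @filters.delete(e) }
    stale
  end

  private

  def epoch(now)
    now / @length
  end
end
```

With `periods: 3` and `length: 60`, a key inserted at second 0 is still reported at second 179 and forgotten from second 180 onward, which is the 180-second window of the sample configuration.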

For more details: [Mneme: Scalable Duplicate Filtering Service](http://www.igvita.com/2011/03/24/mneme-scalable-duplicate-filtering-service)

## Sample configuration

```ruby
# example_config.rb

config['namespace'] = 'default' # namespace for your app (if you're sharing a redis instance)
config['periods'] = 3 # number of periods to store data for
config['length'] = 60 # length of a period in seconds (length = 60, periods = 3 => 180s worth of data)

config['size'] = 1000 # expected number of keys per filter (filter size in bits = size * bits)
config['bits'] = 10 # number of bits allocated per key
config['hashes'] = 7 # number of times each key will be hashed
config['seed'] = 30 # seed value for the hash function
```

To learn more about Bloom filter configuration: [Scalable Datasets: Bloom Filters in Ruby](http://www.igvita.com/2008/12/27/scalable-datasets-bloom-filters-in-ruby/)
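
How `size`, `bits` and `hashes` interact can be checked with the textbook Bloom filter approximations: a filter of m = size * bits bits holding n keys with k hashes has a false-positive rate of roughly (1 - e^(-kn/m))^k, minimized near k = (m/n) * ln 2. The sketch below assumes each filter is filled to exactly `size` keys (fewer keys give a lower rate, more keys a higher one); the helper names are hypothetical and not part of mneme.

```ruby
# Standard Bloom filter approximations (not part of mneme). With
# m = size * bits and n = size keys, k*n/m reduces to hashes / bits.
def false_positive_rate(bits_per_key, hashes)
  (1 - Math.exp(-hashes.to_f / bits_per_key))**hashes
end

# k that minimizes the false-positive rate: (m / n) * ln 2.
def optimal_hashes(bits_per_key)
  (bits_per_key * Math.log(2)).round
end

# A GET checks every filter in the window, so a miss can be a false positive
# in any of them (treating the filters as independent).
def window_false_positive_rate(per_filter_rate, periods)
  1 - (1 - per_filter_rate)**periods
end

rate = false_positive_rate(10, 7)
printf("per filter: %.2f%%\n", rate * 100)
printf("optimal hashes for 10 bits/key: %d\n", optimal_hashes(10))
printf("across 3 filters: %.2f%%\n", window_false_positive_rate(rate, 3) * 100)
```

For the sample values (10 bits per key, 7 hashes) the per-filter rate comes out a little under 1% (about 0.82%), and 7 is the optimal hash count for 10 bits per key. Because a lookup consults all `periods` filters, the effective rate for a full window is somewhat higher, about 2.4% for three full filters.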

## Launching mneme

```
$> redis-server
$> gem install mneme
$> mneme -p 9000 -sv -c config.rb # run with -h to see all options
```

That's it! You now have a mneme web service running on port 9000. Let's try querying and inserting some data:

```
$> curl "http://127.0.0.1:9000?key=abcd"
{"found":[],"missing":["abcd"]}

# -d creates a POST request with key=abcd, aka insert into filter
$> curl "http://127.0.0.1:9000?key=abcd" -d' '

$> curl "http://127.0.0.1:9000?key=abcd"
{"found":["abcd"],"missing":[]}
```
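
The same GET/POST flow can be driven from Ruby with the standard library. This is a minimal sketch, assuming a mneme instance on 127.0.0.1:9000 as above and Ruby 2.4+ (for `Net::HTTP.post`); the helper names are hypothetical. Multiple keys are sent as repeated `key[]` parameters, which Rack's nested-query parsing turns into an array; that encoding choice is an assumption, not something the README specifies.

```ruby
require 'net/http'
require 'uri'
require 'json'

# Hypothetical helper: builds http://127.0.0.1:9000?key%5B%5D=a&key%5B%5D=b
# (percent-encoded key[] parameters, one per key).
def mneme_uri(base, keys)
  uri = URI(base)
  uri.query = URI.encode_www_form(keys.map { |k| ['key[]', k] })
  uri
end

# GET: returns [status, {"found" => [...], "missing" => [...]}].
def mneme_query(base, keys)
  res = Net::HTTP.get_response(mneme_uri(base, keys))
  [res.code.to_i, JSON.parse(res.body)]
end

# POST: inserts the keys into the current filter; returns the status code.
def mneme_insert(base, keys)
  res = Net::HTTP.post(mneme_uri(base, keys), '')
  res.code.to_i
end
```

Against a running server, `mneme_insert('http://127.0.0.1:9000', ['abcd'])` followed by `mneme_query('http://127.0.0.1:9000', ['abcd'])` mirrors the curl session. Per `query_filters` in lib/mneme.rb, a GET answers 200 when every key is found, 404 when none are, and 206 for a mix.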

## Performance & Memory requirements

- [Redis](http://redis.io/) is used as the in-memory datastore for the Bloom filters
- [Goliath](https://github.com/postrank-labs/goliath) provides the high-performance HTTP frontend
- Storing a new key costs *O(number of BF hashes)*, i.e. O(1)
- Looking up a key costs *O(number of filters * number of BF hashes)*, i.e. O(1) with respect to the number of stored keys

A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set. False positives are possible, but false negatives are not. Because we use Redis as the backing in-memory store for the filters, there is some extra overhead. Sample memory requirements:

- 1.0% error rate for 1M items, 10 bits per item: 2.5 MB
- 1.0% error rate for 150M items, 10 bits per item: 358.52 MB
- 0.1% error rate for 150M items, 15 bits per item: 537.33 MB

Ex: if you want to store up to 24 hours of keys (1 hour = 1 Bloom filter), where each hour can have up to 1M keys, and you are willing to accept a 1.0% error rate, your memory footprint is 24 * 2.5 MB = 60 MB. The footprint will not grow after 24 hours, because mneme automatically rotates and deletes old filters for you!
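
The sizing arithmetic can be reproduced directly. A filter's raw bit array is items * bits_per_item bits, and a rotating window holds `periods` such filters. The sketch below computes only the raw size; the README's measured figures come out at roughly twice the raw size, which is the Redis overhead mentioned above. Helper names are hypothetical.

```ruby
MIB = 1024.0 * 1024

# Raw bit-array size of one filter, in bytes (excludes Redis overhead).
def raw_filter_bytes(items, bits_per_item)
  items * bits_per_item / 8.0
end

# Total footprint of a rotating window = filters kept * size of one filter.
def window_bytes(periods, per_filter_bytes)
  periods * per_filter_bytes
end

[[1_000_000, 10], [150_000_000, 10], [150_000_000, 15]].each do |items, bits|
  printf("%d items @ %d bits/item: %.2f MiB raw\n",
         items, bits, raw_filter_bytes(items, bits) / MIB)
end

# README example: 24 hourly filters at ~2.5 MB each (as measured in Redis).
puts window_bytes(24, 2.5) # prints 60.0
```

For 150M items at 10 bits per item the raw array is about 178.81 MiB, against the 358.52 MB quoted above; the ratio is similar for the other rows.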

### License

(MIT License) - Copyright (c) 2011 Ilya Grigorik
data/Rakefile
ADDED
data/bin/mneme
ADDED
@@ -0,0 +1,11 @@
```ruby
#!/usr/bin/env ruby

begin
  config_index = (ARGV.index('-c') || ARGV.index('--config')) + 1
  ARGV[config_index] = File.absolute_path(ARGV[config_index])
rescue
  puts "Please specify a valid mneme configuration file (ex: -c config.rb)"
  exit
end

system("/usr/bin/env ruby " + File.dirname(__FILE__) + '/../lib/mneme.rb' + ' ' + ARGV.join(" "))
```
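
The wrapper above turns the `-c`/`--config` value into an absolute path and then re-launches lib/mneme.rb through a single shell string; on a missing config it prints a message but `exit`s with status 0. A sketch of the same idea that exits non-zero and passes arguments as an array (so no shell word-splitting) could look like this. It is not the gem's code, and `absolutize_config` and `launch` are hypothetical names.

```ruby
require 'rbconfig'

# Hypothetical: returns a copy of argv whose -c/--config value is expanded
# to an absolute path, or nil when no existing config file was given.
def absolutize_config(argv)
  flag = argv.index('-c') || argv.index('--config')
  return nil if flag.nil? || argv[flag + 1].nil?

  path = File.expand_path(argv[flag + 1])
  return nil unless File.file?(path)

  argv.dup.tap { |args| args[flag + 1] = path }
end

# Hypothetical launcher: aborts with status 1 on a bad config, then replaces
# this process with `ruby lib/mneme.rb ARGS...`, each argument passed
# separately so no shell is involved.
def launch(argv)
  args = absolutize_config(argv)
  abort 'Please specify a valid mneme configuration file (ex: -c config.rb)' unless args

  server = File.expand_path('../lib/mneme.rb', __dir__)
  exec(RbConfig.ruby, server, *args)
end
```

A bin script would call `launch(ARGV)`. Using `exec` instead of `system` is a design choice: the server takes over the wrapper's process, so signals such as Ctrl-C reach it directly.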
data/config.rb
ADDED
data/lib/mneme.rb
ADDED
@@ -0,0 +1,97 @@
```ruby
require 'goliath'
require 'yajl'

require 'redis'
require 'redis/connection/synchrony'
require 'bloomfilter-rb'

require 'lib/mneme/helper'
require 'lib/mneme/sweeper'

class Mneme < Goliath::API
  include Mnemosyne::Helper
  plugin Mnemosyne::Sweeper

  use ::Rack::Reloader, 0 if Goliath.dev?

  use Goliath::Rack::Params
  use Goliath::Rack::DefaultMimeType
  use Goliath::Rack::Formatters::JSON
  use Goliath::Rack::Render
  use Goliath::Rack::Heartbeat
  use Goliath::Rack::ValidationError
  use Goliath::Rack::Validation::RequestMethod, %w(GET POST)

  def response(env)
    keys = [params.delete('key') || params.delete('key[]')].flatten.compact
    return [400, {}, {error: 'no key specified'}] if keys.empty?

    logger.debug "Processing: #{keys}"
    case env[Goliath::Request::REQUEST_METHOD]
    when 'GET' then query_filters(keys)
    when 'POST' then update_filters(keys)
    end
  end

  def query_filters(keys)
    found, missing = [], []
    keys.each do |key|

      present = false
      config['periods'].to_i.times do |n|
        if filter(n).key?(key)
          present = true
          break
        end
      end

      if present
        found << key
      else
        missing << key
      end
    end

    code = case keys.size
      when found.size then 200
      when missing.size then 404
      else 206
    end

    [code, {}, {found: found, missing: missing}]
  end

  def update_filters(keys)
    keys.each do |key|
      filter(0).insert key
      logger.debug "Inserted new key: #{key}"
    end

    [201, {}, '']
  end

  private

  def filter(n)
    period = epoch_name(config['namespace'], n, config['length'])

    filter = if env.key? period
      env[period]
    else
      opts = {
        namespace: config['namespace'],
        size: config['size'] * config['bits'],
        seed: config['seed'],
        hashes: config['hashes']
      }

      # env[period] = EventMachine::Synchrony::ConnectionPool.new(size: 10) do
        env[period] = BloomFilter::Redis.new(opts)
      # end

      env[period]
    end

    filter
  end
end
```
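
`filter(n)` keys each Redis filter by `epoch_name(namespace, n, length)`, which is defined in lib/mneme/helper.rb; that file's contents are not shown in this diff. The sketch below is a hypothetical stand-in for illustration only (epoch number = `now / length`, minus `n` periods, under a made-up `mneme-` prefix), paired with the lookup and status-code rule from `query_filters`, using plain Sets in place of Redis-backed filters.

```ruby
require 'set'

# Hypothetical stand-in for Mnemosyne::Helper#epoch_name; the real helper
# in lib/mneme/helper.rb is not shown here and may name keys differently.
def epoch_name(namespace, n, length, now = Time.now.to_i)
  "mneme-#{namespace}-#{(now / length) - n}"
end

# Same rule as query_filters: all found => 200, none found => 404, mix => 206.
def status_for(keys, found, missing)
  case keys.size
  when found.size then 200
  when missing.size then 404
  else 206
  end
end

# Same lookup as query_filters: a key is found if any of the last `periods`
# filters contains it. Filters are Sets keyed by (hypothetical) epoch name.
def query(keys, filters_by_name, namespace, periods, length, now = Time.now.to_i)
  found, missing = keys.partition do |key|
    (0...periods).any? do |n|
      filters_by_name.fetch(epoch_name(namespace, n, length, now), Set.new).include?(key)
    end
  end
  [status_for(keys, found, missing), { found: found, missing: missing }]
end
```

Note that `status_for` would return 200 for an empty key list; mneme never reaches that case because `response` rejects empty requests with a 400 first.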
data/lib/mneme/helper.rb
ADDED
data/lib/mneme/sweeper.rb
ADDED
@@ -0,0 +1,38 @@
```ruby
module Mnemosyne
  class Sweeper
    include Helper

    def initialize(port, config, status, logger)
      @status = status
      @config = config
      @logger = logger
    end

    def run
      if @config.empty?
        puts "Please specify a valid mneme configuration file (ex: -c config.rb)"
        EM.stop
        exit
      end

      sweeper = Proc.new do
        current = epoch_name(@config['namespace'], 0, @config['length'])
        @logger.info "Sweeping old filters, current epoch: #{current}"

        conn = Redis.new
        @config['periods'].times do |n|
          name = epoch_name(@config['namespace'], n + @config['periods'], @config['length'])

          conn.del(name)
          @logger.info "Removed: #{name}"
        end
        conn.client.disconnect
      end

      sweeper.call
      EM.add_periodic_timer(@config['length']) { sweeper.call }

      @logger.info "Started Mnemosyne::Sweeper with #{@config['length']}s interval"
    end
  end
end
```
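
The sweeper runs once at startup and then every `length` seconds, deleting the filters named by `epoch_name(namespace, n + periods, length)` for `n` in `0...periods`. The sketch below models that arithmetic with integer epoch numbers, under the hypothetical assumption that `epoch_name(ns, n, length)` names the epoch `n` periods before the current one (the real helper lives in lib/mneme/helper.rb, which this diff does not show). The helper names are hypothetical.

```ruby
# Epochs a GET consults: filter(0) .. filter(periods - 1).
def live_epochs(current, periods)
  (0...periods).map { |n| current - n }
end

# Epochs one sweeper pass deletes: n + periods for n in 0...periods.
def swept_epochs(current, periods)
  (0...periods).map { |n| current - (n + periods) }
end

# One insert and one sweep per epoch; returns the epochs still stored.
def simulate(ticks, periods)
  stored = []
  ticks.times do |current|
    stored << current                        # update_filters writes filter(0)
    stored -= swept_epochs(current, periods) # sweeper removes stale filters
  end
  stored
end

p live_epochs(10, 3)  # prints [10, 9, 8]
p swept_epochs(10, 3) # prints [7, 6, 5]
p simulate(20, 3)     # prints [17, 18, 19]
```

The live window and the swept range never overlap, and in the simulation only `periods` filters remain, which is why the README says the footprint stops growing. One plausible reason for deleting `periods` epochs per pass rather than one is tolerance of late or missed timer ticks; since the timer is not aligned to epoch boundaries, an extra filter may briefly exist between sweeps.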
data/mneme.gemspec
ADDED
@@ -0,0 +1,30 @@
```ruby
# -*- encoding: utf-8 -*-
$:.push File.expand_path("../lib", __FILE__)

Gem::Specification.new do |s|
  s.name = "mneme"
  s.version = "0.5.0"
  s.platform = Gem::Platform::RUBY
  s.authors = ["Ilya Grigorik"]
  s.email = ["ilya@igvita.com"]
  s.homepage = ""
  s.summary = %q{abc}
  s.description = %q{Write a gem description}

  s.rubyforge_project = "mneme"

  s.add_dependency "goliath"
  s.add_dependency "hiredis"

  s.add_dependency "redis"
  s.add_dependency "yajl-ruby"
  s.add_dependency "bloomfilter-rb"

  s.add_development_dependency "rspec"
  s.add_development_dependency "em-http-request", ">= 1.0.0.beta.3"

  s.files = `git ls-files`.split("\n")
  s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
  s.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
  s.require_paths = ["lib"]
end
```
data/spec/mneme_spec.rb
ADDED
@@ -0,0 +1,93 @@
```ruby
require 'lib/mneme'
require 'goliath/test_helper'
require 'em-http/middleware/json_response'

describe Mneme do
  include Goliath::TestHelper

  let(:err) { Proc.new { fail "API request failed" } }
  let(:api_options) { { :config => File.expand_path(File.join(File.dirname(__FILE__), '..', 'config.rb')) } }

  EventMachine::HttpRequest.use EventMachine::Middleware::JSONResponse

  it 'responds to hearbeat' do
    with_api(Mneme, api_options) do
      get_request({path: '/status'}, err) do |c|
        c.response.should match('OK')
      end
    end
  end

  it 'should require an error if no key is provided' do
    with_api(Mneme, api_options) do
      get_request({}, err) do |c|
        c.response.should include 'error'
      end
    end
  end

  context 'single key' do
    it 'should return 404 on missing key' do
      with_api(Mneme, api_options) do
        get_request({:query => {:key => 'missing'}}, err) do |c|
          c.response_header.status.should == 404
          c.response['missing'].should include 'missing'
        end
      end
    end

    it 'should insert key into filter' do
      with_api(Mneme, api_options) do
        post_request({:query => {key: 'abc'}}) do |c|
          c.response_header.status.should == 201

          get_request({:query => {:key => 'abc'}}, err) do |c|
            c.response_header.status.should == 200
            c.response['found'].should include 'abc'
          end
        end
      end
    end
  end

  context 'multiple keys' do

    it 'should return 404 on missing keys' do
      with_api(Mneme, api_options) do
        get_request({:query => {:key => ['a', 'b']}}, err) do |c|
          c.response_header.status.should == 404

          c.response['found'].should be_empty
          c.response['missing'].should include 'a'
          c.response['missing'].should include 'b'
        end
      end
    end

    it 'should return 200 on found keys' do
      with_api(Mneme, api_options) do
        post_request({:query => {key: ['abc1', 'abc2']}}) do |c|
          c.response_header.status.should == 201

          get_request({:query => {:key => ['abc1', 'abc2']}}, err) do |c|
            c.response_header.status.should == 200
          end
        end
      end
    end

    it 'should return 206 on mixed keys' do
      with_api(Mneme, api_options) do
        post_request({:query => {key: ['abc3']}}) do |c|
          c.response_header.status.should == 201

          get_request({:query => {:key => ['abc3', 'abc4']}}, err) do |c|
            c.response_header.status.should == 206
          end
        end
      end
    end

  end

end
```
metadata
ADDED
@@ -0,0 +1,144 @@
```yaml
--- !ruby/object:Gem::Specification
name: mneme
version: !ruby/object:Gem::Version
  prerelease:
  version: 0.5.0
platform: ruby
authors:
- Ilya Grigorik
autorequire:
bindir: bin
cert_chain: []

date: 2011-03-24 00:00:00 -04:00
default_executable:
dependencies:
- !ruby/object:Gem::Dependency
  name: goliath
  prerelease: false
  requirement: &id001 !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: "0"
  type: :runtime
  version_requirements: *id001
- !ruby/object:Gem::Dependency
  name: hiredis
  prerelease: false
  requirement: &id002 !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: "0"
  type: :runtime
  version_requirements: *id002
- !ruby/object:Gem::Dependency
  name: redis
  prerelease: false
  requirement: &id003 !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: "0"
  type: :runtime
  version_requirements: *id003
- !ruby/object:Gem::Dependency
  name: yajl-ruby
  prerelease: false
  requirement: &id004 !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: "0"
  type: :runtime
  version_requirements: *id004
- !ruby/object:Gem::Dependency
  name: bloomfilter-rb
  prerelease: false
  requirement: &id005 !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: "0"
  type: :runtime
  version_requirements: *id005
- !ruby/object:Gem::Dependency
  name: rspec
  prerelease: false
  requirement: &id006 !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: "0"
  type: :development
  version_requirements: *id006
- !ruby/object:Gem::Dependency
  name: em-http-request
  prerelease: false
  requirement: &id007 !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: 1.0.0.beta.3
  type: :development
  version_requirements: *id007
description: Write a gem description
email:
- ilya@igvita.com
executables:
- mneme
extensions: []

extra_rdoc_files: []

files:
- .gitignore
- .rspec
- Gemfile
- README.md
- Rakefile
- bin/mneme
- config.rb
- lib/mneme.rb
- lib/mneme/helper.rb
- lib/mneme/sweeper.rb
- mneme.gemspec
- spec/mneme_spec.rb
has_rdoc: true
homepage: ""
licenses: []

post_install_message:
rdoc_options: []

require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: "0"
required_rubygems_version: !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: "0"
requirements: []

rubyforge_project: mneme
rubygems_version: 1.6.2
signing_key:
specification_version: 3
summary: abc
test_files:
- spec/mneme_spec.rb
```