mneme 0.5.0

@@ -0,0 +1,4 @@
1
+ *.gem
2
+ .bundle
3
+ Gemfile.lock
4
+ pkg/*
data/.rspec ADDED
File without changes
data/Gemfile ADDED
@@ -0,0 +1,6 @@
1
+ source "http://rubygems.org"
2
+
3
+ # Specify your gem's dependencies in mneme.gemspec
4
+ gemspec
5
+
6
+ gem 'redis', :git => 'git://github.com/igrigorik/redis-rb.git'
@@ -0,0 +1,62 @@
1
+ # Mneme
2
+
3
+ mneme (n.) mne·me
4
+ 1. Psychology: the retentive basis or basic principle in a mind or organism accounting for memory, persisting effect of memory of past events.
5
+ 2. Mythology: the Muse of memory, one of the original three Muses. Cf. "Aoede, Melete."
6
+
7
+ Mneme is an HTTP web service for recording and identifying previously seen records - aka, duplicate detection. To achieve this goal in a scalable, zero-maintenance manner, it is implemented via a collection of automatically rotated bloomfilters. By using a collection of bloomfilters, you can customize your false-positive error rate, as well as the amount of time you want your memory to persist (ex: remember all keys for the last 6 hours).
8
+
9
+ To minimize the required memory footprint, mneme does not store the actual key names; instead, each specified key is hashed and mapped onto the bloomfilter. For data storage, we use Redis getbit/setbit to efficiently store and retrieve bit-level data for each key. Couple this with the Goliath app server, and you have an out-of-the-box, high-performance, customizable duplicate filter.
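The hashing step described above can be illustrated with a minimal in-memory sketch. This is not mneme's implementation: `ToyBloomFilter` is a hypothetical class, a Ruby `Integer` stands in for the Redis bit string, and deriving bit positions from `Zlib.crc32` over the key plus a per-hash salt is an assumption made purely for illustration.

```ruby
require 'zlib'

# Hypothetical in-memory Bloom filter. An Integer plays the role of the
# Redis bit string that mneme reads and writes with getbit/setbit.
class ToyBloomFilter
  def initialize(size:, hashes:, seed:)
    @size = size       # total number of bits
    @hashes = hashes   # number of bit positions per key
    @seed = seed       # salt offset for the hash function
    @bits = 0
  end

  # Set one bit per hash; the key itself is never stored.
  def insert(key)
    indexes_for(key).each { |idx| @bits |= (1 << idx) }
    self
  end

  # True if every bit for the key is set (this may be a false positive).
  def include?(key)
    indexes_for(key).all? { |idx| @bits[idx] == 1 }
  end

  private

  # Assumed hashing scheme: CRC32 of the key salted with seed + i.
  def indexes_for(key)
    Array.new(@hashes) { |i| Zlib.crc32("#{key}:#{i + @seed}") % @size }
  end
end

filter = ToyBloomFilter.new(size: 1000 * 10, hashes: 7, seed: 30)
filter.insert('abcd')
puts filter.include?('abcd') # true: inserted keys are always reported as present
```

Only bit positions are kept, which is why the footprint depends on the filter size and not on the length of the keys.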
10
+
11
+ For more details: [Mneme: Scalable Duplicate Filtering Service](http://www.igvita.com/2011/03/24/mneme-scalable-duplicate-filtering-service)
12
+
13
+ ## Sample configuration
14
+
15
+ # example_config.rb
16
+
17
+ config['namespace'] = 'default' # namespace for your app (if you're sharing a redis instance)
18
+ config['periods'] = 3 # number of periods to store data for
19
+ config['length'] = 60 # length of a period in seconds (length = 60, periods = 3 => 180s worth of data)
20
+
21
+ config['size'] = 1000 # desired size of the bloomfilter
22
+ config['bits'] = 10 # number of bits allocated per key
23
+ config['hashes'] = 7 # number of times each key will be hashed
24
+ config['seed'] = 30 # seed value for the hash function
25
+
26
+ To learn more about Bloom filter configuration: [Scalable Datasets: Bloom Filters in Ruby](http://www.igvita.com/2008/12/27/scalable-datasets-bloom-filters-in-ruby/)
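To get a feel for how `bits` and `hashes` interact, the standard Bloom filter approximation p ≈ (1 − e^(−k/b))^k can be evaluated directly, where b is the number of bits per key and k the number of hashes. The helper names below are hypothetical and the formula is the textbook estimate, not something mneme computes itself.

```ruby
# Standard Bloom filter approximation: p ~ (1 - e^(-k/b))^k,
# where b = bits per key and k = number of hash functions.
def false_positive_rate(bits_per_key, hashes)
  (1 - Math.exp(-hashes.to_f / bits_per_key))**hashes
end

# Hash count that minimizes the rate for a given bits-per-key: b * ln(2).
def optimal_hashes(bits_per_key)
  (bits_per_key * Math.log(2)).round
end

# Sample configuration: bits = 10, hashes = 7
puts format('%.4f', false_positive_rate(10, 7)) # => 0.0082
puts optimal_hashes(10)                         # => 7
```

Under this estimate the sample configuration sits just below a 1% false-positive rate once each filter holds `size` keys.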
27
+
28
+ ## Launching mneme
29
+
30
+ $> redis-server
31
+ $> gem install mneme
32
+ $> mneme -p 9000 -sv -c config.rb # run with -h to see all options
33
+
34
+ That's it! You now have a mneme web service running on port 9000. Let's try querying and inserting some data:
35
+
36
+ $> curl "http://127.0.0.1:9000?key=abcd"
37
+ {"found":[],"missing":["abcd"]}
38
+
39
+ # -d creates a POST request with key=abcd, aka insert into filter
40
+ $> curl "http://127.0.0.1:9000?key=abcd" -d' '
41
+
42
+ $> curl "http://127.0.0.1:9000?key=abcd"
43
+ {"found":["abcd"],"missing":[]}
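The same calls can be made from Ruby with the standard library. The sketch below assumes a mneme instance on 127.0.0.1:9000 as in the curl examples; `mneme_uri`, `query_keys`, and `insert_keys` are hypothetical helper names, and sending multiple keys as repeated `key[]` parameters is an assumption based on how the server reads its parameters. It has not been verified against a live server.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build the request URI; several keys are sent as repeated key[] parameters.
def mneme_uri(keys, host: '127.0.0.1', port: 9000)
  query = URI.encode_www_form('key[]' => Array(keys))
  URI("http://#{host}:#{port}/?#{query}")
end

# GET asks whether the keys were seen before; returns the parsed JSON body.
def query_keys(keys)
  JSON.parse(Net::HTTP.get_response(mneme_uri(keys)).body)
end

# POST with an empty body inserts the keys; true when the server answers 201.
def insert_keys(keys)
  uri = mneme_uri(keys)
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.post(uri.request_uri, '').code == '201'
  end
end

# Only builds the URI; no request is sent here.
puts mneme_uri(%w[abc1 abc2])
```

With a server running, `insert_keys('abcd')` followed by `query_keys('abcd')` should mirror the curl session above.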
44
+
45
+ ## Performance & Memory requirements
46
+
47
+ - [Redis](http://redis.io/) is used as an in-memory datastore of the bloomfilter
48
+ - [Goliath](https://github.com/postrank-labs/goliath) provides the high-performance HTTP frontend
49
+ - The speed of storing a new key is: *O(number of BF hashes) - aka, O(1)*
50
+ - The speed of retrieving a key is: *O(number of filters × number of BF hashes) - aka, O(1)*
51
+
52
+ A Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positives are possible, but false negatives are not. Because we are using Redis as a backend, in-memory store for the filters, there is some extra overhead. Sample memory requirements:
53
+
54
+ - 1.0% error rate for 1M items, 10 bits per item: 2.5 MB
55
+ - 1.0% error rate for 150M items, 10 bits per item: 358.52 MB
56
+ - 0.1% error rate for 150M items, 15 bits per item: 537.33 MB
57
+
58
+ Ex: If you wanted to store up to 24 hours (with 1 hour = 1 bloom filter) of keys, where each hour can have up to 1M keys, and you are willing to accept a 1.0% error rate, then your memory footprint is: 24 * 2.5 MB = 60 MB of memory. The footprint will not change after 24 hours, because Mneme will automatically rotate and delete old filters for you!
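The arithmetic in this example can be written out as a short sketch. The helper names are hypothetical. The 2.5 MB per-filter figure is taken from the list above and includes Redis overhead, while the raw bit array for the same filter is smaller.

```ruby
# Raw size of one filter in megabytes (10^6 bytes), before any Redis overhead.
def raw_filter_mb(keys, bits_per_key)
  keys * bits_per_key / 8.0 / 1_000_000
end

# Total footprint when one filter is kept per period.
def total_footprint_mb(per_filter_mb, periods)
  per_filter_mb * periods
end

puts raw_filter_mb(1_000_000, 10) # => 1.25 (bits only)
puts total_footprint_mb(2.5, 24)  # => 60.0 (using the README's per-filter figure)
```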
59
+
60
+ ### License
61
+
62
+ (MIT License) - Copyright (c) 2011 Ilya Grigorik
@@ -0,0 +1,9 @@
1
+ require 'bundler'
2
+ Bundler::GemHelper.install_tasks
3
+
4
+ require 'rspec/core/rake_task'
5
+
6
+ desc "Run all RSpec tests"
7
+ RSpec::Core::RakeTask.new(:spec)
8
+
9
+ task :default => :spec
@@ -0,0 +1,11 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ begin
4
+ config_index = (ARGV.index('-c') || ARGV.index('--config')) + 1
5
+ ARGV[config_index] = File.absolute_path(ARGV[config_index])
6
+ rescue
7
+ puts "Please specify a valid mneme configuration file (ex: -c config.rb)"
8
+ exit 1
9
+ end
10
+
11
+ # pass arguments as a list so paths containing spaces are not split by the shell
+ system('/usr/bin/env', 'ruby', File.join(File.dirname(__FILE__), '..', 'lib', 'mneme.rb'), *ARGV)
@@ -0,0 +1,8 @@
1
+ config['namespace'] = 'default'
2
+ config['periods'] = 3
3
+ config['length'] = 60
4
+
5
+ config['size'] = 1000
6
+ config['bits'] = 10
7
+ config['hashes'] = 7
8
+ config['seed'] = 30
@@ -0,0 +1,97 @@
1
+ require 'goliath'
2
+ require 'yajl'
3
+
4
+ require 'redis'
5
+ require 'redis/connection/synchrony'
6
+ require 'bloomfilter-rb'
7
+
8
+ require_relative 'mneme/helper' # resolve relative to this file; '.' is not on the load path in Ruby 1.9.2+
9
+ require_relative 'mneme/sweeper'
10
+
11
+ class Mneme < Goliath::API
12
+ include Mnemosyne::Helper
13
+ plugin Mnemosyne::Sweeper
14
+
15
+ use ::Rack::Reloader, 0 if Goliath.dev?
16
+
17
+ use Goliath::Rack::Params
18
+ use Goliath::Rack::DefaultMimeType
19
+ use Goliath::Rack::Formatters::JSON
20
+ use Goliath::Rack::Render
21
+ use Goliath::Rack::Heartbeat
22
+ use Goliath::Rack::ValidationError
23
+ use Goliath::Rack::Validation::RequestMethod, %w(GET POST)
24
+
25
+ def response(env)
26
+ keys = [params.delete('key') || params.delete('key[]')].flatten.compact
27
+ return [400, {}, {error: 'no key specified'}] if keys.empty?
28
+
29
+ logger.debug "Processing: #{keys}"
30
+ case env[Goliath::Request::REQUEST_METHOD]
31
+ when 'GET' then query_filters(keys)
32
+ when 'POST' then update_filters(keys)
33
+ end
34
+ end
35
+
36
+ def query_filters(keys)
37
+ found, missing = [], []
38
+ keys.each do |key|
39
+
40
+ present = false
41
+ config['periods'].to_i.times do |n|
42
+ if filter(n).key?(key)
43
+ present = true
44
+ break
45
+ end
46
+ end
47
+
48
+ if present
49
+ found << key
50
+ else
51
+ missing << key
52
+ end
53
+ end
54
+
55
+ code = case keys.size
56
+ when found.size then 200
57
+ when missing.size then 404
58
+ else 206
59
+ end
60
+
61
+ [code, {}, {found: found, missing: missing}]
62
+ end
63
+
64
+ def update_filters(keys)
65
+ keys.each do |key|
66
+ filter(0).insert key
67
+ logger.debug "Inserted new key: #{key}"
68
+ end
69
+
70
+ [201, {}, '']
71
+ end
72
+
73
+ private
74
+
75
+ def filter(n)
76
+ period = epoch_name(config['namespace'], n, config['length'])
77
+
78
+ filter = if env.key? period
79
+ env[period]
80
+ else
81
+ opts = {
82
+ namespace: period, # key each filter by its epoch name, matching the keys the sweeper deletes
83
+ size: config['size'] * config['bits'],
84
+ seed: config['seed'],
85
+ hashes: config['hashes']
86
+ }
87
+
88
+ # env[period] = EventMachine::Synchrony::ConnectionPool.new(size: 10) do
89
+ env[period] = BloomFilter::Redis.new(opts)
90
+ # end
91
+
92
+ env[period]
93
+ end
94
+
95
+ filter
96
+ end
97
+ end
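The lookup and status-code logic in `query_filters` can be condensed into a small sketch. It is an illustration under stated assumptions, not a drop-in replacement: `partition_keys` and `status_for` are hypothetical names, and plain `Set` objects stand in for the per-period Redis-backed filters.

```ruby
require 'set'

# Split keys into [found, missing]; a key counts as found if any of the
# per-period filters reports it as present.
def partition_keys(keys, filters)
  keys.partition { |key| filters.any? { |f| f.include?(key) } }
end

# Map lookup results to the status codes used by the service.
def status_for(found, missing)
  if missing.empty?
    200 # every key was found
  elsif found.empty?
    404 # no key was found
  else
    206 # some keys found, some missing
  end
end

# Stand-ins for the current period and one older period.
filters = [Set['abc3'], Set.new]
found, missing = partition_keys(%w[abc3 abc4], filters)
puts status_for(found, missing) # => 206
```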
@@ -0,0 +1,11 @@
1
+ module Mnemosyne
2
+ module Helper
3
+ def epoch(n, length)
4
+ (Time.now.to_i / length) - n
5
+ end
6
+
7
+ def epoch_name(namespace, n, length)
8
+ "mneme-#{namespace}-#{epoch(n, length)}"
9
+ end
10
+ end
11
+ end
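To see how epoch names roll over, here is the same arithmetic as the helper above with the clock passed in explicitly. `epoch_at` and `epoch_name_at` are hypothetical variants introduced only so the example is reproducible.

```ruby
# Same integer division as Mnemosyne::Helper#epoch, with an explicit clock.
def epoch_at(now, n, length)
  (now.to_i / length) - n
end

def epoch_name_at(now, namespace, n, length)
  "mneme-#{namespace}-#{epoch_at(now, n, length)}"
end

now = Time.at(180) # 180 seconds after the Unix epoch
3.times { |n| puts epoch_name_at(now, 'default', n, 60) }
# prints mneme-default-3, mneme-default-2, mneme-default-1
```

With `length = 60`, the current name stays the same for a full minute and then advances by one, which is what rotates the filters.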
@@ -0,0 +1,38 @@
1
+ module Mnemosyne
2
+ class Sweeper
3
+ include Helper
4
+
5
+ def initialize(port, config, status, logger)
6
+ @status = status
7
+ @config = config
8
+ @logger = logger
9
+ end
10
+
11
+ def run
12
+ if @config.empty?
13
+ puts "Please specify a valid mneme configuration file (ex: -c config.rb)"
14
+ EM.stop
15
+ exit
16
+ end
17
+
18
+ sweeper = Proc.new do
19
+ current = epoch_name(@config['namespace'], 0, @config['length'])
20
+ @logger.info "Sweeping old filters, current epoch: #{current}"
21
+
22
+ conn = Redis.new
23
+ @config['periods'].times do |n|
24
+ name = epoch_name(@config['namespace'], n + @config['periods'], @config['length'])
25
+
26
+ conn.del(name)
27
+ @logger.info "Removed: #{name}"
28
+ end
29
+ conn.client.disconnect
30
+ end
31
+
32
+ sweeper.call
33
+ EM.add_periodic_timer(@config['length']) { sweeper.call }
34
+
35
+ @logger.info "Started Mnemosyne::Sweeper with #{@config['length']}s interval"
36
+ end
37
+ end
38
+ end
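Which epochs the sweeper touches can be made explicit with a small sketch. The function names are hypothetical; the offsets mirror the `n + @config['periods']` expression in the sweeper above.

```ruby
# Epoch numbers still consulted by lookups: offsets 0...periods.
def live_epochs(current, periods)
  Array.new(periods) { |n| current - n }
end

# Epoch numbers deleted on each sweep: offsets periods...(2 * periods).
def swept_epochs(current, periods)
  Array.new(periods) { |n| current - (n + periods) }
end

current = 100
p live_epochs(current, 3)  # => [100, 99, 98]
p swept_epochs(current, 3) # => [97, 96, 95]
```

Because the sweep repeats every `length` seconds, each epoch falls into the swept window shortly after it leaves the live window.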
@@ -0,0 +1,30 @@
1
+ # -*- encoding: utf-8 -*-
2
+ $:.push File.expand_path("../lib", __FILE__)
3
+
4
+ Gem::Specification.new do |s|
5
+ s.name = "mneme"
6
+ s.version = "0.5.0"
7
+ s.platform = Gem::Platform::RUBY
8
+ s.authors = ["Ilya Grigorik"]
9
+ s.email = ["ilya@igvita.com"]
10
+ s.homepage = ""
11
+ s.summary = %q{abc}
12
+ s.description = %q{Write a gem description}
13
+
14
+ s.rubyforge_project = "mneme"
15
+
16
+ s.add_dependency "goliath"
17
+ s.add_dependency "hiredis"
18
+
19
+ s.add_dependency "redis"
20
+ s.add_dependency "yajl-ruby"
21
+ s.add_dependency "bloomfilter-rb"
22
+
23
+ s.add_development_dependency "rspec"
24
+ s.add_development_dependency "em-http-request", ">= 1.0.0.beta.3"
25
+
26
+ s.files = `git ls-files`.split("\n")
27
+ s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
28
+ s.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
29
+ s.require_paths = ["lib"]
30
+ end
@@ -0,0 +1,93 @@
1
+ require File.expand_path('../../lib/mneme', __FILE__)
2
+ require 'goliath/test_helper'
3
+ require 'em-http/middleware/json_response'
4
+
5
+ describe Mneme do
6
+ include Goliath::TestHelper
7
+
8
+ let(:err) { Proc.new { fail "API request failed" } }
9
+ let(:api_options) { { :config => File.expand_path(File.join(File.dirname(__FILE__), '..', 'config.rb')) } }
10
+
11
+ EventMachine::HttpRequest.use EventMachine::Middleware::JSONResponse
12
+
13
+ it 'responds to heartbeat' do
14
+ with_api(Mneme, api_options) do
15
+ get_request({path: '/status'}, err) do |c|
16
+ c.response.should match('OK')
17
+ end
18
+ end
19
+ end
20
+
21
+ it 'should return an error if no key is provided' do
22
+ with_api(Mneme, api_options) do
23
+ get_request({}, err) do |c|
24
+ c.response.should include 'error'
25
+ end
26
+ end
27
+ end
28
+
29
+ context 'single key' do
30
+ it 'should return 404 on missing key' do
31
+ with_api(Mneme, api_options) do
32
+ get_request({:query => {:key => 'missing'}}, err) do |c|
33
+ c.response_header.status.should == 404
34
+ c.response['missing'].should include 'missing'
35
+ end
36
+ end
37
+ end
38
+
39
+ it 'should insert key into filter' do
40
+ with_api(Mneme, api_options) do
41
+ post_request({:query => {key: 'abc'}}) do |c|
42
+ c.response_header.status.should == 201
43
+
44
+ get_request({:query => {:key => 'abc'}}, err) do |c|
45
+ c.response_header.status.should == 200
46
+ c.response['found'].should include 'abc'
47
+ end
48
+ end
49
+ end
50
+ end
51
+ end
52
+
53
+ context 'multiple keys' do
54
+
55
+ it 'should return 404 on missing keys' do
56
+ with_api(Mneme, api_options) do
57
+ get_request({:query => {:key => ['a', 'b']}}, err) do |c|
58
+ c.response_header.status.should == 404
59
+
60
+ c.response['found'].should be_empty
61
+ c.response['missing'].should include 'a'
62
+ c.response['missing'].should include 'b'
63
+ end
64
+ end
65
+ end
66
+
67
+ it 'should return 200 on found keys' do
68
+ with_api(Mneme, api_options) do
69
+ post_request({:query => {key: ['abc1', 'abc2']}}) do |c|
70
+ c.response_header.status.should == 201
71
+
72
+ get_request({:query => {:key => ['abc1', 'abc2']}}, err) do |c|
73
+ c.response_header.status.should == 200
74
+ end
75
+ end
76
+ end
77
+ end
78
+
79
+ it 'should return 206 on mixed keys' do
80
+ with_api(Mneme, api_options) do
81
+ post_request({:query => {key: ['abc3']}}) do |c|
82
+ c.response_header.status.should == 201
83
+
84
+ get_request({:query => {:key => ['abc3', 'abc4']}}, err) do |c|
85
+ c.response_header.status.should == 206
86
+ end
87
+ end
88
+ end
89
+ end
90
+
91
+ end
92
+
93
+ end
metadata ADDED
@@ -0,0 +1,144 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: mneme
3
+ version: !ruby/object:Gem::Version
4
+ prerelease:
5
+ version: 0.5.0
6
+ platform: ruby
7
+ authors:
8
+ - Ilya Grigorik
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+
13
+ date: 2011-03-24 00:00:00 -04:00
14
+ default_executable:
15
+ dependencies:
16
+ - !ruby/object:Gem::Dependency
17
+ name: goliath
18
+ prerelease: false
19
+ requirement: &id001 !ruby/object:Gem::Requirement
20
+ none: false
21
+ requirements:
22
+ - - ">="
23
+ - !ruby/object:Gem::Version
24
+ version: "0"
25
+ type: :runtime
26
+ version_requirements: *id001
27
+ - !ruby/object:Gem::Dependency
28
+ name: hiredis
29
+ prerelease: false
30
+ requirement: &id002 !ruby/object:Gem::Requirement
31
+ none: false
32
+ requirements:
33
+ - - ">="
34
+ - !ruby/object:Gem::Version
35
+ version: "0"
36
+ type: :runtime
37
+ version_requirements: *id002
38
+ - !ruby/object:Gem::Dependency
39
+ name: redis
40
+ prerelease: false
41
+ requirement: &id003 !ruby/object:Gem::Requirement
42
+ none: false
43
+ requirements:
44
+ - - ">="
45
+ - !ruby/object:Gem::Version
46
+ version: "0"
47
+ type: :runtime
48
+ version_requirements: *id003
49
+ - !ruby/object:Gem::Dependency
50
+ name: yajl-ruby
51
+ prerelease: false
52
+ requirement: &id004 !ruby/object:Gem::Requirement
53
+ none: false
54
+ requirements:
55
+ - - ">="
56
+ - !ruby/object:Gem::Version
57
+ version: "0"
58
+ type: :runtime
59
+ version_requirements: *id004
60
+ - !ruby/object:Gem::Dependency
61
+ name: bloomfilter-rb
62
+ prerelease: false
63
+ requirement: &id005 !ruby/object:Gem::Requirement
64
+ none: false
65
+ requirements:
66
+ - - ">="
67
+ - !ruby/object:Gem::Version
68
+ version: "0"
69
+ type: :runtime
70
+ version_requirements: *id005
71
+ - !ruby/object:Gem::Dependency
72
+ name: rspec
73
+ prerelease: false
74
+ requirement: &id006 !ruby/object:Gem::Requirement
75
+ none: false
76
+ requirements:
77
+ - - ">="
78
+ - !ruby/object:Gem::Version
79
+ version: "0"
80
+ type: :development
81
+ version_requirements: *id006
82
+ - !ruby/object:Gem::Dependency
83
+ name: em-http-request
84
+ prerelease: false
85
+ requirement: &id007 !ruby/object:Gem::Requirement
86
+ none: false
87
+ requirements:
88
+ - - ">="
89
+ - !ruby/object:Gem::Version
90
+ version: 1.0.0.beta.3
91
+ type: :development
92
+ version_requirements: *id007
93
+ description: Write a gem description
94
+ email:
95
+ - ilya@igvita.com
96
+ executables:
97
+ - mneme
98
+ extensions: []
99
+
100
+ extra_rdoc_files: []
101
+
102
+ files:
103
+ - .gitignore
104
+ - .rspec
105
+ - Gemfile
106
+ - README.md
107
+ - Rakefile
108
+ - bin/mneme
109
+ - config.rb
110
+ - lib/mneme.rb
111
+ - lib/mneme/helper.rb
112
+ - lib/mneme/sweeper.rb
113
+ - mneme.gemspec
114
+ - spec/mneme_spec.rb
115
+ has_rdoc: true
116
+ homepage: ""
117
+ licenses: []
118
+
119
+ post_install_message:
120
+ rdoc_options: []
121
+
122
+ require_paths:
123
+ - lib
124
+ required_ruby_version: !ruby/object:Gem::Requirement
125
+ none: false
126
+ requirements:
127
+ - - ">="
128
+ - !ruby/object:Gem::Version
129
+ version: "0"
130
+ required_rubygems_version: !ruby/object:Gem::Requirement
131
+ none: false
132
+ requirements:
133
+ - - ">="
134
+ - !ruby/object:Gem::Version
135
+ version: "0"
136
+ requirements: []
137
+
138
+ rubyforge_project: mneme
139
+ rubygems_version: 1.6.2
140
+ signing_key:
141
+ specification_version: 3
142
+ summary: abc
143
+ test_files:
144
+ - spec/mneme_spec.rb