pacer-bloomfilter 1.0.0

data/.document ADDED
@@ -0,0 +1,5 @@
1
+ lib/**/*.rb
2
+ bin/*
3
+ -
4
+ features/**/*.feature
5
+ LICENSE.txt
data/.rspec ADDED
@@ -0,0 +1 @@
1
+ --color
data/Gemfile ADDED
@@ -0,0 +1,9 @@
1
+ source "http://rubygems.org"
2
+ gem 'pacer', '>= 0.6.1'
3
+
4
+ group :development do
5
+ gem "bundler", "~> 1.0.0"
6
+ gem "jeweler", "~> 1.5.2"
7
+ gem "rspec", "~> 2.3.0"
8
+ gem "rcov", ">= 0"
9
+ end
data/LICENSE.txt ADDED
@@ -0,0 +1,20 @@
1
+ Copyright (c) 2011 Darrick Wiebe
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,121 @@
1
+ # Pacer BloomFilter plugin (pacer-bloomfilter)
2
+
3
+ This plugin adds set filtering using [bloom filters](http://en.wikipedia.org/wiki/Bloom_filter) to the [Pacer](https://github.com/pangloss/pacer) graph and
4
+ streaming data processing library.
5
+
6
+ This plugin is also meant to serve as an example of how easy it is to
7
+ build a plugin for Pacer.
8
+
9
+ ## Usage
10
+
11
+ The bloomfilter method is added to the core route object in Pacer,
12
+ which means that it will be available to all routes. The method takes 2
13
+ arguments and a block:
14
+
15
+ - false_positive_probability: between 0 and 1 with a lower number
16
+ indicating a lower chance of different keys being considered equal
17
+ - expected_count: the maximum number of elements you think will be added
18
+ to the bloom filter. The more accurate this number is the more
19
+ accurate your false_positive_probability will be.
20
+ - block: The block should map the elements that will be iterated over to
21
+ a value that will be used by the filter. You should return a string.
22
+ This block does not affect the actual output of the route.
23
+ If no block is given, to_s on the element itself will be used for the
24
+ filter.
25
+
26
+ ### Example
27
+
28
+ Map the vertices to names and then filter by name:
29
+
30
+ graph.v[:name].bloomfilter(0.001, 10).except(['sam', 'bob'])
31
+
32
+ "steve" "gary"
33
+ Total: 2
34
+ => #<GraphV -> Obj(name) -> Obj-Bloom>
35
+
36
+ Wrong! There is no way to map the vertices to the name, so all vertices
37
+ pass through:
38
+
39
+ graph.v.bloomfilter(0.001, 10).except(['sam', 'bob'])
40
+
41
+ #<V[0]> #<V[1]> #<V[2]> #<V[3]>
42
+ Total: 4
43
+ => #<GraphV -> V-Bloom>
44
+
45
+ That's better. Here we tell the bloomfilter how to map the vertices to
46
+ the name field (we switched it to #only though just to get more mileage
47
+ out of the example):
48
+
49
+ graph.v.bloomfilter(0.001, 10) { |v| v[:name] }.only(['sam', 'bob'])
50
+
51
+ #<V[0]> #<V[3]>
52
+ Total: 2
53
+ => #<GraphV -> V-Bloom>
54
+
55
+ And for completeness, the uniq method is, I hope, pretty self-explanatory:
56
+
57
+ graph.v[:type].bloomfilter(0.001, 10).uniq
58
+
59
+ "band member"
60
+ Total: 1
61
+ => #<GraphV -> Obj(type) -> Obj-Bloom>
62
+
63
+ ## Is it Fast?
64
+
65
+ I don't know. This plugin is currently only a proof of concept and has
66
+ not been optimized, profiled or benchmarked! If you want to spend a
67
+ couple of hours to make it blazing fast, I'll be quite impressed!
68
+
69
+ ## How Pacer Plugins Work
70
+
71
+ The plugin architecture of Pacer is very simple. You can define a module
72
+ in one of 3 namespaces which correspond to the categories of functions
73
+ that I've identified thus far:
74
+
75
+ - Pacer::Filter
76
+ - Pacer::Transform
77
+ - Pacer::SideEffect
78
+
79
+ Rather than try to explain those clearly, it is probably best to
80
+ dig into the code and documentation of Pacer, Pipes and Gremlin.
81
+
82
+ The module that you define will be mixed in to a Pacer::Route instance
83
+ at runtime when chain_route is called pointing to your module. Nearly
84
+ every method that is called when a user is building a route is actually
85
+ just a friendlier version of chain_route. The chain_route method takes a
86
+ hash of arguments, a few of which are reserved, and the rest of which
87
+ will be applied to your module via property setters. Note that in this
88
+ plugin, :false_pos_prob, :expected_count, :bloomfilter, and :block all
89
+ have corresponding attr_accessors in the BloomFilter module. In that
90
+ way, settings are carried over from the route definition to the route
91
+ instance.
92
+
93
+ Once the route is defined, it may be executed multiple times. Whenever
94
+ it is executed, a new pipeline is built, of which your module is just
95
+ one part. In almost all cases, all you will need to do is define the
96
+ protected attach_pipe method. Inside that method you just need to create
97
+ your pipe, call setStarts on it with the pipe that was given as a
98
+ parameter, and return the pipe you created. Pacer will look after the
99
+ rest of the pipe building process.
100
+
101
+ ## Contributing to pacer-bloomfilter
102
+
103
+ * Check out the latest master to make sure the feature hasn't been
104
+ implemented or the bug hasn't been fixed yet
105
+ * Check out the issue tracker to make sure someone hasn't already
106
+ requested it and/or contributed it
107
+ * Fork the project
108
+ * Start a feature/bugfix branch
109
+ * Commit and push until you are happy with your contribution
110
+ * Make sure to add tests for it. This is important so I don't break it
111
+ in a future version unintentionally.
112
+ * Please try not to mess with the Rakefile, version, or history. If you
113
+ want to have your own version, or it is otherwise necessary, that is
114
+ fine, but please isolate it to its own commit so I can cherry-pick around
115
+ it.
116
+
117
+ ## Copyright
118
+
119
+ Copyright (c) 2011 Darrick Wiebe. See LICENSE.txt for
120
+ further details.
121
+
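A note on the two numeric arguments described in the README: the sketch below shows how a false-positive probability and an expected element count typically determine a Bloom filter's size. It is a minimal pure-Ruby illustration using the standard sizing formulas, not the `com.skjegstad.utils.BloomFilter` class that the plugin loads from `vendor/java-bloomfilter.jar`; the class name `SketchBloomFilter` and its methods are assumptions made for this example.

```ruby
require 'digest'

# Illustrative only: a tiny Bloom filter sized from the same two inputs the
# plugin takes. The vendored Java implementation may size itself differently.
class SketchBloomFilter
  attr_reader :bit_count, :hash_count

  def initialize(false_positive_probability, expected_count)
    # Standard formulas: m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2
    @bit_count = (-expected_count * Math.log(false_positive_probability) /
                  (Math.log(2)**2)).ceil
    @hash_count = [(@bit_count.to_f / expected_count * Math.log(2)).round, 1].max
    @bits = Array.new(@bit_count, false)
  end

  def add(key)
    positions(key).each { |i| @bits[i] = true }
    self
  end

  def add_all(keys)
    keys.each { |key| add(key) }
    self
  end

  # May return true for a key that was never added (a false positive),
  # but never returns false for a key that was added.
  def contains?(key)
    positions(key).all? { |i| @bits[i] }
  end

  private

  # Derive hash_count bit positions from one SHA-256 digest (double hashing).
  def positions(key)
    h1, h2 = Digest::SHA256.digest(key.to_s).unpack('Q>2')
    (0...@hash_count).map { |i| (h1 + i * h2) % @bit_count }
  end
end

filter = SketchBloomFilter.new(0.001, 10).add_all(%w[sam bob])
puts "bits: #{filter.bit_count}, hashes: #{filter.hash_count}"
puts filter.contains?('sam')
```

Adding many more elements than `expected_count` fills the bit array and pushes the real false-positive rate above the requested one, which is why the README asks for an accurate estimate.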
data/Rakefile ADDED
@@ -0,0 +1,50 @@
1
+ require 'rubygems'
2
+ require 'bundler'
3
+ begin
4
+ Bundler.setup(:default, :development)
5
+ rescue Bundler::BundlerError => e
6
+ $stderr.puts e.message
7
+ $stderr.puts "Run `bundle install` to install missing gems"
8
+ exit e.status_code
9
+ end
10
+ require 'rake'
11
+
12
+ require 'jeweler'
13
+ Jeweler::Tasks.new do |gem|
14
+ # gem is a Gem::Specification... see http://docs.rubygems.org/read/chapter/20 for more options
15
+ gem.name = "pacer-bloomfilter"
16
+ gem.homepage = "http://github.com/pangloss/pacer-bloomfilter"
17
+ gem.license = "MIT"
18
+ gem.summary = %Q{Filter object streams in Pacer by using a bloom filter}
19
+ gem.description = %Q{Bloom filters are fast, compact, probabilistic data structures that allow set filtering with a configurable rate of false positives. This plugin adds .bloomfilter.uniq, .bloomfilter.only([collection]), and .bloomfilter.except([collection]) to the available route methods in Pacer.}
20
+ gem.email = "darrick@innatesoftware.com"
21
+ gem.authors = ["Darrick Wiebe"]
22
+ # Include your dependencies below. Runtime dependencies are required when using your gem,
23
+ # and development dependencies are only needed for development (ie running rake tasks, tests, etc)
24
+ gem.add_runtime_dependency 'pacer', '>= 0.6.1'
25
+ # gem.add_development_dependency 'rspec', '> 1.2.3'
26
+ end
27
+ Jeweler::RubygemsDotOrgTasks.new
28
+
29
+ require 'rspec/core'
30
+ require 'rspec/core/rake_task'
31
+ RSpec::Core::RakeTask.new(:spec) do |spec|
32
+ spec.pattern = FileList['spec/**/*_spec.rb']
33
+ end
34
+
35
+ RSpec::Core::RakeTask.new(:rcov) do |spec|
36
+ spec.pattern = 'spec/**/*_spec.rb'
37
+ spec.rcov = true
38
+ end
39
+
40
+ task :default => :spec
41
+
42
+ require 'rake/rdoctask'
43
+ Rake::RDocTask.new do |rdoc|
44
+ version = File.exist?('VERSION') ? File.read('VERSION') : ""
45
+
46
+ rdoc.rdoc_dir = 'rdoc'
47
+ rdoc.title = "pacer-bloomfilter #{version}"
48
+ rdoc.rdoc_files.include('README*')
49
+ rdoc.rdoc_files.include('lib/**/*.rb')
50
+ end
data/VERSION ADDED
@@ -0,0 +1 @@
1
+ 1.0.0
data/lib/pacer-bloomfilter.rb ADDED
@@ -0,0 +1,21 @@
1
+ module PacerBloomFilter
2
+ unless const_defined? :VERSION
3
+ PATH = File.expand_path(File.join(File.dirname(__FILE__), '..'))
4
+ VERSION = File.read(PATH + '/VERSION').chomp
5
+
6
+ $: << File.dirname(__FILE__)
7
+ require File.dirname(__FILE__) + '/../vendor/java-bloomfilter.jar'
8
+ end
9
+
10
+ def self.reload!
11
+ Dir[File.dirname(__FILE__) + '/**/*.rb'].each do |file|
12
+ puts file
13
+ load file
14
+ end
15
+ nil
16
+ end
17
+ end
18
+
19
+ require 'pacer/pipe/bloomfilter_reject'
20
+ require 'pacer/filter/bloomfilter'
21
+
data/lib/pacer/filter/bloomfilter.rb ADDED
@@ -0,0 +1,77 @@
1
+ module Pacer
2
+ module Core
3
+ module Route
4
+ def bloomfilter(false_pos_prob, expected_count, opts = {}, &block)
5
+ chain_route :filter => :bloom,
6
+ :false_pos_prob => false_pos_prob,
7
+ :expected_count => expected_count,
8
+ :bloomfilter => opts[:bloomfilter],
9
+ :block => block
10
+ end
11
+ end
12
+ end
13
+
14
+ module Filter
15
+ module BloomFilter
16
+ attr_accessor :false_pos_prob, :expected_count, :block, :bloomfilter
17
+
18
+ def uniq
19
+ @except ||= []
20
+ @uniq = true
21
+ self
22
+ end
23
+
24
+ def except(others)
25
+ @except ||= []
26
+ @except << others
27
+ self
28
+ end
29
+
30
+ def only(others)
31
+ @only ||= []
32
+ @only << others
33
+ self
34
+ end
35
+
36
+ protected
37
+
38
+ def attach_pipe(pipe)
39
+ pipe = except_pipe(pipe) if @except
40
+ pipe = only_pipe(pipe) if @only
41
+ pipe
42
+ end
43
+
44
+ private
45
+
46
+ def except_pipe(pipe)
47
+ bfp = Pacer::Pipes::BloomFilter::RejectPipe.new false_pos_prob, expected_count, sideline_pipe
48
+ bfp.accumulate if @uniq
49
+ prepare_pipe(bfp, @except, pipe)
50
+ end
51
+
52
+ def only_pipe(pipe)
53
+ bfp = Pacer::Pipes::BloomFilter::SelectPipe.new false_pos_prob, expected_count, sideline_pipe
54
+ prepare_pipe(bfp, @only, pipe)
55
+ end
56
+
57
+ def prepare_pipe(bfp, all_items, pipe)
58
+ bfp.filter = bloomfilter if bloomfilter
59
+ all_items.each do |items|
60
+ if items.is_a? Enumerable
61
+ bfp.addAll items
62
+ else
63
+ bfp.addAll [items]
64
+ end
65
+ end
66
+ bfp.setStarts pipe if pipe
67
+ bfp
68
+ end
69
+
70
+ def sideline_pipe
71
+ if block
72
+ Pacer::Route.pipeline Pacer::Route.empty(self).map(&block)
73
+ end
74
+ end
75
+ end
76
+ end
77
+ end
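The README says that `chain_route` mixes the named module into a route and applies the remaining hash entries through property setters. The sketch below imitates that mechanism in plain Ruby so it can be read without Pacer installed. Every name in it (`SketchRoute`, `SketchFilters`, the `RESERVED` list) is hypothetical, and Pacer's real `chain_route` is likely to handle more reserved keys and more cases than this.

```ruby
# Hypothetical stand-ins; none of these are Pacer's real classes.
module SketchFilters
  module Bloom
    attr_accessor :false_pos_prob, :expected_count, :block
  end
end

class SketchRoute
  # Keys consumed by chain_route itself rather than passed to the module.
  RESERVED = [:filter].freeze

  # Look up the module named by :filter, mix it into a new route instance,
  # then hand every other key to the matching setter on that instance.
  def chain_route(args)
    mod = SketchFilters.const_get(args.fetch(:filter).to_s.capitalize)
    route = SketchRoute.new.extend(mod)
    args.each do |key, value|
      next if RESERVED.include?(key)
      route.public_send("#{key}=", value)
    end
    route
  end
end

route = SketchRoute.new.chain_route(:filter => :bloom,
                                    :false_pos_prob => 0.001,
                                    :expected_count => 10,
                                    :block => proc { |v| v[:name] })
puts route.false_pos_prob
puts route.expected_count
```

This is why the `BloomFilter` module above declares an `attr_accessor` for each key that `bloomfilter` passes: a key without a setter would have nowhere to go.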
data/lib/pacer/pipe/bloomfilter_reject.rb ADDED
@@ -0,0 +1,103 @@
1
+ module Pacer
2
+ module Pipes
3
+ module BloomFilter
4
+ class SideliningPipe < AbstractPipe
5
+ def initialize(pipe)
6
+ super()
7
+ if pipe
8
+ @sideline = pipe
9
+ @sidelineExpando = ExpandableIterator.new(java.util.ArrayList.new.iterator)
10
+ @sideline.setStarts(@sidelineExpando)
11
+ end
12
+ end
13
+
14
+ protected
15
+
16
+ def sidelineValue(value)
17
+ if @sideline
18
+ @sideline.reset
19
+ @sidelineExpando.add value
20
+ @sideline.next
21
+ else
22
+ value
23
+ end
24
+ rescue NativeException => e
25
+ if e.cause.getClass == Pacer::NoSuchElementException.getClass
26
+ nil
27
+ else
28
+ raise e
29
+ end
30
+ end
31
+ end
32
+
33
+ class RejectPipe < SideliningPipe
34
+ import com.skjegstad.utils.BloomFilter
35
+ field_accessor :starts
36
+ attr_accessor :filter
37
+
38
+ def initialize(false_pos_prob, expected_count, sideline_pipe = nil)
39
+ super(sideline_pipe)
40
+ @filter = BloomFilter.new(false_pos_prob, expected_count)
41
+ end
42
+
43
+ def addAll(elements)
44
+ @filter.addAll(elements)
45
+ end
46
+
47
+ def accumulate
48
+ @accumulate = true
49
+ end
50
+
51
+ protected
52
+
53
+ def processNextStart()
54
+ while raw_element = starts.next
55
+ value = sidelineValue(raw_element)
56
+ unless @filter.contains? value.to_s
57
+ @filter.add(value.to_s) if @accumulate and value
58
+ return raw_element
59
+ end
60
+ end
61
+ rescue NativeException => e
62
+ if e.cause.getClass == Pacer::NoSuchElementException.getClass
63
+ raise e.cause
64
+ else
65
+ raise e
66
+ end
67
+ end
68
+ end
69
+
70
+ class SelectPipe < SideliningPipe
71
+ import com.skjegstad.utils.BloomFilter
72
+ field_accessor :starts
73
+ attr_accessor :filter
74
+
75
+ def initialize(false_pos_prob, expected_count, sideline_pipe = nil)
76
+ super(sideline_pipe)
77
+ @filter = BloomFilter.new(false_pos_prob, expected_count)
78
+ end
79
+
80
+ def addAll(elements)
81
+ @filter.addAll(elements)
82
+ end
83
+
84
+ protected
85
+
86
+ def processNextStart()
87
+ while raw_element = starts.next
88
+ value = sidelineValue(raw_element)
89
+ if @filter.contains? value.to_s
90
+ return raw_element
91
+ end
92
+ end
93
+ rescue NativeException => e
94
+ if e.cause.getClass == Pacer::NoSuchElementException.getClass
95
+ raise e.cause
96
+ else
97
+ raise e
98
+ end
99
+ end
100
+ end
101
+ end
102
+ end
103
+ end
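The control flow of `RejectPipe` and `SelectPipe` above can be sketched without JRuby or Pipes. In this plain-Ruby approximation a `Set` stands in for the Bloom filter, so it never produces the false positives the real pipes can, and the method name `sketch_filter` and its `mode` argument are invented for the example. It also skips the nil check that `RejectPipe` applies before accumulating a value.

```ruby
require 'set'

# Approximation of the two pipes' loops. The optional block plays the role of
# the sideline pipe: it maps each element to the key that is tested, while the
# original element is what gets emitted.
def sketch_filter(elements, listed, mode, accumulate = false, &block)
  seen = Set.new(listed.map(&:to_s))
  elements.select do |raw|
    key = (block ? block.call(raw) : raw).to_s
    hit = seen.include?(key)
    if mode == :only
      hit                                   # SelectPipe: emit listed keys only
    else
      seen.add(key) if accumulate && !hit   # uniq: remember what was emitted
      !hit                                  # RejectPipe: emit unlisted keys
    end
  end
end

names = %w[sam steve gary bob steve]
p sketch_filter(names, %w[sam bob], :except)
p sketch_filter(names, %w[sam bob], :only)
p sketch_filter(names, [], :except, true)
```

The last call corresponds to `uniq`: it is a reject filter that starts empty and adds each emitted key, so later duplicates are dropped.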
data/spec/pacer-bloomfilter_spec.rb ADDED
@@ -0,0 +1,7 @@
1
+ require File.expand_path(File.dirname(__FILE__) + '/spec_helper')
2
+
3
+ describe "PacerBloomfilter" do
4
+ it "fails" do
5
+ fail "hey buddy, you should probably rename this file and start specing for real"
6
+ end
7
+ end
data/spec/spec_helper.rb ADDED
@@ -0,0 +1,12 @@
1
+ $LOAD_PATH.unshift(File.join(File.dirname(__FILE__), '..', 'lib'))
2
+ $LOAD_PATH.unshift(File.dirname(__FILE__))
3
+ require 'rspec'
4
+ require 'pacer-bloomfilter'
5
+
6
+ # Requires supporting files with custom matchers and macros, etc,
7
+ # in ./support/ and its subdirectories.
8
+ Dir["#{File.dirname(__FILE__)}/support/**/*.rb"].each {|f| require f}
9
+
10
+ RSpec.configure do |config|
11
+
12
+ end
metadata ADDED
@@ -0,0 +1,138 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: pacer-bloomfilter
3
+ version: !ruby/object:Gem::Version
4
+ prerelease:
5
+ version: 1.0.0
6
+ platform: ruby
7
+ authors:
8
+ - Darrick Wiebe
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+
13
+ date: 2011-03-29 00:00:00 -04:00
14
+ default_executable:
15
+ dependencies:
16
+ - !ruby/object:Gem::Dependency
17
+ name: pacer
18
+ version_requirements: &id001 !ruby/object:Gem::Requirement
19
+ none: false
20
+ requirements:
21
+ - - ">="
22
+ - !ruby/object:Gem::Version
23
+ version: 0.6.1
24
+ requirement: *id001
25
+ prerelease: false
26
+ type: :runtime
27
+ - !ruby/object:Gem::Dependency
28
+ name: bundler
29
+ version_requirements: &id002 !ruby/object:Gem::Requirement
30
+ none: false
31
+ requirements:
32
+ - - ~>
33
+ - !ruby/object:Gem::Version
34
+ version: 1.0.0
35
+ requirement: *id002
36
+ prerelease: false
37
+ type: :development
38
+ - !ruby/object:Gem::Dependency
39
+ name: jeweler
40
+ version_requirements: &id003 !ruby/object:Gem::Requirement
41
+ none: false
42
+ requirements:
43
+ - - ~>
44
+ - !ruby/object:Gem::Version
45
+ version: 1.5.2
46
+ requirement: *id003
47
+ prerelease: false
48
+ type: :development
49
+ - !ruby/object:Gem::Dependency
50
+ name: rspec
51
+ version_requirements: &id004 !ruby/object:Gem::Requirement
52
+ none: false
53
+ requirements:
54
+ - - ~>
55
+ - !ruby/object:Gem::Version
56
+ version: 2.3.0
57
+ requirement: *id004
58
+ prerelease: false
59
+ type: :development
60
+ - !ruby/object:Gem::Dependency
61
+ name: rcov
62
+ version_requirements: &id005 !ruby/object:Gem::Requirement
63
+ none: false
64
+ requirements:
65
+ - - ">="
66
+ - !ruby/object:Gem::Version
67
+ version: "0"
68
+ requirement: *id005
69
+ prerelease: false
70
+ type: :development
71
+ - !ruby/object:Gem::Dependency
72
+ name: pacer
73
+ version_requirements: &id006 !ruby/object:Gem::Requirement
74
+ none: false
75
+ requirements:
76
+ - - ">="
77
+ - !ruby/object:Gem::Version
78
+ version: 0.6.1
79
+ requirement: *id006
80
+ prerelease: false
81
+ type: :runtime
82
+ description: Bloom filters are fast, compact, probabilistic data structures that allow set filtering with a configurable rate of false positives. This plugin adds .bloomfilter.uniq, .bloomfilter.only([collection]), and .bloomfilter.except([collection]) to the available route methods in Pacer.
83
+ email: darrick@innatesoftware.com
84
+ executables: []
85
+
86
+ extensions: []
87
+
88
+ extra_rdoc_files:
89
+ - LICENSE.txt
90
+ - README.md
91
+ files:
92
+ - .document
93
+ - .rspec
94
+ - Gemfile
95
+ - LICENSE.txt
96
+ - README.md
97
+ - Rakefile
98
+ - VERSION
99
+ - lib/pacer-bloomfilter.rb
100
+ - lib/pacer/filter/bloomfilter.rb
101
+ - lib/pacer/pipe/bloomfilter_reject.rb
102
+ - spec/pacer-bloomfilter_spec.rb
103
+ - spec/spec_helper.rb
104
+ - vendor/java-bloomfilter.jar
105
+ has_rdoc: true
106
+ homepage: http://github.com/pangloss/pacer-bloomfilter
107
+ licenses:
108
+ - MIT
109
+ post_install_message:
110
+ rdoc_options: []
111
+
112
+ require_paths:
113
+ - lib
114
+ required_ruby_version: !ruby/object:Gem::Requirement
115
+ none: false
116
+ requirements:
117
+ - - ">="
118
+ - !ruby/object:Gem::Version
119
+ hash: 2
120
+ segments:
121
+ - 0
122
+ version: "0"
123
+ required_rubygems_version: !ruby/object:Gem::Requirement
124
+ none: false
125
+ requirements:
126
+ - - ">="
127
+ - !ruby/object:Gem::Version
128
+ version: "0"
129
+ requirements: []
130
+
131
+ rubyforge_project:
132
+ rubygems_version: 1.5.1
133
+ signing_key:
134
+ specification_version: 3
135
+ summary: Filter object streams in Pacer by using a bloom filter
136
+ test_files:
137
+ - spec/pacer-bloomfilter_spec.rb
138
+ - spec/spec_helper.rb