bloomfilter 0.1.0
Sign up to get free protection for your applications and to get access to all the features.
- data/History.txt +4 -0
- data/License.txt +20 -0
- data/Manifest.txt +16 -0
- data/README.txt +37 -0
- data/Rakefile +123 -0
- data/lib/bloomfilter/version.rb +9 -0
- data/lib/bloomfilter.rb +6 -0
- data/scripts/txt2html +67 -0
- data/setup.rb +1585 -0
- data/test/test_bloomfilter.rb +43 -0
- data/test/test_external_bloom_filter.rb +44 -0
- data/test/test_helper.rb +2 -0
- data/website/index.html +92 -0
- data/website/index.txt +38 -0
- data/website/javascripts/rounded_corners_lite.inc.js +285 -0
- data/website/stylesheets/screen.css +138 -0
- data/website/template.rhtml +48 -0
- metadata +68 -0
data/History.txt
ADDED
data/License.txt
ADDED
@@ -0,0 +1,20 @@
|
|
1
|
+
Copyright (c) 2007 FIXME full name
|
2
|
+
|
3
|
+
Permission is hereby granted, free of charge, to any person obtaining
|
4
|
+
a copy of this software and associated documentation files (the
|
5
|
+
"Software"), to deal in the Software without restriction, including
|
6
|
+
without limitation the rights to use, copy, modify, merge, publish,
|
7
|
+
distribute, sublicense, and/or sell copies of the Software, and to
|
8
|
+
permit persons to whom the Software is furnished to do so, subject to
|
9
|
+
the following conditions:
|
10
|
+
|
11
|
+
The above copyright notice and this permission notice shall be
|
12
|
+
included in all copies or substantial portions of the Software.
|
13
|
+
|
14
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
15
|
+
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
16
|
+
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
17
|
+
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
|
18
|
+
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
|
19
|
+
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
|
20
|
+
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
data/Manifest.txt
ADDED
@@ -0,0 +1,16 @@
|
|
1
|
+
History.txt
|
2
|
+
License.txt
|
3
|
+
Manifest.txt
|
4
|
+
README.txt
|
5
|
+
Rakefile
|
6
|
+
lib/bloomfilter.rb
|
7
|
+
lib/bloomfilter/version.rb
|
8
|
+
scripts/txt2html
|
9
|
+
setup.rb
|
10
|
+
test/test_bloomfilter.rb
|
11
|
+
test/test_helper.rb
|
12
|
+
website/index.html
|
13
|
+
website/index.txt
|
14
|
+
website/javascripts/rounded_corners_lite.inc.js
|
15
|
+
website/stylesheets/screen.css
|
16
|
+
website/template.rhtml
|
data/README.txt
ADDED
@@ -0,0 +1,37 @@
|
|
1
|
+
README for bloomfilter
|
2
|
+
======================
|
3
|
+
|
4
|
+
This gem contains two Ruby Bloom filter implementations. A discussion of Bloom filters in general is beyond the scope of this document, but from a high level, the point of a Bloom filter is to compactly represent a set of items so that you can avoid doing a costly full-set search if you're confident that the item isn't in the set.
|
5
|
+
|
6
|
+
This library includes two implementations: BloomFilter, which is a fast, in-memory implementation; and ExternalBloomFilter, which is a file-based implementation. When you will be doing many searches of the filter, you should use the BloomFilter. If you are memory constrained or will be using the filter sparingly, use the ExternalBloomFilter.
|
7
|
+
|
8
|
+
BloomFilter
|
9
|
+
------------
|
10
|
+
To create a new filter, simply use
|
11
|
+
|
12
|
+
filter = BloomFilter.new(bit_array_size)
|
13
|
+
|
14
|
+
and then you can add to it by
|
15
|
+
|
16
|
+
filter.add "item"
|
17
|
+
|
18
|
+
and query it by
|
19
|
+
|
20
|
+
filter.include? "item" #=> true
|
21
|
+
|
22
|
+
If it is non-trivial to build your filter in the first place, you'll likely want to persist it. Use the BloomFilter.save(path) method to write the bit field out to a file. To load a saved filter, use BloomFilter.load(path). This file format is compatible with ExternalBloomFilter as well.
|
23
|
+
|
24
|
+
|
25
|
+
ExternalBloomFilter
|
26
|
+
--------------------
|
27
|
+
ExternalBloomFilter works by keeping the bit array on disk and only reading 5 bytes per lookup and reading and writing 5 bytes each per addition. This makes it very memory efficient (read: nearly zero).
|
28
|
+
|
29
|
+
To use ExternalBloomFilter, the only difference is that instead of specifying a size, you specify a path. Here's how you create a new filter:
|
30
|
+
|
31
|
+
filter = ExternalBloomFilter.create(path, size_of_bit_array)
|
32
|
+
|
33
|
+
Here's how you load an existing filter:
|
34
|
+
|
35
|
+
filter = ExternalBloomFilter.new(path)
|
36
|
+
|
37
|
+
After you have a valid filter, you add to it and query it the same as the BloomFilter class. Note that any changes you make to the filter by adding to it will persist immediately to disk. There is no undo with the External implementation.
|
data/Rakefile
ADDED
@@ -0,0 +1,123 @@
|
|
1
|
+
require 'rubygems'
|
2
|
+
require 'rake'
|
3
|
+
require 'rake/clean'
|
4
|
+
require 'rake/testtask'
|
5
|
+
require 'rake/packagetask'
|
6
|
+
require 'rake/gempackagetask'
|
7
|
+
require 'rake/rdoctask'
|
8
|
+
require 'rake/contrib/rubyforgepublisher'
|
9
|
+
require 'fileutils'
|
10
|
+
require 'hoe'
|
11
|
+
|
12
|
+
include FileUtils
|
13
|
+
require File.join(File.dirname(__FILE__), 'lib', 'bloomfilter', 'version')
|
14
|
+
|
15
|
+
AUTHOR = 'Bryan Duxbury' # can also be an array of Authors
|
16
|
+
EMAIL = "bryan.duxbury@gmail.com"
|
17
|
+
DESCRIPTION = "Two Bloom filter implementations."
|
18
|
+
GEM_NAME = 'bloomfilter' # what ppl will type to install your gem
|
19
|
+
|
20
|
+
@config_file = "~/.rubyforge/user-config.yml"
|
21
|
+
@config = nil
|
22
|
+
def rubyforge_username
|
23
|
+
unless @config
|
24
|
+
begin
|
25
|
+
@config = YAML.load(File.read(File.expand_path(@config_file)))
|
26
|
+
rescue
|
27
|
+
puts <<-EOS
|
28
|
+
ERROR: No rubyforge config file found: #{@config_file}"
|
29
|
+
Run 'rubyforge setup' to prepare your env for access to Rubyforge
|
30
|
+
- See http://newgem.rubyforge.org/rubyforge.html for more details
|
31
|
+
EOS
|
32
|
+
exit
|
33
|
+
end
|
34
|
+
end
|
35
|
+
@rubyforge_username ||= @config["username"]
|
36
|
+
end
|
37
|
+
|
38
|
+
RUBYFORGE_PROJECT = 'rapleaf' # The unix name for your project
|
39
|
+
HOMEPATH = "http://#{RUBYFORGE_PROJECT}.rubyforge.org"
|
40
|
+
DOWNLOAD_PATH = "http://rubyforge.org/projects/#{RUBYFORGE_PROJECT}"
|
41
|
+
|
42
|
+
NAME = "bloomfilter"
|
43
|
+
REV = nil
|
44
|
+
# UNCOMMENT IF REQUIRED:
|
45
|
+
# REV = `svn info`.each {|line| if line =~ /^Revision:/ then k,v = line.split(': '); break v.chomp; else next; end} rescue nil
|
46
|
+
VERS = Bloomfilter::VERSION::STRING + (REV ? ".#{REV}" : "")
|
47
|
+
CLEAN.include ['**/.*.sw?', '*.gem', '.config', '**/.DS_Store']
|
48
|
+
RDOC_OPTS = ['--quiet', '--title', 'bloomfilter documentation',
|
49
|
+
"--opname", "index.html",
|
50
|
+
"--line-numbers",
|
51
|
+
"--main", "README",
|
52
|
+
"--inline-source"]
|
53
|
+
|
54
|
+
class Hoe
|
55
|
+
def extra_deps
|
56
|
+
@extra_deps.reject { |x| Array(x).first == 'hoe' }
|
57
|
+
end
|
58
|
+
end
|
59
|
+
|
60
|
+
# Generate all the Rake tasks
|
61
|
+
# Run 'rake -T' to see list of generated tasks (from gem root directory)
|
62
|
+
hoe = Hoe.new(GEM_NAME, VERS) do |p|
|
63
|
+
p.author = AUTHOR
|
64
|
+
p.description = DESCRIPTION
|
65
|
+
p.email = EMAIL
|
66
|
+
p.summary = DESCRIPTION
|
67
|
+
p.url = HOMEPATH
|
68
|
+
p.rubyforge_name = RUBYFORGE_PROJECT if RUBYFORGE_PROJECT
|
69
|
+
p.test_globs = ["test/**/test_*.rb"]
|
70
|
+
p.clean_globs |= CLEAN #An array of file patterns to delete on clean.
|
71
|
+
|
72
|
+
# == Optional
|
73
|
+
p.changes = p.paragraphs_of("History.txt", 0..1).join("\n\n")
|
74
|
+
#p.extra_deps = [] # An array of rubygem dependencies [name, version], e.g. [ ['active_support', '>= 1.3.1'] ]
|
75
|
+
#p.spec_extras = {} # A hash of extra values to set in the gemspec.
|
76
|
+
end
|
77
|
+
|
78
|
+
CHANGES = hoe.paragraphs_of('History.txt', 0..1).join("\n\n")
|
79
|
+
PATH = (RUBYFORGE_PROJECT == GEM_NAME) ? RUBYFORGE_PROJECT : "#{RUBYFORGE_PROJECT}/#{GEM_NAME}"
|
80
|
+
hoe.remote_rdoc_dir = File.join(PATH.gsub(/^#{RUBYFORGE_PROJECT}\/?/,''), 'rdoc')
|
81
|
+
|
82
|
+
desc 'Generate website files'
|
83
|
+
task :website_generate do
|
84
|
+
Dir['website/**/*.txt'].each do |txt|
|
85
|
+
sh %{ ruby scripts/txt2html #{txt} > #{txt.gsub(/txt$/,'html')} }
|
86
|
+
end
|
87
|
+
end
|
88
|
+
|
89
|
+
desc 'Upload website files to rubyforge'
|
90
|
+
task :website_upload do
|
91
|
+
host = "#{rubyforge_username}@rubyforge.org"
|
92
|
+
remote_dir = "/var/www/gforge-projects/#{PATH}/"
|
93
|
+
local_dir = 'website'
|
94
|
+
sh %{rsync -aCv #{local_dir}/ #{host}:#{remote_dir}}
|
95
|
+
end
|
96
|
+
|
97
|
+
desc 'Generate and upload website files'
|
98
|
+
task :website => [:website_generate, :website_upload, :publish_docs]
|
99
|
+
|
100
|
+
desc 'Release the website and new gem version'
|
101
|
+
task :deploy => [:check_version, :website, :release] do
|
102
|
+
puts "Remember to create SVN tag:"
|
103
|
+
puts "svn copy svn+ssh://#{rubyforge_username}@rubyforge.org/var/svn/#{PATH}/trunk " +
|
104
|
+
"svn+ssh://#{rubyforge_username}@rubyforge.org/var/svn/#{PATH}/tags/REL-#{VERS} "
|
105
|
+
puts "Suggested comment:"
|
106
|
+
puts "Tagging release #{CHANGES}"
|
107
|
+
end
|
108
|
+
|
109
|
+
desc 'Runs tasks website_generate and install_gem as a local deployment of the gem'
|
110
|
+
task :local_deploy => [:website_generate, :install_gem]
|
111
|
+
|
112
|
+
task :check_version do
|
113
|
+
unless ENV['VERSION']
|
114
|
+
puts 'Must pass a VERSION=x.y.z release version'
|
115
|
+
exit
|
116
|
+
end
|
117
|
+
unless ENV['VERSION'] == VERS
|
118
|
+
puts "Please update your version.rb to match the release version, currently #{VERS}"
|
119
|
+
exit
|
120
|
+
end
|
121
|
+
end
|
122
|
+
|
123
|
+
|
data/lib/bloomfilter.rb
ADDED
@@ -0,0 +1,6 @@
|
|
1
|
+
module Bloomfilter
|
2
|
+
end
|
3
|
+
|
4
|
+
require File.expand_path(File.dirname(__FILE__)) + '/bloomfilter/version'
|
5
|
+
require File.expand_path(File.dirname(__FILE__)) + "/bloomfilter/bloomfilter"
|
6
|
+
require File.expand_path(File.dirname(__FILE__)) + "/bloomfilter/external_bloom_filter"
|
data/scripts/txt2html
ADDED
@@ -0,0 +1,67 @@
|
|
1
|
+
#!/usr/bin/env ruby
|
2
|
+
|
3
|
+
require 'rubygems'
|
4
|
+
require 'redcloth'
|
5
|
+
require 'syntax/convertors/html'
|
6
|
+
require 'erb'
|
7
|
+
require File.dirname(__FILE__) + '/../lib/bloomfilter/version.rb'
|
8
|
+
|
9
|
+
version = Bloomfilter::VERSION::STRING
|
10
|
+
download = 'http://rubyforge.org/projects/bloomfilter'
|
11
|
+
|
12
|
+
class Fixnum
|
13
|
+
def ordinal
|
14
|
+
# teens
|
15
|
+
return 'th' if (10..19).include?(self % 100)
|
16
|
+
# others
|
17
|
+
case self % 10
|
18
|
+
when 1: return 'st'
|
19
|
+
when 2: return 'nd'
|
20
|
+
when 3: return 'rd'
|
21
|
+
else return 'th'
|
22
|
+
end
|
23
|
+
end
|
24
|
+
end
|
25
|
+
|
26
|
+
class Time
|
27
|
+
def pretty
|
28
|
+
return "#{mday}#{mday.ordinal} #{strftime('%B')} #{year}"
|
29
|
+
end
|
30
|
+
end
|
31
|
+
|
32
|
+
def convert_syntax(syntax, source)
|
33
|
+
return Syntax::Convertors::HTML.for_syntax(syntax).convert(source).gsub(%r!^<pre>|</pre>$!,'')
|
34
|
+
end
|
35
|
+
|
36
|
+
if ARGV.length >= 1
|
37
|
+
src, template = ARGV
|
38
|
+
template ||= File.dirname(__FILE__) + '/../website/template.rhtml'
|
39
|
+
|
40
|
+
else
|
41
|
+
puts("Usage: #{File.split($0).last} source.txt [template.rhtml] > output.html")
|
42
|
+
exit!
|
43
|
+
end
|
44
|
+
|
45
|
+
template = ERB.new(File.open(template).read)
|
46
|
+
|
47
|
+
title = nil
|
48
|
+
body = nil
|
49
|
+
File.open(src) do |fsrc|
|
50
|
+
title_text = fsrc.readline
|
51
|
+
body_text = fsrc.read
|
52
|
+
syntax_items = []
|
53
|
+
body_text.gsub!(%r!<(pre|code)[^>]*?syntax=['"]([^'"]+)[^>]*>(.*?)</>!m){
|
54
|
+
ident = syntax_items.length
|
55
|
+
element, syntax, source = $1, $2, $3
|
56
|
+
syntax_items << "<#{element} class='syntax'>#{convert_syntax(syntax, source)}</#{element}>"
|
57
|
+
"syntax-temp-#{ident}"
|
58
|
+
}
|
59
|
+
title = RedCloth.new(title_text).to_html.gsub(%r!<.*?>!,'').strip
|
60
|
+
body = RedCloth.new(body_text).to_html
|
61
|
+
body.gsub!(%r!(?:<pre><code>)?syntax-temp-(d+)(?:</code></pre>)?!){ syntax_items[$1.to_i] }
|
62
|
+
end
|
63
|
+
stat = File.stat(src)
|
64
|
+
created = stat.ctime
|
65
|
+
modified = stat.mtime
|
66
|
+
|
67
|
+
$stdout << template.result(binding)
|