bloomury 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 2d6df04e7ad2afdce3a4b13161cf02b329c425ee9b87d609bf6411ed586a47cb
4
+ data.tar.gz: a4a92b1376cbedca9f498206147c48763be4715e107f1d38ad7592e76330fba4
5
+ SHA512:
6
+ metadata.gz: e93062ff5f2ba0d29a8d4a43c821c198b92bdeeaf3af3751a21e216d0f81880d5fff4e8ee7dd4f7ba63c4bb0f44194924c092f4dd565605e7a1cf71037fafc03
7
+ data.tar.gz: 21d4a9e041916181a8d7c08c0072cbcdbdda73c6348c767bc81c1a8678441a221c6021b76b72961680884cfbf6223c4ead1e6af768ba1fd16e2f3dc19c537bcc
data/CHANGELOG.md ADDED
@@ -0,0 +1,11 @@
1
+ ## [Unreleased]
2
+
3
+ ## [0.1.0] - 2026-03-25
4
+
5
+ ### Added
6
+ - `Bloomury::Filter` C extension implementing a Bloom filter with MurmurHash3 (32-bit)
7
+ - Double-hashing (`h1 + i*h2`) with configurable `k` bit positions per item
8
+ - `new(capacity, error_rate, seed1:, seed2:)` — auto-calculates optimal `m`/`k` parameters; seeds default to `0x9747b28c` / `0x5a4afe17`
9
+ - `add(item)`, `include?(item)`, `add_count`, `bit_count`, `hash_count`, `seed1`, `seed2`
10
+ - Standalone C unit tests for MurmurHash3 (`rake test_c`)
11
+ - `rake 'memory_estimate[capacity,error_rate]'` task to estimate memory usage before allocating
@@ -0,0 +1,10 @@
1
+ # Code of Conduct
2
+
3
+ "bloomury" follows [The Ruby Community Conduct Guideline](https://www.ruby-lang.org/en/conduct) in all "collaborative space", which is defined as community communications channels (such as mailing lists, submitted patches, commit comments, etc.):
4
+
5
+ * Participants will be tolerant of opposing views.
6
+ * Participants must ensure that their language and actions are free of personal attacks and disparaging personal remarks.
7
+ * When interpreting the words and actions of others, participants should always assume good intentions.
8
+ * Behaviour which can be reasonably considered harassment will not be tolerated.
9
+
10
+ If you have any concerns about behaviour within this project, please contact us at ["TODO: Write your email address"](mailto:"TODO: Write your email address").
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2026 TODO: Write your name
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,139 @@
1
+ ![Bloomury poster](assets/bloomury_poster.svg)
2
+
3
+ Bloomury is a fast, memory-efficient way to answer one question: **have I seen this before?**
4
+
5
+ It's a Bloom filter: a data structure that can tell you with certainty when something is *new*, and with high, tunable confidence when something has been *seen before*. It uses a fraction of the memory a full set would require (roughly 9.6 bits per item at a 1% error rate, regardless of item size), which makes it useful for deduplication, caching, and spam filtering at scale.
6
+
7
+ The trade-off: it can occasionally say "yes, I've seen this" when it hasn't (a false positive). You control how often that happens via the error rate. It will **never** say "no" when it should say "yes".
8
+
9
+ ## Quick start
10
+
11
+ ```bash
12
+ gem install bloomury
13
+ ```
14
+
15
+ ```ruby
16
+ require "bloomury"
17
+
18
+ # Create a filter for up to 10,000 items with a 1% false positive rate
19
+ filter = Bloomury::Filter.new(10_000, 0.01)
20
+
21
+ filter.add("user@example.com")
22
+
23
+ filter.include?("user@example.com") # => true (probably seen)
24
+ filter.include?("other@example.com") # => false (definitely not seen)
25
+ ```
26
+
27
+ That's it. Pick a capacity roughly equal to the number of items you expect to add, choose an error rate, and go.
28
+
29
+ ## Choosing your parameters
30
+
31
+ **Capacity** is the number of items you plan to add. Going significantly over capacity pushes the false positive rate above what you asked for.
32
+
33
+ **Error rate** is the target probability of a false positive once the filter holds `capacity` items. `0.01` means roughly 1 in 100 membership checks on unseen items will incorrectly return `true`. Lower is more accurate but uses more memory.
34
+
35
+ **Seeds** control which bit positions each item maps to. The defaults are fine for most uses; see [Fixed hash seeds](#fixed-hash-seeds) if you're exposing a filter over a network.
36
+
37
+ Not sure how much memory you'll use? From a checkout of the gem's source, check before allocating:
38
+
39
+ ```bash
40
+ rake 'memory_estimate[capacity,error_rate]'
41
+ ```
42
+
43
+ ```
44
+ $ rake 'memory_estimate[10000,0.01]'
45
+ capacity: 10000 items
46
+ error rate: 0.01
47
+ bit count: 95851 (7 hash functions)
48
+ memory: 11982 bytes (11.7 KB)
49
+ ```
50
+
51
+ ## Full API
52
+
53
+ ```ruby
54
+ filter = Bloomury::Filter.new(capacity, error_rate) # optional keywords: seed1:, seed2:
55
+ filter.add("item") # add an item
56
+ filter.include?("item") # true/false membership check
57
+ filter.add_count # number of add calls made (including duplicates)
58
+ filter.bit_count # size of the underlying bit array
59
+ filter.hash_count # number of hash functions in use
60
+ [filter.seed1, filter.seed2] # the two MurmurHash3 seeds in use
61
+ ```
62
+
63
+ ## Why MurmurHash3
64
+
65
+ A Bloom filter needs to hash each item multiple times, quickly, with results spread evenly across the bit array. MurmurHash3 is a good fit: it's fast, has excellent distribution, and is simple enough to embed directly in the extension with no external dependency.
66
+
67
+ It's not a cryptographic hash — it wasn't designed to be hard to reverse or to resist deliberate collisions. That's an acceptable trade-off here because speed and distribution quality matter more than security for this use case.
68
+
69
+ ## When to use a Bloom filter
70
+
71
+ Good fits:
72
+ - Deduplication — skip processing URLs, emails, or IDs you've already handled
73
+ - Cache guard — avoid expensive lookups for items you know aren't cached
74
+ - Spam/abuse — fast pre-check before a heavier database query
75
+
76
+ Not a good fit:
77
+ - You need to remove items (Bloom filters are add-only)
78
+ - You need to count occurrences
79
+ - False positives are completely unacceptable
80
+
81
+ ## Security considerations
82
+
83
+ ### Fixed hash seeds
84
+
85
+ By default, this implementation uses fixed seeds for MurmurHash3 (`0x9747b28c` and `0x5a4afe17`). That is fine for local, in-process use such as deduplication pipelines, caches, and offline processing.
86
+
87
+ If you expose a filter over a network where untrusted parties can observe query results, fixed seeds are a liability. The defaults are published, and an adversary who knows the seeds can craft inputs that saturate specific bit positions and force false positives on legitimate lookups. Supply random seeds instead:
88
+
89
+ ```ruby
90
+ require "securerandom"
91
+
92
+ filter = Bloomury::Filter.new(10_000, 0.01,
93
+ seed1: SecureRandom.random_number(2**32),
94
+ seed2: SecureRandom.random_number(2**32)
95
+ )
96
+ ```
97
+
98
+ ### Not a cryptographic primitive
99
+
100
+ MurmurHash3 is optimised for speed and distribution, not security. Don't use it for password hashing or message authentication.
101
+
102
+ ## Development
103
+
104
+ ```bash
105
+ bin/setup # install dependencies
106
+ rake # compile, test, lint
107
+ rake compile # build C extension only
108
+ rake test # Ruby tests only
109
+ rake test_c # C unit tests only
110
+ ```
111
+
112
+ ## Prior art
113
+
114
+ [bloomfilter-rb](https://github.com/igrigorik/bloomfilter-rb) by Ilya Grigorik
115
+ is the most widely used Ruby Bloom filter gem and was the primary inspiration
116
+ for this project. It implements a counting Bloom filter with both native C and
117
+ Redis backends.
118
+
119
+ Bloomury differs in a few deliberate ways:
120
+
121
+ - **MurmurHash3** instead of CRC32 for better bit distribution across
122
+ real-world data patterns
123
+ - **Capacity and error rate** as primary constructor arguments — the filter
124
+ derives bit count and hash function count mathematically rather than
125
+ requiring callers to calculate these themselves
126
+ - **Non-counting filter** — no deletion support, which preserves the
127
+ no-false-negatives guarantee that counting filters can violate when a counter
128
+ overflows
129
+
130
+ If you need deletion support or a Redis-backed shared filter,
131
+ `bloomfilter-rb` may be a better fit.
132
+
133
+ ## Contributing
134
+
135
+ Bug reports and pull requests are welcome at https://github.com/tostart-pickagreatname/bloomury. Please follow the [code of conduct](https://github.com/tostart-pickagreatname/bloomury/blob/master/CODE_OF_CONDUCT.md).
136
+
137
+ ## License
138
+
139
+ [MIT License](https://opensource.org/licenses/MIT).
data/Rakefile ADDED
@@ -0,0 +1,52 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "bundler/gem_tasks"
4
+ require "minitest/test_task"
5
+
6
+ Minitest::TestTask.create
7
+
8
+ require "rubocop/rake_task"
9
+
10
+ RuboCop::RakeTask.new
11
+
12
+ require "rake/extensiontask"
13
+
14
+ task build: :compile
15
+
16
+ GEMSPEC = Gem::Specification.load("bloomury.gemspec")
17
+
18
+ Rake::ExtensionTask.new("bloomury", GEMSPEC) do |ext|
19
+ ext.lib_dir = "lib/bloomury"
20
+ end
21
+
22
+ desc "Estimate filter memory usage: rake memory_estimate[capacity,error_rate]"
23
+ task :memory_estimate, [:capacity, :error_rate] do |_t, args|
24
+ capacity = Float(args[:capacity] || abort("capacity required"))
25
+ error_rate = Float(args[:error_rate] || abort("error_rate required"))
26
+
27
+ abort("error_rate must be between 0 and 1 (exclusive)") unless error_rate.positive? && error_rate < 1
28
+ abort("capacity must be positive") unless capacity.positive?
29
+
30
+ m = (-capacity * Math.log(error_rate) / Math.log(2)**2).ceil
31
+ k = ((m / capacity) * Math.log(2)).round
32
+
33
+ bytes = (m / 8.0).ceil
34
+ puts "capacity: #{capacity.to_i} items"
35
+ puts "error rate: #{error_rate}"
36
+ puts "bit count: #{m} (#{k} hash functions)"
37
+ puts "memory: #{bytes} bytes (#{(bytes / 1024.0).round(2)} KB)"
38
+ end
39
+
40
+ desc "Run C unit tests"
41
+ task test_c: :compile do
42
+ cc = RbConfig::CONFIG["CC"]
43
+ cflags = "-I#{File.expand_path("ext/bloomury")}"
44
+ src = "test/c/test_murmurhash3.c ext/bloomury/murmurhash3.c"
45
+ out = "tmp/test_murmurhash3"
46
+
47
+ mkdir_p "tmp"
48
+ sh "#{cc} #{cflags} #{src} -o #{out}"
49
+ sh out
50
+ end
51
+
52
+ task default: %i[clobber compile test test_c rubocop]
@@ -0,0 +1,48 @@
1
+ <svg width="100%" viewBox="0 0 680 490" xmlns="http://www.w3.org/2000/svg">
2
+ <defs>
3
+
4
+ <mask id="imagine-text-gaps-g8r15m" maskUnits="userSpaceOnUse"><rect x="0" y="0" width="680" height="490" fill="white"/><rect x="169.6649627685547" y="86.44488525390625" width="340.67010498046875" height="88.88188934326172" fill="black" rx="2"/><rect x="191.13412475585938" y="196.42520141601562" width="297.7318115234375" height="18.468503952026367" fill="black" rx="2"/><rect x="149.98178100585938" y="253.35435485839844" width="379.0719299316406" height="16.5393705368042" fill="black" rx="2"/><rect x="-231.02935791015625" y="-8.64566946029663" width="14.058686256408691" height="16.5393705368042" fill="black" rx="2"/><rect x="-191.0293426513672" y="-8.64566946029663" width="14.058686256408691" height="16.5393705368042" fill="black" rx="2"/><rect x="-151.0293426513672" y="-8.64566946029663" width="14.058686256408691" height="16.5393705368042" fill="black" rx="2"/><rect x="-111.02935028076172" y="-8.64566946029663" width="14.058686256408691" height="16.5393705368042" fill="black" rx="2"/><rect x="-71.02934265136719" y="-8.64566946029663" width="14.058686256408691" height="16.5393705368042" fill="black" rx="2"/><rect x="-31.02934455871582" y="-8.64566946029663" width="14.058686256408691" height="16.5393705368042" fill="black" rx="2"/><rect x="8.970657348632812" y="-8.64566946029663" width="14.058686256408691" height="16.5393705368042" fill="black" rx="2"/><rect x="48.97065734863281" y="-8.64566946029663" width="14.058686256408691" height="16.5393705368042" fill="black" rx="2"/><rect x="88.97066497802734" y="-8.64566946029663" width="14.058686256408691" height="16.5393705368042" fill="black" rx="2"/><rect x="128.97067260742188" y="-8.64566946029663" width="14.058686256408691" height="16.5393705368042" fill="black" rx="2"/><rect x="168.9706573486328" y="-8.64566946029663" width="14.058686256408691" height="16.5393705368042" fill="black" rx="2"/><rect x="208.9706573486328" y="-8.64566946029663" width="14.058686256408691" height="16.5393705368042" fill="black" 
rx="2"/><rect x="88" y="378.3543395996094" width="68.72883224487305" height="16.5393705368042" fill="black" rx="2"/><rect x="176.55087280273438" y="378.3543395996094" width="106.89825439453125" height="16.5393705368042" fill="black" rx="2"/><rect x="352.0444030761719" y="378.3543395996094" width="96.1995620727539" height="16.5393705368042" fill="black" rx="2"/><rect x="485.10174560546875" y="378.3543395996094" width="106.89825439453125" height="16.5393705368042" fill="black" rx="2"/></mask></defs>
5
+
6
+ <rect width="680" height="480" style="fill:rgb(8, 10, 14);stroke:none;color:rgb(255, 255, 255);stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;Anthropic Sans&quot;, -apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, sans-serif;font-size:16px;font-weight:400;text-anchor:start;dominant-baseline:auto"/>
7
+
8
+ <rect x="60" y="60" width="560" height="360" fill="none" stroke="#111827" stroke-width="0.5" style="fill:none;stroke:rgb(17, 24, 39);color:rgb(255, 255, 255);stroke-width:0.5px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;Anthropic Sans&quot;, -apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, sans-serif;font-size:16px;font-weight:400;text-anchor:start;dominant-baseline:auto"/>
9
+ <rect x="66" y="66" width="548" height="348" fill="none" stroke="#0d1117" stroke-width="0.5" style="fill:none;stroke:rgb(13, 17, 23);color:rgb(255, 255, 255);stroke-width:0.5px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;Anthropic Sans&quot;, -apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, sans-serif;font-size:16px;font-weight:400;text-anchor:start;dominant-baseline:auto"/>
10
+
11
+ <text x="340" y="155" text-anchor="middle" font-size="72" font-weight="700" fill="#e2e8f0" letter-spacing="2" style="fill:rgb(226, 232, 240);stroke:none;color:rgb(255, 255, 255);stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;SF Mono&quot;, &quot;Fira Code&quot;, monospace;font-size:72px;font-weight:700;text-anchor:middle;dominant-baseline:auto">bloomury</text>
12
+
13
+ <line x1="80" y1="175" x2="600" y2="175" mask="url(#imagine-text-gaps-g8r15m)" style="fill:rgb(0, 0, 0);stroke:rgb(30, 38, 48);color:rgb(255, 255, 255);stroke-width:0.5px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;Anthropic Sans&quot;, -apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, sans-serif;font-size:16px;font-weight:400;text-anchor:start;dominant-baseline:auto"/>
14
+
15
+ <text x="340" y="210" text-anchor="middle" font-size="13" fill="#3b82f6" letter-spacing="4" style="fill:rgb(59, 130, 246);stroke:none;color:rgb(255, 255, 255);stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;SF Mono&quot;, &quot;Fira Code&quot;, monospace;font-size:13px;font-weight:400;text-anchor:middle;dominant-baseline:auto">BLOOM FILTER × MURMURHASH3</text>
16
+
17
+ <line x1="80" y1="228" x2="600" y2="228" style="fill:rgb(0, 0, 0);stroke:rgb(30, 38, 48);color:rgb(255, 255, 255);stroke-width:0.5px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;Anthropic Sans&quot;, -apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, sans-serif;font-size:16px;font-weight:400;text-anchor:start;dominant-baseline:auto"/>
18
+
19
+ <g transform="translate(340, 320)" style="fill:rgb(0, 0, 0);stroke:none;color:rgb(255, 255, 255);stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;Anthropic Sans&quot;, -apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, sans-serif;font-size:16px;font-weight:400;text-anchor:start;dominant-baseline:auto">
20
+ <g style="fill:rgb(0, 0, 0);stroke:none;color:rgb(255, 255, 255);stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;Anthropic Sans&quot;, -apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, sans-serif;font-size:16px;font-weight:400;text-anchor:start;dominant-baseline:auto">
21
+ <rect x="-240" y="-18" width="32" height="32" rx="3" fill="#0f1720" stroke="#1e3a5f" stroke-width="0.5" style="fill:rgb(15, 23, 32);stroke:rgb(30, 58, 95);color:rgb(255, 255, 255);stroke-width:0.5px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;Anthropic Sans&quot;, -apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, sans-serif;font-size:16px;font-weight:400;text-anchor:start;dominant-baseline:auto"/>
22
+ <rect x="-200" y="-18" width="32" height="32" rx="3" fill="#1a2d4a" stroke="#2563eb" stroke-width="0.5" style="fill:rgb(26, 45, 74);stroke:rgb(37, 99, 235);color:rgb(255, 255, 255);stroke-width:0.5px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;Anthropic Sans&quot;, -apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, sans-serif;font-size:16px;font-weight:400;text-anchor:start;dominant-baseline:auto"/>
23
+ <rect x="-160" y="-18" width="32" height="32" rx="3" fill="#0f1720" stroke="#1e3a5f" stroke-width="0.5" style="fill:rgb(15, 23, 32);stroke:rgb(30, 58, 95);color:rgb(255, 255, 255);stroke-width:0.5px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;Anthropic Sans&quot;, -apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, sans-serif;font-size:16px;font-weight:400;text-anchor:start;dominant-baseline:auto"/>
24
+ <rect x="-120" y="-18" width="32" height="32" rx="3" fill="#0f1720" stroke="#1e3a5f" stroke-width="0.5" style="fill:rgb(15, 23, 32);stroke:rgb(30, 58, 95);color:rgb(255, 255, 255);stroke-width:0.5px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;Anthropic Sans&quot;, -apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, sans-serif;font-size:16px;font-weight:400;text-anchor:start;dominant-baseline:auto"/>
25
+ <rect x="-80" y="-18" width="32" height="32" rx="3" fill="#2a1f00" stroke="#92400e" stroke-width="0.5" style="fill:rgb(42, 31, 0);stroke:rgb(146, 64, 14);color:rgb(255, 255, 255);stroke-width:0.5px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;Anthropic Sans&quot;, -apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, sans-serif;font-size:16px;font-weight:400;text-anchor:start;dominant-baseline:auto"/>
26
+ <rect x="-40" y="-18" width="32" height="32" rx="3" fill="#0f1720" stroke="#1e3a5f" stroke-width="0.5" style="fill:rgb(15, 23, 32);stroke:rgb(30, 58, 95);color:rgb(255, 255, 255);stroke-width:0.5px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;Anthropic Sans&quot;, -apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, sans-serif;font-size:16px;font-weight:400;text-anchor:start;dominant-baseline:auto"/>
27
+ <rect x="0" y="-18" width="32" height="32" rx="3" fill="#1a2d4a" stroke="#2563eb" stroke-width="0.5" style="fill:rgb(26, 45, 74);stroke:rgb(37, 99, 235);color:rgb(255, 255, 255);stroke-width:0.5px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;Anthropic Sans&quot;, -apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, sans-serif;font-size:16px;font-weight:400;text-anchor:start;dominant-baseline:auto"/>
28
+ <rect x="40" y="-18" width="32" height="32" rx="3" fill="#0f1720" stroke="#1e3a5f" stroke-width="0.5" style="fill:rgb(15, 23, 32);stroke:rgb(30, 58, 95);color:rgb(255, 255, 255);stroke-width:0.5px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;Anthropic Sans&quot;, -apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, sans-serif;font-size:16px;font-weight:400;text-anchor:start;dominant-baseline:auto"/>
29
+ <rect x="80" y="-18" width="32" height="32" rx="3" fill="#2a1f00" stroke="#92400e" stroke-width="0.5" style="fill:rgb(42, 31, 0);stroke:rgb(146, 64, 14);color:rgb(255, 255, 255);stroke-width:0.5px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;Anthropic Sans&quot;, -apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, sans-serif;font-size:16px;font-weight:400;text-anchor:start;dominant-baseline:auto"/>
30
+ <rect x="120" y="-18" width="32" height="32" rx="3" fill="#0f1720" stroke="#1e3a5f" stroke-width="0.5" style="fill:rgb(15, 23, 32);stroke:rgb(30, 58, 95);color:rgb(255, 255, 255);stroke-width:0.5px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;Anthropic Sans&quot;, -apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, sans-serif;font-size:16px;font-weight:400;text-anchor:start;dominant-baseline:auto"/>
31
+ <rect x="160" y="-18" width="32" height="32" rx="3" fill="#1a2d4a" stroke="#2563eb" stroke-width="0.5" style="fill:rgb(26, 45, 74);stroke:rgb(37, 99, 235);color:rgb(255, 255, 255);stroke-width:0.5px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;Anthropic Sans&quot;, -apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, sans-serif;font-size:16px;font-weight:400;text-anchor:start;dominant-baseline:auto"/>
32
+ <rect x="200" y="-18" width="32" height="32" rx="3" fill="#0f1720" stroke="#1e3a5f" stroke-width="0.5" style="fill:rgb(15, 23, 32);stroke:rgb(30, 58, 95);color:rgb(255, 255, 255);stroke-width:0.5px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;Anthropic Sans&quot;, -apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, sans-serif;font-size:16px;font-weight:400;text-anchor:start;dominant-baseline:auto"/>
33
+ <text x="-224" y="3" text-anchor="middle" font-size="11" fill="#374151" style="fill:rgb(55, 65, 81);stroke:none;color:rgb(255, 255, 255);stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;SF Mono&quot;, &quot;Fira Code&quot;, monospace;font-size:11px;font-weight:400;text-anchor:middle;dominant-baseline:auto">0</text>
34
+ <text x="-184" y="3" text-anchor="middle" font-size="11" fill="#93c5fd" style="fill:rgb(147, 197, 253);stroke:none;color:rgb(255, 255, 255);stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;SF Mono&quot;, &quot;Fira Code&quot;, monospace;font-size:11px;font-weight:400;text-anchor:middle;dominant-baseline:auto">1</text>
35
+ <text x="-144" y="3" text-anchor="middle" font-size="11" fill="#374151" style="fill:rgb(55, 65, 81);stroke:none;color:rgb(255, 255, 255);stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;SF Mono&quot;, &quot;Fira Code&quot;, monospace;font-size:11px;font-weight:400;text-anchor:middle;dominant-baseline:auto">0</text>
36
+ <text x="-104" y="3" text-anchor="middle" font-size="11" fill="#374151" style="fill:rgb(55, 65, 81);stroke:none;color:rgb(255, 255, 255);stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;SF Mono&quot;, &quot;Fira Code&quot;, monospace;font-size:11px;font-weight:400;text-anchor:middle;dominant-baseline:auto">0</text>
37
+ <text x="-64" y="3" text-anchor="middle" font-size="11" fill="#fbbf24" style="fill:rgb(251, 191, 36);stroke:none;color:rgb(255, 255, 255);stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;SF Mono&quot;, &quot;Fira Code&quot;, monospace;font-size:11px;font-weight:400;text-anchor:middle;dominant-baseline:auto">1</text>
38
+ <text x="-24" y="3" text-anchor="middle" font-size="11" fill="#374151" style="fill:rgb(55, 65, 81);stroke:none;color:rgb(255, 255, 255);stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;SF Mono&quot;, &quot;Fira Code&quot;, monospace;font-size:11px;font-weight:400;text-anchor:middle;dominant-baseline:auto">0</text>
39
+ <text x="16" y="3" text-anchor="middle" font-size="11" fill="#93c5fd" style="fill:rgb(147, 197, 253);stroke:none;color:rgb(255, 255, 255);stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;SF Mono&quot;, &quot;Fira Code&quot;, monospace;font-size:11px;font-weight:400;text-anchor:middle;dominant-baseline:auto">1</text>
40
+ <text x="56" y="3" text-anchor="middle" font-size="11" fill="#374151" style="fill:rgb(55, 65, 81);stroke:none;color:rgb(255, 255, 255);stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;SF Mono&quot;, &quot;Fira Code&quot;, monospace;font-size:11px;font-weight:400;text-anchor:middle;dominant-baseline:auto">0</text>
41
+ <text x="96" y="3" text-anchor="middle" font-size="11" fill="#fbbf24" style="fill:rgb(251, 191, 36);stroke:none;color:rgb(255, 255, 255);stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;SF Mono&quot;, &quot;Fira Code&quot;, monospace;font-size:11px;font-weight:400;text-anchor:middle;dominant-baseline:auto">1</text>
42
+ <text x="136" y="3" text-anchor="middle" font-size="11" fill="#374151" style="fill:rgb(55, 65, 81);stroke:none;color:rgb(255, 255, 255);stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;SF Mono&quot;, &quot;Fira Code&quot;, monospace;font-size:11px;font-weight:400;text-anchor:middle;dominant-baseline:auto">0</text>
43
+ <text x="176" y="3" text-anchor="middle" font-size="11" fill="#93c5fd" style="fill:rgb(147, 197, 253);stroke:none;color:rgb(255, 255, 255);stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;SF Mono&quot;, &quot;Fira Code&quot;, monospace;font-size:11px;font-weight:400;text-anchor:middle;dominant-baseline:auto">1</text>
44
+ <text x="216" y="3" text-anchor="middle" font-size="11" fill="#374151" style="fill:rgb(55, 65, 81);stroke:none;color:rgb(255, 255, 255);stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;SF Mono&quot;, &quot;Fira Code&quot;, monospace;font-size:11px;font-weight:400;text-anchor:middle;dominant-baseline:auto">0</text>
45
+ </g>
46
+ </g>
47
+
48
+ </svg>
@@ -0,0 +1,48 @@
1
+ #include "bloomury.h"
2
+ #include "murmurhash3.h"
3
+
4
+ static inline void bit_set(uint8_t *bits, uint64_t pos) {
5
+ bits[pos / 8] |= (1 << (pos % 8));
6
+ }
7
+
8
+ static inline int bit_get(const uint8_t *bits, uint64_t pos) {
9
+ return (bits[pos / 8] >> (pos % 8)) & 1;
10
+ }
11
+
12
+ void bloom_filter_init(BloomFilter *f, uint64_t bit_count,
13
+ uint32_t hash_count, uint32_t seed1, uint32_t seed2) {
14
+ f->bit_count = bit_count;
15
+ f->hash_count = hash_count;
16
+ f->item_count = 0;
17
+ f->seed1 = seed1;
18
+ f->seed2 = seed2;
19
+ f->bits = ruby_xcalloc((bit_count + 7) / 8, 1);
20
+ }
21
+
22
+ void bloom_filter_free(BloomFilter *f) {
23
+ ruby_xfree(f->bits);
24
+ f->bits = NULL;
25
+ }
26
+
27
+ void bloom_filter_add(BloomFilter *f, const uint8_t *data, size_t len) {
28
+ uint32_t h1 = murmur3_32(data, len, f->seed1);
29
+ uint32_t h2 = murmur3_32(data, len, f->seed2);
30
+
31
+ for (uint32_t i = 0; i < f->hash_count; i++) {
32
+ uint64_t pos = ((uint64_t)h1 + i * (uint64_t)h2) % f->bit_count;
33
+ bit_set(f->bits, pos);
34
+ }
35
+ f->item_count++;
36
+ }
37
+
38
+ int bloom_filter_check(BloomFilter *f, const uint8_t *data, size_t len) {
39
+ uint32_t h1 = murmur3_32(data, len, f->seed1);
40
+ uint32_t h2 = murmur3_32(data, len, f->seed2);
41
+
42
+ for (uint32_t i = 0; i < f->hash_count; i++) {
43
+ uint64_t pos = ((uint64_t)h1 + i * (uint64_t)h2) % f->bit_count;
44
+ if (!bit_get(f->bits, pos))
45
+ return 0;
46
+ }
47
+ return 1;
48
+ }
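`bloom_filter_add` and `bloom_filter_check` above derive all k bit positions from two MurmurHash3 values using double hashing: pos_i = (h1 + i·h2) mod m. Each position is then stored as bit `pos % 8` of byte `pos / 8`. The standalone sketch below isolates just that indexing step. It takes the two hash values as plain inputs, so it runs without the extension. `bloom_positions`, `bloom_byte_index` and `bloom_bit_mask` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fills out[0..k-1] with the bit positions that
 * double hashing selects for one item, given its two 32-bit hashes. */
void bloom_positions(uint32_t h1, uint32_t h2, uint32_t k,
                     uint64_t bit_count, uint64_t *out) {
    for (uint32_t i = 0; i < k; i++) {
        /* Widen to 64 bits first so h1 + i*h2 cannot wrap around. */
        out[i] = ((uint64_t)h1 + (uint64_t)i * h2) % bit_count;
    }
}

/* Byte index and in-byte mask for a position, as bit_set/bit_get use. */
size_t bloom_byte_index(uint64_t pos) { return (size_t)(pos / 8); }
uint8_t bloom_bit_mask(uint64_t pos) { return (uint8_t)(1u << (pos % 8)); }
```

Double hashing is the usual way to get k positions for the price of two hash computations. The formula also exposes one edge case: if h2 happens to be a multiple of m, every probe lands on the same bit, so that item effectively uses a single hash function.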
@@ -0,0 +1,23 @@
1
+ #ifndef BLOOMURY_H
2
+ #define BLOOMURY_H 1
3
+
4
+ #include "ruby.h"
5
+ #include <stdint.h>
6
+ #include <stdlib.h>
7
+
8
+ typedef struct {
9
+ uint8_t *bits;
10
+ uint64_t bit_count;
11
+ uint32_t hash_count;
12
+ uint64_t item_count;
13
+ uint32_t seed1;
14
+ uint32_t seed2;
15
+ } BloomFilter;
16
+
17
+ void bloom_filter_init(BloomFilter *f, uint64_t bit_count, uint32_t hash_count,
18
+ uint32_t seed1, uint32_t seed2);
19
+ void bloom_filter_free(BloomFilter *f);
20
+ void bloom_filter_add(BloomFilter *f, const uint8_t *data, size_t len);
21
+ int bloom_filter_check(BloomFilter *f, const uint8_t *data, size_t len);
22
+
23
+ #endif /* BLOOMURY_H */
@@ -0,0 +1,10 @@
1
+ # frozen_string_literal: true
2
+ require "mkmf"
3
+
4
+ # Makes all symbols private by default to avoid unintended conflict
5
+ # with other gems. To explicitly export symbols you can use RUBY_FUNC_EXPORTED
6
+ # selectively, or entirely remove this flag.
7
+ append_cflags("-fvisibility=hidden")
8
+ append_cflags("-O2")
9
+
10
+ create_makefile("bloomury/bloomury")
@@ -0,0 +1,61 @@
1
+ #include "murmurhash3.h"
2
+ #include <string.h>
3
+
4
+ /* ============================================================
5
+ MurmurHash3 — 32 bit variant
6
+ Austin Appleby, public domain
7
+ ============================================================ */
8
+
9
+ static inline uint32_t rotl32(uint32_t x, int8_t r) {
10
+ return (x << r) | (x >> (32 - r));
11
+ }
12
+
13
+ #define MURMUR_C1 0xcc9e2d51
14
+ #define MURMUR_C2 0x1b873593
15
+ #define MURMUR_MIX 0xe6546b64 // additive constant in block mixing step
16
+ #define MURMUR_F1 0x85ebca6b // finalization mix constants
17
+ #define MURMUR_F2 0xc2b2ae35
18
+
19
+ uint32_t murmur3_32(const uint8_t *data, size_t len, uint32_t seed) {
20
+ uint32_t h = seed;
21
+ size_t nblocks = len / 4;
22
+
23
+ for (size_t i = 0; i < nblocks; i++) {
24
+ uint32_t k;
25
+ /* memcpy from the byte pointer avoids an unaligned uint32_t access */
26
+ memcpy(&k, data + i * 4, sizeof(k));
27
+
28
+ k *= MURMUR_C1;
29
+ k = rotl32(k, 15);
30
+ k *= MURMUR_C2;
31
+
32
+ h ^= k;
33
+ h = rotl32(h, 13);
34
+ h = h * 5 + MURMUR_MIX;
35
+ }
36
+
37
+ const uint8_t *tail = data + nblocks * 4;
38
+ uint32_t k = 0;
39
+
40
+ switch (len & 3) {
41
+ case 3:
42
+ k ^= tail[2] << 16; /* fall through */
43
+ case 2:
44
+ k ^= tail[1] << 8; /* fall through */
45
+ case 1:
46
+ k ^= tail[0];
47
+ k *= MURMUR_C1;
48
+ k = rotl32(k, 15);
49
+ k *= MURMUR_C2;
50
+ h ^= k;
51
+ }
52
+
53
+ h ^= (uint32_t)len;
54
+ h ^= h >> 16;
55
+ h *= MURMUR_F1;
56
+ h ^= h >> 13;
57
+ h *= MURMUR_F2;
58
+ h ^= h >> 16;
59
+
60
+ return h;
61
+ }
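The CHANGELOG mentions standalone C unit tests for MurmurHash3 (`rake test_c`), but `test/c/test_murmurhash3.c` is not part of this listing. A known-answer check in that spirit might look like the sketch below. `Murmur3Vector` and `murmur3_failures` are hypothetical names. The block only declares `murmur3_32`, using the signature from `murmurhash3.h`, so compile it together with `ext/bloomury/murmurhash3.c` and a small `main`. The expected values follow by hand from the algorithm: the first three inputs reduce to the finalization step, and four zero bytes exercise one pass of the block loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Signature from murmurhash3.h; link with ext/bloomury/murmurhash3.c. */
uint32_t murmur3_32(const uint8_t *data, size_t len, uint32_t seed);

/* Known-answer vectors for the 32-bit MurmurHash3 variant. */
typedef struct {
    const char *data;
    size_t len; /* explicit length so embedded NUL bytes are hashed */
    uint32_t seed;
    uint32_t expected;
} Murmur3Vector;

static const Murmur3Vector MURMUR3_VECTORS[] = {
    {"", 0, 0x00000000u, 0x00000000u},         /* finalizer only */
    {"", 0, 0x00000001u, 0x514E28B7u},         /* seed changes output */
    {"\0", 1, 0x00000000u, 0x514E28B7u},       /* 1-byte tail path */
    {"\0\0\0\0", 4, 0x00000000u, 0x2362F9DEu}, /* one 4-byte block */
};

/* Returns how many vectors murmur3_32 gets wrong (0 means all pass). */
int murmur3_failures(void) {
    int failures = 0;
    size_t n = sizeof MURMUR3_VECTORS / sizeof MURMUR3_VECTORS[0];

    for (size_t i = 0; i < n; i++) {
        const Murmur3Vector *v = &MURMUR3_VECTORS[i];
        uint32_t got = murmur3_32((const uint8_t *)v->data, v->len, v->seed);
        if (got != v->expected)
            failures++;
    }
    return failures;
}
```

A `main` that asserts `murmur3_failures() == 0` covers the finalizer, the 1-byte tail and the 4-byte block path. Inputs with 2 or 3 trailing bytes would need additional vectors.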
@@ -0,0 +1,9 @@
1
+ #ifndef MURMURHASH3_H
2
+ #define MURMURHASH3_H 1
3
+
4
+ #include <stddef.h>
5
+ #include <stdint.h>
6
+
7
+ uint32_t murmur3_32(const uint8_t *data, size_t len, uint32_t seed);
8
+
9
+ #endif /* MURMURHASH3_H */
@@ -0,0 +1,129 @@
1
+ #include "bloomury.h"
2
+ #include <limits.h>
3
+ #include <math.h>
4
+
5
+ VALUE rb_mBloomury;
6
+ VALUE rb_cBloomFilter;
7
+
8
+ static void bloom_free(void *ptr) {
9
+ BloomFilter *f = (BloomFilter *)ptr;
10
+ bloom_filter_free(f);
11
+ ruby_xfree(f);
12
+ }
13
+
14
+ static size_t bloom_memsize(const void *ptr) {
15
+ const BloomFilter *f = (const BloomFilter *)ptr;
16
+ return sizeof(BloomFilter) + (f->bit_count + 7) / 8;
17
+ }
18
+
19
+ static const rb_data_type_t bloom_type = {"BloomFilter",
20
+ {NULL, bloom_free, bloom_memsize},
21
+ 0,
22
+ 0,
23
+ RUBY_TYPED_FREE_IMMEDIATELY};
24
+
25
+ static VALUE bloom_alloc(VALUE klass) {
26
+ BloomFilter *f = ruby_xcalloc(1, sizeof(BloomFilter));
27
+ return TypedData_Wrap_Struct(klass, &bloom_type, f);
28
+ }
29
+
30
+ static VALUE bloom_initialize(int argc, VALUE *argv, VALUE self) {
31
+ VALUE capacity, error_rate, kwargs;
32
+ rb_scan_args(argc, argv, "2:", &capacity, &error_rate, &kwargs);
33
+
34
+ ID kwarg_ids[2] = { rb_intern("seed1"), rb_intern("seed2") };
35
+ VALUE kwarg_vals[2];
36
+ rb_get_kwargs(kwargs, kwarg_ids, 0, 2, kwarg_vals);
37
+
38
+ uint32_t seed1 = kwarg_vals[0] == Qundef ? 0x9747b28c : (uint32_t)NUM2UINT(kwarg_vals[0]);
39
+ uint32_t seed2 = kwarg_vals[1] == Qundef ? 0x5a4afe17 : (uint32_t)NUM2UINT(kwarg_vals[1]);
40
+
41
+ BloomFilter *f;
42
+ TypedData_Get_Struct(self, BloomFilter, &bloom_type, f);
43
+
44
+ double n = NUM2DBL(capacity);
45
+ double p = NUM2DBL(error_rate);
46
+
47
+ if (n <= 0)
48
+ rb_raise(rb_eArgError, "capacity must be positive");
49
+ if (p <= 0 || p >= 1)
50
+ rb_raise(rb_eArgError, "error_rate must be between 0 and 1 (exclusive)");
51
+
52
+ double m = ceil(-n * log(p) / (log(2) * log(2)));
53
+ double k = round((m / n) * log(2));
54
+
55
+ if (!isfinite(m) || m >= (double)SIZE_MAX)
56
+ rb_raise(rb_eRangeError,
57
+ "filter parameters would require an infeasible allocation; "
58
+ "run `rake memory_estimate[capacity,error_rate]` to check requirements first");
59
+
60
+ if (k < 1)
61
+ rb_raise(rb_eArgError, "computed hash count is zero; error_rate is too loose");
62
+
63
+ bloom_filter_init(f, (uint64_t)m, (uint32_t)k, seed1, seed2);
64
+ return self;
65
+ }
66
+
67
+ static VALUE bloom_seed1(VALUE self) {
68
+ BloomFilter *f;
69
+ TypedData_Get_Struct(self, BloomFilter, &bloom_type, f);
70
+ return UINT2NUM(f->seed1);
71
+ }
72
+
73
+ static VALUE bloom_seed2(VALUE self) {
74
+ BloomFilter *f;
75
+ TypedData_Get_Struct(self, BloomFilter, &bloom_type, f);
76
+ return UINT2NUM(f->seed2);
77
+ }
78
+
79
+ static VALUE bloom_add(VALUE self, VALUE item) {
80
+ BloomFilter *f;
81
+ TypedData_Get_Struct(self, BloomFilter, &bloom_type, f);
82
+
83
+ StringValue(item);
84
+ bloom_filter_add(f, (uint8_t *)RSTRING_PTR(item), RSTRING_LEN(item));
85
+ return item;
86
+ }
87
+
88
+ static VALUE bloom_check(VALUE self, VALUE item) {
89
+ BloomFilter *f;
90
+ TypedData_Get_Struct(self, BloomFilter, &bloom_type, f);
91
+
92
+ StringValue(item);
93
+ int found =
94
+ bloom_filter_check(f, (uint8_t *)RSTRING_PTR(item), RSTRING_LEN(item));
95
+ return found ? Qtrue : Qfalse;
96
+ }
97
+
98
+ static VALUE bloom_add_count(VALUE self) {
99
+ BloomFilter *f;
100
+ TypedData_Get_Struct(self, BloomFilter, &bloom_type, f);
101
+ return ULL2NUM(f->item_count);
102
+ }
103
+
104
+ static VALUE bloom_bit_count(VALUE self) {
105
+ BloomFilter *f;
106
+ TypedData_Get_Struct(self, BloomFilter, &bloom_type, f);
107
+ return ULL2NUM(f->bit_count);
108
+ }
109
+
110
+ static VALUE bloom_hash_count(VALUE self) {
111
+ BloomFilter *f;
112
+ TypedData_Get_Struct(self, BloomFilter, &bloom_type, f);
113
+ return UINT2NUM(f->hash_count);
114
+ }
115
+
116
+ RUBY_FUNC_EXPORTED void Init_bloomury(void) {
117
+ rb_mBloomury = rb_define_module("Bloomury");
118
+ rb_cBloomFilter = rb_define_class_under(rb_mBloomury, "Filter", rb_cObject);
119
+
120
+ rb_define_alloc_func(rb_cBloomFilter, bloom_alloc);
121
+ rb_define_method(rb_cBloomFilter, "initialize", bloom_initialize, -1);
122
+ rb_define_method(rb_cBloomFilter, "seed1", bloom_seed1, 0);
123
+ rb_define_method(rb_cBloomFilter, "seed2", bloom_seed2, 0);
124
+ rb_define_method(rb_cBloomFilter, "add", bloom_add, 1);
125
+ rb_define_method(rb_cBloomFilter, "include?", bloom_check, 1);
126
+ rb_define_method(rb_cBloomFilter, "add_count", bloom_add_count, 0);
127
+ rb_define_method(rb_cBloomFilter, "bit_count", bloom_bit_count, 0);
128
+ rb_define_method(rb_cBloomFilter, "hash_count", bloom_hash_count, 0);
129
+ }
data/lib/bloomury/version.rb ADDED
@@ -0,0 +1,5 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Bloomury
4
+ VERSION = "0.1.0"
5
+ end
data/lib/bloomury.rb ADDED
@@ -0,0 +1,9 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative "bloomury/version"
4
+ require "bloomury/bloomury"
5
+
6
+ module Bloomury
7
+ class Error < StandardError; end
8
+ # Your code goes here...
9
+ end
data/sig/bloomury.rbs ADDED
@@ -0,0 +1,4 @@
1
+ module Bloomury
2
+ VERSION: String
3
+ # See the writing guide of rbs: https://github.com/ruby/rbs#guides
4
+ end
metadata ADDED
@@ -0,0 +1,57 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: bloomury
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Steven Scott
8
+ bindir: exe
9
+ cert_chain: []
10
+ date: 1980-01-02 00:00:00.000000000 Z
11
+ dependencies: []
12
+ email:
13
+ - steven.bradley.scott@gmail.com
14
+ executables: []
15
+ extensions:
16
+ - ext/bloomury/extconf.rb
17
+ extra_rdoc_files: []
18
+ files:
19
+ - CHANGELOG.md
20
+ - CODE_OF_CONDUCT.md
21
+ - LICENSE.txt
22
+ - README.md
23
+ - Rakefile
24
+ - assets/bloomury_poster.svg
25
+ - ext/bloomury/bloom_filter.c
26
+ - ext/bloomury/bloomury.h
27
+ - ext/bloomury/extconf.rb
28
+ - ext/bloomury/murmurhash3.c
29
+ - ext/bloomury/murmurhash3.h
30
+ - ext/bloomury/ruby_api.c
31
+ - lib/bloomury.rb
32
+ - lib/bloomury/version.rb
33
+ - sig/bloomury.rbs
34
+ homepage: https://github.com/tostart-pickagreatname/bloomury
35
+ licenses:
36
+ - MIT
37
+ metadata:
38
+ homepage_uri: https://github.com/tostart-pickagreatname/bloomury
39
+ changelog_uri: https://github.com/tostart-pickagreatname/bloomury/blob/main/CHANGELOG.md
40
+ rdoc_options: []
41
+ require_paths:
42
+ - lib
43
+ required_ruby_version: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - ">="
46
+ - !ruby/object:Gem::Version
47
+ version: 3.2.0
48
+ required_rubygems_version: !ruby/object:Gem::Requirement
49
+ requirements:
50
+ - - ">="
51
+ - !ruby/object:Gem::Version
52
+ version: '0'
53
+ requirements: []
54
+ rubygems_version: 4.0.8
55
+ specification_version: 4
56
+ summary: A Bloom filter for Ruby, implemented as a C extension using MurmurHash3.
57
+ test_files: []