frequency_enumerator 1.0.0
- data/README.md +100 -0
- data/lib/frequency_enumerator/base.rb +75 -0
- data/lib/frequency_enumerator/composer.rb +45 -0
- data/lib/frequency_enumerator/decomposer.rb +56 -0
- data/lib/frequency_enumerator/sorter.rb +77 -0
- data/lib/frequency_enumerator.rb +4 -0
- metadata +84 -0
data/README.md
ADDED
@@ -0,0 +1,100 @@
## Frequency Enumerator

Yields hashes that correlate with the given frequency distribution.

## Concept

If you're using brute-force search to solve some problem, it makes sense to carry out some frequency analysis on the problem first.

Consider a simple example of trying to figure out which combinations of items cost a known total:

```
Total: £2.00

Item prices: Tea (£0.20), Coffee (£0.30), Biscuit (£0.15)
```

We could use *maths* to solve this problem. Or we could brute-force it.

For the latter, you'd go through every combination of these items and see which totalled £2.00. In this example, that'd take no time at all, but what if we're dealing with huge sums of money, or there are dozens of items? What if we're brute-forcing passwords?
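
For a sense of what the naive approach looks like, here is a minimal sketch of it. It is not part of the gem: `naive_solutions`, `PRICES` and `TOTAL` are made-up names, prices are held in pence to avoid floating-point rounding, and the cap of 15 per item is an assumption chosen to match the 4-bit example used later on.

```ruby
# Prices in pence, so that all arithmetic stays in integers.
PRICES = { :tea => 20, :coffee => 30, :biscuit => 15 }
TOTAL  = 200

# Hypothetical helper: try every count from 0 to max_count for each item,
# in plain numerical order, and keep the combinations that hit the total.
def naive_solutions(prices, total, max_count)
  counts = (0..max_count).to_a
  keys   = prices.keys

  # Cartesian product of the count range with itself, once per item.
  counts.product(*[counts] * (keys.size - 1)).filter_map do |combination|
    attempt = keys.zip(combination).to_h
    cost    = attempt.sum { |item, count| prices[item] * count }
    attempt if cost == total
  end
end

solutions = naive_solutions(PRICES, TOTAL, 15)
puts "#{solutions.size} combinations total £2.00"
```

The order of the search is fixed by the loop structure, not by how likely each item is to appear, which is the part the rest of this README sets out to improve.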

It helps to do some [frequency analysis](https://github.com/cpatuzzo/frequency_analyser) first.

You might discover that, in fact, almost no one drinks tea and everyone loves biscuits. You might ask a couple of hundred people and end up with a frequency distribution like this:

```ruby
{ :tea => 25, :coffee => 60, :biscuit => 115 }
```

It'd be nice if we could brute-force the problem, but be more intelligent about the order in which we do so. We should make use of our valuable, newfound knowledge.

And that's exactly what Frequency Enumerator does. (I got there in the end!)

You simply feed it a frequency distribution and it does its best to spew out 'attempts' that correlate with the given distribution. In our case, we'd do something like this:

## Usage

```ruby
# gem install frequency_enumerator

require 'frequency_enumerator'

distribution = { :tea => 25, :coffee => 60, :biscuit => 115 }
bits_required = 4 # 0..15 should be enough for our simple problem

FrequencyEnumerator.new(distribution, :bit_count => bits_required).each do |hash|
  # ...
end
```

The first 10 attempts yielded to the block are:

```ruby
{ :tea=>0, :coffee=>0, :biscuit=>0 }
{ :tea=>0, :coffee=>0, :biscuit=>1 }
{ :tea=>0, :coffee=>0, :biscuit=>2 }
{ :tea=>0, :coffee=>0, :biscuit=>3 }
{ :tea=>0, :coffee=>1, :biscuit=>0 }
{ :tea=>0, :coffee=>1, :biscuit=>1 }
{ :tea=>0, :coffee=>1, :biscuit=>2 }
{ :tea=>0, :coffee=>1, :biscuit=>3 }
{ :tea=>0, :coffee=>0, :biscuit=>4 }
{ :tea=>0, :coffee=>0, :biscuit=>5 }
```

As you can see, most of the attempts change the number of biscuits, whilst we haven't even explored the possibility that tea might be in the solution yet.
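
To see where that ordering comes from, here is a standalone sketch that follows the sorter and composer logic as it appears in this release, without requiring the gem. The names `bit_order` and `attempt` are made up for illustration and are not the gem's API. The idea is that each bit of the attempt index is assigned to a key, and likelier keys claim the low-order (fastest-changing) bits.

```ruby
# Hypothetical helper: decide which key owns each bit of the index.
# The key with the highest running score takes the next bit, then its
# score is multiplied by its probability so that other keys get a turn.
def bit_order(frequencies, bit_count)
  total         = frequencies.values.sum.to_f
  probabilities = frequencies.transform_values { |v| v / total }
  scores        = probabilities.dup
  remaining     = frequencies.keys.to_h { |k| [k, bit_count] }
  order         = []

  until remaining.empty?
    key = scores.max_by { |_, score| score }.first
    order << key
    scores[key] *= probabilities[key]
    remaining[key] -= 1

    # A key that has used all of its bits drops out of the running.
    if remaining[key].zero?
      remaining.delete(key)
      scores.delete(key)
    end
  end

  order
end

# Hypothetical helper: scatter the bits of index across the keys,
# least significant bit first for each key.
def attempt(index, order, keys)
  counts = keys.to_h { |k| [k, 0] }
  used   = Hash.new(0)

  order.each_with_index do |key, position|
    counts[key] |= index[position] << used[key]
    used[key] += 1
  end

  counts
end

distribution = { :tea => 25, :coffee => 60, :biscuit => 115 }
order = bit_order(distribution, 4)

(0..9).each { |i| puts attempt(i, order, distribution.keys).inspect }
```

With this distribution the two lowest bits belong to biscuit and the third to coffee, while tea's first bit is only the fifth, so tea stays at zero until index 16. The ten hashes printed carry the same values as the attempts listed above.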

## Limit

All attempts are guaranteed to be unique and appear in a deterministic order. The 'limit' method returns the highest index in the search space (zero-offset), so the number of unique enumerations is one more than the limit.

```ruby
enum = FrequencyEnumerator.new(distribution, :bit_count => 4)
enum.limit #=> 4095
```

So there will be 4096 enumerations yielded to the block.
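
The arithmetic behind that figure can be checked by hand. The sketch below mirrors the expression used in this release's `limit` method, `(2 ** bit_count) ** frequencies.size - 1`; the helper name `search_space_size` is made up for illustration.

```ruby
# Hypothetical helper: each key can take 2 ** bit_count values, and the
# keys vary independently, so the sizes multiply.
def search_space_size(bit_count, key_count)
  (2 ** bit_count) ** key_count
end

distribution = { :tea => 25, :coffee => 60, :biscuit => 115 }

size  = search_space_size(4, distribution.size) # 16 ** 3
limit = size - 1                                # highest zero-based index

puts "#{size} attempts, indexed 0..#{limit}"
```

Note how quickly this grows: every extra bit per key doubles the range of each key, so the default of 6 bits with the same three keys already gives 64 ** 3 attempts.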

## Options

You can set 'from' and 'to' to explore different portions of the search space:

```ruby
FrequencyEnumerator.new(distribution, :from => 100, :to => 199)
```

This might be useful for multi-threading, map-reduce, or carrying on from where you left off if you're exploring a large search space.
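
As a rough illustration of the multi-worker case, the sketch below splits the index range into contiguous chunks. The `partitions` helper is hypothetical and not part of the gem; it only produces `:from` and `:to` pairs and assumes both bounds are inclusive, as the range check in this release suggests.

```ruby
# Hypothetical helper: split 0..limit into at most `parts` contiguous,
# non-overlapping chunks expressed as :from/:to option hashes.
def partitions(limit, parts)
  size = (limit + 1).fdiv(parts).ceil

  (0..limit).step(size).map do |from|
    { :from => from, :to => [from + size - 1, limit].min }
  end
end

partitions(4095, 4).each do |options|
  puts "worker covers #{options[:from]}..#{options[:to]}"
end
```

Each hash could then be merged with `:bit_count` and passed to `FrequencyEnumerator.new` by a separate worker. Because the order is deterministic, recording the last index a worker reached should be enough to resume later with a new `:from`.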

## Real-world example

My motivation for building this gem is to more intelligently brute-force the problem of finding [self-enumerating pangrams](http://en.wikipedia.org/wiki/Pangram#Self-enumerating_pangrams) by using classical literature to build a frequency distribution of English text.

In theory, mutating the E's, T's, A's, O's and I's first should result in attempts that correlate with English text and therefore are more likely to be solutions.

## Contribution

Feel free to contribute. No commit is too small.

If you're good at optimisation, this project might be for you.

You should follow me: [@cpatuzzo](https://twitter.com/cpatuzzo)
data/lib/frequency_enumerator/base.rb
ADDED
@@ -0,0 +1,75 @@
class FrequencyEnumerator < Enumerable::Enumerator

  attr_reader :frequencies, :bit_count, :from, :to

  def initialize(frequencies, params = {})
    @frequencies = frequencies

    @bit_count = params[:bit_count] || 6
    @from = params[:from] || 0
    @to = params[:to] || limit

    raise_if_either_boundary_is_out_of_range

    @sorter = params[:sorter] || fe::Sorter
    @composer = params[:composer] || fe::Composer
    @decomposer = params[:decomposer] || fe::Decomposer
  end

  def each(&block)
    (from..to).each do |i|
      binary = decomposer.decompose(i)
      bitmap = fragmented_bitmap(binary)
      yield composition(bitmap)
    end

    self
  end

  def limit
    @limit ||= (2 ** bit_count) ** frequencies.size - 1
  end

  private
  def decomposer
    @decomposer.new(:bit_count => @bit_count * frequencies.size)
  end

  def fragmented_bitmap(binary)
    pairs = binary.zip(sorted_keys)
    empty_array_default = Hash.new { |h, k| h[k] = [] }

    pairs.inject(empty_array_default) do |h, (bit, key)|
      h[key] << bit; h
    end
  end

  def composition(bitmap)
    bitmap.inject({}) do |h, (key, fragment)|
      h.merge(key => @composer.compose(fragment))
    end
  end

  def sorted_keys
    return @sorted_keys if @sorted_keys
    sorter = @sorter.new(:bit_count => @bit_count)
    @sorted_keys = sorter.sort(frequencies)
  end

  def raise_if_either_boundary_is_out_of_range
    [@from, @to].each do |i|
      raise ArgumentError.new(
        "#{i} lies outside of the range of the function: (0..#{limit})."
      ) if out_of_range?(i)
    end
  end

  def out_of_range?(x)
    x < 0 || x > limit
  end

  def fe
    self.class
  end

end
data/lib/frequency_enumerator/composer.rb
ADDED
@@ -0,0 +1,45 @@
class FrequencyEnumerator::Composer

  attr_reader :endianess

  def initialize(params = {})
    @endianess = params[:endianess]
  end

  def self.compose(bit_array)
    new.compose(bit_array)
  end

  def compose(bit_array)
    raise_if_non_binary_elements(bit_array)

    bit_array = bit_array.reverse if big_endian?

    bit_array.each_with_index.inject(0) do |sum, (bit, index)|
      sum + (bit << index)
    end
  end

  def little_endian?
    @endianess == :little
  end

  def big_endian?
    @endianess == :big
  end

  private
  def raise_if_non_binary_elements(bit_array)
    non_binary_elements = bit_array.reject { |b| [0, 1].include?(b) }

    if non_binary_elements.any?
      plural = 's' if non_binary_elements.size > 1
      elements = non_binary_elements.map(&:inspect).join(', ')

      raise TypeError.new(
        "Composing from non-binary element#{plural} #{elements}."
      )
    end
  end

end
data/lib/frequency_enumerator/decomposer.rb
ADDED
@@ -0,0 +1,56 @@
class FrequencyEnumerator::Decomposer

  class ::OverflowError < StandardError; end
  class ::SignedError < StandardError; end

  attr_reader :bit_count
  attr_reader :endianness

  def initialize(params = {})
    @bit_count = params[:bit_count] || 8
    @endianness = params[:endianness] || :little
  end

  def self.decompose(integer)
    new.decompose(integer)
  end

  def decompose(integer)
    raise_if_negative(integer)
    raise_if_not_enough_bits(integer)

    bit_array = bit_count.times.map { |b| integer[b] }

    little_endian? ? bit_array : bit_array.reverse
  end

  def little_endian?
    endianness == :little
  end

  def big_endian?
    endianness == :big
  end

  private
  def raise_if_negative(integer)
    raise SignedError.new(
      "Decomposing negative integers is unsupported."
    ) if integer < 0
  end

  def raise_if_not_enough_bits(integer)
    bits_required = bits_required_to_decompose(integer)

    raise OverflowError.new(
      "Decomposing #{integer} requires more than #{bit_count} bits."
    ) if bits_required > bit_count
  end

  def bits_required_to_decompose(integer)
    (1..1.0/0).detect do |bits|
      (integer >> bits).zero?
    end
  end

end
data/lib/frequency_enumerator/sorter.rb
ADDED
@@ -0,0 +1,77 @@
class FrequencyEnumerator::Sorter

  attr_reader :bit_count

  def initialize(params = {})
    @bit_count = params[:bit_count] || 8
  end

  def self.sort(frequencies)
    new.sort(frequencies)
  end

  def sort(frequencies)
    helper = AccumulationHelper.new(frequencies, bit_count)
    sorted_keys = []

    until helper.depleted_keys? do
      key = helper.maximal_key
      sorted_keys << key
      helper.accumulate(key)
    end

    sorted_keys
  end

  class AccumulationHelper

    attr_reader :frequencies, :bit_count

    def initialize(frequencies, bit_count = 6)
      @frequencies = frequencies
      @bit_count = bit_count
    end

    def depleted_keys?
      available_keys.empty?
    end

    def maximal_key
      accumulation.max_by { |_, v| v }.first
    end

    def accumulate(key)
      accumulation[key] *= probabilities[key]
      consume(key)
    end

    def available_keys
      @available_keys ||= frequencies.inject({}) do |hash, (k, _)|
        hash.merge(k => bit_count)
      end
    end

    def accumulation
      @accumulation ||= probabilities.dup
    end

    def consume(key)
      available_keys[key] -= 1

      if available_keys[key].zero?
        available_keys.delete(key)
        accumulation.delete(key)
      end
    end

    def probabilities
      return @probabilities if @probabilities
      total = frequencies.values.inject(:+)
      @probabilities = frequencies.inject({}) do |hash, (k, v)|
        hash.merge(k => v.to_f / total)
      end
    end

  end

end
metadata
ADDED
@@ -0,0 +1,84 @@
--- !ruby/object:Gem::Specification
name: frequency_enumerator
version: !ruby/object:Gem::Version
  hash: 23
  prerelease:
  segments:
  - 1
  - 0
  - 0
  version: 1.0.0
platform: ruby
authors:
- Christopher Patuzzo
autorequire:
bindir: bin
cert_chain: []

date: 2012-10-05 00:00:00 Z
dependencies:
- !ruby/object:Gem::Dependency
  name: rspec
  prerelease: false
  requirement: &id001 !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        hash: 3
        segments:
        - 0
        version: "0"
  type: :development
  version_requirements: *id001
description: Yields hashes that correlate with the given frequency distribution.
email: chris@patuzzo.co.uk
executables: []

extensions: []

extra_rdoc_files: []

files:
- README.md
- lib/frequency_enumerator/base.rb
- lib/frequency_enumerator/composer.rb
- lib/frequency_enumerator/decomposer.rb
- lib/frequency_enumerator/sorter.rb
- lib/frequency_enumerator.rb
homepage: https://github.com/cpatuzzo/frequency_enumerator
licenses: []

post_install_message:
rdoc_options: []

require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      hash: 3
      segments:
      - 0
      version: "0"
required_rubygems_version: !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      hash: 3
      segments:
      - 0
      version: "0"
requirements: []

rubyforge_project:
rubygems_version: 1.8.24
signing_key:
specification_version: 3
summary: Frequency Enumerator
test_files: []

has_rdoc: