frequency_enumerator 1.0.0
- data/README.md +100 -0
- data/lib/frequency_enumerator/base.rb +75 -0
- data/lib/frequency_enumerator/composer.rb +45 -0
- data/lib/frequency_enumerator/decomposer.rb +56 -0
- data/lib/frequency_enumerator/sorter.rb +77 -0
- data/lib/frequency_enumerator.rb +4 -0
- metadata +84 -0
data/README.md
ADDED
@@ -0,0 +1,100 @@
## Frequency Enumerator

Yields hashes that correlate with the given frequency distribution.

## Concept

If you're using brute-force search to solve some problem, it makes sense to carry out some frequency analysis on the problem first.

Consider a simple example of trying to figure out which combinations of items cost a known total:

```
Total: £2.00

Item prices: Tea (£0.20), Coffee (£0.30), Biscuit (£0.15)
```

We could use *maths* to solve this problem. Or we could brute-force it.

For the latter, you'd go through every combination of these items and see which totalled £2.00. In this example, that'd take no time at all, but what if we're dealing with huge sums of money, or there are dozens of items? What if we're brute-forcing passwords?

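As a rough sketch of the naive approach (not part of the gem), the snippet below tries every quantity of each item and keeps the combinations that total £2.00. The `brute_force` helper and its constants are made up for illustration, and prices are held in pence to avoid floating-point rounding.

```ruby
# Prices in pence, so the arithmetic stays in integers.
PRICES = { :tea => 20, :coffee => 30, :biscuit => 15 }
TOTAL  = 200

# Hypothetical helper: tries every quantity of every item and returns
# the combinations whose cost equals the total.
def brute_force(prices, total)
  items = prices.keys

  # Each item can appear at most (total / price) times.
  quantity_ranges = items.map { |item| (0..total / prices[item]).to_a }

  # Cartesian product of all the quantity ranges.
  combinations = quantity_ranges.first.product(*quantity_ranges.drop(1))

  matching = combinations.select do |quantities|
    cost = items.zip(quantities).inject(0) { |sum, (item, q)| sum + prices[item] * q }
    cost == total
  end

  matching.map { |quantities| Hash[items.zip(quantities)] }
end

solutions = brute_force(PRICES, TOTAL)
```

On this tiny problem it checks 1,078 combinations (11 × 7 × 14) and finds 16 that work, but the number of combinations multiplies with every item you add.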

It helps to do some [frequency analysis](https://github.com/cpatuzzo/frequency_analyser) first.

You might discover that, in fact, almost no one drinks tea and everyone loves biscuits. You might ask a couple of hundred people and end up with a frequency distribution like this:

```ruby
{ :tea => 25, :coffee => 60, :biscuit => 115 }
```

It'd be nice if we could brute-force the problem, but be more intelligent about the order in which we do so. We should make use of our valuable, newfound knowledge.

And that's exactly what Frequency Enumerator does. (I got there in the end!)

You simply feed it a frequency distribution and it does its best to spew out 'attempts' that correlate with the given distribution. In our case, we'd do something like this:

## Usage

```ruby
# gem install frequency_enumerator

require 'frequency_enumerator'

distribution = { :tea => 25, :coffee => 60, :biscuit => 115 }
bits_required = 4 # 0..15 should be enough for our simple problem

FrequencyEnumerator.new(distribution, :bit_count => bits_required).each do |hash|
  # ...
end
```

The first 10 attempts yielded to the block are:

```ruby
{ :tea=>0, :coffee=>0, :biscuit=>0 }
{ :tea=>0, :coffee=>0, :biscuit=>1 }
{ :tea=>0, :coffee=>0, :biscuit=>2 }
{ :tea=>0, :coffee=>0, :biscuit=>3 }
{ :tea=>0, :coffee=>1, :biscuit=>0 }
{ :tea=>0, :coffee=>1, :biscuit=>1 }
{ :tea=>0, :coffee=>1, :biscuit=>2 }
{ :tea=>0, :coffee=>1, :biscuit=>3 }
{ :tea=>0, :coffee=>0, :biscuit=>4 }
{ :tea=>0, :coffee=>0, :biscuit=>5 }
```

As you can see, most of the attempts change the number of biscuits, whilst we haven't even explored the possibility that tea might be in the solution yet.

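To see why biscuits dominate the early attempts, here's a condensed, stand-alone sketch of how the bits of the counter seem to be shared out between keys, based on the Sorter and Composer code later in this diff. The helper names `allocate_bits` and `attempt` are hypothetical, and this is an illustration rather than the gem's own code.

```ruby
# Hypothetical helper: returns, for each bit position of the counter
# (lowest first), the key that the bit is assigned to.
def allocate_bits(frequencies, bit_count)
  total = frequencies.values.inject(:+)
  probabilities = {}
  frequencies.each { |key, count| probabilities[key] = count.to_f / total }

  scores = probabilities.dup
  remaining = {}
  frequencies.each_key { |key| remaining[key] = bit_count }
  order = []

  until remaining.empty?
    key = scores.max_by { |_, score| score }.first
    order << key
    scores[key] *= probabilities[key] # each further bit of a key scores lower
    remaining[key] -= 1

    if remaining[key].zero?
      remaining.delete(key)
      scores.delete(key)
    end
  end

  order
end

# Hypothetical helper: spreads the bits of integer i across the keys,
# lowest bit first, and returns one attempt.
def attempt(i, order)
  result = Hash.new(0)
  shifts = Hash.new(0)

  order.each_with_index do |key, position|
    result[key] |= i[position] << shifts[key]
    shifts[key] += 1
  end

  result
end

distribution = { :tea => 25, :coffee => 60, :biscuit => 115 }
order = allocate_bits(distribution, 4)
ninth = attempt(8, order) # biscuit's third bit is set, so :biscuit => 4
```

With this distribution, the six lowest bits go to biscuit, biscuit, coffee, biscuit, tea and biscuit in that order, so counting upwards from 0 changes the biscuit count most often and doesn't touch tea until attempt 16.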

## Limit

All attempts are guaranteed to be unique and appear in a deterministic order. The 'limit' method returns the highest index in the search space (zero-offset), which is one less than the number of unique enumerations.

```ruby
enum = FrequencyEnumerator.new(distribution, :bit_count => 4)
enum.limit #=> 4095
```

So there will be 4096 enumerations yielded to the block.

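The arithmetic behind that figure, as a sketch: each key gets `bit_count` bits, so it can take 2 ** bit_count values, and the keys vary independently of each other. `search_space_limit` is a hypothetical stand-alone helper that mirrors the formula in `base.rb` in this diff.

```ruby
# Hypothetical stand-alone version of the calculation, for illustration only.
def search_space_limit(key_count, bit_count)
  (2 ** bit_count) ** key_count - 1
end

limit = search_space_limit(3, 4) # three keys, four bits each
attempts = limit + 1             # indices run from 0 to limit inclusive
```

Here `limit` is 4095 and `attempts` is 4096. With the default of 6 bits per key, the same three keys would give a limit of 262143.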

## Options

You can set 'from' and 'to' to explore different portions of the search space:

```ruby
FrequencyEnumerator.new(distribution, :from => 100, :to => 199)
```

This might be useful for multi-threading, map-reduce, or carrying on from where you left off if you're exploring a large search space.

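One way to carve up the search space for several workers is sketched below. The `partition` helper is hypothetical (it isn't provided by the gem) and assumes you ask for no more parts than there are attempts.

```ruby
# Hypothetical helper: splits 0..limit into `parts` contiguous,
# non-overlapping slices whose sizes differ by at most one.
# Assumes parts <= limit + 1.
def partition(limit, parts)
  size, extra = (limit + 1).divmod(parts)
  start = 0

  (0...parts).map do |index|
    length = size + (index < extra ? 1 : 0)
    slice = { :from => start, :to => start + length - 1 }
    start += length
    slice
  end
end

slices = partition(4095, 4)
```

Each slice is a hash of 'from' and 'to' options, so it could be merged with `:bit_count => 4` and handed to its own `FrequencyEnumerator.new(distribution, ...)` call.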

## Real-world example

My motivation for building this gem is to more intelligently brute-force the problem of finding [self-enumerating pangrams](http://en.wikipedia.org/wiki/Pangram#Self-enumerating_pangrams) by using classical literature to build a frequency distribution of English text.

In theory, mutating the E's, T's, A's, O's and I's first should result in attempts that correlate with English text and therefore are more likely to be solutions.

## Contribution

Feel free to contribute. No commit is too small.

If you're good at optimisation, this project might be for you.

You should follow me: [@cpatuzzo](https://twitter.com/cpatuzzo)
data/lib/frequency_enumerator/base.rb
ADDED
@@ -0,0 +1,75 @@
class FrequencyEnumerator < Enumerable::Enumerator

  attr_reader :frequencies, :bit_count, :from, :to

  def initialize(frequencies, params = {})
    @frequencies = frequencies

    @bit_count = params[:bit_count] || 6
    @from = params[:from] || 0
    @to = params[:to] || limit

    raise_if_either_boundary_is_out_of_range

    @sorter = params[:sorter] || fe::Sorter
    @composer = params[:composer] || fe::Composer
    @decomposer = params[:decomposer] || fe::Decomposer
  end

  def each(&block)
    (from..to).each do |i|
      binary = decomposer.decompose(i)
      bitmap = fragmented_bitmap(binary)
      yield composition(bitmap)
    end

    self
  end

  def limit
    @limit ||= (2 ** bit_count) ** frequencies.size - 1
  end

  private
  def decomposer
    @decomposer.new(:bit_count => @bit_count * frequencies.size)
  end

  def fragmented_bitmap(binary)
    pairs = binary.zip(sorted_keys)
    empty_array_default = Hash.new { |h, k| h[k] = [] }

    pairs.inject(empty_array_default) do |h, (bit, key)|
      h[key] << bit; h
    end
  end

  def composition(bitmap)
    bitmap.inject({}) do |h, (key, fragment)|
      h.merge(key => @composer.compose(fragment))
    end
  end

  def sorted_keys
    return @sorted_keys if @sorted_keys
    sorter = @sorter.new(:bit_count => @bit_count)
    @sorted_keys = sorter.sort(frequencies)
  end

  def raise_if_either_boundary_is_out_of_range
    [@from, @to].each do |i|
      raise ArgumentError.new(
        "#{i} lies outside of the range of the function: (0..#{limit})."
      ) if out_of_range?(i)
    end
  end

  def out_of_range?(x)
    x < 0 || x > limit
  end

  def fe
    self.class
  end

end
data/lib/frequency_enumerator/composer.rb
ADDED
@@ -0,0 +1,45 @@
class FrequencyEnumerator::Composer

  attr_reader :endianess

  def initialize(params = {})
    @endianess = params[:endianess]
  end

  def self.compose(bit_array)
    new.compose(bit_array)
  end

  def compose(bit_array)
    raise_if_non_binary_elements(bit_array)

    bit_array = bit_array.reverse if big_endian?

    bit_array.each_with_index.inject(0) do |sum, (bit, index)|
      sum + (bit << index)
    end
  end

  def little_endian?
    @endianess == :little
  end

  def big_endian?
    @endianess == :big
  end

  private
  def raise_if_non_binary_elements(bit_array)
    non_binary_elements = bit_array.reject { |b| [0, 1].include?(b) }

    if non_binary_elements.any?
      plural = 's' if non_binary_elements.size > 1
      elements = non_binary_elements.map(&:inspect).join(', ')

      raise TypeError.new(
        "Composing from non-binary element#{plural} #{elements}."
      )
    end
  end

end
data/lib/frequency_enumerator/decomposer.rb
ADDED
@@ -0,0 +1,56 @@
class FrequencyEnumerator::Decomposer

  class ::OverflowError < StandardError; end
  class ::SignedError < StandardError; end

  attr_reader :bit_count
  attr_reader :endianness

  def initialize(params = {})
    @bit_count = params[:bit_count] || 8
    @endianness = params[:endianness] || :little
  end

  def self.decompose(integer)
    new.decompose(integer)
  end

  def decompose(integer)
    raise_if_negative(integer)
    raise_if_not_enough_bits(integer)

    bit_array = bit_count.times.map { |b| integer[b] }

    little_endian? ? bit_array : bit_array.reverse
  end

  def little_endian?
    endianness == :little
  end

  def big_endian?
    endianness == :big
  end

  private
  def raise_if_negative(integer)
    raise SignedError.new(
      "Decomposing negative integers is unsupported."
    ) if integer < 0
  end

  def raise_if_not_enough_bits(integer)
    bits_required = bits_required_to_decompose(integer)

    raise OverflowError.new(
      "Decomposing #{integer} requires more than #{bit_count} bits."
    ) if bits_required > bit_count
  end

  def bits_required_to_decompose(integer)
    (1..1.0/0).detect do |bits|
      (integer >> bits).zero?
    end
  end

end
data/lib/frequency_enumerator/sorter.rb
ADDED
@@ -0,0 +1,77 @@
class FrequencyEnumerator::Sorter

  attr_reader :bit_count

  def initialize(params = {})
    @bit_count = params[:bit_count] || 8
  end

  def self.sort(frequencies)
    new.sort(frequencies)
  end

  def sort(frequencies)
    helper = AccumulationHelper.new(frequencies, bit_count)
    sorted_keys = []

    until helper.depleted_keys? do
      key = helper.maximal_key
      sorted_keys << key
      helper.accumulate(key)
    end

    sorted_keys
  end

  class AccumulationHelper

    attr_reader :frequencies, :bit_count

    def initialize(frequencies, bit_count = 6)
      @frequencies = frequencies
      @bit_count = bit_count
    end

    def depleted_keys?
      available_keys.empty?
    end

    def maximal_key
      accumulation.max_by { |_, v| v }.first
    end

    def accumulate(key)
      accumulation[key] *= probabilities[key]
      consume(key)
    end

    def available_keys
      @available_keys ||= frequencies.inject({}) do |hash, (k, _)|
        hash.merge(k => bit_count)
      end
    end

    def accumulation
      @accumulation ||= probabilities.dup
    end

    def consume(key)
      available_keys[key] -= 1

      if available_keys[key].zero?
        available_keys.delete(key)
        accumulation.delete(key)
      end
    end

    def probabilities
      return @probabilities if @probabilities
      total = frequencies.values.inject(:+)
      @probabilities = frequencies.inject({}) do |hash, (k, v)|
        hash.merge(k => v.to_f / total)
      end
    end

  end

end
metadata
ADDED
@@ -0,0 +1,84 @@
--- !ruby/object:Gem::Specification
name: frequency_enumerator
version: !ruby/object:Gem::Version
  hash: 23
  prerelease:
  segments:
  - 1
  - 0
  - 0
  version: 1.0.0
platform: ruby
authors:
- Christopher Patuzzo
autorequire:
bindir: bin
cert_chain: []

date: 2012-10-05 00:00:00 Z
dependencies:
- !ruby/object:Gem::Dependency
  name: rspec
  prerelease: false
  requirement: &id001 !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        hash: 3
        segments:
        - 0
        version: "0"
  type: :development
  version_requirements: *id001
description: Yields hashes that correlate with the given frequency distribution.
email: chris@patuzzo.co.uk
executables: []

extensions: []

extra_rdoc_files: []

files:
- README.md
- lib/frequency_enumerator/base.rb
- lib/frequency_enumerator/composer.rb
- lib/frequency_enumerator/decomposer.rb
- lib/frequency_enumerator/sorter.rb
- lib/frequency_enumerator.rb
homepage: https://github.com/cpatuzzo/frequency_enumerator
licenses: []

post_install_message:
rdoc_options: []

require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      hash: 3
      segments:
      - 0
      version: "0"
required_rubygems_version: !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      hash: 3
      segments:
      - 0
      version: "0"
requirements: []

rubyforge_project:
rubygems_version: 1.8.24
signing_key:
specification_version: 3
summary: Frequency Enumerator
test_files: []

has_rdoc: