frequency_enumerator 1.0.0

data/README.md ADDED
@@ -0,0 +1,100 @@
+ ## Frequency Enumerator
+
+ Yields hashes that correlate with the given frequency distribution.
+
+ ## Concept
+
+ If you're using brute-force search to solve some problem, it makes sense to carry out some frequency analysis on the problem first.
+
+ Consider a simple example of trying to figure out which combinations of items cost a known total:
+
+ ```
+ Total: £2.00
+
+ Item prices: Tea (£0.20), Coffee (£0.30), Biscuit (£0.15)
+ ```
+
+ We could use *maths* to solve this problem. Or we could brute-force it.
+
+ For the latter, you'd go through every combination of quantities of these items and see which ones total £2.00. In this example, that'd take no time at all, but what if we're dealing with huge sums of money, or there are dozens of items? What if we're brute-forcing passwords?
+
+ It helps to do some [frequency analysis](https://github.com/cpatuzzo/frequency_analyser) first.
+
23
+ You might discover, that in fact, almost no one drinks tea and everyone loves biscuits. You might ask a couple of hundred people and end up with a frequency distribution like this:
24
+
25
+ ```ruby
26
+ { :tea => 25, :coffee => 60, :biscuit => 115 }
27
+ ```
28
+
29
+ It'd be nice if we could brute-force the problem, but be more intelligent about the order in which we do so. We should use make use of our valuable, newfound knowledge.
30
+
31
+ And that's exactly what Frequency Enumerator does. (I got there in the end!)
32
+
33
+ You simply feed it a frequency distribution and it does its best to spew out 'attempts' that correlate with the given distribution. In our case, we'd do something like this:
34
+
+ ## Usage
+
+ ```ruby
+ # gem install frequency_enumerator
+
+ require 'frequency_enumerator'
+
+ distribution = { :tea => 25, :coffee => 60, :biscuit => 115 }
+ bits_required = 4 # 0..15 should be enough for our simple problem
+
+ FrequencyEnumerator.new(distribution, :bit_count => bits_required).each do |hash|
+   # ...
+ end
+ ```
+
+ The first 10 attempts yielded to the block are:
+
+ ```ruby
+ { :tea=>0, :coffee=>0, :biscuit=>0 }
+ { :tea=>0, :coffee=>0, :biscuit=>1 }
+ { :tea=>0, :coffee=>0, :biscuit=>2 }
+ { :tea=>0, :coffee=>0, :biscuit=>3 }
+ { :tea=>0, :coffee=>1, :biscuit=>0 }
+ { :tea=>0, :coffee=>1, :biscuit=>1 }
+ { :tea=>0, :coffee=>1, :biscuit=>2 }
+ { :tea=>0, :coffee=>1, :biscuit=>3 }
+ { :tea=>0, :coffee=>0, :biscuit=>4 }
+ { :tea=>0, :coffee=>0, :biscuit=>5 }
+ ```
+
+ As you can see, most of the attempts vary the number of biscuits, whilst we haven't even explored the possibility that tea might be in the solution yet.
+
+ ## Limit
+
+ All attempts are guaranteed to be unique and appear in a deterministic order. The 'limit' method returns the highest index in the search space, `(2 ** bit_count) ** distribution.size - 1`. Indices start at zero, so this is one less than the number of unique enumerations.
+
+ ```ruby
+ enum = FrequencyEnumerator.new(distribution, :bit_count => 4)
+ enum.limit #=> 4095
+ ```
+
+ So there will be 4096 enumerations yielded to the block.
+
+ ## Options
+
+ You can set 'from' and 'to' to explore different portions of the search space. Both boundaries are inclusive and must lie within `0..limit`, otherwise an ArgumentError is raised. If 'bit_count' is omitted, it defaults to 6.
+
+ ```ruby
+ FrequencyEnumerator.new(distribution, :from => 100, :to => 199)
+ ```
+
+ This might be useful for multi-threading, map-reduce, or carrying on from where you left off if you're exploring a large search space.
+
+ ## Real-world example
+
+ My motivation for building this gem is to more intelligently brute-force the problem of finding [self-enumerating pangrams](http://en.wikipedia.org/wiki/Pangram#Self-enumerating_pangrams) by using classical literature to build a frequency distribution of English text.
+
+ In theory, mutating the E's, T's, A's, O's and I's first should result in attempts that correlate with English text and therefore are more likely to be solutions.
+
+ ## Contribution
+
+ Feel free to contribute. No commit is too small.
+
+ If you're good at optimisation, this project might be for you.
+
+ You should follow me: [@cpatuzzo](https://twitter.com/cpatuzzo)
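The 'limit', 'from' and 'to' options fit together naturally when splitting work across threads or machines. The plain-Ruby sketch below does not require the gem. It recomputes the highest index with the same formula as `FrequencyEnumerator#limit`, then splits `0..limit` into contiguous, inclusive slices. `search_space_limit` and `partition_search_space` are hypothetical helpers and are not part of the gem.

```ruby
# Hypothetical helper (not part of the gem): same formula as
# FrequencyEnumerator#limit. There are 2**bit_count values per key and one
# key per entry in the distribution. Subtract one because indices start at 0.
def search_space_limit(key_count, bit_count)
  (2 ** bit_count) ** key_count - 1
end

# Hypothetical helper (not part of the gem): split the inclusive index
# range 0..limit into `parts` contiguous [from, to] pairs, giving the
# leftover indices to the first few slices.
def partition_search_space(limit, parts)
  total = limit + 1
  size, remainder = total.divmod(parts)
  from = 0
  parts.times.map do |i|
    length = size + (i < remainder ? 1 : 0)
    to = from + length - 1
    pair = [from, to]
    from = to + 1
    pair
  end
end

limit = search_space_limit(3, 4)   # tea, coffee, biscuit at 4 bits each
ranges = partition_search_space(limit, 4)
p limit   # => 4095
p ranges  # => [[0, 1023], [1024, 2047], [2048, 3071], [3072, 4095]]
```

Each pair could then be passed as `FrequencyEnumerator.new(distribution, :bit_count => 4, :from => from, :to => to)`. Both boundaries are inclusive, so the slices neither overlap nor leave gaps.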
data/lib/frequency_enumerator/base.rb ADDED
@@ -0,0 +1,75 @@
+ class FrequencyEnumerator < Enumerable::Enumerator
+
+   attr_reader :frequencies, :bit_count, :from, :to
+
+   def initialize(frequencies, params = {})
+     @frequencies = frequencies
+
+     @bit_count = params[:bit_count] || 6
+     @from = params[:from] || 0
+     @to = params[:to] || limit
+
+     raise_if_either_boundary_is_out_of_range
+
+     @sorter = params[:sorter] || fe::Sorter
+     @composer = params[:composer] || fe::Composer
+     @decomposer = params[:decomposer] || fe::Decomposer
+   end
+
+   def each(&block)
+     (from..to).each do |i|
+       binary = decomposer.decompose(i)
+       bitmap = fragmented_bitmap(binary)
+       yield composition(bitmap)
+     end
+
+     self
+   end
+
+   def limit
+     @limit ||= (2 ** bit_count) ** frequencies.size - 1
+   end
+
+   private
+   def decomposer
+     @decomposer.new(:bit_count => @bit_count * frequencies.size)
+   end
+
+   def fragmented_bitmap(binary)
+     pairs = binary.zip(sorted_keys)
+     empty_array_default = Hash.new { |h, k| h[k] = [] }
+
+     pairs.inject(empty_array_default) do |h, (bit, key)|
+       h[key] << bit; h
+     end
+   end
+
+   def composition(bitmap)
+     bitmap.inject({}) do |h, (key, fragment)|
+       h.merge(key => @composer.compose(fragment))
+     end
+   end
+
+   def sorted_keys
+     return @sorted_keys if @sorted_keys
+     sorter = @sorter.new(:bit_count => @bit_count)
+     @sorted_keys = sorter.sort(frequencies)
+   end
+
+   def raise_if_either_boundary_is_out_of_range
+     [@from, @to].each do |i|
+       raise ArgumentError.new(
+         "#{i} lies outside of the range of the function: (0..#{limit})."
+       ) if out_of_range?(i)
+     end
+   end
+
+   def out_of_range?(x)
+     x < 0 || x > limit
+   end
+
+   def fe
+     self.class
+   end
+
+ end
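The standalone sketch below shows what `each` does to a single index. First the index is decomposed into little-endian bits. Next, each bit position is assigned to a key. Finally, each key's bits are composed back into an integer. `KEY_ORDER` is a hand-worked assumption about the bit-to-key order the greedy sorter gives for the README's distribution at 4 bits. `decompose`, `fragment`, `compose` and `attempt` are hypothetical names, not the gem's API.

```ruby
# Assumed bit-to-key order (position 0 is the least significant bit).
KEY_ORDER = [:biscuit, :biscuit, :coffee, :biscuit, :tea, :biscuit,
             :coffee, :coffee, :tea, :coffee, :tea, :tea]

# Little-endian bits of `index`: Integer#[] reads bit b, so element 0 is the LSB.
def decompose(index, bit_count)
  bit_count.times.map { |b| index[b] }
end

# Group the bits by the key that each bit position belongs to.
def fragment(bits, key_order)
  grouped = Hash.new { |h, k| h[k] = [] }
  bits.zip(key_order).each { |bit, key| grouped[key] << bit }
  grouped
end

# Rebuild an integer from a little-endian fragment of bits.
def compose(bits)
  bits.each_with_index.inject(0) { |sum, (bit, i)| sum + (bit << i) }
end

# One attempt: decompose -> fragment by key -> compose each fragment.
def attempt(index, key_order)
  fragments = fragment(decompose(index, key_order.size), key_order)
  fragments.each_with_object({}) { |(key, bits), h| h[key] = compose(bits) }
end

p attempt(8, KEY_ORDER)
```

With this order, index 8 sets only bit 3, which is biscuit's third bit. The result is biscuit 4, coffee 0 and tea 0, which agrees with the ninth attempt listed in the README.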
data/lib/frequency_enumerator/composer.rb ADDED
@@ -0,0 +1,45 @@
+ class FrequencyEnumerator::Composer
+
+   attr_reader :endianess
+
+   def initialize(params = {})
+     @endianess = params[:endianess]
+   end
+
+   def self.compose(bit_array)
+     new.compose(bit_array)
+   end
+
+   def compose(bit_array)
+     raise_if_non_binary_elements(bit_array)
+
+     bit_array = bit_array.reverse if big_endian?
+
+     bit_array.each_with_index.inject(0) do |sum, (bit, index)|
+       sum + (bit << index)
+     end
+   end
+
+   def little_endian?
+     @endianess == :little
+   end
+
+   def big_endian?
+     @endianess == :big
+   end
+
+   private
+   def raise_if_non_binary_elements(bit_array)
+     non_binary_elements = bit_array.reject { |b| [0, 1].include?(b) }
+
+     # empty? rather than any?, so that nil/false elements are also reported
+     unless non_binary_elements.empty?
+       plural = 's' if non_binary_elements.size > 1
+       elements = non_binary_elements.map(&:inspect).join(', ')
+
+       raise TypeError.new(
+         "Composing from non-binary element#{plural} #{elements}."
+       )
+     end
+   end
+
+ end
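One way to sanity-check `Composer#compose` is to compare it with Ruby's own base-2 parsing. The sketch below copies its logic into hypothetical standalone methods (`compose_bits`, `compose_via_string`). Like the class, it treats anything other than `:big` as "element 0 is the least significant bit".

```ruby
# Hypothetical standalone copy of Composer#compose: shift each bit left by
# its position. Big-endian arrays are reversed first, so the LSB is at index 0.
def compose_bits(bits, endianness = :little)
  unless bits.all? { |b| b == 0 || b == 1 }
    raise TypeError, "Composing from non-binary elements #{bits.inspect}."
  end
  bits = bits.reverse if endianness == :big
  bits.each_with_index.inject(0) { |sum, (bit, i)| sum + (bit << i) }
end

# Cross-check: String#to_i(2) reads the most significant digit first,
# so little-endian arrays are reversed before joining.
def compose_via_string(bits, endianness = :little)
  ordered = endianness == :big ? bits : bits.reverse
  ordered.join.to_i(2)
end

p compose_bits([1, 0, 1, 1])          # => 13 (little-endian: 1 + 4 + 8)
p compose_bits([1, 0, 1, 1], :big)    # => 11 (big-endian: 0b1011)
```

Read little-endian, the same array `[1, 0, 1, 1]` gives 13. Read big-endian, it gives 11. This is why the composer and the decomposer must agree on byte order.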
data/lib/frequency_enumerator/decomposer.rb ADDED
@@ -0,0 +1,56 @@
+ class FrequencyEnumerator::Decomposer
+
+   class ::OverflowError < StandardError; end
+   class ::SignedError < StandardError; end
+
+   attr_reader :bit_count
+   attr_reader :endianness
+
+   def initialize(params = {})
+     @bit_count = params[:bit_count] || 8
+     @endianness = params[:endianness] || :little
+   end
+
+   def self.decompose(integer)
+     new.decompose(integer)
+   end
+
+   def decompose(integer)
+     raise_if_negative(integer)
+     raise_if_not_enough_bits(integer)
+
+     bit_array = bit_count.times.map { |b| integer[b] }
+
+     little_endian? ? bit_array : bit_array.reverse
+   end
+
+   def little_endian?
+     endianness == :little
+   end
+
+   def big_endian?
+     endianness == :big
+   end
+
+   private
+   def raise_if_negative(integer)
+     raise SignedError.new(
+       "Decomposing negative integers is unsupported."
+     ) if integer < 0
+   end
+
+   def raise_if_not_enough_bits(integer)
+     bits_required = bits_required_to_decompose(integer)
+
+     raise OverflowError.new(
+       "Decomposing #{integer} requires more than #{bit_count} bits."
+     ) if bits_required > bit_count
+   end
+
+   def bits_required_to_decompose(integer)
+     (1..1.0/0).detect do |bits|
+       (integer >> bits).zero?
+     end
+   end
+
+ end
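`bits_required_to_decompose` scans an unbounded range for the first shift that leaves zero. On Ruby 2.1 or later, `Integer#bit_length` gives the same answer directly, with one caveat: it returns 0 for zero, whereas the scan (which starts at 1) reports 1. The sketch below is a standalone decomposition built on that approach. The method names are hypothetical, and the error classes only mirror the gem's.

```ruby
class OverflowError < StandardError; end
class SignedError < StandardError; end

# Hypothetical: bits needed to represent a non-negative integer. Zero is
# clamped to 1 bit to match the gem's range-scanning helper.
def bits_required(integer)
  [integer.bit_length, 1].max
end

# Hypothetical standalone copy of Decomposer#decompose.
def decompose_bits(integer, bit_count = 8, endianness = :little)
  raise SignedError, "Decomposing negative integers is unsupported." if integer < 0
  if bits_required(integer) > bit_count
    raise OverflowError, "Decomposing #{integer} requires more than #{bit_count} bits."
  end
  bits = bit_count.times.map { |b| integer[b] }  # Integer#[] reads bit b
  endianness == :little ? bits : bits.reverse
end

p decompose_bits(11, 4)         # => [1, 1, 0, 1]
p decompose_bits(11, 4, :big)   # => [1, 0, 1, 1]
```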
data/lib/frequency_enumerator/sorter.rb ADDED
@@ -0,0 +1,77 @@
+ class FrequencyEnumerator::Sorter
+
+   attr_reader :bit_count
+
+   def initialize(params = {})
+     @bit_count = params[:bit_count] || 8
+   end
+
+   def self.sort(frequencies)
+     new.sort(frequencies)
+   end
+
+   def sort(frequencies)
+     helper = AccumulationHelper.new(frequencies, bit_count)
+     sorted_keys = []
+
+     until helper.depleted_keys? do
+       key = helper.maximal_key
+       sorted_keys << key
+       helper.accumulate(key)
+     end
+
+     sorted_keys
+   end
+
+   class AccumulationHelper
+
+     attr_reader :frequencies, :bit_count
+
+     def initialize(frequencies, bit_count = 6)
+       @frequencies = frequencies
+       @bit_count = bit_count
+     end
+
+     def depleted_keys?
+       available_keys.empty?
+     end
+
+     def maximal_key
+       accumulation.max_by { |_, v| v }.first
+     end
+
+     def accumulate(key)
+       accumulation[key] *= probabilities[key]
+       consume(key)
+     end
+
+     def available_keys
+       @available_keys ||= frequencies.inject({}) do |hash, (k, _)|
+         hash.merge(k => bit_count)
+       end
+     end
+
+     def accumulation
+       @accumulation ||= probabilities.dup
+     end
+
+     def consume(key)
+       available_keys[key] -= 1
+
+       if available_keys[key].zero?
+         available_keys.delete(key)
+         accumulation.delete(key)
+       end
+     end
+
+     def probabilities
+       return @probabilities if @probabilities
+       total = frequencies.values.inject(:+)
+       @probabilities = frequencies.inject({}) do |hash, (k, v)|
+         hash.merge(k => v.to_f / total)
+       end
+     end
+
+   end
+
+ end
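The Sorter's greedy loop is easiest to follow on concrete numbers. The sketch below restates `AccumulationHelper`'s logic as a single hypothetical method, `greedy_key_order`, and runs it on the README's distribution with 4 bits per key. On each step it picks the key with the highest accumulated probability and multiplies that key's accumulation by its own probability. It stops once every key has been picked `bit_count` times.

```ruby
# Hypothetical standalone restatement of Sorter#sort / AccumulationHelper.
def greedy_key_order(frequencies, bit_count)
  total = frequencies.values.inject(:+).to_f
  probabilities = Hash[frequencies.map { |k, v| [k, v / total] }]
  accumulation = probabilities.dup
  remaining = Hash[frequencies.keys.map { |k| [k, bit_count] }]
  order = []

  until remaining.empty?
    key = accumulation.max_by { |_, v| v }.first   # most probable key so far
    order << key
    accumulation[key] *= probabilities[key]        # each pick makes it less likely
    remaining[key] -= 1
    next unless remaining[key].zero?
    remaining.delete(key)                          # key has used all its bits
    accumulation.delete(key)
  end

  order
end

p greedy_key_order({ :tea => 25, :coffee => 60, :biscuit => 115 }, 4)
```

Biscuit is the most frequent item, so it takes the lowest bit positions and changes most often as the index counts up. Two of tea's four bits are the most significant ones, and its lowest is bit 4. That is why tea stays at 0 throughout the README's first ten attempts.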
data/lib/frequency_enumerator.rb ADDED
@@ -0,0 +1,4 @@
+ require 'frequency_enumerator/base'
+ require 'frequency_enumerator/decomposer'
+ require 'frequency_enumerator/composer'
+ require 'frequency_enumerator/sorter'
metadata ADDED
@@ -0,0 +1,84 @@
+ --- !ruby/object:Gem::Specification
+ name: frequency_enumerator
+ version: !ruby/object:Gem::Version
+   hash: 23
+   prerelease:
+   segments:
+   - 1
+   - 0
+   - 0
+   version: 1.0.0
+ platform: ruby
+ authors:
+ - Christopher Patuzzo
+ autorequire:
+ bindir: bin
+ cert_chain: []
+
+ date: 2012-10-05 00:00:00 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: rspec
+   prerelease: false
+   requirement: &id001 !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         hash: 3
+         segments:
+         - 0
+         version: "0"
+   type: :development
+   version_requirements: *id001
+ description: Yields hashes that correlate with the given frequency distribution.
+ email: chris@patuzzo.co.uk
+ executables: []
+
+ extensions: []
+
+ extra_rdoc_files: []
+
+ files:
+ - README.md
+ - lib/frequency_enumerator/base.rb
+ - lib/frequency_enumerator/composer.rb
+ - lib/frequency_enumerator/decomposer.rb
+ - lib/frequency_enumerator/sorter.rb
+ - lib/frequency_enumerator.rb
+ homepage: https://github.com/cpatuzzo/frequency_enumerator
+ licenses: []
+
+ post_install_message:
+ rdoc_options: []
+
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       hash: 3
+       segments:
+       - 0
+       version: "0"
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       hash: 3
+       segments:
+       - 0
+       version: "0"
+ requirements: []
+
+ rubyforge_project:
+ rubygems_version: 1.8.24
+ signing_key:
+ specification_version: 3
+ summary: Frequency Enumerator
+ test_files: []
+
+ has_rdoc: