frequency_enumerator 1.0.0

data/README.md ADDED
@@ -0,0 +1,100 @@
1
+ ## Frequency Enumerator
2
+
3
+ Yields hashes that correlate with the given frequency distribution.
4
+
5
+ ## Concept
6
+
7
+ If you're using brute-force search to solve some problem, it makes sense to carry out some frequency analysis on the problem first.
8
+
9
+ Consider a simple example of trying to figure out which combinations of items cost a known total:
10
+
11
+ ```
12
+ Total: £2.00
13
+
14
+ Item prices: Tea (£0.20), Coffee (£0.30), Biscuit (£0.15)
15
+ ```
16
+
17
+ We could use *maths* to solve this problem. Or we could brute-force it.
18
+
19
+ For the latter, you'd go through every combination of these items and see which totalled £2.00. In this example, that'd take no time at all, but what if we're dealing with huge sums of money, or there are dozens of items? What if we're brute-forcing passwords?
20
+
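A naive brute-force search for the example above might look like the following sketch. The `brute_force` helper and the `max_quantity` cap are assumptions made for illustration and are not part of the gem; prices are held in pence to avoid floating-point rounding.

```ruby
# Prices in pence, taken from the example above.
PRICES = { :tea => 20, :coffee => 30, :biscuit => 15 }
TOTAL  = 200

# Hypothetical helper: try every quantity from 0 to max_quantity for each item
# and keep the combinations whose cost equals the total.
def brute_force(prices, total, max_quantity)
  items      = prices.keys
  quantities = (0..max_quantity).to_a

  # Cartesian product: one quantity per item, visited in a fixed, naive order.
  combos = quantities.product(*Array.new(items.size - 1) { quantities })

  combos.map { |combo| Hash[items.zip(combo)] }.select do |attempt|
    cost = attempt.inject(0) { |sum, (item, count)| sum + prices[item] * count }
    cost == total
  end
end

solutions = brute_force(PRICES, TOTAL, 15)
```

With a cap of 15 per item this checks all 16 × 16 × 16 = 4096 combinations in a fixed order, regardless of how likely each one is.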
21
+ It helps to do some [frequency analysis](https://github.com/cpatuzzo/frequency_analyser) first.
22
+
23
+ You might discover that, in fact, almost no one drinks tea and everyone loves biscuits. You might ask a couple of hundred people and end up with a frequency distribution like this:
24
+
25
+ ```ruby
26
+ { :tea => 25, :coffee => 60, :biscuit => 115 }
27
+ ```
28
+
29
+ It'd be nice if we could brute-force the problem, but be more intelligent about the order in which we do so. We should make use of our valuable, newfound knowledge.
30
+
31
+ And that's exactly what Frequency Enumerator does. (I got there in the end!)
32
+
33
+ You simply feed it a frequency distribution and it does its best to spew out 'attempts' that correlate with the given distribution. In our case, we'd do something like this:
34
+
35
+ ## Usage
36
+
37
+ ```ruby
38
+ # gem install frequency_enumerator
39
+
40
+ require 'frequency_enumerator'
41
+
42
+ distribution = { :tea => 25, :coffee => 60, :biscuit => 115 }
43
+ bits_required = 4 # 0..15 should be enough for our simple problem
44
+
45
+ FrequencyEnumerator.new(distribution, :bit_count => bits_required).each do |hash|
46
+ # ...
47
+ end
48
+ ```
49
+
50
+ The first 10 attempts yielded to the block are:
51
+
52
+ ```ruby
53
+ { :tea=>0, :coffee=>0, :biscuit=>0 }
54
+ { :tea=>0, :coffee=>0, :biscuit=>1 }
55
+ { :tea=>0, :coffee=>0, :biscuit=>2 }
56
+ { :tea=>0, :coffee=>0, :biscuit=>3 }
57
+ { :tea=>0, :coffee=>1, :biscuit=>0 }
58
+ { :tea=>0, :coffee=>1, :biscuit=>1 }
59
+ { :tea=>0, :coffee=>1, :biscuit=>2 }
60
+ { :tea=>0, :coffee=>1, :biscuit=>3 }
61
+ { :tea=>0, :coffee=>0, :biscuit=>4 }
62
+ { :tea=>0, :coffee=>0, :biscuit=>5 }
63
+ ```
64
+
65
+ As you can see, most of the attempts change the number of biscuits, whilst we haven't even explored the possibility that tea might be in the solution yet.
66
+
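The ordering of those attempts can be reproduced with a condensed sketch of the approach taken by the gem's Sorter, Decomposer and Composer classes. The helper names `bit_owners` and `attempt` are hypothetical and not part of the gem's API; the sketch assumes the default little-endian bit order.

```ruby
# Decide which key "owns" each bit of the counter, lowest bit first.
# A key's score shrinks by its own probability every time it is picked,
# so frequent keys claim the low (fast-changing) bits.
def bit_owners(frequencies, bit_count)
  total = frequencies.values.inject(:+).to_f
  probabilities = {}
  frequencies.each { |key, count| probabilities[key] = count / total }

  scores    = probabilities.dup
  remaining = {}
  frequencies.each_key { |key| remaining[key] = bit_count }

  owners = []
  until scores.empty?
    key = scores.max_by { |_, score| score }.first
    owners << key
    scores[key] *= probabilities[key]
    remaining[key] -= 1
    scores.delete(key) if remaining[key].zero?
  end
  owners
end

# Scatter the bits of index across the keys and rebuild one integer per key.
def attempt(index, owners)
  result    = Hash.new(0)
  positions = Hash.new(0)
  owners.each_with_index do |key, bit|
    result[key] += index[bit] << positions[key]
    positions[key] += 1
  end
  result
end

owners = bit_owners({ :tea => 25, :coffee => 60, :biscuit => 115 }, 4)
first_attempts = (0..9).map { |i| attempt(i, owners) }
```

Here the three lowest-order bits of the counter go to biscuit, biscuit and coffee, which is why the early attempts only vary those two items.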
67
+ ## Limit
68
+
69
+ All attempts are guaranteed to be unique and appear in a deterministic order. The 'limit' method returns the zero-based index of the last enumeration in the search space, which is one less than the total number of enumerations.
70
+
71
+ ```ruby
72
+ enum = FrequencyEnumerator.new(distribution, :bit_count => 4)
73
+ enum.limit #=> 4095
74
+ ```
75
+
76
+ So there will be 4096 enumerations yielded to the block.
77
+
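The figure follows from the bit count and the number of keys: each key gets `bit_count` bits, so each key can take `2 ** bit_count` values. A minimal sketch of the arithmetic, using a hypothetical standalone helper that mirrors the gem's `limit` calculation:

```ruby
# Highest zero-based index in the search space:
# (values per key) raised to the number of keys, minus one.
def search_space_limit(bit_count, key_count)
  (2 ** bit_count) ** key_count - 1
end

search_space_limit(4, 3) # => 4095, i.e. 16 * 16 * 16 - 1
```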
78
+ ## Options
79
+
80
+ You can set 'from' and 'to' to explore different portions of the search space:
81
+
82
+ ```ruby
83
+ FrequencyEnumerator.new(distribution, :from => 100, :to => 199)
84
+ ```
85
+
86
+ This might be useful for multi-threading, map-reduce, or carrying on from where you left off if you're exploring a large search space.
87
+
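One way to divide the work is to split the range `0..limit` into contiguous chunks and hand each chunk to a separate worker. The `partition_search_space` helper below is a hypothetical sketch, not something the gem provides:

```ruby
# Split 0..limit into at most `parts` contiguous [from, to] pairs.
def partition_search_space(limit, parts)
  total = limit + 1
  size  = (total.to_f / parts).ceil

  chunks = (0...parts).map do |i|
    from = i * size
    to   = [from + size - 1, limit].min
    [from, to]
  end

  # Drop empty chunks when there are more parts than enumerations.
  chunks.select { |from, to| from <= to }
end

chunks = partition_search_space(4095, 4)
# Each pair could then be passed on as the :from and :to options, e.g.
#   FrequencyEnumerator.new(distribution, :bit_count => 4, :from => from, :to => to)
```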
88
+ ## Real-world example
89
+
90
+ My motivation for building this gem is to more intelligently brute-force the problem of finding [self-enumerating pangrams](http://en.wikipedia.org/wiki/Pangram#Self-enumerating_pangrams) by using classical literature to build a frequency distribution of English text.
91
+
92
+ In theory, mutating the E's, T's, A's, O's and I's first should result in attempts that correlate with English text and therefore are more likely to be solutions.
93
+
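A frequency distribution of letters can be built by simply counting them in a body of text. The `letter_frequencies` helper below is a hypothetical sketch that only handles the letters a to z; it does not use the frequency_analyser gem linked earlier.

```ruby
# Count how often each letter a-z occurs, ignoring case and other characters.
def letter_frequencies(text)
  counts = Hash.new(0)
  text.downcase.scan(/[a-z]/) { |letter| counts[letter.to_sym] += 1 }
  counts
end

distribution = letter_frequencies("Tea tree")
```

The resulting hash has the same shape as the distributions used above, so it could be passed to `FrequencyEnumerator.new` directly.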
94
+ ## Contribution
95
+
96
+ Feel free to contribute. No commit is too small.
97
+
98
+ If you're good at optimisation, this project might be for you.
99
+
100
+ You should follow me: [@cpatuzzo](https://twitter.com/cpatuzzo)
data/lib/frequency_enumerator/base.rb ADDED
@@ -0,0 +1,75 @@
1
+ class FrequencyEnumerator < Enumerable::Enumerator
2
+
3
+ attr_reader :frequencies, :bit_count, :from, :to
4
+
5
+ def initialize(frequencies, params = {})
6
+ @frequencies = frequencies
7
+
8
+ @bit_count = params[:bit_count] || 6
9
+ @from = params[:from] || 0
10
+ @to = params[:to] || limit
11
+
12
+ raise_if_either_boundary_is_out_of_range
13
+
14
+ @sorter = params[:sorter] || fe::Sorter
15
+ @composer = params[:composer] || fe::Composer
16
+ @decomposer = params[:decomposer] || fe::Decomposer
17
+ end
18
+
19
+ def each(&block)
20
+ (from..to).each do |i|
21
+ binary = decomposer.decompose(i)
22
+ bitmap = fragmented_bitmap(binary)
23
+ yield composition(bitmap)
24
+ end
25
+
26
+ self
27
+ end
28
+
29
+ def limit
30
+ @limit ||= (2 ** bit_count) ** frequencies.size - 1
31
+ end
32
+
33
+ private
34
+ def decomposer
35
+ @decomposer.new(:bit_count => @bit_count * frequencies.size)
36
+ end
37
+
38
+ def fragmented_bitmap(binary)
39
+ pairs = binary.zip(sorted_keys)
40
+ empty_array_default = Hash.new { |h, k| h[k] = [] }
41
+
42
+ pairs.inject(empty_array_default) do |h, (bit, key)|
43
+ h[key] << bit; h
44
+ end
45
+ end
46
+
47
+ def composition(bitmap)
48
+ bitmap.inject({}) do |h, (key, fragment)|
49
+ h.merge(key => @composer.compose(fragment))
50
+ end
51
+ end
52
+
53
+ def sorted_keys
54
+ return @sorted_keys if @sorted_keys
55
+ sorter = @sorter.new(:bit_count => @bit_count)
56
+ @sorted_keys = sorter.sort(frequencies)
57
+ end
58
+
59
+ def raise_if_either_boundary_is_out_of_range
60
+ [@from, @to].each do |i|
61
+ raise ArgumentError.new(
62
+ "#{i} lies outside of the range of the function: (0..#{limit})."
63
+ ) if out_of_range?(i)
64
+ end
65
+ end
66
+
67
+ def out_of_range?(x)
68
+ x < 0 || x > limit
69
+ end
70
+
71
+ def fe
72
+ self.class
73
+ end
74
+
75
+ end
data/lib/frequency_enumerator/composer.rb ADDED
@@ -0,0 +1,45 @@
1
+ class FrequencyEnumerator::Composer
2
+
3
+ attr_reader :endianess
4
+
5
+ def initialize(params = {})
6
+ @endianess = params[:endianess]
7
+ end
8
+
9
+ def self.compose(bit_array)
10
+ new.compose(bit_array)
11
+ end
12
+
13
+ def compose(bit_array)
14
+ raise_if_non_binary_elements(bit_array)
15
+
16
+ bit_array = bit_array.reverse if big_endian?
17
+
18
+ bit_array.each_with_index.inject(0) do |sum, (bit, index)|
19
+ sum + (bit << index)
20
+ end
21
+ end
22
+
23
+ def little_endian?
24
+ @endianess == :little
25
+ end
26
+
27
+ def big_endian?
28
+ @endianess == :big
29
+ end
30
+
31
+ private
32
+ def raise_if_non_binary_elements(bit_array)
33
+ non_binary_elements = bit_array.reject { |b| [0, 1].include?(b) }
34
+
35
+ if non_binary_elements.any?
36
+ plural = 's' if non_binary_elements.size > 1
37
+ elements = non_binary_elements.map(&:inspect).join(', ')
38
+
39
+ raise TypeError.new(
40
+ "Composing from non-binary element#{plural} #{elements}."
41
+ )
42
+ end
43
+ end
44
+
45
+ end
data/lib/frequency_enumerator/decomposer.rb ADDED
@@ -0,0 +1,56 @@
1
+ class FrequencyEnumerator::Decomposer
2
+
3
+ class ::OverflowError < StandardError; end
4
+ class ::SignedError < StandardError; end
5
+
6
+ attr_reader :bit_count
7
+ attr_reader :endianness
8
+
9
+ def initialize(params = {})
10
+ @bit_count = params[:bit_count] || 8
11
+ @endianness = params[:endianness] || :little
12
+ end
13
+
14
+ def self.decompose(integer)
15
+ new.decompose(integer)
16
+ end
17
+
18
+ def decompose(integer)
19
+ raise_if_negative(integer)
20
+ raise_if_not_enough_bits(integer)
21
+
22
+ bit_array = bit_count.times.map { |b| integer[b] }
23
+
24
+ little_endian? ? bit_array : bit_array.reverse
25
+ end
26
+
27
+ def little_endian?
28
+ endianness == :little
29
+ end
30
+
31
+ def big_endian?
32
+ endianness == :big
33
+ end
34
+
35
+ private
36
+ def raise_if_negative(integer)
37
+ raise SignedError.new(
38
+ "Decomposing negative integers is unsupported."
39
+ ) if integer < 0
40
+ end
41
+
42
+ def raise_if_not_enough_bits(integer)
43
+ bits_required = bits_required_to_decompose(integer)
44
+
45
+ raise OverflowError.new(
46
+ "Decomposing #{integer} requires more than #{bit_count} bits."
47
+ ) if bits_required > bit_count
48
+ end
49
+
50
+ def bits_required_to_decompose(integer)
51
+ (1..1.0/0).detect do |bits|
52
+ (integer >> bits).zero?
53
+ end
54
+ end
55
+
56
+ end
data/lib/frequency_enumerator/sorter.rb ADDED
@@ -0,0 +1,77 @@
1
+ class FrequencyEnumerator::Sorter
2
+
3
+ attr_reader :bit_count
4
+
5
+ def initialize(params = {})
6
+ @bit_count = params[:bit_count] || 8
7
+ end
8
+
9
+ def self.sort(frequencies)
10
+ new.sort(frequencies)
11
+ end
12
+
13
+ def sort(frequencies)
14
+ helper = AccumulationHelper.new(frequencies, bit_count)
15
+ sorted_keys = []
16
+
17
+ until helper.depleted_keys? do
18
+ key = helper.maximal_key
19
+ sorted_keys << key
20
+ helper.accumulate(key)
21
+ end
22
+
23
+ sorted_keys
24
+ end
25
+
26
+ class AccumulationHelper
27
+
28
+ attr_reader :frequencies, :bit_count
29
+
30
+ def initialize(frequencies, bit_count = 6)
31
+ @frequencies = frequencies
32
+ @bit_count = bit_count
33
+ end
34
+
35
+ def depleted_keys?
36
+ available_keys.empty?
37
+ end
38
+
39
+ def maximal_key
40
+ accumulation.max_by { |_, v| v }.first
41
+ end
42
+
43
+ def accumulate(key)
44
+ accumulation[key] *= probabilities[key]
45
+ consume(key)
46
+ end
47
+
48
+ def available_keys
49
+ @available_keys ||= frequencies.inject({}) do |hash, (k, _)|
50
+ hash.merge(k => bit_count)
51
+ end
52
+ end
53
+
54
+ def accumulation
55
+ @accumulation ||= probabilities.dup
56
+ end
57
+
58
+ def consume(key)
59
+ available_keys[key] -= 1
60
+
61
+ if available_keys[key].zero?
62
+ available_keys.delete(key)
63
+ accumulation.delete(key)
64
+ end
65
+ end
66
+
67
+ def probabilities
68
+ return @probabilities if @probabilities
69
+ total = frequencies.values.inject(:+)
70
+ @probabilities = frequencies.inject({}) do |hash, (k, v)|
71
+ hash.merge(k => v.to_f / total)
72
+ end
73
+ end
74
+
75
+ end
76
+
77
+ end
data/lib/frequency_enumerator.rb ADDED
@@ -0,0 +1,4 @@
1
+ require 'frequency_enumerator/base'
2
+ require 'frequency_enumerator/decomposer'
3
+ require 'frequency_enumerator/composer'
4
+ require 'frequency_enumerator/sorter'
metadata ADDED
@@ -0,0 +1,84 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: frequency_enumerator
3
+ version: !ruby/object:Gem::Version
4
+ hash: 23
5
+ prerelease:
6
+ segments:
7
+ - 1
8
+ - 0
9
+ - 0
10
+ version: 1.0.0
11
+ platform: ruby
12
+ authors:
13
+ - Christopher Patuzzo
14
+ autorequire:
15
+ bindir: bin
16
+ cert_chain: []
17
+
18
+ date: 2012-10-05 00:00:00 Z
19
+ dependencies:
20
+ - !ruby/object:Gem::Dependency
21
+ name: rspec
22
+ prerelease: false
23
+ requirement: &id001 !ruby/object:Gem::Requirement
24
+ none: false
25
+ requirements:
26
+ - - ">="
27
+ - !ruby/object:Gem::Version
28
+ hash: 3
29
+ segments:
30
+ - 0
31
+ version: "0"
32
+ type: :development
33
+ version_requirements: *id001
34
+ description: Yields hashes that correlate with the given frequency distribution.
35
+ email: chris@patuzzo.co.uk
36
+ executables: []
37
+
38
+ extensions: []
39
+
40
+ extra_rdoc_files: []
41
+
42
+ files:
43
+ - README.md
44
+ - lib/frequency_enumerator/base.rb
45
+ - lib/frequency_enumerator/composer.rb
46
+ - lib/frequency_enumerator/decomposer.rb
47
+ - lib/frequency_enumerator/sorter.rb
48
+ - lib/frequency_enumerator.rb
49
+ homepage: https://github.com/cpatuzzo/frequency_enumerator
50
+ licenses: []
51
+
52
+ post_install_message:
53
+ rdoc_options: []
54
+
55
+ require_paths:
56
+ - lib
57
+ required_ruby_version: !ruby/object:Gem::Requirement
58
+ none: false
59
+ requirements:
60
+ - - ">="
61
+ - !ruby/object:Gem::Version
62
+ hash: 3
63
+ segments:
64
+ - 0
65
+ version: "0"
66
+ required_rubygems_version: !ruby/object:Gem::Requirement
67
+ none: false
68
+ requirements:
69
+ - - ">="
70
+ - !ruby/object:Gem::Version
71
+ hash: 3
72
+ segments:
73
+ - 0
74
+ version: "0"
75
+ requirements: []
76
+
77
+ rubyforge_project:
78
+ rubygems_version: 1.8.24
79
+ signing_key:
80
+ specification_version: 3
81
+ summary: Frequency Enumerator
82
+ test_files: []
83
+
84
+ has_rdoc: