RubyGems - frequency_enumerator - Versions diffs - 1.0.0 - Mend

frequency_enumerator 1.0.0

Files changed (7)

data/README.md +100 -0
data/lib/frequency_enumerator/base.rb +75 -0
data/lib/frequency_enumerator/composer.rb +45 -0
data/lib/frequency_enumerator/decomposer.rb +56 -0
data/lib/frequency_enumerator/sorter.rb +77 -0
data/lib/frequency_enumerator.rb +4 -0
metadata +84 -0

data/README.md ADDED

@@ -0,0 +1,100 @@
+## Frequency Enumerator
+Yields hashes that correlate with the given frequency distribution.
+## Concept
+If you're using brute-force search to solve some problem, it makes sense to carry out some frequency analysis on the problem first.
+Consider a simple example of trying to figure out which combinations of items cost a known total:
+```
+Total: £2.00
+Item prices: Tea (£0.20), Coffee (£0.30), Biscuit (£0.15)
+```
+We could use *maths* to solve this problem. Or we could brute-force it.
+For the latter, you'd go through every combination of these items and see which totalled £2.00. In this example, that'd take no time at all, but what if we're dealing with huge sums of money, or there are dozens of items? What if we're brute-forcing passwords?
+It helps to do some [frequency analysis](https://github.com/cpatuzzo/frequency_analyser) first.
+You might discover that, in fact, almost no one drinks tea and everyone loves biscuits. You might ask a couple of hundred people and end up with a frequency distribution like this:
+```ruby
+{ :tea => 25, :coffee => 60, :biscuit => 115 }
+```
+It'd be nice if we could brute-force the problem, but be more intelligent about the order in which we do so. We should make use of our valuable, newfound knowledge.
+And that's exactly what Frequency Enumerator does. (I got there in the end!)
+You simply feed it a frequency distribution and it does its best to spew out 'attempts' that correlate with the given distribution. In our case, we'd do something like this:
+## Usage
+```ruby
+# gem install frequency_enumerator
+require 'frequency_enumerator'
+distribution = { :tea => 25, :coffee => 60, :biscuit => 115 }
+bits_required = 4 # 0..15 should be enough for our simple problem
+FrequencyEnumerator.new(distribution, :bit_count => bits_required).each do |hash|
+  # ...
+end
+```
+The first 10 attempts yielded to the block are:
+```ruby
+{ :tea=>0, :coffee=>0, :biscuit=>0 }
+{ :tea=>0, :coffee=>0, :biscuit=>1 }
+{ :tea=>0, :coffee=>0, :biscuit=>2 }
+{ :tea=>0, :coffee=>0, :biscuit=>3 }
+{ :tea=>0, :coffee=>1, :biscuit=>0 }
+{ :tea=>0, :coffee=>1, :biscuit=>1 }
+{ :tea=>0, :coffee=>1, :biscuit=>2 }
+{ :tea=>0, :coffee=>1, :biscuit=>3 }
+{ :tea=>0, :coffee=>0, :biscuit=>4 }
+{ :tea=>0, :coffee=>0, :biscuit=>5 }
+```
+As you can see, most of the attempts change the number of biscuits, whilst we haven't yet explored the possibility that tea might be in the solution.
+## Limit
+All attempts are guaranteed to be unique and appear in a deterministic order. The 'limit' method returns the highest zero-based index in the search space, which is one less than the number of unique enumerations.
+```ruby
+  enum = FrequencyEnumerator.new(distribution, :bit_count => 4)
+  enum.limit #=> 4095
+```
+So there will be 4096 enumerations yielded to the block.
+## Options
+You can set 'from' and 'to' to explore different portions of the search space:
+```ruby
+  FrequencyEnumerator.new(distribution, :from => 100, :to => 199)
+```
+This might be useful for multi-threading, map-reduce, or carrying on from where you left off if you're exploring a large search space.
+## Real-world example
+My motivation for building this gem is to more intelligently brute-force the problem of finding [self-enumerating pangrams](http://en.wikipedia.org/wiki/Pangram#Self-enumerating_pangrams) by using classical literature to build a frequency distribution of English text.
+In theory, mutating the E's, T's, A's, O's and I's first should result in attempts that correlate with English text and therefore are more likely to be solutions.
+## Contribution
+Feel free to contribute. No commit is too small.
+If you're good at optimisation, this project might be for you.
+You should follow me: [@cpatuzzo](https://twitter.com/cpatuzzo)
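To make the README's 'limit', 'from' and 'to' arithmetic concrete, here is a minimal sketch that does not load the gem. The helper names `search_space_limit` and `chunk_options` are hypothetical and exist only for this illustration. The formula mirrors `FrequencyEnumerator#limit` in `base.rb`; the chunking is one possible way to divide the space between workers, not something the gem provides.

```ruby
# Hypothetical helpers for illustration; neither is part of the gem.

# Highest zero-based index in the search space.
# Every key gets bit_count bits, so there are (2**bit_count)**key_count attempts.
def search_space_limit(key_count, bit_count)
  (2**bit_count)**key_count - 1
end

# Split 0..limit into consecutive :from/:to option hashes,
# each covering at most chunk_size indices.
def chunk_options(limit, chunk_size)
  (0..limit).step(chunk_size).map do |from|
    { :from => from, :to => [from + chunk_size - 1, limit].min }
  end
end

limit  = search_space_limit(3, 4) # three items, four bits each
chunks = chunk_options(limit, 1024)
```

Each hash in `chunks` has the shape of the `:from`/`:to` options shown in the README, so it could in principle be merged into the options passed to one enumerator per worker.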

data/lib/frequency_enumerator/base.rb ADDED

@@ -0,0 +1,75 @@
+class FrequencyEnumerator < Enumerable::Enumerator
+  attr_reader :frequencies, :bit_count, :from, :to
+  def initialize(frequencies, params = {})
+    @frequencies = frequencies
+    @bit_count  = params[:bit_count]  || 6
+    @from       = params[:from]       || 0
+    @to         = params[:to]         || limit
+    raise_if_either_boundary_is_out_of_range
+    @sorter     = params[:sorter]     || fe::Sorter
+    @composer   = params[:composer]   || fe::Composer
+    @decomposer = params[:decomposer] || fe::Decomposer
+  end
+  def each(&block)
+    (from..to).each do |i|
+      binary = decomposer.decompose(i)
+      bitmap = fragmented_bitmap(binary)
+      yield composition(bitmap)
+    end
+    self
+  end
+  def limit
+    @limit ||= (2 ** bit_count) ** frequencies.size - 1
+  end
+  private
+  def decomposer
+    @decomposer.new(:bit_count => @bit_count * frequencies.size)
+  end
+  def fragmented_bitmap(binary)
+    pairs = binary.zip(sorted_keys)
+    empty_array_default = Hash.new { |h, k| h[k] = [] }
+    pairs.inject(empty_array_default) do |h, (bit, key)|
+      h[key] << bit; h
+    end
+  end
+  def composition(bitmap)
+    bitmap.inject({}) do |h, (key, fragment)|
+      h.merge(key => @composer.compose(fragment))
+    end
+  end
+  def sorted_keys
+    return @sorted_keys if @sorted_keys
+    sorter = @sorter.new(:bit_count => @bit_count)
+    @sorted_keys = sorter.sort(frequencies)
+  end
+  def raise_if_either_boundary_is_out_of_range
+    [@from, @to].each do |i|
+      raise ArgumentError.new(
+        "#{i} lies outside of the range of the function: (0..#{limit})."
+      ) if out_of_range?(i)
+    end
+  end
+  def out_of_range?(x)
+    x < 0 || x > limit
+  end
+  def fe
+    self.class
+  end
+end
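The `each` method above turns an index into an attempt in three steps: decompose the index into bits, pair each bit with a key, then recompose each key's bits into an integer. The following standalone sketch traces that pipeline without the gem's classes. `attempt_for` and `BIT_ORDER` are hypothetical names, and the bit-to-key order is a hand-traced assumption for the README's distribution with four bits per key; its first four positions are consistent with the README's first ten attempts.

```ruby
# Assumed bit-to-key order (least significant bit first). Illustrative only.
BIT_ORDER = [
  :biscuit, :biscuit, :coffee, :biscuit, :tea, :biscuit,
  :coffee, :coffee, :tea, :coffee, :tea, :tea
]

# Hypothetical helper mirroring the decompose -> fragment -> compose steps.
def attempt_for(index, bit_order = BIT_ORDER)
  # Decompose: Integer#[] returns the bit at a given position.
  bits = bit_order.size.times.map { |position| index[position] }

  # Fragment: collect each key's bits in the order they appear.
  fragments = Hash.new { |hash, key| hash[key] = [] }
  bits.zip(bit_order).each { |bit, key| fragments[key] << bit }

  # Compose: read each fragment as a little-endian binary number.
  fragments.each_with_object({}) do |(key, fragment), attempt|
    attempt[key] = fragment.each_with_index.inject(0) do |sum, (bit, i)|
      sum + (bit << i)
    end
  end
end

attempt_for(8) # bit 3 belongs to :biscuit, so only the biscuit count changes
```

Because the low-order bits are assigned to the most frequent key, counting upwards from zero varies that key's value first, which is the behaviour the README describes.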

data/lib/frequency_enumerator/composer.rb ADDED

@@ -0,0 +1,45 @@
+class FrequencyEnumerator::Composer
+  attr_reader :endianess
+  def initialize(params = {})
+    @endianess = params[:endianess]
+  end
+  def self.compose(bit_array)
+    new.compose(bit_array)
+  end
+  def compose(bit_array)
+    raise_if_non_binary_elements(bit_array)
+    bit_array = bit_array.reverse if big_endian?
+    bit_array.each_with_index.inject(0) do |sum, (bit, index)|
+      sum + (bit << index)
+    end
+  end
+  def little_endian?
+    @endianess == :little
+  end
+  def big_endian?
+    @endianess == :big
+  end
+  private
+  def raise_if_non_binary_elements(bit_array)
+    non_binary_elements = bit_array.reject { |b| [0, 1].include?(b) }
+    if non_binary_elements.any?
+      plural = 's' if non_binary_elements.size > 1
+      elements = non_binary_elements.map(&:inspect).join(', ')
+      raise TypeError.new(
+        "Composing from non-binary element#{plural} #{elements}."
+      )
+    end
+  end
+end
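As a quick illustration of what `compose` computes, this standalone sketch rebuilds an integer from an array of bits in either byte order. `compose_bits` is a hypothetical function written for this note, not the gem's API, and it skips the non-binary element check for brevity.

```ruby
# Hypothetical stand-in for Composer#compose, without input validation.
def compose_bits(bit_array, endianness = :little)
  # Big-endian input lists the most significant bit first, so flip it.
  bits = endianness == :big ? bit_array.reverse : bit_array

  # Each bit contributes bit * 2**index to the total.
  bits.each_with_index.inject(0) do |sum, (bit, index)|
    sum + (bit << index)
  end
end

compose_bits([1, 0, 1, 1])       # => 13
compose_bits([1, 0, 1, 1], :big) # => 11
```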

data/lib/frequency_enumerator/decomposer.rb ADDED

@@ -0,0 +1,56 @@
+class FrequencyEnumerator::Decomposer
+  class ::OverflowError < StandardError; end
+  class ::SignedError   < StandardError; end
+  attr_reader :bit_count
+  attr_reader :endianness
+  def initialize(params = {})
+    @bit_count  = params[:bit_count]  || 8
+    @endianness = params[:endianness] || :little
+  end
+  def self.decompose(integer)
+    new.decompose(integer)
+  end
+  def decompose(integer)
+    raise_if_negative(integer)
+    raise_if_not_enough_bits(integer)
+    bit_array = bit_count.times.map { |b| integer[b] }
+    little_endian? ? bit_array : bit_array.reverse
+  end
+  def little_endian?
+    endianness == :little
+  end
+  def big_endian?
+    endianness == :big
+  end
+  private
+  def raise_if_negative(integer)
+    raise SignedError.new(
+      "Decomposing negative integers is unsupported."
+    ) if integer < 0
+  end
+  def raise_if_not_enough_bits(integer)
+    bits_required = bits_required_to_decompose(integer)
+    raise OverflowError.new(
+      "Decomposing #{integer} requires more than #{bit_count} bits."
+    ) if bits_required > bit_count
+  end
+  def bits_required_to_decompose(integer)
+    (1..1.0/0).detect do |bits|
+      (integer >> bits).zero?
+    end
+  end
+end
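The decomposition step can be sketched on its own as follows. `decompose_integer` is a hypothetical function, not the gem's API. It raises `ArgumentError` rather than the gem's custom error classes, and it uses `Integer#bit_length` for the overflow check instead of the shift-and-detect loop above, which is a substitution made here for brevity.

```ruby
# Hypothetical stand-in for Decomposer#decompose.
def decompose_integer(integer, bit_count = 8, endianness = :little)
  raise ArgumentError, "negative integers are unsupported" if integer < 0
  if integer.bit_length > bit_count
    raise ArgumentError, "#{integer} requires more than #{bit_count} bits"
  end

  # Integer#[] reads a single bit; position 0 is the least significant.
  bits = bit_count.times.map { |position| integer[position] }
  endianness == :little ? bits : bits.reverse
end

decompose_integer(13, 4)       # => [1, 0, 1, 1]
decompose_integer(13, 4, :big) # => [1, 1, 0, 1]
```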

data/lib/frequency_enumerator/sorter.rb ADDED

@@ -0,0 +1,77 @@
+class FrequencyEnumerator::Sorter
+  attr_reader :bit_count
+  def initialize(params = {})
+    @bit_count = params[:bit_count] || 8
+  end
+  def self.sort(frequencies)
+    new.sort(frequencies)
+  end
+  def sort(frequencies)
+    helper = AccumulationHelper.new(frequencies, bit_count)
+    sorted_keys = []
+    until helper.depleted_keys? do
+      key = helper.maximal_key
+      sorted_keys << key
+      helper.accumulate(key)
+    end
+    sorted_keys
+  end
+  class AccumulationHelper
+    attr_reader :frequencies, :bit_count
+    def initialize(frequencies, bit_count = 6)
+      @frequencies = frequencies
+      @bit_count = bit_count
+    end
+    def depleted_keys?
+      available_keys.empty?
+    end
+    def maximal_key
+      accumulation.max_by { |_, v| v }.first
+    end
+    def accumulate(key)
+      accumulation[key] *= probabilities[key]
+      consume(key)
+    end
+    def available_keys
+      @available_keys ||= frequencies.inject({}) do |hash, (k, _)|
+        hash.merge(k => bit_count)
+      end
+    end
+    def accumulation
+      @accumulation ||= probabilities.dup
+    end
+    def consume(key)
+      available_keys[key] -= 1
+      if available_keys[key].zero?
+        available_keys.delete(key)
+        accumulation.delete(key)
+      end
+    end
+    def probabilities
+      return @probabilities if @probabilities
+      total = frequencies.values.inject(:+)
+      @probabilities = frequencies.inject({}) do |hash, (k, v)|
+        hash.merge(k => v.to_f / total)
+      end
+    end
+  end
+end
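Read as a whole, `Sorter` and `AccumulationHelper` appear to implement a greedy ordering: repeatedly pick the key with the highest running score, multiply that score by the key's probability, and retire the key once it has been picked `bit_count` times. The sketch below restates that reading as a single function so the loop is easier to follow. `frequency_bit_order` is a hypothetical name, and this is an interpretation of the code above rather than a drop-in replacement.

```ruby
# Hypothetical single-function restatement of the sorter's greedy loop.
def frequency_bit_order(frequencies, bit_count)
  total = frequencies.values.inject(:+)
  probabilities = frequencies.each_with_object({}) do |(key, count), hash|
    hash[key] = count.to_f / total
  end

  scores    = probabilities.dup # running score per key
  remaining = Hash[frequencies.keys.map { |key| [key, bit_count] }]
  order     = []

  until remaining.empty?
    key = scores.max_by { |_, score| score }.first
    order << key

    # Each pick makes the same key less attractive next time round.
    scores[key] *= probabilities[key]
    remaining[key] -= 1

    if remaining[key].zero?
      remaining.delete(key)
      scores.delete(key)
    end
  end

  order
end

distribution = { :tea => 25, :coffee => 60, :biscuit => 115 }
frequency_bit_order(distribution, 2)
# => [:biscuit, :biscuit, :coffee, :tea, :coffee, :tea]
```

The returned array lists one key per bit position, starting with the least significant bit, so frequent keys take the positions that change most often during enumeration.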

data/lib/frequency_enumerator.rb ADDED

@@ -0,0 +1,4 @@
+require 'frequency_enumerator/base'
+require 'frequency_enumerator/decomposer'
+require 'frequency_enumerator/composer'
+require 'frequency_enumerator/sorter'

metadata ADDED

@@ -0,0 +1,84 @@
+--- !ruby/object:Gem::Specification
+name: frequency_enumerator
+version: !ruby/object:Gem::Version
+  hash: 23
+  prerelease:
+  segments:
+  - 1
+  - 0
+  - 0
+  version: 1.0.0
+platform: ruby
+authors:
+- Christopher Patuzzo
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2012-10-05 00:00:00 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: rspec
+  prerelease: false
+  requirement: &id001 !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        hash: 3
+        segments:
+        - 0
+        version: "0"
+  type: :development
+  version_requirements: *id001
+description: Yields hashes that correlate with the given frequency distribution.
+email: chris@patuzzo.co.uk
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- README.md
+- lib/frequency_enumerator/base.rb
+- lib/frequency_enumerator/composer.rb
+- lib/frequency_enumerator/decomposer.rb
+- lib/frequency_enumerator/sorter.rb
+- lib/frequency_enumerator.rb
+homepage: https://github.com/cpatuzzo/frequency_enumerator
+licenses: []
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      hash: 3
+      segments:
+      - 0
+      version: "0"
+required_rubygems_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      hash: 3
+      segments:
+      - 0
+      version: "0"
+requirements: []
+rubyforge_project:
+rubygems_version: 1.8.24
+signing_key:
+specification_version: 3
+summary: Frequency Enumerator
+test_files: []
+has_rdoc: