RubyGems - stamina-induction - Versions diffs - 0.5.0 - Mend

stamina-induction 0.5.0

Files changed (39)

data/CHANGELOG.md +78 -0
data/LICENCE.md +22 -0
data/lib/stamina-induction/stamina-induction.rb +1 -0
data/lib/stamina-induction/stamina/abbadingo.rb +2 -0
data/lib/stamina-induction/stamina/abbadingo/random_dfa.rb +55 -0
data/lib/stamina-induction/stamina/abbadingo/random_sample.rb +146 -0
data/lib/stamina-induction/stamina/classifier.rb +55 -0
data/lib/stamina-induction/stamina/command.rb +6 -0
data/lib/stamina-induction/stamina/command/abbadingo_dfa.rb +80 -0
data/lib/stamina-induction/stamina/command/abbadingo_samples.rb +39 -0
data/lib/stamina-induction/stamina/command/classify.rb +47 -0
data/lib/stamina-induction/stamina/command/infer.rb +140 -0
data/lib/stamina-induction/stamina/command/metrics.rb +50 -0
data/lib/stamina-induction/stamina/command/score.rb +34 -0
data/lib/stamina-induction/stamina/dsl.rb +2 -0
data/lib/stamina-induction/stamina/dsl/induction.rb +29 -0
data/lib/stamina-induction/stamina/dsl/reg_lang.rb +69 -0
data/lib/stamina-induction/stamina/induction.rb +13 -0
data/lib/stamina-induction/stamina/induction/blue_fringe.rb +265 -0
data/lib/stamina-induction/stamina/induction/commons.rb +156 -0
data/lib/stamina-induction/stamina/induction/rpni.rb +186 -0
data/lib/stamina-induction/stamina/induction/union_find.rb +377 -0
data/lib/stamina-induction/stamina/input_string.rb +123 -0
data/lib/stamina-induction/stamina/reg_lang.rb +226 -0
data/lib/stamina-induction/stamina/reg_lang/canonical_info.rb +181 -0
data/lib/stamina-induction/stamina/reg_lang/parser.rb +10 -0
data/lib/stamina-induction/stamina/reg_lang/parser/alternative.rb +19 -0
data/lib/stamina-induction/stamina/reg_lang/parser/node.rb +22 -0
data/lib/stamina-induction/stamina/reg_lang/parser/parenthesized.rb +12 -0
data/lib/stamina-induction/stamina/reg_lang/parser/parser.citrus +49 -0
data/lib/stamina-induction/stamina/reg_lang/parser/plus.rb +14 -0
data/lib/stamina-induction/stamina/reg_lang/parser/question.rb +17 -0
data/lib/stamina-induction/stamina/reg_lang/parser/regexp.rb +12 -0
data/lib/stamina-induction/stamina/reg_lang/parser/sequence.rb +15 -0
data/lib/stamina-induction/stamina/reg_lang/parser/star.rb +15 -0
data/lib/stamina-induction/stamina/reg_lang/parser/symbol.rb +14 -0
data/lib/stamina-induction/stamina/sample.rb +309 -0
data/lib/stamina-induction/stamina/scoring.rb +213 -0
metadata +106 -0

data/CHANGELOG.md ADDED

@@ -0,0 +1,78 @@
+# 0.5.0 / FIX ME
+* Breaking features.
+  * Support for ruby 1.8.7 has been definitively removed.
+* Major enhancements
+  * The project has been split in different sub gems (core, induction and gui). This
+    implies a lot of internal changes, but the public API has not been affected. A main
+    'stamina' gem automatically includes all sub gems so previous behavior is guaranteed.
+* Minor enhancements
+    * Fixed a bug with bundler usage in main stamina binary
+    * adl2dot command now supports samples as input in addition to automata. In that case,
+      the dot result models a PTA (prefix tree acceptor)
+    * Added --png to 'stamina adl2dot'
+# 0.4.0 / 2011-05-01
+* Major Enhancements
+    * Added Automaton#to_adl as a shortcut for Stamina::ADL::print_automaton(...)
+    * Added Sample#to_pta taken from Induction::Commons
+    * Added Automaton completion (all strings parsable) under Automaton#complete[!?]
+    * Added Automaton stripping (removal of unreachable states) under Automaton#strip[!]
+    * Added Automaton minimization (Hopcroft + Pitchies) under Automaton#minimize
+    * Added Abbadingo generators under Abbadingo::RandomDFA and Abbadingo::RandomSample
+    * Added a main 'stamina' command relying on Quickl. classify/adl2dot commands become
+      subcommands of stamina itself (see stamina --help for a list of available commands).
+      Induction commands (rpni and redblue) are now handled by 'stamina infer' with
+      options.
+    * Error states are now correctly handled in ADL::parse and ADL::flush
+    * RedBlue has been renamed as BlueFringe everywhere (red_?blue -> blue_fringe)
+* Minor Enhancements
+    * Added a few optimizations here and there
+* Bug fixes
+    * Fixed a bug in Automaton#depth when some states are unreachable
+# 0.3.1 / 2011-03-24
+* Major Enhancements
+    * Implemented the decoration algorithm of Damas10, allowing to decorate states
+      with information propagated from states to states until a fixpoint is reached.
+    * Added Automaton::Metrics module, automatically included, with useful metrics
+      like automaton depth, accepting ratio and so on.
+    * Added Scoring module and Classifier#classification_scoring(sample) method
+      with common measures from information retrieval.
+* On the devel side
+    * Moved specific automaton tests under test/stamina/automaton/...
+# 0.3.0 / 2011-03-24
+* On the devel side
+  * The project structure is now handled by Noe
+  * Ensures that tests are correctly executed under ruby 1.9.2
+# 0.2.2 / 2010-10-22
+* Major Enhancements
+  * Sample#<< does not detect inconsistencies anymore, to ensure a linear method instead of a quadratic one.
+* On the devel side
+  * Fixes a bug in Rakefile that led to test failures under ruby 1.8.7
+# 0.2.1 / 2010-05-01
+* Main public version for the official competition, extracted from private SVN.

data/LICENCE.md ADDED

@@ -0,0 +1,22 @@
+The MIT License
+Copyright (c) 2008-2009 University of Louvain
+(Universite catholique de Louvain-la-Neuve, Belgium)
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.

data/lib/stamina-induction/stamina-induction.rb ADDED

@@ -0,0 +1 @@
+require_relative 'stamina/induction'

data/lib/stamina-induction/stamina/abbadingo.rb ADDED

@@ -0,0 +1,2 @@
+require_relative 'abbadingo/random_dfa'
+require_relative 'abbadingo/random_sample'

data/lib/stamina-induction/stamina/abbadingo/random_dfa.rb ADDED

@@ -0,0 +1,55 @@
+module Stamina
+  module Abbadingo
+    #
+    # Generates a random DFA using the Abbadingo protocol.
+    #
+    class RandomDFA
+      DEFAULT_OPTIONS = {
+        :minimize => :hopcroft
+      }
+      def execute(state_count = 64,
+                  accepting_ratio = 0.5,
+                  options = {})
+        options = DEFAULT_OPTIONS.merge(options)
+        # Built dfa
+        dfa = Automaton.new
+        # Generate 5/4*state_count states
+        (state_count.to_f * 5.0 / 4.0).to_i.times do
+          dfa.add_state(:initial   => false,
+                        :accepting => (Kernel.rand <= accepting_ratio),
+                        :error     => false)
+        end
+        # Generate all edges
+        dfa.each_state do |source|
+          ["0", "1"].each do |symbol|
+            target = dfa.ith_state(Kernel.rand(dfa.state_count))
+            dfa.connect(source, target, symbol)
+          end
+        end
+        # Choose an initial state
+        dfa.ith_state(Kernel.rand(dfa.state_count)).initial!
+        # Minimize the automaton and return it
+        case options[:minimize]
+          when :hopcroft
+            Stamina::Automaton::Minimize::Hopcroft.execute(dfa)
+          when :pitchies
+            Stamina::Automaton::Minimize::Pitchies.execute(dfa)
+          else
+            dfa
+        end
+      end
+      def self.execute(*args)
+        new.execute(*args)
+      end
+    end # class RandomDFA
+  end # module Abbadingo
+end # module Stamina
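
The generation steps in RandomDFA#execute (5/4 * state_count states, one random edge per state and symbol, one random initial state) can be sketched without the Stamina classes. The helper below is hypothetical and not part of the gem: it returns plain Ruby data and skips minimization, which in the gem is delegated to Stamina::Automaton::Minimize.

```ruby
# Hypothetical helper, NOT part of Stamina: builds the raw random automaton
# as plain Ruby data. The minimization step of RandomDFA#execute is omitted.
def random_transition_table(state_count = 64, accepting_ratio = 0.5, rng = Random.new)
  # Over-generate: 5/4 of the requested number of states
  n = (state_count * 5.0 / 4.0).to_i

  # Each state is accepting with probability accepting_ratio
  accepting = Array.new(n) { rng.rand <= accepting_ratio }

  # One outgoing edge per state and per symbol, target drawn uniformly
  delta = Array.new(n) { { "0" => rng.rand(n), "1" => rng.rand(n) } }

  # The initial state is also drawn uniformly
  { initial: rng.rand(n), accepting: accepting, delta: delta }
end

# A fixed seed makes the sketch reproducible
table = random_transition_table(64, 0.5, Random.new(42))
puts "states: #{table[:delta].size}, initial: #{table[:initial]}"
```

Because unreachable and equivalent states are removed by minimization, the over-generation by 5/4 is presumably what lets the final automaton land near the requested size; the retry loop in the abbadingo-dfa command checks this after the fact.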

data/lib/stamina-induction/stamina/abbadingo/random_sample.rb ADDED

@@ -0,0 +1,146 @@
+module Stamina
+  module Abbadingo
+    #
+    # Generates a random Sample using the Abbadingo protocol.
+    #
+    class RandomSample
+      #
+      # Implements an enumerator for binary strings whose length lies between 0
+      # and max_length (passed at construction).
+      #
+      # The enumerator guarantees that strings are sampled with a uniform
+      # distribution among all available strings. As the number of strings of a
+      # given length is an exponential function, this means that you've got roughly
+      # a 50% chance of having a string of length max_length, 25% of max_length - 1,
+      # 12.5% of max_length - 2 and so on.
+      #
+      # How to use it?
+      #
+      #   # create for strings between 0 and 10 symbols, inclusive
+      #   enum = Stamina::Abbadingo::RandomSample::StringEnumerator.new(10)
+      #
+      #   # this is how to generate strings while a predicate is true
+      #   enum.each do |s|
+      #     # s is an array of binary symbols, given as strings ("0" or "1")
+      #     # true for continuing, false otherwise
+      #     (true || false)
+      #   end
+      #
+      #   # this is how to generate a fixed number of strings
+      #   (1..1000).collect{ enum.one }
+      #
+      # How does it work? Well, the distribution of strings is as follows:
+      #
+      #    length     [n]b_strings        [c]umul       log2(n)         log2(c)    log2(c).floor
+      #                   (2**n)         2**(n+1)-1
+      #      0               1               1       0.0000000000       0.000000        0
+      #      1               2               3       1.0000000000       1.584963        1
+      #      2               4               7       2.0000000000       2.807355        2
+      #      3               8              15       3.0000000000       3.906891        3
+      #      4              16              31       4.0000000000       4.954196        4
+      #      5              32              63       5.0000000000       5.977280        5
+      #
+      # where _cumul_ is the total number of strings up to _length_ symbols.
+      #
+      # Therefore, the idea is to give each string an identifier, say _x_,
+      # between 1 and 2**(max_length+1)-1 (see max).
+      #   * The length of the _x_th string is log2(x).floor (see length_for)
+      #   * The string itself is the binary decomposition of x, up to length_for(x)
+      #     symbols (see string_for)
+      #
+      # As those identifiers naturally respect the exponential distribution, sampling
+      # the strings is the same as taking string_for(x) for random x up to _max_.
+      #
+      class StringEnumerator
+        include Enumerable
+        # Maximal length of a string
+        attr_reader :max_length
+        def initialize(max_length = 16)
+          @max_length = max_length
+        end
+        #
+        # Returns the length of the string whose identifier is _x_ (> 0)
+        #
+        def length_for(x)
+          Math.log2(x).floor
+        end
+        #
+        # Returns the binary string whose identifier is _x_ (> 0)
+        #
+        def string_for(x)
+          length = length_for(x)
+          (0..length-1).collect{|i| ((x >> i) % 2).to_s}
+        end
+        #
+        # Returns the maximum identifier, which is also the number of strings
+        # up to max_length symbols
+        #
+        def max
+          @max ||= 2 ** (max_length+1) - 1
+        end
+        #
+        # Generates a string at random
+        #
+        def one
+          string_for(1+Kernel.rand(max))
+        end
+        #
+        # Yields the block with a random string, until the block returns false
+        # or nil.
+        #
+        def each
+          begin
+            cont = yield(one)
+          end while cont
+        end
+      end # class StringEnumerator
+      #
+      # Generates a training Sample and a test Sample whose strings are randomly
+      # sampled with a uniform distribution over all strings up to _max_length_
+      # symbols. Returns them as a [training, test] pair.
+      #
+      def self.execute(classifier, max_length = classifier.depth + 3)
+        enum = StringEnumerator.new(max_length)
+        # We generate 1800 strings for the test set plus n^2/2 strings for
+        # the training set. If there are not enough strings available, we generate
+        # the maximum we can
+        seen = {}
+        # Ruby's Math module has no min method; use Array#min instead
+        nb = [1800 + (classifier.state_count**2), enum.max].min
+        # Let's go now
+        enum.each do |s|
+          seen[s] = true
+          seen.size < nb
+        end
+        # Make them
+        strings = seen.keys.collect{|s| InputString.new(s, classifier.accepts?(s))}
+        pos, neg = strings.partition{|s| s.positive?}
+        # Split them, 1800 in test and the rest in training set
+        if (pos.size > 900) && (neg.size > 900)
+          pos_test, pos_training = pos[0...900], pos[900..-1]
+          neg_test, neg_training = neg[0...900], neg[900..-1]
+        else
+          pos_test, pos_training = pos.partition{|s| Kernel.rand < 0.5}
+          neg_test, neg_training = neg.partition{|s| Kernel.rand < 0.5}
+        end
+        flusher = lambda{|x,y| Kernel.rand < 0.5 ? 1 : -1}
+        training = (pos_training + neg_training).sort &flusher
+        test = (pos_test + neg_test).sort &flusher
+        [Sample.new(training), Sample.new(test)]
+      end
+    end # class RandomSample
+  end # module Abbadingo
+end # module Stamina
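
The identifier scheme described in the StringEnumerator comment can be checked in isolation. The sketch below re-implements length_for and string_for as top-level methods (an assumption made for the example; in the gem they are instance methods) and uses Integer#bit_length in place of Math.log2(x).floor, which gives the same value for x > 0 without going through floats.

```ruby
# Standalone sketch of the identifier scheme; no Stamina classes required.

# Length of the string whose identifier is x (> 0).
# x.bit_length - 1 equals floor(log2(x)) for positive integers.
def length_for(x)
  x.bit_length - 1
end

# Symbols of the string whose identifier is x: the length_for(x) low bits of x.
def string_for(x)
  (0...length_for(x)).map { |i| ((x >> i) & 1).to_s }
end

max_length = 3
max = 2**(max_length + 1) - 1 # number of binary strings of length 0..3

# Enumerate every identifier once and count the strings per length
strings   = (1..max).map { |x| string_for(x).join }
by_length = strings.group_by(&:length).map { |len, group| [len, group.size] }.to_h

puts strings.uniq.size == max # every identifier yields a distinct string
puts by_length.inspect        # one string of length 0, then 2, 4 and 8
```

Since identifiers and strings are in one-to-one correspondence, drawing x uniformly in 1..max draws a string uniformly, and the longest length accounts for 8 of the 15 strings here, in line with the "roughly 50%" remark.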

data/lib/stamina-induction/stamina/classifier.rb ADDED

@@ -0,0 +1,55 @@
+module Stamina
+  #
+  # Provides a reusable module for binary classifiers. Classes including this
+  # module are required to provide a label_of(string) method, returning '1' for
+  # strings considered positive, and '0' for strings considered negative.
+  #
+  # Note that an Automaton, being a classifier, already includes this module.
+  #
+  module Classifier
+    #
+    # Computes a signature for a given sample (that is, an ordered set of strings).
+    # The signature is a string containing 1 (considered positive, or accepted)
+    # and 0 (considered negative, or rejected), one for each string.
+    #
+    def signature(sample)
+      signature = ''
+      sample.each do |str|
+        signature << label_of(str)
+      end
+      signature
+    end
+    alias :classification_signature :signature
+    #
+    # Classifies a sample, then computes the classification scoring obtained by
+    # comparing the signature produced by classification with the one of the sample
+    # itself. Returns an object responding to methods defined in Scoring module.
+    #
+    # This method is actually a convenient shortcut for:
+    #
+    #    Stamina::Scoring.scoring(signature(sample), sample.signature)
+    #
+    def scoring(sample)
+      Stamina::Scoring.scoring(signature(sample), sample.signature)
+    end
+    alias :classification_scoring :scoring
+    #
+    # Checks if a labeled sample is correctly classified by the classifier.
+    #
+    def correctly_classify?(sample)
+      sample.each do |str|
+        label = label_of(str)
+        expected = (str.positive? ? '1' : '0')
+        return false unless expected==label
+      end
+      true
+    end
+  end # module Classifier
+  class Automaton
+    include Stamina::Classifier
+  end
+end # module Stamina
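
The contract of the Classifier module (the including class supplies label_of, the module derives signature and correctly_classify?) can be illustrated with a toy stand-in. Everything below is hypothetical: ToyClassifier, LabeledString and EvenZeros are names invented for the example and mimic only the two methods shown above, not the scoring part.

```ruby
# Toy stand-in for Stamina::Classifier, reduced to two of its methods.
module ToyClassifier
  # One '1' or '0' per string of the sample, in order
  def signature(sample)
    sample.map { |str| label_of(str) }.join
  end

  # True when every label matches the polarity recorded in the sample
  def correctly_classify?(sample)
    sample.all? { |str| label_of(str) == (str.positive? ? '1' : '0') }
  end
end

# Hypothetical replacement for Stamina's InputString: symbols plus a label
LabeledString = Struct.new(:symbols, :positive) do
  def positive?
    positive
  end
end

# Accepts strings containing an even number of "0" symbols
class EvenZeros
  include ToyClassifier

  def label_of(str)
    str.symbols.count("0").even? ? '1' : '0'
  end
end

sample = [
  LabeledString.new(%w[0 0], true),
  LabeledString.new(%w[0 1], false),
  LabeledString.new(%w[1],   true)
]

clf = EvenZeros.new
sig = clf.signature(sample)
ok  = clf.correctly_classify?(sample)
puts sig # "101"
puts ok  # true
```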

data/lib/stamina-induction/stamina/command.rb ADDED

@@ -0,0 +1,6 @@
+require_relative 'command/metrics'
+require_relative 'command/classify'
+require_relative 'command/score'
+require_relative 'command/abbadingo_dfa'
+require_relative 'command/abbadingo_samples'
+require_relative 'command/infer'

data/lib/stamina-induction/stamina/command/abbadingo_dfa.rb ADDED

@@ -0,0 +1,80 @@
+module Stamina
+  class Command
+    #
+    # Generates a DFA following Abbadingo's protocol
+    #
+    # SYNOPSIS
+    #   #{program_name} #{command_name}
+    #
+    # OPTIONS
+    # #{summarized_options}
+    #
+    class AbbadingoDfa < Quickl::Command(__FILE__, __LINE__)
+      include Robustness
+      # Size of the target automaton
+      attr_accessor :size
+      # Tolerance on the size
+      attr_accessor :size_tolerance
+      # Tolerance on the automaton depth
+      attr_accessor :depth_tolerance
+      # Where to flush the dfa
+      attr_accessor :output_file
+      # Install options
+      options do |opt|
+        @size = 64
+        opt.on("--size=X", Integer, "Sets the size of the automaton to generate") do |x|
+          @size = x
+        end
+        @size_tolerance = nil
+        opt.on("--size-tolerance[=X]", Integer, "Sets the tolerance on automaton size (in number of states)") do |x|
+          @size_tolerance = x
+        end
+        @depth_tolerance = 0
+        opt.on("--depth-tolerance[=X]", Integer, "Sets the tolerance on expected automaton depth (in length, 0 by default)") do |x|
+          @depth_tolerance = x
+        end
+        @output_file = nil
+        opt.on("-o", "--output=OUTPUT",
+               "Flush DFA in output file") do |value|
+          @output_file = assert_writable_file(value)
+        end
+      end # options
+      def accept?(dfa)
+        (size_tolerance.nil?  || (size - dfa.state_count).abs <= size_tolerance) &&
+        (depth_tolerance.nil? || ((2*Math.log2(size)-2) - dfa.depth).abs <= depth_tolerance)
+      end
+      # Command execution
+      def execute(args)
+        require 'stamina/abbadingo'
+        # generate it
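+        # RandomDFA defines no constructor arguments; the target size is the
+        # first parameter of RandomDFA#execute
+        randomizer = Stamina::Abbadingo::RandomDFA.new
+        begin
+          dfa = randomizer.execute(size)
+        end until accept?(dfa)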
+        # flush it
+        if output_file
+          File.open(output_file, 'w') do |file|
+            Stamina::ADL.print_automaton(dfa, file)
+          end
+        else
+          Stamina::ADL.print_automaton(dfa, $stdout)
+        end
+      end
+    end # class AbbadingoDfa
+  end # class Command
+end # module Stamina
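
The acceptance test in AbbadingoDfa#accept? compares the generated automaton against the requested size and against an expected depth of 2*log2(size) - 2. A minimal sketch of that arithmetic follows, using hypothetical free-standing methods (expected_depth and acceptable? are not part of the gem) and plain numbers in place of a dfa object.

```ruby
# Expected depth used by accept?: 2*log2(size) - 2, e.g. 10 for size 64
def expected_depth(size)
  2 * Math.log2(size) - 2
end

# Mirrors the two conditions of accept?; a nil tolerance disables a check
def acceptable?(size, state_count, depth, size_tolerance: nil, depth_tolerance: 0)
  size_ok  = size_tolerance.nil?  || (size - state_count).abs <= size_tolerance
  depth_ok = depth_tolerance.nil? || (expected_depth(size) - depth).abs <= depth_tolerance
  size_ok && depth_ok
end

checks = [
  acceptable?(64, 70, 10),                     # depth matches, size unchecked
  acceptable?(64, 70, 11),                     # depth off by one, tolerance 0
  acceptable?(64, 70, 11, depth_tolerance: 1), # same depth, tolerance 1
  acceptable?(64, 70, 10, size_tolerance: 5)   # size off by 6, tolerance 5
]
puts checks.inspect # [true, false, true, false]
```

With the command's defaults (no size tolerance, depth tolerance 0), only automata whose depth equals the expected value exactly are kept, so the generation loop may run many times for sizes that are not powers of two.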