shalmaneser-rosy 1.2.0.rc4 → 1.2.rc5

Files changed (41)

checksums.yaml +4 -4
data/README.md +47 -18
data/bin/rosy +14 -7
data/lib/rosy/FailedParses.rb +22 -20
data/lib/rosy/FeatureInfo.rb +35 -31
data/lib/rosy/GfInduce.rb +132 -130
data/lib/rosy/GfInduceFeature.rb +86 -68
data/lib/rosy/InputData.rb +59 -55
data/lib/rosy/RosyConfusability.rb +47 -40
data/lib/rosy/RosyEval.rb +55 -55
data/lib/rosy/RosyFeatureExtractors.rb +295 -290
data/lib/rosy/RosyFeaturize.rb +54 -67
data/lib/rosy/RosyInspect.rb +52 -50
data/lib/rosy/RosyIterator.rb +73 -67
data/lib/rosy/RosyPhase2FeatureExtractors.rb +48 -48
data/lib/rosy/RosyPruning.rb +39 -31
data/lib/rosy/RosyServices.rb +116 -115
data/lib/rosy/RosySplit.rb +55 -53
data/lib/rosy/RosyTask.rb +7 -3
data/lib/rosy/RosyTest.rb +174 -191
data/lib/rosy/RosyTrain.rb +46 -50
data/lib/rosy/RosyTrainingTestTable.rb +101 -99
data/lib/rosy/TargetsMostFrequentFrame.rb +13 -9
data/lib/rosy/{AbstractFeatureAndExternal.rb → abstract_feature_extractor.rb} +22 -97
data/lib/rosy/abstract_single_feature_extractor.rb +52 -0
data/lib/rosy/external_feature_extractor.rb +35 -0
data/lib/rosy/opt_parser.rb +231 -201
data/lib/rosy/rosy.rb +63 -64
data/lib/rosy/rosy_conventions.rb +66 -0
data/lib/rosy/rosy_error.rb +15 -0
data/lib/rosy/var_var_restriction.rb +16 -0
data/lib/shalmaneser/rosy.rb +1 -0
metadata +26 -19
data/lib/rosy/ExternalConfigData.rb +0 -58
data/lib/rosy/View.rb +0 -418
data/lib/rosy/rosy_config_data.rb +0 -121
data/test/frprep/test_opt_parser.rb +0 -94
data/test/functional/functional_test_helper.rb +0 -58
data/test/functional/test_fred.rb +0 -47
data/test/functional/test_frprep.rb +0 -99
data/test/functional/test_rosy.rb +0 -40

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: fc41641d5f0eed28292b10a996ffd797eb1002fc
-  data.tar.gz: 76916c60023ae21361dc6752ca316028585d1522
+  metadata.gz: 35508aa71aef19017118cbe0bafc4f76f7223844
+  data.tar.gz: 993e563615c38a29c70de52f9f95fb27145fe535
 SHA512:
-  metadata.gz: 51bbbd581acb92993cd12d485b405f0f9f199d5ea4334b37cac6a4ff6150d49e4b0bc7b92ab6d08399a1bbe69839ebc476ae69267b64f2ec34464d4d080569cf
-  data.tar.gz: abbc5f43fd9d191730c8149743fcd84ba6183ff2ee0f5eaf08b5f574a68aaca2a89b390c68e54af6f5f9019dc7f97f22b352fb6ee6d88001579c55868c06e286
+  metadata.gz: e08f84cf7a13dda90f423cdc8f611a3cfb87fa082f2fe349be4d4b6bf3adc9f5b7653f4892dd037cab32e6ae695195bc54995640bdc222c366942af110a95c72
+  data.tar.gz: d6cdbb894fd1ab32a5e6a02e2785d31401dac8fa018facdecc6e380ea7bb2a8f37a78565afc435cac16ba1f1b6f91f4e2285c9490f7f533e904dc03e78919e97

data/README.md CHANGED

@@ -1,4 +1,4 @@
-# [SHALMANESER - a SHALlow seMANtic parSER](http://www.coli.uni-saarland.de/projects/salsa/shal/)
+# SHALMANESER
 [RubyGems](http://rubygems.org/gems/shalmaneser) |
 [Shalmanesers Project Page](http://bu.chsta.be/projects/shalmaneser/) |
@@ -7,9 +7,9 @@
 [![Gem Version](https://img.shields.io/gem/v/shalmaneser.svg")](https://rubygems.org/gems/shalmaneser)
-[![Gem Version](https://img.shields.io/gem/v/frprep.svg")](https://rubygems.org/gems/frprep)
-[![Gem Version](https://img.shields.io/gem/v/fred.svg")](https://rubygems.org/gems/fred)
-[![Gem Version](https://img.shields.io/gem/v/rosy.svg")](https://rubygems.org/gems/rosy)
+[![Gem Version](https://img.shields.io/gem/v/frprep.svg")](https://rubygems.org/gems/shalmaneser-prep)
+[![Gem Version](https://img.shields.io/gem/v/fred.svg")](https://rubygems.org/gems/shalmaneser-fred)
+[![Gem Version](https://img.shields.io/gem/v/rosy.svg")](https://rubygems.org/gems/shalmaneser-rosy)
 [![License GPL 2](http://img.shields.io/badge/License-GPL%202-green.svg)](http://www.gnu.org/licenses/gpl-2.0.txt)
@@ -17,12 +17,44 @@
 [![Code Climate](https://img.shields.io/codeclimate/github/arbox/shalmaneser.svg")](https://codeclimate.com/github/arbox/shalmaneser)
 [![Dependency Status](https://img.shields.io/gemnasium/arbox/shalmaneser.svg")](https://gemnasium.com/arbox/shalmaneser)
+[SHALMANESER](http://www.coli.uni-saarland.de/projects/salsa/shal/) is a SHALlow seMANtic parSER.
+The name Shalmaneser is borrowed from John Brunner. He describes in his novel
+"Stand on Zanzibar" an all knowing supercomputer baptized Shalmaneser.
+Shalmaneser also has other origins like the king [Shalmaneser III](https://en.wikipedia.org/wiki/Shalmaneser_III).
+> "SCANALYZER is the one single, the ONLY study of the news in depth
+> that’s processed by General Technics’ famed computer Shalmaneser,
+> who sees all, hears all, knows all save only that which YOU, Mr. and Mrs.
+> Everywhere, wish to keep to yourselves." <br/>
+> John Brunner (1968) "Stand on Zanzibar"
+> But Shalmaneser is a Micryogenic® computer bathed in liquid helium and it’s cold in his vault. <br/>
+> John Brunner (1968) "Stand on Zanzibar"
+> “Of course not. Shalmaneser’s main task is to achieve the impossible again, a routine undertaking here at GT.” <br/>
+> John Brunner (1968) "Stand on Zanzibar"
+> “They programmed Shalmaneser with the formula for this stiffener, see, and…” <br/>
+> John Brunner (1968) "Stand on Zanzibar"
+> What am I going to do now? <br/>
+> “All right, Shalmaneser!” <br/>
+> John Brunner (1968) "Stand on Zanzibar"
+> Shalmaneser is a Micryogenic® computer bathed in liquid helium and there’s no sign of Teresa. <br/>
+> John Brunner (1968) "Stand on Zanzibar"
+> Bathed in his currents of liquid helium, self-contained, immobile, vastly well informed by every mechanical sense: Shalmaneser. <br/>
+> John Brunner (1968) "Stand on Zanzibar"
 ## Description
 Please be careful, the whole thing is under construction! For now Shalmaneser it not intended to run on Windows systems since it heavily uses system calls for external invocations.
 Current versions of Shalmaneser have been tested on Linux only (other *NIX testers are welcome!).
-Shalmaneser is a supervised learning toolbox for shallow semantic parsing, i.e. the automatic assignment of semantic classes and roles to text. This technique is often called SRL (Semantic Role Labelling). The system was developed for Frame Semantics; thus we use Frame Semantics terminology and call the classes frames and the roles frame elements. However, the architecture is reasonably general, and with a certain amount of adaption, Shalmaneser should be usable for other paradigms (e.g., PropBank roles) as well. Shalmaneser caters both for end users, and for researchers.
+Shalmaneser is a supervised learning toolbox for shallow semantic parsing, i.e. the automatic assignment of semantic classes and roles to text. This technique is often called [SRL](https://en.wikipedia.org/wiki/Semantic_role_labeling) (Semantic Role Labelling). The system was developed for Frame Semantics; thus we use Frame Semantics terminology and call the classes frames and the roles frame elements. However, the architecture is reasonably general, and with a certain amount of adaption, Shalmaneser should be usable for other paradigms (e.g., PropBank roles) as well. Shalmaneser caters both for end users, and for researchers.
 For end users, we provide a simple end user mode which can simply apply the pre-trained classifiers
 for [English](http://www.coli.uni-saarland.de/projects/salsa/shal/index.php?nav=download) (FrameNet 1.3 annotation / Collins parser)
@@ -34,32 +66,27 @@ For researchers interested in investigating shallow semantic parsing, our system
 ## Origin
-The original version of Shalmaneser was written by Sebastian Padó, Katrin Erk and others during their work in the SALSA Project.
+The original version of Shalmaneser was written by Sebastian Padó, Katrin Erk, Alexander Koller, Ines Rehbein, Aljoscha Burchardt and others during their work in the SALSA Project.
 You can find original versions of Shalmaneser up to ``1.1`` on the [SALSA](http://www.coli.uni-saarland.de/projects/salsa/shal/) project page.
 ## Publications on Shalmaneser
 - K. Erk and S. Padó: Shalmaneser - a flexible toolbox for semantic role assignment. Proceedings of LREC 2006, Genoa, Italy. [Click here for details](http://www.nlpado.de/~sebastian/pub/papers/lrec06_erk.pdf).
 - TODO: add other works
 ## Documentation
-The project documentation can be found in our [doc](https://github.com/arbox/shalmaneser/blob/1.2/doc/index.md) folder.
+The project documentation can be found in our [doc](https://github.com/arbox/shalmaneser/blob/master/doc/index.md) folder.
 ## Development
-We are working now on two branches:
-- ``dev`` - our development branch incorporating actual changes, for now pointing to ``1.2``;
-- ``1.2`` - intermediate target;
-- ``2.0`` - final target.
+We are working now only on the `master` branch. For different intermediate versions see corresponding tags.
 ## Installation
-See the installation instructions in the [doc](https://github.com/arbox/shalmaneser/blob/1.2/doc/index.md#installation) folder.
+See the installation instructions in the [doc](https://github.com/arbox/shalmaneser/blob/master/doc/index.md#installation) folder.
 ### Tokenizers
@@ -75,7 +102,7 @@ See the installation instructions in the [doc](https://github.com/arbox/shalmane
 ### Parsers
-- [BerkeleyParser](https://code.google.com/p/berkeleyparser/downloads/list)
+- [BerkeleyParser](https://github.com/slavpetrov/berkeleyparser)
 - [Stanford Parser](http://nlp.stanford.edu/software/lex-parser.shtml)
 - [Collins Parser](http://www.cs.columbia.edu/~mcollins/code.html)
@@ -86,8 +113,10 @@ See the installation instructions in the [doc](https://github.com/arbox/shalmane
 ## License
-See the `LICENSE` file.
+Shalmaneser is released under the `GPL v. 2.0` license as of the initial authors.
+For a local copy of the full license text see the [LICENSE](LICENSE.md) file.
 ## Contributing
-See the `CONTRIBUTING` file.
+Feel free to contact me via Github. Open an issue if you see problems or need help.

data/bin/rosy CHANGED

@@ -1,17 +1,24 @@
 #!/usr/bin/env ruby
 # -*- encoding: utf-8 -*-
-# AB: 2011-11-14
+# @author Andrei Beliankou
+# 2011-11-14
 # rosy.rb
-# KE, SP April 05
+# @author KE, SP April 05
 #
 # Main file of the Rosy role assignment system.
-require 'rosy/opt_parser'
 require 'rosy/rosy'
+require 'rosy/opt_parser'
-options = Rosy::OptParser.parse(ARGV)
+begin
+  options = ::Shalmaneser::Rosy::OptParser.parse(ARGV)
-rosy = Rosy::Rosy.new(options)
-rosy.assign
+  rosy = ::Shalmaneser::Rosy::Rosy.new(options)
+  # @todo Rename the assing method.
+  rosy.assign
+rescue => e
+  $stderr.puts 'Rosy cannot serve you!'
+  $stderr.puts e.message, e.backtrace
+  exit(1)
+end
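
The change to `bin/rosy` wraps option parsing and the run in a single `begin`/`rescue`, so any `StandardError` is reported on standard error and the process exits with status 1. A minimal sketch of that pattern follows. The real `OptParser` and `Rosy` classes are replaced by a block so that it runs standalone, and `run_cli` with its `err:` parameter is a hypothetical name, not part of the gem.

```ruby
require 'stringio'

# Hypothetical helper: runs the given block and turns any StandardError
# into a message on `err` plus a non-zero status, as the new bin/rosy does.
# It returns the status instead of calling exit so that it can be tested.
def run_cli(argv, err: $stderr)
  yield argv
  0
rescue => e
  err.puts 'Rosy cannot serve you!'
  err.puts e.message, e.backtrace
  1
end

# Usage: a failing run reports the error and yields status 1.
err = StringIO.new
status = run_cli(['--unknown'], err: err) do |args|
  raise ArgumentError, "bad option #{args.first}"
end
# A real executable would finish with: exit(status)
```

Returning the status rather than exiting inside the helper keeps the error path easy to exercise in a test.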

data/lib/rosy/FailedParses.rb CHANGED

@@ -2,23 +2,24 @@
 #
 # SP May 05
 #
-# Administration of information about failed parses;
+# Administration of information about failed parses;
 # - sentence ID
 # - frame
 # - missed FE markables
 #
-# this class is pretty much a gloriefied hash table with methods to
+# this class is pretty much a gloriefied hash table with methods to
 # - read FailedParses from a file and to write them to a file
 # - access info in a frame-specific way
+module Shalmaneser
+module Rosy
 class FailedParses
   ###
   # initialize
   #
   # nothing much happens here
-  def initialize()
-    @failed_parses = Array.new
+  def initialize
+    @failed_parses = []
   end
   ###
@@ -28,7 +29,7 @@ class FailedParses
   # - its sentence id (any object)
   # - its frame (String)
   # - its FE list (String Array)
   def register(sent_id, # object
                frame,   # string: frame name
                target,  # string?
@@ -54,8 +55,8 @@ class FailedParses
     unless train_percentage.class < Integer and train_percentage >= 0 and train_percentage <= 100
       raise "Need Integer between 0 and 100 as training percentage."
     end
-    train_failed = FailedParses.new()
-    test_failed = FailedParses.new()
+    train_failed = FailedParses.new
+    test_failed = FailedParses.new
     @failed_parses.each {|sent_id,frame,target,target_pos,fe_list|
       if rand(100) > train_percentage
         test_failed.register(sent_id,frame,target,target_pos,fe_list)
@@ -70,17 +71,17 @@ class FailedParses
   # Access information
   #
   # failed_sent: number of failed sentences
-  # failed_fes:  Hash that maps FE names [String] onto numbers of failed FEs [Int]
+  # failed_fes:  Hash that maps FE names [String] onto numbers of failed FEs [Int]
   #
-  # optional parameters: frame, target, target_pos : if not specified or nil, marginal
+  # optional parameters: frame, target, target_pos : if not specified or nil, marginal
   #                      frequencies are counted (sum over all values)
-  def failed_sent(frame_spec=nil,target_spec=nil,target_pos_spec=nil)
+  def failed_sent(frame_spec=nil,target_spec=nil,target_pos_spec=nil)
     counter = 0
     @failed_parses.each {|sent_id,frame,target,target_pos,fe_list|
-      if ((frame_spec.nil? or frame_spec == frame) and
-	  (target_spec.nil? or target_spec == target) and
+      if ((frame_spec.nil? or frame_spec == frame) and
+	  (target_spec.nil? or target_spec == target) and
 	  (target_pos_spec.nil? or target_pos_spec == target_pos))
 	counter += 1
       end
@@ -91,8 +92,8 @@ class FailedParses
   def failed_fes(frame_spec=nil,target_spec=nil,target_pos_spec=nil)
     fe_hash = Hash.new(0)
     @failed_parses.each {|sent_id,frame,target,target_pos,fe_list|
-      if ((frame_spec.nil? or frame_spec == frame) and
-	  (target_spec.nil? or target_spec == target) and
+      if ((frame_spec.nil? or frame_spec == frame) and
+	  (target_spec.nil? or target_spec == target) and
 	  (target_pos_spec.nil? or target_pos_spec == target))
 	fe_list.each {|fe_label|
 	  fe_hash[fe_label] += 1
@@ -102,7 +103,7 @@ class FailedParses
     return fe_hash
   end
   ###
   # Marshalling:
   #
@@ -125,6 +126,7 @@ class FailedParses
       $stderr.puts "I'll assume that there are no failed parses."
     end
   end
+end
+end
 end
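
`failed_sent` and `failed_fes` treat a `nil` argument as a wildcard, so omitting a filter yields marginal counts over all values. A minimal sketch of that lookup follows, assuming each record is a five-element array as in `register`. The class name `FailedParsesSketch` and the helper `matching` are hypothetical. Note that the context line in `failed_fes` compares `target_pos_spec` against `target`; the sketch compares against `target_pos`, which is presumably what was intended.

```ruby
# Hypothetical stand-in for FailedParses, reduced to the counting logic.
class FailedParsesSketch
  def initialize
    @failed_parses = []
  end

  def register(sent_id, frame, target, target_pos, fe_list)
    @failed_parses << [sent_id, frame, target, target_pos, fe_list]
  end

  # Number of failed sentences; a nil filter matches every value.
  def failed_sent(frame_spec = nil, target_spec = nil, target_pos_spec = nil)
    matching(frame_spec, target_spec, target_pos_spec).size
  end

  # Hash mapping FE names to the number of failed FEs.
  def failed_fes(frame_spec = nil, target_spec = nil, target_pos_spec = nil)
    counts = Hash.new(0)
    matching(frame_spec, target_spec, target_pos_spec).each do |_id, _frame, _target, _pos, fe_list|
      fe_list.each { |fe| counts[fe] += 1 }
    end
    counts
  end

  private

  # Records that pass all three filters (nil means "any value").
  def matching(frame_spec, target_spec, target_pos_spec)
    @failed_parses.select do |_id, frame, target, target_pos, _fes|
      (frame_spec.nil? || frame_spec == frame) &&
        (target_spec.nil? || target_spec == target) &&
        (target_pos_spec.nil? || target_pos_spec == target_pos)
    end
  end
end

fp = FailedParsesSketch.new
fp.register('s1', 'Motion', 'go', 'V', %w[Theme Goal])
fp.register('s2', 'Motion', 'run', 'V', %w[Theme])
fp.register('s3', 'Statement', 'say', 'V', %w[Speaker])
```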

data/lib/rosy/FeatureInfo.rb CHANGED

@@ -1,11 +1,13 @@
-require 'common/ruby_class_extensions'
+require 'ruby_class_extensions'
+module Shalmaneser
+module Rosy
 class RosyFeatureInfo
   ###
   # class variable:
   # list of all known extractors
   # add to it using add_feature()
-  @@extractors = Array.new
+  @@extractors = []
   # boolean. set to true after warning messages have been given once
   @@warned = false
@@ -15,21 +17,21 @@ class RosyFeatureInfo
   def RosyFeatureInfo.add_feature(class_name) # Class object
     @@extractors << class_name
   end
   ###
   def initialize(exp)
     ##
     # make list of extractors that are
     # either required by the user
     # or needed by the system
-    @current_extractors = Array.new
+    @current_extractors = []
     @exp = exp
     # user-chosen extractors:
     # returns array of pairs [feature group designator(string), options(array:string)]
     exp.get_lf("feature").each { |extractor_name, options|
-      extractor = @@extractors.detect { |e| e.designator() == extractor_name }
+      extractor = @@extractors.detect { |e| e.designator == extractor_name }
       unless extractor
         # no extractor found matching the given designator
         unless @@warned
@@ -69,13 +71,13 @@ class RosyFeatureInfo
     # extractors needed by the system
     @@extractors.select { |e|
       # select admin features and gold feature
-      ["admin", "gold"].include? e.feature_type()
+      ["admin", "gold"].include? e.feature_type
     }.each { |extractor|
       # if we have already added that extractor, remove it
       # and add it with our own options
-      @current_extractors.delete_if { |descr| descr["extractor"].designator() == extractor.designator() }
+      @current_extractors.delete_if { |descr| descr["extractor"].designator == extractor.designator }
       @current_extractors << {
         "extractor"=> extractor,
         "step" => "dontuse"
@@ -86,14 +88,14 @@ class RosyFeatureInfo
     # (i.e. check dependencies)
     allstep_extractors = @current_extractors.find_all {|e_hash| e_hash["step"].nil?
-    }.map { |e| e["extractor"].designator() }
+    }.map { |e| e["extractor"].designator }
     argrec_extractors = @current_extractors.find_all {|e_hash| e_hash["step"].nil? or e_hash["step"] == "argrec"
-    }.map { |e| e["extractor"].designator() }
+    }.map { |e| e["extractor"].designator }
     arglab_extractors = @current_extractors.find_all {|e_hash| e_hash["step"].nil? or e_hash["step"] == "arglab"
-    }.map { |e| e["extractor"].designator() }
+    }.map { |e| e["extractor"].designator }
     onestep_extractors = @current_extractors.find_all {|e_hash| e_hash["step"].nil? or e_hash["step"] == "onestep"
-    }.map { |e| e["extractor"].designator() }
+    }.map { |e| e["extractor"].designator }
     @current_extractors.delete_if {|extractor_hash|
       case extractor_hash["step"]
       when nil
@@ -104,7 +106,7 @@ class RosyFeatureInfo
         computable = extractor_hash["extractor"].is_computable(arglab_extractors)
       when "onestep"
         computable = extractor_hash["extractor"].is_computable(onestep_extractors)
-      when "dontuse"
+      when "dontuse"
 	# either an admin feature or a user feature not to be used this time
         computable = true
       end
@@ -113,7 +115,7 @@ class RosyFeatureInfo
         false # i.e. don't delete
       else
         unless @@warned
-          $stderr.puts "Warning: Feature extractor #{extractor_hash["extractor"].designator()} cannot be computed: skipping."
+          $stderr.puts "Warning: Feature extractor #{extractor_hash["extractor"].designator} cannot be computed: skipping."
         end
         true
       end
@@ -126,17 +128,17 @@ class RosyFeatureInfo
     # "step" -> string: argrec, arglab, onestep, or nil
     # "type" -> string
     # "phase" -> string: phase 1 or phase 2
-    @features = Array.new
+    @features = []
     @current_extractors.each { |descr|
       extractor = descr["extractor"]
       extractor.feature_names.each { |feature_name|
         @features << {
           "feature_name" => feature_name,
-          "sql_type"     => extractor.sql_type(),
-          "is_index"     => extractor.info().include?("index"),
+          "sql_type"     => extractor.sql_type,
+          "is_index"     => extractor.info.include?("index"),
           "step"         => descr["step"],
-          "type"         => extractor.feature_type(),
-          "phase"        => extractor.phase()
+          "type"         => extractor.feature_type,
+          "phase"        => extractor.phase
         }
       }
     }
@@ -152,7 +154,7 @@ class RosyFeatureInfo
   # all features to be computed, with their SQL column formats
   def get_column_formats(phase = nil) # string: phase 1 or phase 2
     return @features.select { |feature_descr|
-      phase.nil? or
+      phase.nil? or
         feature_descr["phase"] == phase
     }.map { |feature_descr|
       [feature_descr["feature_name"], feature_descr["sql_type"]]
@@ -166,7 +168,7 @@ class RosyFeatureInfo
   # all features to be computed
   def get_column_names(phase = nil)  # string: phase 1 or phase 2
     return @features.select { |feature_descr|
-      phase.nil? or
+      phase.nil? or
         feature_descr["phase"] == phase
     }.map { |feature_descr|
       feature_descr["feature_name"]
@@ -179,9 +181,9 @@ class RosyFeatureInfo
   # returns a list of feature (column) names as Strings
   # consisting of all features that have been requested as index features
   # in the experiment file or in the list of @@all_features_we_have above
-  def get_index_columns()
+  def get_index_columns
     return @features.select { |feature_descr|
-      feature_descr["is_index"]
+      feature_descr["is_index"]
     }.map {|feature_descr|
       feature_descr["feature_name"]
     }
@@ -209,13 +211,13 @@ class RosyFeatureInfo
     }.map { |feature_descr|
       # use just the names of the features
       feature_descr["feature_name"]
-    }
+    }
   end
   ###
   # get_extractor_objects
   #
-  # returns two lists of feature extractor objects,
+  # returns two lists of feature extractor objects,
   # covering all features of the given phase:
   # the first list contains RosyFeatureExtractor extractors,
   # the second list contains the others.
@@ -227,16 +229,18 @@ class RosyFeatureInfo
     return @current_extractors.select { |descr|
       # select extractors of the right phase
-      descr["extractor"].phase() == phase
+      descr["extractor"].phase == phase
     }.map { |descr|
       # make objects from extractor classes
       descr["extractor"].new(@exp, interpreter_class)
     }.distribute { |extractor_obj|
-      # distribute extractors in two bins:
+      # distribute extractors in two bins:
       # first, rosy extractors
       # second, others
-      extractor_obj.class.info().include? "rosy"
+      extractor_obj.class.info.include? "rosy"
     }
   end
 end
+end
+end
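
`RosyFeatureInfo` keeps a class-level list of extractor classes, looks them up by `designator`, and in `get_extractor_objects` splits the instantiated extractors into two bins depending on whether their `info` includes `"rosy"`. A minimal sketch of that flow follows. The extractor classes and `ExtractorRegistry` are hypothetical, a class-level instance variable replaces `@@extractors`, and the standard `Enumerable#partition` stands in for `distribute` from `ruby_class_extensions`, on the assumption that the two behave alike for a two-bin split.

```ruby
# Hypothetical extractor classes exposing the class-level interface
# that RosyFeatureInfo relies on (designator, info).
class PathExtractor
  def self.designator
    'path'
  end

  def self.info
    ['rosy']
  end
end

class ExternalExtractor
  def self.designator
    'ext'
  end

  def self.info
    ['external']
  end
end

# Hypothetical registry mirroring add_feature and the designator lookup.
class ExtractorRegistry
  @extractors = []

  class << self
    attr_reader :extractors

    def add_feature(klass)
      @extractors << klass
    end

    # Returns the extractor class with the given designator, or nil.
    def find(name)
      @extractors.detect { |e| e.designator == name }
    end
  end
end

ExtractorRegistry.add_feature(PathExtractor)
ExtractorRegistry.add_feature(ExternalExtractor)

# Instantiate all extractors, then split them: rosy extractors first, others second.
objects = ExtractorRegistry.extractors.map(&:new)
rosy, others = objects.partition { |obj| obj.class.info.include?('rosy') }
```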