RubyGems - shalmaneser-rosy - Versions diffs - 1.2.0.rc4 - Mend

shalmaneser-rosy 1.2.0.rc4

Files changed (38)

checksums.yaml +7 -0
data/.yardopts +10 -0
data/CHANGELOG.md +4 -0
data/LICENSE.md +4 -0
data/README.md +93 -0
data/bin/rosy +17 -0
data/lib/rosy/AbstractFeatureAndExternal.rb +242 -0
data/lib/rosy/ExternalConfigData.rb +58 -0
data/lib/rosy/FailedParses.rb +130 -0
data/lib/rosy/FeatureInfo.rb +242 -0
data/lib/rosy/GfInduce.rb +1115 -0
data/lib/rosy/GfInduceFeature.rb +148 -0
data/lib/rosy/InputData.rb +294 -0
data/lib/rosy/RosyConfusability.rb +338 -0
data/lib/rosy/RosyEval.rb +465 -0
data/lib/rosy/RosyFeatureExtractors.rb +1609 -0
data/lib/rosy/RosyFeaturize.rb +281 -0
data/lib/rosy/RosyInspect.rb +336 -0
data/lib/rosy/RosyIterator.rb +478 -0
data/lib/rosy/RosyPhase2FeatureExtractors.rb +230 -0
data/lib/rosy/RosyPruning.rb +165 -0
data/lib/rosy/RosyServices.rb +744 -0
data/lib/rosy/RosySplit.rb +232 -0
data/lib/rosy/RosyTask.rb +19 -0
data/lib/rosy/RosyTest.rb +829 -0
data/lib/rosy/RosyTrain.rb +234 -0
data/lib/rosy/RosyTrainingTestTable.rb +787 -0
data/lib/rosy/TargetsMostFrequentFrame.rb +60 -0
data/lib/rosy/View.rb +418 -0
data/lib/rosy/opt_parser.rb +379 -0
data/lib/rosy/rosy.rb +78 -0
data/lib/rosy/rosy_config_data.rb +121 -0
data/test/frprep/test_opt_parser.rb +94 -0
data/test/functional/functional_test_helper.rb +58 -0
data/test/functional/test_fred.rb +47 -0
data/test/functional/test_frprep.rb +99 -0
data/test/functional/test_rosy.rb +40 -0
metadata +105 -0

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: fc41641d5f0eed28292b10a996ffd797eb1002fc
+  data.tar.gz: 76916c60023ae21361dc6752ca316028585d1522
+SHA512:
+  metadata.gz: 51bbbd581acb92993cd12d485b405f0f9f199d5ea4334b37cac6a4ff6150d49e4b0bc7b92ab6d08399a1bbe69839ebc476ae69267b64f2ec34464d4d080569cf
+  data.tar.gz: abbc5f43fd9d191730c8149743fcd84ba6183ff2ee0f5eaf08b5f574a68aaca2a89b390c68e54af6f5f9019dc7f97f22b352fb6ee6d88001579c55868c06e286
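
The digests above cover the two archive members of the `.gem` package (`metadata.gz` and `data.tar.gz`). A minimal sketch of how such values could be recomputed for a file that has already been extracted from the package; the helper name `member_digests` and the demo file are assumptions made for illustration, not part of the gem:

```ruby
require 'digest'
require 'tmpdir'

# Hypothetical helper: returns the SHA1 and SHA512 hex digests of one file,
# i.e. the two kinds of values checksums.yaml records per archive member.
def member_digests(path)
  data = File.binread(path)
  {
    'SHA1'   => Digest::SHA1.hexdigest(data),
    'SHA512' => Digest::SHA512.hexdigest(data)
  }
end

# Demo on a throwaway file; for a real check, pass the path of an
# extracted metadata.gz or data.tar.gz and compare with checksums.yaml.
Dir.mktmpdir do |dir|
  path = File.join(dir, 'member.bin')
  File.binwrite(path, 'abc')
  puts member_digests(path)['SHA1']
end
```

Comparing the returned strings with the entries in `checksums.yaml` would show whether an extracted member matches what was published.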

data/.yardopts ADDED

@@ -0,0 +1,10 @@
+--private
+--protected
+--title 'SHALMANESER'
+lib/**/*.rb
+bin/**/*
+doc/**/*.md
+-
+CHANGELOG.md
+LICENSE.md
+doc/index.md

data/CHANGELOG.md ADDED

@@ -0,0 +1,4 @@
+# Versions
+## Version 1.2.0-rc1

data/LICENSE.md ADDED

@@ -0,0 +1,4 @@
+# LICENSE
+This software is written in Ruby and is released under the [GNU General Public License](http://www.gnu.org/licenses/gpl-2.0.html) (GPL v2), and the documentation under the [GNU Free Documentation License](http://www.gnu.org/licenses/old-licenses/fdl-1.2.html) (FDL v1.2).

data/README.md ADDED

@@ -0,0 +1,93 @@
+# [SHALMANESER - a SHALlow seMANtic parSER](http://www.coli.uni-saarland.de/projects/salsa/shal/)
+[RubyGems](http://rubygems.org/gems/shalmaneser) |
+[Shalmanesers Project Page](http://bu.chsta.be/projects/shalmaneser/) |
+[Source Code](https://github.com/arbox/shalmaneser) |
+[Bug Tracker](https://github.com/arbox/shalmaneser/issues)
+[![Gem Version](https://img.shields.io/gem/v/shalmaneser.svg)](https://rubygems.org/gems/shalmaneser)
+[![Gem Version](https://img.shields.io/gem/v/frprep.svg)](https://rubygems.org/gems/frprep)
+[![Gem Version](https://img.shields.io/gem/v/fred.svg)](https://rubygems.org/gems/fred)
+[![Gem Version](https://img.shields.io/gem/v/rosy.svg)](https://rubygems.org/gems/rosy)
+[![License GPL 2](http://img.shields.io/badge/License-GPL%202-green.svg)](http://www.gnu.org/licenses/gpl-2.0.txt)
+[![Build Status](https://img.shields.io/travis/arbox/shalmaneser.svg?branch=1.2)](https://travis-ci.org/arbox/shalmaneser)
+[![Code Climate](https://img.shields.io/codeclimate/github/arbox/shalmaneser.svg)](https://codeclimate.com/github/arbox/shalmaneser)
+[![Dependency Status](https://img.shields.io/gemnasium/arbox/shalmaneser.svg)](https://gemnasium.com/arbox/shalmaneser)
+## Description
+Please be careful, the whole thing is under construction! For now Shalmaneser is not intended to run on Windows systems since it relies heavily on system calls for external invocations.
+Current versions of Shalmaneser have been tested on Linux only (other *NIX testers are welcome!).
+Shalmaneser is a supervised learning toolbox for shallow semantic parsing, i.e. the automatic assignment of semantic classes and roles to text. This technique is often called SRL (Semantic Role Labelling). The system was developed for Frame Semantics; thus we use Frame Semantics terminology and call the classes frames and the roles frame elements. However, the architecture is reasonably general, and with a certain amount of adaptation, Shalmaneser should be usable for other paradigms (e.g., PropBank roles) as well. Shalmaneser caters both for end users and for researchers.
+For end users, we provide a simple end user mode which can simply apply the pre-trained classifiers
+for [English](http://www.coli.uni-saarland.de/projects/salsa/shal/index.php?nav=download) (FrameNet 1.3 annotation / Collins parser)
+and [German](http://www.coli.uni-saarland.de/projects/salsa/shal/index.php?nav=download) (SALSA 1.0 annotation / Sleepy parser).
+We'll try to provide newer pretrained models for English, German, and possibly other languages as soon as possible.
+For researchers interested in investigating shallow semantic parsing, our system is extensively configurable and extendable.
+## Origin
+The original version of Shalmaneser was written by Sebastian Padó, Katrin Erk and others during their work in the SALSA Project.
+You can find original versions of Shalmaneser up to ``1.1`` on the [SALSA](http://www.coli.uni-saarland.de/projects/salsa/shal/) project page.
+## Publications on Shalmaneser
+- K. Erk and S. Padó: Shalmaneser - a flexible toolbox for semantic role assignment. Proceedings of LREC 2006, Genoa, Italy. [Click here for details](http://www.nlpado.de/~sebastian/pub/papers/lrec06_erk.pdf).
+- TODO: add other works
+## Documentation
+The project documentation can be found in our [doc](https://github.com/arbox/shalmaneser/blob/1.2/doc/index.md) folder.
+## Development
+We are currently working on the following branches:
+- ``dev`` - our development branch incorporating actual changes, for now pointing to ``1.2``;
+- ``1.2`` - intermediate target;
+- ``2.0`` - final target.
+## Installation
+See the installation instructions in the [doc](https://github.com/arbox/shalmaneser/blob/1.2/doc/index.md#installation) folder.
+### Tokenizers
+- [Ucto](http://ilk.uvt.nl/ucto/)
+### POS Taggers
+- [TreeTagger](http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/)
+### Lemmatizers
+- [TreeTagger](http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/)
+### Parsers
+- [BerkeleyParser](https://code.google.com/p/berkeleyparser/downloads/list)
+- [Stanford Parser](http://nlp.stanford.edu/software/lex-parser.shtml)
+- [Collins Parser](http://www.cs.columbia.edu/~mcollins/code.html)
+### Machine Learning Systems
+- [OpenNLP MaxEnt](http://sourceforge.net/projects/maxent/files/Maxent/2.4.0/)
+- [Mallet](http://mallet.cs.umass.edu/index.php)
+## License
+See the `LICENSE` file.
+## Contributing
+See the `CONTRIBUTING` file.

data/bin/rosy ADDED

@@ -0,0 +1,17 @@
+#!/usr/bin/env ruby
+# -*- encoding: utf-8 -*-
+# AB: 2011-11-14
+# rosy.rb
+# KE, SP April 05
+#
+# Main file of the Rosy role assignment system.
+require 'rosy/opt_parser'
+require 'rosy/rosy'
+options = Rosy::OptParser.parse(ARGV)
+rosy = Rosy::Rosy.new(options)
+rosy.assign

data/lib/rosy/AbstractFeatureAndExternal.rb ADDED

@@ -0,0 +1,242 @@
+# Katrin Erk November 05
+#
+# Abstract classes for
+# - Rosy features
+# - Rosy interface for external knowledge sources.
+require 'rosy/ExternalConfigData'
+####
+# Feature Extractor:
+# computes one or more features for a node (a SynNode object) out of
+#  a SalsaTigerSentence
+class AbstractFeatureExtractor
+  @@sent = nil  # SalsaTigerSentence: sentence of the current instance
+  @@frame = nil # FrameNode: frame of the current instance
+  @@node = nil  # SynNode: constituent that is the current instance
+  @@interpreter_class = nil # SynInterpreter class
+  @@instance_ok = true
+  ###
+  # returns a string: the designator for this feature extractor
+  # (an extractor may compute several features, but
+  #  in the experiment file it is chosen by a single designator)
+  def AbstractFeatureExtractor.designator()
+    raise "Overwrite me"
+  end
+  ###
+  # returns an array of feature names, the names of the
+  # features that it can compute.
+  # The number of features that the extractor computes must be fixed.
+  def AbstractFeatureExtractor.feature_names()
+    raise "Overwrite me."
+  end
+  ###
+  # returns a string: the data type for the feature
+  # to be passed on to the MySQL database,
+  # e.g. VARCHAR(10), INT
+  def AbstractFeatureExtractor.sql_type()
+    raise "Overwrite me"
+  end
+  ###
+  # returns a string: the feature type
+  # (the same for all features computed by this extractor)
+  # possible values:
+  # - gold: gold label
+  # - admin: administrative feature, do not pass this on to the learner
+  # - syn: feature computed from syntactic characteristics of the instance
+  # - sem: feature involving semantic characteristics of the instance
+  # - sentlevel: this feature is the same for all instances of a sentence
+  def AbstractFeatureExtractor.feature_type()
+    raise "Overwrite me"
+  end
+  ###
+  # returns a string: "phase 1" or "phase 2",
+  # depending on whether the feature is computed
+  # directly from the SalsaTigerSentence and the SynNode objects
+  # or whether it is computed from the phase 1 features
+  def AbstractFeatureExtractor.phase()
+    raise "Overwrite me."
+  end
+  ###
+  # returns an array of strings, providing information about
+  # the feature extractor
+  def AbstractFeatureExtractor.info()
+    return []
+  end
+  ###
+  # set sentence, set node, set other settings:
+  # this is done prior to
+  # feature computation using compute_feature()
+  # such that computations that stay the same for
+  # several features can be done in advance
+  #
+  # This is just relevant for Phase 1
+  #
+  # returns: false/nil if there was a problem
+  def AbstractFeatureExtractor.set_sentence(sent,  # SalsaTigerSentence object
+                                            frame) # FrameNode object
+    @@sent = sent
+    @@frame = frame
+    return true
+  end
+  def AbstractFeatureExtractor.set_node(node) # SynNode of the sentence set in set_sentence
+    @@node = node
+    return true
+  end
+  ###
+  # set sentence, set node, set general settings: this is done prior to
+  # feature computation using compute_feature_value()
+  # such that computations that stay the same for
+  # several features can be done in advance
+  def AbstractFeatureExtractor.set(var_hash = {})
+    # no settings at this point
+    return true
+  end
+  # test during initialisation whether a feature is computable
+  # gives the feature the possibility to specify additional constraints
+  # e.g. for phase2 features : specify which extractors from phase 1 are presupposed
+  def AbstractFeatureExtractor.is_computable(extractor_list) # bool
+    return true
+  end
+  ###
+  # @param exp [ConfigData] Experiment file information
+  # @param interpreter_class [Class]
+  def initialize(exp, interpreter_class)
+    @exp = exp
+    @@interpreter_class = interpreter_class
+  end
+  ###
+  # compute: compute features
+  #
+  # returns an array of features (strings), length the same as the
+  # length of feature_names()
+  def compute_features()
+    raise "overwrite me"
+  end
+  ###
+  # phase 2 extractors:
+  # compute features for a complete view
+  #
+  # returns: an array of columns,
+  # where a column is an array of feature values.
+  # returns one column per entry in feature_names()
+  def compute_features_on_view(view) # DBView object
+    raise "overwrite me"
+  end
+  # At this place, we had abstract methods for "training" phase 2 features
+  # Since this involves introducing a "state" that is nontrivial to preserve
+  # for a standalone version of the classifiers, without keeping the training data,
+  # we decided to remove this functionality (30.11.05).
+  # Features which rely on learning patterns from the training data and applying them
+  # to the test data will from now on be implemented as externals.
+  ######
+  protected
+  def AbstractFeatureExtractor.announce_me()
+    # AB: In 1.9 constants are symbols.
+    if Module.constants.include?("RosyFeatureInfo") or Module.constants.include?(:RosyFeatureInfo)
+      # yup, we have a class to which we can announce ourselves
+      RosyFeatureInfo.add_feature(eval(self.name()))
+    else
+      # no interface collector class
+#      $stderr.puts "Feature #{self.name()} not announced: no RosyFeatureInfo."
+    end
+  end
+end
+################################################################
+# Wrapper class for extractors that compute a single feature
+class AbstractSingleFeatureExtractor < AbstractFeatureExtractor
+  ###
+  # returns a string: the designator for this feature extractor
+  # (an extractor may compute several features, but
+  #  in the experiment file it is chosen by a single designator)
+  #
+  # here: single feature, and the feature name is the designator
+  def AbstractSingleFeatureExtractor.designator()
+    return eval(self.name()).feature_name()
+  end
+  ###
+  def AbstractSingleFeatureExtractor.feature_names()
+    return [eval(self.name()).feature_name()]
+  end
+  ###
+  def compute_features()
+    return [compute_feature()]
+  end
+  def compute_features_on_view(view) # DBView object
+    return [compute_feature_on_view(view)]
+  end
+  ######
+  # Single-feature methods
+  ###
+  def AbstractSingleFeatureExtractor.feature_name()
+    raise "Overwrite me."
+  end
+  ###
+  def compute_feature()
+    raise "Overwrite me"
+  end
+  ###
+  def compute_feature_on_view(view) # DBView object
+    raise "Overwrite me"
+  end
+end
+######################################################
+class ExternalFeatureExtractor < AbstractFeatureExtractor
+  @@warning_uttered = false
+  ####
+  # initialization:
+  #
+  # read experiment file for external interfaces
+  def initialize(exp,    # RosyConfigData object
+                 interpreter_class)
+    @exp_rosy = exp
+    @@interpreter_class = interpreter_class
+    unless @exp_rosy.get("external_descr_file")
+      unless @@warning_uttered
+	$stderr.puts "Warning: Cannot compute external feature"
+	$stderr.puts "since 'external_descr_file' has not been set"
+	$stderr.puts "in the Rosy experiment file."
+	@@warning_uttered = true
+      end
+      @exp_external = nil
+      return
+    end
+    @exp_external = ExternalConfigData.new(@exp_rosy.get("external_descr_file"))
+  end
+end
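
The extractor interface above can be illustrated with a small, self-contained sketch. The base class below is a reduced stand-in written for this example (it is not the gem's `AbstractSingleFeatureExtractor`), the feature `label_length` is hypothetical, and a plain string takes the place of a `SynNode`:

```ruby
# Reduced stand-in for the abstract single-feature extractor: it keeps only
# the class-level node setter and the single-feature wrappers.
class SketchSingleFeatureExtractor
  @@node = nil # current instance, shared across the hierarchy as in Rosy

  def self.set_node(node)
    @@node = node
    true
  end

  # A single-feature extractor exposes exactly one feature name.
  def self.feature_names
    [feature_name]
  end

  # Wraps the single value so callers always receive an array.
  def compute_features
    [compute_feature]
  end
end

# Hypothetical concrete extractor: the length of the node label.
class NodeLabelLengthFeature < SketchSingleFeatureExtractor
  def self.feature_name
    'label_length'
  end

  def self.sql_type
    'INT'
  end

  def self.feature_type
    'syn'
  end

  def self.phase
    'phase 1'
  end

  # Feature values are returned as strings.
  def compute_feature
    @@node.to_s.length.to_s
  end
end

NodeLabelLengthFeature.set_node('NP')
p NodeLabelLengthFeature.feature_names       # ["label_length"]
p NodeLabelLengthFeature.new.compute_features # ["2"]
```

The sketch mirrors the division of labour in the file: class-level methods describe the feature and receive the current instance, while instance methods compute the values.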

data/lib/rosy/ExternalConfigData.rb ADDED

@@ -0,0 +1,58 @@
+# ExternalConfigData
+# Katrin Erk January 2006
+#
+# All scripts that compute additional external knowledge sources
+# for Fred and Rosy:
+# access to configuration and experiment description file
+require 'common/config_data'
+##############################
+# Class ExternalConfigData
+#
+# inherits from ConfigData,
+# sets variable names appropriate to tasks of external knowledge sources
+class ExternalConfigData < ConfigData
+  def initialize(filename)
+    # initialize config data object
+    super(filename,          # config file
+	  { "directory" => "string", # features
+	    "experiment_id" => "string",
+	    "gfmap_restrict_to_downpath" => "bool",
+	    "gfmap_restrict_pathlen" => "integer",
+	    "gfmap_remove_gf" => "list"
+	  },
+	  [] # variables
+	  )
+    # set access functions for list features
+    set_list_feature_access("gfmap_remove_gf",
+			    method("access_as_stringlist"))
+  end
+  ###
+  protected
+  #####
+  # access_as_stringlist
+  #
+  # assumed format:
+  #
+  #   lhs = rhs1 rhs2 ... rhsN
+  #
+  # given in val_list as string tuples [rhs1,...,rhsN]
+  #
+  # join the rhs strings of each tuple by spaces,
+  # return an array of strings of the form "rhs1 rhs2 ... rhsN"
+  #
+  def access_as_stringlist(val_list) # array:array:string
+    return val_list.map { |rhs| rhs.join(" ") }
+  end
+end
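
As a quick illustration of what `access_as_stringlist` does with its input, here is a stand-alone copy of the mapping. The method name `join_rhs_tuples` and the labels in the sample data are assumptions made for this sketch:

```ruby
# Stand-alone copy of the mapping performed by access_as_stringlist:
# each tuple of right-hand-side strings becomes one space-joined string.
def join_rhs_tuples(val_list)
  val_list.map { |rhs| rhs.join(' ') }
end

# Two hypothetical config lines, already split into string tuples.
p join_rhs_tuples([%w[SB OA], %w[MO]]) # ["SB OA", "MO"]
```

Note that the result is an array with one joined string per tuple, not a single string.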

data/lib/rosy/FailedParses.rb ADDED

@@ -0,0 +1,130 @@
+# Failed Parses
+#
+# SP May 05
+#
+# Administration of information about failed parses;
+# - sentence ID
+# - frame
+# - missed FE markables
+#
+# this class is pretty much a glorified hash table with methods to
+# - read FailedParses from a file and to write them to a file
+# - access info in a frame-specific way
+class FailedParses
+  ###
+  # initialize
+  #
+  # nothing much happens here
+  def initialize()
+    @failed_parses = Array.new
+  end
+  ###
+  # register
+  #
+  # register new failed parse by specifying
+  # - its sentence id (any object)
+  # - its frame (String)
+  # - its target and target POS (Strings)
+  # - its FE list (String Array)
+  def register(sent_id, # object
+               frame,   # string: frame name
+               target,  # string?
+               target_pos, # string: target POS
+               fe_list) # array:string
+    if @failed_parses.assoc sent_id
+#      $stderr.puts "Error: trying to register sentence id #{sent_id} twice!"
+#      $stderr.puts "Skipping second occurrence."
+    end
+    @failed_parses << [sent_id,frame,target,target_pos,fe_list]
+  end
+  ###
+  # make_split
+  #
+  # produce a "split" of the failed parses into a train and a test section
+  # parameter: train_percentage, Integer between 0 and 100
+  #
+  # returns an Array with two FailedParses objects, the first for the
+  # train data, the second for the test data
+  def make_split(train_percentage)
+    unless train_percentage.is_a?(Integer) and train_percentage >= 0 and train_percentage <= 100
+      raise "Need Integer between 0 and 100 as training percentage."
+    end
+    train_failed = FailedParses.new()
+    test_failed = FailedParses.new()
+    @failed_parses.each {|sent_id,frame,target,target_pos,fe_list|
+      if rand(100) >= train_percentage
+        test_failed.register(sent_id,frame,target,target_pos,fe_list)
+      else
+        train_failed.register(sent_id,frame,target,target_pos,fe_list)
+      end
+    }
+    return [train_failed, test_failed]
+  end
+  ###
+  # Access information
+  #
+  # failed_sent: number of failed sentences
+  # failed_fes:  Hash that maps FE names [String] onto numbers of failed FEs [Int]
+  #
+  # optional parameters: frame, target, target_pos : if not specified or nil, marginal
+  #                      frequencies are counted (sum over all values)
+  def failed_sent(frame_spec=nil,target_spec=nil,target_pos_spec=nil)
+    counter = 0
+    @failed_parses.each {|sent_id,frame,target,target_pos,fe_list|
+      if ((frame_spec.nil? or frame_spec == frame) and
+	  (target_spec.nil? or target_spec == target) and
+	  (target_pos_spec.nil? or target_pos_spec == target_pos))
+	counter += 1
+      end
+    }
+    return counter
+  end
+  def failed_fes(frame_spec=nil,target_spec=nil,target_pos_spec=nil)
+    fe_hash = Hash.new(0)
+    @failed_parses.each {|sent_id,frame,target,target_pos,fe_list|
+      if ((frame_spec.nil? or frame_spec == frame) and
+	  (target_spec.nil? or target_spec == target) and
+	  (target_pos_spec.nil? or target_pos_spec == target_pos))
+	fe_list.each {|fe_label|
+	  fe_hash[fe_label] += 1
+	}
+      end
+    }
+    return fe_hash
+  end
+  ###
+  # Marshalling:
+  #
+  # save - save info about failed parses to file
+  # load - load info about failed parses from file
+  def save(filename)
+    io_obj = File.new(filename,"w")
+    Marshal.dump(@failed_parses,io_obj)
+    io_obj.close
+  end
+  def load(filename)
+    begin
+      io_obj = File.new(filename)
+      @failed_parses = Marshal.load(io_obj)
+      io_obj.close
+    rescue
+      $stderr.puts "WARNING: couldn't read failed parses file #{filename}."
+      $stderr.puts "I'll assume that there are no failed parses."
+    end
+  end
+end
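
The record layout and the two services of `FailedParses` (filtered FE counts and Marshal-based persistence) can be sketched without the class itself. The helper names and all sample values below are made up for illustration; only the record shape `[sent_id, frame, target, target_pos, fe_list]` is taken from the file:

```ruby
require 'tmpdir'

# Count failed FEs, optionally restricted to one frame; a nil frame_spec
# means "sum over all frames", as in FailedParses#failed_fes.
def count_failed_fes(records, frame_spec = nil)
  counts = Hash.new(0)
  records.each do |_sent_id, frame, _target, _target_pos, fe_list|
    next unless frame_spec.nil? || frame_spec == frame
    fe_list.each { |fe_label| counts[fe_label] += 1 }
  end
  counts
end

# Write the records with Marshal and read them back, the same mechanism
# that save and load use.
def marshal_round_trip(records)
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'failed_parses.dump')
    File.open(path, 'wb') { |io| Marshal.dump(records, io) }
    File.open(path, 'rb') { |io| Marshal.load(io) }
  end
end

# Hypothetical sample data.
sample = [
  ['s1', 'Commerce_buy', 'buy',      'verb', %w[Buyer Goods]],
  ['s2', 'Commerce_buy', 'purchase', 'noun', %w[Goods]],
  ['s3', 'Motion',       'go',       'verb', %w[Theme]]
]

p count_failed_fes(sample, 'Commerce_buy') # {"Buyer"=>1, "Goods"=>2}
p marshal_round_trip(sample) == sample     # true
```

Since `Marshal.load` can instantiate arbitrary objects, a dump file of this kind should only be read when it comes from a trusted source.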