shalmaneser 1.2.0.rc1 → 1.2.0.rc2

Files changed (30)
  1. checksums.yaml +4 -4
  2. data/README.md +26 -8
  3. data/doc/SB_README +57 -0
  4. data/doc/exp_files_description.txt +160 -0
  5. data/doc/fred.pdf +0 -0
  6. data/doc/index.md +120 -0
  7. data/doc/salsa_tool.pdf +0 -0
  8. data/doc/salsatigerxml.pdf +0 -0
  9. data/doc/shal_doc.pdf +0 -0
  10. data/doc/shal_lrec.pdf +0 -0
  11. data/lib/ext/maxent/Classify.class +0 -0
  12. data/lib/ext/maxent/Train.class +0 -0
  13. data/lib/frprep/TreetaggerInterface.rb +4 -4
  14. data/lib/shalmaneser/version.rb +1 -1
  15. metadata +41 -48
  16. data/test/frprep/test_opt_parser.rb +0 -94
  17. data/test/functional/functional_test_helper.rb +0 -40
  18. data/test/functional/sample_experiment_files/fred_test.salsa.erb +0 -122
  19. data/test/functional/sample_experiment_files/fred_train.salsa.erb +0 -135
  20. data/test/functional/sample_experiment_files/prp_test.salsa.erb +0 -138
  21. data/test/functional/sample_experiment_files/prp_test.salsa.fred.standalone.erb +0 -120
  22. data/test/functional/sample_experiment_files/prp_test.salsa.rosy.standalone.erb +0 -120
  23. data/test/functional/sample_experiment_files/prp_train.salsa.erb +0 -138
  24. data/test/functional/sample_experiment_files/prp_train.salsa.fred.standalone.erb +0 -138
  25. data/test/functional/sample_experiment_files/prp_train.salsa.rosy.standalone.erb +0 -138
  26. data/test/functional/sample_experiment_files/rosy_test.salsa.erb +0 -257
  27. data/test/functional/sample_experiment_files/rosy_train.salsa.erb +0 -259
  28. data/test/functional/test_fred.rb +0 -47
  29. data/test/functional/test_frprep.rb +0 -52
  30. data/test/functional/test_rosy.rb +0 -40
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 83f5f0ca7cc27a632cb46deef7c093df649c61e1
- data.tar.gz: dbc9a29186421206de7bf9b0138f05f89228fad6
+ metadata.gz: 0ca81bfeedc7c124833b61ae3facccf6eb86cecc
+ data.tar.gz: 38ada6d619d55ede1e291a7c1534b1633b08e363
  SHA512:
- metadata.gz: 8a87f1e74b16082cba8d2ab49eb33289e8db23f5bdf3cdd4f294901c8119c8bff1239ec870032871d6d2cf69efbaba500058a47827df92be707aba3ab36ab30a
- data.tar.gz: be1f6b6f3e4aa0b20f26437f30c579faf68f03f7c474cb78e28cb1263ef4ab9397ab4d52fbdffa4ac7ceb50a2d3f44cb4200303a7f14b2bdd0cb06fbfae68f0f
+ metadata.gz: 863962998b66640c61e54a29ceb1a21821ff6f45d20329a22c4fce5284f2877e2e4b4f7ee88fd1cdf37b6e90d68d8b1d1acaac41c1225fe858ce4262f9b638da
+ data.tar.gz: 78334e6fa48815e1fa1dbc502759cb4632655a8bde2b0810ac2afe53c288bc983d5368e50812d576a4604c31acfff8921dc4f1b9707d569a60e0be4facdfd03e
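
To verify a downloaded copy of the gem against these digests, a minimal sketch (a ``.gem`` file is a plain tar archive containing ``metadata.gz`` and ``data.tar.gz``; older RubyGems versions may additionally need ``--pre`` to fetch a prerelease):

    # Fetch the gem and unpack the two inner archives named in checksums.yaml.
    gem fetch shalmaneser -v 1.2.0.rc2
    tar -xf shalmaneser-1.2.0.rc2.gem metadata.gz data.tar.gz
    # Compare the output against the SHA512 entries shown above.
    sha512sum metadata.gz data.tar.gz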
data/README.md CHANGED
@@ -1,17 +1,18 @@
  # [SHALMANESER - a SHALlow seMANtic parSER](http://www.coli.uni-saarland.de/projects/salsa/shal/)
 
 
- [RubyGems](http://rubygems.org/gems/shalmaneser) | [RTT Project Page](http://bu.chsta.be/projects/shalmaneser/) |
+ [RubyGems](http://rubygems.org/gems/shalmaneser) | [Shalmaneser Project Page](http://bu.chsta.be/projects/shalmaneser/) |
  [Source Code](https://github.com/arbox/shalmaneser) | [Bug Tracker](https://github.com/arbox/shalmaneser/issues)
 
  [<img src="https://badge.fury.io/rb/shalmaneser.png" alt="Gem Version" />](http://badge.fury.io/rb/shalmaneser)
  [<img src="https://travis-ci.org/arbox/shalmaneser.png" alt="Build Status" />](https://travis-ci.org/arbox/shalmaneser)
  [<img src="https://codeclimate.com/github/arbox/shalmaneser.png" alt="Code Climate" />](https://codeclimate.com/github/arbox/shalmaneser)
  [<img alt="Bitdeli Badge" src="https://d2weczhvl823v0.cloudfront.net/arbox/shalmaneser/trend.png" />](https://bitdeli.com/free)
+ [![Dependency Status](https://gemnasium.com/arbox/shalmaneser.png)](https://gemnasium.com/arbox/shalmaneser)
 
  ## Description
 
- Please be careful, the whole thing is under construction!
+ Please be careful, the whole thing is under construction! Shalmaneser is not intended to run on Windows systems. For now it has been tested only on Linux.
 
  Shalmaneser is a supervised learning toolbox for shallow semantic parsing, i.e. the automatic assignment of semantic classes and roles to text. The system was developed for Frame Semantics; thus we use Frame Semantics terminology and call the classes frames and the roles frame elements. However, the architecture is reasonably general, and with a certain amount of adaption, Shalmaneser should be usable for other paradigms (e.g., PropBank roles) as well. Shalmaneser caters both for end users, and for researchers.
 
@@ -20,13 +21,13 @@ For end users, we provide a simple end user mode which can simply apply the pre-
  ## Origin
  You can find original versions of Shalmaneser up to ``1.1`` on the [SALSA](http://www.coli.uni-saarland.de/projects/salsa/shal/) project page.
 
- ## Literature
+ ## Publications on Shalmaneser
 
- K. Erk and S. Padó: Shalmaneser - a flexible toolbox for semantic role assignment. Proceedings of LREC 2006, Genoa, Italy. [Click here for details](http://www.nlpado.de/~sebastian/pub/papers/lrec06_erk.pdf).
+ - K. Erk and S. Padó: Shalmaneser - a flexible toolbox for semantic role assignment. Proceedings of LREC 2006, Genoa, Italy. [Click here for details](http://www.nlpado.de/~sebastian/pub/papers/lrec06_erk.pdf).
 
  ## Documentation
 
- The project documentation can be found in our [doc](doc/index.md) folder.
+ The project documentation can be found in our [doc](https://github.com/arbox/shalmaneser/blob/1.2/doc/index.md) folder.
 
  ## Development
 
@@ -40,10 +41,27 @@ We are working now on two branches:
 
  ## Installation
 
- See the installation instructions in the [doc](doc/index.md#installation) folder.
+ See the installation instructions in the [doc](https://github.com/arbox/shalmaneser/blob/1.2/doc/index.md#installation) folder.
 
- ### Machine Learning Systems
+ ### Tokenizers
+
+ - [Ucto](http://ilk.uvt.nl/ucto/)
+
+ ### POS Taggers
+
+ - [TreeTagger](http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/)
 
- - http://sourceforge.net/projects/maxent/files/Maxent/2.4.0/
+ ### Lemmatizers
 
+ - [TreeTagger](http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/)
+
+ ### Parsers
+
+ - [BerkeleyParser](https://code.google.com/p/berkeleyparser/downloads/list)
+ - [Stanford Parser](http://nlp.stanford.edu/software/lex-parser.shtml)
+ - [Collins Parser](http://www.cs.columbia.edu/~mcollins/code.html)
+
+ ### Machine Learning Systems
 
+ - [OpenNLP MaxEnt](http://sourceforge.net/projects/maxent/files/Maxent/2.4.0/)
+ - [Mallet](http://mallet.cs.umass.edu/index.php)
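
For reference, the prerelease described by this changeset installs directly from RubyGems; a minimal sketch:

    # Install the latest prerelease ...
    gem install shalmaneser --pre
    # ... or pin the exact release candidate.
    gem install shalmaneser -v 1.2.0.rc2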
data/doc/SB_README ADDED
@@ -0,0 +1,57 @@
+ # Before running the programs you should make sure that all components
+ # needed by shalmaneser are installed and that all paths in the
+ # configuration files/code are adapted accordingly
+ # (maybe iterate over all files and grep for "rehbein" to find hard-
+ # coded paths; have a look at all configuration files in SampleExperimentFiles.salsa)
+
+
+ # Directories
+
+ # program_de    -> ruby source code and additional stuff for the German
+ #                  version of shalmaneser
+ # program_de/SampleExperimentFiles.salsa
+ #               -> configuration files for shalmaneser
+ # input         -> includes test data in plain text format
+ # output        -> all temporary files and output files, including the
+ #                  classifiers
+ #
+ # directory output:
+ # prp_test       -> output of frprep.rb (parsed/tagged/lemmatised data)
+ # preprocessed   -> output of frprep.rb (data converted to SalsaTiGerXML)
+ # exp_fred_salsa -> temp files/output of fred.rb (classifiers, features, ...)
+ # exp_fred/output/stxml/ -> output of fred.rb (SalsaTigerXML file with
+ #                           frames)
+ # exp_rosy_salsa -> temp files/output of rosy.rb (classifiers, features, ...)
+ # exp_rosy_salsa/output -> output of rosy.rb
+
+ # Set some variables
+ # => adapt to your program paths
+ DIR=/proj/llx/Annotation/experiments/test/shalmaneser
+ EXP=$DIR/program_de/SampleExperimentFiles.salsa
+
+ export CLASSPATH=/proj/llx/Software/MachineLearning/maxent-2.4.0/lib/trove.jar:/proj/llx/Software/MachineLearning/maxent-2.4.0/output/maxent-2.4.0.jar:/proj/llx/Annotation/experiments/sfischer_bachelor/shalmaneser/program/tools/maxent
+
+
+ # change to shalmaneser directory
+ cd $DIR/program_de
+
+ # Preprocessing
+ # (result: parsed file in SalsaTiGerXML format
+ #  when running on SalsaTiGerXML data: gold frames/roles included
+ #  when running on plain text: without frames/roles)
+
+ ruby frprep.rb -e $EXP/prp_test.salsa
+
+
+ # Frame assignment with fred
+ ruby fred.rb -t featurize -e $EXP/fred_test.salsa -d test
+
+ ruby fred.rb -t test -e $EXP/fred_test.salsa
+
+
+ # Role assignment with rosy
+ ruby rosy.rb -t featurize -e $EXP/rosy.salsa -d test
+
+ ruby rosy.rb -t test -e $EXP/rosy.salsa
+
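
The script above hard-codes site-specific paths under ``/proj/llx``. A minimal sketch of how the variables might be adapted to a local installation; every path below is a placeholder, not part of the release:

    # Placeholder paths -- point DIR at your own checkout and CLASSPATH at your MaxEnt install.
    DIR=$HOME/shalmaneser-experiments
    EXP=$DIR/program_de/SampleExperimentFiles.salsa
    export CLASSPATH=/opt/maxent-2.4.0/lib/trove.jar:/opt/maxent-2.4.0/output/maxent-2.4.0.jar
    cd "$DIR/program_de"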
data/doc/exp_files_description.txt ADDED
@@ -0,0 +1,160 @@
+ = FrPrep
+ prep_experiment_ID => "string", # experiment identifier
+ frprep_directory => "string",   # dir for frprep internal data
+ # information about the dataset
+ language => "string", # en, de
+ origin => "string",   # FrameNet, Salsa, or nothing
+ format => "string",   # Plain, SalsaTab, FNXml, FNCorpusXml, SalsaTigerXML
+ encoding => "string", # utf8, iso, hex, or nothing
+ # directories
+ directory_input => "string",        # dir with input data
+ directory_preprocessed => "string", # dir with output Salsa/Tiger XML data
+ directory_parserout => "string",    # dir with parser output for the parser named below
+
+ # syntactic processing
+ pos_tagger => "string",      # name of POS tagger
+ lemmatizer => "string",      # name of lemmatizer
+ parser => "string",          # name of parser
+ pos_tagger_path => "string", # path to POS tagger
+ lemmatizer_path => "string", # path to lemmatizer
+ parser_path => "string",     # path to parser
+ parser_max_sent_num => "integer", # max number of sentences per parser input file
+ parser_max_sent_len => "integer", # max sentence length the parser handles
+
+ do_parse => "bool",     # use parser?
+ do_lemmatize => "bool", # use lemmatizer?
+ do_postag => "bool",    # use POS tagger?
+
+ # output format: if tabformat_output == true,
+ # output in Tab format rather than Salsa/Tiger XML
+ # (this will not work if do_parse == true)
+ tabformat_output => "bool",
+
+ # syntactic repairs, dependent on existing semantic role annotation
+ fe_syn_repair => "bool", # map words to constituents for FEs: idealize?
+ fe_rel_repair => "bool", # FEs: include non-included relative clauses into FEs
+
+ = Fred
+ experiment_ID => "string", # experiment ID
+ enduser_mode => "bool",    # work in enduser mode? (disallowing many things)
+
+ preproc_descr_file_train => "string", # path to preprocessing files
+ preproc_descr_file_test => "string",
+ directory_output => "string", # path to Salsa/Tiger XML output directory
+
+ verbose => "bool",                    # print diagnostic messages?
+ apply_to_all_known_targets => "bool", # apply to all known targets rather than the ones with a frame?
+
+ fred_directory => "string", # directory for internal info
+ classifier_dir => "string", # write classifiers here
+
+ classifier => "list", # classifiers
+
+ dbtype => "string", # "mysql" or "sqlite"
+
+ host => "string",   # DB access: sqlite only
+ user => "string",
+ passwd => "string",
+ dbname => "string",
+
+ # featurization info
+ feature => "list",              # which features to use for the classifier?
+ binary_classifiers => "bool",   # make binary rather than n-ary classifiers?
+ negsense => "string",           # binary classifier: negative sense is..?
+ numerical_features => "string", # do what with numerical features?
+
+ # what to do with items that have multiple senses?
+ # 'binarize': binary classifiers, and consider positive
+ #             if the sense is among the gold senses
+ # 'join'    : make one joint sense
+ # 'repeat'  : make multiple occurrences of the item, one sense per occ
+ # 'keep'    : keep as separate labels
+ #
+ # multilabel: consider as assigned all labels
+ # above a certain confidence threshold?
+ handle_multilabel => "string",
+ assignment_confidence_threshold => "float",
+
+ # single-sentence context?
+ single_sent_context => "bool",
+
+ # noncontiguous input? then we need access to a larger corpus
+ noncontiguous_input => "bool",
+ larger_corpus_dir => "string",
+ larger_corpus_format => "string",
+ larger_corpus_encoding => "string"
+
+ [ # variables
+   "train",
+   "exp_ID"
+ ]
+
+ = Rosy
+ # features
+ feature => "list",
+ classifier => "list",
+
+ verbose => "bool",
+ enduser_mode => "bool",
+
+ experiment_ID => "string",
+
+ directory_input_train => "string",
+ directory_input_test => "string",
+ directory_output => "string",
+
+ preproc_descr_file_train => "string",
+ preproc_descr_file_test => "string",
+ external_descr_file => "string",
+
+ dbtype => "string", # "mysql" or "sqlite"
+
+ host => "string",   # DB access: sqlite only
+ user => "string",
+ passwd => "string",
+ dbname => "string",
+
+ data_dir => "string",  # for external use
+ rosy_dir => "pattern", # for internal use only, set by rosy.rb
+
+ classifier_dir => "string", # if present, special directory for classifiers
+
+ classif_column_name => "string",
+ main_table_name => "pattern",
+ test_table_name => "pattern",
+
+ eval_file => "pattern",
+ log_file => "pattern",
+ failed_file => "pattern",
+ classifier_file => "pattern",
+ classifier_output_file => "pattern",
+ noval => "string",
+
+
+ split_nones => "bool",
+ print_eval_log => "bool",
+ assume_argrec_perfect => "bool",
+ xwise_argrec => "string",
+ xwise_arglab => "string",
+ xwise_onestep => "string",
+
+ fe_syn_repair => "bool", # map words to constituents for FEs: idealize?
+ fe_rel_repair => "bool", # FEs: include non-included relative clauses into FEs
+
+ prune => "string", # pruning prior to argrec?
+
+ ["exp_ID", "test_ID", "split_ID", "feature_name", "classif", "step",
+  "group", "dataset", "mode"] # variables
+
+ = External Config Data
+
+ directory => "string", # features
+
+ experiment_id => "string",
+
+ gfmap_restrict_to_downpath => "bool",
+ gfmap_restrict_pathlen => "integer",
+ gfmap_remove_gf => "list"
+
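
These options end up in the plain-text experiment files that frprep, fred and rosy receive via ``-e``. A purely illustrative frprep sketch follows: the keys are taken from the list above, but the ``key = value`` syntax should be checked against the sample experiment files shipped with the release, and every value is a placeholder:

    # Hypothetical preprocessing experiment file; all IDs and paths are placeholders.
    prep_experiment_ID = prp_test
    frprep_directory = /home/me/shalm/frprep_data
    language = en
    format = Plain
    directory_input = /home/me/shalm/input
    directory_preprocessed = /home/me/shalm/preprocessed
    do_postag = true
    do_lemmatize = true
    do_parse = false
    pos_tagger = treetagger
    pos_tagger_path = /opt/treetagger
    lemmatizer = treetagger
    lemmatizer_path = /opt/treetagger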
data/doc/fred.pdf ADDED
Binary file
data/doc/index.md ADDED
@@ -0,0 +1,120 @@
+ # Shalmaneser Documentation Index
+
+ ## Prerequisites
+
+ You need the following items installed on your system:
+ - [Ruby](https://www.ruby-lang.org/en/downloads/), at least version ``1.8.7`` (please note that version ``1.8.7`` is deprecated; future Shalmaneser incarnations will run only under Ruby ``1.9.x`` or later)
+ - a MySQL database server; your database must be large enough to hold the test data (in end user mode) plus any training data (for training new models in manual mode), e.g. training on the complete FrameNet 1.2 dataset requires about 1.5 GB of free space
+ - if you don't want to train classifiers from your own data, you need to download suitable classifiers for the available configurations from our homepage (see the links below)
+ - preprocessing tools for your language, at least the ones required for the use of pre-trained classifiers. Currently Shalmaneser provides interfaces for the following systems:
+ <table>
+ <tr>
+ <th>System</th><th>Version</th>
+ </tr>
+ <tr>
+ <td>TreeTagger</td><td>README from 09.04.96</td>
+ </tr>
+ <tr>
+ <td>Collins Parser</td><td>1.0</td>
+ </tr>
+ <tr>
+ <td>Berkeley Parser</td><td>latest</td>
+ </tr>
+ <tr>
+ <td>Stanford Parser</td><td>latest</td>
+ </tr>
+ </table>
+
+ - at least one machine learning system. Currently Shalmaneser provides interfaces for the following systems:
+ <table>
+ <tr>
+ <th>System</th><th>Version</th>
+ </tr>
+ <tr>
+ <td>OpenNLP MaxEnt</td><td>2.4.0</td>
+ </tr>
+ <tr>
+ <td>TiMBL</td><td>Timbl5</td>
+ </tr>
+ <tr>
+ <td>Mallet</td><td>Mallet 0.4</td>
+ </tr>
+ </table>
+
+ Note: Please make sure you run the system in a terminal with Unicode encoding (``export LANG=en_US.UTF-8``).
+
+ ## Setting up Shalmaneser on your system
+
+ ### MySQL Database
+
+ You need an instance of MySQL Server running on your system. Possibly such a server is already available at your site, either locally or on a remote machine. If not, please install one (e.g. on Debian based systems):
+
+     $ sudo aptitude install mysql-server mysql-client
+
+ During the installation you'll be prompted for the root password.
+
+ Log in to the MySQL management console:
+
+     $ mysql -u root -p
+
+ You will be asked for the ``root`` password. The following commands assume a local installation of MySQL.
+
+ Create a new user for Shalmaneser (or use an existing one if it complies with your security policy):
+
+     mysql> CREATE USER 'shalm'@'localhost' IDENTIFIED BY 'shalmpassword';
+
+ Feel free to change the username and the password.
+
+ Create at least one database for Shalmaneser (it is convenient to use several databases to reuse experiment results):
+
+     mysql> CREATE DATABASE shalmaneser;
+
+ Give your new user rights on the new database and (for older MySQL versions) flush the privileges:
+
+     mysql> GRANT ALL PRIVILEGES ON shalmaneser.* TO 'shalm'@'localhost';
+     mysql> FLUSH PRIVILEGES; # Not needed on newer systems.
+
+ The ``username``, the ``password`` and the ``database name`` are essential for the experiment file declarations.
+
+ ### TreeTagger
+ Download the TreeTagger archive from the official [site](http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/) by Helmut Schmid and uncompress it to your favorite location, preserving the initial directory structure. The path to the root directory is essential for the experiment file declarations. Shalmaneser expects the following directory structure:
+
+     TreeTaggerRootDirectory/
+     |_ bin/
+     |  |_ tree-tagger
+     |_ lib/
+     |  |_ english.par
+     |  |_ german.par
+     |_ cmd/
+        |_ filter-german-tags
+
+ If you cannot name the binary or the model (the ``.par`` file) as given above, please set the following environment variables: ``SHALM_TREETAGGER_BIN`` and ``SHALM_TREETAGGER_MODEL``.
+
+ Please do not use Unicode models for TreeTagger for now! We'll change this dependency in the future.
+
+ ### Berkeley Parser
+ Download the Berkeley Parser archive from the official [site](https://code.google.com/p/berkeleyparser/downloads/list) at Google Code and uncompress it to your favorite location. The path to the root directory is essential for the experiment file declarations. Shalmaneser expects the following directory structure:
+
+     BerkeleyRootDirectory/
+     |_ berkeleyParser.jar
+     |_ grammar.gr
+
+ If you cannot name the binary and/or the model as given above, please set the following environment variables: ``SHALM_BERKELEY_BIN`` and ``SHALM_BERKELEY_MODEL``.
+
+
+ ### Stanford Parser
+
+ Download the Stanford Parser archive from the official [site](http://nlp.stanford.edu/software/lex-parser.shtml) and uncompress it to your favorite location. The path to the root directory is essential for the experiment file declarations. Shalmaneser expects the following directory structure:
+
+     StanfordRootDirectory/
+     |_ stanford_parser.jar
+     |_ stanford_parser-x.y.z-models.jar
+
+ ### OpenNLP MaxEnt
+ Download the MaxEnt archive from the official [site](http://sourceforge.net/projects/maxent/files/Maxent/2.4.0/) on SourceForge and uncompress it to your favorite location. Set ``JAVA_HOME`` if it isn't set on your system. Run ``build.sh`` in the MaxEnt root directory.
+
+ The path to the root directory is essential for the experiment file declarations. Shalmaneser expects the following directory structure:
+
+     MaxEntRootDirectory/
+     |_ output/
+     |_ classes/
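
Several sections above mention environment variables and ``JAVA_HOME``. A minimal shell sketch collecting them in one place; every path is a placeholder for your own installation locations:

    # Placeholder locations -- adjust to wherever you unpacked each tool.
    export SHALM_TREETAGGER_BIN=/opt/treetagger/bin/tree-tagger
    export SHALM_TREETAGGER_MODEL=/opt/treetagger/lib/english.par
    export SHALM_BERKELEY_BIN=/opt/berkeleyparser/berkeleyParser.jar
    export SHALM_BERKELEY_MODEL=/opt/berkeleyparser/grammar.gr
    export JAVA_HOME=/usr/lib/jvm/default-java
    export LANG=en_US.UTF-8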
data/doc/salsa_tool.pdf ADDED
Binary file
data/doc/salsatigerxml.pdf ADDED
Binary file
data/doc/shal_doc.pdf ADDED
Binary file
data/doc/shal_lrec.pdf ADDED
Binary file
data/lib/ext/maxent/Classify.class CHANGED
Binary file
data/lib/ext/maxent/Train.class CHANGED
Binary file
data/lib/frprep/TreetaggerInterface.rb CHANGED
@@ -117,13 +117,13 @@ class TreetaggerInterface < SynInterfaceTab
    include TreetaggerModule
 
    ###
-   def TreetaggerInterface.system()
-     return "treetagger"
+   def self.system
+     'treetagger'
    end
 
    ###
-   def TreetaggerInterface.service()
-     return "lemmatizer"
+   def self.service
+     'lemmatizer'
    end
 
    ###