shalmaneser 1.2.0.rc2 → 1.2.0.rc3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 0ca81bfeedc7c124833b61ae3facccf6eb86cecc
- data.tar.gz: 38ada6d619d55ede1e291a7c1534b1633b08e363
+ metadata.gz: f64175ecd62ad8540989348c15317500a81a001f
+ data.tar.gz: fe381a419d70708f84ee2060bc91fea35e31cf26
  SHA512:
- metadata.gz: 863962998b66640c61e54a29ceb1a21821ff6f45d20329a22c4fce5284f2877e2e4b4f7ee88fd1cdf37b6e90d68d8b1d1acaac41c1225fe858ce4262f9b638da
- data.tar.gz: 78334e6fa48815e1fa1dbc502759cb4632655a8bde2b0810ac2afe53c288bc983d5368e50812d576a4604c31acfff8921dc4f1b9707d569a60e0be4facdfd03e
+ metadata.gz: f888c3690741dda8f2ca980f1ba51020a696ea92fc6c2cb3e596da8349c11e946acf08ae4ead19cf2664f20b827cda49882783ccd2aabd29cb902a819b2e9c65
+ data.tar.gz: 3268708b720df30dac6928bb5df7732391a1f73a9da65bd3cc2912af27447831ca0c0d281a0d032e279bf9356a07bcd3ecb099e542666887d1b3447462ab34e2
Binary file
Binary file
data/.yardopts CHANGED
@@ -1,8 +1,10 @@
  --private
  --protected
  --title 'SHALMANESER'
- lib/**/*
+ lib/**/*.rb
  bin/**/*
+ doc/**/*.md
  -
  CHANGELOG.md
  LICENSE.md
+ doc/index.md
data/README.md CHANGED
@@ -1,22 +1,28 @@
  # [SHALMANESER - a SHALlow seMANtic parSER](http://www.coli.uni-saarland.de/projects/salsa/shal/)
 
 
- [RubyGems](http://rubygems.org/gems/shalmaneser) | [Shalmanesers Project Page](http://bu.chsta.be/projects/shalmaneser/) |
- [Source Code](https://github.com/arbox/shalmaneser) | [Bug Tracker](https://github.com/arbox/shalmaneser/issues)
+ [RubyGems](http://rubygems.org/gems/shalmaneser) | [Shalmaneser's Project Page](http://bu.chsta.be/projects/shalmaneser/) | [Source Code](https://github.com/arbox/shalmaneser) | [Bug Tracker](https://github.com/arbox/shalmaneser/issues)
 
  [<img src="https://badge.fury.io/rb/shalmaneser.png" alt="Gem Version" />](http://badge.fury.io/rb/shalmaneser)
- [<img src="https://travis-ci.org/arbox/shalmaneser.png" alt="Build Status" />](https://travis-ci.org/arbox/shalmaneser)
+ [![Build Status](https://travis-ci.org/arbox/shalmaneser.png?branch=1.2)](https://travis-ci.org/arbox/shalmaneser)
  [<img src="https://codeclimate.com/github/arbox/shalmaneser.png" alt="Code Climate" />](https://codeclimate.com/github/arbox/shalmaneser)
  [<img alt="Bitdeli Badge" src="https://d2weczhvl823v0.cloudfront.net/arbox/shalmaneser/trend.png" />](https://bitdeli.com/free)
  [![Dependency Status](https://gemnasium.com/arbox/shalmaneser.png)](https://gemnasium.com/arbox/shalmaneser)
 
  ## Description
 
- Please be careful, the whole thing is under construction! Shalmaneser it not intended to run on Windows systems. For now it has been tested on only Linux.
+ Please be careful, the whole thing is under construction! For now Shalmaneser is not intended to run on Windows systems since it heavily uses system calls for external invocations.
+ Current versions of Shalmaneser have been tested on Linux only (other *NIX testers are welcome!).
 
- Shalmaneser is a supervised learning toolbox for shallow semantic parsing, i.e. the automatic assignment of semantic classes and roles to text. The system was developed for Frame Semantics; thus we use Frame Semantics terminology and call the classes frames and the roles frame elements. However, the architecture is reasonably general, and with a certain amount of adaption, Shalmaneser should be usable for other paradigms (e.g., PropBank roles) as well. Shalmaneser caters both for end users, and for researchers.
+ Shalmaneser is a supervised learning toolbox for shallow semantic parsing, i.e. the automatic assignment of semantic classes and roles to text. This technique is often called SRL (Semantic Role Labelling). The system was developed for Frame Semantics; thus we use Frame Semantics terminology and call the classes frames and the roles frame elements. However, the architecture is reasonably general, and with a certain amount of adaptation, Shalmaneser should be usable for other paradigms (e.g., PropBank roles) as well. Shalmaneser caters both for end users and for researchers.
 
- For end users, we provide a simple end user mode which can simply apply the pre-trained classifiers for English (FrameNet annotation / Collins parser) and German (SALSA Frame annotation / Sleepy parser). For researchers interested in investigating shallow semantic parsing, our system is extensively configurable and extendable.
+ For end users, we provide a simple end user mode which can simply apply the pre-trained classifiers
+ for [English](http://www.coli.uni-saarland.de/projects/salsa/shal/index.php?nav=download) (FrameNet 1.3 annotation / Collins parser)
+ and [German](http://www.coli.uni-saarland.de/projects/salsa/shal/index.php?nav=download) (SALSA 1.0 annotation / Sleepy parser).
+
+ We'll try to provide newer pretrained models for English, German, and possibly other languages as soon as possible.
+
+ For researchers interested in investigating shallow semantic parsing, our system is extensively configurable and extendable.
 
  ## Origin
  You can find original versions of Shalmaneser up to ``1.1`` on the [SALSA](http://www.coli.uni-saarland.de/projects/salsa/shal/) project page.
@@ -24,6 +30,7 @@ You can find original versions of Shalmaneser up to ``1.1`` on the [SALSA](http:
  ## Publications on Shalmaneser
 
  - K. Erk and S. Padó: Shalmaneser - a flexible toolbox for semantic role assignment. Proceedings of LREC 2006, Genoa, Italy. [Click here for details](http://www.nlpado.de/~sebastian/pub/papers/lrec06_erk.pdf).
+ - TODO: add other works
 
  ## Documentation
 
@@ -0,0 +1,13 @@
+ #!/usr/bin/env ruby
+
+ require 'shalmaneser/opt_parser'
+
+ # Change for options to default <-h> for now.
+ #cmd_args = ['-h']
+ cmd_args = ARGV
+
+ begin
+   options = Shalmaneser::OptParser.parse(cmd_args)
+ rescue
+   raise
+ end
@@ -0,0 +1,191 @@
+ # Experiment file description
+ The whole work with Shalmaneser and its submodules is governed by experiment files.
+
+ In an experiment file all feature specifications have the form:
+
+ feature_name = feature_value
+
+ The ``feature_name`` is a string without spaces; the ``feature_value`` may include spaces, depending on the feature type (see below).
+
+ To include a comment in a config file, start the comment line with ``#``.
+
+ Features are typed. The following ``normal`` types are supported:
+
+ - ``bool``,
+ - ``float``,
+ - ``integer``,
+ - ``string``
+
+ For the ``#get`` method, with which features in the ``ConfigData`` object are accessed, the values are transformed from the strings in the experiment file to the appropriate Ruby class.
+
+ Other types:
+
+ - ``pattern``,
+ - ``list``.
+
+ Features of the ``pattern`` type may include variables in ``<>`` brackets. When such a feature is accessed, values for these variables are given, i.e. the pattern has to be instantiated.
+
+ For example, given a feature
+
+ fileformat = features.<type>.train
+
+ and the method call
+
+ instantiate("fileformat", "type" => "path")
+
+ what is returned is the String ``features.path.train``.
+
+ The ``list`` type is the only feature type where more than one feature specification with the same ``feature_name`` is allowed. The right-hand sides of a list feature are stored in an array.
+
+ Given a ``list`` feature ``bla``, if the experiment file contains:
+
+ bla = blupp 1 2
+ bla = la di da
+
+ the list feature ``bla`` is represented as follows:
+
+ @features['bla'] = [['blupp', 1, 2], ['la', 'di', 'da']]
+
+ For comfortable access to a list feature, arbitrary access functions for list features can be defined.
+
+ ## Fred and Rosy Preprocessor (aka frprep|prep)
+
+ "prep_experiment_ID" => "string", # experiment identifier
+ "frprep_directory" => "string", # dir for frprep internal data
+
+ # information about the dataset
+ "language" => "string", # en, de
+ "origin" => "string", # FrameNet, Salsa, or nothing
+ "format" => "string", # Plain, SalsaTab, FNXml, FNCorpusXml, SalsaTigerXML
+ "encoding" => "string", # utf8, iso, hex, or nothing
+
+ # directories
+ "directory_input" => "string", # dir with input data
+ "directory_preprocessed" => "string", # dir with output Salsa/Tiger XML data
+ "directory_parserout" => "string", # dir with parser output for the parser named below
+
+ # syntactic processing
+ "pos_tagger" => "string", # name of POS tagger
+ "lemmatizer" => "string", # name of lemmatizer
+ "parser" => "string", # name of parser
+ "pos_tagger_path" => "string", # path to POS tagger
+ "lemmatizer_path" => "string", # path to lemmatizer
+ "parser_path" => "string", # path to parser
+ "parser_max_sent_num" => "integer", # max number of sentences per parser input file
+ "parser_max_sent_len" => "integer", # max sentence length the parser handles
+
+ "do_parse" => "bool", # use parser?
+ "do_lemmatize" => "bool", # use lemmatizer?
+ "do_postag" => "bool", # use POS tagger?
+
+ # output format: if tabformat_output == true,
+ # output in Tab format rather than Salsa/Tiger XML
+ # (this will not work if do_parse == true)
+ "tabformat_output" => "bool",
+
+ # syntactic repairs, dependent on existing semantic role annotation
+ "fe_syn_repair" => "bool", # map words to constituents for FEs: idealize?
+ "fe_rel_repair" => "bool", # FEs: include non-included relative clauses into FEs
+
+ ## Frame Disambiguation System (aka Fred)
+ "experiment_ID" => "string", # experiment ID
+ "enduser_mode" => "bool", # work in enduser mode? (disallowing many things)
+
+ "preproc_descr_file_train" => "string", # path to preprocessing files
+ "preproc_descr_file_test" => "string",
+ "directory_output" => "string", # path to Salsa/Tiger XML output directory
+
+ "verbose" => "bool", # print diagnostic messages?
+ "apply_to_all_known_targets" => "bool", # apply to all known targets rather than the ones with a frame?
+
+ "fred_directory" => "string", # directory for internal info
+ "classifier_dir" => "string", # write classifiers here
+
+ "classifier" => "list", # classifiers
+
+ "dbtype" => "string", # "mysql" or "sqlite"
+
+ "host" => "string", # DB access: sqlite only
+ "user" => "string",
+ "passwd" => "string",
+ "dbname" => "string",
+
+ # featurization info
+ "feature" => "list", # which features to use for the classifier?
+ "binary_classifiers" => "bool", # make binary rather than n-ary classifiers?
+ "negsense" => "string", # binary classifier: negative sense is..?
+ "numerical_features" => "string", # do what with numerical features?
+
+ # what to do with items that have multiple senses?
+ # 'binarize': binary classifiers, and consider positive
+ # if the sense is among the gold senses
+ # 'join' : make one joint sense
+ # 'repeat' : make multiple occurrences of the item, one sense per occ
+ # 'keep' : keep as separate labels
+ #
+ # multilabel: consider as assigned all labels
+ # above a certain confidence threshold?
+ "handle_multilabel" => "string",
+ "assignment_confidence_threshold" => "float",
+
+ # single-sentence context?
+ "single_sent_context" => "bool",
+
+ # noncontiguous input? then we need access to a larger corpus
+ "noncontiguous_input" => "bool",
+ "larger_corpus_dir" => "string",
+ "larger_corpus_format" => "string",
+ "larger_corpus_encoding" => "string"
+ ## Role Assignment System (aka Rosy)
+ # features
+ "feature" => "list",
+ "classifier" => "list",
+
+ "verbose" => "bool",
+ "enduser_mode" => "bool",
+
+ "experiment_ID" => "string",
+
+ "directory_input_train" => "string",
+ "directory_input_test" => "string",
+ "directory_output" => "string",
+
+ "preproc_descr_file_train" => "string",
+ "preproc_descr_file_test" => "string",
+ "external_descr_file" => "string",
+
+ "dbtype" => "string", # "mysql" or "sqlite"
+
+ "host" => "string", # DB access: sqlite only
+ "user" => "string",
+ "passwd" => "string",
+ "dbname" => "string",
+
+ "data_dir" => "string", # for external use
+ "rosy_dir" => "pattern", # for internal use only, set by rosy.rb
+
+ "classifier_dir" => "string", # if present, special directory for classifiers
+
+ "classif_column_name" => "string",
+ "main_table_name" => "pattern",
+ "test_table_name" => "pattern",
+
+ "eval_file" => "pattern",
+ "log_file" => "pattern",
+ "failed_file" => "pattern",
+ "classifier_file" => "pattern",
+ "classifier_output_file" => "pattern",
+ "noval" => "string",
+
+
+ "split_nones" => "bool",
+ "print_eval_log" => "bool",
+ "assume_argrec_perfect" => "bool",
+ "xwise_argrec" => "string",
+ "xwise_arglab" => "string",
+ "xwise_onestep" => "string",
+
+ "fe_syn_repair" => "bool", # map words to constituents for FEs: idealize?
+ "fe_rel_repair" => "bool", # FEs: include non-included relative clauses into FEs
+
+ "prune" => "string", # pruning prior to argrec?
@@ -3,6 +3,7 @@
  ## Prerequisites
 
  You need the following items installed on your system:
+
  - [Ruby](https://www.ruby-lang.org/en/downloads/), at least version ``1.8.7`` (please note that the version ``1.8.7`` is deprecated, future Shalmaneser incarnations will run only under Ruby greater than ``1.9.x``)
  - a MySQL database server; your database must be large enough to hold the test data (in end user mode) plus any training data (for training new models in manual mode), e.g. training on the complete FrameNet 1.2 dataset requires about 1.5 GB of free space.
  - if you don't want to train classifiers from your own data, you need to download suitable classifiers from our homepage for available configurations (see links later).
@@ -111,7 +112,7 @@ Downloand the Stanford Parser archive from the official [site](http://nlp.stanfo
  |_ stanford_parser-x.y.z-models.jar
 
  ### OpenNLP MaxEnt
- Downloand the MaxEnt archive from the official [site](http://sourceforge.net/projects/maxent/files/Maxent/2.4.0/) from SourceForge, uncompress it to your favorite location. Set ``JAVA_HOME`` if it isn't set on your system. Run ``build.sh`` in the MaxEnt Root Directory.
+ Download the MaxEnt archive from the official [site](http://sourceforge.net/projects/maxent/files/Maxent/2.4.0/) on SourceForge. You have to use version ``2.4.0``; other versions aren't compatible with Shalmaneser for now, but we are working on it. Untar the archive to your favorite location. Set ``JAVA_HOME`` if it isn't set on your system. Run ``build.sh`` in the MaxEnt root directory.
 
  The path to the root directory is essential for the experiment file declarations. Shalmaneser expects the following directory structure:
 
@@ -1,4 +1,6 @@
- require 'ISO-8859-1'
+ # @note AB: This whole thing should be obsolete on Ruby 1.9
+ # @note #unpack seems to work on 1.8 and 1.9 equally
+ require 'common/ISO-8859-1'
 
  ####################3
  # Reformatting to and from
@@ -23,9 +23,9 @@
  # ne: named entity
  # sent_id: sentence ID
 
- require "Ampersand"
- require "ISO-8859-1"
- require "RegXML"
+ require 'frprep/Ampersand'
+ require 'common/ISO-8859-1'
+ require 'common/RegXML'
 
  #####################
  # mixins to make work with RegXML a little less repetitive
@@ -2,7 +2,7 @@
  #
  # this module offers methods to extract gemma corpora from the FrameNet database#
 
- require 'FrameXML'
+ require 'frprep/FrameXML'
 
  class FNDatabase
 
@@ -28,9 +28,9 @@
  # write new adapted FNTab format
  # ( "word", ("pt", "gf", "role", "target", "frame", "stuff")* "ne", "sent_id" )
 
- require 'Ampersand'
- require 'ISO-8859-1'
- require 'RegXML'
+ require 'frprep/Ampersand'
+ require 'common/ISO-8859-1'
+ require 'common/RegXML'
 
  class FrameXMLFile # only verified to work for FrameNet v1.1
 
@@ -1,6 +1,9 @@
  require 'frprep/do_parses'
  require 'common/frprep_helper'
  require 'common/FixSynSemMapping'
+ # For FN input.
+ require 'frprep/FNCorpusXML'
+ require 'frprep/FNDatabase'
 
  ##############################
  # The class that does all the work
@@ -91,8 +91,12 @@ class BerkeleyInterface < SynInterfaceSTXML
 
  # AB: for testing we leave this step out, it takes too much time.
  # Please keep the <parsefile> intact!!!
- Kernel.system("#{berkeley_prog} < #{tempfile.path} > #{parsefilename}")
+ rv = system("#{berkeley_prog} < #{tempfile.path} > #{parsefilename}")
 
+ # AB: Testing for return value.
+ unless rv
+   fail 'Berkeley Parser failed to parse our files!'
+ end
  end
  end
 
@@ -129,7 +133,16 @@ class BerkeleyInterface < SynInterfaceSTXML
  line = parsefile.gets
 
  # search for the next "relevant" file or end of the file
- if line.nil? or line=~/^\( *\((PSEUDO|TOP|ROOT)/ or line=~/^\(\(\)/
+ # We expect here:
+ # - an empty line;
+ # - a failed parse;
+ # - a parse beginning with <( (>, <( (TOP>, <( (VROOT> etc.
+ #   TOP    - Negra grammars
+ #   VROOT  - Tiger grammars
+ #   PSEUDO - original BP grammars
+ #   ROOT   - some English grammars
+ #   empty identifiers for older Tiger grammars
+ if line.nil? or line=~/^\( *\((PSEUDO|TOP|ROOT|VROOT)? / or line=~/^\(\(\)/
  break
  end
  sentid +=1
@@ -141,12 +154,21 @@ class BerkeleyInterface < SynInterfaceSTXML
  raise "Error: premature end of parser file!"
  end
 
-
+ # Insert a top node <VROOT> if missing.
+ # Some grammars trained on older Tiger versions
+ # expose this problem.
+ line.sub!(/^(\(\s+\(\s+)/, '\1VROOT')
+
  # berkeley parser output: remove brackets /(.*)/
+ # Remove leading and trailing top-level brackets.
  line.sub!(/^\( */, '')
  line.sub!(/ *\) *$/, '')
+
+ # Split consecutive closing brackets.
  line.gsub!(/\)\)/, ') )')
  line.gsub!(/\)\)/, ') )')
+
+ # Change the CAT_FUNC delimiter from <_> to <->.
  line.gsub!(/(\([A-Z]+)_/, '\1-')
 
  sentence_str = line.chomp!
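As a side note, the bracket clean-up above can be exercised in isolation. The following standalone Ruby sketch applies the same substitutions to an invented Berkeley-style output line; it is an illustration only, not part of the parser interface, and the sample sentence is made up.

```ruby
# Standalone illustration of the substitutions above; the input line is invented.
line = '( (VROOT (NP_SB (NN Haus)) (VP_HD (VVFIN steht))) )'

line.sub!(/^\( */, '')                   # strip the leading top-level bracket
line.sub!(/ *\) *$/, '')                 # strip the trailing top-level bracket
2.times { line.gsub!(/\)\)/, ') )') }    # split runs of closing brackets
line.gsub!(/(\([A-Z]+)_/, '\1-')         # CAT_FUNC delimiter: <_> becomes <->

puts line
# => (VROOT (NP-SB (NN Haus) ) (VP-HD (VVFIN steht) ) )
```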
@@ -326,24 +348,27 @@ class BerkeleyInterface < SynInterfaceSTXML
 
  return build_salsatiger(sentence,pos+$&.length, stack,termc,nontc,sent_obj)
  else
- raise "Error: cannot analyse sentence at pos #{pos}: #{sentence[pos..-1]}. Complete sentence: \n#{sentence}"
+ raise "Error: cannot analyse sentence at pos #{pos}: <#{sentence[pos..-1]}>. Complete sentence: \n#{sentence}"
  end
  end
 
  ###
- # BerkeleyParser delivers node labels as "phrase type"-"grammatical function",
- # but the GF may not be present.
+ # BerkeleyParser delivers node labels in different forms:
+ # - "phrase type"-"grammatical function",
+ # - "phrase type"_"grammatical function",
+ # - "phrase type":"grammatical function",
+ # but the GF may be absent.
  # @param cat [String]
- # @return [String]
+ # @return [Array<String>]
  def split_cat(cat)
 
- md = cat.match(/^([^-]*)(-([^-]*))?$/)
+ md = cat.match(/^([^-:_]*)([-:_]([^-:_]*))?$/)
  raise "Error: Could not identify category in #{cat}!" unless md[1]
 
  proper_cat = md[1]
  md[3] ? gf = md[3] : gf = ''
 
- [proper_cat,gf]
+ [proper_cat, gf]
  end
 
  end
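For orientation, the widened regex in ``split_cat`` now accepts ``-``, ``_`` and ``:`` as the delimiter between phrase type and grammatical function, and tolerates a missing GF. A standalone sketch of that behaviour (invented labels, not the actual method):

```ruby
# Standalone sketch of the widened category/GF splitting; example labels are invented.
def split_label(cat)
  md = cat.match(/^([^-:_]*)([-:_]([^-:_]*))?$/)
  [md[1], md[3] || '']
end

p split_label('NP-SB')  # => ["NP", "SB"]
p split_label('VP_OC')  # => ["VP", "OC"]
p split_label('S:HD')   # => ["S", "HD"]
p split_label('VROOT')  # => ["VROOT", ""]
```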
@@ -0,0 +1,51 @@
+ require 'optparse'
+ require 'shalmaneser/version'
+
+
+ module Shalmaneser
+   class OptParser
+     def self.parse(cmd_args)
+
+       parser = create_parser
+
+       if cmd_args.empty?
+         cmd_args << '-h'
+       end
+
+       # Parse ARGV and provide the options hash.
+       # Check if everything is correct and handle exceptions.
+       begin
+         parser.parse(cmd_args)
+       rescue OptionParser::InvalidArgument => e
+         arg = e.message.split.last
+         puts "The provided argument #{arg} is currently not supported by Shalmaneser!"
+         puts 'Please consult <shalmaneser --help>.'
+         exit(1)
+       rescue OptionParser::InvalidOption => e
+         puts "You have provided an #{e.message}."
+         puts 'Please consult <shalmaneser --help>.'
+         exit(1)
+       rescue
+         raise
+       end
+     end
+
+     def self.create_parser
+       OptionParser.new do |opts|
+         opts.banner = 'Usage: shalmaneser OPTIONS'
+         opts.separator ''
+         opts.separator 'Common options:'
+
+         opts.on_tail('-h', '--help', 'Show the help message.') do
+           puts opts
+           exit
+         end
+
+         opts.on_tail('-v', '--version', 'Show the program version.') do
+           puts VERSION
+           exit
+         end
+       end
+     end
+   end # OptParser
+ end # Shalmaneser
@@ -1,3 +1,3 @@
  module Shalmaneser
- VERSION = '1.2.0.rc2'
+ VERSION = '1.2.0.rc3'
  end
metadata CHANGED
@@ -1,83 +1,105 @@
  --- !ruby/object:Gem::Specification
  name: shalmaneser
  version: !ruby/object:Gem::Version
- version: 1.2.0.rc2
+ version: 1.2.0.rc3
  platform: ruby
  authors:
  - Andrei Beliankou
  autorequire:
  bindir: bin
- cert_chain: []
- date: 2014-01-06 00:00:00.000000000 Z
+ cert_chain:
+ - |
+ -----BEGIN CERTIFICATE-----
+ MIIDZDCCAkygAwIBAgIBATANBgkqhkiG9w0BAQUFADA8MQ4wDAYDVQQDDAVhcmJv
+ eDEWMBQGCgmSJomT8ixkARkWBnlhbmRleDESMBAGCgmSJomT8ixkARkWAnJ1MB4X
+ DTE0MDEwNjE1NDU0MFoXDTE1MDEwNjE1NDU0MFowPDEOMAwGA1UEAwwFYXJib3gx
+ FjAUBgoJkiaJk/IsZAEZFgZ5YW5kZXgxEjAQBgoJkiaJk/IsZAEZFgJydTCCASIw
+ DQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBAKpdkXWo8sFAq/Dd+rCLRCKHpH02
+ 8cZsiy3Dx5kt9qpjYn/LX4/QlJ2mc2C3QXUr++DFJjA0K3TcRS2esUVS9ZlNMDM9
+ YQnxFmPJ4tfpsMiteQMBVqU643aZrh64rqddklg8BwRec+prIIDxfQHzXalnNBad
+ YfiHhjgTh5YQsx3Q0zidhlAtsIbJljaNLuJ4DiVQUtjumEnOI0HTLTuUdpg/Hhh+
+ nPlnhwOUBGzj5hUGzf9QcbV2k99KXsKlHQVkMDn7gsXuIKsisVde07lUbhhR7YGy
+ Z3vGnZK7oNI0It0LIBm7pdx2gtB4YG9O5QKEJo0WzLY60TiY8DzDguLndIcCAwEA
+ AaNxMG8wCQYDVR0TBAIwADALBgNVHQ8EBAMCBLAwHQYDVR0OBBYEFHhWOk+TWhtU
+ KMnM8ZyfBZYcVXxDMBoGA1UdEQQTMBGBD2FyYm94QHlhbmRleC5ydTAaBgNVHRIE
+ EzARgQ9hcmJveEB5YW5kZXgucnUwDQYJKoZIhvcNAQEFBQADggEBAF2Y+mc/uTug
+ OX3ivVkD4AaPpFsB2EglJhQxivlAHkix593RpZPXNf6jeu36oRCV/vRFLkzzaZ73
+ N7MaI5Z2HczDkZvi8ZZM5L3p4wHttquranUdI3bZv4SiAVFmhkeFZLSp6pFf/Fmg
+ qmEeXWVbsCIhYI7KYQ0XKbnRuj9AmjUEoMBZPnMsM1S/R+dBQfrUszXROWqxaENA
+ 728ScNHCmRYuNutDO9yRDJT1SRumpgwH4df6c0LHBCuXuQTWODYqc/CDZJJb9Tfi
+ BJreIpPMe0KFMphkN/x5cHkRDtMoY+rBGcqRe60otCEsAHdM+CXox9tAREnr/4lT
+ Jn9sRDVszy4=
+ -----END CERTIFICATE-----
+ date: 2014-01-11 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: mysql
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - ~>
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - ~>
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
  - !ruby/object:Gem::Dependency
  name: rdoc
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - ~>
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - ~>
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
  - !ruby/object:Gem::Dependency
  name: bundler
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - ~>
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - ~>
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
  - !ruby/object:Gem::Dependency
  name: yard
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - ~>
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - ~>
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
  - !ruby/object:Gem::Dependency
  name: rake
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - ~>
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - ~>
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
  description: |
@@ -89,6 +111,7 @@ description: |
  Project at the University of Saarbrücken.
  email: arbox@yandex.ru
  executables:
+ - shalmaneser
  - frprep
  - fred
  - rosy
@@ -97,6 +120,8 @@ extra_rdoc_files:
  - README.md
  - LICENSE.md
  - CHANGELOG.md
+ - doc/exp_files.md
+ - doc/index.md
  files:
  - .yardopts
  - CHANGELOG.md
@@ -105,14 +130,9 @@ files:
  - bin/fred
  - bin/frprep
  - bin/rosy
- - doc/SB_README
- - doc/exp_files_description.txt
- - doc/fred.pdf
+ - bin/shalmaneser
+ - doc/exp_files.md
  - doc/index.md
- - doc/salsa_tool.pdf
- - doc/salsatigerxml.pdf
- - doc/shal_doc.pdf
- - doc/shal_lrec.pdf
  - lib/common/AbstractSynInterface.rb
  - lib/common/ConfigData.rb
  - lib/common/Counter.rb
@@ -217,16 +237,16 @@ files:
  - lib/rosy/View.rb
  - lib/rosy/opt_parser.rb
  - lib/rosy/rosy.rb
+ - lib/shalmaneser/opt_parser.rb
  - lib/shalmaneser/version.rb
- homepage: https://github.com/arbox/shalmaneser
+ homepage: http://bu.chsta.be/projects/shalmaneser/
  licenses:
  - GPL-2.0
  metadata:
  issue_tracker: https://github.com/arbox/shalmaneser/issues
- homepage: http://bu.chsta.be/projects/shalmaneser/
  post_install_message: |2+
 
- Thank you for installing Shalmaneser 1.2.0.rc2!
+ Thank you for installing Shalmaneser 1.2.0.rc3!
 
  This software package has multiple external dependencies:
  - OpenNLP Maximum Entropy Classifier;
Binary file
@@ -1,57 +0,0 @@
- # Before running the programs you should make sure that all components
- # needed by shalmaneser are installed and that all paths in the
- # configuration files/code are adapted accordingly
- # (maybe iterate over all files and grep for "rehbein" to find hard-
- # coded paths; have a look at all configuration files in SampleExperimentFiles.salsa)
-
-
- # Directories
-
- # program_de -> ruby source code and additional stuff for the German
- # version of shalmaneser
- # program_de/SampleExperimentFiles.salsa
- # -> configuration files for shalmaneser
- # input -> includes test data in plain text format
- # output -> all temporary files and output files, including the
- # classifiers
- #
- # directory output:
- # prp_test -> output of frprep.rb (parsed/tagged/lemmatised data)
- # preprocessed -> output of frprep.rb (data converted to SalsaTiGerXML)
- # exp_fred_salsa-> temp files/output of fred.rb (classifiers, features, ...)
- # exp_fred/output/stxml/ -> output of fred.rb (SalsaTigerXML file with
- # frames)
- # exp_rosy_salsa-> temp files/output of rosy.rb (classifiers, features, ...)
- # exp_rosy_salsa/output -> output of rosy.rb
-
- # Set some variables
- # => adapt to your program paths
- DIR=/proj/llx/Annotation/experiments/test/shalmaneser
- EXP=$DIR/program_de/SampleExperimentFiles.salsa
-
- export CLASSPATH=/proj/llx/Software/MachineLearning/maxent-2.4.0/lib/trove.jar:/proj/llx/Software/MachineLearning/maxent-2.4.0/output/maxent-2.4.0.jar:/proj/llx/Annotation/experiments/sfischer_bachelor/shalmaneser/program/tools/maxent
-
-
-
- # change to shalmaneser directory
- cd $DIR/program_de
-
- # Preprocessing
- # (result: parsed file in SalsaTiGerXML format
- # when running on SalsaTiGerXML data: gold frames/roles included
- # when running on plain text: without frames/roles)
-
- ruby frprep.rb -e $EXP/prp_test.salsa
-
-
- # Frame assignment with fred
- ruby fred.rb -t featurize -e $EXP/fred_test.salsa -d test
-
- ruby fred.rb -t test -e $EXP/fred_test.salsa
-
-
- # Role assignment with rosy
- ruby rosy.rb -t featurize -e $EXP/rosy.salsa -d test
-
- ruby rosy.rb -t test -e $EXP/rosy.salsa
-
@@ -1,160 +0,0 @@
- = FrPrep
- prep_experiment_ID => "string", # experiment identifier
- frprep_directory => "string", # dir for frprep internal data
- # information about the dataset
- language => "string", # en, de
- origin => "string", # FrameNet, Salsa, or nothing
- format => "string", # Plain, SalsaTab, FNXml, FNCorpusXml, SalsaTigerXML
- encoding => "string", # utf8, iso, hex, or nothing
- # directories
- directory_input => "string", # dir with input data
- directory_preprocessed => "string", # dir with output Salsa/Tiger XML data
- directory_parserout => "string", # dir with parser output for the parser named below
-
- # syntactic processing
- pos_tagger => "string", # name of POS tagger
- lemmatizer => "string", # name of lemmatizer
- parser => "string", # name of parser
- pos_tagger_path => "string", # path to POS tagger
- lemmatizer_path => "string", # path to lemmatizer
- parser_path => "string", # path to parser
- parser_max_sent_num => "integer", # max number of sentences per parser
- input file
- parser_max_sent_len => "integer", # max sentence length the parser handles
-
- do_parse" => "bool", # use parser?
- do_lemmatize" => "bool",# use lemmatizer?
- do_postag" => "bool", # use POS tagger?
-
- # output format: if tabformat_output == true,
- # output in Tab format rather than Salsa/Tiger XML
- # (this will not work if do_parse == true)
- tabformat_output" => "bool",
-
- # syntactic repairs, dependent on existing semantic role annotation
- fe_syn_repair" => "bool", # map words to constituents for FEs: idealize?
- fe_rel_repair" => "bool", # FEs: include non-included relative clauses into FEs
-
- = Fred
- experiment_ID" => "string", # experiment ID
- enduser_mode" => "bool", # work in enduser mode? (disallowing many things)
-
- preproc_descr_file_train" => "string", # path to preprocessing files
- preproc_descr_file_test" => "string",
- directory_output" => "string", # path to Salsa/Tiger XML output directory
-
- verbose" => "bool" , # print diagnostic messages?
- apply_to_all_known_targets" => "bool", # apply to all known targets rather than the ones with a frame?
-
- fred_directory" => "string",# directory for internal info
- classifier_dir" => "string", # write classifiers here
-
- classifier" => "list", # classifiers
-
- dbtype" => "string", # "mysql" or "sqlite"
-
- host" => "string", # DB access: sqlite only
- user" => "string",
- passwd" => "string",
- dbname" => "string",
-
- # featurization info
- feature" => "list", # which features to use for the classifier?
- binary_classifiers" => "bool",# make binary rather than n-ary clasifiers?
- negsense" => "string", # binary classifier: negative sense is..?
- numerical_features" => "string", # do what with numerical features?
-
- # what to do with items that have multiple senses?
- # 'binarize': binary classifiers, and consider positive
- # if the sense is among the gold senses
- # 'join' : make one joint sense
- # 'repeat' : make multiple occurrences of the item, one sense per occ
- # 'keep' : keep as separate labels
- #
- # multilabel: consider as assigned all labels
- # above a certain confidence threshold?
- handle_multilabel" => "string",
- assignment_confidence_threshold" => "float",
-
- # single-sentence context?
- single_sent_context" => "bool",
-
- # noncontiguous input? then we need access to a larger corpus
- noncontiguous_input" => "bool",
- larger_corpus_dir" => "string",
- larger_corpus_format" => "string",
- larger_corpus_encoding" => "string"
-
- [ # variables
- "train",
- "exp_ID"
- ]
-
- = Rosy
- # features
- feature" => "list",
- classifier" => "list",
-
- verbose" => "bool" ,
- enduser_mode" => "bool",
-
- experiment_ID" => "string",
-
- directory_input_train" => "string",
- directory_input_test" => "string",
- directory_output" => "string",
-
- preproc_descr_file_train" => "string",
- preproc_descr_file_test" => "string",
- external_descr_file" => "string",
-
- dbtype" => "string", # "mysql" or "sqlite"
-
- host" => "string", # DB access: sqlite only
- user" => "string",
- passwd" => "string",
- dbname" => "string",
-
- data_dir" => "string", # for external use
- rosy_dir" => "pattern", # for internal use only, set by rosy.rb
-
- classifier_dir" => "string", # if present, special directory for classifiers
-
- classif_column_name" => "string",
- main_table_name" => "pattern",
- test_table_name" => "pattern",
-
- eval_file" => "pattern",
- log_file" => "pattern",
- failed_file" => "pattern",
- classifier_file" => "pattern",
- classifier_output_file" => "pattern",
- noval" => "string",
-
-
- split_nones" => "bool",
- print_eval_log" => "bool",
- assume_argrec_perfect" => "bool",
- xwise_argrec" => "string",
- xwise_arglab" => "string",
- xwise_onestep" => "string",
-
- fe_syn_repair" => "bool", # map words to constituents for FEs: idealize?
- fe_rel_repair" => "bool", # FEs: include non-included relative clauses into FEs
-
- prune" => "string", # pruning prior to argrec?
-
- ["exp_ID", "test_ID", "split_ID", "feature_name", "classif", "step",
- "group", "dataset","mode"] # variables
-
- = External Config Data
-
- directory" => "string", # features
-
- experiment_id" => "string",
-
- gfmap_restrict_to_downpath" => "bool",
- gfmap_restrict_pathlen" => "integer",
- gfmap_remove_gf" => "list"
-
-
Binary file
Binary file
Binary file
Binary file
Binary file