shalmaneser 1.2.0.rc2 → 1.2.0.rc3
- checksums.yaml +4 -4
- checksums.yaml.gz.sig +0 -0
- data.tar.gz.sig +0 -0
- data/.yardopts +3 -1
- data/README.md +13 -6
- data/bin/shalmaneser +13 -0
- data/doc/exp_files.md +191 -0
- data/doc/index.md +2 -1
- data/lib/frprep/Ampersand.rb +3 -1
- data/lib/frprep/FNCorpusXML.rb +3 -3
- data/lib/frprep/FNDatabase.rb +1 -1
- data/lib/frprep/FrameXML.rb +3 -3
- data/lib/frprep/frprep.rb +3 -0
- data/lib/frprep/interfaces/berkeley_interface.rb +34 -9
- data/lib/shalmaneser/opt_parser.rb +51 -0
- data/lib/shalmaneser/version.rb +1 -1
- metadata +43 -23
- metadata.gz.sig +0 -0
- data/doc/SB_README +0 -57
- data/doc/exp_files_description.txt +0 -160
- data/doc/fred.pdf +0 -0
- data/doc/salsa_tool.pdf +0 -0
- data/doc/salsatigerxml.pdf +0 -0
- data/doc/shal_doc.pdf +0 -0
- data/doc/shal_lrec.pdf +0 -0
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f64175ecd62ad8540989348c15317500a81a001f
+  data.tar.gz: fe381a419d70708f84ee2060bc91fea35e31cf26
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f888c3690741dda8f2ca980f1ba51020a696ea92fc6c2cb3e596da8349c11e946acf08ae4ead19cf2664f20b827cda49882783ccd2aabd29cb902a819b2e9c65
+  data.tar.gz: 3268708b720df30dac6928bb5df7732391a1f73a9da65bd3cc2912af27447831ca0c0d281a0d032e279bf9356a07bcd3ecb099e542666887d1b3447462ab34e2
checksums.yaml.gz.sig
ADDED
Binary file
data.tar.gz.sig
ADDED
Binary file
data/.yardopts
CHANGED
data/README.md
CHANGED
@@ -1,22 +1,28 @@
 # [SHALMANESER - a SHALlow seMANtic parSER](http://www.coli.uni-saarland.de/projects/salsa/shal/)
 
 
-[RubyGems](http://rubygems.org/gems/shalmaneser) | [Shalmanesers Project Page](http://bu.chsta.be/projects/shalmaneser/) |
-[Source Code](https://github.com/arbox/shalmaneser) | [Bug Tracker](https://github.com/arbox/shalmaneser/issues)
+[RubyGems](http://rubygems.org/gems/shalmaneser) | [Shalmanesers Project Page](http://bu.chsta.be/projects/shalmaneser/) | [Source Code](https://github.com/arbox/shalmaneser) | [Bug Tracker](https://github.com/arbox/shalmaneser/issues)
 
 [<img src="https://badge.fury.io/rb/shalmaneser.png" alt="Gem Version" />](http://badge.fury.io/rb/shalmaneser)
-[
+[![Build Status](https://travis-ci.org/arbox/shalmaneser.png?branch=1.2)](https://travis-ci.org/arbox/shalmaneser)
 [<img src="https://codeclimate.com/github/arbox/shalmaneser.png" alt="Code Climate" />](https://codeclimate.com/github/arbox/shalmaneser)
 [<img alt="Bitdeli Badge" src="https://d2weczhvl823v0.cloudfront.net/arbox/shalmaneser/trend.png" />](https://bitdeli.com/free)
 [![Dependency Status](https://gemnasium.com/arbox/shalmaneser.png)](https://gemnasium.com/arbox/shalmaneser)
 
 ## Description
 
-Please be careful, the whole thing is under construction! Shalmaneser it not intended to run on Windows systems
+Please be careful, the whole thing is under construction! For now Shalmaneser is not intended to run on Windows systems, since it heavily uses system calls for external invocations.
+Current versions of Shalmaneser have been tested on Linux only (other *NIX testers are welcome!).
 
-Shalmaneser is a supervised learning toolbox for shallow semantic parsing, i.e. the automatic assignment of semantic classes and roles to text. The system was developed for Frame Semantics; thus we use Frame Semantics terminology and call the classes frames and the roles frame elements. However, the architecture is reasonably general, and with a certain amount of adaption, Shalmaneser should be usable for other paradigms (e.g., PropBank roles) as well. Shalmaneser caters both for end users, and for researchers.
+Shalmaneser is a supervised learning toolbox for shallow semantic parsing, i.e. the automatic assignment of semantic classes and roles to text. This technique is often called SRL (Semantic Role Labelling). The system was developed for Frame Semantics; thus we use Frame Semantics terminology and call the classes frames and the roles frame elements. However, the architecture is reasonably general, and with a certain amount of adaption, Shalmaneser should be usable for other paradigms (e.g., PropBank roles) as well. Shalmaneser caters both for end users and for researchers.
 
-For end users, we provide a simple end user mode which can simply apply the pre-trained classifiers
+For end users, we provide a simple end user mode which can simply apply the pre-trained classifiers
+for [English](http://www.coli.uni-saarland.de/projects/salsa/shal/index.php?nav=download) (FrameNet 1.3 annotation / Collins parser)
+and [German](http://www.coli.uni-saarland.de/projects/salsa/shal/index.php?nav=download) (SALSA 1.0 annotation / Sleepy parser).
+
+We'll try to provide newer pretrained models for English, German, and possibly other languages as soon as possible.
+
+For researchers interested in investigating shallow semantic parsing, our system is extensively configurable and extendable.
 
 ## Origin
 You can find original versions of Shalmaneser up to ``1.1`` on the [SALSA](http://www.coli.uni-saarland.de/projects/salsa/shal/) project page.
@@ -24,6 +30,7 @@ You can find original versions of Shalmaneser up to ``1.1`` on the [SALSA](http:
 ## Publications on Shalmaneser
 
 - K. Erk and S. Padó: Shalmaneser - a flexible toolbox for semantic role assignment. Proceedings of LREC 2006, Genoa, Italy. [Click here for details](http://www.nlpado.de/~sebastian/pub/papers/lrec06_erk.pdf).
+- TODO: add other works
 
 ## Documentation
 
data/bin/shalmaneser
ADDED
data/doc/exp_files.md
ADDED
@@ -0,0 +1,191 @@
+# Experiment file description
+The whole work with Shalmaneser and its submodules is governed by experiment files.
+
+In an experiment file all feature specifications have the form:
+
+    feature_name = feature_value
+
+The ``feature_name`` is a string without spaces. The ``feature_value`` may include spaces, depending on the feature type (see below).
+
+To include a comment in a config file, start the comment line with ``#``.
+
+Features are typed. The following ``normal`` types are supported:
+
+- ``bool``,
+- ``float``,
+- ``integer``,
+- ``string``
+
+For the ``#get`` method, with which features in the ``ConfigData`` object are accessed, the values are transformed from the strings in the experiment file to the appropriate Ruby class.
+
+Other types:
+
+- ``pattern``,
+- ``list``.
+
+Features of the ``pattern`` type are features that may include variables in <> brackets. When such a feature is accessed, values for these variables are given, i.e. this pattern has to be instantiated.
+
+For example, given a feature
+
+    fileformat = features.<type>.train
+
+and the method call
+
+    instantiate("fileformat", "type" => "path")
+
+what is returned is the String ``features.path.train``.
+
+The ``list`` type is the only feature type where more than one feature specification with the same feature_name is allowed. The right-hand sides of a list feature are stored in an array.
+
+Given a ``list`` feature ``bla``, if the experiment file contains:
+
+    bla = blupp 1 2
+    bla = la di da
+
+the list feature ``bla`` is represented as follows:
+
+    @features['bla'] = [['blupp', 1, 2], ['la', 'di', 'da']]
+
+For comfortable access to a list feature, arbitrary access functions for list features can be defined.
+
+## Fred and Rosy Preprocessor (aka frprep|prep)
+
+"prep_experiment_ID" => "string", # experiment identifier
+"frprep_directory" => "string", # dir for frprep internal data
+
+# information about the dataset
+"language" => "string", # en, de
+"origin" => "string", # FrameNet, Salsa, or nothing
+"format" => "string", # Plain, SalsaTab, FNXml, FNCorpusXml, SalsaTigerXML
+"encoding" => "string", # utf8, iso, hex, or nothing
+
+# directories
+"directory_input" => "string", # dir with input data
+"directory_preprocessed" => "string", # dir with output Salsa/Tiger XML data
+"directory_parserout" => "string", # dir with parser output for the parser named below
+
+# syntactic processing
+"pos_tagger" => "string", # name of POS tagger
+"lemmatizer" => "string", # name of lemmatizer
+"parser" => "string", # name of parser
+"pos_tagger_path" => "string", # path to POS tagger
+"lemmatizer_path" => "string", # path to lemmatizer
+"parser_path" => "string", # path to parser
+"parser_max_sent_num" => "integer", # max number of sentences per parser input file
+"parser_max_sent_len" => "integer", # max sentence length the parser handles
+
+"do_parse" => "bool", # use parser?
+"do_lemmatize" => "bool", # use lemmatizer?
+"do_postag" => "bool", # use POS tagger?
+
+# output format: if tabformat_output == true,
+# output in Tab format rather than Salsa/Tiger XML
+# (this will not work if do_parse == true)
+"tabformat_output" => "bool",
+
+# syntactic repairs, dependent on existing semantic role annotation
+"fe_syn_repair" => "bool", # map words to constituents for FEs: idealize?
+"fe_rel_repair" => "bool", # FEs: include non-included relative clauses into FEs
+
+## Frame Disambiguation System (aka Fred)
+
+"experiment_ID" => "string", # experiment ID
+"enduser_mode" => "bool", # work in enduser mode? (disallowing many things)
+
+"preproc_descr_file_train" => "string", # path to preprocessing files
+"preproc_descr_file_test" => "string",
+"directory_output" => "string", # path to Salsa/Tiger XML output directory
+
+"verbose" => "bool", # print diagnostic messages?
+"apply_to_all_known_targets" => "bool", # apply to all known targets rather than the ones with a frame?
+
+"fred_directory" => "string", # directory for internal info
+"classifier_dir" => "string", # write classifiers here
+
+"classifier" => "list", # classifiers
+
+"dbtype" => "string", # "mysql" or "sqlite"
+
+"host" => "string", # DB access: sqlite only
+"user" => "string",
+"passwd" => "string",
+"dbname" => "string",
+
+# featurization info
+"feature" => "list", # which features to use for the classifier?
+"binary_classifiers" => "bool", # make binary rather than n-ary classifiers?
+"negsense" => "string", # binary classifier: negative sense is..?
+"numerical_features" => "string", # do what with numerical features?
+
+# what to do with items that have multiple senses?
+# 'binarize': binary classifiers, and consider positive
+#             if the sense is among the gold senses
+# 'join'    : make one joint sense
+# 'repeat'  : make multiple occurrences of the item, one sense per occ
+# 'keep'    : keep as separate labels
+#
+# multilabel: consider as assigned all labels
+#             above a certain confidence threshold?
+"handle_multilabel" => "string",
+"assignment_confidence_threshold" => "float",
+
+# single-sentence context?
+"single_sent_context" => "bool",
+
+# noncontiguous input? then we need access to a larger corpus
+"noncontiguous_input" => "bool",
+"larger_corpus_dir" => "string",
+"larger_corpus_format" => "string",
+"larger_corpus_encoding" => "string"
+
+## Role Assignment System (aka Rosy)
+
+# features
+"feature" => "list",
+"classifier" => "list",
+
+"verbose" => "bool",
+"enduser_mode" => "bool",
+
+"experiment_ID" => "string",
+
+"directory_input_train" => "string",
+"directory_input_test" => "string",
+"directory_output" => "string",
+
+"preproc_descr_file_train" => "string",
+"preproc_descr_file_test" => "string",
+"external_descr_file" => "string",
+
+"dbtype" => "string", # "mysql" or "sqlite"
+
+"host" => "string", # DB access: sqlite only
+"user" => "string",
+"passwd" => "string",
+"dbname" => "string",
+
+"data_dir" => "string", # for external use
+"rosy_dir" => "pattern", # for internal use only, set by rosy.rb
+
+"classifier_dir" => "string", # if present, special directory for classifiers
+
+"classif_column_name" => "string",
+"main_table_name" => "pattern",
+"test_table_name" => "pattern",
+
+"eval_file" => "pattern",
+"log_file" => "pattern",
+"failed_file" => "pattern",
+"classifier_file" => "pattern",
+"classifier_output_file" => "pattern",
+"noval" => "string",
+
+"split_nones" => "bool",
+"print_eval_log" => "bool",
+"assume_argrec_perfect" => "bool",
+"xwise_argrec" => "string",
+"xwise_arglab" => "string",
+"xwise_onestep" => "string",
+
+"fe_syn_repair" => "bool", # map words to constituents for FEs: idealize?
+"fe_rel_repair" => "bool", # FEs: include non-included relative clauses into FEs
+
+"prune" => "string", # pruning prior to argrec?
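The experiment-file conventions documented above (typed ``name = value`` entries, ``#`` comments, repeatable ``list`` features, and ``pattern`` features with ``<>`` variables) are compact enough to sketch in a few lines of Ruby. The snippet below is only an illustrative reading of that description; ``read_exp_file`` and the standalone ``instantiate`` helper are hypothetical names, not Shalmaneser's actual ``ConfigData`` API.

    # Minimal sketch of the experiment-file syntax described above.
    # Not the gem's ConfigData class; names and defaults are illustrative.
    def read_exp_file(path, list_features = ['feature', 'classifier'])
      features = Hash.new { |hash, key| hash[key] = [] }
      File.foreach(path) do |raw|
        line = raw.strip
        next if line.empty? || line.start_with?('#') # '#' starts a comment line
        name, value = line.split('=', 2).map(&:strip)
        if list_features.include?(name)
          features[name] << value.split # list features accumulate into an array
        else
          features[name] = value        # normal features hold a single value
        end
      end
      features
    end

    # Instantiate a pattern feature such as "features.<type>.train".
    def instantiate(pattern, variables)
      variables.reduce(pattern) { |str, (var, val)| str.gsub("<#{var}>", val) }
    end

    # instantiate('features.<type>.train', 'type' => 'path') #=> "features.path.train"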
data/doc/index.md
CHANGED
@@ -3,6 +3,7 @@
 ## Prerequisites
 
 You need the following items installed on your system:
+
 - [Ruby](https://www.ruby-lang.org/en/downloads/), at least version ``1.8.7`` (please note that the version ``1.8.7`` is deprecated, future Shalmaneser incarnations will run only under Ruby greater than ``1.9.x``)
 - a MySQL database server, your database must be large enough to hold the test data (in end user mode) plus any training data (for training new models in manual mode), e.g. training on the complete FrameNet 1.2 dataset requires about 1.5 GB of free space.
 - if you don't want to train classifiers from you own data, you need to download suitable classifiers from our homepage for available configurations (see for links later).
@@ -111,7 +112,7 @@ Downloand the Stanford Parser archive from the official [site](http://nlp.stanfo
     |_ stanford_parser-x.y.z-models.jar
 
 ### OpenNLP MaxEnt
-Downloand the MaxEnt archive from the official [site](http://sourceforge.net/projects/maxent/files/Maxent/2.4.0/) from SourceForge,
+Download the MaxEnt archive from the official [site](http://sourceforge.net/projects/maxent/files/Maxent/2.4.0/) on SourceForge. You have to use version ``2.4.0``; other versions aren't compatible with Shalmaneser for now, but we are working on it. Untar the archive to your favorite location. Set ``JAVA_HOME`` if it isn't set on your system. Run ``build.sh`` in the MaxEnt root directory.
 
 The path to the root directory is essential for the experiment file declarations. Schalmaneser expects the following directory structure:
 
data/lib/frprep/Ampersand.rb
CHANGED
data/lib/frprep/FNCorpusXML.rb
CHANGED
@@ -23,9 +23,9 @@
 # ne: named entity
 # sent_id: sentence ID
 
-require
-require
-require
+require 'frprep/Ampersand'
+require 'common/ISO-8859-1'
+require 'common/RegXML'
 
 #####################
 # mixins to make work with RegXML a little less repetitive
data/lib/frprep/FNDatabase.rb
CHANGED
data/lib/frprep/FrameXML.rb
CHANGED
@@ -28,9 +28,9 @@
 # write new adapted FNTab format
 # ( "word", ("pt", "gf", "role", "target", "frame", "stuff")* "ne", "sent_id" )
 
-require 'Ampersand'
-require 'ISO-8859-1'
-require 'RegXML'
+require 'frprep/Ampersand'
+require 'common/ISO-8859-1'
+require 'common/RegXML'
 
 class FrameXMLFile # only verified to work for FrameNet v1.1
 
data/lib/frprep/frprep.rb
CHANGED
data/lib/frprep/interfaces/berkeley_interface.rb
CHANGED
@@ -91,8 +91,12 @@ class BerkeleyInterface < SynInterfaceSTXML
 
       # AB: for testing we leave this step out, it takes too much time.
       # Please keep the <parsefile> intact!!!
-
+      rv = system("#{berkeley_prog} < #{tempfile.path} > #{parsefilename}")
 
+      # AB: Testing for return value.
+      unless rv
+        fail 'Berkeley Parser failed to parse our files!'
+      end
     end
   end
 
@@ -129,7 +133,16 @@ class BerkeleyInterface < SynInterfaceSTXML
       line = parsefile.gets
 
       # search for the next "relevant" file or end of the file
-
+      # We expect here:
+      # - an empty line;
+      # - a failed parse;
+      # - a parse beginning with <( (>, <( (TOP>, <( (VROOT> etc.
+      #   TOP    - Negra Grammars
+      #   VROOT  - Tiger Grammars
+      #   PSEUDO - Original BP Grammars
+      #   ROOT   - some English grammars
+      #   empty identifiers for older Tiger grammars
       if line.nil? or line=~/^\( *\((PSEUDO|TOP|ROOT|VROOT)? / or line=~/^\(\(\)/
         break
       end
       sentid +=1
@@ -141,12 +154,21 @@ class BerkeleyInterface < SynInterfaceSTXML
         raise "Error: premature end of parser file!"
       end
 
-
+      # Insert a top node <VROOT> if missing.
+      # Some grammars trained on older Tiger versions
+      # expose this problem.
+      line.sub!(/^(\(\s+\(\s+)/, '\1VROOT')
+
       # berkeley parser output: remove brackets /(.*)/
+      # Remove leading and trailing top level brackets.
       line.sub!(/^\( */, '')
       line.sub!(/ *\) *$/, '')
+
+      # Split consecutive closing brackets.
       line.gsub!(/\)\)/, ') )')
       line.gsub!(/\)\)/, ') )')
+
+      # Change CAT_FUNC delimiter from <_> to <->.
       line.gsub!(/(\([A-Z]+)_/, '\1-')
 
       sentence_str = line.chomp!
@@ -326,24 +348,27 @@ class BerkeleyInterface < SynInterfaceSTXML
 
       return build_salsatiger(sentence,pos+$&.length, stack,termc,nontc,sent_obj)
     else
-      raise "Error: cannot analyse sentence at pos #{pos}:
+      raise "Error: cannot analyse sentence at pos #{pos}: <#{sentence[pos..-1]}>. Complete sentence: \n#{sentence}"
     end
   end
 
   ###
-  # BerkeleyParser delivers node labels
-  #
+  # BerkeleyParser delivers node labels in different forms:
+  # - "phrase type"-"grammatical function",
+  # - "phrase type"_"grammatical function",
+  # - "phrase type":"grammatical function",
+  # but the GF may be absent.
   # @param cat [String]
-  # @return [String]
+  # @return [Array<String>]
   def split_cat(cat)
 
-    md = cat.match(/^([
+    md = cat.match(/^([^-:_]*)([-:_]([^-:_]*))?$/)
     raise "Error: Could not identify category in #{cat}!" unless md[1]
 
     proper_cat = md[1]
     md[3] ? gf = md[3] : gf = ''
 
-    [proper_cat,gf]
+    [proper_cat, gf]
   end
 
 end
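The revised ``split_cat`` above accepts all three delimiter variants with a single regular expression. A standalone sketch of the same idea, shown here only to make the intended behaviour concrete (it is not part of the gem's class hierarchy):

    # Split a Berkeley parser node label into phrase type and optional
    # grammatical function, joined by '-', '_' or ':' (sketch only).
    def split_cat(cat)
      md = cat.match(/^([^-:_]*)([-:_]([^-:_]*))?$/)
      raise "Could not identify category in #{cat}!" if md.nil?
      [md[1], md[3] || '']
    end

    # split_cat('NP-SB') #=> ["NP", "SB"]
    # split_cat('NP_OA') #=> ["NP", "OA"]
    # split_cat('VROOT') #=> ["VROOT", ""]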
data/lib/shalmaneser/opt_parser.rb
ADDED
@@ -0,0 +1,51 @@
+require 'optparse'
+require 'shalmaneser/version'
+
+
+module Shalmaneser
+  class OptParser
+    def self.parse(cmd_args)
+
+      parser = create_parser
+
+      if cmd_args.empty?
+        cmd_args << '-h'
+      end
+
+      # Parse ARGV and provide the options hash.
+      # Check if everything is correct and handle exceptions.
+      begin
+        parser.parse(cmd_args)
+      rescue OptionParser::InvalidArgument => e
+        arg = e.message.split.last
+        puts "The provided argument #{arg} is currently not supported by Shalmaneser!"
+        puts 'Please consult <shalmaneser --help>.'
+        exit(1)
+      rescue OptionParser::InvalidOption => e
+        puts "You have provided an #{e.message}."
+        puts 'Please consult <shalmaneser --help>.'
+        exit(1)
+      rescue
+        raise
+      end
+    end
+
+    def self.create_parser
+      OptionParser.new do |opts|
+        opts.banner = 'Usage: shalmaneser OPTIONS'
+        opts.separator ''
+        opts.separator 'Common options:'
+
+        opts.on_tail('-h', '--help', 'Show the help message.') do
+          puts opts
+          exit
+        end
+
+        opts.on_tail('-v', '--version', 'Show the program version.') do
+          puts VERSION
+          exit
+        end
+      end
+    end
+  end # OptParser
+end # Shalmaneser
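The new ``bin/shalmaneser`` executable listed in this release is not expanded in this diff view, but the ``OptParser`` class above is self-contained, so a driver only needs to hand it ``ARGV``. The two lines below are an assumption for illustration, not the actual contents of ``data/bin/shalmaneser``:

    #!/usr/bin/env ruby
    # Hypothetical wrapper: delegates command line handling to OptParser,
    # which prints the help text when no arguments are given.
    require 'shalmaneser/opt_parser'

    Shalmaneser::OptParser.parse(ARGV)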
data/lib/shalmaneser/version.rb
CHANGED
metadata
CHANGED
@@ -1,83 +1,105 @@
 --- !ruby/object:Gem::Specification
 name: shalmaneser
 version: !ruby/object:Gem::Version
-  version: 1.2.0.
+  version: 1.2.0.rc3
 platform: ruby
 authors:
 - Andrei Beliankou
 autorequire:
 bindir: bin
-cert_chain:
-
+cert_chain:
+- |
+  -----BEGIN CERTIFICATE-----
+  MIIDZDCCAkygAwIBAgIBATANBgkqhkiG9w0BAQUFADA8MQ4wDAYDVQQDDAVhcmJv
+  eDEWMBQGCgmSJomT8ixkARkWBnlhbmRleDESMBAGCgmSJomT8ixkARkWAnJ1MB4X
+  DTE0MDEwNjE1NDU0MFoXDTE1MDEwNjE1NDU0MFowPDEOMAwGA1UEAwwFYXJib3gx
+  FjAUBgoJkiaJk/IsZAEZFgZ5YW5kZXgxEjAQBgoJkiaJk/IsZAEZFgJydTCCASIw
+  DQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBAKpdkXWo8sFAq/Dd+rCLRCKHpH02
+  8cZsiy3Dx5kt9qpjYn/LX4/QlJ2mc2C3QXUr++DFJjA0K3TcRS2esUVS9ZlNMDM9
+  YQnxFmPJ4tfpsMiteQMBVqU643aZrh64rqddklg8BwRec+prIIDxfQHzXalnNBad
+  YfiHhjgTh5YQsx3Q0zidhlAtsIbJljaNLuJ4DiVQUtjumEnOI0HTLTuUdpg/Hhh+
+  nPlnhwOUBGzj5hUGzf9QcbV2k99KXsKlHQVkMDn7gsXuIKsisVde07lUbhhR7YGy
+  Z3vGnZK7oNI0It0LIBm7pdx2gtB4YG9O5QKEJo0WzLY60TiY8DzDguLndIcCAwEA
+  AaNxMG8wCQYDVR0TBAIwADALBgNVHQ8EBAMCBLAwHQYDVR0OBBYEFHhWOk+TWhtU
+  KMnM8ZyfBZYcVXxDMBoGA1UdEQQTMBGBD2FyYm94QHlhbmRleC5ydTAaBgNVHRIE
+  EzARgQ9hcmJveEB5YW5kZXgucnUwDQYJKoZIhvcNAQEFBQADggEBAF2Y+mc/uTug
+  OX3ivVkD4AaPpFsB2EglJhQxivlAHkix593RpZPXNf6jeu36oRCV/vRFLkzzaZ73
+  N7MaI5Z2HczDkZvi8ZZM5L3p4wHttquranUdI3bZv4SiAVFmhkeFZLSp6pFf/Fmg
+  qmEeXWVbsCIhYI7KYQ0XKbnRuj9AmjUEoMBZPnMsM1S/R+dBQfrUszXROWqxaENA
+  728ScNHCmRYuNutDO9yRDJT1SRumpgwH4df6c0LHBCuXuQTWODYqc/CDZJJb9Tfi
+  BJreIpPMe0KFMphkN/x5cHkRDtMoY+rBGcqRe60otCEsAHdM+CXox9tAREnr/4lT
+  Jn9sRDVszy4=
+  -----END CERTIFICATE-----
+date: 2014-01-11 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: mysql
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - '>='
       - !ruby/object:Gem::Version
         version: '0'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - '>='
       - !ruby/object:Gem::Version
         version: '0'
 - !ruby/object:Gem::Dependency
   name: rdoc
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - '>='
       - !ruby/object:Gem::Version
         version: '0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - '>='
       - !ruby/object:Gem::Version
         version: '0'
 - !ruby/object:Gem::Dependency
   name: bundler
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - '>='
       - !ruby/object:Gem::Version
         version: '0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - '>='
       - !ruby/object:Gem::Version
         version: '0'
 - !ruby/object:Gem::Dependency
   name: yard
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - '>='
       - !ruby/object:Gem::Version
         version: '0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - '>='
       - !ruby/object:Gem::Version
         version: '0'
 - !ruby/object:Gem::Dependency
   name: rake
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - '>='
       - !ruby/object:Gem::Version
         version: '0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - '>='
       - !ruby/object:Gem::Version
         version: '0'
 description: |
@@ -89,6 +111,7 @@ description: |
   Project at the University of Saarbrücken.
 email: arbox@yandex.ru
 executables:
+- shalmaneser
 - frprep
 - fred
 - rosy
@@ -97,6 +120,8 @@ extra_rdoc_files:
 - README.md
 - LICENSE.md
 - CHANGELOG.md
+- doc/exp_files.md
+- doc/index.md
 files:
 - .yardopts
 - CHANGELOG.md
@@ -105,14 +130,9 @@ files:
 - bin/fred
 - bin/frprep
 - bin/rosy
--
-- doc/
-- doc/fred.pdf
+- bin/shalmaneser
+- doc/exp_files.md
 - doc/index.md
-- doc/salsa_tool.pdf
-- doc/salsatigerxml.pdf
-- doc/shal_doc.pdf
-- doc/shal_lrec.pdf
 - lib/common/AbstractSynInterface.rb
 - lib/common/ConfigData.rb
 - lib/common/Counter.rb
@@ -217,16 +237,16 @@ files:
 - lib/rosy/View.rb
 - lib/rosy/opt_parser.rb
 - lib/rosy/rosy.rb
+- lib/shalmaneser/opt_parser.rb
 - lib/shalmaneser/version.rb
-homepage:
+homepage: http://bu.chsta.be/projects/shalmaneser/
 licenses:
 - GPL-2.0
 metadata:
   issue_tracker: https://github.com/arbox/shalmaneser/issues
-  homepage: http://bu.chsta.be/projects/shalmaneser/
 post_install_message: |2+
 
-  Thank you for installing Shalmaneser 1.2.0.
+  Thank you for installing Shalmaneser 1.2.0.rc3!
 
   This software package has multiple external dependencies:
   - OpenNLP Maximum Entropy Classifier;
metadata.gz.sig
ADDED
Binary file
data/doc/SB_README
DELETED
@@ -1,57 +0,0 @@
-# Before running the programs you should make sure that all components
-# needed by shalmaneser are installed and that all paths in the
-# configuration files/code are adapted accordingly
-# (maybe iterate over all files and grep for "rehbein" to find hard-
-# coded paths; have a look at all configuration files in SampleExperimentFiles.salsa)
-
-
-# Directories
-
-# program_de -> ruby source code and additional stuff for the German
-#               version of shalmaneser
-# program_de/SampleExperimentFiles.salsa
-#             -> configuration files for shalmaneser
-# input  -> includes test data in plain text format
-# output -> all temporary files and output files, including the
-#           classifiers
-#
-# directory output:
-# prp_test      -> output of frprep.rb (parsed/tagged/lemmatised data)
-# preprocessed  -> output of frprep.rb (data converted to SalsaTiGerXML)
-# exp_fred_salsa-> temp files/output of fred.rb (classifiers, features, ...)
-# exp_fred/output/stxml/ -> output of fred.rb (SalsaTigerXML file with
-#                frames)
-# exp_rosy_salsa-> temp files/output of rosy.rb (classifiers, features, ...)
-# exp_rosy_salsa/output -> output of rosy.rb
-
-# Set some variables
-# => adapt to your program paths
-DIR=/proj/llx/Annotation/experiments/test/shalmaneser
-EXP=$DIR/program_de/SampleExperimentFiles.salsa
-
-export CLASSPATH=/proj/llx/Software/MachineLearning/maxent-2.4.0/lib/trove.jar:/proj/llx/Software/MachineLearning/maxent-2.4.0/output/maxent-2.4.0.jar:/proj/llx/Annotation/experiments/sfischer_bachelor/shalmaneser/program/tools/maxent
-
-
-
-# change to shalmaneser directory
-cd $DIR/program_de
-
-# Preprocessing
-# (result: parsed file in SalsaTiGerXML format
-#  when running on SalsaTiGerXML data: gold frames/roles included
-#  when running on plain text: without frames/roles)
-
-ruby frprep.rb -e $EXP/prp_test.salsa
-
-
-# Frame assignment with fred
-ruby fred.rb -t featurize -e $EXP/fred_test.salsa -d test
-
-ruby fred.rb -t test -e $EXP/fred_test.salsa
-
-
-# Role assignment with rosy
-ruby rosy.rb -t featurize -e $EXP/rosy.salsa -d test
-
-ruby rosy.rb -t test -e $EXP/rosy.salsa
-
data/doc/exp_files_description.txt
DELETED
@@ -1,160 +0,0 @@
-= FrPrep
-prep_experiment_ID => "string", # experiment identifier
-frprep_directory => "string", # dir for frprep internal data
-# information about the dataset
-language => "string", # en, de
-origin => "string", # FrameNet, Salsa, or nothing
-format => "string", # Plain, SalsaTab, FNXml, FNCorpusXml, SalsaTigerXML
-encoding => "string", # utf8, iso, hex, or nothing
-# directories
-directory_input => "string", # dir with input data
-directory_preprocessed => "string", # dir with output Salsa/Tiger XML data
-directory_parserout => "string", # dir with parser output for the parser named below
-
-# syntactic processing
-pos_tagger => "string", # name of POS tagger
-lemmatizer => "string", # name of lemmatizer
-parser => "string", # name of parser
-pos_tagger_path => "string", # path to POS tagger
-lemmatizer_path => "string", # path to lemmatizer
-parser_path => "string", # path to parser
-parser_max_sent_num => "integer", # max number of sentences per parser
-input file
-parser_max_sent_len => "integer", # max sentence length the parser handles
-
-do_parse" => "bool", # use parser?
-do_lemmatize" => "bool",# use lemmatizer?
-do_postag" => "bool", # use POS tagger?
-
-# output format: if tabformat_output == true,
-# output in Tab format rather than Salsa/Tiger XML
-# (this will not work if do_parse == true)
-tabformat_output" => "bool",
-
-# syntactic repairs, dependent on existing semantic role annotation
-fe_syn_repair" => "bool", # map words to constituents for FEs: idealize?
-fe_rel_repair" => "bool", # FEs: include non-included relative clauses into FEs
-
-= Fred
-experiment_ID" => "string", # experiment ID
-enduser_mode" => "bool", # work in enduser mode? (disallowing many things)
-
-preproc_descr_file_train" => "string", # path to preprocessing files
-preproc_descr_file_test" => "string",
-directory_output" => "string", # path to Salsa/Tiger XML output directory
-
-verbose" => "bool" , # print diagnostic messages?
-apply_to_all_known_targets" => "bool", # apply to all known targets rather than the ones with a frame?
-
-fred_directory" => "string",# directory for internal info
-classifier_dir" => "string", # write classifiers here
-
-classifier" => "list", # classifiers
-
-dbtype" => "string", # "mysql" or "sqlite"
-
-host" => "string", # DB access: sqlite only
-user" => "string",
-passwd" => "string",
-dbname" => "string",
-
-# featurization info
-feature" => "list", # which features to use for the classifier?
-binary_classifiers" => "bool",# make binary rather than n-ary clasifiers?
-negsense" => "string", # binary classifier: negative sense is..?
-numerical_features" => "string", # do what with numerical features?
-
-# what to do with items that have multiple senses?
-# 'binarize': binary classifiers, and consider positive
-# if the sense is among the gold senses
-# 'join' : make one joint sense
-# 'repeat' : make multiple occurrences of the item, one sense per occ
-# 'keep' : keep as separate labels
-#
-# multilabel: consider as assigned all labels
-# above a certain confidence threshold?
-handle_multilabel" => "string",
-assignment_confidence_threshold" => "float",
-
-# single-sentence context?
-single_sent_context" => "bool",
-
-# noncontiguous input? then we need access to a larger corpus
-noncontiguous_input" => "bool",
-larger_corpus_dir" => "string",
-larger_corpus_format" => "string",
-larger_corpus_encoding" => "string"
-
-[ # variables
- "train",
- "exp_ID"
-]
-
-= Rosy
-# features
-feature" => "list",
-classifier" => "list",
-
-verbose" => "bool" ,
-enduser_mode" => "bool",
-
-experiment_ID" => "string",
-
-directory_input_train" => "string",
-directory_input_test" => "string",
-directory_output" => "string",
-
-preproc_descr_file_train" => "string",
-preproc_descr_file_test" => "string",
-external_descr_file" => "string",
-
-dbtype" => "string", # "mysql" or "sqlite"
-
-host" => "string", # DB access: sqlite only
-user" => "string",
-passwd" => "string",
-dbname" => "string",
-
-data_dir" => "string", # for external use
-rosy_dir" => "pattern", # for internal use only, set by rosy.rb
-
-classifier_dir" => "string", # if present, special directory for classifiers
-
-classif_column_name" => "string",
-main_table_name" => "pattern",
-test_table_name" => "pattern",
-
-eval_file" => "pattern",
-log_file" => "pattern",
-failed_file" => "pattern",
-classifier_file" => "pattern",
-classifier_output_file" => "pattern",
-noval" => "string",
-
-
-split_nones" => "bool",
-print_eval_log" => "bool",
-assume_argrec_perfect" => "bool",
-xwise_argrec" => "string",
-xwise_arglab" => "string",
-xwise_onestep" => "string",
-
-fe_syn_repair" => "bool", # map words to constituents for FEs: idealize?
-fe_rel_repair" => "bool", # FEs: include non-included relative clauses into FEs
-
-prune" => "string", # pruning prior to argrec?
-
-["exp_ID", "test_ID", "split_ID", "feature_name", "classif", "step",
- "group", "dataset","mode"] # variables
-
-= External Config Data
-
-directory" => "string", # features
-
-experiment_id" => "string",
-
-gfmap_restrict_to_downpath" => "bool",
-gfmap_restrict_pathlen" => "integer",
-gfmap_remove_gf" => "list"
-
data/doc/fred.pdf
DELETED
Binary file
data/doc/salsa_tool.pdf
DELETED
Binary file
data/doc/salsatigerxml.pdf
DELETED
Binary file
data/doc/shal_doc.pdf
DELETED
Binary file
data/doc/shal_lrec.pdf
DELETED
Binary file