shalmaneser-rosy 1.2.0.rc4

@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+   metadata.gz: fc41641d5f0eed28292b10a996ffd797eb1002fc
+   data.tar.gz: 76916c60023ae21361dc6752ca316028585d1522
+ SHA512:
+   metadata.gz: 51bbbd581acb92993cd12d485b405f0f9f199d5ea4334b37cac6a4ff6150d49e4b0bc7b92ab6d08399a1bbe69839ebc476ae69267b64f2ec34464d4d080569cf
+   data.tar.gz: abbc5f43fd9d191730c8149743fcd84ba6183ff2ee0f5eaf08b5f574a68aaca2a89b390c68e54af6f5f9019dc7f97f22b352fb6ee6d88001579c55868c06e286
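The `checksums.yaml` entries above record SHA1 and SHA512 digests for the two members of the `.gem` archive. A minimal sketch of checking one of them with Ruby's standard `digest` library, assuming the gem has already been unpacked (a `.gem` file is a plain tar archive, so `tar -xf shalmaneser-rosy-1.2.0.rc4.gem` yields `metadata.gz` and `data.tar.gz`). The helper `checksum_ok?` and the constant `DIGEST_CLASSES` are hypothetical names, not part of RubyGems or Shalmaneser:

```ruby
require "digest/sha1"
require "digest/sha2"

# Digest classes for the two algorithms listed in checksums.yaml.
DIGEST_CLASSES = { "SHA1" => Digest::SHA1, "SHA512" => Digest::SHA512 }.freeze

# Hypothetical helper: true if the file at +path+ hashes to +expected_hex+
# under the given algorithm ("SHA1" or "SHA512").
def checksum_ok?(path, expected_hex, algorithm = "SHA512")
  actual = DIGEST_CLASSES.fetch(algorithm).file(path).hexdigest
  actual == expected_hex.downcase
end
```

For example, after unpacking, `checksum_ok?("data.tar.gz", "76916c60023ae21361dc6752ca316028585d1522", "SHA1")` should return `true` for an unmodified copy of this release.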
@@ -0,0 +1,10 @@
+ --private
+ --protected
+ --title 'SHALMANESER'
+ lib/**/*.rb
+ bin/**/*
+ doc/**/*.md
+ -
+ CHANGELOG.md
+ LICENSE.md
+ doc/index.md
@@ -0,0 +1,4 @@
+ # Versions
+
+ ## Version 1.2.0-rc1
+
@@ -0,0 +1,4 @@
+ # LICENSE
+
+ This software is written in Ruby and is released under the [GNU General Public License](http://www.gnu.org/licenses/gpl-2.0.html) (GPL v2), and the documentation under the [GNU Free Documentation License](http://www.gnu.org/licenses/old-licenses/fdl-1.2.html) (FDL v1.2).
+
@@ -0,0 +1,93 @@
+ # [SHALMANESER - a SHALlow seMANtic parSER](http://www.coli.uni-saarland.de/projects/salsa/shal/)
+
+ [RubyGems](http://rubygems.org/gems/shalmaneser) |
+ [Shalmaneser's Project Page](http://bu.chsta.be/projects/shalmaneser/) |
+ [Source Code](https://github.com/arbox/shalmaneser) |
+ [Bug Tracker](https://github.com/arbox/shalmaneser/issues)
+
+
+ [![Gem Version](https://img.shields.io/gem/v/shalmaneser.svg)](https://rubygems.org/gems/shalmaneser)
+ [![Gem Version](https://img.shields.io/gem/v/frprep.svg)](https://rubygems.org/gems/frprep)
+ [![Gem Version](https://img.shields.io/gem/v/fred.svg)](https://rubygems.org/gems/fred)
+ [![Gem Version](https://img.shields.io/gem/v/rosy.svg)](https://rubygems.org/gems/rosy)
+
+
+ [![License GPL 2](http://img.shields.io/badge/License-GPL%202-green.svg)](http://www.gnu.org/licenses/gpl-2.0.txt)
+ [![Build Status](https://img.shields.io/travis/arbox/shalmaneser.svg?branch=1.2)](https://travis-ci.org/arbox/shalmaneser)
+ [![Code Climate](https://img.shields.io/codeclimate/github/arbox/shalmaneser.svg)](https://codeclimate.com/github/arbox/shalmaneser)
+ [![Dependency Status](https://img.shields.io/gemnasium/arbox/shalmaneser.svg)](https://gemnasium.com/arbox/shalmaneser)
+
+ ## Description
+
+ Please be careful: the whole thing is under construction! For now, Shalmaneser is not intended to run on Windows systems, since it relies heavily on system calls to invoke external tools.
+ Current versions of Shalmaneser have been tested on Linux only (other *NIX testers are welcome!).
+
+ Shalmaneser is a supervised learning toolbox for shallow semantic parsing, i.e. the automatic assignment of semantic classes and roles to text. This technique is often called SRL (Semantic Role Labelling). The system was developed for Frame Semantics; thus we use Frame Semantics terminology and call the classes frames and the roles frame elements. However, the architecture is reasonably general, and with a certain amount of adaptation, Shalmaneser should be usable for other paradigms (e.g., PropBank roles) as well. Shalmaneser caters to both end users and researchers.
+
+ For end users, we provide a simple end-user mode that applies the pre-trained classifiers
+ for [English](http://www.coli.uni-saarland.de/projects/salsa/shal/index.php?nav=download) (FrameNet 1.3 annotation / Collins parser)
+ and [German](http://www.coli.uni-saarland.de/projects/salsa/shal/index.php?nav=download) (SALSA 1.0 annotation / Sleepy parser).
+
+ We'll try to provide newer pre-trained models for English, German, and possibly other languages as soon as possible.
+
+ For researchers interested in investigating shallow semantic parsing, our system is extensively configurable and extensible.
+
+ ## Origin
+
+ The original version of Shalmaneser was written by Sebastian Padó, Katrin Erk and others during their work in the SALSA Project.
+
+ You can find original versions of Shalmaneser up to ``1.1`` on the [SALSA](http://www.coli.uni-saarland.de/projects/salsa/shal/) project page.
+
+ ## Publications on Shalmaneser
+
+ - K. Erk and S. Padó: Shalmaneser - a flexible toolbox for semantic role assignment. Proceedings of LREC 2006, Genoa, Italy. [Click here for details](http://www.nlpado.de/~sebastian/pub/papers/lrec06_erk.pdf).
+ - TODO: add other works
+
+ ## Documentation
+
+ The project documentation can be found in our [doc](https://github.com/arbox/shalmaneser/blob/1.2/doc/index.md) folder.
+
+ ## Development
+
+ We are currently working on three branches:
+
+ - ``dev`` - our development branch incorporating current changes, for now pointing to ``1.2``;
+
+ - ``1.2`` - intermediate target;
+
+ - ``2.0`` - final target.
+
+ ## Installation
+
+ See the installation instructions in the [doc](https://github.com/arbox/shalmaneser/blob/1.2/doc/index.md#installation) folder.
+
+ ### Tokenizers
+
+ - [Ucto](http://ilk.uvt.nl/ucto/)
+
+ ### POS Taggers
+
+ - [TreeTagger](http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/)
+
+ ### Lemmatizers
+
+ - [TreeTagger](http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/)
+
+ ### Parsers
+
+ - [BerkeleyParser](https://code.google.com/p/berkeleyparser/downloads/list)
+ - [Stanford Parser](http://nlp.stanford.edu/software/lex-parser.shtml)
+ - [Collins Parser](http://www.cs.columbia.edu/~mcollins/code.html)
+
+ ### Machine Learning Systems
+
+ - [OpenNLP MaxEnt](http://sourceforge.net/projects/maxent/files/Maxent/2.4.0/)
+ - [Mallet](http://mallet.cs.umass.edu/index.php)
+
+ ## License
+
+ See the `LICENSE` file.
+
+ ## Contributing
+
+ See the `CONTRIBUTING` file.
@@ -0,0 +1,17 @@
+ #!/usr/bin/env ruby
+ # -*- encoding: utf-8 -*-
+
+ # AB: 2011-11-14
+ # rosy.rb
+ # KE, SP April 05
+ #
+ # Main file of the Rosy role assignment system.
+
+
+ require 'rosy/opt_parser'
+ require 'rosy/rosy'
+
+ options = Rosy::OptParser.parse(ARGV)
+
+ rosy = Rosy::Rosy.new(options)
+ rosy.assign
@@ -0,0 +1,242 @@
+ # Katrin Erk November 05
+ #
+ # Abstract classes for
+ # - Rosy features
+ # - Rosy interface for external knowledge sources.
+
+ require 'rosy/ExternalConfigData'
+
+ ####
+ # Feature Extractor:
+ # computes one or more features for a node (a SynNode object) out of
+ # a SalsaTigerSentence
+ class AbstractFeatureExtractor
+   @@sent = nil              # SalsaTigerSentence: sentence of the current instance
+   @@frame = nil             # FrameNode: frame of the current instance
+   @@node = nil              # SynNode: constituent that is the current instance
+   @@interpreter_class = nil # SynInterpreter class
+   @@instance_ok = true
+
+   ###
+   # returns a string: the designator for this feature extractor
+   # (an extractor may compute several features, but
+   # in the experiment file it is chosen by a single designator)
+   def AbstractFeatureExtractor.designator()
+     raise "Overwrite me"
+   end
+
+   ###
+   # returns an array of feature names, the names of the
+   # features that it can compute.
+   # The number of features that the extractor computes must be fixed.
+   def AbstractFeatureExtractor.feature_names()
+     raise "Overwrite me."
+   end
+
+   ###
+   # returns a string: the data type for the feature
+   # to be passed on to the MySQL database,
+   # e.g. VARCHAR(10), INT
+   def AbstractFeatureExtractor.sql_type()
+     raise "Overwrite me"
+   end
+
+   ###
+   # returns a string: the feature type
+   # (the same for all features computed by this extractor)
+   # possible values:
+   # - gold: gold label
+   # - admin: administrative feature, do not pass this on to the learner
+   # - syn: feature computed from syntactic characteristics of the instance
+   # - sem: feature involving semantic characteristics of the instance
+   # - sentlevel: this feature is the same for all instances of a sentence
+   def AbstractFeatureExtractor.feature_type()
+     raise "Overwrite me"
+   end
+
+   ###
+   # returns a string: "phase 1" or "phase 2",
+   # depending on whether the feature is computed
+   # directly from the SalsaTigerSentence and the SynNode objects
+   # or whether it is computed from the phase 1 features
+   def AbstractFeatureExtractor.phase()
+     raise "Overwrite me."
+   end
+
+   ###
+   # returns an array of strings, providing information about
+   # the feature extractor
+   def AbstractFeatureExtractor.info()
+     return []
+   end
+
+   ###
+   # set sentence and frame (set_sentence), then node (set_node):
+   # this is done prior to
+   # feature computation using compute_features()
+   # such that computations that stay the same for
+   # several features can be done in advance
+   #
+   # This is only relevant for phase 1
+   #
+   # returns: false/nil if there was a problem
+   def AbstractFeatureExtractor.set_sentence(sent,  # SalsaTigerSentence object
+                                             frame) # FrameNode object
+     @@sent = sent
+     @@frame = frame
+
+     return true
+   end
+
+   def AbstractFeatureExtractor.set_node(node) # SynNode of the sentence set in set_sentence
+     @@node = node
+
+     return true
+   end
+
+   ###
+   # set general settings: this is done prior to
+   # feature computation using compute_features()
+   # such that computations that stay the same for
+   # several features can be done in advance
+   def AbstractFeatureExtractor.set(var_hash = {})
+     # no settings at this point
+
+     return true
+   end
+   # test during initialisation whether a feature is computable;
+   # gives the feature the possibility to specify additional constraints,
+   # e.g. for phase 2 features: specify which extractors from phase 1 are presupposed
+   def AbstractFeatureExtractor.is_computable(extractor_list) # bool
+     return true
+   end
+
+   ###
+   # @param exp [ConfigData] Experiment file information
+   # @param interpreter_class [Class]
+   def initialize(exp, interpreter_class)
+     @exp = exp
+     @@interpreter_class = interpreter_class
+   end
+
+   ###
+   # compute: compute features
+   #
+   # returns an array of features (strings), length the same as the
+   # length of feature_names()
+   def compute_features()
+     raise "overwrite me"
+   end
+
+   ###
+   # phase 2 extractors:
+   # compute features for a complete view
+   #
+   # returns: an array of columns,
+   # where a column is an array of feature values.
+   # returns one column per entry in feature_names()
+   def compute_features_on_view(view) # DBView object
+     raise "overwrite me"
+   end
+
+   # Here we previously had abstract methods for "training" phase 2 features.
+   # Since this involves introducing a "state" that is nontrivial to preserve
+   # for a standalone version of the classifiers, without keeping the training data,
+   # we decided to remove this functionality (30.11.05).
+   # Features which rely on learning patterns from the training data and applying them
+   # to the test data will from now on be implemented as externals.
+
+   ######
+   protected # NB: does not apply to singleton methods such as announce_me
+
+   def AbstractFeatureExtractor.announce_me()
+     # is a top-level RosyFeatureInfo collector class loaded?
+     if Object.const_defined?(:RosyFeatureInfo)
+       # yup, we have a class to which we can announce ourselves
+       RosyFeatureInfo.add_feature(self)
+     else
+       # no interface collector class
+       # $stderr.puts "Feature #{self.name()} not announced: no RosyFeatureInfo."
+     end
+   end
+ end
+
+ ################################################################
+ # Wrapper class for extractors that compute a single feature
+ class AbstractSingleFeatureExtractor < AbstractFeatureExtractor
+
+   ###
+   # returns a string: the designator for this feature extractor
+   # (an extractor may compute several features, but
+   # in the experiment file it is chosen by a single designator)
+   #
+   # here: single feature, and the feature name is the designator
+   def AbstractSingleFeatureExtractor.designator()
+     return feature_name()
+   end
+
+   ###
+   def AbstractSingleFeatureExtractor.feature_names()
+     return [feature_name()]
+   end
+
+   ###
+   def compute_features()
+     return [compute_feature()]
+   end
+
+   def compute_features_on_view(view) # DBView object
+     return [compute_feature_on_view(view)]
+   end
+
+
+   ######
+   # Single-feature methods
+
+   ###
+   def AbstractSingleFeatureExtractor.feature_name()
+     raise "Overwrite me."
+   end
+
+   ###
+   def compute_feature()
+     raise "Overwrite me"
+   end
+
+   ###
+   def compute_feature_on_view(view) # DBView object
+     raise "Overwrite me"
+   end
+ end
+
+ ######################################################
+
+ class ExternalFeatureExtractor < AbstractFeatureExtractor
+
+   @@warning_uttered = false
+
+   ####
+   # initialization:
+   #
+   # read experiment file for external interfaces
+   def initialize(exp, # RosyConfigData object
+                  interpreter_class)
+
+     @exp_rosy = exp
+     @@interpreter_class = interpreter_class
+
+     unless @exp_rosy.get("external_descr_file")
+       unless @@warning_uttered
+         $stderr.puts "Warning: Cannot compute external feature"
+         $stderr.puts "since 'external_descr_file' has not been set"
+         $stderr.puts "in the Rosy experiment file."
+         @@warning_uttered = true
+       end
+
+       @exp_external = nil
+       return
+     end
+
+     @exp_external = ExternalConfigData.new(@exp_rosy.get("external_descr_file"))
+   end
+ end
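The single-feature wrapper above derives `feature_names` and `compute_features` from one `feature_name` and one `compute_feature`, while the current node is kept in a class variable set by `set_node`. The following self-contained sketch mirrors that contract; `SketchFeatureExtractor`, `SketchSingleFeatureExtractor` and `NodeCategoryFeature` are hypothetical stand-ins (not Shalmaneser classes), and a plain Hash takes the place of a `SynNode`:

```ruby
# Hypothetical, stripped-down stand-ins for AbstractFeatureExtractor and
# AbstractSingleFeatureExtractor, just enough to run without Shalmaneser.
class SketchFeatureExtractor
  @@node = nil # current instance, shared by the whole class hierarchy

  def self.set_node(node)
    @@node = node
    true
  end
end

class SketchSingleFeatureExtractor < SketchFeatureExtractor
  # For a single-feature extractor the feature name is the designator ...
  def self.designator
    feature_name
  end

  # ... and the only entry in the list of feature names.
  def self.feature_names
    [feature_name]
  end

  # One feature value, wrapped in an array to match feature_names.
  def compute_features
    [compute_feature]
  end
end

# Hypothetical feature: the syntactic category of the current node.
class NodeCategoryFeature < SketchSingleFeatureExtractor
  def self.feature_name
    "node_cat"
  end

  def compute_feature
    @@node[:cat]
  end
end

SketchFeatureExtractor.set_node({ cat: "NP" }) # a Hash stands in for a SynNode
p NodeCategoryFeature.feature_names            # prints ["node_cat"]
p NodeCategoryFeature.new.compute_features     # prints ["NP"]
```

Because the node lives in a class variable of the base class, one `set_node` call prepares every extractor in the hierarchy for the same instance, which is the design the class-variable declarations above rely on.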
@@ -0,0 +1,58 @@
+ # ExternalConfigData
+ # Katrin Erk January 2006
+ #
+ # All scripts that compute additional external knowledge sources
+ # for Fred and Rosy:
+ # access to configuration and experiment description file
+
+ require 'common/config_data'
+
+ ##############################
+ # Class ExternalConfigData
+ #
+ # inherits from ConfigData,
+ # sets variable names appropriate to tasks of external knowledge sources
+
+ class ExternalConfigData < ConfigData
+   def initialize(filename)
+
+     # initialize config data object
+     super(filename, # config file
+           { "directory" => "string", # features
+
+             "experiment_id" => "string",
+
+             "gfmap_restrict_to_downpath" => "bool",
+             "gfmap_restrict_pathlen" => "integer",
+             "gfmap_remove_gf" => "list"
+           },
+           [] # variables
+          )
+
+     # set access functions for list features
+     set_list_feature_access("gfmap_remove_gf",
+                             method("access_as_stringlist"))
+   end
+
+   ###
+   protected
+
+   #####
+   # access_as_stringlist
+   #
+   # assumed format:
+   #
+   # lhs = rhs1 rhs2 ... rhsN
+   #
+   # given in val_list as an array of string tuples [rhs1,...,rhsN]
+   #
+   # joins the rhs strings of each tuple by spaces and returns an array
+   # of strings "rhs1 rhs2 ... rhsN", one per tuple
+   #
+   def access_as_stringlist(val_list) # array:array:string
+     return val_list.map { |rhs| rhs.join(" ") }
+   end
+ end
+
+
+
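To illustrate what `access_as_stringlist` returns for a list variable such as `gfmap_remove_gf`, here is a standalone copy of its one-line body. It assumes, as the comment above states, that `ConfigData` has already split each right-hand side into an array of tokens; the tokens used below are made up:

```ruby
# Standalone copy of ExternalConfigData#access_as_stringlist:
# each tuple of right-hand-side tokens is joined back into one string.
def access_as_stringlist(val_list)
  val_list.map { |rhs| rhs.join(" ") }
end

# Hypothetical parsed value containing two tuples of tokens.
parsed = [%w[SB OA], %w[MO]]
p access_as_stringlist(parsed) # prints ["SB OA", "MO"]
```

Note that the result is an array with one string per tuple, not a single joined string.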
@@ -0,0 +1,130 @@
+ # Failed Parses
+ #
+ # SP May 05
+ #
+ # Administration of information about failed parses;
+ # - sentence ID
+ # - frame, target and target POS
+ # - missed FE markables
+ #
+ # this class is pretty much a glorified list of records with methods to
+ # - read FailedParses from a file and to write them to a file
+ # - access info in a frame-specific way
+
+ class FailedParses
+
+   ###
+   # initialize
+   #
+   # nothing much happens here
+   def initialize()
+     @failed_parses = Array.new
+   end
+
+   ###
+   # register
+   #
+   # register new failed parse by specifying
+   # - its sentence id (any object)
+   # - its frame, target and target POS (Strings)
+   # - its FE list (String Array)
+
+   def register(sent_id,    # object
+                frame,      # string: frame name
+                target,     # string?
+                target_pos, # string: target POS
+                fe_list)    # array:string
+     if @failed_parses.assoc sent_id
+       # $stderr.puts "Error: trying to register sentence id #{sent_id} twice!"
+       # $stderr.puts "Skipping second occurrence."
+     end
+     @failed_parses << [sent_id, frame, target, target_pos, fe_list]
+   end
+
+   ###
+   # make_split
+   #
+   # produce a random "split" of the failed parses into a train and a test section
+   # parameter: train_percentage, Integer between 0 and 100
+   #
+   # returns an Array with two FailedParses objects, the first for the
+   # train data, the second for the test data
+
+   def make_split(train_percentage)
+     unless train_percentage.is_a?(Integer) and train_percentage >= 0 and train_percentage <= 100
+       raise "Need Integer between 0 and 100 as training percentage."
+     end
+     train_failed = FailedParses.new()
+     test_failed = FailedParses.new()
+     @failed_parses.each { |sent_id, frame, target, target_pos, fe_list|
+       if rand(100) >= train_percentage # rand(100) is in 0..99: test with prob. (100 - train_percentage)/100
+         test_failed.register(sent_id, frame, target, target_pos, fe_list)
+       else
+         train_failed.register(sent_id, frame, target, target_pos, fe_list)
+       end
+     }
+     return [train_failed, test_failed]
+   end
+
+   ###
+   # Access information
+   #
+   # failed_sent: number of failed sentences
+   # failed_fes: Hash that maps FE names [String] onto numbers of failed FEs [Int]
+   #
+   # optional parameters: frame, target, target_pos: if not specified or nil, marginal
+   # frequencies are counted (sum over all values)
+
+
+   def failed_sent(frame_spec = nil, target_spec = nil, target_pos_spec = nil)
+     counter = 0
+     @failed_parses.each { |sent_id, frame, target, target_pos, fe_list|
+       if ((frame_spec.nil? or frame_spec == frame) and
+           (target_spec.nil? or target_spec == target) and
+           (target_pos_spec.nil? or target_pos_spec == target_pos))
+         counter += 1
+       end
+     }
+     return counter
+   end
+
+   def failed_fes(frame_spec = nil, target_spec = nil, target_pos_spec = nil)
+     fe_hash = Hash.new(0)
+     @failed_parses.each { |sent_id, frame, target, target_pos, fe_list|
+       if ((frame_spec.nil? or frame_spec == frame) and
+           (target_spec.nil? or target_spec == target) and
+           (target_pos_spec.nil? or target_pos_spec == target_pos))
+         fe_list.each { |fe_label|
+           fe_hash[fe_label] += 1
+         }
+       end
+     }
+     return fe_hash
+   end
+
+
+   ###
+   # Marshalling:
+   #
+   # save - save info about failed parses to file
+   # load - load info about failed parses from file
+
+   def save(filename)
+     File.open(filename, "wb") do |io_obj| # binary mode: Marshal output is not text
+       Marshal.dump(@failed_parses, io_obj)
+     end
+   end
+
+   def load(filename)
+     begin
+       File.open(filename, "rb") do |io_obj| # block form closes the file even on error
+         @failed_parses = Marshal.load(io_obj)
+       end
+     rescue
+       $stderr.puts "WARNING: couldn't read failed parses file #{filename}."
+       $stderr.puts "I'll assume that there are no failed parses."
+     end
+   end
+
+
+ end
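`make_split` is meant to send each failed parse to the training side with probability `train_percentage`/100, and `failed_fes` sums over every filter argument left as `nil`. A self-contained sketch of both ideas on plain record arrays of the same shape that `register` stores (`[sent_id, frame, target, target_pos, fe_list]`), plus a binary-mode `Marshal` round trip like `save`/`load`. The helper names `split_records` and `count_failed_fes`, the seeded `Random`, and the sample records are assumptions for illustration only:

```ruby
require "tmpdir"

# Hypothetical helper: partition records into [train, test], putting each
# record into train with probability train_percentage / 100.
def split_records(records, train_percentage, rng: Random.new(42))
  unless train_percentage.is_a?(Integer) && train_percentage.between?(0, 100)
    raise ArgumentError, "Need Integer between 0 and 100 as training percentage."
  end
  # rng.rand(100) is uniform over 0..99, so "< train_percentage" is true
  # for exactly train_percentage of the 100 possible values.
  records.partition { rng.rand(100) < train_percentage }
end

# Hypothetical helper: count missed FE labels; a nil filter means
# "any value", which yields the marginal counts.
def count_failed_fes(records, frame: nil, target: nil, target_pos: nil)
  counts = Hash.new(0)
  records.each do |_sent_id, fr, tg, pos, fe_list|
    next unless (frame.nil? || frame == fr) &&
                (target.nil? || target == tg) &&
                (target_pos.nil? || target_pos == pos)
    fe_list.each { |fe| counts[fe] += 1 }
  end
  counts
end

# Made-up records: [sent_id, frame, target, target_pos, fe_list]
records = [
  ["s1", "Commerce_buy", "buy", "V", ["Buyer", "Goods"]],
  ["s2", "Commerce_buy", "purchase", "V", ["Goods"]],
  ["s3", "Motion", "go", "V", ["Theme"]]
]

p count_failed_fes(records, frame: "Commerce_buy") # counts: Buyer 1, Goods 2

# Marshal round trip in binary mode.
Dir.mktmpdir do |dir|
  path = File.join(dir, "failed_parses.bin")
  File.open(path, "wb") { |io| Marshal.dump(records, io) }
  restored = File.open(path, "rb") { |io| Marshal.load(io) }
  p restored == records # prints true
end
```

Passing the random generator in explicitly makes a split reproducible, which the global `rand` used by `make_split` does not guarantee.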