shalmaneser 1.2.0.rc3 → 1.2.0.rc4

Files changed (47)
  1. checksums.yaml +4 -4
  2. data/README.md +26 -7
  3. data/bin/fred +2 -4
  4. data/doc/exp_files.md +6 -5
  5. data/lib/common/{ConfigData.rb → config_data.rb} +46 -270
  6. data/lib/common/config_format_element.rb +220 -0
  7. data/lib/common/prep_config_data.rb +62 -0
  8. data/lib/common/{frprep_helper.rb → prep_helper.rb} +0 -0
  9. data/lib/{common/DBInterface.rb → db/db_interface.rb} +2 -2
  10. data/lib/{rosy/DBMySQL.rb → db/db_mysql.rb} +1 -2
  11. data/lib/{rosy/DBSQLite.rb → db/db_sqlite.rb} +1 -1
  12. data/lib/{rosy/DBTable.rb → db/db_table.rb} +1 -1
  13. data/lib/{rosy/DBWrapper.rb → db/db_wrapper.rb} +0 -0
  14. data/lib/{common/SQLQuery.rb → db/sql_query.rb} +0 -0
  15. data/lib/fred/FredBOWContext.rb +8 -6
  16. data/lib/fred/FredDetermineTargets.rb +1 -1
  17. data/lib/fred/FredEval.rb +1 -1
  18. data/lib/fred/FredFeaturize.rb +22 -16
  19. data/lib/fred/FredTest.rb +0 -1
  20. data/lib/fred/fred.rb +2 -0
  21. data/lib/fred/{FredConfigData.rb → fred_config_data.rb} +70 -67
  22. data/lib/fred/opt_parser.rb +1 -1
  23. data/lib/frprep/frprep.rb +1 -1
  24. data/lib/frprep/interfaces/berkeley_interface.rb +7 -9
  25. data/lib/frprep/opt_parser.rb +1 -1
  26. data/lib/rosy/ExternalConfigData.rb +1 -1
  27. data/lib/rosy/RosyEval.rb +1 -1
  28. data/lib/rosy/RosyFeaturize.rb +21 -20
  29. data/lib/rosy/RosyInspect.rb +1 -1
  30. data/lib/rosy/RosyPruning.rb +1 -1
  31. data/lib/rosy/RosyServices.rb +1 -1
  32. data/lib/rosy/RosySplit.rb +1 -1
  33. data/lib/rosy/RosyTest.rb +23 -20
  34. data/lib/rosy/RosyTrain.rb +15 -13
  35. data/lib/rosy/RosyTrainingTestTable.rb +2 -1
  36. data/lib/rosy/View.rb +1 -1
  37. data/lib/rosy/opt_parser.rb +1 -1
  38. data/lib/rosy/rosy.rb +1 -1
  39. data/lib/rosy/rosy_config_data.rb +121 -0
  40. data/lib/shalmaneser/opt_parser.rb +32 -2
  41. data/lib/shalmaneser/version.rb +1 -1
  42. metadata +23 -114
  43. checksums.yaml.gz.sig +0 -0
  44. data.tar.gz.sig +0 -0
  45. data/lib/common/FrPrepConfigData.rb +0 -66
  46. data/lib/rosy/RosyConfigData.rb +0 -115
  47. metadata.gz.sig +0 -0
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: f64175ecd62ad8540989348c15317500a81a001f
- data.tar.gz: fe381a419d70708f84ee2060bc91fea35e31cf26
+ metadata.gz: 4c95a6bc0ac36a239b3014dda1fc240790e99e10
+ data.tar.gz: 469f70a4fc236755f626a48493aaa68bc3d4639e
  SHA512:
- metadata.gz: f888c3690741dda8f2ca980f1ba51020a696ea92fc6c2cb3e596da8349c11e946acf08ae4ead19cf2664f20b827cda49882783ccd2aabd29cb902a819b2e9c65
- data.tar.gz: 3268708b720df30dac6928bb5df7732391a1f73a9da65bd3cc2912af27447831ca0c0d281a0d032e279bf9356a07bcd3ecb099e542666887d1b3447462ab34e2
+ metadata.gz: 5be8c42e0d322d429896a1b55a0330c977f8f47cfdb52e6174f043dc9e46f61bd2464fe6feb642151caf070a70c55f4a335557f02b98b934f5b6ec04ed797f9f
+ data.tar.gz: 54390d8388a780003247794806ee9427d7889aa72e1af54ec968d52d728fc323361816bf2b1823f18249a3b1396320497440afe37addfc86969fd255fe830a54
data/README.md CHANGED
@@ -1,17 +1,25 @@
  # [SHALMANESER - a SHALlow seMANtic parSER](http://www.coli.uni-saarland.de/projects/salsa/shal/)

+ [RubyGems](http://rubygems.org/gems/shalmaneser) |
+ [Shalmanesers Project Page](http://bu.chsta.be/projects/shalmaneser/) |
+ [Source Code](https://github.com/arbox/shalmaneser) |
+ [Bug Tracker](https://github.com/arbox/shalmaneser/issues)

- [RubyGems](http://rubygems.org/gems/shalmaneser) | [Shalmanesers Project Page](http://bu.chsta.be/projects/shalmaneser/) | [Source Code](https://github.com/arbox/shalmaneser) | [Bug Tracker](https://github.com/arbox/shalmaneser/issues)

- [<img src="https://badge.fury.io/rb/shalmaneser.png" alt="Gem Version" />](http://badge.fury.io/rb/shalmaneser)
- [![Build Status](https://travis-ci.org/arbox/shalmaneser.png?branch=1.2)](https://travis-ci.org/arbox/shalmaneser)
- [<img src="https://codeclimate.com/github/arbox/shalmaneser.png" alt="Code Climate" />](https://codeclimate.com/github/arbox/shalmaneser)
- [<img alt="Bitdeli Badge" src="https://d2weczhvl823v0.cloudfront.net/arbox/shalmaneser/trend.png" />](https://bitdeli.com/free)
- [![Dependency Status](https://gemnasium.com/arbox/shalmaneser.png)](https://gemnasium.com/arbox/shalmaneser)
+ [![Gem Version](https://img.shields.io/gem/v/shalmaneser.svg")](https://rubygems.org/gems/shalmaneser)
+ [![Gem Version](https://img.shields.io/gem/v/frprep.svg")](https://rubygems.org/gems/frprep)
+ [![Gem Version](https://img.shields.io/gem/v/fred.svg")](https://rubygems.org/gems/fred)
+ [![Gem Version](https://img.shields.io/gem/v/rosy.svg")](https://rubygems.org/gems/rosy)
+
+
+ [![License GPL 2](http://img.shields.io/badge/License-GPL%202-green.svg)](http://www.gnu.org/licenses/gpl-2.0.txt)
+ [![Build Status](https://img.shields.io/travis/arbox/shalmaneser.svg?branch=1.2")](https://travis-ci.org/arbox/shalmaneser)
+ [![Code Climate](https://img.shields.io/codeclimate/github/arbox/shalmaneser.svg")](https://codeclimate.com/github/arbox/shalmaneser)
+ [![Dependency Status](https://img.shields.io/gemnasium/arbox/shalmaneser.svg")](https://gemnasium.com/arbox/shalmaneser)

  ## Description

- Please be careful, the whole thing is under construction! For now Shalmaneser it not intended to run on Windows systems since it heavily uses system call for external invocations.
+ Please be careful, the whole thing is under construction! For now Shalmaneser it not intended to run on Windows systems since it heavily uses system calls for external invocations.
  Current versions of Shalmaneser have been tested on Linux only (other *NIX testers are welcome!).

  Shalmaneser is a supervised learning toolbox for shallow semantic parsing, i.e. the automatic assignment of semantic classes and roles to text. This technique is often called SRL (Semantic Role Labelling). The system was developed for Frame Semantics; thus we use Frame Semantics terminology and call the classes frames and the roles frame elements. However, the architecture is reasonably general, and with a certain amount of adaption, Shalmaneser should be usable for other paradigms (e.g., PropBank roles) as well. Shalmaneser caters both for end users, and for researchers.
@@ -25,6 +33,9 @@ We'll try to provide newer pretrained models for English, German, and possibly o
  For researchers interested in investigating shallow semantic parsing, our system is extensively configurable and extendable.

  ## Origin
+
+ The original version of Shalmaneser was written by Sebastian Padó, Katrin Erk and others during their work in the SALSA Project.
+
  You can find original versions of Shalmaneser up to ``1.1`` on the [SALSA](http://www.coli.uni-saarland.de/projects/salsa/shal/) project page.

  ## Publications on Shalmaneser
@@ -72,3 +83,11 @@ See the installation instructions in the [doc](https://github.com/arbox/shalmane

  - [OpenNLP MaxEnt](http://sourceforge.net/projects/maxent/files/Maxent/2.4.0/)
  - [Mallet](http://mallet.cs.umass.edu/index.php)
+
+ ## License
+
+ See the `LICENSE` file.
+
+ ## Contributing
+
+ See the `CONTRIBUTING` file.
data/bin/fred CHANGED
@@ -1,10 +1,8 @@
  #!/usr/bin/env ruby
  # -*- encoding: utf-8 -*-

- # AB, 2011-11-13
-
- # fred
- # Katrin Erk, April 05
+ # @author Andrei Beliankou, 2011-11-13
+ # @author Katrin Erk, April 05
  #
  # Frame disambiguation system:
  # frame assignment as word sense disambiguation
data/doc/exp_files.md CHANGED
@@ -103,7 +103,7 @@ For comfortable access to a list feature, arbitrary access functions for list fe

  "classifier" => "list", # classifiers

- "dbtype" => "string", # "mysql" or "sqlite"
+ "dbtype" => "string", # "mysql" or ("sqlite" doesn't work for now)

  "host" => "string", # DB access: sqlite only
  "user" => "string",
@@ -132,9 +132,10 @@ For comfortable access to a list feature, arbitrary access functions for list fe
  "single_sent_context" => "bool",

  # noncontiguous input? then we need access to a larger corpus
- "noncontiguous_input" => "bool",
- "larger_corpus_dir" => "string",
- "larger_corpus_format" => "string",
+ # NOTE: This doesn't work for now.
+ "noncontiguous_input" => "bool"
+ "larger_corpus_dir" => "string"
+ "larger_corpus_format" => "string"
  "larger_corpus_encoding" => "string"
  ## Role Assignment System (aka Rosy)
  # features
@@ -154,7 +155,7 @@ For comfortable access to a list feature, arbitrary access functions for list fe
  "preproc_descr_file_test" => "string",
  "external_descr_file" => "string",

- "dbtype" => "string", # "mysql" or "sqlite"
+ "dbtype" => "string", # "mysql" ("sqlite" doen't work for now)

  "host" => "string", # DB access: sqlite only
  "user" => "string",
data/lib/common/{ConfigData.rb → config_data.rb} CHANGED
@@ -56,6 +56,7 @@
  #
  #

+ require 'common/config_format_element'
  require 'common/ruby_class_extensions'


@@ -68,64 +69,62 @@ require 'common/ruby_class_extensions'
  # that only takes as input the name of the config file
  # and that declares all the feature types and variable names
  # needed for the given application.
-
+ #
+ # @abstract Subclass and override {#initialize} to implement
+ # a custom ConfigData class.
  class ConfigData

- ###########
- # new()
- #
- # reads the config file
- #
  # Input parameters: the name of the config file, a hash declaring all
  # features by mapping feature names to their types,
  # and an array of all variables that may occur in pattern type features
  #
- def initialize(filename, # string: name of config file
- feature_types, # hash: feature_name => feature_type
- variables) # array of strings: list of variables used in pattern features
+ # @param filename [String] a name of the configuration file
+ # @param feature_types [Hash] feature type definitions
+ # @param variables [Array] list of variables used in pattern features
+ def initialize(filename, feature_types, variables)

  @test_print = false
  @variables = variables
- @original_filename = filename
-
- ##
- # open config file
- begin
- file = File.new(filename)
- rescue
- $stderr.puts "Error: I could not open the experiment file " + filename
- exit 1
- end
+ @filename = filename

  # feature_types: hash: feature_name => feature_type
- # features: hash: feature_name => value
  @feature_types = feature_types
- @features = Hash.new

- # @list_feature_access: hash feature_name => Proc
+ # features: hash: feature_name => value
+ @features = {}
+
+ # hash: feature_name => Proc
  # access method for list features
- @list_feature_access = Hash.new
+ @list_feature_access = {}

  # pre-initialize list features to an empty array
- @feature_types.each_pair { |feature_name, feature_type|
+ @feature_types.each_pair do |feature_name, feature_type|
  if feature_type == "list"
- @features[feature_name] = Array.new
+ @features[feature_name] = []
  end
- }
+ end

+
+ ##
+ # open config file
+ # @todo Introduce custom exceptions to handle external errors.
+ begin
+ file = File.new(@filename)
+ rescue
+ $stderr.puts "Error: I could not open the experiment file " + @filename
+ exit 1
+ end
  ##
+
  # examine the config file contents

- while (line = file.gets())
- line = line.chomp().strip()
- if line =~ /^#/ # comment
+ while line = file.gets
+ line = line.strip
+ # Empty lines and comments
+ if line =~ /^#/ or line.empty?
  next
  end

- if line.empty? # nothing to be seen here
- next
- end
-
  feature_name, rhs = extract_def(line)
  set_entry(feature_name, rhs)
  end
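The rewritten read loop in the hunk above folds the comment test and the empty-line test into one condition. A minimal sketch of that filtering step follows. It is not the gem's code: the helper name `definition_lines` and the sample input are assumptions made for illustration, and a `StringIO` stands in for the experiment file.

```ruby
require 'stringio'

# Hypothetical helper (not part of Shalmaneser): collects every line of a
# config source that is neither a comment nor empty, with surrounding
# whitespace stripped.
def definition_lines(io)
  result = []
  while (line = io.gets)
    line = line.strip
    # Comments and empty lines are skipped in a single test,
    # mirroring the condition introduced in the hunk above.
    next if line.start_with?('#') || line.empty?

    result << line
  end
  result
end

# Illustrative input; the parameter names are taken from doc/exp_files.md.
sample = StringIO.new("# comment\n\n  dbtype = mysql  \nhost = localhost\n")
lines = definition_lines(sample)
```

Because the line is stripped before the test, a comment with leading whitespace is skipped as well.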
@@ -140,9 +139,9 @@ class ConfigData

  unless @feature_types[feature_name]
  $stderr.puts "Error in experiment file:"
- $stderr.puts "Unknown parameter #{feature_name} in #{@original_filename}."
+ $stderr.puts "Unknown parameter #{feature_name} in #{@filename}."
  $stderr.puts "Expected features for this type of experiment file:"
- $stderr.puts @feature_types.keys().join(", ")
+ $stderr.puts @feature_types.keys.join(", ")
  exit 1
  end

@@ -269,13 +268,14 @@ class ConfigData
  # returns: a feature value. the type of the return value
  # depends on the type of the feature.
  # returns nil if the feature has not been set in the config file.
- def get(name) # string: name of the feature to access
+ # @param name [String] name of the feature to access
+ def get(name)
  if @feature_types[name].nil?
  raise "Unknown feature " + name
  end

  # may return nil if something has not been set
- return @features[name]
+ @features[name]
  end

  ####
@@ -284,7 +284,7 @@ class ConfigData
  # returns the type of a given feature,
  # or nil if it is undefined
  def get_type(feature_name)
- return @feature_types[feature_name]
+ @feature_types[feature_name]
  end

  #####
@@ -292,12 +292,10 @@ class ConfigData
  #
  # returns: true if a feature by this name has been set in the config file,
  # false else
- def is_defined(feature) # string: name of the feature
- if @features[feature]
- return true
- else
- return false
- end
+ # @param feature [String] name of the feature
+ # @note This method is nowhere used.
+ def is_defined(feature)
+ @features[feature] ? true : false
  end

  #####
@@ -328,6 +326,7 @@ class ConfigData
  # get_filename:
  #
  # synonym for instantiate()
+ # @note What for?
  def get_filename(key, var_hash={})
  return instantiate(key, var_hash)
  end
@@ -451,8 +450,8 @@ class ConfigData
  #
  # returns: a pair of strings, the left-hand side and the right-hand side
  # of the =, minus the [white space] in the places shown above
-
- def extract_def(line) # string: line from config file
+ # @param line [String] line from config file
+ def extract_def(line)
  unless line =~ /^\s*(\w+)\s*=\s*([^\s].*)$/
  $stderr.puts "Error in experiment file: "
  $stderr.puts "I couldn't analyze the following line: "
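The regular expression in `extract_def` splits a definition line at the first `=` and drops the whitespace around it. The sketch below shows what the two capture groups yield. Only the regexp is copied from the hunk above; the constant `DEFINITION`, the method `split_definition`, and the sample lines are assumptions for illustration. The sketch raises an `ArgumentError` on failure, whereas the original prints to `$stderr`.

```ruby
# Regexp copied from extract_def in the hunk above.
DEFINITION = /^\s*(\w+)\s*=\s*([^\s].*)$/

# Hypothetical stand-in for extract_def: returns [left-hand side,
# right-hand side] of a "name = value" line.
def split_definition(line)
  match = DEFINITION.match(line)
  # Assumption: raise here instead of writing to $stderr.
  raise ArgumentError, "cannot analyze line: #{line}" unless match

  [match[1], match[2]]
end

# Illustrative input; whitespace around "=" is not part of either side.
name, value = split_definition("dbtype   =  mysql")
```

The right-hand side keeps any inner whitespace, so a value made of several words is returned as one string.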
@@ -464,231 +463,8 @@ class ConfigData

  ####
  # access to the object variables
- def get_contents()
+ def get_contents
  return [@features, @feature_types, @list_feature_access]
  end

  end
-
-
- ##############################
- # ConfigFormatelement is an auxiliary class
- # of ConfigData.
- # It keeps track of feature patterns with variables in them
- # that can be instantiated.
-
- class ConfigFormatElement
-
- # new()
- #
- # given a pattern and a list of variable names,
- # analyze the pattern and remember the variable names
- #
- def initialize(string, # string: feature name, may include names of variables.
- # they are included in <>
- variables) # list of variable names that can occur
-
- @variables = variables
-
- # pattern: this is what the 'string' is split into,
- # an array of elements that are either fixed parts or variables.
- # fixed part: pair [item:string, "string"]
- # variable: pair [variable_name:string, "variable"]
- @pattern = Array.new
- state = "out"
- item = ""
-
- # analyze string,
- # split into variables and fixed parts
- string.split(//).each { |char|
-
- case state
- when "in"
- case char
- when "<"
- raise "Duplicate < in " + string
- when ">"
- unless @variables.include? item
- raise "Unknown variable " + item
- end
- @pattern << [item, "variable"]
- item = ""
- state = "out"
- else
- item << char
- state = "in"
- end
-
- when "out"
- case char
- when "<"
- unless item.empty?
- @pattern << [item, "string"]
- item = ""
- end
- state = "in"
- when ">"
- raise "Unexpected > in " + string
- else
- item << char
- state = "out"
- end
-
- else
- raise "Shouldn't be here"
- end
- }
-
- # read through the whole of "string"
- # end state has to be "out"
- unless state == "out"
- raise "Unclosed < in " + string
- end
-
- # last bit still to be recorded?
- unless item.empty?
- @pattern << [item, "string"]
- end
-
- # make regexp for matching this pattern
- @regexp = make_regexp(@pattern)
- end
-
- # instantiate: given pairs of variable names and variable values,
- # instantiate @pattern to a string in which var names are replaced
- # by their values
- #
- # returns: string
- def instantiate(var_hash) # hash variable name(string) => variable value(string)
-
- # instantiate the pattern
- return @pattern.map { |item, string_or_var|
-
- case string_or_var
- when "string"
- item
-
- when "variable"
-
- if var_hash[item].nil?
- raise "Missing variable instantiation: " + item
- end
- var_hash[item]
-
- else
- raise "Shouldn't be here"
- end
- }.join
- end
-
- # match()
- #
- # given a string, try to match it against the @pattern
- # while setting the variables given in 'fillers' to
- # the values given in that hash.
- #
- # returns: if the string matches, a hash variable name => value
- # that includes the fillers given as a parameter as well as
- # values for all other variables mentioned in @pattern,
- # or false if no match.
- def match(string, # a string
- fillers = nil) # hash variable name(string) => value(string)
-
- # have we been given partial info about variables?
- if fillers
- match = make_regexp(@pattern, fillers).match(string)
- # $stderr.print "matching " + make_regexp(@pattern, fillers).source +
- # " against " + string + " "
- # if match.nil?
- # $stderr.puts "no"
- # else
- # $stderr.puts "yes"
- # end
- else
- match = @regexp.match(string)
- end
-
- if match.nil?
- # no match via the regular expression
- return false
- end
-
- # regular expression matched.
- # construct return value in hash
- # retv: variable name(string) => value(string)
- retv = Hash.new()
- if fillers
- # include given fillers in retv hash
- fillers.each_pair { |name, val| retv[name] = val }
- end
-
- # now put values for other variables in @pattern into retv
- index = 1
- @pattern.to_a.select { |item, string_or_var|
- string_or_var == "variable"
- }.select { |item, string_or_var|
- fillers.nil? or
- fillers[item].nil?
- }.each { |item, string_or_var|
- # for all items on the pattern list
- # that are variables and
- # haven't been filled by the "fillers" list already:
- # fill from matches
-
- if match[index].nil?
- raise "Match, but not enough matched elements? Strange."
- end
-
- if retv[item].nil?
- retv[item] = match[index]
- else
- unless retv[item] == match[index]
- return false
- end
- end
-
- index += 1
- }
-
- return retv
- end
-
- # used_variables
- #
- # returns: an array of variable names used in @pattern
- def used_variables()
- return @pattern.select { |item, string_or_var|
- string_or_var == "variable"
- }.map { |item, string_or_var| item}
- end
-
- ####################
- private
-
- # make_regexp:
- # make regular expression from a pattern
- # together with some variable fillers
- #
- # returns: Regexp object
- def make_regexp(pattern, # array of pairs [string, "string"] or [string, "variable"]
- fillers = nil) # hash variable name(string) => value(string)
- return (Regexp.new "^" +
- pattern.map { |item, string_or_var|
- case string_or_var
- when "variable"
- if fillers and
- fillers[item]
- Regexp.escape(fillers[item])
- else
- "(.+)"
- end
- when "string"
- Regexp.escape(item)
- else
- raise "Shouldn't be here"
- end
- }.join + "$")
- end
-
- end
-
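The `ConfigFormatElement` class removed here (the file list shows it moving to `config_format_element.rb`) splits a pattern into fixed parts and `<variable>` parts, then either fills the variables in or matches a string against the pattern. A condensed sketch of that idea follows. It is not the gem's implementation: the class name `PatternSketch`, the variable names `exp_ID` and `split`, and the sample strings are assumptions. It uses `String#scan` instead of the character-by-character state machine, so it does not report unclosed or stray angle brackets, and it leaves out the `fillers` argument and the repeated-variable check of the original `match`.

```ruby
# Hypothetical, condensed stand-in for ConfigFormatElement.
class PatternSketch
  def initialize(string, variables)
    # Split the pattern into [name, :variable] and [text, :string] pairs.
    @pattern = string.scan(/<([^<>]*)>|([^<>]+)/).map do |var, text|
      if var
        raise "Unknown variable #{var}" unless variables.include?(var)

        [var, :variable]
      else
        [text, :string]
      end
    end
  end

  # Replace every variable by its value from var_hash.
  def instantiate(var_hash)
    @pattern.map do |item, kind|
      next item if kind == :string

      var_hash.fetch(item) { raise "Missing variable instantiation: #{item}" }
    end.join
  end

  # Return a hash variable name => matched value, or false if no match.
  def match(string)
    source = @pattern.map do |item, kind|
      kind == :variable ? "(.+)" : Regexp.escape(item)
    end.join
    found = Regexp.new("^#{source}$").match(string)
    return false unless found

    names = @pattern.select { |_, kind| kind == :variable }.map(&:first)
    names.zip(found.captures).to_h
  end
end

# Illustrative pattern and values (assumed, not taken from the gem).
element = PatternSketch.new("<exp_ID>_<split>.txt", ["exp_ID", "split"])
filename = element.instantiate("exp_ID" => "e1", "split" => "train")
bindings = element.match("e1_train.txt")
```

As in the original `make_regexp`, fixed parts are passed through `Regexp.escape`, so characters such as `.` in the pattern are matched literally.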