shalmaneser-rosy 1.2.0.rc4 → 1.2.rc5
- checksums.yaml +4 -4
- data/README.md +47 -18
- data/bin/rosy +14 -7
- data/lib/rosy/FailedParses.rb +22 -20
- data/lib/rosy/FeatureInfo.rb +35 -31
- data/lib/rosy/GfInduce.rb +132 -130
- data/lib/rosy/GfInduceFeature.rb +86 -68
- data/lib/rosy/InputData.rb +59 -55
- data/lib/rosy/RosyConfusability.rb +47 -40
- data/lib/rosy/RosyEval.rb +55 -55
- data/lib/rosy/RosyFeatureExtractors.rb +295 -290
- data/lib/rosy/RosyFeaturize.rb +54 -67
- data/lib/rosy/RosyInspect.rb +52 -50
- data/lib/rosy/RosyIterator.rb +73 -67
- data/lib/rosy/RosyPhase2FeatureExtractors.rb +48 -48
- data/lib/rosy/RosyPruning.rb +39 -31
- data/lib/rosy/RosyServices.rb +116 -115
- data/lib/rosy/RosySplit.rb +55 -53
- data/lib/rosy/RosyTask.rb +7 -3
- data/lib/rosy/RosyTest.rb +174 -191
- data/lib/rosy/RosyTrain.rb +46 -50
- data/lib/rosy/RosyTrainingTestTable.rb +101 -99
- data/lib/rosy/TargetsMostFrequentFrame.rb +13 -9
- data/lib/rosy/{AbstractFeatureAndExternal.rb → abstract_feature_extractor.rb} +22 -97
- data/lib/rosy/abstract_single_feature_extractor.rb +52 -0
- data/lib/rosy/external_feature_extractor.rb +35 -0
- data/lib/rosy/opt_parser.rb +231 -201
- data/lib/rosy/rosy.rb +63 -64
- data/lib/rosy/rosy_conventions.rb +66 -0
- data/lib/rosy/rosy_error.rb +15 -0
- data/lib/rosy/var_var_restriction.rb +16 -0
- data/lib/shalmaneser/rosy.rb +1 -0
- metadata +26 -19
- data/lib/rosy/ExternalConfigData.rb +0 -58
- data/lib/rosy/View.rb +0 -418
- data/lib/rosy/rosy_config_data.rb +0 -121
- data/test/frprep/test_opt_parser.rb +0 -94
- data/test/functional/functional_test_helper.rb +0 -58
- data/test/functional/test_fred.rb +0 -47
- data/test/functional/test_frprep.rb +0 -99
- data/test/functional/test_rosy.rb +0 -40
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz:
- data.tar.gz:
+ metadata.gz: 35508aa71aef19017118cbe0bafc4f76f7223844
+ data.tar.gz: 993e563615c38a29c70de52f9f95fb27145fe535
  SHA512:
- metadata.gz:
- data.tar.gz:
+ metadata.gz: e08f84cf7a13dda90f423cdc8f611a3cfb87fa082f2fe349be4d4b6bf3adc9f5b7653f4892dd037cab32e6ae695195bc54995640bdc222c366942af110a95c72
+ data.tar.gz: d6cdbb894fd1ab32a5e6a02e2785d31401dac8fa018facdecc6e380ea7bb2a8f37a78565afc435cac16ba1f1b6f91f4e2285c9490f7f533e904dc03e78919e97
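The digests recorded in `checksums.yaml` can be compared against a downloaded archive. The following is a minimal sketch, assuming Ruby's standard `digest` library; the helper name `checksum_matches?` and the sample bytes are illustrative and not part of the gem.

```ruby
require 'digest/sha1'
require 'digest/sha2'

# Hypothetical helper: compare the hex digest of some bytes with an expected value.
# `algorithm` is :sha1 or :sha512, mirroring the two sections of checksums.yaml.
def checksum_matches?(bytes, expected_hex, algorithm: :sha512)
  digest_class = { sha1: Digest::SHA1, sha512: Digest::SHA512 }.fetch(algorithm)
  digest_class.hexdigest(bytes) == expected_hex.strip.downcase
end

# Usage with in-memory sample data; a real archive would be read with File.binread.
sample = 'abc'
puts checksum_matches?(sample, 'a9993e364706816aba3e25717850c26c9cd0d89d', algorithm: :sha1)
```

Comparing both the SHA1 and the SHA512 entries guards against a truncated or tampered download, since each digest is computed over the whole file.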
data/README.md
CHANGED
@@ -1,4 +1,4 @@
- #
+ # SHALMANESER

  [RubyGems](http://rubygems.org/gems/shalmaneser) |
  [Shalmanesers Project Page](http://bu.chsta.be/projects/shalmaneser/) |
@@ -7,9 +7,9 @@


  [![Gem Version](https://img.shields.io/gem/v/shalmaneser.svg")](https://rubygems.org/gems/shalmaneser)
- [![Gem Version](https://img.shields.io/gem/v/frprep.svg")](https://rubygems.org/gems/
- [![Gem Version](https://img.shields.io/gem/v/fred.svg")](https://rubygems.org/gems/fred)
- [![Gem Version](https://img.shields.io/gem/v/rosy.svg")](https://rubygems.org/gems/rosy)
+ [![Gem Version](https://img.shields.io/gem/v/frprep.svg")](https://rubygems.org/gems/shalmaneser-prep)
+ [![Gem Version](https://img.shields.io/gem/v/fred.svg")](https://rubygems.org/gems/shalmaneser-fred)
+ [![Gem Version](https://img.shields.io/gem/v/rosy.svg")](https://rubygems.org/gems/shalmaneser-rosy)


  [![License GPL 2](http://img.shields.io/badge/License-GPL%202-green.svg)](http://www.gnu.org/licenses/gpl-2.0.txt)
@@ -17,12 +17,44 @@
  [![Code Climate](https://img.shields.io/codeclimate/github/arbox/shalmaneser.svg")](https://codeclimate.com/github/arbox/shalmaneser)
  [![Dependency Status](https://img.shields.io/gemnasium/arbox/shalmaneser.svg")](https://gemnasium.com/arbox/shalmaneser)

+ [SHALMANESER](http://www.coli.uni-saarland.de/projects/salsa/shal/) is a SHALlow seMANtic parSER.
+
+ The name Shalmaneser is borrowed from John Brunner. He describes in his novel
+ "Stand on Zanzibar" an all knowing supercomputer baptized Shalmaneser.
+
+ Shalmaneser also has other origins like the king [Shalmaneser III](https://en.wikipedia.org/wiki/Shalmaneser_III).
+
+ > "SCANALYZER is the one single, the ONLY study of the news in depth
+ > that’s processed by General Technics’ famed computer Shalmaneser,
+ > who sees all, hears all, knows all save only that which YOU, Mr. and Mrs.
+ > Everywhere, wish to keep to yourselves." <br/>
+ > John Brunner (1968) "Stand on Zanzibar"
+
+ > But Shalmaneser is a Micryogenic® computer bathed in liquid helium and it’s cold in his vault. <br/>
+ > John Brunner (1968) "Stand on Zanzibar"
+
+ > “Of course not. Shalmaneser’s main task is to achieve the impossible again, a routine undertaking here at GT.” <br/>
+ > John Brunner (1968) "Stand on Zanzibar"
+
+ > “They programmed Shalmaneser with the formula for this stiffener, see, and…” <br/>
+ > John Brunner (1968) "Stand on Zanzibar"
+
+ > What am I going to do now? <br/>
+ > “All right, Shalmaneser!” <br/>
+ > John Brunner (1968) "Stand on Zanzibar"
+
+ > Shalmaneser is a Micryogenic® computer bathed in liquid helium and there’s no sign of Teresa. <br/>
+ > John Brunner (1968) "Stand on Zanzibar"
+
+ > Bathed in his currents of liquid helium, self-contained, immobile, vastly well informed by every mechanical sense: Shalmaneser. <br/>
+ > John Brunner (1968) "Stand on Zanzibar"
+
  ## Description

  Please be careful, the whole thing is under construction! For now Shalmaneser it not intended to run on Windows systems since it heavily uses system calls for external invocations.
  Current versions of Shalmaneser have been tested on Linux only (other *NIX testers are welcome!).

- Shalmaneser is a supervised learning toolbox for shallow semantic parsing, i.e. the automatic assignment of semantic classes and roles to text. This technique is often called SRL (Semantic Role Labelling). The system was developed for Frame Semantics; thus we use Frame Semantics terminology and call the classes frames and the roles frame elements. However, the architecture is reasonably general, and with a certain amount of adaption, Shalmaneser should be usable for other paradigms (e.g., PropBank roles) as well. Shalmaneser caters both for end users, and for researchers.
+ Shalmaneser is a supervised learning toolbox for shallow semantic parsing, i.e. the automatic assignment of semantic classes and roles to text. This technique is often called [SRL](https://en.wikipedia.org/wiki/Semantic_role_labeling) (Semantic Role Labelling). The system was developed for Frame Semantics; thus we use Frame Semantics terminology and call the classes frames and the roles frame elements. However, the architecture is reasonably general, and with a certain amount of adaption, Shalmaneser should be usable for other paradigms (e.g., PropBank roles) as well. Shalmaneser caters both for end users, and for researchers.

  For end users, we provide a simple end user mode which can simply apply the pre-trained classifiers
  for [English](http://www.coli.uni-saarland.de/projects/salsa/shal/index.php?nav=download) (FrameNet 1.3 annotation / Collins parser)
@@ -34,32 +66,27 @@ For researchers interested in investigating shallow semantic parsing, our system

  ## Origin

- The original version of Shalmaneser was written by Sebastian Padó, Katrin Erk and others during their work in the SALSA Project.
+ The original version of Shalmaneser was written by Sebastian Padó, Katrin Erk, Alexander Koller, Ines Rehbein, Aljoscha Burchardt and others during their work in the SALSA Project.

  You can find original versions of Shalmaneser up to ``1.1`` on the [SALSA](http://www.coli.uni-saarland.de/projects/salsa/shal/) project page.

  ## Publications on Shalmaneser

  - K. Erk and S. Padó: Shalmaneser - a flexible toolbox for semantic role assignment. Proceedings of LREC 2006, Genoa, Italy. [Click here for details](http://www.nlpado.de/~sebastian/pub/papers/lrec06_erk.pdf).
+
  - TODO: add other works

  ## Documentation

- The project documentation can be found in our [doc](https://github.com/arbox/shalmaneser/blob/
+ The project documentation can be found in our [doc](https://github.com/arbox/shalmaneser/blob/master/doc/index.md) folder.

  ## Development

- We are working now on
-
- - ``dev`` - our development branch incorporating actual changes, for now pointing to ``1.2``;
-
- - ``1.2`` - intermediate target;
-
- - ``2.0`` - final target.
+ We are working now only on the `master` branch. For different intermediate versions see corresponding tags.

  ## Installation

- See the installation instructions in the [doc](https://github.com/arbox/shalmaneser/blob/
+ See the installation instructions in the [doc](https://github.com/arbox/shalmaneser/blob/master/doc/index.md#installation) folder.

  ### Tokenizers

@@ -75,7 +102,7 @@ See the installation instructions in the [doc](https://github.com/arbox/shalmane

  ### Parsers

- - [BerkeleyParser](https://
+ - [BerkeleyParser](https://github.com/slavpetrov/berkeleyparser)
  - [Stanford Parser](http://nlp.stanford.edu/software/lex-parser.shtml)
  - [Collins Parser](http://www.cs.columbia.edu/~mcollins/code.html)

@@ -86,8 +113,10 @@ See the installation instructions in the [doc](https://github.com/arbox/shalmane

  ## License

-
+ Shalmaneser is released under the `GPL v. 2.0` license as of the initial authors.
+
+ For a local copy of the full license text see the [LICENSE](LICENSE.md) file.

  ## Contributing

-
+ Feel free to contact me via Github. Open an issue if you see problems or need help.
data/bin/rosy
CHANGED
@@ -1,17 +1,24 @@
  #!/usr/bin/env ruby
  # -*- encoding: utf-8 -*-

- #
+ # @author Andrei Beliankou
+ # 2011-11-14
  # rosy.rb
- # KE, SP April 05
+ # @author KE, SP April 05
  #
  # Main file of the Rosy role assignment system.

-
- require 'rosy/opt_parser'
  require 'rosy/rosy'
+ require 'rosy/opt_parser'

-
+ begin
+ options = ::Shalmaneser::Rosy::OptParser.parse(ARGV)

- rosy = Rosy::Rosy.new(options)
-
+ rosy = ::Shalmaneser::Rosy::Rosy.new(options)
+ # @todo Rename the assing method.
+ rosy.assign
+ rescue => e
+ $stderr.puts 'Rosy cannot serve you!'
+ $stderr.puts e.message, e.backtrace
+ exit(1)
+ end
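The new `bin/rosy` wraps option parsing and the run itself in a single `begin`/`rescue`, reports the failure on standard error and exits with status 1. The same pattern can be sketched in isolation; `run_guarded` is a hypothetical helper, not part of the gem, and it returns the status instead of calling `exit` so that it can be exercised without terminating the process.

```ruby
# Hypothetical stand-in for the parse-then-run sequence in bin/rosy.
# Returns an exit status instead of calling exit, so callers decide when to stop.
def run_guarded(err = $stderr)
  yield
  0
rescue StandardError => e
  # A bare `rescue => e`, as in the diff, also catches StandardError and its subclasses.
  err.puts 'Rosy cannot serve you!'
  err.puts e.message, e.backtrace
  1
end

# Usage: the block stands for option parsing plus the actual run.
status = run_guarded { raise ArgumentError, 'no experiment file given' }
puts "exit status: #{status}"
```

Keeping the `exit` call out of the helper means a real entry point would finish with `exit(status)`, while tests can inspect the returned value and the captured error output.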
data/lib/rosy/FailedParses.rb
CHANGED
@@ -2,23 +2,24 @@
  #
  # SP May 05
  #
- # Administration of information about failed parses;
+ # Administration of information about failed parses;
  # - sentence ID
  # - frame
  # - missed FE markables
  #
- # this class is pretty much a gloriefied hash table with methods to
+ # this class is pretty much a gloriefied hash table with methods to
  # - read FailedParses from a file and to write them to a file
  # - access info in a frame-specific way
-
+ module Shalmaneser
+ module Rosy
  class FailedParses
-
+
  ###
  # initialize
  #
  # nothing much happens here
- def initialize
- @failed_parses =
+ def initialize
+ @failed_parses = []
  end

  ###
@@ -28,7 +29,7 @@ class FailedParses
  # - its sentence id (any object)
  # - its frame (String)
  # - its FE list (String Array)
-
+
  def register(sent_id, # object
  frame, # string: frame name
  target, # string?
@@ -54,8 +55,8 @@ class FailedParses
  unless train_percentage.class < Integer and train_percentage >= 0 and train_percentage <= 100
  raise "Need Integer between 0 and 100 as training percentage."
  end
- train_failed = FailedParses.new
- test_failed = FailedParses.new
+ train_failed = FailedParses.new
+ test_failed = FailedParses.new
  @failed_parses.each {|sent_id,frame,target,target_pos,fe_list|
  if rand(100) > train_percentage
  test_failed.register(sent_id,frame,target,target_pos,fe_list)
@@ -70,17 +71,17 @@ class FailedParses
  # Access information
  #
  # failed_sent: number of failed sentences
- # failed_fes: Hash that maps FE names [String] onto numbers of failed FEs [Int]
+ # failed_fes: Hash that maps FE names [String] onto numbers of failed FEs [Int]
  #
- # optional parameters: frame, target, target_pos : if not specified or nil, marginal
+ # optional parameters: frame, target, target_pos : if not specified or nil, marginal
  # frequencies are counted (sum over all values)
-

-
+
+ def failed_sent(frame_spec=nil,target_spec=nil,target_pos_spec=nil)
  counter = 0
  @failed_parses.each {|sent_id,frame,target,target_pos,fe_list|
- if ((frame_spec.nil? or frame_spec == frame) and
- (target_spec.nil? or target_spec == target) and
+ if ((frame_spec.nil? or frame_spec == frame) and
+ (target_spec.nil? or target_spec == target) and
  (target_pos_spec.nil? or target_pos_spec == target_pos))
  counter += 1
  end
@@ -91,8 +92,8 @@ class FailedParses
  def failed_fes(frame_spec=nil,target_spec=nil,target_pos_spec=nil)
  fe_hash = Hash.new(0)
  @failed_parses.each {|sent_id,frame,target,target_pos,fe_list|
- if ((frame_spec.nil? or frame_spec == frame) and
- (target_spec.nil? or target_spec == target) and
+ if ((frame_spec.nil? or frame_spec == frame) and
+ (target_spec.nil? or target_spec == target) and
  (target_pos_spec.nil? or target_pos_spec == target))
  fe_list.each {|fe_label|
  fe_hash[fe_label] += 1
@@ -102,7 +103,7 @@ class FailedParses
  return fe_hash
  end

-
+
  ###
  # Marshalling:
  #
@@ -125,6 +126,7 @@ class FailedParses
  $stderr.puts "I'll assume that there are no failed parses."
  end
  end
-
-
+
+ end
+ end
  end
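The counting methods `failed_sent` and `failed_fes` treat a `nil` restriction as a wildcard, so omitted arguments yield marginal counts. A minimal sketch of that filter on plain arrays follows. The helper names and the sample records are illustrative assumptions; only the record layout `[sent_id, frame, target, target_pos, fe_list]` is taken from the diff. Both helpers compare the part-of-speech restriction against `target_pos`, as `failed_sent` does.

```ruby
# nil means "do not restrict on this field".
def spec_matches?(spec, value)
  spec.nil? || spec == value
end

# Number of failed sentences matching the given restrictions.
def count_failed(records, frame: nil, target: nil, target_pos: nil)
  records.count do |_id, f, t, pos, _fes|
    spec_matches?(frame, f) && spec_matches?(target, t) && spec_matches?(target_pos, pos)
  end
end

# Hash mapping FE labels to the number of matching failed parses they occur in.
def count_failed_fes(records, frame: nil, target: nil, target_pos: nil)
  records.each_with_object(Hash.new(0)) do |(_id, f, t, pos, fes), counts|
    next unless spec_matches?(frame, f) && spec_matches?(target, t) && spec_matches?(target_pos, pos)

    fes.each { |fe| counts[fe] += 1 }
  end
end

# Made-up sample records in the layout [sent_id, frame, target, target_pos, fe_list].
failed = [
  ['s1', 'Motion', 'run',  'verb', %w[Theme Goal]],
  ['s2', 'Motion', 'walk', 'verb', %w[Theme]],
  ['s3', 'Speech', 'say',  'verb', %w[Speaker]]
]

puts count_failed(failed, frame: 'Motion')
puts count_failed_fes(failed, frame: 'Motion').inspect
```

Using `Hash.new(0)` mirrors the original `fe_hash`, so labels that never occur simply read as zero.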
data/lib/rosy/FeatureInfo.rb
CHANGED
@@ -1,11 +1,13 @@
- require '
+ require 'ruby_class_extensions'

+ module Shalmaneser
+ module Rosy
  class RosyFeatureInfo
  ###
  # class variable:
  # list of all known extractors
  # add to it using add_feature()
- @@extractors =
+ @@extractors = []

  # boolean. set to true after warning messages have been given once
  @@warned = false
@@ -15,21 +17,21 @@ class RosyFeatureInfo
  def RosyFeatureInfo.add_feature(class_name) # Class object
  @@extractors << class_name
  end
-
+
  ###
  def initialize(exp)
-
+
  ##
  # make list of extractors that are
  # either required by the user
  # or needed by the system
- @current_extractors =
+ @current_extractors = []
  @exp = exp

  # user-chosen extractors:
  # returns array of pairs [feature group designator(string), options(array:string)]
  exp.get_lf("feature").each { |extractor_name, options|
- extractor = @@extractors.detect { |e| e.designator
+ extractor = @@extractors.detect { |e| e.designator == extractor_name }
  unless extractor
  # no extractor found matching the given designator
  unless @@warned
@@ -69,13 +71,13 @@ class RosyFeatureInfo
  # extractors needed by the system
  @@extractors.select { |e|
  # select admin features and gold feature
- ["admin", "gold"].include? e.feature_type
+ ["admin", "gold"].include? e.feature_type
  }.each { |extractor|
-
+
  # if we have already added that extractor, remove it
  # and add it with our own options
- @current_extractors.delete_if { |descr| descr["extractor"].designator
-
+ @current_extractors.delete_if { |descr| descr["extractor"].designator == extractor.designator }
+
  @current_extractors << {
  "extractor"=> extractor,
  "step" => "dontuse"
@@ -86,14 +88,14 @@ class RosyFeatureInfo
  # (i.e. check dependencies)

  allstep_extractors = @current_extractors.find_all {|e_hash| e_hash["step"].nil?
- }.map { |e| e["extractor"].designator
+ }.map { |e| e["extractor"].designator }
  argrec_extractors = @current_extractors.find_all {|e_hash| e_hash["step"].nil? or e_hash["step"] == "argrec"
- }.map { |e| e["extractor"].designator
+ }.map { |e| e["extractor"].designator }
  arglab_extractors = @current_extractors.find_all {|e_hash| e_hash["step"].nil? or e_hash["step"] == "arglab"
- }.map { |e| e["extractor"].designator
+ }.map { |e| e["extractor"].designator }
  onestep_extractors = @current_extractors.find_all {|e_hash| e_hash["step"].nil? or e_hash["step"] == "onestep"
- }.map { |e| e["extractor"].designator
-
+ }.map { |e| e["extractor"].designator }
+
  @current_extractors.delete_if {|extractor_hash|
  case extractor_hash["step"]
  when nil
@@ -104,7 +106,7 @@ class RosyFeatureInfo
  computable = extractor_hash["extractor"].is_computable(arglab_extractors)
  when "onestep"
  computable = extractor_hash["extractor"].is_computable(onestep_extractors)
- when "dontuse"
+ when "dontuse"
  # either an admin feature or a user feature not to be used this time
  computable = true
  end
@@ -113,7 +115,7 @@ class RosyFeatureInfo
  false # i.e. don't delete
  else
  unless @@warned
- $stderr.puts "Warning: Feature extractor #{extractor_hash["extractor"].designator
+ $stderr.puts "Warning: Feature extractor #{extractor_hash["extractor"].designator} cannot be computed: skipping."
  end
  true
  end
@@ -126,17 +128,17 @@ class RosyFeatureInfo
  # "step" -> string: argrec, arglab, onestep, or nil
  # "type" -> string
  # "phase" -> string: phase 1 or phase 2
- @features =
+ @features = []
  @current_extractors.each { |descr|
  extractor = descr["extractor"]
  extractor.feature_names.each { |feature_name|
  @features << {
  "feature_name" => feature_name,
- "sql_type" => extractor.sql_type
- "is_index" => extractor.info
+ "sql_type" => extractor.sql_type,
+ "is_index" => extractor.info.include?("index"),
  "step" => descr["step"],
- "type" => extractor.feature_type
- "phase" => extractor.phase
+ "type" => extractor.feature_type,
+ "phase" => extractor.phase
  }
  }
  }
@@ -152,7 +154,7 @@ class RosyFeatureInfo
  # all features to be computed, with their SQL column formats
  def get_column_formats(phase = nil) # string: phase 1 or phase 2
  return @features.select { |feature_descr|
- phase.nil? or
+ phase.nil? or
  feature_descr["phase"] == phase
  }.map { |feature_descr|
  [feature_descr["feature_name"], feature_descr["sql_type"]]
@@ -166,7 +168,7 @@ class RosyFeatureInfo
  # all features to be computed
  def get_column_names(phase = nil) # string: phase 1 or phase 2
  return @features.select { |feature_descr|
- phase.nil? or
+ phase.nil? or
  feature_descr["phase"] == phase
  }.map { |feature_descr|
  feature_descr["feature_name"]
@@ -179,9 +181,9 @@ class RosyFeatureInfo
  # returns a list of feature (column) names as Strings
  # consisting of all features that have been requested as index features
  # in the experiment file or in the list of @@all_features_we_have above
- def get_index_columns
+ def get_index_columns
  return @features.select { |feature_descr|
- feature_descr["is_index"]
+ feature_descr["is_index"]
  }.map {|feature_descr|
  feature_descr["feature_name"]
  }
@@ -209,13 +211,13 @@ class RosyFeatureInfo
  }.map { |feature_descr|
  # use just the names of the features
  feature_descr["feature_name"]
- }
+ }
  end

  ###
  # get_extractor_objects
  #
- # returns two lists of feature extractor objects,
+ # returns two lists of feature extractor objects,
  # covering all features of the given phase:
  # the first list contains RosyFeatureExtractor extractors,
  # the second list contains the others.
@@ -227,16 +229,18 @@ class RosyFeatureInfo

  return @current_extractors.select { |descr|
  # select extractors of the right phase
- descr["extractor"].phase
+ descr["extractor"].phase == phase
  }.map { |descr|

  # make objects from extractor classes
  descr["extractor"].new(@exp, interpreter_class)
  }.distribute { |extractor_obj|
- # distribute extractors in two bins:
+ # distribute extractors in two bins:
  # first, rosy extractors
  # second, others
- extractor_obj.class.info
+ extractor_obj.class.info.include? "rosy"
  }
  end
  end
+ end
+ end
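`RosyFeatureInfo` looks extractors up by their `designator` and finally splits them into a "rosy" bin and an "others" bin. The sketch below illustrates both steps with core `Enumerable` methods. It rests on assumptions: `distribute` comes from `ruby_class_extensions`, which is not shown in this diff, so `Enumerable#partition` stands in for it here; the two extractor classes and the helper `find_extractor` are made up, and only the method names `designator` and `info` are taken from the diff.

```ruby
# Hypothetical extractor classes; only `designator` and `info` mirror names from the diff.
class PathExtractor
  def self.designator
    'path'
  end

  def self.info
    ['rosy']
  end
end

class ExternalScoreExtractor
  def self.designator
    'ext_score'
  end

  def self.info
    ['external']
  end
end

# Look up an extractor class by designator; returns nil when none matches.
def find_extractor(registry, name)
  registry.detect { |e| e.designator == name }
end

registry = [PathExtractor, ExternalScoreExtractor]

# Stand-in for `distribute`: first bin holds rosy extractors, second bin the rest.
rosy_bin, other_bin = registry.partition { |e| e.info.include?('rosy') }

puts find_extractor(registry, 'path').inspect
puts rosy_bin.inspect
puts other_bin.inspect
```

Because `detect` returns `nil` for an unknown designator, a caller can branch on the result to emit the warning that the original code prints once.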