svmlab 1.0.0
- data/lib/README +219 -0
- data/lib/arraymethods.rb +87 -0
- data/lib/irb.history +100 -0
- data/lib/libsvmdata.rb +122 -0
- data/lib/svmfeature.rb +337 -0
- data/lib/svmfeature2.rb +98 -0
- data/lib/svmlab-config.rb +215 -0
- data/lib/svmlab-irb.rb +98 -0
- data/lib/svmlab-optim.rb +556 -0
- data/lib/svmlab-plot.rb +170 -0
- data/lib/svmlab.rb +365 -0
- data/lib/svmprediction.rb +176 -0
- data/lib/test.cfg +12 -0
- data/lib/test.rb +5 -0
- data/lib/testdata +3 -0
- data/lib/texput.log +20 -0
- data/lib/tmp.irb.rc +81 -0
- data/lib/v6.cfg +124 -0
- metadata +102 -0
data/lib/README
ADDED
@@ -0,0 +1,219 @@
Contents:
1) Install
2) Configuration
3) Feature methods

--------------------------------------------------------------------------------
1) HOW TO INSTALL
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
2) CONFIGURATION
--------------------------------------------------------------------------------

------------------------------------------------------------
CONFIGURATION OF SUPPORT VECTOR MACHINE
------------------------------------------------------------
SVM:
  C: 1
  e: 0.3
  g: 0.05
  Optimization:
    Method: patternsearch
    Nhalf: 2
  Scale:
    Default: std
---------------------------------------------------------------------------------
The configuration under the 'SVM' entry has three parts, as seen above.
1) Parameters to the SVM algorithm.
   Possible entries are:
   C : The penalty factor for misclassification.
   g : Gamma for the RBF kernel exp(- gamma * |x-y|^2)
   e : Epsilon for epsilon tube regression.
2) Optimization configuration.
   Any parameter given in the configuration may be optimized. The
   optimization method is configured here.
   Method : Name of the optimization method to be used. Currently only the
            patternsearch method is implemented.
   Nhalf  : How many times to halve the patternsearch step size before
            giving up the optimization. It is a trade-off between
            execution time and detail of optimization. A reasonable value
            for Nhalf is 2.
   ----------------------------------------------------------------------
   | For a parameter to be optimized, it needs to have optimization
   | instructions added to its setting. Pattern search instructions may
   | be added to any numerical parameter by appending text following
   | this syntax:
   |     step exp<stepsize>
   |   OR
   |     step <stepsize>
   | which should appear AFTER any parameter in the configuration to be
   | optimized. If 'step exp1.0' is given, it means that the pattern
   | search should use an initial step size of 1 on the 10-logarithmic
   | scale, i.e. it tries values at 0.1 and 10 times the original
   | value. Most parameters (including the SVM parameters) are best
   | optimized in the log scale. For example, to optimize SVM parameters
   | we may write
   |     C: 1 step exp1.0
   |     g: 0.05 step exp1.0
   -------------------------------------------------------------------
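The exponential step behaviour can be sketched in a few lines of Ruby; the helper name `probe` is illustrative only, not part of SVMLab's API:

```ruby
# Illustrative sketch of one log-scale pattern-search probe. With
# 'step exp1.0' the search tries the current value divided and
# multiplied by 10**1.0; each halving of the step (see Nhalf above)
# narrows the probes around the current best value.
def probe(value, expstep)
  [value / 10.0**expstep, value, value * 10.0**expstep]
end

p probe(1.0, 1.0)   # => [0.1, 1.0, 10.0]
p probe(1.0, 0.5)   # after one halving: roughly 0.316 and 3.16
```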
3) Normalization factors for features.
   The SVM does best when features are normalized in one way or another.
   The standard way of normalizing is to calculate the so-called z-score
   for each feature (subtracting the mean value and dividing by the
   standard deviation), but depending on the feature, other scalings may
   be better. All scalings described below are applied after each feature
   has been centered around zero (using the mean value of each feature on
   the dataset given). For scaling after the centering, there are several
   possibilities:
   A) No Scale entry at all in the configuration.
      All features will be given to the SVM without rescaling.
   B) Only Default given.
      All features will be scaled according to the scale factor given
      for Default.
   C) Specific features' scaling is also given.
      A specific scaling factor will be applied to the features given.
      All other features will have the Default scaling factor (or 1 if
      no Default is given). A specific feature scaling is given in one
      of the following syntaxes:
          featurename: scaling factor
        OR
          featurename:
            index1: scaling factor
            ...
            indexN: scaling factor
      Note that 'index' refers to an index in a multidimensional feature,
      counting from 0. Also note that if scaling is given by the first
      syntax, the scaling will be applied to all dimensions of the
      feature.
   A scaling factor can be one of the following:
   max  : The scaling factor is set so that the max (absolute) value of
          the (centered) feature is 1 on the dataset given.
   std  : The scaling factor is set so that the standard deviation is 1.
          This is the most common normalization of features.
   <num>: A specific numeric value.
   Note that optimization instructions may also be appended to scaling
   factors (an extreme example would be to add them to the Default
   scaling factor, which gives a lengthy optimization calculation and
   probably "overtraining" on how to scale features).
---------------------------------------------------------------------------------
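The 'std' scaling described above amounts to the classic z-score. A minimal sketch (the method name `zscore` is illustrative, not an SVMLab function):

```ruby
# Center each value on the feature's mean, then divide by the
# standard deviation, so the scaled feature has mean 0 and
# standard deviation 1.
def zscore(values)
  mean = values.sum / values.size.to_f
  sd   = Math.sqrt(values.map { |v| (v - mean)**2 }.sum / values.size)
  values.map { |v| (v - mean) / sd }
end

scaled = zscore([1.0, 2.0, 3.0])
# 'scaled' now has mean 0 and standard deviation 1.
```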
------------------------------------------------------------
CONFIGURATION OF FEATURES
------------------------------------------------------------
Feature:
  Features:
    - <targetfeature>
    - <feature1>
    - <feature2>
    ...
    - <featureN>
  BaseDir: <base directory>
  DataSet: <file>
           OR
           <file prefix>
  Groups: <range (n1..n2)>
          OR
          <file prefix>
  Methods: <path of Ruby code file>
  <featureX>:
    HomeDir: <home directory>
    Method: <method name>
    Dimensions: <number of dimensions in this feature>
    <Further specific configuration for this feature>
  ...

-----------------------------------------------------------------------------------
| A more detailed description of the configuration entries for the
| feature configuration.
| ---------------------------------------------------------------------------------
| Features  : This is an array giving all features to be used. The first feature
|             in the array is the feature to be predicted from the other features.
| BaseDir   : A directory from which all other directory names are derived.
| DataSet   : A file or a file prefix (such that 'dataset' points to dataset1,
|             dataset2, etc.). The file(s) contain the names of all samples in
|             the dataset used to build an SVM model in SVMLab. The file(s)
|             should be simple lists of sample names, each name on its own
|             line. Sample names should be without blank spaces.
|             Example : ?
| Groups    : These are groups made for crossvalidation.
|             If Groups is not set, leave-one-out crossvalidation will be used.
|             Two possibilities:
|             A) The range (n1..n2) groups together all samples whose names'
|                substrings indexed by (n1..n2) correspond. E.g. the range
|                (0..2) would group together 'tommy' and 'tomas' while putting
|                'tony' in a separate group.
|             B) A file prefix; a group is made from the sample names found
|                in each single file. See DataSet above.
| Methods   : The path to a file containing Ruby code implementing methods used
|             for feature calculation (see specific feature configurations
|             below).
| <feature> : Each feature given in the array Features (see above) needs its
|             own configuration. The following entries should be set for all
|             features:
|             HomeDir    : This may be set as an absolute value or relative to
|                          BaseDir (see above). If HomeDir is not given, it will
|                          be set to BaseDir/featurename.
|             Method     : The function that is called when the feature is
|                          requested. If Method is not given, an attempt is made
|                          to acquire the feature from the database. If that
|                          fails, ERROR is reported.
|             Dimensions : The number of dimensions for this feature. If
|                          Dimensions is not given, it will be assumed to be 1
|                          and only the first value for each example will be
|                          used.
|             <other>    : You may wish to include more configuration for the
|                          feature, and you may add it here. The entire feature
|                          configuration is sent to the feature method (see
|                          Method above), so this is a good place to put
|                          parameters for feature calculations. It is also
|                          possible to add pattern search instructions here, for
|                          parameters that are up for optimization.
-----------------------------------------------------------------------------------

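The Groups range rule can be sketched with Ruby's `group_by`; `group_samples` is an illustrative helper, not SVMLab's implementation:

```ruby
# Samples whose name substrings at the given range coincide end up
# in the same crossvalidation group.
def group_samples(names, range)
  names.group_by { |name| name[range] }.values
end

p group_samples(%w[tommy tomas tony], 0..2)
# => [["tommy", "tomas"], ["tony"]]
```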

--------------------------------------------------------------------------------
3) FEATURE METHODS
--------------------------------------------------------------------------------
As a user of SVMLab you should not have to worry about preparing data for the
SVM experiment. What you should care about instead is implementing methods for
calculating or retrieving features. Unfortunately, all methods have to be
implemented in Ruby, since SVMLab is written in Ruby and executed in the Ruby
environment. Ruby is, however, not a very complicated language, and you may
also use it merely to call a program implemented in any other language by
letting the Ruby method make a call to the terminal shell (see below for an
example).

As you can see in the configuration section, you have to configure a method for
each feature that you use. This section is about how to implement the method
itself. There are only two rules for this - the format of the input and the
format of the output.

1) Input
   SVMLab calls each method with the syntax
       method(Hash cfg, String samplename)
   It is up to the method that you implement to know how to interpret the
   samplename and use the configuration in order to answer the request for a
   feature value.

2) Output
   This should be given in the format
       {samplename => String answer,
        othersample => String otheranswer,
        ...}
   Most important here is that the answer is given in the form of a Hash in
   which one key points to the answer for the sample that the feature was
   requested for. Your method may, however, include other keys in the Hash,
   giving answers for other samples, if that benefits you. For example, if the
   method is a lengthy calculation that would otherwise have to be repeated
   for other samples, it is convenient to hand all answers given by the
   calculation to SVMLab at once. This way, SVMLab only needs to consult its
   database the next time a feature value requiring the same calculation is
   needed.

   Also note that each answer should be given as a String. While the SVM
   expects numeric features, this lets the method give a non-numeric answer,
   which SVMLab interprets as an error message. If the feature is
   multidimensional, the numeric values should be separated by spaces in the
   String.

Example of a method

This is an example of a case where you already have an executable that you call
in the terminal by 'myexecutable -a myargument' :

    def mymethod(cfg, sample)
      ans = `myexecutable -a #{cfg['myargument']}`
      {sample => ans}
    end
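A hedged sketch of the batch-answer idea described under Output: one call computes values for several samples at once and returns each as a two-dimensional, space-separated String. All names and the dummy calculation here are illustrative; only the cfg/sample signature and the Hash-of-Strings return format come from the rules above.

```ruby
# Returns answers for several samples at once; SVMLab can cache the
# extra keys so the calculation is not repeated per sample.
def batch_feature(cfg, sample)
  samples = ['sampleA', 'sampleB', sample].uniq
  samples.inject({}) do |ans, s|
    # Two space-separated dimensions per sample (dummy calculation).
    ans[s] = "#{s.size} #{s.size * 2}"
    ans
  end
end

batch_feature({}, 'sampleC')
# => {"sampleA"=>"7 14", "sampleB"=>"7 14", "sampleC"=>"7 14"}
```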
data/lib/arraymethods.rb
ADDED
@@ -0,0 +1,87 @@
#require 'narray'

class Array
  def sum
    self.inject(0) { |s,i| s + i }
  end

  def mean
    self.sum.to_f / self.size
  end

  #def median
  #  NArray.to_na(self).median
  #end

  # Returns the [lower, upper] percentile bounds of the array.
  def percentile(percent)
    if percent == 0
      [self.min, self.max]
    else
      sorted = self.sort
      lo = (percent/100.0*self.size).round
      # Clamp the upper index so rounding cannot step past the array.
      hi = [((100-percent)/100.0*self.size).round, self.size-1].min
      [ sorted[lo], sorted[hi] ]
    end
  end

  def stddev
    Math.sqrt( self.variance )
  end

  # Mean square deviation from the mean
  def variance
    m = self.mean
    sum = self.inject(0) do |s,i|
      s + (i-m)**2
    end
    sum / self.size.to_f
  end

  # Scalar product
  def *(arg)
    if self.size != arg.size
      raise "Not equal sizes in scalar product"
    end
    self.zip(arg).inject(0.0) do |s,(i,j)|
      s + i * j
    end
  end

  def to_i
    self.map{|i| i.to_i}
  end

  def to_f
    self.map{|i| i.to_f}
  end
end

# Pearson correlation coefficient between two equally sized arrays.
def correlation(x, y)
  xymean = (x * y) / x.size.to_f
  sx = x.stddev
  sy = y.stddev
  (xymean - x.mean * y.mean) / (sx * sy)
end

class Hash
  # Pair up the values of two Hashes on their common keys, returning
  # [x, y] arrays of floats.
  def to_xy(other)
    raise "Please give Hash as argument" if !other.is_a? Hash
    arr = self.keys.inject([]) do |a,k|
      if other[k] and self[k]
        a << [self[k].to_f, other[k].to_f]
      end
      a
    end
    x = arr.map{|i| i[0]}
    y = arr.map{|i| i[1]}
    [ x, y ]
  end

  # Correlation coefficient between two Hashes over their common keys,
  # or nil if they share no keys.
  def cc(other)
    x, y = to_xy(other)
    #puts "CC : N = #{x.size}"
    if x.size > 0
      correlation(x,y)
    else
      nil
    end
  end
end
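A usage sketch of the correlation helper in arraymethods.rb, e.g. for comparing predicted against measured values. The function is inlined here in its covariance form (mathematically equivalent to the E[xy] form above) so the example is self-contained:

```ruby
# Pearson correlation, covariance form.
def correlation(x, y)
  n  = x.size.to_f
  mx = x.sum / n
  my = y.sum / n
  cov = x.zip(y).map { |a, b| (a - mx) * (b - my) }.sum / n
  sx = Math.sqrt(x.map { |a| (a - mx)**2 }.sum / n)
  sy = Math.sqrt(y.map { |b| (b - my)**2 }.sum / n)
  cov / (sx * sy)
end

# Perfectly linearly related series correlate to ~1.0:
puts correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```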
data/lib/irb.history
ADDED
@@ -0,0 +1,100 @@
p.f1
lab = SVMLab.new('/home/fred/DeltaDeltaG/cfg/final.cfg')
p = lab.crossvalidate
p.cc
p.rmsd
p.f1 2
p.f1
p.f1 1
p.f1 1:3
p.f1 3
p.f1 4
p.f1 5
p.f1 0
p.plot
lab = SVMLab.new('/home/fred/DeltaDeltaG/cfg/final.cfg')
p = lab.crossvalidate
p.plot
help f1
help p.f1
pm p
class Fhash < Hash
def to_s
"I'm a Fhash!"
end
end
f = Fhash.new
puts f
exit
lab = SVMLab.new('/home/fred/DeltaDeltaG/cfg/final.cfg')
exit
lab = SVMLab.new('/home/fred/DeltaDeltaG/cfg/final.cfg')
exit
lab = SVMLab.new('/home/fred/DeltaDeltaG/cfg/final.cfg')
exit
lab = SVMLab.new('/home/fred/DeltaDeltaG/cfg/final.cfg')
exit
lab = SVMLab.new('/home/fred/DeltaDeltaG/cfg/final.cfg')
lab = SVMLab.new('testcfg')
exit
lab = SVMLab.new('testcfg')
lab = SVMLab.new('testcfg')
exit
lab = SVMLab.new('testcfg')
exit
lab = SVMLab.new('testcfg')
exit
lab = SVMLab.new('testcfg')
exit
lab = SVMLab.new('testcfg')
a |= 1
a
a = 1
a |= 1
a
a =| 1
a &= 1
a
a &= 2
a
b &= 2
b
a ||= 2
a = nil
a ||= 2
a
a ||= 3
a
a = nil
a ||= []
a
p a
a.size
exit
lab = SVMLab.new('testcfg')
lab['SVM']['Center'].to_yaml
lab.cfg['SVM']['Center'].to_yaml
puts lab.cfg['SVM']['Center'].to_yaml
exit
lab = SVMLab.new('testcfg')
exit
lab = SVMLab.new('testcfg')
exit
lab = SVMLab.new('testcfg')
exit
lab = SVMLab.new('testcfg')
exit
lab = SVMLab.new('testcfg')
exit
lab = SVMLab.new('testcfg')
p = lab.crossvalidate
p.cc
lab = SVMLab.new('/home/fred/DeltaDeltaG/cfg/final.cfg')
lab = SVMLab.new('/home/fred/DeltaDeltaG/cfg/final.cfg')
lab.crossvalidate.cc
lab.crossvalidate.rmsd
lab.C
lab.epsilon
exit
exit
exit
data/lib/libsvmdata.rb
ADDED
@@ -0,0 +1,122 @@
require 'tempfile'

# This is a class that IS the path of a file with LIBSVM data.
# It can do some more tricks beyond just being a path.
# However, all those tricks are DESTRUCTIVE so be careful.
# It is also not a good class if you want to keep LIBSVM's sparse
# data format. This class will fill everything missing with zeros
# upon creation.
class LIBSVMdata < String

  attr_reader :data, :path

  def initialize(path, readonly = true, classborder = nil)
    @readonly = readonly
    @data = File.read(path)
    @classborder = classborder
    if @readonly
      super(path)
    else
      @file = Tempfile.new('libsvmdata')
      super(@file.path)
      self.fillSparseData!
    end
  end

  # Number of columns (prediction + features) on the first line.
  def dim
    @dim ||= @data.split("\n").first.split.size
  end

  def nexamples
    @nexamples ||= @data.split("\n").size
  end

  # Write the in-memory data back to the (temporary) file.
  def update
    return if @readonly
    @dim = nil
    @nexamples = nil
    open(self,'w') do |f|
      f.puts @data
    end
    self
  end

  # Add vector[i] to every value in column i (vector[0] applies to the
  # prediction column, skipped for classification data).
  def translate!(vector)
    return if @readonly
    @data =
      @data.split("\n").map { |line|
        pred,*feats = line.split
        (@classborder ? "#{pred} " : "#{(pred.to_f+vector[0])} ") +
          feats.map{|feat|
            key,val = feat.split(':')
            "#{key}:#{val.to_f+vector[key.to_i]}"}.join(' ') + "\n"
      }.join
    self.update
  end

  # Multiply every value in column i by vector[i].
  def scale!(vector)
    return if @readonly
    @data =
      @data.split("\n").map { |line|
        pred,*feats = line.split
        (@classborder ? "#{pred} " : "#{(pred.to_f*vector[0])} ") +
          feats.map{|feat|
            key,val = feat.split(':')
            "#{key}:#{val.to_f*vector[key.to_i]}"}.join(' ') + "\n"
      }.join
    self.update
  end

  # Column-wise mean over all examples.
  def mean
    sum = Array.new(dim){0}
    sum =
      @data.split("\n").inject(sum) { |s,line|
        pred,*feats = line.split
        if !@classborder then s[0] += pred.to_f end
        fs = s[1..-1].zip(feats).map{|si,fi|
          si + fi.split(':').last.to_f}
        [s[0]] + fs }
    sum.map{|i| i / nexamples.to_f}
  end

  # Column-wise maximum absolute value over all examples.
  def absmax
    max = Array.new(dim){0}
    @data.split("\n").inject(max) { |s,line|
      pred,*feats = line.split
      s[0] = [s[0],pred.to_f.abs].max
      fs = s[1..-1].zip(feats).map{|si,fi|
        [si, fi.split(':').last.to_f.abs].max}
      [s[0]] + fs }
  end

  # Turn sparse LIBSVM lines into dense ones, filling missing indices
  # with zeros (and binarizing the prediction when a classborder is set).
  def fillSparseData!
    return if @readonly
    ndim = @data.split("\n").map{|line|
      if (arr=line.split).size>1
        arr.last.split(':').first.to_i
      else 0 end
    }.max
    @data =
      @data.split("\n").map{|line|
        pred,*feats = line.split
        if @classborder
          pred = (pred.to_f>=@classborder) ? '1' : '-1'
        end
        counter = 0
        pred + ' ' +
          (1..ndim).map { |index|
            if (element = feats[counter]) and
               (element.split(':').first.to_i == index)
              counter+=1
              "#{index}:#{element.split(':').last}"
            else
              "#{index}:0"
            end
          }.join(' ') + "\n"
      }.join
    self.update
  end

end
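The core transformation in fillSparseData! can be shown on a single line in isolation; `fill_sparse` below is a self-contained illustrative sketch, not the class method itself:

```ruby
# Pad a sparse LIBSVM line "label idx:val ..." with explicit idx:0
# entries for every missing feature index from 1 to ndim.
def fill_sparse(line, ndim)
  pred, *feats = line.split
  vals = feats.map { |f| k, v = f.split(':'); [k.to_i, v] }.to_h
  pred + ' ' + (1..ndim).map { |i| "#{i}:#{vals[i] || 0}" }.join(' ')
end

p fill_sparse('1 2:0.5 4:1.5', 4)
# => "1 1:0 2:0.5 3:0 4:1.5"
```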