sbn 0.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
data/README ADDED
@@ -0,0 +1,320 @@
+ = SBN - Simple Bayesian Networks
+ == Software License Agreement
+ Copyright (c) 2005-2007 Carl Youngblood mailto:carl@youngbloods.org
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
+
+ == Introduction
+
+ SBN makes it easy to use Bayesian networks in your Ruby application. Why would
+ you want to do this? Bayesian networks are excellent tools for making
+ intelligent decisions based on collected data. They are used to measure and
+ predict the probabilities of various outcomes in a problem space.
+
+ A Bayesian network is a directed acyclic graph representing the variables in a
+ problem space, the causal relationships between these variables and the
+ probabilities of these variables' possible states, as well as the algorithms
+ used for inference on these variables.
+
+ == Installation
+ Installation of SBN is simple:
+
+   # gem install sbn
+
+ == A Basic Example
+ http://youngbloods.org/sbn/images/grass_wetness.png
+
+ We'll begin with a network whose probabilities have been pre-determined. This
+ example comes from the excellent <em>Artificial Intelligence: A Modern
+ Approach</em>, by Russell & Norvig. Later we'll see how to determine a
+ network's probabilities from sample points. Our sample network has four
+ variables, each of which has two possible states:
+ * <em>Cloudy</em>: <b>:true</b> if the sky is cloudy, <b>:false</b> if the sky is sunny.
+ * <em>Sprinkler</em>: <b>:true</b> if the sprinkler was turned on, <b>:false</b> if not. Whether or not it's cloudy has a direct influence on whether or not the sprinkler is turned on, so there is a parent-child relationship between <em>Sprinkler</em> and <em>Cloudy</em>.
+ * <em>Rain</em>: <b>:true</b> if it rained, <b>:false</b> if not. Whether or not it's cloudy has a direct influence on whether or not it will rain, so there is a relationship there too.
+ * <em>Grass Wet</em>: <b>:true</b> if the grass is wet, <b>:false</b> if not. The state of the grass is directly influenced by both rain and the sprinkler, but cloudiness has no direct influence on the state of the grass, so <em>Grass Wet</em> has a relationship with both <em>Sprinkler</em> and <em>Rain</em> but not <em>Cloudy</em>.
+
+ Each variable holds a state table representing the conditional probabilities of
+ each of its own states given each combination of its parents' states. <em>Cloudy</em> has
+ no parents, so it only has probabilities for its own two states.
+ <em>Sprinkler</em> and <em>Rain</em> each have one parent, so they must
+ specify probabilities for all four possible combinations of their own states
+ and their parents' states. Since <em>Grass Wet</em> has two parents, it must
+ specify probabilities for all eight possible combinations of states. Since we live in a
+ logical universe, the probabilities of each variable's possible states given a
+ specific combination of its parents' states must add up to 1.0. Notice that
+ <em>Cloudy</em>'s probabilities add up to 1.0, <em>Sprinkler</em>'s states given
+ <em>Cloudy</em> == :true add up to 1.0, and so on.
+
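+ For instance, using the numbers supplied for <em>Sprinkler</em> in the code
+ below, the table contains two such pairs, each summing to 1.0:
+
+   P(:sprinkler => :true  | :cloudy => :true)  = 0.1
+   P(:sprinkler => :false | :cloudy => :true)  = 0.9   # 0.1 + 0.9 = 1.0
+   P(:sprinkler => :true  | :cloudy => :false) = 0.5
+   P(:sprinkler => :false | :cloudy => :false) = 0.5   # 0.5 + 0.5 = 1.0
+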
+ In the following code, the network shown above is created and its variables
+ are initialized and connected to one another. The "evidence" is then set. This
+ evidence represents observations of variables in the problem space. In this
+ case, the <em>Sprinkler</em> variable was observed to be in the <b>:false</b>
+ state, and the <em>Rain</em> variable was observed to be in the <b>:true</b>
+ state. Setting this evidence in the network is analogous to knowing more about
+ the problem space. With more knowledge of the problem space, the posterior
+ probabilities of the remaining unobserved variables can be predicted with
+ greater accuracy. After setting the evidence, query_variable() is called on
+ the <em>Grass Wet</em> variable. This returns a hash of possible states and
+ their posterior probabilities.
+
+   require 'rubygems'
+   require 'sbn'
+
+   net = Sbn::Net.new("Grass Wetness Belief Net")
+   cloudy = Sbn::Variable.new(net, :cloudy, [0.5, 0.5])
+   sprinkler = Sbn::Variable.new(net, :sprinkler, [0.1, 0.9, 0.5, 0.5])
+   rain = Sbn::Variable.new(net, :rain, [0.8, 0.2, 0.2, 0.8])
+   grass_wet = Sbn::Variable.new(net, :grass_wet, [0.99, 0.01, 0.9, 0.1, 0.9, 0.1, 0.0, 1.0])
+   cloudy.add_child(sprinkler) # also creates parent relationship
+   cloudy.add_child(rain)
+   sprinkler.add_child(grass_wet)
+   rain.add_child(grass_wet)
+   evidence = {:sprinkler => :false, :rain => :true}
+   net.set_evidence(evidence)
+   net.query_variable(:grass_wet)
+
+   => {:true=>0.8995, :false=>0.1005} # inferred probabilities for grass_wet
+                                      # given sprinkler == :false and rain == :true
+
+ === Specifying probabilities
+
+ The order in which probabilities are supplied is as follows. Always alternate
+ between the states of the variable whose probabilities you are supplying.
+ Supply the probabilities of these states given the variable's parents in the
+ order the parents were added, from right to left, with the rightmost (most
+ recently added) parent cycling through its states first. For example, if I
+ have one variable A with two parents B and C, A having three states, B having
+ two, and C having four, I would supply the probabilities in the following
+ order (a short sketch after the listing shows how to generate this ordering
+ in Ruby):
+
+   P(A1|B1,C1) # this notation means "The probability of A1 given B1 and C1"
+   P(A2|B1,C1)
+   P(A3|B1,C1)
+
+   P(A1|B1,C2)
+   P(A2|B1,C2)
+   P(A3|B1,C2)
+
+   P(A1|B1,C3)
+   P(A2|B1,C3)
+   P(A3|B1,C3)
+
+   P(A1|B1,C4)
+   P(A2|B1,C4)
+   P(A3|B1,C4)
+
+   P(A1|B2,C1)
+   P(A2|B2,C1)
+   P(A3|B2,C1)
+
+   P(A1|B2,C2)
+   P(A2|B2,C2)
+   P(A3|B2,C2)
+
+   P(A1|B2,C3)
+   P(A2|B2,C3)
+   P(A3|B2,C3)
+
+   P(A1|B2,C4)
+   P(A2|B2,C4)
+   P(A3|B2,C4)
+
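+ The following minimal Ruby sketch (not part of the SBN API; the state lists
+ are hypothetical) prints the probability slots in exactly the order shown
+ above:
+
+   a_states = %w(A1 A2 A3)
+   b_states = %w(B1 B2)          # first parent added
+   c_states = %w(C1 C2 C3 C4)    # second (most recently added) parent
+
+   b_states.each do |b|          # leftmost parent cycles slowest
+     c_states.each do |c|        # rightmost parent cycles faster
+       a_states.each do |a|      # the variable's own states alternate fastest
+         puts "P(#{a}|#{b},#{c})"
+       end
+     end
+   end
+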
+ A more verbose, but possibly less confusing, way of specifying probabilities is
+ to set the specific probability for each state separately, using a hash to
+ represent the combination of states:
+
+   net = Sbn::Net.new("Grass Wetness Belief Net")
+   cloudy = Sbn::Variable.new(net, :cloudy) # states default to :true and :false
+   sprinkler = Sbn::Variable.new(net, :sprinkler)
+   rain = Sbn::Variable.new(net, :rain)
+   grass_wet = Sbn::Variable.new(net, :grass_wet)
+   cloudy.add_child(sprinkler)
+   cloudy.add_child(rain)
+   sprinkler.add_child(grass_wet)
+   rain.add_child(grass_wet)
+   cloudy.set_probability(0.5, {:cloudy => :true})
+   cloudy.set_probability(0.5, {:cloudy => :false})
+   sprinkler.set_probability(0.1, {:sprinkler => :true, :cloudy => :true})
+   sprinkler.set_probability(0.9, {:sprinkler => :false, :cloudy => :true})
+   sprinkler.set_probability(0.5, {:sprinkler => :true, :cloudy => :false})
+   sprinkler.set_probability(0.5, {:sprinkler => :false, :cloudy => :false})
+   # etc.
+
+ === Inference
+
+ After your network is set up, you can set evidence for specific variables that
+ you have observed and then query unknown variables to see the posterior
+ probability of their various states. Given these inferred probabilities, one
+ common decision-making strategy is to assume that the variables are set to
+ their most probable states.
+
+   evidence = {:sprinkler => :false, :rain => :true}
+   net.set_evidence(evidence)
+   net.query_variable(:grass_wet)
+
+   => {:true=>0.8995, :false=>0.1005} # inferred probabilities for grass_wet
+                                      # given sprinkler == :false and rain == :true
+
+ The only currently supported inference algorithm is the Markov Chain Monte
+ Carlo (MCMC) algorithm. This is an approximation algorithm. Given the
+ complexity of inference in Bayesian networks
+ (NP-hard[http://en.wikipedia.org/wiki/NP-hard]), exact inference is often
+ intractable. The MCMC algorithm approximates the posterior probability for
+ each variable's state by generating a random set of states for the unset
+ variables in proportion to each state's posterior probability. It generates
+ successive random states conditioned on the previous values of the
+ non-evidence variables. This works because, over time, the amount of time
+ spent in each random state is proportional to its posterior probability.
+
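+ The following is a rough Ruby sketch of that idea (purely illustrative; the
+ helpers shown here are hypothetical and are not part of SBN's API). Each
+ non-evidence variable is repeatedly re-sampled conditioned on the current
+ values of the others, and the fraction of iterations spent in each state
+ approximates its posterior probability:
+
+   counts = Hash.new(0)
+   state = random_initial_assignment          # hypothetical helper
+   10_000.times do
+     non_evidence_variables.each do |var|     # hypothetical collection
+       # hypothetical helper: sample var given its Markov blanket in `state`
+       state[var] = sample_given_markov_blanket(var, state)
+     end
+     counts[state[:grass_wet]] += 1
+   end
+   posterior = {}
+   counts.each {|s, c| posterior[s] = c.to_f / 10_000 }
+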
+ == Parameter Learning
+
+ Although it is sometimes useful to be able to specify a variable's
+ probabilities in advance, we usually begin with a clean slate and are only
+ able to make a reasonable estimate of each variable's probabilities after
+ collecting sufficient data. This process is easy with SBN. The parameter
+ learning process requires complete sample points for all variables in the
+ network. Each sample point is a hash with keys matching each variable's name
+ and values corresponding to each variable's observed state. The more sample
+ points you supply to your network, the more accurate its probability
+ estimates will be.
+
+   net.learn([
+     {:cloudy => :true, :sprinkler => :false, :rain => :true, :grass_wet => :true},
+     {:cloudy => :true, :sprinkler => :true, :rain => :false, :grass_wet => :true},
+     {:cloudy => :false, :sprinkler => :false, :rain => :true, :grass_wet => :true},
+     {:cloudy => :true, :sprinkler => :false, :rain => :true, :grass_wet => :true},
+     {:cloudy => :false, :sprinkler => :true, :rain => :false, :grass_wet => :false},
+     {:cloudy => :false, :sprinkler => :false, :rain => :false, :grass_wet => :false},
+     {:cloudy => :false, :sprinkler => :false, :rain => :false, :grass_wet => :false},
+     {:cloudy => :true, :sprinkler => :false, :rain => :true, :grass_wet => :true},
+     {:cloudy => :true, :sprinkler => :false, :rain => :false, :grass_wet => :false},
+     {:cloudy => :false, :sprinkler => :false, :rain => :false, :grass_wet => :false},
+   ])
+
+ Sample points can also be specified one at a time, and calculation of the
+ probability tables can be deferred until a later time:
+
+   net.add_sample_point({:cloudy => :true, :sprinkler => :false, :rain => :true, :grass_wet => :true})
+   net.add_sample_point({:cloudy => :true, :sprinkler => :true, :rain => :false, :grass_wet => :true})
+   net.set_probabilities_from_sample_points!
+
+ Networks store the sample points you have given them, so future learning
+ continues to take previous samples into account. The learning process itself
+ is fairly simple: the frequency of each state combination is counted for each
+ variable, and each count is divided by the total number of sample points
+ observed.
+
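+ As a rough illustration of that counting idea (simplified; this is not SBN's
+ internal implementation), an estimate of P(:rain => :true | :cloudy => :true)
+ could be computed from an array of sample-point hashes like those above:
+
+   # sample_points is assumed to be an array of hashes like the ones passed to net.learn
+   cloudy_points = sample_points.select {|pt| pt[:cloudy] == :true }
+   rainy_points  = cloudy_points.select {|pt| pt[:rain] == :true }
+   p_rain_given_cloudy = rainy_points.size.to_f / cloudy_points.size
+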
+ == Saving and Restoring a Network
+
+ SBN currently supports the {XMLBIF
+ format}[http://www.cs.cmu.edu/afs/cs/user/fgcozman/www/Research/InterchangeFormat]
+ for serializing Bayesian networks:
+
+   FILENAME = 'grass_wetness.xml'
+   File.open(FILENAME, 'w') {|f| f.write(net.to_xmlbif) }
+   reconstituted_net = Sbn::Net.from_xmlbif(File.read(FILENAME))
+
+ At present, sample points are not saved with your network, but this feature is
+ anticipated in a future release.
+
+ == Advanced Variable Types
+ Among SBN's most powerful features are its advanced variable types, which make
+ it much more convenient to handle real-world data and increase the relevancy
+ of your results.
+
+ === Sbn::StringVariable
+ Sbn::StringVariable is used for handling string data. Rather than set a
+ StringVariable's states manually, rely on the learning process. During
+ learning, you should pass the observed string for this variable for each
+ sample point. Each observed string is divided into a series of n-grams (short
+ character sequences) matching snippets of the observed string. A new variable
+ (of class Sbn::StringCovariable) is created for each n-gram, whose state will
+ be :true or :false depending on whether the snippet is observed or not. These
+ covariables are managed by the main StringVariable to which they belong and
+ are transparent to you, the developer. They inherit the same parents and
+ children as their managing StringVariable. By dividing observed string data
+ into fine-grained substrings and determining separate probabilities for each
+ substring occurrence, an extremely accurate understanding of the data can be
+ developed.
+
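+ For instance, the String#ngrams helper that ships with SBN (see
+ data/lib/helpers.rb) splits a string into all of its substrings of a given
+ length, illustrating the kind of decomposition described above (after
+ requiring sbn):
+
+   'Chevron'.ngrams(3)  # => ["Che", "hev", "evr", "vro", "ron"]
+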
+ === Sbn::NumericVariable
+ Sbn::NumericVariable is used for handling numeric data, which is continuous
+ and is thus more difficult to categorize than discrete states. Due to the
+ nature of the MCMC algorithm used for inference, every variable in the network
+ must have discrete states, but this limitation can be ameliorated by
+ dynamically altering a numeric variable's states according to the variance of
+ the numeric data. Whenever learning occurs, the average and standard deviation
+ of the observations for the NumericVariable are calculated, and the
+ occurrences are divided into multiple categories through a process known as
+ discretization. For example, all numbers between 1.0 and 3.5 might be
+ classified as one state, and all numbers between 3.5 and 6 might be classified
+ as another. The thresholds for each state are based on the mean and standard
+ deviation of the observed data, and are recalculated every time learning
+ occurs, so even though some amount of accuracy is lost by discretization, the
+ states chosen should usually be well-adapted to the data in your domain
+ (assuming it is somewhat normally distributed). This variable type makes it
+ much easier to work with numeric data by dynamically adapting to your data and
+ handling the discretization for you.
+
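+ As a hypothetical illustration of mean/stdev-based discretization (the exact
+ thresholds SBN chooses may differ), state boundaries for an amount variable
+ could be derived like this, using the Enumerable helpers bundled with SBN:
+
+   amounts = [29.11, 14.50, 41.00, 8.75, 33.20]    # observed sample values
+   mean    = amounts.average                       # from data/lib/helpers.rb
+   stdev   = amounts.standard_deviation            # from data/lib/helpers.rb
+   thresholds = [mean - stdev, mean, mean + stdev] # boundaries between states
+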
+ The following example shows a network that uses these advanced variable types:
+
+   net = Sbn::Net.new('Budget Category Network')
+   category = Sbn::Variable.new(net, :category, [0.5, 0.25, 0.25], [:gas, :food, :clothing])
+   amount = Sbn::NumericVariable.new(net, :amount)
+   merchant = Sbn::StringVariable.new(net, :merchant)
+   category.add_child(amount)
+   category.add_child(merchant)
+
+ Before parameter learning occurs on this network, it looks like this:
+
+ http://youngbloods.org/sbn/images/stringvar1.png
+
+ The <em>Category</em> variable represents a budget category for a financial
+ transaction. The <em>Amount</em> variable is for the amount of the transaction,
+ and the <em>Merchant</em> variable handles observed strings for the merchant
+ where the transaction took place. Suppose we supplied a sample point to
+ this network:
+
+   net.add_sample_point :category => :gas, :amount => 29.11, :merchant => 'Chevron'
+
+ After adding that sample point, the network would look something like this:
+
+ http://youngbloods.org/sbn/images/stringvar2.png
+
+ The variables with dashed edges are the string covariables that were created
+ by the managing string variable when it saw a new string in the sample points.
+ At present, string variables generate n-grams of length 3, 6, and 10
+ characters. It is anticipated that these lengths will become customizable in a
+ future release.
+
+ == Future Features
+ There are many areas where we hope to improve Simple Bayesian Networks. Here
+ are some of the possible improvements that may be added in future releases:
+ * Support for exact inference
+ * Support for continuous variables
+ * Saving the sample points along with the network when saving to XMLBIF
+ * Speedier inference using native C++ with vectorization provided by macstl[http://www.pixelglow.com/macstl/]
+ * Speedier inference through parallelization
+ * Support for learning network structure
+ * Support for customizing the number of iterations in the MCMC algorithm (currently hard-coded)
+ * Support for customizing the size of n-grams used in string variables
+ * Support for intelligently determining the best number of iterations for MCMC at runtime based on the desired level of precision
+
+ Please share your own ideas with us and help to improve this library.
data/lib/combination.rb ADDED
@@ -0,0 +1,78 @@
+ # = combination.rb: Class for handling variable state combinations
+ # Copyright (C) 2005-2007 Carl Youngblood mailto:carl@youngbloods.org
+ #
+ # Takes an array of arrays and iterates over all combinations of sub-elements.
+ # For example:
+ #
+ #   c = Combination.new([[1, 2], [6, 7, 8]])
+ #   c.each {|comb| p comb }
+ #
+ # Will produce:
+ #
+ #   [1, 6]
+ #   [1, 7]
+ #   [1, 8]
+ #   [2, 6]
+ #   [2, 7]
+ #   [2, 8]
+
+ class Combination # :nodoc:
+   include Enumerable
+
+   def initialize(arr)
+     @arr = arr
+     @current = Array.new(arr.size, 0)
+   end
+
+   def each
+     iterations = @arr.inject(1) {|product, element| product * element.size } - 1
+     yield current
+     iterations.times { yield self.next_combination }
+   end
+
+   def <=>(other)
+     @current <=> other.current
+   end
+
+   def first
+     @current.fill 0
+   end
+
+   def last
+     @current.size.times {|i| @current[i] = @arr[i].size - 1 }
+   end
+
+   def current
+     returnval = []
+     @current.size.times {|i| returnval[i] = @arr[i][@current[i]] }
+     returnval
+   end
+
+   def next_combination
+     i = @current.size - 1
+     @current.reverse.each do |e|
+       if e == @arr[i].size - 1
+         @current[i] = 0
+       else
+         @current[i] += 1
+         break
+       end
+       i -= 1
+     end
+     current
+   end
+
+   def prev_combination
+     i = @current.size - 1
+     @current.reverse.each do |e|
+       if e == 0
+         @current[i] = @arr[i].size - 1
+       else
+         @current[i] -= 1
+         break
+       end
+       i -= 1
+     end
+     current
+   end
+ end
data/lib/formats.rb ADDED
@@ -0,0 +1,119 @@
+ class Sbn
+   class Net
+     # Returns a string containing a representation of the network in XMLBIF format.
+     # http://www.cs.cmu.edu/afs/cs/user/fgcozman/www/Research/InterchangeFormat
+     def to_xmlbif
+       xml = Builder::XmlMarkup.new(:indent => 2)
+       xml.instruct!
+       xml.comment! <<-EOS
+
+         Bayesian network in XMLBIF v0.3 (BayesNet Interchange Format)
+         Produced by SBN (Simple Bayesian Network library)
+         Output created #{Time.now}
+       EOS
+       xml.text! "\n"
+       xml.comment! "DTD for the XMLBIF 0.3 format"
+       xml.declare! :DOCTYPE, :bif do
+         xml.declare! :ELEMENT, :bif, :"(network)*"
+         xml.declare! :ATTLIST, :bif, :version, :CDATA, :"#REQUIRED"
+         xml.declare! :ELEMENT, :"network (name, (property | variable | definition)*)"
+         xml.declare! :ELEMENT, :name, :"(#PCDATA)"
+         xml.declare! :ELEMENT, :"variable (name, (outcome | property)*)"
+         xml.declare! :ATTLIST, :"variable type (nature | decision | utility) \"nature\""
+         xml.declare! :ELEMENT, :outcome, :"(#PCDATA)"
+         xml.declare! :ELEMENT, :definition, :"(for | given | table | property)*"
+         xml.declare! :ELEMENT, :for, :"(#PCDATA)"
+         xml.declare! :ELEMENT, :given, :"(#PCDATA)"
+         xml.declare! :ELEMENT, :table, :"(#PCDATA)"
+         xml.declare! :ELEMENT, :property, :"(#PCDATA)"
+       end
+       xml.bif :version => 0.3 do
+         xml.network do
+           xml.name(@name.to_s)
+           xml.text! "\n"
+           xml.comment! "Variables"
+           @variables.each {|name, variable| variable.to_xmlbif_variable(xml) }
+           xml.text! "\n"
+           xml.comment! "Probability distributions"
+           @variables.each {|name, variable| variable.to_xmlbif_definition(xml) }
+         end
+       end
+     end
+
+     # Reconstitute a saved network.
+     def self.from_xmlbif(source)
+       # convert tags to lower case
+       source.gsub!(/<.*?>/) {|tag| tag.downcase }
+
+       doc = XmlSimple.xml_in(source)
+       netname = doc['network'].first['name'].first
+
+       # find net name
+       returnval = Net.new(netname)
+
+       # find variables
+       count = 0
+       variables = {}
+       variable_elements = doc['network'].first['variable'].each do |var|
+         varname = var['name'].first.to_sym
+         properties = var['property']
+         vartype = nil
+         manager_name = nil
+         text_to_match = ""
+         options = {}
+         thresholds = []
+         properties.each do |prop|
+           key, val = prop.split('=').map {|e| e.strip }
+           vartype = val if key == 'SbnVariableType'
+           manager_name = val if key == 'ManagerVariableName'
+           text_to_match = eval(val) if key == 'TextToMatch'
+           options[key.to_sym] = val.to_i if key =~ /stdev_state_count/
+           # thresholds are assumed to be serialized as a comma-separated list
+           thresholds = val.split(',').map {|e| e.to_f } if key == 'StateThresholds'
+         end
+         states = var['outcome']
+         table = []
+         doc['network'].first['definition'].each do |defn|
+           if defn['for'].first.to_sym == varname
+             table = defn['table'].first.split.map {|prob| prob.to_f }
+           end
+         end
+         count += 1
+         variables[varname] = case vartype
+           when "Sbn::StringVariable" then StringVariable.new(returnval, varname)
+           when "Sbn::NumericVariable" then NumericVariable.new(returnval, varname, table, thresholds, options)
+           when "Sbn::Variable" then Variable.new(returnval, varname, table, states)
+           when "Sbn::StringCovariable" then StringCovariable.new(returnval, manager_name, text_to_match, table)
+         end
+       end
+
+       # find relationships between variables
+
+       # connect covariables to their managers
+       variable_elements = doc['network'].first['variable'].each do |var|
+         varname = var['name'].first.to_sym
+         properties = var['property']
+         vartype = nil
+         covars = nil
+         parents = nil
+         properties.each do |prop|
+           key, val = prop.split('=').map {|e| e.strip }
+           covars = val.split(',').map {|e| e.strip.to_sym } if key == 'Covariables'
+           parents = val.split(',').map {|e| e.strip.to_sym } if key == 'Parents'
+           vartype = val if key == 'SbnVariableType'
+         end
+         if vartype == "Sbn::StringVariable"
+           parents.each {|p| variables[varname].add_parent(variables[p]) } if parents
+           covars.each {|covar| variables[varname].add_covariable(variables[covar]) } if covars
+         end
+       end
+
+       # connect all other variables to their parents
+       doc['network'].first['definition'].each do |defn|
+         varname = defn['for'].first.to_sym
+         parents = defn['given']
+         parents.each {|p| variables[varname].add_parent(variables[p.to_sym]) } if parents
+       end
+       returnval
+     end
+   end
+ end
data/lib/helpers.rb ADDED
@@ -0,0 +1,140 @@
+ # = helpers.rb: Helper methods added to existing Ruby classes
+ # Credit goes to ruby-talk posts for many of these (details below).
+ #
+ # Permission is hereby granted, free of charge, to any person
+ # obtaining a copy of this software and associated documentation
+ # files (the "Software"), to deal in the Software without
+ # restriction, including without limitation the rights to use,
+ # copy, modify, merge, publish, distribute, sublicense, and/or sell
+ # copies of the Software, and to permit persons to whom the
+ # Software is furnished to do so, subject to the following
+ # conditions:
+ #
+ # The above copyright notice and this permission notice shall be
+ # included in all copies or substantial portions of the Software.
+ #
+ # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ # OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ # NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ # WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ # FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ # OTHER DEALINGS IN THE SOFTWARE.
+
+ # Thanks to Brian Schrer <ruby.brian _at_ gmail.com> for the
+ # following two methods, from ruby-talk post #150456.
+ class Object # :nodoc:
+   def self.enums(*args)
+     args.flatten.each_with_index do |const, i|
+       class_eval %(#{const} = #{i})
+     end
+   end
+
+   def self.bitwise_enums(*args)
+     args.flatten.each_with_index do |const, i|
+       class_eval %(#{const} = #{2**i})
+     end
+   end
+ end
+
+ class String # :nodoc:
+   def to_underscore_sym
+     self.titleize.gsub(/\s+/, '').underscore.to_sym
+   end
+
+   # Thanks to David Alan Black for this method, from
+   # ruby-talk post #11792
+   def ngrams(len = 1)
+     ngrams = []
+     len = size if len > size
+     (0..size - len).each do |n|
+       ng = self[n...(n + len)]
+       ngrams.push(ng)
+       yield ng if block_given?
+     end
+     ngrams
+   end
+ end
+
+ class Symbol # :nodoc:
+   def to_underscore_sym
+     self.to_s.titleize.gsub(/\s+/, '').underscore.to_sym
+   end
+ end
+
+ class Array # :nodoc:
+   def symbolize_values
+     self.map {|e| e.to_underscore_sym }
+   end
+
+   def symbolize_values!
+     self.map! {|e| e.to_underscore_sym }
+   end
+
+   def normalize
+     sum = self.inject(0.0) {|total, e| total + e }
+     self.map {|e| e.to_f / sum }
+   end
+
+   def normalize!
+     sum = self.inject(0.0) {|total, e| total + e }
+     self.map! {|e| e.to_f / sum }
+   end
+ end
+
+ class Hash # :nodoc:
+   def symbolize_keys_and_values
+     inject({}) do |options, (key, value)|
+       key = key.to_underscore_sym
+       value = value.to_underscore_sym
+       options[key] = value
+       options
+     end
+   end
+
+   def symbolize_keys_and_values!
+     keys.each do |key|
+       newkey = key.to_underscore_sym
+       self[newkey] = self[key].to_underscore_sym
+       delete(key) unless key == newkey
+     end
+     self
+   end
+ end
+
+ # Thanks to Eric Hodel for the following additions
+ # to the Enumerable module, from ruby-talk post #135920.
+ module Enumerable # :nodoc:
+   ##
+   # Sum of all the elements of the Enumerable
+   def sum
+     return self.inject(0) { |acc, i| acc + i }
+   end
+
+   ##
+   # Average of all the elements of the Enumerable
+   #
+   # The Enumerable must respond to #length
+   def average
+     return self.sum / self.length.to_f
+   end
+
+   ##
+   # Sample variance of all the elements of the Enumerable
+   #
+   # The Enumerable must respond to #length
+   def sample_variance
+     avg = self.average
+     sum = self.inject(0) { |acc, i| acc + (i - avg) ** 2 }
+     return (1 / self.length.to_f * sum)
+   end
+
+   ##
+   # Standard deviation of all the elements of the Enumerable
+   #
+   # The Enumerable must respond to #length
+   def standard_deviation
+     return Math.sqrt(self.sample_variance)
+   end
+ end