sbn 0.9.0
- data/README +320 -0
- data/lib/combination.rb +78 -0
- data/lib/formats.rb +119 -0
- data/lib/helpers.rb +140 -0
- data/lib/inference.rb +65 -0
- data/lib/learning.rb +141 -0
- data/lib/net.rb +49 -0
- data/lib/numeric_variable.rb +94 -0
- data/lib/sbn.rb +6 -0
- data/lib/string_variable.rb +176 -0
- data/lib/variable.rb +224 -0
- data/test/sbn.rb +5 -0
- data/test/test_combination.rb +51 -0
- data/test/test_helpers.rb +80 -0
- data/test/test_learning.rb +104 -0
- data/test/test_net.rb +136 -0
- data/test/test_variable.rb +373 -0
- metadata +63 -0
data/README
ADDED
@@ -0,0 +1,320 @@
|
|
1
|
+
= SBN - Simple Bayesian Networks
|
2
|
+
== Software License Agreement
|
3
|
+
Copyright (c) 2005-2007 Carl Youngblood mailto:carl@youngbloods.org
|
4
|
+
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
7
|
+
in the Software without restriction, including without limitation the rights
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
10
|
+
furnished to do so, subject to the following conditions:
|
11
|
+
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
13
|
+
copies or substantial portions of the Software.
|
14
|
+
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
21
|
+
SOFTWARE.
|
22
|
+
|
23
|
+
== Introduction
|
24
|
+
|
25
|
+
SBN makes it easy to use Bayesian Networks in your Ruby application. Why would
|
26
|
+
you want to do this? Bayesian networks are excellent tools for making
|
27
|
+
intelligent decisions based on collected data. They are used to measure and
|
28
|
+
predict the probabilities of various outcomes in a problem space.
|
29
|
+
|
30
|
+
A Bayesian Network is a directed acyclic graph representing the variables in a
|
31
|
+
problem space, the causal relationships between these variables and the
|
32
|
+
probabilities of these variables' possible states, as well as the algorithms
|
33
|
+
used for inference on these variables.
|
34
|
+
|
35
|
+
== Installation
|
36
|
+
Installation of SBN is simple:
|
37
|
+
|
38
|
+
# gem install sbn
|
39
|
+
|
40
|
+
== A Basic Example
|
41
|
+
http://youngbloods.org/sbn/images/grass_wetness.png
|
42
|
+
|
43
|
+
We'll begin with a network whose probabilities have been pre-determined. This
|
44
|
+
example comes from the excellent <em>Artificial Intelligence: A Modern
|
45
|
+
Approach</em>, by Russell & Norvig. Later we'll see how to determine a
|
46
|
+
network's probabilities from sample points. Our sample network has four
|
47
|
+
variables, each of which has two possible states:
|
48
|
+
* <em>Cloudy</em>: <b>:true</b> if sky is cloudy, <b>:false</b> if sky is sunny.
|
49
|
+
* <em>Sprinkler</em>: <b>:true</b> if sprinkler was turned on, <b>:false</b> if not. Whether or not it's cloudy has a direct influence on whether or not the sprinkler is turned on, so <em>Cloudy</em> is a parent of <em>Sprinkler</em>.
|
50
|
+
* <em>Rain</em>: <b>:true</b> if it rained, <b>:false</b> if not. Whether or not it's cloudy has a direct influence on whether or not it will rain, so there is a relationship there too.
|
51
|
+
* <em>Grass Wet</em>: <b>:true</b> if the grass is wet, <b>:false</b> if not. The state of the grass is directly influenced by both rain and the sprinkler, but cloudiness has no direct influence on the state of the grass, so grass has a relationship with both <em>Sprinkler</em> and <em>Rain</em> but not <em>Cloudy</em>.
|
52
|
+
|
53
|
+
Each variable holds a state table representing the conditional probabilities of
|
54
|
+
each of its own states given each of its parents' states. <em>Cloudy</em> has
|
55
|
+
no parents, so it only has probabilities for its own two states.
|
56
|
+
<em>Sprinkler</em> and <em>Rain</em> each have one parent, so they must
|
57
|
+
specify probabilities for all four possible combinations of their own states
|
58
|
+
and their parents' states. Since <em>Grass Wet</em> has two parents, it must
|
59
|
+
specify all eight possible combinations of states. Since we live in a logical
|
60
|
+
universe, each variable's possible states given a specific combination of its
|
61
|
+
parents' states must add up to 1.0. Notice that <em>Cloudy</em>'s
|
62
|
+
probabilities add up to 1.0, <em>Sprinkler</em>'s states given
|
63
|
+
<em>Cloudy</em> == :true add up to 1.0 and so on.
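
As a quick sanity check of the rule above, the following standalone sketch (it does not use SBN) verifies that each group of probabilities sums to 1.0. It assumes the flat layout described under "Specifying probabilities", where a variable's own states alternate fastest, so every consecutive slice of state_count entries is one distribution. The method name rows_sum_to_one? is hypothetical and is not part of the library.

```ruby
# Hypothetical helper, not part of SBN: checks that every consecutive group
# of state_count entries in a flat probability table sums to 1.0.
def rows_sum_to_one?(table, state_count, tolerance = 1e-9)
  table.each_slice(state_count).all? do |row|
    (row.sum - 1.0).abs < tolerance
  end
end

# Tables copied from the grass wetness example; each variable has two states.
sprinkler_table = [0.1, 0.9, 0.5, 0.5]
grass_wet_table = [0.99, 0.01, 0.9, 0.1, 0.9, 0.1, 0.0, 1.0]

sprinkler_ok = rows_sum_to_one?(sprinkler_table, 2)
grass_wet_ok = rows_sum_to_one?(grass_wet_table, 2)
broken_ok    = rows_sum_to_one?([0.6, 0.5], 2) # deliberately invalid row
```

A tolerance is used instead of strict equality because floating-point sums are not always exact.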
|
64
|
+
|
65
|
+
In the following code, the network shown above is created and its variables
|
66
|
+
are initialized and connected to one another. The "evidence" is then set. This
|
67
|
+
evidence represents observations of variables in the problem state. In this
|
68
|
+
case, the <em>Sprinkler</em> variable was observed to be in the <b>:false</b>
|
69
|
+
state, and the <em>Rain</em> variable was observed to be in the <b>:true</b>
|
70
|
+
state. Setting this evidence in the network is analogous to knowing more about
|
71
|
+
the problem space. With more knowledge of the problem space, the posterior
|
72
|
+
probabilities of the remaining unobserved variables can be predicted with
|
73
|
+
greater accuracy. After setting the evidence, query_variable() is called on
|
74
|
+
the <em>Grass Wet</em> variable. This returns a hash of possible states and
|
75
|
+
their posterior probabilities.
|
76
|
+
|
77
|
+
require 'rubygems'
|
78
|
+
require 'sbn'
|
79
|
+
|
80
|
+
net = Sbn::Net.new("Grass Wetness Belief Net")
|
81
|
+
cloudy = Sbn::Variable.new(net, :cloudy, [0.5, 0.5])
|
82
|
+
sprinkler = Sbn::Variable.new(net, :sprinkler, [0.1, 0.9, 0.5, 0.5])
|
83
|
+
rain = Sbn::Variable.new(net, :rain, [0.8, 0.2, 0.2, 0.8])
|
84
|
+
grass_wet = Sbn::Variable.new(net, :grass_wet, [0.99, 0.01, 0.9, 0.1, 0.9, 0.1, 0.0, 1.0])
|
85
|
+
cloudy.add_child(sprinkler) # also creates parent relationship
|
86
|
+
cloudy.add_child(rain)
|
87
|
+
sprinkler.add_child(grass_wet)
|
88
|
+
rain.add_child(grass_wet)
|
89
|
+
evidence = {:sprinkler => :false, :rain => :true}
|
90
|
+
net.set_evidence(evidence)
|
91
|
+
net.query_variable(:grass_wet)
|
92
|
+
|
93
|
+
=> {:true=>0.8995, :false=>0.1005} # inferred probabilities for grass_wet
|
94
|
+
# given sprinkler == :false and rain == :true
|
95
|
+
|
96
|
+
=== Specifying probabilities
|
97
|
+
|
98
|
+
Probabilities must be supplied in the following order. Always alternate
|
99
|
+
between the states of the variable whose probabilities you are supplying.
|
100
|
+
Supply the probabilities of these states given the variable's parents in the
|
101
|
+
order the parents were added, from right to left, with the rightmost (most
|
102
|
+
recently added) parent alternating first. For example, if I have one variable
|
103
|
+
A with two parents B and C, A having three states, B having two, and C having
|
104
|
+
four, I would supply the probabilities in the following order:
|
105
|
+
|
106
|
+
P(A1|B1,C1) # this notation means "The probability of A1 given B1 and C1"
|
107
|
+
P(A2|B1,C1)
|
108
|
+
P(A3|B1,C1)
|
109
|
+
|
110
|
+
P(A1|B1,C2)
|
111
|
+
P(A2|B1,C2)
|
112
|
+
P(A3|B1,C2)
|
113
|
+
|
114
|
+
P(A1|B1,C3)
|
115
|
+
P(A2|B1,C3)
|
116
|
+
P(A3|B1,C3)
|
117
|
+
|
118
|
+
P(A1|B1,C4)
|
119
|
+
P(A2|B1,C4)
|
120
|
+
P(A3|B1,C4)
|
121
|
+
|
122
|
+
P(A1|B2,C1)
|
123
|
+
P(A2|B2,C1)
|
124
|
+
P(A3|B2,C1)
|
125
|
+
|
126
|
+
P(A1|B2,C2)
|
127
|
+
P(A2|B2,C2)
|
128
|
+
P(A3|B2,C2)
|
129
|
+
|
130
|
+
P(A1|B2,C3)
|
131
|
+
P(A2|B2,C3)
|
132
|
+
P(A3|B2,C3)
|
133
|
+
|
134
|
+
P(A1|B2,C4)
|
135
|
+
P(A2|B2,C4)
|
136
|
+
P(A3|B2,C4)
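
The ordering listed above can be generated mechanically. The sketch below is a standalone illustration, not SBN code: it assumes the parents are passed in the order they were added and relies on Array#product, which varies its last argument fastest. The method name probability_order and the state labels are hypothetical.

```ruby
# Hypothetical helper, not part of SBN: lists the order in which
# probabilities are expected, given the child's states and each parent's
# states (parents in the order they were added).
def probability_order(child_states, first_parent, *other_parents)
  # Array#product varies the last array fastest, so the most recently
  # added parent alternates first.
  parent_combinations = first_parent.product(*other_parents)
  parent_combinations.flat_map do |combination|
    # The child's own states alternate within each parent combination.
    child_states.map { |state| "P(#{state}|#{combination.join(',')})" }
  end
end

order = probability_order(%w[A1 A2 A3], %w[B1 B2], %w[C1 C2 C3 C4])
order.each { |label| puts label }
```

For the A, B, C example this yields 3 * 2 * 4 = 24 entries, starting with P(A1|B1,C1) and ending with P(A3|B2,C4).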
|
137
|
+
|
138
|
+
A more verbose, but possibly less confusing way of specifying probabilities is
|
139
|
+
to set the specific probability for each state separately using a hash to
|
140
|
+
represent the combination of states:
|
141
|
+
|
142
|
+
net = Sbn::Net.new("Grass Wetness Belief Net")
|
143
|
+
cloudy = Sbn::Variable.new(net, :cloudy) # states default to :true and :false
|
144
|
+
sprinkler = Sbn::Variable.new(net, :sprinkler)
|
145
|
+
rain = Sbn::Variable.new(net, :rain)
|
146
|
+
grass_wet = Sbn::Variable.new(net, :grass_wet)
|
147
|
+
cloudy.add_child(sprinkler)
|
148
|
+
cloudy.add_child(rain)
|
149
|
+
sprinkler.add_child(grass_wet)
|
150
|
+
rain.add_child(grass_wet)
|
151
|
+
cloudy.set_probability(0.5, {:cloudy => :true})
|
152
|
+
cloudy.set_probability(0.5, {:cloudy => :false})
|
153
|
+
sprinkler.set_probability(0.1, {:sprinkler => :true, :cloudy => :true})
|
154
|
+
sprinkler.set_probability(0.9, {:sprinkler => :false, :cloudy => :true})
|
155
|
+
sprinkler.set_probability(0.5, {:sprinkler => :true, :cloudy => :false})
|
156
|
+
sprinkler.set_probability(0.5, {:sprinkler => :false, :cloudy => :false})
|
157
|
+
# etc etc
|
158
|
+
|
159
|
+
=== Inference
|
160
|
+
|
161
|
+
After your network is set up, you can set evidence for specific variables that
|
162
|
+
you have observed and then query unknown variables to see the posterior
|
163
|
+
probability of their various states. Given these inferred probabilities, one
|
164
|
+
common decision-making strategy is to assume that the variables are set to
|
165
|
+
their most probable states.
|
166
|
+
|
167
|
+
evidence = {:sprinkler => :false, :rain => :true}
|
168
|
+
net.set_evidence(evidence)
|
169
|
+
net.query_variable(:grass_wet)
|
170
|
+
|
171
|
+
=> {:true=>0.8995, :false=>0.1005} # inferred probabilities for grass_wet
|
172
|
+
# given sprinkler == :false and rain == :true
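
The "most probable state" strategy mentioned above amounts to picking the key with the largest value from the returned hash. This is a minimal sketch that assumes the hash has the shape shown in the example output. The method name most_probable_state is hypothetical.

```ruby
# Hypothetical helper, not part of SBN: returns the state with the highest
# posterior probability from a {state => probability} hash.
def most_probable_state(posterior)
  posterior.max_by { |_state, probability| probability }.first
end

# Hash copied from the example output above.
posterior = { :true => 0.8995, :false => 0.1005 }
decision = most_probable_state(posterior)
```

Here decision is :true, so a program using this strategy would act as though the grass is wet.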
|
173
|
+
|
174
|
+
The only currently supported inference algorithm is the Markov Chain Monte
|
175
|
+
Carlo (MCMC) algorithm. This is an approximation algorithm. Given the
|
176
|
+
complexity of inference in Bayesian networks
|
177
|
+
(NP-hard[http://en.wikipedia.org/wiki/NP-hard]), exact inference is often
|
178
|
+
intractable. The MCMC algorithm approximates the posterior probability for
|
179
|
+
each variable's state by generating a random set of states for the unset
|
180
|
+
variables in proportion to each state's posterior probability. It generates
|
181
|
+
successive random states conditioned on the previous values of the
|
182
|
+
non-evidence variables. This works because, over time, the amount
|
183
|
+
of time spent in each random state is proportional to its posterior
|
184
|
+
probability.
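
To make the idea concrete, here is a standalone sketch of Gibbs sampling, one common MCMC variant, on the grass wetness network. It is not SBN's implementation, which may differ in its details. The constant and method names, the use of plain true/false instead of :true/:false, the iteration count and the random seed are all assumptions made for illustration, and no burn-in period is used.

```ruby
# Conditional probability tables for the grass wetness example, stored as
# the probability of the "true" state given the parents' states.
P_CLOUDY    = 0.5
P_SPRINKLER = { true => 0.1, false => 0.5 }.freeze # keyed by cloudy
P_RAIN      = { true => 0.8, false => 0.2 }.freeze # keyed by cloudy
P_GRASS_WET = {                                    # keyed by [sprinkler, rain]
  [true, true] => 0.99, [true, false] => 0.9,
  [false, true] => 0.9, [false, false] => 0.0
}.freeze

# Probability of a boolean value, given the probability of true.
def bernoulli(p_true, value)
  value ? p_true : 1.0 - p_true
end

# Joint probability of one complete assignment of all four variables.
def joint(s)
  bernoulli(P_CLOUDY, s[:cloudy]) *
    bernoulli(P_SPRINKLER[s[:cloudy]], s[:sprinkler]) *
    bernoulli(P_RAIN[s[:cloudy]], s[:rain]) *
    bernoulli(P_GRASS_WET[[s[:sprinkler], s[:rain]]], s[:grass_wet])
end

# Estimates P(query == true | evidence) by repeatedly resampling each
# non-evidence variable conditioned on the current values of the others.
def gibbs_estimate(query, evidence, iterations, rng)
  hidden = [:cloudy, :sprinkler, :rain, :grass_wet] - evidence.keys
  state = evidence.dup
  hidden.each { |name| state[name] = rng.rand < 0.5 } # random starting point
  hits = 0
  iterations.times do
    hidden.each do |name|
      # P(name | everything else) is proportional to the joint probability.
      weight_true  = joint(state.merge(name => true))
      weight_false = joint(state.merge(name => false))
      state[name] = rng.rand < weight_true / (weight_true + weight_false)
    end
    hits += 1 if state[query]
  end
  hits.to_f / iterations # fraction of time spent in the "true" state
end

evidence = { :sprinkler => false, :rain => true }
estimate = gibbs_estimate(:grass_wet, evidence, 20_000, Random.new(42))
puts estimate
```

With this evidence the exact answer can be read straight from the table, P(grass_wet | sprinkler false, rain true) = 0.9, so the estimate should land close to that value. The sketch does not guard against evidence that has zero probability under the tables.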
|
185
|
+
|
186
|
+
== Parameter Learning
|
187
|
+
|
188
|
+
Although it is sometimes useful to be able to specify a variable's
|
189
|
+
probabilities in advance, we usually begin with a clean slate, and are only
|
190
|
+
able to make a reasonable estimate of each variable's probabilities after
|
191
|
+
collecting sufficient data. This process is easy with SBN. The parameter
|
192
|
+
learning process requires complete sample points for all variables in the
|
193
|
+
network. Each set of sample points is a hash with keys matching each
|
194
|
+
variable's name and values corresponding to each variable's observed state.
|
195
|
+
The more sample points you supply to your network, the more accurate its
|
196
|
+
probability estimates will be.
|
197
|
+
|
198
|
+
net.learn([
|
199
|
+
{:cloudy => :true, :sprinkler => :false, :rain => :true, :grass_wet => :true},
|
200
|
+
{:cloudy => :true, :sprinkler => :true, :rain => :false, :grass_wet => :true},
|
201
|
+
{:cloudy => :false, :sprinkler => :false, :rain => :true, :grass_wet => :true},
|
202
|
+
{:cloudy => :true, :sprinkler => :false, :rain => :true, :grass_wet => :true},
|
203
|
+
{:cloudy => :false, :sprinkler => :true, :rain => :false, :grass_wet => :false},
|
204
|
+
{:cloudy => :false, :sprinkler => :false, :rain => :false, :grass_wet => :false},
|
205
|
+
{:cloudy => :false, :sprinkler => :false, :rain => :false, :grass_wet => :false},
|
206
|
+
{:cloudy => :true, :sprinkler => :false, :rain => :true, :grass_wet => :true},
|
207
|
+
{:cloudy => :true, :sprinkler => :false, :rain => :false, :grass_wet => :false},
|
208
|
+
{:cloudy => :false, :sprinkler => :false, :rain => :false, :grass_wet => :false},
|
209
|
+
])
|
210
|
+
|
211
|
+
Sample points can also be specified one set at a time and calculation of the
|
212
|
+
probability tables can be deferred until a specific time:
|
213
|
+
|
214
|
+
net.add_sample_point({:cloudy => :true, :sprinkler => :false, :rain => :true, :grass_wet => :true})
|
215
|
+
net.add_sample_point({:cloudy => :true, :sprinkler => :true, :rain => :false, :grass_wet => :true})
|
216
|
+
net.set_probabilities_from_sample_points!
|
217
|
+
|
218
|
+
Networks store the sample points you have given them, so that future learning
|
219
|
+
continues to take previous samples into account. The learning process is fairly
|
220
|
+
simple. The frequency of each state combination in each variable is
|
221
|
+
determined, and the number of occurrences for each state combination is
|
222
|
+
divided by the total number of combinations learned on.
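
The counting step can be sketched in a few lines of standalone Ruby. This is an illustration of frequency-based estimation, not SBN's code: it normalizes the counts within each combination of parent states, which is a common maximum-likelihood approach and may differ from what the library does internally. The method name estimate_cpt is hypothetical, and the samples are the first five points from the learning example above.

```ruby
# Hypothetical helper, not part of SBN: estimates P(variable | parents)
# from complete sample points by counting and normalizing.
def estimate_cpt(samples, variable, parents)
  # counts[parent_states][state] = number of matching sample points
  counts = Hash.new { |hash, key| hash[key] = Hash.new(0) }
  samples.each do |point|
    parent_states = parents.map { |parent| point[parent] }
    counts[parent_states][point[variable]] += 1
  end

  counts.each_with_object({}) do |(parent_states, state_counts), cpt|
    total = state_counts.values.sum
    cpt[parent_states] = state_counts.transform_values { |n| n.to_f / total }
  end
end

samples = [
  { :cloudy => :true,  :sprinkler => :false, :rain => :true,  :grass_wet => :true },
  { :cloudy => :true,  :sprinkler => :true,  :rain => :false, :grass_wet => :true },
  { :cloudy => :false, :sprinkler => :false, :rain => :true,  :grass_wet => :true },
  { :cloudy => :true,  :sprinkler => :false, :rain => :true,  :grass_wet => :true },
  { :cloudy => :false, :sprinkler => :true,  :rain => :false, :grass_wet => :false }
]

rain_cpt = estimate_cpt(samples, :rain, [:cloudy])
p rain_cpt
```

In these five samples it rained in two of the three cloudy cases and in one of the two sunny cases, so the estimate for rain given cloudy is 2/3 and for rain given not cloudy is 1/2. With so few points the numbers are rough, which is why more sample points give better estimates.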
|
223
|
+
|
224
|
+
== Saving and Restoring a Network
|
225
|
+
|
226
|
+
SBN currently supports the {XMLBIF
|
227
|
+
format}[http://www.cs.cmu.edu/afs/cs/user/fgcozman/www/Research/InterchangeFormat]
|
228
|
+
for serializing Bayesian networks:
|
229
|
+
|
230
|
+
FILENAME = 'grass_wetness.xml'
|
231
|
+
File.open(FILENAME, 'w') {|f| f.write(net.to_xmlbif) }
|
232
|
+
reconstituted_net = Sbn::Net.from_xmlbif(File.read(FILENAME))
|
233
|
+
|
234
|
+
At present, sample points are not saved with your network, but this feature is
|
235
|
+
anticipated in a future release.
|
236
|
+
|
237
|
+
== Advanced Variable Types
|
238
|
+
Among SBN's most powerful features are its advanced variable types, which make
|
239
|
+
it much more convenient to handle real-world data and increase the relevancy
|
240
|
+
of your results.
|
241
|
+
|
242
|
+
=== Sbn::StringVariable
|
243
|
+
Sbn::StringVariable is used for handling string data. Rather than set a
|
244
|
+
StringVariable's states manually, rely on the learning process. During
|
245
|
+
learning, you should pass the observed string for this variable for each
|
246
|
+
sample point. Each observed string is divided into a series of n-grams (short
|
247
|
+
character sequences) matching snippets of the observed string. A new variable
|
248
|
+
is created (of class Sbn::StringCovariable) for each ngram, whose state will
|
249
|
+
be :true or :false depending on whether the snippet is observed or not. These
|
250
|
+
covariables are managed by the main StringVariable to which they belong and
|
251
|
+
are transparent to you, the developer. They inherit the same parents and
|
252
|
+
children as their managing StringVariable. By dividing observed string data
|
253
|
+
into fine-grained substrings and determining separate probabilities for each
|
254
|
+
substring occurrence, an extremely accurate understanding of the data can be
|
255
|
+
developed.
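
To show what splitting a string into n-grams looks like, here is a small standalone sketch. It is not the library's own code. The function name character_ngrams is hypothetical, and clamping the length to the string size is an assumption about how short strings should be handled.

```ruby
# Hypothetical helper, not part of SBN: returns every run of `length`
# consecutive characters in `text`.
def character_ngrams(text, length)
  return [] if text.empty?
  length = text.size if length > text.size # assumption: clamp for short strings
  text.chars.each_cons(length).map(&:join)
end

trigrams = character_ngrams('Chevron', 3)
p trigrams
```

For 'Chevron' the trigrams are "Che", "hev", "evr", "vro" and "ron". Each such snippet would correspond to one covariable whose state is :true when the snippet appears in an observed string.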
|
256
|
+
|
257
|
+
=== Sbn::NumericVariable
|
258
|
+
Sbn::NumericVariable is used for handling numeric data, which is continuous
|
259
|
+
and is thus more difficult to categorize than discrete states. Due to the
|
260
|
+
nature of the MCMC algorithm used for inference, every variable in the network
|
261
|
+
must have discrete states, but this limitation can be ameliorated by
|
262
|
+
dynamically altering a numeric variable's states according to the variance of
|
263
|
+
the numeric data. Whenever learning occurs, the average and standard deviation
|
264
|
+
of the observations for the NumericVariable are calculated, and the
|
265
|
+
occurrences are divided into multiple categories through a process known as
|
266
|
+
discretization. For example, all numbers between 1.0 and 3.5 might be
|
267
|
+
classified as one state, and all numbers between 3.5 and 6 might be classified
|
268
|
+
in another. The thresholds for each state are based on the mean and standard
|
269
|
+
deviation of the observed data, and are recalculated every time learning
|
270
|
+
occurs, so even though some amount of accuracy is lost by discretization, the
|
271
|
+
states chosen should usually be well-adapted to the data in your domain
|
272
|
+
(assuming it is somewhat normally distributed). This variable type makes it
|
273
|
+
much easier to work with numeric data by dynamically adapting to your data and
|
274
|
+
handling the discretization for you.
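
The following standalone sketch illustrates discretization based on the mean and standard deviation. It is only an illustration: the thresholds SBN actually chooses are not described here, so placing them at the mean and one standard deviation either side is an assumption, as are the method names thresholds_for and state_index.

```ruby
# Hypothetical helper, not part of SBN: derives state boundaries from the
# observed values. Assumes thresholds at mean - stdev, mean and mean + stdev.
def thresholds_for(values)
  mean = values.sum / values.size.to_f
  variance = values.sum { |v| (v - mean)**2 } / values.size
  stdev = Math.sqrt(variance)
  [-1, 0, 1].map { |k| mean + k * stdev }
end

# Maps a number to a discrete state index: the count of thresholds at or
# below the value (0 = lowest bucket, thresholds.size = highest bucket).
def state_index(value, thresholds)
  thresholds.count { |threshold| value >= threshold }
end

observations = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
thresholds = thresholds_for(observations)
states = observations.map { |value| state_index(value, thresholds) }
```

These observations have mean 5.0 and standard deviation 2.0, so the thresholds are 3.0, 5.0 and 7.0, giving four discrete states. Recomputing the thresholds whenever new observations arrive is what lets the states adapt to the data.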
|
275
|
+
|
276
|
+
The following example shows a network that uses these advanced variable types:
|
277
|
+
|
278
|
+
net = Sbn::Net.new('Budget Category Network')
|
279
|
+
category = Sbn::Variable.new(net, :category, [0.5, 0.25, 0.25], [:gas, :food, :clothing])
|
280
|
+
amount = Sbn::NumericVariable.new(net, :amount)
|
281
|
+
merchant = Sbn::StringVariable.new(net, :merchant)
|
282
|
+
category.add_child(amount)
|
283
|
+
category.add_child(merchant)
|
284
|
+
|
285
|
+
Before parameter learning occurs on this network, it looks like this:
|
286
|
+
|
287
|
+
http://youngbloods.org/sbn/images/stringvar1.png
|
288
|
+
|
289
|
+
The <em>Category</em> variable represents a budget category for a financial
|
290
|
+
transaction. The <em>Amount</em> variable is for the amount of the transaction
|
291
|
+
and the <em>Merchant</em> variable handles observed strings for the merchant
|
292
|
+
where the transaction took place. Suppose we supplied some sample points to
|
293
|
+
this network:
|
294
|
+
|
295
|
+
net.add_sample_point :category => :gas, :amount => 29.11, :merchant => 'Chevron'
|
296
|
+
|
297
|
+
After adding that sample point, the network would look something like this:
|
298
|
+
|
299
|
+
http://youngbloods.org/sbn/images/stringvar2.png
|
300
|
+
|
301
|
+
The variables with dashed edges are the string covariables that were created
|
302
|
+
by the managing string variable when it saw a new string in the sample points.
|
303
|
+
At present, string variables generate ngrams of length 3, 6, and 10
|
304
|
+
characters. It is anticipated that these lengths will become customizable in a
|
305
|
+
future release.
|
306
|
+
|
307
|
+
== Future Features
|
308
|
+
There are many areas where we hope to improve Simple Bayesian Networks. Here
|
309
|
+
are some of the possible improvements that may be added in future releases:
|
310
|
+
* Support for exact inference
|
311
|
+
* Support for continuous variables
|
312
|
+
* Saving the sample points along with the network when saving to XMLBIF
|
313
|
+
* Speedier inference using native C++ with vectorization provided by macstl[http://www.pixelglow.com/macstl/]
|
314
|
+
* Speedier inference through parallelization
|
315
|
+
* Support for learning network structure
|
316
|
+
* Support for customizing the number of iterations in the MCMC algorithm (currently hard-coded)
|
317
|
+
* Support for customizing the size of ngrams used in string variables
|
318
|
+
* Support for intelligently determining the best number of iterations for MCMC at runtime based on the desired level of precision
|
319
|
+
|
320
|
+
Please share your own ideas with us and help to improve this library.
|
data/lib/combination.rb
ADDED
@@ -0,0 +1,78 @@
|
|
1
|
+
# = combination.rb: Class for handling variable state combinations
|
2
|
+
# Copyright (C) 2005-2007 Carl Youngblood mailto:carl@youngbloods.org
|
3
|
+
#
|
4
|
+
# Takes an array of arrays and iterates over all combinations of sub-elements.
|
5
|
+
# For example:
|
6
|
+
#
|
7
|
+
# c = Combination.new([[1, 2], [6, 7, 8]])
|
8
|
+
# c.each {|comb| p comb }
|
9
|
+
#
|
10
|
+
# Will produce:
|
11
|
+
#
|
12
|
+
# [1, 6]
|
13
|
+
# [1, 7]
|
14
|
+
# [1, 8]
|
15
|
+
# [2, 6]
|
16
|
+
# [2, 7]
|
17
|
+
# [2, 8]
|
18
|
+
|
19
|
+
class Combination # :nodoc:
|
20
|
+
include Enumerable
|
21
|
+
|
22
|
+
def initialize(arr)
|
23
|
+
@arr = arr
|
24
|
+
@current = Array.new(arr.size, 0)
|
25
|
+
end
|
26
|
+
|
27
|
+
def each
|
28
|
+
iterations = @arr.inject(1) {|product, element| product * element.size } - 1
|
29
|
+
yield current
|
30
|
+
iterations.times { yield self.next_combination }
|
31
|
+
end
|
32
|
+
|
33
|
+
def <=>(other)
|
34
|
+
@current <=> other.current
|
35
|
+
end
|
36
|
+
|
37
|
+
def first
|
38
|
+
@current.fill 0
|
39
|
+
end
|
40
|
+
|
41
|
+
def last
|
42
|
+
@current.size.times {|i| @current[i] = @arr[i].size - 1 }
|
43
|
+
end
|
44
|
+
|
45
|
+
def current
|
46
|
+
returnval = []
|
47
|
+
@current.size.times {|i| returnval[i] = @arr[i][@current[i]] }
|
48
|
+
returnval
|
49
|
+
end
|
50
|
+
|
51
|
+
def next_combination
|
52
|
+
i = @current.size - 1
|
53
|
+
@current.reverse.each do |e|
|
54
|
+
if e == @arr[i].size - 1
|
55
|
+
@current[i] = 0
|
56
|
+
else
|
57
|
+
@current[i] += 1
|
58
|
+
break
|
59
|
+
end
|
60
|
+
i -= 1
|
61
|
+
end
|
62
|
+
current
|
63
|
+
end
|
64
|
+
|
65
|
+
def prev_combination
|
66
|
+
i = @current.size - 1
|
67
|
+
@current.reverse.each do |e|
|
68
|
+
if e == 0
|
69
|
+
@current[i] = @arr[i].size - 1
|
70
|
+
else
|
71
|
+
@current[i] -= 1
|
72
|
+
break
|
73
|
+
end
|
74
|
+
i -= 1
|
75
|
+
end
|
76
|
+
current
|
77
|
+
end
|
78
|
+
end
|
data/lib/formats.rb
ADDED
@@ -0,0 +1,119 @@
|
|
1
|
+
class Sbn
|
2
|
+
class Net
|
3
|
+
# Returns a string containing a representation of the network in XMLBIF format.
|
4
|
+
# http://www.cs.cmu.edu/afs/cs/user/fgcozman/www/Research/InterchangeFormat
|
5
|
+
def to_xmlbif
|
6
|
+
xml = Builder::XmlMarkup.new(:indent => 2)
|
7
|
+
xml.instruct!
|
8
|
+
xml.comment! <<-EOS
|
9
|
+
|
10
|
+
Bayesian network in XMLBIF v0.3 (BayesNet Interchange Format)
|
11
|
+
Produced by SBN (Simple Bayesian Network library)
|
12
|
+
Output created #{Time.now}
|
13
|
+
EOS
|
14
|
+
xml.text! "\n"
|
15
|
+
xml.comment! "DTD for the XMLBIF 0.3 format"
|
16
|
+
xml.declare! :DOCTYPE, :bif do
|
17
|
+
xml.declare! :ELEMENT, :bif, :"(network)*"
|
18
|
+
xml.declare! :ATTLIST, :bif, :version, :CDATA, :"#REQUIRED"
|
19
|
+
xml.declare! :ELEMENT, :"network (name, (property | variable | definition)*)"
|
20
|
+
xml.declare! :ELEMENT, :name, :"(#PCDATA)"
|
21
|
+
xml.declare! :ELEMENT, :"variable (name, (outcome | property)*)"
|
22
|
+
xml.declare! :ATTLIST, :"variable type (nature | decision | utility) \"nature\""
|
23
|
+
xml.declare! :ELEMENT, :outcome, :"(#PCDATA)"
|
24
|
+
xml.declare! :ELEMENT, :definition, :"(for | given | table | property)*"
|
25
|
+
xml.declare! :ELEMENT, :for, :"(#PCDATA)"
|
26
|
+
xml.declare! :ELEMENT, :given, :"(#PCDATA)"
|
27
|
+
xml.declare! :ELEMENT, :table, :"(#PCDATA)"
|
28
|
+
xml.declare! :ELEMENT, :property, :"(#PCDATA)"
|
29
|
+
end
|
30
|
+
xml.bif :version => 0.3 do
|
31
|
+
xml.network do
|
32
|
+
xml.name(@name.to_s)
|
33
|
+
xml.text! "\n"
|
34
|
+
xml.comment! "Variables"
|
35
|
+
@variables.each {|name, variable| variable.to_xmlbif_variable(xml) }
|
36
|
+
xml.text! "\n"
|
37
|
+
xml.comment! "Probability distributions"
|
38
|
+
@variables.each {|name, variable| variable.to_xmlbif_definition(xml) }
|
39
|
+
end
|
40
|
+
end
|
41
|
+
end
|
42
|
+
|
43
|
+
# Reconstitute a saved network.
|
44
|
+
def self.from_xmlbif(source)
|
45
|
+
# convert tags to lower case
|
46
|
+
source.gsub!(/(<.*?>)/, '\\1'.downcase)
|
47
|
+
|
48
|
+
doc = XmlSimple.xml_in(source)
|
49
|
+
netname = doc['network'].first['name'].first
|
50
|
+
|
51
|
+
# find net name
|
52
|
+
returnval = Net.new(netname)
|
53
|
+
|
54
|
+
# find variables
|
55
|
+
count = 0
|
56
|
+
variables = {}
|
57
|
+
variable_elements = doc['network'].first['variable'].each do |var|
|
58
|
+
varname = var['name'].first.to_sym
|
59
|
+
properties = var['property']
|
60
|
+
vartype = nil
|
61
|
+
manager_name = nil
|
62
|
+
text_to_match = ""
|
63
|
+
options = {}
|
64
|
+
thresholds = []
|
65
|
+
properties.each do |prop|
|
66
|
+
key, val = prop.split('=').map {|e| e.strip }
|
67
|
+
vartype = val if key == 'SbnVariableType'
|
68
|
+
manager_name = val if key == 'ManagerVariableName'
|
69
|
+
text_to_match = eval(val) if key == 'TextToMatch'
|
70
|
+
options[key.to_sym] = val.to_i if key =~ /stdev_state_count/
|
71
|
+
thresholds = val.map {|e| e.to_f } if key == 'StateThresholds'
|
72
|
+
end
|
73
|
+
states = var['outcome']
|
74
|
+
table = []
|
75
|
+
doc['network'].first['definition'].each do |defn|
|
76
|
+
if defn['for'].first.to_sym == varname
|
77
|
+
table = defn['table'].first.split.map {|prob| prob.to_f }
|
78
|
+
end
|
79
|
+
end
|
80
|
+
count += 1
|
81
|
+
variables[varname] = case vartype
|
82
|
+
when "Sbn::StringVariable" then StringVariable.new(returnval, varname)
|
83
|
+
when "Sbn::NumericVariable" then NumericVariable.new(returnval, varname, table, thresholds, options)
|
84
|
+
when "Sbn::Variable" then Variable.new(returnval, varname, table, states)
|
85
|
+
when "Sbn::StringCovariable" then StringCovariable.new(returnval, manager_name, text_to_match, table)
|
86
|
+
end
|
87
|
+
end
|
88
|
+
|
89
|
+
# find relationships between variables
|
90
|
+
|
91
|
+
# connect covariables to their managers
|
92
|
+
variable_elements = doc['network'].first['variable'].each do |var|
|
93
|
+
varname = var['name'].first.to_sym
|
94
|
+
properties = var['property']
|
95
|
+
vartype = nil
|
96
|
+
covars = nil
|
97
|
+
parents = nil
|
98
|
+
properties.each do |prop|
|
99
|
+
key, val = prop.split('=').map {|e| e.strip }
|
100
|
+
covars = val.split(',').map {|e| e.strip.to_sym } if key == 'Covariables'
|
101
|
+
parents = val.split(',').map {|e| e.strip.to_sym } if key == 'Parents'
|
102
|
+
vartype = val if key == 'SbnVariableType'
|
103
|
+
end
|
104
|
+
if vartype == "Sbn::StringVariable"
|
105
|
+
parents.each {|p| variables[varname].add_parent(variables[p]) } if parents
|
106
|
+
covars.each {|covar| variables[varname].add_covariable(variables[covar]) } if covars
|
107
|
+
end
|
108
|
+
end
|
109
|
+
|
110
|
+
# connect all other variables to their parents
|
111
|
+
doc['network'].first['definition'].each do |defn|
|
112
|
+
varname = defn['for'].first.to_sym
|
113
|
+
parents = defn['given']
|
114
|
+
parents.each {|p| variables[varname].add_parent(variables[p.to_sym]) } if parents
|
115
|
+
end
|
116
|
+
returnval
|
117
|
+
end
|
118
|
+
end
|
119
|
+
end
|
data/lib/helpers.rb
ADDED
@@ -0,0 +1,140 @@
|
|
1
|
+
# = helpers.rb: Helper methods added to existing Ruby classes
|
2
|
+
# Credit goes to ruby-talk posts for many of these (details below).
|
3
|
+
#
|
4
|
+
# Permission is hereby granted, free of charge, to any person
|
5
|
+
# obtaining a copy of this software and associated documentation
|
6
|
+
# files (the "Software"), to deal in the Software without
|
7
|
+
# restriction, including without limitation the rights to use,
|
8
|
+
# copy, modify, merge, publish, distribute, sublicense, and/or sell
|
9
|
+
# copies of the Software, and to permit persons to whom the
|
10
|
+
# Software is furnished to do so, subject to the following
|
11
|
+
# conditions:
|
12
|
+
#
|
13
|
+
# The above copyright notice and this permission notice shall be
|
14
|
+
# included in all copies or substantial portions of the Software.
|
15
|
+
#
|
16
|
+
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
17
|
+
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
|
18
|
+
# OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
19
|
+
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
|
20
|
+
# HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
|
21
|
+
# WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
22
|
+
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
|
23
|
+
# OTHER DEALINGS IN THE SOFTWARE.
|
24
|
+
|
25
|
+
# Thanks to Brian Schrer <ruby.brian _at_ gmail.com> for the
|
26
|
+
# following two methods, from ruby-talk post #150456.
|
27
|
+
class Object # :nodoc:
|
28
|
+
def self.enums(*args)
|
29
|
+
args.flatten.each_with_index do |const, i|
|
30
|
+
class_eval %(#{const} = #{i})
|
31
|
+
end
|
32
|
+
end
|
33
|
+
|
34
|
+
def self.bitwise_enums(*args)
|
35
|
+
args.flatten.each_with_index do |const, i|
|
36
|
+
class_eval %(#{const} = #{2**i})
|
37
|
+
end
|
38
|
+
end
|
39
|
+
end
|
40
|
+
|
41
|
+
class String # :nodoc:
|
42
|
+
def to_underscore_sym
|
43
|
+
self.titleize.gsub(/\s+/, '').underscore.to_sym
|
44
|
+
end
|
45
|
+
|
46
|
+
# Thanks to David Alan Black for this method, from
|
47
|
+
# ruby-talk post #11792
|
48
|
+
def ngrams(len = 1)
|
49
|
+
ngrams = []
|
50
|
+
len = size if len > size
|
51
|
+
(0..size - len).each do |n|
|
52
|
+
ng = self[n...(n + len)]
|
53
|
+
ngrams.push(ng)
|
54
|
+
yield ng if block_given?
|
55
|
+
end
|
56
|
+
ngrams
|
57
|
+
end
|
58
|
+
end
|
59
|
+
|
60
|
+
class Symbol # :nodoc:
|
61
|
+
def to_underscore_sym
|
62
|
+
self.to_s.titleize.gsub(/\s+/, '').underscore.to_sym
|
63
|
+
end
|
64
|
+
end
|
65
|
+
|
66
|
+
class Array # :nodoc:
|
67
|
+
def symbolize_values
|
68
|
+
self.map {|e| e.to_underscore_sym }
|
69
|
+
end
|
70
|
+
|
71
|
+
def symbolize_values!
|
72
|
+
self.map! {|e| e.to_underscore_sym }
|
73
|
+
end
|
74
|
+
|
75
|
+
def normalize
|
76
|
+
sum = self.inject(0.0) {|sum, e| sum += e }
|
77
|
+
self.map {|e| e.to_f / sum }
|
78
|
+
end
|
79
|
+
|
80
|
+
def normalize!
|
81
|
+
sum = self.inject(0.0) {|sum, e| sum += e }
|
82
|
+
self.map! {|e| e.to_f / sum }
|
83
|
+
end
|
84
|
+
end
|
85
|
+
|
86
|
+
class Hash # :nodoc:
|
87
|
+
def symbolize_keys_and_values
|
88
|
+
inject({}) do |options, (key, value)|
|
89
|
+
key = key.to_underscore_sym
|
90
|
+
value = value.to_underscore_sym
|
91
|
+
options[key] = value
|
92
|
+
options
|
93
|
+
end
|
94
|
+
end
|
95
|
+
|
96
|
+
def symbolize_keys_and_values!
|
97
|
+
keys.each do |key|
|
98
|
+
newkey = key.to_underscore_sym
|
99
|
+
self[newkey] = self[key].to_underscore_sym
|
100
|
+
delete(key) unless key == newkey
|
101
|
+
end
|
102
|
+
self
|
103
|
+
end
|
104
|
+
end
|
105
|
+
|
106
|
+
# Thanks to Eric Hodel for the following additions
|
107
|
+
# to the enumerable model, from ruby-talk post #135920.
|
108
|
+
module Enumerable # :nodoc:
|
109
|
+
##
|
110
|
+
# Sum of all the elements of the Enumerable
|
111
|
+
def sum
|
112
|
+
return self.inject(0) { |acc, i| acc + i }
|
113
|
+
end
|
114
|
+
|
115
|
+
##
|
116
|
+
# Average of all the elements of the Enumerable
|
117
|
+
#
|
118
|
+
# The Enumerable must respond to #length
|
119
|
+
def average
|
120
|
+
return self.sum / self.length.to_f
|
121
|
+
end
|
122
|
+
|
123
|
+
##
|
124
|
+
# Variance of all the elements of the Enumerable (divides by n, not n - 1)
|
125
|
+
#
|
126
|
+
# The Enumerable must respond to #length
|
127
|
+
def sample_variance
|
128
|
+
avg = self.average
|
129
|
+
sum = self.inject(0) { |acc, i| acc + (i - avg) ** 2 }
|
130
|
+
return (1 / self.length.to_f * sum)
|
131
|
+
end
|
132
|
+
|
133
|
+
##
|
134
|
+
# Standard deviation of all the elements of the Enumerable
|
135
|
+
#
|
136
|
+
# The Enumerable must respond to #length
|
137
|
+
def standard_deviation
|
138
|
+
return Math.sqrt(self.sample_variance)
|
139
|
+
end
|
140
|
+
end
|