# Neuronet 6.0.0

Library to create neural networks.

* Author: <carlosjhr64@gmail.com>
* Copyright: 2013
* License: [GPL](http://www.gnu.org/licenses/gpl.html)
* Git Page: <https://github.com/carlosjhr64/neuronet>


## Installation

    gem install neuronet

## Synopsis

Given a set of inputs and targets, each an Array of Floats, then:

    # data = [ [input, target], ... ]
    # n = input.length
    # t = target.length
    # m = n + t
    # l = data.length
    # Then:

    # Create a general purpose neuronet
    neuronet = Neuronet::ScaledNetwork.new([n, m, t])

    # "Bless" it as a TaoYinYang,
    # a perceptron hybrid with the middle layer
    # initially mirroring the input layer and
    # mirrored by the output layer.
    Neuronet::TaoYinYang.bless(neuronet)

    # The following sets the learning constant
    # to something I think is reasonable.
    neuronet.num(l)

    # Start training
    MANY.times do
      data.shuffle.each do |input, target|
        neuronet.exemplar(input, target)
      end
    end # or until the error is small enough

    # See how well the training went
    require 'pp'
    data.each do |input, target|
      puts "Input:"
      pp input
      puts "Output:"
      neuronet.reset(input) # sets the input values
      pp neuronet.output    # gets the output values
      puts "Target:"
      pp target
    end

## Introduction

Neuronet is a pure Ruby 1.9, sigmoid-squashed, neural-network building library.
It allows one to build a network by connecting one neuron at a time, or a layer at a time,
or up to a full feedforward network that automatically scales the inputs and outputs.

I chose a TaoYinYang'ed ScaledNetwork neuronet for the synopsis because
it'll probably handle most anything you'd throw at it.
But there's a lot you can do to the data before throwing it at a neuronet,
and you can build a neuronet specifically to solve a particular kind of problem.
Properly transforming the data and choosing the right neuronet architecture
can greatly reduce the amount of training time the neuronet will require.
A neuronet with the wrong architecture for a problem will be unable to solve it,
and raw data without hints as to what's important in the data will take longer to learn.

As an analogy, think of what you can do with linear regression.
Your raw data might not be linear, but if a transform converts it to a linear form,
you can use linear regression to find the best-fit line and,
from that, deduce the properties of the untransformed data.
Likewise, if you can transform the data into something the neuronet can solve,
you can invert the transform to get back the answer you're looking for.

# WARNING: I STILL NEED TO REWRITE THE NOTES BELOW TO MATCH THE NEW CODE.

## Example: Time Series

First, a little motivation...
A common use for a neural net is to attempt to forecast a future set of data points
based on a past set of data points, a [time series](http://en.wikipedia.org/wiki/Time_series).
To demonstrate, I'll train a network with the following function:

    f(t) = A + B sine(C + D t), t in [0,1,2,3,...]

I'll set A, B, C, and D to some random values and see
if eventually the network can predict the next set of values based on previous values.
I'll try:

    [f(n),...,f(n+19)] => [f(n+20),...,f(n+24)]

That is... given 20 consecutive values, give the next 5 in the series.
There is no loss, and probably greater generality,
if I set the phase (C above) at random, so that for any given random phase we want:

    [f(0),...,f(19)] => [f(20),...,f(24)]

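The pair generation just described can be sketched standalone. The constants and method names below are my own illustration, not the example code's:

```ruby
# Hypothetical sketch of generating one training pair as described above:
# fix A, B, D once, randomize the phase C per example, and split 25
# consecutive values into 20 inputs and 5 targets.
A, B, D = 3.0, 3.3, 1.7 # arbitrary illustrative constants

def f(phase, t)
  A + B * Math.sin(phase + D * t)
end

def example_pair
  phase = 2.0 * Math::PI * rand # the random phase C
  values = (0..24).map { |t| f(phase, t) }
  [values[0, 20], values[20, 5]] # [f(0)..f(19)] => [f(20)..f(24)]
end

input, target = example_pair
puts input.length  # 20
puts target.length # 5
```

Each call to `example_pair` yields one input/target pair for the training loop.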

I'll be using [Neuronet::ScaledNetwork](http://rubydoc.info/gems/neuronet/Neuronet/ScaledNetwork).
Also note that the sine function is entirely defined within a cycle ( 2 Math::PI ), so
its parameters (particularly C) need only be set within that cycle.
After a lot of testing, I've verified that a
[perceptron](http://en.wikipedia.org/wiki/Perceptron) is enough to solve the problem.
The sine function is [linearly separable](http://en.wikipedia.org/wiki/Linearly_separable).
Adding hidden layers needlessly adds training time, but does converge.

The gist of the
[example code](https://github.com/carlosjhr64/neuronet/blob/master/examples/sine_series.rb)
is:

    ...
    # The constructor
    neuronet = Neuronet::ScaledNetwork.new([INPUTS, OUTPUTS])
    ...
    # Setting the learning constant
    neuronet.num(1.0)
    ...
    # Setting the input values
    neuronet.reset(input)
    ...
    # Getting the neuronet's output
    output = neuronet.output
    ...
    # Training the target
    neuronet.train!(target)
    ...

Here's a sample output:

    f(phase, t) = 3.002 + 3.28*Sin(phase + 1.694*t)
    Cycle step = 0.27

    Iterations: 1738
    Relative Error (std/B): 0.79%  Standard Deviation: 0.026
    Examples:

    Input:  0.522, 1.178, 5.932, 4.104, -0.199, 2.689, 6.28, 2.506, -0.154, 4.276, 5.844, 1.028, 0.647, 5.557, 4.727, 0.022, 2.011, 6.227, 3.198, -0.271
    Target: 3.613, 6.124, 1.621, 0.22, 5.069
    Output: 3.575, 6.101, 1.664, 0.227, 5.028

    Input:  5.265, 5.079, 0.227, 1.609, 6.12, 3.626, -0.27, 3.184, 6.229, 2.024, 0.016, 4.716, 5.565, 0.656, 1.017, 5.837, 4.288, -0.151, 2.493, 6.28
    Target: 2.703, -0.202, 4.091, 5.938, 1.189
    Output: 2.728, -0.186, 4.062, 5.931, 1.216

    Input:  5.028, 0.193, 1.669, 6.14, 3.561, -0.274, 3.25, 6.217, 1.961, 0.044, 4.772, 5.524, 0.61, 1.07, 5.87, 4.227, -0.168, 2.558, 6.281, 2.637
    Target: -0.188, 4.153, 5.908, 1.135, 0.557
    Output: -0.158, 4.112, 5.887, 1.175, 0.564

ScaledNetwork automatically scales each input via
[Neuronet::Gaussian](http://rubydoc.info/gems/neuronet/Neuronet/Gaussian),
so the input needs several variables, and
the output must be entirely determined by the shape of the input, not its scale.
That is, two inputs that differ only in scale should
produce outputs that differ only in scale.
The input must have at least three points.

You can tackle many problems just with
[Neuronet::ScaledNetwork](http://rubydoc.info/gems/neuronet/Neuronet/ScaledNetwork)
as described above.
So now that you're hopefully interested and want to go on to exactly how it all works,
I'll describe Neuronet from the ground up.

## Squashing Function

An artificial neural network uses an activation function
that determines the activation value of a neuron.
This activation value is often thought of as on/off or true/false.
Neuronet uses a sigmoid function to set the neuron's activation value between 1.0 and 0.0.
For classification problems, activation values near 1.0 are considered true
while activation values near 0.0 are considered false.
In Neuronet I make a distinction between the neuron's activation value and
its representation to the problem.
In the case of a true-or-false problem,
the neuron's value is either true or false,
while its activation is between 1.0 and 0.0.
This attribute, activation, need never appear in an implementation of Neuronet, but
it is mapped back to its unsquashed value every time
the implementation asks for the neuron's value:

    Neuronet.squash( unsquashed )
      1.0 / ( 1.0 + Math.exp( -unsquashed ) )

    Neuronet.unsquashed( squashed )
      Math.log( squashed / ( 1.0 - squashed ) )

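A standalone sketch of this squashing pair (not the library code itself) shows the two are inverses of each other:

```ruby
# The sigmoid squash and its inverse, written out standalone.
def squash(unsquashed)
  1.0 / (1.0 + Math.exp(-unsquashed))
end

def unsquash(squashed)
  Math.log(squashed / (1.0 - squashed))
end

puts squash(0.0)            # 0.5
puts squash(9.0)            # 0.99987...
puts unsquash(squash(1.37)) # 1.37, up to floating point
```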
## Learning Constant

**TODO: This section needs re-write**

One can think of a neural network as a sheet of very elastic rubber
which one pokes and pulls to fit the training data while
otherwise keeping the sheet as smooth as possible.
You don't want to hammer this malleable sheet too hard.
One concern is that the training data may contain noise, random errors.
So the training of the network should add up the true signal in the data
while canceling out the noise. This balance is set via the learning constant.

    neuronet.learning
    # Returns the current value of the network's learning constant

    neuronet.learning = float
    # where float is greater than zero but less than one.
    # Sets the global learning constant to an implementation-given value

I've not come across any hard rule for the learning constant.
I have my own intuition derived from the behavior of random walks.
The distance away from a starting point in a random walk is
proportional to the square root of the number of steps.
I conjecture that the number of training data points is related to
the optimal learning constant in the same way.
I have come across 0.2 as a good value for the learning constant, which
would mean the proponent of this value was working with a data set size of about 25.
In any case, I've had good results with the following:

    # where number is the number of data points
    neuronet.learning( number )
      1.0 / Math.sqrt( number + 1.0 )

In the case of setting number to 1.0,
the learning constant would be the square root of 1/2.
This would suggest that although we're taking larger steps than half steps,
due to the nature of a random walk, we're approaching the solution in half steps.

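The heuristic above is easy to check standalone (the method name here is my own):

```ruby
# Learning constant heuristic: 1 / sqrt(number_of_data_points + 1).
def learning_constant(number)
  1.0 / Math.sqrt(number + 1.0)
end

puts learning_constant(1.0) # 0.7071..., the square root of 1/2
puts learning_constant(24)  # 0.2 exactly, matching the commonly cited value
```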
## Noise

The literature I've read (probably outdated by now)
would have one create a neural network with random weights and
hope that training it will converge to a solution.
I've never really believed that to be a correct way.
Although the implementation is free to set all parameters for each neuron,
Neuronet by default creates zeroed neurons.
Associations between inputs and outputs are trained, and
neurons differentiate from each other randomly.
Differentiation among neurons is achieved by noise in the back-propagation of errors.
This noise is provided by:

    Neuronet.noise
      rand + rand

I chose rand + rand to give the noise an average value of one and
a bell-like (triangular) distribution.

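A quick standalone check of this noise source: sampling rand + rand gives a mean near 1.0 and values within (0, 2):

```ruby
# Sample the noise distribution rand + rand and estimate its mean.
def noise
  rand + rand
end

samples = Array.new(100_000) { noise }
mean = samples.sum / samples.length
puts mean                   # close to 1.0
puts samples.minmax.inspect # within (0.0, 2.0)
```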
## Node

A neuron is a node. In Neuronet, Neuronet::Neuron subclasses Neuronet::Node.
A node has a value which the implementation can set. A Node object is created via:

    Neuronet::Node.new( value=0.0 )

and responds to the following methods:

    value=( float )
    value

The above methods work just as expected:

    node = Neuronet::Node.new
    a = node.value # sets a to 0.0
    node.value = 1.37
    b = node.value # sets b to 1.37

But if you look at the code for Neuronet::Node, you'll see that value is not stored;
its calculated activation is.
The implementation can get this activation via the attribute reader:

    activation

In Neuronet, a node is a constant neuron whose value is not changed by
training (back-propagation of errors). It is used for inputs.
It's used as a terminal where updates and back-propagations end.
For this purpose, it provides the following methods:

    train( target=nil, learning=nil ) # returns nil
    backpropagate( error ) # returns nil
    update # returns activation

I consider these methods private.
I can't think of a reason they'd appear in the implementation.
Likewise, the implementation should not have to bother with activation.

## Scaling The Problem

It's early to be talking about scaling the problem, but
since I just covered how to set values to a node above,
it's a good time to start thinking about scale.
The squashing function, sigmoid, maps the real numbers (negative infinity, positive infinity)
to the segment zero to one (0,1).
But for the sake of computation in a neural net,
sigmoid works best if the problem is scaled to numbers
between negative one and positive one (-1, 1).
Study the following table and see if you can see why:

     x => sigmoid(x)
     9 => 0.99987...
     3 => 0.95257...
     2 => 0.88079...
     1 => 0.73105...
     0 => 0.50000...
    -1 => 0.26894...
    -2 => 0.11920...
    -3 => 0.04742...
    -9 => 0.00012...

So as x gets much higher than 3, sigmoid(x) gets to be pretty close to just 1, and
as x gets much lower than -3, sigmoid(x) gets to be pretty close to 0.
Also note that sigmoid is centered about 0.5, which maps to 0.0 in problem space.
It is for this reason that I suggest the problem be displaced (subtracted)
by its average to be centered about zero, and scaled (divided) by its standard deviation.
For non-Gaussian data where outliers are expected,
you should probably scale by a multiple of the standard deviation so
that most of the data fits within sigmoid's "field of view" (-1, 1).

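The suggested preprocessing can be sketched standalone (the method name and the factor parameter are my own illustration):

```ruby
# Center data by its mean and scale by (a multiple of) its sample standard
# deviation so most of it lands within sigmoid's (-1, 1) "field of view".
def center_and_scale(data, factor = 1.0)
  mean = data.sum / data.length
  variance = data.map { |v| (v - mean)**2 }.sum / (data.length - 1.0)
  data.map { |v| (v - mean) / (factor * Math.sqrt(variance)) }
end

scaled = center_and_scale([1.0, 2.0, 3.0, 4.0, 5.0])
puts scaled.join(', ') # symmetric about 0.0
```

Passing a factor greater than 1.0 squeezes the data further toward zero, making room for outliers.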
## Connection

This is where I think Neuronet gets its architecture really right!
Connections between neurons (and nodes) are their own separate objects.
In other code bases I've seen, this is not abstracted out.
In Neuronet, a neuron contains its bias and a list of its connections.
Each connection contains its weight (strength) and connected terminal node.
Given a terminal node, node, a connection is created as follows:

    connection = Neuronet::Connection.new( node, weight=0.0 )

So a neuron connected to the given terminal node would have
the created connection in its connections list.
This will be discussed below under the topic Neuron.
The object, connection, responds to the following methods:

    value
    update
    backpropagate( error )

The value of a connection is the weighted activation of
the node it's connected to ( weight * node.activation ).
Similarly, update is the updated value of a connection,
which is the weighted updated activation of the node it's connected to ( weight * node.update ).
The method update is the one to use
whenever the values of the inputs are changed (or right after training).
Otherwise, both update and value should give the same result,
with value avoiding the unnecessary back calculations.
The method backpropagate modifies the connection's weight in proportion to
the error given and passes that error to its connected node via the node's backpropagate.

## Neuron

[Neuronet::Neuron](http://rubydoc.info/gems/neuronet/Neuronet/Neuron)
is a Neuronet::Node with some extra features.
It adds two attributes: connections, and bias.
As mentioned above, connections is a list, aka Array,
of the neuron's connections to other neurons (or nodes).
A neuron's bias is its kicker (or deduction) to its activation value,
added to the sum of its connections' values. So a neuron's updated value is set as:

    self.value = @bias + @connections.inject(0.0){|sum,connection| sum + connection.update}

If you're not familiar with Ruby's Array#inject method,
it's the Ruby way of doing summations.
It's really cool once you get the gist of it. Check out:

* Jay Field's Thoughts on Ruby: inject
* Induction ( for_all )

But that's a digression... Here's how an implementation creates a new neuron:

    neuron = Neuronet::Neuron.new( bias=0.0 )

There's an attribute accessor for @bias, and an attribute reader for @connections.
The object, neuron, responds to the following methods:

    update
    partial
    backpropagate( error )
    train( target, learning=Neuronet.learning )
    connect( node, weight=0.0 )

The update method sets the neuron's value as described above.
The partial method sets the neuron's value without calling
the connections' update methods, as follows:

    self.value = @bias + @connections.inject(0.0){|sum,connection| sum + connection.value}

It's not necessary to burrow all the way down to update the current neuron
if its connected neurons have all been updated.
The implementation should set its algorithm to use partial
instead of update, as update will most likely needlessly update previously updated neurons.
The backpropagate method modifies the neuron's bias in proportion to the given error and
passes on this error to each of its connections' backpropagate methods.
The connect method is how the implementation adds a connection,
the way to connect the neuron to another.
To connect neuron b to neuron a, for example, it is:

    a = Neuronet::Neuron.new
    b = Neuronet::Neuron.new
    b.connect(a)

Think output connects to input.
Here, the input flow would be from a to b,
while back-propagation of errors flows from b to a.
If you wanted to train the value of b, b.value,
to be 1.5 with the given value of a set at 0.3, you do as follows:

    puts "(#{a.value}, #{b.value})" # what you've got before: (0.0, 0.0)
    a.value = 0.3
    b.train(1.5)
    b.partial # don't forget to update (no need to go deeper than a, so partial)
    puts "(#{a.value}, #{b.value})" # (0.3, 0.113022020702079)

Note that with continued training, b should approach its target value of 1.5.

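To make the Connection and Neuron mechanics of the last two sections concrete, here's a toy, self-contained sketch. The class names and the exact update rule are my own illustration, not the library's code (notably, the library's noise factor is omitted):

```ruby
# Toy sketch of the Node/Connection/Neuron design described above.
def squash(x)
  1.0 / (1.0 + Math.exp(-x))
end

def unsquash(x)
  Math.log(x / (1.0 - x))
end

class ToyNode
  attr_reader :activation

  def initialize(value = 0.0)
    self.value = value
  end

  def value=(value)
    @activation = squash(value) # only the squashed activation is stored
  end

  def value
    unsquash(@activation)
  end

  def backpropagate(_error)
    nil # a node is a terminal: back-propagation ends here
  end
end

class ToyConnection
  def initialize(node, weight = 0.0)
    @node, @weight = node, weight
  end

  def value
    @weight * @node.activation # weighted activation of the connected node
  end

  def backpropagate(error)
    @weight += @node.activation * error # adjust weight in proportion to error
    @node.backpropagate(error)
  end
end

class ToyNeuron < ToyNode
  def initialize(bias = 0.0)
    super(0.0)
    @bias = bias
    @connections = []
  end

  def connect(node, weight = 0.0)
    @connections.push(ToyConnection.new(node, weight))
  end

  def partial # bias plus the sum of the weighted connection values
    self.value = @bias + @connections.inject(0.0) { |sum, c| sum + c.value }
  end

  def backpropagate(error)
    @bias += error # adjust bias in proportion to error
    @connections.each { |c| c.backpropagate(error) }
  end

  def train(target, learning = 0.5)
    backpropagate(learning * (target - value))
  end
end

a = ToyNode.new
b = ToyNeuron.new
b.connect(a)
a.value = 0.3
20.times { b.train(1.5); b.partial }
puts b.value # converges toward the target 1.5
```

The last five lines repeat the training example above: each train/partial pass closes a fixed fraction of the remaining error, so b.value converges geometrically to 1.5.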
## InputLayer

What follows next in lib/neuronet.rb's code is motivated by feedforward neural networks,
and Neuronet eventually gets to build one.
Neuronet::InputLayer is an Array of Neuronet::Node's.
An input layer of a given length (number of nodes) is created as follows:

    input = Neuronet::InputLayer.new( length )

The object, input, responds to a couple of methods (on top of those from Array):

    set( input )
    values

For example, a three-neuron input layer with its neuron values set as -1, 0, and 1:

    input = Neuronet::InputLayer.new(3)
    input.set( [-1, 0, 1] )
    puts input.values.join(', ') # -1.0, 0.0, 1.0

## Layer

In Neuronet, InputLayer is to Layer what Node is to Neuron.
Layer is an Array of Neurons.
A Layer object is created as follows:

    # length is the number of neurons in the layer
    layer = Neuronet::Layer.new( length )

The Layer object responds to the following methods:

    connect( layer, weight=0.0 )
    partial
    train( targets, learning=Neuronet.learning )
    values

So now one can create layers, connect them, train them, and update them (via partial).
A perceptron is built this way:

    n, m = 3, 3 # building a 3X3 perceptron
    input_layer = Neuronet::InputLayer.new( n )
    output_layer = Neuronet::Layer.new( m )
    output_layer.connect( input_layer )
    # to set the perceptron's input to -0.5, 0.25, 2.1...
    input_layer.set( [-0.5, 0.25, 2.1] )
    # to train it to -0.1, 0.2, 0.5
    output_layer.train( [-0.1, 0.2, 0.5] )
    output_layer.partial # update!
    # to see its values
    puts output_layer.values.join(', ')

## FeedForwardNetwork

Now we're building complete networks.
To create a feedforward neural network with optional middle layers, ffnn:

    ffnn = Neuronet::FeedForwardNetwork.new([input, <layer1, ...,> output])

The FeedForwardNetwork object, ffnn, responds to the following methods:

    learning=( learning_constant ) # to explicitly set a learning constant
    update
    set( inputs )
    train!( targets, learning=@learning )
    exemplar( inputs, targets, learning=@learning ) # trains an input/output pair
    values(layer) # layer's values
    input # in (first layer's) values
    output # out (last layer's) values

And it has the following attribute readers:

    in # input (first) layer
    out # output (last) layer

Notice that this time I've named the training method train! (with the exclamation mark).
This is because train! automatically does the update as well.
I thought it might be confusing that at the lower level one had to call train and
either partial or update, so I made the distinction.
Neuronet also provides a convenience method, exemplar, to train input/output pairs.
It's equivalent to the following:

    ffnn.set( inputs ); ffnn.train!( targets )

## Scale

Neuronet::Scale is a class to help scale problems to fit within a network's "field of view".
Given a list of values, it finds the minimum and maximum values and
establishes a mapping to a scaled set of numbers between minus one and one (-1, 1).
For example:

    scale = Neuronet::Scale.new
    values = [ 1, -3, 5, -2 ]
    scale.set( values )
    mapped = scale.mapped( values )
    puts mapped.join(', ') # 0.0, -1.0, 1.0, -0.75
    puts scale.unmapped( mapped ).join(', ') # 1.0, -3.0, 5.0, -2.0

The mapping is like:

    center = (maximum + minimum) / 2.0 if center.nil? # calculate center if not given
    spread = (maximum - minimum) / 2.0 if spread.nil? # calculate spread if not given
    inputs.map{ |value| (value - center) / (factor * spread) }

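This mapping is easy to verify standalone against the example output above (the method name here is my own):

```ruby
# Map values around the min/max midpoint, scaled by half the min-to-max
# range, as the mapping above describes.
def scale_map(values, factor = 1.0)
  min, max = values.minmax
  center = (max + min) / 2.0
  spread = (max - min) / 2.0
  values.map { |value| (value - center) / (factor * spread) }
end

puts scale_map([1.0, -3.0, 5.0, -2.0]).join(', ') # 0.0, -1.0, 1.0, -0.75
```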
One can change the range of the map to (-1/factor, 1/factor),
where factor is the spread multiplier, and force
a (perhaps pre-calculated) value for center and spread.
The constructor is:

    scale = Neuronet::Scale.new( factor=1.0, center=nil, spread=nil )

In the constructor, if the value of center is provided, then
that value will be used instead of it being calculated from the values passed to method set.
Likewise, if spread is provided, that value of spread will be used.
There are two attribute accessors:

    spread
    center

One attribute writer:

    init

In the code, the attribute @init flags whether
there is an initiation phase to the calculation of @spread and @center.
For Scale, @init is true and the initiation phase calculates
the intermediate values @min and @max (the minimum and maximum values in the data set).
It's possible for subclasses of Scale, such as Gaussian, to not have this initiation phase.
An instance, scale, of class Scale will respond to the following methods considered to be public:

    set( inputs )
    mapped_input
    mapped_output
    unmapped_input
    unmapped_output

In Scale, mapped\_input and mapped\_output are synonyms of mapped, but
in general this symmetry may be broken.
Likewise, unmapped\_input and unmapped\_output are synonyms of unmapped.
Scale also provides the following methods which are considered private:

    set_init( inputs )
    set_spread( inputs )
    set_center( inputs )
    mapped( inputs )
    unmapped( outputs )

Except maybe for mapped and unmapped,
there should be no reason for the implementation to call these directly.
These are expected to be overridden by subclasses.
For example, in Gaussian, set\_spread calculates the standard deviation and
set\_center calculates the mean (average),
while set\_init is skipped by setting @init to false.

## Gaussian

In Neuronet, Gaussian subclasses Scale and is used exactly the same way.
The only changes are that it calculates the arithmetic mean (average) for center and
the standard deviation for spread.
The following private methods are overridden to provide that effect:

    set_center( inputs )
      inputs.inject(0.0,:+) / inputs.length
    set_spread( inputs )
      Math.sqrt( inputs.map{|value| self.center - value}.inject(0.0){|sum,value| value*value + sum} / (inputs.length - 1.0) )

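These two definitions can be checked standalone with a small data set:

```ruby
# Arithmetic mean for center, sample standard deviation for spread,
# computed just as in the overrides above.
inputs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
center = inputs.inject(0.0, :+) / inputs.length
spread = Math.sqrt(
  inputs.map { |value| center - value }
        .inject(0.0) { |sum, value| value * value + sum } / (inputs.length - 1.0)
)
puts center # 5.0
puts spread # 2.138..., the sample standard deviation
```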
## LogNormal

Neuronet::LogNormal subclasses Neuronet::Gaussian to transform the values to a logarithmic scale.
It overrides the following methods:

    set( inputs )
      super( inputs.map{|value| Math::log(value)} )
    mapped( inputs )
      super( inputs.map{|value| Math::log(value)} )
    unmapped( outputs )
      super( outputs.map{|value| Math::exp(value)} )

So LogNormal is just Gaussian except that it first pipes values through a logarithm, and
then pipes the output back through exponentiation.

## ScaledNetwork

So now we're back to where we started.
In Neuronet, ScaledNetwork is a subclass of FeedForwardNetwork.
It automatically scales the problem given to it by using a Scale type instance,
Gaussian by default.
It adds one attribute accessor:

    distribution

The attribute, distribution, is set to Neuronet::Gaussian.new by default,
but one can change this to Scale, LogNormal, or one's own custom mapper.
ScaledNetwork also adds one method:

    reset( values )

This method, reset, works just like FeedForwardNetwork's set method,
but calls distribution.set( values ) first.
Sometimes you'll want to set the distribution with the entire data set and then use set,
and other times you'll want to set the distribution with each input and use reset.
For example, either:

    scaled_network.distribution.set( data_set.flatten )
    data_set.each do |inputs,outputs|
      # ... do your stuff using scaled_network.set( inputs )
    end

or:

    data_set.each do |inputs,outputs|
      # ... do your stuff using scaled_network.reset( inputs )
    end

## Pitfalls

When sub-classing a Neuronet::Scale type class,
make sure mapped\_input, mapped\_output, unmapped\_input,
and unmapped\_output are defined as you intended.
If you don't override them, they will point to the first ancestor that defines them.
I had a bug (in 2.0.0, fixed in 2.0.1) where
I assumed overriding mapped redefined all its parent's synonyms...
it does not work that way.
Another pitfall is confusing the input/output flow in connections and back-propagation.
Remember to connect outputs to inputs (out.connect(in)) and
to back-propagate from outputs to inputs (out.train(targets)).

# THE FOLLOWING SECTIONS NEED A COMPLETE REWRITE!

## Custom Networks

To demonstrate how this library can build custom networks,
I've created four new classes of feedforward networks.
By the way, I'm completely making these up and was about to call them
HotDog, Taco, Burrito, and Enchilada when I then thought of Tao/Yin/Yang:

## Tao

In Neuronet, Tao is a three-or-more-layered feedforward neural network
with its output and input layers connected directly.
That effectively makes it a hybrid perceptron. It subclasses ScaledNetwork.

## Yin

In Neuronet, Yin is a Tao with its first hidden layer, hereby called yin,
initially set to have neuron pairs corresponding with
its input layer's, with weights set to 1.0 and bias 0.5.
This makes yin initially mirror the input layer.
The correspondence is made between the first neurons in the yin layer and the input layer.

## Yang

In Neuronet, Yang is a Tao with its output layer connected to the last hidden layer,
hereby called yang, such that corresponding neuron pairs have weights set to 1.0 and bias 0.5.
This makes the output initially mirror yang.
The correspondence is made between the last neurons in the yang layer and the output layer.

## YinYang
|
672
|
+
|
673
|
+
In Neuronet, YinYang is a Tao that's been Yin'ed Yang'ed. :))
|
674
|
+
That's a feed forward network of at least three layers with
|
675
|
+
its output layer also connected directly to the input layer, and
|
676
|
+
with the output layer initially mirroring the last hidden layer, and
|
677
|
+
the first hidden layer initially mirroring the input layer.
|
678
|
+
Note that a particularly interesting YinYang with n inputs and m outputs
|
679
|
+
would be constructed this way:
|
680
|
+
|
681
|
+
    yinyang = Neuronet::YinYang.new( [n, n+m, m] )

Here yinyang's hidden layer (which is both yin and yang)
initially would have the first n neurons mirror the input and
the last m neurons be mirrored by the output.
Another interesting YinYang would be:

    yinyang = Neuronet::YinYang.new( [n, n, n] )

The following code demonstrates what is meant by "mirroring":

    yinyang = Neuronet::YinYang.new( [3, 3, 3] )
    yinyang.reset( [-1, 0, 1] )
    puts yinyang.in.map{|x| x.activation}.join(', ')
    puts yinyang.yin.map{|x| x.activation}.join(', ')
    puts yinyang.out.map{|x| x.activation}.join(', ')
    puts yinyang.output.join(', ')

Here's the output:

    0.268941421369995, 0.5, 0.731058578630005
    0.442490985892539, 0.5, 0.557509014107461
    0.485626707638021, 0.5, 0.514373292361979
    -0.0575090141074614, 0.0, 0.057509014107461

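These numbers can be reproduced by hand with nothing but the squash function; a minimal sketch (the `squash` helper below restates `Neuronet.squash`, and the yin relation follows from the mirroring weight 1.0 and bias -0.5 set in the library):

```ruby
# Reproduce the first two rows of the mirroring demo by hand.
# yin mirrors the input via: yin.activation = squash(-0.5 + in.activation)
def squash(x)
  1.0 / (1.0 + Math.exp(-x))
end

ins  = [-1, 0, 1].map { |v| squash(v) }   # input activations
yins = ins.map { |a| squash(-0.5 + a) }   # yin activations

puts ins.join(', ')   # 0.2689..., 0.5, 0.7310...
puts yins.join(', ')  # 0.4424..., 0.5, 0.5575...
```

Note that yin only approximately mirrors the input: the activation gets squashed a second time, which is why the yin row is compressed toward 0.5.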
## Questions?

Email me.

# AND A LOT OF MATH I NEED TO GO OVER AND CLEAN UP

## Notes I had on my old ynot2day

My sources are Neural Networks & Fuzzy Logic by Dr. Valluru B. Rao and Hayagriva V. Rao (1995), and Neural Computing Architectures edited by Igor Aleksander (1989), which includes "A theory of neural networks" by Eduardo R. Caianiello and "Backpropagation in non-feedforward networks" by Luis B. Almeida. The following is my analysis of the general mathematics of neural networks, with a clarity I have not found elsewhere.

First, let me define my notation. I hate to reinvent the wheel (well, actually, it is kind of fun to do so), but I do not know the standard math notation when using straight ASCII typed from a normal keyboard. So I define the notation for summation, differentiation, the sigmoid function, and the exponential function given as Exp{}. Indexes to vectors and arrays are bracketed with []. Objects acted on by functions are bracketed by {}. Grouping of variables/objects is achieved with (). I also use () to include parameters that modify a function.

Definition of Sum:

    Sum(j=1 to 3){A[j]} = A[1]+A[2]+A[3]
    Sum(i=0 to N){f[i]} = f[0]+...+f[N]

Definition of Dif:

    Dif(x){x^n} = n*x^(n-1)
    Dif(y){f{u}} = Dif(u){f{u}}*Dif(y){u}

Definition of Sig:

    Sig{z} = 1/(1+Exp{-z})

Next, I describe a mathematical model of a neuron. Usually a neuron is described as being either on or off. I believe it more useful to describe a neuron as having a pulse rate. A boolean (true or false) neuron would have either a high or a low pulse rate. In the absence of any stimuli from neighboring neurons, the neuron may also have a rest pulse rate. The rest pulse rate is due to the bias of a neuron. A neuron receives stimuli from other neurons through the axons that connect them. These axons communicate to the receiving neuron the pulse rates of the transmitting neurons. The signals from other neurons are either strengthened or weakened at the synapse, and might either inhibit or excite the receiving neuron. The sum of all these signals is the activation of the receiving neuron. The activation of a neuron determines the neuron's actual response (its pulse rate), which the neuron then transmits to other neurons through its axons. Finally, a neuron has a maximum pulse rate, which I map to 1, and a minimum pulse rate of 0.

Let the bias of a neuron be b[], the activation, y[], the response, x[], and the weights, w[,]. The pulse rate of a receiving neuron, r, is related to its activation, which is related to the pulse rates of the other transmitting neurons, t, by the following equations:

    x[r] = Sig{ y[r] }
    y[r] = b[r] + Sum(All t){ w[r,t] * x[t] }
    x[r] = Sig{ b[r] + Sum(All t){ w[r,t] * x[t] } }

Next I try to derive the learning rule of the neural network. Somehow, a neuron can be trained to become more or less sensitive to stimuli from another neuron, and to become more or less sensitive in general. That is, we can change the neuron's bias and synaptic weights. To do it right, I need an estimate of the error in the neuron's pulse, and to relate this to the correction needed in the bias and each synaptic weight. This type of error analysis is usually approximated through differentiation. What is the error in the pulse rate due to an error in a synaptic weight?

    Dif(w[r,t]){x[r]} = Dif(y[r]){Sig{y[r]}} * Dif(w[r,t]){y[r]}

    Dif(z){Sig{z}} = Exp{-z}/(1+Exp{-z})^2
                   = ((1+Exp{-z})-1)/(1+Exp{-z})^2
                   = 1/(1+Exp{-z}) - 1/(1+Exp{-z})^2
                   = Sig{z} - Sig{z}^2
                   = Sig{z}*(1-Sig{z})

    Dif(y[r]){Sig{y[r]}} = Sig{y[r]}*(1-Sig{y[r]})
                         = x[r]*(1-x[r])

    Dif(w[r,t]){y[r]} = Dif(w[r,t]){Sum(t){w[r,t]*x[t]}} = x[t]

    Dif(w[r,t]){x[r]} = x[r] * (1-x[r]) * x[t]

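The identity Dif(z){Sig{z}} = Sig{z}*(1-Sig{z}) can be checked numerically; a quick sketch (the `sig` helper restates Sig{}, and the sample points are arbitrary):

```ruby
# Numerically verify Dif(z){Sig{z}} = Sig{z}*(1 - Sig{z})
# by comparing a central finite difference against the closed form.
def sig(z)
  1.0 / (1.0 + Math.exp(-z))
end

h = 1e-6
[-2.0, 0.0, 1.5].each do |z|
  numeric  = (sig(z + h) - sig(z - h)) / (2 * h)  # central difference
  analytic = sig(z) * (1 - sig(z))
  puts "z=#{z}: #{numeric} ~ #{analytic}"
end
```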
Let X[r] be the correct pulse rate. The error in the pulse rate is the difference between the correct value and the actual computation, e[r] = X[r]-x[r]. Let dx[r,t] be the error in x[r] due to weight w[r,t]. Consider that dx[r,t]/dw[r,t] approximates Dif(w[r,t]){x[r]}.

    dx[r,t]/dw[r,t] = Dif(w[r,t]){x[r]}
    dx[r,t] = x[r] * (1-x[r]) * x[t] * dw[r,t]

Then e[r] is the sum of all errors, dx[r,t]:

    e[r] = Sum(t){ dx[r,t] }
         = Sum(t=1 to N){ x[r]*(1-x[r])*x[t]*dw[r,t] }

Straight algebra thus far. Now the tricky part... I have related the error in a neuron's pulse to the sum of the errors in the neuron's receiving synapses. What I really want is to relate it to a particular synapse. This information is lost in the sum, and I must rely on statistical chance. Let me first pretend I know that the error partitions itself among the synapses with distribution u[r,t].

    e[r] = Sum(t=1 to N){ u[r,t] * e[r] }
    u[r,t] * e[r] = x[r] * (1-x[r]) * x[t] * dw[r,t]

The average value of u[r,t] is probably 1/N. In any case, the point is that this average is a small number less than 1. We use an equi-partition hypothesis and assume that each dw[r,t] is equally likely to be the source of error. Let u[r,t] ~ u, a small number, for all r and t. The best estimate of dw[r,t] becomes:

    u * e[r] = x[r] * (1-x[r]) * x[t] * dw[r,t]
    dw[r,t] = u * e[r] / ( x[r] * (1-x[r]) * x[t] ) ???

If u ~ 1/N was not tricky, then consider this. x[] is meant to converge to either 0 or 1. That is, x[] is meant to be boolean. Note how the above equation for dw[r,t] could not really work if x[] truly were 0 or 1. But x[] is a fuzzy variable, never really achieving 0 or 1.

How do I conclude that... dw[r,t] = u * x[r] * (1-x[r]) * x[t] * e[r] ...which is the correct learning rule?

This is the part of the Neural Net jargon I have not been able to bring down to my level. I believe the answer is buried in what is being called transposition of linear networks. My analysis is correct up to this:

    u * e[r] = x[r] * (1-x[r]) * x[t] * dw[r,t]

This equation relates the error in the pulse of a neuron to the error in the synaptic weight between the transmitting neuron, t, and the receiving neuron, r. I believe the transposition of linear networks states that the relationship remains the same when back-propagating the error in the neural pulse to the synaptic weight. That is, we do not invert the multiplication factor. This seems intuitive, but I admit I am confused by the paradox in the algebra above. Thus...

The learning rule for the (0,1) sigmoid neuron:

    dw[r,t] = x[r] * (1-x[r]) * x[t] * u * e[r]

The derivation for the correction to the bias is analogous. Note that the x[t] factor does not appear in this case.

    db[r] = Dif(b[r]){x[r]} * u * e[r]
          = x[r]*(1-x[r]) * Dif(b[r]){b[r]} * u * e[r]
          = x[r]*(1-x[r]) * 1 * u * e[r]
    db[r] = x[r]*(1-x[r]) * u * e[r]

I was able to arrive at an estimate of the correction needed for the output neuron's synaptic weight and bias. I knew what the output was supposed to be, X[r], and the actual computation, x[r]. The error of the output neuron, e[r], was X[r]-x[r]. But what if the neuron is not an output neuron? I need to propagate back the error of the output neuron (and later of the general receiving neuron) to each of its transmitting neurons. The error of a transmitting neuron is assigned to be the sum of all errors propagated back from all of its receiving neurons.

    e[t] = Sum(r){ x[r] * (1-x[r]) * x[t] * u * e[r] }

Then, when we get to adjusting the transmitting neuron, we will have an estimate of its pulse error. These are the learning equations for the general neural network:

    dw[r,t] = Dif(w[r,t]){x[r]} * u * e[r]
    db[r] = Dif(b[r]){x[r]} * u * e[r]

The above equations give the correction to the synaptic weight and neural bias once we are given the error in a neuron. Next, we need to propagate back the known errors in the output through the network.

    e[t] = X[t] - x[t] if the t'th neuron is also an output neuron.
    e[t] = Sum(r){ w[r,t] * e[r] * Dif(b[r]){x[r]} } for the rest.

Note how I sent the errors from the receiving neurons to the transmitting neuron. I hope this explains the theory well. I distilled it from the above sources.

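The two learning equations can be sketched for a single sigmoid neuron; in the sketch below, the learning constant u and the sample weights, inputs, and target are made-up values for illustration:

```ruby
# One training step for a single sigmoid neuron, following
# dw[r,t] = x[r]*(1-x[r])*x[t]*u*e[r] and db[r] = x[r]*(1-x[r])*u*e[r].
def sig(z)
  1.0 / (1.0 + Math.exp(-z))
end

u  = 0.1             # learning constant (made up)
w  = [0.2, -0.4]     # synaptic weights
b  = 0.0             # bias
xt = [0.9, 0.3]      # transmitting pulse rates
target = 0.8         # the correct pulse rate, X[r]

xr = sig(b + w.zip(xt).map { |wi, xi| wi * xi }.inject(:+))
e  = target - xr     # e[r] = X[r] - x[r]
w  = w.each_with_index.map { |wi, i| wi + xr * (1 - xr) * xt[i] * u * e }
b += xr * (1 - xr) * u * e

new_xr = sig(b + w.zip(xt).map { |wi, xi| wi * xi }.inject(:+))
puts "error before: #{e}, after: #{target - new_xr}"
```

One step shrinks the error slightly; repeated steps converge toward the target.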
## Notes from reading neuronet

For some Neuronet::FeedForward object, obj:

    obj.output
    obj.out.values
    obj.out.map{|node| node.to_f}
    obj.out.map{|node| node.value}
    obj.out.map{|node| unsquash(node.activation)}
    obj.out.map{|node| bias + connections}

    O[i] = b + Sum(1,J){|j| W[i,j] Squash(I[j])}

    100 +/- 1 = 50 + Sum(1,50){|j| w[i,j]I[j]}
    de = 1/100
    b += b*de

If we say that the value of some output

    Output[o] = Bias[o] + Sum{ Connection[m,o] }

has some error E

    Target[o] = Output[o] + E

then there is an e such that

    Output[o](1+e) = Output[o] + E
    (1+e) = (Output[o] + E)/Output[o]
    1+e = 1 + E/Output[o]
    e = E/Output[o]

And Target can be set as

    Target[o] = (Bias[o] + Sum{ Connection[m,o] })(1+e)
    Target[o] = Bias[o](1+e) + Sum{ Connection[m,o] }(1+e)

Assuming equipartition of the error,
we might then suggest the following correction to Bias:

    Bias[o] = Bias[o](1+e)
    Bias[o] = Bias[o] + Bias[o]e
    Bias[o] += Bias[o]e

Remember that:

    D{squash(u)} = squash(u)*(1-squash(u))*D{u}

    @activation = squash( @bias + @connections... )
    D{ @activation } = D{ squash( @bias + @connections... ) }
    D{ @activation } = @activation*(1-@activation) D{ @bias + @connections... }

Just the part due to bias...

    D.bias{ @activation } = @activation*(1-@activation) D{ @bias }
    D.bias{ @activation } / (@activation*(1-@activation)) = D{ @bias }

Just the part due to connection...

    D.connection{ @activation } = @activation*(1-@activation) D{ @connections... }
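The proportional-error bookkeeping above can be checked on made-up numbers; if Bias and every Connection are scaled by (1+e) with e = E/Output[o], the sum lands exactly on Target[o]:

```ruby
# Check the identity e = E/Output[o]:
# scaling bias and each connection value by (1+e) reproduces the target.
output = 100.0
target = 101.0
e = (target - output) / output   # e = E/Output[o] = 0.01

bias = 50.0
connections = [30.0, 20.0]       # made-up values; bias + connections = output
scaled = bias * (1 + e) + connections.map { |c| c * (1 + e) }.inject(:+)
puts scaled  # equals the target, 101.0
```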
data/lib/neuronet.rb
ADDED
@@ -0,0 +1,434 @@
# Neuronet module
module Neuronet
  VERSION = '6.0.0'

  # The squash function for Neuronet is the sigmoid function.
  # One should scale the problem with most data points between -1 and 1,
  # extremes under 2s, and no outbounds above 3s.
  # Standard deviations from the mean is probably a good way to figure the scale of the problem.
  def self.squash(unsquashed)
    1.0 / (1.0 + Math.exp(-unsquashed))
  end

  def self.unsquash(squashed)
    Math.log(squashed / (1.0 - squashed))
  end

  # By default, Neuronet builds a zeroed network.
  # Noise adds random fluctuations to create a search for minima.
  def self.noise
    rand + rand
  end

  # A Node, used for the input layer.
  class Node
    attr_reader :activation
    # A Node is constant (Input)
    alias update activation

    # The "real world" value of a node is the value of its activation unsquashed.
    def value=(val)
      @activation = Neuronet.squash(val)
    end

    def initialize(val=0.0)
      self.value = val
    end

    # The "real world" value is stored as a squashed activation.
    def value
      Neuronet.unsquash(@activation)
    end

    # Node is a terminal where backpropagation ends.
    def backpropagate(error)
      # to be over-ridden
      nil
    end
  end

  # A weighted connection to a neuron (or node).
  class Connection
    attr_accessor :node, :weight
    def initialize(node, weight=0.0)
      @node, @weight = node, weight
    end

    # The value of a connection is the weighted activation of the connected node.
    def value
      @node.activation * @weight
    end

    # Updates and returns the value of the connection.
    # Updates the connected node.
    def update
      @node.update * @weight
    end

    # Adjusts the connection weight according to error and
    # backpropagates the error to the connected node.
    def backpropagate(error)
      @weight += @node.activation * error * Neuronet.noise
      @node.backpropagate(error)
    end
  end

  # A Neuron with bias and connections
  class Neuron < Node
    attr_reader :connections
    attr_accessor :bias
    def initialize(bias=0.0)
      super(bias)
      @connections = []
      @bias = bias
    end

    # Updates the activation with the current value of bias and updated values of connections.
    def update
      self.value = @bias + @connections.inject(0.0){|sum,connection| sum + connection.update}
    end

    # Updates the activation with the current values of bias and connections.
    # For when connections are already updated.
    def partial
      self.value = @bias + @connections.inject(0.0){|sum,connection| sum + connection.value}
    end

    # Adjusts bias according to error and
    # backpropagates the error to the connections.
    def backpropagate(error)
      @bias += error * Neuronet.noise
      @connections.each{|connection| connection.backpropagate(error)}
    end

    # Connects the neuron to another node.
    # Updates the activation with the new connection.
    # The default weight=0 means there is no initial association.
    def connect(node, weight=0.0)
      @connections.push(Connection.new(node,weight))
      update
    end
  end

  # This is the Input Layer
  class InputLayer < Array
    def initialize(length) # number of nodes
      super(length)
      0.upto(length-1){|index| self[index] = Neuronet::Node.new }
    end

    # This is where one enters the "real world" inputs.
    def set(inputs)
      0.upto(self.length-1){|index| self[index].value = inputs[index]}
    end
  end

  # Just a regular Layer
  class Layer < Array
    def initialize(length)
      super(length)
      0.upto(length-1){|index| self[index] = Neuronet::Neuron.new }
    end

    # Allows one to fully connect layers.
    def connect(layer, weight=0.0)
      # creates the neuron matrix... note that node can be either Neuron or Node class.
      self.each{|neuron| layer.each{|node| neuron.connect(node,weight) }}
    end

    # Updates layer with current values of the previous layer.
    def partial
      self.each{|neuron| neuron.partial}
    end

    # Takes the real world targets for each node in this layer
    # and backpropagates the error to each node.
    # Note that the learning constant is really a value
    # that needs to be determined for each network.
    def train(targets, learning)
      0.upto(self.length-1) do |index|
        node = self[index]
        node.backpropagate(learning*(targets[index] - node.value))
      end
    end

    # Returns the real world values of this layer.
    def values
      self.map{|node| node.value}
    end
  end

  # A Feed Forward Network
  class FeedForward < Array
    # A measure of the network's size:
    # counts the connections and nodes above the input layer.
    # Used to set the learning constant.
    def mu
      sum = 1.0
      1.upto(self.length-1) do |i|
        n, m = self[i-1].length, self[i].length
        sum += n + n*m
      end
      return sum
    end
    def muk(k=1.0)
      @learning = k/mu
    end
    def num(n)
      @learning = 1.0/(Math.sqrt(1.0+n) * mu)
    end

    attr_reader :in, :out
    attr_reader :yin, :yang
    attr_accessor :learning
    def initialize(layers)
      super(length = layers.length)
      @in = self[0] = Neuronet::InputLayer.new(layers[0])
      (1).upto(length-1){|index|
        self[index] = Neuronet::Layer.new(layers[index])
        self[index].connect(self[index-1])
      }
      @out = self.last
      @yin = self[1] # first middle layer
      @yang = self[-2] # last middle layer
      @learning = 1.0/mu
    end

    def update
      # update up the layers
      (1).upto(self.length-1){|index| self[index].partial}
    end

    def set(inputs)
      @in.set(inputs)
      update
    end

    def train!(targets)
      @out.train(targets, @learning)
      update
    end

    # Trains an input/output pair.
    def exemplar(inputs, targets)
      set(inputs)
      train!(targets)
    end

    def input
      @in.values
    end

    def output
      @out.values
    end
  end

  # Scales the problem
  class Scale
    attr_accessor :spread, :center
    attr_writer :init

    def initialize(factor=1.0,center=nil,spread=nil)
      @factor,@center,@spread = factor,center,spread
      @centered, @spreaded = center.nil?, spread.nil?
      @init = true
    end

    def set_init(inputs)
      @min, @max = inputs.minmax
    end

    # In this case, inputs is unused, but
    # it's there for the general case.
    def set_spread(inputs)
      @spread = (@max - @min) / 2.0
    end

    # In this case, inputs is unused, but
    # it's there for the general case.
    def set_center(inputs)
      @center = (@max + @min) / 2.0
    end

    def set(inputs)
      set_init(inputs) if @init
      set_center(inputs) if @centered
      set_spread(inputs) if @spreaded
    end

    def mapped(inputs)
      factor = 1.0 / (@factor*@spread)
      inputs.map{|value| factor*(value - @center)}
    end
    alias mapped_input mapped
    alias mapped_output mapped

    # Note that it could also unmap inputs, but
    # outputs is typically what's being transformed back.
    def unmapped(outputs)
      factor = @factor*@spread
      outputs.map{|value| factor*value + @center}
    end
    alias unmapped_input unmapped
    alias unmapped_output unmapped
  end

  # Normal Distribution
  class Gaussian < Scale
    def initialize(factor=1.0,center=nil,spread=nil)
      super(factor, center, spread)
      self.init = false
    end

    def set_center(inputs)
      self.center = inputs.inject(0.0,:+) / inputs.length
    end

    def set_spread(inputs)
      self.spread = Math.sqrt(inputs.map{|value|
        self.center - value}.inject(0.0){|sum,value|
        value*value + sum} / (inputs.length - 1.0))
    end
  end

  # Log-Normal Distribution
  class LogNormal < Gaussian
    def initialize(factor=1.0,center=nil,spread=nil)
      super(factor, center, spread)
    end

    def set(inputs)
      super( inputs.map{|value| Math::log(value)} )
    end

    def mapped(inputs)
      super( inputs.map{|value| Math::log(value)} )
    end
    alias mapped_input mapped
    alias mapped_output mapped

    def unmapped(outputs)
      super(outputs).map{|value| Math::exp(value)}
    end
    alias unmapped_input unmapped
    alias unmapped_output unmapped
  end

  # Series Network for similar input/output values
  class ScaledNetwork < FeedForward
    attr_accessor :distribution

    def initialize(layers)
      super(layers)
      @distribution = Gaussian.new
    end

    def train!(targets)
      super(@distribution.mapped_output(targets))
    end

    # @param (List of Float) inputs
    def set(inputs)
      super(@distribution.mapped_input(inputs))
    end

    def reset(inputs)
      @distribution.set(inputs)
      set(inputs)
    end

    def output
      @distribution.unmapped_output(super)
    end

    def input
      @distribution.unmapped_input(super)
    end
  end

  # A Perceptron Hybrid
  module Tao
    def mu
      sum = super
      sum += self.first.length * self.last.length
      return sum
    end
    def self.bless(myself)
      # @out directly connects to @in
      myself.out.connect(myself.in)
      myself.extend Tao
      # Save current learning and set it to muk(1).
      l, m = myself.learning, myself.muk
      # If learning was lower before, revert.
      myself.learning = l if l < m
      return myself
    end
  end

  # Sets @yin to initially mirror @in.
  module Yin
    def self.bless(myself)
      yin = myself.yin
      if yin.length < (in_length = myself.in.length)
        raise "First hidden layer, yin, needs to have at least the same length as input"
      end
      # Connections from yin[i] to in[i] are set to 1.0... mirroring to start.
      0.upto(in_length-1) do |index|
        node = yin[index]
        node.connections[index].weight = 1.0
        node.bias = -0.5
      end
      return myself
    end
  end

  # Sets @out to initially mirror @yang.
  module Yang
    def self.bless(myself)
      offset = myself.yang.length - (out_length = (out = myself.out).length)
      raise "Last hidden layer, yang, needs to have at least the same length as output" if offset < 0
      0.upto(out_length-1) do |index|
        node = out[index]
        node.connections[offset+index].weight = 1.0
        node.bias = -0.5
      end
      return myself
    end
  end

  # And convenient composites...

  # Yin-Yang-ed :))
  module YinYang
    def self.bless(myself)
      Yin.bless(myself)
      Yang.bless(myself)
      return myself
    end
  end

  module TaoYinYang
    def self.bless(myself)
      Tao.bless(myself)
      Yin.bless(myself)
      Yang.bless(myself)
      return myself
    end
  end

  module TaoYin
    def self.bless(myself)
      Tao.bless(myself)
      Yin.bless(myself)
      return myself
    end
  end

  module TaoYang
    def self.bless(myself)
      Tao.bless(myself)
      Yang.bless(myself)
      return myself
    end
  end

end
metadata
ADDED
@@ -0,0 +1,47 @@
--- !ruby/object:Gem::Specification
name: neuronet
version: !ruby/object:Gem::Version
  version: 6.0.0
prerelease:
platform: ruby
authors:
- carlosjhr64@gmail.com
autorequire:
bindir: bin
cert_chain: []
date: 2013-06-16 00:00:00.000000000 Z
dependencies: []
description: Build custom neural networks. 100% 1.9 Ruby.
email: carlosjhr64@gmail.com
executables: []
extensions: []
extra_rdoc_files: []
files:
- ./lib/neuronet.rb
- ./README.md
homepage: https://github.com/carlosjhr64/neuronet
licenses: []
post_install_message:
rdoc_options: []
require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ! '>='
    - !ruby/object:Gem::Version
      version: '0'
required_rubygems_version: !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ! '>='
    - !ruby/object:Gem::Version
      version: '0'
requirements: []
rubyforge_project:
rubygems_version: 1.8.11
signing_key:
specification_version: 3
summary: Library to create neural networks.
test_files: []
has_rdoc: