neuronet 6.0.0 → 6.0.1
- data/README.md +407 -564
- data/lib/neuronet.rb +157 -30
- metadata +2 -2
data/README.md
CHANGED
@@ -1,11 +1,12 @@
|
|
1
|
-
# Neuronet 6.0.
|
1
|
+
# Neuronet 6.0.1
|
2
2
|
|
3
3
|
Library to create neural networks.
|
4
4
|
|
5
|
+
* Gem: <https://rubygems.org/gems/neuronet>
|
6
|
+
* Git: <https://github.com/carlosjhr64/neuronet>
|
5
7
|
* Author: <carlosjhr64@gmail.com>
|
6
8
|
* Copyright: 2013
|
7
9
|
* License: [GPL](http://www.gnu.org/licenses/gpl.html)
|
8
|
-
* Git Page: <https://github.com/carlosjhr64/neuronet>
|
9
10
|
|
10
11
|
## Installation
|
11
12
|
|
@@ -13,16 +14,16 @@ Library to create neural networks.
|
|
13
14
|
|
14
15
|
## Synopsis
|
15
16
|
|
16
|
-
Given some set of inputs
|
17
|
-
Then:
|
17
|
+
Given some set of inputs (of at least length 3) and
|
18
|
+
targets that are Arrays of Floats. Then:
|
18
19
|
|
19
20
|
# data = [ [input, target], ... ]
|
20
|
-
# n = input.length
|
21
|
+
# n = input.length # >= 3
|
21
22
|
# t = target.length
|
22
23
|
# m = n + t
|
23
24
|
# l = data.length
|
24
25
|
# Then:
|
25
|
-
# Create a general purpose
|
26
|
+
# Create a general purpose neuronet
|
26
27
|
|
27
28
|
neuronet = Neuronet::ScaledNetwork.new([n, m, t])
|
28
29
|
|
@@ -42,7 +43,8 @@ Then:
|
|
42
43
|
|
43
44
|
MANY.times do
|
44
45
|
data.shuffle.each do |input, target|
|
45
|
-
neuronet.
|
46
|
+
neuronet.reset(input)
|
47
|
+
neuronet.train!(target)
|
46
48
|
end
|
47
49
|
end # or until some small enough error
|
48
50
|
|
@@ -66,7 +68,7 @@ It allows one to build a network by connecting one neuron at a time, or a layer
|
|
66
68
|
or up to a full feed forward network that automatically scales the inputs and outputs.
|
67
69
|
|
68
70
|
I chose a TaoYinYang'ed ScaledNetwork neuronet for the synopsis because
|
69
|
-
it
|
71
|
+
it will probably handle most anything with 3 or more input variables you'd throw at it.
|
70
72
|
But there's a lot you can do to the data before throwing it at a neuronet.
|
71
73
|
And you can build a neuronet specifically to solve a particular kind of problem.
|
72
74
|
Properly transforming the data and choosing the right neuronet architecture
|
@@ -74,25 +76,25 @@ can greatly reduce the amount of training time the neuronet will require.
|
|
74
76
|
A neuronet with the wrong architecture for a problem will be unable to solve it.
|
75
77
|
Raw data without hints as to what's important in the data will take longer to solve.
|
76
78
|
|
77
|
-
As an analogy, think of what you can do with
|
79
|
+
As an analogy, think of what you can do with
|
80
|
+
[linear regression](http://en.wikipedia.org/wiki/Linear_regression).
|
78
81
|
Your raw data might not be linear, but if a transform converts it to a linear form,
|
79
82
|
you can use linear regression to find the best fit line, and
|
80
83
|
from that deduce the properties of the untransformed data.
|
81
84
|
Likewise, if you can transform the data into something the neuronet can solve,
|
82
85
|
you can apply the inverse transform to get back the answer you're looking for.
|
83
86
|
|
84
|
-
#
|
87
|
+
# Examples
|
85
88
|
|
86
|
-
##
|
89
|
+
## Time Series
|
87
90
|
|
88
|
-
First, a little motivation...
|
89
91
|
A common use for a neural net is to attempt to forecast a future set of data points
|
90
92
|
based on a past set of data points, a [Time series](http://en.wikipedia.org/wiki/Time_series).
|
91
93
|
To demonstrate, I'll train a network with the following function:
|
92
94
|
|
93
95
|
f(t) = A + B sine(C + D t), t in [0,1,2,3,...]
|
94
96
|
|
95
|
-
I'll set A, B, C, and D
|
97
|
+
I'll set A, B, C, and D with random numbers and see
|
96
98
|
if eventually the network can predict the next set of values based on previous values.
|
97
99
|
I'll try:
|
98
100
|
|
@@ -106,13 +108,12 @@ if I set at random the phase (C above), so that for any given random phase we wa
|
|
106
108
|
|
107
109
|
I'll be using [Neuronet::ScaledNetwork](http://rubydoc.info/gems/neuronet/Neuronet/ScaledNetwork).
|
108
110
|
Also note that the Sine function is entirely defined within a cycle ( 2 Math::PI ) and
|
109
|
-
so parameters (particularly C) need only to be set within
|
111
|
+
so parameters (particularly C) need only to be set within this cycle.
|
110
112
|
After a lot of testing, I've verified that a
|
111
113
|
[Perceptron](http://en.wikipedia.org/wiki/Perceptron) is enough to solve the problem.
|
112
114
|
The Sine function is [Linearly separable](http://en.wikipedia.org/wiki/Linearly_separable).
|
113
115
|
Adding hidden layers needlessly adds training time, but does converge.
|
114
116
|
|
115
|
-
|
116
117
|
The gist of the
|
117
118
|
[example code](https://github.com/carlosjhr64/neuronet/blob/master/examples/sine_series.rb)
|
118
119
|
is:
|
@@ -155,7 +156,6 @@ Here's a sample output:
|
|
155
156
|
Target: -0.188, 4.153, 5.908, 1.135, 0.557
|
156
157
|
Output: -0.158, 4.112, 5.887, 1.175, 0.564
|
157
158
|
|
158
|
-
|
159
159
|
ScaledNetwork automatically scales each input via
|
160
160
|
[Neuronet::Gaussian](http://rubydoc.info/gems/neuronet/Neuronet/Gaussian),
|
161
161
|
so the input needs to be many variables and
|
@@ -167,340 +167,247 @@ The input must have at least three points.
|
|
167
167
|
You can tackle many problems just with
|
168
168
|
[Neuronet::ScaledNetwork](http://rubydoc.info/gems/neuronet/Neuronet/ScaledNetwork)
|
169
169
|
as described above.
|
170
|
-
So now that you're hopefully interested and want to go on to exactly how it all works,
|
171
|
-
I'll describe Neuronet from the ground up.
|
172
|
-
|
173
|
-
## Squashing Function
|
174
|
-
|
175
|
-
An artificial neural network uses an activation function
|
176
|
-
that determines the activation value of a neuron.
|
177
|
-
This activation value is often thought of as on/off or true/false.
|
178
|
-
Neuronet uses a sigmoid function to set the neuron's activation value between 1.0 and 0.0.
|
179
|
-
For classification problems, activation values near one are considered true
|
180
|
-
while activation values near 0.0 are considered false.
|
181
|
-
In Neuronet I make a distinction between the neuron's activation value and
|
182
|
-
its representation to the problem.
|
183
|
-
In the case of a true or false problem,
|
184
|
-
the neuron's value is either true or false,
|
185
|
-
while its activation is between 1.0 and 0.0.
|
186
|
-
This attribute, activation, need never appear in an implementation of Neuronet, but
|
187
|
-
it is mapped back to its unsquashed value every time
|
188
|
-
the implementation asks for the neuron's value.
|
189
|
-
|
190
|
-
Neuronet.squash( unsquashed )
|
191
|
-
1.0 / ( 1.0 + Math.exp( -unsquashed ) )
|
192
|
-
|
193
|
-
Neuronet.unsquashed( squashed )
|
194
|
-
Math.log( squashed / ( 1.0 - squashed ) )
|
195
170
|
|
196
|
-
|
171
|
+
# Component Architecture
|
197
172
|
|
198
|
-
|
173
|
+
## Nodes and Neurons
|
199
174
|
|
200
|
-
|
201
|
-
|
202
|
-
|
203
|
-
|
204
|
-
|
205
|
-
|
206
|
-
|
207
|
-
|
208
|
-
|
209
|
-
neuronet.learning
|
210
|
-
# Returns the current value of the network's learning constant
|
175
|
+
[Nodes](http://rubydoc.info/gems/neuronet/Neuronet/Node)
|
176
|
+
are used to set inputs while
|
177
|
+
[Neurons](http://rubydoc.info/gems/neuronet/Neuronet/Neuron)
|
178
|
+
are used for outputs and middle layers.
|
179
|
+
It's easy to create and connect Nodes and Neurons.
|
180
|
+
You can assemble custom neuronets one neuron at a time.
|
181
|
+
To illustrate, here's a simple network that adds two random numbers.
|
211
182
|
|
212
|
-
neuronet
|
213
|
-
|
214
|
-
# Sets the global learning constant by an implementation given value
|
183
|
+
require 'neuronet'
|
184
|
+
include Neuronet
|
215
185
|
|
216
|
-
|
217
|
-
|
218
|
-
|
219
|
-
proportional to the square root of the number of steps.
|
220
|
-
I conjecture that the number of training data points is related to
|
221
|
-
the optimal learning constant in the same way.
|
222
|
-
I have come across 0.2 as a good value for the learning constant, which
|
223
|
-
would mean the proponent of this value was working with a data set size of about 25.
|
224
|
-
In any case, I've had good results with the following:
|
225
|
-
|
226
|
-
# where number is the number of data points
|
227
|
-
neuronet.learning( number )
|
228
|
-
1.0 / Math.sqrt( number + 1.0 )
|
186
|
+
def random
|
187
|
+
rand - rand
|
188
|
+
end
|
229
189
|
|
230
|
-
|
231
|
-
|
232
|
-
|
233
|
-
due to the nature of a random walk, we're approaching the solution in half steps.
|
190
|
+
# create the input nodes
|
191
|
+
a = Node.new
|
192
|
+
b = Node.new
|
234
193
|
|
235
|
-
|
194
|
+
# create the output neuron
|
195
|
+
sum = Neuron.new
|
236
196
|
|
237
|
-
|
238
|
-
|
239
|
-
hope that training it will converge to a solution.
|
240
|
-
I've never really believed that to be a correct way.
|
241
|
-
Although the implementation is free to set all parameters for each neuron,
|
242
|
-
Neuronet by default creates zeroed neurons.
|
243
|
-
Associations between inputs and outputs are trained, and
|
244
|
-
neurons differentiate from each other randomly.
|
245
|
-
Differentiation among neurons is achieved by noise in the back-propagation of errors.
|
246
|
-
This noise is provided by:
|
197
|
+
# and a neuron on the side
|
198
|
+
adjuster = Neuron.new
|
247
199
|
|
248
|
-
|
249
|
-
|
200
|
+
# connect the adjuster to a and b
|
201
|
+
adjuster.connect(a)
|
202
|
+
adjuster.connect(b)
|
250
203
|
|
251
|
-
|
204
|
+
# connect sum to a and b
|
205
|
+
sum.connect(a)
|
206
|
+
sum.connect(b)
|
207
|
+
# and to the adjuster
|
208
|
+
sum.connect(adjuster)
|
252
209
|
|
253
|
-
|
210
|
+
# The learning constant is about...
|
211
|
+
learning = 0.1
|
254
212
|
|
255
|
-
|
256
|
-
|
213
|
+
# Train the tiny network
|
214
|
+
10_000.times do
|
215
|
+
a.value = x = random
|
216
|
+
b.value = y = random
|
217
|
+
target = x+y
|
218
|
+
output = sum.update
|
219
|
+
sum.backpropagate(learning*(target-output))
|
220
|
+
end
|
257
221
|
|
258
|
-
|
222
|
+
# Let's see how well the training went
|
223
|
+
10.times do
|
224
|
+
a.value = x = random
|
225
|
+
b.value = y = random
|
226
|
+
target = x+y
|
227
|
+
output = sum.update
|
228
|
+
puts "#{x.round(3)} + #{y.round(3)} = #{target.round(3)}"
|
229
|
+
puts " Neuron says #{output.round(3)}, #{(100.0*(target-output)/target).round(2)}% error."
|
230
|
+
end
|
259
231
|
|
260
|
-
and responds to the following methods:
|
261
232
|
|
262
|
-
|
263
|
-
|
233
|
+
Here's a sample output:
|
234
|
+
|
235
|
+
0.003 + -0.413 = -0.41
|
236
|
+
Neuron says -0.413, -0.87% error.
|
237
|
+
-0.458 + 0.528 = 0.07
|
238
|
+
Neuron says 0.07, -0.45% error.
|
239
|
+
0.434 + -0.125 = 0.309
|
240
|
+
Neuron says 0.313, -1.43% error.
|
241
|
+
-0.212 + 0.34 = 0.127
|
242
|
+
Neuron says 0.131, -2.83% error.
|
243
|
+
-0.364 + 0.659 = 0.294
|
244
|
+
Neuron says 0.286, 2.86% error.
|
245
|
+
0.045 + 0.323 = 0.368
|
246
|
+
Neuron says 0.378, -2.75% error.
|
247
|
+
0.545 + 0.901 = 1.446
|
248
|
+
Neuron says 1.418, 1.9% error.
|
249
|
+
-0.451 + -0.486 = -0.937
|
250
|
+
Neuron says -0.944, -0.77% error.
|
251
|
+
-0.008 + 0.219 = 0.211
|
252
|
+
Neuron says 0.219, -3.58% error.
|
253
|
+
0.61 + 0.554 = 1.163
|
254
|
+
Neuron says 1.166, -0.25% error.
|
255
|
+
|
256
|
+
Note that the tiny neuronet has a limit on how precisely it can match the target, and
|
257
|
+
even after training a million times it won't do any better than after a few thousand.
|
258
|
+
[code](https://github.com/carlosjhr64/neuronet/blob/master/examples/neurons.rb)
|
259
|
+
|
260
|
+
|
261
|
+
## InputLayer and Layer
|
262
|
+
|
263
|
+
Instead of working with individual neurons, you can work with layers.
|
264
|
+
Here we build a [Perceptron](http://en.wikipedia.org/wiki/Perceptron):
|
265
|
+
|
266
|
+
input = InputLayer.new(9) # named input because in is a reserved word in Ruby
|
267
|
+
out = Layer.new(1)
|
268
|
+
out.connect(input)
|
264
269
|
|
265
|
-
|
270
|
+
When making connections keep in mind "outputs connect to inputs",
|
271
|
+
not the other way around.
|
272
|
+
You can set the input values and update this way:
|
266
273
|
|
267
|
-
|
268
|
-
|
269
|
-
node.value = 1.37
|
270
|
-
b = node.value # sets b to 1.37
|
274
|
+
input.set([1,2,3,4,5,6,7,8,9])
|
275
|
+
out.partial
|
271
276
|
|
272
|
-
|
273
|
-
|
274
|
-
|
277
|
+
Partial means the update won't travel further than the current layer,
|
278
|
+
which is all we have in this case anyway.
|
279
|
+
You get the output this way:
|
275
280
|
|
276
|
-
|
281
|
+
output = out.output # returns an array of values
|
277
282
|
|
278
|
-
|
279
|
-
backpropagation of errors. It is used for inputs.
|
280
|
-
It's used as a terminal where updates and back-propagations end.
|
281
|
-
For this purpose, it provides the following methods:
|
283
|
+
You train this way:
|
282
284
|
|
283
|
-
|
284
|
-
|
285
|
-
|
285
|
+
target = [1] #<= whatever value you want in the array
|
286
|
+
learning = 0.1
|
287
|
+
out.train(target, learning)
|
286
288
|
|
287
|
-
|
288
|
-
I can't think of a reason they'd appear in the implementation.
|
289
|
-
Likewise, the implementation should not have to bother with activation.
|
289
|
+
## FeedForward Network
|
290
290
|
|
291
|
-
|
291
|
+
Most of the time, you'll just use a network created with the
|
292
|
+
[FeedForward](http://rubydoc.info/gems/neuronet/Neuronet/FeedForward) class,
|
293
|
+
or a modified version or subclass of it.
|
294
|
+
Here we build a neuronet with four layers.
|
295
|
+
The input layer has four neurons, and the output has three.
|
296
|
+
Then we train it with a list of inputs and targets
|
297
|
+
using the method [#exemplar](http://rubydoc.info/gems/neuronet/Neuronet/FeedForward:exemplar):
|
292
298
|
|
293
|
-
|
294
|
-
|
295
|
-
|
296
|
-
|
297
|
-
|
298
|
-
|
299
|
-
|
300
|
-
between negative one and positive one (-1, 1).
|
301
|
-
Study the following table and see if you can see why:
|
302
|
-
|
303
|
-
x => sigmoid(x)
|
304
|
-
9 => 0.99987...
|
305
|
-
3 => 0.95257...
|
306
|
-
2 => 0.88079...
|
307
|
-
1 => 0.73105...
|
308
|
-
0 => 0.50000...
|
309
|
-
-1 => 0.26894...
|
310
|
-
-2 => 0.11920...
|
311
|
-
-3 => 0.04742...
|
312
|
-
-9 => 0.00012...
|
313
|
-
|
314
|
-
So as x gets much higher than 3, sigmoid(x) gets to be pretty close to just 1, and
|
315
|
-
as x gets much lower than -3, sigmoid(x) gets to be pretty close to 0.
|
316
|
-
Also note that sigmoid is centered about 0.5 which maps to 0.0 in problem space.
|
317
|
-
It is for this reason that I suggest the problem be displaced (subtracted)
|
318
|
-
by its average to be centered about zero and scaled (divided) by its standard deviation.
|
319
|
-
For non gaussian data where outbounds are expected,
|
320
|
-
you should probably scale by a multiple of the standard deviation so
|
321
|
-
that most of the data fits within sigmoid's "field of view" (-1, 1).
|
322
|
-
|
323
|
-
## Connection
|
324
|
-
|
325
|
-
This is where I think Neuronet gets its architecture really right!
|
326
|
-
Connections between neurons (and nodes) are their own separate objects.
|
327
|
-
In other codes I've seen this is not abstracted out.
|
328
|
-
In Neuronet, a neuron contains its bias, and a list of its connections.
|
329
|
-
Each connection contains its weight (strength) and connected terminal node.
|
330
|
-
Given a terminal, node, a connection is created as follows:
|
331
|
-
|
332
|
-
connection = Neuronet::Connection.new( node, weight=0.0 )
|
333
|
-
|
334
|
-
So a neuron connected to the given terminal node would have
|
335
|
-
the created connection in its connections list.
|
336
|
-
This will be discussed below under the topic Neuron.
|
337
|
-
The object, connection, responds to the following methods:
|
338
|
-
|
339
|
-
value
|
340
|
-
update
|
341
|
-
backpropagate( error )
|
342
|
-
|
343
|
-
The value of a connection is the weighted activation of
|
344
|
-
the node it's connected to ( weight*node.activation ).
|
345
|
-
Similarly, update is the updated value of a connection,
|
346
|
-
which is the weighted updated activation of the node it's connected to ( weight*node.update ).
|
347
|
-
The method update is the one to use
|
348
|
-
whenever the values of the inputs are changed (or right after training).
|
349
|
-
Otherwise, both update and value should give the same result
|
350
|
-
with value avoiding the unnecessary back calculations.
|
351
|
-
The method backpropagate modifies the connection's weight in proportion to
|
352
|
-
the error given and passes that error to its connected node via the node's backpropagate.
|
353
|
-
|
354
|
-
## Neuron
|
355
|
-
|
356
|
-
[Neuronet::Neuron](http://rubydoc.info/gems/neuronet/Neuronet/Neuron)
|
357
|
-
is a Neuronet::Node with some extra features.
|
358
|
-
It adds two attributes: connections, and bias.
|
359
|
-
As mentioned above, connections is a list, aka Array,
|
360
|
-
of the neuron's connections to other neurons (or nodes).
|
361
|
-
A neuron's bias is its kicker (or deduction) to its activation value
|
362
|
-
as a sum of its connections values. So a neuron's updated value is set as:
|
363
|
-
|
364
|
-
self.value = @bias + @connections.inject(0.0){|sum,connection| sum + connection.update}
|
365
|
-
|
366
|
-
If you're not familiar with ruby's Array::inject method,
|
367
|
-
it's the Ruby way of doing summations.
|
368
|
-
It's really cool once you get the gist of it. Check out:
|
369
|
-
|
370
|
-
* Jay Field's Thoughts on Ruby: inject
|
371
|
-
* Induction ( for_all )
|
372
|
-
|
373
|
-
But that's a digression... Here's how an implementation creates a new neuron:
|
374
|
-
|
375
|
-
neuron = Neuronet::Neuron.new( bias=0.0 )
|
376
|
-
|
377
|
-
There's an attribute accessor for @bias, and an attribute reader for @connections. The object, neuron, responds to the following methods:
|
378
|
-
|
379
|
-
update
|
380
|
-
partial
|
381
|
-
backpropagate( error )
|
382
|
-
train( target, learning=Neuronet.learning )
|
383
|
-
connect( node, weight=0.0 )
|
384
|
-
|
385
|
-
The update method sets the neuron's value as described above. The partial method sets the neuron's value without calling the connections update methods as follows:
|
386
|
-
|
387
|
-
self.value = @bias + @connections.inject(0.0){|sum,connection| sum + connection.value}
|
388
|
-
|
389
|
-
It's not necessary to burrow all the way down to update the current neuron
|
390
|
-
if its connected neurons have all been updated.
|
391
|
-
The implementation should set its algorithm to use partial
|
392
|
-
instead of update as update will most likely needlessly update previously updated neurons.
|
393
|
-
The backpropagate method modifies the neuron's bias in proportion to the given error and
|
394
|
-
passes on this error to each of its connection's backpropagate method.
|
395
|
-
The connect method is how the implementation adds a connection,
|
396
|
-
the way to connect the neuron to another.
|
397
|
-
To connect neuron out to neuron in, for example, it is:
|
398
|
-
|
399
|
-
in = Neuronet::Neuron.new
|
400
|
-
out = Neuronet::Neuron.new
|
401
|
-
out.connect(in)
|
402
|
-
|
403
|
-
Think output connects to input.
|
404
|
-
Here, the input flow would be from in to out,
|
405
|
-
while back-propagation of errors flows from out to in.
|
406
|
-
If you wanted to train the value of out, out.value,
|
407
|
-
to be 1.5 with the given value of in set at 0.3, you do as follows:
|
408
|
-
|
409
|
-
puts "(#{in}, #{out})" # what you've got before (0.0, 0.0)
|
410
|
-
in.value = 0.3
|
411
|
-
out.train(1.5)
|
412
|
-
out.partial # don't forget to update (no need to go deeper than in, so partial)
|
413
|
-
puts "(#{in}, #{out})" # (0.3, 0.113022020702079)
|
414
|
-
|
415
|
-
Note that with continued training, out.value should approach its target value of 1.5.
|
299
|
+
neuronet = Neuronet::FeedForward.new([4,5,6,3])
|
300
|
+
LIST.each do |input, target|
|
301
|
+
neuronet.exemplar(input, target)
|
302
|
+
# you could also train this way:
|
303
|
+
# neuronet.set(input)
|
304
|
+
# neuronet.train!(target)
|
305
|
+
end
|
416
306
|
|
417
|
-
|
307
|
+
The first layer is the input layer and the last layer is the output layer.
|
308
|
+
Neuronet also names the second and second last layers.
|
309
|
+
The second layer is called yin.
|
310
|
+
The second last layer is called yang.
|
311
|
+
For the example above, we can check their lengths.
|
418
312
|
|
419
|
-
|
420
|
-
|
421
|
-
|
422
|
-
|
313
|
+
puts neuronet.in.length #=> 4
|
314
|
+
puts neuronet.yin.length #=> 5
|
315
|
+
puts neuronet.yang.length #=> 6
|
316
|
+
puts neuronet.out.length #=> 3
|
317
|
+
|
318
|
+
## Tao, Yin, and Yang
|
423
319
|
|
424
|
-
|
320
|
+
Tao
|
321
|
+
: The absolute principle underlying the universe,
|
322
|
+
combining within itself the principles of yin and yang and
|
323
|
+
signifying the way, or code of behavior,
|
324
|
+
that is in harmony with the natural order.
|
425
325
|
|
426
|
-
|
326
|
+
Perceptrons are already very capable and quick to train.
|
327
|
+
By connecting the input layer to the output layer of a multilayer FeedForward network,
|
328
|
+
you'll get the Perceptron solution quicker while the middle layers work on the harder problem.
|
329
|
+
You can do that this way:
|
427
330
|
|
428
|
-
|
429
|
-
values
|
331
|
+
neuronet.out.connect(neuronet.in)
|
430
332
|
|
431
|
-
|
333
|
+
But giving that a name, [Tao](http://rubydoc.info/gems/neuronet/Neuronet/Tao),
|
334
|
+
and using a prototype pattern to modify the instance is more fun:
|
432
335
|
|
433
|
-
|
434
|
-
input.set( [-1, 0, 1] )
|
435
|
-
puts input.values.join(', ') # [-1.0,0.0,1.0].join(', ')
|
336
|
+
Tao.bless(neuronet)
|
436
337
|
|
437
|
-
|
338
|
+
Yin
|
339
|
+
: The passive female principle of the universe, characterized as female and
|
340
|
+
sustaining and associated with earth, dark, and cold.
|
438
341
|
|
439
|
-
|
440
|
-
|
441
|
-
|
342
|
+
Initially FeedForward sets the weights of all connections to zero.
|
343
|
+
That is, there is no association made from input to output.
|
344
|
+
Changes in the inputs have no effect on the output.
|
345
|
+
Training begins the process that sets the weights to associate the two.
|
346
|
+
But you can also manually set the initial weights.
|
347
|
+
One useful way to initially set the weights is to have one layer mirror another.
|
348
|
+
The [Yin](http://rubydoc.info/gems/neuronet/Neuronet/Yin) bless makes yin mirror the input.
|
442
349
|
|
443
|
-
|
444
|
-
layer = Neuronet::Layer.new( length )
|
350
|
+
Yin.bless(neuronet)
|
445
351
|
|
446
|
-
|
352
|
+
Yang
|
353
|
+
: The active male principle of the universe, characterized as male and
|
354
|
+
creative and associated with heaven, heat, and light.
|
447
355
|
|
448
|
-
|
449
|
-
|
450
|
-
train( targets, learning=Neuronet.learning )
|
451
|
-
values
|
356
|
+
On the other hand, the [Yang](http://rubydoc.info/gems/neuronet/Neuronet/Yang)
|
357
|
+
bless makes the output mirror yang.
|
452
358
|
|
453
|
-
|
454
|
-
A Perceptron is built this way:
|
359
|
+
Yang.bless(neuronet)
|
455
360
|
|
456
|
-
|
457
|
-
|
458
|
-
output_layer = Neuronet::Layer.new( m )
|
459
|
-
output_layer.connect( input_layer )
|
460
|
-
# to set the perceptron's input to -0.5,0.25,2.1...
|
461
|
-
input_layer.set( [-0.5, 0.25, 2.1] )
|
462
|
-
# to train it to -0.1, 0.2, 0.5
|
463
|
-
output_layer.train( [-0.1, 0.2, 0.5] )
|
464
|
-
output_layer.partial # update!
|
465
|
-
# to see its values
|
466
|
-
puts output_layer.values.join(', ')
|
361
|
+
Bless
|
362
|
+
: Pronounce words in a religious rite, to confer or invoke divine favor upon.
|
467
363
|
|
364
|
+
The reason Tao, Yin, and Yang are not classes unto themselves is that
|
365
|
+
you can combine these, and a prototype pattern (bless) works better in this case.
|
366
|
+
Bless is the keyword used in [Perl](http://www.perl.org/) to create objects,
|
367
|
+
so it's not without precedent.
|
368
|
+
To combine all three features, Tao, Yin, and Yang, do this:
|
468
369
|
|
469
|
-
|
370
|
+
Tao.bless Yin.bless Yang.bless neuronet
|
470
371
|
|
471
|
-
|
472
|
-
|
372
|
+
To save typing, the library provides the possible combinations.
|
373
|
+
For example:
|
473
374
|
|
474
|
-
|
375
|
+
TaoYinYang.bless neuronet
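The bless idea can be sketched in plain Ruby with `Object#extend`, which mixes a module into a single instance instead of into its class. This is only an illustration: `PlainNet`, `Loud`, and `Tagged` are hypothetical stand-ins, and whether the library's own `bless` methods are written this way is an assumption.

```ruby
# Hypothetical stand-ins for illustration; none of these names are part of Neuronet.
class PlainNet
  def describe
    'plain'
  end
end

module Loud
  def describe
    super.upcase
  end

  # Mix this module into one instance and hand the instance back.
  def self.bless(object)
    object.extend(self)
  end
end

module Tagged
  def describe
    "[#{super}]"
  end

  def self.bless(object)
    object.extend(self)
  end
end

# Blessings nest, in the same shape as Tao.bless Yin.bless Yang.bless neuronet.
blessed = Tagged.bless(Loud.bless(PlainNet.new))
untouched = PlainNet.new

puts blessed.describe   # prints "[PLAIN]"
puts untouched.describe # prints "plain"
```

Because each bless changes only the instance it receives, the same class can produce plain and blessed objects side by side, which is what makes combining features easier than with a fixed class hierarchy.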
|
475
376
|
|
476
|
-
|
377
|
+
# Scaling The Problem
|
477
378
|
|
478
|
-
|
479
|
-
|
480
|
-
|
481
|
-
|
482
|
-
|
483
|
-
|
484
|
-
input # in (first layer's) values
|
485
|
-
output # out (last layer's) values
|
486
|
-
And has the following attribute readers:
|
487
|
-
in # input (first) layer
|
488
|
-
out # output (last) layer
|
379
|
+
The squashing function, sigmoid, maps real numbers (negative infinity, positive infinity)
|
380
|
+
to the segment zero to one (0,1).
|
381
|
+
But for the sake of computation in a neural net,
|
382
|
+
sigmoid works best if the problem is scaled to numbers
|
383
|
+
between negative one and positive one (-1, 1).
|
384
|
+
Study the following table and see if you can see why:
|
489
385
|
|
490
|
-
|
491
|
-
|
492
|
-
|
493
|
-
|
494
|
-
|
495
|
-
|
386
|
+
x => sigmoid(x)
|
387
|
+
9 => 0.99987...
|
388
|
+
3 => 0.95257...
|
389
|
+
2 => 0.88079...
|
390
|
+
1 => 0.73105...
|
391
|
+
0 => 0.50000...
|
392
|
+
-1 => 0.26894...
|
393
|
+
-2 => 0.11920...
|
394
|
+
-3 => 0.04742...
|
395
|
+
-9 => 0.00012...
|
496
396
|
|
497
|
-
|
397
|
+
As x gets much higher than 3, sigmoid(x) gets to be pretty close to just 1, and
|
398
|
+
as x gets much lower than -3, sigmoid(x) gets to be pretty close to 0.
|
399
|
+
Note that sigmoid is centered about 0.5 which maps to 0.0 in problem space.
|
400
|
+
It is for this reason that I suggest the problem be displaced (subtracted)
|
401
|
+
by its average to be centered about zero and scaled (divided) by its standard deviation.
|
402
|
+
Try to get most of the data to fit within sigmoid's central "field of view" (-1, 1).
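The table can be reproduced with a few lines of plain Ruby. The helper names `squash` and `unsquash` are assumptions made for this sketch; the formula is the standard logistic function. Note that `format` rounds, while the table above truncates, so the last digit can differ.

```ruby
# Hypothetical helper names for illustration; the formula is the logistic function.
def squash(unsquashed)
  1.0 / (1.0 + Math.exp(-unsquashed))
end

# Inverse of squash, valid for values strictly between 0 and 1.
def unsquash(squashed)
  Math.log(squashed / (1.0 - squashed))
end

[9, 3, 2, 1, 0, -1, -2, -3, -9].each do |x|
  puts format('%2d => %.5f', x, squash(x))
end

# Inside (-1, 1) the curve is far from flat, so small input changes stay visible.
puts squash(1) - squash(-1)
# Far outside, the same step of 2 barely registers.
puts squash(9) - squash(7)
```

The last two lines show why the central region matters: the same change in x produces a much smaller change in the squashed value once x is well past 3.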
|
498
403
|
|
499
|
-
## Scale
|
404
|
+
## Scale, Gaussian, and Log Normal
|
500
405
|
|
501
|
-
Neuronet
|
502
|
-
|
503
|
-
|
406
|
+
Neuronet provides three classes to help scale the problem space.
|
407
|
+
[Neuronet::Scale](http://rubydoc.info/gems/neuronet/Neuronet/Scale)
|
408
|
+
is the simplest and most straightforward.
|
409
|
+
It finds the range and center of a list of values, and
|
410
|
+
linearly transforms it to a range of (-1,1) centered at 0.
|
504
411
|
For example:
|
505
412
|
|
506
413
|
scale = Neuronet::Scale.new
|
@@ -510,7 +417,7 @@ For example:
|
|
510
417
|
puts mapped.join(', ') # 0.0, -1.0, 1.0, -0.75
|
511
418
|
puts scale.unmapped( mapped ).join(', ') # 1.0, -3.0, 5.0, -2.0
|
512
419
|
|
513
|
-
The mapping is
|
420
|
+
The mapping is the following:
|
514
421
|
|
515
422
|
center = (maximum + minimum) / 2.0 if center.nil? # calculate center if not given
|
516
423
|
spread = (maximum - minimum) / 2.0 if spread.nil? # calculate spread if not given
|
@@ -518,7 +425,7 @@ The mapping is like:
|
|
518
425
|
|
519
426
|
One can change the range of the map to (-1/factor, 1/factor)
|
520
427
|
where factor is the spread multiplier and force
|
521
|
-
a (
|
428
|
+
a (perhaps pre-calculated) value for center and spread.
|
522
429
|
The constructor is:
|
523
430
|
|
524
431
|
scale = Neuronet::Scale.new( factor=1.0, center=nil, spread=nil )
|
@@ -526,171 +433,74 @@ The constructor is:
|
|
526
433
|
In the constructor, if the value of center is provided, then
|
527
434
|
that value will be used instead of it being calculated from the values passed to method set.
|
528
435
|
Likewise, if spread is provided, that value of spread will be used.
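To make the roles of factor, center, and spread concrete, here is a minimal plain-Ruby sketch of the mapping described above. `SketchScale` is a hypothetical stand-in written for this example, not the library's `Neuronet::Scale`, and dividing by `factor * spread` is an assumption inferred from the stated range (-1/factor, 1/factor).

```ruby
# Hypothetical stand-in for illustration; not the library's Neuronet::Scale.
class SketchScale
  attr_reader :center, :spread

  def initialize(factor = 1.0, center = nil, spread = nil)
    @factor = factor
    @forced_center = center
    @forced_spread = spread
  end

  # Use the forced values when given, otherwise derive them from the data.
  def set(values)
    minimum, maximum = values.minmax
    @center = @forced_center || (maximum + minimum) / 2.0
    @spread = @forced_spread || (maximum - minimum) / 2.0
    self
  end

  def mapped(values)
    values.map { |value| (value - @center) / (@factor * @spread) }
  end

  def unmapped(values)
    values.map { |value| value * @factor * @spread + @center }
  end
end

data = [1.0, -3.0, 5.0, -2.0]
scale = SketchScale.new.set(data)
mapped = scale.mapped(data)
puts mapped.join(', ')                 # prints "0.0, -1.0, 1.0, -0.75"
puts scale.unmapped(mapped).join(', ') # prints "1.0, -3.0, 5.0, -2.0"

# With factor 2.0 the same data lands in (-0.5, 0.5).
puts SketchScale.new(2.0).set(data).mapped(data).join(', ')
```

Forcing center and spread in the constructor is useful when they were computed once over a training set and must stay fixed for later inputs.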
|
529
|
-
There are two attribute accessors:
|
530
|
-
|
531
|
-
spread
|
532
|
-
center
|
533
|
-
|
534
|
-
One attribute writer:
|
535
|
-
|
536
|
-
init
|
537
|
-
|
538
|
-
In the code, the attribute @init flags if
|
539
|
-
there is an initiation phase to the calculation of @spread and @center.
|
540
|
-
For Scale, @init is true and the initiation phase calculates
|
541
|
-
the intermediate values @min and @max (the minimum and maximum values in the data set).
|
542
|
-
It's possible for subclasses of Scale, such as Gaussian, to not have this initiation phase.
|
543
|
-
An instance, scale, of class Scale will respond to the following methods considered to be public:
|
544
|
-
|
545
|
-
set( inputs )
|
546
|
-
mapped_input
|
547
|
-
mapped_output
|
548
|
-
unmapped_input
|
549
|
-
unmapped_output
|
550
|
-
|
551
|
-
In Scale, mapped\_input and mapped\_output are synonyms of mapped, but
|
552
|
-
in general this symmetry may be broken.
|
553
|
-
Likewise, unmapped\_input and unmapped\_output are synonyms of unmapped.
|
554
|
-
Scale also provides the following methods which are considered private:
|
555
|
-
|
556
|
-
set_init( inputs )
|
557
|
-
set_spread( inputs )
|
558
|
-
set_center( inputs )
|
559
|
-
mapped( inputs )
|
560
|
-
unmapped( outputs )
|
561
|
-
|
562
|
-
Except maybe for mapped and unmapped,
|
563
|
-
there should be no reason for the implementation to call these directly.
|
564
|
-
These are expected to be overridden by subclasses.
|
565
|
-
For example, in Gaussian, set\_spread calculates the standard deviation and
|
566
|
-
set\_center calculates the mean (average),
|
567
|
-
while set\_init is skipped by setting @init to false.
|
568
|
-
|
569
|
-
## Gaussian
|
570
|
-
|
571
|
-
In Neuronet, Gaussian subclasses Scale and is used exactly the same way.
|
572
|
-
The only changes are that it calculates the arithmetic mean (average) for center and
|
573
|
-
the standard deviation for spread.
|
574
|
-
The following private methods are overridden to provide that effect:
|
575
|
-
|
576
|
-
set_center( inputs )
|
577
|
-
inputs.inject(0.0,:+) / inputs.length
|
578
|
-
set_spread( inputs )
|
579
|
-
Math.sqrt( inputs.map{|value| self.center - value}.inject(0.0){|sum,value| value*value + sum} / (inputs.length - 1.0) )
|
580
|
-
|
581
|
-
## LogNormal
|
582
|
-
|
583
|
-
Neuronet::LogNormal subclasses Neuronet::Gaussian to transform the values to a logarithmic scale. It overrides the following methods:
|
584
|
-
|
585
|
-
set( inputs )
|
586
|
-
super( inputs.map{|value| Math::log(value)} )
|
587
|
-
mapped(inputs)
|
588
|
-
super( inputs.map{|value| Math::log(value)} )
|
589
|
-
unmapped(inputs)
|
590
|
-
super( inputs.map{|value| Math::exp(value)} )
|
591
|
-
|
592
|
-
So LogNormal is just Gaussian except that it first pipes values through a logarithm, and
|
593
|
-
then pipes the output back through exponentiation.
|
594
|
-
|
595
|
-
## ScaledNetwork
|
596
|
-
|
597
|
-
So now we're back to where we started.
|
598
|
-
In Neuronet, ScaledNetwork is a subclass of FeedForwardNetwork.
|
599
|
-
It automatically scales the problem given to it by using a Scale type instance,
|
600
|
-
Gaussian by default.
|
601
|
-
It adds one attribute accessor:
|
602
436
|
|
603
|
-
|
437
|
+
[Neuronet::Gaussian](http://rubydoc.info/gems/neuronet/Neuronet/Gaussian)
|
438
|
+
works the same way, except that it uses the average value of the list given
|
439
|
+
for the center, and the standard deviation for the spread.
|
604
440
|
|
605
|
-
|
606
|
-
|
607
|
-
|
441
|
+
And [Neuronet::LogNormal](http://rubydoc.info/gems/neuronet/Neuronet/LogNormal)
|
442
|
+
is just like Gaussian except that it first pipes values through a logarithm, and
|
443
|
+
then pipes the output back through exponentiation.
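The Gaussian and log-normal variants can be sketched in plain Ruby as follows. The helper names are hypothetical and this is not the library's code; the sample standard deviation (dividing by n - 1) follows the formula shown in the removed README text.

```ruby
# Hypothetical helpers for illustration; not the library's classes.
def mean(values)
  values.sum / values.length.to_f
end

# Sample standard deviation (divides by n - 1).
def standard_deviation(values)
  center = mean(values)
  Math.sqrt(values.sum { |value| (value - center)**2 } / (values.length - 1.0))
end

# Gaussian-style scaling: center on the mean, divide by the standard deviation.
def gaussian_map(values)
  center = mean(values)
  spread = standard_deviation(values)
  [values.map { |value| (value - center) / spread }, center, spread]
end

values = [1.0, 2.0, 3.0, 4.0, 5.0]
mapped, center, spread = gaussian_map(values)
puts "center=#{center} spread=#{spread.round(4)}"
puts mapped.map { |value| value.round(4) }.join(', ')

# Log-normal-style scaling: take logarithms first, exponentiate on the way back.
prices = [1.0, 10.0, 100.0]
log_mapped, log_center, log_spread = gaussian_map(prices.map { |value| Math.log(value) })
restored = log_mapped.map { |value| Math.exp(value * log_spread + log_center) }
puts restored.map { |value| value.round(6) }.join(', ')
```

The logarithm step only makes sense for strictly positive data, which is the usual case for quantities such as prices that span several orders of magnitude.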
|
608
444
|
|
609
|
-
|
445
|
+
## ScaledNetwork
|
610
446
|
|
611
|
-
|
612
|
-
|
613
|
-
|
614
|
-
|
615
|
-
For example
|
447
|
+
[Neuronet::ScaledNetwork](http://rubydoc.info/gems/neuronet/Neuronet/ScaledNetwork)
|
448
|
+
automates the problem space scaling.
|
449
|
+
You can choose to do your scaling over the entire data set if you think
|
450
|
+
the relative scale of the individual inputs matters.
|
451
|
+
For example, if in the problem one apple is good but two is too many...
|
452
|
+
In that case do this:
|
616
453
|
|
617
454
|
scaled_network.distribution.set( data_set.flatten )
|
618
455
|
data_set.each do |inputs,outputs|
|
619
456
|
# ... do your stuff using scaled_network.set( inputs )
|
620
457
|
end
|
621
458
|
|
622
|
-
|
459
|
+
If on the other hand the scale of the individual inputs is not the relevant feature,
|
460
|
+
you can do your scaling per individual input.
|
461
|
+
For example a small apple is an apple, and so is the big one. They're both apples.
|
462
|
+
Then do this:
|
623
463
|
|
624
464
|
data_set.each do |inputs,outputs|
|
625
465
|
# ... do your stuff using scaled_network.reset( inputs )
|
626
466
|
end
|
627
467
|
|
628
|
-
|
468
|
+
Note that in the first case you are using
|
469
|
+
[#set](http://rubydoc.info/gems/neuronet/Neuronet/ScaledNetwork:set)
|
470
|
+
and in the second case you are using
|
471
|
+
[#reset](http://rubydoc.info/gems/neuronet/Neuronet/ScaledNetwork:reset).
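The difference between the two cases can be sketched without the library. The helper `sketch_scale` below is hypothetical, and scaling by mean and sample standard deviation is an assumption about what "scaling" means here.

```ruby
# Hypothetical helper: scale `values` relative to the mean and sample
# standard deviation of `reference`.
def sketch_scale(values, reference)
  center = reference.sum / reference.length.to_f
  spread = Math.sqrt(reference.sum { |v| (v - center)**2 } / (reference.length - 1))
  values.map { |v| (v - center) / spread }
end

inputs_list = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]

# First case: one scale taken over the entire data set,
# so the small input and the big input stay distinct.
all_values = inputs_list.flatten
global_scaled = inputs_list.map { |inputs| sketch_scale(inputs, all_values) }

# Second case: each input sets its own scale,
# so only the shape of each input survives.
per_input_scaled = inputs_list.map { |inputs| sketch_scale(inputs, inputs) }

puts global_scaled.inspect
puts per_input_scaled.inspect
```

With per-input scaling both rows come out as roughly the same numbers, which is the "an apple is an apple" situation; with data-set-wide scaling they do not.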
|
472
|
+
|
473
|
+
# Pitfalls
|
629
474
|
|
630
475
|
When sub-classing a Neuronet::Scale type class,
|
631
476
|
make sure mapped\_input, mapped\_output, unmapped\_input,
|
632
477
|
and unmapped\_output are defined as you intended.
|
633
478
|
If you don't override them, they will point to the first ancestor that defines them.
|
634
|
-
|
635
|
-
|
636
|
-
|
479
|
+
Overriding #mapped does not piggyback the aliases and
|
480
|
+
they will continue to point to the original #mapped method.
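The alias behavior is plain Ruby and can be seen in isolation. The classes below are hypothetical and only borrow the method names; they are not Neuronet's Scale classes.

```ruby
# Hypothetical classes for illustration; only the method names are borrowed.
class BaseScale
  def mapped(values)
    values.map { |v| v * 2.0 }
  end
  # The alias captures BaseScale#mapped as it exists right now.
  alias_method :mapped_input, :mapped
end

# Overrides #mapped but forgets the alias.
class NaiveScale < BaseScale
  def mapped(values)
    values.map { |v| v + 100.0 }
  end
end

# Overrides #mapped and re-declares the alias.
class FixedScale < BaseScale
  def mapped(values)
    values.map { |v| v + 100.0 }
  end
  alias_method :mapped_input, :mapped
end

puts NaiveScale.new.mapped_input([1.0]).inspect  # still uses BaseScale#mapped
puts FixedScale.new.mapped_input([1.0]).inspect  # uses the override
```

Re-declaring the alias in the subclass is one way to get the intended behavior; overriding each aliased method explicitly works as well.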
|
481
|
+
|
637
482
|
Another pitfall is confusing the input/output flow in connections and back-propagation.
|
638
483
|
Remember to connect outputs to inputs (out.connect(in)) and
|
639
484
|
to back-propagate from outputs to inputs (out.train(targets)).
|
640
485
|
|
641
|
-
#
|
642
|
-
|
643
|
-
## Custom Networks
|
644
|
-
|
645
|
-
To demonstrate how this library can build custom networks,
|
646
|
-
I've created four new classes of feed forward networks.
|
647
|
-
By the way, I'm completely making these up and was about to call them
|
648
|
-
HotDog, Taco, Burrito, and Enchilada when I then thought of Tao/Yin/Yang:
|
649
|
-
|
650
|
-
## Tao
|
651
|
-
|
652
|
-
In Neuronet, Tao is a three or more layered feed forward neural network
|
653
|
-
with its output and input connected directly.
|
654
|
-
It effectively makes it a hybrid perceptron. It subclasses ScaledNetwork.
|
486
|
+
# Interesting Custom Networks
|
655
487
|
|
656
|
-
## Yin
|
657
|
-
|
658
|
-
In Neuronet, Yin is a Tao with the first hidden layer, hereby called yin,
|
659
|
-
initially set to have corresponding neuron pairs with
|
660
|
-
its input layer's with weights set to 1.0 and bias 0.5.
|
661
|
-
This makes yin initially mirror the input layer.
|
662
|
-
The correspondence is done between the first neurons in the yin layer and the input layer.
|
663
|
-
|
664
|
-
## Yang
|
665
|
-
|
666
|
-
In Neuronet, Yang is a Tao with its output layer connected to the last hidden layer,
|
667
|
-
hereby called yang, such that corresponding neuron pairs have weights set to 1.0 and bias 0.5.
|
668
|
-
This makes output initially mirror yang.
|
669
|
-
The correspondence is done between the last neurons in the yang layer and the output layer.
|
670
|
-
|
671
|
-
## YinYang
|
672
|
-
|
673
|
-
In Neuronet, YinYang is a Tao that's been Yin'ed Yang'ed. :))
|
674
|
-
That's a feed forward network of at least three layers with
|
675
|
-
its output layer also connected directly to the input layer, and
|
676
|
-
with the output layer initially mirroring the last hidden layer, and
|
677
|
-
the first hidden layer initially mirroring the input layer.
|
678
488
|
Note that a particularly interesting YinYang with n inputs and m outputs
|
679
489
|
would be constructed this way:
|
680
490
|
|
681
|
-
yinyang =
|
491
|
+
yinyang = YinYang.bless FeedForward.new( [n, n+m, m] )
|
682
492
|
|
683
493
|
Here yinyang's hidden layer (which is both yin and yang)
|
684
494
|
initially would have the first n neurons mirror the input and
|
685
495
|
the last m neurons be mirrored by the output.
|
686
496
|
Another interesting YinYang would be:
|
687
497
|
|
688
|
-
yinyang =
|
498
|
+
yinyang = YinYang.bless FeedForward.new( [n, n, n] )
|
689
499
|
|
690
500
|
The following code demonstrates what is meant by "mirroring":
|
691
501
|
|
692
|
-
yinyang =
|
693
|
-
yinyang.
|
502
|
+
yinyang = YinYang.bless FeedForward.new( [3, 3, 3] )
|
503
|
+
yinyang.set( [-1,0,1] )
|
694
504
|
puts yinyang.in.map{|x| x.activation}.join(', ')
|
695
505
|
puts yinyang.yin.map{|x| x.activation}.join(', ')
|
696
506
|
puts yinyang.out.map{|x| x.activation}.join(', ')
|
@@ -703,169 +513,202 @@ Here's the output:
|
|
703
513
|
0.485626707638021, 0.5, 0.514373292361979
|
704
514
|
-0.0575090141074614, 0.0, 0.057509014107461
|
705
515
|
|
706
|
-
|
707
|
-
|
708
|
-
Email me.
|
709
|
-
|
710
|
-
# AND A LOT OF MATH I NEED TO GO OVER AND CLEAN UP
|
711
|
-
|
712
|
-
## Notes I had on my old ynot2day
|
713
|
-
|
714
|
-
My sources are Neural Networks & Fuzzy Logic by Dr. Valluru B. Rao and Hayagriva V. Rao (1995), and Neural Computing Architectures edited by Igor Aleksander (1989) which includes "A theory of neural networks" by Eduardo R. Caianiello and "Backpropagation in non-feedforward networks" by Luis B. Almeida. The following is my analysis of the general mathematics of neural networks, which clarity I have not found elsewhere.
|
715
|
-
|
716
|
-
First, let me define my notation. I hate to reinvent the wheel (well, actually, it is kind of fun to do so), but I do not know the standard math notation when using straight ASCII typed from a normal keyboard. So I define the notation for summation, differentiation, the sigmoid function, and the exponential function given as Exp{}. Indexes to vectors and arrays are bracketed with []. Objects acted on by functions are bracketed by {}. Grouping of variables/objects is achieved with (). I also use () to include parameters that modify a function.
|
717
|
-
|
718
|
-
Definition of Sum:
|
719
|
-
|
720
|
-
Sum(j=1 to 3){A[j]} = A[1]+A[2]+A[3]
|
721
|
-
Sum(i=0 to N){f[i]} = f[0]+...+f[N]
|
722
|
-
|
723
|
-
Definition of Dif:
|
724
|
-
|
725
|
-
Dif(x){x^n} = n*x^(n-1)
|
726
|
-
Dif(y){f{u}} = Dif(u){f{u}}*Dif(y){u}
|
727
|
-
|
728
|
-
Definition of Sig:
|
729
|
-
|
730
|
-
Sig{z} = 1/(1+Exp{-z})
|
731
|
-
|
732
|
-
Next, I describe a mathematical model of a neuron. Usually a neuron is described as being either on or off. I believe it more useful to describe a neuron as having a pulse rate. A boolean (true or false) neuron would either have a high or a low pulse rate. In absence of any stimuli from neighboring neurons, the neuron may also have a rest pulse rate. The rest pulse rate is due to the bias of a neuron. A neuron receives stimuli from other neurons through the axons that connect them. These axons communicate to the receiving neuron the pulse rates of the transmitting neurons. The signals from other neurons are either strengthened or weakened at the synapse, and might either inhibit or excite the receiving neuron. The sum of all these signals is the activation of the receiving neuron. The activation of a neuron determines the neuron's actual response (its pulse rate), which the neuron then transmits to other neurons through its axons. Finally, a neuron has a maximum pulse rate which I map to 1, and a minimum pulse rate of 0.
|
733
|
-
|
734
|
-
Let the bias of a neuron be b[], the activation, y[], the response, x[], and the weights, w[,]. The pulse rate of a receiving neuron, r, is related to its activation which is related to the pulse rates of the other transmitting neurons, t, by the following equations:
|
735
|
-
|
736
|
-
x[r] = Sig{ y[r] }
|
737
|
-
y[r] = b[r] + Sum(All t){ w[r,t] * x[t] }
|
738
|
-
x[r] = Sig{ b[r] + Sum(All t){w[r,t] * x[t]} }
|
739
|
-
|
740
|
-
Next I try to derive the learning rule of the neural network. Somehow, a neuron can be trained to become more or less sensitive to stimuli from another neuron and to become more or less sensitive in general. That is, we can change the neuron's bias and synaptic weights. To do it right, I need an estimate of the error in the neuron's pulse and relate this to the correction needed in the bias and each synaptic weight. This type of error analysis is usually approximated through differentiation. What is the error in the pulse rate due to an error in a synaptic weight?
|
741
|
-
|
742
|
-
Dif(w[r,t]){x[r]} = Dif(y[r]){Sig{y[r]}}*Dif(w[r,t]){y[r]}
|
743
|
-
|
744
|
-
Dif(z){Sig{z}} = -Exp{-z}/(1+Exp{-z})^2
|
745
|
-
= (1-1-Exp{-z})/(1+Exp{-z})^2
|
746
|
-
= (1-(1+Exp{-z}))/(1+Exp{-z})^2
|
747
|
-
= 1/(1+Exp{-z})^2 - 1/(1+Exp{-z})
|
748
|
-
= Sig{z}^2 - Sig{z}
|
749
|
-
= Sig{z}*(Sig{z}-1) # <== THIS HAS TO BE WRONG! Looking for D{f}=f(1-f)
|
750
|
-
|
751
|
-
Dif(y[r]){Sig{y[r]}} = Sig{y[r]}*(Sig{y[r]}-1) =
|
752
|
-
= x[r]*(x[r]-1)
|
753
|
-
|
754
|
-
Dif(w[r,t]){y[r]} = Dif(w[r,t]){Sum(t){w[r,t]*x[t]}} = x[t]
|
516
|
+
# Theory
|
755
517
|
|
756
|
-
|
518
|
+
## The Biological Description of a Neuron
|
757
519
|
|
758
|
-
|
520
|
+
Usually a neuron is described as being either on or off.
|
521
|
+
I think it is more useful to describe a neuron as having a pulse rate.
|
522
|
+
A neuron would either have a high or a low pulse rate.
|
523
|
+
In absence of any stimuli from neighboring neurons, the neuron may also have a rest pulse rate.
|
524
|
+
A neuron receives stimuli from other neurons through the axons that connect them.
|
525
|
+
These axons communicate to the receiving neuron the pulse rates of the transmitting neurons.
|
526
|
+
The signals from other neurons are either strengthened or weakened at the synapse, and
|
527
|
+
might either inhibit or excite the receiving neuron.
|
528
|
+
Regardless of how much stimuli the neuron gets,
|
529
|
+
a neuron has a maximum pulse rate it cannot exceed.
|
759
530
|
|
760
|
-
|
761
|
-
dx[r,t] = x[r] * (x[r]-1) * x[t] * dw[r,t]
|
531
|
+
## The Mathematical Model of a Neuron
|
762
532
|
|
763
|
-
|
533
|
+
Since my readers here are probably Ruby programmers, I'll write the math in a Ruby-ish way.
|
534
|
+
Allow me to sum this way:
|
764
535
|
|
765
|
-
|
766
|
-
|
767
|
-
|
768
|
-
|
769
|
-
|
770
|
-
|
771
|
-
u[r,t] * e[r] = x[r] * (x[r]-1) * x[t] * dw[r,t]
|
772
|
-
|
773
|
-
The average value of u[r,t] is probably 1/N. In any case, the point is that this average is a small number less than 1. We use an equi-partition hypothesis and assume that each dw[r,t] is equally likely to be the source of error. Let u[r,t] ~ u, a small number, for all r and t. The best estimate of dw[r,t] becomes:
|
774
|
-
|
775
|
-
u * e[r] = x[r] * (x[r]-1) * x[t] * dw[r,t]
|
776
|
-
dw[r,t] = u * e[r] / ( x[r] * (x[r]-1) * x[t] ) ???
|
777
|
-
|
778
|
-
If u~1/N was not tricky, then consider this. x[] is meant to converge to either 0 or 1. That is x[] is meant to be boolean. Note how the above equation for dw[i,j] could not really work if x[] truly were 0 or 1. But x[] is a fuzzy variable never really achieving 0 or 1.
|
779
|
-
|
780
|
-
How do I conclude that... dw[r,t]=u * x[r] * (1-x[r]) * x[t] * e[r] ...which is the correct learning rule?
|
781
|
-
|
782
|
-
This is the part of the Neural Net jargon I have not been able to bring to my level. I believe the answer is buried in what is being called transposition of linear networks. My analysis is correct up to this:
|
783
|
-
|
784
|
-
u * e[r] = x[r] * (x[r]-1) * x[t] * dw[r,t]
|
785
|
-
|
786
|
-
This equation relates the error in the pulse of a neuron to the error in the synaptic weight between the transmitting neuron, t, and the receiving neuron, r. I believe the transposition of linear networks states that the relationship remains the same when back propagating the error in the neural pulse to the synaptic weight. That is, we do not invert the multiplication factor. This seems intuitive, but I admit I am confused by the paradox in the algebra above. Thus...
|
787
|
-
|
788
|
-
The Learning Rule for the (0,1) sigmoid neuron
|
789
|
-
dw[r,t] = x[r] * (x[r]-1) * x[t] * u * e[r]
|
536
|
+
module Enumerable
|
537
|
+
def sum
|
538
|
+
map{|a| yield(a)}.inject(0, :+)
|
539
|
+
end
|
540
|
+
end
|
541
|
+
[1,2,3].sum{|i| 2*i} == 2+4+6 # => true
|
790
542
|
|
791
|
-
|
543
|
+
Can I convince you that taking the derivative of a function looks like this?
|
792
544
|
|
793
|
-
|
794
|
-
|
795
|
-
|
545
|
+
def d(x)
|
546
|
+
dx = SMALL
|
547
|
+
f = yield(x)
|
548
|
+
(yield(x+dx) - f)/dx
|
549
|
+
end
|
550
|
+
dfdx = d(a){|x| f(x)}
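A runnable version of that derivative helper might look like the sketch below. The step size `1.0e-6` plays the role of SMALL and is an assumed value; the forward difference is only an approximation of the true derivative.

```ruby
# Forward-difference approximation of a derivative.
# dx stands in for SMALL; 1.0e-6 is an assumed, illustrative step size.
def d(x, dx: 1.0e-6)
  f = yield(x)
  (yield(x + dx) - f) / dx
end

# The slope of x^2 at x = 3 should come out close to 2*3.
slope = d(3.0) { |x| x**2 }
puts slope
```

The result agrees with the power rule only approximately, which is good enough for reading the Ruby-ish notation that follows.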
|
796
551
|
|
797
|
-
|
552
|
+
So the Ruby-ish way to write one of the rules of Calculus is:
|
798
553
|
|
799
|
-
|
554
|
+
d{|x| Ax^n} == nAx^(n-1)
|
800
555
|
|
801
|
-
|
556
|
+
We won't bother distinguishing integers from floats.
|
557
|
+
The sigmoid function is:
|
802
558
|
|
803
|
-
|
804
|
-
|
559
|
+
def sigmoid(x)
|
560
|
+
1/(1+exp(-x))
|
561
|
+
end
|
562
|
+
sigmoid(a) == 1/(1+exp(-a))
|
563
|
+
|
564
|
+
A neuron's pulserate increases with increasing stimulus, so
|
565
|
+
we need a model that adds up all the stimuli a neuron gets.
|
566
|
+
The sum of all stimuli we will call the neuron's value.
|
567
|
+
(I find this confusing, but
|
568
|
+
it works out that it is this sum that will give us the problem space value.)
|
569
|
+
To model the neuron's rest pulse, we'll say that it has a bias value, its own stimulus.
|
570
|
+
Stimuli from other neurons come through the connections,
|
571
|
+
so there is a sum over all the connections.
|
572
|
+
The stimuli from other transmitting neurons are proportional to their own pulserates and
|
573
|
+
the weight the receiving neuron gives them.
|
574
|
+
In the model we will call the pulserate the neuron's activation.
|
575
|
+
Lastly, to more closely match the code, a neuron is a node.
|
576
|
+
This is what we have so far:
|
577
|
+
|
578
|
+
value = bias + connections.sum{|connection| connection.weight * connection.node.activation }
|
579
|
+
|
580
|
+
# or by their biological synonyms
|
581
|
+
|
582
|
+
stimulus = unsquashed_rest_pulse_rate +
|
583
|
+
connections.sum{|connection| connection.weight * connection.neuron.pulserate}
|
584
|
+
|
585
|
+
Unsquashed rest pulse rate? Yeah, I'm about to close the loop here.
|
586
|
+
As described, a neuron can have a very low pulse rate, effectively zero,
|
587
|
+
and a maximum pulse which I will define as being one.
|
588
|
+
The sigmoid function will take any amount it gets and
|
589
|
+
squash it to a number between zero and one,
|
590
|
+
which is what we need to model the neuron's behavior.
|
591
|
+
To get the node's activation (aka neuron's pulserate)
|
592
|
+
from the node's value (aka neuron's stimulus),
|
593
|
+
we squash the value with the sigmoid function.
|
594
|
+
|
595
|
+
# the node's activation from its value
|
596
|
+
activation = sigmoid(value)
|
597
|
+
|
598
|
+
# or by their biological synonyms
|
599
|
+
|
600
|
+
# the neuron's pulserate from its stimulus
|
601
|
+
pulserate = sigmoid(stimulus)
|
602
|
+
|
603
|
+
So the "rest pulse rate" is sigmoid("unsquashed rest pulse rate").
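Putting the two formulas together, a node's value and activation can be computed with a few lines of plain Ruby. `SketchNode` and `SketchConnection` are hypothetical structs named after the vocabulary above; they are not the library's classes.

```ruby
# Hypothetical minimal types, named after the vocabulary used above.
SketchNode = Struct.new(:activation)
SketchConnection = Struct.new(:node, :weight)

# Squash any real number into the range between zero and one.
def sigmoid(x)
  1.0 / (1.0 + Math.exp(-x))
end

# value = bias + sum over connections of weight * transmitting activation
def node_value(bias, connections)
  bias + connections.sum { |c| c.weight * c.node.activation }
end

connections = [
  SketchConnection.new(SketchNode.new(0.5), 1.0),
  SketchConnection.new(SketchNode.new(0.25), -2.0)
]

value = node_value(0.0, connections)  # 0.5*1.0 + 0.25*(-2.0) = 0.0
activation = sigmoid(value)           # sigmoid(0.0) = 0.5
puts value, activation
```

With a bias of zero and these two stimuli cancelling out, the node sits exactly at the midpoint of the sigmoid.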
|
604
|
+
|
605
|
+
## Backpropagation of Errors
|
606
|
+
|
607
|
+
There's a lot of really complicated math in understanding how neural networks work.
|
608
|
+
But if we concentrate on just the part pertinent to the backpropagation code, it's not that bad.
|
609
|
+
The trick is to do the analysis in the problem space (otherwise things get real ugly).
|
610
|
+
When we train a neuron, we want the neuron's value to match a target as closely as possible.
|
611
|
+
The deviation from the target is the error:
|
612
|
+
|
613
|
+
error = target - value
|
614
|
+
|
615
|
+
Where does the error come from?
|
616
|
+
It comes from deviations from the ideal bias and weights the neuron should have.
|
617
|
+
|
618
|
+
target = value + error
|
619
|
+
target = bias + bias_error +
|
620
|
+
connections.sum{|connection| (connection.weight + weight_error) * connection.node.activation }
|
621
|
+
error = bias_error + connections.sum{|connection| weight_error * connection.node.activation }
|
622
|
+
|
623
|
+
Next we assume that the errors are equally likely everywhere,
|
624
|
+
so that the bias error is expected to be the same on average as the weight error.
|
625
|
+
That's where the learning constant comes in.
|
626
|
+
We need to divide the error equally among all contributors, say 1/N.
|
627
|
+
Then:
|
805
628
|
|
806
|
-
|
629
|
+
error = error/N + connections.sum{|connection| error/N * connection.node.activation }
|
807
630
|
|
808
|
-
|
809
|
-
e[t] = Sum(r){ w[r,t] * e[r] * Dif(b[r]){x[r]} } for the rest.
|
631
|
+
Note that if the equation above represents the entire network, then
|
810
632
|
|
811
|
-
|
633
|
+
N = 1 + connections.length
|
812
634
|
|
635
|
+
So now that we know the error, we can modify the bias and weights.
|
813
636
|
|
814
|
-
|
637
|
+
bias += error/N
|
638
|
+
connection.weight += connection.node.activation * error/N
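As a rough check of this update rule, the sketch below applies it to a single node with fixed input activations. The function `train_step` and its argument layout are hypothetical, chosen only to mirror the two update lines above with N = 1 + connections.length.

```ruby
# One update of a single node in problem space (hypothetical helper).
# Returns the new bias, the new weights, and the error seen before the update.
def train_step(bias, weights, activations, target)
  n = 1 + weights.length
  value = bias + weights.zip(activations).sum { |w, a| w * a }
  error = target - value
  new_bias = bias + error / n
  new_weights = weights.zip(activations).map { |w, a| w + a * error / n }
  [new_bias, new_weights, error]
end

bias = 0.0
weights = [0.0, 0.0]
activations = [0.5, 0.5]
target = 1.0

3.times do
  bias, weights, error = train_step(bias, weights, activations, target)
  puts "error before update: #{error}"
end
```

Because activations lie between zero and one, each step removes only part of the error, so the printed error shrinks toward zero rather than overshooting.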
|
815
639
|
|
816
|
-
|
640
|
+
The Calculus is:
|
817
641
|
|
818
|
-
|
819
|
-
|
820
|
-
obj.out.map{|node| node.to_f}
|
821
|
-
obj.out.map{|node| node.value}
|
822
|
-
obj.out.map{|node| unsquash(node.activation)}
|
823
|
-
obj.out.map{|node| bias+connections }
|
642
|
+
d{|bias| bias + connections.sum{|connection| connection.weight * connection.node.activation }}
|
643
|
+
== d{|bias| bias}
|
824
644
|
|
825
|
-
|
645
|
+
d{|connection.weight| bias + connections.sum{|connection| connection.weight * connection.node.activation }}
|
646
|
+
== connection.node.activation * d{|connection.weight| connection.weight }
|
826
647
|
|
827
|
-
|
828
|
-
|
829
|
-
|
648
|
+
So what's all the ugly math you'll see elsewhere?
|
649
|
+
Well, you can try to do the above analysis in neuron space.
|
650
|
+
Then you're inside the squash function.
|
651
|
+
I'll just show the derivative of the sigmoid function:
|
830
652
|
|
831
|
-
|
653
|
+
d{|x| sigmoid(x)} ==
|
654
|
+
d{|x| 1/(1+exp(-x))} ==
|
655
|
+
-1/(1+exp(-x))^2 * d{|x| 1+exp(-x)} ==
|
656
|
+
-1/(1+exp(-x))^2 * d{|x| exp(-x)} ==
|
657
|
+
-1/(1+exp(-x))^2 * d{|x| -x}*exp(-x) ==
|
658
|
+
-1/(1+exp(-x))^2 * (-1)*exp(-x) ==
|
659
|
+
exp(-x)/(1+exp(-x))^2 ==
|
660
|
+
(1 + exp(-x) - 1)/(1+exp(-x))^2 ==
|
661
|
+
((1 + exp(-x)) - 1)/(1+exp(-x))^2 ==
|
662
|
+
(1/sigmoid(x) - 1) * sigmoid(x)^2 ==
|
663
|
+
(1 - sigmoid(x)) * sigmoid(x) ==
|
664
|
+
sigmoid(x)*(1 - sigmoid(x))
|
665
|
+
# =>
|
666
|
+
d{|x| sigmoid(x)} == sigmoid(x)*(1 - sigmoid(x))
|
832
667
|
|
833
|
-
|
668
|
+
From there you try to find the errors from the point of view of the activation instead of the value.
|
669
|
+
But as the code clearly shows, the analysis need not get this deep.
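For anyone who wants to sanity-check the sigmoid's slope numerically, here is a small sketch comparing a finite difference against the closed form sigmoid(x)*(1 - sigmoid(x)). The step size and the sample points are arbitrary choices.

```ruby
def sigmoid(x)
  1.0 / (1.0 + Math.exp(-x))
end

# Forward-difference slope of the sigmoid; dx is an assumed small step.
def numeric_slope(x, dx = 1.0e-6)
  (sigmoid(x + dx) - sigmoid(x)) / dx
end

[-2.0, 0.0, 1.5].each do |x|
  s = sigmoid(x)
  closed_form = s * (1.0 - s)
  puts format('x=%5.2f numeric=%.6f closed_form=%.6f', x, numeric_slope(x), closed_form)
end
```

The two columns should agree to several decimal places, and both are positive everywhere since the sigmoid is increasing.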
|
834
670
|
|
835
|
-
|
671
|
+
## Learning Constant
|
836
672
|
|
837
|
-
|
673
|
+
One can think of a neural network as a sheet of very elastic rubber
|
674
|
+
which one pokes and pulls to fit the training data while
|
675
|
+
otherwise keeping the sheet as smooth as possible.
|
676
|
+
One concern is that the training data may contain noise, random errors.
|
677
|
+
So the training of the network should add up the true signal in the data
|
678
|
+
while canceling out the noise. This balance is set via the learning constant.
|
838
679
|
|
839
|
-
|
680
|
+
neuronet.learning
|
681
|
+
# Returns the current value of the network's learning constant
|
840
682
|
|
841
|
-
|
842
|
-
|
843
|
-
1+e = 1 + E/Output[o]
|
844
|
-
e = E/Output[o]
|
683
|
+
neuronet.learning = float
|
684
|
+
# where float is greater than zero but less than one.
|
845
685
|
|
846
|
-
|
686
|
+
By default, Neuronet::FeedForward sets the learning constant to 1/N, where
|
687
|
+
N is the number of biases and weights in the network
|
688
|
+
(plus one, just because...). You can get the value of N with
|
689
|
+
[#mu](http://rubydoc.info/gems/neuronet/Neuronet/FeedForward:mu).
|
847
690
|
|
848
|
-
|
849
|
-
|
691
|
+
So I'm now making up a few more names for stuff.
|
692
|
+
The number of contributors to errors in the network is #mu.
|
693
|
+
The learning constant based on #mu is
|
694
|
+
[#muk](http://rubydoc.info/gems/neuronet/Neuronet/FeedForward:muk).
|
695
|
+
You can modify the learning constant to some fraction of muk, say 0.7, this way:
|
850
696
|
|
851
|
-
|
852
|
-
we might then suggest the following correction to Bias:
|
697
|
+
neuronet.muk(0.7)
|
853
698
|
|
854
|
-
|
855
|
-
|
856
|
-
|
699
|
+
I've not come across any hard rule for the learning constant.
|
700
|
+
I have my own intuition derived from the behavior of random walks.
|
701
|
+
The distance away from a starting point in a random walk is
|
702
|
+
proportional to the square root of the number of steps.
|
703
|
+
I conjecture that the number of training data points is related to
|
704
|
+
the optimal learning constant in the same way.
|
705
|
+
So I provide a way to set the learning constant based on the size of the data with
|
706
|
+
[#num](http://rubydoc.info/gems/neuronet/Neuronet/FeedForward:num):
|
857
707
|
|
708
|
+
neuronet.num(n)
|
858
709
|
|
859
|
-
|
860
|
-
D{squash(u)} = squash(u)*(1-squash(u))*D{u}
|
710
|
+
The value of #num(n) is #muk(1.0)/Math.sqrt(n).
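The arithmetic behind these three numbers can be sketched as follows. The `sketch_*` functions are hypothetical, and the counting rule (one bias per non-input neuron, one weight per connection between adjacent layers, plus one) is an assumption read off the description above; the library's own count may differ.

```ruby
# Assumed count of error contributors for a plain feed forward network:
# biases and weights between adjacent layers, plus one.
def sketch_mu(layers)
  count = 1.0
  layers.each_cons(2) { |n, m| count += m + n * m }
  count
end

# Learning constant as a fraction of 1/mu.
def sketch_muk(layers, fraction = 1.0)
  fraction / sketch_mu(layers)
end

# Learning constant reduced by the square root of the data size.
def sketch_num(layers, data_size)
  sketch_muk(layers) / Math.sqrt(data_size)
end

layers = [3, 3, 3]
puts sketch_mu(layers)        # 1 + (3 + 9) + (3 + 9) = 25
puts sketch_muk(layers, 0.7)  # 0.7 / 25
puts sketch_num(layers, 100)  # (1 / 25) / 10
```

Under these assumptions a larger network or a larger data set both lead to a smaller learning constant.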
|
861
711
|
|
862
|
-
|
863
|
-
D{ @activation } = D{ squash( @bias + @connections...) }
|
864
|
-
D{ @activation } = @activation*(1-@activation) D{ @bias + @connections... }
|
865
|
-
Just the part due to bias...
|
866
|
-
D.bias{ @activation } = @activation*(1-@activation) D{ @bias }
|
867
|
-
D.bias{ @activation } / (@activation*(1-@activation)) = D{ @bias }
|
868
|
-
Just the part due to connection...
|
869
|
-
D.connection{ @activation } = @activation*(1-@activation) D{ @connections... }
|
712
|
+
# Questions?
|
870
713
|
|
871
|
-
|
714
|
+
Email me!
|