Figure 19.3: Fully connected feed-forward network
plest types of ANNs and differ significantly from feedback networks, which we will not look further into here.
For most practical applications, a single hidden layer is sufficient, so the typical NN for our purposes has exactly three layers:
• Input layer (for example input from robot sensors)
• Hidden layer (connected to input and output layer)
• Output layer (for example output to robot actuators)
Perceptron Incidentally, the first feed-forward network proposed by Rosenblatt had only two layers, one input layer and one output layer [Rosenblatt 1962]. However, these so-called “Perceptrons” were severely limited in their computational power because of this restriction, as Minsky and Papert soon discovered [Minsky, Papert 1969]. Unfortunately, their publication almost brought neural network research to a halt for several years, although the principal restriction applies only to two-layer networks, not to networks with three layers or more.
In the standard three-layer network, the input layer is usually simplified in that each input value is taken directly as the neuron activation; no activation function is applied to input neurons. The remaining questions for our standard three-layer NN type are:
• How many neurons to use in each layer?
• Which connections should be made between layer i and layer i + 1?
• How are the weights determined?
The answers to these questions are surprisingly straightforward:
• How many neurons to use in each layer?
The number of neurons in the input and output layer is determined by the application. For example, if we want to have an NN drive a robot around a maze (compare Chapter 15) with three PSD sensors as input and two motors as output, then the network should have three input neurons and two output neurons.
Unfortunately, there is no rule for the “right” number of hidden neurons. Too few hidden neurons will prevent the network from learning, since they have insufficient storage capacity. Too many hidden neurons will slow down the learning process because of extra overhead. The right number of hidden neurons depends on the “complexity” of the given problem and has to be determined through experimenting. In this example we are using six hidden neurons.
• Which connections should be made between layer i and layer i + 1?
We simply connect every output from layer i to every input at layer i + 1. This is called a “fully connected” neural network. There is no need to leave out individual connections, since the same effect can be achieved by giving this connection a weight of zero. That way we can use a much more general and uniform network structure.
• How are the weights determined?
This is the really tricky question. The whole intelligence of an NN is somehow encoded in the set of weights being used. What used to be a program (e.g. driving a robot in a straight line, but avoiding any obstacles sensed by the PSD sensors) is now reduced to a set of floating point numbers. With sufficient insight, we could just “program” an NN by specifying the correct (or let’s say working) weights. However, since this would be virtually impossible even for networks of small complexity, we need another technique.
The standard method is supervised learning, for example through error backpropagation (see Section 19.3). The same task is repeatedly run by the NN and the outcome judged by a supervisor. Errors made by the network are backpropagated from the output layer via the hidden layer to the input layer, amending the weights of each connection.
[Figure: robot with three PSD sensors (left, front, right) feeding the three input neurons; six hidden neurons; two output neurons driving the motors of the left and right wheels]
Figure 19.4: Neural network for driving a mobile robot
Evolutionary algorithms provide another method for determining the weights of a neural network. For example, a genetic algorithm (see Chapter 20) can be used to evolve an optimal set of neuron weights.
Figure 19.4 shows the experimental setup for an NN that should drive a mobile robot collision-free through a maze (for example left-wall following) with constant speed. Since we are using three sensor inputs and two motor outputs and we chose six hidden neurons, our network has 3 + 6 + 2 neurons in total. The input layer receives the sensor data from the infrared PSD distance sensors and the output layer produces driving commands for the left and right motors of a robot with differential drive steering.
Let us calculate the output of an NN for a simpler case with 2 + 4 + 1 neurons. Figure 19.5 (top) shows the labelling of the neurons and connections in the three layers; Figure 19.5 (bottom) shows the network with sample input values and weights. For a network with three layers, only two sets of connection weights are required:
[Figure: top half shows the 2+4+1 network with neuron labels nin 1, nin 2 (inputs in1, in2), nhid 1 .. nhid 4, nout 1 (output out), and weight labels win i,j and wout j,1; bottom half shows the same network with sample inputs in1 = 1.0, in2 = 0.5, input weights 0.2, 0.3, 0.1, 0.4, –0.2, 0.6, –0.7, 0.1 and output weights 0.8, –0.2, –0.2, 0.5]
Figure 19.5: Example neural network
• Weights from the input layer to the hidden layer, summarized as matrix win i,j (weight of connection from input neuron i to hidden neuron j).
• Weights from the hidden layer to the output layer, summarized as matrix wout i,j (weight of connection from hidden neuron i to output neuron j).
No weights are required from sensors to the first layer or from the output layer to actuators. These weights are just assumed to be always 1. All other weights are normalized to the range [–1 .. +1].
[Figure: the example network annotated with computed values: input outputs 1.00 and 0.50; hidden activations 0.35, 0.30, 0.10, –0.65 with outputs 0.59, 0.57, 0.52, 0.34; output activation 0.42 with output 0.60]
Figure 19.6: Feed-forward evaluation
Calculation of the output function starts with the input layer on the left and propagates through the network. For the input layer, there is one input value (sensor value) per input neuron. Each input data value is used directly as neuron activation value:
a(nin 1) = o(nin 1) = 1.00
a(nin 2) = o(nin 2) = 0.50
For all subsequent layers, we first calculate the activation function of each neuron as a weighted sum of its inputs, and then apply the sigmoid output function. The first neuron of the hidden layer has the following activation and output values:
a(nhid 1) = 1.00 · 0.2 + 0.50 · 0.3 = 0.35
o(nhid 1) = 1 / (1 + e^(–0.35)) = 0.59
The subsequent steps for the remaining two layers are shown in Figure 19.6 with the activation values printed in each neuron symbol and the output values below, always rounded to two decimal places.
Once the values have percolated through the feed-forward network, they will not change until the input values change. Obviously this is not true for networks with feedback connections. Program 19.1 shows the implementation of the feed-forward process. This program already takes care of two additional so-called “bias neurons”, which are required for backpropagation learning.
Program 19.1: Feed-forward execution
#include <math.h>

#define NIN  (2+1)              // number of input neurons
#define NHID (4+1)              // number of hidden neurons
#define NOUT 1                  // number of output neurons

float w_in [NIN][NHID];         // in weights from 3 to 4 neur.
float w_out[NHID][NOUT];        // out weights from 4 to 1 neur.

float sigmoid(float x)
{ return 1.0 / (1.0 + exp(-x));
}

void feedforward(float N_in[NIN], float N_hid[NHID], float N_out[NOUT])
{ int i,j;
  // calculate activation of hidden neurons
  N_in[NIN-1] = 1.0;            // set bias input neuron
  for (i=0; i<NHID-1; i++)
  { N_hid[i] = 0.0;
    for (j=0; j<NIN; j++)
      N_hid[i] += N_in[j] * w_in[j][i];
    N_hid[i] = sigmoid(N_hid[i]);
  }
  N_hid[NHID-1] = 1.0;          // set bias hidden neuron
  // calculate activation and output of output neurons
  for (i=0; i<NOUT; i++)
  { N_out[i] = 0.0;
    for (j=0; j<NHID; j++)
      N_out[i] += N_hid[j] * w_out[j][i];
    N_out[i] = sigmoid(N_out[i]);
  }
}
19.3 Backpropagation
A large number of different techniques exist for learning in neural networks. These include supervised and unsupervised techniques, depending on whether a “teacher” presents the correct answer to a training case or not, as well as on-line or off-line learning, depending on whether the system evolves inside or outside the execution environment. Classification networks with the popular backpropagation learning method [Rumelhart, McClelland 1986], a supervised off-line technique, can be used to identify a certain situation from the network input and produce a corresponding output signal. The drawback of this method is that a complete set of all relevant input cases together with their solutions has to be presented to the NN. Another popular method requiring
only incremental feedback for input/output pairs is reinforcement learning [Sutton, Barto 1998]. This on-line technique can be seen as either supervised or unsupervised, since the feedback signal only refers to the network’s current performance and does not provide the desired network output. In the following, the backpropagation method is presented.
A feed-forward neural network starts with random weights and is presented with a number of test cases called the training set. The network’s outputs are compared with the known correct results for the particular set of input values, and any deviations (error function) are propagated back through the net.
Having done this for a number of iterations, the NN hopefully has learned the complete training set and can now produce the correct output for each input pattern in the training set. The real hope, however, is that the network is able to generalize, which means it will be able to produce similar outputs corresponding to similar input patterns it has not seen before. Without the capability of generalization, no useful learning can take place, since we would simply store and reproduce the training set.
The backpropagation algorithm works as follows:
1. Initialize network with random weights.
2. For all training cases:
   a. Present training inputs to network and calculate output.
   b. For all layers (starting with output layer, back to input layer):
      i. Compare network output with correct output (error function).
      ii. Adapt weights in current layer.
For implementing this learning algorithm, we do know what the correct results for the output layer should be, because they are supplied together with the training inputs. However, it is not yet clear for the other layers, so let us do this step by step.
Firstly, we look at the error function. For each output neuron, we compute the difference between the actual output value outi and the desired output dout i. For the total network error, we calculate the sum of squared differences:

Eout i = dout i – outi

Etotal = Σ i=1..numout (Eout i)²
The next step is to adapt the weights, which is done by a gradient descent approach:

Δw = –η · ∂Etotal / ∂w

So the adjustment of a weight will be proportional to its contribution to the error, with the magnitude of change determined by the learning constant η. This can be achieved by the following formulas [Rumelhart, McClelland 1986]:
diffout i = (o(nout i) – dout i) · (1 – o(nout i)) · o(nout i)

Δwout k,i = –2 · η · diffout i · inputk(nout i)
          = –2 · η · diffout i · o(nhid k)
Assuming the desired output dout 1 of the NN in Figure 19.5 to be dout 1 = 1.0, and choosing η = 0.5 to further simplify the formula, we can now update the four weights between the hidden layer and the output layer. Note that all calculations have been performed with full floating point accuracy, while only two or three digits are printed.
diffout 1 = (o(nout 1) – dout 1) · (1 – o(nout 1)) · o(nout 1)
          = (0.60 – 1.00) · (1 – 0.60) · 0.60 = –0.096

Δwout 1,1 = – diffout 1 · input1(nout 1)
          = – diffout 1 · o(nhid 1)
          = – (–0.096) · 0.59 = +0.057
Δwout 2,1 = 0.096 · 0.57 = +0.055
Δwout 3,1 = 0.096 · 0.52 = +0.050
Δwout 4,1 = 0.096 · 0.34 = +0.033

The new weights will be:

w´out 1,1 = wout 1,1 + Δwout 1,1 = 0.8 + 0.057 = 0.86
w´out 2,1 = wout 2,1 + Δwout 2,1 = –0.2 + 0.055 = –0.15
w´out 3,1 = wout 3,1 + Δwout 3,1 = –0.2 + 0.050 = –0.15
w´out 4,1 = wout 4,1 + Δwout 4,1 = 0.5 + 0.033 = 0.53
The only remaining step is to adapt the win weights. Using the same formula, we need to know what the desired outputs dhid k are for the hidden layer. We get these values by backpropagating the error values from the output layer, multiplied by the activation value of the corresponding neuron in the hidden layer, and adding up all these terms for each neuron in the hidden layer. We could also use the difference values of the output layer instead of the error values; however, we found that using error values improves convergence. Here, we use the old (unchanged) value of the connection weight, which again improves convergence. The error formula for the hidden layer (difference between desired and actual hidden value) is:
Ehid i = Σ k=1..numout Eout k · wout i,k

diffhid i = Ehid i · (1 – o(nhid i)) · o(nhid i)
In the example in Figure 19.5, there is only one output neuron, so each hidden neuron has only a single term for its desired value. The value and difference values for the first hidden neuron are therefore:
Ehid 1 = Eout 1 · wout 1,1 = 0.4 · 0.8 = 0.32

diffhid 1 = Ehid 1 · (1 – o(nhid 1)) · o(nhid 1) = 0.32 · (1 – 0.59) · 0.59 = 0.077
Program 19.2: Backpropagation execution
float backprop(float train_in[NIN], float train_out[NOUT])
/* returns current square error value */
{ int i,j;
  float err_total;
  float N_out[NOUT], err_out[NOUT], diff_out[NOUT];
  float N_hid[NHID], err_hid[NHID], diff_hid[NHID];

  // run network, calculate difference to desired output
  feedforward(train_in, N_hid, N_out);
  err_total = 0.0;
  for (i=0; i<NOUT; i++)
  { err_out[i]  = train_out[i] - N_out[i];
    diff_out[i] = err_out[i] * (1.0-N_out[i]) * N_out[i];
    err_total  += err_out[i] * err_out[i];
  }

  // update w_out and calculate hidden difference values
  for (i=0; i<NHID; i++)
  { err_hid[i] = 0.0;
    for (j=0; j<NOUT; j++)
    { err_hid[i] += err_out[j] * w_out[i][j];
      w_out[i][j] += diff_out[j] * N_hid[i];
    }
    diff_hid[i] = err_hid[i] * (1.0-N_hid[i]) * N_hid[i];
  }

  // update w_in
  for (i=0; i<NIN; i++)
    for (j=0; j<NHID; j++)
      w_in[i][j] += diff_hid[j] * train_in[i];

  return err_total;
}
The weight changes for the two connections from the input layer to the first hidden neuron are as follows. Remember that the input of the hidden layer is the output of the input layer:
Δwin k,i = 2 · η · diffhid i · inputk(nhid i)   for η = 0.5
         = diffhid i · o(nin k)

Δwin 1,1 = diffhid 1 · o(nin 1) = 0.077 · 1.0 = 0.077
Δwin 2,1 = diffhid 1 · o(nin 2) = 0.077 · 0.5 = 0.039
and so on for the remaining weights. The first two updated weights will therefore be:
w´in 1,1 = win 1,1 + Δwin 1,1 = 0.2 + 0.077 = 0.28
w´in 2,1 = win 2,1 + Δwin 2,1 = 0.3 + 0.039 = 0.34
The backpropagation procedure iterates until a certain termination criterion has been fulfilled. This could be a fixed number of iterations over all training patterns, or until sufficient convergence has been achieved, for example if the total output error over all training patterns falls below a certain threshold.
Bias neurons Program 19.2 demonstrates the implementation of the backpropagation process. Note that in order for backpropagation to work, we need one additional input neuron and one additional hidden neuron, called “bias neurons”. The activation levels of these two neurons are always fixed to 1. The weights of the connections to the bias neurons are required for the backpropagation procedure to converge (see Figure 19.7).
[Figure: the network of Figure 19.5 extended by two bias neurons with fixed activation 1, one in the input layer and one in the hidden layer, each connected to all neurons of the following layer]
Figure 19.7: Bias neurons and connections for backpropagation
19.4 Neural Network Example
A simple example for testing a neural network implementation is trying to learn the digits 0..9 from a seven-segment display representation. Figure 19.8 shows the arrangement of the segments and the numerical input and training output for the neural network, which could be read from a data file. Note that there are ten output neurons, one for each digit 0..9. This will be much easier to learn than, e.g., a four-bit binary encoded output (0000 to 1001).
[Figure: seven-segment layout with segments numbered 1 (top), 2 (upper left), 3 (upper right), 4 (middle), 5 (lower left), 6 (lower right), 7 (bottom)]

digit 0   in: 1 1 1 0 1 1 1   out: 1 0 0 0 0 0 0 0 0 0
digit 1   in: 0 0 1 0 0 1 0   out: 0 1 0 0 0 0 0 0 0 0
digit 2   in: 1 0 1 1 1 0 1   out: 0 0 1 0 0 0 0 0 0 0
digit 3   in: 1 0 1 1 0 1 1   out: 0 0 0 1 0 0 0 0 0 0
digit 4   in: 0 1 1 1 0 1 0   out: 0 0 0 0 1 0 0 0 0 0
digit 5   in: 1 1 0 1 0 1 1   out: 0 0 0 0 0 1 0 0 0 0
digit 6   in: 1 1 0 1 1 1 1   out: 0 0 0 0 0 0 1 0 0 0
digit 7   in: 1 0 1 0 0 1 0   out: 0 0 0 0 0 0 0 1 0 0
digit 8   in: 1 1 1 1 1 1 1   out: 0 0 0 0 0 0 0 0 1 0
digit 9   in: 1 1 1 1 0 1 1   out: 0 0 0 0 0 0 0 0 0 1
Figure 19.8: Seven-segment digit representation
Figure 19.9 shows the decrease of total error values by applying the backpropagation procedure on the complete input data set for some 700 iterations. Eventually the goal of an error value below 0.1 is reached and the algorithm terminates. The weights stored in the neural net are now ready to take on previously unseen real data. In this example the trained network could e.g. be tested against 7-segment inputs with a single defective segment (always on or always off).
[Figure: total error curve falling from about 10 toward 0 over roughly 700 iterations]
Figure 19.9: Error reduction for 7-segment example