This section shows that a multi-layer perceptron maps data points outside its training domain closer to its domain if the perceptron is trained to map data distributed on a circle onto the same circle (see section 7.4). Let $a\mathbf{x}$ be the input to the trained network. Here, $\mathbf{x}$ has unit length and $a$ is a scalar. We study the effect of $a$ on the network output $\mathbf{x}'$. Let $W$ be an $h \times 2$ matrix containing the weights between the input and the hidden layer, and $V$ be a $2 \times h$ matrix with the weights between the hidden and the output layer. Further, let $\mathbf{w}_1, \mathbf{w}_2 \in \mathbb{R}^h$ be the column vectors of $W$, and $\mathbf{v}_1, \mathbf{v}_2 \in \mathbb{R}^h$ be the row vectors of $V$. We assume that all threshold values equal zero, and that the weights fulfill $\mathbf{w}_1^T \mathbf{w}_2 = 0$ and $\|\mathbf{w}_1\| = \|\mathbf{w}_2\| =: r$.
We first look at the case $a = 1$. The network output is

$\mathbf{x}' = V \tanh(W\mathbf{x}) = V\,\mathbf{t}(\mathbf{y}), \qquad \mathbf{y} = W\mathbf{x}$   (C.15)

where $\mathbf{t}(\mathbf{y})$ is the vector with components $\tanh(y_j)$. A larger number $h$ of hidden units leads to smaller components of $\mathbf{y}$ (section 7.4: the $y_j$ are on average of order $1/h$). Therefore, we approximate $\tanh(y_j) \approx y_j$, i.e., $\mathbf{t}(\mathbf{y}) \approx \mathbf{y}$. It follows that $\mathbf{t}(\mathbf{y})$ also lies on the circle of radius $r$ in the span of $\{\mathbf{w}_1, \mathbf{w}_2\}$ (see the short derivation below).
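That $\mathbf{y}$ (and hence $\mathbf{t}(\mathbf{y}) \approx \mathbf{y}$) lies on a circle of radius $r$ follows directly from the weight conditions above; spelled out:

$\mathbf{y} = W\mathbf{x} = x_1\mathbf{w}_1 + x_2\mathbf{w}_2, \qquad \|\mathbf{y}\|^2 = x_1^2\,\|\mathbf{w}_1\|^2 + x_2^2\,\|\mathbf{w}_2\|^2 = (x_1^2 + x_2^2)\,r^2 = r^2,$

using $\mathbf{w}_1^T\mathbf{w}_2 = 0$, $\|\mathbf{w}_1\| = \|\mathbf{w}_2\| = r$, and $\|\mathbf{x}\| = 1$. As $\mathbf{x}$ runs over the unit circle, $\mathbf{y}$ thus traces the circle of radius $r$ in the span of $\{\mathbf{w}_1, \mathbf{w}_2\}$.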
Next, we look at the effect of the weight matrix $V$. After training, all $\mathbf{x}$ (which have unit length) are mapped (C.15) onto a circle with radius one. Thus, $V$ needs to project the circle of radius $r$ in the span of $\{\mathbf{w}_1, \mathbf{w}_2\}$ onto the unit circle in the two-dimensional output space. This is only achieved if both row vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ lie in the span of $\{\mathbf{w}_1, \mathbf{w}_2\}$ (otherwise, the projection would be an ellipse); in addition, $\mathbf{v}_1$ and $\mathbf{v}_2$ must be orthogonal to each other and of length $1/r$. It follows that $\|V\mathbf{z}\| = \|\mathbf{z}\|/r$ for every $\mathbf{z}$ in the span of $\{\mathbf{w}_1, \mathbf{w}_2\}$, and that any such vector $\mathbf{z}$ can be written as $\mathbf{z} = r^2\,V^T(V\mathbf{z})$.
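The identity $\mathbf{z} = r^2\,V^T(V\mathbf{z})$ can be verified in one line. Since $\mathbf{v}_1$ and $\mathbf{v}_2$ are orthogonal with length $1/r$, the vectors $r\mathbf{v}_1, r\mathbf{v}_2$ form an orthonormal basis of the span, so every $\mathbf{z}$ in the span reads $\mathbf{z} = z_1(r\mathbf{v}_1) + z_2(r\mathbf{v}_2)$; then

$V\mathbf{z} = \begin{pmatrix} \mathbf{v}_1^T\mathbf{z} \\ \mathbf{v}_2^T\mathbf{z} \end{pmatrix} = \frac{1}{r}\begin{pmatrix} z_1 \\ z_2 \end{pmatrix}, \qquad r^2\,V^T(V\mathbf{z}) = r^2\Big(\frac{z_1}{r}\,\mathbf{v}_1 + \frac{z_2}{r}\,\mathbf{v}_2\Big) = z_1(r\mathbf{v}_1) + z_2(r\mathbf{v}_2) = \mathbf{z}.$

In the same way, $\|V^T\mathbf{u}\|^2 = \|u_1\mathbf{v}_1 + u_2\mathbf{v}_2\|^2 = \|\mathbf{u}\|^2/r^2$ for any $\mathbf{u} \in \mathbb{R}^2$, which is used in (C.17) below.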
Next, we look at the case $a > 1$. Let $\mathbf{t}(a\mathbf{y})$ be the vector with components $\tanh(a y_j)$. Here, the above tanh-approximation is generally not valid, and $\mathbf{t}(a\mathbf{y})$ might protrude out of the plane spanned by $\{\mathbf{w}_1, \mathbf{w}_2\}$. Thus, we need to write $\mathbf{t}(a\mathbf{y}) = r^2\,V^T\big(V\,\mathbf{t}(a\mathbf{y})\big) + \mathbf{t}_\perp(a)$, with $\mathbf{t}_\perp(a)$ orthogonal to the span of $\{\mathbf{w}_1, \mathbf{w}_2\}$; the first term is the projection of $\mathbf{t}(a\mathbf{y})$ onto this span, because $\mathbf{v}_1$ and $\mathbf{v}_2$ lie in it. The squares of this equation fulfill $\|\mathbf{t}(a\mathbf{y})\|^2 = \|r^2\,V^T(V\,\mathbf{t}(a\mathbf{y}))\|^2 + \|\mathbf{t}_\perp(a)\|^2$, from which follows:
$\|\mathbf{x}'(a)\|^2 = \|V\,\mathbf{t}(a\mathbf{y})\|^2 = \frac{1}{r^2}\,\big\|r^2\,V^T\big(V\,\mathbf{t}(a\mathbf{y})\big)\big\|^2 = \frac{1}{r^2}\Big(\|\mathbf{t}(a\mathbf{y})\|^2 - \|\mathbf{t}_\perp(a)\|^2\Big) \le \frac{\|\mathbf{t}(a\mathbf{y})\|^2}{r^2}$   (C.17)

Since $|\tanh(a y_j)| < |a y_j|$ for $a y_j \neq 0$, we have $\|\mathbf{t}(a\mathbf{y})\| < a\,\|\mathbf{y}\| = a\,r$, and taking the square root of (C.17) yields

$\|\mathbf{x}'(a)\| \le \frac{\|\mathbf{t}(a\mathbf{y})\|}{r} < \frac{a\,\|\mathbf{y}\|}{r} = a.$   (C.19)

Hence, an input $a\mathbf{x}$ at distance $a > 1$ from the origin produces an output at distance less than $a$: the network moves points outside its training domain back toward the unit circle on which it was trained.
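The bound (C.19) is easy to check numerically. The following is a minimal NumPy sketch under the assumptions of this section, not the actual network from section 7.4: the values of $h$ and $r$ are hypothetical, and the weights are constructed directly to satisfy the stated conditions rather than obtained by training.

```python
import numpy as np

rng = np.random.default_rng(0)

h = 50    # number of hidden units (hypothetical value)
r = 0.1   # common column norm of W (hypothetical; small, so tanh(y_j) ~ y_j for a = 1)

# Orthonormal h x 2 basis Q; the columns of W = r*Q are orthogonal with norm r,
# and the rows of V = Q^T / r lie in span{w1, w2}, are orthogonal, with length 1/r.
Q, _ = np.linalg.qr(rng.standard_normal((h, 2)))
W = r * Q        # h x 2 input-to-hidden weights
V = Q.T / r      # 2 x h hidden-to-output weights

for a in [1.0, 2.0, 5.0, 10.0]:
    norms = []
    for phi in np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False):
        x = np.array([np.cos(phi), np.sin(phi)])   # unit-length input
        x_out = V @ np.tanh(a * (W @ x))           # network output, eq. (C.15)
        norms.append(np.linalg.norm(x_out))
    print(f"a = {a:4.1f}: max ||x'(a)|| over the circle = {max(norms):.4f} (< a)")
```

For $a = 1$ the output norms stay close to one (the linear tanh regime), while for $a > 1$ they remain strictly below $a$, as (C.17) and (C.19) predict.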