This section shows that a multi-layer perceptron maps data points outside its training domain closer to this domain if the perceptron is trained to map data distributed on a circle onto the same circle (see section 7.4). Let $\eta\mathbf{x}$ be the input to the trained network. Here, $\mathbf{x}$ has unit length and $\eta$ is a scalar.
We study the effect of $\eta$ on the network output $\mathbf{z}$. Let $\mathbf{W}_1$ be a $h \times 2$ matrix containing the weights between the input and the hidden layer, and $\mathbf{W}_2$ a $2 \times h$ matrix containing the weights between the hidden and the output layer. Further, let $\mathbf{w}_j$ be the $j$-th column vector of $\mathbf{W}_1$, and $\tilde{\mathbf{w}}_i$ the $i$-th row vector of $\mathbf{W}_2$. We assume that all threshold values equal zero, and that the weights fulfill $\mathbf{w}_1^T \mathbf{w}_2 = 0$ and $\|\mathbf{w}_1\| = \|\mathbf{w}_2\|$.
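To make this setting concrete, the following is a minimal numerical sketch, not the trained network of section 7.4: the basis $U$, the choices $\mathbf{W}_1 = U$ and $\mathbf{W}_2 = U^T$, and the value of $h$ are assumptions picked so that the weight conditions above hold exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
h = 100                                   # number of hidden units (illustrative choice)

# Orthonormal basis {u1, u2} of a two-dimensional plane in R^h.
U, _ = np.linalg.qr(rng.standard_normal((h, 2)))

W1 = U          # h x 2: columns w1, w2 are orthogonal and of equal norm
W2 = U.T        # 2 x h: rows lie in the span of {w1, w2}

def net(x, eta=1.0):
    """Output z = W2 tanh(W1 (eta x)) of the two-layer network."""
    return W2 @ np.tanh(W1 @ (eta * x))

# Inputs on the unit circle are mapped (approximately) onto the unit circle:
angles = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
xs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
radii = [np.linalg.norm(net(x)) for x in xs]
print(min(radii), max(radii))             # both close to 1 for large h
```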
We first look at the case $\eta = 1$. The network output is
$$\mathbf{z} = \mathbf{W}_2 \tanh(\mathbf{W}_1 \mathbf{x}) \,. \tag{C.16}$$
The argument of the transfer function, $\mathbf{y} = \mathbf{W}_1 \mathbf{x} = x_1 \mathbf{w}_1 + x_2 \mathbf{w}_2$, lies in the span of $\{\mathbf{w}_1, \mathbf{w}_2\}$; since $\mathbf{w}_1$ and $\mathbf{w}_2$ are orthogonal and of equal length, $\mathbf{y}$ traces a circle in this plane as $\mathbf{x}$ traverses the unit circle.
Let $\tilde{\mathbf{y}}$ be the vector with components $\tanh(y_j)$. A larger number $h$ of hidden units leads to smaller components of $\mathbf{y}$ (section 7.4: $y_j$ equals on average $1/h$). Therefore, we approximate $\tanh(y_j) \approx y_j$. It follows that $\tilde{\mathbf{y}}$ also lies on the circle in the span of $\{\mathbf{w}_1, \mathbf{w}_2\}$.
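Explicitly, the circle property follows from a one-line computation that uses the two weight conditions from above:
$$\|\tilde{\mathbf{y}}\|^2 \approx \|\mathbf{y}\|^2 = \|x_1 \mathbf{w}_1 + x_2 \mathbf{w}_2\|^2 = x_1^2 \|\mathbf{w}_1\|^2 + x_2^2 \|\mathbf{w}_2\|^2 = \|\mathbf{w}_1\|^2 \left(x_1^2 + x_2^2\right) = \|\mathbf{w}_1\|^2 ,$$
where the cross term vanishes because $\mathbf{w}_1^T \mathbf{w}_2 = 0$, and $x_1^2 + x_2^2 = 1$ because $\mathbf{x}$ has unit length. Hence, $\tilde{\mathbf{y}}$ stays (approximately) on a circle of radius $\|\mathbf{w}_1\|$.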
Next, we look at the effect of the weight matrix $\mathbf{W}_2$. After training, all $\mathbf{x}$ (which have unit length) are mapped (C.15) onto a circle with radius one. Thus, $\mathbf{W}_2$ needs to project the circle in the span of $\{\mathbf{w}_1, \mathbf{w}_2\}$ onto the unit circle in the two-dimensional output space. This is only achieved if both row vectors $\tilde{\mathbf{w}}_1$ and $\tilde{\mathbf{w}}_2$ lie in the span of $\{\mathbf{w}_1, \mathbf{w}_2\}$ (otherwise, the projection would be an ellipse). It follows that every linear combination of $\tilde{\mathbf{w}}_1$ and $\tilde{\mathbf{w}}_2$ is also in the span of $\{\mathbf{w}_1, \mathbf{w}_2\}$, and any vector $\mathbf{v}$ in the span of $\{\mathbf{w}_1, \mathbf{w}_2\}$ can be written as $\mathbf{v} = \mathbf{W}_2^T \mathbf{a}(\mathbf{v})$ with a suitable $\mathbf{a}(\mathbf{v}) \in \mathbb{R}^2$.
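One way to make the last statement explicit is a standard projection identity (the formula for $\mathbf{a}(\mathbf{v})$ is an addition for illustration, not part of the original argument):
$$\mathbf{a}(\mathbf{v}) = \left(\mathbf{W}_2 \mathbf{W}_2^T\right)^{-1} \mathbf{W}_2 \mathbf{v} \quad\Longrightarrow\quad \mathbf{W}_2^T \mathbf{a}(\mathbf{v}) = \mathbf{W}_2^T \left(\mathbf{W}_2 \mathbf{W}_2^T\right)^{-1} \mathbf{W}_2 \mathbf{v} = \mathbf{v} ,$$
since $\mathbf{W}_2^T (\mathbf{W}_2 \mathbf{W}_2^T)^{-1} \mathbf{W}_2$ is the orthogonal projection onto the row space of $\mathbf{W}_2$, which contains $\mathbf{v}$ by the preceding argument.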
Next, we look at the case $\eta > 1$. Let $\tilde{\mathbf{y}}(\eta)$ be the vector with components $\tanh(\eta y_j)$. Here, the above tanh-approximation is generally not valid, and $\tilde{\mathbf{y}}(\eta)$ might protrude out of the plane spanned by $\{\mathbf{w}_1, \mathbf{w}_2\}$. Thus, we need to write $\tilde{\mathbf{y}}(\eta) = \mathbf{W}_2^T \mathbf{a}(\eta) + \mathbf{y}_\perp$, with $\mathbf{y}_\perp$ orthogonal to the span of $\{\mathbf{w}_1, \mathbf{w}_2\}$. Taking squared norms on both sides gives $\|\tilde{\mathbf{y}}(\eta)\|^2 = \|\mathbf{W}_2^T \mathbf{a}(\eta)\|^2 + \|\mathbf{y}_\perp\|^2$, from which follows:
$$\|\mathbf{W}_2^T \mathbf{a}(\eta)\|^2 \le \|\tilde{\mathbf{y}}(\eta)\|^2 \,. \tag{C.17}$$
Moreover, every component of $\tilde{\mathbf{y}}(\eta)$ is a tanh value and therefore has magnitude smaller than one, so $\|\tilde{\mathbf{y}}(\eta)\|^2 = \sum_{j=1}^{h} \tanh^2(\eta y_j) < h$, and thus
$$\|\tilde{\mathbf{y}}(\eta)\| < \sqrt{h} \,. \tag{C.19}$$
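Continuing the numerical sketch from above (same assumed $U$, $\mathbf{W}_1$, $\mathbf{W}_2$, and $h$), one can check the bound (C.19) and its consequence for the output norm:

```python
# Continuation of the sketch above: the hidden activation obeys
# ||y_tilde(eta)|| < sqrt(h) (C.19) for any input scaling eta, so the
# output norm stays bounded while the input norm eta grows without limit.
x = np.array([1.0, 0.0])                  # a unit-length input direction
for eta in (1.0, 10.0, 100.0, 1000.0):
    y_tilde = np.tanh(W1 @ (eta * x))     # hidden activations, each in (-1, 1)
    z = W2 @ y_tilde                      # network output
    print(eta, np.linalg.norm(y_tilde), np.sqrt(h), np.linalg.norm(z))
# ||y_tilde|| never exceeds sqrt(h), so ||z|| stays bounded although the
# input eta*x moves arbitrarily far away from the unit circle.
```

In this illustration, the far-away input $\eta\mathbf{x}$ is thus mapped much closer to the training domain than where it started, which is the claim of this section.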