
# C.4 Contraction of input vectors

This section shows that a multi-layer perceptron maps data points outside its training domain closer to this domain if the perceptron is trained to map data distributed on a circle onto the same circle (see section 7.4). Let $\eta\mathbf{s}$ be the input to the trained network. Here, $\mathbf{s}$ has unit length and $\eta$ is a scalar.

We study the effect of $\eta$ on the network output $\mathbf{o}(\eta)$. Let $\mathbf{U}$ be an $h \times 2$ matrix containing the weights between the input and the hidden layer, and $\mathbf{V}$ be a $2 \times h$ matrix with the weights between the hidden and the output layer. Further, let $\mathbf{u}_k$ be a column vector of $\mathbf{U}$, and $\mathbf{v}_i$ be a row vector of $\mathbf{V}$. We assume that all threshold values equal zero, and that the weights fulfill: $\mathbf{u}_1^T \mathbf{u}_2 = 0$ and $||\mathbf{u}_1|| = ||\mathbf{u}_2|| = 1$.

We first look at the case $\eta = 1$. The network output is

$$ o_i(1) = \sum_{j=1}^{h} v_{ij} \tanh\!\left( \sum_{k=1}^{2} u_{jk} s_k \right) . \qquad \text{(C.15)} $$

As a result of the network training, $\mathbf{o}(1)$ has unit length. Let $y_j = \sum_k u_{jk} s_k$ be the argument of the tanh-function. From the assumptions, it follows that $\mathbf{y}$ has unit length,

$$ ||\mathbf{y}||^2 = \sum_{j=1}^{h} \left( \sum_{k=1}^{2} u_{jk} s_k \right)^{\!2} = \sum_{k=1}^{2} s_k^2 = 1 . \qquad \text{(C.16)} $$

Thus, the states $\mathbf{y}$ lie on a circle with radius one, spanned by $\{\mathbf{u}_k\}$, in an $h$-dimensional space (figure C.2).
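As a sanity check of (C.16), the sketch below builds an explicit orthonormal pair of columns for $\mathbf{U}$ (a cosine/sine construction chosen here for illustration, not taken from the trained network) and confirms numerically that $\mathbf{y}$ has unit length:

```python
import math

h = 12  # number of hidden units (arbitrary illustrative choice)

# Orthonormal columns u_1, u_2 of U: an explicit cosine/sine construction,
# u_j1 = sqrt(2/h) cos(2*pi*j/h), u_j2 = sqrt(2/h) sin(2*pi*j/h)
u1 = [math.sqrt(2.0 / h) * math.cos(2 * math.pi * j / h) for j in range(h)]
u2 = [math.sqrt(2.0 / h) * math.sin(2 * math.pi * j / h) for j in range(h)]

# Unit-length input s on the circle
phi = 0.7  # arbitrary angle
s = [math.cos(phi), math.sin(phi)]

# Hidden-layer arguments y_j = sum_k u_jk s_k
y = [u1[j] * s[0] + u2[j] * s[1] for j in range(h)]

norm_sq = sum(v * v for v in y)
print(norm_sq)  # close to 1, as (C.16) states
```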

Let $\mathbf{z}$ be the vector with components $z_j = \tanh(y_j)$. A larger number $h$ of hidden units leads to smaller components $y_j$ (section 7.4: $y_j^2$ equals $1/h$ on average). Therefore, we approximate $\tanh(y_j) \approx y_j$. It follows that $\mathbf{z}$ also lies on the circle in the span of $\{\mathbf{u}_k\}$.
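How quickly the linear approximation improves with $h$ can be checked numerically; the loop below is an illustration, taking $|y_j| = 1/\sqrt{h}$ as the typical component size:

```python
import math

# tanh(y) = y - y**3/3 + ..., so the relative error of the linear
# approximation tanh(y) ~ y shrinks like y**2 / 3.  With y_j**2 = 1/h
# on average, a larger hidden layer makes the approximation better.
for h in (4, 16, 64, 256):
    y = 1.0 / math.sqrt(h)  # typical magnitude of a component y_j
    rel_err = (y - math.tanh(y)) / y
    print(h, rel_err)  # roughly 1 / (3 * h)
```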

Next, we look at the effect of the weight matrix $\mathbf{V}$. After training, all vectors $\mathbf{z}$ (which have unit length) are mapped by (C.15) onto a circle with radius one. Thus, $\mathbf{V}$ needs to project the circle in the span of $\{\mathbf{u}_k\}$ onto the unit circle in the two-dimensional output space. This is only achieved if both row vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ lie in the span of $\{\mathbf{u}_k\}$ (otherwise, the projection would be an ellipse). It follows that $\mathbf{V}$ acts as an isometry on the span of $\{\mathbf{u}_k\}$: any vector $\mathbf{a}$ in this span can be written as $\mathbf{a} = \mathbf{V}^T (\mathbf{V}\mathbf{a})$, and therefore $||\mathbf{V}\mathbf{a}|| = ||\mathbf{a}||$.
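The length-preserving action of $\mathbf{V}$ on the span of $\{\mathbf{u}_k\}$ can be illustrated with a small numerical check; the orthonormal columns below are again an explicit cosine/sine construction chosen for illustration, with the rows of $\mathbf{V}$ set equal to those columns:

```python
import math

h = 10  # hidden units (arbitrary)
# Orthonormal columns u_1, u_2 (explicit cosine/sine construction)
u1 = [math.sqrt(2.0 / h) * math.cos(2 * math.pi * j / h) for j in range(h)]
u2 = [math.sqrt(2.0 / h) * math.sin(2 * math.pi * j / h) for j in range(h)]

# An arbitrary vector a = a1*u1 + a2*u2 in the span of {u_k}
a1, a2 = 0.8, -1.3
a = [a1 * u1[j] + a2 * u2[j] for j in range(h)]

# V with rows u_1^T and u_2^T maps a onto (a1, a2): the length is preserved
Va = [sum(u1[j] * a[j] for j in range(h)),
      sum(u2[j] * a[j] for j in range(h))]
len_a = math.sqrt(sum(v * v for v in a))
len_Va = math.hypot(Va[0], Va[1])
print(len_a, len_Va)  # equal up to rounding
```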

Next, we look at the case $\eta > 1$. Let $\mathbf{z}(\eta)$ be the vector with components $z_j(\eta) = \tanh(\eta y_j)$. Here, the above tanh-approximation is generally not valid, and $\mathbf{z}(\eta)$ might protrude out of the plane spanned by $\{\mathbf{u}_k\}$. Thus, we need to write $\mathbf{z}(\eta) = \mathbf{z}^T(\eta) + \mathbf{r}$, with $\mathbf{z}^T(\eta)$ in the span of $\{\mathbf{u}_k\}$ and $\mathbf{r}$ orthogonal to $\{\mathbf{u}_k\}$. The squared norms of this equation fulfill $||\mathbf{z}(\eta)||^2 = ||\mathbf{z}^T(\eta)||^2 + ||\mathbf{r}||^2$, from which follows:

$$ ||\mathbf{z}^T(\eta)||^2 \le ||\mathbf{z}(\eta)||^2 . \qquad \text{(C.17)} $$

Therefore, for $\eta > 1$, the squared length of the output vector can be written as

$$ ||\mathbf{o}(\eta)||^2 = \sum_{j=1}^{h} \sum_{k=1}^{h} \tanh(\eta y_j) \left( \sum_{i=1}^{2} v_{ij} v_{ik} \right) \tanh(\eta y_k) \;\le\; \sum_{j=1}^{h} \tanh^2(\eta y_j) \;<\; \eta^2 \sum_{j=1}^{h} \tanh^2(y_j) . \qquad \text{(C.18)} $$

The middle inequality follows from (C.17), because $\sum_i v_{ij} v_{ik}$ is the matrix of the orthogonal projection onto the span of $\{\mathbf{u}_k\}$. The last inequality follows from $\tanh$ being a concave function for positive arguments, so that $|\tanh(\eta y)| < \eta \, |\tanh(y)|$ for $\eta > 1$ and $y \neq 0$. Under the approximation $\tanh(y_j) \approx y_j$ and with (C.16), the last term in (C.18) equals $\eta^2$. Thus,

$$ ||\mathbf{o}(\eta)|| < \eta . \qquad \text{(C.19)} $$
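The concavity bound $\tanh(\eta y) < \eta \tanh(y)$ behind this estimate can be spot-checked numerically over an illustrative grid of values:

```python
import math

# tanh is concave on [0, inf) with tanh(0) = 0, so tanh(x)/x decreases
# and tanh(eta*y) < eta*tanh(y) whenever eta > 1 and y > 0
ok = all(
    math.tanh(eta * y) < eta * math.tanh(y)
    for eta in (1.1, 2.0, 5.0, 50.0)
    for y in (0.01, 0.1, 0.5, 2.0)
)
print(ok)  # prints: True
```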

Points further away from the circle are mapped closer to the circle (the training domain).
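The whole contraction argument can be reproduced numerically. The sketch below assumes a concrete orthonormal choice for the columns of $\mathbf{U}$ (a cosine/sine construction, not taken from a trained network) and sets the rows of $\mathbf{V}$ equal to those columns, which satisfies the conditions derived above; the printed output lengths stay below $\eta$:

```python
import math

h = 12  # hidden units (arbitrary)

# U: h x 2 matrix with orthonormal columns (explicit illustrative choice)
U = [[math.sqrt(2.0 / h) * math.cos(2 * math.pi * j / h),
      math.sqrt(2.0 / h) * math.sin(2 * math.pi * j / h)] for j in range(h)]
# V: 2 x h matrix whose rows equal the columns of U, so they lie in the
# span of {u_k} and V maps the circle in that span onto the unit circle
V = [[U[j][k] for j in range(h)] for k in range(2)]

def output(eta, phi):
    """Network output o(eta) for the input eta * (cos phi, sin phi)."""
    s = [math.cos(phi), math.sin(phi)]
    z = [math.tanh(eta * (U[j][0] * s[0] + U[j][1] * s[1])) for j in range(h)]
    return [sum(V[i][j] * z[j] for j in range(h)) for i in range(2)]

for eta in (1.5, 3.0, 10.0):
    o = output(eta, 0.7)
    print(eta, math.hypot(o[0], o[1]))  # output length stays below eta
```

For $\eta = 1$ the output length stays close to one (slightly below, since $\tanh$ shrinks each component), while inputs far outside the circle are pulled strongly back toward it.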

Heiko Hoffmann
2005-03-22