This section shows that a multi-layer perceptron maps data points outside its training domain closer to this domain if the perceptron is trained to map data distributed on a circle onto the same circle (see section 7.4). Let $\eta\mathbf{x}$ be the input to the trained network. Here, $\mathbf{x}$ has unit length and $\eta$ is a scalar.
We study the effect of $\eta$ on the network output $\mathbf{z}$. Let $\mathbf{W}_1$ be a $h \times 2$ matrix containing the weights between the input and the hidden layer, and $\mathbf{W}_2$ a $2 \times h$ matrix containing the weights between the hidden and the output layer. Further, let $\mathbf{w}_j$ be the $j$-th column vector of $\mathbf{W}_1$, and $\tilde{\mathbf{w}}_i$ the $i$-th row vector of $\mathbf{W}_2$. We assume that all threshold values equal zero, and that the weights fulfill $\mathbf{w}_1^T \mathbf{w}_2 = 0$ and $\|\mathbf{w}_1\| = \|\mathbf{w}_2\|$.
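To make this setting concrete, the following is a minimal numerical sketch, not the trained network of section 7.4: the basis $U$, the choices $\mathbf{W}_1 = U$ and $\mathbf{W}_2 = U^T$, and the value of $h$ are assumptions picked so that the weight conditions above hold exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
h = 100                                   # number of hidden units (illustrative choice)

# Orthonormal basis {u1, u2} of a two-dimensional plane in R^h.
U, _ = np.linalg.qr(rng.standard_normal((h, 2)))

W1 = U          # h x 2: columns w1, w2 are orthogonal and of equal norm
W2 = U.T        # 2 x h: rows lie in the span of {w1, w2}

def net(x, eta=1.0):
    """Output z = W2 tanh(W1 (eta x)) of the two-layer network."""
    return W2 @ np.tanh(W1 @ (eta * x))

# Inputs on the unit circle are mapped (approximately) onto the unit circle:
angles = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
xs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
radii = [np.linalg.norm(net(x)) for x in xs]
print(min(radii), max(radii))             # both close to 1 for large h
```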
We first look at the case $\eta = 1$. The network output is
$$\mathbf{z} = \mathbf{W}_2 \tanh(\mathbf{W}_1 \mathbf{x}) \,. \tag{C.16}$$
The argument of the transfer function, $\mathbf{y} = \mathbf{W}_1 \mathbf{x} = x_1 \mathbf{w}_1 + x_2 \mathbf{w}_2$, lies in the span of $\{\mathbf{w}_1, \mathbf{w}_2\}$; since $\mathbf{w}_1$ and $\mathbf{w}_2$ are orthogonal and of equal length, $\mathbf{y}$ traces a circle in this plane as $\mathbf{x}$ traverses the unit circle.
Let $\tilde{\mathbf{y}}$ be the vector with components $\tanh(y_j)$. A larger number $h$ of hidden units leads to smaller components of $\mathbf{y}$ (section 7.4: $y_j$ equals on average $1/h$). Therefore, we approximate $\tanh(y_j) \approx y_j$. It follows that $\tilde{\mathbf{y}}$ also lies on the circle in the span of $\{\mathbf{w}_1, \mathbf{w}_2\}$.
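Explicitly, the circle property follows from a one-line computation that uses the two weight conditions from above:
$$\|\tilde{\mathbf{y}}\|^2 \approx \|\mathbf{y}\|^2 = \|x_1 \mathbf{w}_1 + x_2 \mathbf{w}_2\|^2 = x_1^2 \|\mathbf{w}_1\|^2 + x_2^2 \|\mathbf{w}_2\|^2 = \|\mathbf{w}_1\|^2 \left(x_1^2 + x_2^2\right) = \|\mathbf{w}_1\|^2 ,$$
where the cross term vanishes because $\mathbf{w}_1^T \mathbf{w}_2 = 0$, and $x_1^2 + x_2^2 = 1$ because $\mathbf{x}$ has unit length. Hence, $\tilde{\mathbf{y}}$ stays (approximately) on a circle of radius $\|\mathbf{w}_1\|$.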
Next, we look at the effect of the weight matrix $\mathbf{W}_2$. After training, all $\mathbf{x}$ (which have unit length) are mapped (C.15) onto a circle with radius one. Thus, $\mathbf{W}_2$ needs to project the circle in the span of $\{\mathbf{w}_1, \mathbf{w}_2\}$ onto the unit circle in the two-dimensional output space. This is only achieved if both row vectors $\tilde{\mathbf{w}}_1$ and $\tilde{\mathbf{w}}_2$ lie in the span of $\{\mathbf{w}_1, \mathbf{w}_2\}$ (otherwise, the projection would be an ellipse). It follows that every linear combination of $\tilde{\mathbf{w}}_1$ and $\tilde{\mathbf{w}}_2$ is also in the span of $\{\mathbf{w}_1, \mathbf{w}_2\}$, and any vector $\mathbf{v}$ in the span of $\{\mathbf{w}_1, \mathbf{w}_2\}$ can be written as $\mathbf{v} = \mathbf{W}_2^T \mathbf{a}(\mathbf{v})$ with a suitable $\mathbf{a}(\mathbf{v}) \in \mathbb{R}^2$.
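One way to make the last statement explicit is a standard projection identity (the formula for $\mathbf{a}(\mathbf{v})$ is an addition for illustration, not part of the original argument):
$$\mathbf{a}(\mathbf{v}) = \left(\mathbf{W}_2 \mathbf{W}_2^T\right)^{-1} \mathbf{W}_2 \mathbf{v} \quad\Longrightarrow\quad \mathbf{W}_2^T \mathbf{a}(\mathbf{v}) = \mathbf{W}_2^T \left(\mathbf{W}_2 \mathbf{W}_2^T\right)^{-1} \mathbf{W}_2 \mathbf{v} = \mathbf{v} ,$$
since $\mathbf{W}_2^T (\mathbf{W}_2 \mathbf{W}_2^T)^{-1} \mathbf{W}_2$ is the orthogonal projection onto the row space of $\mathbf{W}_2$, which contains $\mathbf{v}$ by the preceding argument.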
Next, we look at the case $\eta > 1$. Let $\tilde{\mathbf{y}}(\eta)$ be the vector with components $\tanh(\eta y_j)$. Here, the above tanh-approximation is generally not valid, and $\tilde{\mathbf{y}}(\eta)$ might protrude out of the plane spanned by $\{\mathbf{w}_1, \mathbf{w}_2\}$. Thus, we need to write $\tilde{\mathbf{y}}(\eta) = \mathbf{W}_2^T \mathbf{a}(\eta) + \mathbf{y}_\perp$, with $\mathbf{y}_\perp$ orthogonal to the span of $\{\mathbf{w}_1, \mathbf{w}_2\}$. Taking squared norms on both sides gives $\|\tilde{\mathbf{y}}(\eta)\|^2 = \|\mathbf{W}_2^T \mathbf{a}(\eta)\|^2 + \|\mathbf{y}_\perp\|^2$, from which follows:
$$\|\mathbf{W}_2^T \mathbf{a}(\eta)\|^2 \le \|\tilde{\mathbf{y}}(\eta)\|^2 \,. \tag{C.17}$$
Moreover, every component of $\tilde{\mathbf{y}}(\eta)$ is a tanh value and therefore has magnitude smaller than one, so $\|\tilde{\mathbf{y}}(\eta)\|^2 = \sum_{j=1}^{h} \tanh^2(\eta y_j) < h$, and thus
$$\|\tilde{\mathbf{y}}(\eta)\| < \sqrt{h} \,. \tag{C.19}$$
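Continuing the numerical sketch from above (same assumed $U$, $\mathbf{W}_1$, $\mathbf{W}_2$, and $h$), one can check the bound (C.19) and its consequence for the output norm:

```python
# Continuation of the sketch above: the hidden activation obeys
# ||y_tilde(eta)|| < sqrt(h) (C.19) for any input scaling eta, so the
# output norm stays bounded while the input norm eta grows without limit.
x = np.array([1.0, 0.0])                  # a unit-length input direction
for eta in (1.0, 10.0, 100.0, 1000.0):
    y_tilde = np.tanh(W1 @ (eta * x))     # hidden activations, each in (-1, 1)
    z = W2 @ y_tilde                      # network output
    print(eta, np.linalg.norm(y_tilde), np.sqrt(h), np.linalg.norm(z))
# ||y_tilde|| never exceeds sqrt(h), so ||z|| stays bounded although the
# input eta*x moves arbitrarily far away from the unit circle.
```

In this illustration, the far-away input $\eta\mathbf{x}$ is thus mapped much closer to the training domain than where it started, which is the claim of this section.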