7.4 Data outside the training domain

This section explains why a multi-layer perceptron (MLP) that is trained to map data points within a sensory manifold may map data points outside its training domain closer to the manifold (section 7.3.2, figure 7.11, left). This phenomenon depends on the structure of the training domain; it is not a general property of MLPs.

First, I show that all image vectors have about the same length, independent of the position of the robot. Second, I give a two-dimensional synthetic example that has the same property. Third, I explain theoretically why, in this example, data points outside the training domain are mapped closer to the domain. Last, I show that the abstract RNN does not have this property in the example.

We estimate the length of an image vector $\bf s$ (the sensory representation). Although the world-to-camera mapping was non-linear, the image of the obstacle circle was still close to circular (figure 7.3). Moreover, its area was almost independent of the robot's position. Thus, we assume that the obstacles also form a circle with fixed area in the camera image. Within this region, the robot can stay at any point. To obtain the sensory representation, the circle is subdivided into ten sectors centered at the robot's position (figure 7.15).

**Figure 7.15:** All sectors have the same angle $\alpha$ (left). A sector has a length s_i and an area A_i (right).
$\includegraphics[width=13cm]{constantlength.eps}$

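Let s_i be the length of sector i, and let $\alpha$ be the angle of every sector (figure 7.15). If $\alpha$ is small enough, the area of a sector is well approximated by $A_i = \frac{\alpha}{2} s_i^2$. Hence, the squared length of the image vector is

$\displaystyle \Vert{\bf s}\Vert^2 = \sum_i s_i^2 = \sum_i \frac{2}{\alpha} A_i \approx \frac{2}{\alpha} A_o$ . (7.3)

Here, A_o denotes the area of the obstacle circle, which is approximately the sum of the sector areas. Since $\alpha$ and A_o are fixed, the length of $\bf s$ is approximately independent of the robot's position.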
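The estimate (7.3) can be checked numerically with a minimal sketch. It rests on assumptions that are not part of the original setup: the obstacle region is an ideal circle of radius 1, and s_i is measured along the central direction of sector i, from the robot to the circle boundary. The function name `squared_image_length` and the test positions are hypothetical.

```python
import numpy as np

def squared_image_length(robot_xy, n_sectors=10, radius=1.0):
    """Return (sum_i s_i^2, alpha) for a robot located inside a circle.

    Assumption: s_i is the distance from the robot to the circle boundary
    along the central direction of sector i.
    """
    p = np.asarray(robot_xy, dtype=float)
    alpha = 2.0 * np.pi / n_sectors                      # sector angle
    angles = alpha * (np.arange(n_sectors) + 0.5)        # central directions
    d = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    pd = d @ p
    # Positive root t of |p + t d|^2 = radius^2 for each direction d
    s = -pd + np.sqrt(pd**2 - p @ p + radius**2)
    return float(np.sum(s**2)), alpha

for xy in [(0.0, 0.0), (0.5, 0.0), (-0.3, 0.6)]:
    length_sq, alpha = squared_image_length(xy)
    predicted = 2.0 / alpha * np.pi                      # (2 / alpha) * A_o, A_o = pi * radius^2
    print(xy, length_sq, predicted)
```

For this idealized geometry, the printed squared length should stay close to the prediction (2/α) A_o for every robot position inside the circle. A real camera image only approximates this situation.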

In the synthetic example discussed in the following, a circle is mapped onto a circle; that is, input and output are two-dimensional, and the training domain is a circle in the input and in the output space. The two circles would coincide if the input and output coordinate systems were put on top of each other. Each point $\bf s_{i}^{}$ on the input circle has a target point $\bf g_{i}^{}$ on the second circle that is rotated relative to $\bf s_{i}^{}$ by $23^\circ$ around the origin. 200 training points, uniformly distributed around the circle, were generated. An MLP learned the mapping from $\bf s_{i}^{}$ to $\bf g_{i}^{}$ for all i = 1,..., 200. The MLP had a three-layer structure composed of two input neurons, h = 5 hidden neurons, and two output neurons. In the hidden layer, the activation function was sigmoidal (tanh), and in the other layers, it was the identity function. Initially, the weights were drawn uniformly from the interval [-0.1; 0.1]. The network was trained with back-propagation in on-line mode until convergence.

Figure 7.16 shows the result after training. Points outside the training domain (distance to the origin: 2.0) were mapped closer to the origin in the output space (distance around 1.5), and points inside the training domain (distance: 0.66) were mapped closer to the unit circle (distance around 0.75).
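The circle experiment can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the original implementation: the learning rate, the fixed number of epochs (in place of a convergence criterion), the random seed, and the omission of threshold (bias) terms are all choices made here, and the function names are hypothetical.

```python
import numpy as np

def make_circle_data(n_points=200, angle_deg=23.0):
    """Points s_i on the unit circle and targets g_i rotated by angle_deg."""
    phi = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    S = np.stack([np.cos(phi), np.sin(phi)], axis=1)
    a = np.deg2rad(angle_deg)
    R = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    return S, S @ R.T

def train_mlp(S, G, n_hidden=5, learning_rate=0.02, n_epochs=200, seed=0):
    """On-line back-propagation for a 2-h-2 MLP (tanh hidden, linear output).

    Assumed values: learning_rate, n_epochs, seed; no threshold terms.
    """
    rng = np.random.default_rng(seed)
    U = rng.uniform(-0.1, 0.1, size=(n_hidden, 2))   # input -> hidden
    V = rng.uniform(-0.1, 0.1, size=(2, n_hidden))   # hidden -> output
    for _ in range(n_epochs):
        for i in rng.permutation(len(S)):            # one update per pattern
            hidden = np.tanh(U @ S[i])
            err = V @ hidden - G[i]
            grad_V = np.outer(err, hidden)
            grad_U = np.outer((V.T @ err) * (1.0 - hidden**2), S[i])
            V -= learning_rate * grad_V
            U -= learning_rate * grad_U
    return U, V

def training_error(U, V, S, G):
    """Mean squared distance between network outputs and targets."""
    O = np.tanh(S @ U.T) @ V.T
    return float(np.mean(np.sum((O - G) ** 2, axis=1)))

def mean_output_norm(U, V, radius, n_test=360):
    """Mean output length for test inputs on a circle of the given radius."""
    ang = np.linspace(0.0, 2.0 * np.pi, n_test, endpoint=False)
    X = radius * np.stack([np.cos(ang), np.sin(ang)], axis=1)
    return float(np.mean(np.linalg.norm(np.tanh(X @ U.T) @ V.T, axis=1)))

S, G = make_circle_data()
U, V = train_mlp(S, G)
print("training error:", training_error(U, V, S, G))
for r in (0.66, 1.0, 2.0):
    print("input radius", r, "-> mean output radius", mean_output_norm(U, V, r))
```

Comparing the printed output radii with the input radii gives a rough picture of the contraction; the exact numbers depend on the assumed hyperparameters and need not match the values reported above.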

$\includegraphics[width=6cm]{circle1.eps}$

$\includegraphics[width=6cm]{circle2.eps}$

In the following, this finding is studied theoretically. The MLP maps an input $\bf s$ to an output $\bf o$ ,

$\displaystyle o_i = \sum_{j=1}^{h} v_{ij} \tanh\left(\sum_{k=1}^{2} u_{jk} s_k\right)$ , (7.4)

where $u_{jk}$ and $v_{ij}$ are the elements of the weight matrices $\bf U$ (input to hidden layer) and $\bf V$ (hidden to output layer). In the linear regime of tanh (small arguments), the mapping reduces to ${\bf o} = {\bf V}{\bf U}{\bf s}$, and scaling an input by a factor $\beta$ scales the output by the same factor,

$\displaystyle {\bf V}{\bf U}\beta{\bf s} = \beta{\bf V}{\bf U}{\bf s}$ . (7.5)

In the example with the two-dimensional circle, it was observed that in the trained network, the column vectors $\bf u_{k}^{}$ of $\bf U$ were approximately orthogonal and had unit length; the same held for the row vectors^7.2 $\bf v_{k}^{}$ of $\bf V$ . Thus, we assume that $\bf u_{k}^{T}$ $\bf u_{l}^{}$ = $\delta_{{kl}}^{}$ and $\bf v_{k}^{T}$ $\bf v_{l}^{}$ = $\delta_{{kl}}^{}$ . With this assumption, it can be shown (appendix C.4) that points $\bf s$ outside the circle are mapped closer to the circle,

$\displaystyle \Vert{\bf o}\Vert < \Vert{\bf s}\Vert$ . (7.6)
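Inequality (7.6) can be illustrated numerically under the orthonormality assumption alone. The sketch below does not reproduce the proof in appendix C.4 and does not use a trained network: the weights are random matrices with orthonormal columns (for U) and orthonormal rows (for V), generated by a QR decomposition. The function names are hypothetical.

```python
import numpy as np

def random_orthonormal_weights(n_hidden, rng):
    """U (h x 2) with orthonormal columns, V (2 x h) with orthonormal rows."""
    U, _ = np.linalg.qr(rng.standard_normal((n_hidden, 2)))
    Q, _ = np.linalg.qr(rng.standard_normal((n_hidden, 2)))
    return U, Q.T

def mlp_output(U, V, s):
    """Output of the MLP in (7.4), without threshold terms."""
    return V @ np.tanh(U @ s)

rng = np.random.default_rng(1)
U, V = random_orthonormal_weights(5, rng)
for beta in (0.1, 1.0, 2.0):
    s = beta * np.array([np.cos(0.7), np.sin(0.7)])   # input of length beta
    ratio = np.linalg.norm(mlp_output(U, V, s)) / np.linalg.norm(s)
    print("beta =", beta, " |o| / |s| =", ratio)
```

The ratio stays below one because tanh shrinks every component of Us, whose length equals that of s, and a matrix with orthonormal rows cannot increase the length of a vector.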

The assumption $\bf u_{k}^{T}$ $\bf u_{l}^{}$ = $\delta_{{kl}}^{}$ further predicts that the contraction effect decreases with an increasing number of neurons h in the hidden layer. The assumption implies that $\sum_{j=1}^{h} u_{jk}^2 = 1$. Thus, the expectation value of $u_{jk}^2$ equals 1/h. The argument of tanh is $\sum_{k} u_{jk} s_k$. Here, the only random variables are {u_jk}, since the statement should hold for all $\bf s$ . Further, we assume that the expectation value of u_jk is zero. Then, for all inputs $\bf s$ with length $\beta$ , the expectation value of the squared tanh-argument can be written as

$\displaystyle \left\langle \left(\sum_{k=1}^{2} u_{jk} s_k\right)^{2} \right\rangle = \sum_{k=1}^{2} \left\langle u_{jk}^2 \right\rangle s_k^2 = \frac{\beta^2}{h}$ . (7.7)

The cross terms vanish if, in addition, u_j1 and u_j2 are uncorrelated, which is consistent with the orthogonality of the columns of $\bf U$ . Thus, the tanh-argument shrinks with increasing h, the hidden neurons operate closer to the linear regime of tanh, and the contraction becomes weaker.
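Equation (7.7) can be checked with a short sketch. As an assumption made here, the expectation value is replaced by the average over the h hidden neurons of a single random matrix U with orthonormal columns; for such a matrix, the average equals β²/h because Us has the same length as s. The function name is hypothetical.

```python
import numpy as np

def mean_squared_tanh_argument(n_hidden, beta, rng):
    """Average over hidden neurons j of (sum_k u_jk s_k)^2 for |s| = beta."""
    # Random U with orthonormal columns, i.e. U^T U = I (assumption of the text)
    U, _ = np.linalg.qr(rng.standard_normal((n_hidden, 2)))
    angle = rng.uniform(0.0, 2.0 * np.pi)
    s = beta * np.array([np.cos(angle), np.sin(angle)])
    return float(np.mean((U @ s) ** 2))

rng = np.random.default_rng(0)
beta = 2.0
for h in (5, 10, 15, 20, 25):
    print(h, mean_squared_tanh_argument(h, beta, rng), beta**2 / h)
```

The two printed columns should agree for every h, which illustrates how the typical tanh-argument decreases as hidden neurons are added.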

This prediction was tested with the above experiment for different values of h. The result is shown in table 7.4. The values were averaged over three separately trained networks with 360 trials each. The length of the input vectors was set to 2.0. The experiment agrees with the theoretical prediction: the mean contraction c approaches 1 as h increases.

Table 7.4: Dependence of the mean contraction $c = \langle \Vert{\bf o}\Vert \rangle / \Vert{\bf s}\Vert$ on the number of hidden neurons.

hidden neurons	c
5	0.78
10	0.85
15	0.89
20	0.91
25	0.92

Unlike the MLP, the abstract RNN maintains the scale in the circle task (figure 7.17). The 200 pairs of circle points ( $\bf s_{i}^{}$ , $\bf g_{i}^{}$ ) were approximated using a mixture of five units, each with two principal components (trained with MPPCA-ext). The centers of the ellipsoids turned out to be evenly distributed around the circle. Figure 7.17 shows that the distance to the origin is consistent between input and output. As in (7.5), the local linear mappings do not change the length of input patterns.

$\includegraphics[width=6cm]{circleRNN1.eps}$

$\includegraphics[width=6cm]{circleRNN2.eps}$