1.5.5 Self-organizing maps

To learn sensorimotor relations, Ritter et al. (1990) used the self-organizing map (SOM) algorithm (Kohonen, 1995). This algorithm fits a q-dimensional grid to a distribution of training patterns in IR^d, q $\le$ d (Kohonen, 1982). SOMs were motivated by sensory maps in the brain, for example, the somatosensory map or the tonotopic map (Kohonen, 1989). A SOM consists of a q-dimensional array^1.8 of nodes; each node i has a location $\bf r_{i}^{}$ and a weight vector $\bf w_{i}^{}$ (figure 1.7). The weight vectors are in the space of the training patterns.

**Figure 1.7:** Node locations (left: circles) and weight vectors (right: black dots). Each node i with location $\bf r_{i}^{}$ has an associated weight vector $\bf w_{i}^{}$ in the space of training patterns (dotted lines mark sample associations). The dashed lines and curves indicate the manifold of training patterns.
$\includegraphics[width=15cm]{som.eps}$

The algorithm consists of three steps, which are alternated until convergence is reached. First, a training pattern $\bf x$ is drawn randomly. Second, the node c is determined whose weight vector $\bf w_{c}^{}$ is closest to $\bf x$ ,

h_ic = $\displaystyle \alpha$ (t)exp $\displaystyle \left(\vphantom{-\frac{\vert\vert{\bf r}_i - {\bf r}_c\vert\vert^2}{2\sigma^2(t)}}\right.$ - $\displaystyle {\frac{{\vert\vert{\bf r}_i - {\bf r}_c\vert\vert^2}}{{2\sigma^2(t)}}}$ $\displaystyle \left.\vphantom{-\frac{\vert\vert{\bf r}_i - {\bf r}_c\vert\vert^2}{2\sigma^2(t)}}\right)$ .

(1.4)

The SOM algorithm can be easily extended by adding more parameters to each node and by updating them in parallel to the weight vectors. With such an extension, Ritter and Schulten (1986) used the SOM to learn sensorimotor relations. Here, a training pattern consists of a pair ( $\bf x$ ,y) of sensor values $\bf x$ and corresponding motor values $\bf y$ . Such a sensorimotor pattern is an element in the space formed by the Cartesian product of the sensor and the motor space. The SOM extension has two weight vectors for each node i, one for the sensory input, $\bf w_{i}^{{\mbox{\footnotesize in}}}$ , and one for the motor output, $\bf w_{i}^{{\mbox{\footnotesize out}}}$ . The computation (1.2) of the best matching node is restricted to the sensory domain; but both $\bf w_{i}^{{\mbox{\footnotesize in}}}$ and $\bf w_{i}^{{\mbox{\footnotesize out}}}$ are updated according to (1.3) ( $\bf w_{i}^{{\mbox{\footnotesize out}}}$ is updated based on $\bf y$ , and the neighborhood function may have different parameters).

This algorithm fits the grid of weight vectors ( $\bf w_{i}^{{\mbox{\footnotesize in}}}$ , $\bf w_{i}^{{\mbox{\footnotesize out}}}$ ) to the sensorimotor pattern distribution. The resulting link between a sensory input $\bf w_{i}^{{\mbox{\footnotesize in}}}$ and a motor output $\bf w_{i}^{{\mbox{\footnotesize out}}}$ provides a discrete mapping, usable for an inverse model. To obtain a continuous mapping, Ritter et al. (1989) further added a locally linear map to each node. The result was successfully applied to control a robot arm with three degrees of freedom.

**Figure 1.8:** Failure of the sensorimotor extension of SOM on one-to-many mappings. The motor component $\bf w^{{\mbox{\footnotesize out}}}_{}$ is averaged over competing patterns $\bf y_{1}^{}$ and $\bf y_{2}^{}$ .
$\includegraphics[width=12cm]{one-to-manySOM.eps}$

The restriction of the node competition (1.2) to the distance in sensory space reduces the search space for the weights, but it makes the approach fail on one-to-many mappings (figure 1.8). The learning for $\bf w_{i}^{{\mbox{\footnotesize in}}}$ is independent of $\bf w_{i}^{{\mbox{\footnotesize out}}}$ . As a result, a classical SOM algorithm is applied solely to the sensory domain, while $\bf w_{i}^{{\mbox{\footnotesize out}}}$ is updated simultaneously. In the case of two possible target values for one sensory input, $\bf w_{i}^{{\mbox{\footnotesize out}}}$ is attracted to two different positions $\bf y_{1}^{}$ and $\bf y_{2}^{}$ (because the node competition is independent of the distance to $\bf y_{1}^{}$ or $\bf y_{2}^{}$ ). Thus, as a result from the update rule (1.3), $\bf w_{i}^{{\mbox{\footnotesize out}}}$ will be averaged among $\bf y_{1}^{}$ and $\bf y_{2}^{}$ . For example, given that both $\bf y_{1}^{}$ and $\bf y_{2}^{}$ are drawn with the same probability (p = 0.5), on average, $\bf w_{i}^{{\mbox{\footnotesize out}}}$ is updated according to

$\displaystyle \bf w_{i}^{{\mbox{\footnotesize out}}}$ (t + 1)	=	$\displaystyle \bf w_{i}^{{\mbox{\footnotesize out}}}$ (t) + $\displaystyle {\frac{{1}}{{2}}}$ h_ic $\displaystyle \left[\vphantom{{\bf y}_1-{\bf w}_i^{\mbox{\footnotesize out}}(t)}\right.$ $\displaystyle \bf y_{1}^{}$ - $\displaystyle \bf w_{i}^{{\mbox{\footnotesize out}}}$ (t) $\displaystyle \left.\vphantom{{\bf y}_1-{\bf w}_i^{\mbox{\footnotesize out}}(t)}\right]$ + $\displaystyle {\frac{{1}}{{2}}}$ h_ic $\displaystyle \left[\vphantom{{\bf y}_2-{\bf w}_i^{\mbox{\footnotesize out}}(t)}\right.$ $\displaystyle \bf y_{2}^{}$ - $\displaystyle \bf w_{i}^{{\mbox{\footnotesize out}}}$ (t) $\displaystyle \left.\vphantom{{\bf y}_2-{\bf w}_i^{\mbox{\footnotesize out}}(t)}\right]$
	=	$\displaystyle \bf w_{i}^{{\mbox{\footnotesize out}}}$ (t) + h_ic $\displaystyle \left[\vphantom{\frac{1}{2} ({\bf y}_1+{\bf y}_2) - {\bf w}_i^{\mbox{\footnotesize out}}(t)}\right.$ $\displaystyle {\frac{{1}}{{2}}}$ ( $\displaystyle \bf y_{1}^{}$ + $\displaystyle \bf y_{2}^{}$ ) - $\displaystyle \bf w_{i}^{{\mbox{\footnotesize out}}}$ (t) $\displaystyle \left.\vphantom{\frac{1}{2} ({\bf y}_1+{\bf y}_2) - {\bf w}_i^{\mbox{\footnotesize out}}(t)}\right]$ .	(1.5)

Further limitations arise from the SOM grid structure. First, since sensorimotor manifolds are usually non-linear, many grid points are needed. With increasing dimensionality q of the manifold, the number of necessary points increases exponentially (the number of points per dimension to the power of q). Soon (q > 3), this gets computationally unfeasible. As a solution to this problem, Martinetz and Schulten (1990) suggested an extension to hierarchical SOM. A second limitation arises in real world applications: some sensor values could be pure noise (or irrelevant to the sensorimotor map). Such noise dimensions need to be filled with grid points (figure 1.9), resulting in the same problems as mentioned above.