
4.2 Recall algorithm

The goal of the recall is to complete a pattern $\mathbf{p}$ whose components are only partially given (Hoffmann and Möller, 2003). The resulting pattern $\mathbf{z}$ shares the components of $\mathbf{p}$ that are defined as input.

After learning, the data distribution is represented by a collection of hyper-ellipsoids; each has a center $\mathbf{c}_j$, direction vectors $\mathbf{w}_j^l$ (principal axes), semi-axis lengths $\sqrt{\lambda_j^l}$, and a residual variance $\sigma_j^2$ (in any direction orthogonal to the span of the principal axes). The $\mathbf{w}_j^l$ are the eigenvectors of a local principal component analysis, and the $\lambda_j^l$ are the corresponding eigenvalues. The hyper-ellipsoids are iso-potential surfaces of the normalized Mahalanobis distance plus reconstruction error (see section 3.2.1),

$E_j(\mathbf{z}) = \mathbf{y}_j^T \Lambda_j^{-1} \mathbf{y}_j + \frac{1}{\sigma_j^2} \left( \boldsymbol{\xi}_j^T \boldsymbol{\xi}_j - \mathbf{y}_j^T \mathbf{y}_j \right) + \ln \det \Lambda_j + (d - q) \ln \sigma_j^2 \; . \qquad (4.1)$

The dimensionality of the pattern space is $d$, and $q$ is the number of principal components. $\boldsymbol{\xi}_j = \mathbf{z} - \mathbf{c}_j$ is the displacement from the center. Its representation in the local coordinate system of the ellipsoid is $\mathbf{y}_j = \mathbf{W}_j^T \boldsymbol{\xi}_j$. The eigenvectors $\mathbf{w}_j^l$ are the columns of $\mathbf{W}_j$, and $\Lambda_j$ is a diagonal matrix containing the eigenvalues $\lambda_j^l$.
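The potential (4.1) can be evaluated directly from these quantities. The following NumPy sketch (the function name and array layout are my assumptions, not from the text) computes $E_j$ for one unit, with $d$ and $q$ read off the shape of $\mathbf{W}_j$:

```python
import numpy as np

def potential(z, c, W, lam, sigma2):
    """Potential E_j(z) of one hyper-ellipsoid unit, eq. (4.1).

    z      : pattern, shape (d,)
    c      : unit center c_j, shape (d,)
    W      : principal directions w_j^l as columns, shape (d, q)
    lam    : eigenvalues lambda_j^l, shape (q,)
    sigma2 : residual variance sigma_j^2 (scalar)
    """
    d, q = W.shape
    xi = z - c                            # displacement from the center
    y = W.T @ xi                          # coordinates in the local system
    mahalanobis = y @ (y / lam)           # y^T Lambda^{-1} y
    recon = (xi @ xi - y @ y) / sigma2    # reconstruction-error term
    return mahalanobis + recon + np.sum(np.log(lam)) + (d - q) * np.log(sigma2)
```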

An input to the network (a subset of the components of $\mathbf{p}$) defines the offset of a constrained space $\mathbf{z}(\boldsymbol{\eta})$ spanning the space of all possible output values:

$\mathbf{z}(\boldsymbol{\eta}) = \mathbf{M} \boldsymbol{\eta} + \mathbf{p} \; . \qquad (4.2)$

$\boldsymbol{\eta}$ is a collection of $s$ free parameters ($s$ being the dimension of the network output). $\mathbf{M}$ is a $d \times s$ matrix, which is chosen such that the constrained space is aligned with the coordinate axes.
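Since the constrained space is axis-aligned, the columns of $\mathbf{M}$ are simply the coordinate unit vectors of the free (output) dimensions. A minimal sketch of such a constructor (the function name and argument convention are assumptions for illustration):

```python
import numpy as np

def make_M(d, output_dims):
    """Build the d x s matrix M of eq. (4.2).

    Each column is a coordinate-axis unit vector of one free (output)
    dimension, so z(eta) = M @ eta + p varies only in those components
    while the input components of p stay fixed.
    """
    M = np.zeros((d, len(output_dims)))
    for col, dim in enumerate(output_dims):
        M[dim, col] = 1.0
    return M
```

For example, with $d = 4$ and output dimensions 1 and 3, `make_M(4, [1, 3])` returns a $4 \times 2$ matrix whose columns are $\mathbf{e}_2$ and $\mathbf{e}_4$.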

The recall of the complete pattern happens in two steps. First, for each unit $j$, determine the point $\hat{\mathbf{z}}_j$ that yields the smallest potential value (4.1) on the constrained subspace. Second, choose the unit $j^*$ that gives the smallest of these minimal potential values $\{E_j(\hat{\mathbf{z}}_j)\}$. The corresponding $\hat{\mathbf{z}}_{j^*}$ yields the desired output values (figure 4.2).

Figure 4.2: Pattern recall. The ellipses are iso-potential curves of the error measure $E_j$ for each unit $j$. The input $x$ defines the constraint's offset from zero. In this subspace, the unit $j^*$ and the point (circle) that result in the smallest error $E_{j^*}$ are chosen. This point's $y$ value is the desired output.

The error $E_j$ as a function of the free parameters $\boldsymbol{\eta}$ can be written as:

$E_j(\mathbf{z}(\boldsymbol{\eta})) = (\mathbf{M} \boldsymbol{\eta} + \boldsymbol{\pi}_j)^T \left( \mathbf{W}_j \Lambda_j^{-1} \mathbf{W}_j^T + \frac{1}{\sigma_j^2} \left( \mathbf{I} - \mathbf{W}_j \mathbf{W}_j^T \right) \right) (\mathbf{M} \boldsymbol{\eta} + \boldsymbol{\pi}_j) + \ln \det \Lambda_j + (d - q) \ln \sigma_j^2 \; , \qquad (4.3)$

with $\boldsymbol{\pi}_j = \mathbf{p} - \mathbf{c}_j$. Differentiating with respect to $\boldsymbol{\eta}$ gives:

$\frac{\partial E_j}{\partial \boldsymbol{\eta}} = 2 \, \mathbf{M}^T \mathbf{D}_j \mathbf{M} \boldsymbol{\eta} + 2 \, \mathbf{M}^T \mathbf{D}_j \boldsymbol{\pi}_j \qquad (4.4)$


$\mathbf{D}_j = \mathbf{W}_j \Lambda_j^{-1} \mathbf{W}_j^T + \frac{1}{\sigma_j^2} \left( \mathbf{I} - \mathbf{W}_j \mathbf{W}_j^T \right) \; . \qquad (4.5)$

Setting the derivative equal to zero yields,

$\hat{\boldsymbol{\eta}}_j = \mathbf{A}_j (\mathbf{p} - \mathbf{c}_j) \qquad (4.6)$


$\mathbf{A}_j = - \left( \mathbf{M}^T \mathbf{D}_j \mathbf{M} \right)^{-1} \mathbf{M}^T \mathbf{D}_j \; . \qquad (4.7)$

After the input and output dimensions have been selected, $\mathbf{A}_j$ needs to be computed only once for each unit.
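This precomputation is a few lines of linear algebra. The sketch below (function name assumed; the text itself does not prescribe an implementation) forms $\mathbf{D}_j$ via (4.5) and $\mathbf{A}_j$ via (4.7), using a linear solve instead of an explicit inverse of $\mathbf{M}^T \mathbf{D}_j \mathbf{M}$:

```python
import numpy as np

def recall_matrix(W, lam, sigma2, M):
    """Precompute A_j of eq. (4.7) for one unit.

    D_j = W Lambda^{-1} W^T + (1/sigma^2) (I - W W^T)   -- eq. (4.5)
    A_j = -(M^T D_j M)^{-1} M^T D_j                     -- eq. (4.7)
    """
    d = W.shape[0]
    D = W @ np.diag(1.0 / lam) @ W.T + (np.eye(d) - W @ W.T) / sigma2
    return -np.linalg.solve(M.T @ D @ M, M.T @ D)
```

As a sanity check: for a unit centered at the origin with one principal axis along $\mathbf{e}_1$ and output dimension 2, $\mathbf{A}_j$ reduces to $[0, -1]$, so (4.6) pulls the output component back to the center, as expected.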

The function $E_j(\boldsymbol{\eta})$ is convex, since $\mathbf{D}_j$ (and hence $\mathbf{M}^T \mathbf{D}_j \mathbf{M}$) is positive definite. Therefore, $\hat{\boldsymbol{\eta}}_j$ is the only minimum. Thus, $\hat{\mathbf{z}}_j = \mathbf{M} \hat{\boldsymbol{\eta}}_j + \mathbf{p}$ is the point with the smallest potential on the constraint. Next, $j^*$ can be chosen, and the resulting $\hat{\mathbf{z}}_{j^*}$ concludes the algorithm. For each input, a unique output is given, and local minima as described in section 4.1.2 are avoided.
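The complete two-step recall can then be sketched as follows. The per-unit dictionary layout is a hypothetical container of my choosing, not from the text; the computation itself follows equations (4.1), (4.2), and (4.5)-(4.7):

```python
import numpy as np

def recall(p, M, units):
    """Two-step recall: minimise E_j on the constrained space for each
    unit, then return the z of the unit j* with the smallest potential.

    units: list of dicts with keys 'c', 'W', 'lam', 'sigma2'
           (hypothetical container for the learned unit parameters).
    """
    best_E, best_z = np.inf, None
    for u in units:
        c, W, lam, s2 = u['c'], u['W'], u['lam'], u['sigma2']
        d, q = W.shape
        D = W @ np.diag(1.0 / lam) @ W.T + (np.eye(d) - W @ W.T) / s2
        A = -np.linalg.solve(M.T @ D @ M, M.T @ D)   # eq. (4.7)
        z = M @ (A @ (p - c)) + p                    # eqs. (4.6), (4.2)
        xi = z - c                                   # eq. (4.1) below
        y = W.T @ xi
        E = (y @ (y / lam) + (xi @ xi - y @ y) / s2
             + np.sum(np.log(lam)) + (d - q) * np.log(s2))
        if E < best_E:
            best_E, best_z = E, z
    return best_z
```

With a single unit centered at the origin, principal axis along $\mathbf{e}_1$, and output dimension 2, the input components of $\mathbf{p}$ are kept and the output component is pulled to the constrained minimum of the unit's potential.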

Heiko Hoffmann