2.1 Principal component analysis (PCA)
Principal component analysis is a widely used tool for dimension reduction (Diamantaras and Kung, 1996). Let $\mathbf{x}_i \in \mathbb{R}^d$, with $i = 1,\ldots,n$, be the training patterns. The principal components are a set of $q < d$ orthonormal vectors that span a subspace in the major directions into which the patterns extend (figure 2.1).
Figure 2.1: The principal component points into the direction of maximum variance. The gray dots are the training patterns. The intersection of the dashed lines is the center of the pattern distribution.
In this section, we assume without loss of generality that the patterns are centered around the origin. Let $\mathbf{y}_i = \mathbf{W}^T \mathbf{x}_i$ be the projection onto the subspace, where $\mathbf{W}$ is a $d \times q$ matrix that contains the principal components as columns. The vector $\mathbf{y}_i$ is a dimension-reduced representation of $\mathbf{x}_i$. Let $\hat{\mathbf{x}}_i = \mathbf{W} \mathbf{y}_i$ be the reconstruction of $\mathbf{x}_i$ given only the vector $\mathbf{y}_i$.
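As an illustration, the projection and reconstruction steps can be written in a few lines of NumPy. This is only a sketch; the variable names `X` (patterns as rows) and `W` (principal components as columns) are assumptions introduced here, not notation from the text.

import numpy as np

# Assumes W (d x q) holds q orthonormal principal components as columns and
# X (n x d) holds the centered training patterns as rows.
def project(X, W):
    """Dimension-reduced representation y_i = W^T x_i for every pattern."""
    return X @ W        # shape (n, q)

def reconstruct(Y, W):
    """Reconstruction x_hat_i = W y_i from the reduced representation."""
    return Y @ W.T      # shape (n, d)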
The goal of PCA is to choose the subspace such that the mean reconstruction error
$$E_{rec} = \frac{1}{n} \sum_{i=1}^{n} \left\| \mathbf{x}_i - \hat{\mathbf{x}}_i \right\|^2$$
is minimized. This goal is equivalent to finding the $q$ major directions of maximal variance within the set of patterns $\{\mathbf{x}_i\}$ (Diamantaras and Kung, 1996). Moreover, it is equivalent to the principal components being the first $q$ eigenvectors of the covariance matrix of the pattern set (Diamantaras and Kung, 1996),
$$\mathbf{C} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i \mathbf{x}_i^T \;.$$
The corresponding eigenvalue equation is
$$\mathbf{C} \mathbf{w}_j = \lambda_j \mathbf{w}_j \;.$$
The eigenvalue $\lambda_j$ is the variance of the distribution $\{\mathbf{x}_i\}$ in the direction of $\mathbf{w}_j$. The following sections describe how neural networks can extract principal components and how PCA can be linked to the probability density of a pattern distribution.