

2.1 Principal component analysis (PCA)

Principal component analysis is a widely used tool for dimension reduction (Diamantaras and Kung, 1996). Let $\mathbf{x}_i \in \mathbb{R}^d$, where $i = 1,\ldots,n$, be the training patterns. The principal components are a set of $q < d$ orthonormal vectors that span a subspace in the major directions into which the patterns extend (figure 2.1).

Figure 2.1: The principal component $\mathbf{w}_1$ points in the direction of maximum variance. The gray dots are the training patterns. The intersection of the dashed lines is the center of the pattern distribution.

In this section, we assume that the patterns are centered around the origin (without loss of generality). Let $\mathbf{y}$ be the projection of a pattern $\mathbf{x}$ onto the subspace,

$\mathbf{y} = \mathbf{W}^{T} \mathbf{x}$ .  (2.1)

$\mathbf{W}$ is a $d \times q$ matrix that contains the principal components as columns. The vector $\mathbf{y}$ is a dimension-reduced representation of $\mathbf{x}$. Let $\hat{\mathbf{x}}$ be the reconstruction of $\mathbf{x}$ given only the vector $\mathbf{y}$,

$\hat{\mathbf{x}} = \mathbf{W} \mathbf{y}$ .  (2.2)
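As a minimal sketch of equations (2.1) and (2.2), projection and reconstruction are just two matrix products. The snippet below assumes numpy and uses an arbitrary orthonormal matrix as a stand-in for the true principal components; the dimensions and the random pattern are chosen only for illustration.

import numpy as np

d, q = 3, 2                                  # original and reduced dimension (example values)
W, _ = np.linalg.qr(np.random.randn(d, q))   # some orthonormal d x q matrix, standing in for the principal components
x = np.random.randn(d)                       # a centered pattern

y = W.T @ x        # dimension-reduced representation, eq. (2.1)
x_hat = W @ y      # reconstruction from y alone, eq. (2.2)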

The goal of PCA is to choose the subspace such that the mean reconstruction error $E_{\rm rec}$ is minimized,

$E_{\rm rec} = \frac{1}{n} \sum_{i=1}^{n} \left\| \mathbf{x}_i - \hat{\mathbf{x}}_i \right\|^{2}$ .  (2.3)
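For a concrete set of centered patterns, the error (2.3) can be evaluated directly. A sketch, continuing the variables from the snippet above and assuming the patterns are stored as rows of a matrix X (synthetic data, for illustration only):

n = 100
X = np.random.randn(n, d)        # n x d matrix, one pattern per row
X = X - X.mean(axis=0)           # center the patterns around the origin

Y = X @ W                        # projections y_i, one per row
X_hat = Y @ W.T                  # reconstructions x_hat_i, one per row
E_rec = np.mean(np.sum((X - X_hat) ** 2, axis=1))   # mean reconstruction error, eq. (2.3)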

This goal is equivalent to finding the $q$ major directions of maximal variance within the set of patterns $\{\mathbf{x}_i\}$ (Diamantaras and Kung, 1996). Moreover, it is equivalent to choosing as principal components the first $q$ eigenvectors $\mathbf{w}_l$ of the covariance matrix $\mathbf{C}$ of the pattern set (Diamantaras and Kung, 1996),

$\mathbf{C} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i \mathbf{x}_i^{T}$ .  (2.4)

The corresponding eigenvalue equation is

$\mathbf{C} \mathbf{w}_l = \lambda_l \mathbf{w}_l$ .  (2.5)
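In practice, the principal components can thus be obtained from an eigendecomposition of $\mathbf{C}$ by keeping the $q$ eigenvectors with the largest eigenvalues. A sketch, assuming the centered pattern matrix X from above and numpy's eigh for the symmetric matrix $\mathbf{C}$:

C = (X.T @ X) / len(X)               # covariance matrix of the centered patterns, eq. (2.4)
eigvals, eigvecs = np.linalg.eigh(C) # solves C w_l = lambda_l w_l, eq. (2.5); eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]    # re-order eigenvalues from largest to smallest
W = eigvecs[:, order[:q]]            # first q eigenvectors as columns: the principal components

# lambda_l equals the variance of the centered patterns along w_l:
assert np.allclose(np.var(X @ W, axis=0), eigvals[order[:q]])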

The eigenvalue $\lambda_l$ is the variance of the distribution $\{\mathbf{x}_i\}$ in the direction of $\mathbf{w}_l$. The following sections describe how neural networks can extract principal components and how PCA can be linked to the probability density of a pattern distribution.


