

C.1 Probabilistic PCA and error measures

In probabilistic principal component analysis, the observed $d$-dimensional data $\{\mathbf{x}_i\}$ are assumed to originate from a probability density $p(\mathbf{x})$. This density can be written as

p(\mathbf{x}) = (2\pi)^{-d/2} \, (\det \mathbf{B})^{-1/2} \exp\!\left( -\frac{1}{2} (\mathbf{x} - \mathbf{c})^T \mathbf{B}^{-1} (\mathbf{x} - \mathbf{c}) \right) \, ,   (C.1)

with $\mathbf{B} = \sigma^2 \mathbf{I}_d + \mathbf{U}\mathbf{U}^T$ (Tipping and Bishop, 1997). $\mathbf{I}_d$ is the $d$-dimensional identity matrix, and $\sigma^2$ is the noise variance. The $d \times q$ matrix $\mathbf{U}$ is obtained by maximizing the likelihood of the data $\{\mathbf{x}_i\}$ under the density $p(\mathbf{x})$. Tipping and Bishop (1999) showed that the result is

\mathbf{U} = \mathbf{W} \left( \mathbf{\Lambda} - \sigma^2 \mathbf{I}_q \right)^{1/2} \mathbf{R} \, .   (C.2)

The columns of the $d \times q$ matrix $\mathbf{W}$ are the $q$ principal eigenvectors of the covariance matrix of $\{\mathbf{x}_i\}$, and the $q$ largest eigenvalues $\lambda_j$ of this covariance matrix are the entries of the diagonal matrix $\mathbf{\Lambda}$. $\mathbf{R}$ is an arbitrary $q \times q$ rotation matrix.
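
For illustration, the following NumPy sketch (not part of the original text; function and variable names are illustrative) constructs $\mathbf{W}$, $\mathbf{\Lambda}$, $\sigma^2$, and $\mathbf{U}$ from a data matrix, choosing $\mathbf{R} = \mathbf{I}_q$ in (C.2) and taking the maximum-likelihood noise variance as the mean of the discarded eigenvalues (Tipping and Bishop, 1999):

import numpy as np

def ppca_ml(X, q):
    # X: (n, d) data matrix; q: number of principal components (q < d).
    c = X.mean(axis=0)                          # center c
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    evals, evecs = evals[::-1], evecs[:, ::-1]  # sort eigenvalues descending
    W = evecs[:, :q]                            # d x q principal eigenvectors
    lam = evals[:q]                             # q largest eigenvalues
    sigma2 = evals[q:].mean()                   # ML noise variance
    U = W @ np.diag(np.sqrt(lam - sigma2))      # (C.2) with R = I_q
    B = sigma2 * np.eye(X.shape[1]) + U @ U.T   # model covariance B
    return c, W, np.diag(lam), sigma2, U, B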

In the following, it is shown that twice the negative logarithm of (C.1) equals the normalized Mahalanobis distance plus reconstruction error (section 3.2.1) plus a constant. Using (C.2) and $\mathbf{R}\mathbf{R}^T = \mathbf{I}_q$ to rewrite the expression for $\mathbf{B}$ gives

\mathbf{B} = \sigma^2 \mathbf{I}_d + \mathbf{W} \left( \mathbf{\Lambda} - \sigma^2 \mathbf{I}_q \right) \mathbf{W}^T = \mathbf{W} \mathbf{\Lambda} \mathbf{W}^T + \sigma^2 \left( \mathbf{I}_d - \mathbf{W} \mathbf{W}^T \right) \, .   (C.3)

By showing that $\mathbf{B}\mathbf{B}^{-1} = \mathbf{I}_d$ and $\mathbf{B}^{-1}\mathbf{B} = \mathbf{I}_d$, we can verify that the inverse of $\mathbf{B}$ is

\mathbf{B}^{-1} = \mathbf{W} \mathbf{\Lambda}^{-1} \mathbf{W}^T + \frac{1}{\sigma^2} \left( \mathbf{I}_d - \mathbf{W} \mathbf{W}^T \right) \, .   (C.4)
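
As a brief check of (C.4) (a sketch not spelled out in the text, using the orthonormality of the eigenvectors, $\mathbf{W}^T \mathbf{W} = \mathbf{I}_q$):

\begin{align*}
\mathbf{B} \mathbf{B}^{-1}
 &= \left[ \mathbf{W} \mathbf{\Lambda} \mathbf{W}^T + \sigma^2 (\mathbf{I}_d - \mathbf{W}\mathbf{W}^T) \right]
    \left[ \mathbf{W} \mathbf{\Lambda}^{-1} \mathbf{W}^T + \frac{1}{\sigma^2} (\mathbf{I}_d - \mathbf{W}\mathbf{W}^T) \right] \\
 &= \mathbf{W} \mathbf{\Lambda} \mathbf{\Lambda}^{-1} \mathbf{W}^T
    + \frac{1}{\sigma^2} \, \mathbf{W} \mathbf{\Lambda} \left( \mathbf{W}^T - \mathbf{W}^T \mathbf{W} \mathbf{W}^T \right)
    + \sigma^2 (\mathbf{I}_d - \mathbf{W}\mathbf{W}^T) \mathbf{W} \mathbf{\Lambda}^{-1} \mathbf{W}^T
    + (\mathbf{I}_d - \mathbf{W}\mathbf{W}^T)^2 \\
 &= \mathbf{W}\mathbf{W}^T + \mathbf{0} + \mathbf{0} + \mathbf{I}_d - \mathbf{W}\mathbf{W}^T = \mathbf{I}_d \, ;
\end{align*}

the computation of $\mathbf{B}^{-1}\mathbf{B}$ proceeds analogously.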

The eigenvalues of $\mathbf{B}$ are $\lambda_1, \ldots, \lambda_q$ and $\sigma^2$; the latter occurs $(d - q)$ times. Thus, the determinant of $\mathbf{B}$ is

\det \mathbf{B} = \left( \sigma^2 \right)^{d-q} \prod_{j=1}^{q} \lambda_j \, .   (C.5)
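
To justify the eigenvalue claim (a brief sketch, again using $\mathbf{W}^T \mathbf{W} = \mathbf{I}_q$): for a column $\mathbf{w}_j$ of $\mathbf{W}$ and for any $\mathbf{v}$ orthogonal to all columns of $\mathbf{W}$, (C.3) gives

\begin{displaymath}
\mathbf{B} \mathbf{w}_j = \mathbf{W} \mathbf{\Lambda} \mathbf{W}^T \mathbf{w}_j + \sigma^2 (\mathbf{I}_d - \mathbf{W}\mathbf{W}^T) \mathbf{w}_j = \lambda_j \mathbf{w}_j \, ,
\qquad
\mathbf{B} \mathbf{v} = \sigma^2 \mathbf{v} \, ,
\end{displaymath}

and $\det \mathbf{B}$ is the product of all $d$ eigenvalues.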

Finally, we evaluate the logarithm of $p(\mathbf{x})$ using (C.1), (C.4), and (C.5):

\ln p(\mathbf{x}) = -\frac{d}{2} \ln(2\pi) - \frac{1}{2} E(\mathbf{x} - \mathbf{c})   (C.6)

with

E(\boldsymbol{\xi}) = \boldsymbol{\xi}^T \mathbf{W} \mathbf{\Lambda}^{-1} \mathbf{W}^T \boldsymbol{\xi} + \frac{1}{\sigma^2} \left( \boldsymbol{\xi}^T \boldsymbol{\xi} - \boldsymbol{\xi}^T \mathbf{W} \mathbf{W}^T \boldsymbol{\xi} \right) + \sum_{j=1}^{q} \ln \lambda_j + (d - q) \ln \sigma^2 \, ,   (C.7)

and $\boldsymbol{\xi} = \mathbf{x} - \mathbf{c}$. $E$ is a normalized Mahalanobis distance plus reconstruction error.
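
The identity (C.6), and thus the relation $-2 \ln p(\mathbf{x}) = E(\mathbf{x} - \mathbf{c}) + d \ln(2\pi)$, can also be checked numerically; the following standalone NumPy sketch (illustrative, not from the text) does so on random data:

import numpy as np

rng = np.random.default_rng(0)
d, q, n = 5, 2, 1000
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))    # correlated sample data

c = X.mean(axis=0)
evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
evals, evecs = evals[::-1], evecs[:, ::-1]               # descending order
W, lam = evecs[:, :q], evals[:q]
sigma2 = evals[q:].mean()                                # ML noise variance
B = W @ np.diag(lam) @ W.T + sigma2 * (np.eye(d) - W @ W.T)   # (C.3)

xi = X[0] - c
log_p = -0.5 * (d * np.log(2 * np.pi) + np.log(np.linalg.det(B))
                + xi @ np.linalg.solve(B, xi))           # logarithm of (C.1)
E = (xi @ W @ np.diag(1.0 / lam) @ W.T @ xi
     + (xi @ xi - xi @ W @ W.T @ xi) / sigma2
     + np.log(lam).sum() + (d - q) * np.log(sigma2))     # (C.7)
print(np.isclose(-2.0 * log_p - d * np.log(2 * np.pi), E))   # expected: True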

