

A.2 Maximum likelihood

We illustrate the maximum likelihood principle with an example: a one-dimensional data set $\{x_i\}$, $i = 1,\dots,n$. We assume that the data originate from a Gaussian distribution $p(x)$ with parameters $\sigma$ and $\mu$,

\[
p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) . \tag{A.3}
\]

According to the maximum likelihood principle, we choose the unknown parameters such that the given data are most likely under the resulting distribution. The likelihood $L$ of the given data set is

\[
L(\sigma,\mu) = \prod_{i=1}^{n} p(x_i) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{\!n} \exp\left(-\frac{\sum_{i=1}^n (x_i-\mu)^2}{2\sigma^2}\right) . \tag{A.4}
\]
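For concreteness, (A.4) can be evaluated directly in code. The following minimal Python sketch (our own addition; the data, seed, and function name are hypothetical, and NumPy is assumed) also shows that the raw product becomes vanishingly small already for moderate $n$, one practical reason for working with the logarithm introduced below:

\begin{verbatim}
import numpy as np

# Hypothetical synthetic data; loc/scale values are arbitrary choices.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=50)

def likelihood(sigma, mu, x):
    """Likelihood (A.4): product of the Gaussian density over all data."""
    dens = (np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2))
            / (np.sqrt(2.0 * np.pi) * sigma))
    return np.prod(dens)

print(likelihood(1.5, 2.0, x))  # tiny positive number; underflows for large n
\end{verbatim}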

We want to find $\hat{\sigma}$ and $\hat{\mu}$ that maximize $L$. Maximizing $L$ is equivalent to maximizing $\log L$, which is also called the log-likelihood $\mathcal{L}$,

\[
\mathcal{L}(\sigma,\mu) = \log L(\sigma,\mu) = -n \log\sigma - \frac{\sum_i (x_i-\mu)^2}{2\sigma^2} + \mathrm{const} . \tag{A.5}
\]
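As a sketch, (A.5) translates directly into code; the function below (its name and interface are our own choice, NumPy assumed) writes out the additive constant explicitly:

\begin{verbatim}
import numpy as np

def log_likelihood(sigma, mu, x):
    """Log-likelihood (A.5) of a Gaussian for a one-dimensional data set x.

    The additive constant -n/2 * log(2*pi) is written out explicitly;
    it does not affect the location of the maximum.
    """
    x = np.asarray(x)
    n = len(x)
    return (-n * np.log(sigma)
            - np.sum((x - mu) ** 2) / (2.0 * sigma ** 2)
            - 0.5 * n * np.log(2.0 * np.pi))
\end{verbatim}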

To find the maximum, we compute the derivatives of the log-likelihood $\mathcal{L}$ and set them to zero:
\[
\frac{\partial\mathcal{L}}{\partial\sigma} = -\frac{n}{\sigma} + \frac{\sum_i (x_i-\mu)^2}{\sigma^3} \stackrel{!}{=} 0 , \tag{A.6}
\]
\[
\frac{\partial\mathcal{L}}{\partial\mu} = \frac{\sum_i (x_i-\mu)}{\sigma^2} \stackrel{!}{=} 0 . \tag{A.7}
\]

Solving (A.7) for $\mu$ and inserting the result into (A.6), we obtain the parameter estimates $\hat{\sigma}$ and $\hat{\mu}$:
\[
\hat{\sigma}^2 = \frac{\sum_i (x_i-\hat{\mu})^2}{n} , \tag{A.8}
\]
\[
\hat{\mu} = \frac{\sum_i x_i}{n} . \tag{A.9}
\]
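The closed-form estimates (A.8) and (A.9) can be cross-checked by maximizing (A.5) numerically. The self-contained sketch below (synthetic data, SciPy's minimize with the L-BFGS-B method; all names are our own) illustrates that both routes agree:

\begin{verbatim}
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)  # hypothetical data
n = len(x)

# Closed-form maximum likelihood estimates, (A.9) and (A.8).
mu_hat = np.sum(x) / n
sigma2_hat = np.sum((x - mu_hat) ** 2) / n

# Numerical check: minimize the negative log-likelihood (A.5),
# dropping the parameter-independent constant.
def neg_log_likelihood(params):
    sigma, mu = params
    return n * np.log(sigma) + np.sum((x - mu) ** 2) / (2.0 * sigma ** 2)

res = minimize(neg_log_likelihood, x0=[1.0, 0.0],
               bounds=[(1e-6, None), (None, None)])
sigma_opt, mu_opt = res.x

print(mu_hat, sigma2_hat)       # closed form
print(mu_opt, sigma_opt ** 2)   # numerical optimum; should match closely
\end{verbatim}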

The resulting $\hat{\sigma}^2$ is the sample variance of the data and $\hat{\mu}$ is the sample mean. The extremum of $\mathcal{L}$ is indeed a local maximum, as can be seen by computing the Hessian matrix of $\mathcal{L}$ and evaluating it at the stationary point $(\hat{\sigma},\hat{\mu})$:

\[
H_{\mathcal{L}} = \begin{pmatrix}
\frac{\partial^2\mathcal{L}}{\partial\sigma^2} & \frac{\partial^2\mathcal{L}}{\partial\sigma\,\partial\mu} \\[4pt]
\frac{\partial^2\mathcal{L}}{\partial\mu\,\partial\sigma} & \frac{\partial^2\mathcal{L}}{\partial\mu^2}
\end{pmatrix} , \tag{A.10}
\]


\[
\frac{\partial^2\mathcal{L}}{\partial\sigma^2}\bigg|_{\sigma=\hat{\sigma},\,\mu=\hat{\mu}} = \frac{n}{\hat{\sigma}^2} - \frac{3\sum_i (x_i-\hat{\mu})^2}{\hat{\sigma}^4} = \frac{n}{\hat{\sigma}^2} - \frac{3n\hat{\sigma}^2}{\hat{\sigma}^4} = -\frac{2n}{\hat{\sigma}^2} , \tag{A.11}
\]
\[
\frac{\partial^2\mathcal{L}}{\partial\sigma\,\partial\mu}\bigg|_{\sigma=\hat{\sigma},\,\mu=\hat{\mu}} = \frac{\partial^2\mathcal{L}}{\partial\mu\,\partial\sigma}\bigg|_{\sigma=\hat{\sigma},\,\mu=\hat{\mu}} = -\frac{2\sum_i (x_i-\hat{\mu})}{\hat{\sigma}^3} = 0 ,
\]
\[
\frac{\partial^2\mathcal{L}}{\partial\mu^2}\bigg|_{\sigma=\hat{\sigma},\,\mu=\hat{\mu}} = -\frac{n}{\hat{\sigma}^2} .
\]

It follows that the Hessian matrix at the extremum is negative definite,

\[
H_{\mathcal{L}}\big|_{\sigma=\hat{\sigma},\,\mu=\hat{\mu}} = \begin{pmatrix}
-\frac{2n}{\hat{\sigma}^2} & 0 \\[4pt]
0 & -\frac{n}{\hat{\sigma}^2}
\end{pmatrix} . \tag{A.12}
\]
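As a quick numerical sanity check (our own addition, with hypothetical data), one can fill in (A.12) for a concrete data set and confirm that both eigenvalues are negative:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)  # hypothetical data

n = len(x)
mu_hat = np.mean(x)
sigma2_hat = np.mean((x - mu_hat) ** 2)  # hat{sigma}^2 from (A.8)

# Hessian (A.12) evaluated at the stationary point (hat{sigma}, hat{mu}).
H = np.array([[-2.0 * n / sigma2_hat, 0.0],
              [0.0, -n / sigma2_hat]])

print(np.linalg.eigvalsh(H))  # both eigenvalues negative -> negative definite
\end{verbatim}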

Therefore, the extremum is a local maximum. Moreover, it is also a global maximum. First, $(\hat{\sigma},\hat{\mu})$ is the only stationary point, since equations (A.6) and (A.7) have a unique solution for $\sigma > 0$. Second, the likelihood $L$ is positive for finite parameters but approaches zero as $\sigma \to 0$, $\sigma \to \infty$, or $|\mu| \to \infty$. Thus, the maximum must lie in the interior of the parameter range and can only be $(\hat{\sigma},\hat{\mu})$.