Next: A.3 Iterative mean
Up: A. Statistical tools
Previous: A.1 Bayes' theorem
A.2 Maximum likelihood
The maximum likelihood principle is illustrated with an example: a one-dimensional data set $\{x_i\}$, $i = 1, \ldots, n$. We assume that the data originate from a Gaussian distribution $p(x)$ with parameters $\mu$ and $\sigma$. According to the maximum likelihood principle, we choose the unknown parameters such that the given data are most likely under the resulting distribution. Assuming that the $x_i$ are drawn independently, the probability density $L$ of the given data set is
$$ L(\mu,\sigma) = \prod_{i=1}^{n} p(x_i) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} \exp\left(-\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{2\sigma^2}\right). \tag{A.4} $$
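We want to find the $\mu$ and $\sigma$ that maximize $L$. Because the logarithm is strictly increasing, maximizing $L$ is equivalent to maximizing $\log L$, which is called the log-likelihood $\mathcal{L}$:
$$ \mathcal{L}(\mu,\sigma) = \log L(\mu,\sigma) = -n\log\sigma - \frac{\sum_{i=1}^{n}(x_i-\mu)^2}{2\sigma^2} + \text{const}. \tag{A.5} $$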
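The following minimal sketch evaluates (A.4) and (A.5) in Python using only the standard library. The function names and the five-point sample are hypothetical and serve only as an illustration. The constant in (A.5) is written out explicitly as $-\frac{n}{2}\log 2\pi$, so $\log L$ and $\mathcal{L}$ agree exactly.

```python
import math

def log_likelihood(x, mu, sigma):
    """Gaussian log-likelihood (A.5), including the constant -n/2 * log(2*pi)."""
    n = len(x)
    sq = sum((xi - mu) ** 2 for xi in x)
    return -n * math.log(sigma) - sq / (2 * sigma ** 2) - 0.5 * n * math.log(2 * math.pi)

def likelihood(x, mu, sigma):
    """Likelihood (A.4) as a direct product of Gaussian densities."""
    prod = 1.0
    for xi in x:
        prod *= math.exp(-(xi - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
    return prod

data = [1.2, 0.7, 2.3, 1.9, 1.1]  # hypothetical sample
print(log_likelihood(data, 1.0, 0.8))
print(math.log(likelihood(data, 1.0, 0.8)))  # same value up to rounding
```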
To find the maximum, we compute the derivatives of the log-likelihood $\mathcal{L}$ and set them to zero:
$$ \frac{\partial\mathcal{L}}{\partial\sigma} = -\frac{n}{\sigma} + \frac{\sum_{i=1}^{n}(x_i-\mu)^2}{\sigma^3} = 0, \tag{A.6} $$
$$ \frac{\partial\mathcal{L}}{\partial\mu} = \frac{\sum_{i=1}^{n}(x_i-\mu)}{\sigma^2} = 0. \tag{A.7} $$
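You can check the derivatives (A.6) and (A.7) numerically against central finite differences of $\mathcal{L}$. In the sketch below, `gradient` and `numerical_gradient` are hypothetical helper names, the sample data are made up, and the step size `h` is an assumed value.

```python
import math

def log_likelihood(x, mu, sigma):
    """Gaussian log-likelihood (A.5), constant included."""
    n = len(x)
    sq = sum((xi - mu) ** 2 for xi in x)
    return -n * math.log(sigma) - sq / (2 * sigma ** 2) - 0.5 * n * math.log(2 * math.pi)

def gradient(x, mu, sigma):
    """Analytic (dL/dsigma, dL/dmu) from (A.6) and (A.7)."""
    n = len(x)
    sq = sum((xi - mu) ** 2 for xi in x)
    d_sigma = -n / sigma + sq / sigma ** 3
    d_mu = sum(xi - mu for xi in x) / sigma ** 2
    return d_sigma, d_mu

def numerical_gradient(x, mu, sigma, h=1e-6):
    """Central finite differences of (A.5), used to check the analytic derivatives."""
    d_sigma = (log_likelihood(x, mu, sigma + h) - log_likelihood(x, mu, sigma - h)) / (2 * h)
    d_mu = (log_likelihood(x, mu + h, sigma) - log_likelihood(x, mu - h, sigma)) / (2 * h)
    return d_sigma, d_mu

data = [1.2, 0.7, 2.3, 1.9, 1.1]  # hypothetical sample
print(gradient(data, 1.0, 0.8))
print(numerical_gradient(data, 1.0, 0.8))
```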
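Thus, solving (A.7) for $\mu$ and substituting the result into (A.6), we obtain the values of the parameters $\sigma$ and $\mu$:
$$ \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\hat{\mu})^2, $$
$$ \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i. $$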
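The closed-form estimates are the sample mean $\hat{\mu} = \frac{1}{n}\sum_i x_i$ and $\hat{\sigma}^2 = \frac{1}{n}\sum_i (x_i-\hat{\mu})^2$. Computing them takes only a few lines. In this sketch, the name `mle_gaussian` and the sample are hypothetical. The variance divides by $n$, not $n-1$, because that is what the likelihood equations give.

```python
import math

def mle_gaussian(x):
    """Closed-form maximum likelihood estimates (mu_hat, sigma_hat) for a 1-D Gaussian."""
    n = len(x)
    mu_hat = sum(x) / n                                    # sample mean
    var_hat = sum((xi - mu_hat) ** 2 for xi in x) / n      # divides by n, not n - 1
    return mu_hat, math.sqrt(var_hat)

data = [1.2, 0.7, 2.3, 1.9, 1.1]  # hypothetical sample
mu_hat, sigma_hat = mle_gaussian(data)
print(mu_hat, sigma_hat)
```

For this sample, the sum is 7.2, so $\hat{\mu} = 1.44$. The squared deviations sum to 1.672, so $\hat{\sigma}^2 = 0.3344$.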
The resulting $\hat{\sigma}^2$ is the variance of the data distribution, and $\hat{\mu}$ is its center (the sample mean). The extremum of $\mathcal{L}$ is indeed a local maximum. This can be seen by computing the Hesse matrix of $\mathcal{L}$ and evaluating it at the extremum $(\hat{\sigma}, \hat{\mu})$:
data:image/s3,"s3://crabby-images/fff89/fff890cf4039f0e0e2edf2c733912df303c86b65" alt="$\displaystyle {\frac{{\partial^2\mathcal{L}}}{{\partial\sigma^2}}}$" data:image/s3,"s3://crabby-images/762a3/762a38780c9019fe2fc6ae7af4b41f45c56c4655" alt="$\displaystyle \Big\vert _{{\sigma=\hat{\sigma}, \mu=\hat{\mu}}}^{}$" |
= |
- = - = - , |
(A.11) |
|
|
|
|
data:image/s3,"s3://crabby-images/e938a/e938a93affbb0aa1e9e32f6c35c18ff07889d56d" alt="$\displaystyle {\frac{{\partial^2\mathcal{L}}}{{\partial\sigma\partial\mu}}}$" data:image/s3,"s3://crabby-images/762a3/762a38780c9019fe2fc6ae7af4b41f45c56c4655" alt="$\displaystyle \Big\vert _{{\sigma=\hat{\sigma}, \mu=\hat{\mu}}}^{}$" |
= |
data:image/s3,"s3://crabby-images/65e70/65e7010a7f89b4fc1c711ecbacd7077ce5bb76fc" alt="$\displaystyle {\frac{{\partial^2\mathcal{L}}}{{\partial\mu\partial\sigma}}}$" = - = 0 , |
|
|
|
|
|
data:image/s3,"s3://crabby-images/f7137/f7137dd52f3984fdc3da5a110c6d0c304f45ba8a" alt="$\displaystyle {\frac{{\partial^2\mathcal{L}}}{{\partial\mu^2}}}$" data:image/s3,"s3://crabby-images/762a3/762a38780c9019fe2fc6ae7af4b41f45c56c4655" alt="$\displaystyle \Big\vert _{{\sigma=\hat{\sigma}, \mu=\hat{\mu}}}^{}$" |
= |
- . |
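As a sanity check, you can approximate the three second derivatives by central finite differences at $(\hat{\sigma}, \hat{\mu})$ and compare them with the analytic values $-2n/\hat{\sigma}^2$, $0$, and $-n/\hat{\sigma}^2$. This is a sketch only. The helper `numerical_hessian`, the step size `h`, and the sample are assumptions made for illustration. The constant in $\mathcal{L}$ is dropped because it does not affect derivatives.

```python
import math

def log_likelihood(x, mu, sigma):
    """Log-likelihood (A.5) without the constant, which does not affect derivatives."""
    n = len(x)
    sq = sum((xi - mu) ** 2 for xi in x)
    return -n * math.log(sigma) - sq / (2 * sigma ** 2)

def numerical_hessian(x, mu, sigma, h=1e-4):
    """Second derivatives of (A.5) in the parameter order (sigma, mu), by central differences."""
    def f(s, m):
        return log_likelihood(x, m, s)
    d_ss = (f(sigma + h, mu) - 2 * f(sigma, mu) + f(sigma - h, mu)) / h ** 2
    d_mm = (f(sigma, mu + h) - 2 * f(sigma, mu) + f(sigma, mu - h)) / h ** 2
    d_sm = (f(sigma + h, mu + h) - f(sigma + h, mu - h)
            - f(sigma - h, mu + h) + f(sigma - h, mu - h)) / (4 * h ** 2)
    return [[d_ss, d_sm], [d_sm, d_mm]]

data = [1.2, 0.7, 2.3, 1.9, 1.1]  # hypothetical sample
n = len(data)
mu_hat = sum(data) / n
sigma_hat = math.sqrt(sum((xi - mu_hat) ** 2 for xi in data) / n)

H = numerical_hessian(data, mu_hat, sigma_hat)
analytic = [[-2 * n / sigma_hat ** 2, 0.0], [0.0, -n / sigma_hat ** 2]]
print(H)
print(analytic)
```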
|
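From the second derivatives at $(\hat{\sigma}, \hat{\mu})$, it follows that the Hesse matrix at the extremum is diagonal, with the negative entries $-2n/\hat{\sigma}^2$ and $-n/\hat{\sigma}^2$ (parameters ordered as $(\sigma, \mu)$). It is therefore negative definite, and the extremum is a local maximum. Moreover, it is also the global maximum, for two reasons. First, for finite parameters with $\sigma > 0$, no other extrema exist. Because $\mathcal{L}$ is a smooth function, any extremum must satisfy (A.6) and (A.7), and these equations have the single solution $(\hat{\sigma}, \hat{\mu})$. Second, $L$ is positive for finite parameters but approaches zero as $|\mu| \to \infty$ or $\sigma \to \infty$. It also approaches zero as $\sigma \to 0$, provided that the $x_i$ are not all equal. Thus, any maximum must lie in the finite range.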
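A coarse grid search can illustrate, though not prove, that the closed-form estimate is the global maximum. In the sketch below, no grid point over $\mu \in [-5, 5]$ and $\sigma \in [0.01, 4.96]$ should exceed $\mathcal{L}(\hat{\mu}, \hat{\sigma})$. The grid ranges, spacing, and sample are arbitrary assumptions.

```python
import math

def log_likelihood(x, mu, sigma):
    """Gaussian log-likelihood (A.5), constant included."""
    n = len(x)
    sq = sum((xi - mu) ** 2 for xi in x)
    return -n * math.log(sigma) - sq / (2 * sigma ** 2) - 0.5 * n * math.log(2 * math.pi)

data = [1.2, 0.7, 2.3, 1.9, 1.1]  # hypothetical sample
n = len(data)
mu_hat = sum(data) / n
sigma_hat = math.sqrt(sum((xi - mu_hat) ** 2 for xi in data) / n)
best = log_likelihood(data, mu_hat, sigma_hat)

# Coarse grid: mu from -5 to 5 in steps of 0.05, sigma from 0.01 to 4.96 in steps of 0.05.
grid_max = max(
    log_likelihood(data, -5 + 0.05 * i, 0.01 + 0.05 * j)
    for i in range(201)
    for j in range(100)
)
print(best, grid_max)
```

The log-likelihood also drops well below `best` near the boundaries. Examples are a very small or very large $\sigma$, or a $\mu$ far from the data, in line with the argument above.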
Heiko Hoffmann
2005-03-22