A.2 Maximum likelihood

The maximum likelihood principle is illustrated with an example: a one-dimensional data set $\{x_i\}$, $i = 1,\ldots,n$. We assume that the data originate from a Gaussian distribution $p(x)$ with parameters $\mu$ and $\sigma$,

$$
p(x) \;=\; \frac{1}{\sqrt{2\pi}\,\sigma}\,
\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).
\tag{A.3}
$$

According to the maximum likelihood principle, we choose the unknown parameters such that the given data are most likely under the resulting distribution. The probability of the data, the likelihood, is

$$
L(\mu,\sigma) \;=\; \prod_{i=1}^{n} p(x_i)
\;=\; \frac{1}{(2\pi)^{n/2}\,\sigma^{n}}\,
\exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\right).
\tag{A.4}
$$

We want to find the $\mu$ and $\sigma$ that maximize $L$. Since the logarithm is monotonic, we can equivalently maximize the log-likelihood

$$
\ell(\mu,\sigma) \;=\; \log L(\mu,\sigma)
\;=\; -\,n\log\sigma \;-\; \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2 \;+\; \text{const}\,.
\tag{A.5}
$$
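The log-likelihood (A.5) is easy to evaluate numerically. The following sketch (with an assumed illustrative sample, not data from the text) checks that the closed-form estimates derived below score higher than perturbed parameter values:

```python
import math

def log_likelihood(data, mu, sigma):
    # Log-likelihood of a 1-D Gaussian, eq. (A.5); the additive
    # constant is -(n/2) * log(2*pi), included here explicitly.
    n = len(data)
    return (-n * math.log(sigma)
            - sum((x - mu) ** 2 for x in data) / (2.0 * sigma ** 2)
            - 0.5 * n * math.log(2.0 * math.pi))

# Illustrative sample (assumed for demonstration).
data = [1.2, 0.8, 1.5, 0.9, 1.1]
mu_hat = sum(data) / len(data)
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / len(data))

best = log_likelihood(data, mu_hat, sigma_hat)
print(best > log_likelihood(data, mu_hat + 0.5, sigma_hat))   # True
print(best > log_likelihood(data, mu_hat, sigma_hat * 2.0))   # True
```

Any shift of the mean or rescaling of the width lowers the log-likelihood, as expected for a global maximum.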

To find the maximum we compute the derivatives of the log-likelihood and set them to zero:

$$
\frac{\partial \ell}{\partial \sigma}
\;=\; -\,\frac{n}{\sigma} \;+\; \frac{1}{\sigma^3}\sum_{i=1}^{n}(x_i-\mu)^2 \;=\; 0\,,
\tag{A.6}
$$

$$
\frac{\partial \ell}{\partial \mu}
\;=\; \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu) \;=\; 0\,.
\tag{A.7}
$$

Thus, we obtain the values of the parameters $\hat\sigma$ and $\hat\mu$:

$$
\hat\sigma^2 \;=\; \frac{1}{n}\sum_{i=1}^{n}(x_i-\hat\mu)^2\,,
\tag{A.8}
$$

$$
\hat\mu \;=\; \frac{1}{n}\sum_{i=1}^{n} x_i\,.
\tag{A.9}
$$
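The closed-form estimates can be verified numerically: at $(\hat\mu,\hat\sigma)$ both derivatives of the log-likelihood vanish. A minimal sketch, using an assumed illustrative sample:

```python
import math

# Illustrative sample (assumed for demonstration).
data = [2.0, 2.5, 1.5, 3.0, 2.2, 1.8]
n = len(data)

mu_hat = sum(data) / n                               # eq. (A.9)
var_hat = sum((x - mu_hat) ** 2 for x in data) / n   # eq. (A.8)
sigma_hat = math.sqrt(var_hat)

# The stationarity conditions, eqs. (A.6) and (A.7), hold at the estimates.
d_sigma = -n / sigma_hat + sum((x - mu_hat) ** 2 for x in data) / sigma_hat ** 3
d_mu = sum(x - mu_hat for x in data) / sigma_hat ** 2
print(abs(d_sigma) < 1e-9, abs(d_mu) < 1e-9)   # True True
```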

The resulting $\hat\sigma^2$ is the variance of the distribution and $\hat\mu$ is its center. The extremum of $\ell$ is indeed a local maximum, as can be seen by computing the Hessian matrix of $\ell$ and evaluating it at the extremum $(\hat\mu,\hat\sigma)$:

$$
H_{\ell} \;=\;
\begin{pmatrix}
\dfrac{\partial^2 \ell}{\partial \sigma^2} & \dfrac{\partial^2 \ell}{\partial \sigma\,\partial \mu}\\[2ex]
\dfrac{\partial^2 \ell}{\partial \mu\,\partial \sigma} & \dfrac{\partial^2 \ell}{\partial \mu^2}
\end{pmatrix},
\tag{A.10}
$$

$$
\frac{\partial^2 \ell}{\partial \sigma^2}\bigg|_{(\hat\mu,\hat\sigma)}
\;=\; \frac{n}{\hat\sigma^2} \;-\; \frac{3}{\hat\sigma^4}\sum_{i=1}^{n}(x_i-\hat\mu)^2
\;=\; \frac{n}{\hat\sigma^2} - \frac{3n}{\hat\sigma^2}
\;=\; -\,\frac{2n}{\hat\sigma^2}\,,
\tag{A.11}
$$

$$
\frac{\partial^2 \ell}{\partial \sigma\,\partial \mu}\bigg|_{(\hat\mu,\hat\sigma)}
\;=\; -\,\frac{2}{\hat\sigma^3}\sum_{i=1}^{n}(x_i-\hat\mu)
\;=\; 0\,,
$$

$$
\frac{\partial^2 \ell}{\partial \mu^2}
\;=\; -\,\frac{n}{\sigma^2}\,.
$$

It follows that the Hessian matrix at the extremum is negative definite,

$$
H_{\ell}\big|_{(\hat\mu,\hat\sigma)} \;=\;
\begin{pmatrix}
-\dfrac{2n}{\hat\sigma^2} & 0\\[1ex]
0 & -\dfrac{n}{\hat\sigma^2}
\end{pmatrix}.
\tag{A.12}
$$

Therefore, the extremum is a local maximum. Moreover, it is also a global maximum. First, for finite parameters no other extrema exist, because $\ell$ is a smooth function and $(\hat\mu,\hat\sigma)$ is the only solution of the stationarity conditions (A.6) and (A.7). Second, $L$ is positive for finite parameters, but approaches zero for infinite values. Thus, any maximum must lie in the finite range.
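The Hessian entries in (A.12) can also be confirmed by finite differences of the log-likelihood, without using the analytic second derivatives. A sketch, with an assumed illustrative sample:

```python
import math

# Illustrative sample (assumed for demonstration).
data = [0.5, 1.0, 1.5, 2.0, 2.5]
n = len(data)
mu_hat = sum(data) / n
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)

def ell(mu, sigma):
    # Log-likelihood (A.5) without the additive constant.
    return -n * math.log(sigma) - sum((x - mu) ** 2 for x in data) / (2 * sigma ** 2)

h = 1e-5
# Central second differences approximate the diagonal Hessian entries.
d2_sigma = (ell(mu_hat, sigma_hat + h) - 2 * ell(mu_hat, sigma_hat)
            + ell(mu_hat, sigma_hat - h)) / h ** 2
d2_mu = (ell(mu_hat + h, sigma_hat) - 2 * ell(mu_hat, sigma_hat)
         + ell(mu_hat - h, sigma_hat)) / h ** 2

# Both agree with the analytic values -2n/sigma_hat**2 and -n/sigma_hat**2,
# and both are negative, confirming a maximum.
print(abs(d2_sigma - (-2 * n / sigma_hat ** 2)) < 1e-2)   # True
print(abs(d2_mu - (-n / sigma_hat ** 2)) < 1e-2)          # True
```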
