The Likelihood Function

As we’ve seen, a probability function f(x) provides us with a measure of how “likely” given sample outcomes (x) are. For example, the probability of x successes in n Bernoulli trials is given by the binomial probability function:

$$ f(x \mid n, p) = \binom{n}{x} p^{x} (1-p)^{n-x} $$

where p is the probability of success on each trial. This function presupposes that we know (1) the number of trials (n) and (2) the probability of success (p). In other words, it says “you give me the model parameters (n and p), and I’ll tell you how likely the data are.” Let’s stand this problem on its head and ask a somewhat different question: “Given that I have the data (x successes in n trials), how likely are different values of p?” This question gives rise to something called a likelihood function, or simply a likelihood:

$$ L(p \mid x, n) = \binom{n}{x} p^{x} (1-p)^{n-x} $$

This function looks almost identical to the probability function, but notice a subtle difference: in the probability function the parameter p is assumed to be fixed, and the observation (x) is assumed to be variable, with a distribution determined by p and n. In the likelihood we already have the data, so x and n are fixed, and it’s p we are trying to determine; the likelihood is a function of p, given the data.
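To make the distinction concrete, here is a small Python sketch (standard library only) that evaluates the same binomial formula both ways. The helper name `binom` and the values n = 10, p = 0.3 and x = 3 are illustrative choices, not part of the applet. Holding p fixed and varying x gives probabilities that sum to 1; holding x fixed and varying p gives a likelihood curve whose area is not 1, which is one reason a likelihood is not a probability distribution for p.

```python
from math import comb

def binom(x: int, n: int, p: float) -> float:
    """The binomial formula C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n = 10

# Probability function: p is fixed and x varies; the probabilities sum to 1.
p_fixed = 0.3
probs = [binom(x, n, p_fixed) for x in range(n + 1)]
print(f"sum of f(x | n, p) over x = 0..{n}: {sum(probs):.6f}")

# Likelihood: the data are fixed (x_obs successes) and p varies over a grid.
x_obs = 3
grid = [i / 100 for i in range(101)]
likelihood = [binom(x_obs, n, p) for p in grid]

# The likelihood is not a probability distribution over p: its area
# (trapezoidal rule on the grid) is about 1/(n + 1), not 1.
h = grid[1] - grid[0]
area = h * (sum(likelihood) - (likelihood[0] + likelihood[-1]) / 2)
print(f"area under L(p | x, n) over 0 <= p <= 1: {area:.4f}")
```

The area comes out close to 1/11 (about 0.091) for n = 10, and the largest grid value of the likelihood sits at p = 0.3.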

For nice (defined as the opposite of naughty) likelihoods (like the binomial), it turns out that there is a single value of p, given the data, that is most likely; that is, it makes the likelihood as big as possible. That value is known as the maximum likelihood estimate (MLE) and has some very appealing properties we don’t need to go into here. It turns out that our old buddy, the sample proportion

$$ \hat{p} = \frac{x}{n} $$

is the MLE for p. You could prove this by taking the likelihood function and finding its maximum using calculus, as we’ll explain below. Many common estimates in statistics, such as the sample mean of a normal distribution (and its variance, when computed with divisor n rather than n − 1), are MLEs, and many common test statistics (t, F, chi-square, etc.) are based on likelihoods.
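The calculus argument can be sketched as follows for 0 < x < n. It works with the natural log of the binomial likelihood, which is maximized at the same p because the logarithm is an increasing function. Setting the first derivative to zero gives the candidate value, and the negative second derivative confirms it is a maximum. (When x = 0 or x = n, the maximum sits at the boundary p = 0 or p = 1, which still equals x/n.)

```latex
\ln L(p \mid x, n) = \ln\binom{n}{x} + x \ln p + (n - x)\ln(1 - p)

\frac{d}{dp}\ln L(p \mid x, n) = \frac{x}{p} - \frac{n - x}{1 - p} = 0
\quad\Longrightarrow\quad x(1 - p) = (n - x)\,p
\quad\Longrightarrow\quad \hat{p} = \frac{x}{n}

\frac{d^2}{dp^2}\ln L(p \mid x, n) = -\frac{x}{p^2} - \frac{n - x}{(1 - p)^2} < 0
```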

 

Applet Exercises


Our applet demonstrates some of the properties of this simple likelihood for one and two samples. There are just 2 sliders: n (the number of trials) and x (the number of successes). That’s all you need, as you now know, to get the likelihood. Notice that if you change n, x will shift too, keeping the same proportional relationship to n that it had when you started. This will help you see the effect that increasing n has in concentrating L around the most likely value. If you want to see the likelihood for a specific combination of n and x, first set n (number of trials) and then x (number of successes). Sliding x around for a fixed n shows how the most likely value changes when you get different sample results.
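The concentrating effect of n can also be checked numerically. The Python sketch below holds x/n at 0.3 (as the applet does when you move the n slider) and reports the range of p whose likelihood is at least 10% of its maximum. The cutoff of 0.1 and the helper names `log_lik` and `likelihood_interval` are illustrative assumptions, not anything the applet uses; the point is only that the range narrows as n grows.

```python
from math import log

def log_lik(p: float, x: int, n: int) -> float:
    """Binomial log-likelihood ln L(p | x, n), dropping the constant ln C(n, x)."""
    return x * log(p) + (n - x) * log(1 - p)

def likelihood_interval(x: int, n: int, cutoff: float = 0.1, steps: int = 10_000):
    """Smallest and largest p on a grid whose likelihood is >= cutoff * maximum.

    cutoff = 0.1 is an arbitrary illustrative choice. Assumes 0 < x < n.
    """
    max_ll = log_lik(x / n, x, n)                 # the maximum sits at p = x/n
    grid = [i / steps for i in range(1, steps)]   # excludes p = 0 and p = 1
    inside = [p for p in grid if log_lik(p, x, n) - max_ll >= log(cutoff)]
    return min(inside), max(inside)

# Hold x/n at 0.3 while n grows, mimicking the applet's n slider.
results = {}
for n in (10, 30, 100, 300):
    x = 3 * n // 10
    lo, hi = likelihood_interval(x, n)
    results[n] = (lo, hi)
    print(f"n={n:3d}  x={x:2d}  p in [{lo:.3f}, {hi:.3f}]  width={hi - lo:.3f}")
```

Working on the log scale avoids multiplying many small numbers together, and because the log-likelihood here is concave, the set of p above the cutoff is a single interval.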

The 2 other display windows show functions that are useful for a variety of purposes. The natural logarithm of the likelihood, ln L(p | x, n), is mathematically easier to deal with than the likelihood itself: it turns products into sums, and because the logarithm is an increasing function, it is maximized at the same value of p. The last window shows the derivative of the log-likelihood with respect to p; this is the function that is actually solved to obtain the MLE, by setting it equal to zero and solving for p. This general approach works for many types of likelihoods, including those with many parameters. For some of these, calculus doesn’t provide a neat closed-form solution, but we can still find the parameter values that maximize the likelihood using numerical methods, and these are also MLEs.
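As one illustration of the numerical route, the sketch below solves “derivative of the log-likelihood = 0” with Newton-Raphson, a standard root-finding method chosen here for illustration (the text does not say which numerical method is used in practice). For the binomial the answer is known to be x/n, so the sketch can be checked against it. The function names are hypothetical, and the sketch assumes 0 < x < n.

```python
def score(p: float, x: int, n: int) -> float:
    """First derivative of the binomial log-likelihood with respect to p."""
    return x / p - (n - x) / (1 - p)

def score_slope(p: float, x: int, n: int) -> float:
    """Second derivative of the log-likelihood (negative for 0 < x < n)."""
    return -x / p**2 - (n - x) / (1 - p) ** 2

def newton_mle(x: int, n: int, p0: float, tol: float = 1e-12, max_iter: int = 100) -> float:
    """Find the p where score(p) = 0 by Newton-Raphson, starting from p0.

    Assumes 0 < x < n, so the maximum lies strictly inside (0, 1).
    """
    p = p0
    for _ in range(max_iter):
        step = score(p, x, n) / score_slope(p, x, n)
        p_new = p - step
        # Keep the iterate inside (0, 1): halve the step if it overshoots.
        while not (0 < p_new < 1):
            step /= 2
            p_new = p - step
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    raise RuntimeError("Newton-Raphson did not converge")

print(newton_mle(7, 20, p0=0.9))  # should agree with x/n = 0.35
```

The example deliberately starts at p0 = 0.9: for this particular likelihood, a start at p0 = 0.5 lands on x/n in a single Newton step, which would hide the iteration.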

Click the ‘1 Population’ box to change to 2 populations. Here you see 2 likelihoods side by side, each based on n independent trials. See how changing the number of trials affects the concentration of each likelihood around its MLE. Try 10 trials and =1, . Notice how much overlap there is between the likelihood functions. Then increase the number of trials gradually to 100 and observe how the overlap between the likelihoods lessens. Are there still values of p that are equally likely under either likelihood? What does this tell you about the ability of data to distinguish real differences between populations?
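One way to put a number on the shrinking overlap is sketched below. Each binomial likelihood integrates to 1/(n + 1) over 0 ≤ p ≤ 1, so multiplying by (n + 1) rescales it to unit area; the shared area under the two rescaled curves is then 1 when they coincide and near 0 when they barely touch. This overlap measure is an illustrative choice, not necessarily what the applet displays, and the sample results (observed proportions 0.2 and 0.5) are hypothetical.

```python
from math import comb

def likelihood(p: float, x: int, n: int) -> float:
    """Binomial likelihood L(p | x, n)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def overlap(x1: int, x2: int, n: int, steps: int = 10_000) -> float:
    """Area shared by two likelihoods after rescaling each to unit area.

    A binomial likelihood integrates to 1/(n + 1) over 0 <= p <= 1, so
    multiplying by (n + 1) turns it into a density in p.
    """
    h = 1 / steps
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) * h  # midpoint rule on (0, 1)
        total += min(likelihood(p, x1, n), likelihood(p, x2, n))
    return (n + 1) * total * h

# Hypothetical samples: observed proportions 0.2 and 0.5, with n growing.
for n in (10, 20, 50, 100):
    x1, x2 = n // 5, n // 2
    print(f"n={n:3d}  x1={x1:2d}  x2={x2:2d}  overlap={overlap(x1, x2, n):.3f}")
```

With the same observed proportions, the shared area shrinks toward zero as n grows, which is the numerical counterpart of the narrowing overlap you see in the 2-population display.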