Poisson and Approximating Binomial

RocketManJames · #1 12-04-2005, 04:52 AM

Can someone help me understand this a bit, and tell me if this makes sense at all. Years ago, when I took a probability class, we went over the Poisson distribution, which could be used to approximate the binomial distribution for rare events.

So, say that I've got a fairly rare event (probability of this event occurring is about 1 in 100,000 trials). And, say that I've estimated the probability of this rare event from running 30 million trials and seeing it 300 times.

Now, I have a few questions...

1) Can I use the Poisson distribution to approximate the distribution that I would expect to see if I were to run a large number of trials (N = Large)?

2) Is the reason for using the Poisson distribution as an approximation that it is simpler (fewer terms, etc.) than the binomial?

3) Is it possible given the information I gave above to provide some range of error for any estimated distribution? Since P(rare event) was estimated from running 30MM trials and seeing 300 occurrences, can we somehow infer some cloud of error around any distribution we come up with?

I apologize if what I am asking is confusing or if it is way off. I'm just trying to learn here.

Thanks.

-RMJ

pzhon · #2 12-04-2005, 06:23 AM

A Poisson distribution has one parameter, its mean, m. That is also its variance, so the standard deviation is sqrt(m). When m is large, a Poisson distribution is well-approximated by a normal distribution with the same mean and standard deviation.

When m is small, say 1, or .01, a Poisson distribution is only very poorly approximated by a normal distribution. When you have a binomial distribution with a low mean, it is much more accurate to use a Poisson approximation than a normal approximation.
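To make that point concrete, here is a small Python sketch (not from the thread) comparing exact binomial probabilities with the Poisson and normal approximations when the mean is small. The values n = 100,000 and p = 10^-5 (so m = 1) are an arbitrary illustrative choice, and the normal version uses a continuity correction, which is one common convention among several.

```python
import math

def binom_pmf(k, n, p):
    # Exact binomial probability of exactly k successes in n trials
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, m):
    # Poisson probability of exactly k events when m are expected
    return math.exp(-m) * m**k / math.factorial(k)

def normal_pmf(k, m):
    # Normal approximation with mean m and sd sqrt(m), using a continuity correction
    sd = math.sqrt(m)

    def cdf(x):
        return 0.5 * (1 + math.erf((x - m) / (sd * math.sqrt(2))))

    return cdf(k + 0.5) - cdf(k - 0.5)

n, p = 100_000, 1e-5  # illustrative values: expected count m = n * p = 1
m = n * p
for k in range(4):
    print(k, f"binomial={binom_pmf(k, n, p):.5f}",
          f"poisson={poisson_pmf(k, m):.5f}", f"normal={normal_pmf(k, m):.5f}")
```

With a mean of 1 the Poisson column tracks the binomial closely, while the normal curve badly underestimates the chance of seeing zero events.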

[ QUOTE ]
3) Is it possible given the information I gave above to provide some range of error for any estimated distribution? Since P(rare event) was estimated from running 30MM trials and seeing 300 occurrences, can we somehow infer some cloud of error around any distribution we come up with?

[/ QUOTE ]
That's a statistics question rather than a probability question. You can use the Poisson distributions to find confidence intervals about your observed P, but what is appropriate depends on how you will use P.
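As one illustration of that idea, the sketch below inverts the Poisson distribution to get an "exact" (Garwood-style) two-sided 95% confidence interval for the expected count, then divides by the number of trials to get an interval for P. It uses RMJ's figures of 300 occurrences in 30 million trials; the root-finding helper, the bracket, and the choice of a two-sided 95% interval are assumptions for illustration, and a different use of P may call for a different kind of interval.

```python
import math

def poisson_cdf(k, lam):
    # P(X <= k) for X ~ Poisson(lam); each term is built in log space to avoid overflow
    return sum(math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1)) for i in range(k + 1))

def find_root(f, lo, hi, iters=100):
    # Bisection: root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid < 0) == (f_lo < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2

def poisson_exact_ci(k, conf=0.95):
    # Two-sided interval for the Poisson mean after observing k events
    tail = (1 - conf) / 2
    top = k + 10 * math.sqrt(k) + 10  # generous (assumed) upper bracket for the mean
    lower = 0.0 if k == 0 else find_root(
        lambda lam: (1 - poisson_cdf(k - 1, lam)) - tail, 1e-9, top)
    upper = find_root(lambda lam: poisson_cdf(k, lam) - tail, 1e-9, top)
    return lower, upper

trials, count = 30_000_000, 300
lo_count, hi_count = poisson_exact_ci(count)
print(f"count in [{lo_count:.1f}, {hi_count:.1f}]")
print(f"P in [{lo_count / trials:.3e}, {hi_count / trials:.3e}]")
```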

Siegmund · #3 12-04-2005, 06:35 AM

Yes, the Poisson distribution can be used to make such approximations. The rarer the event the better the approximation.

It is "simpler" in the sense that, until the last ten years or so, it was the only practical way of doing these at all, and it's still very handy for doing ballpark estimates in your head. (And it's handy for avoiding underflow errors, if you're working on a machine with limited precision.)

Let's look at two recent threads in this forum. Scroll down to "No Royal in 100k hands?" We're told that the chance of holding a Royal on any one hand (if we take every hand to the river) is 1 in 30940, so the chance of going 100,000 hands without one is (30939/30940)^100000.

How do you calculate (30939/30940)^100000 by hand? You take logarithms: ln P = 100000 ln(30939/30940) = 100000 ln(1 - 1/30940) = 100000 * (-1/30940 - 1/(2*30940^2) - ...). Drop the higher-order terms of the series expansion of the log, and you have ln P ~ -100000/30940, or P ~ e^(-100000/30940).

The Poisson formula just generalizes this, to P(something never happens) = e^-(#times something is expected to happen), and saves you from similar but harder series.

In the Royal Flush thread, e^-3.23206 = 0.03948, and we expect the approximation to be good to 4 decimal places since the neglected term in the series was 1/30940 as big as the included term.
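The logarithm trick is easy to check by machine. This Python sketch (illustrative, not from the thread) evaluates (30939/30940)^100000 directly, then keeps one and two terms of the series for ln(1 - q), where q = 1/30940 is the per-hand Royal chance; math.log1p serves only as an accurate reference value for n ln(1 - q).

```python
import math

n = 100_000
q = 1 / 30940  # chance of a Royal on one hand, as quoted in the thread

direct = (1 - q) ** n                        # brute-force evaluation
reference = math.exp(n * math.log1p(-q))     # exp(n * ln(1 - q)), computed accurately
one_term = math.exp(-n * q)                  # ln(1 - q) ~ -q: the Poisson answer
two_terms = math.exp(-n * (q + q**2 / 2))    # ln(1 - q) ~ -q - q^2/2

for label, value in [("direct", direct), ("reference", reference),
                     ("one term", one_term), ("two terms", two_terms)]:
    print(f"{label:>9}: {value:.8f}")
```

The one-term (Poisson) value differs from the direct one only around the sixth decimal place, and adding the second term closes most of that gap.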

By comparison, in the thread "what are the odds of 100 hands with no PP?" we are asked to calculate (16/17)^100. Here the Poisson approximation is e^-(100/17) = e^-5.8824 = 0.279%, while the exact answer is 0.233%. This is not surprising, since the neglected term is now as much as 1/17 the size of the term we kept, and an error appears in the second significant digit.

The number of digits you can trust in your approximation is, then, controlled by how unlikely the event is to occur on a single trial. In your case, you can expect using the Poisson approximation instead of an exact calculation to give you 5 good decimal places.
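Both threads can be compared side by side with a short Python sketch (the function names are made up for illustration): the exact chance that an event with per-trial probability p never happens in n trials, (1 - p)^n, against the Poisson value e^-(np).

```python
import math

def no_event_exact(p, n):
    # Exact probability that an event with per-trial chance p never happens in n trials
    return (1 - p) ** n

def no_event_poisson(p, n):
    # Poisson approximation: e^-(expected number of occurrences)
    return math.exp(-n * p)

cases = {
    "no Royal in 100,000 hands": (1 / 30940, 100_000),
    "no pocket pair in 100 hands": (1 / 17, 100),
}
for name, (p, n) in cases.items():
    exact, approx = no_event_exact(p, n), no_event_poisson(p, n)
    print(f"{name}: exact={exact:.5%} poisson={approx:.5%} "
          f"relative error={approx / exact - 1:.3%}")
```

The relative error grows with the per-trial probability: roughly 0.005% in the Royal case versus about 20% for pocket pairs.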

3) Yes; the variance and mean of a Poisson distribution are equal. If you run many series of 30M trials with p=10^-5, the counts you observe are going to be centered on 300 with a standard deviation between 17 and 18 (sqrt(300) is about 17.3).

This is the other reason why the Poisson approximation is used. Since the experiment itself is only going to give you an answer with two significant digits, you needn't lose any sleep at all over the errors in the fifth digit from the approximation.
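That error bar can be turned into a rough interval for P. The sketch below assumes the observed count of 300 is a fair stand-in for the Poisson mean and uses a normal approximation (plus or minus 1.96 standard deviations for about 95%), which is reasonable with a mean this large.

```python
import math

trials = 30_000_000
count = 300                     # observed rare events

p_hat = count / trials          # point estimate of P(rare event): 1e-5
sd_count = math.sqrt(count)     # Poisson: variance equals mean, estimated by the count
rel_err = sd_count / count      # relative standard error, about 5.8%
z = 1.96                        # about 95% under a normal approximation (an assumption)
low = (count - z * sd_count) / trials
high = (count + z * sd_count) / trials

print(f"p_hat = {p_hat:.2e}, sd of count = {sd_count:.1f}, relative error = {rel_err:.1%}")
print(f"approx. 95% interval for P: [{low:.2e}, {high:.2e}]")
```

A relative error of roughly 6% is why the estimate of P is only good to about two significant digits.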

AaronBrown · #4 12-05-2005, 12:15 AM

In addition to the excellent replies by Siegmund and pzhon, I'd add that the Poisson is most useful not only when the events are rare, but when the sample size is small enough that you only expect a few successes. In your case, with 300 expected successes, you'll get similar answers with the Normal and Poisson approximations.
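The claim that the Normal and Poisson agree at 300 expected successes can be checked with a few lines of Python. This sketch compares the Poisson(300) CDF with a normal curve of the same mean and variance at a handful of arbitrarily chosen cutoffs; the continuity correction (k + 0.5) is an assumed convention.

```python
import math

def poisson_cdf(k, lam):
    # P(X <= k), summing pmf terms in log space so a large lam does not overflow
    return sum(math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1)) for i in range(k + 1))

def normal_cdf(x, mean, sd):
    # Standard normal CDF shifted and scaled to the given mean and sd
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

lam = 300
for k in (260, 270, 280, 300, 320):
    print(k, f"poisson={poisson_cdf(k, lam):.4f}",
          f"normal={normal_cdf(k + 0.5, lam, math.sqrt(lam)):.4f}")
```

The two columns agree to within a few thousandths, so at this size either approximation serves about equally well.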

Another case in which the Poisson is useful is when you don't know the number of trials. For example, suppose that in 2005, 10 people died of heart attacks immediately after getting holes in one in golf. I'm willing to assume these are independent events. The long-term average number is 5 per year. But I don't know how many people get holes in one in golf. Nevertheless, because fatal heart attacks are reasonably rare over short intervals of time, I can assume that the number per year follows a Poisson distribution with mean and variance of 5. There is only a 3% chance of observing 10 or more occurrences in that case. So this is significant evidence that something has changed (more people playing golf, more holes in one, sicker people playing golf, more heart attacks; something).

The interesting thing is I don't need any more information than the long run average, and that the events are independent and rare.
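The 3% figure is a Poisson tail probability that needs only the mean of 5. A minimal Python check (the function name is made up for illustration):

```python
import math

def poisson_tail(k, m):
    # P(X >= k) for X ~ Poisson(m): one minus the probability of 0..k-1 events
    return 1 - sum(math.exp(-m) * m**i / math.factorial(i) for i in range(k))

# Chance of 10 or more such deaths in a year when 5 are expected
print(f"{poisson_tail(10, 5):.4f}")  # → 0.0318
```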

RocketManJames · #5 12-06-2005, 06:12 PM

Siegmund, Aaron, PZhon: Thanks for all the information.

I guess in my case, since I have access to many trials, there's no major benefit to using the Poisson. Maybe using it would lead to slightly easier calculations. I'll have to look into that.

-RMJ