Confidence Levels? [Archive] - Two Plus Two Older Archives

magic_man

09-15-2002, 11:47 PM

I have a fairly good grasp of statistics and analysis, but I never took a statistics course. I know the 68-95-99 rule, but have a related question. How are these numbers calculated? From what I remember, they are the area under the curve between given standard deviations, right? In that case, is it the normal curve, and if so, what is the equation for this curve? I am specifically interested in an equation that will let me calculate how many standard deviations it takes to obtain a particular confidence level, say 92%. If I knew the equation of the curve, would I just do an integral .92 = int(equation) from -x to x, where x is the required number of standard deviations? Thanks for any help you guys can offer.

~Magic Man

BruceZ

09-16-2002, 04:48 AM

We are dealing with the normal distribution, also known as the Gaussian distribution (the bell-shaped curve). 68% is the area under the curve between +/- 1 standard deviation. About 95% is the area between +/- 2 standard deviations (95% exactly corresponds to 1.96 standard deviations). The area between +/- 3 standard deviations is actually 99.7%; 99% exactly corresponds to 2.58 standard deviations. You get these numbers from a table of the standard normal distribution, which is the normal distribution with mean = 0 and standard deviation = 1. The definite integral cannot be done in closed form, so you must use a table, which is derived numerically. The equation for the standard normal distribution is
sqrt(1/(2pi)) * exp(-x^2/2). It is only possible to do the indefinite integral from minus infinity to +infinity, which of course is 1 since this is the total probability, because Gauss figured out (I think when he was very young) that the integral of exp(-x^2/2) from minus infinity to +infinity is sqrt(2pi), which is why we divide by this as a normalization constant to get the area to be 1. The trick to doing this integral is to square it, which gives a double integral that is separable in x and y, and then convert that double integral to polar coordinates.
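To illustrate how such a table can be derived numerically, here is a minimal Python sketch that integrates the standard normal density with Simpson's rule. The function names and the step count are assumptions made for this illustration, not anything from a particular table or library.

```python
import math

def std_normal_pdf(x: float) -> float:
    """Standard normal density: exp(-x^2/2) / sqrt(2*pi)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def area_within(z: float, steps: int = 10_000) -> float:
    """Area under the density between -z and +z (composite Simpson's rule)."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = 2.0 * z / steps
    total = std_normal_pdf(-z) + std_normal_pdf(z)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # odd interior points get 4, even get 2
        total += weight * std_normal_pdf(-z + i * h)
    return total * h / 3.0

# Areas for the numbers of standard deviations mentioned above.
for z in (1.0, 1.96, 2.0, 2.58, 3.0):
    print(f"+/- {z} sd: {area_within(z):.4f}")
```

Integrating over a very wide range such as +/- 10 should come out very close to 1, which is a quick check on the normalization constant.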

Note that the general formula for any mean u and standard deviation sigma is
sqrt(1/2pi)exp[-(x-u)^2/(2*sigma)]. This is abbreviated N(u,sigma). The standard normal distribution would be N(0,1). When you use the table of the standard normal distribution, you must be aware of what it is giving you. Mine gives the integral from minus infinity to x. This is most useful for doing 1-sided probabilities such as the probability that you will be at least so far ahead after a certain time. For example, the value for 1 standard deviation is .84. This means there is an 84% chance you will be less than 1 standard deviation above the mean, and a 16% chance you will be above this amount (the area under one tail). To get the 2-sided confidence interval that you are familiar with, subtract 2*16% from 1 to subtract off the 2 tails.
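The table lookup for 1 standard deviation can be reproduced with a short Python sketch. It assumes Python 3.8 or later, where `statistics.NormalDist` is available; the variable names are only illustrative.

```python
from statistics import NormalDist

std = NormalDist(mu=0.0, sigma=1.0)  # standard normal, N(0,1)

# What a "minus infinity to x" table returns for x = 1 standard deviation.
below = std.cdf(1.0)              # about 0.84

# Area in the single tail above +1 standard deviation.
upper_tail = 1.0 - below          # about 0.16

# Two-sided confidence level: remove both tails from the total area of 1.
two_sided = 1.0 - 2.0 * upper_tail  # about 0.68

print(f"below +1 sd: {below:.4f}")
print(f"one tail:    {upper_tail:.4f}")
print(f"within 1 sd: {two_sided:.4f}")
```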

09-16-2002, 09:08 AM

I believe that f(x) = sqrt(1/(2pi))exp^(-x^2/2) is correct for sigma = 1 and u = 0. The expression

sqrt(1/2pi)exp[-(x-u)^2/(2*sigma)]

for the general formula has a typo. I think the correct formula is

sqrt(1/(2 pi))/sigma * exp(-(x-u)^2/(2*sigma^2)).

Magic man wanted to integrate f(x) from -x to x. As BruceZ stated, this does not have a solution among the elementary functions. If you allow more advanced functions, the integral is

I = erf(x/sqrt(2)).

For example, in Excel you can type =ERF(1/SQRT(2)) to get the 1 s.d. confidence level, which is about 0.6827.
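For anyone not using Excel, the same calculation and its inverse (the original 92% question) can be sketched in Python. This assumes Python 3.8 or later for `statistics.NormalDist`; the two function names are made up for this example.

```python
import math
from statistics import NormalDist

def two_sided_confidence(z: float) -> float:
    """Area between -z and +z standard deviations: erf(z / sqrt(2))."""
    return math.erf(z / math.sqrt(2.0))

def z_for_confidence(level: float) -> float:
    """Number of standard deviations for a two-sided level between 0 and 1.

    A two-sided level leaves (1 - level) / 2 in each tail, so the one-sided
    cumulative probability at +z is (1 + level) / 2.
    """
    return NormalDist().inv_cdf((1.0 + level) / 2.0)

print(f"1 sd confidence: {two_sided_confidence(1.0):.4f}")
print(f"z for 92%:       {z_for_confidence(0.92):.4f}")
```

For a 92% level this gives roughly 1.75 standard deviations, and feeding that value back into `two_sided_confidence` returns 0.92.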

BruceZ

09-16-2002, 09:13 AM

sqrt(1/2pi)exp[-(x-u)^2/(2*sigma)]

for the general formula has a typo. I think the correct formula is

sqrt(1/(2 pi))/sigma * exp(-(x-u)^2/(2*sigma^2)).

You are correct. (x-u)/sigma will convert any normally distributed random variable to standard normal.

BruceZ

09-16-2002, 10:29 AM

The important thing to note is that even if your mean is not 0, or your standard deviation is not 1, you still use the standard normal distribution table to determine the probability for different numbers of standard deviations. All we are doing is changing units and shifting where you define 0. So if your mean is 100 bets and your standard deviation is 100 bets, then a result of 100 bets, the mean, corresponds to 0 on the standard normal distribution, and each additional 100 bets above or below that corresponds to 1 standard deviation. This would be your distribution after 100 hours if your mean is 1 bb/hr and your standard deviation is 10 bb/hr (total mean = 1 bb/hr * 100 hours, total sigma = 10 bb/hr * sqrt(100 hours)).

Also, I shouldn't say that we can do the "indefinite integral" from minus infinity to +infinity since this makes no sense. We cannot write the indefinite integral in terms of elementary functions, so we cannot do the definite integral between arbitrary limits. What we are able to evaluate is the definite integral from minus infinity to +infinity.
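The change of units in the 100-hour example can be sketched in Python as follows. This assumes results over many hours are approximately normal and assumes Python 3.8 or later; the function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist

def session_distribution(win_rate: float, sigma_per_hour: float, hours: float):
    """Total mean and standard deviation after a number of hours.

    The mean grows linearly with hours; the standard deviation grows
    with the square root of hours.
    """
    mean = win_rate * hours
    sd = sigma_per_hour * math.sqrt(hours)
    return mean, sd

def prob_result_below(result: float, win_rate: float,
                      sigma_per_hour: float, hours: float) -> float:
    """Probability of finishing below `result`, using the standard normal."""
    mean, sd = session_distribution(win_rate, sigma_per_hour, hours)
    z = (result - mean) / sd  # convert to standard normal units
    return NormalDist().cdf(z)

mean, sd = session_distribution(1.0, 10.0, 100)
print(mean, sd)                                  # 100 bets and 100 bets
print(prob_result_below(0.0, 1.0, 10.0, 100))    # chance of being behind
```

With these numbers, breaking even is exactly 1 standard deviation below the mean, so the chance of being behind after 100 hours is the one-tail area of about 16%.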

09-16-2002, 01:00 PM

Thanks for all the help, everyone. I suspected I wouldn't be able to do the integral, but I can still solve the problems I wanted in matlab or excel. Here's something that confuses me, though. In "Gambling Theory and Other Topics", on the section on bankrolls, Mason shows how to calculate a BR such that you will "never" go broke (he uses 3 standard deviations). However, he follows up saying that if you want only a 5% risk of going broke, you should use 1.64 standard deviations in the equations instead of 3. Isn't 1.64 SD's only a 90% confidence level? For 95% confidence, why don't we use 1.96 SD's? Thanks again.

~Magic Man

magic_man

09-16-2002, 01:01 PM

Mano

09-16-2002, 08:05 PM

The reason you only need the 90% confidence level is that half of the remaining 10% falls on the winning side. That is, 5% of the time you will win more than 1.64 standard deviations above the expected value, and 5% of the time you will lose more than 1.64 standard deviations below it, with the remaining 90% of the time being within 1.64 sd's of the expected value.
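The difference between the one-tail and two-tail numbers can be checked with a small Python sketch. It assumes Python 3.8 or later; the helper names are invented for this example.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal

def z_one_sided(tail: float) -> float:
    """Standard deviations that leave `tail` probability in ONE tail."""
    return std.inv_cdf(1.0 - tail)

def z_two_sided(total_tail: float) -> float:
    """Standard deviations that leave `total_tail` split over BOTH tails."""
    return std.inv_cdf(1.0 - total_tail / 2.0)

print(f"5% in one tail:     {z_one_sided(0.05):.3f}")   # about 1.645
print(f"5% over both tails: {z_two_sided(0.05):.3f}")   # about 1.960
print(f"10% over both:      {z_two_sided(0.10):.3f}")   # about 1.645
```

The 90% two-sided level and the 5% one-sided tail give the same number of standard deviations, which is the point of the explanation above.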

magic_man

09-16-2002, 09:09 PM

Ah, that makes perfect sense. Thanks everyone!

~Magic Man
"If the whole idea is not to show how it's done, how does ANYBODY ever learn card tricks?!" -Tommy, cartoon character

BruceZ

09-16-2002, 09:31 PM

This is what I was saying about the 1-sided confidence interval. You will be within +/- 1.96 standard deviations of the mean 95% of the time, as you say, but you will be LESS THAN 1.96 standard deviations ABOVE the mean 97.5% of the time, and you will be NO WORSE THAN 1.96 standard deviations BELOW the mean 97.5% of the time. We are only ignoring one tail of the bell curve instead of both tails. You will be less than 1.64 standard deviations above the mean 95% of the time, and no worse than 1.64 standard deviations below the mean 95% of the time. BUT...THIS DOES NOT MEAN YOU HAVE A 5% RISK OF GOING BROKE, AND THIS IS A COMMON AND TERRIBLE ERROR!!! I don't know how Mason says to do risk of ruin because I haven't read this book (though it will arrive in a few days).

What is true is that if you play for the number of hours for which performing 1.64 standard deviations below average would mean you lost your starting bankroll, you have a 5% chance of being broke at that point. This is not your total risk of going broke, because you may go broke before you ever get to that point unless you find some more money so you can keep playing. Assuming you have a way to keep playing, the chance of losing your starting bankroll and needing more money at some point before you have played a given number of hours can be shown to always be more than twice the risk of being broke once you have played that number of hours. Also, the amount of money represented by 1.64 standard deviations changes over time. If you want the time at which 1.64 standard deviations below average represents a maximum loss, maximize 1.64*sigma*sqrt(n) - n*u, where n is the number of hours, u is the hourly win rate, and sigma is the hourly standard deviation. The maximum occurs at n = (1.64sigma/2u)^2 hours. While you have a 5% chance of being down 1.64*sigma dollars at this point if you play to this point, it turns out you have over a 10% chance of being down that amount of money before this point (or any other point).
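The maximization can be sketched in Python. Setting the derivative of z*sigma*sqrt(n) - n*u to zero gives sqrt(n) = z*sigma/(2u), which is the formula above. The win rate of 1 bet/hr and sigma of 10 bets/hr used here are purely illustrative assumptions, and the function names are made up.

```python
import math

def loss_at_z(n: float, z: float, sigma: float, u: float) -> float:
    """Amount you are down after n hours when running z sd below average."""
    return z * sigma * math.sqrt(n) - n * u

def worst_case_hours(z: float, sigma: float, u: float) -> float:
    """Hours at which the z-sd-below-average loss is largest."""
    return (z * sigma / (2.0 * u)) ** 2

# Illustrative numbers only: u = 1 bet/hr, sigma = 10 bets/hr, z = 1.64.
z, sigma, u = 1.64, 10.0, 1.0
n_star = worst_case_hours(z, sigma, u)
print(f"worst point: {n_star:.2f} hours")
print(f"loss there:  {loss_at_z(n_star, z, sigma, u):.2f} bets")

# Sanity check: nearby hours give a smaller loss than the closed-form maximum.
for n in (n_star - 10, n_star + 10):
    print(f"loss at {n:.2f} hours: {loss_at_z(n, z, sigma, u):.2f} bets")
```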

In a recent post, I calculated that you need a 172 bet bankroll for a 10% risk of ruin with an hourly rate of 1.05 bets and a sigma of 13.11 bets/hr. See this post for the correct formula. If I were to use the wrong method, I would say I need to be no more than 1.28 standard deviations below the mean, and the number of hours for which this is the largest number of bets is [1.28*13.11/(2*1.05)]^2 = 63.9 hours. Then since my average win at that time is 63.9*1.05 = 67 bets, and 1.28 standard deviations is 1.28*13.11*sqrt(63.9) = 134 bets, I would conclude that I only need 134-67 = 67 bets to have a 10% risk of ruin, when in fact I need 172 bets, over 2.5 times as much! If I only had 67 bets, my risk of ruin would be (.1)^(67/172) = 40%, 4 times as great as I thought! Don't do this.
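The arithmetic of the wrong method can be replayed in a few lines of Python. The 172 bet figure is taken as given from the post and is not derived here; the variable names are only for this sketch.

```python
import math

win_rate = 1.05           # bets per hour, from the post
sigma = 13.11             # bets per hour, from the post
z = 1.28                  # one-sided 10% tail
correct_bankroll = 172    # bets for a 10% risk of ruin, as stated in the post

# Wrong method: find the hours where a 1.28 sd downswing is largest.
hours = (z * sigma / (2 * win_rate)) ** 2         # about 63.9
expected_win = hours * win_rate                   # about 67
downswing = z * sigma * math.sqrt(hours)          # about 134
naive_bankroll = downswing - expected_win         # about 67

# Risk of ruin with the naive bankroll, scaling the 10% figure
# by the ratio of bankrolls as in the post.
actual_risk = 0.10 ** (naive_bankroll / correct_bankroll)  # about 0.40

print(f"hours: {hours:.1f}")
print(f"naive bankroll: {naive_bankroll:.0f} bets")
print(f"actual risk of ruin: {actual_risk:.0%}")
```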

BruceZ

09-16-2002, 10:15 PM

The maximum occurs at n = (1.64sigma/2u)^2 hours. While you have a 5% chance of being down 1.64*sigma dollars at this point

Meant 1.64*sigma*sqrt(n) dollars. This is 1.64 standard deviations after n hours.