
Confidence Intervals for Non-normally distributed data


11-11-2002, 12:31 AM
OK, say I have a set of data. Every value is either -1 or a number from 0 to 5: there are no numbers smaller than -1, none between -1 and 0, and none greater than 5. Clearly this is not a normal distribution.

The standard deviation is 0.9 and the mean is -0.11. There are 375 pieces of data. How can I find a 95% confidence interval for the mean?

Also, how large a sample do I need to be "95% confident" that the true "expectation" is within +/- 0.02 of my estimated expectation? (In other words, how much data do I need to be 95% confident that the expectation is between -0.13 and -0.09?)

I can do confidence intervals using "t-values" and whatnot if my data is "normal", but this one has me puzzled.

Any help would be appreciated.

11-11-2002, 12:59 AM
Let me clarify where I am going with this and maybe someone can help me out.

Let's say I do a bunch of math and I think I have a 5% edge on a certain type of sports bet, and I calculated this 5% edge based on 300 games I recorded.

Let's say I do some other math and I find out that I have a 10% edge on a different type of sports bet, and I calculated this 10% edge based on only 100 games' worth of data.

For which one do I truly have the better expectation: a 5% edge over 300 games, or a 10% edge over 100 games?

To take it to the extreme, I know that if I found a 50% edge in 5 games, that would clearly be a fluke.

Sorry if this doesn't make sense...

11-11-2002, 02:46 AM
The dumbed-down version of the central limit theorem says that regardless of the distribution of the original data, the sampling distribution of the sample mean is approximately normal, provided the sample size is fairly large. 375 is definitely large enough for your problem.
You can reasonably calculate a confidence interval for the mean using a z table here: -0.11 +/- 1.96*0.9/sqrt(375) = -0.11 +/- 0.0911, which gives
(-0.2011, -0.0189)
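A minimal Python sketch of the same arithmetic, plus a quick simulation that checks the central limit theorem claim at this sample size. The function names are hypothetical, and so is the simulated distribution: the weights on -1, 0, ..., 5 are invented for illustration and are not the original poster's data. The simulation draws many samples of 375, builds the same z-interval from each sample's own mean and sd, and counts how often the interval covers the true mean.

```python
import math
import random


def z_interval(mean, sd, n, z=1.96):
    """Large-sample (CLT) interval for a mean: mean +/- z * sd / sqrt(n)."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# The summary numbers from the original post.
low, high = z_interval(-0.11, 0.9, 375)
print(f"95% CI: ({low:.4f}, {high:.4f})")  # 95% CI: (-0.2011, -0.0189)

# Hypothetical skewed distribution on {-1, 0, 1, ..., 5}; these weights are
# made up for illustration only and are not anyone's real betting results.
VALUES = [-1, 0, 1, 2, 3, 4, 5]
WEIGHTS = [0.70, 0.10, 0.08, 0.05, 0.04, 0.02, 0.01]
TRUE_MEAN = sum(v * w for v, w in zip(VALUES, WEIGHTS))


def coverage(n=375, trials=1000, seed=1):
    """Fraction of simulated samples whose z-interval contains TRUE_MEAN."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = rng.choices(VALUES, weights=WEIGHTS, k=n)
        m = sum(sample) / n
        # Sample standard deviation with the usual n - 1 denominator.
        s = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
        lo, hi = z_interval(m, s, n)
        if lo <= TRUE_MEAN <= hi:
            hits += 1
    return hits / trials


print(f"simulated coverage: {coverage():.3f}")
```

If the simulated coverage lands close to 0.95 even though the underlying values are lopsided, the normal approximation is doing its job at n = 375.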


"Also, how large a sample do I need to be "95% confident" that the true "expectation" is within +/- 0.02 of my estimated expectation? (In other words, how much data do I need to be 95% confident that the expectation is between -0.13 and -0.09?)"

The first part is a reasonable question; the second part isn't, since a bigger sample will give a new estimate that need not be -0.11, but I know what you mean.

The sample size required to get an estimate within 0.02 is going to be ballpark 7780: n = (1.96*0.9/0.02)^2 = 88.2^2 = 7779.24, rounded up (using your estimate of the sd as the real deal).
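The 7780 figure comes from solving z*sd/sqrt(n) <= 0.02 for n and rounding up. A small sketch of that calculation (the function name is hypothetical), again treating 0.9 as the true sd and using z = 1.96:

```python
import math


def sample_size_for_margin(sd, margin, z=1.96):
    """Smallest n with z * sd / sqrt(n) <= margin, i.e. n >= (z * sd / margin) ** 2."""
    return math.ceil((z * sd / margin) ** 2)


n = sample_size_for_margin(sd=0.9, margin=0.02)
print(n)  # 7780
```

Because the half-width shrinks like 1/sqrt(n), going from about 0.091 at n = 375 down to 0.02 takes roughly (0.091/0.02)^2, about 21 times as much data.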
Also, keep in mind this all assumes a random sample, not "man, I've been hot as hell lately, let's do a 95% confidence interval for my expectation".
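The follow-up question (a 5% edge over 300 games versus a 10% edge over 100 games) can be looked at with the same standard-error arithmetic. This is a rough sketch under a loudly stated assumption: it takes a hypothetical per-bet standard deviation of 1.0 unit, which is roughly what near-even-money bets give; the real value should come from your own results. The function and constant names are illustrative.

```python
import math

# ASSUMPTION: hypothetical per-bet standard deviation, in units staked.
# About 1.0 is typical for near-even-money bets; substitute your own sd.
PER_BET_SD = 1.0


def edge_summary(edge, games, sd=PER_BET_SD, z=1.96):
    """Standard error, z-score against zero edge, and large-sample 95% CI."""
    se = sd / math.sqrt(games)
    return {
        "standard_error": se,
        "z_score": edge / se,
        "ci": (edge - z * se, edge + z * se),
    }


for edge, games in [(0.05, 300), (0.10, 100)]:
    s = edge_summary(edge, games)
    lo, hi = s["ci"]
    print(f"{edge:.0%} edge over {games} games: z = {s['z_score']:.2f}, "
          f"95% CI ({lo:+.3f}, {hi:+.3f})")
```

Under that assumed sd, both intervals still contain zero (z is about 0.87 and 1.0), so neither edge is distinguishable from zero at the 95% level with this much data.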