View Full Version : confidence intervals?


jasonHoldEm
02-13-2004, 05:36 PM
Total hands - 26315
Total hours - 431.13
BB/Hr - 1.46
stddev/hr - 13.04

I'm not sure if all the above are necessary info, but how do I figure out my "actual" win rate...isn't there some formula that says XX% of the time I will make between X and Y? Thanks.

J

Jezebel
02-13-2004, 05:52 PM
Divide your SD by the square root of the number of hours played to get your standard error. For the info you provided, your standard error is .63.

Your true win rate will be within one standard error of your observed win rate 68% of the time.

Your true win rate will be within 1.3 standard errors of your observed win rate 80% of the time.

Your true win rate will be within 1.6 standard errors of your observed win rate 90% of the time.

Your true win rate will be within 2 standard errors of your observed win rate 95% of the time.

So it appears that there is a 95% chance that your true win rate lies between .2 BB/hr and 2.72 BB/hr. The more hours you play, the tighter this range will become.

fluff
02-13-2004, 05:55 PM
You can be 95% confident that your win rate falls between 0.229 BB/hr and 2.69 BB/hr.

Or between WR-1.96*SD/(Time^0.5) and WR+1.96*SD/(Time^0.5)
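
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `win_rate_interval` and its parameter names are made up for illustration, and it assumes the hourly results are independent and that the hourly SD is known well enough to be treated as a fixed number.

```python
import math

def win_rate_interval(win_rate, sd_per_hour, hours, z=1.96):
    """Return (standard_error, low, high) for an hourly win rate.

    Hypothetical helper: the standard error is SD / sqrt(hours), and the
    interval is win_rate +/- z standard errors.
    """
    standard_error = sd_per_hour / math.sqrt(hours)
    low = win_rate - z * standard_error
    high = win_rate + z * standard_error
    return standard_error, low, high

# Figures from the first post: 1.46 BB/hr, SD 13.04 BB per hour, 431.13 hours.
se, low, high = win_rate_interval(1.46, 13.04, 431.13)
print(f"SE = {se:.3f}, 95% interval = {low:.3f} to {high:.3f} BB/hr")
```

With the numbers from the first post this reproduces a standard error of about .63 and an interval of roughly 0.229 to 2.69 BB/hr.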

jasonHoldEm
02-13-2004, 06:20 PM
Thanks guys!

jds1201
03-02-2004, 02:04 PM
I've been racking my brain over the past few weeks trying to figure out how my confidence in my win rate and the distribution of true win rates fit together.

For example, let's say that I've got a win rate of 3BB/hr and a standard error of 1. The bell curve around my win rate says that I could be 95% confident that my true win rate is 3 +- 2BB/hr (so between 1 and 5).

What is puzzling me is that it seems to me that the low end of the range (1) is much more likely than the high end of the range (5) because there are a lot more 1BB/Hr players out there than there are 5BB/Hr. Taking it a step further, I would expect almost all of the outlying 5% to be to the low side instead of the high side, not 2.5% on each side, yet the normal distribution around my result would seem to indicate that the chances of these results are equal.

I'm sure I'm missing something simple (I haven't worked with this stuff in a long time), but I just can't seem to figure out how this distribution interacts with the distribution of all players.

I hope this made some sense.

Thanks.

JDS

Am Alert
03-02-2004, 02:41 PM
You are not missing anything, stop racking your brain. It is an overtly simplistic approach to apply normal distribution to every problem and expect reliable results.

BruceZ
03-03-2004, 09:22 AM
[ QUOTE ]
yet the normal distribution around my result would seem to indicate that the chances of these results are equal.

[/ QUOTE ]

In fact you are missing something very important and very subtle which many people miss regarding the definition of confidence intervals. The confidence interval does not tell you the probability that your win rate lies in that interval. Instead it gives the probability of obtaining your results if your win rate were in that interval. Your win rate would have to lie between 1 and 5 bb/hr in order to achieve your results or better with a probability of 95%. This means there is a 95% confidence that your win rate is between 1 and 5 bb/hr, but confidence and probability are two different things.

The win rate and standard deviation you computed are called maximum likelihood estimates. This is because they are the values which maximize the probability of obtaining your results, and not because they are the most likely estimates of those values. In fact, there are other intervals which give a 95% probability, but the one which is symmetrical to your computed win rate can be shown to be the shortest.

In order to compute the probability that your win rate lies within some interval, you need a different form of statistics called Bayesian statistics. In Bayesian estimation, you take into account any prior knowledge you already have about where your win rate is likely to lie by assuming a prior probability distribution of your win rate. This prior distribution would take into account the distribution of win rates among all players. If you think it is very unlikely that your win rate is above 5, then this would be reflected in the prior distribution. Your data is then used to update the prior distribution to yield a posterior distribution for the win rate which is a true probability distribution. The final distribution you get depends on the prior distribution you start with, so the results can be somewhat controversial. This contrasts with using maximum likelihood estimation and confidence intervals which depend only on the observed data.
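
As a rough illustration of the Bayesian update described above, the sketch below applies a normal prior to the example of an observed 3 BB/hr with a standard error of 1. The prior (mean 0 BB/hr, standard deviation 2 BB/hr) is an assumption chosen only for this example, and the helper name `normal_posterior` is hypothetical. It relies on the standard result that a normal prior combined with a normal likelihood gives a normal posterior.

```python
import math
from statistics import NormalDist

def normal_posterior(observed, standard_error, prior_mean, prior_sd):
    """Posterior for the true win rate under a normal prior (hypothetical helper).

    Precisions (1 / variance) add, and the posterior mean is the
    precision-weighted average of the prior mean and the observed win rate.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / standard_error ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * observed)
    return NormalDist(post_mean, math.sqrt(post_var))

# Assumed prior: mean 0 BB/hr, SD 2 BB/hr. Observed: 3 BB/hr, standard error 1.
posterior = normal_posterior(3.0, 1.0, prior_mean=0.0, prior_sd=2.0)
p_below_1 = posterior.cdf(1.0)        # probability the true win rate is under 1
p_above_5 = 1.0 - posterior.cdf(5.0)  # probability the true win rate is over 5
print(f"posterior mean = {posterior.mean:.2f}, sd = {posterior.stdev:.3f}")
print(f"P(rate < 1) = {p_below_1:.4f}, P(rate > 5) = {p_above_5:.4f}")
```

Under this assumed prior the posterior is centered at 2.4 BB/hr rather than 3, and a true win rate below 1 comes out far more probable than one above 5, which matches the intuition that the low end of the range is the likelier one. A different prior would give different numbers.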

irchans
03-03-2004, 10:23 AM
Bruce,

You have some very important points in your last post, most of which I understand and agree with. The idea of using Bayesian statistics is very good. I would think a normal prior distribution of win rate with a standard deviation of 2 BB per sqrt hour and mean of 0 BB per hour would be reasonable. Then the ML estimator for win rate would be something like

estimate = WinRateObserved / ( 1 + 2*SDestimate/Sqrt[hours])


If instead we chose a uniform distribution say between -1000 BB per hour and +1000 BB per hour, then I wonder if we could then say that there is a 95% probability that the win rate is within 2 standard errors of the observed win rate.


I do have one question about your first paragraph.

[ QUOTE ]
The confidence interval does not tell you the probability that your win rate lies in that interval. Instead it gives the probability of obtaining your results if your win rate were in that interval. Your win rate would have to lie between 1 and 5 bb/hr in order to achieve your results or better with a probability of 95%.

[/ QUOTE ]

I think there must be a word or two wrong in those statements. If your win rate was 100 bb/hr, then you would achieve those "results or better." Am I missing something here?

Cheers,
Irchans

irchans
03-03-2004, 10:31 AM
AmAlert,

I agree that "It is an overtly simplistic approach to apply normal distribution to every problem and expect reliable results." Stock market analysts make this mistake all the time. There are situations where normal distributions are good and there are situations where normal distributions are bad. I would think that a normal distribution would work well for estimating win rates of poker with hundreds of hours of data. What would be a better distribution? How much improvement could we expect if we used a better distribution?

Cheers,
Irchans

BruceZ
03-03-2004, 02:17 PM
[ QUOTE ]
Your win rate would have to lie between 1 and 5 bb/hr in order to achieve your results or better with a probability of 95%. This means there is a 95% confidence that your win rate is between 1 and 5 bb/hr, but confidence and probability are two different things.

[/ QUOTE ]

This first sentence was worded wrong; sorry if it caused confusion. What I intended to explain was that if your true win rate were to lie below 1 bb/hr, then there would have been less than a 2.5% probability of obtaining your result or better, and if your true win rate were to lie above 5 bb/hr, then there would have been less than a 2.5% probability of obtaining your win rate or worse. So if your true win rate lies outside of this interval from 1 to 5 bb/hr, then there would have been less than a 5% probability of obtaining your result, or a result which is farther from the true win rate. It is correct to describe this situation by stating that there is a 95% confidence that your true win rate lies between 1 and 5 bb/hr as stated. The important point is that the confidence interval describes the probability of your results given particular win rates, not the probability of the true win rate.

Another way to look at this is that before you played these hours, there actually was a 95% probability that your observed win rate would fall within 2 standard errors of your true win rate. However, after you play, and your win rate and standard error are numbers rather than variables, then you can no longer make this statement unless you substitute the word “confidence” for “probability”.
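
The "before you play" statement can be illustrated with a small simulation. This is a sketch under stated assumptions: the true win rate (1 BB/hr), hourly SD (13 BB), number of hours (400), and number of trials are arbitrary values picked for the example, hourly results are drawn as independent normal variables, and the SD is treated as known. The function name `coverage` is hypothetical.

```python
import random

def coverage(true_rate, sd_per_hour, hours, trials, z=1.96, seed=0):
    """Fraction of simulated samples whose observed win rate lands within
    z standard errors of the true win rate (SD treated as known)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    standard_error = sd_per_hour / hours ** 0.5
    hits = 0
    for _ in range(trials):
        # Simulate one player-sample: `hours` independent hourly results.
        total = sum(rng.gauss(true_rate, sd_per_hour) for _ in range(hours))
        observed = total / hours
        if abs(observed - true_rate) <= z * standard_error:
            hits += 1
    return hits / trials

fraction = coverage(true_rate=1.0, sd_per_hour=13.0, hours=400, trials=2000)
print(f"fraction within 1.96 standard errors: {fraction:.3f}")
```

The printed fraction should come out close to 0.95. Note what is being repeated here: the true win rate is held fixed and the results vary from trial to trial, which is exactly the sense in which the 95% is a statement about results rather than about the true win rate.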

BruceZ
03-04-2004, 05:50 AM
Hans,

[ QUOTE ]
I would think a normal prior distribution of winrate with a standard deviation of 2 BB per sqrt hour and mean of 0 BB per hour would be reasonable.

[/ QUOTE ]

I think the units of standard deviation of win rate should be BB/hr rather than BB/sqrt(hr). This is the standard deviation of the prior distribution which reflects our uncertainty in the value of the true win rate, so it should have the same units as the win rate. Note that standard error, which is the standard deviation of win rate, is SD/sqrt(n) and has units of BB/hr also since SD has units of BB/sqrt(hr).


[ QUOTE ]
Then the ML estimator for win rate would be something like

estimate = WinRateObserved / ( 1 + 2*SDestimate/Sqrt[hours])

[/ QUOTE ]

I think you mean this to be the mean of the posterior distribution, right? There is no maximum likelihood estimator in the context of Bayesian estimation. We don't maximize the likelihood function; instead we multiply the likelihood function by the prior distribution and normalize this to get the posterior distribution. Then for a normal posterior distribution, the mean of the posterior distribution minimizes both the expected squared-error loss and the expected absolute-error loss. For your prior distribution this would give:

estimate = WinRateObserved / [ 1 + SDestimate^2/(4*hours) ]

This also gives the correct units.

In general,

estimate = (sigma^2*u + nv^2*WinRateObserved) / (sigma^2 + nv^2)

where sigma^2 is the hourly variance (SDestimate^2), u is the mean of the prior distribution, v^2 is the variance of the prior distribution, and n is the number of hours (from DeGroot).
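
As a quick check that the general formula reduces to the special case, here is a sketch in Python. The numbers are borrowed from the first post in the thread purely for illustration (1.46 BB/hr observed, SD 13.04, 431.13 hours), the prior is the one proposed above (mean 0, standard deviation 2 BB/hr), and the function name `posterior_mean` is hypothetical.

```python
def posterior_mean(observed, sd_per_hour, hours, prior_mean, prior_sd):
    """General formula: (sigma^2*u + n*v^2*observed) / (sigma^2 + n*v^2)."""
    sigma2 = sd_per_hour ** 2    # sigma^2, the hourly variance
    nv2 = hours * prior_sd ** 2  # n * v^2
    return (sigma2 * prior_mean + nv2 * observed) / (sigma2 + nv2)

# General formula with u = 0 and v = 2.
general = posterior_mean(1.46, 13.04, 431.13, prior_mean=0.0, prior_sd=2.0)

# Special case written out directly: observed / [1 + SD^2 / (4 * hours)].
special = 1.46 / (1 + 13.04 ** 2 / (4 * 431.13))

print(f"general: {general:.3f}, special case: {special:.3f}")
```

Both expressions give about 1.33 BB/hr for these inputs, so under this assumed prior the observed 1.46 BB/hr is pulled slightly toward the prior mean of 0.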


[ QUOTE ]
If instead we chose a uniform distribution say between -1000 BB per hour and +1000 BB per hour, then I wonder if we could then say that there is a 95% probability that the win rate is within 2 standard errors of the observed win rate.

[/ QUOTE ]

Yes, I think that should give a posterior distribution which is just the likelihood function which is normal, and the mean and standard deviation of this function would be the same as the maximum likelihood estimates.


[ QUOTE ]
I think there must be a word or two wrong in those statements. If your win rate was 100 bb/hr, then you would achieve those "results or better." Am I missing something here?

[/ QUOTE ]

See above post for the correction. By "better" I meant farther from the actual win rate, but the probability should be 2.5% or greater. Since I was explaining that the confidence interval pertains to the probability of results, I was trying to explain the 95% in terms of this interpretation, but the easiest way to explain the 95% is that this is the probability that your true win rate will lie in this interval before you play. The sentence in the correction where I take 2*2.5% to get 5% really only makes sense before you play. After you play, you cannot integrate the tails this way because that would assume the probability of each win rate is equal, which we can't assume.

-Bruce

irchans
03-05-2004, 11:29 AM
Bruce,

I don't have much time right now, so I will only comment on the first sentence in your post. I will read the rest later this weekend.

[ QUOTE ]
I think the units of standard deviation of win rate should be BB/hr rather than BB/sqrt(hr).

[/ QUOTE ]

I have seen several threads on this topic over the years both on 2+2 and on rec.gambling.poker. Certainly you would think that the standard deviation of a random variable X would have the same units as X. That would suggest that SD has units of BB/hr. On the other hand, here are three formulas that suggest SD has units of BB/Sqrt[hr]

risk_of_ruin = Exp[ - 2 * winrate * bankroll / SD^2 ]

confidence_interval_size = SD * Sqrt[hours_of_play].

ML_estimate_of_SD_from_two_sessions = Abs[ winrate1- winrate2]/Sqrt[2]/Sqrt[hours1 + hours2].

In truth, I really don't know what the correct units are.

Cheers,
Irchans

BruceZ
03-05-2004, 12:18 PM
[ QUOTE ]
In truth, I really don't know what the correct units are.

[/ QUOTE ]

I find that hard to believe. There is no (valid) controversy here. The SD of total win has units of bb/sqrt(hr) when the total win is taken to be a function of time because it increases as the square root of the number of hours played. This is the SD you are using in the examples you gave for the risk of ruin and the confidence interval. However, we were talking about the prior distribution of the win RATE, and the standard deviation of this has units of bb/hr, the same as win rate itself. When you estimate your win RATE, the standard deviation of the win rate is called the "standard error" which is SD/sqrt(N), and the size of the confidence interval for the win RATE is SD/sqrt(N). This comes from the confidence interval you gave divided by N, SD*sqrt(N)/N = SD/sqrt(N). This has units of bb/hr since SD has units of bb/sqrt(hr).

irchans
03-10-2004, 12:02 PM
Hi Bruce,

I found the time to check your math and your formulas agree with my calculations.

WinRateestimate = WinRateObserved / [ 1 + SDestimate^2/(4*hours) ]

WinRateestimate = (sigma^2*u + nv^2*WinRateObserved) / (sigma^2 + nv^2)


Thanks also for the post on units for standard deviation. I was a bit confused about that.

Cheers,
Irchans

Saborion
03-11-2004, 11:16 PM
Where do you get 1.96 from?

bigpooch
03-12-2004, 01:25 AM
That's just a z-value you can get from a table at the back
of almost any stats book. It's the z for which the area
under the standard normal curve between -z and +z is 95% of
the total area (which is just one). The standard normal
density is given by

f(x) = (2 x pi)^-(1/2) x exp(-(z^2)/2)

and the integral over the entire real line is 1.

If you want the right hand tail and left hand tail to have
a total of only 5% of the area, by symmetry, the right hand
tail should only have area 0.025 and so the integral of
the distribution (from -infinity to z) should be 0.975 and
if you look for this in the table, it should yield 1.96.
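
Instead of a printed table, the same lookup can be done with Python's standard library. This is a minimal sketch: `NormalDist` and its `inv_cdf` method are part of the `statistics` module (Python 3.8 or later), while the helper name `z_for_central_area` is made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def z_for_central_area(area):
    """Return z such that the area between -z and +z equals `area`."""
    tail = (1.0 - area) / 2.0  # area left over in each tail
    # The integral from -infinity to z must equal 1 - tail (0.975 for 95%).
    return standard_normal.inv_cdf(1.0 - tail)

for level in (0.68, 0.80, 0.90, 0.95):
    print(f"{level:.0%} -> z = {z_for_central_area(level):.3f}")
```

For the 95% level this gives 1.960, the value quoted earlier in the thread, and the other levels give the more precise versions of the rounded 1, 1.3, and 1.6 multipliers.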

irchans
03-12-2004, 09:26 AM
f(z) = (2 x pi)^-(1/2) x exp(-(z^2)/2)

where x represents multiplication.