confidence intervals? - Page 2

BruceZ · #11 03-04-2004, 05:50 AM

Hans,

I would think a normal a priori distribution of win rate with a standard deviation of 2 BB per sqrt hour and mean of 0 BB per hour would be reasonable.

I think the units of standard deviation of win rate should be BB/hr rather than BB/sqrt(hr). This is the standard deviation of the prior distribution which reflects our uncertainty in the value of the true win rate, so it should have the same units as the win rate. Note that standard error, which is the standard deviation of win rate, is SD/sqrt(n) and has units of BB/hr also since SD has units of BB/sqrt(hr).

Then the ML estimator for win rate would be something like

estimate = WinRateObserved / ( 1 + 2*SDestimate/Sqrt[hours])

I think you mean this to be the mean of the posterior distribution, right? There is no maximum likelihood estimator in the context of Bayesian estimation. We don't maximize the likelihood function; instead we multiply the likelihood function by the prior distribution and normalize this to get the posterior distribution. Then for a normal posterior distribution, the mean of the posterior distribution minimizes either the mean square loss function or the absolute value loss function. For your prior distribution this would give:

estimate = WinRateObserved / [ 1 + SDestimate^2/(4*hours) ]

This also gives the correct units.

In general,

estimate = (sigma^2*u + nv^2*WinRateObserved) / (sigma^2 + nv^2)

Where u is the mean of the prior distribution, v^2 is the variance of the prior distribution, sigma is the standard deviation of results in BB/sqrt(hr) (SDestimate above), and n is the number of hours (from DeGroot).
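
A minimal sketch of the general formula in Python, for anyone who wants to plug in numbers. The function name `posterior_mean_winrate` and the sample figures (1.5 BB/hr observed, SD of 10 BB/sqrt(hr), 100 hours) are made up for illustration, and sigma is assumed to be the SD of results in BB/sqrt(hr). With u = 0 and v = 2 it should agree with the special-case formula above.

```python
def posterior_mean_winrate(observed, sigma, hours, prior_mean=0.0, prior_sd=2.0):
    """Posterior mean of the true win rate for a normal prior.

    observed   : observed win rate in BB/hr
    sigma      : SD of results in BB/sqrt(hr) (assumed known)
    hours      : hours played (n)
    prior_mean : u, mean of the prior in BB/hr
    prior_sd   : v, SD of the prior in BB/hr
    """
    prior_var = prior_sd ** 2
    numerator = sigma ** 2 * prior_mean + hours * prior_var * observed
    denominator = sigma ** 2 + hours * prior_var
    return numerator / denominator


# Hypothetical numbers, not taken from the thread
observed, sigma, hours = 1.5, 10.0, 100.0

general = posterior_mean_winrate(observed, sigma, hours)
special = observed / (1 + sigma ** 2 / (4 * hours))  # u = 0, v = 2 case
print(general, special)
```

The estimate is pulled from the observed rate toward the prior mean of 0, and the pull gets weaker as the hours grow.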

If instead we chose a uniform distribution say between -1000 BB per hour and +1000 BB per hour, then I wonder if we could then say that there is a 95% probability that the win rate is within 2 standard errors of the observed win rate.

Yes, I think that should give a posterior distribution which is just the likelihood function which is normal, and the mean and standard deviation of this function would be the same as the maximum likelihood estimates.
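
A rough numerical check of this, as a sketch only. It uses a very wide normal prior as a stand-in for the uniform prior, which is an assumption made for convenience, and the helper name `posterior_mean_sd` and the sample numbers are hypothetical. As the prior gets wider, the posterior mean should approach the observed win rate and the posterior SD should approach the standard error SD/sqrt(n).

```python
import math


def posterior_mean_sd(observed, sigma, hours, prior_mean, prior_sd):
    """Mean and SD of the posterior for a normal prior and normal results.

    Precisions (1/variance) of the prior and of the data simply add.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = hours / sigma ** 2  # 1 / (standard error)^2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * observed)
    return post_mean, math.sqrt(post_var)


# Hypothetical numbers, not taken from the thread
observed, sigma, hours = 1.5, 10.0, 100.0
std_error = sigma / math.sqrt(hours)  # BB/hr

for prior_sd in (2.0, 20.0, 2000.0):
    mean, sd = posterior_mean_sd(observed, sigma, hours, 0.0, prior_sd)
    print(f"prior SD {prior_sd:7.1f}: mean {mean:.4f}, SD {sd:.4f}, std error {std_error:.4f}")
```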

I think there must be a word or two wrong in those statements. If your win rate was 100 bb/hr, then you would achieve those "results or better." Am I missing something here?

See above post for the correction. By "better" I meant farther from the actual win rate, but the probability should be 2.5% or greater. Since I was explaining that the confidence interval pertains to the probability of results, I was trying to explain the 95% in terms of this interpretation, but the easiest way to explain the 95% is that this is the probability that your true win rate will lie in this interval before you play. The sentence in the correction where I take 2*2.5% to get 5% really only makes sense before you play. After you play, you cannot integrate the tails this way because that would assume the probability of each win rate is equal, which we can't assume.

-Bruce

irchans · #12 03-05-2004, 11:29 AM

Bruce,

I don't have much time right now, so I will only comment on the first sentence in your post. I will read the rest later this weekend.

I think the units of standard deviation of win rate should be BB/hr rather than BB/sqrt(hr).

I have seen several threads on this topic over the years both on 2+2 and on rec.gambling.poker. Certainly you would think that the standard deviation of a random variable X would have the same units as X. That would suggest that SD has units of BB/hr. On the other hand, here are three formulas that suggest SD has units of BB/Sqrt[hr]:

risk_of_ruin = Exp[ - 2 * winrate * bankroll / SD^2 ]

confidence_interval_size = SD * Sqrt[hours_of_play].

ML_estimate_of_SD_from_two_sessions = Abs[ winrate1- winrate2]/Sqrt[2]/Sqrt[hours1 + hours2].

In truth, I really don't know what the correct units are.

Cheers,
Irchans

BruceZ · #13 03-05-2004, 12:18 PM

In truth, I really don't know what the correct units are.

I find that hard to believe. There is no (valid) controversy here. The SD of total win has units of bb/sqrt(hr) when the total win is taken to be a function of time because it increases as the square root of the number of hours played. This is the SD you are using in the examples you gave for the risk of ruin and the confidence interval. However, we were talking about the prior distribution of the win RATE, and the standard deviation of this has units of bb/hr the same as win rate itself. When you estimate your win RATE, the standard deviation of the win rate is called the "standard error" which is SD/sqrt(N), and the size of the confidence interval for the win RATE is SD/sqrt(N). This comes from the confidence interval you gave divided by N, SD*sqrt(N)/N = SD/sqrt(N). This has units of bb/hr since SD has units of bb/sqrt(hr).
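
The scaling can be sketched in a few lines of Python. The function names and the SD of 10 bb/sqrt(hr) are hypothetical, chosen only to show that the spread of the total win grows like sqrt(N) while the spread of the win rate shrinks like 1/sqrt(N).

```python
import math


def total_win_sd(sd_per_sqrt_hour, hours):
    # SD of the total win, in bb: grows as the square root of hours played
    return sd_per_sqrt_hour * math.sqrt(hours)


def winrate_standard_error(sd_per_sqrt_hour, hours):
    # SD of the observed win rate, in bb/hr: total-win SD divided by hours,
    # i.e. SD * sqrt(N) / N = SD / sqrt(N)
    return total_win_sd(sd_per_sqrt_hour, hours) / hours


sd = 10.0  # hypothetical SD in bb/sqrt(hr)
for hours in (100, 400, 1600):
    print(hours, total_win_sd(sd, hours), winrate_standard_error(sd, hours))
```

Quadrupling the hours doubles the first number and halves the second.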

irchans · #14 03-10-2004, 12:02 PM

Hi Bruce,

I found the time to check your math and your formulas agree with my calculations.

WinRateestimate = WinRateObserved / [ 1 + SDestimate^2/(4*hours) ]

WinRateestimate = (sigma^2*u + nv^2*WinRateObserved) / (sigma^2 + nv^2)

Thanks also for the post on units for standard deviation. I was a bit confused about that.

Cheers,
Irchans

Saborion · #15 03-11-2004, 11:16 PM

Where do you get 1.96 from?

bigpooch · #16 03-12-2004, 01:25 AM

That's just a z-value you can get from a table at the back
of almost any stats book. It's the z for which the area
under the standard normal curve between -z and +z is 95% of
the total area (which is just one). The standard normal
distribution is given by

f(x) = (2 x pi)^-(1/2) x exp(-(z^2)/2)

and the integral over the entire real line is 1.

If you want the right hand tail and left hand tail to have
a total of only 5% of the area, by symmetry, the right hand
tail should only have area 0.025 and so the integral of
the distribution (from -infinity to z) should be 0.975 and
if you look for this in the table, it should yield 1.96.
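
If you don't have a table handy, the same lookup can be done with Python's standard library. This is only a sketch; the variable names are arbitrary. `NormalDist.inv_cdf(0.975)` returns the z whose left-hand area is 0.975, which should come out at about 1.96.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
std_normal = NormalDist(mu=0.0, sigma=1.0)

# z such that the integral from -infinity to z is 0.975
z = std_normal.inv_cdf(0.975)

# Area between -z and +z, which should be about 0.95
central_area = std_normal.cdf(z) - std_normal.cdf(-z)

print(round(z, 2), round(central_area, 4))
```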

irchans · #17 03-12-2004, 09:26 AM

f(z) = (2 x pi)^-(1/2) x exp(-(z^2)/2)

where x represents multiplication.