standard deviation calculation and evaluation - Weitzman eqn?

Bubu
11-07-2003, 06:17 PM
Hi,

-What is the difference between the "standard textbook" standard deviation equation (<x^2> - <x>^2) and the Mark Weitzman one that Malmuth has posted in the essay section?

-The confidence level of your ev is dependent on your sd, correct? I am wondering if I have enough data to get meaningful results after a recent switch to a new limit. Currently I have 14k recorded hands at this limit under homogeneous conditions over a run of 7 weeks. My sd/ev is less than 5, which seems too good to be true. (My sd is 16 bb/hr using Malmuth's formula. I usually play shorthanded; should this matter?)

(Anecdotally, before I moved full time to this limit and started keeping records, I had a couple of bad weeks when I mixed up my limits, "taking a shot at it". This made me question my game. Maybe as a result of these bad experiences I'm playing more seriously and purposefully, but these results seem too skewed to the positive side. I do want to evaluate my game and not fool myself here. Can this just be a fluke?)

BTW, this is online play, so I am planning on updating my spreadsheet to calculate the sd in terms of bb/60 hands (approximately one table hour). Is this the preferred method?

Thank you very very much. I am a little confused and really impressed by the posts I've seen in this forum.

Bubu

BruceZ
11-07-2003, 06:58 PM
What is the difference between the "standard textbook" standard deviation equation (<x^2> - <x>^2) and the Mark Weitzman one that Malmuth has posted in the essay section?

Mason's formula can be used for sessions with a variable number of hours. Like the one you learned in school, it computes the maximum likelihood estimate of the standard deviation. This means that it gives the standard deviation under which your results are most likely. Note that this is not the same as saying that it is the most likely standard deviation: it is the likelihood of the results that is maximized, not the likelihood of the standard deviation. I was able to verify that this formula is indeed the maximum likelihood estimator for variable-length sessions. If you want, I can post a derivation.
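A minimal sketch of such a variable-length estimator, assuming each session's result x_i (in big bets) over h_i hours is independent and normal with mean w*h_i and variance sd^2*h_i. Under that assumption the maximum likelihood estimates are w = sum(x)/sum(h) and sd^2 = (sum(x_i^2/h_i) - (sum x)^2/sum(h))/N. That this matches the exact formula in the essay is an assumption; the function name and the sample sessions are hypothetical.

```python
import math
from typing import Sequence, Tuple


def ml_winrate_and_sd(results: Sequence[float], hours: Sequence[float]) -> Tuple[float, float]:
    """Maximum likelihood win rate and hourly SD for sessions of varying length.

    Assumed model (hypothetical, not taken from the essay):
    results[i] ~ Normal(w * hours[i], sd**2 * hours[i]), sessions independent.
    """
    if len(results) != len(hours) or not results:
        raise ValueError("need one length per result and at least one session")
    n = len(results)
    total_result = sum(results)
    total_hours = sum(hours)
    win_rate = total_result / total_hours  # overall win per hour
    # sum(x^2 / h) - (sum x)^2 / sum(h), divided by N (no N-1 correction)
    variance = (sum(x * x / h for x, h in zip(results, hours))
                - total_result ** 2 / total_hours) / n
    # Guard against tiny negative values from floating-point rounding
    return win_rate, math.sqrt(max(variance, 0.0))


# Hypothetical sessions: (big bets won, hours played)
sessions = [(12.0, 3.0), (-20.0, 4.5), (35.0, 6.0), (-4.0, 1.5)]
rate, sd = ml_winrate_and_sd([s[0] for s in sessions], [s[1] for s in sessions])
print(f"win rate {rate:.2f} bb/hr, SD {sd:.2f} bb/sqrt(hr)")
```

With every session the same length, this reduces to the textbook population SD of the session results, scaled to one hour.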

(<x^2> - <x>^2)

This is the variance. You have to take its square root to get the standard deviation. This identity is also used to derive Mason's formula.
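One way the derivation can go, sketched under the assumption (not stated in the thread) that session results x_i are independent and normal, with mean w h_i and variance \sigma^2 h_i for a session of h_i hours. It is the standard maximum likelihood calculation, and expanding the square in the last step is where the <x^2> - <x>^2 style identity enters.

```latex
% Assumed model: x_i ~ Normal(w h_i, \sigma^2 h_i), i = 1..N, independent
\ln L(w,\sigma^2) = -\frac{N}{2}\ln\left(2\pi\sigma^2\right)
  - \frac{1}{2}\sum_{i=1}^{N}\ln h_i
  - \sum_{i=1}^{N}\frac{(x_i - w h_i)^2}{2\sigma^2 h_i}

\frac{\partial \ln L}{\partial w} = 0
  \;\Rightarrow\; \hat{w} = \frac{\sum_i x_i}{\sum_i h_i}

\frac{\partial \ln L}{\partial \sigma^2} = 0
  \;\Rightarrow\; \hat{\sigma}^2
  = \frac{1}{N}\sum_{i=1}^{N}\frac{(x_i - \hat{w} h_i)^2}{h_i}
  = \frac{1}{N}\left[\sum_{i=1}^{N}\frac{x_i^2}{h_i}
    - \frac{\left(\sum_i x_i\right)^2}{\sum_i h_i}\right]
```

In the last step the cross term -2\hat{w}\sum_i x_i and the term \hat{w}^2\sum_i h_i combine into -(\sum_i x_i)^2/\sum_i h_i. With every h_i = 1 the result is exactly <x^2> - <x>^2.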

The confidence level of your ev is dependent on your sd, correct?

Yes.

I am wondering if I have enough data to get meaningful results after a recent switch to a new limit. Currently I have 14k recorded hands at this limit under homogeneous conditions over a run of 7 weeks.

You can always construct confidence intervals for the conditions under which you played. If the number of sessions is too small, your standard deviation estimate may not be accurate. You can account for this with a t-test. To do this, the number of standard deviations you use for a given confidence interval after N sessions should come from the t-distribution with N-1 degrees of freedom, instead of from the standard normal distribution. You can use the Excel function =TINV(1-p,N), where p is the confidence. So for 90% confidence after 20 sessions, you would use TINV(0.9,20) = 1.72. You would then use 1.72 standard deviations. Once you have a large number of sessions, this becomes 1.65, the same as if you used a normal distribution. In addition, multiply your standard deviation by a factor of sqrt(N/(N-1)), which converts the maximum likelihood estimate (dividing by N) into the version that divides by N-1. None of this will make a large difference to your confidence interval once you have enough sessions (40-100 or so).
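A minimal sketch of the interval described above, assuming results in big bets and an hourly SD from a maximum likelihood estimator such as Mason's formula. The standard error of the win rate is the hourly SD divided by the square root of total hours played. The function name and the numbers are hypothetical, and the multiplier is passed in rather than computed, so the caller supplies the t-value (or normal value) for the chosen confidence, such as the 1.72 from the 90%, 20-session example.

```python
import math
from typing import Tuple


def winrate_confidence_interval(win_rate: float, sd_per_hour: float,
                                total_hours: float, n_sessions: int,
                                multiplier: float) -> Tuple[float, float]:
    """Confidence interval for an hourly win rate.

    multiplier: number of standard errors, e.g. a t-value for the desired
    confidence, or about 1.645 (90%) / 1.96 (95%) with many sessions.
    """
    if n_sessions < 2:
        raise ValueError("need at least two sessions for the N/(N-1) correction")
    # Convert a divide-by-N SD estimate to the divide-by-(N-1) version
    corrected_sd = sd_per_hour * math.sqrt(n_sessions / (n_sessions - 1))
    standard_error = corrected_sd / math.sqrt(total_hours)  # SD of the win rate
    half_width = multiplier * standard_error
    return win_rate - half_width, win_rate + half_width


# Hypothetical: 3 bb/hr, SD 16 bb/sqrt(hr), 256 hours over 20 sessions, 90%
low, high = winrate_confidence_interval(3.0, 16.0, 256.0, 20, 1.72)
print(f"90% interval: {low:.2f} to {high:.2f} bb/hr")
```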

BTW, this is online play, so I am planning on updating my spreadsheet to calculate the sd in terms of bb/60 hands (approximately one table hour). Is this the preferred method?

It depends on what you do with it. You can compute it for any length of time or number of hands. It can then be used to compute your swings over any period of time, the accuracy of your EV for any unit of time, and your risk of ruin, as long as you are consistent in units. Just remember that it actually has units of $/sqrt(hands) or $/sqrt(hours).
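A small sketch of that unit bookkeeping, assuming results are independent from one period to the next: the expected result scales with the number of periods and the SD with its square root, which is why the SD carries units of bb/sqrt(period) (big bets here; dollars work the same way). The function names and figures are hypothetical.

```python
import math
from typing import Tuple


def totals_over(units: float, ev_per_unit: float, sd_per_unit: float) -> Tuple[float, float]:
    """Expected total and SD of the total after `units` periods.

    EV grows linearly with the number of periods; SD grows with its square root.
    """
    return ev_per_unit * units, sd_per_unit * math.sqrt(units)


def winrate_sd(units: float, sd_per_unit: float) -> float:
    """SD of the observed average win rate (per period) after `units` periods."""
    return sd_per_unit / math.sqrt(units)


# Hypothetical figures per 60-hand unit: EV 1 bb, SD 16 bb
ev_total, sd_total = totals_over(100, 1.0, 16.0)  # 100 units = 6,000 hands
print(f"after 6,000 hands: {ev_total:.0f} bb expected, SD {sd_total:.0f} bb")
print(f"SD of measured win rate: {winrate_sd(100, 16.0):.2f} bb per 60 hands")
```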

BillC
11-07-2003, 07:43 PM
I always use the version where you divide by n-1, and not by n as in GTOT. I believe my version is the unbiased, variance-minimizing estimator, instead of the maximum likelihood one. Of course, even though it doesn't matter in the long run which one you use, I managed to roil my blackjack team with the slightly different formula.

BillC

BillC
11-07-2003, 07:48 PM
It is not as simple as doing a t-interval (or z-interval), is it? The estimate must depend on the time intervals in the sample. I look at it more as a regression problem, where you are estimating the slope.
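One way to make the regression view concrete, assuming the variance of a session's result grows in proportion to its length: fitting results against hours with no intercept by weighted least squares, with weights 1/h_i, gives a slope equal to total result divided by total hours, the same win rate as the maximum likelihood estimate under that model. The function name and data are hypothetical.

```python
from typing import Sequence


def wls_slope_through_origin(hours: Sequence[float], results: Sequence[float]) -> float:
    """Weighted least squares slope for results ~ slope * hours (no intercept).

    Weights are 1/hours, i.e. the assumed variance of a session result is
    proportional to its length.
    """
    weights = [1.0 / h for h in hours]
    # Minimizing sum(w * (x - b*h)^2) gives b = sum(w*h*x) / sum(w*h^2)
    numerator = sum(w * h * x for w, h, x in zip(weights, hours, results))
    denominator = sum(w * h * h for w, h in zip(weights, hours))
    return numerator / denominator


# Hypothetical sessions
hours = [3.0, 4.5, 6.0, 1.5]
results = [12.0, -20.0, 35.0, -4.0]
print(wls_slope_through_origin(hours, results), sum(results) / sum(hours))
```

Dropping the weights (ordinary least squares) would instead give sum(h*x)/sum(h^2), which leans more heavily on the long sessions.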

BruceZ
11-07-2003, 07:52 PM
You can use the Excel function =TINV(1-p,N) where p is the confidence. So for 90% confidence after 20 sessions, you would use TINV(0.9,20) = 1.72.

Sorry, make that TINV(0.1,20) = 1.72.

I just noticed that you have 14K hands, which is 233 "sessions" of 60 hands. You shouldn't need to use the t-test with this many sessions, as any inaccuracy in your SD should be small enough to not affect your confidence interval. If you have results of each hand, then you can even use the simplified formula for SD that you used in school, since you can group your results into sessions of equal length. It will give the same result.
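A minimal sketch of the grouping step, assuming a list of per-hand results in big bets. It cuts the hands into 60-hand blocks and applies the textbook <x^2> - <x>^2 form to the block totals. The function name and the choice to drop a trailing partial block are assumptions; with 14,000 hands this gives 233 blocks, with the last 20 hands left out.

```python
import math
from typing import List, Sequence


def sd_per_block(hand_results: Sequence[float], block_size: int = 60) -> float:
    """SD of results per block of `block_size` hands (population form, divide by N).

    Any incomplete block at the end is dropped so every "session" is the same length.
    """
    n_blocks = len(hand_results) // block_size
    if n_blocks == 0:
        raise ValueError("not enough hands for one block")
    totals: List[float] = [sum(hand_results[i * block_size:(i + 1) * block_size])
                           for i in range(n_blocks)]
    mean = sum(totals) / n_blocks
    variance = sum(t * t for t in totals) / n_blocks - mean * mean  # <x^2> - <x>^2
    # Guard against tiny negative values from floating-point rounding
    return math.sqrt(max(variance, 0.0))
```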

Bubu
11-07-2003, 08:27 PM
I would appreciate the derivation of the Weitzman formula, though if it's too much bother, don't worry. I am mathematically inclined, though quite ignorant of statistics and such, especially in the practical aspects of gaming and money management. Could you recommend a book?

Thanks to your information I updated my spreadsheet. I guess my win rate is 3.75 +/- 2.63 bb/hr at the 95% confidence level. That's very comforting: it's quite likely that I'm winning (though I still don't know how...). Still, this only covers 24 sessions tallying some 15 thousand hands. I am not sure how significant this number is, despite the t-tests and all. Only time will tell ;-)

Thank you,

Bubu

BruceZ
11-07-2003, 08:34 PM
I always use the version where you divide by n-1, and not by n as in GTOT. I believe my version is the unbiased, variance-minimizing estimator, instead of the maximum likelihood one.

Dividing by n-1 gives the unbiased estimate of the variance, but taking the square root of this does not give the unbiased estimate of the standard deviation; that requires a more complicated formula. Dividing by n+1 minimizes the mean squared error of the variance estimate (for normally distributed data).
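To see this tradeoff numerically: for n independent normal observations, the sum of squared deviations is sigma^2 times a chi-square variable with n-1 degrees of freedom, so the mean squared error of sum((x - mean)^2)/d has a closed form. The sketch below assumes normality and a hypothetical 20 sessions, and compares d = n-1, n, and n+1: the n-1 version has zero bias but the largest MSE, and n+1 has the smallest.

```python
def relative_mse(n: int, divisor: float) -> float:
    """MSE of sum((x - mean)^2) / divisor as an estimate of sigma^2, in units of sigma^4.

    Assumes n independent normal observations, so the sum of squares is
    sigma^2 times a chi-square variable with n - 1 degrees of freedom
    (mean n - 1, variance 2 * (n - 1)).
    """
    k = n - 1
    bias = k / divisor - 1.0           # (expected estimate / sigma^2) - 1
    variance = 2.0 * k / divisor ** 2  # variance of the estimate / sigma^4
    return variance + bias ** 2


n = 20
for label, d in (("n-1 (unbiased)", n - 1), ("n (max likelihood)", n), ("n+1 (min MSE)", n + 1)):
    print(f"{label:20s} MSE = {relative_mse(n, d):.4f} sigma^4")
```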

BruceZ
11-07-2003, 08:38 PM
Sorry, make that TINV(0.1,20) = 1.72.

Argh, still not right. TINV(0.1,19) = 1.72.

For 20 sessions, you use 19 degrees of freedom. Note that this may differ from the way you would use some published tables of the t-distribution. Depending on what is being tabulated, you may need 0.05, 0.9, or 0.95. Excel gives a 2-sided probability of being outside the range, so for 90% confidence, you use 10%, or 0.1.
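For readers without Excel, here is a self-contained sketch that follows the same two-sided convention: it integrates the Student t density numerically with Simpson's rule and bisects for the cutoff whose two-sided tail probability equals alpha. The function names are hypothetical, and the simple numeric integration stands in for the library routines a statistics package would use (newer Excel versions also offer T.INV.2T with the same arguments). With alpha = 0.1 and 19 degrees of freedom it should come out near 1.729, matching the 1.72 above.

```python
import math


def t_pdf(t: float, df: int) -> float:
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))


def two_sided_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 < T < t)
    return 1.0 - 2.0 * central


def tinv(alpha: float, df: int) -> float:
    """Value t with P(|T| > t) = alpha, i.e. the two-sided convention of TINV."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be between 0 and 1")
    lo, hi = 0.0, 1.0
    while two_sided_tail(hi, df) > alpha:  # widen until the tail is small enough
        hi *= 2
    for _ in range(50):  # bisection: the tail probability falls as t grows
        mid = (lo + hi) / 2
        if two_sided_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(f"90% confidence, 20 sessions: {tinv(0.1, 19):.3f} standard deviations")
```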