
Standard Deviation 24.4 BB/100


DougOzzzz
12-10-2004, 11:27 PM
Over a pretty small sample size of 6000 hands, I have a standard deviation of 24.4 BB/100. I checked a previous thread and saw that 15 is about normal, and there seems to be a very small "standard deviation" for standard deviation. Hah. About 1/3 of my hands have been at 6-max tables... I am not loose (18% VP$IP at 10-handed tables), nor super aggressive (2.5 postflop aggression). Any other reasons my standard deviation is so high? Is it just a result of the small sample size? Oddly enough, my daily win rate seems to have a very low SD...

How valid is a 6000-hand history anyway? If I am at X BB/100 with an SD of Y after Z hands, what's the standard deviation for the "actual" BB/100 for that player?

Thanks for any help...

uuDevil
12-11-2004, 02:51 AM
That number is quite large. What limit do you play? Is this Hold'em?

You definitely should separate out your shorthanded results. (If you have Pokertracker this is easy to do. Just uncheck the 6-max tables under the preferences tab.) The combination of very loose and very aggressive play by either you or your opponents could account for this. How often do you see a showdown?

Have a look at these threads:

Small Stakes SD (http://forumserver.twoplustwo.com/showthreaded.php?Cat=&Number=1362524&page=&view=&sb=5&o=&vc=1)

Micro SD (http://forumserver.twoplustwo.com/showthreaded.php?Cat=&Number=1362540&page=&view=&sb=5&o=&vc=1)

Shorthanded SD (http://forumserver.twoplustwo.com/showthreaded.php?Cat=&Number=1362508&page=2&view=collapsed&sb=5&o=7&vc=1)

DougOzzzz
12-11-2004, 08:51 AM
Thanks for the reply... even though half the thread probably belongs on another forum. Taking out the 6-max games (which have an SD of >27) lowers the number a little, but it's still much higher than anyone else's.

Compared to my opponents (at .5/1, 1/2 (most games), 2/4, and 3/6), my went-to-showdown percentage is a little low (32.4% vs. a 34.4% average), while my W$SD% is a little high (50.7 vs. 47.7).

The sample size is still small... maybe it's just a freak occurrence.

jason1990
12-11-2004, 12:45 PM
I intend to write an article in the near future titled, "How Accurate is my SD?" It will be for my own personal use and I will put it on my website (as soon as I finish building it), but I doubt it will appeal to a large audience since I intend it to be fairly "math heavy". But at any rate, my claim is that your SD is not very accurate after only 6000 hands. Here's some initial computations. Everything below is approximate and I will make it exact in the article.

Let n = 6000 and X_1, ..., X_n denote the results of the n hands that you played. Let X denote some future hand. We want to estimate sig := sqrt{E|X|^2}. (Note that 10sig is your true SD in BB/100.) There are a couple of ways to do this.

First, you could consider aggregate sums. For example, let Y_j = X_{100(j-1)+1} + ... + X_{100j}, so that Y_1, ..., Y_{60} are the net results of each block of 100 hands. It is probably a relatively safe assumption that each Y_j is a Gaussian random variable. (To be safer, you could consider blocks of 150 or 200. This will still give you at least 30 data points, which, according to Mason in an article I can no longer find a link to, should be enough.) We then compute

bar{Y} = 60^{-1}sum_{j=1}^{60}{ Y_j }
S^2 = 60^{-1}sum_{j=1}^{60}{ (Y_j - bar{Y})^2 }

(Note that S is your empirical SD in BB/100. In your case, S = 24.4.) Standard techniques using the tails of the chi-square distribution give that [0.72S^2,1.48S^2] is a 95% confidence interval for (10sig)^2. In other words, a 95% confidence interval for your true SD in BB/100 is [20.7,29.7].
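For anyone who wants to try this on their own database, here is a rough Python sketch of the block method. The 0.72/1.48 chi-square factors are the ones quoted above (appropriate for roughly 60 blocks); `hand_results` is a hypothetical list of per-hand results in BB:

```python
import math

# 95% chi-square factors for the variance, as quoted above (~59 df, ~60 blocks)
CHI2_FACTORS_95 = (0.72, 1.48)

def block_sd(hand_results, block=100):
    """Empirical SD in BB/100 from per-hand results (in BB), via block sums."""
    m = len(hand_results) // block
    ys = [sum(hand_results[i * block:(i + 1) * block]) for i in range(m)]
    ybar = sum(ys) / m
    s2 = sum((y - ybar) ** 2 for y in ys) / m
    return math.sqrt(s2) * math.sqrt(100 / block)  # rescale to per-100-hands

def sd_confidence_interval(s_bb100, factors=CHI2_FACTORS_95):
    """Approximate 95% CI for the true SD in BB/100, given the block SD."""
    lo, hi = factors
    return math.sqrt(lo) * s_bb100, math.sqrt(hi) * s_bb100
```

With the figure from this thread, sd_confidence_interval(24.4) gives roughly (20.7, 29.7), matching the interval above.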

A second method of estimating your SD is to use individual hand data (which is probably what Poker Tracker does). In this case, you compute

bar{X} = n^{-1}sum_{j=1}^n{ X_j }
S^2 = n^{-1}sum_{j=1}^n{ (X_j - bar{X})^2 }.

Unfortunately, you can no longer use the standard methods for estimating the accuracy of S^2 since X is not Gaussian. To get a confidence interval for your true SD in this case, you must have an estimate on E|X|^4. More precisely, if we write E|X|^4 = Csig^4, then we want to estimate C. For a Gaussian random variable, C = 3. Unfortunately, with only 6000 hands, you will not be able to get an accurate estimate on the tail behavior of X, and this will result in a horrible estimate for C. For example, suppose the largest net win you had over these 6000 hands is 30 BB. If we set p = P(|X| > 30), then a 95% confidence interval for p is given by [0,1-.05^{1/n}] = [0,.0005]. Since the largest win possible is 108 BB, this gives us the estimate of

E[|X|^4 1_{|X|>30}] <= .0005(108^4) = 68000.

Combined with the rest of the data, our estimate of E|X|^4 will likely exceed 100,000. If we observe that sig^4 is certainly less than 100, we see that we will get an estimate for C of more than 1000, which is way too large to be effective.
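The arithmetic behind that bound is easy to check (all the constants below come from the post itself):

```python
n = 6000
# 95% upper confidence bound on p = P(|X| > 30):
# P(max over n hands <= 30) = (1 - p)^n >= 0.05  =>  p <= 1 - 0.05^(1/n)
p_upper = 1 - 0.05 ** (1 / n)       # about 0.0005
# worst case puts all of that tail mass at the 108 BB maximum win:
tail_bound = p_upper * 108 ** 4     # about 68,000
```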

An alternative way to estimate C is to use the assumption that Y_1 is approximately Gaussian. This means (with all equalities being approximate)

E|Y_1|^4 = 100 E|X|^4
= 3(E|Y_1|^2)^2
= 3(100 E|X|^2)^2

which gives C = 300. If we now assume S^2 is approximately Gaussian, then its mean is approximately sig^2 and its variance is approximately

n^{-2}sum_{j=1}^n{ E|X|^4 }
= n^{-2}sum_{j=1}^n{ 300sig^4 }
= 300sig^4/n
= sig^4/20.

Hence, the standard deviation of S^2 is sig^2/sqrt{20}. In your case, S = 2.44, so sig^2/sqrt{20} is roughly 1.33. Hence, a 95% confidence interval for S^2 is [5.95 - 1.96(1.33), 5.95 + 1.96(1.33)] = [3.34,8.56]. Taking square roots and converting to BB/100 gives a 95% confidence interval for your true SD of [18.3,29.3]. This is not very different from the previous confidence interval. This is no surprise, since the computations were founded on the same assumption; namely, that X_1 + ... + X_{100} is Gaussian. Only by having a better estimate of E|X|^4 that comes directly from the data can we hope to improve on this. And we can only get such an estimate by controlling the tail behavior. And we can only control the tail behavior by playing many, many more hands. Exactly how large a sample size we need is something I intend to address later.
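The same computation in code, using the per-hand variance, the usual 1.96 multiplier for a 95% interval, and the heuristic value C = 300 derived above:

```python
import math

def per_hand_sd_ci(s_bb100, n=6000, c=300, z=1.96):
    """Approximate 95% CI for the true SD in BB/100 from the per-hand
    estimate, using Var(S^2) ~ C*sig^4/n with the heuristic C = 300."""
    s2 = (s_bb100 / 10) ** 2            # per-hand variance estimate
    sd_of_s2 = s2 * math.sqrt(c / n)    # here: sig^2 / sqrt(20)
    lo = s2 - z * sd_of_s2
    hi = s2 + z * sd_of_s2
    return 10 * math.sqrt(lo), 10 * math.sqrt(hi)
```

Here per_hand_sd_ci(24.4) gives roughly (18.3, 29.3), close to the chi-square interval from the block method.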

The moral of the story is this: you know that your SD is high compared to others on this forum. (It is probably at least 20 BB/100.) However, if you want to do any calculations with your SD, such as computing risk of ruin under the assumption of some given winrate, your calculations will be hopelessly inaccurate.

Note: Don't take these confidence intervals literally. As I said, everything here is approximate.

gaming_mouse
12-11-2004, 03:50 PM
jason,

Interesting stuff. A few points of clarification:

[ QUOTE ]
We want to estimate sig := sqrt{E|X|^2}. (Note that 10sig is your true SD in BB/100.)

[/ QUOTE ]

Don't we want to estimate sqrt(E[(X - bar{X})^2])? That is, why has the mean dropped out of your variance calculation?

[ QUOTE ]

First, you could consider aggregate sums....

We then compute

bar{Y} = 60^{-1}sum_{j=1}^{60}{ Y_j }
S^2 = 60^{-1}sum_{j=1}^{60}{ (Y_j - bar{Y})^2 }


[/ QUOTE ]

This approach is clever, and I like the trick of using the chi-square distribution. However, even with partial sums of 200, our 30 data points are still only approximately normal, whereas the chi-square method assumes exact normality. How do we know that this approximation won't badly affect the accuracy of the chi-square interval? More to the point: how can we get a bound on the overall error?

TIA,
gm

jason1990
12-11-2004, 04:07 PM
[ QUOTE ]
jason,

Interesting stuff. A few points of clarification:

[ QUOTE ]
We want to estimate sig := sqrt{E|X|^2}. (Note that 10sig is your true SD in BB/100.)

[/ QUOTE ]

Don't we want to estimate sqrt(E[(X - bar{X})^2])? That is, why has the mean dropped out of your variance calculation?

[/ QUOTE ]
You're absolutely correct, but as I said, these are all just heuristic approximations which I will make rigorous later. Dropping the mean simplifies the heuristics and since the mean is (presumably) small compared to the variance, dropping it will not change the qualitative nature of the results. For what it's worth, I've used the correct formulas in my own versions of the heuristics, but I didn't want to post it all here. It's fairly complicated.

[ QUOTE ]
[ QUOTE ]

First, you could consider aggregate sums....

We then compute

bar{Y} = 60^{-1}sum_{j=1}^{60}{ Y_j }
S^2 = 60^{-1}sum_{j=1}^{60}{ (Y_j - bar{Y})^2 }


[/ QUOTE ]

This approach is clever, and I like the trick of using the chi-square distribution. However, even with partial sums of 200, our 30 data points are still only approximately normal, whereas the chi-square method assumes exact normality. How do we know that this approximation won't badly affect the accuracy of the chi-square interval? More to the point: how can we get a bound on the overall error?

[/ QUOTE ]
You cannot get a bound on the overall error without an estimate on the tail behavior of X, such as a bound on E|X|^4. And my claim is that a genuine and accurate bound like that is impossible after only 6000 hands. However, the point of the above is that, even if the Y's were *exactly* Gaussian, you still cannot get a 95% confidence interval on your SD whose order of magnitude is smaller than about 10 BB/100.

uuDevil
12-11-2004, 04:10 PM
[ QUOTE ]
I intend to write an article in the near future titled, "How Accurate is my SD?" It will be for my own personal use and I will put it on my website (as soon as I finish building it), but I doubt it will appeal to a large audience since I intend it to be fairly "math heavy".

[/ QUOTE ]
You may want to consider submitting it to the 2+2 internet magazine (they are offering $200/article). The audience may be small, but Mason insists he only cares about quality.

DougOzzzz
12-11-2004, 07:13 PM
I have to admit I am a little lost in the notation here... but I don't see why performing calculations with my SD would be hopelessly inaccurate, given that the SD is likely 20+ and that results taken over sets of 100 hands are probably Gaussian. Seems to me like a Monte Carlo simulation could fairly accurately show some risk-of-ruin type things with that info (I'm sure there's a more elegant mathematical way to do it, but I don't know the mathematics).
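A minimal Monte Carlo sketch of this idea, assuming each 100-hand block's result is Gaussian. The `survive_at` cutoff treats a roll that grows that large as effectively safe, so this slightly underestimates the true infinite-horizon risk; all parameter names here are made up for illustration:

```python
import random

def ruin_prob_mc(bankroll_bb, winrate_bb100, sd_bb100,
                 trials=1000, survive_at=5000, seed=1):
    """Estimate risk of ruin by simulating 100-hand Gaussian blocks until
    the bankroll hits zero or grows large enough to be effectively safe."""
    rng = random.Random(seed)
    ruined = 0
    for _ in range(trials):
        roll = bankroll_bb
        while 0 < roll < survive_at:
            roll += rng.gauss(winrate_bb100, sd_bb100)
        if roll <= 0:
            ruined += 1
    return ruined / trials
```

For a 500 BB roll at 2 BB/100 with an SD of 24.4, ruin_prob_mc(500, 2, 24.4) comes out at a few percent, in the same ballpark as the closed-form e^{-2000/s^2} number discussed in the next reply.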

jason1990
12-11-2004, 09:49 PM
Well, "hopelessly inaccurate" may be an exaggeration. But the best you can say right now is that your SD is somewhere between 20 and 30. So, for example, suppose you have a 500BB bankroll and you assume your winrate is 2BB/100. Then your risk of ruin is e^{-2000/s^2}, where s is your SD. Using s = 20 gives 0.7% and using s = 30 gives 10.8%. So all you can say, in this particular case, is that your risk of ruin is somewhere between 0.7% and 10.8%. That's not a very precise range, but it's the best you can do without a more accurate estimate on your SD.
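That formula, as a one-liner (units: bankroll in BB, winrate and SD per 100 hands):

```python
import math

def risk_of_ruin(bankroll_bb, winrate_bb100, sd_bb100):
    """Risk of ruin = exp(-2 * winrate * bankroll / SD^2); with a 500 BB
    roll and 2 BB/100 this reduces to exp(-2000/s^2) as above."""
    return math.exp(-2 * winrate_bb100 * bankroll_bb / sd_bb100 ** 2)
```

risk_of_ruin(500, 2, 20) is about 0.007 and risk_of_ruin(500, 2, 30) is about 0.108, i.e. the 0.7% and 10.8% quoted above.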

Or, conversely, suppose you assume your winrate is 2BB/100 and you want to know what size bankroll you need to have a risk of ruin of 1%. Well, you can't say for certain, because you don't know your SD. The bankroll you need is given by the formula

b = -(s^2/4)ln(.01)

where s is your SD. Using s = 20 gives b = 461. Using s = 30 gives b = 1036. So all you can say is that you need a bankroll of somewhere between 461BB and 1036BB. Again, not a very precise range, but, granted, it does tell you *something*.
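And the bankroll formula, generalized slightly so the winrate is a parameter (w = 2 recovers the -(s^2/4)ln(.01) above):

```python
import math

def bankroll_needed(sd_bb100, ror=0.01, winrate_bb100=2):
    """Bankroll (in BB) for a target risk of ruin: b = -(s^2/(2w)) * ln(ror)."""
    return -(sd_bb100 ** 2 / (2 * winrate_bb100)) * math.log(ror)
```

bankroll_needed(20) gives about 461 BB and bankroll_needed(30) about 1036 BB, matching the two figures above.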

DougOzzzz
12-11-2004, 10:13 PM
You can do a little better than that, though. First, I'd regress the 24.4 SD towards the mean of 15. After just 6000 hands, and with no obvious anomalies in my play, I think it's pretty safe to say there's a much greater chance that my "true" SD is less than 24.4 than that it's more than 24.4. It's kinda like in baseball, or any sport... a team that wins 60% of its first 100 games is on average more like a "true" 57.1% team, so any future projections should use this number instead of 60%.

Then, I'd use the results of your formula to determine the standard deviation for my standard deviation after 6000 hands.

Then with my regressed starting SD, I'd use Normal Distribution for say each SD within 3 standard deviations (an increment of .1 for each), and calculate the risk of ruin for each, then take the weighted average (with values closer to the guessed SD weighted more heavily, since they would occur more frequently).

Of course it's all meaningless since with only 6000 hands I can only guess what the true win rate is. The same thing could be done for win rate too, though.
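The weighting scheme described above could be sketched like this; the 2.7 BB/100 figure used below for the SD of the SD is purely illustrative, my own rough reading of jason's numbers:

```python
import math

def weighted_ror(sd_hat, sd_of_sd, bankroll_bb=500, winrate_bb100=2, step=0.1):
    """Average risk of ruin over a Normal(sd_hat, sd_of_sd) distribution of
    candidate SDs, in increments of `step` out to 3 standard deviations."""
    total_weight = weighted_sum = 0.0
    k = int(3 * sd_of_sd / step)
    for i in range(-k, k + 1):
        s = sd_hat + i * step
        # unnormalized normal density as the weight for this candidate SD
        weight = math.exp(-0.5 * (i * step / sd_of_sd) ** 2)
        ror = math.exp(-2 * winrate_bb100 * bankroll_bb / s ** 2)
        total_weight += weight
        weighted_sum += weight * ror
    return weighted_sum / total_weight
```

For example, weighted_ror(24.4, 2.7) comes out somewhat above the point estimate exp(-2000/24.4^2) of about 3.5%, since risk of ruin grows faster for high SDs than it shrinks for low ones.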

gaming_mouse
12-11-2004, 10:13 PM
[ QUOTE ]
You cannot get a bound on the overall error without an estimate on the tail behavior of X, such as a bound on E|X|^4. And my claim is that a genuine and accurate bound like that is impossible after only 6000 hands.

[/ QUOTE ]

jason,

I have no idea, but is it possible that E|X|^4 is more stable across different players (playing styles) than E|X|^2 is? In that case, you could get a bound on that using very large aggregate databases (millions, or even tens of millions of hands) and then publish the known result for practical use. No idea if this is feasible, but just curious.

gm

jason1990
12-12-2004, 03:39 AM
It's unclear to me how you would do the regression. Perhaps you could illustrate with a simple example. Frankly, I don't see how the number 15 has any relevance here, except maybe as the SD of some Bayesian prior distribution. But if you want to take a Bayesian approach to the analysis, then I'm sure you could do much better than what I'm suggesting, simply by appropriately choosing your prior. For this reason, my intention is to completely avoid any sort of Bayesian analysis.

As for computing a weighted average of your risk of ruin, it sounds like a nice idea. But it looks like you are considering the parameters in the model (for example, the true SD) to be random variables themselves. This is a decidedly Bayesian approach. If this is not the case, then perhaps you could elaborate. I am interested in hearing any (non-Bayesian) ideas that might improve this analysis.

jason1990
12-12-2004, 03:50 AM
Interesting. I have no idea. But I see no reason to think that E|X|^4 would be more stable. In fact, a "naive" argument suggests it should be less stable. After all, SD is less stable than winrate from player to player, so perhaps stability decreases as we look at higher moments.

Also, I think it would be difficult to test the stability of E|X|^4. We would have to compare its "true" value for many different players. But to get a "true" value for any particular player, they would have to have a large personal database. So all we could really test is the stability of E|X|^4 among players with large databases. And my guess is that "large" here would mean something larger than the database of the typical player who would want to make use of this statistic.