Standard Deviation of PT Stats (Light Math)

chris_a · #1 04-19-2005, 08:11 PM

I've seen a lot of stats posts lately, with conclusions being drawn from some of those stats, so I felt I should post this for people on micro. I know this is more of a probability forum post, but it directly affects posts in micro, so I think it's appropriate here. Generally speaking, anyone who comments on a player's stats should know about this so that they don't give bad information based on a very small sample (for instance about someone's blind stealing tendencies, which don't converge for a while).

The fact is, every estimate of a "% of time" stat has its own standard deviation. Based on the number of chances the event has had to happen, we can work out what range we can expect the true number to be in.

Generally:
If you have an event that happens with probability p, and there have been N opportunities for it to happen, then you can figure out the standard deviation of the estimated percentage by:

Std. Dev. = sqrt(p*(1-p)/N)

For example, say someone's VPIP after 10,000 hands is 20, meaning they voluntarily put money into the pot 20% of the time. Then,

Std. Dev. = sqrt(0.2*0.8/10000) = 0.004 = 0.4%

They can expect their true VPIP to be within 2 std. dev. of the 20% estimate about 95% of the time, so their true VPIP is most likely between 19.2 and 20.8. That's a tight range, which shows that VPIP converges quickly.
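If you'd rather not do this by hand, here is a minimal Python sketch of the same calculation. The function names are made up for illustration, and it uses the same rough "2 std. dev. is about 95%" rule as above.

```python
import math

def proportion_std_dev(p, n):
    """Standard deviation of an estimated percentage: sqrt(p*(1-p)/n)."""
    return math.sqrt(p * (1.0 - p) / n)

def interval_95(p, n):
    """Rough 95% range: the estimate plus or minus 2 standard deviations."""
    margin = 2.0 * proportion_std_dev(p, n)
    return p - margin, p + margin

# VPIP example from the post: 20% over 10,000 hands.
sd = proportion_std_dev(0.20, 10_000)
low, high = interval_95(0.20, 10_000)
print(f"std dev = {sd:.4f}")
print(f"95% range = {low:.3f} to {high:.3f}")
```

For the other stats, pass the right N (flops seen, showdowns reached) instead of total hands.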

Now, what about a number like "Went to SD %"? You can't compute the standard deviation of this number from the total number of hands. You have to compute it from the number of times the player has seen the flop. You can get that number from the More Detail... button on the General tab in Poker Tracker. It's the 5th number down, "Saw Flop All Hands". So, for instance, if your number of hands is 10k, your saw flop number might be around 3000.

Thus, to compute the std. dev. of "Went to SD %", you would take the estimate (say it's 35%) and N = 3000 (from the previous paragraph). You'd get

Std. Dev. = sqrt(0.35*0.65/3000) = 0.0087 = 0.87%

As you can see, it's about twice as big as the Std. Dev. for VPIP, because N is smaller. The expected 95% interval for Went to SD would be 33.3%-36.7%.

For one final example, we'll take the "Won $ at SD %" stat. For this one N is the number of showdowns you've gone to. This is given on the More Detail... page again under "Went to Showdown". An example is:

Went to Showdown: 41.65% (1320 times out of 3169)

The number you'd want to use in computing the std. dev. is the 1320. So if the Won$@SD number was, say, 55%, we'd have:

Std. Dev. = sqrt(0.55*0.45/1320) = 0.0137 = 1.37%

Thus the 95% confidence interval would be 52.3% to about 57.7%. So obviously this stat converges slower than VPIP or Went to SD.

Note: the confidence intervals assume that the stats are Gaussian. They aren't really; they are binomial. However, at the sample sizes we are talking about, the Gaussian approximation is good.
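One way to sanity-check that note is to add up the exact binomial probabilities and see how often the observed percentage really lands within 2 std. dev. of the true value. This is just a sketch with made-up function names; it works in log space (via `math.lgamma`) so the big factorials don't overflow.

```python
import math

def binomial_pmf(k, n, p):
    """Exact binomial probability of k hits in n tries, computed in log space."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)

def coverage_of_two_sd(p, n):
    """Exact chance that the observed fraction k/n lands within 2 std. dev. of p."""
    sd = math.sqrt(p * (1.0 - p) / n)
    low, high = p - 2.0 * sd, p + 2.0 * sd
    return sum(binomial_pmf(k, n, p) for k in range(n + 1) if low <= k / n <= high)

# Won $ at SD example: true value 55%, 1320 showdowns.
print(f"coverage with N = 1320: {coverage_of_two_sd(0.55, 1320):.3f}")
# A much smaller sample, for comparison.
print(f"coverage with N = 20:   {coverage_of_two_sd(0.55, 20):.3f}")
```

With N in the thousands the result should sit close to the 95% the Gaussian rule promises; with tiny N it can drift, because the counts come in whole numbers.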

Conclusion: You can use the above formula to get the std. dev. for any of these percentage stats, but you have to use the number of opportunities the event had to happen as N, not the total number of hands, and that N determines how wide the range of the % is.

Ball Park Figures:
-Your VPIP should be within 1 percentage point of its true value (with 95% confidence) after about 5,000 hands.
-Your Went to SD should be within 1 percentage point of its true value after about 30,000 hands.
-Your Won$ at showdown should be within 1 percentage point of its true value after about 100,000 hands. Not coincidentally, this is about the same as when your win rate is starting to be really converged.
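To see where figures like these come from, you can turn the formula around: set 2 std. dev. equal to the margin you want and solve for N, which gives N = 4*p*(1-p)/margin^2. The sketch below does that and then converts N into hands. The per-hand rates are assumptions (about 3,000 flops and about 1,300 showdowns per 10,000 hands, loosely based on the examples above), and the function names are made up.

```python
import math

def trials_needed(p, margin, z=2.0):
    """Opportunities needed so that z std. dev. of the estimate equals `margin`."""
    return math.ceil(z * z * p * (1.0 - p) / (margin * margin))

def hands_needed(p, margin, opportunities_per_hand):
    """Convert opportunities into hands, given how often the opportunity comes up."""
    return math.ceil(trials_needed(p, margin) / opportunities_per_hand)

# (stat, estimated value, opportunities per hand); the per-hand rates are assumptions
stats = [
    ("VPIP", 0.20, 1.00),         # every hand is an opportunity
    ("Went to SD", 0.35, 0.30),   # assumes ~3,000 flops seen per 10,000 hands
    ("Won $ at SD", 0.55, 0.13),  # assumes ~1,300 showdowns per 10,000 hands
]

for name, p, rate in stats:
    print(f"{name}: about {hands_needed(p, 0.01, rate):,} hands for +/- 1 point")
```

With these assumed rates it comes out to roughly 6,400, 30,000 and 76,000 hands, the same general range as the ballpark figures. Different flop and showdown rates will move the last two numbers.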

After 10k hands conclusion:
-VPIP is very well converged. It's known plus or minus ~0.8 percentage points.
-Went to Showdown is pretty well converged and you can draw rough conclusions about it. It's known plus or minus ~1.7 percentage points.
-Won $ at SD isn't very converged at all, and probably not a lot of conclusions should be drawn about it unless it's in the mid 40%s or lower, or it's higher than 60%. It's known plus or minus about 3 percentage points.

Stuey · #2 04-20-2005, 12:32 AM

Very nice post. Wish I had something to add but I would likely create confusion. Your post states some very important facts. I read GTAOT and it cleared up SD as it relates to BR management, and winrates. But it is easy to forget how the SD can skew the importance of a number. Thanks for the info.

dozer · #3 04-20-2005, 12:45 AM

Can you tell me what these mean, I've been waiting to ask someone that knows about standard deviation.

jaxUp · #4 04-20-2005, 12:47 AM

I love this post. (fish sucks).

jaxUp · #5 04-20-2005, 12:54 AM

They basically mean that variance sucks ass. The longer version basically says that for every hundred hands you play, you're, on average, 15.6327 BBs from what you would expect to win.

(this is very possibly wrong...wait for validation).

UncleSalty · #6 04-20-2005, 01:04 AM

[ QUOTE ]
They basically mean that variance sucks ass.

[/ QUOTE ]

Variance in Texas Hold'em is God's greatest gift to skilled players. Without variance allowing absolute morons to score some very big wins on a semi-regular basis, we would very quickly run out of fish and end up playing each other. (Yikes.)

Say it with me kids..."I love variance".

Statistically speaking, your standard deviation is used to provide a level of confidence in your winrate. It's been a long time since I took a stats class, so I'm sure some of our resident math whizzes can provide more clarity.

Basically, you can be about 99.7% confident that your "true" winrate is within 3 standard deviations on either side of your current average winrate after 13K hands. I believe the confidence level for just one SD is about 68% or so.

So, what it really means is that your 13K hand sample size doesn't tell you dick about your true winrate.

yellowjack · #7 04-20-2005, 01:07 AM

The formulas for the sampling distribution of a sample proportion are being used. This is because the true population proportion, which the sample proportion is estimating, can't be observed directly (there are effectively infinitely many possible hands).

i.e. We use s.d. = sqrt( p(1-p)/n )
and not s.d. = sqrt( np(1-p) )
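A quick way to see how the two formulas relate: the second one is the std. dev. of the raw count of times the event happens, and dividing it by n gives the first one, the std. dev. of the percentage. A small sketch (function names are just for illustration):

```python
import math

def count_std_dev(p, n):
    """Std. dev. of the number of times the event happens in n tries."""
    return math.sqrt(n * p * (1.0 - p))

def proportion_std_dev(p, n):
    """Std. dev. of the fraction of tries in which the event happens."""
    return math.sqrt(p * (1.0 - p) / n)

p, n = 0.20, 10_000
print(f"count std. dev.:      {count_std_dev(p, n):.1f} hands")
print(f"proportion std. dev.: {proportion_std_dev(p, n):.4f}")
print(f"count std. dev. / n:  {count_std_dev(p, n) / n:.4f}")
```

So for the 20 VPIP example, the count of hands played is off by about 40 hands per std. dev., which is the same thing as 0.4 percentage points.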

RockPile · #8 04-20-2005, 03:05 AM

Standard deviation... hmm, how to explain. Think of a bell curve centered on your average, and mark off steps of one standard deviation on each side. About 68% of results fall within 1 std. dev. of the average, about 95% fall within 2 std. dev., and about 99.7% fall within 3 std. dev.

So, if your AVERAGE W$SD is 50% with a std. dev. of 10% we can say that

about 68% of the time your W$SD is between 40% and 60%
about 95% of the time your W$SD is between 30% and 70%
about 99.7% of the time your W$SD is between 20% and 80%

if you graph it out you should see a nice bell curve
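If you want to check the bell curve percentages yourself, the chance of a normally distributed value landing within k std. dev. of its mean is erf(k / sqrt(2)). A minimal sketch using Python's standard `math.erf` (the helper name is made up):

```python
import math

def within_k_sd(k):
    """Probability that a normal value lies within k std. dev. of its mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} std. dev.: {within_k_sd(k):.4f}")
```

This is where the usual 68 / 95 / 99.7 rule of thumb comes from.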

yellowjack · #9 04-20-2005, 05:47 AM

Not quite what I was asking, but thanks for trying :)
*shameless bump*

jaxUp · #10 04-20-2005, 05:51 AM

[ QUOTE ]
Not quite what I was asking, but thanks for trying :)
*shameless bump*

[/ QUOTE ]

Well, I haven't really been told that I'm wrong, but nobody seems too eager to validate what I said either. I'm at least in the ballpark, I think.