Are Winrates Normally Distributed? - Page 4

stinkypete · #31 12-08-2005, 04:38 AM

[ QUOTE ]
Hi Josh,

I'll take a stab at addressing a few points.

The fundamental Random Variable in poker is the amount of money you win on one hand. This random variable has a distribution, which is certainly not Gaussian.
First off, it's a discrete random variable. The mean is your winrate per hand. The extreme values it can take are +12BB and -12BB (on Party Poker). The most probable event is 0, since you fold most hands.
Other frequently occurring values are -0.50BB and -0.25BB since these are the values you lose when you fold your blinds, and maybe -1.5BB since this is how much you lose when you raise pre-flop, completely blank the flop, bet the flop, and get raised.

So we get a sense of what the probability mass function of this random variable looks like: It's centered at your winrate (say .02bb) but its peak value is at 0. Then it has smaller peaks at frequently occurring values, such as -0.50BB, -0.25BB, etc. It is, obviously, not a normal distribution.

The Central Limit Theorem tells us that if we ADD together enough of these strange random variables, the sum, regarded as a random variable, must start looking more and more Gaussian.

In your charts, when you group together a string of hands, you are adding all the random variables in each group, and this sum should start looking more Gaussian the larger the group is (BB/1000 should look more Gaussian than BB/10). With a 150k hand sample, I don't think you have enough hands to get a graph that shows this, since if you went to, say, BB/1000, you would only have 150 sample points. But I'm pretty sure that at some point, it would look like a nice bell-shaped curve.

Edit: You can start to see at BB/50 how the graph is looking more Gaussian. Below BB/50 you have the nice feature that you have many sample points. BUT each sample point is not yet being taken from a very Gaussian distribution. Above BB/50 (BB/100 and up), you have the nice feature that the samples are being taken from a pretty Gaussian distribution, BUT you don't have enough samples to draw the curve. If your DB was much larger, I think you would see the BB/100 look much closer to Gaussian than the BB/50.

-v

[/ QUOTE ]

this is very well said, and based on my understanding of statistics, exactly correct.

the assumption here is that the win/loss per hand is a random variable, which it strictly speaking is not, as dcifr mentioned. the distribution of the random variable will change based on game conditions, improvements in your play, tilt, who you're playing against, the number of dumps the guy in seat 6 has taken that day, etc. but these things shouldn't change the random variable so much that you can't approximate win rate per N hands where N is large as a normal distribution fairly accurately (the last point in particular has very little effect.)
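
The grouping argument in the quoted post can be tried out with a small simulation. This is only a sketch: the per-hand values and probabilities below are made up (a spike at 0, spikes at the blind folds, a longer winning tail), every hand is assumed independent with the same distribution, and the function names are hypothetical. It measures the skewness of block totals, which is 0 for a Gaussian shape.

```python
import random
import statistics

# Hypothetical per-hand results (in BB) and probabilities, made up for
# illustration only: a big spike at 0, smaller spikes at the blind folds,
# and a longer tail on the winning side.
VALUES = [0.0, -0.25, -0.5, -1.5, -3.0, 2.0, 5.0, 10.0]
PROBS = [0.60, 0.10, 0.10, 0.06, 0.05, 0.05, 0.03, 0.01]


def skewness(xs):
    """Sample skewness: 0 for a symmetric (e.g. Gaussian) shape."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)


def block_totals(block_size, n_blocks, rng):
    """Total BB won in each of n_blocks independent blocks of block_size hands."""
    return [
        sum(rng.choices(VALUES, weights=PROBS, k=block_size))
        for _ in range(n_blocks)
    ]


def skew_by_block_size(sizes=(1, 10, 100), n_blocks=5000, seed=1):
    """Skewness of the block totals for each block size (BB/1, BB/10, ...)."""
    rng = random.Random(seed)
    return {n: skewness(block_totals(n, n_blocks, rng)) for n in sizes}


if __name__ == "__main__":
    for size, skew in skew_by_block_size().items():
        print(f"BB/{size}: skewness = {skew:.2f}")
```

For independent, identically distributed hands the skewness of a block total shrinks like 1/sqrt(N), so the printed values should fall toward 0 as the block size grows. Real hands drift with game conditions, as noted above, which this sketch ignores.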

#32 12-08-2005, 05:36 AM

Hi Justin,

The confidence interval calcs that people do only apply to a Normal distribution with mean and variance equal to the mean and variance of our special BB/100 distribution.
Since our BB/100 distribution is not normal, these first two "moments" are not enough to figure out exactly what our confidence intervals are. So, our calculations introduce an error. How big is the error?

Let's go to the extreme and say that we were interested in BB/1 (BB/hand). Our mean, let's say, is 0.0200 (this would mean we had a winrate of 2.00 BB/100). And let's say our standard deviation is 1.5 (which would result in an SD/100 of 15). This means that a one sigma event would put us between -1.48 and 1.52 BB, and a five sigma event (which happens less than once in a million trials on average) would be between -7.48 BB and 7.52 BB. Clearly this is way off. We win (and even sometimes lose) more than 7.5 BB well over once in every million hands. So our confidence interval calcs for BB/1 are way off because the real poker distribution is nowhere near Normal and can NOT be approximated well using just its first two moments (mean and variance).
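
The arithmetic in this paragraph can be checked in a few lines. A minimal sketch using the figures from the post (mean 0.02 BB/hand, SD 1.5 BB/hand); the function names are hypothetical, and the square-root scaling of the standard deviation assumes independent hands.

```python
import math


def two_sided_tail(k):
    """P(|Z| > k) for a standard normal Z."""
    return math.erfc(k / math.sqrt(2))


def sigma_band(mean, sd, k):
    """Interval mean +/- k standard deviations."""
    return (mean - k * sd, mean + k * sd)


mean_per_hand = 0.02  # BB/hand, i.e. 2.00 BB/100 (figure from the post)
sd_per_hand = 1.5     # BB/hand (figure from the post)

# Standard deviation scales with sqrt(number of hands) if hands are independent.
sd_per_100 = sd_per_hand * math.sqrt(100)

if __name__ == "__main__":
    print("SD/100:", sd_per_100)
    print("1-sigma band:", sigma_band(mean_per_hand, sd_per_hand, 1))
    print("5-sigma band:", sigma_band(mean_per_hand, sd_per_hand, 5))
    print("Normal P(outside 5 sigma):", two_sided_tail(5))
```

Under a true normal distribution the two-sided 5-sigma tail is below one in a million, which is the figure the post contrasts with how often real hands win or lose more than 7.5 BB.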

So where does that leave us with BB/100? How much error do we introduce in our confidence interval calculations by assuming BB/100 is Gaussian? Good question. I'm not sure. I think you will still see remnants of the longer flat tail on the positive side and the shorter, steeper tail on the negative side (a by-product of the fact that you can win a lot more in one hand than you can lose). But I suspect it will be close enough to normal to not worry about it.

w_alloy · #33 12-08-2005, 08:08 AM

vkh, great posts, and thanks to Justin and Josh for bringing this topic up.

I think this...

[ QUOTE ]

So where does that leave us with BB/100? How much error do we introduce in our confidence interval calculations by assuming BB/100 is Gaussian? Good question. I'm not sure. I think you will still see remnants of the longer flat tail on the positive side and the shorter, steeper tail on the negative side (a by-product of the fact that you can win a lot more in one hand than you can lose). But I suspect it will be close enough to normal to not worry about it.

[/ QUOTE ]

is very worth studying. BB/100 and the associated confidence intervals are important to a lot of people. Finding out how far off our confidence intervals have been, and exactly what we should use as x for bb/x with a sample size y, are very worthy pursuits. Even if you suspect it's close enough to normal not to worry about, Josh's graph with (only?) 1500 datapoints suggests otherwise.

#34 12-08-2005, 09:34 AM

If I am not mistaken, according to the central limit theorem they certainly must be normally distributed.

sfer · #35 12-08-2005, 10:38 AM

[ QUOTE ]
It looks pretty close to normal here, but I think the way you did the sampling is not quite right.

You need to draw random samples from the total group of hands. Chopping them up into blocks is easier, but not appropriate. The way you have done it, we might find different results using a different database.

[/ QUOTE ]

You probably want to draw largish random blocks of hands to randomize position as much as possible between the samples.

MaxPower · #36 12-08-2005, 11:58 AM

[ QUOTE ]
[ QUOTE ]
It looks pretty close to normal here, but I think the way you did the sampling is not quite right.

You need to draw random samples from the total group of hands. Chopping them up into blocks is easier, but not appropriate. The way you have done it, we might find different results using a different database.

[/ QUOTE ]

You probably want to draw largish random blocks of hands to randomize position as much as possible between the samples.

[/ QUOTE ]

OK, I figured out a way to do this in SPSS. I thought the data file would crash my computer but it doesn't.

I have a data file with the amount I won/lost for 164,724 hands at 15/30. My win rate over these hands is a pitiful 1.13BB/100.

How large should the samples be and how many should I pull? I was thinking of selecting 10,000 samples of 1000 hands each.

Then I can plot them and get the skewness, kurtosis, etc.
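
The same resampling idea can be sketched outside SPSS. This is an illustration under stated assumptions, not the procedure actually run for the thread: hands are drawn at random with replacement (bootstrap style, which the post does not specify), the real data file is replaced by made-up stand-in data from a hypothetical `fake_hands` helper, and all function names are hypothetical.

```python
import random
import statistics


def moments(xs):
    """Return (skewness, excess kurtosis); both are 0 for a normal shape."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    n = len(xs)
    skew = sum((x - m) ** 3 for x in xs) / (n * s ** 3)
    kurt = sum((x - m) ** 4 for x in xs) / (n * s ** 4) - 3.0
    return skew, kurt


def resampled_winrates(hands, sample_size, n_samples, rng):
    """Draw n_samples random samples (with replacement) of sample_size hands
    and return the win rate of each sample in BB/100."""
    return [
        100.0 * statistics.fmean(rng.choices(hands, k=sample_size))
        for _ in range(n_samples)
    ]


def fake_hands(n, rng):
    """Stand-in for a real results file: n made-up per-hand results in BB."""
    values = [0.0, -0.25, -0.5, -1.5, -3.0, 2.0, 5.0, 10.0]
    probs = [0.60, 0.10, 0.10, 0.06, 0.05, 0.05, 0.03, 0.01]
    return rng.choices(values, weights=probs, k=n)


if __name__ == "__main__":
    rng = random.Random(0)
    hands = fake_hands(164_724, rng)  # same count as the post, fake values
    rates = resampled_winrates(hands, sample_size=1000, n_samples=2000, rng=rng)
    skew, kurt = moments(rates)
    print(f"skewness = {skew:.3f}, excess kurtosis = {kurt:.3f}")
```

Drawing individual hands at random breaks up any session-level dependence between neighboring hands, which is the point of sampling this way rather than cutting the database into consecutive blocks. The `n_samples` value here is smaller than the 10,000 proposed in the post only to keep the pure-Python run short.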

IndieMatty · #37 12-08-2005, 12:22 PM

[ QUOTE ]
[ QUOTE ]
[ QUOTE ]
It looks pretty close to normal here, but I think the way you did the sampling is not quite right.

You need to draw random samples from the total group of hands. Chopping them up into blocks is easier, but not appropriate. The way you have done it, we might find different results using a different database.

[/ QUOTE ]

You probably want to draw largish random blocks of hands to randomize position as much as possible between the samples.

[/ QUOTE ]

OK, I figured out a way to do this in SPSS. I thought the data file would crash my computer but it doesn't.

I have a data file with the amount I won/lost for 164,724 hands at 15/30. My win rate over these hands is a pitiful 1.13BB/100.

How large should the samples be and how many should I pull? I was thinking of selecting 10,000 samples of 1000 hands each.

Then I can plot them and get the skewness, kurtosis, etc.

[/ QUOTE ]

I just had a grim college statistics flashback when you mentioned SPSS. And I am now scared of this thread. You guys are so much smarter than me. :(

danzasmack · #38 12-08-2005, 12:47 PM

[ QUOTE ]
If I am not mistaken, according to the central limit theorem they certainly must be normally distributed.

[/ QUOTE ]

In order for the central limit theorem to be applicable, the sample's variates must be:

1) independent
and
2) identically distributed (the common distribution can be arbitrary, as long as its variance is finite).

I think arguments can be made in support of both being true or false.

I also think that the nature of the player is very important, as well as the game they are playing (limit, NL).

I think an equally interesting question is, if you were to model BB/100, what would it be a function of?

MNpoker · #39 12-08-2005, 01:07 PM

A discrete function will never become perfectly normal if the underlying discrete function is not normal. It will however become MORE normal as you increase the number of independent trials.

For example, say your individual hand distribution is:
-1 90% of the time
+11 10% of the time

If you run 100 trials there is still a .1^100 chance you will have won 11 * 100. (you win every trial) There is a zero chance you will have lost 11 * 100 (in fact your largest possible loss is 100).
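
For this two-outcome example the distribution of the 100-hand total is just a rescaled binomial, so it can be written down exactly. A minimal sketch, assuming independent hands; the function names are hypothetical.

```python
import math


def total_distribution(n=100, p_win=0.10, win=11, loss=-1):
    """Exact PMF of the total after n independent hands, as {total: probability}."""
    pmf = {}
    for k in range(n + 1):  # k = number of winning hands
        total = k * win + (n - k) * loss
        pmf[total] = math.comb(n, k) * p_win ** k * (1 - p_win) ** (n - k)
    return pmf


def skewness(pmf):
    """Skewness of a PMF given as {value: probability}; 0 for a normal shape."""
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    third = sum((x - mean) ** 3 * p for x, p in pmf.items())
    return third / var ** 1.5


if __name__ == "__main__":
    pmf = total_distribution()
    print("largest possible loss:", min(pmf), "with probability", pmf[min(pmf)])
    print("largest possible win: ", max(pmf), "with probability", pmf[max(pmf)])
    print("skewness of one hand:  ", skewness(total_distribution(n=1)))
    print("skewness of 100 hands: ", skewness(pmf))
```

The range stays lopsided (-100 to +1100) no matter how many hands are added, but the skewness of the total falls like 1/sqrt(n), which is one way to put a number on "MORE normal".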

So what you need to decide first is how close to normal do you need the aggregate results to be to be considered normal?
----------------------------------------------------------
On the changing parameters. (Good table, playing well, etc.) This will be VERY hard to empirically estimate, and will make the calculations more tedious.
----------------------------------------------------------

Some methods that may be useful in seeing how many hands you need to approximate a normal function would be:

Get every hand you have and put it into a discrete curve with the values as percentages. There won't be that many buckets. (You can lose up to 12 BB, but probably never have, and win up to 108BB <-- That I'd like to see). Most likely your range will be something like -8BB to +30BB with an interval size of .25 BB <-- The small blind.

Now you can either
A) Run X simulations off that curve and see if the result is 'normal'

Or

B) Do a FFT on the curve with X convolutions(?) and see if the outputs are 'normal'.

If the results are still too skewed to be called 'normal', increase X.
The FFT method has the advantage that it gives the 'true' distribution, but it's a pain to do (I would guess you will need somewhere in the neighborhood of 5,000 convolutions(?) and you certainly won't pull this off using FFT in Excel).
With the simulation method, if you get a 'normal' result you should probably still run it about 10 times (minimum) to make sure you didn't just get a 'normal' simulation.
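
Option B can be sketched as follows. Note the swap: this uses plain direct convolution with repeated squaring rather than an FFT, which gives the same exact distribution and is easier to read, just slower. The per-hand curve is made up, expressed in small-blind units (0.25 BB) as suggested above; hands are assumed independent, and all names are hypothetical.

```python
# Hypothetical per-hand curve: result in small-blind units (0.25 BB) -> probability.
HAND_PMF = {-8: 0.05, -2: 0.10, -1: 0.10, 0: 0.60, 4: 0.08, 12: 0.05, 20: 0.02}
LO = min(HAND_PMF)  # smallest possible result, in grid units
SINGLE = [HAND_PMF.get(u, 0.0) for u in range(LO, max(HAND_PMF) + 1)]


def convolve(a, b):
    """Direct convolution of two PMFs stored as lists on an integer grid."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        if pa == 0.0:
            continue  # skip empty buckets
        for j, pb in enumerate(b):
            out[i + j] += pa * pb
    return out


def n_fold(pmf, n):
    """PMF of the sum of n independent hands, built by repeated squaring.
    Index i of the result stands for the value n * LO + i (in grid units)."""
    result = [1.0]  # sum of zero hands: all probability at 0
    power = pmf
    while n > 0:
        if n & 1:
            result = convolve(result, power)
        n >>= 1
        if n:
            power = convolve(power, power)
    return result


def mean_and_skew(pmf, lo):
    """Mean and skewness of a PMF whose index i stands for the value lo + i."""
    mean = sum((lo + i) * p for i, p in enumerate(pmf))
    var = sum((lo + i - mean) ** 2 * p for i, p in enumerate(pmf))
    third = sum((lo + i - mean) ** 3 * p for i, p in enumerate(pmf))
    return mean, third / var ** 1.5


if __name__ == "__main__":
    for n in (1, 10, 100):
        mean, skew = mean_and_skew(n_fold(SINGLE, n), n * LO)
        print(f"{n} hands: mean = {mean * 0.25:.3f} BB, skewness = {skew:.3f}")
```

Because the result is exact, there is no need to repeat the run the way a simulation has to be repeated; the cost is that pure-Python convolution becomes slow well before 5,000 hands, which is where an FFT would earn its keep.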

DcifrThs · #40 12-08-2005, 01:14 PM

[ QUOTE ]
A discrete function will never become perfectly normal if the underlying discrete function is not normal. It will however become MORE normal as you increase the number of independent trials.

For example, say your individual hand distribution is:
-1 90% of the time
+11 10% of the time

If you run 100 trials there is still a .1^100 chance you will have won 11 * 100. (you win every trial) There is a zero chance you will have lost 11 * 100 (in fact your largest possible loss is 100).

So what you need to decide first is how close to normal do you need the aggregate results to be to be considered normal?
----------------------------------------------------------
On the changing parameters. (Good table, playing well, etc.) This will be VERY hard to empirically estimate, and will make the calculations more tedious.
----------------------------------------------------------

Some methods that may be useful in seeing how many hands you need to approximate a normal function would be:

Get every hand you have and put it into a discrete curve with the values as percentages. There won't be that many buckets. (You can lose up to 12 BB, but probably never have, and win up to 108BB <-- That I'd like to see). Most likely your range will be something like -8BB to +30BB with an interval size of .25 BB <-- The small blind.

Now you can either
A) Run X simulations off that curve and see if the result is 'normal'

Or

B) Do a FFT on the curve with X iterations and see if the outputs are 'normal'.

If the results are still too skewed to be called 'normal', increase X.
The FFT method has the advantage that it gives the 'true' distribution, but it's a pain to do. With the simulation method, if you get a 'normal' result you should probably still run it about 10 times (minimum) to make sure you didn't just get a 'normal' simulation.

[/ QUOTE ]

i think a more realistic hand distribution would be much closer to approaching normality.

lose 0sb with P1
lose 1sb with P2
lose .5sb with P3
lose 2 sbs with P4
lose 3sbs with P5
lose 4sbs with P6
lose 5sbs with P7
.
.
.
lose 12bbs with Pn

then the upside:
win 0 with Pa
win
.
.
.
win 55bbs with Pm

i think that would be better, but still, there is a skew due to the limited downside and positive upside.

but we're aggregating in such a way around the MEAN of this random variable over time which might be enough to make it fairly close to gaussian.

as vks stated, "how close"? we dont know....we'd need more data than anybody here has to look at it.

Barron