PDA

View Full Version : Are Winrates Normally Distributed?


sthief09
12-07-2005, 02:16 PM
BIG thanks to Justin A for coming to me about this. He was the one who first noticed it and most of this was his idea. I had some free time last month on my flight to Vegas so I did most of the tedious Excel stuff.

Also, Barron, I know this doesn't have anything to do with high stakes poker, but I feel that it is an interesting topic that many posters will be interested in. If you don't agree, please move it to MHHUSH.

I am not a statistician but based on my not-quite-intermediate understanding of statistics, this is what I've come up with.

Methodology: I took a database of about 150k hands at one limit. I took these hands and put them into Excel. I chopped them up into blocks of hands (in chronological order) and plotted them in a histogram.
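For anyone who wants to reproduce this without the tedious Excel work, the procedure is roughly the sketch below (Python instead of Excel; the per-hand results are made-up stand-ins, since I can't attach the database):

```python
# Sketch of the block-and-histogram method. The per-hand values and
# probabilities here are invented; the real version used the database.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
hands = rng.choice([-1.0, -0.5, -0.25, 0.0, 2.0, 8.0],
                   size=150_000,
                   p=[0.10, 0.07, 0.07, 0.60, 0.12, 0.04])

def block_winrates(per_hand, block):
    """Chop hands (kept in chronological order) into blocks and
    return the BB won in each block."""
    n = len(per_hand) // block
    return per_hand[:n * block].reshape(n, block).sum(axis=1)

for block in (10, 25, 50, 100, 200, 500, 700):
    plt.hist(block_winrates(hands, block), bins=40)
    plt.title(f"BB won per {block}-hand block")
    plt.show()
```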

Theory: BB/100 is the average number of BB you will win every 100 hands. If you've played a large enough sample of hands (technically 50 samples should be enough, so 5,000 hands), the sample winrates should be approximately normally distributed and form the shape of a bell curve.

Results: The following graphs are of BB/10, BB/25, BB/50, BB/100, etc. Notice that they are skewed toward the low end. This would suggest that winrates are not normally distributed, which would mean you are more likely to run good but the bad runs will be worse. I don't remember the exact mean but figure it's somewhere between 2 and 2.5 BB/100. For BB/1000 it would be between 20 and 25, etc.

Can someone with a better understanding of statistics explain this?

http://img.photobucket.com/albums/v651/sthief09/per10.jpg

http://img.photobucket.com/albums/v651/sthief09/per25.jpg

http://img.photobucket.com/albums/v651/sthief09/per50.jpg

http://img.photobucket.com/albums/v651/sthief09/per100.jpg

http://img.photobucket.com/albums/v651/sthief09/per200.jpg

http://img.photobucket.com/albums/v651/sthief09/per500.jpg

http://img.photobucket.com/albums/v651/sthief09/per700.jpg

astroglide
12-07-2005, 02:23 PM
i can't speak on math, but i know that people tend to play a lot when they're down and tend to leave when they're up

MaxPower
12-07-2005, 02:35 PM
It looks pretty close to normal here, but I think the way you did the sampling is not quite right.

You need to draw random samples from the total group of hands. Chopping them up into blocks is easier, but not appropriate. The way you have done it, we might find different results using a different database.

PTjvs
12-07-2005, 02:43 PM
My gut instinct says this is because the raw +$/hand won is greater than the -$/hand you lose (due to some pots being multiway). Generally speaking, your small chunks of hands will break down as follows:

1) Many where you don't win big pots and you are slightly -$

2) Slightly fewer where you win enough hands to have a small +$

3) A few where you win a big hand or two & are very +$.

4) A few where you lose a few big pots & are very -$ (fewer, however, than your big +$ chunks, since due to multiway pots, the amount you WIN in big pots is larger than the amount you LOSE in the same big pot)

This should lead to a bell curve that is NOT evenly distributed, but peaks on the - side, with the difference being made up by the curve coming down less steeply on the positive end.

I hope I described that well, I'd do one in MS paint, but I'm a very bad artist.

if i am correct, however, then if you took the same data from HU play, it SHOULD look like a normal bell curve, as the effect of multiway play is completely eliminated.

jvs

sthief09
12-07-2005, 02:45 PM
[ QUOTE ]
It looks pretty close to normal here, but I think the way you did the sampling is not quite right.

You need to draw random samples from the total group of hands. Chopping them up into blocks is easier, but not appropriate. The way you have done it, we might find different results using a different database.

[/ QUOTE ]

i could easily randomize it

sthief09
12-07-2005, 02:48 PM
one of the possible conclusions that justin a came up with is that maybe BB/100 is not the optimal measure for those who want to do tests on it. maybe we could get a more accurate standard deviation from BB/1000, though for most people, playing that many hands before getting a standard deviation would be infeasible and plain annoying

MaxPower
12-07-2005, 02:54 PM
[ QUOTE ]
[ QUOTE ]
It looks pretty close to normal here, but I think the way you did the sampling is not quite right.

You need to draw random samples from the total group of hands. Chopping them up into blocks is easier, but not appropriate. The way you have done it, we might find different results using a different database.

[/ QUOTE ]

i could easily randomize it

[/ QUOTE ]

You can easily randomize the order of the hands in your database, but that is not what I was suggesting.

You need to randomly select X number of hands and compute the win rate for that sample. Then you need to do it over and over and over again, each time selecting X number of hands from the total set of hands. Then you plot the win rates for each of those samples.

I don't know of any easy way to do that in Excel. It could be done in some statistical packages, but a data set of that size is too big for a desktop computer; it would need to be run off of a server or a mainframe computer.

There is a type of statistics called bootstrapping which does not rely on the assumption of normality. I assume there is some bootstrapping software that does this kind of thing, but I don't know enough about it.
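In rough code terms, the loop I have in mind is something like this (a Python sketch, untested against a real database; 'hands' stands in for an array of per-hand results in BB):

```python
# Rough sketch of the repeated-random-sampling idea; 'hands' is assumed
# to be a numpy array of per-hand results in BB.
import numpy as np

def resampled_winrates(hands, sample_size=1000, n_samples=10_000, seed=0):
    rng = np.random.default_rng(seed)
    rates = np.empty(n_samples)
    for i in range(n_samples):
        # draw sample_size hands at random from the whole database
        sample = rng.choice(hands, size=sample_size, replace=True)
        rates[i] = 100.0 * sample.sum() / sample_size  # express as BB/100
    return rates  # plot these as a histogram
```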

felson
12-07-2005, 02:57 PM
[ QUOTE ]
i can't speak on math, but i know that people tend to play a lot when they're down and tend to leave when they're up

[/ QUOTE ]

Astro, that shouldn't matter since even if Josh leaves, his next session gets grouped in the stats (if I understand correctly). This would only matter if Josh tilts.

sthief09
12-07-2005, 03:03 PM
oh i see what you're getting at. they have to be completely independent. by choosing blocks of hands, it assumes dependence since each has an equal likelihood of being picked and thus isn't random?

felson
12-07-2005, 03:04 PM
I think PTjvs is dead on. Also, this effect is strongest when the blocks of hands are very small. If the block is just one hand in length, then (in a 10-handed game) around 90% of your sample points will be <= 0, and about 10% will be greater than zero. As the block size gets larger, the median block value tends towards the mean. You can see this reflected in the plots, which shift to the right as the block size gets larger.

Chobohoya
12-07-2005, 03:15 PM
[ QUOTE ]
[ QUOTE ]
i can't speak on math, but i know that people tend to play a lot when they're down and tend to leave when they're up

[/ QUOTE ]

Astro, that shouldn't matter since even if Josh leaves, his next session gets grouped in the stats (if I understand correctly). This would only matter if Josh tilts.

[/ QUOTE ]

It should matter. If you quit early to lock up a win, and "play through your downswings," then you're going to have a smaller winrate, and the center of your distribution will be more to the left than it could be.

Derek123
12-07-2005, 03:32 PM
[ QUOTE ]
This would suggest that winrates are not normally distributed, which would mean you are more likely to run good but the bad runs will be worse.

[/ QUOTE ]

This sentence seems backwards to me. If it is skewed to the left, there are more instances of bad, but the few big wins make up for it.

felson
12-07-2005, 03:32 PM
[ QUOTE ]
[ QUOTE ]
[ QUOTE ]
i can't speak on math, but i know that people tend to play a lot when they're down and tend to leave when they're up

[/ QUOTE ]

Astro, that shouldn't matter since even if Josh leaves, his next session gets grouped in the stats (if I understand correctly). This would only matter if Josh tilts.

[/ QUOTE ]

It should matter. If you quit early to lock up a win, and "play through your downswings," then you're going to have a smaller winrate, and the center of your distribution will be more to the left than it could be.

[/ QUOTE ]

If Josh wins his first hand and quits immediately, then plays again later, then will the hands from his second session be put in the same block as his first hand?

If so, then it doesn't matter if Josh locks up his wins.

If Josh's one hand forms its own block, then that's different -- but I don't think that's what is happening.

sam h
12-07-2005, 03:34 PM
This is a cool idea.

I don't know about the conclusions, though, as the distributions look pretty normal to me. The variance is pretty high, so even with a lot of observations it's not that surprising to see things look kind of choppy.

Another suggestion: redo the graphs using the same number of "bins" (divisions for bars) in each histogram. You have a lot more bins in the first couple, which makes things look slower to converge on a normal than they probably are.

I think something might be wrong with your BB/100 histogram. It doesn't look like there are 1500 observations there.

UprightCreature
12-07-2005, 03:40 PM
For the reason PTjvs states, the winrate for one hand should not be normally distributed. There is an interesting fact, though: the distribution of sums of samples from a non-normal distribution approaches normal as the size of the group increases (Central Limit Theorem). E.g. winrate/1000 will be more normal than winrate/10.

disjunction
12-07-2005, 03:51 PM
http://forumserver.twoplustwo.com/showth...rue#Post3134879 (http://forumserver.twoplustwo.com/showthreaded.php?Cat=0&Board=mediumholdem&Number=3134879&Searchpage=4&Main=3133127&Words=disjunction&topic=&Search=true#Post3134879)

Also there is no reason that I know of to believe that winrates are normally distributed. The normal distribution is a hammer but not every problem is a nail.

Edit: Also I forgot to say in the linked post that a bad table or a bad seat, rather than bad play, can be "Mr. Hyde".

Chobohoya
12-07-2005, 04:30 PM
[ QUOTE ]
[ QUOTE ]
[ QUOTE ]
Astro, that shouldn't matter since even if Josh leaves, his next session gets grouped in the stats (if I understand correctly). This would only matter if Josh tilts.

[/ QUOTE ]

It should matter. If you quit early to lock up a win, and "play through your downswings," then you're going to have a smaller winrate, and the center of your distribution will be more to the left than it could be.

[/ QUOTE ]

If Josh wins his first hand and quits immediately, then plays again later, then will the hands from his second session be put in the same block as his first hand?

If so, then it doesn't matter if Josh locks up his wins.

If Josh's one hand forms its own block, then that's different -- but I don't think that's what is happening.

[/ QUOTE ]

Ok you're right about that. However, what I think Astro was getting at and I know I was, is that people play longer when they are losing. Over a large sample, this is going to mean that you play more hands when you: have a worse image, a tougher table, less confidence, etc. If you put more hours in with a lesser expectation, you move your curve to the left. If you practice excellent game selection without regard to your immediate results (aka don't tilt - just like you said) then your point stands. Many people do not do this.

Justin A
12-07-2005, 05:03 PM
[ QUOTE ]
This would suggest that winrates are not normally distributed, which would mean you are more likely to run good but the bad runs will be worse.


This sentence seems backwards to me. If it is skewed to the left, there are more instances of bad, but the few big wins make up for it.

[/ QUOTE ]

Yeah you're right. I think Josh got it mixed up.

In an extreme sense, it's like we're usually treading water with a few really good runs in between that make our results better.

MarkD
12-07-2005, 05:15 PM
[ QUOTE ]

In an extreme sense, it's like we're usually treading water with a few really good runs in between that make our results better.

[/ QUOTE ]

LOL, this is funny because this quote seems to very accurately describe my experience at poker. Nice big bursts between periods of losing or breaking even.

12-07-2005, 05:20 PM
Hi Josh,

I'll take a stab at addressing a few points.

The fundamental Random Variable in poker is the amount of money you win on one hand. This random variable has a distribution, which is certainly not Gaussian.
First off, it's a discrete random variable. The mean is your winrate per hand. The max values it can take are +12BB and -12BB (on Party Poker). The most probable event is 0, since you fold most hands.
Other frequently occurring values are -0.50BB and -0.25BB since these are the values you lose when you fold your blinds, and maybe -1.5BB since this is how much you lose when you raise pre-flop, completely blank the flop, bet the flop, and get raised.

So we get a sense of what the probability mass function of this random variable looks like: It's centered at your winrate (say .02bb) but its peak value is at 0. Then it has smaller peaks at commonly occurring values, such as -0.50BB, -0.25BB, etc. It is, obviously, not a normal distribution.

The Central Limit Theorem tells us that if we ADD together enough of these strange random variables, the sum, regarded as a random variable, must start looking more and more Gaussian.

In your charts, when you group together a string of hands, you are adding all the random variables in each group, and this sum should start looking Gaussian the larger the group is (BB/1000 should look more Gaussian than BB/10). With a 150k hand sample, I don't think you have enough hands to get a graph that shows this, since if you went to, say, BB/1000, you would only have 150 sample points. But I'm pretty sure that at some point, it would look like a nice bell-shaped curve.

Edit: You can start to see at BB/50 how the graph is looking more Gaussian. Below BB/50 you have the nice feature that you have many sample points, BUT each sample point is not yet being taken from a very Gaussian distribution. Above BB/50 (BB/100 and up), you have the nice feature that the samples are being taken from a pretty Gaussian distribution, BUT you don't have enough samples to draw the curve. If your DB were much larger, I think you would see the BB/100 look much closer to Gaussian than the BB/50.
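Here's a toy version of the argument, for the curious. The per-hand distribution below is invented (just something spiky and non-Gaussian with roughly the right flavor), so treat it as a sketch rather than real poker numbers:

```python
# Toy demo of the CLT argument: a spiky, non-Gaussian per-hand
# distribution (values and probabilities invented) whose N-hand sums
# look more and more Gaussian as N grows.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
values = [-1.5, -0.5, -0.25, 0.0, 1.0, 6.0]   # BB won per hand (made up)
probs = [0.05, 0.08, 0.08, 0.64, 0.10, 0.05]

for n in (1, 10, 50, 100, 1000):
    sums = rng.choice(values, size=(5_000, n), p=probs).sum(axis=1)
    print(f"N={n:5d}: skew={stats.skew(sums):+.3f}, "
          f"excess kurtosis={stats.kurtosis(sums):+.3f}")
# both numbers should drift toward 0, which is what the Gaussian has
```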

-v

12-07-2005, 05:59 PM
[ QUOTE ]
[ QUOTE ]

In an extreme sense, it's like we're usually treading water with a few really good runs in between that make our results better.

[/ QUOTE ]

LOL, this is funny because this quote seems to very accurately describe my experience at poker. Nice big bursts between periods of losing or breaking even.

[/ QUOTE ]

This has been my experience as well. Since I moved up to 15/30 in May and later 20/40 in September, I have never had a losing month, but I did make about half my money in one 30 day span in which I ran insanely well, and as a result, played a ton of hands.

What the other poster said about taking random hands and combining them to make a sample is appropriate. As tilt-proof as all of us think we are (or aren't), it is still perhaps not accurate to call each group of 100 hands independent. Combining hands from different sessions to form samples would be a much better indicator of overall play in my opinion.

jetsonsdogcanfly
12-07-2005, 06:03 PM
There are two easy numbers that reflect the "normalness" of the distribution. Skewness measures the bias toward one side of the mean, and kurtosis measures the fatness of the tails of the distribution. In Excel, you can easily get these numbers using the functions skew() and kurt(), with the arguments for the functions being simply the winrate series. Can you do that, and post or PM me the results, for each of the block sizes?
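(Outside Excel, the same two numbers can be had like this; scipy's kurtosis is excess kurtosis like KURT, and bias=False roughly matches Excel's sample formulas:)

```python
# Same two numbers as Excel's SKEW() and KURT(), for a winrate series.
# scipy's kurtosis is excess kurtosis (normal = 0), matching KURT;
# bias=False uses the sample (bias-corrected) formulas like Excel.
from scipy.stats import kurtosis, skew

def normality_numbers(winrates):
    return skew(winrates, bias=False), kurtosis(winrates, bias=False)
```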

B Dids
12-07-2005, 06:06 PM
I liked this better when you tried to explain it drunk off your ass at Craftsteak.

damaniac
12-07-2005, 06:17 PM
Isn't your theoretical max winrate (or win) for a hand 12BB x N(number of players)? You can only lose 12 bets but you can certainly win far more.

12-07-2005, 06:22 PM
Yes, you are right. Max loss is -12BB and Max win is 9*12BB = 108BB - rake.

Justin A
12-07-2005, 10:37 PM
[ QUOTE ]
Hi Josh,

I'll take a stab at addressing a few points.

The fundamental Random Variable in poker is the amount of money you win on one hand. This random variable has a distribution, which is certainly not Gaussian.
First off, it's a discrete random variable. The mean is your winrate per hand. The max values it can take are +12BB and -12BB (on Party Poker). The most probable event is 0, since you fold most hands.
Other frequently occurring values are -0.50BB and -0.25BB since these are the values you lose when you fold your blinds, and maybe -1.5BB since this is how much you lose when you raise pre-flop, completely blank the flop, bet the flop, and get raised.

So we get a sense of what the probability mass function of this random variable looks like: It's centered at your winrate (say .02bb) but its peak value is at 0. Then it has smaller peaks at commonly occurring values, such as -0.50BB, -0.25BB, etc. It is, obviously, not a normal distribution.

The Central Limit Theorem tells us that if we ADD together enough of these strange random variables, the sum, regarded as a random variable, must start looking more and more Gaussian.

In your charts, when you group together a string of hands, you are adding all the random variables in each group, and this sum should start looking Gaussian the larger the group is (BB/1000 should look more Gaussian than BB/10). With a 150k hand sample, I don't think you have enough hands to get a graph that shows this, since if you went to, say, BB/1000, you would only have 150 sample points. But I'm pretty sure that at some point, it would look like a nice bell-shaped curve.

Edit: You can start to see at BB/50 how the graph is looking more Gaussian. Below BB/50 you have the nice feature that you have many sample points, BUT each sample point is not yet being taken from a very Gaussian distribution. Above BB/50 (BB/100 and up), you have the nice feature that the samples are being taken from a pretty Gaussian distribution, BUT you don't have enough samples to draw the curve. If your DB were much larger, I think you would see the BB/100 look much closer to Gaussian than the BB/50.

-v

[/ QUOTE ]

Ok you seem to know a lot about stats. When I first started looking into this I did so because I was wondering if the confidence interval calcs we've done in the past are accurate when dealing with BB/100. So if you have a winrate of x bb/100 after 100k hands or whatever, then we do a calc for a 95% or 99% or whatever confidence level we choose to find out where our true winrate most likely falls. Likewise we can do the same for the level of confidence that winrate > x.

So my question to you, is how accurate are these confidence intervals when dealing with a statistic that is not distributed normally?

MaxPower
12-08-2005, 01:01 AM
If the distribution is not normal but not much different from the normal distribution, then in practical terms I don't think it would make much difference. Even if it were not strictly accurate, it would be good enough and probably not worth doing all the extra work to get an accurate confidence interval.

I am still not convinced that win rates are not normally distributed around the mean.

If you want to base your confidence interval on the actual distribution, you might look into bootstrapping.

oreogod
12-08-2005, 03:27 AM
your pictures are huge, even on 1600*1200 it makes reading this thread no fun. If u can, resize them before u post next time.

jason_t
12-08-2005, 03:29 AM
[ QUOTE ]
your pictures are huge, even on 1600*1200 it makes reading this thread no fun. If u can, resize them before u post next time.

[/ QUOTE ]

He can resize them right now through photobucket.

DcifrThs
12-08-2005, 03:45 AM
[ QUOTE ]
If the distribution is not normal but not much different from the normal distribution, then in practical terms I don't think it would make much difference. Even if it were not strictly accurate, it would be good enough and probably not worth doing all the extra work to get an accurate confidence interval.

I am still not convinced that win rates are not normally distributed around the mean.

If you want to base you confidence interval on the actual distribution, you might look into bootstrapping.

[/ QUOTE ]

one thing that's important to consider is the nature of the process that drives the win rate of a given player or a pool of players.

in discrete time, it's easier to deal with, but when we move to continuous time, the driving force could be a set of stochastic processes which COULD nullify any inferences made from using the current normal distribution as a base for analysis.

basically, if random processes drive parts of winrate (one process could be how a given person does x, y, or z and have it based on randomness, or even have real-life-like jumps - like the poker graphs show - by making those processes brownian motions that accumulate quadratic variation at rate 1 per unit time), then we won't see the distribution as normal or even a good approximation unless ALL processes meet a few criteria:

-they all have to individually be random and not deterministic (though they can change over time, so long as its random)

-their drift/diffusion (mean/variance) must be adapted to the SAME information that drives the whole system

-they are jointly normally distributed

NOTE: these are some seriously strict conditions ... especially the last one.

if these are met then the distribution of the results of the process may approximate a normal distribution with some confidence.

either way, studies have shown that almost all biological/psychological phenomena are normally distributed or very easily and readily approximated by a normal distribution. since the win rate is driven largely by biological phenomena, it would seem as if on a large enough scale, the results of the win rate observations would converge to a normal distribution as well.

the whole thing is interesting and i like thinking about it, but i'm not good enough at all types of higher level math to write out a proof of this...

well, its bedtime.

Barron

stinkypete
12-08-2005, 04:38 AM
[ QUOTE ]
Hi Josh,

I'll take a stab at addressing a few points.

The fundamental Random Variable in poker is the amount of money you win on one hand. This random variable has a distribution, which is certainly not Gaussian.
First off, it's a discrete random variable. The mean is your winrate per hand. The max values it can take are +12BB and -12BB (on Party Poker). The most probable event is 0, since you fold most hands.
Other frequently occurring values are -0.50BB and -0.25BB since these are the values you lose when you fold your blinds, and maybe -1.5BB since this is how much you lose when you raise pre-flop, completely blank the flop, bet the flop, and get raised.

So we get a sense of what the probability mass function of this random variable looks like: It's centered at your winrate (say .02bb) but its peak value is at 0. Then it has smaller peaks at commonly occurring values, such as -0.50BB, -0.25BB, etc. It is, obviously, not a normal distribution.

The Central Limit Theorem tells us that if we ADD together enough of these strange random variables, the sum, regarded as a random variable, must start looking more and more Gaussian.

In your charts, when you group together a string of hands, you are adding all the random variables in each group, and this sum should start looking Gaussian the larger the group is (BB/1000 should look more Gaussian than BB/10). With a 150k hand sample, I don't think you have enough hands to get a graph that shows this, since if you went to, say, BB/1000, you would only have 150 sample points. But I'm pretty sure that at some point, it would look like a nice bell-shaped curve.

Edit: You can start to see at BB/50 how the graph is looking more Gaussian. Below BB/50 you have the nice feature that you have many sample points, BUT each sample point is not yet being taken from a very Gaussian distribution. Above BB/50 (BB/100 and up), you have the nice feature that the samples are being taken from a pretty Gaussian distribution, BUT you don't have enough samples to draw the curve. If your DB were much larger, I think you would see the BB/100 look much closer to Gaussian than the BB/50.

-v

[/ QUOTE ]

this is very well said, and based on my understanding of statistics, exactly correct.


the assumption here is that the win/loss per hand is an identically distributed random variable, which strictly speaking it is not, as dcifr mentioned. the distribution of the random variable will change based on game conditions, improvements in your play, tilt, who you're playing against, the number of dumps the guy in seat 6 has taken that day, etc. but these things shouldn't change the random variable so much that you can't approximate win rate per N hands where N is large as a normal distribution fairly accurately (the last point in particular has very little effect.)

12-08-2005, 05:36 AM
Hi Justin,

The confidence interval calcs that people do only apply to a Normal distribution with mean and variance equal to the mean and variance of our special BB/100 distribution.
Since our BB/100 distribution is not normal, these first two "moments" are not enough to figure out exactly what our confidence intervals are. So, our calculations introduce an error. How big is the error?

Let's go to the extreme and say that we were interested in BB/1 (BB/hand). Our mean, let's say, is 0.0200 (this would mean we had a winrate of 2.00 BB/100). And let's say our standard deviation is 1.5 (which would result in an SD/100 of 15). This means that a one sigma event would put us between -1.48 and 1.52 BB, and a five sigma event (which happens less than once in a million trials on average) would be between -7.48 BB and 7.52 BB. Clearly this is way off. We win (and even sometimes lose) more than 7.5 BB well over once in every million hands. So our confidence interval calcs for BB/1 are way off because the real poker distribution is nowhere near Normal and can NOT be approximated well using just its first two moments (mean and variance).

So where does that leave us with BB/100? How much error do we introduce in our confidence interval calculations by assuming BB/100 is Guassian? Good question. I'm not sure. I think you will still see remnants of the longer flat tail on the positive side and the shorter, steeper tail on the negative side (a by-product of the fact that you can win a lot more in one hand than you can lose). But I suspect it will be close enough to normal to not worry about it.
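If someone wants to actually measure the error on real data, one rough way is to compare the normal-theory interval against the empirical quantiles of the block winrates. A sketch ('block_rates' is assumed to be an array of BB/100 values, one per block):

```python
# One way to size the error: compare normal-theory interval endpoints
# with the empirical quantiles of the block winrates. 'block_rates' is
# assumed to be a numpy array of BB/100 values, one per block.
import numpy as np
from scipy.stats import norm

def ci_comparison(block_rates, conf=0.95):
    mu = block_rates.mean()
    sd = block_rates.std(ddof=1)
    z = norm.ppf(0.5 + conf / 2)
    normal_ci = (mu - z * sd, mu + z * sd)
    lo, hi = np.quantile(block_rates, [(1 - conf) / 2, (1 + conf) / 2])
    return normal_ci, (lo, hi)  # the gap between these is the error
```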

w_alloy
12-08-2005, 08:08 AM
vkh, great posts, and thanks to Justin and Josh for bringing this topic up.

I think this...

[ QUOTE ]

So where does that leave us with BB/100? How much error do we introduce in our confidence interval calculations by assuming BB/100 is Guassian? Good question. I'm not sure. I think you will still see remnants of the longer flat tail on the positive side and the shorter, steeper tail on the negative side (a by-product of the fact that you can win a lot more in one hand than you can lose). But I suspect it will be close enough to normal to not worry about it.

[/ QUOTE ]

is very worth studying. BB/100 and the associated confidence intervals are important to a lot of people. Finding out how far off our confidence intervals have been, and exactly what we should use as x for bb/x with a sample size y, are very worthy pursuits. Even if you suspect it's close enough to normal not to worry about, Josh's graph with (only?) 1500 datapoints suggests otherwise.

12-08-2005, 09:34 AM
If I am not mistaken, according to the central limit theorem they certainly must be normally distributed.

sfer
12-08-2005, 10:38 AM
[ QUOTE ]
It looks pretty close to normal here, but I think the way you did the sampling is not quite right.

You need to draw random samples from the total group of hands. Chopping them up into blocks is easier, but not appropriate. The way you have done it, we might find different results using a different database.

[/ QUOTE ]

You probably want to draw largish random blocks of hands to randomize position as much as possible between the samples.

MaxPower
12-08-2005, 11:58 AM
[ QUOTE ]
[ QUOTE ]
It looks pretty close to normal here, but I think the way you did the sampling is not quite right.

You need to draw random samples from the total group of hands. Chopping them up into blocks is easier, but not appropriate. The way you have done it, we might find different results using a different database.

[/ QUOTE ]

You probably want to draw largish random blocks of hands to randomize position as much as possible between the samples.

[/ QUOTE ]

OK, I figured out a way to do this in SPSS. I thought the data file would crash my computer, but it doesn't.

I have a data file with the amount I won/lost for 164,724 hands at 15/30. My win rate over these hands is a pitiful 1.13BB/100.

How large should the samples be and how many should I pull? I was thinking of selecting 10,000 samples of 1000 hands each.

Then I can plot them and get the skewness, kurtosis, etc.

IndieMatty
12-08-2005, 12:22 PM
[ QUOTE ]
[ QUOTE ]
[ QUOTE ]
It looks pretty close to normal here, but I think the way you did the sampling is not quite right.

You need to draw random samples from the total group of hands. Chopping them up into blocks is easier, but not appropriate. The way you have done it, we might find different results using a different database.

[/ QUOTE ]

You probably want to draw largish random blocks of hands to randomize position as much as possible between the samples.

[/ QUOTE ]

OK, I figured out a way to do this in SPSS. I thought the data file would crash my computer but it doesn't

I have a data file with the amount I won/lost for 164,724 hands at 15/30. My win rate over these hands is a pitiful 1.13BB/100.

How large should the samples be and how many should I pull? I was thinking of selecting 10,000 samples of 1000 hands each.

Then I can plot them and get the skewness, kurtosis, etc.

[/ QUOTE ]

I just had a grim college statistics flashback when you mentioned SPSS. And I am now scared of this thread. You guys are so much smarter than me. :(

danzasmack
12-08-2005, 12:47 PM
[ QUOTE ]
If I am not mistaken, according to the central limit theorem they certainly must be normally distributed.

[/ QUOTE ]

In order for the central limit theorem to be applicable, the sample's variates must be:

1) independent
and
2) distributed arbitrarily.

I think arguments can be made in support of both being true or false.

I also think that the nature of the player is very important, as well as the game they are playing (limit, NL).

I think an equally interesting question is, if you were to model BB/100, what would it be a function of?

MNpoker
12-08-2005, 01:07 PM
A sum of discrete trials will never become perfectly normal if the underlying discrete distribution is not normal. It will, however, become MORE normal as you increase the number of independent trials.

For example, say your individual hand distribution is:
-1 90% of the time
+11 10% of the time

If you run 100 trials there is still a .1^100 chance you will have won 11 * 100. (you win every trial) There is a zero chance you will have lost 11 * 100 (in fact your largest possible loss is 100).

So what you need to decide first is how close to normal do you need the aggregate results to be to be considered normal?
----------------------------------------------------------
On the changing parameters. (Good table, playing well, etc.) This will be VERY hard to empirically estimate, and will make the calculations more tedious.
----------------------------------------------------------

Some methods that may be useful in seeing how many hands you need to approximate a normal function would be:

Get every hand you have and put it into a discrete curve with the values as percentages. There won't be that many buckets. (You can lose up to 12 BB, but probably never have, and win up to 108BB <-- That I'd like to see). Most likely your range will be something like -8BB to +30BB with an interval size of .25 BB <-- The small blind.

Now you can either
A) Run X simulations off that curve and see if the result is 'normal'

Or

B) Do an FFT on the curve with X convolutions(?) and see if the outputs are 'normal'.

If the results are still too skewed to be called 'normal', increase X.
The FFT method has the advantage that it gives the 'true' distribution, but it's a pain to do (I would guess you will need somewhere in the neighborhood of 5,000 convolutions(?), and you certainly won't pull this off using FFT in Excel).
With the simulation method, if you get a 'normal' result you should probably still run it about 10 times (minimum) to make sure you didn't just get a 'normal' simulation.
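For the curious, the convolution route looks something like this outside Excel (a Python sketch; the per-hand PMF is a dummy, and you'd have to track the value grid and its offset yourself):

```python
# The convolution idea in Python: the exact distribution of the sum of
# N independent hands is the per-hand PMF convolved with itself N times
# (done here by repeated squaring with FFT convolutions). The PMF below
# is a dummy on a fixed grid; a real one would come from your database.
import numpy as np
from scipy.signal import fftconvolve

pmf = np.array([0.10, 0.07, 0.07, 0.60, 0.12, 0.04])  # dummy per-hand curve

def n_fold_pmf(pmf, n):
    result = np.array([1.0])  # PMF of an empty sum
    base = pmf
    while n:
        if n & 1:
            result = fftconvolve(result, base)
        base = fftconvolve(base, base)
        n >>= 1
    result = np.clip(result, 0.0, None)  # scrub tiny FFT round-off
    return result / result.sum()

dist_100 = n_fold_pmf(pmf, 100)  # distribution of BB won over 100 hands
```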

DcifrThs
12-08-2005, 01:14 PM
[ QUOTE ]
A sum of discrete trials will never become perfectly normal if the underlying discrete distribution is not normal. It will, however, become MORE normal as you increase the number of independent trials.

For example, say your individual hand distribution is:
-1 90% of the time
+11 10% of the time

If you run 100 trials there is still a .1^100 chance you will have won 11 * 100. (you win every trial) There is a zero chance you will have lost 11 * 100 (in fact your largest possible loss is 100).

So what you need to decide first is how close to normal do you need the aggregate results to be to be considered normal?
----------------------------------------------------------
On the changing parameters. (Good table, playing well, etc.) This will be VERY hard to empirically estimate, and will make the calculations more tedious.
----------------------------------------------------------

Some methods that may be useful in seeing how many hands you need to approximate a normal function would be:

Get every hand you have and put it into a discrete curve with the values as percentages. There won't be that many buckets. (You can lose up to 12 BB, but probably never have, and win up to 108BB <-- That I'd like to see). Most likely your range will be something like -8BB to +30BB with an interval size of .25 BB <-- The small blind.

Now you can either
A) Run X simulations off that curve and see if the result is 'normal'

Or

B) Do a FFT on the curve with X iterations and see if the outputs are 'normal'.

If the results are still too skewed to be called 'normal', increase X.
The FFT method has the advantage that it gives the 'true' distribution, but it's a pain to do; with the simulation method, if you get a 'normal' result you should probably still run it about 10 times (minimum) to make sure you didn't just get a 'normal' simulation.

[/ QUOTE ]

i think a more realistic hand distribution would be much closer to approaching normality.

lose 0sb with P1
lose 1sb with P2
lose .5sb with P3
lose 2 sbs with P4
lose 3sbs with P5
lose 4sbs with P6
lose 5sbs with P7
.
.
.
lose 12bbs with Pn

then the upside:
win 0 with Pa
win
.
.
.
win 55bbs with Pm

i think that would be better, but still, there is a skew due to the limited downside and positive upside.

but we're aggregating in such a way around the MEAN of this random variable over time which might be enough to make it fairly close to gaussian.

as vkh stated, "how close"? we don't know... we'd need more data than anybody here has to look at it.

Barron

MNpoker
12-08-2005, 01:21 PM
[ QUOTE ]

i think a more realistic hand distribution would be much closer to approaching normality.

[/ QUOTE ]

Where's Capt obvious?

Mine was just for example purposes.

BOTH will move towards normalcy; one will just be quicker than the other.

[ QUOTE ]

lose 0sb with P1
lose 1sb with P2
lose .5sb with P3
lose 2 sbs with P4
lose 3sbs with P5
lose 4sbs with P6
lose 5sbs with P7
.
.
.
lose 12bbs with Pn

then the upside:
win 0 with Pa
win
.
.
.
win 55bbs with Pm


[/ QUOTE ]

This is the table I was recommending to be set up. Then run convolutions.

Just like we will never know our win rates to the .00001 per 100, we will never know exactly where the distribution becomes normal.
That's just part of statistics and the reason people use confidence intervals.

The definition of 'enough' data is how much you have.

MaxPower
12-08-2005, 01:27 PM
[ QUOTE ]
[ QUOTE ]
[ QUOTE ]
It looks pretty close to normal here, but I think the way you did the sampling is not quite right.

You need to draw random samples from the total group of hands. Chopping them up into blocks is easier, but not appropriate. The way you have done it, we might find different results using a different database.

[/ QUOTE ]

You probably want to draw largish random blocks of hands to randomize position as much as possible between the samples.

[/ QUOTE ]

OK, I figured out a way to do this in SPSS. I thought the data file would crash my computer but it doesn't

I have a data file with the amount I won/lost for 164,724 hands at 15/30. My win rate over these hands is a pitiful 1.13BB/100.

How large should the samples be and how many should I pull? I was thinking of selecting 10,000 samples of 1000 hands each.

Then I can plot them and get the skewness, kurtosis, etc.

[/ QUOTE ]

I just ran 1000 samples of 1000 hands each from this list, and from the graph and statistics I am convinced that the win rates for these samples are normally distributed.

I can run a larger job overnight and then post some pretty pictures and stats for you.

sfer
12-08-2005, 01:37 PM
[ QUOTE ]
[ QUOTE ]
[ QUOTE ]
It looks pretty close to normal here, but I think the way you did the sampling is not quite right.

You need to draw random samples from the total group of hands. Chopping them up into blocks is easier, but not appropriate. The way you have done it, we might find different results using a different database.

[/ QUOTE ]

You probably want to draw largish random blocks of hands to randomize position as much as possible between the samples.

[/ QUOTE ]

OK, I figured out a way to do this in SPSS. I thought the data file would crash my computer but it doesn't

I have a data file with the amount I won/lost for 164,724 hands at 15/30. My win rate over these hands is a pitiful 1.13BB/100.

How large should the samples be and how many should I pull? I was thinking of selecting 10,000 samples of 1000 hands each.

Then I can plot them and get the skewness, kurtosis, etc.

[/ QUOTE ]

I think 1K hand chunks is probably large enough.

12-08-2005, 02:16 PM
[ QUOTE ]
Josh's graph with (only?) 1500 datapoints suggests otherwise.

[/ QUOTE ]

Actually it doesn't suggest otherwise. It just doesn't suggest anything. If you look closely at the BB/100 graph you will see that it is much too granular to draw any sort of conclusion on convergence properties (how closely it approximates normal). There are only 38 different points on the x-axis, each of which has a frequency value between 0 and 11. To even have a shot at drawing a reasonable conclusion we would need something more like the BB/10 granularity at the BB/100 level. This would require a 1.5 million hand DB instead of a 150k hand DB.

I do agree with you, however, that this is interesting and is worth looking into. And I think that Josh's method (looking at empirical data) is the best way to do it because we don't have an accurate theoretical model for a hold'em probability distribution.

ScottyP431
12-08-2005, 03:54 PM
Max Power,

Isn't what you are describing a sampling distribution of the mean? Doesn't that always produce a normal distribution even if the population distribution is not normal?

stinkypete
12-08-2005, 04:08 PM
[ QUOTE ]
Max Power,

Isn't what you are describing a sampling distribution of the mean? Doesn't that always produce a normal distribution even if the population distribution is not normal?

[/ QUOTE ]

i don't know what it's called, but his method will produce a normal distribution regardless of what the data is, if the samples are large enough.

jetsonsdogcanfly
12-08-2005, 04:26 PM
I don't think that tells you anything particularly useful. A distribution of randomly sampled means will necessarily converge to normality, but doesn't give you any new volatility information.

jetsonsdogcanfly
12-08-2005, 05:28 PM
I think a better way to do this would be to run some ARMA models with component GARCH disturbances. This should be run on the time series of bb/x blocked returns. The confidence interval can be computed using http://i35.photobucket.com/albums/d177/jetsonsdogcanfly/adjustedCIcopy.jpg

with gamma1 = skewness, gamma2 = kurtosis, and C(sub-alpha) = the standard normal critical value corresponding to the confidence level.

Piece of Cake.
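In code, assuming the pictured formula is the standard Cornish-Fisher expansion (an assumption on my part, but it matches the symbols described above):

```python
# Sketch, ASSUMING the pictured formula is the standard Cornish-Fisher
# expansion: adjust the normal critical value using gamma1 (skewness)
# and gamma2 (excess kurtosis).
from scipy.stats import norm

def cornish_fisher_z(alpha, gamma1, gamma2):
    z = norm.ppf(1 - alpha)  # standard normal critical value
    return (z
            + (z**2 - 1) * gamma1 / 6
            + (z**3 - 3 * z) * gamma2 / 24
            - (2 * z**3 - 5 * z) * gamma1**2 / 36)
```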

mmbt0ne
12-08-2005, 05:46 PM
Josh, you have Arena open. I can see it. Save a huge text file, load it into the Input Analyzer, and see what kind of p-values it gives you for all the different distributions it tests for.

FWIW, I tried to say this earlier and pete made me doubt it a little with his well-thought-out responses. However, I still want to do more tests on it.

MaxPower
12-08-2005, 06:10 PM
[ QUOTE ]
Max Power,

Isn't what you are describing a sampling distribution of the mean? Doesn't that always produce a normal distribution even if the population distribution is not normal?

[/ QUOTE ]

You guys are correct. I just basically demonstrated the Central Limit Theorem.

It is not at all useful. It tells us what we already know, but I think that Josh's method was flawed and led him to the wrong conclusion.

The sampling distribution of win rates is normally distributed and it is appropriate to use that distribution to calculate confidence intervals. This is just an empirical demonstration.

stinkypete
12-08-2005, 06:16 PM
[ QUOTE ]

It is not at all useful. It tells us what we already know, but I think that Josh's method was flawed and led him to the wrong conclusion.

[/ QUOTE ]

while josh's method is flawed in a purely mathematical sense, i think his random variable, profit/hand (or profit/orbit if you prefer), is close enough to independent from hand to hand that his results will converge to something very close to a normal distribution for N>100, if he has enough hands.

the major problem here, as has been pointed out, is that the posted graphs aren't based on nearly enough hands.

Shillx
12-08-2005, 07:53 PM
Hey,

MaxPower's post is the solution for all of the problems that we found with PT's sampling errors. It makes no difference how big your database is, because you can draw random 100-hand samples (picked one hand at a time) from it a near infinite number of times. The old way you could only have 1000 x 100 hand samples if you played 100k hands. The new way, there are 100000 nCr 100 ways to draw 100-hand samples (this number is [censored] huge, 200 nCr 100 is like 10^58). If you take millions of these, you could both get a better picture of what the distribution looks like and also figure out what your true standard deviation is with a very large level of confidence.
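(sanity-checking those counts, since Python does exact integers:)

```python
# Checking the counting claims with exact integer arithmetic.
from math import comb

print(comb(200, 100))      # about 9.05e58, i.e. "like 10^58"
print(comb(100_000, 100))  # ways to pick 100 hands from 100k: about 1e342
```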

stinkypete
12-08-2005, 07:58 PM
[ QUOTE ]
Hey,

MaxPower's post is the solution for all of the problems that we found with PT's sampling errors. It makes no difference how big your database is, because you can draw random 100-hand samples (picked one hand at a time) from it a near infinite number of times. The old way you could only have 1000 x 100 hand samples if you played 100k hands. The new way, there are 100000 nCr 100 ways to draw 100-hand samples (this number is [censored] huge, 200 nCr 100 is like 10^58). If you take millions of these, you could both get a better picture of what the distribution looks like and also figure out what your true standard deviation is with a very large level of confidence.

[/ QUOTE ]

how would this give you any new standard deviation data? i can't prove it off the top of my head, but i'm pretty sure this would give the same standard deviation PT would give you over the 100k hands.

EDIT: this assumes PT calculates SD on a per hand basis, which i believe it does, but i could be wrong.

Shillx
12-08-2005, 08:14 PM
Well you can't calculate SD accurately on a per hand basis because the distribution isn't close to normal. You will have a lot of hands that lose a small amount, and then a long tail on the positive side (it will look more like an F-curve or even a chi-square distribution). The reason why we do it in 100-hand blocks is to get a more normal distribution.

I always figured that PT did it by sessions (or tables played) and not by # of hands. I remember one time I put in a couple big sessions (maybe 500 hands each) and it gave some error like "not enough sessions to calculate SD". So I deleted them and put in 5 or 6 small 30-50 hand sessions and it calculated an SD for me. So while I'm not certain, what it might be doing is finding the SD over the course of each session and then normalizing it to a BB/100 value. This seems like a very poor way to figure out SD, so I could be wrong (and hope I'm wrong quite honestly).

stinkypete
12-08-2005, 08:39 PM
woops, my example sucked. deleted. fixing it. attempt 2 here:

[ QUOTE ]
Well you can't calculate SD accurately on a per hand basis because the distribution isn't close to normal.

[/ QUOTE ]

the definition of standard deviation doesn't rely on a normal distribution in any way. you can calculate the standard deviation of any distribution. its meaning obviously varies depending on the distribution though.

Example:

1 hand:
90% -1
10% +9

SDa = sqrt(0.9*1^2 + 0.1*9^2) = sqrt(9) = 3
(the mean here is 0.9*(-1) + 0.1*(+9) = 0, so the variance is just the expected squared result)

100 hands (expected counts: 90 hands of -1, 10 hands of +9):
SDb = sqrt(90*1^2 + 10*9^2) = sqrt(900) = 30

and SDb = sqrt(100)*SDa

so i ask, what is the problem with calculating the SD on a per hand basis?
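and a quick simulation sanity check of that scaling, if you don't trust the algebra:

```python
# Simulation check of SDb = sqrt(100)*SDa for the toy hand above
# (90% lose 1, 10% win 9; per-hand SD is 3, so SD per 100 should be 30).
import numpy as np

rng = np.random.default_rng(0)
hands = rng.choice([-1.0, 9.0], size=(100_000, 100), p=[0.9, 0.1])

print(hands.std())               # per-hand SD, ~3
print(hands.sum(axis=1).std())   # SD of 100-hand totals, ~30
```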

Justin A
12-08-2005, 09:39 PM
[ QUOTE ]
[ QUOTE ]
[ QUOTE ]
[ QUOTE ]
It looks pretty close to normal here, but I think the way you did the sampling is not quite right.

You need to draw random samples from the total group of hands. Chopping them up into blocks is easier, but not appropriate. The way you have done it, we might find different results using a different database.

[/ QUOTE ]

You probably want to draw largish random blocks of hands to randomize position as much as possible between the samples.

[/ QUOTE ]

OK, I figured out a way to do this in SPSS. I thought the data file would crash my computer but it doesn't

I have a data file with the amount I won/lost for 164,724 hands at 15/30. My win rate over these hands is a pitiful 1.13BB/100.

How large should the samples be and how many should I pull? I was thinking of selecting 10,000 samples of 1000 hands each.

Then I can plot them and get the skewness, kurtosis, etc.

[/ QUOTE ]

I just ran 1000 samples of 1000 hands each from this list and from the graph and statistics, I am convinced that the win rates for these samples are normally distributed.

I can run a larger job overnight and then post some pretty pictures and stats for you.

[/ QUOTE ]

Max,
Can you do this with sets of 100 hands? I understand that once the sample sizes get large enough the sampling distribution will be about normal, but I'm wondering how close the bb/100 statistic gets us there.

stinkypete
12-08-2005, 10:30 PM
[ QUOTE ]

Max,
Can you do this with sets of 100 hands? I understand that once the sample sizes get large enough the sampling distribution will be about normal, but I'm wondering how close the bb/100 statistic gets us there.

[/ QUOTE ]

this is a good suggestion.

sfer
12-08-2005, 10:56 PM
[ QUOTE ]
[ QUOTE ]

Max,
Can you do this with sets of 100 hands? I understand that once the sample sizes get large enough the sampling distribution will be about normal, but I'm wondering how close the bb/100 statistic gets us there.

[/ QUOTE ]

this is a good suggestion.

[/ QUOTE ]

I don't think so. 100 hands is 10 orbits of full ring, and having a 100 hand block with 3 more button hands or 5 more Big Blind hands is very significant. I think you need large chunks in order to mitigate that.

Justin A
12-08-2005, 11:08 PM
[ QUOTE ]
[ QUOTE ]
[ QUOTE ]

Max,
Can you do this with sets of 100 hands? I understand that once the sample sizes get large enough the sampling distribution will be about normal, but I'm wondering how close the bb/100 statistic gets us there.

[/ QUOTE ]

this is a good suggestion.

[/ QUOTE ]

I don't think so. 100 hands is 10 orbits of full ring, and having a 100 hand block with 3 more button hands or 5 more Big Blind hands is very significant. I think you need large chunks in order to mitigate that.

[/ QUOTE ]

You just made an argument for running the sims with 100 hand sets.

stinkypete
12-08-2005, 11:13 PM
[ QUOTE ]
[ QUOTE ]
[ QUOTE ]

Max,
Can you do this with sets of 100 hands? I understand that once the sample sizes get large enough the sampling distribution will be about normal, but I'm wondering how close the bb/100 statistic gets us there.

[/ QUOTE ]

this is a good suggestion.

[/ QUOTE ]

I don't think so. 100 hands is 10 orbits of full ring, and having a 100 hand block with 3 more button hands or 5 more Big Blind hands is very significant. I think you need large chunks in order to mitigate that.

[/ QUOTE ]

that's a good point. rather than selecting hands at random, he could do something like filtering for 10-handed games only and selecting 10 hands from each position for each 100 hand sample. posting in the cutoff (or anywhere outside of the blinds) screws things up too, so those hands could be ignored. unfortunately that will get rid of a lot of the hands, since they're probably not nearly all at full 10-handed tables. there's a number of things similar to this that you could do to improve the approximation - this is probably the simplest, but as i said, you lose a bunch of data.
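something like this, roughly (a Python sketch; it assumes you've already filtered down to 10-handed, non-posting hands tagged with a position 0-9):

```python
# Sketch of the position-stratified idea: build each synthetic 100-hand
# block from 10 random hands in each of the 10 positions. Assumes
# 'hands' is a list of (position, bb_result) pairs, positions 0-9,
# already filtered to full 10-handed tables with no posting hands.
import random
from collections import defaultdict

def stratified_block(hands, per_position=10, rng=random):
    by_pos = defaultdict(list)
    for pos, result in hands:
        by_pos[pos].append(result)
    total = 0.0
    for pos in range(10):
        total += sum(rng.sample(by_pos[pos], per_position))
    return total  # BB won over one synthetic 100-hand block
```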

stinkypete
12-08-2005, 11:19 PM
[ QUOTE ]

You just made an argument for running the sims with 100 hand sets.

[/ QUOTE ]

i think his point was that the samples don't provide a realistic approximation of 100 hand blocks since you could easily have, for example, 15 big blind hands, which you almost never would in an actual 100 hand block, so the distributions wouldn't be entirely comparable.

i don't think it really makes much of a difference though.

Justin A
12-08-2005, 11:21 PM
[ QUOTE ]
[ QUOTE ]

You just made an argument for running the sims with 100 hand sets.

[/ QUOTE ]

i think his point was that the samples don't provide a realistic approximation of 100 hand blocks since you could easily have, for example, 15 big blind hands, which you almost never would in an actual 100 hand block, so the distributions wouldn't be entirely comparable.

i don't think it really makes much of a difference though.

[/ QUOTE ]

Oh I get it now. Yeah that complicates things a bit.

MaxPower
12-08-2005, 11:40 PM
[ QUOTE ]
[ QUOTE ]

You just made an argument for running the sims with 100 hand sets.

[/ QUOTE ]

i think his point was that the samples don't provide a realistic approximation of 100 hand blocks since you could easily have, for example, 15 big blind hands, which you almost never would in an actual 100 hand block, so the distributions wouldn't be entirely comparable.

i don't think it really makes much of a difference though.

[/ QUOTE ]


Picking hands randomly is a better way to control for these extraneous factors.

I could run 100 hand blocks, but I'm playing poker right now and it will crash my machine. I'll do it at work.

I don't know why you guys are so hung up on 100 hands. It is just an arbitrary number.

stinkypete
12-09-2005, 12:02 AM
[ QUOTE ]

Picking hands randomly is a better way to control for these extraneous factors.


[/ QUOTE ]

why do you say that? it's a better way to control for factors like game conditions and tilt, but i don't see how it's a better way to control for position/blinds.

[ QUOTE ]

I could run 100 hand blocks, but I'm playing poker right now and it will crash my machine. I'll do it at work.

I don't know why you guys are so hung up on 100 hands. It is just an arbitrary number.

[/ QUOTE ]

blame it on pokertracker pat.

Justin A
12-09-2005, 02:15 AM
[ QUOTE ]


I don't know why you guys are so hung up on 100 hands. It is just an arbitrary number.

[/ QUOTE ]

PT does everything in BB/100, so I'd like to know the significance of this stat over certain sample sizes.

12-09-2005, 09:53 AM
[ QUOTE ]
It looks pretty close to normal here, but I think the way you did the sampling is not quite right.

You need to draw random samples from the total group of hands. Chopping them up into blocks is easier, but not appropriate. The way you have done it, we might find different results using a different database.

[/ QUOTE ]

Exactly.

It looks like the effect that you're seeing is caused by the fact that playing badly costs you more than playing well earns you. Your $/hand should still be normally distributed, but not if you break it up into temporal blocks, because you're more likely to see effects of tilt or playing poorly (no offense intended) or bad tables, etc.

MaxPower
12-09-2005, 12:27 PM
OK, I did run 10,000 samples of 100 hands each last night and I can post the results. But before I do, I want to make a point. I do not think the results of my simulation have any practical implications.

Why do we keep track of BB/100 and SD/100? We use these to determine our bankroll needs, how much we can expect to win (confidence intervals), how long one can break even, etc.

I don't know if Mason Malmuth was the first to apply these concepts, but he certainly popularized it. I assume that BB/hour and SD/hour were used in order to simplify record keeping and computation. It could have been done per hand, but then you would need to keep track of how many hands you played.

With the advent of internet multitabling, BB/hour was replaced by BB/100, but once again the choice of 100 hands was arbitrary.

We could keep track of win rate and SD on a per hand basis and it would work just as well.

Obviously, the win rates for individual hands are not normally distributed (since you win/lose zero on a majority of your hands), but that does not matter.

What matters is your total sample size, how you compute your test statistic is not important (as long as it is accurate and consistent). We could make it BB/hand, BB/10, BB/1000, or BB/134 and it wouldn't matter.

So if I play 20,000 hands and my BB/100 is 1.5, I know that the sampling distribution for samples of that size is normally distributed.

The fact that the sd is based on 100 hands is also irrelevant, because we use the standard error to compute confidence intervals and that takes the number of hands played into account.
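For example, in code (a sketch; the 1.5BB/100 over 20,000 hands is the example above, while the SD of 15BB/100 is just an assumed value):

```python
# The standard-error point in code: the interval width depends on total
# hands played, not on the per-100 bookkeeping unit. The 1.5 BB/100 and
# 20,000 hands come from the example above; the SD of 15 is assumed.
from math import sqrt
from scipy.stats import norm

def winrate_ci(wr100, sd100, hands_played, conf=0.95):
    se = sd100 / sqrt(hands_played / 100)  # standard error of BB/100
    z = norm.ppf(0.5 + conf / 2)
    return wr100 - z * se, wr100 + z * se

print(winrate_ci(1.5, 15.0, 20_000))  # about (-0.58, 3.58)
```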

I'm not certain about the bankroll formulas, but I'm pretty sure that it is the same concept.

It has been about 8 years since I studied stats and I am a little out of practice, so please correct me if I am wrong here.

I will post the results of the simulation for those that are interested.

MaxPower
12-09-2005, 01:40 PM
OK, this is 10,000 random samples of 100 hands each drawn from 164,724 hands at 15/30 with a win rate of 1.13BB/100.

First, the Descriptive Statistics:

The skewness is positive, and the ratio of the skewness to the standard error of the skewness indicates that it is different from the normal distribution. The distribution is positively skewed - as you can see, the mean is higher than the median.

The same is true for the kurtosis. The values are more closely clustered about the mean than in a normal distribution.
http://i3.photobucket.com/albums/y56/nahthanoJ/descriptives.jpg

This is a test of normality. The significance value indicates that this distribution differs significantly from normal.


http://i3.photobucket.com/albums/y56/nahthanoJ/normality.jpg


These are the extreme sample values:
http://i3.photobucket.com/albums/y56/nahthanoJ/extremevalues.jpg

This is a histogram of the distribution with a normal curve superimposed:
http://i3.photobucket.com/albums/y56/nahthanoJ/a096890f.jpg


These are some additional fun plots for the geeks out there:

http://i3.photobucket.com/albums/y56/nahthanoJ/9e85ceb4.jpg
http://i3.photobucket.com/albums/y56/nahthanoJ/00a2c629.jpg
http://i3.photobucket.com/albums/y56/nahthanoJ/11b6c615.jpg

DcifrThs
12-09-2005, 03:34 PM
excellent post...i was thinking of the "normality test" but couldn't think of the kolmogorov smirnov name...

PS- even in their self-named tests, the russians represent smirnov :)

Barron

edtost
12-09-2005, 04:29 PM
from looking at the q-q plot, it seems that the upper tail of the poker data is fatter than the gaussian, and the lower is thinner. shouldn't this result in large downswings happening less often than a normal assumption would predict?

i need to spend some more time thinking about this, the repetition of trials inherent in poker makes this more complicated than a standard VaR calculation.

Justin A
12-09-2005, 04:31 PM
Nice work Max, thanks a lot.

[ QUOTE ]
So if I play 20,000 hands and my BB/100 is 1.5, I know that the sampling distribution for samples of that size is normally distributed.

[/ QUOTE ]

Don't the numbers you posted indicate that this is not the case? Sampling distributions are going to approximate normal with n sufficiently large. What your numbers seem to show is that n=100 is not large enough.

Correct me if I'm wrong, but you've shown that the sampling distribution for bb/100 is decidedly not normal. What I want to know is how does this affect confidence intervals?

MaxPower
12-09-2005, 04:50 PM
[ QUOTE ]
Nice work Max, thanks a lot.

[ QUOTE ]
So if I play 20,000 hands and my BB/100 is 1.5, I know that the sampling distribution for samples of that size is normally distributed.

[/ QUOTE ]

Don't the numbers you posted indicate that this is not the case? Sampling distributions are going to approximate normal with n sufficiently large. What your numbers seem to show is that n=100 is not large enough.

Correct me if I'm wrong, but you've shown that the sampling distribution for bb/100 is decidedly not normal. What I want to know is how does this affect confidence intervals?

[/ QUOTE ]

If your BB/100 is based on exactly 100 hands, then the sampling distribution is not normal. If it is based on thousands of hands, then it will be.

The BB/100 is just a measure, don't get hung up on it. We can measure height in feet, inches, centimeters, etc. Regardless of the units we use your height stays the same and the distribution stays the same.

MaxPower
12-09-2005, 04:55 PM
[ QUOTE ]
Nice work Max, thanks a lot.

[ QUOTE ]
So if I play 20,000 hands and my BB/100 is 1.5, I know that the sampling distribution for samples of that size is normally distributed.

[/ QUOTE ]

Don't the numbers you posted indicate that this is not the case? Sampling distributions are going to approximate normal with n sufficiently large. What your numbers seem to show is that n=100 is not large enough.

Correct me if I'm wrong, but you've shown that the sampling distribution for bb/100 is decidedly not normal. What I want to know is how does this affect confidence intervals?

[/ QUOTE ]

There is no such thing as a sampling distribution for BB/100 (the measure), only a sampling distribution for samples of size X (in this case, 100). If I did the same analysis using BB/hand or BB/1000 or bb/21, I would find the exact same results.

MaxPower
12-09-2005, 04:59 PM
[ QUOTE ]
from looking at the q-q plot, it seems that the upper tail of the poker data is fatter than the gaussian, and the lower is thinner. shouldn't this result in large downswings happening less often than a normal assumption would predict?

i need to spend some more time thinking about this, the repetition of trials inherent in poker makes this more complicated than a standard VaR calculation.

[/ QUOTE ]

These are based on 100-hand samples, so in order to win a lot or lose a lot you probably need to play some big pots. Typically, you win more bets when you take down a big pot than you lose when you lose a big pot (I am talking about multiway pots here). So the big wins should be bigger than the big losses.

This has got me curious. I will run another one using a very large sample size and see what it looks like.

stinkypete
12-09-2005, 07:57 PM
[ QUOTE ]
from looking at the q-q plot, it seems that the upper tail of the poker data is fatter than the gaussian, and the lower is thinner. shouldn't this result in large downswings happening less often than a normal assumption would predict?

[/ QUOTE ]

yes, but not really. what it really suggests is that really bad 100-hand stretches would happen less often than a normal distribution would predict. big downswings occur over 10,000 or more hands, and the sample size should reflect that if you want to know how likely big downswings are. over that many hands though, it should be very very close to a normal distribution, considering that this is quite close as well (even though everyone's saying it's not for some reason... just look at the damn graph, it's pretty close).

anyway, i say "yes, but not really" because the chart with 10,000-hand samples should be shaped somewhere between this and a normal distribution. it should differ from a normal distribution in the same way, though not by nearly as much. (someone correct me if i've misunderstood on this)

on the other hand, while the shape suggests that really terrible downswings should happen less than in a normal distribution, it also suggests that pain-in-the-ass break-even streaks should be longer on average than a normal distribution suggests, since that's what will happen when you don't get any of those rare, winrate boosting surges that are illustrated on the far right of the graph. you can see this by noting that the graph is "fatter" on the left side of the mean than the right side (the bars near the mean are above the normal on the left side, and below the normal on the right side).

edited some errors. i don't know why i can't type what i'm thinking the first time.

BillC
12-18-2005, 09:43 PM
I think what this excellent line of research will show is that BB/100 is both:

-approximately normal, but
-significantly different from normal.