Sample sizes


threeonefour
07-16-2004, 10:09 PM
People on here keep saying that 25,000 or even 75,000 hands will not always give you an accurate assessment of your profitability (BB/100).

Has anyone actually computed whether that is statistically true? I am a stats minor in college (although it's been a while :-/) and I think I would be capable of testing it, but I personally don't have access to 20,000 hands.

However, without even testing it, my gut tells me that is crazy. I am constantly doing similar statistical studies and getting very accurate results, with minuscule p-values and such, with far less data to work with.

It is obvious that your BB/100 is statistically significant at 20,000 hands. I am just wondering how many hands must be played to have a 95% confidence interval of less than 1 BB. I searched for the answer but I couldn't find it. Surely someone has figured it out?

threeonefour
07-16-2004, 10:10 PM
The type of poker probably doesn't matter that much, but this question is particularly targeted towards limit hold'em.

astroglide
07-16-2004, 10:22 PM
I would say REAL bb/100 probably starts around 300k hands, based on my experience with fluctuations.

BruceZ
07-16-2004, 10:31 PM
[ QUOTE ]
People on here keep saying that 25,000 or even 75,000 hands will not always give you an accurate assessment of your profitability (BB/100).

Has anyone actually computed whether that is statistically true? I am a stats minor in college (although it's been a while :-/) and I think I would be capable of testing it, but I personally don't have access to 20,000 hands.

However, without even testing it, my gut tells me that is crazy. I am constantly doing similar statistical studies and getting very accurate results, with minuscule p-values and such, with far less data to work with.

It is obvious that your BB/100 is statistically significant at 20,000 hands. I am just wondering how many hands must be played to have a 95% confidence interval of less than 1 BB. I searched for the answer but I couldn't find it. Surely someone has figured it out?

[/ QUOTE ]

Be sure to search the recent archives too. You can look for "standard error". Here is a repost of one of my old posts.

What you want to look at is the standard deviation of your hourly RATE. This is called the standard error or SE, and it is equal to SD/sqrt(N), where SD is your standard deviation for 1 hour and N is the number of hours played. The SE works just like a standard deviation, in that your actual win rate will lie within 1 SE of your true hourly rate with 68% probability, within 1.3*SE with 80% probability, 1.6*SE with 90% probability, 2*SE with 95% probability, etc.
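For anyone who wants to check those multipliers, here is a minimal Python sketch. It assumes results are approximately normally distributed; the function names `standard_error` and `z_multiplier` are made up for illustration and are not from any poker tool.

```python
from math import sqrt
from statistics import NormalDist


def standard_error(sd_per_unit, n_units):
    """SE of the observed win rate: SD for one unit divided by sqrt(units played)."""
    return sd_per_unit / sqrt(n_units)


def z_multiplier(confidence):
    """Two-sided normal multiplier for a given confidence level (assumes normality)."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


# Compare the rounded multipliers quoted above with the normal-distribution values.
for conf in (0.68, 0.80, 0.90, 0.95):
    print(f"{conf:.0%}: {z_multiplier(conf):.2f} * SE")
```

The values come out close to the rounded 1, 1.3, 1.6 and 2 quoted above, so the shortcuts are fine for back-of-the-envelope work.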

Example: If your SD for 1 hour is 10 bb, and you play for 400 hours, then your SE is 10/sqrt(400) = 0.5 bb. Then to 90% confidence, your win rate will be accurate to within 1.6*0.5 = 0.8 bb. If your win rate is 1 bb/hr, your 90% confidence interval will be 1 bb +/- 0.8 bb. That is a wide range, so suppose we want it to be accurate to within 0.5 bb, or 50% of the win rate. Then from the above equation, you can see you would have to play until 1.6*10/sqrt(N) = 0.5, so N = 1024 hours. What if you wanted it to within 25%? Then you would have to play 4096 hours (4 times as long as for 50%). For 10% accuracy, you need to play 25,600 hours, or over 12 years of full-time play. You can see the number of hours increases rapidly with the reduction in range. If you are willing to accept a lower confidence, these hours can be reduced.
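The arithmetic in that example can be reproduced with a short Python sketch. It just solves multiplier*SD/sqrt(N) = half-width for N; the helper name `units_needed` is hypothetical, and the inputs are the figures from the example (SD of 10 bb for 1 hour, multiplier of 1.6 for 90% confidence).

```python
from math import ceil


def units_needed(sd_per_unit, multiplier, half_width):
    """Smallest N with multiplier * SD / sqrt(N) <= half_width."""
    return ceil((multiplier * sd_per_unit / half_width) ** 2)


# Half-widths of 0.5, 0.25 and 0.1 bb are 50%, 25% and 10% of a 1 bb/hr win rate.
for half_width in (0.5, 0.25, 0.1):
    hours = units_needed(10, 1.6, half_width)
    print(f"+/- {half_width} bb at 90% confidence: {hours} hours")
```

Up to rounding, this gives the 1024, 4096 and 25,600 hours quoted above. Halving the range quadruples the hours because N grows with the square of 1/half-width.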

threeonefour
07-16-2004, 10:34 PM
I thought about the idea some more and I seem to have made myself more confused.

Well, I guess if you are doing BB/100 over 20,000 hands you would have 200 sets of 100. 200 samples is nothing special, I suppose. But is that right? Because you really have 20,000 samples and you have just grouped (stratified is perhaps the wrong word) the data into blocks for your convenience.

Actually, why do you have to do the /100 at all? Couldn't you do an analysis of variance or something on a hand-for-hand basis?

threeonefour
07-16-2004, 10:42 PM
Doing calculations on a per-hour basis may be inflating the standard deviation, since there is a standard deviation associated with the number of hands per hour.

Wouldn't it be better to do it in relation to the hands themselves?

BruceZ
07-16-2004, 10:48 PM
[ QUOTE ]
Doing calculations on a per-hour basis may be inflating the standard deviation, since there is a standard deviation associated with the number of hands per hour.

Wouldn't it be better to do it in relation to the hands themselves?

[/ QUOTE ]

It makes no difference what units you choose to use for the SD. When you compute the SD, you factor in the number of hours in each session. See Mason's essay in the essay section for the correct formula to do this. This will compute SD in units of $/sqrt(hr), but you can convert this to bb/sqrt(100 hands), bb/sqrt(hands), or whatever. This just involves a scaling factor.
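As a rough illustration of that scaling factor, here is a Python sketch that converts an hourly SD to a per-100-hands SD and checks that the standard error of the win rate agrees in both units. It assumes hands are independent and played at a constant pace; the 60 hands per hour is a hypothetical figure chosen only for the example, while the SD of 10 bb and the 400 hours come from the earlier example.

```python
from math import sqrt

# Figures from the earlier example: SD of 10 bb for 1 hour, 400 hours played.
sd_per_hour = 10.0
hours = 400

# HYPOTHETICAL constant pace, chosen only for illustration.
hands_per_hour = 60

# Variance adds across independent hands, so SD scales with the square root
# of the block size: 100 hands is 100/60 of an hour.
sd_per_100 = sd_per_hour * sqrt(100 / hands_per_hour)

# Standard error of the win rate, computed in each unit.
se_per_hour = sd_per_hour / sqrt(hours)
blocks = hours * hands_per_hour / 100
se_per_100 = sd_per_100 / sqrt(blocks)

# Converting the hourly SE to bb/100 gives the same number.
se_converted = se_per_hour * 100 / hands_per_hour

print(f"SD: {sd_per_100:.2f} bb per 100 hands")
print(f"SE: {se_per_100:.3f} bb/100 vs {se_converted:.3f} bb/100 converted from hourly")
```

The two SE figures match, which is the sense in which the choice of unit makes no difference. If the pace varies a lot from hour to hour, the constant-pace assumption breaks down and recording hands directly is the safer route.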

threeonefour
07-16-2004, 10:54 PM
I used the formula you listed to calculate the sample size required to be +/- 1 BB.

The data came from BigBaitsim (milo)'s earlier post:

"assuming a win rate of 2.64BB/100 hands and a SD of 17.97BB/100 hands, which are my stats over the past 25K hands."

1.6*17.95/SQRT(N)=1, so N=825. So to be +/- 1 BB you would have to play 825,000 hands (given the above SD, which might be inflated)? Is this statistically correct?

BruceZ
07-16-2004, 10:55 PM
Sorry, I misread your point. Yes, if the number of hands per hour varies significantly, and you didn't want that to affect your SD, it would be better to record the number of hands per sample instead of the number of hours. Your SD would then be in units of bb/sqrt(hands) or bb/sqrt(100 hands).

threeonefour
07-16-2004, 10:57 PM
How can this be true? Sometimes I play 75 hands in an hour, sometimes 50. This has to swing the fluctuations of my BB/hr. In calculating BB/100 you are always playing 100 hands, so wouldn't that be more accurate in determining your return per hand? Obviously, if you are +EV you can expect to make more money per hour when playing 75 hands per hour than when playing 50 hands per hour.

threeonefour
07-16-2004, 10:57 PM
OK, gotcha. Ignore my other reply then, unless it makes no sense. :)

BruceZ
07-16-2004, 11:01 PM
[ QUOTE ]
I used the formula you listed to calculate the sample size required to be +/- 1 BB.

The data came from BigBaitsim (milo)'s earlier post:

"assuming a win rate of 2.64BB/100 hands and a SD of 17.97BB/100 hands, which are my stats over the past 25K hands."

1.6*17.95/SQRT(N)=1, so N=825. So to be +/- 1 BB you would have to play 825,000 hands (given the above SD, which might be inflated)? Is this statistically correct?

[/ QUOTE ]

Except it's 82,500 hands.
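The factor of 10 comes from the units: the SD is per 100 hands, so N counts 100-hand blocks, not thousands of hands. A small Python sketch of the conversion, using the quoted SD of 17.97 BB/100 and the rounded multipliers 1.6 (90%) and 2 (95%); the helper name `hands_needed` is hypothetical.

```python
def hands_needed(sd_per_100, multiplier, half_width):
    """Hands needed so that multiplier * SD / sqrt(blocks) equals half_width."""
    blocks = (multiplier * sd_per_100 / half_width) ** 2  # N counts 100-hand blocks
    return blocks * 100


sd = 17.97  # BB per 100 hands, from the quoted stats
for label, mult in (("90%", 1.6), ("95%", 2.0)):
    hands = hands_needed(sd, mult, 1.0)
    print(f"{label}: about {hands:,.0f} hands for +/- 1 BB/100")
```

At 90% confidence this lands a little under 83,000 hands, in line with the 82,500 figure (the small gap is 17.97 versus 17.95). At 95% confidence, which is what the original question asked about, it is roughly 129,000 hands.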

threeonefour
07-16-2004, 11:08 PM
touché.

lol

Sully
07-16-2004, 11:37 PM
[ QUOTE ]
Example: If your SD for 1 hour is 10 bb, and you play for 400 hours, then your SE is 10/sqrt(400) = 0.5 bb. Then to 90% confidence, your win rate will be accurate to within 1.6*0.5 = 0.8 bb. If your win rate is 1 bb/hr, your 90% confidence interval will be 1 bb +/- 0.8 bb. That is a wide range, so suppose we want it to be accurate to within 0.5 bb, or 50% of the win rate. Then from the above equation, you can see you would have to play until 1.6*10/sqrt(N) = 0.5, so N = 1024 hours. What if you wanted it to within 25%? Then you would have to play 4096 hours (4 times as long as for 50%). For 10% accuracy, you need to play 25,600 hours, or over 12 years of full-time play. You can see the number of hours increases rapidly with the reduction in range. If you are willing to accept a lower confidence, these hours can be reduced.

[/ QUOTE ]

I have taken my share of Statistics courses, but the knowledge is long gone from my head, so please understand that I am not disagreeing with these numbers from a statistics perspective. I am disagreeing with their usefulness in the world of poker.

If you use these numbers, there is absolutely no way that the numbers can ever be used with any certainty, because

A) We'd like to be able to analyze the numbers before we're old men, and
B) The player I am today is not the same player that existed four weeks, four months, or four years ago. The older the statistics, the less relevant they become.

At what # of hands / hours can we agree that the numbers have some actual usefulness for our discussions? 10,000? 20,000? Perhaps there is no magic number, but can I start to feel pretty confident about my win rate once I reach a certain realistic number of hands?

Or maybe that's why we have and use standard deviation.
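One way to put numbers on that question is to turn the formula around and ask how wide the range is at a given number of hands. The Python sketch below does that, assuming the SD of 17.97 BB/100 quoted earlier in the thread, the rounded multiplier of 2 for 95% confidence, and a win rate and SD that stay constant over the sample (which point B above suggests may not hold). The function name `half_width` is hypothetical.

```python
from math import sqrt


def half_width(hands, sd_per_100=17.97, multiplier=2.0):
    """Approximate 95% range (+/-) on BB/100 after a given number of hands."""
    blocks = hands / 100  # the SD is per 100 hands, so count 100-hand blocks
    return multiplier * sd_per_100 / sqrt(blocks)


for hands in (10_000, 20_000, 25_000, 75_000, 300_000):
    print(f"{hands:>7,} hands: +/- {half_width(hands):.2f} BB/100")
```

With these inputs the range is about +/- 3.6 BB/100 at 10,000 hands and does not drop below +/- 1 BB/100 until well past 100,000 hands, so there is no magic number, only a range that narrows slowly.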