Question for Mason Malmuth

jason1990
10-26-2004, 03:59 PM
In your article "Computing Your Standard Deviation" (http://www.twoplustwo.com/mmessay8.html) you write "A good rule of thumb is to have at least 30 observations (playing sessions) for the estimate to be reasonably accurate." Roughly speaking, about how many hands per session should there be for this to be a good rule of thumb?

You also write, "for most poker games of any size, a standard deviation of 83.11 is not realistic. Expect a much larger number." What are the units on this number?

It's often said that one needs 100K hands to have a reasonable estimate of one's winrate. I've often wondered how many hands one needs to have a reasonable estimate of one's standard deviation. Hence, my first question.

Also, I've often read that 15BB/100 is a reasonable standard deviation in small stakes holdem. Hence, my second question.

Thanks in advance for any clarification you (or others) can give me.

Jason

Leo99
10-26-2004, 08:40 PM
I apologize if I should have waited for Mason to reply first.

I believe the units for the SD are $/session-hour. The equation takes the length of each session into account, so the length itself is not critical. Number of hands is not part of the equation. As long as your session times are fairly consistent (from the same normal distribution) they're OK. In other words, Mason's example is fine. If you threw in a 5-minute session where you won or lost a lot of money, it would skew the calculation. Mason only deals with money won or lost per session-hour. If you averaged 15 hands per hour over all 10 sessions in the example: $83.11 per session-hour / 15 hands per hour = $5.5/hand. If we assume a $2 BB, then 15 BB/100 hands = (15 x $2)/(100 hands / 15 hands per hour) = $4.5/hand, which seems about in the right ballpark.
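For comparison, here is a minimal sketch of a different conversion, one that assumes hand results are independent. Under that assumption variance grows linearly with the number of hands, so the standard deviation grows with the square root. The function name and its parameters are hypothetical; the inputs are the 15 BB/100, $2 BB and 15 hands per hour figures mentioned above.

```python
import math

def sd_per_hour(sd_bb_per_100, big_bet_dollars, hands_per_hour):
    """Convert an SD quoted in big bets per 100 hands to dollars per hour.

    Assumption: hands are independent, so variance (not SD) scales
    linearly with the number of hands and SD scales with its square root.
    """
    sd_dollars_per_100 = sd_bb_per_100 * big_bet_dollars
    return sd_dollars_per_100 * math.sqrt(hands_per_hour / 100.0)

# Figures taken from the thread: 15 BB/100, a $2 big bet, 15 hands per hour.
example = sd_per_hour(15.0, 2.0, 15.0)
print(round(example, 2))
```

With these inputs the result is roughly $11.6 per hour. This uses square-root scaling rather than the linear division in the post above, so the two figures are not directly comparable.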

I don't understand Mason's statement when he says that for most poker games of any size expect a higher SD. Since the calculation ignores the stakes, the SD should be proportional to the stakes. The SD needs to be normalized by the BB in order to compare the SD in a low limit game to the SD in a high limit game.

jason1990
10-26-2004, 09:10 PM
Thanks for the reply. I was mostly interested in his comment, "A good rule of thumb is to have at least 30 observations (playing sessions) for the estimate to be reasonably accurate." I think it's pretty obvious (but maybe it isn't) that if you play 30 20-minute sessions at a B&M, then you won't really have an accurate measure of your long-term standard deviation. But if you play 30 6-hour online sessions, that's probably much more than enough.

So, for example, if I play about 100 hands per session, is 30 sessions enough for me to reliably estimate my standard deviation? Or is my hands/session rate lower than average, and does that mean I need more sessions to get a reasonable estimate?

By the way, I have no problem applying the formula. In fact, I can have all my "sessions" be exactly the same number of hands, since I have a database on hand-by-hand results.

Edit: In other words, I can chop up the hands I've played into sessions of arbitrary length, since I have all that info available in a database. So, once I reach 3000 hands, can I chop them up into 100 hand sessions, apply the formula, and get a reliable estimate of my standard deviation? This seems strange, since I've been led to believe that I need around 100,000 hands to get a reliable estimate on my winrate. Why should these two statistics be so fundamentally different?
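One way to see why the two statistics can behave so differently is to compare their standard errors. The sketch below assumes the 100-hand blocks are independent and roughly normal, which real poker results only approximate. The function names are hypothetical, and the 15 BB/100 figure is the one quoted earlier in the thread.

```python
import math

def se_of_mean(sd, n):
    """Standard error of the average of n independent observations."""
    return sd / math.sqrt(n)

def relative_se_of_sd(n):
    """Approximate relative standard error of a sample SD.

    Uses the normal-theory approximation 1 / sqrt(2 * (n - 1)).
    """
    return 1.0 / math.sqrt(2 * (n - 1))

sd_per_100 = 15.0  # BB per 100 hands, as quoted in the thread

for hands in (3_000, 100_000):
    n_blocks = hands // 100  # number of 100-hand "sessions"
    print(
        hands,
        "hands: winrate SE =",
        round(se_of_mean(sd_per_100, n_blocks), 2),
        "BB/100, SD relative error =",
        round(100 * relative_se_of_sd(n_blocks), 1),
        "%",
    )
```

Under these assumptions, 3,000 hands pin the standard deviation down to within about 13%, while the winrate is still uncertain by about 2.7 BB/100, which is larger than most plausible winrates. The winrate needs far more hands because it is small compared with the standard deviation, not because the two estimators are fundamentally different.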

Leo99
10-27-2004, 01:30 PM
You're confusing hands with session-hours. You can't chop up the hands and create your own sessions. That's not how the formula works. The formula assumes a bias in that you might be playing better on one day than another or that there is some session to session difference. The formula says (approximately) that 99% (3SD) of the times that you play a session you will have an average win rate of $87 per session +/- $260. Or another way to say this is 99% of the time you sit down to play you will win or lose in the range of winning $348 to losing $173.

It is also assumed that your skill level and your opponents' skill level remain constant.

On a technical level, Mason's article gives the formula for sigma squared. It should be s squared, as s is our estimate for sigma. s --> sigma as N --> infinity. So, our estimate gets better and better the more sessions we play. If I play 1000 sessions of 30 minutes and you play 1000 sessions of 6 hours, we'd both be able to calculate pretty accurate estimates of sigma for each of our sessions.

jason1990
10-27-2004, 06:04 PM
[ QUOTE ]
The formula assumes a bias in that you might be playing better on one day than another or that there is some session to session difference.

[/ QUOTE ]

I assume you mean that the user of the formula is (implicitly) assuming a bias in that you might be playing better on one day than another or that there is some session to session difference. After all, a formula cannot assume anything. :)

Anyway, I disagree completely (with what I think you're trying to say). Any statistician who wanted to prove something about the accuracy of this formula, whether asymptotically or for specific sample sizes, would undoubtedly assume independence of the session outcomes, as well as some consistency in the probability distributions of the session results. They obviously couldn't be identically distributed since the session times are different, but a natural assumption would be that the variance of each session is proportional to the duration of that session, and that the constant of proportionality is the same for all sessions. Without these assumptions (which wouldn't be there if you assumed some sort of bias or session to session differences), you would be hard pressed to prove anything about the reliability of this estimator. This is precisely why you should recompute your standard deviation whenever game conditions change. You may not always know when they change, but when you do, you must correct for this phenomenon, because it is not part of the original formula.
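The assumptions described above can be written out as follows. This is a sketch of one estimator that fits them; whether it matches the formula in the article term for term is an assumption, and the symbols $X_i$ (result of session $i$), $T_i$ (its duration), $\mu$ (hourly win rate) and $\sigma^2$ (hourly variance) are notation introduced here.

```latex
% Model: independent sessions, with mean and variance proportional to duration
\mathbb{E}[X_i] = \mu T_i, \qquad \operatorname{Var}(X_i) = \sigma^2 T_i,
\qquad i = 1, \dots, N

% Estimated hourly win rate
\hat{\mu} = \frac{\sum_{i=1}^{N} X_i}{\sum_{i=1}^{N} T_i}

% Estimated hourly variance
\hat{\sigma}^2 = \frac{1}{N} \sum_{i=1}^{N} \frac{(X_i - \hat{\mu} T_i)^2}{T_i}
```

When every session has the same duration $T$, the last expression reduces to the ordinary variance of the session results (with divisor $N$) divided by $T$.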

[ QUOTE ]
You're confusing hands with session-hours. You can't chop up the hands and create your own sessions. That's not how the formula works.

[/ QUOTE ]

Again, I completely disagree with the claim that I cannot chop up the hands and create my own sessions. Consider an online player who plays long hours at multiple tables. Suppose he plays 1000 hands per session on average and plays 30 sessions. He then applies the formula and gets a number, call it \sigma_1. Suppose he then chops up each session into 10 "artificial" sessions of 100 hands each. He now has 300 "sessions" and he applies the formula again, getting a new number \sigma_2. Now, I'm a probabilist, not a statistician, but I'm sure that even I could prove that \sigma_2 is a more reliable estimator of his true standard deviation than \sigma_1, in the sense that \sigma_2 has a smaller variance. In fact, it's no great leap to believe that the optimal estimator (in the sense of having the smallest variance) is obtained by using the formula with 30000 1-hand sessions. (Of course, there's a subtle difficulty with all of this, because the smaller the session length, the less it will behave like a Gaussian. But Gaussian assumptions are not strictly necessary to compute and compare the variances of different estimators.)
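The comparison between \sigma_1 and \sigma_2 can be checked with a small simulation. This is a rough sketch, not a proof: it assumes independent hands with Gaussian results (real hands are far from Gaussian), equal-length sessions, and the plain sample standard deviation in place of the article's formula. All function names are hypothetical.

```python
import math
import random
import statistics

def chunk_sums(results, size):
    """Total consecutive blocks of `size` hands; a partial tail is dropped."""
    n_chunks = len(results) // size
    return [sum(results[i * size:(i + 1) * size]) for i in range(n_chunks)]

def sd_per_hand(results, size):
    """Estimate the per-hand SD from equal-length sessions of `size` hands."""
    totals = chunk_sums(results, size)
    return statistics.stdev(totals) / math.sqrt(size)

def estimator_spreads(n_hands=30_000, trials=60, seed=0):
    """Spread of the SD estimate across simulated players, for two session sizes."""
    rng = random.Random(seed)
    from_30_sessions, from_300_sessions = [], []
    for _ in range(trials):
        # Assumption: each hand is an independent Gaussian with true SD 1.
        hands = [rng.gauss(0.0, 1.0) for _ in range(n_hands)]
        from_30_sessions.append(sd_per_hand(hands, 1000))   # like sigma_1
        from_300_sessions.append(sd_per_hand(hands, 100))   # like sigma_2
    return (statistics.pstdev(from_30_sessions),
            statistics.pstdev(from_300_sessions))

spread_sigma_1, spread_sigma_2 = estimator_spreads()
print(spread_sigma_1, spread_sigma_2)
```

In this setup the estimate built from 300 shorter sessions varies noticeably less from one simulated player to the next than the one built from 30 long sessions, which is the behavior the argument above predicts.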

Obviously, the formula is most valuable for people who do not have hand-by-hand results. (For example, a B&M player who has a diary containing only the results of each night.) It is presented in a way that they can use, since they don't have the ability to chop up their sessions into single hands, let alone into sessions of equal length.

I don't know your mathematical background, so I may have assumed you have a higher degree of familiarity with probability lingo than you do. If so, I apologize. Anyway, my original question still stands:

"A good rule of thumb is to have at least 30 observations (playing sessions) for the estimate to be reasonably accurate." How long should these playing sessions be in small stakes Holdem?

Leo99
10-28-2004, 12:42 AM
I majored in statistics and probability in college but that was a LONG time ago.

I do remember learning about estimates of sigma where you stratified the population to get a better estimate. Maybe that's what Mason is doing here when he chooses the calculations he's chosen.

My experience with statistics is that it's relatively easy and straightforward to grind out the results of the calculations but it's hard to determine exactly what calculation best fits your population and distribution. That is where you need to make certain assumptions about your population. It appears it would be easy with the data Mason uses to simply divide the money won/lost by the time played and get a win (or loss) per hour figure and use that to calculate your statistics. But they don't. I don't know why but I can only deduce they want to keep things on a per session-hour basis. I wouldn't stratify the population any further unless I understood their reasoning.