grouped observations (expert Stats question - some poker relevance)

Kenneth Sloan
10-19-2004, 02:29 PM
Suppose I want to estimate the mean and variance for BB/hr.
Suppose, further, that I'm lazy and only bother to note the total time and the gain/loss for each *session*.

Now - estimating the mean BB/hr is simple - just take a weighted average of the per-session rates, weighting each by session length (equivalently, total BB won divided by total hours).

But...how about the variance?

It seems to me that I can make full use of my observations when estimating the mean - but that I only get to use one "observation" for each session when estimating variance. Can I do better than that? How?
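
As a minimal sketch of the mean estimate described above: weighting each session's BB/hr by its length reduces to total BB divided by total hours. The function name `mean_rate` and the sample sessions are made up for illustration.

```python
def mean_rate(sessions):
    """sessions: list of (hours, result_in_bb) pairs. Returns BB/hr."""
    total_hours = sum(hours for hours, _ in sessions)
    total_bb = sum(result for _, result in sessions)
    # Time-weighted average of per-session rates == total BB / total hours.
    return total_bb / total_hours

# Hypothetical data: (hours played, BB won or lost)
sessions = [(2.0, 10.0), (4.0, -8.0), (1.0, 5.0)]
rate = mean_rate(sessions)
```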

10-26-2004, 07:41 AM
You have lost data, but you can still make the estimate. It just won't be as accurate. You will need to estimate the number of hands played per session, however. Once you've done that, consider:

Let x1, x2, x3 ... xN be your original data points, representing BB per hand.

You now have:

y1, y2 ... yM where yi = sum of xj thru xk for some j, k.

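Thus if each x has mean x_bar and standard deviation sigma, and the hands are independent, each y (a sum over S hands) has mean S * x_bar and standard deviation sigma * sqrt(S), where S is the number of hands played per session. Equivalently, the per-hand average y / S has mean x_bar and standard deviation sigma / sqrt(S).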

This assumes that:

1. Each session is of approximately equal length
2. Each session is long enough for the central limit theorem to apply (I think S > 50 should do).

If 2. does not hold, you can still get by using the t distribution to estimate sigma.

If 1. does not hold, there are still things you can do to get estimates, but that is more complex and I don't know how you'd do it offhand.
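
For the equal-length case, a rough sketch of the calculation might look like this. It assumes the hands are independent, so a session total over S hands has standard deviation sigma * sqrt(S), and dividing the sample standard deviation of the session totals by sqrt(S) gives a per-hand estimate. The name `per_hand_sigma` and the numbers are hypothetical.

```python
import math
import statistics

def per_hand_sigma(session_totals, hands_per_session):
    """Estimate per-hand standard deviation from equal-length session totals."""
    # Sample standard deviation of the session totals (n - 1 denominator).
    sd_sessions = statistics.stdev(session_totals)
    # Assumes independent hands: sd(total) = sigma * sqrt(S).
    return sd_sessions / math.sqrt(hands_per_session)

# Hypothetical session results in BB, each about 100 hands long
totals = [12.0, -8.0, 3.0, -5.0, 8.0]
sigma_hand = per_hand_sigma(totals, 100)
```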


10-26-2004, 08:51 AM
This essay (http://twoplustwo.com/mmessay8.html) by Mason contains the correct formula for the maximum likelihood estimate of the variance and standard deviation for variable length sessions. It requires that you record the length of each session (in hours or hands) along with the session result.
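
Here is a sketch of what such an estimate can look like. It is reconstructed from a model assumption rather than quoted from the essay, so check it against the essay before relying on it. The assumption is that a session of length t has a roughly normal result with mean mu * t and variance sigma^2 * t. Under that model the maximum likelihood estimates are mu = (sum of results) / (sum of lengths) and sigma^2 = (1/N) * sum over sessions of (result - mu * t)^2 / t, where N is the number of sessions. The function name and data are made up.

```python
import math

def mle_rate_and_sigma(sessions):
    """sessions: list of (length, result) pairs.

    Returns (mu, sigma) per unit of length (hour or hand), assuming each
    result is roughly normal with mean mu * t and variance sigma^2 * t.
    """
    n = len(sessions)
    total_t = sum(t for t, _ in sessions)
    total_x = sum(x for _, x in sessions)
    mu = total_x / total_t
    # Each squared deviation is scaled by its session length before averaging.
    var = sum((x - mu * t) ** 2 / t for t, x in sessions) / n
    return mu, math.sqrt(var)

# Hypothetical data: (hours played, BB won or lost)
sessions = [(2.0, 10.0), (4.0, -8.0), (1.0, 5.0)]
mu, sigma = mle_rate_and_sigma(sessions)
```

Dividing each squared deviation by the session length is what lets long and short sessions be pooled: a long session is expected to swing more in total, so its deviation is scaled down accordingly.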

Kenneth Sloan
10-29-2004, 01:21 PM
Thanks - that's precisely the answer I was looking for!

Kenneth Sloan
10-29-2004, 01:52 PM
Thanks for the reply - alas, I'm mostly interested in the case where the session lengths vary considerably.

I think the reply below addresses that - but I need to trace back to the primary source of the derivation to double check the assumptions.

My best attempt was to compute EV/hr using *all* of the information in the session observations, and then compute variance assuming that there was only one observation per session. There seem to be competing problems with that: the individual observations would have a SMALLER variance than the actual...and the smaller number of observations would increase the variance in the estimate.
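
One reading of that attempt, as a sketch with made-up data: treat each session's BB/hr as a single observation and take the ordinary sample variance. If the hours are independent with hourly variance sigma^2, a session of t hours has a rate with variance sigma^2 / t, so this figure mixes observations of different precision and is not directly an estimate of the hourly variance.

```python
import statistics

# Hypothetical data: (hours played, BB won or lost)
sessions = [(2.0, 10.0), (4.0, -8.0), (1.0, 5.0)]

# One "observation" per session: that session's BB/hr.
rates = [result / hours for hours, result in sessions]

# Unweighted sample variance of the session rates (n - 1 denominator).
naive_var = statistics.variance(rates)
```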

It looks like the formula cited in Malmuth's essay tries to deal with these problems - but again, I'd like to trace back to the original derivation. The online version of Malmuth's essay doesn't provide a reference.

So...new question: is there a standard literature citation for the original derivation? or, is it simply "somewhere on the net"?