Rolling SD?

Sarge85 · #1 12-22-2003, 08:12 PM

I'm focusing more and more on tracking my results. I've been through the archives and essays and have figured out how to compute my SD, and feel like I know what I'm looking at.

My question is - How do I update my SD?

Should I just use a long "chain" of data and continue to update it-

OR

Should I use a "rolling" SD? I thought that a rolling 50-session SD (is 50 too low or too high?) would give a more current picture of "how I'm playing now".

While I don't have my initial SD numbers from when I first started poker, I'm sure they would have been much higher than what they are now.

Comments-?

bigpooch · #2 12-22-2003, 10:28 PM

The standard deviation can be computed from the following

s^2 = (1/(n-1)) * (Sum_of_squares- n*m^2)

where s is the standard deviation, n=number of observations.
If the observations are X_1, X_2, ...X_n, then m is the
sample mean (X_1 + ... + X_n)/n and Sum_of_squares is just
(X_1^2 + X_2^2 + ... + X_n^2). If you are keeping data on a
per hour basis, this leads to a natural way of updating all
the estimators.

Thus, to keep a current estimation of both win rate and SD,
you only need to update n, Sum = X_1 + X_2 + ... + X_n and
the Sum_of_squares = X_1^2 + X_2^2 + ... + X_n^2. Then the
win rate is just (old Sum + sum of new data)/new n and the
new Sum_of_squares = old Sum_of_squares + sum of squares of
new data. Then you can simply plug these into the above
formula for s^2 and take the square root to get the SD. In
fact, one could argue that you no longer need to keep the
raw data if you are only concerned with win rate and SD;
you only need to keep track of n, Sum and Sum_of_squares.
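
As a rough illustration, the bookkeeping described above might look like the following Python sketch. The class name RunningStats, the helper rolling_sd and the window size are all made up for this example, not anything from the thread. The remove() method is an extra assumption: subtracting an old observation from the same three totals is one way to get the "rolling" SD asked about in the first post.

```python
import math
from collections import deque


class RunningStats:
    """Keeps only n, Sum and Sum_of_squares (hypothetical helper, for illustration)."""

    def __init__(self):
        self.n = 0          # number of observations
        self.total = 0.0    # Sum = X_1 + ... + X_n
        self.sum_sq = 0.0   # Sum_of_squares = X_1^2 + ... + X_n^2

    def add(self, x):
        """Fold one new observation (e.g. one hourly result) into the totals."""
        self.n += 1
        self.total += x
        self.sum_sq += x * x

    def remove(self, x):
        """Take an old observation back out; this is what makes a rolling window possible."""
        self.n -= 1
        self.total -= x
        self.sum_sq -= x * x

    def mean(self):
        """Win rate per observation: Sum / n."""
        return self.total / self.n

    def sd(self):
        """s from s^2 = (1/(n-1)) * (Sum_of_squares - n*m^2); needs n >= 2."""
        m = self.mean()
        var = (self.sum_sq - self.n * m * m) / (self.n - 1)
        # Guard against a tiny negative value caused by floating-point rounding.
        return math.sqrt(max(var, 0.0))


def rolling_sd(results, window):
    """SD of the latest `window` results after each new one (None until the window fills).

    Assumes window >= 2.
    """
    stats = RunningStats()
    recent = deque()
    out = []
    for x in results:
        stats.add(x)
        recent.append(x)
        if len(recent) > window:
            stats.remove(recent.popleft())
        out.append(stats.sd() if len(recent) == window else None)
    return out
```

Note that the rolling version does have to keep the last `window` raw results so it knows what to subtract, and that the sum-of-squares shortcut can lose some floating-point precision when the results are large compared to their spread.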

Note that s^2 has chi squared distribution with n-1 degrees
of freedom so you can use a statistics table to determine
confidence intervals. In practice, it isn't very important
because the SD for a player converges quite rapidly and
will not change very much unless game conditions dictate
otherwise. On the other hand, having a lot of data is
important in estimating the win rate to any reasonable
degree of accuracy.

On the other hand, you may not keep records on an hour by
hour basis (or any other regular method) but session by
session. This seems not nearly as useful because some
sessions can be very long (10+ hours in B&M) and some are
very short (<1 hour) online. Nevertheless, the same sort of
updating can be achieved although the number of hours played
would have to be much greater (compared to hour by hour
data) to achieve the same level of confidence in any
estimation.

BruceZ · #3 12-22-2003, 11:01 PM

Note that s^2 has chi squared distribution with n-1 degrees of freedom so you can use a statistics table to determine confidence intervals.

Actually, n*s^2/sigma^2 has a chi squared distribution with n-1 degrees of freedom, where sigma is the true standard deviation. Also, that is strictly true only when s is computed as:

s^2 = (1/n) * (Sum_of_squares - n*m^2)

See this post for gory details.
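
For reference, the two conventions in this thread differ only by a constant factor. Here is a short sketch of the relation, assuming independent, normally distributed results. The labels s_{n-1} and s_n are introduced here just to tell the 1/(n-1) and 1/n versions apart; they are not notation from the posts above.

```latex
\[
(n-1)\,s_{n-1}^2 \;=\; n\,s_n^2 \;=\; \sum_{i=1}^{n} X_i^2 \;-\; n\,m^2,
\qquad
\frac{(n-1)\,s_{n-1}^2}{\sigma^2} \;=\; \frac{n\,s_n^2}{\sigma^2} \;\sim\; \chi^2_{n-1}.
\]
```

So with either definition it is the scaled quantity, not s^2 itself, that is compared against the chi-squared table.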