Old 12-29-2003, 04:37 PM
BruceZ
Senior Member
 
Join Date: Sep 2002
Posts: 1,636
Re: Hypothetical Question-Hero is runnin real bad!!

on the other hand, the interval for the SD per
hour will be estimated more quickly (but it isn't that
important as this seems to converge very quickly). It does
seem more natural to keep statistics on an orbit by orbit
basis (or perhaps every 2 or 3 orbits) because time-based
statistics in this case seem so arbitrary.


The problem with computing your SD from data logged every few orbits is that your results over periods of an hour or less at 1 table are probably not normally distributed. If you estimate your standard deviation from the sample variance of hourly results, you can no longer use the chi-squared distribution to determine the confidence interval of the estimate, the estimate will no longer be a maximum-likelihood estimator, and it won't be as accurate as estimates obtained from sessions of several hours. Now if you *could* estimate your SD for 1 orbit or for 1 hour accurately, and if the results for each orbit or each hour were independent draws from a fixed distribution, then you could multiply this SD by sqrt(N) for N orbits or N hours to get the correct long-term SD, even though the orbit and hourly results are not normal. By the CLT, the long-term results themselves would be approximately normal.
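To illustrate the sqrt(N) scaling, here is a rough Python sketch. The per-hour payoff (lose 1 bet 90% of the time, win 9 bets 10% of the time) is a made-up example chosen only because it is clearly not normal and has an SD of exactly 3; the function names are mine as well. It also assumes the hours are independent, which real play may not satisfy.

```python
import math
import random
import statistics

def long_term_sd(sd_per_unit, n_units):
    """SD of the total over n_units independent, identically distributed units."""
    return sd_per_unit * math.sqrt(n_units)

def simulate_total_sd(n_units, trials=5000, seed=1):
    """Empirical SD of n_units-long totals built from a skewed per-unit result."""
    rng = random.Random(seed)

    def one_unit():
        # Hypothetical hourly result: mean 0, variance 0.1*81 + 0.9*1 = 9, so SD 3.
        return 9.0 if rng.random() < 0.1 else -1.0

    totals = [sum(one_unit() for _ in range(n_units)) for _ in range(trials)]
    return statistics.pstdev(totals)

if __name__ == "__main__":
    hours = 25
    print("predicted SD:", long_term_sd(3.0, hours))
    print("simulated SD:", simulate_total_sd(hours))
```

The two printed numbers should come out close to each other even though a single hour is nowhere near normal. The catch in practice is the "if you could estimate it accurately" part, not the scaling itself.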

It is best to compute your SD based on session results, where sessions are usually several hours in length, and the longer the better. Even though your hourly results are not normal, your session results will be considerably more normal by the CLT. Sampling per session has the added advantage of including in the distribution the changes due to different opponents, which may change from session to session. The sessions can vary in duration, and you use the maximum-likelihood estimate of the SD for variable-length sessions given in the essay section. I still owe someone a derivation of that, which I'll put up. After 20-30 sessions, even though the SD will still have some uncertainty associated with it, it turns out that this will have very little impact on the confidence interval of the hourly rate. After only 20 sessions, it should widen that interval by less than 10%, as can be seen from the t-distribution: with 19 degrees of freedom the 95% critical value is about 2.09, versus 1.96 for the normal. The reason for this is that the SD is as likely to be estimated too high as too low, so the effect on the hourly rate confidence interval is smaller than you might expect from the confidence interval of the SD. If you want to know the confidence interval for the SD, you could now use the chi-squared distribution, since the underlying session results are approximately normal.
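For anyone who wants to try the variable-length session estimate, here is a minimal Python sketch. It assumes each session result is normal with mean (hourly rate x hours) and variance (hourly variance x hours); under that assumption the maximum-likelihood estimates are rate = total result / total hours, and variance = the average over sessions of (result - rate*hours)^2 / hours. I am deriving this from that assumption rather than quoting the essay, so check it against the essay's formula before relying on it. The function name and the sample sessions are made up.

```python
import math

def mle_hourly_stats(sessions):
    """Estimate hourly rate and hourly SD from (result, hours) pairs.

    Assumes session result ~ Normal(rate * hours, sd**2 * hours).
    """
    n = len(sessions)
    total_result = sum(result for result, _ in sessions)
    total_hours = sum(hours for _, hours in sessions)
    rate = total_result / total_hours
    # Each squared deviation is scaled by session length before averaging.
    variance = sum((result - rate * hours) ** 2 / hours
                   for result, hours in sessions) / n
    return rate, math.sqrt(variance)

if __name__ == "__main__":
    # Hypothetical log: (win or loss in bets, session length in hours)
    log = [(35.0, 4.0), (-20.0, 3.5), (12.0, 6.0), (-8.0, 2.0), (50.0, 5.0)]
    rate, sd = mle_hourly_stats(log)
    print(f"hourly rate {rate:.2f}, hourly SD {sd:.2f}")
```

Note that when every session is exactly 1 hour long this reduces to the ordinary sample SD with a divisor of n.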