Two Plus Two Older Archives  

#11 - BruceZ - 06-09-2005, 02:19 PM
Correction

Sorry, but I muffed the most important sentence of this explanation. I already edited the above post with the following correction to this paragraph:

[ QUOTE ]
Confidence has a mathematically defined meaning distinct from "likely" or "probability". Homer's statement does not mean that there is a 95% probability that his true win rate is in this range. It means that IF the true win rate were outside of this range, there would be less than a 5% probability that his observed results would occur.

[/ QUOTE ]

That would only make sense if the confidence interval were 1-sided, as would be the case if we were testing the hypothesis that our win rate is strictly above or below a certain value. Even then, "observed results" would have to be replaced by "observed results or worse" or "observed results or better". In this case we were talking about a 2-sided confidence interval. This paragraph has been replaced with the following paragraphs:

Confidence has a mathematically defined meaning distinct from "likely" or "probability". Homer's statement does not mean that there is a 95% probability that his true win rate is in this range. It means that IF the true win rate were above this range, then the probability of observing a win rate equal to or less than the observed win rate would be less than 2.5%, and IF the true win rate were below this range, then the probability of observing a win rate equal to or greater than the observed win rate would also be less than 2.5%.
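
As a rough numerical sketch of that two-sided statement (the observed win rate and its standard deviation below are invented; units are arbitrary, think BB/100): if the true win rate sat exactly at the upper edge of the +/- 2 standard deviation interval, the chance of a result as low as or lower than the one observed is Phi(-2), which is under 2.5%.

[ code ]
# Sketch: tail probability behind the 2-sided interval. If the true win
# rate were exactly at the top edge of the +/- 2 SD interval, the chance
# of seeing a result equal to or less than the observed one is Phi(-2).
import math

def normal_cdf(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

sigma = 0.75          # SD of the observed win rate (invented number)
observed = 1.5        # observed win rate (invented number)

true_at_upper_edge = observed + 2 * sigma
p_lower_tail = normal_cdf((observed - true_at_upper_edge) / sigma)  # = Phi(-2)

print("P(result <= observed | true rate at upper edge) = %.4f" % p_lower_tail)
# about 0.0228, i.e. less than 2.5%; for true rates above the edge it is smaller still
[/ code ]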

Another way to look at it is to note that if a large number of people constructed their 95% confidence interval in this manner, then 95% of the people would have their true win rate fall in this interval. This is because before they played any hands, there was a 95% probability that their observed win rate would fall within +/- 2 standard deviations of their true win rate. Conversely then, there was a 95% probability that their true win rate would lie within +/- 2 standard deviations of their observed win rate. After they play the hands, they then construct a +/- 2 standard deviation confidence interval around their observed win rate. However, it is no longer correct to say that their true win rate has a 95% probability of lying within this interval because their true win rate and their observed win rate are not probability distributions; they are fixed numbers. Thus we characterize the situation by saying that there is a 95% confidence that the true win rate lies within this interval.
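
A minimal simulation sketch of that coverage property, assuming a true win rate of 2 BB/100 and a standard deviation of 15 BB/100 per 100 hands (both made-up numbers): each simulated player builds a +/- 2 standard deviation interval around his observed win rate, and we count how often the true win rate lands inside.

[ code ]
# Sketch: check that roughly 95% of +/- 2 SD intervals built around
# observed win rates contain the true win rate. The true win rate and
# the SD per 100 hands are made-up numbers.
import random

TRUE_WR = 2.0          # true win rate, BB/100 (assumed)
SD_PER_100 = 15.0      # standard deviation per 100 hands (assumed)
HANDS = 50000          # hands played by each player
N_PLAYERS = 100000     # number of simulated players

n_blocks = HANDS / 100                      # number of 100-hand blocks
se = SD_PER_100 / (n_blocks ** 0.5)         # SD of the observed win rate

covered = 0
for _ in range(N_PLAYERS):
    observed = random.gauss(TRUE_WR, se)    # observed win rate, BB/100
    # each player builds a +/- 2 SD interval around the observed rate
    if observed - 2 * se <= TRUE_WR <= observed + 2 * se:
        covered += 1

print("fraction of intervals containing the true win rate:",
      covered / N_PLAYERS)   # about 0.954 with the +/- 2 SD convention (1.96 gives 0.95)
[/ code ]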

Note that the center of this interval is the observed win rate, or the "sample mean", and this is termed the "maximum likelihood estimate" of the mean, assuming that we have enough samples that the sample mean is well-approximated by a normal distribution. This term "maximum likelihood estimate" does not mean that this is the most likely value of the mean. It means that this is the value of the mean which maximizes the likelihood of the observed win rate.
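
To illustrate that last point, here is a short sketch with made-up data and a standard deviation treated as known: scanning candidate values of the mean shows that the normal log-likelihood of the observed sample peaks at the sample mean.

[ code ]
# Sketch: the sample mean maximizes the normal likelihood of the data.
# Scan candidate values of the mean and confirm the log-likelihood
# peaks at the sample mean. The sample values are invented.
import random

random.seed(1)
sample = [random.gauss(2.0, 15.0) for _ in range(500)]  # hypothetical data
sample_mean = sum(sample) / len(sample)
sigma = 15.0  # treat the SD as known for simplicity

def log_likelihood(mu):
    # log of the product of normal densities, up to a constant that is the same for all mu
    return -sum((x - mu) ** 2 for x in sample) / (2 * sigma ** 2)

best_mu = max((mu / 100.0 for mu in range(-1000, 1000)), key=log_likelihood)
print("sample mean:               ", round(sample_mean, 2))
print("mu maximizing the likelihood:", best_mu)  # agrees with the sample mean to the grid resolution
[/ code ]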

#12 - BruceZ - 06-10-2005, 12:59 AM
Re: Confidence Intervals

[ QUOTE ]
[ QUOTE ]
On the other hand, the maximum likelihood estimate of the win rate is an unbiased estimate because the expected value of the estimate is equal to the true win rate.

[/ QUOTE ]

No time like the present to remind y'all that MLEs don't come with an automatic guarantee of being unbiased either (and for non-symmetric distributions they usually aren't), only that they asymptotically approach the unbiased value in an efficient fashion.

[/ QUOTE ]

The sample mean is always an unbiased estimate of the mean of any distribution by definition. By "maximum likelihood estimate of the win rate", I was actually referring to the sample mean which we use as an estimate of the win rate when performing a maximum likelihood analysis. The central limit theorem says that for a sufficient number of hands, the distribution of the observed win rate, or sample mean, will be well-approximated by a normal distribution with a mean equal to the true win rate. We then identify the sample mean as the MLE of the mean of this normal distribution, and this allows us to use normal confidence intervals. I realize that this isn't the same as saying that we have an MLE for the win rate which is unbiased unless the sample mean is exactly normally distributed, but it is an unbiased estimate of the win rate, and it can be made arbitrarily close to the MLE of the win rate for a sufficient number of samples.
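
A minimal sketch of the unbiasedness claim under an invented, deliberately skewed per-hand distribution (the 10-to-minus-1 payoff below is made up; units are arbitrary): averaging the sample-mean estimates over many trials lands on the true per-hand mean even though no single sample is normally distributed.

[ code ]
# Sketch: the sample mean is an unbiased estimate of the mean even when
# the underlying per-hand results are skewed. The per-hand distribution
# below is invented purely for illustration.
import random

def one_hand_result():
    # skewed toy distribution: mostly small losses, occasional big wins
    return 10.0 if random.random() < 0.1 else -1.0

true_mean = 10.0 * 0.1 + (-1.0) * 0.9                 # = 0.1 per hand

trials = 5000
hands_per_trial = 500
estimates = []
for _ in range(trials):
    results = [one_hand_result() for _ in range(hands_per_trial)]
    estimates.append(sum(results) / hands_per_trial)  # sample mean (win rate estimate)

print("true mean per hand:      ", true_mean)
print("average of the estimates:", round(sum(estimates) / trials, 3))  # close to 0.1
[/ code ]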

#13 - Jerrod Ankenman - 06-10-2005, 06:39 PM
Re: Confidence Intervals

[ QUOTE ]
You apparently do not understand a very important and subtle thing which most people also do not understand. You need to understand this before you put the above in a book, or your book will be seriously flawed.

[/ QUOTE ]

So first of all, I was wrong about the mathematical definition of confidence intervals. Bill, of course, wasn't, and neither is our book because we aren't specifically addressing the type of confidence interval discussed here. I apologize for calling the original post "wrong" or "flawed." In the claims that it makes, it is correct. I stand by my statements that it will often be misleading, especially when it is used by the unsuspecting to answer questions for which it is unsuitable. This is not the fault of the original poster, although it is my general feeling that those who answer questions about inferring information about win rates from observed data (the topic we address in our book) in this manner are doing the askers a disservice.

The "probability" definition of a confidence interval is the most commonly known and understood definition; if one types "define: confidence interval" into Google, as I just did, for example, the first result is "a range of values that has a specified probability of containing the rate or trend....(example deleted)." It is also unhelpful that there is a definition of confidence interval within statistics that is the probability definition that applies to a situation that is not entirely unlike this one to the untrained eye; that is when we know the population mean and want to estimate the outcome of a sample.

<snip>
[ QUOTE ]
[ QUOTE ]
Unfortunately, I don't really have any useful ideas as for incorporating the distribution of all poker player win rates, since a) I don't know it and b) it wouldn't be expressible in closed form.

[/ QUOTE ]
This is precisely the circumstance under which you would use maximum likelihood estimation, and why the use of Bayesian estimation is controversial. The result that you get from Bayesian estimation depends on what prior distribution you assume, and different people will assume different prior distributions. At any rate, the maximum likelihood method which Homer has presented, when understood with the proper mathematical definitions, is mathematically correct.

[/ QUOTE ]
True, understood with the proper mathematical definitions, this statement is mathematically correct. Also, many of the things that I said about this statement were wrong. This doesn't make using this methodology any less misleading to the layperson, however; as evidence of this, I'd enter my own confusion, or if that is not sufficiently compelling, we can poll a hundred smart laypeople and ask them what they think.

So my problem with pointing people to that type of answer to their "win rate certainty" questions is that they are likely to conclude (as I did) that the "confidence interval" reflects a probability that the population mean lies within the interval, and to use this information to consistently overestimate their particular win rates.
[ QUOTE ]
[ QUOTE ]
I hate criticizing someone else's work without offering a correction or a different methodology; but in this case, I hope it's clear why this method is so biased

[/ QUOTE ]
It is interesting that you used the word "biased" because this has a precise mathematical definition. Bayesian estimates are biased as a result of taking into account our preconceived notions about the prior probability distribution. On the other hand, the maximum likelihood estimate of the win rate is an unbiased estimate because the expected value of the estimate is equal to the true win rate.

[/ QUOTE ]
I'm not a statistician, but Bill claims that this statement is just wrong and that maximum likelihood estimators are not necessarily unbiased. I don't think we really need to get into that. He's at the WSOP now, and it's not the point.

#14 - bigjohnn - 06-10-2005, 08:42 PM
Re: Confidence Intervals

[ QUOTE ]
The sample mean is always an unbiased estimate of the mean of any distribution by definition.

[/ QUOTE ]

Should this not be qualified by stating that the sample is "random"?

John

#15 - BillChen - 06-10-2005, 08:51 PM
Re: Confidence Intervals

It seems from a lot of the literature, including the Yale stats page and the Association of Physicians, that the definition of a confidence interval is given as an interval with a specified probability of containing the value of the parameter. I also asked a couple of my mathematician friends who are doing research outside of statistics, and they gave a similar definition.

But I think we agree on the point Jerrod was trying to make that a two std deviation radius window from the mean does not mean a 95% probability that the win rate falls within this window, and I don't think this point is well communicated to the world. Without making a value judgement on whether the definition is misleading, I would say most poker players who know about confidence intervals are misled.

Also, maximum likelihood estimators are not always unbiased. An MLE represents the mode or peak of the parameter distribution, which may be to the right or to the left of the mean.

I agree it's nice to know what your distribution of observed win rates is given a true win rate. As with most things, trying to go backwards from an observed win rate and other information to a guess at the true win rate is much harder and less well defined, but it's much more important to the poker player. This reminds me of the story of the two guys lost in a balloon who ask where they are and are told, "You're in a hot air balloon." "He must have been a mathematician. What he said is precisely correct, but it is of no use to us."

Bill Chen

#16 - irchans - 06-10-2005, 09:56 PM
William Chen, Jerrod Ankenman, The [0,1] Game

Hi Bill, Jerrod,

Great to see you posting! Do you post anywhere other than 2+2 and rec.gambling.poker?

What is your book about? I really enjoyed Jerrod's posts to RGP on the [0,1] game. Will they be in the book? Is there any place I can preorder the book?

#17 - JohnG - 06-11-2005, 01:21 AM
Re: William Chen, Jerrod Ankenman, The [0,1] Game

Jerrod has a journal. Don't know about Bill. I think the rest of your questions are answered somewhere at the following link.

http://www.livejournal.com/users/hgfalling/

#18 - Mason Malmuth - 06-11-2005, 02:19 AM
Re: Confidence Intervals

Hi Bill:

I've only scanned this thread and do not wish to comment on it directly. But I did notice the following:

[ QUOTE ]
Also, maximum likelihood estimators are not always unbiased.

[/ QUOTE ]

It's my understanding that just the opposite is true. That is, they are almost always biased. However, as the sample size grows, that bias tends to become insignificant, and the bias can also usually be corrected by a factor of (I'm going from memory here) (n)/(n-1), or something like that.
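
The usual textbook case of a biased MLE and its n/(n-1) fix is the variance of a normal distribution; a quick simulation sketch with arbitrary parameters shows the divide-by-n MLE coming in low and the correction removing the bias.

[ code ]
# Sketch: the MLE of a normal distribution's variance divides by n and is
# biased low; multiplying by n/(n-1) gives the usual unbiased estimate.
# Parameters are arbitrary.
import random

true_var = 9.0          # variance of the assumed normal (sd = 3)
n = 5                   # small samples make the bias easy to see
trials = 200000

mle_sum = 0.0
corrected_sum = 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, 3.0) for _ in range(n)]
    m = sum(xs) / n
    mle = sum((x - m) ** 2 for x in xs) / n          # MLE: divide by n
    mle_sum += mle
    corrected_sum += mle * n / (n - 1)               # n/(n-1) correction

print("true variance:        ", true_var)
print("average MLE:          ", round(mle_sum / trials, 2))        # about 7.2 (biased low)
print("average corrected MLE:", round(corrected_sum / trials, 2))  # about 9.0 (unbiased)
[/ code ]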

Best wishes,
Mason

#19 - David Sklansky - 06-11-2005, 03:39 AM
Re: Confidence Intervals

All highfalutin math aside, anybody who doesn't integrate all the evidence available into their estimates is going to lose bets to those who do.

#20 - BruceZ - 06-11-2005, 06:45 AM
Re: Confidence Intervals

[ QUOTE ]
So first of all, I was wrong about the mathematical definition of confidence intervals. Bill, of course, wasn't, and neither is our book because we aren't specifically addressing the type of confidence interval discussed here.

[/ QUOTE ]

Are you using the term "confidence interval"? What does it refer to?


[ QUOTE ]
I apologize for calling the original post "wrong" or "flawed." In the claims that it makes, it is correct. I stand by my statements that it will often be misleading, especially when it is used by the unsuspecting to answer questions for which it is unsuitable. This is not the fault of the original poster, although it is my general feeling that those who answer questions about inferring information about win rates from observed data (the topic we address in our book) in this manner are doing the askers a disservice.

[/ QUOTE ]

Spoken like a true Bayesian. :) Of course, any statistical analysis will be misleading if it is used to answer questions for which it is unsuitable. I believe it is useful to know that if your win rate were such and such a value, your results would only be obtained a certain percentage of the time. If this conclusion is stated exactly that way, I don't think anyone should be confused.


[ QUOTE ]
The "probability" definition of a confidence interval is the most commonly known and understood definition; if one types "define: confidence interval" into Google, as I just did, for example, the first result is "a range of values that has a specified probability of containing the rate or trend....(example deleted)."

[/ QUOTE ]

This does not contradict the correct definition of confidence interval I have given, but it is incomplete and highly misleading. Their definition applies only BEFORE the observations are obtained, but not AFTER they are obtained. Before the observations, this range of values can only be given relative to an unknown sample mean, so it is not an absolute range of numbers. For example, before the experiment, we can write:

P(sample mean - 2*sigma < true mean < sample mean + 2*sigma) = 95%.

The range of values in parentheses is the 95% confidence interval. AFTER the experiment, this equation is no longer valid; however, the value in parentheses is still the 95% confidence interval, and this is what google fails to define.
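
As a concrete sketch with invented numbers (an observed 1.5 BB/100 over 40,000 hands with an SD of 15 BB/100 per 100 hands), the interval AFTER the experiment is just a fixed pair of numbers.

[ code ]
# Sketch: forming the 95% confidence interval around an observed win rate.
# The observed win rate, SD per 100 hands, and sample size are invented.
observed_wr = 1.5        # observed win rate, BB/100 (hypothetical)
sd_per_100 = 15.0        # standard deviation per 100 hands (hypothetical)
hands = 40000

sigma = sd_per_100 / (hands / 100) ** 0.5   # SD of the observed win rate
low, high = observed_wr - 2 * sigma, observed_wr + 2 * sigma

# Before the hands are played, (sample mean +/- 2*sigma) traps the true win
# rate about 95% of the time; after the hands, (low, high) is a fixed
# interval and we speak of 95% confidence, not 95% probability.
print("95%% confidence interval: (%.2f, %.2f) BB/100" % (low, high))
[/ code ]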

Now if anyone thinks that their definition holds even after the experiment, or if they think that the “range of values” is an absolute set of numbers, then that would be wrong. Absolutely wrong. Confidence and confidence intervals come from statistics, and there is a reason why the term confidence had to be introduced separate from probability. It is not "commonly understood" to mean the same thing as probability. It is commonly misunderstood to mean that. I'm aware that the wrong definition is used all over the place by people who don't understand the underlying statistics. This doesn't make it correct any more than it is correct to say that "2000 volts flowed through someone", though we hear this all the time.


[ QUOTE ]
It is also unhelpful that there is a definition of confidence interval within statistics that is the probability definition that applies to a situation that is not entirely unlike this one to the untrained eye; that is when we know the population mean and want to estimate the outcome of a sample.

[/ QUOTE ]

It's not unhelpful; this IS the definition, and it is the only definition. Yes it is an extremely subtle point that the probability part applies only BEFORE you obtain the outcome of a sample, but not AFTER.


[ QUOTE ]
or if that is not sufficiently compelling, we can poll a hundred smart laypeople and ask them what they think.

[/ QUOTE ]

Ironically, if they all think that their win rate lies in their 95% confidence interval, then on average 95 of them will be right. :)


[ QUOTE ]
[ QUOTE ]
It is interesting that you used the word "biased" because this has a precise mathematical definition. Bayesian estimates are biased as a result of taking into account our preconceived notions about the prior probability distribution. On the other hand, the maximum likelihood estimate of the win rate is an unbiased estimate because the expected value of the estimate is equal to the true win rate.

[/ QUOTE ]
I'm not a statistician, but Bill claims that this statement is just wrong and that maximum likelihood estimators are not necessarily unbiased.

[/ QUOTE ]

That's right, they aren't, and neither are Bayesian estimators. I explained what I meant by this in my response to Siegmund. The important point is that when we do a maximum likelihood analysis for win rate (as Homer did), we use the sample mean, and this is always unbiased for any distribution.