
View Full Version : Confidence Intervals


ddubois
07-12-2005, 05:49 PM
I've read the Homer post (http://forumserver.twoplustwo.com/showflat.php?Cat=&Board=inet&Number=1342415&fpart=1&PHPSESSID=), but it doesn't directly apply to my question.

You have some event with P(X) that produces a 1 or 0. You don't know the value of P(X); it could be anything from 0% to 100%, non-inclusive. After ten trials, you have 8 successes. I understand intuitively that it is "unlikely" that P(X) is something low, for instance, 0.2, but I don't know how to compute a confidence interval.

The first thing I might do, as a layman with little background in statistics, would be to determine the likelihood of P(X) = 0.1 producing that result, the likelihood of P(X) = 0.2 producing that result, P(X) = 0.3, etc., up to 0.9, and then look at the ratios of all those likelihoods proportionally.

For instance:

Trials: 10
Successes: 8
0.05 0.000000002
0.10 0.000000365
0.15 0.000008333
0.20 0.000073728
0.25 0.000386238
0.30 0.001446701
0.35 0.004281378
0.40 0.010616832
0.45 0.022889589
0.50 0.043945313
0.55 0.076302551
0.60 0.120932352
0.65 0.175652953
0.70 0.233474441
0.75 0.281567574
0.80 0.301989888
0.85 0.275896657
0.90 0.193710245
0.95 0.074634799
sum all 1.817809935


So P(X)=0.8 is the most likely possibility, which of course makes sense. The second column doesn't add up to 100%, but that's not surprising - this is a discrete sampling of possible P(X)s, ignoring P(X)=0.952, P(X)=0.001, P(X)=0.85541, ad infinitum, and as I add more samples the sum of the column will just get higher and higher.

Then I might note that the values of the second column from P(X)=0.55 through P(X)=0.95 carry 95% of the "weight" of the total second column (i.e. =SUM(B13:B21)/SUM(B3:B21) is 0.95398).

Is this anything at all like being 95% confident that P(X) is >= 0.55, or have I spewed completely unrelated nonsense, and a confidence interval has a completely different meaning?

Trials: 20
Successes: 10
0.05 0.000000011
0.10 0.000006442
0.15 0.000209749
0.20 0.002031414
0.25 0.009922275
0.30 0.030817081
0.35 0.068613972
0.40 0.117141551
0.45 0.159349455
0.50 0.176197052
0.55 0.159349455
0.60 0.117141551
0.65 0.068613972
0.70 0.030817081
0.75 0.009922275
0.80 0.002031414
0.85 0.000209749
0.90 0.000006442
0.95 0.000000011
sum all 0.952380952
sum range 0.909727359

Am I "90% confident" that P(X) is not 0 thru 0.30 nor is it 0.70 thru 1?

DWarrior
07-12-2005, 06:41 PM
Well, I wasn't sure if what you said was correct, so I decided to calculate the CI for the example you gave and see if they match. I had a CI calculator in Excel, so all I had to do was calculate the SD (Excel's built-in SD formula is screwed up).

Anyway, the EV for this is .8, since 8 of the 10 outcomes came out positive. The SD for this is .4; here's my work:

((2*(.8-0)^2 + 8*(.8-1)^2)/10)^.5 = .4

So, I put in .8 for EV, .4 for SD, and 10 for sample size, and .95 for the interval. The result was 0.5521 to 1.04, but just under 1 is the max it can realistically go.
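
For the curious, here is a quick Python sketch of the usual normal-approximation formula that a CI calculator like this presumably uses; the z = 1.96 value for a 95% interval is my assumption about how it works:

import math

p_hat = 0.8                     # 8 successes in 10 trials
sd = 0.4                        # sqrt(p_hat * (1 - p_hat))
n = 10
z = 1.96                        # two-sided 95% normal quantile

half_width = z * sd / math.sqrt(n)
print(p_hat - half_width, p_hat + half_width)   # roughly 0.552 to 1.048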

So, while the match doesn't rule out that your answer is a mere coincidence, it gave me something to work from (since I knew your number was correct).

And, indeed, your reasoning makes sense. A CI is basically a way to quantify the possibility of variance playing a part in the outcome. Anything between .55 and just under 1 (remember, if the odds of something are 1, there is no other outcome) can occur.

Finally, this assumes the distribution is normal, which it is not, as it's clearly skewed upwards. Think about it: having 8 out of 10 trials succeed when the odds are 0.999 is extremely improbable, since 0.999 is virtually 1. So this is not a normal distribution, and the upper bound is most likely around .85-.86 as you calculated.

So I guess your confidence interval is more accurate than following the formula for a normal distribution, as this is not a normal distribution.

ddubois
07-12-2005, 07:42 PM
Thank you for your response.

I believe there must be some flaw in what I have done, however (beyond the normal distribution issue, which I do not fully understand).

Trials: 3
Successes: 3
0.05 0.000125000
0.10 0.001000000
0.15 0.003375000
0.20 0.008000000
0.25 0.015625000
0.30 0.027000000
0.35 0.042875000
0.40 0.064000000
0.45 0.091125000
0.50 0.125000000
0.55 0.166375000
0.60 0.216000000
0.65 0.274625000
0.70 0.343000000
0.75 0.421875000
0.80 0.512000000
0.85 0.614125000
0.90 0.729000000
0.95 0.857375000
sum all 4.512500000
portion .45 to .95 0.964099723

It does not seem reasonable to believe that I can be 95% confident that P(X) is higher than .4 after a mere 3 trials. Three trials is like "nothing", you know what I mean?

BTW, the impetus for this thread is to resolve a discussion in the HUSH forum about the relevance of a small sampling of stats on an opponent in your PokerTracker database, for "quickly converging" stats like VPIP or PFR. There is one contingent who believes that as little as one orbit with a player conveys some meaningful information we can use to guess at that person's "true looseness", and another contingent who believes a sample size that small conveys no information whatsoever.

ddubois
07-12-2005, 07:58 PM
I guess the problem is that the distribution of VPIPs across players is not uniform. In other words, given any randomly chosen player on Party Poker, it is not equally likely that the person will be 10% VPIP or 90% VPIP. There is definitely significant clustering towards the 10-40 range. And this is why my "3 for 3" results don't seem congruent with reality - it's because my model isn't congruent with reality. To solve my "confidence interval for VPIP after X trials" problem, I now suspect that I would need to add another column with weights on the rows, where this new column would contain a value that indicates what portion of the poker populace falls into that P(X) range.

Siegmund
07-12-2005, 08:49 PM
Your layman's intuition is good.

Your method of constructing likelihoods for each possible true p and seeing which range of them give the highest likelihoods is the simplest type of "Bayesian confidence interval," one with a uniform prior distribution.

If you have some information ahead of time about what reasonable values of p are, you can refine the estimate further.

A (classical) confidence interval answers the question "if the true value of p is 80%, what range of estimated p's will I see if I repeat this experiment several times?" Your method answers the related but different question "if I imagine that p is a random variable (even though in the real world it isn't), which p's are most likely to give rise to the data set I actually have in my hands?" This latter question has different answers according to exactly what sort of random variable you imagine p to be.

The two methods start out giving different answers for small sample sizes but converge to the same one as the sample size increases. The middle of the classical confidence interval is x/n, the estimated p, while the middle of yours will be at (x+1)/(n+2), reflecting the fact that the tail between 0.8 and 1 is steep and "squashed in" while the other tail from 0 to 0.8 is more spread out.
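
If you want to see that numerically, here is a small Python sketch assuming a uniform prior, where the posterior is proportional to p^x * (1-p)^(n-x):

n, x = 10, 8
print(x / n)               # classical point estimate: 0.8
print((x + 1) / (n + 2))   # middle of the Bayesian estimate under a uniform prior: 0.75

# Numerical check of that posterior mean on a fine grid:
step = 0.001
ps = [step * i for i in range(1, 1000)]
w = [p**x * (1 - p)**(n - x) for p in ps]
print(sum(p * wi for p, wi in zip(ps, w)) / sum(w))   # about 0.75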

BruceZ
07-13-2005, 05:31 AM
[ QUOTE ]
A (classical) confidence interval answers the question "if the true value of p is 80%, what range of estimated p's will I see if I repeat this experiment several times?" Your method answers the related but different question "if I imagine that p is a random variable (even though in the real world it isn't), which p's are most likely to give rise to the data set I actually have in my hands?"

[/ QUOTE ]

His method absolutely produces the confidence intervals. Perhaps what you call a "(classical) confidence interval" may also be called a confidence interval, but I doubt that this is what the term was originally meant to refer to, simply because the term confidence exists in statistics distinct from probability to describe confidence intervals derived from maximum likelihood estimators just as the OP has done. If this is why the term was introduced, then his confidence interval is the more "classical", even if Bayesian statistics was invented first. I know you can find a lot of "definitions" of confidence intervals around which state only what yours did without qualification, but I think you would agree that these are at best incomplete and misleading definitions, usually perpetuated by people who do not completely understand statistics.

In any case, the confidence intervals that the OP produced are the only ones which can be reasonably produced based solely on the data provided.

BruceZ
07-13-2005, 05:55 AM
[ QUOTE ]
Is this anything at all like being 95% confident the P(X) is >= 0.55, or have I spewed completely unrelated nonsense, and a confidence interval has a completely different meaning?


[/ QUOTE ]

Yes, it is mathematically correct to say that you are 95% confident, and that this is a 95% confidence interval, and your others are correct as well. They do not tell you the probability that P(X) lies in this interval. Confidence and probability are two different things, and the word confidence is defined exactly as you are using it. To find a probability, you would use the methods of Bayesian statistics as Siegmund has described to estimate a probability distribution for P(X), which you would refine with additional data, but your current method would produce essentially the same result if you collect enough data. You might want to refer to this post (http://forumserver.twoplustwo.com/showthreaded.php?Cat=&Number=2587704&page=&view=&sb=5&o=&vc=1) for more discussion about these different philosophies.

BTW, did you see the response I posted to this question (http://forumserver.twoplustwo.com/showthreaded.php?Cat=&Number=2757944&page=&view=&sb=5&o=&vc=1) that you asked?

ddubois
07-13-2005, 03:04 PM
Interesting link. I don't think I will be able to fully internalize it all, but it's interesting nonetheless.

So back to the issue of finding a confidence interval for a 'true' VPIP given a VPIP sampling, let me see if I understand correctly: The process I outlined in the OP is a Bayesian estimation, but with a bogus prior distribution, at least for the VPIP issue. And if I had some data sampling of a few thousand players' VPIPs, and was able to assign each P(X) a weighting to model the distribution of poker players' P(X)s, then included that as a multiplier in my tally column, the result would still be a Bayesian estimation, but hopefully a more refined and accurate one?

By the way, the formula I used in the original post was:
P(X) ^ successes * (1 - P(X)) ^ fails * trials! / fails! / successes!
so the new formula would be the same but with an extra multiplier term.

PS: yes, I saw your reply to 'Runs', thanks.

ddubois
07-13-2005, 04:36 PM
So I got some data about the distribution of VPIPs over 2000+ players who had more than 50 hands in my database. I was hoping someone with a bigger (5/10 6-max) database would contribute, but this might be a sizable enough sample for my purposes.

VPIP   # players (cumulative)
<5 0
<10 2
<15 32
<20 180
<25 415
<30 701
<35 967
<40 1249
<45 1495
<50 1705
<55 1923
<60 2078
<65 2192
<70 2256
<75 2306
<80 2327
<85 2340
<90 2346
<95 2353
<100 2355

and I translated these into fractions of 2355, and put them in Excel like this:

Hands: 10
Paid for flop: 8
0.05 0.00000000 0.00000000
0.10 0.00084926 0.00000000
0.15 0.01273885 0.00000011
0.20 0.06284501 0.00000463
0.25 0.09978769 0.00003854
0.30 0.16390658 0.00023712
0.35 0.11295117 0.00048359
0.40 0.11974522 0.00127131
0.45 0.10445860 0.00239101
0.50 0.08917197 0.00391869
0.55 0.09256900 0.00706325
0.60 0.06581741 0.00795945
0.65 0.04840764 0.00850295
0.70 0.02717622 0.00634495
0.75 0.02216312 0.00624042
0.80 0.00806794 0.00243644
0.85 0.00581655 0.00160477
0.90 0.00254777 0.00049353
0.95 0.00084926 0.00006338
sum all 0.049054
.4 to .85 0.973073

The third column uses this formula:
=POWER(A3,$C$2) * POWER(1-A3,$C$1-$C$2) * B3 * FACT($C$1) / FACT($C$2) / FACT($C$1-$C$2)
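
For anyone without Excel, here is a rough Python equivalent of this spreadsheet. It re-derives the per-bin weights from the cumulative counts posted above (the variable names are just for illustration), so the output should land near the figure in the table, though it may differ slightly in bins where the posted fractions don't line up exactly with the cumulative counts:

from math import comb

# Cumulative player counts: players with VPIP below 5, 10, ..., 100 percent.
cumulative = [0, 2, 32, 180, 415, 701, 967, 1249, 1495, 1705,
              1923, 2078, 2192, 2256, 2306, 2327, 2340, 2346, 2353, 2355]
total_players = cumulative[-1]

# Per-bin weights for the grid points 0.05, 0.10, ..., 0.95 (same grid as column A).
weights = [(cumulative[i] - (cumulative[i - 1] if i else 0)) / total_players
           for i in range(19)]
ps = [(i + 1) / 20 for i in range(19)]

hands, paid = 10, 8
posterior = [w * comb(hands, paid) * p**paid * (1 - p)**(hands - paid)
             for p, w in zip(ps, weights)]
total = sum(posterior)
print(sum(v for p, v in zip(ps, posterior) if 0.40 <= p <= 0.85) / total)  # about 0.97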

Now after having read the other thread, it seems I cannot say "It is 97% likely this person's true VPIP is within the 40% to 85% bins", but I can say "I am 97% confident this person's true VPIP is within the 40% to 85% bins"? Yeah, that distinction means nothing to me. Honestly, the former statement sounds more like what I want. If I wanted to say "It is X% likely this person's true VPIP is within the range Y to Z", is there some math I could do to make that claim with some sound statistical basis?

Anyways, looking at how fewer trials affects confidence:
Hands: 5
Paid for flop: 4
0.05 0.00000000 0.00000000
0.10 0.00084926 0.00000038
0.15 0.01273885 0.00002741
0.20 0.06284501 0.00040221
0.25 0.09978769 0.00146173
0.30 0.16390658 0.00464675
0.35 0.11295117 0.00550866
0.40 0.11974522 0.00919643
0.45 0.10445860 0.01177950
0.50 0.08917197 0.01393312
0.55 0.09256900 0.01905895
0.60 0.06581741 0.01705987
0.65 0.04840764 0.01512187
0.70 0.02717622 0.00978752
0.75 0.02216312 0.00876569
0.80 0.00806794 0.00330463
0.85 0.00581655 0.00227721
0.90 0.00254777 0.00083580
0.95 0.00084926 0.00017293
sum all 0.123341
.4 to .85 0.894148

So I'm less confident after 5 hands than after 10 hands, which is a good result, because that's what I was expecting to see. This leads me to believe that what I'm doing might be valid, and appears to lend some credence to my argument that "Even one orbit of stats can give you some indication of the likely tendencies of the player being viewed".

After 50 hands I am very, very confident this player is not a rock:

Hands: 50
Paid for flop: 40
0.05 0.00000000 0.00000000
0.10 0.00084926 0.00000000
0.15 0.01273885 0.00000000
0.20 0.06284501 0.00000000
0.25 0.09978769 0.00000000
0.30 0.16390658 0.00000000
0.35 0.11295117 0.00000000
0.40 0.11974522 0.00000000
0.45 0.10445860 0.00000004
0.50 0.08917197 0.00000081
0.55 0.09256900 0.00001333
0.60 0.06581741 0.00009477
0.65 0.04840764 0.00045060
0.70 0.02717622 0.00104952
0.75 0.02216312 0.00218348
0.80 0.00806794 0.00112805
0.85 0.00581655 0.00051761
0.90 0.00254777 0.00003868
0.95 0.00084926 0.00000011
sum all 0.005477
.4 to .9 0.992917

ddubois
07-13-2005, 05:15 PM
[ QUOTE ]
.4 to .9 0.992917

[/ QUOTE ]
The last line is wrong; I changed the range in the calculation but not the label, so it should say ".4 to .9 0.999980" or could say ".45 to .85 0.992917" (or could say ".65 to .85 0.973026", etc. etc.)