PDA

View Full Version : Question for BruceZ (or others) regarding streaks

DougOzzzz
08-18-2005, 07:33 PM
A while ago you showed how to calculate the odds of getting 3 PP in a row any time during 1000 hands.

I'm interested in calculating the odds of getting X or more PP's in Y hands at any point over the course of Z hands. For example, use 14 as X, 100 as Y, and 1000 as Z. I know using the BINOM function in excel that the probability of getting 14 or more PP's in 100 hands is 0.002182578 or, about 458:1. How do I calculate the odds of this happening at any point over 1000 hands? 10000 hands? Etc.

Thanks.

hukilai
08-18-2005, 09:20 PM
Calculating of prob. hitting X+ PP in Y hands is tedious but possible (it will a sum of binom coef). But finding out the chances of having it happened in the wider hands set would require apllying of incursion/excursion theorem, which is just too much hassle.

Any smarter ways to do that?

DougOzzzz
08-25-2005, 08:29 PM
No help?

BruceZ
08-26-2005, 05:19 AM
[ QUOTE ]
A while ago you showed how to calculate the odds of getting 3 PP in a row any time during 1000 hands.

I'm interested in calculating the odds of getting X or more PP's in Y hands at any point over the course of Z hands. For example, use 14 as X, 100 as Y, and 1000 as Z. I know using the BINOM function in excel that the probability of getting 14 or more PP's in 100 hands is 0.002182578 or, about 458:1. How do I calculate the odds of this happening at any point over 1000 hands? 10000 hands? Etc.

Thanks.

[/ QUOTE ]

<font color="red">It has been pointed out that this is only an approximation. See the note in red below</font>.

Let f(n,k) be the probability of exactly k pairs in the last 100 hands out of n hands for 0&lt;=k&lt;=13, and let f(n,14) be the probability of 14 or more pairs in any 100 hands out of n hands. You want to find f(1000,14).

Binomdist(k,100,1/17,false) is the probability of exactly k pairs in 100 hands, with the probability of a pair = 1/17. 1 - Binomdist(13,100,1/17,true) is the probability of more than 13 pairs in 100 hands. This is an Excel function.

f(100,k) = binomdist(k,100,1/17,false), for k &lt; 14
f(100,14) = 1 - binomdist(13,100,1/17,true)

for n &gt; 100:

f(n,0) = f(n-1,0)*16/17 + f(n-1,1)*16/17*(1/100)

f(n,k) = f(n-1,k)*[1/17*(k/100) + 16/17*(1-k/100)] +
f(n-1,k-1)*1/17*[1-(k-1)/100] +
f(n-1,k+1)*16/17*(k+1)/100, for 1 &lt;= k &lt;= 12

f(n,13) = f(n-1,13)*[1/17*(13/100) + 16/17*(1-13/100)] +
f(n-1,12)*1/17*(1-12/100)

f(n,14) = f(n-1,14) + f(n-1,13)*1/17*(1-13/100)

f(1000,14) =~ 9.35%

In Excel, this requires computing n rows by k columns, or for this specific example, 901 rows and 15 columns, with the first row representing f(100,k). Think of each row as a 15 dimensional state vector, where states k = 0 to 13 hold the probability of k pairs in the last 100 hands, and state k = 14 accumulates the probability of 14 or more pairs anywhere in the last n hands. Note that there is no exit from state 14, so this accumulates the answer over all n. All of the other states only hold the fraction of pairs in the last 100 hands. If you like, I can email you this spreadsheet.

To understand this method, picture a sliding window 100 hands long that we advance 1 hand at a time. With each advance, we pick up one new hand (hand n), and lose 1 hand from the beginning of the window (hand n-100). The number of pairs in the window does not change if the nth hand is a pair (probability 1/17) and hand n-100 was also a pair (probability k/100), and there is also no change if hand n is not a pair (16/17) and hand n-100 was not a pair (1-k/100). In these cases, we update f(n,k) from f(n-1,k). We update f(n,k) from f(n-1,k-1) when the sliding window gains a pair, that is, when the nth hand is a pair (1/17) and hand n-100 was not a pair which has probability 1-(k-1)/100. We update f(n,k) from f(n-1,k+1) when the sliding window loses a pair, that is, when the nth hand is not a pair (16/17) and hand n-100 was a pair with probability (k+1)/100.

If you were writing a program to do this, the only storage required would be 2 arrays of length 15, where each one gets updated from the other back and forth.

This is a Markov process, and it can also be set up as a matrix. The matrix would be 15x15, with each row having at most 3 entries. The initial state vector would be f(100,k), which are the probabilities after 100 hands. The matrix would be multiplied by this vector 900 times, or the matrix could be raised to the 900th power, and the resultant matrix would then be multiplied by the vector one time.

<font color="red">It is assumed in the above that when k pairs appear in the window of length 100, that the probability that the first hand in the window is a pair is k/100. This would be true if we considered all hand sequences with k pairs in this window; however, we are only considering the hand sequences which do not have 14 or more pairs in any preceding window of length 100. This condition would sometimes change this probability, though perhaps only slightly.</font>

BruceZ
08-26-2005, 03:53 PM
Irchans has pointed out that my above method is not exact. See my comments in red in the original post for an explanation. In fact, the computation of the probability in question was the main issue which prevented me from posting a solution earlier. I will leave this solution up because I believe it is a good approximation, though I do not know what the error is, and because I don't have any better solution at this point.

gol4pro
08-27-2005, 12:03 AM
Change the situation around a little bit.

Say a computer generates a random number from 1-17. What's the chance that you would guess the number (with 1 guess each round) 14 times in a span of 100 guesses. This is really the same thing as the pocket pair question.

Seems to me like binomial theorem would work.

100c14*100c86*(1/17)^14*(16-17)^86

It's been a while since I took probability, so there may be a 1 - something in there somewhere, but is there any reason binomial therom doesn't work here?

BruceZ
08-27-2005, 12:39 AM
[ QUOTE ]
Change the situation around a little bit.

Say a computer generates a random number from 1-17. What's the chance that you would guess the number (with 1 guess each round) 14 times in a span of 100 guesses. This is really the same thing as the pocket pair question.

Seems to me like binomial theorem would work.

100c14*100c86*(1/17)^14*(16-17)^86

It's been a while since I took probability, so there may be a 1 - something in there somewhere, but is there any reason binomial therom doesn't work here?

[/ QUOTE ]

The probability that you guess it exactly 14 times out of 100 is just

C(100,14)*(1/17)^14*(16/17)^86

or in Excel, =Binomdist(14,100,1/17,false).

For the probability that you guess it 14 times or more, you would take 1 minus the sum of C(100,k)*(1/17)^k*(16/17)^(100-k) for k = 0 to 13. This is the same as =1 - Binomdist(13,100,1/17,true).

This isn't the question that the OP asked. He wants to know the probability of 14 or more out of 100 hands for some consecutive 100 hands within 1000 hands. The binomial distribution is just used to get the initial conditions for the first 100 hands.

Lexander
08-27-2005, 03:45 AM
If you are willing to accept an approximation (which I personally have no problem with), then you might as well get your answer by simulation.

Personally, I wish I was more comfortable with queuing theory, because this seems like some kind of single server queue, with geometric arrivals and fixed departures (my intro class never got past M/M/1 queues, so I am out of my league on how to model this).

irchans
08-27-2005, 09:39 AM
I wrote two quick simulations. Here are the results for the probability that at some time in a sequence of 1000 hands, there exists a 100 hand window with 14 or more pocket pairs.

In the first simulation of 10000 1000 hand sequences, I got 14 or more pairs 1091 times. The second simulation I got 1040. So, I think the correct answer is around 10% or 11%.

For the second simulation I copied the following formulas into excel:

cells c1 to c1000 &gt;&gt;&gt;&gt; +IF(RAND()&lt;1/17,1,0)
cells d1 to d1000 &gt;&gt;&gt;&gt; +SUM(C1:C100)
cell e1 &gt;&gt;&gt;&gt; 0
cells e2 to e1000 &gt;&gt;&gt;&gt; +MAX(E1,D1)

Each time you hit the F9 button, the worksheet shows the maximum number of pairs in a 100 hand window in cell e1000.

irchans
08-27-2005, 12:16 PM
I think it's possible to use recurrence formulas like Bruce's formaulas to get the exact answer.

Look at the 1000 hands as 900 overlapping windows of width 100.

The Probability p[n] that the nth window will have exactly 14 hands is 0.0014279740790007608. Let g[n] be the probability that the nth window has exactly 14 hands and that no previous window had 14 hands. Let r[n,i] be the probability that the nth window will have exactly 14 hands given that the ith window had exactly 14 hands. Note that r[n,i] = r[n-i] because it only depends on the difference between n and i. Then

p[n] = Sum[ g[j] * r[n-j] , {j, 0, n}].

( p is the convolution of g and r so we might be able to use Fourier Transforms to solve the problem, but I won't use those in this post. )

If we can solve this equation for g[n] then the answer to the original poster's question is:

Sum[ g[n], {n,1,900}].

Notice that r[0] = 1 so:

p[n] = Sum[ g[j] * r[n-j] , {j, 0, n}]
p[n] = g[n] + Sum[ g[j] * r[n-j] , {j, 0, n-1}]
g[n] = p[n] - Sum[ g[j] * r[n-j] , {j, 0, n-1}]
g[n] = 0.0014279740790007608- Sum[ g[j] * r[n-j] , {j, 0, n-1}].

Now we just need the sequence r[n] to compute g[n]. If n&lt;100, r[n] is the sum over k=0,...,14 of the probability that the first n hands of a window has k pairs times the probability that the next n hands after the window have k pairs. (If n&gt;=100, r[n] = 0.0014279740790007608.)

<font class="small">Code:</font><hr /><pre>
r[n] = Sum[ C[n, i]
*(1/17^i)*(16/17)^(n - i) * prfirst[n, i],
{i, 0, Min[n, 14]}]
</pre><hr />

The probability that the first n hands of a 100 hand window have i pairs is

<font class="small">Code:</font><hr /><pre>
prfirst[n, i] = If[ j &lt; 14 - (100 - i),
0,
C[i, j]*C[100 - i, 14 - j] / C[100, 14]]
</pre><hr />

Using those formulas, the 110 values of r are 0.81765, 0.68107, 0.57759, 0.49818, 0.43635, 0.38746, 0.34817, 0.31607, 0.28941, 0.26693, 0.24769, 0.231, 0.21634, 0.20332, 0.19166, 0.18111, 0.17151, 0.16272, 0.15461, 0.1471, 0.14012, 0.13361, 0.12751, 0.12178, 0.11639, 0.1113, 0.10648, 0.10192, 0.09759, 0.09347, 0.08955, 0.08581, 0.08224, 0.07883, 0.07557, 0.07245, 0.06946, 0.06659, 0.06384, 0.0612, 0.05866, 0.05622, 0.05387, 0.05162, 0.04945, 0.04736, 0.04534, 0.0434, 0.04154, 0.03974, 0.038, 0.03633, 0.03472, 0.03317, 0.03168, 0.03023, 0.02884, 0.0275, 0.02621, 0.02497, 0.02377, 0.02262, 0.02151, 0.02044, 0.01941, 0.01841, 0.01746, 0.01654, 0.01566, 0.01482, 0.014, 0.01322, 0.01247, 0.01175, 0.01107, 0.01041, 0.00977, 0.00917, 0.00859, 0.00804, 0.00752, 0.00701, 0.00654, 0.00608, 0.00565, 0.00524, 0.00485, 0.00448, 0.00413, 0.0038, 0.00349, 0.0032, 0.00292, 0.00266, 0.00242, 0.00219, 0.00198, 0.00178, 0.0016, 0.00143, 0.00143, 0.00143, 0.00143, 0.00143, 0.00143, 0.00143, 0.00143, 0.00143, 0.00143, and 0.00143.

Plugging these values into the formula for g gives

g[1] = 0.00142797
g[2] = 0.000260395
g[3] = 0.000242518,
g[4] = 0.000227544, ...

Adding up the first 900 values of g gives an answer of 11.7327%. This number seems a bit higher than the simulation values so I'm not real confident about it yet.

irchans
08-27-2005, 10:00 PM
I fixed some typos in the post below

I think it's possible to use recurrence formulas like Bruce's formaulas to get the exact answer.

Look at the 1000 hands as 900 overlapping windows of width 100.

The Probability p[n] that the nth window will have exactly 14 hands is 0.0014279740790007608. Let g[n] be the probability that the nth window has exactly 14 hands and that no previous window had 14 hands. Let r[n,i] be the probability that the nth window will have exactly 14 hands given that the ith window had exactly 14 hands. Note that r[n,i] = r[n-i] because it only depends on the difference between n and i. Then

p[n] = Sum[ g[j] * r[n-j] , {j, 1, n}].

( p is the convolution of g and r so we might be able to use Fourier Transforms to solve the problem, but I won't use those in this post. )

If we can solve this equation for g[n] then the answer to the original poster's question is:

Sum[ g[n], {n,1,900}].

Notice that r[0] = 1 so:

p[n] = Sum[ g[j] * r[n-j] , {j, 1, n}]
p[n] = g[n] + Sum[ g[j] * r[n-j] , {j, 1, n-1}]
g[n] = p[n] - Sum[ g[j] * r[n-j] , {j, 1, n-1}]
g[n] = 0.0014279740790007608- Sum[ g[j] * r[n-j] , {j, 1, n-1}].

Now we just need the sequence r[n] to compute g[n]. If n&lt;100, r[n] is the sum over k=0,...,14 of the probability that the first n hands of a window has k pairs times the probability that the next n hands after the window have k pairs. (If n&gt;=100, r[n] = 0.0014279740790007608.)

Code:

r[n] = Sum[ C[n, i]
*(1/17^i)*(16/17)^(n - i) * prfirst[n, i],
{i, 0, Min[n, 14]}]

The probability that the first n hands of a 100 hand window have i pairs is

Code:

prfirst[i, j] = If[ j &lt; 14 - (100 - i),
0,
C[i, j]*C[100 - i, 14 - j] / C[100, 14]]

Using those formulas, the 110 values of r are 0.81765, 0.68107, 0.57759, 0.49818, 0.43635, 0.38746, 0.34817, 0.31607, 0.28941, 0.26693, 0.24769, 0.231, 0.21634, 0.20332, 0.19166, 0.18111, 0.17151, 0.16272, 0.15461, 0.1471, 0.14012, 0.13361, 0.12751, 0.12178, 0.11639, 0.1113, 0.10648, 0.10192, 0.09759, 0.09347, 0.08955, 0.08581, 0.08224, 0.07883, 0.07557, 0.07245, 0.06946, 0.06659, 0.06384, 0.0612, 0.05866, 0.05622, 0.05387, 0.05162, 0.04945, 0.04736, 0.04534, 0.0434, 0.04154, 0.03974, 0.038, 0.03633, 0.03472, 0.03317, 0.03168, 0.03023, 0.02884, 0.0275, 0.02621, 0.02497, 0.02377, 0.02262, 0.02151, 0.02044, 0.01941, 0.01841, 0.01746, 0.01654, 0.01566, 0.01482, 0.014, 0.01322, 0.01247, 0.01175, 0.01107, 0.01041, 0.00977, 0.00917, 0.00859, 0.00804, 0.00752, 0.00701, 0.00654, 0.00608, 0.00565, 0.00524, 0.00485, 0.00448, 0.00413, 0.0038, 0.00349, 0.0032, 0.00292, 0.00266, 0.00242, 0.00219, 0.00198, 0.00178, 0.0016, 0.00143, 0.00143, 0.00143, 0.00143, 0.00143, 0.00143, 0.00143, 0.00143, 0.00143, 0.00143, and 0.00143.

Plugging these values into the formula for g gives

g[1] = 0.00142797
g[2] = 0.000260395
g[3] = 0.000242518,
g[4] = 0.000227544, ...

Adding up the first 900 values of g gives an answer of 11.7327%. This number seems a bit higher than the simulation values so I'm not real confident about it yet.

alThor
08-28-2005, 12:58 PM
[ QUOTE ]
Look at the 1000 hands as 900 overlapping windows of width 100.

[/ QUOTE ]

I found no mistake in your formulas, but wouldn't use want to use 901 windows?

OTOH, I am confused by some entries in your sim post. Cell E1 should be =D1; the other 900 (or 899?) cells Ei should max cells Di and E(i-1). Is that what you actually did? Perhaps this explains the discrepancy.

alThor

irchans
08-28-2005, 02:41 PM
I agree, there are 901 windows. That changes the theoretical answer to 11.7447%.

For the Excel simulation, I agree that cell E1 should be =D1. In the other E cells I had max of cells D(i-1) and E(i-1) which I think works as well as your formula.

I reran the Excel simulation with your corrections. Among the 200,000 virtual deals there were 20,789 deals that had 14 pairs or more in a window. That simulation gives a 10.4% estimate of the probability. I wrote a simulator in Mathematica and got 22,545 deals with 14 pairs or more. That's 11.3%. Given the size of the simulations, either there is a bug one of the simulation programs and the theoretical derivation, or these random generators are not good enough. Hmm.

alThor
08-28-2005, 05:03 PM
[ QUOTE ]
I reran the Excel simulation with your corrections. Among the 200,000 virtual deals there were 20,789 deals that had 14 pairs or more in a window. That simulation gives a 10.4% estimate of the probability.

[/ QUOTE ]

Just so we are clear, does 200k deals mean 200 million hands?

In any case, I also wrote a sim from scratch in Python. After 20k trials (i.e. up to 20 million "hands"), I got around 10.3%, which has a conf. interval size around +/- half a percent, so I believe it is below 11%.

[ QUOTE ]
I wrote a simulator in Mathematica and got 22,545 deals with 14 pairs or more. That's 11.3%. Given the size of the simulations, either there is a bug one of the simulation programs and the theoretical derivation, or these random generators are not good enough. Hmm.

[/ QUOTE ]

It is not a problem with the random generators (though with something that big, I wouldn't trust Excel completely). Perhaps we both made the same sim mistake, whatever it is. I also checked some of your numbers; I agree with your "r" values, etc. We are missing something. Oh well.

alThor

DougOzzzz
08-29-2005, 12:42 AM
I haven't tried to understand your post yet explaining your method for solving the problem

I can speak for Excel's RNG being crappy though. There are some very obvious patterns if you analyze a large sequence of randomly generated numbers. I almost always get incorrect answers when I write simulations in Excel even with millions of trials.

Still, it seems a bit strange that BruceZ's method could be so far off. I understood his method, and it makes sense. The flaw in the method also makes sense, and logically I concluded that the error would cause the odds to increase.

However the difference still seems large. I'll try to understand your method at a later time. I don't have as much mathematical training as most of you here so sometimes it's difficult for me to translate these formulas into something that makes sense to me.

I do appreciate the effort that everyone put into answering my question, though.

BruceZ
09-14-2005, 08:08 PM

This method has a problem similar to the one in my solution.

[ QUOTE ]

The Probability p[n] that the nth window will have exactly 14 hands is 0.0014279740790007608. Let g[n] be the probability that the nth window has exactly 14 hands and that no previous window had 14 hands. Let r[n,i] be the probability that the nth window will have exactly 14 hands given that the ith window had exactly 14 hands. Note that r[n,i] = r[n-i] because it only depends on the difference between n and i. Then

p[n] = Sum[ g[j] * r[n-j] , {j, 1, n}].

[/ QUOTE ]

For this formula to be correct, r[n,i] must be the probability that the nth window will have exactly 14 hands given that the ith window had exactly 14 hands, and given that no window previous to i had 14 hands. This latter condition will change the probability.

Further, it is not true that r[n,i] only depends only on n-i, because when the windows overlap, small values for the starting position of the first window i place different constraints on the window starting at n than larger values of i, since there would be more potential windows with 14 pairs that could start before a large value of i, and the hands in the window starting at n would be constrained to preclude this.

[ QUOTE ]
Now we just need the sequence r[n] to compute g[n]. If n&lt;100, r[n] is the sum over k=0,...,14 of the probability that the first n hands of a window has k pairs times the probability that the next n hands after the window have k pairs. (If n&gt;=100, r[n] = 0.0014279740790007608.)

Code:

r[n] = Sum[ C[n, i]
*(1/17^i)*(16/17)^(n - i) * prfirst[n, i],
{i, 0, Min[n, 14]}]

The probability that the first n hands of a 100 hand window have i pairs is

Code:

prfirst[i, j] = If[ j &lt; 14 - (100 - i),
0,
C[i, j]*C[100 - i, 14 - j] / C[100, 14]]

[/ QUOTE ]

Because of the conditional issue discussed above, all C[100,14] do not have the same probability when the window starting at n overlaps with the first window. In this case, arrangements which put more pairs at lower values of n would be less likely, since these arrangements are constrained to preclude a window of 14 starting before the first window. The C[i, j]*C[100 - i, 14 - j] would be similarly constrained.