Question for BruceZ (or others) regarding streaks

DougOzzzz · #1 08-18-2005, 07:33 PM

A while ago you showed how to calculate the odds of getting 3 PP in a row any time during 1000 hands.

I'm interested in calculating the odds of getting X or more PP's in Y hands at any point over the course of Z hands. For example, use 14 as X, 100 as Y, and 1000 as Z. I know using the BINOM function in excel that the probability of getting 14 or more PP's in 100 hands is 0.002182578 or, about 458:1. How do I calculate the odds of this happening at any point over 1000 hands? 10000 hands? Etc.

Thanks.

hukilai · #2 08-18-2005, 09:20 PM

Calculating of prob. hitting X+ PP in Y hands is tedious but possible (it will a sum of binom coef). But finding out the chances of having it happened in the wider hands set would require apllying of incursion/excursion theorem, which is just too much hassle.

Any smarter ways to do that?

DougOzzzz · #3 08-25-2005, 08:29 PM

No help?

BruceZ · #4 08-26-2005, 05:19 AM

[ QUOTE ]
A while ago you showed how to calculate the odds of getting 3 PP in a row any time during 1000 hands.

I'm interested in calculating the odds of getting X or more PP's in Y hands at any point over the course of Z hands. For example, use 14 as X, 100 as Y, and 1000 as Z. I know using the BINOM function in excel that the probability of getting 14 or more PP's in 100 hands is 0.002182578 or, about 458:1. How do I calculate the odds of this happening at any point over 1000 hands? 10000 hands? Etc.

Thanks.

[/ QUOTE ]

It has been pointed out that this is only an approximation. See the note in red below.

Let f(n,k) be the probability of exactly k pairs in the last 100 hands out of n hands for 0<=k<=13, and let f(n,14) be the probability of 14 or more pairs in any 100 hands out of n hands. You want to find f(1000,14).

Binomdist(k,100,1/17,false) is the probability of exactly k pairs in 100 hands, with the probability of a pair = 1/17. 1 - Binomdist(13,100,1/17,true) is the probability of more than 13 pairs in 100 hands. This is an Excel function.

f(100,k) = binomdist(k,100,1/17,false), for k < 14
f(100,14) = 1 - binomdist(13,100,1/17,true)

for n > 100:

f(n,0) = f(n-1,0)*16/17 + f(n-1,1)*16/17*(1/100)

f(n,k) = f(n-1,k)*[1/17*(k/100) + 16/17*(1-k/100)] +
f(n-1,k-1)*1/17*[1-(k-1)/100] +
f(n-1,k+1)*16/17*(k+1)/100, for 1 <= k <= 12

f(n,13) = f(n-1,13)*[1/17*(13/100) + 16/17*(1-13/100)] +
f(n-1,12)*1/17*(1-12/100)

f(n,14) = f(n-1,14) + f(n-1,13)*1/17*(1-13/100)

f(1000,14) =~ 9.35%

In Excel, this requires computing n rows by k columns, or for this specific example, 901 rows and 15 columns, with the first row representing f(100,k). Think of each row as a 15 dimensional state vector, where states k = 0 to 13 hold the probability of k pairs in the last 100 hands, and state k = 14 accumulates the probability of 14 or more pairs anywhere in the last n hands. Note that there is no exit from state 14, so this accumulates the answer over all n. All of the other states only hold the fraction of pairs in the last 100 hands. If you like, I can email you this spreadsheet.

To understand this method, picture a sliding window 100 hands long that we advance 1 hand at a time. With each advance, we pick up one new hand (hand n), and lose 1 hand from the beginning of the window (hand n-100). The number of pairs in the window does not change if the nth hand is a pair (probability 1/17) and hand n-100 was also a pair (probability k/100), and there is also no change if hand n is not a pair (16/17) and hand n-100 was not a pair (1-k/100). In these cases, we update f(n,k) from f(n-1,k). We update f(n,k) from f(n-1,k-1) when the sliding window gains a pair, that is, when the nth hand is a pair (1/17) and hand n-100 was not a pair which has probability 1-(k-1)/100. We update f(n,k) from f(n-1,k+1) when the sliding window loses a pair, that is, when the nth hand is not a pair (16/17) and hand n-100 was a pair with probability (k+1)/100.

If you were writing a program to do this, the only storage required would be 2 arrays of length 15, where each one gets updated from the other back and forth.

This is a Markov process, and it can also be set up as a matrix. The matrix would be 15x15, with each row having at most 3 entries. The initial state vector would be f(100,k), which are the probabilities after 100 hands. The matrix would be multiplied by this vector 900 times, or the matrix could be raised to the 900th power, and the resultant matrix would then be multiplied by the vector one time.

It is assumed in the above that when k pairs appear in the window of length 100, that the probability that the first hand in the window is a pair is k/100. This would be true if we considered all hand sequences with k pairs in this window; however, we are only considering the hand sequences which do not have 14 or more pairs in any preceding window of length 100. This condition would sometimes change this probability, though perhaps only slightly.

BruceZ · #5 08-26-2005, 03:53 PM

Irchans has pointed out that my above method is not exact. See my comments in red in the original post for an explanation. In fact, the computation of the probability in question was the main issue which prevented me from posting a solution earlier. I will leave this solution up because I believe it is a good approximation, though I do not know what the error is, and because I don't have any better solution at this point.

gol4pro · #6 08-27-2005, 12:03 AM

Change the situation around a little bit.

Say a computer generates a random number from 1-17. What's the chance that you would guess the number (with 1 guess each round) 14 times in a span of 100 guesses. This is really the same thing as the pocket pair question.

Seems to me like binomial theorem would work.

100c14*100c86*(1/17)^14*(16-17)^86

It's been a while since I took probability, so there may be a 1 - something in there somewhere, but is there any reason binomial therom doesn't work here?

BruceZ · #7 08-27-2005, 12:39 AM

[ QUOTE ]
Change the situation around a little bit.

Say a computer generates a random number from 1-17. What's the chance that you would guess the number (with 1 guess each round) 14 times in a span of 100 guesses. This is really the same thing as the pocket pair question.

Seems to me like binomial theorem would work.

100c14*100c86*(1/17)^14*(16-17)^86

It's been a while since I took probability, so there may be a 1 - something in there somewhere, but is there any reason binomial therom doesn't work here?

[/ QUOTE ]

The probability that you guess it exactly 14 times out of 100 is just

C(100,14)*(1/17)^14*(16/17)^86

or in Excel, =Binomdist(14,100,1/17,false).

For the probability that you guess it 14 times or more, you would take 1 minus the sum of C(100,k)*(1/17)^k*(16/17)^(100-k) for k = 0 to 13. This is the same as =1 - Binomdist(13,100,1/17,true).

This isn't the question that the OP asked. He wants to know the probability of 14 or more out of 100 hands for some consecutive 100 hands within 1000 hands. The binomial distribution is just used to get the initial conditions for the first 100 hands.

Lexander · #8 08-27-2005, 03:45 AM

If you are willing to accept an approximation (which I personally have no problem with), then you might as well get your answer by simulation.

Personally, I wish I was more comfortable with queuing theory, because this seems like some kind of single server queue, with geometric arrivals and fixed departures (my intro class never got past M/M/1 queues, so I am out of my league on how to model this).

irchans · #9 08-27-2005, 09:39 AM

I wrote two quick simulations. Here are the results for the probability that at some time in a sequence of 1000 hands, there exists a 100 hand window with 14 or more pocket pairs.

In the first simulation of 10000 1000 hand sequences, I got 14 or more pairs 1091 times. The second simulation I got 1040. So, I think the correct answer is around 10% or 11%.

For the second simulation I copied the following formulas into excel:

cells c1 to c1000 >>>> +IF(RAND()<1/17,1,0)
cells d1 to d1000 >>>> +SUM(C1:C100)
cell e1 >>>> 0
cells e2 to e1000 >>>> +MAX(E1,D1)

Each time you hit the F9 button, the worksheet shows the maximum number of pairs in a 100 hand window in cell e1000.

irchans · #10 08-27-2005, 12:16 PM

I think it's possible to use recurrence formulas like Bruce's formaulas to get the exact answer.

Look at the 1000 hands as 900 overlapping windows of width 100.

The Probability p[n] that the nth window will have exactly 14 hands is 0.0014279740790007608. Let g[n] be the probability that the nth window has exactly 14 hands and that no previous window had 14 hands. Let r[n,i] be the probability that the nth window will have exactly 14 hands given that the ith window had exactly 14 hands. Note that r[n,i] = r[n-i] because it only depends on the difference between n and i. Then

p[n] = Sum[ g[j] * r[n-j] , {j, 0, n}].

( p is the convolution of g and r so we might be able to use Fourier Transforms to solve the problem, but I won't use those in this post. )

If we can solve this equation for g[n] then the answer to the original poster's question is:

Sum[ g[n], {n,1,900}].

Notice that r[0] = 1 so:

p[n] = Sum[ g[j] * r[n-j] , {j, 0, n}]
p[n] = g[n] + Sum[ g[j] * r[n-j] , {j, 0, n-1}]
g[n] = p[n] - Sum[ g[j] * r[n-j] , {j, 0, n-1}]
g[n] = 0.0014279740790007608- Sum[ g[j] * r[n-j] , {j, 0, n-1}].

Now we just need the sequence r[n] to compute g[n]. If n<100, r[n] is the sum over k=0,...,14 of the probability that the first n hands of a window has k pairs times the probability that the next n hands after the window have k pairs. (If n>=100, r[n] = 0.0014279740790007608.)

Code:<hr /><pre>
r[n] = Sum[ C[n, i]
*(1/17^i)*(16/17)^(n - i) * prfirst[n, i],
{i, 0, Min[n, 14]}]
</pre><hr />

The probability that the first n hands of a 100 hand window have i pairs is

Code:<hr /><pre>
prfirst[n, i] = If[ j < 14 - (100 - i),
0,
C[i, j]*C[100 - i, 14 - j] / C[100, 14]]
</pre><hr />

Using those formulas, the 110 values of r are 0.81765, 0.68107, 0.57759, 0.49818, 0.43635, 0.38746, 0.34817, 0.31607, 0.28941, 0.26693, 0.24769, 0.231, 0.21634, 0.20332, 0.19166, 0.18111, 0.17151, 0.16272, 0.15461, 0.1471, 0.14012, 0.13361, 0.12751, 0.12178, 0.11639, 0.1113, 0.10648, 0.10192, 0.09759, 0.09347, 0.08955, 0.08581, 0.08224, 0.07883, 0.07557, 0.07245, 0.06946, 0.06659, 0.06384, 0.0612, 0.05866, 0.05622, 0.05387, 0.05162, 0.04945, 0.04736, 0.04534, 0.0434, 0.04154, 0.03974, 0.038, 0.03633, 0.03472, 0.03317, 0.03168, 0.03023, 0.02884, 0.0275, 0.02621, 0.02497, 0.02377, 0.02262, 0.02151, 0.02044, 0.01941, 0.01841, 0.01746, 0.01654, 0.01566, 0.01482, 0.014, 0.01322, 0.01247, 0.01175, 0.01107, 0.01041, 0.00977, 0.00917, 0.00859, 0.00804, 0.00752, 0.00701, 0.00654, 0.00608, 0.00565, 0.00524, 0.00485, 0.00448, 0.00413, 0.0038, 0.00349, 0.0032, 0.00292, 0.00266, 0.00242, 0.00219, 0.00198, 0.00178, 0.0016, 0.00143, 0.00143, 0.00143, 0.00143, 0.00143, 0.00143, 0.00143, 0.00143, 0.00143, 0.00143, and 0.00143.

Plugging these values into the formula for g gives

g[1] = 0.00142797
g[2] = 0.000260395
g[3] = 0.000242518,
g[4] = 0.000227544, ...

Adding up the first 900 values of g gives an answer of 11.7327%. This number seems a bit higher than the simulation values so I'm not real confident about it yet.

Any comments?