Two Plus Two Older Archives - View Single Post - Great problem solved

BruceZ · #1 07-12-2003, 06:34 PM

A couple of months ago, someone asked for the probability of getting 3 AA in a row in 250 hands. This gave rise to the more general question which was phrased as:

Do you have the formula for "at least 1 streak of at least m occurences in n trials"? I don't. Show us yours.

This is an important problem in the theory of runs. At the time, I had given a number of approximations which were very close to the exact answer. Last year, I discovered an exact solution to the particular problem for m=2 and probability of success = ½ in terms of the Fibonacci series. Since that time I have had a project on the back burner to extend this exact solution to the general question asked above. I have now completed this project, and I have not one but two different exact closed form solutions to this problem, and I have verified unequivocally that they are both correct. Although accurate approximate solutions can be found by elementary methods, the exact solution is of a very different nature than one may initially expect, and the solution is quite elegant and beautiful. Hence I am now prepared to “show you mine” as requested.

To restate the problem, we want the probability P(m,n,p) of at least 1 occurrence of a run of of length at least m of an event with probability p in n trials. For the question of 3 AA in a row in 250 hands, m = 3, n=250, p=1/221. For a coin flip problem, p=1/2.

Exact closed form solution 1

The probability P(m,n,p) of at least 1 occurrence of a run of length at least m of an event with probability p in n trials is given exactly by:

P(m,n,p) = Sum{k = 1 to n} F[k]

Where F[k] is given by the recurrence relation:

F[k] = F[k-1] - (1/p - 1)*F[k-m-1]*p^(m+1) for k > m+1

And the initial conditions:

F[k] = 0 for k< m
F[m] = p^m
F[m+1] = (1-p)*p^m

The F[k] can easily be computed and summed in Excel. For the 3 AA problem simply set:

A1 = 0
A2 = 0
A3 = 1/221^3
A4 = (220/221)*(1/221)^3
A5 = A4-220*A1/221^4
.
.
.
Copy the last formula down to cell A250. Then compute =sum(a1:a250) for the exact probability. An Excel spreadsheet is available upon request.

For m=2 and p=1/2, this reduces to

P(2,n,1/2) = sum{k = 2 to n} F[k-1]/2^k

Where F[k] is the Fibonacci series as derived previously, and this simplifies too:

P(2,n,1/2) = 1 - F[n+2]/2^n

Subsequent to this derivation, I have found a relatively recent reference to the Fibonacci series solution for this special case in the literature; however, I have not found the closed form exact solution to the more general problem which I have given above . There is a slight chance that I am the only one in possession of it.

Exact closed form solution 2

The probability P(m,n,p) of at least 1 occurrence of a run of length at least m of an event with probability p in n trials is given exactly by:

P(m,n,p) = v[m+1]

Where v is an m+1 dimensional vector given by the m+1 x m+1 matrix product:

<pre>code:<hr>
[ 1-p 1-p 1-p . 1-p 0 ] ^n [ 1 ]
v = | p 0 0 . 0 0 | | 0 |
| 0 p 0 . 0 0 | * | 0 |
| 0 0 p . 0 0 | | 0 |
| … . … | | … |
[ 0 0 0 . p 1 ] [ 0 ]</pre><hr>
The matrix represents the transition probabilities between m+1 states. The states are 0 in a row, 1 in a row…up to m in a row having been obtained. The columns of the matrix represent the current state, and the rows represent the next state. This is called a Markov chain. Such a solution was once presented by irchans for a similar problem.

Matrix multiplication can be carried out in Excel using MMULT to produce all of the n vectors. You have to use a ctl-shift-enter sequence to define the size of the vector as explained in the help to MMULT. An example spreadsheet is available upon request.

Derivation of exact solution 1

I give here a concise derivation of the most general form of the equation, but it is not especially easy to understand the derivation in this general form. It is easiest to refer to the derivation for the special case of m=2, p=1/2, and then insert those constants into the derivation below. It may be helpful to refer to the intial thread where the Fibonacci solution was derived for the case of m=2 and p=1/2 Fibonacci solution for coin tossing

Remember that the view we take here is that of a death and renewal process in which we consider a population of sequences at each stage, each of which “reproduces” by having a success or failure appended to it, and then some of the sequences “die” or terminate because m in a row have been obtained. We wish to determine the number that terminate at each stage which we’ll call t(n). The key realization is that this is equal to the number of sequences which do not end in a success m stages previous, which we’ll call k(n-m). This is because each of these sequences may be followed by m successes, and only those sequences will terminate at stage n. o(n) are the total number of outcomes or sequences after the nth stage. Thus:

t(n) = k(n-m)

= (1-p)*o(n-m)

= [ (1-p)/p ] * [ o(n-m-1) – t(n-m-1) ]

= [ (1-p)/p ]*{ [ 1/(1-p) ]*t(n-1) – t(n-m-1) ]

= (1/p)*t(n-1) – (1/p – 1)*t(n-m-1)

Since t(n) is the number of sequences that end in m successes at each stage, it remains to divide each of these by 2^n total sequences and sum these to get the final probability.

P(n,m,p) = sum{k = 1 to n} t(k)/2^k

In computing the above difference equation in Excel, the terms become so large that they are no longer representable. The following difference equation absorbs the division by 2^k.

F[k] = F[k-1] – (1/p – 1)*F[k-m-1]*p^(m+1) for k > m+1

With initial conditions:

F[k] = 0 for k < m
F[m] = p^m
F[m+1] = (1-p)*p^m

Then P(n,m,p) = sum{k = 1 to n} F[k]

Comparison of approximate and exact results

In the original thread I gave a number of approximate solutions to the problem of 3 AA in a row in 250 hands, all of varying degrees of precision and varying degrees of complexity, all of which I knew to be very useable. The lack of an exact solution led to some disparaging comments (Bruce's notion is unfortunately extremely inadequate, and from Robk Sorry Bruce, but I think you are way off on this one). I have tabulated below the results of all of the approximation methods given by me and by other posters in that thread, along with the exact answer computed by the methods described above. As can be seen, all of the approximation methods give probabilities which are quite adequate for all practical applications, and none of them are “way off” with the exception of the solution given by Robk which is indeed way off simply due to an error in his post which I have corrected. I have listed the corrected answer obtained from the method which he intended. Refer to Feller pp. 324-325. I used the goal seek feature of Excel to solve the cubic equation.

Approximations 3 and 4 appeared in the original thread. Approximations 1 and 2 are much more precise versions of 3 and 4 respectively which I have computed subsequent to that thread. The only difference is that the more precise versions use a different number of trials. Instead of 248, I have used 248 divided by the expected length of a trial which is 1*(220/221) + 2*(1/221)(220/221) + 3*(1/221)^3 = 248/1.004484216. These methods are both simple and very precise. I have given this method in other threads prior to this discussion, for example see the top of boxcars. The bottom of that post is actually an early failed attempt at a more general solution which only works for m=2. This derivation should be ignored.

The inclusion-exclusion calculation is carried out to the number of terms in the original thread. The inclusion-exclusion approximation, the Feller approximation, and the computer program all give highly precise solutions correct to the nearest whole number. The exact solutions derived above are much simpler to compute than either inclusion-exclusion or the Feller approximation.

<pre>code:<hr>

Method probability (%) 1 in

1.) EXACT(both methods above) .002287222419 43721.15242

2.) (248/1.004484216)*(1/221)^3 .002287112016 43723.26291

3.) 1–(1-1/221)^3^(248/1.004484216) .002287319310 43719.30039

4.) 248/221^3 .002297602313 43523.63306

5.) 1 – (1-1/221)^3^248 .002297576026 43524.13103

6.) Inclusion-Exclusion .002872000000 43721.58097

7.) Cyrus’ computer program .002287224379 43721.11495

8.) Robk approximation(w/error) .002248 44483.98577

9.) Feller approximation .002287222422 43721.15236
(Robk method corrected)
1 – (1 - 1/221)*x / [ (4 – 3x)(220/221)*x^251 ]
where x is the real root of (220/221)*x*[ 1 + (1/221)*x + (1/221)^2*x^2 ] = 1
x = 1.00000092
</pre><hr>