View Single Post
  #13  
Old 10-27-2005, 07:03 AM
BruceZ BruceZ is offline
Senior Member
 
Join Date: Sep 2002
Posts: 1,636
Default Re: Riddle -- Probability of Expectation

[ QUOTE ]
I was given a sheet of riddles for extra credit in my statistics class and got all but one of them right.

Here's the one that totally stumped me and pissed me off.

In The Art of Shakespeare's Sonnets (Harvard Univ. Press, 1997), author Helen Vendler noted that each of the 14 lines of Sonnet 20 (one of 154 sonnets written by Shakespeare) includes the letters of the word "hues" and or the letters of the word "hews."

Suppose that 154 monkeys sitting at 154 keyboards pounded out one sonnet apiece, each consisting of 14 lines and 36 alphabet letters each, with each letter equally likely. What is the probability that in at least one of the sonnets, every line includes the letters of the word "hues" and/or the letters of the word "hews?"


What say youse?

Prof. will not be giving the answer until the end of the semester.

[/ QUOTE ]


This problem can be solved very quickly and exactly by the inclusion-exclusion principle. As we have seen, the only difficult part of the problem is to compute the probability that the letters of "hues" or "hews" do not occur in a 36 letter line. If we call this probability P, then our final answer will be

P(at least 1 sonnet has "hues" or "hews" in every line) = 1 - [1 - (1 - P)^14]^154.

That is, 1 - P is the probability that it does occur in a 36 character line. (1 - P)^14 is the probability that it occurs in all 14 lines of a sonnet. 1 - (1 - P)^14 is the probability that it does not occur in every line of a sonnet. [1 - (1 - P)^14]^154 is the probability that it does not occur in every line of any of the 154 sonnets. 1 - [1 - (1 - P)^14]^154 is the probability that it does occur in every line of at least 1 of 154 sonnets.

We are assuming here that every line contains exactly 36 letters, since otherwise we would need to have been given the probability distribution of the length of a line. We were told that each of the 36 letters can be 1 of 26 equally probable letters, so we are ignoring spaces.

Now we simply need to find P, the probability that neither the letters of "hues" or "hews" occur in a 36 letter line. Note that this condition will be satisfied if there is no h, or if there is no e, or if there is no s, or if there is both no u AND no w. We simply want the probability of the union of these four conditions. We can compute the probability of this union by adding their probabilities, and then using the inclusion-exclusion principle to adjust for the over counting as follows:

P = P(no "h-u-e-s" AND no "h-e-w-s) =

P(no h) + P(no e) + P(no s) + P(no u AND no w) -

P(no h AND no e) - P(no h AND no s) - P(no e AND no s) - P[no h AND (no u AND no w)] - P[no e AND (no u AND no w)]- P[no s AND (no u AND no w)] +

P(no h AND no e AND no s) + P[no h AND no e AND (no u AND no w)] + P[no h AND no s AND (no u AND no w)] + P[no e AND no s AND (no u AND no w)] -

P[no h AND no e AND no s AND (no u AND no w)]


This is much simpler than it looks since we can combine many probabilities that are equal. We can rewrite the above as:

P = P(no "h-u-e-s" AND no "h-e-w-s") =

3*P(no h) + P(no u AND no w) -

3*P(no h AND no e) - 3*P[no h AND (no u AND no w)] +

P(no h AND no e AND no s) + 3*P[no h AND no e AND (no u AND no w)] +

P[no h AND no e AND no s AND (no u AND no w)]


These terms are all easy to compute as:

P = P(no "h-u-e-s" AND no "h-e-w-s) =

3*(25/26)^36 + (24/26)^36 -

3*(24/26)^36 - 3*(23/26)^36 +

(23/26)^36 + 3*(22/26)^36 -

(21/26)^36

=~ 60.16%

Note that using the independence approximation, Baker computed 1 minus this number as 40.84%, so it was off by about 1% which is reasonable, and it was slightly larger than the exact value, which we expected.

Substituting this value of P into our above equation produces the final answer:

1 - [1 - (1 - P)^14]^154 =~ 0.039%.
Reply With Quote