I am reposting this with a much clearer explanation for the second term in examples 1 and 3. If you read the original post, you can see the typo and further clarification that I posted below it. The examples were intended to show how all of the terms are computed in essentially the same way. Unfortunately, much of this hinged on a couple of key sentences which failed to convey what was intended. Hopefully this post will rectify that. Ignore the previous post, and use this version for reference.
As far as texts go, I just answered a similar question in this post. Additionally, a classic book just on probability theory (no statistics) is the text by William Feller, An Introduction to Probability Theory and Its Applications. Despite its title, it presents many advanced topics, and the explanations are somewhat terse, so it is best suited to math majors.
Neither text discusses inclusion-exclusion, nor do I know of one that does. A multi-part series by Brian Alspach in Poker Digest called I'm In...No I'm Out dealt with this topic. I haven't read this one, but it is probably good. I have written many posts on this forum with explanations and examples of this principle. If you search for these posts, they will tell you everything you need to know about this principle to solve practical problems. I have repeated a few of the simplest examples below.
Explanation of the Inclusion-Exclusion Principle
Inclusion-exclusion is based on a very simple idea. Suppose a poker room has 3 waiting lists for 3 different games, and players are allowed to be on more than one list at a time, but of course on each list only once. We want to know how many total players are waiting to play. We could simply add the number of players on each list, but this has the obvious problem that it would count the players that are on 2 lists twice, and it would count the players on all 3 lists 3 times, so this number would be too high if any players are on more than one list. That is, we would have a number equal to:

(number on exactly 1 list) + 2*(number on exactly 2 lists) + 3*(number on all 3 lists)
If we knew how many were on two or three lists, we could subtract that off to get the answer we want. So suppose we next count all the players that are waiting on two lists by counting the number that are on both lists 1 and 2, both lists 2 and 3, and both lists 1 and 3. This would count all the players on exactly 2 of these lists once, which is good, but it would count the players on all 3 lists 3 times, since those players appear in all 3 of the pair counts. Now if we subtract this number from our previous count, we would get:

(number on exactly 1 list) + (number on exactly 2 lists) + 0*(number on all 3 lists)
Now if we count the players that are on all 3 lists, we could add this on to get the total number of players waiting.
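If you want to check this counting with a few lines of Python, here is a minimal sketch (the lists and player names are made up for illustration):

```python
# Three waiting lists; players may appear on more than one list.
list1 = {"Ann", "Bob", "Cal", "Dee"}
list2 = {"Bob", "Cal", "Eve"}
list3 = {"Cal", "Fay"}

# The naive sum counts Bob twice and Cal three times.
naive = len(list1) + len(list2) + len(list3)          # 9

# Inclusion-exclusion: subtract each pairwise overlap, add back the triple.
total = (naive
         - len(list1 & list2) - len(list1 & list3) - len(list2 & list3)
         + len(list1 & list2 & list3))

print(total)                                 # 6
print(total == len(list1 | list2 | list3))   # True
```

The last line confirms that the corrected count matches the size of the union computed directly.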
Why bother with this method of counting? Because in many probability problems, this is the only practical way to count the number of things we need to count, and it actually makes our lives much easier. That is, it is often very difficult to count the members of a union of sets without overcounting the members that lie in the intersections of sets. The inclusion-exclusion principle allows us to compute much simpler terms which overcount on purpose, and then it fixes the overcounting for us.
This method works for counting the members of any number of sets, not just 3. Note that to obtain an exact answer, we need to compute as many terms as there are sets, in this case 3. The odd-numbered terms (terms 1, 3, 5, ...) are always added, and the even-numbered terms are always subtracted (reverse that for comp-sci people who start counting from 0, myself included). The idea is that when we add a term, we include some members too many times, so then we subtract a term, but then we exclude some members too many times, and so on. That is why it is called "inclusion-exclusion". In most practical problems, successive terms get smaller, so we can stop computing terms when we see that they have become smaller than our desired degree of approximation, instead of computing all of the terms, which is often difficult.
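You can watch the partial sums alternately overshoot and undershoot the exact answer with a quick sketch on some arbitrary example sets:

```python
from itertools import combinations

sets = [{1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6}, {2, 4, 7}]
exact = len(set().union(*sets))   # the union is {1,...,7}, so 7 members

partial = 0
for k in range(1, len(sets) + 1):
    # Sum of the sizes of all k-way intersections.
    term = sum(len(set.intersection(*combo))
               for combo in combinations(sets, k))
    # Odd terms are added, even terms subtracted.
    partial += term if k % 2 == 1 else -term
    print(k, partial)   # prints: 1 13, 2 4, 3 8, 4 7
```

The partial sums 13, 4, 8, 7 land alternately above and below the exact answer of 7, which is why stopping early still tells you how far off you can be.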
Mathematically, this is what we are doing:
P(A u B u C u ...) = P(A) + P(B) + P(C) + ... - P(AB) - P(AC) - P(BC) - ... + P(ABC) + ...
where A,B,C... are sets, and (A u B) means the union of A and B, so P(A u B) is the probability of A OR B. (AB) is the intersection of A and B, so P(AB) is the probability of A AND B. So first we add the probabilities of each set, and this is the first term. Then we subtract the probabilities of each pair-wise intersection of sets, and this is the second term. Then we add the probabilities of each 3-way intersection, and this is the third term, etc.
This is about as much as some textbooks mention (or less) about the probability of a union, usually without even referring to "inclusion-exclusion" by name. To fully appreciate its usefulness and power requires some examples.
This is the simplest non-trivial poker example. If you hold KK before the flop, what is the probability that at least one of your 9 opponents holds AA?
Note that at most 2 opponents can have AA. The probability that any particular player holds AA is 6/C(50,2). So the first term in inclusion-exclusion is simply 9*6/C(50,2). That is the sum of the 9 probabilities of each player having AA. This alone is very close to the exact answer, but it double counts the times that 2 players hold AA. For the second term, we take the probability that 2 particular players both have aces. This is 1/C(50,4), since both players hold AA precisely when their 4 combined cards are the 4 aces, and any split of those aces between the two hands gives each player AA. We then multiply by the number of ways to pick the two players, which is C(9,2). So the final answer is:
P(KK vs. AA) = 9*6/C(50,2) - C(9,2)/C(50,4)
or about 4.39%
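As a sanity check, the arithmetic is easy to reproduce with Python's math.comb:

```python
from math import comb

# First term: 9 ways to pick the opponent, times P(that opponent holds AA).
term1 = 9 * 6 / comb(50, 2)
# Second term: C(9,2) ways to pick two opponents, times P(both hold AA).
term2 = comb(9, 2) / comb(50, 4)

p = term1 - term2
print(f"{p:.2%}")   # 4.39%
```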
Here’s a neat "trick". If you hold AA, what is the probability that one of your 9 opponents holds AA?
The probability that a particular opponent holds AA with you is 1/C(50,2). The first term in inclusion-exclusion is 9/C(50,2). We’re done, that's the final answer. There are no more terms to compute because it is not possible for more than 1 opponent to have AA with you. That is, the events of different players having AA are mutually exclusive, and you’ll recall that the probability of mutually exclusive events can always be summed to get the probability of one of the events occurring. Mutually exclusive events are a trivial special case of the inclusion-exclusion principle where there are no intersections, so all terms except the first are zero.
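In code this one is a single line, since mutual exclusivity means the first term is already the exact answer:

```python
from math import comb

# P(a particular opponent holds the other two aces) = 1/C(50,2);
# the 9 events are mutually exclusive, so the probabilities just add.
p = 9 / comb(50, 2)
print(f"{p:.4%}")   # 0.7347%
```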
Let's do one more example. This one is a little more substantial. If the flop has exactly 2 diamonds, and you don't have a diamond, what is the probability that at least one of your 4 remaining opponents has a 4-flush assuming they hold random cards? Note: It is equally easy to do this problem assuming a realistic distribution of starting hands.
In this case it is possible for up to 4 players to have 4-flushes, so we will have 4 terms in the exact answer. The probability that a particular player has 2 diamonds is C(11,2)/C(47,2). The first term in inclusion-exclusion is thus 4*C(11,2)/C(47,2). For the second term, we take the probability that 2 particular players each have 2 diamonds, which is C(11,4)/C(47,4), since all 4 of their combined cards must be diamonds. We multiply this by the number of ways to choose the 2 players, which is C(4,2). Note that this counts the times that more than 2 players have diamonds more than once, but this is exactly what we want for the inclusion-exclusion principle. The second term is thus C(4,2)*C(11,4)/C(47,4). Proceeding the same way for the remaining terms gives this exact solution:
C(4,1)*C(11,2) / C(47,2) -
C(4,2)*C(11,4) / C(47,4) +
C(4,3)*C(11,6) / C(47,6) -
C(4,4)*C(11,8) / C(47,8)
or about 19.3%
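Since every term has the same shape, the whole alternating sum can be written compactly; here is a sketch using math.comb:

```python
from math import comb

# k players each holding 2 diamonds uses 2k of the 11 unseen diamonds
# for the 2k cards dealt to those players; odd k added, even k subtracted.
p = sum((-1) ** (k + 1) * comb(4, k) * comb(11, 2 * k) / comb(47, 2 * k)
        for k in range(1, 5))
print(f"{p:.1%}")   # 19.3%
```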