Two Plus Two Older Archives - View Single Post

BruceZ · #2 10-24-2005, 02:49 AM

[ QUOTE ]
This is a question from my probability homework, and someone please tell me if it's bad form to ask for homework help here (it's completely allowed in terms of my class -- we're allowed to work in groups and get help from anyone and anywhere). But if it's bad for this forum then please let me know and I won't do it again.

On to the question --

To determine whether or not they have a certain disease, 100 people are to have their blood tested. However, rather than testing each individual separately, it has been decided first to group the people into groups of 10. The blood samples of the 10 people in each group will be pooled and analyzed together. If the test is negative, one test will suffice for the 10 people. If the pool is positive, then each of the 10 people will be individually tested. Assume that the probability of disease is 10% for all people, independently of each other. Compute the distribution of the number of tests to be performed along with the expected value and variance.

So... I know that each group of 10 people has a .9^10 probability of testing negative, meaning just one test will suffice. And then with probability 1-.9^10, it will require 11 tests (1 for the whole group and then 10 individually). But I'm not really sure where to go from here.

Thanks in advance.

[/ QUOTE ]

For convenience, let S be the probability that at least one person from a group is sick. S is the probability that a group needs to have all of its individual members tested. You found that S = 1-.9^10. Then 1-S = .9^10 is the probability that no one in a given group is sick, and so 1-S is the probability that a given group does not need to have its individual members tested.
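As a quick numerical check of S, here is a minimal Python sketch. The variable names (p_sick, group_size, prob_group_negative) are just illustrative choices, not anything from the problem statement.

```python
# Probability that a single person is sick (given in the problem).
p_sick = 0.10
group_size = 10

# 1 - S: nobody in a group of 10 is sick, so the pooled test is negative.
prob_group_negative = (1 - p_sick) ** group_size

# S: at least one person in the group is sick, so all 10 are retested.
S = 1 - prob_group_negative

print(f"S = {S:.4f}, 1 - S = {prob_group_negative:.4f}")
# prints: S = 0.6513, 1 - S = 0.3487
```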

To compute the distribution, you need to compute the probability of each possible number of tests. The minimum number of tests is 10, and that happens if nobody is sick. That has probability (1-S)^10. If exactly 1 group has sick people, then there will be 20 tests. From the binomial distribution, that has probability C(10,1)*S*(1-S)^9. If exactly 2 groups have sick people, then there will be 30 tests, and that has probability C(10,2)*S^2*(1-S)^8. In general, if exactly k groups have sick people, there will be 10 + 10*k tests, and that has probability C(10,k)*S^k*(1-S)^(10-k). Do this for each possible number of tests, up to 110 tests, which will be required if someone from each group is sick, and which has probability C(10,10)*S^10*(1-S)^0 = S^10.

At that point you will have a function which assigns probabilities to the 11 possible numbers of tests from 10 to 110. Now technically, since the number of tests is discrete, this is called the probability mass function (often loosely called the density function), not the distribution function. The distribution function, call it F(x), is cumulative, so that F(x) means the probability that there are <= x tests.
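If you would rather let a program do the arithmetic, the binomial probabilities described above can be tabulated with a short Python sketch. The function name pmf_tests and its parameters are my own hypothetical choices; math.comb needs Python 3.8 or later.

```python
from math import comb

# S = probability that a group of 10 has at least one sick person.
S = 1 - 0.9 ** 10


def pmf_tests(S, num_groups=10, group_size=10):
    """Return a dict mapping each possible number of tests to its probability."""
    pmf = {}
    for k in range(num_groups + 1):
        # k positive groups: 10 pooled tests plus 10 individual tests per positive group.
        tests = num_groups + group_size * k
        pmf[tests] = comb(num_groups, k) * S**k * (1 - S) ** (num_groups - k)
    return pmf


pmf = pmf_tests(S)
for tests, prob in pmf.items():
    print(f"{tests:3d} tests: {prob:.6f}")
```

The 11 probabilities should add up to 1, which is a handy sanity check on the table.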

F(x) = 0 for x < 10 since the probability of < 10 tests is 0.
F(10) = P(<= 10 tests) = P(exactly 10 tests)
F(20) = P(<= 20 tests) = F(10) + P(exactly 20 tests)
F(30) = P(<= 30 tests) = F(20) + P(exactly 30 tests)
...
F(110) = P(<=110 tests) = F(100) + P(exactly 110 tests) = 1, since there will always be <= 110 tests.

Note that the cumulative probability distribution always starts with a value of 0 and increases to a value of 1.
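The running sums for F(x) can be sketched in Python as follows. This is just one way to do it, using itertools.accumulate; the names test_counts, pmf, and F are illustrative.

```python
from itertools import accumulate
from math import comb

S = 1 - 0.9 ** 10

# Possible numbers of tests: 10, 20, ..., 110 (k = 0..10 positive groups).
test_counts = [10 + 10 * k for k in range(11)]

# P(exactly 10 + 10*k tests) from the binomial distribution.
pmf = [comb(10, k) * S**k * (1 - S) ** (10 - k) for k in range(11)]

# F(x) at each possible x is the running sum of the probabilities up to x.
F = dict(zip(test_counts, accumulate(pmf)))

for x, value in F.items():
    print(f"F({x}) = {value:.6f}")
```

F(110) should come out as 1 up to floating-point rounding.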

To get the expected value of the number of tests, from the definition of expected value, you could compute:

E(# tests) = 10*P(exactly 10 tests) + 20*P(exactly 20 tests) +
30*P(exactly 30 tests) + ... + 110*P(exactly 110 tests).

However, there is a simpler way, and that is to recognize that the expected value of the number of tests is simply 10 + 100*S. It is a useful trick to remember that whenever you wish to compute the expected number of events that occur, you can simply add the probabilities of each event (this is linearity of expectation). In this case, the probability of each of the 100 individuals being individually tested is S, the probability that the individual's group tests positive. Note that it doesn't matter that these events are not independent. We add 10 since we will always start with the 10 pooled tests.
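To convince yourself that the two approaches agree, here is a small Python sketch that computes the expected value both from the definition and from the shortcut 10 + 100*S. The variable names are hypothetical.

```python
from math import comb

S = 1 - 0.9 ** 10

# Expected value from the definition: sum of (number of tests) * (probability).
expected_by_definition = sum(
    (10 + 10 * k) * comb(10, k) * S**k * (1 - S) ** (10 - k)
    for k in range(11)
)

# Shortcut: 10 pooled tests always happen, and each of the 100 people
# is individually tested with probability S.
expected_shortcut = 10 + 100 * S

print(f"definition: {expected_by_definition:.4f}")
print(f"shortcut:   {expected_shortcut:.4f}")
# both print 75.1322
```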

To compute the variance of the number of tests, you can use this identity:

variance(x) = E(x^2) - [E(x)]^2

where x is the number of tests, and E(x) is the expected value of the number of tests that you computed in the previous part. You would just need to compute E(x^2), the expected value of the number of tests squared, using the first definition of expected value given above, where x^2 can be 10^2, 20^2, ...up to 110^2.
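The variance identity can be evaluated with a Python sketch along these lines. The cross-check at the end is my own addition, not part of the method described above: since x = 10 + 10*k where k (the number of positive groups) is binomial, the variance should also equal 100 * 10*S*(1-S).

```python
from math import comb

S = 1 - 0.9 ** 10

# (number of tests, probability) for k = 0..10 positive groups.
outcomes = [
    (10 + 10 * k, comb(10, k) * S**k * (1 - S) ** (10 - k))
    for k in range(11)
]

mean = sum(x * p for x, p in outcomes)               # E(x)
mean_of_square = sum(x**2 * p for x, p in outcomes)  # E(x^2)
variance = mean_of_square - mean**2                  # E(x^2) - [E(x)]^2

# Cross-check (an assumption added here): x = 10 + 10*k with k binomial(10, S),
# so variance(x) = 10^2 * variance(k) = 100 * 10 * S * (1 - S).
variance_check = 100 * 10 * S * (1 - S)

print(f"variance = {variance:.4f}, cross-check = {variance_check:.4f}")
```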

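It is easiest to organize this on a spreadsheet such as Excel if you have access to that. Excel also has a function called BINOMDIST, which can sum the terms for you if you set the last input to TRUE for cumulative. For example, =BINOMDIST(2, 10, S, TRUE), with S replaced by its value or by a cell holding it, gives the probability that at most 2 groups test positive, which is F(30).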